Wiktionary talk:About Chinese


Please see Wiktionary talk:Entries on Chinese characters#Sortkeys and subcats for single-character entries for discussion of how to categorize the single-Chinese-character entries since they (may) apply to Chinese, Japanese, Korean and Vietnamese (CJKV). - dcljr 08:39, 25 January 2006 (UTC)

Archived discussions

Discussions from before January 2010 are in Wiktionary talk:About Sinitic languages/archive 1.

Category:Entries with translation table format problems

Autoformat has identified a number of entries that have the non-conforming language name "Chinese (traditional/simplified)". There are others that it has not yet flagged as well. I could not be trusted to correct this properly. DCDuring TALK 17:32, 9 May 2010 (UTC)

As a rule, assume it's Mandarin. Traditional/Simplified entries go on a single line; they're not really different 'scripts' but more like the French spelling reforms, where paraître becomes paraitre as the circumflex doesn't serve any purpose. Mglovesfun (talk) 08:26, 29 September 2011 (UTC)


(Note: I don't know a thing about Chinese.) A few questions/issues:

  1. What's up with the categories? There are Category:cmn:All topics, Category:zh:All topics, Category:zh-cn:All topics and Category:zh-tw:All topics. What's the difference?
  2. WT:AZH#Min_Nan says that Min Nan "has four main branches... This poses a problem for Wiktionary, since these dialects are not mutually intelligible, and only one L2 header may be used per ISO 639 code. ... To date, virtually all entries for Min Nan have been based on the Amoy dialect, which is widely considered to be a de facto standard. The disposition of other dialects such as Teochew and Qiongwen Hainanese remains undecided at this time." I'm pretty sure that standard practice for branches among languages is to use context labels for words that don't exist in some branches. Why should this language be different?
  3. I seem to recall some consensus about not allowing toneless pinyin entries? If there was, shouldn't this be mentioned on WT:About Chinese?
  4. WT:AZH lists {{infl}} as being the standard template to use, and repeats it many times for all the languages that do not yet have templates built for them specifically. Rather than showing an explanation for {{infl}} over and over again, wouldn't it make sense to make the page say that for dialects that don't have specific templates yet, use infl, and then explain how to use it once?
  5. Are these languages treated as separate languages or as dialects of one language? If they're separate languages, why do things like Category:Chinese templates exist, instead of being split into sections?
  6. What is the Wiktionary code for Mandarin, zh or cmn?

--Yair rand (talk) 07:02, 24 May 2010 (UTC)

Just one answer for the moment: #What is the Wiktionary code for Mandarin, zh or cmn?. This is annoying, but the assisted method doesn't work well with cmn: it creates {{ tø|cmn| for translations, thus they can't be linked to zh:wiki. zh works better, but bots change it to cmn. ZH is short for Chinese 中文 (Zhōngwén), CMN is Chinese Mandarin, but both have the word Mandarin in templates. I learned to live with this :) The reasons for the existence of both Chinese and Mandarin are historical. Mandarin is standard Chinese and most written Chinese material is in Mandarin. There are no YUE, NAN, etc. Wiktionaries, but there are some new Wikipedias in dialects. --Anatoli 12:36, 24 May 2010 (UTC)
I proposed on WT:BP, and still do propose eliminating zh, zh-cn and zh-tw from category names. zh is used for translations as the Mandarin Wikiprojects uses the code zh not cmn. Mglovesfun (talk) 08:28, 29 September 2011 (UTC)

Move debate


The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Wiktionary:About Chinese

I'd prefer Wiktionary:About Chinese languages as a title. It makes it clearer that we don't allow Chinese as a language. Furthermore, as much content as is reasonable/possible should be moved to the individual languages involved - Wiktionary:About Mandarin shouldn't be a redirect. Mglovesfun (talk) 12:55, 9 November 2010 (UTC)

I support moving the contents into Wiktionary:About Mandarin, Wiktionary:About Min Nan, etc. Despite these languages naturally sharing common characteristics, they conceivably have different conventions as well, such as grammar and names of templates. --Daniel. 13:02, 9 November 2010 (UTC)
Wiktionary:About Chinese (or a renamed version) should still exist, at the very least it could give context on what we call 'Chinese' here, and then link to the individual languages' pages. Mglovesfun (talk) 13:22, 9 November 2010 (UTC)
I support moving to About Chinese languages. IMO as long as there is no Mandarin-specific information to be split off of that page, hard-redirect from About Mandarin. Precedent, fwiw, is About sign languages, redirected to from both About American Sign Language and WT:AASE (ase is American Sign Language) as well as from WT:ASGN (sgn is the group (or whatever it's called) code for sign languages).​—msh210 (talk) 21:03, 10 November 2010 (UTC)

Moved. Mglovesfun (talk) 16:17, 25 November 2010 (UTC)

Move debate (2)


The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

Wiktionary:About Chinese languages

We don't have a Category:Chinese languages; we have a Category:Sinitic languages for that.

For that reason, I suggest moving Wiktionary:About Chinese languages to Wiktionary:About Sinitic languages. (And keeping the old name as a redirect.) --Daniel 19:05, 25 May 2011 (UTC)

Done. Nobody objected. --Daniel 02:27, 8 June 2011 (UTC)
Next time, please remember to check for double-redirects; in this case, that would be pages that redirect to Wiktionary:About Chinese languages. MediaWiki only supports one level of redirection, so once Wiktionary:About Chinese languages became a redirect to Wiktionary:About Sinitic languages, those redirects stopped working. (Don't worry, I've updated them now. Just something to remember for next time.) —RuakhTALK 03:09, 8 June 2011 (UTC)
OK, I will check for all double-redirects next time. I've fixed some double-redirects to Wiktionary:About Sinitic languages, and missed others, before your help. Thanks. --Daniel 03:58, 8 June 2011 (UTC)
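The double-redirect check described above can be sketched roughly as follows. This is only an illustration, assuming the redirects are already available as a plain mapping from page title to target title; the function name `find_double_redirects` is hypothetical, the claim that WT:AZH pointed at the old title is an assumption made for the example, and no real MediaWiki API is called.

```python
def find_double_redirects(redirects):
    """Return {page: (first_hop, second_hop)} for every page whose
    redirect target is itself a redirect."""
    doubles = {}
    for page, target in redirects.items():
        if target in redirects:
            doubles[page] = (target, redirects[target])
    return doubles


# Hypothetical sample data: after the move, the old title became a redirect,
# so anything that still points at the old title is now a double redirect.
redirects = {
    "Wiktionary:About Chinese languages": "Wiktionary:About Sinitic languages",
    "WT:AZH": "Wiktionary:About Chinese languages",  # assumed for illustration
}

broken = find_double_redirects(redirects)
for page, (first, final) in broken.items():
    print(f"{page} -> {first} -> {final}")
```

Each page reported this way would need its redirect retargeted directly to the final title.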

Banning foreign proper nouns as Mandarin

I propose making it a language policy to ban all proper names used in a Mandarin context if they are not in Hanzi, regardless of whether there are citations. Chinese speakers do write in foreign languages occasionally, but these foreign words don't become Chinese. Foreign words should be and are transliterated into Chinese characters, otherwise they should not be considered Mandarin. The complexity is not a justification for not following this rule. This is to avoid entries such as Thames河, Alps山, Alzheimer病, etc. once and for all. PRC and ROC policies both regard using names in Roman letters as incorrect, which is widely accepted. --Anatoli 05:18, 29 September 2011 (UTC)

I support this. Japanese speakers also use Latin-based foreign words in their writing occasionally, when there is a perfect katakana equivalent. Sometimes, it's done for stylistic reasons (as, very unfortunately, Western cultures are considered trendy in Asian countries), sometimes, well, some just want to show off. You can find this aspect especially in their song lyrics. Quite often the English lines don't even make sense whatsoever. Anyway, I digress. As I noted, writing in foreign scripts especially Latin-based languages is especially trendy among younger generations. Ok let me put it another way. I have seen English speakers putting words in Japanese hira or kata characters in their writing, when the same concept can be written in English perfectly. It's the result of a change in people's perception towards the Japanese (language or otherwise), which is now considered trendy and also the proliferation of Japanese learners in the past decade. Again, does it mean these words are now considered borrowed into English? If you say yes, then I have no problem with Thames河 being included in this dictionary. JamesjiaoTC 06:00, 29 September 2011 (UTC)
Re: setting up a vote (something mentioned in the BP): do you want to set up a vote that would only ban proper nouns? Or do you think common nouns like e-mail地址 should be banned, too? If so, then the vote could be broader. But your comments on RFV suggest you wouldn't delete all mixed-script entries (eg Y字). Presuming you'd like to ban e-mail地址 but not Y字, how can the vote be worded, so that it does that? - -sche (discuss) 06:03, 29 September 2011 (UTC)
@-sche, don't get me wrong, mixed scripts are perfectly normal, like the ones you listed and many more, e.g. AA制. Karaoke can only be written as 卡拉OK in Mandarin. I'm talking about proper nouns. I don't want to mislead users into believing that Oslo is Oslo市 in Chinese, even if you find examples of usage. I have seen a Chinese map of Australia on a Chinese site on the internet where only the biggest cities were translated into Chinese. A user like Engirst would start quoting the untranslated names as Mandarin, which is wrong.
@Jamesjiao, sorry you lost me, I don't know what you mean. Could you rephrase it, please?--Anatoli 06:20, 29 September 2011 (UTC)
I was just comparing the use of Japanese hiragana/katakana in English (esp. among Japanophiles) with the use of English (or other Latin-script-based languages) words in Chinese (due to trendiness, probably?). This might not be a perfect analogy, but it's a start. You will also find that people are more inclined to use Latin characters, especially for proper nouns, when using a computer keyboard (as opposed to handwriting). I also mentioned the fact that monolingual Chinese speakers wouldn't understand a mixed construction like this. JamesjiaoTC 06:45, 29 September 2011 (UTC)
Oh another thing is pronunciation. For a word to exist in a language, there has to be a way to pronounce it. I can't imagine a non-English speaking Chinese speaker trying to pronounce Thames河 even if he/she is able to recognize and even pronounce the individual letters. JamesjiaoTC 06:52, 29 September 2011 (UTC)
I definitely don't think that Kana words in English are to be considered English but I haven't seen it, that's why I couldn't understand what you mean. Yes, you're right, most Chinese speakers wouldn't have a clue how to pronounce Thames河 or Seine河, Hudson河 or Volga河. --Anatoli 09:47, 29 September 2011 (UTC)

There is not only one standard for the Chinese language. Chinese is not only for Mainland China, but also for Taiwan, Hong Kong, Macau, Singapore and overseas. For example, President Bush is written as 布什, 布殊 and Bush as well. 13:02, 30 September 2011 (UTC)

In which part of the world is the standard Chinese name for Bush "Bush"? 13:13, 30 September 2011 (UTC)
There is not only one standard. A dictionary just records the words that exist. 14:09, 30 September 2011 (UTC)

Wow, I get such a strong sense of déjà vu here... Engirst, do you have any original arguments? Your points above have been refuted. As noted elsewhere:

  1. we already have a record of Thames and a record of 河;
  2. using a term from one language in a sentence of another language may represent w:code-switching instead of borrowing;
  3. there is nothing intrinsically Chinese about Thames;
  4. the use of Thames in Thames河 is an example of an English term used as an English term in a Chinese context;
  5. the use of Thames in Thames河 is a collocation of two independent terms;
  6. as a non-idiomatic sum-of-parts phrase, Thames河 fails WT:CFI, just as yellow sweater or tasty kumquat fail WT:CFI for the same reason.

So, to extrapolate a basic list of criteria for including any word from Language A under the heading for Language B, not just proper nouns:

  1. Is the term used in Language B to convey any meaning that is different from its meaning in Language A?
  2. Alternately, is the term used widely enough in Language B that most speakers and/or readers of Language B should be expected to know and readily use the term?

Well, that's it, actually. I can't think of any other solid reasons for including a term from one language under the heading for another language. Use in Language B does not necessarily mean that the term has been adopted into that language. As soon as the term is used as Language B, i.e. where it has some meaning that is specific to that language or where it is well-known and widely used, then I am happy to advocate listing under both Language A and Language B headings. -- HTH, Eiríkr Útlendi | Tala við mig 23:04, 30 September 2011 (UTC)

Your list seems good for the vote. I suggest to add the Mandarin romanisation entries, like Thames Hé vs Tàiwùshì Hé, the former falls into the same category. --Anatoli 21:43, 2 October 2011 (UTC)
This is a very comprehensive list. Code-switching is what I had in mind, but I couldn't remember the term at the time. Code-switching occurs extremely often in Taiwan, not just between Mandarin and English, but Japanese, Korean and even their local flavour of Hokkien dialect as well. I often see short Japanese phrases like かわいいね。。。 in Taiwanese online blogs mixed in with Chinese characters. This is a very typical case of code-switching in writing. JamesjiaoTC 02:06, 5 October 2011 (UTC)
The vote to ban this kind of entries is set up here. Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters. --Anatoli 01:05, 3 October 2011 (UTC)
Not being a speaker of Mandarin or Japanese, I have a question which might help to clarify the issue for those in a similar position. Which of the following examples in English best equates to "Thames河" in Mandarin: "résumé" (a French word, wholly adopted but retaining glyphs which are not properly in the English alphabet), άλφα (a Greek word which, when used, is italicized to indicate that it is from a different language), or something completely different? I do think it might be a bit early for voting, since in all of the discussions around this topic I have only seen 5 or 6 contributors. - [The]DaveRoss 02:37, 5 October 2011 (UTC)
In answer to your question: this is like Москва#English (a foreign word, which indicates that it is from a different language by being in a different script). - -sche (discuss) 03:41, 5 October 2011 (UTC)
TheDaveRoss, it's only one user, not many (who creates/recreates them), trust me, with different IPs. The issue at hand is that this user claims that "Thames河" - English "Thames" + 河 (river) - is a Mandarin word, citing examples from books. Note that river names are always followed by 河 or other similar words in Mandarin. There are other examples where foreign names are written in Mandarin without translating, showing the foreign name in the original script. My argument is that the Chinese word for Thames is 泰晤士河 (Google Books - 3,150 hits) and there is no reason to include the SoP term Thames河; there is nothing Chinese in Thames. The rule and common practice is to transliterate/translate names of people, cities, etc., no matter how small. There are borrowings into Mandarin, and a very few also contain Roman letters (三K黨 / 三K党 Ku Klux Klan), but writing full names in Roman letters is a case of code-switching. OK#Mandarin is a common noun, not a proper name; it has become partially naturalised. Like any other language, Mandarin uses its native script to write words, using other scripts only when it absolutely has to. "London市" or "Hyde公园" are not exceptions; they are cases of code-switching (simply Chinglish). The correct and common terms are "伦敦" and "海德公园". The issue is not just Mandarin-specific. Some argue that bluetooth should be the right way to write the word in Russian. A similar situation could arise for Japanese, Russian, Hindi, Korean, Arabic and others, where people insert Roman-letter names. I believe these names don't become naturalised. I hope I expressed myself well. If a word in Roman letters becomes naturalised, then we can include it; we are still discussing pizza#Mandarin (a common word). --Anatoli 03:07, 5 October 2011 (UTC)

Pinyin with no tra or sim

Is there any sensible way to find these? I have been speedy deleting some of these; given that {{pinyin reading of}} links to the tra and sim, it seems reasonable. For example, we don't allow plurals that don't have a singular ({{plural of|xyz}} when xyz doesn't exist yet). If anyone wants to create Hanzi entries for these, then recreate the pinyin, it is with my blessing. Mglovesfun (talk) 12:27, 2 October 2011 (UTC)

I don't understand what you said. Engirst 12:40, 2 October 2011 (UTC)
He is saying that we don't allow a plural form entry for English words when the singular form does not yet exist. He is asking if that also means that we shouldn't have the pinyin form when the traditional or simplified Mandarin forms do not yet exist. He has been deleting them when he sees them. - [The]DaveRoss 02:39, 5 October 2011 (UTC)
I think Engirst considers the character entries too complex and not worth his time to create. I digress. There is in fact a vote on this: (That a pinyin entry, using the tone-marking diacritics, be allowed whenever we have an entry for a traditional-characters or simplified-characters spelling.). It doesn't, however, explicitly exclude pinyin entries when there are no character entries present. Maybe the wording can be changed to something like: That a pinyin entry, using the tone-marking diacritics, only be allowed whenever we have an entry for a traditional-characters or simplified-characters spelling. JamesjiaoTC 02:47, 5 October 2011 (UTC)
Sounds like a reasonable suggestion. There aren't enough resources for validating Romanisation entries (SoP, attestability, etc.), let alone the Chinese characters (often one version is omitted). Not sure how this can be done, but I support voting on this. Maybe Engirst will start creating some Chinese character entries before adding pinyin? (wishful thinking) --Anatoli 03:11, 5 October 2011 (UTC)
I'm OK with users adding valid pinyin (attestable / with correct tone-markings) without adding hanzi, I'm also OK with users creating valid plurals (attestable) without creating the singulars... we allow that on de.Wikt, we even have bots to create forms without regard to the presence of the lemmata, because in that way, a user who looks up the form or the pinyin will at least have a bit of information, better than nothing. Having said that, I think all of you, as the active Chinese editors, could form a consensus and agree that you interpret the vote as requiring hanzi to exist first (this is how I always interpreted the vote), and delete pinyin entries that have no hanzi form, without having a new vote. - -sche (discuss) 03:36, 5 October 2011 (UTC)
Seems like without any vote, nothing can be achieved in Mandarin space, most active Chinese editors (except for this user) all disagree with Engirst (he may now be avoiding his own user account) but changing or deleting his entries causes edit wars or someone may think he is just being bullied. --Anatoli 03:55, 5 October 2011 (UTC)
Just as another reference for comparison --
If I understand it correctly, the current policy for Japanese entries is to have the main entry with most of the information located under the kanji headword when there is one, or under the kana headword otherwise, and for the romaji (Japanese pinyin, as it were) entries to *only* serve as disambig pages pointing users to the relevant other headwords. Consequently, romaji entries should not have any "See also", "Derived terms", "Usage notes", or other headings. The kōgai entry is a good example of this in action. -- Eiríkr Útlendi | Tala við mig 04:56, 5 October 2011 (UTC)
Pinyin romanisation rules went further - parts of speech are not allowed but we do have many pinyin entries without hanzi. --Anatoli 00:23, 7 October 2011 (UTC)
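On the opening question of how to find these: one possible approach is a set comparison between the hanzi targets each pinyin entry links to and the list of existing page titles. The sketch below is only an illustration under stated assumptions: the two inputs would have to come from a dump or a bot run, the function name `pinyin_without_hanzi` is hypothetical, and the sample data is invented for the example rather than taken from the actual state of any entry.

```python
def pinyin_without_hanzi(pinyin_targets, existing_pages):
    """Return the pinyin titles none of whose hanzi targets exist as pages.

    pinyin_targets: dict mapping a pinyin entry title to the list of hanzi
                    titles it links to (traditional and/or simplified).
    existing_pages: set of titles that currently exist.
    """
    return sorted(
        title
        for title, targets in pinyin_targets.items()
        if not any(target in existing_pages for target in targets)
    )


# Invented sample data, for illustration only.
pinyin_targets = {
    "bàoyuàn": ["抱怨"],
    "Thames Hé": ["Thames河"],
}
existing_pages = {"抱怨"}

orphans = pinyin_without_hanzi(pinyin_targets, existing_pages)
print(orphans)
```

An entry is only reported when every one of its targets is missing, which matches the stricter reading of the vote discussed above.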

Are the tone-markings on these words correct?

Talk:Nèi Ménggǔ, Talk:Ménggǔ. (Other editors: feel free to list entries in this section if you doubt they have correct tone-markings. It should be helpful to have a single place to gather them for cleanup. If there is such a place already, other than the clogged WT:RFC page, please move these there.) - -sche (discuss) 12:01, 3 October 2011 (UTC)

It's Nèi Měnggǔ and Měnggǔ. --Anatoli 12:51, 3 October 2011 (UTC)
Your examples show the tone sandhi where the original third tone is pronounced as second in front of another third tone but it's usually not reflected in pinyin romanisation. --Anatoli 12:54, 3 October 2011 (UTC)
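The sandhi rule mentioned above (a third tone pronounced as second before another third tone) can be sketched like this. It is a deliberately simplified illustration: tones are assumed to be given as numbers 1 to 5 (5 for the neutral tone), the function name `apply_third_tone_sandhi` is hypothetical, and longer runs of third tones, whose realisation depends on phrasing, are handled only in the most mechanical way.

```python
def apply_third_tone_sandhi(tones):
    """Return the spoken tones for a sequence of citation tones.

    A third tone directly followed by another (citation) third tone is
    pronounced as a second tone. The written pinyin keeps the citation tones.
    """
    spoken = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            spoken[i] = 2
    return spoken


# Měnggǔ: written with tones 3-3, pronounced 2-3.
print(apply_third_tone_sandhi([3, 3]))
# Nèi Měnggǔ: written 4-3-3, pronounced 4-2-3.
print(apply_third_tone_sandhi([4, 3, 3]))
```

This is why the entry titles keep Měnggǔ even though the first syllable sounds like Méng in speech.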

Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries

Wakie-wakie, the vote is on. --Anatoli 04:50, 19 October 2011 (UTC)

Mandarin part of speech template

Templates like {{cmn-noun}} allow p for pinyin as a first parameter. This should be phased out. There's an effort to remove all pinyin from part of speech categories and have them only in Category:Mandarin pinyin and subcategories, at some point the templates will have to follow suit, though we're months away from being ready. So this is a heads up. --Mglovesfun (talk) 21:22, 10 November 2011 (UTC)

But this parameter serves the same purpose as tr (transliteration), and the hyperlink lets one see whether there are other hanzi with the same pinyin. I have no strong opinion on your suggestion at the moment.
I've been checking your list at User:MglovesfunBot/cmn-parts-of-speech-Latn, as you have noticed. It's quite big and very time-consuming, so I'm inviting other Sinophone editors to join the effort. If the entry's hanzi are red-linked, it can be deleted rather than converted. Sometimes I also leave entries if they only have a Japanese but no Mandarin entry (planning to add them later). --Anatoli 21:54, 10 November 2011 (UTC)
{{cmn-noun|p}} is used for Mandarin nouns in the Latin script. Since we no longer use {{cmn-noun}} (cmn-adj, adv, abbr, etc.) for pinyin entries, that use should be phased out. Something like {{cmn-noun|ts|pin=fú}} will still work! --Mglovesfun (talk) 18:37, 11 November 2011 (UTC)
I misunderstood, sorry, I was thinking about pin parameter. Can you give an example, please? --Anatoli 21:47, 13 November 2011 (UTC)

Category:Mandarin Wade-Giles

Err, when did we approve Wade-Giles transliterations for inclusion? I can kinda understand Pinyin, but this? -- Liliana 15:23, 10 December 2011 (UTC)

Thanks. Binned. --Anatoli (обсудить) 03:20, 11 December 2011 (UTC)

Audio files

See commons:Commons:Village_pump#Category:Chinese_pronunciation. Mglovesfun (talk) 13:00, 19 February 2012 (UTC)

I have some doubts about your request. The main reason is the many homophones, and the request should also specify whether we want jiantizi, fantizi or both (there are variant characters too). The conversion is far from straightforward. Perhaps using audio files based on toned pinyin was the right choice, even if it's more complicated to use bots to add audio files to hanzi entries. I see some of the audio entries are missing tone marks. --Anatoli (обсудить) 21:43, 19 February 2012 (UTC)
I think the audio files should stay at the pinyin filenames, because if I am not mistaken, multiple characters with the same pinyin romanization X have the same pronunciation. Giving the file a pinyin filename allows it to be uploaded to all characters that have pinyin X. It seems easier to write a bot to do that, than to host the same file under dozens of names. - -sche (discuss) 21:48, 19 February 2012 (UTC)


Note: the title of this section was previously {{Commonsrad}}.

Sarang (talk • contribs) has created {{Commonsrad}}, and would like me to run a bot that will add it to all entries and indices for radicals (e.g. 一 and Index:Chinese radical/一). Does everyone agree that this should be done? —RuakhTALK 15:38, 11 April 2012 (UTC)

If it's going to be bot-added, there is no harm in giving it a clearer name first. Maybe {{Commons radical}}? —CodeCat 16:29, 11 April 2012 (UTC)
If Commonsrad does not seem clear enough, I have no objection to giving the name 5 more bytes; the 2000 bytes of extra data space won't matter either. I chose a name close to {{Commonscat}} because it is very similar to it. In fact, Commonsrad can be called a variation of Commonscat, but with a display better suited to its usage and the possibility of easy expansion whenever wanted. If it may then be used to link non-radical Chinese glyph Wiktionary pages to their Commons categories, Commonsrad is not as misleading as a clearer descriptive name like {{commons radical}}. -- sarang사랑 18:09, 11 April 2012 (UTC)
It seems to be the custom at Wiktionary to have template names with lower-case initials (with upper-case redirects)? Another question to decide! -- sarang사랑 05:48, 12 April 2012 (UTC)
I'm not exactly sure from the description what the template will do, but it looks harmless enough. -- A-cai (talk) 22:26, 17 April 2012 (UTC)
Template has been moved to {{commonsrad}}, hence the red links above. Mglovesfun (talk) 22:28, 17 April 2012 (UTC)


I'm not really active in this project, but I did add a new appendix, Appendix:Baxter-Sagart Old Chinese reconstruction. It's referenced, and the table data is programmatically generated from the reference data with a program whose source code I also made available. I hope this in some way can be of help. - Gilgamesh (talk) 22:57, 31 May 2012 (UTC)


If someone knowledgeable could check that the pronunciation and pinyin of [[葡文]] are correct, it would be appreciated. :) - -sche (discuss) 21:00, 26 December 2012 (UTC)

Thanks for that. Does "(written)" mean that 葡文 refers to written Portuguese, or that 葡文 is {{literary|lang=cmn}} and mostly used in written Chinese and not in spoken Chinese? - -sche (discuss) 21:52, 26 December 2012 (UTC)
It refers to written Portuguese (normally). 葡萄牙語 / 葡萄牙语 (Pútáoyá yǔ) and 葡萄牙文 (Pútáoyá wén) are more common words. The suffix 語 / 语 (yǔ) more commonly refers to the spoken language and 文 (wén) to the written language. --Anatoli (обсудить/вклад) 22:34, 26 December 2012 (UTC)
Ah, interesting! - -sche (discuss) 00:01, 27 December 2012 (UTC)

, 𡰪

The transliteration and four-corner number, respectively, of these characters were tagged {{fact}}; can anyone verify them? They and the Japanese character (the On-reading of which has been questioned) are the last remaining Han characters tagged {{fact}}. - -sche (discuss) 00:01, 27 December 2012 (UTC)

Toneless pinyin usage notes

Currently, our toneless pinyin entries all have a usage note at the bottom which says:

  • English transcriptions of Chinese speech often fail to distinguish between the critical tonal differences employed in the Chinese language, using words such as this one without the appropriate indication of tone.

I don't have much of a problem with it (although maybe "Chinese" should be changed to "Mandarin"), but I realized that if we do want to change it, it will be somewhat difficult, and some of them may be edited and fall out of synch. To solve that, I propose that we create a template called {{cmn-toneless-note}} or something similar and ask an editor with an AWB account to change all instances of the text into a template call. What do you guys think? —Μετάknowledgediscuss/deeds 19:13, 6 January 2013 (UTC)

Support. - -sche (discuss) 19:43, 6 January 2013 (UTC)
Support. Also, "using words" should probably be "writing syllables". (We don't have toneless-pinyin entries for whole words, only for individual syllables.) —RuakhTALK 20:28, 6 January 2013 (UTC)
Well... sort of. On one hand, you are correct that this is only used for specific syllables, but OTOH the syllables are words, in the loose Chinese way of looking at what constitutes a word. (One Chinese man was trying arduously to convince me that all words in Mandarin are one syllable long. I was unsuccessful in my attempts to get him to revise his native definition of what a word is to the Western linguistic concept.) Incidentally, the entries (like nu#Mandarin) also point to forms like nǚ, which not only is marked for tone but also has a different vowel, and perhaps the note should reflect that. (Of course, I'm not sure how useful that is anyway, because when my friends don't have access to the character ü, they type nv3, not the equally inaccessible diacritic form.) —Μετάknowledgediscuss/deeds 21:13, 6 January 2013 (UTC)
Well, if our goal were to conform to "the loose Chinese way of looking at" their languages, then we'd treat all of them as dialects of a single language. It isn't, so we don't. By most linguistically-well-informed accounts, the vast majority of Mandarin words are bisyllabic. —RuakhTALK 22:50, 6 January 2013 (UTC)
I personally find your comment rather arrogant and disparaging. 04:36, 10 January 2013 (UTC)
I don't find it arrogant, but one needs to know that Chinese (also Vietnamese, Thai, etc.) is traditionally called monosyllabic, as all or almost all polysyllabic words are made of component words. The exceptions are phonetic transcriptions and characters that have lost their meaning over time, but that is less the case with Mandarin. --Anatoli (обсудить/вклад) 04:44, 10 January 2013 (UTC)
I was referring to the "dialect/language" comment, where he regarded "we" as identical to himself in having the personal stance of considering "Chinese is not a single language" to be false. It is a language, by Wikipedia at least. 05:04, 10 January 2013 (UTC)
Views on this differ, but I agree that Chinese topolects are more like dialects than separate languages. Even if they may not be mutually comprehensible when spoken and are quite different on the written level, they are often closer than dialects of other languages (provided they are written the Chinese way, using hanzi, not Roman, Cyrillic, Arabic or other scripts). Wiktionary treats Chinese topolects differently as per language headers, but translations are all nested under "Chinese", e.g. Chinese/Mandarin, Chinese/Cantonese, etc. --Anatoli (обсудить/вклад) 05:13, 10 January 2013 (UTC)

Please note that full words in toneless pinyin were explicitly forbidden by votes and almost unanimous agreements, it happened before Metaknowledge became active. --Anatoli (обсудить/вклад) 22:54, 6 January 2013 (UTC)

So do you support this? —Μετάknowledgediscuss/deeds 06:03, 9 January 2013 (UTC)
Yes, Support. --Anatoli (обсудить/вклад) 04:31, 10 January 2013 (UTC)
Erm... so do any of you AWBers/botters want to actually do it? —Μετάknowledgediscuss/deeds 04:59, 10 January 2013 (UTC)
Delete all pinyin, whether toned or not. Move it to Appendix at least. It is merely a transcription scheme, not even official orthography. 05:06, 10 January 2013 (UTC)
It doesn't work this way. IP users (anonymous) with no or little contributions have little influence and structure is decided after discussions, votes, etc. Entries in Category:Mandarin pinyin do not claim they are proper writing, they are a helpful tool for users to help them find hanzi entries. They have limited information, all information is contained in hanzi entries. Compare bàoyuàn and 抱怨. --Anatoli (обсудить/вклад) 05:19, 10 January 2013 (UTC)
I knew they contain limited information. Still, they should not exist in the main namespace. This is a dictionary, much more specific than a "tool". The search function is sufficient in directing users to character entries for polysyllabics. With the monosyllabics a link to an Appendix page is all that is necessary. Keeping everything in the main namespace is unworthily energy-consuming. 06:40, 10 January 2013 (UTC)
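The replacement proposed at the top of this section (swapping the hard-coded usage note for a template call) amounts to a plain string substitution. The sketch below is only an illustration: {{cmn-toneless-note}} is the name floated in the proposal and may not exist, the function name `replace_note` is hypothetical, and a real AWB or bot run would also need to cope with entries where the note text has already drifted.

```python
NOTE_TEXT = (
    "English transcriptions of Chinese speech often fail to distinguish "
    "between the critical tonal differences employed in the Chinese language, "
    "using words such as this one without the appropriate indication of tone."
)
TEMPLATE_CALL = "{{cmn-toneless-note}}"  # proposed name, assumed here


def replace_note(wikitext):
    """Replace the literal usage note with the template call.

    Returns (new_wikitext, changed) so the caller can skip saving pages
    where nothing matched.
    """
    new_text = wikitext.replace(NOTE_TEXT, TEMPLATE_CALL)
    return new_text, new_text != wikitext


# Hypothetical page fragment for illustration.
page = "====Usage notes====\n* " + NOTE_TEXT + "\n"
updated, changed = replace_note(page)
print(updated)
```

Pages where `changed` is false but a usage-notes section exists would be the ones that have fallen out of sync and need a manual look.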

Proposal to change topical categories for Mandarin to match other languages, sort by pinyin, not radical

See Wiktionary:Beer_parlour/2013/April#Some small changes to Mandarin (also Cantonese, Min Nan) entry structure and about topic categories - suggestion. --Anatoli (обсудить/вклад) 00:20, 11 April 2013 (UTC)

Chinese entries with vowelless pronunciations[edit]

The pronunciation transcriptions in the following entries do not list vowels, though I suspect they should:

  1. 妒嫉
  2. 积累
  3. 積累
  4. 妓男
  5. 喊叫
  6. 水汽
  7. 冷靜
  8. 坚固
  9. 堅固
  10. 记住
  11. 記住
  12. 评价
  13. 評價
  14. 相机
  15. 相機
  16. 即将
  17. 即將
  18. 前门
  19. 前門
  20. 经历
  21. 經歷
  22. 金牌
  23. 决不
  24. 決不
  25. 绝不
  26. 絕不
  27. 告罄
  28. 模具
  29. 顺其自然
  30. 順其自然
  31. 布拉吉

- -sche (discuss) 23:22, 23 May 2013 (UTC)

How did you find them, at random, or do you have a script for that? User:Tooironic used to add IPA but he is less active now. User:Wyang has developed an entry creation template - Template:cmn new, which also generates the IPA, so for 积累, the IPA is /t͡ɕi⁵⁵ leɪ̯²¹⁴⁻²¹⁽⁴⁾/. My preference is to delete the IPA altogether (replace it with {{rfp}}), rather than showing the wrong info. --Anatoli (обсудить/вклад) 23:36, 23 May 2013 (UTC)
I found them at random(ish). I used WP:AWB to find entries containing deprecated IPA characters, and happened to notice that in addition to containing deprecated characters, all of these entries also lacked vowels. - -sche (discuss) 23:47, 23 May 2013 (UTC)
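A check like the one described above (spotting transcriptions that contain no vowel at all) could be sketched roughly as follows. This is only an illustration in Python, not the AWB search that was actually used; the vowel inventory, the function names and the second sample transcription are all made up for the example, and syllabic consonants or rhotacised vowels would need extra handling.

```python
import unicodedata

# Assumed (not exhaustive) inventory of IPA vowel letters.
IPA_VOWELS = set("aeiouyɨʉɯɪʏʊøɘɵɤəɛœɜɞʌɔæɐɶɑɒ")


def has_vowel(ipa: str) -> bool:
    """Return True if the transcription contains at least one IPA vowel letter."""
    # NFD splits precomposed letters (e.g. "ã") into base letter + diacritic,
    # so the base vowel can still be recognised.
    decomposed = unicodedata.normalize("NFD", ipa)
    return any(ch in IPA_VOWELS for ch in decomposed)


def vowelless(entries: dict) -> list:
    """Return the titles whose transcription has no vowel letter at all."""
    return [title for title, ipa in entries.items() if not has_vowel(ipa)]


# The first transcription is the one quoted in this thread; the second is a
# made-up vowelless string used purely to exercise the check.
sample = {
    "积累": "/t͡ɕi⁵⁵ leɪ̯²¹⁴⁻²¹⁽⁴⁾/",
    "hypothetical entry": "/t͡ɕ⁵⁵ l²¹⁴/",
}
flagged = vowelless(sample)
```

Here `flagged` would contain only the hypothetical entry; anything flagged this way would still need a manual look before being replaced with {{rfp}}.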

Unified Chinese vote[edit]

Wiktionary:Votes/pl-2014-04/Unified Chinese is starting tomorrow. --Anatoli (обсудить/вклад) 00:45, 28 March 2014 (UTC)

Capitalisation of demonyms and language names - a mini-vote[edit]


@Tooironic, @Jamesjiao, @Kc kennylau, @Wyang

Demonyms and language names are common nouns in Chinese. I suggest using lower case for pinyin and no space, even if dictionaries are inconsistent. Please vote below and invite anyone who might be interested. So, for example: 中國人/中国人 (zhōngguórén) - zhōngguórén, 中文 (zhōngwén) - zhōngwén, not Zhōngguórén/Zhōngguó rén and Zhōngwén.

Rationale: they are nouns, automatic pinyin generation produces them in lower case, and Japanese has already implemented this.

  1. Support Use lower case, common nouns (not proper nouns), spell pinyin without a space for most demonyms and language names --Anatoli (обсудить/вклад) 00:45, 8 May 2014 (UTC)
  1. Oppose The official instruction is to use capital letters and spaces. See w:Pinyin#Capitalization and word formation. --kc_kennylau (talk) 09:00, 8 May 2014 (UTC)
    I don't mean place or personal names. It's about languages and demonyms--Anatoli (обсудить/вклад) 09:08, 8 May 2014 (UTC)
    They're just names anyways. Do you capitalize the word English? --kc_kennylau (talk) 09:56, 8 May 2014 (UTC)
    I do, in English, but nihongo or nihonjin is not capitalised. Russian and Finnish don't capitalise those. French only capitalises demonyms, not languages. It can go both ways with language names and demonyms; dictionaries do it one way or the other. That's why we are having this discussion. --Anatoli (обсудить/вклад) 10:20, 8 May 2014 (UTC)
    Okay, please find me examples of both cases, and I'll switch to abstain (I'm so lazy). --kc_kennylau (talk) 10:28, 8 May 2014 (UTC)
  1. Don't really have any preference for this as I am generally not interested in Pinyin. Wyang (talk) 01:02, 8 May 2014 (UTC)
    What about proper vs common nouns. Is 普通话 or 美国人 a common or a proper noun? --Anatoli (обсудить/вклад) 09:08, 8 May 2014 (UTC)

--Anatoli (обсудить/вклад) 00:45, 8 May 2014 (UTC)


I ran into a capitalized pinyin entry today: Lai2 (linked to from ). Capitalized tone-number pinyin like that does look weird to me. Capitalized diacritical pinyin looks less weird. - -sche (discuss) 20:11, 31 May 2014 (UTC)

Capitalisation and part of speech of month names[edit]

Related to the preceding topic, capitalisation and part of speech of month names is being discussed at User talk:LlywelynII#Chinese_months_as_proper_nouns. - -sche (discuss) 18:23, 30 August 2016 (UTC)

Transliterations of months should definitely be lower case and common nouns in Chinese. --Anatoli T. (обсудить/вклад) 23:25, 26 September 2016 (UTC)

Single-character entry format[edit]

@Suzukaze-c Hi. Are you willing and interested in expanding the Wiktionary:About_Chinese#Entry_format section regarding single-character entries, the use of Definitions header and the need for {{zh-hanzi}}, parameters for |cat= in {{zh-pron}}? --Anatoli T. (обсудить/вклад) 23:18, 26 September 2016 (UTC)

I am for most of it, but what about the blurry area of "what belongs on Template:zh-pron/documentation" and "what belongs on Wiktionary:About Chinese"? (For example, how much do we write about |cat= on each page, etc.) —suzukaze (tc) 23:38, 26 September 2016 (UTC)
This is a policy document. That's the difference. It's OK to duplicate a bit and link to Template:zh-pron/documentation for detail. --Anatoli T. (обсудить/вклад) 23:47, 26 September 2016 (UTC)
@Atitarev Wiktionary:About_Chinese#Entries_for_single_characters. What do you think? —suzukaze (tc) 03:49, 27 September 2016 (UTC)
Looks good, thank you! The document can be tweaked over time but it's a good start.--Anatoli T. (обсудить/вклад) 03:52, 27 September 2016 (UTC)

Obsolete policies on Middle and Old Chinese[edit]

==Historical languages==
{{wikipedia|History of the Chinese language}}
{{wikipedia|Historical Chinese phonology}}

Historical Sinitic languages include the spoken languages {{w|Middle Chinese}} (ltc) and {{w|Old Chinese}} (och), the written language {{w|Literary Chinese}} (lzh), and the protolanguage {{w|Proto-Sino-Tibetan}}. Entries for words in these languages are used, except for Proto-Sino-Tibetan, which is a protolanguage and thus in the Reconstruction namespace. These terms can also appear in etymologies for entries in modern Sinitic languages, and in entries for languages that have borrowed from Chinese, notably Japanese, Korean, and Vietnamese.

Finer distinctions are possible, such as Late Middle Chinese and Early Middle Chinese for the spoken language, and Literary Chinese versus earlier Classical Chinese for the written language. These distinctions can be made in the text of etymologies, but these do not have ISO 639 codes, and thus are not used for level 2 headings.

The precise meaning and status of these “languages” is complicated: narrowly speaking “Middle Chinese” and “Old Chinese” refer to various phonological reconstructions, notably based on rime dictionaries, and do not necessarily refer to a specific historical dialect or common language. Nevertheless, they are useful designations for historical periods.

Most modern Sinitic languages descend from Middle Chinese, with the notable exception of Min, which diverged earlier, with Proto-Min also descending from Old Chinese; see [[w:Historical Chinese phonology#Branching off of the modern varieties|branching of modern varieties of Chinese]]. A notable example of this difference is {{m|zh|茶}}: English {{m|en|tea}} comes from the Min form, while {{m|en|chai}} comes from other Chinese varieties.

Literary Chinese is significantly different from the spoken languages; this may be compared with Medieval Latin versus Romance languages. Literary Chinese (lzh) is the correct source language for literary terms in modern Sinitic languages, notably {{w|chengyu}} ({{w|four-character idiom}}s), and in borrowings such as the corresponding Japanese {{w|yojijukugo}}.

===Middle Chinese===
{{wikipedia|Middle Chinese}}

As Middle Chinese phonology is not attested (it is only reconstructed), please be sure to mark pronunciations with *.

===Old Chinese===
{{wikipedia|Old Chinese}}
{{wikipedia|Old Chinese phonology}}

As {{w|Old Chinese phonology}} is not attested (it is only reconstructed), please be sure to mark pronunciations with *. As sources differ, please carefully cite specific references (author and year) for any reconstructions.

Obsolete policies on cognates and stubs[edit]

==Cognates and stubs==
Across Sinitic languages, a single written form is very frequently shared across a long historical period and wide geographical area. Thus cognate entries in different languages appear on the same page; this occurs quite frequently for cognates in closely related languages in other scripts, but to nowhere near the same degree as in Sinitic languages. Due to this, it is generally unhelpful, and possibly incorrect, to create an entry for one Sinitic language simply by copying the heading and definitions for Mandarin. It is unhelpful because this adds no information beyond what a reader could guess themselves (cognate, so probably the same meaning), and possibly incorrect because words do differ between these languages; blindly copying without a reference is not reliable.

Thus, when creating a new Sinitic entry, please try to add ''some'' information distinctive to the particular language, particularly pronunciation, references, or citations.

For etymologies, each entry should include an Etymology section indicating its immediate ancestor term. For native words in modern Sinitic languages this is either Middle Chinese (most) or Proto-Min (thence Old Chinese) for Min languages. Per usual practice (see [[Wiktionary:Etymology]]), it is acceptable to include full etymologies back to Proto-Sino-Tibetan in modern entries. However, unless there is something specific to the etymology of a term in a given language, this is tedious to repeat for all modern languages. It is thus preferred (and sufficient) to only include the full history at representative languages, namely Mandarin and Min Nan (most used in each branch), with other languages just indicating the immediate predecessor and having a link reading “more at Mandarin/Min Nan”.

Similarly, it is tedious and not helpful to list contemporary cognate terms ''unless'' some particular relationship or contrast is being given. Instead, ancestral relationships can be given both backwards (in the Etymology section), to Middle Chinese, Old Chinese, and Proto-Sino-Tibetan, and forwards (in the Descendants section), from Middle Chinese, Old Chinese, and Proto-Sino-Tibetan to later forms. In these Descendants sections, listing pronunciations of descendant terms along with the spelling allows easy comparison, and avoids the duplication of the same listing in all modern forms. These are more useful than sibling relationships between cognates.

New font for Chinese?[edit]

@Justinrleung, Suzukaze-c Is it just me who feels the font for Chinese is not as pretty as Japanese? I updated my Mac and it has become even uglier. It lacks the 'weight' (is this the correct term?) in comparison. For example, - even the cangjie input looks prettier than the Chinese font. Thoughts? (Disclaimer: I know nothing about fonts...) Wyang (talk) 06:25, 12 October 2016 (UTC)

I don't know what it looks like on a Mac, nor what fonts are available on a Mac... —suzukaze (tc) 06:34, 12 October 2016 (UTC)
Some screenshots: Hani, Hant and Hans. The Hani font can perhaps be improved... edit: screenshot of the zh-ja comparison. Wyang (talk) 06:44, 12 October 2016 (UTC)
What does this look like? —suzukaze (tc) 07:20, 12 October 2016 (UTC)
It looks like this. I feel that all the ones below are more aesthetically pleasing. Wyang (talk) 07:25, 12 October 2016 (UTC)
Here are how they look on my Mac on three browsers. I think I've changed my browser's font settings, so I don't have the problem that you have. — justin(r)leung (t...) | c=› } 07:36, 12 October 2016 (UTC)
It looks like the browser default (and thus probably the best choice) is "PingFang SC". SimSun seems to be imposed on readers by MediaWiki:Common.css. (man, there are some questionable font choices there...) —suzukaze (tc) 07:41, 12 October 2016 (UTC)
Code2000? That's the last font I (or anyone) would want for Chinese. And why would the generic sans-serif be put first? — justin(r)leung (t...) | c=› } 07:47, 12 October 2016 (UTC)
(holy shit someone shares my hatred for code2000) It's also weird how the fonts for .Hans and .Hant are defined a second time later on. —suzukaze (tc) 07:48, 12 October 2016 (UTC)
I think it may have been me who messed it up before (羞慚). Any recommendations on what the
/* Chinese (Han) */
block should be changed to? Wyang (talk) 08:32, 12 October 2016 (UTC)
Maybe this:
/* Chinese (Han) */

/* Hani: generic */
/* Hans: simplified */
/* Hant: traditional */

.Hans {
	font-family: PingFang SC, Heiti SC, DengXian, Microsoft Yahei, SimHei, Source Han Sans CN, Noto Sans CJK SC, SimSun, NSimSun, SimSun-ExtB, Song, sans-serif;
}
.Hant {
	font-family: PingFang TC, Heiti TC, Microsoft Jhenghei, Source Han Sans TW, Noto Sans CJK TC, PMingLiU, PMingLiU-ExtB, MingLiU, MingLiU-ExtB, Ming, sans-serif;
}

.Hant {
	font-size: 1.2em;
}

.Hani, .Hani *,
.Hans, .Hans *,
.Hant, .Hant * {
	font-style: normal;
	font-weight: normal;
}

big.Hani, strong.Hani, b.Hani, b .Hani,
big.Hans, strong.Hans, b.Hans, b .Hans,
big.Hant, strong.Hant, b.Hant, b .Hant {
	font-size: 137%;
}

.Hani b,
.Hans b,
.Hant b {
	font-size: 125%;
}
suzukaze (tc) 01:58, 13 October 2016 (UTC)
Ooohhh, I like this. It definitely looks better and more solid than before. If no one objects, we will change it to this until someone proposes an improvement. Wyang (talk) 07:49, 13 October 2016 (UTC)

Simplified Chinese in all templates and modules[edit]

@Wyang, Justinrleung, Suzukaze-c, Tooironic, Kc kennylau, Bumm13

I think we should stick to the promise of providing simplified Chinese in all templates, modules. The dialectal data tables currently don't show simplified forms. Do people think we need to cater for that? I understand this will be formatting and other work involved but simplified Chinese users shouldn't feel neglected. --Anatoli T. (обсудить/вклад) 09:31, 14 October 2016 (UTC)

Yeah, it is disabled for now. Displaying both made the table look very cluttered. I was thinking about developing a js switch for all Chinese entries, allowing the user to choose trad/simp in all Chinese texts (zh-l, zh-x, zh-der, zh-dial, etc.). Wyang (talk) 11:01, 14 October 2016 (UTC)
But that will only work for registered users. How about we have the simplified characters display as ruby, like this: , ? (We might want to increase the size of the ruby.) — justin(r)leung (t...) | c=› } 16:36, 14 October 2016 (UTC)
The switch may be a dropdown underneath the ==Chinese== header, similar to how this page hides the romanisation on a click. The Ruby method is potentially good too, if we can increase the size and align them well, though making links may be more complicated. I think User:Suzukaze-c was trying to write some sort of gadget for this some time ago, but I can't find it now. Wyang (talk) 21:21, 14 October 2016 (UTC)
Why not just display 我們我们 with a suppressed romanisation? The columns may need to get wider and care should be taken to have correct conversions with the ability to override. What does everybody think? --Anatoli T. (обсудить/вклад) 02:53, 15 October 2016 (UTC)
I support the idea of showing simplified Chinese wherever possible and when it doesn't look cluttered. —suzukaze (tc) 05:24, 29 October 2016 (UTC)


Ranked first (+4089) when sorted by change in #gloss definitions. Wyang (talk) 03:51, 7 November 2016 (UTC)

Still going strong - number one (+3738) in November 2016. Wyang (talk) 16:44, 13 December 2016 (UTC)
First again (+3450). 再接再厲! (壓力山大) Wyang (talk) 05:33, 4 February 2017 (UTC)
First again (+4868). 再接再厲! 奔向100000個詞。 Wyang (talk) 12:26, 9 April 2017 (UTC)


Are we having entries like 印第安納州, or do we treat them like 上海市 (redirect to 上海?) —suzukaze (tc) 08:16, 11 November 2016 (UTC)

I'd say nah, unless it's an abbreviation, like 安省. Wyang (talk) 09:14, 11 November 2016 (UTC)

Definitions format overhaul[edit]

Hi all. I'm thinking about overhauling the format of Chinese definitions, by using a templated approach which strictly associates word information (part of speech, synonyms, antonyms, measure words, examples, dialectal equivalents, etc.) with the individual senses. It may be along the lines of User:Wyang/zh-def. I think this is more conducive to the efficient expansion of the Chinese content with more synonyms, antonyms, ... etc. information. What does everyone think about the changes? Wyang (talk) 09:16, 23 November 2016 (UTC)

@Suzukaze-c, Justinrleung, Atitarev, Tooironic, Hongthay, Mar vin kaiser Wyang (talk) 10:20, 23 November 2016 (UTC)

+1, very attractive, but I fear it's too radically different from the standard entry format. —suzukaze (tc) 09:35, 23 November 2016 (UTC)
I think the formatting of Chinese definitions should match the formatting of definitions for other languages. —Granger (talk · contribs) 12:17, 23 November 2016 (UTC)
It looks great, and I know you've worked hard on this, but here are potential problems I see:
  1. Like the others have said, it's too different from other languages.
  2. The wikicode would probably be harder to pick up for new editors. (It'll take me some time to get used to.)
  3. There's a bit of repetition, like putting |pos=part multiple times for 的. Is that something we really want to do?
  4. It would probably take up more Lua memory, which would not be necessary if we keep the current format. — justin(r)leung (t...) | c=› } 13:31, 23 November 2016 (UTC)
Thanks guys. It is a big change, but my feeling is that this sort of sense-synonym/antonym/... integration has to be done sooner or later; there were some calls before (for example User:DTLHS/export, which was referenced in this layout), but no one has really tested doing it. The reason for the integration is that synonyms etc. are only valid on a sense-specific basis, the same as classifiers (which has already been adapted to be sense-specific) and dialectal equivalents. Moedict and Cantodict also do the same.
The code can probably be simplified, such as by switching pos to argument 1 and definition to argument 2. The enclosing zh-def template may be omittable too, if we can automatically generate the <ol> ~ </ol> using some css magic. If a JavaScript gadget could be designed to allow GUI editing of the individual senses while the raw code remains unchanged, that would be fantastic. The increase in Lua memory usage seems quite small - I tested with the equivalent current code, which was 18.96 MB, slightly smaller than the new version (19.62 MB). A good thing about enclosing senses is that sense ids can be created and used to reference individual senses elsewhere. Bot conversion of the definitions should be reasonably straightforward too. Wyang (talk) 14:24, 23 November 2016 (UTC)
I removed the need for the outer enclosing template, and integrated all the code into a single template. It looks like this:
|syn: 食糖
|ant: 鹽
|x1: {{zh-x|糖尿病|[[diabetes]]}}
|x2: {{zh-x|糖{tong4}水|[[sugar water]]|C}}
|n|[[candy]]; [[sweets]]
|mw: m:塊-“piece”,c:嚿-“piece”
|syn: 糖果
|x1: {{zh-x|棒棒糖|lollipop|C}}
|x2: {{zh-x|糖 食 得 多 冇益。|Eating too much '''candy''' is unhealthy.|C}}
|lb: organic chemistry
|x1: {{zh-x|多糖|polysaccharide}}
where the different senses are separated by |-, and the effect is the same. It should be easier to use now. The memory requirement is slightly reduced in the process: 18.97 MB, nearly the same as the current format (18.96 MB). Wyang (talk) 23:18, 23 November 2016 (UTC)
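To make the structure described above easier to picture (senses separated by |-, a positional "pos|definition" line, and keyed lines such as syn:, ant:, x1:), here is a rough Python sketch that groups such lines into one record per sense. It is only an illustration under assumptions: the real template would be implemented in wikitext and Lua, the function name parse_senses and the "pos"/"def" field names are invented here, and the sample input is put together from lines quoted in this thread.

```python
import re

# A keyed line looks like "syn: 糖果" or "x1: {{zh-x|...}}" once the leading
# pipe is removed; anything else is treated as a positional "pos|definition".
KEYED = re.compile(r"^([a-z]+\d*):\s*(.*)$")


def parse_senses(wikitext: str) -> list:
    """Group pipe-prefixed lines into one dict per sense, split on '|-' lines."""
    senses = [{}]
    for raw in wikitext.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a template argument line
        body = line[1:]
        if body == "-":
            senses.append({})  # sense separator: start a new record
            continue
        match = KEYED.match(body)
        if match:
            senses[-1][match.group(1)] = match.group(2)
        else:
            pos, _, definition = body.partition("|")
            senses[-1]["pos"] = pos
            senses[-1]["def"] = definition
    return [sense for sense in senses if sense]


sample = """|n|[[candy]]; [[sweets]]
|syn: 糖果
|-
|lb: organic chemistry
|x1: {{zh-x|多糖|polysaccharide}}"""
parsed = parse_senses(sample)
```

With records like these, each synonym, label or example stays attached to its own sense, which is the point of the proposal; a bot conversion would go in the opposite direction, from the current layout to this one.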
@Atitarev, Tooironic, Hongthay, Mar vin kaiser: Perhaps the previous ping didn't work correctly. Wyang (talk) 09:32, 24 November 2016 (UTC)
Hi. Sorry, I got the ping but I'm a bit confused. It's a good effort, but I agree that this is a radical change from the current format and, again, too different from other languages. Displaying PoS's in front of definitions (translations) is definitely worth considering. --Anatoli T. (обсудить/вклад) 09:38, 24 November 2016 (UTC)
Thanks Anatoli. I accidentally discovered something I wrote > 2 yrs ago on Talk:一致, and it seems my desire to change the format has been long-standing... The division of definitions by part of speech is really not ideal for analytic and less inflecting languages (努力, 保險, 可能). IMO treating synonyms, antonyms and so forth as belonging to senses is also important, as we add more and more of these see-also-type words. At the moment 生 (shēng) looks fairly neat (albeit not as clear as if the PoS info were next to the senses), but if I add synonyms, antonyms, see-also terms as in User:Wyang/zh-def#生, the page could become quite confusing. Wyang (talk) 11:22, 24 November 2016 (UTC)
  • As already mentioned above, these are radical changes you are proposing. Since they go to the heart of Wiktionary's layout, you'd be better off seeing if you can carry them out by getting support from other members of the community for ALL languages, not just Chinese. ---> Tooironic (talk) 12:49, 24 November 2016 (UTC)
  • I really have no faith in the Wiktionary community in this. Haiz.
    If we, as the Chinese-editing community, believe that a current practice is unfittingly designed for Chinese, we should strive to achieve what we think is most suitable. I myself only have limited power in making a difference. It's like the opposition to {{zh-pron}} formatting and the Chinese merger before; other people are unfamiliar with this, so unless we adopt what is right, we won't progress efficiently. Wyang (talk) 11:58, 25 November 2016 (UTC)


I'd like to add entries for Sichuanese, but it appears that it doesn't have an ISO code, so one would have to be created. I don't think it should be included under the Mandarin section for zh-pron and other places, due to the differences between the two (47.8% lexical similarity and < 60% intelligibility) and also the sheer number of potential listings that could be under Mandarin (i.e. Shandong, Shaanxi, Dongbei etc.). Maybe listing under Southwest Mandarin would be okay though. Most of the coverage would probably be on the Chengdu dialect, but I'm not sure how other dialects, some of which are quite different, would be accounted for.--Prisencolin (talk) 02:13, 10 December 2016 (UTC)

(TBH it's currently impossible to nest it under Mandarin with the current zh-pron code —suzukaze (tc) 03:53, 10 December 2016 (UTC))
It is a variety of Mandarin though - it would make more sense to group it under Mandarin and reorganise the Standard Mandarin tags accordingly. Wyang (talk) 16:44, 13 December 2016 (UTC)
Sichuanese should definitely be nested under Mandarin but I don't have the guts to modify Module:cmn-pron. —suzukaze (tc) 12:47, 14 December 2016 (UTC)
I'm not contesting that it's part of the Mandarin branch, I'm just concerned that at some point we might have over a dozen different entries under "Mandarin" that could appear and it might be a bit disorganized. Why don't we just create a new module for "Southwest Mandarin" and put it under that? It still has "Mandarin" in the name after all. It also has the benefit of being able to group related varieties together in specific subcategories.--Prisencolin (talk) 04:55, 15 December 2016 (UTC)

'The body of this page needs to be updated to explain the new policy'[edit]

Hi, regarding the message reading 'The body of this page needs to be updated to explain the new policy.', I'd like to know when the update is going to be carried out, or at least where I can read the new policy. Thanks in advance. --Backinstadiums (talk) 15:48, 8 June 2017 (UTC)

@Backinstadiums: I think it's pretty much up to date now. @Suzukaze-c, Atitarev, Wyang, is there any old policy still lingering around on the page? Can we remove that notice? — justin(r)leung (t...) | c=› } 16:02, 8 June 2017 (UTC)
Yes, the notice can go now. I put it there after we moved to the unified Chinese L2 header but the policy described the old standards. Now it's matching what we are doing.--Anatoli T. (обсудить/вклад) 22:15, 8 June 2017 (UTC)
Agreed. Wyang (talk) 23:01, 8 June 2017 (UTC)
The Etymology section is still outdated (I'm not sure how to update it), but otherwise I think it's OK. —suzukaze (tc) 23:08, 8 June 2017 (UTC)
@Suzukaze-c: Could we just remove the part that mentions literary Chinese altogether for now? — justin(r)leung (t...) | c=› } 23:17, 8 June 2017 (UTC)
@Justinrleung I don't know that this policy is a draft. It's official - either endorsed by a vote or unchallenged by the community. The format of soft redirects wasn't endorsed, though it wasn't challenged either. There are still thousands of unconverted Mandarin and Cantonese hanzi entries, which are hard to convert for obvious reasons. Things to discuss are pinyin, jyutping and POJ entries (headers, categories and templates), Cyrillic Dungan and Arabic Xiao'erjing, and what to do with topolects without an established writing system and lacking transliteration standards.--Anatoli T. (обсудить/вклад) 23:47, 8 June 2017 (UTC)
@Atitarev: Is there a better template to use? {{policy}} seems too strong, but {{policy-DP}} / {{policy-ED}} seem too weak. — justin(r)leung (t...) | c=› } 23:51, 8 June 2017 (UTC)
@Justinrleung I see your point, thanks. Yes, leave it as is. Another thing we need to do for Chinese (and any language in scriptio continua, including Vietnamese) is to define CFI. The definitions of "word" and "sum of parts" are not exactly the same in Chinese as in languages written with spaces. Even German or Finnish criteria for inclusion differ from English. --Anatoli T. (обсудить/вклад)
On that note, I would like to suggest that we relax the part of CFI on personal names slightly, to allow names which are directly found in idioms and set phrases, such as
Wyang (talk) 03:54, 10 June 2017 (UTC)

Looking to improve Wenzhounese coverage[edit]

I started the outline of an "about" page at User:Prisencolin/wenzhou. Wenzhounese should be distinct enough from other Wu dialects to warrant a page by itself.--Prisencolin (talk) 18:12, 5 July 2017 (UTC)

(See also Template_talk:zh-pron#Wenzhou_dialect —suzukaze (tc) 18:15, 5 July 2017 (UTC))
@Prisencolin, Suzukaze-c: I think the first step is to add Wenzhounese to {{zh-pron}}. We do have an editor from Wenzhou, @Mteechan, so it would be great if we could start adding Wenzhounese to Wiktionary. We need to determine which romanization system we should be using. @Wyang, Atitarev, any thoughts? — justin(r)leung (t...) | c=› } 18:40, 5 July 2017 (UTC)
Wupin, or Wu romanization, the one wu-chinese.com uses will do. Nevertheless, it could be improved to some extent. Mteechan (talk) 18:52, 5 July 2017 (UTC)
I'm still curious as to how irregular the phonology (esp. tone sandhis) is - this will determine the kind of system that would be ideal for use. Wyang (talk) 23:01, 5 July 2017 (UTC)
Well, the tone sandhi is pretty complicated. I've made a lookup table for 2-word sandhi, but it's based on my accent, not the de facto "standard" accent in urban Wenzhou. Other than that, the phonology is not that irregular. Mteechan (talk) 04:38, 6 July 2017 (UTC)
@D.s.ronis has done some work on Wenzhounese on Wikipedia, such as creating Wenzhounese romanisation.--Prisencolin (talk) 06:33, 9 July 2017 (UTC)
Glossika has a Wenzhounese course as well, for those interested. Wyang (talk) 07:42, 11 July 2017 (UTC)

Teochew syllabic ng?[edit]

My grandparents from Chaoyang pronounce 門 something like /mŋ̍/ ~ /mɤ̯ŋ/. czyzd.com transcribes this as ⟨meng⟩ /mɯŋ/. I had written text describing my thoughts on this but I think I will instead leave this to the consideration of others. —suzukaze (tc) 18:15, 24 October 2017 (UTC)

Taishanese and Teochew[edit]

Taishanese and Teochew now have codes, pursuant to the discussion archived at Wiktionary talk:Language treatment/Discussions#Taishanese_and_Teochew. - -sche (discuss) 07:24, 19 January 2018 (UTC)

Header of non-Chinese script entries[edit]

Wiktionary:Votes/pl-2014-04/Unified Chinese decided that words written in Chinese characters should be unified to Chinese header. However it also says the formats of templates in words written in non-Han scripts devised specifically for particular topolects above are not the subject of the vote and can be discussed separately if needed.

Sinitic terms (lemma or not) written in non-Han scripts include:

  1. Pinyin romanization of words
  2. Jyutping romanization of characters
  3. POJ form of words
  4. Cyrillic Dungan
  5. Xiao'erjing words
  6. others, like zhuyin fuhao

There are two options for the heading:

  1. Use the topolect as heading (e.g. Mandarin, Cantonese, Min Nan, Dungan)
  2. Use Chinese as heading for all terms (like this and this)

Also worth pointing out:

  1. Currently the heading of Pinyin entries is inconsistent (29822 with a Mandarin header vs 1318 with a Chinese header; MediaWiki:Gadget-AcceleratedFormCreation.js uses "Chinese")
  2. There is precedent for not using a dialect-specific header for terms whose orthography is exclusive to a specific dialect, see Wiktionary:Votes/2011-10/Unified Romanian

I propose to migrate all Sinitic terms (lemma or not) to the Chinese header and eliminate any topolect header, to finish the unification of Chinese. Any thoughts? Note that this proposal only concerns headers and says nothing about categories. --Zcreator (talk) 02:24, 4 February 2018 (UTC)

Support. Wyang (talk) 03:06, 4 February 2018 (UTC)
Weak Oppose, since romanizations like Jyutping are made specifically for Cantonese, unlike hanzi spellings, which can be shared across dialects.
AcceleratedFormCreation.js seems to be using "Chinese" because the accelerated creation links are found under a Chinese header. —Suzukaze-c 07:05, 3 May 2018 (UTC)
(Notifying Wyang, Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang, Suzukaze-c, Justinrleung, Hongthay, Mar vin kaiser, Dokurrat, Zcreator, Zcreator alt, Dine2016): Can we agree on some of the existing non-Chinese headers, maybe one at a time? I think pinyin entries could look like this under the Chinese L2 header:
Hanyu Pinyin Běijīng (Zhuyin ㄅㄟˇ ㄐㄧㄥ)
The only difference from the current Běijīng entry would be a different L2 header (Chinese) and a linked name of the romanisation. Since Hanyu Pinyin is only used for Mandarin, it is obvious which lect the romanisation applies to. --Anatoli T. (обсудить/вклад) 07:29, 3 May 2018 (UTC)
My understanding is that unification of Chinese reduces duplication due to the large number of shared written forms across lects.
There is no such concern for romanizations, which are unique to a certain lect, so I think they should not use the "Chinese" header. Min Nan is Chinese, but I am not yet convinced that chai-iáⁿ#Chinese is helpful. I imagine that a "unified Chinese" plan would never have taken place if China used phonetic scripts, and there were no hanzi to "bind" lects together.
Suzukaze-c 08:33, 3 May 2018 (UTC)
Thanks for the response. Let's see what other people think. Converting Min Nan Pe̍h-ōe-jī to Chinese L2 was looked at favourably but not everyone thinks we should have Hanyu pinyin entries in the first place. --Anatoli T. (обсудить/вклад) 13:22, 3 May 2018 (UTC)

Dungan Cyrillic transliteration[edit]

(Notifying Wyang, Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang, Suzukaze-c, Justinrleung, Hongthay, Mar vin kaiser, Dokurrat, Zcreator, Zcreator alt, Dine2016): Hi. I think Dungan Cyrillic should be transliterated into Roman letters in Chinese entries.

I also think we should review the method itself, which is not quite standard, anyway - e.g. get rid of Cyrillics in the translit and make it more meaningful. --Anatoli T. (обсудить/вклад) 03:20, 11 April 2018 (UTC)

Translit in {{zh-pron}}: definitely.
About the translit itself: I based it on w:ru:Дунганская_письменность#Таблица_соответствия_алфавитов, which is where ь came from. I'm not sure what it should be replaced with. î? —Suzukaze-c 03:25, 11 April 2018 (UTC)
@Suzukaze-c: Thanks, let me think about it when I have a bit more time ("î" may not be a bad suggestion) and let's see what others think about it. --Anatoli T. (обсудить/вклад) 03:36, 11 April 2018 (UTC)

Superscript tone numbers[edit]

(Notifying Wyang, Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang, Suzukaze-c, Justinrleung, Hongthay, Mar vin kaiser, Dokurrat, Zcreator, Zcreator alt, Dine2016): Would tone numbers for Gan, Xiang, Jin, Teochew, etc. look better if they were made superscript by default like Cantonese, e.g. 所有 (so² jau⁵)? This is not currently applied in all templates - 所有 (so2 jau5). Pretty sure it was implemented by Kenny. --Anatoli T. (обсудить/вклад) 11:40, 15 April 2018 (UTC)

I agree. Wyang (talk) 12:54, 15 April 2018 (UTC)
I agree as well. I think this should also be automatic in {{zh-l}}. — justin(r)leung (t...) | c=› } 18:19, 15 April 2018 (UTC)
Sure. —Suzukaze-c 22:14, 15 April 2018 (UTC)
What's the more recognised standard? I am not familiar with pronunciation schemes for dialects other than Mandarin. JamesjiaoTC 22:02, 16 April 2018 (UTC)
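For illustration only, the conversion being discussed (tone digits after a syllable rendered as superscript) can be sketched in a few lines of Python. This is not the code used by {{zh-pron}} or {{zh-l}}, whose implementation is not shown in this thread; the function name superscript_tones is made up, and the sketch assumes plain ASCII romanisation in which every run of digits directly after letters is a tone number.

```python
import re

# A run of ASCII letters immediately followed by one or more tone digits.
SYLLABLE_WITH_TONE = re.compile(r"([A-Za-z]+)(\d+)")


def superscript_tones(romanisation: str) -> str:
    """Wrap tone digits that directly follow a syllable in <sup>...</sup>."""
    return SYLLABLE_WITH_TONE.sub(r"\1<sup>\2</sup>", romanisation)


example = superscript_tones("so2 jau5")
```

Applied to the Jyutping above, this gives so<sup>2</sup> jau<sup>5</sup>. Romanisations with diacritics, hyphenated sandhi notation or other non-ASCII letters would need a broader pattern than this one.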

"other" dialects[edit]

Having {{zh-dial}} but not adding relevant IPA to entries seems odd to me. Perhaps we need to rethink {{zh-pron}}. —Suzukaze-c 04:43, 25 September 2018 (UTC)

By all means do please. Something similar to Xiaoxuetang would be ideal, but it would also mean a large maintenance requirement. Wyang (talk) 05:00, 25 September 2018 (UTC)

Romanisations of Chinese[edit]

According to the present policy, Pinyin romanisations of monosyllables and polysyllables for Standard Mandarin (aka Putonghua), such as "" and "bùguò" are allowed. However, for Standard Cantonese, only Jyutping romanisations of monosyllables are allowed (e.g. jyut6, ping3), while those of polysyllables are disallowed. Why is there such unequal treatment of the two languages? I believe that Jyutping romanisations of polysyllables should be allowed and created en masse, as Pinyin romanisations of polysyllables are allowed and exist in large quantity. Jonashtand (talk) 06:34, 9 December 2018 (UTC)

@Jonashtand: It was a result of this vote. The only reason I see in that vote for allowing only monosyllables is that "this is also what is done for pinyin with tone numbers". — justin(r)leung (t...) | c=› } 06:49, 9 December 2018 (UTC)