Wiktionary talk:About Chinese
- Discussions from before January 2010 are in Wiktionary talk:About Sinitic languages/archive 1.
untitled
Please see Wiktionary talk:Entries on Chinese characters#Sortkeys and subcats for single-character entries for discussion of how to categorize the single-Chinese-character entries since they (may) apply to Chinese, Japanese, Korean and Vietnamese (CJKV). - dcljr 08:39, 25 January 2006 (UTC)
Autoformat has identified a number of entries that have the non-conforming language name "Chinese (traditional/simplified)". There are others that it has not yet flagged as well. I could not be trusted to correct this properly. DCDuring TALK 17:32, 9 May 2010 (UTC)
- As a rule, assume it's Mandarin. Traditional/Simplified entries go on a single, they're not really different 'scripts' but more like the French spelling reforms, where paraître becomes paraitre as the circumflex doesn't serve any purpose. Mglovesfun (talk) 08:26, 29 September 2011 (UTC)
?
(Note: I don't know a thing about Chinese.) A few questions/issues:
- What's up with the categories? There are Category:cmn:All topics, Category:zh:All topics, Category:zh-cn:All topics and Category:zh-tw:All topics. What's the difference?
- WT:AZH#Min_Nan says that Min Nan "has four main branches... This poses a problem for Wiktionary, since these dialects are not mutually intelligible, and only one L2 header may be used per ISO 639 code. ... To date, virtually all entries for Min Nan have been based on the Amoy dialect, which is widely considered to be a de facto standard. The disposition of other dialects such as Teochew and Qiongwen Hainanese remains undecided at this time." I'm pretty sure that standard practice for branches among languages is to use context labels for words that don't exist in some branches. Why should this language be different?
- I seem to recall some consensus about not allowing toneless pinyin entries? If there was, shouldn't this be mentioned on WT:About Chinese?
- WT:AZH lists {{infl}} as being the standard template to use, and repeats it many times for all the languages that do not yet have templates built for them specifically. Rather than showing an explanation for {{infl}} over and over again, wouldn't it make sense to make the page say that for dialects that don't have specific templates yet, use infl, and then explain how to use it once?
- Are these languages treated as separate languages or as dialects of one language? If they're separate languages, why do things like Category:Chinese templates exist, instead of being split into sections?
- What is the Wiktionary code for Mandarin, zh or cmn?
--Yair rand (talk) 07:02, 24 May 2010 (UTC)
- Just one answer for the moment: #What is the Wiktionary code for Mandarin, zh or cmn?. This is annoying but the assisted method doesn't work well with cmn, it creates {{ tø|cmn| for translations, so they can't be linked to zh:wiki. zh works better but bots change them to cmn. ZH is short for Chinese 中文 (Zhōngwén), CMN is Chinese Mandarin but both have the word Mandarin in templates. I learned to live with this :) The reasons for the existence of both Chinese and Mandarin are historical. Mandarin is standard Chinese and most written Chinese material is in Mandarin. There are no YUE, NAN, etc. Wiktionaries but there are some new Wikipedias in dialects. --Anatoli 12:36, 24 May 2010 (UTC)
- I proposed on WT:BP, and still do propose eliminating zh, zh-cn and zh-tw from category names. zh is used for translations as the Mandarin Wikiprojects uses the code zh not cmn. Mglovesfun (talk) 08:28, 29 September 2011 (UTC)
Move debate
The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits.
This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.
I'd prefer Wiktionary:About Chinese languages as a title. It makes it clearer that we don't allow Chinese as a language. Furthermore, as much content as is reasonable/possible should be moved to the individual languages involved - Wiktionary:About Mandarin shouldn't be a redirect. Mglovesfun (talk) 12:55, 9 November 2010 (UTC)
- I support moving the contents into Wiktionary:About Mandarin, Wiktionary:About Min Nan, etc. Despite these languages naturally sharing common characteristics, they conceivably have different conventions as well, such as grammar and names of templates. --Daniel. 13:02, 9 November 2010 (UTC)
- Wiktionary:About Chinese (or a renamed version) should still exist, at the very least it could give context on what we call 'Chinese' here, and then link to the individual languages' pages. Mglovesfun (talk) 13:22, 9 November 2010 (UTC)
- I support moving to About Chinese languages. IMO as long as there is no Mandarin-specific information to be split off of that page, hard-redirect from About Mandarin. Precedent, fwiw, is About sign languages, redirected to from both About American Sign Language and WT:AASE (ase is American Sign Language) as well as from WT:ASGN (sgn is the group (or whatever it's called) code for sign languages).—msh210℠ (talk) 21:03, 10 November 2010 (UTC)
Moved. Mglovesfun (talk) 16:17, 25 November 2010 (UTC)
Move debate (2)
The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits.
This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.
We don't have a Category:Chinese languages; we have a Category:Sinitic languages for that.
For that reason, I suggest moving Wiktionary:About Chinese languages to Wiktionary:About Sinitic languages. (And keeping the old name as a redirect.) --Daniel 19:05, 25 May 2011 (UTC)
- Done. Nobody objected. --Daniel 02:27, 8 June 2011 (UTC)
- Next time, please remember to check for double-redirects; in this case, that would be pages that redirect to Wiktionary:About Chinese languages. MediaWiki only supports one level of redirection, so once Wiktionary:About Chinese languages became a redirect to Wiktionary:About Sinitic languages, those redirects stopped working. (Don't worry, I've updated them now. Just something to remember for next time.) —RuakhTALK 03:09, 8 June 2011 (UTC)
- OK, I will check for all double-redirects next time. I've fixed some double-redirects to Wiktionary:About Sinitic languages, and missed others, before your help. Thanks. --Daniel 03:58, 8 June 2011 (UTC)
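For anyone doing a similar move later, the double-redirect check described above can be sketched roughly as follows. This is only an illustration, assuming you already have a page-to-target mapping of redirects from some export; the `find_double_redirects` helper and the sample mapping are hypothetical and are not taken from any existing bot.

```python
def find_double_redirects(redirects):
    """Return {source: (middle, final)} for each redirect whose target is itself a redirect."""
    double = {}
    for source, target in redirects.items():
        if target in redirects:
            # MediaWiki follows only one level, so this source no longer resolves.
            double[source] = (target, redirects[target])
    return double


# Hypothetical snapshot of redirects after the move (page -> redirect target).
redirects = {
    "Wiktionary:About Chinese languages": "Wiktionary:About Sinitic languages",
    "Wiktionary:About Mandarin": "Wiktionary:About Chinese languages",
    "WT:AZH": "Wiktionary:About Chinese languages",
}

for source, (middle, final) in sorted(find_double_redirects(redirects).items()):
    print(f"{source} -> {middle} -> {final}; retarget to {final}")
```

Each reported page would then be edited to point straight at the final target.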
Banning foreign proper nouns as Mandarin
I propose to make it a language policy to ban all proper names used in a Mandarin context if they are not in Hanzi, regardless of whether there are citations - Chinese do write in foreign languages occasionally; these foreign words don't become Chinese though. Foreign words should be and are transliterated into Chinese characters, otherwise they should not be considered Mandarin. The complexity is not a justification for not following this rule. This is to avoid entries such as Thames河, Alps山, Alzheimer病, etc. once and for all. PRC and RC policies both regard using names in Roman letters as incorrect, which is widely accepted. --Anatoli 05:18, 29 September 2011 (UTC)
- I support this. Japanese speakers also use Latin-based foreign words in their writing occasionally, when there is a perfect katakana equivalent. Sometimes, it's done for stylistic reasons (as, very unfortunately, Western cultures are considered trendy in Asian countries), sometimes, well, some just want to show off. You can find this aspect especially in their song lyrics. Quite often the English lines don't even make sense whatsoever. Anyway, I digress. As I noted, writing in foreign scripts especially Latin-based languages is especially trendy among younger generations. Ok let me put it another way. I have seen English speakers putting words in Japanese hira or kata characters in their writing, when the same concept can be written in English perfectly. It's the result of a change in people's perception towards the Japanese (language or otherwise), which is now considered trendy and also the proliferation of Japanese learners in the past decade. Again, does it mean these words are now considered borrowed into English? If you say yes, then I have no problem with Thames河 being included in this dictionary. Jamesjiao → T ◊ C 06:00, 29 September 2011 (UTC)
- Re: setting up a vote (something mentioned in the BP): do you want to set up a vote that would only ban proper nouns? Or do you think common nouns like e-mail地址 should be banned, too? If so, then the vote could be broader. But your comments on RFV suggest you wouldn't delete all mixed-script entries (eg Y字). Presuming you'd like to ban e-mail地址 but not Y字, how can the vote be worded, so that it does that? - -sche (discuss) 06:03, 29 September 2011 (UTC)
- @-sche, don't get me wrong, mixed scripts are perfectly normal, like the ones you listed and many more, e.g. AA制 (ēi'ēi zhì). Karaoke can only be written as 卡拉OK in Mandarin. I'm talking about proper nouns; I don't want to mislead users into believing that Oslo is Oslo市 in Chinese, even if you find examples of usage. I have seen a Chinese map of Australia on a Chinese site on the internet where only the biggest cities were translated into Chinese. A user like Engirst would start quoting the untranslated names as Mandarin, which is wrong.
- @Jamesjiao, sorry you lost me, I don't know what you mean. Could you rephrase it, please?--Anatoli 06:20, 29 September 2011 (UTC)
- I was just comparing the use of Japanese hiragana/katakana in English (esp. among Japanophiles) with the use of English (or other Latin-script-based languages) words in Chinese (due to trendiness probably?). This might not be a perfect analogy, but it's a start. You will also find that people are more inclined to use Latin characters, especially for proper nouns, when using a computer keyboard (as opposed to handwriting). I also mentioned the fact that monolingual Chinese speakers wouldn't understand a mixed construction like this. Jamesjiao → T ◊ C 06:45, 29 September 2011 (UTC)
- Oh another thing is pronunciation. For a word to exist in a language, there has to be a way to pronounce it. I can't imagine a non-English speaking Chinese speaker trying to pronounce Thames河 even if he/she is able to recognize and even pronounce the individual letters. Jamesjiao → T ◊ C 06:52, 29 September 2011 (UTC)
- I definitely don't think that Kana words in English are to be considered English but I haven't seen it, that's why I couldn't understand what you mean. Yes, you're right, most Chinese speakers wouldn't have a clue how to pronounce Thames河 or Seine河, Hudson河 or Volga河. --Anatoli 09:47, 29 September 2011 (UTC)
There is not only one standard for the Chinese language. Chinese is not only for Mainland China, but for Taiwan, Hong Kong, Macau, Singapore and overseas. For example, President Bush is written as 布什, 布殊 and Bush as well. 2.25.212.4 13:02, 30 September 2011 (UTC)
- In which part of the world is the standard Chinese name for Bush "Bush"? 60.240.101.246 13:13, 30 September 2011 (UTC)
- There is not only one standard. A dictionary just records the words that exist. 2.25.212.4 14:09, 30 September 2011 (UTC)
Wow, I get such a strong sense of déjà vu here... Engirst, do you have any original arguments? Your points above have been refuted. As noted elsewhere:
- we already have a record of Thames and a record of 河 (hé);
- using a term from one language in a sentence of another language may represent w:code-switching instead of borrowing;
- there is nothing intrinsically Chinese about Thames;
- the use of Thames in Thames河 is an example of an English term used as an English term in a Chinese context;
- the use of Thames in Thames河 is a collocation of two independent terms;
- as a non-idiomatic sum-of-parts phrase, Thames河 fails WT:CFI, just as yellow sweater or tasty kumquat fail WT:CFI for the same reason.
So, to extrapolate a basic list of criteria for including any word from Language A under the heading for Language B, not just proper nouns:
- Is the term used in Language B to convey any meaning that is different from its meaning in Language A?
- Alternately, is the term used widely enough in Language B that most speakers and/or readers of Language B should be expected to know and readily use the term?
Well, that's it, actually. I can't think of any other solid reasons for including a term from one language under the heading for another language. Use in Language B does not necessarily mean that the term has been adopted into that language. As soon as the term is used as Language B, i.e. where it has some meaning that is specific to that language or where it is well-known and widely used, then I am happy to advocate listing under both Language A and Language B headings. -- HTH, Eiríkr Útlendi | Tala við mig 23:04, 30 September 2011 (UTC)
- Your list seems good for the vote. I suggest to add the Mandarin romanisation entries, like Thames Hé vs Tàiwùshì Hé, the former falls into the same category. --Anatoli 21:43, 2 October 2011 (UTC)
- This is a very comprehensive list. Code-switching is what I had in mind, but I couldn't remember the term at the time. Code-switching occurs extremely often in Taiwan, not just between Mandarin and English, but Japanese, Korean and even their local flavour of Hokkien dialect as well. I often see short Japanese phrases like かわいいね。。。 in Taiwanese online blogs mixed in with Chinese characters. This is a very typical case of code-switching in writing. Jamesjiao → T ◊ C 02:06, 5 October 2011 (UTC)
- The vote to ban this kind of entries is set up here. Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters. --Anatoli 01:05, 3 October 2011 (UTC)
- Not being a speaker of Mandarin or Japanese, I have a question which might help to clarify the issue for those in a similar position. Which of the following examples in English best equates to "Thames河" in Mandarin: "résumé" (a French word, wholly adopted but retaining glyphs which are not properly in the English alphabet), άλφα (a Greek word which, when used, is italicized to indicate that it is from a different language), or something completely different? I do think it might be a bit early for voting, since in all of the discussions around this topic I have only seen 5 or 6 contributors. - [The]DaveRoss 02:37, 5 October 2011 (UTC)
- In answer to your question: this is like Москва#English (a foreign word, which indicates that it is from a different language by being in a different script). - -sche (discuss) 03:41, 5 October 2011 (UTC)
- TheDaveRoss, it's only one user, not many (who creates/recreates them), trust me, with different IPs. The issue at hand is that this user claims that "Thames河" - English "Thames" + 河 (river) is a Mandarin word, citing examples from books. Note that river names are always followed by 河 or other similar words in Mandarin. There are other examples where foreign names are written in Mandarin without translating, showing the foreign name in the original script. My argument is that the Chinese word for Thames is 泰晤士河 (Google Books - 3,150 hits) and there is no reason to include the SoP term Thames河; there is nothing Chinese in Thames. The rule and common practice is to transliterate/translate people's names, cities, etc. no matter how small. There are borrowings into Mandarin, and a very few also have a few Roman letters (三K黨 / 三K党 Ku Klux Klan), but writing full names in Roman letters is a case of code-switching. OK#Mandarin is a common noun, not a proper name; it has become partially naturalised. Like any other language, Mandarin uses its native script to write words, using other scripts when it absolutely has to. "London市" or "Hyde公园" are not exceptions, they are cases of code-switching (simply Chinglish) - the correct and common terms are "伦敦", "海德公园". The issue is not just Mandarin specific. Some argue that bluetooth should be the right way to write the word in Russian. A similar situation could arise for Japanese, Russian, Hindi or Korean, Arabic, others, where people insert Roman letter names. I believe these names don't become naturalised. I hope I expressed myself well. If a word in Roman letters becomes naturalised, then we can include it; we are still discussing pizza#Mandarin (a common word). --Anatoli 03:07, 5 October 2011 (UTC)
Pinyin with no tra or sim
Is there any sensible way to find these? I have been speedy deleting some of these; given that {{pinyin reading of}} links to the tra and sim, it seems reasonable. For example we don't allow plurals that don't have a singular ({{plural of|xyz}} when xyz doesn't exist yet). If anyone wants to create Hanzi entries for these, then recreate the pinyin, it is with my blessing. Mglovesfun (talk) 12:27, 2 October 2011 (UTC)
- I don't understand what you said. Engirst 12:40, 2 October 2011 (UTC)
- He is saying that we don't allow a plural form entry for English words when the singular form does not yet exist. He is asking if that also means that we shouldn't have the pinyin form when the traditional or simplified Mandarin forms do not yet exist. He has been deleting them when he sees them. - [The]DaveRoss 02:39, 5 October 2011 (UTC)
- I think Engirst considers the character entries too complex and not worth his time creating. I digress. There is in fact a vote on this: (That a pinyin entry, using the tone-marking diacritics, be allowed whenever we have an entry for a traditional-characters or simplified-characters spelling.). It doesn't however explicitly exclude pinyin entries when there are no character entries present. Maybe the wording can be changed to something like: That a pinyin entry, using the tone-marking diacritics, only be allowed when we have an entry for a traditional-characters or simplified-characters spelling. Jamesjiao → T ◊ C 02:47, 5 October 2011 (UTC)
- Sounds like a reasonable suggestion. There's not enough resources validating Romanisation entries (SoP, attestability, etc. let alone the Chinese characters - often one version is omitted). Not sure how this can be done but I support voting on this. Maybe Engirst will start creating some Chinese character entries before adding pinyin? (wishful thinking) --Anatoli 03:11, 5 October 2011 (UTC)
- I'm OK with users adding valid pinyin (attestable / with correct tone-markings) without adding hanzi, I'm also OK with users creating valid plurals (attestable) without creating the singulars... we allow that on de.Wikt, we even have bots to create forms without regard to the presence of the lemmata, because in that way, a user who looks up the form or the pinyin will at least have a bit of information, better than nothing. Having said that, I think all of you, as the active Chinese editors, could form a consensus and agree that you interpret the vote as requiring hanzi to exist first (this is how I always interpreted the vote), and delete pinyin entries that have no hanzi form, without having a new vote. - -sche (discuss) 03:36, 5 October 2011 (UTC)
- Seems like without any vote, nothing can be achieved in Mandarin space, most active Chinese editors (except for this user) all disagree with Engirst (he may now be avoiding his own user account) but changing or deleting his entries causes edit wars or someone may think he is just being bullied. --Anatoli 03:55, 5 October 2011 (UTC)
- Just as another reference for comparison --
- If I understand it correctly, the current policy for Japanese entries is to have the main entry with most of the information located under the kanji headword when there is one, or under the kana headword otherwise, and for the romaji (Japanese pinyin, as it were) entries to *only* serve as disambig pages pointing users to the relevant other headwords. Consequently, romaji entries should not have any "See also", "Derived terms", "Usage notes", or other headings. The kōgai (kōgai) entry is a good example of this in action. -- Eiríkr Útlendi | Tala við mig 04:56, 5 October 2011 (UTC)
- Pinyin romanisation rules went further - parts of speech are not allowed but we do have many pinyin entries without hanzi. --Anatoli 00:23, 7 October 2011 (UTC)
Are the tone-markings on these words correct?
Talk:Nèi Ménggǔ, Talk:Ménggǔ. (Other editors: feel free to list entries in this section if you doubt they have correct tone-markings. It should be helpful to have a single place to gather them for cleanup. If there is such a place already, other than the clogged WT:RFC page, please move these there.) - -sche (discuss) 12:01, 3 October 2011 (UTC)
- It's Nèi Měnggǔ and Měnggǔ. --Anatoli 12:51, 3 October 2011 (UTC)
- Your examples show the tone sandhi where the original third tone is pronounced as second in front of another third tone but it's usually not reflected in pinyin romanisation. --Anatoli 12:54, 3 October 2011 (UTC)
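To make the sandhi point concrete, here is a rough sketch using tone numbers instead of diacritics. It assumes only the simple pairwise rule mentioned above (a third tone directly before another third tone is spoken as a second tone); longer runs of third tones depend on phrasing and are not modelled properly here. The `spoken_tones` helper is made up for illustration.

```python
def spoken_tones(written):
    """Apply the pairwise third-tone sandhi rule to a list of written tone numbers."""
    spoken = list(written)
    for i in range(len(written) - 1):
        # A written 3 directly before another written 3 is pronounced as 2.
        if written[i] == 3 and written[i + 1] == 3:
            spoken[i] = 2
    return spoken


# Měnggǔ is written 3-3 in pinyin; Nèi Měnggǔ is written 4-3-3.
print(spoken_tones([3, 3]))     # written Měnggǔ
print(spoken_tones([4, 3, 3]))  # written Nèi Měnggǔ
```

The written pinyin keeps the original tones; only the pronunciation changes, which is why the entries belong at Měnggǔ rather than Ménggǔ.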
Wakie-wakie, the vote is on. --Anatoli 04:50, 19 October 2011 (UTC)
Mandarin part of speech template
Templates like {{cmn-noun}} allow p for pinyin as a first parameter. This should be phased out. There's an effort to remove all pinyin from part of speech categories and have them only in Category:Mandarin pinyin and subcategories; at some point the templates will have to follow suit, though we're months away from being ready. So this is a heads up. --Mglovesfun (talk) 21:22, 10 November 2011 (UTC)
- But this parameter serves the same purpose as tr - transliteration and the hyperlink allows to see if there are other hanzi with the same pinyin. I have no strong opinion on your suggestion at the moment.
- I've been checking your list at User:MglovesfunBot/cmn-parts-of-speech-Latn, as you have noticed. It's quite big and very time-consuming, so I'm inviting other Sinophone editors to join the effort. If the entry's hanzi are red-linked, it can be deleted, rather than converted. Sometimes I also leave entries if they only have a Japanese but no Mandarin entry (planning to add them later). --Anatoli 21:54, 10 November 2011 (UTC)
- {{cmn-noun|p}} is used for Mandarin nouns in the Latin script. We no longer use {{cmn-noun}} (cmn-adj, adv, abbr, etc.) for pinyin entries. Something like {{cmn-noun|ts|pin=fú}} will still work! --Mglovesfun (talk) 18:37, 11 November 2011 (UTC)
- I misunderstood, sorry, I was thinking about pin parameter. Can you give an example, please? --Anatoli 21:47, 13 November 2011 (UTC)
Err, when did we approve Wade-Giles transliterations for inclusion? I can kinda understand Pinyin, but this? -- Liliana • 15:23, 10 December 2011 (UTC)
- Thanks. Binned. --Anatoli (обсудить) 03:20, 11 December 2011 (UTC)
Audio files
See commons:Commons:Village_pump#Category:Chinese_pronunciation. Mglovesfun (talk) 13:00, 19 February 2012 (UTC)
- I have some doubts about your request. The main reason is the many homophones, and the request should also specify whether we want jiantizi, fantizi or both (there are variant characters too). The conversion is far from straightforward. Perhaps using audio files based on toned pinyin was the right choice, even if it's more complicated to use bots to add audio files to hanzi entries. I see some of the audio entries are missing tone marks. --Anatoli (обсудить) 21:43, 19 February 2012 (UTC)
- I think the audio files should stay at the pinyin filenames, because if I am not mistaken, multiple characters with the same pinyin romanization X have the same pronunciation. Giving the file a pinyin filename allows it to be uploaded to all characters that have pinyin X. It seems easier to write a bot to do that, than to host the same file under dozens of names. - -sche (discuss) 21:48, 19 February 2012 (UTC)
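As a rough sketch of the bot idea above: group hanzi entries by their toned pinyin, so that one pinyin-named audio file can be attached to every character sharing that reading. The filename pattern and the `entries_per_audio_file` helper are assumptions made up for illustration and do not reflect the actual Commons naming scheme or any existing bot.

```python
from collections import defaultdict


def entries_per_audio_file(readings, filename_pattern="zh-{pinyin}.ogg"):
    """Map each (hypothetical) pinyin-named audio file to the hanzi entries that could use it."""
    files = defaultdict(list)
    for hanzi, pinyin in readings:
        files[filename_pattern.format(pinyin=pinyin)].append(hanzi)
    return dict(files)


# Small sample of (hanzi, toned pinyin) pairs, simplified and traditional forms mixed.
readings = [("妈", "mā"), ("媽", "mā"), ("马", "mǎ"), ("馬", "mǎ"), ("码", "mǎ")]

for filename, entries in entries_per_audio_file(readings).items():
    print(filename, "->", " ".join(entries))
```

With pinyin filenames, one upload covers simplified, traditional and homophonous characters alike, which is the advantage argued for above.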
{{commonsrad}}
- Note: the title of this section was previously {{Commonsrad}}.
Sarang (talk • contribs) has created {{Commonsrad}}, and would like me to run a bot that will add it to all entries and indices for radicals (e.g. 一 and Index:Chinese radical/一). Does everyone agree that this should be done? —RuakhTALK 15:38, 11 April 2012 (UTC)
- If it's going to be bot-added, there is no harm in giving it a clearer name first. Maybe {{Commons radical}}? —CodeCat 16:29, 11 April 2012 (UTC)
- If Commonsrad seems not clear enough, I have no objection to giving the name 5 bytes more — the data space of 2000 bytes more won't mind either. I chose a name close to {{Commonscat}} because it is very similar to it. In fact, Commonsrad can be called a variation of Commonscat but with a display better suited for its usage, and the possibility of easy expansion whenever wanted. If it may then be used to link non-radical Chinese glyph Wiktionary pages to their Commons categories, Commonsrad is less misleading than a clearer descriptive name like {{commons radical}}. -- sarang♥사랑 18:09, 11 April 2012 (UTC)
- It seems to be a use at Wiktionary to have template names with lower case initials (with upper case redirects)? Another question to decide! -- sarang♥사랑 05:48, 12 April 2012 (UTC)
- I'm not exactly sure from the description what the template will do, but it looks harmless enough. -- A-cai (talk) 22:26, 17 April 2012 (UTC)
- Template has been moved to {{commonsrad}}, hence the red links above. Mglovesfun (talk) 22:28, 17 April 2012 (UTC)
Baxter-Sagart
I'm not really active in this project, but I did add a new appendix, Appendix:Baxter-Sagart Old Chinese reconstruction. It's referenced, and the table data is programmatically generated from the reference data with a program whose source code I also made available. I hope this in some way can be of help. - Gilgamesh (talk) 22:57, 31 May 2012 (UTC)
If someone knowledgeable could check that the pronunciation and pinyin of [[葡文]] are correct, it would be appreciated. :) - -sche (discuss) 21:00, 26 December 2012 (UTC)
- Thanks for that. Does "(written)" mean that 葡文 refers to written Portuguese, or that 葡文 is {{literary|lang=cmn}} and mostly used in written Chinese and not in spoken Chinese? - -sche (discuss) 21:52, 26 December 2012 (UTC)
- Ah, interesting! - -sche (discuss) 00:01, 27 December 2012 (UTC)
The transliteration and four-corner number, respectively, of these characters were tagged {{fact}}; can anyone verify them? They and the Japanese character 䋖 (the On-reading of which has been questioned) are the last remaining Han characters tagged {{fact}}. - -sche (discuss) 00:01, 27 December 2012 (UTC)
Toneless pinyin usage notes
Currently, our toneless pinyin entries all have a usage note at the bottom which says:
- English transcriptions of Chinese speech often fail to distinguish between the critical tonal differences employed in the Chinese language, using words such as this one without the appropriate indication of tone.
I don't have much of a problem with it (although maybe "Chinese" should be changed to "Mandarin"), but I realized that if we do want to change it, it will be somewhat difficult, and some of them may be edited and fall out of synch. To solve that, I propose that we create a template called {{cmn-toneless-note}} or something similar and ask an editor with an AWB account to change all instances of the text into a template call. What do you guys think? —Μετάknowledgediscuss/deeds 19:13, 6 January 2013 (UTC)
- Support. - -sche (discuss) 19:43, 6 January 2013 (UTC)
- Support. Also, "using words" should probably be "writing syllables". (We don't have toneless-pinyin entries for whole words, only for individual syllables.) —RuakhTALK 20:28, 6 January 2013 (UTC)
- Well... sort of. On one hand, you are correct that this is only used for specific syllables, but OTOH the syllables are words, in the loose Chinese way of looking at what constitutes a word. (One Chinese man was trying arduously to convince me that all words in Mandarin are one syllable long. I was unsuccessful in my attempts to get him to revise his native definition of what a word is to the Western linguistic concept.) Incidentally, the entries (like nu#Mandarin) also point to forms like nǚ, which not only is marked for tone but also has a different vowel, and perhaps the note should reflect that. (Of course, I'm not sure how useful that is anyway, because when my friends don't have access to the character 女, they type nv3, not the equally inaccessible diacritic form.) —Μετάknowledgediscuss/deeds 21:13, 6 January 2013 (UTC)
- Well, if our goal were to conform to "the loose Chinese way of looking at" their languages, then we'd treat all of them as dialects of a single language. It isn't, so we don't. By most linguistically-well-informed accounts, the vast majority of Mandarin words are bisyllabic. —RuakhTALK 22:50, 6 January 2013 (UTC)
- I personally find your comment rather arrogant and disparaging. 129.78.32.21 04:36, 10 January 2013 (UTC)
- I don't find it arrogant, but one needs to know that Chinese (also Vietnamese, Thai, etc.) is traditionally called monosyllabic, as all or almost all polysyllabic words are made of component words; the exceptions are phonetic transcriptions and characters that have lost their meaning over time, but that's less of a case with Mandarin. --Anatoli (обсудить/вклад) 04:44, 10 January 2013 (UTC)
- I was referring to the "dialect/language" comment, where he regarded "we" as identical to himself in having the personal stance of considering "Chinese is not a single language" to be false. It is a language, by Wikipedia at least. 129.78.32.21 05:04, 10 January 2013 (UTC)
- Views on this differ, but I agree that Chinese topolects are more like dialects than separate languages: even if they may not be mutually comprehensible when spoken and are quite different on the written level, they are often closer than dialects of other languages (provided they are written the Chinese way, using hanzi, not Roman, Cyrillic, Arabic or other scripts). Wiktionary treats Chinese topolects differently as per language headers, but translations are all nested under "Chinese", e.g. Chinese/Mandarin, Chinese/Cantonese, etc. --Anatoli (обсудить/вклад) 05:13, 10 January 2013 (UTC)
Please note that full words in toneless pinyin were explicitly forbidden by votes and almost unanimous agreements, it happened before Metaknowledge became active. --Anatoli (обсудить/вклад) 22:54, 6 January 2013 (UTC)
- So do you support this? —Μετάknowledgediscuss/deeds 06:03, 9 January 2013 (UTC)
- Yes, Support. --Anatoli (обсудить/вклад) 04:31, 10 January 2013 (UTC)
- Erm... so do any of you AWBers/botters want to actually do it? —Μετάknowledgediscuss/deeds 04:59, 10 January 2013 (UTC)
- Delete all pinyin, whether toned or not. Move it to Appendix at least. It is merely a transcription scheme, not even official orthography. 129.78.32.21 05:06, 10 January 2013 (UTC)
- It doesn't work this way. IP users (anonymous) with no or little contributions have little influence and structure is decided after discussions, votes, etc. Entries in Category:Mandarin pinyin do not claim they are proper writing, they are a helpful tool for users to help them find hanzi entries. They have limited information, all information is contained in hanzi entries. Compare bàoyuàn and 抱怨 (bàoyuàn). --Anatoli (обсудить/вклад) 05:19, 10 January 2013 (UTC)
- I knew they contain limited information. Still, they should not exist in the main namespace. This is a dictionary, much more specific than a "tool". The search function is sufficient in directing users to character entries for polysyllabics. With the monosyllabics a link to an Appendix page is all that is necessary. Keeping everything in the main namespace is unworthily energy-consuming. 60.240.101.246 06:40, 10 January 2013 (UTC)
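For illustration only, the relationship discussed in this section between toned forms such as nǚ and toneless entries such as nu can be sketched in Python. This is a minimal sketch under stated assumptions, not how Wiktionary generates or links its pinyin entries: the function name strip_tones and the keep_umlaut flag are hypothetical, and only the four Mandarin tone diacritics are handled.

```python
import unicodedata

# Combining marks used for the four Mandarin tones in Hanyu Pinyin:
# macron (1st), acute (2nd), caron (3rd), grave (4th).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}
DIAERESIS = "\u0308"  # the two dots of ü, which is a vowel distinction, not a tone


def strip_tones(pinyin: str, keep_umlaut: bool = False) -> str:
    """Return pinyin with tone diacritics removed (hypothetical helper).

    With keep_umlaut=False the diaeresis is dropped as well, so that
    a form like "nǚ" maps to the toneless spelling "nu".
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            continue  # drop the tone mark
        if ch == DIAERESIS and not keep_umlaut:
            continue  # drop the vowel distinction too
        kept.append(ch)
    return unicodedata.normalize("NFC", "".join(kept))


print(strip_tones("bàoyuàn"))
print(strip_tones("nǚ"))
print(strip_tones("nǚ", keep_umlaut=True))
```

Keeping the diaeresis optional reflects the point raised above that nu and nǚ differ in vowel as well as in tone marking.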
Proposal to change topical categories for Mandarin to match other languages, sort by pinyin, not radical
See Wiktionary:Beer_parlour/2013/April#Some small changes to Mandarin (also Cantonese, Min Nan) entry structure and about topic categories - suggestion. --Anatoli (обсудить/вклад) 00:20, 11 April 2013 (UTC)
Chinese entries with vowelless pronunciations
The pronunciation transcriptions in the following entries do not list vowels, though I suspect they should:
- -sche (discuss) 23:22, 23 May 2013 (UTC)
- How did you find them, at random or do you have a script for that? User:Tooironic used to add IPA but he is less active now. User:Wyang has developed an entry creation template - Template:cmn new, which also generates the IPA, so for 积累, the IPA is /t͡ɕi⁵⁵ leɪ̯²¹⁴⁻²¹⁽⁴⁾/. My preference is to delete the IPA altogether (replace it with {{rfp}}), rather than showing the wrong info. --Anatoli (обсудить/вклад) 23:36, 23 May 2013 (UTC)
- I found them at random(ish). I used WP:AWB to find entries containing deprecated IPA characters, and happened to notice that in addition to containing deprecated characters, all of these entries also lacked vowels. - -sche (discuss) 23:47, 23 May 2013 (UTC)
Unified Chinese vote
Wiktionary:Votes/pl-2014-04/Unified Chinese is starting tomorrow. --Anatoli (обсудить/вклад) 00:45, 28 March 2014 (UTC)
Capitalisation of demonyms and language names - a mini-vote
Hi,
@Tooironic, @Jamesjiao, @Kc_kennylau, @Wyang
Demonyms and language names are common nouns in Chinese. I suggest using lower case for pinyin and no space, even if dictionaries are inconsistent. Please vote below and invite anyone who might be interested. So, for example: For 中國人/中国人 (Zhōngguórén) - zhōngguórén, 中文 (Zhōngwén) - zhōngwén, not Zhōngguórén/Zhōngguó rén and Zhōngwén.
Rationale: they are nouns and automatic pinyin generation makes them in lower case, Japanese has already implemented this. --Anatoli (обсудить/вклад) 00:45, 8 May 2014 (UTC)
- Support Use lower case, common nouns (not proper nouns), spell pinyin without a space for most demonyms and language name --Anatoli (обсудить/вклад) 00:45, 8 May 2014 (UTC)
- Oppose The official instruction is to use capital letters and spaces. See w:Pinyin#Capitalization and word formation. --kc_kennylau (talk) 09:00, 8 May 2014 (UTC)
- I don't mean place or personal names. It's about languages and demonyms--Anatoli (обсудить/вклад) 09:08, 8 May 2014 (UTC)
- They're just names anyways. Do you capitalize the word English? --kc_kennylau (talk) 09:56, 8 May 2014 (UTC)
- I do, in English, but nihongo or nihonjin is not capitalised. Russian and Finnish don't capitalise those. French only capitalises demonyms, not languages. It can go both ways with language names and demonyms; dictionaries have one or the other way. That's why this discussion. --Anatoli (обсудить/вклад) 10:20, 8 May 2014 (UTC)
- Okay, please find me examples of both cases, and I'll switch to abstain (I'm so lazy). --kc_kennylau (talk) 10:28, 8 May 2014 (UTC)
- (Withdrawn) 00:01, 27 May 2019 (UTC) (modified)
- @Geographyinitiative: Can you make up your mind if you support or oppose this proposal? You voted twice on the same day. It's illegal (your vote won't count) and supporting both options is not part of this vote but you can comment or abstain. Also, please check the topic of the vote. This is about capitalisation of Mandarin pinyin only and only for demonyms e.g. 中國人/中国人 (Zhōngguórén) and language names e.g. 中文 (Zhōngwén). Country, city names, etc. are not part of this vote - they are capitalised. --Anatoli T. (обсудить/вклад) 01:01, 27 May 2019 (UTC)
- Whoops, I see what you are saying. But anyway, here's my response: 汉语拼音正词法基本规则 (2012) 6.3.3 says "专有名词成分与普通名词成分连写在一起,是专有名词或视为专有名词的,首字母大写。例如:Míngshǐ(明史)Hànyǔ(汉语)Yuèyǔ(粤语)Guǎngdōnghuà(广东话)Fójiào(佛教)Tángcháo(唐朝)"
- Xiandai Hanyu Cidian 7 p513 Hànyǔ / p1620 yuèyǔ (but Yuè by itself is capitalized- don't know what's going on there) / p488 No entry for 广东话, but there is one for "Guǎngdōng níngméng" and "Guǎngdōng yīnyuè" / p396 Fójiào / p1273 No entry for 唐朝, but Táng by itself is capitalized
- (Withdrawn) 01:32, 27 May 2019 (UTC)
- (Withdrawn) 07:47, 28 May 2019 (UTC)
- I find your comments in this section about "do them all" disturbing. Do you actually realise we're building a dictionary? It's not a playground. I don't think this minivote is going anywhere, anyway. --Anatoli T. (обсудить/вклад) 08:24, 28 May 2019 (UTC)
- It's a bit extreme, but I think there is sense to it. We could avoid debate about whether chengyu are hyphenated or whether 不知道 is bu zhidao or buzhidao. And including bu zhi dao AND buzhidao might make searches easier. —Suzukaze-c◇◇ 08:48, 28 May 2019 (UTC)
- Don't really have any preference for this as I am generally not interested in Pinyin. Wyang (talk) 01:02, 8 May 2014 (UTC)
- What about proper vs common nouns. Is 普通话 or 美国人 a common or a proper noun? --Anatoli (обсудить/вклад) 09:08, 8 May 2014 (UTC)
- Abstain: Survey major dictionaries and 1950-era documents. —Suzukaze-c◇◇ 07:55, 26 May 2019 (UTC)
Comments
I ran into a capitalized pinyin entry today: Lai2 (linked to from 萊). Capitalized tone-number pinyin like that does look weird to me. Capitalized diacritical pinyin looks less weird. - -sche (discuss) 20:11, 31 May 2014 (UTC)
Capitalisation and part of speech of month names
Related to the preceding topic, capitalisation and part of speech of month names is being discussed at User talk:LlywelynII#Chinese_months_as_proper_nouns. - -sche (discuss) 18:23, 30 August 2016 (UTC)
- Transliterations of months should definitely be lower case and common nouns in Chinese. --Anatoli T. (обсудить/вклад) 23:25, 26 September 2016 (UTC)
Single-character entry format
@Suzukaze-c Hi. Are you willing and interested in expanding the Wiktionary:About_Chinese#Entry_format section regarding single-character entries, the use of the Definitions header, the need for {{zh-hanzi}}, and the parameters for |cat= in {{zh-pron}}? --Anatoli T. (обсудить/вклад) 23:18, 26 September 2016 (UTC)
- I am for most of it, but what about the blurry area of "what belongs on Template:zh-pron/documentation" and "what belongs on Wiktionary:About Chinese"? (For example, how much do we write about |cat= on each page, etc.) —suzukaze (t・c) 23:38, 26 September 2016 (UTC)
- This is a policy document. That's the difference. It's OK to duplicate a bit and link to Template:zh-pron/documentation for detail. --Anatoli T. (обсудить/вклад) 23:47, 26 September 2016 (UTC)
- @Atitarev Wiktionary:About_Chinese#Entries_for_single_characters. What do you think? —suzukaze (t・c) 03:49, 27 September 2016 (UTC)
- Looks good, thank you! The document can be tweaked over time but it's a good start.--Anatoli T. (обсудить/вклад) 03:52, 27 September 2016 (UTC)
Obsolete policies on Middle and Old Chinese
==Historical languages==
{{wikipedia|History of the Chinese language}}
{{wikipedia|Historical Chinese phonology}}
Historical Sinitic languages include the spoken languages {{w|Middle Chinese}} (ltc) and {{w|Old Chinese}} (och), the written language {{w|Literary Chinese}} (lzh), and the protolanguage {{w|Proto-Sino-Tibetan}}. Entries for words in these languages are used, except for Proto-Sino-Tibetan, which is a protolanguage and thus in the Reconstruction namespace. These terms can also appear in etymologies for entries in modern Sinitic languages, and in entries for languages that have borrowed from Chinese, notably Japanese, Korean, and Vietnamese.
Finer distinctions are possible, such as Late Middle Chinese and Early Middle Chinese for the spoken language, and Literary Chinese versus earlier Classical Chinese for the written language. These distinctions can be made in the text of etymologies, but these do not have ISO 639 codes, and thus are not used for level 2 headings.
The precise meaning and status of these “languages” is complicated: narrowly speaking “Middle Chinese” and “Old Chinese” refer to various phonological reconstructions, notably based on rime dictionaries, and do not necessarily refer to a specific historical dialect or common language. Nevertheless, they are useful designations for historical periods.
Most modern Sinitic languages descend from Middle Chinese, with the notable exception of Min, which diverged earlier, with Proto-Min also descending from Old Chinese; see [[w:Historical Chinese phonology#Branching off of the modern varieties|branching of modern varieties of Chinese]]. A notable example of this difference is {{m|zh|茶}}: English {{m|en|tea}} comes from Min, while {{m|en|chai}} comes from other Chinese varieties.
Literary Chinese is significantly different from the spoken languages; this may be compared with Medieval Latin versus Romance languages.
Literary Chinese (lzh) is the correct source language for literary terms in modern Sinitic languages, notably {{w|chengyu}} ({{w|four-character idiom}}s), and in borrowings such as the corresponding Japanese {{w|yojijukugo}}.
===Middle Chinese===
{{wikipedia|Middle Chinese}}
As Middle Chinese phonology is not attested (it is only reconstructed), please be sure to mark pronunciations with *.
===Old Chinese===
{{wikipedia|Old Chinese}}
{{wikipedia|Old Chinese phonology}}
As {{w|Old Chinese phonology}} is not attested (it is only reconstructed), please be sure to mark pronunciations with *. As sources differ, please carefully cite specific references (author and year) for any reconstructions.
Obsolete policies on cognates and stubs
==Cognates and stubs==
Across Sinitic languages, a single written form is very frequently shared across a long historical period and wide geographical area. Thus cognate entries in different languages appear on the same page; this occurs quite frequently for cognates in closely related languages in other scripts, but to nowhere near the same degree as in Sinitic languages.
Due to this, it is generally unhelpful, and possibly incorrect, to create an entry for one Sinitic language simply by copying the heading and definitions for Mandarin. It is unhelpful because this adds no information beyond that which a reader could themselves guess (cognate, so probably the same meaning), and possibly incorrect because words do differ between these languages; blindly copying without a reference is not reliable. Thus, when creating a new Sinitic entry, please try to add ''some'' information distinctive to the particular language, particularly pronunciation, references, or citations.
For etymologies, each entry should include an Etymology section indicating its immediate ancestor term. For native words in modern Sinitic languages this is either Middle Chinese (most) or Proto-Min (thence Old Chinese) for Min languages. Per usual practice (see [[Wiktionary:Etymology]]), it is acceptable to include full etymologies back to Proto-Sino-Tibetan in modern entries. However, unless there is something specific to the etymology of a term in a given language, this is tedious to repeat for all modern languages. It is thus preferred (and sufficient) to only include the full history at representative languages, namely Mandarin and Min Nan (most used in each branch), with other languages just indicating the immediate predecessor and having a link reading “more at Mandarin/Min Nan”. Similarly, it is tedious and not helpful to list contemporary cognate terms ''unless'' some particular relationship or contrast is being given.
Instead, ancestral relationships can be given both backwards (in the Etymology section), to Middle Chinese, Old Chinese, and Proto-Sino-Tibetan, and forwards (in the Descendants section), from Middle Chinese, Old Chinese, and Proto-Sino-Tibetan to later forms. In these Descendants sections, listing pronunciations of descendant terms along with the spelling allows easy comparison, and avoids the duplication of the same listing in all modern forms. These are more useful than sibling relationships between cognates.
New font for Chinese?
@Justinrleung, Suzukaze-c Is it just me who feels the font for Chinese is not as pretty as Japanese? I updated my Mac and it has become even uglier. It lacks the 'weight' (is this the correct term?) in comparison. For example, 產 - even the cangjie input looks prettier than the Chinese font. Thoughts? (Disclaimer: I know nothing about fonts...) Wyang (talk) 06:25, 12 October 2016 (UTC)
- I don't know what it looks like on a Mac, nor what fonts are available on a Mac... —suzukaze (t・c) 06:34, 12 October 2016 (UTC)
- Some screenshots: Hani, Hant and Hans. The Hani font can perhaps be improved... edit: screenshot of the zh-ja comparison. Wyang (talk) 06:44, 12 October 2016 (UTC)
- What does this look like? —suzukaze (t・c) 07:20, 12 October 2016 (UTC)
- It looks like this. I feel that all the ones below are more aesthetically pleasing. Wyang (talk) 07:25, 12 October 2016 (UTC)
- Here are how they look on my Mac on three browsers. I think I've changed my browser's font settings, so I don't have the problem that you have. — justin(r)leung { (t...) | c=› } 07:36, 12 October 2016 (UTC)
- It looks like the browser default (and thus probably the best choice) is "PingFang SC". SimSun seems to be imposed on readers by MediaWiki:Common.css. (man, there are some questionable font choices there...) —suzukaze (t・c) 07:41, 12 October 2016 (UTC)
- Code2000? That's the last font I (or anyone) would want for Chinese. And why would the generic sans-serif be put first? — justin(r)leung { (t...) | c=› } 07:47, 12 October 2016 (UTC)
- (holy shit someone shares my hatred for code2000) It's also weird how the fonts for .Hans and .Hant are defined a second time later on. —suzukaze (t・c) 07:48, 12 October 2016 (UTC)
- I think it may have been me who messed it up before (羞慚). Any recommendations on what the /* Chinese (Han) */ block should be changed to? Wyang (talk) 08:32, 12 October 2016 (UTC)
- Maybe this:
/* Chinese (Han) */
/* Hani: generic */
/* Hans: simplified */
/* Hant: traditional */
.Hani,
.Hans {
    font-family: PingFang SC, Heiti SC, DengXian, Microsoft Yahei, SimHei, Source Han Sans CN, Noto Sans CJK SC, SimSun, NSimSun, SimSun-ExtB, Song, sans-serif;
}
.Hant {
    font-family: PingFang TC, Heiti TC, Microsoft Jhenghei, Source Han Sans TW, Noto Sans CJK TC, PMingLiU, PMingLiU-ExtB, MingLiU, MingLiU-ExtB, Ming, sans-serif;
}
.Hani,
.Hans,
.Hant {
    font-size: 1.2em;
}
.Hani, .Hani *,
.Hans, .Hans *,
.Hant, .Hant * {
    font-style: normal;
    font-weight: normal;
}
big.Hani, strong.Hani, b.Hani, b .Hani,
big.Hans, strong.Hans, b.Hans, b .Hans,
big.Hant, strong.Hant, b.Hant, b .Hant {
    font-size: 137%;
}
.Hani b,
.Hans b,
.Hant b {
    font-size: 125%;
}
- —suzukaze (t・c) 01:58, 13 October 2016 (UTC)
- Ooohhh, I like this. It definitely looks better and more solid than before. If no one objects, we will change it to this until someone proposes an improvement. Wyang (talk) 07:49, 13 October 2016 (UTC)
Simplified Chinese in all templates and modules
@Wyang, Justinrleung, Suzukaze-c, Tooironic, Kc kennylau, Bumm13
I think we should stick to the promise of providing simplified Chinese in all templates and modules. The dialectal data tables currently don't show simplified forms. Do people think we need to cater for that? I understand this will involve formatting and other work, but simplified Chinese users shouldn't feel neglected. --Anatoli T. (обсудить/вклад) 09:31, 14 October 2016 (UTC)
- Yeah, it is disabled for now. Displaying both made the table look very cluttered. I was thinking about developing a js switch for all Chinese entries, allowing the user to choose trad/simp in all Chinese texts (zh-l, zh-x, zh-der, zh-dial, etc.). Wyang (talk) 11:01, 14 October 2016 (UTC)
- But that will only work for registered users. How about we have the simplified characters display as ruby, like this: 我們, 妳們? (We might want to increase the size of the ruby.) — justin(r)leung { (t...) | c=› } 16:36, 14 October 2016 (UTC)
- The switch may be a dropdown underneath the ==Chinese== header, similar to how this page hides the romanisation on a click. The Ruby method is potentially good too, if we can increase the size and align them well, though making links may be more complicated. I think User:Suzukaze-c was trying to write some sort of gadget for this some time ago, but I can't find it now. Wyang (talk) 21:21, 14 October 2016 (UTC)
- Why not just display 我們/我们 with a suppressed romanisation? The columns may need to get wider and care should be taken to have correct conversions with the ability to override. What does everybody think? --Anatoli T. (обсудить/вклад) 02:53, 15 October 2016 (UTC)
- I support the idea of showing simplified Chinese wherever possible and when it doesn't look cluttered. —suzukaze (t・c) 05:24, 29 October 2016 (UTC)
Ranked first (+4089) when sorted by change in #gloss definitions. Wyang (talk) 03:51, 7 November 2016 (UTC)
- Still going strong - number one (+3738) in November 2016. Wyang (talk) 16:44, 13 December 2016 (UTC)
- First again (+3450). 再接再厲! (壓力山大) Wyang (talk) 05:33, 4 February 2017 (UTC)
- First again (+4868). 再接再厲! 奔向100000個詞。 Wyang (talk) 12:26, 9 April 2017 (UTC)
~州
Are we having entries like 印第安納州, or do we treat them like 上海市 (redirect to 上海)? —suzukaze (t・c) 08:16, 11 November 2016 (UTC)
- I'd say nah, unless it's an abbreviation, like 安省. Wyang (talk) 09:14, 11 November 2016 (UTC)
Definitions format overhaul
Hi all. I'm thinking about overhauling the format of Chinese definitions, by using a templated approach which strictly associates word information (part of speech, synonyms, antonyms, measure words, examples, dialectal equivalents, etc.) with the individual senses. It may be along the lines of User:Wyang/zh-def. I think this is more conducive to the efficient expansion of the Chinese content with more synonyms, antonyms, ... etc. information. What does everyone think about the changes? Wyang (talk) 09:16, 23 November 2016 (UTC)
@Suzukaze-c, Justinrleung, Atitarev, Tooironic, Hongthay, Mar vin kaiser Wyang (talk) 10:20, 23 November 2016 (UTC)
- +1, very attractive, but I fear it's too radically different from the standard entry format. —suzukaze (t・c) 09:35, 23 November 2016 (UTC)
- I think the formatting of Chinese definitions should match the formatting of definitions for other languages. —Granger (talk · contribs) 12:17, 23 November 2016 (UTC)
- It looks great, and I know you've worked hard on this, but here are potential problems I see:
- Like the others have said, it's too different from other languages.
- The wikicode would probably be harder to pick up for new editors. (It'll take me some time to get used to.)
- There's a bit of repetition, like putting |pos=part multiple times for 的. Is that something we really want to do?
- It would probably take up more Lua memory, which would not be necessary if we keep the current format. — justin(r)leung { (t...) | c=› } 13:31, 23 November 2016 (UTC)
- Thanks guys. It is a big change, but my feeling is that this sort of sense-synonym/antonym/... integration has to be done sooner or later; there were some calls before (for example User:DTLHS/export, which was referenced in this layout), but no one has really tested doing it. The reason for the integration is that synonyms etc. are only valid on a sense-specific basis, the same as classifiers (which has already been adapted to be sense-specific) and dialectal equivalents. Moedict and Cantodict also do the same.
- The code can probably be simplified, such as by switching pos to argument 1, and definition to argument 2. The enclosing zh-def template may be omittable too, if we can automatically generate the <ol>~</ol> using some CSS magic. If a JavaScript gadget could be designed to allow GUI editing of the individual senses, while the raw code remains unchanged, that would be the most fantastic. The increase in Lua memory usage seems quite small - I tested with the equivalent current code, which was 18.96 MB, slightly smaller compared to the new version (19.62 MB). A good thing about enclosing senses is that sense ids can be created and used to reference individual senses elsewhere. Bot conversion of the definitions should be reasonably straightforward too. Wyang (talk) 14:24, 23 November 2016 (UTC)
- I removed the need for the outer enclosing template, and integrated all the code into a single template. It looks like this:
{{zh-def
|n|[[sugar]]
|syn: 食糖
|ant: 鹽
|x1: {{zh-x|糖尿病|[[diabetes]]}}
|x2: {{zh-x|糖{tong4}水|[[sugar water]]|C}}
|-
|n|[[candy]]; [[sweets]]
|mw: m:塊-“piece”,c:嚿-“piece”
|syn: 糖果
|x1: {{zh-x|棒棒糖|lollipop|C}}
|x2: {{zh-x|糖 食 得 多 冇益。|Eating too much '''candy''' is unhealthy.|C}}
|-
|n|{{zh-alt-form|醣|[[saccharide]]}}
|lb: organic chemistry
|x1: {{zh-x|多糖|polysaccharide}}
}}
- where the different senses are separated by |-, and the effect is the same. It should be easier to use now. The memory requirement is slightly reduced in the process: 18.97 MB, nearly the same as the current format (18.96 MB). Wyang (talk) 23:18, 23 November 2016 (UTC)
- @Atitarev, Tooironic, Hongthay, Mar vin kaiser: Perhaps the previous ping didn't work correctly. Wyang (talk) 09:32, 24 November 2016 (UTC)
- Hi. Sorry, I got the ping but I'm a bit confused. It's a good effort but I agree that this is a radical change from the current format and, again, too different from other languages. Displaying PoS's in front of definitions (translations) is definitely worth considering. --Anatoli T. (обсудить/вклад) 09:38, 24 November 2016 (UTC)
- Thanks Anatoli. I accidentally discovered something I wrote > 2 yrs ago on Talk:一致, and it seems my desire to change the format has been long-standing... The division of definitions by part of speech is really not ideal for analytic and less inflecting languages (努力, 保險, 可能). IMO treating synonyms, antonyms and so forth as belonging to senses is also important, as we add more and more of these see-also-type of words. At the moment 生 (shēng) looks fairly neat (albeit not as clear as if the PoS info were next to the senses), but if I add synonyms, antonyms, see-also terms as in User:Wyang/zh-def#生, the page could become quite confusing. Wyang (talk) 11:22, 24 November 2016 (UTC)
- As already mentioned above, these are radical changes you are proposing. Since they go to the heart of Wiktionary's layout, you'd be better off seeing if you can carry them out by getting support from other members of the community for ALL languages, not just Chinese. ---> Tooironic (talk) 12:49, 24 November 2016 (UTC)
- I really have no faith in the Wiktionary community in this. Haiz.
- If we, as the Chinese-editing community, believe that a current practice is unfittingly designed for Chinese, we should strive to achieve what we think is most suitable. I myself only have limited power in making a difference. It's like the opposition to {{zh-pron}} formatting and the Chinese merger before; other people are unfamiliar with this, so unless we adopt what is right, we won't progress efficiently. Wyang (talk) 11:58, 25 November 2016 (UTC)
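As a rough illustration of the single-template layout discussed in this section, where senses are separated by |-, the grouping of parameters into senses could be sketched in Python as follows. This is only a sketch under stated assumptions, not the actual Lua implementation behind User:Wyang/zh-def: the function names split_top_level and group_senses are hypothetical, and the parser assumes nesting is expressed only with {{ }} and [[ ]].

```python
def split_top_level(body: str) -> list[str]:
    """Split template arguments on '|' while ignoring pipes nested
    inside {{...}} or [[...]] (hypothetical helper)."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current).strip())
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current).strip())
    return parts


def group_senses(params: list[str]) -> list[list[str]]:
    """Group arguments into senses, starting a new sense at each '-'."""
    senses, current = [], []
    for param in params:
        if param == "-":
            senses.append(current)
            current = []
        else:
            current.append(param)
    if current:
        senses.append(current)
    return senses


# Shortened version of the 糖 example quoted above.
example = (
    "n|[[sugar]] |syn: 食糖 |ant: 鹽 |x1: {{zh-x|糖尿病|[[diabetes]]}} "
    "|- |n|[[candy]]; [[sweets]] |syn: 糖果 "
    "|- |n|{{zh-alt-form|醣|[[saccharide]]}} |lb: organic chemistry"
)
for sense in group_senses(split_top_level(example)):
    print(sense)
```

Tracking bracket depth keeps the pipes inside nested calls such as {{zh-x|...}} from being mistaken for argument separators, which is the main thing a flat split on | would get wrong.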
Sichuanese
I'd like to add entries for Sichuanese, but it appears that it doesn't have an ISO code, so one would have to be created. I don't think it should be included under the Mandarin section for zh-pron and other places due to the differences between the two (47.8% lexical similarity and < 60% intelligibility), and also there is the sheer number of potential listings that could be under Mandarin (e.g. Shandong, Shaanxi, Dongbei etc.). Maybe listing under Southwest Mandarin would be okay though. Most of the coverage would probably be on the Chengdu dialect, but I'm not sure how other dialects, some of which are quite different, would be accounted for.--Prisencolin (talk) 02:13, 10 December 2016 (UTC)
- (TBH it's currently impossible to nest it under Mandarin with the current zh-pron code —suzukaze (t・c) 03:53, 10 December 2016 (UTC))
- It is a variety of Mandarin though - it would make more sense to group it under Mandarin and reorganise the Standard Mandarin tags accordingly. Wyang (talk) 16:44, 13 December 2016 (UTC)
- Sichuanese should definitely be nested under Mandarin but I don't have the guts to modify Module:cmn-pron. —suzukaze (t・c) 12:47, 14 December 2016 (UTC)
- I'm not contesting that it's part of the Mandarin branch, I'm just concerned that at some point we might have over a dozen different entries under "Mandarin" that could appear and it might be a bit disorganized. Why don't we just create a new module for "Southwest Mandarin" and put it under that? It still has "Mandarin" in the name after all. It also has the benefit of being able to group related varieties together in specific subcategories.--Prisencolin (talk) 04:55, 15 December 2016 (UTC)
'The body of this page needs to be updated to explain the new policy'
Hi, regarding the message reading 'The body of this page needs to be updated to explain the new policy.', I'd like to know when the update is going to be carried out, or at least where I can read the new policy. Thanks in advance. --Backinstadiums (talk) 15:48, 8 June 2017 (UTC)
- @Backinstadiums: I think it's pretty much up to date now. @Suzukaze-c, Atitarev, Wyang, is there any old policy still lingering around on the page? Can we remove that notice? — justin(r)leung { (t...) | c=› } 16:02, 8 June 2017 (UTC)
- Yes, the notice can go now. I put it there after we moved to the unified Chinese L2 header but the policy described the old standards. Now it's matching what we are doing.--Anatoli T. (обсудить/вклад) 22:15, 8 June 2017 (UTC)
- Agreed. Wyang (talk) 23:01, 8 June 2017 (UTC)
- The Etymology section is still outdated (I'm not sure how to update it), but otherwise I think it's OK. —suzukaze (t・c) 23:08, 8 June 2017 (UTC)
- @Suzukaze-c: Could we just remove the part that mentions literary Chinese altogether for now? — justin(r)leung { (t...) | c=› } 23:17, 8 June 2017 (UTC)
- @Justinrleung I don't know if this policy is a draft. It's official - either endorsed by a vote or unchallenged by the community. The format of soft redirects wasn't endorsed, though but wasn't challenged either. There are still thousands of unconverted Mandarin and Cantonese hanzi entries, which are hard to convert for obvious reasons. Things to discuss are pinyin, jyutping and POJ entries (headers, categories and templates), Cyrillic Dungan and Arabic Xiao'erjing. What to do with topolects without an established writing system and lack of transliteration standards.--Anatoli T. (обсудить/вклад) 23:47, 8 June 2017 (UTC)
- @Atitarev: Is there a better template to use? {{policy}} seems too strong, but {{policy-DP}}/{{policy-ED}} seem too weak. — justin(r)leung { (t...) | c=› } 23:51, 8 June 2017 (UTC)
- @Justinrleung I see your point, thanks. Yes, leave it as is. Another thing we need to do for Chinese (and any language in scriptio continua, including Vietnamese) is to define CFI. The definitions of "word" and "sum of parts" are not exactly the same in Chinese as in languages with spaces. Even German or Finnish criteria for inclusion differ from English. --Anatoli T. (обсудить/вклад)
- On that note, I would like to suggest that we relax the part of CFI on personal names slightly, to allow names which are directly found in idioms and set phrases, such as
- Wyang (talk) 03:54, 10 June 2017 (UTC)
Looking to improve Wenzhounese coverage
I started the outline of an "about" page at User:Prisencolin/wenzhou. Wenzhounese should be distinct enough from other Wu dialects to warrant a page by itself.--Prisencolin (talk) 18:12, 5 July 2017 (UTC)
- (See also Template_talk:zh-pron#Wenzhou_dialect —suzukaze (t・c) 18:15, 5 July 2017 (UTC))
- @Prisencolin, Suzukaze-c: I think the first step is to add Wenzhounese to {{zh-pron}}. We do have an editor from Wenzhou, @Mteechan, so it would be great if we could start adding Wenzhounese to Wiktionary. We need to determine which romanization system we should be using. @Wyang, Atitarev, any thoughts? — justin(r)leung { (t...) | c=› } 18:40, 5 July 2017 (UTC)
- Wupin, or Wu romanization, the one wu-chinese.com uses will do. Nevertheless, it could be improved to some extent. Mteechan (talk) 18:52, 5 July 2017 (UTC)
- I'm still curious as to how irregular the phonology (esp. tone sandhis) is - this will determine the kind of system that would be ideal for use. Wyang (talk) 23:01, 5 July 2017 (UTC)
- Well, the tone sandhi is pretty complicated. I've made a lookup table for 2-word sandhi, but it's based on my accent, not the de facto "standard" accent in urban Wenzhou. Other than that, the phonology is not that irregular. Mteechan (talk) 04:38, 6 July 2017 (UTC)
- @D.s.ronis has done some work on Wenzhounese on Wikipedia, such as creating Wenzhounese romanisation.--Prisencolin (talk) 06:33, 9 July 2017 (UTC)
- Glossika has a Wenzhounese course as well, for those interested. Wyang (talk) 07:42, 11 July 2017 (UTC)
Taishanese and Teochew
Taishanese and Teochew now have codes, pursuant to the discussion archived at Wiktionary:Language treatment requests/Archives/2015-19#Taishanese_and_Teochew. - -sche (discuss) 07:24, 19 January 2018 (UTC)
Header of non-Chinese script entries
Wiktionary:Votes/pl-2014-04/Unified Chinese decided that words written in Chinese characters should be unified under the Chinese header. However, it also says that the formats of templates in words written in non-Han scripts devised specifically for particular topolects above are not the subject of the vote and can be discussed separately if needed.
Sinitic terms (lemma or not) written in non-Han scripts include:
- Pinyin romanization of words
- Jyutping romanization of characters
- POJ form of words
- Cyrillic Dungan
- Xiao'erjing words
- others, like zhuyin fuhao
There are two different headings to use:
- Use the topolect as heading (e.g. Mandarin, Cantonese, Min Nan, Dungan)
- Use Chinese as heading for all terms (like this and this)
Also worth pointing out:
- Currently the heading of Pinyin entry is inconsistent (29822 Mandarin header vs 1318 Chinese header; MediaWiki:Gadget-AcceleratedFormCreation.js uses "Chinese")
- There is precedent for not using a dialect-specific header for terms in an orthography exclusive to a specific dialect; see Wiktionary:Votes/2011-10/Unified Romanian
I propose to migrate all Sinitic terms (lemma or not) to the Chinese header and eliminate any topolect header, to finish the unification of Chinese. Any thoughts? Note that this proposal only concerns headers and says nothing about categories. --Zcreator (talk) 02:24, 4 February 2018 (UTC)
- Support. Wyang (talk) 03:06, 4 February 2018 (UTC)
- Weak Oppose, since romanizations like Jyutping are made specifically for Cantonese, unlike hanzi spellings, which can be shared across dialects.
- AcceleratedFormCreation.js seems to be using "Chinese" because the accelerated creation links are found under a Chinese header. —Suzukaze-c◇◇ 07:05, 3 May 2018 (UTC)
- (Notifying Wyang, Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang, Suzukaze-c, Justinrleung, Hongthay, Mar vin kaiser, Dokurrat, Zcreator, Zcreator alt, Dine2016): Can we agree on some of the existing non-Chinese headers, maybe one at a time? I think pinyin entries could look like this under the Chinese L2 header:
- Hanyu Pinyin Běijīng (Zhuyin ㄅㄟˇ ㄐㄧㄥ)
- The only difference from the current Běijīng entry would be a different L2 header (Chinese) and a linked name of the romanisation. Since Hanyu Pinyin is only used for Mandarin, it becomes obvious, which lect the romanisation applies to. --Anatoli T. (обсудить/вклад) 07:29, 3 May 2018 (UTC)
- My understanding is that unification of Chinese reduces duplication due to the large number of shared written forms across lects.
- There is no such concern for romanizations, which are unique to a certain lect, so I think they should not use the "Chinese" header. Min Nan is Chinese, but I am not yet convinced that chai-iáⁿ#Chinese is helpful. I imagine that a "unified Chinese" plan would never have taken place if China used phonetic scripts, and there were no hanzi to "bind" lects together.
- —Suzukaze-c◇◇ 08:33, 3 May 2018 (UTC)
- Thanks for the response. Let's see what other people think. Converting Min Nan Pe̍h-ōe-jī to Chinese L2 was looked at favourably but not everyone thinks we should have Hanyu pinyin entries in the first place. --Anatoli T. (обсудить/вклад) 13:22, 3 May 2018 (UTC)
Dungan Cyrillic transliteration
(Notifying Wyang, Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang, Suzukaze-c, Justinrleung, Hongthay, Mar vin kaiser, Dokurrat, Zcreator, Zcreator alt, Dine2016): Hi. I think Dungan Cyrillic should be transliterated into Roman letters in Chinese entries.
I also think we should review the method itself, which is not quite standard, anyway - e.g. get rid of Cyrillics in the translit and make it more meaningful. --Anatoli T. (обсудить/вклад) 03:20, 11 April 2018 (UTC)
- Translit in {{zh-pron}}: definitely.
- About the translit itself: I based it on w:ru:Дунганская_письменность#Таблица_соответствия_алфавитов, which is where ь came from. I'm not sure what it should be replaced with. î? —Suzukaze-c◆◆ 03:25, 11 April 2018 (UTC)
- @Suzukaze-c: Thanks, let me think about it when I have a bit more time ("î" may not be a bad suggestion) and let's see what others think about it. --Anatoli T. (обсудить/вклад) 03:36, 11 April 2018 (UTC)
Superscript tone numbers
(Notifying Wyang, Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang, Suzukaze-c, Justinrleung, Hongthay, Mar vin kaiser, Dokurrat, Zcreator, Zcreator alt, Dine2016): Would tone numbers for Gan, Xiang, Jin, Teochew, etc. look better if they were made superscript by default like Cantonese, e.g. 所有 (so² jau⁵)? This is not currently applicable to all templates - 所有 (so2 jau5). Pretty sure it was implemented by Kenny. --Anatoli T. (обсудить/вклад) 11:40, 15 April 2018 (UTC)
- I agree. Wyang (talk) 12:54, 15 April 2018 (UTC)
- I agree as well. I think this should also be automatic in {{zh-l}}. — justin(r)leung { (t...) | c=› } 18:19, 15 April 2018 (UTC)
- What's the more recognised standard? I am not familiar with pronunciation schemes for dialects other than Mandarin. Jamesjiao → T ◊ C 22:02, 16 April 2018 (UTC)
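For illustration only, the default-superscript behaviour discussed above could be sketched roughly as follows. This is a minimal Python sketch, not the code used by any Wiktionary module; the function name superscript_tones and the use of <sup> tags are assumptions made for the example.

```python
import re

# Hypothetical helper (not an existing Wiktionary function): wrap the tone
# digits that end a syllable in <sup> tags, e.g. for Jyutping-style input.
# A digit run qualifies only if it directly follows a letter and is not
# followed by another letter or digit.
_TONE = re.compile(r"(?<=[^\W\d_])(\d+)(?![^\W_])")

def superscript_tones(romanisation: str) -> str:
    """Return the romanisation with syllable-final tone numbers superscripted."""
    return _TONE.sub(r"<sup>\1</sup>", romanisation)

print(superscript_tones("so2 jau5"))
```

The sketch deliberately ignores sandhi notations such as a second digit after a hyphen, so a real implementation would need rules per lect.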
"other" dialects
Having {{zh-dial}} but not adding relevant IPA to entries seems odd to me. Perhaps we need to rethink {{zh-pron}}. —Suzukaze-c◇◇ 04:43, 25 September 2018 (UTC)
- By all means do please. Something similar to Xiaoxuetang would be ideal, but it would also mean a large maintenance requirement. Wyang (talk) 05:00, 25 September 2018 (UTC)
Romanisations of Chinese
According to the present policy, Pinyin romanisations of monosyllables and polysyllables for Standard Mandarin (aka Putonghua), such as "yī" and "bùguò", are allowed. However, for Standard Cantonese, only Jyutping romanisations of monosyllables are allowed (e.g. jyut6, ping3), while those of polysyllables are disallowed. Why is there such unequal treatment of the two languages? I believe that Jyutping romanisations of polysyllables should be allowed and created en masse, as Pinyin romanisations of polysyllables are allowed and exist in large quantity. Jonashtand (talk) 06:34, 9 December 2018 (UTC)
- @Jonashtand: It was a result of this vote. The only reason for only monosyllables I see in that vote is that "this is also what is done for pinyin with tone numbers". — justin(r)leung { (t...) | c=› } 06:49, 9 December 2018 (UTC)
Proposed change to zh-der
(Withdrawn) 22:51, 4 April 2019 (UTC)
- That would be very confusing to mix up different romanisations. Also, I think this topic is for Chinese editors only, so it can be discussed at Wiktionary talk:About Chinese rather than here. --Anatoli T. (обсудить/вклад) 23:14, 4 April 2019 (UTC)
- moved to Wiktionary talk:About Chinese per suggestion
- If it's really that confusing, then we should split up all Chinese into different dialects. Right now we are basically saying, if an entry has a Mandarin pronunciation, we can tell people about that, but if it has only dialect pronunciations, we're just going to ignore those and not let you see them.
- (Withdrawn) 23:33, 4 April 2019 (UTC)
- Of course we can but it needs to be split by dialects, if they are not Mandarin. I don't know if we have a version of {{zh-l}} for specific lects; if not, the transliterations need to be provided manually. --Anatoli T. (обсудить/вклад) 23:54, 4 April 2019 (UTC)
- Note that if some feature is missing for a language or dialect, it's because nobody has done it. --Anatoli T. (обсудить/вклад) 23:57, 4 April 2019 (UTC)
- (Withdrawn) 00:36, 5 April 2019 (UTC)
- (Withdrawn) 00:53, 5 April 2019 (UTC) (modified)
- (Withdrawn) 00:57, 5 April 2019 (UTC)
- This is easier said than done. Something we need to consider is which lect to display (if we're only displaying one). Mandarin has that status of being the standard for Chinese, but there's no rule to say which one should be chosen if a term is non-Mandarin but used in many other lects. Do we want to display several lects at a time? Do we need a separate parameter to specify which lect(s) we want to display? There are just so many things to consider that editors like me haven't actually bothered to address it. — justin(r)leung { (t...) | c=› } 03:08, 5 April 2019 (UTC)
- @Geographyinitiative: Please don't claim you received endorsement, if nobody reverted your edit. A good practice is, at the very minimum, advise the users with a {{qualifier|CHINESE VARIETY}}, that the term is dialectal or doesn't belong to the same lect, as in 斯普特尼克 (sīpǔtèníkè) and 史撥尼克/史拨尼克. --Anatoli T. (обсудить/вклад) 03:43, 5 April 2019 (UTC)
- (Withdrawn) 04:10, 5 April 2019 (UTC)
- (Withdrawn) 04:17, 5 April 2019 (UTC)
- I don't like your e.g. diff because, again, you are mixing dialects together without supplying the important information, eg. {{q|Cantonese}}: (Cantonese) next to the word (before or after) or under a different subheader. --Anatoli T. (обсудить/вклад) 04:39, 5 April 2019 (UTC)
(Withdrawn) 01:17, 22 April 2019 (UTC)
- ([1] —Suzukaze-c◇◇ 01:20, 22 April 2019 (UTC))
- As a temporary measure, what about making extract_gloss return "" if extracting failed (that is, if the gloss contains characters like |, =, {, etc.)? --Dine2016 (talk) 07:15, 23 April 2019 (UTC)
- (Withdrawn) 22:10, 24 April 2019 (UTC)
- (Withdrawn) 02:01, 26 April 2019 (UTC)
- (Withdrawn) 02:05, 26 April 2019 (UTC)
- (Withdrawn) 04:05, 26 April 2019 (UTC)
- You need to have a more positive attitude. The template can't handle other templates inside it at the moment, which can, I'm sure, be resolved. What can YOU do to help this "nonsense dictionary"? Edits are linked like this: diff. --Anatoli T. (обсудить/вклад) 04:11, 26 April 2019 (UTC)
- (Withdrawn) 04:32, 26 April 2019 (UTC)
- This is also a technical issue, not specific to Chinese. WT:GP is a place to ask for problems, not necessarily Chinese editors. Wyang has left and the rest of us may not be Lua and template savvy, not on the same level, anyway. I didn't ask for full definitions to be added to {{zh-see}}, which caused the issue in the first place. We can revert to simple soft-redirect without any definitions, which will invariably cause problems. --Anatoli T. (обсудить/вклад) 04:37, 26 April 2019 (UTC)
- (Withdrawn) 21:16, 27 April 2019 (UTC)
- @Geographyinitiative That is a separate issue, which I have fixed. — justin(r)leung { (t...) | c=› } 05:01, 28 April 2019 (UTC)
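As a rough illustration of the temporary measure Dine2016 suggested above, the fallback could look something like the following. This is a Python sketch under stated assumptions, not the actual extract_gloss code (which lives in a Lua module); the name safe_gloss is hypothetical, and the character set contains only the three characters named in the suggestion.

```python
# Characters named in the suggestion; the "etc." there is left open, so this
# set would need extending before any real use.
UNSAFE_CHARS = frozenset("|={")

def safe_gloss(gloss: str) -> str:
    """Return the gloss unchanged, or "" if it appears to contain wikitext markup.

    Hypothetical stand-in for the proposed fallback: rather than showing a
    half-parsed template as the gloss, show nothing at all.
    """
    if any(ch in UNSAFE_CHARS for ch in gloss):
        return ""
    return gloss

print(repr(safe_gloss("to follow")))
print(repr(safe_gloss("{{lb|zh|Min Nan}} to follow")))
```

The trade-off is that some perfectly readable glosses containing "=" would also be blanked, which seems acceptable for a stopgap.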
POJ entries
(Withdrawn) 06:06, 1 May 2019 (UTC)
- @Geographyinitiative: Using zh-see for POJ is still experimental, but I think they all should be converted to zh-see eventually. (I can't seem to find the relevant discussions though.) About the quotation, it should either be put on the 綴 page or in Citations:tòe. BTW, tòe does not mean "with" but "to follow". — justin(r)leung { (t...) | c=› } 07:57, 1 May 2019 (UTC)
- I support the use of {{zh-see}} for Min Nan POJ when there is a matching hanzi entry, to be decided or simply kept with entries like o͘-tó͘-bái. --Anatoli T. (обсудить/вклад) 08:24, 1 May 2019 (UTC)
(Withdrawn) 21:52, 20 May 2019 (UTC)
- It's such a big deal, wow. All Chinese editors on board! LOL diff, diff, diff--Anatoli T. (обсудить/вклад)
- (Withdrawn) 08:01, 21 May 2019 (UTC)
- We are a dictionary too. The references give the correct pronunciations. As for capitalisations, spacing and hyphenation of the transliterations, it’s up to dictionary owners’ rule. Pinyin is not that important and the hyphen was used for etymological purposes in those dictionaries, nothing to do with the Chinese spelling or how the word is pronounced. --Anatoli T. (обсудить/вклад) 08:10, 21 May 2019 (UTC)
- Support —Suzukaze-c◇◇ 22:35, 21 May 2019 (UTC)
- @Geographyinitiative, Atitarev, Suzukaze-c: I'm pretty sure we've talked about this many, many times before. Is there a need to revisit this issue? We've basically settled on no hyphens in chengyu because the structure is often debatable. — justin(r)leung { (t...) | c=› } 23:01, 21 May 2019 (UTC)
- I'm not upset about the status quo, but I wouldn't mind if someone wanted to comprehensively take the time. —Suzukaze-c◇◇ 23:08, 21 May 2019 (UTC)
- Relevant discussion: Talk:興高采烈, Talk:無精打采, User talk:Tooironic#Hyphens in chengyu. If we do decide to use hyphens in chengyu, it will take a lot of work to actually go through all the chengyu entries and correct them accordingly. — justin(r)leung { (t...) | c=› } 23:14, 21 May 2019 (UTC)
- (Withdrawn) 23:58, 21 May 2019 (UTC)
- There's no such thing as "the right thing" if policies and conventions differ between different sources. Using correct Chinese is important. Pinyin is a tool, not a language.
- As dictionary editors, we can and should decide on rules, make a proposal, have a vote, write it up as a language policy. Pinyin is not a writing system, this should only be used as such, even if we allow soft-redirect entries for pinyin.
- Both suggestions - "no space in chengyu" or "follow certain dictionary standard" have merits. I suggested a vote in Wiktionary_talk:About_Chinese#Capitalisation_of_demonyms_and_language_names_-_a_mini-vote on this very page, which was mostly ignored. Chinese editors are less worried in how Chinese words are rendered in Roman scripts, if it doesn't impact pronunciations, such as spaces, capitalisation or hyphenation but we can still define some rules and follow them. --Anatoli T. (обсудить/вклад) 00:45, 22 May 2019 (UTC)
- (Withdrawn) 02:06, 26 May 2019 (UTC)
- I oppose adding all romanisations from all dictionaries. We may want adding zhuyin for Min Nan and Hakka but we almost covered all romanisations for varieties in use, adding all possible variants is silly, no dictionary does it. Any dictionary defines romanisations they use and consistently stick to the definitions. --Anatoli T. (обсудить/вклад) 07:34, 26 May 2019 (UTC)
- (@Geographyinitiative: I invite you to comment on User:Suzukaze-c/p/mul#Chinese at its talk page. —Suzukaze-c◇◇ 07:47, 26 May 2019 (UTC))
(Withdrawn) 02:01, 26 May 2019 (UTC)
- @Geographyinitiative: I haven't found a better explanation. The glyph origin comes from 臺灣閩南語按呢寫. (BTW, this discussion should be at WT:TR or WT:ER). — justin(r)leung { (t...) | c=› } 02:58, 26 May 2019 (UTC)
Add HSK level of the Hanzi
The Japanese section shows the 常用漢字 level of the Kanji, so I'd like to propose adding the HSK level of the hanzi too --Backinstadiums (talk) 14:39, 14 July 2019 (UTC)
- (Withdrawn) 11:43, 15 July 2019 (UTC)
- @Geographyinitiative: where have you used it? The best lists are here http://www.hskhsk.com/word-lists.html. I really like the one of homophones http://hskhsk.pythonanywhere.com/homophones --Backinstadiums (talk) 19:18, 15 July 2019 (UTC)
- (Withdrawn) 19:34, 15 July 2019 (UTC)
- @Geographyinitiative: what other tests are you referring to? --Backinstadiums (talk) 19:39, 15 July 2019 (UTC)
- (Withdrawn) 19:46, 15 July 2019 (UTC)
- @Geographyinitiative: I see. Why do kanji only show the list of 常用漢字? --Backinstadiums (talk) 20:05, 15 July 2019 (UTC)
- (Withdrawn) 22:21, 15 July 2019 (UTC)
- @Geographyinitiative: In any case, most tests classify the same characters in the same levels, so only a group of characters would have two different levels at most. --Backinstadiums (talk) 22:54, 15 July 2019 (UTC)
- @Geographyinitiative Where can I add my proposal? --Backinstadiums (talk) 17:39, 17 July 2019 (UTC)
- (Withdrawn) 20:51, 17 July 2019 (UTC)
- @Geographyinitiative in the discussion page? --Backinstadiums (talk) 23:13, 17 July 2019 (UTC)
Proposal: adding elasticity/flexibility
I'll be concise for those knowledgeable, and refer to brief and basic bibliography for those who are not.
Chinese elasticity/flexibility is a lexical property of Chinese terms, two sides of the same coin, which must be reflected in the very same entry for a certain lemma.
Therefore, for example the fifth version of the prestigious XDHYCD (Xiandai Hanyu Cidian) applies mutual annotations in the respective entries, so that the entry for 煤 mei ‘coal’ reads "noun, … also called 煤炭 mei-tan ‘coal-charcoal’", and the entry for 煤炭 meitan ‘coal-charcoal’ is annotated as "noun, 煤 mei ‘coal’".
Unfortunately, in Wiktionary this is currently reflected incorrectly: in the broadly termed 'compounds' section, as a synonym or after 'see also', and only for the monosyllabic version.
Please, before commenting read the following brief article (and if necessary further references within it); if you still have any questions, I'll be glad to try and answer them.
http://www-personal.umich.edu/~duanmu/2014Elastic.pdf
Finally, elasticity from Xiandai Hanyu Cidian 2005 has been tabulated in the following open access thesis
deepblue.lib.umich.edu/bitstream/2027.42/116629/1/yandong_1.pdf
I hope an enriching discussion ensues for this critical lexicographical issue --Backinstadiums (talk) 13:43, 19 August 2019 (UTC)
phoneticity
According to DeFrancis' "Visible Speech", Chinese "phoneticity" reaches up to 90% for "the two thousand or so that are necessary for basic literacy". --Backinstadiums (talk) 12:23, 21 September 2019 (UTC)
Adding homophones irrespective of tone
As the entry ping already exists as "Nonstandard spelling of pīng, píng and pìng", it stands to reason to add a section in Chinese characters' entries for homophonic words irrespective of tones too --Backinstadiums (talk) 09:59, 1 October 2019 (UTC)
Wiedenhof's A Grammar of Mandarin
According to Wiedenhof's A Grammar of Mandarin, page 43,
The final spelled as -o is only combined with the initials b-, p-, m-, f-. This vowel matches the vowel part of the final -uo [wɔ].
However, on page 45, the author states
The final -uo [wɔ] is spelled as -o before the labial initials, b-, p-, m-, f-.
According to page 44,
"Weng" syllables rhyme with the final -ong [ʊŋ]
However, he had specified weng as [wʌŋ], adding
Weng displays the same type of variation as the final -un: it may lose its rounding toward the end, [wəŋ].
Page 66 reads
there's free variation between [wʌŋ] and [ʊŋ] for both finals, with complementary distribution.
Can somebody clarify these contradictions? --Backinstadiums (talk) 20:05, 2 October 2019 (UTC)
bound morphemes
Is it possible to graphically show bound lemmas just as we do, for example, for the English -able? --Backinstadiums (talk) 10:52, 12 October 2019 (UTC)
Mandarin tone contours along diphthongs
I'd like to add information about how tone contours are orally distributed along Mandarin rhyme diphthongs. I've tried to find a graphic with variables such as time, volume of speech, pitch levels, etc. to no avail --Backinstadiums (talk) 09:07, 22 October 2019 (UTC)
- This is more Wikipedia material than Wiktionary, since this is very specific phonetic information. — justin(r)leung { (t...) | c=› } 01:16, 24 October 2019 (UTC)
Enable searches in zhuyin
Every entry shows its zhuyin rendition, so it makes no sense not to use it in the searchbox. --Backinstadiums (talk) 10:29, 10 December 2019 (UTC)
- (Withdrawn) 10:54, 10 December 2019 (UTC)
- The search box functions cannot be changed by users. Perhaps this can be proposed for next year's meta:Community Wishlist Survey. — justin(r)leung { (t...) | c=› } 21:06, 10 December 2019 (UTC)
zh-dial vs. including dialectal terms in the thesaurus
i don't like it. —Suzukaze-c◇◇ 23:01, 16 December 2019 (UTC)
- @Suzukaze-c: I don't like it either. @Tooironic, Mar vin kaiser, Atitarev, Dine2016, any thoughts? — justin(r)leung { (t...) | c=› } 23:50, 16 December 2019 (UTC)
- (Withdrawn) 00:28, 17 December 2019 (UTC) (modified)
- @Geographyinitiative: Thanks for your input and raising this issue, but the "Chinese" header is not going any time soon. That's why we need to define the roles of the two templates ({{zh-dial}} and {{zh-syn-saurus}}) so that they don't overlap in function. — justin(r)leung { (t...) | c=› } 00:41, 17 December 2019 (UTC)
- (Withdrawn) 01:43, 17 December 2019 (UTC)
- @Geographyinitiative: The Chinese header's not going any time soon, even if it should be. There's no quick and dirty way to change existing framework to separate lects, whatever that looks like. Sometimes it's more about usability than "correctness", whatever that may look like. — justin(r)leung { (t...) | c=› } 01:51, 17 December 2019 (UTC)
- (Withdrawn) 01:58, 17 December 2019 (UTC)
- @Justinrleung: Thanks for pinging. Can we see some {{diff}}'s, please?
- I strongly oppose unilateral actions by User:Geographyinitiative. He again knowingly violates the agreed format. It's not just a viewpoint, it's how it's done. You can express away your opinion, it's your actions in the mainspace, which is the problem. --Anatoli T. (обсудить/вклад) 10:07, 18 December 2019 (UTC)
- @Atitarev: See something like 妓女, where they both exist. — justin(r)leung { (t...) | c=› } 16:30, 18 December 2019 (UTC)
- @Suzukaze-c What's the issue in your view? ---> Tooironic (talk) 09:14, 19 December 2019 (UTC)
- Including dialectal terms in the Thesaurus is an inferior duplication of zh-dial content. —Suzukaze-c◇◇ 01:55, 20 December 2019 (UTC)
Wade-Giles Issue
(Withdrawn) 13:07, 29 December 2019 (UTC)
(Withdrawn) 13:31, 29 December 2019 (UTC)
- The translit without the “u” is not WG but another phonetic non-standard transliteration based on WG. For English and other speakers, it’s easier to make sense of eg “ko” than “kuo”, etc, hence "Komindang". Anatoli T. (обсудить/вклад) 14:23, 29 December 2019 (UTC)
Jyutping /-a/, /-oet/
New rimes added in 2018 (see the link at the bottom of https://www.lshk.org/jyutping). —Suzukaze-c◇◇ 06:33, 16 February 2020 (UTC)
- @Suzukaze-c: Nice, just saw this. Seems like we already support these. — justin(r)leung { (t...) | c=› } 01:23, 15 May 2020 (UTC)
WDL status
I noticed that WT:WDL lists "Chinese" as a WDL, which seems contrary to our usual practice (as far as I can tell) of treating dictionaries as sufficient for most topolects and for classical vocabulary. Does anyone object to changing it to "Standard Mandarin", to clarify that non-Mandarin Chinese only requires one use or reliable mention for attestation? —Μετάknowledgediscuss/deeds 23:22, 14 May 2020 (UTC)
- @Metaknowledge: Standard Mandarin should be good. That being said, I'm not sure if that would include Standard Chinese from Hong Kong (not really Cantonese proper, but usually read in Cantonese), which is sufficiently documented for sure. Written Cantonese may also be included, I think - it's pretty robustly attested. Other topolects would not qualify for WDL status as of now AFAICT. — justin(r)leung { (t...) | c=› } 00:17, 15 May 2020 (UTC)
- (edit conflict) @Justinrleung: Hi. Do we really have enough attested material in Written Cantonese? I am surprised. It's always been talked about as a mostly spoken lect with comics and other informal writings occasionally using it. I may have been out of touch with the latest developments, though. --Anatoli T. (обсудить/вклад) 01:20, 15 May 2020 (UTC)
- @Atitarev: I mean there are always newspaper/magazine articles (usually tabloids) that may mix in Cantonese or use Cantonese entirely. Would that be enough for it to be considered WDL? — justin(r)leung { (t...) | c=› } 01:26, 15 May 2020 (UTC)
- I think that the massive body of spoken media produced by Hong Kong would allow HK Cantonese to be well-documented. —Suzukaze-c◇◇ 01:30, 15 May 2020 (UTC)
- @Suzukaze-c: Thanks. Yes, if the spoken media is considered good for CFI, then yes, but we are a written dictionary, so if a word is written in MSM (in movie or news subtitles) but pronounced in Cantonese, how do you reconcile that with what we are doing here? --Anatoli T. (обсудить/вклад) 01:40, 15 May 2020 (UTC)
- @Justinrleung: Re: your question - I don't know if they are enough. In the past, from what I read and heard discussions about, it wasn't. The Cantonese version of diglossia makes it difficult to separate what is standard and written, since when it's written and is standard, then it's MSM, not Cantonese. --Anatoli T. (обсудить/вклад) 01:43, 15 May 2020 (UTC)
- Yes, I support adding Cantonese as WDL. Wide range of newspaper tabloids written in pure Cantonese (not MSM) in Hong Kong. User:Iambluemon 02:56, 15 May 2020 (UTC)
Yes, I object because this policy favors Mandarin as the standard and disregards the presence of other languages, and also it makes it easier to create dialectal entries using only one mention which will introduce many errors to this site. Can I know why non-Mandarin Chinese has been omitted? How about Mandarin read using Cantonese? Will that be considered part of Standard Mandarin? I want to see other Chinese languages such as Cantonese treated the same as Mandarin. All languages equal, not one superior over the rest. Iambluemon (talk) 01:14, 15 May 2020 (UTC)
- Where do you see favouring Mandarin? They are talking about attestations. Standard Mandarin is easier to attest than other forms, since Chinese write much less in the varieties. Chinese contributors have put a lot of effort into actually improving the coverage of other Chinese varieties, not the other way around. --Anatoli T. (обсудить/вклад) 01:20, 15 May 2020 (UTC)
- Non-MSM lects are often not "well-documented". I think it's fairly straightforward. Keeping stricter regulations will make it harder for non-MSM content to be on the site, which would lead to a fairly unequal picture on Wiktionary where MSM would dominate even more than it already does. —Suzukaze-c◇◇ 01:24, 15 May 2020 (UTC)
- (Withdrawn) 01:34, 15 May 2020 (UTC)
- @Suzukaze-c: I am not encouraging stricter regulations, quite the opposite. That's why I think Cantonese shouldn't be considered WDL, so that more content is allowed. --Anatoli T. (обсудить/вклад) 01:40, 15 May 2020 (UTC)
- I support adding Cantonese as WDL. If you put Standard Mandarin as the only "well-documented" variety, it creates the impression that Mandarin is the dominant variety, that Standard Mandarin is Chinese, and that other Chinese varieties are inferior to Mandarin. User:Iambluemon 02:56, 15 May 2020 (UTC)
- Mh, no it doesn't? PUC – 13:19, 15 May 2020 (UTC)
- Iambluemon, it looks like you misunderstood my intention. This policy would favour the non-Mandarin languages by giving them more lenient coverage. You claim it will "introduce many errors", which seems like a straw man argument — I challenge you to find even a single unambiguous error in an otherwise reliable source that would be entered into Wiktionary as a result. —Μετάknowledgediscuss/deeds 01:54, 15 May 2020 (UTC)
- One example of error is the diglossia situation in Cantonese. If you read a Mandarin article using Cantonese, it doesn't mean that every Mandarin word is automatically transformed into a Cantonese word. We can also read Classical Chinese poems using Mandarin, Cantonese, Hokkien, etc., but it doesn't mean that all those words automatically become Mandarin, Cantonese or Hokkien. No, they are just readings, not actual words, not dictionary material. We need stricter criteria; don't have editors copy and paste individual character readings into compound words. Use material from spoken Cantonese (not MSM) that is available in written form. User:Iambluemon 02:54, 15 May 2020 (UTC)
- @Iambluemon: Please don't make assumptions, you have made a few today. We know all that. We don't include Cantonese words simply because there is a word in Mandarin. I'm talking about a common situation when a newsreader speaks Cantonese while their teleprompters and subtitles on the screen are in standard Chinese (automatically converting what they say into the correct Cantonese). The written and pronounced words will mismatch and written words will not be added as Cantonese, if they don't have Cantonese readings and they are used. --Anatoli T. (обсудить/вклад) 04:03, 15 May 2020 (UTC)
- While MSC/MSM words are not Cantonese in a strict sense, they can be considered Cantonese in a broader sense. Cantonese is the main spoken language of instruction of Chinese classes in Hong Kong, so MSC/MSM texts are always read in Chinese. Hong Kongers also write MSM texts that are meant to be read in Cantonese (although they can also be read in Mandarin). — justin(r)leung { (t...) | c=› } 03:53, 15 May 2020 (UTC)
- @Metaknowledge: There seems to be consensus here (at least for Standard Chinese/Mandarin). Does this need to be brought to WT:BP for wider/further discussion, and do we need a formal vote? — justin(r)leung { (t...) | c=› } 02:39, 15 May 2020 (UTC)
- @Justinrleung: Consensus among the editing community (i.e. this discussion) suffices. That said, it's a bit unclear to me whether there's consensus regarding the status of Cantonese as a WDL, and we have to decide these together. (I don't know enough about the quantity of Cantonese material that is easily searchable and meets CFI to have an opinion.) —Μετάknowledgediscuss/deeds 04:07, 15 May 2020 (UTC)
I forgot to say this, but back in 2012, when Cantonese and Hokkien had their own headers, Cantonese and Hokkien were treated as well-documented languages. Why downgrade their status now in 2020? User:Iambluemon 03:00, 15 May 2020 (UTC)
- It's not a downgrade. It means fewer restrictions on providing quotes for a term to be allowed to be included. --Anatoli T. (обсудить/вклад) 03:18, 15 May 2020 (UTC)
- It's also about reality. Hokkien is definitely not well-documented (although there's an increase in Hokkien writing, especially in Taiwan). — justin(r)leung { (t...) | c=› } 03:49, 15 May 2020 (UTC)
- Add Teochew to the list of Chinese dialects that has a growing amount of writing. And there's currently quite a fair bit of Teochew media that is produced in China. I'd probably consider it the third most well-documented dialect of Chinese after Cantonese and Hokkien. The dog2 (talk) 01:55, 24 May 2020 (UTC)
- This isn't a competition. We're essentially talking about languages with strong publishing industries, or else which have such an absurd amount of other durably archived media that it makes up for a lack of written material. —Μετάknowledgediscuss/deeds 02:08, 24 May 2020 (UTC)
- AFAICT, Teochew is nothing close to Hokkien when it comes to the amount of material out there (whether written or spoken). After some thought, although written vernacular Cantonese has a good amount of publication, it cannot compare to the amount of publication in Standard Mandarin / Standard Written Chinese. Thus, the only variety we should consider as a WDL should be Standard Mandarin / Standard Written Chinese. — justin(r)leung { (t...) | c=› } 03:54, 24 May 2020 (UTC)
- For sure there is a lot less Teochew than Hokkien material, but it's still one of the better documented dialects of Chinese. And yes, none of the dialects even come close to Mandarin when it comes to written material. I've been to Hong Kong and Macau and even there, written material is mostly in standard Mandarin. The dog2 (talk) 04:26, 24 May 2020 (UTC)
- @The dog2: Teochew is irrelevant to this discussion. I don't know if you fully understand the implications of this discussion. If Teochew were to be listed as a WDL, a lot of Teochew entries would have to go for lack of attestation per WT:ATTEST. No one is doubting the existence of Teochew material, which is quite a lot compared to other dialects like Changsha Xiang, just as an example, but it's just not eligible for consideration as a well-documented language. — justin(r)leung { (t...) | c=› } 05:08, 24 May 2020 (UTC)
OK, Teochew certainly cannot be considered well-documented. Hokkien and Cantonese are somewhat on the fence, but I'd lean towards not considering them well-documented. The dog2 (talk) 05:18, 24 May 2020 (UTC)
- Alright, I've made the change to WT:WDL to specify "Standard Written Chinese" as the only variety that is well-documented. — justin(r)leung { (t...) | c=› } 18:41, 25 May 2020 (UTC)
- This is better: make it more specific (Standard Written Chinese) rather than changing Chinese to Mandarin, which is a bad suggestion. Does Standard Written Chinese also include literary/formal Cantonese, the type of Cantonese used in official functions and ceremonies that is based on written Mandarin? User talk:iambluemon 08:42, 8 June 2020 (UTC)
- @Iambluemon: I would not include formal spoken Cantonese (which would still have things like 嘅). — justin(r)leung { (t...) | c=› } 08:54, 8 June 2020 (UTC)
Unified Chinese revisited
Unified Chinese allows us to document the non-Mandarin languages faster: all you need to do is add the pronunciation, and the meaning if different from Mandarin. But this has the disadvantage that only the difference from Mandarin is documented. For example, the 寫字樓 entry currently has the following definitions:
Now it is clear that sense 2 is Cantonese only, but is sense 1 Mandarin only or both Mandarin and Cantonese? And does the absence of the "Hokkien" label mean "this sense is not in Hokkien" or "the editors have not yet considered that language"?
One way to solve the problem is to build {{zh-dial}} data. For example, although 都 has the definitions "all; both" and "(Cantonese) as well; also; too", the {{zh-dial}} tells us that the first sense is also in Cantonese. But this may not be feasible for the smaller entries. Another way is to add examples, but this is not intuitive for the reader. It is much better to build the senses separately and in full for each language, while retaining the common pronunciation base. (A common pronunciation base is necessary or there will be no place for dialectal readings of Mandarin.) This can be achieved by the following entry layout:
==Chinese==
{{zh-forms|...}}
===Pronunciation===
{{zh-pron
|m=...
|c=...
...
}}
===Mandarin===
...
===Cantonese===
...
(If this is not allowed, one can always resort to
==Chinese==
{{zh-forms|...}}
===Pronunciation===
{{zh-pron
|m=...
|c=...
...
}}
===Definitions===
...
----
==Cantonese==
===Pronunciation===
{{yue-pron}} // transcludes the pronunciation in {{zh-pron}}, showing only |c=
===Definitions===
...
but this requires more typing and part of the advantages of Unified Chinese is lost.)
What do you think of this layout? I think it's better to adopt it incrementally, focusing on words with varying meanings across languages first. Most terms such as 粵語 surely don't need splitting unless they get lots of examples in a variety of languages.
[To answer Geographyinitiative's other question: Wikipedia has nine language editions because there's no ground for unification. You can't write encyclopedic text that is Mandarin and Cantonese and Wu and Classical Chinese at once. But you can write in Traditional and Simplified scripts at once, so Wikipedia unifies them. Wiktionary deals with words, so the unification is the other way round.]
(Notifying Atitarev, Tooironic, Suzukaze-c, Justinrleung, Mar vin kaiser, Geographyinitiative): --Nyarukoseijin (talk) 12:20, 5 June 2020 (UTC)
- Yes, I am very happy to read this suggestion. I want to be able to differentiate between Chinese words that are used in all Chinese varieties and Chinese words that are limited to certain dialects, and this is a very good solution to the problem. It only involves extra typing. I definitely support this proposal. User talk:iambluemon 08:55, 8 June 2020 (UTC)
- I'm not a big fan of either proposal. I can see many edge cases where the first option would be problematic. In colloquial Cantonese, 奶奶 refers to "one's husband's mother" or "madam", but in the written language, Cantonese speakers may use this word to refer to "paternal grandmother" as well and it would be read in Cantonese (even though it would not be used in actual speech). I also can't imagine the mess it would be if we adopt the first option for long single character entries with multiple etymologies/pronunciations. The second option is essentially reverting back to disunified Chinese with a redundant Chinese section, which is even messier than before. — justin(r)leung { (t...) | c=› } 09:09, 8 June 2020 (UTC)
- I like the first option. If it is messy for long single-character entries, then we apply this format only to compound entries. If there is different usage in spoken and colloquial Cantonese, we can use usage notes to explain the difference. Maybe the second option is not so nice. Iambluemon (talk) 09:17, 8 June 2020 (UTC)
- Another problem with the first option is that even usage within the dialects of Mandarin or any other major grouping of dialects would have variation. Just look at 阿婆 or 阿公. Are we gonna split it up to every single dialect possible? I don't see how our status quo is much different from other languages like English, where there are lexical differences between different dialects of English. — justin(r)leung { (t...) | c=› } 09:32, 8 June 2020 (UTC)
- if we didn't have chinese characters, we would have no choice but to do so —Suzukaze-c (talk) 10:14, 8 June 2020 (UTC)
- I don't mind usage differences within dialects of Mandarin, or variations within the same dialect group. At least people will be more careful when adding definitions. Right now people just copy and paste pronunciation without bothering whether it is literary or colloquial or which dialect group the word belongs to. Iambluemon (talk) 10:34, 8 June 2020 (UTC)
- Yet another problem with the first option is that we have language varieties in L3 headers. This would automatically push PoS to L4 headers (and when we have more than one etymology, they'd get pushed to L5). Also, if definitions are separated by topolect groups, I don't see why pronunciations need to be grouped together. — justin(r)leung { (t...) | c=› } 10:44, 8 June 2020 (UTC)
- It is easier to compare when the pronunciations are grouped together. Iambluemon (talk) 10:48, 8 June 2020 (UTC)
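The header-depth objection raised in this exchange can be sketched in a few lines of Python. This is only an illustration of the counting, assuming the usual nesting (language header at L2, part of speech directly beneath it, numbered etymology sections adding one level); the function names and parameters are hypothetical and do not correspond to any existing Wiktionary tooling.

```python
def pos_header_level(lect_headers: bool, multiple_etymologies: bool) -> int:
    """Return the heading level at which a part-of-speech header would sit.

    Assumptions (illustrative only):
    - the language header (==Chinese==) is level 2;
    - numbered etymology sections, when present, add one level;
    - per-lect headers (===Mandarin===, ===Cantonese===), as in the first
      proposed layout, add one more level.
    """
    level = 3  # directly under the L2 language header
    if multiple_etymologies:
        level += 1
    if lect_headers:
        level += 1
    return level


def render_header(title: str, level: int) -> str:
    """Render a wikitext heading such as ===Noun=== for the given level."""
    marks = "=" * level
    return f"{marks}{title}{marks}"


if __name__ == "__main__":
    # Compare the current layout with the first proposed layout.
    for lect in (False, True):
        for multi in (False, True):
            level = pos_header_level(lect, multi)
            print(f"lect headers={lect}, multiple etymologies={multi}: "
                  f"{render_header('Noun', level)}")
```

Under these assumptions the part-of-speech header moves from L3 to L4 once lect headers are introduced, and to L5 when the entry also has more than one etymology, which is the depth concern described above.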
- @justinrleung:—
- 奶奶 meaning "paternal grandmother" in literary Cantonese: is it a borrowing from Mandarin? Does it occur only in Mandarin contexts (奶奶的……) or does it also occur in Cantonese contexts (奶奶嘅……)? I think we need only cover the colloquial language in ==Cantonese==, ==Hokkien==, etc. The literary language based on Mandarin is already covered under ==Chinese==; the additional L2 headers are for those who want to study the colloquial language without Mandarin influence.
- Messiness: I agree that the first option doesn't look good (which is why I reverted this proposal initially). Under the second option, you can still focus on Unified Chinese if you want to. The additional language headers are for those interested in the individual languages (like @Geographyinitiative and me), and they don't need to come with glyph origins, etymologies, etc. The main motivation for having additional language headers is because the current Unified Chinese format doesn't treat the individual languages well: is the "office building" sense of 寫字樓 also in Cantonese? If not, should I add "(Mandarin)" or "(not Cantonese)" if I don't want to research the other languages? etc.
- How to split under the first option: Splitting by the so-called 一級方言 (Mandarin, Cantonese, Gan, etc.) should be enough. The current 阿公 entry looks fine and doesn't need splitting. But if it gets dozens of examples in a variety of dialects it might be useful to split by language.
- PoS headers pushed to L4 under the first option: Wyang thinks that PoS headers should be abolished for Chinese, and they're currently replaced by the dummy ===Definitions=== header for single-character entries. The L3 language headers were intended to take that place. If this is not possible, one can use the following format instead:
===Definitions===
{{zh-hanzi}} {{tlb|zh|Mandarin}}
# ...
===Definitions===
{{zh-hanzi}} {{tlb|zh|Cantonese}}
# ...
- Or with the second ===Definitions=== removed.
- if definitions are separated by topolect groups, I don't see why pronunciations need to be grouped together: terms like 我們 may have more pronunciations than definitions.
- The current Unified Chinese format is Modern Standard Written Chinese oriented. If we don't solve the problem with additional language headers, what about an extended version of User:Wyang/zh-def that displays a little matrix showing which senses apply to which languages? The cells of the matrix can be simple yeses and noes or they could contain labels like "dialectal" (as in "dialectal Mandarin") or "morpheme" (襪 is a morpheme in Mandarin but a noun in Cantonese). --Nyarukoseijin (talk) 12:17, 10 June 2020 (UTC)
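The sense-by-language matrix suggested above could look roughly like the following Python sketch. It is not how User:Wyang/zh-def or any existing template works; the class, function and variable names are hypothetical. The only cell values filled in are the ones stated in this thread for 都 (the "all; both" sense also being Cantonese); every other cell is deliberately left as "not yet researched", which is the distinction plain labels cannot express.

```python
from enum import Enum


class Status(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "?"  # not yet researched; deliberately distinct from "no"


def build_matrix(senses, lects, known):
    """Build a sense-by-lect table; cells with no entry default to UNKNOWN."""
    return {
        sense: {lect: known.get((sense, lect), Status.UNKNOWN) for lect in lects}
        for sense in senses
    }


def senses_for(matrix, lect):
    """List the senses confirmed for one lect (a simple per-lect filter)."""
    return [sense for sense, row in matrix.items() if row[lect] is Status.YES]


def render(matrix, lects):
    """Render the matrix as plain-text rows, one per sense."""
    lines = ["sense | " + " | ".join(lects)]
    for sense, row in matrix.items():
        lines.append(sense + " | " + " | ".join(row[lect].value for lect in lects))
    return "\n".join(lines)


# Illustrative data for 都, limited to what is stated in the discussion.
SENSES = ["all; both", "as well; also; too"]
LECTS = ["Mandarin", "Cantonese"]
KNOWN = {
    ("all; both", "Mandarin"): Status.YES,
    ("all; both", "Cantonese"): Status.YES,
    ("as well; also; too", "Cantonese"): Status.YES,
}
MATRIX = build_matrix(SENSES, LECTS, KNOWN)

if __name__ == "__main__":
    print(render(MATRIX, LECTS))
    print(senses_for(MATRIX, "Cantonese"))
```

The cells could equally hold labels such as "dialectal" or "morpheme" instead of a three-way status; the point of the sketch is only that an explicit "unknown" value lets an editor record one lect without implying anything about the others.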
- @Nyarukoseijin: 奶奶 is probably not a good example because I personally wouldn't use it in writing for "paternal grandmother". It's possible to see it in Chinese textbooks/books (taught in Cantonese) in Hong Kong though. It's definitely unlikely for people to say 奶奶嘅 to mean "paternal grandmother's". So under your proposals (especially the second option), does that mean we'll have Chinese as Standard Written Chinese and Mandarin, Cantonese, etc. as covering only the colloquial versions? This is very difficult to determine, as the dialects are in a continuum from colloquial to formal (often closer to Standard Written Chinese). Take the word 太陽 as an example. In Hong Kong, 太陽 is quite commonly used in everyday speech and has kind of replaced the more colloquial words (熱頭 or 日頭), but in Taiwanese Hokkien, it seems to be restricted to literary/poetic registers (like in a song). Would we have to split 太陽, and if so, what's the most appropriate way of doing so?
- About 寫字樓, in Hong Kong, it's usually the "office" sense, but the "office building" sense is also possible (at least according to some Cantonese dictionaries). The usual implication of not labelling is that it's totally fine at least in Standard Written Chinese across regions. Whatever additional lects listed in {{zh-pron}} would be okay to use the words (to varying extents). Of course, we need to do a better job at labelling and writing usage notes so that we have a representation of all the lects that is as accurate as possible.
- Back to your proposals. They seem to allow several formats to coexist (no splitting for 粵語 but splitting for 寫字樓 maybe). How do we decide which format to use? There are always edge cases that would be hard to define. As for the third option you just proposed, I don't think abolishing PoS across the board is the way to go. Chinese may be more "flexible" with PoS due to the lack of overt morphology, but that doesn't mean we should abandon PoS, especially for entries with more than one character. And where would we put literary Chinese (文言文) under your proposals?
- About 一級方言, we would definitely need to define these properly. Do we group Min as just Min, or do we follow Ethnologue and split it as Min Nan (which includes Hainanese, Leizhou Min, and maybe Zhongshan Min), Min Dong, Puxian Min, Min Bei (which includes Min Bei proper and Shaojiang Min) and Min Zhong? Do we group Pinghua under Cantonese? Do we group Shehua with Hakka? What do we do about varieties that Ethnologue doesn't seem to deal with (Xiangnan Tuhua, Shaozhou Tuhua)?
- I think many of the details need to be fleshed out before we can all make a good judgment as to what we should do. — justin(r)leung { (t...) | c=› } 21:18, 10 June 2020 (UTC)
- @Justinrleung:
- 奶奶: If a text has Cantonese pronunciation but Mandarin vocabulary and grammar, I think it should be subsumed under the ==Chinese== header. The Taiwanese Min Nan and Hakka dictionaries by the Ministry of Education ROC have only 阿媽/阿妈 (a-má) and 阿婆 (â-phò), not 奶奶.
- 太陽: # {{lb|yue|Hong Kong}} [[sun]] under ==Cantonese==?
- Which words to split: My proposal isn't about splitting. It's about building separate dictionaries for the Chinese languages alongside a dictionary of the Chinese macrolanguage. This means that ==Chinese== still has the fullest coverage, just at a higher level. It's similar to the English dictionary market where we have both the Oxford English Dictionary and the Middle English Dictionary. You don't have to build ==Cantonese== if you don't want to.
- PoS: I didn't say abolish PoS. Abolish PoS headers and use labels instead. Single-character entries need labels anyway.
- Literary Chinese: ==Literary Chinese== of course. The primary meaning of 走 in Literary Chinese is different from that in Modern Chinese, but the current ==Chinese== entry doesn't mention it. --Nyarukoseijin (talk) 08:49, 17 June 2020 (UTC)
- @Nyarukoseijin: Thanks for clarifying. So would it be fair to say that what you're proposing is close to how Arabic is treated as of now, i.e. Chinese is reserved for Modern Standard Chinese, and language headers for other varieties should coexist with the Chinese header? For example, would 太陽 be formatted like this: a Chinese header with "sun" and "sunshine" (and "greater yang"?), a Mandarin header with "sun", "sunshine" and "temple" (labelled as SW Mandarin), a Cantonese header with "sun", "sunshine" and "temple" (labelled as Guangxi), a Gan header with "sun", "sunshine" and "temple", a Jin header with "sun" and "sunshine", a Min Bei header with "temple", a Min Nan header with "sun" (labelled as formal/literary) and "temple" (labelled as Leizhou) and a Literary Chinese header with "sun" and "greater yang"? You said 阿公 doesn't need to be split unless we have examples in many dialects. I don't think this is a good criterion for splitting. Entry layout should not revolve around examples. Essentially, we need to refine the proposal by specifying the scope that each of "Chinese", "Mandarin", "Cantonese", "Gan", "Hakka", "Jin", "Min Bei", "Min Dong", "Min Nan", "Min Zhong", "Puxian Min", "Wu", "Xiang" and "Literary Chinese" covers if this is to be put through a formal vote. But of course, I still see the current layout as better (and would like to see Arabic follow suit). — justin(r)leung { (t...) | c=› } 16:42, 17 June 2020 (UTC)
- I shouldn't have used the word "splitting". It's not about splitting. It's about adding individual languages, not subtracting anything from ==Chinese==. The individual languages would probably be independent from ==Chinese==, like the Dictionary of Old English, the Middle English Dictionary, etc. from the Oxford English Dictionary. This means that (1) ==Chinese== won't be restricted to MSC due to the additional headers, just as the OED isn't restricted to Modern English due to the other dictionaries. (2) Additional headers don't have to be built all at once; they can have their own paces. (3) You don't have to work on them if you don't want to. Building ==Chinese== and {{zh-dial}} content is still the most efficient way to document those languages (and, in an age of language suppression, more ethical). The additional headers are for info which ==Chinese== doesn't handle well (e.g. 'run' being the primary sense of 走 in Literary Chinese), and are completely opt-in. --Nyarukoseijin (talk) 17:58, 17 June 2020 (UTC)
- This doesn't sound to me like a fully fledged idea. The OED covers both modern and Middle English (but not Old English), because they don't care that they overlap considerably with the Middle English Dictionary — they're different enterprises, and they have no interest in being consistent. At Wiktionary, we want to make one dictionary for all languages, and being consistent isn't optional — it's necessary. You're saying that if I want information about Cantonese, then sometimes I should look under a 'Chinese' header and sometimes I should look under a 'Cantonese' header, and there will be no way to predict which. That is antithetical to how Wiktionary is organised, and it makes it less usable for both humans and machines. —Μετάknowledgediscuss/deeds 18:13, 17 June 2020 (UTC)
- I totally agree with @Metaknowledge. We're not an anthology of dictionaries, but one dictionary with many languages. There would be significant overlap if we start allowing Chinese in addition to other language headers unless we define Chinese as something different from what it is now. — justin(r)leung { (t...) | c=› } 21:31, 17 June 2020 (UTC)
- @Nyarukoseijin: Hello. You haven't presented a case where the CURRENT structure doesn't work. I don't understand why you want to change something that's not broken (other than in one of our troll's mind). If there are senses (or PoS), which are only specific to Mandarin (not applicable to Cantonese, etc.), they can be marked/labelled so. The current definitions of e.g. 走 are good. If you want to specifically say that the sense is only applicable to Mandarin (or maybe a bunch of other varieties, they can be labelled so), e.g. {{lb|zh|Mandarin|Jin|...}} ... --Anatoli T. (обсудить/вклад) 03:32, 18 June 2020 (UTC)
- One of the disadvantages of adding labels is that it's all-or-nothing. If you add "(Mandarin)" to the "walk" sense of 走, it would suggest absence in Cantonese, Gan, etc. So you have to research all the languages and dialects at once. Also there are no filters that allow me to see only Cantonese senses and examples/quotations. --Nyarukoseijin (talk) 05:25, 18 June 2020 (UTC)
- @Nyarukoseijin: It is a fair point, thank you very much, but it's not a show stopper. If the number of such cases was really large, then the unified approach wouldn't work. The differences between lects are of interest to contributors and generally addressed as a matter of priority. The differences often show in very frequent but common words. The higher (more formal) the level of writing, the fewer differences you find (very true for Arabic varieties as well). There is no clear line there, anywhere, as dialects borrow from each other. One statement is generally true: the formal written variety of Mandarin is generally applicable to other Chinese lects. No, you don't have to know or research the usage in other lects. Editors either add what they know or find in dictionaries. --Anatoli T. (обсудить/вклад) 07:07, 18 June 2020 (UTC)
- (Withdrawn) 05:51, 18 June 2020 (UTC)
- Your arrogance embarrasses me. —Μετάknowledgediscuss/deeds 06:12, 18 June 2020 (UTC)
- Even if Unified Chinese is undone, it has still done more good than bad. When several groups of people are under persecution, saving them at once is more ethical than saving them separately if it allows more people to be saved. And that's what happened: the Mandarin nouns : Cantonese nouns : Wu nouns ratio has changed from 20467 : 317 : 10 to 95,620 : 73,447 : 6,193. My proposal is about experimenting with new ways to document the languages, not abandoning the good old way. --Nyarukoseijin (talk) 06:17, 18 June 2020 (UTC)
- I wonder if a system where contributors can confirm/deny that X word is used in Y lect (or specify: it is literary, etc.), and the result would be calculated to produce {{lb}} is feasible. —Suzukaze-c (talk) 06:31, 18 June 2020 (UTC)
- It would be helpful for the entirety of Wiktionary, really. Regarding English, I don't know what people say in New Zealand. —Suzukaze-c (talk) 06:37, 18 June 2020 (UTC)
- I agree. We should try to always improve and fine-tune what we have in place. If someone from the UK labels an English word as British, another editor can add Oz or NZ labels if the same usage applies to Australia or New Zealand. --Anatoli T. (обсудить/вклад) 07:07, 18 June 2020 (UTC)
- A note from last year that I found today by chance:
no: "used in Hokkien" // yes: "Cantonese: no; Hokkien: yes; Teochew: unknown; ..."
—Suzukaze-c (talk) 05:11, 20 June 2020 (UTC)
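The confirm/deny idea in this thread can be sketched as follows. This is a minimal illustration only, assuming each lect is recorded as "yes", "no" or "unknown" as in the note above; the function names `build_label` and `unresolved` are hypothetical and do not correspond to any existing Wiktionary module.

```python
from typing import Dict, List


def build_label(statuses: Dict[str, str], lang: str = "zh") -> str:
    """Return an {{lb}} call listing only the lects confirmed as "yes".

    Hypothetical helper: returns an empty string when no lect is confirmed,
    or when every recorded lect is confirmed (a label would add nothing).
    """
    confirmed: List[str] = [lect for lect, s in statuses.items() if s == "yes"]
    if not confirmed or len(confirmed) == len(statuses):
        return ""
    return "{{lb|" + "|".join([lang] + confirmed) + "}}"


def unresolved(statuses: Dict[str, str]) -> List[str]:
    """List the lects nobody has confirmed or denied yet."""
    return [lect for lect, s in statuses.items() if s == "unknown"]


# Statuses copied from the note above; they are an example, not real data.
example = {"Cantonese": "no", "Hokkien": "yes", "Teochew": "unknown"}
print(build_label(example))
print(unresolved(example))
```

Keeping "unknown" separate from "no" is the point of the design: a label derived this way would not imply absence in lects that nobody has checked.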
Language treatment: Only the macrolanguage is treated as a language?
Wiktionary:Language treatment says that for Chinese: "Only the macrolanguage is treated as a language". Are we sure about this? Does this mean other varieties are not treated as languages in Wiktionary? I think this contradicts current practice. In an example such as Hsi-ning https://en.wiktionary.org/w/index.php?title=Hsi-ning&type=revision&diff=59443856&oldid=59443845 the language code for "Mandarin" is preferred over the language code for "Chinese". The Unified Chinese vote is about treating Chinese varieties under a single header and using the "zh" language code. Does it abolish other Chinese varieties or disallow their language codes? Can someone explain? User talk:iambluemon 09:00 8 June 2020 (UTC)
- @Iambluemon: "Treat as a language" in that context refers to how entries are made. The current practice is that we don't have Cantonese, Mandarin, Xiang, etc. entries, but Chinese entries with Mandarin, Cantonese and/or Xiang subsumed under it. (There are exceptions to this, but I digress.) This does not "abolish other varieties" as there are no other varieties if all Chinese varieties are treated as Chinese. This also has nothing to do with how lects are treated in etymologies. There are many "etymology-only" languages/lects/varieties. — justin(r)leung { (t...) | c=› } 09:25, 8 June 2020 (UTC)
- The page for Wiktionary:Language treatment mentions that it is to "document cases where Wiktionary's treatment of lects deviates from that of the ISO/SIL". It doesn't mention that language treatment is in the context of how entries are made. And isn't there a header for Min Nan entry in Wiktionary based on Latin POJ? Maybe the description can be more specific, such as "only the macrolanguage is treated as a language for lects written in Han script". Iambluemon (talk) 11:16, 8 June 2020 (UTC)
- (Withdrawn) 01:49, 9 June 2020 (UTC)
- Please stop your baseless accusations about any ideologies deeply ingrained at Wiktionary. If anything is missing or is incorrect in a Chinese dialect, it means nobody has added it yet. The use of one L2 "Chinese" for all Chinese varieties only brought positive development to the varieties, otherwise miserably neglected. All the promises to provide a separate treatment for each individual word in any given Chinese lect have been kept. You can define not only pronunciations but usage, part of speech, which are specific to Cantonese, Min Nan, etc. Wikipedia, which you quote so much as superior to Wiktionary, uses zh-min-nan language code for Min Nan, we are fairer, we just use "nan". We provide all readings, Min Nan Wikipedia only uses POJ.
- You only bring negative views to this site. If you hate it so much, just leave it. -Anatoli T. (обсудить/вклад) 02:03, 9 June 2020 (UTC)
- Another thing Geographyinitiative fails to notice is that the 中文 (= Chinese) version of Wikipedia is written in Written Standard Chinese, based on Mandarin. This isn't much different from us in treating Mandarin-based Written Standard Chinese as de facto Chinese. — justin(r)leung { (t...) | c=› } 02:45, 9 June 2020 (UTC)
- @Iambluemon: Wiktionary:Language treatment is a "draft proposal", so there definitely needs to be more work done on it to specify what we mean by "language treatment" and how it relates to entry making and etymologies. — justin(r)leung { (t...) | c=› } 02:47, 9 June 2020 (UTC)
- One big disadvantage of Wikipedia is that their Hokkien version is written in the Latin alphabet, when by far the most common way of writing Hokkien is with Chinese characters. I think this makes their Hokkien version far less accessible to the average Hokkien speaker than it could be. The dog2 (talk) 03:52, 9 June 2020 (UTC)
- Yes, I mentioned that above. It fits the agenda of those who prefer the separate treatment. At Wiktionary we provide both the Chinese characters and the romanisation (POJ for Min Nan). The infrastructure is there for editors. Editors are free to focus only on the terms written in Chinese or POJ. Editing the Chinese characters only doesn't exclude the romanisation to be used (providing the manual or automated transliterations). --Anatoli T. (обсудить/вклад) 04:08, 9 June 2020 (UTC)
(Withdrawn) 08:01, 11 June 2020 (UTC)
- @Geographyinitiative: Alright. Seems to be fine since it's probably more common in English. It's not entirely relevant to this page though. — justin(r)leung { (t...) | c=› } 08:07, 11 June 2020 (UTC)
- (Withdrawn) 08:14, 11 June 2020 (UTC)
- Alright, thanks for the notice. — justin(r)leung { (t...) | c=› } 08:18, 11 June 2020 (UTC)
Mandarin Dialect Romanization
@Justinrleung Hey, I remember somewhere you talked about the possibility of an in-house romanization for the various Mandarin dialects. I was thinking that it's possible to adapt Sichuanese pinyin as a romanization for most of the Mandarin dialects available in 現代漢語方言大詞典. Maybe the only "complications" are checked tones and number of tones. For checked tones, an "-h" final can be added (like for Nanjing and Yangzhou), and for number of tones, most have 4, but a few have 3 (no problem I guess), and some southern ones have 5 (because they have checked tones). What do you think? --Mar vin kaiser (talk) 13:51, 24 June 2020 (UTC)
- @Mar vin kaiser: We can definitely look into it, but we'll have to look at them one by one and check other sources. We should start with one representative from each major grouping:
- Northeastern: Harbin
- Jilu: Jinan
- Jiaoliao: Muping
- Central Plains: Luoyang, Wanrong, Xi'an, Xining, Xuzhou
- Lanyin: Ürümqi, Yinchuan
- Southwestern: Chengdu (done), Guiyang, Liuzhou, Wuhan
- Jianghuai: Nanjing, Yangzhou
- Harbin should be pretty straightforward - we could just use pinyin for it. For the other groupings, let's look at Jinan, Muping, Ürümqi and Nanjing for now. We already have coverage of Central Plains with Dungan and Southwestern with Chengdu, so we can probably worry about those later. What do you think? — justin(r)leung { (t...) | c=› } 21:31, 24 June 2020 (UTC)
- This is probably a good basis for Nanjing - if it corresponds well with 南京方言詞典, we should probably just use it without much modification. — justin(r)leung { (t...) | c=› } 21:34, 24 June 2020 (UTC)
- @Justinrleung: Yeah, that looks good for Nanjing. And maybe it can be used for Yangzhou also, their phonology look almost identical. For Southwestern Mandarin, I was just looking into it, it looks like Sichuanese Pinyin can be used with Guiyang and Wuhan, except maybe it doesn't have "l". I'm gonna look into Jinan next. --Mar vin kaiser (talk) 10:47, 26 June 2020 (UTC)
Allowing Jyutping polysyllabic entries as non-lemmas
I started a discussion in Beer Parlour in April but no one has responded, so I post it here again:
Under the current policy, Jyutping transliterations for Cantonese are only allowed for monosyllables, such as zoeng1, but not polysyllables; while Pinyin transliterations for Mandarin are allowed for both, as in zhāng and jǐnzhāng. I propose that Jyutping should be given the equal status as Pinyin that polysyllables be allowed as non-lemma entries, since Jyutping has acquired the status as the standard phonetic transliteration for Cantonese in Hong Kong, considering that:
- it is developed by the w:Linguistic Society of Hong Kong;
- it is used in the Cantonese Read-Aloud Test; and
- recent linguistic papers written in English transliterate Cantonese in Jyutping.
There are no reasons for us to treat Pinyin and Jyutping differently. Jonashtand (talk) 07:57, 17 July 2020 (UTC)
Simplified forms and alternate forms
(Withdrawn) 00:33, 16 September 2020 (UTC)
- @Geographyinitiative: In general usage, it seems to be considered a variant of 璇. In the case of names, basically many variant characters may pop up, so it's not surprising that it's written as 刘璿 rather than 刘璇 (although we should always take Wikipedia with a grain of salt). — justin(r)leung { (t...) | c=› } 00:55, 16 September 2020 (UTC)
- (Withdrawn) 01:02, 16 September 2020 (UTC)
- @Geographyinitiative: I'll have to look at what other sources say. My gut feeling is that 璇 <s>should</s>shouldn't be listed as a simplified form of 璿, but we should have the definition say "alternative form of 璇". It's probably even better to collapse it as {{zh-forms}} if it's an exact equivalent (which is what I need to check) so that everything is centralized at 璇. — justin(r)leung { (t...) | c=› } 01:05, 16 September 2020 (UTC)
- (Withdrawn) 01:18, 16 September 2020 (UTC)
- @Geographyinitiative: Sorry, silly me! "should" is a typo for "shouldn't" (which is totally opposite in meaning). — justin(r)leung { (t...) | c=› } 01:36, 16 September 2020 (UTC)
- That said, if you look at the actual 第一批异体字整理表, 璿 is listed as a variant of 璇. — justin(r)leung { (t...) | c=› } 01:38, 16 September 2020 (UTC)
- (Withdrawn) 01:43, 16 September 2020 (UTC)
- I've defined it as an alternative form of 璇. I think this should be a good way to deal with it. — justin(r)leung { (t...) | c=› } 01:50, 16 September 2020 (UTC)
- (Also, I'm not sure what Baptist doctrine you're talking about, because that doesn't sound like what most Baptists believe.) — justin(r)leung { (t...) | c=› } 01:59, 16 September 2020 (UTC)
Classifiers
(Notifying Atitarev, Tooironic, Suzukaze-c, Mar vin kaiser, Geographyinitiative, RcAlex36, The dog2, Frigoris): How should we deal with measurement words for mass/non-count nouns, e.g. 管 for 牙膏, 杯/滴/盆/鍋/口/etc. for 水, 束/盆 (as opposed to 朵) for 花, 群 (as opposed to 個) for 朋友? It seems really messy when we have all these options in {{zh-mw}} without explanation. — justin(r)leung { (t...) | c=› } 21:47, 28 September 2020 (UTC)
- Maybe we should have a table similar to the dialectal modules, then we can list the right classifier for each context. The dog2 (talk) 22:09, 28 September 2020 (UTC)
- @The dog2: We do have synonym tables for some of these, but they're not quite placed beside the definition like {{zh-mw}} is. — justin(r)leung { (t...) | c=› } 22:16, 28 September 2020 (UTC)
- One thing that I guess can be done is to have a special table for uncountable words mass count words that can be expanded right next to the definition. The dog2 (talk) 22:20, 28 September 2020 (UTC)
- @The dog2: Hmm, I don't know how that will look. I imagine it might make the definition line really cluttered, which isn't ideal. — justin(r)leung { (t...) | c=› } 22:31, 28 September 2020 (UTC)
- Maybe usexes? —Suzukaze-c (talk) 22:57, 28 September 2020 (UTC)
- @Suzukaze-c: I guess that's one way to do it. Another issue is that we could have a lot of arbitrary ones, like 克, 磅, 盒, 箱, 杯, etc., based on how we group things. What stops us from adding these to {{zh-mw}}? (In other words, should we have constraints on what we allow in the template?) — justin(r)leung { (t...) | c=› } 23:11, 28 September 2020 (UTC)
- Is it really messy to include a large number of them? Do you have a specific example? ---> Tooironic (talk) 05:14, 29 September 2020 (UTC)
- @Tooironic 水 is one. We list 瓶; 滴; 池; 盆; 杯 - but these don't refer to the same amounts of water, and these are not intrinsic to 水 (since water isn't really a count noun). We can list even more, like 滴, 鍋, 口, etc., depending on how much water we're referring to. It's not quite good that we list them without explanation. Also, most physical objects could be "classified by" 箱 because we can technically put any physical object in a box. — justin(r)leung { (t...) | c=› } 05:24, 29 September 2020 (UTC)
- I see. I have thought about this before. What you say makes sense. Maybe we could vote to only include classifiers for words/senses that are actually countable? ---> Tooironic (talk) 05:51, 29 September 2020 (UTC)
(Withdrawn) 19:15, 20 October 2020 (UTC)
- (Withdrawn) 19:17, 20 October 2020 (UTC)
- @Geographyinitiative: No, don't do that. They're alternative forms of each other. (Also, this is probably not the right avenue for a question like this because it's about English entries.) — justin(r)leung { (t...) | c=› } 19:29, 20 October 2020 (UTC)
(Withdrawn) 23:11, 20 October 2020 (UTC)
- @Geographyinitiative: You can always ask for citation using w:Template:Citation needed. — justin(r)leung { (t...) | c=› } 23:25, 20 October 2020 (UTC)
Positioning of Forms and images
Currently, there's an unwritten rule that {{zh-forms}} and images should be placed right under the Chinese header. However, this leads to two issues.
- For entries with long pronunciations, the images are no longer visible once the user scrolls down to the definition. As Wiktionary keeps on expanding, all Chinese entries will have long pronunciation sections as they are filled out. For instance, see 鹿.
- {{zh-forms}} breaks up the Etymology section, making it harder to read. 蝴蝶 is a good example of this issue.
Proposal 1
- {{zh-forms}} and links to Wikipedia should be placed at the end of the Etymology section.
- Images should be placed directly under the part of speech section, e.g. under Noun.
Proposal 2
- {{zh-forms}}, links to Wikipedia, and images should be placed under the part of speech section, e.g. under Noun.
Feedback? Languageseeker (talk) 07:40, 2 November 2020 (UTC)
- The position of {{zh-forms}} shouldn't change. Images can be placed wherever it's logical. That would mean they could either be with definitions or right under {{zh-forms}} (and {{zh-wp}}, if it's there). They should not be placed under pronunciation because they are not part of the pronunciation. — justin(r)leung { (t...) | c=› } 06:46, 3 November 2020 (UTC)
- @Languageseeker: IMO, images can go under {{zh-wp}} or under {{head}} ({{zh-noun}}, ...). Anything else is (probably) odd. Images under the Pronunciation header is probably not good, as Justin noted on his talk page. —Suzukaze-c (talk) 06:31, 11 November 2020 (UTC)
Dialectal modules for words that seem to be the same across dialects
We have data for 小 but not for 大, and we don't have data for 水, unlike water#Translations. Should we make these?
Personally, Support for clarity and explicitness.
@Justinrleung, Mar vin kaiser, 沈澄心, The dog2 —Suzukaze-c (talk) 17:16, 12 November 2020 (UTC)
- In the case of 小, it is not the same across all dialects. Many southern dialects use 細. While 大 seems to be the same across all dialects, but if you know any dialects that use a different word, go ahead and create the table. Ditto for 水. The dog2 (talk) 17:24, 12 November 2020 (UTC)
- @The dog2 Oh, what I mean is that we have Module:zh/data/dial-syn/小, but not Module:zh/data/dial-syn/大; I presume that for the latter, it is because dialects use 大 across the board (or not! what do I know! but the lack of symmetry is odd). —Suzukaze-c (talk) 17:27, 12 November 2020 (UTC)
- I do think it'd be useful to have these even if there's no dialectal difference at all (although if we look hard enough, there may be some somewhere; maybe not in the synchronic data, but in the diachronic data). — justin(r)leung { (t...) | c=› } 17:28, 12 November 2020 (UTC)
- @Suzukaze-c: Obviously, I can't speak for all dialects because I don't speak all of them, but yes, for the latter, all the dialects I know use 大. While for 小, that is not the case, because Cantonese, Hokkien and Teochew all used 細. I'm not sure having a dialectal module is necessary if it's the same word across all dialects, though I'm not vehemently opposed to having one either. The dog2 (talk) 17:35, 12 November 2020 (UTC)
- Support for creating these data pages. --沈澄心✉ 11:47, 13 November 2020 (UTC)
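To make the question above concrete, here is a minimal sketch of how one might check whether a dialectal synonym table is uniform across lects. The dictionaries are hypothetical stand-ins written for this illustration; they are not the contents of Module:zh/data/dial-syn/小 or any other real data page, and the entries only restate what was said in this discussion.

```python
from typing import Dict, List


def is_uniform(table: Dict[str, List[str]]) -> bool:
    """True when every lect that has data lists exactly the same set of words."""
    word_sets = {frozenset(words) for words in table.values() if words}
    return len(word_sets) <= 1


# Illustrative entries only, based on the comments above (not verified data).
small = {"Mandarin": ["小"], "Cantonese": ["細"], "Hokkien": ["細"], "Teochew": ["細"]}
big = {"Mandarin": ["大"], "Cantonese": ["大"], "Hokkien": ["大"], "Teochew": ["大"]}

print(is_uniform(small))
print(is_uniform(big))
```

A uniform table still records something an absent table cannot: that the lects were actually checked and agree.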
Intuitive Middle Chinese reconstructions
@Frigoris, Suzukaze-c, and others; while editing under the Etymology sections of specific Sino-Japanese readings, I have somewhat intuitively reconstructed some Middle Chinese pronunciations for which there are none in their modules, using patterns in the reconstructed Old Chinese pronunciations. Here are the characters that I have edited with the notes: 樣 (*jɨɐŋH), 婿 (*seiH)
Does anyone know any others that have Old Chinese reconstructions but no known Middle Chinese attestation? There should be a category for that. Thanks, ~ POKéTalker(═◉═) 21:24, 23 February 2021 (UTC)
- @Poketalker: There is Middle Chinese for 樣. What do you mean? — justin(r)leung { (t...) | c=› } 21:30, 23 February 2021 (UTC)
- I've also moved MOD:zh/data/ltc-pron/壻 to MOD:zh/data/ltc-pron/婿, so there should be Middle Chinese for 婿 as well. — justin(r)leung { (t...) | c=› } 21:32, 23 February 2021 (UTC)
- Slap in the face... last time I edited the shinjitai (new character form in Japan) 様 (yō) with the etymology section there was probably no MC pronunciation for 樣--how long was that...
- There are some Chinese characters that have a reconstructed Old Chinese pronunciation but no Middle Chinese most likely due to lack of information added to their proper modules in here. Any recommended websites with such information? ~ POKéTalker(═◉═) 21:36, 23 February 2021 (UTC)
- @Poketalker: There should theoretically not be such cases because the OC reconstructions should always have MC reflexes. There are two issues: (1) Zhengzhang sometimes reconstructs anachronistically without evidence from early (pre-Han) texts and perhaps reconstructs OC for "late" words, and (2) Guangyun uses different variants than the modern-day standard. In both cases you mentioned, it was the case that Guangyun used a different variant (㨾 and 壻) instead; in this case, we would simply have to move the module or copy the module over to the modern-day standard. The issue of a "late" word not found in Guangyun can't be solved because there probably isn't any "standard" way to reconstruct MC in those cases. — justin(r)leung { (t...) | c=› } 21:47, 23 February 2021 (UTC)
- @Justinrleung: 淘 (MC dɑuH?) appears to be suffering the same issue, perhaps? There might be some more... ~ POKéTalker(═◉═) 08:38, 26 February 2021 (UTC)
- @Justinrleung: also 瑪瑙/玛瑙 (mǎnǎo). ~ POKéTalker(═◉═) 07:59, 27 February 2021 (UTC)
- @Poketalker: For 淘, I'm not sure what the basis of Zhengzhang's reconstruction is. It might be based on Jiyun, but I can't seem to find this word in Guangyun. For 瑪瑙, it was written as 碼碯, so I've created the modules for 瑪 and 瑙 based on 碼 and 碯. — justin(r)leung { (t...) | c=› } 17:05, 27 February 2021 (UTC)
- @Justinrleung: I see; but any progress for 淘 (MC *dɑuH)? 截 (MC *d͡ziᴇt̚) is also missing. Should the MC parameter (or module of character in question) be based only on Guangyun? ~ POKéTalker(═◉═) 01:08, 26 April 2021 (UTC)
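The variant issue described in this thread (Guangyun listing a word under a different character form than the modern standard) can be illustrated with a fallback lookup. This is only a sketch under stated assumptions: `GUANGYUN_VARIANT`, `find_mc` and the placeholder readings are hypothetical, and the actual fix discussed above was to move or copy the data module rather than to add a lookup like this.

```python
from typing import Dict, Optional

# Modern standard form -> variant form used as the Guangyun headword.
# The two pairs come from the discussion above; a real table would be larger.
GUANGYUN_VARIANT: Dict[str, str] = {"樣": "㨾", "婿": "壻"}


def find_mc(char: str, mc_data: Dict[str, str]) -> Optional[str]:
    """Look up a Middle Chinese reading, falling back to the Guangyun variant."""
    if char in mc_data:
        return mc_data[char]
    variant = GUANGYUN_VARIANT.get(char)
    if variant is not None:
        return mc_data.get(variant)
    # No data at all, e.g. a "late" word that is absent from Guangyun.
    return None


# Placeholder strings, not real reconstructions.
mc_data = {"㨾": "<reading stored under 㨾>", "壻": "<reading stored under 壻>"}

print(find_mc("樣", mc_data))
print(find_mc("淘", mc_data))
```

A character such as 淘, which could not be found in Guangyun, still returns nothing here, which matches the point that such cases cannot be solved by remapping variants.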
Fangcheng Dialect
(Withdrawn) 17:51, 17 June 2021 (UTC)
- @Geographyinitiative 圩. See 趁墟 (chènxū). RcAlex36 (talk) 18:04, 17 June 2021 (UTC)
- (Withdrawn) 18:33, 17 June 2021 (UTC)
Nanjing Dialect
@Justinrleung Can we use the Nanjing dialect module already? It looks ready lol. --Mar vin kaiser (talk) 11:12, 25 July 2021 (UTC)
- Oh nevermind, there's no module yet. I thought there was one already since a romanization system was listed. --Mar vin kaiser (talk) 11:16, 25 July 2021 (UTC)
- @Mar vin kaiser: It has been created already (Module:cmn-pron-Jianghuai), but it's not ready for multicharacter entries AFAICT. I don't have the time to figure things out yet. I'm not sure how familiar you are with Lua, but it would be nice to get some test cases to check if the module works well. — justin(r)leung { (t...) | c=› } 19:12, 25 July 2021 (UTC)
Sources for cites in permanently recorded media
@Justinrleung, 沈澄心 What are your sources/tricks for cites in permanently recorded media? —Suzukaze-c (talk) 19:24, 29 July 2021 (UTC)
- @Suzukaze-c: I have access to a database called 讀秀 through my university. It provides access to many books, journals and newspapers published in Mainland China. — justin(r)leung { (t...) | c=› } 19:53, 29 July 2021 (UTC)
- Guangxi Library and Zhejiang Library (Resident Identity Card or other valid ID needed) - free access to CNKI, 万方数据, 方正数字报纸全文库 (Global Times and Southern Weekly included), People's Daily, 维普, etc.
- 全国图书馆参考咨询联盟
- Many newspapers published in Mainland China (such as Guangming Daily, 中华读书报 and 文摘报) and many TV programs in Mainland China are available online for free.
- 中国裁判文书网 - court verdicts in Mainland China
- 国家标准全文公开系统 - standards in Mainland China
- 臺灣博碩士論文知識加值系統
- 印尼星洲日报 (pre-2021) and 印度尼西亚日报 - Indonesian Mandarin @Suzukaze-c --沈澄心✉ 05:26, 30 July 2021 (UTC)
Template:zh-der#Features: colons versus semi-colons
- Discussion moved to Template talk:zh-der.
Incorrect Gwoyeu Romatzyh
- Discussion moved to Template talk:zh-pron#Incorrect Gwoyeu Romatzyh.
Moral Equality between Cantonese and Mandarin
(Withdrawn) 16:32, 6 April 2022 (UTC)
- Standard Mandarin is much more widely spoken and studied than any other variety of Chinese, so it makes sense that it should appear first in the list of pronunciations. This is part of a broader principle that also applies to other languages: it makes sense to give more focus to the "standard" variety than other varieties. This is not because of "a special status in the realm of linguistics", but rather a practical consideration to help our readers. —Granger (talk · contribs) 18:50, 6 April 2022 (UTC)
- (Withdrawn) 19:38, 8 May 2022 (UTC)
- My bad if this discussion was taken elsewhere. From here it looks like @Geographyinitiative's concern wasn't addressed.
- Russian is "much more widely spoken and studied" (in @Granger's words) than Bulgarian and may have more contributors, but the inequality — if we can call it that — ends there. And so it is for any number of language pairs. How does it "help the users of Wiktionary" that Cantonese is in effect treated as a dialect of Mandarin? Is there a non-circular justification for this? 釆 (talk) 01:45, 21 December 2022 (UTC)
Why are many Chinese words that are from Japanese wasei-kangos not treated as wasei-kangos?
In Wiktionary, only part of the Chinese words that are from Japanese wasei-kangos are added with template "wasei kango" in their section "Etymology" (such as "電話", "進化", "宗教", etc.), while others are not. In fact, words like "階級", "社會", "文明", "主義", "獨裁", etc. are also wasei-kangos. Why are they not treated as wasei-kangos in this website? --NasalCavityRespiratory (talk) 09:44, 10 April 2022 (UTC)
- @NasalCavityRespiratory: Special:Contributions/49.179.157.161 —Fish bowl (talk) 09:46, 10 April 2022 (UTC)
- @Fish bowl: So the templates in many entries have been removed because their etymologies have not been verified? So how do they make sure that some of the words like "電話", "進化" are verified to be wasei-kangos? (My native language is not English and I am a new user. Please forgive me.) —NasalCavityRespiratory (talk) 10:35, 10 April 2022 (UTC)
- @NasalCavityRespiratory (I'd like to know the answer to this as well, if you've found it.) 釆 (talk) 01:49, 21 December 2022 (UTC)
Words with no traditional form
Does the requirement to have a Traditional Chinese form as the lemma still hold for words that exist only in Simplified Chinese form and don't have a Traditional Chinese form (in which case the Traditional-Chinese-as-lemma requirement would force us to invent a Traditional Chinese form out of whole cloth)? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 20:27, 15 April 2022 (UTC)
- Do you have any specific examples in mind? —Granger (talk · contribs) 23:00, 15 April 2022 (UTC)
(Withdrawn) 09:50, 3 May 2022 (UTC)
- A common way to form compounds in Chinese is by combining antonyms – 大小, 东西, 多少, etc. These seem to be categorized in Category:Chinese antonymous compounds. —Granger (talk · contribs) 10:02, 3 May 2022 (UTC)
Remove pinyin when linking to other entries?
Currently the synonyms, compounds, derived terms, etc. sections contain a large number of links to other entries, which are often accompanied by the pinyin of that entry. (This includes pinyin automatically generated by {{zh-l}}, {{zh-m}}, etc., as well as pinyin in plain wikitext or in {{zh-der}}.) This causes several issues:
- It belittles and ignores the non-Mandarin languages, which do not use pinyin. Note that the pronunciation/romanisation/transliteration of these is listed only on the entry pages and (rarely) in example sentences, but not under these sections (there might be some cases, but they are so rare that I haven't found one yet), even when the link is specified to be limited to that language (e.g. in 吹水#synonyms).
- When generated automatically, it assumes that the entry has a Mandarin pronunciation, even when the entry is not (commonly) used in Mandarin (e.g. 靚#Compounds under Etymology 2).
- Editors sometimes would not check the correctness of the automatically generated pinyin, especially when there is a large amount of them in these sections
- This creates clutter, inflates the page size considerably, and uses relatively expensive functions to derive pinyin from the provided words/characters
- This causes inconsistencies where only some links have pinyin (e.g. see 口/derived terms).
- The pinyin can be confused with the language names and/or glosses, since both are in the exact same font style (e.g. in 吹水#Synonyms)
The readers can already check the (often-more-detailed) pronunciation tables via the links, so I believe that removing them will not cause considerable usability issues. -- Wpi31 (talk) 14:52, 4 July 2022 (UTC)
CJKV Character list by Ideographic Description Characters
I miss an appendix of characters ordered according to the Unicode ideographic description character used in their description. Backinstadiums (talk) 13:07, 11 September 2022 (UTC)
"(This includes names derived at an older stage of the language.)"
(Withdrawn) 20:13, 3 October 2022 (UTC)
- The parenthetical appears to be boilerplate, also used in categories such as Category:English surnames from Spanish. It presumably applies to Chinese too; surely there are, for instance, English surnames derived from Cantonese back in the 19th century when Cantonese phonemically distinguished place of articulation for sibilants. Not everything is a political conspiracy. —Granger (talk · contribs) 20:29, 3 October 2022 (UTC)
"Topolects" vs. "Unified Chinese"
Recently, I began to use "Min Nan" as the header for several Min Nan entries, based on the existence of the following cases:
- Min Nan lemmas of Japanese origin, e.g. 歐兜邁 should be more appropriately expressed in Pe̍h-ōe-jī as o͘-tó͘-bái.
- Min Nan lemmas with an uncertain etymon and with diverse choices of Han characters (as semantic or phonological loans), e.g. siâⁿ written as 唌, 饗, 眩, 城, 邪, 炫, etc.
- Min Nan lemmas with an uncertain etymon and without clearly widespread use of Han characters, e.g. phián in phián-thô͘ (to scratch on the ground; to dig soil and turn it over).
But a senpai told me that
- Min Nan should not be used as a heading for Chinese character entries, even if the word is exclusively used in Min Nan.
I don't completely object to this practice; I even support the usage of {{zh-see|xx|poj}} to direct Pe̍h-ōe-jī entries to Han character entries. What I object to is the attempt to "unify Chinese" to the extreme and suppress the subjectivity of various topolects in the meantime.
Although there have been discussions (Wiktionary:Votes/pl-2014-04/Unified_Chinese and Wiktionary talk:About Chinese#Unified Chinese revisited) about treating several Chinese languages as a single language "Chinese", neither WT:About Chinese nor WT:About Han script has laid down precise guidelines stating that Chinese topolectal entries written in Han characters must be put under the header "Chinese". Furthermore, WT:LT doesn't rule out the possibility that any topolect can be treated as a separate language and thus have its own headings.
I understand some people want to merge all entries of Chinese languages for the sake of simplicity or even unity. But it is well-known that languages themselves do not possess inherent simplicity, let alone unity. While the identification, naming and classification of languages are considered objective and scientific, the unification of languages should be considered subjective and arbitrary, and often with a political purpose.
Regardless of the motivation for unifying the language, its impact is too severe to ignore. If one attempts to "unify" a language, some varieties will definitely gain prestige and some get marginalized, as blatantly pointed out and preached by the rationale in Wiktionary:Votes/pl-2014-04/Unified_Chinese. However, the fact that 99% of Mandarin lemmas are cross-topolectal is a consequence of Mandarin having long occupied the written corpus of Chinese languages and dominated the national language policies of many Chinese-speaking countries. If the practice of "unifying Chinese" in Wiktionary continues to be implemented but "Chinese" entries are always centered on Modern Standard Mandarin, then the diversity of Chinese languages will only worsen.
Taking a stance to preserve the diversity of language, not to eliminate it, I recommend encouraging any demonstration of the subjectivity of topolect as a language. After all, since we allow Okinawan and the Yonaguni lemmas written in Kana (see 大和 for example), what justifies us to prohibit the Hakka language written in Han characters from having its own lemmas?
Based on the above reasons, I would like to propose some improvements to the current practice:
- About heading
- If a lemma belongs exclusively to one topolect listed in ISO 639-3, whether it is written in Han characters or in any allowed romanization, it should be placed under the L2 header specifying that topolect on its own. The "exclusive usage" should always be attested, of course. For example, 𢯭手 should be placed under the header ==Hakka==, and 鬥跤手 under ==Min Nan==.
- If a lemma is used in two or more topolects listed in ISO 639-3, then it can be placed under the header ==Chinese==, in the sense that Chinese is a macrolanguage to which the lemma belongs. For example, 幫忙 (used in Mandarin, Cantonese, etc.) and 鬥相共 (used in Min Nan and Zhao'an Hakka) can be placed under the header ==Chinese==, as currently presented.
- Using ISO 639-3 as a criterion is only advisory but not mandatory. More nuanced distinctions are also welcome.
- About pronunciation
- The functionality of Template:zh-pron should be enhanced to present phonetic/phonological diversity as much as possible. For example, the pronunciation of Hailu Hakka of 𢯭手 (Taiwanese Hakka Romanization System: tenˇ shiuˊ) should be presented.
- The templates for specific topolects should be accordingly updated.
- About category
- Topolect-specific thematic categories (e.g. Category:nan:Technology) should be allowed for all topolects.
- Template:zh-pron should be modified such that a term is automatically categorized into multiple topolects when the corresponding pronunciations are given.
- About the Han character variants
- Template:zh-forms, Template:zh-see and their respective modules should be modified so that the variants of Chinese characters in different Chinese topolects can be processed and categorized. For example, the recent addition of the available value trc in Template:zh-see aims to indicate the Taiwanese Southern Min Recommended Characters. Such specification only appears in Min Nan terms. An expedient approach would be adding a parameter to specify the language code and to display the name of the topolect.
- Module:zh/data/glosses should include topolectal glosses, e.g. ["𢯭"] = "(Hakka) to help".
- Creating modules or glosses specific to each topolect is also feasible.
- About etymology
- When a Han character is proven a phonological or semantic loan in a certain topolect and therefore owns its own variant forms, it is better to handle the etymology separately. Multiple L3 headers labelling ===Pronunciation x=== or ===Etymology x=== can be used. See 絚 for an example.
Simply put, if a term is cross-topolectal, it can be treated as a "Chinese" term, taking into account the topolectal varieties. If a term is specific to just one topolect, it is treated independently.
I am not trying to overturn the decision made in Wiktionary:Votes/pl-2014-04/Unified_Chinese. My suggestions are certainly not perfect and require more discussion. I just hope to inspire everyone to take the variety of Chinese language more seriously.
Wikijb (talk) 21:14, 6 December 2023 (UTC)
- Oppose. The current practice based on the vote is to have all topolect terms under ==Chinese==, including a large number of Cantonese- and Min Nan-specific terms. {{zh-pron}} takes care of categorisations, which won't include Mandarin, etc. if only |mn= (Min Nan) was specified. Besides, we have the {{lb|zh|Min Nan}} labelling technique to make it even more specific. ==Min Nan== L2 headers are only used for POJ soft redirects. The rationale on the vote includes an example of a Cantonese-specific word, which is never used outside Cantonese. Anatoli T. (обсудить/вклад) 23:31, 6 December 2023 (UTC)
- Oppose. It leads to confusion in formatting, giving editors an additional hurdle in learning entry creation/formatting. When a term is later found to be used not only in Min Nan but also in other varieties, there would also be several places where changes need to be made, leading to a higher probability of malformed entries; these are usually small changes that only affect categorization, which makes them hard to detect. I really warn you against implementing any of these ideas you mention above until consensus has been reached. — justin(r)leung { (t...) | c=› } 00:37, 7 December 2023 (UTC)
- One area where we could do better if we are to continue with the unified Chinese approach is with categorization with {{C}}. We should probably have both {{C|zh|X|Y|Z}} and {{C|nan|X|Y|Z}} when a term is used in Min Nan. — justin(r)leung { (t...) | c=› } 00:41, 7 December 2023 (UTC)
- @Justinrleung: Thanks. Do you mean with {{C|nan|X|Y|Z}} won't categorise under Category:Chinese lemmas or topical categories only - Category:zh:All_topics? I support the latter, not the former. Anatoli T. (обсудить/вклад) 01:06, 7 December 2023 (UTC)
- @Atitarev: I'm talking about topical categories. — justin(r)leung { (t...) | c=› } 01:30, 7 December 2023 (UTC)
- I just re-read the proposals above, and there are a few other things I would support. The points under pronunciation are definitely something we should pursue; I don't think there would be anyone opposed to having support for additional varieties. Hailu is already one of the things we're planning to implement in the future.
- I'm not exactly sure what the point about etymology means, and how it differs from current practice in general. — justin(r)leung { (t...) | c=› } 01:35, 7 December 2023 (UTC)
- Partial oppose. I share basically the same views with Justin. While I am certainly in opposition to the "let's dump everything under Chinese" approach, I don't think the suggestions regarding "heading" would be feasible, and perhaps even worse than the existing approach.
- On top of that I think that the points under "Han character variants" are symptoms of the problems of the templates {{zh-see}} and {{zh-forms}} - I also find them rather problematic, but I disagree with the suggestions; they should instead be rewritten/replaced with better templates. – wpi (talk) 03:00, 7 December 2023 (UTC)
- Oppose. Given that you only make reference to Southern Min and Neo-Hakka, I would like to fill you in that (at the very least) a lot of ISO 639-3 groups are questionable at best and some, like wuu, would just end up with the same problem we've started with. Splitting headers, like what wpi and justin have already said, would cause a big mess. Whereas your ideas for zh-pron improvement are interesting, at the current point in time, I don't think any form of what you have proposed would be ergonomic or even feasible in Northern Wu due to how varied and complex the tone sandhi systems are. — 義順 (talk) 08:08, 7 December 2023 (UTC)
too many label aliases?
(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho Hi everyone. I just implemented functionality to display all the labels in all languages that categorize into a given category. See Category:Taiwanese Hokkien for an example. As can be seen in this category, we have a ton of aliases that produce the Taiwanese Hokkien category. Do we really need all of these aliases? It makes bot work a pain to have to account for all of them and it seems needless. What do people think of cutting down the number to something more reasonable? E.g. do we really need a Taiwanese Hokkien and Hakka label (with 20 or so aliases) at all? Why not just put Taiwanese Hokkien and Hakka separately? Benwing2 (talk) 04:09, 17 March 2024 (UTC)
- @Benwing2: Agreed to split.
- Please also decide on labels. I can see former "Min Nan" changes back and forth with a couple of days(?) and still inconsistent for translations (new, recent, old) and Chinese entries. Confused between "Southern Min" and "Hokkien" for "nan-hbl" language code.
- If using a bot, pls don't forget about the alphabetical order for nested translations. Anatoli T. (обсудить/вклад) 04:15, 17 March 2024 (UTC)
- @Atitarev Yup, my script to fix up translation tables now sorts things alphabetically in nested translations as well as at the top level. Formerly it didn't change the order of nested translations but that's been fixed. In the former state I ran it on all translation tables but I haven't rerun it universally since the fix, only on pages where I renamed "Min Nan" to Hokkien. If you see any translation tables with Southern Min or Min Nan in them, please let me know; I suspect they've been added recently (i.e. after the Mar 1 dump I used to find translation tables with Min Nan translations). Benwing2 (talk) 04:20, 17 March 2024 (UTC)
- @Benwing2 Just on the "Taiwanese Hokkien and Hakka" label, I think it's supposed to be used instead of having "Taiwanese Hokkien" and "Taiwanese Hakka" separately. I agree it's silly, though. Theknightwho (talk) 04:43, 17 March 2024 (UTC)
- @Theknightwho Yeah I suppose it was added to avoid a bit of redundancy with separate labels Taiwanese Hokkien and Taiwanese Hakka displaying the word "Taiwanese" twice. But that seems hardly enough reason to have the label and if this is really an issue, we can add a capability in Module:labels to compress adjacent labels of certain sorts in certain ways. Benwing2 (talk) 04:47, 17 March 2024 (UTC)
Shuangfeng = Loudi?
@ND381 Why is the Loudi guide placed under Shuangfeng dialect? Are we treating them as the same dialect? Thanks! Mar vin kaiser (talk) 14:41, 23 July 2024 (UTC)
- you can fix it if you want — nd381 (talk) 23:34, 23 July 2024 (UTC)
Let's categorize single-character entries by radical
This seems like something that must have been thought of and dismissed before, but here goes...
I would like to propose a set of categories that would be added to entries for all hanzi characters covered by the radical and strokes system.
The top category would be called something like "Han characters by radical", and the others would be called "Han characters with radical 一", etc.
This would cover the same ground as Appendix:Chinese radical and its subpages, but I see it as a complement to them rather than a replacement: one would be able to switch back and forth between them depending on personal preference and convenience.
Now for the technical part: all it would require to implement this would be to modify {{Han char}} to generate the categories with sortkeys, and to add code for the categories to the appropriate category modules so {{auto cat}} would know what to do with them.
As for the sort keys: I have created a non-Lua template, {{1chn}}, that will convert any number from 0 to 50 to a single character using Unicode enclosed characters.
+0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | +8 | +9 |
---|---|---|---|---|---|---|---|---|---|
⓪ | ① | ② | ③ | ④ | ⑤ | ⑥ | ⑦ | ⑧ | ⑨ |
⑩ | ⑪ | ⑫ | ⑬ | ⑭ | ⑮ | ⑯ | ⑰ | ⑱ | ⑲ |
⑳ | ㉑ | ㉒ | ㉓ | ㉔ | ㉕ | ㉖ | ㉗ | ㉘ | ㉙ |
㉚ | ㉛ | ㉜ | ㉝ | ㉞ | ㉟ | ㊱ | ㊲ | ㊳ | ㊴ |
㊵ | ㊶ | ㊷ | ㊸ | ㊹ | ㊺ | ㊻ | ㊼ | ㊽ | ㊾ |
㊿ |
This would allow section headers for stroke counts that would be sorted in the correct order, since MediaWiki categories use the first character of the sortkey for those. It would be a simple matter of generating sortkeys from the |as= parameter and the character itself. The only peculiarity is that the Unicode characters for the digits 0-9 are different in appearance from those of the higher numbers. There may also be issues of font coverage for those blocks.
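To make the sortkey idea concrete, here is a minimal sketch in Python rather than wikitext or Lua. It is purely illustrative and is not the code of {{1chn}} or {{Han char}}: the function names enclosed_number and radical_sortkey are hypothetical, and it assumes the additional stroke count arrives as an integer from 0 to 50.

```python
def enclosed_number(n: int) -> str:
    """Return the single Unicode enclosed-number character for 0 <= n <= 50."""
    if n == 0:
        return "\u24ea"                 # ⓪ CIRCLED DIGIT ZERO
    if 1 <= n <= 20:
        return chr(0x2460 + (n - 1))    # ① .. ⑳ (U+2460 .. U+2473)
    if 21 <= n <= 35:
        return chr(0x3251 + (n - 21))   # ㉑ .. ㉟ (U+3251 .. U+325F)
    if 36 <= n <= 50:
        return chr(0x32B1 + (n - 36))   # ㊱ .. ㊿ (U+32B1 .. U+32BF)
    raise ValueError(f"no single enclosed character for {n}")


def radical_sortkey(additional_strokes: int, character: str) -> str:
    """Hypothetical sortkey: the stroke-count character, then the entry itself.

    The first character decides the section heading on the category page;
    the entry character orders entries within that section.
    """
    return enclosed_number(additional_strokes) + character


if __name__ == "__main__":
    # Illustrative stroke counts only, not taken from any entry.
    for strokes, char in [(0, "一"), (7, "口"), (21, "口"), (50, "口")]:
        print(strokes, radical_sortkey(strokes, char))
```

One thing the sketch makes visible: ⓪ is U+24EA, which lies after ⑳ (U+2473) in raw code-point order, so whether the zero-stroke section sorts first depends on the collation the categories use.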
I'm assuming the technical changes will be relatively minor, but I might as well ping @Theknightwho in case I've overlooked something about the backend. I don't know enough Lua to do the module work myself, so I want to be sure I'm not asking too much, and I don't want to add too much system overhead to already overloaded pages. Chuck Entz (talk) 01:12, 27 July 2024 (UTC)
- @Chuck Entz I agree with categorising by radical. In terms of the sortkey, I agree that it's a good idea to just use the additional stroke count, since otherwise everything would be in the same section.
- There are some potential complications, as certain characters have different stroke counts depending on the language (or jurisdiction): e.g. 着 has 11 strokes in mainland China and 12 everywhere else, while 漢 has 14 strokes in China, Taiwan and Korea, but 13 in Japan etc. Theknightwho (talk) 01:27, 27 July 2024 (UTC)