Wiktionary talk:About Han script


General Questions

Thanks for letting me know about your recommendations. This looks good, but I do have a few comments.

  1. Is the Hanja not used in North Korea?
  2. I would love to be able to type in the character rather than the Unicode. Looking these things up in the Unicode directory is a tedious process. Most of us don't have the software to allow for directly entering them.
  3. My Japanese dictionary uses the radicals; I don't know about Korean.
  4. It would be lovely if we went beyond stroke number, and could show progressive diagrams of the character being developed stroke by stroke
  5. The order CJK is also alphabetical so we don't need to justify the order on the basis of frequency.
  6. As I have done with English entries I would put matters of "style" and romanization before the meanings. These are all things which describe the word. The matters that follow the meaning all represent things that come from the word.
  7. I believe that a distinction needs to be made between romanization and pronunciation. They are not equivalent. Romanization is a way of writing Chinese with the Roman alphabet, and it is used independently of the particular Roman scripted language. It just happens to have a high degree of correlation with Mandarin pronunciation. Other romanizations can be given if available. Pronunciation guides for other dialects would also be useful.
  8. The number of "Example" or derived words and terms should remain open-ended for now. To be considered here are words that include the character as part of a bigger character, whether or not it is as a radical. I would also chow derived Chinese binomials.

Eclecticology 08:09 May 5, 2003 (UTC)

1. According to Omniglot and CJK Dictionary Institute, North Korea stopped using Hanja in 1949. They were abolished by the government.
7. Good point. The pronunciation-Romanization distinction is crucial. The only standard pronunciation entering method I know of is w:SAMPA. It isn't hard to enter and after some usage can be recognized quickly.
Normal IPA appears in the Edit Box of some browsers (such as IE 5) as empty squares.
Should we provide non-standard pronunciation based on English spelling like "shian" for 仙? They're inaccurate, but they benefit non-East Asian readers and those who refuse to learn SAMPA.
8. What do you mean by "chow derived Chinese binomials"?
--Menchi 12:50 May 5, 2003 (UTC)

1. OK not a big point about Korean. Ultimately, Wiktionary is about the languages rather than the countries.

It appears that Hanja education was initially banned in North Korea, but it was eventually phased back in. Hanja are apparently taught in North Korea--2000 in middle school and high school and another 1000 in university. (Source: http://www.hanjadoc.com/news/chosun001015.htm -- sorry, but the page is in Korean.) --Sewing 23:28, 1 Oct 2003 (UTC)

7. I've always been partial to IPA myself. I find many of the usages in SAMPA to be so counterintuitive as to be misleading. I expect that something like UTF-8 will be the standard of the future, but that remains a problem for old browsers. Those who can't read the IPA symbols aren't doing any better with CJK characters.
Non-standard pronunciation guides can sometimes confuse the issue. Anglophone instinct might be to treat "shian" as having two syllables. Non-standard techniques can be used when they can clearly make a distinction; I prefer to avoid them. Pronunciation issues will continue to be a problem in every language, including English. The one right way often does not exist.
8. Sorry! My typo; "chow" should be "show". Eclecticology 16:37 May 5, 2003 (UTC)

8. What are "Chinese binomials"? --Menchi 22:14 May 5, 2003 (UTC)
Chinese binomials refers to the practice of using two Chinese characters to represent a concept. Eclecticology
I see. It'd be of interest. But if the character is a radical, then aren't all the derived binomials listed in the Radical directory? --Menchi 08:24 May 6, 2003 (UTC)
Apparently not... Should we have such lists? Derived binomials following their radicals? --Menchi 08:27 May 6, 2003 (UTC)
Having these on the pages for radicals would soon make these pages far too big, especially for the most common radicals. My dictionary has 39 derivatives for the character alone. The Unicode directory then has a 2½ page listing of characters with the radical . There is potential here for a few thousand entries on one page. Eclecticology 18:05 May 6, 2003 (UTC)
  • I've put info on the alternative usage of 正 in counting votes etc. in the Japanese Other info: section. See if it fits your idea of what should go there, and if applicable, do the same for Korean and/or Chinese (I don't know if 正 is used there for that purpose as well). If you think it's the kind of content that should go into other info:, it'd be good to expand the template with an example.

80.53.75.130 23:49, 24 July 2005 (UTC)


Japanese Reading

Thank you for your message. May I suggest one more category for the Japanese section? Many kanji, when combined with other kanji to form a word, give the word a special reading that contains neither the on reading nor the kun reading of the component kanji. For example, 今日 (today) is written 'kyō' in romaji, and 大人 (adult) is written 'otona'. Neither includes an on or kun reading. Petruk 23:29 May 8, 2003 (UTC)

How do we format this? A section for each reading, under the main section "Other readings"? Or can they be fitted under "Example words" section? --Menchi 00:22 May 9, 2003 (UTC)

Maybe we could group the three readings under one section. Below is what it would look like for (now) :

Onyomi and kunyomi are on reading and kun reading respectively. These terms are extensively used in Japan and by new learners of the language, so they deserve recognition and not just on or kun which IMO are too short. Petruk 00:09 May 10, 2003 (UTC)

This makes the Chinese look easy. To me the underlying question is always, "How will the unsophisticated person look something up in Wiktionary?" Strictly speaking, no particular pronunciation is implied by a kanji character. We need to assume that the reader does not yet know whether the character in front of him is Chinese or Japanese. We'll need separate articles for both the individual characters, and for the compounds, and also for the romanizations.
The English mind, like most other European minds, thinks in terms of a script that is directly linked to sound and pronunciation. The kanji is linked directly to meaning, and bypasses pronunciation. This makes for a fascinating challenge. Eclecticology 01:52 May 10, 2003 (UTC)

Chinese Dialectal Readings

It's easy to include them, like:


But how about in the example section? Do you just use Mandarin for the sake of simplicity and since it is the standard Chinese? --Menchi 01:26 May 14, 2003 (UTC)


Pronunciation & Definition

Many Chinese characters have more than one pronunciation (in Mandarin), and the different pronunciations are tied to different meanings. For example, 数 can be pronounced shu4 or shu3: the first is used as a noun, the second as a verb. Where shall we place them? Under definition or pronunciation? --Formulax 08:19 May 21, 2003 (UTC)

Formulax, welcome! I think under pronunciation is neater. There are several established methods in paper Chinese-English and Chinese dictionaries:
  1. Some dictionaries list the different pronunciations in the pronunciation section, and distinguish them by means of:
    1. numbering, or
    2. identifying different parts of speech.
  2. Others give a one-word definition next to each pronunciation in the pronunciation section to identify it.
  3. Others give each pronunciation an entirely new entry when the pronunciations have different meanings.
I think option 1.1 looks the most concise. E.g.,
...
...Romanizations
In a list
  • pinyin:
    • yuè (yue4) (1)
    • lè (le4) (2)
    • yào (yao4) (3)
Or, in a row
  • pinyin: yuè (yue4) (1), lè (le4) (2), yào (yao4) (3)
The list probably is easier to read.
What's your opinion? --Menchi 09:25 May 21, 2003 (UTC)

Word vs. Character

I think I like the list. But I've had another thought: perhaps we need not create a separate page for every Chinese phrase? I mean, we can put all Chinese phrases under one character, e.g. 数学、数字、数落、数据…… can all be classified under the character 数. We need not create a separate page for each of them. This is what most Chinese dictionaries do. I think it is quite good, as readers can find what they want straight away. For example, I've got one Chinese-English dictionary that lists the following under 数 (just as an example):

(shu4) 1. number; figure: 无理数 irrational number/ 代表人数 number of delegates 2. several; a few: 数百人 several hundred people ……

【数据】 (shu4 ju4) data: 科学数据 scientific data
【数学】 (shu4 xue2) mathematics: 数学家 mathematician

…………

What do you think?

Presently, there is not yet an entry for a multi-character word in any of the CJK languages. The present practice of linking every word, whether multi-character or single-character, has two causes:
  1. A parallelism with the phonetic languages, where, except for compound words, words can't be broken down any further, unless done so etymologically.
  2. Perhaps more importantly, Wiktionary probably originally intended (and maybe still intends) to place translations into all the languages of the world on the page of each word, with words from every language, not just English, getting such pages. (The practice now, however, is to place translations into other languages only on the page of the English word.) To do the original comprehensive translation multi-directionally, each word would require its own page. However, adding or modifying a translation of a word would then have to be done n times: theoretically once for every language, which would be hundreds, and in practice for at least 20 or so major languages and 3 artificial languages (Esperanto, Interlingua, and Volapük) that have gathered a great presence on the 'Net. I do not know how that is feasible at all for a human with a normal attention span.
What's done so far is that all English translations of multi-character CJK words are on their respective initial character page. Most of them will not get their own individual entries any time soon, so Wikifying them may be unnecessary.
An example of a need that would justify giving a multi-character word its own entry: clustering of data, when the translation is too long (more than 5 words? a sentence?) or the translations are too numerous. --Menchi 20:25 May 22, 2003 (UTC)
I think pinyin written this way, Zhōngwén (中文), is better than Zhong1wen2 (even though the latter is much easier to type). Also, can I use 'sC' for 'simplified Chinese' and 'tC' for 'traditional Chinese' in the English-Chinese dictionary to keep the page tidy? Here is an example (which does not use the abbreviations): Chinese. I think both sC and tC should be added in the English-Chinese dictionary. :O --Samuel 18:50 Jul 27, 2003 (UTC)
By the way, if we present Cantonese in the dictionary as well, maybe we should also add the other main dialects too! (Or is it just because Cantonese is used in Hong Kong and Macau?) And... hmmm... even though my mother tongue is Cantonese, I have never learnt how to write Chinese characters in the Latin alphabet... I can't read Penkyamp or anything like that! That's too bad for me :'( --Samuel 18:58 Jul 27, 2003 (UTC)
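The trade-off raised above (numbered pinyin is easier to type, tone-marked pinyin is nicer to read) could be eased by converting one into the other automatically. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the usual placement convention (mark "a" or "e" if present, the "o" of "ou", otherwise the last vowel) and that every syllable is followed by a tone digit 1-5.

```python
import re

# Tone-marked forms of each vowel, indexed by tone number 1-4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}

def mark_syllable(syllable: str, tone: int) -> str:
    """Place the tone mark on one pinyin syllable (tone 5 = neutral, unmarked)."""
    if tone == 5:
        return syllable
    lower = syllable.lower()
    if "a" in lower:
        pos = lower.index("a")
    elif "e" in lower:
        pos = lower.index("e")
    elif "ou" in lower:
        pos = lower.index("o")
    else:
        # Otherwise the mark goes on the last vowel of the syllable.
        vowel_positions = [i for i, ch in enumerate(lower) if ch in TONE_MARKS]
        if not vowel_positions:
            return syllable
        pos = vowel_positions[-1]
    marked = TONE_MARKS[lower[pos]][tone - 1]
    if syllable[pos].isupper():
        marked = marked.upper()
    return syllable[:pos] + marked + syllable[pos + 1:]

def numbered_to_marked(text: str) -> str:
    """Convert numbered pinyin such as 'Zhong1wen2' to tone-marked pinyin."""
    return re.sub(
        r"([A-Za-zü]+?)([1-5])",
        lambda m: mark_syllable(m.group(1), int(m.group(2))),
        text,
    )

print(numbered_to_marked("Zhong1wen2"))
print(numbered_to_marked("shu4 ju4"))
```

With something like this, contributors could keep typing the numbered form while entries display the tone-marked form.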



Radical Index

Why are some radicals mixed up with ordinary characters? That's anomalous for a dictionary and annoying. How about putting radicals on separate indexes like: Wiktionary:Chinese_radical_index_一? --Nanshu 00:19, 21 Aug 2003 (UTC)

I second that motion. -- Emperorbma 04:47, 24 Aug 2003 (UTC)

Bot

I think Unicode's Unihan database (Note: very large) provides enough information for templates. And it's not difficult to create template articles automatically from the Unihan database. May I complete all "standard" Chinese characters (about 28,000) in Unicode using a bot? --Nanshu 00:46, 24 Aug 2003 (UTC)

Here is an example of machine output text: . Any comment? --Nanshu 00:32, 31 Aug 2003 (UTC)

Known problems:

The Bot

  • does not use long vowel signs for Japanese kun readings, because that requires morphemic analysis. Compare 講師 kou-shi and 子牛 ko-ushi.
  • detects the type of a character ("simplified", "traditional" or "both") by variants. If it has one or more simplified variants, it is "traditional". If its simplified variant is also a traditional character (like 臺), its simplified variant (like 台) is "both". The same is true of traditional variants. But the detection is sometimes wrong.
  • does not designate the type of a character ("simplified", "traditional" or "both") if it has neither simplified nor traditional variants.
  • sorts Chinese readings by alphabetical order. They should be rearranged by frequency.

--Nanshu
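The long-vowel problem in the first item can be illustrated with a small sketch. This is not the bot's code; the function names are hypothetical, and only the sequences "ou" and "uu" are handled. It shows why a plain string replacement cannot tell 講師 (kou-shi) from 子牛 (ko-ushi), while input that is already split into morphemes can.

```python
def add_macrons(morpheme: str) -> str:
    """Mark long vowels inside one morpheme (only 'ou' and 'uu' are handled)."""
    return morpheme.replace("ou", "ō").replace("uu", "ū")

def romanize(morphemes: list[str]) -> str:
    """Apply long-vowel marks per morpheme, never across a morpheme boundary."""
    return "".join(add_macrons(m) for m in morphemes)

# Without segmentation both words look identical, so the 'ou' is always merged.
print(add_macrons("koushi"))

# With segmentation the two words come out differently.
print(romanize(["kou", "shi"]))   # 講師
print(romanize(["ko", "ushi"]))   # 子牛
```

The segmentation itself is the hard part, which is presumably why the bot leaves the vowels unmarked.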


Index by Korean reading

I will upload the index of Chinese characters by Korean reading, but I wonder how to title the index. My proposal is:

Index:Korean/Hanja/ㄱ

Any comment? --Nanshu

As long as it is a standard Hangeul pronunciation index of 음 (音) readings, it should be fine. --Sewing 14:03, 30 Sep 2003 (UTC)
Done. --Nanshu 00:40, 1 Oct 2003 (UTC)

Also, I think the index of Chinese characters by Korean reading should be separated from that of Korean words because, as in Japanese, most Hanja are not used independently but as part of compounds in Korean. --Nanshu 01:53, 30 Sep 2003 (UTC)

Yes. Except in the cases where single characters are used as independent morphemes, which does occur (for example, 線 (선; seon; "line"), 幅 (폭; pok; "width"), 個 (개; gae; "object" [for counting]), 七 (칠; chil; "seven"), and so on). Anyhow, this is standard: although some dictionaries mix the characters in with the main entries, I think the trend is to have a separate appendix listing at least the 1800 교육용 한자 (Hanja for educational use), always listed in Hangeul pronunciation order. --Sewing 14:03, 30 Sep 2003 (UTC)
By the way, where did the index come from? If I find a missing character ( seems to be missing, for example), can I add it, or might it be regenerated at some point in the future? --Sewing 00:01, 2 Oct 2003 (UTC)

Korean "sound" and "meaning" readings

In Korean character dictionaries, the 훈 (hun; 訓; "meaning") reading *always* accompanies the 음 (eum; 音; "sound") reading. These "meanings" are never used to pronounce the character and are often quite archaic (for example, the 메 (me, an obsolete native Korean word for "mountain") in 메 산 (me san; 山)). Nevertheless, when a character is talked about on its own, it is usually prefixed with its "meaning" reading, to differentiate it from other characters with the same "sound" reading.

Examples of hun and eum readings together include:

  • 山 (메 산; me san; "mountain"; always pronounced 산 (san));
  • 水 (물 수; mul su; "water"; always pronounced 수 (su));
  • 中 (가운데 중; gaunde jung; "centre"; always pronounced 중 (jung));
  • 人 (사람 인; saram in; "person"; always pronounced 인 (in)).

--Sewing 14:13, 30 Sep 2003 (UTC)

I have added details to the template for Korean Hanja and added information to the character entry for (한 일; one) to give an example of what the Korean coverage should look like. --Sewing 14:40, 30 Sep 2003 (UTC)

If you have a public domain or LGPL-ed Eum-Hun table, I'm glad to incorporate it into the template. --Nanshu 00:40, 1 Oct 2003 (UTC)

An example would be the table at http://deungdae.hihome.com/mh1800.htm, but I don't know if it's in the public domain or not. It would actually be better to go through the characters "by hand" and add the readings, because some characters have multiple readings; and anyhow, the table whose URL I gave only gives the 1800 basic 교육용 한자 (敎育用漢字; Hanja for educational use). I would like to do this job, but it will take a while (like a few weeks or months!). I don't really know how bots work, but would it be easy to write a bot to format each Korean entry according to a template, so then all I (or someone else) would have to do is add the text? --Sewing 06:21, 1 Oct 2003 (UTC)
I added the blank item Hun-eum reading. And the Yale romanization of Korean and South Korean spelling variants (롱>농) are implemented. --Nanshu 23:08, 1 Oct 2003 (UTC)
Good job considering the NK/SK variations, because the treatment of initial ㄹ is of course very different in the 2 countries--but there's a catch: see my next posting below. Anyhow, I should have stuck to the correct Korean word, which is 음훈 (eumhun). I have created a Wiktionary entry for the word, and edited the template page and also the page for to show what I now think the Korean entries should look like. I hope this doesn't inconvenience you... --Sewing 23:36, 1 Oct 2003 (UTC)
The NK/SK spelling variation (롱/농) is tricky. In South Korea, when a character that starts with ㄹ is used on its own or at the beginning of a word, the ㄹ is replaced with ㄴ (or with ㅇ if ㅑ, ㅕ, ㅛ, ㅠ, or in some cases ㅣ follows it). (Examples: 羅 (라) in 羅州 나주; the family name 리 (李), usually spelled and pronounced 이.) But when the character comes in the middle or at the end of a word, the ㄹ is maintained (so a South Korean would write 용 (龍) for "dragon," but 백룡 (白龍) for "white dragon"). This rule also applies to the eumhun reading of Hanja: South Koreans normally write 육 for 六, but Chinese character dictionaries call the character "여섯 륙" (yeoseon-nyuk). For 용, the eumhun reading is "용 룡" (yong-nyong)! In North Korea, however, the ㄹ is always used, even at the beginning of a word--so a North Korean would spell the South Korean city name 나주 as 라주, and romanize it as Raju (like Rajin near the Russian border), instead of Naju. This consonant-changing rule also applies to ㄴ when it comes before ㅑ, ㅕ, ㅛ, ㅠ: for example, 녀 (女), which is spelled and pronounced 여 in 여자 (女子). --Sewing 16:12, 2 Oct 2003 (UTC)
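The word-initial rule described above can be sketched in code using the standard Unicode arithmetic for precomposed Hangul syllables (code point = 0xAC00 + (initial × 21 + medial) × 28 + final). This is an illustration only, not the template's implementation: the function names are invented, ㅣ is treated as always triggering the change for ㄹ (the comment above says "in some cases"), and vowels not named above are not handled.

```python
HANGUL_BASE = 0xAC00
SYLLABLE_COUNT = 11172
INITIAL_N, INITIAL_R, INITIAL_NULL = 2, 5, 11   # indices of ㄴ, ㄹ, ㅇ
Y_VOWELS = {2, 6, 12, 17}                       # indices of ㅑ, ㅕ, ㅛ, ㅠ
I_VOWEL = 20                                    # index of ㅣ (simplifying assumption)

def south_korean_initial(syllable: str) -> str:
    """Apply the South Korean word-initial rule to a single Hangul syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        return syllable                          # not a precomposed Hangul syllable
    initial, rest = divmod(offset, 21 * 28)      # rest encodes medial and final
    medial = rest // 28
    if initial == INITIAL_R:
        initial = INITIAL_NULL if medial in Y_VOWELS or medial == I_VOWEL else INITIAL_N
    elif initial == INITIAL_N and medial in Y_VOWELS:
        initial = INITIAL_NULL
    return chr(HANGUL_BASE + initial * 21 * 28 + rest)

def south_korean_spelling(word: str) -> str:
    """Change only the first syllable; later syllables keep their ㄹ or ㄴ."""
    return south_korean_initial(word[0]) + word[1:] if word else word

for north in ["라주", "룡", "백룡", "륙", "녀자"]:
    print(north, "->", south_korean_spelling(north))
```

Because only the first syllable is touched, 백룡 keeps its ㄹ while 룡 on its own becomes 용, matching the examples given.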
Facts themselves aren't copyrighted, but a compilation of facts can be. Anyway, I'm not good at law. --Nanshu
True enough. Basically, all Korean Hanja dictionaries give the same names for the Hanja, and sometimes even the same definitions; because nobody has a copyright on the names of the Hanja. --Sewing 02:52, 4 Oct 2003 (UTC)

The articles , , and

These articles reveal several serious flaws in the current setup.

  1. Not all meanings of 丰 are shared with ; Not all meanings of 余 are shared with ; Not all meanings of 只 are common to and .
  2. By saying 丰 is an alternative form of 豐, people may mistakenly think that it is also the case in Japanese and Korean but it is not true.
  3. 湯 means "soup" in Chinese but "hot water" in Japanese; there is no standard way to identify such differences.

Removed extra lines

I removed the extra lines from above the headings. I don't know why we need them; they are not used anywhere else. Wikipedia, Wikibooks, etc. avoid them, so we should too. Also, I think it is best not to put hyperlinks in the headings. On Wikipedia it's

History

Main Article: History

and not

I think we should do the same. Gmcfoley 20:04, 3 November 2005 (UTC)

Changed heading style

I changed the heading style from

Japanese Kanji

to

Japanese

Kanji

This is because the kanji may also be a word, for an example view under Japanese. It's a kanji and a noun. Gmcfoley 20:21, 3 November 2005 (UTC)

Sortkeys and subcats for single-character entries

Discussion copied/moved from the Beer parlour:

My apologies if this has been discussed elsewhere, but I was going through some of the single-character CJK entries (using Special:Allpages, starting from the end and working backwards) adding the categories Category:Chinese hanzi / Category:Japanese kanji / Category:Hanja to the entries, as appropriate, until I came to my senses and realized that I'm not gonna do that "by hand" for 10,000 entries (or however many there are)! We need someone to write/modify a bot to do this automatically. What it would have to do:

  • For each CJK character, find the page for it or add it to a list if there is no existing page.
  • If the page exists, look for appropriate strings to identify it as belonging to one or more of the above categories. (Since most of the pages were added by the bot NanshuBot and many have not been touched since, it should be easy to create regexes that will pick out this info.)
  • Tag the articles with the necessary categories.
  • Even better, find the radical for each character (already on the pages themselves) and use that as a sortkey for the category, to sort the categories by radical (as is done in the Japanese category).
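The tagging steps listed above could look roughly like the sketch below. It is not an existing bot: the function name and the heading patterns are assumptions about how the bot-generated pages are laid out, and only the three category names come from this discussion. The radical is passed in rather than parsed, since the exact way it appears on the pages is not given here.

```python
import re

# Category name -> pattern that (hypothetically) identifies the language section.
CATEGORY_PATTERNS = {
    "Chinese hanzi": re.compile(r"^=+\s*Chinese\b.*=+\s*$", re.MULTILINE),
    "Japanese kanji": re.compile(r"^=+\s*Japanese\b.*=+\s*$", re.MULTILINE),
    "Hanja": re.compile(r"^=+\s*Korean\b.*=+\s*$", re.MULTILINE),
}

def category_tags(wikitext, radical=None):
    """Return the category links a page is missing, sorted under its radical if known."""
    tags = []
    for name, pattern in CATEGORY_PATTERNS.items():
        already_tagged = f"[[Category:{name}" in wikitext
        if pattern.search(wikitext) and not already_tagged:
            if radical:
                tags.append(f"[[Category:{name}|{radical}]]")
            else:
                tags.append(f"[[Category:{name}]]")
    return tags

# Made-up page text for illustration only.
page = "==Chinese Hanzi==\nsome text\n==Japanese Kanji==\nsome text\n"
print(category_tags(page, radical="一"))
```

Pages that match none of the patterns would go onto the "no existing page or no recognisable section" list for a human to check.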

Anyone know a bot that could easily be modified to do this? Or would someone like to write such a bot? Other comments/objections? - dcljr 20:20, 24 January 2006 (UTC)

These categories are not very useful. Even with only the 10,000 entries that would be too much. At 200 entries per category page that is already 50 pages. In reality the number of entries that would qualify for this category is many times more. When I just looked, Category:Japanese kanji already had 882 entries. Put yourself in the shoes of a user. How do you go about finding something in a category that long or longer?
I think that categories need planning. Categories should be neither too small nor too big. As I see it any category with more than 200 elements is too big, and should be broken up into sub-categories. How would you go about subdividing Kanji? Eclecticology 00:47, 25 January 2006 (UTC)
I think it is a fallacy to think that the only possible use of a category is for looking up directly related entries. Large categories complement a paper dictionary analogy: looking up a word one is likely to browse other entries as well. Certainly for language-learners, having an "over 200 entries" list would be useful. Certainly for teaching, having large lists can be beneficial.
I do not know of any major server being overwhelmed maintaining category tables; these record entries seem like a very light-weight approach that the DB servers should handle without any noticeable performance hit. (That is an unsubstantiated guess.) Is there any complaint from the developers that the performance hit of categories is overwhelming? When categories were first introduced, there were serious user-interface problems navigating large categories, but those seem to have been overcome about six months ago. --Connel MacKenzie T C 05:20, 25 January 2006 (UTC)
If further subcategorization is needed, that can be handled by users familiar with the respective languages (and future bots). As it is, there are a ton of these pages with no category on them at all, so I was just thinking that should be remedied. Note the existence of many "index" pages for Chinese characters (by radical, strokes, and two input methods) which greatly alleviates the lookup problem (links to these should be given on the various category pages — I'll work on that in a bit). It would be nice to compare how they handle the corresponding category in the Chinese Wiktionary, but zh:Category:汉字 doesn't look like where the action is (that's using simplified characters — a search using traditional characters didn't turn up anything).
As for subdividing the kanji, that category is already being worked on to list the characters by radical in the main category and by grade in subcats, the latter approach being best done "by hand", anyway. The bot can just use cats like [[:Category:Japanese radical <character>]] and these can be checked for any discrepancies between how the languages traditionally assign radicals (NanshuBot was working from a Chinese perspective, it appears). Index pages similar to the Chinese ones can be created for Japanese, but the basic category system should probably be based on radicals. In other words, just change my "sortkey" suggestion to a "subcat" suggestion and any further subcategorization can be done later.
Again, checking how the natives do it is a good idea: ja:Category:漢字 is the main kanji cat, ja:Category:漢字 部首 is for radicals, and ja:Category:漢字 総画 for stroke count. It doesn't appear that they're trying to subcategorize the non-Jōyō kanji at all (e.g., by radical). As for sortkeys, they seem to sort everything by kana (i.e., pronunciation), which makes sense if you're a Japanese speaker. I don't know much about the Korean hanja, but I assume they should be categorized here by radical, as well. Finally, note that Vietnamese also uses Chinese characters (Chữ nôm and Chữ nho), but this information wasn't added by NanshuBot and so is lacking on most pages. - dcljr 06:50, 25 January 2006 (UTC)

Wait, no... don't replace my sortkey suggestion, add to it. Why can't we have both? Hmm... I guess this does require further thought. Let's see how this would work in practice:

  • First of all, we definitely need some kind of sortkeys or else every character gets sorted under itself in the category (not helpful at all):
    ...
  • The alternatives would therefore seem to be:
    1. By radical:
      ...
    2. By kana:
      ...
    3. By stroke count (no existing category doing this, but see ja:Category:漢字 総画):
      29
      30
      ...
    4. Nothing. Don't categorize individual kanji (except semantically); just list them on index pages.

(Wow, that's complicated wiki-formatting... <g>) Okay, well, that's all I have time for tonight. I'll think about this some more... - dcljr 07:41, 25 January 2006 (UTC)

These are interesting, and very complicated challenges. The sort keys are fine, and the problem of sortkeys is necessitated by the structure of Unicode. But remember that as this project develops there will also be many multi-character kanji representations that will render these one character headings more meaningful. The semantic categories should always be a go for every language, and should reflect a scalable parallel structure in every language. (I should explain that vision separately some other time.) Whether the kanji category is valid at all is still an open question in my mind. The same can be said of kana and total stroke count. They have their uses, but what is the best way of exploiting those uses?

While Connel is right to say that categories can have other uses, I think that we are more interested in their purpose than in their theoretical uses. We are building a dictionary for the benefit of people rather than the benefit of computers. I see categories as primarily a tool to facilitate access to the huge amount of data that this project will eventually hold. Eclecticology 10:33, 25 January 2006 (UTC)

I just wanted to point out quickly that radicals, strokes and kana readings are standard aspects of kanji that people who know (or are learning) the language will expect to see somewhere. The only issue really is whether/how to implement them in the category structure. And I think (hope) this is what Eclecticology is saying. I can't imagine not having one or more categories that simply list kanji, however perhaps it's only necessary to do this with the Jōyō kanji (as is done in the Japanese Wiktionary). If we do choose to use, say, Category:Uncommon kanji as a "catch-all" for the non-Jōyō kanji, we can always use category TOCs to make it easier to jump to a particular place in the list (as has started to be done here). All of this is independent of whatever other categories we want to use; the sortkey issues are a bit different when dealing with kanji in semantic/grammatical categories, anyway. - dcljr 21:58, 25 January 2006 (UTC)
What I'm wondering is whether the category system is even the right way to handle this sort of thing. We already have Wiktionary:Japanese Kanji index but that only includes a couple of lines for "on" and "kun" readings. The category system works for sorting out the Grade 1 kanji, etc., but I think one has to be careful about putting demands on a system that is beyond what it can realistically handle. I agree that if people expect to find something somewhere they should find it. We are only differing in where that where should be. The index pages for Japanese, such as the one mentioned above, don't seem to have been touched since the summer of 2004. We have already agreed that "Index:" should be a full namespace, and someday that will be implemented. What we need to look at in the Japanese context is when we should use category and when we should use index. Eclecticology 19:58, 27 January 2006 (UTC)
Okay, then we agree on what we're discussing. Now we just need more people to chime in with their opinions/issues... - dcljr 01:55, 28 January 2006 (UTC)
Sorry for the belated comment, but I prefer the "by radical" approach the most. Not only is it the most usual way for paper dictionaries to sort kanji/hanzi, but it is also free from the MediaWiki technical restriction that prevents us from seeing anything beyond the first character of each sort key. If we have a general kanji category at all, I also think that we should build a well-organized one. --Tohru 10:51, 3 February 2006 (UTC)
Oh, yeah, that's true: the fact that only the first character of the sortkey is visible in the category means we can't do option #3 as listed above (on category pages, anyway)! - dcljr 05:13, 10 February 2006 (UTC) (Of course, we could still sort the kanji by stroke count under each radical, e.g., [[Category:Japanese kanji|乙+01]], or whatever, as is done in many "lookup" lists of kanji.) - dcljr 05:21, 10 February 2006 (UTC)
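A sortkey of the "乙+01" shape suggested above could be generated as in this small sketch. The function name is hypothetical, and the two-digit zero padding is an assumption taken from the example; the padding matters because sortkeys are compared as text, so "乙+10" would otherwise sort before "乙+2".

```python
def kanji_category_link(radical: str, residual_strokes: int,
                        category: str = "Japanese kanji") -> str:
    """Build a category link whose sortkey is radical + zero-padded stroke count."""
    sortkey = f"{radical}+{residual_strokes:02d}"
    return f"[[Category:{category}|{sortkey}]]"

print(kanji_category_link("乙", 1))

# Zero padding keeps text order and numeric order in agreement.
print(sorted(["乙+10", "乙+02", "乙+01"]))
```

Only the first character (the radical) would show as the heading on the category page, but the stroke count would still control the order within each radical.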

Ew... I just thought of something else (hence the return to the left margin): Unicode actually has characters specifically for radicals, although I personally don't have the font(s) to display them. My previous comment used the 乙 kanji (U+4E59) and not the ⼄ radical (U+2F04). This is an unpleasant issue we'd have to wrestle with when using radicals in sortkeys, especially since not all radicals have corresponding (identical-looking) kanji. - dcljr 05:31, 10 February 2006 (UTC)
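The difference between the two code points mentioned above can be checked with Python's standard unicodedata module. This is only a sketch of one possible way to deal with it: folding the radical character to the ordinary ideograph with NFKC compatibility normalization before building a sortkey. The helper name is made up, and only this one pair is checked here.

```python
import unicodedata

kanji = "\u4E59"     # 乙, the ordinary CJK ideograph
radical = "\u2F04"   # ⼄, the dedicated radical character

def fold_radical(ch: str) -> str:
    """Map a radical character to its compatibility equivalent, if it has one."""
    return unicodedata.normalize("NFKC", ch)

print(unicodedata.name(kanji))
print(unicodedata.name(radical))
print(kanji == radical)                  # the two are distinct characters
print(fold_radical(radical) == kanji)    # but they compare equal after folding
```

If sortkeys were always folded like this, pages tagged with either form would end up under the same heading.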

Penkyamp

Why is Penkyamp listed as one of the romanization schemes for Cantonese? It's a non-peer-reviewed system that had its article deleted from Wikipedia a while back because it was deemed original research (see here). —Umofomia 08:33, 11 June 2006 (UTC)

Seeing no response in over a week, I'm going to remove it. —Umofomia 04:52, 23 June 2006 (UTC)

Cleanup

I’ve heavily revised the page to accord with current use – trust this is ok.

Nbarth (email) (talk) 00:18, 9 June 2008 (UTC)

Etymology

I’ve suggested a form of “etymology” for Chinese characters: how a character was simplified, for Simplified Chinese and shinjitai, and older forms for Traditional Chinese. This is admittedly ambitious, but seems reasonable to aspire to.

Nbarth (email) (talk) 00:20, 9 June 2008 (UTC)

I’ve also suggested the use of the traditional 六書六书 (liùshū, six writings) classification. If people know other classification systems, I’m sure they’d be useful to include.
Nbarth (email) (talk) 02:02, 15 June 2008 (UTC)
I’ve also added caveats against using current forms, and that the overwhelming majority of Chinese characters are phono-semantic compounds.
Nbarth (email) (talk) 02:06, 16 June 2008 (UTC)

Unihan pinyin problem

Unihan pinyin for rare characters is sometimes different from the pinyin given in published Chinese dictionaries. For example, the character is pronounced "ji1" according to Unihan, but is listed as "zhen3" in a Chinese dictionary. This is not the first such discrepancy I have found; I have encountered one before (though I forget which character it was). --24.199.97.20 13:11, 22 June 2008 (UTC)

Radical-and-stroke ordering

I’ve suggested the use of radical-and-stroke sorting for collation in “Compounds” (Wiktionary:About Chinese characters#Compounds), as this seems the natural system. Alternatively, one could sort phonetically, which seems appropriate within a given language, but not for characters per se.

Nbarth (email) (talk) 22:09, 22 June 2008 (UTC)
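
For anyone who wants to see what radical-and-stroke collation amounts to in practice, here is a minimal Python sketch. It assumes each character has a (radical number, residual stroke count) pair, in the style of the Unihan kRSUnicode field. The small table below is hand-entered for illustration only, and the names `RADICAL_STROKE`, `radical_stroke_key` and `sort_compounds` are made up for this example; a real implementation would load the data from Unihan.

```python
# Illustrative sample only: (Kangxi radical number, residual strokes)
# per character, in the style of Unihan's kRSUnicode field.
RADICAL_STROKE = {
    "人": (9, 0),
    "休": (9, 4),
    "女": (38, 0),
    "好": (38, 3),
    "日": (72, 0),
    "明": (72, 4),
    "木": (75, 0),
    "林": (75, 4),
}


def radical_stroke_key(ch):
    """Sort key for one character: radical, then residual strokes,
    then code point as a tie-breaker."""
    radical, residual = RADICAL_STROKE[ch]
    return (radical, residual, ord(ch))


def sort_compounds(words):
    """Sort words character by character using the radical-stroke key."""
    return sorted(words, key=lambda w: [radical_stroke_key(c) for c in w])


singles = sort_compounds(["明", "好", "林", "人", "休", "木"])
compounds = sort_compounds(["明日", "好人", "休日"])
```

Here `singles` comes out ordered by radical first (人 before 女 before 日 before 木) and by residual strokes within a radical, and multi-character compounds are compared one character at a time. The code-point tie-breaker is just one possible choice.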

Character references, dictionaries, and entry methods

What is the guideline for including cross-references other than those found in {{Han ref}}? Since character entries are targeted at an English-speaking audience, my thought is that more information can be provided, particularly for those who are looking at these from a Japanese perspective. Right now the entry methods (other than the Unicode listing) are limited to Chinese-language entry methods. Shouldn't we also include Kuten, Shift-JIS, and JIS codes as appropriate? Would these go under Translingual or Japanese? The Japanese Wiktionary lists all relevant entry codes for its character articles. There is {{ja-kref}}, used in a handful of articles, that does this (under the Japanese heading); it could probably be merged with {{Han ref}} or remain a stand-alone template.

Also, what about references to dictionaries/indexes other than the major ones listed? I doubt your average English student of Chinese characters is going to have access to the Morohashi dictionary or the Dae Jaweon. But he or she may have one of the other character dictionaries on the market, like the Classic or New Nelson's, one of Halpern's dictionaries, or Spahn's Kanji and Kana dictionary. Several of the more common ones are listed at this website. We wouldn't use them all, obviously, but the main 4 or 5 common ones could be included. We could also add SKIP patterns to the entries, since that is a common indexing format in addition to the traditional radical method. Again, some of these are Japanese-centric and probably belong under that language heading.

Just a few comments to see what other folks on here think about this proposal before I start making some additions on my own. Dcmacnut 23:23, 15 February 2010 (UTC)
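
As a rough illustration of how the Kuten, JIS, and Shift-JIS codes mentioned above relate to each other, here is a short Python sketch. It relies only on the standard `euc_jp` and `shift_jis` codecs and on the fact that, for JIS X 0208 characters, each EUC-JP byte is the row (ku) or cell (ten) number plus 0xA0. The function name `japanese_codes` and the output format are assumptions made for this example, not anything {{ja-kref}} or {{Han ref}} is known to use.

```python
def japanese_codes(ch):
    """Return kuten, JIS and Shift-JIS codes for one JIS X 0208 character."""
    euc = ch.encode("euc_jp")
    # JIS X 0208 characters are exactly two bytes in EUC-JP, both >= 0xA1.
    # ASCII (1 byte), half-width katakana (0x8E prefix) and JIS X 0212
    # (0x8F prefix, 3 bytes) are rejected here.
    if len(euc) != 2 or euc[0] < 0xA1 or euc[1] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    ku, ten = euc[0] - 0xA0, euc[1] - 0xA0
    return {
        "kuten": f"{ku:02d}-{ten:02d}",
        "jis": f"{ku + 0x20:02X}{ten + 0x20:02X}",
        "shift_jis": ch.encode("shift_jis").hex().upper(),
    }


kan = japanese_codes("漢")
ji = japanese_codes("字")
```

For 漢 this yields kuten 20-33, JIS 3441 and Shift-JIS 8ABF. Characters outside JIS X 0208 raise an error in this sketch, so an entry template would need a separate path for them.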

Combining into one Chinese section with multiple dialectal pronunciations listed in a separate subsection

The current practice of listing varieties of Chinese under different sections and explaining them separately is senseless - Chinese dialects are numerous and the speech of one county tens of km away from another could well be mutually unintelligible, especially in the mountainous south. But the high homogeneity of written Chinese resulting from thousands of years of imperial governance has resulted in a significant degree of semantic agreement that is comparable to dialects of English, except for perhaps some specially created dialectal characters which could be explained otherwise if needed. Weijicidian 00:00, 25 June 2010 (UTC)

I agree with you to an extent, and the volume of Standard Mandarin contributions, along with general knowledge of it and its usefulness, exceeds that of all other dialects, BUT we had a vote and we have to comply. This is a very sensitive issue and there were a lot of discussions. It doesn't cause too much trouble as it is, you can still add a dialect translation if you wish, and we shouldn't discriminate against any dialect speaker. --Anatoli 01:26, 25 June 2010 (UTC)
If I am not mistaken, the vote you mentioned above would be [1]. Still, it seems unconvincing and unsound as an established consensus just by observing the pro- and con- arguments, which generally showed a lack of knowledge of the genuine differences across Chinese variants. First of all, it is not the "amount of standard Mandarin contributions" that shaped the current situation of homogeneity. Standard Mandarin has its phonology based on the Beijing dialect but its vocabulary is largely drawn from the category of Mandarin dialects; thus the supposedly accurate "Mandarin pronunciation" provided on Wiktionary pages is by and large erroneous and misleading - well at minimum the participants of the vote who had indicated that they had some knowledge of Chinese would have been expected to realise that Mandarin itself is not homogeneous enough either to be summarised by one single pronunciation when they were talking about the differences between Mandarin and other dialects and how they should be separated. At least the word "standard" should have been somewhere, to distinguish it from the pronunciations elsewhere, say the ones in Harbin Mandarin, Tianjin Mandarin, Chengdu Mandarin, Malaysian Mandarin (in general), Kunming Mandarin, Taiwan Mandarin (in general), or whatever. Secondly, from what I know, even if one looks outside the subgroup Mandarin, there is also substantial lexical and semantic agreement across other varieties of Chinese. For example, 欢迎/文章/科学/自由 would be used to mean "welcome"/"article"/"science"/"free" regardless of the variety that one speaks. True that many strictly dialectal words exist (albeit the widespread phenomenon of absorption of dialectal words into Mandarin), such as 媒倒, but these can be explained fine with annotations in brackets as in bluggy, or using headings if necessary and preferable, as long as there is minimum overlap between different headings. 
However, applying this to general words and creating headings for each variety would be grossly unnecessary, making the dissimilarity across varieties redundantly and stupidly overpronounced. Weijicidian 03:51, 25 June 2010 (UTC)
I don't think I can change anything, and I'm not that interested either. Wiktionary is to help people find translations - that's what we do. The long and endless debates about Mandarin vs dialects generally don't lead anywhere. The current setup allows entries in Mandarin, Cantonese, Min Nan and any other major dialects. The translations are all nested under * Chinese. The entry 文章 has Japanese, Mandarin and Min Nan sections. I have no doubt that over 99% of the formal vocabulary and over 95% of the spoken vocabulary is identical across dialects and that they would share the same written form (trad. and simpl.), but since the pronunciation is different and many editors believe they are different languages - we have separate sections. You will find that there will be strong resistance to merging Mandarin and Min Nan and you may be accused of destroying dialects/languages. It's better not to open this can of worms. You can post in WT:BP to get more attention, anyway. Recently we had a user with views quite opposite to yours - Wiktionary:Beer_parlour#Chinese_languages_nesting. --Anatoli 04:30, 25 June 2010 (UTC)
I came here only to provide my arguments on this issue and hopefully receive some resonance, but if this is discouraged and there is a rule here saying discussions are to be avoided and status quos kept at all times on sensitive issues, then fine. I would have to say that I am a bit amazed by the individual-languages-always-warrant-their-headings rule here though, which I have not observed in other major language editions of Wiktionary before. Weijicidian 05:46, 25 June 2010 (UTC)
If you are interested in why and how - the Beer parlour is a better place. Everything has a reason and everything can change. I'm not saying you can't discuss it. The English Wiktionary must be accumulating the largest number of languages and dialects, so you can see more variety. Languages have their headings in other Wiktionaries as well. Wikipedia now has articles in Wu, Min Nan, Cantonese, etc. with their differences, even if they are small. --Anatoli 06:03, 25 June 2010 (UTC)

切韵

I think a decent Chinese character dictionary should provide 切韵/廣韻 information for the character. Lots of Chinese and Japanese dictionaries do this. As far as I know, the Chinese dictionary 辞海 and the Japanese dictionary 漢字源 have 切韵韵部. The Japanese 全訳漢辞海 provides both 韵部 and 聲紐. I suggest we add 切韵/廣韻 to the header's reference section.--Tricia Takanawa 23:58, 5 September 2010 (UTC)

I like this idea. Different dictionaries such as Qieyun (切韻), Guangyun (廣韻), Kangxi Zidian (康熙字典), etc. may use different Fanqie (反切) though. I suggest adding a Fanqie section to each character, listing the fanqie used in each source. —Umofomia 17:38, 13 September 2010 (UTC)

夀 stroke number

Wiktionary gives the total number of strokes used to write 夀 as 15; Unihan says 14. I guess 15 comes from people first writing a flat 士 on top and giving the stuff under it a separate vertical stroke. But I have a few CJK fonts geared towards different languages, and all of them clearly have a single vertical stroke, so I was wondering if 15 strokes is a variant in actual use or just a mistake. Thanks. 82.83.105.171 13:22, 15 July 2014 (UTC)

Corrected to 14 strokes. Justinrleung (talk) 06:59, 6 October 2015 (UTC)

Obscure character variants

Is there any particular way to represent non-standard variants that aren't encoded in Unicode?--Prisencolin (talk) 04:25, 24 October 2016 (UTC)

IDS representation: ⿰子力, ⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心 suzukaze (tc) 04:29, 24 October 2016 (UTC)
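
To show how such an Ideographic Description Sequence is read, here is a minimal Python sketch that parses one into a nested tuple. It assumes only the twelve original description characters U+2FF0 to U+2FFB, of which ⿲ (U+2FF2) and ⿳ (U+2FF3) take three operands and the rest take two; later additions to the standard are not handled. The names `parse_ids` and `components` are made up for this example.

```python
TERNARY = {"\u2FF2", "\u2FF3"}  # ⿲ and ⿳ take three components
BINARY = {chr(c) for c in range(0x2FF0, 0x2FFC)} - TERNARY


def parse_ids(ids):
    """Parse an IDS string into a tree: a leaf is a single character,
    a node is a tuple (operator, child, child[, child])."""
    def parse(pos):
        if pos >= len(ids):
            raise ValueError("IDS ended before all operands were found")
        ch = ids[pos]
        if ch in TERNARY:
            arity = 3
        elif ch in BINARY:
            arity = 2
        else:
            return ch, pos + 1  # plain component
        pos += 1
        children = []
        for _ in range(arity):
            child, pos = parse(pos)
            children.append(child)
        return (ch, *children), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree


def components(tree):
    """Flatten a parsed IDS tree into its leaf components, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in components(child)]


simple = parse_ids("⿰子力")
complex_tree = parse_ids("⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心")
leaves = components(complex_tree)
```

The first sequence parses to a single left-right node with 子 and 力 as children. The second nests several levels deep, with 辶 as the enclosing component and eleven leaf components in total.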