Wiktionary:About Chinese characters

There are a number of entries on Chinese characters (漢字, hanzi), which are used in China (simplified version), Taiwan (traditional version), Japan (kanji), and Korea (hanja).

Chinese characters were formerly used for Vietnamese (chữ Nôm) and have been used for minority languages in China (Bai, Dong, Miao, and Zhuang, the last using significant variants); variant scripts were used for the extinct languages Khitan, Jurchen, and Tangut.

In addition to describing a standard format, this page recommends standards for writing (creating or editing) these pages.

Recommendations

Use {{lang}} to wrap Chinese character text, specifying the language and using Hant (traditional), Hans (simplified), Jpan (Japanese), or Hani (generic Chinese) as the script. Otherwise characters won’t display in the correct font.
The issue is that, due to Han unification, a given Unicode character does not have a language attached. If a reader has a mix of Japanese and Chinese fonts installed, Han characters without a specified language fall back to the font for the reader’s default language, so a single compound may appear in a mix of, say, Japanese and Simplified Chinese fonts, which can differ jarringly.
Enter characters, not HTML entities
When entering the characters, do so directly as a Unicode character, and not as HTML numeric character references. For example, simply type "犬", rather than the code &#29356;, which is unreadable to most humans and makes editing the entry awkward.
If you can't compose the character (or it isn't simple), enter the numeric character reference (e.g. &#29356;), preview, and copy/paste the character back to the edit window.
This is for ease and clarity of editing by other contributors.
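
As a minimal sketch of the conversion described above, the following Python snippet turns numeric character references into the characters they name, and shows the reverse direction for comparison. The function names are illustrative only and are not part of any Wiktionary tool; `html.unescape` and `ord` are from the Python standard library.

```python
import html


def decode_references(text: str) -> str:
    """Replace HTML character references (e.g. '&#29356;') with the characters they name."""
    return html.unescape(text)


def to_decimal_reference(char: str) -> str:
    """Return the decimal numeric character reference for a single character."""
    return f"&#{ord(char)};"


print(decode_references("&#29356;"))  # 犬
print(to_decimal_reference("犬"))      # &#29356;
```

Running existing wikitext through something like `decode_references` before saving is one way to avoid leaving references in an entry; the manual preview-and-copy method above gives the same result.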

Entry layout

Chinese characters are both characters and the spellings of words in various languages. Thus entries for Chinese characters:

  • begin with a “Translingual” section on the character itself, and
  • then include entries in each language that uses them.

The language-specific sections themselves begin with the character itself (listed as Hanzi, Kanji, Hanja, Han character (Vietnamese), respectively), then include a part of speech section if the character is used in isolation as a part of speech. The part of speech sections are identical to the usual WT:ELE form, but the character sections have specific formats (reading/eumhun and compounds), as detailed below.

Sample code from 日 is shown below.

{{also|曰|臼|白}}
==Translingual==
{{stroke order|type=animate}}

===Etymology===
(Explanation of form; ideally shows earlier forms.)

===Han character===
{{Han char|rn=109|rad=目|as=03|sn=8|four=4071<sub>6</sub>|canj=十月一一 (JBMM)}}
# [[sun]]
# [[day]]
# …

====References====
* {{Han ref|kx=0489.010|dkj=13733|dj=0848.140|hdz=21482.010|uh=65E5|ud=26085|bh=A4E9|bd=42217}}

==Cantonese==
===Hanzi===
===Noun===

==Japanese==
===Kanji===
===Noun===

==Korean==
===Hanja===
{{ko-hanja|hangeul=일|rv=il|mr=il|y=il|eumhun=날|ehrv=nal|ehmr=nal|ehy=nal}}

===Noun===
(if used in isolation)

==Mandarin==
===Hanzi===
===Noun===

Categories of characters

Chinese characters have been used in a number of languages and regions in the Sinosphere, and thus have a great deal of variation.

  • Many characters are used, in the same form, across all languages;
  • some characters only exist in one language or another; and
  • other characters have different forms in different languages.
    Some of these variants are encoded differently in Unicode (see: Variant Chinese character), some are considered typographical variants in Unicode (see: Han unification), and some appear identical but have different stroke orders.
    There are also handwritten simplifications (略字; Japanese ryakuji, Korean yakja), which are not encoded separately.

It is useful to indicate both:

  • What categories a character falls into
  • What variant forms, if any, a character has

Most basically, there are traditional Chinese characters, and two major simplifications: simplified Chinese characters and shinjitai (新字体), Japanese simplifications.

In more detail:

Variant forms

[Box: shinjitai, simplified, and traditional forms of a character]

A given character may have multiple forms. One should:

  • Not use a form of a character in a language in which it isn’t used. For instance, do not write Japanese words in Simplified Chinese (if the form differs from shinjitai).
  • Not translate between various forms. Link to variant forms of a character or phrase (as by {{zh-forms}} or {{ja-forms}}), but don’t translate in place or have a “Translations” section. As per WT:ELE#Translations: “Translations are to be given for English words only.”

For instance, the character for “reading” has 3 forms, as in the box at right. This box is produced by the template {{ja-forms}}; if there are just two forms, {{zh-forms}} shows Simplified and Traditional.

Headings

There are a number of templates which help with the layout, which are listed below.

The only thing that should come before the “Translingual” heading is, if necessary, an {{also}} hatnote for similar characters which may be confused (such as 日 and 曰), and for variant forms.

Translingual

(Stroke order)

The translingual section should begin with a stroke order image, without a separate “Stroke order” section, using the template {{stroke order}}, with parameter strokes= for sizing. This uses stroke orders in commons:Category:CJK stroke order, if present.

To see if a stroke order image is present, check at Commons or go to the corresponding Japanese Wiktionary page (linked at left), which generally includes the stroke order.

Caveat: Different stroke orders
See also: w:Stroke order

Beware that, just as some characters have different forms in different systems, some characters have different stroke orders in different systems.

There are potentially 3 (or more) different stroke orders, but these very often coincide:

  • Traditional Chinese, used historically, and in Taiwan and Hong Kong
Note that there are some differences between modern Taiwanese and Hong Kong standards and actual historical practice; for example, differs in Taiwan, while and differ in Hong Kong.
  • Japanese
E.g., , , .
  • Simplified Chinese, used in mainland China
    Also known as Modern Chinese; contemporaneous with simplified characters: some characters were not simplified, but their stroke order was changed.
    (There were apparently no stroke order changes in Japanese character modernization.)

There are also Korean and Vietnamese stroke orders and character forms, but modern Korea generally uses Japanese conventions, and Vietnamese is only of historical interest, hence relatively unimplemented.

For instance, is different in Chinese and Japanese, while the radical (and thus all derived characters) differs in Simplified (and Taiwanese) and (historical) Traditional Chinese.

When Simplified and Traditional Chinese stroke orders differ, Japanese and Simplified Chinese coincide. There are apparently no examples where all three share the same form but different stroke orders, though there are examples where the form differs in all three.

  • {{stroke order}} defaults to Simplified Chinese.
  • If there are multiple stroke orders, please include all forms.
  • To include Chinese and Japanese, use the parameter strokes=jbw for the Japanese stroke order: see .
  • To include traditional and simplified forms, you must currently do so manually: see .

Etymology

If possible, the Translingual section should have an “Etymology” section, explaining the form of the character, listing earlier forms, and explaining the development.

Beware that there are many folk etymologies based on analyses of modern forms, with many dating to the 2nd century CE (when present forms largely stabilized)! Modern scholarship based on oracle bone script often provides different etymologies. See References.

Please do not include discussion of the etymology of the word (often Old Chinese) that the character was developed to represent; this belongs in the language-specific section. The Translingual Etymology section should not include pronunciation information, except when necessary to understand the form. This occurs for example in phono-semantic compounds, where reconstructions of the pronunciations of the compound character and its phonetic are relevant to the form, but sound is completely irrelevant to pictographs and ideographs. Reconstructed pronunciations should be cited and follow the usual rules for historical Sinitic languages – see About Old Chinese and About Middle Chinese for guidelines, and for an example.

Most characters were coined during the Old Chinese period; this needn’t be explicitly mentioned, but can be stated if helpful. If a character was not coined during the Old Chinese period (notably Middle Chinese or foreign coinages, especially Japanese, and some Korean and Vietnamese), this should be mentioned.

Simplified and Shinjitai

For Simplified Chinese and Shinjitai, this should simply link to the Traditional Chinese form / Kyūjitai, and explain the method of simplification, as in Simplified Chinese characters: Methods of simplification and Shinjitai: Methods of simplifying Kanji. This can be done using the {{Han simp}} template, which also categorizes.

Traditional and coinages

For Traditional Chinese and country-specific coinages, this should:

Classify composition
See: Chinese character classification. One should provide traditional 六書 (liùshū, six writings) classification, using the template {{liushu}}, and break up compounds via {{Han compound}}. Note that:
  • The overwhelming majority of Chinese characters (90+%) are phono-semantic compounds.
  • Beware of etymologies based on current forms (especially claims that a character is an ideogrammic compound) – the current form is often a simplification of an older form, which may not be related to the current components. For instance, the lower part of is cognate to , not to , which it more closely resembles.
Show previous forms
These are collected at Commons:Ancient Chinese characters, and the template {{Han etyl}} will include them, if they exist.
Note that older forms themselves had variants, which need not be exhaustively displayed.

Han character

The main section is the “Han character” section, using the {{Han char}} template, which includes radical, stroke count, and various input methods.

This is followed by definitions (still in “Han character”). For language-specific meanings, such as Japanese kokkun (国訓), there is no consensus on whether to place these in the main “Han character” section, marked as language-specific, or whether to place the meaning under the language heading (“Japanese”).

This should also include a “References” section, using {{Han ref}}, which links to the character in various standard dictionaries, and includes the Unicode number (linking to Unihan in the process).

==Translingual==
{{stroke order}}

===Etymology===
(Explanation of form; ideally shows earlier forms.)

===Han character===
{{Han char|rn=109|rad=目|as=03|sn=8|four=4071<sub>6</sub>|canj=十月一一 (JBMM)}}
# [[sun]]
# [[day]]
# …

====References====
* {{Han ref|kx=0489.010|dkj=13733|dj=0848.140|hdz=21482.010|uh=65E5|ud=26085|bh=A4E9|bd=42217}}
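
The code-point values in the {{Han ref}} sample can be cross-checked with a short script. This is a sketch under an assumption: judging from the sample values, uh and ud appear to be the Unicode code point in hexadecimal and decimal, and bh and bd the Big5 code in hexadecimal and decimal. That reading of the parameter names is inferred from the numbers, not taken from the template documentation, and the function name is illustrative.

```python
def describe_code_points(char: str) -> dict:
    """Return hex and decimal Unicode and Big5 values for one character.

    The keys mirror the uh/ud/bh/bd parameter names seen in the sample;
    their meaning is an assumption inferred from the sample values.
    """
    unicode_value = ord(char)
    # Encode to Big5 and read the two bytes as one big-endian integer.
    big5_value = int.from_bytes(char.encode("big5"), "big")
    return {
        "uh": f"{unicode_value:04X}",
        "ud": str(unicode_value),
        "bh": f"{big5_value:04X}",
        "bd": str(big5_value),
    }


print(describe_code_points("日"))
# {'uh': '65E5', 'ud': '26085', 'bh': 'A4E9', 'bd': '42217'}
```

A character with no Big5 mapping raises `UnicodeEncodeError` here, so a real tool would need to handle that case.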

General considerations

Compounds

Compounds and idioms involving a character (熟語) are listed language by language, since they vary between languages.

List compounds using a suitable column template (see Category:Column templates), generally {{rel-top5}} or {{top5}} if only listing compounds, or {{rel-top3}} or {{top3}} if also providing a gloss. See is an excellent example.

Compounds should be collated by radical-and-stroke sorting; for order of radicals, see Index:Chinese radical. However, as per Wiktionary:About Japanese#Compounds, compounds that begin with the character should come first.
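
The collation rule can be sketched in Python as follows. The lookup table is a small hand-entered, hypothetical stand-in for real radical-and-stroke data (Kangxi radical number, then additional strokes), and the function names are illustrative; nothing here is an existing Wiktionary tool.

```python
# Hypothetical lookup: character -> (Kangxi radical number, additional strokes).
# A real tool would draw this from a full radical-and-stroke index.
RADICAL_STROKE = {
    "日": (72, 0),
    "明": (72, 4),
    "月": (74, 0),
    "本": (75, 1),
}


def collation_key(compound: str, head: str):
    """Sort key: compounds beginning with `head` first, then radical-and-stroke order."""
    starts_with_head = compound.startswith(head)
    # False sorts before True, so compounds that start with the head come first.
    return (not starts_with_head, [RADICAL_STROKE[ch] for ch in compound])


def sort_compounds(compounds, head):
    """Return the compounds of `head` in the collation order described above."""
    return sorted(compounds, key=lambda c: collation_key(c, head))


print(sort_compounds(["本日", "明日", "日本", "日月"], "日"))
# ['日月', '日本', '明日', '本日']
```

Comparing each compound character by character is one possible reading of the rule; the guideline itself does not spell out how ties beyond the first character are broken.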

As per Wiktionary:About Japanese#Compounds, terms involving a character should be listed in an L4 section called “Compounds” – by contrast, in the entry for a 2 or more character compound, longer compounds should be called “Derived terms”.

A separate L4 section called “Names” should contain any common names constructed from the character, even if such names duplicate a compound word.

Note that some pages list compounds as “Derived terms” in the “part of speech” section: contrast 日#Mandarin and 天#Mandarin.

Compound entry

Some general considerations apply on the page for a compound (2 or more Chinese characters).

As above, longer compounds (containing a given compound) should be in a section called “Derived terms”.

If one compound is obtained from another by re-arranging the characters, such as 会議 and 議会, it is useful to link these; the “Related terms” section fits best, presuming an etymological connection.

Chinese

There is no monolithic “Chinese” section; instead, there are separate sections for each language in the Chinese family of languages, as discussed at Wiktionary:About Chinese#Entry format.

There does not appear to be a consensus on the distinction between the character and the word.

The outline is:

  • Pronunciation (in the usual way: Wiktionary:Pronunciation)
  • Hanzi
    • Compounds (alternatively, may be placed in Derived terms)
  • part of speech (Noun, Verb, etc.)
    • Derived terms (Compounds may alternatively be placed here)

For the Hanzi section, the templates {{cmn-hanzi}} and {{yue-hanzi}} help one to show the Hanzi, variant forms, and romanizations. As discussed in Wiktionary:About_Chinese#Hanzi_form_templates, the template {{zh-hanzi}} and related templates display boxes with variant forms of the character.

For the part of speech section(s), see Wiktionary:About Chinese#Entry format. Note that a particular character may have multiple readings (with different meanings), for instance .

Japanese

In addition to L3 part of speech headings, Japanese entries for a Chinese character have an L3 heading called “Kanji”, which has an L4 heading called “Readings”, which can use the template {{ja-readings}}. This currently supports the usual on, kun, and (rarer) nanori readings, but also nazuke and 呉音 (go-on) readings.

See Wiktionary:About Japanese#Kanji_entries for more on the format of Japanese entries.

Korean

There should be an L3 heading for “Hanja”, beginning with the eumhun (meaning/reading), which can be obtained by the template {{ko-hanja}}. This also supports the following romanizations, via the respective parameters: Revised Romanization of South Korea (ehrv), McCune-Reischauer (ehmr), Yale Romanization of Korean (ehy).

Next there should be an L4 heading “Compounds”; in addition to the hanja form, it should also include hangeul forms for all words.

Vietnamese

Currently, the vast majority of Vietnamese character entries indicate Hán-Việt readings and omit Nôm readings. The layout has not been standardized, though most have a single L3 heading, "Han character", with {{vi-hantu}} below it.

Works in chữ Nôm are quoted in the part of speech section using the {{vi-ruby}} template. Any quốc ngữ works should be quoted in the corresponding quốc ngữ entry.

Note that most Nôm text includes characters not yet encoded in Unicode. Most Nôm sources make use of Private Use Area characters that are found in various Nôm fonts. Do not use Private Use Area characters, because they will be misinterpreted by readers with different Nôm fonts installed. Instead, use Ideographic Description Sequences. (See Template:vi-ruby for an example.)
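
A quoted passage can be checked for Private Use Area characters before saving. The sketch below relies on the Unicode general category "Co" (private use), which Python exposes through `unicodedata.category`; the function name is illustrative, and the sample strings are arbitrary test input rather than real Nôm text.

```python
import unicodedata


def find_private_use(text: str) -> list:
    """Return the private-use characters in `text` as 'U+XXXX' strings."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Co"  # "Co" = private use
    ]


print(find_private_use("日\ue000月"))  # ['U+E000']
print(find_private_use("⿰氵日"))       # []
```

The second call shows that an Ideographic Description Sequence (here an arbitrary one beginning with ⿰, U+2FF0) contains only ordinary encoded characters, so it passes the check.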

Pronunciations and etymology generally belong in the quốc ngữ entry. Also in that entry, each headword line takes a list of characters (according to Nôm readings) as an additional parameter. Hán-Việt forms may be listed under an L3 "Readings" section using {{han tu form of}}.

Proposal

There is a proposal at Wiktionary:Beer parlour/2013/December#Nom character that would do away with the current layout in favor of the following structure:

  • Character – "Han character" is avoided because it appears to exclude Nôm readings or Nôm-only characters.
    • Readings – Specify any Hán-Việt and Nôm readings using the template {{vi-readings}}.
    • Compounds
  • part of speech (Noun, Verb, etc.) – Because Hán-Việt readings rarely differ from the definitions in the Translingual section, the parts of speech sections are for definitions of Nôm readings only. Use headword templates like {{vi-noun}} and {{vi-verb}}, listing Nôm readings in the first parameter.
  • References (if applicable)

Others

Chinese characters and similar scripts (see Chinese family of scripts) are used for languages other than those primarily discussed above.

Other Sinitic and Japonic languages

Besides Standard Chinese and Standard Japanese, Chinese characters are used for other Sinitic and Japonic languages, whose entries follow the format for Standard Chinese and Standard Japanese.

Minority languages in China

Bai, Dong, Miao and Zhuang, like Vietnamese, are currently officially romanized, but used Chinese characters in the past, with Dong continuing to use Chinese characters widely and Zhuang using them in literary contexts.

Further, Zhuang logograms are significant variants, not fully encoded in Unicode at present.

As of this writing, there are no Bai entries on Wiktionary, no Dong entries, and while there are Miao entries and Zhuang entries, they are all written in romanizations. As these are all living languages with millions of speakers, these categories can be expected to expand in future.

Extinct languages

The extinct languages Khitan, Jurchen, and Tangut each used their own script, derived from Chinese characters. As of this writing, there are very few Wiktionary entries in Khitan, Jurchen, or Tangut: the 8 Khitan words are written in Chinese characters, the Jurchen entries are in Latin characters, and there are no Tangut entries. At present only fragments of Khitan and Jurchen survive, so barring further discoveries, coverage of these languages will remain limited. A large corpus of Tangut exists, so coverage of Tangut may expand in future.

Other scripts

Some other scripts used in China are not derived from Chinese characters, but often have borrowed from Chinese. These notably include Dongba script, Geba script, Sui script, and Yi script.

See also

References

K’s Bookshelf – useful lists for finding related characters:

Etymology

  • Xǔ Shèn 許慎/许慎. Shuōwén Jiězì “說文解字”/“说文解字” 100–121 CE – classic reference, but due to lack of access to earlier forms, has errors
  • Xu Zhongshu 徐中舒. “丁山說文闕義箋” [Commentary on the errors in Shuowen by Ding Shan]
  • 李孝定 Lĭ Xiàodìng (Lee Hsiao-ting, 1965). 甲骨文字集釋 Jiǎgǔwénzì jíshì, [Collected interpretations of oracle bone characters], 台北 Táibĕi, 南港 Nángǎng (Nankang): 中央研究院歷史語言研究所 Institute of History and Philology, Academia Sinica
An authoritative modern reference.