|m_note=Only used as a component in a character

Hmm, I'm not sure this is the way to note this. |m_note= is for notes on the Mandarin pronunciation; Only used as a component in a character sounds more like a ====Usage note==== IMO. (@Wyang, Justinrleung, any opinions?) —suzukaze (tc) 05:33, 9 April 2017 (UTC)

I agree that the note should not be in the pronunciation section, but isn't "component variant" sufficient to say that it's only used as a component? Also, Cantonese muk6 seems out of place. It would be component variant of 目 instead. — justin(r)leung (t...) | c=› } 05:37, 9 April 2017 (UTC)

My Reply

This note is included for CJK Unified Ideographs characters that have a kMandarin or kCantonese pronunciation defined in Unicode but not found in any commercially available Mandarin dictionaries or national standards such as BIG5, CNS11643, GB18030. —KevinUp (talk) 06:03, 9 April 2017 (UTC)

But that doesn't mean we put the note in the pronunciation section, since it doesn't have much to do with the pronunciation. — justin(r)leung (t...) | c=› } 06:46, 9 April 2017 (UTC)
The note could be modified to "Pronunciation derived from " or "Pronunciation derived from " for characters such as and KevinUp (talk) 07:11, 9 April 2017 (UTC)

I propose that the pronunciation entry be scrapped altogether for such characters that are not pronounceable on their own and have common names such as 草字頭 for , 絞絲旁 for , 豎心旁 for and 卷字頭 for KevinUp (talk) 07:11, 9 April 2017 (UTC)

A specially created box such as zh-see to indicate that the character is only used as a component may be appropriate for Unicode characters that are not dictionary characters. —KevinUp (talk) 07:30, 9 April 2017 (UTC)
I think we could keep the pronunciation section but have it contain the name of the character, i.e. 艹#Pronunciation would have Mandarin: cao3 zi4 tou2, and the rest of the entry would treat as a Chinese symbol instead of word. —suzukaze (tc) 09:02, 9 April 2017 (UTC)
Good idea. The pronunciation section could include the names of the character. —KevinUp (talk) 10:44, 9 April 2017 (UTC)

Here is a list of symbols that I have come across: 𩙿. These are the ones that are (1) not found in dictionaries and (2) mostly used as character components for ideographic description characters (⿱⿰⿵) —KevinUp (talk) 10:44, 9 April 2017 (UTC)

Isn't a kwukyeol note, only found in Korean? Johnny Shiz (talk) 14:07, 10 February 2019 (UTC)
Are you two going to add 亻 to the dānrénpáng pinyin page? If not, can we truly consider 'dānrénpáng' to be the 'pronunciation' of 亻? --Geographyinitiative (talk) 23:02, 11 February 2019 (UTC)
No, because these are symbols, not actual lemmas or words that are used in the Chinese language. We could assign names to these symbols, but not pronunciations. KevinUp (talk) 12:24, 12 February 2019 (UTC)
@Johnny Shiz, KevinUp I agree with your stance and have made some preliminary edits to ; let me know what you think. --Geographyinitiative (talk) 12:59, 12 February 2019 (UTC)
Much better. Yes, we shouldn't add dānrénpáng as the pronunciation of , because that is the Pinyin for 單人旁/单人旁 (dānrénpáng), not (rén). KevinUp (talk) 13:13, 12 February 2019 (UTC)
By the way, keep in mind that the pronunciations aren't just in Chinese, they're also in Japanese, Korean, and Vietnamese. Johnny Shiz (talk) 20:02, 12 February 2019 (UTC)


Hey- I'm fascinated by the strange characters you are adding. Do you have a font that can display this character normally in your browser while on wiktionary? Right now it's just a box to me unless I search for it in a dictionary. Keep it up! Very cool. Thanks for any help! --Geographyinitiative (talk) 13:53, 22 April 2018 (UTC)

Hi, I'm using the Hanazono font. You can get it from here: It displays CJK characters up to extension F. KevinUp (talk) 05:23, 23 April 2018 (UTC)
@Geographyinitiative: What browser do you use? For me, Extension F characters only display on Microsoft devices. Johnny Shiz (talk) 22:02, 11 February 2019 (UTC)
@Johnny Shiz Opera. Since the time I made that post in April 2018, I have become very familiar with various work-arounds to overcome the problem of not seeing the characters, so I don't even think about trying to display the characters anymore. --Geographyinitiative (talk) 22:27, 11 February 2019 (UTC) (modified)
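As a rough aid for working out which characters need an extended font, the sketch below maps a character to its Unicode block by code point. The function name `cjk_block` and the table layout are illustrative assumptions, not anything Wiktionary itself uses; the ranges are the published Unicode block boundaries, and extensions later than H are left out.

```python
# (start, end inclusive, block name) for the main CJK ideograph blocks.
CJK_BLOCKS = [
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0x2CEB0, 0x2EBEF, "CJK Unified Ideographs Extension F"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
    (0x30000, 0x3134F, "CJK Unified Ideographs Extension G"),
    (0x31350, 0x323AF, "CJK Unified Ideographs Extension H"),
]


def cjk_block(char):
    """Return the CJK block name for a single character, or None."""
    code_point = ord(char)
    for start, end, name in CJK_BLOCKS:
        if start <= code_point <= end:
            return name
    return None


if __name__ == "__main__":
    # 目 is U+76EE, inside the basic block.
    print(cjk_block("目"))  # CJK Unified Ideographs
```

Characters reported in Extension B or later sit outside the Basic Multilingual Plane, which is typically where a browser falls back to a box unless a font with that coverage is installed.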

Unihan regional codes in IDS

Hi, I don't think we need to include regional codes in the IDS if there is only one IDS. We only need them for distinguishing different IDSes. — justin(r)leung (t...) | c=› } 03:16, 23 April 2018 (UTC)

  • Hi, I'm adding it to the translingual section based on the Unicode chart. It provides information on where the glyph comes from. I added it so our readers can know the source of the glyph without going through the official Unicode charts, which may be hard to download for those with a slow Internet connection and hard to navigate for those that are new to the charts. Since this is the translingual section, the information may also be useful for future editors to add sub-entries for languages that are not yet created or to remove entries for languages that may not be relevant. Also, this information is useful for characters that have various compatibility ideographs because many font manufacturers do not follow the Unicode standard. Here is an example of a proposed entry that might be useful to determine which glyph form belongs to which region. I hope that the Wiktionary programmers can code an additional functionality in future so that when your mouse hovers over an area code it will explain what it means. For now I will not add any IDS if the glyph exists across all six regions (GHTJKV). However, if the glyph is only for GHTK (such as ) or GTJV (such as ) this information may be useful for those that are studying the characters.
    (radical 181 +4, 13 strokes, cangjie input 一山一月金 (MUMBC) or X一山一月 (XMUMB), composition(G,T for U+980B or T for U+2F9FE) or ⿰⿸(K for U+FACB) or ⿰⿸⿺⿱(T for U+2F9FF))
  • Hi, this may be worth checking out: is provided by GHTK but a Vietnamese entry exists and the Korean entry is missing. —This unsigned comment was added by KevinUp (talk • contribs).
I appreciate your concern about the details of Han characters. However, this information does not belong in the IDS unless there is more than one IDS to distinguish, like for or . We already link to the Unihan database, which should have the same info about the sources as the Unicode charts; these should be accessible even with slow Internet. While the regional sources are a good indicator for whether a character is used in the specified regions, they may not actually reflect actual usage in the regions. Many characters have a G source just because GB wants to include whole blocks for compatibility. On another note, many of the H glyphs don't actually reflect current standards in Hong Kong; the representative glyph is essentially the same as the T source glyphs. I'm fine with the info for , but for and , I'd prefer this to be in a usage note that describes actual usage. It's more complicated than what the regional codes say:
  • The G source has both characters because of what I've described above, but the actual standard would be 说 (説 as its traditional form).
  • In Taiwan, 說 is much more common than 説 in printed material; both are used in handwriting.
  • The current HK standard (HKSCS-2016) now includes , which is its actual educational standard, but 說 is still very common in publications and on computers.
  • While 説 is the current standard in Japan according to Jōyō Kanji, 說 was historically used in Japan as well.
(BTW, don't forget to sign after your comments with four tildes.) — justin(r)leung (t...) | c=› } 19:50, 23 April 2018 (UTC)
Hi. Thanks for your reply. After going through the Unihan database I agree that the same information can be obtained from the external links provided. Also, after reading your detailed explanation of and , I have to agree that the glyph sources may not necessarily reflect its current actual usage. I started adding this information because I thought it would become useful for more obscure characters such as those in the E and F extension blocks. I'll stop doing this in future edits and thanks for reminding. KevinUp (talk) 17:10, 26 April 2018 (UTC)
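The convention agreed on in this thread (regional codes only when there are several IDSes to tell apart) can be sketched as below. `format_composition` is a hypothetical helper, not an existing Wiktionary module function, and the components 甲 and 乙 and the source letters in the example are placeholders rather than real Unihan data.

```python
def format_composition(entries):
    """Format a composition field from (ids, source_codes) pairs.

    With a single IDS the regional codes are omitted; with several,
    each IDS is annotated so the variants can be told apart.
    """
    if len(entries) == 1:
        return entries[0][0]
    parts = [f"{ids} ({','.join(codes)})" for ids, codes in entries]
    return " or ".join(parts)


if __name__ == "__main__":
    # Placeholder data only: one IDS, so no codes are shown.
    print(format_composition([("⿰甲乙", ["G", "T"])]))
    # Placeholder data only: two IDSes, so each carries its codes.
    print(format_composition([("⿰甲乙", ["G", "T"]), ("⿱甲乙", ["K"])]))
```

Information about actual regional usage would still go in a usage note, as suggested above, rather than in this field.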

Thanks for adding the derived characters

Hey- thanks for adding the derived characters. When I study Chinese vocabulary, I sometimes like to compare the characters with characters that have similar components. I especially enjoy the rare characters you have added. thanks! --Geographyinitiative (talk) 04:51, 18 May 2018 (UTC)

You're very much welcome. The derived characters added to the translingual section may also include characters that were invented outside of China, such as Korean made Hanja 한국제 한자 (han-gukje hanja) 韓國國字, Japanese kokuji 日本國字 and also Vietnamese Nom characters. KevinUp (talk) 09:45, 21 May 2018 (UTC)

Sycophantic praise; etc.

Whenever I look at a page, I can tell if you've been there or not- for instance, the page- with one look, I knew that you must have made edits to that page. The tell-tale signs were all there: Derived characters, Related characters, Glyph origin, Usage notes- and a great work up of the definition! You are the editor I wish I was. I liked your interpretation of the character as a pictograph/phono-semantic compound; 汉字源流词典 says it is an ideogrammic compound/phono-semantic compound [1]. Great stuff, just great! --Geographyinitiative (talk) 14:06, 16 December 2018 (UTC)

Archiving of discussions

Hi. The top of request pages like "Wiktionary:Requests for verification/English" has the following instruction: "At least a week after a request has been closed, if no one has objected to its disposition, the request may be archived to the entry's talk-page". Thus, do please leave closed discussions on the page for at least seven days before archiving them. Thanks. — SGconlaw (talk) 09:13, 4 January 2019 (UTC)

Sorry for the mistake, I'll take better note of this next time. KevinUp (talk) 09:15, 4 January 2019 (UTC)
No worries! This was also pointed out to me by another editor some time back. — SGconlaw (talk) 10:18, 4 January 2019 (UTC)


@KevinUp Hello! I couldn't find a rare character and I would like to ask if you can find it. I didn't see it here: [2], here: [3], here: [4], or here: [5]. Baidu, jiu haishi buzhidao. The unfindable character in question seems to have the same Cantonese pronunciation as the in 簷蛇 (Jyutping: jim4), but the character is written with a 虫 on the left and with a 嚴 on the right: "⿰虫嚴". The character can be seen in 香港粵語詞典 on page 208, where it is used in the word "⿰虫嚴蛇". Any help would be appreciated! Please let me know if you look for it and can't find it. --Geographyinitiative (talk) 09:49, 8 January 2019 (UTC)

If you can't find it or are not interested, then don't worry about it- I added ⿰虫嚴 to the jim4 page. --Geographyinitiative (talk) 11:20, 8 January 2019 (UTC)
@Geographyinitiative: Wow. You seem to have stumbled upon an extremely rare character. For future reference, there's also [6] which lists soon-to-be encoded characters (Extensions G, H, etc.), and [7] which lists derived characters under a certain component, and [8], [9], [10] for extremely rare characters used for personal names. I've also looked up 《漢語方言大詞典》 (中華書局, 1999) to check if it's a dialectal character. However, all of these turned up negative. After googling "⿰虫嚴" in quotation marks, I found the character quoted here: [11]
I'm not sure if the meaning above is same as that of "⿰虫嚴蛇". An image of ⿰虫嚴 can be found here: [12] Anyway, today I found this: [13], another site to search for rare characters. Here's another site: [14] (缺字系統), but no definitions are provided. KevinUp (talk) 17:13, 8 January 2019 (UTC)
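Since unencoded characters like the one above have to be written as an ideographic description sequence, a small check that a sequence is complete can catch a dropped component. This is a minimal sketch that assumes only the original twelve description characters (U+2FF0 to U+2FFB), with ⿲ and ⿳ taking three operands and the rest taking two; `is_well_formed_ids` is an illustrative name, and the check does not verify that the components themselves make sense.

```python
# ⿲ (U+2FF2) and ⿳ (U+2FF3) take three operands; the other ten take two.
TERNARY = {"\u2FF2", "\u2FF3"}
BINARY = {chr(c) for c in range(0x2FF0, 0x2FFC)} - TERNARY


def is_well_formed_ids(sequence):
    """Return True if `sequence` is exactly one complete IDS."""
    needed = 1  # operand slots still waiting to be filled
    for char in sequence:
        if needed == 0:
            return False  # extra characters after a complete sequence
        if char in BINARY:
            needed += 1  # fills one slot, opens two
        elif char in TERNARY:
            needed += 2  # fills one slot, opens three
        else:
            needed -= 1  # an ordinary component fills one slot
    return needed == 0


if __name__ == "__main__":
    print(is_well_formed_ids("⿰虫嚴"))  # True
    print(is_well_formed_ids("⿰虫"))    # False
```

A word such as "⿰虫嚴蛇" is a sequence followed by an ordinary character, so it would need to be split before checking.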

Incorrect use of {{inh}}

KevinUp, I noticed you incorrectly used {{inh}} here. {{inh}} is only to be used in unbroken chains of inheritance. The proper template to use there would have been {{der}}. Please see the template page for further instructions and please, if you can, go back and correct any past edits. Thanks. --{{victar|talk}} 08:04, 23 January 2019 (UTC)

@Victar: Sorry, overlooked that one. I modified {{etyl|la|fr}} to {{inh|fr|la|}} based on the previous statement {{inh|fr|ML.|pūblicitātem}}. I found that herbage#French (plus 16 other French entries) and lealdade#Portuguese (plus 14 other Portuguese entries) also have this mistake. Shall I correct these entries as well? Those were done by other editors. KevinUp (talk) 09:08, 23 January 2019 (UTC)
No problem. Yeah, if you see any examples of that in the areas you work, please amend. Also note that {{bor}} should only be used at the start of an etymology, so if you see entries to the contrary, please fix them as well. Thanks! --{{victar|talk}} 16:38, 23 January 2019 (UTC)
@Victar: It seems that up to 430 entries have {{der|en|enm}} instead of {{inh|en|enm}} (Search here) Should these be automatically converted to {{inh|en|enm}}? There's also Category:English terms borrowed from Middle English and Category:English terms borrowed from Old English which use {{bor|en|enm}} or {{bor|en|ang}} but most of them appear to be reintroduced terms. KevinUp (talk) 17:12, 23 January 2019 (UTC)
Looking over those search results, definitely many should be converted to {{inh}}, but I wouldn't say automatically. All those particular borrowings appear fine. --{{victar|talk}} 18:21, 23 January 2019 (UTC)
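Because the conversion should not be automatic, one option is to list the candidates and leave the decision to an editor. The sketch below is only an illustration: `inh_candidates` is a hypothetical helper, the sample wikitext is made up, and the pattern covers the plain {{der|en|enm|...}} form without nested templates.

```python
import re

# Matches {{der|en|enm}} or {{der|en|enm|...}} with no nested templates.
DER_EN_ENM = re.compile(r"\{\{der\|en\|enm(?:\|[^{}]*)?\}\}")


def inh_candidates(wikitext):
    """Return every {{der|en|enm...}} call found, for manual review."""
    return [match.group(0) for match in DER_EN_ENM.finditer(wikitext)]


if __name__ == "__main__":
    # Made-up sample etymology line, for illustration only.
    sample = "From {{der|en|enm|hous}}, from {{inh|en|ang|hūs}}."
    for template in inh_candidates(sample):
        print("review:", template)
```

Each hit still needs a human check that the chain of inheritance is unbroken before {{der}} is changed to {{inh}}.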

Using "鎮 / 镇" in zh-div

Hey- as I've been going through the towns of China, I've always thought it was strange that I would only add the traditional form of the name of the administrative division- '鎮'- and not add the simplified form- '镇'. Would there be a way to add both? Would this change even be desirable? Just a thought. (see Category:zh:Towns in China) --Geographyinitiative (talk) 14:33, 17 February 2019 (UTC)

@Justinrleung, Suzukaze-c Would like your input too. I don't know who else would be interested in this topic. --Geographyinitiative (talk) 14:36, 17 February 2019 (UTC)
For example, '镇' does not appear on the 古驛 page- should it? --Geographyinitiative (talk) 14:38, 17 February 2019 (UTC)
Regarding the simplified form, it seems that Suzukaze-c had a similar request in Nov 2016 (See Template talk:zh-div).
As for adding (zhèn) to 古驛/古驿 (Gǔyì) or creating a separate entry 古驛鎮/古驿镇, this depends on how the town/village/river/geographical entity is cited in local news, publications or historical records. I think we should create entries based on attestability, particularly how the locals refer to the place, and not based on listings found in statistical tables/government census/etc. My opinion is that if 古驛/古驿 (Gǔyì) and 古驛鎮/古驿镇 refer to the same place, then 古驛鎮/古驿镇 can be redirected to 古驛/古驿 (Gǔyì), similar to how 上海市 redirects to 上海 (Shànghǎi). However, if 古驛/古驿 (Gǔyì) is not used on its own, then this entry ought to be moved or redirected to 古驛鎮/古驿镇 instead. KevinUp (talk) 16:19, 17 February 2019 (UTC)
Thanks for your reply. Regarding the first issue, I left a message on the Template talk:zh-div page. Regarding the second issue, I have maps which list Mainland China locations without using 市 , 县 , or 镇 , and Justin found some great examples that seemingly proved the attestability of the 頭筆 residential community recently- 社区 wasn't 'obligate'. --Geographyinitiative (talk) 18:40, 17 February 2019 (UTC)
I agree with the redirects you mentioned (上海市 redirects to 上海 (Shànghǎi)). I usually don't actively make pages with the 市 , 县 , or 镇 tacked on unless there's an ethnic minority area involved, in which case they don't redirect. --Geographyinitiative (talk) 19:06, 17 February 2019 (UTC) modified


Do you know what the origin of the element se- in this word is? ←₰-→ Lingo Bingo Dingo (talk) 09:08, 18 February 2019 (UTC)

@Lingo Bingo Dingo: I think the element se- may be derived from a regional dialect, perhaps a form of Bazaar Malay. A google search of "sesate Indonesia" reveals that the word exists in the Balinese language and is a symbolic weapon used in some form of ritual represented by a type of food. [15] [16] [17] [18] [19] KevinUp (talk) 21:02, 18 February 2019 (UTC)
Interestingly, although Afrikaans sosatie is currently listed as a descendant of Dutch sesaté, the Wikipedia article for sosatie suggests that sosatie is of Cape Malay origin, from saus (spicy sauce) + sate (skewered meat).
However, the origin of the Malay word sate is disputed. Most native Malay words have corresponding rhymes, but sate does not rhyme with any other word. The Indonesian Wikipedia article for sate suggests that sate is from a Tamil word, and is a type of street food invented in Java island during the early 19th century.
I'm not sure whether the Dutch version of saté or sesaté contains any pork in historical recipes. If it does, then there's a strong Balinese connection, because the majority of Javanese people are Muslim and do not consume pork, unlike the Balinese people. KevinUp (talk) 21:02, 18 February 2019 (UTC)
Linking sosatie as a descendant was my doing, based on the etymology of English sosatie, which alleged a Dutch intermediate step. But that might be wrong, though I wouldn't rely on Wikipedia at all for this either.
I will look into the question about pork. ←₰-→ Lingo Bingo Dingo (talk) 15:58, 19 February 2019 (UTC)
@Lingo Bingo Dingo: I found this while searching for "sesaté sosatie": The second reference (Dialectwoordenboeken en woordenboeken van variëteiten van het Nederlands) suggests that the term "sateh" is from Javanese sateh, originally Tamil sataj, but I'm not sure about the original spelling in Dutch/Javanese/Tamil.
As for English or Afrikaans sosatie, I think it is likely to be derived from Dutch saus + saté rather than sesaté. KevinUp (talk) 17:05, 19 February 2019 (UTC)
Whether it is strongly linked with pork is a little hard to tell, but seems like it isn't in the earlier results.
Van Wyk (Afrikaans etymological dictionary) gives an Indonesian Dutch or Malay origin from sesate(h), sateh. The oldest word list (ca. 1880) gives sassati, with sosatie appearing in the early 20th century. I don't think the compound is likely at all, because then one would expect stress on the first syllable (the Afrikaans is stressed on the second syllable). The form sausati appears once in a word list from 1899, which seems secondary. Forms like sasaté also appear a few times in Dutch. ←₰-→ Lingo Bingo Dingo (talk) 08:54, 20 February 2019 (UTC)
The link from Algemeen Nederduitsch-Maleisch Woordenboek [20] seems to suggest that sasaté has a Javanese origin. Anyway, as mentioned above, sate isn't a native Malay word, due to lack of corresponding rhymes. I found the Javanese spelling ꦱꦠꦺ (saté) on Javanese Wiktionary. Someone else will have to check whether ꦱꦱꦠꦺ (sasaté) exists or not. KevinUp (talk) 09:33, 20 February 2019 (UTC)
Yes, ascribing it to Malay and stopping there is not useful. I have added another unspecific step to the etymology and changed Malay to Indonesian. Does the Tamil origin seem plausible to you? ←₰-→ Lingo Bingo Dingo (talk) 10:05, 20 February 2019 (UTC)
I think the previous edit where sosatie is stated as "from Dutch sesaté or directly from Malay sate" is good enough. We might be dealing with two different etymologies: (1) Dutch sesaté from a Javanese or Balinese word and (2) Dutch saté from a type of street food, presumably based on Betawi (Jakarta Malay dialect) saté, from Tamil சதை (catai, flesh).
However, the origin of the word is disputed, so I think it would be better to exclude the Javanese/Balinese or Tamil origin and revert to the previous edit. KevinUp (talk) 11:12, 20 February 2019 (UTC)
On an unrelated note, I think we have to be careful about converting Malay to Indonesian since technically Indonesian did not exist before its independence in 1945. Prior to independence, there's Malay, spoken in the vicinity of the Riau-Lingga Sultanate, and also the Betawi language, a dialect/creole of Malay spoken in Jakarta. Modern colloquial Indonesian, although based on Riau-Lingga Malay, is significantly influenced by Betawi, because of the position of Jakarta as its capital.
I think we can use "Malay" for the etymology of Dutch saté, because "Betawi" is a direct descendant of Malay. A historical dictionary for Betawi (Batavia/Jakarta Malay dialect) to Dutch might be useful for us to identify such words. I'm reminded of Dutch toko, which might be from Betawi, rather than Malay or Indonesian. KevinUp (talk) 11:12, 20 February 2019 (UTC)
Is 1945 used as a cutoff point for Indonesian? Isn't there continuity with the variety of Malay used by the colonial administration though? ←₰-→ Lingo Bingo Dingo (talk) 14:59, 22 February 2019 (UTC)
  • The cutoff point for Indonesian can be taken as 1928, when the Youth Pledge (Sumpah Pemuda) was made by young nationalists who proclaimed "bahasa Indonesia" as the language of unity. The chosen language was based on the standardized form defined by the late Ali Haji bin Raja Haji Ahmad (1808-1873), who wrote Kitab Pengetahuan Bahasa, the first monolingual Malay dictionary in the region based on the Malay dialect of Johor-Pahang-Riau-Lingga.
  • As for the language used by the Dutch colonial administration, Malay was designated as the second official language in 1865, but was later removed as an official language in 1932 due to the prominent rise of nationalism. [21] This language is probably the same language standardized by Ali Haji bin Raja Haji Ahmad.
  • I think it is important to identify when a "Malay" word was first attested in the Dutch language. From 1641 to 1825, the Dutch occupied Malacca (present-day Malaysia), so words from that time period are "Malay". However, words borrowed during the time of the Dutch East Indies (1800-1948) could also be Javanese, Balinese, Sundanese or some other creole of Malay, such as Betawi (these four languages are spoken on Jawa island).
  • To be safe, Malay words based on Riau-Lingga can be searched here: Kitab Pengetahuan Bahasa (romanized in Indonesian) or Puisi-puisi Raja Ali Haji (romanized in modern Malay). KevinUp (talk) 03:57, 23 February 2019 (UTC)
In the Malay language, when the prefix se- is used before a noun, it usually means "one" or "the whole/the entire". However, I've never heard of the term "sesate" used for the sense "one satay" or "the entire satay". The grammatical form to refer to "one satay" is "secucuk sate" (a skewer of satay) while "entire satay" is "seluruh sate".
As stated in the entry for se-, the "one" sense is from a shortened form of esa while the "whole/entire" sense is a clipping of seluruh. I don't think the element se- in sesaté is derived from Malay or Indonesian though. KevinUp (talk) 21:02, 18 February 2019 (UTC)
Additional comment: It seems that the se- element may be a reduplicated form of sate in the Balinese language. [22] (from [23]) KevinUp (talk) 02:01, 25 February 2019 (UTC)


How can I fix it if I can't see some Han characters? They look like squares. --Dingyday (talk) 14:30, 18 February 2019 (UTC)

@Dingyday If everything else fails, copy-paste it into the dictionary at to or --Geographyinitiative (talk) 20:37, 18 February 2019 (UTC)
@Dingyday: To view Han characters that cannot be displayed, you have to install a font such as the Hanazono font, which has the best coverage [24]. If you're using a mobile browser, you can copy the "square" character to 대문 KevinUp (talk) 21:16, 18 February 2019 (UTC)
By the way, almost all hanja are viewable in modern browsers. Only 56 hanja for personal names (인명용한자표) are encoded in the extension set of CJK Unified Ideographs, so these characters will need font support to be viewable. KevinUp (talk) 21:16, 18 February 2019 (UTC)
Once you have the proper font installed, the red boxes will display correctly. KevinUp (talk) 21:16, 18 February 2019 (UTC)

The font which I have displays all Hanjas except two letters: (𬟓, hun) and (𬄕, jip). If I use the Hanazono font, it displays all Hanjas, but Hangul is printed separated, so it is uncomfortable. --Dingyday (talk) 14:54, 19 February 2019 (UTC)

@Dingyday: Interesting. May I know the name of the font you are using that can display all Hanjas except two letters? The Hanazono font is more suitable for Japanese systems, which is why the Hangul appears to be separated. I think you can uninstall the Hanazono font, because most of the "square" characters on your system that cannot display are not hanja. They are mostly obsolete characters found only in historical Chinese dictionaries and Vietnamese chữ Nôm. KevinUp (talk) 15:34, 19 February 2019 (UTC)
The font I use is Kaigen Gothic. I love this font because it prints Hangul and Hanja smoothly. --Dingyday (talk) 14:45, 20 February 2019 (UTC)
Thanks! This font really does print both Hangul and Hanja smoothly. KevinUp (talk) 13:32, 21 February 2019 (UTC)