Talk:𬖾

From Wiktionary, the free dictionary

Is this character Unicode cruft?

The Han-Nôm Institute in Hanoi recommends several Nôm dictionaries. These include Giúp đọc Nôm và Hán Việt (Nôm and Sino-Vietnamese Pronunciation Guide, 2004) by Trần Văn Kiệm, Tự điển Chữ Nôm trích dẫn (Dictionary of Nôm Characters with Quotations, 2009) by the Institute for Vietnamese Studies in Westminster, California, and Tự điển Chữ Nôm dẫn giải (Nôm Characters with Quotations and Annotations, 2014). None of these dictionaries includes the character on this page, so I would question the claim that it is Nôm.

I suspect that the real source of this character is Đại Từ Điển Chữ Nôm (1998) by Vũ Văn Kính. This is a notorious dictionary full of made-up characters. The Han-Nôm Institute gives 9,500 Nôm characters, while Vũ gives 37,000, including this one. Vietnam stopped publishing books in Nôm around 1920, so this character does not appear in print until long after the end of the Nôm era. 72trombones (talk) 11:26, 28 June 2023 (UTC)

I would just say that, not having read any Nôm texts myself, I can't be sure whether this character could be a variant form of some kind. But I suspect that OCR of Nôm texts may not be high quality, given that most ENGLISH OCR sucks, even today! Here's the other convo: [1]. I'm no expert, so good luck with this. --Geographyinitiative (talk) 11:32, 28 June 2023 (UTC)
The Han-Nôm Institute used a corpus of 124 works to produce its dictionary. The National Library of Vietnam has thousands of works in Nôm. Their collection of Nôm scripts for traditional opera takes up several bookcases. They have digital images of their collection online, but none of it is OCR'd.
Perhaps I should back up and explain some history. The Han-Nôm Institute started with two Nôm dictionaries, one by Vu Van Kinh and Nguyen Quang Xy that was published in Saigon in 1973, and another by Ho Le that was published in Hanoi in 1976. Both of these dictionaries give 頗 as the character for phở. Neither of them includes an entry for the character on this page. The institute extracted 9,299 characters and gave them "VSource" codes ranging from V0-0000 to V3-9999. This took eight years, so it was apparently quite a project. Unicode gave these "Nôm Ideograms" codepoints in 2001. For example, 喃 (nôm) has a VSource of V1-4F54 and a codepoint of U+5583.
In 1998, Vu expanded his dictionary from about 6,000 characters to 37,000. 𬖾 is one of the characters added at that time. It was assigned the code V4-5055. V4 is a supplementary set of characters that was given codepoints in 2015. These characters were not classified as Nôm, but rather as "Hán Nôm Coded Characters." I guess that sounded better than "Unicode cruft." Although these characters have been in Unicode for eight years now, no one has created a font to display them. 72trombones (talk) 05:11, 29 June 2023 (UTC)