User talk:Nanshu


Use the Möllendorf transcription system for Manchu, because currently no one can display the Manchu script correctly in a browser.

Hi Nanshu,
I don't think it would be a problem if you used the Unicode characters (if they exist), followed by the Möllendorf transcription in parentheses. With time the characters will be displayed correctly in browsers. Polyglot 05:18 Aug 11, 2003 (UTC)
According to Unicode Manchu uses the Mongolian script, but not many fonts support it yet. Japman 13:40 Aug 11, 2003 (UTC)
I don't think it is a problem to write it with the right kind of characters. When I look at Wiktionary on Windows I also can't see the Japanese, Chinese and Russian characters. The people who have an interest in these languages will make sure they have a character set that can display them. Putting the transcription after it in parentheses is, of course, always a good idea. 14:05 Aug 11, 2003 (UTC)

Things are tougher than you may think. Unicode supports the Mongolian, Manchu, Sibe and Todo scripts, but it is difficult to display or input them. This page briefly explains why: I don't know of any browser (or font) that treats positional variants correctly. Also, an implementation of "writing-mode: tb-lr" (CSS3 Text Module) is required.

I made a simple conversion script (from the transcription to a Unicode sequence), but I'm not sure if my understanding is correct. Here is a test:

ᠮᠠᠨᠵᡠ ᡤᡳᠰᡠᠨ

(manju gisun) --Nanshu 01:32 Aug 12, 2003 (UTC)

OK, I get it: it's currently next to impossible to input them and especially to render them correctly. I thought it was mainly a rendering problem due to incomplete font sets. Is there a one-to-one relationship between the Möllendorf transcription and the UNICODE characters? Maybe you could just write the transcription for now, commented like this:
*[[Manchu]]: (Möllendorf transcription: ...)
Then we would be able to have a script convert it to the correct UNICODE characters later on. What do you think? Polyglot 07:47 Aug 12, 2003 (UTC)
I agree. --Nanshu
Unicode allocation is independent of the complicated rendering. The mapping from the Möllendorf romanization to a Unicode sequence is mostly one-to-one, so it is not difficult to write a converter (though not an input method). But sometimes Free Variation Selectors are needed to select unusual variant forms. For more information see . I can insert Unicode characters so that someday they can be rendered correctly, but I cannot check whether they are really correct. --Nanshu 01:23 Aug 14, 2003 (UTC)
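To illustrate the "mostly one-to-one" mapping mentioned above, here is a minimal Python sketch. It is not Nanshu's conversion script: the lookup table is derived only from the test phrase on this page (manju gisun and its Unicode sequence), so it knows just the eight letters that occur there, and it ignores Free Variation Selectors and any multi-letter transcription units. The names `TABLE` and `to_unicode` are hypothetical.

```python
# Build the table by pairing each Latin letter of the test phrase with the
# code point at the same position in the Unicode test string from this page.
LATIN = "manju gisun"
MANCHU = "ᠮᠠᠨᠵᡠ ᡤᡳᠰᡠᠨ"
TABLE = dict(zip(LATIN, MANCHU))


def to_unicode(transcription: str) -> str:
    """Convert a transcription letter by letter; fail loudly on unmapped letters."""
    try:
        return "".join(TABLE[ch] for ch in transcription)
    except KeyError as exc:
        raise ValueError(f"no mapping known for {exc.args[0]!r}") from None


if __name__ == "__main__":
    print(to_unicode("manju gisun"))
```

A real converter would need the full letter inventory, and probably longest-match handling for transcription units written with more than one Latin letter; both are left out here.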

I can see that script in my browser. Petruk 18:01 Aug 12, 2003 (UTC)

I couldn't see it. What browser (IE, Mozilla?) are you using on which platform (Windows, Macintosh, Linux, UNIX)? On GNU/Linux Mandrake 9.1 with all the languages installed, I could only see square blocks with numbers in them. I tried with different fonts. Does it look more or less like it is supposed to look? There are examples on the web site that Nanshu mentioned. Polyglot 18:51 Aug 12, 2003 (UTC)
Is it correct? That's the question. I can see it too, but positional variants and ligatures are not supported. --Nanshu
Don't know whether it is correct or not. It shows up like this jpg: File:manju-gisun.jpg. I use Mozilla-1.3 / Win98. I didn't do anything. Maybe I had installed the font accidentally. Petruk 15:04 Aug 13, 2003 (UTC)
The link Nanshu pointed to above indicates that it should have looked like one of these. Japman 15:59 Aug 13, 2003 (UTC)
Incorrect. It should be something like this: manju gisun.png. At least SimSun-18030, STFangsong and Code2000 (imperfectly) support the script. You surely have one of them. --Nanshu 01:23 Aug 14, 2003 (UTC)
I didn't mean that it should look exactly like it, but only similar in style. :-) Japman 09:11 Aug 14, 2003 (UTC)
It was a response to Petruk... --Nanshu 01:25 Aug 15, 2003 (UTC)

Your bot has added a lot of pages about Chinese characters... I can't see many of them. Do you know which font I need? Koxinga 12:15, 10 Sep 2003 (UTC)

I haven't been getting them either, but that may be because I don't seem to have installed the "CJK Unified Ideographs Extension A" in the hex range 3400-4DBF. These are generally less common characters than those in the main hex range 4E00-9FAF. Eclecticology 16:30, 10 Sep 2003 (UTC)

What I am currently adding is "CJK Unified Ideographs Extension A" (U+3400 to U+4DBF; 6,582 characters), which was adopted in Unicode 3.0. This area was once occupied by Hangul Syllables, but they were reallocated to another area, violating one of Unicode's principles. The Chinese characters added in their place are very rare. I confess that I have never seen these words. To show them, you have to get a relatively new Unicode font. If you use Windows, this page will help you. I have now decided to complete "CJK Unified Ideographs" first. --Nanshu 23:41, 10 Sep 2003 (UTC)

Great work that you've been doing on all those Chinese character pages. By way of suggestion, you could revise the "Radical" line to read as follows:

The radical number is especially useful for the non-Chinese for whom the order in which the radicals appear in the dictionary is not always obvious. Adding the word "stroke" at the end simply clarifies that that is what the number is about. Keep up the good work. Eclecticology 09:41, 18 Sep 2003 (UTC)

Implemented. Thank you for your suggestion. --Nanshu 00:50, 19 Sep 2003 (UTC)

Hi, Nanshu: I added some comments and questions on the Entries on Chinese characters meta page...basically, I was wondering where the Hangeul index came from (I'm wondering how complete or accurate it is); also, I was wrong about using the word "Hum-eun": it should be "eumhun" (音訓), since that is the word in Korean. Finally, you mentioned the NK/SK spelling variation (롱/농), but this is tricky: in South Korea, when a character that starts with ㄹ is used on its own or at the beginning of a word, it is replaced with a ㄴ (or ㅇ if ㅑ, ㅕ, ㅛ, ㅠ, or sometimes ㅣ follows it). (Examples: 羅 (라) in 羅州 나주; the family name 李 (리), which is usually written & pronounced 이.) But when the character comes in the middle or at the end of a word, the ㄹ is maintained (so a South Korean would write 용 (龍) for "dragon," but 백룡 (白龍) for "white dragon"). This rule also applies to the eumhun reading of Hanja: South Koreans normally write 육 for 六, but Chinese character dictionaries call the character "여섯 륙" (yeoseon-nyuk). For 용, the eumhun reading is "용 룡" (yong-nyong)! In North Korea, however, the ㄹ is always used, even at the beginning of a word--so a North Korean would spell the South Korean city name 나주 as 라주, and romanize it as Raju (like Rajin near the Russian border), instead of Naju. This consonant-changing rule also applies to ㄴ when it comes before ㅑ, ㅕ, ㅛ, ㅠ: for example, 녀 (女), which is spelled and pronounced 여 in 여자 (女子).

...I'm also going to post what I just wrote on the Entries on Chinese characters meta page because it's relevant to the discussion, but I'll leave it here for you as well. --Sewing 16:15, 2 Oct 2003 (UTC)
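The word-initial rule described above can be sketched in Python as follows. This is only an illustration of the rule as stated in this comment, under some assumptions: it treats ㅣ like ㅑ, ㅕ, ㅛ, ㅠ in every case (the comment says "sometimes"), it changes ㄴ only before ㅑ, ㅕ, ㅛ, ㅠ, and it knows nothing about exceptions. The function names `south_initial` and `south_spelling` are hypothetical. The arithmetic relies on the standard layout of the Unicode Hangul Syllables block (19 initials × 21 vowels × 28 finals starting at U+AC00).

```python
HANGUL_BASE = 0xAC00                 # first precomposed syllable, 가
SYLLABLE_COUNT = 19 * 21 * 28        # initials x vowels x finals
RIEUL, NIEUN, IEUNG = 5, 2, 11       # indices of initial ㄹ, ㄴ, ㅇ
Y_VOWELS = {2, 6, 12, 17}            # indices of ㅑ, ㅕ, ㅛ, ㅠ
I_VOWEL = 20                         # index of ㅣ


def south_initial(syllable: str) -> str:
    """Apply the word-initial rule to one precomposed Hangul syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        return syllable              # not a Hangul syllable: leave unchanged
    lead, rest = divmod(offset, 21 * 28)
    vowel = rest // 28
    if lead == RIEUL:
        # ㄹ becomes ㅇ before the y-vowels (and ㅣ here), otherwise ㄴ
        lead = IEUNG if vowel in Y_VOWELS or vowel == I_VOWEL else NIEUN
    elif lead == NIEUN and vowel in Y_VOWELS:
        lead = IEUNG
    return chr(HANGUL_BASE + lead * 21 * 28 + rest)


def south_spelling(word: str) -> str:
    """Change only the first syllable; later syllables keep their ㄹ or ㄴ."""
    return south_initial(word[0]) + word[1:] if word else word


if __name__ == "__main__":
    for w in ["라주", "리", "룡", "백룡", "녀자"]:
        print(w, "->", south_spelling(w))
```

Because only the first syllable is touched, 백룡 keeps its ㄹ while 룡 on its own becomes 용, matching the examples in the comment.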

  • The index was automatically generated from the Unihan database. My script read the kKorean fields, composed Hangul syllables from these romanized values and sorted them by South Korean Hangul order (I did not use a sorting algorithm, but made a huge table, because this order is the same as codepoint order). So my index only covers the characters that have kKorean fields in the database. I'm not sure of their accuracy.
  • Fixed.
  • I know what is called 頭音規則 in Japanese. Such a change is notated with ">". See .

--Nanshu 22:56, 2 Oct 2003 (UTC)
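The composition and ordering steps mentioned in the first point can be sketched in Python. This is not the script that built the index: it skips the parsing of romanized kKorean values entirely and starts from jamo, and the names `LEADS`, `VOWELS`, `TAILS` and `compose` are hypothetical. It uses the standard arithmetic of the Unicode Hangul Syllables block, and it shows why a plain code point sort is enough for the ordering.

```python
# Jamo in the order used by the Unicode Hangul Syllables block.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"           # 19 initials
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals


def compose(lead: str, vowel: str, tail: str = "") -> str:
    """Compose one precomposed syllable from its initial, vowel and final."""
    index = (LEADS.index(lead) * 21 + VOWELS.index(vowel)) * 28 + TAILS.index(tail)
    return chr(0xAC00 + index)


if __name__ == "__main__":
    readings = [compose("ㅇ", "ㅛ", "ㅇ"), compose("ㄴ", "ㅗ", "ㅇ"),
                compose("ㄹ", "ㅗ", "ㅇ"), compose("ㄹ", "ㅏ")]
    # Sorting the strings compares code points, which here follows the
    # initial, then the vowel, then the final.
    print(sorted(readings))
```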

Thanks. Almost the same term is used in Korean (두음 법칙; 頭音法則; dueum beopchik) to describe this rule. I forgot the Japanese I learned 6 years ago, but is there a similar rule in Japanese? I didn't know that. --Sewing 00:20, 3 Oct 2003 (UTC)

Bot problems?

Now your bot seems to be adding only the Korean section and not the Chinese or Japanese sections; is this a bug? (See for an example.) --Sewing 00:47, 3 Oct 2003 (UTC)

Sorry...I was going to say that the bot still seemed to be buggy, but I guess it only adds a section for each language if there is data for that language; is that correct? --Sewing 02:14, 4 Oct 2003 (UTC)

Yeah. My bot doesn't add a section for a language if I have no information about it. --Nanshu 02:27, 4 Oct 2003 (UTC)

Congratulations on getting through all those CJK characters. Great job. Eclecticology 02:43, 22 Nov 2003 (UTC)

Hi Nanshu,

I hope you are still around. I'm struggling to get my PHP bot script operational again. Apparently the new software expects a different kind of interaction than before. Could you share with me how Nanshubot interacts with the Wiktionary server software? Many thanks, Polyglot 22:14, 19 Dec 2003 (UTC)

Hello. NanshuBot is a POST-only bot. It isn't clever enough to modify existing articles. Sorry if I cannot help you.

My bot is written in Perl. It sends the following request.

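    use HTTP::Request::Common qw(POST);   # provides POST()

    # Build the form submission for the edit page at $url_name.
    my $request = POST("$url_name&action=submit",
                       { 'wpTextbox1' => $output,    # generated page text
    #                    'wpMinoredit' => 1,         # debug
                         'wpSummary'  => $comment,   # edit summary
    #                    'wpPreview'  => 'Show preview',
                         'wpSave'     => 'Save page',
                         'wpSection'  => '',
                         'wpEdittime' => '' });      # empty: new articles only

    # Log in by sending the session cookies directly.
    $request->header('Cookie' => 'wiktionarywikiUserName=NanshuBot;wiktionarywikiUserID=421;wiktionarywikiPassword=********************************'); # wiktionary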

This only works for non-existent articles. In this case, the server returns "Redirect". Otherwise, my bot updates the local error log (and I update the articles manually). To edit an existing article, 'wpEdittime' should be set correctly. My bot with manually set 'wpEdittime' successfully updated User:Nanshu/Sandbox. --Nanshu 05:34, 20 Dec 2003 (UTC)
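For readers who do not use Perl, the same form submission can be sketched in Python. This is only an outline of what the comment above describes, not NanshuBot itself, and it sends nothing over the network. The field names are the ones shown in the Perl fragment; the function names `build_edit_form`, `encode_form` and `looks_saved` are hypothetical, and the value that 'wpEdittime' must carry for an existing article is not given on this page, so it is left as a caller-supplied parameter.

```python
from urllib.parse import urlencode


def build_edit_form(text: str, summary: str, edit_time: str = "") -> dict:
    """Collect the form fields; an empty edit_time targets a new article."""
    return {
        "wpTextbox1": text,
        "wpSummary": summary,
        "wpSave": "Save page",
        "wpSection": "",
        "wpEdittime": edit_time,   # must be set correctly for existing articles
    }


def encode_form(form: dict) -> bytes:
    """Encode the fields as an application/x-www-form-urlencoded request body."""
    return urlencode(form).encode("utf-8")


def looks_saved(status_code: int) -> bool:
    """Treat a redirect response as a successful save, as described above."""
    return 300 <= status_code < 400


if __name__ == "__main__":
    body = encode_form(build_edit_form("page text", "bot: new entry"))
    print(body.decode("utf-8"))
```

Anything other than a redirect would go to the error log for manual follow-up, which mirrors the behaviour described in the comment.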

Thanks, I appreciate you sharing this. I had taken wpEdittime into account, but I get a Forbidden when trying to read an existing article. This worked OK before. I'm going to investigate and test a bit more. Maybe it's the cookie that's doing the trick. Polyglot 08:47, 20 Dec 2003 (UTC)

some more suggestions

Hello Nanshu. Very nice work. I'd like to link my little Zhendic to your useful Wiktionary entries. I have a few usability and readability suggestions.

The character displays in bold in the entry title. On my computer, it's not very easy to read for some characters with many strokes. I'm not sure that it's possible to change this H1 layout. So I would suggest adding a first line with the info readers would want to find quickly, something like a summary of the page. It could look like this:

(big)CHAR [VAR](/big), toned pinyin, rough definition in English.

I don't know if you plan to add multi-character "words", but that could be useful too. I can send you the database I used (it is public domain, I think).

If you wish to add very valuable meat to the English definitions, especially from a sinological point of view, one may ask those who made

This gives you something like this:

學 xué (xue2) Middle Chinese:haewk Old Chinese:*N-kruuk Strokes: 16 Radical: 39.13
Meanings (M): " {I} [xué hak gaku] (1) To study, to learn. (2) A student. (3) Learning; a branch of learning; a science; -ology. A certain theory or discipline. (4) A school. (5) A school of thought, such as the Taoist, or Confucian school, or the school of Zhuxi or Wang Yangming. {II} [ji�釅 kou] To teach. [Refs] [Credit] cmuller [Morohashi] 07033 [Daehan] 0549.270 [Mathews] 2780 "
Philosophy: HALL, DAVID L. AND AMES, ROGER T. 1987 Learning; an unmediated process of becoming aware; the appropriation and embodiment of the cultural tradition (wen2 文) through pedagogical interaction and exchange (wen2 聞).

By the way, your work is very fine and your bot very friendly :) 10:54, 8 Jan 2004 (UTC) (gbog on wikipedia)

Other categorizations

I know it's a bit late, but I've added three new Chinese indices:

  • zhuyin (or bopomofo)
  • Wubi Hua (the five strokes method)
  • Wubi Xing

and I'm wondering if you could help populate these indices.

I'm also slightly improving/redoing the TOCs in these indices, for ease of navigation.

KelvSYC 04:25, 5 May 2004 (UTC)

Just to say thanx

Nanshu, I should have written this many months ago: thank you so much for the CJK Unified Ideographs pages you completed with Nanshubot! They became a very important tool for me when surfing the net. Altogether I have saved many hours by using the Wiktionary ideograph pages here. Great idea and excellent work! Congratulations! - Piolinfax 14:39, 18 Jul 2004 (UTC)