User talk:A-cai/2006

EVERY entry has to have ==language== and ===part of speech=== headings. Also, only the language is at level 2; etymology is at level 3 or 4, e.g. ===Etymology===, not ==Etymology==. Here is our standard welcome. SemperBlotto 11:00, 30 January 2006 (UTC)

Welcome

Hello, welcome to Wiktionary, and thank you for your contributions so far.

If you are unfamiliar with wiki-editing, take a look at Help:How to edit a page. It is a concise list of technical guidelines to the wiki format we use here: how to, for example, make text boldfaced or create hyperlinks. Feel free to practice in the sandbox. If you would like a slower introduction we have a short tutorial.

These links may help you familiarize yourself with Wiktionary:

  • Entry layout (EL) is a detailed policy on Wiktionary's page formatting; all entries must conform to it. The easiest way to start off is to copy the contents of an existing same-language entry, and then adapt it to fit the entry you are creating.
  • Check out Language considerations to find out more about how to edit for a particular language.
  • Our Criteria for Inclusion (CFI) defines exactly which words can be added to Wiktionary; the most important part is that Wiktionary only accepts words that have been in somewhat widespread use over the course of at least a year, and citations that demonstrate usage can be asked for when there is doubt.
  • If you already have some experience with editing our sister project Wikipedia, then you may find our guide for Wikipedia users useful.
  • If you have any questions, bring them to Wiktionary:Information desk or ask me on my talk page.
  • Whenever commenting on any discussion page, please sign your posts with four tildes (~~~~) which automatically produces your username and timestamp.
  • You are encouraged to add a BabelBox to your userpage to indicate your self-assessed knowledge of languages.

Enjoy your stay at Wiktionary!

Image uploads

That's a beautiful image you uploaded, but we'll need license information if we're going to keep it. I'd also encourage you to add appropriately licensed images to commons:Main_page instead, so that all projects can use them. Thanks. —Dvortygirl 04:13, 1 February 2006 (UTC)

Chinese idioms

First, thank you for adding the Chinese idioms. Please do not use the template {{idiom}} in them. It's not a big deal, but it adds things to Category:Idioms, a category which is intended for idioms in English. Instead, please mark the definition (''idiom'') or (''idiomatic'') and put it in Category:Chinese idioms, as you have been doing. I have formatted your earlier contributions according to that convention. Keep up the good work. — Dvortygirl 07:15, 2 February 2006 (UTC)

Simplified and traditional Chinese

Nice work with the Chinese idioms, but we don't separate different ways of writing a single language. It gives people the false impression that they are two distinct languages. Too many people already seem to think this way with regard to Chinese. It is better to put both on a single line, just as we would with American and British English. Some people choose to label which is traditional and which is simplified, but I prefer just to add both only if they differ, with the traditional always first since it is older, less ambiguous, and still in use.

Keep up the good work! — Hippietrail 15:35, 2 February 2006 (UTC)

Chinese and Japanese

Hi. Lots of nice words you are adding and I admire your enthusiasm. I've noticed, however, that you are combining the Japanese and Chinese sections of a page. Here each language gets its own section even if the meaning is the same. The way you have it, it looks like the Chinese section has no definition and when there is information about Chinese in the Japanese section, it is confusing for people looking for the definition of a Japanese word. Millie 12:11, 3 February 2006 (UTC)

Categories

Hi there. I wonder if you would like to explain somewhere what you are doing with all these categories. I'm sure it would be of interest to a wide audience so maybe the Beer Parlour would be the best place. Cheers. SemperBlotto 13:05, 8 February 2006 (UTC)

  • Here is the response that I put in the Beer Parlour:

cross indexing Asian languages

After entering in about 100 new words and phrases, I became frustrated by the inefficiency of the input process. I had to go to several different pages just to properly index a single word. I will give an example of such a word, and show how my new method will help out.

The word is "university": 大學, 大学

etc ...

See: Category:大 for a concrete example. Observe how it now spiders out to:

  • Chinese language->Chinese Min Nan POJ index
  • Chinese language->Chinese Pinyin index
  • Chinese hanzi->CJKV radicals->Japanese kanji->Japanese language
  • Japanese language->hiragana index

and so forth and so on ...

Let's say you want to start a Cantonese index. Simply go to Category:大 and make the appropriate entry (Category:zh-yue:daai->Category:zh-yue:d->Category:Cantonese Yale index->Category:Chinese language, for example). Once you do that, any words that start with 大 and have Category:大 placed in them will automatically be added to the correct index. For the programmers out there, think pointers in C, or the Collections classes in Java. I hope all of this makes sense. I think it will once I put a little more plumbing in. A-cai 13:39, 8 February 2006 (UTC)
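
For readers who think in code, the nesting described above can be pictured as a chain of parent links. This is only a minimal Python sketch of the idea, not how MediaWiki categories are actually implemented: the `PARENTS` table and the `ancestors` helper are hypothetical names, and the chain is the Cantonese example from the post.

```python
# Hypothetical category graph: each category maps to its parent categories.
# The chain below is the Cantonese example from the post above.
PARENTS = {
    "Category:大": ["Category:zh-yue:daai"],
    "Category:zh-yue:daai": ["Category:zh-yue:d"],
    "Category:zh-yue:d": ["Category:Cantonese Yale index"],
    "Category:Cantonese Yale index": ["Category:Chinese language"],
}


def ancestors(category, parents=PARENTS):
    """Return every category reachable by following parent links upward."""
    seen = []
    stack = list(parents.get(category, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(parents.get(current, []))
    return seen


# An entry tagged only with Category:大 is reachable from every index above it.
for name in ancestors("Category:大"):
    print(name)
```

The point of the design is that an entry carries a single tag, and adding a new index means adding one parent link to the character category rather than touching every entry.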

Why do you nest categories so many levels? You could put, say, Category:かん (kan) directly in Category:hiragana index instead of in Category:か (ka). Millie 04:12, 9 February 2006 (UTC)
  • Millie, I'm being an optimist!
  1. I'm hoping that once wiktionary catches on, there will be thousands and thousands of words. This will make it necessary to have a well thought out design in the beginning. If we end up not needing so many levels, it should not be too difficult to go back and fix on a case-by-case basis. If you prefer, you can reorganize the Japanese your way for now, and we can create the sub-levels later on as needed.
  2. I read somewhere that one aspect of a well laid out web page is if you don't have to scroll down too much. With the sub levels, you click to the next level, but you can see all of the categories because (in theory) there won't be too many.

Some of you who have been watching me for the last couple of days may have noted that I have tried one thing, realized it was wrong, then gone back and redone it. In computer science, we call this the iterative process. Or in layman’s terms, "trial and error" (Chinese: 嘗試錯誤, 尝试错误). I'm glad to have the feedback. As you may have noticed, I am now in the process of implementing one of the suggestions from Wiktionary:Beer Parlour. A-cai 04:30, 9 February 2006 (UTC)

Lee and Li

Hi there. When you have time, could you look at these two entries and maybe add the equivalent Chinese characters. SemperBlotto 11:37, 9 February 2006 (UTC)

No problem. A-cai 11:40, 9 February 2006 (UTC)

Question: Hello. I would just like you to see a category, Hiragana, that we (I, Millie and some others) have been developing recently, and the entries in it. えき would be one of the most illustrative examples. As you can see, we intend each hiragana article to be a small index of the corresponding words. Just a note to tell you that our two approaches, the hiragana entries and your Japanese categories like Category:うえ (ue), seem to overlap considerably. Thanks. --Tohru 14:34, 9 February 2006 (UTC)

Answer: Tohru, to tell you the truth, I'm not sure which one is better. Here is my proposal; I will have my entries routed to both places for a little while so that we can decide which format works better. My preliminary thought is that if we end up going with Hiragana, we should have a hiragana TOC, for example:

Top

I will dual route my entries for a little while and then we can make a decision after some time has passed as to which one is better. I think it is still too early to tell. Also, I like the romaji part that I have in Category:hiragana index. It will be more helpful for beginning students to have the romaji in there.

In summary, it would be ideal if we could combine the best aspects of both Category:hiragana index and Category:Hiragana.

As for the idea of reinventing the wheel, it happens all the time in technology. What if someone told the founders of Google, "Hey, we already have Yahoo! Why are you reinventing the wheel?" (Not to compare my Category:hiragana index with Yahoo :) A-cai 23:15, 9 February 2006 (UTC)

I like the TOC. I had started one myself in Category:Japanese kanji. I get what you're saying about not wanting web pages to be too long and I think the TOC is an excellent way to navigate a long page quickly. Millie 23:53, 9 February 2006 (UTC)

Millie and Tohru, I have added えきまえ and 駅前 to show how my indexing would work. Take a look at Category:駅 and Category:え (e).

  • Some additional points that I forgot to mention:
  1. All characters from all CJKV languages would only be cross indexed if every single character of the phrase is added (see the bottom of 人類). However, if we only indexed the head character of a word (see: bottom of 一石二鳥), the number would be considerably less (ex. has 2373 entries in 國語辭典 (Guoyu Cidian), but only 1512 of those start with .
  2. One last thing to consider: the PRC, Japan and Taiwan use different characters in many cases. In the 1950s, both the PRC and Japan simplified their characters to a great degree. What does this mean for our indexing? It means that there would be many cases where the two would not necessarily be in the same list. For example, the character 発 in 車輪の再発明 is 发 in Simplified Chinese and 發 in Traditional Chinese. Therefore the entries for 發現 and 发现 would not appear in Category:発, but 発見 would. This phenomenon should help limit the number of words for any given character, especially if that character is not common to all three character sets.
  3. I am thinking that for a word like 車輪の再発明, one would add the following categories:
  • Category:車
  • Category:輪
  • Category:再
  • Category:発
  • Category:明
    • Note that I didn't add a Category:の. I think if we did that, we would run into huge numbers that would not be particularly meaningful.

If the above makes for too many entries, we have two options: add a TOC to the ones with a large number of entries, or only do the head character. So for this one, we would do Category:車. A-cai 11:45, 10 February 2006 (UTC)
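
The two options above (one category per character, or the head character only) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an existing Wiktionary tool: `categories_for` and `is_cjk_ideograph` are hypothetical helpers, and only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is checked, so characters in the extension blocks would be skipped.

```python
def is_cjk_ideograph(ch):
    """True for characters in the basic CJK Unified Ideographs block only."""
    return "\u4e00" <= ch <= "\u9fff"


def categories_for(word, head_only=False):
    """Build the character categories for a word, skipping kana such as の."""
    chars = [ch for ch in word if is_cjk_ideograph(ch)]
    if head_only:
        chars = chars[:1]  # keep just the first ideograph of the word
    unique = list(dict.fromkeys(chars))  # drop repeats, keep original order
    return ["Category:" + ch for ch in unique]


print(categories_for("車輪の再発明"))
print(categories_for("車輪の再発明", head_only=True))
```

With `head_only=False` the example word yields the five categories listed above and no Category:の; with `head_only=True` it yields only Category:車.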

Question:

First of all, sorry for my simple response, but I'd like to hear your simple explanation of what Category:駅 is for. I'm guessing that its purpose duplicates the "compounds" section that many kanji/hanzi entries already have. --Tohru 13:16, 10 February 2006 (UTC)

Answer: Tohru, Category:下 is a better example. So is Category:十. You can now navigate to Category:十 in the following ways:

  1. Category:CJKV radical index->Category:十 radical->Category:十
  2. Category:hiragana index->Category:と (to)->Category:十
  3. Category:hiragana index->Category:と (to)->Category:とお (tō)->Category:十
  4. Category:hiragana index->Category:と (to)->Category:とたび (totabi)->Category:十
  5. Category:hiragana index->Category:じ (ji)->Category:十
  6. Category:hiragana index->Category:じ (ji)->Category:じゅう (jū)->Category:十
  7. Category:Chinese Pinyin index->Category:sh ㄕ->Category:shi ㄕ->Category:十
  8. Category:Min Nan POJ index->Category:nan:c->Category:nan:chap->Category:十

Hopefully this is a better example. Unfortunately, the reasoning for my design won't become more obvious until we have a lot more words in wiktionary.

Please consider my proposal. We don't have to decide yea or nay right away. Perhaps everyone would like to see how things progress before making a decision. Now that I have everything set up just about the way I want it, I can start to put in entries at a much more rapid pace than before. Over time, I think my design will work well for everybody.

Warning!!! This next bullet is intended for the programmers who are (hopefully) reading this.

  • Now for the techies out there: I am treating each CJKV character like a C struct that can then be linked to other categories (if one were designing an SQL database, one might achieve the same thing by creating a table for each CJKV character, then linking the CJKV tables back to higher level tables such as one for a Chinese Pinyin index). So far, I have been doing things by hand so that we can come up with a design model that everybody is happy with. Once that happens, it should not be too difficult to write a script that would create a category for each CJKV character. Somebody already wrote a program that did the individual pages with their accompanying phonetic spellings, radical info etc. Now that we have a page for each CJKV character, I am proposing to create a Category for each CJKV character as well. As can be seen from the above example, this will make for an efficient cross-language indexing scheme. Once that is done, all that would be required to add a new entry to all relevant indexes would be to include "Category:CJKV character" in each entry (ex. include Category:大 in the entry for 大人). Is there anyone at wiktionary with both the know-how and the authorization to do this?

A-cai 14:44, 10 February 2006 (UTC)
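
The "character as a linked struct" idea can be made concrete by transcribing the eight routes to Category:十 listed above into a small graph and enumerating the paths. The Python below is a sketch for illustration only: `CHILDREN`, `ROOTS` and `paths_to` are hypothetical names, the table is typed in by hand from the list above, and the graph is assumed to contain no cycles.

```python
# Hypothetical parent -> children table, transcribed from the eight routes above.
CHILDREN = {
    "Category:CJKV radical index": ["Category:十 radical"],
    "Category:十 radical": ["Category:十"],
    "Category:hiragana index": ["Category:と (to)", "Category:じ (ji)"],
    "Category:と (to)": ["Category:十", "Category:とお (tō)", "Category:とたび (totabi)"],
    "Category:とお (tō)": ["Category:十"],
    "Category:とたび (totabi)": ["Category:十"],
    "Category:じ (ji)": ["Category:十", "Category:じゅう (jū)"],
    "Category:じゅう (jū)": ["Category:十"],
    "Category:Chinese Pinyin index": ["Category:sh ㄕ"],
    "Category:sh ㄕ": ["Category:shi ㄕ"],
    "Category:shi ㄕ": ["Category:十"],
    "Category:Min Nan POJ index": ["Category:nan:c"],
    "Category:nan:c": ["Category:nan:chap"],
    "Category:nan:chap": ["Category:十"],
}

ROOTS = [
    "Category:CJKV radical index",
    "Category:hiragana index",
    "Category:Chinese Pinyin index",
    "Category:Min Nan POJ index",
]


def paths_to(target, root, children=CHILDREN):
    """Return every downward path from root to target (graph assumed acyclic)."""
    if root == target:
        return [[root]]
    found = []
    for child in children.get(root, []):
        for tail in paths_to(target, child, children):
            found.append([root] + tail)
    return found


all_paths = [path for root in ROOTS for path in paths_to("Category:十", root)]
for path in all_paths:
    print("->".join(path))
```

Running it prints one line per route, which is a quick way to check that a hand-built category layout really exposes a character through every index it is supposed to.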


Comment: This is an observation in response to Tohru's comment. Category:下 actually doesn't really duplicate the compounds section since the compounds section segregates each language. Category:下 wouldn't be useful for someone trying to find all Japanese compounds with 下 since the category has lots of Chinese entries and there is no way to tell unless you click on each one. Millie 05:43, 11 February 2006 (UTC)

Response: Exactly right, Millie. Each model has its virtues and drawbacks. For example, let's say you were a linguistics major who wanted to track CJKV word usage across languages; my Category:下 would be able to help you come up with the stats. However, it would not be particularly useful for someone trying to improve their intermediate language skills for a given language. For that, we would have other categories such as the ones already created in the Japanese language section like Category:Grade 1 kanji, Category:Grade 2 kanji etc. In fact, I like those categories so much that I am planning on starting something similar for Chinese. Category:下 is intended for a professional translator who is looking for the definition of an obscure term. Here is a basic example to illustrate my point:

  • 制度 - Chinese: zhì dù, Japanese: せいど (seido). I chose this entry because it only had the Chinese definition and not the Japanese definition when I found it about a half hour ago in the Chinese nouns section (you guys have probably already put the Japanese in by now;). Let's say I were translating a sentence from a Japanese document such as:
  1. 経営者皆様自身退職金制度です.

If I don't know what 制度 means, and I look it up by the head character (Category:制), I would still be able to find the definition.

The above is a rather simplistic example, but if it were a more obscure term (the ones that always vex professional translators), this feature could be a huge help. See this article for a more detailed explanation about cross-language indexing.

I'm going to anticipate a few questions you might have:

  1. Q: How would I know before clicking on the entry what language it is?
    A: Because I found the word being used in a Japanese sentence.
  2. Q: How do I know the meaning in Japanese is the same as in Chinese?
    A: This is a more difficult question. Sometimes the exact same characters mean two different things in Japanese and Chinese. Ex. 手紙 means letter (as in to mail a letter) in Japanese, but it means toilet paper in Chinese! One technique to overcome this phenomenon is to look at the contextual information in the sentence. If the English definition makes sense in the context of the sentence, then you know you're probably on the right track.

A-cai 10:28, 11 February 2006 (UTC)

Can you improve this one please. SemperBlotto 11:53, 12 February 2006 (UTC)

  • I added some info for you. You might want to create an entry called grass script and redirect tsao shu to it.

A-cai 12:57, 12 February 2006 (UTC)

Understanding the issues with regard to Asian languages

Hoi,
As you may know, I am working on something called "WiktionaryZ". The intention is to have all words in all languages. Given that it intends to include all information that is included in the Wiktionaries, you may appreciate that it is quite an ambitious project. The current ALPHA code can be found here: http://wiktionaryz.org.

The timeline is rather long :( but we do work on things like design and requirements for particular languages all the time. What I understand from some of your comments is that you want several indexes that are particular to Asian languages. That is something that IS considered. The only thing is how we are going to implement these. If you want to discuss these things, I can be found often on IRC or on skype gtalk ... GerardM 10:56, 16 February 2006 (UTC)

  • Thanks for the heads-up. I think there are a lot of on-line resources that can help us with the wiktionary endeavor. I have lots of ideas for both indexing and inputting words. Here are a couple of web-sites that could help us right off the bat:
  1. CEDICT in UTF-8 with both traditional and simplified Chinese
    A script could be written that could format each word in the file (approximately 30,000 words) to wiktionary standards and voila! That would free me up to concentrate on words that are not documented in Chinese-English dictionaries (but are in common usage, you would be surprised at how many words are in this category).
  2. http://cojak.ajax.org
    This is an awesome website. As you will see from clicking on the link, the data associated with each CJKV Unicode character is listed (Radical/Stroke count along with Mandarin, Cantonese, Japanese, Korean and Vietnamese readings). The site uses a PHP script called index.php that takes a Unicode value as an argument. A program could be written to gather the information for each of these languages. I believe User:Nanshu must have done something similar to this so that he could generate the pages for each character.
  3. http://lomaji.com/poj/tools/su-tian/index-en.html
    This website can provide the data for Min Nan. It looks like a script could be written to query this one as well. For this website, you would need to use UTF-8 encoding (ex. %E4%B8%83) in the url.
  • I agree that the kind of cross language indexing that I'm advocating will take time to design correctly. However, I don't think it would be a very difficult task for an enterprising programmer with some knowledge of Asian character encodings.

A-cai 12:06, 16 February 2006 (UTC)
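
Two of the steps mentioned above are easy to try out in Python: splitting a CEDICT-style line into fields, and producing the UTF-8 percent-encoding used in a query URL. This is a hedged sketch, not the script proposed in the post. It assumes the common CEDICT line layout `Traditional Simplified [pin1 yin1] /gloss/gloss/`, the sample line is made up for illustration, and `parse_cedict_line` and `percent_encode` are hypothetical helper names. No assumption is made about the query parameters of either website.

```python
import re
from urllib.parse import quote

# Assumed CEDICT line layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_cedict_line(line):
    """Split one CEDICT-style line into fields; None for comments or bad lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = CEDICT_LINE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, glosses = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        "glosses": glosses.split("/"),
    }


def percent_encode(text):
    """Percent-encode the UTF-8 bytes of text, as required inside a URL."""
    return quote(text, safe="")


# Illustrative line only; it is not copied from the CEDICT file.
sample = "大學 大学 [da4 xue2] /university/college/"
print(parse_cedict_line(sample))
print(percent_encode("七"))  # %E4%B8%83, the example encoding quoted above
```

Turning the parsed fields into a properly laid out entry is deliberately left out, since that depends on the entry layout conventions discussed elsewhere on this page.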

Categories

Hi there. You're not going to put all these Asian language categories up for deletion like the last ones, are you? --Dangherous 10:07, 23 February 2006 (UTC)

  • It took a bit of doing, but I think I am happy with the current layout. So the answer to your question is no, I do not intend to delete these categories.

A-cai 10:11, 23 February 2006 (UTC)

Gaoliang

Original Post

Hi, got your long message about gaoliang. I removed the definition "kaoliang" because 1) "sorghum wine/liquor" is called "gaoliang jiu" in Chinese; 2) kaoliang isn't the right pinyin; 3) it is known as "sorghum wine/liquor" in English and "gaoliang jiu" in Chinese. I'm afraid I don't understand your last comment about splitting traditional and simplified forms; what you say doesn't make sense to me at all. Can you rephrase in a way that I can more easily understand what you are up to? Badagnani 09:49, 6 March 2006 (UTC)

My response

    • Badagnani:
  1. You are correct that kaoliang can also be referred to as 高梁酒 (Pinyin: gāoliáng jiǔ). However, it is far more common to omit the 酒 character and refer to it simply as 高梁.
  2. It is spelled kaoliang because it is primarily made and sold in Taiwan. In Taiwan, the spelling kaoliang is used. This spelling is not Pinyin, but rather is based on the Wade-Giles system of romanization (kung fu is also Wade-Giles spelling; in Pinyin, it is gōngfu). Here are a couple of links that you can look at so that you will know that I'm not just making this up:
    Please click on this link for an example
    or this link
  3. Almost nobody I know, that drinks kaoliang, refers to it as "sorghum liquor" (even though that is the literal meaning). That includes non-Chinese speakers.
  4. If you look at Category:Chinese idioms (Simplified Chinese), and compare it to Category:Chinese idioms (Traditional Chinese), I think it will be easier to understand what I am after. Note that the number of phrases in both categories is the same (this is how I can tell whether an entry is missing from one of the two). The sorting order is different. In the PRC, Pinyin and Simplified characters are used. However, in Taiwan and Hong Kong, Traditional characters are used. Cantonese is the primary language of Hong Kong, but Mandarin and Min Nan are the most common dialects in Taiwan. This suggests that the Traditional list should be sorted based on radical-stroke order, and not phonetic order.

In conclusion, I understand that it is difficult to trust someone's ability when all of the contributors to wiktionary are anonymous. If you're not convinced that I actually speak Chinese fluently, the best thing that I can tell you is to click on Special:Contributions/A-cai, and look at some of the entries that I have been making. I can assure you that you will not find many of these entries (particularly the longer phrases) in other Chinese-English dictionaries because I have checked. That's the whole reason that I'm putting them here; I want to share them with the English speaking world (free of charge)! Another thing you could do is to read my translation of Preface to the Poems Composed at the Orchid Pavilion. It's about the best resume that I can give you under the circumstances. If you still do not think that I'm qualified to talk about the Chinese language, then there's not much else I can do. A-cai 12:32, 6 March 2006 (UTC)

Second Post

  • This all makes good sense and I hadn't thought that the "kaoliang" spelling might be Wade-Giles. But then again a much larger percentage of the Chinese "sorghum wine"-drinking public would use pinyin, if any romanization. As I'm sure you know, the beverage came to the knowledge of non-Chinese speakers through Zhang Yimou's film "Hong Gaoliang" (Red Sorghum), and I think the subtitles translate it just as "wine" or "sorghum wine." I still think that calling it simply "kaoliang" isn't correct, but really slang, in the way that Irish jokingly refer to their whiskey as "the barley." By the way, I'm an American and only know Chinese in a basic way so I'll defer to you on the splitting of trad./simplified. I do think that there should be a Cantonese pronunciation, and that leaving the "Cantonese:" in the article (rather than removing it because the pronunciation isn't there yet) is a good way to alert Cantonese speakers that they need to add this pronunciation--like a redlink. If you look at my user page you'll see that I've made a number of articles in both trad. and simplified versions and find this cumbersome; the Chinese wiktionary idea is a good one, if it were easy for people to realize that they could toggle between simplified and traditional, having only one article for each word. Then again, some usages and spellings are different between Cantonese and Mandarin. Take a look at some of my articles that have both simplified and traditional versions and see if I'm doing it according to the way you have in mind. Nice talking with you. Badagnani 00:36, 7 March 2006 (UTC)
  • I looked at the photos and see that the spelling used on the liquor boxes is "Kaoliang"--but all the bottles call it "Kaoliang liquor," not just "Kaoliang." And the Chinese on the liquor boxes have the character 酒. About the Wiktionary article, I'm not sure you can say that "kaoliang" is an English word; it's probably a Chinese word romanized into English. Badagnani 00:40, 7 March 2006 (UTC)
  • The phrases you're adding are interesting, but I wonder whether a dictionary (this one, Wiktionary) should have long phrases, proverbs, etc. I even hesitate myself to add anything beyond something like "tanghulu," which is already two words put together. By the way, which Min Nan language do you speak? Chaozhou? Badagnani 01:00, 7 March 2006 (UTC)

My response to second post

    • This could be a regional thing. I believe Zhang Yimou's film takes place in Shandong in the 1920's and 1930's. It's possible that the term "gaoliang jiu" may have been more popular in that area; I have not spent a lot of time in the Shandong region. I can tell you that the term "gaoliang" is used in spoken Chinese in the Fujian/Taiwan area in reference to the liquor (this threw me too the first time I heard it). However, in written Chinese, "gaoliang jiu" seems more common.
    • With respect to "kaoliang" being an English word versus a Chinese word, this is always a tough one. It is so subjective. I am probably not the best person to ask because I approach the world from a multi-lingual point of view. Lines between languages are often more easily blurred. To tell you the truth, the best people to ask would probably be business people who are native speakers of English that have frequent interaction with Chinese business people (particularly in the Fujian/Taiwan area). In that region, "kaoliang" is almost like a ritual at business dinners. It is toasted in tiny shot glasses (乾杯 gānbēi - bottoms up or 隨意 suíyì - just take a sip), and then everybody sings out of tune karaoke.
    • I have looked at some of your entries. I think our formats are fairly close. Again, the one complaint that I heard from User:Eclecticology was that he did not like the ===Pronunciation=== line. Here is what he said:
  1. "Pinyin is a standardized system for romanization, not for pronunciation. I think that it's important to keep this in mind to avoid getting completely confused by the chaotic maze of transliterations that have developed over the years for Chinese."
    • For this reason, I have taken ===Pronunciation=== out of my entries.
    • With respect to the long phrases (idioms and proverbs), not only should these phrases be added to Wiktionary, it is a primary focus of Wiktionary and it is what distinguishes it from many other less ambitious projects. In terms of criteria for inclusion, I have tried to follow the guidelines laid out in these two articles: Wiktionary:Idioms and Wiktionary:Criteria_for_inclusion#Attestation. Attestation is particularly difficult in the case of idioms. For most people who learn a second language, idioms are their Achilles' heel. One key question that always pops up: is phrase x really all that common? For the most part, this can only be answered by people who have significant interaction with native speakers. However, how does one convey that on Wiktionary when everybody is anonymous? One solution is to provide some kind of statistical data in each entry which shows how frequently the term or phrase is used. The easiest way to do this is with Google queries. This is why you see the Google hits in most of my entries. These are meant to provide some frame of reference, but should not be given too much credence. You should still run it by as many native speakers as you can.
    • Last but not least, Min Nan; sadly, I'm currently the only one in the world who is regularly adding Min Nan terms to Wiktionary. Part of the problem stems from the fact that Wiktionary is a written medium, and nobody really agrees on how Min Nan should be written. Some write Min Nan using Chinese characters, some use POJ, and still others use a mixture of both. Nevertheless, I press on in hopes that reinforcements will one day arrive. As far as my Min Nan accent, I try to keep it generic. Like many speakers of Min Nan, my speech is a mixture (in my case, Quanzhou 泉州 and Zhangzhou 漳州 accents).

A-cai

      • Thanks for the great info. I actually bought a bottle of kaoliang in Beijing in October (along with 4 other bottles of maotai, zhuyeqing, chajiu, etc.). I find that, with the exception of the zhuyeqing, most of them taste fairly horrible in a petroleum distillate sort of way. I'm only familiar with the Chaozhou form of Min Nan through my study of Chaozhou music. I play the sanxian and yehu and study with a teacher who is Cantonese, but has spent many years investigating Chaozhou music. If indeed it's the case that Wiktionary includes idioms, I think it's fantastic that you're adding these as a service to world culture and knowledge. Regarding pronunciation, I'm quite happy with the use of pinyin and Cantonese in the articles, although pronunciation in other important dialects (Fujian being one) could be added as well. Badagnani 22:21, 7 March 2006 (UTC)

New idiom

Here's a new idiom that you might want to make an entry for. 光明正大. A friend just used it. I think it means "frank and upright" or something like that. Badagnani 18:51, 11 March 2006 (UTC)

I have made an entry for 光明正大. I am trying to categorize words as Category:Beginning Chinese, Category:Intermediate Chinese, and Category:Advanced Chinese. The first two are a little easier because I have a collection of beginning and intermediate Chinese textbooks that I can consult. If I find the term in one of those, I will also try to add an example sentence (with Pinyin) from the textbook (for example: 沙發, 沙发). Advanced Chinese has been more difficult to pin down. At this point, I am loosely defining "Advanced Chinese" as between third to fourth year college Chinese. The problem that I encounter here is that there are a lot of words that get a glossary entry in advanced Chinese textbooks that do not necessarily need to be committed to memory by a third or fourth year Chinese language student. I don't just want to put advanced Chinese on everything that is not beginning or intermediate. In order to make progress at that level, it is critical that the student learn advanced words and phrases that are not so obscure that even native speakers are scratching their heads. I was thinking that I should create a new category called something like "words and phrases that only an educated native speaker or extremely advanced non-native speaker would be likely to know." Of course, I would have to shorten the category name to something more manageable.

A-cai 01:18, 12 March 2006 (UTC)

Invitation to contribute

Hi,

You might or might not already be aware that there is now a new system in place for marking translations that need to be checked (those that are suspected of being incorrect or those where it is not clear which sense(s) of a word the translations apply to). (See here for the Beer parlour discussion on this topic.)

Translations to be checked are now categorised by language. For example, Category:Translations_to_be_checked_(French) contains a list of all words where French translations need to be checked. This is designed to make the checking of these translations easier to maintain and work with.

I'm contacting everyone who has expressed an interest in working on translations or has indicated in Wiktionary:Babel that they have a good knowledge of a particular foreign language or languages.

Would you be interested in helping out with the translations to be checked for Chinese? If so, please read the page on how to check translations.

If you want to reply to this message, please do so on my talk page. Thanks for any help you can provide.

Paul G 08:59, 12 March 2006 (UTC)[reply]

Here's a word I've just learned, which needs an entry. I think it means "if." I was confused at first about its meaning because the second character means "fruit." Badagnani 14:52, 31 March 2006 (UTC)[reply]

  • Intermediate Chinese: "if." The original meaning of the second character is fruit. From this root meaning, comes the extended meaning of "result." If we were to be overly literal: "in the case that the result is ..." After several thousand years of conventional use, the meaning simply becomes "if."

A-cai 09:08, 1 April 2006 (UTC)[reply]

Here's another one. Badagnani 21:04, 31 March 2006 (UTC)[reply]

  • Intermediate Chinese: owing to; due to; as a result of.

A-cai 09:08, 1 April 2006 (UTC)[reply]

Another one -- I think it can mean "stomach," and you can make "stomachache" out of it: 兜兜转转. But I'm not sure if by itself can also mean stomach. Badagnani 21:08, 31 March 2006 (UTC)[reply]

  • This word falls outside of the 8840 core vocab words for the HSK. It can be a personal name or a piece of cloth that is wrapped around the midsection to carry either a baby or other items.

A-cai 09:08, 1 April 2006 (UTC) Thanks for all the info--I thought in the context of 兜兜转转 it meant "upset stomach." Badagnani 21:31, 1 April 2006 (UTC)[reply]

  1. 兜兜转转: to go round and round; to go all around.
    兜兜转转两天 航班飞回起点
    After flying around for two days, the airliner finally returned to its point of origin. (from this link)

A-cai 10:55, 5 April 2006 (UTC)[reply]

One more. Badagnani 21:12, 31 March 2006 (UTC)[reply]

  • Beginning Chinese: to perform; to put on a show.

A-cai 09:08, 1 April 2006 (UTC)[reply]

Here's another one. I think it means "puddle" or "pond." Badagnani 21:31, 1 April 2006 (UTC)[reply]

  • Think of this as more of a verb-object construction (: accumulate(d) : water). In some situations, it can mean puddle or pond.

A-cai 00:58, 2 April 2006 (UTC)[reply]

Thanks! Here's another idiom I can't make sense of. Do you know the meaning? Badagnani 05:51, 4 April 2006 (UTC)[reply]

  • This phenomenon is known in Chinese as or which means to take a character and write the individual components separately; it is often a subtle attempt at humor. If you put and together, you get . According to several Chinese language websites, including this one, or ( Pinyin: chāoqiáng) is often used on BBS sites as an expression of praise for a particularly well written BBS post. It literally means strong (or super strong in the case of ), but you can pick whatever the current English slang is for cool, awesome, groovy etc as a translation.

A-cai 14:44, 4 April 2006 (UTC) Fascinating! Indeed, I picked up the phrase from a blog of my Chinese friends who are Beijing undergrads. I NEVER, NEVER would have figured this out. Badagnani 06:58, 5 April 2006 (UTC) I wonder if this one deserves an article in this case, if it counts as a slang word. It would help people decipher its use from blogs. Badagnani 07:55, 5 April 2006 (UTC)[reply]

  • If it were English, I might call it a variant spelling. We have this kind of stuff in English as well. For example, it took me forever to figure out IMHO (in my humble opinion). If you do create an entry, make sure that you point out that it only works in Simplified Chinese ( would be in Traditional Chinese, plus is in Traditional).

A-cai 10:22, 5 April 2006 (UTC)[reply]

    • BTW, a famous example of this phenomenon involves the Chinese author Lǎo Shě (). His surname at birth was . When he began writing, he took as his literary name. This then became ("old Shě").

A-cai 10:33, 5 April 2006 (UTC)[reply]

Very interesting. I think this must be the person for whom the Lao She Chaguan in Beijing is named. I saw a performance of various Beijing local narratives and other performances there recently. Badagnani 18:47, 5 April 2006 (UTC)[reply]

Can you check this one I just made? Thank you, Badagnani 18:20, 8 April 2006 (UTC)[reply]

  • I added example sentences from the HSK dictionary along with my own English translations of the example sentences. I also added Min Nan info. Let me know what you think.

A-cai 01:40, 9 April 2006 (UTC)[reply]

Looks great! The only use I'm wondering about is if you can use the phrase to say you're going to oil something, such as for example a bicycle, a door hinge, or a piece of furniture... Badagnani 04:35, 9 April 2006 (UTC)[reply]

  • I was just blindly following the example from the HSK dictionary. You are correct, to oil and to lubricate are not the same as to refuel. Therefore, I split the first definition into two and added an example sentence for to lubricate from The Pinyin Chinese-English Dictionary.

A-cai 05:59, 9 April 2006 (UTC)[reply]

These expressions are very rich in meaning. It's very educational and, as usual, interesting to see these pages develop, explaining these "unwritten" meanings of these phrases. What a great service you are doing for Chinese learners everywhere. Badagnani 06:16, 9 April 2006 (UTC)[reply]

I'm sending phrases for you to look at as I come across them (and don't find them already in Wiktionary)... This one doesn't make literal sense to me but a friend has just used it in an email and I guess it means "regret." Badagnani 04:35, 9 April 2006 (UTC)[reply]

Chinese phrases wiki

Looks like this website is similar to your effort--did you know about it? http://lindy.dre.vanderbilt.edu/wikitnt/ Badagnani 06:31, 9 April 2006 (UTC)[reply]

  • No, I did not. Thanks for providing the link. The stated goals of the wikiTNT appear to largely coincide with what I'm doing on en.Wiktionary, namely, to create a resource that acts as a bridge between idiomatic Chinese and English. Of course, the focus of wikiTNT is slightly different from en.Wiktionary. wikiTNT is focused on accommodating native Chinese speakers who wish to improve their English. In contrast, en.Wiktionary is attempting to explain the Chinese language in an English context. I wonder if anyone is trying to incorporate wikiTNT into zh.Wiktionary. That seems like the best home for the information.

A-cai 07:11, 9 April 2006 (UTC)[reply]

Glad that website could be of use. I agree that this is the best place for the information. Thanks also for the yihan entry, which is quite good. I have a new question: I need the verb for "to play a musical instrument." I only found , but it seems to refer only to playing a string instrument. Is there a single verb that could refer to playing the flute, pipa, piano, guzheng, bianzhong, sheng, dagu, etc.? Badagnani 18:45, 9 April 2006 (UTC)[reply]

I just made an entry for 演奏; can you check it? Badagnani 22:30, 10 April 2006 (UTC)[reply]

A-cai 10:57, 10 April 2006 (UTC)[reply]

I've just made an entry for 表演; can you check this one too? Badagnani 01:57, 11 April 2006 (UTC)[reply]
  • I should have pointed out that 表演 can also mean to perform (in a general sense). If you're looking for a versatile word that can be used in a lot of situations, I would go with 表演.

A-cai 12:09, 11 April 2006 (UTC)[reply]

Here's a new one. I think it means "jog" (as in run slowly). Badagnani 07:11, 10 April 2006 (UTC)[reply]

  • it does indeed.

A-cai 10:58, 10 April 2006 (UTC)[reply]

Thanks--can you check my new entry (especially the first syllable, which I can't determine the tone for)? Badagnani 22:15, 10 April 2006 (UTC)[reply]

  • Looks good, pǎobù is correct.

A-cai 12:14, 11 April 2006 (UTC)[reply]

One more idiom I've just run across. Badagnani 09:52, 10 April 2006 (UTC)[reply]

  • yìshù: art.

A-cai 11:07, 10 April 2006 (UTC)[reply]

Thanks again--can you check this entry (especially the etymology; I'm not sure I have the definition of the second character correct in this context). Badagnani 22:21, 10 April 2006 (UTC)[reply]

A-cai 12:37, 11 April 2006 (UTC)[reply]

Thanks for shu info. Can you check this one? 自问自答 Badagnani 08:59, 12 April 2006 (UTC)[reply]

  • done.

A-cai 10:32, 12 April 2006 (UTC)[reply]

Thanks! As far as the definition you've substituted, I think there's more to it than answering one's own question. I'm under the impression, from the context and from the various sites I've looked at, that this might be a Maoist-type term relating to examining oneself, almost in a ritualistic way. So the sitting down and thinking about one's thoughts and behaviors is just as important as the coming up with the answer. Am I correct in this? I'm not sure whether the meaning would be the same before 1949 (not sure whether the term was even used), or in Taiwan, Singapore, or elsewhere in the non-PRC world. Badagnani 18:13, 12 April 2006 (UTC)[reply]

  • I've looked at a few more websites. The closest I have found to the ideological connotations you describe is from this page. Here is the relevant passage:

主管人员员工成长定期观察指导因此需要利用下列自问自答方式加以检讨
·工作表现进步?
·没有自我改善行动?
·改善工作方法机会敏感?
·同事相处情形?
Managers should periodically observe and guide the progress of employees. As such, managers must ask themselves the following questions as they review an employee's performance:

  • Has this person's performance improved?
  • Has he taken action to better himself?
  • Is he sensitive to methods of betterment and opportunities for betterment?
  • Does he work well with his colleagues?

Perhaps you can post a few of the links that you were looking at. I will take a look at them and give you my opinion. A-cai 11:45, 13 April 2006 (UTC)[reply]

Here's another one I've just made; can you check? Badagnani 21:07, 12 April 2006 (UTC)[reply]

  • I added a little more in the etymology section.

A-cai 12:13, 13 April 2006 (UTC)[reply]

Great--that's very informative. Is there any chance you can figure out the derivation of this term's literal meaning "intersecting toes"? My Vietnamese music teacher says that many Vietnamese believe the name comes from the fact that some Vietnamese in ancient times had big toes that pointed outwards, towards one another. But this seems like folk etymology to me, and he said he's never seen a Vietnamese with toes like that. Badagnani 19:46, 13 April 2006 (UTC)[reply]

交趾 says it is in the Tongkin area, which is in the north of Vietnam; it has nothing about Yueshang. Keith Taylor, op.cit., p.26, says the term (in Vietnamese Giao-chi) means "intertwined feet", referring to a group sleeping custom of some peoples of the region, but not the Vietnamese.

A reference to the Annamites as the Giao-chi (i.e. the "big-toed"—the wide separation of big toe from the others is still a distinctive characteristic of the Annamites), found in the Chinese annals of 2357 B.C.

  • I would like to find a better citation if I can (preferably on-line). The wikipedia:Annam article has one sentence about Giao Chi, but it is not detailed enough for our purposes.

A-cai 00:56, 14 April 2006 (UTC)[reply]

Interesting... Nobody seems sure about the original meaning. I was thinking originally that geographically the northern part of Vietnam might look something like a toe (as the "boot" of Italy resembles a boot), but that doesn't explain the first character. Badagnani 05:20, 14 April 2006 (UTC)[reply]

  • The explanation given on this website seems very plausible to me:

为什么叫做交阯古书解释有的地方睡觉互相交叉有的他们足趾并立足趾相交汉武帝朔方交址子孙意思猜想大概当地人民称呼地方名字,“交趾恐怕译音古书解释或许牵强附会
Why is it called Jiaozhi? Ancient texts contain several different explanations: some state that the people from that region slept with their heads away from the center of the room so that their feet would mingle together; others state that their big toes were separated from the other toes to such a degree that when standing with their feet together, their big toes would overlap. There is also one account that says that should actually be . According to this explanation, the Emperor Wu of Han established Shuofang in the north, and then Jiaozhi in the south in order to signify a location () from which good fortune could be passed down () to one's descendents. My guess is that "Jiaozhi" was nothing more than a transliteration of the name given to this area by the local people who lived there at the time. I'm afraid that the numerous explanations from ancient texts are rather farfetched.

A-cai 08:43, 14 April 2006 (UTC)[reply]

Thanks for your contribution for the term. I now have found the formal usage of the term: 三K行業. Maybe I should move the page to the more formal one?

  • The terms 3K industries and 三K行業 are used as a category in the Monthly bulletin of labor statistics, Taiwan Area, Republic of China (中華民國台灣地區勞動統計月報) by the Council of Labor Affairs, Executive Yuan.
    • 表 11-1 臺閩地區外籍勞工在臺人數按開放項目分 (on page 172), PDF format, JPG format
    • Table 11-1 Alien Workers in Taiwan-Fukien Area by Various Type (English version of the same table), PDF format.

AirBa 04:05, 16 April 2006 (UTC)[reply]

  • Actually 3K產業 has more Google hits than 三K行業 or 3K行業. So I think it is fine to leave it as is, but include 三K行業 and 3K行業 as alternate forms as you have already done. I have seen some attempts to explain 3K by using Chinese terms (高溫、高噪音及高污染)[1]. I debated putting this into the etymology section. It seems a little unreliable. Do you have an opinion about this? I'm also thinking of labeling this entry as Taiwanese Mandarin since there are almost no Google hits from the PRC[2][3][4].

A-cai 05:24, 16 April 2006 (UTC)[reply]

I see. Then we keep it in 3K產業. And I don't know much about 高溫、高噪音及高污染. And I think you are right to put it in Taiwanese Mandarin . AirBa 06:52, 16 April 2006 (UTC)[reply]

clear is unclear

Could I trouble you to review the Chinese in clear and tidy it up and remove anything inappropriate? I especially wonder what the link under "References" has to do with anything, but I can't really be sure. Thanks. --Dvortygirl 07:58, 17 April 2006 (UTC)[reply]

  • Dr. Eye is the Chinese dictionary software that I consulted to verify that my translations were accurate. You may remove it and the reference section if you feel that it is unnecessary. As for the translation section, I am happy with the layout. Of course, if you have specific suggestions, I will be happy to entertain them.

A-cai 10:25, 17 April 2006 (UTC)[reply]

Looks like this basic word needs an entry. Badagnani 07:17, 20 April 2006 (UTC)[reply]

And this one. Badagnani 07:19, 20 April 2006 (UTC)[reply]

Created an entry but it needs some help. Badagnani 06:05, 23 April 2006 (UTC)[reply]

  • Finished this one. Corrected the tones, added some example sentences.

A-cai 23:35, 29 April 2006 (UTC)[reply]

I found one more. Badagnani 07:29, 20 April 2006 (UTC)[reply]

I made an entry but it needs some help. Badagnani 10:38, 25 April 2006 (UTC)[reply]

  • Done with this one.

A-cai 08:11, 30 April 2006 (UTC)[reply]

Can you check this one I just made? Badagnani 05:53, 23 April 2006 (UTC)[reply]

  • done.

A-cai 09:23, 30 April 2006 (UTC)[reply]

Great, thanks! I'm wondering, though, can this also mean "interested" or "have an interest" as well as the definition that's given (adjective: "interesting")? Badagnani 21:29, 30 April 2006 (UTC)[reply]
I don't believe I've ever seen it used that way. Here are two example sentences to help out:
word | example | Pinyin | English
有趣 | 事情有趣 | zhè jiàn shìqing hěn yǒuqù | this matter is very interesting.
... 感兴趣 | 游泳感兴趣 | duì yóuyǒng gǎn xìngqù | interested in swimming.

A-cai 09:29, 2 May 2006 (UTC)[reply]

This is a hard one. My friend wrote this to me and it translated as "happy," but the characters don't translate separately literally as this. Badagnani 10:38, 25 April 2006 (UTC)[reply]

  • Your friend is correct. 开心 = happy. = literally: open (extending meaning: happy) = heart; mind.

A-cai 08:16, 30 April 2006 (UTC)[reply]

"Hung-lu wine" question

Hi, we need Chinese language help at Wikipedia:Chinese wine. There's a wine which isn't yet discussed that is sold as "hung-lu" wine. It is reddish in color, with a sharp smell and a chemical, diesel-like taste, and is sold by the Oriental Mascot brand (which also makes mijiu and formerly also made Shaoxing jiu). The largest photo of this wine is here, but the characters aren't easily readable. I think "hung-lu" isn't Hanyu pinyin. Can someone provide information about this wine, the characters, etc.? Thank you! http://www.amazon.com/gp/product/images/B0000DJZ0F/ref=dp_primary-product-display_0/102-4042702-9901704?%5Fencoding=UTF8&n=3370831&s=gourmet-food Badagnani 05:57, 26 April 2006 (UTC)[reply]

  • I can't make out the characters on any of the photos for Oriental Mascot Red Cooking wine. I also have been unable to locate the Oriental Mascot website (if there is one).

A-cai 09:17, 30 April 2006 (UTC)[reply]

Chinese names of days and months

Hello. Would you please help out by creating entries for the Chinese months, or if they exist could you categorize them? I have been organizing the Time related categories and notice that Category:zh:Days of the week and Category:zh:Months have very few entries. Thanks very much for any help you can provide. --EncycloPetey 12:37, 28 April 2006 (UTC)[reply]

... for your improvements to 水涨船高 and 先礼后兵. Cheers! bd2412 T 15:42, 30 April 2006 (UTC)[reply]

Hello, thanks very much for the explanation above. I've just come across this phrase in a Chinese blog and can't figure out the idiom, though I've looked up each of the two characters individually. Can you help? I think this is the kind of non-literal phrase that's good to have Wiktionary entries for. Badagnani 23:05, 4 May 2006 (UTC)[reply]

  • I was not familiar with this term, so I googled it. I found several Chinese language websites that claim 算球 is from the Sichuan (四川) Dialect[5][6][7]. I don't speak Sichuan, so I cannot confirm that. Based on the way it is used in many of the websites that I looked at, and the explanations of its meaning and usage on several others, I have been able to determine that 算球 is most often used as a synonym for 算了 (English: never mind; forget about it)[8].
  • This is a bit advanced, but hopefully it will make sense. Are you familiar with the term 算盘 (abacus)? The abacus has beads () that are used to do basic arithmetic. means "to calculate" and means "board." The root meaning of 算球, therefore is "to calculate" "(abacus) beads" ( in Standard Mandarin) By extended meaning, 算了 and 算球 both have an implication that "the calculating is over," in other words, there's no point in going on (so forget it). appears to add emphasis (i.e. 算球 or 算球了 instead of 算了). On some of the websites, 算球 had a more literal meaning of "to calculate" or "abacus beads."
  • I will attempt to make an entry for this one if you like, but it might be good if we can find a speaker of Sichuan Dialect to weigh in on the subject.

A-cai 10:38, 6 May 2006 (UTC)[reply]

copyvio

Hi. I see you added a large chunk of text from etymonline.com in silk. Please don't do that - it is protected by copyright laws. Did you do that anywhere else as well? —Vildricianus 23:05, 26 May 2006 (UTC)

  • Isn't that an example of fair use law? It would be the same as if I had quoted a paragraph from a book, wouldn't it? Since the single paragraph about the etymology of the word silk does not constitute a substantial portion of the website etymonline.com, it should be fair game. Of course, I'm not a lawyer! I think we need a tag like {{copyvio}} similar to {{rfv}} where such things can be arbitrated by legal experts.

A-cai 21:51, 28 May 2006 (UTC)[reply]

I'm not an expert either, but as far as I know, fair use is restricted to instances where it's impossible to obtain the information via other means. That's certainly not the case here I think. It would be okay if it were only this one paragraph, but etymonline stuff is frequently inserted here. Also, since both our aims (etymonline's and Wiktionary's) are more or less the same, namely, describing etymology of words, I don't think it's appropriate here. You might want to ask User:Andrew massyn or User:BD2412, who are legal experts, if you want to know more. —Vildricianus 22:28, 30 May 2006 (UTC)
Fair use is not an automatic thing resulting from the size of the text taken; it involves a four factor analysis looking into the purpose of the use, the potential commercial effect on the copyright owner, the substantiality of the material borrowed, and the degree of creative effort involved in the creation of the work. Here, however, I note that it would be easy enough to re-write the offending material in your own words, and provide a credit to the source publication (as a reference). bd2412 T 22:34, 30 May 2006 (UTC)[reply]

Chinese redirects

Why are you redirecting between Chinese traditional and simplified entries? We have enough fights and arguments about this kind of thing with English spellings without introducing a reason for people from mainland China, Hong Kong, Singapore, and Taiwan fighting about the same thing. I suggest you stop and instead start a topic on the Beer parlour about it. — Hippietrail 22:27, 6 June 2006 (UTC)[reply]

I would be happy to debate the merits of redirecting versus creating separate entries versus creating a php script like the one for zh:wiktionary. The problem is that I almost always end up debating with people who know next to nothing about the Chinese language. It gets rather tedious. I realize that doing redirects is not ideal. But I am one of the very few regular contributors of Chinese words. Creating separate entries was simply taking too long. It is also a nightmare to maintain. I have raised the idea of creating a php script that would allow the user to click on a tab to switch from Simplified Chinese to Traditional Chinese as is done on zh:wiktionary. So far, no one has shown interest in actually adapting or creating the necessary script.

A-cai 22:06, 7 June 2006 (UTC)[reply]

  • While I don't claim to be an expert on Chinese, I am aware of the challenges presented by CJKV languages including Chinese, Japanese, Korean, and Vietnamese. One of the discussions you pointed to mentions Connel's suggested way of fixing the color/colour problem in English - this is the problem that matches the Chinese simplified/traditional problem. The converter used by the Chinese Wikipedia and apparently the Chinese Wiktionary is not what we need any more than we need a converter between British and American English. A good Chinese or Japanese dictionary - at least what I look for when I'm buying one - contains both traditional and simplified forms of the characters, as well as any other variants. The English Wiktionary needs to do the same. Traditional should come first because it's not obsolete and is not ambiguous like simplified can be since several simplified characters map to 2 or even 3 traditional characters and of course in these cases can also have all the Pinyin readings of each. For a Chinese Wiktionary it makes sense to provide definitions and headings in the script the reader understands but even Chinese and Japanese have character dictionaries so that native readers can look up variants and traditional forms as well as exotic characters and their readings. Let me know if anything doesn't make sense or which points you disagree with. — Hippietrail 02:38, 8 June 2006 (UTC)[reply]
    • I agree that, at least to the extent that a simplified character may represent multiple traditional characters, a redirect would be inappropriate (since all of the meanings accorded to the simplified character may not flow from certain of the traditional characters). bd2412 T 03:47, 8 June 2006 (UTC)[reply]
      • If you look again at the sample links to the zh:wiktionary, you will note that not only can you tab back and forth between simplified and traditional, the entry itself ALSO lists the variant forms. It is not an "either/or" scenario. My main challenge is when I want to include one or more sample sentences. Currently, I have to write the same sentence in both simplified and traditional (which is a pain). My two choices on English Wiktionary are to either include both forms in the same entry (which looks weird), or create two separate entries (which is difficult to maintain). Also keep in mind that these issues come up in the case of Chinese FAR more often than in English. That creates a lot of unnecessary busy work on my part when making entries. The thing I keep asking myself is why aren't more Chinese speakers contributing? I think English Wiktionary is not ideally suited to a language like Chinese. I'm still plugging away, but that's only because I still believe in the concept of a dictionary that anyone can edit.
        • Having said all that, since both of you seem to not like the redirects, I will begrudgingly go back to creating separate entries for each. However, I still hope that the script idea will be taken seriously someday because I think it would solve a lot of problems.

A-cai 09:53, 8 June 2006 (UTC)[reply]
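The one-to-many concern raised earlier in this thread (a simplified character that stands for several traditional characters) can be illustrated with a small sketch. The mapping table is hand-entered for illustration only and is not taken from any Wiktionary or MediaWiki data; the function name `is_safe_redirect` is hypothetical.

```python
# Hand-entered sample data (an assumption for illustration):
# simplified character -> traditional characters it can stand for.
SIMPLIFIED_TO_TRADITIONAL = {
    "发": ["發", "髮"],  # "to emit" vs. "hair"
    "后": ["後", "后"],  # "behind/after" vs. "empress"
    "沙": ["沙"],        # same form in both scripts
}


def is_safe_redirect(simplified_char):
    """Return True only when exactly one traditional counterpart is known.

    Characters missing from the table are treated as unsafe, because
    nothing is known about them.
    """
    return len(SIMPLIFIED_TO_TRADITIONAL.get(simplified_char, [])) == 1


for ch in ["沙", "发", "后"]:
    print(ch, is_safe_redirect(ch))
```

Under this sketch, a redirect from 发 would be ambiguous, while one from 沙 would not.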

  • I do take this problem seriously and I admire the amount of work you put in. I don't think the tab approach is the right one for us though it might be for the UK/US issue since our definitions are in English. Since there is some underlying system used by the tag, have you thought about some other way of accessing that system? Perhaps an extension that takes Traditional Chinese and outputs Simplified Chinese? If you can think of some way something like that can work I would be more than happy to support you from the technical angle in the Grease pit, on Bugzilla, and in the developer's mailing list. — Hippietrail 20:18, 8 June 2006 (UTC)[reply]
    • While many Chinese speakers are familiar with both forms, most prefer either simplified over traditional or vice versa. That suggests to me some kind of setting in preferences. In preferences, I would like to be able to choose from the following options for viewing entries that contain Chinese:
      • unaltered Chinese (variant=zh)
      • PRC Simplified (variant=zh-cn)
      • Singapore Simplified (variant=zh-sg)
      • Hong Kong Traditional (variant=zh-hk)
      • Taiwan Traditional (variant=zh-tw)

The default setting would be unaltered Chinese (whatever the original post uses). Preferences already has an option to choose language for viewing, but it only changes the tabs, navigation bar and tool bar. The entry itself is unchanged. Also, I can see a scenario in which a person would prefer to set the language to English, but have the default view for Chinese entries be PRC Simplified (maybe that person is a native English speaker who took a course in Chinese which taught only simplified). In other words, instead of being triggered by the tabs, the php script would do the conversion that is specified in preferences. A-cai 21:52, 8 June 2006 (UTC)[reply]
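A minimal sketch of the preference idea above, assuming each entry already stores both script forms. The function name `pick_form` and its parameters are hypothetical; this is not the zh:wiktionary converter, and it ignores regional differences within each script (for example between zh-hk and zh-tw).

```python
# Variant codes are the ones listed in the options above.
TRADITIONAL_VARIANTS = {"zh-hk", "zh-tw"}
SIMPLIFIED_VARIANTS = {"zh-cn", "zh-sg"}


def pick_form(original, traditional, simplified, variant="zh"):
    """Choose which stored form to display for a reader's preference.

    variant="zh" (the default) leaves the text exactly as it was posted.
    """
    if variant in TRADITIONAL_VARIANTS:
        return traditional
    if variant in SIMPLIFIED_VARIANTS:
        return simplified
    return original


# A reader whose interface language is English but who prefers PRC Simplified:
print(pick_form("沙發", "沙發", "沙发", variant="zh-cn"))
```

Keeping the variant separate from the interface language is what allows the English-interface, simplified-display combination described above.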

My mistake - I misunderstood the discussion - I thought we were talking about individual characters, not whole phrases. I have no objection to redirecting traditional to simplified for phrases, so long as both forms are displayed in the article (in case anyone wishes to see them side by side for comparison). bd2412 T 23:12, 8 June 2006 (UTC)[reply]

A haphazard edit

Could I trouble you to check and tidy up whatever Chinese has been added to haphazard? Thanks. --Dvortygirl 07:12, 10 June 2006 (UTC)[reply]

    • The original contributor provided several literal translations for random and chaotic (the first two definitions). Some of the Chinese words were synonyms. I left in two that I thought best capture the meaning behind haphazard.

A-cai 07:36, 10 June 2006 (UTC)[reply]

Name changed

Just letting you know - cheers! bd2412 T 23:10, 10 June 2006 (UTC)[reply]

Deletion

Please use {{delete}} for speedy deletions. {{rfd}} is for the process and discussion. — Vildricianus 20:59, 23 June 2006 (UTC)[reply]

Categories for deletion!

You've tagged lots of categories for deletion. Too many to make me want to delete them all! Maybe you should try to get temporary admin powers so you can get rid of them all! --Dangherous 18:13, 6 July 2006 (UTC)[reply]

There's a few more categories that you've made, do you want these deleted...Category:ba ㄅㄚ, Category:bīng 冰, Category:cùn 寸, Category:した (shita), Category:ひる (hiru). These are all "dead" categories since February 2006. --Dangherous 22:30, 6 July 2006 (UTC)[reply]
Thanks for your help. I'm just trying to do some cleanup before I add more words. In addition to what I've already marked for deletion and what you mention above, everything in Category:CJKV_radical_index and Category:hiragana index can be deleted.

A-cai 23:05, 6 July 2006 (UTC)[reply]

Toneless characters

I am still confused about characters expressed without tones - I was taught that every character is expressed in one of the four tones. My sense was that expressing a character without a tone is like writing a word without using letters. How do you say a toneless character? bd2412 T 19:58, 1 August 2006 (UTC)[reply]

In a way, you can think of Standard Mandarin as having five tones. The fifth tone is not marked and is usually called the neutral tone in English (輕聲/轻声 qīngshēng, literal meaning: "light tone"). Click here for a more detailed explanation about tones.

A-cai 20:59, 1 August 2006 (UTC)[reply]

Thanks for the cite, I will study it. Cheers! bd2412 T 21:17, 1 August 2006 (UTC)[reply]
To hear examples, take a look at the vocabulary sections of lessons one and two from Wikibooks.

A-cai 21:18, 1 August 2006 (UTC)[reply]

Taiwanese Mandarin

Hey, wait a second! Taibei Mandarin isn't an ISO 639-3 language ...

Should be Mandarin, with a note on the sense definition, like we use to say a word is used only in UK or US English.

  1. (Taiwan) package software

Much thanks for all the participating in our very interesting discussions. Would you like me to go ahead and start making cmn- templates? Best, Robert Ullmann 13:50, 2 August 2006 (UTC)[reply]

Thanks for pointing that out. I was actually thinking about that as I entered it. Luckily, I have only entered in less than 20 Taiwanese Mandarin words (Category:Taiwanese Mandarin).

A-cai 13:56, 2 August 2006 (UTC)[reply]

Note that there is nothing wrong with the category itself; seems quite useful! Robert Ullmann 14:11, 2 August 2006 (UTC)[reply]
The cmn- templates would be most welcome. I am glad to be working toward developing a standard way to enter Chinese words. Once we get all of this ironed out, I would like to get it documented on a help page somewhere, and then include it at Category:Chinese language. I think the word for dictionary (辭典) will end up being an excellent example for people who are unsure of how to make a proper entry.

A-cai 14:04, 2 August 2006 (UTC)[reply]

Okay, I will make a start. I do have a question: there is a second parameter on some of the cat tags, for example in the entry for Down's syndrome, what is that doing? (;-) I will keep all the zh- categories exactly as they are for now. Robert Ullmann 14:11, 2 August 2006 (UTC)[reply]
Are you talking about the 口07 in [[Category:zh-tw:Diseases|口07]]? This is for sorting purposes. Wiktionary treats every Chinese character like a letter, which makes it very cumbersome if you want to do a table of contents for a category. is the radical for , and the 07 represents the number of remaining strokes. Let me know if this answers your question, or if I misunderstood your question.

A-cai 14:20, 2 August 2006 (UTC)[reply]

Okay ... but there seems to be quite some inconsistency in what is out there; there are categories with Chinese characters sorted by character, by radical, and by pinyin/POJ ... we can fix this in the templates if we can figure out what is going on and where we want to be. I've added nan-sim-noun, cmn-tra-noun, cmn-sim-noun, but they don't use the cat= parameter yet. Robert Ullmann
Here is the scheme that I have developed over time. It loosely follows the default behavior of the sort mechanism of Microsoft Word:
  • 辭典
    [[Category:zh-tw:Nouns|辛0a]] -- Mandarin in traditional script sorted according to radical/stroke order [Word: Table/sort, Type: Stroke, Options: Chinese (Taiwan)]. Wiki sorting is based on individual characters, which is why it will not sort properly if the number of strokes is above 9. Therefore, I have used pseudo-hex to force the sort to work correctly: 0-9, a=10, b=11, c=12, d=13, e=14, f=15, g=16, h=17 etc. When the words in the category number in the thousands, we will be able to put a radical/stroke TOC at the top (which would consist of not more than 214 radicals, see Wiktionary:Chinese radical index).
    [[Category:zh-cn:Nouns|ci2dian3]] -- Mandarin in simplified script sorted according to Pinyin [Word: Table/sort, Type: Phonetic, Options: Chinese (PRC)].
    [[Category:nan-tw:Nouns|su5-tian2]] -- Min Nan in traditional script. Word does not cover this. Since POJ is the most common method of Romanization, it makes sense to use this for the sort order.
    [[Category:nan-cn:Nouns|su5-tian2]] -- Min Nan in simplified script. POJ sort order.
    [[Category:yue-hk:Nouns|ci4din2]] -- Cantonese in traditional script [Word: Table/sort, Type: Text, Options: Chinese (Hong Kong SAR)]. Word's behavior is too vague, so I depart from Word's default behavior, and sort according to Jyutping Romanization.
    [[Category:yue-cn:Nouns|ci4din2]] -- Cantonese in simplified script. Word does not cover this either. Sort behavior is according to Jyutping Romanization.
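
The radical/stroke key format in the list above can be sketched in Python roughly as follows. This is only an illustration: the function name `radical_stroke_key` is made up here, and the assumption that the digit alphabet simply continues past `f` (g=16, h=17, and so on) is an extrapolation of the scheme as described.

```python
# Hypothetical helper illustrating the radical/stroke sort key described above.
# One character per stroke count: 0-9, then a=10, b=11, ...
# (assumed here to continue alphabetically through z=35).
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def radical_stroke_key(radical: str, remaining_strokes: int) -> str:
    """Return radical + '0' + one pseudo-hex digit for the remaining strokes."""
    if not 0 <= remaining_strokes < len(DIGITS):
        raise ValueError("stroke count outside the single-digit range")
    return f"{radical}0{DIGITS[remaining_strokes]}"


print(radical_stroke_key("口", 7))   # key form used for [[Category:zh-tw:Diseases|口07]]
print(radical_stroke_key("辛", 10))  # key form used for [[Category:zh-tw:Nouns|辛0a]]
```

Because every stroke count occupies exactly one character, a plain character-by-character comparison keeps 9 strokes ahead of 10, which is the point of the pseudo-hex digits.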

The variation that you refer to is the result of a trial and error process over a number of months. I have been trying to go back and change everything according to the above scheme, but there are some entries that I still have not gotten to. I am basically happy with the above scheme with the exception of one thing: in the case of radical/stroke sort, doing it this way will only force the head characters to sort correctly. For example, in Category:zh-tw:Idioms, the correct sort order for the entries that start with 一 is:

  • 一人之下,萬人之上
  • 一手交錢,一手交貨
  • 一手托兩家
  • 一日三秋
  • 一日不見,如隔三秋
  • 一生受用不盡
  • 一石二鳥
  • 一朵鮮花插在牛糞上
  • 一而二,二而一
  • 一把鼻涕一把淚
  • 一言九鼎
  • 一物剋一物
  • 一家大小
  • 一畝三分地
  • 一針見血
  • 一陣青一陣白
  • 一盞茶工夫
  • 一腳高一腳低
  • 一網打盡
  • 一鳴驚人
  • 一箭雙雕
  • 一舉兩得
  • 一蹴可幾
  • 一蹴而就

But with my current scheme, I get:

  • 一蹴而就
  • 一蹴可幾
  • 一鳴驚人
  • 一石二鳥
  • 一人之下,萬人之上
  • 一針見血
  • 一物剋一物
  • 一家大小
  • 一手交錢,一手交貨
  • 一畝三分地
  • 一陣青一陣白
  • 一盞茶工夫
  • 一手托兩家
  • 一腳高一腳低
  • 一日不見,如隔三秋
  • 一日三秋
  • 一生受用不盡
  • 一朵鮮花插在牛糞上
  • 一箭雙雕
  • 一而二,二而一
  • 一舉兩得
  • 一言九鼎
  • 一把鼻涕一把淚
  • 一網打盡

The second one is incorrect. I'm not yet sure what the fix should be. A-cai 22:28, 2 August 2006 (UTC)[reply]

Seems to me you should be able to use 辛10, that would sort after 辛09 ... isn't that why you used a two digit number in the first place? ... but in the Idioms example, all of them have a sort key of 一00 for the first character. So they are all sorted together, and then by Unicode/10646 collating order! So you are seeing them in that order by the second character, and by the third, etc. Hence 一日不見 and then 一日三秋, 三 follows 不 in Unicode. To get the order Word is probably using, you would have to do (radical)(stroke) for each successive character ... (urk!). This would have to be fixed within the Wikimedia software. Robert Ullmann 11:27, 3 August 2006 (UTC)[reply]
On a closer look, it seems to me the sort order may be random (i.e. added at the end when entered) once the sort key runs out. I said "三 follows 不 in Unicode"; I don't think that is true. Of course not! We did pick a rational order when doing Han unification! It is 4E09 (三) and 4E0D (不) ... The idioms above are sorted on 一00 and nothing else. I think what we want to do is sort 一日三秋 on key "一00一日三秋" (where the first is the radical, and the 4th character is the full character, they happen to be the same in this case). That will sort everything on radical/stroke for the first character, and then on Unicode sequence (which isn't bad) for the whole word. The templates could concatenate a parameter (rs=) with the headword. What do you think? Robert Ullmann 11:51, 3 August 2006 (UTC)[reply]
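
A rough Python sketch of the idea just described: prepend the rs= value to the headword and compare plain strings. Python compares strings by code point, which stands in here for the Unicode ordering mentioned above; the helper name `category_key` is hypothetical, and this says nothing about how MediaWiki itself compares keys.

```python
# Sketch: category sort key = rs value + headword, compared by code point.
def category_key(rs: str, headword: str) -> str:
    """Concatenate the radical/stroke prefix with the full headword."""
    return rs + headword


idioms = ["一日不見,如隔三秋", "一石二鳥", "一日三秋"]
ordered = sorted(idioms, key=lambda word: category_key("一00", word))
print(ordered)

# 三 is U+4E09 and 不 is U+4E0D, so 一日三秋 comes before 一日不見.
print(hex(ord("三")), hex(ord("不")))

# A zero-padded two-digit stroke count sorts correctly; an unpadded one does not.
print("辛09" < "辛10")  # True
print("辛9" < "辛10")   # False, because "9" compares greater than "1"
```

With the shared prefix 一00, the order inside the group is decided entirely by the characters of the headword, so it is deterministic rather than depending on when the entry was added.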
Yes, I think given the current limitations of Wiki software, your solution seems like the best way. From my perspective, a nice default behavior for Wiki would be that if I place a Chinese word in a category without an argument (e.g. [[Category:zh-tw:Idioms]]), Wiki would identify the radical of the head character, and use the radical as a bullet instead of the character itself. Ex.

===Noun===
辭典 (ci2dian3, Pinyin cídiǎn, simplified 辞典)

  1. dictionary

[[Category:zh-tw:Nouns]]

If you clicked on the [[Category:zh-tw:Nouns]], you would see

instead of the current:

In other words, Wiki would not need [[Category:zh-tw:Nouns|辛10]] in order to know what to do. The above would be the default behavior for words with Chinese characters, and then if you did something like [[Category:zh-cn:Nouns|ci2dian3]], the radical/stroke sort would be overridden by the ci2dian3 argument. Of course, in order to make this work, you might need something like a database that gives you the radical/stroke information for all Chinese characters. This kind of information is available from multiple on-line resources (including Wiktionary). It would just be a matter of arranging the data according to the needs of the Wiki software. A-cai 12:07, 3 August 2006 (UTC)[reply]
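
To make the proposed default concrete, here is a small Python sketch of a lookup-based sort key with an explicit override. Everything in it is an assumption for illustration: the table `RADICAL_STROKES` holds just two sample characters, the function name `default_sort_key` is invented, and the zero-padded decimal stroke count is only one possible key format.

```python
# Hypothetical radical/stroke table; a real one would need to cover every character.
RADICAL_STROKES = {
    "一": ("一", 0),  # 一 is itself a radical, with no remaining strokes
    "唐": ("口", 7),  # radical 口 plus 7 remaining strokes
}


def default_sort_key(headword, explicit_key=None):
    """Return the explicit key if given, else radical + strokes + headword."""
    if explicit_key is not None:
        return explicit_key  # e.g. a Pinyin key overrides the radical/stroke default
    entry = RADICAL_STROKES.get(headword[0])
    if entry is None:
        return headword  # unknown character: fall back to the plain headword
    radical, strokes = entry
    return f"{radical}{strokes:02d}{headword}"


print(default_sort_key("一石二鳥"))           # radical/stroke default
print(default_sort_key("辞典", "ci2dian3"))   # explicit Pinyin override
```

The fallback branch mirrors the current behavior, where a word without sort information simply appears under its first character.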

Chinese sort order[edit]

I wouldn't put too much weight in the Word sort orders, Microsoft tends to actually do things right only when forced to, which is very rarely. The question we should be asking is: what do we want to do? Then, given the Mediawiki s/w, how do we go about it?

It seems to me that headwords in Chinese characters should always be sorted radical-stroke. It is the standard ordering, and can be used without knowing the pronunciation (which may be what one is looking for!). Remember too that we are on computers, and someone who already knows what character/word they are looking for can search it. (Either in the search box, or with the browser search within a category.)

So I think the -tra- and -sim- templates should have an rs= parameter, as above. (If Mediawiki starts doing this, we'll just change the templates to ignore the parameter.) Any words that don't have rs= will just appear under the 1st character, which is better than nothing (and may cause someone to go add it). Or: is sorting by (e.g.) Pinyin preferable? What do you think?

Oh, just one other little thing: I was purposely not bolding the headwords in the templates when they are Chinese characters; I don't think this is required, and boldface and Chinese fonts just don't work well together at smaller sizes ;-)

Robert Ullmann 14:58, 3 August 2006 (UTC)[reply]

Another thought is to always use Pinyin/POJ/Jyutping for the categories Nouns, Idioms, Diseases, etc, etc. and categorize all the words in each script in a unified radical/stroke index. You did try something like this before IIRC? Or categorize them in Pinyin/POJ/Jyutping and wait for the Mediawiki software to handle the comprehensive index correctly. (which we could prod them about ;-) Robert Ullmann 15:08, 3 August 2006 (UTC)[reply]

For categories, it makes sense to do radical/stroke order for categories in traditional script since Taiwan (the place with the largest number of Mandarin speakers who use traditional script) lacks a standard Romanization for Chinese characters. The Zhuyin phonetic system is still the most popular in textbooks, but I don't think we want to use ㄅㄆㄇㄈ (bo po mo fo) because not enough Westerners are sufficiently familiar with the scheme to find it useful. For simplified Mandarin, Pinyin is a good choice because that is the standard Romanization in the PRC and is well known to Western students of Mandarin. For Cantonese and Min Nan, it makes sense to use Romanizations for indexing. One problem is that unlike Pinyin in the PRC, Jyutping and POJ are not universally known to speakers of those languages. However, they probably represent the most common methods of Romanization for their respective dialects.
Before going any further, I must point out that the above proposal for default category sorting behavior represents my current thinking about the situation in the absence of any substantial feedback from other Chinese speakers. I think it is a reasonable scheme, and is based on my own preferences as a Chinese speaker who regularly contributes to Wiktionary.
As for your idea about the index, I recently reworked the Wiktionary:Chinese Pinyin index to take advantage of the Wiki special pages feature. For example, if I wanted to find the word 辭典 (cídiǎn) via the Pinyin index: I would click on Wiktionary:Chinese Pinyin index c, then on 辭, which actually links to Special:Allpages/辭. It just saves me time from having to type in all the compounds of 辭 by hand. My plan was to eventually make similar modifications to the radical/stroke index, as well as the Jyutping and (yet to be built) POJ index.

A-cai 21:28, 3 August 2006 (UTC)[reply]

Taiwanese Mandarin 2[edit]

Hi, you now have a label template {{Taiwanese Mandarin}} that you can use in a definition that uses cattag to automatically categorize the entry. See 3K產業. The name of the label template, the text, and the category name have to match. So I've used "Taiwanese Mandarin". We could change all 3 to "Taiwanese" if you would like. (I didn't pick this word (3K產業) by mistake, but it really is pretty easy in this case!) Robert Ullmann 20:11, 3 August 2006 (UTC)[reply]

I think "Taiwanese Mandarin" is acceptable for now. My intent for this category was for it to be a companion to the Taiwanese Mandarin article on Wikipedia. The term "Taiwanese Mandarin" also has the advantage of having already been reached by consensus after a debate on the Taiwanese Mandarin Talk page.

A-cai 21:35, 3 August 2006 (UTC)[reply]

Thank you, good. I am going to be away from the computer for a few days; travelling to a district in the west of Kenya. So please don't worry about a temporary lack of response. Robert Ullmann 21:58, 3 August 2006 (UTC)[reply]
Sounds fun, have a good trip!

A-cai 22:10, 3 August 2006 (UTC)[reply]

Chinese articles[edit]

Since you are very active and persuasive in editing Chinese articles, would you like to come to Chinese Wiktionary as well, if you have not done so?--Jusjih 13:15, 6 August 2006 (UTC)[reply]

Thanks for the offer, I have only edited a couple of articles on Chinese Wiktionary so far. I have mainly focused on English Wiktionary because I discovered it first. I will try to do more on Chinese Wiktionary in the future. I'm curious if you have any thoughts about how to get more Chinese speakers involved with Wiktionary? Why do you think that so few have contributed so far? What improvements to Wiktionary do you think would attract more Chinese speakers (either on Chinese or English Wiktionary)?

A-cai 22:16, 6 August 2006 (UTC)[reply]

How to handle simplified and traditional Chinese is a very painful thing for Chinese speakers. As I also look for Japanese kanji and Korean hanja compounds, I have noticed that most Korean hanja compounds are written in the same way as traditional Chinese ones. As frustrations have arisen for those using Wiktionary at more than one language site when many items are duplicated, there is a thought of Ultimate Wiktionary to have one shared database for all language sites, but language sites are not expected to be merged. So far, there are not many active users at Chinese Wiktionary.--Jusjih 01:51, 7 August 2006 (UTC)[reply]
I agree with your observation about the problems with simplified and traditional. I have tried my best to make separate entries in simplified and traditional for words that need it. One problem that I have is that there is no easy way for me to find simplified entries that lack a traditional entry or traditional entries that lack a simplified entry. I hope that Ultimate Wiktionary will solve some of these problems, but I agree that it will need an interface in many languages in order to make it a viable alternative to Wiktionary.

What types of words or phrases do you think would attract more contributors? I have tried to add words and phrases that are not commonly found in other dictionaries. As much as possible, I try to add detail so that the superiority of Wiktionary over other dictionaries can be demonstrated. I particularly like 今朝有酒今朝醉, because not long after I made the entry on English Wiktionary, someone added it to Chinese Wiktionary. When I noticed that the Chinese Wiktionary definition had included an etymology section, I copied it and translated it into English for the English Wiktionary. I think this is an example of how Wiktionary was made better by combining information from both Wiktionaries. A-cai 10:12, 7 August 2006 (UTC)[reply]

References and Alternative spellings and Etymology[edit]

Hi, could you please leave out References and Alternative spellings? If someone wants to look up something in google, they can do that. References are for specific linked sources.

Likewise, "pinyin with tones" is NOT an "alternative spelling". No one writes the words that way. That's why I was just putting the keys we want in the index in the (so-called) inflection line. And there certainly isn't any reason to explain at length why they are there in every entry.

For Etymology, just look at all the other languages. There is a clear style. Not like this:

Etymology[edit]

  • root meaning: blood relation
    figurative meaning: to cause others to feel as if they are relatives, to be close to
  • root meaning: to cut
    figurative meaning: to be close (as a knife is to flesh when it cuts)
  • Other languages:
    Japanese and Korean: 親切

But like this:

Etymology[edit]

  • from Han blood relation + to cut (close)

Everything else can be found by the user by following the wikilinks. (Yes, the individual character entries created by "NanshuBot" are a total mess. I'm working on that!) If the definition needs more description of the exact meaning, that's fine. And quotations, usage notes, etc. are all good. When they add information. (The extended definition of belongs there; yes, I know that entry doesn't even have a Noun header yet, let alone a definition.)

We don't put translations between other languages in the English Wiktionary; if you want to put the Japanese and Korean words under See Also (at the end of the language section) that is probably okay. (Besides, you already have the page linked as the traditional form!)

You are making it much harder on yourself ;-) we don't need that much, people can follow the links. (sorry, I am Robert Ullmann 11:31, 10 August 2006 (UTC))[reply]

You forgot to sign your post, so it took me a second to figure out who it was. I agree with your comment about the Etymology section. The only reason I did it that way for 亲切 was because the individual links to the characters did not provide the information. However, you are correct in pointing out that it would be better to correct the individual character entries rather than use a unique Etymology section for just this one word. The Nanshu bot thing has received a lot of negative feedback. I think it is better than nothing. I just wish the guy had written the bot so that the entries conformed to Wiktionary standards!
I will gladly leave off the Alternative spellings section with all of the variant Pinyin spellings for lookup purposes. Please build away at your index; putting that stuff in each individual entry IS a lot more work on my part.
I was trying to figure out how to show the connection between languages for when it is not obvious (see Beer parlour discussion). This is sometimes the case for entries such as 一举两得, 一舉兩得 and 一挙両得. I am trying to show the connective tissue between these entries. Maybe ===See also=== is the best way after all.
With respect to the google hits: I had originally placed these under the header ===Attestation===, but Connel M. wrote a bot (without even consulting me I might add) that changed every single one of them to ===References===. This created a side effect of some entries having two ===References=== sections, which I am still in the process of rectifying. The intent of the google hits is to provide a convenient link that can attest to the validity of the entry. It also provides a side benefit of giving me an easy way to look at hundreds of sample sentences. I find the links to the google hits to be invaluable, and would like to leave them in.
I think it is important in the early phases to come up with a format that is workable for the average student of Chinese, regardless of proficiency level. In talking to other Chinese speakers about Wiktionary, I consistently get feedback that Wiktionary is simply NOT that user-friendly. I have been trying to rectify this, but it hasn't been easy given the small number of Chinese speaking contributors to Wiktionary.

Finally, you say I'm making it harder on myself. Not entirely true. You guys are also making it harder on me. I am trying to be mindful of other people's opinions while making entries. In fact, it was Hippietrail who TOLD me to put the related words in the Etymology section (see Beer parlour discussion). In fact, he said that it was Wiktionary policy to do so. Now, you're telling me NOT to put them there. Do you see the problem that arises? Sometimes too much feedback from too many conflicting points of view actually hinders progress rather than helps it. I would like to get back to creating new entries, but I can't do so until all of this gets straightened out. A-cai 23:10, 9 August 2006 (UTC)[reply]

Where did he tell you to put related words under Etymology? We don't do this in any other language except when they are part of the etymology. (I.e. if the Mandarin word had been derived from Korean or Japanese ...) In this case it is the "same" word in the other script form used in those two languages. What you did in 一挙両得 is good, the Japanese comes from Middle Chinese. In the example I used 亲切, the Korean and Japanese are just see alsos. (and already covered by the zh-forms template) See the difference?
If you must leave the google things in, can you please template it so we can change it without running a bot through all your entries? Use {{googleref|site=cn}} and leave out the text about "2000 references", that is just noise? See 助紂為虐 Robert Ullmann 11:31, 10 August 2006 (UTC)[reply]
Subtle point: the template searches on google.com, with an English language interface. This is appropriate for a reference from the English Wiktionary. The database and results are the same as www.google.com.tw. The database at www.google.cn is restricted to sources approved by the PRC government, so returns fewer results. Robert Ullmann 12:18, 10 August 2006 (UTC)[reply]
I don't see a problem with using English Google, but we need to make an adjustment. We have to set the language, such as:

Do you think you can make the template mimic this?

Wait a minute, compare the above results with www.google.cn and www.google.com.tw:

The Chinese language (including www.google.cn) sites DO give more results. This is important in order to get an accurate reading. It only seemed like the English one is giving more results because you're getting a lot of false hits (the language being set to English instead of Chinese). This happens when you do a search on a double byte language such as Chinese, but treat it as single-byte ASCII text. You get a lot of false positives that hit in between words. Trust me, I have spent a long time comparing the results between English Google, google.com.tw and google.cn :) A-cai 12:33, 10 August 2006 (UTC)[reply]

Using your examples just above, with the result languages specified, .com and .com.tw return exactly the same result set in the same order (26,100 results). The .com site returns more than .cn, 125,000 to 84,800 from the .cn site. This is as would be expected; the .tw site uses the identical full google db (they don't have one per country, except for the PRC). The .cn site is known to be an "approved" subset. Robert Ullmann 16:39, 10 August 2006 (UTC)[reply]

Feedback on your other comments[edit]

If you feel up to it, please read Wiktionary:Beer_parlour#relationship_between_Asian_languages in its entirety. It is possible that I misunderstood what Hippietrail was trying to say. It really doesn't matter at this point. The question remains: what should be said? Nothing? Is it extraneous information that nobody cares about? Or should we come up with a better header? Again, this only applies where there are different scripts involved. If I look at 一石二鳥, I note that it is a phrase used in both Japanese and Mandarin. However, 一石二鸟 is not used in Japanese. It seems like I should point out that 一石二鸟 -> 一石二鳥 in Japanese because not all Chinese idioms can be found in Japanese. Maybe somebody who only knows Simplified Chinese might be interested in knowing that 一石二鸟 was borrowed into Japanese.
I did read it, the other people seemed to understand just fine. But the discussion was about possibly using a "Cognates" subheader; no-one mentioned "Related terms" (If someone had, it would have been pointed out that that is "See also".) And the discussion didn't resolve to any new style decision; only that listing cognates was in general a bad idea. If the difference is Simplified v Traditional, we have that covered with zh-forms and the template references. Robert Ullmann 16:10, 10 August 2006 (UTC)[reply]
With respect to the google hits. I would be willing to entertain the idea of a template. But, I already have my hands full making all of the other changes that you have suggested. By now, I have probably put in 500 entries with the Google hits. I'm not sure I have the energy to go back and put all of these into templates.
Sigh. A bit of mis-communication that is probably frustrating you: I'm not asking you to go back and fix lots of entries; just to go forward using the defined style. There are thousands of entries in the Wiktionary with various style issues; they get fixed when people edit them for some other reason or decide to run a bot (like the SeeAlsoBot). The reason I am on about this is precisely because you are adding lots of good new entries, and it would be good if we didn't have to fix more! And I apologize if I seem to be off-putting in some way, that is not my intent. Robert Ullmann 16:52, 10 August 2006 (UTC)[reply]
Let me put it another way. I am only one person. It requires a lot of time and research to make detailed entries like 不得其門而入. I feel as though I am getting sucked into a formatting morass where the LOOK of the entry becomes more important than what it says. Do you realize that in six months, not a single bilingual person has engaged me with respect to the content of my entries (one person did try to remove a meaning from an entry because that person had never come across the term used in that sense, of course this was a beginning student of Mandarin)? If I put the tone mark over the wrong letter, you can bet somebody will chime in.

A-cai 12:50, 10 August 2006 (UTC)[reply]

Okay. Look: there are defined standards for the style we use. It is not uncommon for a new editor to think that what they are doing is "different" or not understand the style, and go invent things. We then spend a lot of time editing entries into the style. That's why Connel changed the header "Attestations" to "References": the defined style uses "References". Likewise we don't use "Footnotes": citations are in-lined or listed under References. There is a very clear way to do it all.
For example, you say you've done 500 or so entries with the google hits; it dawned on me from looking at them that you must have hand copied and pasted the URL from the google searches each time? (you certainly didn't key all those %00 hex numbers?) Did it occur to you that it shouldn't be that hard? I can make the templates do what we want. And you sure aren't going to go back and edit all those! Why would you even think that? That is an easy bot run. (You think I am going to fix the NanshuBot mess by hand?!!)
What I'm saying is follow the style, don't invent it, and you can concentrate on content as you want to. We do change the style, but with more or less discussion, when it is needed. I haven't yet seen anything in the CJKV language group that requires any style changes except for something like a translingual/Han character section for the entries that are single character words. But even that isn't that much different from a or I.
不得其門而入 is a wonderful entry; but someone will have to style-conform it to inline the references. See WT:ELE, quotations have a very well defined format: source, quotation, English translation on three lines with specific indentation (see the WT:ELE style sheet, it is all there). This is what I mean by making things hard for yourself, you are inventing non-standard variations for things already defined; and thus all you seem to hear about is the look of the entries! Robert Ullmann 15:54, 10 August 2006 (UTC)[reply]
Sigh. One more thing: the links you are using are apparently to google's cache entries, not to the sources. They will presently evaporate. Robert Ullmann 16:20, 10 August 2006 (UTC)[reply]
At first, I did copy and paste the Google hits by hand. Later on, I wrote a computer program that takes a phrase and gives you the correct Google URL. If you can figure out how to make a template do this using {{PAGENAME}}, that would be great. As of now, the template does not give correct results. When I click on the link, I see 三层转换层高支模.
oh dear, IE strikes again. This works in other browsers. They know it is UTF-8 and just generate the %00 syntax. When MediaWiki serves out pages to IE, it has to do all sorts of things to work around Microsoft IE bugs. Sigh. We'll have to do without the template. Would be nice if they were under External links though. Like you've done in 不得其門而入. No, this works fine in IE. I don't know what the problem you are having is. Robert Ullmann 13:38, 11 August 2006 (UTC)[reply]
I have looked at WT:ELE, but cannot find a specific reference to how to properly cite an internet website (which Wiktionary encourages over printed media, so that you can link to it). That is why at times it looks like I am inventing formatting: because I am. In cases where I can understand the documentation, I try to follow the format.
That's because the Wiktionary uses print references except when the earliest reference is online-only. (e.g. in a usenet posting). Yes, this is different from Wikipedia. And no, it doesn't mean you can't link to something, just follow the format. Robert Ullmann 09:47, 11 August 2006 (UTC)[reply]
Take a look at 不得其門而入, and see if this is closer to what you are after.
Yes, but inline what are footnotes. The first line of each quotation should be the year (in bold), the author (which may be an organization or "anon"), the title in italics (and this is what you would link, and it is the title, not your description of it), and then a comment or gloss in parentheses. And you have to link to the source, not to the google cache! I'll try editing this in a little while to show you what I mean, and you can look at it (and use it or revert it as you please!) Robert Ullmann 09:47, 11 August 2006 (UTC)[reply]
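
As a rough illustration of that first line, here is a Python sketch that assembles it in wiki markup (''' for bold, '' for italics, [URL text] for an external link). The function name `quotation_line`, the sample values, and the exact punctuation between the parts are assumptions; WT:ELE remains the place to check the real layout.

```python
def quotation_line(year, author, title, url, gloss):
    """Assemble: bold year, author, linked italic title, gloss in parentheses."""
    return f"* '''{year}''', {author}, ''[{url} {title}]'' ({gloss})"


# Hypothetical sample values, not taken from any real entry.
print(quotation_line(
    year="2006",
    author="anon",
    title="Original title",
    url="https://example.org/article",
    gloss="English translation of the title",
))
```

The link target is the source article itself, in line with the point above about not linking to the google cache.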
I know what you mean about people not consulting the reference materials. I made sure that I read all of the WT:ELE before I started making serious entries. While it may seem like a perfectly clear and lucid set of guidelines for the people who wrote it, there are places where it is not clear exactly what the formatting should be. For example, one might have to start at WT:ELE, then link to another help page, and then another. I also think there need to be more examples to look at. I think another thing Wiktionary should do, if it has not already done so, is to provide a simple English version of WT:ELE that is easily accessible from the main page.

Anyway, I think it is critical to have the pages have a consistent look and feel because, in reality, that is how most newcomers will learn to make entries (by copying the format of existing entries). You say others will eventually go back and correct the formatting. This is true, given enough time (a few years or more). Until then, it appears that if I don't do it, nobody will. This is why it is critical that I understand exactly what the format of the entries should be. It is why I have spent a lot of time over the last month consulting as many others as will give feedback about whether my entries are in compliance. A-cai 22:48, 10 August 2006 (UTC)[reply]

Okay! I edited 不得其門而入. I put the quotes in the quotation style, made the links go to the articles (not the google cache which is temporary). I used the wikilink syntax for the Analects wikisource (:zh: goes to the zh wiktionary, then s: goes to the zh wikisource). Put your glosses in parens. I couldn't read the second article well enough to identify what the title and author should be (and please check the first!). Now I just need to make cmn-tra-idiom ... Robert Ullmann 11:55, 11 August 2006 (UTC)[reply]
Take a look at 不得其門而入 again and tell me if I am doing this correctly.
  1. The year - I can't find the year for the second article either, should we put something like undated?
  2. the author - Is it ok to put anon when an article is written by an editorial staff?
  3. the title of the article - This should be the title of the article (in Chinese), followed by the translation of the title in parentheses, is that correct?

One final question: the example sentences are from a Mainland publication (Simplified Chinese). However, this entry is for the Traditional form of the phrase. Which do you think would be more appropriate, Simplified (because that is the script of the original article) or Traditional (because the entry is in traditional)? Note: I already put all excerpts in Traditional Chinese if the entry is a traditional Chinese entry. I only put both script forms if the entry is the same in both traditional and simplified. A-cai 13:06, 11 August 2006 (UTC)[reply]

Not sure what to do about the year. I put in "Design News China editorial staff", seems reasonable. The Title should definitely be in the original form. What you put in parens is a "gloss", something that explicates it. (Like anything else in parens.) A translation is fine.
The script issue is (well, annoying!) ... Quotes should always be in the original script of the source (they are quotes). I'd tend to think they should only be in the entry for that script, after all you are documenting use ... 不得其門而入 should be documenting use of the word/idiom in traditional script. 不得其门而入 should be documenting use of the simplified form. (The only way a use could possibly apply to both forms is if they are the same, and then there is only one page anyway.) Keep in mind that someone is very likely to be looking at both entries.
Yes, this means IMHO that both of these quotations belong in 不得其门而入, and only there. 不得其門而入 is okay with just the definitions. Unless you have quotes in traditional; but keep in mind that you don't have to provide quotes for everything! Most entries don't have any. Some just have invented examples explicating use (see run). The etymology should be in both entries, but I hope that quotation is in the original script in both cases... (grin) Robert Ullmann 13:38, 11 August 2006 (UTC)[reply]
btw the google template works for me in IE. see above. not sure what's up. Robert Ullmann 13:38, 11 August 2006 (UTC)[reply]

Response[edit]

With respect to script choice, I have based my approach on several observations:
  1. If I enter a phrase into Google in traditional characters, it will find all entries that match the phrase as well as all entries that match the simplified version of the phrase (the same is also true for simplified -> traditional).
  2. If you were to peruse any of the Chinese language Wikis, you would note that a tab at the top allows you to select simplified or traditional as you prefer.
  3. Many books and periodicals written in the PRC are reprinted in traditional script and vice versa.
From all of this, I conclude that the script is not the defining aspect. However, most Chinese speakers and students of Chinese tend to prefer one script over the other. If we were to mix and match scripts within the same entry (except for comparison purposes at the top), it would lead to a lot of confusion at the very least and would not be helpful at all at the worst. I'm not confused, I know both. But I am in the minority.
There is some merit to your argument for putting only simplified links to simplified entries. However, it would create twice the amount of work by requiring twice as many quotations. On the other hand, if I only do quotations for one and not the other, one of them gets shortchanged.
To me, the most important thing is to provide the reader with an example of how the phrase is used regardless of where it came from. If the phrase is used in the same way in the PRC and Taiwan, it should be no problem to provide a quote from the PRC in a traditional entry or vice versa. The only exception to this would be where there exists a difference in usage. This is where the "Taiwanese Mandarin" category comes into play.
Finally, I tend to only provide quotes when I feel as though the definition or usage is insufficient without an example. This happens frequently in the case of idioms. An idiom such as 不得其门而入 has a history of 2500 years. As you can see from the example sentences, it is used quite differently now than when the phrase was coined.
BTW, if we really want to be absolutely faithful to the "correct" script, the Analects of Confucius should be written in neither what we now know as simplified NOR traditional! What you know as traditional script was not actually invented until hundreds of years after the Analects of Confucius was written. Technically, we should use Seal script :)

A-cai 14:25, 11 August 2006 (UTC)

Templates

Now that we have sorted out a very large number of things, I've gone back to getting the POS templates in good order, using sû-tián, 辭典, and lâng as the examples still. I started with nan-poj-noun again. The template now has some documentation, and doesn't insert itself in the category (!). Look at Template:nan-poj-noun.

I fixed it so that if you don't specify sim= it will say "traditional and simplified". Most importantly, you don't have to wikify simple parameters, it will wikilink them if needed. So in the examples:

  • {{nan-poj-noun|pojn=su5-tian2|sim=辞典|tra=辭典}}
  • {{nan-poj-noun|pojn=lang5|tra=[[人]] or [[儂]]|sim=[[人]] or [[侬]]}}

and it will categorize and sort (and search) correctly. Should all the POJ words continue to be categorized in zh-min-nan? (it is okay if the templates end up doing the same thing twice in one entry).
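As a rough illustration of the automatic wikilinking described above, here is a minimal Python sketch. It is not the template's actual code: the function name autolink and the rule "leave the value alone if it already contains [[" are assumptions made for the example.

```python
def autolink(value: str) -> str:
    """Wrap a plain parameter in [[...]]; leave already-wikified values alone.

    Hypothetical helper: it mimics the described behaviour, not the
    real template logic.
    """
    value = value.strip()
    if not value or "[[" in value:
        # Empty, or the editor already wikified it by hand (e.g. in parts).
        return value
    return f"[[{value}]]"


print(autolink("辞典"))
print(autolink("[[人]] or [[儂]]"))
```

A simple parameter such as sim=辞典 is linked as a whole, while a hand-wikified value such as tra=[[人]] or [[儂]] passes through unchanged.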

I think I can do everything you have specified so far. (Except a true radical/stroke index, we will only get the first character as we've figured out.) Robert Ullmann 12:35, 14 August 2006 (UTC)[reply]

note: don't go back and replace templates into entries you've already done unless you'd like to try a few. That is good simple work for a bot! When we are happy with them. Robert Ullmann
Looks good. I was about to post you a well considered response, when my browser locked up and I lost everything that I had been typing in for the last 10 minutes %$#@!#%!!!
I will try to recreate the substance of what I was going to say before my computer acts up again. Three things:
  1. I think we should explicitly state that sû-tián is POJ. All of the POJ on wikis is a bit misleading IMHO. In my personal experience with typical Min Nan speakers, very few are aware of how to spell their language properly. This is why I created w:Template:POJtable on Wikipedia.
    Done. Robert Ullmann 13:43, 14 August 2006 (UTC)[reply]
  2. Type bo5-kang5 or bo5kang in the search box and hit search. You should get bô-kâng. This is because I put [[Category:nan:Adjectives:bo5kang5]]. Interestingly, if you type bo1-kang1 or bo1kang1, you still get bô-kâng! I don't know what to make of that, except to conclude that we don't need the "-" for sort or search purposes. I would just as soon do without it. Less typing for me. What do you think?
    Because POJ is written with hyphens, the POJ-with-tones always seems to be that way. Think we should keep it that way. Robert Ullmann 13:43, 14 August 2006 (UTC)[reply]
  3. Take a look at Category:zh-tw:Idioms again. I've redone like this: [[Category:zh-tw:Idioms|一02{{PAGENAME}}]], and it seems to be sorting correctly! I'm working my way through idioms, because they are the most complicated things we will have to deal with. If we can get the format correct for those, the rest of the words should be easy by comparison. I found a good example sentence for 四面楚歌, let me know if I did the format correctly.
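The radical/stroke sort key in item 3 can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the page name is passed in explicitly where the wikitext uses {{PAGENAME}}, and the radical and the number of additional strokes are assumed to be supplied by the editor.

```python
def rs_sort_key(radical: str, extra_strokes: int, pagename: str) -> str:
    """Build a key like 一02<pagename>: radical, two-digit stroke count, page name."""
    if not 0 <= extra_strokes <= 99:
        raise ValueError("extra_strokes must fit in two digits")
    # Zero-padding keeps plain text sorting in numeric order (02 < 09 < 10).
    return f"{radical}{extra_strokes:02d}{pagename}"


def category_tag(category: str, radical: str, extra_strokes: int, pagename: str) -> str:
    """Render the category link with the radical/stroke key after the pipe."""
    return f"[[Category:{category}|{rs_sort_key(radical, extra_strokes, pagename)}]]"


print(category_tag("zh-tw:Idioms", "金", 0, "金屋藏嬌"))
```

Padding the stroke count to two digits means the keys sort correctly as plain text without any special handling for counts of ten or more.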

A-cai 13:35, 14 August 2006 (UTC)

There is also cmn-tra-idiom, I used it in 四面楚歌 and it categorizes and sorts properly. Robert Ullmann 13:43, 14 August 2006 (UTC)[reply]
Looks good, I like the Landis quote. Current now, and 10 years from now it will be very interesting! Are you using 09, 10, 11 for additional strokes or your 0a, 0b thing? The two digit numbers are better ... There will be a cmn-sim-idiom for you in about 3 minutes. Robert Ullmann 13:49, 14 August 2006 (UTC)[reply]
Done. See 爱不释手. Robert Ullmann 14:01, 14 August 2006 (UTC)[reply]
I have updated all of nan-(poj,tra,sim)-noun, nan-poj-adj, cmn-(tra,sim)-(noun,idiom) (if you understand my syntax so I don't have to type them all ;-) Tell me what else you need. (oh, and what about category zh-min-nan? does it go away? There are pages categorized in it, but it doesn't exist?) Robert Ullmann 14:30, 14 August 2006 (UTC)[reply]
I agree that 09, 10, 11 is better. I think it will have the same effect, and is easier to read.
Before the template was put in, 爱不释手 was put as [[Category:zh-cn:Idioms|ai4bu4shi4shou3]]. The reason I chose the numbers over [[Category:zh-cn:Idioms|àibùshìshŏu]] is that Wiktionary does not sort the diacritic marks according to Pinyin convention (1st tone, then 2nd tone etc). It's not obvious from your template which sort is used. Do you have a routine that converts àibùshìshŏu to ai4bu4shi4shou3 for use with the Category fields? Is it possible to have the template do something like that? If it is possible, it would save on typing, and reduce the possibility for typos.
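A conversion of the kind asked about above could be sketched in Python as follows. This is only an illustration under some assumptions: the input has one space between syllables (as in ài bù shì shŏu), the breve used on this page (ŏ, ŭ) is treated as third tone alongside the caron, and the function names are hypothetical. It does not attempt to split unspaced Pinyin such as àibùshìshŏu into syllables.

```python
import unicodedata

# Combining marks that carry the tone. The breve is included because this
# page writes third tone as "ŏ"/"ŭ" rather than with a caron.
TONE_MARKS = {
    "\u0304": "1",  # macron
    "\u0301": "2",  # acute
    "\u030c": "3",  # caron
    "\u0306": "3",  # breve (assumed to mean third tone here)
    "\u0300": "4",  # grave
}


def syllable_to_numbered(syllable: str) -> str:
    """Turn one syllable such as 'shì' into 'shi4'; toneless syllables get no digit."""
    tone = ""
    letters = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)  # keeps other marks, e.g. the dots of ü
    base = unicodedata.normalize("NFC", "".join(letters)).lower()
    return base + tone


def pinyin_sort_key(pinyin: str) -> str:
    """Join space-separated syllables into a key like 'ai4bu4shi4shou3'."""
    return "".join(syllable_to_numbered(s) for s in pinyin.split())


print(pinyin_sort_key("ài bù shì shŏu"))
```

Whether a wiki template could do the same depends on what string functions the template language offers; the sketch only shows that the mapping itself is mechanical once the syllable boundaries are known.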
The other thing about the order: When I first looked at 爱不释手, I thought oh no, he listed a simplified entry as traditional! It took a second to realize that you were listing the traditional version in the parentheses. It may be better to do:
爱不释手 (Pinyin ài bù shì shŏu, traditional 愛不釋手)
That way, it separates it a little. Also, could your template compare {{PAGENAME}} to the tra and sim variables, and then format the headline accordingly? I'll use pseudo code:
If the value in tra equals {{PAGENAME}}, then it's a traditional entry and we format the headline:
愛不釋手 (Pinyin ài bù shì shŏu, simplified 爱不释手)
or else if the value in sim equals {{PAGENAME}}, then it's a simplified entry and we format the headline:
爱不释手 (Pinyin ài bù shì shŏu, traditional 愛不釋手)
or else if the value in pin equals {{PAGENAME}}, then it's a pinyin entry and we format the headline:
ài bù shì shŏu (Pinyin, 愛不釋手爱不释手)
or else the value in sim is empty, and we format the headline:
四面楚歌 (traditional and simplified, Pinyin sì miàn chŭ gē)
If you could figure out how to do the above with a template using :ifeq (I think), we should only need one template called cmn-idiom, instead of three templates called cmn-sim-idiom, cmn-tra-idiom, and cmn-pin-idiom. Also, I think it would be better to have cat1=, cat2= ... cat9= or something, since we could have an entry assigned to multiple categories. Would we use multiple templates if we had a word that was both a noun and a verb, for example?
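The branching in the pseudo code above can be sketched in Python. This is an illustration, not template code: detect_form and its return codes are hypothetical, and the "sim is empty" case is checked first (rather than last, as in the pseudo code) so that an entry with no separate simplified form is not mistaken for a traditional-only entry.

```python
from typing import Optional


def detect_form(pagename: str, tra: str, sim: str = "", pin: str = "") -> Optional[str]:
    """Guess the entry type by comparing the page name with the parameters.

    Returns 'ts' (traditional and simplified are the same), 't', 's', 'p',
    or None when nothing matches.
    """
    if pagename == tra and (not sim or sim == tra):
        return "ts"
    if pagename == tra:
        return "t"
    if sim and pagename == sim:
        return "s"
    if pin and pagename == pin:
        return "p"
    return None


params = dict(tra="愛不釋手", sim="爱不释手", pin="ài bù shì shŏu")
print(detect_form("愛不釋手", **params))
print(detect_form("爱不释手", **params))
print(detect_form("四面楚歌", tra="四面楚歌", pin="sì miàn chŭ gē"))
```

The comparison is an exact string match, so a parameter that is wikified or differs from the page name by a single diacritic falls through to None.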
I am unfamiliar with category zh-min-nan. If I did create it, it must have been a long time ago. Currently, I use "nan" for all categories (or nan-tw, nan-cn in the case of traditional/simplified Min Nan entries) per feedback from other Wiktionary contributors. The consensus being that "nan" is an ISO-639-3 code, so the zh-min part is unnecessary.
Finally, at first I didn't translate the example sentences. But I must thank other Wiktionary contributors who suggested that I include a translation of the examples. This has occasionally forced me to reconsider my definitions. Anyone who has done translation knows that the words you choose for a translation depend on how the source word or phrase is being used. My favorite recent example is from the Chinese subtitles of an English language movie. The English phrase was "the apple does not fall far from the tree," but was used in a negative sense. I think the movie was Mystic River with Sean Penn, describing how his character had negative influence from his father. The Chinese phrase in the subtitles was 上樑不正下樑歪 (lit. when the upper beam in a house is not straight, the lower beams will be crooked fig. when the parents do not set a good example, the children go astray). In that one instance, it perfectly captured the meaning of the English sentence. Despite that, I would be hesitant to define 上樑不正下樑歪 generically as "the apple does not fall far from the tree" because the English phrase can be used both in a positive and negative sense, whereas the Chinese phrase is only used in the negative sense. Anyway, I digress ...

A-cai 22:45, 14 August 2006 (UTC)

I was thinking for a while along the same lines as you about a single (e.g.) nan-noun template, and didn't do it that way for various reasons. (Lest it seem I am knocking your idea; I went about the same thought process.)

The first is that it is fragile, not robust in a number of ways. It requires the user of the template to get it just right to generate the right form. Consider what happens when the parameter doesn't quite match the pagename. Maybe it matched, and one of the tone diacritics was wrong, and the page was moved to the correct headword. Now the template is broken. There are other ways this can happen as well.

Users aren't going to read the documentation. The template would be too "magical"; a user who puts in nan-noun with (say) just the sim= parameter because they aren't familiar with traditional on a POJ page is going to have surprising results, if the template can do anything sensible at all. You know all the forms; other people copying the style of existing pages aren't going to.

And if the parameter that matches the pagename is wikified it won't match. (and so on)

If I had a full programming language I would simply check if the page was POJ (or some romanization), or simplified or traditional (by referencing the code tables), and then go on to generate default forms (that could be over-ridden by parameters), then some large percentage of the time you'd only need {{nan-noun}} ... but we don't.

What we could do is have one template with a required first parameter (t, s, ts, p) that specifies the headword form?

On the other points: Usually there is only one POS-type category for words. I.e. a word would be in :nan:Nouns unless it is in a subcat of that, one or the other. There are other categories, but those tend to be explicitly specified (not in the template). I'll delete cat:zh-min-nan when I see it since it doesn't exist. If a word is both a noun and a verb you use both headings and both templates. The English definitions are not likely to be identical (see 安全). Robert Ullmann 11:51, 15 August 2006 (UTC)

愛不釋手 (traditional, Pinyin ài bù shì shŏu, simplified 爱不释手)
爱不释手 (simplified, Pinyin ài bù shì shŏu, traditional 愛不釋手)
ài bù shì shŏu (Pinyin, traditional 愛不釋手, simplified 爱不释手)
四面楚歌 (traditional and simplified, Pinyin sì miàn chŭ gē)
This makes it very clear to the English user of the wikt, and also the editor specifies the parameters that are not the headword in each case (and the category keys). Are we going to have Pinyin entries? It also sets up a pattern that I can use for Japanese and Korean. Right now there isn't anything that tells the user that this is hangul, and this is hiragana, this is Hanja and this is shinjitai Kanji and so forth. Robert Ullmann 12:14, 15 August 2006 (UTC)
Categories require more thought, but we can make the templates do what we want later. If you look at other languages, :nan:Nouns really should be "Min Nan nouns", and then the individual categories are :nan:Cat. How this is supposed to work with 2-3 different scripts in different categories I don't know! ;-) Robert Ullmann 12:31, 15 August 2006 (UTC)[reply]
btw, I added wlink to zh-ts so you don't have to type all the square brackets to wikilink each parameter, unless you want to do it in parts. Robert Ullmann 13:22, 15 August 2006 (UTC)[reply]
I understand your hesitation about making the template too sophisticated. If we rush the template too quickly, without fully understanding what we want or what the ramifications of using the template might be, we may end up with a more confused situation than we have already.
I think that ultimately, we are both motivated by a desire to attract more contributors to Wiktionary. I think we do this in two ways:
  1. Make Wiktionary easier to use for non-computer people
  2. Fill Wiktionary with words that you can NOT find in other dictionaries (or can find, but with nowhere near the same level of detail as what Wiktionary provides)
I'll give you a quick example: I added 佚名 a little bit ago. Why did I do that? Because I looked this word up in a half dozen online and offline dictionaries, and was surprised to learn that I could find this word in NONE of them (not even in the largest ones)! What's so weird about that? Check out the number of Google hits! It will take years for us to add all of the words that are already in dozens of other dictionaries. But that is not why I got involved with Wiktionary. The reason I started adding words to Wiktionary was because of words like 佚名. Had Wiktionary not been around, I would have just filed it away as interesting trivia but not bothered to do anything about it, because I am not in the habit of calling up dictionary publishers every time I run across an interesting word.
Don't get me wrong, I think we eventually need to include the same words as all the other dictionaries. But I think we can make a HUGE impact immediately by adding common words that are not already in other dictionaries. I'm probably preaching to the choir, but I'm constantly trying to figure out how to get more bilingual people involved with English Wiktionary (selfishly ... especially Asian languages). Easy to use and information that is not available elsewhere ... I think these two things are key.
While we're on the easy to use subject, I feel like any of the Asian languages are an uphill battle. In the case of Mandarin, we have traditional script, simplified script and Pinyin (letters with diacritic marks). There may very well be millions of Mandarin speakers, but relatively few know how to do all three of those things. Also, the current Wiktionary policy is encouraging me to create up to three separate entries for every single word! This also makes statistics for Mandarin entries a nightmare. I'm not blaming Wiktionary or Mandarin ... I just wish I knew of a way to lessen the pain of entering Mandarin. I am convinced that this is our main stumbling block to attracting more contributors of Mandarin words.

Ok, that rambled a bit ... but hopefully, you catch my drift. A-cai 13:53, 15 August 2006 (UTC)

See Template:nan-noun used in the usual examples: sû-tián, 辭典, and lâng. I changed "pojn" to "pojt", it was bugging me, it should be derived from POJ-with-tones, not POJ-with-numbers ;-). Better?
You answer your own question about why we need three entries: "There may very well be millions of Mandarin speakers, but relatively few know how to do all three of those things." I do think you might not try so hard to find quotations to use in all 3 and so forth. Do people write in Pinyin? Are we going to have entries for those, as with POJ?
As you say (and I agree with you about attracting contributors, and adding interesting words), it should be easier. Note that most users don't enter totally complete entries as you do; it is very common for a new user to just enter a one sentence definition, and someone else wikifies it, and someone else adds a citation, and so on. If someone had added 佚名 with one line that said "anonymous in Mandarin", you or I or someone could wikify it, then add the template, and so on. This happens very frequently. (one of the reasons WT:ELE isn't pushed at the person who has just shown up is to keep them from being scared off by all the details! The welcome message starts with the simple tutorial.) Robert Ullmann 14:47, 15 August 2006 (UTC)[reply]
Robert, people do write using Pinyin. Children in the PRC learn Pinyin first before learning Chinese characters. This is also true for beginning students of Mandarin. It is mainly used as a learning tool in the early stages. However, in both cases, students are encouraged to wean themselves from Pinyin as early as possible. I would say that in order for Wiktionary to be useful to beginning students, it is important that we provide them a way to find words according to the Pinyin spelling. My initial thought is to only provide example sentences in Pinyin for beginning level words (for example, 所以). Click here to see an excerpt from a work by Lu Xun in Pinyin. Click here to read an interesting debate in English about writing in Pinyin vs Chinese characters.
You mention about how if someone added a word in the wrong format ... Yes, generally I will wikify new entries if I notice them. It would be nice if Wiktionary had some kind of category where I could find entries that need some help. Currently, the only way for that to happen is for someone to tag it with {{rfc}} by hand, or for me to stumble across it by accident.
I agree that it is a pain to add example sentences. I also agree that I'm one of the few people that does it. However, I continue to do so because of positive feedback from readers. In fact, I believe it was Hippietrail who pointed out that example sentences are the thing that will make Wiktionary stand out from the competition. I agree with his sentiment. Why do I add example sentences for foreign languages when others don't? Beyond the fact that it is a pain in terms of formatting, one has to be very fluent in both that foreign language AND in English. Otherwise, you won't be able to determine whether a given quote is truly representative, nor will you be able to render it into idiomatic English.

A-cai 23:02, 15 August 2006 (UTC)

I didn't know whether anyone wrote in Pinyin, which is why I asked ... there is a {{wikify}} template but it isn't used very much. So what do you think of template:nan-noun? Is this useable? Robert Ullmann 11:10, 16 August 2006 (UTC)
I think it looks like the format we want. However, it might be good to have some usage examples on the Template page. I think I understand the correct way to fill out the template based on your explanation of the variables. But actual examples might be helpful.

A-cai 12:01, 16 August 2006 (UTC)

Greetings, A-cai. Please check my work on these entries and let me know if I have erred. Cheers! bd2412 T 01:50, 16 August 2006 (UTC)[reply]

Oh, and 金屋藏嬌. bd2412 T 01:55, 16 August 2006 (UTC)[reply]
BD2412, I reformatted the entries according to the consensus reached in Beer Parlor over the last few weeks. Robert (see above) has been working on creating standard templates for a lot of this. In the fullness of time, it will be good to write some Wiktionary guidelines specifically for Chinese entries. Someone did start such an article, but it is hopelessly vague. We need something like WT:ELE, but specifically dealing with the issues unique to Chinese.
On a linguistic note. Your entries were pretty much on the mark as far as meaning. I just tried to flesh some things out a little, and reworked the etymologies for better readability. My thanks go out to all of the zh:Wiktionary contributors who provided the etymology information for 金屋藏娇 on their page. I know for a fact that I would not have come up with that on my own!
With respect to , I never have liked the way the individual character pages do this. is indeed an alternative character for , but it is a non-standard alternative. I can see how this may be confusing from looking at the Wiktionary entry for .
Hopefully, you like the changes that I made. But if there is something that you have a question about or disagree with, let me know and we can bat it back and forth until we're happy with the entries.

A-cai 14:03, 16 August 2006 (UTC)

I had considered putting in examples of usage for 金屋藏娇 throughout Chinese history. Many Chinese language idiom dictionaries include this type of information as a matter of routine. But I think this may be overkill for native English speakers who are not fluent in Chinese. I'm guessing that if a person is looking up 金屋藏娇 on English Wiktionary, they probably want to know the etymology, pronunciation, definition and modern usage. I'm not sure historical usage is required for an English speaking audience. However, if you would like this information included, let me know and I will be glad to add it for you.

A-cai 14:18, 16 August 2006 (UTC)

POS categories and Topical categories

There is a discussion in the Grease Pit about these that I just commented in. The issue for us is that nan:Nouns really ought to be Min Nan nouns; but then what do we do about the different script forms?

I took the cat= out of the template, a topical category shouldn't replace the nouns category; but we'll need to figure out what might be added later. See what happens over there WT:GP. With (almost) any result, I can mod the template(s). The topical categories may be separate. (IMHO they should be.)

What is most useful now? cmn-idiom? I'd like to see if I can get you using them ... ;-) Robert Ullmann 14:36, 16 August 2006 (UTC)[reply]

How is this 金屋藏嬌 ? Robert Ullmann 15:01, 16 August 2006 (UTC)[reply]
Robert, let me think about cmn-idiom for a while. It seems alright, but there's something about it that I can't quite put my finger on. I guess the main thing I'm looking for is a way to minimize heavy re-editing between simplified and traditional.
As it now stands, I would have to do a bit of retyping in order to modify the template according to whether the entry was simplified or traditional.
  • {{cmn-idiom|t|pin=[[jīnwū]] [[cáng]] [[jiāo]]|sim=金屋藏娇|rs=金00}}
    {{cmn-idiom|s|pin=[[jīnwū]] [[cáng]] [[jiāo]]|tra=金屋藏嬌}}

For example, if I entered it like:

  • {{cmn-idiom|t|pin=[[jīnwū]] [[cáng]] [[jiāo]]|tra=金屋藏嬌|sim=金屋藏娇|rs=金00}}
  • {{cmn-idiom|s|pin=[[jīnwū]] [[cáng]] [[jiāo]]|tra=金屋藏嬌|sim=金屋藏娇|rs=金00}}
  • {{cmn-idiom|p|pin=[[jīnwū]] [[cáng]] [[jiāo]]|tra=金屋藏嬌|sim=金屋藏娇|rs=金00}}
but still had it display the same way as it does now, I would only have to change one letter, and copy and paste it from one entry to the other. Otherwise, the template probably wouldn't substantially reduce the amount of editing that I have to do between simplified and traditional. Also, the less retyping that I have to do, the smaller the chance to introduce typos across the entries. Have you shown {{cmn-idiom}} on grease pit? Have they given any feedback?

A-cai 22:12, 16 August 2006 (UTC)

I may be preaching to the choir, but I really don't understand why we should have Min Nan nouns, but nan:everything else! Besides, the ISO 639 version makes sense for a number of reasons. First of all, Min Nan is known by MANY names: Min Nan, Southern Min, Taiwanese, Hokkien, Holo, Fukianese etc. If I'm not mistaken, the ISO 639 code represents all of these. Secondly, I doubt people will type in the category in the search box as often as they navigate their way to Category:Min Nan, where it is fairly easy to understand that nan is the code for Min Nan.

A-cai 22:46, 16 August 2006 (UTC)

Well, every other language other than Mandarin, Min Nan, and Cantonese has "(Language) nouns" as a category. Also verbs, adjectives, etc. Look at Category:Nouns by language. And there are a number of languages with multiple names, we pick one for the headers and categories etc. Min Nan probably has the most variation though! We can have Min Nan nouns and :nan:Nouns; entries can have a number of categories (sometimes when ttbc is used, a page will end up in dozens of categories).
I have an idea about how the categor(ies) should be organized, and I'm going to try it out. It does not change anything that already exists. More later.
As to having the template work the way you suggest above (that is, ignore the tra parameter if form is t or ts) this is a SMOC (Small Matter Of Code ;-), and a good idea, it will also keep new users out of some trouble (if they use tra= on a t entry, they won't know what they did "wrong"). In your example, you would also have pint= in all the entries, used only where needed. (this is already true of pint= and rs=) I haven't shown it off yet, thought I'd ask you first. Robert Ullmann 11:16, 17 August 2006 (UTC)[reply]
Done. cmn-idiom works the way you want. Also nan-noun.
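The behaviour agreed on here (one template, a first parameter giving the headword form, and the same tra=/sim=/pin= values pasted into every entry) can be sketched in Python. This is an illustration, not the template source: the function name headline is hypothetical, and the output strings follow the four headline layouts listed earlier in this thread.

```python
def headline(form: str, pagename: str, pin: str = "", tra: str = "", sim: str = "") -> str:
    """Render the headline for one entry.

    form is 't', 's', 'ts' or 'p'. The parameter that duplicates the
    headword (tra on a 't' entry, sim on an 's' entry, pin on a 'p' entry)
    is simply ignored, so identical parameters can be used on every page.
    """
    if form == "ts":
        return f"{pagename} (traditional and simplified, Pinyin {pin})"
    if form == "t":
        return f"{pagename} (traditional, Pinyin {pin}, simplified {sim})"
    if form == "s":
        return f"{pagename} (simplified, Pinyin {pin}, traditional {tra})"
    if form == "p":
        return f"{pagename} (Pinyin, traditional {tra}, simplified {sim})"
    raise ValueError(f"unknown form code: {form!r}")


shared = dict(pin="jīnwū cáng jiāo", tra="金屋藏嬌", sim="金屋藏娇")
print(headline("t", "金屋藏嬌", **shared))
print(headline("s", "金屋藏娇", **shared))
```

Only the one-letter form code changes between the traditional and simplified entries, which is the copy-and-paste workflow requested above.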
Is it really necessary to sort categories on Pinyin-with-tone-numbers rather than just Pinyin? (or POJ-with-tn rather than POJ?) Seems like a lot to demand that every entry have the form with tone numbers when the Pinyin, etc. would produce almost the same result? (often exactly the same result?) Robert Ullmann 12:24, 17 August 2006 (UTC)[reply]
I understand your question, and if we were talking about only a few hundred words, I would agree. However, when you get to the point where you have several thousand words in a single category (as I think/hope we one day will), it becomes important for the sort order to behave the way one would expect for Pinyin. Pinyin sorts according to syllable/tone number (ex. ma1 ma2 ma3 ma4). Unfortunately, Pinyin written with diacritics sorts according to the character codes of the accented letters rather than by tone (ex. mà má mā mă -> ma4 ma2 ma1 ma3). If one were accustomed to Pinyin/tone order, this would become extremely confusing.
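The sorting problem can be reproduced with a short Python sketch. It assumes the category listing compares keys character by character using their character codes, which is what Python's default string comparison does; whether the wiki software behaves identically is an assumption, not something confirmed here.

```python
# The four tones of "ma" written with diacritics, in tone order 1-4.
# Escapes are used so the exact precomposed characters are unambiguous.
with_diacritics = ["m\u0101", "m\u00e1", "m\u0103", "m\u00e0"]  # mā má mă mà
with_numbers = ["ma1", "ma2", "ma3", "ma4"]

sorted_diacritics = sorted(with_diacritics)
sorted_numbers = sorted(with_numbers)

# Tone-number keys stay in tone order; diacritic keys do not.
print(sorted_numbers == with_numbers)
print(sorted_diacritics == with_diacritics)
print(sorted_diacritics)
```

Under this comparison the tone-number keys keep their tone order after sorting, while the diacritic forms are reshuffled according to where the accented letters happen to sit in the character set.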

A-cai 13:32, 17 August 2006 (UTC)

Understood. Robert Ullmann 13:37, 17 August 2006 (UTC)[reply]
This may seem like an odd question: Does it make sense to say that 金屋藏嬌 is in Chinese traditional hanzi? Or does that sound like it could only apply to a single character? I'm looking for the best English translation of 繁體字 / 老字 -- as applied to words or text -- that starts with "Chinese ...". Robert Ullmann 14:00, 17 August 2006 (UTC)[reply]
I would know what it meant if I read it. Are you trying to decide on a new header? I am torn about using the word hanzi. I understand that it is a more precise word than something like ideograph or character. Actually, I think character or characters is fine. The Chinese word zì (like most Chinese nouns) can be either singular or plural. The word for typewriter is 打字機 / 打字机 (lit. type characters machine). As you can see from the word typewriter, zì can refer either to a letter or a character (in programming, 字串 is a string variable, lit. a character string, and 字節 / 字节 means byte, lit. character node). I believe the technical word for this is "glyph", is it not?
hàn simply refers generically to Chinese (either the language or the people). So literally, hànzì can be translated as Chinese character.

I hope this is enough information. Let me know if I missed the mark on what you wanted to know. A-cai 14:22, 17 August 2006 (UTC)[reply]

I'm toying with creating a category or two categories with a radical/stroke index for all the hanzi entries; yes it would be big, but if one was trying to look up something without knowing exactly what the characters were or exactly how to key them? Maybe in that case "Chinese hanzi" without the t/s distinction might be useful, or might not. Just thinking. Robert Ullmann 14:41, 17 August 2006 (UTC)[reply]
I don't think it's a bad idea. If the category becomes too unwieldy at some point, you could always further divide it. For example, you could have 214 categories (one for each radical). If those end up being too big, you could further divide the radical categories into [[Category:一00]], [[Category:一01]] etc.

A-cai 14:50, 17 August 2006 (UTC)

I combined all the templates into nan-(noun,adj,verb),cmn-(noun,adj,verb,idiom) and the old ones are gone. Put a small note on WT:BP. Thanks for your help so far! Robert Ullmann 21:52, 17 August 2006 (UTC)
Why do you say that TOC template "never worked"? It was working just fine for me today .... Robert Ullmann 22:30, 17 August 2006 (UTC)[reply]
The regular TOC template works fine. It's the {{CategoryTOC-radical}} that has never worked.
Also, when I filled in the sim value in a simplified entry, it sorts it under {. Is there a way to fix this?

A-cai 22:37, 17 August 2006 (UTC)

Ok, I figured out what I did wrong. I did not have the pint value filled in.

A-cai 22:42, 17 August 2006 (UTC)

Yes, I made some of them default to ?, but not all, I left the existing cat code. {{CategoryTOC-radical}} worked fine for me, what do you think was broken? We should make a working one in any case. Robert Ullmann 22:47, 17 August 2006 (UTC)[reply]
If you go to Category:Chinese hanzi and click on at the top. It will update the page to begin listing at:

etc.

Now go back into the history before I made the modification and click on in the {{CategoryTOC-radical}} table. It will update the page, but still list from:

etc. A-cai 22:53, 17 August 2006 (UTC)

Did that, works fine; lists from .... maybe an IE bug? Don't know. We'll sort this later. Leave yours there for now. (It is 2AM here ;-) Robert Ullmann 23:03, 17 August 2006 (UTC)[reply]

Also, the idiom is being placed in the correct rs spot in Category:Chinese hanzi, but not the nouns (compare 辭典 to 金屋藏嬌).

辭典 had not been "touched" since the last template edit; it doesn't have the rs= parameter, and is now under ?. I think if you add the rs parameter it will work correctly. Robert Ullmann 23:03, 17 August 2006 (UTC)[reply]

A-cai 22:57, 17 August 2006 (UTC)

Interesting, 辭典 had the rs info in the {{cmn-noun}} template, but not in the {{nan-noun}} template. Apparently, the {{nan-noun}} overrode the {{cmn-noun}} template, so it was listed under ?. After I copied the rs info from the cmn to the nan template, it sorts fine.

A-cai 23:10, 17 August 2006 (UTC)

I've been editing a few entries with your new templates. One thing that I like is that if you put in all of the fields, you have a way to look up everything via Pinyin (Mandarin idioms - regardless of whether it is simplified or traditional or zh-cn:Idioms for simplified only). Similarly, we have a way to look up everything via radical/stroke (Chinese hanzi - regardless of whether it is simplified or traditional). One observation, we can look up everything by radical stroke in Chinese hanzi, why not be able to look up everything in Pinyin as well? For example Category:Chinese terms sorted by radical and Category:Chinese terms sorted by Pinyin. This would be an easy modification to your template I think.

A-cai 02:04, 18 August 2006 (UTC)

I edited it a bit; but don't know whether the Mandarin is s or t or ts (I don't know either of the characters, and our own Han char pages aren't that helpful ... ;-) Left me wondering what I should put in as a first parameter. I put in - but just leaving the first parameter out does the same thing. Robert Ullmann 11:33, 18 August 2006 (UTC)[reply]

I have an idea, if there is no s t or ts ({{{1}}} is blank), the word gets placed in a category like [[Category:Hey, A-cai! Take a look at this and fix it for me will ya!]] (Just kidding :). How about something like [[Category:Mandarin entries with missing data]]. Ok, I'm having trouble coming up with a catchy name, but something like that. The category should be linked to Category:Chinese language or a subcategory like [[Category:Mandarin entries that need expert attention]].
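The fallback suggested above can be sketched in Python. This is only an illustration: the function name is hypothetical, the set of valid form codes is taken from the (t, s, ts, p) values discussed earlier in this thread, and the category name is the placeholder proposed in the comment, not an existing category.

```python
VALID_FORMS = {"t", "s", "ts", "p"}


def maintenance_categories(form: str) -> list:
    """Return the cleanup categories for an entry, based on its first parameter.

    A blank or unrecognised form code (for example "-") flags the entry
    so that someone who knows the scripts can find and fix it.
    """
    if form.strip() in VALID_FORMS:
        return []
    return ["[[Category:Mandarin entries with missing data]]"]


print(maintenance_categories("ts"))
print(maintenance_categories(""))
print(maintenance_categories("-"))
```

An editor who does not know whether a word is traditional, simplified or both can then leave the parameter out and the entry still ends up somewhere it will be noticed.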

A-cai 12:11, 18 August 2006 (UTC)


gekkō

Thanks for fixing that; I thought I'd gotten them all. JWPCe doesn't like the ō character. Guess I'm going to have to be more careful next time. :) Ric | opiaterein 12:38, 19 August 2006 (UTC)[reply]

No problem, I figured it was probably a typo, since all of your other entries use ō.

A-cai 13:01, 19 August 2006 (UTC)[reply]

One of the reasons I like looking at these entries is that I keep learning things! (and I see we have cmn-proper now)

You might find the "pipe trick" useful in saving some typing on wikipedia links in particular:

[[w:January 28 incident|]] renders as January 28 incident, you don't need to type the entry name again. It gives you everything between the leftmost : and the first (, so for example Mandarin. Robert Ullmann 12:27, 20 August 2006 (UTC)[reply]
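
As a rough illustration of the rule described above (keep what lies between the leftmost colon and the first opening parenthesis), here is a minimal Python sketch. It only mirrors that description and is not the MediaWiki implementation; the function name `pipe_trick_label` and the second example title are made up for the illustration.

```python
def pipe_trick_label(target: str) -> str:
    """Approximate the display text of a link written as [[target|]]."""
    # Drop everything up to and including the leftmost colon (the prefix).
    _, sep, rest = target.partition(":")
    label = rest if sep else target
    # Drop a parenthetical: everything from the first "(" onward.
    label = label.split("(", 1)[0]
    return label.strip()


print(pipe_trick_label("w:January 28 incident"))     # January 28 incident
print(pipe_trick_label("w:Mandarin (linguistics)"))  # Mandarin (hypothetical title)
```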

Thanks for the tip, that should save on typing.

A-cai 12:28, 20 August 2006 (UTC)[reply]

Thanks, I didn't have a good example, and hadn't tested it properly (as you can see, I think you know what I missed) very good. The kanjitab thing always seems to separate out all the characters, ja-forms will link the appropriate segments? And for some reason ! in the first column works even though it is supposed to be | ? Maybe that is to make it easier when | would give trouble like inside templates, so you can use ! instead? I really don't know ;-) Robert Ullmann 20:12, 20 August 2006 (UTC)[reply]

I'm not sure. I mainly used this template as a courtesy to the regular contributors, who seem to like this template. Of course, I'm partial to my own template ja-forms, so that's why there are two in the entry.

A-cai 20:17, 20 August 2006 (UTC)[reply]

Is the crime sense a noun stem or a verb stem? The original implied it was to commit a crime? You use "stem form of" after the definition in that one, and before in the other two, and parentheses differently ... we'll have to consider this; how the stem forms are to be handled. But yes, much better. Wiktionary:Quotations says to use abbrev "tr." after a translator's name, although they are referring to when that is in author field. Since the last line is the translation, I think it is just fine the way it is, you've already attributed authorship. Robert Ullmann 21:50, 20 August 2006 (UTC)[reply]

Good question ... I will attempt to remove the rust from my knowledge of Japanese grammar. Stem forms are not a part of speech as such, but rather are forms that words take when used as specific parts of speech (if that makes any sense). In this case, the stem form is the Continuative (連用形) form. This stem form has several uses. In the example sentence from Genji, the verb okasu (to commit a crime) has been made into the noun okashi (committing crimes; crimes). This particular form can also serve to make some words into verbs by adding the verb suru (to do). For example, you could have a noun/verb compound phrase such as okashi suru (lit. to do a crime, ex. ... 犯しする危機的状態を言うらしいです[9] it seems that people say that dangerous situations in which crimes are perpetrated ...). I've now embarrassed myself. I am NOT an expert in Japanese grammar! Anyway, I think what I've told you is accurate on the whole. Any Japanese grammarians out there? Help!

A-cai 22:58, 20 August 2006 (UTC)[reply]

Sounds about right to me, but I think I know a good bit less than you. I do know about verbification of nouns (noun stems) with +suru; remember Bushusuru? (nyuk, nyuk, nyuk ;-) Robert Ullmann 23:13, 20 August 2006 (UTC)[reply]

永远, when you have a moment[edit]

Please check my entry for 永远. Thanks. bd2412 T 23:43, 20 August 2006 (UTC)[reply]

I took your suggestion (although you were talking about Chinese). Words can get put in this category by templates either whenever the template is used (janoun) or conditionally if there is something not quite right (although none do that yet). Just thought I'd tell you about it. Robert Ullmann 15:24, 22 August 2006 (UTC)[reply]

Hmmm, speaking of which... Connel MacKenzie came across the following swath of recently-created, wholly unformatted entries for Japanese words: ならびに; ならべる; 丹花; 丹毒; 丹碧; 丹青; 丹頂; 丹朱; 丹色; 丹誠; 丹精; 丹念; 丹塗り; 久遠; 久慈目; 久懐; 久闊; 乗じる; 乙夜; 乗っ取る; 乗っ切る; 乙子; 乙女; 乙姫; 乙巡; みだす; 乱雑; 乱暴; 乱闘; 乱層雲; 乱視; あらかじめ; 混乱; 乱入; 仙人; 詩仙; 仙丹; 仙人掌; 阿仙薬‎. Thought you might have a crack at them. Cheers! bd2412 T 20:54, 22 August 2006 (UTC)[reply]

bang3 and about 1,300 others[edit]

I plan to make entries for each of the 1,300 or so pinyin transliterations using numbers in place of tone marks, basically patterned after the entry I just made for bang3. Do you have any recommendations as to how these should be written, formatted and categorized? Cheers! bd2412 T 18:02, 22 August 2006 (UTC)[reply]

Sorry guys, I got side tracked on Wikipedia for a day or two (check out w:Kwai Chang Caine#Trivia about Caine's name). Robert may be a better person to ask about formatting. I would say that since Pinyin generally refers to Mandarin, it would be more proper to use Mandarin as the level two heading. By the way, Robert's templates have a nice side effect. It allows you to search entries by entering the Pinyin with tones or just the Pinyin by itself. For example, try typing ci dian into the search box, then hit search. Make sure you leave a space between the syllables if you don't plan on using the tones in the search. A-cai 23:15, 22 August 2006 (UTC)[reply]

Thanks - good stuff! bd2412 T 23:19, 22 August 2006 (UTC)[reply]
Yes, we complained pretty loudly about plans to have "Minnan-ASCII-bot" generate POJ-with-tone-numbers (su5-tian2) instead of POJ. The entries ought to be POJ or Pinyin (or romaji or Jyutping...), you can find them by the index key with the tones. That way we don't have a lot of entries that aren't really words in the main namespace. Just make them Pinyin entries, language heading Mandarin, and use the pint= keys in the cmn- templates. They will also sort nicely into the POS categories. Robert Ullmann 11:50, 23 August 2006 (UTC)[reply]
Note that "bang3" isn't a great example for the wiki search; it does find the Han characters, but also finds a lot of English with bang in it. Why the wiki s/w ignores the 3 I don't know; I don't think it should. Maybe this is a project setting we can fix? We shouldn't even have an entry for bǎng unless it is a word; (It is, but like most of these we have no POS headings, just structure from the NanshuBot et al.) Robert Ullmann 12:11, 23 August 2006 (UTC)[reply]
To do this you'll need a few more templates that I haven't gotten to, for example a needs cmn-particle ;-) (and something else; I don't even know what the other Hanzi means. Robert Ullmann 12:17, 23 August 2006 (UTC)[reply]
I fixed up bǎng. It's still not perfect, but at least getting there. One issue with your template: for Pinyin entries such as this one, there is no way to indicate that the chinese character is used in both traditional and simplified (as is the case for and )

A-cai 12:59, 23 August 2006 (UTC)[reply]

We didn't run across this case when doing nan-noun for POJ entries, and didn't think about it? hmmm... fixed something in cmn-noun and bǎng. Robert Ullmann 13:49, 23 August 2006 (UTC)[reply]
Well, we can always just drop in a usage note. I'm a bit concerned about laying out full-blown definitions with parts of speech for all Chinese characters under a particular transliteration. When we get to one like , we'll be dealing with roughly 300 such characters! bd2412 T 13:06, 23 August 2006 (UTC)[reply]
Good point. Maybe we could make proper entries for the everyday ones (I have a pocket dictionary that lists 43 different characters for yì). The rest need not have a proper entry, because as your link to the index demonstrates, we already have a laundry list of characters with corresponding Pinyin.

A-cai 13:14, 23 August 2006 (UTC)[reply]

My face is red - I only know Mandarin, so I'd rather imperiously assumed that these apply for all Chinese - I'll run through them and fix. Thanks for the nudge! bd2412 T 13:25, 23 August 2006 (UTC)[reply]
Look at kansen. It might be very reasonable not to be repeating the ===Noun=== and ===Verb=== headers in this sort of case. It is bending the structure a bit, but the result is reasonable. Of course once you want Derived terms and such you need the POS section. Robert Ullmann 14:00, 23 August 2006 (UTC)[reply]
(update) look at kansen now; I think this is fairly good. It is effectively within WT:ELE style, while being very close to what the people in WT:AJ came up with. And even wouldn't be too painful! Robert Ullmann 11:54, 26 August 2006 (UTC)[reply]
Looks good to me. I'm not sure we have yet hit on the right formula for all of this. It does seem like we end up repeating the same information in a lot of different places(especially for Chinese and Japanese). I'm hoping that wiktionaryZ (or something like it) will provide a solution for this. My biggest wish as a regular contributor would be that I could enter information in one place according to a well understood set of guidelines, and have that information automatically be put in the appropriate indexes, table of contents etc. (and be searchable by all the ways that a person would want to search). Of course, a variety of sort options would be good as well. I think your templates are a step in the right direction, but I still think we have a ways to go before I would be satisfied with the nuts and bolts part of the equation.

A-cai 13:19, 26 August 2006 (UTC)[reply]

attention cats[edit]

Hi, we have Category:Chinese words needing attention and Template:zh-attention and Template:ja-attention to explicitly flag words. Robert Ullmann 20:48, 24 August 2006 (UTC)[reply]

This might help me out. Question, how will the zh-attention tag be added? Are you going to do a bot run or simply have users manually add the tag as they come across words that do not use the new template?

A-cai 22:37, 24 August 2006 (UTC)[reply]

The cat tag is just for manual use right now; must think on the rest. Late in Nairobi, sleep time. Robert Ullmann 22:43, 24 August 2006 (UTC)[reply]

Mandarin idiom 金屋藏娇[edit]

Suppose someone wanted to know, and went to google, searched for "Mandarin idiom 金屋藏娇" ?

Try it! Have a good time, time for me to sleep again. Robert Ullmann 00:42, 31 August 2006 (UTC)[reply]

Robert, I usually do "金屋藏娇 wiki" (which magically finds the relevant entry). This will often allow you to find an English definition, even if there is no English wiktionary definition (ex. try typing "麻省理工学院 wiki"). Of the many uses for google, I have found google to also be the most comprehensive dictionary with a rather crude interface that I have ever come across. For example, if I cannot find an English definition for 创业投资公司, but I know that 公司 means company, I can use the advanced search option to do this.

A-cai 23:06, 31 August 2006 (UTC)[reply]

I was interested in the fact that someone who knew google, but nothing about wiktionary, would find the entry that simply; and almost always listed first. What was wrong with ja-noun? Robert Ullmann 11:11, 1 September 2006 (UTC)[reply]
Nevermind, I figured it out. I typed k when I should have typed ky for kyujitai. It works fine now.

A-cai 11:14, 1 September 2006 (UTC)[reply]

拍噗仔 what are you trying to do? If you give the unicode template a blank parameter {{unicode|}}, it just generates an empty <span>? Why modify the appearance of the wikilink? Robert Ullmann 11:53, 1 September 2006 (UTC)[reply]

If you're using a unicode font to view, it might not make sense. But most newcomers to Wiktionary will not have their browsers set to use a unicode font. The unicode template is one solution for this. It ensures that a unicode font is used for the offending letter. The IPA template does the same thing. However, your template links the poj, so phah-pho̍k-á becomes phah-pho̍k-á. No problem, if your browser is set to use a unicode font. If you're not using a unicode font, you can't do phah-pho{{unicode|̍}}k-á, because your template would make it look like [[phah-pho{{unicode|̍}}k-á]]. Obviously, we don't want that! But the template also isn't flexible enough to allow something like [[phah-pho̍k-á|phah-pho{{unicode|̍}}k-á]]. There may be no convenient solution for this. But if you do know of a way to handle this situation, I think it would benefit the vast majority of web surfers who have no clue about how to set fonts on browsers, but would still like to see a valid glyph rather than the dreaded box.
Yes, this is probably an I.E. phenomenon, and I know that Firefox walks on water ... but the reality is that most users are still using inferior browsers :)

A-cai 12:46, 1 September 2006 (UTC)[reply]

okay, well one thing is that using phah-pho̍k-á putting just the combining form in the template, results in it disappearing on a real unicode browser (nothing to combine with!) You would at least want phah-phk-á (let me see what that looks like, I have Firefox in one window and IE in the other. The combined form is phah-pho̍k-á without the template.)
Both phah-pho̍k-á and phah-phk-á look the same in IE; it shows the diacritical to the right of the character. In Firefox the diacritical disappears in the first place, and looks like the bogus IE display in the second. But (without the template) it displays the character correctly. So you can use the template to get rid of the dreaded box in IE only at the expense of breaking the rendering of a properly functioning browser. Isn't IE just a real treat! (And Windows too; if you cut and paste the "real unicode" between windows, it helpfully breaks it: phah-pho̍k-á)
At some point the unicode template will go away; it already does much less than it used to. Robert Ullmann 14:59, 2 September 2006 (UTC)[reply]
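
To make the rendering problem above concrete: the tone mark in pho̍k is a separate combining code point that has to follow its base letter, so wrapping only the mark in its own span leaves it with nothing to combine with. The following Python sketch just inspects the code points; the helper name `split_marks` is made up for the illustration.

```python
import unicodedata

# "pho̍k": the letter o followed by U+030D, a combining mark.
syllable = "pho\u030dk"

for ch in syllable:
    kind = "combining" if unicodedata.combining(ch) else "base"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({kind})")


def split_marks(text):
    """Return (base characters, combining marks) as two strings."""
    base = "".join(c for c in text if not unicodedata.combining(c))
    marks = "".join(c for c in text if unicodedata.combining(c))
    return base, marks


# Separating the mark from its base letter is what breaks the display.
print(split_marks(syllable))
```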

Sigh. took me a number of hours. There is a real deep level wikimedia bug in which the positional parameters and the named parameters are treated differently! the wlink template now has an (optional) w= parameter, and the entry for 拍噗仔 works as expected. I only fixed this one case in nan-verb for now. Took horribly long before I realized that the bug was that deep! Robert Ullmann 20:24, 2 September 2006 (UTC)[reply]

Thanks for your hard work! Question, given that you expect that the unicode template might go away at some point, do you think we should somehow directly incorporate the functionality of the unicode template into your own templates? In other words, if someone entered poj=phah-pho̍k-á, the template would tell the browser to display it in a unicode font?

A-cai 23:13, 2 September 2006 (UTC)[reply]

Robert, another challenge for you[edit]

Take a look at 字画 and 字畫. I don't know why, but some characters default to Japanese shinjitai instead of simplified. In this case:

Japanese
Simplified
Traditional

Note that both simplified forms are 753B in unicode. I tried a partial fix in cmn-noun (字畫 still displays incorrectly for simplified). It makes it display correctly (albeit, in a small and hard to read font on my browser). Any ideas? A-cai 23:44, 2 September 2006 (UTC)[reply]

Since both forms are the same in Unicode, they are going to display in whatever font is your default; and you note that you see some inconsistency ... the reason they are the same is that the Han unification process viewed differences like this as font style. In most applications this is fine; we are a bit special ... let me just think on this and don't worry about it for now. (In any case, zh-cn would have worked better than zh I think.) Robert Ullmann 11:58, 3 September 2006 (UTC)[reply]
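
A small Python sketch of the point above: the Japanese and Simplified forms share one code point, so the text alone cannot select a glyph style, and only surrounding markup such as a language tag can hint at it. The helper names are hypothetical and the span markup is an assumption about how a template might tag the character, not a description of what cmn-noun actually emits.

```python
def code_point(ch: str) -> str:
    """Format a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"


def tag_with_language(ch: str, lang: str) -> str:
    """Wrap a character in a span carrying a language hint (assumed markup)."""
    return f'<span lang="{lang}">{ch}</span>'


hua = "\u753b"  # the character shared by the Japanese and Simplified forms

print(code_point(hua))
# Same code point in both cases; only the language hint differs.
print(tag_with_language(hua, "zh-cn"))
print(tag_with_language(hua, "ja"))
```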
Okay, look now. Robert Ullmann 14:29, 4 September 2006 (UTC)[reply]
Perfect!! This will especially help beginners to the language.

A-cai 22:21, 4 September 2006 (UTC)[reply]

Should I be integrating this and tas= into the other templates? Or do we have anything else we'd like to tweak? I plan to wrap the POJ/Pinyin forms with {{unicode}} as you suggested. The structure is changed quite a bit, to a switch, so we can do something later with template invocations that don't have a form parameter. Also Rodasmith pointed out that the usage documentation should be on the talk page if it is any length, makes it faster when the software transcludes the template. So I have some work to do! Robert Ullmann 11:53, 8 September 2006 (UTC)[reply]
Also: I updated zh-ts to add the same forms magic, and changed t and s to trad. and sim. (bit more informative) Robert Ullmann 12:58, 9 September 2006 (UTC)[reply]
Robert, I think this and the tas variables should be in the other templates as well. In my opinion, all templates should behave the same (to the extent possible). This will make them easier to learn.

A-cai 23:13, 11 September 2006 (UTC)[reply]

Yes, of course they should be the same! The question is: is this (cmn-noun) where we want it, so I can reflect the changes into the others? I think so. Robert Ullmann 20:51, 13 September 2006 (UTC)[reply]
I think cmn-noun is pretty good in its current form. Of course, as I continue to use the templates, I may have more feedback. So far, so good ... I think we have done all we can, given the limited number of contributors of Asian words. I'm hoping that one day, we will attract a lot more native speakers of Asian languages who are also fluent in English. We also need more native English speakers who are fluent in Asian languages. You really need both to get the maximum benefit. I have met very few people in my life who have been absolutely equally proficient in two or more languages. If the brain were a server network, it would be almost as if bilingual people have one server per language (perhaps with shared directories). Maybe the person has an 800GB server with a 2.8GHz CPU for one language, but only a 400GB hard drive with a 1.4GHz CPU for the other. Anyway, once we have more bilingual people contributing, we can start to involve them in the development of the templates and other formatting issues.

A-cai 21:23, 13 September 2006 (UTC)[reply]

My friend Stephen who lives in my building was raised in Japan, and is also native-speaker fluent in English. I'm going to be asking him things ;-). I will reflect the changes to the other templates; just did cmn-verb. Robert Ullmann 21:28, 13 September 2006 (UTC)[reply]
Done. We also have a shortcut WT:AC. Robert Ullmann 11:41, 14 September 2006 (UTC)[reply]

Please check, if you have a chance! bd2412 T 23:08, 5 September 2006 (UTC)[reply]

done.

A-cai 09:28, 6 September 2006 (UTC)[reply]

Voting[edit]

I would like to point out to anyone who hasn't voted yet that there are at least four votes going on at the moment that everyone has a vested interest in, 4 Checkusers, 2 Admins, 1 new logo and 1 boardmember, the more the merrier when it comes to these votes, especially the checkusers which requires 25 votes before anyone can be appointed, and the board vote which determines the course of Wikimedia! - TheDaveRoss 15:45, 12 September 2006 (UTC)[reply]

A little help...[edit]

We need someone who has an understanding of Japanese to take a look at User:Gliorszio's contributions. It appears that he has been removing things from lots of South-east Asian entries, and we need to know whether they are good removals or vandalism. Thanks - TheDaveRoss 22:14, 25 September 2006 (UTC)[reply]

Based on User:Gliorszio's user page, he is a native speaker of Japanese. User:Gliorszio does not provide any information about proficiency in languages besides Japanese. Based on User:Gliorszio's level of non-communication on talk pages etc, User:Gliorszio is either extremely introverted, or very shaky in English. I have observed a number of grammatical errors in his English posts on talk pages which suggest an English writing level of no more than one or two. His English reading proficiency is more difficult to assess, since anybody can look things up in dictionaries, and English is probably the best documented language in the world. However, no amount of dictionary work will help you to quickly read and understand the voluminous policy discussions on Beer parlor, if you don't already possess sufficient skills in English. I believe this to be the reason that User:Gliorszio has largely ignored such discussions, and is therefore unaware that his edits are not in accordance with the wishes of other contributors. Another possibility is that he is aware of the discussions, but chooses to ignore them because he thinks his way is better. I would rather give him the benefit of the doubt, and assume that the problem is his lack of ability to effectively communicate in English.
Therefore, I believe that User:Gliorszio is qualified to edit meanings and usages for Japanese terms (where knowledge of the subtleties of English is not required). However, he seems unwilling or unable to conform to the format guidelines for English Wiktionary. Since he has not indicated nor demonstrated fluency in any other language besides Japanese, I would view edits to non-Japanese terms with skepticism. If my assessment of User:Gliorszio's abilities is incorrect, it is up to User:Gliorszio himself to prove us all wrong. The only currency we trade in on Wiktionary is the quality of our user contributions.

A-cai 23:07, 25 September 2006 (UTC)[reply]

A-cai, I'm the one who raised this issue; while I agree with most of what you say, I disagree on one point: he is ignoring them because he thinks his way is better. I noticed this because he was de-linking rōmaji forms, both in entries and in templates. He has been reminded/told a dozen times over 15 months not to do this, but has persisted. His responses (when he responds) demonstrate clearly that he knows he is contravening community standards. (See his user talk page, 15 Feb 2006: "It's not the needing reason for the links. People who want to know Japanese are must learn Japanese characters.") I don't want to drive him away, just somehow get him to catch on? Robert Ullmann 12:05, 26 September 2006 (UTC)[reply]

Quick reminder[edit]

The English Wiktionary does not use templates (e.g. {{cmn}}) for languages, neither in headings, nor in translation sections. Thanks! --Connel MacKenzie 09:13, 26 September 2006 (UTC)[reply]

My mistake, I had seen others doing this, and had assumed that this was the new way of doing things. Two questions:
  1. Where is this no-templates rule documented?
  2. If the templates ({{cmn}}, {{en}} etc.) are not for that, exactly what are they for?

A-cai 09:40, 26 September 2006 (UTC)[reply]

It's hidden in Wiktionary:Index to templates/languages:
Please note: These generally make alphabetizing translations more difficult; these should be "subst:"'ed as they reappear (being pasted in from other language Wiktionaries.)
I recall it being mentioned much more explicitly in WT:ELE, as well, but that section seems to have been modified quite a bit.
Those templates exist so that they can be subst:'ed, as visitors from other language Wiktionaries plunk translations in from elsewhere, in the format they are familiar with. Usually, this is problematic for en.wikt:. So, when they are encountered, they are subst:'ed. --Connel MacKenzie 10:28, 26 September 2006 (UTC)[reply]
Chinese Wikipedia has an article on:
Wikipedia zh
A-cai, I'm sorry, I should have mentioned this the first time I noted it. No, there isn't clear documentation and there should be. The templates are useful either to subst: to get the standard language name (e.g. {{sw}} will always get you "Swahili" which is the name we use, even though "Kiswahili" is also considered English.) But also for templates like {{wikipedia}} with a lang= option: lang=zh will link to zh.wikipedia.org and also display "Chinese Wikipedia has ...". Robert Ullmann 12:22, 26 September 2006 (UTC)[reply]

Pinyin number system transliteration entries[edit]

Your thoughts would be welcome at Wiktionary:Beer parlour#Pinyin number system transliteration entries. Cheers! bd2412 T 23:30, 27 September 2006 (UTC)[reply]

Admin?[edit]

I've been thinking that A-cai with Admin powers would be a good thing for the project. Any objection to my nominating you for the same? bd2412 T 02:36, 2 October 2006 (UTC)[reply]

I guess it would be fine. I'm not sure I would do anything different from what I do now. How does that work? Do they just vote on it and then change your account settings?

A-cai 10:18, 2 October 2006 (UTC)[reply]

That is pretty much exactly how it works. I think you'll find it useful having the ability to delete pages, and to rollback and block vandals. I shall nominate you forthwith, and you must accept the nomination at Wiktionary:Administrators. Cheers! bd2412 T 13:03, 2 October 2006 (UTC)[reply]

Remember: you need to go to WT:A and accept ;-) Robert Ullmann 14:00, 3 October 2006 (UTC)[reply]

Ha! That's funny, you just accepted, and you already have enough votes! Well, they'll let it sit a few days to be sure no one has any objections (and to give everyone who's desperate for the excitement a chance to vote), but I think things are pretty much set. Cheers! bd2412 T 01:57, 4 October 2006 (UTC)[reply]

If you would kindly look at WT:A again, I believe there is someone who would appreciate a vote ... :-) Robert Ullmann 18:22, 13 October 2006 (UTC)[reply]

reminder to self[edit]

In the etymology sections, add relevant passages and translations for 南轅北轍 and 三人成虎 from Zhan Guo Ce. A-cai 14:07, 4 October 2006 (UTC) done. A-cai 09:01, 8 October 2006 (UTC)[reply]

We have àirén, the proper pinyin entry; we don't use entries like airen. Why are you trying to make it work? (e.g. see RfD discussion of El Niño and El Nino the latter only allowed if it is written that way, which it seems to be.) If we did allow things like that they would have to be some kind of dis-ambiguation entries. (Would you put àir under air? ;-) Robert Ullmann 18:03, 6 October 2006 (UTC)[reply]

I have to disagree in this case. The pinyin airen is not a misspelling for àirén, nor is it incorrect Pinyin (granted, maybe incomplete). Actually, Pinyin without diacritics is far more common than pinyin with diacritics. In fact, the wikipedia article is under w:airen, not w:àirén. In closing, I offer this link as exhibit a :-) Here is another one.

A-cai 18:52, 6 October 2006 (UTC)[reply]

This is going to make life very difficult; it is hard enough to get simple policy understood, without trying to get the forms in use, and only the forms in use, as entries. (people will keep trying to put in things "to make them easier to find").
BTW: don't use English Wikipedia as a source for this, they have a policy against diacritics, even when the form without is simply wrong. (Note that it isn't under w:airen either, it is at w:Airen; policy there is to capitalize entries (Title Case) whether it is sense or nonsense.) w:Da Vinci? Oh please!
Are we going to have 2 entries for all of the pinyin words? Or the ones that are used? (which would be correct...) Robert Ullmann 19:26, 6 October 2006 (UTC)[reply]
I agree that life would be much easier if everybody would just speak English ;0) I don't know what the correct policy should be for Wiktionary. However, I do feel duty bound to point out that Pinyin without diacritics is a form of written Mandarin. It tends to be used by people who lack Chinese fonts on their computers, and people who can speak Mandarin but do not know Chinese characters (this might include overseas Chinese who grew up speaking the language in the home, but never learned to write in characters, as well as non-native speakers who may have learned to speak but not write Chinese characters). Obviously, they write the Pinyin without diacritics for the sake of convenience. Most native speakers who grew up in the PRC would prefer to read Chinese characters, but are able to understand Mandarin when written in Pinyin without diacritics, since they learn Pinyin in elementary school.
I understand the frustration about creating multiple entries for a single word. I would rather be able to enter the info in one place as well. Until we come up with better software, I'm afraid we will be stuck with such workarounds. Like others, I'm hoping that wiktionaryz will one day solve all of our woes :)

A-cai 19:49, 6 October 2006 (UTC)[reply]

Why should everyone speak English? Why not French (much prettier ;-)? So we should figure this out ... I wouldn't hold out much hope for w-z itself: it became apparent almost immediately that the designers had not known that Japanese could be written in 4 different script forms; and they started trying to figure out how to patch that into their "architecture" instead of realizing that they needed to step back and learn a lot more before doing the basic design!
What do we do when the pinyin-without-diacritics "word" can be 4 or 16 or 64 words? With 3 syllables, and (say) 4 tones, there are 64 possibilities, some of which will be words, most not of course. Think about what you would enter at ma? That is just one syllable. What about at mafan? (That is indeed trouble ... ;-)
In general, an entry will have to refer to a number of others, sometimes many, many others. (Isn't this why the PRC gave up trying to romanize Chinese and settled for essentially copying the Jōyō list? Yes, I know that is a fairly ridiculous oversimplification! ;-) Robert Ullmann 21:25, 6 October 2006 (UTC)[reply]
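
The arithmetic above (4 tones per syllable, so 4, 16 or 64 candidates for one, two or three syllables) can be checked with a short Python sketch. It enumerates tone-numbered spellings only; it says nothing about which of them are real words. The neutral tone is ignored, following the "(say) 4 tones" assumption, and the function name and the three-syllable input are made up for the illustration.

```python
from itertools import product

TONES = "1234"  # assumption from the comment above: four tones, no neutral tone


def tone_numbered_forms(syllables):
    """List every tone-numbered spelling of a toneless syllable sequence."""
    return [
        "".join(syl + tone for syl, tone in zip(syllables, tones))
        for tones in product(TONES, repeat=len(syllables))
    ]


print(len(tone_numbered_forms(["ma"])))                # 4
print(len(tone_numbered_forms(["ma", "fan"])))         # 16
print(len(tone_numbered_forms(["ma", "fan", "ma"])))   # 64
print(tone_numbered_forms(["ma", "fan"])[:3])          # ['ma1fan1', 'ma1fan2', 'ma1fan3']
```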
Part of the problem is confusion about exactly what constitutes a "word" in the Chinese language (Chinese_language#Morphology). Each Chinese character represents a single syllable. From a linguistic point of view, Chinese characters can be thought of as morphemes. Very few of these syllables/characters are free morphemes. Most are bound morphemes. In more simple terms, a "word" in Chinese is a syllable or set of syllables where the meaning can be discerned with little or no context. If you send me a note asking, "what does yi mean in Mandarin?" My answer is, "How is it used? What was the context?" In contrast, if you ask me, "What does yisheng mean in Mandarin?" I say, "It means doctor." How can I be so sure? It is because the other choices for the meaning are either non-existent or too obscure to consider. I think we can make a case for creating an entry for yisheng on Wiktionary. However, as you can see for the entry for yi ... well, let's just say that it is not particularly informative (unless you want an index of affixes).
Part two of your question could be restated in the following way, "what is the maximum number of possible syllables for an individual Chinese word?" Like any language, this is a tricky one because of idioms. We think of an idiom as a word or collection of words, depending on length and familiarity. When should a fixed collection of words be entered into Wiktionary as a single entry? My answer is, "when the sum is greater than the individual parts." Let me provide you with concrete numbers from one of the largest on-line Chinese dictionaries (Guoyu Cidian).
Number of characters    Number of entries
1                       9721
2                       84836
3                       19535
4                       35035
5                       2479
6                       1188
7                       636
8                       693
9                       64
10                      194
11                      16
12                      23
Part three of your question, "how many of these entries require Pinyin entries in wiktionary?" This is of course a subjective call. I have been trying to get more methodical about making entries. That's why I decided to start making my way through the HSK list, starting with Appendix:HSK list of Mandarin words/Beginning Mandarin#a. I'm holding off on single character entries for now because of the Nanshu copyright problem (I don't want to edit a bunch of stuff, only to have it deleted!). Beyond the shadow of a doubt, it is a good idea to include pinyin (with and without diacritics) for all beginning and elementary Mandarin words. Beyond that, I guess it depends on how motivated I am (unless the Chinese speaking cavalry arrives at Wiktionary).
As to your final question about why Pinyin did not replace Chinese characters completely. Obviously, this is a complicated question that ties into cultural/national identity issues etc. I believe that Mandarin (or any language) can be written with latin letters. Pinyin may very well be less than ideal for this, because it does not deal with the issue of homophones. In English, we have right, write and rite. The spelling and context tell you which meaning should be inferred. Such variation has not taken place in Romanized Chinese writing because of the pervasiveness of Chinese characters. Now that we have computers, there is probably even less reason to abandon Chinese characters than there was in the 1920's when people like Lu Xun originally proposed the idea.

A-cai 01:02, 7 October 2006 (UTC)[reply]

Yes, I see (and I understand about "words", that's why the "scare quotes"). And the last wasn't a question ;-) But here is the question: what are we going to do when an entry, without diacritics, represents a number of words? e.g. mafan which has several? (not sure how many, I only know two, máfan troublesome, máfán trouble) with the diacritics, they are separate entries. Without? Are we not going to have lots of these entries that are for many words? How does this get formatted? Robert Ullmann 01:33, 7 October 2006 (UTC)[reply]
I actually don't think it will be that bad for multi-syllable words (yiyi might have about 12 or so). Besides, take a look at all of the different meanings in English for the word right. Perhaps if English had a similar writing system to Chinese, each meaning for right would be represented by a different character. Well, I was going to link you to an example that I created a while back, but I can't seem to find it. Anyway, I'm thinking of something like this (but with your template):

Noun

changshi (Pinyin chángshí, Traditional 常識, Simplified 常识)

  1. common sense

changshi (Pinyin chángshì, Traditional and Simplified 常事)

  1. common matters

Verb

changshi (Pinyin chángshì, Traditional 嘗試, Simplified 尝试)

  1. to attempt; to try

or

Noun

changshi (Pinyin chángshí, Traditional 常識, Simplified 常识)

  1. common sense

Noun

changshi (Pinyin chángshì, Traditional and Simplified 常事)

  1. common matters

Verb

changshi (Pinyin chángshì, Traditional 嘗試, Simplified 尝试)

  1. to attempt; to try

A-cai 02:05, 7 October 2006 (UTC)[reply]

See changshi. Question: how does this entry sort into cat Mandarin nouns? I didn't put the pint= parameters in, but that isn't the solution: the entry can only sort in the category once, so it would just use the second one. But we don't want it to appear at "changshi", that is after "changN..." for any given tone number N. Must noodle this a bit. Robert Ullmann 11:29, 7 October 2006 (UTC)[reply]

Looks good! I'll defer to you about the nuts and bolts under the hood. I would say that it should either not be included in cat Mandarin nouns (since we will already have the diacritics versions in there) or the non-diacritics one should be before the diacritics entries in the list. ex.
  • changshi
  • chángshí
  • chángshì
One way would be to do pint=chang0shi0. I don't relish the thought of doing that, but I can't think of any other easy way at the moment. Perhaps you'll come up with something more elegant.

A-cai 11:38, 7 October 2006 (UTC)[reply]
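The pint=chang0shi0 idea above can be sketched in code. This is only an illustration, not how any Wiktionary template actually works: the function name sort_key is hypothetical, and it assumes the caller supplies the syllables already split. A syllable with no tone mark gets tone 0, so the toneless form sorts ahead of every toned form.

```python
import unicodedata

# Combining marks used for the four Mandarin tones, mapped to tone numbers.
TONE_MARKS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}


def sort_key(syllables):
    """Build a tone-numbered sort key from a list of Pinyin syllables.

    Hypothetical helper: a syllable without a tone mark is given tone 0.
    """
    parts = []
    for syllable in syllables:
        tone = "0"
        base = []
        # NFD splits a letter such as "á" into "a" plus a combining mark.
        for ch in unicodedata.normalize("NFD", syllable):
            if ch in TONE_MARKS:
                tone = TONE_MARKS[ch]
            else:
                base.append(ch)
        # NFC recombines anything that is not a tone mark (for example "ü").
        parts.append(unicodedata.normalize("NFC", "".join(base)) + tone)
    return "".join(parts)


entries = [["cháng", "shì"], ["chang", "shi"], ["cháng", "shí"]]
for syllables in sorted(entries, key=sort_key):
    print("".join(syllables), sort_key(syllables))
```

Sorting on these keys puts changshi first, then chángshí, then chángshì, which matches the order in the list above.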

The problem with that (and it was my first thought too) is that it is non-obvious. It wouldn't be too hard to document, but most people don't read the documentation, they just copy existing entries. (That's why you need a critical mass of entries done a particular way before other people will get the idea.) People will persist in putting in the "correct" pinyin-with-tones. Robert Ullmann 12:00, 7 October 2006 (UTC)[reply]
Okay, I've set it up so that it will sort correctly if there is one diacritical form, and the usual pint parameter. If there is more than one (and both the same POS), it will sort with the last one. This isn't perfect, but any individual case can be fixed if someone cares. Meanwhile if I think of something else? -noun and -verb for now; we'll see how it works out. Robert Ullmann 12:40, 7 October 2006 (UTC)[reply]
Take a look at baba. This is how I see Mandarin fitting into this part of the picture.

A-cai 20:20, 7 October 2006 (UTC)[reply]

Another interesting use of Pinyin without diacritics is in computing (filenames, directory names etc). For example: www.sdau.edu.cn/chengjiao/bangongshi.html

A-cai 20:32, 7 October 2006 (UTC)[reply]

Oh good. Remember to add new inflection templates to WT:IT and WT:AC Robert Ullmann 12:49, 7 October 2006 (UTC)[reply]

Your comment

Thanks for the post on my "discussion" page. We all have our own levels of skill, and add "from each according to his ability" :) We also, I think, have the same aims -- of course, we must know that because we've been working together for some time now. We build on one another's strengths and often I've begun a stub and you'll immediately step in and fix it or add to it. That seems a good working process. I was happy that Mandarin and Cantonese (and now, gradually, Min Nan) appeared for all the distinct hanzi pages and perhaps I extrapolated that to phrases as well. Your point that neither of us speaks Cantonese (yet) is well taken, though. I simply wish that redlinks for Cantonese remain so that Cantonese speakers may step in and add the Cantonese. If no spot for missing Cantonese exists, my worry is that future editors may neglect to fill it in. I hope we can understand each other's motivations better now. Badagnani 07:10, 9 October 2006 (UTC)[reply]

I understand the desire to see other Chinese dialects put into Wiktionary. Even though most visitors to Wiktionary are probably more interested in Mandarin, the other dialects have had an undeniable influence on Mandarin (and vice versa), and that should be documented. The main thing about dialects (or any entry to Wiktionary for that matter) is that it is important to have accurate information. This is sometimes very difficult to come by, and can only be verified by a person knowledgeable in the language. For example, one of the Min Nan words that I entered today was 孽詰仔話孽诘仔话 (gia̍t-khiat-á-ōe). You will find this word in exactly zero other dictionaries, whether they be Min Nan/English Min Nan/Mandarin ... it doesn't matter!!! Until today, you could not find this word in a dictionary (Please let me know if you ever do find it!) It is actually a fairly common word[10], so why is it not in a dictionary somewhere? The reason is that, like most Chinese dialects, people often don't know how to write Min Nan (either with Chinese characters or with an established phonetic system). This is one of the reasons you see so much variation in the way Min Nan is written. In some cases, nobody knows what the Chinese characters should be. In other cases, there simply are no Chinese characters for that word! In order to find the correct characters for this particular word, I had to say it to myself out loud (so that I could get an idea of where to even start looking), and then do a lot of research before finally finding the right characters.
At any rate, there may be value in a non-expert in the language adding words from a credible source to Wiktionary in the absence of fluent speakers (which is the current situation with Cantonese). I'm thinking of raising the issue at beer parlor. One good example is the word 的士 (Taxi). Again, I don't speak Cantonese, but I did come across the following article on Wikipedia: Hong Kong Cantonese#Imported loanwords. You can also find the word on Cantodict[11]. So how do we put this information into Wiktionary while still conveying that the information has not been evaluated by a fluent Cantonese speaking Wiktionary contributor? I'm thinking maybe some kind of "unevaluated" or "needs checking" tag might be in order.
Finally, I understand your desire to have Cantonese speakers fill in red links to individual entries. I just don't think you're going to get very far with this approach, because it will be next to impossible for the Cantonese speaker to methodically find the red links so that he may fill them in. With Wiktionary software in its current state, he would basically have to randomly stumble across it. If you're serious about getting someone to make Cantonese entries, I think what we need is a Wiktionary appendix for Cantonese words that need entries. Such a list would be similar to Appendix:HSK list of Mandarin words, but for Cantonese (for words not on the list, you can use Wiktionary:Requested articles:Chinese). No such list exists for Min Nan, so far as I know. However, perhaps one exists for Cantonese. Failing that, a Cantonese speaker could do what I have been doing for Min Nan, which is to put the Min Nan pronunciation for cognates (or a "see also" section, if a cognate does not exist in Min Nan). A-cai 10:09, 9 October 2006 (UTC)[reply]
It took me a while to figure out why you were saying "redlink", when you seemed to be talking about adding a blank Cantonese language section to entries. That is not a redlink.
By adding a blank or mostly blank section you are accomplishing exactly the opposite of what you intend! It makes it look like the entry for Cantonese is present to any kind of 'bot or other tool. If we have a list of Cantonese words as A-cai suggests, a 'bot can make a list of missing entries, that some native or knowledgeable speaker can work on. If you've added a Cantonese section to an entry without completing it—since none of us can—the bot will see it as present and skip that word as already done! Robert Ullmann 17:27, 9 October 2006 (UTC)[reply]
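To make the point about bots concrete, here is a minimal sketch of the kind of check such a tool might run. It is not the code of any actual Wiktionary bot: the function names and the in-memory pages dictionary are assumptions for illustration only. A tool this simple only looks for a level-2 language header, so an empty Cantonese section is indistinguishable from a finished one.

```python
import re


def has_language_section(wikitext, language):
    """Return True if the wikitext has a level-2 header for the language."""
    pattern = r"^==\s*" + re.escape(language) + r"\s*==\s*$"
    return re.search(pattern, wikitext, flags=re.MULTILINE) is not None


def missing_entries(wordlist, pages, language):
    """List the words whose page lacks a section for the language.

    `pages` maps a page title to its wikitext; a title that is absent
    is treated as a page that does not exist yet.
    """
    return [
        word
        for word in wordlist
        if not has_language_section(pages.get(word, ""), language)
    ]


# Hypothetical page contents, invented for this example.
pages = {
    "的士": "==Mandarin==\n===Noun===\n# taxi\n",
    "你好": "==Cantonese==\n\n==Mandarin==\n===Interjection===\n# hello\n",
}
print(missing_entries(["的士", "你好", "早晨"], pages, "Cantonese"))
```

The page with the blank Cantonese header (你好 in this made-up data) is skipped as already done, while the other two words are reported as missing.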

Hmm, interesting points. Regarding the number of Cantonese speakers here, I know there are a lot at English Wikipedia, as they're active on the Hong Kong and other similar articles. It's just a question of bringing them here and interesting them in promoting the documentation of their language. Good job with the figuring out of the characters for that phrase (wouldn't one call it a phrase or idiom rather than a word?). Badagnani 03:22, 10 October 2006 (UTC)[reply]

No, it's a word. The meaning is rather straightforward to native speakers. However, there are a large number of native speakers who may not know the proper Chinese characters for the phrase. This phenomenon is similar to Vietnamese, which was also written with Chinese characters until recently. Nowadays, only scholars tend to know how to write Vietnamese using Chinese characters. Min Nan is not quite as severe, only because many Min Nan speakers are also fluent in Mandarin, and therefore do not find it difficult to pick out words that have a Mandarin cognate. This can be tricky however, since sometimes the vernacular Mandarin character may not match the original Min Nan character, even though the words are clearly connected. For example, the Mandarin and Min Nan word for "what" both share common phonological characteristics:
As a result, the Min Nan word is often written with the same characters as in Mandarin. However, the "correct" characters are different:
This may sound nitpicky, but there is significance to the use in Min Nan of those particular characters. This is because unlike Mandarin, the phoneme siáⁿ can mean "what" in other combinations (such as siáⁿ-lâng 啥人 [ɕiã˥˥laŋ˧˥] lit. what person; who). Also unlike Mandarin, mi̍h has etymological significance (mi̍h-kiāⁿ 物件 [miʔ˨˩kiã˧˧], same as the Mandarin word 东西 thing). As a result, it can be a challenge to decide what the "standard" Min Nan characters should be. As I understand, Cantonese had the same problem. However, a committee in Hong Kong standardized a number of Chinese characters that are unique to Cantonese such as and etc. This has not been effective in the case of Min Nan. My theory as to the reason is that not all Min Nan speakers desire to write Min Nan in Chinese characters (hence Min Nan wikipedia in romanized POJ).

A-cai 05:06, 10 October 2006 (UTC)[reply]


One more thing, aren't there multiple Min Nan languages/dialects? The one you're calling Min Nan is the language also called Hokkien that is spoken in Xiamen (Amoy) and Taiwan, correct? -- but Wikipedia has just created a new page for something called the Fuzhou language which is apparently some other kind of Min Nan -- and aren't the Chaozhou also a Min Nan language? So I'm not sure how you are using the term Min Nan, A-cai. Badagnani 03:25, 10 October 2006 (UTC)[reply]

When I speak of Min Nan, I mean the language that is described in the Min Nan wikipedia article (in particular Quanzhou, Zhangzhou and Xiamen speech, which is what you will find at Min Nan wikipedia and Min Nan wiktionary). It is also known as Taiwanese, which has its own article, because Taiwanese does contain words that are nonexistent in Fujian (similar to vocab differences between British and American English). In cases where a word is specific to a certain region, I have been tagging it as such in the definition line (ex. pak-lī-ko). The two main accents in Min Nan are Quanzhou and Zhangzhou (kind of like British and American English). The article also lists Xiamen, but actually it is more of a hybrid of Quanzhou and Zhangzhou accents than an actual accent itself (IMHO). People living in the northern areas of Taiwan primarily speak Quanzhou, whereas Zhangzhou is spoken in the southern areas of Taiwan. Since the accents are identical to their place of origin, I have chosen to use Quanzhou and Zhangzhou in the pronunciation section for cases where there is a difference in pronunciation (ex. 未僫), so as to cover all bases. The language in Fuzhou is actually Min Dong, and is commonly known as the Fuzhou dialect (there is now a Min Dong wikipedia). Chaozhou, also known as Teochew, is a prominent variant of Min Nan. According to the Teochew article, it is about 50% mutually intelligible with Xiamen speech, but then it says that it is mutually intelligible with Zhangzhou speech. Maybe I'm missing something, because I am familiar with both Xiamen and Zhangzhou speech, and while there are some notable differences in pronunciation and vocabulary, they don't strike me as that different (maybe like the difference between New York and California for American English).

A-cai 05:06, 10 October 2006 (UTC)[reply]

Sysop

You'll find a few new buttons at the top of your screen: delete and protect, plus a one-click rollback function if you're looking at a diff and block if you're anywhere near a user name. Common sense will go a long way toward their proper use, but you might wish to read (or improve) Help:Sysop tools, too. You can also ask any of the rest of us via talk pages, email or IRC. I'll add you to the list in a minute, so please take a look and make sure I didn't miss anything on your line.

Congratulations on your successful nomination, and welcome aboard. :) Dvortygirl 04:58, 16 October 2006 (UTC)[reply]

Aha! Congrats! Never any doubt. Cheers! bd2412 T 13:53, 16 October 2006 (UTC)[reply]
Yes! Congratulations! Would you go into "my preferences/editing" and check off "mark my edits as patrolled" please? ;-) Robert Ullmann 23:06, 17 October 2006 (UTC)[reply]

Min Nan Wikipedia article

Hello, would you take a look here: http://en.wikipedia.org/w/index.php?title=Min_Nan&diff=82569531&oldid=82507782 and see if you agree with this person's edit? Badagnani 04:56, 20 October 2006 (UTC)[reply]

And this one: http://en.wikipedia.org/w/index.php?title=Min_Nan&curid=198045&diff=82576449&oldid=82570048 Badagnani 06:01, 20 October 2006 (UTC)[reply]

I'd tend to agree with the edits (I'm sure A-cai will have a more detailed view ;-). In particular, the majority of people on the mainland and on (I think) Taiwan would be astounded to learn that they spoke "Standard Mandarin"! (even as a second language ...) The second is better than the first. You might just leave that? Robert Ullmann 12:18, 20 October 2006 (UTC)[reply]
It sounds to me like the person who made the edit felt that the bit about Mandarin being the official language was somehow demeaning to Min Nan, relegating it to second class status because it is not an "official" language. Of course, it's difficult to know what the motives of various editors might be. Personally, I think the anonymous editor may be a bit of a conspiracy theorist :) My take on it is that the intention was not to belittle Min Nan as a language, but merely to point out to a western reader that Mandarin is the official dialect (a fact that might not be known to a typical westerner). At any rate, Mandarin is hyperlinked in that section, so it would be easy enough for the reader to click on the word Mandarin if he didn't know anything about Mandarin. I agree with the article about the mutual unintelligibility of the spoken languages. This should start to become easier to demonstrate as I work my way through the HSK list (I'm almost done with the d's for beginning Mandarin). You should start to see that even though many of the Min Nan words are cognates with Mandarin, the pronunciation and usage of the words often vary dramatically between the two. In fact, I'm deliberately putting in IPA so that readers can compare the two pronunciations on equal footing. For example, POJ chheng = Pinyin qing = [ tɕʰiŋ ]. Also, it should become clear that there are a large number of words that Mandarin and Min Nan do not share.

A-cai 13:08, 20 October 2006 (UTC)[reply]

To search by Pinyin

It will be easier to search if Pinyin is written in different forms, such as: sky - 天空 (tiānkōng, tian1kong1, tiankong).

The other forms are being entered. But they should not all be given in the translations section. --EncycloPetey 22:49, 19 October 2006 (UTC)[reply]
Is it possible to have a computer program treat the different forms of Pinyin as equivalent, such as (tiānkōng = tian1kong1 = tiankong = tian kong)?
The best person to ask might be User:A-cai --EncycloPetey 23:06, 19 October 2006 (UTC)[reply]
Perhaps we could eventually do a combination of templates and expandable sections. For example, tiānkōng would appear by default, then if you expand the section, the rest appear.
I agree with tiānkōng = tian1kong1, but tiānkōng = tiankong is problematic because it is not necessarily always a one for one. See chengshi for an example of what I mean.

A-cai 22:47, 20 October 2006 (UTC)[reply]
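The one-to-one versus one-to-many point can be shown with a small sketch. Nothing here is an existing Wiktionary feature: the toneless and build_index names are hypothetical, and the sample data is the changshi set from the earlier thread. All the written forms collapse to one toneless search key, but going back from that key yields several distinct words.

```python
import unicodedata
from collections import defaultdict

# Combining marks for the four Mandarin tones (macron, acute, caron, grave).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def toneless(pinyin):
    """Reduce any written form of Pinyin to a toneless search key.

    Tone diacritics, tone digits and spaces are dropped, so
    "tiānkōng", "tian1kong1" and "tian kong" all give "tiankong".
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = [
        ch
        for ch in decomposed
        if ch not in TONE_MARKS and not ch.isdigit() and not ch.isspace()
    ]
    return unicodedata.normalize("NFC", "".join(kept)).lower()


def build_index(entries):
    """Group (pinyin, hanzi) pairs under their toneless key."""
    index = defaultdict(list)
    for pinyin, hanzi in entries:
        index[toneless(pinyin)].append((pinyin, hanzi))
    return index


index = build_index([
    ("chángshí", "常識"),
    ("chángshì", "常事"),
    ("chángshì", "嘗試"),
    ("tiānkōng", "天空"),
])
print(index[toneless("tian1kong1")])
print(index["changshi"])
```

A search on the toneless key therefore has to return a list of candidates rather than a single entry, which is why the toneless page needs to list every reading.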

I think the direction taken at chengshi is excellent (listing all the various pronunciations and meanings in Chinese of this romanization), as I don't think any similar dictionary exists on the Internet, and this serves as an excellent educational tool. Badagnani 18:52, 21 October 2006 (UTC)[reply]

zh-tw, zh-cn, etc.

I need a little help identifying the difference between the numerous zh categories, since I can't find the explanations in the categories themselves. I'm worried I'm screwing it up and if I am I'd like to stop as soon as possible. :) Ric | opiaterein 19:37, 22 October 2006 (UTC)[reply]

cmn, nan and yue are all ISO 639-3 codes. They all belong to the macro language code zh from the original ISO 639.
zh-cn Simplified Chinese as used in the PRC
zh-tw Traditional Chinese as used in Taiwan
cmn Chinese Mandarin
nan Min Nan
yue Cantonese
I'm not sure what the intention was for cmn, nan and yue, but since there are no codes for romanizations such as Pinyin, Jyutping, POJ etc, I decided to use them for that. This limits each one to the main phonetic system (e.g. cmn for Chinese Mandarin Pinyin, nan for Min Nan POJ, yue for Cantonese Jyutping etc). I debated switching all the zh codes to cmn for the sake of consistency, but have not done so as of yet. More people are familiar with zh than cmn (if they know the codes at all). So here is the way it breaks down:
  • zh-cn = Chinese (Mandarin) in simplified script, per PRC usage
  • zh-tw = Chinese (Mandarin) in traditional script, per Taiwan usage
  • zh = Chinese (Mandarin) in romanized Pinyin
  • nan-cn = Min Nan in simplified script, per PRC usage
  • nan-tw = Min Nan in traditional script, per Taiwan usage
  • nan = Min Nan in romanized POJ
  • yue-cn = Cantonese in simplified script, per PRC usage
  • yue-hk = Cantonese in traditional script, per Hong Kong usage
  • yue = Cantonese in romanized Jyutping
I took it upon myself to work all of this out. In the absence of feedback from a large number of contributors, I tried to work out a system that I felt would make logical sense. Time will tell whether it is the best system, but at least it is a system (that a bot could later modify as needed). I will try to write something in WT:AC about the categories, but I agree that perhaps the best way would be to make a note in the categories themselves.

A-cai 22:34, 22 October 2006 (UTC)[reply]
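Since the scheme is regular (a language code, optionally followed by a region suffix), it can be written down as two small lookup tables. The sketch below is only a restatement of the list above in code; the describe function and the table names are made up for illustration and are not part of any template.

```python
# Language part of the category code: (language name, romanization).
LANGUAGES = {
    "zh": ("Chinese (Mandarin)", "Pinyin"),
    "nan": ("Min Nan", "POJ"),
    "yue": ("Cantonese", "Jyutping"),
}

# Region suffix: script and the usage it follows.
REGIONS = {
    "cn": "simplified script, per PRC usage",
    "tw": "traditional script, per Taiwan usage",
    "hk": "traditional script, per Hong Kong usage",
}


def describe(code):
    """Expand a category code such as "zh-cn" or "nan" into a description."""
    language, _, region = code.partition("-")
    name, romanization = LANGUAGES[language]
    if not region:
        # A bare language code stands for the romanized form.
        return f"{name} in romanized {romanization}"
    return f"{name} in {REGIONS[region]}"


for code in ("zh-cn", "zh", "nan-tw", "yue-hk", "yue"):
    print(code, "=", describe(code))
```

An unknown code raises a KeyError, which is probably the right behaviour for a checker that is meant to catch entries filed under the wrong category.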

P.S. I did create two templates a while ago that could be used more often: {{zh-simplified}} and {{zh-traditional}}:

Template:zh-simplified

and

Template:zh-traditional

See Category:zh-cn:Beginning Mandarin for an example of their use. Perhaps these and other similar templates should be placed in the categories for all of the Chinese dialects.

A-cai 22:42, 22 October 2006 (UTC)[reply]


Thanks :) Just wanted to make sure I was putting things in the right categories. Ric | opiaterein 21:20, 23 October 2006 (UTC)[reply]

Han characters

Tell me what you think of how they are sorting out in Category:Han characters. I think it looks pretty good. Robert Ullmann 00:15, 24 October 2006 (UTC)[reply]

Looks fine to me. A couple of comments:
  1. the TOC still does not work on my browser (IE).
  2. simplified radicals - 马 is being placed after 馬. There are quite a few publications that do it this way. However, most of the PRC publications that have a stroke/radical index do it by number of strokes for the radical. So 马 would be in the three stroke radical section, not the 10-stroke radical section. Personally, I think it would be good if they were in both locations, but I don't think this is technically feasible.

A-cai 09:05, 24 October 2006 (UTC)[reply]

No, we can't put the entry in two locations. The radicals are sorting on Unicode order: 馬 is 99AC, 马 is 9A6C. (But 马 isn't categorized anywhere yet. Nor anything with that radical. What are you looking at? Or are you saying that will happen? Yes.) The sort order in the category is going to be Unicode-order-of-radical, additional strokes, Unicode-order-of-character.
However the TOC we can do whatever we like with, as long as it isn't too big. It doesn't have to be in the same order as the category, as long as it isn't too confusing.
The TOC works fine for me in IE (6.0/WinXP)? What doesn't work for you? (and we should be improving it anyway, probably using {{cattoc}}) Robert Ullmann 13:27, 24 October 2006 (UTC)[reply]
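The category sort order described above (Unicode order of the radical, then additional strokes, then Unicode order of the character) amounts to sorting on a three-part key. The sketch below is an illustration only, not the category code itself: han_sort_key is a hypothetical name, and the radical and stroke data for the four sample characters are typed in by hand.

```python
def han_sort_key(char, radical, additional_strokes):
    """Sort key: radical code point, additional strokes, character code point."""
    return (ord(radical), additional_strokes, ord(char))


# (character, radical, strokes beyond the radical), hand-entered sample data.
samples = [
    ("马", "马", 0),
    ("媽", "女", 10),
    ("馬", "馬", 0),
    ("妈", "女", 3),
]

ordered = sorted(samples, key=lambda item: han_sort_key(*item))
for char, radical, strokes in ordered:
    print(char, f"radical U+{ord(radical):04X}", strokes)
```

With this key the simplified radical lands after the traditional one purely because its code point is higher, not because of its stroke count, which is the behaviour being discussed in this thread.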
When I click on one of the blue linked characters, the characters with that radical should become the first item on the screen. But that isn't happening in my case. I agree that, in any case, we will eventually need a better design.

A-cai 13:36, 24 October 2006 (UTC)[reply]

What does happen? The only thing I can think of is that if you click on a radical that isn't present, you will get the following one in Unicode order (not TOC order). This works in every browser I have access to, and I can't figure out how it can not work; it is the same URL? Robert Ullmann 17:10, 24 October 2006 (UTC)[reply]

Need Min Nan help

There's a need for Min Nan help at w:Red yeast rice. The article states that it's called 紅麴米 / 红曲米 but a jar of Taiwanese pickled tofu with this ingredient calls it 紅糟. User:Sjschen believes that 紅糟 might be the Hokkien way of saying 紅麴米. Is there any chance of that? Badagnani 22:34, 24 October 2006 (UTC)[reply]

紅麴 (POJ Âng-khak, [ aŋ˨˩kʰak˩˩ ]) is definitely a Min Nan term. I found it in two Min Nan dictionaries (as it turns out, I once toured a factory where this stuff is made. The guy referred to it as 紅麴 in both Mandarin and Min Nan). I could not find 紅糟 in a Min Nan dictionary, and have not personally heard this term used. However, there are a number of hits on Taiwanese websites that equate these two terms as synonyms. I also found this glossary on-line which lists a Taiwanese Min Nan pronunciation of âng-chau ([ aŋ˨˩tsau˥˥ ]) for 紅糟.

A-cai 23:33, 24 October 2006 (UTC)[reply]

zh-forms

It would help if you would document how this is to be used on the talk page or in WT:AC. So it is clear that this (like everything else) is to be used in a particular way, and not however each editor thinks it ought to be? Robert Ullmann 20:25, 25 October 2006 (UTC)[reply]

Finished with the rough draft.

A-cai 22:30, 25 October 2006 (UTC)[reply]

I just read it and think it's all very sensible. I think these templates are quite good. Badagnani 23:04, 25 October 2006 (UTC)[reply]

Has anyone mentioned the discussion at Wiktionary:Requests for deletion/Others#Category:しゅ (shu)? Cheers! bd2412 T 04:00, 26 October 2006 (UTC)[reply]

further changes to han char entries

Connel had concerns about the "common meanings" not showing up as # definition lines. Of course, they may go away completely.

I tried putting them inside the Chinese/Hanzi section, where at least it makes a bit more sense. Would you look at and tell me what you think? Robert Ullmann 11:44, 27 October 2006 (UTC)[reply]

Sounds like an ok compromise. The problem I'm having is that the Chinese and Korean sections still don't look much like a standard wiktionary entry. Furthermore, (and this is just me), do we really need three phonetic schemes for each dialect for each freakin' character?!!! :-) Anyone who cares to can go to Template:Pinyintable if they want to see a comparison chart. Note the lack of Yale on the table, I didn't include it because it is not really used anymore. Yale doesn't even use Yale anymore (for Mandarin), they use Pinyin!!! Ok, done with rant ;-)

A-cai 12:31, 27 October 2006 (UTC)[reply]

I'm trying out something. Do you think it is reasonable to have what is now "common meanings" in the Translingual section? From what you said in WT:BP you do ... I had thought it was pretty much impossible to have a common definition.
There isn't any way that the Chinese and Korean sections are going to look like real entries without real POS or POS like information! (Oh, and no we don't need 3 romanizations; that's why for Mandarin and Min Nan (and I think Cantonese ;-) we have picked the correct one.) Robert Ullmann 12:56, 27 October 2006 (UTC)[reply]
Take a look at forum. If we were to organize this entry according to the common meanings scheme, we would have something like:

==Translingual==
===Common meanings===

  1. A place for discussion.

==English==
===Noun===

  1. blah blah

==Italian==
===Noun===

  1. blah blah

==Norwegian==
===Noun===

  1. blah blah

==Latin==
===Noun===
forum

  1. market-/public place

It's probably the best you're going to do with a bot. However, I'm thinking that ultimately, we will end up putting most of the "common meanings" stuff into something like:

==Old Chinese==
===Noun===

  1. sun; day

It will take many years to get it to where I want it, unless we get about 50 to 100 qualified people to do what I do :0) A-cai 13:33, 27 October 2006 (UTC)[reply]

You'd be surprised at how quickly things are coming along, with all of your hard work! Question: what does "POS" mean in this context? Badagnani 07:50, 28 October 2006 (UTC)[reply]

We want the "common meanings scheme" TO GO AWAY. We just haven't got anything else we can do with most of these entries right now! And Connel wants anything that looks like a POS section to have definitions.

POS = Part of Speech, sometimes used very loosely to refer to POS-like headers. See WT:POS Robert Ullmann 10:30, 28 October 2006 (UTC)[reply]

formatting some of the other stuff

So what do you think of what I'm doing now? Robert Ullmann 17:08, 5 November 2006 (UTC)[reply]

is coming along very nicely. Now it's just a matter of working on the individual characters to see what pitfalls we may not have considered. I always seem to discover little ways to improve things by making real entries. It helps me to work through the various scenarios that one might encounter.

A-cai 18:14, 10 November 2006 (UTC)[reply]

Min Nan - 地

Am I correct to assume that the word 地, used as a suffix to form adverbs in Mandarin, is non-existent in Min Nan? Or is there a different word used in place of this word?
-- Hiòng-êng 03:15, 17 November 2006 (UTC)[reply]

地 is not used to form an adverb in Min Nan. The Min Nan equivalent would be 仔 á.
Ex. jî-chhiáⁿ bān-bān-á sit-khì ka-tī ê bûn-hòa (而且家治文化)
moreover, they are slowly losing their culture
The above example is from the Tsou-cho̍k article in Min Nan wikipedia. The Mandarin equivalent would be:
而且失去自己文化

A-cai 12:09, 17 November 2006 (UTC)[reply]

Hi, I tend to think we should keep the Hanzi/cmn-hanzi bit, with the Compounds sub-section. That way the entries serve both as a dictionary of words as well as the kanwa jiten form. Which I think is pretty good; no print dictionary can come close. We can put a lot more in an entry.

Have you noticed the traffic increase? More and more IP edits to characters and the occasional word means more and more people are using these entries. Robert Ullmann 14:02, 25 November 2006 (UTC)[reply]

I thought we were only going to keep the cmn-hanzi part until it could be replaced by a proper POS header. Are you saying that you would rather have both included in the entry? I don't think it would be a problem for most characters. However, did present an interesting problem that occurs with some characters (see Talk:表).
I haven't been paying attention to the number of IP edits lately. However, 多多益善.

A-cai 22:38, 25 November 2006 (UTC)[reply]

The idea (mine, at least) was that we would drop the # {defn} line as redundant, but keep the header, template, and compounds section. The case of seems to me to need multiple etymologies. Then there are two different POS sections, referring to the different meanings. (In some cases, the POS section is just repeated by one editor, then the ety gets added later when someone has the details.) The {zh-forms} template can show multiple traditional characters? Robert Ullmann 02:16, 26 November 2006 (UTC)[reply]
The answer to your question about {zh-forms} is yes (see: tang-pêng).

A-cai 03:37, 26 November 2006 (UTC)[reply]

Wikipedia Min Nan article

An editor is attempting to move the Wikipedia Min Nan article to Minnan (and related articles moved as well). See the discussion page there as they need Min Nan speakers to provide input on this spelling change. Badagnani 08:04, 26 November 2006 (UTC)[reply]

Family names categories

Hi, an editor from Beijing has recently begun creating Chinese family names categories at English Wikipedia, which enable users to locate individuals with the same hanzi surname (though which often use different romanizations). There's a lively debate at the deletion proposals area, which you might want to look in on and vote if it's of interest to you. I would use such categories myself and don't see what the strong objection is about. http://en.wikipedia.org/wiki/Wikipedia:Categories_for_deletion/Log/2006_November_22#Family_name_categories Badagnani 09:46, 26 November 2006 (UTC)[reply]

This word has no entry, but seems to be a Cantonese and Mandarin word for week. Could you perhaps create the entry? Thanks sincerely, --EncycloPetey 03:19, 29 November 2006 (UTC)[reply]

done.

A-cai 03:49, 29 November 2006 (UTC)[reply]

Chinese translations of listen

Hello, could you check the Chinese translation(s) for the word listen, and make sure they are listed in the correct section? Thanks, --EncycloPetey 02:48, 4 December 2006 (UTC)[reply]

Mandarin pinyin in Han character entries

Would you look at User:Robert Ullmann/Mandarin Pinyin and tell me what you think? Robert Ullmann 20:37, 5 December 2006 (UTC)[reply]

  • I can tell from looking at the page that your bot took issue with these entries in some way. I can even guess in some cases what the problem might have been. But, I think it would be helpful if you were more explicit as far as what you would like to be done (if anything). Other than that, it seems to be working the way you designed it to work.

A-cai 04:51, 17 December 2006 (UTC)[reply]

Do you know what the asterisks were intended for? The other things are syntax errors, and pinyin that is either obviously wrong, or inconsistent with the tones variant. And then there are entries where someone put in the pinyin for a word, not for the single character, because the nanshu form left no room for real entries with definitions? Robert Ullmann 18:17, 18 December 2006 (UTC)[reply]
  • The asterisks seem to be identifying the most common pronunciation, where multiple pronunciations exist.

A-cai 21:01, 18 December 2006 (UTC)

Greetings! I'm putting together a model entry for a Chinese all-purpose transliteration (e.g. a transliteration missing the tone, as Americans are likely to mistakenly write). How does shang strike you, for this purpose? Or should we bother having these? Cheers! bd2412 T 17:21, 15 December 2006 (UTC)

  • Actually, it will take a lot longer to do, but I'm in favor of something like the Mandarin portion for ai. I'm basing this not so much on the amount of work involved as on what I would like to see in terms of features for the ideal online dictionary. Yes, I acknowledge that my version might take many years to achieve. However, I don't think that should stop us from toiling away :)

A-cai 04:58, 17 December 2006 (UTC)

    • Wouldn't that ultimately entail a duplication of whatever is in the individual articles for, e.g. shāng, sháng, shǎng, and shàng? Not that duplication is inherently a bad thing, but for some terms that would end up making the entry covering all tones extremely long! Also, in ai, I think the "see also" terms should probably be among those grouped at the top of the entry. bd2412 T 06:14, 17 December 2006 (UTC)
  • Under Wiktionary's current architecture, information would be duplicated. Would such an entry be extremely long? Possibly, if we created separate definitions representing every single Chinese character, whether common or obscure. I'm not convinced that this is necessary. However, I do think it is useful for a student to know that ai can mean short or love in Mandarin depending on the tone and the context. The syllable ai can theoretically mean lots of other things, but I believe some attempt should be made to highlight the more common meanings.

A-cai 06:28, 17 December 2006 (UTC)

    • Well thought out. But would you object to my using the form for shang for the time being (not to undo anything that has been done in the style you recommend, but to create new entries)? Would you recommend any changes to that model, as an interim entry? bd2412 T 17:22, 17 December 2006 (UTC)
  • I have no objections. If you find it useful, maybe others will as well.

A-cai 20:51, 18 December 2006 (UTC)