Japanese basic words

This cat is supposed to be exactly the set of words listed in Appendix:1000 Japanese basic words; it isn't dependent on whether the word is "common" or in particular dictionaries. Cheers, Robert Ullmann 13:49, 30 April 2007 (UTC)

I see. That doesn't explain why the word 西語 is in the appendix itself, though. It is hardly among the 1000 most common or useful words. スペイン語 has the same meaning and is the word almost always used. -- Coffee2theorems 14:05, 30 April 2007 (UTC)
Note that the appendix lists that too; almost all the 1000 are listed in both kana and kanji forms; so ~2000 entries will appear in the category. Yes, writing スペイン語 is about 15 times as common as writing 西語, but the list considers them to be the same word. (Although the latter is a shortened form.) Robert Ullmann 16:42, 30 April 2007 (UTC)
OK, that explains it. But I still don't quite agree with it :-) 西語 is read せいご, so reading-wise it's not a shortened form, and writing-wise it would be short for 西班牙語, an ateji which is rare (or perhaps even obsolete) itself. I'd consider 西語 a rare synonym of スペイン語 formed from the one-character country codes (米=US, 英=England, 伊=Italy, etc.) and forget the full country name ateji altogether for practical purposes. After all, if you see スペイン語 and don't know either that 西 is the country code for Spain or (less likely) that the ateji for Spain is 西班牙, you'll never come up with 西語 as a short form. But you will have no trouble shortening ダイヤモンド to ダイヤ without any additional information.
While it is good to know many of the country codes, I'm not sure 西 is an important one. Also introducing them without any explanation in language names, as well as mentioning rare words as examples of the country codes may be confusing to those who are not already familiar with them. They might get the idea that 西語 is a common word, for instance. Once you know a country code, such as 米, you know what compounds such as 米国, 米語 and 全米 mean. I'd consider them words themselves, just like the particles in the appendix, even though they are only used in compounds -- Coffee2theorems 12:18, 1 May 2007 (UTC)

It is supposed to be just a specific wordlist of 1000 words. Someone (an anonymous IP) added lots of things, which I reverted again. I don't know if it was you. If we/you want a general basic vocabulary/index, shouldn't it be somewhere else? Robert Ullmann 13:07, 10 May 2007 (UTC)

I haven't edited Wiktionary anonymously since I created this account and have no intention of doing so; that would just be confusing. Perhaps the list would get fewer unwanted edits if it clearly stated what exactly it is supposed to be. If you look at its talk page, someone has asked where the list came from, but there's no answer. One might then reasonably suppose that it is a list of the 1000 most common or most useful words and attempt to improve it.
The page does say (now) "Do not add words to this list", but that could be taken as "Don't add words, because then this would no longer be a list of 1000 words. Removing one word and adding a better (whatever 'better' is) one instead is acceptable, as the total is still 1000."
I don't know about a general basic vocabulary list; what would one use as an inclusion criterion? If there were such a thing, it could be implemented the way Template:rare is, so that the list is constructed as a category automatically and it is clearly marked which one of the meanings/uses is intended. I'm not particularly interested in compiling such a list at the moment, as there are lots of other more interesting things to do: adding quotations and new entries, for one thing. -- Coffee2theorems 13:25, 10 May 2007 (UTC)

Re: そびれる

It's just that we don't use the header "Auxiliary verb". If it can stand on its own, the header is "Verb" -- if not, "Suffix". In order to ease parsing by robots, we try to keep the number of different headers to a minimum.

Poking through Google, it seems it attaches to the 連用形 form (-i form) of verbs. So, yes, it seems to work like ます. Unless the sentence "そびれる。" is grammatical, it should be changed to Suffix. And yes, it's a good idea to explain anything unusual about a word in a "Usage notes" section. Cynewulf 20:38, 3 May 2007 (UTC)

It is like ます in that it isn't used alone. Looks like ます also uses "Auxiliary verb" and だ uses "Particle". I hadn't noticed until I did a bit of digging just now that WT:AJA actually explicitly says that Suffix is to be used for 助動詞. I'll change all of ます, そびれる, だ to conform. -- Coffee2theorems 22:07, 3 May 2007 (UTC)


see edit, removing ja-attention and the Hiragana header, adding POS template. Robert Ullmann 11:51, 4 May 2007 (UTC)

Thanks. -- Coffee2theorems 12:02, 4 May 2007 (UTC)


Unicode database (linked from entry) lists the readings as SHU CHU. User "Nanshu" who converted these entries apparently generated the hiragana. (And you'd think he'd get "shu" right? ;-) So I think the shu reading should get put back? Robert Ullmann 16:24, 5 May 2007 (UTC)

I don't doubt that there exists either reading しゅ or しゅう for that kanji, as it's in the unihan database. Neither is in my dictionary so I can't say for sure which one it is. My guess would be しゅう, as many times when there's either ちゅう or しゅう there are both. But e.g. , have readings しゅ and , have readings しゅう, so both readings could be possible. I'd be pretty surprised if it were しゅ though. I only removed it because I didn't want to guess, and don't have a problem if you want to add it back :-) -- Coffee2theorems 16:34, 5 May 2007 (UTC)
The Unihan DB tends to be very good; but then it just says "SHU", not the hiragana. Separate question: why ? Meaning: why is red? I thought Cynewulf was making sure we had all of these ;-) We have of course. Robert Ullmann 16:43, 5 May 2007 (UTC)
The Unihan DB uses "SHUU" for しゅう, so I would say it was good to leave it as しゅ. And ちゅう is written chū. Robert Ullmann 16:48, 5 May 2007 (UTC)
Interesting that it says CHU then; my dictionary (electronic version of 漢字源) gives only チュウ.
Characters and readings that are not (as far as I know) used in modern Japanese are a bit bothersome. 壴 is an example of the former and 手(しゅう) of the latter. Of course one can look them up in a dictionary (or similar, such as unihan db), but ideally there should be attestations for each reading, not just the say-so of dictionary X. I guess digging such things up will be the work of specialists far in the future when Wiktionary is commonly referred to as just "the Dictionary"...
Speaking of kanji-related attestations (-cum-examples), they would be nice for meanings, too. They would need to go to language-specific sections though. -- Coffee2theorems 17:08, 5 May 2007 (UTC)


If you hadn't mentioned it I probably would never have thought to try Google books searching with Japanese... I'm surprised they have so much.

For what it's worth, aozora has versions that aren't in historical kana usage; though there are some instances of creative okurigana.

(reading Sōseki still gives me a headache) Cynewulf 17:43, 12 May 2007 (UTC)

I was surprised too, I didn't really expect them to have any. The stuff I saw with a few quick searches was fairly well off the mainstream (certainly not your average bestseller material :-), and I only got some weird spiritualist stuff when I tried for books that have full text available. I suppose the selection will grow with time though, and some of the material which allows viewing of full pages (but not full book) looked usable. I'll have to look at aozora next..
Now that you mention it, okurigana is also an interesting topic. Marking which forms follow the modern general rules, which ones are allowed exceptions by the same rules, and which ones do not follow the rules at all would be one more useful thing to do. -- Coffee2theorems 18:09, 12 May 2007 (UTC)
I tried that with 終る, as it was recently edited by a native Japanese contributor who marked it as wrong spelling in modern Japanese (not true, it's permissible use by the latest 1981 rules by the then-文部省-now-MEXT, and AFAIK that's the latest rules). As there are plenty of words with okurigana, perhaps a template would be in order. The link will probably have to be changed in the future, too. Specifying which rules were applied for each word is useful, as someone who reads Japanese slowly may want to look at only the relevant parts of the document to verify what's said in the article and perhaps to find out what the exact rule is. -- Coffee2theorems 23:13, 13 May 2007 (UTC)
Good job on 終る -- I've been digging up quotes to try to better show that it's not a misspelling like "coerse" or something. ("alternative spelling" is already inclusionist-speak for misspelling.) Probably a template for that, yes; another template candidate is "This term is usually written as ..." for things like (これ), 一寸 (ちょっと), 為る (する)... Cynewulf 00:53, 14 May 2007 (UTC)

Re: 決まる

I replied on my talk page. Sorry for the delay. I was a bit busy in real life, and the topic at such depth was a bit too delicate for me, a non-expert, to reply to quickly... --Tohru 03:05, 13 May 2007 (UTC)

There's no hurry :-) Thanks for the reply, I'll go read it. -- Coffee2theorems 10:38, 13 May 2007 (UTC)


The net was down here for more than a day; I was going to suggest you go ahead and set up the template ;-)

Think you've got a mis-placed ) though. The name might be shorter? (We don't have to use ja- for a script template) Just "ruby"? (Is it used for anything other than Japanese?) maybe "ja-r". Or "JAruby" which follows a different convention, see Category:Script templates, the template should be in that cat ;-)

Good show! Robert Ullmann 13:47, 17 May 2007 (UTC)

I thought the name could be shorter, but didn't know the naming guidelines, so I mimicked ja-noun etc. :-) It uses markup which specifies the Japanese language, so it's Japanese-specific. The markup is required to force IE to show the ruby with a readable font size (see b:Japanese with IE to see what ruby looks like without it). JAruby sounds good. In retrospect the name "furigana" was not the best choice, as ruby is also used for things other than kana in Japanese. -- Coffee2theorems 14:14, 17 May 2007 (UTC)
When I look at 一歩 I see extra )'s. (FireFox, Ruby enabled) Robert Ullmann 14:52, 17 May 2007 (UTC)
I don't know where it gets them. There's only one ( and one ) in the template. IE does not display them at all (as it shows rubies) and Firefox without rubies does not show extra ones. Looking at the page source, the generated tags look correct to me. AFAIK there's only a third-party ruby extension for Firefox; perhaps there's a bug in it? Maybe it only works with full-width parentheses or something? How do these look? (あさ), あさ. -- Coffee2theorems 15:09, 17 May 2007 (UTC)
Your example is fine, but the template wasn't. I moved the rp tags outside the span for font size; fixed. (Subtle rules for what spans can go around ;-) Robert Ullmann 15:22, 17 May 2007 (UTC)
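For readers following the rp fix, here is a rough Python sketch of the kind of markup being discussed. The actual template source is not quoted in this thread, so the function name `ruby` and the exact span attributes are assumptions; the only point illustrated is that the `<rp>` fallback parentheses sit directly inside `<ruby>`, outside the styled span.

```python
from html import escape

def ruby(base: str, reading: str) -> str:
    """Build ruby markup with fallback parentheses (hypothetical layout).

    The <rp> elements are siblings of <rt>, not children of the span,
    which mirrors the fix described above. The lang="ja" span stands in
    for whatever font-size styling the real template applies (assumption).
    """
    return (
        "<ruby>"
        f"{escape(base)}"
        "<rp>(</rp>"
        f'<rt><span lang="ja">{escape(reading)}</span></rt>'
        "<rp>)</rp>"
        "</ruby>"
    )

print(ruby("朝", "あさ"))
```

Browsers with ruby support hide the `<rp>` content; others show the reading in parentheses after the base text.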
And I just installed the extension to see how to get it to work, but you beat me to the punch :-) It works fine in IE, too. (had to check, because I don't normally use IE) -- Coffee2theorems 15:36, 17 May 2007 (UTC)
By the way, do you think we should have an infobox like that one at Wikipedia mentioning the extension on pages that use rubies? -- Coffee2theorems 15:44, 17 May 2007 (UTC)


It looks like we need to work on what to call Japanese verb forms -- what our conjugation tables say is the Continuative is the 連用形, i.e. 言う -> 言い; but いる suffixes to the -te ("Conjunctive") form, forming 言っている, which is shortened to 言ってる.

Some thoughts: Do you propose that た is a suffix that attaches to some form of verbs, the same as you're implying with て? How do you then account for the っ sokuon that appears before both of these in certain conjugations?

I don't know what's taught in 国語 classes, but all the (English) books I've read analyze things like this: "The plain form is 言う. To make the past form, drop the う and add った/た/んだ/etc. To make the -te form, make the past form but with て or で instead of た or だ" and so on. No mention of any stem forms other than the actual immutable stem 言.
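The textbook rule quoted above can be sketched in a few lines of Python. This is only an illustration for godan (五段) verbs; the table and function names are made up here, and ichidan and irregular verbs (other than the familiar 行く exception) are not handled.

```python
# Past-tense endings keyed by the final kana of a godan verb's plain form.
PAST_ENDINGS = {
    "う": "った", "つ": "った", "る": "った",
    "む": "んだ", "ぶ": "んだ", "ぬ": "んだ",
    "く": "いた", "ぐ": "いだ",
    "す": "した",
}

def past_form(plain: str) -> str:
    """Drop the final kana and add the matching past ending."""
    if plain in ("行く", "いく"):  # irregular: 行った, not 行いた
        return plain[:-1] + "った"
    return plain[:-1] + PAST_ENDINGS[plain[-1]]

def te_form(plain: str) -> str:
    """Make the past form, then swap the final た/だ for て/で."""
    past = past_form(plain)
    return past[:-1] + ("て" if past[-1] == "た" else "で")

print(past_form("言う"), te_form("言う"))  # 言った 言って
```

Note that nothing in this analysis refers to a stem form such as 言っ; the sokuon is simply part of the ending.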

Since Wiktionary is based on citations of words in print, I don't think that an entry for 言わ would be any help, since it only appears as part of 言わない 言われる 言わせる and so on. So, for 言う, I would have "form of" entries at these three, as well as 言って 言った 言えば 言います 言える -- there are finitely many of them, and they can be loaded by bot.

</ramble> Cynewulf 20:00, 17 May 2007 (UTC)

I agree that the forms need a lot of thought. Earlier I simply assumed we'd follow the analysis used in Japanese school grammar, because WT:AJA says this:
Western explanations of Japanese grammar are so varied that none are definitive and no well-known Western methods are consistent with classical Japanese. Thus, the parts of speech should adhere to the common Japanese way of teaching grammar in modern 国語 kokugo texts, the method with the widest agreement among Japanese scholars.
The school grammar is what all Japanese learn as part of their mandatory education, so it's pretty major, and I assume that's what the above refers to. Now, having studied what it says a bit and how it differs from the Japanese grammar taught to foreigners, I think there are too many differences to completely ignore the latter one. The quote above says that "Western explanations of Japanese grammar are so varied", but there are at least some points where they do not vary much. In these cases I think that both ways should be presented. How exactly to do it for the Japanese as a second language grammar I'm not sure, as I never did study grammar much (except for the basics, obviously) until I knew enough Japanese to read the Japanese explanations of it.
The conjugation tables here are missing some forms. Look at the table at 五段活用, column 連用形. It lists two forms for each verb, and which one is used depends on what follows. (In classical Japanese it's simpler: there used to be 言いて etc.; some of these forms are still very common, by the way.) 言う is ワ行, and its continuative forms are 言い and 言っ. The て and た of 言って, 言った are both 助動詞 which attach to the latter continuative form (言っ).
The conjunctive form does not exist as a separate form in school grammar, but universally does in second language grammar, and is one of the things where I think both explanations should be given. I think we should use the "conjunctive form" term whenever possible if that's what people are familiar with, explaining its correspondence to the school grammar at conjunctive, and at て explain the school grammar 助動詞 (with a see also link to conjunctive form).
The bot idea for conjugations is fine, but we first need to define what's a conjugation and what's not. Certainly we won't be able to put them all in a table in the main article such as 言う, as that would be too overwhelming - the most important will do for that. But how about the conjugated entries themselves, and what to call the forms? E.g. what would be the name of the form 見ては in 見てはいけない? A colloquial form for that is 見ちゃいけない, and I'd create entries ては and ちゃ following the school grammar (and Japanese dictionary) approach. How would the second language approach go? There are a lot more decisions to make in that, and AFAIK they haven't been made yet. One possibility is not to have an entry for every possible form (such as 見ては - is it a conjugation or 見て+は or 見+ては or 見+て+は? All are possible ways to look at it :-), and when you have entries such as ちゃ (like you have -ism in English), the information is at least not missing, just (maybe) a little harder to find.
Having entries for the school grammar conjugations (the stem forms) surely does no harm and there are very few of them. You could also link to them in the etymology sections of entries such as 言わない (言わ+ない) and then list all the conjugations starting with 言わ we have articles for in a derived terms (or some other) section of 言わ. You'd then have a way of finding all the forms of 言う we have entries for without putting them all in a gigantic table or list at 言う.
Likewise </ramble>, -- Coffee2theorems 21:42, 17 May 2007 (UTC)
OK, how's this sound: since we have -s, -ing, and so on, we can have individual pages that analyze the verb-ending morphemes, such as れる. The entry for 知られる can use a template, something like "ja-passive form of|知る", and that template can link to some explanation of the passive, where we can explain such things as 名を知られた "suffered [his] name being known" and say how it's made from the 未然形 and the れる morpheme.
Actually, now that I think of it, it might be going overboard to make an entry such as 言わせられちゃいました.. but I guess we can start with the basics and go from there. Yeah, wikt isn't paper, it doesn't hurt to have it, or 言わ, but theoretically we're supposed to be able to cite all these things if somebody asks. Cynewulf 21:41, 18 May 2007 (UTC)
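As a rough illustration of the formation mentioned above (未然形 plus れる for 知られる), here is a small Python sketch. The names `mizenkei` and `passive` are invented for this example and are not an existing template or bot; the ichidan branch uses られる, and irregular verbs are left out.

```python
# Final kana of a godan plain form -> its a-row (未然形) counterpart.
A_ROW = {
    "う": "わ", "く": "か", "ぐ": "が", "す": "さ", "つ": "た",
    "ぬ": "な", "ぶ": "ば", "む": "ま", "る": "ら",
}

def mizenkei(plain: str) -> str:
    """未然形 of a godan verb, e.g. 知る -> 知ら, 言う -> 言わ."""
    return plain[:-1] + A_ROW[plain[-1]]

def passive(plain: str, kind: str = "godan") -> str:
    """Passive form: 未然形 + れる for godan, stem + られる for ichidan."""
    if kind == "godan":
        return mizenkei(plain) + "れる"
    if kind == "ichidan":
        return plain[:-1] + "られる"
    raise ValueError(f"unsupported verb class: {kind}")

print(passive("知る"), passive("見る", kind="ichidan"))  # 知られる 見られる
```

The verb class has to be supplied by the caller, since it cannot be read off the spelling alone (知る is godan, 見る is ichidan).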
It sounds mostly like a rewording of what I said in the long ramble above :-) Your apprehension with 言わせられちゃいました is similar to what I had at first, and I'm still not absolutely sure that all such things won't take inordinate amounts of space. But by starting with the assumption of infeasibility nothing would ever be done, so I'm now assuming feasibility until proven otherwise. As for quotations for 言わ, we also have prefixes such as re- and im-, and you won't find any better quotations for them either. Or for Japanese. Longer forms may sometimes be problematic in that respect, but maybe that is as well. Getting quotations for the specific form and knowledge of whether it's common or not etc. might be the most interesting part of such entries.
Using templates would reduce page size a lot and allow for easy modification of all forms without letting an army of bots loose at Wiktionary every time someone finds a spelling error. Perhaps just use the suffix instead of "passive" etc.? E.g. "{{ja-ichi-rareru|見|み|mi}}". Sometimes the same form has multiple meanings, e.g. 見られる is both passive and potential. The template would then be the whole content of most such pages. The templates would consist of multiple smaller ones, so if you want to say more in one entry, you can reuse the parts and don't have to modify that page every time someone modifies the parts.
If there are about 20,000 verbs (counting every spelling), each page takes 4KB of space, and there are about 1000 verb forms, then it'd take 76GB to store the pages, which would even fit on my laptop. I don't know the real numbers; for that you'd need to know how the server stores pages (I assumed every page in a separate file on a file system with a block size of 4KB), how many verbs there will be (I counted every line in edict with "(v1" or "(v5" in it, it was about 7,500, and I assumed Wiktionary will have many more spellings) and how many forms there will be (I pulled 1000 out of my hat). No matter how hard you think, I bet you'll have a hard time coming up with 1000 forms, so maybe there's no need to worry. Especially if we don't try adding them all right away with a bot. By the time Wiktionary covers all the verbs in Japanese the whole thing will fit on your mobile phone anyway. -- Coffee2theorems 00:42, 19 May 2007 (UTC)
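The back-of-the-envelope figure above can be rechecked with a short Python sketch. The inputs are the guesses stated in the comment (20,000 verbs, 1000 forms, one 4KB block per page), and the function name is made up for this example.

```python
def storage_gib(verbs: int, forms_per_verb: int, page_kib: int) -> float:
    """Total size in GiB if every form of every verb gets its own page."""
    pages = verbs * forms_per_verb
    total_kib = pages * page_kib
    return total_kib / 1024**2  # KiB -> GiB

estimate = storage_gib(verbs=20_000, forms_per_verb=1_000, page_kib=4)
print(f"{20_000 * 1_000:,} pages, about {estimate:.1f} GiB")
# 20,000,000 pages, about 76.3 GiB
```

So the 76GB quoted above corresponds to binary units; in decimal units the same 80,000,000 KB would be written as 80 GB.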

Unrelated stuff: I like the template. (re 通則3: I'd be interested in seeing a cite of 男 with okurigana...) I'd like to see ja-verbconj made into a collapsing box like you proposed; also add some more forms (causative passive and the other 連用形 stick out) -- and link them. Cynewulf 21:41, 18 May 2007 (UTC)

I haven't seen 男 with okurigana, but just saw  (ひだ) in 吾輩は猫である and am now collecting the pieces of my jaw off the floor :-) -- Coffee2theorems 04:56, 25 May 2007 (UTC)

You could add a subset of inflections by a bot once you verify (by random sampling, if not otherwise) that there aren't many unclear cases in it. Weeding out the odd exceptions later is less work than entering all the entries by hand. I'm still not sure what exactly qualifies as an inflection, but I think that at least the following can be said:

  • Upper bound: Every regular inflection consists of a (possibly phrasal) verb or adjective stem and a series of suffixes. Unfortunately what exactly is a suffix may be somewhat debatable. (e.g. is (て)いる a suffix? てる surely must be)
  • Lower bound: Any complete (could syntactically appear in text without additional suffixes) non-phrasal term which fits the previous rule is an inflection if every removal of successive suffixes from its end results in an incomplete term. E.g. ない is the only suffix of 言わない and 言わ is an incomplete term, so 言わない is an inflection. 言わないで and 言わないでいる may or may not be.
  • Not every possible regular inflection is in use and some could be considered downright wrong due to semantic weirdness (できさせられる for instance?).
  • There are few irregular inflections and they're easily handled.

After these come two problems. One is that different people have different ideas about what's an inflection and what's not, so one way would need to be chosen and justified. The other is formulating the way precisely. The problems are not entirely separate.

Is 言いながら an inflection? ながら is a suffix that attaches only to verbs. How about 言っても? Or 見ての in 見ての通り? Or 言いますから? I think it has to boil down to some kind of idiomaticity rule. The inflected forms are not themselves idiomatic as they're regular, but the same can't be said of the inflection suffixes. E.g. ている is idiomatic, because the sense of いる in it is hardly the same as the stand-alone meaning anymore. I'd say てから and ての are also, because I don't think the exact meanings can be inferred from the parts alone. (大辞林 also has an entry for each of these three compounds, so I'm not alone in thinking so)

If you consider ている as a unit and apply the rules above, 言っている becomes an inflection because its only suffix is ている and 言っ is incomplete. You could also consider all verbs/adjectives formed by inflection as units, e.g. once you have やらせる from やる, you'd then say that やらせられる is an inflection of the unit やらせる and again apply the rules above. This way 言っても, 見ての and 言わないで would be inflections, but 言いますから and 言わないでいる not. (言わないで would be an inflection by the route 言わ+ないで, as ないで is idiomatic) -- Coffee2theorems 02:07, 21 May 2007 (UTC)
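The "lower bound" rule, and the effect of treating ている or ないで as a single unit, can be made concrete with a small Python sketch. Everything here is illustrative: the function name is invented, the segmentation into suffixes is supplied by hand, and the set of "complete" terms is a hand-picked assumption rather than anything derived from a dictionary.

```python
def meets_lower_bound(stem, suffixes, complete_terms):
    """True if stem+suffixes is complete and every form obtained by
    stripping suffixes from the end is incomplete.

    False only means the rule does not decide the case, not that the
    term is definitely not an inflection.
    """
    if not suffixes:
        return False
    if stem + "".join(suffixes) not in complete_terms:
        return False
    # Strip suffixes from the end one at a time (down to the bare stem).
    return all(
        stem + "".join(suffixes[:keep]) not in complete_terms
        for keep in range(len(suffixes))
    )

# Hand-picked terms assumed able to stand in a text without more suffixes.
COMPLETE = {"言わない", "言わないで", "言って", "言っている"}

print(meets_lower_bound("言わ", ["ない"], COMPLETE))        # True
print(meets_lower_bound("言わ", ["ない", "で"], COMPLETE))  # False
print(meets_lower_bound("言わ", ["ないで"], COMPLETE))      # True
print(meets_lower_bound("言っ", ["ている"], COMPLETE))      # True
```

The second and third calls show why the choice of units matters: 言わないで passes only when ないで is segmented as one idiomatic suffix.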


All this needed was to change the * to #; remember, the kana and romaji entries like this are simple (except for the to be verb here, because it is the usual form). All the etys etc can be in the kanji entries. Robert Ullmann 19:20, 20 May 2007 (UTC)

Ah, I had missed the explanation for romaji and kana forms at WT:AJA, as I have mostly been editing the commonest form entries. It doesn't say what to do about the two verb sections of いる though - surely they shouldn't be merged together, as one is an actual entry and another is more of a disambiguation list? -- Coffee2theorems 19:33, 20 May 2007 (UTC)
Also, should entries like 鑄る be treated similarly to romaji and kana entries? Duplicating all the information to such entries is not likely to stay in sync, and when someone goes on such a page they probably want to know either the modern form (linked) or something specific to the old usage. -- Coffee2theorems 19:41, 20 May 2007 (UTC)


Um, could you sign it? Thanks ;-) Robert Ullmann 15:00, 23 May 2007 (UTC)

Oops :-) -- Coffee2theorems 15:20, 23 May 2007 (UTC)


We don't use Pronoun header in Japanese -- WT:AJ -- because with respect to grammar they don't behave differently from plain nouns: 昨日の貴方 (and now I shall disappear for a few days) Cynewulf 04:12, 25 May 2007 (UTC)

Ah, thanks. I hadn't noticed that part of AJ, and assumed we'd call them pronouns simply because the Japanese do so (代名詞 vs. 名詞). -- Coffee2theorems 04:52, 25 May 2007 (UTC)

