Module talk:ja

test cases: Module talk:ja/testcases

Thanks ZxxZxxZ! Wyang (talk) 04:13, 11 April 2013 (UTC)

no problem, I hope it works correctly since I don't know Japanese and can't test it. --Z 04:19, 11 April 2013 (UTC)
I noticed this: ["ヴ"]="vye",["フ"]. It's incorrect, ヴ is "vu" but becomes "v" with any small vowel letter (ヴャ - vya), same with "フ" (fu). --Anatoli (обсудить/вклад) 04:23, 11 April 2013 (UTC)
That's not the actual conversion. In the function kata_to_romaji(f) the three character string is detected as "'([ヴフ])ィェ'"; eg. ツュチフィェ -> tsyuchifye (a hypothetical sequence of course). Wyang (talk) 04:26, 11 April 2013 (UTC)
I haven't changed it because it's not clear how the "kr3" function is used. Perhaps "ィェ" should be "ye" but "ヴ" and "フ" just "vu" and "fu". "ヴ" and "フ" become "v" and "f" in front of the small vowel letters ァ, ィ, ェ and ォ (and those with "y"). The same is true for a few other letters, like ツ, デ, ト, ド, etc., which are sometimes used in loanwords. --Anatoli (обсудить/вклад) 04:37, 11 April 2013 (UTC)
kr3 is used only if ヴ/フ is followed by ィェ. ヴ is still -> "vu". Wyang (talk) 04:40, 11 April 2013 (UTC)
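A minimal sketch of the longest-match-first order described above, written in Python rather than the module's Lua so it can be run outside the wiki. The table names KR3, KR2 and KR1 and their trimmed contents are assumptions for illustration; only the ツュチフィェ -> tsyuchifye and ヴ -> "vu" behaviour comes from this discussion.

```python
# Hypothetical, heavily trimmed tables; the real module's tables are larger.
KR3 = {"ヴィェ": "vye", "フィェ": "fye"}                  # three-character sequences
KR2 = {"ツュ": "tsyu", "ヴァ": "va", "ファ": "fa"}        # two-character sequences
KR1 = {"ヴ": "vu", "フ": "fu", "チ": "chi", "ツ": "tsu"}  # single characters


def kata_to_romaji(text: str) -> str:
    """Replace the longest kana sequences first, then shorter ones."""
    # Applying KR3 before KR1 means "フィェ" is consumed as a unit,
    # so a bare "フ" still falls through to "fu".
    for table in (KR3, KR2, KR1):
        for kana, romaji in table.items():
            text = text.replace(kana, romaji)
    return text


print(kata_to_romaji("ツュチフィェ"))  # tsyuchifye
print(kata_to_romaji("ヴ"))            # vu
```

Because the three-character table runs first, adding a special case for ヴ/フ + ィェ does not change how a standalone ヴ or フ is romanised.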

Moved from User talk:ZxxZxxZ

Hi. Thanks for your help there. There is one more debugging request: for function M.romaji_to_kata(f), I want to have the string replaced using rk4, rk3, rk2, rk1 sequentially, like the previous one. When I invoke it, however, "tsyuchifye" should generate "ツュチフィェ" but it instead now generates "ツュチフィェ". Could you please take a look and see what went wrong? Thanks. Wyang (talk) 04:48, 11 April 2013 (UTC)

NP, I just took a look at how the Japanese writing system works to see if there are better ways to convert terms. One thing I noted is that romaji looks to be irreversible (for example, wi may be both ヰ and ウィ), so is it really possible to convert from romaji to katakana? --Z 05:13, 11 April 2013 (UTC)
Ah, I should have removed these one-to-many ones (also zu). It is convertible. ヰ is obsolete in modern Japanese, so wi should be mapped to ウィ. Similarly zu should correspond to ズ, not ヅ. I suppose there are alternative ways of writing this, by analysing what follows the consonant. It definitely requires more work; don't know if that would work though. Wyang (talk) 05:21, 11 April 2013 (UTC)
It's irreversible, unfortunately, or at least may not be very accurate. People can usually bear with "おう" being converted to "ou" in words where it should be "ō", but the other way around is worse. We don't romanise "東京" (とうきょう) as "toukyou" but as "Tōkyō", yet "大きい" (おおきい) is also romanised with "ō", as "ōkii". Letter "ヅ" can be romanised as "dzu" to make it different from "ズ" (so it's used when typing) but usually it's "zu". --Anatoli (обсудить/вклад) 05:31, 11 April 2013 (UTC)
Katakana/hiragana to romaji would be useful to create romaji transliteration and romaji entries, so would katakana to hiragana (to build sorting keys in categories). Not sure about hiragana to katakana but most animals, onomatopoeia, etc. have variant spellings in katakana. --Anatoli (обсудить/вклад) 05:36, 11 April 2013 (UTC)
This shouldn't pose a difficulty, if the algorithm is: 1) de-macron, "ō" -> "ou", 2) "to" -> "と", 3) "o" -> "お". Wyang (talk) 05:40, 11 April 2013 (UTC)
You probably missed the section about "ōkii", it's おおきい (ookii), not おうきい (oukii). I don't understand what you meant by 2) "to" -> "と", 3) "o" -> "お". --Anatoli (обсудить/вклад) 05:46, 11 April 2013 (UTC)
I see what you mean. Macron 'o' is essentially a conflation of the combinations 'oo' and 'ou'. There would be no ambiguity if "ō" is disallowed in the input from the beginning (or, if not disallowed, set to 'ou' by default unless specified), as 'ou' from Sino-Japanese words would greatly outnumber 'oo', which is mainly of native origin. Wyang (talk) 06:03, 11 April 2013 (UTC)
What I mean is, when this is used at romaji entries: Tōkyō may be used with {{ja-romaji}} with no specifications (as ō is by default 'ou') and this produces とうきょう, but ōkii has to have {{ja-romaji|rom=ookii}} or {{ja-romaji|hira=おおきい}} to limit it to 'oo'. Wyang (talk) 06:08, 11 April 2013 (UTC)
(before edit conflict) That's just how it is; the standard is to use "ō" here and in many publications. Notable exceptions: the particles "は" (letter "ha") and "へ" (letter "he") are transliterated as "wa" and "e", and the letter "を" is transliterated as "o", not "wo" (in any position). The tool can still be useful if the transliteration standard is not changed, but it will require manual override. Archaic letters can be ignored in romaji-kana conversion.
(after edit conflict) I see what you mean. We could have additional params for back translit, but it remains to be seen where and how these modules are used, so that an adjustment or a collective decision could be made. We didn't use automatic transliteration before, so... --Anatoli (обсудить/вклад) 06:17, 11 April 2013 (UTC)
BTW, I don't know if a conversion table is necessary for ACCEL creation of JA entries (as a script like Template:ja new (Japanese version of Template:cmn new) will not be using Lua). But if you do, we could agree to type romaji like "Toukyou" and "ookii", so that the conversion to hiragana happens correctly. --Anatoli (обсудить/вклад) 06:31, 11 April 2013 (UTC)

to do


1) Needs to convert string-final "n" to ン in kana_to_romaji(f). I added

if mw.ustring.sub(text,mw.ustring.len(text),mw.ustring.len(text)) == 'n' then text = (mw.ustring.sub(text,1,mw.ustring.len(text)-1) .. "ン") end

but it didn't work. Wyang (talk) 21:17, 11 April 2013 (UTC)

2) hidx

3) geminate consonants (done) Wyang (talk) 04:14, 12 April 2013 (UTC)

1) You mean when it is at the end of the word it should be ン? --Z 04:52, 12 April 2013 (UTC)
Yes. See the testcases. shinkansen is not converted correctly. I think converting final 'n' to ン prior to list conversion would solve that problem. Wyang (talk) 04:55, 12 April 2013 (UTC)
I'm not sure what the exact problem is with ン, but ン is ALWAYS "n", also in front of ナ, ニ, etc. It gets an apostrophe ' in front of any (large) vowel: 遠泳 (えんえい = en'ei). Small ones are not used after ン, and we never romanise it as m or ng. --Anatoli (обсудить/вклад) 05:01, 12 April 2013 (UTC)
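The apostrophe rule stated above can be sketched like this in Python. The table HR and the function name hira_to_romaji are hypothetical and cover only a handful of kana; the rule itself (ん is always "n", with an apostrophe before a vowel) is taken from the comment above.

```python
# Hypothetical mini-table; just enough kana for the examples.
HR = {"え": "e", "い": "i", "ん": "n", "な": "na", "か": "ka"}
VOWELS = "aiueo"


def hira_to_romaji(text: str) -> str:
    """Romanise hiragana, writing ん as "n" and as "n'" before a vowel."""
    out = []
    prev = ""
    for ch in text:
        syllable = HR.get(ch, ch)  # unknown characters pass through unchanged
        # Separate ん from a following vowel so "en'ei" is not read as "e-ne-i".
        if prev == "ん" and syllable[0] in VOWELS:
            out.append("'")
        out.append(syllable)
        prev = ch
    return "".join(out)


print(hira_to_romaji("えんえい"))  # en'ei
print(hira_to_romaji("かんな"))    # kanna
```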
See Module talk:ja/testcases. shinkansen -> シンカンsエン. I guess it's because 'en' is converted first and there is nothing to convert the remaining 's' to. Although converting the final 'n' to ン first doesn't seem to work either. Wyang (talk) 05:06, 12 April 2013 (UTC)
n/s case fixed. --Z 05:23, 12 April 2013 (UTC)

Thanks! Looks like everything listed has been done now (1,2,3). Wyang (talk) 05:25, 12 April 2013 (UTC)
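For reference, one way to avoid the shinkansen -> シンカンsエン problem is to scan the string left to right and take the longest matching key at each position, instead of replacing table entries globally in table order. The Python sketch below is an assumption about a possible approach, not a description of the fix that was actually applied; the table RK and its contents are hypothetical.

```python
# Hypothetical mini-table; "n" and "n'" both map to ン.
RK = {"shi": "シ", "ka": "カ", "se": "セ", "na": "ナ",
      "e": "エ", "i": "イ", "n'": "ン", "n": "ン"}
MAX_KEY = max(len(key) for key in RK)


def romaji_to_kata(text: str) -> str:
    """Convert romaji to katakana by left-to-right longest match."""
    out, i = [], 0
    while i < len(text):
        for size in range(MAX_KEY, 0, -1):
            chunk = text[i:i + size]
            if chunk in RK:
                out.append(RK[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # no key matched; keep the character as-is
            i += 1
    return "".join(out)


print(romaji_to_kata("shinkansen"))  # シンカンセン
print(romaji_to_kata("kana"))        # カナ
print(romaji_to_kata("en'ei"))       # エンエイ
```

Since "se" is matched at the position of "s", the later "en" is never seen as a unit, and a bare "n" only becomes ン when no longer key such as "na" starts at that position.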


Can somebody please write the documentation? I would, but I don't know how everything in it works. Please, we need to be careful to document our modules so others can use them more easily. —Μετάknowledgediscuss/deeds 04:51, 15 April 2013 (UTC)

Excellent. Thank you! —Μετάknowledgediscuss/deeds 20:16, 15 April 2013 (UTC)