Module talk:ja

Definition from Wiktionary, the free dictionary
test cases: Module talk:ja/testcases

Thanks ZxxZxxZ! Wyang (talk) 04:13, 11 April 2013 (UTC)

no problem, I hope it works correctly since I don't know Japanese and can't test it. --Z 04:19, 11 April 2013 (UTC)
I noticed this: ["ヴ"]="vye",["フ"]. It's incorrect, ヴ is "vu" but becomes "v" with any small vowel letter (ヴャ - vya), same with "フ" (fu). --Anatoli (обсудить/вклад) 04:23, 11 April 2013 (UTC)
That's not the actual conversion. In the function kata_to_romaji(f) the three-character string is detected as "'([ヴフ])ィェ'"; e.g. ツュチフィェ -> tsyuchifye (a hypothetical sequence, of course). Wyang (talk) 04:26, 11 April 2013 (UTC)
I haven't changed because it's not clear how "kr3" function is used. Perhaps "ィェ" should be "ye" but "ヴ" and "フ" just "vu" and "fu". "ヴ" and "フ" become "v" and "f" in front of small vowel letters ァ, ィ, ェ and ォ (and those with "y"). The same is true for a few other letters, like ツ, デ, ト, ド, etc., which are sometimes used in loanwords. --Anatoli (обсудить/вклад) 04:37, 11 April 2013 (UTC)
kr3 is used only if ヴ/フ is followed by ィェ. ヴ is still -> "vu". Wyang (talk) 04:40, 11 April 2013 (UTC)
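The longest-first replacement described above (three-character sequences, then two, then one) can be sketched as follows. This is a Python illustration rather than the module's Lua, and the tables KR3, KR2 and KR1 are hypothetical miniatures named after the "kr3" mentioned in the thread; the module's actual tables are not reproduced here.

```python
# Hypothetical miniature tables, keyed by katakana sequence length.
# They only cover the examples discussed in this thread.
KR3 = {"ヴィェ": "vye", "フィェ": "fye"}
KR2 = {"ツュ": "tsyu", "ヴァ": "va", "ファ": "fa"}
KR1 = {"チ": "chi", "ヴ": "vu", "フ": "fu", "ツ": "tsu"}


def kata_to_romaji(text):
    """Replace katakana with romaji, longest sequences first."""
    for table in (KR3, KR2, KR1):
        for kana, romaji in table.items():
            text = text.replace(kana, romaji)
    return text


print(kata_to_romaji("ツュチフィェ"))
print(kata_to_romaji("ヴ"))
```

Because the three-character table runs first, フィェ is consumed before the single フ rule can turn it into "fu", while a bare ヴ still falls through to "vu".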

Moved from User talk:ZxxZxxZ

Hi. Thanks for your help there. There is one more debugging request: for function M.romaji_to_kata(f), I want to have the string replaced using rk4, rk3, rk2, rk1 sequentially, like the previous one. When I invoke it, however, "tsyuchifye" should generate "ツュチフィェ" but it instead now generates "ツュチフィェ". Could you please take a look and see where it went wrong? Thanks. Wyang (talk) 04:48, 11 April 2013 (UTC)

NP, I just took a look at how the Japanese writing system works to see if there are better ways to convert terms. One thing that I noted is that romaji looks to be irreversible (for example, wi may be both ヰ and ウィ), so is it really possible to convert from romaji to katakana? --Z 05:13, 11 April 2013 (UTC)
Ah, I should have removed these one-to-many ones (also zu). It is convertible. ヰ is obsolete in modern Japanese, so wi should be mapped to ウィ. Similarly zu should correspond to ズ, not ヅ. I suppose there are alternative ways of writing this, by analysing what follows the consonant. It definitely requires more work; don't know if that would work though. Wyang (talk) 05:21, 11 April 2013 (UTC)
It's irreversible, unfortunately, or at least may not be very accurate. People can usually bear with "おう" being converted to "ou" in words where it should be "ō", but the other way around is worse. We don't romanise "東京" (とうきょう) as "toukyou" but as "Tōkyō", yet "大きい" (おおきい) is also romanised with "ō", as "ōkii". Letter "ヅ" can be romanised as "dzu" to make it different from "ズ" (so it's used when typing) but usually it's "zu". --Anatoli (обсудить/вклад) 05:31, 11 April 2013 (UTC)
Katakana/hiragana to romaji would be useful to create romaji transliteration and romaji entries, so would katakana to hiragana (to build sorting keys in categories). Not sure about hiragana to katakana but most animals, onomatopoeia, etc. have variant spellings in katakana. --Anatoli (обсудить/вклад) 05:36, 11 April 2013 (UTC)
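For the katakana-to-hiragana conversion mentioned for sorting keys, one possible approach relies on the Unicode layout, where each katakana letter from ァ (U+30A1) to ヶ (U+30F6) sits exactly 0x60 above its hiragana counterpart. The sketch below is in Python rather than the module's Lua, and the function name kata_to_hira is an assumption, not something taken from the module.

```python
def kata_to_hira(text):
    """Map katakana letters to hiragana; leave everything else untouched."""
    out = []
    for ch in text:
        cp = ord(ch)
        # U+30A1..U+30F6 have hiragana counterparts at U+3041..U+3096.
        if 0x30A1 <= cp <= 0x30F6:
            out.append(chr(cp - 0x60))
        else:
            out.append(ch)
    return "".join(out)


print(kata_to_hira("シンカンセン"))
```

The prolonged sound mark ー (U+30FC) lies outside the shifted range and is passed through unchanged, so a real sorting key would still need a rule for it.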
This shouldn't pose a difficulty, if the algorithm is: 1) de-macron, "ō" -> "ou", 2) "to" -> "と", 3) "o" -> "お". Wyang (talk) 05:40, 11 April 2013 (UTC)
You probably missed the section about "ōkii", it's おおきい (ookii), not おうきい (oukii). I don't understand what you meant by 2) "to" -> "と", 3) "o" -> "お". --Anatoli (обсудить/вклад) 05:46, 11 April 2013 (UTC)
I see what you mean. Macron 'o' is essentially a conflation of the combinations 'oo' and 'ou'. There would be no ambiguity if "ō" is disallowed in the input from the beginning (or if not disallowed, set to 'ou' by default unless specified, as 'ou' from Sino-Japanese words would greatly outnumber 'oo', which is mainly of native origin). Wyang (talk) 06:03, 11 April 2013 (UTC)
What I mean is, when this is used at romaji entries: Tōkyō may be used with {{ja-romaji}} with no specifications (as ō is by default 'ou') and this produces とうきょう, but ōkii has to have {{ja-romaji|rom=ookii}} or {{ja-romaji|hira=おおきい}} to limit it to 'oo'. Wyang (talk) 06:08, 11 April 2013 (UTC)
(before edit conflict) That's just how it is; the standard is to use "ō" here and in many publications. Notable exceptions: the particles "は" (letter "ha") and "へ" (letter "he") are transliterated as "wa" and "e", and the letter "を" is transliterated as "o", not "wo" (in any position). The tool can still be useful if the transliteration standard is not changed but will require manual override. Archaic letters can be ignored in romaji-kana conversion.
(after edit conflict) I see what you mean. we could have additional params for back translit but it remains to be seen where and how these modules are used, so that an adjustment or a collective decision could be made. We didn't use automatic transliteration before, so... --Anatoli (обсудить/вклад) 06:17, 11 April 2013 (UTC)
BTW, I don't know if a conversion table is necessary for ACCEL creation of JA entries (as a script like Template:ja new (the Japanese version of Template:cmn new) will not be using Lua). But if it is, we could agree to type romaji like "Toukyou" and "ookii", so that the conversion to hiragana happens correctly. --Anatoli (обсудить/вклад) 06:31, 11 April 2013 (UTC)
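The default discussed above ("ō" read as 'ou' unless an explicit spelling such as 'ookii' is supplied) can be sketched like this. It is a Python illustration under stated assumptions: the HIRA table is a hypothetical fragment covering only the two example words, and the override parameter merely stands in for the rom= parameter proposed for {{ja-romaji}}.

```python
import unicodedata

# Hypothetical fragment of a romaji-to-hiragana table.
HIRA = {"kyo": "きょ", "to": "と", "ki": "き", "o": "お", "u": "う", "i": "い"}


def demacron(romaji):
    """Lower-case the input and expand o-macron to 'ou' (the assumed default)."""
    text = unicodedata.normalize("NFC", romaji).lower()
    return text.replace("\u014d", "ou")


def romaji_to_hira(romaji, override=None):
    """Convert romaji to hiragana; an explicit override spelling wins."""
    text = demacron(override if override is not None else romaji)
    out, i = [], 0
    while i < len(text):
        for size in (3, 2, 1):  # try the longest syllable first
            chunk = text[i:i + size]
            if chunk in HIRA:
                out.append(HIRA[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # unknown character: pass through
            i += 1
    return "".join(out)


print(romaji_to_hira("T\u014dky\u014d"))
print(romaji_to_hira("\u014dkii", override="ookii"))
```

Without the override, "ōkii" comes out as おうきい, which is exactly the case that needs manual specification.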

to do

1)

Needs to convert string-final "n" to ン in kana_to_romaji(f). I added

if mw.ustring.sub(text,mw.ustring.len(text),mw.ustring.len(text)) == 'n' then text = (mw.ustring.sub(text,1,mw.ustring.len(text)-1) .. "ン") end

but it didn't work. Wyang (talk) 21:17, 11 April 2013 (UTC)

2) hidx

3) geminate consonants (done) Wyang (talk) 04:14, 12 April 2013 (UTC)

1) You mean when it is at the end of the word it should be ン? --Z 04:52, 12 April 2013 (UTC)
Yes. See the testcases. shinkansen is not converted correctly. I think converting final 'n' to ン prior to list conversion would solve that problem. Wyang (talk) 04:55, 12 April 2013 (UTC)
I'm not sure what the exact problem is with ン, but ン is ALWAYS "n", also in front of ナ, ニ, etc. It gets an apostrophe ' in front of any (large) vowel - 遠泳 (えんえい = en'ei). Small ones are not used after ン and we don't ever romanise it as m or ng. --Anatoli (обсудить/вклад) 05:01, 12 April 2013 (UTC)
See Module talk:ja/testcases. shinkansen -> シンカンsエン. I guess it's because 'en' is converted first and there is nothing to convert the remaining 's' to. Although converting the final 'n' to ン first doesn't seem to work either. Wyang (talk) 05:06, 12 April 2013 (UTC)
n/s case fixed. --Z 05:23, 12 April 2013 (UTC)

Thanks! Looks like everything listed has been done now (1,2,3). Wyang (talk) 05:25, 12 April 2013 (UTC)
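For illustration, the shinkansen problem from item 1 can be reproduced and avoided in a small Python sketch (not the module's Lua). The rule order in naive_replace follows the guess made above that 'en' is converted first; the RK table is a hypothetical fragment. How the module itself was fixed is not stated in this thread, so the left-to-right scan below is only one possible remedy.

```python
# Hypothetical rule order in which "en" is replaced before "se" could match.
NAIVE_RULES = [("en", "エン"), ("shi", "シ"), ("ka", "カ"), ("n", "ン")]

# Hypothetical syllable table for a left-to-right scan.
RK = {"shi": "シ", "ka": "カ", "se": "セ", "na": "ナ", "e": "エ", "n": "ン"}


def naive_replace(text):
    """Apply each rule to the whole string in turn (reproduces the bug)."""
    for src, dst in NAIVE_RULES:
        text = text.replace(src, dst)
    return text


def romaji_to_kata(text):
    """Scan left to right, taking the longest matching syllable each time."""
    out, i = [], 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in RK:
                out.append(RK[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # no rule applies: keep the character
            i += 1
    return "".join(out)


print(naive_replace("shinkansen"))
print(romaji_to_kata("shinkansen"))
```

In the scan, "n" only becomes ン when no longer syllable such as "na" starts at that position, so a syllable-final "n" is handled without a separate pass.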

Documentation

Can somebody please write the documentation? I would, but I don't know how everything in it works. Please, we need to be careful to document our modules so others can use them more easily. —Μετάknowledgediscuss/deeds 04:51, 15 April 2013 (UTC)

Excellent. Thank you! —Μετάknowledgediscuss/deeds 20:16, 15 April 2013 (UTC)

romanization of ~っ

@TAKASUGI Shinji I'm not sure 't' is suitable either; なーんてねっ (nānte net) seems odd to me. I chose "h" so that あっ would become ah (I had totally forgotten about h as another method of romanizing long vowels).

Also, FWIW, I rewrote the romanization code recently and the old code simply didn't romanize ~っ at the end of a phrase at all, i.e. あっ (a), which I thought was somewhat problematic. What is your opinion on that behavior? —suzukaze (tc) 07:46, 22 January 2017 (UTC)

@Suzukaze-c: As you know, there is no established transcription for the final っ. I used t because it matches well at least for あっという間. Some scholars use q ([1]), which is based on the long tradition of the phonemic notation /q/ but may look too exotic. Others use an apostrophe ([2], [3]). @Atitarev, Eirikr, Haplology, Wyang, エリック・キィ: What do you think of romanization of the final っ? — TAKASUGI Shinji (talk) 09:20, 22 January 2017 (UTC)
This was discussed before: Wiktionary:Tea room/2014/August#六. Wyang (talk) 09:28, 22 January 2017 (UTC)
I’d forgotten it, thanks. We just omitted the final っ until the revision as of 2016-11-10T11:11:06, which used #. — TAKASUGI Shinji (talk) 09:57, 22 January 2017 (UTC)
"#" was a shortlived personal experiment that got published by accident, please don't mind that part.
I was unaware of both the discussion and the policy, thanks. A lot of policies here seem kind of outdated in comparison with current practice though, for example the points under 'relaxed rules' and the entry layout on Wiktionary:About Japanese. Can we use this opportunity to consider changing the policy on final っ? —suzukaze (tc) 10:05, 22 January 2017 (UTC)
  • There's policy, and then there's the technical side. I think the policy arose in part because omitting it is much easier -- if we make it "t" instead, or "h" instead, there are all kinds of odd corner cases that go funny, as explored in this current go-round.
So long as those corner cases can be properly thought through and planned for, I'm open to being convinced to change current practice. FWIW, I think omission works and is reasonably clear. ‑‑ Eiríkr Útlendi │Tala við mig 18:05, 22 January 2017 (UTC)
Maybe in the case of あっという間に where there is a と to consider it could be transliterated as 't', but in other cases it could be transliterated as something else. —suzukaze (tc) 00:16, 23 January 2017 (UTC)
How about deleting a space after っ in あっという間? That will yield atto iu ma. — TAKASUGI Shinji (talk) 13:54, 23 January 2017 (UTC)
It's totally reasonable but I also feel like morphologically it's あっ+と+いう+間+に and should maybe be romanized as such. —suzukaze (tc) 21:07, 23 January 2017 (UTC)
In this particular case, あっ and と are completely fused. There is no pause between them. — TAKASUGI Shinji (talk) 23:38, 23 January 2017 (UTC)
Which is why I proposed "where there is a と to consider it could be transliterated as 't'". I know there's no glottal stop in the pronunciation in the case of あっという間. —suzukaze (tc) 17:34, 24 January 2017 (UTC)
げっ (geh) / あっという間 (at to iu ma) suzukaze (tc) 08:28, 25 January 2017 (UTC)
We shouldn’t use h. It is for a long vowel. — TAKASUGI Shinji (talk) 11:23, 25 January 2017 (UTC)
Hmm, but we already use Hepburn-style rōmaji anyway. I personally am all for alternatives like q and ' but I also fear that it may confuse casual users of Wiktionary. Of course we could also return to the previous status quo of romanizing it as nothing, but I am of the opinion that romanizing it visibly is beneficial. Would directly using ʔ be too radical? —suzukaze (tc) 12:01, 25 January 2017 (UTC)
Sorry guys. I have limited Internet access as I'm on holiday in Thailand. I think っ after vowels shouldn't be romanised at all, that is, it should be romanised as nothing. That's the common practice out there, and this is not the only situation where a foreign letter is romanised as nothing. --Anatoli T. (обсудить/вклад) 15:42, 25 January 2017 (UTC)
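The options raised in this section (omitting a final っ, or writing it as t, h, q or an apostrophe) differ only in what is emitted when no following consonant exists to double. A minimal Python sketch, assuming a hypothetical KANA fragment and a hypothetical final_sokuon parameter (neither is taken from the module), makes the alternatives easy to compare:

```python
# Hypothetical fragment of a hiragana-to-romaji table.
KANA = {"あ": "a", "と": "to", "げ": "ge", "ね": "ne", "て": "te"}
SOKUON = "っ"


def romanize(text, final_sokuon=""):
    """Romanize hiragana; a final っ becomes `final_sokuon` ("" omits it)."""
    out = []
    pending = False  # True while a っ is waiting for the next consonant
    for ch in text:
        if ch == SOKUON:
            pending = True
            continue
        syllable = KANA.get(ch, ch)
        if pending:
            # Double the first letter; this simplification is only valid
            # for the plain consonants in the fragment above.
            syllable = syllable[0] + syllable
            pending = False
        out.append(syllable)
    if pending:
        out.append(final_sokuon)
    return "".join(out)


print(romanize("あっ"))
print(romanize("あっ", final_sokuon="t"))
print(romanize("あっと"))
```

With the default empty marker this matches the omission practice; passing "t", "h", "q" or "'" shows the proposed alternatives for the same input.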