Template talk:ja-pron


Initial discussion

Discussion moved from Talk:精神分裂病.


Having done Module:ko-pron, I'd like to start the work on Module:ja-pron now. Wyang (talk) 03:19, 16 April 2014 (UTC)
@Eirikr. Might as well move this discussion to a more visible place. ;) --Anatoli (обсудить/вклад) 03:23, 16 April 2014 (UTC)
  • @Atitarev Dunno where that is, but please go ahead and move it. I'm bogged down in meatspace work and don't have time for WT for the next week or two.  :(
  • @Wyang please do! I don't have time, as much as I'd love to dig in and get my hands dirty. ‑‑ Eiríkr Útlendi │ Tala við mig 06:07, 16 April 2014 (UTC)
  • @Eirikr OK. I'll think of a different location. If you have a way to contact Haplology, please let him know we need him here! Take it easy and come back when you can.
  • @Wyang. I'm happy to do the testing and using the future module but I'd need to brush up my IPA for Japanese. We'll have to rely on your skills and knowledge again. :) がんばってね!--Anatoli (обсудить/вклад) 06:17, 16 April 2014 (UTC)


Thanks. I've done a crude version at Template:ja-pron and Module:ja-pron. Currently,

{{ja-pron|せいしん ぶんれつ びょう|acc=h|y=on}}

generates


  • On’yomi
  • (Tokyo) せいしんぶんれつびょう [sèéshíń búńrétsú byóó] (Heiban - [0])
  • IPA(key): [se̞ːɕĩm bɯ̟̃ᵝnɾe̞t͡sɨᵝ bʲo̞ː]

Any suggestions? Wyang (talk) 07:52, 16 April 2014 (UTC)

Well done! No suggestions yet but some documentation would be helpful, specifically on parameters and types of accents. It's for 標準語, isn't it or for any variety? Should the template link to/mention the variety name? --Anatoli (обсудить/вклад) 08:01, 16 April 2014 (UTC)
Can we also have Accent: 0, Accent: 1, etc. next to accent names? --Anatoli (обсудить/вклад) 08:14, 16 April 2014 (UTC)
It would refer to standard Japanese, although I don't know where that information should be placed. How do the accent numbers correspond to accent types 'h,o,a,n'? Japanese pitch accent isn't very helpful. Do they refer to the accented morae? (in which case, nakadaka could get more than one number, correct?) Wyang (talk) 13:10, 16 April 2014 (UTC)
Letters must be the best way, then. Since we don't know if a given transcription is for the standard Japanese, I'll drop this request as well. They can be marked in brackets for any variety, potentially. Do you think we should have a default IPA info, when the pitch is unknown? Unfortunately, I don't have enough sources for the Japanese accents, nothing online, only some old Japanese-Russian dictionaries with accents. --Anatoli (обсудить/вклад) 22:02, 16 April 2014 (UTC)


  • Looks good. I tweaked the module to replace ɽ with ɾ̠. The former is a retroflex tap, not used in Japanese, while the latter is more generally accepted as the /r/ tap preceding an /a/ sound.
@Atitarev you're talking about the numbers used in some dictionaries to indicate the number of the syllable right after which the downstep in pitch occurs, yes? Or something else? If you mean the downstep syllable, calling it accent isn't quite correct. If you mean something more like dialect, maybe some other less ambiguous term could be used.
@Atitarev, Wyang {{ja-accent-common}} refers to 標準語 as defined by Tokyo standard pronunciation and the NHK pronunciation guidelines for broadcasters. I haven't seen any resources that give pitch information for any other dialects, but I would be quite happy to include those, provided we can find such resources.
With that in mind, is there any easy way to make the module, um, modular :), to allow for pluggable pitch sub-modules or functions? I haven't gone through the code at all really, I just made that one change to swap out ɽ.
Also, @Wyang the pitch drop on long vowels looks plug-ugly at カリフラワー, at least on my machines -- the grave accent (`) that's supposed to show the lower pitch shows up too far to the right, so that it's not even over the vowel at all, appearing instead as a stray mark over the closing square bracket.
Lastly, combos like 'çʲj' don't quite work -- this should just be 'çj' instead, with just the main palatal glide of /j/.
Cheers folks, thank you for your help with this! ‑‑ Eiríkr Útlendi │ Tala við mig 22:03, 16 April 2014 (UTC)
  • Oh, and to clarify, when I say I haven't seen any resources that give pitch information for any other dialects, I mean on a word-by-word basis. Shibatani and others do discuss the broad trends of pitch patterns, but the only lexicographical information about pitch that I've seen in actual dictionaries and the like has been for Tokyo dialect. ‑‑ Eiríkr Útlendi │ Tala við mig 22:05, 16 April 2014 (UTC)
  • Oo, also, じ should be rendered as d͡ʑi, not (d͡)ʑʲi (not really clear what the parens are doing there, and no need for the small "j"). Cf. 御御籤.
For compatibility (and legibility) purposes, {{ja-pron}} should support yomi as a synonym param for y.
And, how / where do we put the reference footnote? If I put it right after the call to {{ja-pron}}, the footnote shows up on the IPA line -- which isn't correct, since I'm using the reference for the pitch accent, not the IPA. See 御御籤 again for an example.
Thank you again! ‑‑ Eiríkr Útlendi │ Tala við mig 22:27, 16 April 2014 (UTC)
@Wyang Re: "nakadaka could get more than one number, correct". Does that mean that "n" may not be sufficient? I'll dig up my old dictionaries/textbooks and check if there is a straightforward mapping between numbers and letters (pitch accent names). --Anatoli (обсудить/вклад) 22:32, 16 April 2014 (UTC)
Pitch and names and numbers:
  • Heiban (平板, “flat”): Pitch rises after first syllable, falls gradually thereafter. Pitch number 0 -- no downstep.
  • Atamadaka (頭高, “high head”): First syllable takes high pitch, downstep immediately thereafter. Pitch number 1 -- downstep after first mora.
  • Nakadaka (中高, “high middle”): Pitch rises after first syllable, downstep after some number of morae. Can only apply to terms with at least 3 morae. Pitch number varies, must be at least 2, and less than the total number of morae in the term -- downstep after mora indicated by number.
  • Odaka (尾高, “high tail”): Pitch rises after first syllable, downstep after last mora. Pitch number varies, must be the number of the last mora in the term -- downstep after mora indicated by number. For odaka terms, the downstep is actually heard on the following particle.
Hope that clarifies! ‑‑ Eiríkr Útlendi │ Tala við mig 23:45, 16 April 2014 (UTC)
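
The four patterns above can be sketched in a few lines of Python. This is only an illustration of the descriptions in this thread, not the code in Module:ja-pron; the function name `pitch_pattern` and the H/L output format are made up for the example. It prints one letter per mora, then a hyphen and the pitch of a following particle, which is where odaka and heiban differ.

```python
def pitch_pattern(mora_count: int, downstep: int) -> str:
    """Return H/L for each mora, then '-' and the pitch of a following particle.

    downstep is the dictionary-style pitch number: 0 means no downstep,
    otherwise the pitch drops right after that mora.
    (Hypothetical helper for illustration only.)
    """
    if mora_count < 1 or not 0 <= downstep <= mora_count:
        raise ValueError("downstep must be between 0 and the mora count")
    pattern = []
    # The extra final position stands for the particle after the term.
    for position in range(1, mora_count + 2):
        if downstep == 0:
            high = position > 1                 # heiban: low first mora, then high
        elif downstep == 1:
            high = position == 1                # atamadaka: only the first mora is high
        else:
            high = 1 < position <= downstep     # nakadaka / odaka
        pattern.append("H" if high else "L")
    return "".join(pattern[:-1]) + "-" + pattern[-1]


print(pitch_pattern(4, 0))  # heiban
print(pitch_pattern(3, 1))  # atamadaka
print(pitch_pattern(4, 2))  # nakadaka
print(pitch_pattern(2, 2))  # odaka
```

With this layout, a 2-mora odaka term and a 2-mora heiban term look the same until the particle position, which matches the note above that the odaka downstep is heard on the following particle.
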
Thanks. You have just confirmed that for Nakadaka there are variants. That means that just using "n" won't produce 100% correct pitch accent. Users may want to know when the downstep starts on long words, right? Please clarify. --Anatoli (обсудить/вклад) 00:08, 17 April 2014 (UTC)

Thanks.

  1. 'replace ɽ with ɾ̠' - great
  2. 'the accent fall on long vowels looks plug-ugly'
    Um... not sure how to solve this. Unicode only has combined grave-macron for e (ḕ) and o (ṑ). It is caused by the font formatting <tt>kàrífúráwā̀</tt> (kàrífúráwā̀). cf. normal unformatted 'kàrífúráwā̀'. We could decompose it into single vowels, eg. kàrífúráwàà, or use either no formatting or some other font (which I don't know). For now I have decomposed it into single vowels and it now looks like Navajo.
  3. Palatalisation. It is currently consistently marked. Questions are:
    1. Should it be consistently marked? eg. mi -> mʲi, ki -> kʲi, çj -> çʲj. I have removed this now.
    2. Should it be marked by default after ɕ, t͡ɕ and d͡ʑ, or when those are followed by non-'ij', or not at all? This version of 統合失調症 used the second option. I have changed it to match that.
  4. About '(d͡)ʑ'... Japanese phonology says d͡ʑ ~ ʑ and d͡z ~ z are in free variation for romaji 'j' and 'z', respectively. Hence the notation there... Should these be written as 'd͡ʑ' and 'd͡z' regardless of the environments they are in? I have converted them to 'd͡ʑ' and 'd͡z' for now.
  5. I have added |yomi, |accent, |accent2, |acc_loc, |accent_loc (these are 'Tokyo' by default), |acc_ref, |accent_ref ('DJR' by default), |acc2_loc, |accent2_loc ('Tokyo' by default), |acc2_ref, |accent2_ref ('DJR' if |acc2 exists).

Anatoli: The single-letter accent types in that template mainly match {{ja-accent-common}}, except 'nakadaka' which needs further specifying. Thus |acc=o (o), |acc=a (a), |acc=h (h), |acc=2,3 (n), |acc=2,2 (n). I'm not sure how the accent numbers correspond to this. Maybe they are positions of accented morae? Wyang (talk) 00:20, 17 April 2014 (UTC)

  • @Wyang re: single-letter accent types, see above about Pitch and names and numbers.  :) Theoretically, it should be possible to specify h, a, or o without needing any number. Only nakadaka would require a number to be able to figure out where the downstep happens. As such, one should ideally be able to specify nakadaka using nX, where X is the number, or by using X alone.
For that matter, it might make sense to allow accent types to also be specified by number alone, where 0 or 1 would be heiban or atamadaka respectively, and any greater value would wind up as odaka or nakadaka, depending on how many morae are in the term.
  • Re: d͡ʑ ~ ʑ and d͡z ~ z, d͡z happens, but is rarer. Likewise, ʑ happens, but is rarer. This is mostly an issue of geographical variations in dialect. For NHK purposes (i.e. one of the closest things to a standard pronunciation), my understanding is that romaji "j" == [d͡ʑ], and romaji "z" == [z]. This gets complicated, but it might make sense in the longer term to add a param to allow for specifying this variation, since I think it might sometimes be contrastive and / or emphasized in certain careful speech.
Similarly, whether or not certain /i/ or /u/ sounds are unvoiced should also be specifiable. One term spelled つき in hiragana is usually [t͡sɯ̥ᵝki] as I hear it. Meanwhile, 付き (as in "about, regarding") is also つき in hiragana, and is usually [t͡sɯᵝki] as I hear it. So it's not really possible to tell just from the kana spelling whether a given /i/ or /u/ is unvoiced.
  • Re: where to put references, there's also the question of where to put qualifiers. Sometimes, albeit rarely, certain pitch patterns for a single term are specific to certain senses. See デッキ for one such example.
  • Thanks again! ‑‑ Eiríkr Útlendi │ Tala við mig 01:00, 17 April 2014 (UTC)
How is non-initial /g/ handled? Are both g/ŋ produced? Please demonstrate on ありがとう.
To me, it seems most Japanese who start learning foreign languages late, have difficulty pronouncing /ʑ/, even if they make an effort. :) --Anatoli (обсудить/вклад) 01:11, 17 April 2014 (UTC)

Thanks.

  • For arigatou:
  • IPA(key): [a̠ɾiɡa̠to̞ː]
  • d͡z: changed to 'z'.
  • For vowel devoicing: There was a rule in the module, which devoices vowels between voiceless consonants, and then only keeps the first when two devoiced vowels occur in adjacent morae. I have removed that rule and added a |dev= parameter. Please see Template:ja-pron/documentation.
  • I have added |acc_note, |accent_note, |acc2_note, |accent2_note, which are placed at the end of the accent line.
  • Accent types '0' and '1' treated as 'h' and 'a'. If not single characters 'hao01', then remove 'o'. If resulting string is equal to the length of text, then 'o'. If not, then 'n'. eg. |acc=0 (h), |acc=1 (a), |acc=h (h), |acc=2 for 2-morae word (o), |acc=3 for 3-morae word (n3), |acc=o for 5-morae word (o).
  • I have added an accent reference template so that |acc_ref=NHK etc. can now call the reference template. (Template:ja-pron/documentation)

How about now? Wyang (talk) 02:10, 17 April 2014 (UTC)
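
For anyone following along, here is a rough Python sketch of how an accent value could be normalised, based on the definitions given earlier in this thread (0 = heiban, 1 = atamadaka, a number equal to the mora count = odaka, anything in between = nakadaka). The name `normalize_accent` and the returned tuple are assumptions for illustration; the real module is written in Lua and may handle edge cases differently.

```python
import re
from typing import Tuple


def normalize_accent(acc: str, mora_count: int) -> Tuple[str, int]:
    """Map an accent value such as 'h', '0', 'o', 'n3' or '3' to (type, downstep).

    Hypothetical helper: type is one of 'h', 'a', 'n', 'o', and downstep is
    the number of the mora after which the pitch drops (0 for heiban).
    """
    acc = acc.strip().lower()
    if acc in ("h", "0"):
        return ("h", 0)
    if acc in ("a", "1"):
        return ("a", 1)
    if acc == "o":
        return ("o", mora_count)
    match = re.fullmatch(r"n?(\d+)", acc)
    if not match:
        raise ValueError(f"unrecognised accent value: {acc!r}")
    downstep = int(match.group(1))
    if downstep == mora_count:
        return ("o", downstep)      # drop after the last mora: odaka
    if 2 <= downstep < mora_count:
        return ("n", downstep)      # drop somewhere in the middle: nakadaka
    raise ValueError(f"downstep {downstep} does not fit a {mora_count}-mora term")


print(normalize_accent("2", 2))
print(normalize_accent("n3", 5))
```

Returning the downstep number together with the letter keeps nakadaka unambiguous, which is the gap that a bare "n" leaves open.
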

More thoughts :) --
  • There are sometimes more than just two pitch accent patterns. The most I can recall running into is three, but I suppose it's possible that a handful of terms might even have four.
  • I'm changing the description for the dev param -- the number should really be described as the mora number, as some syllabic analyses would give incorrect results. For instance, かんした could be analyzed as having two syllables (sounding like /kan.ɕta/ in casual speech), but four morae, and the devoiced mora is the third one.
  • For references, I think it's best to have the default be nothing. There are terms where Daijirin doesn't include any pitch accent, and I've misplaced my NHK pronunciation dictionary (probably in a box in storage), but I work with native speakers and sometimes crib from them. In these cases, I deliberately don't list any reference, since there isn't really any -- but I think the pitch information is important enough to include, until such time as I can find a real reference to add.
Also, by "You can also do it the conventional way, |acc_ref=[1]" -- do you mean that it's possible to add the call to {{R:Daijisen}}, etc., directly as the acc_ref param value?
Thanks again, again!  :D ‑‑ Eiríkr Útlendi │ Tala við mig 19:23, 17 April 2014 (UTC)


  • I have added |acc3, |acc4 and |acc5, since there might be occasions where non-Tokyo accent patterns would like to be specified too.
  • 'dev': I think I might have described it inaccurately... As the module analyses it, the 'dev' parameter is the position of the devoiced syllable in the kana string. eg. hyakushou should have |dev=3 (not 2), and だいこん やくしゃ should have |dev=6 (spaces are not counted). This is inconsistent with the format of the accent parameter, but I think it is easier to specify and easier for the module to handle.
  • 'ref': Oops, I forgot to put nowiki tags around it. It should read |acc_ref=<ref name="NHK">{{R:NHK Hatsuon}}</ref>.
  • 'default ref': I have removed the default reference, so that there is no reference listed when the parameter is unspecified.
  • 'dehijacking the talkpage': I agree... Hence it is here now.

Thanks. Wyang (talk) 22:01, 17 April 2014 (UTC)
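
Since |dev= counts kana rather than morae, a small Python sketch may help when converting from one to the other. It assumes, going by the examples in this thread (hyakushou with |dev=3, だいこん やくしゃ with |dev=6), that spaces are skipped and that a small kana such as ゃ belongs to the mora of the kana before it, with the position of the last kana of the mora being the value to use. The helper name `dev_position` and the `SMALL_KANA` set are made up for illustration.

```python
# Small kana that combine with the preceding kana into one mora (assumed set).
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


def dev_position(kana: str, mora_number: int) -> int:
    """Return the kana position (1-based, spaces skipped) where a mora ends.

    Hypothetical helper: converts a mora number into a value for |dev=.
    """
    position = 0     # position of the current kana, not counting spaces
    mora_ends = []   # kana position at which each mora ends
    for char in kana:
        if char.isspace():
            continue
        position += 1
        if char in SMALL_KANA and mora_ends:
            mora_ends[-1] = position   # small kana extends the previous mora
        else:
            mora_ends.append(position)
    if not 1 <= mora_number <= len(mora_ends):
        raise ValueError("mora number out of range")
    return mora_ends[mora_number - 1]


print(dev_position("ひゃくしょう", 2))
print(dev_position("だいこん やくしゃ", 6))
```
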


Long vowel oddity spotted at パーソナルコンピューター

Just created this entry, and noticed that the ピュー got romanized oddly in the romaji-with-tone-marks bit in the pronunciation section:

[pàásónárúkóńpyúーtàà]

I'm about to log off for the night. If you have time, could someone look at the module and see what's going on there?

Cheers, ‑‑ Eiríkr Útlendi │ Tala við mig 04:53, 20 April 2014 (UTC)

Sorry for the delay. Fixed now. Wyang (talk) 08:02, 22 April 2014 (UTC)

Pitch on moraic /n/ > ん

I was reformatting 日本 and noticed that the downstep that occurs on the final ん isn't being indicated in the romanized version with tone marks. For instance, {{ja-pron}} is giving [nìhón] and [nìppón], when it should be outputting [nìhóǹ] and [nìppóǹ] instead.

For that matter, even if there were no downstep, the template should still show tone marks for moraic /n/. Could that be fixed? ‑‑ Eiríkr Útlendi │ Tala við mig 05:59, 21 April 2014 (UTC)

Thanks, I think it's fixed now. Wyang (talk) 08:02, 22 April 2014 (UTC)
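
To illustrate the expected output, here is a minimal Python sketch that puts a tone mark on every mora, including moraic /n/. It is not the module's code: the function name `add_tone_marks`, the list-of-morae input and the H/L pattern string are all assumptions for the example, and geminate (っ) morae are not handled.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, used here for high pitch
GRAVE = "\u0300"  # combining grave accent, used here for low pitch


def add_tone_marks(morae, pattern):
    """Attach a tone mark to each romanized mora according to an H/L pattern.

    Hypothetical helper: morae is a list such as ['ni', 'ho', 'n'] and
    pattern is a string such as 'LHL' with one letter per mora.
    """
    if len(morae) != len(pattern):
        raise ValueError("need exactly one H or L per mora")
    marked = []
    for mora, level in zip(morae, pattern):
        mark = ACUTE if level == "H" else GRAVE
        # Mark the first vowel; moraic n has no vowel, so mark its last letter.
        index = next(
            (i for i, ch in enumerate(mora) if ch in "aiueo"),
            len(mora) - 1,
        )
        marked.append(mora[: index + 1] + mark + mora[index + 1:])
    # Compose base letter + combining mark into single characters where possible.
    return unicodedata.normalize("NFC", "".join(marked))


print(add_tone_marks(["ni", "ho", "n"], "LHL"))
```
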

Displaying numbers next to pitch accent names

@Wyang It would be good to add numbers next to pitch accent names, similar to how some paper and online dictionaries mark accents, e.g. 現在 on Weblio. It would make it easier to cross-reference Wiktionary accent names to those numbers. User:Eirikr seems to agree. Do you think it's a good idea? --Anatoli (обсудить/вклад) 01:45, 20 June 2014 (UTC)

Added now. Wyang (talk) 02:22, 20 June 2014 (UTC)
Thank you. I've added [ ]. Is that OK? From what I've seen so far, either a superscript number is used or a number in square brackets. --Anatoli (обсудить/вклад) 03:01, 20 June 2014 (UTC)
Yes, please prettify anything. :) Wyang (talk) 03:58, 20 June 2014 (UTC)

宿題

@Wyang, @Eirikr

It didn't work on 宿題 (しゅくだい), the しゅ part. --Anatoli (обсудить/вклад) 02:19, 2 July 2014 (UTC)

Why is |dev=1? I thought vowel devoicing only occurs interconsonantally. Wyang (talk) 06:48, 2 July 2014 (UTC)
It's between two devoiced consonants ɕ and k. Same with しかし (working fine) and 少し (adding now). NHK even uses a similar notation to ours for devoiced vowels. --Anatoli (обсудить/вклад) 07:20, 2 July 2014 (UTC)
It should be |dev=2 instead. |dev= is the position of kana with devoiced vowel in the input kana string. Wyang (talk) 07:31, 2 July 2014 (UTC)
Oh, thank you. Silly me. :) --Anatoli (обсудить/вклад) 07:37, 2 July 2014 (UTC)
  • Just saw this thread again. Wyang, when kana compounds like しゅ are devoiced, the whole thing should be marked as devoiced, like しゅ. Marking it as し makes it look like [ɕiɯ̥ᵝ] or some such oddness, when what we want to indicate instead is [ɕɯ̥ᵝ] or [ɕʲɯ̥ᵝ]. ‑‑ Eiríkr Útlendi │ Tala við mig 01:40, 5 March 2015 (UTC)
@Eirikr I didn't notice this, thanks. You're right. I'm also inviting you to join Wiktionary:Beer_parlour/2015/February#Simplification_of_topic_categories_adding, which may affect Japanese categories, hopefully for the better, if implemented. --Anatoli T. (обсудить/вклад) 02:00, 5 March 2015 (UTC)

dev2, dev3?

@Wyang Frank, can there be more dev's, please, as in 蛋白質 to get [tã̠mpa̠kɯ̥ᵝɕit͡sɯ̥ᵝ], e.g. ...|dev=4|dev2=6...? --Anatoli T. (обсудить/вклад) 03:22, 28 January 2015 (UTC)

OK, second devil added. I want to rewrite its code... so that the devils can be written as たんぱ(く)し(つ), avoiding the need for |dev11=. Keep it like this for now, I will change the format if I ever get around to doing that... Wyang (talk) 03:43, 28 January 2015 (UTC)
Ah, thanks. I guess all uses will need to be updated? --Anatoli T. (обсудить/вклад) 03:50, 28 January 2015 (UTC)
That's not a big problem when done semi-automatically. We've managed to do all the Chinese format changes... Wyang (talk) 03:53, 28 January 2015 (UTC)
You're genius. :) --Anatoli T. (обсудить/вклад) 04:18, 28 January 2015 (UTC)
@Wyang Hi Frank, I'm back in Melbourne after three weeks in France (also a bit of Belgium). I'm eager to see the change, as 少し and しかし also need to be fixed. :) --Anatoli T. (обсудить/вклад) 01:20, 5 March 2015 (UTC)

Bug with sutegana at the beginning of a term

{{ja-pron|ふぁふぃとぅふぇふぉふぁ|acc=1}} {{ja-pron|ふぃとぅふぇふぉふぃ|acc=1}} {{ja-pron|とぅふぇふぉふぃ|acc=1}} "fúァ" and "fúィ" probably isn't desirable—umbreon126 07:37, 7 March 2015 (UTC)

'tis okay now —umbreon126 05:11, 27 March 2015 (UTC)
Thanks, User:Wyang! --Anatoli T. (обсудить/вклад) 05:19, 27 March 2015 (UTC)
No worries. :) Wyang (talk) 20:58, 27 March 2015 (UTC)

Delimiting vowels

For example, on 女王 (applies to the readings じょおう [2] and にょおう [2]), there is a need to delimit the vowels for the IPA to render properly, but when this is done using . as is done in ja-noun, the . is printed in the kana and it also messes up the accent because the dot gets counted as a kana. Nibiko (talk) 08:25, 4 June 2015 (UTC)

Fixed by Kc kennylau! Thank you so much, Kc kennylau! <3 Nibiko (talk) 04:36, 25 April 2016 (UTC)

Twofold long vowels

Different to my above-mentioned concern, I noticed that on 蓊鬱 おううつ is represented as òóótsú just before the section where it says Heiban. I would expect it to be òóútsú. Nibiko (talk) 03:12, 24 August 2015 (UTC)

[çiβa̠kɯ̥ᵝɕʲa̠]

It gives the pronunciation [çiβa̠kɯ̥ᵝɕʲa̠] for 被爆者. I don’t know where this [β] comes from. The intervocalic /ɡ/ is often realized as a fricative but the phoneme /b/ doesn’t change at least in my pronunciation. In addition, [ɕʲ] is redundant because [ɕ] is already palatalized. — TAKASUGI Shinji (talk) 12:27, 9 September 2015 (UTC)

@Wyang. --Anatoli T. (обсудить/вклад) 00:03, 14 July 2016 (UTC)
Thanks. [β] is from Japanese phonology#Weakening - should the rule be removed? Removed palatalisation of [ɕʑ]. Wyang (talk) 00:11, 14 July 2016 (UTC)
I think we should remove the rule of β. — TAKASUGI Shinji (talk) 06:11, 14 July 2016 (UTC)
Ok no problem. Removed. Thanks! Wyang (talk) 09:01, 14 July 2016 (UTC)
Thank YOU! — TAKASUGI Shinji (talk) 00:53, 15 July 2016 (UTC)

Sorting

Can a sort parameter be added to the template? —britannic124 (talk) 16:28, 13 July 2016 (UTC)

  • Since this template already uses a kana-ized string as its primary input, a sort key shouldn't be necessary.
I do see that the underlying module is not changing katakana to hiragana for sorting purposes, but this should be fixed in the module itself, so that the sort key is correctly and automatically derived from the data that the module is already using. @Wyang, is that something you could do? If not, could you ping someone who could? ‑‑ Eiríkr Útlendi │Tala við mig 18:14, 13 July 2016 (UTC)
Sure, I added sortkeys to the IPA and audio categories. It's a bit of an ugly hack though, since those templates do not seem to support |sort=. Wyang (talk) 22:20, 13 July 2016 (UTC)
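
As a rough illustration of the katakana-to-hiragana step mentioned above, here is a short Python sketch. It relies on the fact that the katakana block (ァ to ヶ) sits exactly 0x60 code points above the matching hiragana in Unicode; the function name `katakana_to_hiragana` is made up, and this is not necessarily how the module derives its sort key.

```python
def katakana_to_hiragana(text: str) -> str:
    """Convert katakana to hiragana for use as a sort key.

    Hypothetical helper: characters outside ァ..ヶ (including the long
    vowel mark ー) are left unchanged.
    """
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


print(katakana_to_hiragana("カリフラワー"))
print(katakana_to_hiragana("デッキ"))
```
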

ん (‘n’) before approximants

Shouldn’t “n” before “w” be represented as [ɰ̃ᵝ], like in “denwa” [dẽ̞ɰ̃ᵝɰᵝa̠]? (Or at least [dẽ̞ɴɰᵝa̠]?) And “n” before “y” as [j̃], like in “shin’ya” [ɕĩj̃ja̠]? —britannic124 (talk) 18:02, 2 September 2016 (UTC)

Recent change to Module:ja-pron

Hi @Eirikr! Just letting you know that there was a change to Module:ja-pron recently by User:Nardog. I'm not qualified to comment on the IPA changes, but I know you are definitely. :) Wyang (talk) 09:17, 17 May 2017 (UTC)

dev2

Since the Japanese attention category has no organisation, I'm leaving my concern here. It would be good if you could override the value of the dev parameter for a certain accent, as this would allow exceptions to be expressed in a single use of the template. See 増幅器 and 屹度. Nibiko (talk) 13:02, 19 June 2017 (UTC)

Co-occurring pitch accents

ja-pron does not currently support co-occurring pitch accents. If a term has multiple pitch accents divided across words, then see 因果応報 and 一期一会 for the current way to format them. Nibiko (talk) 02:29, 29 June 2017 (UTC)