Template talk:ja-usex

From Wiktionary, the free dictionary

How many characters can it handle?

See 海行かば.

I tested a very large usage example. How many characters can this template handle before it runs into the "did not match the kana" error (due to overflow)? Just wondering. --POKéTalker (talk) 05:33, 9 November 2017 (UTC)

@Poketalker: I don't know if I understand this properly, but I'll attempt an answer. The Lua pattern length limit (mw.ustring.maxPatternLength) is 10,000 bytes, so I guess the pattern used internally by the ruby function in Module:ja has to be smaller than or equal to that. The pattern won't be the same length as the text in the template, so who knows what the byte limit or character limit for input to the template is. (The patterns used for your usage example were "(.+)%%(.+) (.+)%%(.+)%%(.+) (.+) (.+)%%(.+)'''(.+)%%(.+) (.+)%%(.+) (.+)%%(.+) (.+)%%(.+)%%(.+) (.+)%%(.+) (.+)%%(.+)%%(.+) (.+) (.+)'''(.+) (.+)%%(.+)%%(.+) (.+)%%(.+)%%(.+)%%(.+)", 180 bytes long, and "(.+) の (.+)%%(.+)の(.+) を (.+)り (.+)へし (.+)'''(.+) (.+)かば (.+)く (.+) (.+) (.+)かば (.+)%%(.+)す (.+) (.+) の (.+) に こそ (.+)なめ かへり(.+) は せじ'''と (.+)%%(.+)て(.+)%%(.+) の さき の (.+)けば (.+)み", 245 bytes long, so they didn't come anywhere close to the limit. The man'yougana and modern spelling in the wikitext are 231 and 280 bytes.) There are other unpredictable factors, like whether Lua will run out of memory or processing time with longer text. — Eru·tuon 06:29, 9 November 2017 (UTC)
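To get a feel for how pattern size relates to the input, here is a rough Python reconstruction. It assumes, judging only from the patterns quoted above, that each run of kanji in the base text becomes a `(.+)` capture and everything else (kana, spaces, `'''`) is kept literally, with Lua magic characters such as `%` escaped as `%%`. The names `lua_ruby_pattern`, `escape_lua` and the kanji range used are hypothetical; this is not the actual Module:ja code. What counts against the 10,000-byte limit is the UTF-8 byte length, and each kana or kanji takes 3 bytes.

```python
import re

# Assumption: "kanji" = CJK Unified Ideographs (U+4E00-U+9FFF) plus the iteration mark 々 (U+3005).
KANJI_RUN = re.compile(r"[\u4e00-\u9fff\u3005]+")

# Lua pattern magic characters; Lua escapes each one by prefixing "%".
LUA_MAGIC = re.compile(r"([\^$()%.\[\]*+\-?])")

# mw.ustring.maxPatternLength, as quoted in the comment above.
MAX_PATTERN_BYTES = 10_000


def escape_lua(literal: str) -> str:
    """Escape a literal string so a Lua pattern matches it verbatim."""
    return LUA_MAGIC.sub(r"%\1", literal)


def lua_ruby_pattern(base: str) -> str:
    """Hypothetical sketch: each kanji run becomes (.+); all other text is matched literally."""
    parts = []
    pos = 0
    for m in KANJI_RUN.finditer(base):
        parts.append(escape_lua(base[pos:m.start()]))
        parts.append("(.+)")
        pos = m.end()
    parts.append(escape_lua(base[pos:]))
    return "".join(parts)


def pattern_bytes(pattern: str) -> int:
    """The limit is measured in bytes, so count UTF-8 bytes, not characters."""
    return len(pattern.encode("utf-8"))


pattern = lua_ruby_pattern("選%択は最も慎%重")
print(pattern)                                      # (.+)%%(.+)は(.+)も(.+)%%(.+)
print(pattern_bytes(pattern))                       # 30
print(pattern_bytes(pattern) <= MAX_PATTERN_BYTES)  # True
```

Under these assumptions the pattern grows roughly linearly with the input, so a usage example would have to be very long before the pattern itself hits the byte limit.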
@Erutuon: See [1]. I think this is what POKéTalker means. —suzukaze (tc) 07:51, 11 November 2017 (UTC)
@Suzukaze-c: That's strange. I don't understand it, because the pattern isn't over 10,000 bytes, and it looks like it should match. — Eru·tuon 08:41, 11 November 2017 (UTC)
Okay, my new theory is that it was because of characters or sequences of characters that appear more than once in the kana text. The "(.+)" elements in the pattern are greedy and they were grabbing too much text (that is, skipping from the first of the repeated sequences of characters to the second, or something), leaving not enough for the ones later on in the pattern. Hence the pattern didn't match. It's hard to verify this theory, because you can't monitor the matching function in the C or PHP code while it's trying to match and figure out where it fails; you just get the result, match or no match.
I changed these elements to non-greedy "(.-)", and now the problem seems to be fixed. — Eru·tuon 05:14, 14 November 2017 (UTC)
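A small illustration of the greedy/non-greedy difference, written with Python `re` rather than Lua patterns (Lua's `.-` is a lazy zero-or-more match, which Python writes as `.*?`). The base text 物の怪 with reading もののけ is a hypothetical example in which the literal kana の also occurs inside a reading. A backtracking matcher finds a full match either way here, so this does not reproduce the "did not match" failure itself; it only shows how the quantifier choice decides where the kana get split.

```python
import re

# Hypothetical pattern for base text 物の怪 (reading もののけ):
# 物 and 怪 each become a capture; the kana の is matched literally.
GREEDY = re.compile(r"(.+)の(.+)")   # like Lua "(.+)の(.+)"
LAZY = re.compile(r"(.*?)の(.*?)")   # like Lua "(.-)の(.-)"

reading = "もののけ"

# fullmatch anchors both ends, like a Lua pattern wrapped in ^...$
greedy = GREEDY.fullmatch(reading).groups()
lazy = LAZY.fullmatch(reading).groups()

print(greedy)  # ('もの', 'け')  -> 物 = もの, 怪 = け
print(lazy)    # ('も', 'のけ')  -> 物 = も, 怪 = のけ
```

Greedy captures favour the earlier kanji and lazy captures favour the later one, so neither is right in every case; explicit `%` separators or spaces in the reading are what remove the ambiguity.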
If I may, could I put the entire chōka here for testing? If not, how about the talk page of the link above, or Wikiquote? --POKéTalker (talk) 02:23, 17 November 2017 (UTC)

attaching ruby to alphanumeric characters

Hello. How do I attach ruby to a phrase like 100m走? --Dine2016 (talk) 17:07, 28 March 2018 (UTC)

@Dine2016: {{ja-r|100%m%走|あ%い%う}} → 100(あ)m(い)走(う) (aiu). — Eru·tuon 23:08, 28 March 2018 (UTC)
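The `%` separators work because they pin the alignment down explicitly: the n-th segment of the base text gets the n-th segment of the reading, whether the segment is digits, Latin letters or kanji. A minimal Python sketch of that pairing; `pair_ruby`, `to_html` and the exact HTML produced are hypothetical, and the real {{ja-r}} output may differ.

```python
from __future__ import annotations

from html import escape


def pair_ruby(base: str, reading: str) -> list[tuple[str, str]]:
    """Hypothetical sketch: split base and reading on % and pair segments one-to-one."""
    base_parts = base.split("%")
    reading_parts = reading.split("%")
    if len(base_parts) != len(reading_parts):
        raise ValueError("base and reading have different numbers of %-separated segments")
    return list(zip(base_parts, reading_parts))


def to_html(pairs: list[tuple[str, str]]) -> str:
    """Render each pair as <ruby>, with <rp> parentheses for browsers without ruby support."""
    return "".join(
        f"<ruby>{escape(b)}<rp>(</rp><rt>{escape(r)}</rt><rp>)</rp></ruby>"
        for b, r in pairs
    )


pairs = pair_ruby("100%m%走", "あ%い%う")
print(pairs)  # [('100', 'あ'), ('m', 'い'), ('走', 'う')]
print(to_html(pairs))
```

The sketch raises an error on a segment-count mismatch rather than guessing an alignment, which is one possible design choice for reporting bad input.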

Indentation issue on romanization and translation

See usage examples at 八雲立つ (ya kumo tatsu)

If {{ja-usex}} has no |m= and |m_kana=, the romanization and translation are indented; if it has them, they are not. Can anyone suggest a fix? ~ POKéTalker 21:01, 25 May 2018 (UTC)

万葉集 quotations

@Eirikr, Poketalker, Dine2016, Kc kennylau: What orthography is the "modern spelling" of 万葉集 quotations in? (asking because of this edit). —Suzukaze-c 03:20, 10 January 2019 (UTC)

@Suzukaze-c: it should be the historical orthography; the |rom= romanization can be either academic OJP or modern, depending on the editor's preference. ~ POKéTalker 04:12, 10 January 2019 (UTC)
'Modern spelling' probably refers to modernizing man'yōgana to kanji-kana mixed writing ('kanji-kana majiribun'), but it does not make it clear whether historical or modern kana orthography is used. This user is probably modernizing the kana orthography because 'modern spelling' confusingly suggests it. --Dine2016 (talk) 04:28, 10 January 2019 (UTC)
This user has also demonstrated some related confusion about spellings and kana, particularly at 五十 and 五十嵐 (at least, those are the ones I've bumped into so far). I suspect they aren't a native Japanese speaker. FWIW, their IP geolocates to the UK.
I confess I'm confused where the label "modern spelling" came from. I don't see this label used anywhere in the {{ja-usex}} template documentation, for instance. @Suzukaze-c, Dine2016, could you enlighten me? Is this an editor-facing label somewhere? If so, we should probably find some way of either changing the label, or better explaining in the documentation somewhere what this is intended for.
Re: the kana strings themselves, we should be using the kana appropriate to the time: ゐ should not be reduced to い, ゑ should not be reduced to え, ふ should not be reduced to う, etc. Re: the romanization, I'm a strong proponent of 1) strict transcription, with 2) an awareness of historical and reconstructed sound values to the extent known. For instance, all the kana starting with /h/ or /f/ in modern Japanese should be romanized with ⟨p⟩, all the 乙類 (otsu-type) vowels should be annotated with 2, etc. These distinctions were phonemically significant at that stage of the language. If we want to also include a fully modernized rendering, I'm happy for us to do that, but then we're starting to talk more about translation from ancient to modern, rather than just reproducing the ancient text. ‑‑ Eiríkr Útlendi │Tala við mig 17:47, 10 January 2019 (UTC)
"modern spelling" appears in the template output: "[Man'yōgana]"; "[Modern spelling]". —Suzukaze-c 17:55, 10 January 2019 (UTC)
Aha, thank you! Explains why I wasn't seeing it in the wikicode. :) That definitely needs changing. I'd like to propose Kanji + kana. ‑‑ Eiríkr Útlendi │Tala við mig 19:15, 10 January 2019 (UTC)
I thought that perhaps it could be renamed "normalized spelling". —Suzukaze-c 08:54, 20 January 2019 (UTC)

marking usexes as Classical?

would this be useful? and perhaps specifying "Classical" could modify the automatic romaji as well. —Suzukaze-c 02:45, 23 January 2019 (UTC)

What about adopting the look of {{zh-x}}: put a label to the right of the text indicating the language and orthography, format the |ref= argument in small font as "From:" below the text, and automatically switch to {{ja-usex-inline}} for short examples?
By the way, Classical Japanese or 文語 (bungo) can also have modern pronunciations. 青取之於藍而青於藍 → 青は之を藍より取りて而も藍より青し, despite being 文語, is likely today to be pronounced "ao wa kore o ai yori torite shikamo ai yori aoshi", not "awo fa kore wo awi yori torite sikamo awi yori awosi", by a modern reader of ancient texts. --Dine2016 (talk) 16:21, 23 January 2019 (UTC)
The same label format is already used with 万葉仮名 spellings, so using it again seems sensible.
As for |ref=, it is already being used in a way similar to the |ref= parameter of {{ux}}, so I do not fully support changing the format now, unless we are willing to find and reformat them all. (I also don't entirely like the inconsistency between {{zh-x}} and standard formatting TBH.) Automatically switching to {{ja-usex-inline}} could be good, although I don't entirely like the way it's implemented in {{zh-x}} either.
And of course we can provide modern romaji as well; but I figured that since we're using "historical" romaji in {{ja-readings}} and {{ja-conj-bungo}}, we ought to include it in {{ja-usex}} as well (and perhaps {{ja-pos}}?). —Suzukaze-c 02:29, 24 January 2019 (UTC)
On the contrary, I find {{zh-x}}'s treatment of |ref= to be superior because it displays the quotation with its source in one location, which makes the page look more modular. This, combined with the use of {{zh-ref}} in etymology sections, means Chinese entries rarely need ===References===\n<references/>, which makes them easier to type and maintain. --Dine2016 (talk) 04:35, 24 January 2019 (UTC)
True. —Suzukaze-c 05:04, 24 January 2019 (UTC)
  • Do you have any example pages? Your description doesn't make sense to me, and the Template:zh-x page doesn't show any examples of how ref= works or displays. I also don't understand the worry about maintaining ===References===; that is trivial and nothing I've ever seen as cause for concern.
As this thread currently stands, I am somewhat opposed to the proposal -- the {{zh-x}} approach appears to be at odds with how references are handled throughout the entire rest of the site, which is poor usability (introducing such arbitrary variance can be confusing to readers and editors alike). I'm also not thrilled with the idea of imposing romanizations based on modern pronunciation: in some cases, that's appropriate, but not in all cases. ‑‑ Eiríkr Útlendi │Tala við mig 17:48, 24 January 2019 (UTC)
An example might be 之#Chinese. I agree it's odd to show the source in small type below the quotation. When I made the reply above, I mistakenly believed that the source of a quotation had to be given either as a ref to the upper right of the quotation or in the format of {{zh-x}}. I wasn't aware that the usual format is to put it above the quotation via indentation. I withdraw what I said. To turn to the other question, the first thing to recognize is that when a Japanese entry has multiple etymology sections, it cannot be generated fully by {{ja-new}} and must be typed by hand or assembled by manual copy-paste. As such, the less code there is (and the more modular it is), the more productive editing becomes. One thing I don't like about {{ja-pron}} is that it is a 係り結び (kakari-musubi, i.e. one part of the page requires a matching part elsewhere): if you use |acc= in {{ja-pron}}, you have to add ===References===\n<references/> at the end of the page; when you remove |acc=, you have to check whether to remove that as well. This makes page maintenance tedious and error-prone if you add a hundred etymology sections to existing entries a day. --Dine2016 (talk) 18:48, 24 January 2019 (UTC)
(FWIW, if we want to have a 100% "standard" appearance, [2] might be a consideration. [IMO, the labels are not distinct enough from the example text.] —Suzukaze-c 22:57, 24 January 2019 (UTC))
Um, kanbun kundoku is an unusual source of quotations and mainly for demonstrative purposes. More commonly the quotation is itself from a Classical Japanese text (such as Taketori Monogatari) and in such cases showing the original orthography and the Classical pronunciation might suffice. --Dine2016 (talk) 08:43, 25 January 2019 (UTC)
Indeed; I chose kanbun for demonstration purposes (featuring all possible bells and whistles). —Suzukaze-c 09:28, 25 January 2019 (UTC)

Middle Japanese header

By the way, I wonder if Classical quotations (and classical-only words) should be under ==Classical Japanese== or ==Middle Japanese==. --Dine2016 (talk) 08:44, 25 January 2019 (UTC)

That remains an unresolved problem. There's no ISO code for either: we have OJP for Old Japanese in the early Heian and even older, and then just JA for Japanese. The expectation that educated Japanese-language writers and readers would be familiar with Classical, which is basically a variant of Middle, somewhat complicates the situation, as it blurs the distinction for when usage ceased -- effectively, it *hasn't* ceased, so Middle / Classical lives on as a kind of register of Modern. Hence, modern Japanese dictionaries still include a lot of things that are Middle / Classical.
Considering the state of things, I'm happiest with the approach of adding usage notes and labels, rather than trying to arbitrarily split things up altogether under different language headings. ‑‑ Eiríkr Útlendi │Tala við mig 18:25, 25 January 2019 (UTC)
@Eirikr: Thanks for the reply. I think a code for Middle Japanese (or Classical Japanese) could be needed in the etymology section, like this:

From Middle Japanese よみがへる (yomigaferu, stem yomigafer-, “to revive”), from Old Japanese yo2mi2gape1r- (“to revive”), from yo2mi2 (“land of the dead, underworld”) + kape1r- (“to return”), meaning “to return from the underworld”. Equivalent to 黄泉 (yomi) + 帰る (kaeru).

The current approach of mixing different stages of Japanese in etymology sections, like (awo → ao), doesn't work when there is no kanji to cover the difference in 仮名遣い (kana orthography). --Dine2016 (talk) 07:34, 1 February 2019 (UTC)

"Modern spelling"

I think the Man'yōshū format could be generalized to a format where the first line is the original spelling, the second line is modernized spelling, the third line modern pronunciation, and so on. For example:

NIFON NO COTOBA TO Hiſtoria uo narai xiran to FOSSVRV FITO NO TAMENI XEVA NI YAVA RAGETARV FEIQE NO MONOGATARI. [Portuguese spelling]
日本(にほん)の言葉(ことば)とHistoriaを習(なら)い知(し)らんと欲(ほっ)する人(ひと)のために世話(せわ)に和(やわ)らげたる平家(へいけ)の物語(ものがたり) [Modern spelling]
Nihon no kotoba to Historia o narai shiran to hossuru hito no tame ni sewa ni yawaragetaru Heike no Monogatari. [Modern pronunciation]

--Dine2016 (talk) 16:03, 14 August 2019 (UTC)[reply]

@Dine2016 -- I'm fine with including both historical at-that-time spellings / pronunciations, and the modern.
However, for the historical, do we really want to use obsolete typography like the ſ long "S"? This character is really hard to parse correctly in most modern fonts; when reading the above, I initially thought that Historia was Hiltoria with an L instead of the long-S. If we really feel a need to use such obsolete typographical conventions, then from a basic usability perspective, I think we should also use fonts that render these in easier-to-see glyph forms.
I'm also confused by the irregular capitalization in the example. The Portuguese is given as a phonetic gloss of the Japanese, and there's no apparent pattern to the capitalization (such as capitalizing only on'yomi, or some similar practice). Is this intended to be a faithful copy of a Portuguese work that didn't actually include the Japanese text with kana and kanji (much like the Nippo Jisho)? ‑‑ Eiríkr Útlendi │Tala við mig 22:17, 14 August 2019 (UTC)
If other languages replace obsolete typography like ſ with modern ones then it makes sense for Japanese to do the same. Actually, I've done the same for an example sentence at あるは -- replacing くさ〳〵 (which most fonts cannot render properly) with くさぐさ.
The irregular capitalization comes from the title of the book: https://dglb01.ninjal.ac.jp/BL_amakusa/ --Dine2016 (talk) 02:55, 15 August 2019 (UTC)

@Eirikr, Nyarukoseijin: in addition to adding a |pt= parameter for Japanese examples written in Portuguese orthography, there should be parameters for other languages as well. Also needed is support for Classical Chinese passages with 漢文訓読 (annotated kanbun; see the discussion between me and @荒巻モロゾフさん). Aramaki-san has been using the |m= and |m_kana= parameters, but such passages may not need |m_kana= or ruby on the original, only the kanbun marks. In summary:

  • |m= and |m_kana= are for passages in, for example, the Shinsen Man'yōshū or any 真名序 (manajo, kanji preface),
  • |en=, |pt=, |es=, etc. for Romanized passages in Latin languages, and
  • |k= or |kanbun= for annotated Classical Chinese passages (i.e. with kanbun marks).

What do you think? Sorry if this is unclear. ~ POKéTalker 22:27, 11 June 2020 (UTC)

|k_kana= is also needed, because kanbun can have furigana (and okurigana). Also, [Modern spelling] is unnatural when historical kana orthography (歴史的仮名遣) is used, so please add an option to switch the label to [Classical spelling]. --荒巻モロゾフ (talk) 08:32, 12 June 2020 (UTC)

最も

{{ja-usex|選%択は最も慎%重|せん%たく は もっとも しん%ちょう}}
{{ja-usex|選%択は最%も慎%重|せん%たく は もっと%も しん%ちょう}}

選(せん)択(たく)は最(もっと)も慎(しん)重(ちょう)

sentaku wa mottomo shinchō
-

選(せん)択(たく)は最(もっと)も慎(しん)重(ちょう)

sentaku wa mottomo shinchō
-

--Dine2016 (talk) 05:36, 20 November 2019 (UTC)

ruby CSS

@Suzukaze-c There is a CSS rule font-feature-settings: "ruby" 1; which makes ruby look better on my Windows 10 computer. Strangely, it is applied to 使い but not 出づ. Can you take a look? --Dine2016 (talk) 11:26, 16 January 2020 (UTC)

Which part of the entries? —Suzukaze-c 18:11, 16 January 2020 (UTC)
The furigana in quotations, {{ja-r}}, and now headword lines. In 使い, the CSS is applied, so the furigana look normal. But in 出づ, the CSS is not applied, so they appear in a condensed font (each character is about half as wide as it is tall). At least that is what I see on my Windows 10 computer with the additional Japanese fonts installed. --Dine2016 (talk) 01:06, 17 January 2020 (UTC)
@Dine2016 It appears to be Template:ruby/styles.css. —Suzukaze-c 03:03, 17 January 2020 (UTC)
ruby > rt,
ruby > rtc {
	font-feature-settings: 'ruby' 1;
}
使い ultimately uses {{ruby}}, and 出づ does not. This really should be in common.css to avoid such inconsistencies. —Suzukaze-c 03:06, 17 January 2020 (UTC)

Stripped bold formatting from kana?

  1. トラブルはごめんだ。
    Toraburu wa gomen da.
    Keep me out of your troubles.

@Theknightwho --Fish bowl (talk) 01:55, 6 February 2024 (UTC)

@Theknightwho It is coming from remove_ruby_markup, which has removed "style markup" since January. Is it necessary? —Fish bowl (talk) 22:51, 3 March 2024 (UTC)
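If the goal is only to drop ruby markup while keeping bold, the two kinds of markup can be stripped separately. A minimal Python sketch, assuming (hypothetically) that "ruby markup" means the `%` separators and "style markup" means wikitext quote runs (`''`, `'''`); the function names and the input string are illustrative, and this is not the actual `remove_ruby_markup` code.

```python
import re

# Hypothetical definitions, for illustration only:
RUBY_MARKUP = re.compile(r"%")        # ruby segment separators
STYLE_MARKUP = re.compile(r"'{2,3}")  # wikitext '' (italic) and ''' (bold)


def strip_ruby_only(text: str) -> str:
    """Remove ruby separators but keep bold/italic quote markup."""
    return RUBY_MARKUP.sub("", text)


def strip_ruby_and_style(text: str) -> str:
    """Remove ruby separators and bold/italic quote markup."""
    return STYLE_MARKUP.sub("", strip_ruby_only(text))


kana = "トラブル'''は'''ごめんだ。"
print(strip_ruby_only(kana))       # トラブル'''は'''ごめんだ。
print(strip_ruby_and_style(kana))  # トラブルはごめんだ。
```

Keeping the two steps separate would let the displayed kana keep its bold while a plain-text form (for example, for romanization) still has all markup removed.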