User talk:Suzukaze-c

Definition from Wiktionary, the free dictionary

bot cleanup work

{{ja-adj}} currently accepts |infl=i, |infl=い, |decl=i, |decl=い, and the like. Would it be a good idea to stick to a single format (say, |infl=i) and eliminate the rest? Similarly, I think {{ja-pron}} needs only one parameter for devoiced positions (preferably counted in morae).

(That said, the biggest difficulty of parsing Wiktionary data for reuse is distinguishing between different lexical items in a single entry. They are sometimes divided by Etymology headers, sometimes by Pronunciation, sometimes both, so PoS headers may be on L3, L4 or L5. If we used page titles in the format "KANJI/KANA", most entries would contain only one lexical item, and the task would be significantly easier.) --Nyarukoseijin (talk) 10:07, 19 May 2020 (UTC)
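For illustration, this is roughly what a reuser has to do today to find PoS headers regardless of whether they sit at L3, L4 or L5. It is a minimal Python sketch, assuming the usual ==Language== at L2. POS_HEADERS is a hypothetical, incomplete list, and templates or comments inside header lines are not handled.

```python
import re

# Hypothetical, incomplete list of PoS header titles (assumption for this sketch).
POS_HEADERS = {"Noun", "Proper noun", "Verb", "Adjective", "Adverb", "Prefix", "Suffix"}

# Matches "==Title==", "===Title===", etc.; group 1 is the run of "=" signs.
HEADER_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def pos_headers(wikitext):
    """Return (level, title, context) for every PoS header, whatever its level.

    `context` lists the enclosing headers below the L2 language header,
    e.g. ["Etymology 2"] or ["Pronunciation 1", "Etymology 1"].
    """
    stack = []  # (level, title) pairs of the headers currently open
    found = []
    for line in wikitext.splitlines():
        m = HEADER_RE.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2)
        # A new header closes every open header at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if title in POS_HEADERS:
            context = [t for lvl, t in stack if lvl > 2]
            found.append((level, title, context))
        stack.append((level, title))
    return found
```

The same Noun comes back at level 3 with no context in a single-etym entry, at level 4 under "Etymology N" in a multi-etym entry, and at level 5 under "Pronunciation N" and "Etymology N" otherwise. This is exactly the variation the KANJI/KANA idea would remove.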

infl: [1]
ja-pron: see Template talk:ja-pron (Ctrl-F "devm").
headers: Wiktionary:Beer parlour/2019/March#Eliminating the difference in formatting between no-etymology, single-etymology and multiple-etymology entries
( ;∀;)
page titles: True, but at the same time I worry that "choosing" a kanji spelling would be troublesome/controversial.
( ;∀;)
Suzukaze-c 04:29, 20 May 2020 (UTC)
infl: also text = replace(text, 'infl=い', 'infl=i')
ja-pron: convert existing |dev= to |devm=, then if dev ~= "" then error(...) end
headers: Thanks, I wasn't aware of that discussion. But I prefer something like ==English (1)== (which would render empty Etymology headers unnecessary).
page titles: choose the first kanji spelling listed in, say, Daijirin
Would it be a good idea to start cleanup work now, one kind of change at a time (say |infl=i), instead of waiting until everyone agrees on the final format? --Nyarukoseijin (talk) 11:02, 20 May 2020 (UTC)
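A slightly more defensive version of the replace(text, 'infl=い', 'infl=i') idea, as a Python bot-side sketch. It only touches {{ja-adj}} calls, renames the |decl= alias to |infl=, and converts kana values to romaji. The な→na mapping is an assumption about what "and the like" covers. Calls containing nested {{...}} are skipped rather than risk mangling them.

```python
import re

# Assumed kana -> romaji values for the inflection parameter (illustrative only).
KANA_TO_ROMAJI = {"い": "i", "な": "na"}

# A {{ja-adj ...}} call without nested templates; the lookahead keeps
# similarly named templates such as {{ja-adj-xyz}} from matching.
JA_ADJ_RE = re.compile(r"\{\{ja-adj(?=\s*[|}])[^{}]*\}\}")


def normalize_ja_adj(text):
    """Rewrite |decl= to |infl= and kana values to romaji inside {{ja-adj}}."""
    def fix(match):
        body = match.group(0)
        # Rename the |decl= alias to the canonical |infl=.
        body = re.sub(r"\|\s*decl\s*=", "|infl=", body)
        # Replace a kana value only when it is the whole value of |infl=.
        for kana, romaji in KANA_TO_ROMAJI.items():
            pattern = r"\|\s*infl\s*=\s*" + re.escape(kana) + r"(?=\s*[|}])"
            body = re.sub(pattern, "|infl=" + romaji, body)
        return body
    return JA_ADJ_RE.sub(fix, text)
```

Doing one kind of change per bot run like this keeps each diff easy to review, which fits the "one kind of change at a time" approach.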
Well, it would be most straightforward to remove all inflectional types from {{ja-adj}} since they're already in the Inflection section. If {{ja-adj}} needs them it can just transclude the whole entry and examine the Inflection section.
Maybe the most useful thing at the moment is to improve the inflectional templates. It's better to use a unified template (e.g. {{ja-infl|i}} instead of {{ja-i}}) since it allows easy extension (e.g. {{ja-infl|no}}, {{ja-infl|nari|tari}}). The reading should be given in non-truncated form (e.g. むずかしい over むずかし) if given at all. Also it would be better to give the traditional and JFL paradigms separately -- the traditional paradigm can be extended for godan verbs (two kinds of 未然形, two kinds of 連用形, 可能動詞) and show 接続 info, the JFL paradigm should put the ます form first, etc. --Nyarukoseijin (talk) 11:48, 20 May 2020 (UTC)
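To make the "extended traditional paradigm" concrete, here is a rough Python sketch of the row logic a unified godan handler might use. It assumes a regular godan verb given in dictionary form, with the a-row of う verbs being わ. It deliberately does not generate 音便 forms (書い, 買っ, ...) or handle irregulars such as 行く, which a real {{ja-infl}} would need. The function name is hypothetical.

```python
# Kana rows for each godan ending, ordered a/i/u/e/o (う verbs use わ in the a-row).
GODAN_ROWS = {
    "う": "わいうえお", "く": "かきくけこ", "ぐ": "がぎぐげご",
    "す": "さしすせそ", "つ": "たちつてと", "ぬ": "なにぬねの",
    "ぶ": "ばびぶべぼ", "む": "まみむめも", "る": "らりるれろ",
}


def godan_stems(lemma):
    """Traditional (学校文法) stem forms of a regular godan verb, e.g. 書く."""
    stem, ending = lemma[:-1], lemma[-1]
    a, i, u, e, o = GODAN_ROWS[ending]
    return {
        "未然形": [stem + a, stem + o],  # 書か(~ない), 書こ(~う): two kinds
        "連用形": [stem + i],           # 書き(~ます); 音便 forms are not generated
        "終止形": stem + u,
        "連体形": stem + u,
        "仮定形": stem + e,
        "命令形": stem + e,
        "可能動詞": stem + e + "る",     # separate 一段 verb, not derived from 仮定形
    }
```

The JFL paradigm (ます形, て形, ...) would be a separate ordering built on the same stems, which is one argument for keeping both behind a single template.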
  • Re: I worry that "choosing" a kanji spelling would be troublesome/controversial. -- why choose? The reader would go to 付く or 着く or 就く or 即く or 憑く. The data would live at the [[KANJI/KANA]] page, while the reader-facing pages would present the data.
I don't think we can expect readers to understand that they need to go to any [[KANJI/KANA]] page.
(@Nyarukoseijin, correct me if I've misunderstood your idea.)
  • Re: adjective inflection templates, I like the idea of unifying into {{ja-infl}}. I confess I don't understand your distinction between "traditional" and "JFL", and I don't even know for sure what JFL means, so I don't quite know what to think of that part.
  • Re: entry structure in general, I've never been a fan of Wiktionary's insistence on different structure -- sub-Etym headings starting at L3 for single-etym entries, but at L4 for multi-etym entries. Unnecessarily complicated and confusing, and now biting us in the ass as we try to automate more. While any [[KANJI/KANA]] page would likely only have one etymology, I can think of rare cases where a single such kanji + kana combo might have multiple etymologies. There's also the issue of what happens when such an entry might be transcluded into a bigger one, such as if we try to replicate monolingual electronic dictionary behavior, where the user goes to つく and gets everything that's read as つく. Suggestion: either 1) we suck it up and deal with differing header levels and structures in the Lua code, or 2) ideally, though less likely to get traction with the greater EN WT editor community, we strike up a thread at WT:GP or WT:BP etc. and propose that we always start sub-etym headers at L4, reducing this unnecessary variation in header levels. I suppose 2.5) we could propose this structure just for JA entries.
  • Re: ===Pronunciation 1,2,3...=== as the top-level entry header, that's something I experimented with for kana-spelling entries, where a reader might look something up based on something they heard but don't know how to spell. If the reader knows that it's, say, はし[2] and not はし[0], they want to be able to quickly and easily find the info for はし[2]. Since the old soft-redirect only listed lemmata with no other info (sometimes a short gloss), and since even {{ja-see}} currently shows only the glosses but not pitch, the reader can't see which はし entry is for はし[2] unless they click through to each lemma entry to find the right one. This is bad usability.
However, with {{ja-see}}, this could be fixed by tweaking what the template shows to include pitch accent info. Also, putting ===Pronunciation=== at the top of the entry seems increasingly untenable, and it causes more complexity than it's worth.
I'm okay with this going away (i.e. keeping Pron always under Etym), particularly if {{ja-see}} could be updated.
‑‑ Eiríkr Útlendi │Tala við mig 18:56, 20 May 2020 (UTC)
  • choosing kanji spelling: You got it! The title and even the format of the data-holding page is irrelevant, since Lua is powerful enough to transform it to standard entry layout on reader-facing pages. The Chinese editors implemented the idea long ago (see for example Module:zh/data/dial-syn/我們 as well as 我們, 我等, 吾等, etc.).
  • inflection templates: "traditional" = 学校文法, "JFL" = Japanese as a foreign language = 日本語教育文法, so the traditional paradigm is 未然形, 連用形, ... and the JFL paradigm is ます形, て形, .... We have "Stem forms" and "Key constructions" which roughly correspond to the two, but it's better to rearrange the latter in the JFL order. The former can be extended like
未然形  書か(~ない,~れる,~せる,~ず)
連用形  書き(~ます,~たい,…)
可能動詞 書ける(一段) // note that this is not derived from 仮定形!
  • entry structure: The so-called "single-etym entries" and "multi-etym entries" are single entries and multiple entries with homonym numbers in professional dictionaries like OED. I don't see why Wiktionary should be different. Hence my proposal of ==English 1==. Japanese can use a similar approach. For example, if we redirect to 弁/べん/1, the effect is to fetch ==Japanese 1== from 弁/べん. Or we can use special page titles like 弁/べん/1 or 辨/べん for those rare cases.
  • pronunciation: Verb ren'yokei and the derived noun may have different accents, such as 休み[2] (verb) and 休み[3] (noun). But one can always use |accn_note= and do away with ===Pronunciation 1=== etc. --Nyarukoseijin (talk) 05:30, 21 May 2020 (UTC)
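A minimal Python sketch of the title resolution described above, where 弁/べん/1 means "fetch ==Japanese 1== from 弁/べん". The KANJI/KANA[/N] format follows the proposal. The function and type names are hypothetical, and the real lookup would live in Lua on the reader-facing side.

```python
from typing import NamedTuple, Optional


class LexemeKey(NamedTuple):
    spelling: str           # kanji spelling, e.g. "弁"
    reading: str            # kana reading, e.g. "べん"
    homonym: Optional[int]  # homonym number, or None if there is only one


def parse_lexeme_title(title: str) -> LexemeKey:
    """Parse a hypothetical KANJI/KANA or KANJI/KANA/N page title."""
    parts = title.split("/")
    if len(parts) == 2 and all(parts):
        return LexemeKey(parts[0], parts[1], None)
    if len(parts) == 3 and all(parts) and parts[2].isdigit():
        return LexemeKey(parts[0], parts[1], int(parts[2]))
    raise ValueError(f"not a KANJI/KANA[/N] title: {title!r}")


def data_page(key: LexemeKey) -> str:
    """Title of the page that holds the data, e.g. 弁/べん."""
    return f"{key.spelling}/{key.reading}"


def language_header(key: LexemeKey, language: str = "Japanese") -> str:
    """L2 header to fetch from the data page, e.g. ==Japanese 1==."""
    if key.homonym is None:
        return f"=={language}=="
    return f"=={language} {key.homonym}=="
```

With numbered L2 headers, the sub-headers inside each homonym would always start at L3, so the header-level variation discussed above would disappear for these pages.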
@Suzukaze-c, Nyarukoseijin -- Ah, it occurred to me today that there is one other case where ===Pronunciation=== might want to come before ===Etymology===: cases where one pronunciation applies to multiple etymologies. See セント (sento), for instance. The only way to keep ===Etymology=== at the top here is to duplicate the pronunciation information, which seems ... inelegant, maybe even sloppy. ‑‑ Eiríkr Útlendi │Tala við mig 17:37, 27 May 2020 (UTC)
I'm afraid I can't agree. How would you add an acc_ref that applies to only one of the homonyms (say a Kyoto accent dictionary listing only "cent")? The two pronunciations should be maintained separately even though they are identical at present. --Nyarukoseijin (talk) 18:55, 27 May 2020 (UTC)
Agreed. —Suzukaze-c 19:45, 27 May 2020 (UTC)
  • If you all don't mind data duplication, then by all means please tweak the structure at セント accordingly. I'll happily follow suit going forward. ‑‑ Eiríkr Útlendi │Tala við mig 00:19, 28 May 2020 (UTC)

Fixing conjugation of hoti.

Thank you, but the actual error was subtly different. The |voice=act from the demoted* {{pi-conj-future}} was formally a duplicate, but it should have been converted to |futu_voice=act. I haven't yet documented the new parameters starting with 'futu_': I only got them working a few days ago, and the modified module, {{#invoke:pi-conj/verb}}, is still being shaken down. --RichardW57 (talk) 08:56, 31 May 2020 (UTC)

  • Most futures will be handled in {{pi-conj-special}}, which should probably become {{pi-conj-verb}}. I intend, however, to treat future tense forms with special semantics as related lemmas. --RichardW57 (talk) 08:56, 31 May 2020 (UTC)
Good to know. That's why I ping people when 'fixing' entries. :D —Suzukaze-c 09:26, 31 May 2020 (UTC)
@RichardW57 please take a look at brūti. When you make changes to a module that's transcluded in entries, you should always check CAT:E for a few days. Chuck Entz (talk) 21:39, 31 May 2020 (UTC)
@Chuck Entz: That was lucky. The module was not designed to work for the likes of brūti, so it was missing from the testing. It could so easily have started to generate unexpected erroneous entries. --RichardW57 (talk) 23:13, 31 May 2020 (UTC)

Alt forms vs. deriveds

Re: diff --

So far as I've understood it, alt forms for JA entries have been for alternative written forms that represent the exact same word -- same etym, same pronunciation, same meanings (for at least some of the senses).

Meanwhile, パンツ一丁 (pantsu itchō) and パンイチ (pan'ichi) have the same meaning, but different etymologies and different pronunciations. The latter is also clearly a derivation (shortening) of the former, as you noted. The two cannot be swapped out one-to-one in quite the same way as other alternative forms, such as (ryū, dragon) or 壊す / 毀す (kowasu, to break something) or 輝く / 耀く (kagayaku, to twinkle, to sparkle).

The addition of alternative-form functionality to {{ja-kanjitab}} (perhaps soon to be spun off into some other template) seemed to be in part intended to do away with the additional ===Alternative forms=== header. Curious as to @Nyarukoseijin's perspective. ‑‑ Eiríkr Útlendi │Tala við mig 05:23, 10 June 2020 (UTC)

Nyarukoseijin's |alt= is specifically/exclusively for alternative spellings (which would be your 3 (6?) examples), in contrast to alternative forms. —Suzukaze-c (talk) 05:39, 10 June 2020 (UTC)
Yet, an alternative form is another form of the same word. The above are two different words, albeit related. I see also that Wiktionary:Entry_layout#Alternative_forms describes a similar use case. ‑‑ Eiríkr Útlendi │Tala við mig 05:44, 10 June 2020 (UTC)
Wiktionary:Votes/pl-2010-07/Alternative forms header abolished "Alternative spellings" apparently because people had a hard time distinguishing them in English, but it is easier in Japanese. I believe Nyarukoseijin wants to distinguish the two as well, but maybe I am remembering wrong, and I will let them clarify if they want to. —Suzukaze-c (talk) 05:50, 10 June 2020 (UTC)
That depends on how you define "alternative forms". When I coded the |alt= parameter of {{ja-kanjitab}}, I was influenced by the modern linguistic camp, which considered spellings as secondary to forms (defined as sound shape + meaning). So 位 and くらい are alternative spellings of a form while 位/くらい and 位/ぐらい are alternative forms of a word. Does Wiktionary define "alternative forms" in some other way? --Nyarukoseijin (talk) 06:23, 10 June 2020 (UTC)
@Suzukaze-c: Ah, yes, I wanted to establish this two-level hierarchy but the main motivation for introducing {{ja-kanjitab|alt=...}} was to reduce the number of keystrokes for adding alternative spellings. But {{ja-kanjitab|alt=...}} doesn't show romaji, so when the romaji is different, use ===Alternative forms=== to show it. The two-level hierarchy comes naturally as an afterthought. --Nyarukoseijin (talk) 06:56, 10 June 2020 (UTC)

イン change and rfv

Heya, saw that you added an RFV at イン (in), presumably for the inn sense. Multiple reference works list this sense at the Kotobank page. Does that suffice? ‑‑ Eiríkr Útlendi │Tala við mig 18:01, 18 June 2020 (UTC)

Not really. —Suzukaze-c (talk) 19:50, 18 June 2020 (UTC)
Hmm, that's interesting. What about the presence of this element in various Japanese company names, like 東横イン, or ホテルルートイン? This book also includes eleven instances of 「真木町イン」; possibly a fictional place and I'm uncertain of the kanji reading, but the book's audience was expected to know what an イン is in this context.
Among other hits. For your reference. Cheers! ‑‑ Eiríkr Útlendi │Tala við mig 21:46, 18 June 2020 (UTC)