Module talk:ko-headword

Some comments

If they are always supposed to do the exact same thing, there's no need to duplicate all the code. You can just make one point to the other. Also, I'm not sure why the "infinitive" inflection is being added twice. If someone gives both hae and hae2 then it will show (infinitive hae, infinitive hae2). I don't think that's the idea? —CodeCat 02:06, 8 January 2014 (UTC)[reply]

I don't know how to make them point to each other to be honest... When there are two infinitives the templates display them as "infinitive a or b", but I don't know how to make Module:headword do that. Would one just put the links inside the label? Haplogy () 02:23, 8 January 2014 (UTC)[reply]
I made the change, I hope it works right now. I'd like to help if anything is unclear. I'm wondering what the "form" parameter is for exactly. —CodeCat 02:37, 8 January 2014 (UTC)[reply]
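For illustration only, the "infinitive a or b" display asked about above can be sketched as grouping all forms under a single label instead of emitting the label once per form. This is a Python sketch, not the Lua in Module:headword; `format_inflections` and its input shape are hypothetical names chosen for the example.

```python
def format_inflections(inflections):
    """Render (label, forms) pairs as a parenthesised headword suffix.

    Forms that share a label are joined with " or ", so two infinitives
    appear as "infinitive a or b" rather than as two separate items.
    """
    parts = []
    for label, forms in inflections:
        # Drop parameters the editor did not supply (None or empty string).
        supplied = [form for form in forms if form]
        if supplied:
            parts.append(label + " " + " or ".join(supplied))
    return "(" + ", ".join(parts) + ")" if parts else ""


print(format_inflections([("infinitive", ["hae", "hae2"])]))
```

With both `hae` and `hae2` supplied, the label appears once; with only one supplied, the " or " is simply absent.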
Thanks, I appreciate all of your help. I assume you mean the "form" parameter for "verb forms" and "adjective forms"? Those are from {{ko-verb-form}} and {{ko-adj-form}}. I was trying to reproduce the behavior at 귀여운. ko-verb-form seems not to use them so much... Currently it defaults to "form of" when no form is specified but the template seems not to do that, so (I guess) that's a bug in the module. Haplogy () 02:46, 8 January 2014 (UTC)
I can change that, but the template just displays "of" which is even worse. Also, I wonder if the parameter should be removed altogether. We don't normally list form-of definitions or etymologies on the headword line, so I think it would be better to put that information somewhere else. —CodeCat 02:59, 8 January 2014 (UTC)[reply]
That sounds reasonable. Korean is totally foreign to me and I was just trying to duplicate the behavior of the templates. I would suggest bringing it up with the other Korean editors, but we probably have this situation to begin with because there aren't any. I guess it could be mentioned at Wiktionary:Beer_parlour#Korean_.ED.95.98.EB.8B.A4-verbs as a change to be lumped in with the proposed modification for 하다-verbs. Haplogy () 03:22, 8 January 2014 (UTC)[reply]
Thanks for the efforts, guys! I'm too slow with Lua and definitely can't beat you. It's also sad we don't have dedicated Korean editors.
Not sure if we currently need Category:Korean terms with transliterations needing attention. The development of Module:ko-translit has stopped for the moment, even though the editors involved agreed on the phonetic method of transliterating Korean; see Module:ko-translit/testcases. Kephir (talkcontribs) was the active developer. The w:Revised Romanization of Korean is used but there is no comprehensive English description of the phonetic transcription recommended by the National Institute of the Korean Language. TAKASUGI Shinji (talkcontribs) has more info. I'm happy to continue designing test cases and explaining rules now that I understand and agree on the transliteration method. Some minor details of the transliteration may not be acceptable to other editors, especially those who favour graphical transliteration. Anyway, editors must get involved for this module to be successful.
Re: hyphens - I don't see another way of automatically transliterating Korean. There are particles, copulas, prefixes, multipart words, "hada-verbs" (= Japanese "suru-verbs"). It's impossible to guess where a hyphen is legitimate. Besides, hyphens help to delimit individual hangeul (not hangul) and syllables and hyphens are implemented in the Korean Wiktionary.
Sorry again for not being helpful with the module. As I said, I can help with testing, checking some info regarding Korean (currently brushing up my Korean, which is basic) and converting entries (manually) but my Lua skills leave much to be desired, so I have to rely on someone else. --Anatoli (обсудить/вклад) 22:04, 8 January 2014 (UTC)[reply]
You can find all possible combinations in the subtemplate fr:Modèle:ko-roman/frontière on the French Wiktionary. — TAKASUGI Shinji (talk) 00:42, 9 January 2014 (UTC)[reply]
The Category:Korean terms with transliterations needing attention can go. The idea was to check manual transliterations against automatic ones and flag ones that don't match but if it's not appropriate right now that's fine. For the next immediate stage I'd suggest choosing one of the simpler and lesser-used PoS templates and testing this module on that. How about {{ko-interj}}? Since there's agreement about what romanized hangeul should look like, can we just go completely automatic? Is there anybody besides Kephir (talkcontribs) and TAKASUGI Shinji (talkcontribs) who might be interested in Korean headwords? @Atitarev, what do you think about the format of ko-adj-form and ko-verb-form, i.e. putting "determinative of" etc. on the definition line instead of the headword line? Haplogy () 00:51, 9 January 2014 (UTC)[reply]
I can write a new Lua function just for transcription, but I find it difficult to modify existing functions because I don’t fully understand how they are used now. — TAKASUGI Shinji (talk) 01:04, 9 January 2014 (UTC)[reply]
I'm okay with going fully automatic once the module is in acceptable shape. It's already used on translations without transliteration. If we use hyphens, then the capitalisation of proper nouns won't make sense, e.g. "han-guk" vs "Hanguk". (Note: the phonetic method, as opposed to the graphical one, was agreed on by participants on the test cases page, but there are active editors who oppose it, including CodeCat, Wikitiki89 and Mzajac, though I don't know their opinion on Korean transliteration. I just mention this because this method may need a vote, including the use of hyphens and capitalisation.)
Re: "determinative of". No strong opinion. I guess we could use Japanese as a model but need to take into account conversion efforts.
Re: {{ko-interj}}. Yes, it's a good idea. I will do that but there's not so much to do, since there is no inflection but I will try to check other things. I will have to find conversion table between different Romanisation systems. I am more or less familiar with w:Revised Romanization of Korean (let's call it "RR") and w:McCune–Reischauer (let's call it "MR"), they are the most common and North Korea only uses a version of MR.
Shinji, thanks for joining the discussion. Shall I put more test cases for you? Are you able to use Kephir's module or do you need to write a new one? I haven't got any response from Kephir yet (I've asked him today), so it's not yet clear if he is willing to continue. --Anatoli (обсудить/вклад) 01:23, 9 January 2014 (UTC)
I have been using my templates on the French Wiktionary for 4 years and I hope I fully understand the rules. As for hyphens, they don’t always match syllable boundaries and they can be misleading when ㅇ (ng) and ㅎ (h) are concerned. 축하 (chukha) is actually pronounced as chuk-kha, and 일하다 (ilhada) and 사랑하다 (saranghada) as i-rha-da and sa-ra-ngha-da respectively. That is because the pronunciation rules and the Romanization rules don’t completely match. — TAKASUGI Shinji (talk) 01:35, 9 January 2014 (UTC)[reply]
I think these problems are minor, we can still transliterate "chuk-ha" (chuk-a?), "ir-ha-da" and "sa-rang-ha-da" despite the pronunciation but long strings with prefixes, particles, copulas without hyphens or spaces will look horrible and unreadable to learners, IMO. "jeoneun tongyeoksaga piryohamnida" is not the worst example, compare with "jeo-neun tong-yeok-sa-ga pi-ryo-ham-nida". --Anatoli (обсудить/вклад) 01:43, 9 January 2014 (UTC)[reply]
I thought about 축하 and other examples. On a second thought, if you're willing to develop/complete a transliteration module and do it your way I won't make obstacles for you. Having hyphens or not won't make huge differences. For manual insertion of hyphens we could use some other methods if they are necessary. I favour phonetic transliteration but as I said, I don't know if the suggested transliteration methods for Korean will be acceptable to the rest of the community. --Anatoli (обсудить/вклад) 03:20, 9 January 2014 (UTC)[reply]
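For reference, the hyphenated, purely graphical option discussed above can be sketched by decomposing each precomposed hangeul syllable arithmetically and romanizing it on its own. This is a Python illustration, not the code in Module:ko-translit; the function names are hypothetical, the final-consonant table is a simplification, and no sound-change rules (liaison, nasalization, ㄹㄹ as "ll") are applied, so the output will differ from the phonetic forms quoted in this thread wherever those rules matter.

```python
# Revised Romanization letter values, indexed by position in the Unicode
# precomposed-syllable layout (19 initials, 21 vowels, 28 finals).
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
# Simplified: each final (including clusters) is reduced to one value.
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
          "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]

SYLLABLE_BASE = 0xAC00   # first precomposed syllable, 가
SYLLABLE_COUNT = 11172   # 19 * 21 * 28


def romanize_syllable(ch):
    """Romanize one precomposed syllable, or return None for anything else."""
    index = ord(ch) - SYLLABLE_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        return None
    initial, rest = divmod(index, 21 * 28)
    vowel, final = divmod(rest, 28)
    return INITIALS[initial] + VOWELS[vowel] + FINALS[final]


def romanize(text, hyphenate=True):
    """Romanize syllable by syllable; other characters (spaces) pass through."""
    out = []
    prev_was_syllable = False
    for ch in text:
        roman = romanize_syllable(ch)
        if roman is None:
            out.append(ch)
            prev_was_syllable = False
        else:
            if hyphenate and prev_was_syllable:
                out.append("-")
            out.append(roman)
            prev_was_syllable = True
    return "".join(out)


print(romanize("축하"))                     # chuk-ha
print(romanize("사랑하다"))                 # sa-rang-ha-da
print(romanize("가방", hyphenate=False))    # gabang
```

Because hyphens are only inserted between adjacent syllables, a space in the input resets the run, so multi-word input keeps its word boundaries.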

Rv transliteration not always working

It works for 클래스 (keullaeseu) but not for 플래시 몹 (peullaesimop) for some reason.

Currently I see:

  • 클래스 (keullaeseu) OK
  • 플래시 몹 () not OK

--Anatoli (обсудить/вклад) 12:52, 10 January 2014 (UTC)[reply]

It's not working on words with a space. --Anatoli (обсудить/вклад) 21:42, 10 January 2014 (UTC)[reply]
Oops... Should be fixed now.
Just now I also added a line to romanize hangeul automatically in hanja entries when no rr is entered, for example at 成年.
Probably somebody should write a function to generate hidx automatically, or copy it over if it already has been written. @Shinji, does French WT's code make a hidx style sort key automatically? Haplogy () 05:26, 11 January 2014 (UTC)[reply]
Thank you again. You have renamed rv to rr but it's rv that's used in entries, not rr! The change should be accompanied by changes in entries, or both the rv and rr variables and parameters should be handled equally.
Not sure if hidx is needed. I've added derivations from English and they are sorted correctly - by the first jamo, not the whole hangeul character. See 캠퍼스, which is sorted by ㅋ, not 캠. --Anatoli (обсудить/вклад) 05:52, 11 January 2014 (UTC)[reply]
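If a hidx-style key does turn out to be needed, generating it is mechanical: the initial jamo of the first syllable can be computed from the code point and prepended to the hangeul. The following Python sketch is an illustration under that assumption, not existing module code; `initial_jamo` and `hidx_sort_key` are hypothetical names.

```python
# The 19 initial consonants in Unicode syllable order (compatibility jamo).
INITIAL_JAMO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"

SYLLABLE_BASE = 0xAC00   # first precomposed syllable, 가
SYLLABLE_COUNT = 11172   # 19 initials * 21 vowels * 28 finals


def initial_jamo(ch):
    """Return the initial consonant of a precomposed syllable, else None."""
    index = ord(ch) - SYLLABLE_BASE
    if 0 <= index < SYLLABLE_COUNT:
        return INITIAL_JAMO[index // (21 * 28)]
    return None


def hidx_sort_key(hangeul):
    """Prefix the hangeul with the initial jamo of its first syllable."""
    if not hangeul:
        return hangeul
    lead = initial_jamo(hangeul[0])
    # Leave the string untouched if it does not start with a hangeul syllable.
    return lead + hangeul if lead else hangeul


print(hidx_sort_key("캠퍼스"))  # ㅋ캠퍼스
```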
I didn't know the Wikimedia software could sort by jamo automatically. That's good news! Hanja terms like 水曜日 have a hidx value in the entry, e.g. ㅅ수요일 in the case of 水曜日. If the software can sort the hangeul form "수요일" properly (without the jamo on the front) then the template can use the hangeul as the sort key. The module currently doesn't do any sorting (it ignores hidx) and a lot of hanja entries have floated to the top of Category:Korean nouns. I'll do that now.
Sorry about changing rv to rr--I only changed it to rr because I thought everybody preferred rr. I can change it back now. I hope that automatic romanizations can be used from now on and therefore entries will use neither rv nor rr.
I posted this at Template talk:ko-hanjatab but there has been no response. Is everyone ok with Lua-izing {{ko-hanjatab}}? Lua can extract all the hanja and link to them by itself without accepting any parameters from the editor (e.g. {{ja-hanjatab}} not {{ja-hanjatab|成|年}}). Haplogy () 13:07, 11 January 2014 (UTC)
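As a rough illustration of the parameterless approach, the hanja can be picked out of the page title by code point range. This is a Python sketch rather than Lua, `is_han` and `extract_hanja` are hypothetical names, and the ranges cover only the main CJK blocks (an assumption; later extensions are not listed).

```python
# Code point ranges treated as Han characters in this sketch.
HAN_RANGES = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]


def is_han(ch):
    """True if the character falls in one of the ranges above."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in HAN_RANGES)


def extract_hanja(title):
    """Return the Han characters of a page title, in order of appearance."""
    return [ch for ch in title if is_han(ch)]


print(extract_hanja("成年"))  # ['成', '年']
```

A template built this way would take the title from the page itself, so the editor has nothing to type and nothing to get out of sync.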
I have converted most headword templates (which were easy), except for Template:ko-pos, Template:ko-hangul-symbol, Template:ko-syllable-hangul, Template:ko-punctuation mark. Template:ko-pos covers all other parts of speech, for which there is no special template. It could use automatic transliteration and other features. --Anatoli (обсудить/вклад) 03:49, 12 January 2014 (UTC)[reply]
I think this module has all the code necessary to support ko-pos, and ko-pos just needs some code like at {{ja-pos}}.
Update on sorting: hanja nouns currently get put into Category:Korean nouns in Han script (that was the previous behavior of the template ko-noun.) Are there any hanja terms that are not nouns? I assume not because there is no Category:Korean verbs in Han script. Hanja terms are sorted by the hangeul param if provided, but many hanja nouns currently have no hangeul in the headword, but they do have a hidx, which is usually just the jamo but it varies. For hanja terms, I think hangeul should be entered for sorting purposes, but should it be displayed in the headword in hanja entries or hidden? Haplogy () 03:58, 12 January 2014 (UTC)[reply]
They all must be nouns but it's hard to check. Korean doesn't have a concept of Japanese on'yomi and kun'yomi because every hanja follows Sino-Korean reading (=on'yomi). They do have native Korean "explanations" but native Korean words are never written in hanja. There are derivations, though, notably "hada-verbs", which attach "-hada" to Sino-Korean terms like Japanese adds "-suru".
Can you give me an example of a hanja entry without hangeul? I will try and fix them. The number is not terribly big. --Anatoli (обсудить/вклад) 04:12, 13 January 2014 (UTC)[reply]
I don't know why hanja entries get added to Category:Sort key tracking/needed, with or without "hidx". --Anatoli (обсудить/вклад) 04:16, 13 January 2014 (UTC)[reply]
Thanks for the explanation. My entire knowledge of Korean is very small and only a week old so I need support from more knowledgeable editors. I guess these are technically nouns or something, but a couple of prefixes have hanja: and .
公言 for example has no hangeul in the headword. I can make this module put all hanja entries without hangeul into a special category if that would be helpful. I think sort key tracking/needed comes from the format_categories function in Module:utilities, and that module expects there to be no sortkey for Korean, but in the case of hanja entries a sortkey is needed and this module sends the hangeul as the sortkey and that surprises Module:utilities. I'm not sure about it though. Haplogy ()
I should add that borrowing from Japanese kanji (with on'yomi) and Korean-made words using Sino-Korean components can also be hanja. Korean words derived from Japanese kun'yomi are written only in hangeul, like 가방 (gabang) = かばん. It's impossible to have a word that is partially written in hanja and hangeul, where hangeul part belongs to the root, like 読む, 新しい. This reduces the number of potential hanja but as I said, words can be formed by prefixes, suffixes, which can be Sino-Korean but like Japanese, they are usually attached to Sino-Korean roots, not native roots.
Yes, there should be more prefixes and suffixes and Sino-Korean numerals also have hanja spellings. Like Japanese, Korean can make a lot of words with Sino-Korean components, even if Chinese doesn't have them.
Please see my change to 公言#Korean. I've added hangeul and have removed hidx. They are sorted by hangeul, aren't they?
Yes, please add all hanja without hangeul into a separate category Category:Korean hanja terms lacking hangeul or something.
Should we ask for help re: sorting? --Anatoli (обсудить/вклад) 05:05, 13 January 2014 (UTC)[reply]
Category:Korean hanja terms lacking hangeul is set up.
I see, so any Japanese term with on'yomi could possibly also be a Korean term written in hanja, although the hanja form is less common in writing. It makes it easier that there are no mixed-script terms (except compounds). I'm glad that they didn't decide to write native Korean words in hanja and give every hanja two readings.
Yes, I think one of the Lua gurus should weigh in on the sorting question. I'm pretty sure that it's not a problem with this module though.
Hanja nouns with a hangeul parameter are sorted by hangeul. Currently the module ignores hidx. Strangely though the hanja nouns sorted with hangeul are put in the back of categories, so now some nouns in Category:Korean nouns in Han script (that were sorted previously by hidx) are in the front, some are sorted by hanja in the middle, and the most recently updated ones are sorted by jamo in the back. That should be temporary as all are sorted by hangeul from this module however. Haplogy () 05:27, 13 January 2014 (UTC)[reply]
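The behaviour described above (hanja nouns go into Category:Korean nouns in Han script, are sorted by hangeul when it is given, and are flagged when it is missing) can be summarised in a short sketch. This is Python for illustration, not the module's Lua; `categorize_noun` is a hypothetical name, and the single Han range check is a simplifying assumption.

```python
def is_han(ch):
    """Simplified check: main CJK Unified Ideographs block only."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def categorize_noun(title, hangeul=None):
    """Return (categories, sort_key) for a Korean noun entry.

    hidx is deliberately not consulted, mirroring the module as described.
    A sort_key of None means the software's default ordering is used.
    """
    categories = []
    sort_key = None
    if any(is_han(ch) for ch in title):
        categories.append("Korean nouns in Han script")
        if hangeul:
            sort_key = hangeul  # hanja entries sort by their hangeul form
        else:
            categories.append("Korean hanja terms lacking hangeul")
    else:
        categories.append("Korean nouns")
    return categories, sort_key


print(categorize_noun("公言", "공언"))
print(categorize_noun("公言"))
```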
I saw you added categorisation of hangeul-less hanja, thanks! I've requested assistance at Module_talk:utilities#Sorting_Korean_hanja_terms re: sorting/categorisation. --Anatoli (обсудить/вклад) 05:24, 13 January 2014 (UTC)[reply]
Did you see any hanja with hangeul, which is sorted incorrectly?
Yes, Korean hanja is much more consistent than Japanese kanji, including pronunciation. They are closer to Chinese and there shouldn't be too many multiple readings, unless there is more than one reading in Chinese as well, like 快樂 (쾌락 "kwae-rak") and 音樂 (음악 - "eum-ak") where 樂 has two different readings in Chinese and Korean (cf Japanese 快楽 and 音楽). --Anatoli (обсудить/вклад) 05:43, 13 January 2014 (UTC)[reply]

Template:ko-pos

I've converted {{ko-pos}} to use this module in the manner of {{ja-pos}}. There is some overlap in parts of speech between this and other templates. In particular, 90% of suffixes (actually 29 out of 32) use ko-pos, while {{ko-suffix}} is used by only 3 entries. The same goes for {{ko-proper noun}}: 11 entries use it out of 448 proper nouns total. Also {{ko-num}}. Should ko-suffix & friends be orphaned and deleted? It doesn't really matter, because they both just point to this module, but it's kind of weird to have two templates that handle the same PoS. A handful of entries used ko-pos for common nouns, and I went ahead and edited those to use ko-noun (i.e., {{ko-pos|noun|...}} > {{ko-noun|...}}). Currently entries that use unrecognized parts of speech are categorized in Category:Korean terms whose use of ko-pos needs attention. Right now the only members are three pages with an L3 of "Symbol" or "Letter" and a pos of "letter" like this:

 ====Symbol====
 {{ko-pos|letter|rv=k|mr=k'}}

ko-pos can be edited to accept "letter" as a pos, but are those entries actually right? Haplogy () 03:59, 13 January 2014 (UTC)
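The handling of unrecognized parts of speech can be pictured as a simple lookup with a fallback category. The sketch below is Python for illustration only; the list of accepted values and every category name other than the attention category and "Korean nouns" are assumptions, not taken from the module.

```python
# Hypothetical mapping from the pos argument to a category name.
POS_CATEGORIES = {
    "noun": "Korean nouns",
    "proper noun": "Korean proper nouns",
    "suffix": "Korean suffixes",
    "prefix": "Korean prefixes",
    "numeral": "Korean numerals",
    "interjection": "Korean interjections",
}

ATTENTION_CATEGORY = "Korean terms whose use of ko-pos needs attention"


def pos_category(pos):
    """Return the category for a pos value, or the attention category."""
    return POS_CATEGORIES.get(pos, ATTENTION_CATEGORY)


print(pos_category("suffix"))
print(pos_category("letter"))
```

Accepting "letter" would then be a one-line addition to the table, which leaves open the separate question of whether those entries are correct as they stand.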

Thanks again. Theoretically I agree with orphaning and deletion but the existing entries should be converted to better templates. Ideally it should be done by a bot. I can start converting some entries, though. I think proper nouns should use {{ko-proper noun}} but suffixes {{ko-pos}}. --Anatoli (обсудить/вклад) 04:05, 13 January 2014 (UTC)[reply]

2021 tidying proposal

  1. Remove |mr= McCune-Reischauer and |y= Yale.
  2. Replace |rv= with |tr=.
  3. Replace |hangeul= and |hanja= with |1=, |2=, etc.:
  4. and something for |occasional hanja= to be in line with the above, but I'm not sure what.

@Atitarev, Tibidibi -- Suzukaze-c (talk) 07:45, 13 November 2021 (UTC)
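Points 1 and 2 of the proposal amount to a mechanical parameter rewrite, which could be sketched as below. This is a Python illustration of what a conversion pass might do, not an existing bot script; `migrate_params` is a hypothetical name. Points 3 and 4 are left out because the proposal does not spell out the target layout.

```python
REMOVED = {"mr", "y"}     # point 1: McCune-Reischauer and Yale are dropped
RENAMED = {"rv": "tr"}    # point 2: rv becomes tr


def migrate_params(params):
    """Return a new parameter dict with points 1 and 2 applied.

    All other parameters (hangeul, hanja, ...) are copied unchanged.
    """
    migrated = {}
    for name, value in params.items():
        if name in REMOVED:
            continue
        new_name = RENAMED.get(name, name)
        # If the entry already has the new name, keep that explicit value.
        if new_name != name and new_name in params:
            continue
        migrated[new_name] = value
    return migrated


print(migrate_params({"rv": "k", "mr": "k'"}))  # {'tr': 'k'}
```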

@Suzukaze-c Support on all points.-- Tibidibi (talk) 09:45, 13 November 2021 (UTC)[reply]
OK. (atitarev) --101.176.14.82 02:13, 16 November 2021 (UTC)[reply]