Talk:

From Wiktionary, the free dictionary

RFD

The following information passed a request for deletion.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Test case:

  1. This is not a Korean word.
  2. This is a syllable occurring within Korean words (principally in the penultimate syllable of past-tense forms of verbs which happen to end in 나다), but is not in any sense a "root." We don't normally include random syllables for other languages, so it is difficult to see why Korean would be an exception.
  3. Heretofore the ===Syllable=== heading for Korean has been used only for hanja roots, for which no other suitable PoS header is available.
  4. This is a Unicode codepoint.

So, do we include all Unicode entities on principle, regardless of their lexical significance, or not? It would not be terribly difficult to robo-add minimal entries for every Unicoded hangul syllable. But is that really what Wiktionary is in the business of doing? -- Visviva 12:57, 21 January 2008 (UTC)

Probably obvious from the above, but I incline to delete this and any others that are lagging around. But if the consensus is to keep, I'll get on with the business of robo-adding all the others. It's silly to just have a few. -- Visviva 12:58, 21 January 2008 (UTC)
But it's not a Korean letter, it's a sequence of three Korean letters, not particularly distinct from any other possible sequence. We don't have xyg or pga either. -- Visviva 13:06, 21 January 2008 (UTC)
My thinking on this echoes Cynewulf's comment below. English really lacks anything comparable - we don't change the shape of our characters just because we happen to group them together (except in the rare cases of æ and œ). If we did, these alternative shapes would surely have entries here (as those two do). bd2412 T 03:41, 22 January 2008 (UTC)
Keep all Unicode characters, even deprecated compatibility characters, and characters whose normalized forms are decompositions into sequences of other characters, and the characters with not-safe-for-work glyphs that Messrs. Uni and Code added when they quit the Consortium and decided to burn bridges on their way out, and that one character that decomposes recursively into two of itself. —RuakhTALK 18:33, 21 January 2008 (UTC)
To clarify: my rationale is not that Unicode is magical (I don't know whether it is), but that pretty much every Unicode character is actually used, and Unicode characters are typically the minimal unit of copy-and-paste. —RuakhTALK 22:35, 21 January 2008 (UTC)
I'm going to reverse myself here and say keep. Here's why: Imagine somebody who doesn't know Korean comes across some bit of Korean text, say 감사합니다, and tries to look it up on Wiktionary, but it's not there. So they think "Hmm. Wiktionary has 'a' and '' and '', maybe I can look up the 'letters' and figure out what this sounds like at least." So they try to look up 감 and 사 and 합 and 니 and 다. So, in the interests of explaining all language, I think we should include composed syllable blocks that might reasonably form Korean words, as "Korean: Pronunciation: (however) Symbol (or Syllable, or anything reasonable): A syllable block composed of ." I don't know whether Unicode includes the equivalent of "bxd", sequences that don't appear in any word and can't really be pronounced, but I think that's a different question. Cynewulf 19:04, 21 January 2008 (UTC)
I guess I could see "Symbol" as a header. On the other hand I don't foresee us including Middle Korean syllables here, either in the PUA versions that everyone uses or the conjoined Unicode versions that only we use. Seems unfair that we provide this rather extraordinary service only for the modern form of a single language. -- Visviva 03:26, 22 January 2008 (UTC)
To expand, how is someone who doesn't already know hangeul supposed to know that can be broken down to , but can't be broken down to ? Cynewulf 22:07, 21 January 2008 (UTC)[reply]
Redirect. We are not Unicode's bitch. DAVilla 21:42, 21 January 2008 (UTC)
I think Davilla's got the right idea here. If there is an argument for letting a user get helpful results from inputting this character in a search, but there is no argument for having an actual entry, a redirect is the best solution. Atelaes 21:49, 21 January 2008 (UTC)
Redirect to what? Cynewulf 22:07, 21 January 2008 (UTC)
Appendix:Unicode/Hangul? -- Visviva 03:26, 22 January 2008 (UTC)
If we're going to have redirects, we may as well have a short informative entry on the specific symbol along with a link to such an appendix. Or else we're going to have an enormous appendix page. bd2412 T 03:43, 22 January 2008 (UTC)
Everything there is to say about such a symbol -- decomposition, transliterations (two), codepoint, perhaps keystroke sequence -- would fit comfortably onto a single line in a table. Granted, we have a lot of entries that could fit on a single line in a table, but in general such entries can be expanded. Here once the basic technical stuff is filled in, there will never be anything more to say. Unlike virtually all other Unicode characters, these are not meaningful graphemic units in any context. -- Visviva 09:47, 22 January 2008 (UTC)
I'd be O.K. with a redirect to something like Appendix:Hangul syllables#났 (with a <span id="났"/> at the right point in the table); I'd rather have a separate article, but I don't feel strongly about it. —RuakhTALK 02:26, 24 January 2008 (UTC)
Sorry, but we are Unicode's bitch. But we don't need to feel bad about it: the vast majority of the Web is Unicode's bitch. And I'd be hard-pressed to think of a coded character set whose bitch I'd rather be. Most don't even have decent bidi algorithms! —RuakhTALK 22:35, 21 January 2008 (UTC)
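
Editorial illustration: the per-syllable data mentioned in this thread (decomposition and codepoint) can be derived mechanically. The following is a minimal Python sketch, assuming the standard Hangul syllable arithmetic published in the Unicode Standard; the function names `decompose` and `table_row` and the tab-separated row layout are hypothetical and are not taken from any Wiktionary tool or appendix.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 vowel/trailing combinations per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables in total


def decompose(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    lead = chr(L_BASE + index // N_COUNT)
    vowel = chr(V_BASE + (index % N_COUNT) // T_COUNT)
    tail_index = index % T_COUNT
    # A tail index of 0 means the syllable has no trailing consonant.
    tail = chr(T_BASE + tail_index) if tail_index else ""
    return lead + vowel + tail


def table_row(syllable: str) -> str:
    """Hypothetical one-line table row: syllable, codepoint, jamo names."""
    names = ", ".join(unicodedata.name(j) for j in decompose(syllable))
    return f"{syllable}\tU+{ord(syllable):04X}\t{names}"
```

For 났 (U+B0AC), `decompose` returns the three conjoining jamo U+1102, U+1161 and U+11BB, which matches what `unicodedata.normalize("NFD", "났")` produces. Transliterations and keystroke sequences, also mentioned above, are not covered by this sketch.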

Keep. What is this character? I have no idea, and I want to look it up! You have explained above that it is not a word but a syllable occurring within Korean words, all of which is very useful and pertinent and should go on the page for this character, rather than here on RFD. Widsith 09:31, 22 January 2008 (UTC)

I fail to see how this logic would not apply to any other random sequence of characters (which is what this actually is). I can't necessarily tell where one Hebrew or Arabic letter ends and another begins, but that doesn't mean we should have entries for every conceivable combination of letters in those systems. -- Visviva 09:47, 22 January 2008 (UTC)
If a random sequence of characters is encoded as a single separate character, then we certainly should have an entry for it. Besides, what harm does it do exactly? Widsith 09:52, 22 January 2008 (UTC)
Keep. We’ve discussed this several times before. There is nothing exactly like it in other languages or writing systems, but it’s close to a letter of the alphabet (please don’t bother to explain more precisely what it is...I know what it is). It is a syllable that cannot be divided into smaller parts by people who can’t type Korean. I think it is important to include these syllables for the purpose of showing pronunciation. A user might not find a particular word or form anywhere, but at least he could find out how it’s pronounced by searching for each Unicode point (syllable), just as we can do with words written in Russian, Arabic, etc. And, as Widsith points out, what harm does it do? —Stephen 17:29, 22 January 2008 (UTC)
Er, since it's not a word, it doesn't have a pronunciation. Any information the user might imagine himself to be gleaning thus would be illusory. But the rest of your points are well-taken. -- Visviva 18:48, 22 January 2008 (UTC)
I would pronounce it "nat". —Stephen 21:40, 22 January 2008 (UTC)
Sure, unless it were followed by a nasal or a vowel, in which case you would (presumably) pronounce it "nan" or "nass-." Any useful pronunciation info would have to be bound to the specific word in which it occurs. I don't think these symbols have a true pronunciation any more than, for example, does. -- Visviva 11:36, 23 January 2008 (UTC)[reply]
  • OK, I'm not sure I understand why we're so eager to take on items that are far outside our mandate, but so be it. Further question: Is it because it's a recognized Unicode character, or because it's the smallest pasteable unit? Specifically, I'm curious if an archaic Hangul syllable such as (that's ᄂᆡ in the Unicode-compliant version nobody uses) would be accepted here. It occurs in archaic Korean texts online, and you'd be surprised where odd little bits of archaic Korean end up that someone might want to know about. However, it's a Private Use Area codepoint, not part of Unicode proper. Would it also be acceptable to have an entry for this entity (possibly including other PUA applications of the same codepoint, if any)? -- Visviva 18:48, 22 January 2008 (UTC)[reply]
I would say that being a recognized Unicode character is enough to warrant an explanation here, and it seems to me that being the smallest pasteable unit is just another way of looking at it. If we can make practical, useful pages for letters such as , then I’m for it. It’s just that I’m not sure about the private-use aspect. I suppose it means that the letter exists only in one or a small set of fonts, one of which must be in your system before you will see the correct form. In this case, I can see a backward P with a dot over it, but I don’t know if that’s what it really looks like. This is a question for somebody who knows more about fonts and private-use areas than I do. —Stephen 02:31, 23 January 2008 (UTC)
Well, the archaic Hangul blocks are not recognized Unicode, but they are pasteable, which made me curious as to where we draw the line. For the record, I don't plan actually to create any of these, if only because I'm fairly sure it would cause Robert to become apoplectic. It's interesting that you see a backwards-P; when I've used computers that didn't have the font installed, these syllables have always rendered as weird Chinese characters. I assume that means that the same part of the PUA has also been used to render those symbols, although it's possible these are just weird OS artifacts. (For the record, the archaic Hangul looks like 뇌, but without the bar across the bottom). A comprehensive list of uses for each PUA codepoint would be an interesting project, although I'm not sure how one would even begin to gather such information. -- Visviva 11:36, 23 January 2008 (UTC)
Comment. I fail to see why we'd keep ʃ and not this, or vice versa.—msh210 17:12, 24 January 2008 (UTC)
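
Editorial illustration: the distinction drawn in this discussion between a recognized Unicode character and a Private Use Area codepoint can be checked programmatically. This is a rough Python sketch using the general category reported by the standard `unicodedata` module; the function name `classify` and the label strings are hypothetical, and the result depends on the Unicode version bundled with the Python build.

```python
import unicodedata

# Precomposed Hangul syllables occupy U+AC00..U+D7A3 inclusive.
HANGUL_SYLLABLES = range(0xAC00, 0xD7A4)


def classify(ch: str) -> str:
    """Label a single character by how Unicode itself treats it."""
    if ord(ch) in HANGUL_SYLLABLES:
        return "precomposed Hangul syllable"
    category = unicodedata.category(ch)
    if category == "Co":
        # Private use: Unicode assigns no meaning; rendering depends on the font.
        return "private use"
    if category == "Cn":
        return "unassigned"
    return "other assigned character"
```

Under this sketch, 났 is reported as a precomposed Hangul syllable, ʃ as an ordinary assigned character, and any codepoint from U+E000 upward in the Private Use Area as private use. It says nothing about which fonts put which glyphs at a given private-use codepoint.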


Reflection

In retrospect it was clearly a mistake to carry out the RFD as I did. I am not sure whether the community as a whole has simply decided en bloc to forsake all principles and welcome every sort of meaningless crap into our dictionary, or whether I simply failed to explain the facts of the matter clearly. In the first case, I should simply have speedily deleted the entry out of hand for the good of the project. In the latter case perhaps I would have done better to purge the entry of gunk before nominating. In any event, the consensus of the community is regrettably clear. -- Visviva 03:57, 26 January 2008 (UTC)

Kept. See archived discussion of January 2008. 05:07, 5 February 2008 (UTC)