Template talk:character info

From Wiktionary, the free dictionary

Script detection

We should be able to deduce the script name from the code point. This would save people from having to type it out and would avoid typos. Equinox 14:34, 15 October 2009 (UTC)

This template shouldn't be invoked directly at all; the idea is to have a specialized template for each block. I guess I should probably document it or something. :-) -- Visviva 15:06, 15 October 2009 (UTC)
Now done. Although now I wonder if "character info" should just be "info" for these. It is a lot of typing. On the other hand, IMO people shouldn't be typing them out in the individual entries either. That's why God invented spreadsheets. ;-) -- Visviva 15:13, 15 October 2009 (UTC)
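For illustration only, here is a rough Python sketch of deducing a block name from a code point, as suggested above. The `BLOCKS` table is a small hand-picked subset and the name `block_name` is made up for this example; a real lookup would presumably load the complete block list, and none of this reflects how the templates themselves work.

```python
from bisect import bisect_right

# Partial, illustrative table: (first code point, last code point, block name).
# Assumption: a real implementation would load the full Unicode block list.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0530, 0x058F, "Armenian"),
    (0x1000, 0x109F, "Myanmar"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_name(char):
    """Return the block name for one character, or None if it is not in the table."""
    cp = ord(char)
    # Find the last block whose start is <= cp, then check its upper bound.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        first, last, name = BLOCKS[i]
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for ch in ("A", "Ա", "한"):
        print(ch, block_name(ch))
```

With a lookup like this, an editor would only need to supply the character itself, which is the typo-avoidance benefit raised at the top of this thread.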

Unicode bias

This is all very Unicode-centric. Most of these languages and scripts have been around for hundreds or thousands of years, Unicode about a decade or so. I am a big fan of Unicode, but it doesn't define languages, scripts, or characters, and it often has weird and unnatural names and groupings of characters. Either these templates should try to cover the actual letters and characters as full historic and cultural entities, or they should be prominently labelled "Unicode character info". — hippietrail 01:44, 21 October 2009 (UTC)

Well, I think it's appropriate for us to be Unicode-centric in the same way it's appropriate for us to be English-centric; it's an inescapable fact that Wiktionary uses the Universal Character Set. Wiktionary can't include characters that can't be encoded in UTF-8, for the same reason that we can't describe concepts that can't be described in English. Which is not to say that we shouldn't include information about the presence/encoding of a character in other standards as well; IMO we should. Probably some graphological info as well (most obviously in the form of an image). But a Wiktionary entry is inescapably linked to a specific Unicode codepoint, in a way that it isn't linked to any other system. -- Visviva 02:06, 21 October 2009 (UTC)
Yes I'm not talking about including characters not covered by Unicode, since it endeavours to cover all characters in all used scripts. Of course they don't cover some artificial scripts such as Klingon and Elvish yet but that's not what I'm saying.
All I'm saying is that nobody needs to know codepoints to read or edit a dictionary so we don't include Unicode because it's necessary, only because it's interesting. But if Unicode is interesting, so are legacy encodings, many of which are still in wide use, and a few of which, such as Japanese ones, are still preferred over Unicode.
But more interesting than codepoints and character sets by far are such things as different styles, stroke orders, case mappings, alphabetical order, names in their native languages, names in English which are often common, more concise, and quite different from the Unicode names. I would very much like to see how they are written in handwriting for instance.
Many languages or scripts have a small number of different styles, apart from how a particular font may interpret a style. For instance Hebrew has two main styles, Khmer has two very different styles, Arabic has kufic, nastaliq, and naskh, Chinese and Japanese have different styles such as oracle bone, small seal, big seal, grass, etc. Anyway lots of fascinating stuff beyond dry old Unicode codepoints that would be wonderful for us to have in the "character info" area. — hippietrail 10:30, 21 October 2009 (UTC)
Unicode codepoints are necessary, not just interesting. They allow users to instantly distinguish characters which aren't distinguishable in their current font. This can happen because their font has no glyphs for the codepoints (and they all look like boxes) or because the glyphs provided appear very similar (e.g. - ). It is for the first case that we also should always include a reference image. For this reason Unicode should be prioritized above other encodings (and it usually doesn't get in the way, as it's not that much info).
The other information is interesting and I think we should find a consistent way to show it. I think this is very interesting because it should integrate with the alternative encoding info that we already provide at the Latin letters, including NATO phonetic, Morse code, Signal flag, Semaphore, and ASL Manual (Braille has its own Unicode space and we transcribe ASL, so these two are not just different encoding schemes). Maybe a new =Character encoding= header for the top entry (often the native language entry for that character, or Translingual if the character is used in several different languages). As for native/English names, I think we've got that taken care of, as the common English names would be on the definition line (sometimes along with the native name). All translations of the English names would be found at the translations section of the English name entry. Is any of this information we're considering too encyclopedic? --Bequw¢τ 18:03, 21 October 2009 (UTC)
One belated opinion: Unicode character names are of no interest to anyone outside the context of Unicode. They're often badly worded, misleading, based on obsolete or inaccurate transliterations, and not uncommonly outright wrong, but for legacy reasons can't be corrected. Characters should be known by their common names in English, not by their Unicode names, except when identifying them as Unicode. kwami (talk) 21:06, 16 May 2022 (UTC)

Hiding next/previous line

I was thinking of adding this template to a page like A (incorporating the letter image). As that page is already pretty crowded, I was wondering if we could make the display of this template a bit more trim. That page already lists the alphabet so I was going to omit the next/previous links. As I don't think it's that important to show the name of the Unicode block, what if we moved that link to the "codepoint" text (which is currently unlinked)? That would free up an extra line. If next & previous are specified, I don't care if the link is moved or not. Does this sound doable? --Bequw¢τ 15:43, 22 October 2009 (UTC)

If you're using {{Basic Latin character info}}, the codepoint links to FFI, which I think is how it should be. That is, the codepoint should always link to a page about the codepoint, so that the user is only one click away from important but tedious information like URL encoding and keyboard entry. (I do wish we had a more authoritative source for this info than FFI, though; it's a pity Unicode's web offerings are so information-poor.) I suppose ASCII characters are kind of a special case, though, since most of the information about them is ridiculously trivial (e.g. you produce "a" by typing "a", and it renders in a URL as "a").

Still, I'm weakly opposed to making the basic elements of the template optional. They seem about as compact as they can reasonably be already, and the whole concept of this template is to have a consistent presentation of, and navigation among, all the members of the Universal Character Set. And while it's true that the alphabet is present on the page, the two aren't exactly the same (e.g. @ does not come before A in the alphabet), and it is nice if we can allow click-through on the first screen (so that someone can e.g. take a quick survey of our coverage of the block without having to scroll to the bottom of ==Translingual== each time). But again, I can see a reasonable exception for ASCII.
I had been thinking about A myself, but my thoughts had been running more towards using a more compact image for the reference glyph, and moving the other images into a gallery. -- Visviva 16:47, 22 October 2009 (UTC)
OK, now I've taken a stab at A, and I have to admit it looks kind of garish and wrong. Humbug. -- Visviva 17:26, 22 October 2009 (UTC)
(Two example boxes at right. Top: "Codepoint U+41". Bottom: "Codepoint U+41", followed by "Basic Latin".)
Ah, I mustn't have been clear when I was explaining. If no next/prev params are entered, I'd like the bottom of the display to be like the top version at right rather than the bottom version. Also, do you think many people will be browsing by Unicode order (checking the "coverage of the block")? This is where I'd say we'd be overly biasing Unicode. It's a small point though. Also, about your A cleanup, I think it was still an improvement :) --Bequw¢τ 18:20, 22 October 2009 (UTC)
U+41 is not the correct format for Unicode codepoints. It's 4 to 6 hex digits, as actually used on the fileformat.info page: U+0041 — hippietrail 00:54, 23 October 2009 (UTC)
I've fixed this now in the template (so that it will zero-pad {{{hex}}} to 4 digits). However, I foolishly wrote explicit {{{codepoint}}} values (which assume the hex is already zero-padded) into a lot of the feeder templates, which will have to be unexplicitized. -- Visviva 07:40, 23 October 2009 (UTC)
I've implemented this; please adjust as needed. As far as previous/next, I think they support two important functions of the web interface: QC and casual discovery. Being able to click through from one entry to the next is useful for someone working on getting, say, all of our Burmese character entries up to snuff (dealing with 100 open tabs can be a serious pain, after all). It may also entice a casual user to click through and discover new and interesting things, which may in turn result in our hooking another much-needed Wiktionarian. :-) But neither function is important enough to trump the principle of least astonishment, and there is something rather astonishing about having "@" be one of the most prominent links in the first screen of the long, complex entry for A. There must be a way of designing the template that would be less troublesome, but for now just omitting the navbar seems like the way to go. -- Visviva 07:40, 23 October 2009 (UTC)
I see what you mean. Thanks for implementing that. --Bequw¢τ 12:44, 23 October 2009 (UTC)
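As a side note on the U+41 versus U+0041 point in this thread, the zero-padding rule can be sketched in a few lines of Python. The function name `format_codepoint` is invented for this example and is not part of the template; the sketch only shows the convention of padding the hex value to at least four digits.

```python
def format_codepoint(char):
    """Format a character's code point as "U+" plus at least four uppercase hex digits."""
    # The 04X format spec zero-pads short values and leaves longer ones untouched.
    return f"U+{ord(char):04X}"


if __name__ == "__main__":
    print(format_codepoint("A"))   # short value, padded to four digits
    print(format_codepoint("😀"))  # five-digit value, left as is
```

Padding in one central place like this is what removes the need for feeder templates to pass pre-padded values.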

More compacter?

I like it. It saves that extra line (that I notice because I browse with RHS TOC). --Bequw¢τ 16:38, 24 October 2009 (UTC)

stroke order

For CJK entries, should the reference image be a simple image or be one from the Stroke Order Project? We could use either the animated or red-gradient variants (the multi-panel B&W ones are quite wide). Reading w:stroke order, I think they are mostly standard per character (though this is not my area of expertise). If we do include them, I assume the {*character info} would subsume {{stroke order}} and {{Han stroke}}. Thoughts? --Bequw¢τ 22:09, 2 November 2009 (UTC)

Or should animations such as on be used? --Bequw¢τ 16:10, 6 November 2009 (UTC)

multiple parameters

Is there a reason we have script, list, and root parameters? I can see having two for occasions when a list is so long it's split up (e.g. Appendix:Unicode/Hangul Syllables), and therefore the name and link will be different. Why three? --Bequw¢τ 16:16, 6 November 2009 (UTC)

Codepoint equivalence

As noted in the GP, sometimes the MediaWiki software (possibly because of the Unicode standard) converts Unicode codepoints to more standard variants. I've noted this for now in the image caption on [[;]], but if someone comes up with a better way to show this (or a list of all characters this happens for), please be bold. --Bequw¢τ 05:25, 27 November 2009 (UTC)

See Appendix:Unicode normalization. --Bequw¢τ 21:18, 9 December 2009 (UTC)
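To make the conversion concrete, here is a small Python sketch that reports which characters are altered by Unicode normalization. It assumes the conversion in question is NFC normalization, which this thread does not state outright; the helper name `nfc_changes` is invented for the example.

```python
import unicodedata


def nfc_changes(char):
    """Return the NFC form of char if normalization alters it, otherwise None."""
    normalized = unicodedata.normalize("NFC", char)
    return normalized if normalized != char else None


if __name__ == "__main__":
    # U+037E GREEK QUESTION MARK and U+212B ANGSTROM SIGN both have singleton
    # canonical decompositions, so NFC replaces them with another character.
    for ch in ("\u037E", "\u212B", "A"):
        result = nfc_changes(ch)
        if result is None:
            print(f"U+{ord(ch):04X} is unchanged")
        else:
            print(f"U+{ord(ch):04X} becomes U+{ord(result):04X}")
```

Running a check like this over a block would be one way to build the list of affected characters asked for above.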

Fileformat alternative

You can use the Unicode site to get character details, like http://unicode.org/cldr/utility/character.jsp?a=0958. But for some reason it doesn't work on Chrome for me. If that gets fixed we can switch the Fileformat link over. --Bequw¢τ 23:35, 9 December 2009 (UTC)

Works now, so I switched it. --Bequw¢τ 01:01, 17 December 2009 (UTC)

Uppercasing of the name

I think it would be better to use the CSS property "text-transform" to display the name in uppercase than to hardcode it in the HTML source. Semantically, there's no reason for the name to be in uppercase; it's only a matter of display. --The Evil IP address 13:38, 3 October 2011 (UTC)

RFDO discussion: September–November 2016

The following discussion has been moved from Wiktionary:Requests for deletion/Others (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


See this diff in which I replaced the former by the latter in the entry Ç. As an interesting additional perk, the "/new" template automatically displayed the composition as C + cedilla.

Is Template:character info/new ready to completely replace the older template in all character entries? The "/new" template uses Module:character info, which says it's in beta stage... But as far as I know, it seems to be working well.

I propose deleting Template:character info and all the Unicode block templates. It's interesting that a few of those actually use the "/new" template, which makes them even more redundant and unnecessary:

These templates have incorrect names:

I believe we should also rename {{character info/new}} to {{character info}}, if this RFDO passes. --Daniel Carrero (talk) 05:15, 14 September 2016 (UTC)

Failed, I guess. It's been 2 months without replies and as I said above, it seems obvious that we can replace the various Unicode block charboxes with simply {{character info/new}} (but maybe it should be simply renamed to {{character info}} without "/new"). I'll do the work later. --Daniel Carrero (talk) 01:05, 13 November 2016 (UTC)
@Daniel Carrero: {{character info/new}} is still in beta, really. One specific problem comes to mind, namely that it produces previous and next links even when either or both of those codepoints are unassigned; it should be able to tell when they're unassigned and then either leave the link space blank or note that the previous and/or next codepoint is/are unassigned. — I.S.M.E.T.A. 00:21, 14 November 2016 (UTC)
@I'm so meta even this acronym: I believe I was able to fix the issue you mentioned; I attempted to make the module detect automatically whether the "previous" or "next" links are unassigned and then look for the "nearest" assigned link.
See the entry Ա (U+0531). The "previous" link is currently pointing to ԯ (U+052F), because the intermediate code (U+0530) is unassigned.
Are there any other problems in {{character info/new}}? I've been using it for some time, and I don't remember other issues to be fixed at the moment, so I removed the "this is a beta template" warning from it (but feel free to re-add if you want). Let me know if there is something else to fix in the module/template. I'm interested in keeping the work of adding it to entries as I said in my talk page. --Daniel Carrero (talk) 02:39, 14 November 2016 (UTC)
@Daniel Carrero: I had advocated leaving the previous/next field(s) blank in the case of unassigned codepoints, but actually, I think your solution is better. However, would it be possible to add <small></small> text under the previous/next link saying, e.g., “codepoint U+0530 is unassigned” or “codepoints U+A7B2–U+A7F6 are unassigned” where appropriate? No other general problems with the module/template currently come to my mind. — I.S.M.E.T.A. 13:21, 14 November 2016 (UTC)
@I'm so meta even this acronym:  Done, check entries like ԯ, Ա, , . They should be in this category. --Daniel Carrero (talk) 08:47, 18 November 2016 (UTC)
@Daniel Carrero: I'm very chuffed with this fix and I think the category is a good idea, too. Once we resolve the issue at Module talk:character info#Bold codename, it looks like everything will be fine with {{character info/new}}. Thereafter, I think we should propose in the Beer Parlour that the template be renamed to {{character info}}, as you suggested. — I.S.M.E.T.A. 16:27, 23 November 2016 (UTC)

Done. I deleted all the old "character info" templates. I believe all the single-character entries are using {{character info/new}} now. --Daniel Carrero (talk) 13:57, 23 November 2016 (UTC)
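For readers of this archive, the "nearest assigned codepoint" behaviour discussed above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual code of Module:character info: the names `is_assigned`, `nearest_assigned`, and `unassigned_note` are invented here, and "unassigned" is approximated by the general category Cn in whatever Unicode version the local Python build ships with.

```python
import unicodedata

MAX_CODEPOINT = 0x10FFFF


def is_assigned(cp):
    """Treat a code point as assigned unless its general category is Cn."""
    return unicodedata.category(chr(cp)) != "Cn"


def nearest_assigned(cp, step):
    """Walk from cp in direction step (+1 or -1) to the nearest assigned code point.

    Returns (neighbour, skipped): skipped lists the unassigned code points
    passed over, and neighbour is None if the code space runs out.
    """
    skipped = []
    cp += step
    while 0 <= cp <= MAX_CODEPOINT:
        if is_assigned(cp):
            return cp, skipped
        skipped.append(cp)
        cp += step
    return None, skipped


def unassigned_note(skipped):
    """Build the small-print note for the code points that were skipped."""
    if not skipped:
        return ""
    low, high = min(skipped), max(skipped)
    if low == high:
        return f"codepoint U+{low:04X} is unassigned"
    return f"codepoints U+{low:04X}–U+{high:04X} are unassigned"


if __name__ == "__main__":
    neighbour, skipped = nearest_assigned(0x0531, -1)
    print(f"previous: U+{neighbour:04X}", unassigned_note(skipped))
```

Returning the skipped code points alongside the neighbour keeps the link target and the explanatory note in sync, since both come from the same walk.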

Character entity names

This template does not seem to allow for the display of character entity names such as &amp; for &, or &percnt; for %. Why not? —Jerome Potts (talk) 08:39, 4 February 2021 (UTC)
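As a rough illustration of where such names could come from, the Python standard library ships the HTML5 named character reference table, which can be inverted to look up names by character. This is only a sketch of the data lookup, with the invented helper name `entity_names`; how the template or its module might expose this is an open question.

```python
from html.entities import html5


def entity_names(char):
    """Return the sorted HTML5 named character references that expand to char."""
    # The table also holds legacy names without the trailing semicolon;
    # only the semicolon-terminated forms are kept here.
    return sorted(
        f"&{name}"
        for name, value in html5.items()
        if value == char and name.endswith(";")
    )


if __name__ == "__main__":
    for ch in ("&", "%"):
        print(ch, entity_names(ch))
```

A character can have several names (for example differing only in case), so any display would need to handle a list rather than a single value.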

Merge w {{emojibox}}

@Daniel Carrero Could we merge {{emojibox}} as an option?

Some characters have alt text and emoji variants. The recent template 'emojibox' shows the two side by side, but it has alignment problems. Could we perhaps merge it into this template, as a param like "emoji alt"? Or could the alignment of character_info be applied to emojibox, so it doesn't interfere with the alignment of the rest of the entry? kwami (talk) 21:02, 16 May 2022 (UTC)

Move the Unihan Database link from {{Han ref}} to {{character info}}?

I was thinking about {{Han ref}} with someone, and came to the idea given above.

Moving the Unihan Database link:

  • would be reasonable given the existing link to https://util.unicode.org/
  • would make any codepoint errors (incorrectly given codepoint) readily obvious, as {{Han ref}} does not render the codepoint
  • would leave just 1 instance of {{Han ref}} on each page (for entries where {{Han ref}} is also added for compatibility ideographs, at least), simplifying error checking (incorrectly given data for {{Han ref}})
  • would leave 1 single place to add compatibility ideograph information instead of 2 ({{Han ref}} + {{character info}})

(Alternative, more radical idea for consideration: combine both {{Han char}} and {{Han ref}} into {{character info}}, powered fully by centralized Module: data.)

(@Theknightwho, Erutuon)

Fish bowl (talk) 05:41, 2 November 2023 (UTC)

This is a good idea because people sometimes do input wrong code points (example 1, example 2, example 3). 2607:FB91:308:5050:81E3:4316:C055:EE15 06:52, 2 November 2023 (UTC)