Template talk:character info

Script detection

We should be able to deduce the script name from the code point. This would save people from having to type it out and would avoid typos. Equinox 14:34, 15 October 2009 (UTC)

This template shouldn't be invoked directly at all; the idea is to have a specialized template for each block. I guess I should probably document it or something. :-) -- Visviva 15:06, 15 October 2009 (UTC)
Now done. Although now I wonder if "character info" should just be "info" for these. It is a lot of typing. On the other hand, IMO people shouldn't be typing them out in the individual entries either. That's why God invented spreadsheets. ;-) -- Visviva 15:13, 15 October 2009 (UTC)
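A rough illustration of the suggestion that opens this thread: the block can be looked up from the code point with a sorted range table. This is only a sketch in Python, not the template's own logic. The table `BLOCKS` and the function `block_of` are hypothetical names, and the table holds just a handful of the standard block ranges; a full version would load the complete list from the Unicode Character Database. Note also that a block is not always the same thing as a script.

```python
from bisect import bisect_right

# Hypothetical, deliberately tiny table of (first, last, block name) ranges.
# The ranges are standard Unicode block boundaries; a real version would
# load the complete list rather than hardcode a few entries.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x1000, 0x109F, "Myanmar"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_of(char):
    """Return the block name for one character, or None if it is not in the table."""
    cp = ord(char)
    # Find the last range whose start is <= cp, then check its upper bound.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        first, last, name = BLOCKS[i]
        if first <= cp <= last:
            return name
    return None


print(block_of("A"))       # Basic Latin
print(block_of("\u042F"))  # Cyrillic
```

With a lookup like this, the block name would never need to be typed by hand, which is the typo risk the thread is concerned with.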

Unicode bias

This is all very Unicode-centric. Most of these languages and scripts have been around for hundreds or thousands of years, Unicode about a decade or so. I am a big fan of Unicode but they don't define languages or scripts or characters and they often have weird and unnatural names and groupings of characters. Either this/these templates should try to cover the actual letters and characters as the full historic and cultural entities or it should prominently label itself specifically "Unicode character info". — hippietrail 01:44, 21 October 2009 (UTC)

Well, I think it's appropriate for us to be Unicode-centric in the same way it's appropriate for us to be English-centric; it's an inescapable fact that Wiktionary uses the Universal Character Set. Wiktionary can't include characters that can't be encoded in UTF-8, for the same reason that we can't describe concepts that can't be described in English. Which is not to say that we shouldn't include information about the presence/encoding of a character in other standards as well; IMO we should. Probably some graphological info as well (most obviously in the form of an image). But a Wiktionary entry is inescapably linked to a specific Unicode codepoint, in a way that it isn't linked to any other system. -- Visviva 02:06, 21 October 2009 (UTC)
Yes I'm not talking about including characters not covered by Unicode, since it endeavours to cover all characters in all used scripts. Of course they don't cover some artificial scripts such as Klingon and Elvish yet but that's not what I'm saying.
All I'm saying is that nobody needs to know codepoints to read or edit a dictionary so we don't include Unicode because it's necessary, only because it's interesting. But if Unicode is interesting, so are legacy encodings, many of which are still in wide use, and a few of which, such as Japanese ones, are still preferred over Unicode.
But more interesting than codepoints and character sets by far are such things as different styles, stroke orders, case mappings, alphabetical order, names in their native languages, names in English which are often common, more concise, and quite different from the Unicode names. I would very much like to see how they are written in handwriting for instance.
Many languages or scripts have a small number of different styles, apart from how a particular font may interpret a style. For instance Hebrew has two main styles, Khmer has two very different styles, Arabic has kufic, nastaliq, and naskh, Chinese and Japanese have different styles such as oracle bone, small seal, big seal, grass, etc. Anyway lots of fascinating stuff beyond dry old Unicode codepoints that would be wonderful for us to have in the "character info" area. — hippietrail 10:30, 21 October 2009 (UTC)
Unicode codepoints are necessary, not just interesting. They allow users to instantly distinguish characters which aren't distinguishable in their current font. This can happen because their font has no glyphs for the codepoints (and they all look like boxes) or because the glyphs provided appear very similar (e.g. - ). It is for the first case that we also should always include a reference image. For this reason Unicode should be prioritized above other encodings (and it usually doesn't get in the way as it's not that much info).
The other information is interesting and I think we should find a consistent way to show it. I think this is very interesting because it should integrate with the alternative encoding info that we already provide at the Latin letters including NATO phonetic, Morse code, Signal flag, Semaphore, ASL Manual, (Braille has its own Unicode space and we transcribe ASL so these two are not just different encoding schemes). Maybe a new =Character encoding= header for the top entry (often the native language entry for that character or Translingual if the character is used in several different languages). As for native/English names, I think we've got that taken care of, as the common English names would be on the definition line (sometimes along with the native name). All translations of the English names would be found at the translations section of the English name entry. Is any of this information we're considering too encyclopedic? --Bequw¢τ 18:03, 21 October 2009 (UTC)
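To illustrate the point made in this thread that codepoints separate characters a font renders alike, here is a small Python sketch using the standard `unicodedata` module. The helper name `describe` and the three sample characters are chosen for illustration only; they are not taken from the discussion.

```python
import unicodedata


def describe(char):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


# Three capital letters that look identical in many fonts.
for ch in ("A", "\u0391", "\u0410"):
    print(describe(ch))
```

The three characters are the Latin, Greek, and Cyrillic capital A; only the codepoint and name tell them apart.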

Hiding next/previous line

I was thinking of adding this template to a page like A (incorporating the letter image). As that page is already pretty crowded, I was wondering if we could make the display of this template a bit more trim. That page already lists the alphabet so I was going to omit the next/previous links. As I don't think it's that important to show the name of the Unicode block, what if we moved that link to the "codepoint" text (which is currently unlinked)? That would free up an extra line. If next & previous are specified, I don't care if the link is moved or not. Does this sound doable? --Bequw¢τ 15:43, 22 October 2009 (UTC)

If you're using {{Basic Latin character info}}, the codepoint links to FFI, which I think is how it should be. That is, the codepoint should always link to a page about the codepoint, so that the user is only one click away from important but tedious information like URL encoding and keyboard entry. (I do wish we had a more authoritative source for this info than FFI, though; it's a pity Unicode's web offerings are so information-poor.) I suppose ASCII characters are kind of a special case, though, since most of the information about them is ridiculously trivial (e.g. you produce "a" by typing "a", and it renders in a URL as "a").

Still, I'm weakly opposed to making the basic elements of the template optional. They seem about as compact as they can reasonably be already, and the whole concept of this template is to have a consistent presentation of, and navigation among, all the members of the Universal Character Set. And while it's true that the alphabet is present on the page, the two aren't exactly the same (e.g. @ does not come before A in the alphabet), and it is nice if we can allow click-through on the first screen (so that someone can e.g. take a quick survey of our coverage of the block without having to scroll to the bottom of ==Translingual== each time). But again, I can see a reasonable exception for ASCII.
I had been thinking about A myself, but my thoughts had been running more towards using a more compact image for the reference glyph, and moving the other images into a gallery. -- Visviva 16:47, 22 October 2009 (UTC)
OK, now I've taken a stab at A, and I have to admit it looks kind of garish and wrong. Humbug. -- Visviva 17:26, 22 October 2009 (UTC)
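[Example renderings at right: the top box shows only "Codepoint U+41"; the bottom box shows "Codepoint U+41" with "Basic Latin" on a separate line.]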
Ah, I mustn't have been clear when I was explaining. If no next/prev params are entered, I'd like the bottom of the display to be like the top version at right rather than the bottom version. Also, do you think many people will be browsing by Unicode order (checking the "coverage of the block")? This is where I'd say we'd be overly biasing Unicode. It's a small point though. Also, about your A cleanup, I think it was still an improvement :) --Bequw¢τ 18:20, 22 October 2009 (UTC)
U+41 is not the correct format for Unicode codepoints. It's 4 to 6 hex digits, as actually used on the fileformat.info page: U+0041 — hippietrail 00:54, 23 October 2009 (UTC)
I've fixed this now in the template (so that it will zero-pad {{{hex}}} to 4 digits). However, I foolishly wrote explicit {{{codepoint}}} values (which assume the hex is already zero-padded) into a lot of the feeder templates, which will have to be unexplicitized. -- Visviva 07:40, 23 October 2009 (UTC)
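For reference, the zero-padding described above can be sketched in Python. This is an illustration only and not the template's actual wikitext; `format_codepoint` is a hypothetical helper name.

```python
def format_codepoint(hex_digits):
    """Turn a hex string such as '41' into the conventional 'U+0041' form."""
    value = int(hex_digits, 16)
    # Pad to at least four uppercase hex digits; longer values are left intact.
    return f"U+{value:04X}"


print(format_codepoint("41"))     # U+0041
print(format_codepoint("1F600"))  # U+1F600
```

Parsing the input as a number first means that values which are already padded, such as "0041", come out unchanged.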
I've implemented this; please adjust as needed. As far as previous/next, I think they support two important functions of the web interface: QC and casual discovery. Being able to click through from one entry to the next is useful for someone working on getting, say, all of our Burmese character entries up to snuff (dealing with 100 open tabs can be a serious pain, after all). It may also entice a casual user to click through and discover new and interesting things, which may in turn result in our hooking another much-needed Wiktionarian. :-) But neither function is important enough to trump the principle of least astonishment, and there is something rather astonishing about having "@" be one of the most prominent links in the first screen of the long, complex entry for A. There must be a way of designing the template that would be less troublesome, but for now just omitting the navbar seems like the way to go. -- Visviva 07:40, 23 October 2009 (UTC)
I see what you mean. Thanks for implementing that. --Bequw¢τ 12:44, 23 October 2009 (UTC)

More compacter?

  • Thoughts on this? Seems clearer and more compact, but will require some cleanup of existing uses. -- Visviva 05:58, 24 October 2009 (UTC)
I like it. It saves that extra line (that I notice because I browse with RHS TOC). --Bequw¢τ 16:38, 24 October 2009 (UTC)

stroke order

For CJK entries, should the reference image be a simple image or be one from the Stroke Order Project? We could use either the animated or red-gradient variants (the multi-panel B&W ones are quite wide). Reading w:stroke order, I think they are mostly standard per character (though this is not my area of expertise). If we do include them, I assume the {*character info} would subsume {{stroke order}} and {{Han stroke}}. Thoughts? --Bequw¢τ 22:09, 2 November 2009 (UTC)

Or should animations such as on be used? --Bequw¢τ 16:10, 6 November 2009 (UTC)

multiple parameters

Is there a reason we have script, list, and root parameters? I can see having two for occasions when a list is so long it's split up (e.g. Appendix:Unicode/Hangul Syllables) and therefore the name and link will be different. Why three? --Bequw¢τ 16:16, 6 November 2009 (UTC)

Codepoint equivalence

As noted in the GP, sometimes the MediaWiki software (possibly because of the Unicode standard) converts Unicode codepoints to more standard variants. I've noted this for now in the image caption on [[;]], but if someone comes up with a better way to show this (or a list of all characters this happens for) please be bold. --Bequw¢τ 05:25, 27 November 2009 (UTC)

See Appendix:Unicode normalization. --Bequw¢τ 21:18, 9 December 2009 (UTC)
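The kind of conversion discussed in this thread can be reproduced with Python's standard `unicodedata` module. This sketch assumes the software applies normalization form NFC, and it assumes the Greek question mark (U+037E) as the example, since the thread points at the entry for ";"; neither assumption is confirmed by the comments above.

```python
import unicodedata

# U+037E GREEK QUESTION MARK is canonically equivalent to U+003B SEMICOLON.
greek_question_mark = "\u037E"
normalized = unicodedata.normalize("NFC", greek_question_mark)

print(f"U+{ord(greek_question_mark):04X} -> U+{ord(normalized):04X}")
# U+037E -> U+003B
```

After normalization the original codepoint is gone from the text, which is why a page title cannot keep it.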

Fileformat alternative

You can use the Unicode site to get character details like http://unicode.org/cldr/utility/character.jsp?a=0958. But for some reason it doesn't work on Chrome for me. If that gets fixed we can switch the Fileformat link over. --Bequw¢τ 23:35, 9 December 2009 (UTC)

Works now, so I switched it. --Bequw¢τ 01:01, 17 December 2009 (UTC)
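As a sketch of how such a link can be built from a character, following the URL pattern quoted above: the Python below is illustrative only, `character_details_url` is a hypothetical helper name, and whether the address still resolves has not been checked.

```python
from urllib.parse import urlencode

# Base address taken from the link quoted in this thread.
BASE = "http://unicode.org/cldr/utility/character.jsp"


def character_details_url(char):
    """Build the details link for one character from its zero-padded hex codepoint."""
    query = urlencode({"a": f"{ord(char):04X}"})
    return f"{BASE}?{query}"


print(character_details_url("\u0958"))
# http://unicode.org/cldr/utility/character.jsp?a=0958
```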

Uppercasing of the name

I think it would be better to use the CSS property "text-transform" to display the name in uppercase than hardcoding it in the HTML source. Semantically, there's no reason for the name to be in uppercase; it's only a matter of display. --The Evil IP address 13:38, 3 October 2011 (UTC)