User talk:Stephen G. Brown/Arabic vowels
I don’t see the value in vowelled Arabic redirects. No one ever uses all the possible vowel points in a given word. If the writer thinks they are needed in a particular case, he puts only one, rarely two. Two different writers, or the same writer in two different cases, might choose to use a different vowel on the same word. The best Arabic software simply ignores the vowels, as though they weren’t there: if you search for a word, you can put in any or all of the vowels, or none, and it will find every instance of the word regardless. But Wiktionary software doesn’t ignore them, so if you create redirects, you need one for every possible permutation of the vowels. —Stephen 10:08, 11 May 2005 (UTC)
- Not exactly "no one". The two major uses are dictionaries and the Koran. Voweled Korans exist on the Internet. I can easily imagine people more interested in Islam than in Arabic trying to look up words by cutting and pasting into a dictionary like ours. Ideally, the Wiktionary software would have search ability capable of ignoring vowels, but I've looked into it and it's not in an easy state to hack right now due to how MySQL implements free-text search. — Hippietrail 01:27, 17 May 2005 (UTC)
- Well, I meant that no writer uses them, not that no dictionary uses them. I recall that there was a time when Qaddafy mandated full vowel-pointing in all material printed in and for Libya. People hated it. Yes, people sometimes cut and paste Arabic words into search engines, but different sources provide different vowel-pointing for the very same word. Not only that, but some sources also use the tatweel (kashida) to stretch some words (or parts of words) out for esthetic purposes. Also, some letters (such as kaf) have one or more variant forms which may be used, again for esthetic reasons. Any search engine that cannot discern and ignore the vowels, sukuns, shaddas, tatweels, and elegant variations really is very, very primitive, and should only be used in the most basic way ... meaning that all the vowels and other extraneous marks should be removed first. Anyone searching for an Arabic word who does not know to (and know how to) add or remove these things will never get anywhere with a program such as ours, which really is not meant for languages that use the Arabic script. —Stephen 10:03, 17 May 2005 (UTC)
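The vowel-insensitive matching described above is simple to sketch. A minimal Python version, assuming only the standard library: the harakat, shadda, and sukun are all Unicode combining marks (category `Mn`), while the tatweel is an ordinary base character at U+0640, so both can be stripped before comparison. The function names here are my own, for illustration.

```python
import unicodedata

TATWEEL = '\u0640'  # kashida, inserted to stretch words for esthetic reasons

def strip_arabic_marks(text):
    """Drop vowel points (harakat), shadda, sukun, and tatweel so that
    differently-pointed spellings of the same word compare equal."""
    return ''.join(
        c for c in text
        if unicodedata.category(c) != 'Mn' and c != TATWEEL
    )

def arabic_match(query, target):
    """Vowel-insensitive comparison: the behavior attributed above to
    'the best Arabic software'."""
    return strip_arabic_marks(query) == strip_arabic_marks(target)
```

Note this strips combining marks from every script, not just Arabic, which is usually what a search index wants anyway; a more careful version would restrict the check to the Arabic block. It also does not handle the variant letter forms (such as the kaf variants) mentioned above, which would need an explicit mapping table.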
- Shuō Cáo Cāo, Cáo Cāo dào! ("Speak of Cáo Cāo and Cáo Cāo arrives", the Chinese equivalent of "speak of the devil"!) I just read on the wikitech mailing list minutes ago that a new, much better full-text search engine has been enabled on the English Wiktionary and several other wikis. It works nicely for English; I'll see what it can do for other languages. I'll ask on the list how "hackable" it is; maybe we can improve it. (I'm an ex-software engineer.) — Hippietrail 11:08, 17 May 2005 (UTC)
Then, if we're going to use the universal accent, we should make up a Cyrillic template and use it religiously (despite the problems it causes in bulleted items). I have no idea how to do that. —Stephen 15:57, 8 May 2005 (UTC)
- I'll try to find some time to ask the developers or the people who came up with the IPA, Unicode, and Polytonic Greek templates. — Hippietrail 04:30, 11 May 2005 (UTC)
- Yes, Polytonic Greek is another problem. I’m assuming that the fonts used to create these templates are embedded fonts, so that everyone sees them regardless of what’s on their computers. If that’s the case, it would be helpful to make templates that include, for example, Khmer. Khmer is one of the most complex scripts, and very few people have the ability to see Khmer Unicode. It’s still too new. Almost all Cambodians are still using old-fashioned TT Khmer fonts with numerous possible encodings. —Stephen 10:08, 11 May 2005 (UTC)
Regarding the Arabic and Thai templates: I'm still learning wiki templates and CSS, so I've made some errors in the new experimental templates, which I'm trying to fix.
I noticed you used the template once or twice, and saw that you were using it in the headings but not in the plain article text. This made me wonder if it was because of the size. Then I realized I'd given absolute sizes in the templates, which naturally override the sizes that come from being at a particular heading level. I've made them relative now, both at 150%, but please feel free to tweak them if you find they're not quite right.
- No, it’s because of the problem with the vowels. After they fix it, a lot of the Arabic will have to be retyped, so I haven’t wanted to mess with it much. At least, I assume retyping will be necessary...I don’t see how they could be fixed automatically. After they fix the problem and I begin retyping, then I’ll use the AR template to indicate that the vowels in an entry have been checked and fixed if needed. —Stephen 09:56, 15 Jun 2005 (UTC)
I've also found that overriding the sizes in your personal CSS file doesn't actually work. I thought it was due to a special hack used in the IPA template I based my templates on which was supposed to make the fonts change only for Internet Explorer since other browsers worked fine without the IPA template. Anyway, these attempts did not make any difference for me. — Hippietrail 09:32, 15 Jun 2005 (UTC)
- I’ve been looking at some Monobook pages and it surprises me how cryptic and unintuitive they are. I guess I’ll just have to start trying different things to see what effects I can get. —Stephen 09:56, 15 Jun 2005 (UTC)
No, Wikipedia doesn't use any embedded fonts. Wikipedia gives help on installing fonts for exotic purposes for those who need them, and otherwise just uses Unicode with the aforementioned templates.
- I was afraid of that. Too bad, embedding would be a tremendous tool here if it were available. —Stephen 10:03, 17 May 2005 (UTC)
I'm very interested in the complex Asian scripts, especially the ones so complex that few fonts exist for them yet. Khmer, Myanmar, Sinhala, and Tibetan are all scripts for which it is difficult to find fonts that behave as expected. That hasn't stopped me from making a small number of Wiktionary entries in each language, so that I can learn how to use them and what their difficulties are, including the technological difficulties that Wiktionary will have to overcome if it's to live up to "all words in all languages". It's a challenge! — Hippietrail 01:27, 17 May 2005 (UTC)
- Yes, they are a problem. I have a lot of those fonts (Khmer, Burmese, etc.) that I use with QuarkXPress or Adobe Illustrator, but they are not Unicode. I recently tried to install Unicode support for Khmer, but something wasn’t compatible with my Win2K and I was forced to re-install my OS to fix the problems. —Stephen 10:03, 17 May 2005 (UTC)
Other ways which may work for different people are to adjust your Wiktionary settings, to create a custom Wiktionary "user stylesheet", to change the browser's default font, or just to have some nice fonts installed. Some browsers and OSes have different degrees of smarts than others.
- I'm using the most recent version of IE 6, plus the most recent Uniscribe and top-quality fonts. I may be limited in my OS, however, which is Win2K Professional. I think I'm seeing the same thing you are, but to me it’s grotesque. —Stephen 15:57, 8 May 2005 (UTC)
- I think I mentioned that I worked for years in the typographics and print industry, and in printed materials a "universal" accent such as the one here would be unacceptable. Each typeface requires specially-designed accents, and every character requires custom positioning of the accents. I don't think anyone would produce a dictionary that included accented Cyrillic letters (even with proper accents) along with transliterations. —Stephen 15:57, 8 May 2005 (UTC)
One trap we should especially do our best to avoid is making the "text" or "source" subservient to the software/OS/etc. Software and OS developers resist improving their products on the basis of "no websites use that stuff". We need that stuff! Wikipedia is one of the most-used resources on the Internet, and we are its sister project. In a way it's our job to make sure we use all the Unicode features so that Microsoft, MediaWiki, etc. support them. If people hadn't pushed the technology to work with their "exotic" languages, everybody would still be typing Greek and Russian and Japanese and Arabic in Latin transliteration.
- Somehow I don't think the pseudo-accent has anything to do with Unicode or pushing the technology. Yes, it would be nice if we had Cyrillic fonts with properly accented letters and an easy way to type them, but I don't see how using the false acute is going to make anyone see the need. They'll just think that we don't care how it looks. —Stephen 15:57, 8 May 2005 (UTC)
Another way of looking at it is that every time OSes and browsers actually did improve, we'd have to go through and re-do all our articles to put in the features that we really wanted all along.
- The only way to fix the accents is to develop fonts with custom-accented characters for each typeface, with Unicode assignments. And if that ever happened, we'd have to go back and delete and redo all of these fake accents. Putting in the accent isn't a way of getting ready for the future; rather, the future may demand that we go back, delete them all, and replace them with the correct Unicode forms. —Stephen 15:57, 8 May 2005 (UTC)
- In fact it's not possible to make fonts with "precomposed" Cyrillic letters unless we put those letters in the user-defined range of Unicode. There are surely other Cyrillic fonts already around which do have the combining acute done properly. Two that I know of are Arial Unicode MS and Code2000, though the latter is very ugly. I'll try to find some people to ask about such fonts, as well as fonts which include obsolete Cyrillic characters. — Hippietrail 04:30, 11 May 2005 (UTC)
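The point about precomposed letters can be checked directly. Unicode defines no precomposed codepoints for Cyrillic vowels with an acute accent (the Macedonian consonants Ѓ and Ќ are the notable exceptions), so even NFC normalization leaves a stressed Russian word as a base letter followed by the combining acute, U+0301, and stripping the stress mark for a page title or search key is just a matter of dropping that character. A small Python sketch, assuming only the standard library; `strip_stress` is my own illustrative name:

```python
import unicodedata

COMBINING_ACUTE = '\u0301'

# вода́ "water", with the stress mark on the final а
stressed = 'вода' + COMBINING_ACUTE

# NFC cannot compose а + U+0301 because no precomposed codepoint exists,
# so the word stays five codepoints long even after normalization.
assert len(unicodedata.normalize('NFC', stressed)) == 5

def strip_stress(word):
    """Remove combining acute accents, e.g. to recover the unaccented
    headword form used for page titles."""
    return word.replace(COMBINING_ACUTE, '')
```

This is why a font can only fix the rendering by shaping the base-letter-plus-combining-mark sequence well, not by substituting a single precomposed glyph with its own codepoint.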
- I know a lot of this sounds like a bit of a rant or sounds unreasonable for the user. But most users just want English and maybe one or two western European languages. People that want more exotic stuff already have to customize their computer by installing fonts and keyboards etc. People who want to use all kinds of funky languages like Hindi and Georgian know they have to do even more. The best way to make Wiktionary better for such people is to resist the urge to dumb down the content because of deficiencies in the default setup of the software. If we dumb down, there's no way for the user to customize to get back what's been omitted.
- I don't think keeping all the pronunciation guides together in the transliteration is dumbing down; I think it's the best way to go. If we're going to put in the "universal" accents that we may someday have to redo, then we shouldn't bother with the transliteration. The pronunciation doesn't need to be in two different places. —Stephen 15:57, 8 May 2005 (UTC)
Please feel free to discuss any of these points at length. I know my writing is bad and may be unclear for those trying to understand what I want to say. — Hippietrail 14:25, 8 May 2005 (UTC)
- As I said (or at least meant to say), I'll leave the "universal" accents alone, but they are too much work to improve. I think all I can do when I come across entries that are treated this way is to ignore them. As far as I'm concerned, transliteration is currently the only good answer with Russian, Japanese, and most other languages. There are some languages such as Thai where this really is not enough, and I haven't thought of anything reasonable for Thai. There are commonly accepted practices for indicating tones in Chinese, either by accents (liù) or by numbers (liu4). But Thai has more complex vowels and more accents, and I don't know of a nontechnical way of indicating them. For example, I've been using àèìòù for both low and falling tones. Chinese doesn't present this problem. To me this is a real problem. I don't feel that putting accents strictly on the transliteration amounts to a problem with Russian. —Stephen 15:57, 8 May 2005 (UTC)
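The two Chinese conventions mentioned above (liù vs. liu4) are mechanically interconvertible, which is part of what makes pinyin so workable compared with Thai. A hedged Python sketch, using combining diacritics plus NFC normalization; the function name and the simplified placement rules are my own, not an established library:

```python
import unicodedata

# Combining marks for pinyin tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {'1': '\u0304', '2': '\u0301', '3': '\u030c', '4': '\u0300'}

def tone_number_to_mark(syllable):
    """Convert number notation to accent notation, e.g. 'liu4' -> 'liù'."""
    tone, base = syllable[-1], syllable[:-1]
    if tone not in '12345':
        return syllable      # no tone number present
    if tone == '5':
        return base          # neutral tone takes no mark
    # Standard placement rules: 'a' or 'e' takes the mark if present;
    # in 'ou' the 'o' takes it; otherwise the last vowel does.
    if 'a' in base:
        i = base.index('a')
    elif 'e' in base:
        i = base.index('e')
    elif 'ou' in base:
        i = base.index('o')
    else:
        i = max(base.rfind(v) for v in 'iouü')
    marked = base[:i + 1] + TONE_MARKS[tone] + base[i + 1:]
    return unicodedata.normalize('NFC', marked)
```

No comparably simple scheme exists for Thai, where the written tone depends on consonant class as well as the tone mark, which is exactly the difficulty described above.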