Wiktionary talk:Ukrainian transliteration

Combining ties for ALA-LC

These are composed using the correct Unicode characters in the correct order: i͡e, z͡h, t͡s, i͡u, i͡a. If they don't look right, your browser is probably displaying them in the Arial Unicode MS font. Blame Microsoft, but please don't change the table to ruin the display for people with fonts that work right. For details, see w:Arial Unicode MS#Bugs.—w:user:Mzajac
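A minimal Python sketch of the composition described in this comment. It assumes the tie is U+0361 COMBINING DOUBLE INVERTED BREVE placed between the two letters (first letter, tie, second letter), which is how the examples above appear to be encoded; the helper names `tie` and `describe` are hypothetical.

```python
import unicodedata

TIE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE (assumed tie character)


def tie(pair: str) -> str:
    """Join a two-letter ALA-LC digraph with a tie: first letter, tie, second letter."""
    if len(pair) != 2:
        raise ValueError("expected exactly two letters")
    return pair[0] + TIE + pair[1]


def describe(text: str) -> list[str]:
    """List the code points of a string, to check the order of its characters."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for digraph in ["ie", "zh", "ts", "iu", "ia"]:
    print(tie(digraph), describe(tie(digraph)))
```

Because the tie is a combining mark, it attaches to the letter before it and is drawn across to the next one, which is why the order matters; printing the code points is a quick way to check that an edited table still has that order.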

Attribution

Most of this table was copy-pasted directly from the wikitext in w:Romanization of Ukrainian. I believe Wikipedia's licence requires that the original authors be credited here. —w:user:Mzajac

If you know how, transwiki the file and then we can make the needed changes. I would do it but I don’t know how. —Stephen 23:56, 15 October 2007 (UTC)
Since it's free, I think all it would require is a notice here, but I have no idea what the right form or procedure is. Mzajac 01:44, 16 October 2007 (UTC)

Ц

Shouldn't ц be transliterated as c, to be consistent with the rest? —w:user:Mzajac

Indicating stress

We need a way to indicate primary and secondary stress on syllables. In Ukrainian and other Cyrillic-alphabet dictionaries this is normally done with stress marks that look a bit like acute accents over the syllables' vowels, with a bolder one normally used for primary stress. Unfortunately, I don't think Unicode has accent characters that allow both. Certain character combinations can also display badly. For example, the default Wiktionary font on my computer doesn't display combining diacritics on Cyrillic characters correctly, so in ма́ти the accent appears after the Cyrillic а on the page, even though it displays correctly in editing fields (Latin á, é, í, ó, ú, ý display correctly on the page). Finally, this is not a normal convention in English dictionaries, so readers may think the accent is part of the word's normal Ukrainian spelling.

In IPA, stress is shown with high and low vertical lines before the primary and secondary stressed syllables, respectively: /prəˌnʌnsiˈeɪʃən/ (pronunciation).

Ukrainian entries in Wiktionary are inconsistent about where stress is shown: in the Cyrillic, in the transliteration, and of course in the IPA. For example, at мати, stress is indicated all three ways:

Pronunciation

IPA(key): /ˈmatɪ/

Noun

ма́ти (máty) f

  1. mother

Perhaps it's best to mark stress only in the IPA, since it is a standardized method familiar to English-language readers. —Mzajac 03:33, 16 October 2007 (UTC)

I’m more familiar with Russian. Russian dictionaries always indicate the stress using acute accents, and that’s how we do it here.
I use a font-call template for this purpose: ма́ти ({{Cyrl|ма́ти}}). This one accent is all that is needed for Russian. For secondary stress in Ukrainian, how about є̀ versus є́? —Stephen 08:44, 17 October 2007 (UTC)


That's not a bad solution, although non-standard. It should be fairly compatible in web browsers, etc. (Unicode is quite robust these days, so I'm surprised and disappointed that it doesn't accommodate the bold acute accent.)
Another possibility is double and single acute accents, although I suspect this will not display in many fonts. Another possibility is to actually use bold formatting, but this is problematic because it won't be preserved in a plain-text copy, and it breaks up the word in the source code. Or we could adopt the IPA standard, but this presents the problem of identifying syllable boundaries, while Ukrainian dictionaries always put stress on the vowel.
I suspect accents are more likely to work consistently over Latin letters than Cyrillic, but all of the examples below work in my browser (Safari 3.0.4 on Mac OS X Leopard 10.5.2):
  1. acute and grave: карбо́ванѐць, karbóvanècʼ
  2. double and single acute: карбо̋ване́ць, karbővanécʼ
  3. bold and acutes: карбо́ване́ць, karbóvanécʼ
  4. IPA: карˈбоваˌнець, karˈbovaˌnecʼ
A quick look at the Ukrainian wiktionary shows that most words have no stress, or only primary stress indicated. Occasionally syllable breaks are shown with hyphens, as in від-мі́-ню-ва-ти. Maybe we should just follow the leader on this one.
The question still remains where we should indicate stress: in which of Cyrillic, transliterations, and pronunciation? Obviously stress should be indicated in the IPA pronunciation in the word's entry. Does it also belong in links to the word, e.g. in a "translations" section?
Perhaps it's best to only indicate stress where it's required, in a word's main entry, and not clutter everything else up with accents which some readers may find confusing. —Mzajac 03:48, 18 February 2008 (UTC)
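To see how options 1, 2 and 4 above are actually encoded (option 3 relies on bold formatting, which plain text cannot carry), here is a small Python sketch. It assumes primary stress on the о and secondary stress on the е of карбованець, as in the examples, and `mark` is a hypothetical helper. It may also help explain the display reports on this page: under NFC normalization, Latin a plus a combining acute becomes the single precomposed á, and Cyrillic е plus a combining grave becomes the precomposed ѐ (U+0450), but Unicode has no precomposed Cyrillic о with an acute, so the font itself must position that accent.

```python
import unicodedata

ACUTE = "\u0301"         # COMBINING ACUTE ACCENT
GRAVE = "\u0300"         # COMBINING GRAVE ACCENT
DOUBLE_ACUTE = "\u030B"  # COMBINING DOUBLE ACUTE ACCENT
PRIMARY = "\u02C8"       # MODIFIER LETTER VERTICAL LINE (IPA primary stress)
SECONDARY = "\u02CC"     # MODIFIER LETTER LOW VERTICAL LINE (IPA secondary stress)


def mark(word: str, marks: dict[int, str]) -> str:
    """Insert a combining mark right after the letter at each given index."""
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        if i in marks:
            out.append(marks[i])
    return "".join(out)


word = "карбованець"
# Indices 4 and 8 are the о and е that carry primary and secondary stress.
acute_grave = mark(word, {4: ACUTE, 8: GRAVE})           # option 1
double_single = mark(word, {4: DOUBLE_ACUTE, 8: ACUTE})  # option 2
# Option 4: IPA-style marks go before the stressed syllable, not on the vowel.
ipa_style = "кар" + PRIMARY + "бова" + SECONDARY + "нець"

for variant in (acute_grave, double_single, ipa_style):
    nfc = unicodedata.normalize("NFC", variant)
    # Length before and after NFC shows whether any mark merged into a precomposed letter.
    print(variant, len(variant), len(nfc))
```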
Looking at the above examples with the latest Mozilla Firefox, I see о̋ and ѐ correctly, but the acute о́ is flying high and to the right. With рˈбо, only native Ukrainian-speakers would be able to tell the difference between ˈ and the regular apostrophe ’. I can tell when an entire word or phrase is bolded, but when only one letter is bolded, I don’t notice any difference on my screen.
Of course the accent should be marked in any IPA pronunciation, but many people don’t know or use IPA and they rely on either the Cyrillic or the transliteration. For that reason, it needs to be on the transliteration at least.
I don’t think it would be a good idea to follow the lead of the Ukrainian Wiki, because that is designed for native speakers. Americans require more clarity. For example, a Ukrainian can always tell the difference between ˈ and ’ in a Ukrainian word, but Americans cannot.
I agree, it’s probably best to add accents only where required, although it is helpful to have them marked in the Roman transliterations in the translation sections when a word’s main entry has not yet been created. But if the main entry exists, accentuation in the Translation sections is not so important. —Stephen 15:33, 18 February 2008 (UTC)
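Stephen's point about ˈ and ’ is easy to check mechanically: the characters look alike on screen but are distinct code points. A small Python sketch that reports the apostrophe-, quote- and prime-like characters mentioned on this page; the character selection is an assumption, not a Wiktionary convention, and `lookalike_report` is a hypothetical helper.

```python
import unicodedata

# Apostrophe-, quote- and prime-like characters that come up in this discussion:
# ASCII apostrophe, right single quote, modifier apostrophe, IPA primary stress,
# modifier prime, right double quote, modifier double prime.
LOOKALIKES = "'\u2019\u02BC\u02C8\u02B9\u201D\u02BA"


def lookalike_report(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for each lookalike found in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if ch in LOOKALIKES
    ]


# IPA primary-stress mark versus the apostrophe-like marks used for ь.
for row in lookalike_report("kar\u02C8bovanec\u02BC karbovanec\u2019"):
    print(row)
```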
I've had a look at a few Ukrainian translation dictionaries in print (of varying quality). Some don't show stress at all, some only indicate primary stress, and others indicate two places of stress without differentiation. I am positive that I remember Ukrainian schoolbooks from my youth which use a bold acute accent for primary stress, but I haven't been able to find one. I suspect this has become rare in digital typesetting, since it would require a custom font or character set. I speculate that it may also be out of current practice in Ukraine, which would be influenced by Russian-language typography. Of course, this applies to Cyrillic text.
I agree that Latin-alphabet stress marks such as the IPA ˈ shouldn't be mixed with the apostrophes and double apostrophes used in romanization, because they are easily confused. I think that acute accents may be misleading too, for readers who aren't familiar with this non-Latin convention. This could be even more confusing because some languages, such as Macedonian, have Cyrillic letters with acute accents (ѓ and ќ).
Does it make sense to use the Cyrillic convention on romanized words at all? Romanization is used to convey Cyrillic spelling to a Latin-alphabet audience, while IPA is used for pronunciation. And stress is an aspect of pronunciation, not spelling. Note that stress marks are placed on headwords in Cyrillic dictionaries, and there is no separate pronunciation, since these languages are more-or-less phonetic.
I suggest that we don't mix metaphors, and stick to the common systems English-language readers are familiar with. Acute accents for stress are used in Cyrillic reference books, and should remain at the uk, ru, and other Cyrillic wiktionaries.
мати (maty)
IPA(key): /ˈmatɪ/
Mzajac 03:08, 6 March 2008 (UTC)
Sounds okay. For the Roman alphabet, I use Windows’ "International English" keyboard, which makes it very easy for me to type the accents. And since Russian is so phonetic (except for the accent), not many Russian words have a separate Pronunciation section (although they should). And of course, the translation sections only have room for romanizations, not IPA pronunciations. So that’s why I like to include the accents right in the romanizations (máty). —Stephen 20:13, 6 March 2008 (UTC)
I've been giving this some more thought, and have written about some general principles in Wiktionary talk:Transliteration#Purpose, and about Russian in Wiktionary talk:About Russian. (The romanization guide in Wiktionary:About Russian contradicts Appendix:Russian transliteration and, in my opinion, is working at cross-purposes.)
I think stress, syllabification, Russian akanye, etc. are all attributes of pronunciation. They belong only in the pronunciation section. They should be represented with IPA, and this can be supplemented by another system. Cyrillic dictionaries don't have a separate pronunciation guide, so stress is laid on the headword, but that doesn't belong in the English-language Wiktionary.
Headwords in English dictionaries are in plain English, and show hyphenation breaks (an attribute of orthography). They don't show stress, syllabification, or other pronunciation attributes. It follows that foreign-language headwords and entries in a translation section should include a Latin-alphabet representation of the original orthography only. Perhaps the stress accent could be placed on the Cyrillic version for the benefit of those familiar with the alphabet and its conventions (I would prefer not, to avoid confusion and help with searching), but this foreign convention doesn't belong in the romanized word.
Sorry if I'm repeating myself. —Mzajac 00:58, 7 March 2008 (UTC)
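Whether or not stress accents appear on headwords or romanizations, searches and comparisons need a way to ignore them. A minimal Python sketch, assuming the only stress marks in use are the combining acute, grave and double acute discussed in this section; `strip_stress` is a hypothetical helper, not a Wiktionary function.

```python
import unicodedata

# Combining marks used for stress on this page (assumption: only these three).
STRESS_MARKS = {"\u0301", "\u0300", "\u030B"}  # acute, grave, double acute


def strip_stress(text: str) -> str:
    """Remove stress accents but keep letters such as й and ї intact.

    NFD splits precomposed letters (й -> и + breve, ѐ -> е + grave, á -> a + acute);
    only the stress marks are dropped, and NFC then recomposes the rest.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in STRESS_MARKS)
    return unicodedata.normalize("NFC", kept)


print(strip_stress("ма́ти"), strip_stress("máty"))
```

Decomposing first matters: without it, a precomposed ѐ would keep its grave. The function would also strip the acute from Macedonian ѓ and ќ, so it is only suitable for text where acutes are known to mark stress.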

Having worked on more definitions and discussed this elsewhere, I've become aware of some factors:

  • In publications about the Ukrainian language, stress can be indicated on both Cyrillic and transliterations.
  • (In such publications, scholarly transliteration is used for both orthographic transliteration and phonetic transcription, so perhaps it could be used here to supplement IPA in pronunciation sections.)
  • In Wiktionary, many Ukrainian words appear in translation sections but don't have their own entries yet.
  • Where the Cyrillic is linked in translations, it is impractical to indicate stress on it, so stress naturally belongs on the transliteration.
  • In Ukrainian entries, stress can also be indicated on transliterations, for clarity and consistency.

I'll add a note to this guideline about stress. —Michael Z. 21:05, 18 March 2008 (UTC)

Archaic letters

Archaic Ukrainian letters may be added as a supplement. Their transliteration is not universal, but the following suggestions follow academic practice and are unambiguous. I've already encountered the jat’ (ѣ, ять) in the etymology for горілка.

These were sometimes used in Ukrainian up to the 1917 revolution:

  • ъ ъ (jer) sometimes (”), but that is confused with the apostrophe ’ (”).
  • ы y (jery)
  • ѣ ě (jat’)
  • э è (e) sometimes (e), but that is confused with Cyrillic е (e).

These appeared in older orthographies; some were used only in Greek words or were very rare. They are also found in Church Slavonic, and in an obscure Rusyn orthography of the late 1800s.

  • ѕ dz (dze)
  • ѹ u (uk)
  • ѡ ô (omega) sometimes (o), but that is confused with Cyrillic о (o).
  • іа ja (a jotovane) often treated as equivalent to я (ja)
  • ѧ ę (jus malyj)
  • ѩ ję (jus malyj jotovanyj)
  • ѫ ǫ (jus velykyj)
  • ѭ jǫ (jus velykyj jotovanyj)
  • ѯ ks (ksi)
  • ѱ ps (psi)
  • ѳ θ (fita) sometimes (f) or (th), but those are confused with Cyrillic ф (f) or тг (th).
  • ѵ i (ižicja) very rare.
  • ѥ je (e jotovane)

Any comments or objections to adding these to the Appendix? I suppose some will be unreadable in Internet Explorer, without a font-style template added. —Michael Z. 18:56, 26 March 2008 (UTC)
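The proposed values amount to a lookup table, with one wrinkle: іа is a two-letter sequence, so it has to be tried before single letters. A minimal Python sketch using greedy longest-match lookup; the archaic values are the ones listed above, while the handful of modern letters needed for the illustrative pre-reform spelling горѣлка (г = h, о = o, and so on) are an assumed subset rather than the appendix's full table. ё is omitted because no value is proposed for it here, and `transliterate` is a hypothetical helper.

```python
# Archaic letters and the suggested values listed above.
ARCHAIC = {
    "ъ": "ъ", "ы": "y", "ѣ": "ě", "э": "è",
    "ѕ": "dz", "ѹ": "u", "ѡ": "ô",
    "\u0456\u0430": "ja",  # іа (Cyrillic і + а), a two-letter key
    "ѧ": "ę", "ѩ": "ję", "ѫ": "ǫ", "ѭ": "jǫ",
    "ѯ": "ks", "ѱ": "ps", "ѳ": "θ", "ѵ": "i", "ѥ": "je",
}

# Assumed subset of modern letters, just enough for the горѣлка example.
MODERN_SUBSET = {"г": "h", "о": "o", "р": "r", "л": "l", "к": "k", "а": "a", "і": "i"}


def transliterate(text: str, table: dict[str, str]) -> str:
    """Greedy longest-match transliteration (handles two-letter keys like іа)."""
    longest = max(len(key) for key in table)
    out, i = [], 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(text[i])  # unknown character: pass through unchanged
            i += 1
    return "".join(out)


TABLE = {**MODERN_SUBSET, **ARCHAIC}
print(transliterate("горѣлка", TABLE))
```

Greedy matching always reads іа as one unit, so a real implementation would need a rule for when і followed by а is not the old digraph.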

I have added the short list to the appendix, plus ё, which was part of the Zhelekhivka. —Michael Z. 19:31, 2 April 2008 (UTC)

Standards

Since we are discussing transliteration standards in the Beer parlour, I thought I'd make a relevant note here. This appendix currently conforms to normal practice, where some variation exists. The ISO/R 9:1968 standard is more specific; here are the differences:

Cyrillic          Appendix            ISO/R 9:1968
ї                 ji                  ï
х                 x                   ch
ь                 ’ (apostrophe)      ʹ (prime)
’ (apostrophe)    ” (r-quotes)        ʺ (2-prime)
ъ                 ъ                   ʺ (2-prime)

Personally, I prefer the standard's unambiguous ï over ji, but I am prejudiced towards the traditional x, ъ, and curly apostrophes.

I haven't considered the newer ISO 9:1995 standard, because it departs from conventional practice in linguistics. —Michael Z. 21:14, 7 April 2008 (UTC)
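Since the table lists only the letters where the two schemes differ, switching between them amounts to overriding a few entries in an otherwise shared table. A minimal Python sketch; the scheme values come from the table above, the Ukrainian apostrophe is assumed to be typed as ’ (U+2019), and the few shared letters are an assumed subset included only so the example words work. `romanize` is a hypothetical helper.

```python
# Values from the comparison table; every other letter is shared by both schemes.
APPENDIX = {"ї": "ji", "х": "x", "ь": "\u2019", "\u2019": "\u201d", "ъ": "ъ"}
ISO_R9 = {"ї": "\u00ef", "х": "ch", "ь": "\u02b9", "\u2019": "\u02ba", "ъ": "\u02ba"}

# Assumed shared values for a few letters, only to make the examples work.
SHARED = {"л": "l", "і": "i", "б": "b", "с": "s", "м": "m", "я": "ja"}


def romanize(word: str, scheme: dict[str, str]) -> str:
    """Romanize letter by letter; characters missing from both tables pass through."""
    table = {**SHARED, **scheme}
    return "".join(table.get(ch, ch) for ch in word)


for word in ("їх", "хліб", "сім\u2019я"):
    print(word, romanize(word, APPENDIX), romanize(word, ISO_R9))
```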

There was no discussion about this change in the BP. Michael, you have changed "x" to "ch" in the established system without any discussion and changed it for a dozen Ukrainian words. What about the rest? What about the many translations into Ukrainian? I think Wiktionary transliteration systems should be decided by those who actually work with these languages, not by people who impose rules in the Beer Parlour. Some casual editors have added more Ukrainian content than you have, and "ch" is very uncommon among Ukrainian contributors. It will also now differ from Belarusian and Russian; those languages have similar standards too, but they were not adopted here. --Anatoli (обсудить/вклад) 04:21, 29 January 2013 (UTC)
Under the topic of “standards,” I'm trying to make this consistent with real-world usage and actual standards. The х=x has only been used in a few individual authors’ systems, while the broad use in most European-style systems is х=ch (while Anglo-American–style systems use х=kh).
I didn’t consider consistency with the Wiktionary transliteration system for Russian. It doesn’t correspond with any system, fails in some of the essential requirements of a dictionary transliteration system, and arguably it is not stable. The special rules and exceptions turn it into a transcription system – it duplicates the pronunciation and doesn't actually transliterate the words where it matters most.
I am prepared to go through and update all of the transliterations.
Do you really think we should talk about harmonizing our diverse Cyrillic romanization systems at the BP? I think this would be a great idea, but given the history, I doubt anything would come of it. Michael Z. 2013-01-29 21:00 z
The "real world" doesn't use "ch" for "х", since it may and will be misread as /tʃ/. Most Wiktionary transliteration systems have been customised over time. The current system has been established and used. If anything, Anglo-American "kh" is the real-world usage.
There's no problem with the Russian system (except perhaps for wording and formatting). Unlike Ukrainian and Belarusian, Russian has a few important exceptions where letters are not pronounced as expected and knowing the reading rules doesn't help (e.g. "молоко" is not transliterated as "malakó"). The transliteration system helps users to pronounce those words correctly without having to use IPA. A similar approach is used with other languages where letters are pronounced differently (e.g. Japanese, Korean, Arabic, Hebrew).
I don't see the need to change anything. There's no point in changing if the rule won't be followed. It would make sense to discuss rules with productive contributors if their method were different from the current system, as was the case with Arabic when we compromised and used a chat transliteration system for Arabic for a while. As of today, I seem to be the most productive contributor for Ukrainian, even though most of the time it's translations - about a dozen a week into Ukrainian. The same can be said about Belarusian, unfortunately. As for Russian, there are active editors who have been using the current system. The only remaining (minor) inconsistency is in transliterating Russian "ь" and "ъ". --Anatoli (обсудить/вклад) 22:01, 29 January 2013 (UTC)
The "real world" doesn't use "ch" for "х", since it may and will be misread as /tʃ/. – “Ch” is sometimes pronounced /x/ in English (loch, Bach), while “x” is never. I think you are making an unfounded assumption here. The real world uses several variations. English-based systems use kh. European systems have mainly used ch for the last 150 years or so, while a few have used h or x. H is unacceptable because it steals the only letter usable for Ukrainian and Belarusian г. X is a minority usage. See Gerych 1965, especially the tables at the back, for a survey of older usage.
There's no problem with the Russian system [. . .] The transliteration system helps users to pronounce those words correctly. – That's the biggest problem. The point of transliteration is to convey the spelling, and for example, to let readers perceive irregular pronunciation or spelling. If you think IPA is inadequate to convey pronunciation, then add a better transcription. But by substituting transcription for transliteration, you hide the fact that the word is pronounced \jevo\ but spelled jego. You know that you can put an international-system transcription into the “Pronunciation” section of an entry, don't you?
Regardless of whether you agree with me on that, the Russian system used here is substantially different from the Ukrainian and Belarusian, because it is based on a different fundamental intent: a preference for conveying pronunciation over spelling. If we are to consider making these compatible, our treatment of one or the other has to change radically. Michael Z. 2013-01-29 22:34 z

Module:uk-translit

For your consideration, Module:uk-translit, see also the talk page with tests. --Anatoli (обсудить/вклад) 04:00, 27 March 2013 (UTC)
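Module:uk-translit itself is written in Lua; what follows is only a hypothetical Python analogue showing the general shape of such a module. The letter values are assumptions: those stated on this page (г = h, и = y, ї = ji, я = ja, х = x, ц = c, ь = ’, apostrophe = ”) are used where available, and the rest follow common scholarly practice, so they may not match the module's actual table. Stress accents are passed through, so a stressed Cyrillic form yields a stressed romanization.

```python
import unicodedata

# Assumed letter values; see the note above for which ones come from this page.
LETTERS = {
    "а": "a", "б": "b", "в": "v", "г": "h", "ґ": "g", "д": "d", "е": "e",
    "є": "je", "ж": "ž", "з": "z", "и": "y", "і": "i", "ї": "ji", "й": "j",
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "x", "ц": "c", "ч": "č",
    "ш": "š", "щ": "šč", "ь": "\u2019", "ю": "ju", "я": "ja",
    # Ukrainian apostrophe, in any of its common encodings.
    "\u2019": "\u201d", "'": "\u201d", "\u02bc": "\u201d",
}


def translit(text: str) -> str:
    """Transliterate Ukrainian Cyrillic; stress accents and other characters pass through."""
    out = []
    # NFC keeps й and ї as single letters; a combining acute stays a separate character.
    for ch in unicodedata.normalize("NFC", text):
        value = LETTERS.get(ch.lower())
        if value is None:
            out.append(ch)  # spaces, punctuation, combining stress marks
        elif ch.isupper():
            out.append(value[:1].upper() + value[1:])  # Щ -> Šč, Є -> Je
        else:
            out.append(value)
    # Recompose so that a + combining acute becomes á.
    return unicodedata.normalize("NFC", "".join(out))


print(translit("ма́ти"), translit("Київ"))
```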