Module talk:km-translit

Module

@Stephen G. Brown Steve, what are the transliteration/pronunciation rules for letters such as "ហ្វ", which have two names, fâ and wâ? And what are the "a-series" and "o-series" for vowels? For example, ភាសា (phiəsaa) is transliterated as "pīəsā", although the vowel letter (ា) is the same in both syllables. @Wyang Please take a look at this module; it's not much, but I'm struggling :) --Anatoli (обсудить/вклад) 02:50, 18 June 2014 (UTC)

ហ្វ is the consonant cluster ហ + វ (h+w). It is a rare combination in native Khmer words, and it is an artificial construct used to represent the foreign sound of f: កាហ្វេ (kaafee). However, since it contains វ, it can sometimes be pronounced as w (actually, this is a sound that is halfway between w and v, or /ʋ/). For example, ហ្វឹក can be pronounced either /fək/ or /ʋək/. ហ្វូង can be pronounced /ʋooŋ/, /fooŋ/, or /pʰooŋ/. ហ្វេ, borrowed from Vietnamese, is only pronounced /ʋee/ and means the Vietnamese city of Hue. ហ្វៃយ៉ង់ is only pronounced /ʋayyɑŋ/, meaning faience. ហ្វៃហ្វា is only pronounced /ʋayvaa/. There is no rule; it depends on the word.
Khmer consists of two different alphabets, called the "a-series" and "o-series". Vowels with "a-series" consonants are pronounced differently from the same vowel with "o-series" consonants. There is also a converter, ៊, which converts a-series consonants to the o-series; ៉ converts o-series consonants to the a-series.
The following tables show the a-series and the o-series consonants. ស is a-series, so សា = saa. ភ is o-series, so ភា = pʰie. —Stephen (Talk) 11:51, 18 June 2014 (UTC)
a-series consonants    Subscript form    IPA
ក (kɑɑ)    ្ក (្kâ)
ខ (khɑɑ)    ្ខ (្khâ)    kʰɑ
ច (cɑɑ)    ្ច (្châ)
ឆ (chɑɑ)    ្ឆ (្chhâ)    cʰɑ
ដ (dɑɑ)    ្ដ (្dâ)    ɗɑ
ឋ (thɑɑ)    ្ឋ (្thâ)    tʰɑ
ណ (nɑɑ)    ្ណ (្nâ)
ត (tɑɑ)    ្ត (្tâ)
ថ (thɑɑ)    ្ថ (្thâ)    tʰɑ
ប (bɑɑ)    ្ប (្bâ)    ɓɑ
ផ (phɑɑ)    ្ផ (្phâ)    pʰɑ
ឝ (sɑɑ)    ្ឝ (្shâ)    śɑ (Pali/Sanskrit)
ឞ (sɑɑ)    ្ឞ (្ssô)    ṣɑ (Pali/Sanskrit)
ស (sɑɑ)    ្ស (្sâ)
ហ (hɑɑ)    ្ហ (្hâ)
ឡ (lɑɑ)    ្ឡ (្lâ)
អ (ʼɑɑ)    ្អ (្ʼâ)    ʔɑ
o-series consonants    Subscript form    IPA
គ (kɔɔ)    ្គ (្kô)
ឃ (khɔɔ)    ្ឃ (្khô)    kʰɔ
ង (ngɔɔ)    ្ង (្ngô)    ŋɔ
ជ (cɔɔ)    ្ជ (្chô)
ឈ (chɔɔ)    ្ឈ (្chhô)    cʰɔ
ញ (ñɔɔ)    ្ញ (្nhô)    ɲɔ
ឌ (dɔɔ)    ្ឌ (្dô)    ɗɔ
ឍ (thɔɔ)    ្ឍ (្thô)    tʰɔ
ទ (tɔɔ)    ្ទ (្tô)
ធ (thɔɔ)    ្ធ (្thô)    tʰɔ
ន (nɔɔ)    ្ន (្nô)
ព (pɔɔ)    ្ព (្pô)
ភ (phɔɔ)    ្ភ (្phô)    pʰɔ
ម (mɔɔ)    ្ម (្mô)
យ (yɔɔ)    ្យ (្yô)
រ (rɔɔ)    ្រ (្rô)
ល (lɔɔ)    ្ល (្lô)
វ (vɔɔ)    ្វ (្vô)    ʋɔ
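The series rule described above can be sketched as a table lookup. This is only an illustration in plain Lua, not the code of Module:km-translit: the table names and `read_syllable` are hypothetical, only four consonants are listed, and the single vowel entry uses the two readings given above (សា = saa, ភា = pʰie).

```lua
-- Sketch only: a tiny excerpt of the consonant tables, not the real module data.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"  -- one UTF-8 encoded character

-- consonant -> series and onset (hypothetical layout)
local consonants = {
	["ស"] = { series = "a", onset = "s" },
	["ក"] = { series = "a", onset = "k" },
	["ភ"] = { series = "o", onset = "pʰ" },
	["គ"] = { series = "o", onset = "k" },
}

-- vowel sign -> reading per series; only ា is covered here
local vowels = {
	["ា"] = { a = "aa", o = "ie" },
}

-- Read a syllable made of exactly one consonant plus one vowel sign.
local function read_syllable(syllable)
	local chars = {}
	for ch in syllable:gmatch(UTF8_CHAR) do
		chars[#chars + 1] = ch
	end
	local cons, vowel = consonants[chars[1]], vowels[chars[2]]
	if #chars ~= 2 or not cons or not vowel then
		return nil  -- outside the scope of this sketch
	end
	-- The consonant's series selects which reading of the vowel applies.
	return cons.onset .. vowel[cons.series]
end

print(read_syllable("សា"))  -- saa
print(read_syllable("ភា"))  -- pʰie
```

The sketch ignores clusters, subscript forms and the series converters, so a real implementation would need considerably more data than this.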
Thank you, Stephen! Multiple unpredictable readings sound like a bit of a problem or an obstacle. Can we default such letters to their most common pronunciations and provide phonetic respellings (or additional parameters) for the exceptions? Can lists of exceptions be made, or is that not very practical? Stephen, are you happy to follow the new transliteration standard as used here, or do you have a different preference? I see you use the Sealang dictionary transliteration. This will be mostly automatic, but a standardised translit may be needed for test cases. Wikipedia doesn't describe the transliteration of diacritics well. @Wyang, thank you for making it work already, at the basic level! Could you add spaces between syllables, as with Lao? It will be important for making longer strings more readable. --Anatoli (обсудить/вклад) 12:30, 18 June 2014 (UTC)
The most common pronunciation is with f, but f would be incorrect for words such as Hue. I don’t know how this can be handled. I have not seen what the new transliteration scheme will look like, but Khmer has many vowel sounds, and they will require special symbols to represent them all. I think the f and the transliteration scheme will be the least of the problems ... the biggest problem, in my opinion, is determining the ends of words. Putting a space between every syllable, like some people do with Chinese, is not going to be acceptable. It is important to put a space only between words. I think every transliteration will have to be replaced with a manual transliteration. —Stephen (Talk) 13:18, 18 June 2014 (UTC)
The module won't be smart enough to determine the ends of words, but it can hopefully determine the ends of syllables. If you oppose spaces, we can do without them. As for manual vs automatic: the automatic transliteration can be made non-mandatory, so that a manual one (when "tr=" exists) overrides the automatic one. If a transliteration method is accepted (it can be changed and tuned), you can preview e.g. ព្រហ្មវិហារ (prum vihiə), copy/paste "prômvĭharô" and, if it's incorrect, fix it, insert spaces between the words of a phrase, etc. Entries/translations without a manual transliteration (tr=) will definitely benefit. --Anatoli (обсудить/вклад) 13:30, 18 June 2014 (UTC)
ព្រហ្មវិហារ (prum vihiə) = prum vi’hie. —Stephen (Talk) 14:22, 18 June 2014 (UTC)
Thanks, Stephen. I saw that it wasn't transliterated correctly. The Sealang dictionary gives "prummeaʔviʔhie" or "prum viʔhie". I've added a few more words to the test cases. Hopefully Frank can make this module work. --Anatoli (обсудить/вклад) 23:50, 18 June 2014 (UTC)
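The "manual overrides automatic" behaviour proposed above could look roughly like the following. It is a minimal sketch in plain Lua: `get_translit`, `manual_tr` and `stub_auto` are hypothetical names, and the stub merely returns the uncorrected automatic output quoted above instead of calling the real module.

```lua
-- Hypothetical wrapper: `auto_translit` stands in for the module's own function.
local function get_translit(text, manual_tr, auto_translit)
	if manual_tr and manual_tr ~= "" then
		return manual_tr           -- an explicit tr= always wins
	end
	return auto_translit(text)     -- otherwise fall back to the automatic result
end

-- Demonstration stub only; it ignores its input.
local function stub_auto(text)
	return "prômvĭharô"
end

print(get_translit("ព្រហ្មវិហារ", nil, stub_auto))            -- prômvĭharô
print(get_translit("ព្រហ្មវិហារ", "prum vi’hie", stub_auto))  -- prum vi’hie
```

Treating an empty tr= as absent is a design assumption of this sketch; a template could equally use an empty value to suppress transliteration.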
Yes. When រ (rɔɔ) comes at the end of a syllable, it is not pronounced. It is similar to the British -r, which is pronounced at the beginning of a syllable but not at the end. Sometimes it can cause the preceding vowel to be long. The ់ (bantoc) makes the preceding vowel short: បក (bɑɑk), but បក់ (bɑk). The ័ (samyok sannya) has no pronunciation, but denotes a deviation from the general rules of pronunciation (used mostly in loan words from other languages). ៍ (toandakhiat) indicates that the base character is not pronounced. —Stephen (Talk) 06:28, 19 June 2014 (UTC)
"toandakhiat" must be like the Thai thanthakhat (a consonant killer), as in แมนเชสเตอร์ (Manchester), where it makes ร silent? --Anatoli (обсудить/вклад) 06:46, 19 June 2014 (UTC)
Exactly. ៗ is the repetition sign, which repeats the previous word. is the independent vowel ’u. (laʼ) (etc.) is lɑ’ or lɑ’nɨŋlɑ’. Many Khmer texts include a zero-width space (U+200B) between words, which allows software programs to break lines at the correct place. The zero-width space should be transliterated as a word space. —Stephen (Talk) 12:07, 19 June 2014 (UTC)
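Two of the conventions from the last few replies, the silent base character under toandakhiat and the zero-width space as a word boundary, can be sketched as a pre-processing step. This is plain Lua with hypothetical function names. It simply drops the silenced consonant, whereas the module excerpt quoted in the "misnested tags" section marks it up instead. `lookup` is a demonstration stub whose readings are taken from this page.

```lua
local ZWSP = "\226\128\139"                    -- U+200B ZERO WIDTH SPACE (UTF-8)
local TOANDAKHIAT = "\225\159\141"             -- U+17CD KHMER SIGN TOANDAKHIAT
local KHMER_CONSONANT = "\225\158[\128-\162]"  -- U+1780..U+17A2 as a byte pattern

-- Drop every consonant that carries a toandakhiat. Simplification: the sign
-- is assumed to follow the consonant directly.
local function drop_silenced(text)
	return (text:gsub(KHMER_CONSONANT .. TOANDAKHIAT, ""))
end

-- `translit_word` is a hypothetical per-word transliterator supplied by the caller.
local function translit_phrase(text, translit_word)
	-- Turn each zero-width space into an ordinary space, then split on spaces.
	local spaced = drop_silenced(text):gsub(ZWSP, " ")
	local words = {}
	for word in spaced:gmatch("[^ ]+") do
		words[#words + 1] = translit_word(word)
	end
	return table.concat(words, " ")
end

-- Demonstration stub; the two words are used purely as sample strings.
local W1, W2 = "ភាសា", "កាហ្វេ"
local readings = { [W1] = "pʰiesaa", [W2] = "kaafee" }
local function lookup(word)
	return readings[word] or "?"
end

print(translit_phrase(W1 .. ZWSP .. W2, lookup))  -- pʰiesaa kaafee
```

The byte-level patterns keep the sketch independent of any Unicode string library; inside Scribunto a Unicode-aware string library would probably be the more natural choice.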

Question

@Atitarev @Stephen G. Brown

Wiktionary:Khmer transliteration seems to suggest that the transliteration system for Khmer here is the United Nations Romanization System for Geographical Names scheme. Should we use the UN scheme for Khmer in this module as well? ភាសា: phéasa or pʰiesaa? Would this mean syllable-final 'r' would still be 'r'? Wyang (talk) 04:03, 25 June 2014 (UTC)

As for me, whatever is achievable. Apart from the basics, I couldn't find anything useful. If one transliteration system starts working, it would be great. The Sealang dictionary seems to be using the UN scheme, and Stephen has been using it or a similar scheme. --Anatoli (обсудить/вклад) 04:16, 25 June 2014 (UTC)
ភាសា pʰiesaa is the only one I am familiar with. Just skimming United Nations Romanization System for Geographical Names, it appears that syllable-final r would still be r.
If we use the United Nations Romanization System for Geographical Names, then we have to rely on the transliteration program exclusively, because I would not know how to make manual transliterations in that system.
If that’s the case, I think the only way it can work is if we insert zero-width spaces (U+200B) wherever needed. I think Lua makes it possible to filter the zero-width spaces out of links so that they don’t appear in page names (the same way it works with Russian acute accents). The zero-width space is the standard used in native Khmer keyboards, where the spacebar by default inserts a zero-width space (instead of a Western-style word space), and if we insert them where needed, then the transliteration program would know how to separate words. —Stephen (Talk) 12:30, 25 June 2014 (UTC)
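Filtering zero-width spaces out of link targets, as suggested above, might be sketched like this. `make_link` is a hypothetical helper in plain Lua and does not reproduce Wiktionary's actual link-handling code; it only shows the idea of keeping the marked-up text for display while linking to the stripped page name.

```lua
local ZWSP = "\226\128\139"  -- U+200B ZERO WIDTH SPACE (UTF-8)

-- Build a wikilink whose target has no zero-width spaces.
local function make_link(text)
	local target = text:gsub(ZWSP, "")
	if target == text then
		return "[[" .. target .. "]]"              -- nothing was stripped
	end
	return "[[" .. target .. "|" .. text .. "]]"   -- keep ZWSP in the display text
end

print(make_link("ភាសា"))  -- [[ភាសា]]
```

Because the display text keeps its zero-width spaces, a transliteration routine that reads the display text could still use them as word boundaries.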

misnested tags

In Module:km-translit, there is a section

			if match(syl[i], '៍') then
				syl[i] = '<small><del>' .. gsub(syl[i], '.', function(consonant)
					if cons_conv[consonant] then
						return cons_conv[consonant][1]
					end end) .. '</small></del>'

and it would appear that (to avoid misnested tags lint errors) the opening HTML tags

<small><del>

should be closed with

</del></small>

rather than

</small></del>

as it is now. However, the edit tab doesn't give access to this section and I don't know how to do it, so I leave it for others to fix. Anomalocaris (talk) 23:19, 17 June 2018 (UTC)

Done. Suzukaze-c 23:24, 17 June 2018 (UTC)
Suzukaze-c: Thanks! I get it now! I have to click on the tool that looks like <> in order to access the source code. —Anomalocaris (talk) 00:08, 18 June 2018 (UTC)
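One way to rule out this kind of misnesting altogether is to generate the closing tags from the opening ones. The following is a minimal sketch in plain Lua. `wrap_in_tags` is a hypothetical helper and is not claimed to be the fix that was applied to the module.

```lua
-- Wrap `text` in HTML tags, opening them in the given order and closing
-- them in reverse order, so the result is always properly nested.
local function wrap_in_tags(text, tags)
	local open, close = {}, {}
	for i, tag in ipairs(tags) do
		open[i] = "<" .. tag .. ">"
		close[#tags - i + 1] = "</" .. tag .. ">"
	end
	return table.concat(open) .. text .. table.concat(close)
end

print(wrap_in_tags("h", { "small", "del" }))  -- <small><del>h</del></small>
```

With such a helper the tag order is written once, so the opening and closing sequences cannot drift apart when someone edits only one of them.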