Module talk:lo-translit

From Wiktionary, the free dictionary

Spaces between syllables?

Just a suggestion. Perhaps syllables should be separated by a space? It's the current practice in some Lao dictionaries. It may make reading and learning the script a little bit easier. --Anatoli (обсудить/вклад) 03:04, 16 June 2014 (UTC)

I'm not sure. Should we strictly adhere to the LC rules or have our own variant here? Wyang (talk) 03:16, 16 June 2014 (UTC)
I'm not insisting, just asking. Is that an official guide with official examples? The Thai and Lao (and Khmer, which has similarities) dictionaries I've seen often use spaces or hyphens; there's no consistency there. Perhaps we should ask for some feedback? With or without spaces, it won't be any violation, IMHO. I know you're busy, take your time. --Anatoli (обсудить/вклад) 03:26, 16 June 2014 (UTC)

Syllabification enabled; module rewritten. Hopefully it works better now. Wyang (talk) 04:34, 17 June 2014 (UTC)

It's very good, indeed, better than Google, especially with Burmese! I don't mind removing "beta stage" if you're confident. @Widsith would you like to comment/test/add something? --Anatoli (обсудить/вклад) 04:51, 17 June 2014 (UTC)
I made it mandatory in Module:links; I see you even fixed ສຍາມ (sa nyām). I don't know enough about Lao. Hopefully, future problems can be fixed. --Anatoli (обсудить/вклад) 05:22, 17 June 2014 (UTC)
Thanks! There doesn't seem to be a Burmese translation or transliteration function in Google Translate. Wyang (talk) 08:09, 17 June 2014 (UTC)
It doesn't translate, but you can transliterate languages that are not included in Google Translate, e.g. Kazakh, by using another language as the target :) --Anatoli (обсудить/вклад) 12:54, 17 June 2014 (UTC)
  • Thanks for mentioning me, but I'm happy to observe: this looks very promising. [PROBLEMATIC MODULE CALL REDACTED] 10:52, 17 June 2014 (UTC)

Problems with some vowel orthographies including ວ

Lao is the language I'm currently focussing on studying. This has led me to find some bugs in the automatic transliteration here.

They mostly seem to be due to the still very common occurrence of supposedly obsolete orthography but could be due to flaws in the quite sparse documentation of the Lao language.

Here is one I've seen before but am seeing again now:

ຄວັນ is transliterated to kha wan whereas ຄັວນ is transliterated to khuan.

My understanding of the documentation of the current orthography is that superscript vowels go over the main consonant and that ວ is regarded as a vowel. This would mean ຄັວນ is the correct spelling, and indeed it gets a good transliteration. However, this and similar spellings are not included in SEAlang's dictionary at all, and get orders of magnitude fewer Google hits than the other orthography, where ວ seems to be treated as part of a consonant compound which itself bears the superscript vowel: ຄວັນ.

I think there are variations with other superscript vowels and tone marks. I'll keep an eye out.

Other things I've noticed are with abbreviations and/or acronyms and words with other unpronounced consonant compounds. I'll keep an eye out for specific examples of those too. — hippietrail (talk) 05:39, 28 July 2014 (UTC)

You can create some test cases in Module:lo-translit/testcases; User:Wyang or another Lua expert may look into those. What should these be, anyway? Currently: ຄວັນ (khuan) and ຄັວນ (khuan). Is ຄວັນ a misspelling? --Anatoli T. (обсудить/вклад) 05:51, 28 July 2014 (UTC)
As best I can tell, ຄວັນ (khuan) and ຄັວນ (khuan) are alternative spellings of the same word and should therefore have the same transliteration as a single syllable (khuan). The former is incorrect according to modern rules but still far more common, whereas the latter is the correct modern spelling but a lot less common. This module currently works only for the uncommon but correct modern spelling and incorrectly renders the popular spelling as two syllables.
There's a chance I'm wrong and the module is wrong about the orthography rules though. It would be good to find an authoritative answer either way. — hippietrail (talk) 07:03, 29 July 2014 (UTC)
Please see Lao collation, Syllabification of Lao Script for Line Breaking, as well as Towards a Computerization of the Lao Tham System of Writing. Wyang (talk) 00:28, 31 July 2014 (UTC)

Another example is ກ່ວາ (kuā) / ກວ່າ (kuā). In this case the module currently does render them both the same (kuā). The former seems to agree with the stuff I read but is not in SEAlang. The latter is in SEAlang and gets a lot more Google hits. I don't know if it's just a fluke or by design that the module treats this pair the same but not ຄວັນ / ຄັວນ. It could be that this word has a tone mark that can appear in two places whereas the other has a superscript vowel that can appear in two places. — hippietrail (talk) 00:59, 2 August 2014 (UTC)
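
For illustration only: one way to make both spellings in each pair come out the same is to normalise the order of ວ and any mark written over it before transliterating. Below is a minimal sketch in Python (not the module's actual Lua). It assumes the input is already split into single syllables and only handles the shape consonant + ວ + mark(s) + rest. The name `canonical_key` is hypothetical, and the order it produces is just a comparison convention, not a claim about which spelling is correct.

```python
WO = "\u0ea7"        # ວ
MAI_KAN = "\u0eb1"   # ັ (superscript short vowel)
TONE_MARKS = {"\u0ec8", "\u0ec9", "\u0eca", "\u0ecb"}  # ່ ້ ໊ ໋
MOVABLE = TONE_MARKS | {MAI_KAN}


def is_consonant(ch):
    """True for code points in the Lao consonant ranges (ກ..ຮ and ໜ..)."""
    return "\u0e81" <= ch <= "\u0eae" or "\u0edc" <= ch <= "\u0edf"


def canonical_key(syllable):
    """Move marks written over ວ onto the preceding initial consonant.

    Assumption: `syllable` is one pre-segmented syllable. Any shape other
    than consonant + ວ + mark(s) + rest is returned unchanged.
    """
    if len(syllable) < 3 or not is_consonant(syllable[0]):
        return syllable
    if syllable[1] != WO:
        return syllable
    end = 2
    while end < len(syllable) and syllable[end] in MOVABLE:
        end += 1
    marks = syllable[2:end]
    if not marks:
        return syllable
    return syllable[0] + marks + WO + syllable[end:]


# ຄວັນ and ຄັວນ map to the same key, as do ກວ່າ and ກ່ວາ.
print(canonical_key("\u0e84\u0ea7\u0eb1\u0e99") == canonical_key("\u0e84\u0eb1\u0ea7\u0e99"))
print(canonical_key("\u0e81\u0ea7\u0ec8\u0eb2") == canonical_key("\u0e81\u0ec8\u0ea7\u0eb2"))
```

A syllable such as ວັນ, where ວ itself is the initial, passes through unchanged. In unsegmented text the same rewrite would be unsafe, because a consonant ending the previous syllable could be mistaken for the initial, so syllable boundaries have to be found first.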

Problem with ວ as final consonant after some short vowels

In the word ຜິວໜັງ (phiu nang) / ຜິວຫນັງ (phiu nang) the module currently treats ວ as an initial consonant with an obsolete-orthography-style implied vowel, resulting in the transliteration (phi wa nang). In reality ວ is just a final consonant so the module should convert it to something like (phiw nang). Compare with SEAlang which gives pʰĭw năŋ and pʰǐunǎŋ. — hippietrail (talk) 00:32, 2 August 2014 (UTC)

This is probably the same bug: ດຽວນີ້ (diāu nī). We currently have (dīa wa nī) as though ວ is an initial with implied vowel. SEAlang has diaːw nīː / dìaunîː — hippietrail (talk) 01:21, 2 August 2014 (UTC)

Both are now added. Wyang (talk) 23:59, 4 August 2014 (UTC)
I found another which isn't yet fixed. One of the words for bottle is ແກ້ວ (kǣu). Currently this gets rendered as (kǣ wa), while SEAlang gives "kɛ̑ːw" or "kɛ̂ːu" — hippietrail (talk) 08:14, 28 August 2014 (UTC)
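
As a rough illustration of the three cases reported in this section (ຜິວ, ດຽວ, ແກ້ວ), here is a minimal Python sketch of a check for "ວ is a final consonant". It is not the module's Lua code. It assumes the input is one pre-segmented syllable, and the function name `wo_is_final` and the vowel sets are my own choices. The signs ົ and ັ are deliberately left out, since ົວ and ັວ spell the vowel ua rather than a vowel plus final ວ.

```python
WO = "\u0ea7"  # ວ
TONE_MARKS = {"\u0ec8", "\u0ec9", "\u0eca", "\u0ecb"}  # ່ ້ ໊ ໋
PREVOWELS = {"\u0ec0", "\u0ec1"}  # ເ ແ (written before the initial)
# ິ ີ ຶ ື ຸ ູ ຽ າ (written above, below or after the initial)
VOWEL_SIGNS = {
    "\u0eb4", "\u0eb5", "\u0eb6", "\u0eb7",
    "\u0eb8", "\u0eb9", "\u0ebd", "\u0eb2",
}


def wo_is_final(syllable):
    """Guess whether a syllable-final ວ is a final consonant.

    Assumption: `syllable` is a single syllable. ວ counts as final when it
    is the last character and the syllable already has a written vowel,
    either a prevowel at the start or a vowel sign right before ວ.
    """
    if len(syllable) < 3 or not syllable.endswith(WO):
        return False
    # Tone marks sit between the vowel and ວ, so ignore them.
    body = [ch for ch in syllable[:-1] if ch not in TONE_MARKS]
    if not body:
        return False
    return body[0] in PREVOWELS or body[-1] in VOWEL_SIGNS


for s in ("\u0e9c\u0eb4\u0ea7", "\u0e94\u0ebd\u0ea7", "\u0ec1\u0e81\u0ec9\u0ea7"):
    print(s, wo_is_final(s))  # ຜິວ, ດຽວ, ແກ້ວ
```

When the check succeeds, a transliterator would emit the final (for example -u or -w) instead of starting a new syllable with an implied vowel.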

Problem with ຽ as final consonant

In the old orthography ຽ was a final consonant equivalent to the modern ຍ. Like other features of the old orthography it's still extremely common. SEAlang is full of them, so is the Lao Wikipedia.

One example is ຍ່ອຍ (nyǭi), which is correctly transliterated as "nyǭi", but ຍ່ອຽ (nyǭi) currently comes up blank. — hippietrail (talk) 03:49, 16 August 2014 (UTC)

Ah I see final ຽ does indeed work for other words, it must just be a problem when coming after ອ which can be a consonant or a vowel. Might want to check it after ວ then too in that case. — 58.172.68.199 05:12, 19 August 2014 (UTC)
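
One possible way around the ambiguity after ອ, sketched in Python rather than the module's Lua: rewrite a syllable-final ຽ to the modern ຍ before any other rule runs, so the old and new spellings share one code path. This assumes the input is a single pre-segmented syllable and that ຽ at the very end of a syllable is always the old-orthography final described above. The function name `modernise_final_nyo` is hypothetical.

```python
OLD_FINAL_NYO = "\u0ebd"  # ຽ
MODERN_NYO = "\u0e8d"     # ຍ


def modernise_final_nyo(syllable):
    """Replace a syllable-final ຽ with ຍ; leave everything else alone.

    Assumption: `syllable` is one syllable. A medial ຽ, as in ຮຽນ, is the
    vowel and is not touched because it is not the last character.
    """
    if len(syllable) >= 2 and syllable.endswith(OLD_FINAL_NYO):
        return syllable[:-1] + MODERN_NYO
    return syllable


# ຍ່ອຽ becomes ຍ່ອຍ, which the module already handles.
print(modernise_final_nyo("\u0e8d\u0ec8\u0ead\u0ebd") == "\u0e8d\u0ec8\u0ead\u0e8d")
```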

More involving ວ

  • ແຂວງ (khǣung) on SEAlang is (kʰwɛ̆ːŋ / kʰwɛ̌ːŋ) but we currently do (khǣ wang). Basically, the rule where ວ is a glide between the initial consonant and the core vowel, even when the vowel is written in the left position, should always have higher priority than interpreting consonants as requiring an inherent vowel, which is officially obsolete though it still occurs in abundance. — hippietrail (talk) 04:56, 3 September 2014 (UTC)
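
A minimal Python sketch of the priority rule described above, not the module's Lua implementation. It assumes one pre-segmented syllable and recognises only the shape prevowel + initial + ວ + final consonant. The function name `wo_is_glide` is hypothetical. Initial ຫ is excluded on the assumption that ຫວ is a digraph for w (as in ແຫວນ) rather than a cluster. A `False` result only means this pattern did not match.

```python
WO = "\u0ea7"  # ວ
HO = "\u0eab"  # ຫ
PREVOWELS = {"\u0ec0", "\u0ec1"}  # ເ ແ
TONE_MARKS = {"\u0ec8", "\u0ec9", "\u0eca", "\u0ecb"}  # ່ ້ ໊ ໋


def is_consonant(ch):
    """True for code points in the Lao consonant ranges (ກ..ຮ and ໜ..)."""
    return "\u0e81" <= ch <= "\u0eae" or "\u0edc" <= ch <= "\u0edf"


def wo_is_glide(syllable):
    """True when ວ sits between the initial and a final consonant.

    Assumption: `syllable` is a single syllable. This check is meant to run
    before any inherent-vowel rule, so that ແຂວງ is read as one syllable
    with a kw-type onset instead of being split in two.
    """
    chars = [ch for ch in syllable if ch not in TONE_MARKS]
    if len(chars) < 4:
        return False
    pre, initial, second, after = chars[:4]
    return (
        pre in PREVOWELS
        and is_consonant(initial)
        and initial not in (WO, HO)
        and second == WO
        and is_consonant(after)
    )


print(wo_is_glide("\u0ec1\u0e82\u0ea7\u0e87"))  # ແຂວງ
```

Compare ແກ້ວ from the earlier section: there ວ is the last letter, so this check does not match and the final-consonant reading applies instead.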