Module talk:mnw-translit

RFC

@Octahedron80, 咽頭べさ, エリック・キィ, Lancepark: The module is somewhat stable now. In Module:mnw-translit/testcases, is "ကောန်တှ် = konth" better than "ကောန်တှ် = kontah"? Any comments? Any idea about Old Mon? EdwardAlexanderCrowley (talk) 04:24, 17 July 2021 (UTC)[reply]

You asked 'Is "ကောန်တှ် = konth" better than "ကောန်တှ် = kontah"'. No, it isn't. 'Medial' + asat may represent a final consonant. I've added a conservatively expressed substitution using table pre2 to fix it, taking the number of errors down from 185 to 178.
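As a rough illustration of that kind of fix (written in Python for readability, and not the module's actual pre2 table), a pre-substitution pass could rewrite MEDIAL HA + ASAT as a full LETTER HA + ASAT before the main transliteration runs, so that the later rules see an ordinary final consonant. The table contents and the function name are assumptions made for this sketch.

```python
# Hypothetical pre-substitution table; the real pre2 table may differ.
MEDIAL_HA = "\u103E"  # MYANMAR CONSONANT SIGN MEDIAL HA
ASAT = "\u103A"       # MYANMAR SIGN ASAT
LETTER_HA = "\u101F"  # MYANMAR LETTER HA

PRE_SUBSTITUTIONS = {
    # 'Medial' + asat is treated as a final consonant.
    MEDIAL_HA + ASAT: LETTER_HA + ASAT,
}


def apply_pre_substitutions(text: str) -> str:
    """Rewrite each listed sequence before the main transliteration pass."""
    for old, new in PRE_SUBSTITUTIONS.items():
        text = text.replace(old, new)
    return text


word = "\u1010\u103E\u103A"  # TA, MEDIAL HA, ASAT
normalized = apply_pre_substitutions(word)  # TA, LETTER HA, ASAT
```

A medial HA that is not followed by asat is left alone, which is what keeps the substitution conservative.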

You've missed a curable ambiguity of anusvara. Anusvara before a velar coda represents /ɔ/, which Shorto and Jenny transliterate as <å>.
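A minimal sketch of how that one case could be detected, in Python rather than the module's own language. The set of velar letters, the function name and the two output labels are assumptions for illustration; every other anusvara is simply left with the default reading, because the remaining ambiguity cannot be resolved from spelling alone.

```python
import re

ANUSVARA = "\u1036"  # MYANMAR SIGN ANUSVARA
ASAT = "\u103A"      # MYANMAR SIGN ASAT
# Assumed set of velar letters: KA, KHA, GA, GHA, NGA, MON NGA.
VELARS = "\u1000\u1001\u1002\u1003\u1004\u105A"

OPEN_O = "\u00E5"   # å
DEFAULT = "\u1E43"  # ṃ

# Anusvara immediately followed by a velar letter carrying asat.
_BEFORE_VELAR_CODA = re.compile(f"{ANUSVARA}(?=[{VELARS}]{ASAT})")


def classify_anusvara(text: str) -> list:
    """Return one label per anusvara in text, in order of occurrence."""
    labels = []
    for m in re.finditer(ANUSVARA, text):
        if _BEFORE_VELAR_CODA.match(text, m.start()):
            labels.append(OPEN_O)
        else:
            labels.append(DEFAULT)
    return labels


# KA, VIRAMA, MON BBA, ANUSVARA, MON NGA, ASAT
labels = classify_anusvara("\u1000\u1039\u105C\u1036\u105A\u103A")
```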

That brings me to the next issue. You have a test that says that က္ၜံၚ် should transliterate as "bɔṅ", but that is the transcription. Apart from the issue of anusvara before nasals, the achieved result "kṗaṃṅ" is correct by your scheme. (I'm not sold on the transliteration of the implosives - I think the clear <ḍ> is better matched by <ḅ>.) I wish there were a way of identifying the test cases, and a source. There are a lot where you've confounded transcription and transliteration. A few of the test words look misspelt. I'm contemplating adding data fields to some of the test cases, to display in the text column as labelling and possibly suppress some of the erroneous ones. It would be nice to have a table somewhere of how vowels are to be transliterated. --RichardW57m (talk) 13:14, 6 October 2022 (UTC)[reply]

We need to generalise the new rule for 'medial' (or perhaps just <h>) + any vowel + asat - test case "ဗှ်ေ = beh". Will fix later. You've left out the rule for the last complex vowel of ပ္ဍဲကဵု = pḍoakiuw - that gives about a dozen failures. Well done for the coverage; it's unfortunate that you've often used the pronunciation. I'm not sure how to fix many of the tests without 'cheating'. --RichardW57m (talk) 15:16, 6 October 2022 (UTC)[reply]

The generalisation is more complicated than this. @咽頭べさ: Are there multiple ways of writing <coh>? UTN-11 suggests that the following might be equivalent, but I don't entirely trust it: စှော် စှ်ော စောှ်. --RichardW57 (talk) 22:56, 6 October 2022 (UTC)[reply]

@RichardW57 hi, Take a look at the example below.

--𝓓𝓻.𝓘𝓷𝓽𝓸𝓫𝓮𝓼𝓪|𝒯𝒶𝓁𝓀 13:07, 7 October 2022 (UTC)

@咽頭べさ: Thanks, I've modified the code to handle two other vowels phonetically before subscripted <h>, as in ယှ်ေ and စှော်. --RichardW57 (talk) 04:15, 8 October 2022 (UTC)[reply]

@咽頭べさ: I've just realised that you encoded the last two sequences 'incorrectly'. Your sequences usually result in dashed circles. The sequences that work are:

စှ်ော: CA, MEDIAL HA, ASAT, SIGN E, SIGN AA

စောှ်: CA, SIGN E, SIGN AA, MEDIAL HA, ASAT

I hope that doesn't affect your answer. --RichardW57 (talk) 04:32, 8 October 2022 (UTC)
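For anyone wanting to check how a given spelling is actually encoded, a short Python sketch using the standard unicodedata module lists the code points of a string. The helper name is made up for this example; the first sequence above is rebuilt from escapes so that the result does not depend on how the text was pasted.

```python
import unicodedata


def code_point_names(text: str) -> list:
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# CA, MEDIAL HA, ASAT, SIGN E, SIGN AA
sequence = "\u1005\u103E\u103A\u1031\u102C"
for code_point, name in code_point_names(sequence):
    print(code_point, name)
```

Running the same helper on the other two spellings shows at once whether they differ only in the order of the signs.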

@RichardW57 Hi. Mon is still not fully supported in Unicode the way other languages are, so Mon letters display 100% correctly only with fonts edited by Myanmar technicians; they cannot be displayed 100% correctly with the current standard Unicode fonts. I would like to contact the Unicode organization to resolve this, but I have no contact there. Because of this issue, many Mon people are very reluctant to write Mon, and I myself have had to pay phone technicians a lot of money to get Mon letters to display 100% correctly on my phone. To check what I have described, please look at the Mon letters again using the Pyidaungsu font on your device. Thanks.

--𝓓𝓻.𝓘𝓷𝓽𝓸𝓫𝓮𝓼𝓪|𝒯𝒶𝓁𝓀 14:23, 8 October 2022 (UTC)

@咽頭べさ: Please:

Please keep your pictures within your message.
Please respect the indentation structure of the discussion. If the text gets too indented, you can use {{outdent}} to reduce the indentation level.

If you don't do this, it becomes too difficult to follow the discussion. I want to reply to your remarks, not spend my time keeping the discussion structured. --RichardW57 (talk) 15:52, 8 October 2022 (UTC)

@Crowley666: I've added labelling to group (b) of the tests, and corrected about a sixth of the tests that were wrong (as opposed to simply being reported as failing). There's a variable select in the /testcases module which selects the versions to show:

1 => the test as at 5 October 2022

2 => the latest versions for the corrected tests (this gives the lowest error count)

3 => both (this gives the highest error count)

I've kept to the current rules as far as I can discern them. --RichardW57 (talk) 23:08, 6 October 2022 (UTC)
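One way to picture the three settings, as a Python sketch rather than the actual /testcases code: each test keeps its original expectation and, where it was corrected, a second one, and select decides which of them are shown. The Case class, its field names and the expectations function are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    text: str
    original: str                    # expectation as at 5 October 2022
    corrected: Optional[str] = None  # latest expectation, if the test was corrected


def expectations(case: Case, select: int) -> List[str]:
    """Return the expected values to display for one test case."""
    if select == 1:
        return [case.original]
    if select == 2:
        return [case.corrected if case.corrected is not None else case.original]
    if select == 3:
        if case.corrected is None:
            return [case.original]
        return [case.original, case.corrected]
    raise ValueError("select must be 1, 2 or 3")


fixed = Case(text="example", original="old", corrected="new")
untouched = Case(text="other", original="same")
```

With select set to 3, a corrected test is listed twice, once per expectation, which is why that setting reports the most failures.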

@Crowley666: I've corrected all the faulty tests I could, and I've also added the conversion of ဵု to 'iuw'. We are now down to 13 errors, all but one of them because of the ambiguity of anusvara. If I liked the transliterations, I'd say the module was now clearly usable. --RichardW57 (talk) 04:15, 8 October 2022 (UTC)[reply]
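The point about complex vowels can be sketched as a longest-match lookup (Python, for illustration only): a two-sign vowel has to be tried before its single-sign parts. The three table values are inferred from the test case ပ္ဍဲကဵု = pḍoakiuw and from the use of u elsewhere in this thread; the code-point order of the two-sign key, the table layout and the function name are assumptions.

```python
# Assumed vowel table; keys are sequences of vowel signs.
VOWELS = {
    "\u1035\u102F": "iuw",  # VOWEL SIGN E ABOVE + VOWEL SIGN U
    "\u1032": "oa",         # VOWEL SIGN AI
    "\u102F": "u",          # VOWEL SIGN U
}

# Longest keys first, so a complex vowel wins over its parts.
_KEYS = sorted(VOWELS, key=len, reverse=True)


def transliterate_vowels(signs: str) -> str:
    """Replace known vowel-sign sequences; copy anything else unchanged."""
    out = []
    i = 0
    while i < len(signs):
        for key in _KEYS:
            if signs.startswith(key, i):
                out.append(VOWELS[key])
                i += len(key)
                break
        else:
            # No table entry starts here: pass the character through.
            out.append(signs[i])
            i += 1
    return "".join(out)


complex_vowel = transliterate_vowels("\u1035\u102F")
```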

Hello @EdwardAlexanderCrowley: There is no Old Mon alphabet in Unicode yet, so I do not think there is a need for a module for Old Mon.

You can see the Old Mon alphabet.

k (/kaˀ/)	kh (/kʰaˀ/)	g (/kɛ̤ˀ/)	gh (/kʰɛ̤ˀ/)	ṅ (/ŋɛ̤ˀ/)
c (/caˀ/)	ch (/cʰaˀ/)	j (/cɛ̤ˀ/)	jh (/cʰɛ̤ˀ/)	ñ (/ɲɛ̤ˀ/)
ṭ (/taˀ/)	ṭh (/tʰaˀ/)	ḍ (/ɗaˀ/~[daˀ])	ḍh (/tʰɛ̤ˀ/)	ṇ (/naˀ/)
t (/taˀ/)	th (/tʰaˀ/)	d (/tɛ̤ˀ/)	dh (/tʰɛ̤ˀ/)	n (/nɛ̤ˀ/)
p (/paˀ/)	ph (/pʰaˀ/)	b (/pɛ̤ˀ/)	bh (/pʰɛ̤ˀ/)	m (/mɛ̤ˀ/)
y (/jɛ̤ˀ/)	r (/rɛ̤ˀ/)	l (/lɛ̤ˀ/)	w (/wɛ̤ˀ/)	s (/saˀ/)
h (/haˀ/)	ḷ (/laˀ/)	b (/ɓaˀ/~[baˀ])	a (/ʔaˀ/)	mb (/ɓɛ̤ˀ/~[bɛ̤ˀ])

This Old Mon alphabet is called the Hanthawaddy alphabet or Thai Raman alphabet. The pronunciation of Mon words is confusing.

Listen to this တ္ၚဲ sample audio file

(Audio read)

(Audio speak)

The pronunciation of Mon vocabulary differs greatly between reading and speaking. Thanks. --Music writer Dr.Intobesa of Japanese idol NMB48 and BNK48. (talk) 07:03, 17 July 2021 (UTC)

@EdwardAlexanderCrowley I am quite skeptical whether we can really develop this kind of tool properly, since a Mon transliteration system has probably not yet been established (none is found on the ALA-LC romanization site) and the issues are quite complicated. တှ် 'breast' may be tah since it is an abbreviation of တဟ်, but there are more complicated issues.

ံ

This issue may be the most difficult one and requires the greatest caution, since there are some irregular exceptions. It is usually (a)ṁ, but exceptions include တြုံ truĥ /kraoh~krauh/ 'male', ဖျေံ phyeĥ 'down' and ဂွံ gwaʼ /kɜ̤ʔ/ (Kaw Kyaik, Burma) 'to obtain' (Jenny 2005: 176, 183, 280). Judging from them and Shorto (1962), ဍေံ /deh/ 'he, she, it' also belongs to this kind of exception.

ို

The transliteration of this combination in Literary Mon is ui (apparently the more widely used) or iu in Diffloth (1984) and Jenny (2005), but Jenny (2019) has changed it to ə, as in ဂၠုင် gləṅ 'many'. By similar logic, လီု 'to be spoiled, to go bad' is luiṁ / liuṁ / ləṁ. Besides, at least Western researchers (Haswell 1874: 7; Jenny 2005) have interpreted ဵု as an abbreviation of ိုဝ်, thus ကၠဵု 'dog' is kluiw / kliuw / kləw following their logic. I am not sure, however, what kind of transliteration system Mon people actually accept at present.

ဲ

It is generally pronounced /oa/, but Jenny transliterates it as ay, and it is also found as a coda, e.g. နာဲ nāy /na̤i/ 'Mr.', တုဲ tuy 'to finish', ပိုဲ puiy or pəy /poj/ (Burma) 'we', etc. --Eryk Kij (talk) 10:29, 17 July 2021 (UTC)

A system by Mon people is Romanization for Mon Script by Transliteration Method, or [1][2], by Dho Ong Jhaan. I think the goal is just translit, not IPA, as oral Mon is different from written Mon. EdwardAlexanderCrowley (talk) 11:54, 17 July 2021 (UTC)[reply]

These symbols cannot be resolved automatically. Even when a word is spelled the same, it may be read differently for different meanings (for example, ကၠံ reads /klɔʔ/ "garden" or /klɔm/ "hundred"). We are doing transliteration, so it surely has some limitations, the same as the Burmese system. --Octahedron80 (talk) 15:59, 18 July 2021 (UTC)

@EdwardAlexanderCrowley, Octahedron80, 咽頭べさ, エリック・キィ, Erik Kyj: I think we can allow transliteration to fail, though preferably it should be obvious where it has failed - there is normally the |tr= option available to override the errors. The anusvara ambiguity, I am afraid, will probably have to be a case of "let the user beware". For quotations and examples, we might use mark up, e.g. letters A and H (perhaps encircled, but then fairly horrid for a mobile phone), with special versions {{mnw-quote}} and {{mnw-usex}} of {{quote}} and {{usex}} to remove the mark-up. There's a similar problem for Hindi quotation, which is solved by |subst=, but that solution would hide the original spelling. It isn't actually necessary to solve this 'problem' - we can treat it the way we treat the ambiguities of English 'ch', 'g' and 'ugh'. --RichardW57m (talk) 11:07, 6 October 2022 (UTC)[reply]

I'm going to boldly go ahead and apply mark-up. That way, I can boldly start deploying the transliteration, and any changes to the transliteration scheme will automatically be deployed. Now, if Mon be used as a general communication tool, we're going to find the American alphabet embedded in text, so by default plain ASCII will not be interpreted as mark-up. I will therefore use encircled Latin letters ⒶⒽⓄⓂ by default. Ⓞ and Ⓜ are for those who want to classify every anusvara. --RichardW57 (talk) 11:27, 8 October 2022 (UTC)
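A minimal Python sketch of the display side of this idea: the marked-up text is fed to the transliterator, while the reader sees the text with the markers removed. Only the removal step is shown; where each marker sits relative to the anusvara and how the transliterator consumes it are not specified here, and the function name is an assumption.

```python
# Ⓐ Ⓗ Ⓞ Ⓜ, given as escapes so the set is unambiguous.
MARKERS = "\u24B6\u24BD\u24C4\u24C2"
_STRIP_MARKERS = str.maketrans("", "", MARKERS)


def display_text(marked: str) -> str:
    """Remove the encircled-letter mark-up; plain ASCII letters are untouched."""
    return marked.translate(_STRIP_MARKERS)


shown = display_text("ab\u24B6c\u24C2")
```

Because only the encircled forms are stripped, ordinary Latin letters embedded in Mon text survive unchanged.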

@Crowley666, Octahedron80, 咽頭べさ, エリック・キィ: I've now implemented {{mnw-quote}}; there is an example of use at ညးဗနိက်. --RichardW57 (talk) 09:09, 9 October 2022 (UTC)[reply]

(Notifying RichardW57): I've abandoned a separate {{mnw-usex}}; for usage examples, just use {{mnw-quote|isex=1}}. --RichardW57m (talk) 10:28, 20 October 2022 (UTC)

About Old Mon, it is just a font problem. If someone has a proper font, the same word will show the proper writing, so you can just spell it normally. (This is not a transliteration problem.) --Octahedron80 (talk) 15:59, 18 July 2021 (UTC)

@EdwardAlexanderCrowley: I'm busy with the NMB48 musical group, so I'm late in answering your question; I apologize. Here are the answers for the Mon vocabulary you asked about, with pronunciation.

Take a look

Mon vowel(ိ)

Burmese and Mon (ဝWa) alphabet uppercase and lowercase letters.

Mon vowel(ို): This Mon sign is not used in literature, but it is used in Mon literary scholar tests. Only Mon literary scholars can read it; others cannot. It is not taught by teachers in Mon schools, and it is a sign kept hidden for the teacher.
Mon vowel(ု)

Burmese and Mon (ုHaiycainmạ̀w) alphabet uppercase and lowercase letters.

Mon vowel(ူ)

Burmese and Mon (ူHaiycainba) alphabet uppercase and lowercase letters.

Mon vowel(ဲ)
တဟ်: a Mon term for genital organs; it is rarely used by the Mon people.
တှ်: the most commonly used term; it refers to nutrition.
တဲု
နာဲ
ပိုဲ
ဖျေံ
တြုံ
ဍေံ: non-civilized Mon vocabulary
ကၠံ or မွဲကၠံ
ကၠဵု Mon Te pronunciation
ကၠဵု Mon Ye pronunciation

ကၠဵု 카유 Mon Korea pronunciation
ဂွံ
အောန်
ဂၠိုၚ်
ဂမၠိုၚ်
လီု