Module talk:pi-Latn-translit

Moved from Module talk:pi-Latn-Deva-translit.

@Wyang, this is perfect! I'll try to implement in {{pi-alt}}. —Aryamanarora (मुझसे बात करो) 15:50, 1 January 2016 (UTC)Reply

nvm, it uses Module:pi-headword and I don't know Lua. —Aryamanarora (मुझसे बात करो) 15:54, 1 January 2016 (UTC)Reply
No problem, please see my edit in that module. If the given transliteration disagrees with the module one, it will be sent into Category:Pali terms with inconsistent transliterations. Wyang (talk) 22:58, 1 January 2016 (UTC)Reply
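The check described above can be sketched roughly as follows. This is only an illustration in Python, not the code in Module:pi-headword; the function name `inconsistency_category` is hypothetical, and the Unicode normalisation step is an assumption.

```python
import unicodedata

# Category name as quoted in the discussion.
CATEGORY = "Pali terms with inconsistent transliterations"


def inconsistency_category(given, generated):
    """Return the tracking category if the hand-given form disagrees
    with the automatically generated one, else None.

    Hypothetical helper: both strings are NFC-normalised first so that
    precomposed and decomposed diacritics compare equal.
    """
    if given is None:
        return None  # nothing was supplied by hand, so nothing to compare
    a = unicodedata.normalize("NFC", given.strip())
    b = unicodedata.normalize("NFC", generated.strip())
    return None if a == b else CATEGORY


print(inconsistency_category("bhatar", "bh\u0101tar"))
```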
That's great! Great job on such short notice. I can vouch for the Devanagari transcriptions. —Aryamanarora (मुझसे बात करो) 02:32, 2 January 2016 (UTC)Reply

Thai

@Wyang. What is the Thai transliteration based on?

E.g. in Thai proper อ is a silent consonant at the beginning of a word and ะ is a short vowel "a"; a syllable "ra" should be "ระ" (not just ร) but a standalone syllable "a" should be "อะ". --Anatoli T. (обсудить/вклад) 09:58, 2 January 2016 (UTC)Reply

Hi Anatoli, please see Thai_alphabet#Sanskrit_and_Pali. Wyang (talk) 10:01, 2 January 2016 (UTC)Reply
Ah, thanks! I see it's very different from modern Thai spelling. They shouldn't list "อะ" in a vowel table as "a", since the inherent "a" is always pronounced as in Sanskrit. --Anatoli T. (обсудить/вклад) 10:10, 2 January 2016 (UTC)Reply

Some errors

@Wyang see bhātar and an-. Thanks! —Aryamanarora (मुझसे बात करो) 16:51, 31 January 2016 (UTC)Reply

Cheers, good catch! I'm not 100% sure about the right solution. I have used the equivalents of virāma for now. Wyang (talk) 21:55, 31 January 2016 (UTC)Reply
Thanks! That seems to be an ideal solution. —Aryamanarora (मुझसे बात करो) 23:35, 31 January 2016 (UTC)Reply

Bengali

Bengali script form is breaking at kattar. I think it's because of the nuqta? —Aryaman (मुझसे बात करो) 00:00, 5 October 2017 (UTC)Reply

@Aryamanarora Thanks, it's fixed now. Wyang (talk) 00:30, 5 October 2017 (UTC)Reply

Brahmi

@Wyang, it seems that the tables are showing Brahmi label but no Brahmi script. Can this be fixed? —*i̯óh₁nC[5] 07:46, 4 December 2017 (UTC)Reply

@JohnC5 Ya, not a problem. Fixed. Wyang (talk) 11:23, 4 December 2017 (UTC)Reply
@Wyang: Thank you. For my reference, what was the actual issue and can we get the Brahmi translit working? —*i̯óh₁nC[5] 09:47, 5 December 2017 (UTC)Reply
@JohnC5 Ah, the Pali language is said to have "Brahmi" as one of its scripts in Module:languages/data2, but this module currently has no support for the Brahmi script when it inherits the list of scripts used to write Pali from Module:pi-headword. To fix it we have to add the Brahmi equivalents of the Indic characters to this module. Wyang (talk) 14:36, 5 December 2017 (UTC)Reply
I just added Brahmi a few days ago, dudes! --Octahedron80 (talk) 02:20, 6 December 2017 (UTC)Reply
@Wyang: I think I did it? —*i̯óh₁nC[5] 01:45, 6 December 2017 (UTC)Reply
@JohnC5: From my null edits of kamma and ca, you did fine so far. --Lo Ximiendo (talk) 01:51, 6 December 2017 (UTC)Reply
@Lo Ximiendo: Thanks! To be fair, I was curious whether they are correct, because I effectively just copy-pasted from the Unicode table. —*i̯óh₁nC[5] 02:00, 6 December 2017 (UTC)Reply

Bengali 2

Aryaman Hi, is the Bengali version a modified version or traditional? Because traditionally Bengalis didn't use ৱ for va, they used ব for both. ৱ (wo) is an Assamese letter and it was derived from its earlier version: র (wo/va) which is still used in Mithilakshar. Sagir Ahmed Msa (talk) 14:00, 19 February 2018 (UTC)Reply

@Sagir Ahmed Msa: I am not sure, @Octahedron80 added it. It is probably meant to be traditional, I can change it. —AryamanA (मुझसे बात करेंयोगदान) 14:55, 19 February 2018 (UTC)Reply
Pali uses both "ba" and "va", which distinguish words of different meaning, but common Bengali has only ব for ba, so we need another symbol. I chose ৱ from Assamese for va because it is the only one left. (র already belongs to "ra".) Please note that this module is NOT for the Bengali/Assamese language, so their rules can't be used here. --Octahedron80 (talk) 02:11, 20 February 2018 (UTC)Reply
@Octahedron80 Does this module follow Pali's rule? I mean these scripts are here because they are traditionally used to write Pali. So in the traditional Pali manuscripts, written in Bengali script, which letters were used for r, b and v? Sagir Ahmed Msa (talk) 04:38, 20 February 2018 (UTC)Reply
I don't know either. But if anyone has evidence, changing the letter is easy. --Octahedron80 (talk) 19:35, 20 February 2018 (UTC)Reply

@Octahedron80 I think that, just like Sanskrit, Pali also didn't use র for ra and ৱ for va. Is there any evidence? Even ৱ is a variant of র in Bengali. It's common to see ৱ used as ra in Bengali and other languages of Bengal. I think the Pali module should follow the Sanskrit module for the Bengali alphabet, where ব is used for both ba and va. I have seen a lot of writings from Bengal where ব is used for both b and v. Unfortunately I can't identify the languages. I've never seen writings from Bengal where ব is ba and ৱ is va. Msasag (talk) 16:59, 22 November 2018 (UTC)Reply

What we need is evidence of Bengali script usage for Pali - preferably sources fit for use in quotations. It seems that at one time the Bengali script used the 'vva' conjunct ব্ব for va in Sanskrit. At https://www.nirvanapeace.com/buddhism-philosophy/buddhist-philosophy/121, we can find ব U+09AC for ba and ৰ U+09F0 for va! The example is বহুং ৰে সরণং যন্তি, পব্বতানি ৰনানি চ = "bahuṃ ve saraṇaṃ yanti, pabbatāni vanāni ca". RichardW57 (talk) 00:11, 23 November 2018 (UTC)Reply

@RichardW57 Hi, the ৱ for va is not used in any traditional texts in the Bengal region. ব is used for both ba and va in Sanskrit, Bengali etc. Adding ৱ from Assamese is neologism. Please add sources that it is not and that ৱ for va is traditionally used in Bengal. Msasag (talk) 11:09, 26 December 2018 (UTC)Reply

@Msasag: What traditional usage is there of Eastern Nagari for Pali?

I'm not confident that Pali and Sanskrit writing usage coincide. For example, I have seen a claim that to use niggahita for homorganic nasals before stops in Pali is simply wrong; whereas that is the standard Indian Devanagari practice for Sanskrit. (European Devanagari for Sanskrit uses the nasal consonant instead.)

The Thai script spelling we use for Pali uses a character (namely, phinthu) that was added in the 19th century. What do we have for modern Pali use in the Eastern Nagari script? All I could dig out from the Internet was U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL for va; my best guess is that should be interpreted as a substitute for the character formerly known as BENGALI LETTER VA WITH LOWER DIAGONAL because too many fonts and code charts show it with a dot rather than an extra stroke. The scanty evidence I have seen says that ba and va are different in modern Eastern Nagari script usage. Vidyābhūṣaṇ (1898) says that U+09F1 is acceptable for writing the sound [v] or [w] in English loanwords, so the letter is not such a stranger to Bengali usage.

Modern usage in books should trump traditional usage when it dominates it for Pali. However, I believe we need to allow for systematic alternatives in spelling; I intend to automate alternatives for the Tai Tham script. I expect at least a few alternatives will be needed for the Sinhalese script. (I have already automated Thai script alternatives in the conjugation of the middle voice of the verb.) --RichardW57 (talk) 16:43, 26 December 2018 (UTC)Reply

@RichardW57 But U+09F1 was never used in Bengali for v/wa of English loanwords, ওয় is used instead. Like ওয়াশিংটন for 'Washington'. And ৱ is used for 'ra' instead. You can search আমাৱ and see a lot of results of Bengali texts using ৱ as 'ro'. In many fonts U+09B0 has a line instead of dot and in handwritings as well. And similarly in other variants of Eastern Nagari like Assamese, Mithilakshar, ৱ and র are variants of each other for wa/va sound. In Charyapada র is used for va (and ব for ba and ra. But sometimes ৰ for ra). Also in Bengali both র and ৰ were used for ra. And ব for ba and va is still used (check the Bengali alphabet, including different variants of some characters used in this 1778 book: https://books.google.co.in/books/about/A_Grammar_of_the_Bengal_Language.html?id=bttGAAAAcAAJ&printsec=frontcover&source=kp_read_button&redir_esc=y ). So it's better to use the traditional ব or the modern ওয়/ওঅ. Msasag (talk) 19:12, 27 December 2018 (UTC)Reply

@Msasag Your arguments make a lot of sense for devising a transliteration to use on bn.wiktionary.org if there is no standard scheme for writing Pali in the script or transliterating it to the script. However, our primary aim here is to record the spelling used for Pali texts when written in the Eastern Nagari script. The scanty evidence I have seen for usage in the script shows that va is distinguished from ba and ra. What evidence have you on the actual usage for Pali in the Eastern Nagari script?

I tried to expand my knowledge by fishing on Google with various spellings of the exact phrase "evaṃ me sutaṃ":

  • "এৱং মে সুতং" (using U+09F1 BENGALI LETTER RA WITH LOWER DIAGONAL) got 0 hits!
  • "এবং মে সুতং" (using U+09AC BENGALI LETTER BA) got 10 hits.
  • "এৰং মে সুতং" (using U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL) got 20 hits.
  • "এরং মে সুতং" (using U+09B0 BENGALI LETTER RA) got 0 hits.
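For anyone wanting to repeat the search, the four candidate spellings can be generated mechanically. The following is a minimal Python sketch; the names `V_CANDIDATES` and `evam_me_sutam_variants` are made up for illustration and are not part of any module.

```python
# Candidate Bengali-script letters for Pali <v>, labelled by code point.
V_CANDIDATES = {
    "U+09F1": "\u09f1",  # BENGALI LETTER RA WITH LOWER DIAGONAL
    "U+09AC": "\u09ac",  # BENGALI LETTER BA
    "U+09F0": "\u09f0",  # BENGALI LETTER RA WITH MIDDLE DIAGONAL
    "U+09B0": "\u09b0",  # BENGALI LETTER RA
}


def evam_me_sutam_variants():
    """Build "evaṃ me sutaṃ" once per candidate letter for <v>."""
    # E + <v> + anusvara, then "me", then "sutaṃ"
    template = "\u098f{v}\u0982 \u09ae\u09c7 \u09b8\u09c1\u09a4\u0982"
    return {label: template.format(v=letter)
            for label, letter in V_CANDIDATES.items()}


for label, phrase in evam_me_sutam_variants().items():
    print(label, '"' + phrase + '"')
```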

There was nothing in Google books, alas.

So, there is a case for *adding* the spelling with U+09AC as an alternative. I'm not sure how strong this is; the lack of a source for quotations is a handicap. @AryamanA @Wyang.

Should we take this to the Beer Parlour? --RichardW57 (talk) 23:02, 27 December 2018 (UTC)Reply

@Octahedron80: I've been watching the ongoing vandalism with horror. However, if we're going to use ৰ U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL for 'v', I think the form before other consonants should be not <U+09F0, U+200C, U+09CD>, but <U+09F0, U+09CD, U+200C>. Test case: 'vyi', which one would naively encode as <U+09F0, U+200C, U+09CD, U+09AF, U+09BF>, yields:

1) <U+09F0, U+09CD, U+09AF, U+09BF> is  ৰ্যি - glyphs: i, 'v', modified ya
2) <U+09F0, U+200C, U+09CD, U+09AF, U+09BF> is ৰ‌্যি - glyphs: i, 'v', modified ya
3) <U+09F0, U+09CD, U+200D, U+09AF, U+09BF> is ৰ্‍যি - glyphs: i, 'v', virama, modified ya
4) <U+09F0, U+09CD, U+200C, U+09AF, U+09BF> is ৰ্‌যি - glyphs: 'v', virama, i, ya

Test case: 'vvi', which one would naively encode as <U+09F0, U+200C, U+09CD, U+09F0, U+09BF>, yields:

5) <U+09F0, U+09CD, U+09F0, U+09BF> is ৰ্ৰি - glyphs 'v', virama, i, 'v'
6) <U+09F0, U+200C, U+09CD, U+09F0, U+09BF> is ৰ‌্ৰি - glyphs 'v', virama, i, 'v'
7) <U+09F0, U+09CD, U+200D, U+09F0, U+09BF> is ৰ্‍ৰি - glyphs i, 'v', virama, 'v' 
8) <U+09F0, U+09CD, U+200C, U+09F0, U+09BF> is ৰ্‌ৰি - glyphs 'v', virama, i, 'v'

The combinations 'vy' and 'vv' are the only clusters I can think of that begin with 'v'.

Do you see something different? I see no repha! --RichardW57 (talk) 15:49, 31 May 2020 (UTC)Reply

@Octahedron80: I forgot the cluster 'vh' (chiefly found in the second person plural of the middle). So, testing for the very real cluster 'vhe':
9) <U+09F0, U+09CD, U+09B9, U+09C7> is ৰ্হে - glyphs: 'v', virama, e, h
10) <U+09F0, U+200C, U+09CD, U+09B9, U+09C7> is ৰ‌্হে - glyphs: 'v', virama, e, h
11) <U+09F0, U+09CD, U+200D, U+09B9, U+09C7> is ৰ্‍হে - glyphs: e, 'v', virama, h
12) <U+09F0, U+09CD, U+200C, U+09B9, U+09C7> is ৰ্‌হে - glyphs: 'v', virama, e, ha

Again, no sign of repha! It may be worth noting that the renderer is HarfBuzz - I wouldn't guarantee all systems to be the same. I'm not sure of the font - the system seems to think it can fake a half-form by sticking a virama in, which is why the vowel ends up on the left in sequences 3, 7 and 11. In sequences 1 and 2, the special form of 'y' (ya phalaa?) indicates a non-initial consonant, so the vowel can be moved to the front. Otherwise, no form of consonant combination can be achieved (sometimes due to ZWNJ), so we get two orthographic syllables (as fallback in 5 and 9, by command in the others), so the vowel moves to after the 'v'. --RichardW57 (talk) 17:30, 31 May 2020 (UTC)Reply

I found the problem last night with "vyañjana". It must be rendered as ৰ‌্যঞ্জন instead of the incorrect ৰ্যঞ্জন, so I solved it (in the same way ব্য bya appears). I used the method described in w:ZWNJ at Wikipedia. Yes, ZWNJ must be put between ra/va and the following virama; I tried all methods. In reply to RichardW57: I see reph on 1, 5, and 9 without modification. 2, 6 and 10 are the preferred results. This is because ৰ was originally designated for displaying Assamese ra, so it becomes reph when followed by virama. --Octahedron80 (talk) 00:48, 1 June 2020 (UTC)Reply
In case you want to see what I see, I have a screenshot [1] I use default system fonts which Windows provides. --Octahedron80 (talk) 01:11, 1 June 2020 (UTC)Reply
@Octahedron80: Your rendering system is indeed generating a repha - I'm not getting that on Firefox under Ubuntu. Now when I type <U+09B0 RA, U+09CD, U+09AF, U+09BF>, I do get র্যি with a repha. So, how much better is the behaviour using the standard combination <U+200D, U+09CD> as defined in http://www.unicode.org/versions/Unicode13.0.0/ch12.pdf p478?
The test sequences and what I get on Ubuntu are:
13) <U+09F0, U+200D, U+09CD, U+09AF, U+09BF> is ৰ‍্যি. Glyphs are i, 'v', ya phalaa, as desired
14) <U+09F0, U+200D, U+09CD, U+09F0, U+09BF> is ৰ‍্ৰি. Glyphs are 'v', virama, i, 'v'.
15) <U+09F0, U+200D, U+09CD, U+09B9, U+09C7> is ৰ‍্হে. Glyphs are 'v', virama, e, h
The renderings you get are what I can get on IE11 using Nirmala UI. The repha formation seems to be a feature of that font. When I use a HarfBuzz renderer on the same font, I get almost the same shape except for numbers 3, 7 and 11 - HarfBuzz puts the vowel between the consonants.

Now for 13 to 15, with Nirmala UI and either IE11 or the HarfBuzz renderer, I get:

13: i, 'v', ya phalaa, as desired.
14: i, 'vv' conjunct - what we want, but this is a very rare and strange sequence.
15: 'v', virama, e, h. I'd rather we had a conjunct, but if it's not available, this is an acceptable fallback.

Overall, the Unicode connections sequence <U+200D, U+09CD> seems to give the best results if we have to use U+09F0. --RichardW57 (talk) 02:41, 1 June 2020 (UTC)Reply

OK then, we will use 200D (ZWJ) instead of 200C (ZWNJ) as the Unicode suggests. --Octahedron80 (talk) 06:24, 1 June 2020 (UTC)Reply
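The agreed encoding for clusters beginning with 'v' can be written down as a small Python sketch. It only reproduces test sequences 13 to 15 from the discussion; the helper names `v_cluster` and `code_points` are hypothetical and this is not the module's actual code.

```python
V = "\u09f0"       # BENGALI LETTER RA WITH MIDDLE DIAGONAL, used here for Pali <v>
ZWJ = "\u200d"     # ZERO WIDTH JOINER
VIRAMA = "\u09cd"  # BENGALI SIGN VIRAMA


def v_cluster(rest):
    """Encode <v> before another consonant as <U+09F0, U+200D, U+09CD>,
    followed by the rest of the syllable (consonant plus vowel sign)."""
    return V + ZWJ + VIRAMA + rest


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in text]


# 'vyi', 'vvi' and 'vhe' as in sequences 13, 14 and 15
for rest in ("\u09af\u09bf", "\u09f0\u09bf", "\u09b9\u09c7"):
    print(", ".join(code_points(v_cluster(rest))))
```

How each sequence is drawn still depends on the font and renderer, as the results reported above show.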

2021 Update

@Octahedron80, AryamanA: We've just had a discussion on Template talk:pi-alt about the Bengali script equivalent of <v> when writing Pali. We now have a book ambiguously using U+09AC BENGALI LETTER BA and a book using U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL as this module already does - see the other discussion page for links. I intend to check what can be extracted from the online presence of the first book; at first glance its spelling looked a little inconsistent. For now, I'd say no action is needed, but we may find we will benefit from generating an extra transliteration. There is, I think, no action needed for transliteration to Latin; we certainly cannot distinguish initial <b> and <v> if the writing system does not. --RichardW57 (talk) 18:49, 17 May 2021 (UTC)Reply

Rajbansi

AryamanA Hi, if Rajbansi includes Koch Rajbongsi (Assam), Rangpuri (Bangladesh) and Kamatapuri (West Bengal) then it needs a similar template like this, with 3 alphabets. Koch Rajbongsi alphabet is similar to Assamese alphabet, means it has (ro) and (wo); Rangpuri uses a Bengali based alphabet, means it has (ro) and (ó) for "r" and "w" respectively. Sagir Ahmed Msa (talk) 14:08, 19 February 2018 (UTC)Reply

@Sagir Ahmed Msa: We only have a few Rajbanshi words right now, but I agree that would be necessary if we want to expand it. —AryamanA (मुझसे बात करेंयोगदान) 02:15, 20 February 2018 (UTC)Reply
@AryamanA: Its phonology looks very different from that of Kamata/Koch Rajbongsi/Rangpuri. There was a category for Kamatapuri, I can't find it. -- Sagir

Tai Tham Consonants

There are some problems similar to the one solved by replacing ᩈ᩠ᩈ <HIGH SA, SAKOT, HIGH SA> by <GREAT SA>. In Pali, the subscript forms of RA, LA, LOW PA (Pali <b>) and HIGH RATHA (Pali <ṭh>) are not formed by prefixing SAKOT. Instead, special subscript consonants, MEDIAL RA, MEDIAL LA and SIGN LOW PA OR HIGH RATHA, are used. Thus, instead of ᨠ᩠ᩁ ᩃ᩠ᩃ ᨻ᩠ᨻ ᨭ᩠ᨮ, we have ᨠᩕ ᩃᩖ ᨻᩛ ᨭᩛ. (<SAKOT, LA> may be used occasionally to represent <al>.)

The general issues of subscript <ṭh>, <b>, <r> and <l> were addressed on 29-30 September 2018 by Octahedron80. RichardW57 (talk) 09:00, 30 September 2018 (UTC)Reply

It is safest to restrict the use of SIGN LOW PA OR HIGH RATHA to the combinations ṭṭh ṇṭh bb mb.
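One way to express that restriction is a pair of context-sensitive substitutions. The Python sketch below is only an illustration under stated assumptions: it takes U+1A5B to be the sign in question, U+1A2D and U+1A31 to be Pali <ṭ> and <ṇ>, and U+1A3E to be <m>; the function name `apply_low_pa_high_ratha_sign` is hypothetical.

```python
import re

SAKOT = "\u1a60"
SIGN = "\u1a5b"        # consonant sign shared by HIGH RATHA and LOW PA (assumed)
RATA = "\u1a2d"        # Pali <ṭ>
HIGH_RATHA = "\u1a2e"  # Pali <ṭh>
RANA = "\u1a31"        # Pali <ṇ>
LOW_PA = "\u1a3b"      # Pali <b>
MA = "\u1a3e"          # Pali <m>


def apply_low_pa_high_ratha_sign(text):
    """Replace <SAKOT, consonant> by the special sign, but only in the
    four clusters ṭṭh, ṇṭh, bb and mb; other stacks are left alone."""
    text = re.sub("([%s%s])%s%s" % (RATA, RANA, SAKOT, HIGH_RATHA),
                  r"\1" + SIGN, text)
    text = re.sub("([%s%s])%s%s" % (LOW_PA, MA, SAKOT, LOW_PA),
                  r"\1" + SIGN, text)
    return text


# bb written with SAKOT becomes LOW PA + sign
print(apply_low_pa_high_ratha_sign(LOW_PA + SAKOT + LOW_PA) == LOW_PA + SIGN)
```

Keeping the left-hand context explicit means that a subscript <b> or <ṭh> under any other consonant still uses SAKOT.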

Pali <p> is represented by U+1A37 BA in the west (Northern Thailand and the Shan States), but U+1A38 PA in Laos and Northeast Thailand. Thus the Tai Tham alternative form for pāpa is wrong for someone who uses Tai Tham in a Northern Thai context. -- RichardW57 (talk) 21:04, 5 September 2018 (UTC)Reply

Default was switched to U+1A37 BA on 29-30 September 2018 by Octahedron80. RichardW57 (talk) 09:00, 30 September 2018 (UTC)Reply
We also have two alternatives for consonant stems - one with U+1A7A TAI THAM SIGN RA HAAM and one with U+1A7C TAI THAM SIGN KHUEN-LUE KARAN. - RichardW57 (talk) 20:07, 17 September 2018 (UTC)Reply
I added conditions for the forms mainly used by Northern Thai. For other variants, please add them manually to Lana2, Lana3, Lana4. --Octahedron80 (talk) 23:59, 29 September 2018 (UTC)Reply
@Octahedron80, that should be a temporary measure for predictable variations. I'm trying to amass evidence of what these variations are. Once we have that evidence, I believe {{pi-alt}} should generate them automatically. Or will you write a bot to report inconsistencies in what is passed to {{pi-alt}}? The other sane alternatives are to:
  1. Generate templates for each cross-script word with variants. I don't think @AryamanA would approve, as each macro might not ever be used in much more than a dozen words (9 current scripts, Lao with Pali-support consonants restored, and just possibly the spelling found in Thai chanting books).
  2. Consolidate the supplementary forms in a data module indexed by primary Roman script form. Perhaps this should be accessed by something like 'pi-alt-extra', so that adding an extra form does not invalidate the cache of every page for a Pali word.
Should we take this discussion to the Grease Pit? - RichardW57 (talk) 00:39, 30 September 2018 (UTC)Reply
@RichardW57: The sanest alternative is having a way to provide word variants by hand within {{pi-alt}}, since there is so little duplication. The alternative script pages are just glorified redirects and a bot could be used to make them and handle their {{pi-alt}} templates. I think having all these modules, as you have been suggesting and implementing for a while now, is overkill, and also it's neither editor- nor user-friendly (how would a new editor understand how to add forms in the data module? what if a user wants to see where the data is stored?). I don't edit much Pali anymore so I have taken a laissez-faire approach, but if I was more involved with Pali I would have been a bit more vocal about my disapproval. I don't think this is the right direction to go with Pali entries. —AryamanA (मुझसे बात करेंयोगदान) 03:28, 30 September 2018 (UTC)Reply
@AryamanA: The automation of {{pi-alt}} is the first step towards making a bot for automatically creating alternative script pages. However, the data on what alternative script pages to create would have to be stored somewhere. Where? A big step was the automation of {{pi-alt}}, but I think it needs to handle automatic variation *within* scripts. This variation is mostly between writing systems. (Incidentally, is the writing system of the old PTS dictionary considered too rare to fully accommodate?) However, not all variation is of this nature. We also have variable Sanskritisation (or apparent Sanskritisation), as in {{l|pi|tatra}} v. tattha and {{l|pi|ti}} v. {{l|pi|tri}}. There is also local variation. Francis Mason reports in his grammar that the Burmese used -smi, -smiṃ, -smī and -smīṃ indiscriminately as the locative singular ending. The last of them is a fairly fundamental breach of the laws of Pali phonology! I think fully grammatical tasmiṃ deserves a lemma-like entry of its own, but how far those Burmese variants deserve propagation (as opposed to cross-referencing) across scripts is another matter. A few words can be supported manually - though expect groans when Pali-capable Lao is added. (The addition to Unicode of the consonants restored or invented by the Buddhist Institute in the 1930's has started.)
As to how to use {{pi-alt}} and any associated data module, that is what documentation pages are for. ({{pi-alt/documentation}} does need some improvement, but the associated error-checking helps a lot.) Now, a template like pi-alt-pāpa to generate the alternative forms of pāpa (or should that be the less clear pi-alt-paapa?) should be fairly clear to the user. At present, to understand {{pi-alt}} one has to drill down to the module. A bot that made the invocations consistent could lead to edit wars, as has happened on Wikipedia with Thai minority languages and bots that enforce the WTT-type rules of Thai orthography. In this case, a bot should make suggestions, not changes. Otherwise, trouble could arise when the deletion of an alternative form is requested, as I must get round to for a Devanagari Pali entry with -ghgh-. RichardW57 (talk) 07:30, 30 September 2018 (UTC)Reply

Transliterating dhy

@Octahedron80, @AryamanA: The digraph in 'dh' in the sequence 'dhy' is not recognised. This was shown in the cross-script declension test Module:pi-decl/noun/testcases/documentation for bodhi, which has oblique singular bodhyā if it follows the paradigms. The oblique singulars of बोधि and โพธิ are showing up as बोद्ह्या and โพทฺหฺยา in the 'expected' column, which shows a transliteration of the Roman script case form. - RichardW57 (talk) 20:11, 29 September 2018 (UTC)Reply

Well, I can't find bodhyā, but there are several words with 'dhy'. The commonest seems to be osadhyo, which occurs as the unassimilated syncopated vocative plural of osadhī 'medicinal plant' in:

  • tiṇalatā ca osadhyo pabbatāni vanāni ca । ammaṃ ārogyaṃ vajjātha: ayaṃ no neti brāhmaṇo
    Grass and creepers, plants, mountains, and woods, please wish our mummy well. This brahmin is taking us away.

in Verse 519 of the Vessantara Jataka (Verse 2174 of the Mahanipato). Thus, we have a bug to be fixed. - RichardW57 (talk) 00:00, 30 September 2018 (UTC)Reply

This module was not originally made by me. I think the problem is in the Latin alphabet parsing, but I cannot spot where. I also have another version at th.wikt, converting Thai to other scripts, that does not need to examine digraphs. I think @Wyang can help in this case. --Octahedron80 (talk) 08:10, 1 October 2018 (UTC)Reply
@Octahedron80:: Look for 'h$' in the code. I'm inclined to greedily convert the onset processing digraphs first, and then stick viramas, stackers or pure killers in between the consonants. However, I'm inclined to set up a decent set of testcases first, to catch the nasties. The longest stack I can find is 'brhm', which looks like a scribal abbreviation for 'brahm' to me - that would just about work in the Burmese script.
Is this proper Pali? I was under the impression that dhy should assimilate to jh word-initially and jjh medially, no? Wyang (talk) 09:21, 1 October 2018 (UTC)Reply
It's valid enough for Wiktionary, which is what matters, and appears to be in the canon. Old Indic *dhy indeed assimilates as you describe. However, what we have here seems to be syncopation - '-iy-' reduces to 'y', inconsistently followed by another round of assimilation. Thus, for the oblique singulars of jāti and nadī, Duroiselle records jātiyā/jatyā/jaccā and nadiyā/nadyā/najjā. Now, he doesn't have plural nadyo, but he does have plural jatyo. Another source of odd clusters is sandhi - that restores clusters in -Cv- that one would have expected to have been assimilated away. RichardW57 (talk) 09:53, 1 October 2018 (UTC)Reply
Ok thanks. Feel free to add this into the module; if you would like me to help please let me know. I think a way to do this could be splitting the onset on line 332 into onset and glide, but it would depend on what similar cases there are. Wyang (talk) 10:47, 1 October 2018 (UTC)Reply
@Wyang:: That dependency is why I'd go for brute force:
  1. Replace '.h' by corresponding single non-Roman letter. 'h' is not the first element of a digraph, so OK.
  2. Replace '.' by corresponding non-Roman letter.
  3. Replace '<letter><letter>' by <letter><script viramoid><letter>
  4. Repeat to handle sequences of 3 or more letters.
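The steps above can be sketched in Python for a small Devanagari subset. This is an illustration under assumptions, not the module's code: the tables `CONSONANTS` and `VOWELS` are deliberately incomplete, the function name `to_devanagari` is made up, steps 3 and 4 are folded into one left-to-right pass, and vowel signs and a word-final virama are handled in the same pass.

```python
import re
import unicodedata

VIRAMA = "\u094d"

# Deliberately incomplete tables, for illustration only.
CONSONANTS = {
    "k": "\u0915", "t": "\u0924", "d": "\u0926", "dh": "\u0927",
    "b": "\u092c", "bh": "\u092d", "m": "\u092e", "y": "\u092f",
    "r": "\u0930", "s": "\u0938", "h": "\u0939",
}
# vowel -> (independent letter, dependent sign); inherent 'a' has no sign
VOWELS = {
    "a": ("\u0905", ""), "\u0101": ("\u0906", "\u093e"),
    "i": ("\u0907", "\u093f"), "u": ("\u0909", "\u0941"),
    "e": ("\u090f", "\u0947"), "o": ("\u0913", "\u094b"),
}

# Longest keys first, so the digraph 'dh' wins over 'd' followed by 'h'.
_KEYS = sorted(list(CONSONANTS) + list(VOWELS), key=len, reverse=True)
TOKEN = re.compile("|".join(re.escape(k) for k in _KEYS))


def to_devanagari(roman):
    """Greedy digraph-first conversion; a virama goes between adjacent
    consonants and after a word-final consonant. Characters missing
    from the tables are silently skipped in this sketch."""
    out = []
    after_consonant = False
    for tok in TOKEN.findall(unicodedata.normalize("NFC", roman)):
        if tok in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)
            out.append(CONSONANTS[tok])
            after_consonant = True
        else:
            independent, sign = VOWELS[tok]
            out.append(sign if after_consonant else independent)
            after_consonant = False
    if after_consonant:
        out.append(VIRAMA)
    return "".join(out)


print(to_devanagari("bodhy\u0101"))
```

Because the digraph is consumed before single letters are considered, 'dhy' comes out as one aspirate plus virama plus 'y', rather than the three-consonant stack reported in the bug.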

Fancy stuff like 'medials', glide handling, converting homorganic nasals to niggahita, and phonetic syllables goes in the script-specific bits. The Thai script has complications with words like 'guyho' - majority คุโยฺห in usage, but majority คุยฺโห in mentions - if we stick to forms with phinthu, and I'm not sure you want to call Pali <h> a 'glide'. RichardW57 (talk) 12:34, 1 October 2018 (UTC)Reply

Solution now implemented as described. Detected issues now limited to round AA v. tall AA, which is discussed elsewhere.

Burmese Pali vo

This should be transliterated as ဝေါ, but is being transliterated as ဝော, which truly looks like 'te' in my browser. - RichardW57 (talk) 20:11, 29 September 2018 (UTC)Reply

I think it is now fixed. --Octahedron80 (talk) 07:10, 1 October 2018 (UTC)Reply

Pali "brahmañña" and "ဗြာဟ္မဏ" - brahmin

@Octahedron80, Wyang, AryamanA, RichardW57: Khmer and Burmese sources for Khmer ព្រហ្មញ្ញ (prummañ, brahmin) and Burmese ပုဏ္ဏား (punna:, brahmin) use Pali etymologies but in different scripts Pali brahmañña (brahmin) for Khmer and Pali ဗြာဟ္မဏ (brāhmaṇa, brahmin) for Burmese, which is their usual convention. It should be the same Pali word. Is there an inline template call for converting Pali to different scripts or back to Roman? --Anatoli T. (обсудить/вклад) 08:07, 1 October 2018 (UTC)Reply

AFAIK, Pali in Myanmar script does not use 'lunar ra', but uses 'subscript ra' instead. 'brahmañña' should become ဗ္ရဟ္မည (brahmañña). (also ဉ္ဉ > ည) By the way, ñ (ဉ) and ṇ (ဏ) are not the same letter; Burmese ပုဏ္ဏား should come from ဗ္ရာဟ္မဏ (brāhmaṇa). --Octahedron80 (talk) 08:13, 1 October 2018 (UTC)Reply
@Octahedron80: Thanks but please don't remove sourced etymologies. If the spelling is incorrect or unknown, you can remove it like this {{bor|my|pi|}}. --Anatoli T. (обсудить/вклад) 08:31, 1 October 2018 (UTC)Reply
I checked source that there is 'brāhmaṇa' in both Pali/Sanskrit. I put this instead. --Octahedron80 (talk) 08:42, 1 October 2018 (UTC)Reply
@Octahedron80: Thank you! --Anatoli T. (обсудить/вклад) 08:55, 1 October 2018 (UTC)Reply
@Octahedron80: Actually, do you know if brahmañña is a valid Pali word? Is it the same as brāhmaṇa? --Anatoli T. (обсудить/вклад) 08:57, 1 October 2018 (UTC)Reply
Yes. I found it moments ago. [2] And no, you must notice that bra- (his god) and brā- (his follower) are actually not the same word. Be careful. --Octahedron80 (talk) 09:02, 1 October 2018 (UTC)Reply
@Octahedron80:,@Atitarev:: brahmañña is also given in A.P. Buddhadatta Mahathera's 'Concise Pali-English Dictionary', at least as it appears at https://www.budsas.org/ebud/dict-pe/dictpe-19-b.htm. (Note that the text is a bit broken - 'ntilde' appears instead of the letter 'ñ'.) It looks a regular derivative of the forms with 'ṇ'. RichardW57 (talk) 09:26, 1 October 2018 (UTC)Reply
We need to check on Burmese practice on which of subscript and lunar (a.k.a. medial) RA are used in Pali. While www.tipitaka.org uses <VIRAMA, RA> and doesn't appear to be using the Zawgyi encoding, Mason's grammar, and Mazard's revision thereof, both use medial RA in their spelling of the root noun, which I suppose is Pali brahmaṇ. The proposal adding MEDIAL RA leaves the impression that subscript RA is only used in Sanskrit, in imitation of some very old inscriptions. RichardW57 (talk) 09:26, 1 October 2018 (UTC)Reply
Which default should we use? I guess it's going to be lunar ra? --Octahedron80 (talk) 02:13, 2 October 2018 (UTC)Reply
There's no template to wrap this module, which is for converting Pali from Roman to non-Roman. We ought to get round to writing a non-Roman to Roman module for use in the standard transliteration scheme - at present the Romanisation module uses other languages' Romanisations, and only covers a few scripts. Obviously the Romanisation for Khmer in Khmer script will be no use for Pali. RichardW57 (talk) 09:26, 1 October 2018 (UTC)Reply

Tai Tham vs Myanmar

Look at this one: [3] @ "9 Attributes of the Buddha". It shows that <da> + <vowel aa> do not behave the same way in Tai Tham and Myanmar; Myanmar uses tall aa ဒါ but Tai Tham uses normal aa ᨴᩣ. Northern Thai also does not use tall aa in this case. The reason is that Tai Tham <a> ᩋ (compare ᨴᩣ) does not look like Myanmar <a> အ (compare ဒာ). (As I said recently, the aa form will be decided by the top of the consonant stack.) So gyadā in the testcases must be ᨣ᩠ᨿᨴᩣ instead of ᨣ᩠ᨿᨴᩤ.

According to the page, Tai Tham uses ᨸ (with tail) instead of ᨷ against Myanmar ပ. Just to ensure that Pali use both ᨷ/ᨸ in writings.

Maybe not related, but I have this book [4] to check Northern Thai vocabulary. It contains a few errors but is usable. --Octahedron80 (talk) 02:58, 2 October 2018 (UTC)Reply

Take a look at the N3207 source of the quotation for the common noun ᨻᩩᨴ᩠ᨵ. The quote comes from the top of the lower leaf in Figure 9a. (If my directions are not clear enough, you'll find it most easily by looking for ᨶ᩠ᨶᩴ at the start of a line.) You'll see the syllable ᨴᩤ as well as the syllable ᨴ᩠ᨵᩤ. RichardW57 (talk) 09:42, 2 October 2018 (UTC)Reply
I looked through every entry in my source and found sometimes ᨴᩣ and sometimes ᨴᩤ, which confused me even more. So I would say that they are variants of each other. --Octahedron80 (talk) 00:21, 3 October 2018 (UTC)Reply
@Octahedron80:: There are several possibilities as to what is going on. It seems that some people who would always write ᩅᩤ will nevertheless write ᩅ᩵ᩣ. However, subtle rules like that are irrelevant for Pali - Pali doesn't have tone marks. RichardW57 (talk) 23:52, 9 October 2018 (UTC)Reply
Tall AA seems not to occur if there is a rising subscript between the base consonant and the vowel. The reason seems obvious - the rising subscript stops the base consonant and vowel joining (or seeming to be joined), so tall AA is not needed. In the MFL, MEDIAL RA also usually stops tall AA, but not always - the spelling in the MFL is not consistent. RichardW57 (talk) 09:42, 2 October 2018 (UTC)Reply
ᨷ v. ᨸ is a regional difference, and is also related to the shape of ᨷ. I don't know where the author gets his Pali writing system from. It's a shame he presents no manuscripts to back his statements. He doesn't seem to be aware of the vast variation of the shape in LETTER A in Northern Thailand. As far as I am aware, ᨷ v. ᨸ in Pali is a matter of Northern Thai v. Lao. It's also conceivable that some places have switched from ᨷ to ᨸ to be more like Central Thai. RichardW57 (talk) 09:42, 2 October 2018 (UTC)Reply

Variant Forms

My longer term aim for this module is that, at least when servicing {{pi-alt}}, it will return a list of alternative transliterations for the non-Roman script. It may be appropriate to do that through an explicit target argument. One complication is that not all options are simultaneously present - ᩈᨻ᩠ᨻᩣ might not be a possible transliteration of sabbā, though ᩈᨻᩛᩣ, ᩈᨻᩛᩤ and ᩈᨻ᩠ᨻᩤ will all be found. RichardW57 (talk) 09:42, 2 October 2018 (UTC)Reply

Sinhalese Script

@Wyang: Is there any rationale to the transliteration of consonant clusters (back) to the Sinhalese script? It's not the way Mazard writes them.

@Wyang, @AryamanA, @Octahedron80: I've now amassed quite a collection of Sinhala examples (see Category:Pali_terms_with_inconsistent_transliterations) to suggest that clusters should generally be made out of touching letters. Do we need to accept any clusters made with al-lakuna without ZWJ? It affects the inflection of the '(m/v)ant' stems. RichardW57m (talk) 12:57, 22 October 2018 (UTC)Reply

@Wyang, @AryamanA, @Octahedron80: I've changed transliteration so that Sinhala consonant clusters use touching consonants or conjuncts. The conjuncts are used for -r and -y as the second element, and the 6 exceptional clusters kv, tth, tv, nth, nd, ndh. It may be that we will also need repha, though there seem to be very few examples in mainstream Pali. RichardW57 (talk) 22:48, 8 November 2018 (UTC)Reply

@AryamanA, @Octahedron80: I've found another exceptional cluster, 'nv', that is also written as a conjunct. If we take the source of the quote for the Sinhala script form of upagacchati, it's found at Verse 37. (It's Verse 31 at tipitaka.org.) It's another one we don't have a font for - Iskoola Pota seems to have overlooked it (neither conjunct nor touching ligatures) and the list of conjuncts and touching forms in the introduction of the book also omits it. I'm mentioning it here so I don't forget it. It also needs to be handled properly in inflection - please no global substitutions. --RichardW57 (talk) 11:46, 4 June 2020 (UTC)Reply

Transliteration and joining of inflections have now been modified so that Sinhala 'nv' now forms a conjunct without breaking the inflection of the middle of tanoti --RichardW57 (talk) 20:57, 9 June 2020 (UTC)Reply

We may now have a complete list for Pali - there are 6 plausible combinations out of [k|t|n][v|th|d|dh], unless 'kth' also occurs. --RichardW57 (talk) 11:46, 4 June 2020 (UTC)Reply
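
To make the cluster rule above concrete, here is a hedged Python sketch (not the module's Lua). It assumes the usual Unicode convention for Sinhala: a conjunct is written consonant, al-lakuna, ZWJ, consonant, while touching letters are written consonant, ZWJ, al-lakuna, consonant. The table `SINHALA` covers only a handful of consonants, and the names `CONJUNCT_CLUSTERS` and `join_cluster` are invented for this example.

```python
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA
ZWJ = "\u200D"        # ZERO WIDTH JOINER

# A few Pali consonants in Sinhala script (illustrative subset only).
SINHALA = {
    "k": "\u0D9A", "t": "\u0DAD", "th": "\u0DAE", "d": "\u0DAF",
    "dh": "\u0DB0", "n": "\u0DB1", "b": "\u0DB6", "y": "\u0DBA",
    "r": "\u0DBB", "v": "\u0DC0", "s": "\u0DC3",
}

# Exceptional clusters written as conjuncts: the six listed above plus 'nv'.
CONJUNCT_CLUSTERS = {
    ("k", "v"), ("t", "th"), ("t", "v"),
    ("n", "th"), ("n", "d"), ("n", "dh"), ("n", "v"),
}


def join_cluster(first: str, second: str) -> str:
    """Join two consonants (given in Latin) as a conjunct or as touching letters."""
    a, b = SINHALA[first], SINHALA[second]
    if second in ("r", "y") or (first, second) in CONJUNCT_CLUSTERS:
        return a + AL_LAKUNA + ZWJ + b  # conjunct (assumed encoding order)
    return a + ZWJ + AL_LAKUNA + b      # touching letters (assumed encoding order)


print(join_cluster("n", "d"))  # conjunct
print(join_cluster("b", "b"))  # touching letters
```

Keeping the exceptions in one explicit set means a newly found cluster like 'nv' is a one-line addition rather than a global substitution.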

I'm bending the LKLUG font to support Pali. (It's from GNU, so I can publish it when I've finished.) I've already added a set of touching forms, but the conjunct forms will be more effort - I'm going to have to assemble each ligature. The conjuncts nd and ndr are already in LKLUG. It's worse than it sounds, for when I make the conjunct for kv, I've also got to add ligatures for kvi, kvī, kvu and kvū. --RichardW57 (talk) 11:46, 4 June 2020 (UTC)Reply

Lao Pali Alphabets

From what I'm seeing, the commoner writing for Pali 'y' seems to be ຍ rather than ຢ. This seems to be the case even when only the Lao Lao consonants are used, and so ຍ represents both 'y' and 'ñ'. Does this correspond to others' observations? @Octahedron80 I've added a parameter |y= to the inflection templates to handle the endings. -- RichardW57 (talk) 22:31, 28 June 2019 (UTC)Reply

I noticed that too. So I will change y to short-tail . And ñ is still . --Octahedron80 (talk) 03:09, 29 June 2019 (UTC)Reply

Tai Tham Tall AA Following Subscripts

The rule

word = gsub(word, "([ᨣᨴᨵᨷᩅ])(᩠[ᨠ-ᩌᩔ])(ᩮ?)ᩣ", "%1%2%3ᩤ")

doesn't always work, even in writing systems that normally use tall AA after those base consonants. I propose we gather together examples so that we can make an evidence-based refinement to the rule. Note that inflection tables already allow fine-tuning between round AA and tall AA. Please ensure that the examples come from texts where tall AA is used when there is no subscript. --RichardW57 (talk) 13:36, 4 April 2020 (UTC)Reply
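
For anyone who wants to try candidate spellings against the rule outside the module, here is a rough Python equivalent of the substitution. It is only a sketch: the module itself is Lua, and the name `apply_tall_aa` is made up for this illustration. The two test words are the ddh and pph examples from the table in this section, written with code point escapes.

```python
import re

SAKOT = "\u1A60"     # TAI THAM SIGN SAKOT (subscript marker)
SIGN_E = "\u1A6E"    # TAI THAM VOWEL SIGN E
ROUND_AA = "\u1A63"  # TAI THAM VOWEL SIGN AA
TALL_AA = "\u1A64"   # TAI THAM VOWEL SIGN TALL AA

# Base consonants named in the rule: LOW KA, LOW TA, LOW THA, BA, WA.
BASES = "\u1A23\u1A34\u1A35\u1A37\u1A45"
# Subscript consonants named in the rule: U+1A20..U+1A4C plus GREAT SA (U+1A54).
SUBSCRIPTS = "\u1A20-\u1A4C\u1A54"

RULE = re.compile(
    "([" + BASES + "])(" + SAKOT + "[" + SUBSCRIPTS + "])(" + SIGN_E + "?)" + ROUND_AA
)


def apply_tall_aa(word: str) -> str:
    """Replace round AA by tall AA after base + subscript (+ optional E)."""
    return RULE.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3) + TALL_AA, word
    )


# buddhāna typed with round AA: the rule produces the tall AA spelling.
buddhana_round = "\u1A3B\u1A69\u1A34\u1A60\u1A35\u1A63\u1A36"
# papphāsa as listed in the table, with round AA after the subscript.
papphasa_listed = "\u1A37\u1A37\u1A60\u1A39\u1A63\u1A48"

print(apply_tall_aa(buddhana_round))
print(apply_tall_aa(papphasa_listed) == papphasa_listed)
```

The second check shows the problem: the rule rewrites the pph example to tall AA, so its output no longer matches the listed round AA form.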

Latin | Example
gg |
ggh |
dd |
ddh | ᨻᩩᨴ᩠ᨵᩤᨶ (buddhāna) (should also find a good round AA counterexample) --RichardW57 (talk) 13:36, 4 April 2020 (UTC)Reply
dhy |
pp |
pph | ᨷᨷ᩠ᨹᩣᩈ (papphāsa) --RichardW57 (talk) 13:36, 4 April 2020 (UTC)Reply
vy |

"iṃ" in Khmer, Thai and Lao

There's a specific vowel to denote the "iṃ". For example:

In Khmer, Thai and Lao, no vowel stacks on another vowel. Also, it does not display well on some devices. Hence, it should be កឹ, กึ, and ກຶ, respectively. — Pichnat Thong • 16:47, 20 August 2020 (UTC)

@Octahedron80 For Khmer and Thai script, sara ue certainly seems much more popular than using two marks, for which the primary users are tipitaka.org and en.wiktionary.org. That's annoying, because when I queried the rendering with Microsoft about 15 years ago, the man (Peter Constable) said Microsoft did regard <sara i, nikkhahit> as the proper encoding, and that the failed rendering in Windows XP with complex scripts enabled would be fixed in Vista. I am not sure whether the two element combination merits retention as an alternative spelling. --RichardW57 (talk) 00:55, 21 August 2020 (UTC)Reply
For Lao, I'm not so sure what is currently happening. If one goes to page 40 of the Lao source for the first quote of ᩋᩉᩴ (ahaṃ), you'll find ᨠᩥᩴᩈᩩ converted to the Lao script as ກິໍສຸ that looks very different to ກິໍສຸ. (The usual conversion in that book would be to ກິງສຸ). We may have to search hard for a text that will show the difference. --RichardW57 (talk) 00:55, 21 August 2020 (UTC)Reply
Do not use sara UE anywhere to write Pali (or Sanskrit). If you have a proper font like Angsana New, Leelawadee UI or Lao Pali Alpha, you will see nikkhahit over sara I. On old systems (like XP), nikkhahit cannot be input over sara I, so many people just use sara UE instead. Today, font makers are aware of the problem and make their fonts support <sara i, nikkhahit>. Readers may want to install newer versions of the same fonts on an old system. And I do not recommend Arial Unicode MS because it has not been updated for ages. --Octahedron80 (talk) 01:17, 21 August 2020 (UTC)Reply
@Octahedron80, @RichardW57 Thank you, I understand what you are saying. I was wondering why it would be "sara i, nikkhahit" when "sara ue" conveniently combines the two into one. — Pichnat Thong • 02:17, 21 August 2020 (UTC)Reply
@Octahedron80 (You don't need to ping me when you reply. This page is permanently on my watch list.) --RichardW57 (talk) 03:17, 21 August 2020 (UTC)Reply
There are several issues here.
  • Are sara ue and <sara i, nikkhahit> often distinguished in handwriting? I've been looking at extracts from a 19th century book, and I can't see the difference between sara ue in Thai and the vowel in Pali text. Is the difference in appearance real? There are now some Khmer scripts fonts on Windows that distinguish <COENG, TA> and <COENG, DA> because they're encoded differently, though in Cambodia (or Phnom Penh, at least), they've been written the same for nearly a century.
Yes. They are surely different. UE is merged together but IM is stacked as two marks. And no one writes UE in place of IM by hand, or vice versa. See my sample fonts: https://ibb.co/c3ppDZ9 --Octahedron80 (talk) 04:04, 21 August 2020 (UTC)Reply
The difference you describe is what I had hoped to see, but in the old book sara ue (in Thai) was written with a gap between the sara i and nikkhahit parts. And clearly Pichnat Thong hadn't been taught that they were different. On the other hand, he will probably have been taught that sara ue is composed of sara i and nikkhahit. And that is what a lot of fonts do - they put glyphs for sara i and for nikkhahit together to make sara ue. --RichardW57 (talk) 10:40, 21 August 2020 (UTC)Reply
See Lao Pali too: https://ibb.co/sszR38L --Octahedron80 (talk) 04:38, 21 August 2020 (UTC)Reply
Whose is that font? It looks as though it suffers from having to work with Latin text - the vowels above are too small, which forces the nikkhahit to be at the top of the stack. In the Lanna script, I've almost always seen the mai kang to the right of the vowel, and it's very striking how similar Lao Tham and Lao Lao are. (One exception is the Hariphunchai font line, but I think that's an oversight, not an intention. Its mai kang appears inside the vowel.) --RichardW57 (talk) 10:40, 21 August 2020 (UTC)Reply
You can get it from my collection. There are two fonts that display the same. You can get more if you google for "Lao Pali font". (I won't collect more; I am just satisfied with these.) --Octahedron80 (talk) 02:37, 22 August 2020 (UTC)Reply
  • If there is a real difference between the two, have Thais now permanently switched to using sara ue? After all, 'the textbook' uses sara ue! If they have, that is what Wiktionary should treat as the primary form.
As I said before, UE is just easier to type (but incorrect) because the limitations of the system do not allow typing nikkhahit over sara I (or after sara U, as in พาหุํ, if you want to know). Luckily, Firefox allows that. Wherever it can be changed, every UE must be converted to IM. Today, we see the actual IM in publications. At Wikibooks, I will change it now. --Octahedron80 (talk) 04:10, 21 August 2020 (UTC)Reply
  • As it is currently what is being used, sara ue needs to be recorded in Wiktionary. We might record it as an XP-induced misspelling. On the other hand, we might have to record <sara i, nikkhahit> as an old-fashioned, obsolete or pedantic spelling.
The correct form is <sara i, nikkhahit>. I do not recommend using UE in any case because it is not intended to be used this way. --Octahedron80 (talk) 04:21, 21 August 2020 (UTC)Reply
Wiktionary records languages as they are, not as they should be. There's a whole set of alternative Esperanto spellings (digraphs with 'x') because the standard spellings were too hard to type! --RichardW57 (talk) 10:40, 21 August 2020 (UTC)Reply
No prob. --Octahedron80 (talk) 03:28, 22 August 2020 (UTC)Reply
The inflections will need tweaking again. I think it's time to work out how to add footnotes to inflection tables. We're beginning to pick up non-standard spellings and regional forms - උපගංඡි (upagaṃchi) may turn out to be a purely Sinhalese aorist form, and parāṇi may be a 'late' Sanskritised form of an inflection of para - or perhaps it's even just Devanagari! Verb conjugation already has special handling for Thai having two spellings for the endings -mhe and -vho. -RichardW57 (talk) 03:17, 21 August 2020 (UTC)Reply
I recalled something in this video, at 9:30: The Pali Alphabet & Pronunciation Guide | Learn Pāli Basics on Youtube
  • [iṃ] denotes two sounds: อีง and อึง. A comment mentioned that อีง sounds like อิํ, otherwise อึ is อึง. Khmer uses only one sound for it, "អឹ"/"អឹង". I wonder whether "อิํ" is obsolete in Thai, as it is in Khmer. As another case, "am" denotes "sara am" (អាំ/อำ), which is part of makāranta, where [ṃ] and [m] are pronounced as the same final "m" sound, but that does not matter for now. Thanks to you two; I respect your work and your explanations. — Pichnat Thong • 06:29, 20 August 2020 (UTC)
The video is rather confusing at that point, and I think the attempt to pronounce nasalised vowels failed. Fortunately, for Wiktionary, we don't at present need to know how to pronounce Pali, just how to write it. On top of the vastly different pronunciations one would expect from the usual local pronunciation of the consonants, there are apparently pronunciations that have been taken from the local vernaculars that badly mangle the vowels. --RichardW57 (talk) 10:40, 21 August 2020 (UTC)Reply
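
Since both spellings of "iṃ" are in circulation, it may help to see how mechanically one converts to the other. The following Python sketch (not part of the module) maps the two-mark sequence to the single vowel sign and back for Thai, Lao and Khmer. The names `IM_PAIRS`, `to_single_vowel` and `to_two_marks` are invented here, and the reverse direction assumes that the single vowel sign never stands for anything but "iṃ" in Pali text.

```python
# (two-mark spelling, single vowel sign) for each script
IM_PAIRS = {
    "Thai": ("\u0E34\u0E4D", "\u0E36"),  # sara i + nikkhahit / sara ue
    "Laoo": ("\u0EB4\u0ECD", "\u0EB6"),  # vowel sign I + niggahita / vowel sign Y
    "Khmr": ("\u17B7\u17C6", "\u17B9"),  # vowel sign I + nikahit / vowel sign Y
}


def to_single_vowel(text: str, script: str) -> str:
    """Rewrite <i, nikkhahit> as the single vowel sign."""
    two_marks, single = IM_PAIRS[script]
    return text.replace(two_marks, single)


def to_two_marks(text: str, script: str) -> str:
    """Rewrite the single vowel sign as <i, nikkhahit> (assumes Pali text only)."""
    two_marks, single = IM_PAIRS[script]
    return text.replace(single, two_marks)


# asmiṃ in Thai script with implicit vowels, spelled with two marks
asmim = "\u0E2D\u0E2A\u0E3A\u0E21\u0E34\u0E4D"
print(to_single_vowel(asmim, "Thai"))
```

Because the conversion is a plain substitution in both directions, recording one spelling as primary and generating the other as an alternative form would be cheap either way.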

Quotations

We have, I believe, three quotations giving [iṃ] in Thai script with implicit vowels, all from the same book. An example is

  1. 2005, บุญคิด วัชรศาสตร์, ภาษาเมืองล้านนา [The Language of Mueang Lanna] (overall work in Thai), Chiang Mai: Tharatong Print Shop, →ISBN, page 192:
    (๒) สุณนฺตุ โภนฺโต เย เทวา อสฺมึ ฐาเน อาทิคตา* ทีฆายุกา สทา โหนฺตุ.
    (๓) สพฺพสตฺตานํ สุขี อตฺตานํ ปริหรนฺตุ.
    (2) suṇantu bhonto ye devā asmiṃ ṭhāne ādigatā* dīghāyukā sadā hontu,
    (3) sabbasattānaṃ sukhī attānaṃ pariharantu.
    * Translated as though adhigatā
    Listen, Lords! May the devas who stay at this place always be long-lived, and live happily for themselves and for the benefit of all beings.
    (the text in the book is shown at [5]).

@Pichnat Thong, Octahedron80: What is printed in the book there? Sara ue or <sara i, nikkhahit>? I transcribed it as the latter, but I am worried that I am wrong. Within the book, the precise typeface seems only to be used for Pali. (It could have been chosen because it handles marks below well.) --RichardW57 (talk) 17:43, 22 August 2020 (UTC)Reply

I found <sara ue> and <sara i, nikkhahit> (which I assumed were probably just errors?) on the same page of the book "ภิกขุปาติโมกข์ปาฬิ ฉบับ จปร อักษรสยาม" [6]. You'll find them on the third and fourth lines on page 30/40. Pichnat Thong (talk) 18:21, 22 August 2020 (UTC)Reply
@Pichnat Thong, Octahedron80: I don't think it's กึ but กิํ. For sara ue, I would expect the nikkhahit to be further to the right. The layout is not consistent. On line -9 (i.e. 9th line from the bottom) of the same page we have successive words สัน์ติํ and อาปัต์ติํ. The nikkhahit is at different heights in the two last syllables! So what do you think is truly written in the text I have copied for Wiktionary quotations? --RichardW57 (talk) 01:07, 23 August 2020 (UTC)Reply
I'd figured that it depends on what the writer chooses to type out. <sara i, nikkhahit> and <sara ue> might not be the same thing but they could be used interchangeably. [7].
I'd say from the text you copied, it clearly is อสฺมึ rather than อสฺมิํ. Unless not in a book, I prefer the former over the latter since on mobile devices including tablets, <sara i, nikkhahit> looks faulty. I couldn't even type "nikkhahit" after "sara i" after all. — Pichnat Thong, 04:13, 23 August 2020 (UTC)
I've updated the quotation module accordingly, and also the inflection tables for the words whose quoted inflected forms have changed - มหาโพธิ (mahābodhi), คนฺธกุฏิ (gandhakuṭi) and อิม (ima). --RichardW57 (talk) 15:50, 23 August 2020 (UTC)Reply

Burmese: Mon, Old Shan, New Shan

I have added Mon, Old Shan and New Shan variations of the Burmese script. You can look at the 'variations' table to see what changes between them. --Octahedron80 (talk) 02:56, 5 December 2020 (UTC)Reply

Conversion to Old Shan was generating hybrid forms (e.g. ၺႃယဿ compared to standard Burmese script ဉာယဿ), which is wrong for servicing {{pi-alt}}. I've changed it so that words with GREAT SA will be left in the Burmese form rather than converted to Old Shan. This does not affect the conversion of words to the New Shan variant. --RichardW57m (talk) 13:12, 19 August 2021 (UTC)Reply
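
The guard described above can be pictured with a short Python sketch (the module itself is Lua). The two-entry table `SAMPLE_OLD_SHAN` is taken only from the ñāyassa example and stands in for the real 'variations' table, and the name `to_old_shan` is invented for this illustration.

```python
GREAT_SA = "\u103F"  # MYANMAR LETTER GREAT SA

# Tiny stand-in for the real table: only the two characters seen in the example.
SAMPLE_OLD_SHAN = {
    "\u1009": "\u107A",  # MYANMAR LETTER NYA -> MYANMAR LETTER SHAN NYA
    "\u102C": "\u1083",  # MYANMAR VOWEL SIGN AA -> MYANMAR VOWEL SIGN SHAN AA
}


def to_old_shan(word: str, table=SAMPLE_OLD_SHAN) -> str:
    """Convert to the Old Shan variant, unless the word contains GREAT SA."""
    if GREAT_SA in word:
        # Leave the whole word in standard Burmese form rather than
        # producing a hybrid of converted and unconverted letters.
        return word
    return "".join(table.get(ch, ch) for ch in word)


nyayassa = "\u1009\u102C\u101A\u103F"  # ñāyassa in standard Burmese script
print(to_old_shan(nyayassa) == nyayassa)
```

Checking the whole word before converting any character is what prevents the hybrid output.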

Mon iṃ

@Octahedron80, Theknightwho: Following the discussion at User talk:RichardW57#ကီ, I am commenting out the conversion of Mon Pali to ီ rather than to ိံ until we get evidence of it good enough to defend an entry. (Corrected text.) --RichardW57 (talk) 01:10, 26 November 2023 (UTC)Reply

Incidentally, when I checked yesterday, ကီ (kiṃ) was the only Pali term with ီ for iṃ that linked to a page. --RichardW57 (talk) 01:13, 26 November 2023 (UTC)Reply
I neglected to make the change until today. --RichardW57m (talk) 10:06, 30 January 2024 (UTC)Reply

Adding the Chakma Writing System

@Erutuon, Msasag, Octahedron80, RichardW57m

How could the Chakma script be added to the module? --Apisite (talk) 13:02, 15 October 2021 (UTC)Reply

@Erutuon, Msasag, Octahedron80, RichardW57m

The consonant vaa for Pali has been added. --Apisite (talk) 13:48, 16 October 2021 (UTC)Reply

Isn't Cyrillic more urgent? I'm not sure how many dead bodies would be required for Chakma. The CFI issue is important. How many durable instances do we have? Are people likely to just come across Pali in Chakma script? The perceived need for the letter VAA for base letter 'v' is odd, when one can see that WAA is being used for subscript 'v'. --RichardW57 (talk) 19:51, 16 October 2021 (UTC)Reply

The technical issues that immediately spring to mind are:

  • Which independent vowels are used? Do they alternate with spellings with AA?
  • How are geminate consonants written? Does the presence of an explicit vowel matter? If MAAYAA is used, is there consensus on the order of MAAYAA and vowel?
  • How are other consonant clusters written? If there's no tradition of writing Pali or Sanskrit, we can't rely on that to guide the writing.

For transliteration from Chakma, the following questions also arise:

  • Do we need to distinguish VAA and WAA as base consonants? I would not be surprised to see WAA used for 'v'. I think we don't need to.
  • Do we need to distinguish <MAAYAA, vowel> and <vowel, MAAYAA>? I think not.

The only texts I have available for analysis are those in the Unicode proposals. --RichardW57 (talk) 19:51, 16 October 2021 (UTC)Reply

@Apisite: Do you have any other Pali texts to offer? --RichardW57m (talk) 12:01, 18 October 2021 (UTC)Reply

@RichardW57: I have no Pali texts in the Chakma script, but you could ask Jyoti and Suz Moriz Chakma, who made the RibengUni font. (You might want to have a good understanding of Bengali though.) --Apisite (talk) 14:02, 18 October 2021 (UTC)Reply

(Notifying RichardW57): @Erutuon, Msasag, Octahedron80, Apisite: For transliteration from the Chakma script, I would recommend creating Module ccp-translit, with a switch on language for differences between Chakma and Pali. The use of U+11102 CHAKMA SIGN VISARGA for Pali ā is quite remarkable. --RichardW57m (talk) 14:42, 18 October 2021 (UTC)Reply

@Apisite: Here's an awkward question. Is subscript 'v' encoded as U+11131 CHAKMA O MARK or as <U+11134 CHAKMA MAAYYAA, U+11124 CHAKMA LETTER WAA>? --RichardW57 (talk) 21:56, 18 October 2021 (UTC)Reply

@Apisite: I've added Chakma to this module. The limited data I have is consistent with 𑄠𑄢𑄣𑅇 (yrlv) being subscripted in clusters and all other consonants, including 𑄚 (n), not being subscripted. Until such time as Chakma is added to the script list for Pali in Module:languages/data2, {{link}} and {{mention}} will require |sc= for transliteration to occur - but you didn't ask about these! --RichardW57 (talk) 03:48, 22 October 2021 (UTC)Reply

@Apisite: In early 2023, Chakma was added for Pali to what is now Module:languages/data/2, so it no longer needs to be treated exceptionally. --RichardW57 (talk) 21:41, 1 July 2023 (UTC)Reply

@Apisite: I've now added inflection capability for Chakma. For {{pi-decl-noun}} you need to add new argument |sc=Cakm and old argument |ending= ceases to be optional. For {{pi-conj-special}}, you need to add new argument |sc=Cakm. --RichardW57 (talk) 17:23, 23 October 2021 (UTC)Reply

Inverse transliteration

{{ping|Apisite}: I've now knocked up the inverse transliteration, from Chakma to Latin, for Pali. It's Module:Cakm-translit. Unfortunately, there are some unexplained discrepancies between the automated and manual transliterations for Chakma. Does that 'transliteration' depend on invisible phonetic facts? --RichardW57 (talk) 16:49, 20 October 2021 (UTC)Reply

Fixing vital punctuation and re-asking:

@Apisite: I've now knocked up the inverse transliteration, from Chakma to Latin, for Pali. It's Module:Cakm-translit. Unfortunately, there are some unexplained discrepancies between the automated and manual transliterations for Chakma. Does that 'transliteration' depend on invisible phonetic facts? --RichardW57m (talk) 16:39, 13 May 2022 (UTC)Reply

@RichardW57m We'll have to make a poster, that says this: "Help wanted from any people with knowledge of Pali written in the Chakma script." --Apisite (talk) 00:40, 2 July 2023 (UTC)Reply

What I wrote was ambiguous. The problem is with Chakma written in the Chakma script; I suspect the issues may be shared with Bengali. --RichardW57m (talk) 08:57, 3 July 2023 (UTC)Reply