Module talk:pi-translit


Activation Plans

The most fraught aspect of transliteration is in automated declension tables, where manual correction will be most difficult. In general, transliteration from the Lao script will be most difficult, as different writing systems lose different information as compared to the more westerly scripts (Burmese, Latin etc.). There are also differences with what we want to do with transliterations. Where inflection tables derive or present subordinate lemmas, we want to link to the form of the lemma in the 'main' script where information is centralised, namely to the Latin script. (Perhaps Sanskrit has some design ideas - transliteration and main script will always require different strings, one Latin and one Devanagari.)

For transliteration to Latin, we frequently do not need to know the precise writing system. For example, we do not need to know which of the various Burmese script writing systems is being transliterated from. The plan of action is as follows. At all stages, the working system should continue to work for almost every case. (Although transliteration from Thai has problems, they arise in relatively few places.)

@Erutuon, Octahedron80, Wyang, Bhagadatta, AryamanA, -sche, Erutuon, Atitarev, Benwing2, Victar, Mahagaja As there will be apparently pointless changes to transliteration and inflection tables as I change one module at a time while preserving a working system, be dimly aware that my intended steps are: (belatedly signed) RichardW57 (talk) 18:47, 10 May 2021 (UTC)

  1. In function orJoin() of Module:pi-decl/noun, add optional argument options to control the transliteration, and disable transliteration for the Lao script until a full system has been implemented for the Lao script. Note that this function is also used in generating conjugation tables.  Done
  2. Get a transliteration module enabled for Pali in Module:languages/data2. (Request placed).
    @octahedron80 Still waiting! It would be good to have it connected in Module:languages/data2, which is protected against me, to 'translit-redirect'. --RichardW57m (talk) 11:20, 8 June 2021 (UTC)
    I hope this works. --Octahedron80 (talk) 11:50, 8 June 2021 (UTC)  Done
  3. Populate Module:translit-redirect/data for Pali and Sanskrit. (SE Asian scripts' transliteration can serve both.)  Done
  4. Add function trwo() to Module:pi-translit to handle writing system selection.  Done
  5. Update the test of pi-translit to use the function trwo().  Done
  6. Implement the writing system options for the function trwo().  Done
  7. Pass options down to orJoin() where possible, and use them there.  Done
  8. Enable transliteration for Lao script functionality.  Done

Note that Module:languages/data2 has recently been renamed Module:languages/data/2. --RichardW57m (talk) 09:35, 16 March 2023 (UTC)

Pertinent Writing System Options for Thai Script

These are passed into the inflection tables using the optional argument |impl=, which takes the values yes, no and both. This option will be passed down to orJoin() as a field impl of argument options and thence to trwo(). A value of both will be treated as not specifying any value. RichardW57m (talk) 12:12, 7 May 2021 (UTC)
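The handling of |impl= described above can be sketched roughly as follows. This is only an illustration in Python (the real modules are Lua), and the helper names normalise_impl and build_options are hypothetical; rejecting unexpected values with an error is also an assumption, not something stated above.

```python
from typing import Optional


def normalise_impl(value: Optional[str]) -> Optional[str]:
    """Map a raw |impl= value to what would be stored in options['impl'].

    'both' is treated exactly like an absent value, as described above.
    Raising on anything else is an assumption made for this sketch.
    """
    if value is None or value == "both":
        return None
    if value in ("yes", "no"):
        return value
    raise ValueError(f"unexpected |impl= value: {value!r}")


def build_options(template_args: dict) -> dict:
    """Build the options table that would be handed to orJoin() and then trwo()."""
    return {"impl": normalise_impl(template_args.get("impl"))}
```

With this shape, a table invoked with |impl=both and one invoked without |impl= at all hand identical options to the transliteration entry point.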

Pertinent Writing System Options for Lao Script

These are passed into the inflection tables using optional arguments |impl= (as for Thai), |y=, whose permitted values are both, yaa, yung and synonyms thereof, and, for nominal inflection, |liap=. The option |liap= does not seem to be helpful, as we will not be transliterating PHO TAM as 'bh'.

The option |y= specifies whether Latin 'y' in inflections is reflected in Lao script as the letter YO (option yaa) or NYO (option yung).

The algorithm for transliterating Lao letter NYO will be:

  1. If |y= is undefined, then if YO is present, default it to yaa, else to yung.
  2. If y=='yung' (or equivalent), convert NYO to Latin 'y', but if y=='yaa', convert NYO to Latin ň.
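The two steps above can be sketched as follows. This is a Python illustration only (the module itself is Lua); the function names are hypothetical, the synonyms of yaa and yung are not modelled, and treating |y=both like an undefined value is an assumption made by analogy with |impl=. The Latin output letters are taken from step 2 as written.

```python
from typing import Optional

LAO_NYO = "\u0e8d"  # ຍ LAO LETTER NYO
LAO_YO = "\u0ea2"   # ຢ LAO LETTER YO


def resolve_y_option(text: str, y: Optional[str] = None) -> str:
    """Step 1: if |y= is undefined, default it from the presence of YO.

    Assumption: 'both' is handled like an undefined value.
    """
    if y is None or y == "both":
        return "yaa" if LAO_YO in text else "yung"
    return y


def nyo_to_latin(text: str, y: Optional[str] = None) -> str:
    """Step 2: choose the Latin letter that NYO stands for in this text."""
    mode = resolve_y_option(text, y)
    return "y" if mode == "yung" else "ň"
```

The default only looks at whether YO occurs anywhere in the string being transliterated, so a table whose forms mix the two conventions would still need an explicit |y=.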

There will also be an algorithm to detect the usage of nuktas. RichardW57m (talk) 12:12, 7 May 2021 (UTC)

Variants for Other Scripts

Thai Mon Pali (i.e. Burmese script Pali in the older Mon tradition of Thailand) appears to use ည MYANMAR LETTER NNYA as a single letter, so we may come to need to tag that in Burmese script transliteration. Clumsier ways round might be better. RichardW57m (talk) 12:12, 7 May 2021 (UTC)

Sticking Plasters

Three schemes will ultimately be available:

  • Switch transliteration off for specific inflection tables.  Done
  • For links and headwords, manual override is already available.
  • For inflection tables, partial editing of the inputs or outputs, using an analogue of the |subst= parameter of {{ux}}.  Done

The third option may be necessary if an inflected word is written inconsistently. RichardW57m (talk) 12:12, 7 May 2021 (UTC)


Mopping up

@Octahedron80, Benwing2: The transliteration is now successfully activated, and seems to be working. For example, it transliterates the usage examples at တွံ (tvaṃ). There are some bits of mopping up to do: --RichardW57m (talk) 13:58, 8 June 2021 (UTC)

Headwords

Compare the headwords for Pali and Sanskrit at ទាស. From previous comments, it seems Benwing would prefer the layout for Pali, with transliteration only in the first definition line, it being implicitly assumed by default that the Roman script equivalent is the transliteration. (See ທັມມະ (damma) and သံဃ (saṃgha) for examples of where it isn't.) On this basis, we need to fix entries which are using {{head}} instead of Pali headword templates. They fall into two categories:

  • Lemmas, which should already be using Pali headword templates, such as တွံ (tvaṃ).
  • Non-lemmas, such as สามี (sāmī). How do we fix these? I see two options:
    • Create a family of headword lemmas for forms.
    • Explicitly suppress the transliteration with |tr=-

Which is the better option?

Presumably the Sanskrit headwords also need to be fixed. So far as I am aware, all Sanskrit headwords are implemented by Sanskrit-specific templates.

Cases where transliteration is not the same as the Roman script form of the word are now being put in cat:Terms with redundant transliterations/pi. How should we handle them? One possibility I can think of is to supply the headword template with the option |tr=+ to say, "Contrary to normal practice for Pali, supply the transliteration". (If not supplied with |tr=, Module:pi-headword defaults it to |tr=-.) This would need to be interpreted by Module:pi-headword rather than by Module:headword. Another option is to note that this is simply a list of words whose Roman script forms and transliterations are different, and think about how to rename it. Such a category might also apply to Sanskrit in some fashion - it depends on how systematic alternative spellings are to be handled. (Common examples are anusvara v. homorganic nasal and gemination of the consonant in rC clusters.) --RichardW57m (talk) 13:58, 8 June 2021 (UTC)

Entries for inflected forms

A typical example is, for a word such as ສຣີຣໍ (sarīraṃ), a definition line

# {{pi-sc|Lao|sarīraṃ}}, ''which is'' {{inflection of|pi|ສຣີຣ|tr={{l|pi|sarīra}}||nom//acc|s|t=body}}

which yields

  1. Lao script form of sarīraṃ, which is nominative/accusative singular of ສຣີຣ (sarīra, body)

This definition succinctly links to both the Lao script stem (where the Lao script inflection will be found), and to the Roman script stem, where other meanings and general information will be recorded. Unfortunately, the transliteration marked up as a link is deemed not to be the same as the transliteration of the Lao stem. Consequently, the page is placed in cat:Terms with manual transliterations different from the automated ones/pi.

@Benwing2, Octahedron80: What I want is something like the link_tr property of language objects, that causes transliterations to be converted to links, but on a selective basis. There is a dirty trick to get what I want, which is to convert the template invocation to

{{inflection_of|en|ສຣີຣ#Pali|tr={{l|pi|sarīra}}|sc=Thai||nom//acc|s|t=body}}

What I would like is something like

{{inflection_of|pi|ສຣີຣ|link_tr=1||nom//acc|s|t=body}}

to provide what I want. Thoughts?

Inflection tables

@Benwing2, Octahedron80: Pali words in the Thai and Lao scripts cannot always be transliterated without knowing the writing system. For inflection tables, where manual overrides would be horrendously tedious, I use the trwo entry point to the module. I then need to pass the transliteration down to full_link() in Module:links. Is there any good way to disable the automatic generation and comparison? Delving into the code, it looks as though calling links.full_link(term, nil, nil, true) would work to suppress categorisation as having a redundant transliteration, but the fourth argument seems to be purely for internal calls. I also want mismatches to be accepted - what full_link() calls a manual transliteration (variable manual_tr) is actually itself an 'automatic' transliteration, but one that uses knowledge of the writing system in use. I'm currently using the above trick (in function orJoin() in Module:pi-decl/noun) to avoid the check. --RichardW57 (talk) 21:21, 8 June 2021 (UTC)

@Benwing2, Octahedron80: The 'above trick' mentioned in my previous comment doesn't actually reference anything! The trick is actually mentioned in Module_talk:pi-decl/noun#Transliteration_Issues. --RichardW57m (talk) 10:33, 16 March 2023 (UTC)
Using script IDs to distinguish writing systems was discussed at Module_talk:scripts/data#Aphabetic_Thai_and_Lao back in April 2021; the idea was unpopular. --RichardW57m (talk) 10:31, 16 March 2023 (UTC)


Repinging @Benwing2, Octahedron80 as the previous ping may not have worked. --RichardW57 (talk) 08:04, 2 July 2021 (UTC)

I have an idea: add a variation parameter to specify which system is to be transliterated, just as has been done for Mymr. --Octahedron80 (talk) 08:12, 2 July 2021 (UTC)
But add it to what? Are you suggesting that links.full_link should know about the entry point trwo()? So far as I am aware, only Pali transliteration has such an entry point. --RichardW57 (talk) 08:30, 2 July 2021 (UTC)

ś and ṣ

ś and ṣ are not used by Pali and should be removed. Pali uses only s. --Octahedron80 (talk) 00:39, 8 June 2021 (UTC)

Octahedron80 then withdrew that comment, saying, "NVM I just saw that it is used for tr. Sanskrit too.".
I've unwithdrawn that comment, as I wanted to record another reason for supporting these two letters. They apparently do leak into later Pali. Furthermore, they seem to occur in more conservative Pali text (my emboldening):
"With respect to editorial principles, the critical apparatus is negative. Only substantive variants have been recorded, that is, variants that alter the sense, metre or syntax of the text. This includes variants that are incorrect in that they are nonsense or defy the standard metre or syntactic usage. Non-substantive variants largely include minor orthographic changes common to Sinhalese manuscripts, such as the interchange of anusvāras and homorganic nasals, the retroflexion of dental nasals, and the palatalisation or retroflexion of the Pali dental sibilant. Where a reading is noted in the apparatus its own orthographic peculiarities are preserved."
- Alastair Gornall and Aleix Ruiz-Falqués, Verses of a Dying Arahant: A New Translation and Revised Edition of the Telakaṭāhagāthā, Journal of the Pali Text Society, Vol. XXXIII (2018), pp. 55–100 (accessed at https://d1wqtxts1xzle7.cloudfront.net/61789286/Gornall_Ruiz-Falques_201820200114-75914-1xxm1ay.pdf?1579070294=&response-content-disposition=inline%3B+filename%3D2018_with_Aleix_Ruiz_Falques_Verses_of_a.pdf&Expires=1623149155&Signature=TQkF9VvPg5gsSfP05zDE9MERMbvQtoBIT6~WTtVnN6yVM0ovOzOhZaqZL5s7p2NcK59zJgxcz8pAR1CuCPWJw~jPgdGov9x1G97gxHY8xvVBDuVupNGtsW5lJiBg2pYDYuVJmxUDo3qYQwPlSQzcjk~RJI9zEAxdZRcxuxioz6Gpz7ObSf3EkY3XGgrvrCsf--gE2H9BNuYOM59EyAJ~Pz7gblI89rde023-7Q3mBtXziJw1aNWcKrZXevKG2~peMXqitYP9~2oewaNbK0h6se0qCT0Ezuy9FjHd89vcvZdwV91IfsG2sniiNQx54ZeX3ZfLnzQcWji8mxFgxR8d9g__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA)
@Octahedron80: Out of caution, I've also supported ṛ, ṝ and ḷ and even ḹ. This then gives the bonus of also supporting 6 scripts for Sanskrit, though we will have to revisit Lao and Lana for syllabic consonants when we find examples of their use in Sanskrit. There may be an unambiguous way of writing the syllabic consonants in the Lao script, and I've yet to see how the Lana script handles long syllabic consonants. The Lana vowel letters for them will be interesting. --RichardW57m (talk) 11:20, 8 June 2021 (UTC)

Myanmar SIGN II

Assuming @Octahedron80 knows what he is talking about (quotations would be good), we have a problem with transliterating this vowel sign. While in most Burmese script writing systems for Pali the correct transliteration is ī (e.g. ဂီတ (gīta)), he claims that the normal Mon way of writing Pali iṃ is to use SIGN II. (That's not borne out by the writings of @咽頭べさ, for which see the example at တသ္မိံ (tasmiṃ) and the source of the example.) Therefore, this cannot be handled by the tr() interface. The natural method is to treat it as an exception and input the transliteration manually. --RichardW57 (talk) 04:54, 2 July 2021 (UTC)

Please note that the masculine/neuter locative singular seems to be an invalid example - Mason reports that the Burmese used -smi/-smi(ṃ) and -mhi/mhi(ṃ) indiscriminately, and given the poor discrimination between final -i and -ī, I wouldn't place a lot of trust in random manuscripts. --RichardW57 (talk) 04:54, 2 July 2021 (UTC)

(I will use က as a dummy.) The sign II ကီ is used for IM in Mon Pali & Mon. Lots of printed books use that, even Shorto's Mon lexicon. Normal IM ကိံ is used in Burmese/Shan. For the plain vowel II, Burmese/Shan use ကီ and Mon uses ကဳ. This significant difference is why I added a variation option. I have asked 咽頭べさ what resources he has. --Octahedron80 (talk) 05:13, 2 July 2021 (UTC)
In case you want some pictures: [1] [2] [3] [4] [5] --Octahedron80 (talk) 05:30, 2 July 2021 (UTC)


Languages | I | Ī | IṂ | ĪṂ
Burmese/Shan Pali & Burmese | ကိ | ကီ | ကိံ | ကီံ (Burmese only)
Mon Pali & Mon | ကိ | ကဳ | ကီ (or ကိံ also happen?) | not happened
And what's the evidence that the word you point to is yuvatiṃ? Other words are being cited in the nominative, so I think that ယုဝတီ should rather be read as yuvatī. I'd like to see some quotations in Wiktionary. And what's Pali 'ĪṂ'? (It may well exist in Sanskrit.) --RichardW57 (talk) 06:04, 2 July 2021 (UTC)
Pali does not have ĪṂ. --Octahedron80 (talk) 06:08, 2 July 2021 (UTC)
About yuvatiṃ, I'm pretty sure, because all the words in the Pali column are already in the nominative form (nom-s/p). [6] --Octahedron80 (talk) 07:11, 2 July 2021 (UTC)
@Octahedron80: I think something is lost in translation. My point is that yuvatiṃ would not appear in that column, because it is not a nominative singular. (I admit that there are nominative singulars ending in -iṃ, but they are a relatively rare form of neuter nouns, not of feminine nouns.) --RichardW57 (talk) 08:04, 2 July 2021 (UTC)
Perhaps you could ask him for some textbooks. He has a lot more than this dictionary. --Octahedron80 (talk) 08:41, 2 July 2021 (UTC)
Pictures #2, #3 and #5 are mine. There is no doubt that the Mon language makes heavy use of ကီ as IṂ (or at least as part of another vowel + Ṃ); we both have dictionaries to compare. Pictures #1 and #4 are from Intubesa, who sent them to me. About Mon Pali, to be sure, you and I must ask Intubesa for more pages. --Octahedron80 (talk) 06:17, 2 July 2021 (UTC)
His Facebook: [7] Please use simple English to communicate. --Octahedron80 (talk) 06:25, 2 July 2021 (UTC)
Maybe you could fix the code to support this; I just tried but it didn't work. --Octahedron80 (talk) 05:40, 2 July 2021 (UTC)
A proper fix would go under entry point trwo(). I've opened up WT:Beer parlour/2021/July#Regional_Variations_in_Pali_Inflection_Tables for the general discussion. I think the |instr= parameter will be the quick-fix method to start implementing it (convert II to <I, Ṃ> before applying transliteration), but an inflection table option will be the nicer solution. (I haven't encoded that parameter yet; I've written a test case at User:RichardW57/sandbox.) I would like non-Shan Burmese verbal inflection tables to show it as a footnoted alternative form for the first person singular of the aorists in -i. I must start working on adding footnotes to inflection tables. I believe we need it to be possible for entries to add footnotes to inflection tables. At the moment my technical problem is how one should specify footnotes via the template parameters. --RichardW57 (talk) 06:04, 2 July 2021 (UTC)
@Octahedron80: The option |subst= has now been added to {{pi-decl-noun}}, {{pi-conj-special}} and {{pi-conj-future}}. --RichardW57 (talk) 19:02, 3 July 2021 (UTC)
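The quick fix discussed above, converting II to <I, Ṃ> before transliteration is applied, might look roughly like this. It is a Python illustration only (the modules are Lua); the function name mon_ii_to_im is hypothetical, and it does not reproduce the actual syntax of |subst= or |instr=.

```python
MYANMAR_SIGN_I = "\u102d"         # ိ MYANMAR VOWEL SIGN I
MYANMAR_SIGN_II = "\u102e"        # ီ MYANMAR VOWEL SIGN II
MYANMAR_SIGN_ANUSVARA = "\u1036"  # ံ MYANMAR SIGN ANUSVARA


def mon_ii_to_im(text: str) -> str:
    """Rewrite every SIGN II as SIGN I followed by anusvara.

    Intended to run on the input before ordinary transliteration,
    so that a Mon-style spelling comes out as iṃ rather than ī.
    """
    return text.replace(MYANMAR_SIGN_II, MYANMAR_SIGN_I + MYANMAR_SIGN_ANUSVARA)
```

Such a rewrite is only appropriate where the text is known to follow the Mon convention; in Burmese/Shan writing SIGN II is the plain long vowel and must be left alone. MYANMAR VOWEL SIGN MON II (U+1033) is untouched by this sketch.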

Why did you tag me?

@RichardW57 Is the issue that some transliterations were not displaying because I made incomplete transliterations return nil? If so, that was working as intended, and you should find a more suitable way of displaying problematic characters in the output that doesn't involve displaying the raw input characters with the Latin (i.e. wrong) script code. Thank you. Theknightwho (talk) 00:03, 5 March 2023 (UTC)