Module talk:fa-ira-translit

Working on this module

@Awesomemeeos You don't have to work on this module, you know, even if it's just to prove some points? It's a very difficult task: there are still many aspects that will make the module fail (the current test cases are basic), and the benefits and community acceptance are questionable. --Anatoli T. (обсудить/вклад) 02:27, 27 February 2017 (UTC)

@‎Atitarev Okay. But what points? — AWESOME meeos * (chōmtī hao /t͡ɕoːm˩˧.tiː˩˧ haw˦˥/) 03:00, 27 February 2017 (UTC)[reply]
That you can do it, that you're serious about it, whatever? --Anatoli T. (обсудить/вклад) 03:27, 27 February 2017 (UTC)

Review

Hi @Atitarev. I've just changed the code slightly so the test cases all pass. Do you think the current code suffices, or does it still need some major improvements (aside from handling Arabic loanwords)? Also, could you please tag other Persian Wiktionary users (I'm not aware of them) so I can get their opinions too?
-Taimoor Ahmed(گل بات؟) 22:28, 15 June 2021 (UTC)[reply]

@Taimoorahmed11: A belated thank you, and sorry for the long silence.
My Persian is limited and I am not sure about some other possible cases with less commonly used letters or combinations.
My questions are: should Persian automated transliteration be based on a stricter vocalisation with sukun (jazm), or stay as it is now? When should the transliteration fail (return nothing because of insufficient info)? Should the absence of diacritics before و and ی make the vowels "u" and "i" respectively?
Since the module is working well, perhaps it's worth considering vocalising all headwords and "productionising" the module?
(Notifying Ariamihr, Dijan, Mazsch, Qehath, ZxxZxxZ): ? Any issues may be resolved on the actual entries. Manual |tr= should still take precedence; the automation should only work where it's missing.
I've just added a new case in Module:fa-translit/testcases, please change if you disagree - بَرادَرِ بُزُرْگ (barâdar-e bozorg). Can the ezafe be predicted after the final consonant before another word, so should a hyphen be inserted before kasra (zir)? I.e. automate as "barâdar-e ..." vs "barâdare". --Anatoli T. (обсудить/вклад) 04:26, 15 December 2021 (UTC)[reply]
@Atitarev: No worries! I appreciate your reply, and notifying other Persian users.
As for when the module should not transliterate, that is slightly difficult, because Persian words on their own can be transliterated. One thing is for sure: if a number of consonants appear in a row with no diacritics, then the module should not transliterate, and the same goes for vowels.
I think the module works for the most part, and I wouldn't mind actually putting it into use. I would just want opinions from an actual Persian speaker, to understand more about the transliteration scheme that should be used and whether the current standard suffices (and for them to also try out the module with Persian lemmas).
When it comes to the Ezafe, that can be detected if a word ends in a Kasra/Zir and follows another word. I believe outside of Arabic, languages which use the Arabic script tend to not have diacritics as the final character of a word.

-Taimoor Ahmed(گل بات؟) 04:44, 15 December 2021 (UTC)[reply]
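
The ezafe heuristic described above can be sketched as follows. This is a minimal illustration in Python, not the module's Lua code; it reads the rule as "a word-final kasra with another word after it", and the name `mark_ezafe` is hypothetical.

```python
KASRA = "\u0650"  # ARABIC KASRA (zir)


def mark_ezafe(phrase: str) -> list[tuple[str, bool]]:
    """Pair each word with True when it looks like it carries an ezafe.

    Assumed rule: the word ends in kasra and is followed by another word.
    """
    words = phrase.split()
    marked = []
    for index, word in enumerate(words):
        followed_by_word = index < len(words) - 1
        marked.append((word, followed_by_word and word.endswith(KASRA)))
    return marked


# The test case from this thread, barâdar-e bozorg, written with escapes
# so the code points are unambiguous.
PHRASE = (
    "\u0628\u064E\u0631\u0627\u062F\u064E\u0631\u0650"
    " "
    "\u0628\u064F\u0632\u064F\u0631\u0652\u06AF"
)
flags = [is_ezafe for _, is_ezafe in mark_ezafe(PHRASE)]
print(flags)
```

A transliterator could then emit "-e" instead of plain "e" for the flagged word. A lone word ending in kasra is not flagged, which matches the assumption that a final kasra otherwise only appears in literal Arabic borrowings.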
@Taimoorahmed11: Thank you.
On returning nothing (failure to transliterate): if making sukun (please let me use the more common Arabic word) mandatory, except in the final position, solves the problems, let's use this rule; otherwise the transliteration will produce numerous false results on unvocalised or only partially vocalised words. In other words, if sukun is mandatory, a missing non-final vocalisation will produce nothing.
I am not 100% sure I understand what you mean by "because Persian words on their own can be transliterated".
I'd like to know more on rules for و and ی, so should strings (without diacritics) like کو or کی automatically produce "ku" and "ki"? (کُو and کِی with diacritics are actually "kow" and "key")?
Re: ezafe - that was my assumption too, the final kasra is only used in ezafe or strictly Arabic literal borrowings. --Anatoli T. (обсудить/вклад) 05:27, 15 December 2021 (UTC)[reply]
Depending on the decisions made, some logic can be taken from Module:ar-translit on failing to produce (wrong) transliterations. Also calling @Benwing2, Erutuon, Fenakhay, Fay Freak for their skills and possible interest. --Anatoli T. (обсудить/вклад) 05:31, 15 December 2021 (UTC)[reply]
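
A rough sketch of the "mandatory sukun, otherwise return nothing" rule, in Python for illustration only. It is deliberately simplified: it treats ا, و and ی purely as vowel carriers, which the real module cannot do, and the names `is_fully_vocalised` and `transliterate_or_fail` are hypothetical. It does not reproduce the logic of Module:ar-translit.

```python
FATHA, KASRA, DAMMA, SUKUN = "\u064E", "\u0650", "\u064F", "\u0652"
DIACRITICS = {FATHA, KASRA, DAMMA, SUKUN}
# Simplifying assumption: alef, vav and ye always act as vowel letters.
VOWEL_LETTERS = {"\u0627", "\u0648", "\u06CC"}


def is_fully_vocalised(word: str) -> bool:
    """True if every non-final consonant is followed by a mark or vowel letter."""
    for index, char in enumerate(word):
        if char in DIACRITICS or char in VOWEL_LETTERS:
            continue
        if index == len(word) - 1:
            continue  # the final consonant may stay unmarked
        if word[index + 1] not in DIACRITICS | VOWEL_LETTERS:
            return False
    return True


def transliterate_or_fail(word, transliterate):
    """Return None (no transliteration) instead of a guessed result."""
    return transliterate(word) if is_fully_vocalised(word) else None


# bozorg with and without the sukun on the third consonant
BOZORG = "\u0628\u064F\u0632\u064F\u0631\u0652\u06AF"
BOZORG_NO_SUKUN = "\u0628\u064F\u0632\u064F\u0631\u06AF"
print(is_fully_vocalised(BOZORG), is_fully_vocalised(BOZORG_NO_SUKUN))
```

With this check, the partially vocalised spelling is rejected rather than transliterated with a guessed vowel.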
@Atitarev, Taimoorahmed11 I agree that sukūn should be mandatory for transliteration to avoid false positives. I see however that per Persian alphabet: "Of the four Arabic short vowels, the Persian language has adopted the following three. The last one, sukūn, has not been adopted." If this is true, one thing we can do is include the sukūn in Persian words, so it gets transliterated, but make the module code remove it before displaying the Persian text. This should be easy to do because I already implemented something exactly similar for Korean involving hyphens (hyphens can be inserted in Korean text and show up in the transliteration but not in the Korean text itself). Benwing2 (talk) 05:48, 15 December 2021 (UTC)[reply]
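
The "keep the sukun for transliteration, hide it from display" idea could look roughly like this. It is a Python sketch under stated assumptions, not the actual Lua implementation and not the Korean hyphen code; `display_form` is a hypothetical name.

```python
SUKUN = "\u0652"  # ARABIC SUKUN


def display_form(vocalised: str) -> str:
    """Strip sukun from the text shown to readers.

    The unstripped string would still be passed to the transliterator.
    """
    return vocalised.replace(SUKUN, "")


# bozorg, stored with a sukun on the third consonant
STORED = "\u0628\u064F\u0632\u064F\u0631\u0652\u06AF"
shown = display_form(STORED)
print(SUKUN in STORED, SUKUN in shown)
```

The design choice is that editors type one string, and the two consumers (display and transliteration) each see the version they need.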
@Benwing2, Taimoorahmed11 If sukun is not commonly used in Persian vocalisations, it's a great idea to hide it from end users! (Other symbols can also be used. Japanese romanisation uses a few tricks: "^", ".", "-" and space, which affect only the transliterations.) --Anatoli T. (обсудить/вклад) 06:15, 15 December 2021 (UTC)
A sequence kow, which you above mention, Anatoli, is classically/traditionally kaw, so actually کَو, while the sequence کِی, which would be classically kiy, appears unlikely to me, which means I don’t know if it exists (probably not though, similarly to the Arabic rule that iy becomes ī, which renders the spelling ـِي unambiguous) but کی should bear a fatḥa and have kay in the classical and even modern Persian transliteration/-scription; I don’t know how it comes that Iranians of our day insist to transcribe e when the classical vowel is a, as here and on نارنج, has irregularly become /e/ instead of /æ/ — you see in نارنج’s edit history that I have given up against the influx of Iranian knowbetters who are completely ignorant of changes in Persian phonology, they are apparently like Germans or Spaniards in the Middle Ages pronouncing Latin, or the pedagogues described by Eduard Engel as quoted on Schulmann who would explain to Plato that Ancient Greek is pronounced exactly like currently in German schools.
They are clearly insensitive to our interest in diachrony and all regiolects (Dari, which I sometimes had to cover specifically as Afghans could not care less to edit Wikipedia) and apparently don’t actually know which ḥarakāt words have. The same bias is on Wikipedia e.g. w:Persian phonology#Spelling and example words and their article w:Modern Hebrew verbs is completely unreadable, by virtue of this assumption that one should look bottom-up from the modern language and not aim at being consistent with historical descriptions, in spite of us talking about languages of literary heritage and tradition just apply vocalization and transcription schemes in the most vulgar way found on the streets (which we luckily have averted in Arabic, and Hebrew editors are in the process of mending on Wiktionary).
So pursuant to that view, which is natural to Atitarev in so far as I know him to be insecure or less knowledgeable about historical language developments and do it like certain other people with less philological learning, one just grafts Modern Iranian vowels onto words, from the current Iranian spoken language onto the written language, to have words with diacritics, which turns out wrong as it is کَی and کَو, not کِی and کُو: don’t be Wikipedia. (I intend not to offend Wiktionary editors here, you are great, Anatoli!) Fay Freak (talk) 06:30, 15 December 2021 (UTC)[reply]
@Fay Freak: I know your view on modern Iranian Persian transliterations but let's not deviate, it is the current standard now (spoken is even more different from the standard). If we are to work on Dari or Classical Persian, then the methods and results will be quite different. If we include the colloquialisms, they will be different still. I am aware to some extent of the phonetic changes, so I don't quite appreciate some of your comments. نوروز is "nowruz" now (modern Iranian), even if it is "nawrôz" in Dari or classical Persian or "navrüz" in Tajik. --Anatoli T. (обсудить/вклад) 07:30, 15 December 2021 (UTC)
PS. If you think that parallel handling should be developed, say so. Perhaps additional parameters will be required, but it's not only the results: the diacritics themselves are different between modern and classical. I think it would be possible to transliterate for both modern Iranian and classical Persian. In my opinion, modern Iranian should take precedence, and it's not so much about Wikipedia: you will find that the majority of Persian contributors used modern Iranian. It is the current consensus, if I am not mistaken. --Anatoli T. (обсудить/вклад) 07:45, 15 December 2021 (UTC)
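
One way such parallel handling could be parameterised is a per-variety table of short vowels. This is only a Python sketch: the variety keys "Iran" and "classical" and the function name are illustrative, the kasra values (e versus i) follow this discussion, and the damma values are an assumption by analogy.

```python
FATHA, KASRA, DAMMA = "\u064E", "\u0650", "\u064F"

# Hypothetical per-variety tables; only the kasra contrast is taken
# from the discussion, the rest is assumed for illustration.
SHORT_VOWELS = {
    "Iran": {FATHA: "a", KASRA: "e", DAMMA: "o"},
    "classical": {FATHA: "a", KASRA: "i", DAMMA: "u"},
}


def short_vowel(mark: str, var: str = "Iran") -> str:
    """Look up the Latin vowel for a diacritic in the chosen variety."""
    return SHORT_VOWELS[var][mark]


print(short_vowel(KASRA), short_vowel(KASRA, var="classical"))
```

Keeping the tables separate means the same vocalised input can yield two transliterations, although, as noted above, the diacritics themselves may also differ between the two conventions.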
@Atitarev: But (even in Iran) کَی and کَو is still correct, right? (My claim/impression is that unlike Arabs Iranians rarely see vocalized Persian and therefore superimpose ḥarakāt matching current vowels. Which would oddly mean that Afghans use the vowel signs differently.) And the transcription—would be kay and kow with this module, but Iranians will insist on key (for some reason, I don’t know if they can be convinced otherwise). So the question, apart from whether we truly agree on the usage of the diacritics, is: how will these irregular developments be handled?
The distinction of ô/û and ê/î is a separate issue, I don’t see how it could be handled with Unicode ḥarakāt.
That distinction is also more complicated than many realize (someone may correct me if I err here); I think there have been at least three stages where they have been merged: One from the transition of Middle Persian to New Persian, as in Persian کی (when) and ایران (Iran); another in the later Middle Ages, which appears to have reached India but not early Medieval Turkic borrowings (→ Talk:بوییدن); lastly from the 19th century by now completed in the whole of Iranian Persian). But I haven’t raised this issue now, you have opened the can … in my previous message on this page I have only raised the issue of fatḥa corresponding to /e/ and diacritic usage mismatching in our understanding. Fay Freak (talk) 08:02, 15 December 2021 (UTC)[reply]
@Fay Freak: If Iranians meant to produce "kaw" (phonetically), they would use کَو; if they meant "kow", they would use کُو. So, for نوروز, the modern Iranian diacritics would give نُوروز ("nowruz") and Dari would use نَوروز ("nawruz"). Note the unmarked "ر".
Re: "But (even in Iran) کَی and کَو is still correct, right?" Not sure what you mean. These pairs are pronounced "kay" and "kaw" (in modern usage as well); to get "key" and "kow", you would use کِی and کُو. No diacritics will produce long "ki" and "ku". The current difference in usage of کِی and کُو vs unmarked کی and کو makes it very different from Arabic or classical Persian and is admittedly confusing. I am not sure if classical Persian ever used e.g. kasra before a long "i".
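
The readings listed in this comment can be collected into a small lookup table. The sketch below is Python for illustration and covers only the combinations named here, under the modern Iranian convention as described; `read_glide` is a hypothetical helper, not a function of the module.

```python
from typing import Optional

FATHA, KASRA, DAMMA = "\u064E", "\u0650", "\u064F"
VAV, YE = "\u0648", "\u06CC"

# (preceding diacritic or None, letter) -> Latin reading
GLIDE_READINGS = {
    (None, VAV): "u",
    (None, YE): "i",
    (DAMMA, VAV): "ow",
    (KASRA, YE): "ey",
    (FATHA, VAV): "aw",
    (FATHA, YE): "ay",
}


def read_glide(mark: Optional[str], letter: str) -> Optional[str]:
    """Return the reading, or None for combinations not covered here."""
    return GLIDE_READINGS.get((mark, letter))


for mark, letter in [(None, VAV), (DAMMA, VAV), (None, YE), (KASRA, YE)]:
    print("k" + read_glide(mark, letter))
```

Returning None for unlisted pairs leaves room for the "fail rather than guess" behaviour discussed earlier on this page.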
"how will these irregular developments be handled?" It would be possible to have two (or more) systems (|var=classical, |var=Iran) where e.g. kasra is "e" for Iranian and "i" for classical. --Anatoli T. (обсудить/вклад) 08:29, 15 December 2021 (UTC)[reply]
@Atitarev: Well and I think it is phonematically کَو in Modern Iranian Persian when it is kow (kaw doesn’t even exist?), and کی (when) is still correctly کَی, even though it be key. I am not sure that “modern Iranian diacritics” exist, I granted these to be but an error that one can make if one only know the modern vowels and write the diacritics after each such vowel. Does one really think, in the Iranian education system, that نُوروز is a normative spelling? Is this in modern Iranian dictionaries? Am I just living under a rock?
In either case if we use the classical ـَو, as e.g. in Vullers, Johann August (1856–1864) “fa-ira-translit”, in Lexicon Persico-Latinum etymologicum cum linguis maxime cognatis Sanscrita et Zendica et Pehlevica comparatum, e lexicis persice scriptis Borhâni Qâtiu, Haft Qulzum et Bahâri agam et persico-turcico Farhangi-Shuûrî confectum, adhibitis etiam Castelli, Meninski, Richardson et aliorum operibus et auctoritate scriptorum Persicorum adauctum[1] (in Latin), volume II, Gießen: J. Ricker, page 1367, the module could output the modern transcription anyhow, so I wouldn’t see why to write ـُو which could be interpreted as u resp. classical ū. Fay Freak (talk) 08:51, 15 December 2021 (UTC)[reply]
@Fay Freak: Well, according to نُوروز the vocalisation "نُوروز" is the standard Iranian vocalisation and "نَوروز" will bring you mostly Urdu pages. I am not so sure if the confusion exists among Persian speakers. We can even have the same discussion about Arabic, where أَنْتَ (ʔanta) may be read as "ʾinta" (not the standard "ʾanta") by many Arabs (who will insist it's the correct reading) despite the definite fatḥa. It is known that diacritics is much less common with Persian. You can try googling for different combinations. "کِی" ("key") is also searchable. Native speakers may give a better insight. --Anatoli T. (обсудить/вклад) 09:05, 15 December 2021 (UTC)[reply]
Re: "Does one really think, in the Iranian education system, that نُوروز is a normative spelling? Is this in modern Iranian dictionaries? Am I just living under a rock?" To my knowledge and my sources yes. I possess two textbooks, a small Farhang-Moaser dictionary and a phrasebook. They are all based on modern Iranian and "نوروز" appears as "nowruz". The "Complete Persian" has a short introduction on diacritics and it matches what is called the modern Iranian. --Anatoli T. (обсудить/вклад) 09:12, 15 December 2021 (UTC)[reply]
@Fay Freak: I don't know if I managed to convince you, but ـُو in modern Iranian is -ow, not -u as it is in classical Persian and Dari. I agree it's confusing: it depends on which method you use. BTW, I have created a vote in Wiktionary:Beer_parlour/2021/December#Persian_automated_transliteration. I don't want to force my opinion but it may take some time to get a clear community opinion on this. --Anatoli T. (обсудить/вклад) 09:25, 15 December 2021 (UTC)

ezâfe with هٔ

@Benwing2, Tspielberg I incorrectly made a previous test case کُرِۀ شُمالی (it was wrong). The correct spelling is کُرِهٔ شُمالی (kore-ye šomâli) where هٔ is used (U+0647 U+0654). Now it's failing. Could you please fix it?
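
To make the code points concrete, here is a small Python sketch that recognises a word-final هٔ (U+0647 U+0654) and returns the "-ye" seen in kore-ye. It is an illustration only: `he_ezafe_suffix` is a hypothetical name, and whether the module should also accept the precomposed U+06C0 is left open.

```python
HEH = "\u0647"          # ARABIC LETTER HEH
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE
HE_EZAFE = HEH + HAMZA_ABOVE
PRECOMPOSED = "\u06C0"  # ARABIC LETTER HEH WITH YEH ABOVE, a different code point


def he_ezafe_suffix(word: str) -> str:
    """Return "-ye" when the word ends in heh + hamza above, else ""."""
    return "-ye" if word.endswith(HE_EZAFE) else ""


# First word of the test case, in the corrected and the earlier spelling
KORE_CORRECT = "\u06A9\u064F\u0631\u0650" + HE_EZAFE
KORE_EARLIER = "\u06A9\u064F\u0631\u0650" + PRECOMPOSED
print(repr(he_ezafe_suffix(KORE_CORRECT)), repr(he_ezafe_suffix(KORE_EARLIER)))
```

The two spellings look alike on screen but are different strings, which is one reason a test case can start failing after a respelling.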

If I am not mistaken, @Tspielberg you worked on this module when you were Taimoorahmed11.

I have changed from ىٰ to یٰ to fix the case with قُرونِ وُسْطیٰ (qorun-e vostâ), which passes now. Anatoli T. (обсудить/вклад) 05:35, 17 March 2023 (UTC)

UPDATE: It's fixed now, although it may not be perfect.
The case with "اَرمَنِستان" is failing intentionally, until a method is chosen about the sokun. Anatoli T. (обсудить/вклад) 05:43, 17 March 2023 (UTC)[reply]

New test cases

@Benwing2 I was able to add "xâ" for خوا but not "e-yi" for "ه‌ای". Could you please take a look? Anatoli T. (обсудить/вклад) 05:33, 22 March 2023 (UTC)[reply]

@Atitarev Hi, I'll take a look after I finish rewriting Module:fa-IPA. Currently I'm working on that module, doing things like adding Tehrani and Kabuli pronunciation (which are generated only when specifically enabled) and automatic stress, so we can convert all the stuff using {{fa-IPA/old}}. Benwing2 (talk) 05:37, 22 March 2023 (UTC)[reply]
@Benwing2: Thanks!
I noticed (in case you're not aware) something happened with the display or it was always like this. مسلمون (mosalmun) (regional Tehrani only) displays two dots before the IPA. Anatoli T. (обсудить/вклад) 05:44, 22 March 2023 (UTC)[reply]
@Atitarev Hmm, haven't noticed that before in this case but I think I've seen it elsewhere. I'll see if I can fix it as I fix the module. Benwing2 (talk) 07:07, 22 March 2023 (UTC)[reply]
@Atitarev I think because it's *only* displaying the Tehrani accent, which is normally indented under the Iranian pronunciation. Benwing2 (talk) 07:23, 22 March 2023 (UTC)[reply]
@Benwing2: Yes, I knew that. That’s the case where other varieties are missing. Anatoli T. (обсудить/вклад) 07:31, 22 March 2023 (UTC)[reply]

Test cases

@Sameerhameedy: Hi. Could you please also take a look at this project and perhaps define the vocalisation rules, or is this already working in another module? Anatoli T. (обсудить/вклад) 05:55, 29 August 2023 (UTC)

@Atitarev I can work on this and probably complete it by the end of the week but:
1) I think this module should only transliterate links with the language code: fa-ira. Even though IP is the main transliteration on Persian entries, most languages borrow from Classical Persian, not Iranian Persian. So making Iranian Persian the default site-wide is not a good idea.
2) For the reasons mentioned above, I would prefer if this page was located at fa-ira-translit. But I don't care that much since the name doesn't affect usage. We can just leave it as is if moving it is more trouble than it's worth.
3) There is probably no way to indicate majhūl vowels, due to how Iranian Persian uses diacritics. But Majhūls are very rare in Iranian Persian anyway, so it probably will have very little effect. سَمِیر | Sameer (مشارکت‌هابا مرا گپ بزن) 06:13, 29 August 2023 (UTC)[reply]
@Sameerhameedy
No objection in how you design it. If it's easier to use the classical as the base or split into separate modules, go for it. You can also rename the module.
It's not expected to transliterate 100%; there are a few translit modules that occasionally, or under certain conditions, require manual input, i.e. |tr=. As long as it can transliterate an overwhelming majority of fully vocalised words, it's worth the effort. If the module starts working, perhaps some bot writers will be able to vocalise and auto-transliterate most entries.
I've seen dictionaries that default to classical Persian or modern Iranian, it is the author's choice, I guess. As long as it's consistent, agreeable with other editors and the other form is also available. Since you're the most active Persian editor at the moment, you sort of move it in the direction you want. Don't forget to update the policy documents, e.g. WT:About Persian, WT:FA TR and keep the community in the loop.
I also think that a vocalised form can be the first unnamed parameter, rather than using |head=, but I am getting ahead of myself. Anatoli T. (обсудить/вклад) 06:28, 29 August 2023 (UTC)
@Atitarev Hi, I moved the page and was about to start working on this. However, it seems that the module was used by Balti and is now causing issues for Balti entries. After looking it up, though, Balti's phonology seems to be much closer to classical Persian than to Iranian Persian. Perhaps Balti should be switched to the Classical module?? سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 21:01, 5 September 2023 (UTC)
@Atitarev The module is finished and appears to be working, though it might need more test cases. Because my Lua knowledge has become much better since writing fa-cls-translit, I fully rewrote both of them. Dari and Classical needed some clarifications because they had more vowels. I didn't add any vowel clarifications here because they don't appear to be needed. Though we won't know until we test it.
Also, because the module treats any alif that's not paired to a consonant as a zero consonant (a silent consonant that is only written so a vowel can be paired to a consonant), قهوه‌ای has to be substituted as قهوه‌یی. I could make an exception specifically here but I'm not sure, since the alif here is indeed a zero consonant... lemme know what you think. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 23:35, 5 September 2023 (UTC)
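
The substitution mentioned above could be automated along these lines, if it were ever wanted. This is a Python sketch under the assumption that the suffix is spelled ZWNJ + alef + ye and is respelled as ZWNJ + ye + ye; `respell_alef_ye_suffix` is a hypothetical helper, and nothing here says the module does this itself.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ALEF = "\u0627"
YE = "\u06CC"    # ARABIC LETTER FARSI YEH


def respell_alef_ye_suffix(word: str) -> str:
    """Rewrite a final ZWNJ + alef + ye as ZWNJ + ye + ye."""
    old_suffix = ZWNJ + ALEF + YE
    if word.endswith(old_suffix):
        return word[: -len(old_suffix)] + ZWNJ + YE + YE
    return word


# The unvocalised word from the comment above, with escapes for clarity
QAHVEI = "\u0642\u0647\u0648\u0647" + ZWNJ + ALEF + YE
respelled = respell_alef_ye_suffix(QAHVEI)
print(respelled.endswith(ZWNJ + YE + YE))
```

Restricting the rewrite to the word-final position keeps it from touching an alef that really is a zero consonant at the start of a word.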
@Sameerhameedy: Thank you for this work!
I don't understand or quite agree with the following cases you have changed, though.
  1. خود to خوَد as "xod". What does fathe do?
  2. واو - from "vâw" to "vâv". Is it pronounced [vɒːv] in Iranian? Isn't it analogous to "ow"?
  3. قَهْوِه‌ای - from "qahve-yi" to "qahve-i", respelled "ğahve-yi" and pronounced [qæɦwe.jíː]?
I will add more cases when I know more. Anatoli T. (обсудить/вклад) 23:43, 6 September 2023 (UTC)[reply]
خوَد is because it was pronounced xwad in classical Persian and in Iranian Persian the sequence xwa always collapses to xo, otherwise it would transliterate as xud. I suppose I could just make an exception for the word خود but if I just always collapse خوَ, that means that vowel change can be put anywhere. I can't just have خو -> xu though because the sequence xu does happen in Iranian Persian.
No, Iranian /ow/ = cls /aw/; Iranian Persian lost the diphthong āw (according to Wikipedia and Jahanshiri).
Well ق and غ are pronounced the same in Iranian Persian, according to Saranamd they are both pronounced as [q] word initially. (Though according to ZxxZxxZ it was ɢ). The fa-IPA respelling just always shows ğ. I can either change a word initial [q] -> [ɢ] and always use غ, or use q/[q] and ق word initially for all spellings. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 23:56, 6 September 2023 (UTC)[reply]
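
The "always collapse" option for khe + vav + fatha can be sketched as a plain sequence replacement. This is Python for illustration, not the module's Lua; it also includes the khe + vav + alef case ("xâ") added to the test cases earlier on this page, and `collapse_xw` is a hypothetical name. The output mixes Latin and Persian script because only these sequences are handled here.

```python
KHE, VAV, ALEF, FATHA = "\u062E", "\u0648", "\u0627", "\u064E"


def collapse_xw(word: str) -> str:
    """Replace the silent-vav sequences with their Iranian readings.

    Plain khe + vav (no fatha, no alef) is left alone, since "xu"
    also occurs in Iranian Persian.
    """
    word = word.replace(KHE + VAV + FATHA, "xo")
    word = word.replace(KHE + VAV + ALEF, "x\u00E2")  # "xâ"
    return word


XWAD = KHE + VAV + FATHA + "\u062F"   # respelled form with fatha on vav
XUD_UNMARKED = KHE + VAV + "\u062F"   # no fatha: left untouched
print(collapse_xw(XWAD).startswith("xo"), collapse_xw(XUD_UNMARKED) == XUD_UNMARKED)
```

Because the rule keys on the fatha rather than on a word list, it applies wherever the sequence occurs, which is the trade-off described in the comment above.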
@Sameerhameedy: Thank you for the clarification!
The vocalisation خوَد, transliterated as "xod" might be harder to explain, unlike خوانْدَن (xândan) and خویش (xiš)
Re: قَهْوِه‌ای‎ my question wasn't about "q" but about the "-yi" vs "-i", sorry for not making it clear. The IPA generated has also a [j] ([qæɦwe.jíː]).
BTW, I'm fine with your solution regarding the silent alef. Anatoli T. (обсудить/вклад) 00:11, 7 September 2023 (UTC)[reply]
There are other words similar to خود xwad/xod, including خوش xwash/xosh.
In Iranian Persian, both ق and غ are pronounced as [ɢ] in initial position, and as [ɣ] everywhere else. In Dari, Tajik, and Classical Persian these letters are pronounced differently, and the phonemic distinction is preserved. --Z 16:12, 7 September 2023 (UTC)

o + و = "ow"?

@Sameerhameedy: Hi. On خودرو {{fa-IPA|xwadraw}} produces Iranian: [خُدْرَو] and "xodrav". Shouldn't that be [خُدْرُو] and "xodrow"? Anatoli T. (обсудить/вклад) 08:40, 13 September 2023 (UTC)

@Atitarev Yes, it's one of the issues I listed at Template:fa-IPA: fa-IPA/harakat just does a 1:1 conversion. So incorrect phonetic spellings are because fa-IPA sent the wrong Latin spelling, not because of fa-IPA/harakat. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 18:22, 14 September 2023 (UTC)

Module name and language codes

@Sameerhameedy, @Benwing2, @Saranamd:

Hi, I have undone my inclusion of the module for Persian transliteration. Either transliteration modules only work with language codes or there are some other problems. diff caused errors. Either the module should be fixed or renamed, or the "fa-ira" code should work. Anatoli T. (обсудить/вклад) 04:21, 13 October 2023 (UTC)

@Atitarev Hi, getting different etymology codes for the same language to use different transliteration modules was a problem I remember being discussed before. I think @Theknightwho devised a way for that to work, however. The language code "fa-ira" should use "fa-ira-translit", and "fa-cls", "haz" and "prs" should all use "fa-cls-translit" (probably also "tg", since Category:Tajik language lists "fa-Arab" as a recognized script). سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 04:33, 13 October 2023 (UTC)
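
The code-to-module assignment proposed above amounts to a simple lookup. The Python sketch below only restates that proposal; it is not how Wiktionary's language data is actually stored, the names are hypothetical, and "tg" is left out because it was only suggested as probable.

```python
from typing import Optional

# Proposed assignment from the comment above (module names without
# the "Module:" prefix).
TRANSLIT_MODULE_BY_CODE = {
    "fa-ira": "fa-ira-translit",
    "fa-cls": "fa-cls-translit",
    "haz": "fa-cls-translit",
    "prs": "fa-cls-translit",
}


def translit_module_for(code: str) -> Optional[str]:
    """Return the proposed module name, or None if the code is not listed."""
    return TRANSLIT_MODULE_BY_CODE.get(code)


for code in ("fa-ira", "prs", "tg"):
    print(code, translit_module_for(code))
```

Returning None for unlisted codes mirrors the point made above that a code without a working module assignment should not silently fall back to the Iranian transliteration.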