Module talk:ru-translit


Adding word stress

Will {{#invoke:ru-translit|tr|сло́во}} (with an accent) return "slóvo"? --Anatoli (обсудить/вклад) 02:09, 14 March 2013 (UTC)

You can test it yourself. Look here: [MODULE CALL REDACTED] —Μετάknowledgediscuss/deeds 02:19, 14 March 2013 (UTC)
I see now how you can test it! Does it work for longer texts?
Testing on a random news text:
TEST: [MODULE CALL REDACTED] --Anatoli (обсудить/вклад) 02:34, 14 March 2013 (UTC)

The code is too opaque, I don't understand it!

I am afraid that this code was written to be so clever that I can't understand it. The variable names mean nothing, there are barely any comments to explain what each step does and why. How does the module actually approach the problem? What is the flag parameter for? I would really like this module to be cleaned up and made more readable. This is a wiki after all, so everyone with enough knowledge of Lua should be able to easily edit this, and the last thing we need is more arcane code that nobody except the creator can maintain. —CodeCat 03:22, 14 March 2013 (UTC)

Hard to follow indeed.
  1. "if not mw.ustring.match(flag,"г") then word=mw.ustring.gsub(word,"([ое][́̀]?)го([́̀]?)$","%1vo%2")" romanises "-ого"/"-его" as "-ovo"/"-(j)evo"
  2. "word = mw.ustring.gsub(word,"([АОУЫЕЯЁЮИЕаоуыэяёюиеъь][́̀]?)е","%1je");" romanises Cyrillic "е" as "je", not "e" after any of "АОУЫЕЯЁЮИЕаоуыэяёюиеъь".
  3. "word = mw.ustring.gsub(word,"([жшчщЖШЧЩ])ё","%1o");" romanises "ё" as "o" after any of жшчщЖШЧЩ. --Anatoli (обсудить/вклад) 06:17, 14 March 2013 (UTC)
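
The three substitutions quoted above can be tried outside the wiki with a rough Python sketch. This is not the module's code: `apply_quoted_rules` and its `genitive` switch are hypothetical names, Python's `re` stands in for `mw.ustring.gsub`, and only these three rules are applied, so the output is a mix of Cyrillic and Latin.

```python
import re

ACCENTS = "\u0301\u0300"  # combining acute and grave, as in the quoted patterns
# Letters after which "е" becomes "je", copied as-is from the quoted pattern.
AFTER = "АОУЫЕЯЁЮИЕаоуыэяёюиеъь"


def apply_quoted_rules(word, genitive=True):
    """Apply only the three quoted substitutions; everything else stays Cyrillic."""
    if genitive:
        # Rule 1: word-final -ого/-его becomes -оvo/-еvo (accents are kept).
        word = re.sub(rf"([ое][{ACCENTS}]?)го([{ACCENTS}]?)$", r"\1vo\2", word)
    # Rule 2: "е" after a listed letter (optionally accented) becomes "je".
    word = re.sub(rf"([{AFTER}][{ACCENTS}]?)е", r"\1je", word)
    # Rule 3: "ё" (U+0451) after a hushing consonant becomes plain "o".
    word = re.sub("([жшчщЖШЧЩ])\u0451", r"\1o", word)
    return word
```

Passing `genitive=False` plays the role that the flag (and later the nogen parameter) has in this discussion: it skips rule 1 and leaves the other two rules in place.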
I think I am starting to understand the general idea. But how does the code know when to transliterate г as v? Does it have something to do with the flag parameter? Personally, I don't think the module should be capable of handling such irregular exceptions. It should provide a sensible default, but the default should be able to be overridden if necessary. That is what the tr= parameter would be for, after all... —CodeCat 14:01, 14 March 2013 (UTC)
That beats me, we need Ignatus to reply.
Romanising "-ого"/"-его" as "-ovo"/"-(j)evo" should NOT be done via the tool. Russian has words "много", "ого" where "г" is pronounced as expected or /h/ in ого as a variant. Override manually.
Same with "Чч" as "š", in что, чтобы, конечно. Override manually.
Consistent change - "Ё,ё" as "o" after жшчщЖШЧЩ - OK.
Consistent change - "Е,е" as "je" after АОУЫЕЯЁЮИЕаоуыэяёюиеъь - OK. Add the capitals Ъ and Ь to the list.
Don't use "ɛ" at all! It's reserved for foreign words where the consonant (бвгдзклмнпрстфх, excluding жшчщЖШЧЩ and "цЦ") before "е" stays hard. I don't follow the logic of the code, but override manually. In short, "Э, э" is always "e", "Е, е" is "e" or "je" after АОУЫЕЯЁЮИЕЪЬаоуыэяёюиеъь.
This should make the code simpler. Please ask if it's confusing. --Anatoli (обсудить/вклад) 22:34, 14 March 2013 (UTC)
Well, let me reply. Yes, maybe my idea with flags was not good. The simplifications you described can be accepted, except that it's better to handle -ого/-его by default; most words ending in them are genitives; there should just be a way to switch this off for cases when they are definitely not, e.g. for {{ru-verb}}. "Что" should be listed as an exception since it appears very often; other words with ч=ш may be transliterated manually. Exceptions with е=э are frequent taken together, but no single word with them is very frequent, so they will commonly require manual input. Maybe we should use another way to mark special readings of letters for translit and inflection, like marking them in place once in the template (see my talk page for a suggestion). And, OK, I don't like now that the module has different functions for single words and phrases; we should rename phr to tr, and use the current tr internally, if it is needed at all. Ignatus (talk) 14:13, 15 March 2013 (UTC)
I would prefer it if exceptions are not added to the module at all, but are just supplied with the tr= parameter. So the module is only used to provide a default. —CodeCat 14:22, 15 March 2013 (UTC)
  • The module was rewritten. It transliterates any phrase with the function tr in the simplest manner, except that words starting with что and ending in ого and его are always treated specially; if genitives are obviously not expected, as for {{ru-adv}}, the parameter nogen= with any value can be sent to #invoke. Please restore the doc subpage and document this; I'm now going to make changes to the affected templates. Ignatus (talk) 13:02, 16 March 2013 (UTC)
    • If the transliterator handles phrases (which it should), then the patterns need to match not just on the beginning or end of the string, but also next to characters that separate words like spaces and punctuation. Currently, it would correctly transliterate "jego" but not "u jego brata". —CodeCat 14:07, 16 March 2013 (UTC)
      • [MODULE CALL REDACTED] - already fixed. Ignatus (talk) 18:32, 16 March 2013 (UTC)
        • Oh... ok? I'm confused now because I don't see anything in the code that makes that work... —CodeCat 19:00, 16 March 2013 (UTC)
          • Line 20. %A is used to determine end of word. Ignatus (talk) 19:52, 16 March 2013 (UTC)
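
The word-end test described in this thread (end of string, or a following non-letter via %A) can be approximated in Python as follows. This is a sketch under assumptions, not the module's line 20: `genitive_endings` is a hypothetical name, and a negative lookahead replaces the captured `%A` so that a single pattern covers both the end of the string and a following separator. The lookahead also refuses a combining accent, so an accented "о" in the middle of a word is not mistaken for a word end.

```python
import re

ACCENTS = "\u0301\u0300"  # combining acute and grave

# "[^\W\d_]" matches any Unicode letter in Python's re; the lookahead fails
# if the ending is followed by a letter or by a stray combining accent.
GENITIVE = re.compile(
    rf"([ое][{ACCENTS}]?)го([{ACCENTS}]?)(?![^\W\d_]|[{ACCENTS}])"
)


def genitive_endings(text):
    """Rewrite -ого/-его to -оvo/-еvo at the end of every word in a phrase."""
    return GENITIVE.sub(r"\1vo\2", text)
```

With this pattern "у его брата" is handled the same way as a bare "его", while "огонь" is left alone because a letter follows the "ого". A word such as "много" is still rewritten, since the pattern cannot tell a genitive from a stem in -г-.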

There are a few problems

The most obvious one first:

  • мно́го (mnógo) (correct): [MODULE CALL REDACTED] (wrong) --Anatoli (обсудить/вклад) 02:11, 17 March 2013 (UTC)
    • Yes, not all words can be converted automatically. Since most words ending in -ого and -его in Russian are genitives, which should be transliterated with -v-, that is the default behaviour of this module. If you are transliterating a word which is definitely not a genitive (e.g. in {{ru-adv}}), invoke it with the parameter nogen: [MODULE CALL REDACTED]. Ignatus (talk) 09:29, 17 March 2013 (UTC)
Thank you for your efforts.
I suggest we should remove lines 17 to 23 altogether:
    --handle genitive endings, which are spelled -ого/-его but transliterated -ovo/-evo
    if not frame.args['nogen'] then
        word = mw.ustring.gsub(word, "([ое][́̀]?)го([́̀]?)$","%1vo%2")
        word = mw.ustring.gsub(word, "([ое][́̀]?)го([́̀]?%A)","%1vo%2")
    end
    --Handle common exception words with ч
    word = mw.ustring.gsub(word, "[А-ЯЁа-яё][А-ЯЁа-яё́̀]*",function (w) return w:gsub("^Что","Što"):gsub("^что","što") end)
Let's have a manual override for these kinds of exceptions. The adjective declension tables will have a note on the pronunciation of "-ого/-его". As for "что", "чтобы", "что-то", "ничто", "конечно", it's easier to add a manual override than to rely on a list of exceptions. --Anatoli (обсудить/вклад) 10:17, 17 March 2013 (UTC)
I agree. Also, are there words in Russian where the stem ends in -g- and they receive -o as an inflectional ending? Like neuter adjectives? —CodeCat 14:39, 17 March 2013 (UTC)
The stem ending in "г" has to be preceded by "о" or "е" for this test. That would be "строго" (both an adverb and an adjective form).
More arguments in favour of removing "что"'s special treatment:
The word "что", pronounced "što" is not always at the beginning of the word, e.g. "кое-что", "ничто". The string "-что-" is pronounced by the rules ("čto") in words like "ничтожный", "ничтожество".
There are other words where "ч" is not pronounced as "š" or there are variant pronunciations. --Anatoli (обсудить/вклад) 22:34, 17 March 2013 (UTC)
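
The argument above can be checked with a small Python sketch. The word list and the pronunciations come from this discussion; `rule_fires`, `missed_by_rule` and `with_override` are hypothetical names, and `with_override` only illustrates the idea of letting a manually supplied tr= value win over the automatic default.

```python
import re

# True means "ч" is read as "š", as stated in the discussion above.
CH_AS_SH = {
    "что": True,
    "чтобы": True,
    "ничто": True,
    "конечно": True,
    "ничтожный": False,
    "ничтожество": False,
}


def rule_fires(word):
    """Mimic the word-initial "^Что"/"^что" exception from the module."""
    return re.match("[Чч]то", word) is not None


def missed_by_rule():
    """Words pronounced with "š" that the word-initial rule does not catch."""
    return sorted(w for w, sh in CH_AS_SH.items() if sh and not rule_fires(w))


def with_override(automatic, tr=None):
    """Prefer a manually supplied transliteration over the automatic one."""
    return tr if tr else automatic
```

On this list the word-initial rule misses "конечно" and "ничто", which is the case for handling such words with a manual override.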


How can this be used from another Lua module?

It currently requires a frame, which means it can't be used from Lua. Can this be fixed please? —CodeCat 02:03, 11 April 2013 (UTC)

I don't think I can help but Module:ko-translit (by Ruakh) is written differently and uses calls to another module - Module:ko-hangul. --Anatoli (обсудить/вклад) 03:36, 11 April 2013 (UTC)
I see. If I were to change this module to work like that one, then all the current uses of the "tr" function from within templates will break. That isn't a bad thing necessarily as long as someone is on standby to fix them all. Would you be so kind? —CodeCat 12:36, 11 April 2013 (UTC)
Why are you changing? To convert Russian verb templates to Lua? I can change while there are not so many calls from templates at the moment, judging by [1]. Will we still be able to call the module from templates? --Anatoli (обсудить/вклад) 12:44, 11 April 2013 (UTC)
Yes, it should still be callable from templates. I could also decide not to make this module work the same as Module:ko-translit but then that would be a bit inconsistent. I will ask Ruakh what he thinks. —CodeCat 12:56, 11 April 2013 (UTC)
Yeah, it's annoying that it's so difficult, in the general case, to make a function that works smoothly from both Lua and templates. My general preference is to make functions that work smoothly from Lua, and then if needed to make a wrapper that can be called from templates, since the opposite approach is so obviously bad. But in this specific case, it's easy to make this function work both ways, so I've gone ahead and done so. —RuakhTALK 14:24, 12 April 2013 (UTC)

Module:ru-translit's signature and call syntax match those of a bunch of other transliteration modules. You could probably change the code but leave the signature as is? The difference from Module:ko-translit is that it's invoked as {{#invoke:ko-translit|main|tr|한국어}} (also with optional params) --Anatoli (обсудить/вклад) 13:10, 11 April 2013 (UTC)

I have changed both the code and the signature (this entire discussion is about changing the signature to make it callable from Lua, so your suggestion that "You could probably change the code but leave the signature as is?" doesn't really make sense), but in a compatible way that shouldn't break existing uses. —RuakhTALK 14:24, 12 April 2013 (UTC)
@Ruakh. Thank you. My comment about signature was to CodeCat. By signature I meant the module name, the function name, number and type of parameters - in other words what you just did. --Anatoli (обсудить/вклад) 14:41, 12 April 2013 (UTC)
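
One common way to make a single function callable both from templates and from other code is to accept either a frame or a plain string. The sketch below shows that idea in Python; it is an assumption about the general approach, not a description of the change actually made to the module. `Frame` is a hypothetical stand-in for the frame object, and the letter table is a toy.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical stand-in for the frame object a template call passes in."""
    args: dict = field(default_factory=dict)


# Toy letter table, just enough for the example; not the module's real table.
_TOY_TABLE = str.maketrans({"д": "d", "а": "a", "н": "n", "е": "e", "т": "t"})


def tr(text):
    """Accept either a Frame (template call) or a plain string (direct call)."""
    if isinstance(text, Frame):
        text = text.args[1]  # the first unnamed template argument
    return text.translate(_TOY_TABLE)
```

Here `tr("да")` and `tr(Frame({1: "да"}))` take the same path after the first two lines, so existing template calls keep working while other code can pass a string directly.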

Transliterate ё as jó instead of jo?

When there is an accent mark on some forms of a word, it is a bit strange when there is none on this one. So, блую́ appears as blujú but блуёт appears as blujot without any accent. That seems a bit inconsistent. —CodeCat 16:36, 12 April 2013 (UTC)

You probably mean блюю́ and блюёт? There's a comment on WT:RU TR: The vowel “ё” is normally stressed in native Russian words, but occasionally it may be necessary to show the stress for this letter: “ё́”. A few exceptions are when multipart words with ё have stresses on other syllables (трёхме́стный - three-seater (adj)) and some rare loanwords. It looks a bit ugly with a stress and Russians never put accent on it. No template stresses it here either. The dots serve as a pronunciation indicator, since most of the time "ё" is written as "е" causing confusion. --Anatoli (обсудить/вклад) 16:46, 12 April 2013 (UTC)

I'm not sure how to solve that, then... —CodeCat 16:48, 12 April 2013 (UTC)
Let's change jo to jó in the module. --Anatoli (обсудить/вклад) 23:31, 12 April 2013 (UTC)
But if it's like you said, then that might cause a word to have two accents in it, if it contains two ё's. —CodeCat 23:33, 12 April 2013 (UTC)
It's OK to transliterate трёхме́стный as trjóxméstnyj and четырёхугольник as četyrjóxugólʹnik with a second stress, at least for the module. --Anatoli (обсудить/вклад) 23:37, 12 April 2013 (UTC)

@CodeCat. I like your idea of transliterating ё selectively, as you suggested on WT:RU TR. So monosyllabic words such as пёс/чёрт would become pjos/čort (no accent); polysyllabic пёстрый/жёлтый would become pjóstryj/žóltyj; and a polysyllabic word with ё and an acute accent on another syllable would be stressed only on the syllable with the accent: чёрно-бе́лый - čorno-bélyj? This might take a bit of coding, but it would be great if you could do it, please. --Anatoli (обсудить/вклад) 05:40, 16 April 2013 (UTC)

I have realised that as well... the module would have to split the text into words first, and then put it back together again later. I have looked into a way to make it work, but I'm not really sure how to write the code. Something like чёрно-белый would come out as čórno-belyj, but what should чёрно-бе́лый become? čórno-bélyj or čorno-bélyj? In other words, does the - separate words that have individual stress, or not? And if so, is that for all words or only some? I'm beginning to think that this may not be as easy as it seemed at first. —CodeCat 12:39, 16 April 2013 (UTC)
It would be easier if it was designed for single words, wouldn't it? :) Let's treat words with "-" as solid words with one accent, so "чёрно-бе́лый" (black and white) should become "čorno-bélyj", but I'll use better examples without "-":
"трёхме́стный" (three-seated), "четырёхуго́льный" (quadrangular) ideally should become "trjoxméstnyj", "četyrjoxugólʹnyj"
четырёхколёсный (two "ё"), no stress at all (četyrjoxkoljosnyj) or two stresses (četyrjóxkoljósnyj), whatever is easier.
Please let me know if you have questions or suggestions. These situations are rare, so it's not critical. Even "trjóxméstnyj", "četyrjóxugólʹnyj" do not look terrible, one might consider the words as having two accents, they are compound words, anyway. --Anatoli (обсудить/вклад) 13:38, 16 April 2013 (UTC)
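
The selective rule discussed in this thread could be sketched in Python like this. It works on the Cyrillic input and only decides where a stress mark should be added to ё before transliteration; `mark_yo_stress` is a hypothetical name, hyphenated words are treated as one word as suggested above, and for words with two ё and no other accent the sketch picks the "two stresses" option.

```python
import unicodedata

VOWELS = "аеёиоуыэюяАЕЁИОУЫЭЮЯ"
ACUTE = "\u0301"  # combining acute accent
YO_LOWER, YO_UPPER = "\u0451", "\u0401"  # ё, Ё


def mark_yo_stress(word):
    """Add an acute to ё only in polysyllabic words with no explicit stress."""
    word = unicodedata.normalize("NFC", word)  # make sure ё is one code point
    syllables = sum(ch in VOWELS for ch in word)
    if syllables <= 1 or ACUTE in word:
        # Monosyllables stay bare; an explicit accent elsewhere wins.
        return word
    return (word.replace(YO_LOWER, YO_LOWER + ACUTE)
                .replace(YO_UPPER, YO_UPPER + ACUTE))
```

So пёс is returned unchanged, жёлтый gets an accent on ё, and чёрно-бе́лый keeps only the accent it already has.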

Problems continued

Words with hyphen and passing head argument have problems --user:Dixtosa 18:43, 31 May 2013 (UTC)

This should be fixed by adding {{delink}} to all templates. But I wonder why this can't be done at the level of the module. DTLHS (talk) 18:52, 31 May 2013 (UTC)

Capital "Е" - Е́сли, Если

Capital "Е" without a stress mark is not transliterated properly: [MODULE CALL REDACTED] currently gives "Jésli, Esli", it should be "Jésli, Jesli". For some reason in делать из мухи слона the stressed "Е́сли" is "Ésli". --Anatoli (обсудить/вклад) 00:02, 17 July 2013 (UTC)

Thanks for fixing, Z! --Anatoli (обсудить/вклад) 22:49, 25 July 2013 (UTC)


This edit.

   word = mw.ustring.gsub(word, "^Ѣ","Jě")
   word = mw.ustring.gsub(word, "^ѣ","jě")
   word = mw.ustring.gsub(word, "([^Ѐ-ӿ])Ѣ","%1Jě")
   word = mw.ustring.gsub(word, "([^Ѐ-ӿ])ѣ","%1jě")

What is going on here? Michael Z. 2013-10-21 15:36 z

I don't understand? What are you asking? See ѣсть for an example of an entry that is affected by the change. —CodeCat 15:41, 21 October 2013 (UTC)
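
For readers wondering the same thing, the four quoted lines read as follows: ѣ at the very start of the text, or directly after any character outside the Cyrillic block (the range written as Ѐ-ӿ), is given an initial "j". A Python sketch of just those four substitutions is below; `initial_yat` is a hypothetical name, and what the module does with ѣ in other positions is not shown in the quoted excerpt.

```python
import re

YAT_UPPER, YAT_LOWER = "\u0462", "\u0463"  # Ѣ, ѣ
E_CARON = "\u011b"                         # ě
# Start of string, or one character outside U+0400..U+04FF (Ѐ-ӿ in the Lua).
BEFORE = "(^|[^\u0400-\u04FF])"


def initial_yat(text):
    """Rewrite word-initial Ѣ/ѣ as Jě/jě, leaving all other characters alone."""
    text = re.sub(BEFORE + YAT_UPPER, "\\1J" + E_CARON, text)
    text = re.sub(BEFORE + YAT_LOWER, "\\1j" + E_CARON, text)
    return text
```

In ѣсть the first letter is rewritten, while a ѣ that follows another Cyrillic letter is left for the rest of the transliteration to handle.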