Module talk:ru-translit

Adding word stress

Will {{#invoke:ru-translit|tr|сло́во}} (with an accent) return "slóvo"? --Anatoli (обсудить/вклад) 02:09, 14 March 2013 (UTC)

You can test it yourself. Look here: [MODULE CALL REDACTED] —Μετάknowledgediscuss/deeds 02:19, 14 March 2013 (UTC)
I see now how you can test it! Does it work for longer texts?
Testing on a random news text:
TEST: [MODULE CALL REDACTED] --Anatoli (обсудить/вклад) 02:34, 14 March 2013 (UTC)

The code is too opaque, I don't understand it!

I am afraid that this code was written to be so clever that I can't understand it. The variable names mean nothing, there are barely any comments to explain what each step does and why. How does the module actually approach the problem? What is the flag parameter for? I would really like this module to be cleaned up and made more readable. This is a wiki after all, so everyone with enough knowledge of Lua should be able to easily edit this, and the last thing we need is more arcane code that nobody except the creator can maintain. —CodeCat 03:22, 14 March 2013 (UTC)

Hard to follow indeed.
  1. "if not mw.ustring.match(flag,"г") then word=mw.ustring.gsub(word,"([ое][́̀]?)го([́̀]?)$","%1vo%2")" romanises "-ого"/"-его" as "-ovo"/"-(j)evo"
  2. "word = mw.ustring.gsub(word,"([АОУЫЕЯЁЮИЕаоуыэяёюиеъь][́̀]?)е","%1je");" romanises Cyrillic "е" as "je", not "e" after any of "АОУЫЕЯЁЮИЕаоуыэяёюиеъь".
  3. "word = mw.ustring.gsub(word,"([жшчщЖШЧЩ])ё","%1o");" romanises "ё" as "o" after any of жшчщЖШЧЩ. --Anatoli (обсудить/вклад) 06:17, 14 March 2013 (UTC)
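Rules 2 and 3 above can be tried outside the module. The following is a minimal Python sketch, not the module's Lua. It assumes the letter list is meant to include both Э and the capital Ъ/Ь, and the names `apply_context_rules` and `BEFORE_JE` are made up for the illustration. Only the two context rules are applied, so the rest of the word stays in Cyrillic.

```python
import re

ACCENT = "[\u0301\u0300]?"  # optional combining acute or grave accent
# Assumed list of letters after which "е" is written "je":
# vowels plus the hard and soft signs, in both cases.
BEFORE_JE = "АОУЫЭЯЁЮИЕаоуыэяёюиеЪЬъь"


def apply_context_rules(word):
    # Rule 3: "ё" directly after ж, ш, ч, щ is romanised as Latin "o".
    word = re.sub("([жшчщЖШЧЩ])ё", r"\1o", word)
    # Rule 2: "е" after one of the listed letters (and an optional accent)
    # is romanised as "je" instead of plain "e".
    word = re.sub("([" + BEFORE_JE + "]" + ACCENT + ")е", r"\1je", word)
    return word
```

For example, `apply_context_rules("поезд")` rewrites only the "е" that follows the vowel "о", and leaves a word such as "лес" untouched for the plain letter table to handle.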
I think I am starting to understand the general idea. But how does the code know when to transliterate г as v? Does it have something to do with the flag parameter? Personally, I don't think the module should be capable of handling such irregular exceptions. It should provide a sensible default, but the default should be able to be overridden if necessary. That is what the tr= parameter would be for, after all... —CodeCat 14:01, 14 March 2013 (UTC)
That beats me, we need Ignatus to reply.
Romanising "-ого"/"-его" as "-ovo"/"-(j)evo" should NOT be done via the tool. Russian has words such as "много" and "ого" where "г" is pronounced as written (or, as a variant in "ого", as /h/). Override manually.
Same with "Чч" as "š", in что, чтобы, конечно. Override manually.
Consistent change - "Ё,ё" as "o" after жшчщЖШЧЩ - OK.
Consistent change - "Е,е" as "je" after АОУЫЕЯЁЮИЕаоуыэяёюиеъь - OK. Add ALL capitals Ъ, Ь to the list.
Don't use "ɛ" at all! It's reserved for foreign words where "е" after a consonant (бвгдзклмнпрстфх, excluding жшчщЖШЧЩ and "цЦ") is pronounced like "э". I don't follow the logic of the code, but such cases should be overridden manually. In short, "Э, э" is always "e", and "Е, е" is "e", or "je" after АОУЫЕЯЁЮИЕЪЬаоуыэяёюиеъь.
This should make the code simpler. Please ask if it's confusing. --Anatoli (обсудить/вклад) 22:34, 14 March 2013 (UTC)
Well, let me reply. Yes, maybe my idea with flags was not good. The simplifications you described can be accepted, except that it's better to handle -ого/-его by default: most words ending in them are genitives, so there should just be a switch-off for cases when they are definitely not, e.g. for {{ru-verb}}. "Что" should be listed as an exception since it appears very often; other words with ч=ш may be transliterated manually. Exceptions with е=э are frequent taken together, but each individual word is not very frequent, so they will commonly need manual input. Maybe we should use another way to mark special letters for translit and inflection, like marking them in place once in the template (see my talk page for a suggestion). And, OK, I don't like that the module now has different functions for single words and phrases; we should rename phr to tr, and use the current tr internally if it is needed at all. Ignatus (talk) 14:13, 15 March 2013 (UTC)
I would prefer it if exceptions are not added to the module at all, but are just supplied with the tr= parameter. So the module is only used to provide a default. —CodeCat 14:22, 15 March 2013 (UTC)
  • The module was rewritten. It transliterates any phrase with the function tr in the simplest manner, except that words starting with что and ending in ого and его are always treated specially; if genitives are obviously not expected, as for {{ru-adv}}, the parameter nogen= with any value can be passed to #invoke. Please restore the doc subpage and document this; I'm now going to make changes to the affected templates. Ignatus (talk) 13:02, 16 March 2013 (UTC)
    • If the transliterator handles phrases (which it should), then the patterns need to match not just on the beginning or end of the string, but also next to characters that separate words like spaces and punctuation. Currently, it would correctly transliterate "jego" but not "u jego brata". —CodeCat 14:07, 16 March 2013 (UTC)
      • [MODULE CALL REDACTED] - already fixed. Ignatus (talk) 18:32, 16 March 2013 (UTC)
        • Oh... ok? I'm confused now because I don't see anything in the code that makes that work... —CodeCat 19:00, 16 March 2013 (UTC)
          • Line 20. %A is used to determine end of word. Ignatus (talk) 19:52, 16 March 2013 (UTC)
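The word-boundary point in this exchange can be shown with a small Python sketch. It is not the module's Lua: it stands in for the `%A` class with a negative lookahead on a Cyrillic letter, which is an assumption about what counts as "end of word", and the name `genitive_g_to_v` is made up for the illustration.

```python
import re

ACCENTS = "\u0301\u0300"  # combining acute and grave
# "ого"/"его" (with optional accents) that is NOT followed by another
# Cyrillic letter, i.e. at the end of a word or of the whole string.
GENITIVE_ENDING = re.compile(
    "([ое][" + ACCENTS + "]?)го([" + ACCENTS + "]?)(?![А-Яа-яЁё])"
)


def genitive_g_to_v(text):
    # Keep the vowels and accents, replace only the "г" with Latin "v".
    return GENITIVE_ENDING.sub(r"\1vo\2", text)
```

With this pattern the ending is caught both in "его" alone and in "у его брата", while "огород" is left alone because a letter follows. It still rewrites "много", which is the false positive raised later on this page.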

There are a few problems

The most obvious one first:

  • мно́го(mnógo) (correct): [MODULE CALL REDACTED] (wrong) --Anatoli (обсудить/вклад) 02:11, 17 March 2013 (UTC)
    • Yes, not all words can be converted automatically. Since most words ending in -ого and -его in Russian are genitives that should be transliterated with -v-, that is the default behaviour for this module. If you are transliterating a word which is definitely not a genitive (e.g. in {{ru-adv}}), invoke it with the parameter nogen: [MODULE CALL REDACTED]. Ignatus (talk) 09:29, 17 March 2013 (UTC)
Thank you for your efforts.
I suggest we should remove lines 17 to 23 altogether:
    --handle genitive endings, which are spelled -ego but transliterated -evo
    if not frame.args['nogen'] then
        word = mw.ustring.gsub(word, "([ое][́̀]?)го([́̀]?)$","%1vo%2")
        word = mw.ustring.gsub(word, "([ое][́̀]?)го([́̀]?%A)","%1vo%2")
    end
    --Handle common exception words with ч
    word = mw.ustring.gsub(word, "[А-ЯЁа-яё][А-ЯЁа-яё́̀]*",function (w) return w:gsub("^Что","Što"):gsub("^что","što") end)
Let's have a manual override for these kinds of exceptions. The adjective declension tables will have a note on the pronunciation of "-ого/-его". As for "что", "чтобы", "что-то", "ничто" and "конечно", it's easier to add a manual override than to rely on a list of exceptions. --Anatoli (обсудить/вклад) 10:17, 17 March 2013 (UTC)
I agree. Also, are there words in Russian where the stem ends in -g- and they receive -o as an inflectional ending? Like neuter adjectives? —CodeCat 14:39, 17 March 2013 (UTC)
The "г" at the end of the stem has to be preceded by "о" or "е" for this test. That would be "строго" (both an adverb and an adjective form).
More arguments in favour of removing "что"'s special treatment:
The word "что", pronounced "što" is not always at the beginning of the word, e.g. "кое-что", "ничто". The string "-что-" is pronounced by the rules ("čto") in words like "ничтожный", "ничтожество".
There are other words where "ч" is not pronounced as "š" or there are variant pronunciations. --Anatoli (обсудить/вклад) 22:34, 17 March 2013 (UTC)


How can this be used from another Lua module?

It currently requires a frame, which means it can't be used from Lua. Can this be fixed please? —CodeCat 02:03, 11 April 2013 (UTC)

I don't think I can help but Module:ko-translit (by Ruakh) is written differently and uses calls to another module - Module:ko-hangul. --Anatoli (обсудить/вклад) 03:36, 11 April 2013 (UTC)
I see. If I were to change this module to work like that one, then all the current uses of the "tr" function from within templates will break. That isn't a bad thing necessarily as long as someone is on standby to fix them all. Would you be so kind? —CodeCat 12:36, 11 April 2013 (UTC)
Why are you changing it? To convert the Russian verb templates to Lua? I can change the calls while there are not so many from templates at the moment, judging by [1]. Will we still be able to call the module from templates? --Anatoli (обсудить/вклад) 12:44, 11 April 2013 (UTC)
Yes, it should still be callable from templates. I could also decide not to make this module work the same as Module:ko-translit but then that would be a bit inconsistent. I will ask Ruakh what he thinks. —CodeCat 12:56, 11 April 2013 (UTC)
Yeah, it's annoying that it's so difficult, in the general case, to make a function that works smoothly from both Lua and templates. My general preference is to make functions that work smoothly from Lua, and then if needed to make a wrapper that can be called from templates, since the opposite approach is so obviously bad. But in this specific case, it's easy to make this function work both ways, so I've gone ahead and done so. —RuakhTALK 14:24, 12 April 2013 (UTC)

Module:ru-translit's signature and a call match a bunch of other transliteration modules. You could probably change the code but leave the signature as is? The difference with Module:ko-translit is that it's invoked {{#invoke:ko-translit|main|tr|한국어}} (also with optional params) --Anatoli (обсудить/вклад) 13:10, 11 April 2013 (UTC)

I have changed both the code and the signature (this entire discussion is about changing the signature to make it callable from Lua, so your suggestion that "You could probably change the code but leave the signature as is?" doesn't really make sense), but in a compatible way that shouldn't break existing uses. —RuakhTALK 14:24, 12 April 2013 (UTC)
@Ruakh. Thank you. My comment about signature was to CodeCat. By signature I meant the module name, the function name, number and type of parameters - in other words what you just did. --Anatoli (обсудить/вклад) 14:41, 12 April 2013 (UTC)

Transliterate ё as jó instead of jo?

When there is an accent mark on some forms of a word, it is a bit strange when there is none on this one. So, блую́ appears as blujú but блуёт appears as blujot without any accent. That seems a bit inconsistent. —CodeCat 16:36, 12 April 2013 (UTC)

You probably mean блюю́ and блюёт? There's a comment on WT:RU TR: The vowel “ё” is normally stressed in native Russian words, but occasionally it may be necessary to show the stress for this letter: “ё́”. A few exceptions are when multipart words with ё have stresses on other syllables (трёхме́стный - three-seater (adj)) and some rare loanwords. It looks a bit ugly with a stress and Russians never put accent on it. No template stresses it here either. The dots serve as a pronunciation indicator, since most of the time "ё" is written as "е" causing confusion. --Anatoli (обсудить/вклад) 16:46, 12 April 2013 (UTC)

I'm not sure how to solve that, then... —CodeCat 16:48, 12 April 2013 (UTC)
Let's change jo to jó in the module. --Anatoli (обсудить/вклад) 23:31, 12 April 2013 (UTC)
But if it's like you said, then that might cause a word to have two accents in it, if it contains two ё's. —CodeCat 23:33, 12 April 2013 (UTC)
It's OK to transliterate трёхме́стный as trjóxméstnyj and четырёхугольник as četyrjóxugólʹnik with a second stress, at least for the module. --Anatoli (обсудить/вклад) 23:37, 12 April 2013 (UTC)

@CodeCat. I like your idea of transliterating ё selectively, as you suggested on WT:RU TR. So monosyllabic words such as пёс/чёрт would become pjos/čort (no accent); polysyllabic words such as пёстрый/жёлтый would become pjóstryj/žóltyj; and in polysyllabic words with a ё and an acute accent on another syllable, only the syllable with the acute would be marked: чёрно-белый - čorno-bélyj? This might take a bit of coding, but it would be great if you could do it, please. --Anatoli (обсудить/вклад) 05:40, 16 April 2013 (UTC)

I have realised that as well... the module would have to split the text into words first, and then put it back together again later. I have looked into a way to make it work, but I'm not really sure how to write the code. Something like чёрно-белый would come out as čórno-belyj, but what should чёрно-бе́лый become? čórno-bélyj or čorno-bélyj? In other words, does the - separate words that have individual stress, or not? And if so, is that for all words or only some? I'm beginning to think that this may not be as easy as it seemed at first. —CodeCat 12:39, 16 April 2013 (UTC)
It would be easier if it was designed for single words, wouldn't it? :) Let's treat words with "-" as solid words with one accent, so "чёрно-бе́лый" (black and white) should become "čorno-bélyj", but I'll use better examples without "-":
"трёхме́стный" (three-seated), "четырёхуго́льный" (quadrangular) ideally should become "trjoxméstnyj", "četyrjoxugólʹnyj"
четырёхколёсный (two "ё"), no stress at all (četyrjoxkoljosnyj) or two stresses (četyrjóxkoljósnyj), whatever is easier.
Please let me know if you have questions or suggestions. These situations are rare, so it's not critical. Even "trjóxméstnyj", "četyrjóxugólʹnyj" do not look terrible, one might consider the words as having two accents, they are compound words, anyway. --Anatoli (обсудить/вклад) 13:38, 16 April 2013 (UTC)
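The selective rule discussed in this thread can be sketched as follows. This is a Python illustration under stated assumptions, not code from the module: hyphenated compounds are treated as one word, "ё" is assumed to be the precomposed letter, a word with several "ё" and no explicit acute gets a mark on each of them (one of the two options mentioned above), and the names `mark_yo_stress` and `VOWELS` are hypothetical. The mark is added on the Cyrillic side, on the assumption that a later letter table would turn "ё" plus acute into "jó".

```python
import re

ACUTE = "\u0301"  # combining acute accent
VOWELS = "аеёиоуыэюяАЕЁИОУЫЭЮЯ"


def mark_yo_stress(text):
    def fix_word(match):
        word = match.group(0)
        syllables = sum(ch in VOWELS for ch in word)
        # Monosyllables need no mark; an explicit acute elsewhere wins.
        if syllables < 2 or ACUTE in word:
            return word
        # Otherwise put an acute on every "ё" in the word.
        return re.sub("([ёЁ])", r"\1" + ACUTE, word)

    # Words are runs of non-space characters, so "чёрно-белый" is one word.
    return re.sub(r"\S+", fix_word, text)
```

Under these assumptions "пёс" stays unmarked, "жёлтый" gets an acute on the "ё", and "трёхме́стный" is returned unchanged because it already carries an acute.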

Problems continued

Words with hyphen and passing head argument have problems --user:Dixtosa 18:43, 31 May 2013 (UTC)

This should be fixed by adding {{delink}} to all templates. But I wonder why this can't be done at the level of the module. DTLHS (talk) 18:52, 31 May 2013 (UTC)

Capital "Е" - Е́сли, Если

Capital "Е" without a stress mark is not transliterated properly: [MODULE CALL REDACTED] currently gives "Jésli, Esli", it should be "Jésli, Jesli". For some reason in делать из мухи слона the stressed "Е́сли" is "Ésli". --Anatoli (обсудить/вклад) 00:02, 17 July 2013 (UTC)
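The expected behaviour described here (word-initial "Е" becomes "Je" with or without a stress mark) can be written down as a small Python sketch. It is only an illustration, not the fix that was applied to the module. "Word-initial" is assumed to mean "not preceded by a Cyrillic letter or a combining accent", and `initial_ye` is a made-up name.

```python
import re

# A character that, when it precedes "е", means we are inside a word.
CYRILLIC_OR_ACCENT = "[\u0400-\u04FF\u0301\u0300]"


def initial_ye(text):
    # Capital and small "е" are handled separately to keep the case.
    text = re.sub("(?<!" + CYRILLIC_OR_ACCENT + ")Е", "Je", text)
    text = re.sub("(?<!" + CYRILLIC_OR_ACCENT + ")е", "je", text)
    return text
```

Because the accent follows the letter rather than preceding it, the same substitution applies to "Е́сли" and to "Если"; an "е" inside a word such as "нет" is not touched.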

Thanks for fixing, Z! --Anatoli (обсудить/вклад) 22:49, 25 July 2013 (UTC)

Ѣ

This edit.

   word = mw.ustring.gsub(word, "^Ѣ","Jě")
   word = mw.ustring.gsub(word, "^ѣ","jě")
   word = mw.ustring.gsub(word, "([^Ѐ-ӿ])Ѣ","%1Jě")
   word = mw.ustring.gsub(word, "([^Ѐ-ӿ])ѣ","%1jě")

What is going on here? Michael Z. 2013-10-21 15:36 z

I don't understand? What are you asking? See ѣсть for an example of an entry that is affected by the change. —CodeCat 15:41, 21 October 2013 (UTC)
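Read literally, the four substitutions quoted above turn "ѣ" into "jě" only at the start of the string or after a character outside the Cyrillic block (Ѐ to ӿ, i.e. U+0400 to U+04FF); a "ѣ" after a Cyrillic letter is left for some other rule. The Python sketch below mirrors that reading. It is a port for illustration only, and the function name `yat_word_initial` is made up.

```python
import re

NOT_CYRILLIC = "[^\u0400-\u04FF]"  # same range as [^Ѐ-ӿ] in the Lua patterns


def yat_word_initial(word):
    # Start of the string.
    word = re.sub("^Ѣ", "Jě", word)
    word = re.sub("^ѣ", "jě", word)
    # After a non-Cyrillic character such as a space or a hyphen.
    word = re.sub("(" + NOT_CYRILLIC + ")Ѣ", r"\1Jě", word)
    word = re.sub("(" + NOT_CYRILLIC + ")ѣ", r"\1jě", word)
    return word
```

So "ѣсть" and "не ѣсть" both get "jě", while the "ѣ" in "хлѣбъ" is not matched by any of the four patterns.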

Discussion leading up to making -го as -vo and что as što be the default

(moved from Template talk:ru-ux)

@Atitarev, Cinemantique, Wikitiki89 I created this template along with {{ru-xlit}} to make it easier to create long usage examples in Russian without having to specify manual transliteration to handle adjectival -го, что, and other such things. It is like {{ux|ru}} but supports three extra parameters: (signature at top in case ping isn't sent: Benwing2 (talk) 22:47, 6 January 2016 (UTC))

  1. |adj=: Transliterate -го as -vo
  2. |shto=: Transliterate что as što
  3. |sub=: Apply arbitrary Lua pattern substitutions to the Cyrillic text, esp. to handle cases where е should be transliterated as ɛ.
  • I'm thinking maybe adj= and shto= should be made the default, so if you don't want them you need to turn them off with adj=n or shto=n. What do you think?
  • Also, I'm thinking of adding support for this to templates like ru-phrase and ru-adj. Sound good?

Benwing2 (talk) 22:47, 6 January 2016 (UTC)

    • I don't mind. Please be aware that чтобы, кое-что, что-нибудь, что-то, что-либо, ничто, etc. also should use "š". These words are derivations but are pronounced regularly - нечто, ничтожество, ничтожный, etc., should use "č". Unrelated words with "-что-" - уничтожать, почтовый, etc. should use "č". It's not so straightforward but probably feasible for you.
    • I would like Russian references to use "russkovo" instead of "russkogo" in Category:Russian reference templates to make it consistent but Vahagn will probably oppose. BTW, "russkogo" is used more often in referenced books but "russkovo" is also present, also in book titles. --Anatoli T. (обсудить/вклад) 23:22, 6 January 2016 (UTC)
Currently the code for что substitution checks to see if there is a word boundary at both ends, so it will also apply to кое-что, что-нибудь, что-то, что-либо, etc. but not to чтобы or ничто, which I can special-case. Any other such words? Benwing2 (talk) 23:25, 6 January 2016 (UTC)
I would need a list of Russian words containing "что". There won't be too many. enwiki words should be sufficient. --Anatoli T. (обсудить/вклад) 23:32, 6 January 2016 (UTC)
OK, your list confirms what I said: there are no other such words. Words with final -его/-ого will also need a list of exceptions ("g", not "v"): много, немного, лого, лего, сого, ого and possibly some loanwords, but сегодня and its derivations also use "v".--Anatoli T. (обсудить/вклад) 23:56, 6 January 2016 (UTC)
Here it is (this includes expressions with что):

Benwing2 (talk) 23:41, 6 January 2016 (UTC)

@Wikitiki89 Did you see the above discussion? I am thinking of making adj=y and shto=y the default for ru-ux and maybe other things like ru-phrase after accounting for words like много, so you'd have to turn them off with noadj=y or noshto=y. The purpose is to avoid having to have lots of manual transliterations. Benwing2 (talk) 00:33, 12 January 2016 (UTC)

Possibility of making special-casing for -го and что the default in transliteration

@Atitarev, Cinemantique, Wikitiki89 I've now implemented special-casing for -го and что in {{ru-ux}} and made it the default. The special-casing for -го (in genitives) is carefully written: It applies specifically to -ого/-его/-аго at the end of a word or followed by -ся, and it also catches сегодня and words beginning with сегодняшн-, and per Anatoli it has exceptions to ensure that много,немного,лого,лего,сого,ого don't get modified. The special-casing for что is also careful to apply only to что,чтобы,чтоб,ничто as whole words. (Note that "end of word" allows for a following hyphen, so cases like кого-либо, что-нибудь will be handled correctly.) Benwing2 (talk) 20:21, 15 January 2016 (UTC)
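The rules listed in this post can be sketched in Python as a phonetic respelling step. This is not the code in {{ru-ux}} or the module. It assumes that words are runs of Cyrillic letters and accents (so a hyphen ends a word), that the special cases are expressed by respelling г as в and ч as ш before any letter table is applied, and that only the exception words named in the post are covered; later additions such as Того or дорого are left out. All names in the block are hypothetical.

```python
import re

ACCENTS = "\u0301\u0300"
WORD = re.compile("[А-Яа-яЁё" + ACCENTS + "]+")
# -ого/-его/-аго at the end of a word, optionally followed by -ся.
GENITIVE = re.compile("([оеа][" + ACCENTS + "]?)г(о[" + ACCENTS + "]?(?:ся)?)$")
# сегодня itself and words beginning with сегодняшн-.
SEGODNYA = re.compile(
    "^(се)г(о[" + ACCENTS + "]?дня(?:шн.*)?)$", re.IGNORECASE
)
G_EXCEPTIONS = {"много", "немного", "лого", "лего", "сого", "ого"}
SHTO_WORDS = {"что", "чтобы", "чтоб", "ничто"}


def bare(word):
    # Lower-case the word and drop accents for the lookups.
    return re.sub("[" + ACCENTS + "]", "", word).lower()


def respell_word(match):
    word = match.group(0)
    key = bare(word)
    if key in SHTO_WORDS:
        return word.replace("ч", "ш").replace("Ч", "Ш")
    if key in G_EXCEPTIONS:
        return word
    word = SEGODNYA.sub(r"\1в\2", word)
    return GENITIVE.sub(r"\1в\2", word)


def respell(text):
    return WORD.sub(respell_word, text)
```

With this sketch "кого-либо" and "что-нибудь" are handled because the hyphen ends the first word, "много" is protected by the exception set, and "ничтожный" is untouched because only whole words are compared against the что list. A letter-by-letter table applied afterwards would then give -vo and što.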

What do people think about making this the default for all transliteration? This would solve a lot of issues that come up currently in various places, e.g. in бомж, the expansion is rendered лицо́ без определённого ме́ста жи́тельства(licó bez opredeljónnovo mésta žítelʹstva) with -ogo instead of -ovo. It could still be overridden using tr=, if necessary. The special-casing shouldn't slow things down due to the way it's written. Benwing2 (talk) 20:21, 15 January 2016 (UTC)
  • Let's make it for transcription, too.--Cinemantique (talk) 21:04, 15 January 2016 (UTC)
    • Good job, Benwing2! Yes, making it default is good. It should now be possible to use the same logic for IPA. The version on ruwiki doesn't seem to use "phon=" (it's not in the documented examples) but maybe it should, with a different wording? @Cinemantique, what do you think?
    • I think it would still be beneficial to show the phonetic respellings, even if these cases are handled automatically. --Anatoli T. (обсудить/вклад) 00:03, 16 January 2016 (UTC)

Того

(moved from Talk:Того)

@Benwing2 I missed this one. Does it need a manual transliteration "Tógo" to distinguish from того́(tovó), which can also be capitalised at the beginning of a sentence? --Anatoli T. (обсудить/вклад) 20:40, 18 January 2016 (UTC)

The stress is consistently different, though, can this be used in your logic for the automatic xlit? --Anatoli T. (обсудить/вклад) 20:42, 18 January 2016 (UTC)
I added stressed То́го to the exceptions. Conceivably I could add unstressed Того there as well, but that would fail if того́ ever occurs at the beginning of a sentence and is written without an accent (and того́ is much more common than То́го). Can того ever occur sentence-initially? Benwing2 (talk) 22:30, 18 January 2016 (UTC)
Yes, it can. I think for ambiguous cases like sentence-initial "Того" without a stress mark (unknown sense), we should use "g" in translit and [ɡ] in IPA. Adding a stress mark would fix it. (I may have missed other loanwords with final "-ого" or "-его" where it should be "g" but I can't think of others at the moment). --Anatoli T. (обсудить/вклад) 22:44, 18 January 2016 (UTC)
OK, I'll implement that. Benwing2 (talk) 22:46, 18 January 2016 (UTC)

short forms of adjectives in -го

(moved from Talk:дорого)

@Atitarev This should be another exception to the /v/ pronunciation right? Benwing2 (talk) 18:27, 10 April 2016 (UTC)

@Benwing2 Yes, please! --Anatoli T. (обсудить/вклад) 20:25, 10 April 2016 (UTC)
@Benwing2 Please also add недо́рого(nedórogo). --Anatoli T. (обсудить/вклад) 20:53, 10 April 2016 (UTC)
@Benwing2 There are more - (не)стро́го, убо́го, поло́го, short neuter adjectives длинноно́го, коротконо́го, кривоно́го. --Anatoli T. (обсудить/вклад) 21:16, 10 April 2016 (UTC)
OK thanks. Benwing2 (talk) 21:35, 10 April 2016 (UTC)
@Atitarev Done. The following should all be handled correctly:
Benwing2 (talk) 02:07, 11 April 2016 (UTC)
@Benwing2 Thank you. I am sorry I missed some terms earlier. I wonder if the search for string "ого" in the final position can be done in ruwikt, so that we could find more (potential) examples? All words with "-legged" suffix (like "длинноногий" - "long-legged") are affected. Need to check all -огий adjectives, if they have short forms, then they will need [ɡ] in pronunciation. --Anatoli T. (обсудить/вклад) 02:39, 11 April 2016 (UTC)
@Cinemantique I'm not sure how to search ruwikt but maybe Cinemantique can help. Benwing2 (talk) 02:50, 11 April 2016 (UTC)

I have tried this but this gives too many results. --Anatoli T. (обсудить/вклад) 03:04, 11 April 2016 (UTC)

@Atitarev Here's the list of pages that are words (mostly adjectives) in -огий. Haven't checked which ones have short forms. Benwing2 (talk) 06:37, 11 April 2016 (UTC)
BTW in -огой are only the following:
None in -егий or -егой. Benwing2 (talk) 06:39, 11 April 2016 (UTC)
@Benwing2 Yes, all of these need the same treatment, if they have short forms. In the -егий-group there's an adjective пе́гий(pégij, piebald, skewbald (esp. of horses)), which can have short forms. --Anatoli T. (обсудить/вклад) 07:51, 11 April 2016 (UTC)
@Benwing2 The short neuter in пегий shows "pévo", it should be "pégo". Pls add it to the exceptions. --Anatoli T. (обсудить/вклад) 02:37, 13 April 2016 (UTC)
@Atitarev Will do. Benwing2 (talk) 03:02, 13 April 2016 (UTC)

итого́

@Atitarev should this be another exception? Benwing2 (talk) 03:25, 29 April 2016 (UTC)

No, it's pronounced "итово́", from и + того́. --Anatoli T. (обсудить/вклад) 03:55, 29 April 2016 (UTC)