Module talk:links

test cases: Module:links/testcases

Alt text

What should language_link do with the alt-text if the main text already contains links? I presume that if given something like [[text|tēxt]], it shouldn't actually use the alt text, just add the language name. So how should it handle this? —CodeCat 21:29, 29 March 2013 (UTC)

Yeah, I made a change for that. --Z 22:47, 29 March 2013 (UTC)
Your change doesn't actually work right. Consider what it would do if text were "[[word|wōrd]] is [[this]]". —CodeCat 22:56, 29 March 2013 (UTC)
Yeah, forgot that, fixed now, I think. --Z 23:27, 29 March 2013 (UTC)

What should we do in cases like {{l|en|a [[text]]|tēxt}}? The current code will return a [[text#English|tēxt]]. I think the value of alt shouldn't affect the output in this case either. If we create more links, all of them will have "tēxt" as their link title. --Z 02:50, 31 March 2013 (UTC)

The alt text should only really be used if there are no links in the text. If there are, it should be ignored, so it should give a [[text#English|text]]. —CodeCat 03:51, 31 March 2013 (UTC)
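A minimal sketch of the rule agreed on in this thread, written in Python so it can be run outside the wiki. It is not the module's Lua code: the function name is borrowed from the discussion, while the regular expression and the lang_name argument (the section anchor) are assumptions made for illustration.

```python
import re

# Matches [[target]] or [[target|display]]; group 2 is None when there is no pipe.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

def language_link(text, alt, lang_name):
    """Hypothetical stand-in for language_link; lang_name is the section anchor."""
    if "[[" not in text:
        # No embedded links: link the whole text and honour alt.
        return "[[%s#%s|%s]]" % (text, lang_name, alt or text)

    def add_section(match):
        target, display = match.group(1), match.group(2)
        return "[[%s#%s|%s]]" % (target, lang_name, display or target)

    # Embedded links present: alt is ignored, each link gets the section anchor.
    return LINK.sub(add_section, text)

print(language_link("a [[text]]", "tēxt", "English"))  # a [[text#English|text]]
```

With "[[word|wōrd]] is [[this]]" as input, each link keeps its own display text and only gains the section anchor, which is the case raised earlier in the thread.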

Annotated link

I made this a function that doesn't receive a frame on purpose. What if another Lua function wants to make a link? It would be a bit silly if it had to call the template... —CodeCat 14:51, 30 March 2013 (UTC)

Ok, feel free to revert. I thought the function was supposed to be used only in l and term. (But we need to add another function which takes the frame table as an argument.) --Z 15:02, 30 March 2013 (UTC)
Well, my intention was that this module be used for anything related to making links. {{l}} and {{term}} aren't the only ones that make links, we also have the form-of templates (which really work like {{term}} because they also have "annotations", but with some extras), and {{head}} which could use this module. Provided that it's made in such a way that nothing is written to work only for {{l}}, at least. —CodeCat 16:32, 30 March 2013 (UTC)
But they are still different, even l and term don't work exactly the same, and the current code is more similar to term -- term is like this: <word> (<tr>, "<gloss>"), and l: <word> (<tr>) <g> ("<gloss>"). --Z 17:02, 30 March 2013 (UTC)
That's why I said it may not be a good idea to try to mimic the templates too closely. Do we actually want them to be different in that way? I think they should both work like {{term}} does, with only a single set of brackets instead of two. —CodeCat 17:16, 30 March 2013 (UTC)
Do we need community consensus to make such a change? The change is minor and probably not important, but the template is widely used... --Z 17:23, 30 March 2013 (UTC)
I don't think so. It doesn't actually change anything that is really important, just a small cosmetic detail, which can easily be changed back (but I doubt anyone would disagree). —CodeCat 19:05, 30 March 2013 (UTC)

"main" and arguments[edit]

What is this function actually meant to do? What are the parameters for? Could it have a better name? Also, the way the parameters are assigned is just wrong; it creates global variables which should be avoided, especially in this case. A better way of doing it would be:

local args = frame:getParent().args
local gloss = args["gloss"] or ""

CodeCat 19:11, 30 March 2013 (UTC)

We need a function like this to make the module accessible from wiki pages. We'll change the name if we feel the need to define more functions for creating links in future, e.g. links in a format that is different from that of term and l (see Template:fa-conj for example, in which transliterations are in a new line). Regarding variables, yeah that's better, I'll change it. --Z 19:22, 30 March 2013 (UTC)
Maybe it should just be called "template_l" so that we know it's just for that template? —CodeCat 19:25, 30 March 2013 (UTC)
Ok, we may change that to main if we decide to use it for term, etc. too. Regarding variables, I think the current form is better. With your change, every variable will always be true, because each one is either some text or an empty string, "", and both of those are true in Lua. Then we can't use expressions like "(alt or text)". --Z 19:31, 30 March 2013 (UTC)
Yes, but there is an advantage too. In templates, you can insert a parameter that might be empty, and if it's empty, no text is inserted there. In Lua, if you try to insert nil into a text, you get a script error. So we will have to decide which is more convenient. In any case, the current form isn't better, it still needs to be changed because it uses global variables. —CodeCat 19:35, 30 March 2013 (UTC)
But does that really happen? So far, the code is written so that it checks that a variable is not nil before adding it to the output text (except when both text and alt are nil, but I think raising an error is actually appropriate in that case). --Z 19:47, 30 March 2013 (UTC)


Aren't we going to support parameters for gender and number or what? --Z 17:36, 31 March 2013 (UTC)

{{l}} supports it, so I suppose we'll have to. I don't really like the idea of putting grammatical information into a template like this, but for now we should focus on making the existing template work so we can convert it. We can discuss actual changes later on. —CodeCat 18:25, 31 March 2013 (UTC)
Also for that reason, please do not add extras like automated transliterations or removal of macrons. Those are not in {{l}} so they shouldn't be in this either, not until after we have converted {{l}} and are sure that it works. —CodeCat 18:28, 31 March 2013 (UTC)
But then again, it's the best time to perform any planned changes, because we are rewriting the code. --Z 19:01, 31 March 2013 (UTC)
It would be if we could start from scratch. Unfortunately, we can't, so we have to incorporate backwards compatibility into our plans. —CodeCat 19:20, 31 March 2013 (UTC)

General linking module?

If this module is going to be used as a general module for linking, then I think it should include the functionality provided by {{wlink}}, {{wlink2}}, and {{makelink}}. Also, most or all of export.template_l should be moved into more general functions. --Yair rand (talk) 22:27, 7 April 2013 (UTC)

What extra functionality do those templates provide exactly? And the problem with moving the functionality into general functions is that currently there is a variety of parameter names for different templates. The purpose of export.template_l is to "convert" these parameters into a standard format, and then forward it to a more general function. I don't know if it does that well enough, but that is the idea. —CodeCat 22:30, 7 April 2013 (UTC)
{{wlink|[[test]]}} = Template:wlink, {{wlink|test}} = Template:wlink. --Yair rand (talk) 22:35, 7 April 2013 (UTC)
I think the module already does that currently. But you can try it to make sure. —CodeCat 22:48, 7 April 2013 (UTC)
I just checked. It does not. It should probably also be able to support {{l-self}}, by the way. --Yair rand (talk) 01:13, 9 April 2013 (UTC)
Oh, I see what happened. It did support it at one point, but to avoid having too many potential problems to deal with when changing over {{l}}, it was removed again. And I think that's a good idea for now. —CodeCat 01:24, 9 April 2013 (UTC)
? You think that additional features should be added after the module is already in use on millions of pages, instead of before? That seems kind of backwards to me. Reworking it to add the rest of the features after it's in use sounds more likely to cause some problems. --Yair rand (talk) 01:41, 9 April 2013 (UTC)
It makes more sense to me if we limit the number of things that can go wrong and need to be fixed. I would rather make sure that the template works before we start adding more things to it rather than after. It also means that when we do add something new and it breaks things, we immediately know what caused it and what to revert. If we try to do everything in one go, we might be swamped with problems and have no choice to roll everything back and start over. It's much easier to make small incremental changes to something you know that works, rather than adding things to something you're not even sure worked before. —CodeCat 01:47, 9 April 2013 (UTC)

altForm from WT:EDIT

In prepare_title, la-utilities is imported if the language is Latin. Rather than setting up a whole bunch of unique functions for every language, perhaps we should just copy over the altForms table from WT:EDIT:

var altForm = {
		ang: {from:"ĀāǢǣĊċĒēĠġĪīŌōŪūȲȳ", to:"AaÆæCcEeGgIiOoUuYy", strip:"\u0304\u0307"}, //macron and above dot
		ar: {strip:"\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652"},
		fa: {strip:"\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652"},
		ur: {strip:"\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652"},
		chl: {from:"ÁáÉéÍíÓóÚú", to:"AaEeIiOoUu", strip:"\u0301"}, //acute accent
		he: {strip:"\u05B0\u05B1\u05B2\u05B3\u05B4\u05B5\u05B6\u05B7\u05B8\u05B9\u05BA\u05BB\u05BC\u05BD\u05BF\u05C1\u05C2"},
		hr: {from:"ȀȁÀàȂȃÁáĀāȄȅÈèȆȇÉéĒēȈȉÌìȊȋÍíĪīȌȍÒòȎȏÓóŌōȐȑȒȓŔŕȔȕÙùȖȗÚúŪū",
		la: {from:"ĀāĒēĪīŌōŪūȲȳ", to:"AaEeIiOoUuYy",strip:"\u0304"}, //macron
		lt: {from:"áãàéẽèìýỹñóõòúù", to:"aaaeeeiyynooouu", strip:"\u0340\u0301\u0303"},
		nci: {from:"ĀāĒēĪīŌōŪūȲȳ", to:"AaEeIiOoUuYy",strip:"\u0304"}, //macron
		ro: {from:"ŞŢşţ", to:"ȘȚșț"},
		ru: {strip:"\u0300\u0301"},
		uk: {strip:"\u0300\u0301"},
		be: {strip:"\u0300\u0301"},
		bg: {strip:"\u0300\u0301"},
		mk: {strip:"\u0300\u0301"},
		sh: {
			to:  "AaAaAaAaAaEeEeEeEeEeIiIiIiIiIiOoOoOoOoOoRrRrRrUuUuUuUuUuии",
		sr: {
		sl: {from: "áÁàÀâÂȃȂȁȀéÉèÈêÊȇȆȅȄíÍìÌîÎȋȊȉȈóÓòÒôÔȏȎȍȌŕŔȓȒȑȐúÚùÙûÛȗȖȕȔệỆộỘẹẸọỌəł",
			 to: "aAaAaAaAaAeEeEeEeEeEiIiIiIiIiIoOoOoOoOoOrRrRrRuUuUuUuUuUeEoOeEoOel",
			 strip: "\u0301\u0300\u0302\u0311\u030f\u0323"},
		tr: {from:"ÂâÛû", to:"AaUu",strip:"\u0302"},
		zu: {strip_init_hyphen: 1}

--Yair rand (talk) 22:37, 7 April 2013 (UTC)

Because it is a data table, it seems like it would be better suited to Module:languages. It would also be faster, because that module would only be imported once per page, while this module would be imported once per link template. —CodeCat 22:47, 7 April 2013 (UTC)
Added. I adopted the approach used in strip_macrons() in Module:la-utilities, so that there is no need for "from" and "to" fields.
I support moving the data to Module:languages BTW. --Z 13:41, 27 June 2013 (UTC)
What you added doesn't actually work. Unicode has many precomposed characters, which are a single character consisting of the base letter and the diacritic. If you look only for the diacritic, you won't find them. —CodeCat 14:14, 27 June 2013 (UTC)
I know. It works, all characters are decomposed by mw.ustring.toNFD() before the replacement. --Z 14:29, 27 June 2013 (UTC)
Oh, I didn't know that existed. It seems useful, but is it fast? —CodeCat 15:18, 27 June 2013 (UTC)
Also, the combining characters are currently so jumbled up in the code that you can't see what they are. Is there a way to fix that? —CodeCat 15:21, 27 June 2013 (UTC)
Spelling them out in UTF-8, which will at least make editing easier. But otherwise, not much can be done. Keφr 15:42, 27 June 2013 (UTC)
What about putting spaces in between? It would then match on a space as well, but you could always just remove the spaces again before using the string. So doing... string operations on a regex pattern. Why not? :) —CodeCat 15:48, 27 June 2013 (UTC)
Well, if we are no longer afraid of getting hands dirty with somewhat performance-hitting operations, why not import the Unicode database, specify the characters with an array of their Unicode names, and let a function convert that to a regular expression? Keφr 15:56, 27 June 2013 (UTC)
Of course they should be spelled in UTF-8, but I don't know why they don't work in Lua (I've tried different forms, decimal, hex... none of which worked). --Z 16:08, 27 June 2013 (UTC)
Good job, it works now. --Z 16:21, 27 June 2013 (UTC)
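The decompose-then-strip approach described in this thread can be sketched as follows. This is a Python illustration, not the module's Lua: unicodedata.normalize("NFD", ...) plays the role that mw.ustring.toNFD() plays in the module, and the STRIP_MARKS table and the function name are assumptions, with the two entries taken from the "strip" values in the altForm table.

```python
import unicodedata

# Assumed per-language data: combining marks to drop when building a page title.
STRIP_MARKS = {
    "la": "\u0304",        # combining macron
    "ru": "\u0300\u0301",  # combining grave and acute
}

def strip_diacritics(text, lang):
    """Remove the language's listed combining marks, leaving all others alone."""
    marks = STRIP_MARKS.get(lang, "")
    # Decompose first so that precomposed letters expose their combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    # Recompose whatever is left, e.g. a + breve becomes a single character again.
    return unicodedata.normalize("NFC", kept)

print(strip_diacritics("tēxt", "la"))
print(strip_diacritics("ăquā", "la"))
```

Because only the macron is listed for "la" in this sketch, the second call drops the macron and leaves the breve in place.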

This is magic to me. How can I remove breve marks in Latin words? In "ăquā" it only removes the macron. --Vriullop (talk) 12:38, 30 June 2013 (UTC)

Breve marks should not be used at all. See WT:ALA. —CodeCat 12:46, 30 June 2013 (UTC)


What is wrong with Module:gender and number that it can't be used in this module? It works fine for {{nl-noun}} and {{ca-noun}}. —CodeCat 17:02, 10 April 2013 (UTC)

(1) It separates only with commas, but it should use "and" as well. This can be done using mw.text.listToText(list, ", ", " and "). (2) It doesn't distinguish between genders and numbers: m, f, sg or m, f and sg should be m and f sg. --Z 17:15, 10 April 2013 (UTC)
...that's all? Really, that's not a problem at all, I don't see why you think it is. The difference is by design, because the way the old genders were made, you couldn't tell whether "m f p" meant "masculine plural and feminine plural" or "masculine singular and feminine plural". You're right though that this isn't quite the same as how {{l}} currently works, so this should be changed by fixing all the entries that use {{l}} with more than one gender. g1=m g2=p should be changed into g=m-p . —CodeCat 17:59, 10 April 2013 (UTC)
So let's make it backward compatible: if it has a dash, it follows your way, otherwise it does what Template:l does:
    if gender and gender[1] then
        -- "%-" is a literal hyphen; missing g2/g3 default to "" so the concatenation cannot fail
        if mw.ustring.match(gender[1] .. (gender[2] or "") .. (gender[3] or ""), "%-") then
            local gen = require("Module:gender and number")
            text = text .. " " .. gen.format(gender)
        else
            -- old behaviour: expand the gender template, as {{l}} does
            text = text .. "&nbsp;" .. frame:expandTemplate{title = gender[1], args = {gender[2], gender[3]}}
        end
    end
then we fix all entries and inform users about the change, and after that we remove the backward compatibility. --Z 03:12, 11 April 2013 (UTC)
Ok, that sounds reasonable. But it's better to do it differently: if the g2= or g3= parameters are specified, use the old method, but if there is only g= or g1= then use the new method. —CodeCat 12:33, 11 April 2013 (UTC)
Now we should put something like {{#if:{{{g2|}}}{{{g3|}}}|[[Category:Pages with g2 or g3]]}} in Template:l's code to get a list of pages that should be updated by bot. --Z 12:45, 11 April 2013 (UTC)
Yes, and {{{g1|}}} should be in there as well (it's redundant to g=). —CodeCat 12:52, 11 April 2013 (UTC)
Ok, now we should update {{l}} to use the module, inform users about how they should use gender parameters from now on, and then update the pages in Category:l with g1, g2 or g3 (too lazy to renew my TS account, and my Internet connection is ridiculously slow and limited to update so many pages, anybody interested in doing this part?!) --Z 14:10, 11 April 2013 (UTC)
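A rough Python sketch of the two points settled in this thread: joining with "and" in the style of mw.text.listToText, and falling back to the old display only when g2= or g3= is given. All function names and the set of number codes are assumptions made for illustration, not the module's actual code.

```python
# Assumed set of codes that count as numbers rather than genders.
NUMBER_CODES = ("s", "sg", "d", "du", "p", "pl")

def list_to_text(items, separator=", ", conjunction=" and "):
    """Join items as 'a, b and c', modelled on mw.text.listToText."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    return separator.join(items[:-1]) + conjunction + items[-1]

def uses_old_gender_format(args):
    """The old method applies only when g2= or g3= is given and non-empty."""
    return bool(args.get("g2") or args.get("g3"))

def format_old_genders(codes):
    """Old-style display: genders joined with 'and', numbers appended after them."""
    numbers = [c for c in codes if c in NUMBER_CODES]
    genders = [c for c in codes if c not in NUMBER_CODES]
    parts = [list_to_text(genders), " ".join(numbers)]
    return " ".join(p for p in parts if p)

print(format_old_genders(["m", "f", "sg"]))  # m and f sg
```

A call with only g= (for example "m-p") would make uses_old_gender_format return False, so it would be handed to the new formatter instead.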

Script template class inconsistencies

There are a few script templates, such as {{Kore}} and {{unicode}}, that give class names that are different from their template names. Module:links uses the script codes themselves as classes instead of using the script templates, so I think we should probably first edit the script templates to be consistent and see if any problems come up (and make any necessary CSS changes), before starting to use this module. --Yair rand (talk) 11:21, 16 April 2013 (UTC)

We should define their class names in Module:languages. --Z 06:55, 17 April 2013 (UTC)
How would that solve the problem? —CodeCat 10:22, 17 April 2013 (UTC)
We will be able to check in the module if the class name(s) are different from the script name. --Z 10:33, 17 April 2013 (UTC)
But why would we add all this extra complexity when we can change the class names so that they are always the same instead? Isn't that the more obvious solution? —CodeCat 10:36, 17 April 2013 (UTC)
It is, but it is also harder to do. --Z 10:42, 17 April 2013 (UTC)
How so? And just because it's harder doesn't mean it shouldn't be done. —CodeCat 10:58, 17 April 2013 (UTC)
You were right, while replying that I was thinking of an unrelated class-related issue for some reason. --Z 11:04, 17 April 2013 (UTC)


language_link() can already do this. --Z 16:21, 16 May 2013 (UTC)

Except it cannot be directly #invoked and the kind of auto-linking it does might be actually undesired. But yes, I actually borrowed some code from it. Keφr (talk) 19:44, 16 May 2013 (UTC)

Testcases failing

I notice that 10 of the tests at Module talk:links/testcases are failing, and five of those are giving actual script errors. Anyone know what happened to cause this? --Yair rand (talk) 21:52, 2 June 2013 (UTC)

The script errors are because of the recent changes to Module:gender and number (spec:find() returns nil, dunno why). Other "failed" tests are actually because of the minor differences in extended, HTML codes; that's ok. --Z 08:14, 3 June 2013 (UTC)
Fixed now. --Z 12:52, 3 June 2013 (UTC)

Holding and accessing language data

I think it makes more sense to create a table that contains the information for a language, language = languages[language_code] (the variable for the language code is currently lang), instead of having a variable for the language code and indexing the big languages table everywhere. But in that case, we would need to add a field for the language code in language. On the other hand, if we make this change and replace the lang argument (a string holding the language code) with language (a table containing the language info), it will make the job a bit harder for functions that are supposed to be invoked directly from templates and use these functions. --Z 08:26, 24 June 2013 (UTC)

language_link() and script detection

We can improve the script detection feature so that it checks each term that is going to be linked separately, rather than treating the input as text written in a single script and checking the whole input once: {{term|[[鳥居]] / [[とりい]] / [[torii]]|lang=ja}} (so that we won't have to write {{term|鳥居|lang=ja}} / {{term|とりい|lang=ja}} / {{term|torii|lang=ja}}). To do this, we have to merge language_link() into annotated_link(), but this has some disadvantages... --Z 19:27, 24 June 2013 (UTC)
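Per-link script detection could look roughly like this. It is a Python sketch under stated assumptions: the range table is deliberately tiny, the script codes are used only as labels, and neither the function names nor the data come from the module.

```python
import re

# Matches [[target]] or [[target|display]]; group 1 is the link target.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]+)?\]\]")

# Assumed, deliberately tiny range table; real script data would be far larger.
SCRIPT_RANGES = [
    ("Hani", 0x4E00, 0x9FFF),  # CJK Unified Ideographs
    ("Hira", 0x3040, 0x309F),  # Hiragana
    ("Kana", 0x30A0, 0x30FF),  # Katakana
]

def detect_script(term, default="Latn"):
    """Return the script of the first character that falls in a known range."""
    for ch in term:
        for code, low, high in SCRIPT_RANGES:
            if low <= ord(ch) <= high:
                return code
    return default

def scripts_per_link(text):
    """Detect the script of every embedded link separately."""
    return [(m.group(1), detect_script(m.group(1))) for m in LINK.finditer(text)]

print(scripts_per_link("[[鳥居]] / [[とりい]] / [[torii]]"))
```

Each link gets its own result here, so the three terms in the example no longer have to share a single script guess.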

Template:recons with empty first parameter

{{recons}} is sometimes used without the first parameter. This removes the link, and just applies formatting to the bare word. These are now causing script errors, the module apparently requires the word parameter. But it should only be required for {{l}} (I think?) not for {{term}} or {{recons}}. —CodeCat 20:25, 5 July 2013 (UTC)

That's true, fixed. --Z 20:39, 5 July 2013 (UTC)

diacritic removal for Cyrillic broken

  • ѝзранити - when this starts generating red links, it doesn't work. --Ivan Štambuk (talk) 14:59, 27 July 2013 (UTC)
    • That's strange. Apparently, ѝ is a single composed character, and not и with a combining diacritic. So the replacement didn't catch. I added the composed character ѝ for Serbo-Croatian now, hopefully that fixed it. —CodeCat 15:41, 27 July 2013 (UTC)
      • I think that when you save и with a combining diacritic it automatically gets converted to ѝ by MW. --Ivan Štambuk (talk) 15:49, 27 July 2013 (UTC)
        • Yes, it tries to create composed characters whenever it can. In this case it's just a bit surprising because only ѐ and ѝ exist as composed characters, not а, о, etc. I wonder why they made Unicode that way. —CodeCat 16:30, 27 July 2013 (UTC)
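The composition behaviour discussed here can be checked with a short Python sketch. The helper name is an assumption; unicodedata.normalize("NFC", ...) is used as a stand-in for the normalization the wiki software applies when a page is saved.

```python
import unicodedata

def compose(base, mark="\u0300"):
    """Return the NFC form of base followed by a combining mark (grave by default)."""
    return unicodedata.normalize("NFC", base + mark)

for base in "иеао":
    composed = compose(base)
    # One code point means a precomposed character exists; two means it does not.
    print(base, len(composed), [hex(ord(ch)) for ch in composed])
```

For и and е the result is a single code point (U+045D and U+0450), while а and о stay as base letter plus U+0300. A replacement that looks only for the bare combining character therefore misses ѝ unless the text is decomposed first or the composed character is listed explicitly.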

Merging template_l and template_term

I'm going to merge the template_l and template_term functions, like this (tests). The template's title (either "l" or "term") should be provided to the module through the "template" parameter. Any objections/suggestions? Any suggestions about the name of the new function? --Z 22:39, 27 July 2013 (UTC)

One of the consequences of this is that the fourth parameter in {{l}} can do the job of the horrible "gloss" parameter and make it deprecated, so we can get rid of it and make the two templates more similar to each other. --Z 22:50, 27 July 2013 (UTC)

Also, "id" will be available in {{term}}, so will be "lit" and "pos" in {{l}}. --Z 22:52, 27 July 2013 (UTC)

(edit conflict)
  • "link term" sounds a bit like it's meant to "link a term". "template_l_term" is probably more accurate?
  • It looks like you have made both {{term}} and {{l}} use either the gloss= parameter or the parameter that follows alt. It's not a problem (so that both are equivalent until we decide which one to keep), but if you add that then you should check any usages of {{l}} that currently already have a 4th parameter, and any usages of {{term}} that already have gloss=, just in case someone provided those parameters in the past and they are still present in entries. We don't want those old mistakes to be interpreted wrongly when those parameters suddenly become valid. The same applies to g= for {{term}} and to pos= and lit= for {{l}}, as these templates did not originally support those parameters.
  • I'm not quite sure why you are calling {{rfscript}} with "sc or langinfo.scripts[1]". Why not just sc?
  • I didn't check, but is merging the two the only changes that you made, or did you also make changes to the other functions in the module? —CodeCat 23:00, 27 July 2013 (UTC)
  • Maybe even more accurate: "templates_l_term"? As one may think "template_l_term" is supposed to be used in a template called "l term" or "l-term".
  • One may have already used the fourth parameter of l instead of its third parameter, so we should check it, but why should we check "gloss" in term, or "lit" and "pos" in l? Even if someone has already used them, I don't think the user meant anything but what these parameters will mean after our change.
  • We put the sc = langinfo.scripts[1] part inside the "if (term or alt) then" block yesterday, so sc may be nil. In this case, the template itself checks for the name of the script, but it is faster to do that directly in the module.
  • There are other changes, but I only want to replace "template_l" and "template_term" with "link_term" for now. --Z 23:36, 27 July 2013 (UTC)
  • I think "template_l_term" is better, there is not that much risk of confusion.
  • It's unlikely, but it can't hurt to check...
  • Oh yes, I didn't realise. You can't do script detection if there is no text to detect it from.
CodeCat 23:52, 27 July 2013 (UTC)
Done. --Z 21:21, 28 July 2013 (UTC)

Lua-izing {{term}}

Unlike the other major linking templates {{l}} and {{t}}, {{term}} takes the language code through the named "lang" parameter. There were discussions and efforts regarding fixing this, which went nowhere. The module is ready to be used in {{term}}, though, since we can use the "compat" option. See tests (see the "Actual" column, ignore the "Expected"). BUT: it's the best time to make any other changes and improvements to the template. The easiest way to get rid of the "lang" parameter is to create a Lua-ized version of {{term}} under another, better title and replace usages of {{term}} (where "lang" is specified) with the new one. Any thoughts? Here is a discussion about using {{g}} instead of the gender templates; if we do this, we can choose {{m}} (mentioned term; the best title IMO) as the new title, and by this change we can probably use {{mt}} (mentioned term). --Z 21:50, 28 July 2013 (UTC)

Unfortunately, some people prefer keeping all these old templates around, so there is no consensus to delete them. Apparently, the new situation is too complicated, but I'm guessing it's only complicated because they don't want to adapt? —CodeCat 21:56, 28 July 2013 (UTC)
Yes, I think people are opposing because they don't see that this really is an improvement; maybe we should bring this up at BP again and explain the issue more precisely. Why should we have a four-letter title for such a highly used template, and a one-letter one for a template which is normally NOT supposed to be used anywhere, because we have head, t, l, and eventually g? If we could delete {{m}}, we could have been enjoying our Lua-ized {{term}} under its new title a long time ago, and put the time and energy that is being wasted into improving it instead. --Z 22:41, 28 July 2013 (UTC)
I'm kind of tired of hitting walls though when I try to improve things and people complain that it's too complicated because they have to type more or because it's just not what they're used to. It's almost an automatic response... —CodeCat 22:46, 28 July 2013 (UTC)

Why do we even need a template like {{term}}? The only difference from {{l}} is that it italicizes, right? Is it worth having a separate template just for italicizing? Besides, only Latin-script words are italicized. --Vahag (talk) 23:04, 28 July 2013 (UTC)

No, the term should actually be tagged with a CSS class (also it's not possible to italicize only the term and not other parts of the output only using l and ''). --Z 23:58, 28 July 2013 (UTC)

What about {{M}}? We can move it to {{m}} later when we removed the gender templates. --Z 18:37, 30 July 2013 (UTC)

Here are the options we have right now:

  1. Semi-Lua-izing {{term}}, using the compat mode. We have to use "lang" in this case.
  2. Putting the Lua-ized version of {{term}} under another [temporary] title, say {{M}}, and consider {{term}} a deprecated template.
  3. Removing {{m}} and use that.

The advantage of 2 is that we can temporarily enjoy using the template without "lang", and with a short title. Later we may decide to change it to another title, e.g. {{m}}. We will run a bot on all pages that use {{term}} with "lang" specified, and replace them with the new template. For those that don't have "lang", we have two options: (2.1) converting them to the new template by hand, adding a language code, or (2.2) replacing them with {{M/m||...}} by bot.

By choosing 1, the job will be a bit harder: first we should add "|lang=" by bot to all usages of {{term}} that don't have "lang". Then we can change the module so that if "lang" is specified (even with an empty string as its value) it uses the old parameters (using "lang" instead of the first parameter and so forth); people would therefore be able to use {{term}} by passing the language code as the first parameter, too. Then we can run a bot on all usages of {{term}} to remove "|lang=(...)" and pass its value (which may be an empty string) as the first parameter. I said we should add "|lang=" and not "|lang=und" because the latter means undetermined, while in this case the language is unspecified; the current code considers them identical, but we may (should) change that behavior in future. Later we may decide to change it to another title, e.g. {{m}}.

The third option is the best, but we can't choose it without convincing the community and removing the gender templates.

We should choose one of these ASAP. It's ridiculous that we have features but the community still can't use it. --Z 21:29, 30 July 2013 (UTC)
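The parameter handling proposed for option 1 can be sketched in Python as follows. The function name, the returned keys, and the exact positional layout are assumptions for illustration; only the rule itself (the presence of "lang", even empty, selects the old layout) comes from the proposal.

```python
def normalize_term_args(args):
    """Map template arguments to one common shape.

    args: dict with positional keys 1, 2, ... and named keys.
    Returns a dict with 'lang', 'term' and 'alt' (illustrative names).
    """
    if "lang" in args:
        # Old layout: the word is the first positional parameter.
        # An empty lang= still selects this layout.
        lang, first = args["lang"], 1
    else:
        # New layout: the language code is the first positional parameter.
        lang, first = args.get(1, ""), 2
    return {
        "lang": lang or None,  # None means unspecified, which is not the same as "und"
        "term": args.get(first) or None,
        "alt": args.get(first + 1) or None,
    }

print(normalize_term_args({1: "word", "lang": "ja"}))
print(normalize_term_args({1: "ja", 2: "word"}))
```

Both calls yield the same normalized result, so a general linking function behind the wrapper would not need to know which layout the editor used.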

I think I prefer both 1 and 2 for now. We should definitely make the new format available (just like we have {{label}} next to {{context}}), but we can't just suddenly switch over because there are many entries to fix. Providing the language code to 20 thousand entries is going to be a huge task, so I think it may be better if we migrate only the cases that have a language code, leaving the rest as {{term}}. Even if we do have a consensus to use {{m}}, that template is still added to entries by bots (yeah... instead of {{head}}... :/). So if we orphan it, we can't be sure that someone's bot won't add it somewhere and break things in ways we hadn't foreseen. On the other hand, the new {{m}} would require a language code as the first parameter, so if someone's bot starts adding it to entries, it will start triggering script errors and alert us to the problem. —CodeCat 21:48, 30 July 2013 (UTC)
As I explained, we can migrate ALL cases, if the language is not provided, we can use {{M||text}} and the module will behave exactly like {{term|text}}. --Z 22:10, 30 July 2013 (UTC)
I don't know if I really like that idea. If people can leave out the language in this new template, then they probably will, and the whole problem starts all over again. I'd rather make the conventions for this new template very rigidly set in stone, so that it's clear how people can and can't use it. A script error is as close as it gets to "you're not doing it right", so we should definitely use that. —CodeCat 22:20, 30 July 2013 (UTC)
We can replace them with {{M|und|text}}. I don't think people will write {{M||text}} or {{M|und|text}} when the language is not undetermined. The main reason they didn't specify the language code for so many {{term}}s is that it was a named parameter; compare {{l}}, for which users have always specified the language code. --Z 22:30, 30 July 2013 (UTC)
Let's start by converting {{term}} to Lua first. Hopefully that won't give too many problems. We also need to work on {{termx}}, which should be orphaned altogether, but only once {{term}} can support reconstructed languages. —CodeCat 02:16, 31 July 2013 (UTC)
What's next? I want to bring up the gender templates issue again at BP, let me know if you have any plan about the template. --Z 00:19, 4 August 2013 (UTC)
I think it's more or less complete. I've been orphaning both {{termx}} and {{recons}} in the last few days, and I've been trying to deal with the script errors that have been showing up. That will probably take a day or two. {{compound}} and {{suffix}} have also been converted to Lua. {{compound}} is "nice" as far as code goes, but {{suffix}} is a bit more hackish because I focused more on replicating existing behaviour instead of changing or adding new things. That's for later, when we decide what we want. {{prefix}} and {{confix}} will need to be converted as well, that won't take too long. —CodeCat 00:23, 4 August 2013 (UTC)

Links to "und"

The current version of the module will happily create links to appendix pages when the language is "und". But this is undesirable. Links to "und" are ok for the main namespace; the section id should be empty then. But for appendix pages, there should really be no link at all. This should be added to language_link but I'm not sure how. The best way seems to me that language_link should just return nil when it's not able to make a link. So language_link("attested", "alt", "und") should give [[attested|alt]] (without a #), but language_link("*reconstructed", "alt", "und") should give alt with no link at all.

I'm not sure what it should do when the term contains embedded wikilinks. What if someone writes: language_link("[[attested]] [[*reconstructed]]", nil, "und")? The most sensible thing to me would be to try to process each individual link the same as it would normally, so that this gives [[attested]] *reconstructed. I suppose that the function should only return nil if it could not create any links at all, so language_link("[[*rec1]] [[*rec2]]", nil, "und") should return nil, but I don't know if that is feasible.

An alternative we could try is to just return the link-less text instead of nil, but that seems to go slightly against the idea of making "links". What would be more useful? —CodeCat 22:37, 28 July 2013 (UTC)
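The behaviour proposed for "und" could be sketched like this in Python. The function name and the regular expression are assumptions; the sketch follows the "return nil" variant, so a caller would print alt or the bare term itself when it gets None back.

```python
import re

# Matches [[target]] or [[target|display]]; group 2 is None when there is no pipe.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

def link_undetermined(text, alt=None):
    """Sketch of the proposed "und" handling; returns None if nothing was linked."""
    if "[[" not in text:
        if text.startswith("*"):
            return None  # no appendix exists for "und", so no link at all
        return "[[%s|%s]]" % (text, alt) if alt else "[[%s]]" % text

    linked = 0

    def process(match):
        nonlocal linked
        target, display = match.group(1), match.group(2)
        if target.startswith("*"):
            return display or target  # reconstructed: keep the text, drop the link
        linked += 1
        return match.group(0)  # attested: leave the link without a section anchor

    result = LINK.sub(process, text)
    return result if linked else None

print(link_undetermined("[[attested]] [[*reconstructed]]"))  # [[attested]] *reconstructed
```

Counting the links that survive is what makes the all-reconstructed case ("[[*rec1]] [[*rec2]]") return None, so that case looks feasible without much extra code.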

Why should we link to the term when the lang is und and the term is attested? --Z 22:57, 28 July 2013 (UTC)
Category:Undetermined language that's why. ==Undetermined== is actually a valid language header. But Appendix:Undetermined/ is not valid as far as I can tell. So this language is kind of unique: it can be attested, but not reconstructed, whereas all other languages can be reconstructed, but some can't be attested. There's also another reason. {{term}} is still missing the language code on about 20 thousand entries, and the module currently treats this as "und"... which would then remove the link. I don't think we want that. —CodeCat 23:06, 28 July 2013 (UTC)
Oh, didn't know that. {{term}} was not a problem though: since it is not used to link to any appendix so far, all we actually need is if term and (compat or lang ~= "und") then
I think text like attested [[*reconstructed]] is unlikely to appear in the term variable while the lang is und. Editors should be aware that we don't link to reconstructed terms. We have already made linking to reconstructed terms complicated, and I'm not sure this highly unlikely case is worth making the code even more complicated. So we should not call language_link when the term has "*" at the beginning. There's a way to always fix this though: replace "[[Appendix:Undetermined/(...)|(...)]]" with the second capture. But I fear that if we continue to handle such rare issues like this, the code will eventually become full of fixes of fixes of fixes... --Z 23:45, 28 July 2013 (UTC)
They're not so rare, though. There are many links with "und" as the code, using {{l}}, {{term}} and {{recons}}. So we can't just ignore it. And the handling of "und" must be done inside language_link, because that function is not used by just {{l}} and {{term}}, other modules also use it. It's better to make the code robust now, rather than regret it later when things start behaving strangely. —CodeCat 23:57, 28 July 2013 (UTC)
OK, done. Use User:ZxxZxxZ/links/User:ZxxZxxZ/term to test it if you want, or see User talk:ZxxZxxZ/links/User talk:ZxxZxxZ/term, I've added a test for this case. --Z 03:17, 29 July 2013 (UTC)
I also added the "curtitle" argument to language_link, when this is provided, the function doesn't link to current title. --Z 03:19, 29 July 2013 (UTC)
Does that work even with embedded wikilinks? —CodeCat 10:45, 29 July 2013 (UTC)
Yes (did you see testcases?) {{User:ZxxZxxZ/links|und|[[attested]] .. [[*unattested]]}} -> User:ZxxZxxZ/links, {{User:ZxxZxxZ/links|und|*[[unattested]] .. [[unattested|alt]]}} -> User:ZxxZxxZ/links --Z 16:35, 29 July 2013 (UTC)
It looks good. Can you make the changes to Module:links? —CodeCat 16:58, 29 July 2013 (UTC)
Done, please improve (wording, etc) / add comments as you see fit. --Z 18:33, 29 July 2013 (UTC)
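For reference, the per-link behaviour discussed in this thread can be sketched roughly as below. This is only an illustration in plain Lua, not the code that went into Module:links: the name link_und and the "#Undetermined" section anchor are assumptions, and the sketch only handles text that already contains embedded wikilinks.

```lua
-- Hypothetical helper, not the actual Module:links code.
-- Rewrites each embedded wikilink for language "und": attested targets stay
-- linked, targets starting with "*" are reduced to their display text.
-- Returns nil when no link could be created at all.
local function link_und(text)
	local links_made = 0
	local result = text:gsub("%[%[([^%]|]+)|?([^%]]*)%]%]", function(target, alt)
		local display = (alt ~= "") and alt or target
		if target:sub(1, 1) == "*" then
			-- There is no Appendix:Undetermined/, so drop the link.
			return display
		end
		links_made = links_made + 1
		-- The section anchor is an assumption made for this sketch.
		return "[[" .. target .. "#Undetermined|" .. display .. "]]"
	end)
	if links_made == 0 then
		return nil
	end
	return result
end

print(link_und("[[attested]] [[*reconstructed]]"))
```

Returning nil only when every link was dropped matches the first option raised above; returning the link-less text instead would be a one-line change.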

The format of the annotations[edit]

I noticed that we have two different formats for genders, inflections and annotations. Which one we use depends on the template:

  • {{head}}: term (tr) gender (inflections)
  • {{term}} and {{l}}: term gender (tr, glosses)
  • {{t}}: term (tr) gender

I think we should make all of these show the same, but I'm not sure which way. I think the transliteration should come right after the term, but that would look like this on {{term}} when other glosses are also given: term (tr) gender (glosses). And if there's no gender (which is most of the time) then it becomes: term (tr) (glosses). That doesn't look so nice. So what should we do? —CodeCat 12:47, 1 August 2013 (UTC)

We can split tr and gloss only when we have gender. --Z 21:15, 1 August 2013 (UTC)
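A rough sketch of that rule, assuming a hypothetical helper format_link that receives the already-formatted pieces as plain strings (none of these names come from the module):

```lua
-- Hypothetical formatter: transliteration and glosses are split into two
-- parenthesised groups only when a gender sits between them.
local function format_link(term, tr, gender, glosses)
	local parts = { term }
	if gender then
		if tr then table.insert(parts, "(" .. tr .. ")") end
		table.insert(parts, gender)
		if glosses then table.insert(parts, "(" .. glosses .. ")") end
	else
		-- No gender: keep everything in a single pair of parentheses.
		local annotations = {}
		if tr then table.insert(annotations, tr) end
		if glosses then table.insert(annotations, glosses) end
		if #annotations > 0 then
			table.insert(parts, "(" .. table.concat(annotations, ", ") .. ")")
		end
	end
	return table.concat(parts, " ")
end

print(format_link("term", "tr", "m", "glosses"))
print(format_link("term", "tr", nil, "glosses"))
```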

Diacritic removal for Lithuanian and Latvian[edit]

tumė́ti should link to tumėti, vil̃na should link to vilna and similar. See more on w:Lithuanian accentuation and w:Latvian_language#Pitch_accent. --Ivan Štambuk (talk) 22:59, 15 August 2013 (UTC)

This should really go at Module:languages, that's where it's defined. Which diacritics should be removed from each language? —CodeCat 11:49, 18 August 2013 (UTC)
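As an illustration only, stripping such marks could look like the sketch below. The list of marks is an assumption (the question of which diacritics to remove is still open above), and the input is assumed to be in decomposed form; on the wiki, something like mw.ustring.toNFD would probably be needed first for precomposed letters.

```lua
-- Combining marks as UTF-8 byte escapes, so the sketch runs in plain Lua 5.1.
-- Which marks belong here for each language is an assumption.
local ACCENTS = {
	"\204\128", -- U+0300 combining grave accent
	"\204\129", -- U+0301 combining acute accent
	"\204\131", -- U+0303 combining tilde
}

-- Hypothetical helper: removes the listed combining marks from a display form
-- to obtain the page name.
local function strip_accents(text)
	for _, mark in ipairs(ACCENTS) do
		text = text:gsub(mark, "")
	end
	return text
end

print(strip_accents("vil\204\131na"))
```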

Mycenaean Greek italicized[edit]

When it shouldn't be, see etymology at: ναῦς. --Ivan Štambuk (talk) 09:39, 18 August 2013 (UTC)

It doesn't appear italic for me? —CodeCat 11:48, 18 August 2013 (UTC)
It was a minor issue in common.css, fixed it. --Z 14:23, 18 August 2013 (UTC)

"*" before automated transliterations of reconstructed terms[edit]

Shouldn't we remove it? --Z 19:50, 20 August 2013 (UTC)

Maybe. I'm talking to Vahag right now about how to treat scripts for reconstructed terms. Latin script is often used for reconstructions even if it's not a native script of the language otherwise. So detect_script should probably do something special with the *, while it should be removed before transliterating. —CodeCat 20:22, 20 August 2013 (UTC)

broken broken broken[edit]

It's broken when used in {{term/t}}. Why don't you do test edits on some test module, instead of live system?! --Ivan Štambuk (talk) 13:45, 26 August 2013 (UTC)

I can't fix it if you don't tell us where it's broken. Got an example? —CodeCat 13:46, 26 August 2013 (UTC)
See: Appendix:Proto-Slavic/bьrdo. --Ivan Štambuk (talk) 13:52, 26 August 2013 (UTC)
That wasn't caused by any of my recent edits. I fixed it, though. —CodeCat 14:20, 26 August 2013 (UTC)
It was caused by target2 == export.make_pagename(linktitle, lang) that you added at the end of the "core" function. Linking to appendix with an alt which doesn't start with "*" is not allowed now (see the tests 10, 11, 12 of Module talk:links/testcases). Is it a good idea? --Z 14:28, 26 August 2013 (UTC)
I didn't realise that make_pagename would trigger errors like that. I don't think it's necessarily a bad idea either though, because we caught the above "mistake" thanks to it. Otherwise, the * would have been missing from the displayed link, and there wouldn't have been an indication that it's reconstructed. —CodeCat 14:42, 26 August 2013 (UTC)
Did you see the testcases that I mentioned? In the past you discussed with me changing the module in such a way that it works with cases like {{l|sla-pro|*[[dьnь]] [[dьnь]]}}, but they won't work after your change... --Z 14:56, 26 August 2013 (UTC)
Ok, that is a bit of a problem but I'm not quite sure how to fix it. The purpose of the code I added is to see whether the alternative form in the link is redundant, because the page being linked to would be the same anyway once diacritics are removed from it. In other words, it's meant to tell us when {{l|xx|term|alt}} can be safely converted into {{l|xx|alt}}. —CodeCat 15:32, 26 August 2013 (UTC)
Easy to fix: if the language is a reconstructed one, pass "*" .. linktitle (instead of linktitle) to make_pagename. --Z 10:53, 27 August 2013 (UTC)
I'll leave it for now, though. I just found a second entry that was missing the * in the alt form. So it can be useful. —CodeCat 11:04, 27 August 2013 (UTC)
OK, but putting them in a category is a more appropriate way to find these mistakes than returning a script error. --Z 12:36, 27 August 2013 (UTC)
I tried to implement your suggestion above, by making it add * to the linktitle before calling make_pagename. But it doesn't seem to be working. —CodeCat 11:19, 28 August 2013 (UTC)
The categories are causing this, because they are being added at the middle of the process of searching for embedded links (see this test and the value of the target variable mentioned in the error message which is related to the third capture of the regex). So the solution is inserting the categories in a table and adding them to the text just before returning the text (at the end of language_link), like what you've done in another module, headword or translations if I'm not mistaken. --Z 15:54, 28 August 2013 (UTC)
That fixed it, thank you! —CodeCat 16:00, 28 August 2013 (UTC)
I noticed that now, any embedded links to reconstructed terms that don't have an alt form embedded in the link will be marked as redundant: {{l|gem-pro|*[[dagaz]]}}. It's not a serious issue, but worth noting. —CodeCat 16:09, 28 August 2013 (UTC)
I haven't tested it but that's probably happening only for links like {{l|gem-pro|*[[dagaz]]}} (but not {{l|gem-pro|[[*dagaz]]}} etc.) because *[[dagaz]] will be turned into [[*dagaz|dagaz]] at the "fix for linking to unattested terms (...)" part, which itself becomes [[*dagaz|*dagaz]] in core. We can fix this by checking if new is equal to target AND new is not equal to linktitle, after the line new = export.make_pagename(new, lang), in this case, it shouldn't be marked as redundant. --Z 16:27, 28 August 2013 (UTC)
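The fix described above (collect the categories in a table and append them only once all embedded links have been processed) can be sketched like this. The function name link_all and the simplified redundancy test are assumptions for illustration; the real check goes through make_pagename.

```lua
-- Hypothetical sketch: tracking categories are gathered in a table while the
-- embedded links are scanned and appended only at the very end, so they can
-- never end up in the middle of text that is still being matched.
local function link_all(text)
	local categories = {}
	local linked = text:gsub("%[%[([^%]|]+)|([^%]]+)%]%]", function(target, alt)
		if target == alt then
			-- Simplified redundancy test: the alt form equals the target.
			table.insert(categories, "[[Category:Link alt form tracking/redundant]]")
			return "[[" .. target .. "]]"
		end
		return "[[" .. target .. "|" .. alt .. "]]"
	end)
	return linked .. table.concat(categories)
end

print(link_all("[[a|a]] and [[b|c]]"))
```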

Link suffixes are not linked[edit]

See: запахи следοв (zapaxi sledοv). The declension suffixes should be linked. I think this module is responsible for it. Keφr 09:11, 12 September 2013 (UTC)

If you want them to be linked, then you need to include them in the link... —CodeCat 11:15, 12 September 2013 (UTC)
No, I need n— wait, it only works for ASCII characters? Apparently: pyjům vs pyjem. Stupid. Keφr 11:45, 12 September 2013 (UTC)
That is a problem with the software then, not with the module. But you can work around it anyway. —CodeCat 11:50, 12 September 2013 (UTC)
It behaves differently in different language editions of the wikis. --Z 12:17, 12 September 2013 (UTC)

Newlines should be stripped[edit]


Please fix this. Keφr 14:46, 16 September 2013 (UTC)

Fixed it for you.[1] ;) --Z 15:27, 16 September 2013 (UTC)
OK seriously: I think it's not Module:links' job to strip final/initial newlines (nor may this always be wanted). If someone wants them to be stripped, s/he should use named parameters (|2=, in this case); otherwise, if the newline was added by mistake, I think it should be fixed in the code of the page in which the template is used. --Z 15:28, 16 September 2013 (UTC)
Okay. I was not aware of this trick. Well, sometimes the newline in the page markup is quite convenient and aids readability (in inflection templates, for example), so I would rather have it removed after being passed to the template. And I thought this would be the easiest way. Keφr 15:58, 16 September 2013 (UTC)
You should probably escape newlines in templates by wrapping them in comments. —CodeCat 16:06, 16 September 2013 (UTC)
When invoking declension table templates in entries, it would defeat the purpose of having newlines in the first place. So no. Keφr 16:23, 16 September 2013 (UTC)
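For what it's worth, stripping the whitespace after the value has been passed in is a one-liner in Lua. The sketch below assumes a hypothetical helper trim that a template-facing function could call on its positional arguments; whether Module:links should do this is exactly what is disputed above.

```lua
-- Hypothetical helper: removes leading and trailing whitespace, including
-- newlines, and leaves interior whitespace alone.
local function trim(s)
	return s:match("^%s*(.-)%s*$")
end

print(trim("\nfoo bar\n"))
```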

interwiki links should not be redirected[edit]

See Heisenberg uncertainty principle. Keφr 18:09, 17 September 2013 (UTC)

I don't understand what you mean. —CodeCat 18:47, 17 September 2013 (UTC)
In the first link in the headword, the link to Wikipedia is redirected to a section named "English". Does little harm, but well… just feels kinda unclean. Keφr 18:59, 17 September 2013 (UTC)


There are many more international punctuation symbols to be removed (e.g. Tibetan), I have added for removal a few more:

text = gsub(text, "[؟?¿!¡;՛՜ ՞ ՟?!।॥။၊]$", "")

These are all used at the end of sentences (except for Spanish inverted ¿ and ¡), so they shouldn't cause problems for other languages. --Anatoli (обсудить/вклад) 00:44, 24 October 2013 (UTC)
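A byte-safe way to express the same idea in plain Lua, as a sketch: a character class containing multi-byte characters only works with a Unicode-aware gsub, so this version compares whole suffixes instead. The helper name and the shortened list of marks are assumptions.

```lua
-- A few sentence-final marks as UTF-8 byte escapes (only a subset of the list above).
local FINAL_PUNCTUATION = {
	"?", "!", ";",
	"\216\159",     -- U+061F Arabic question mark
	"\224\165\164", -- U+0964 Devanagari danda
	"\224\165\165", -- U+0965 Devanagari double danda
}

-- Hypothetical helper: removes at most one trailing punctuation mark.
local function strip_final_punctuation(text)
	for _, mark in ipairs(FINAL_PUNCTUATION) do
		if text:sub(-#mark) == mark then
			return text:sub(1, #text - #mark)
		end
	end
	return text
end

print(strip_final_punctuation("what?"))
```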


I think {{l}} should accept a |face= parameter, to be used for example in headword templates (which may need to set |face=bold). template_l_term will need to be extended. Keφr 19:52, 5 November 2013 (UTC)

I don't see why. What's wrong with three apostrophes? —CodeCat 20:22, 5 November 2013 (UTC)
They would also cover the transliteration, if supplied to (or generated by) {{l}}, and if I put them inside the template (as in, {{l|und|'''[[foo]]'''}}), it breaks some of the functionality provided by the template (like handling already-marked-up links; although it is somewhat doubtful that headwords would or should take advantage of it). Are you planning on staying, or am I asking too soon? Keφr 22:11, 5 November 2013 (UTC)
There was recently a discussion about transliterations for inflected forms. I think the practice is not to add them. You can tell {{l}} to not show a transliteration by using tr=-. So this will work: '''{{l|xx|something|tr=-}}'''. —CodeCat 23:15, 5 November 2013 (UTC)
Template:yi-noun, Template:yi-proper noun and Template:yi-adj say otherwise. Keφr 08:45, 7 November 2013 (UTC)
Let's not have templates enforce policies. If it is decided for a particular language to show transliterations for inflected forms, the templates should be able to handle that. The same applies to {{head}}, I had to use an ugly workaround for אךׄ and אכׄת. --WikiTiki89 14:31, 7 November 2013 (UTC)

Link alt form tracking/redundant, Link alt form tracking/redundant/ru[edit]

Is this piece of code still required? I've asked CodeCat but got no reply here. Every single Russian entry gets added to these categories (Category:Link alt form tracking/redundant and Category:Link alt form tracking/redundant/ru) when e.g. genitive sg and nominative pl (nouns) are added to the headword.

if target == new then
	tracking = tracking .. "[[Category:Link alt form tracking/redundant]][[Category:Link alt form tracking/redundant/" .. lang .. "]]"
end

--Anatoli (обсудить/вклад) 02:54, 6 December 2013 (UTC)

It's not required, but still useful for tracking down cases where the second parameter isn't needed anymore. So I would prefer to keep it. Are you sure it's the headword that's doing it though? I thought it was the inflection tables, but they can be fixed fairly easily. —CodeCat 03:16, 6 December 2013 (UTC)
It can be both. See зуб мудрости. I don't know what's causing it but there are too many entries affected. If you're planning to work on it, please say so, otherwise it's a bit annoying to have correctly formatted entries added to categories nobody checks. --Anatoli (обсудить/вклад) 03:26, 6 December 2013 (UTC)
The inflection templates can be fixed fairly easily. In the inflection templates, you can see that often the "form" is a wikilink, with both a page name and a display form. You can remove the page name part (which is the same as the display form without the accents, hence the redundancy), and then remove the link [[ ]]'s as well. So [[a|á]] becomes just á. It would help a lot if you could do this. I will look at the headword templates, but can you give an example where it occurs? зуб мудрости is not really a good example because the problem is in the linked display form, which has a redundant piped link. I've fixed it now: diff. —CodeCat 03:32, 6 December 2013 (UTC)
Oops. I knew this type of link causes issues: зуб, I forgot to remove the pipe. Could you explain what you mean by removing a page name part? Do you mean from declension templates? Is {{ru-noun-1}} an example, which doesn't have a page name part? --Anatoli (обсудить/вклад) 03:40, 6 December 2013 (UTC)
Yes, that template has already been fixed, so the entries that use it don't appear in the tracking categories. I think {{ru-noun-3-а}} is the first in the category of templates that hasn't been changed. You will probably notice that after you fix the templates, some of the parameters are actually no longer used anywhere by the template. This is ok; it just means that the templates and the entries that use them will need to be looked at and fixed to change the parameters around, but you don't need to do that unless you want to (you would need a bot to make it easy). —CodeCat 03:47, 6 December 2013 (UTC)
I've made the changes to {{ru-noun-3-а}} now, to show what needs to be done: diff. I hope that helps. —CodeCat 03:53, 6 December 2013 (UTC)
I see. It does help, thanks. I'll try to do it. --Anatoli (обсудить/вклад) 04:09, 6 December 2013 (UTC)
I think we should eliminate these sorts of pseudo-cleanup categories, where there's absolutely nothing wrong with the page. If CodeCat wants to spend her time pseudo-cleaning-up such pages, I guess that's fine, but the category confuses other editors into thinking they're doing something wrong. —RuakhTALK 07:56, 6 December 2013 (UTC)
I personally don't mind having these categories for entries I work with as long as they are understood and there's some action plan. I agree that we could use some prior discussion but this change was kind of expected with the work CodeCat was doing with the Russian declension templates. --Anatoli (обсудить/вклад) 08:47, 6 December 2013 (UTC)
The categories are hidden, so they shouldn't really get in anyone's way, I think? I'm not sure if I understand why people would think there's something wrong if they can't see it. And if they have hidden categories turned on, then... well then that's kind of up to them isn't it? —CodeCat 14:41, 6 December 2013 (UTC)
Firstly — I've definitely seen these, or their friends, as red categories at the bottom of the page. That means that not only are they not hidden, but they actually stand out more than regular categories. Secondly, even when these categories are properly created and hidden — real cleanup categories are also hidden. Editors who want to help clean up real issues shouldn't be tricked into thinking they should also clean up fake ones. (If you really want a hidden list, I suppose you can use something like pcall(require, 'Module:User:CodeCat/Link alt form tracking/redundant') and Special:WhatLinksHere/Module:User:CodeCat/Link alt form tracking/redundant. It's still worse than nothing, but I think it's at least better than what we have now. Though DCDuring might disagree with me, since that will break the WantedPages list . . .) —RuakhTALK 21:57, 6 December 2013 (UTC)
The category was empty at one point, so later changes have repopulated it. We should probably try to empty it out again, starting with the Russian one. Once it's empty, we can get rid of the language-specific subcategories if needed, so they won't appear as red anymore. —CodeCat 22:35, 6 December 2013 (UTC)
Re: "We should probably try to empty it out again, starting with the Russian one": Why? Why does it matter? What are the disadvantages of these "redundant" entry-names? What are the advantages of removing them? —RuakhTALK 04:16, 7 December 2013 (UTC)
I don't know, it just seems neater that way. —CodeCat 04:18, 7 December 2013 (UTC)
Right: no advantages. If you want to neaten up entries you happen to be editing, or even to seek out entries for neatening, that's fine; and heck, if you want to run a bot to neaten up these entries, O.K., I think that's probably fine. (It's not the very best idea, because of course bot-edits have nonzero risk, but whatever.) But there's really no justification for a cleanup category. Please remove the piece of code that Anatoli refers to. —RuakhTALK 04:41, 7 December 2013 (UTC)
Thanks. :-)   —RuakhTALK 21:59, 7 December 2013 (UTC)

Sindhi diacritics[edit]

Sindhi diacritics should work the same way as Arabic, Persian and Urdu, so it should be the same treatment. The translation adder made: لُغَتُ‎ on dictionary#Translations but it should be just لُغَتُ‎ (currently shows in red) with diacritics automatically removed. The tool knew how to link to the entry without diacritics but used "alt=". (I haven't added the transliteration but it's something like "luğatu"). --Anatoli (обсудить/вклад) 02:29, 3 February 2014 (UTC)

Korean transliteration[edit]

It fails on this module when there is a manual transliteration. There are thousands of translations with manual transliteration, and in this case it's desirable because it capitalises proper nouns (romaja is usually capitalised for place names):

  1. Manual: 미얀마 (ko) (Miyanma), 버마 (ko) (Beoma)
  2. Automatic: 미얀마 (ko) (miyanma), 버마 (ko) (beoma)

--Anatoli (обсудить/вклад) 05:57, 28 May 2014 (UTC)

Fixed - missing "s" in "annotations". Wyang (talk) 06:05, 28 May 2014 (UTC)
Thanks. Now I see automatic overrides manual, at least for Korean. Is that good? Probably OK, although romaja should capitalise place names. It's NOT OK for languages that have unhandled exceptions, or whose transliteration shows word stress that the native script does not, such as Russian. --Anatoli (обсудить/вклад) 06:09, 28 May 2014 (UTC)
I think something is better than nothing at all. I've enabled putting a "^" in front of the letter to be capitalised to allow capitalisation for languages whose script has no case distinction. I don't know Russian well but do you think it'd be possible to make a pronunciation module for Russian? Using accent marks, and some extra tricks for irregularities. Wyang (talk) 06:26, 28 May 2014 (UTC)
I noticed that. Yes, I think it's OK. If Korean transliteration is reliable, we can sacrifice the capitalisation or use ^, as you did. Re: a pronunciation module for Russian. Yes, please! It is predictable in 95-99% of cases; there are some variants, and exceptions can use phonetic respelling, e.g. сегодня as сево́дня (sevódnja). I can teach you some Russian too, if interested. --Anatoli (обсудить/вклад) 06:32, 28 May 2014 (UTC)
Great, thanks. Now that I have the confirmation... If no one wants to take the lead, I might do so, in which case I will have to bombard you with questions. :) Wyang (talk) 06:48, 28 May 2014 (UTC)

I have answered in Module talk:ru-pron. --Anatoli (обсудить/вклад) 07:02, 28 May 2014 (UTC)
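The "^" convention mentioned above could work along these lines. This is a sketch only: it assumes the caret survives into the romanised string and that the letter after it is ASCII, and apply_caret is a made-up name, not the function in the Korean transliteration module.

```lua
-- Hypothetical post-processing step: "^" marks the next letter for capitalisation.
local function apply_caret(tr)
	-- The extra parentheses drop gsub's second return value (the match count).
	return (tr:gsub("%^(%a)", string.upper))
end

print(apply_caret("^miyanma"))
```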

Linking to reconstructed terms when lang is und[edit]

I noticed that the module links to reconstructed terms while lang is "und": {{l|und|*term}} -> *term, as far as I recall we had fixed this before. --Z 13:57, 3 July 2014 (UTC)

Not an actual problem, but curious behavior[edit]

I can't think of a situation in which this kind of use would arise and be needed (hence it isn't a problem per se that it doesn't work), but I noticed on one of my user-subpages that {{term|*?|lang=sca-pro}} produces a module error saying "Lua error in Module:links at line 102: The specified language Proto-Siouan-Catawban is unattested, while the given word is not marked with '*' to indicate that it is reconstructed" (even though the word is marked with '*'), while {{term|*??|lang=sca-pro}} works fine. - -sche (discuss) 01:27, 21 August 2015 (UTC)

The last question mark is stripped as a punctuation character. Thus, your first example is really just *, which I guess is a special case to be able to link to asterisks, and the second one links to Appendix:Proto-Siouan-Catawban/? with just one question mark. --WikiTiki89 01:40, 21 August 2015 (UTC)
The question mark shouldn't be stripped if it's the only character in the string, just like * alone is a special case. The issue is that the asterisk is included in the code that removes the question mark, so it thinks the text consists of more than a question mark alone. Perhaps the "proper" solution is for the asterisk to be stripped before passing it to the conversion function. However, this edge case is so specific that it might not be worth the effort. —CodeCat 00:23, 30 October 2016 (UTC)
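A sketch of that "proper" solution, assuming a hypothetical helper that only deals with the ASCII question mark: the asterisk is split off first, and the question mark is stripped only when something else is left in the term.

```lua
-- Hypothetical helper: strips one final "?" unless it is the whole term.
-- A leading "*" is set aside first so it does not count as part of the term.
local function strip_question_mark(text)
	local star, body = text:match("^(%*?)(.*)$")
	if #body > 1 and body:sub(-1) == "?" then
		body = body:sub(1, -2)
	end
	return star .. body
end

print(strip_question_mark("*?"))
print(strip_question_mark("*??"))
```

With this, "*?" keeps its question mark, while "*??" still loses one, as described above.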

Disabling auto-translit[edit]

Is there a way to disable auto-translit if the link text equals — or &mdash;? KarikaSlayer (talk) 13:55, 6 July 2016 (UTC)

Automatically replacing plain apostrophes with curly apostrophes in link text[edit]

Two recent discussions suggest to me that it would be ideal if this template automatically substituted plain apostrophes with a better-looking character in link text. In the Beer parlour (Wiktionary:Beer parlour/2016/October § ASCII vs. Unicode apostrophes in French entries) @Angr describes it as the usual practice to create French entry names with the plain apostrophe but to display the curly apostrophe (right single quotation mark) in headwords. In Wiktionary talk:About Ancient Greek § Symbol to mark apocope, I proposed that a similar thing be done for Ancient Greek headwords.

Anyway, I thought a similar thing should be done for links. It looks like the function makeLangLink, which deals with link text, would be the place to insert the code. I don't have template editor privileges, but I think the code to perform this replacement would look something like this:

if lang:getCode() == "fr" or lang:getCode() == "grc" then
	link.display = mw.ustring.gsub(link.display, "'", "’")
end

This would hopefully make {{m|fr|d'où}} automatically display as d’où, as if it had been produced by the code {{m|fr|d'où|d’où}}. Similarly, {{m|grc|ἀλλ'}} would automatically display as ἀλλ’ ‎(all'), like {{m|grc|ἀλλ'|ἀλλ’}}.

Ideally the curly apostrophe should also be used in the transliteration – ἀλλ’ ‎(all’). Perhaps the replacement should be done somewhere other than in the function makeLangLink, so that both the link text and the text used to make the transliteration already have the curly apostrophe. — Eru·tuon 00:13, 30 October 2016 (UTC)
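A standalone version of the proposal, as a sketch that runs in plain Lua without the mw library. The function name display_form and the language table are assumptions; the module itself works with language objects rather than bare codes.

```lua
local CURLY = "\226\128\153" -- U+2019 right single quotation mark, as UTF-8 bytes

-- Languages for which the replacement is proposed in this thread.
local CURLY_LANGS = { fr = true, grc = true }

-- Hypothetical helper: returns the display form of a link for a language code.
local function display_form(text, langcode)
	if CURLY_LANGS[langcode] then
		-- Parentheses drop gsub's second return value (the match count).
		return (text:gsub("'", CURLY))
	end
	return text
end

print(display_form("c'est", "fr"))
```

Calling the same helper on the text before it is transliterated would also give the transliteration the curly apostrophe.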

The code to create the page name for a given display form is actually in Module:languages, specifically makeEntryName. —CodeCat 00:16, 30 October 2016 (UTC)
I'm aware of that, and I'm talking about changing link text, not determining the entry name. — Eru·tuon 00:32, 30 October 2016 (UTC)
So the reverse. Can we be absolutely sure that this change is always appropriate? In some languages, the apostrophe or a similar character (for which we might substitute an apostrophe) is a regular part of the orthography. And what about when ' is used as a quotation mark in an entry name? —CodeCat 00:41, 30 October 2016 (UTC)
I have not encountered quotation marks in entry names; could you give an example?
@CodeCat: It would certainly always be correct in Ancient Greek entries, since there is no other use of an apostrophe in that language. I would assume so in French, but I do not know for sure. 00:46, 30 October 2016 (UTC)
I oppose automatically changing the displayed text for almost any reason, including this one. If we want to do this, it should be by changing the link target. --WikiTiki89 00:45, 30 October 2016 (UTC)
@Wikitiki89: Could you explain why? Do you also oppose displaying plain apostrophes in headwords as curly apostrophes? — Eru·tuon 00:47, 30 October 2016 (UTC)
I don't oppose displaying curly apostrophes, I only oppose the automatic conversion of plain apostrophes to display curly apostrophes (by automatic, I am only referring to Lua modules). --WikiTiki89 01:39, 30 October 2016 (UTC)
@Wikitiki89: Very well, my second question was not worded well. Why do you oppose automatic conversion of plain to curly apostrophes as opposed to manually entering forms in which plain apostrophes are changed to curly into an |alt= parameter? The effect is the same; the only difference is that there is less repetitive work involved. — Eru·tuon 01:53, 30 October 2016 (UTC)
It doesn't have to be an alt parameter. As I said, I'm ok with Lua automatically converting characters for the target link, just not for the display text. So {{l|fr|c’est}} can link to c'est, but {{l|fr|c'est}} should not display "c’est". --WikiTiki89 01:59, 30 October 2016 (UTC)
@Wikitiki89: Well, ideally that should happen too, but only having that would not be particularly useful to me as a Windows user. It's easy to type an apostrophe directly from the keyboard, but you have to use the annoying combination of Alt and 0145 to get a right single quotation mark, or navigate to the correct EditTools menu and select the character. Both are a hassle. It would make things much easier if the module did the work for me. So once again, why do you oppose having a module do it? — Eru·tuon 02:19, 30 October 2016 (UTC)
For a number of reasons. It's bad practice in general. People expect things to display the way they are entered. As for entering them, just copy and paste it. It's not that hard. --WikiTiki89 02:22, 30 October 2016 (UTC)
Well, expectations are overruled by consensus regularly (though I admit that I can think of no functions that automatically modify displayed text from the form in which it was entered; if there were, it would probably be in this module and the headword module). If editors for a particular language have agreed to use a particular apostrophe character, it would be easier to enforce it through modules than to manually replace all apostrophes in all text in that language through all entries, and to have to regularly do cleanup to make sure that newly added text in that language has adhered to the standard. Easier to have a module automatically display c'est as c’est than to have to change ' to ’ wherever it occurs in French text. Of course, perhaps there is no such consensus regarding French. It would be easier to develop consensus for Ancient Greek, which I would think has a smaller group of editors. — Eru·tuon 03:41, 30 October 2016 (UTC)
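For comparison, the alternative raised earlier in this thread (convert only the link target and leave the display text as entered) might look like the sketch below. The helper name make_link is an assumption; on the wiki this conversion would belong with the entry-name logic in Module:languages.

```lua
-- Hypothetical helper: the display text is kept exactly as entered, while the
-- link target gets U+2019 replaced with a plain apostrophe.
local function make_link(text)
	local target = text:gsub("\226\128\153", "'")
	return "[[" .. target .. "|" .. text .. "]]"
end

print(make_link("c\226\128\153est"))
```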

Phonetic extraction[edit]

@CodeCat, Wyang, I know this was mentioned elsewhere, but I'd like to bring it up directly. @Erutuon has moved the transliteration override data to mod:languages's data. If the phonetic_extraction data were similarly moved there, would this solution be amenable to all parties? —JohnC5 20:27, 14 March 2017 (UTC)

No, because it's completely unnecessary to have it anywhere. All the code does is disable the regular transliteration in favour of a custom module which does the transliteration. If the data in Module:languages were changed so that the translit_module option points to that module, or the current Module:th-translit were modified to call it, then there would be no need for the custom code. —CodeCat 20:31, 14 March 2017 (UTC)
Ah, so that is still a no then. What ever happened to @Isomorphic's transcription vs. transliteration differentiation? I believe we were looking into that for both this and other languages. —JohnC5 20:57, 14 March 2017 (UTC)
I don't think that removing the code should depend on the outcome of that. Wyang's code doesn't belong in Module:links or Module:languages no matter what happens, especially since there's an easy and obvious existing way to make it work without: translit_module. If it were just done that way, like all the hundreds of languages besides Thai already do, we wouldn't have this mess. —CodeCat 21:05, 14 March 2017 (UTC)
So, I believe as far as @Chuck Entz is concerned, this means that neither you nor Wyang will be reinstated as admins until you agree on a solution. I was just trying to see whether we'd made any progress in that respect. —JohnC5 21:11, 14 March 2017 (UTC)
I presented the only proper solution that works within the existing framework of our modules; one that does not depend on the passing of any votes. Is there a particular reason why it's not taken? Why does Wyang's agreement determine my admin status? —CodeCat 21:16, 14 March 2017 (UTC)
I'm not arguing either side of this. Just looking for a solution. —JohnC5 02:22, 15 March 2017 (UTC)
And here I was hoping there'd be one. Waste of time. —CodeCat 02:23, 15 March 2017 (UTC)
*sigh* I'm just trying to help, Code. I'm not your enemy. —JohnC5 02:37, 15 March 2017 (UTC)
@JohnC5 I definitely think the phonetic_extraction data would more properly belong to the language modules, as a language-specific property. There are many languages that would benefit from the existence of such a function. Recently there was a discussion at User talk:DerekWinters#Assamese about how best to deal with languages such as Bengali and Assamese whose pronunciations are not very predictable from the spelling. A good solution for these languages would be to assign a respelling to a word, and follow Thai's practice. Wyang (talk) 06:50, 15 March 2017 (UTC)

Unsupported titles[edit]

@Erutuon This seems like a perfect candidate to be moved into a data subpage to me. —JohnC5 15:25, 24 March 2017 (UTC)

@JohnC5: I agree. Done. And now unsupported titles are linkable: :. — Eru·tuon 19:08, 24 March 2017 (UTC)

Script tags for transliterations[edit]

@Erutuon, I think your recent edits here did something wonky to the fonts for the transliterations. Could you check it out? — justin(r)leung (t...) | c=› } 04:14, 19 May 2017 (UTC)

@Justinrleung: What do you mean? I was aware that adding, for instance, lang="ja" to Japanese transliteration caused transliterations to display with fonts more appropriate for Japanese script, but I changed it to lang="ja-Latn", which may not have the same effect. If it does have that effect, there is a discussion on this topic at Wiktionary:Grease pit/2017/May § CSS classes for transliterations. I do have an idea for how to solve this problem, which I mentioned in that discussion. — Eru·tuon 04:26, 19 May 2017 (UTC)
@Erutuon: Thanks for pointing me to the discussion. I see that you have the same problem I pointed out. Thanks! — justin(r)leung (t...) | c=› } 04:38, 19 May 2017 (UTC)
@Justinrleung: Actually, the problem has been solved for me, ever since I changed the language code for transliterations to lang="language code-Latn". Rōmaji in Japanese headwords displays wrong, because it has class="Jpan"; but that has nothing to do with my recent changes. (I can't figure out how to fix it, unfortunately.) Could you point me to an entry in which you see the problem that you are talking about? — Eru·tuon 04:44, 19 May 2017 (UTC)
@Erutuon: ghee is one of them; the Hindi, Urdu and Sanskrit transliterations aren't using the normal font. — justin(r)leung (t...) | c=› } 04:48, 19 May 2017 (UTC)
@Justinrleung: That sounds like what happened before. Maybe some browsers (not mine) ignore the -Latn part of the language attribute and just look at the language code part, applying fonts appropriate to the ordinary script of the language, but not appropriate for transliteration. That would be an argument for using class="tr-language code" instead. — Eru·tuon 04:53, 19 May 2017 (UTC)
@Erutuon: It's a problem with Firefox, which I'm using, then. Your solution with class="tr-language code" might work better across browsers. — justin(r)leung (t...) | c=› } 05:26, 19 May 2017 (UTC)
@Justinrleung: Fortunately, now that most transliterations are tagged with Module:script utilities, this can be changed very quickly and easily. I'll post on the grease pit thread above, though, before changing anything. — Eru·tuon 05:31, 19 May 2017 (UTC)
Don't some Japanese terms include transliterations in both Latin and Kana scripts? If so, then tagging the entire thing as Latin would be inappropriate. —CodeCat 13:59, 19 May 2017 (UTC)
Heh. I just checked the Japanese translation from water/translations and you're right: {{t+|ja|水|sc=Jpan|tr=みず, mizu}}. Both transliterations are supplied in the same parameter, so they are tagged the same way (and incorrectly for the Kana one). That's bad. — Eru·tuon 15:38, 19 May 2017 (UTC)
@CodeCat, Erutuon: Yeah, that’s definitely a problem, whatever might be applied to the transliteration as a whole. The kana and Latin spellings need separate tagging, whether it is by script or class attributes. Looking over water/translations, I also see that Japanese (and others of the Japonic group such as Okinawan) seems to be the only case where this is done. We don’t e.g. use Zhuyin for Mandarin Chinese. I suppose the main reason that the kana spelling is supplied is that it is another possible spelling of the word, but as such, perhaps we should not treat it like a transliteration or transcription at all, but have a separate parameter for an equivalent spelling in a different mode of writing, e.g. {{t+|ja|水|sp2=みず|tr=mizu}}. I don’t know whether there would be use cases for other languages, but I suggest that the parameter be intended generally for equivalent respellings within an integrated writing system (in this case Jpan), and not for completely separate writing systems (e.g. Latin/Cyrillic, etc.). Of course, this would require a bot run through our Japanese/Japonic translations, and maybe a change to the Translation Adder, but I think it’s the best way to avoid these code errors and to structure our data appropriately. Also, another solution, such as having Lua run through the transliteration looking for different script characters, would make {{t}} more bloated. – Krun (talk) 14:07, 24 May 2017 (UTC)
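
As a rough illustration of the Lua approach mentioned above (scanning the transliteration for different scripts), the sketch below splits a comma-separated |tr= value and tags each piece by the script of its first character. The function names, the span markup and the kana byte-range test are all assumptions made for this example; nothing here is taken from Module:links or Module:script utilities.

```lua
-- Hypothetical helpers; not part of Module:links or Module:script utilities.

-- Returns true if the UTF-8 string starts with a Hiragana or Katakana character.
-- U+3040..U+30FF encode in UTF-8 as 0xE3, then 0x81..0x83, then a third byte.
local function starts_with_kana(s)
	local b1, b2 = s:byte(1, 2)
	return b1 == 0xE3 and b2 ~= nil and b2 >= 0x81 and b2 <= 0x83
end

-- Splits a value such as "みず, mizu" on commas and wraps each piece in a
-- span whose class and lang attribute reflect the detected script.
local function tag_mixed_translit(tr, lang_code)
	local parts = {}
	for piece in tr:gmatch("[^,]+") do
		piece = piece:match("^%s*(.-)%s*$") -- trim surrounding whitespace
		local sc = starts_with_kana(piece) and "Jpan" or "Latn"
		table.insert(parts, ('<span class="tr %s" lang="%s-%s">%s</span>'):format(sc, lang_code, sc, piece))
	end
	return table.concat(parts, ", ")
end
```

With this sketch, tag_mixed_translit("みず, mizu", "ja") tags the kana piece as Jpan and the rōmaji piece as Latn. It only looks at the first character of each piece, so it would not handle a piece that mixes scripts internally.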

Edit request

@Erutuon Please replace lines 342-343 with the following:

		local class = ""
		if data.accel then
			class = "form-of lang-" .. data.lang:getCode() .. " " .. data.accel
		end
		-- Only make a link if the term has been given, otherwise just show the alt text without a link
		link = m_scriptutils.tag_text(data.term and export.language_link(data, allowSelfLink, dontLinkRecons) or data.alt, data.lang, data.sc, face, class)

CodeCat 12:46, 20 August 2017 (UTC)

Done — Eru·tuon 18:10, 20 August 2017 (UTC)

Adding ts= param

Per this discussion, can someone please add these lines. Thanks.

After line 288:

	elseif itemType == "ts" then
		tag = { '<span class="ts mention-ts Latn">/', '/</span>' }

Replace lines 321-322:

	-- Transliteration and transcription
	if data.tr or data.ts then

Replace line 330:

		if data.tr and data.ts then
			table_insert(annotations, require("Module:script utilities").tag_translit(data.tr, data.lang, kind) .. " " .. export.mark(data.ts, "ts"))
		elseif data.ts then
			table_insert(annotations, export.mark(data.ts, "ts"))
		else
			table_insert(annotations, require("Module:script utilities").tag_translit(data.tr, data.lang, kind))
		end

--Victar (talk) 15:24, 8 March 2018 (UTC)

Done. Thanks, @JohnC5 --Victar (talk) 02:43, 9 March 2018 (UTC)
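
For reference, the branching requested above can be sketched as a standalone function. This is a simplified illustration, not the module code: mark_ts and tag_translit are stand-ins (assumptions) for export.mark(..., "ts") and Module:script utilities' tag_translit, and the tr span markup is invented for the example. Only the ts span markup is taken from the request.

```lua
-- Stand-in for export.mark(ts, "ts"), using the span markup from the request.
local function mark_ts(ts)
	return '<span class="ts mention-ts Latn">/' .. ts .. '/</span>'
end

-- Stand-in for tag_translit; the real function produces different markup.
local function tag_translit(tr, lang_code)
	return '<span class="tr" lang="' .. lang_code .. '-Latn">' .. tr .. '</span>'
end

-- Returns the annotation for a link, or nil if neither value is given.
local function translit_annotation(tr, ts, lang_code)
	if tr and ts then
		-- Both given: transliteration first, then the transcription in slashes
		return tag_translit(tr, lang_code) .. " " .. mark_ts(ts)
	elseif ts then
		return mark_ts(ts)
	elseif tr then
		return tag_translit(tr, lang_code)
	end
	return nil
end
```

The point of the three branches is that |ts= never replaces |tr=: when both are present they are shown side by side, and either one can appear alone.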

@JohnC5, Victar, Wikitiki89 Could one of you please add information on the correct use of |ts= to Template:link/documentation? I see from the discussion at Wiktionary:Beer parlour/2018/February#Transcription parameter again that most people consider it undesirable to use |ts= for IPA transcriptions, but that's exactly how I've been using it for Burmese, since the pronunciation is not easily deducible from the transliteration. Wikitiki reverted me once because that wasn't the intention behind |ts=, but I'm still not convinced it's such a bad idea. —Mahāgaja (formerly Angr) · talk 12:19, 4 April 2018 (UTC)

@Mahagaja: The problem with putting both IPA and non-IPA in the |ts= parameter is that IPA should be formatted with IPA fonts and non-IPA without, and so one or the other will be formatted incorrectly. Or maybe it's acceptable to put both in an IPA font, if some transcriptions are IPA. (I have no objection to the transcription of Burmese using the IPA otherwise, as long as we specify that you can't, for instance, provide an IPA transcription of English in a linking template.) But this is perhaps not the best place for these remarks.... — Eru·tuon 17:43, 4 April 2018 (UTC)
@Erutuon: And I have no objection to using IPA characters without formatting them as such, since doing so will invariably result in filling up CAT:IPA pronunciations with invalid IPA characters and CAT:IPA pronunciations with invalid representation marks. And I also have no objection to using a non-IPA, pronunciation-faithful transcription like BGN/PCGN (see WT:Burmese transliteration) instead of IPA. —Mahāgaja (formerly Angr) · talk 17:49, 4 April 2018 (UTC)
@Mahagaja: Well, Module:links wouldn't put anything in those categories unless someone made it do that, and it's a fairly costly process in memory and processing time so it would be better not to. — Eru·tuon 18:44, 4 April 2018 (UTC)
My objection was not to the transcription system, but to the idea of having separate transliterations and transcriptions for Burmese in general. Since Burmese is spelled phonetically enough for us to be able to automatically generate IPA, we do not need to have separate transliterations and transcriptions. If our transliterations are so convoluted that they mask the pronunciation, then perhaps we should use a different transliteration system. Or else when an etymology only makes sense given the actual pronunciation, then we can explain it in words instead of stuffing it into another template parameter. Like this. Although, frankly, many of our readers are not gonna make much more sense of that IPA either. --WikiTiki89 18:54, 4 April 2018 (UTC)
There are Burmese words whose pronunciation is unpredictable from the spelling, though. {{my-IPA}} relies on a lot of ad-hoc devices (apostrophes, plus signs, etc.) to get the IPA right; and there are words like ဘတ်စ်ကား (bhatcka:) whose spelling is so weird {{my-IPA}} fails on them and their transcription and pronunciation have to be added manually. —Mahāgaja (formerly Angr) · talk 19:05, 4 April 2018 (UTC)
Even the example that Wikitiki89 mentions above, ခြင်္သေ့ (hkrangse.), seems to be unpredictable: it is pronounced /t͡ɕʰɪ̀ɴðḛ/, but {{my-IPA}} generates /t͡ɕʰɪ̀ɴθḛ/ if it's not supplied a respelling. (I could be wrong because I don't know Burmese.) — Eru·tuon 19:29, 4 April 2018 (UTC)
Oh sorry, my bad. I take it back then. Still, I don't think we should be selectively adding transcriptions to some Burmese links. Either we're gonna (try to) add them everywhere, or only in the pronunciation section of the entry (which is, after all, what we do with English). --WikiTiki89 20:48, 4 April 2018 (UTC)
Rendaku-style voicing is very common in Burmese compounds but not 100% predictable; ခြင်္သေ့ (hkrangse.) could in principle be /t͡ɕʰɪ̀ɴðḛ/ or /t͡ɕʰɪ̀ɴθḛ/, but the former is maybe a little more likely. Reduction of the first syllable is also very common, and is often accompanied by voicing of the consonant before the reduced vowel, so in theory ခြင်္သေ့ (hkrangse.) could also be /t͡ɕʰəðḛ/ or /d͡ʑəðḛ/ (and this vowel reduction sometimes occurs in Upper Burma even where it doesn't in the standard language, so for all I know this word really is pronounced /t͡ɕʰəðḛ/ or /d͡ʑəðḛ/ in Mandalay). So I do think it would be helpful for people to see a pronunciation-faithful transcription – even in cases where the pronunciation is well-behaved, just so they know the word in question isn't one of the many exceptions. —Mahāgaja (formerly Angr) · talk 21:17, 4 April 2018 (UTC)
I support the use of |ts= per Mahāgaja, and not just for Burmese but for other languages as well. Let's take the Chinese (Mandarin) word 人民幣/人民币 (rénmínbì) for example. Some people might wonder why in the Russian transcription/transliteration of Mandarin the initial "r-" is rendered ж (ž), not р (r), e.g. жэньминьби́ (žɛnʹminʹbí) ([ʐɨnʲmʲɪnʲˈbʲi]). With 人民幣 (rénmínbì /ʐən³⁵ min³⁵ pi⁵/) it would be a little clearer that Mandarin "r" is actually /ʐ/, to which the Russian /ʐ/ is almost identical and for which it is a much better fit than a Russian /r/ would be, whereas in English it's /r/ which sounds closer to Mandarin /ʐ/. --Anatoli T. (обсудить/вклад) 23:20, 4 April 2018 (UTC)
Hmm, even I don't think we should use |ts= for that. Pinyin is already pronunciation-faithful, you just have to know what the symbols stand for. —Mahāgaja (formerly Angr) · talk 11:31, 5 April 2018 (UTC)
What I meant is contrasting various transliterations and romanisations/cyrillisations, and some reasoning behind one or the other. E.g. (the automated transliteration here is Revised Romanization (RR)): 평양 (Pyeong-yang /pʰjʌ̹ŋja̠ŋ/, Pyongyang), McCune–Reischauer: P'yŏngyang, Russian: Пхенья́н (Pxenʹján /pxʲɪˈnʲjan/) vs 부산 (Busan /pusʰa̠n/, Busan), McCune–Reischauer: Pusan, Russian: Пуса́н (Pusán /pʊˈsan/). The focus here should be on pʰ/p and p/b; this still causes a lot of confusion and arguments about the choices for Korean transliterations. Korean has pʰ/p/b sounds, but the transliteration matches either the pronunciation or the spelling, depending on the position or the method used. Do you see what I'm trying to say? Similarly, in my previous post, I was contrasting the English/Mandarin pinyin "r" and the Cyrillic Russian "ж" with each other and with the pronunciation in the source language. --Anatoli T. (обсудить/вклад) 13:20, 5 April 2018 (UTC)
Should we then do the same thing for Turkish: English soudjouk from Turkish sucuk (/suˈdʒuk/)? Because many of our readers will not know that c in Turkish is pronounced /dʒ/. I think a better solution to this issue is to use our words, as I already linked above, like this. --WikiTiki89 15:42, 5 April 2018 (UTC)
I agree with Wikitiki that for the Mandarin, Korean, and Turkish examples we should just give the IPA separately with {{IPAchar}}, outside the {{m}} template, and not with |ts=. But for Burmese, where the transliteration doesn't tell you whether certain consonants are voiced or not and whether certain vowels are reduced or not, I'd prefer to use |ts=. And wasn't the whole battle between Rua and Wyang regarding Thai transcription last year ultimately about the fact that one of them wanted a more spelling-based transliteration and the other wanted a more pronunciation-based transcription? Using |ts= for Thai would allow them both to have what they want, wouldn't it? —Mahāgaja (formerly Angr) · talk 18:39, 5 April 2018 (UTC)
@Wikitiki89: What you did at chinthe is fine by me but to me "ts=" is a shortcut for the same thing. Perhaps the templates could display pronounced as ... or even consider what type of brackets to include - / / or [ ], depending on the type of IPA? @Mahagaja: Even for pronunciation-faithful scripts and transliterations, it makes sense to provide pronunciation in many cases, especially when symbols are very confusing or misleading in terms of the pronunciation. --Anatoli T. (обсудить/вклад) 02:20, 7 April 2018 (UTC)
Well what I'm trying to say is precisely that |ts= is not a shortcut for the same thing. It's intended for an entirely different purpose. --WikiTiki89 13:42, 9 April 2018 (UTC)
I really think that if IPA is going to be included, it needs its own parameter: |pron= or, more unambiguously, |ipa=. As you say, IPA shouldn't have hardcoded brackets because it could be either phonetic or phonemic, and it needs to have the IPA class attribute (class="IPA") so that the proper fonts are applied and so that it can be located on a page if anyone wants to mess with it further using CSS or JavaScript. — Eru·tuon 19:29, 7 April 2018 (UTC)
It's not the job of |ts= to convey IPA-level pronunciation accuracy. In the example of ခြင်္သေ့ (hkrangse.), I can't say I know much of anything about Burmese, but t͡ɕʰəðḛ is clearly an IPA rendering, not a transcription, and therefore should not be placed in |ts=. --Victar (talk) 00:30, 7 April 2018 (UTC)
Yes, when we were discussing |ts= originally, it was made very explicit that it should not be used for IPA and should not overlap with the function of the "Pronunciation" section, whose purpose is to give narrow pronunciation information about a word. The transcription parameter was created to include information traditionally associated with/reconstructed for a term but not automatically generatable from the transliteration, all without interfering with the transliteration. The whole point is to give the representations of lemmata that the user is most likely to find in a dictionary (native script, transliteration, (reconstructed) transcription), not narrow pronunciation information. I'd brought up the notion before of limiting the distribution of this parameter to scripts that were deficient enough in information to merit transcription (abjads, syllabaries, cuneiform) so as to avoid this parameter being abused. —*i̯óh₁n̥C[5] 00:48, 7 April 2018 (UTC)
One way to think about it is: if it wouldn't pass as a headword on wikt, it shouldn't pass for use in |ts=. In the example of ခြင်္သေ့ (hkrangse.) again, hkrangse. gives us enough information to both disambiguate and get a general understanding as to how to pronounce the word. Anything more is overreach. --Victar (talk) 01:15, 7 April 2018 (UTC)
  • I would like to make a late addition to the conversation to say that I agree with John, Victar, etc. —Μετάknowledgediscuss/deeds 17:55, 8 April 2018 (UTC)
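
The separate |ipa= parameter suggested earlier in this thread (no hardcoded brackets, tagged with class="IPA") might look something like the sketch below. This is purely hypothetical: no such parameter or function exists in Module:links as far as this discussion shows, and the name format_ipa and the validation rule are assumptions for illustration.

```lua
-- Hypothetical formatter for a separate |ipa= parameter.
-- The editor supplies the delimiters, so both phonemic (/.../) and
-- phonetic ([...]) transcriptions are accepted; nothing is hardcoded.
local function format_ipa(ipa)
	local first, last = ipa:sub(1, 1), ipa:sub(-1)
	local phonemic = first == "/" and last == "/"
	local phonetic = first == "[" and last == "]"
	if #ipa < 2 or not (phonemic or phonetic) then
		error("IPA must be wrapped in /.../ or [...]")
	end
	-- class="IPA" lets CSS apply IPA fonts and lets scripts locate the span
	return '<span class="IPA">' .. ipa .. '</span>'
end
```

Keeping this apart from |ts= would mean each value gets the font treatment suited to it, which addresses the formatting concern raised above without settling whether IPA belongs in linking templates at all.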