Module talk:columns

From Wiktionary, the free dictionary

Split by line endings

Instead of requiring each list item to be a separate parameter, couldn't it split the text by line endings? I think that would make it a bit easier. —CodeCat 13:14, 25 January 2014 (UTC)

It could, but what if you wanted to preserve line endings for indentation purposes (*, **, etc.)? That's why I changed it to parameters. I could add a function for switching between defining the list through template parameters or through newlines. DTLHS (talk) 17:40, 25 January 2014 (UTC)
You could just add the * in the parameters too. That way you could also make nested lists, while at the same time making sure that the module doesn't cut them in half (as long as it's made to understand them). —CodeCat 17:45, 25 January 2014 (UTC)

I am happy either way. I add the pipe symbol by a macro in Word to create the tens or hundreds of separate parameters. Splitting by line endings will probably be more convenient for regular users. --Vahag (talk) 19:45, 26 January 2014 (UTC)
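
For illustration, splitting on line endings could look roughly like the sketch below. The helper name `split_lines` is hypothetical and is not part of Module:columns; it assumes the whole list arrives as a single string, and it keeps leading `*` / `**` markers so that nested items remain recognisable.

```lua
-- Hypothetical helper (not in Module:columns): turn one block of text
-- into a list of items, one item per non-empty line.
local function split_lines(text)
	local items = {}
	-- Normalise CRLF and bare CR line endings to LF first.
	text = text:gsub("\r\n", "\n"):gsub("\r", "\n")
	for line in text:gmatch("[^\n]+") do
		-- Trim surrounding whitespace; leading * or ** is kept as part of the item.
		local item = line:match("^%s*(.-)%s*$")
		if item ~= "" then
			items[#items + 1] = item
		end
	end
	return items
end
```

Called as `split_lines("alpha\n* beta\n** gamma")`, this yields the three items `alpha`, `* beta` and `** gamma`; how the module would then interpret the asterisks is left open.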

Broken sort with Ancient Greek

Sort is broken with Ancient Greek: see κρατέω for an example. ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 20:33, 6 July 2014 (UTC)

Okay, so after some digging I've found that Scribunto appears to somehow override the > operator so that it'll interpret á/ä/ą/etc. as "a" (instead of sorting them by Unicode value as usual); however letters with Ancient Greek diacritics (ἀ/ᾳ/ἁ/ᾶ) are treated as an empty string. I don't know how to fix this, or even where the relevant code might be. If anyone could solve this problem, or even point me in the right direction, that would be great. ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 17:28, 7 September 2014 (UTC)

Problem with {{syn2}}

@CodeCat, DTLHS, Erutuon: according to the documentation, if the parameter |lang= is "left undefined, the entry calling the template will have to provide it". I updated {{syn2}}, which calls this module, by adding |lang={{{lang|en}}} so that it is unnecessary to manually specify the value of |lang= if it is English. However, if |lang= is omitted entirely the template works as expected, but if |lang=en is added the error message "Lua error in Module:parameters at line 108: The parameter "lang" is not used by this template" appears. What did I do wrong? — SGconlaw (talk) 18:00, 14 December 2017 (UTC)

The logic in the module is that if the |lang= parameter is provided on the template page in the {{#invoke:columns}} invocation (frame.args), then the same parameter cannot be provided when the template is used (frame:getParent().args). I think the problem will be fixed if you do not provide a default language code on the template page: thus, |lang={{{lang|}}}. However, a mechanism to change this logic (or a parameter to provide a default language code) could be added if it is really desirable for there to be a default language code. — Eru·tuon 20:41, 14 December 2017 (UTC)
Well, I was just trying to get the template to work according to the documentation which says that if |lang= is not specified then the template defaults to English. However, without adding |lang= to the template, omitting it generates an error stating that |lang= must be specified. — SGconlaw (talk) 02:27, 15 December 2017 (UTC)
Strange. Perhaps that was the behavior in a previous revision of the module or template. — Eru·tuon 02:50, 15 December 2017 (UTC)
I had a look at {{syn3}} and {{syn4}}. So, from what I can tell, if |lang= is omitted entirely, the template does not link to any language section of an entry page. If one wishes to link to, say, the English section, then |lang=en must be explicitly specified. Is this correct? — SGconlaw (talk) 02:57, 15 December 2017 (UTC)
However, I've just realized that @Rua has deleted {{syn2}} and {{ant2}} with the message "No longer needed"; not quite sure why. — SGconlaw (talk) 02:29, 15 December 2017 (UTC)
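
The logic described above can be sketched in plain Lua as follows. This is only an illustration, not the code in Module:columns or Module:parameters: `invoke_args` stands in for frame.args, `template_args` stands in for frame:getParent().args, and the `default_lang` parameter in the second function is a hypothetical name for the "default language code" mechanism that was suggested but not implemented in this thread. The wording of the "required" error is likewise made up.

```lua
-- Sketch of the described behaviour: |lang= may be given in the #invoke
-- or at the point of use, but not in both places.
local function resolve_lang_strict(invoke_args, template_args)
	if invoke_args.lang then
		if template_args.lang then
			-- Same message as the one quoted in the thread.
			error('The parameter "lang" is not used by this template')
		end
		return invoke_args.lang
	end
	if not template_args.lang then
		-- Hypothetical wording for the "must be specified" error.
		error('The parameter "lang" is required')
	end
	return template_args.lang
end

-- Hypothetical alternative: the invocation only supplies a default,
-- which the entry calling the template may override.
local function resolve_lang_with_default(invoke_args, template_args)
	return template_args.lang or invoke_args.default_lang
end
```

Under the strict version, `resolve_lang_strict({lang = "en"}, {lang = "en"})` raises the error from the report, because the language is supplied twice. Under the alternative, `resolve_lang_with_default({default_lang = "en"}, {})` returns `"en"`.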

Broken sort with Gothic

Gothic is being sorted very strangely. At 𐍅𐌰𐌽𐌳𐌾𐌰𐌽 (wandjan) I wrote:

{{der3|lang=got
|𐌰𐍄𐍅𐌰𐌽𐌳𐌾𐌰𐌽
|𐌰𐍆𐍅𐌰𐌽𐌳𐌾𐌰𐌽
|𐌱𐌹𐍅𐌰𐌽𐌳𐌾𐌰𐌽
|𐌲𐌰𐍅𐌰𐌽𐌳𐌾𐌰𐌽
|𐌹𐌽𐍅𐌰𐌽𐌳𐌾𐌰𐌽
|𐌿𐍃𐍅𐌰𐌽𐌳𐌾𐌰𐌽
}}

with the words already in alphabetical order. It surfaces as:

with the alphabetical order all messed up. The two words that start with 𐌰 (a) aren't even next to each other, let alone in alphabetical order. And the word starting with 𐌱 (b) appears after the word starting with 𐌹 (i) instead of before the word starting with 𐌲 (g).

@Erutuon, Wyang: Any ideas? —Mahāgaja (formerly Angr) · talk 14:16, 25 April 2018 (UTC)

For some reason Lua thinks the character "𐍄" (66372) is greater than the character "𐍆" (66374). Therefore the comparison fails on line 80. I'm guessing we'll have to use mw.ustring.codepoint and iterate over all characters instead of just comparing the strings with <. In the meantime you can use {{der3-u}} which will not apply any automatic sorting. DTLHS (talk) 16:35, 25 April 2018 (UTC)
Is that all? Because if that were all, I'd expect 𐌰𐍆𐍅𐌰𐌽𐌳𐌾𐌰𐌽 >> 𐌰𐍄𐍅𐌰𐌽𐌳𐌾𐌰𐌽 >> 𐌱𐌹𐍅𐌰𐌽𐌳𐌾𐌰𐌽, and then the rest correct, but what is actually generated is just bizarre. Or does that one error mean that the whole thing breaks down after that point? —Mahāgaja (formerly Angr) · talk 16:43, 25 April 2018 (UTC)
No that's not all. Looking at all the characters in Gothic, if the original list is { [1] = 𐌰,[2] = 𐌱,[3] = 𐌲,[4] = 𐌳,[5] = 𐌴,[6] = 𐌵,[7] = 𐌶,[8] = 𐌷,[9] = 𐌸,[10] = 𐌹,[11] = 𐌺,[12] = 𐌻,[13] = 𐌼,[14] = 𐌽,[15] = 𐌾,[16] = 𐌿,[17] = 𐍀,[18] = 𐍁,[19] = 𐍂,[20] = 𐍃,[21] = 𐍄,[22] = 𐍅,[23] = 𐍆,[24] = 𐍇,[25] = 𐍈,[26] = 𐍉,[27] = 𐍊,}, (Unicode order) that will be sorted to { [1] = 𐌰,[2] = 𐍁,[3] = 𐍂,[4] = 𐍀,[5] = 𐌿,[6] = 𐌾,[7] = 𐍃,[8] = 𐍅,[9] = 𐍈,[10] = 𐍄,[11] = 𐍇,[12] = 𐍆,[13] = 𐍉,[14] = 𐌽,[15] = 𐌻,[16] = 𐌳,[17] = 𐌴,[18] = 𐌲,[19] = 𐌱,[20] = 𐌼,[21] = 𐌵,[22] = 𐌷,[23] = 𐌺,[24] = 𐌶,[25] = 𐌹,[26] = 𐌸,[27] = 𐍊,}. DTLHS (talk) 16:53, 25 April 2018 (UTC)

We could replace the comparison function with this:

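-- Compare two strings code point by code point instead of with the < operator.
local function comp(item1, item2)
	local l1 = mw.ustring.len(item1)
	local l2 = mw.ustring.len(item2)
	for i=1,math.min(l1,l2) do
		local b1 = mw.ustring.codepoint(item1, i, i)
		local b2 = mw.ustring.codepoint(item2, i, i)
		-- The first differing code point decides the order.
		if b1 ~= b2 then
			return b1 < b2
		end
	end
	-- One string is a prefix of the other: the shorter one sorts first.
	return l1 < l2
end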

I don't know how much of a performance loss this would be. DTLHS (talk) 17:09, 25 April 2018 (UTC)

Oy vey. It's probably easier to just use {{der3-u}}. Gothic isn't a language that's going to be needing collapsible tables of columns very often anyway. —Mahāgaja (formerly Angr) · talk 17:36, 25 April 2018 (UTC)
This is very strange. I tested the Lua < operator (which is being used to sort the words in the template) with these Gothic words and it returns false no matter which order the words are in: that is, Gothic word number 1 is always both less than and greater than Gothic word number 2. (The > or <= operators also give the same result, true or false, for both orderings.) So, the sorting function is giving the Gothic words a random order.
When I try a version of the same program in my off-wiki Lua interpreters (versions 5.3.4 and 5.1.5), it works correctly (meaning, each word is either greater or less than another). So there seems to be a bug in how Scribunto implements the comparison operators. It seems to only affect codepoints in the Supplementary Multilingual Plane. I tried characters from the Linear B block, at the beginning of the SMP, and it has the bug, but Arabic Presentation Forms, a few blocks below the SMP, doesn't.
Maybe the comparison function is implemented by parsing the string into a series of codepoints stored in 16-bit unsigned integers (range 0-0xFFFF), and since the SMP has codepoints outside of this range, the operation fails and instead of crashing, either true or false is returned. (Or maybe this has something to do with UCS-2.) Maybe they assumed that no wiki will ever need to use comparison operators on the SMP? Anyway, I suppose a bug report needs to be filed about this.
In the meantime, there may be an inexpensive way to detect characters in the SMP and only use the verbose comparison function if it's needed to avoid the bug. — Eru·tuon 19:55, 25 April 2018 (UTC)
Oh duh. I guess codepoints in the SMP are being truncated to 0xFFFF. So, Gothic codepoint 1 is 0xFFFF and Gothic codepoint 2 is also 0xFFFF; 0xFFFF < 0xFFFF is false, and reversing the order, 0xFFFF < 0xFFFF is also false, and so is 0xFFFF > 0xFFFF. But 0xFFFF <= 0xFFFF and 0xFFFF >= 0xFFFF are true. — Eru·tuon 20:26, 25 April 2018 (UTC)
So basically what it's trying to do is sort aaaaaaaaa, aaaaaaaaa, aaaaaaaaa, aaaaaaaaa, aaaaaaaaa, and aaaaaaaaa? —Mahāgaja (formerly Angr) · talk 20:29, 25 April 2018 (UTC)
@Mahagaja: Yeah, in effect. Every SMP character is treated as the same character. I'm writing about this on Phabricator now. — Eru·tuon 21:34, 25 April 2018 (UTC)
Phabricator link: phab:T193096. — Eru·tuon 22:33, 25 April 2018 (UTC)
@Erutuon Your SMP detection finds characters such as , and items such as {{l|eo|hispana lingvo|t=Spanish language}} in the lists. DTLHS (talk) 19:31, 26 April 2018 (UTC)
@DTLHS: Whoops. It's off by a power of 16. (It searches for characters U+1000 and above rather than U+10000 and above.) — Eru·tuon 19:35, 26 April 2018 (UTC)
Okay, so the problem is in a C function used by our server's Lua interpreter, and it may not be fixed anytime soon. The workaround is to search for a non-BMP character in the list of words, and if one is found, use the compare function that DTLHS posted. — Eru·tuon 19:46, 26 April 2018 (UTC)
@Erutuon: Thanks for finding a workaround! I hope the actual problem gets solved someday. —Mahāgaja (formerly Angr) · talk 07:13, 27 April 2018 (UTC)
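
The workaround described in this thread can be sketched in plain Lua roughly as below. This is an illustration under stated assumptions, not the code that was added to the module: the function names are hypothetical, the input is assumed to be well-formed UTF-8, and a small hand-written decoder replaces mw.ustring so that the sketch runs outside Scribunto. Detection relies on the fact that code points at or above U+10000 are exactly those encoded as four bytes, with a lead byte in the range 0xF0 to 0xF4.

```lua
-- True if the UTF-8 string contains a code point >= U+10000 (non-BMP).
local function has_non_bmp(s)
	return s:find("[\240-\244]") ~= nil
end

-- Decode a well-formed UTF-8 string into a list of code points.
local function codepoints(s)
	local cps, i, n = {}, 1, #s
	while i <= n do
		local b = s:byte(i)
		local len, cp
		if b < 0x80 then len, cp = 1, b
		elseif b < 0xE0 then len, cp = 2, b - 0xC0
		elseif b < 0xF0 then len, cp = 3, b - 0xE0
		else len, cp = 4, b - 0xF0 end
		-- Each continuation byte contributes its low six bits.
		for j = i + 1, i + len - 1 do
			cp = cp * 64 + (s:byte(j) - 0x80)
		end
		cps[#cps + 1] = cp
		i = i + len
	end
	return cps
end

-- Same idea as the comparison function posted above: code point order.
local function compare_by_codepoint(a, b)
	local ca, cb = codepoints(a), codepoints(b)
	for i = 1, math.min(#ca, #cb) do
		if ca[i] ~= cb[i] then
			return ca[i] < cb[i]
		end
	end
	return #ca < #cb
end

-- Use the slower comparison only when some item needs it.
local function sort_items(items)
	local need_fallback = false
	for _, item in ipairs(items) do
		if has_non_bmp(item) then
			need_fallback = true
			break
		end
	end
	if need_fallback then
		table.sort(items, compare_by_codepoint)
	else
		table.sort(items)
	end
	return items
end
```

With this check, an item such as `hispana lingvo` does not trigger the fallback, whereas any Gothic word does. The decoder allocates a table per comparison, so a real implementation would probably want to cache the decoded form of each item.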

Broken sort in Latin script

Apparently the sort keys remove not only punctuation, but also spaces, which is incorrect. For example, the vassal derived terms sort vassal state after vassalship whereas it should be between envassal and vassalage. Can this be fixed promptly? Urhixidur (talk) 12:08, 3 August 2018 (UTC)

@Urhixidur: Makes sense to me. I've made the change, though I'm still interested in the reasoning for removing spaces. — Eru·tuon 18:23, 3 August 2018 (UTC)
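
A minimal sketch of a sort key that drops punctuation but keeps spaces is given below. The function names are hypothetical and this is not the module's actual sort-key code; it handles ASCII punctuation only and lowercases with the plain Lua string library, which is enough to reproduce the vassal example.

```lua
-- Hypothetical sort key: lowercase, strip ASCII punctuation, keep spaces.
local function make_sort_key(term)
	local key = term:lower()
	-- %p matches punctuation; the space character is not punctuation.
	key = key:gsub("%p", "")
	return key
end

-- Sort a list of terms by that key.
local function sort_terms(terms)
	table.sort(terms, function(a, b)
		return make_sort_key(a) < make_sort_key(b)
	end)
	return terms
end
```

Because a space sorts before any letter, `vassal state` lands between `envassal` and `vassalage`. If spaces were stripped as well, the key would become `vassalstate`, which sorts after `vassalship`, matching the behaviour reported above.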

now supports <tr:...>, <q:...>, etc.

@Mahagaja I added support to {{col3}} etc. for inline transliteration, qualifiers and similar. You can see an example in User:Benwing2/test-col. Benwing2 (talk) 00:56, 2 May 2021 (UTC)

notr

Can I have "notr=all" to suppress all (manual/automatic) transcription/transliteration at once, to save memory? And perhaps "notr=some" could still be overridden by a term? I have hundreds of terms (rhymes) to use with it but it runs out of memory. Segmentation does not help when they display on the same page. --Octahedron80 (talk) 05:18, 5 October 2021 (UTC)

False prefixes

@Benwing2 Hi - γίνομαι (gínomai) is throwing an error at the bottom of the page due to some (admittedly weird) formatting, as the template thinks Compounds: is a prefix. Not sure there's a straightforward way to solve this without having an override param (awkward) or loading every lang prefix (expensive). Theknightwho (talk) 08:48, 9 April 2023 (UTC)

@Theknightwho I will solve this by restricting the allowed format of the language codes, see my recent post in WT:BP about getting rid of such codes. For the moment it just means those weird codes won't be supported as language prefixes, which isn't the end of the world. Have to go to sleep now, will get to this tomorrow morning if that's OK. Benwing2 (talk) 08:51, 9 April 2023 (UTC)
@Benwing2 Of course! Get some rest. Theknightwho (talk) 08:58, 9 April 2023 (UTC)
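
One way to restrict the accepted prefix format, purely as a sketch: the rule below (a lowercase ASCII letter followed by lowercase letters or hyphens, then a colon, then a non-space character) is an assumption made for illustration, and the thread does not state what format was actually chosen. The function name `split_lang_prefix` is hypothetical.

```lua
-- Hypothetical check: split "code:term" only when the part before the
-- colon looks like a language code under the assumed format.
local function split_lang_prefix(item)
	local code, rest = item:match("^([a-z][a-z%-]+):(%S.*)$")
	if code then
		return code, rest
	end
	-- Not a recognised prefix: treat the whole item as the term.
	return nil, item
end
```

Under this rule `ko:말` splits into the code `ko` and the term `말`, while an item beginning with `Compounds: ` is left alone because of the capital letter and the space after the colon. A pattern this loose would still accept strings that are not real language codes, so a lookup against known codes would be needed somewhere downstream.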

<alt:> is not working with w: prefix

{{der2|ko
|w:ko:말<alt:말-을 말-하다>
|말<alt:말-을 말-하다>
}}

<alt:> is reflected in the transliteration, but is not ultimately shown.

@Benwing2, Theknightwho Fish bowl (talk) 05:16, 3 May 2024 (UTC)

@Theknightwho This is caused by this line: Module:links#L-462. Can you take a look? Before this line runs, `data.term` looks like [[w:ko:말|말]] and `data.alt` looks like 말-을 말-하다. After the line, `data.term` looks like w:ko:말 (the brackets are stripped) and `data.alt` looks like (the actual alt value has been replaced with the value from the brackets). I think in this case it should honor the existing `data.alt` if it is specified. Benwing2 (talk) 06:24, 3 May 2024 (UTC)
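
The suggested behaviour could look something like the following sketch. It is a hypothetical stand-in for the bracket-stripping step, not the code at the cited line of Module:links; only the field names `data.term` and `data.alt` come from the description above, and the function name and pattern are assumptions.

```lua
-- Hypothetical stand-in: strip a piped link [[target|display]] from
-- data.term, but keep an alt value that was already supplied.
local function strip_piped_link(data)
	local target, display = data.term:match("^%[%[([^%[%]|]+)|([^%[%]|]+)%]%]$")
	if target then
		data.term = target
		-- Fall back to the link's display text only when no alt was given.
		if data.alt == nil or data.alt == "" then
			data.alt = display
		end
	end
	return data
end
```

With `data.term` set to `[[w:ko:말|말]]` and `data.alt` set to `말-을 말-하다`, the term becomes `w:ko:말` and the alt text is left as it was. When no alt is given, the display text from the brackets is used.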