Module:encodings

Definition from Wiktionary, the free dictionary

This module encodes strings into encodings other than UTF-8. Since the wiki software, its pages, and its output all use UTF-8, the module should not be used to display text on the wiki. Rather, it is useful for encoding text in external links to sites that use older encodings. For example, the Catalan IEC dictionary requires input in ISO 8859-1 encoding, so text must be converted into this encoding when links to the dictionary are added to entries.

encode

encode(text, encoding)

Encodes the given text in the specified encoding, and returns the resulting string. Scribunto does not allow modules to return invalid UTF-8 text, and replaces any invalid bytes in the output with U+FFFD REPLACEMENT CHARACTER. Since text in another encoding will generally contain byte sequences that are invalid UTF-8, the output is URL-encoded (percent-encoded) before it is returned. This is a natural fit, as the primary use of this function is to encode text for use in URLs.
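To illustrate why the percent-encoding step is needed, here is a minimal sketch in plain Lua. The helper percent_encode is hypothetical and written only for this example; the module itself calls mw.uri.encode, whose exact output may differ (for instance in how spaces are written). The sketch shows that the ISO 8859-1 byte for "é" (0xE9), which is not valid UTF-8 on its own, becomes the pure-ASCII sequence %E9.

```lua
-- Hypothetical stand-in for mw.uri.encode, for illustration only.
-- Percent-encodes every byte except ASCII letters, digits, "-", ".", "_" and "~".
local function percent_encode(bytes)
	-- The outer parentheses discard gsub's second return value (the match count).
	return (bytes:gsub("[^A-Za-z0-9%-%._~]", function(c)
		return string.format("%%%02X", string.byte(c))
	end))
end

-- "café" in ISO 8859-1: the final byte 0xE9 is invalid as standalone UTF-8.
local latin1 = string.char(0x63, 0x61, 0x66, 0xE9)

print(percent_encode(latin1)) --> caf%E9
```

The result contains only ASCII characters, so it is valid UTF-8 and can be returned from a module without being replaced by U+FFFD.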

The module defines a set of "encoders", each of which is able to encode text in a given encoding. More encoders can be added to the module as necessary.
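As a rough sketch of what adding an encoder might look like, the following plain-Lua example registers a second encoder in a table keyed by encoding name. The "US-ASCII" encoder is hypothetical and not part of the module, and the codepoints helper is an assumed stand-in for mw.ustring.gcodepoint so that the example runs outside Scribunto; it assumes its input is well-formed UTF-8.

```lua
local encoders = {}

-- Assumed stand-in for mw.ustring.gcodepoint: iterates over the code points
-- of a well-formed UTF-8 string. No validation is performed.
local function codepoints(s)
	local i = 1
	return function()
		if i > #s then
			return nil
		end
		local b = s:byte(i)
		local cp, len
		if b < 0x80 then
			cp, len = b, 1
		elseif b < 0xE0 then
			cp, len = b - 0xC0, 2
		elseif b < 0xF0 then
			cp, len = b - 0xE0, 3
		else
			cp, len = b - 0xF0, 4
		end
		-- Each continuation byte contributes its low six bits.
		for j = i + 1, i + len - 1 do
			cp = cp * 64 + (s:byte(j) - 0x80)
		end
		i = i + len
		return cp
	end
end

-- Hypothetical encoder: US-ASCII accepts only code points below 128.
encoders["US-ASCII"] = function(text)
	local ret = {}
	for cp in codepoints(text) do
		if cp >= 128 then
			error("Invalid US-ASCII character (code point " .. cp .. ").")
		end
		table.insert(ret, string.char(cp))
	end
	return table.concat(ret)
end

print(encoders["US-ASCII"]("abc")) --> abc
```

Each encoder takes UTF-8 text and returns a byte string in the target encoding, raising an error for characters that the encoding cannot represent.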

The encode function can also be invoked from wikitext, outside Scribunto. In that case, the text and the encoding are given as the first and second unnamed parameters of the invocation.

Utilization


local export = {}

local encoders = {}

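-- ISO 8859-1 maps each code point below 256 to the single byte of the same value.
encoders["ISO 8859-1"] = function(text)
	local ret = {}
	
	for cp in mw.ustring.gcodepoint(text) do
		if cp >= 256 then
			-- Code points above U+00FF cannot be represented in ISO 8859-1.
			error("Invalid ISO 8859-1 character \"" .. mw.ustring.char(cp) .. "\".")
		end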
		
		table.insert(ret, string.char(cp))
	end
	
	return table.concat(ret)
end

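function export.encode(text, encoding)
	-- When invoked from wikitext, the first argument is a frame object (a table)
	-- rather than a string, so the text and encoding are read from its arguments.
	if type(text) == "table" then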
		local params = {
			[1] = {required = true, allow_empty = true},
			[2] = {required = true},
		}
		
		local args = require("Module:parameters").process(text.args, params)
		text = args[1]
		encoding = args[2]
	end
	
	local encoder = encoders[encoding]
	
	if not encoder then
		-- tostring() keeps the message readable even if no encoding was given (nil).
		error("No encoder exists for the encoding \"" .. tostring(encoding) .. "\".")
	end
	
	-- Percent-encode the result, since the raw bytes may not be valid UTF-8.
	return mw.uri.encode(encoder(text))
end

return export