Module:languages

Definition from Wiktionary, the free dictionary
The following documentation is located at Module:languages/documentation.

This module is used to retrieve and manage Wiktionary's various languages and the information associated with them. See Wiktionary:Languages for more information.

This module provides this access for use by other modules. To access the information from within a template, see Module:languages/templates.

The information itself is stored in the various data modules that are subpages of this module. They are listed in Category:Language data modules. These modules should not be used directly by any other module; the data should only be accessed through the functions provided by Module:languages.

Finding and retrieving languages

The module exports a number of functions that are used to find languages.

getLanguageByCode

getLanguageByCode(code)

Finds the language whose code matches the one provided. If it exists, it returns a Language object representing the language. Otherwise, it returns nil.
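The lookup can be sketched in plain Lua, outside Scribunto. This is an illustration under assumptions, not the module's own code: the data table below is a hypothetical stand-in for the data modules that the real module reads with mw.loadData, and the cache mirrors the object_cache memoisation in the source, so that asking for the same code twice returns the same object.

```lua
-- Hypothetical stand-in for the language data modules (code -> raw data).
local data = {
	fr = { names = { "French", "Modern French" } },
	la = { names = { "Latin" } },
}

-- Methods are shared through the metatable's __index field.
local Language = {}
Language.__index = Language

function Language:getCode()
	return self._code
end

function Language:getCanonicalName()
	return self._rawData.names[1]
end

-- Memoisation: one object per code, created on first request.
local object_cache = {}

local function getLanguageByCode(code)
	if object_cache[code] then
		return object_cache[code]
	end

	local rawData = data[code]
	if not rawData then
		return nil -- unknown code
	end

	local object = setmetatable({ _rawData = rawData, _code = code }, Language)
	object_cache[code] = object
	return object
end

local fr = getLanguageByCode("fr")
print(fr:getCanonicalName())          -- French
print(getLanguageByCode("fr") == fr)  -- true (same cached object)
print(getLanguageByCode("xx"))        -- nil
```

Since an unknown code yields nil rather than an object, callers should check the result before calling methods on it.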

getLanguageByCanonicalName

getLanguageByCanonicalName(name)

This function is expensive

Finds the language whose canonical name (the name used to represent that language on Wiktionary) matches the one provided. If it exists, it returns a Language object representing the language. Otherwise, it returns nil. The canonical name of languages should always be unique (it is an error for two languages on Wiktionary to share the same canonical name), so this is guaranteed to give at most one result.

This function searches through the whole database of languages, and is therefore relatively resource-intensive. It should be used sparingly.
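A minimal sketch of why this lookup is costly: with no index from names to codes, every entry may have to be checked. The alldata table below is a hypothetical stand-in for Module:languages/alldata, and the sketch returns a code rather than a Language object to stay self-contained.

```lua
-- Hypothetical stand-in for Module:languages/alldata (code -> raw data).
local alldata = {
	fr  = { names = { "French", "Modern French" } },
	fro = { names = { "Old French" } },
	la  = { names = { "Latin" } },
}

-- Linear scan comparing only the canonical name, names[1].
local function findCodeByCanonicalName(name)
	for code, data in pairs(alldata) do
		if data.names[1] == name then
			return code
		end
	end
	return nil
end

print(findCodeByCanonicalName("Old French"))    -- fro
print(findCodeByCanonicalName("Modern French")) -- nil (not a canonical name)
```

Because canonical names are unique, returning at the first match is safe, and the order in which pairs visits the entries does not affect the result.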

findLanguagesByName

findLanguagesByName(name, inexact)

This function is expensive

Finds languages which have the provided name among their list of possible names (including their canonical name). It returns a table containing Language objects for the languages found, or an empty table if none were found.

The inexact parameter can be given as true to perform a substring search of the name instead of an exact match. The result will then contain all languages that have the provided name as part of one of their possible names. The substring search is plain-text (not a Lua pattern) and case-sensitive.

This function searches through the whole database of languages, and is therefore relatively resource-intensive. It should be used sparingly.
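The difference between exact and inexact matching can be sketched as follows, again with a hypothetical alldata table and returning codes instead of Language objects. As in the module's source, the substring test calls string.find with its plain argument set to true, so characters such as "-" or "(" are not treated as pattern syntax. Unlike the real function, the sketch sorts its result, only to make the output predictable.

```lua
-- Hypothetical stand-in for Module:languages/alldata (code -> raw data).
local alldata = {
	fr  = { names = { "French", "Modern French" } },
	fro = { names = { "Old French" } },
	la  = { names = { "Latin" } },
}

-- Codes of all languages that have `name` among their names. With
-- `inexact`, any name containing `name` as a plain substring also matches.
local function findCodesByName(name, inexact)
	local found = {}
	for code, data in pairs(alldata) do
		for _, n in ipairs(data.names) do
			if (inexact and n:find(name, 1, true)) or n == name then
				table.insert(found, code)
				break -- one match per language is enough
			end
		end
	end
	table.sort(found) -- pairs() order is unspecified; sort for stable output
	return found
end

print(table.concat(findCodesByName("French"), ", "))       -- fr
print(table.concat(findCodesByName("French", true), ", ")) -- fr, fro
```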

getAllLanguages

getAllLanguages()

This function is expensive

Returns a table containing Language objects for all languages, sorted by code.

This function searches through the whole database of languages, and is therefore relatively resource-intensive. It should be used sparingly.
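Lua's pairs visits table entries in an unspecified order, so returning the languages sorted by code takes an explicit sort. A minimal sketch, with a hypothetical alldata table and a bare object type standing in for Language objects:

```lua
-- Hypothetical stand-in for Module:languages/alldata.
local alldata = {
	la = { names = { "Latin" } },
	fr = { names = { "French" } },
	de = { names = { "German" } },
}

local Language = {}
Language.__index = Language

function Language:getCode()
	return self._code
end

local function getAllLanguages()
	local ret = {}
	for code, data in pairs(alldata) do
		table.insert(ret, setmetatable({ _code = code, _rawData = data }, Language))
	end
	-- Without this sort the order would be whatever pairs() produced.
	table.sort(ret, function(a, b)
		return a:getCode() < b:getCode()
	end)
	return ret
end

for _, lang in ipairs(getAllLanguages()) do
	print(lang:getCode()) -- de, fr, la (one per line)
end
```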

Language objects

A Language object is returned from one of the functions above. It is a Lua representation of a language and the data associated with it. It has a number of methods that can be called on it, using the : syntax. For example:

local m_languages = require("Module:languages")
local lang = m_languages.getLanguageByCode("fr")
local name = lang:getCanonicalName()
-- "name" will now be "French"
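The : syntax is ordinary Lua method-call sugar: lang:getCanonicalName() means lang.getCanonicalName(lang), and the method itself is found through the object's metatable. A minimal sketch of that mechanism, using a hand-built object (shaped like the ones the module creates) rather than one returned by Module:languages:

```lua
local Language = {}
-- Fields missing from an object are looked up in Language.
Language.__index = Language

function Language:getCanonicalName()
	return self._rawData.names[1]
end

-- Hand-built object with the same fields the module uses.
local lang = setmetatable({ _code = "fr", _rawData = { names = { "French" } } }, Language)

print(lang:getCanonicalName())          -- French
print(lang.getCanonicalName(lang))      -- French (equivalent explicit form)
print(rawget(lang, "getCanonicalName")) -- nil: the method lives on Language
```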

Language:getCode

:getCode()

Returns the language code of the language. Example: "fr" for French.

Language:getCanonicalName

:getCanonicalName()

Returns the canonical name of the language. This is the name used to represent that language on Wiktionary, and is guaranteed to be unique to that language alone. Example: "French" for French.

Language:getAllNames

:getAllNames()

Returns a table of all names that the language is known by, including the canonical name. The names are not guaranteed to be unique; sometimes more than one language is known by the same name. Example: {"French", "Modern French"} for French.

Language:getType

:getType()

Returns the type of language, which can be "regular", "reconstructed" or "appendix-constructed".

Language:getScripts

:getScripts()

Returns a table of Script objects for all scripts that the language is written in. See Module:scripts.

Language:getFamily

:getFamily()

Returns a Family object for the language family that the language belongs to. See Module:families.

Language:getCategoryName

:getCategoryName()

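Returns the name of the main category of that language. This is the canonical name followed by " language", unless the canonical name already ends in "language" or "Language", in which case it is used unchanged. Example: "French language" for French, whose category is at Category:French language.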
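The naming rule can be sketched as a standalone function mirroring the logic of Language:getCategoryName in the source; the function name categoryName and the sample inputs are for illustration only.

```lua
-- Append " language" unless the name already ends in "language"/"Language".
local function categoryName(canonicalName)
	if canonicalName:find("[Ll]anguage$") then
		return canonicalName
	end
	return canonicalName .. " language"
end

print(categoryName("French"))                -- French language
print(categoryName("British Sign Language")) -- British Sign Language
```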

Language:makeEntryName

:makeEntryName(term)

Converts the given term into the form used in the names of entries. This removes diacritical marks that are not considered part of the normal written form of the language and are therefore not permitted in page names. It also removes certain punctuation characters that are never present in page names, such as an initial ¿ or ¡, or a final question mark, exclamation mark or script-specific full stop (e.g. the Devanagari danda ।). Example for Latin: "amō" → "amo" (the macron is removed).

The replacements made by this function are defined by the entry_name setting for each language in the data modules.
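A minimal sketch of how the from/to replacement lists are applied, in plain Lua. The Latin entry_name setting shown is hypothetical (the real rules are in the data modules), and only an initial ¿ or ¡ and a final ASCII ? or ! are stripped here. Plain string.gsub works on bytes, so each accented letter is matched as a literal UTF-8 sequence; the module itself uses mw.ustring.gsub, which handles Unicode characters in patterns.

```lua
-- Hypothetical entry_name setting for Latin: strip macrons from vowels.
local entry_name = {
	from = { "ā", "ē", "ī", "ō", "ū" },
	to   = { "a", "e", "i", "o", "u" },
}

local function makeEntryName(text)
	-- Drop one initial ¿ or ¡ (each is two bytes in UTF-8).
	local first = text:sub(1, 2)
	if first == "¿" or first == "¡" then
		text = text:sub(3)
	end
	-- Drop a final ? or !.
	text = text:gsub("[?!]$", "")
	-- Apply the replacements pairwise; a missing "to" entry means delete.
	for i, from in ipairs(entry_name.from) do
		text = text:gsub(from, entry_name.to[i] or "")
	end
	return text
end

print(makeEntryName("amō"))  -- amo
print(makeEntryName("quō?")) -- quo
```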

Language:makeSortKey

:makeSortKey(term)

Creates a sort key for the given term, following the rules appropriate for the language. This removes diacritical marks from the term if they are not considered significant for sorting, and may perform some other changes. Any initial hyphens (including the Hebrew maqaf ־ and the Arabic tatweel ـ) and asterisks are also removed, as is any text in parentheses, provided that something precedes or follows it. The key is returned in uppercase.

The replacements made by this function are defined by the sort_key setting for each language in the data modules.
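The general steps can be sketched in plain Lua. This sketch assumes ASCII input (plain string.lower and string.upper do not reliably case-map other characters) and uses a hypothetical sort_key setting that deletes apostrophes; the module itself uses the mw.ustring functions and also strips the maqaf and tatweel mentioned above.

```lua
-- Hypothetical sort_key setting: delete apostrophes. The "to" list is
-- empty, so the replacement defaults to "".
local sort_key = { from = { "'" }, to = {} }

local function makeSortKey(name)
	name = name:lower()
	-- Strip leading hyphens/asterisks, but keep at least one character.
	name = name:gsub("^[-*]+(.)", "%1")
	-- Drop parenthesized text that is preceded or followed by something.
	name = name:gsub("(.)%([^()]+%)", "%1")
	name = name:gsub("%([^()]+%)(.)", "%1")
	-- Language-specific replacements.
	for i, from in ipairs(sort_key.from) do
		name = name:gsub(from, sort_key.to[i] or "")
	end
	return name:upper()
end

print(makeSortKey("-ing"))    -- ING
print(makeSortKey("o'clock")) -- OCLOCK
print(makeSortKey("(foo)"))   -- (FOO)
```

The last case shows that a term consisting solely of a parenthesized part is left alone, since neither parenthesis pattern has anything before or after it to match.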

Language:transliterate

:transliterate(text, sc)

Transliterates the text from the given script into the Latin script (see Wiktionary:Transliteration and romanization). The language must have the translit_module property for this to work; if it is not present, nil is returned.

The sc parameter is handled by the transliteration module, and how it is handled is specific to that module. Some transliteration modules may tolerate nil as the script, while others require it to be one of the scripts that the module can transliterate and will show an error if it is not. For this reason, the sc parameter should always be provided when writing non-language-specific code.
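A minimal sketch of the dispatch, built from hypothetical pieces: toy_translit stands in for a transliteration module (the real module loads one with require("Module:" .. translit_module) and calls its tr function with the text, the language code and sc), and the modules table stands in for require. The toy module rejects anything but the Greek script code "Grek", to show why omitting sc can fail.

```lua
-- Hypothetical transliteration module: maps three Greek letters and
-- refuses to run without a supported script code.
local toy_translit = {
	tr = function(text, lang, sc)
		if sc ~= "Grek" then
			error("toy_translit: unsupported script " .. tostring(sc))
		end
		local map = { ["α"] = "a", ["β"] = "b", ["γ"] = "g" }
		for from, to in pairs(map) do
			text = text:gsub(from, to)
		end
		return text
	end,
}

-- Hypothetical registry standing in for require("Module:" .. name).
local modules = { toy_translit = toy_translit }

local function transliterate(rawData, code, text, sc)
	-- No module (or no text) means no transliteration.
	if not rawData.translit_module or not text then
		return nil
	end
	return modules[rawData.translit_module].tr(text, code, sc)
end

local greek = { translit_module = "toy_translit" }
print(transliterate(greek, "el", "αβγ", "Grek"))     -- abg
print(transliterate({}, "fr", "chat", "Latn"))       -- nil
print(pcall(transliterate, greek, "el", "αβγ", nil)) -- false, plus the error message
```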

Language:getRawData

:getRawData()

This function is not for use in entries or other content pages.

Returns a blob of data about the language. The format of this blob is undocumented, and perhaps unstable; it's intended for things like the module's own unit tests, which are "close friends" with the module and will be kept up to date as the format changes.


local export = {}
 
local Language = {}
 
function Language:getCode()
	return self._code
end
 
function Language:getCanonicalName()
	return self._rawData.names[1]
end
 
function Language:getAllNames()
	return self._rawData.names
end
 
function Language:getType()
	return self._rawData.type
end
 
function Language:getScripts()
	local m_scripts = require("Module:scripts")
	local ret = {}
 
	for _, sc in ipairs(self._rawData.scripts or {}) do
		table.insert(ret, m_scripts.getScriptByCode(sc))
	end
 
	return ret
end
 
function Language:getFamily()
	local m_families = require("Module:families")
	return m_families.getFamilyByCode(self._rawData.family)
end
 
function Language:getCategoryName()
	local name = self._rawData.names[1]
 
	-- If the name already ends in "language", don't add it.
	if name:find("[Ll]anguage$") then
		return name
	else
		return name .. " language"
	end
end
 
function Language:makeEntryName(text)
	text = mw.ustring.gsub(text, "^[¿¡]", "")
	text = mw.ustring.gsub(text, "[؟?!;՛՜ ՞ ՟?!।॥။၊་།]$", "")
 
	if self._rawData.entry_name then
		for i, from in ipairs(self._rawData.entry_name.from) do
			local to = self._rawData.entry_name.to[i] or ""
			text = mw.ustring.gsub(text, from, to)
		end
	end
 
	return text
end
 
function Language:makeSortKey(name)
	name = mw.ustring.lower(name)
 
	-- Remove initial hyphens and *
	name = mw.ustring.gsub(name, "^[-־ـ*]+(.)",
		"%1")
	-- Remove anything in parentheses, as long as they are either preceded or followed by something
	name = mw.ustring.gsub(name, "(.)%([^()]+%)", "%1")
	name = mw.ustring.gsub(name, "%([^()]+%)(.)", "%1")
 
	-- If there are language-specific rules to generate the key, use those
	if self._rawData.sort_key then
		for i, from in ipairs(self._rawData.sort_key.from) do
			local to = self._rawData.sort_key.to[i] or ""
			name = mw.ustring.gsub(name, from, to)
		end
	end
 
	return mw.ustring.upper(name)
end
 
function Language:transliterate(text, sc)
	if not self._rawData.translit_module or not text then
		return nil
	end
 
	return require("Module:" .. self._rawData.translit_module).tr(text, self:getCode(), sc)
end
 
-- Do NOT use this method!
-- All uses should be pre-approved on the talk page!
function Language:getRawData()
	return self._rawData
end
 
Language.__index = Language
 
local function getRawLanguageData(code)
	local stable = mw.loadData("Module:languages/stable")[code]
 
	if stable then
		return stable
	end
 
	if code:find("^[a-z][a-z]$") then
		return mw.loadData("Module:languages/data2")[code]
	elseif code:find("^[a-z][a-z][a-z]$") then
		local pre = code:sub(1, 1)
		return mw.loadData("Module:languages/data3/" .. pre)[code]
	elseif code:find("^[a-z-]+$") then
		return mw.loadData("Module:languages/datax")[code]
	else
		return nil
	end
end
 
-- The object cache implements memoisation, and is used to avoid duplication
-- of objects. If you request the same language code twice, you should also get
-- the same object twice, not two different objects with identical data.
-- It might also speed things up a bit.
local object_cache = {}
 
function export.getLanguageByCode(code)
	if object_cache[code] then
		return object_cache[code]
	end
 
	local rawData = getRawLanguageData(code)
 
	if rawData then
		local object = setmetatable({ _rawData = rawData, _code = code }, Language)
		object_cache[code] = object
		return object
	else
		return nil
	end
end
 
-- Lua implementation of [[Template:langrev]]
-- We could optimise this by prioritising stable and data2 modules,
-- as they are more frequently used and thus more likely to contain what the user
-- is looking for.
function export.getLanguageByCanonicalName(name)
	mw.incrementExpensiveFunctionCount()
	local m_data = mw.loadData("Module:languages/alldata")
 
	for code, data in pairs(m_data) do
		if data.names[1] == name then
			return export.getLanguageByCode(code)
		end
	end
 
	return nil
end
 
function export.findLanguagesByName(name, inexact)
	mw.incrementExpensiveFunctionCount()
	local m_data = mw.loadData("Module:languages/alldata")
	local found = {}
 
	for code, data in pairs(m_data) do
		for _, n in ipairs(data.names) do
			if inexact and n:find(name, nil, true) or n == name then
				table.insert(found, export.getLanguageByCode(code))
				break
			end
		end
	end
 
	return found
end
 
function export.getAllLanguages()
	mw.incrementExpensiveFunctionCount()
	local m_data = mw.loadData("Module:languages/alldata")
 
	local ret = {}
 
	for code, data in pairs(m_data) do
		-- This isn't the most efficient way to do it, but it works for now.
		table.insert(ret, export.getLanguageByCode(code))
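	end
 
	-- pairs() visits entries in no particular order; sort by code as documented.
	table.sort(ret, function(a, b)
		return a:getCode() < b:getCode()
	end)
 
	return ret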
end
 
return export