Module:pi-decl/noun: difference between revisions

Edit summary (older revision): Added neuter stems in -as and masculine stems in -an to all scripts.
Edit summary (newer revision): Isolated Brahmi - appears to be a fault in handling of supplementary codepoints.
Line 48:
  elseif match(stem, sc("[สसসသ][[กฺक्ক্က်]$")) then -- Somehow fails if this test and next are combined!
  	ending = "as"
- elseif match(stem, sc("[ᩈសස𑀲][ᨠ᩺ᨠ᩼គ៑ක්𑀓𑁆]$")) then
+ -- elseif match(stem, sc("[ᩈសස𑀲][ᨠ᩺ᨠ᩼គ៑ක්𑀓𑁆]$")) -- Fails except for Lana. Suspect problem in supplementary plane handling.
+ elseif match(stem, sc("[ᩈសස][ᨠ᩺ᨠ᩼គ៑ක්]$")) then
+ 	ending = "as"
+ elseif match(stem, "𑀲𑁆$") then
  	ending = "as"
  elseif match(stem, "an$") then

Revision as of 19:41, 3 October 2018

Purpose

This module provides Pali inflection tables for nouns, adjectives and pronouns. Pronouns currently use the interface for nouns, while adjectives need a separate invocation for each gender.

Some functions are exported from this module to service the testing of noun inflection. The module also provides utility functions for the conjugation of verbs.

Normal Use

The normal way to use this module is to invoke the template {{pi-decl-noun}}; see that template's documentation for the interface. The template invokes the exported function show.

Data tables

The primary data table for the inflections is the data module Module:pi-decl/noun/Latn, which contains the Latin script tables. These are supplemented by identically structured tables for each of the other supported scripts. If the table for a particular paradigm is missing from one of these, the table will be generated using the transliteration functions in Module:pi-Latn-translit. The data modules for the other scripts are:

With the exception of the masculine and neuter thematic nouns, the Thai and Lao tables are not used for declension with explicit vowels.

There is no such redundant table for the Chakma script.


Deliberately Exported Functions

The following Lua functions are exported by this module:

  • orJoin
  • joinSuffix
  • arrcat_nodup
  • present
  • show

Function orJoin

Function joinSuffix

The original idea was to share this function with the code for verb conjugation. However, the conjugation of verbs in the Thai and Lao scripts is more complicated, and there is therefore a more general function in use for verbs.

Function arrcat_nodup

Function present

Function show

Other exported functions

  • detectEnding
  • joinSuffixes
  • getSuffixes
  • modify

Algorithm

The paradigm to use is determined by the script of the stem, the ending of the stem (for which there are a few conventional values; see {{pi-decl-noun}}) and the gender of the stem. The script is always deduced from the stem, while the ending may be supplied explicitly (in Latin script) or deduced from the stem. The gender is always supplied explicitly. The deduction of the ending from the stem is performed by function detectEnding.
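
The deduction of the ending can be sketched in plain Lua for Latin-script stems. This is a simplified illustration, not the module's code: the name detect_ending_latn is hypothetical, only Latin script is handled, and a byte-wise comparison stands in for the mw.ustring pattern matching that the module uses.

```lua
-- Sketch only: detect the conventional ending of a Latin-script Pali stem.
-- detect_ending_latn is a hypothetical name; the module's own function is
-- detectEnding, which also handles the other supported scripts.
local function detect_ending_latn(stem)
	-- Candidate endings; "a" is the fallback when none of these matches.
	local endings = { "ā", "ī", "ū", "ar", "as", "an", "i", "u" }
	for _, ending in ipairs(endings) do
		-- Compare the trailing bytes of the stem with the UTF-8 bytes of the ending.
		if stem:sub(-#ending) == ending then
			return ending
		end
	end
	return "a"
end

print(detect_ending_latn("pitar"))   -- ar
print(detect_ending_latn("buddha"))  -- a
```

The candidate endings are mutually exclusive, so the order in which they are tried does not affect the result.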

The set of suffixes is obtained by function getSuffixes. This first attempts to load the paradigm from the data files; if the paradigm is missing or unacceptable, the function generates the paradigm itself. Paradigms from data files are only acceptable for some combinations of settings. At present, they are not acceptable for non-Roman scripts when using explicit vowels, except for the conventional ending 'ah', which denotes masculine or neuter nouns with stems in explicit -a. (The convention was chosen because the explicit vowel also represents the Sanskrit ending -aḥ.)
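
The lookup order described here can be sketched as follows. This is an illustration under stated assumptions: get_suffixes_sketch, the tables argument and the generate callback are hypothetical stand-ins, paradigms are represented by placeholder strings, and the acceptability checks for explicit vowels are left out.

```lua
-- Sketch only: prefer the script's own data table, else fall back to Latin.
-- All names here are hypothetical; real paradigms are tables of suffix lists,
-- represented by placeholder strings for brevity.
local function get_suffixes_sketch(tables, script, ending, gender, generate)
	local own = tables[script]
	local found = own and own[ending] and own[ending][gender]
	if found then
		return found                    -- paradigm supplied by the data table
	end
	if script == "Latn" then
		return nil                      -- nothing to fall back on
	end
	local latin = tables.Latn[ending] and tables.Latn[ending][gender]
	if not latin then
		return nil
	end
	return generate(latin, script)      -- convert the Latin paradigm
end

local tables = {
	Latn = { ar = { m = "latin ar/m" } },
	Thai = { a = { m = "thai a/m" } },
}
local function generate(paradigm, script)
	return paradigm .. " converted to " .. script
end

print(get_suffixes_sketch(tables, "Thai", "a", "m", generate))   -- thai a/m
print(get_suffixes_sketch(tables, "Thai", "ar", "m", generate))  -- latin ar/m converted to Thai
```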

When paradigms are generated internally, they are converted from Latin script to the required script and implicit vowel settings. This is implemented in function convert_suffixes.
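
The conversion step can be illustrated with a toy example. This is a sketch, not the module's convert_suffixes: upper-casing is an assumed stand-in for the real transliteration in Module:pi-Latn-translit, the helper names are hypothetical, the suffixes are illustrative, and implicit-vowel settings and preposed vowels are ignored. The idea shown is that each Latin suffix is attached to a dummy stem beginning with a carrier consonant, the whole word is transliterated, and the carrier is then dropped.

```lua
local BACKSPACE = "⌫"  -- each leading mark deletes one character of the stem

-- Join one suffix to an ASCII stem, honouring leading ⌫ marks.
local function join(stem, suffix)
	while suffix:sub(1, #BACKSPACE) == BACKSPACE do
		suffix = suffix:sub(#BACKSPACE + 1)
		stem = stem:sub(1, -2)
	end
	return stem .. suffix
end

-- Stand-in transliterator (assumption): upper case plays the target script.
local function to_script(text)
	return text:upper()
end

-- Sketch of the conversion: dummy_stem starts with a carrier consonant ("k").
local function convert_suffixes_sketch(dummy_stem, nstrip, suffixes)
	local converted = {}
	for k, alternatives in ipairs(suffixes) do
		converted[k] = {}
		for i, suffix in ipairs(alternatives) do
			local word = to_script(join(dummy_stem, suffix))
			-- Drop the carrier consonant and restore the ⌫ marks.
			converted[k][i] = string.rep(BACKSPACE, nstrip) .. word:sub(2)
		end
	end
	return converted
end

local result = convert_suffixes_sketch("kar", 2, { { "⌫⌫u", "⌫⌫ussa" } })
print(result[1][1], result[1][2])  -- ⌫⌫U	⌫⌫USSA
```

The converted suffixes keep their ⌫ marks, so they can be attached to a real stem in the target script in the same way as suffixes loaded from a data table.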

The second stage of the generation, applicable to the Lao script only, converts the ablative and instrumental plural in -bhi to the correct forms where needed. The editor specifies the correct form using the parameter |liap=.

The third stage of the generation, also applicable to the Lao script only, converts the letter corresponding to <y> in the suffixes to the correct letter where needed. This setting is treated as orthogonal to the choice between using or not using implicit vowels.

The endings are then attached to the stem using the function joinSuffixes. This invokes function joinSuffix to apply the writing system-dependent rules for the attachment of suffixes. There is one user-controlled input to this process, the parameter |aa=, which is applicable to the Burmese and Tai Tham scripts.
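
The suffix notation used in the paradigms, in which a leading ⌫ deletes the last character of the stem, can be sketched in plain Lua. This is an illustration only: join_suffix_sketch is a hypothetical name, the script-dependent rewriting rules are omitted, the loop accepts any number of ⌫ marks rather than the one or two handled by the module's code in this revision, and characters are removed byte-wise, which is valid only for ASCII stems.

```lua
local BACKSPACE = "⌫"  -- U+232B, three bytes in UTF-8

-- Sketch only: attach each alternative suffix to an ASCII stem.
local function join_suffix_sketch(stem, suffixes)
	local output = {}
	for _, suffix in ipairs(suffixes) do
		local base = stem
		-- Each leading ⌫ removes one character from the end of the stem.
		while suffix:sub(1, #BACKSPACE) == BACKSPACE do
			suffix = suffix:sub(#BACKSPACE + 1)
			base = base:sub(1, -2)  -- byte-wise, hence ASCII stems only
		end
		output[#output + 1] = base .. suffix
	end
	return output
end

local forms = join_suffix_sketch("buddha", { "⌫o", "ssa" })
print(forms[1], forms[2])  -- buddho	buddhassa
```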

Next, the function modify is applied to add, remove or replace the forms generated so far in accordance with a list of modifications included in the invocation of {{pi-decl-noun}}.

Finally, the function present formats the list of forms for each combination of case and number. This formatting includes adding the transliteration, which is done in function orJoin. Function show then returns the inflection table for display on the page.
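
The joining of alternative forms for one table cell can be sketched as follows. This is a simplified illustration: or_join_sketch and render are hypothetical names, the render callback stands in for the link and transliteration formatting, and a plain " or " replaces the HTML separator that the module emits.

```lua
-- Sketch only: format each alternative form and join them with "or".
local function or_join_sketch(forms, render)
	local parts = {}
	for _, form in ipairs(forms) do
		parts[#parts + 1] = render(form)
	end
	return table.concat(parts, " or ")
end

-- Stand-in formatter (assumption): wrap each form in wiki link brackets.
local function render(form)
	return "[[" .. form .. "]]"
end

print(or_join_sketch({ "pitu", "pitussa" }, render))  -- [[pitu]] or [[pitussa]]
```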


local export = {}

local links = require("Module:links")
local lang = require("Module:languages").getByCode("pi")

local gsub = mw.ustring.gsub
local match = mw.ustring.match
local sub = mw.ustring.sub
local u = mw.ustring.char

local genders = {
	["m"] = "masculine", ["f"] = "feminine", ["n"] = "neuter",
}
local rows = {
	"Nominative (first)", "Accusative (second)", "Instrumental (third)", "Dative (fourth)",
	"Ablative (fifth)", "Genitive (sixth)", "Locative (seventh)", "Vocative (calling)",
}

local sc = function(str) -- 'strip carrier' - allows more legible inclusion of combining marks in strings.
	return gsub(str, "[กकকကᨠគකk𑀓]", "")
end

function export.detectEnding(stem)

	local ending

	-- Thai, Deva, Beng, Mymr, Lana, Khmr, Sinh, Latn and then Brah
	-- uses u() to prevent decomposition
	if match(stem, "[าा"..u(0x0906).."া"..u(0x0986).."ါာᩣᩤា"..u(0x17A4).."ා"..u(0x0D86).."ā]$")
		or match(stem, "𑀸$") or match(stem, "𑀆$")  then
		ending = "ā"
	elseif match(stem, "[ิिइিইိဣᩥᩍិឥිඉi]$") or match(stem, "𑀺$") or match(stem, "𑀇$") then
		ending = "i"
	elseif match(stem, "[ีीईীঈီဤᩦᩎីឦීඊī]$") or match(stem, "𑀻$") or match(stem, "𑀈$") then
		ending = "ī"
	elseif match(stem, "[ุुउুউုဥᩩᩏុឧුඋu]$") or match(stem, "𑀼$") or match(stem, "𑀉$") then
		ending = "u"
	elseif match(stem, "[ูूऊূঊူဦᩪᩐូឨឩූඌū]$") or match(stem, "𑀽$") or match(stem, "𑀊$") then
		ending = "ū"
	elseif match(stem, "ar$") then
		ending = "ar"
	elseif match(stem, sc("[รरরရ][กฺक्ক্က်]$")) then -- Somehow fails if this test and next are combined!
		ending = "ar"
	elseif match(stem, sc("[ᩁរර𑀭][ᨠ᩺ᨠ᩼គ៑ක්𑀓𑁆]$")) then
		ending = "ar"
	elseif match(stem, "as$") then
		ending = "as"
	elseif match(stem, sc("[สसসသ][กฺक्ক্က်]$")) then -- Somehow fails if this test and next are combined!
		ending = "as"
--	elseif match(stem, sc("[ᩈសස𑀲][ᨠ᩺ᨠ᩼គ៑ක්𑀓𑁆]$"))  -- Fails except for Lana.  Suspect problem in supplementary plane handling.
	elseif match(stem, sc("[ᩈសස][ᨠ᩺ᨠ᩼គ៑ක්]$")) then
		ending = "as"
	elseif match(stem, "𑀲𑁆$") then
		ending = "as"
	elseif match(stem, "an$") then
		ending = "an"
	elseif match(stem, sc("[นनনန][กฺक्ক্က်]$")) then -- Somehow fails if this test and next are combined!
		ending = "an"
	elseif match(stem, sc("[ᨶនන𑀦][ᨠ᩺ᨠ᩼គ៑ක්𑀓𑁆]$")) then
		ending = "an"
	else
		ending = "a"
	end

	return ending

end

function export.joinSuffix(scriptCode, stem, suffixes)

	local output = {}
	local term, io

	io = 1
	for _,suffix in ipairs(suffixes) do
		if match(suffix, "^⌫⌫") then --backspace
			term = sub(stem, 1, -3) .. sub(suffix, 3, -1)
		elseif match(suffix, "^⌫") then --backspace
			term = sub(stem, 1, -2) .. sub(suffix, 2, -1)
		else
			term = stem .. suffix
		end
		if scriptCode == "Thai" then
			term = gsub(term, "(.)↶([เโ])", "%2%1") --swap
		end
		if scriptCode == "Mymr" then
			term = gsub(term, "င္", "င်္")
			term = gsub(term, "(င်္)([ဝခဂငဒပ])(ေ?)ာ", "%1%2%3ါ")
			term = gsub(term, "္[ယရ]", { ["္ယ"] = "ျ", ["္ရ"] = "ြ" }) -- these do not need tall aa
			term = gsub(term, "^([ဝခဂငဒပ])(ေ?)ာ", "%1%2ါ")
			term = gsub(term, "([^္])([ဝခဂငဒပ])(ေ?)ာ", "%1%2%3ါ")
			term = gsub(term, "([ဝခဂငဒပ])(္[က-အဿ])(ေ?)ာ", "%1%2%3ါ")
			term = gsub(term, "္[ဝဟ]", { ["္ဝ"] = "ွ", ["္ဟ"] = "ှ" })
			term = gsub(term, "ဉ္ဉ", "ည")
			term = gsub(term, "သ္သ", "ဿ")
		end
		if scriptCode == "Lana" then
			term = gsub(term, "ᨦ᩠", "ᩘ")
			term = gsub(term, "^([ᩅᨣᨵᨷᨻ])(ᩮ?)ᩣ", "%1%2ᩤ")
			term = gsub(term, "([^᩠])([ᩅᨣᨵᨷᨻ])(ᩮ?)ᩣ", "%1%2%3ᩤ")
			term = gsub(term, "([ᩅᨣᨵᨷᨻ])(᩠[ᨠ-ᩌᩔ])(ᩮ?)ᩣ", "%1%2%3ᩤ")
			term = gsub(term, "᩠[ᩁᩃ]", { ["᩠ᩁ"] = "ᩕ", ["᩠ᩃ"] = "ᩖ" })
			term = gsub(term, "([ᨭ-ᨱ])᩠ᨮ", "%1ᩛ")
			term = gsub(term, "([ᨷ-ᨾ])᩠ᨻ", "%1ᩛ")
			term = gsub(term, "ᩈ᩠ᩈ", "ᩔ")
		end
		--[[if scriptCode == "Laoo" then
			term = gsub(term, "(.)↶([ເໂ])", "%2%1")
		end]]
		output[io] = term
		io = io + 1
	end

	return output

end

function export.orJoin(script, list)
	local output = ""
	for _,term in ipairs(list) do
		if output ~= "" then
			output = output .. " <small style=\"color:#888\">or</small> "
		end
		output = output .. links.full_link({lang = lang, sc = script, term = term})
	end

	return output
end

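-- Transliteration function; assigned in export.getSuffixes before convert_suffixes is called.
local to_script

-- Convert Latin script inflections to another script.
-- stem: Latin-script dummy stem starting with a carrier consonant (e.g. 'kar');
-- nstrip: number of ⌫ marks to prefix to each converted suffix;
-- suffixes: Latin-script paradigm; scriptCode: code of the target script.
local convert_suffixes = function(stem, nstrip, suffixes, scriptCode)
	local form, pre, altform
	local xlitend = {}
	local strip = string.rep("⌫", nstrip)
	for k = 1, #suffixes do
		xlitend[k] = {}
		form = export.joinSuffix('Latn', stem, suffixes[k])
		for ia, va in pairs(form) do
			altform = to_script(va, scriptCode)
			-- Special handling is needed for a preposed vowel.
			pre = match(altform, "^[เโ]")
			if pre then
				xlitend[k][ia] = strip .. "↶" .. pre .. sub(altform, 3)
			else
				xlitend[k][ia] = strip .. sub(altform, 2)
			end
		end
	end
	return xlitend
end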

function export.getSuffixes(scriptCode, ending, g)
	local pattern = require("Module:pi-decl/noun/" .. scriptCode)
	local applicable = pattern[ending] and pattern[ending][g]
	if applicable then
		return applicable
	elseif 'Latn' == scriptCode then
		return nil
	else
		pattern = require("Module:pi-decl/noun/Latn") or nil
		to_script = require("Module:pi-Latn-translit").tr
		applicable = pattern[ending] and pattern[ending][g] or nil
		if not applicable then
			return nil
		elseif 'ar' == ending then
			return convert_suffixes('kar', 2, applicable, scriptCode)
		elseif 'as' == ending then
			return convert_suffixes('kas', 2, applicable, scriptCode)
		elseif 'an' == ending then
			return convert_suffixes('kan', 2, applicable, scriptCode)
		else
			return nil
		end
	end
end

function export.show(frame)

	local args = frame:getParent().args
	local PAGENAME = mw.title.getCurrentTitle().text
	local stem = args[1] or args["stem"] or PAGENAME
	local currentScript = require("Module:scripts").findBestScript(stem, lang)
	local scriptCode = currentScript:getCode()
	local ending = args[2] or args["ending"] or export.detectEnding(stem)
	local g = args[3] or args["g"] or args["gender"] -- for each gender only

	if not g or not genders[g] then
		error("A gender (m, f or n) is required to display proper declensions.")
	end

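	local selectedPattern = export.getSuffixes(scriptCode, ending, g)
	if not selectedPattern then
		error('No declension pattern is available for ending "' .. ending .. '" and gender "' .. g .. '" in script ' .. scriptCode .. '.')
	end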

	local output = '<div class="NavFrame" style="min-width:30%"><div class="NavHead" style="background:#d9ebff">Declension table of "' .. stem .. '" (' .. genders[g] .. ')</div><div class="NavContent">'
	output = output .. '<table class="inflection-table" style="background:#F9F9F9;text-align:center;width:100%"><tr><th style="background:#eff7ff">Case \\ Number</th><th style="background:#eff7ff">Singular</th><th style="background:#eff7ff">Plural</th></tr>'

	for i,v in ipairs(rows) do
		output = output .. "<tr><td style=\"background-color:#eff7ff;\">" .. v .. "</td>"
		output = output .. "<td>" .. export.orJoin(currentScript, export.joinSuffix(scriptCode, stem, selectedPattern[2 * i - 1])) .. "</td>"
		output = output .. "<td>" .. export.orJoin(currentScript, export.joinSuffix(scriptCode, stem, selectedPattern[2 * i])) .. "</td>"
		output = output .. "</tr>"
	end

	output = output .. "</table></div></div>"
	return output

end

return export