Module:grc-translit: difference between revisions

Revision as of 20:53, 7 January 2017

The following documentation is located at Module:grc-translit/documentation. Categories were auto-generated by Module:module categorization.

This module transliterates Ancient Greek text per WT:GRC TR. It is also used to transliterate Demotic, Greek, Paeonian, Old Ossetic, Oscan, Dacian, Ancient Macedonian, and Phrygian. The module should preferably not be called directly from templates or other modules. To use it from a template, use {{xlit}}. Within a module, use Module:languages#Language:transliterate.

For testcases, see Module:grc-translit/testcases.

Functions

tr(text, lang, sc): Transliterates a given piece of text written in the script specified by the code sc and the language specified by the code lang. When the transliteration fails, returns nil.

18 of 36 tests failed.

testcases for `tr` function in Module:grc-translit:
	Text	Expected	Actual
	λόγος	lógos	lógos
	σφίγξ	sphínx	sphínx
	ϝάναξ	wánax	wánax
	οἷαι	hoîai	oiĥai
current problems
	ΙΧΘΥΣ	IKHTHUS	IKhThUS
	Υἱός	'''Hu'''iós	'''U'''ihós
u/y
	ταῦρος	taûros	taûros
	νηῦς	nēûs	nēûs
	σῦς	sûs	sûs
	ὗς	hûs	uĥs
	γυῖον	guîon	guîon
	ἀναῡ̈τέω	anaṻtéō	anaṻtéō
	δαΐφρων	daḯphrōn	daḯphrōn
vowel length
	τῶν	tôn	tō̂n
	τοὶ	toì	toì
	τῷ	tôi	tō̂i
	τούτῳ	toútōi	toútōi
	σοφίᾳ	sophíāi	sophíai
	μᾱ̆νός	mānós	mānós
h (rough breathing)
	ὁ	ho	oh
	οἱ	hoi	oih
	εὕρισκε	heúriske	euh́riske
	ὑϊκός	huïkós	uhïkós
	πυρρός	purrhós	purrhós
	ῥέω	rhéō	rhéō
	σάἁμον	sáhamon	sáahmon
capitals
	Ὀδυσσεύς	Odusseús	Odusseús
	Εἵλως	Heílōs	Eih́lōs
	ᾍδης	Hā́idēs	Ah́idēs
	ἡ Ἑλήνη	hē Helḗnē	ēh Ehlḗnē
punctuation
	ἔχεις μοι εἰπεῖν, ὦ Σώκρατες, ἆρα διδακτὸν ἡ ἀρετή;	ékheis moi eipeîn, ô Sṓkrates, âra didaktòn hē aretḗ?	ékheis moi eipeîn, ō̂ Sṓkrates, âra didaktòn ēh aretḗ;
	τί τηνικάδε ἀφῖξαι, ὦ Κρίτων; ἢ οὐ πρῲ ἔτι ἐστίν;	tí tēnikáde aphîxai, ô Krítōn? ḕ ou prṑi éti estín?	tí tēnikáde aphîxai, ō̂ Krítōn; ḕ ou prṑi éti estín;
	τούτων φωνήεντα μέν ἐστιν ἑπτά· α ε η ι ο υ ω.	toútōn phōnḗenta mén estin heptá; a e ē i o u ō.	toútōn phōnḗenta mén estin ehptá· a e ē i o u ō.
	πήγ(νῡμῐ)	pḗg(nūmi)	pḗg(nūmi)
HTML entities
	καλός καὶ ἀγαθός	kalós kaì agathós	kalós kaì agathós
	καλός καὶ ἀγαθός	kalós kaì agathós	kalós kaì agathós

local export = {}

local PSILI = mw.ustring.char(0x313)
local DASIA = mw.ustring.char(0x314)
local SUBSCRIPT = mw.ustring.char(0x345)

local TREMA = mw.ustring.char(0x308)

local GRAVE = mw.ustring.char(0x300)
local ACUTE = mw.ustring.char(0x301)
local GREEKCIRCUMFLEX = mw.ustring.char(0x342)
local CIRCUMFLEX = mw.ustring.char(0x302)

local MACRON = mw.ustring.char(0x304)
local BREVE = mw.ustring.char(0x306)

local tt = {
	-- Vowels
	["α"] = "a",
	["ε"] = "e",
	["η"] = "ē",
	["ι"] = "i",
	["ο"] = "o",
	["υ"] = "u",
	["ω"] = "ō",

	-- Consonants
	["β"] = "b",
	["γ"] = "g",
	["δ"] = "d",
	["ζ"] = "z",
	["θ"] = "th",
	["κ"] = "k",
	["λ"] = "l",
	["μ"] = "m",
	["ν"] = "n",
	["ξ"] = "x",
	["π"] = "p",
	["ρ"] = "r",
	["σ"] = "s",
	["ς"] = "s",
	["τ"] = "t",
	["φ"] = "ph",
	["χ"] = "kh",
	["ψ"] = "ps",
	
	-- Archaic letters
	["ϝ"] = "w",
	["ϻ"] = "ś",
	["ϙ"] = "q",
	["ϡ"] = "š",
	["ͷ"] = "v",
	
	-- Diacritics
	[MACRON] = MACRON,
	[BREVE] = '',
	[PSILI] = '',
	[DASIA] = '',
	[TREMA] = TREMA,
	[GRAVE] = GRAVE,
	[ACUTE] = ACUTE,
	[GREEKCIRCUMFLEX] = CIRCUMFLEX,
	[SUBSCRIPT] = 'i',
	
	-- For internal processing of diaeresis
	['+'] = '',
}

local diacritics = PSILI..DASIA..SUBSCRIPT..MACRON..BREVE..TREMA..GRAVE..ACUTE..GREEKCIRCUMFLEX

function export.tr(text, lang, sc)
	-- If the script is given as Cprt, then forward the transliteration to that module
	if sc == "Cprt" then
		return require("Module:Cprt-translit").tr(text, lang, sc)
	end
	
	local gsub = mw.ustring.gsub

	-- decompose text
	text = mw.ustring.toNFD(text)
	
	text = gsub(text,'([ιυ])(['..BREVE..MACRON..']?)'..TREMA,'+%1%2'..TREMA)
	
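	-- Tokenize: each token is a base letter (or diphthong) plus its combining diacritics
	local tokens = {}
	local ti = 0 -- index of the current token; incremented whenever a new token starts
	for i = 1,mw.ustring.len(text) do
		local ch = mw.ustring.sub(text,i,i)
		if ch == 'ι' and tokens[ti] and mw.ustring.match(tokens[ti],'[ΑΕΗΟΥΩαεηουω]') then
			tokens[ti] = tokens[ti]..'ι'
		elseif ch == 'υ' and tokens[ti] and mw.ustring.match(tokens[ti],'[ΑΕΗΟΩαεηοω]') then
			tokens[ti] = tokens[ti]..'υ'
		-- diacritics is a plain string of marks, so wrap it in [] to match any single one of them
		elseif tokens[ti] and mw.ustring.match(ch,'['..diacritics..']') then
			tokens[ti] = tokens[ti]..ch
		else
			ti = ti+1
			tokens[ti] = ch
		end
	end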
	
	-- Now transliterate the tokens in order
	local out = ''
	for i,token in ipairs(tokens) do
		local t = mw.ustring.gsub(mw.ustring.lower(token),'.',function(x) return tt[x] end)
		
		-- elseif is misleading (these are independent) but it's more concise this way
		if token == 'γ' and tokens[i+1] and mw.ustring.match(tokens[i+1],'[κγχξ]') then
			t = 'n'
		elseif token == 'ρ' and tokens[i-1] and tokens[i-1] == 'ρ' then
			t = 'rh'
		elseif mw.ustring.match(token,'[ΑΕΗΟΩαεηοω]υ') or mw.ustring.match(token,'[Υυ]ι') then
			t = mw.ustring.gsub(t,'y','u')
		elseif mw.ustring.match(token,'[αΑ].*'..SUBSCRIPT) then
			t = mw.ustring.gsub(t,'([aA])','%1'..MACRON)
		end
		
		if mw.ustring.match(token,DASIA) then
			if mw.ustring.match(token,'[Ρρ]') then
				t = t .. 'h'
			else
				t = 'h' .. t
			end
		end
	
		t = mw.ustring.toNFD(t) -- we can't manually enter them as e/o + macron in the table because it'll recombine apparently
		if mw.ustring.match(t,CIRCUMFLEX) then
			t = mw.ustring.gsub(t,MACRON,'')
		end
		
		if token ~= mw.ustring.lower(token) then
			t = mw.ustring.upper(mw.ustring.sub(t,1,1) ) .. mw.ustring.lower(mw.ustring.sub(t,2) )
		end
		out = out .. t
	end
	return out
end

return export
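
The tokenizing step in `tr` is meant to keep each combining diacritic in the same token as the letter it modifies, so that later steps (for example, placing the h for a rough breathing) can inspect the letter and its marks together. The sketch below illustrates only that grouping idea in plain Lua 5.3 or later, outside Scribunto. It is not the module's code: `group_marks` and `COMBINING` are hypothetical names, a lookup table stands in for the `mw.ustring` pattern match, diphthong merging is left out, and the input is assumed to be already decomposed (NFD).

```lua
-- Standalone sketch (plain Lua 5.3+, no mw.* libraries).
-- Assumption: the input string is already in NFD, since plain Lua has no normalizer.

-- Code points of the combining marks handled by the module (hypothetical table name).
local COMBINING = {
	[0x313] = true, -- psili (smooth breathing)
	[0x314] = true, -- dasia (rough breathing)
	[0x345] = true, -- iota subscript
	[0x304] = true, -- macron
	[0x306] = true, -- breve
	[0x308] = true, -- trema
	[0x300] = true, -- grave
	[0x301] = true, -- acute
	[0x342] = true, -- Greek circumflex
}

-- Hypothetical helper: split NFD text into tokens of base character + following marks.
local function group_marks(nfd_text)
	local tokens = {}
	for _, cp in utf8.codes(nfd_text) do
		local ch = utf8.char(cp)
		if COMBINING[cp] and #tokens > 0 then
			-- A combining mark joins the token of the preceding base character.
			tokens[#tokens] = tokens[#tokens] .. ch
		else
			-- Anything else (or a mark with nothing before it) starts a new token.
			tokens[#tokens + 1] = ch
		end
	end
	return tokens
end

-- Decomposed "ὁ" (omicron + dasia) stays together as one token.
print(#group_marks("\u{3BF}\u{314}")) --> 1
-- Decomposed "τῶν" (tau, omega + circumflex, nu) gives three tokens.
print(#group_marks("\u{3C4}\u{3C9}\u{342}\u{3BD}")) --> 3
```

Testing code points against a set avoids any dependence on how a pattern string is built, at the cost of listing the marks twice if the module's constants are kept as well.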
