User talk:Gilgamesh~enwiktionary/Tanakh names

From Wiktionary, the free dictionary

The following cryptic statement is made here: "Modern shva na collapse is predicted algorithmically, and may not always necessarily represent real world desyllabification."

I understand that a certain kind of nikkud, the shva, is either "na" (nuhn-ayin), or "nach" (nuhn-chet). "Na" means "transitory", while "nach" means "stationary".

Only a few Hebrew publishers mark the "shva na" with a distinct graphical symbol. They do so because a letter with a shva is pronounced differently depending on whether it is a shva na or an ordinary shva nach.

"OpenType", the new standard font format technology developed by Microsoft, Apple, and Adobe, can be programmed so that the glyph required by a grammar rule, such as shva na, kamatz katan, or hataf kamatz katan, automatically appears in the text instead of an ordinary shva, kamatz, or kamatz katan, based upon the context of letters and nikkud in a word.

Who can help me define these grammar rules, so I can include them in future OpenType fonts? - unsigned by User:Gohebrew

Unfortunately, my algorithm is just a guess. To be honest, predicting Modern Hebrew shva na collapse is annoying as hell. It seems to have myriad exceptions and sometimes just plain idiosyncratic use, so I can never hope that my algorithm will be perfect. The Java code for my shva na collapse logic is rather kludgy, relying heavily on regular expressions. Here's the current snapshot of that function in my code (subject to change as I continue to tinker with it):
public static String reduceShewa(String string) {
	return reduceShewa(string, false);
}

public static String reduceShewa(String string, boolean ashkenazi) {
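	// After a vowel (plus an optional ', h or y) and a single or doubled consonant,
	// swap a ə that precedes another consonant for the placeholder U+FFFE.
	string = string.replaceAll("([AaÁáEeÉéIiÍíOoÓóUuÚú])" +
			"(['hy]?)(v|b|bb|g|gg|d|dd|h|vv|z|zz|ħ|t|tt|y|yy|x|k|kk" +
			"|l|ll|m|mm|n|nn|s|ss|f|p|pp|c|cc|r|rr|š|šš)ə" +
			"(['vbgdhvzħtyxklmnsfpcrš])", "$1$2$3\uFFFE$4");
	// Put ə back wherever the placeholder separates identical or near-identical consonants.
	string = string.replaceAll("([Gg])\uFFFEg", "$1əg");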
	string = string.replaceAll("([Dd])\uFFFEd", "$1əd");
	string = string.replaceAll("([Vv])\uFFFEv", "$1əv");
	string = string.replaceAll("([Zz])\uFFFEz", "$1əz");
	string = string.replaceAll("([ĦħXx])\uFFFE([ħx])", "$1ə$2");
	string = string.replaceAll("([Tt])\uFFFE([tc])", "$1ə$2");
	string = string.replaceAll("([Yy])\uFFFEy", "$1əy");
	string = string.replaceAll("([Kk])\uFFFEk", "$1ək");
	string = string.replaceAll("([Ll])\uFFFEl", "$1əl");
	string = string.replaceAll("([Mm])\uFFFEm", "$1əm");
	string = string.replaceAll("([Nn])\uFFFEn", "$1ən");
	string = string.replaceAll("([Ss])\uFFFEs", "$1əs");
	string = string.replaceAll("([Ff])\uFFFEf", "$1əf");
	string = string.replaceAll("([Cc])\uFFFE([cs])", "$1ə$2");
	string = string.replaceAll("([Rr])\uFFFEr", "$1ər");
	string = string.replaceAll("([Šš])\uFFFEš", "$1əš");
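	// Any placeholder still present stands for a collapsed ə, so drop it.
	string = string.replace("\uFFFE", "");
	// Collapse ə between selected consonant pairs. With TRI set, a lowercase
	// first consonant matches as well as an uppercase one.
	final boolean TRI = true;
	string = string.replaceAll((TRI?"([Vv])":"(V)")+"ə([gdzħtyxklnscrš])", "$1$2");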
	string = string.replaceAll((TRI?"([Bb])":"(B)")+"ə([vgdzħtyxklnscrš])", "$1$2");
	string = string.replaceAll((TRI?"([Gg])":"(G)")+"ə([vdztylmnsfcrš])", "$1$2");
	string = string.replaceAll((TRI?"([Dd])":"(D)")+"ə([vgzħylxkmfr])", "$1$2");
	string = string.replaceAll((TRI?"([Zz])":"(Z)")+"ə([vgdħtyxklmnfr])", "$1$2");
	string = string.replaceAll((TRI?"([ĦħXx])":"([ĦX])")+"ə([vgdhztyklmnsfrš])", "$1$2");
	string = string.replaceAll((TRI?"([Tt])":"(T)")+"ə([vghħyxklmsfrš])", "$1$2");
	string = string.replaceAll((TRI?"([Yy])":"([Y])")+"ə([vgdhzħtxklmnsfcrš])", "$1$2");
	string = string.replaceAll((TRI?"([Kk])":"(K)")+"ə([vdhzħtyxlmnsfcrš])", "$1$2");
	string = string.replaceAll((TRI?"([Ll])":"(L)")+"ə([vgdhzħtyxkmfcš])", "$1$2");
	string = string.replaceAll((TRI?"([Mm])":"(M)")+"ə([dghzħtylxknscrš])", "$1$2");
	string = string.replaceAll((TRI?"([Nn])":"(N)")+"ə([vghħyxkmfr])", "$1$2");
	string = string.replaceAll((TRI?"([Cc])":"([C])")+"ə([c])", "$1'$2");
	string = string.replaceAll((TRI?"([SsŠšCc])":"([SŠC])")+"ə([vgdhħtyxklmnfcr])", "$1$2");
	string = string.replaceAll((TRI?"([Pp])":"([P])")+"ə([f])", "$1$2");
	string = string.replaceAll((TRI?"([PpFf])":"([PF])")+"ə([gdhzħtyxklnscrš])", "$1$2");
	string = string.replaceAll((TRI?"([Rr])":"(R)")+"ə([vgdztyklmnsfcš])", "$1$2");
	if (false && ashkenazi)
		string = string.replaceAll("ə([Yy])", "$1");
	else
		string = string.replaceAll("ə([Yy])", "i$1");
	string = string.replace('ə', 'e');
	return string;
}
Now, some notes about this. I use \uFFFE for internal purposes only: if it's placed before a lowercase letter, later formatting code detects the \uFFFE, removes it, and converts the following lowercase letter to uppercase. Also, at this point each consonant is intentionally a single letter internally, so in formatting c becomes ts/tz/etc., š becomes sh, x becomes kh/ch/etc., and ħ becomes ħ/kh/ch/etc. Anyway, the ultimate shortcoming of this algorithm is that, as I mentioned before, shva na collapse seems to be applied inconsistently, contextually and idiosyncratically. The 2006 revision of the Hebrew transliteration rules requires shva na to be written as collapsed wherever it collapses in the spoken language, but it's almost impossible to know for sure whether it collapses without having learned the common pronunciation of each and every candidate word, which is a pain in the butt for someone who isn't fluent in Modern Hebrew (I mainly study Biblical Hebrew, and this is a list of Tanakh names, which is primarily of biblical academic interest). Finally, the ashkenazi flag for conditional shva na collapse was based on a friend's recommendation as a possibility, but currently the code ignores it (false && ashkenazi is always false, so that branch is skipped). - Gilgamesh 20:38, 5 October 2008 (UTC)
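For illustration, the later formatting step described in the note above might look something like the following minimal sketch. It is not the actual formatting code: the class name TranslitFormat, the method names, and the single spelling picked for each consonant (ts, sh, kh, with ħ left unchanged) are all assumptions made for the example.

```java
final class TranslitFormat {
    // Marker described in the note above (U+FFFE).
    static final char CAPITALIZE_NEXT = '\uFFFE';

    // Assumed spellings: one option picked from each set of alternatives.
    static String expand(char ch) {
        switch (ch) {
            case 'c': return "ts";
            case '\u0161': return "sh"; // š
            case 'x': return "kh";
            default: return String.valueOf(ch); // ħ and everything else pass through
        }
    }

    static String format(String s) {
        StringBuilder out = new StringBuilder();
        boolean capitalizeNext = false;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == CAPITALIZE_NEXT) {
                // Remove the marker and remember to capitalize what follows.
                capitalizeNext = true;
                continue;
            }
            String piece = expand(ch);
            if (capitalizeNext && Character.isLowerCase(ch)) {
                // Capitalize only the first letter of an expanded digraph.
                piece = Character.toUpperCase(piece.charAt(0)) + piece.substring(1);
            }
            capitalizeNext = false;
            out.append(piece);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("\uFFFEcion")); // prints Tsion
    }
}
```

Expanding the digraphs only at this last stage is what lets the regular expressions in reduceShewa treat every consonant as one character.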
OH! Did you mean the classical distinction between when shewa collapses (naħ) and when it doesn't (na)? Well, that's much easier to predict. The short answer is that, classically, a shewa must remain open (na) after the consonant at the beginning of a word, or after a doubled consonant (but not immediately at the end of a word), or after a cluster of two consonants (but not immediately at the end of a word), or after a consonant, hateph vowel, consonant sequence, or after a long vowel plus consonant. Much of that can be algorithmically predicted, but sometimes whether a vowel (particularly qamez) is long or short cannot. The Tanakh text virtually always makes a vowel long if a cantillation mark is over it. But occasionally there is nothing to specially mark a vowel as being long instead of short, except for the reader's memory of encountering the particular word. Fortunately, zere and holem are always long. But hireq, seghol, pathah, qamez and qibbuz/shureq are short in a closed unstressed syllable, and long in an open or stressed syllable. To add insult to injury, sometimes (for odd, uncertain reasons) even a long vowel will not prevent a shewa from being shewa naħ, such as in certain verb forms and names. Some are already in the list; see Gershom, Gershon, Ziklag, etc. But names like Asenath still have shewa na. Either way, these are only issues in vocalization and transliteration. Both shewa naħ and shewa na are represented only by the single shewa diacritic in Hebrew with nequddoth, in print and Unicode alike. - Gilgamesh 20:54, 5 October 2008 (UTC)
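The classical rules in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Slot model, the Mark categories and every name in it are hypothetical, vowel length has to be supplied by the caller (as noted above, it cannot always be predicted), and the exceptions such as Gershom or Ziklag are not handled. The sketch also assumes a word-final shewa is never na, which the comment states explicitly only for the doubled-consonant and cluster cases.

```java
import java.util.Arrays;
import java.util.List;

final class ClassicalShewa {
    /** What a consonant carries: nothing, a shewa, a hateph vowel, or a full short or long vowel. */
    enum Mark { NONE, SHEWA, HATEPH, SHORT, LONG }

    /** Hypothetical model of one consonant; "doubled" stands for a doubled consonant. */
    static final class Slot {
        final String consonant;
        final boolean doubled;
        final Mark mark;

        Slot(String consonant, boolean doubled, Mark mark) {
            this.consonant = consonant;
            this.doubled = doubled;
            this.mark = mark;
        }
    }

    static Slot slot(String consonant, Mark mark) {
        return new Slot(consonant, false, mark);
    }

    /** For each slot, true if it carries a shewa that the rules above treat as na. */
    static boolean[] shewaNa(List<Slot> word) {
        boolean[] na = new boolean[word.size()];
        for (int i = 0; i < word.size(); i++) {
            Slot cur = word.get(i);
            if (cur.mark != Mark.SHEWA) continue;
            if (i == 0) {            // consonant at the beginning of a word
                na[i] = true;
                continue;
            }
            if (i == word.size() - 1) continue; // assumption: word-final shewa is never na
            Slot prev = word.get(i - 1);
            // Second member of a cluster: the previous consonant has a closed shewa.
            boolean afterCluster = prev.mark == Mark.SHEWA && !na[i - 1];
            na[i] = cur.doubled
                    || afterCluster
                    || prev.mark == Mark.HATEPH
                    || prev.mark == Mark.LONG;
        }
        return na;
    }

    public static void main(String[] args) {
        // y + short i, sh + shewa, m + shewa, r + long u
        List<Slot> word = Arrays.asList(
                slot("y", Mark.SHORT), slot("sh", Mark.SHEWA),
                slot("m", Mark.SHEWA), slot("r", Mark.LONG));
        System.out.println(Arrays.toString(shewaNa(word))); // prints [false, false, true, false]
    }
}
```

In the example the first shewa follows a short vowel and stays naħ, while the second follows a cluster of two consonants and is therefore na.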