Wiktionary talk:About Hittite

From Wiktionary, the free dictionary
Latest comment: 6 years ago by JohnC5 in topic Lemmatization

Lemmatization

@JohnC5: Given the issue with Unicode, I think we should lemmatize entries in the broad transcription of their inflectional stem instead of nom.sg and the infinitive in cuneiform. Heteroclites might be troublesome if we use the stem, but it allows us to avoid reconstructions when the lemma is not attested. What do you think? --Tom 144 (𒄩𒇻𒅗𒀸) 17:41, 7 February 2018 (UTC)

@Tom 144: Just to summarize for Unicode, the issue is that some of the characters don't show the Neo-Assyrian forms of the characters like DINGIR vs. AN? I'm not sure I'm convinced because:
  • Unicode has most of the correct symbols. If necessary we can remap them to be correct.
  • This is not a decision that can be made lightly. Indeed we might even need a vote for it.
  • Broad transcriptions are not reliable, especially in the case of labiovelars (/uk/ or /ku/) and plene vowels which are often omitted.
  • Many words in Hittite do not have a known phonetic value, only Sumero- or Akkadograms. I feel uncomfortable lemmatizing them under a transcribed Sumerogram, especially if you have to add a phonetic complement (e.g. LUGAL-). Then you get into the whole issue of italics for Akkadograms and Hittite, which cannot be encoded.
With all of that said, I am still interested in redirect pages. If it is just a matter of remapping the symbols and declaring a project-wide policy, I think that is the best option. —*i̯óh₁nC[5] 06:28, 8 February 2018 (UTC)
@JohnC5: Compare the first word in the third line:
𒆪𒀉𒈠𒀭𒍝𒃷
ku-it-ma-an-za-kán
The only sign that really matches is the sign for za, and the sign for kán is similar but inaccurate. The sign for AN/DINGIR does not appear in w:List of cuneiform signs, so I'm dubious Unicode can encode it.
I don't understand the issue of labiovelars in the broad transcription; could you be a little more specific? Plene vowels should be expressed with a macron. When authors differ on whether a word has a plene vowel, it's because Hittite scribes were inconsistent writers and both forms are attested (e.g. /tːkáːn/ ta-ga-a-an has also been attested as ta-ga-an, ta-a-ga-an and ta-a-ga-a-an). For most logograms we know the stem; in the case of LUGAL-u- it's ḫaššu-, but it's true that eventually we'll run into a logogram whose stem we don't know. In that case we would be forced to do as dictionaries usually do, separating the logograms from the phonograms with a hyphen. Superscripted determinatives and italics would be problematic indeed; as far as I know these are the only superscript capital letters available: "ᴬ ᴮ ᴰ ᴱ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴼ ᴾ ᴿ ᵀ ᵁ ⱽ ᵂ". It looks as if there isn't a perfect solution. --Tom 144 (𒄩𒇻𒅗𒀸) 17:10, 8 February 2018 (UTC)
@Tom 144: Yeah, they don't look great, but these are the codepoints that were created. The better thing would be to get a font that displays them correctly for Hittite and use it. As for the other points
  • Labiovelars can show up as either -ku- or -uk-, as in eukzi vs. ekuzi. Indeed, this behavior is one of the primary reasons we believe these are labiovelars. I suppose this issue applies in cuneiform as well as in transcription. I should have mentioned the bigger issue of prop vowels, as in walḫzi, which can be written either wa-al-ḫa-zi or wa-al-aḫ-zi (among others). In these cases, we've chosen to interpret it as walḫzi based on comparative evidence, but there are many words where there are competing opinions as to whether a vowel is a prop vowel or not. This also affects initial clusters and their broad transcriptions. If we used broad transcription, walḫzi should be the one we choose, but we would have done some reconstruction/abstraction to get there.
  • About plene spelling, I'm perfectly aware of the issues around plene writing, its usage, and the debates. Again, often we don't know, which can lead to competing broad transcriptions.
  • I should have clarified. I also know that LUGAL-u- is ḫaššu-; I didn't want to go dig around for one we don't know, but there are plenty.
  • To be clear, determinatives should absolutely be omitted in entry names. We don't include them in Egyptian, Akkadian, or Sumerian, and we shouldn't include them here.
The overarching issue that I am raising is that broad transcriptions are, in many cases, academic abstractions which are open to varying opinions and formats. Reconstructing things is fine in reconstructed pages, transcriptions, and maybe even soft redirects, but lemmatizing an abstraction as opposed to an attested form in the mainspace is effectively against the policies of Wiktionary. The Unicode codepoints may look wrong, but that is a lesser evil than lemmatizing artificial words.
Also, having downloaded the Hittite font from this site and having added this code to my common.css file, 𒆪𒀉𒈠𒀭𒍝𒃷 renders correctly when marked as Hittite. I think we could add a note concerning this on WT:AHIT, or perhaps we could implement this as the default project wide (I wonder if @Erutuon thinks this is a good idea), though they'd still need to download the font. This seems like a much simpler and better solution to me. —*i̯óh₁nC[5] 23:01, 8 February 2018 (UTC)
FYI, you may need to open this page in a new tab after doing this to get it to show up correctly the first time. —*i̯óh₁nC[5] 23:06, 8 February 2018 (UTC)
I see no problem changing the default Hittite font. You might want to add a backup font to the stack; for Windows 10, Segoe UI Historic. Also I would note that if you're using Firefox rather than Chrome, you have to restart your whole browser to see newly installed fonts. — Eru·tuon 00:26, 9 February 2018 (UTC)
@Erutuon: Oh, I'd definitely want a default font to fall back on. I was just suggesting adding this one for Hittite so that all a user would need to do would be to install this font, and then everything would render correctly instead of having to do any coding. —*i̯óh₁nC[5] 00:30, 9 February 2018 (UTC)
I got lost when you started talking about fonts. Would this display the correct characters? Would the user need to install something to view them correctly? --Tom 144 (𒄩𒇻𒅗𒀸) 01:19, 9 February 2018 (UTC)
When using this font, the characters show up as their correct Hittite forms. The user would need to install something, but if they care enough about the actual letter forms (which most don't), then we can write a transliteration policy and include a note about installing the font (we'd also mention it on WT:AHIT). The truth of the matter is that most people don't care, but using the correct Unicode characters is crucial for accuracy. —*i̯óh₁nC[5] 02:11, 9 February 2018 (UTC)
Alright, now I can see the correct signs. Yeah, I agree most people won't pay attention to the original script. This seems like a reasonable solution. --Tom 144 (𒄩𒇻𒅗𒀸) 03:04, 9 February 2018 (UTC)
@Tom 144: Great! You're gonna have to wrap the cuneiform in the inflection tables with {{m}} or {{lang}} to fix the display, but that's not bad. I've also added a |lang= parameter to {{cuneiform}} so you can specify {{cuneiform|lang=hit}} for the proper display. But first we should add a note about the font to WT:AHIT, and then create a transliteration policy page as well. @Erutuon, so do you think it is appropriate to update Mediawiki:Common.css with this with a fallback? —*i̯óh₁nC[5] 03:16, 9 February 2018 (UTC)
@JohnC5: I don't really understand what {{lang}} is for, but Ok. --Tom 144 (𒄩𒇻𒅗𒀸) 03:28, 9 February 2018 (UTC)
@Tom 144: {{lang}} is for wrapping text with the appropriate text tagging and for adding transliteration where applicable but not linking. {{m}} and {{l}} are for linking in particular. —*i̯óh₁nC[5] 03:30, 9 February 2018 (UTC)
@JohnC5: I would say go ahead, add Hittite fonts. — Eru·tuon 21:12, 11 February 2018 (UTC)
@Erutuon: Could I request your aid? —*i̯óh₁nC[5] 04:10, 12 February 2018 (UTC)
@JohnC5: On which part? Formulation of the rule, placement in the file? — Eru·tuon 22:11, 12 February 2018 (UTC)
@Erutuon: The formulation (i.e. which other default scripts should be added). I assume we can just plop it down immediately after the other .Xsux stuff. Right? —*i̯óh₁nC[5] 22:15, 12 February 2018 (UTC)
@JohnC5: Well, the only other font I know about was the one I mentioned, so the rule would be :lang(hit).Xsux { font-family: UllikummiA, "Segoe UI Historic"; }. Or maybe the font stack for .Xsux could replace "Segoe UI Historic". And yeah, that's where I'd put it. — Eru·tuon 23:03, 12 February 2018 (UTC)
@Erutuon: Soooo... —*i̯óh₁nC[5] 07:22, 13 February 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I guess I'm being confusing. The hit-Xsux rule should contain either the font-list I just posted, or the Hittite font followed by all the current fonts specified for Xsux. The latter may be best. The Xsux font-list has to be duplicated in the hit-Xsux rule or it won't be used for hit-Xsux text. — Eru·tuon 08:40, 13 February 2018 (UTC)
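
For illustration, a minimal sketch of the two options just described. The font names in the generic .Xsux rule are placeholders (assumptions), not the actual list in the site stylesheet. The list has to be repeated because a font-family declaration in a more specific rule replaces the value from the generic rule as a whole; the two lists are not merged.

```css
/* Assumed generic rule for cuneiform text. GenericCuneiformA and
   GenericCuneiformB are hypothetical placeholders for whatever fonts
   the existing .Xsux rule lists. */
.Xsux {
	font-family: GenericCuneiformA, GenericCuneiformB, sans-serif;
}

/* Option 1 (shown commented out): Hittite font plus one backup only.
   Hittite text would never fall back to the generic cuneiform fonts.

   :lang(hit).Xsux { font-family: UllikummiA, "Segoe UI Historic"; }
*/

/* Option 2: Hittite font first, then the generic list repeated in full,
   so Hittite text still falls back sensibly when UllikummiA is missing. */
:lang(hit).Xsux {
	font-family: UllikummiA, GenericCuneiformA, GenericCuneiformB, sans-serif;
}
```

Only one of the two :lang(hit).Xsux rules would be used in practice; option 2 keeps the fallback behaviour for readers who have not installed the Hittite font.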

@JohnC5, for morphemes should we use appendices? e.g. Appendix:Hittite/-ant-? Lemmatizing them in cuneiform would be impossible. --Tom 144 (𒄩𒇻𒅗𒀸) 02:23, 16 February 2018 (UTC)
@Tom 144: Hmm. I'm leaning towards just Latin-script entries for morphemes. Since they are an artificial thing, I'm fine with them being in Latin script. —*i̯óh₁nC[5] 02:41, 16 February 2018 (UTC)
@JohnC5: I'm thinking of including the note concerning the fonts in each entry through {{cuneiform}}, so that it is displayed when lang=hit is passed. It would look like this. The text I intend to put in {{cuneiform}} is:

{| class="messagebox" style="margin-bottom: 0.5em; background: #{{#ifeq:{{{lang|}}}|hit|FFFFFF; border:solid Tan 1px;|FFFFFF}}"
|-
| <div class="Xsux" style="font-size: 300%;" {{#if:{{{lang|}}}|lang{{=}}"{{{lang}}}"}} > {{PAGENAME}}</div>
|{{#ifeq:{{{lang|}}}|hit|<font size=3>The cuneiform characters Unicode displays by default do not accurately represent the original script. To view the correct characters install the correct fonts at [https://www.hethport.uni-wuerzburg.de/cuneifont/ www.hethport.uni-wuerzburg.de].</font>|}}
|}
Would this conflict with any other purpose of the template? --Tom 144 (𒄩𒇻𒅗𒀸) 03:21, 17 February 2018 (UTC)
@Tom 144: That markup is not great. The extra table infrastructure for a message that may not be displayed is not great. —*i̯óh₁nC[5] 10:56, 17 February 2018 (UTC)
@JohnC5: The table would be invisible when the message isn't displayed. --Tom 144 (𒄩𒇻𒅗𒀸) 12:26, 17 February 2018 (UTC)
@Tom 144: I think it would take up space from the markup though. —*i̯óh₁nC[5] 12:31, 17 February 2018 (UTC)
@JohnC5: Where should we display the note then? --Tom 144 (𒄩𒇻𒅗𒀸) 17:01, 17 February 2018 (UTC)
@Tom 144: Fixed. Now if I could just get @Erutuon to make that change to common.css for me... —*i̯óh₁n̥C[5] 07:16, 24 February 2018 (UTC)
@JohnC5: Heheh. Did you forget I'm not an admin? — Eru·tuon 07:21, 24 February 2018 (UTC)
@Erutuon: Oh lol. Could you type out specifically what I should put? Thanks! —*i̯óh₁n̥C[5] 07:24, 24 February 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I guess I was too cryptic. Thus:

:lang(hit).Xsux {
	font-family:  UllikummiA, Akkadian, FreeIdgSerif, CuneiformComposite, Segoe UI Historic, sans-serif;
}

Eru·tuon 09:31, 24 February 2018 (UTC)
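
As a rough sketch of what this selector does and does not match, which is why cuneiform in inflection tables needs to be wrapped in a language-tagging template: the HTML in the comments is illustrative only and is an assumption, not the exact output of {{m}}, {{lang}}, or {{cuneiform}}.

```css
/* Matches elements that have class Xsux AND whose content language is
   Hittite, e.g. (illustrative markup):
     <span class="Xsux" lang="hit">...</span>
   :lang() also matches when the language comes from an ancestor:
     <div lang="hit"><span class="Xsux">...</span></div> */
:lang(hit).Xsux {
	font-family: UllikummiA, Akkadian, FreeIdgSerif, CuneiformComposite, "Segoe UI Historic", sans-serif;
}

/* Does not match (illustrative markup):
     <span class="Xsux">...</span>
   with no Hittite language tagging on it or on any ancestor. Such text
   keeps whatever fonts the plain .Xsux rule specifies. */
```

Quoting "Segoe UI Historic" is optional in CSS, but quoting multi-word family names avoids ambiguity.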

@Erutuon: Thanks! @Tom 144: You can remove the extra code from your common.css since it's now in Mediawiki:Common.css. —*i̯óh₁n̥C[5] 09:43, 24 February 2018 (UTC)