Wiktionary:About Marshallese

This is a Wiktionary policy, guideline or common practices page. This is a draft proposal. It is unofficial, and it is unknown whether it is widely accepted by Wiktionary editors.

This page proposes how Marshallese word entries may be maintained on Wiktionary. To learn more about the language itself, see the article for the Marshallese language on Wikipedia.

Alphabet issues

There are currently display issues with five Marshallese letters:

Description | Letter | Unicode | Issue
L-cedilla | Ļ, ļ | U+013B, U+013C | Most fonts display this letter with a comma-below diacritic instead of a cedilla, to accommodate the expectations of the Latvian alphabet.
M-cedilla | M̧, m̧ | U+004D U+0327, U+006D U+0327 | Not encoded as a single precomposed character, so it requires a combining diacritic that does not display or align properly in most fonts. When displayed properly, the cedilla is placed either beneath the middle of the letter or underneath the rightmost column of the letter (but not too far to the right).
N-cedilla | Ņ, ņ | U+0145, U+0146 | Most fonts display this letter with a comma-below diacritic instead of a cedilla, to accommodate the expectations of the Latvian alphabet.
N-macron | N̄, n̄ | U+004E U+0304, U+006E U+0304 | Not encoded as a single precomposed character, so it requires a combining diacritic that does not display or align properly in most fonts.
O-cedilla | O̧, o̧ | U+004F U+0327, U+006F U+0327 | Not encoded as a single precomposed character, so it requires a combining diacritic that does not display or align properly in most fonts.

The characters given here are only approximations of the characters used in careful typesetting, but they are used provisionally until a better solution is found. Note especially that the sequences involving the combining macron (U+0304) or the combining cedilla (U+0327) will not display correctly in the majority of fonts.
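The difference between the precomposed letters and the combining sequences can be checked with Python's standard unicodedata module. The following is a minimal sketch, not part of any Wiktionary tooling; the names SEQUENCES and code_points are chosen here for illustration. It applies NFC normalization to each base letter plus combining mark and lists the resulting code points, so a single code point means a precomposed character exists.

```python
import unicodedata

# Base letter + combining mark for the five problem letters (uppercase forms).
SEQUENCES = {
    "L-cedilla": "L\u0327",
    "M-cedilla": "M\u0327",
    "N-cedilla": "N\u0327",
    "N-macron": "N\u0304",
    "O-cedilla": "O\u0327",
}


def code_points(text):
    """Return the NFC-normalized form of text as a list of U+XXXX strings."""
    composed = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in composed]


for name, sequence in SEQUENCES.items():
    print(name, code_points(sequence))
```

L-cedilla and N-cedilla compose to the single code points U+013B and U+0145, while the other three remain two-code-point sequences, which matches the Unicode column of the table above.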

Only the standard diacritics are used in Wiktionary entries for Marshallese. Alternative schemes (particularly the Ḷ ḷ Ṃ ṃ Ṇ ṇ Ñ ñ Ọ ọ alternatives promoted by the online version of the Marshallese-English Dictionary) are not used. Three other letters with diacritics, Ā ā Ō ō Ū ū, already display well in most modern default browser fonts; the alternative forms Ã ã Ä ä Õ õ Ö ö Ũ ũ Ü ü are not used.

Compare:

  • Arial: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Arial Unicode MS: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Calibri: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Cambria: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Candara: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Charis SIL: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Code2000: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Consolas: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Constantia: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Corbel: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Cormorant: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Courier New: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • DejaVu Sans: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • DejaVu Sans Mono: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • DejaVu Serif: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Doulos SIL: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Gentium: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Gentium Basic: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Gentium Book Basic: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Gentium Plus: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Inconsolata: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Junicode: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Linux Libertine: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Lucida Sans Unicode: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Noto Sans: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Noto Sans Mono: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Noto Serif: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Open Sans: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Segoe UI: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Source Code Pro: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Source Sans Pro: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Source Serif Pro: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Tahoma: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū
  • Times New Roman: Ā ā Ļ ļ M̧ m̧ Ņ ņ N̄ n̄ O̧ o̧ Ō ō Ū ū

Spelling

Separate word entries may be provided in both the old orthography and the new orthography. References are easier to provide for words spelt in the new orthography, because this is what the Marshallese-English Dictionary uses, so Marshallese word entries on Wiktionary are expected to come overwhelmingly from the new orthography. Where more than one spelling of the same word is in use, including differences between the old and new orthographies, the entries can cross-reference each other by listing the other spellings in the "Alternative forms" section of the entry, with a qualifier naming the orthography, if any, that the linked spelling uses.

Pronunciation template

Marshallese pronunciations are embedded using a special template, {{mh-ipa-rows}}, using phonological conversion algorithms described in Module:mh-pronunc. For example, {{mh-ipa-rows|mhahjelh}} embeds this:

  • (phonetic) IPA(key): [mˠɑːzʲɛlˠ], (enunciated) [mˠɑ tʲɛlˠ]
  • (phonemic) IPA(key): /mˠæɰtʲɛlˠ/
  • Bender phonemes: {m̧ahjeļ}

The template describes both phonemic and phonetic pronunciations of Marshallese words:

  • MED phonemes (labelled "Bender phonemes" in the template output) are a pronunciation convention originally devised by Byron W. Bender, which appears alongside word entries in the Marshallese-English Dictionary, published in 1976. Words are represented with non-IPA symbols for the phonemes of the language: four vowels specified only for height (a, e, ȩ, i), and consonants spelt in a way that resembles written Marshallese (p, b, j, t, k, kʷ, m, m̧, n, ņ, ņʷ, g, gʷ, d, r, rʷ, l, ļ, ļʷ, y, h, w). The transcription is bracketed in curly braces: { }. Each consonant implies a secondary articulation that directly influences the backness and roundedness of neighboring vowels, but this interplay is not indicated in the transcription beyond what can be inferred from the spellings involved.
  • International Phonetic Alphabet (IPA) phonemic transcription, bracketed in forward slash characters: / /. Much like MED phoneme notation it only describes individual consonant and vowel phonemes, but represented in a more IPA-friendly format. Again, as vowel phonemes are specified only for height and not for backness or roundedness, they are given symbols corresponding to the front vowels (æ, ɛ, e, i). The consonants, for the most part, are specified orthogonally for primary articulation (labial, coronal and dorsal), secondary articulation (palatalized, velarized and labiovelarized), and basic manner of articulation (obstruent, nasal, trill, lateral or approximant), with the resulting symbols pʲ, pˠ, tʲ, tˠ, k, kʷ, mʲ, mˠ, nʲ, nˠ, nʷ, ŋ, ŋʷ, rʲ, rˠ, rʷ, lʲ, lˠ, lʷ, j, ɰ, w. They are not specified for voice or frication, as these are not phonemic in Marshallese and can occur in free variation.
  • Either one or two phonetic transcriptions based on the Rālik (western) and Ratak (eastern) dialects of Marshallese. Where both dialects agree on phonetic reflexes for the same word, only one form is displayed.

The pronunciation template cannot rely directly on the spelling of Marshallese word entries to guess their pronunciation, as none of the common orthographies in use has a one-to-one phonemic correspondence, though the newer orthography is significantly more phonologically consistent than the older one. Instead, the template uses a code format which is essentially a simplified ASCII-only modification of Bender's pronunciation guide for the MED. For example, the {{mh-ipa-rows|mhahjelh}} example used earlier in this section uses the code mhahjelh, which is similar to m̧ahjeļ in the MED. The code uses a fairly strict syntax, but is case-insensitive.

The supported vowel phoneme symbols are:

Code | Height | MED phoneme | IPA phonemic | IPA phonetic | Spellings
a | open | {a} | /æ/ | [æ, ɑ, ɒ] | ā, a, o̧
e | open-mid | {e} | /ɛ/ | [ɛ, ʌ, ɔ] | e, ō, o
& | close-mid | {ȩ} | /e/ | [e, ɤ, o] |
i | close | {i} | /i/ | [i, ɯ, u] | i, ū, u
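The three phonetic symbols listed for each vowel can be read as a lookup from vowel height and consonant environment to vowel quality. The sketch below is a deliberate simplification and not the algorithm of Module:mh-pronunc. It assumes that the three columns correspond, in order, to palatalized, velarized and rounded environments, and that the vowel is flanked on both sides by consonants of the same secondary articulation. The names ALLOPHONES, COLUMN and allophone are illustrative.

```python
# Phonetic columns copied from the vowel table above.
# ASSUMPTION: the columns correspond, in order, to palatalized, velarized
# and rounded consonant environments.
ALLOPHONES = {
    "a": ("æ", "ɑ", "ɒ"),
    "e": ("ɛ", "ʌ", "ɔ"),
    "&": ("e", "ɤ", "o"),
    "i": ("i", "ɯ", "u"),
}
COLUMN = {"palatalized": 0, "velarized": 1, "rounded": 2}


def allophone(vowel_code, secondary):
    """Vowel quality when both neighboring consonants share one articulation."""
    return ALLOPHONES[vowel_code][COLUMN[secondary]]


print(allophone("a", "velarized"))
print(allophone("i", "rounded"))
```

Vowels between consonants with different secondary articulations, as in several of the template examples in this section, need more elaborate handling that this sketch does not attempt.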

The supported consonant symbols are:

Code | Primary articulation | Secondary articulation | Manner | MED phoneme | IPA phonemic | IPA phonetic | Spellings
b | labial | velarized | obstruent | {b} | /pˠ/ | [pˠ, bˠ] | b, bw
d | coronal | palatalized | trill | {d} | /rʲ/ | [rʲ] | d
h | (dorsal) | velarized | approximant | {h} | /ɰ/ | - | -
j | coronal | palatalized | obstruent | {j} | /tʲ/ | [tʲ, zʲ] | j
k | dorsal | velarized | obstruent | {k} | /k/ | [k, ɡ] | k
kw | dorsal | labiovelarized | obstruent | {kʷ} | /kʷ/ | [kʷ, ɡʷ] | kw, k
l | coronal | palatalized | lateral | {l} | /lʲ/ | [lʲ] | l
lh | coronal | velarized | lateral | {ļ} | /lˠ/ | [lˠ] | ļ
lw | coronal | labiovelarized | lateral | {ļʷ} | /lʷ/ | [lʷ] | ļw, ļ
m | labial | palatalized | nasal | {m} | /mʲ/ | [mʲ] | m
mh | labial | velarized | nasal | {m̧} | /mˠ/ | [mˠ] | m̧, m̧w
n | coronal | palatalized | nasal | {n} | /nʲ/ | [nʲ] | n
ng | dorsal | velarized | nasal | {g} | /ŋ/ | [ŋ] |
ngw | dorsal | labiovelarized | nasal | {gʷ} | /ŋʷ/ | [ŋʷ] | n̄w, n̄
nh | coronal | velarized | nasal | {ņ} | /nˠ/ | [nˠ] | ņ
nw | coronal | labiovelarized | nasal | {ņʷ} | /nʷ/ | [nʷ] | ņw, ņ
p | labial | palatalized | obstruent | {p} | /pʲ/ | [pʲ, bʲ] | p
r | coronal | velarized | trill | {r} | /rˠ/ | [rˠ] | r
rw | coronal | labiovelarized | trill | {rʷ} | /rʷ/ | [rʷ] | rw, r
t | coronal | velarized | obstruent | {t} | /tˠ/ | [tˠ, dˠ] | t
w | (dorsal) | labiovelarized | approximant | {w} | /w/ | - | w, -
y | (dorsal) | palatalized | approximant | {y} | /j/ | - | e, i, -
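The articulation columns of the consonant table combine predictably into the IPA phonemic symbol. As an illustration only (the names BASE, MARK, GLIDE and phonemic_symbol are invented here and do not come from Module:mh-pronunc), the following Python sketch derives each symbol from its primary articulation, secondary articulation and manner. It assumes, as the table suggests, that velarized dorsals are written without a diacritic.

```python
# Base IPA letter for each (primary articulation, manner) pair in the table.
BASE = {
    ("labial", "obstruent"): "p",
    ("coronal", "obstruent"): "t",
    ("dorsal", "obstruent"): "k",
    ("labial", "nasal"): "m",
    ("coronal", "nasal"): "n",
    ("dorsal", "nasal"): "ŋ",
    ("coronal", "trill"): "r",
    ("coronal", "lateral"): "l",
}
# Superscript diacritic for each secondary articulation.
MARK = {"palatalized": "ʲ", "velarized": "ˠ", "labiovelarized": "ʷ"}
# Approximants are written as plain glides chosen by secondary articulation.
GLIDE = {"palatalized": "j", "velarized": "ɰ", "labiovelarized": "w"}


def phonemic_symbol(primary, secondary, manner):
    """Compose the IPA phonemic symbol for one consonant."""
    if manner == "approximant":
        return GLIDE[secondary]
    base = BASE[(primary, manner)]
    if primary == "dorsal" and secondary == "velarized":
        return base  # the table writes plain k and ŋ, with no diacritic
    return base + MARK[secondary]


print(phonemic_symbol("labial", "velarized", "obstruent"))
print(phonemic_symbol("coronal", "palatalized", "trill"))
```

For the approximants the primary articulation is ignored, which mirrors the parenthesized "(dorsal)" entries in the table.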

The template's code syntax also supports the use of any number of plain apostrophes (') to disambiguate symbol spellings. For example, {{mh-ipa-rows|jal'w&j}} embeds this:

  • (phonetic) IPA(key): [tʲælʲ(o)wɤtʲ], (enunciated) [tʲælʲ wɤtʲ]
  • (phonemic) IPA(key): /tʲælʲwetʲ/
  • Bender phonemes: {jalwȩj}

...whereas {{mh-ipa-rows|jalw&j}} embeds this instead:

  • (phonetic) IPA(key): [tʲɒlʷɤtʲ]
  • (phonemic) IPA(key): /tʲælʷetʲ/
  • Bender phonemes: {jaļʷȩj}
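The way the code is split into symbols, and the role of the apostrophe, can be sketched as a greedy longest-match tokenizer. This is an assumption about the behavior that is consistent with the examples above, not the actual implementation in Module:mh-pronunc; the names IPA and to_phonemic are illustrative. The sketch handles a single fragment, produces only the phonemic IPA row, copies hyphens through unchanged, and omits whitespace handling and the sequence checks performed by the real template.

```python
# Code symbol -> IPA phoneme, copied from the vowel and consonant tables.
IPA = {
    "a": "æ", "e": "ɛ", "&": "e", "i": "i",
    "b": "pˠ", "d": "rʲ", "h": "ɰ", "j": "tʲ", "k": "k", "kw": "kʷ",
    "l": "lʲ", "lh": "lˠ", "lw": "lʷ", "m": "mʲ", "mh": "mˠ",
    "n": "nʲ", "ng": "ŋ", "ngw": "ŋʷ", "nh": "nˠ", "nw": "nʷ",
    "p": "pʲ", "r": "rˠ", "rw": "rʷ", "t": "tˠ", "w": "w", "y": "j",
}
MAX_LEN = max(len(symbol) for symbol in IPA)


def to_phonemic(code):
    """Convert one fragment of template code to a phonemic IPA string."""
    code = code.lower()  # the code format is case-insensitive
    out = []
    i = 0
    while i < len(code):
        if code[i] == "'":  # apostrophe: symbol boundary, no output
            i += 1
            continue
        if code[i] == "-":  # hyphen: copied through unchanged
            out.append("-")
            i += 1
            continue
        # ASSUMPTION: the longest symbol that matches at this position wins.
        for size in range(MAX_LEN, 0, -1):
            piece = code[i:i + size]
            if piece in IPA:
                out.append(IPA[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"unrecognized code at position {i}: {code[i:]!r}")
    return "/" + "".join(out) + "/"


print(to_phonemic("jalw&j"))
print(to_phonemic("jal'w&j"))
```

Without the apostrophe, l and w are read together as the single symbol lw; with it, they are read as two separate consonants, which reproduces the two phonemic rows shown above.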

The syntax also permits any number of ASCII whitespace characters and ASCII plain hyphens (-), as well as commas (,) to separate multiple different examples. In the case of commas, the module script processes each comma-separated piece of code separately, and shows the converted result of each fragment with duplicate results removed. For example, {{mh-ipa-rows|kewkew, k&wk&w}} embeds this:

  • (phonetic) IPA(key): [kɔːɡɔ, koːɡo], (enunciated) [kɔ kɔ, ko ko]
  • (phonemic) IPA(key): /kɛwkɛw, kewkew/
  • Bender phonemes: {kewkew, kȩwkȩw}
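The comma handling described above amounts to converting each fragment separately and dropping repeated results. A minimal Python sketch follows; convert_fragments is a hypothetical helper, and the convert argument stands in for whichever conversion is being applied, since the module's own functions are not reproduced here.

```python
def convert_fragments(code, convert):
    """Convert each comma-separated fragment, keeping first occurrences only."""
    results = []
    for fragment in code.split(","):
        result = convert(fragment.strip())
        if result not in results:  # drop duplicate results, keep the order
            results.append(result)
    return ", ".join(results)


# str.upper is only a stand-in converter for demonstration purposes.
print(convert_fragments("kewkew, k&wk&w", str.upper))
print(convert_fragments("kewkew, KEWKEW", str.upper))
```

Deduplication is applied to the converted results rather than to the input, so two different codes that convert to the same transcription are shown only once.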

Apart from these separators (apostrophes, hyphens, whitespace and commas), each comma-separated fragment of code must consist of valid sequences of consonant and vowel symbols, and the template will display an error if a fragment is malformed. In particular:

  • Code may begin or end with bare vowel phonemes (not normally allowed in the Marshallese language), but vowels may never directly neighbor each other: they must be separated by at least one consonant. If a bare vowel is present, the module script will assume that the code represents a prefix and/or suffix, and will generate multiple phonetic IPA pronunciations depending on whether the bare vowel is attached to a consonant that is palatalized, velarized or rounded. For example, {{mh-ipa-rows|ri-}} embeds this:
    • (phonetic) IPA(key): [rˠi‿ʲ, rˠɯ‿ʲ, rˠɯ‿ˠ, rˠɯ‿ʷ, rˠu‿ʷ]
    • (phonemic) IPA(key): /rˠi-/
    • Bender phonemes: {ri-}
  • Code may only begin with single consonants or double consonants of the same phoneme (geminates). Where a word begins with a double consonant, an epenthetic vowel will be inserted in the phonetic representation in different ways for the Rālik and Ratak dialects.
  • Code may begin with pseudo-glides h_ w_ y_ or end with identical pseudo-glides _h _w _y attached directly to otherwise bare vowels. These pseudo-glides do not show up in phonemic transcriptions, either in the MED's format or in IPA, but for the phonetic transcription, the module script will use this as a hint to produce only certain vowel contours of the otherwise bare vowels, rather than trying to generate all three possible combinations. For example, {{mh-ipa-rows|ri_w-}} embeds this:

Entry collation

When sorting entries for categorization, a simple ASCII-based transcription can properly collate Marshallese entries in Marshallese word categories. Use all lowercase for sorting, and include spaces and dashes as they normally appear in the entry. Letters with diacritics are transcribed as follows:

Letter | ā | ļ | m̧ | ņ | n̄ | o̧ | ō | ū
Sort key | a~ | l~ | m~ | n~ | n~~ | o~ | o~~ | u~

So, a word like M̧ajōļ would be collated m~ajo~~l~.

When using template code like {{head|mh|noun}}, Wiktionary properly and automatically collates entries in associated Marshallese word categories, but collation may still be necessary for topical categories like Category:mh:Islands or Category:mh:Animals which are subcategories of Category:mh:List of sets. For example, in the page for the Marshallese country name M̧ajōļ, the word entry is not categorized with the markup [[Category:mh:Countries]] by itself, but collation syntax is added to produce the markup [[Category:mh:Countries|m~ajo~~l~]] instead.
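The collation rules in this section can be sketched in Python as follows. The functions sort_key and category_markup are hypothetical helpers, not existing Wiktionary tools. The mapping assumes that the eight sort keys apply, in alphabet order, to ā, ļ, m̧, ņ, n̄, o̧, ō and ū, which is consistent with the M̧ajōļ example. The text is decomposed (NFD) first, so that precomposed letters and combining sequences are treated alike.

```python
import unicodedata

# Decomposed letter (base + combining mark) -> ASCII sort key.
# U+0304 is the combining macron, U+0327 the combining cedilla.
# ASSUMPTION: the keys pair with the letters in alphabet order.
COLLATION = {
    "a\u0304": "a~",
    "l\u0327": "l~",
    "m\u0327": "m~",
    "n\u0327": "n~",
    "n\u0304": "n~~",
    "o\u0327": "o~",
    "o\u0304": "o~~",
    "u\u0304": "u~",
}


def sort_key(title):
    """Return the lowercase ASCII collation key for a Marshallese title."""
    text = unicodedata.normalize("NFD", title).lower()
    for sequence, key in COLLATION.items():
        text = text.replace(sequence, key)
    return text


def category_markup(category, title):
    """Build category markup that carries an explicit sort key."""
    return f"[[Category:{category}|{sort_key(title)}]]"


# "M\u0327aj\u014d\u013c" is the country name used as the example above.
print(sort_key("M\u0327aj\u014d\u013c"))
print(category_markup("mh:Countries", "M\u0327aj\u014d\u013c"))
```

Spaces and dashes pass through unchanged, as the collation rules require.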

Marshallese–English Dictionary reference template

The Marshallese–English Dictionary is the only complete Marshallese dictionary in existence, and it is available online at one principal location. The template {{meod-ref}} links to that location and can be updated if the location changes. The template accepts up to five arguments, each a separate reference. For each reference, only the URL substring immediately following MOD/ needs to be provided.
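As an illustration of the argument format, the following Python sketch builds the template call from full URLs by keeping only the part after MOD/. The helper meod_ref is hypothetical, the use of positional arguments separated by pipes is an assumption, and the example.org address is a placeholder rather than the dictionary's real location.

```python
def meod_ref(*urls):
    """Build a {{meod-ref}} call from up to five dictionary URLs."""
    if len(urls) > 5:
        raise ValueError("the template accepts at most five references")
    args = []
    for url in urls:
        marker = url.find("MOD/")
        if marker == -1:
            raise ValueError(f"no 'MOD/' segment in {url!r}")
        # Keep only the substring immediately following "MOD/".
        args.append(url[marker + len("MOD/"):])
    # ASSUMPTION: references are passed as positional template arguments.
    return "{{" + "|".join(["meod-ref", *args]) + "}}"


# Placeholder URL for illustration only.
print(meod_ref("https://example.org/MOD/sample.htm"))
```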