Module talk:zlw-lch-IPA

From Wiktionary, the free dictionary
Latest comment: 25 days ago by Vininn126 in topic Update

Transcription decisions

(Notifying Hergilei, Tweenk, Wrzodek, KamiruPL, BigDom, Hythonia, Tashi): I'd really like everyone's input here - catonif has been beyond kind in helping create a new module for Polish pronunciation. The new module should be a lot more versatile and a lot less janky than the previous. There are some decisions made that are different from the current module that should be discussed. User:Vininn126/Sandbox#New_pl-IPA has examples.

  1. Removal of syllable breaks
    This is because syllable breaks are not phonemic in Polish. Stress will still be marked, as it is just barely phonemic, and we also have the hyphenation template printing syllabification. On top of that, syllabification is much more "intelligent" and can autodetect certain affixes.
  2. No /ŋ/ before /k/, /ɡ/, or /x/, as it's not really phonemic. Vininn126 (talk) 08:13, 7 August 2023 (UTC)Reply
  • Good work both of you. Fine with the removal of syllable breaks. Regarding /ŋ/, I'm OK with removing but it reminds me of something I was thinking about the other day. Both this way and the existing way, it strikes me as odd that we show, say, lęk and okienko as having the same sound. I understand the phonetic argument but it seems unhelpful for non-Polish speakers to use the same character when they sound so different (to me at least). Is there a difference when ą and ę are involved, or is it just my ears playing tricks? BigDom 09:27, 7 August 2023 (UTC)Reply
    @BigDom By transcribing them differently we would be representing how they might sound different. It sounds like you're in support of transcribing okienko with /n/ and lęk with /ŋ/, which I believe is how it currently works. Vininn126 (talk) 09:32, 7 August 2023 (UTC)Reply
    Hi @BigDom, thanks for the feedback. The two terms you mentioned would indeed be transcribed identically as /ɔˈkjɛnkɔ/ and /ˈlɛnk/ with the new module, since I assumed they were pronounced the same, since currently they are transcribed identically as well (just with a different character): /ɔˈkjɛŋkɔ/ and /ˈlɛŋk/. I have to ask, if you pronounce them differently then how can you be fine with the current system? Catonif (talk) 09:46, 7 August 2023 (UTC)Reply
    @Vininn126 That's right, although like Catonif says they're currently transcribed the same, just as /ŋ/. What would your opinion be regarding differentiating -ank-/-enk- and -ąk-/-ęk-? TBH I haven't looked in any papers/books, so am happy to be overruled if people generally don't distinguish the two. @Catonif Since the other day when I noticed I'm not really fine with the current system, but like I say it's something that only occurred to me recently. Not a criticism of the new template at all :) BigDom 10:24, 7 August 2023 (UTC)Reply
    @BigDom One option would be to have <nk> and <ng> give /nk/ /nɡ/ and <ąk> <ęk> <ąg> <ęg> give /ɔw̃k/, /ɛw̃k/. Vininn126 (talk) 10:30, 7 August 2023 (UTC)Reply
    Sounds OK to me. BigDom 10:40, 7 August 2023 (UTC)Reply
    Ok that's what I've done. For now at least. Catonif (talk) 11:36, 7 August 2023 (UTC)Reply
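    The option agreed on above can be illustrated with a minimal sketch. It is not the module's code (the module is written in Lua); the function name `transcribe` and the toy letter tables are assumptions made up for this example, and only the letters of the words discussed in this thread are covered.

```python
import unicodedata

W_NASAL = "w\u0303"  # nasalized glide: w + combining tilde

# Toy tables (assumption): only the letters needed for the examples.
PLAIN = {"a": "a", "e": "ɛ", "o": "ɔ", "k": "k", "g": "ɡ",
         "l": "l", "ł": "w", "n": "n"}
NASAL = {"ą": "ɔ", "ę": "ɛ"}

def transcribe(word: str) -> str:
    """Toy transcription showing only the nasal-before-velar decision."""
    word = unicodedata.normalize("NFC", word)
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in NASAL:
            if nxt in ("k", "g"):
                # <ąk>, <ęk>, <ąg>, <ęg>: vowel plus nasal glide, no /ŋ/
                out.append(NASAL[ch] + W_NASAL)
            else:
                raise NotImplementedError("only nasal vowels before k/g are sketched")
        else:
            # <nk>, <ng> keep a plain /n/; unknown letters raise KeyError
            out.append(PLAIN[ch])
    return "".join(out)

for w in ("łąka", "łonka", "lęk"):
    print(w, "/" + transcribe(w) + "/")
```

    Stress marking is left out on purpose, so the output for łąka and łonka corresponds to the /ˈwɔw̃ka/ and /ˈwɔnka/ mentioned further down only up to the stress mark.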
  • As for the removal of /ŋ/, I wouldn't be so fond of getting rid of it. Edmund Gussmann in his "The Phonology of Polish (The Phonology of the World's Languages)" claims that "The velar nasal has a highly restricted distribution in Polish since it appears only before a following velar plosive. It is not the case, however, that the dental [n] is barred from that position and thus minimal pairs can be found such as" and he provides two examples:
    łąka /ˈwɔŋ.ka/ and łonka /ˈwɔn.ka/ (diminutive plural of łono)
He also adds "Admittedly, the second member of the pair is morphologically complex and involves a diminutive suffix, but morphology is traditionally viewed as independent of phonology and having no influence on it. In such a case the ineluctable conclusion is that both [n] and [ŋ] belong to or represent separate phonemes; the highly restricted distribution of the velar nasal remains nothing more than an accidental gap in the distribution of phonemes".
In that case, we'd have to really consider leaving it as a separate phoneme. Tashi (talk) 14:54, 7 August 2023 (UTC)Reply
The bigger problem is the relation between /w̃/ and /ŋ/ not /n/. Vininn126 (talk) 14:58, 7 August 2023 (UTC)Reply
@Tashi The two terms you mentioned in the new module would be transcribed as /ˈwɔw̃ka/ and /ˈwɔnka/, respecting the distinction. In the old module currently being used, on the other hand, they'd be both transcribed as /ˈwɔŋka/. As far as I can understand, the change would actually be moving towards your point. Catonif (talk) 15:19, 7 August 2023 (UTC)Reply
If so then yes, that'd solve it. Cheers. Tashi (talk) 18:56, 7 August 2023 (UTC)Reply
(Notifying Hergilei, Tweenk, Wrzodek, KamiruPL, BigDom, Hythonia, Tashi): Next question: Do we want rhymes turned on for multiword terms? Vininn126 (talk) 13:36, 27 August 2023 (UTC)Reply
To clarify, the question is about what to have as default. Rhymes could still be turned on manually with |r=+ for special cases even if we choose to turn them off by default. Catonif (talk) 13:47, 27 August 2023 (UTC)Reply
Yes. I see no reason why multiword terms shouldn't have them. Rhymes are kinda helpful for me sometimes so I think having them in all cases would be really good. Tashi (talk) 16:48, 29 August 2023 (UTC)Reply
I'd agree. Vininn126 (talk) 16:53, 29 August 2023 (UTC)Reply
Alright, made it so. Catonif (talk) 16:56, 29 August 2023 (UTC)Reply
Thanks! Vininn126 (talk) 16:58, 29 August 2023 (UTC)Reply

test cases?

@Catonif I did a pass standardizing the code to the style conventions of other modules. I'm looking to see if there are any pages of test cases so I have some idea if I'm breaking things. Do you have such a page hanging out? Benwing2 (talk) 22:55, 1 September 2023 (UTC)Reply

BTW are you coming from a C or Java background? All those extra parens and semicolons that I removed look very Java-like. Benwing2 (talk) 22:56, 1 September 2023 (UTC)Reply
@Benwing2 How painful, I've been deprived of my semicolons! Jokes aside, for test cases we've been using User:Vininn126/Sandbox, a bit messy but it did the job. You may reorganise them wherever and however you see more fit. Catonif (talk) 23:00, 1 September 2023 (UTC)Reply

Deployment

@Benwing2 I was wondering - did you see the TODOs and think they had to be done before deployment? If something like adding infrastructure to include dialects and such is getting in the way, I think we can wait with some of these. I'd rather do away with the spaghetti code that we currently have and deploy something much more efficient. Vininn126 (talk) 13:43, 22 September 2023 (UTC)Reply

@Vininn126 Sorry for the delay. I want to change the parameter structure before we deploy this module widely, otherwise it becomes a lot more painful to do so in the future. I got part way through doing this but some RL issues have come up. I am traveling now but I'll be back in Austin in a few days and have more time to work on this. Benwing2 (talk) 21:16, 23 September 2023 (UTC)Reply
Ah okay! Thanks for the explanation, I just wanted to make sure we were on the same page. Vininn126 (talk) 21:18, 23 September 2023 (UTC)Reply
@Benwing2 May I bug you for a status update? Vininn126 (talk) 09:56, 12 November 2023 (UTC)Reply
@Vininn126 Yes, I've done a bunch of work on understanding and cleaning up Catonif's code, on documenting the existing Spanish and Portuguese code I wrote so I can remember how it works, and porting and merging code from both of these modules. Still some work to do. It's tricky because I need to use code from both the Spanish and Portuguese modules and merge both with Catonif's code. Currently fixing some bugs in {{place}} but I'll be able to get back to working on Module:pl-szl-IPA within two days, probably tomorrow (Sunday). Benwing2 (talk) 10:42, 12 November 2023 (UTC)Reply
@Benwing2 Awesome, thanks! Vininn126 (talk) 10:47, 12 November 2023 (UTC)Reply

New stress rule

@Benwing2 There's a stress rule we forgot to add - the prefixes eks-, super-, hiper-, and arcy- don't affect stress, so if they are attached to a monosyllabic root the final syllable gets the stress. Vininn126 (talk) 20:05, 8 December 2023 (UTC)Reply

@Vininn126 Thanks, I'll add it. Benwing2 (talk) 21:16, 8 December 2023 (UTC)Reply
@Benwing2 And "wice-", "ekstra-", and probably all other borrowed intensifying prefixes but I don't know how many. Read THIS, THIS2, the Wikipedia article. Just use Google Translate ok................... KamilekLebioda (talk) 21:27, 8 December 2023 (UTC)Reply
@KamilekLebioda All three refs mention the same prefixes. Are we missing any? Benwing2 (talk) 21:42, 8 December 2023 (UTC)Reply
@Benwing2 Probably but I posted the refs more to get you to read the rest of the rules, maybe you'll find something you missed like the endings "-liśmy, -liście". KamilekLebioda (talk) 21:55, 8 December 2023 (UTC)Reply
@KamilekLebioda Well, I don't read Polish. I can use Google Translate but it's slow so if you know of anything that's missed, let me know. Benwing2 (talk) 21:58, 8 December 2023 (UTC)Reply
@Benwing2 Use DeepL, it's much better IMO. "I don't read Polish" - well, that means it's time to learn Polish, d'uhh; the best language in the world 🤓🤓🤓😼😼😼 KamilekLebioda (talk) 22:03, 8 December 2023 (UTC)Reply
@KamilekLebioda Those endings are already taken care of. Vininn126 (talk) 23:03, 8 December 2023 (UTC)Reply
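The prefix rule from this thread can be sketched as follows. This is only an illustration under stated assumptions, not the module's implementation: the hyphen marking the prefix boundary, the names `count_syllables` and `stressed_syllable`, and the letter-based syllable count (which ignores diphthongs such as au/eu and all other stress exceptions) are invented for the example. The prefix list combines the ones named in the opening comment with "wice-" and "ekstra-" suggested in the reply.

```python
import unicodedata

VOWELS = set("aąeęioóuy")
# Prefixes named in this thread as not affecting stress.
NEUTRAL_PREFIXES = {"eks", "super", "hiper", "arcy", "wice", "ekstra"}

def count_syllables(s: str) -> int:
    """Count vowel letters; 'i' directly before a vowel only marks palatalization."""
    n = 0
    for i, ch in enumerate(s):
        if ch not in VOWELS:
            continue
        if ch == "i" and i + 1 < len(s) and s[i + 1] in VOWELS:
            continue
        n += 1
    return n

def stressed_syllable(word: str) -> int:
    """Return the 1-based index of the stressed syllable.

    Assumption: the caller marks a stress-neutral prefix with a hyphen,
    e.g. "eks-mąż". Default stress is penultimate.
    """
    word = unicodedata.normalize("NFC", word)
    prefix, sep, root = word.partition("-")
    total = count_syllables(word.replace("-", ""))
    if sep and prefix in NEUTRAL_PREFIXES and count_syllables(root) == 1:
        return total  # monosyllabic root keeps the stress: final syllable
    return max(total - 1, 1)

print(stressed_syllable("eks-mąż"))        # final syllable of two
print(stressed_syllable("super-bohater"))  # regular penultimate stress
```

Marking the boundary explicitly avoids guessing whether a word that merely begins with "super" or "eks" actually contains the prefix.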

Nasals

@Benwing2 I'm wondering if we shouldn't transcribe the nasals as closer to true nasals as that is closer to the underlying phoneme and is what is used in careful speech. Vininn126 (talk) 18:42, 24 December 2023 (UTC)Reply

I agree but if so they should be transcribed as “nasal diphthongs” rather than pure nasal vowels as those pretty much never occur in Polish. So in phonemic transcription I’d use something like /ɛɰ̃, ɔɰ̃/. Also, in phonemic transcription I wouldn’t indicate final devoicing. Kot and kod might be both phonetically [kɔt] but one is /kɔt/, the other is /kɔd/ – that’s important since the devoicing might be blocked eg. before a following vowel-initial word in kod but the final in kot would remain voiceless. // Silmeth @talk 20:14, 9 January 2024 (UTC)Reply
This makes me wonder if we should also have a phonetic transcription with all these rules applied. @Benwing2, what do you think? Vininn126 (talk) 20:16, 9 January 2024 (UTC)Reply
I'll add I'm not sure about using that particular nasal diphthong, but in general I think using some nasal diphthong would be better. Vininn126 (talk) 20:30, 9 January 2024 (UTC)Reply
@Vininn126 @Silmethule I'm not sure if it's correct to render final consonants voiced phonemically. Unless there's a phonetic difference (which may in fact be the case, I think I read something to this effect, but that's a different matter), I would think that the final voicing/devoicing is a morphophonemic issue, i.e. it's at a higher level than the phonemic. However, for the moment I think it will add extra complexity to distinguish phonemic and phonetic, and writing /ɛɰ̃, ɔɰ̃/ or [ɛɰ̃, ɔɰ̃] is probably fine. Benwing2 (talk) 23:22, 9 January 2024 (UTC)Reply
@Benwing2: you get distinctions like sklep Adama [ˈsklɛp aˈdama] vs chleb Adama [ˈxlɛb aˈdama] – at least for some speakers. Some, esp. in Lower Poland, AFAIK, would voice both, some in other regions would devoice both. But not all, and the fact there are speakers with this distinction shows this is phonemic (and there is nothing morphological going on in here!).
EDIT: and yeah, I remember reading something about there being a distinction in the strength of release (~the amount of air used) between the stops /p, t, k/ vs /b, d, ɡ/ making them slightly distinguishable even in whisper – but can’t remember where. // Silmeth @talk 11:05, 10 January 2024 (UTC)Reply
@Benwing2 One place we should not have these nasals though is before ł and l, which I think is already in the module, we just shouldn't get rid of it. Vininn126 (talk) 19:07, 18 February 2024 (UTC)Reply

Nasal clusters

(Notifying KamiruPL, BigDom, Hythonia, Tashi, Vininn126, Sławobóg, Silmethule):

A. /vɔw̃ʂ/ vs. /ɡɛɲɕ/

The current approach to nasalized approximants is not consistent. /w̃/ is given before fricatives (suggesting it has a phonemic status), while /ɲ/ is used before /ɕ, ʑ, s, z/ (presumably [j̃] would be considered an allophone of /ɲ/). The modern phonemic analyses of Polish I know of treat the two pairs [ŋ - w̃], [ɲ - j̃] in parallel: either each pair consists of allophones of one phoneme (/ŋ/ and /ɲ/ respectively), or each pair represents two phonemes (/ŋ/ and /w̃/, /ɲ/ and /j̃/ respectively). For example, Jassem (2003:104) and Wiśniewski (2007:192) give them the status of phonemes, while Sawicka (1995:135) treats them as allophones of /ŋ, ɲ/. Without arguing in favor of either, the following are possible:

  • węże /vɛŋʐɛ/ ('snakes') or /vɛw̃ʐɛ/,
  • wąż /vɔŋʂ/ ('snake') or /vɔw̃ʂ/,
  • gęś /ɡɛɲɕ/ ('goose') or /ɡɛj̃ɕ/,
  • gąsior /ɡɔɲɕɔr/ ('gander') or /ɡɔj̃ɕɔr/.

The current template generates /vɔw̃ʂ/ but /ɡɛɲɕ/ which strikes me as odd. Furthermore, words like gęś or gąsior can be represented phonemically as /ɡɛŋɕ/, /ɡɔŋɕɔr/ as well since /ŋ/, /ɲ/ are (mostly) in free variation before /ɕ, ʑ/ (alternatively analysed as /ɡɛw̃ɕ/ or /ɡɔw̃ɕɔr/ if /w̃/, /j̃/ are favoured). Sawicka (1995:135) characterizes the latter as more frequently used.

Possible solutions:

1a. Ditch /w̃/ in favor of /ŋ/ and use węże /vɛŋʐɛ/ and wąż /vɔŋʂ/ instead of /vɛw̃ʐɛ/, /vɔw̃ʂ/. For gęś, gąsior continue with /ɡɛɲɕ/, /ɡɔɲɕɔr/. Benefit: a minimal number of phonemes, consistent with Sawicka.

1b. Ditch /w̃/ in favor of /ŋ/ and use węże /vɛŋʐɛ/ and wąż /vɔŋʂ/ instead of /vɛw̃ʐɛ/, /vɔw̃ʂ/. For gęś, gąsior, give /ɡɛŋɕ/, /ɡɔŋɕɔr/ as main variants and /ɡɛɲɕ/, /ɡɔɲɕɔr/ as second variants. Benefit: same as 1a, plus it illustrates free variants in front of /ɕ, ʑ/ in the order of their frequency.

2a. Introduce /ɡɛj̃ɕ/, /ɡɔj̃ɕɔr/ and consequently, /j̃/ is used for ń before sibilants, e.g. /ˈkɔj̃.ski/. Benefit: representation of ę, ą as diphthongs before fricatives, consistent with Jassem, Wiśniewski.

2b. Introduce /ɡɛw̃ɕ/, /ɡɔw̃ɕɔr/ as main variants and /ɡɛj̃ɕ/, /ɡɔj̃ɕɔr/ as second variants. Consequently, /j̃/ is used for ń before sibilants, e.g. koński /ˈkɔj̃.ski/. Benefits: the same as 2a, plus it shows free variation between /w̃, j̃/ before /ɕ, ʑ/.

I'm slightly more inclined towards 2 (2b) even though there may be some objections towards treating /j̃/ as a phoneme. But since similar objections can be given against the inclusion of both /w̃/ and /ŋ/ in the current analysis, I think giving /j̃/ is a practical solution for a dictionary. It is a similar solution to how the German pronunciation module treats the question of phonemic status of /ç/ vs /x/, or the inclusion of /ɐ̯/ although it is a predictable allophone.

B. n in clusters before fricatives

Currently, the module generates /n/ for vowel + n + fricative. While this is not incorrect, Słownik wymowy polskiej PWN (SWP) considers this to be a spelling pronunciation. It prefers a nasal variant /w̃/ in front of spirants /s, z, ʂ, ʐ/ and /n/ before labial and velar fricatives (p. LV). Giving /n/ as the only variant seems to me a misrepresentation of the matter.

If variants could be generated automatically, it would be helpful to introduce the following (assuming the current analysis in the module; the transcriptions found in Sawicka and SWP were adjusted accordingly, but they can easily be transposed to proposal 1 or 2 above):

  • m before f, w: amfiteatr /am.fiˈtɛ.atr/ or /aw̃.fiˈtɛ.atr/; tramwaj /ˈtram.vaj/, /ˈtraw̃.vaj/
  • n before f, w: inwalida /im.vaˈli.da/, /in.vaˈli.da/, /iw̃.vaˈli.da/; informacja /im.fɔrˈmat͡s.ja/, /in.fɔrˈmat͡s.ja/, /iw̃.fɔrˈmat͡s.ja/ (SWP does not give the /m.v, m.f/ variant.)
  • n before s, z: pensja /ˈpɛw̃.sja/ /ˈpɛn.sja/; cenzor /ˈt͡sɛw̃.zɔr/, /ˈt͡sɛn.zɔr/
  • n before ch: bronchit /ˈbrɔn.xit/, /ˈbrɔw̃.xit/, koncha /ˈkɔn.xa/, /ˈkɔw̃.xa/, (Sawicka gives /ˈbrɔw̃.xit/ and /ˈkɔn.xa/ as more frequent which does not seem entirely plausible, SWP gives /ˈbrɔn.xit/, /ˈkɔn.xa/ as a first variant)
  • n before ś/si, ź/zi: awansie /aˈvaw̃.ɕɛ/, /aˈvan.ɕɛ/, /aˈvaɲ.ɕɛ/
  • n before sz, ż: plansza /ˈplaw̃.ʂa/, /ˈplan.ʂa/; branża /ˈbraw̃.ʐa/, /ˈbran.ʐa/
  • ę, ą before ś/si, ź/zi: gęś /ɡɛw̃ɕ/, /ɡɛj̃ɕ/, except after palatal consonants where /w̃/ is not possible, e.g. pięść /pjɛj̃ɕt͡ɕ/.

This generally agrees with my own speech habits, except that I'm from the south of Poland, so I have a complete velar assimilation of -n- in -nch-. I would say /ˈkɔw̃.xa/ and /ˈbrɔw̃.xit/. I would use /n/ only to highlight the spelling.

C. -ng-, -nk-, /ˈɕɔn.ka/ vs /ˈɕɔŋ.ka/

In principle, -n- in -ng- and -nk- is obligatorily assimilated to /ŋ/ only in loanwords. This type of pronunciation is prescribed by SWP, 1977 p. L. This holds regardless of whether -ng-, -nk- is followed by a consonant or a vowel, or is word-final, e.g. punkt /puŋkt/, manko /ˈmaŋ.kɔ/, bank /baŋk/, żongler /ˈʐɔŋ.ɡlɛr/, gangster /ˈɡaŋ.kstɛr/, angina /aŋˈɡi.na/, gong /ɡɔŋk/. SWP disqualifies unassimilated /n/ in loanwords. The only exception I can find to this is on the prefix boundary: SWP gives /pan.ɡɛrˈma.ɲizm/ and /paŋ.ɡɛrˈma.ɲizm/ for pangermanizm.

However, -nk- and -ng- do not behave exactly the same. This is because -ng- is only found in loanwords while -nk- occurs frequently in native words. If variants can be generated automatically, it would be helpful to generate both /ˈɕɔn.ka/, /ˈɕɔŋ.ka/ for sionka and other native words. Typically, these variants are described as geographically conditioned (/n/ Warsaw vs /ŋ/ Kraków, Poznań). NB. The complete obligatory assimilation of -ng-, -nk- in southern Poland is a feature shared with our southern neighbours: Slovak and Czech have velar assimilation of -n- in -nk- in native words as well.

It can be assumed that -nk- before a consonant or word-finally occurs only in loanwords while -nk- before a vowel occurs in native words, especially ending in -nka, -nko, -nki. However, there are multiple exceptions: inflected forms or derived words of bank, cynk, drink, dubbing, flank, frank, funk, link, pink, punk, szwank, szynk, tank, tynk, flanka, blanko, manko, panko, ankieta, blankiet, bunkier, Dunkierka, Sri Lanka, Helsinki, sztankiet and more. (I could sieve through SWP and sjp.pl to provide a more exhaustive list.)

Interestingly, there may even be a minimal pair for some speakers: frankowy (a relational adjective, 'pertaining to franc') vs. Frankowy (a possessive adjective, 'belonging to Franek').

Any chance the variants could be generated automatically? Qerez (talk) 21:10, 7 February 2024 (UTC)Reply

@Qerez the transcriptions you gave are not phonemic, but phonetic. We are attempting to give a phonemic transcription. As such, tramwaj will not have a nasal, but any words with ą/ę will. At a phonetic level, these can happen, but it's definitely not phonemic. Please read the above conversation. Vininn126 (talk) 21:16, 7 February 2024 (UTC)Reply
Also this module is the new module - our current transcriptions do not use it yet. Vininn126 (talk) 21:19, 7 February 2024 (UTC)Reply
This is not what I am trying to do. I'm rather pointing out that the current analysis is not consistent. It contains /w̃/ and /ŋ/, which are in complementary distribution by some analyses (e.g. Polish phonology in Wikipedia, which does not list /w̃/ as a phoneme). So one could argue the current approach is already not phonemic (by some analyses). There are phonemic analyses of Polish where /ɲ/ and /j̃/ are considered phonemic. But this is not the point I'm making. While there are competing analyses of the nasal phonemes of Polish, my point is to stick to one analysis or the other. Either /w̃/ is a phoneme next to /ŋ/ and so is /j̃/ next to /ɲ/ (Jassem's or Wiśniewski's analysis), or if [j̃] is an allophone of /ɲ/, so is [w̃] of /ŋ/ (e.g. Sawicka's analysis, the Wikipedia approach). If the aim is to stick to a phonemic transcription, that's fine; my plea is for a consistent analysis, and whatever it is I will be fine. The current Wiktionary approach is a mixture of the two not known in the literature. Qerez (talk) 07:54, 8 February 2024 (UTC)Reply
@Qerez In the new module the velar n will not be present. Vininn126 (talk) 08:19, 8 February 2024 (UTC)Reply
Thanks for clarifying that this module is not live yet. What phonemic analysis will be followed? How do you plan to represent lęk, bank? Qerez (talk) 09:29, 8 February 2024 (UTC)Reply
@Qerez lęk will have a nasal vowel, bank will have /n/. I don't think velar n will appear in any transcription. Vininn126 (talk) 09:30, 8 February 2024 (UTC)Reply
So /lɛ̃k/, /bank/? I would caution against it. This may be misleading and prone to misinterpretations. /lɛ̃k/ and /bank/ require understanding that obligatory realization rules turn them into [lɛ̃ŋk, bãŋk], as [lɛ̃k, lɛw̃k] and [bank] are incorrect, not in free variation with [lɛ̃ŋk, bãŋk]. SWP 1977 gives "lęk: lɛŋk (nie: lɛw̃k)" (p.179), "bank: baŋk (nie: bank)" (p.17). By contrast, it gives "sionka: ɕɔnka‖ɕɔŋka" (p.402), as it recognizes "ɕɔŋka" as a Kraków-Poznań variant in the preface (p.L). Qerez (talk) 10:16, 8 February 2024 (UTC)Reply
@Qerez We would transcribe nasals as a diphthong (again, read the thread above this one like I said), and phonemic transcriptions always depend somewhat on understanding what phonological processes can happen after that. To do what you are suggesting we would have to use [], which is possible, but generally // is preferred. It's not our job to make people aware of what phonemic vs phonetic means. Vininn126 (talk) 10:21, 8 February 2024 (UTC)Reply
While a diphthong is a bit better /ˈwɔw̃ka/ /ˈlɛw̃k/, it will still be prone to misunderstanding. I think the approach taken here is to present a deep, underlying phonemic form. It is comparable to representing English sung with the underlying phonological form /sʌnɡ/ and expecting users to understand how to turn this into the surface form [sʌŋ] using the set of morphophonological rules. [[1]]. In short, mandatory rules: /w̃k/ → [ŋk], /nk/ → [ŋk] if there is no internal morpheme boundary, and an optional rule /n#k/ → [n#k] or [ŋ#k] if there is an internal morpheme boundary (symbolized by #). This is all fine but far from the user-friendliness that a dictionary should offer. Or perhaps I misunderstand again. It is quite difficult to follow the discussions above; there are many opinions and it is hard to tell which is final.
My recommendation is for a phonematisation that is closer to the surface form. There are plenty of good ones to choose from, e.g. [[2]], [[3]], [[4]]. Sawicka introduces boundary phonemes which would definitely help to disambiguate: bank /bank/ [baŋk] vs /ɕɔnka/ [ɕɔnka, ɕɔŋka], /banˈkɔ.vɨ/ [baŋˈkɔvɨ] vs /d͡zbanˈkɔvɨ/ [d͡zbanˈkɔvɨ, d͡zbaŋˈkɔvɨ].
The phonematization in [wikipedia] (without the phonemes in the brackets, i.e. without (ɛ̃), (ɔ̃), (ɣ), (kʲ), (ɡʲ), (xʲ), (ɣʲ)) is another one that is well documented, with its allophonic rules described. It agrees with the proposal 1a, 1b above, so /ˈwɔŋka/ /ˈlɛŋk/ and /ɕɔnka, ɕɔŋka/. Why not use that?
If I understand correctly, both the currently displayed pronunciations in Wiktionary and the new proposal seem to be unattested in the literature. I think the closest I know of is Bronisław Rocławski, [wiedzy o języku polskim] but he still uses /ɛ̃, ɔ̃/. Qerez (talk) 11:48, 8 February 2024 (UTC)Reply
I do not think this is the right approach. There are a few problems with this interpretation. First, it's used in popular books, but a lot of them describe things from many years ago; actually listening to people, you don't always get nasal assimilation with <n>. Second, all of these processes can be determined from the true underlying phonemes which I presented in the previous message. Underlying doesn't mean "what is said", it means "how it's treated by the language". Third, these are not the pronunciations used in careful speech or what is taught - we are describing the literary standard, not everyday speech. These transcriptions, while popular, are not best for what we are attempting to achieve. I have seen all these interpretations but they still do not describe what is happening with the literary language/on a phonemic level. I am also aware of all the processes you have presented, they are not new to me. Vininn126 (talk) 11:57, 8 February 2024 (UTC)Reply
I'm confused: is the aim to present a literary model or a colloquial one? I'm afraid that the unassimilated /n/ in /bank/ is outside of the literary norm, or at least on its very edge. Dunaj in Zasady poprawnej wymowy polskiej [13] qualifies the assimilated form as the "more correct" (poprawniejsza) and the unassimilated ones as rare and heard/admissible in the colloquial style. This is pretty much the same stance that Słownik wymowy polskiej 1977 took 50 years ago (p. LV, 3.1.2). Things have not changed that much over the last half century.
I find it quite misleading to present an underlying form that happens to agree with a rare and colloquial form and to leave the average dictionary user to deduce the usual, literary normative form. While this is theoretically sound, it is confusing at best.
Maybe a compromise would be to present surface forms next to the underlying form, as e.g. the Latin module does. Qerez (talk) 12:39, 8 February 2024 (UTC)Reply
You don't need to translate these words to me man :p
And things have changed a lot in that time. A big part of that is how linguists analyze things. In the twentieth century you had the communist era alongside endecja, who had their own views to push, and also 20th century views overall. Since that time things have changed a lot all around linguistics. Furthermore there has been no really convincing minimal pair to prove that velar n is a phoneme, as all nasal vowels can be understood to be an underlying diphthong. Just throwing more links at me that are based on that doesn't really convince me of anything. They also said that Kashubian was a dialect in the 20th century, which makes no sense in any linguistic context.
A phonetic transcription underneath a phonemic one would be possible, but it might also be cluttery - we already include Middle Polish and we are likely to include some dialects such as Northern Borderlands. I am unsure how to best handle Goral (i.e. split or as a dialect). I also don't see why you are convinced we are supposed to be helpful for the average reader - we are aimed at a linguistic interpretation which is fraught with jargon and some obscurity. We include terms like "virile" and don't include every deducible-from-the-phonetic-realization phoneme. It's not misleading, it's just not holding your hand. Vininn126 (talk) 12:45, 8 February 2024 (UTC)Reply
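For readers following the phonemic/phonetic disagreement above, the realization rules summarized in Qerez's comment (mandatory /w̃k/ → [ŋk], mandatory /nk/ → [ŋk] without an internal morpheme boundary, optional assimilation across a boundary marked #) can be sketched in a few lines. This is a hedged illustration only: the function name `surface_forms` is invented, the # markup in the input is an assumption, and only the voiceless velar /k/ is handled because that is the only case the rules above spell out.

```python
from typing import List

W_NASAL = "w\u0303"  # nasalized glide: w + combining tilde

def surface_forms(phonemic: str) -> List[str]:
    """Derive the surface variant(s) from a phonemic string.

    Assumption: an internal morpheme boundary before /k/ is written '#'.
    The boundary symbol is dropped from the returned forms.
    """
    s = phonemic.replace(W_NASAL + "k", "ŋk")  # mandatory: /w̃k/ -> [ŋk]
    s = s.replace("nk", "ŋk")                  # mandatory: /nk/ -> [ŋk], no boundary
    if "n#k" in s:
        # optional rule across a boundary: both variants are possible
        return [s.replace("n#k", "nk"), s.replace("n#k", "ŋk")]
    return [s]

print(surface_forms("lɛ" + W_NASAL + "k"))  # lęk
print(surface_forms("bank"))                # bank
print(surface_forms("ɕɔn#ka"))              # sionka, boundary marked by hand
```

Whether such a derivation should be shown to readers as a second, phonetic line is exactly the open question in this thread; the sketch only shows that the surface forms are mechanically recoverable once boundaries are known.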

Readding syllable breaks to the IPA

@Benwing2 I just realized that syllable breaks could arguably be marginally phonemic, there's a pair such as pod-robić and po-drobić. How would you feel about readding them? Vininn126 (talk) 15:36, 20 February 2024 (UTC)Reply

@Benwing2 Also in relation to the BP thread we might want to give both a phonemic and broad phonetic transcription. Vininn126 (talk) 10:16, 22 February 2024 (UTC)Reply
@Vininn126 Yes, I agree with both. I think after I finish the declension modules I should be able to get back to the pronunciation module (finally). Benwing2 (talk) 21:38, 22 February 2024 (UTC)Reply

Separating out Silesian

@Benwing2 If I remember correctly, you had wanted to separate out Silesian? Would that be difficult to do? I'm asking because I could probably base a Kashubian module on that. Vininn126 (talk) 07:56, 1 May 2024 (UTC)Reply

@Vininn126 Hmm, I don't remember whether I wanted to separate Silesian like that. It probably wouldn't be so difficult to do but I'd need to take a look at the code again and see how much shared code there is. Benwing2 (talk) 08:02, 1 May 2024 (UTC)Reply
@Benwing2 I suppose another issue would be trying to stuff Kashubian in the module; it shares a decent amount with Polish, including a preference for penultimate stress (just with non-penult more frequently). This would also make maintenance easier, like with the headword module. What do you think? Vininn126 (talk) 08:05, 1 May 2024 (UTC)Reply
Might actually be easiest to include Kashubian in the module, you are right; there will be a lot of shared infrastructure that we can avoid duplicating. So essentially we'd have a Lechitic pronunciation module just like the Lechitic headword module; the specific functions to generate IPA will differ somewhat but the wrapper handling to turn that basic functionality into a {{pl-pr}}, {{csb-pr}}, {{szb-pr}}, etc. call will be the same. Benwing2 (talk) 08:13, 1 May 2024 (UTC)Reply
@Benwing2 It might also be a good idea to add in Slovincian in that case? Hopefully that should be relatively easy. The hardest thing with Kashubian might be making sure the stress is right - there are a lot of 2+ syllable words where it wasn't marked properly, you can probably generate me a .txt file and I can try to fix that. Vininn126 (talk) 08:16, 1 May 2024 (UTC)Reply
Also Masurian and Old Polish I guess... But let's take this one step at a time. Vininn126 (talk) 08:17, 1 May 2024 (UTC)Reply
Indeed, all good. Benwing2 (talk) 08:18, 1 May 2024 (UTC)Reply

Deployment

@Benwing2 Alright, what's it gonna take to deploy this/add other Lechitic languages? :P Vininn126 (talk) 15:33, 19 May 2024 (UTC)Reply

@Vininn126 Maybe a week's worth of time or two, I'm guessing, to fix it up and convert the existing templates. I did promise I would get this done (eventually). Benwing2 (talk) 19:07, 19 May 2024 (UTC)Reply
@Benwing2 Alright. I mean I know you're busy, just let me know what I can do to help. I'd also like to add the other Lechitic langs, like I discussed. Vininn126 (talk) 19:20, 19 May 2024 (UTC)Reply

Update

@Benwing2 Since I pinged you about the move, I figured I'd also let you know I'm done adding L2's to the module. There are probably some imperfections in how I did it, but at least the changes should stop for now.

As far as adding more, basically I'd want a way to handle Polish dialects, but I'm also trying to learn best how we can document these anyway (I'm still unsure about Goral, but that's not important at the moment; having read up on more dialects and having heard recordings of most, I feel it's pretty safe to say only Goral remains a question). I think in theory we'd only need 3 more vowels for some dialects, and a way to call individual dialects to either the main template or perhaps a subtemplate, and I think skipping rhymes/syllabification for them, basically like what we do with Middle Polish. Vininn126 (talk) 12:12, 20 June 2024 (UTC)Reply

@Vininn126 OK cool. I did a bunch of work some weeks ago cleaning up the Tagalog module and I should be able to clean up this module as soon as I'm done with my current work (which is redoing the support for inline modifiers to make it much easier to use). Benwing2 (talk) 08:20, 21 June 2024 (UTC)Reply
@Benwing2 Cool!
My intentions for dialects should be easy to implement in the current module, unless you wanna change things. Basically it's possible to give a lect a code within the module, and then set that lect as either an L2 or a dialect. If it's a dialect, we can call it to the main template and/or have a separate one that calls only it.
Other than that we might want some narrow transcriptions (broad ones of course). Vininn126 (talk) 08:24, 21 June 2024 (UTC)Reply
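The lect setup described above (each lect gets a code and is marked as either an L2 or a dialect, with rhymes and syllabification skipped for dialects) could look roughly like the following sketch. Everything here is hypothetical: the names `Lect`, `LECTS`, `host_language` and `shows_rhymes`, and in particular the placeholder dialect code "pl-dial-example", are made up for illustration and do not reflect how the actual Lua module stores its data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Lect:
    code: str
    kind: str                     # "L2" or "dialect"
    parent: Optional[str] = None  # for dialects: code of the L2 they belong to

# Hypothetical registry; the dialect entry is a placeholder, not a real code.
LECTS = {
    "pl": Lect("pl", "L2"),
    "szl": Lect("szl", "L2"),
    "csb": Lect("csb", "L2"),
    "pl-dial-example": Lect("pl-dial-example", "dialect", parent="pl"),
}

def host_language(code: str) -> str:
    """Language whose main template a lect's pronunciation is shown under."""
    lect = LECTS[code]
    return lect.parent if lect.kind == "dialect" and lect.parent else lect.code

def shows_rhymes(code: str) -> bool:
    """Rhymes/syllabification only for L2s, skipped for dialects."""
    return LECTS[code].kind == "L2"

print(host_language("pl-dial-example"), shows_rhymes("pl-dial-example"))
```

Keeping the L2/dialect switch as data rather than code would mean a lect whose status is still undecided can be moved from one category to the other by editing a single entry.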