Wiktionary:Beer parlour/2022/December: difference between revisions

Revision as of 02:19, 21 December 2022

Japanese kyujitai

Previous discussion: Wiktionary:Beer_parlour/2020/November#Move_kyujitai_to_t:ja-kanjitab

Currently in Japanese entries, both t:ja-kanjitab and the headword template are capable of displaying kyujitai. But obviously we want only one of them. So I suppose the community should make a decision on which should stay and which should go.

To abolish headword line kyujitai, some 6000 pages (Special:WhatLinksHere/Template:tracking/ja-headword/kyu) need cleanup. We need bots to do this.
To abolish t:ja-kanjitab kyujitai, perhaps no more than 100 pages need cleanup. Much easier. -- Huhu9001 (talk) 10:22, 2 December 2022 (UTC)[reply]

Considering that kyujitai is specific to the details of how a word is spelled in kanji, and that is the entire purpose of {{ja-kanjitab}}, it makes more sense to me to have all of the spelling, script, and reading-type information consolidated into that template. The headword is already crowded with other information, which was (I think) a big part of the impetus in the creation of {{ja-kanjitab}} in the first place. ‑‑ Eiríkr Útlendi │^{Tala við mig} 22:58, 2 December 2022 (UTC)[reply]

To me 旧字体 (kyūjítai) and 歴史的仮名遣い (rekishiteki-kanazúkai) should both be at the end of the "Alternative spellings" box, with their own "Historical spellings" caption and kanji/kana labels. No need to add either to the headword, it's just visual noise. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 12:42, 3 December 2022 (UTC)[reply]

Adding a bit more: separating "normal" alternative spellings from historical spelling would allow us to deal with cases like 掴む (tsukámu) and 我が儘 (wagamáma) in a clearer way.

I would put kyūjítai and rekishiteki-kanazúkai together under "historic" with kanji and kana labels since they were both the standard before the post-war reforms that gave us Modern Japanese orthography, so they do belong together like modern kanji and kana belong together. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 12:52, 3 December 2022 (UTC)[reply]

The automated display of kyūjitai is good but I have been providing the manual |kyu= because the automation requires maintenance and incorrect values do happen, as is the case (currently) with 優勝者(ゆうしょうしゃ) (yūshōsha) where the "Alternative spelling" box shows the same entry name: "優勝者". --Anatoli T. ^{(обсудить}/^вклад) 05:20, 19 December 2022 (UTC)[reply]

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria): I think this is worth a vote if no conclusion can be drawn here. -- Huhu9001 (talk) 08:59, 3 December 2022 (UTC)[reply]

@Huhu9001 As a bot owner but not a Japanese editor, I think we should do what's right irrespective of how many pages need to be changed. Changing 6000 pages by bot is really not a big deal (I did one change a few years ago that hit about 1.4 million pages ...); the only question is how much can be automated vs. how much needs to be done manually. Any idea about that? For example, if 100 pages can't be handled automatically, that's fairly easy to do manually; doing 1,000 pages manually is more difficult and would best be handled by the "push-manual-changes" method I use for such situations (where you load all the pages into a text file, do all the edits there and then push the results using a bot). Benwing2 (talk) 19:44, 3 December 2022 (UTC)[reply]
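The "push-manual-changes" workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Benwing2's actual script: the delimiter format and the function names `dump_pages`, `load_pages` and `changed_pages` are hypothetical, and the step that fetches and saves pages through a bot framework is deliberately left out.

```python
from typing import Dict

# Hypothetical delimiter line; any marker that cannot occur in page text would do.
HEADER_PREFIX = "=====PAGE: "
HEADER_SUFFIX = "====="


def dump_pages(pages: Dict[str, str]) -> str:
    """Serialize {title: wikitext} into one text blob that can be edited by hand."""
    chunks = []
    for title, text in pages.items():
        chunks.append(f"{HEADER_PREFIX}{title}{HEADER_SUFFIX}\n{text.rstrip()}\n")
    return "".join(chunks)


def load_pages(blob: str) -> Dict[str, str]:
    """Parse the (possibly hand-edited) blob back into {title: wikitext}."""
    pages: Dict[str, str] = {}
    title = None
    lines = []
    for line in blob.splitlines():
        if line.startswith(HEADER_PREFIX) and line.endswith(HEADER_SUFFIX):
            if title is not None:
                pages[title] = "\n".join(lines)
            title = line[len(HEADER_PREFIX):-len(HEADER_SUFFIX)]
            lines = []
        else:
            lines.append(line)
    if title is not None:
        pages[title] = "\n".join(lines)
    return pages


def changed_pages(original: Dict[str, str], edited: Dict[str, str]) -> Dict[str, str]:
    """Return only the pages whose text differs, i.e. the ones a bot would push."""
    return {
        title: text
        for title, text in edited.items()
        if title not in original or original[title].rstrip() != text.rstrip()
    }
```

Comparing against the original dump means only pages that were actually touched in the text file get saved, which keeps the bot run small when most pages turn out to need no manual change.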

I agree with Eirikr and Sartma that ja-kanjitab is the more sensible place for kyujitai, as it is a matter of written form. I would have no objection to a 'historical' section that is somehow separated from other alternative written forms. Cnilep (talk) 01:20, 6 December 2022 (UTC)[reply]

Position of box templates

While working on Umbrian, it was pointed out to me that my usage of {{normalized}} was going against what's written on its documentation, that is, to place it at the end of the entry, whereas I place it at the beginning (see avif, persklom, etc.).

This is also common practice with {{LDL}}, and I find it odd, since all the other box templates I can think of are placed at the topmost of the entry: most notably {{reconstructed}} (which actually comes before the L2 header) and {{phrasebook}}. {{hot word}}, although not a box, might also be worth mentioning.

With our current positioning (1) the box is theoretically inside the last header, usually References or Further reading, (2) important information is at the bottom of the page, which is not ideal since due to the bright green everyone is going to look there first thing anyways, so placing it at the top would make the reader follow the normal top-to-bottom order, and (3) it looks worse: I mean... look at the spacing (eg: nyelingur, суъптаъ), it looks like it ended up there by mistake. Cast your votes.

Catonif (talk) 20:15, 3 December 2022 (UTC)[reply]

Top. Vininn126 (talk) 21:21, 3 December 2022 (UTC)[reply]

Top, after L2 header for this template and all other box templates that apply to the entire L2 entry. JeffDoozan (talk) 17:25, 4 December 2022 (UTC)[reply]

At the very Top of L2 section. DCDuring (talk) 17:49, 4 December 2022 (UTC)[reply]

Does that mean above or below the L2 itself? Vininn126 (talk) 17:50, 4 December 2022 (UTC)[reply]

Below the L2. Theknightwho (talk) 19:27, 4 December 2022 (UTC)[reply]

Top makes more sense IMO. —Al-Muqanna المقنع (talk) 21:08, 4 December 2022 (UTC)[reply]

Bottom, as these boxes do not contain any important information at all. MuDavid 栘𩿠 (talk) 01:30, 6 December 2022 (UTC)[reply]

Does this imply that box templates should not be used if they do not apply to all the homographs in the language section? --RichardW57 (talk) 01:51, 6 December 2022 (UTC)[reply]

As appropriate. I think nyelingur, which uses {{LDL}}, looks better as it is. Seeming to be in a References section is actually appropriate for that template. The box for that template will be disruptively thick if it appears at the start of the language section. By contrast, boxes for {{rfv}} indicate something that needs attention; if one is looking up a word found in a durable medium, the user may be able to help. --RichardW57 (talk) 02:04, 6 December 2022 (UTC)[reply]

I suppose I could see moving the LDL template up a la {{hot word}}. Moving {{normalized}} up just seems like clutter, if we're comprehensively normalizing all the words from a reference work in one script/orthography to another script, so my personal aesthetic preference would be to leave that box at the bottom. - -sche (discuss) 02:27, 6 December 2022 (UTC)[reply]

I'm happy to see this matter is gaining so much input.

I would like to underline that we should not have this floating homelessly in the entry, that is, not being technically under any of our headers, defying the structure. If we really want it to be in the References (even though it's not a reference), so be it, as long as in EL we clearly state "the References L3 is for references... and green boxes", and that in the presence of (e.g.) Anagrams, the latter will be placed below. Now this doesn't sound so great, but it's the only way to make the boxes not defy our tree-like structure while still staying at the bottom. Either this, or top.

@MuDavid: while I agree that it is subjective whether the information is important or not, the point is that the bright green is going to attract the eye nonetheless, and in that case, better reading top-to-bottom than top-jump-to-bottom-go-to-top-and-read-to-bottom. @RichardW57: not sure how homographs should be dealt with, but that sounds an exceptionally good reason to have the box at the top (right under the ===Etymology N=== header). For it at the bottom, see at saman#Azerbaijani. @-sche: are you suggesting we move {{LDL}} to the top, but not {{normalized}}? They work and look very similar, so it would be weird to have them with separate positioning. Imagine ката#Udi. About the clutter, I need to point out that (in Umbrian, which I presume is what you're talking about) not all words are normalized, some being lemmatized in the same spelling in which they are actually attested.

Catonif (talk) 13:37, 6 December 2022 (UTC)[reply]

(Currently it's 4-2, so it'll likely be top.) Vininn126 (talk) 13:45, 6 December 2022 (UTC)[reply]

Ok, I waited so as not to make any premature decision. Seeing that the discussion had the result of top (5-2, counting myself), I can change the documentation, but on the other hand, I can't manually move all instances of the templates. Can this be automated by a bot? Catonif (talk) 15:29, 9 December 2022 (UTC)[reply]

I can have my bot enforce this, which templates should always be at the top of the language entry after the L2 header? Just {{normalized}}, {{reconstructed}} and {{hot word}} or are there others? JeffDoozan (talk) 01:54, 10 December 2022 (UTC)[reply]

Thank you! They should be {{normalized}} and {{LDL}}. {{hot word}} should technically already be there, and {{reconstructed}} is actually before the L2, and I think everyone's fine with that. Catonif (talk) 07:08, 10 December 2022 (UTC)[reply]

Thanks Jeff! Vininn126 (talk) 11:47, 10 December 2022 (UTC)[reply]

The 'top' position is not immediately after the L2 header. It is after the L2 header or Etymology N header. --RichardW57 (talk) 10:37, 11 December 2022 (UTC)[reply]

That's true. @JeffDoozan: could you provide a list of the entries your bot moved the boxes from that have multiple etymologies?

On another note, we could consider having some sort of |lite= parameter to be enabled in such cases, to make the templates less cluttery, since right under L3 headers the box doesn't look very good. Catonif (talk) 19:02, 14 December 2022 (UTC)[reply]
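For illustration only, the move discussed in this thread could be sketched as below. This is not JeffDoozan's bot code; the function name `move_boxes_to_top` and the regular expressions are assumptions made for the example. It handles only the simple case of a language section without numbered etymologies, and leaves sections with ===Etymology N=== headers untouched, since the box there belongs under the relevant etymology header and needs a human decision.

```python
import re

# Templates named in the discussion as the ones to move.
BOX_TEMPLATES = ("LDL", "normalized")
# A line consisting solely of one of those templates (with optional arguments).
BOX_LINE = re.compile(r"^\{\{(?:%s)(?:\|[^{}]*)?\}\}\s*$" % "|".join(BOX_TEMPLATES))
L2_HEADER = re.compile(r"^==[^=].*?==\s*$")
NUMBERED_ETYM = re.compile(r"^===\s*Etymology \d+\s*===\s*$")


def move_boxes_to_top(section: str) -> str:
    """Move box-template lines to directly below the L2 header of one language section."""
    lines = section.split("\n")
    if not lines or not L2_HEADER.match(lines[0]):
        raise ValueError("expected the section to start with an L2 header")
    # Multiple etymologies: leave unchanged for manual review.
    if any(NUMBERED_ETYM.match(line) for line in lines):
        return section
    boxes = [line.strip() for line in lines[1:] if BOX_LINE.match(line)]
    if not boxes:
        return section
    rest = [line for line in lines[1:] if not BOX_LINE.match(line)]
    # Drop blank lines left dangling at the end once the box has been removed.
    while rest and rest[-1].strip() == "":
        rest.pop()
    return "\n".join([lines[0]] + boxes + rest)
```

Returning multi-etymology sections unchanged makes it easy to collect them into a list for manual checking, which is what the request above asks for.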

`{{defdate}}` vs `{{etydate}}`

Would anyone mind if I changed etydate to be placed in the etymology line? Vininn126 (talk) 09:23, 5 December 2022 (UTC)[reply]

Support. Hopefully also a bot to do the cleanup. Catonif (talk) 19:03, 5 December 2022 (UTC)[reply]

@Vininn126: I object to your proposal on the grounds of unintelligibility. What change are you proposing? --RichardW57 (talk) 23:42, 5 December 2022 (UTC)[reply]

Object RichardW57 (talk) 23:43, 5 December 2022 (UTC)[reply]

That is... an odd reason to object? Currently etydate is supposed to be on the definition line like defdate. It is overlapping with defdate in that area if you were to put both. Plus it's ETYdate. Vininn126 (talk) 07:49, 6 December 2022 (UTC)[reply]

You mean in the etymology section like ampersand#Polish, or on the definition line like {{defdate}}? When used on the definition line does seem to overlap with defdate, though I concede it's not redundant because it automates "first attested in" and some other things. If we're putting it in the etymology section, IMO it should be reformatted, because I see no reason for it to be in brackets and at a small font size if it's in the etymology section, though it could still be helpful as a time-/keystroke-saving templatization of our current 'handwritten' etymologies like "First attested in 1644; engineering sense first attested in 1793". - -sche (discuss) 01:58, 6 December 2022 (UTC)[reply]

I could really get behind that. If we increased the font we'd want to increase the reference size as well. Vininn126 (talk) 07:49, 6 December 2022 (UTC)[reply]

I agree that it would need reformatting to be moved into the etymology section. Graham11 (talk) 08:03, 6 December 2022 (UTC)[reply]

We can also discuss if it should be at the beginning or the end. Vininn126 (talk) 08:21, 6 December 2022 (UTC)[reply]

Does that need to be determined? I don't think it really matters if an etymology section says "first attested in 1900, from X + Y" or "from X + Y, first attested in 1900", though my personal preference is for the latter. Agree with -sche about removing the brackets and size formatting in any case. —Al-Muqanna المقنع (talk) 13:06, 6 December 2022 (UTC)[reply]

It's been discussed on the discord, plus there's an argument to be made about consistency and uniformity of entries making them easier to read. Vininn126 (talk) 13:13, 6 December 2022 (UTC)[reply]

Discord discussions don't replace discussions in BP. DCDuring (talk) 00:58, 16 December 2022 (UTC)[reply]

That is why I brought it up here! When I say discussed, I mean raised. No decision like that would be made in that way; again, why I brought it up here. Vininn126 (talk) 01:06, 16 December 2022 (UTC)[reply]

Regardless of position, should we move forward with the reformatting of {{etydate}}? I'd like to use this template more but the current formatting is appropriate for glosses and not the etymology section IMO. I've also noticed that there are quite a few Hungarian entries with a raw {{defdate}} in the etymology section, which might need looking into (e.g. aréna). —Al-Muqanna المقنع (talk) 20:18, 14 December 2022 (UTC)[reply]

I was going to wait about 2 weeks but it seems the conversation has died down since the last comment. I would like to make the changes, and then also look for any etydates in the deflines and defdates in the etylines. Vininn126 (talk) 20:19, 14 December 2022 (UTC)[reply]

@-sche@Al-Muqanna@Graham11 I have made the following changes: the template no longer prints small text, and I removed the []'s. Currently converting the appropriate templates. Vininn126 (talk) 20:28, 15 December 2022 (UTC)[reply]
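The cleanup pass mentioned in this thread (looking for etydates in the deflines and defdates in the etylines) could be approximated with a scan like the one below. It is a rough sketch under assumptions: the function name `find_misplaced_date_templates` is hypothetical, a "definition line" is taken to be any line starting with `#`, and an "etymology section" is taken to be the text directly under a header whose title starts with "Etymology".

```python
import re

# Matches wikitext headers such as ==English== or ===Etymology 1===.
HEADER = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def find_misplaced_date_templates(wikitext: str):
    """Return (line number, description) pairs for date templates in the wrong place."""
    problems = []
    in_etymology = False
    for lineno, line in enumerate(wikitext.split("\n"), start=1):
        header = HEADER.match(line)
        if header:
            # Track whether the text that follows sits directly under an Etymology header.
            in_etymology = header.group(2).startswith("Etymology")
            continue
        if line.startswith("#") and "{{etydate" in line:
            problems.append((lineno, "etydate on definition line"))
        elif in_etymology and "{{defdate" in line:
            problems.append((lineno, "defdate in etymology section"))
    return problems
```

A scan like this only produces a worklist; whether each hit should be converted automatically or by hand is a separate question.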

Wugniu tone notation

We’ve generally come to a conclusion as to how the Shanghainese Wugniu rollout will work. For a refresher on the romanisation scheme, see User:ND381/Wu Expansion, which also has notes on what will be done for the Wugniu display integration. However, as you can see, we do not yet have a consensus as to how to display tones. Here are a few ideas for you, please leave a comment as to what you prefer. (I’m working on the assumption that we can all agree that left-prominent sandhi is to be notated with a dash, but if you disagree, let me know)

1. Diacritics

Wugniu, as the website displays it, does not have diacritics. However, due to the “two phonemic tones” analysis of Shanghainese, many have opted to simply notate the dark level (陰平) tone with a diacritic - usually acute or grave accent.

non	tsén	mo-ve
儂	真	麻煩

An important advantage of this is that it makes the transcription a lot cleaner. Note, though, that this will not be possible for lects such as Suzhounese, where no analyses have all non-first-syllable tones lose phonemic tone.

2. Numbers

What we currently do reflects many romanisations; however, Wugniu prioritises historical tone distribution, and thus tones 2-5 will be renumbered 5-8. This is, frankly, all that people who use number notation can agree on. Whether to use superscript or subscript numbers, and whether to place them before or after the syllable, are all points of contention. Unfortunately, due to how the old module is programmed, there is no way to re-implement tones for syllables after the first.

a. all behind syllable

non⁶	tsen¹	mo⁶-ve
儂	真	麻煩

b. all behind syllable, except for sandhi chains

non⁶	tsen¹	⁶mo-ve
儂	真	麻煩

c. all in front of syllable

⁶non	¹tsen	⁶mo-ve
儂	真	麻煩

3. Right prominent sandhi

It is also of note that many people don't actually notate the right-prominent sandhi in Shanghainese. However, this can lead to changes of tone. I'm not sure whether we should notate it as well (the current module already forces use of +), and if we do decide to, how we ought to do it.

If there are any further thoughts, let me know as well. (yoinking from justin's message from last time: @Atitarev, Thedarkknightli, ChromeGames, Mteechan) — 義順 (talk) 19:15, 5 December 2022 (UTC)[reply]
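To make the three numbered display options easier to compare, here is a small sketch that renders the example sentence in each style. It is not the module's code: the names `renumber` and `render` are hypothetical, the renumbering of tones 2-5 to 5-8 is assumed to be order-preserving (2→5, 3→6, 4→7, 5→8), and each word is given as a list of syllables plus one tone for the whole left-prominent sandhi chain.

```python
# Assumed order-preserving mapping from the current numbering to Wugniu numbering.
OLD_TO_WUGNIU = {1: 1, 2: 5, 3: 6, 4: 7, 5: 8}
SUPERSCRIPT = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def renumber(old_tone: int) -> int:
    """Convert a tone number from the current scheme to the Wugniu scheme."""
    return OLD_TO_WUGNIU[old_tone]


def render(words, style: str) -> str:
    """Render (syllables, wugniu_tone) pairs in display style 'a', 'b' or 'c'."""
    out = []
    for syllables, tone in words:
        mark = str(tone).translate(SUPERSCRIPT)
        is_chain = len(syllables) > 1
        if style == "c" or (style == "b" and is_chain):
            # Tone number in front of the word or sandhi chain.
            out.append(mark + "-".join(syllables))
        elif style in ("a", "b"):
            # Tone number directly after the first syllable.
            out.append(syllables[0] + mark + "".join("-" + s for s in syllables[1:]))
        else:
            raise ValueError(f"unknown style: {style!r}")
    return " ".join(out)


# 儂 真 麻煩, with tones given in the current numbering and converted.
example = [(["non"], renumber(3)), (["tsen"], renumber(1)), (["mo", "ve"], renumber(3))]
print(render(example, "a"))
print(render(example, "b"))
print(render(example, "c"))
```

The sketch makes the inconsistency of option b visible: it is the only style whose placement rule depends on whether the word is a sandhi chain.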

My two cents is that "non⁶ tsen¹ ⁶mo-ve" is confusing; I would think that the tone a syllable has should be notated either consistently after or consistently before that syllable, but not in different places. (If the issue is that in this case tsen itself is pronounced with a tone that goes from 1 to 6, I would still think that the indication of this should be attached to tsen, not half to tsen and half to mo.) Of those options (after vs before a syllable), it seems like languages in general and Chinese languages in particular usually notate tones after the relevant syllable (tsen¹), rather than before (¹tsen), so notating tone after the syllable here too would be consistent. - -sche (discuss) 02:24, 6 December 2022 (UTC)[reply]

Upon asking several Shanghainese people (most of whom have an understanding of Wugniu and/or linguistics), the general consensus seems to be that right-prominent sandhi is too variable to be practical to include (i.e. the 1 + 6 should not be written). The overwhelming majority agree that 2c looks the best (including those that know of systems such as Jyutping), with one supporting sticking to 1. I personally also agree that 2c looks the best, but we may want more Wiktionarians to reply first. — 義順 (talk) 23:06, 7 December 2022 (UTC)[reply]

2b should not be used, IMO, because of the ambiguity that -sche mentioned. a and c both seem fine to me. —Al-Muqanna المقنع (talk) 11:46, 8 December 2022 (UTC)[reply]

Forgot to mention - Module:wuu-pron/sandbox/documentation#Usage exists and also has a scheme for how the input would work. If there are any comments, leave them somewhere to see — 義順 (talk) 23:35, 19 December 2022 (UTC)[reply]

Location of Footnotes for Etymologies

The problematic text is in Wiktionary:Etymology#References. What does "Etymologies should be referenced if possible, ideally by footnotes within the “Etymology” section" mean? Web pages don't naturally do literal footnotes. For talking of Wikimedia pages, I suggest that 'footnote' should normally mean the display of the content implied by the domain of a <ref> tag; such is typically displayed as the expansion of a <references/> tag. --RichardW57 (talk) 01:17, 6 December 2022 (UTC)[reply]

If it means what I think it means, I propose that "footnotes within the "Etymology" section" be replaced by "inline references", and that "inline references" be added to the glossary. Otherwise, I will defend adherence to the currently proposed policy. Silence is consent. --RichardW57 (talk) 01:17, 6 December 2022 (UTC)[reply]

Should they be below or above further reading? Vininn126 (talk) 14:15, 6 December 2022 (UTC)[reply]

I would expect them to be in a 'References' section. --RichardW57 (talk) 23:52, 6 December 2022 (UTC)[reply]

Yes, but I mean the references section itself. Vininn126 (talk) 08:14, 7 December 2022 (UTC)[reply]

@Vininn126: If they be within the Etymology section, then the order is unspecified, but I feel they would be better outside the etymology section even if we have the bodies of the references within the etymology section. As sisters within the same section of another type, I would expect 'References' to come before 'Further Reading'. RichardW57m (talk) 14:39, 7 December 2022 (UTC)[reply]

I ask because on many pages, particularly Proto-Slavic pages, References is under, but I would expect it to be above Further Reading as well. Vininn126 (talk) 09:38, 8 December 2022 (UTC)[reply]

This has relevance to the layout of the etymology section of พริก (prík). Potential edit warrer: @This, that and the other. --RichardW57 (talk) 01:17, 6 December 2022 (UTC)[reply]

Footnotes should go at the bottom. As you can read in WT:EL, references go below. MuDavid 栘𩿠 (talk) 01:26, 6 December 2022 (UTC)[reply]

I'll jump in here since I think my bot edits provoked this. I would read that as meaning that Etymologies are simply encouraged to use the <ref> tag and that consequently those references would be displayed in the References section at the end of the entry. I see no good reason for any entry to have more than one References section and certainly not one stuck inside the Etymology. JeffDoozan (talk) 01:46, 6 December 2022 (UTC)[reply]

I agree with Jeff, I'd take this to mean etymologies should have <ref>s, not that the <references/> also needs to be directly in the Etymology section (above the POS section and definitions); I would support rewording the guidance to be clearer. On the last point: sometimes, if an entry has two different etymology sections, it may have ====References==== sections at the end of each overall Etymology division, i.e. after the POS, etc. That seems OK. But yeah, don't put <references/> directly inside the ===Etymology=== section i.e. above the POS and definitions. In the exceptional circumstance that it's needed for the exact quote from a reference to be directly adjacent to the etymology, just quote the reference... - -sche (discuss) 02:10, 6 December 2022 (UTC)[reply]

What do you mean by 'entry'? Do you perhaps mean 'language section' or 'language section or numbered etymology section'? You can't mean 'lemma' or 'form', because there may be multiple lemmas for a single etymology, especially in languages where Europeans readily confound verbs with prepositions, or absolute neuter adjectives with abstract nouns. RichardW57 (talk) 14:07, 6 December 2022 (UTC)[reply]

Apologies for the ambiguity, by entry I mean the entire language section. JeffDoozan (talk) 14:20, 6 December 2022 (UTC)[reply]

I'm in complete concurrence with the three users above. If nobody objects, I'll make the change to EL as suggested by Richard, on the basis of this consensus. This, that and the other (talk) 11:28, 6 December 2022 (UTC)[reply]

Do you mean "WT:Etymology"? --RichardW57 (talk) 14:11, 6 December 2022 (UTC)[reply]

@RichardW57 Yes, I do. I thought you were referring to EL, which is a protected policy page, but as this is WT:E, which is not protected, feel free to make the change yourself. This, that and the other (talk) 09:36, 8 December 2022 (UTC)[reply]

Done. --RichardW57 (talk) 12:38, 11 December 2022 (UTC)[reply]
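The reading agreed on in this thread (use <ref> tags in the etymology, but do not place <references/> directly inside the Etymology section) could be checked mechanically with something like the sketch below. The function name `references_inside_etymology` is hypothetical, and "directly inside" is assumed to mean between an Etymology header and the next header of any level, so a ====References==== subsection under a numbered etymology is not flagged.

```python
import re

# Matches wikitext headers such as ==Thai== or ====References====.
HEADER = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")
REFERENCES_TAG = re.compile(r"<references\s*/>")


def references_inside_etymology(wikitext: str):
    """Return line numbers where <references/> sits directly under an Etymology header."""
    current_header = None
    hits = []
    for lineno, line in enumerate(wikitext.split("\n"), start=1):
        header = HEADER.match(line)
        if header:
            current_header = header.group(2)
            continue
        if (
            current_header is not None
            and current_header.startswith("Etymology")
            and REFERENCES_TAG.search(line)
        ):
            hits.append(lineno)
    return hits
```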

Syllable breaks in English pronunciations

User:Kwamikagami seems to be on a one-person crusade to expunge syllable-break markings from English pronunciation transcriptions (e.g., here, here, here, here, here, here), claiming that English syllabification is theory-dependent, when, in fact, English words naturally fall apart cleanly into separate syllables, something that's inconsistent with syllabification being theory-dependent (as this would require the actual pronunciation of the word to change depending on which theory one subscribes to, which is obviously ludicrous). Other people's thoughts? Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 08:04, 6 December 2022 (UTC)[reply]

Whoop thinks it's "obvious" that Vashti is syllabified /ˈvæ.ʃti/. To me, it's obviously /ˈvæʃ.ti/. But to Wells it's clearly /ˈvæʃt.i/. If we're going to mark syllable boundaries by default, then we need consensus on an algorithm as to where they are. For example, do we agree that in GA girl is disyllabic? And then there's the question of how to handle ambisyllabicity.

~~[Or maybe Ladefoged. I forget: who is it that analyses nitrate as /ˈnaɪtr.eɪt/?]~~ kwami (talk) 08:07, 6 December 2022 (UTC)[reply]

User:Kwamikagami Personally I think you should avoid unilaterally removing syllable boundaries until this has been discussed here and there is consensus to make these changes. Benwing2 (talk) 08:24, 6 December 2022 (UTC)[reply]

@Benwing2: Should we go and undo Kwami's syllable-break purges until there's a consensus here one way or the other? Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 08:46, 6 December 2022 (UTC)[reply]

(ec) Well, Wells-or-Ladefoged-or-whoever apparently has some... interesting ideas about what kinds of consonant clusters can serve as an English syllable coda, ideas that seem to not always correspond perfectly with reality (at least if "/ˈnaɪtɹ.eɪt/" is anything to go by). girl can be either disyllabic (/ˈɡɚ.əl/) or monosyllabic (/ɡɚl/) in GA; this isn't a notational difference, but an actual variation in pronunciation (the GA dialects haven't developed a consensus as to the number of syllables in girl). As for Vashti, I strongly suspect that the difference in what various speakers consider the "obvious" syllabification might well reflect actual differences in pronunciation among GA speakers, similarly to the situation with the number of syllables in girl - in which case this isn't a question of what syllabification theory one subscribes to, but, rather, a question of multiple actual coexisting pronunciations that each need to be included. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 08:31, 6 December 2022 (UTC)[reply]

What is there to stop /ɡɚəl/ being monosyllabic, like British English /bɪəd/ beard? --RichardW57 (talk) 14:38, 6 December 2022 (UTC)[reply]

The R-coloring of the first schwa. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 01:00, 7 December 2022 (UTC)[reply]

How is that a problem? It is quite possible for only the second half of a vowel to be rhotacised. --RichardW57m (talk) 14:48, 7 December 2022 (UTC)[reply]

As regards ambisyllabicity, the natural way to notate that seems to be to include the consonant in question twice, first as the coda of the first syllable and then as the onset of the second syllable, which also seems to correlate the best with how the words in question actually sound. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 08:35, 6 December 2022 (UTC)[reply]

No, that's not natural, because it implies that the consonant is held longer than others in the word, which is generally not true. Consonants are not geminate simply by virtue of landing on syllable boundaries. Andrew Sheedy (talk) 08:37, 6 December 2022 (UTC)[reply]

It's not geminate; the first syllable has a half-length coda and the second has a half-length onset, with the syllable break coming in the middle of the sound lying across the syllable boundary. How would you go about notating ambisyllabic pronunciations (and don't try to avoid the problem by omitting syllable boundaries altogether, since that wouldn't help in cases where the presence of stress on at least the second syllable requires the syllable break to be marked)? Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 08:44, 6 December 2022 (UTC)[reply]

The fact that you don't understand something doesn't mean that it doesn't correspond to reality. If you think you know better than internationally recognized experts, then it could be that you understand less than you think you do. kwami (talk) 10:03, 6 December 2022 (UTC)[reply]

Reconfirmed, it is Wells, author of the English Pronouncing Dictionary and Longman Pronunciation Dictionary. Whoop, I'm curious how you would syllabify the following words, compared to one of the main RS's for English pronunciation. (Just add periods or hyphens if you like:)

petrol, selfish, feature, dolphin, hamper, brandish, carpeting, crisis, banker, attestation, apex, freedom, mattress, squadron, paltry.

Without concordance, and considering that dictionaries contradict each other, I'm wondering how we would be able to decide on syllabification. kwami (talk) 10:31, 6 December 2022 (UTC)[reply]

@Kwamikagami: /ˈpɛ.tɹəl/, /ˈsɛl.fɪʃ/, /ˈfi.t͡ʃɚ/, /ˈdɔl.fɪn/, /ˈhæm.pɚ/, /ˈbɹæn.dɪʃ/, /ˈkɑɹ.p(ɪ/ə).ɾɪŋ/, /ˈkɹɑi.sɪs/, /ˈbæiŋ.kɚ/, /ˌæ.ɾəˈsɾɛi.ʃ(ɪ/ə)n/, /ˈɛi.pɛks/, /ˈfɹi.dəm/, /ˈmæ.tɹ(ɪ/ə)s/, /ˈskwɔ.dɹ(ɪ/ə)n/, /ˈpɔl.tɹi/. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 00:58, 7 December 2022 (UTC)[reply]

Okay, Wells disagrees with you on every one of those. E.g. for 'selfish', Wells argues it's self.ish, forming a near-minimal pair with 'shellfish', which is syllabified shell.fish. Other dictionaries agree with some of yours but not others. E.g., you have short/lax vowels in open syllables in pe.trol and ma.ttress, which most treatments argue is not allowed in English. So it's not obvious how we should approach this. kwami (talk) 01:06, 7 December 2022 (UTC)[reply]

That's confusing to me, because I interpret the aspiration of a stop in petrol, mattress, paltry as an indication that it comes at the beginning of a syllable, so they would have a syllable onset with /tɹ/ or /t͡ʃɹ/. Similarly with Wisconsin, some people pronounce the c as aspirated [kʰ] and others don't; that means to me that consonant cluster is either split across the syllable boundary /s.k/ or not /sk/. But I don't know how to harmonize this with the lax vowel rule. — Eru·tuon 14:02, 8 December 2022 (UTC)[reply]

Wells argues that /tr/ acts like an affricate in the syllabification of words like /ˈmætr.əs/, but it is not a popular solution. (In accents that don't affricate /tr/, is there any more aspiration here than in words like happy, apple or heckle?) Other alternatives are ambisyllabicity (not as unpopular, but there's far from a consensus in its favor) or concluding that English allows word-medial syllables to end in ways that word-final syllables cannot (this is not so implausible if we view the ban on words like */ˈmæ/ or */səˈmæ/ as having to do with minimal length requirements for feet, rather than restrictions on syllables).--Urszag (talk) 00:16, 9 December 2022 (UTC)[reply]

I often don't affricate the t in mattress and the r is still usually devoiced. But even when it's an affricate I think I'd still aspirate it. — Eru·tuon 03:45, 9 December 2022 (UTC)[reply]

Another possibility would be that the lax-vowel rule isn't an actual rule of English phonology, but merely a coincidental lack of words that violate the so-called rule; a point in favor of this theory would be that some (mostly-onomatopoeically-derived) words do exist which end in lax vowels, like eh and baa. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 06:32, 9 December 2022 (UTC)[reply]

That's not an accidental gap. Interjections frequently have their own phonotactics. E.g. you wouldn't say English is a click language because of tsk! tsk! or tchick! So yes, in lexical vocabulary, English words (and perhaps syllables) do not end in 'lax' vowels. kwami (talk) 08:39, 9 December 2022 (UTC)[reply]

Re "include the consonant in question twice", I'd say that'd be bad for a different reason than Andrew: if I'm understanding correctly, you're suggesting to write something like /-d.d-/, for a case where the word has a /-d-/ sound which is hard to pin down to one syllable or the other? But it's still one /d/; /-d.d-/ would wrongly say there are two consonants, the way there actually are in some words like bookkeeper, or wholly and solely /-l.l-/ when contrasted with holy, soul-y /-l-/. - -sche (discuss) 10:34, 6 December 2022 (UTC)[reply]

Umm, wholly and solely aren't contrasted with holy and souly; they're homophonous with the two latter words. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 01:04, 7 December 2022 (UTC)[reply]

They're homophones for me as well, but contrastive according to Longman. That may be an RP/GA difference, I don't know. kwami (talk) 01:07, 7 December 2022 (UTC)[reply]

Markedly different for me as a BrE-speaker, both in terms of vowel sound (cf. goat split) and gemination, so probably. —Al-Muqanna المقنع (talk) 01:09, 7 December 2022 (UTC)[reply]

Likewise - different for me. The 'l' is clearly lengthened in wholly and solely, but not in holy and souly. Theknightwho (talk) 02:13, 7 December 2022 (UTC)[reply]

It isn't even an RP/GA difference, as American dictionaries also acknowledge the double l in solely, vs single l in holy. Some speakers don't distinguish them is about as much as can be said, and merger seems to be more common for some words (like wholly, where the original morphemic division whole+-ly has become obscured) than others (like solely and bookkeeper where the fact that they're composed of different parts, one of which ends with /l/ or /k/ and the other of which begins with it, is transparent). Checking Cambridge, the old Century, Collins, Dictionary.com, Longman, MacMillan, Merriam-Webster, the old OED, and Oxford Learner's, all of them have double k as the only option for bookkeeper or bookkeeping (none allow single /k/), and all of them have double l as the only option for solely except MW which allows either double or single /l/. For wholly, Cambridge, Collins, Longman and Oxford Learner's have only double l, Dictionary.com and MW and the OED allow either double or single /l/, Century allows only single /l/ and MacMillan has single /l/ for the US and double for the UK. - -sche (discuss) 06:32, 7 December 2022 (UTC)[reply]

The problem of consonants being ambisyllabic / hard to pin down to one syllable or another is a known/longstanding problem, but it's a problem we face regardless (when we have to insert stress markers), and I don't think we should start removing syllable breaks as a result. (I also don't think "notate syllable breaks like other dictionaries generally do" requires "also list every alternative syllable-breaking scheme any phonologist anywhere has devised".) I will say, in the specific case at hand, /ˈvæʃ.ti/ seems to be a better analysis than /ˈvæ.ʃti/; it's my understanding that English speakers prefer to avoid ending syllables with checked vowels like /æ/ whenever possible (as is readily possible here by ending the syllable in /æʃ/ instead), and Dictionary.com also breaks it as /ˈvæʃ.ti/, and although Collins just has /ˈvæʃti/ without a syllable break marked, they list an alternate pronunciation /ˈvæʃˌtaɪ/ where they do mark the break. - -sche (discuss) 10:34, 6 December 2022 (UTC)[reply]

Can you give an example of where stress marking would require us to decide on ambisyllabicity?

It's not just ambisyllabicity, but that Whoop's idea of "obvious" contradicts mine, that Wells contradicts what is obvious to all three of us (e.g. he has /ˈvæʃt.aɪ/), and that respected dictionaries contradict each other. Given that, how are we to decide how to syllabify words consistently? kwami (talk) 10:48, 6 December 2022 (UTC)[reply]

Also, it's important to remember that we're not transcribing pronunciations here, but rather the phonemic abstractions that underlie the pronunciations. Phonemic analysis may produce something different than what we'd see in a spectrogram. E.g. ambisyllabic consonants might be necessarily codas or onsets phonemically, regardless of how they're realized phonetically. kwami (talk) 11:04, 6 December 2022 (UTC)[reply]

Re "Can you give an example of where stress marking would require us to decide on ambisyllabicity?": well, since you're arguing syllable divisions are inherently or widely ambiguous and hard to decide on, the answer is that any word where the stress isn't on the first syllable will require deciding where, exactly, relative to the word's various consonants and vowels, to insert the stress marker, just as we decide where to insert the /./. - -sche (discuss) 06:32, 7 December 2022 (UTC)[reply]

I don't think it's that bad. Most analyses agree pretty unanimously on syllabifying a consonant that comes between a reduced fully unstressed vowel and a stressed unreduced vowel with the following vowel, e.g. I think hardly anyone would argue for ambisyllabicity of /l/ in a word like political /pəˈlɪtɪkəl/ or of [m] in a word like information /ˌɪn.fɚˈmeɪ.ʃən/. The only type of word where I can imagine it being argued that the consonant at the start of the stressed syllable is ambisyllabic are certain words with an unreduced vowel before the stressed syllable, especially if it is a short/"lax" vowel, such as tattoo, elasticity, plasticity.--Urszag (talk) 08:06, 7 December 2022 (UTC)[reply]

"English words naturally fall apart cleanly into separate syllables" is certainly false. If this were true, there would be no disagreement among theoreticians about how to syllabify English words, but there is, as kwami observes. The perception of syllables by lay speakers is also variable in a number of cases and can be influenced by spelling (see e.g. David Eddington , Rebecca Treiman & Dirk Elzinga (2013) Syllabification of American English: Evidence from a Large-scale Experiment. Part I∗ , Journal of Quantitative Linguistics, 20:1, 45-67, DOI: 10.1080/09296174.2012.754601). Many aspects of syllabification are entirely predictable, and so not that helpful to display; however, there are some small contrasts in pronunciation that in some systems like that of Wells constitute examples of contrastive syllabification, which we ideally would be able to display somehow (either by means of marking syllable boundaries, or in some other way). These contrasts usually involve one of the items having an "unpredictable" syllable division due to an intervening morpheme boundary: hopefully, the placement of those will not be controversial, since the position of morpheme boundaries is generally clear. Of the linked examples, cupola, Vashti, Monty Python, vindaloo don't seem to benefit from showing syllable divisions. But for t-girl and understudy, I think the transcriptions /ˈtiɡɝl/ and /ˈʌndɚstʌdi/ leave some useful information out: they would benefit from either showing a syllable division marker as /ˈti.ɡɝl/ and |/ˈʌndɚ.stʌdi/ or a secondary/tertiary stress marker as /ˈtiˌɡɝl/ and /ˈʌndɚˌstʌdi/. I think I would perceive a slight difference between the rhymes found in these and in hypothetical words "league-earl" and "underce-tuddy" or "underst-uddy". These examples show that transcription of a secondary/tertiary stress after the main stressed syllable in a word is often an alternative possibility to the hypothesis of contrastive syllable divisions. 
Another example, from Wells, where the distinction that Wells reports making could be explained either in terms of contrastive syllable division or contrastive secondary/tertiary stress is "selfish" (with default syllable division, whatever you think that is, and definitely no stress on the second syllable) vs. "shellfish" (per Wells, /ˈʃɛl.fɪʃ/; per our current transcription, /ˈʃɛlˌfɪʃ/).--Urszag (talk) 13:33, 6 December 2022 (UTC)[reply]

It will indeed be useful to mark syllable boundaries in some cases. Another possibility might be to write compounds with a space between the elements in the IPA. We shouldn't use the stress marker as a syllable marker, though: that should only be for stress, and none of your examples have secondary stress. We might want to have a guideline something like "the syllable break should only be used to separate vowels and at morpheme boundaries." Currently we say that it needs to be used for one vowel sequence that would otherwise be ambiguous. kwami (talk) 01:13, 7 December 2022 (UTC)[reply]

With "We shouldn't use the stress marker as a syllable marker" do you mean we should use both the syllable break and stress marker before a non-initial stressed syllable? We try not to do that on Wiktionary; that's regarded as an error and tracked in Category:IPA for English using .ˈ or .ˌ. I get the impression it's avoided on Wikipedia as well. — Eru·tuon 14:08, 8 December 2022 (UTC)[reply]

<.ˈ> and <.ˌ> are correct IPA, but no, that's not what I meant. I meant that we should not use <ˌ> as a substitute for <.> on a non-stressed syllable. kwami (talk) 04:28, 9 December 2022 (UTC)[reply]

Good, I think everybody would agree with that. — Eru·tuon 14:29, 9 December 2022 (UTC)[reply]
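
As a rough illustration of the convention discussed above (no syllable break directly before a stress mark), here is a minimal Python sketch. The function names are hypothetical, and this is not how the tracking category is actually populated; it only shows the pattern that ".ˈ or .ˌ" refers to.

```python
import re

# A syllable break (.) immediately followed by a primary (ˈ) or
# secondary (ˌ) stress mark. The lookahead keeps the stress mark itself
# out of the match, so substitution removes only the dot.
REDUNDANT_BREAK = re.compile(r"\.(?=[ˈˌ])")


def has_redundant_break(ipa: str) -> bool:
    """Return True if the transcription contains .ˈ or .ˌ"""
    return REDUNDANT_BREAK.search(ipa) is not None


def drop_redundant_breaks(ipa: str) -> str:
    """Remove each syllable break that directly precedes a stress mark."""
    return REDUNDANT_BREAK.sub("", ipa)
```

For example, a hypothetical input /ˌɪn.fɚ.ˈmeɪ.ʃən/ would be flagged, while /ˌɪn.fɚˈmeɪ.ʃən/ (the form quoted earlier in the thread) would not.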

`{{la-epithet}}`

At some point recently this was changed to be self-contradictory, but as far as I can tell the note-to-the-note is redundant to the usually=1 option. I guess another question is, if a Latin word is exclusively used as a taxonomic epithet and never inflected, shouldn't it just be listed as Translingual? —Al-Muqanna المقنع (talk) 13:00, 6 December 2022 (UTC)[reply]

Right, the current note looks pretty bad. The issue as I think of it so far is that these words are hypothetically supposed to have certain forms built according to Latin rules, but that doesn't mean that they are ever used in any other Latin context, and in practice I'm not sure taxonomic nomenclature should even be categorized as Latin anymore, given how little most coiners of new names actually are involved in a community of Latin speakers or writers. (I'm not sure whether the idea that these names are in Latin has been officially abandoned, or whether that varies depending on the codes according to which different types of organisms are named.) Listing as Translingual is OK; but as the note points out, it's an overgeneralization to say that taxonomic epithets are "not inflected except in the nominative singular"; plural forms are sometimes found and the genitive singular is not infrequently found in the formation of parasite names. So it is useful to provide some further information about inflected forms (if that information is available). There is no fixed pronunciation of these names, but if coined from Latin or Greek roots, the original vowel lengths may also be helpful information as in theory the stress should probably follow the Latin stress rule.--Urszag (talk) 14:16, 6 December 2022 (UTC)[reply]

The note-to-the-note isn't redundant to that parameter. Even if an epithet wasn't used in Latin, there can still be inflection as can be seen by ruderalis (German example, taxonomics, inflected in Dat./Abl. Sg.) and Homo neanderthalensis together with Citations:Homines neanderthalenses (various examples, with Pl.).

Indeed, the note without parameter (i.e. with the text: "Used exclusively as a taxonomic epithet and thus not inflected except in the nominative singular") makes no sense in Latin entries. It's not that taxonomic terms stay uninflected in Latin (like Gen./Dat./Acc./Abl. Sg. Homo sapiens). They are inflected the Latin way in Latin (as ruderalis shows). But some terms simply aren't (attested in) Latin (for which maybe see also Category:Pseudo-loans from Latin by language). --14:29, 6 December 2022 (UTC)

Yes, I see your point, taxonomic epithets can be inflected. In that case I think the template should be reworded—as it stands it just looks like two different editors arguing. —Al-Muqanna المقنع (talk) 14:33, 6 December 2022 (UTC)[reply]

Well, there are some similar issues even with non-taxonomic Latin names being displayed as "singular only". The proper name of an individual person is by its nature not pluralizable as such, but there are semi-productive ways to semantically coerce the meaning of plural proper names by giving them a meaning like "someone named X", "a person like X" or "a version/account of X", and in that case there is often no greater obstacle in Latin than in English to using a morphologically predictable plural form. E.g. consider the form Oedipōrum (currently marked with an RFV since we display Oedipus as "singular only"); I would say this is in reality simply no more or less possible in Latin than "Oedipuses" is in English (found in various contexts, e.g. "the Oedipuses of Harold Bloom and Gilles Deleuze"). Perhaps one could argue that we should have explicit sub-senses for names that have attested uses of that kind (and only for names with attested uses), but that seems a bit impractical and also not that valuable.--Urszag (talk) 14:46, 6 December 2022 (UTC)[reply]

I think another case worth considering is that taxonomic epithets were originally coined and discussed in Latin prose, and continued to be at least into the late 19th century. Epithets would naturally be used and declined in Latin in that context, e.g. here ("in sched. foliis ut in G. Burmauni Cass. et G. natalensi et abyssinica opacis"), where G[erbera] natalensis and G[erbera] abyssinica are in the ablative. —Al-Muqanna المقنع (talk) 15:08, 6 December 2022 (UTC)[reply]

Proper nouns: That's (IMHO) another topic. Proper nouns can be set in plural as pointed out above. For Hercules and Oedipus a plural is also mentioned in dictionaries. Though sometimes it's not the plural of a proper noun (with a meaning like multiple persons named X), but instead the proper noun turned into a common noun (person like X, with characteristics of X) and then for the common noun there is a plural. Example: There're Krösus (proper noun, a certain rich king) and Krösus (common noun, rich person, has a plural). --22:05, 6 December 2022 (UTC)

Per the above I've adjusted the template to remove the note-to-the-note and change the wording in the template's relevant forms to indicate that other inflections may be theoretical/rarely found as appropriate, rather than that they are theoretical. —Al-Muqanna المقنع (talk) 18:15, 10 December 2022 (UTC)[reply]

Let's add Jisho.org to the abuse filter

There have been edits that are based on copying from jisho.org. The problem with Jisho.org is that it is a tertiary source. Here is an example and its reversion. Here is another example. The content and tone of these edits tend to be a bit more informal than usual, and these are being slightly more frequent lately (the remaining ones that weren't reverted).

If this isn't WP:COPYVIO, this has a potential risk of being WP:CIRCULAR. Jisho.org is a site that aggregates information from different dictionaries to present a user-friendly display. It's like trying to cite Google.com. Perhaps one day it might even source information from Wiktionary itself, which would have us copying from our own mirror. Don't get me wrong, I use Jisho.org a lot to help me study Japanese. The thing is that it, like any tertiary non-expert source, needs to be cross-referenced and I do that with AnkiWeb and Google Translate. Before submitting to Wiktionary I go further and cross-reference against Yahoo Chiebukuro, DeepL, HiNative, and the underlying dictionaries that Jisho.org displays from. This at least resolves the copyright concerns especially with the restrictive EDRDG license.

If one does not want to look through all those sources, then one could at least cite the dictionaries that Jisho.org uses. That site states that it uses "the JMdict, Kanjidic2, JMnedict and Radkfile dictionary files". Those appear professional and are primary/secondary expert sources which are acceptable. Nippon Jisho is a different source that's probably fine, but Jisho.org isn't citable. The users adding these seem to be well-intentioned although beginner or intermediate Japanese students. Advanced students know how to consult a wide variety of sources like Japanese-Japanese dictionaries. If there is a warning before entering Jisho.org in the article bodies or edit summary, sources will be more critically examined and the quality of edits should improve. Therefore, I propose adding Jisho.org to the abuse filter. Daniel.z.tg (talk) 12:00, 10 December 2022 (UTC)[reply]

@Daniel.z.tg: Thank you for the post. I wholly agree that Jisho.org should not be usable as a reference. I am not sure how an abuse filter would prevent this, however, and I defer to the other editors who maintain the filters. ‑‑ Eiríkr Útlendi │^{Tala við mig} 00:20, 13 December 2022 (UTC)[reply]

Pinging @Fish bowl, Eirikr: Daniel.z.tg (talk) 19:37, 18 December 2022 (UTC)[reply]

Narrow IPA norms for English

Let's try to make a list/table of how things should be represented in narrow IPA for GenAm (and British if possible). Appendix:English pronunciation already has a few notes, e.g. that word-initial /p t tʃ k/ are aspirated [pʰ tʰ tʃʰ kʰ], but we should try to cover as much as possible: "narrow IPA for morpheme-final /e/ (day, gayly) should be [___] while narrow IPA for /e/ before same-morpheme /l/ (gale-y) should be [___], [___]", etc, etc; then we could make an effort to add (consistent) narrow IPA to entries more routinely.
I figure, if we routinely have narrow IPA covering flapping, aspiration, dark L, vowel allophones, etc, it'll address some of the concern that we make it seem like certain things have the same vowels or consonants when they actually (allophonically) differ, while the broad IPA stays phonemic. But I figure we should establish agreed-on notations, not just encourage everyone to add whatever narrow IPA seems right to them, because recent discussions amply demonstrate that people are often both confident and mistaken in their assessments of what the typical GenAm (etc) pronunciation of something is. So, what norms can you think of for narrow IPA notations of GenAm, British, etc; e.g., in what situations is /u/ one thing and when is it another? - -sche (discuss) 20:57, 10 December 2022 (UTC)[reply]

Template:hcol, Template:hrow, Template:zcol, Template:zcol+, Template:acol, Template:topx, Template:topx+, , Template:exp-topx, etc.

What a mess. @Useigor Why have you created such a profusion of row/column templates when we already have {{col}} and variants such as {{col2}}, {{col3}}, etc. as well as {{top2}}, {{top3}}, etc.? Can you please explain what they accomplish that the existing templates don't? We need to clean this up, and I am going to undo all your changes unless there is a good reason for them and a clear plan to clean them up. Thanks. Benwing2 (talk) 01:59, 12 December 2022 (UTC)[reply]

Happy birthday Wiktionary!

Apparently it's our 20th birthday today. We should rename ourselves "Wikintionary" for the day, or week... This, that and the other (talk) 06:48, 12 December 2022 (UTC)[reply]

Congrats! That's a major milestone in a lifetime! And that joke deserves a round of applause too!

Noé 08:43, 12 December 2022 (UTC)[reply]

Here's to another 20 years! Vininn126 (talk) 10:23, 12 December 2022 (UTC)[reply]

This makes me feel old. - TheDaveRoss 13:23, 13 December 2022 (UTC)[reply]

Reminder to provide feedback on the Movement Charter content

Hi all,

We are in the middle of the community consultation period on the three draft sections of the Movement Charter: Preamble, Values & Principles, and Roles & Responsibilities (statement of intent). The community consultation period will last until December 18, 2022. The Movement Charter Drafting Committee (MCDC) encourages everyone who is interested in the governance of the Wikimedia movement to share their thoughts and opinions on the draft content of the Charter.

How to share your feedback?

Interested people can share their feedback via different channels provided below:

Fill out a survey (optional and anonymous, accessible in different languages)
Share your thoughts and feedback on the Meta Talk pages:
- Preamble
- Values & Principles
- Roles & Responsibilities (statement of intent)
Share your thoughts and feedback on the MS Forum:
- Preamble
- Values & Principles
- Roles & Responsibilities (statement of intent)
Send an email to: movementcharter@wikimedia.org, if you have other feedback to the MCDC.

If you want to help include your community in the consultation period, you are encouraged to become a Movement Charter Ambassador. Please find out more about it here.

Thank you for your participation!

On behalf of the Movement Charter Drafting Committee Mervat (WMF) (talk) 13:00, 12 December 2022 (UTC)[reply]

Links to reflexive Polish verbs

What prompts my enquiry is that the 3rd person plural verb form "podobają" page shows its "inflection of" link as "podobać się" rather than "podobać", in that case resulting in a red link. (The same thing applies to spodoba/spodobać się/spodobać).

Looking through the reflexive verbs category for some (apparently rare) similar examples, I notice that the synonyms listed on the "pojawiać" page are shown on the page as "ukazywać się" and "zjawiać się", but are linked within the "inflection of" template to the "ukazywać" and "zjawiać" pages. On the other hand, within the 3rd person singular form "pojawia" page, the "inflection of" link "pojawiać się" links to the pojawiać page via a redirect (presumably because that page was originally titled "pojawiać się"?).

My question is - are those "verb+się" links now optional (just used occasionally to specifically point out the reflexive part of a verb)? Or are there any specific/changed rules? i.e. For example, is it best to leave that "podobać się" link visible and link it to the "podobać" page within the "inflection of" template? Or is it now best to completely remove "-się" from the link? Thanks. DaveyLiverpool (talk) 14:46, 12 December 2022 (UTC)[reply]

At one point the pagename included się, but instead we opted to have it be a label, considering the Polish reflexive word is a mobile particle (as opposed to anchored to the word, like Russian). Most pages were completely switched - but I guess nonlemmas were skipped. Perhaps a bot owner would be willing to help...

On occasion hard redirect pages are made so that interwiki linking can work. Vininn126 (talk) 15:01, 12 December 2022 (UTC)[reply]

IMO (a) we should eliminate the hard redirects, (b) if the choice was made to lemmatize reflexive-only verbs at their non-reflexive equivalent, the non-lemma forms should point to the non-reflexive equivalent. podobają is not the third person plural present of podobać się; that would presumably be 'podobają się'. Benwing2 (talk) 07:51, 13 December 2022 (UTC)[reply]

I agree to b, but I don't get a. What about interwiki linking? Vininn126 (talk) 09:51, 13 December 2022 (UTC)[reply]

@Vininn126 Hmm. Are you saying that Polish Wiktionary lemmatizes reflexive-only verbs at their reflexive version? (IMO this is actually the correct thing to do, and it's how Spanish, Portuguese and I think Bulgarian currently work. If the verb is reflexive-only, it's lemmatized at the reflexive version, otherwise at the non-reflexive version with a 'reflexive' tag.) In that case we should keep redirects only for the reflexive-only terms and make them soft redirects using {{reflexive of}}. Benwing2 (talk) 03:22, 14 December 2022 (UTC)[reply]

What I'm saying is en.wiktionary will take a reflexive verb like bać się and set the page name to bać. Nonlemmas such as "bałaś się" should be set to "bałaś", but the lemma should at least have a redirect set because on pl.wikt they have bać się. If there is a verb that is both reflexive and non-reflexive we set up no such redirect. Vininn126 (talk) 10:21, 14 December 2022 (UTC)[reply]
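
The page-naming convention described in the comment above (bać się → bać, bałaś się → bałaś) can be sketched in Python as follows. This is only an illustration: `lemma_page_name` is a hypothetical helper, not an existing Wiktionary module, and it assumes the particle is written last, as dictionaries print it, rather than handling its mobile position in running text.

```python
REFLEXIVE_PARTICLE = "się"


def lemma_page_name(term: str) -> str:
    """Return the page name for a Polish verb form, dropping a trailing 'się'.

    Assumption: the reflexive particle, if present, is the last word.
    A bare 'się' on its own is returned unchanged.
    """
    words = term.split()
    if len(words) > 1 and words[-1] == REFLEXIVE_PARTICLE:
        words = words[:-1]
    return " ".join(words)
```

Under this sketch a non-lemma form such as "podobają się" would link to "podobają", and a link written as "podobać się" would target the page "podobać".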

@Vininn126 Right, that is fine with me although we should use soft rather than hard redirects; hard redirects are generally dispreferred in Wiktionary. Again I'd prefer to lemmatize reflexive-only verbs with the reflexive particle in the pagename but the consensus of the Polish editors should take precedence. Benwing2 (talk) 01:35, 15 December 2022 (UTC)[reply]

@Hythonia @BigDom @KamiruPL Any thoughts as to whether reflexive only verbs should be lemmatized with się in the pagename? Where was the previous discussion on this? Vininn126 (talk) 13:47, 15 December 2022 (UTC)[reply]

I'd support that, yeah. I'm a fan of Polish Wiktionary's solution. I don't recall taking part in a discussion about it here, we definitely talked about it on the Wiktionary discord and the idea of lemmatizing reflexive-only verbs with the particle się was emphatically put in the "we should talk about this someday" box. Which I'm glad someone's finally opened. Hythonia (talk) 17:01, 15 December 2022 (UTC)[reply]

I'd be fine with this, I suppose. I'd also like to make it more clear on pages with another reflexive meaning, because as it stands now it's just a label which someone can very easily miss. What if on pages where there are transitive and reflexive meanings we have a second headword with a head printed |head=verb się? or should it be in the label, like I've seen before? Vininn126 (talk) 17:27, 15 December 2022 (UTC)[reply]

@Vininn126 That's an interesting solution. My only potential concern would be if it is common for a given reflexive meaning of a verb to also exist in non-reflexive form (e.g. in Portuguese, for many verbs the reflexive particle is optional). If so, this would potentially lead to duplication of information between the reflexive and non-reflexive headers. Otherwise it seems like a good idea. Benwing2 (talk) 07:18, 16 December 2022 (UTC)[reply]

Also is the reflexive particle uniformly a separate word 'się' placed after the verb in all inflections of the verb? If not, and it varies depending on the specific inflected form, it would IMO make sense to have a separate reflexive conjugation table (we do this in Portuguese, for example, where depending on the particular inflection the reflexive particle is 'me', 'te', 'se', 'nos' or 'vos' and variously goes before, after or in the middle of the verb; see lembrar for an example). Benwing2 (talk) 07:21, 16 December 2022 (UTC)[reply]

@Benwing2 The reflexive particle is basically always required - the only time it isn't is when there's another reflexive verb in the clause and the particle is already there. The particle is mobile but dictionaries already always print it after the verb in their headwords. Vininn126 (talk) 10:22, 16 December 2022 (UTC)[reply]

I think it might be a better idea to have a parameter in the headword for reflexive that does what I am suggesting? Vininn126 (talk) 10:23, 16 December 2022 (UTC)[reply]

@Benwing2: My only potential concern would be if it is common for a given reflexive meaning of a verb to also exist in non-reflexive form. For clarity, this happens for a couple of words in Polish (see przyzostać), but it's not at all a common occurrence. As Vininn said, the particle is mobile, but generally when considering a form separately, outside of the context of a sentence it's always placed after the verb. Hythonia (talk) 11:45, 17 December 2022 (UTC)[reply]

In that case would two labels be better, like here, or to modify the head? Vininn126 (talk) 12:16, 17 December 2022 (UTC)[reply]

For non-lemma pages, then, (until any decisions are made around possible changes to lemma pages) to keep it simple would it be appropriate to just add się in brackets (or something similar) where there is at least one such reflexive form at the lemma page? - See podoba#Polish. That way it doesn't matter if the verb is reflexive-only or there is a mix of reflexive and non-reflexive forms at the lemma page - it still shows the user at a glance that there is at least one such reflexive use. DaveyLiverpool (talk) 21:20, 18 December 2022 (UTC)[reply]

Community Wishlist Survey 2023 opens in January

Please help translate to your language

(There is a translatable version of this message on MetaWiki)

Hello

The Community Wishlist Survey (CWS) 2023, which lets contributors propose and vote for tools and improvements, starts next month on Monday, 23 January 2023, at 18:00 UTC and will continue annually.

We are inviting you to share your ideas for technical improvements to our tools and platforms. Long experience in editing or technical skills is not required. If you have ever used our software and thought of an idea to improve it, this is the place to come share those ideas!

The dates for the phases of the Survey will be as follows:

Phase 1: Submit, discuss, and revise proposals – Monday, Jan 23, 2023 to Sunday, Feb 6, 2023
Phase 2: WMF/Community Tech reviews and organizes proposals – Monday, Jan 30, 2023 to Friday, Feb 10, 2023
Phase 3: Vote on proposals – Friday, Feb 10, 2023 to Friday, Feb 24, 2023
Phase 4: Results posted – Tuesday, Feb 28, 2023

If you want to start writing out your ideas ahead of the Survey, you can start thinking about your proposals and draft them in the CWS sandbox.

We are grateful to all who participated last year. See you in January 2023!

Thank you! Community Tech, STei (WMF) 16:44, 15 December 2022 (UTC)[reply]

Oghuz language

We should add Oghuz for the Oghuz language of Kashgarî's Divanü Lügatit Türk. (See Middle Turkic languages). We have qwm "Kipchak", but we don't have Oghuz. DLT really has huge Oghuz data that should be improved in Wiktionary. trk-ogz or ogz are the possible codes for it. BurakD53 (talk) 17:28, 15 December 2022 (UTC)[reply]

How is it different from Category:Proto-Oghuz? Vahag (talk) 17:37, 15 December 2022 (UTC)[reply]

I believe Proto-Oghuz must be a reconstructed language. BurakD53 (talk) 18:39, 15 December 2022 (UTC)[reply]

But what is the relation between the family of all Oghuz languages, their common ancestor Proto-Oghuz, and this language you say we should add? Does it have an ISO language code? --Lambiam 06:11, 16 December 2022 (UTC)[reply]

If Proto-Oghuz is okay, I will enter Proto-Oghuz lemmas. Both Proto-Oghuz and Oghuz don't have an ISO language code. BurakD53 (talk) 06:32, 16 December 2022 (UTC)[reply]

Possibly this user is using "Oghuz language" to refer to Old Anatolian Turkish. If so, this already has a code in Wiktionary. Benwing2 (talk) 07:14, 16 December 2022 (UTC)[reply]

OK. I'll count it in OAT. Thanks. BurakD53 (talk) 08:44, 16 December 2022 (UTC)[reply]

Note that the language code is trk-oat, and that we use the Ottoman Turkish variety of the Perso-Arabic script, so if we had an entry for the proper noun Oghuz, it should be found at Old Anatolian Turkish اغوز. --Lambiam 12:10, 17 December 2022 (UTC)[reply]

@Lambiam According to Wikipedia, short vowel diacritics were used, so it might actually have been Old Anatolian Turkish اُغُوز or similar; you'd have to go by what the actual sources said. Benwing2 (talk) 19:41, 17 December 2022 (UTC)[reply]

Or perhaps ٱغُز‎; see Oghuz Turks. But indeed, we must go by the sources. --Lambiam 21:45, 17 December 2022 (UTC)[reply]

@Benwing2, Lambiam: Wikipedia’s presentation is misleading there. The Ottoman literary language was radically developing across that timespan to which we ascribe Old Anatolian Turkish — while at the beginning it had this diacritic and inplene vowel writing, although this may be reflective of the available texts regularly consisting of poetry, at the end, and this means the whole of the 15th century (when Azerbaijani was not a distinct language and when the Ottomans became prominent on the scene of the world), it went out the Ottoman spelling. (At the same time some orthography habits in Persian and Arabic that we know had developed, such as even the death of rasm; hand-writing during various centuries in the Middle Ages exposes lots of difference (that in the case of Persian and Arabic text editions, undiplomatic as they be, we like to even out in favour of most modern distinctions, while for (Old) Ottoman we ask the same weird questions as for editions of Middle Dutch).) Fay Freak (talk) 08:40, 20 December 2022 (UTC)[reply]

Bragging

According to Category:English lemmas, we're closing in on 700,000 English words. Obviously we need to boast about this. Time to get our Twitter, Facebook, Instagram and Tinder pages updated. And perhaps the world's major publications would like to run front-page stories about this too... Are we actually the biggest dictionary in the universe? Flackofnubs (talk) 16:01, 16 December 2022 (UTC)[reply]

Don't tell the press about us, we'll get cancelled. Equinox ◑ 16:13, 16 December 2022 (UTC)[reply]

Quoting a passage with multiple lines

What's the best way to quote a multi-line passage like the one found on 100s & 1000s using {{quote-book}}? JeffDoozan (talk) 02:00, 17 December 2022 (UTC)[reply]

I think Wiktionary:Quotations#Line breaks explains it pretty well, use a <br> tag if you want to preserve the line break or a / or ¶ character as appropriate if you don't. The paragraphs in the quotation in the entry were breaking the layout so I've changed it to use <br>. —Al-Muqanna المقنع (talk) 02:11, 17 December 2022 (UTC)[reply]

I greatly prefer using a character (usually "/" or, actually, / ) to indicate the line breaks without having the passage risk pushing other content off the screen too much of the time. DCDuring (talk) 02:21, 17 December 2022 (UTC)[reply]

I think that's fair generally, I figured in the case of a recipe like at this entry it would seem a bit odd though I wouldn't protest either way. —Al-Muqanna المقنع (talk) 02:23, 17 December 2022 (UTC)[reply]

I personally prefer newlines, although since seeing a discussion where someone complained about them I've switched to using slashes (in most cases). One problem is that it's not ideal in situations where the text itself contains a slash, e.g. see the 2020 citation on Citations:tiki torcher I added earlier today. 98.170.164.88 02:25, 17 December 2022 (UTC)[reply]

</p><p> also works (see, e.g., the quotation on Appendix:Gestures/breasts). Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 03:18, 17 December 2022 (UTC)[reply]

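The two options discussed in this thread (keeping line breaks with <br>, or marking them with a slash) could be sketched in Python as below. The helper names are hypothetical and this is not part of {{quote-book}}; the second function only flags the problem raised above, where the quoted text itself contains a slash.

```python
def join_quotation_lines(passage: str, keep_breaks: bool = False) -> str:
    """Collapse a multi-line passage into a single line.

    With keep_breaks=True the lines are joined with <br> so the break is
    preserved on display; otherwise they are joined with " / ".
    Blank lines and surrounding whitespace are dropped.
    """
    lines = [line.strip() for line in passage.splitlines() if line.strip()]
    separator = "<br>" if keep_breaks else " / "
    return separator.join(lines)


def contains_literal_slash(passage: str) -> bool:
    """Return True if the text already contains a slash, in which case
    " / " as a line-break marker would be ambiguous."""
    return "/" in passage
```

A caller might check `contains_literal_slash` first and fall back to `keep_breaks=True` when it returns True.
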
Wikimedia Sound Logo Voting: Final days!

Hello (:

The Wikimedia Sound Logo contest presented the 10 finalists, out of 3,000 submissions from 135 countries.
Play a part and help us decide what the Sum of All Human Knowledge sounds like!

The voting is open until 19 December 2022, 23:59 UTC.
Check the info on how to vote on Wikimedia Commons; or about the contest on the project's page on Meta-Wiki.

Best,
CalliandraDysantha-WMF (talk) 02:04, 18 December 2022 (UTC)[reply]

Macro-English languages, Macro-French languages, etc.

@Theknightwho I'm sorry but this absolutely needs to be discussed before implementing. You can't just create a bunch of new, questionable macro-families in Module:families/data with no prior discussion. Please revert pending consensus. Benwing2 (talk) 05:23, 19 December 2022 (UTC)[reply]

@Benwing2 I assumed these would be uncontroversial, as they're almost exactly what we already had: collections of languages with English, French etc as a common ancestor. The reason for doing it this way is because there was no way to create subfamilies within the descendants without having a main family that encompassed all of them. Theknightwho (talk) 05:29, 19 December 2022 (UTC)[reply]

Maybe the naming was inspired by Glottolog, which has Macro-English and Macro-French as families. I can't say I'm familiar with these terms, though, and I couldn't find any scholarly works that use them (except in a critique of Glottolog's hierarchization). 70.172.194.25 05:30, 19 December 2022 (UTC)[reply]

@Theknightwho I think the IP hits the nail on the head; these sorts of families aren't accepted practice and this is a big change in the way that families are structured (which is definitely a potentially controversial issue, and our current scheme has evolved gradually through lots of discussions). I also don't quite understand what "there was no way to create subfamilies within the descendants without having a main family that encompassed all of them" means; is this a technical limitation? Benwing2 (talk) 05:38, 19 December 2022 (UTC)[reply]

I should also add, this introduces hundreds or thousands of new categories "Foo terms derived from Gallo-Romance languages", "Foo terms derived from Macro-Portuguese languages", etc. Benwing2 (talk) 05:40, 19 December 2022 (UTC)[reply]

@Benwing2 Yes - it does seem to be a technical limitation, and yes, the IP is correct as to where the naming scheme came from. There are three: Macro-English, Macro-French and Macro-Portuguese.

The main issue is that it isn't possible to set ancestors of families - you can only give them proto-languages. If a proto-language is attached to multiple families, it doesn't seem to work properly, as it groups every descendant language under one of those families (including those we don't want to be in any subfamily) while leaving the other subfamilies empty. Theknightwho (talk) 05:41, 19 December 2022 (UTC)[reply]

@Theknightwho The way to deal with a technical limitation is never to work around it like this, but to solve it properly; first make a proposal and circulate it (you agreed to do this with future module changes, I think) and then implement it. Benwing2 (talk) 06:10, 19 December 2022 (UTC)[reply]

Please don't make any more changes to Module:families/data while this discussion is happening. I have already asked you to revert. Benwing2 (talk) 06:30, 19 December 2022 (UTC)[reply]

@Theknightwho ^^^^ Benwing2 (talk) 06:31, 19 December 2022 (UTC)[reply]

@Benwing2 I needed to stabilise the changes I was in the middle of, because otherwise modules become incompatible with each other. The Indo-Aryan languages that I was just sorting out had a straight-up broken structure, so it needed doing.

On the macro-languages, I am fine with getting rid of them at the earliest opportunity, but undoing that will break the subfamily structure that I've created in the creole descendants. Would it not be preferable to find a technical solution first, and to then implement that, rather than just getting rid of everything? Theknightwho (talk) 06:40, 19 December 2022 (UTC)[reply]

@Theknightwho Once again it feels to me like you are trying to claim your changes made without consensus are a fait accompli and that it would be too much trouble to back the changes out. After the last time this happened, you agreed to shop around any significant module changes before making them, but then went ahead and did this without any prior discussion. So no I don't think it's preferable to keep these changes in place while we figure out what the right thing to do is, because (a) I don't agree with all these newly-added intermediate families, and I suspect many others don't like them either, because many of them don't reflect any sort of linguistic consensus; (b) the right solution is not likely to involve such intermediate families in any case; (c) the creole descendants can be made descendants of the acrolect (I think that's how they have been done up till now in any case), or you can just wait for the technical solution to be worked out. In sum I really think you should back out your changes *before* we work out the proper solution. In this case I'll wait for you to do that but if a situation like this recurs, it's likely I will back them out myself, no questions asked. Benwing2 (talk) 09:22, 19 December 2022 (UTC)[reply]

@Benwing2 You're raising separate issues here:

If the problem is just the macro-language families mentioned in the title, then I simply disagree that they add anything meaningful to the structure. None of us want to keep them, in any event. Of course the creoles can be made descendants of the acrolect, but that also entails removing all of the sub-structure.
If the issue is the fact that there are additional language families at all, then that's a separate issue entirely, and obviously it's fine for us to have a discussion about them. However, I just don't agree with you that it's appropriate to list 30 creole languages under English without any further organisational structure.
What we agreed in relation to modules related to potential technical issues, not linguistic content.

Theknightwho (talk) 09:46, 19 December 2022 (UTC)[reply]

@Benwing2 I've reverted these, but we're already seeing problems at pages such as जस, caused by the fact that Ardhamagadhi Prakrit is both a full language (pka) and an etymology-only language (inc-pka), but the full language is (wrongly) not set as ancestral. In my changes, I made sure it was, and corrected about 100 of these pages because ~~we obviously want to deprecate the etym-only code~~. Apparently we want to deprecate the language codes, so I'll switch them. In any event, the current situation is a problem. There are seven of these duplicate codes (that I know of):

Ardhamagadhi Prakrit: pka / inc-pka
Helu: elu-prk / inc-elu
Khasa Prakrit: inc-kha / inc-khs
Magadhi Prakrit: inc-mgd / inc-pmg
Maharastri Prakrit: pmh / inc-pmh
Paisaci Prakrit: inc-psc / inc-psi
Sauraseni Prakrit: psu / inc-pse

Theknightwho (talk) 10:30, 19 December 2022 (UTC)[reply]
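As a rough illustration of how the duplicate codes listed above could be reconciled, here is a minimal Python sketch. It is not how Module:languages or Module:families/data work; the names `DUPLICATE_CODES`, `build_alias_map` and `canonical_code` are hypothetical, and which code of each pair should survive is left as a parameter because the thread has not settled it.

```python
# Duplicate code pairs exactly as listed above: (first code, second code).
DUPLICATE_CODES = {
    "Ardhamagadhi Prakrit": ("pka", "inc-pka"),
    "Helu": ("elu-prk", "inc-elu"),
    "Khasa Prakrit": ("inc-kha", "inc-khs"),
    "Magadhi Prakrit": ("inc-mgd", "inc-pmg"),
    "Maharastri Prakrit": ("pmh", "inc-pmh"),
    "Paisaci Prakrit": ("inc-psc", "inc-psi"),
    "Sauraseni Prakrit": ("psu", "inc-pse"),
}


def build_alias_map(keep_second: bool = True) -> dict[str, str]:
    """Map each code to be retired onto the code that is kept.

    keep_second is an assumption of this sketch: it only picks a side of
    each pair and says nothing about what the community will decide.
    """
    aliases = {}
    for first, second in DUPLICATE_CODES.values():
        kept, retired = (second, first) if keep_second else (first, second)
        aliases[retired] = kept
    return aliases


def canonical_code(code: str, aliases: dict[str, str]) -> str:
    """Return the surviving code for a possibly retired one."""
    return aliases.get(code, code)
```

With such a table, a cleanup bot could rewrite retired codes in entries in one pass and leave every other code untouched.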

@Theknightwho Thank you very much for reverting. Apologies for not responding yesterday, some RL issues came up. I'm not at all opposed to solving the Prakrit issues, it's just that they were mixed in with a bunch of other changes. For those issues in particular, the Indo-Aryan language community made a decision to switch from having separate language codes for the various Prakrit varieties to having a single language "Prakrit" with etymology-only variants. I had no part in this and I don't know enough about the linguistic situation with the various Prakrits to judge whether this was the right decision, but I know the change was made piecemeal once it was decided on, and I'm not surprised this left a mess in some places. From my perspective please feel free to fix it, but do ping the Indo-Aryan editors e.g. (Notifying AryamanA, Kutchkutch, Bhagadatta, Inqilābī, Msasag, Svartava, RichardW57): . Let me read the rest of what you wrote and I will respond to it. Benwing2 (talk) 01:59, 21 December 2022 (UTC)[reply]

Arabic transliterations: let's use ʔ and ʕ instead of ʾ and ʿ.

(Notifying Atitarev, Benwing2, Mahmudmasri, Metaknowledge, Wikitiki89, Erutuon, ZxxZxxZ, عربي-٣١, Fay Freak, AdrianAbdulBaha, Assem Khidhr, Fenakhay, Fixmaster, M. I. Wright, Roger.M.Williams, Zhnka): — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 23:35, 19 December 2022 (UTC)[reply]

It looks like this suggestion may pass. I will, of course, respect the decision but it still seems wrong. No transliteration is perfect and liked by everybody and we may kiss goodbye to being found by standard searches when someone tries to find عِلْم (ʕilm) while using "ʿilm" or عَرَبِيّ (ʕarabiyy) by using ʿarabiyy. Anatoli T. ^{(обсудить}/^вклад) 22:47, 20 December 2022 (UTC)[reply]

@Atitarev: I would still argue that searches by transliterations are not the main reason for having transliterations on Wiktionary anyway, just a side effect. If that was the case, we should at least give "chat Arabic" too, and possibly other common transliteration systems. If we wanted to have searchable Arabic romanization, we should do what we do for Chinese, Japanese and any other language here that has Romanization entries. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 00:17, 21 December 2022 (UTC)[reply]

There are various diacritics and symbols that may be disliked and seem difficult to see. I don't quite believe that ʾ and ʿ are that hard to see or distinguish. If I, for example, have trouble seeing any symbols, I use glasses or make the display larger. Besides, options to make symbols larger or clearer have been suggested in this thread. I haven't seen any prior discussions to discard some symbols based on their visibility.

If we abandon these two symbols, there is no point to sticking to the current standard after that. What's the point of keeping those ṣ, ṯ, ḵ, ḥ, ā, ḡ, etc. if the resulting strings are no longer standard, anyway?

Multiple transliteration systems are added to the languages you mentioned in pronunciation sections. Arabic has a very complex inflection and transliterations appear in the inflection tables, many Arabic entries lack pronunciation sections, so adding a single place with alt. transliterations won't make a lot of difference, IMO. Anatoli T. ^{(обсудить}/^вклад) 00:43, 21 December 2022 (UTC)[reply]

Let's just do it. Whoever thought it was a good idea to use ʾ and ʿ to transliterate ء and ع definitely had very good eyes, but for the rest of us normal people, those two signs are just hell to tell apart. Plus, they give the impression those are not "real" consonants, but just little more than an apostrophe. Let's give consonantal dignity back to 2alif and 3ayn! Let's give a break to our users and their poor eyes. Let's vote for what makes sense! Let's vote for ʔ and ʕ in Arabic transliterations! — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 21:27, 19 December 2022 (UTC)[reply]

Support. Andrew Sheedy (talk) 23:04, 19 December 2022 (UTC)[reply]

I'm striking my support, because I think Anatoli's point about people potentially searching using standard transliteration is a good one. I won't oppose, however, because I think there are still good reasons for making this change. Andrew Sheedy (talk) 23:25, 20 December 2022 (UTC)[reply]

Oppose. Why not use numbers, 2's and 3's like in your post and switch to chat Arabic, LOL? The current system was not made by people who wanted to make life harder; it's based on the Hans Wehr dictionary. Romanization of Arabic shows other systems; none of them, apart from IPA, uses ʔ and ʕ.

BTW, no pings were sent, since pings and signature were in different edits. I read this topic accidentally. --Anatoli T. ^{(обсудить}/^вклад) 23:17, 19 December 2022 (UTC)[reply]

@Atitarev: I wouldn't use numbers because they are not linguistically "neutral", they're heavily marked as "popular" and are basically just an expedient one has to resort to when typing in Arabic is not possible. ʔ and ʕ are a totally different question. They are a legit substitute for ʾ and ʿ, and as a matter of fact we use them already on Wiktionary for all other Arabic dialects. Incidentally, they are pretty much the same symbols as ʾ and ʿ, they're just bigger and look more like a proper letter (I suppose the vertical bit is there to avoid confusion with /ɔ/ and /c/?). I can't imagine anyone who studies Arabic being confused or disoriented by them. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 23:51, 19 December 2022 (UTC)[reply]

Comment I agree with Sartma that the current accessibility is terrible. I personally find these symbols extremely hard to distinguish. But like Atitarev, I'd prefer to stick to something standard, in part because it will make things easier for people who want to compare our content with other sources, and in part because Google Search doesn't treat them as equivalent, which reduces the chance that people will find us if they want to look up a transliteration in quotation marks. The ideal solution would be to make the half-rings larger. This is technically feasible: all it would require is making a font consisting of two characters, magnified versions of the ayin and alif symbols, and setting that as the font for the CSS selector .ar-Latn or whatever.

Also worth noting that these characters are used in transliterations of various other Semitic languages too, not just Arabic and its dialects. 70.172.194.25 23:27, 19 December 2022 (UTC)[reply]

I do understand that "tradition" is difficult to abandon, but this really is a case of "bad" tradition. We shouldn't be too afraid of improving things. I'm sure more people will thank us than criticise us for that. As for googling transliterations with ʾ or ʿ, I think it's such a remote eventuality, we can quite easily dismiss it. If anyone can type ʾ or ʿ from their keyboard, they will surely be able to type the word in the Arabic alphabet. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 00:35, 20 December 2022 (UTC)[reply]

I have personally many times encountered transliterated Arabic-language text online (e.g., Google Books OCR, which picks up the half-rings, but lots of other sources too), and searched it, and found the correct Arabic script representation. I don't usually enter those two characters myself, though. So this isn't a remote possibility, but to be fair maybe my workflow isn't typical. 70.172.194.25 00:45, 20 December 2022 (UTC)[reply]

Also compare searching on regular Google for "Muqannaʕ" (3 results), vs. "Muqannaʿ" (5,340 results). And that's for a term that has been covered extensively, for rarer terms you might not find anything by searching the IPA character. 70.172.194.25 00:57, 20 December 2022 (UTC)[reply]

I support your solution on this one - amending the CSS to make these two characters larger is preferable. Theknightwho (talk) 01:04, 20 December 2022 (UTC)[reply]

And Muqanna has 177,000 results... Following your line of reasoning, one would think that we should just get rid of the final ʕayn. The thing is that we don't give transliterations on Wiktionary so that people can find an Arabic word. That is nothing but a side effect. If this really was a feature so critically important that we can't ignore it when deciding how to transliterate Arabic, then we should give all possible transliterations, including, and much more pertinently, things like 3alam, with its 1,390,000 results (compare ʿalam, with just 13,600 results)... — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 01:46, 20 December 2022 (UTC)[reply]

It's still the case that larger half-rings would solve the accessibility problem while being more standard than the IPA characters. Could you please humor me and install this font, and then put in your CSS [lang="ar-Latn"] { font-family: LegibleHalfRings, sans-serif; }? I see this as a compromise that allows us to stay in line with almost every other scholarly source on Semitic languages, while also being accessible to readers. @Theknightwho may also want to test this. The font can be tweaked, of course. 70.172.194.25 03:04, 20 December 2022 (UTC)[reply]

The problem with depending on a particular font is that it's not immediate, a user will have to download it, and that's not something we can expect. ʔ and ʕ don't have that issue. They are already visible to anyone whatever font is used. I wouldn't make this a question of font, but one of letters vs diacritics. ʔ and ʕ are proper letters, ʾ and ʿ are just diacritics. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 09:28, 20 December 2022 (UTC)[reply]

Support. The current transliteration is as bad as using 2 and 3, by not treating them as real CONSONANTS. In Semitic linguistics, both ʔ and ʕ are used. — Fenakhay ^{(حيطي · مساهماتي)} 00:05, 20 December 2022 (UTC)[reply]

(Notifying Atitarev, Benwing2, Mahmudmasri, Metaknowledge, Wikitiki89, Erutuon, ZxxZxxZ, عربي-٣١, Fay Freak, AdrianAbdulBaha, Assem Khidhr, Fixmaster, M. I. Wright, Roger.M.Williams, Zhnka, Sartma): : Repinging as the first one didn't go through. — Fenakhay ^{(حيطي · مساهماتي)} 00:57, 20 December 2022 (UTC)[reply]

Support: It's easier to discern, and most of us, I suppose, would be familiar with the IPA treatment. Fixmaster (talk) 01:15, 20 December 2022 (UTC)[reply]

Since user:Sartma was the first to open the topic, I'll just address him primarily:

* The ring characters are used because they are used in the standard schemes we adhered to.

* They or similar symbols were obviously chosen by some schemes, because romanized Arabic is supposed to be read by speakers who mostly lack both consonants and very likely approximate them to a glottal stop if they ever utter them.

* The scheme on Wiktionary is already a mix, so I see no problem with adding the IPA-based symbols ⟨ʔ ʕ⟩ to it. They can be typed on smartphones, which lack the ring characters, and the Minerva editor lacks any insertion options that I know of anyway.

--Mahmudmasri (talk) 01:42, 20 December 2022 (UTC)[reply]

OP's argument sums it up well, so beyond status quo bias, I support. Fay Freak (talk) 02:02, 20 December 2022 (UTC)[reply]

Comment Don't the Aramaic, Hebrew, and Syriac entries also use the same two symbols (ʾ and ʿ)? I have looked at a few entries in these languages to verify this: they do use ʾ and ʿ, but at ܥܠܡܐ (“world”), the word is transcribed as "ˁālmā", which is curious.

I myself do not mind how these two sounds are transcribed (though, esthetically, I do like the ˁ from that Syriac entry). Nonetheless, I think that changing the transcription symbols for Arabic only would be strange; it would be like changing the representation of the dental consonants only for Arabic. Roger.M.Williams (talk) 04:28, 20 December 2022 (UTC)[reply]

@Roger.M.Williams: The topic of consistency of transliteration between Semitic languages has been discussed in the past, and people always said that "there is no need for all languages to have the same rules". I would agree with you that it would be nice to have inter-linguistic consistency, but apparently only intra-linguistic consistency is a thing on Wiktionary. That's why it's ok for all Arabic dialects to use ʔ and ʕ, while Standard Arabic still has ʾ and ʿ. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 08:58, 20 December 2022 (UTC)[reply]

@Roger.M.Williams: I still prefer the full ʕ to the superscript ˁ. I have issue with this constant effort to reduce full consonants to a diacritic or to something "different" from other consonants. ʔ and ʕ are proper consonants, why should they be written as superscripts (ˀ, ˤ) or as diacritics (ʾ, ʿ)? Even from a linguistic point of view it makes no sense. The problem is that we're so used to see these two consonants abused and belittled that we think this must be the way. But a different world is indeed possible. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 09:04, 20 December 2022 (UTC)[reply]

Comment/support Template:ayn and Template:hamza would perhaps be more visible, but they suffer the problem of looking like punctuation marks. I agree though that if we use the IPA letters for Arabic, we should probably also use them for Hebrew and Aramaic. It would certainly be easier to read the languages that way, apart from dyslexia (which I imagine the current convention suffers from too). kwami (talk) 05:47, 20 December 2022 (UTC)[reply]

In order to avoid problems with dyslexia, perhaps the two shouldn't be symmetric. Maybe for glottal stop we could use the lower-case form <ɂ>:

ʔ vs ʕ
ɂ vs ʕ

kwami (talk) 06:02, 20 December 2022 (UTC)[reply]

I think this may start another discussion (or wrangle) about the esthetic qualities of ˀ, ɂ, and ʔ.

In any case, I prefer the first two. Roger.M.Williams (talk) 06:44, 20 December 2022 (UTC)[reply]

@Kwamikagami: I think that can be discussed once the decision to go for something other than ʾ and ʿ has been made. One thing is sure: people with dyslexia would definitely be better off with ʔ and ʕ, if anything because they are bigger and clearer. The symmetry issue wouldn't be any bigger than with existing letters like b/d, p/q, u/n, etc. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 10:04, 20 December 2022 (UTC)[reply]

Yeah, no biggie either way. I just wanted to bring it up. Full-size ʔ vs ʕ would probably be the most straight-forward to implement. kwami (talk) 18:33, 20 December 2022 (UTC)[reply]

ˁ in Syriac/Aramaic entries is most probably lazily copied over from CAL who themselves may use it for its being conservatively close to the traditional rings while more distinctive. But I suspect its being a “modifier letter” makes it technically illegal for expressing a whole consonant.

ɂ looks nice, and also amenable to European thinking where the voiced pharyngeal fricative may be perceived as more an actual sound than the glottal stop, as well as reflecting the linguistic reality with the drop of this sound in many dialects unlike the voiced pharyngeal fricative.

We forgot that Ethiopic script transcription uses the rings, but luckily I have argued in a previous discussion about the same matter that it can have special status. Fay Freak (talk) 08:18, 20 December 2022 (UTC)[reply]

@Kwamikagami: I proposed unifying transliterations of ʔ and ʕ for all Semitic languages, but back then I was the only one thinking that would have been a good idea... See: Unifying the transliteration of ʾalef and ʿayin in Semitic languages. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 09:54, 20 December 2022 (UTC)[reply]

Support. No reason to prioritise adhering to what happens to be the most common scientific transcription over overall usability, and the point about being a full consonant is a good one. —Al-Muqannaʕ المقنع (talk) 09:43, 20 December 2022 (UTC)[reply]

Support. We would have to use the IPA letter ʔ, not lowercase ɂ (whose uppercase is Ɂ, different from the IPA letter), because there is no lowercase version of ʕ, unfortunately because I like the idea of using only lowercase letters in transliteration. Apparently no language uses uppercase and lowercase versions of ʕ yet so Unicode hasn't added them. — Eru·tuon 15:29, 20 December 2022 (UTC)[reply]
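The casing point above can be checked against the Unicode data shipped with Python. The snippet below is just a quick illustration, with the helper names `has_case_pair` and `case_report` invented for the example; it relies only on the standard `unicodedata` module and on `str.upper`/`str.lower`.

```python
import unicodedata

GLOTTAL_STOP_IPA = "\u0294"    # the caseless IPA letter
GLOTTAL_STOP_SMALL = "\u0242"  # lowercase glottal stop
GLOTTAL_STOP_CAPITAL = "\u0241"
PHARYNGEAL_IPA = "\u0295"      # the letter used for ayn


def has_case_pair(ch: str) -> bool:
    """True if the character changes under upper() or lower()."""
    return ch.upper() != ch or ch.lower() != ch


def case_report(chars: str) -> list[str]:
    """One descriptive line per character: code point, name, case pair or not."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)} cased={has_case_pair(ch)}"
        for ch in chars
    ]


for line in case_report(
    GLOTTAL_STOP_IPA + GLOTTAL_STOP_SMALL + GLOTTAL_STOP_CAPITAL + PHARYNGEAL_IPA
):
    print(line)
```

Only the U+0241/U+0242 pair takes part in case mapping, so mixing ɂ with ʕ would pair a cased letter with a caseless one.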

The Arabic transliteration is strictly lower-case, with no exceptions at Wiktionary, so that's not a problem. It's only Wikipedians who suffer from the notion that they need to capitalise foreign transliterations where it's inappropriate. Anatoli T. ^{(обсудить}/^вклад) 22:34, 20 December 2022 (UTC)[reply]

@Atitarev I am not opposed to this, primarily because the existing ʾ and ʿ are nearly impossible to tell apart at a normal distance (several times I've had to zoom in on the text in order to see what was going on). In addition, in the edit window the two are displayed reversed for some reason, which makes things incredibly difficult. However I do understand the concern about people searching for the transliteration using the ring symbols. I think there's a technical solution to this: insert the old transliteration into the output code inside of an HTML comment or non-displaying span. This should make the correct entry pop up when you search using the "traditional" transliteration.

If we decide to implement this, we should not "just do it", but think it through; lots of entries and other places use manual transliteration, and they will all have to be corrected by bot. Benwing2 (talk) 02:19, 21 December 2022 (UTC)[reply]
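To make the two ideas in this comment concrete (a hidden copy of the traditional transliteration, and a bot pass over manual transliterations), here is a minimal Python sketch. It is not the actual transliteration module or any existing bot: the function names and the `tr` class are hypothetical, `lang="ar-Latn"` is borrowed from the CSS selector mentioned earlier in the thread, and whether search engines index non-displayed text is an open question that the sketch does not settle.

```python
import html

# Half-rings: U+02BE (hamza) and U+02BF (ayn); IPA-style letters: U+0294 and U+0295.
TO_IPA_STYLE = str.maketrans({"\u02BE": "\u0294", "\u02BF": "\u0295"})
TO_TRADITIONAL = str.maketrans({"\u0294": "\u02BE", "\u0295": "\u02BF"})


def convert_manual_translit(text: str) -> str:
    """What a cleanup bot would do to a manual transliteration."""
    return text.translate(TO_IPA_STYLE)


def render_translit(new_translit: str) -> str:
    """Visible new-style transliteration plus a non-displaying traditional copy."""
    old = new_translit.translate(TO_TRADITIONAL)
    visible = f'<span class="tr" lang="ar-Latn">{html.escape(new_translit)}</span>'
    if old == new_translit:
        # Nothing to hide when the word contains neither consonant.
        return visible
    hidden = f'<span style="display:none">{html.escape(old)}</span>'
    return visible + hidden
```

Because the two mappings are exact inverses, the conversion is lossless in both directions, which is what would make a bot run over manual transliterations safe to reverse.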

Names for some Turkish verb forms

In a definition line for a Turkish verb form, what should be the name for

the participle ending in -en
the form ending in -dik
the verbal noun ending in -iş
the gerund ending in -erek
the gerund ending in -ip

?

I have been defining -iş forms as verbal noun of, the same as -me, which may be misleading.

I added placeholder names for -en, -erek, and -ip in Module:tr-verb form of. If you have a strong opinion, go ahead and edit that module.

Which of these forms should be defined as {{head|tr|verb form}} instead of as lemmas? Vox Sciurorum (talk) 18:14, 20 December 2022 (UTC)[reply]

Adding script codes for the Clear Script, Manchu and Xibe.

For context, the Clear Script (also known as Todo) was used for Written Oirat and (at least historically) its descendant Kalmyk, Manchu for Manchu, and Xibe for Xibe. All three are descended from the Mongolian script proper: the Clear Script was an overt attempt to remove the ambiguities present in Mongolian, while the latter two represent adaptions to account for the different phonology of the Tungusic languages. Certainly in the case of the Clear Script and Manchu, they also allow for the transcription of Tibetan and Sanskrit for religious purposes. However, they should not be confused with the Galik alphabet, which was an augmentation to Mongolian proper for the same purpose.

From an encoding perspective, the overlap between the four scripts is surprisingly small, despite the fact that they can often appear orthographically similar. This is due to the fact that each have several equivalent characters that appear orthographically identical in many forms, but which exhibit different behaviour under certain circumstances. Compare Mongolian and Manchu "i" (encoded separately), which have identical isolated (ᠢ (i) ᡳ (i)) and initial (ᠢ᠊ (i-) ᡳ᠊ (i-)) forms, and are usually identical when medial (᠊ᠢ᠊ (-i-) ᠊ᡳ᠊ (-i-)) and final (᠊ᠢ (-i) ᠊ᡳ (-i)), too. However, Manchu has additional variant medial (᠊ᡳ᠌᠊ (-i-)) and final (᠊ᡳ᠋ (-i)) forms. What's more, even when two of the scripts use the same encoded character, they may be stylistically different: compare medial "n" between Mongolian (᠊ᠨ᠊ (-n-)) and Clear Script (᠊ᠨ᠋᠊), where the latter includes a dot.
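The separate encoding of Mongolian and Manchu "i" can be inspected with Python's standard `unicodedata` module. This is only a small illustration; the helper name `describe` is made up for the example, and the sample sequence with a variation selector is a constructed one rather than a claim about any particular entry.

```python
import unicodedata

MONGOLIAN_I = "\u1822"  # Mongolian "i"
MANCHU_I = "\u1873"     # Manchu "i", a distinct code point


def describe(text: str) -> list[str]:
    """List every code point in a string with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# A constructed medial form: connector, Manchu i, a variation selector, connector.
SAMPLE_MEDIAL = "\u180A" + MANCHU_I + "\u180C" + "\u180A"

for line in describe(MONGOLIAN_I + MANCHU_I) + describe(SAMPLE_MEDIAL):
    print(line)
```

Listing the code points this way makes the invisible variation selectors visible, which helps when two strings render identically but compare as unequal.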

Usually, these differences are achieved by the use of one of the Mongolian variation selector characters. Unfortunately, there is no agreed-upon standard when it comes to these, and so different fonts might exhibit opposite behaviour depending on whether or not a variation selector has been included. As such, it's preferable for us to simply exclude their use from entry names entirely, and to implement alternate display forms based on which script is in use. This is only possible if we have different script codes for each script. It also wouldn't be appropriate to do this based on the language code, due to the fact that the Sanskrit language has historically been written in Mongolian, Clear Script and Manchu, and therefore all three could (at least theoretically) be added to the list of scripts displayed by {{sa-alt}}. In future, we may want to create a similar template for Tibetan, too. Not to mention the fact that this also introduces the possibility of a user using different fonts for each script.
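The approach described above (entry names without variation selectors, display forms chosen per script) might look roughly like this in Python. It is a sketch under stated assumptions, not existing module code: `entry_name`, `display_form` and `DISPLAY_OVERRIDES` are hypothetical names, `Manc` is one of the script codes proposed in this thread, and the single override entry is invented purely to show the mechanism.

```python
# Mongolian free variation selectors: U+180B to U+180D, plus U+180F.
VARIATION_SELECTORS = {0x180B: None, 0x180C: None, 0x180D: None, 0x180F: None}


def entry_name(text: str) -> str:
    """Strip variation selectors so page titles do not depend on them."""
    return text.translate(VARIATION_SELECTORS)


# Hypothetical per-script table: (script code, entry name) -> display form.
DISPLAY_OVERRIDES = {
    ("Manc", "\u1873"): "\u1873\u180B",  # invented example entry
}


def display_form(script_code: str, name: str) -> str:
    """Pick a display form by script code, defaulting to the bare entry name."""
    return DISPLAY_OVERRIDES.get((script_code, name), name)
```

Keying the table on the script code rather than the language code is what would let one language, such as Sanskrit, be displayed correctly in several of these scripts.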

As such, I suggest we add the following script codes: Todo for the Clear Script, Manc for Manchu and Xibe for Xibe. Theknightwho (talk) 20:10, 20 December 2022 (UTC)[reply]
