Wiktionary:Beer parlour/2024/May: difference between revisions

::::::It would also not be unreasonable to call such expressions Translingual, once there is evidence of use embedded in running text in multiple (2, 5, 10?) languages. This violates few of our firm rules, I think, merely requiring that we allow pronunciations for multiple languages. A problem is the one of evidence of use in multiple languages. As a practical matter, it would be sufficient IMHO to start an entry for such an expression in whatever languages had sufficient evidence, typically including English. Once there were 2, 3, or more L2s, the L3s could be merged. Another demonstration of Translingual use might be single (ie, not triple) instances of use in a number of languages: eg, one use in Chinese, one in Japanese, one in Arabic, one in Spanish. [[User:DCDuring|DCDuring]] ([[User talk:DCDuring|talk]]) 14:48, 27 May 2024 (UTC)
:::::::What concerns me is that this will make attestation requirements more complicated, since by merging them into one “language” it’s only necessary to find 3 uses in any language at all, which is not particularly difficult. For example, {{m|en|pactum de non petendo}} is comparatively hard to attest in English, since it’s not terminology used in English law or any of its descendants, from what I can tell (the equivalent term used in such situations being {{m|en|estoppel}}, at least in England). Edit: I forgot about {{w|Scots Law}}, but the general point still stands. [[User:Theknightwho|Theknightwho]] ([[User talk:Theknightwho|talk]]) 15:02, 27 May 2024 (UTC)

== Stalking/harassment by [[User:Theknightwho]] ==

[[User:Theknightwho]] is an admin, but a controversial one. [[Wiktionary:Votes/sy-2023-02/Desysop Theknightwho|Last year, he faced a de-sysop vote]]. While he was allowed to retain his admin privileges, people on both sides expressed concern about his combative behavior, with comments such as, "''he seems to not know when to stop and distance himself from a fight.''", ''"Maybe Knight is a hothead, maybe Knight broke the rules''" and ''"I have my issues with User:Theknightwho...and I agree with the statement that he's a hothead and argues way too much''"

I'm afraid he's up to his questionable behavior again. I believe he is targeting me, stalking me, harassing me. And on top of that, some of the edits he made in targeting and harassing me...

# On Saturday, he modified or undid my edits on three different pages in a fairly short amount of time. Per the "quacks like a duck" test, it is unlikely those
#*I told him that those edits were inappropriate. He refused to see any problem
# Later that day (it was either Saturday night or Sunday morning), he undid an edit I made to [[hot dog]]
#*His edit was so bad it had to be modified only a few minutes later.
# And today, he deleted seven redirects I created over the years. Some of them are acceptable redirects...for example, he deleted [[busted my neck]], which is a conjugation of [[bust one's neck]]

What is to be done about Knight's stalking/harassment of me? Can somebody tell him to cut it out? <b style="font-family:Verdana">[[User:Purplebackpack89#top|<b style="color:#3A003A">Pur</b><b style="color:#800080">ple</b>]][[User talk:Purplebackpack89|<b style="color:#991C99">back</b><b style="color:#C3C">pack</b><b style="color:#FB0">89</b>]]</b> 15:58, 27 May 2024 (UTC)

Revision as of 15:58, 27 May 2024


Arabic and Hebrew transliteration

Wiktionary currently transliterates the glottal stop in both Arabic and Hebrew as ʔ and the voiced pharyngeal fricative in both languages as ʕ. Would it be possible to correct these to respectively transliterate the glottal stop as ʾ and the voiced pharyngeal fricative as ʿ so they would be in line with Wiktionary's transliteration of other Semitic languages, which all use ʾ and ʿ?

Wiktionary also currently transliterates the Arabic voiceless velar fricative as ḵ. However, an alternate transliteration as ḫ is also used for this sound. Since ḫ is used for the transliteration of the voiceless velar fricative for most Semitic languages except for Hebrew and Aramaic, I would like to request that Wiktionary's transliteration of the Arabic voiceless velar fricative be changed from ḵ to ḫ as well. Antiquistik (talk) 13:08, 1 May 2024 (UTC)

@Antiquistik: No, we switched the other day. As for ḵ to ḫ, I don’t know, perhaps it’s better if you want to make an etymological statement that ḵ is fricativized k which we keep in begadkefat affected languages while organic. Fay Freak (talk) 18:08, 1 May 2024 (UTC)
@Fay Freak In this case, I will add my opposition to the discussion regarding that change.
Concerning ḵ to ḫ, should I make another request, or should I add it to this one itself? Antiquistik (talk) 18:55, 1 May 2024 (UTC)
@Antiquistik IMO the opposite change should happen and other Semitic languages should use ʔ and ʕ. The problem with the forward and backward quotes is that they're too small and too easily confused in many fonts. I also think ḵ is better than ḫ; ḫ is easily confused with the pharyngeal fricative. Benwing2 (talk) 23:49, 1 May 2024 (UTC)[reply]
Personally, I agree with Benwing, although I am sympathetic to the idea that we should use whatever is most widely used, and I am also sensitive to the issue of words being findable by people who search for them using other transliteration systems. I would like us to implement having the templates/modules produce (but then potentially set to be invisible / display:none by default) other common transliterations so the entries can be found if people use our site search or Google and search for ʾiʿlān etc, as discussed in the 2022 discussion, unless that would cause problems. Then we could probably also set different CSS classes for the different transliterations so people could select whether they see ʾiʿlān or ʔiʕlān, similar to the way people can choose to see or not see {{,}} (and we could debate which one would be most helpful to have on by default for the average lay reader). - -sche (discuss) 02:33, 2 May 2024 (UTC)[reply]
@-sche I think this is a good idea. AFAICT it would require some changes to Module:languages (which handles transliteration) so that a given transliteration method can return multiple transliterations rather than just one, each transliteration associated with properties such as CSS class, with one of them identified as "canonical" (meaning it is displayed while the others aren't). The only tricky thing here is manual transliterations; ideally, there would be a method to convert a manual transliteration in the canonical system into each of the other systems, so that users have to specify only one transliteration rather than multiple. In the examples here, that conversion isn't hard, but sometimes it may not be possible (e.g. the current Hebrew transliterations are based on modern Hebrew pronunciation, which has several mergers compared with Biblical Hebrew, so we couldn't convert modern to Biblical Hebrew transliterations). Benwing2 (talk) 02:45, 2 May 2024 (UTC)
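To make the idea above concrete, here is a minimal sketch in Python (the real Module:languages is a Lua module, and nothing below reflects its actual interface). The class name, the CSS class names and the conversion table are all hypothetical; the sketch only shows one canonical transliteration being returned together with a derived, hidden-by-default alternative, using the ʔ/ʕ to ʾ/ʿ substitution discussed in this thread.

```python
from dataclasses import dataclass

# Hypothetical conversion table: IPA-style signs -> half-ring signs.
IPA_TO_HALF_RING = str.maketrans({"ʔ": "ʾ", "ʕ": "ʿ"})


@dataclass(frozen=True)
class Transliteration:
    text: str
    css_class: str   # hypothetical class name used to show or hide this variant
    canonical: bool  # True for the single variant displayed by default


def transliterations(canonical_text: str) -> list[Transliteration]:
    """Return the canonical transliteration plus one derived alternative."""
    return [
        Transliteration(canonical_text, "tr-ipa-signs", True),
        Transliteration(
            canonical_text.translate(IPA_TO_HALF_RING), "tr-half-rings", False
        ),
    ]


variants = transliterations("ʔiʕlān")
print([v.text for v in variants])  # ['ʔiʕlān', 'ʾiʿlān']
```

A one-to-one character table like this only works where no information is lost between the two systems; it would not cover the modern-to-Biblical Hebrew case mentioned above.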
@Benwing2: I believe that some of the existing manual transliteration entries may need to be reviewed in order to see whether their use was actually justified in the first place. Some of them are there only to work around various technical issues, which ceased to exist. For example, this manually added transliteration for a Belarusian quotation became unnecessary after this fix. And I definitely support the idea of having multiple transliteration schemas, because this would allow introducing Belarusian Łacinka in addition to the current WT:BE TR scholarly transliteration. As @-sche mentioned, the primary motivation is that words should preferably be searchable via Google or via the search box from the Wiktionary front page. Belarusian entries currently solve the searchability problem via manually added "Alternative forms" sections with red links, but this isn't ideal. So the proposed improvement has uses even beyond Arabic and Hebrew. --Ssvb (talk) 16:41, 2 May 2024 (UTC)
Yes, I'm also in favour of having multiple transliteration schemes for this reason. Theknightwho (talk) 11:44, 8 May 2024 (UTC)
@-sche This is a good proposal.
@Benwing2 I understand that ʔ and ʕ are more visible than the small half-rings, but I question how useful using them would be for the average reader since they are barely used in current transliteration schemes. If it hinders readers' ability to find these entries, we should avoid using them. Additionally, when is ḫ confused with the pharyngeal fricative? Antiquistik (talk) 05:42, 2 May 2024 (UTC)[reply]
@Antiquistik I'm not sure what you mean by "barely used in current transliteration schemes". Are you referring to transliteration schemes outside of Wiktionary? If so, why do you think the average reader will be familiar with them, but won't be familiar with IPA? As for using ḫ, my point is that this is easily confused with ḥ (the transliteration for the pharyngeal fricative), and having all three of h ḫ ḥ is going to make for endless confusion. Benwing2 (talk) 05:47, 2 May 2024 (UTC)
@Benwing2 While I don't think that the average reader will be more familiar with the IPA signs, I doubt that they will be searching Arabic terms with signs from the current standard transliteration schemes substituted by IPA signs that are rarely used for Arabic transliteration.
And, as pointed out by @Ssvb, the entries need to be searchable. Using the more widely employed transliteration is the better option for this.
As for the transliteration of /x/, I strongly disagree with your position. The transliterations for other Afroasiatic languages like Old South Arabian, Ugaritic and Ancient Egyptian use both ḫ and ḥ without any problem, and I don't see why the organic /x/ in Arabic should be represented through a character used for sounds affected by begadkefat. Antiquistik (talk) 11:19, 3 May 2024 (UTC)
@Antiquistik: Your premise of the signs being but used in IPA transcriptions before having been adopted by Wiktionary is wrong. We realized that there are lots of linguistic books, more or less traditionally Semitist, with them as their editorial choice for transcription. I have doomsurfed the philologies enough in the last 1½ decade to know that this is by far not so uncommon as to be stunting someone’s dictionary use. I also want to raise your attention towards pertinent languages without native writing system that can only be entered in an academic transcription, the Modern South Arabian languages, which have suffered some variations in transcription styles over the decades and native countries of researchers but I think are amenable as written down at أَيْدَع (ʔaydaʕ), whereas with all their diacritics the rings would strain the readers’ tempers. Fay Freak (talk) 11:37, 3 May 2024 (UTC)[reply]
@Fay Freak How prevalent are Arabic transliterations using the IPA signs compared to the half-rings? Antiquistik (talk) 13:06, 3 May 2024 (UTC)[reply]
@Antiquistik: No one, or at least not me, can do stats on such a thing. There is also a qualitative difference in the kinds of resources that use them. In purely Semitist sources due to tradition the rings hold their ground. I have clicked around in my Semitics folder for you. I wanted to say that Leonid Kogan uses MODIFIER LETTER GLOTTAL STOP ˀ a lot, which is a bit more conspicuous and between the two extremes, but the second work by him I opened ({{R:tig:Kogan:2011}} after {{R:sem-pro:GC}}), goes the whole hog and uses ʔ for Arabic and the other Semitic languages. {{R:sqt:CSOL}} and {{R:sem-pro:SED}} use ˀ, anything published in the Journal of Semitic Studies such as →DOI the rings, we may see it as a publisher decision, in more relaxed journal pieces he seems to prefer the IPA letters? In the old and long series Perspectives on Arabic linguistics you got the IPA letters all around. There is a lot of socialization behind letter choices, you just need to get used to them, but not lose aesthetic sense. University docents may teach something specific but there is a point where one shan’t believe other people. Younglings learn and adults function by imitation but science by organized skepticism, a dilemma.
The complicated part: I can hold you a lecture how it has to do with spatial-temporal memory, again the first chapter of the handbook of memory, ASD and the law I mentioned. Everything normal in the head, you guys tribally react to relations previously experienced with and from other people, in spite of the meatspace effecting the worst selection bias, contrary to universalism of science. You underestimate the psychological background behind all this. I did hardly positively respond to what teachers required or expected from me in terms of organizing a treatise, by some internal logics which aren’t strictly rationally evident, writing points of a paper in this and that order and not missing out a super-influential fashionable nonsense in the field I mean, which is detrimental to exams, and self-portrayal in job applications, however exquisitely able to judge the merits of the matter in isolation, and I am now very aware how strong feelings about signs come about, without sustaining them myself. We don’t just count voices together to let the loudest party win, this is not how creating good stuff works, only a working hypothesis. Fay Freak (talk) 14:09, 3 May 2024 (UTC)
@Fay Freak I would still suggest that we should use the most prevalent system, but given your explanation, I am willing to accept the present status quo while still maintaining my opposition in the original change proposal. Antiquistik (talk) 08:28, 26 May 2024 (UTC)[reply]

Descendant tree design

Here's my idea for a horizontal tree style that could be generated by {{etymon}}. I've switched up the colour scheme, since this is a descendants tree rather than an etymology tree. We can also include question marks or labels just as in the etymology tree. Let me know what you think! @Vininn126, Equinox, Sławobóg, -sche, 0DF Ioaxxere (talk) 21:24, 1 May 2024 (UTC)[reply]

How would you represent borrowings and morphological reshaping in this format? Also I think I prefer Design 2, because in Design 1 the single right-branching node might be interpreted as somehow different from the below-branching nodes (and in addition, in Design 1 someone might e.g. interpret the juncture where Proto-Italic branches off as its own node, a daughter of PIE rather than just an artifact of the design). However, even better than either IMO would be one where the parent is centered vertically among all of its children rather than being at the top. Benwing2 (talk) 02:55, 2 May 2024 (UTC)[reply]
@Benwing2: Probably with the same label system that {{etymon}} already uses. I like your idea for centering the node, although for trees with a huge number of lines it might lead to the ultimate ancestor being far down the page. Possibly the ultimate ancestor could be given some kind of special status where it always goes at the top left of the page. Ioaxxere (talk) 05:31, 2 May 2024 (UTC)[reply]
I think Design 2 is also my preference, at least on desktop. Vininn126 (talk) 13:25, 3 May 2024 (UTC)[reply]
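As an illustration of the "parent centered vertically among all of its children" suggestion, here is a minimal Python sketch of one way to compute row positions for a horizontal tree. It is not taken from {{etymon}} or any existing module; the `Node` class and `layout` function are hypothetical, and the example tree uses only the three top-level branches from the mockup.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)
    y: float = 0.0  # vertical position, measured in rows


def layout(node: Node, next_row: int = 0) -> int:
    """Assign y to every node in the subtree; return the next free leaf row."""
    if not node.children:
        # Leaves occupy consecutive rows from top to bottom.
        node.y = float(next_row)
        return next_row + 1
    for child in node.children:
        next_row = layout(child, next_row)
    # Center the parent between its first and last child.
    node.y = (node.children[0].y + node.children[-1].y) / 2
    return next_row


root = Node("Proto-Indo-European *ph₂tḗr", [
    Node("Proto-Germanic *fadēr"),
    Node("Proto-Italic *patēr"),
    Node("Proto-Celtic *ɸatīr"),
])
layout(root)
print(root.y)  # 1.0
```

With this scheme the ultimate ancestor sits halfway down its subtree, so on a tree with many leaf rows it ends up far down the page, which is the drawback raised in the reply above.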

Designs 1 to 5

[Five hand-made HTML mockups of the same tree; only the node labels survive here, and they are identical in every design.]

Proto-Indo-European *ph₂tḗr
Proto-Germanic *fadēr
Proto-West Germanic *fader
Old English fæder
Middle English fader
English father
Scots faither
English faeder
Proto-Italic *patēr
Latin pater
Old French pere
Middle French pere
French père
English père
English pater
Tok Pisin pater
Proto-Celtic *ɸatīr
Old Irish athair
Manx ayr
English ayr

At the risk of stating the obvious, only a small fraction of the descendants are being shown here. Is this focussed on English? Nicodene (talk) 21:49, 1 May 2024 (UTC)
@Nicodene: This is just a mockup. I created all the HTML by hand, but the full (automatically-generated) tree will have all the descendants. Ioaxxere (talk) 22:11, 1 May 2024 (UTC)
How would they all fit? Some of the ‘nodes’ have dozens of direct descendants. Nicodene (talk) 22:16, 1 May 2024 (UTC)
@Nicodene: The tree would be extremely tall in that case. Either way, it would still be significantly more readable than something like what we currently have at Reconstruction:Proto-Sino-Tibetan/s-la#Descendants. Ioaxxere (talk) 22:19, 1 May 2024 (UTC)
I have to agree with Nicodene. With etymology trees and the vertical format, it makes more sense to me because the tree will be much more compressed, but for descendants, I can't really see it working as well. It'll get really unwieldy and fast. The list you've pointed to isn't good either, but I don't like replacing one problem with another one. Looking at the link you've sent, how would this interact with etymology-only languages or the situation with Chinese? AG202 (talk) 03:06, 2 May 2024 (UTC)
Etymology-only languages shouldn't be too difficult to handle in general. For Chinese, I feel like including dozens of dialectal pronunciations in Reconstruction:Proto-Sino-Tibetan/s-la is excessive and we should reduce that to only those forms which were borrowed into other languages. It's also possible that descendants trees will end up having less automation than etymology trees in general. Ioaxxere (talk) 05:31, 2 May 2024 (UTC)[reply]
One thing that needs to be addressed is alternative forms. In Middle English, there are loads of them for everything. They can't always be ignored, because there are enough cases like catch and chase from Old French: chacier, chacer (chiefly Anglo-Norman), cachier (northern), flour and flower from Middle English: flour, fflour, fflowr, fleur, flor, floure, flower, flowr, flowre, flowyr, flur or even morrow and morn from Middle English: morwe, morewe, morowe, morow, morrou, morue, morw, morȝe, morewen, morowen, morȝen, morwen, morwyn, morwhen, morwoun, morun, moron, moryn, morn, morgen, marhen, mareȝen, morghen, moruwe, where different alternative forms have different descendants. Chuck Entz (talk) 18:31, 3 May 2024 (UTC)[reply]
Love it. After a quick glance at the HTML, is the only difference alignment? I think that since this could appear early on in a number of entries that have right-floating tables of contents, I think left-alignment makes the most sense to avoid some of the inevitable bunching. —Justin (koavf)TCM 22:14, 1 May 2024 (UTC)[reply]
@Koavf: No, the difference is whether there are connectors on the bottom of the boxes. I have no idea why the alignment is different, actually... Ioaxxere (talk) 22:16, 1 May 2024 (UTC)[reply]
Ah, I see that now. —Justin (koavf)TCM 22:17, 1 May 2024 (UTC)[reply]

@Ioaxxere: Please excuse the lengthy delay in my response. I prefer design 4, with design 3 a close second for me. Thanks for asking. 0DF (talk) 17:11, 21 May 2024 (UTC)

get rid of noun and adjective plural form categories once and for all

There appears to be consensus established here, here and here, as well as in this diff, to not categorize noun and adjective non-lemma forms in separate 'noun plural forms' and 'adjective plural forms' categories. Yet when I made such a change for newly added Chadian Arabic terms, my favorite editor User:Fenakhay went on a revert spree. By longstanding consensus, we do not in general categorize non-lemma forms as e.g. Category:Russian noun prepositional case forms etc., so I don't see why an exception needs to be made for noun plural forms. However, I'd like to get clear consensus here to remove all such categories and delete the entries from Module:category tree/poscatboiler/data/non-lemma forms that allow such categories to be recognized. We have already done this for some languages; for example, there is intentionally no Category:English noun plural forms, and that page is protected against re-creation by bots or non-admins.

The alternative is to outline a clear rationale for why we need such categories and a rule for which situations they are allowed and which situations they aren't allowed. Either way, the current haphazard situation, where some languages have such categories and some don't, and the categories are incomplete, is unmaintainable.

Benwing2 (talk) 23:45, 1 May 2024 (UTC)[reply]

And a stronger consensus at Wiktionary:Requests for deletion/Others#Category:Adjective plural forms by language. It seems that Fenakhay is the only editor who supports the retention of these categories. Consensus is against them. This, that and the other (talk) 02:55, 2 May 2024 (UTC)
I support getting rid - trivial category intersections like this are a waste of time. Theknightwho (talk) 03:24, 2 May 2024 (UTC)
I don't see any rationale for this kind of category either and so am in favour of deleting them. Nicodene (talk) 14:13, 2 May 2024 (UTC)
I agree as well. Ioaxxere (talk) 17:57, 2 May 2024 (UTC)
Support deleting these. Ultimateria (talk) 17:21, 6 May 2024 (UTC)
If we have this kind of thing, it should be with a clear rationale for when/where and why (as Benwing says) and it should be added automatically, probably by whatever headword- or definition-line templates we're using to declare something as a noun plural form, paucal form, etc in the first place — I say this because as far as I saw in the prior RFDs, the categories were populated haphazardly and manually with handfuls of entries, which is not useful. The usefulness of categorizing non-lemma forms by their specific non-lemma-ness seems small (though not nonexistent) to me; I suppose if I wanted to know what kinds of endings Foobarian noun plural forms had, a category would be useful, but the array of endings which Foobarian noun plural forms have could alternatively be mentioned on the About Foobar page, or on the Foobarian equivalent of Appendix:English grammar. Can anyone articulate something these categories would be useful for? (Absent that, I have no objection to deleting them, and indeed voted to do so in some of the prior RFDs.) - -sche (discuss) 19:21, 2 May 2024 (UTC)[reply]
Personally, I find these categories very useful from a navigational standpoint, so I'd like to see them kept. That said, they should be added automatically as part of templates like {{infl of}} and {{plural of}}, not added manually by users. Binarystep (talk) 11:26, 5 May 2024 (UTC)[reply]
@Binarystep Do you realize this is simply an intersection category? In general we don't usually include intersection categories because you can search for any combination using the Search feature. In this case, e.g. to do the equivalent of CAT:Chadian Arabic noun plural forms, you can search for the combination of category CAT:Chadian Arabic noun forms and template Template:plural of. Adding them automatically using templates like {{infl of}} and {{plural of}} has already been tried, but it turns out to be difficult from a programmatic standpoint in some cases and a maintenance headache, which is the reason I want them removed. Benwing2 (talk) 20:02, 5 May 2024 (UTC)[reply]
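For readers unfamiliar with the search-based alternative mentioned above, the following Python sketch builds such a query string. It assumes the site search accepts the `incategory:` and `hastemplate:` keywords provided by MediaWiki's CirrusSearch extension; the function name is hypothetical and the snippet only formats text, it does not contact the site.

```python
def intersection_query(category: str, template: str) -> str:
    """Build a search string for pages in `category` that transclude `template`."""
    return f'incategory:"{category}" hastemplate:"{template}"'


print(intersection_query("Chadian Arabic noun forms", "plural of"))
# incategory:"Chadian Arabic noun forms" hastemplate:"plural of"
```

Pasting the resulting string into the search box should list roughly the same pages that a dedicated "noun plural forms" category would hold, provided the entries use the named template directly rather than an alias or a more general template such as {{infl of}}.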

Hi. User:Fay Freak and I have been having a discussion about using {{alt}} or {{desc}}, or creating a similar template, for Derived terms and the like. This came up because Fay Freak has been using {{desc|nolb=1}} in Derived terms sections. (Note: |nolb=1 disables the language name at the beginning. FF proposes renaming |nolb= to |nolang= to avoid confusion with |lb= for labels and because what's being suppressed is a language name, not a label.) Both {{alt}} and {{desc}} let you specify a series of terms along with per-term properties plus overall labels for the whole set of terms, although the syntax of the two templates is different and {{desc}} has some extra features specific to descendants. Note that we also have {{syn}}, {{ant}}, etc. for inline synonyms/antonyms/etc., which likewise have support for specifying a series of terms with both per-term properties and overall labels. The current syntax for Derived terms, Related terms and such involves manually listing each term with {{l}} and using {{q}} to add qualifiers as needed, but compared with {{alt}} and {{desc}} this is both more cumbersome and less standardized, meaning that different people format things differently. I think we ought to have some way for Derived terms sections and the like of specifying a list of terms plus labels, similar to {{alt}} and {{desc}}. The question is, should we just reuse e.g. {{alt}} for this purpose, or create another template? (If the latter, I'd maybe call it {{terms}}.) Potentially we could rename {{alt}} to {{terms}} or something similarly generic and keep {{alt}} as an alias, since there isn't really anything about {{alt}} that is specific to Alternative forms.

I'm omitting mention of {{col3}} and the like; while these are useful especially for long lists of similar terms, they don't provide the ability to specify a set of labels at the end of the list of terms, as {{alt}} and {{desc}} do.

Benwing2 (talk) 05:22, 2 May 2024 (UTC)[reply]

That'd be quite nice. All I have to add is that it'd help to have the option to split derived terms into columns or put them in collapsible boxes, as people have been doing with a variety of other templates (cf. cado). Nicodene (talk) 14:01, 2 May 2024 (UTC)[reply]
I think we'd be able to scrape this to be honest. All it'd need is an etymology section for most terms... Vininn126 (talk) 16:20, 5 May 2024 (UTC)[reply]
@Vininn126 I don't quite understand what you mean, can you clarify? Benwing2 (talk) 20:03, 5 May 2024 (UTC)[reply]
Sorry, misinterpreted. Not sure I have a strong opinion. Vininn126 (talk) 07:17, 6 May 2024 (UTC)[reply]
I'm having trouble understanding the need for such a template beyond stringing multiple {{l}}s together. Can you give an example? I'm also confused by the association being made between Derived terms and Alternative forms. They're pretty distinct in my mind. -- Sokkjō 03:41, 11 May 2024 (UTC)[reply]
@Sokkjo User:Fay Freak gave the example in Sittenstrolch of using {{desc|de|Sittich|lb=prison slang|nolb=1}} under Derived terms in order to get the label functionality; it displays as
You can get a somewhat similar effect using {{alt|de|Sittich||prison slang}}:
Here, only one term is listed but you can easily imagine listing multiple terms and multiple labels, which are supported in both syntaxes. Note that you couldn't so easily just use a qualifier because the labels autolink like {{lb}} labels, but don't categorize. I suppose you could write
* {{l|de|foo}}, {{l|de|bar}}, {{l|de|baz}} {{lb|de|prison slang|Austria|nocat=1}}
which displays as
much like writing
* {{alt|de|foo|bar|baz||prison slang|Austria}}
but as you can see, the former is much more awkward.
The reason I brought this up is that there's not a lot of functionality (and arguably no functionality) that's specific to {{alt}}; that's why I mentioned generalizing (or simply renaming) {{alt}} so it can be used outside of Alternative forms sections. Benwing2 (talk) 07:10, 11 May 2024 (UTC)[reply]
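The positional syntax in the {{alt}} examples above (language code, then terms, then an empty parameter, then labels) can be sketched in Python as follows. This is an illustration inferred only from the examples in this thread, not the template's actual implementation; the function name is hypothetical, and per-term properties and named parameters are ignored.

```python
def split_terms_and_labels(params: list[str]) -> tuple[str, list[str], list[str]]:
    """Split positional parameters into (language code, terms, labels).

    The first empty parameter after the language code is treated as the
    separator between the list of terms and the list of labels.
    """
    lang, rest = params[0], params[1:]
    if "" in rest:
        gap = rest.index("")
        return lang, rest[:gap], rest[gap + 1:]
    return lang, rest, []  # no separator: everything is a term


params = "de|foo|bar|baz||prison slang|Austria".split("|")
print(split_terms_and_labels(params))
# ('de', ['foo', 'bar', 'baz'], ['prison slang', 'Austria'])
```

Nothing in this splitting step depends on the section the template is used in, which is consistent with the point that little or nothing about {{alt}} is specific to Alternative forms.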
In the example Sittenstrolch, there is no reason a usage label would belong there -- that should be left to the entry page. If I saw a user add that, I would delete it. -- Sokkjō 07:27, 11 May 2024 (UTC)[reply]
Obviously not everyone agrees with you, because qualifiers and labels are extremely common in derived terms, synonyms and the like. I would tread lightly and think twice before deleting such a label. Benwing2 (talk) 08:39, 11 May 2024 (UTC)[reply]
What other users are putting usage labels in the derived terms section?! -- Sokkjō 05:07, 12 May 2024 (UTC)[reply]
It's useful if it's only a derived term in one particular (uncommon) sense, and you want to make that clear, but that's quite a rare scenario. Theknightwho (talk) 22:52, 19 May 2024 (UTC)[reply]
Being able to string together multiple {{l}}’s is all I ever wanted for Christmas. Nicodene (talk) 05:56, 12 May 2024 (UTC)[reply]

Plurals on head lines and declension tables

Is there any point in having both plurals on the head line and a declension table showing the plural for a noun lemma? I would be inclined to omit the plural(s) when there is a declension table. --RichardW57m (talk) 16:36, 2 May 2024 (UTC)[reply]

@RichardW57m it would perhaps help to specify which language you're thinking of and give an example. This, that and the other (talk) 03:07, 6 May 2024 (UTC)[reply]
The specific language where this has come up is Lithuanian, avìdė, which currently only displays the plural through the declension table. A similar specific is with the Lithuanian adjective headword template, where until recently many ordinals' neuter form was wrong and contradicted the following declension table. --RichardW57m (talk) 11:27, 7 May 2024 (UTC)[reply]
IMO it depends on how regular the inflections in question are. If they serve as something like principal parts, I think it's useful to put them on the headword line as well as in the declension table, because then someone with some familiarity with the language will know how to inflect the term without needing to look through the whole declension table to figure out what the most important parts are. This is similar to how we list the past historic and past participle for Italian verbs. OTOH if they are largely predictable, putting them in the headword line is less useful. Benwing2 (talk) 23:27, 8 May 2024 (UTC)[reply]
As Benwing suggested, I would say the answer is language-specific. For example, in German, plurals seem to be the most unpredictable declined form of a noun, so it makes some sense to give the plural in the head line.--Urszag (talk) 22:38, 9 May 2024 (UTC)[reply]

A way to more easily connect with readers: a follow-up

Following Wiktionary:Beer_parlour/2024/March#A_way_to_more_easily_connect_with_readers, I wrote to WMF in an attempt to figure out how to best resolve this issue. @Johan Jönsson replied and has given us an option, I think. He suggests we create a new mailing list for admins and for us to put enwiktionary in the name somehow. What do people think of this solution? Vininn126 (talk) 16:03, 3 May 2024 (UTC)[reply]

Support Ioaxxere (talk) 16:56, 3 May 2024 (UTC)[reply]
Support This, that and the other (talk) 08:30, 5 May 2024 (UTC)[reply]
Support Binarystep (talk) 12:45, 5 May 2024 (UTC)[reply]
Support Thadh (talk) 11:09, 6 May 2024 (UTC)[reply]
Okay, I'm going to move forward with this. See phabricator:/T364731. Vininn126 (talk) 10:38, 13 May 2024 (UTC)[reply]
Update, we have a private mailing list for admins (please open phabricator thread for details). Any active admins may sign up. Ladsgroup mentioned we may also open a public general use mailing list if we want. I'll leave that discussion for another time. Vininn126 (talk) 07:20, 14 May 2024 (UTC)[reply]

Volga Türki language

Greetings, I'd like to propose giving Volga Türki an L2.

It is a significant member of the Middle Turkic literary languages, and is as important as Ottoman Turkish, Chagatai and Karakhanid, all of which already have their own Wiktionary categories: Category:Ottoman Turkish language, Category:Chagatai language, Category:Karakhanid language. Volga Türki is considered a descendant of Karakhanid, together with Chagatai, however they all are roughly contemporary.

It was in wide use in the Volga-Ural region from the 15th century (or from the 12th century, if the poem Qissa-i Yosof by Qul Ghali is included) until the adoption of Cyrillic and Latin scripts for the Tatar and Bashkir languages under Soviet rule. Although the written languages for Tatar and Bashkir started to diverge slightly from Volga Türki in the late 18th and early 19th centuries, before Soviet rule, it remained a common standard for international affairs, especially between other Turkic groups.

Its addition would not only help with etymological sections, but also help connect the cognates with other Turkic languages, similarly to other Middle Turkic literary languages' sections.

As for Unicode characters, numerals and readings, I already have prepared all of this, and will work on adding them as soon as the category is created. The sources of lemmas are going to be taken from books, dictionaries and other written resources from that time period. I will try to list a source for each lemma whenever possible.

The only issue, however, is that the language does not have its own ISO 639-2 code yet. I propose one of the following codes to be used for the language: iut (for İdil-Ural Turkic); tui (Turkic of İdil-Ural). I would advise against codes like vut (Volga-Ural Turkic) and ott (Old Tatar), firstly because the name Volga was not used by the locals, especially during the era of Volga Türki, and secondly because the name Volga/İdil/İdel Türki is neutral, whereas Old Tatar primarily refers to the diverged variant of Volga Türki that was used specifically for Tatar. Bababashqort (talk) 16:06, 3 May 2024 (UTC)[reply]

What is the Volga Turki corpus and how accessible is it? Qissa-i Yosof poem by Qul Ghali should definitely not be included, as it is covered by Khorezmian Turkic [1]. Allahverdi Verdizade (talk) 20:40, 3 May 2024 (UTC)[reply]
Support BurakD53 (talk) 17:57, 4 May 2024 (UTC)[reply]
Its corpus mostly isn't digitised, but practically all Bashkir and Tatar literature from at least the 16th century until the late 19th century is written in Volga Türki. The books, manuscripts and magazines are still preserved in a lot of libraries in Tatarstan and Bashkortostan. As for Qissa-i Yusuf, that is somewhat debatable, but given the timeframe it probably suits Khorezmian, as one of the ancestors of Volga Türki. Bababashqort (talk) 07:24, 5 May 2024 (UTC)[reply]
@Bababashqort: for the last issue, we generally make up our own codes using the code for the group it belongs to (probably "trk") followed by a hyphen ("-") followed by some sequence of letters that's not already in use by us. That way there's no chance of our code conflicting with an ISO code. Since this is strictly for internal use and our modules and CSS/JS code convert everything for browsers, we don't have to use existing ISO codes. Chuck Entz (talk) 18:24, 4 May 2024 (UTC)[reply]
Yes, I've been told that wiki uses a placeholder, but didn't exactly know how it worked. Thank you for explaining!
In this case I'd suggest trk-iut Bababashqort (talk) 07:25, 5 May 2024 (UTC)[reply]
@Bababashqort We try to use the first three letters of the lect in the second part of names like this. What do you think of trk-idi or trk-vol? Benwing2 (talk) 08:04, 5 May 2024 (UTC)[reply]
trk-idi includes only the Volga part, as well as trk-vol. The name itself, however, is taken from the most widespread naming of the language, which unfortunately is shortened to Volga Türki, omitting Ural. And speaking of İdil, it is actually spelled as İdel in Tatar itself, İdil is just more Common Turkic. Therefore the only solution seems to be trk-iut, it's not that hard to deduce I think. Bababashqort (talk) 11:54, 5 May 2024 (UTC)[reply]
@Allahverdi Verdizade suggested to make a Turki category instead, which I'd very much prefer. It would remove the need to add more distinct subvariants of it, such as North Caucasian Turki, Nogay Turki and others. This would also allow to use derivation template for all languages that used it: Crimean Tatar, Kumyk, Nogay, Bashkir and others. Bababashqort (talk) 13:21, 5 May 2024 (UTC)[reply]
@Bababashqort Sure, that works. What language is this a category of? Benwing2 (talk) 19:56, 5 May 2024 (UTC)[reply]
I think he meant he wants Türki as a language code, not specifically Volga Türki Bortkastningskonto (talk) 07:01, 6 May 2024 (UTC)[reply]
@Bortkastningskonto @Bababashqort OK, I need more information then. Is "Türki" supposed to be an L2 language? This is an awfully generic name for a language, and I would likely oppose this name for this reason. And I will repeat my assertion that the code for Volga Türki should be 'trk-vol' in keeping with the name. The code should reflect the first three letters of the lect name barring extraordinary circumstances (usually due to ambiguity when there are multiple lects sharing the first three letters, which is not an issue here). @Allahverdi Verdizade can you weigh in here? I am not qualified enough to tell whether this should be an L2 language, an etym-only language or just a label of some other language (the last two being rather similar). Benwing2 (talk) 07:11, 6 May 2024 (UTC)[reply]
I didn't actually suggest making Türki a L2, rather I wondered whether it wouldn't be better to do so depending on how different Volga Türki is from, say, North Caucasian Türki. I can't answer that question myself, and I think, in general, very few people can give a well-informed opinion on that. Reading this book on North Caucasian Turki (in Russian) might help a little. Considering that Bababashqort is likely only going to work with sources written in the Volga variety, maybe it is the safest to create a Volga Turki L2, in which case you would circumvent the problem with "awfully generic name". Documents in North Caucasian Turki are terribly inaccessible (not digitized or normalized), so I don't think anyone is going to work with them.
In any case, there is also the problem of classifying "literary languages" and fitting them into genealogical tree schemes. It is often said that this or that language "is mostly X, but also incorporates elements of Y", at the same time as it "continues the literary tradition of Z". I can't exactly tell you what it means that "Volga Turki continues the tradition of Khorezmian Turki", which in turn "continues the tradition of Karakhanid", as it oftentimes is put in Russian books on the matter. Too much arbitrariness for my taste. So my opinion is that these "literary languages" maybe should not have ancestors and descendants. Allahverdi Verdizade (talk) 17:35, 7 May 2024 (UTC)[reply]
Support Yorınçga573 (talk) 20:23, 9 May 2024 (UTC)[reply]
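
The code convention described in this thread (the family code, a hyphen, then letters taken from the lect name, avoiding anything already in use) can be sketched roughly as follows. This is only an illustration in Python, not Wiktionary's actual tooling: the function name `propose_code`, the `in_use` set, and the fallback order used when the first three letters are already taken are all assumptions made for the example.

```python
import unicodedata


def propose_code(family: str, lect_name: str, in_use: set) -> str:
    """Suggest an internal code of the form '<family>-<three letters>'.

    Hypothetical helper: it prefers the first three letters of the lect
    name, as suggested in the thread, and only falls back to other
    letters when that code is already taken.
    """
    # Decompose accented letters (e.g. "ü" becomes "u" plus a combining
    # mark), then keep only plain ASCII letters, lowercased.
    decomposed = unicodedata.normalize("NFKD", lect_name)
    letters = [c.lower() for c in decomposed if c.isascii() and c.isalpha()]
    if len(letters) < 3:
        raise ValueError("lect name needs at least three letters")

    # First choice: the first three letters of the name.
    candidates = ["".join(letters[:3])]
    # Assumed fallback: keep the first two letters, vary the third.
    candidates += ["".join(letters[:2]) + c for c in letters[3:]]

    for suffix in candidates:
        code = f"{family}-{suffix}"
        if code not in in_use:
            return code
    raise ValueError("no free code found; pick one by hand")


print(propose_code("trk", "Volga Türki", set()))        # trk-vol
print(propose_code("trk", "İdil-Ural Turkic", set()))   # trk-idi
```

A code such as trk-iut, which is built from the initials of several words rather than the first three letters, would fall outside this sketch and would have to be chosen by hand.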

Request for a new language

Yet again, I request for Old Lombard to be listed separately, as for now Old Lombard is listed as a dialect and not a language. That Northern Irish Historian (talk) 17:30, 4 May 2024 (UTC)[reply]

I notice that Old Italian is currently an etym-only variant of Italian. Why can't Old Lombard be the same? How different are Lombard and Old Lombard? Benwing2 (talk) 18:59, 4 May 2024 (UTC)[reply]
Old Lombard:
  • Faremo preg a Deo a Questi cominzament
  • et a la soa mather ke preg l’omnipotent.
  • Ke n’des a dir et a far tute l so placiment
  • Ço ked is la scritura si se conven a dir
  • De la pasin de Christ a ki ne plas hodir
  • La qual per nu katif je plase sostegnir
  • Bene questi paroli de panzer e da stremir
  • Qui longa fis e dis del pasio del fy de la rayna.
  • La qual si m’dia gratia et a mi sia vesina
  • Ke parlo dritament de la pasion divina
  • St’apreso si me scampo da la infernal pena.
Modern Lombard:
  • Ambiaróm con ‘na preghiéra a Dio
  • e a sò madèr che la préghes l’Onipotent
  • Che nómes a dì e a fa töt de so gradimènt
  • E per bontà sò el vègnes a compimènt
  • Chèl che la dis la Scritüra isé come l’è giöst a dìl
  • De la pasiù de Cristo a chi che öl sintìl
  • Pasiù che per notèr pecadùr la sèrf a soportà
  • Con rasegnasiù chèste parole de pianzer e dè dulùr
  • Ché se parla e se dìs del fiöl de la regina
  • Che la me dàghes gràsia e la me stàghes vizìna
  • ‘Ntat che parle drit de la pasiù divina
  • Semài che scamparó de la pena infernal.
That Northern Irish Historian (talk) 22:35, 8 May 2024 (UTC)[reply]
@That Northern Irish Historian That's not what I was looking for; you have pasted in two different translations which naturally will be different. If you try to match up the corresponding words, they are IMO marginally different enough to maybe be considered different L2's (although they differ less e.g. than the current Occitan dialects). I notice however that there are 0 lemmas currently listed as Old Lombard; are you actually planning on adding some? Benwing2 (talk) 23:16, 8 May 2024 (UTC)[reply]
Yes, but see zinqui, Jesu, and other pages. It is not working. That Northern Irish Historian (talk) 23:24, 8 May 2024 (UTC)[reply]

That's how we enter these words. If you have any objections, please write here. BurakD53 (talk) 14:29, 5 May 2024 (UTC)[reply]

lol. Yes, I have objections. Allahverdi Verdizade (talk) 16:11, 5 May 2024 (UTC)[reply]
As I said before, I want the {{trk-ogz-pro}} code to be removed and replaced with {{trk-ogz}}. Since we have already reconstructed them all under the {{trk-pro}} pages, Proto-Oghuz is quite unnecessary. If anyone still wants to reconstruct Proto-Oghuz, you can reconstruct it using the * sign on the Oghuz page. (Which is quite unnecessary) Likewise, {{trk-klj}} can also refer to the Arghu language, but the data in this language consists of a few words. {{trk-ogz}} is the direct ancestor of all Oghuz languages, in short, it is the same as Proto-Oghuz {{trk-ogz-pro}}. However, we cannot enter these Oghuz or Proto Oghuz words recorded in the Diwan into the site as entries. It requires reconstruction in order to be entered to us. However, these Proto Oghuz words, also Proto Khalaj words, are not a reconstruction. I think that both of them should be entered as input on the site, the biggest reason is that these languages cannot be assumed to be dialects of other languages. But since the Arghu language consists of only a few words, it can be entered under the name Proto. Oghuz language is mentioned many times in the Diwan and even information about its grammar is given. A few Proto Khalaj, i.e. Arghu, words may be added as exceptions. But since this is the case for Oghuz, there is no need to create a language code called Proto-Oghuz. This is my opinion. I firmly reject the addition of these Oghuz words to Old Anatolian Turkish. Not every word mentioned in the Diwan has been witnessed in Old Anatolian Turkish, and the place where Kashgarî shows the Oghuzs on the map in the period he mentions is not Iran, but Central Asia. Also the words here are more archaic than the form in which they are found in Old Anatolian Turkish. BurakD53 (talk) 18:22, 5 May 2024 (UTC)[reply]
Support Yorınçga573 (talk) 20:10, 9 May 2024 (UTC)[reply]

Lemma categories

Discussion moved from WT:Beer parlour/2024/April#Lemma categories.

I've been cleaning up Special:UncategorizedPages, and I've run across a number where @Nicodene has disabled categorization for alternative forms. My understanding is that all mainspace entries should be in either Category:[Language] lemmas or Category:[Language] non-lemma forms. While an alternative form is supposed to be a stub that links to the main form, as far as the categories are concerned, it's a lemma. It's certainly not a non-lemma form, because it has its own non-lemma forms. Leaving it out of both categories raises the question of why we have the entry at all, if we feel we need to hide it: if we don't link to it in the main entry, there's no way to navigate to it.

This has come up before over the years, and we've more than once decided to do it this way. As far as I can tell, Nicodene is the only editor who's doing otherwise. Has anything changed? Chuck Entz (talk) 03:13, 6 May 2024 (UTC)[reply]

Why should Category:Franco-Provençal lemmas be clogged with twelve different renditions of ôtro, seventeen of ôtrament, and ten of solament? Why should Category:Old French lemmas (not to mention Category:Old French adverbs) be clogged with two hundred seventy one renditions of iluec? The whole point of a lemma is to provide a citation form to cover the variants. That is how altforms and altspellings are handled by the vast majority of dictionaries. Nicodene (talk) 03:23, 6 May 2024 (UTC)[reply]
I'm of two minds here. Yes, we generally include alternative spellings and forms as lemmas; otherwise, for example, we'd end up including only one of oxidi{s,z}e as a lemma, and the other would go nowhere. At the same time, however, including 171 alt variants of iluec seems like serious overkill. Maybe we need a separate policy for non-standardized languages vs. standardized ones. Benwing2 (talk) 07:16, 6 May 2024 (UTC)[reply]
At a minimum, every entry should be in some category. As far as how that's been accomplished up to now, my understanding matches Chuck's, that every entry is supposed to be categorized as either a lemma or a nonlemma (or both) and that alternatively-spelled nouns are still nouns (and lemmas, from the category / grammatical perspective). We could change that, e.g. add a parameter which, instead of turning categorization off, moves the entries from "Category:Foobarian nouns" to at least a POS-agnostic catchall "Category:Foobarian alternative forms and spellings", or something more specific like "Category:Foobarian alternative forms and spellings of nouns", "Category:Foobarian alternative forms and spellings of lemmas", but I do think we should continue to regard a completely uncategorized entry—an entry that cannot be accessed from any part of our category tree—as a problem.
There was support for not putting just any alternative spelling into topical categories in this 2022 discussion, but that didn't leave the entries categoryless.
FWIW, the issue of terms having tons of spellings isn't strictly limited to overall-nonstandardized languages, e.g. English has lots of spellings of kinnikinnick, Muhammad, voivode... but I think Benwing's suggestion of handling this on a per-language basis (and just accepting that the English categories will have a few cases like Muhammad where there are a bunch of spellings) is probably more workable than e.g. trying to decide (in a way that can be maintained over time with any consistency) on a per-spelling basis what counts, in a mostly-standardized (but standards-body-less "ungoverned") language like English, as a "standard" spelling. (E.g., several of the alternative spellings of Muhammad are used mainly in scholarly works, so dismissing them as nonstandard seems hard; and in the other direction, for a largely dialectal word, determining why any one spelling should be considered more standard than another seems hard.) - -sche (discuss) 13:54, 6 May 2024 (UTC)[reply]
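
The parameter idea floated above could look something like the following sketch. It is purely illustrative Python, not the real headword module: the function name `entry_categories`, the `alt_mode` parameter and its three values are hypothetical, and only the category names themselves are taken from the comment above. The one property it tries to preserve is that no mode leaves an entry without a category.

```python
def entry_categories(language: str, pos_plural: str,
                     alt_form: bool = False, alt_mode: str = "lemma") -> list:
    """Return the categories for an entry (hypothetical sketch).

    alt_mode values (all assumed names):
      "lemma"    - current practice: alternative forms are categorized
                   like any other lemma
      "catchall" - one POS-agnostic category for alternative forms
      "by_pos"   - a per-part-of-speech alternative-forms category
    """
    if not alt_form or alt_mode == "lemma":
        return [f"Category:{language} lemmas",
                f"Category:{language} {pos_plural}"]
    if alt_mode == "catchall":
        return [f"Category:{language} alternative forms and spellings"]
    if alt_mode == "by_pos":
        return [f"Category:{language} alternative forms and spellings of {pos_plural}"]
    # Refuse unknown modes rather than silently leaving the entry uncategorized.
    raise ValueError(f"unknown alt_mode: {alt_mode!r}")


print(entry_categories("Foobarian", "nouns", alt_form=True, alt_mode="catchall"))
# ['Category:Foobarian alternative forms and spellings']
```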
We could change that, e.g. add a parameter which, instead of turning categorization off, moves the entries from "Category:Foobarian nouns" to at least a POS-agnostic catchall "Category:Foobarian alternative forms and spellings"
I would be quite happy to use that if it were available as an option.
My main concern is keeping the categories clear and usable. When I look up 'Foobarian feminine nouns', for instance, I'd rather not have to wade through 5–10 (+) duplicates for every distinct noun. That is a serious headache with languages like Franco-Provençal or Romansch. Nicodene (talk) 07:54, 7 May 2024 (UTC)[reply]
@-sche: I would like this to be implemented for English as well. Having full-fledged entries for minor spelling variants was a bad idea. Ioaxxere (talk) 03:08, 10 May 2024 (UTC)[reply]
I disagree. All words should be given equal status, at least when it comes to categorization. I don't think Wiktionary should be treating variant spellings as inferior forms of the main entries. For starters, every spelling is (or was) the "default" spelling to someone. Using the example of Muhammad, for instance, there are plenty of people named Mohamad, Mohamed, Mohammad, Muhamad, Muhammet, etc. and it seems weird to claim that their names are merely lesser variants of the single "canonical" spelling. There's also the fact that some spellings carry unique etymological information, have slightly different pronunciations, or are used primarily by certain groups (regional spellings, for instance, or spellings used primarily by non-native speakers). Frankly, I find it troubling that there have been so many recent attempts lately to get us to reduce our coverage rather than expand it. At this rate, I won't be surprised if someone starts a proposal to convert alternate spellings into hard redirects. Binarystep (talk) 19:20, 11 May 2024 (UTC)[reply]

Edit with "username removed"

This edit has the user name removed. How can one see (if not who the user is), which user removed it and why? [2] Equinox 09:33, 6 May 2024 (UTC)[reply]

I removed it because it was an accidental IP/logged-out edit by an editor (the same as did a similar change to unrapable). — SURJECTION / T / C / L / 10:17, 6 May 2024 (UTC)[reply]
I'm officially saying: don't do that. You can revert, delete, but do not wipe content unless it's real serious stuff like child porn. Thank you. Equinox 23:01, 7 May 2024 (UTC)[reply]
Re how to see which admin performed the revdel: it's technically in the "View logs for this page" link on the edit history page, [3]. (If there were a lot of revdels and they did not follow so closely after the time the edits themselves were made, e.g. if I now went to the page and hid a revision from two months ago, and then Surjection hid a revision from one month ago as well as your edit just now, it might be hard for non-admins [who don't have "diff" links] to discern from that log who hid which thing... I guess in that case they'd just have to say "hey, who revdel'd X" and admins could check.) - -sche (discuss) 14:28, 6 May 2024 (UTC)[reply]
@Surjection: I want you to understand how it looked to me: I saw that someone had made an edit, they had no name, I couldn't see them, or talk to them, or discuss, it was like a GHOST DID IT. And I couldn't see who removed their name either. If you ever spent time on WP:OFFICE then ...well. Equinox 22:50, 7 May 2024 (UTC)[reply]
I would, personally, be happy to see text like "edit made by a user whose name is hidden by this admin: Surjection". What I think is wrong and bad and goes against our free openness is just that MYSTERY NO-NAME. Equinox 22:51, 7 May 2024 (UTC)[reply]
Side point: I know Chuck Entz (for example) likes to "clean the graffiti wall" so that vandals can't see their names. But I don't like that. The wiki should be a public space and we should only hide the history in real serious situations like "doxxing" (real name-addresses) or... am I wrong? @-sche @Chuck Entz @Surjection (and even worse, are there Wikipedia rules we are supposed to obey as children.) Equinox 22:55, 7 May 2024 (UTC)[reply]
AFAIK it's global WMF policy to suppress this kind of thing (the IP addresses of users who've accidentally edited logged out), and indeed to suppress it way harder than a mere revision-deletion like Surjection did: "oversighters" have (or had?) database access to delete the information so hard that not even admins can see it. (But it also takes time to contact them, so it's fine for admins to revdel it in the meantime, like this.) This is precisely because of doxxing concerns, because many IP addresses identify the person's real address. (Other IP addresses, of course, merely send you to that one farm in Kansas.) If you ever see an edit where you think the content of the edit is wrong, just undo the edit... as you saw in this case, the username being suppressed doesn't prevent you from undoing the edit. - -sche (discuss) 01:39, 8 May 2024 (UTC)[reply]
Would there be an issue if contributors were to hide their IP address with their screen names after, say, a week? CitationsFreak (talk) 03:46, 8 May 2024 (UTC)[reply]
I should clarify that AFAIK such hiding only happens when someone requests it—usually the person who made the edit, though plausibly someone else who simply noticed what was going on. Last I heard, WMF folks were trying to roll out something that automatically obfuscates all IP addresses by making them show up in edit histories as e.g. incrementing numbers that change periodically or on request (so anytime someone thinks their current [non-]IP is getting too much attention from admins, they can hit "refresh" and start doing vandalism under a new identity, just like logged-in users can by creating multiple accounts), which will probably remove the need to do this in the future, if it gets implemented. - -sche (discuss) 05:12, 8 May 2024 (UTC)[reply]
Would there be an issue if contributors could request that their IP address be hidden by their screen name? CitationsFreak (talk) 05:42, 8 May 2024 (UTC)[reply]
Tangent: Is there a way to "claim" an edit you made while accidentally logged out? Caoimhin ceallach (talk) 21:51, 12 May 2024 (UTC)[reply]

The issue of Old Kashubian (Old Pomeranian?)

I recently came to a realization about the {{R:zlw-opl:SPJSP|Old Polish dictionary}}: it contains texts from Pomerania with Pomeranian features, as it was made during a time when Kashubian was considered a dialect of Polish. However, typologically, this is very, very wrong. Pomeranian is considered North Lechitic, and anything "Polish" (Masovian, Upper Polish, Lower Polish, and Silesian) is considered East Lechitic; therefore anything Old Kashubian should not be considered Old Polish. I propose a split; I intend to add the location of creation for any Old Polish documents anyway for a future dialectal project (for Old Polish this means somehow categorizing the location of attestation by dialect) and to separate out any texts from Pomerania as "Old Pomeranian" with the code zlw-opm, or perhaps "Old Kashubian" zlw-ocb, with Kashubian and Slovincian as the children. These codes seem clunky to me and I am open to others. I have also corroborated this by emailing the editors of the Old Polish dictionary, who have told me that it indeed is "Old Kashubian", which they accept in their framework of Old Polish. Gorazd also holds the same view. @Thadh @Sławobóg @Rakso43243 @Benwing2 @Mahagaja @Silmethule. Vininn126 (talk) 10:50, 6 May 2024 (UTC)[reply]

Alternatively, if we accept Kashubian and Slovincian as the descendants of Old Pomeranian, then we could set them both to be descendants of Old Polish. The argument for this is that one could accept "Old Kashubian" as a constituent of Old Polish - not a dialect, but a constituent. This is what the editor of the Old Polish dictionary told me, quote " Nie napisałam, że to dialekt. Napisałam, że to element składowy języka staropolskiego. To duża różnica. Język starokaszubski to element składowy języka staropolskiego." (In English: "I did not write that it is a dialect. I wrote that it is a constituent element of the Old Polish language. That is a big difference. The Old Kashubian language is a constituent element of the Old Polish language.") Another alternative is to ignore this, which seems wrong to me as well. Vininn126 (talk) 11:42, 6 May 2024 (UTC)[reply]
Another solution: give Old Kashubian an etycode and make it an alt of Old Polish and if a term is attested in Pomerania, we could set the Kashubian and Slovincian reflexes as inherited from that? Otherwise directly from Proto-Slavic. Vininn126 (talk) 14:36, 6 May 2024 (UTC)[reply]
@Vininn126 I think this last solution is maybe the best. This is similar to what is done with Old Northern French, which is considered an etym-only variety of Old French even though Old French as normally construed refers to the Old French of the Paris area whereas Old Northern French refers to the Old French of Normandy, and neither is an ancestor or descendant of the other. The two differ significantly in phonology, e.g. Old French chacier /tʃatsiɛr/ -> English "chase" vs. Old Northern French cachier /katʃiɛr/ -> English "catch". Anglo-Norman and modern Norman are both descendants of Old Northern French (although we currently list Norman as a descendant of Middle French, which is wrong) and modern French is a descendant of Old French per se. Benwing2 (talk) 18:38, 6 May 2024 (UTC)[reply]
I know @Silmethule also mentioned a similar situation with Ancient and Mycenean Greek and also Old Norse and Swedish/Icelandic. See also my question on WT:About Old Polish. Related to that, I'm unsure how to handle labels for all of this. I think we'd want to list Kashubian/Slovincian in the Old Polish entries if and only if a text from Pomerania has an attestation. And any Kashubian/Slovincian words should still have "inherited from Old Kashubian/Pomeranian". Vininn126 (talk) 18:49, 6 May 2024 (UTC)[reply]
@Nicodene As our resident Romance expert, do you agree with changing the ancestor of Norman to be Old Northern French instead of Middle French? This will cause the 5 terms in CAT:Norman terms inherited from Middle French to throw errors, I think. Can you fix up those 5 terms? Also I notice there are 30 terms in CAT:Norman terms inherited from Medieval Latin, which seems impossible and probably need to be cleaned up. Benwing2 (talk) 19:54, 6 May 2024 (UTC)[reply]
I've just cleared out the categories in question. Agreed on removing Middle French as an ancestor of Norman. As for its further ancestor, I would leave it as just Old French, which includes ONF as-is. I think the latter are best treated as one overall language.
I've been meaning to eliminate '[Romance] terms inherited from Medieval Latin' in general, reassigning them to '...inherited from Early Medieval Latin' or '...borrowed from [later] Medieval Latin'. That will take some time. When it's done, perhaps we can make {{inh|romance language|ML.|...}} throw an error message and a brief comment. Nicodene (talk) 00:50, 7 May 2024 (UTC)[reply]
@Nicodene Thanks! I think the basic advantage of setting the ancestor of Norman to be Old Northern French is it more clearly shows the ancestry (when you go to CAT:Norman language and look at the Ancestors panel) than just setting it to Old French. Since Old Northern French is an etym-only variant of Old French, I don't think it will make any difference in terms of what Norman terms are allowed to inherit from. What do you think? Benwing2 (talk) 01:44, 7 May 2024 (UTC)[reply]
Oh, so setting it to ONF won't disallow inheritance from Old French. In that case it sounds fine to me. Nicodene (talk) 01:50, 7 May 2024 (UTC)[reply]
Yeah that's right. Benwing2 (talk) 01:56, 7 May 2024 (UTC)[reply]
@Nicodene @Benwing2 Here's how it works. If you set a variety (etym-only language) as an ancestor, the descendant can inherit from:
  • That ancestor and any (sub)varieties of that ancestor (in this case, Old Northern French, and any varieties it might have).
  • The parent (in this case, Old French) unless the ancestral variety is also explicitly ancestral to its parent (read: the thing it's a variety of), which doesn't apply here. This is for situations like Tajik having Classical Persian as an ancestor: Classical Persian's set as a variety of Persian, but is also set as its ancestor. Since Tajik's ancestor is also Classical Persian, it's only possible for it to inherit from Classical Persian (and any varieties thereof), not Persian in general.
It can't inherit from:
  • Any other varieties of the parent which aren't in the direct lineage of its ancestor (i.e. it wouldn't be able to inherit from other varieties of Old French, unless they're ancestral to/descended from/a subvariety of Old Northern French). To use an Italic example: if we set the proto-language of Romance to be Vulgar Latin, instead of simply Latin, the Romance languages could also inherit from Classical Latin (its ancestor), Latin (the general parent) and Old Latin (set as the ancestor of Latin), but they wouldn't be able to inherit from varieties like Medieval Latin or New Latin, since they aren't in the direct lineage.
It sounds complicated, but it seems to line up pretty neatly with most people's intuitions in practice. Theknightwho (talk) 15:40, 10 May 2024 (UTC)[reply]
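
The rules listed above can be modelled with a small sketch. This is a simplified reading in Python of the comment above, not the actual module code: the `Lect` class, the `allowed_sources` function and the toy data (including the "Romance language" entry, which stands in for the hypothetical Vulgar Latin setup) are assumptions made for illustration. It follows only one level of "variety of" and ignores the "descended from" case mentioned in passing.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Lect:
    variety_of: Optional[str] = None   # parent, if this is an etym-only variety
    ancestors: Tuple[str, ...] = ()    # explicitly set ancestors


def subvarieties(lects: Dict[str, Lect], name: str) -> Set[str]:
    """All lects that are, directly or indirectly, varieties of `name`."""
    found: Set[str] = set()
    stack = [name]
    while stack:
        cur = stack.pop()
        for other, lect in lects.items():
            if lect.variety_of == cur and other not in found:
                found.add(other)
                stack.append(other)
    return found


def allowed_sources(lects: Dict[str, Lect], descendant: str) -> Set[str]:
    """Lects that `descendant` may inherit from, under the simplified rules."""
    allowed: Set[str] = set()
    seen: Set[str] = set()
    stack = list(lects[descendant].ancestors)
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        # The ancestor itself and any (sub)varieties of it.
        allowed.add(cur)
        allowed |= subvarieties(lects, cur)
        lect = lects[cur]
        # Ancestors of the ancestor are in the lineage too.
        stack.extend(lect.ancestors)
        # The parent is allowed unless this variety is explicitly set as
        # the parent's own ancestor (the Classical Persian case). The
        # parent's other varieties are deliberately NOT added.
        parent = lect.variety_of
        if parent is not None and cur not in lects[parent].ancestors:
            allowed.add(parent)
            stack.extend(lects[parent].ancestors)
    return allowed


LECTS = {
    "Old French": Lect(),
    "Old Northern French": Lect(variety_of="Old French"),
    "Norman": Lect(ancestors=("Old Northern French",)),
    "Persian": Lect(ancestors=("Classical Persian",)),
    "Classical Persian": Lect(variety_of="Persian"),
    "Tajik": Lect(ancestors=("Classical Persian",)),
    "Old Latin": Lect(),
    "Latin": Lect(ancestors=("Old Latin",)),
    "Classical Latin": Lect(variety_of="Latin"),
    "Vulgar Latin": Lect(variety_of="Latin", ancestors=("Classical Latin",)),
    "Medieval Latin": Lect(variety_of="Latin"),
    "New Latin": Lect(variety_of="Latin"),
    "Romance language": Lect(ancestors=("Vulgar Latin",)),  # hypothetical setup
}

print(sorted(allowed_sources(LECTS, "Norman")))
# ['Old French', 'Old Northern French']
print(sorted(allowed_sources(LECTS, "Tajik")))
# ['Classical Persian']
print(sorted(allowed_sources(LECTS, "Romance language")))
# ['Classical Latin', 'Latin', 'Old Latin', 'Vulgar Latin']
```

In this toy data, Medieval Latin and New Latin never appear in the result for the Romance entry, because they are varieties of the parent that are not in the lineage of Vulgar Latin.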
So it would be possible to set Old Kashubian as an etym-only variant of Old Polish and then set Kashubian and Slovincian as the children of Old Kashubian but not Old Polish? Vininn126 (talk) 15:43, 10 May 2024 (UTC)[reply]
@Vininn126 Per the rules just outlined, we could definitely make Old Kashubian an etym-only variety of Old Polish and set the ancestor of Kashubian and Slovincian to Old Kashubian, but people would still be able to "inherit" Kashubian and Slovincian terms from Old Polish. It'd be like the situation with Old French. If you wanted to avoid that, either we'd need a new flag or rule of some sort, or we'd need to change the name of Old Polish to e.g. "Old Lechitic" and make Old Polish an etym-only variety of Old Lechitic. @Theknightwho Here's a thought though. If we explicitly set the ancestor of Old Kashubian to Proto-Slavic, would that make it impossible to inherit Kashubian terms from Old Polish? That would be like a slight generalization of the special-case rule for ancestral-to-parent etym languages. Benwing2 (talk) 21:18, 10 May 2024 (UTC)[reply]
I've dreamed of "Old Lechitic", but it doesn't encompass Polabian. Vininn126 (talk) 22:08, 10 May 2024 (UTC)[reply]
@Vininn126 Sorry, why does Polabian matter here? It can just be excluded from Old Lechitic just as it would be excluded from Old Polish. Benwing2 (talk) 08:41, 11 May 2024 (UTC)[reply]
@Benwing2 I have actually tossed the idea of "Old Lechitic" around before with @Sławobóg and @Silmethule. I suppose since it contains Old Kashubian as well there is more precedent for the name. Vininn126 (talk) 08:45, 11 May 2024 (UTC)[reply]
@Benwing2 So far I think the name change and etycodes might be the best solution. I'd like to see if anyone else has any thoughts. If we agree, we can make this change, maybe once I finish adding location information to the quotation templates (or maybe that's not necessary...). Vininn126 (talk) 12:03, 13 May 2024 (UTC)[reply]
Also pinging @KamiruPL as the other main Old Polish editor so he can be aware of the goings-on and give his opinion. Vininn126 (talk) 08:34, 11 May 2024 (UTC)[reply]
I wouldn't like this. This is almost akin to handling Old East Slavic as an Old Church Slavonic variety. Pomeranian and Polish are two distinct branches, and the fact that an earlier stage was highly influenced in their literary variety by the other doesn't make them one and the same. Thadh (talk) 20:43, 6 May 2024 (UTC)[reply]
There's actually a similar issue with texts from Pomerania from {{R:pl:SXVI}} and {{R:pl:SXVII}} but I think we can safely nest these under modern Kashubian with a label, as I have done with Middle Polish. Vininn126 (talk) 19:43, 6 May 2024 (UTC)[reply]
Absolutely no to changing name of Old Polish to Old Lechitic or something. Since Kashubian belongs to different group, it should be separate Old Pomeranian L2 language. It would work better as Proto-Pomeranian too. Having etym-only code would be an alternative solution too, but then we are not consistent with our system (that made BG and MK descendants of OCS :)). Sławobóg (talk) 13:12, 14 May 2024 (UTC)[reply]
@Sławobóg As to your second point: are you saying we could set Old Pomeranian as an etycode within Old Polish? What about the issue where people would be able to type e.g. {{inh+|csb|zlw-opl}} with no issues? Vininn126 (talk) 13:30, 14 May 2024 (UTC)[reply]
Having Kashubian as descendant of Old Polish is just wrong. Having "Old Pomeranian" as etym-code for Old Polish would be better, but still not as good as having separate lang, but it might make editing easier. Sławobóg (talk) 13:44, 14 May 2024 (UTC)[reply]
Would you be able to 1) assist in establishing spelling norms? 2) Dealing with the texts? 3) Understanding the grammar? 4) What about the fact that there are very few texts? 5) What about the fact that all of Old Polish already is a collection of dialects? Vininn126 (talk) 13:54, 14 May 2024 (UTC)[reply]
1-3) Probably not. 4) We have languages like that. 5) Pomeranian being part of it is wrong. I'm not gonna fight here, you asked me for my opinion, I gave my opinion. And if you plan on having Middle Kashubian, having Old Kashubian/Pomeranian as L2 would be a good thing. Sławobóg (talk) 14:48, 14 May 2024 (UTC)[reply]
I'm not arguing, I'm just asking questions. I have no problem if you question the points I raised earlier! Vininn126 (talk) 14:50, 14 May 2024 (UTC)[reply]
@Vininn126 I think we can fix the issue of {{inh+|csb|zlw-opl}}, if that would help. Benwing2 (talk) 14:54, 14 May 2024 (UTC)[reply]
@Benwing2 That could be a good compromise. Vininn126 (talk) 14:55, 14 May 2024 (UTC)[reply]
@Benwing2 @Sławobóg also mentioned a solution where we create a Kashubian label called "Old Pomeranian" and not place that information in the Old Polish article. I'm not sure how I feel about this considering how close Old Pomeranian was culturally and even linguistically to Old Polish at the time. Do you have any thoughts? Vininn126 (talk) 11:46, 17 May 2024 (UTC)[reply]
@Vininn126 Hmm, I'm not so versed in the ins and outs of Old Polish but given how close Old Pomeranian and Old Polish apparently were, along with the fact that (I assume) there will be relatively few lemmas under Old Pomeranian specifically, I think it might make sense to keep them under the same L2 and just fix the inheritance issue to prevent people from inheriting Kashubian terms directly from Old Polish instead of from Old Pomeranian. I don't think fixing the inheritance issue is such a big deal; we just have to change the logic here Module:languages#L-869 that computes the ancestors of a given language. Benwing2 (talk) 00:02, 18 May 2024 (UTC)[reply]
Okay. I'm going to think on this issue a bit more. Vininn126 (talk) 08:18, 18 May 2024 (UTC)[reply]
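For anyone following the inheritance question above, here is a rough sketch of the two behaviours being compared. It is not the actual logic in Module:languages; the table layout, the function name, and the code "zlw-oka" for Old Kashubian are all made up for illustration, and Old Polish is given Proto-Slavic as its direct ancestor purely to keep the toy data short.

```python
# Toy data only; this does not mirror the real Module:languages tables.
# "zlw-oka" (Old Kashubian) is a hypothetical code invented for this sketch.
LANGS = {
    "sla-pro": {"ancestors": [], "variety_of": None},           # Proto-Slavic
    "zlw-opl": {"ancestors": ["sla-pro"], "variety_of": None},  # Old Polish
    "zlw-oka": {"ancestors": [], "variety_of": "zlw-opl"},      # etym-only variety
    "csb": {"ancestors": ["zlw-oka"], "variety_of": None},      # Kashubian
}


def inheritance_sources(code, strict):
    """Return the codes that `code` would be allowed to inherit from.

    strict=False: an etym-only ancestor also licenses its parent language
                  (the situation described above for Old French).
    strict=True:  the parent of an etym-only ancestor is skipped, and the walk
                  continues from the variety's own ancestors if it has any,
                  otherwise from the parent's ancestors.
    """
    result, seen = [], set()
    stack = list(LANGS[code]["ancestors"])
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        result.append(cur)
        entry = LANGS[cur]
        parent = entry["variety_of"]
        if parent is None:
            stack.extend(entry["ancestors"])
        elif strict:
            stack.extend(entry["ancestors"] or LANGS[parent]["ancestors"])
        else:
            stack.extend([parent] + entry["ancestors"])
    return result


print(inheritance_sources("csb", strict=False))
print(inheritance_sources("csb", strict=True))
```

With the non-strict rule Old Polish shows up as a valid source for Kashubian; with the strict rule it drops out while Old Kashubian and Proto-Slavic remain, which is roughly the outcome discussed above.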

Old Polish regional categorization

As a sort of continuation of Wiktionary:Beer_parlour/2024/May#The issue of Old Kashubian (Old Pomeranian?) and Wiktionary talk:About Old Polish#Regional Old Polish, I'm trying to figure out the best way to handle regional information for Old Polish. I have a document explaining the origin of most texts in Old Polish so it should be easy to figure out which of the 5 lects currently considered Old Polish (those being Masovian, Greater Polish, Lesser Polish, Silesian, and Pomeranian/Kashubian) a given text belongs to. I think it would be useful for readers to know in which region a definition/term has been attested, as Old Polish wasn't a single entity and ultimately is the source of those modern dialects today, so we can see more clearly regional features and the like. My concern about using labels is that they would imply that a term might have been limited to a given lect, which we can't know for sure. What do others think? Vininn126 (talk) 19:17, 6 May 2024 (UTC)[reply]

One solution could be to use {{lb}} but print the text {{lb|zlw-opl|attested in|Masovia|Lesser Poland}} etc. @Benwing2, would this be technically bad? Vininn126 (talk) 15:56, 8 May 2024 (UTC)[reply]
@Vininn126 No, I don't see why that would be an issue. attested in isn't currently a recognized label but could easily be made one, so that it suppresses the following comma. Benwing2 (talk) 23:21, 8 May 2024 (UTC)[reply]
@Benwing2 Alright, that would be fine, and I think that's a good solution. Vininn126 (talk) 07:32, 9 May 2024 (UTC)[reply]
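To make the "suppresses the following comma" behaviour concrete, here is a minimal sketch of how a label list could be joined. It is only an illustration: the set name and the function are assumptions and are not taken from Module:labels.

```python
# Labels after which no comma is printed. Hypothetical list for this sketch;
# the real label data may express this differently.
OMIT_COMMA_AFTER = {"attested in"}


def format_labels(labels):
    """Join labels with ", " except after labels that suppress the comma."""
    parts = []
    for i, label in enumerate(labels):
        parts.append(label)
        if i < len(labels) - 1:
            parts.append(" " if label in OMIT_COMMA_AFTER else ", ")
    return "(" + "".join(parts) + ")"


print(format_labels(["attested in", "Masovia", "Lesser Poland"]))
# prints "(attested in Masovia, Lesser Poland)"
```

Without the special case the same call would give "(attested in, Masovia, Lesser Poland)", which is the display the proposal wants to avoid.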
@Benwing2 Another solution would be to have the quotation templates categorize by dialect when added to a page. This probably would be a bad idea? Vininn126 (talk) 07:44, 9 May 2024 (UTC)[reply]
@Vininn126 Yeah the quotation templates do take a label but I feel uncomfortable categorizing based on that label. You could for example imagine someone illustrating a general-use term with a sentence written in a dialect, and labeling the quotation with the dialect in question; that doesn't mean in this case that the term is in the dialect. Benwing2 (talk) 08:24, 9 May 2024 (UTC)[reply]
@Benwing2 Alright so for now I'm going to add the location of creation of the documents and a note saying what label the quotation template should count toward, see {{RQ:zlw-opl:AcCas}}, and I'll add the labels and regions manually from there. Unless it'd be possible to do a bot job after. Vininn126 (talk) 08:26, 9 May 2024 (UTC)[reply]
@Vininn126 Might be possible, depends on how regular everything is and you making a list of all the quotation templates and associated lect/labels. Benwing2 (talk) 08:36, 9 May 2024 (UTC)[reply]
@Benwing2 I think it might be possible to try. Would it be possible to generate a list of text given in the {{{location}}} parameter? From there I could say which label should be given whenever that region appears under a definition and we could have a bot add those labels. (I might also want to update the output display of a few of the locations). Vininn126 (talk) 11:23, 26 May 2024 (UTC)[reply]
I also think this will help me make a decision with Old Pomeranian. Vininn126 (talk) 11:24, 26 May 2024 (UTC)[reply]
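A rough sketch of what the bot step could look like once the list of {{{location}}} values exists: map each location to a regional label, collect the labels for the quotations under one definition, and emit the label template. The mapping entries and function names below are placeholders for illustration, not the actual list of Old Polish sources.

```python
# Placeholder mapping from a quotation template's location to a regional label.
# The real mapping would come from the list of templates and locations.
LOCATION_TO_LABEL = {
    "Kraków": "Lesser Poland",
    "Płock": "Masovia",
    "Poznań": "Greater Poland",
}


def labels_for_definition(locations):
    """Return (labels, unmapped) for the locations quoted under one definition.

    Order of first appearance is kept and duplicates are dropped; locations
    with no known label are returned separately for manual review.
    """
    labels, unmapped = [], []
    for loc in locations:
        label = LOCATION_TO_LABEL.get(loc)
        if label is None:
            if loc not in unmapped:
                unmapped.append(loc)
        elif label not in labels:
            labels.append(label)
    return labels, unmapped


def label_template(labels):
    """Build the {{lb}} call discussed above, or an empty string if no labels."""
    if not labels:
        return ""
    return "{{lb|zlw-opl|attested in|" + "|".join(labels) + "}}"


labels, unmapped = labels_for_definition(["Kraków", "Płock", "Kraków", "Gniezno"])
print(label_template(labels))
print(unmapped)
```

Keeping the unmapped locations separate means the bot never guesses a region; anything it cannot place is left for a human to label.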
Finally, and sorry to dump a bunch of bot requests - could we remove any text within parentheses in quotations? They're not part of the original text, but used for things like clarification on the editor's end. Vininn126 (talk) 11:57, 26 May 2024 (UTC)[reply]
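As a sketch of the parentheses clean-up, something like the following would strip non-nested parenthesized spans from a quotation's text. The function name is made up, and a real bot run would need to apply this only to the quoted passage, since a blanket substitution would also hit parentheses inside links or template parameters.

```python
import re

# Matches one non-nested parenthesized span plus any whitespace before it.
_PAREN = re.compile(r"\s*\([^()]*\)")


def strip_editorial_parentheses(text):
    """Remove parenthesized insertions and tidy the leftover spacing.

    Nested parentheses are not handled: only the innermost span is removed.
    """
    stripped = _PAREN.sub("", text)
    return re.sub(r"\s{2,}", " ", stripped).strip()


print(strip_editorial_parentheses("first (editor's note) second"))
# prints "first second"
```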
@Benwing2 Can we add this to the labels module? I'm slowly working through these sources and I think this is the best solution. Vininn126 (talk) 12:01, 13 May 2024 (UTC)[reply]

Continental Celtic

We have Continental Celtic as a family, but my understanding is that the consensus among Celticists is that CC isn't a clade but just a term of convenience for Celtic languages other than the Insular Celtic ones. Isn't our custom at Wiktionary to have only actual genetic families, not convenient groupings? —Mahāgaja · talk 11:28, 7 May 2024 (UTC)[reply]

@Mahagaja Yeah we should get rid of this. BTW the Wikipedia article on Continental Celtic was in a terrible state due to a bunch of crap added a month ago, which I reverted. Benwing2 (talk) 22:06, 7 May 2024 (UTC)[reply]
Yeah, agreed. Theknightwho (talk) 04:09, 10 May 2024 (UTC)[reply]
@Mahagaja @Benwing2 I've done this, given there were no objections after nearly 3 weeks. Theknightwho (talk) 11:01, 27 May 2024 (UTC)[reply]

There are already far too many one-descendant Proto-Italic and Proto-Hellenic entries, and adding one descendant redlinks to, for example, a descendant tree or an etymology section is only going to encourage more of these entries being created. These redlinks should be banned. -saph 🍏 13:31, 7 May 2024 (UTC)[reply]

Right, there should be above-average incentive to create such a page, so unless it is already decided to have one, bots should neutralize these links. Fay Freak (talk) 13:51, 7 May 2024 (UTC)[reply]
In practice, what does a 'ban' on making certain kinds of redlinks mean, and what is the alternative it is supposed to incentivize? I guess mentioning the same form but not linking it would be slightly better, as it doesn't encourage creating an entry, but I'm not totally happy with that either in some cases. E.g. if the reconstructed form is itself doubtful, I wouldn't want it to be mentioned anywhere.--Urszag (talk) 15:44, 7 May 2024 (UTC)[reply]
For example:
From Proto-Italic *fworom, from Proto-Indo-European *dʰwor-om (enclosure, courtyard, i.e. something enclosed by the door, or the place outside, i.e. through the door), from *dʰwer- (door, gate).
With the Proto-Italic word displaying as just plain text, rather than what we currently have (forum). As for the reconstructed form being doubtful, we should just list the hypothesised PIE form, e.g.:
As opposed to the current etymology given at serius. -saph 🍏 15:58, 7 May 2024 (UTC)[reply]
The alternative it is supposed to incentivize is not creating such entries. You would have to have a more serious motive than ticking off a removed red link, since they are not apparent in the first place. Fay Freak (talk) 16:03, 7 May 2024 (UTC)[reply]
Agreed. Down the line it may also be worth discussing a general ban of reconstructions (and their associated redlinks) that have only one descendant and no derived terms. Nicodene (talk) 22:49, 9 May 2024 (UTC)[reply]
Could someone run a bot to do this? -saph 🍏 19:50, 10 May 2024 (UTC)[reply]

Add "Muslim", "Hindu" etc. labels?

Proposal to add labels for lemmas used by people of specific faiths (which are not necessarily religious terms; rather, they're only used by certain groups). Case in point: মিঞা (mĩa), which has a Muslim gloss, but the Muslim label is an alias for 'Islam', though it's not an 'Islamic' term, just used by Muslims. Urdu dictionaries, which I concern myself with, have used these labels for centuries without prejudice. I know this would be useful for languages in the Indian subcontinent, as well as European languages (especially English). نعم البدل (talk) 20:55, 7 May 2024 (UTC)[reply]

@نعم البدل There are (at least) two possibilities here. One is to disentangle the labels 'Muslim' and 'Islam' in a language-independent fashion, and the other is to do it for specific languages. I suspect the aliasing of 'Muslim' and 'Islam' was done with English entries in mind, where on the surface it makes a certain amount of sense (e.g. we have 'Muslim finance' as an alias of 'Islamic finance' and 'Christian' as an alias of 'Christianity'). A third possibility is to create a separate label, something like 'Muslim usage' or 'Muslim speakers', which makes it clear that the term is used by particular speech communities. Note that the advantage of doing it in a language-specific fashion is we can create associated categories, such as Category:Muslim Bengali, to categorize such terms, which wouldn't make so much sense if done language-independently. Finally, the adjective-noun issue you're bringing up isn't limited to this case; there is for example the issue of 'British India' (English terms formerly used in British India) vs. 'British Indian' (English terms currently used by Brits of Indian background).
BTW if you think the terms should be disentangled language-independently, you can see all current uses of the label 'Muslim' here: Special:WhatLinksHere/Wiktionary:Tracking/labels/label/Muslim (there are only 9 of them). Benwing2 (talk) 21:58, 7 May 2024 (UTC)[reply]
@Benwing2: I think the 'Muslim' (etc.) tag should be detached from the 'Islam' label and made into an independent label and placed under the Module:labels/data/topical so that, as you say, it can generate associated categories, something like Category:Bengali Muslim speech (similar to Category:English women's speech terms, a minor difference between 'Muslim Bengali' as the label I'm proposing should be shed of its religious connotations as much as possible).
  • you can see all current uses of the label 'Muslim' here – Thank you for this! As far as I can see, apart from marabout, all of the other terms should be placed under my proposed label, as that's what was probably implied. Note how the 'Muslim' tag in মিঞা (mĩa) was encapsulated with Template:a (added by an IP), not the 'Muslim' label – likely because the 'Muslim' label appends the lemma to Category:Islam which doesn't fit. نعم البدل (talk) 02:21, 8 May 2024 (UTC)[reply]
@نعم البدل OK, let's see if there are any objections/comments, and if not I'll make this change in a few days. Benwing2 (talk) 03:04, 8 May 2024 (UTC)[reply]
Yeah no worries! نعم البدل (talk) 17:34, 8 May 2024 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Benwing2, نعم البدل For a while there was a category named CAT:Musalman Gujarati, which is now empty. The handful of terms that were in it were moved to CAT:Gujarati dialectal terms. It would be helpful if there is a category named something like CAT:Gujarati Muslim speech as a replacement for CAT:Musalman Gujarati.

There is a phenomenon known as being a Cultural Muslim, but not a practising Muslim, who might use the terms in a category such as CAT:Muslim speech but not necessarily identify with the terms in CAT:Islam. The same would probably be applicable to other faiths.

Would greetings such as salaam alaikum that are associated with a Muslim context but may or may not be intended to be Islamic be in the proposed CAT:Muslim speech alongside CAT:Islam? For this particular term, it says on Wikipedia that it is ‘common among Arabic speakers of other religions (such as Arab Christians and Mizrahi Jews)’. The usage notes section of नमस्ते says ‘it is often considered gracious to greet someone in their religion’s greeting’ [even if that differs from their own religion]. Kutchkutch (talk) 03:44, 10 May 2024 (UTC)[reply]

@Kutchkutch: I might be drifting away from the subject a little since I'm a little interested in this :) The case with salaam alaikum is slightly complex, though. In Arabic, it's a common greeting, and used by people who follow Abrahamic faiths. I'm not really sure about the exact perception of that phrase in Arabic but in Urdu, it's sometimes the same, people who speak Urdu, regardless of their faith, might use that term, but some hardliners might be of the opinion that it's even forbidden to say 'Salam' to a non-Muslim, while other Muslims might not even bat an eye to the other's faith, and a label might not even be considered. Generally, I would say it applies to a CAT:Muslim speech (but not Category:Islam) because of alternatives like آداب (ādāb) being considered more 'neutral'. Is नमस्ते (namaste) considered to be inherently a Hindu phrase, as is generally the perception of Urdu speakers – even when it comes to Hindi, or is it somewhat neutral? نعم البدل (talk) 01:44, 11 May 2024 (UTC)[reply]
@نعم البدل: Thanks for the clarification about سَلام عَلَیکُم (salām 'alaikum).
  • Is नमस्ते considered to be inherently a Hindu phrase, …when it comes to Hindi, or is it somewhat neutral?
  • With respect to this proposal, नमस्ते and नमस्कार could go in CAT:Hindi Hindu speech. However, there is inherently nothing Hindu about the words नमस्ते and नमस्कार in and of themselves other than Sanskrit being the liturgical language of Hinduism (similar to how Arabic is the liturgical language of Islam). What may be considered inherently Hindu/Buddhist/Jain/Sikh about नमस्ते and नमस्कार is when the salutation (and related hand gesture 🙏) is toward a deity rather than an actual person.
  • Although the words नमस्ते and नमस्कार are found in Vedic literature in the context of worshipping Hindu deities, the words themselves are formations derived from नमस्, which is cognate to نماز. The term نماز was probably associated with Zoroastrianism rather than Islam before the Islamic conquest of Persia, and this is indicative that the term was not inherently bound to a particular religion.
  • Even though नमस्ते and नमस्कार are considered Hindu greetings, it seems to be neutral when speaking Hindi because it may only be inappropriate to use them if both the speaker and listener belong to a community that has its own community-specific greeting such as सलाम अलैकुम (salām alaikum) among Muslims, जय जिनेंद्र (jay jinendra) among Jains and सत श्री अकाल (sat śrī akāl) among Sikhs. The reason for this may be that India is 79.8% Hindu (according to the 2011 census). If there are no overt indicators to guess the other person’s religion when talking to strangers, using the Hindu greetings (alongside the English greetings) may be considered as neutral since there is an 80% probability that the other person is a Hindu.
──────────────────────────────────────────────────────────────────────────────────────────────────── Thinking about the "Judeo-Urdu" vs "Urdu Jewish speech" question has me wondering how sensible labels/categories for "Jewish speech", "Muslim speech" etc really are...though I don't know what a better alternative is.
In theory, if we add a "Jewish speech" label, all of our entries in any Judeo-X lect, whether we treat it as a distinct language ("Judeo-Italian", "Judeo-Tat") or a dialect ("Judeo-Arabic"), could simultaneously gain this new "Jewish speech" label, by definition, no? Arguably a bulk of our Hebrew and Yiddish entries would also gain it. But is that useful? (So maybe, for such languages, we forgo the label? But where is the line? Do we have a Hindu-speech Hindi category, or do we assume the default for Hindi is Hindu and exceptions must be specified? But even more Bangladeshis are Muslim than Indians are Hindu...)
On a practical level, I worry that users will not grasp or maintain a distinction between "Muslim" and "Islam", because "it's mostly Muslims who use [such-and-such Islamic-religion-related word]" → "I'll label it 'Muslim'" is just too logical a train of thought, as is "only people who believe in Islam use this word" → "I'll label it 'Islam' [even if the word just means 'man' and not a per se religious concept]", so it'll be a perpetual maintenance task to keep the labels straight. I also wonder... would nonreligious people from 'traditionally Muslim' areas not use (e.g.) মিঞা? Is using মিঞা really bound up in being Islamic, something only Muslims do—and if so, why is it not then a {{lb|en|Islam}} term? I wonder if this is not better handled (like also the ostensible "English women's speech terms")* by usage notes, that a particular term is typically used by people from "culturally Muslim" communities...? But I concede that even usage notes imply that if there are many such terms, they could be in a category (and if something is a dialect, even a cultural dialect, well, we do often have labels for that), even if I wonder if it would be possible to find clearer wording ("culturally Muslim"?)...
Does Judeo-Urdu, in particular the spoken form, have a lot of words in common with "general" Urdu? Is the main distinction that Judeo-Urdu is Urdu written in Hebrew script, or are there pervasive "dialectal" differences e.g. in how vowels are pronounced or how words inflect? If, in speech, lots of words are held in common between Jewish speakers' Urdu and other people's Urdu, then it might be weird to call those common words "Jewish speech" words solely because Jews tend to use a different script in writing, no? (Conversely, if lots of words are different and Judeo-Urdu is its own lect, whether an independent language like Judeo-Italian or a dialect like Judeo-Arabic, is there much benefit to categorizing things as both "Judeo-Urdu" and "Urdu Jewish speech"? But as I said above, I suppose we could set it up so that the labels that would generate "Urdu Jewish speech" and "Judeo-Urdu" were aliases for Urdu and only generated one of those categories no matter which one was entered...)
This all seems...thorny.
(*Re "English women's speech": "Women's speech" does not seem to be a distinct lect in English the way it is in e.g. Sumerian ... but until very recently, the nature of our label- and category- system meant that any label that a single small language needed, had to be put into the singular big mishmash everyone saw presented as "labels that are available to all languages", so various people tried to find ways to apply myriad Sumerian-and Chinese- etc- specific labels to English... so I am tempted to change the few entries that use that label in English to instead have usage notes, where such notes/label would even be accurate, saying mostly women use it... and to restrict the label to only those languages which actually have distinct women's speech registers...) - -sche (discuss) 19:25, 12 May 2024 (UTC)[reply]
@-sche You've made a lot of good points. I agree with you about converting the terms in CAT:English women's speech terms into Usage notes. Interestingly, all but one are primarily used in foreign contexts; possibly the native languages in those contexts do have a women's register. And the one remaining (bestie) seems questionable; it has a cutesy feel to it, which is probably why it's being considered "women's speech" but I'm sure you can find examples of men using it. As for Judeo-Urdu etc. I think any variety that is predominantly used by a particular ethnic community should probably not be redundantly tagged using that community's speech tag. Hence you could have e.g. "Sikh speech" or "Jain speech" in Hindi but not "Hindu speech". Same goes e.g. for Yiddish and Ladino being tagged as "Jewish speech". Benwing2 (talk) 21:31, 12 May 2024 (UTC)[reply]
Sure we should have the labels. In other cases they are or have been even L2, as Christian Palestinian Aramaic and Jewish Palestinian Aramaic, the situation with information exchange in the past was of course more seclusive. For Arabic we agreed that the separate codes were exaggerated, but there were cases where I had to label Moroccan Arabic terms as Jewish-only, and even for Serbo-Croatian we have terms only working for Muslims.
Having to label most Hebrew, Yiddish and Ladino terms as Jewish is a strawman, Arab Israelis do great, of course etymologically many terms used by the minority will have foundation in the historical religion of the majority, the question is whether the respective frequency differs significantly. Fay Freak (talk) 21:51, 12 May 2024 (UTC)[reply]

Englishman picture

So User:Shoshin000 (among other trollish activities) has been insisting on adding a picture of an angry football hooligan as the picture of "Englishman". I reverted it once, he restored. I mention this because I know the modus operandi and soon I'll be accused of being a badmin. Check out the entry and you know the previous picture was nicer. Equinox 22:47, 7 May 2024 (UTC)[reply]

I personally think your picture is better (although I wonder, do we need a picture to illustrate this?). Benwing2 (talk) 00:00, 8 May 2024 (UTC)[reply]
Honestly, I like Shoshin's pic, as it's more stereotypical.[1] There's nothing inherently Englishman-y about Eq's pic, besides the depicted person being English.
[1] Then again, that's a good argument against the pic. CitationsFreak (talk) 03:25, 8 May 2024 (UTC)[reply]
It could be argued that pictures of nationalities, if they exist at all, should show someone of that nationality in characteristic clothing (although that is probably more appropriate for nationalities that actually have characteristic clothing that most people wear on a day-to-day basis). OTOH it's in general very hard to capture a nationality in a single picture (for this reason, Wikipedia usually supplies a whole collection of pictures to illustrate a nationality), and in any case this is more encyclopedic than dictionaric (a real but rare word). Benwing2 (talk) 04:06, 8 May 2024 (UTC)[reply]
Yeah, I was thinking that a collage would be best. I'm not sure what a recognizable British outfit would be, and having one person stand in for Britain could imply that British people all are X. Highly unlikely, but possible. CitationsFreak (talk) 04:12, 8 May 2024 (UTC)[reply]
I don't think nationalities should have photos at all, but I also disagree that File:ENG-BEL (6).jpg is "a picture of an angry football hooligan". The person in that photo doesn't look angry, nor is he doing anything hooliganish. His Englishness is clearly shown by the St George's Cross painted on his face. He arguably does illustrate [[Englishman]] better than the photo of Greg Rutherford, since Rutherford is representing the entire UK (not just England) in his photo. All that said, however, it is probably better to leave such entries unillustrated to avoid stereotyping. —Mahāgaja · talk 08:07, 8 May 2024 (UTC)[reply]
I agree, this is not an entry that requires an image. Vininn126 (talk) 08:09, 8 May 2024 (UTC)[reply]
Aren't photos appropriate where there is an attestable, probably dated and often derogatory or demeaning, definition of a stereotype? Eg, Bavarians with lederhosen, Prussians with spiked helmets, Mexicans with sombreros and/or serapes.
There is no such definition here, nor would I expect us to attest any such definition. DCDuring (talk) 17:34, 8 May 2024 (UTC)[reply]
I don't really see this picture as a problem, really, even though I wouldn't pick it myself. It'd probably be fine as part of a collage. Theknightwho (talk) 17:51, 8 May 2024 (UTC)[reply]

Fixing Telugu rhymes

For years now, User:Rajasekhar1961 has been adding Telugu rhymes written in Telugu script instead of IPA. There is a special hack in Module:rhymes to deal with this, but IMO Telugu should (obviously) use IPA for rhymes, just like all other languages. Does anyone object to this? Can anyone out there read Telugu script well enough to tell me if the rhymes listed under Rhymes:Telugu (e.g. Rhymes:Telugu/రం) and Category:Rhymes:Telugu are even salvageable, or should just be nuked? I don't know much about Telugu but scripts are generally not 1-to-1 mappable to IPA, so I don't know what it means to have a rhyme listed using Telugu script. Benwing2 (talk) 00:40, 8 May 2024 (UTC)[reply]

Strongly agree. Theknightwho (talk) 11:40, 8 May 2024 (UTC)[reply]
@Benwing2, Theknightwho Rajasekhar1961 has certainly put effort into creating CAT:Telugu rhymes. However, unless the definition of a rhyme in a Telugu or Dravidian context differs from
‘the second part of a syllable, from the vowel on, as opposed to the onset’
you are correct in pointing out that these do not appear to have been done correctly. From an orthographic perspective, the final consonant (or consonant cluster) followed by the final diacritic (or the inherent schwa) of a word written in Telugu script (which is a Southern Brahmic abugida) does not constitute a rhyme. The entries in CAT:Telugu rhymes categorise words by word-final syllables rather than rhymes because the onset is included.
A Telugu editor could probably rectify the words mentioned on the entries in CAT:Telugu rhymes. However, even if there is a user with the appropriate background to do so, it would be a lot of work, and it would be the equivalent of deleting the entries currently in CAT:Telugu rhymes and starting over again. Kutchkutch (talk) 11:13, 10 May 2024 (UTC)[reply]
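To illustrate the distinction drawn above between a rhyme and a word-final syllable, here is a small sketch that takes a syllable in a broad IPA-like transcription and drops the onset. The vowel inventory is a simplified assumption, and the sample transcriptions are illustrative rather than verified Telugu IPA.

```python
# Simplified vowel inventory, assumed for this sketch only.
IPA_VOWELS = set("aeiouɑɐəɛɪɔʊæʌyøœɯɤɨʉ")


def rhyme_of_syllable(syllable):
    """Return the part of the syllable from its first vowel onward.

    Returns an empty string if no vowel is found (e.g. a syllabic consonant),
    so such cases can be flagged for manual handling.
    """
    for i, ch in enumerate(syllable):
        if ch in IPA_VOWELS:
            return syllable[i:]
    return ""


print(rhyme_of_syllable("ram"))  # prints "am": the onset "r" is not part of the rhyme
```

On this definition, a page keyed on a whole final syllable such as "ram" groups words more narrowly than a page keyed on the rhyme "am" would.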
@Kutchkutch Thanks. User:Rajasekhar1961 can you comment on why you did this? If I don't hear from you in a few days I will go ahead and delete all the Telugu rhymes. Benwing2 (talk) 14:50, 10 May 2024 (UTC)[reply]

Kwami is messing with translingual entries, again

Just want to make sure there are some eyes on Kwami, as they've been making mass edits to Translingual entries that seem... worrying. After being reverted by @Theknightwho and @Benwing2 for deleting the translingual section, Kwami has recently begun deleting all the definitions from the translingual section instead.

I reverted all (but one) of the single character edits they've made today. However, they've been editing hundreds of TL entries and I have no idea how many entries are affected, as I've been very busy recently and can't check.

I'm not sure how bad the situation is so I don't want to "call out" Kwami. Just want to make sure people are aware before it becomes out of hand, like the last time this was discussed on here. — Sameer مشارکت‌هابحث﴿ 23:54, 8 May 2024 (UTC)[reply]

@Sameerhameedy Thank you. I have blocked him for a month this time; I am getting seriously sick of this. I think he has used up all his lives; next time we should consider a permablock. Benwing2 (talk) 00:38, 9 May 2024 (UTC)[reply]
Thank you, I'm also a bit annoyed since Kwami has gotten so many warnings and continues to do the same action. Now, Kwami has indicated that they will actually start a discussion on this issue before acting. There's no way to know if Kwami will actually follow through on that statement, but hopefully they do, so we don't have to do this every month. — Sameer مشارکت‌هابحث﴿ 00:51, 9 May 2024 (UTC)[reply]
Just to clarify, these weren't random articles. I went through the whole Latin Extended Additional block and replaced physical descriptions (e.g. "the letter N with a line below") with requests for definition. I didn't delete actual definitions that would tell the reader what the letter meant or what it was used for.
Sameer, the discussion is the next thread. kwami (talk) 06:16, 10 May 2024 (UTC)[reply]
@Kwamikagami That is exactly the issue. You are continuing to fail to see that there is no consensus for doing what you did, after 10+ times that you've been asked to get consensus *BEFORE* doing mass changes. If you're not seeing this now, I doubt you will ever see it, and if you're not willing to defer to and respect consensus, you're in for a permablock. Benwing2 (talk) 08:01, 10 May 2024 (UTC)[reply]
@Benwing2, Sameer was concerned that there may be many more such edits, so I clarified what edits I had made. That included the category of articles I had edited, and the kind of edits I had made on them. I thought they might find that helpful.
As to your point, I wonder how possible it is to get consensus to do anything here. Hopefully the discussion below will produce consensus. My hopes aren't high, given that previous discussions got nowhere, but you never know. kwami (talk) 09:06, 10 May 2024 (UTC)[reply]
I've been at Wiktionary for almost 20 years and have never yet seen a Beer parlour discussion result in consensus, so my hopes aren't high either. —Mahāgaja · talk 09:16, 10 May 2024 (UTC)[reply]
You can't have read many Beer Parlour discussions, then. Kwami is simply trying to convince themselves that what they're being asked to do is impossible, because they can't ever accept they're wrong about anything, ever. It's not complicated. Theknightwho (talk) 10:57, 10 May 2024 (UTC)[reply]
@Mahagaja I have seen plenty of Beer Parlour discussions that result in consensus; not sure what you're referring to. Benwing2 (talk) 14:53, 10 May 2024 (UTC)[reply]
I guess ironically there's no consensus if there's consensus? And while I think us Wiktionarians like to bicker and we often disagree over certain details and such, I do think there's enough cooperation, compromise, and agreement to say that plenty of threads end in consensus. Vininn126 (talk) 15:10, 10 May 2024 (UTC)[reply]
If I decide that "q" is not a proper English letter unless followed by "u" and I want to get rid of all the English entries with a "q" not followed by "u", there is no way that I can get consensus for that via any process. That doesn't mean that I can go ahead and remove the English entries for words like Qatar and Iraq or even BBQ (it's not a proper abbreviation) because the usual process doesn't work. It means I should find something else to do. The unwritten question underlying all of this is "how can I get my way when I'm right and I can't get people to say they agree with me". Yes, the process isn't perfect, and sometimes doesn't work- but rejecting it entirely won't fix it. Chuck Entz (talk) 16:16, 10 May 2024 (UTC)[reply]
That's why I'm here. The question is straightforward: do we have standards for what counts as a definition? If so, what are they? Where can I find them?
In this case, does a graphic description count as a definition? Quite a few editors have said it does not, but there seems to be difficulty in implementing that.
Also, should we have a translingual section without providing evidence of translingual use? Especially when there is no definition in that translingual section?
Do we have consensus that such things should be tagged with RFDef or RFD, and how should I respond if I tag them and someone goes through and deletes the tags without discussion because they don't like the extra work?
It's fine to say 'go to RFD', but why spend months doing that if it should be obvious from the outset that they're not going to pass? That's a waste of everyone's time. That's why I'd like some concrete standards to follow. I assume Wikt must have standards; if you could just show me where they are (I don't see anything in the help pages), I could add a link to my user page and refer to them when making edits. Then instead of arguing over every edit, I could point to the standards and show that I've been following them, or they could point to them and show that I've been violating them. I don't mean about the RFD process, but about the content of our articles. kwami (talk) 19:37, 10 May 2024 (UTC)[reply]
An example to provoke thought is zebra. A definition as 'an equid with prominent black and white stripes' would be an accurate description, and would work even if they were not a clade. (An early cladistic study concluded that they were not - morphology is a poor guide to details of relationships.) For Unicode characters - and Unicode provides an important high level classification of glyphs - the general principle is that combining marks are distinguished on the basis of shape. Therefore, graphic descriptions are quite relevant for 'precomposed' characters.
If there's an objection to a claim of translinguality, then raise a request for verification.
If tags are deleted without reason, then complain to the Beer Parlour if you can't find a helpful admin.
What may seem obvious to you is not necessarily even true. There are some very obscure characters around, quite possibly restricted to expensive books.
Wiktionary has a general disdain for wikilawyering, so don't expect everything to be laid down. Wiktionary also seems quite bad at documenting things - I'd love a guide on the anatomy of a definition. --RichardW57m (talk) 12:56, 13 May 2024 (UTC)[reply]
@RichardW57m For physical objects, a physical description is fine. And I think it would be fine if we had "the letter a with an acute accent, used for ...," where we went on to give its use or meaning. And a short description without definition would be fine under a 'description' section or even 'etymology', assuming it's accurate (many Unicode names are not, they're just labels that need to differ from all others). But these cases are like defining 'zebra' as "a word spelled Z-E-B-R-A", and placing it under 'translingual' because there are multiple languages that have such a word, even if they mean different things (maybe some languages use it only in the sense of a crosswalk, others only for the equid), and give it only the English pronunciation /ˈziːbrə/ because English is the most important language. (Yes, we have characters listed under 'translingual' even though we give them the pronunciation of a particular language.)
Many people have now said that the Unicode name of a character or emoji, or similar sum-of-parts description, is not appropriate on its own as the definition. The problem I've had is trying to implement that. I've been told to take it to RFD, but that generally doesn't work. It would be nice to have some agreement as to what counts as a definition. kwami (talk) 18:19, 13 May 2024 (UTC)[reply]
"the letter a with an acute accent, used for ...,"

I don't know how many languages change the meaning of an accent per character. It would be much more efficient to say "the letter a with an acute accent" with the page acute accent explaining what the diacritic does. — SAMEER (؂؄؏) 18:49, 13 May 2024 (UTC)[reply]
I'd be fine if we had consensus on doing that, but would think we'd want something more. Such a description often wouldn't say anything more than the Unicode name, and conflicts with our sum-of-parts criterion. Plus, people often just use the Unicode name even when it's not an accurate description. Another problem is that it creates a blue link that makes it look like we have a definition when we don't have a functional one. For people like me who use red links as a guide to creating missing articles, that can be a problem. Also, not all Unicode characters actually exist, some are errors. And some are rare enough that giving a graphical description as a definition doesn't do much for the reader, who may still not know what the character is used for or what languages it appears in. kwami (talk) 19:10, 13 May 2024 (UTC)[reply]
The problem I've had is trying to implement that. I've been told to take it to RFD, but that generally doesn't work. - the small problem with this is that you didn't take things to RFD, though, and it won't become any more true just because you keep repeating it. It's very clear that Kwami will never, ever understand why their approach is wrong. Theknightwho (talk) 18:53, 13 May 2024 (UTC)[reply]
I've taken dozens, possibly hundreds, of articles to RFD, or at least tag them so that the people at RFD respond to them. I don't know why you keep denying that. You repeating things over and over doesn't make them true either. Repeated false accusations like this are one reason I have a hard time accepting that you act in good faith. kwami (talk) 19:01, 13 May 2024 (UTC)[reply]
@Kwamikagami I've found one, plus this which isn't relevant. Are you referring to all those entries you mis-tagged with {{d}} (speedy deletion), which exists to avoid doing the RFD process for routine deletions? I've just remembered that, after I found this comment where you try to bullshit about that, as well. Good grief. Theknightwho (talk) 19:22, 13 May 2024 (UTC)[reply]
Yes, the POV of anyone who disagrees with you is "bullshit", while your POV is "truth". Again, arguing in bad faith as you habitually do.
I don't recall which abbreviation of which template I used for which article. Some of them were probably speedies. Some were RFD. Some later on were RFDef (which I know you think is somehow illegitimate, but I maintain is still a valid use of process). kwami (talk) 20:17, 13 May 2024 (UTC)[reply]
@Kwamikagami Saying I've taken dozens, possibly hundreds, of articles to RFD after taking two is bullshit, yes. Theknightwho (talk) 20:23, 13 May 2024 (UTC)[reply]
@Kwamikagami Replacing a definition with {{rfdef}} is not a valid process. In general you should never delete content even if you don't like it. Benwing2 (talk) 20:40, 13 May 2024 (UTC)[reply]
I understand that. I meant that rfdef itself is a valid process. kwami (talk) 20:48, 13 May 2024 (UTC)[reply]
RFDef is not a process - it's just the request template {{rfdef}}. It can't be used instead of RFD, and what you've just said makes absolutely no sense when the only times you used RFDef were in an attempt to circumvent the RFD process. Theknightwho (talk) 20:55, 13 May 2024 (UTC)[reply]
If RFD doesn't work, then RFDef is another possibility. If no-one can furnish a definition, then there's an argument that the entry should be deleted. How is that "circumventing" the process? Again, you attribute bad faith to anything you don't like, which simply shows bad faith on your part. kwami (talk) 21:01, 13 May 2024 (UTC)[reply]
@Kwamikagami If RFD doesn't work You've only ever taken two things to RFD, and one of those wasn't an entry, so you cannot possibly make that claim. You also seem to be under the bizarre impression that you're entitled to delete entries in other ways if you don't get what you want out of the RFD process.
I'm out of patience with this complete and utter refusal to understand the problem, and it's pretty clear other people are as well. Theknightwho (talk) 21:08, 13 May 2024 (UTC)[reply]
A few dozen articles were deleted, so somehow your count is off.
Not deleting entries in other ways, requesting completion or deletion in other ways. I didn't delete these entries I was blocked for. I replaced non-definitions with requests for definition -- and I won't do that again -- but the entries remained. kwami (talk) 22:02, 13 May 2024 (UTC)[reply]
@Kwamikagami There's nothing wrong with my counting: you only brought two things to RFD. It's not difficult to understand. Theknightwho (talk) 22:13, 13 May 2024 (UTC)[reply]
To clarify, it looks like @Kwamikagami only brought two articles to RFD, but tagged a bunch of articles for speedy deletion, which were deleted before the admins realized those speedy deletions were bogus. RFD is in general the correct process for requesting deletion of pages you believe ought to be deleted that don't meet the speedy deletion criteria; but IMO if you are going to request deletion of a large number of articles, you should not tag every article with RFD, just make a post in RFD with a title "all articles meeting such-and-such criteria" and give your reasons. Or alternatively, bring it up in the Beer Parlour if it's controversial and merits being seen more generally. Benwing2 (talk) 22:31, 13 May 2024 (UTC)[reply]
Precisely. Theknightwho (talk) 22:32, 13 May 2024 (UTC)[reply]
Okay, so that's what I'm doing: bringing it up at the Beer Parlour, as I was advised to do.
But if it's better at RFD than here, I can start a thread there.
As for "bogus", the opinion I got from other editors at the time was that, if an article had no real content, then it met the criteria and it should be deleted. That specifically included articles that consisted of nothing but the character box and Unicode name in the definition section. I didn't start this, but picked up from where I saw others acting. This has been happening for years, especially with emojis, where someone would go through and create batches of emoji articles defined simply as their unicode names, then someone else going through and deleting them, then someone else recreating them, etc. You can see that in their deletion histories. Less common with non-emoji characters, but there's a history of this there as well. So not only did I have no reason to think this was inappropriate, I was told it was appropriate and was something Wikt needed to keep on top of. kwami (talk) 00:45, 14 May 2024 (UTC)[reply]
Yes, and those people would have expected you to take any entries like that to RFD, instead of unilaterally deleting them, as there needs to be consensus that they contain no content. How are you still failing to understand such a simple concept? Theknightwho (talk) 10:42, 14 May 2024 (UTC)[reply]
And adding {{rfc|mul|Need meaning rather than graphical description}} to an existing entry is a valid process. --RichardW57m (talk) 14:12, 16 May 2024 (UTC)[reply]
Thanks. kwami (talk) 19:41, 16 May 2024 (UTC)[reply]
What he wrote was "I've taken dozens, possibly hundreds, of articles to RFD, or at least tag them so that the people at RFD respond to them". Bickering about numbers is not constructive. The issue is whether or not he (or anyone) should be wholesale deleting translingual definitions and it seems pretty clear that the answer is no. —Justin (koavf)TCM 23:20, 13 May 2024 (UTC)[reply]
I didn't see that as deleting definitions. I replaced Unicode names with requests for actual definitions. I won't do that again. kwami (talk) 00:47, 14 May 2024 (UTC)[reply]
I appreciate that you're acting in good faith and trying to do what you think is best, but it's not clear to me that you're correct and I'm honestly very surprised that you keep on doing things that seem like big unilateral changes without consulting others first because this kind of complaint has come up repeatedly. —Justin (koavf)TCM 00:59, 14 May 2024 (UTC)[reply]
Yes, it has. I need to be more careful. I could've just tacked the RFDef tag on the end of the description. kwami (talk) 03:31, 14 May 2024 (UTC)[reply]
@Kwamikagami That's not the right thing to do either. IMO you need to get consensus *BEFORE* making changes to large numbers of pages, even just adding {{rfdef}}; if you can't get that consensus, don't make the changes. Benwing2 (talk) 03:44, 14 May 2024 (UTC)[reply]
Okay. We'll see how the question below pans out. kwami (talk) 03:45, 14 May 2024 (UTC)[reply]
The letter 'a' with a grave accent is a translingual description. The particular meaning of the grave obviously varies from language to language, and sometimes it may merely be an arbitrary diacritic, perhaps even just a word diacritic as in French. Translingually, the accent part is often used as a tone mark, but I think it can also be a symbol used as an input to stress assignment rules (though I may be being confused by my own private invention). Perhaps we need a vote on whether precomposed characters can be dismissed as sums of parts.
We even have some interesting language assignment questions. For example, 'ṉ' denotes the alveolar nasal of Tamil and Malayalam, but I think it's arguable that it isn't a letter of those languages. I'm a bit bothered by a pair of Lithuanian diacritics which don't seem to be any part of standard Lithuanian. --RichardW57m (talk) 12:41, 14 May 2024 (UTC)[reply]
Yes, we define 'ṉ' as the Latin transliteration of letters in Tamil and Malayalam scripts, and place that under a translingual heading rather than under Tamil or Malayalam headings. I think that's appropriate. For á, we don't have a translingual heading, as no-one has come up with a translingual use/definition. Not saying one doesn't exist; e.g. there's IPA [a] with high tone, but as you say that's SOP. Again, I think that approach is appropriate. The question then is whether we want to make this our general approach, and reserve the translingual section for definable translingual uses. (Perhaps if we had a translingual use of 'á', we might also give the SOP use in IPA for clarity, but that wouldn't be enough to create the translingual section in the first place.) kwami (talk) 18:12, 14 May 2024 (UTC)[reply]
In favor of this. Vininn126 (talk) 09:41, 10 May 2024 (UTC)[reply]

Do descriptions count as "definitions"?

I'm not being facetious here. This is a serious question for something I haven't understood for a long time.

For instance, in the article á, would "the letter a with an acute accent" be a valid definition? If so, should such descriptions be added to all letters? If not, should they be removed (perhaps placed under a "Description" heading instead)? And if not, and the only material for an article is such a non-definition, should the entry be tagged as needing a definition, or the article tagged for deletion for having no content?

I suspect that if I were to add a definition to cat as "the word spelled C-A-T", I would be blocked for vandalism. I don't see any meaningful difference between that and defining á as "the letter a with an acute accent". I've been told this is a straw-man argument, but I really don't understand what's appropriate in our entries if graphical descriptions are allowed as actual definitions.

The same applies to emojis, of course. Should an emoji of a face with tears be defined as "a face with tears", or should the definition be what it means and what it's used for? kwami (talk) 00:29, 9 May 2024 (UTC)[reply]

@Kwamikagami I agree that "the letter a with an acute accent" is not a good definition. If á had a translingual section it should at least explain how the letter is typically used across languages. I assume it usually represents some kind of /a/? Ioaxxere (talk) 01:43, 9 May 2024 (UTC)[reply]
A definition that depended on users understanding IPA, however, would be unsatisfactory. DCDuring (talk) 01:56, 9 May 2024 (UTC)[reply]
Personally, I don't see the point of a translingual section, except for things like the IPA or IAST transliteration, and this particular case would only be sum-of-parts in such cases.
But my question is what should be done with articles that have such non-definitions. Since starting this discussion, I was blocked for pasting [rfdef] tags yesterday on a bunch of articles in place of such descriptions.
So,
  1. since it is not a good definition, can it be removed?
  2. should it be replaced with a request for definition, or should the empty section be deleted?
kwami (talk) 01:56, 9 May 2024 (UTC)[reply]
@Kwamikagami: I think it's better to improve the definition rather than adding a ton of {{rfdef}}s as this creates lots of work for other editors. Ioaxxere (talk) 02:09, 9 May 2024 (UTC)[reply]
I agree, and I've been doing that where I can. But how do I improve the definition when there is no definition? What is the translingual definition of a letter that does not have translingual use? What is the definition of a letter that has no evidence of any kind of use? What I've been tagging are cases where I can't find any definition to provide.
The reason I've been adding rfdef tags is that I'm not allowed to delete empty entries.
So, if there's an empty article or section, one that has no content except a character box, and no definition of what's supposedly being defined, what's the solution? Do we leave it as a joke, or do we try to improve it? If we want to improve it, how do we do that, when there's no available data to improve it with?
If someone added a bunch of articles, all with the definition being "it's a word", shouldn't we at the very least tag them as needing actual definitions, even if that creates work for people? kwami (talk) 02:17, 9 May 2024 (UTC)[reply]
I've gone through and added definitions to hundreds of these articles. The ones I've tagged are ones where I can find no definition to give. It's a choice of adding a tag or leaving Wikt looking like a joke. kwami (talk) 02:24, 9 May 2024 (UTC)[reply]
Clearly, most definitions are simple descriptions, prototypically, for nouns, having a hypernym and differentia. The descriptions are also supposed to be useful to users. "The word spelled C-A-T" is not useful being redundant to the graphic representation of the headword itself, thus a straw-man. For a Latin letter with a diacritical mark it might be useful for some that the definition explained how the so-marked letter differs from the Latin character without the mark or those with other marks in each relevant character set.
A definition of a word naming an emoji might include a description as well as what the emoji is understood to mean, like a good definition of green light. The entry might also have the appropriate image, too, constituting an ostensive definition, redundant to the headword in the case of Latin characters diacritically marked. DCDuring (talk) 01:56, 9 May 2024 (UTC)[reply]
> "The word spelled C-A-T" is not useful being redundant to the graphic representation of the headword itself"
But "the letter a with an acute accent" is equally redundant to the graphic representation of the headword itself. So no, it's not a straw-man, it's reductio ad absurdum.
Because our users include many people who are not familiar with diacritical marks on Latin letters we give them an explanation of what they are looking at. Those descriptions also help those who, like me, don't have great vision and can't necessarily discriminate among the various diacritical marks and don't necessarily know the names of those marks. DCDuring (talk) 12:29, 9 May 2024 (UTC)[reply]
I have no problem with that. That's what the 'description' section is for. But it's not a definition.
We also have a pronunciation section for people who don't know how to pronounce a word. But again, the pronunciation of a word is not its definition.
In many cases I've moved the description to a description section, and tagged the definition section as needing a definition. But then people get annoyed that I'm creating work for them, because now they're expected to treat Wikt as an actual dictionary.
Much of the opposition to improving articles seems to center around it being more important for Wikt to be correctly formatted and to look good, than for it to actually contain any content or be useful as a dictionary. kwami (talk) 21:58, 9 May 2024 (UTC)[reply]
Definitions of nouns are not descriptions of the word, but of the meaning of the word. Orthography and pronunciation belong in other sections: they are not the definition itself. Why should graphemes (incl. emojis) be different? Many have (graphical) description, etymology and pronunciation sections. Those cover the description of the letter as a mark on paper or as said. A definition concerns itself with meaning. If no meaning is provided, so the reader can't tell what the symbol is for, then we're not providing a functional definition. kwami (talk) 02:00, 9 May 2024 (UTC)[reply]
BTW, we also have cases of 'translingual' sections with zero evidence of translingual use. Sometimes letters are specific to a particular language, yet they have a translingual section with no definition. All that does is push the actual definition down lower on the page, where you can't see it without scrolling. That is a minority situation, but we do have hundreds of articles with "the letter a with an acute accent"-type descriptions as their 'definition'. kwami (talk) 02:10, 9 May 2024 (UTC)[reply]
I guess my question is, if it's not acceptable to add these fake definitions, and it's not acceptable to tag them for improvement, why is it not acceptable to delete them? kwami (talk) 02:29, 9 May 2024 (UTC)[reply]
@Kwamikagami What's not acceptable is deleting them without going through RFV or RFD. Come on, we've said this many times by now. Theknightwho (talk) 02:36, 9 May 2024 (UTC)[reply]
We have the similar precedent of ligatures, an explicitly permitted 'part of speech'. Now, for त्र (tra), we have the reasonable definition "In Devanagari an irregular ligature of त and र". For characters, we sometimes benefit from having the composition pointed out. Of course, it's rarely as useful as that of ģ (g with cedilla). --RichardW57m (talk) 14:51, 16 May 2024 (UTC)[reply]
In the case of त्र, since we have a Translingual entry, what is the point of the Dhivehi, Hindi, and Marathi entries that don't say anything different? —Mahāgaja · talk 14:56, 16 May 2024 (UTC)[reply]
I suspect it's that nastiest of translingual issues, pronunciation, which differs between languages. --RichardW57m (talk) 17:24, 16 May 2024 (UTC)[reply]
I agree that in cases like त्र, a translingual entry makes the most sense. In many cases we will have language-specific material to add as well, but here even the pronunciations are sum-of-parts. Though the reader might not be sure of that if we don't spell it out. kwami (talk) 19:56, 16 May 2024 (UTC)[reply]
Divergent pronunciations can be seen with the ligature ज्ञ (jña); we have documented the divergence. --RichardW57m (talk) 14:06, 23 May 2024 (UTC)[reply]
That and ksh are special cases, and are unpredictable graphically as well as phonetically. Most consonants + r are just sum-of-parts /Cr/. But that's the opposite question, on whether we should add predictable language-specific entries. kwami (talk) 20:58, 23 May 2024 (UTC)[reply]
DCDuring, I kinda agree. People may not know what the diacritics are called, and I find it helpful that translingual sections show variations of the same character across all languages. Kwami has, in many cases, moved the character variations template to a specific language. This is problematic, as it makes no sense to put translingual Latin character variations under, say, Latvian as Kwami decided to do here. Since, y'know, I doubt Latvian has every Unicode variation of 'g' and every Unicode character with a cedilla in its alphabet. — SAMEER (؂؄؏) 17:52, 16 May 2024 (UTC)[reply]
Our lack of explicit consensus licenses some aspects of the bad behavior being complained about. I'd favor the not-so-radical solution of keeping each letter (with diacritics) only under a Translingual header, with links to language-, language-family-, or script-specific appendices that cover the complete alphabets of each language etc. in which they are used. If there is language-specific content that doesn't fit in said appendices, I would try first to make it fit by expanding the Appendix and only then allow language-specific L2s for the letter. DCDuring (talk) 18:15, 16 May 2024 (UTC)[reply]

English anagrams

English anagrams haven't been updated in a while. Could someone run a bot to update them? Maybe @Kiril kovachev, Benwing2 Ioaxxere (talk) 01:43, 9 May 2024 (UTC)[reply]

@Ioaxxere I can try, but I'm not sure if I trust myself to do it properly. Specifically the part of which characters (like punctuation) should be removed when comparing two words. Kiril kovachev (talkcontribs) 12:41, 9 May 2024 (UTC)[reply]
@Kiril kovachev: It doesn't matter too much, since the vast majority of English terms don't have any special characters. Punctuation (periods, commas, etc.) as well as different casing should definitely be ignored, but I have no preference with respect to diacritics. Ioaxxere (talk) 15:45, 9 May 2024 (UTC)[reply]
@Ioaxxere Okay, I've updated the word list I am using, so it should be ready to run, and I've run it a bit to get some sample edits. Should it consider e.g. lork and kröl to be anagrams? Or and no etc.? In the case of diacritics I also don't have much of a preference, and it can be configured to not remove them if we want. I can make it run as-is if this seems ok. Kiril kovachev (talkcontribs) 13:11, 15 May 2024 (UTC)[reply]
@Kiril kovachev: Looks good. Will the bot be running continuously or on a schedule, e.g. updates every 3 months? Ioaxxere (talk) 14:22, 15 May 2024 (UTC)[reply]
It's based on what words are currently on Wiktionary (I got the parsed data from kaikki.org, which is updated ~every month), so I can make it run every time the words are substantially different. I can run it every 3 months if you like, as long as we don't spot any obvious problems. Kiril kovachev (talkcontribs) 16:24, 15 May 2024 (UTC)[reply]
@Ioaxxere (forgot ping) Kiril kovachev (talkcontribs) 16:25, 15 May 2024 (UTC)[reply]
Sounds good. Ioaxxere (talk) 16:32, 15 May 2024 (UTC)[reply]
@Ioaxxere We had a bit of a stall, since there's a little problem with the code when the entry contains a "/" in it, but I'll be working on a fix these few days. Actually it shouldn't be too complicated so hopefully we'll be back soon :) I don't know how long this is gonna take though, as it was going for a few hours already before it even managed to crash. I'll let you know again when I figure out how long it'll be... Kiril kovachev (talkcontribs) 15:41, 22 May 2024 (UTC)[reply]
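The comparison rule discussed in this thread (ignore punctuation and casing, with diacritic stripping as a switch) can be sketched as follows. This is only an illustration under those stated assumptions, not the bot's actual code; the names `anagram_key` and `group_anagrams` are hypothetical.

```python
import unicodedata
from collections import defaultdict


def anagram_key(term, strip_diacritics=True):
    """Return a canonical key; two terms are treated as anagrams if keys match."""
    text = term.casefold()  # ignore casing
    if strip_diacritics:
        # Decompose, then drop combining marks: "kröl" -> "krol"
        text = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
    else:
        # Compose so that "ö" stays a single letter distinct from "o"
        text = unicodedata.normalize("NFC", text)
    # Keep letters and digits (and any leftover combining marks); drop
    # punctuation, spaces, slashes and the like
    kept = [ch for ch in text if ch.isalnum() or unicodedata.combining(ch)]
    return "".join(sorted(kept))


def group_anagrams(terms, strip_diacritics=True):
    """Group terms by key and return only groups with two or more members."""
    groups = defaultdict(set)
    for term in terms:
        key = anagram_key(term, strip_diacritics)
        if key:
            groups[key].add(term)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

With `strip_diacritics=True`, lork and kröl share a key; with `strip_diacritics=False` they do not, which is the configuration choice left open above.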

becocked, and whether we want Trivia sections

Recently, Trump used the word becocked, which attracted some attention because it's an unusual word, and quite a lot of people thought he'd made it up (even though he didn't). Is this the kind of thing we want to note in trivia sections? To me, it seems like the kind of thing no-one will care about in a month, and that it adds pointless clutter. Pinging @Ioaxxere, who originally added it as a usage note, but later changed it to the little-used Trivia heading. Theknightwho (talk) 02:12, 9 May 2024 (UTC)[reply]

His use of the term attracted some media coverage ([4] [5]) making it probably the most notable event in the history of becocked. Does a single sentence about this really add so much clutter? Ioaxxere (talk) 02:18, 9 May 2024 (UTC)[reply]
Ask yourself this: in two years' time, if someone came across this in an entry, would they feel like this addition was the cringeworthy result of terminally online recency bias? Almost certainly yes. It's basically just celebrity gossip. Theknightwho (talk) 02:21, 9 May 2024 (UTC)[reply]
"X said this word" should never go in Trivia/Useful Notes. It can go as a quote, however. CitationsFreak (talk) 03:08, 10 May 2024 (UTC)[reply]
I agree completely. I think trivia sections should chiefly be used for things like noting that a word is thought to be the longest in a particular language, has no vowels, doesn’t rhyme with any other word—that sort of thing. — Sgconlaw (talk) 11:35, 10 May 2024 (UTC)[reply]
I agree too. PUC 19:24, 11 May 2024 (UTC)[reply]

Manipuri vs Meitei language (moved to RFM)

Discussion moved to WT:RFM#Manipuri vs Meitei language.

Performing bulk edits for Bengali/Bangla

Discussion moved from Wiktionary talk:Beer parlour/2024/May.

I'm an NLP researcher who uses Wiktionary to collect pronunciation data. As part of this effort we have noticed various inconsistencies in phonemic transcription. For example,

1. According to various sources (Khan, 2010; Dasgupta, 2003; Ferguson & Chowdhuri, 1960; Chatterji, 1970), Bengali has only one voiceless glottal fricative /h/, so /ɦ/ > /h/. E.g.: অকৃতোদ্বাহ 'bachelor' /ɔ.kri.t̪od̪.ba.ɦo/ > /ɔ.kri.t̪od̪.ba.ho/. This IPA symbol is not correctly represented in the Wiktionary Bengali transliteration guide. Therefore, I propose to edit the guide.

2. The correct phonemic transcription (ref. Dasgupta, 2003; Ferguson & Chowdhuri, 1960; Chatterji, 1970) for affricates should include the tie-bar, so /tʃ, tʃʰ, dʒ, dʒʱ/ > /t͡ʃ, t͡ʃʰ, d͡ʒ, d͡ʒʱ/. E.g.: চরম 'extreme' /tʃɔɾom/ > /t͡ʃɔɾom/, ছায়াছবি 'film' /tʃʰae̯atʃʰbi/ > /t͡ʃʰae̯at͡ʃʰbi/, জল 'water' /dʒɔl/ > /d͡ʒɔl/, ঝিনুক 'sea shells' /dʒʱinuk/ > /d͡ʒʱinuk/. This tie-bar is not included in the Wiktionary Bengali transliteration guide. I propose to include this tie-bar for affricate symbols.

3. According to various sources (Khan, 2010; Dasgupta, 2003; Ferguson & Chowdhuri, 1960; Chatterji, 1970), Bengali doesn't have the palatal plosives /c/ and /ɟ/. Instead it has postalveolar affricates (ref. https://en.wiktionary.org/wiki/Wiktionary:Bengali_transliteration). Therefore, /c/ > /t͡ʃ/ and /ɟ/ > /d͡ʒ/. E.g.: অগোচর 'beyond one's knowledge' /ɔɡocɔr/ > /ɔɡot͡ʃɔr/, অগ্নিযুগ '(figurative) the age of revolution' /oɡniɟuɡ/ > /oɡnid͡ʒuɡ/.

Does there exist any tool or API that could allow us to apply bulk edits? If this sounds right, I will start to make corrections. Arundhatisgupta (talk) 16:06, 9 May 2024 (UTC)[reply]
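The three substitutions proposed above can be sketched as a plain string transformation. This is a hedged illustration only: it does not use AWB, JWB or any Wiktionary API, the function name `normalize_bengali_ipa` is hypothetical, and the tie-bar step assumes that every untied /tʃ/ or /dʒ/ in the input is an affricate rather than a stop-fricative cluster, so the output would still need manual review.

```python
import re

TIE = "\u0361"  # combining double inverted breve (tie bar above)


def normalize_bengali_ipa(ipa):
    """Apply the three proposed corrections to one IPA transcription string."""
    # Point 1: the voiced glottal fricative is rewritten as /h/
    out = ipa.replace("\u0266", "h")  # U+0266 is ɦ
    # Point 3: palatal plosives are rewritten as postalveolar affricates
    out = out.replace("c", "t" + TIE + "ʃ")
    out = out.replace("\u025F", "d" + TIE + "ʒ")  # U+025F is ɟ
    # Point 2: add the tie bar where t/d is immediately followed by ʃ/ʒ.
    # Already-tied sequences and dental t̪/d̪ are untouched, because the
    # character right after the stop is then a combining mark, not ʃ/ʒ.
    out = re.sub("t(?=ʃ)", "t" + TIE, out)
    out = re.sub("d(?=ʒ)", "d" + TIE, out)
    return out
```

Because each step leaves already-corrected text alone, running the function twice on the same string gives the same result as running it once, which is convenient for semi-automatic passes over many pages.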

I relocated this post because it was in the wrong place. — Sgconlaw (talk) 16:20, 9 May 2024 (UTC)[reply]
The IPA has long held that the tie bar is not necessary when transcribing languages that don't distinguish affricates from stop-fricative sequences. If Bengali doesn't distinguish /t͡ʃʰ/ from ?/tʃʰ/, then our current transcription convention is fine.
In describing the phonetics of a language, you want to be as precise as possible, so the ties are a good thing. But with a key like we have, they're not necessary.
The tie bars clutter a transcription and can make it more difficult to read. If we did implement them, it would probably be better to use the under-tie, ⟨t͜ʃʰ⟩. That's generally more legible because our eyes pick up details better at the top of a symbol, so the under-tie is less distracting. kwami (talk) 05:12, 10 May 2024 (UTC)[reply]
While "the tie bar is not necessary", it is good practice to include it and most languages on Wiktionary do. I don't see why Bengali would be an exception. Thadh (talk) 11:36, 10 May 2024 (UTC)[reply]
I agree with @Thadh.
@kwami It is not necessary for English either. Why did you include it in English? Also, there is no consistency. If you think it is not necessary then make sure that you maintain that consistency. E.g.: অগচ্ছিত 'not entrusted to anyone' /ɔɡot͡ʃt͡ʃʰit̪o/ has the tie bar but চরম 'extreme' /tʃɔɾom/ doesn't. What do you think about that? Arundhatisgupta (talk) 16:37, 10 May 2024 (UTC)[reply]
Whichever convention is chosen, it should be consistent, and should match the key. kwami (talk) 19:19, 10 May 2024 (UTC)[reply]
There will be confusion when /t/ and /ʃ/ occur together but do not form an affricate. E.g. কুৎসা 'slander' /kutʃa/ and বচসা 'contention' /bɔtʃoʃa/. Without a tie-bar they seem to have the same pronunciation for /tʃ/, but the correct pronunciations are /kutʃa/ and /bɔt͡ʃoʃa/. Arundhatisgupta (talk) 16:51, 10 May 2024 (UTC)[reply]
That can be handled as ⟨kut.ʃa⟩ and ⟨bɔtʃoʃa⟩ or as ⟨kutʃa⟩ and ⟨bɔt͜ʃoʃa⟩ -- or, for maximal clarity, as ⟨kut.ʃa⟩ and ⟨bɔt͜ʃoʃa⟩. Just as long as we're consistent, or people will get really confused. kwami (talk) 19:22, 10 May 2024 (UTC)[reply]
I personally think there should be a tie bar and it should go above, which is the more common practice. Benwing2 (talk) 21:06, 10 May 2024 (UTC)[reply]
@Kwamikagami 1. If you are introducing a syllable break (indicating with a dot), then it should be applied consistently for all words in Wiktionary.
2. According to Wikipedia, undertie is used to represent linking (absence of a break) in the International Phonetic Alphabet. E.g.: /vuz‿ave/ (Ref. https://en.wikipedia.org/wiki/Tie_(typography)#cite_note-6) Arundhatisgupta (talk) 21:34, 10 May 2024 (UTC)[reply]
Linking is used to override the orthographic spaces we insert between words in transcription. In that example, the words are //vuz ave// but the pronunciation is /vu.za.ve/. The /za/ forms a single syllable. That tie is not the same thing as the 'slur' tie used for affricates, which comes from musical notation (slurred notes). kwami (talk) 21:56, 10 May 2024 (UTC)[reply]
I think that there should be a tie bar and it should go above, which is the more common and established practice for phonemic/phonetic transcription. It is important to maintain consistency within each language and across Wiktionary.
Is there any objection regarding other inconsistencies mentioned in the proposal? Arundhatisgupta (talk) 09:52, 11 May 2024 (UTC)[reply]
@Arundhatisgupta No objections from me although I don't know enough about Bengali phonology to say whether e.g. the use of palatal plosives or affricates is correct. IMO the best way to go about making these changes is either manually or through AWB or JWB, which let you quickly do semi-manual changes based on regexes. Benwing2 (talk) 18:15, 11 May 2024 (UTC)[reply]
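A minimal sketch of the kind of regex-based, semi-manual change mentioned above, written in Python rather than as an AWB or JWB rule. It assumes that the pronunciations sit in {{IPA|bn|...}} calls and that the goal is a tie bar above (U+0361) on untied /tʃ/ and /dʒ/; the function name propose_tie_bars is made up for illustration. It cannot tell a true affricate from a plain /t/ + /ʃ/ sequence such as the one in কুৎসা, so every proposed change would still need a human check.

```python
import re

TIE = "\u0361"  # combining tie bar placed above the two letters

# Find {{IPA|bn|...}} calls; the template name and language code come from the thread.
IPA_CALL = re.compile(r"\{\{IPA\|bn\|([^{}]*)\}\}")

# A stop immediately followed by its fricative. Sequences that already carry a
# tie bar, or that are split by a syllable dot, do not match.
UNTIED = re.compile(r"t(?=ʃ)|d(?=ʒ)")


def propose_tie_bars(wikitext):
    """Return wikitext with tie bars added inside {{IPA|bn|...}} calls only."""
    def fix(match):
        tied = UNTIED.sub(lambda m: m.group(0) + TIE, match.group(1))
        return "{{IPA|bn|" + tied + "}}"
    return IPA_CALL.sub(fix, wikitext)


if __name__ == "__main__":
    # Prints the line with a tie bar inserted between t and ʃ.
    print(propose_tie_bars("* {{IPA|bn|/tʃɔɾom/}}"))
```

Text outside the template call is left alone, so a transcription quoted in running prose is not touched.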
@Benwing2 Could you please add me to Wiktionary:AutoWikiBrowser/CheckPageJSON ? Arundhatisgupta (talk) 14:39, 14 May 2024 (UTC)[reply]
@Benwing2 I tried to sign in AWB and it's saying that my username is not enabled to use AWB. Could you please help me? Arundhatisgupta (talk) 13:52, 22 May 2024 (UTC)[reply]
@Benwing2 Thank you for adding me to CheckPageJSON. I am able to edit the IPA pronunciation for words where it is given as * {{IPA|bn|/IPA/}}, for example গহির, but I am not able to edit the pronunciation where it is given as *{{bn-IPA}}, for example ছায়াছবি, or {{bn-IPA}}, for example গহনা. Could you please suggest how I can edit those pages? If they are linked to a database, is there a way to access or update that database? Arundhatisgupta (talk) 12:54, 25 May 2024 (UTC)[reply]
@Arundhatisgupta Pages that use {{bn-IPA}} are backed by a module, which auto-generates the pronunciation. In general it's preferred to use {{bn-IPA}}, possibly with respelling, instead of directly specifying the pronunciation using {{IPA|bn|...}}, because it ensures more consistent results. If you want to change the way the module works, you have to edit the Lua code that implements the module, but before doing that you should get consensus from other Bengali editors that this is the right thing to do. Benwing2 (talk) 17:53, 25 May 2024 (UTC)[reply]

How should we present Latin adjectives that inflect like nouns (or that are really appositive nouns?)

A few times now, I've been puzzled about how to handle showing the inflected forms of certain Latin third-declension adjectives that don't fit well into any of the usual adjective inflection patterns, because they show the endings typical for a noun instead. Currently, these seem to mostly be treated in our entries as third declension adjectives of "one ending", but I think there are some issues with the accuracy of this in terms of showing forms and usage.

A particularly clear case is certain rare words that are attested with adjectival function but that have the form of feminine nouns, such as silvicultrīx, -trīcis and Nīlōtis, -tidis (which have the forms respectively of Latin and Greek feminine agent nouns). The masculine counterparts would presumably be *silvicultor and *Nīlōtēs, but these do not to my knowledge occur, and in any case, we normally treat agent nouns as noun lemmas (distinct for masculine and feminine) rather than combining the masculine and feminine versions under one adjective lemma. Should we lemmatize such words as nouns and include a usage note saying that they're used appositively? Or should we put them as adjectives (as many dictionaries do) but include some kind of special headword and declension table coding to avoid showing masculine or neuter forms, which I think aren't accurate in this case? For example, Gaffiot marks silvicultrix as "adj. f".

Not quite as clearcut are cases like senex, iuvenis, mās that generally have the form of nouns, commonly function as either nouns or as adjectival or appositional modifiers of masculine or feminine nouns, but are extremely rare or unattested in the neuter. (I've found some neuter forms attested in some cases in New Latin.) Functionally, I think there isn't much difference between how mās and fēmina are used, but we treat mās as a noun or adjective and fēmina as only a noun. Urszag (talk) 02:49, 11 May 2024 (UTC)[reply]

@Urszag We have a whole category Category:Latin first declension adjectives for words like amnicola and indigena that don't seem so different from the words you've cited, and there are unquestionably third-declension non-i-stem adjectives (e.g. vetus, concolor) that "show the endings typical for a noun", so I don't see an issue treating these as adjectives. Benwing2 (talk) 05:33, 11 May 2024 (UTC)[reply]
Yes, we have that category for first-declension adjectives. The full inflection of those words as adjectives is actually a bit questionable also (there was an RFV that I closed based on New Latin examples, but I added some notes in Appendix:Latin_first_declension discussing how the neuter plural nominative/accusative/vocative forms in -a and dative/ablative forms in -īs are rather hypothetical and ambiguous, as they've often (at least since Priscian's time) been interpreted as belonging to a second-declension paradigm instead, e.g. that of indigenus).
Third-declension non-i-stem adjectives such as vetus exist, but are rare (aside from comparative forms). When there are attested neuter forms distinct from the masculine/feminine forms (such as vetus in the accusative singular, or vetera in the nominative/accusative plural), this establishes that a word is formally distinct from an appositive noun (and also establishes whether the neuter plural ends in -ia or -a, which can't always be predicted from the forms of the ablative singular or genitive plural). But I think that such neuter forms are often unattested (except sometimes in very late periods of the language) and in that case it's arguably misleading to just present a single full declension table. E.g. I found iuvenia once in Medieval Latin and occasionally in New Latin, and a couple of New Latin cases of iuvena (both from the same author), but I think it's more misleading than not to present either of these as established or standard Latin forms: a late imperial-era grammarian says that this word simply lacks neuter plural forms. In cases like this, there's an existing parameter to mark an adjective as lacking neuter forms, so I ended up using that and mentioning other forms in usage notes. But in cases like silvicultrīx and Nīlōtis, I don't know how to best present the fact that they occur only as adjectival modifiers of feminine nouns: for now I've just removed the declension table from the second word (since several forms are unattested, or only attested in post-Classical texts, and the Greek origin makes it tricky to actually infer what missing forms would be), but for silvicultrīx it seems fairly clear that it would simply inflect like victrīx. If we continued to categorize these as adjectives, does it sound reasonable to establish a parametric way to mark them as feminine-only?--Urszag (talk) 06:33, 11 May 2024 (UTC)[reply]
@Urszag Yes, I think so. We have done that for some other languages, e.g. French adjective headwords have an |onlyg= parameter that you can set to a gender (m or f), a number (s or p) or a gender-number combination (e.g. m-s, f-p). Now mind you, some of the terms that make use of this (e.g. enceinte) would IMO be better treated as conventional adjectives that are simply rare in other genders or numbers; there's even a usage note for enceinte that says
The masculine form enceint is occasionally used with regard to transgender men, for species with male pregnancy such as seahorses, as well as in metaphorical, jocular, or fantastic contexts.
And indeed you will find that in Spanish, the corresponding words like encinto, embarazado and preñado are given in the masculine, with quotes establishing that such usage does exist. But if the term is indeed unattested in some genders, I would definitely support adding a flag to suppress those genders in the declension table and make sure that the title next to the declension table reflects this. Benwing2 (talk) 06:53, 11 May 2024 (UTC)[reply]

Default font size for polytonic Greek

Does anyone else think the default font size for polytonic Greek should be increased? It looks small to me, especially in SBL Greek, which is the font with the highest priority in the CSS. Weylaway (talk) 19:55, 12 May 2024 (UTC)[reply]

It seems OK to me; can you post screen shots showing how it looks for you? BTW I do notice when comparing polytonic Ἀριστοτέλης (Aristotélēs) to non-polytonic Αριστοτέλης (Aristotélis) that the latter seems relatively ugly because it uses a sans-serif font. Benwing2 (talk) 21:17, 12 May 2024 (UTC)[reply]
FWIW, this is what it looks like for me (top is the level of zoom I usually use, bottom is 100% zoom). - -sche (discuss) 22:13, 12 May 2024 (UTC)[reply]
Interesting; your polytonic looks sans-serif, while your non-polytonic looks more serif, which is the reverse of what I see. Benwing2 (talk) 22:19, 12 May 2024 (UTC)[reply]
For non-polytonic, it's using Gentium for me, the second font in the list (because I don't have Athena, the first font in the list). For polytonic it was using DejaVu Sans (the third font in the list, because I didn't have SBL Greek or Athena). Now that I've downloaded SBL Greek, it looks like this, displaying polytonic in SBL Greek, which is heavily (IMO distractingly) serifed and slanted and handwritingesque. SBL Greek polytonic text is indeed smaller than other text, though not unreadably so IMO. (But I do think SBL Greek looks worse than other fonts in the list, so I might be tempted to move it down in the ranking... but perhaps it is first because it has the best diacritic support?) - -sche (discuss) 06:56, 13 May 2024 (UTC)[reply]
@-sche Hmm. I checked using the Computed tab in Chrome and, if it's correct, my polytonic font is using Times (postscript name "Times-Roman"), which is way down the list, and my non-polytonic font is Arial Unicode MS, which is likewise way down the list. Maybe this is because I'm on a Mac, although I'm surprised there aren't more fonts installed by default. BTW here is what it looks like: [6] Benwing2 (talk) 07:25, 13 May 2024 (UTC)[reply]
I personally think SBL Greek looks the best and I use a custom style to display it at 130% size. I was just thinking of learners who may find the default hard to read – in -sche's example the x-heights of the Greek letters are smaller than those of the Latin letters next to them. But clearly there is variation across operating systems and difference in personal preference, so maybe it's better for me just to use my custom style. Weylaway (talk) 17:46, 13 May 2024 (UTC)[reply]
Thank you @Weylaway for your question. Greek (script Grek, polytonic and monotonic alike) looks miserable and small at en.wikt. I have no idea what the default font is for this site (as in sc=Latn), or what the designers wish their readers to view. Default looks much better. Or perhaps an equivalent making sure that the grave accent is shown (not a vertical accent). User:Sarri.greek/fonts#default. Ancient Greek inflectional tables, which should have a 110%, are even smaller. Even the prosody marks look better with normal default fonts. Thank you again, for putting this. ‑‑Sarri.greek  I 21:38, 13 May 2024 (UTC)[reply]

Blottoism

Don't let him die forgotten. Talk:upput. I hope my gastric bad temper will also survive. True story: our librarian/researcher has asked me why our system contains some non-existent records. It's because Artefactual's API is wrong. But I've got half a day of billable debugging before I can prove it. Equinox 02:00, 13 May 2024 (UTC)[reply]

Old Pskovian

I propose to add an etymological code for Old Pskovian (~zle-ops?), as part of Old Novgorodian (zle-ono) in the branch of East Slavic languages. Cases of mention of Old Pskovian. This is a dialect and variety of Old Novgorodian, which was in ancient Pskov and its environs (https://ru.wikipedia.org/wiki/Древнепсковский_диалект). What do you think @Thadh? AshFox (talk) 04:29, 13 May 2024 (UTC)[reply]

Don't see any issue with this. If nobody opposes, I'll add it in a week or so. Thadh (talk) 19:23, 13 May 2024 (UTC)[reply]
No objections. Benwing2 (talk) 20:32, 13 May 2024 (UTC)[reply]
@Thadh it seems good, no one is against it. A week has passed. Please add Old Pskovian when you have free time. AshFox (talk) 09:55, 20 May 2024 (UTC)[reply]

Should we split up multi-language pages?

Currently, a user trying to get to da#Zhuang on desktop has to:

  1. Type "da" into the search bar.
  2. Wait for the massive page to load (this could take a while on older devices or on slower connections).
  3. Scroll for a very long time until reaching "Zhuang" in the table of contents.
  4. Click it.

On mobile, the situation is even worse since in fact there is no table of contents.

Maybe a better option would be to have da function as a sort of disambiguation page which lists all of the available languages in a compact table. In this case, the user would quickly be able to locate and click "Zhuang" which would take them to da/Zhuang. Also, since da and da/Zhuang would both be very compact, the loading times would be practically instantaneous.

Doing this would also solve all of our Lua-related problems (at least for the near future). What do we think @Chuck Entz, Benwing2, Theknightwho? Ioaxxere (talk) 05:50, 13 May 2024 (UTC)[reply]

@Ioaxxere This has been proposed various times but it would be an enormous undertaking and would (of course) have some downsides, such as requiring more clicks to view anything and not so easily being able to see the similarities among different languages that share the same spelling. Maybe a less radical solution for the time being would be, as Chuck proposes, to move letter information out of letter pages into an Appendix or something. Benwing2 (talk) 05:58, 13 May 2024 (UTC)[reply]
(Not saying I think we should split pages, but) something I suggested in past discussions which would address the "more clicks to view anything" problem is: if we split, transclude the subpages back onto the 'main' entry, so someone looking up e.g. the main sender page rather than sender/da still sees all the languages. Transclusion could be the default, and for the few pages with excessively many language sections where it wouldn't be feasible (particularly because I think transcluding a page causes it to count 2x against the PEIS limit? and causes any templates it transcludes to thus count 4x? and even Tim Starling has said that raising the PEIS limit is not something the devs will do), we could fall back on having a table like Ioaxxere suggests. BTW I think the usual proposal is to use language codes in naming the subpages, rather than language names, which may be long and contain untypable characters. Either way, we have to watch out for conflicts with pages that actually contain slashes, e.g. s/he.
In this case, I'm inclined to agree that moving the letters to alphabet appendices is a solution to most of the immediate problem. Let something like Appendix:Dutch alphabet give the names and pronunciations of all the Dutch letters in one place, rather than giving them on a, b, etc. Just have a ==Translingual== entry on a and maybe categorize all the Appendices that use a into a category like "Category:Alphabets that use Latin a" or something and then have a link in the Translingual entry to that category...? - -sche (discuss) 06:31, 13 May 2024 (UTC)[reply]
@-sche: I'm very confused as to why letter entries are being mentioned? The initial page mentioned is da which isn't even about a single letter. There's only one "letter name" entry on it being Tagalog da. mi is one of the worst pages, if not the worst, when it comes to this, so again, trying to focus on letters is not the way to go about it. This type of proposal was proposed in 2020 and was not passed then either. Also, as I mentioned on said vote, most of the bytes on a aren't from letter entries either. Let's focus on finding a solution that actually fixes the overarching problem, rather than throwing us into the issue of letters again. AG202 (talk) 14:13, 13 May 2024 (UTC)[reply]
Additionally, the notion of moving letter entries comes from a clear, whether intentional or not, Latin script language-bias. As I mentioned in the vote, entries like ㄴ (n) should not belong in an Appendix or Translingual just because some other languages on Wiktionary do their letter entries poorly. AG202 (talk) 15:24, 13 May 2024 (UTC)[reply]
@AG202: It's definitely not a Latin-script bias. The same applies to Cyrillic, Greek, Perso-Arabic, Georgian... The fact that certain writing systems are slightly more complex and language-specific doesn't mean that all those that aren't deserve an entry for every language and every grapheme.
By the way, I don't see a Jeju entry for , and something tells me that if you were to duplicate this content three times (Middle Korean, Korean, Jeju) you will not be such a fan of keeping the three entries on one page. Thadh (talk) 16:36, 13 May 2024 (UTC)[reply]
I would be. Just as I am with every other letter entry. The only reason I haven't created them myself is because I haven't had the time to. Also, even if it's not from a Latin-script bias, I still do not think that several smaller language communities are being considered.
That being said, this still doesn't address my main point that this doesn't actually fix the problem. If you don't want letter entries that's fine, that's another conversation, but let's not pretend that it's going to fix this current lua memory issue. It's not even a solution to "most" of the immediate problem at pages like a, nor does it fix anything at all at pages like da or mi or la. AG202 (talk) 17:21, 13 May 2024 (UTC)[reply]
For reference at a: there are 170 L2s, and 64 letter entries (with the header "Letter"). So not even half of the L2s there have letter entries. It's frankly overblown. That's not even considering the L2s that have significantly more content in their non-letter entries than in the letter entry, such as English a. AG202 (talk) 22:17, 13 May 2024 (UTC)[reply]
@AG202 Not sure it's overblown, since a is the article causing the most headache in terms of Lua and parser limits. a is currently at 1.887MB out of an allowed 2.097MB in post-expand include size, and removing all the letter entries would bring that down noticeably. People are also thinking ahead to the fact that there are 5,000+ languages that use the Latin script, and we can't possibly have an entry for the letter a in every such language; whereas the number of languages where a is a word (excluding those where it's the name of the letter a) is much more limited. (Note also that when I just previewed a, I got a CPU timeout. User:Surjection may have inadvertently made this worse by lite-ifying a bunch of the templates again; the preview showed 43 seconds of CPU time and 53 seconds of real time, vs. 23 seconds of CPU time and 30 seconds of real time when previewing a slightly earlier version not using the lite templates. YMMV though, as there is a lot of variation in the CPU times.) Benwing2 (talk) 22:44, 13 May 2024 (UTC)[reply]
@Benwing2: "Removing all the letter entries would bring that down noticeably", can we actually get the numbers for this? Because when I tested that back in 2020, that wasn't the case. CC:@Surjection
"Whereas the number of languages where a is a word (excluding those where it's the name of the letter a) is much more limited": I also don't think that's the case. Again, looking at what's on the ground right now, there are significantly more non-letter entries that are taking up "space". There are only 64 L2s with letters or letter names, out of 170. Clearly the focus should be elsewhere.
Letter entries don't even take up that much space relatively; they don't have quotations like Sassarese a and they don't need usage notes like Serbo-Croatian a. I'm much more worried about 100 more L2s with non-Letter POSs as that's more realistic and takes up significantly more space, instead of a very rare possibility of 5000+ letter entries. Hell, the English entry at a has twelve etymologies outside of the letter entry, and is itself equivalent to several languages' letter entries.
Let's focus on actual long-term solutions, like the TOC option being discussed below, rather than taking out information that users like myself and others find useful. AG202 (talk) 23:18, 13 May 2024 (UTC)[reply]
User:Thadh/a I've removed all noun senses for letter names (except for the Norwegian figurative ones) and letter senses. I don't know how to measure whether the page loads better, but at least there are no lua errors anymore. Thadh (talk) 13:47, 14 May 2024 (UTC)[reply]
Thank you! Yeah looking at the page you've linked, removing the letters (and letter names which I thought were a separate issue) only removed 27225 bytes, which may seem like a lot, but that's out of 197738 bytes initially. That means that letters & letter names only account for ~14% of the bytes on the page, which is exactly what I was talking about. Letters on their own account for even less. We'd reach the max byte limit again in no time even if we barred letters from being added. (Also by my count only 14 L2s have solely a letter and/or letter name entry out of 170) AG202 (talk) 14:19, 14 May 2024 (UTC)[reply]
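The proportions quoted in this exchange can be rechecked with a few lines of Python. The figures are the ones given in the thread (27225 of 197738 bytes; 64 and 14 of 170 L2s); the helper name share is only illustrative.

```python
def share(part, whole):
    """Percentage of `whole` taken up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Bytes removed with the letters and letter names, out of the page's initial size.
letter_bytes = share(27225, 197738)
# L2s at a that have a "Letter" header, out of all L2s there.
letter_l2s = share(64, 170)
# L2s that contain nothing but a letter and/or letter name entry.
letter_only_l2s = share(14, 170)

print(letter_bytes, letter_l2s, letter_only_l2s)  # 13.8 37.6 8.2
```

So the "~14%" above matches the byte counts, and about 8% of the L2s would disappear entirely if letter and letter name entries were moved out.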
«no table of contents»? What do you mean? Tollef Salemann (talk) 06:19, 13 May 2024 (UTC)[reply]
Are the pages titled "Appendix: Variations of the letter [LETTER]" complete? Could they be made to include all the languages that use the given letter? If they could, that would eliminate one advantage of the current letter-page structure: comparison of letter use across languages. DCDuring (talk) 12:00, 13 May 2024 (UTC)[reply]
At the moment, the "Variations of" appendices are language-neutral. We could agree to change that, but I think a separate set of appendices would probably be the better approach. —Mahāgaja · talk 12:38, 13 May 2024 (UTC)[reply]
That's not what those pages are for - those are to list confusable/similar terms so that we don't clutter {{also}} with massive lists at the top of a page. Theknightwho (talk) 12:39, 13 May 2024 (UTC)[reply]
Right, I am wondering about extending their purpose to overcome a shortcoming of off-loading letters to language- or script-based appendices: that one loses the ability to compare across languages. If we need to create something additional to preserve the purity of their current purpose, I would not object. It just seems that their current narrow purpose could be broadened to make them more effective even at achieving their current purpose. DCDuring (talk) 13:52, 13 May 2024 (UTC)[reply]
I suppose, but I'm not sure how useful it'd be (especially with letters like "a"). It'd make more sense with less common letters, though. Theknightwho (talk) 14:01, 13 May 2024 (UTC)[reply]
Based on the persistence of this problem over at least a decade, it seems that we are forced to use incremental solutions. Not all incremental solutions have to be technical. As was observed above, letters (and symbols) are not like the rest of our content, so perhaps we can use a different content model for them. If the different content model allows us to reduce the number of module-error pages, our attention will be led to a somewhat different set of violations, requiring or suggesting different partial solutions. A different content model for letters may lead to a better (more comprehensive) handling of letters. There are more than 60 Letter L3/4 headers on a. That does not seem a negligible amount of content to offload, but it seems likely to be smaller than the number of languages likely to use the letter a. DCDuring (talk) 17:19, 13 May 2024 (UTC)[reply]
  • The problem isn't just individual letter pages, though. A page like [[an]] is also difficult to navigate around and could benefit from being split up. On the other hand, at [[bachall]] I find it very convenient to have the Old Irish entry and its homographic Irish and Scottish Gaelic descendants all on the same page. —Mahāgaja · talk 12:34, 13 May 2024 (UTC)[reply]
@Benwing2: Well, the number of clicks is equal if you count the table of contents. Also, it seems like splitting the letter entries off doesn't address the Lua issue since letters make up only a small proportion of the content at a.
@Tollef Salemann: If you don't have a phone handy, go to https://en.m.wiktionary.org/wiki/da and resize your browser to make it narrow — the table of contents disappears.
@Mahagaja, -sche: Yes, there could be some kind of controller template on da that would automatically transclude the pages if the number of languages is under a certain reasonable value (say 5), otherwise display the disambiguation table. Ioaxxere (talk) 13:32, 13 May 2024 (UTC)[reply]
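A rough Python sketch of the decision such a controller would make, producing wikitext for the bare-title page. The subpage naming (title/Language), the default threshold of 5, and the function name render_entry are assumptions taken from or invented for this discussion; no such template exists in the thread. The only real syntax relied on is MediaWiki's {{:Page}} form for transcluding a mainspace page and the ordinary [[target|label]] link.

```python
def render_entry(title, languages, max_inline=5):
    """Return wikitext for the bare-title page under the proposed split."""
    if len(languages) < max_inline:
        # Few languages: pull each subpage back in by transclusion,
        # so the reader still sees everything on one page.
        return "\n".join("{{:%s/%s}}" % (title, lang) for lang in languages)
    # Many languages: show a compact row of links instead of the content.
    return " · ".join("[[%s/%s|%s]]" % (title, lang, lang) for lang in languages)


if __name__ == "__main__":
    print(render_entry("bachall", ["Irish", "Old Irish", "Scottish Gaelic"]))
```

With three languages the call above emits three transclusions; a page like da, with far more than five, would get the link row.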
What I find convenient about having all three languages at [[bachall]] is not so much being able to read them all at once as being able to edit them all at once, and if they're transcluded from three separate pages named "bachall/ga", "bachall/sga", "bachall/gd" or the like, then that's not convenient anymore. —Mahāgaja · talk 14:30, 13 May 2024 (UTC)[reply]
No, it doesn't disappear on my phone, nor on my neighbor's. We both use Apple phones (iPhone). What do you all mean by saying it disappears? It is just smaller, so you have to tap it to make it bigger, and then you can easily navigate, like on Wikipedia. Are you all talking about non-Apple devices? Tollef Salemann (talk) 17:44, 13 May 2024 (UTC)[reply]
How about only splitting up pages over a certain size? At the moment when I look up a short word I type "da#Zhuang" in the search bar to get me straight to the entry I need, but that's still annoying. —Caoimhin ceallach (talk) 13:52, 13 May 2024 (UTC)[reply]
I don't like the idea of splitting pages. Is there any way to personalise the table of contents? Being able to collapse language names per letter of the alphabet would go a long way. Thadh (talk) 14:03, 13 May 2024 (UTC)[reply]
@Thadh This may be possible with CSS. I know for example that User:Sarri.greek has been experimenting with different layouts for the TOC. If not, and you can create a clear plan for what functionality you'd like, the MediaWiki devs might be amenable (e.g. if you contact Tim Starling directly; he's the one who increased our memory and timeout limits). Benwing2 (talk) 20:32, 13 May 2024 (UTC)[reply]
@Thadh See [7] for an example of what Sarri did. Benwing2 (talk) 21:16, 13 May 2024 (UTC)[reply]
That looks good! Maybe this at least solves the issue of navigation. Thadh (talk) 21:17, 13 May 2024 (UTC)[reply]
I agree - it does look good. Theknightwho (talk) 21:19, 13 May 2024 (UTC)[reply]
Not bad. Is this intended as a default for certain types of pages (What kind?), opt-in, or only in custom JS/CSS? DCDuring (talk) 01:38, 14 May 2024 (UTC)[reply]
@DCDuring I think we could clean it up a bit and use it for long pages where we'd otherwise compress the TOC by omitting subheadings. Benwing2 (talk) 01:59, 14 May 2024 (UTC)[reply]
@Benwing2, Thadh, Theknightwho: I rewrote the template: {{minitoc}}. Maybe it can be automatically added to any entry with more than (say) ten languages. Ioaxxere (talk) 05:14, 14 May 2024 (UTC)[reply]
@Ioaxxere Looks good to me but let's solicit more comment first. Also it would be great if there was a way, after you expand it, to further expand it to show the subheadings. Some people (maybe User:RichardW57?) have complained about the shortened TOC's that you can't so easily navigate to the subheadings of a particular language. Benwing2 (talk) 05:19, 14 May 2024 (UTC)[reply]
M @Ioaxxere, thank you for your Template:minitoc! and your help for [8], [9], [10], [11],[12], ... Also, perhaps variations for few (1-5) languages by L2 something like this? For related language periods like this? (Reconstruction of the magic word __TOC__ because it might be taken away from us in future skins like vector22? (e.g. discussion@el.wikt for modifications) Thank you, thank you! ‑‑Sarri.greek  I 05:36, 14 May 2024 (UTC)[reply]
@Sarri.greek: Yes, a multi-column TOC is certainly possible. You can add the following into your common.css:
div.toc > ul { display: flex; flex-direction: column; gap: 0 20px; flex-wrap: wrap; overflow: auto; max-height: 30em; /* change max-height as desired */ }
div.toc { width: 100%; }
I can't promise you that it'll look very good, though. Ioaxxere (talk) 06:08, 14 May 2024 (UTC)[reply]
@Ioaxxere @Sarri.greek I've rewritten Module:minitoc somewhat to take advantage of the pre-computed list of L2s that is already calculated by Module:headword/page, since it can cope with a bunch of weird edge-cases that can't be dealt with by a simple Lua pattern (and it's also faster, since it means we don't need to parse the page again). Theknightwho (talk) 14:00, 14 May 2024 (UTC)[reply]
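For illustration only, here is roughly what the "simple pattern" approach to finding L2 headers looks like, written in Python rather than Lua. It is not the code of Module:minitoc or Module:headword/page, neither of which is shown in this thread; the function name l2_headers is made up. It treats any line of the form ==Name== as a language header, which is exactly the kind of shortcut that misses edge cases such as headers inside comments or nowiki spans.

```python
import re

# A level-2 header: exactly two "=" on each side of the name, on a line of its own.
L2_HEADER = re.compile(r"^==\s*([^=\n][^\n]*?)\s*==[ \t]*$", re.MULTILINE)


def l2_headers(wikitext):
    """Return the language names of all level-2 headers, in page order."""
    return L2_HEADER.findall(wikitext)


if __name__ == "__main__":
    page = "==English==\n===Noun===\ntext\n==Zhuang==\n"
    print(l2_headers(page))  # ['English', 'Zhuang']
```

Reusing a list that has already been computed elsewhere avoids both the second pass over the page and the need to patch a pattern like this for each odd case.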
Re complaining when TOCs are collapsed, FWIW I have also complained that when TOCs are collapsed you can't easily navigate to subsections of a given language section, but as long as we're only deploying that on entries with a truly excessive number of L2s, like a, I'll live with it. If we're deploying it on tons of pages, e.g. cat—11 L2s—I'm less happy. Maybe if we're only deploying it on mobile, that's better than also deploying it on desktop, OTOH someone was just complaining in another discussion that entries are hard to navigate because TOCs are collapsed or sometimes hidden(?) on mobile. So maybe the ideal would be to make it a gadget/pref, whether opt in or opt out, so people who wanted collapsed TOCs could get them—maybe even on all entries, if they wanted—and people who wanted uncollapsed TOCs on all entries could keep them. Or to make it possible to expand the collapsed TOCs (all at once or on a per-L2 basis) as mentioned above.
Not directly relevant to this specific concern, but relevant to the general topic is Wiktionary:Grease_pit/2021/June#Experience_on_mobile which also links a number of other prior discussions; see also some history at Wiktionary:Beer_parlour/2021/April#collapsed/minimized_language_headers. - -sche (discuss) 15:37, 14 May 2024 (UTC)[reply]
I can understand why __NOTOC__ has been included, since it avoids having two TOC with Vector, but with Vector 2022 it's a bit detrimental since it means there's now no longer a TOC in the left-hand sidebar, which can normally be used even if you're scrolled halfway down the page. Theknightwho (talk) 15:46, 14 May 2024 (UTC)[reply]
@Theknightwho: That's a good point. It's actually possible to specify by-skin behaviour by changing MediaWiki:Vector-2022.css and similar, so maybe we could use that to override __NOTOC__ in Vector 2022 since the TOC doesn't take up space in the document flow anymore. Ioaxxere (talk) 16:04, 14 May 2024 (UTC)[reply]
Instead of moving the Zhuang entry for da to da/Zhuang, we could move it to Zhuang/da, and we could move all the Zhuang entries that way and add a specialised search bar searching only in entries starting with “Zhuang/”, like we do with the search bar on top of the beer parlour here. That would reduce the scrolling and the clicks for people interested in Zhuang. MuDavid 栘𩿠 (talk) 01:54, 14 May 2024 (UTC)[reply]
Iff we split (which I am not saying I'm in favor of), I like the idea of putting the language first so people can make language-specific searches. But what do you think of using language codes rather than language names? Language names can be very long (e.g. "Southern Valley Yokuts") and can contain many hard-to-type characters (how likely is it that the average user can type "ǃXóõ"?)... although I concede we do name reconstructed entries using language names. - -sche (discuss) 02:06, 19 May 2024 (UTC)[reply]
@-sche Personally I wouldn't use Reconstruction sections as an example of good UI design; I find it annoying to have to type out the whole language name (not to mention the word "Reconstruction"; is there an abbreviation for this namespace?). As for putting the lang name or code first, I see advantages and disadvantages. Assuming we have a page at the bare word that links to all the lang-specific pages, the advantage for experienced users is that the lang-specific pages won't so easily show up in autocomplete; but this may be a disadvantage to new users, who will see the lang-specific pages autocompleted if the lang follows but not if it precedes, and who are unlikely to be familiar with Wiktionary lang codes. Benwing2 (talk) 03:07, 19 May 2024 (UTC)[reply]
Re abbreviation: RC, e.g. RC:Proto-Germanic/-janą. (As you probably recall, people objected to past proposals to set up RC entries like regular entries i.e. put all languages that have a word *jab on one page, because reconstruction orthographies are language-specific and what j or a or b means in the notation for one language is different from what it means in another. I...can't say I find that persuasive, because what non-reconstructed orthographies mean by j, a, b also differs—Hmong uses b to represent a tone, vs other languages using it for /b/, /β/, etc—but...) - -sche (discuss) 03:39, 19 May 2024 (UTC)[reply]
@-sche Thanks. Yeah I don't find that argument persuasive either for the reasons you give. Benwing2 (talk) 03:51, 19 May 2024 (UTC)[reply]
I understand the logic behind separating languages in this case but I agree that it's extremely annoying to search for and definitely unintuitive for new users. Maybe we should go as far as to treat reconstructed terms the same way as attested entries, i.e. have Reconstruction:Proto-Germanic/-janą and a term with the literal spelling *-janą on the same page or (under the proposal) page group. Ioaxxere (talk) 04:01, 19 May 2024 (UTC)[reply]
(@Benwing) If we display all the language-specific subpages on the bare entry (i.e. transclude the Zhuang da subpage onto da), then I am not concerned about whether new users see the subpage (whether "da/Zhuang" or "Zhuang/da") in the search bar when they type "da...", because I wouldn't expect such users to know/care about subpages, and I figure it's enough that they can type "da" and get to the "da" entry, where they can see the Zhuang content. However, it occurs to me that an advantage to "da/Zhuang"-style naming might be this: whatever template we use at [[da]] can just pull and display all subpages of whatever page it's on (and we would only need special handling for a small number of pages, e.g. s/he is not a subpage of s)... whereas if we use "Zhuang/da"-style naming, it seems like it would have to go through a list containing all of the thousands of possible languages we include, to check which exist for a given page. (No?) Would there be a difference in which approach is faster, Zhuang/da vs da/Zhuang...? I do think the ability to type "Zhuang/..." into the search bar and thus narrow whatever you type next so that you're only searching in Zhuang would be a benefit to that ordering (say I want to quickly check whether any Zhuang entries start with str-, I could type "Zhuang/str" into the search bar and see what results autocomplete suggests/finds), but not if there are drawbacks, like if it would complicate or slow down the 'transclude all subpages onto the main page' template. (And I'm still not actually in favour of splitting entries onto subpages at all, although if it came to a vote I don't know if I'd oppose or just abstain.) - -sche (discuss) 04:05, 19 May 2024 (UTC)[reply]
@-sche: No, the pages can't be transcluded since that would lead to the massive pages we were trying to avoid in the first place. More realistically there would be a {{minitoc}}-like navigation table which could dynamically transclude pages through JavaScript (of course, there would be a less-convenient alternative for users not running JavaScript). By the way, I prefer da/Zhuang over Zhuang/da since it makes it easier to autocomplete queries. A user only has to type "da/z" for da/Zhuang to be the only valid completion. Ioaxxere (talk) 04:29, 19 May 2024 (UTC)[reply]
@-sche I didn't even think of that; my assumption was that adding a new language would entail both adding a split lang-specific page and modifying the combined page to know about the language in question, which is definitely a drawback to the split-lang approach. But if we put the lang name or code last, yes, it should be possible to use the prefix-pages functionality to automatically find the split-lang pages. Benwing2 (talk) 04:29, 19 May 2024 (UTC)[reply]
@Ioaxxere It should still be possible to transclude in most cases, e.g. only not transclude if there are more than say 20 or so languages. Benwing2 (talk) 04:31, 19 May 2024 (UTC)[reply]
Yes, to clarify, because two ideas were discussed earlier and I get the sense Ioaxxere and I may be talking about different ideas(?)... I was saying it'd make sense to have a template automatically find any subpages and transclude them, rather than making users manually add new subpages to a list (since we already see how often users tag but neglect to list RFVs, RFDs, etc!), if we split all pages. But if we only split the large, Lua-memory-error-having or PEIS-limit-reaching pages that need splitting, Ioaxxere is right that it doesn't make sense to transclude anything. I haven't run the stats, but I would hazard a guess that the majority of pages on Wiktionary have only 1 L2 (maybe 2), so splitting all our millions of pages just because a few are too large to display does have obvious drawbacks, letting a tiny tail wag an enormous dog. It would be a lot less disruptive to only split the handful of pages that need it. - -sche (discuss) 06:34, 19 May 2024 (UTC)[reply]
100% agreed. Benwing2 (talk) 07:25, 19 May 2024 (UTC)[reply]
On the other hand, the overwhelming majority of pages for English lowercase 4-letter nouns starting with 'ta' have pages with entries for more than one language, and many of our inflection tables are littered with orange links. --RichardW57m (talk) 14:40, 23 May 2024 (UTC)[reply]
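
To illustrate the naming trade-off discussed in this thread ("da/Zhuang" vs "Zhuang/da"), here is a minimal sketch. It is not how MediaWiki or any existing module does it; the title list and all function names are hypothetical, and the only assumption is that some prefix lookup over sorted page titles is available, in the spirit of the prefix-pages functionality mentioned above.

```python
from bisect import bisect_left

# Hypothetical title index, kept sorted so prefix matches are contiguous.
TITLES = sorted([
    "da", "da/Danish", "da/Zhuang", "dab", "s/he",
    "Zhuang/da", "Zhuang/str", "Danish/da",
])


def titles_with_prefix(prefix, titles=TITLES):
    """Return every title that starts with prefix, via one range scan."""
    start = bisect_left(titles, prefix)
    matches = []
    for title in titles[start:]:
        if not title.startswith(prefix):
            break  # sorted order: no later title can match either
        matches.append(title)
    return matches


def subpages_word_first(word):
    """word/Language naming: a single prefix lookup finds all subpages."""
    return titles_with_prefix(word + "/")


def subpages_language_first(word, languages):
    """Language/word naming: each candidate language must be probed."""
    title_set = set(TITLES)
    candidates = (f"{lang}/{word}" for lang in languages)
    return [title for title in candidates if title in title_set]
```

With word-first naming, `subpages_word_first("da")` needs one lookup, and typing "da/Z" already narrows to a single completion. With language-first naming, the caller has to supply the full language list, but `titles_with_prefix("Zhuang/str")` gives the language-restricted search described above. Note that `subpages_word_first("s")` would also return "s/he", which is the kind of page that would need special handling.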

Enabling categories for logged-out users

Currently, categories are hidden on mobile unless a user is logged in and has "advanced mode" enabled. I don't think there's any good reason to do this since categories are a pretty important part of the site. Apparently we need to get community consensus and then open a Phabricator request to set $wgMinervaShowCategories['base'] = true; Would you support this? Ioaxxere (talk) 14:16, 14 May 2024 (UTC)[reply]

@Ioaxxere Support I had always assumed there was some reason for not doing so already such as:
  • not making entries look too cluttered
  • categories are too technical for viewers of Wiktionary who do not edit
  • or there are technical difficulties involved
Kutchkutch (talk) 14:27, 14 May 2024 (UTC)[reply]
Support. Binarystep (talk) 14:42, 14 May 2024 (UTC)[reply]
I assume this would slow things down a bit for all users. How much? DCDuring (talk) 14:49, 14 May 2024 (UTC)[reply]
Support. Benwing2 (talk) 14:55, 14 May 2024 (UTC)[reply]
I don't think this would cause any noticeable slowdown, even on very large pages. Theknightwho (talk) 14:57, 14 May 2024 (UTC)[reply]
Support SAMEER (؂؄؏) 18:01, 14 May 2024 (UTC)[reply]
Support, and I wish Wikipedia would follow suit but alas. lattermint (talk) 23:38, 14 May 2024 (UTC)[reply]
Support - -sche (discuss) 01:32, 15 May 2024 (UTC)[reply]
Support Fay Freak (talk) 01:55, 15 May 2024 (UTC)[reply]
Strong support SURJECTION / T / C / L / 12:48, 15 May 2024 (UTC)[reply]
Support CitationsFreak (talk) 16:45, 15 May 2024 (UTC)[reply]
Support Vininn126 (talk) 17:23, 15 May 2024 (UTC)[reply]
Support Theknightwho (talk) 17:26, 15 May 2024 (UTC)[reply]
Strong support AG202 (talk) 06:14, 16 May 2024 (UTC)[reply]
Strong support Thanks for proposing this. I too had always assumed there must have been some kind of technical/UX reason for not implementing this already, but none has been forthcoming. Voltaigne (talk) 07:59, 16 May 2024 (UTC)[reply]
It seems the consensus is overwhelming. Has a Phabricator request been opened yet? — SURJECTION / T / C / L / 19:45, 18 May 2024 (UTC)[reply]
@Surjection: phab:T365323. Ioaxxere (talk) 01:29, 19 May 2024 (UTC)[reply]
This has been merged into the next MediaWiki release, so it should start working relatively soon. Theknightwho (talk) 08:47, 21 May 2024 (UTC)[reply]
It looks like it was just deployed an hour ago. It seems to be fully functional when testing it in private-browsing mode. — SAMEER (؂؄؏) 20:22, 21 May 2024 (UTC)[reply]

Sign up for the language community meeting on May 31st, 16:00 UTC

Hello all,

The next language community meeting is scheduled in a few weeks - May 31st at 16:00 UTC. If you're interested, you can sign up on this wiki page.

This is a participant-driven meeting, where we share language-specific updates related to various projects, collectively discuss technical issues related to language wikis, and work together to find possible solutions. For example, in the last meeting, the topics included the machine translation service (MinT) and the languages and models it currently supports, localization efforts from the Kiwix team, and technical challenges with numerical sorting in files used on Bengali Wikisource.

Do you have any ideas for topics to share technical updates related to your project? Any problems that you would like to bring for discussion during the meeting? Do you need interpretation support from English to another language? Please reach out to me at ssethi(__AT__)wikimedia.org and add agenda items to the document here.

We look forward to your participation!


MediaWiki message delivery 21:23, 14 May 2024 (UTC)[reply]

New TOC scheme

After thinking about our TOC issues a bit more I feel like it's impossible to have a one-size-fits-all system that's convenient for every user. Instead, I think we should have a different scheme for different skins. What do you think about this proposal? Pinging those who participated in the discussion above: @-sche, Benwing2, Theknightwho, DCDuring, Thadh.

1-4 L2s 5-9 L2s 10-19 L2s 20+ L2s
Vector (and other old skins) default TOC mini TOC
Vector 2022 default TOC Both
Mobile default TOC mini TOC

Ioaxxere (talk) 14:31, 15 May 2024 (UTC)[reply]

Testing would be good, especially if the system is to be imposed on IPs as a default, as we have very little information about how normal users use our entries, even basic facts like how many use English L2s only. I assume that other users will be able to override the defaults. DCDuring (talk) 14:56, 15 May 2024 (UTC)[reply]
@DCDuring: Users would be able to override the table with some simple CSS settings (see User:Ioaxxere/common.css for an example). The code is admittedly ugly but this comes with the benefit of virtually complete control over the output. As for what normal users think, maybe this is a good opportunity to "connect with readers" @Vininn126? Ioaxxere (talk) 16:30, 15 May 2024 (UTC)[reply]
We probably need a larger followership before we start making questionnaires. Vininn126 (talk) 10:29, 17 May 2024 (UTC)[reply]
Sounds good to me in principle. Thadh (talk) 15:08, 15 May 2024 (UTC)[reply]
@Ioaxxere Like @Thadh I would say "sounds good in principle"; I have no specific objections but I think there should be a period of testing before committing to particular numbers of L2's for controlling the TOC appearance. Benwing2 (talk) 00:04, 18 May 2024 (UTC)[reply]
Could we have numberings? (too difficult to count without them), possibly 'show' the frame default? & trial examples also at pages like this one? (probably with not centered text)? Thank you ‑‑Sarri.greek  I 00:32, 18 May 2024 (UTC)[reply]
@Benwing2: For testing, here are the current pages which use the template: a (169 L2s), da (91 L2s), rock (13 L2s), small (8 L2s), and fish (2 L2s, so no effect). The L2 numbers can be edited at any time here: Module:minitoc/styles.css. Ioaxxere (talk) 03:40, 18 May 2024 (UTC)[reply]

Should entries be categorized according to speculative etymologies?

If a word’s etymology is sufficiently disputed or uncertain to merit an {{uncertain}}, should that word be categorized as coming from the proposed sources? For instance, should Vulgar Latin *tīrāre be categorized as coming from Old Persian, from Proto-Germanic, and from Greek according to the various competing theories? I’m inclined towards ‘no’, since we know for a fact that at least two of the three categorizations will inherently be wrong. (And in all likelihood all three of them are.) Yet I often find entries where this sort of categorization has been done anyway. Nicodene (talk) 03:39, 16 May 2024 (UTC)[reply]

I agree — the entry being in a category called "terms derived from Y" misleadingly implies a level of certainty that doesn't exist. FYI, {{etymon}} lets you explicitly specify whether a derivation is "confident" or "uncertain" so maybe we could use that to generate better categories. Ioaxxere (talk) 05:02, 16 May 2024 (UTC)[reply]
On the other hand, if one is looking for Latin words coming from Old Persian, I'd want it in that category rather than having to also look at the category of Latin words possibly coming from Old Persian. One usually has to at least look for the etymology section because of the possibility of Latin homographs. --RichardW57m (talk) 13:55, 16 May 2024 (UTC)[reply]
That’s doable by searching for Latin terms of uncertain origin, along with the keyword ‘Persian’. I’d really rather not have such a word be part of the actual category in question since it almost certainly doesn’t belong. Nicodene (talk) 14:30, 16 May 2024 (UTC)[reply]
So what is the search URI? Is it not obscure and prone to false detections? It's also a second search specification, and nothing like as simple to type as https://en.wiktionary.org/wiki/cat:Latin_terms_derived_from_Old_Persian. (I only need to type the scheme and underscores to make it look good on this page!) --RichardW57m (talk) 17:07, 16 May 2024 (UTC)[reply]
This, and his argument can be inverted: He can search entries containing the category but not {{uncertain}}. Overall both and either is easier. 😃 Fay Freak (talk) 20:59, 16 May 2024 (UTC)[reply]
Categories have no inherent truth value. They as well have the issue of containing surface analyses, affixations from previous chronolects: intended to have utility value. Fay Freak (talk) 20:59, 16 May 2024 (UTC)[reply]
I don't see what's so confusing. The search is as follows: incategory:"Latin terms with unknown etymologies" "Old Persian". If an argument is to be made that the command in question is obscure, so is the notion of someone wanting to find Latin (~Proto-Romance) words of unlikely Persian origin. Nicodene (talk) 02:07, 23 May 2024 (UTC)[reply]
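
For anyone who wants that search as a URI, here is a minimal sketch that builds one from the query quoted above. It assumes the usual `/w/index.php?search=` entry point on en.wiktionary.org; the function names are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def search_url(category, phrase, host="en.wiktionary.org"):
    """Build a search URL for pages in a category that mention a phrase."""
    query = f'incategory:"{category}" "{phrase}"'
    # urlencode percent-escapes the quotes and colon and turns spaces into '+'.
    return f"https://{host}/w/index.php?" + urlencode({"search": query})


def search_query_of(url):
    """Decode the search string back out of a URL built by search_url."""
    return parse_qs(urlparse(url).query)["search"][0]


url = search_url("Latin terms with unknown etymologies", "Old Persian")
print(url)
print(search_query_of(url))
```

The second print shows the decoded query, which should match what one would type into the search box by hand.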

"undo" has become "cin gbere le" on History tab

The "undo" button on the edit history is appearing for me as "cin gbere le". I don't know what language this is (if it is a language) but I guess something has gone wrong somewhere in the Wiktionary backend. Smurrayinchester (talk) 12:14, 16 May 2024 (UTC)[reply]

Still says "undo" for me. —Mahāgaja · talk 12:15, 16 May 2024 (UTC)[reply]
@Smurrayinchester: See w:Wikipedia:Village_pump_(technical)#'Undo'_button_now_says_'cin_gbere_le'. —Mahāgaja · talk 12:17, 16 May 2024 (UTC)[reply]
Thanks! Smurrayinchester (talk) 13:08, 16 May 2024 (UTC)[reply]
@Mahagaja: That link doesn't work (no such section). Any chance of a permalink? I don't have this problem but I am really curious what "cin gbere le" means. Equinox 03:30, 23 May 2024 (UTC)[reply]
It's Nupe language: translatewiki:MediaWiki:Editundo/nup. See the archive w:Wikipedia:Village pump (technical)/Archive 212#'Undo' button now says 'cin_gbere_le'. Vriullop (talk) 05:57, 23 May 2024 (UTC)[reply]

Creoles using "inh" template for words from lexifier language?

The whole time, I've been using the "der" template for Macanese words that derive from Portuguese, but I've seen some Kabuverdianu entries that use "inh" instead, as well as Solombala English entries that also use "inh" for words from Russian and English, and that's not even a creole. On the other hand, Haitian Creole uses "der" for words from French, and I believe Papiamentu also uses "der" for words from Portuguese and/or Spanish. So is there a specific preferred template for these kinds of things? Are creoles really considered to be "inheriting" words from their lexifier languages? Insaneguy1083 (talk) 17:56, 16 May 2024 (UTC)[reply]

@Insaneguy1083: Old, unresolved issue: Wiktionary:Beer parlour/2018/May#Lexifier etymology template? Fay Freak (talk) 21:02, 16 May 2024 (UTC)[reply]

Chinese: how should we display alternative readings?

(Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏, LittleWhole): Currently we have a lot of ways to separate alternative readings (e.g. thâng / thóng). I have compiled the currently covered Chinese lects and displayed them here:

  • In the input, / and , are used to separate alternative readings.
  • In the input, ; and / are used to separate different sub-lects. (Hakka specifies them using = while Wu specifies them using :.)
  • In the collapsed output (before you click "more"), ,, ;, and / are used to separate alternative readings.
  • In the expanded output (after you click "more"), ,, ;, and / are used to separate alternative readings. Different lects are also displayed in different sections. One thing that is consistent is that , is used for IPA, because / would be confusing. (Note that in Hokkien, Pe̍h-ōe-jī and Tâi-lô use / but Phofsit Daibuun uses ,.)

While it would be harder (though not impossible) to unify the inputs, what is more realistic is to unify the outputs. Can (and should) we come to one standard for this? (Obligatory XKCD 927) --kc_kennylau (talk) 18:48, 18 May 2024 (UTC)[reply]

I'm not a Chinese editor, but FWIW my instinct would be to use commas like e.g. {{alter}} and other "lists", but... do we anticipate ever needing to list alternative readings of a phrase that contains commas (like 一枝草,一點露)? Because commas would become confusing there. Of course, that is a general issue, also hitting {{alter}} et al (in all languages). If anyone has the energy to code this feature, maybe all these templates (not just for Chinese) that use commas could default to commas, but switch to separating items with semicolons when the lemma form or alt forms / alt readings contain commas? Or provide a parameter someone could set to make the item-separating commas switch to semicolons in such cases? - -sche (discuss) 19:12, 18 May 2024 (UTC)[reply]
Yes, that situation does occur, in e.g. 潮州音樂——自己顧自己 which currently displays as ciu4 zau1 jam1 ngok6, zi6 gei2 gu3 zi6 gei2 / ciu4 zau1 jam1 ngok6, gi6 gi1 gu3 gi6 gi1 (with superscripts). --kc_kennylau (talk) 19:14, 18 May 2024 (UTC)[reply]
Personally, I consider it suboptimal for the Chinese version and the transliteration/reading/etc to use different punctuation there (dash vs comma) and would prefer if the transliteration/reading also used a dash (or if the Chinese used a comma). (But as long as some Chinese entries like 一枝草,一點露 do use commas, I take the point that the situation does occur.) - -sche (discuss) 01:31, 19 May 2024 (UTC)[reply]
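
The fallback suggested above (default to commas, switch to semicolons when an item itself contains a comma) could look roughly like this. It is a sketch only: the function name is hypothetical and it is not based on the actual code of {{alter}} or zh-pron.

```python
def join_items(items, default_sep=", ", fallback_sep="; "):
    """Join list items with commas unless an item itself contains a comma."""
    # Check both the ASCII comma and the full-width comma used in Chinese text.
    has_comma = any("," in item or "，" in item for item in items)
    separator = fallback_sep if has_comma else default_sep
    return separator.join(items)


simple = join_items(["thâng", "thóng"])
with_commas = join_items([
    "ciu4 zau1 jam1 ngok6, zi6 gei2 gu3 zi6 gei2",
    "ciu4 zau1 jam1 ngok6, gi6 gi1 gu3 gi6 gi1",
])
print(simple)
print(with_commas)
```

The same check could instead be exposed as a template parameter, as proposed above, if automatic switching turns out to be surprising in practice.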
Coming from a predominantly Northern Wu editing background, I would prefer that what we do with |w= be adopted in all other lects. However, one thing I would like to ask about is ;, as it can be very confusing (see prev. discord convo). Currently I think there should be some sort of case-by-case arbitration regarding ; usage rather than fully deprecating it, though if anyone thinks otherwise do speak out — nd381 (talk) 19:46, 18 May 2024 (UTC)[reply]
Yes, thank you for bringing that up, I think that would be very relevant to this thread. Basically, to summarize for other people, Wu currently covers more than 10 lects all in one parameter that corresponds to "Northern Wu" (aka "Taihu Wu"), and we still don't quite have a consensus on what to do under a situation such as:
  • Lect A: Reading P, Reading Q
  • Lect B: Reading Q, Reading R
  • Lect C: Reading P, Reading Q, Reading R.
What is clear is that in such a situation, the collapsed display would most certainly be P / Q / R, but anything beyond that is unclear:
  • Should the input be A:P,Q;B:Q,R;C:P,Q,R (group by lect), or A,C:P;A,B,C:Q;B,C:R (group by reading)? Or should we allow both?
    • Grouping by lect would be inefficient if it happens that a lot of lects share the same reading P (which actually, the Wugniu romanization takes great care to prioritize compatibility between different lects).
    • Grouping by reading would make it hard to see at first glance what readings a given lect has, which would arguably be an important information (say for someone who is only learning a single lect).
    • Allowing both inputs is also not ideal, because standardized inputs are easier to monitor and track (and bot).
  • I think it should also be noted that to my knowledge Hokkien seems to be grouping by reading.
  • Similarly there is an issue of how to group the expanded display.
  • And when it scales up, for say a word of 4 characters, we would definitely run into issues where the listed readings are all very similar, because say Lect A would have a slightly different reading of the first character, while Lect B would have a slightly different reading of the third character, and so on. (See 世界 for an example.)
--kc_kennylau (talk) 21:11, 18 May 2024 (UTC)[reply]
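
As a rough illustration of the two input groupings in the example above, the following sketch converts the group-by-lect form into the group-by-reading form and derives the collapsed display. It only handles the toy syntax used in this comment (lects A, B, C and readings P, Q, R); the function names are hypothetical and nothing here reflects the actual zh-pron code.

```python
def parse_by_lect(text):
    """Parse 'A:P,Q;B:Q,R' into {'A': ['P', 'Q'], 'B': ['Q', 'R']}."""
    by_lect = {}
    for group in text.split(";"):
        lect, readings = group.split(":", 1)
        by_lect[lect.strip()] = [r.strip() for r in readings.split(",")]
    return by_lect


def to_by_reading(by_lect):
    """Invert the mapping: each reading lists the lects that have it."""
    by_reading = {}
    for lect, readings in by_lect.items():
        for reading in readings:
            by_reading.setdefault(reading, []).append(lect)
    return by_reading


def format_by_reading(by_reading):
    """Serialize as 'A,C:P;A,B,C:Q', in order of first appearance."""
    return ";".join(
        f"{','.join(lects)}:{reading}" for reading, lects in by_reading.items()
    )


def collapsed(by_lect):
    """Collapsed display: every distinct reading once, slash-separated."""
    return " / ".join(to_by_reading(by_lect))


data = parse_by_lect("A:P,Q;B:Q,R;C:P,Q,R")
print(format_by_reading(to_by_reading(data)))
print(collapsed(data))
```

Since one form can be derived mechanically from the other, it may be enough to standardize on a single input grouping and generate the other view for display.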
I think we should use slashes, and also put all readings on new lines (currently Mandarin gets new bullets for each reading, while for Cantonese everything is crammed into one line). New lines should nullify confusion with IPA slashes.
(also. shilling User:Fish bowl/p/mul#Chinese for consideration. —Fish bowl (talk) 22:10, 18 May 2024 (UTC))[reply]

Chinese: Spaces around ellipsis in transliteration?

(Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏, LittleWhole): Apologies for the double ping. 一面……一面…… has pinyin as yīmiàn ... yīmiàn ... (with spaces), while 關……屁事 has pinyin as guān...pìshì (without spaces). Both pinyin are currently created. One can see from the synonyms of these two pages respectively that their synonyms follow the same rule as the pages themselves. The Cantonese module has recently been updated to explicitly only allow the variant without spaces. What should the standard for this be? --kc_kennylau (talk) 18:53, 18 May 2024 (UTC)[reply]

spaces look cleaner — nd381 (talk) 19:46, 18 May 2024 (UTC)[reply]
also consider: one space, only on the right side of the ellipsis —Fish bowl (talk) 22:15, 18 May 2024 (UTC)[reply]
Ditto. Anatoli T. (обсудить/вклад) 00:14, 19 May 2024 (UTC)[reply]
I prefer spaces on both sides of ellipses. — justin(r)leung (t...) | c=› } 18:23, 20 May 2024 (UTC)[reply]
With spaces only after the ellipsis. – wpi (talk) 15:06, 23 May 2024 (UTC)[reply]
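
Whichever convention is chosen, existing transliterations could be normalized mechanically. Below is a minimal sketch covering the three options raised in this thread; the function and style names are hypothetical, and it is not taken from the Cantonese or Mandarin modules.

```python
import re

# An ellipsis is three or more ASCII dots, or one or more "…" characters,
# together with any whitespace around it.
_ELLIPSIS = re.compile(r"\s*(?:\.{3,}|…+)\s*")

_STYLES = {
    "both": " ... ",   # spaces on both sides
    "none": "...",     # no spaces
    "right": "... ",   # one space, only after the ellipsis
}


def normalize_ellipsis(pinyin, style="both"):
    """Rewrite every ellipsis in a transliteration to the chosen spacing."""
    return _ELLIPSIS.sub(_STYLES[style], pinyin).strip()


print(normalize_ellipsis("guān...pìshì", "both"))
print(normalize_ellipsis("yīmiàn ... yīmiàn ...", "none"))
print(normalize_ellipsis("guān...pìshì", "right"))
```

The final `strip()` removes the space that would otherwise trail an ellipsis at the end of the string.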

An organization with a partnership with Wikimedia Foundation wanted wiki editors to translate a definition of theirs. They did release the definition with an open copyright license. WMF's position is that sometimes this works.

They are currently getting more information but I wanted to ask here - what is copyright status of definitions, and what is the copyright status of translations? While WMF legal has a project for definitions this could be a good time to talk.

Bluerasberry (talk) 18:56, 18 May 2024 (UTC)[reply]

@Bluerasberry: if you look at the bottom of every page here, you will see the following sentence: "Definitions and other text are available under the Creative Commons Attribution-ShareAlike License; additional terms may apply." In addition, every time an editor submits text, it is also irrevocably released under the GNU Free Documentation License—there is also a message to that effect. Does that answer your question? — Sgconlaw (talk) 22:34, 18 May 2024 (UTC)[reply]
@Sgconlaw Thanks, but no it does not.
The Wikimedia Foundation legal team has made a soft legal assertion that translation of definitions are not subject to copyright. In the current WMF project a definition claims a conventional copyright, but since copyright does not apply to translations of definitions, that definition can be directly translated into Wikimedia projects.
I get the Wiktionary policy. My surprise here is about the official WMF position of the copyright of translated definitions. If anyone wishes to press for clarity then getting a WMF commitment to legally back this theory for Wiktionary could be interesting. Bluerasberry (talk) 14:18, 19 May 2024 (UTC)[reply]
@Bluerasberry: with respect to the WMF, that makes no sense. A definition is licensed under CC-BY-SA-4.0. This means people are free to "remix, transform, and build upon the material for any purpose, even commercially"—which would include a translation—but subject to properly attributing the material and distributing any new material based on the original material under the same licence as the original. Obviously, then, translations of CC-BY-SA-4.0 material cannot possibly be "not subject to copyright" but must be released under CC-BY-SA-4.0 as well. The position is the same under the (more or less superseded) GFDL—any modified version (which includes a translation) must be released "under precisely this License". — Sgconlaw (talk) 18:30, 19 May 2024 (UTC)[reply]
@Sgconlaw At meta:International Museum Day 2024/Translation call the WMF organized the translation of a definition as an outreach program. The talk page there gives a rationale why. They requested time to sort things.
I share your perspective on all these things, and think that what you are saying is conventional but that this WMF outreach program is surprising. I appreciate the outreach and partnership. I wish that their programs matched Wikimedia editor practices.
The conversation there links to this conversation, and I think that is as much as I wish to react. Bluerasberry (talk) 15:46, 21 May 2024 (UTC)[reply]
@Bluerasberry: I read the discussion at "meta:Talk:International Museum Day 2024" and now have a clearer understanding of what you are asking. You weren't specifically asking about the copyright status of definitions here at Wiktionary, which are clearly licensed under CC-BY-SA-4.0 and GFDL, and not free of copyright. You were referring to a definition of the word museum published by the International Council of Museums (ICOM) at https://icom.museum/en/resources/standards-guidelines/museum-definition/. It's not a very long definition, but I don't think it is short enough to be de minimis so it is plausible that it is subject to copyright. ICOM is based in France, so French copyright law would presumably apply. I don't know what that law says about WMF's arguments on "functional language" and fair use. If English copyright law applied, I'm not sure there would be any "functional language" copyright exception. If ICOM is collaborating with the WMF in a project, ideally it should just clarify that it is licensing the text in question under a free licence for the purpose of the project. — Sgconlaw (talk) 16:34, 21 May 2024 (UTC)[reply]
  • I once published a Catalan-English dictionary, entirely copied from en.wiktionary. But I was at the time the third-biggest contributor to the site, and the ninth-biggest Catalan one, so figured it was fair game. I made 70 euros from it. P. Sovjunk (talk) 21:25, 19 May 2024 (UTC)[reply]
That is fair. In that case you are going Wiki -> off wiki, selling free content in book form for a fee. In the case above, content is going off-wiki -> wiki. Bluerasberry (talk) 15:46, 21 May 2024 (UTC)[reply]

I've created a gadget that allows users to specify "preferred languages" for {{minitoc}}, similar to the preferred languages system for our translation tables. These preferred languages are linked to on the header with the goal of allowing users to navigate to languages they're interested in faster (as requested in e.g. [13]). If anyone wants to try it out, go to User:YourName/common.js and add the line importScript("User:Ioaxxere/minitoc.js");

If we adopt {{minitoc}}, could we add this script as a default gadget so it can be used by logged-out users? Pinging interface administrators who participated in the previous discussion: @-sche, Benwing2. Ioaxxere (talk) 02:16, 19 May 2024 (UTC)[reply]

@Ioaxxere Hmm, I think this is useful. I note that after changing your preferred languages, you have to refresh the page for them to display; not sure if this is fixable. Benwing2 (talk) 04:41, 19 May 2024 (UTC)[reply]
@Benwing2 That's strange — so what happens when you press "save"? Is there an error? Ioaxxere (talk) 05:00, 19 May 2024 (UTC)[reply]
@Ioaxxere Hmm. I couldn't reproduce it, even after clearing my cookies, so I commented out the import, cleared my cookies again and uncommented the import, and now I don't get the functionality at all. Benwing2 (talk) 05:13, 19 May 2024 (UTC)[reply]
BTW no JavaScript errors coming from your gadget, only from the PreviewPopup gadget and complaints about Wiktionary using third-party cookies to access enwiki and mediawiki.org. Benwing2 (talk) 05:14, 19 May 2024 (UTC)[reply]

Infrastructure: Southern Pinghua vs. Nanning Pinghua

(@Benwing2) Recently we've added Nanning Pinghua to zh-pron (e.g. 捱更抵夜). Nanning Pinghua is a variety of Southern Pinghua, which is either a branch of Sinitic directly, or a sub-branch of the Yue branch. However, there is a problem with the categories. Currently zh-pron by default categorises them to Category:Southern Pinghua lemmas, but there is also a dialectal label {{lb|zh|Nanning Pinghua}} which categorises entries independently into Category:Nanning Pinghua (by complete accident I discovered that Category:Southern Pinghua is also populated as a label category), and these two categories are not directly connected:

If you ask me, I think "Pinghua languages" / "Pinghua Chinese" shouldn't exist at all, because Northern Pinghua and Southern Pinghua do not form a linguistic sub-branch by themselves. In User:Wpi/zh-dial-list we can also see that "Northern Pinghua" and "Southern Pinghua" are treated as their own branches in Module:zh/data/dial.

Actually, upon more exploration, it seems that this issue is not unique to Nanning Pinghua. There are also Category:Cantonese Chinese and Category:Cantonese language which are also not directly related, but they seem to have "solved" the issue by using a "See also" right under the header. --kc_kennylau (talk) 08:24, 19 May 2024 (UTC)[reply]

@Kc kennylau Yes, this is unfortunate and a known issue. I would like to eliminate the 'Foo Chinese' categories in favor of 'Foo lemmas', whenever possible. Benwing2 (talk) 19:09, 19 May 2024 (UTC)[reply]
@Benwing2: Can we make "Nanning Pinghua" (the label) instead point to a dialect of "Southern Pinghua" (the language csp)? --kc_kennylau (talk) 20:48, 19 May 2024 (UTC)[reply]
@Kc kennylau Sure. I didn't mean to imply that dialects are forced to categorize directly into lemma categories, but rather that we should eliminate the parallel hierarchy underneath Category:Dialectal Chinese in favor of a more conventional hierarchy that categorizes either directly into 'Foo lemmas' or into dialectal subcategories. So for example the label Cantonese should categorize into Category:Cantonese lemmas not into Category:Cantonese Chinese, and specific varieties of Cantonese go into their own categories, which are subcategories of Category:Regional Cantonese. So the label Nanning Pinghua would categorize into Category:Nanning Pinghua, which would be a subcategory of Category:Regional Southern Pinghua (which doesn't seem to exist yet). BTW there is an open WT:RFM topic under WT:RFM#Ramifying/filling out Yue Chinese on cleaning up the Yue branch, which is currently a hopeless mess. Benwing2 (talk) 21:02, 19 May 2024 (UTC)[reply]
@Benwing2: I didn't forget about that discussion. In fact, we are discussing that amongst ourselves, and it is also a bit of a mess right now. --kc_kennylau (talk) 21:06, 19 May 2024 (UTC)[reply]
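The hierarchy Benwing2 describes above can be sketched as a small lookup, purely for illustration. The table and function names below are hypothetical and are not how Module:labels or the category modules are actually organised; only the category names come from the thread, and no parent is assumed for Category:Regional Southern Pinghua because none was stated.

```python
# Hypothetical sketch: each dialect label maps to the category it populates,
# and each category records its parent (where the thread states one).
LABEL_TO_CATEGORY = {
    "Cantonese": "Cantonese lemmas",
    "Nanning Pinghua": "Nanning Pinghua",
}

CATEGORY_PARENT = {
    "Nanning Pinghua": "Regional Southern Pinghua",
}


def category_chain(label):
    """Return the category a label populates, followed by its known ancestors."""
    chain = []
    category = LABEL_TO_CATEGORY.get(label)
    # Stop at the top of the known hierarchy, and guard against cycles.
    while category is not None and category not in chain:
        chain.append(category)
        category = CATEGORY_PARENT.get(category)
    return chain


print(category_chain("Nanning Pinghua"))
print(category_chain("Cantonese"))
```

The point of the single table is that a label never needs a parallel 'Foo Chinese' category: it either lands in 'Foo lemmas' directly or in a dialect category that hangs under a 'Regional Foo' parent.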

Giving Appropriate Credit for Critical Semi-Anonymous Works

Here (diff), I used Wiktionary to highlight what is probably one of the earliest appearances of the word 'Haidong'. I frequently use Foreign Broadcast Information Service translations for words that rarely appear elsewhere, or that only appear decades later. I recently saw some discussion about whether it is appropriate to use citations/quotations that do not name specific authors. Are semi-anonymous citations/quotations legitimate targets for a Wiktionary citation/quotation? They can be used to meet goals from Wiktionary:Quotations#Choosing_quotations such as "Extend the time range that we have quotations for, or fill long time gaps;" and/or "Show the variety of genres, regions and registers that a term is used in." There is a critical area of cultural heritage for any word that is written semi-anonymously. Here, in the above cite, we have an anonymous propaganda author and an anonymous propaganda broadcaster (at "Xining Provincial Service"), with an anonymous propaganda transcriber and an anonymous propaganda translator (at Foreign Broadcast Information Service). It would be calamitous to the project of documenting actual usage if this citation/quotation were deleted on the basis that we don't know the name of the translator who would have been "creating" this English language loan word. (Please feel free to review the materials and find any relevant names. But I didn't see any.) Have I given appropriate credit to everyone involved in that text, as far as practically possible? Is that quotation a proper subject for a Wiktionary citation/quotation? Thank you. --Geographyinitiative (talk) 20:41, 19 May 2024 (UTC)[reply]

@Geographyinitiative I don't see any issue whatsoever in citing "semi-anonymous" sources, as you say, as long as you include whatever bibliographic information is available. Has anyone threatened to delete them? AFAIK these are well-accepted. Benwing2 (talk) 20:53, 19 May 2024 (UTC)[reply]
I just want to give everybody appropriate credit, but the genre of "spy reports", which is very useful for some of the vocabulary I'm looking at, is super opaque. --Geographyinitiative (talk) 20:59, 19 May 2024 (UTC)[reply]

Interslavic language

Now that the Interslavic language has been given an ISO 639-3 code, is it welcome on en.wikt? There is an online dictionary containing around 18,000 words ([14]) that can be used as a base for introducing it here. Wojsław Brożyna (talk) 01:27, 21 May 2024 (UTC)[reply]

Typically, constructed languages are in the appendix namespace, especially ones that don't have a long history with actual speakers, etc. In principle, I'd be okay with Interslavic in the appendix namespace. —Justin (koavf)TCM 02:02, 21 May 2024 (UTC)[reply]
See the recent Wiktionary:Beer_parlour/2024/April#CFI_for_constructed_languages. The general attitude is most editors don't want more mainspace conlangs, as Justin mentioned. Appendix seems fine. Vininn126 (talk) 07:31, 21 May 2024 (UTC)[reply]
We would still need to add a language code for it, but I don't really see the issue with doing that if it's appendix-only. Theknightwho (talk) 08:46, 21 May 2024 (UTC)[reply]
  • Thanks for the replies. I hope that the code will quickly be included in WT:LL. In the meantime: since the words themselves should be added in the appendix, is it acceptable to link to them from the mainspace? For example, as descendants of Proto-Slavic words in the reconstruction namespace? --Wojsław Brożyna (talk) 14:00, 21 May 2024 (UTC)[reply]
    No, it is not acceptable to link them from mainspace, with the exception of terms in mainspace languages that derive from that language - so if a mainspace language has borrowed a word from Interslavic, then it can be linked, otherwise not (not as a translation, not as a descendant in mainspace or reconstruction space, etc.). I've now added the isv code as an appendix-only language code. — SURJECTION / T / C / L / 14:07, 21 May 2024 (UTC)[reply]
    To be sure that I properly understood: the reconstruction namespace is treated also as a part of mainspace, yes? --Wojsław Brożyna (talk) 14:10, 21 May 2024 (UTC)[reply]
    Essentially, yes. Theknightwho (talk) 14:11, 21 May 2024 (UTC)[reply]

To be perfectly honest, I don't really see the point. When I look at Category:Appendix-only constructed languages, I see mostly languages with little or no practical use. The only possible outcome of Interslavic in an appendix-only construction would be duplicating stuff that can already be found here and/or here, an additional risk being that we might end up with divergent versions. So it's not clear to me what the value would be. For the record, Interslavic is not some hobby language, and its community is at least ten times bigger than that of Ido and Interlingua, languages for which apparently an exception has been made. IJzeren Jan (talk) 14:51, 21 May 2024 (UTC)[reply]

@IJzeren Jan I think it was a mistake to include Ido and Interlingua in the mainspace; these should be moved to the Appendix, as we already did with Novial for example. They should not be considered precedents. Benwing2 (talk) 19:02, 21 May 2024 (UTC)[reply]
We might want to eventually take this to a formal vote or perhaps a thread in WT:RFM. We have enough BP threads to do so. Vininn126 (talk) 19:05, 21 May 2024 (UTC)[reply]
I would much rather have a vote as to what the criteria for conlangs being included in mainspace are, which would conveniently decide this question as well. I personally favour "has or has had a native speaker", but it'll need to be more fleshed-out than that to be rigorous. Theknightwho (talk) 22:28, 21 May 2024 (UTC)[reply]
Some objective criterion would certainly be helpful. However, native speakers won't do, because auxiliary languages are not even meant to have native speakers. Even in the case of Esperanto it's not the ultimate fulfilment of its goals but merely a side effect of its usage. And note that even though Esperanto reportedly has about a thousand native speakers, it still does not have any monolingual native speakers. If anything, I'd argue that a criterion like "has or had a user community of at least 1,000 people" (or whatever other number you prefer) would be more suitable, although I'll admit such figures are not without issues either. IJzeren Jan (talk) 07:14, 22 May 2024 (UTC)[reply]
That's assuming people want more conlangs; I'd say most editors don't based on the thread I linked above. Vininn126 (talk) 07:17, 22 May 2024 (UTC)[reply]
You don't seem to get my point. First of all, native speakers are gravely overrated. Although Esperanto is probably the only constructed language with native speakers, it's not like they set any linguistic standards, and as far as there is any form of natural development at all, that's done by L2 speakers. Besides, there are many languages here that never had native speakers. Let me just mention Old Church Slavonic and Rumantsch Grischun. To go even further, it's questionable whether classical Latin ever had native speakers in the form it's being presented. Secondly, it's wrong to assume some kind of binary distinction between natural and constructed languages. Languages show various degrees of deliberate human intervention, which goes especially for artificially created standardisations (like Nynorsk, Rumantsch Grischun, Euskara Batua, Limba Sarda Comuna) and revived languages (like Modern Hebrew, revived Cornish). Lastly, I don't really understand where that "allergy" to languages that were created at some point in history comes from. Nobody complains about some obscure language with very few speakers, so what's the problem with a few constructed languages with a lot of speakers? IJzeren Jan (talk) 10:56, 22 May 2024 (UTC)[reply]
I have a feeling you didn't read through the thread. Vininn126 (talk) 11:02, 22 May 2024 (UTC)[reply]
Those first counterpoints are questionable. And secondly it is meaningful to have native speakers - they have a natural intuition for what should or shouldn't be correct. Vininn126 (talk) 11:06, 22 May 2024 (UTC)[reply]
@IJzeren Jan: When we record languages like Latin or Old Church Slavonic, we don't simply record the written language, but we simultaneously record the spoken language associated with it: Vulgar Latin and Pre-Bulgaro-Macedonian. For auxlangs, you don't have a natural counterpart that is recorded together with the written one. Thadh (talk) 08:12, 23 May 2024 (UTC)[reply]
In the meantime, do you think that it'd be worth it to have a vote to remove Interlingua, Ido, and Volapük? I honestly don't see a criterion for constructed languages being agreed upon in the near future, and as is, the only constructed language that'd fall under any criterion that's been mentioned is Esperanto anyways. AG202 (talk) 14:12, 22 May 2024 (UTC)[reply]
I'd rather do that only if a vote to have a general criterion fails. Otherwise, it might end up being a lot of work for nothing. Theknightwho (talk) 15:00, 22 May 2024 (UTC)[reply]
I am fine with having a vote about Interlingua, Ido and Volapük in the near term, as I think User:AG202 is right that it will be difficult to find a criterion that everyone agrees on and I don't think it will be a waste of effort to have this vote. Benwing2 (talk) 18:02, 22 May 2024 (UTC)[reply]
It should be simple enough to set the entry barrier for a given conlang as ‘there exists a consensus on Wiktionary to admit it’. I imagine only Esperanto will clear that hurdle. Nicodene (talk) 23:24, 22 May 2024 (UTC)[reply]
We, more precisely, have to ask ourselves whether it would probably be child abuse if the conlang were taught to children as a native language. In Esperanto it wasn’t because the parents met and spoke it organically. In Interslavic it would be, because why don’t you teach the bairn your native harder Slavic language if you are proficient in it? No reasonable person … In Klingon the children also became weird. Were it to suffice that a native speaker has existed, one would set perverse incentives. I maintain my position that conlangers are suspect to suffer personality disorders. Fay Freak (talk) 12:40, 23 May 2024 (UTC)[reply]
@AG202: I support creating a vote to determine the status of those three languages although I'm undecided myself. Ioaxxere (talk) 22:00, 22 May 2024 (UTC)[reply]
Are there easily accessible, objectively measurable activity stats for these conlangs, such as the number of edits per month or the rate of adding new lemmas? Just to see the current situation. That said, any activity stats can be gamed and artificially propped up by certain individuals via tons of low-quality edits if this criterion were to become part of the official policy. --Ssvb (talk) 07:49, 23 May 2024 (UTC)[reply]

Feedback invited on Procedure for Sibling Project Lifecycle

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Dear community members,

The Community Affairs Committee (CAC) of the Wikimedia Foundation Board of Trustees invites you to give feedback on a draft Procedure for Sibling Project Lifecycle. This draft Procedure outlines proposed steps and requirements for opening and closing Wikimedia Sibling Projects, and aims to ensure any newly approved projects are set up for success. This is separate from the procedures for opening or closing language versions of projects, which is handled by the Language Committee or closing projects policy.

You can find the details on this page, as well as the ways to give your feedback from today until the end of the day on June 23, 2024, anywhere on Earth.

You can also share information about this with the interested project communities you work with or support, and you can also help us translate the procedure into more languages, so people can join the discussions in their own language.

On behalf of the CAC,

RamzyM (WMF) 02:26, 22 May 2024 (UTC)[reply]

‘Surface analysis’: The end of an era?

As tired as we all are of the matter, the previous discussion ended in a possibly promising proposal: replacing all current usages of {{surf}} with a new template {{af+}}, a clone of {{af}} with the accompanying text ‘derivable from X + Y’.

It seems to me that this is the first proposed phrasing to feature all three of the following:

  • Precision: word-derivation is actually a ‘thing’ in linguistics, unlike ‘surface analysis’ (edit: and also unlike “X is equivalent to Y + Z”)
  • Comprehensibility: any educated person should be able to understand it
  • Compatibility: it works with all (valid) usages of {{surf}}

Thoughts? Nicodene (talk) 02:44, 23 May 2024 (UTC)[reply]

Can we not just do this by a hard redirect of {{surf}} to {{af+}}? So doing would avoid notifications of change being sent out for the pages modified. --RichardW57m (talk) 10:27, 23 May 2024 (UTC)[reply]
I do not like the idea of {{af+}} printing "derivable from". I would prefer a different name. Vininn126 (talk) 10:46, 23 May 2024 (UTC)[reply]
Me neither, it is not explicit about mere non-diachronic derivability.
I avoid schematic phrasing anyway, for literary style: writing sometimes “analyzable as”, sometimes “equivalent to” etc. Fay Freak (talk) 12:29, 23 May 2024 (UTC)[reply]
@Fay Freak As far as I can tell it is inherently synchronic. I cannot think of a non-synchronic way to read the statement “quickly is derivable from quick + -ly”.
As for those other phrasings, they are not precise, which is what leads to silliness like “month: equivalent to moon + -th”.
@Vininn126 I understand that the proposed phrasing doesn’t actually contain the word that af stands for, but I’m not sure there is any that does which fits the scope of {{af}} / {{surf}}. We can’t well say **affixable from X + Y, and even if we could, it would be inaccurate for compounds like greenhouse, which fall within the current scope of {{af}} / {{surf}} but do not involve any actual affixes.
The mismatch between template name and description could be fixed another way: by renaming {{der(+)}} to {{ult(+)}} “ultimately from”. If anything that is actually a better description than “derived from”, such that many editors (myself included) have found themselves manually writing out “ultimately from {{der|…}}”. Once that is done, {{af}} can be renamed to {{der}} and {{surf}} to {{der+}} “derivable from”.
With bot assistance it would be fairly straightforward to implement this, I think. And the resulting system would be a good deal simpler/more transparent than our current one. Notably {{compound}} could finally be retired, since “derivable from” is applicable for compounds, while af is technically incorrect. Nicodene (talk) 19:52, 23 May 2024 (UTC)[reply]
I'm fine with the concept; like I said, it's the naming I don't like. {{der+}} is also not ideal. {{ult}} is so far the best I've seen, but it feels clunky. Vininn126 (talk) 19:53, 23 May 2024 (UTC)[reply]
Clunky in what way? Nicodene (talk) 04:18, 24 May 2024 (UTC)[reply]
Well to be honest it's not less clunky than {{surf}}. Vininn126 (talk) 09:10, 24 May 2024 (UTC)[reply]
I've got no strong opinion as to what it should be replaced with, but I too would like to get rid of this template; I never liked the wording. PUC 15:04, 24 May 2024 (UTC)[reply]
We should first agree on the wording, and then consider the best way to implement it.  --Lambiam 16:57, 24 May 2024 (UTC)[reply]
What about "analysable as"? I'm not sure it's a good idea to use derive as it's already doing a lot of heavy lifting—we already use it in etymologies to refer to a term being derived from other languages, as well as in the "Derived terms" heading (and it was pointed out in an earlier discussion that these two uses are already possibly inconsistent). — Sgconlaw (talk) 19:07, 24 May 2024 (UTC)[reply]
As a bonus we could switch from {{surf}}'s up to {{anal}}. Vininn126 (talk) 19:10, 24 May 2024 (UTC)[reply]
Yes, instead of just scratching the surface let's go deep with {{anal}}. PUC 19:51, 24 May 2024 (UTC)[reply]
Are we done with sniggering and drawing penises on the toilet stall door? Sgconlaw (talk) 22:23, 24 May 2024 (UTC)[reply]
I have no idea what you're referring to. Vininn126 (talk) 22:23, 24 May 2024 (UTC)[reply]
@Sgconlaw Vague formulations like ‘analysable as’ (or per @Sokkjo ‘equivalent to’) are as mentioned problematic in that they result in users claiming synchronic impossibilities like “month = moon + -th” or “again = on- + gain”.
The use of “derived” in derived terms sections is actually an argument in favour of this proposal, now that you mention it, as there’d be an increase in symmetry here. The derivation of quickly from quick is already mentioned on both entries; why not use matching language in both places?
As for the (unrelated) use of the template der, that is as mentioned fixable by renaming it to {{ult(imately)}}, which seems more descriptive really, and then renaming af/surf to der/der+. Nicodene (talk) 22:58, 24 May 2024 (UTC)[reply]
@Nicodene: I would not be comfortable with changing {{der}}/{{der+}} to “ultimately”. We have been using “derived” in etymology sections to denote derivation from one source language to another, and now suddenly repurposing it to mean something else is, I feel, a step too far. Moreover, “ultimately” suggests to me the omission of intermediate steps of derivation to some remote source language like Proto-Indo-European. (For example, term X is derived from Old French, which is derived from Latin, which is ultimately from Proto-Indo-European (skipping over Proto-Italic).) — Sgconlaw (talk) 23:18, 24 May 2024 (UTC)[reply]
(Edit: see below.) Nicodene (talk) 05:00, 25 May 2024 (UTC)[reply]
Oppose The only verbiage I support is equivalent to in areas where people instead use {{surf}}. I also do not support users going around changing From {{af}} to {{af+}}, as creating {{af+}} would promote. -- Sokkjō 21:11, 24 May 2024 (UTC)[reply]
@Sokkjo: I guess I'm OK with "equivalent to" as well if there's consensus for that. — Sgconlaw (talk) 22:23, 24 May 2024 (UTC)[reply]
I would also prefer "equivalent to" over any specific template, and I'm using it regularly. Thadh (talk) 23:03, 24 May 2024 (UTC)[reply]
(Edit: see below.) Nicodene (talk) 05:00, 25 May 2024 (UTC)[reply]
Would this be a thread to take to a wider audience? Vininn126 (talk) 23:11, 24 May 2024 (UTC)[reply]
Certainly. Nicodene (talk) 23:40, 24 May 2024 (UTC)[reply]
Support Theknightwho (talk) 21:39, 24 May 2024 (UTC)[reply]
@Theknightwho: thoughts on the wording? — Sgconlaw (talk) 22:23, 24 May 2024 (UTC)[reply]
"Equivalent to" is fine, though @Nicodene raises a good point about the difference between long-range derivations which contain combinations of roots completely alien to a modern speaker (e.g. analysing health as whole + -th) versus readily apparent formations that just-so-happened to enter the language as ready-formed borrowings. The problem with "equivalent to" is that it could refer to either, so it would be helpful to establish an alternative to refer to one of them. Theknightwho (talk) 10:11, 25 May 2024 (UTC)[reply]
I’m not aware of any serious linguistic source that would publish a comment like “health is equivalent to whole + -th”, or the same equation using any of the other phrases mentioned above. Personally I don’t see the point of it. If one really feels the need to do that sort of thing, one can simply spell it out in words: “…from Proto-Germanic *hailaz + *-iþō, which correspond to the modern English whole and -th.” I don’t see why this should need a template. Nicodene (talk) 11:52, 25 May 2024 (UTC)[reply]

Here is my revised proposal.

At the moment we handle distant etymological relations like "surgical: ultimately from Ancient Greek χειρουργία" using the template {{derived}}. This is a bit strange given that our "derived terms" sections never feature these words, instead having only language-internal formations like cuteness (< cute + -ness), for which we do not use the template {{derived}} but rather {{affix}} and its various "children" like {{compound}}.

My solution is to replace the current template {{derived}} with {{etyl}} and then rename the current {{affix}} to {{derived}}. Hence, finally, {{derived}} will match "derived terms".

(Also we'll avoid the awkwardness of using a template called {{affix}} for compounds like greenhouse, where there is no affixation going on at all. And, perhaps, we can one day retire {{compound}} and other such templates, replacing {{blend|en|emotion|icon}} with {{der|en|emotion|icon|blend=1}} and so on. Cutting down the absolute jungle of ety templates to a more modest size.)

Following this the infamous {{surf}} can be renamed to {{der~}} with the displayed text changed to "derivable from". This is the only phrasing proposed thus far which is understandable to just about anybody yet precise enough to discourage nonsense like "health: analysable as whole + -th" or "husband: equivalent to house + bond".

Thoughts? (Pinging @Benwing2.) Nicodene (talk) 08:22, 25 May 2024 (UTC)[reply]
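For what it's worth, the renaming step in the revised proposal could be sketched roughly as follows. This is only an illustration of the idea, not an actual bot: the alias table is an assumption (short names are mapped to short names and long to long), and a real run would also have to deal with redirects, template documentation and module code.

```python
import re

# Renames from the revised proposal above. The exact set of aliases is an
# assumption made for this sketch.
RENAMES = {
    "derived": "etyl",
    "der": "etyl",
    "affix": "derived",
    "af": "der",
    "surf": "der~",
}

# Match the template name right after "{{", up to the first "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^{}|]+?)\s*(?=\||\}\})")


def rename_templates(wikitext):
    """Rename templates in one pass over the text.

    Doing all renames in a single substitution matters: if {{af}} were first
    renamed to {{der}} and {{der}} then renamed to {{etyl}} in a second pass,
    every former {{af}} would wrongly end up as {{etyl}}.
    """
    def replace(match):
        name = match.group(1)
        return "{{" + RENAMES.get(name, name)

    return TEMPLATE_NAME.sub(replace, wikitext)


sample = "From {{der|en|grc|χειρουργία}}; {{af|en|cute|-ness}}; {{surf|en|quick|-ly}}"
print(rename_templates(sample))
```

Templates that are not in the table, such as {{m}}, are left untouched.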

Categories of child languages should also be subcategories of the parent language

Currently, when I look at "Tagalog terms borrowed from Spanish", terms from the category "Tagalog terms borrowed from Mexican Spanish" do not show up. I would like to propose that "Tagalog terms borrowed from Mexican Spanish" and "Tagalog terms borrowed from Early Modern Spanish" be subcategories of "Tagalog terms borrowed from Spanish".


Of course, this should apply across the entire Wiktionary categorization:

  • Borrowed from Chinese

Subcategories:

    • Borrowed from Mandarin
    • Borrowed from Cantonese
    • Borrowed from Hokkien

[…]

Similar to what is currently implemented in:

  • Derived from Latin
    • Derived from Vulgar Latin, Ecclesiastical Latin, Medieval Latin […]

Seems to me it's already working in derived terms but not in borrowed terms. Ysrael214 (talk) 12:41, 23 May 2024 (UTC)[reply]

Chinese isn't a good example here, as it works differently for reasons that aren't worth going into in this thread. I agree that it makes sense to subcategorise borrowings like this in general, though, but we shouldn't have categories like "Tagalog terms borrowed from West Iberian languages" and so on, even though Category:Tagalog terms derived from Spanish is in Category:Tagalog terms derived from West Iberian languages. Theknightwho (talk) 13:32, 23 May 2024 (UTC)[reply]
I agree that "X terms borrowed/calqued/etc. from [language variety]" should be a subcategory of "X terms borrowed/calqued/etc. from [language]" (in a similar manner to "X terms derived from [language variety]" being a subcategory of "X terms derived from [language]"). Einstein2 (talk) 19:11, 23 May 2024 (UTC)[reply]
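The rule being agreed on here can be sketched in a few lines. The table and function below are hypothetical and only cover varieties named in this thread; they are not taken from the actual category modules. Note that the table maps varieties to languages only, not languages to families, in line with the point above that categories like "Tagalog terms borrowed from West Iberian languages" are not wanted.

```python
# Hypothetical table mapping a language variety to its parent language.
VARIETY_PARENT = {
    "Mexican Spanish": "Spanish",
    "Early Modern Spanish": "Spanish",
    "Vulgar Latin": "Latin",
    "Ecclesiastical Latin": "Latin",
    "Medieval Latin": "Latin",
}


def parent_category(target, relation, source):
    """Return the parent category for '<target> terms <relation> from <source>'.

    Returns None when the source is a full language rather than a variety,
    so no family-level borrowing category is ever produced.
    """
    parent = VARIETY_PARENT.get(source)
    if parent is None:
        return None
    return f"{target} terms {relation} from {parent}"


print(parent_category("Tagalog", "borrowed", "Mexican Spanish"))
print(parent_category("Tagalog", "borrowed", "Spanish"))
```

Because the relation is a parameter, the same lookup serves "borrowed", "calqued" and "derived" alike.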

The pos= parameter

This is widely used in links as a way to give non-glosses (e.g. Jeju (island in South Korea)), because giving the definition as a gloss would be incorrect (e.g. Jeju (“island in South Korea”) is incorrect, because "Jeju" does not mean "island in South Korea" in general). @Fenakhay has decided today that this is "misuse" of a parameter, which is something that came up at the entry Dagelet. I have no idea why this is a problem, how this is misuse of anything, or what they are hoping to achieve by objecting to this, really, but given that this is clearly causing some difficulty, should we rename the parameter? Theknightwho (talk) 01:21, 25 May 2024 (UTC)[reply]

Never used the parameter myself, but it sounds confusing in the use that you describe. Support renaming it. CitationsFreak (talk) 03:01, 25 May 2024 (UTC)[reply]
We could introduce |q=. It'd make sense to harmonize the "comment" parameter across different templates. Nicodene (talk) 04:23, 25 May 2024 (UTC)[reply]
I don’t really see why it is a “mistake” to use the |t= parameter in cases like that, with the gloss enclosed in quotation marks. Don’t we define geographical places and other proper nouns (e.g., names of languages) without any special type of formatting anyway? If so, why should they be treated unlike other definitions here? — Sgconlaw (talk) 04:28, 25 May 2024 (UTC)[reply]
@Sgconlaw Leaving aside place names specifically, there needs to be a way to express non-gloss definitions that doesn't use quote marks, since quote marks imply it's a gloss. Theknightwho (talk) 04:30, 25 May 2024 (UTC)[reply]
Yes. I find it a bit jarring to encounter non-glosses placed in quotation marks as if they’re glosses. Not a terribly urgent thing, but still. Nicodene (talk) 13:29, 25 May 2024 (UTC)[reply]
q= seems like a qualifier, rather than a non-gloss 'definition', so people would surely end up using it for things like "archaic" or "British English", and then it'd be inconsistent if {{m|...|q=...}} generated (like pos= does) an unitalicized qualifier while {{m|...}} {{q|...}} generated an italicized one... so I think we'd need to make it italicize for consistency... and then the question is, do we consider that "archaic" and "island in Korea" are or should be treated (and formatted) as the same type of information? (If so, then no problem, I guess!) OTOH, if we consider them different kinds of thing, and iff we don't want to keep using pos= for "island in Korea", another idea is ngd= or ng= (taking inspiration from {{ngd}}/{{ng}}). - -sche (discuss) 15:38, 25 May 2024 (UTC)[reply]
I'd prefer ng= for brevity, which is another one of its aliases: {{ng}}. Theknightwho (talk) 15:42, 25 May 2024 (UTC)[reply]
Here and there I’ve found myself wanting to specify things like “archaic” or “with silent ⟨f⟩”. But I don’t mind either way. Nicodene (talk) 16:16, 25 May 2024 (UTC)[reply]
If we want non-glosses we should have a parameter for that. I use the POS parameter a lot for designating part of speech. Vininn126 (talk) 14:02, 25 May 2024 (UTC)[reply]
I agree that in links |pos= should be limited to specifying parts of speech and shouldn't be used for formatting purposes. I also have no objection to the introduction of a separate |ng= (non-gloss definition) parameter, with the text italicized in line with {{non-gloss definition}}. However, we need to ensure consistency between how glosses and non-glosses are indicated in entries and within links. As I mentioned above, at the moment we seem to treat definitions for geographical terms as glosses. For example, one sense of Jeju is "An island, province, and city in South Korea", and this is not enclosed within {{non-gloss definition}}. If this is so, then we should not treat this definition as a non-gloss within a link. — Sgconlaw (talk) 16:47, 25 May 2024 (UTC)[reply]
@Sgconlaw To explain my thought process: "island in South Korea" is a truncated definition (given for the purpose of clarification) which unambiguously isn't a gloss, as "Jeju" is not a generic term for South Korean islands. Whether the output of {{place}} should be treated as a gloss or non-gloss is a separate question, really. Theknightwho (talk) 21:15, 25 May 2024 (UTC)[reply]
I'm fine with a new param |ng= (although on a practical level it will take significant work to implement it everywhere; to future-proof this, we might want to put a list of pass-through link params somewhere convenient, like in Module:headword/data or Module:links/data, and rewrite the places that have pass-through link params to use the list instead of hardcoding the set of params). I'm not sure whether it makes sense to italicize it; keep in mind that currently |pos= is often used for arbitrary text like "all meanings" that isn't necessarily a non-gloss definition. Benwing2 (talk) 22:09, 25 May 2024 (UTC)[reply]
An edge case here is what exactly counts as a “part of speech”, especially with affixes. I’ve often used and seen pos used to give descriptions like “verb-forming affix”, “suffix forming agent nouns”, “diminutive suffix”, etc. These are not solely confined to part-of-speech information, but they are basically a concise way of stating what the part of speech is along with what the affix is used for.--Urszag (talk) 05:39, 26 May 2024 (UTC)[reply]
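To make the pass-through idea and the t= / pos= / ng= distinction concrete, here is a minimal sketch. Everything in it is hypothetical: the names are invented for illustration, the real link code lives in Lua modules, and both the italics for ng= and the order of the annotations are assumptions, since neither was settled in this thread.

```python
# Hypothetical single list of link parameters that wrapper templates pass
# through unchanged, instead of each wrapper hardcoding its own set.
# "ng" is the parameter proposed in this thread.
PASS_THROUGH_LINK_PARAMS = ("t", "pos", "ng")


def pick_link_params(args):
    """Keep only the arguments that should be forwarded to the link code."""
    return {key: value for key, value in args.items()
            if key in PASS_THROUGH_LINK_PARAMS}


def format_link(term, params):
    """Render a term followed by its annotations in parentheses."""
    parts = []
    if "t" in params:
        # Glosses are quoted, which is why a quoted non-gloss reads wrongly.
        parts.append("“" + params["t"] + "”")
    if "pos" in params:
        # Part of speech: plain text, no quotes.
        parts.append(params["pos"])
    if "ng" in params:
        # Non-gloss: wikitext italics here, though italicizing is undecided.
        parts.append("''" + params["ng"] + "''")
    if not parts:
        return term
    return term + " (" + ", ".join(parts) + ")"


args = {"1": "en", "2": "Jeju", "ng": "island in South Korea"}
print(format_link("Jeju", pick_link_params(args)))
```

Adding a future parameter would then mean touching only the shared list and the formatter, not every wrapper.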

Shouldn't Wiktionary:No personal attacks be official?

Is there any good reason why that draft policy isn't official? Should we go about starting the process to make it official? Purplebackpack89 15:19, 25 May 2024 (UTC)[reply]

It's unnecessary. PUC 15:30, 25 May 2024 (UTC)[reply]
Why is it unnecessary? What good can come of allowing personal attacks? Purplebackpack89 19:36, 25 May 2024 (UTC)[reply]
I didn't say we should allow personal attacks, I said having an official policy about it is unnecessary and I don't see what good is going to come of that. Common sense and common courtesy are enough; things might derail every once in a while but I see no need for all that time-intensive Wikipedia-style lawmaking. PUC 22:07, 25 May 2024 (UTC)[reply]
As I see it there are really only two alternatives: 1) enacting a Wikipedia-style policy, or 2) this place becoming or staying the Wild West where users are allowed to harass and bully editors off the project. You say common sense and courtesy are enough but it's been my experience that some of the supposedly most-trusted editors on this project see no need to be courteous to other editors. Purplebackpack89 22:17, 25 May 2024 (UTC)[reply]
+1, I'd support it becoming official policy. AG202 (talk) 23:07, 25 May 2024 (UTC)[reply]
They don’t need to see a need. I believe in them to strive towards the optimal results rather, quite different from that which is “needed”. Sometimes it is making their stance clear of how low they esteem the opinions of certain editors in certain respects. Not needed for you, since you don’t have insight either way. Or for me, since I know I am retarded. I even make personal attacks against myself, what does the policy proposal say about it? Does it get your nose out of joint? Why does it happen when your pseudonymous internet identity is disrespected at some particular point in time? It’s not like we harass and bully you. Look at Kiwifarms, that’s bullying. Consistency is key. Fay Freak (talk) 23:12, 25 May 2024 (UTC)[reply]
The question is rather what good comes from a formal policy-page, save your personal stickling. The page has been added in February 2006, not changed since, when everyman has had made his experiences with the WWW for three years on 800 x 600 monitors via AOL CD-ROMs. Internet users have lost their virginities and even endured the MAGA era. There is extensive documentation how bad arguments and campaigns work over this medium; we call them out, and infer attributes of parties from their behaviour, sometimes also spanning decades, and vice versa, inasmuch as the factual argument is supported thereby.
Purplebackpack89 does not even understand the policy proposal he cites, which creates a false dichotomy by bidding to abstain from that but “discuss the facts”, which are quite different a thing in Wikipedia ~ early 2000s Richard Dawkins ideology of internet nerds: we also ignore “the facts”, as opposed to “the language”, and the audience having to make out its stance on it.
Behavioural addiction is a concept only paradigmized in this decade, our sensitivities of it, doomscrolling and other internet-related individual obsessive-compulsive behaviours conceptualized towards the end of the last. We should address people personally we suspect to have problematic behaviours; sometimes even to the point of remote diagnostics. If you read User talk:Surjection#what is wrong with you, or otherwise are attentive to the eccentricities added to Wiktionary and its discussions, you know that there is nothing left over, than perhaps refer to the psychiatrist; if there is data for this, an editor will get his personal homework. You see, some IP attacked me personally very heavily but I won. Nothing became better from a ban or something preventing a specific mode of discussion. Fay Freak (talk) 23:12, 25 May 2024 (UTC)[reply]
I think we need a vote to make it official. CitationsFreak (talk) 15:55, 25 May 2024 (UTC)[reply]
No objection if there's consensus to make it official. I agree it needs to be formally voted on. — Sgconlaw (talk) 16:38, 25 May 2024 (UTC)[reply]
agreed — SAMEER (؂؄؏) 01:01, 26 May 2024 (UTC)[reply]
Agree with PUC: We don't have a policy forbidding murder either; does that mean murder is allowed on Wiktionary? No of course not. National laws prohibit murder, and basic human decency prohibits personal attacks. MuDavid 栘𩿠 (talk) 04:28, 26 May 2024 (UTC)[reply]
How can one murder on wiktionary? Word0151 (talk) 21:23, 26 May 2024 (UTC)[reply]
I agree with User:PUC that this policy seems unnecessary. It's not as though anyone is under the impression that personal attacks are allowed... Ioaxxere (talk) 04:57, 26 May 2024 (UTC)[reply]
For folks that state that we don't need a policy, I'll point to Wiktionary:Beer parlour/2023/July § How to report a user? and how even though there was a consensus to take action against said user, there has yet to be any taken. CC: @bd2412 I would hope that we wouldn't need a policy like this, but it will at least force more action vs handwaving it away as we've seen happen here again and again. It doesn't look good on the community as is currently; we have too many instances of personal attacks, even in more public places like Beer parlour, for a userbase with this few active editors. (As seen by where this proposal came out of) AG202 (talk) 08:41, 26 May 2024 (UTC)[reply]
Reading back through that discussion, I'm truly disappointed that several admin and many active users said that something should be done but yet nothing was done. If that doesn't show why we need more explicit policy, I don't know what does. It just shows pure favoritism at this point, almost like an old guard. AG202 (talk) 08:50, 26 May 2024 (UTC)[reply]
@AG202 In that particular case, I don't see how an official policy would have made any difference, unless it mandated that specific things must happen under certain circumstances. The current proposal doesn't do that, and any that did would need to be formulated very carefully. Theknightwho (talk) 23:35, 26 May 2024 (UTC)[reply]
I agree. We must have some way to punish those who break the rule, or it's worthless. CitationsFreak (talk) 02:43, 27 May 2024 (UTC)[reply]

Wu lects

With current Yue subdivision discussions underway, I would like to ask for a parallel discussion regarding the exact same subject but for Wu. Similar discussions have already happened and the consensus is as follows:

  • wuu Wu Chinese 吳語
    • wuu-nor Northern Wu 北部吳語 (alias: Taihu Wu 太湖片)
      • wuu-sha Shanghainese (Shanghai Wu) 上海小片
      • wuu-sji Sunan Wu (Suzhounese) 蘇嘉小片 (Sunan: 蘇南)
      • wuu-sdc Shadi Wu 沙地話
      • wuu-pil Piling Wu 毗陵小片
      • wuu-txh Tiaoxi Wu 苕溪小片 (Huzhounese 湖州小片)
      • wuu-lsx Linshao Wu 臨紹小片
      • wuu-nby Yongjiang Wu 甬江小片 (Ningbonese 寧波小片/明州小片)
      • wuu-hzn Hangzhounese 杭州話
    • wuu-tzo Taizhounese 台州片 (Taizhou Wu)
    • wuu-wzh Oujiang Wu 甌江片 (Wenzhounese 溫州片)
    • wuu-jhw Wuzhou Wu 㜈州片 (Jinhua Wu 金華片)
    • wuu-lsc Chuzhou Wu 處州片 (Lishui Wu 麗水片)
    • wuu-sqx Xinqu Wu 信衢片 (Xin = Shangrao)
    • wuu-xww Xuanzhou Wu 宣州片 (Western Wu 西部吳語)

Notes:

1. Shanghainese and Taizhounese adopted as their respective concepts encompass most if not all lects within the branch they describe. In particular, "Shanghainese" can also refer to urban-like suburban lects, or even all suburban lects in general, and there is no representative lect for the Wu spoken in Taizhou prefecture
2. Hangzhounese is an isolate within the Northern Wu family; it has significant Northern Mandarin influence and has even been classified as a Mandarin language by some
3. Inland Wu subdivisions (much like some Yue areas) are highly contentious; the scheme adopted here is the one that the Wu editors stick to, and is essentially a slightly modified version of the 1987 Atlas subdivisions
4. Northern Wu will be a family code, like the potential future Yuehai code. Its further subdivision is purely practical as both active Wu editors tend to focus on Northern Wu, and is not a value judgement regarding the significance of Southern Wu sub-subbranches
5. Southern Wu is highly likely to be areal and thus will not have a corresponding code

If there are no objections, we shall follow through with this and implement these codes as soon as possible. @Musetta6729 (Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏, LittleWhole): — nd381 (talk) 21:14, 25 May 2024 (UTC)[reply]
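For anyone who wants to check the proposed tree mechanically, here is a minimal sketch of the hierarchy above as a child-to-parent mapping. The codes are the ones listed in the proposal; the names `WU_LECTS`, `children` and `ancestors` are hypothetical helpers for illustration only, not part of any existing Wiktionary module.

```python
# Child -> parent mapping for the proposed Wu codes (None marks the root).
# Assumption: this mirrors the bullet list in the proposal and nothing else.
WU_LECTS = {
    "wuu": None,
    "wuu-nor": "wuu",
    "wuu-sha": "wuu-nor",
    "wuu-sji": "wuu-nor",
    "wuu-sdc": "wuu-nor",
    "wuu-pil": "wuu-nor",
    "wuu-txh": "wuu-nor",
    "wuu-lsx": "wuu-nor",
    "wuu-nby": "wuu-nor",
    "wuu-hzn": "wuu-nor",
    "wuu-tzo": "wuu",
    "wuu-wzh": "wuu",
    "wuu-jhw": "wuu",
    "wuu-lsc": "wuu",
    "wuu-sqx": "wuu",
    "wuu-xww": "wuu",
}


def children(code):
    """Return the codes whose direct parent is `code`, in listing order."""
    return [c for c, parent in WU_LECTS.items() if parent == code]


def ancestors(code):
    """Return the chain of parents of `code`, nearest first."""
    chain = []
    parent = WU_LECTS[code]
    while parent is not None:
        chain.append(parent)
        parent = WU_LECTS[parent]
    return chain
```

Under this layout, Northern Wu is the only intermediate family node, which matches notes 4 and 5: the other branches hang directly off `wuu` because Southern Wu gets no code of its own.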

@ND381 This is overall fine with me and seems suitably conservative and flat. My main concern is with the codes; I'd prefer to use codes that are less arbitrary and use the first three letters of the lect name whenever possible, otherwise using the first two letters of the first part along with the first letter of the second part. Benwing2 (talk) 22:03, 25 May 2024 (UTC)[reply]
The codes already, for the most part, adhere to what you want.
  • Northern
    • Shanghai
    • (ISO 639-6 code for Sujiahu)
    • Shadi, Chongming (Shadi is a place name)
    • Piling (no representative lect)
    • Tiaoxi, Huzhou (Tiaoxi is a place name)
    • Linhang-Shaoxing (Linshao is an abbreviation of these two place names; Shaoxing is a prefecture whereas Linhang is a county)
    • Ningbo, Yongjiang (spoken only in Ningbo prefecture and the tiny Zhoushan prefecture)
    • Hangzhounese
  • (ISO 639-6 code for Taizhounese)
  • (ISO 639-6 code for Wenzhounese)
  • Jinhua, Wuzhou (spoken only in Jinhua prefecture)
  • Lishui, Chuzhou (spoken in Lishui prefecture)
  • Shangrao-Quzhou, Xinqu (Xinqu is an abbreviation of these two prefecture names; X included at the end to signify the historical abbreviation)
  • Xuanzhou Western Wu (Western Wu is a common name for the branch)
Some of these could have their letters shuffled around but for the most part they are already, in fact, two letters from one name and one letter from another name — nd381 (talk) 10:29, 26 May 2024 (UTC)[reply]
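The naming preference Benwing2 describes above (first three letters of the lect name where possible, otherwise the first two letters of the first part plus the first letter of the second part) can be sketched roughly as follows. This is only one reading of that rule: the function name `suggest_suffix`, the `taken` argument, and the interpretation of "whenever possible" as "not already in use" are assumptions made for illustration.

```python
def suggest_suffix(name, taken=()):
    """Suggest a three-letter code suffix for a lect name.

    Assumed reading of the rule: use the first three letters of the
    first part of the name unless that suffix is already taken; then
    fall back to two letters of the first part plus one letter of the
    second part. Returns None when neither option is available.
    """
    parts = name.lower().replace("-", " ").split()
    first_three = parts[0][:3]
    if first_three not in taken:
        return first_three
    if len(parts) >= 2:
        fallback = parts[0][:2] + parts[1][0]
        if fallback not in taken:
            return fallback
    return None


# "Shanghai" gives "sha"; a later "Shadi Chongming" then has to fall back.
print(suggest_suffix("Shanghai"))
print(suggest_suffix("Shadi Chongming", taken={"sha"}))
```

Note that this sketch does not reproduce the codes borrowed from ISO 639-6, which nd381 points out were taken over as-is rather than derived from the names.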

Luwian hieroforms template

Some Hieroglyphic Luwian lemmas have multiple spellings. Therefore I need a template similar to {{egy-hieroforms}} for Hieroglyphic Luwian for them. Can this be done? Antiquistik (talk) 08:49, 26 May 2024 (UTC)[reply]

Latin pronunciations in English entries

There are some unadapted Latin borrowings in English that were given Latin pronunciations by Doremitzwr along with English ones. They are (at least): argumenta ad populum, argumentum ad populum, opere citato, operibus citatis, pactum de non petendo, simpliciter. It seems to me such pronunciations don't belong, so I'm going to delete them. I also found an Aramaic entry מלכתא with an Ashkenazi Hebrew pronunciation, which I suspect doesn't belong either, but I'm leaving it for now. Bringing this up here in case people have other thoughts. Benwing2 (talk) 09:18, 26 May 2024 (UTC)[reply]

BTW the Latin pronunciation was the only one given for pactum de non petendo. Can someone supply an English pronunciation? Benwing2 (talk) 09:23, 26 May 2024 (UTC)[reply]
Done. This is almost never said aloud in English, as far as I can tell. Theknightwho (talk) 21:47, 26 May 2024 (UTC)[reply]
Guy has done so because they are not English, notwithstanding the header and templates. Save for simpliciter, every one of them should be moved to Translingual according to my correct custom of viewing the things. I mean you already admitted it for op. cit., so why is opere citato “English”? Both can have both English and classicizing Latin pronunciation under a Translingual header. The linked terms are used in German as well but German dictionaries would not include them and claim them to be German, less propense to see foreign terminology as integrated. Fay Freak (talk) 22:48, 26 May 2024 (UTC)[reply]
@Fay Freak You are probably right. Benwing2 (talk) 22:50, 26 May 2024 (UTC)[reply]
I think that's a misunderstanding of what it means to be Translingual: if something is Translingual, then it very much can still be part of English, but it's convenient to avoid having tons of duplicate entries all with the same spelling. There's also the issue of differing pronunciations, which has come up before in relation to Translingual A4 (paper size) (which is part of everyday English in the UK, even though we only have a Translingual entry for it). Theknightwho (talk) 22:52, 26 May 2024 (UTC)[reply]
Just FYI, we can have Pronunciation sections in Translingual entries with the pronunciation in multiple languages. An example where this occurs with several languages is Homo sapiens. I just created {{IPA+}} to make this a bit easier; it prefixes the output with the langname, similar to {{m+}}. Benwing2 (talk) 23:49, 26 May 2024 (UTC)[reply]
And the paper size, which we etymologize as from a German standard, is interestingly in German speech DIN A4 [dɪn ʔaː fiːɐ̯], as the regulatory authority is die DIN f [diː ˈdɪn]; A4-Papier is also a word, but we can form compounds even from a third language + section language via |langN=, not only translingual + section language. And yet it does not completely outrule that even this, DIN A4, is translingual.
It is a bit like with family names and other stuff, we had nasty votes about, that is not subjected to normal language rules by the community (Verkehrsanschauung, the jurist says). I ask whether something is intended to be translingual in the beginning: as certain now famous units, the meter, Kelvin and what not, which started in the monolingual scientific communities of certain countries.
For the bejel disease I added Latinate translingual translations in 2018 some of which were only ever found in German medicine works—because the common microbial identity of the diseases of various countries was only later recognized when hygiene ousted them.
As for pactum de non petendo, this is in turn one of the kind of terms that has been left over from the Middle Ages or so when they spake Latin in college, and ius commune and hence Latin as a law language was superseded only by the German Civil Code in 1900, with not every dogmatic term translated however. I probably have heard pactum de non petendo, but this is practically not so important so I bring forward more entropic examples like venire contra factum proprium also connected to § 242 BGB, and a long baggage around the law of unjust enrichment such as condictio indebiti, condictio ob rem, datio ob rem, datio obligandi causa, condictio ob turpem vel iniustam causam etc., which are mentioned and said to be found in a provision of the German Civil Code if the individual jurist likes so, by how much he is instilled by knowledge of the legal system’s historical background (the most illustrative example of this is the student book by Hans Josef Wieling on unjust enrichment last edition 2020), which also counterindicates all these terms being German or English.
Science and academia, whether debating arguments, for which it refers to known terminology about fallacies and cognitive biases, or just teaching, has to explain loan formations and intellectual history as contrasted from outside languages and debates, and hence carries forward terms and creates false impressions of regular inclusion of a term within a language.
Where in Arabic or Indian classrooms some scientific fields are difficult to discuss in the local native language at all, we are better in England and her former colonies and Germany to translate all education, but some Latin terms have been left over for centuries, and in a German psychology paper, not to speak of IT, there is a whole lot of untranslated English because you just got to know it anyway to get the grades.
It happens in all classes; practically you can’t understand this German drill I quoted at rambizzy without also being proficient in English. For law students they publish booklets like Latein für Jurastudierende; guess for some sciences you technically have to have some limited proficiency in a foreign language, morphology knowledge + basic stems, again indicative of these terms not being German.
Guess this is intuitive for me because I have atypical attention to details rather than to absorb every word-string as a social feedback integrating in the language community where it has been encountered; I don’t blame editors for not having thought around the corner so much when their language, English, has a low threshold of integrating foreign terms; and I had to illustrate the problem with German to mark the contrast of foreign vs. borrowed; and I know people don’t have the same propensity to systematize, or even the capacity, without both hyperpolyglottism and interdisciplinary excursions at play, even though it actually happen most closely to how I have described it. Fay Freak (talk) 00:17, 27 May 2024 (UTC)[reply]
I feel like it's the same level of difficulty for translating in non-European-language classrooms. People who want to discuss science topics without the words needed to do so coin them, either by borrowing them (as you say), calquing them, or describing what the term refers to.
Also, that German quote does not belong in an English entry. The song is a German song which uses code-switching. I don't think that a random English entry is the best place to demonstrate the fact. Maybe an appendix page? CitationsFreak (talk) 08:10, 27 May 2024 (UTC)[reply]
It would not be completely unreasonable to call all Latin-derived expressions that retain inflection used in learned and not-so-learned writings (eg, "scientific", medical, and legal Latin) Latin. But here our rules against including SoP expressions as well as the pronunciation differences work against it, despite the administrative simplicity that would result.
It would also not be unreasonable to call such expressions Translingual, once there is evidence of use embedded in running text in multiple (2, 5, 10?) languages. This violates few of our firm rules, I think, merely requiring that we allow pronunciations for multiple languages. A problem is the one of evidence of use in multiple languages. As a practical matter, it would be sufficient IMHO to start an entry for such an expression in whatever languages had sufficient evidence, typically including English. Once there were 2, 3, or more L2s, the L3s could be merged. Another demonstration of Translingual use might be single (ie, not triple) instances of use in a number of languages: eg, one use in Chinese, one in Japanese, one in Arabic, one in Spanish. DCDuring (talk) 14:48, 27 May 2024 (UTC)[reply]
What concerns me is that this will make attestation requirements more complicated, since by merging them into one “language” it’s only necessary to find 3 uses in any language at all, which is not particularly difficult. For example, pactum de non petendo is comparatively hard to attest in English, since it’s not terminology used in English law or any of its descendants, from what I can tell (the equivalent term used in such situations being estoppel, at least in England). Edit: I forgot about Scots Law, but the general point still stands. Theknightwho (talk) 15:02, 27 May 2024 (UTC)[reply]

Stalking/harassment by User:Theknightwho

User:Theknightwho is an admin, but a controversial one. Last year, he faced a de-sysop vote. While he was allowed to retain his admin privileges, people on both sides expressed concern about his combative behavior, with comments such as, "he seems to not know when to stop and distance himself from a fight.", "Maybe Knight is a hothead, maybe Knight broke the rules" and "I have my issues with User:Theknightwho...and I agree with the statement that he's a hothead and argues way too much"

I'm afraid he's up to his questionable behavior again. I believe he is targeting me, stalking me, harassing me. And on top of that, some of the edits he made in targeting and harassing me...

  1. On Saturday, he modified or undid my edits on three different pages in a fairly short amount of time. Per the "quacks like a duck" test, it is unlikely those
    • I told him that those edits were inappropriate. He refused to see any problem
  2. Later that day (it was either Saturday night or Sunday morning), he undid an edit I made to hot dog
    • His edit was so bad it had to be modified only a few minutes later.
  3. And today, he deleted seven redirects I created over the years. Some of them are acceptable redirects...for example, he deleted busted my neck, which is a conjugation of bust one's neck

What is to be done about Knight's stalking/harassment of me? Can somebody tell him to cut it out? Purplebackpack89 15:58, 27 May 2024 (UTC)[reply]