Wiktionary:Beer parlour/2024/February


Unencoded Quotations

It has been claimed that if a spelling cannot be expressed in Unicode, it should not be entered into Wiktionary.

How does this apply to quotations? If the quotation cannot be expressed in Unicode, is it inadmissible, or do we simply use the best approximation available? I may be about to be in this situation for some Tamil-script Sanskrit. --RichardW57m (talk) 12:25, 1 February 2024 (UTC)[reply]

How would it even be added? What would you type in your web browser in order to get it to render here? —Justin (koavf)TCM 12:36, 1 February 2024 (UTC)[reply]
One might use images instead of text. Or one could do something along the lines of using 'oe' instead of 'ö', or tricks like using a following 'z' to indicate a combining squiggle below for Romanian. Sometimes the text could simply be transposed to another script or otherwise transliterated. --RichardW57m (talk) 14:32, 1 February 2024 (UTC)[reply]
I'm opposed to adding these hacks, but at the very least, they should be contained with some kind of template with a tracking category like "Entries with Unicode hacks" that can be fixed as subsequent versions of Unicode are published. —Justin (koavf)TCM 18:29, 1 February 2024 (UTC)[reply]

Partially unadapted borrowings

There are categories for "unadapted borrowings", but what about partially adapted borrowings? For instance, in Romanian, we have yankeu, which is not unadapted (it has the ending changed so that it can fit into Romanian grammar), but it's not fully adapted to the Romanian phonetic alphabet, as that would be iancheu. Bogdan (talk) 09:06, 2 February 2024 (UTC)[reply]

Changes from /ə(ɹ)/ or /əɹ/ to /ɚ/

@Cpeng2 has edited multiple entries, changing /ə(ɹ)/ or /əɹ/ in the pronunciation to /ɚ/. I've left a message on their talk page requesting that they not do so, for compliance with "Appendix:English pronunciation". Should the edits be reverted? — Sgconlaw (talk) 22:49, 2 February 2024 (UTC)[reply]

IMO, yes, if there is an appendix giving a standard, and they are going against it. Benwing2 (talk) 08:04, 3 February 2024 (UTC)[reply]
Wait, did we stop using /ɚ/ at some point? Or is there a distinction between /ɚ/ and /əɹ/ that I wasn't aware of? Andrew Sheedy (talk) 01:55, 4 February 2024 (UTC)[reply]
It appears that @Kwamikagami implemented the change in June 2023 following a discussion on the talk page (this one, I guess). Perhaps such discussions ought to take place here at the Beer Parlour. — Sgconlaw (talk) 05:25, 4 February 2024 (UTC)[reply]
@Sgconlaw Hmm, yeah, agreed. I actually think /ɚ/ is better for American English. Benwing2 (talk) 05:34, 4 February 2024 (UTC)[reply]
I assume this topic should be included if this vote ever actually happens.--Urszag (talk) 05:45, 4 February 2024 (UTC)[reply]
For GA, it's just /ɹ/. There's no phonemic distinction. kwami (talk) 05:49, 4 February 2024 (UTC)[reply]
I don't think there is any single objectively correct phonemic transcription of GA. There is no phonemic distinction between /æw/ and /aʊ/ or /ɛj/ and /eɪ/, but we still use the latter transcriptions rather than the former.--Urszag (talk) 05:57, 4 February 2024 (UTC)[reply]
Sure, but we don't use both, because that would imply that they are distinct phonemes. kwami (talk) 06:01, 4 February 2024 (UTC)[reply]
In that case we should decide on one transcription and go with it. I am not an expert on phonetics, but it may be that laypersons would find /əɹ/ more intuitive than /ɚ/. — Sgconlaw (talk) 06:01, 4 February 2024 (UTC)[reply]
I think that's probably true. kwami (talk) 06:06, 4 February 2024 (UTC)[reply]
At the very least, I don't think we should enforce things like this without a Beer Parlour discussion. However, the arguments made above do make sense to me. Andrew Sheedy (talk) 19:34, 4 February 2024 (UTC)[reply]
User talk:Your future self was doing the same thing (a student of Cpeng2). Changes from /ə(ɹ)/ to /ɚ/ are problematic, as I explained on YFS's talk page, as they make a (lazy) pan-dialectal transcription rhotic only, sometimes while keeping UK-only vowels (it'd be better to clean up instances of /ə(ɹ)/ into separate rhotic/nonrhotic or GA/UK pronunciations). In favour of /əɹ/, I see several arguments: if there's not a distinction between /ɚ/ and /əɹ/ (and hence we shouldn't be using one in some entries and the other in other entries), and we're already using /ə/ and /ɹ/, then using /əɹ/ means using fewer phonemes/symbols, and is consistent with how we notate e.g. /kɑɹt/ not /kɑ˞t/, and /woɹt/ (~/wɔɹt/) not /wo˞t/~/wɔ˞t/. I think /əɹ/ is more accurate in situations where there is a following vowel (to me, something like /əˈdʌltɚaɪn/ looks bad: there's a consonant in the pronunciation which is missing from that IPA), so I understand the appeal of using it across the board. OTOH, in cases where there isn't a following vowel, I understand the habit/appeal of using /ɚ/, like many other dictionaries do. My edits aren't consistent in using one or the other, but I do agree it'd be good to pick one as the standard to try to use consistently. - -sche (discuss) 00:14, 5 February 2024 (UTC)[reply]
@-sche You make a lot of good points although I think /əˈdʌltɚaɪn/ (what word is this BTW?) is probably actually accurate for American English; I don't hear a consonant /ɹ/ in the onset of the syllable following the rhotic schwa in such words. Benwing2 (talk) 00:22, 5 February 2024 (UTC)[reply]
adulterine, chosen only because it was the first word I spotted in the "has IPA" category that had the sequence we're discussing [in whatever notation] followed by another vowel - -sche (discuss) 01:26, 5 February 2024 (UTC)[reply]
@User:Cpeng, and @User:Your future self who has been making the changes now that Cpeng was asked to stop, it would be helpful if you could discuss here why you think Wiktionary should use /ɚ/ instead of /əɹ/. Changing a few entries at a time, as you have been doing, is unwise: the number of entries which contain this sound is so large that it is surely more sensible for everyone to reach consensus here about which notation to use, and then we can standardize things in one direction or the other (either changing all relevant cases to /ɚ/ or changing all relevant cases to /əɹ/) systematically, with the help of automated and semi-automated tools.
YFS, I appreciate you heeding what I said on your talk page about tagging rhotic pronunciations as GenAm, but please pay attention to the whole pronunciation and not only the rhotacized schwa bit, for example here your edit resulted in the GenAm pronunciation being said to have /ɒ/ (GenAm uses /ɑ/ in that word, although it has come up in past discussions that some dialects in your area—I'm referring to the fact that you both list yourselves as CUNY folks—do use /ɒ/ in various words, but GenAm is not normally analysed as having that phoneme). - -sche (discuss) 02:43, 9 February 2024 (UTC)[reply]
@-sche FWIW, my more-or-less GA accent does have /ɒ/ but it is the sound of caught not of cot, which uses /ɑ/ or maybe a centralized [ä]. As for the wider issues, we need an English pronunciation module to reduce the Wild-West nature of our current pronunciations. User:Theknightwho has mentioned interest in doing this but has also said they might not be able to get to it any time soon. If not, I might try to create something. Thoughts? Benwing2 (talk) 05:39, 9 February 2024 (UTC)[reply]
(This ties in to what I was saying earlier on the talk page of the nurse-vowel vote, and I've commented there so that this discussion doesn't get too far off the topic of /ɚ/~/əɹ/...) - -sche (discuss) 08:27, 9 February 2024 (UTC)[reply]
@-sche Hello! Thank you for bringing up some good points! My apologies for the mistakes in the General American pronunciations that used the incorrect vowels. This was due to a lack of knowledge on my part and to (incorrectly) assuming that the lazy pan-dialectal transcription corresponded to an American pronunciation. I will be more careful about this from now on. I also agree that we ought to choose either /ɚ/ or /əɹ/ and stick with it consistently. Which one we end up choosing doesn't particularly matter to me (although I tend to lean towards /ɚ/ because I think it better captures how it is a single phoneme). However, /ə(ɹ)/ just isn't valid IPA. Parentheses are not documented as a convention anywhere. It is not clear that they indicate an "optional" phoneme, and even if that *is* understood, it is completely opaque that one version is supposed to represent GenAm while the other is supposed to represent RP. Whatever symbol we do land on, we should at least stop the practice of transcribing this sound as /ə(ɹ)/. - Your future self (talk) 21:10, 16 February 2024 (UTC)[reply]
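To make concrete what the /ə(ɹ)/ notation encodes, and what a cleanup into separate rhotic and non-rhotic transcriptions might involve, here is a minimal Python sketch. The function names `expand_optional_r` and `normalize_rhotic_schwa` are hypothetical and not part of any Wiktionary module or bot. The sketch only rewrites the rhotic segment, so the vowels (for example a UK-only /ɒ/) would still need checking by hand, as noted in the discussion above.

```python
def expand_optional_r(ipa: str) -> tuple[str, str]:
    """Split a pan-dialectal transcription into (rhotic, non-rhotic) variants.

    Assumption: the optional segment is always written exactly as "(ɹ)".
    """
    rhotic = ipa.replace("(ɹ)", "ɹ")      # keep the r for rhotic accents
    non_rhotic = ipa.replace("(ɹ)", "")   # drop it for non-rhotic accents
    return rhotic, non_rhotic


def normalize_rhotic_schwa(ipa: str) -> str:
    """Rewrite the single symbol /ɚ/ as the sequence /əɹ/."""
    return ipa.replace("ɚ", "əɹ")


if __name__ == "__main__":
    print(expand_optional_r("/ˈbʌtə(ɹ)/"))
    print(normalize_rhotic_schwa("/əˈdʌltɚaɪn/"))
```

Whichever direction is chosen, a one-line mapping like `normalize_rhotic_schwa` is the easy part; deciding which accent label each resulting transcription should carry is what needs the consensus discussed here.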

Failure to employ Descriptivism: Implied bias against Alternative forms

On the homepage of Internet Archive, one of the most important websites, is the following sentence: "Internet Archive is a non-profit library of millions of free books, movies, software, music, websites, and more." That sentence employs the word "non-profit". Wiktionary classifies this word as an alternative form of "nonprofit". The "alternative form" treatment of "non-profit" is the standard treatment of any alternative form, sure. But it in fact treats the alternative form as a "lesser form" or a "discouraged form", because the entry is a hollow shell. Now, frankly: I'll take what I can get! You know, Wiktionary is very open to all kinds of alternative forms that most dictionaries would shun and discourage. But I want to challenge you: treating "non-profit" as a second-class citizen compared to "nonprofit", in the way Wiktionary implicitly does by making "non-profit" so skeletal, is not what descriptivism is all about. The average reader will see the "non-profit" entry and think: "ah, this is the bullshit form I shouldn't use." For Wiktionary to give off this impression is against Wiktionary's core descriptivist ethic. I have a nagging problem rather than a solution. --Geographyinitiative (talk) 00:27, 3 February 2024 (UTC)[reply]

Can you cite any major dictionary that does not do that? Nicodene (talk) 01:08, 3 February 2024 (UTC)[reply]
It is not obvious to me that the effect is of any importance.
If we had anything intelligent and fact-based to say about alternative forms, that would reduce this effect. Examples would be usage notes that noted usage contexts, {{defdate}}, citations, more common use with one definition rather than others (for MWEs). It would also help if the determination of what the main form was could be based on current (say, this millennium or over longer periods for words without much usage) relative frequency. DCDuring (talk) 15:30, 3 February 2024 (UTC)[reply]
Your idea of what the "average user" would or should do is also a non-descriptive imposition. Where are your statistics and proof? Equinox 16:00, 3 February 2024 (UTC)[reply]
Geographyinitiative assumes a user with low ambiguity tolerance for which the user is responsible himself. He might well appreciate not having a stance. Fay Freak (talk) 16:39, 3 February 2024 (UTC)[reply]

Thank you all for your comments. Thanks for your help. I'm thinking the same thoughts you are saying. I'm uneasy about it, but I can't think of a way out. --Geographyinitiative (talk) 17:20, 3 February 2024 (UTC)[reply]

I'm missing the original point, but would like to state that we don't (yet) have a Template:bullshit form of template. We'd do well to add it to wonderfool. Demonicallt (talk) 23:38, 4 February 2024 (UTC)[reply]

"the" in head=[edit]

A lot of English terms (with parallels in other languages) include an article in the headword when it's habitually used with the article. Example, ankle express, defined using |head=the [[ankle]] [[express]]. The next ten in alphabetical order after this that do the same are:

  1. Antiochian Orthodox Church
  2. Antlered God
  3. Antonine Wall
  4. Apennine Mountains (but Apennine Peninsula and Italian Peninsula don't get this treatment)
  5. Apostle Islands
  6. Apostolic Age
  7. Apple Isle
  8. Arab Emirates
  9. Arabian Peninsula
  10. Arab League

Some other interesting examples:

  1. 10/40 Window
  2. art of the possible
  3. aseptic technique
  4. Asian six-pack
  5. Asian songbird crisis
  6. book world
  7. calm before the storm
  8. great imitator
  9. imperial system
  10. kitty's titties (but cat's meow doesn't get this treatment)

Should we maintain this? Intuitively, it does convey some info, but (a) I'm not sure this is the best way to present it, (b) I'm not sure whether it's derivable automatically by a reasonably competent non-native speaker. I'm sure there are many cases where the "the" is missing, and indeed I found a couple of examples above. Note that I'm about to push some improvements to English headwords, one of which is to greatly improve the default linking behavior and another is to allow post-processing of the default-linked form to change specific links or add text at the beginning or end, which could be used to add "the" at the beginning without having to repeat the entire head. However, if people think the current system is the best, another possibility is a special param |def=1 to indicate that the term normally takes the definite article; there is something similar already in {{de-noun}} and {{de-proper noun}}, and it affects the entire declension table (in this case, we have no declension table, but we may have headword forms). Thoughts? Benwing2 (talk) 08:23, 3 February 2024 (UTC)[reply]

This was discussed at Wiktionary:Beer parlour/2017/March § Where to record usage of the form "the X" (I support the view that it should be indicated somewhere). J3133 (talk) 08:52, 3 February 2024 (UTC)[reply]
It's useful, and not derivable automatically. Greenpeace could have been named "the Greenpeace", but wasn't. Equinox 15:24, 3 February 2024 (UTC)[reply]
I agree with J3133 and Equinox - it conveys useful information. Theknightwho (talk) 18:12, 3 February 2024 (UTC)[reply]
I agree. Shows useful information. CitationsFreak (talk) 23:19, 3 February 2024 (UTC)[reply]
It does seem premature to have our desire for technical perfection proceed without attending to the issue of what is or is not part of the lexicon. The fact that we have divergent treatment of apparently similar headwords is not necessarily a sign of laziness or ignorance, but rather of disagreement over what should be presented. Articulating the rules for normal application of "the" is principally the job of the entry for "the". Alternating usage ((the) Gambia, (the) Ukraine, etc.) needs to be noted in the entry, probably in a usage note. Usage labels and examples accomplish most of what we need if we want to act as if the matter is lexical. Burdening headword templates with this kind of thing seems likely to waste technical resources and risks making templates/modules even harder to maintain. DCDuring (talk) 15:25, 4 February 2024 (UTC)[reply]
I don't see how this is a burden or a maintenance issue. Theknightwho (talk) 17:40, 4 February 2024 (UTC)[reply]
Complication always is. This is an instance of needless complication. I am not confident that we will be able to call on the authors of the modules when things change in the WM environment. People leave for all kinds of reasons, often unexpectedly, sometimes totally. DCDuring (talk) 18:43, 4 February 2024 (UTC)[reply]
I say that "the" should only be used if the term is very often used with it, not counting attributive uses (eg cat's meow, White House). CitationsFreak (talk) 19:44, 4 February 2024 (UTC)[reply]
I have added |def=1 and converted cases that use 'the' in the head to use it. I'm not sure how it adds complication; it's only about 6 lines of code and the principle is very simple. Benwing2 (talk) 21:45, 4 February 2024 (UTC)[reply]
I recall that 10 years ago or so our technical folks were already concerned about the complexity of {{en-noun}}. Maybe Lua makes it easier to use conventional programming methods to add complexity/features without limit. DCDuring (talk) 23:36, 4 February 2024 (UTC)[reply]
@DCDuring IMO Lua makes complexity much more manageable compared with template coding. Template code gets impossible to understand after a certain point due to the proliferation of braces and the need for duplication (because of missing features in the template syntax), but this doesn't happen with Lua. Benwing2 (talk) 00:14, 5 February 2024 (UTC)[reply]
I'm sure |def=1 makes things easier for the programmers, but from the user perspective the difference is:
Keystrokes: 4 (t + h + e + [space]) vs. 6 (| + d + e + f + = + 1)
Wiktionary-specific knowledge required: none vs. yet another named parameter- I'm not even sure how many there are so far, and other templates each have their own sets of parameters.
Output: No difference whatsoever
This strikes me as something like having Template:the, powered by Module:the, whose only output is "the ". It may not add much to the system overhead, but from a user's perspective it's a pointless waste of the user's time and learning capacity.
Mind you, I'm not exactly a technophobe. I appreciate all the wiki-bells and wiki-whistles, and spend a lot of my time adding things like headword templates and educating new users about them. It just seems like this is an outlier in the usefulness vs. complexity continuum. Chuck Entz (talk) 01:36, 5 February 2024 (UTC)[reply]
@Chuck Entz It's not 4 keystrokes, because the old way you had to use |head= and spell out the entire lemma following the "the", but when using |def=1 you can mostly avoid that, especially since I've made the default headword-linking algorithm a lot smarter. For example, formerly snake-in-the-box problem would use something like {{en-proper noun|head=the [[snake]]-[[in]]-[[the]]-[[box]] [[problem]]}} when now you write {{en-proper noun|def=1}}. Benwing2 (talk) 01:50, 5 February 2024 (UTC)[reply]
Point taken. Is there any reason not to call it |the=1, then? That would be easier to remember. Chuck Entz (talk) 01:57, 5 February 2024 (UTC)[reply]
No reason, I just didn't think of that. I'll add that as an alias. Benwing2 (talk) 02:00, 5 February 2024 (UTC)[reply]
@Benwing2 Would it be possible to add something like def=~ for optional use of the definite article? e.g. Old National Pronunciation. Theknightwho (talk) 23:40, 10 February 2024 (UTC)[reply]
@Theknightwho Sure. How should it display? As two heads, or with (the)? Benwing2 (talk) 23:41, 10 February 2024 (UTC)[reply]
@Benwing2 I'm inclined towards displaying both separately. To an unfamiliar user, it might seem like the brackets are there because it's not part of the pagename. Theknightwho (talk) 23:44, 10 February 2024 (UTC)[reply]
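As a rough illustration of why |def=1 can replace a fully spelled-out |head=, the following Python sketch links each word of a page name by default and optionally prepends the article. It is not the actual headword module code (which is written in Lua and handles many more cases); the function name `default_head` and the rule of splitting only on spaces and hyphens are assumptions made for this example.

```python
import re


def default_head(pagename: str, definite: bool = False) -> str:
    """Build a linked headword from a page name.

    Hypothetical sketch: split on spaces and hyphens (keeping the
    separators) and wrap every remaining piece in [[...]].
    """
    parts = re.split(r"([ -])", pagename)
    linked = "".join(
        part if part in ("", " ", "-") else f"[[{part}]]"
        for part in parts
    )
    # definite=True plays the role of |def=1: prepend an unlinked "the".
    return f"the {linked}" if definite else linked


if __name__ == "__main__":
    print(default_head("snake-in-the-box problem", definite=True))
    print(default_head("ankle express", definite=True))
```

With a default like this, the editor only has to state the one fact that is not derivable from the page name, namely that the term normally takes the definite article.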

What's a "multiuse collective" supposed to be?

The term is used at Module:number_list/data/en and therefore appears in some number entries. The term doesn't seem to be used outside Wiktionary. Equinox 11:06, 3 February 2024 (UTC)[reply]

It might be read as "multiuse, collective", ie, as two adjectives. This would be bad form as most of the other column heads are NPs. DCDuring (talk) 16:41, 3 February 2024 (UTC)[reply]
I might have created this but if so I have no idea what I meant. It sounds like a (poor) attempt to distinguish various sorts of collective terms. Benwing2 (talk) 21:37, 3 February 2024 (UTC)[reply]
It could be worse. Sometimes I'm reading some old discussion here and find something that makes me think 'Who does this guy think he is?' only to find that it was me. But I don't let that slow me down. DCDuring (talk) 22:47, 3 February 2024 (UTC)[reply]
I think I get it now. It means a single entity consisting of n parts (e.g. triplet, as opposed to a threesome which is three separate entities). I don't know why that is "multiuse" though. Equinox 22:42, 4 February 2024 (UTC)[reply]
@Equinox What do you think of referring to terms like threesome, foursome as "group collectives" ("collective groups"?) and terms like triplet as "multipart collectives"? Benwing2 (talk) 00:17, 5 February 2024 (UTC)[reply]
@Benwing2: That seems clearer! Equinox 09:44, 5 February 2024 (UTC)[reply]
I am the one who named it "multiuse collective"; I needed a word to disambiguate it from the other collective series, especially the Germanic "-some" series. The "multiuse" series is really the "tuplet" series, which blends Latinate number prefixes and the Germanic diminutive "-et". At the time I named it "multiuse", I think one of the tuples had a multiplicative meaning, or some confusing wording that made me think it had a meaning other than collective or elemental, so I named the series "multiuse" since it had multiple uses; I figured "Germanic-Latinate collective" would be offputting.
I agree, the term is inapt. I am glad others have changed it, but honestly the difference is only etymological and not semantic. I think the current "group" and "multipart" terms are misleading, since it implies some sort of semantic difference. Is "Tuple collective" better? PhalanxDown (talk) 17:41, 7 March 2024 (UTC)[reply]
I don’t think either “multipart collective” or “tuple collective” is clear. It may be that there is no brief way of expressing this clearly, in which case just pick one of the terms and add a footnote explaining at greater length what it means. — Sgconlaw (talk) 18:10, 7 March 2024 (UTC)[reply]

Search options

In case someone wants to find, say, English words beginning with st-, are there any tools to do such a search? Pirhayati (talk) 22:06, 4 February 2024 (UTC)[reply]

@Pirhayati: For some options, go here and click to expand the Advanced Search section: [1]. Or go to Category:English lemmas (or any category) and browse alphabetically. Equinox 22:44, 4 February 2024 (UTC)[reply]
Thanks. What if I want to find the words ending with -st or with a middle -st-? Pirhayati (talk) 08:54, 5 February 2024 (UTC)[reply]
As far as I recall, you can't search by suffix; see e.g. this discussion from 2015: [2]. If you are techy, one option is to download the periodic "titles only" database dump, and parse it with a simple script. Equinox 09:47, 5 February 2024 (UTC)[reply]
You can use incategory:"English lemmas" intitle:"*st*" in Special:Search for the middle -st-, but "*st" will show you things that have "st" at the end of any word, not just the last one. There might be some way to use a regex with the intitle keyword, but I wasn't able to make it work. Chuck Entz (talk) 15:59, 5 February 2024 (UTC)[reply]
You can search for words with a certain ending using this tool; set the category to "English lemmas" (or whatever language you're interested in); if you also want non-lemmas, do a second search for "English non-lemma forms". It may time out or truncate the results if you search for something extremely common. If you're techy, the best option is as Equinox said. If you're not techy, you can download AutoWikiBrowser, download a database dump, and use AWB to search the database dump for all pages with st in the title and e.g. ==English== in the content; AWB will not allow you to edit Wiktionary unless you've been approved by the community, but AFAIK you can use it to search database dumps and categories and generate lists. It stops after some high number of results, though, so if you want a list of all words with something extremely common like e, you need to learn to parse a dump with your own script as Equinox said. - -sche (discuss) 15:38, 5 February 2024 (UTC)[reply]
incategory:"English lemmas" prefix:st works for me. JeffDoozan (talk) 16:37, 5 February 2024 (UTC)[reply]
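For the "parse a titles dump with a simple script" route mentioned above, a minimal Python sketch could look like the following. The function name `match_titles` and its parameters are made up for this example. Note also that a titles-only dump lists page titles for every language, so this finds titles matching the pattern, not specifically English lemmas.

```python
from typing import Iterable, Iterator


def match_titles(
    titles: Iterable[str],
    prefix: str = "",
    suffix: str = "",
    middle: str = "",
) -> Iterator[str]:
    """Yield titles that start with `prefix`, end with `suffix`, and
    contain `middle` somewhere strictly inside (not at either edge).

    Empty strings mean "no constraint".
    """
    for raw in titles:
        title = raw.strip()
        if not title:
            continue
        if (
            title.startswith(prefix)
            and title.endswith(suffix)
            and middle in title[1:-1]
        ):
            yield title


if __name__ == "__main__":
    sample = ["stone", "best", "pasta", "cat"]
    print(list(match_titles(sample, prefix="st")))
    print(list(match_titles(sample, suffix="st")))
    print(list(match_titles(sample, middle="st")))
    # With a real dump, pass an open file instead of a list, e.g.
    # (the file name here is an assumption, use whatever you downloaded):
    # with open("all-titles.txt", encoding="utf-8") as f:
    #     for title in match_titles(f, suffix="st"):
    #         print(title)
```

Because the function accepts any iterable of lines, it reads the dump one line at a time and does not need to hold the whole file in memory.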

Listing West Proto-Slavic Descendants

@Sławobóg @Thadh @AshFox @ɶLerman

Would the following descendants section format be too busy? An example would be

  • West Slavic:
    • Czech-Slovak:
      • Old Czech: bez
      • Slovak: bez, baza
    • Lechitic:
    • Sorbian:

Vininn126 (talk) 13:34, 5 February 2024 (UTC)[reply]

I'm highly skeptical of Czech-Slovak as a valid branch. Once you get to dialectal Slovak the number of shared developments is extremely low, and it is well known that Old Slovak was under influence of Old Czech for many years. Thadh (talk) 13:41, 5 February 2024 (UTC)[reply]
I take it by your lack of comments you don't see any problems with grouping Lechitic like this? Vininn126 (talk) 13:45, 5 February 2024 (UTC)[reply]
I don't think it's problematic. Thadh (talk) 13:54, 5 February 2024 (UTC)[reply]
I am for it! A long time ago I proposed adding a Czech-Slovak subgroup to the tree of Slavic languages. But then Thadh also spoke about his skepticism about the Czech-Slovak subgroup. AshFox (talk) 05:04, 6 February 2024 (UTC)[reply]
I really ask you not to combine Czech with Slovak, since Czech dialects influenced Western Slovak dialects, and those in turn influenced the Middle and Eastern dialects. ɶLerman (talk) 12:14, 6 February 2024 (UTC)[reply]
Also, Polabian does not need to be designated as Lechitic. ɶLerman (talk) 12:15, 6 February 2024 (UTC)[reply]

Some Issues with the Way Hebrew Verbs (and Forms Thereof) Are Displayed

Hi,

I’ve noticed some glaring issues with the way Hebrew verbs are handled that I think need to be addressed:

1. Different binyanim with same etymology listed with different "Etymology" headings

Verbs (lemma forms and forms-of) with the same headword spelling (thus listed together on the same page) are often separated under different "Etymology" headings, in order to separate/disambiguate different binyanim, when in fact the two words in question have the exact same actual etymology. Take the entry נשלח, for example, which contains נִשְׁלַח (nishlákh), the 3ms past/perfect (and lemma) and נִשְׁלָח (nishlákh), the nif'al ms participle/present forms of the nif'al/"N-stem" binyan, under "Etymology 1", as well as נִשְׁלַח (nishlákh), the 1cp future/imperfect form of the qal/pa'al/"G-stem" binyan under a second heading of "Etymology 2".

However, in reality, both of these binyanim and all three forms share the exact same etymology, i.e. the same triliteral root שׁ־ל־ח (š-l-ḥ). Different binyanim do not constitute different etymologies, and such incorrect and misleading usage of the "Etymology" heading could actually create significant confusion in cases where there are actually different etymologies under the same headword/entry page, either where two different words just happen to have the same spelling despite having different roots, or where there are, in fact, two different roots with the same or similar spelling. For an example where this could be the case, one need look no further than words with the roots פ־ר־שׂ (p-r-ś) or any of the three (by Hebrew Wiktionary’s reckoning—see earlier link) separate roots with the literals פ־ר־שׁ (p-r-š) (Hebrew Wiktionary labels these different roots with the exact same spelling as פ־ר־שׁ א, פ־ר־שׁ ב, and פ־ר־שׁ ג, respectively).

Now, if one were to use the strategy utilized on נשלח of using different "Etymology" headers for different binyanim, in addition to the already potentially confusing plurality of headers needed for all of the different actual etymologies on פרש, things would get out of control pretty quickly. In fact, our entry for פרש actually does suffer from this malady. It is only (partially) redeemed from total confusion by the fact that the entry contains multiple {{HE root}} templates, which serve to group the actual etymologies together—unfortunately, such a strategy is not possible on a page like נשלח, which will only ever have a single {{HE root}} template. Importantly, it will only have a single {{HE root}} template precisely because all of the words in the entry only have one etymology, making the case by itself that "Etymology" headers should not be used to separate or classify different binyanim whose etymologies are, in fact, identical. Again, a binyan is not an etymology, it is effectively a higher-level kind of conjugation, a pattern which provides information about how a verb is used (i.e. voice—active/passive/reflexive, causativity, etc), which vowels should be used to conjugate its various parts, and whether any affixes are required, but it usually has no purely semantic information in itself. I suppose one could make the argument that "D-stem" or pi'el verbs often carry "extra" semantic content, usually in the form of "intensification" of the "G-stem"/qal/pa'al verb, although more rarely with somewhat different meaning altogether, but this still doesn’t change the fact that the word's etymology and primary semantic content is ultimately derived from the root, regardless of the binyan. 
And because most binyanim do not, in fact, contain any purely semantic content in and of themselves, the peculiarities of the "D-stem"/pi'el binyan cannot on their own be a case for this usage of the "Etymology" heading, especially in the context of all the issues such usage brings up, as discussed above.

One solution that I can think of to this problem would be to add a parameter to the {{he-verb form}} templates that lets editors specify the binyan, in the same way that the {{he-verb}} already does. This parameter would use the same shortcodes as {{he-verb}} for the various binyanim, and the result would be the appending of something like "(qal construction)" to the end of the {{he-verb form}} template, clearly disambiguating between it and other binyanim in the same entry, and eliminating the need for editors to abuse the "Etymology" header, leaving it for the task for which it should actually be used.
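To show the intended effect of such a parameter, here is a small Python sketch of the display logic only. It is not the real template or module code: the function name `verb_form_label`, the shortcode table, and the exact output format are all assumptions for illustration, and the actual shortcodes accepted by {{he-verb}} should be checked against its documentation.

```python
# Hypothetical shortcode table; the real {{he-verb}} codes may differ.
BINYAN_NAMES = {
    "pa": "pa'al",
    "nif": "nif'al",
    "pi": "pi'el",
    "pu": "pu'al",
    "hif": "hif'il",
    "huf": "huf'al",
    "hit": "hitpa'el",
}


def verb_form_label(form, binyan=None):
    """Return the headword text for a verb form, optionally tagged
    with its binyan so that forms sharing one root and one Etymology
    section can still be told apart."""
    if binyan is None:
        return form  # current behaviour: no binyan shown
    name = BINYAN_NAMES[binyan]  # raises KeyError on an unknown code
    return f"{form} ({name} construction)"


if __name__ == "__main__":
    print(verb_form_label("נִשְׁלַח", "nif"))
    print(verb_form_label("נִשְׁלַח", "pa"))
```

With a label like this on each form, the two homographs on נשלח could sit under a single Etymology section and still be distinguishable at a glance.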

2. No good way to specify alternative spellings of verbs on same entry

I really think we need a way to specify alternative spellings in entries that use the {{he-verb}}/{{he-verb form}}/{{he-verb form of}} templates—for example, the verb שִׁלֵּחַ (shiléakh), with alternative spelling שִׁלַּח (shilákh). Currently, the only way to show the alternative would be to list it as a separate verb in the entry and then write "alternative form of…" in the definition (this is what I, myself, did on the aforementioned entry). While this particular combination of verb/tense/person/number only alternates between two spellings, many Hebrew verb conjugations have as many as four alternative spellings. We have a special parameter for pausal forms in {{he-verb}}, so IMO alternative forms should also be supported by template parameters (and by the way, we also may want to have a way to specify paragogic nun forms of verbs, so that they may be added and properly notated, in a way similar to pausal forms and my proposal here for alternative spellings).

Note that this is also a problem in the {{he-verb form}}/{{he-verb form of}} entries for the conjugated parts of such verbs, such as with יְשַׁלֵּחַ (y'shaléakh)/יְשַׁלַּח (y'shalákh). On those particular entries it is not quite as much of a problem to list the different spellings under different {{he-verb form}} templates, because they are already "forms of" anyway, but it still isn't ideal.

Note: I also posted about the above issue (#2) here, under the discussion/talk page for {{he-verb}}, but I wanted to bring it to the wider attention of this community here, since discussions at such parochial talk pages can easily be overlooked for long periods of time.

Please let me know what you think is the best way to solve either of these problems, and whether either of the solutions I have proposed makes sense.

Thanks,

Hermes Thrice Great (talk) 00:16, 6 February 2024 (UTC)[reply]

Regarding the first issue, I agree with your proposal. To give an example from another language group: when Romance languages have a noun that has one gender with some meanings and a different gender with other meanings, the practice is to simply have a second noun header (e.g. meia has two noun headers under one etymology). This would be a better approach for Hebrew verbs with different binyanim, I think. There would be a single etymology, but two (or more) verb headers, which would specify the stem. I have no thoughts on the second issue, for the moment. Andrew Sheedy (talk) 06:13, 6 February 2024 (UTC)[reply]
@Hermes Thrice Great For the first issue, I proposed last July a solution involving Etymology sections with numbers like 1.3 and 2.1, which are used to group the equivalent of terms from different binyanim in Arabic vs. terms from different roots. See WT:Beer parlour/2023/July#Etymology sections like 1.3, 2.1. There are issues with stuffing different binyanim under a single root in a single Etymology section, e.g. it becomes trickier to indicate the different pronunciations of the different forms (as well as the fact that they do indeed have different etymologies, logically speaking). For the second issue, we now have fairly general support for listing multiple variants of a term in a single link, courtesy of User:Theknightwho. The Hebrew templates haven't been reworked to use this support, however. As for your example of שִׁלֵּחַ‎ (shiléakh) vs. שִׁלַּח‎ (shilákh), however, these aren't just different spellings but also have different pronunciations, so I think some form of {{alt form of}} is correct. Benwing2 (talk) 23:58, 6 February 2024 (UTC)[reply]
Thanks for your response. I think both of these solutions make a lot of sense. I should like to see if anyone else wants to chime in, but I can definitely work with these solutions.
Hermes Thrice Great (talk) 08:05, 7 February 2024 (UTC)[reply]

Parameters for citing translations in {{quote}} templates[edit]

I think we should add a few parameters for citing where we got an English translation of foreign language texts if an English translation of a quoted text is readily available (not translated by our own editors). What do people think? Pinging @RcAlex36, who gave me the idea. — justin(r)leung (t...) | c=› } 04:48, 6 February 2024 (UTC)[reply]

@Justinrleung: you could use |footer= to indicate the source of the translation and, if desired, format the citation using {{cite-book}}, etc. I don’t think it’s necessary to have new parameters for this purpose. — Sgconlaw (talk) 04:53, 6 February 2024 (UTC)[reply]
@Sgconlaw: I'm thinking of Chinese quotations, where we use {{zh-q}} rather than |text= and |t= in the {{quote}} templates, but still use the rest of the template for the bibliographic info. Using |footer= would not have the desired effect in this case, right? — justin(r)leung (t...) | c=› } 16:42, 6 February 2024 (UTC)[reply]
@Justinrleung: I'm not familiar with Chinese entries. Can you provide an example? — Sgconlaw (talk) 16:54, 6 February 2024 (UTC)[reply]
@Sgconlaw: Something like 另起爐灶. — justin(r)leung (t...) | c=› } 17:19, 6 February 2024 (UTC)[reply]
@Justinrleung Can you give an existing Chinese example that cites where an English translation came from? Your example of 另起爐灶 doesn't do that; it just seems to use {{zh-q}} for formatting the quotation itself, for some sort of technical reason that I don't completely understand (IMO we should strive to eliminate things like {{zh-q}}). Or alternatively, a hypothetical example with the syntax you propose. Benwing2 (talk) 22:31, 6 February 2024 (UTC)[reply]
@Benwing2: A use case would be 國君, which is currently using the even older format of |ref= in {{zh-x}} instead of having a {{quote-book}} with {{zh-q}}. What I would imagine it being with what I propose would be something like this:
{{quote-book|zh|year=c. 4th century {{BC}}|title=zh:《左傳》|trans-title=Commentary of Zuo|en-trans-title=Zuozhuan: Commentary on the "Spring and Autumn Annals"|en-trans-year=2017|en-trans-translators=Stephen Durrant; Wai-yee Li; David Schaberg|...}} — justin(r)leung (t...) | c=› } 22:45, 6 February 2024 (UTC)[reply]
@Justinrleung How would this be displayed? Also maybe there is a way of doing this with |newversion= and |foo2= params; there are already quite a lot of params to {{quote-book}}, and various ways of citing translations, and I'm reluctant to add even more. Can you take a look at Template:quote-book#Reprintings,_translations_and_quoting_one_book_in_another near the bottom of that section where some text is quoted from a translation of The Snow Queen by Hans Christian Andersen and let me know if that format works? Maybe User:Sgconlaw can comment. Benwing2 (talk) 23:48, 6 February 2024 (UTC)[reply]
It makes sense. Currently the information is given by various hacks, like on فَنَك (fanak), or for سَابُورُ بْنُ سَهْلٍ [Sābūr ibn Sahl] (a. 869) Oliver Kahl, editor, Dispensatorium Parvum (al-Aqrābādhīn al-saghīr) (Islamic Philosophy, Theology and Science. Texts and Studies; 16) (in Arabic), Leiden: Brill, published 1994, →ISBN I left out mention of the 2003 translation The small dispensatory by the same author because the author is already cited as editor (and I do not always take over translations unrevised anyway).
Also true what Benwing suspects, we have had various workarounds for sequenced editions in the past, nobody might even gather all from memory. Imbiss still has the same hack as always. We tend to think out solutions for infrequent problems and then forget both. Fay Freak (talk) 00:02, 7 February 2024 (UTC)[reply]

Handling of some mathematical terms[edit]

In mathematics there are hundreds of adjectives which are used to indicate a certain mathematical object has some property (e.g. prime, normal, free). These terms usually appear next to a word indicating what kind of object they are characterizing (e.g. prime number, normal subgroup, free group), but sometimes they do not (e.g. x is prime, H is normal, G is free). Is there established policy on article/sense creation in this case? Do we create

(a) Senses under the adjective labeled (of a [number, subgroup, etc.])
e.g. local: (algebra, of a ring) Having a unique maximal (left) ideal.
(b) Separate entries of the form [adjective] [object] when the relevant sense of [adjective] only refers to mathematical structures of the type [object]
e.g. local ring: (algebra) A commutative ring with a unique maximal ideal, or a noncommutative ring with a unique maximal left ideal or (equivalently) a unique maximal right ideal.

or both? Or something else? In general I am in favor of (a) (if we can find sufficient evidence of use outside [adjective] [object]), and (b) seems justified given that textbooks read (e.g.) "a ring is called local if..." about as often as "a local ring is..."; see relevant discussion on Talk:prime number.

There's also a related problem: lots of properties in math can be seen either in a specific setting or in generality (e.g. ε-δ continuity of real-valued functions vs topological continuity). So, if we gloss free with its category-theoretic sense, should we remove free group as SOP? If we don't, presumably we do so on the basis that the more "groupy" sense we have at free group now is meaningfully different from something like ("a free object in the category of groups"), but this is kind of epistemologically murky since the definitions are (mathematically) equivalent, and practically difficult because there are lots of algebraic objects which can be free.

Is there already consensus on this issue? Discussions I'm not aware of? Winthrop23 (talk) 13:03, 6 February 2024 (UTC)[reply]

Among other things we are a historical dictionary. Thus, early use of a term often merits inclusion, ie, with its attestable meaning and part of speech. This may not be entirely consistent with its current use. So, if free in group theory preceded free in category theory (presumably a generalization/abstraction), we would like to have the early definition. I don't know whether we would want to have every attestable definition of free that preceded its ultimate? generalization in category theory.
Generally, we try to write definitions that can be understood by English language learners and those who are not experts in a field (like mathematics). Thus, simply moving a technically correct definition often does not lead to a satisfactory definition.
@Msh210 has entered many mathematical definitions and probably can help. DCDuring (talk) 17:09, 6 February 2024 (UTC)[reply]
@Winthrop23 I'm generally in favor of (a) to avoid duplication, although maybe some particularly common cases of (b) are acceptable under some circumstances. I think User:DCDuring's concern about historical accuracy can be handled in the mathematical definition of free (maybe through the Etymology section or in a Usage Note or something) without needing to create a bunch of specific entries like free X and free Y. Benwing2 (talk) 00:06, 7 February 2024 (UTC)[reply]
@Benwing2: See WT:IDIOM on in a jiffy and jiffy. DCDuring (talk) 13:47, 7 February 2024 (UTC)[reply]
Thanks for the ping. I agree that, by and large, we should follow "(a)" and treat things like free group as sums of parts, as Benwing2 said. That said, if there was a clear predecessor (for example, free group was in use before anyone thought to separate the words (which I highly doubt is true for this particular case, but maybe it's true for some (though off the top of my head I can't think of any likely candidates))), then IMO DCDuring's argument is a good one.​—msh210 (talk) 08:59, 13 March 2024 (UTC)[reply]

Classical Gaelic miscellanea[edit]

I’m preparing to start a preliminary WT:About Classical Gaelic page now that we’ve the ghc language code split.

Before I do that I’d like to ask ye about some preliminary stuff and see what you think of my ideas. There’ll be a few different issues here. I wasn’t sure if it fits any general discussion spaces, so I’ve chosen Beer Parlour as the most general one – hope that’s OK!

This post is a general request for comments.

What falls under Classical Gaelic and what doesn’t[edit]

So far we’ve had Middle Irish for anything up to ~1200 and then Irish or Scottish Gaelic for anything later. The ghc language code is not treated as the ancestor of Irish and Modern Gaelic (as it was a fairly standardized literary language that at the end of the period was quite different from the vernaculars). So the question is which texts should fall under the Classical Gaelic heading and which should not.

My proposal is:

  1. treat any text composed between the 13th and 15th centuries (inclusive) in Ireland and Scotland as Classical Gaelic,
  2. treat all poetry fulfilling dán díreach requirements from Ireland and Scotland, up to the 18th century, as Classical Gaelic,
  3. and for 16th century and later prose, decide depending on diagnostic features of the text in question.

The last will be fairly easy for Scottish texts, a bit more difficult for Irish ones. If a Scottish text shows consistent use of:

  1. plural verbs endings (cuirid for ‘they put’),
  2. full preverbs do-, a-, ad- (do-bheir, a-tá, ad-chím, etc.),
  3. no reduction of do in past tense and relative clauses (do ghrádhuigh agas do ghlac ‘which have loved and accepted’),
  4. no reduction of do before verbal nouns (tareis an fhuar chreidimh do chur ar gcul ‘after putting away the vain faith’),
  5. use of eclipsis (a gcriochaibh for ‘in bounds / lands / countries’),
  6. use of future tense separate from present,

it’s (prose) Classical Gaelic and not Scottish Gaelic.
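The decision procedure proposed above for Scottish texts could be sketched roughly as follows. This is only an illustration: the feature labels, the `Text` record and the function name are hypothetical, and requiring all six diagnostics at once is one reading of "consistent use". The proposal does not say how poetry outside dán díreach should be handled, so the sketch treats it like prose.

```python
from dataclasses import dataclass

# Hypothetical labels for the six Scottish diagnostics listed above.
SCOTTISH_DIAGNOSTICS = frozenset({
    "plural_verb_endings",         # cuirid 'they put'
    "full_preverbs",               # do-bheir, a-tá, ad-chím
    "unreduced_do_past_relative",  # do ghrádhuigh agas do ghlac
    "unreduced_do_verbal_noun",    # ... do chur ar gcul
    "eclipsis",                    # a gcriochaibh
    "separate_future",             # future tense distinct from present
})


@dataclass(frozen=True)
class Text:
    century: int                       # century of composition, e.g. 16
    dan_direach: bool = False          # poetry meeting dán díreach requirements
    features: frozenset = frozenset()  # diagnostics the text uses consistently


def classify_scottish_text(text: Text) -> str:
    """Apply the proposed rules in order (Scottish texts only)."""
    if text.century < 13:
        return "Middle Irish or earlier"  # existing cutoff of roughly 1200
    if text.century <= 15:
        return "Classical Gaelic"         # rule 1: 13th to 15th centuries
    if text.dan_direach and text.century <= 18:
        return "Classical Gaelic"         # rule 2: dán díreach up to the 18th century
    if SCOTTISH_DIAGNOSTICS <= text.features:
        return "Classical Gaelic"         # rule 3: later text showing every diagnostic
    return "Scottish Gaelic"


# Illustrative inputs only; the feature sets are assumed, not counted from the texts.
carswell_1567 = Text(century=16, features=SCOTTISH_DIAGNOSTICS)
bible_1767 = Text(century=18)
print(classify_scottish_text(carswell_1567))
print(classify_scottish_text(bible_1767))
```

An Irish version would need its own feature set, since (as noted below) many of these features survive in one modern dialect or another.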

So Carswell’s Foirm na n-Uirrnuidheadh would be Classical Gaelic, but 1767 Gaelic Bible would be Scottish Gaelic. As Donald Meek put it in Language and Style in the Scottish Gaelic Bible (1767–1807):

(…) Thus, a bardic poem was governed by regulations that defined its language and style to a very minute degree. Prose was less strictly controlled, but it was carefully regulated, with the verb-endings correctly used according to classical norms, and other aspects of morphology (…) closely observed. This type of prose was written in Scotland well before Keating’s time, and occurs in the first Gaelic book ever printed, namely John Carswell’s translation of the Book of Common Order, published in 1567. (…)

Carswell’s version [of a Biblical passage] is noticeably different from the form of the verse in the 1767 translation (…).

Although stylistically variable, from the highly ornate prose of John Carswell to the leaner but consciously dignified prose of Geoffrey Keating, Classical Gaelic of Type A made few concessions to the sort of language actually spoken by the ordinary people, certainly in Scotland. Nevertheless, one senses in Carswell a transparency and an occasional lightness of touch (…) which seem almost to anticipate the need for a level of language capable of connecting with the non-classical vernacular language. (…)

This balance, between ‘differentiated register’ and the vernacular, was in due time fully achieved by employing another of Classical Gaelic, which we can call ‘Type B’ for convenience. [we just call this (Early) Modern Scottish Gaelic on Wiktionary] (…)

(…) Kirk’s Bible is in the ‘Type A’ style; the later Old Testament is in ‘Type B’. (…) The most salient differences are:

  1. The pre-verb do, used to mark past tenses in the classical language, is not used in independent position in the Scottish Gaelic Old Testament, whereas it is fully preserved in Kirk; thus sheas rather than do sheas (v. 3).
  2. The Scottish Gaelic text generally employs analytic forms of the verb, and does not use verbal inflections to indicate the person of the verb; thus chruinnich na Philistich rather than do chruinnigheadar na Philistinigh (v.1).
  3. The Scottish Gaelic text also marks nasalisation (eclipsis) in the Scottish manner, rather than in the classical manner (…); thus nam Philisteach rather than na Bhphilistineach (v. 2).
  4. There are noticeable differences in vocabulary, and the Scottish translation generally reflects a register more in keeping with Scottish use; thus ghlaodh e re slòigh Israeil rather than dfúagair ar sluaghaibh Israel (v. 8), (…)

He calls the Gaelic of the later Bible translation “Classical Gaelic Type B” but honestly I don’t see why as it’s pretty much just a high register of modern Scottish Gaelic. Fairly close to the modern vernacular and quite distant from Carswell.

It’s a bit more difficult to make a list of such diagnostic features for Irish, as many of them are kept in one dialect or the other and show up in prose fairly late (even if the language is generally recognizably a modern dialect). I’d suggest a few, though:

  1. points 2–3 from the Scottish list,
  2. occasional use of s-preterite forms in the past meaning,
  3. consistent use of the -idh ending in 3rd person present absolute verbs (and only restricting -ann to dependent/conjunct forms),
  4. other distinctions between synthetic absolute / conjunct endings (cuirmíd vs ní chuiream; cuirid vs ní chuiread),
  5. subject/object distinction in 1st sg., 1st pl. and 2nd pl. pronouns (mé vs mhé; sinn vs inn; sibh vs ibh),
  6. keeping the preposition re beside le (even if not distinguished in meaning),
  7. use of eclipsing ar to express perfect,
  8. use of eclipsing go in the meaning ‘with’,

and perhaps more.

I’ll need to collect more examples of Irish texts from 17th and 18th centuries and compare them to see how easy it is to classify them as classical or not.

Lemmatizing verbs[edit]

We need to choose the lemma form for the verb. We have a few options, each with its advantages, disadvantages, and some precedent.

Personally I lean towards the 3rd person present indicative, but here’s the full list:

  1. 3rd sg. pres. indicative – the same form as we use for Old and Middle Irish,
  2. 1st sg. pres. indicative – as in Dinneen,
  3. imperative – like for new dictionaries of Modern Irish,
  4. verbal noun – like in medieval tracts.

1. is used in DIL (where most of the citations actually are from classical texts; it’s actually more of an Early Modern Irish dictionary than an Old Irish one, despite mostly using the Old Irish-like spelling), and in vocab lists to some editions of classical texts (see the glossary on the Léamh.org website); it’s also the form in Eoin Mac Cárthaigh’s The Art of Bardic Poetry. So I’d say it’s the standard modern practice for Classical Gaelic, and well established since the beginning of the 20th century.

2. is used by Dinneen – his dictionary deals mostly with Modern Irish, but it uses the pre-reform spelling which is often followed by the editions of classical texts (see next section), and contains a lot of historical usage (making it perhaps the most useful dictionary for reading classical texts). It’s also used in many editions of classical texts (again, see Léamh glossary, eg. the entry for cuirim).

Generally, editions of classical texts seem to choose either 1st sg. or 3rd sg. form.

I haven’t seen 3. used for classical vocab lists (and it isn’t used by DIL – the sole dictionary actually encompassing the language) – but it’s the form that’s generally used for modern languages (both Irish and Scottish Gaelic), except for Dinneen’s dictionary. Choosing it would be useful for people wanting to find classical forms of a verb when they know the modern Irish imperative – as the lemma would often be the same.

4. is what’s found in the actual inflection lists in grammatical tracts from 14th–16th centuries – the bardic schools treated verbal noun as the name of a verb. I wouldn’t choose it though as the verbal noun has its own separate morphology and syntax, so I’d rather keep it listed among the verb’s forms and then have it as a separate lemma with its own inflection table.

Spelling normalization of lemmata[edit]

The spelling in actual early modern manuscripts and printed books varies a lot and can be anything from early Old Irish practices to pretty much post-caighdeán modern Irish sometimes.

Still, modern editions generally use late (post-15th century) style spelling (using ao for the /əː/ vowel instead of áe or the like, marking lenition of b, d, g consistently, etc.) so I think we can stick to later practices too – only listing actual orthographic variants appearing in texts in the entry.

The question is which exact spelling we should choose. We have the Irish Grammatical Tracts i: Introductory, likely from the early 1500s, which gives a spelling guideline and is followed by Mac Cárthaigh in his The Art of Bardic Poetry. It has features such as:

  1. no a between é and a broad consonant (eg. bél rather than béal for ‘mouth’),
  2. doubling of eclipsed voiceless stops (a ccríochaibh instead of i gcríochaibh, a ttír and not i dtír, etc.),
  3. use of sg, sb, sd over sc, sp, st.

On the other hand, many modern editions stick to forms that got popular later and are closer to (or identical with) Dinneen’s spelling (béal, i gcríochaibh, scéal).

Reconsider whether we want Classical Gaelic to be a sister of Irish and Sc. Gaelic or their ancestor[edit]

It’s generally agreed that Irish and Scottish Gaelic split during the Middle Irish period and started their own grammatical innovations during that time. But they both kept the same literary tradition wherever the Gaelic political order, with poets educated in bardic schools working as diplomats and public commentators, was present. Gaelic was still popularly considered a single language at the time, up to at least the 18th century.

If we adopt the policy I suggested above, anything from Scotland and Ireland from the period 13th–15th century would classify as Classical Gaelic. And so it basically is the ancestor form of both – a language with some dialectal differences but generally a single literary standard allowing the use of many dialectal forms.

And most Scottish, as well as Irish, forms can be regularly derived directly from classical ones (even though some cannot, where the classical standard follows some Irish innovations not present in Scotland). So it can be useful in Etymology sections to derive some words from Classical Gaelic (and perhaps interesting to users of Wiktionary to see a “descendants” list under classical entries).

But since it was a classical standard not really representing any particular spoken dialect and during later times fairly removed from them, I’m not convinced that’s what we want to do.

So I’m mentioning this also as something to discuss further in the next weeks or months. // Silmeth @talk 21:46, 6 February 2024 (UTC)[reply]

@Silmethule You might want to take this to the discussion page of WT:About Classical Gaelic. There are a lot of issues you're asking about and they seem to be something that only knowledgeable Irish and Scottish Gaelic editors would be able to discuss cogently. I'm not sure who these editors are but if you know, you can create a workgroup in Module:workgroup ping/data containing them, for ease in pinging. Benwing2 (talk) 00:01, 7 February 2024 (UTC)[reply]
@Benwing2: fair, I’ll start a stub of that page later today and move the discussion there. // Silmeth @talk 10:00, 7 February 2024 (UTC)[reply]
Moved to Wiktionary talk:About Classical Gaelic. // Silmeth @talk 20:02, 7 February 2024 (UTC)[reply]

Derived terms and surface analysis[edit]

As previous discussions have made apparent (1, 2) the precise scopes of 'derived terms' and 'surface analysis' remain uncertain. There is a clean way to fix this, as it happens.

First, note that our mainspace defines surface analysis as a 'synchronically valid analysis of a word's morphology regardless of whether it represents its diachronic etymology, that is, its historical origin'. That is a fairly clear definition and, in principle, easy to follow.

In practice, however, surface analysis has been thoroughly confused with surface etymology, to the extent that even the glossary blends the two into a synchronic-diachronic mess. In fairness, the problem isn't always apparent; the cited example earthen : earth + -en happens to work on both levels. Then one runs into any number of cases like the Bulgarian луна (luna), with its surface analysis of луч (luč) + -сна (-sna), and everything falls apart. Mind, neither *луч nor *-сна even exist synchronically, and the problems only multiply from there. If, following *lówks+neh₂, one tries to combine the actual Bulgarian noun лъ́ч m (lǎ́č) "light" and the actual suffix -ен (-en) "forms adjectives", all they could ever hope to produce is лъ́чен (lǎ́čen), a relational adjective for "light". Try as one might, there is simply no way to make anything like луна́ f (luná, moon) from the available materials.

To be sure, the more typical mistakes are of a milder type, such as dissolve : dis- + solve, where the synchronic combination may look intuitive but produces the wrong pronunciation (/dɪ(s)ˈsɒlv/ ≠ /dɪˈzɒlv/) or the wrong meaning ('dis-solve' ≠ 'dissolve').

In any case, to sort all this out, perhaps one might try a plan like:

1) Change {{surf}} to display a vague/ambiguous 'equivalent to'. (Accepting for the moment that SA and SE are hopelessly confused.)

2) Gradually introduce distinctive replacements:

  • {{synch|Y|Z}} – 'synchronically derivable from Y + Z'
  • {{diach|Y|Z}} – 'diachronically corresponds to Y and Z' (or 'etymologically')

3) Introduce a sorting rule (possibly automatable?) such that:

  • If a given word X has a valid etymology with {{synch|Y|...}}, then (and only then) it is put under entry Y as a derived term.
  • If X has no valid etymology with {{synch|Y|...}} but does have one using {{diach|Y|...}}, or has some longer-range etymological link with Y, then it is put under Y as a related term.
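As a rough illustration of how the sorting rule in step 3 might be automated, here is a minimal sketch. It assumes each entry's etymology has already been parsed into (template name, components) pairs; that data shape, the function name and the `distant_link` flag are all hypothetical, and {{synch}} and {{diach}} are only the templates proposed here, not existing ones.

```python
def link_type(etymologies, base, distant_link=False):
    """Decide under which heading of entry `base` a word X should be listed.

    etymologies:  iterable of (template_name, components) pairs taken from X's
                  etymology section, e.g. ("synch", ("murder", "-er")).
    distant_link: True if X has some longer-range etymological link with `base`.
    Returns "derived", "related", or None if X should not be listed at all.
    """
    # A valid {{synch}} analysis takes priority over a {{diach}} one.
    for wanted, label in (("synch", "derived"), ("diach", "related")):
        if any(name == wanted and base in parts for name, parts in etymologies):
            return label
    return "related" if distant_link else None


# Examples taken from later in this thread.
print(link_type([("synch", ("murder", "-er"))], "murder"))  # derived
print(link_type([("diach", ("mount", "-age"))], "mount"))   # related
```

Whether an analysis counts as "valid" is the editorial judgement; the sketch only covers the bookkeeping that follows from it.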

ETA: It seems Rua had some similar thoughts back in the day. Nicodene (talk) 15:49, 7 February 2024 (UTC)[reply]

@Nicodene (For the record, at least луч (luč) is considered to exist in Bulgarian, according to RBE. If the roots indeed didn't exist, I'd find it to be a slightly absurd surface analysis and say it should be removed, but that one's fine IMO.)
Anyway, how are surface etymology and analysis different again? If "synchronic" means relating to a language in one point in its history, then it can't use forms that precede that point of the language, so we should be restricted to using only forms that exist in the modern form of the language, right?
My other question is why we would want to represent diachronically equivalent forms. What would be an example of this? Analyzing a modern English term as its Middle/Old English roots? I guess this should be used when we don't know an exact origin for a word, but want to represent it based on the component roots we know it came from? Kiril kovachev (talkcontribs) 20:36, 7 February 2024 (UTC)[reply]
@Nicodene Yeah I have the same concern as Kiril. I don't understand the difference between surface analysis and surface etymology, and unless this is made crystal clear, introducing two replacements for the one we have is just going to make things messier. Benwing2 (talk) 21:29, 7 February 2024 (UTC)[reply]
@Kiril kovachev, Benwing2:
Sorry, in trying to avoid rambling I'd cut out examples and quite a bit of context.
The gist is that for a modern Bulgarian word to have a synchronically valid morphological analysis, the morphological combination that one proposes for it by definition has to be synchronically possible - that is, the indicated elements should exist and be able to combine, within and according to the grammar of modern Bulgarian, to produce the form luná.
Suppose that a sorcerer makes everyone forget that any word like luná ever existed. Would it be possible for a Bulgarian to wake up tomorrow and recreate that word by combining the noun lăč 'light' (still the only standard form I am aware of, not that it really matters) with a suffix -sna?
No, first and foremost because there exists no suffix -sna which modern Bulgarians attach to nouns to create new nouns from them. Never mind being able to make new words with it, they wouldn't have any reason to suspect that their language had ever even had a -sna suffix unless informed of it by a linguist. I digress.
----
As for the terms 'surface analysis' and 'surface etymology' - frankly my preference would be to toss them in the bin and never look back. I wish I'd had the foresight to avoid using them at all in the earlier comment, as they (understandably) cause confusion.
Is the suggested alternative of 'synchronically derivable from' not clear? Another example: murderer is synchronically derivable from murder, because any time we like we can take a noun like murder and pop the agentive suffix -er onto it. Combining /ˈməːdə/ and /-ə/ gives us a form pronounced /ˈməːdəɹə/, and combining their senses results in one of 'he who engages in murder'. For all intents and purposes our creation is identical to the existing word murderer and we have proved that the latter is synchronically derivable. So its etymology merits the use of {{synch|murder|-er}} and we put murderer as a derived term under murder. (In principle we should likewise be able to synchronically recreate anything else found as a derived term under murder - and if something doesn't fit, then it would be moved instead under 'related terms'.)
On the other hand, we have no way to synchronically derive the English montage, even though we have the etymological components in mount and -age, because combining them results in the wrong pronunciation (/ˈmaʊntɪd͡ʒ/) and the wrong meaning ("act of mounting"). Thus in this case we could not put {{synch}} and instead use {{diach}}, since the components mount and -age do etymologically, and diachronically, correspond to those of montage, even if they cannot actually be combined to make montage. Nicodene (talk) 01:07, 8 February 2024 (UTC)[reply]
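The test described in the last two paragraphs can be sketched as a simple comparison. This is a hedged illustration only: the `Analysis` record and both function names are hypothetical, the predicted pronunciation and the sense judgement are supplied by an editor and not computed, and the transcriptions are the ones quoted in this thread (dissolve stands in here as the failing case).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    word: str
    parts: tuple         # proposed components, e.g. ("murder", "-er")
    predicted_ipa: str   # what combining the parts today would yield
    attested_ipa: str    # how the existing word is actually pronounced
    sense_matches: bool  # editor's judgement: combined sense equals actual sense


def is_synchronically_derivable(a: Analysis) -> bool:
    """Both the form and the meaning of the recombination must match the word."""
    return a.predicted_ipa == a.attested_ipa and a.sense_matches


def proposed_template(a: Analysis) -> str:
    """Choose between the two templates proposed in this thread."""
    name = "synch" if is_synchronically_derivable(a) else "diach"
    return "{{" + name + "|" + "|".join(a.parts) + "}}"


murderer = Analysis("murderer", ("murder", "-er"), "ˈməːdəɹə", "ˈməːdəɹə", True)
dissolve = Analysis("dissolve", ("dis-", "solve"), "dɪsˈsɒlv", "dɪˈzɒlv", False)

print(proposed_template(murderer))  # {{synch|murder|-er}}
print(proposed_template(dissolve))  # {{diach|dis-|solve}}
```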
@Nicodene I see. Yeah I have never liked "surface blah" either, and have always preferred "synchronically analyzable as" which is similar to "synchronically derivable from". I actually think the vast majority of uses of {{surf}} are simply synchronic analyses; the cases like montage are rare and probably don't merit any sort of analysis into English components. I would just say montage is borrowed from French. There are indeed occasional cases (e.g. in Russian) of languages inventing French-like terms using French roots and French affixes that aren't normally productive in the destination language along with French combining rules, but it's not clear we need a special template for this. Benwing2 (talk) 01:15, 8 February 2024 (UTC)[reply]
I'd no interest in using {{diach}} myself and rather intended it as a relaxed alternative to {{synch}}. I'd somewhat pessimistically estimated that a third or so of the current uses of {{surf}} would fail to qualify as synchronically valid and so would one day have to be deleted or switched to another template. Having just surveyed some forty uses however I'm surprised to report myself agreeing with all but three.
If nobody pipes up in favour of {{diach}} I'm happy to drop it from the proposal, leaving just {{synch}}. Incidentally, an unintended side-effect of the phrasing 'synchronically derivable from' is that it may render {{af}} and such redundant. It seems just about inevitable that the preface works for any correct example of affixation. Nicodene (talk) 03:06, 8 February 2024 (UTC)[reply]
My position has always been against using linguistic jargon where it can be avoided (I favoured binning the "proscribed" label, for instance). I'd be a lot happier to see {{surf}} generate "equivalent to". I know it's imprecise, but so what? Your average punter isn't going to have a clue what "synchronic" means. I remember when I was at uni, a friend was studying a unit on Chinese translation and we came across the words "synchronic" and "diachronic" in one of the readings. Neither of us knew what they meant - and we both had undergraduate degrees (one in arts, one in sciences)! At a stretch, we could work on improving synchronic in the Glossary. But let's not make our readers work any harder than they have to. Most will just give up. This, that and the other (talk) 02:37, 9 February 2024 (UTC)[reply]
@This, that and the other I get your point although at the same time, I'm concerned that using something vague like "equivalent to" will lead to the same mess we currently have with "surface analysis": No one knows what it means and so it just leads to endless questions and speculation. At least "synchronic" has a precise meaning. Benwing2 (talk) 05:42, 9 February 2024 (UTC)[reply]
Would "Morphologically equivalent to" work for people? Vininn126 (talk) 08:15, 9 February 2024 (UTC)[reply]
In the last thread multiple people chimed in to make, in different ways, the same very important point that I'd initially missed. I should probably have made it at the beginning of this thread as well.
Take for instance boldly. Even if the combination is attested as far back as Old English (bealdlīċe), and across the centuries thereafter, that doesn't mean that our modern boldly is ipso facto its direct, linear survivor - in every case handed down from individual speaker to speaker gaplessly for some two thousand years. Inevitably it has been reinvented from the components bold and -ly, and older forms thereof, multiple if not countless times in the intervening period, and will continue to be so long as both components remain in active use. Speakers surely do not walk around with a massive stored archive of regular adverbs (rapidly, enviously, proudly, surreptitiously ad infinitum) individually and faithfully passed down from medieval English. What we actually store is -ly 'attaches to an adjective to form an adverb', and we go around merrily applying it to adjectives as we please.
All that is to say that the etymology shouldn't be phrased as if boldly is a fossil inherited from Old English that just so happens to be the equivalent of or comparable to bold and -ly: it is inseparable, a sort of forever-being-reborn child. After stewing on the point for some time I came up with the phrasing 'synchronically derivable from' to capture at least some of the significance of this. If there is a more effective way I'll be glad to hear it. Nicodene (talk) 12:23, 9 February 2024 (UTC)[reply]
Yes, of course reinnovation is part of this - how is this a response to the wording "Morphologically equivalent to"? Vininn126 (talk) 12:29, 9 February 2024 (UTC)[reply]
I have answered this, or tried to at any rate, in the last paragraph of that comment- its first sentence essentially. Sorry, I was in the middle of editing when you replied. Nicodene (talk) 12:39, 9 February 2024 (UTC)[reply]
I've a bit more faith in the reader than that! Surely they could click the glossary link. Even if they don't, one exotic word won't stop everything after it from making sense. If I were to stumble across the entry for virology and find a description like 'glabuciously derivable from virus + -ology', I can't imagine walking away without a clue as to the origin of virology.
In point of fact I've been reading etymologies with "surface analysis" for years here and the phrase still doesn't make sense to me. Or really to anyone else, judging by the foregoing discussions.
If it helps, here is a mock-up glossary entry for the proposed template:
  • Synchronic derivation: the process of making a word out of other words, suffixes, etc. that current speakers know, use, and can combine in that way. The modern English bluish is synchronically derivable because English speakers are aware of and use the separate adjective blue and because they also freely add -ish to various adjectives to diminish their sense. In case there is any doubt, this can be tested by coming up with an entirely unattested combination and seeing whether using it in a sentence makes sense. At the time this was written there were zero Google results for 'Kyrgyzstani-ish', yet the writer judged the following sentence acceptable: 'The cat's owner sounds Kyrgyzstani-ish, but that there's an Uzbek cat if I've ever seen one'.
Nicodene (talk) 07:49, 9 February 2024 (UTC)[reply]
I'm for "synchronically, X + Y" or "morphologically, X + Y". PUC 12:44, 9 February 2024 (UTC)[reply]
I agree with this conclusion. Kiril kovachev (talkcontribs) 16:33, 9 February 2024 (UTC)[reply]
Just spitballing, but if we accept the idea that "boldly" exists not solely because it was handed down from generation (Old English bealdlīċe) to generation (Middle English baldeliche, boldeliche, boldely) to generation (modern boldly) the way e.g. bold itself was, but because speakers productively use -ly, then what if we simply omit any sort of "surface analysis..." or "synchronically..." or other framing, and just start every etymology section for which a synchronic etymology is applicable with that etymology, i.e. instead of "From Middle English boldeliche [...] equivalent to bold + -ly", boldly would instead just say "{{af|en|bold|-ly}}. From Middle English boldeliche..."? (Cases where a synchronic etymology is not obviously applicable would not get {{af}}s, i.e. we would not say bold itself was "bell + -t" or whatever the reflexes of the parts it was composed of back in Pre-Germanic/PIE are, just like my opinion is that such things should not currently be getting "Surface analysis..."es either. I'm not sure how best to handle things like linking husband to house and bond: perhaps we keep its vague "equivalent to", or perhaps we replace it with something like "The first element is cognate to house, the second element is cognate to bond"? Or "...corresponds to..."?) - -sche (discuss) 01:52, 11 February 2024 (UTC)[reply]
We don't provide an etymology deriving the English plural friends from the Middle English frendes, do we? Nor singing from singynge, nor angered from angred. It'd be mad. Friends is synchronically, and trivially, the plural of friend, singing is the gerund or present participle of sing, and angered the past participle of anger. Plain and simple.
So, I ask, what is stopping us from treating boldly as the adverb of bold? Do our readers need to know that hundreds, perhaps thousands, of regular -ly adverbs are also attested in older stages of English any more than they need to know about all the older attested -s plurals or verb forms with -ing? Of course not. To see the latter they can simply visit the Old or Middle English lemma, and that is, logically, how it should be for -ly adverbs as well.
I understand that boldly differs in part-of-speech from bold, but I don't think that is sufficient reason to have the site cluttered with ever-growing numbers of trivial etymologies for regular adverbs with -ly, or in Romance languages -ment(e), or in Georgian -ად, and so on ad infinitum.
We should seriously consider the possibility of delemmatising that which is trivially derivable on the synchronic level in all ways - that is, whenever the result is entirely predictable in pronunciation, meaning, and form. For example this could involve a definition-line template that reads regular adverb of [X], possibly linking to the suffix used to make it, and such entries will be left without etymologies that are, in the end, entirely unnecessary. Nicodene (talk) 02:07, 12 February 2024 (UTC)[reply]
I disagree with delemmatizing adverbs like this; yes they are largely predictable in form and meaning but not always, and it would be confusing as well as hard to draw the line when it comes to doing that. Benwing2 (talk) 02:15, 12 February 2024 (UTC)[reply]
And that is where the use of a definition-line template will come in handy. Truthfully for instance is 1. regular adverb of truthful; 2. (sentence adverb) to tell the truth; and so on. As for words unpredictable in form - that is already a problem, is it not? If all an etymology states is 'X+Y', but X and Y do not actually make Z, then the etymology is simply incorrect - or more charitably speaking, it is incomplete. The only reason the problem hasn't been apparent till now is that it's been hiding behind the vagueness of 'equivalent to', 'by surface analysis', and other nebulous phrasings. Nicodene (talk) 02:35, 12 February 2024 (UTC)[reply]
@Nicodene By delemmatizing I assume you mean either deleting the words or converting them to non-lemma forms, which I disagree with. If you are just referring to templatizing the definition, that's a different story, although I would argue that regular adverb of foo should be replaced with "in a foo manner", which is clearer. Benwing2 (talk) 02:43, 12 February 2024 (UTC)[reply]
You'll have to forgive my not looking up Wiktionary's particular definition of 'lemma'.
What I mean is de-etymologising the adverbs in question and templatising their synchronically-derivable definition - which will be the only one in ~99% of cases. The phrasing I gave for the template was just a place-holder; I do find 'in an [X] manner' much more user-friendly.
ETA: That said, the latter wouldn't work for non-English languages. Nicodene (talk) 03:06, 12 February 2024 (UTC)[reply]
I'm not a fan of this approach. While I understand that forms can be re-innovated that also doesn't exclude them from being inherited. Vininn126 (talk) 08:51, 12 February 2024 (UTC)[reply]
Well then it should be good news that this approach doesn't exclude inheritance at all, unlike the previous one. Nicodene (talk) 09:09, 12 February 2024 (UTC)[reply]

FYI: Updates from Unicode[edit]

https://mailchi.mp/7311676d715c/testing-rickys-template-6269058 Justin (koavf)TCM 16:41, 7 February 2024 (UTC)[reply]

Is anyone going to tell Ricky that his email template works, and has in fact been working for several months?... This, that and the other (talk) 02:39, 9 February 2024 (UTC)[reply]

Setting Foreign Word of the Day[edit]

Is anybody eager to take over the task of setting Foreign Words of the Day? Then please reply below and we can work out a date from which you will assume full responsibility over it. You have to be a user in good standing with at least a modest amount of experience and an okay track record of not submitting questionable nominations (read: not having nominated offensive, vulgar or otherwise potentially controversial terms). I will explain the workings of the template to you and give you hints about certain best practices. I shall remain available for questions for some time as well. Be advised that you will often have to resort to finding or preparing suitable words by yourself.

Note that I am not insisting that someone else take over, nor burnt out, but I thought that the time may be right for some new blood. If it turns out to be too stressful, I remain available to reassume the task. ←₰-→ Lingo Bingo Dingo (talk) 20:25, 8 February 2024 (UTC)[reply]

We could just discontinue the FWOTD project. Demonicallt (talk) 20:40, 16 February 2024 (UTC)[reply]
Or just decide to only feature LDLs ;) Thadh (talk) 18:54, 17 February 2024 (UTC)[reply]
If you offer to set or provide them... :) ←₰-→ Lingo Bingo Dingo (talk) 23:36, 18 February 2024 (UTC)[reply]
Category:Ingrian terms with quotations is ripe for exploitation, but other than that I'm afraid I don't have enough time or enough lack of laziness to set a FWOTD every day. Thadh (talk) 23:43, 18 February 2024 (UTC)[reply]
We've also abandoned Wiktionary:Translations of the week and Wiktionary:Collaboration of the week Demonicallt (talk) 18:47, 17 February 2024 (UTC)[reply]

alternative forms and alternative spellings[edit]

Recently, someone asked what the difference was. While looking for examples to illustrate the supposed difference (previously explained to me as: alt. spellings if pronounced the same, alt. forms when pronunciation, or sometimes etymology, differs), I noticed many/most alt forms should (by that metric) be alt spellings. There was also discussion a while ago about how to indicate when the pronunciation is the same as the lemma vs hasn't been added yet (since in most entries that don't have pronunciations, it just hasn't been added yet).

  1. Do we agree on "pronounced the same" vs "pronounced differently" as the difference between these?
  2. If so, can we make {{alt spell}} spell out "alternative spelling of foo (pronounced the same)" or something? (For want of this, I have seen some entries use, and I have used, a pronunciation section that just contains "like foo" ... because duplicating the entire pronunciation section in a way that falls out of sync when people update one entry and not the other is bad, but having no indication about the pronunciation is indistinguishable from it just not having been added yet.)
  3. Should we go through existing {{alternative spelling of}}s and {{alternative form of}}s and make sure they're matching that distinction? I can't do all 94,000+, but I'll go through some with AWB and other people could please pitch in too, if people actually want these templates to be separate and distinct...

- -sche (discuss) 01:20, 11 February 2024 (UTC)[reply]

@-sche Yes, that is my understanding of the difference. I can make the change in (2) if others agree, and as for (3), yes we should fix this but it's a long-term project. Benwing2 (talk) 02:12, 11 February 2024 (UTC)[reply]
I feel like certain instances of "alt form" being used wrong can be seen by a simple bot program. As an example, there could be a bot program that sees that a certain spelling is defined as an "alt form of" when there is nothing but non-letter chars that separate it from another spelling. (As in fo'o-bar and foo bar.) CitationsFreak (talk) 02:26, 11 February 2024 (UTC)[reply]
@CitationsFreak For English, yes, but I wouldn't trust that more generally. Benwing2 (talk) 02:45, 11 February 2024 (UTC)[reply]
@User:Benwing Of course, of course! Still the general principle applies. CitationsFreak (talk) 03:05, 11 February 2024 (UTC)[reply]
I'm not sure I would assume "fo'o" vs "foo" to be pronounced the same even in English, but let's spot-check some examples and find out. I do think that (for English) a bot fixing all the instances that differ only in hyphenation would be great, and while a bot changing entries that differ only in spacing ("bat man" vs "batman") to "alternative spelling" might create a handful of errors (where the stress actually differs), based on my own spot-checking it seems like it would be a net positive by a wide margin because far more cases are currently wrong in the other direction (listed as "alternative forms" but not different in pronunciation). - -sche (discuss) 07:27, 11 February 2024 (UTC)[reply]
@-sche I downloaded the latest dump and fetched all occurrences of {{alternative form of}} or one of its aliases that refer to English. There are 57,005 of them. Looking through them, in addition to your suggestions I think we should also bot-change entries that differ only in capitalization, e.g. big brother vs. Big Brother, catch some Z's vs. catch some z's and cases that differ in a combination of capitalization, spacing and/or hyphenation, e.g. Middle-earth vs. Middle Earth or subsaharan vs. sub-Saharan. Also words that differ in -ise vs. -ize or -isation vs. -ization. Possibly also words differing in -ie vs. -y (leftie vs. lefty, forecaddy vs. forecaddie). Probably also differences in accent vs. no accent (a la carte vs. à la carte, chacun a son gout vs. chacun à son goût, épaulière vs. epauliere; although there are several like banishèd vs. banished where it may or may not count, depending on the intent, which is sometimes spelled out using a label poetic or poetry or even an explicit note "used to specify a disyllabic pronunciation"). In terms of apostrophe, there are cases like Bahá'í vs. Baháʼí with straight vs. curly apostrophe, and also cases like you's vs. yous, 'hood vs. hood, Shi'ite vs. Shiite, St. Paul's vs. St Pauls [here we would have to ignore the difference in punctuation, which seems a good idea to me], Tai-p'ing vs. Taiping that clearly are the same pronunciation; things like Hallowe'en vs. Halloween, Ba'ath vs. Baath, Guy Fawkes' Day vs. Guy Fawkes Day that probably represent the same pronunciation; and things like a'ight vs. aight, our'n vs. ourn that might represent the same pronunciation. There are also differing placements of apostrophes like horses' doovers vs. horse's doovers. (BTW there are some funny things, e.g. 38 alternative spellings of Gaddafi specified using {{alternative form of}}, of which 14 are labeled uncommon and 3 very rare. There was actually an SNL skit about this, listing a zillion alternative "spellings" including I think Chicago and Chewbacca.) Benwing2 (talk) 08:33, 11 February 2024 (UTC)[reply]
@-sche OK, I wrote a script to analyze the 57,005 cases to see how many could be converted:
39624 not same as to-page, can't convert to {{alt spell}}
6227 same as to-page with hyphens removed
4860 same as to-page with spaces removed
3066 same as to-page with hyphens converted to spaces
1569 same as to-page with capitalization ignored
 723 same as to-page with full canonicalization applied
 585 same as to-page with accents removed
 346 same as to-page with apostrophes removed
   5 same as to-page
This amounts to 17,381 cases, or about 30% of them. Here, "full canonicalization" means hyphens, spaces, apostrophes and accents all removed and capitalization ignored (i.e. more than one transformation was required to make the from-page and to-page the same). There are many more that are actually alt-spellings, but this is a good start. Benwing2 (talk) 09:21, 11 February 2024 (UTC)[reply]
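For readers who want to see what such a comparison might look like, here is a minimal Python sketch of the bucketing described above. It is not the script that produced the counts; the function names, the order in which the single transformations are tried, and the set of characters treated as apostrophes are all assumptions made for illustration.

```python
import unicodedata

# Assumption: straight, curly and modifier-letter apostrophes are treated alike.
APOSTROPHES = "'\u2019\u02bc"


def strip_accents(text):
    """Drop combining marks after decomposing, e.g. 'à' -> 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def strip_apostrophes(text):
    return "".join(ch for ch in text if ch not in APOSTROPHES)


def full_canon(text):
    """Hyphens, spaces, apostrophes and accents removed, capitalization ignored."""
    text = strip_accents(strip_apostrophes(text)).lower()
    return text.replace("-", "").replace(" ", "")


# Single transformations, tried in this (assumed) order before falling back
# to full canonicalization.
SINGLE_STEPS = [
    ("hyphens removed", lambda s: s.replace("-", "")),
    ("spaces removed", lambda s: s.replace(" ", "")),
    ("hyphens converted to spaces", lambda s: s.replace("-", " ")),
    ("capitalization ignored", str.lower),
    ("accents removed", strip_accents),
    ("apostrophes removed", strip_apostrophes),
]


def classify(from_page, to_page):
    """Return a bucket label for one from-page/to-page pair."""
    if from_page == to_page:
        return "same as to-page"
    for label, step in SINGLE_STEPS:
        if step(from_page) == step(to_page):
            return "same as to-page with " + label
    if full_canon(from_page) == full_canon(to_page):
        return "same as to-page with full canonicalization applied"
    return "not same as to-page"
```

Under these assumptions, a pair like Middle-earth vs. Middle Earth falls into the "full canonicalization" bucket because it needs both a hyphen/space change and a case change, while stanch vs. staunch stays in the "not same" bucket.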
Update: With a few more canonicalizations, I got another 2,956 cases handled:
36668 not same as to-page, can't convert to {{alt spell}}
6243 same as to-page with hyphens removed
4870 same as to-page with spaces removed
3070 same as to-page with hyphens converted to spaces
1639 same as to-page with -ise/isation/isational/isable/isability -> same with -iz-
1573 same as to-page with capitalization ignored
 921 same as to-page with full canonicalization applied
 585 same as to-page with accents removed
 362 same as to-page with -ible/-eable/-ibility/-eability -> -able/-ability
 355 same as to-page with periods removed
 346 same as to-page with apostrophes removed
 325 same as to-page with -ie -> -y
  40 same as to-page with æ/œ -> e
   5 same as to-page
   3 same as to-page with ł -> l
Benwing2 (talk) 11:03, 11 February 2024 (UTC)[reply]
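The extra suffix-based canonicalizations in this update could be sketched along the following lines. This is only an illustration in Python: the rule list is a small hypothetical subset, the regular expressions are assumptions rather than the patterns the actual script used, and a rule "matches" here simply when applying it to both titles makes them identical.

```python
import re

# Hypothetical subset of the rules named in the update; patterns are assumptions.
RULES = [
    ("-ise/-isation -> -iz-",
     re.compile(r"is(e|es|ed|er|ing|ation|ational|able|ability)$"), r"iz\1"),
    ("-ie -> -y", re.compile(r"ie$"), "y"),
    ("\u00e6/\u0153 -> e", re.compile("[\u00e6\u0153]"), "e"),
]


def matching_rule(from_page, to_page):
    """Return the label of the first rule that makes the two titles equal, else None."""
    if from_page == to_page:
        return None
    for label, pattern, replacement in RULES:
        if pattern.sub(replacement, from_page) == pattern.sub(replacement, to_page):
            return label
    return None
```

A purely orthographic test like this cannot tell standard spellings from nonstandard ones, so a pair such as advertise vs. advertize would match the first rule even though the second spelling is not standard.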
I think the spellings of Gaddafi with 'z' actually have /z/ in the pronunciation, and I wouldn't trust 'q' not to induce a pronunciation with /k/. --RichardW57 (talk) 13:48, 11 February 2024 (UTC)[reply]

-is- vs -iz- are supposed to use Template:standard spelling of rather than "alternative spelling of", with from=British form (strictly speaking, if being very precise, a combination of from=non-Oxford + labels for other relevant countries) for -is-, or from=American form|from2=Oxford for -iz- (+ other relevant countries if we're being precise). (Possibly we should just create labels like -is- form and -iz- form so we can just change what the labels display whenever we want to change the lists of countries, without having to update entries like we do at present.) Indeed, we should even be changing (or at least looking at, and double-checking whether or not we should be changing) cases that currently use either of "alternative spelling" or "alternative form", to use "standard spelling of".
Cases like big brother vs. Big Brother are meant to use Template:alternative case form of. Cases like catch some Z's vs. catch some z's are debatable; we could say the capitalization of Z doesn't seem to be indicating anything like greater specificity / proper-noun-ization the way it does with Big Brother or Native and so we could say catch Z's is a mere alternative spelling ... but we could also just make it an Template:alternative case form of too for consistency, and that is probably the easier thing to bot-ify.
Given the low numbers, I think we can just review and do the "5 same as to-page, 3 same as to-page with ł -> l" by hand, heh.
I think cases where the difference is an accent in a word-final -éd or -èd should be left using "alt form" rather than "alt spelling", or should be examined by humans, as that difference fairly consistently seems to correspond to "the unaccented form can be pronounced either with -ed not being a separate syllable or with it being a separate syllable, whereas the accented form is always pronounced as a separate syllable".
For cases where the difference is plural vs singular possessive, horses' doovers vs. horse's doovers (BTW, if anyone is thinking "why is there a space between the horse and the possessive there?", it's because T:' adds padding), my understanding is that various standard accents distinguish princesses' with /ɪz/ vs princess's with /əz/, so it seems like a good idea to make a list of those for a human to review, if there are only 346, although some may then be bottable because I think no accent distinguishes e.g. drivers' vs driver's. - -sche (discuss) 15:22, 11 February 2024 (UTC)[reply]

@-sche Thanks. Some questions:
  1. What about cases like -isable vs. -izeable where there's both an -ise/-ize distinction and an extra e in one of them?
  2. What about cases like Middle-earth vs. Middle Earth or antisemitism vs. anti-Semitism where there's both a capitalization and hyphenation difference?
  3. Which accents make a difference between princesses' and princess's? Are these RP accents? My more-or-less GA speech makes no difference (although I do differentiate Rosa's from roses).
BTW it should be possible to have a bot pull out and flag cases with word-final -éd and -èd, as well as cases with apostrophes involving words ending in -s, -se or -ce (should I also flag cases with -x(e), -ch(e), -sh(e)?). Benwing2 (talk) 20:37, 11 February 2024 (UTC)[reply]
Here are the only cases that my script flagged (horse's doovers was the only case with apostrophes):
Page 130420 blessèd: WARNING: Saw from-page 'blessed' same as to-page with accents removed and needs manual checking for accented -ed
Page 1117859 horses' doovers: WARNING: Saw from-page 'horse's doovers' same as to-page with apostrophes removed and needs manual checking for pronounced schwa
Page 2253669 chassed: WARNING: Saw from-page 'chasséd' same as to-page with accents removed and needs manual checking for accented -ed
Page 2716069 lovèd: WARNING: Saw from-page 'loved' same as to-page with accents removed and needs manual checking for accented -ed
Page 2747107 belovèd: WARNING: Saw from-page 'beloved' same as to-page with accents removed and needs manual checking for accented -ed
Page 2747107 belovèd: WARNING: Saw from-page 'beloved' same as to-page with accents removed and needs manual checking for accented -ed
Page 3151227 cursèd: WARNING: Saw from-page 'cursed' same as to-page with accents removed and needs manual checking for accented -ed
Page 3151233 banishèd: WARNING: Saw from-page 'banished' same as to-page with accents removed and needs manual checking for accented -ed
Page 3184364 semi-auto: WARNING: Saw from-page 'semi-auto' same as to-page
Page 3184364 semi-auto: WARNING: Saw from-page 'semi-auto' same as to-page
Page 3200801 case shot: WARNING: Saw from-page 'case shot' same as to-page
Page 6354565 struntish: WARNING: Saw from-page 'struntish' same as to-page
Page 8666150 peakèd: WARNING: Saw from-page 'peaked' same as to-page with accents removed and needs manual checking for accented -ed
Page 9046577 day-long: WARNING: Saw from-page 'day-long' same as to-page
Benwing2 (talk) 21:15, 11 February 2024 (UTC)[reply]
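As a rough illustration of the two manual-check flags above, a check of this kind might look like the Python sketch below. The function name, the regular expressions and the reason strings are assumptions for the example and are not taken from the actual script.

```python
import re
import unicodedata

# Word-final -éd or -èd, which may signal a separately pronounced syllable.
ACCENTED_ED = re.compile(r"[\u00e9\u00e8]d\b")

# An apostrophe directly after -s, -se or -ce (horses', horse's, princess's),
# where the possessive may or may not add a syllable. Assumed pattern.
SIBILANT_APOSTROPHE = re.compile(r"(?:s|se|ce)['\u2019]")


def needs_manual_check(title):
    """Return the reasons, if any, why a title should be reviewed by a human."""
    title = unicodedata.normalize("NFC", title)  # compose e + combining accent
    reasons = []
    if ACCENTED_ED.search(title):
        reasons.append("accented -ed")
    if SIBILANT_APOSTROPHE.search(title):
        reasons.append("pronounced schwa")
    return reasons
```

With these patterns, titles such as driver's or Shi'ite are not flagged, since the apostrophe does not follow -s, -se or -ce; extending the second pattern to -x(e), -ch(e) and -sh(e) would be a small change if those should be flagged as well.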

I will say that I have always been 100% uncomfortable and unsatisfied with the dividing line between alternative spellings (most closely allied), alternative forms (basically the same), and synonyms (seriously etymologically divergent). And then there's doublets. The theory of what these categories mean is one thing, but treating them as mutually exclusive and finding that dividing line can become absurd at the edges. I am 100% open to changing "alternative form" entries to something else- either "alternative spellings" or "synonyms" or whatever else.
Are "big brother" and "Big Brother" alternative "spellings" or alternative "forms"? Under the doctrine that "alternative spellings are pronounced the same" one might say they are alternative spellings. But the problem is that these two entries are spelled the same way! It's somewhat counterintuitive to call that pair alternative spellings. Capitalization is not a spelling change. Yes, the Spelling Bee does require you to indicate a capitalization [3], but I feel like the spelling of a word is the letters themselves and not the case of the letters. Hence this is an "alternative" "form" rather than an "alternative" "spelling". A word can be "spelled correctly but capitalized incorrectly" in my understanding. --Geographyinitiative (talk) 00:46, 12 February 2024 (UTC)[reply]

As sche said, cases like big brother and Big Brother should use "Template:alternative case form of" rather than either the alternative spelling or alternative form templates.
In regard to apostrophes, although many apostropheless forms with the same pronunciation are just alternative spellings, in some cases there is a difference in formal interpretation e.g. lady's man and ladies' man, despite being pronounced the same and meaning the same thing, are arguably different in more than just spelling since the first uses the genitive singular form and the second uses the genitive plural form. They (in theory at least) represent different interpretations of how the word is formed. There is also ladies man, which we now label as an 'eggcorn': while I can sort of see where this is coming from, I don't think that's the most useful way to describe why the apostrophe is missing in this case. "eggcorn" suggests to me reinterpretations between etymologically unrelated words, whereas the genitive plural "ladies'" and plain plural "ladies" are obviously etymologically extremely related.--Urszag (talk) 01:36, 12 February 2024 (UTC)[reply]
@Urszag I would say, in theory, yes there is a difference between lady's man and ladies' man but it seems too theoretical to worry about. Cf. traveler's diarrhea vs. travelers' diarrhea. I would say to simplify things, we should just use 'alternative spelling of' whenever the pronunciation is the same and the etymologies are, if not precisely the same, very closely related. I agree with you, however, that ladies man is not an eggcorn (which I would reserve for sparrowgrass vs. asparagus and eggcorn vs. acorn and such), but just a case of lazy spelling (it is very common, for example, for place names to start out with apostrophes in them which get dropped over time). BTW User:-sche I created English-specific labels for ise-form and ize-form, which you can see in action at User:Benwing2/test-standard-spelling. Benwing2 (talk) 02:01, 12 February 2024 (UTC)[reply]
Just FYI I expanded the above analysis to cover many more cases and reran it on cases that used {{alternative spelling of}} or one of its aliases, producing the following:
7149 not same as to-page, can't convert to {{alt spell}}
2950 same as to-page with hyphens removed
2344 same as to-page with spaces removed
1766 same as to-page with hyphens converted to spaces
1267 same as to-page with -ise/-iser/-ises/-ised/-is(e)ing/-is(e)ation(al)/is(e)able/is(e)ability -> same with -iz-
 759 same as to-page with æ/œ/ae/oe -> e
 671 same as to-page with accents removed
 356 same as to-page with full canonicalization applied
 349 same as to-page with capitalization ignored
 247 same as to-page with ph -> f
 185 same as to-page with -ie -> -y
 160 same as to-page with -our -> -or
 141 same as to-page with -ey -> -y
 137 same as to-page with -ible/-eable/-ibility/-eability -> -able/-ability
 128 same as to-page with apostrophes removed
 119 same as to-page with -re -> -er
 103 same as to-page with -ll/-lled/-ller/-lling -> same with -l-
  74 same as to-page with periods removed
  60 same as to-page with -or -> -er
  39 same as to-page with grey -> gray
  27 same as to-page with -yse/-yser/-yses/-ysed/-ys(e)ing/-ys(e)ation(al)/ys(e)able/ys(e)ability -> same with -yz-
  25 same as to-page with plough -> plow
  23 same as to-page with -gue -> -g
   8 same as to-page with accents removed and needs manual checking for accented -ed
   4 same as to-page with -iseing/-iseation(al)/iseable/iseability -> same with -iz- and omit extra -e-
   1 same as to-page with -yseing/-yseation(al)/yseable/yseability -> same with -yz- and omit extra -e-
   1 same as to-page

Many of those that are in the "not same as" category do not have the same pronunciation and should be moved to {{alt form}}, e.g. half-arse vs. half-ass, stanch vs. staunch, alternate energy vs. alternative energy, hylogeny vs. hylogenesis, palmate compound vs. palmately compound, aluminum hydride vs. aluminium hydride, raddleman vs. ruddleman, superturbocharger vs. turbosupercharger, wazzup vs. wassup, shunyata vs. sunyata, faclempt vs. verklempt, neurone vs. neuron, Moslemize vs. Muslimize etc. Benwing2 (talk) 04:01, 12 February 2024 (UTC)[reply]

Also there are quite a lot of misspellings tagged as "alternative spellings" such as kotower vs. kowtower, idiosyncracy vs. idiosyncrasy, guerilla vs. guerrilla, propadeutic vs. propaedeutic, incomunicado vs. incommunicado, dicotyledenous vs. dicotyledonous, etc. etc. Benwing2 (talk) 04:39, 12 February 2024 (UTC)[reply]
Rerunning the above expanded analysis on the {{alt form}} uses produces the following:
33076 not same as to-page, can't convert to {{alt spell}}
6243 same as to-page with hyphens removed
4870 same as to-page with spaces removed
3070 same as to-page with hyphens converted to spaces
2028 same as to-page with -ise/-iser/-ises/-ised/-is(e)ing/-is(e)ation(al)/is(e)able/is(e)ability -> same with -iz-
1707 same as to-page with æ/œ/ae/oe -> e
1573 same as to-page with capitalization ignored
1066 same as to-page with full canonicalization applied
 577 same as to-page with accents removed
 363 same as to-page with -ible/-eable/-ibility/-eability -> -able/-ability
 355 same as to-page with periods removed
 345 same as to-page with apostrophes removed
 325 same as to-page with -ie -> -y
 251 same as to-page with -ll/-lled/-ller/-lling -> same with -l-
 233 same as to-page with ph -> f
 215 same as to-page with -ey -> -y
 184 same as to-page with -our -> -or
 169 same as to-page with -or -> -er
 163 same as to-page with -re -> -er
  54 same as to-page with -yse/-yser/-yses/-ysed/-ys(e)ing/-ys(e)ation(al)/ys(e)able/ys(e)ability -> same with -yz-
  50 same as to-page with grey -> gray
  43 same as to-page with -gue -> -g
  21 same as to-page with plough -> plow
   8 same as to-page with accents removed and needs manual checking for accented -ed
   7 same as to-page with -iseing/-iseation(al)/iseable/iseability -> same with -iz- and omit extra -e-
   5 same as to-page
   3 same as to-page with ł -> l
   1 same as to-page with apostrophes removed and needs manual checking for pronounced schwa
Benwing2 (talk) 04:53, 12 February 2024 (UTC)[reply]
Re -isable vs. -izeable, I think forms without e are more standard/common(?), at least for the examples I could think of to check; iff this is right, we could bot-normalize accordingly. I.e., if foobariseable, foobarisable and foobarizeable are all currently listed as {{alternative form of}}s of foobarizable, we could recast foobariseable as an {{alternative spelling of}} (strictly speaking of foobarisable, but as this requires a user to click through two entries to reach the definition, maybe we just say of foobarizable—or double the templates, as I have sometimes seen people do: {{alt spelling of|en|foobarisable}}: {{alt spelling of|en|foobarizable|nocap=1}}), and recast foobarisable as a {{standard spelling of}} foobarizable, and foobarizeable as an {{alternative spelling of}} foobarizable.
Re Middle-earth vs. Middle Earth, there's not really a great way to handle those AFAIK; we might just say they're alternative spellings and reserve "alternative case form of" for cases where only capitalization differs, but I'd like to know if User:Urszag or other users of "alternative case form of" (besides myself) have any better ideas.
Re stanch vs. staunch: excellent, it would be great if we could also clean up entries in that direction (things given as "alt spellings" that are actually alt forms); and I think that if we modify the "alt spelling" template to make our intended distinction more explicit by adding some text about how it's an alternative spelling "with the same pronunciation" or "(pronounced the same)" or whatever wording we decide is clearest — and who knows, maybe modify the "alt form" template to also make its own intended distinction explicit by saying "alternative form of ... (pronounced differently)" or whatever we think is clearest — that will help users maintain the distinction in the future. Re kotower, ugh; if we can fix such things, great. Re which accents distinguish princesses' vs princess's, AFAIK it's any accent that doesn't have the weak vowel merger yet, including older RP / Standard Southern English etc; in an ideal world I'd love to use {{alt form of}} on those and give them all pronunciation sections showing which accents distinguish vs merge them, but in the interim, if we're changing {{alt spelling of}} and {{alt form of}} to indicate "pronounced the same" vs "pronounced differently", maybe we should allow that text to be suppressed by parameter, and then suppress it on these entries (and eventually give them all pronunciation sections). - -sche (discuss) 16:04, 12 February 2024 (UTC)[reply]
I think the plural suffix and possessive clitic in princesses' and princess's are homophones in old RP, both with /ɪ/. What is distinct from both of those is words in -a with a plural suffix or possessive clitic (-as and -a's), with /ə/. — Eru·tuon 17:57, 12 February 2024 (UTC)[reply]
Following up: where are things at, as far as: do we feel like we're ready to add "(pronounced the same)" or some equivalent wording to T:alternative spelling of? - -sche (discuss) 15:38, 2 April 2024 (UTC)[reply]
@-sche I didn't ever quite finish coding the script so it can be run and change {{alt form of}} and {{alt spelling of}} appropriately. I have no issue with adding something like "(pronounced the same)" to {{alternative spelling of}}; it will help people fix the issues where the pronunciation isn't the same. Benwing2 (talk) 23:08, 3 April 2024 (UTC)[reply]
OK; thanks. :) And can we update the -is- vs -iz- differences to use "standard spelling of" + the relevant "from=" (-ise spelling vs -ize spelling), rather than "alternative spelling of"? (AFAIK there will only be a handful of cases where this is wrong, like advertize is not standard, but it should be a net improvement. But anyone who foresees issues, please comment!) And also, re e.g. -or vs -our, either change (to use "standard" rather than "alt") any other categories of spellings which can be identified as reliably a UK/US standard spelling difference, or list them somewhere to be looked over and dealt with later.
With regard to the cases that differ by -is-~-iz- + having an e or not, like foobariseable, AFAICT forms without e are more common(?), so... whatever you think is best, unless anyone else wants to weigh in, as far as whether to recast forms with e as just "alt spellings" of the lemma spelling, or double the templates, as I have sometimes seen people do, so foobariseable is an {{alt spelling of|en|foobarisable}}: {{standard spelling of|en|foobarizable|from=-is- spelling|nocap=1}}.
Anyone else want to weigh in on the exact wording re pronunciation, like "Alternative spelling of foo (pronounced the same)" vs "Alternative spelling of foo, with the same pronunciation"? (I suppose it's easy to change the wording the template displays later if people have better ideas; that doesn't require editing entries.)
BTW, would be interested to know which the "5 same as to-page" mentioned above are (five pages that say they're alternative spellings/forms of themselves?). - -sche (discuss) 17:14, 4 April 2024 (UTC)[reply]
@-sche My plan was/is to add English-specific labels corresponding to the major classes of spelling, so that e.g. we have labels ise-form vs. ize-form, or-form vs. our-form, etc., which allows us to change exactly what they say later on if needed, since it seems there are more than just two standard varieties of English and not everything breaks as US/Canada vs. UK/Commonwealth. In some cases the names I chose for labels are a bit more verbose to indicate clearly what's going on, e.g. for analog/analogue I chose gue-form vs. g-not-gue-form, since just g-form might be ambiguous as to what's being referred to. Benwing2 (talk) 23:58, 4 April 2024 (UTC)[reply]
PS: I work a lot on alternative forms/alternative spellings/synonyms. I try to fit in to Wiktionary policies as best I can, but I haven't used "alternative spellings" a lot yet. I am not wedded to any particular wording or theoretical framework and can adapt to any change you all see fit. --Geographyinitiative (talk) 22:36, 4 April 2024 (UTC)[reply]
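The spelling-class labels described above can be illustrated with a rough Python sketch. It is only an illustration under stated assumptions, not the script mentioned in the thread: the label names are the ones quoted in the discussion, while the `RULES` table and the `classify_pair` function are hypothetical. The matching is deliberately naive (plain substring substitution) and says nothing about whether a spelling is standard, so a pair like advertise/advertize would still be labelled even though advertize is not a standard spelling.

```python
from typing import Optional, Tuple

# Hypothetical rule table: (substring in one spelling, substring in the other,
# label for the first, label for the second). Label names come from the thread.
RULES = [
    ("is", "iz", "ise-form", "ize-form"),
    ("our", "or", "our-form", "or-form"),
    ("gue", "g", "gue-form", "g-not-gue-form"),
]


def classify_pair(first: str, second: str) -> Optional[Tuple[str, str]]:
    """Return (label for first, label for second) if the two spellings differ
    only by one substitution listed in RULES, otherwise None."""
    for a, b, label_a, label_b in RULES:
        # Try the rule in both directions (a -> b and b -> a).
        for x, y, label_x, label_y in ((a, b, label_a, label_b),
                                       (b, a, label_b, label_a)):
            start = first.find(x)
            while start != -1:
                candidate = first[:start] + y + first[start + len(x):]
                if candidate == second:
                    return label_x, label_y
                start = first.find(x, start + 1)
    return None


if __name__ == "__main__":
    for pair in [("organise", "organize"), ("color", "colour"),
                 ("analog", "analogue"), ("stanch", "staunch")]:
        print(pair, classify_pair(*pair))
```

Pairs that no rule covers, such as stanch/staunch, come back as `None`, which matches the idea that those are alternative forms to be reviewed by hand rather than members of a regular spelling class.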

I ran into this while doing cleanup in Category:Entries with incorrect language header by language. It raises some interesting technical and theoretical issues, so I'm bringing it here. There's some discussion on the talk page, but I'm not sure I agree with the outcome as seen in the entry itself.

The word in question is mentioned as the Dacian word for bryony in De Materia medica by Pedanius Dioscorides, a Greek physician in the Roman era. Not only is this work a priceless treasure for those studying ancient knowledge and usage of medicinal plants, etc., but he also gives names in other languages of the day. In this case, he writes:

βρυωνία λευκή· οἱ δὲ μάδον, οἱ δὲ ἄμπελος λευκή, οἱ δὲ ψίλωθρον, οἱ δὲ μήλωθρον, οἱ δὲ ὄφιος σταφυλή, οἱ δὲ ἀρχέζωστιν, οἱ δὲ κέδρωστιν, Αἰγύπτιοι χαλαλαμόν, Ῥωμαῖοι νότιαμ, οἱ δὲ ἕρβα κοριάρια, οἱ δὲ κουκούρβιτα ἠρράτικα, Δάκοι κινούβοιλα, Σύροι αλλαβιάρια.

My Ancient Greek is mediocre, at best, but I believe βρυωνία λευκή (bruōnía leukḗ, white bryony) is the main name (the lemma, we would call it) and all of the parts starting with οἱ δὲ are other Ancient Greek names for the same plant. Then there are the plural forms of nationalities such as Αἰγύπτιοι (Aigúptioi, Egyptians), Ῥωμαῖοι (Rhōmaîoi, Romans), Σύροι (Súroi, Syrians) and Δάκοι (Dákoi, Dacians), which are obviously a brief way of saying "the Egyptians call it...", etc.

So, anyway, Δάκοι κινούβοιλα refers to κινούβοιλα as the Dacian name for the plant. This seems to be the only attestation of the word. Note that it's an Ancient Greek spelling occurring in an Ancient Greek text, but it's obviously intended to be a representation of the Dacian word as Dacian. No one will mistake this for IPA, but it's all we've got.

The reason I'm bringing this here is the way our entry handles the bilingual aspects of this: it's under a Dacian language header, but the headword line consists of:

{{cln|xdc|lemmas|nouns|feminine nouns}}{{head|grc|noun|g=f}}

So, basically, the headword template is Ancient Greek, but the {{cln|xdc|lemmas|nouns|feminine nouns}} adds all the categories that a Dacian headword template would. As an aside: κινούβοιλα looks morphologically like an Ancient Greek first declension noun, and therefore probably feminine- but how do we know that it would be feminine as a Dacian word?

Thus we have this categorized as both Ancient Greek and Dacian (hence its presence in Category:Ancient Greek entries with incorrect language header). In spite of it occurring in running Ancient Greek text in Ancient Greek script, I don't see why we should treat it as Ancient Greek. Here in the United States, there are a number of extinct languages that are only attested in English, French and Spanish texts where someone lists some words in the local indigenous language. There are similar issues with languages like Old Prussian in Europe, and with many other languages in Africa, Asia, and Australia, not to mention throughout the rest of the Americas. In addition, there are all the LDLs where we're getting all of our data from reference works that aren't in the languages in question.

Another issue I was thinking about: we generally don't do Reconstruction entries for attested terms, but this is a gray area. I could see having a reconstruction of the original term that's imperfectly represented by the form that's actually attested (just a thought).

Also, it might be interesting to look at the influence of the "host" language for this kind of secondhand attestation. Every language has its blind spots: English speakers have trouble with true voicing distinctions (what we tend to call voicing distinctions are a combination of aspiration and length of preceding vowels), and speakers of most Romance languages have trouble hearing aspiration. Pretty much all speakers of European languages have trouble with tones. And so on, and so on. I wonder if it would be worth the trouble of developing categories for types of attestation beyond the LDL/WDL divide.

Thank you for reading through all of this. Partly I'm just thinking out loud, so I hope this will inspire more discussion to make something actionable out of it, and that it will be worth the time you spent on it. Chuck Entz (talk) 02:00, 12 February 2024 (UTC)[reply]

@Chuck Entz Not quite sure what to make of this specific case but what you might call phonological reconstruction of an ancient language attested in an imperfect spelling system is quite common, cf. Old Chinese, Mycenaean Greek, Old and Middle Persian, etc. In those cases we tend to enter the term in its original spelling and try to reconstruct the pronunciation as best we can in the Pronunciation section. So maybe you are right that these should be considered Dacian terms written imperfectly in Greek letters. I think this comes up also in Xiongnu terms written in Old Chinese texts, where there was a debate I think in WT:RFM whether to consider these as Xiongnu or Old Chinese. Cf. User:Theknightwho, User:Thadh, User:-sche who I think participated in that debate. Benwing2 (talk) 02:12, 12 February 2024 (UTC)[reply]
@Benwing2 @Chuck Entz So the issue with Xiongnu is slightly different. We have a handful of Old Chinese transcriptions that we're pretty sure are names (or maybe titles), which could go either way. The thing is, the Xiongnu weren't illiterate - we know they had written records, but the conventional wisdom is that they're in Old Chinese. However, after some recent discoveries, Vovin argued (quite convincingly) that it's likely an analogous situation to Japan and Korea, where Chinese characters were used for their semantic values, but the texts themselves are in Xiongnu. Or at least, some of them probably are.
I think a better comparison is maybe Jie, where we have a single sentence transcribed in an Early Middle Chinese source which explicitly describes it as the language of the Jie tribe (秀支替戾岡,僕穀劬禿當). Or possibly even Bala, which is probably extinct, but is attested only in a Chinese transcription from the 1980s of a song. Admittedly, it was transcribed by a linguist, but all the same: it's an imperfect rendering, and any textual analysis has to be done by reference to the likely Manchu cognates (and sometimes Jurchen, since it retained some archaic features lost in Manchu). Theknightwho (talk) 06:00, 12 February 2024 (UTC)[reply]
We know too little about Dacian to so much as dream of a reconstructed equivalent of κινούβοιλα.
Otherwise, I agree. We can only put {{head|xdc|noun}} because it is a Dacian word and we don't know that the language had grammatical gender (though it's not unlikely). The mere fact that the word is mentioned in a passage written in Greek does not make it Greek. Dioscurides indeed cites it as the word that the Dacians use. Only if he were using the word in an ordinary way to refer to the plant could a case be made for {{head|grc|noun|g=f}}. Nicodene (talk) 06:39, 12 February 2024 (UTC)[reply]
@Nicodene @Chuck Entz There was a discussion on the Hunnic language in which this issue came up, so we've now got the (frankly ridiculous) Ancient Greek entry κάμος (kámos) (noted as a hapax!), which is very clearly described as a "native [Hunnic]" word in the original source. It's nonsense. These arguments make sense when it comes to names, but in situations like this it feels like a bizarre form of mental gymnastics to sweep the awkwardness of an imperfect non-native transcription under the rug, by simply pretending it's not an attestation at all. In that case, there's a separate question of whether they actually are Hunnic, but κάμος (kámos) is most definitely not Ancient Greek, going by the source. Theknightwho (talk) 06:54, 12 February 2024 (UTC)[reply]
That is baffling. It is surely better to put κάμος as Hunnic with a question mark, if that is the best available guess, than to put it as Greek, which we know for a fact is wrong. Safest of all, put the various words together in an appendix, as some were suggesting. Nicodene (talk) 07:25, 12 February 2024 (UTC)[reply]
The original text apparently doesn't say *κάμος but rather καμον. I would like to ask on what basis the masculine gender has been assumed, to deduce a nominative singular with -ος? Or on what basis the stress position has been assumed? Nicodene (talk) 07:43, 12 February 2024 (UTC)[reply]
Right. I have known this subject for long, but not hyperfocussed on it. Which does not absolve of having entry criteria. The leading word in Dacian of this thread is at the correct place, only the header is wrong (I’d do {{head|xdc|noun|sc=Polyt}}), and so would be the Hunnic word; for Trümmersprachen we can definitely lemmatize under attested non-lemma forms, if that makes sense: “citation forms” are not really a concept for languages not systematically known.
If the languages are known enough for citation forms to make sense then slight adjustments can be made, hence for the Systemzwang I entered briginos instead of mentioned briginos (the reasoning of which someone was too simple to transcend, so he RFVed it), and Punic terms supported by comparanda were entered in Punic script: about a dozen Punic plant names mentioned in the same Dioskurides interpolations (we know they were not present in the original work but later added, just for simple language one omits this factlet anyway) or only supported by Berber borrowings, only in the latter case being situated in the reconstruction namespace. In the Ugaritic entry 𐎃𐎚𐎐 (ḫtn) I created a day ago you also see that I prefer having Ugaritic items attested but in syllabic cuneiform entered in alphabetic cuneiform if possible, which in this case keeps terms of the same root at one entry, as is always a goal in Semitic dictionaries. The same holds water if a multiword term is attested, 𐤔𐤕𐤋 𐤔𐤃 (štl šd /⁠šəṯīl sad⁠/) but only one or none of the constitutive terms: these get entries in the mainspace anyway, therefore unstarred 𐤔𐤕𐤋 (šəṯīl), at least being explicit that the derived Fügung is the only attestation of this term. For Germanic languages we hold that compounds do not generally attest simplexes but this does not apply analogically to terms in the Semitic construct state.
Following this discussion I am also more inclined to think that Old Persian or whatever gangaba can only be entered as such, not presented as Latin or a reconstruction, which we know is wrong, irrespective of the inflectional ending. In the old RFD Mnemosientje ponderably intervened that such words need a place to stay and the appendix is where words go die; we have not decided yet where we enter such a word “as such”. There is much standing to avoid having terms in the mainspace whose structure, in spite of systematic knowledge of the target language as in the case of Old Persian, is intransparent; in addition to transcription being imperfect the textual situation is supported by nothing. It is well known that the Thracian and Dacian and Phrygian or whatever words in Greek writing have variants, though I now don’t find a list of the terms where I got this impression from, and of the Punic plant names mentioned in ancient Greek or Roman writings I could only enter about a tenth, due to their being otherwise supported or us being able to make sense out of them. Löw, Immanuel (1881) Aramæische Pflanzennamen[4] (in German), Leipzig: Wilhelm Engelmann treats all the eighty-or-so alleged Punic names in an appendix and the rest I avoided, having all created terms deliberately hiding behind advanced linguistic arguments, in fear of creating a role model (Vorbildwirkung, as we say in construction law) of wanton littering of the mainspace in ancient lay transcription. You know, those Albanians, they will be overconfident about the attestation of ancient languages of the Balkans. Fay Freak (talk) 09:15, 12 February 2024 (UTC)[reply]
For convenience, the quote from Curtius is gangabas persae vocant umeris onera portantes, loosely translatable as "the Persians refer to those charged with carrying baggage on their shoulders as gangabas". Nicodene (talk) 09:49, 12 February 2024 (UTC)[reply]
I'm torn on this issue. On one hand, there is no real difference between this attestation and any other pre-IPA description of an unwritten language.
On the other hand, there doesn't seem to be any intention here to describe the language, but rather only to give this one native name. It would be the same as an English text saying "The Russian doll, called matryoshka in Russia...", and I think we can agree that matryoshka is an English term, not a Russian one.
This isn't a matter of "imperfect spelling" or anything of the sort; this is a question of whether the author did try to follow the original word phonologically, or whether it was a borrowing (including phonological/phonotactical substitutions) that he noted to be a borrowing. I would assume the latter. Thadh (talk) 10:20, 12 February 2024 (UTC)[reply]
Well, I would agree that "matryoshka" is an English word, but I don't think this is established by sentences of that type. It's established by its use in English sentences that don't explain what the word means as a Russian word.--Urszag (talk) 10:35, 12 February 2024 (UTC)[reply]
What Urszag says. The Englishness is in other instances. Fay Freak (talk) 10:40, 12 February 2024 (UTC)[reply]
We can agree that matryoshka is an English term only because it actually happens to be one, not because the single quote you have written would have ever proved this.
There is a species of flower that Georgians call yayachura. My writing this in English, using Latin characters for the word in question, in no way makes it an English word. Nicodene (talk) 10:40, 12 February 2024 (UTC)[reply]
It doesn't make it a Georgian word, though. It is not an English word only because it doesn't have enough traction. I don't think we can use yayachura used in an otherwise English setting as an attestation of the Georgian term. Thadh (talk) 11:17, 12 February 2024 (UTC)[reply]
Of course it's a Georgian word - I've just told you it is. Writing ყაყაჩურა in another alphabet doesn't make it magically not what it is. Nor does talking about it in another language.
And yes that is how English words work. 'Globulicious' needs only to gain a little traction and it too can be one. Given that I've invented this about four seconds ago it may be a bit premature though. Nicodene (talk) 12:04, 12 February 2024 (UTC)[reply]
In "The Russian doll, called matryoshka in Russia…", matryoshka is neither a Russian nor an English word, it is an interlanguage of sorts. Such quotes are useless for the purposes of attestation. All they can do is point towards the existence of such a word, and set us on the path of looking for real quotes. PUC 12:14, 12 February 2024 (UTC)[reply]
It is an attestation, by definition, of the existence in Russian of the word matryoshka. It may not be an optimal attestation, but it is one nevertheless. And if this were the only time the word occurred in an English sentence, lemmatising it as English would be mad. Nicodene (talk) 12:22, 12 February 2024 (UTC)[reply]
No, it is an attestation of the existence of something that the English speaker identified as matryoshka. This is not the same thing. If I hear some Arabic person saying 'alhairwan, not knowing a single thing about Arabic and having no idea what it means or whether it was a word, it doesn't mean there is anything of the sort in Arabic. Thadh (talk) 12:37, 12 February 2024 (UTC)[reply]
To continue this thought, having worked with a lot of smaller languages, I have found myself seeing bullshit ad-hoc transliterations of non-existent words into Russian (or some other language) that were based on a misunderstanding of a misheard word in the target language.
An example is лайба (laiba), which allegedly means "Izhorian sailboat", with the only similar word in Ingrian being laiva (ship); there is no cultural aspect to its meaning, nor is there a form with -b- (or -p-) to be found in any resource I could find or any language in the region, but Russian ethnologists seemingly misheard and misinterpreted this word. In the end this word did make it into the language as a borrowing, but it might well not have. Thadh (talk) 12:49, 12 February 2024 (UTC)[reply]
Yes I'm sure that an author who gave the definition of a word in Dacian, in a form that just so happens to resemble other Paleo-Balkan forms, attested in exactly the same sense, had no idea it was a word, nor knew what it meant. He was just randomly scribbling letters and got really lucky.
We've moved beyond the point, which was whether it should be lemmatised as 'Ancient Greek'. Has your point now shifted to 'Dioscorides' testimony is illegitimate to begin with' and therefore the entry shouldn't exist? Nicodene (talk) 12:54, 12 February 2024 (UTC)[reply]
I'm not sure if we should lemmatise it as Ancient Greek or not record at all, but I don't see evidence of it really being Dacian. It may be a borrowing, it may be a transcription, but as long as we have no proof of the accuracy of this transcription I don't think it's fair to call this a part of the original language.
If I were to take your example, ყაყაჩურა, and say "Well, I've heard from a friend that there's this Georgian plant called yaychoora (/jeɪˈtʃʊɹə/)", I think you'll agree this isn't Georgian at this point, even though it "resemble[s]" other Kartvelian terms. Thadh (talk) 14:21, 12 February 2024 (UTC)[reply]
What there is actually zero evidence of is this somehow being a 'borrowing into Greek' when the one mention by a Greek man explicitly calls it a foreign word. A man who, let's not forget, had just finished listing all the ways that Greeks do call the plant, then went on to contrast it with how Dacians call it! I can't believe we're even having this conversation.
That, and the parallel with the Thracian makes it clear beyond any reasonable doubt that Dioscorides' testimony is not some gibberish like the example you have attempted. I'm not sure why you've decided to run it through some odd spelling-pronunciation when the route of transmission under discussion is aural - thus a starting-point /q'aq'atʃura/. Nicodene (talk) 14:57, 12 February 2024 (UTC)[reply]
You're just trying to dismiss my argument on the basis of small inaccuracies, ignoring the bigger picture. Even a transcription like "cuckchura" (/kʌkˈtʃʊɹə/) would not be acceptable as a Georgian word, or an attestation of it, only as an English attempt at recreating one. Same here: The term itself is only similar to the Thracian one in so far as it seems to have a similar stem and maybe a similar suffix, but it's not similar enough to be able to apply sound laws to it. I would not expect a Greek scholar to be able to accurately transcribe a Dacian term. Thadh (talk) 17:28, 12 February 2024 (UTC)[reply]
The bigger picture is this: once you actually produce any evidence that the word was massively distorted, besides 'but what if??', then and only then will any of this amount to an actual argument. Nicodene (talk) 18:41, 12 February 2024 (UTC)[reply]
But otherwise we get this weird stuff like when koekchuch was listed as an Itelmen word and not English. Tollef Salemann (talk) 20:47, 21 February 2024 (UTC)[reply]
I also concur. --RichardW57m (talk) 10:58, 12 February 2024 (UTC)[reply]
There's an awkward aspect to this. What do we do when foreign words have the inflections of the matrix language? Perhaps we can treat each inflected form separately, and explain the inflection in the etymology. Perhaps for pages with many languages, we should have a soft direct. I'm not sure whether we even need a part of speech - and in many cases it might be discordant between the two languages. --RichardW57m (talk) 11:07, 12 February 2024 (UTC)[reply]
Then we drop or adapt the inflection in so far as we know the strange language, and don’t if we don’t know anything to that direction. Because otherwise we can’t well list the term as a term in the strange language, while we can’t list it as a term in the matrix language either since it is a false claim. The same problem actually occurs with uses in one known language: Latin farfarum being of unknown citation form but not reconstructed either; if a citation form is not attested one puts the star after the assumed citation form, in the academic writing. Fay Freak (talk) 12:10, 12 February 2024 (UTC)[reply]
There's a key difference between the Dacian example and "The Russian doll, called matryoshka in Russia..." : the reason the latter is not an attestation of matryoshka as either an English or a Russian word, the reason it's an "interlanguage" as PUC put it, is that for English and Russian our attestation requirement is use, and that's not a use — even "The London palace, which Londoners call Ally Pally" would not be an attestation of Ally Pally as an English word, in the sense we mean when considering whether to have an ==English== entry for it, because it's not a use. But for extinct languages where we don't require use and mention is sufficient, we do accept a 'reliable' Roman author saying (in and about Latin) "The foobar plant, called thymum foobaricundum in Rome..." as an attestation of thymum foobaricundum sufficient to have an entry for it! We accept mentions of various Native American words in Spanish texts (including, clearly, in cases where the Native language is not known to have used the Latin alphabet). For this attestation of a Dacian word in an Ancient Greek text, I think it comes down to whether scholars regard the attestation as reliable vs unreliable (I think there are cases of ancient authors confusing which people/language used a particular word, but in this case, if the evidence is that this is correct, then my inclination, at least based on the discussion so far, is to agree with those saying we should have a Dacian and not an Ancient Greek entry). - -sche (discuss) 17:19, 12 February 2024 (UTC)[reply]
This leads us to an additional argument Thadh apparently needs: even from analogy there can only be a Dacian and not Ancient Greek entry, because for those American Indian cases we can’t have English or Spanish entries due to the requirement of use in English or Spanish. It is conceptually incorrect though to claim that there is not an attestation of the Russian word: there is, in the usual academic sense of “attestation”, but none according to our quality standards for Russian, none we make use of because we committed to make use of better occurrences. In either case the enterprise of the author of the sentence of mentioning a foreign word and not employing or mentioning a word in his language is clear. Fay Freak (talk) 17:31, 12 February 2024 (UTC)[reply]
So basically what you are saying is that we should be fine with using low-quality resources for smaller/more vulnerable/extinct languages? I take extreme issue with this. It's fine saying a language needs fewer attestations, but these attestations should still be up to the standard of our dictionary, and that standard should be the same across all languages. Thadh (talk) 17:39, 12 February 2024 (UTC)[reply]
A hungry man will be content with fast food. Fay Freak (talk) 17:42, 12 February 2024 (UTC)[reply]
Not every hungry man, and we shouldn't be such a hungry man. Thadh (talk) 18:00, 12 February 2024 (UTC)[reply]
I could turn your question around: so basically what you are saying is that we should suppress records of small, vulnerable communities' languages precisely in those cases where previous colonizers tried to suppress those languages before whoever meets your definition of a proper scientist could record them? I would take extreme issue with that. I don't think it's helpful for you or me to accuse each other of such motives. As I said, we ought to evaluate case by case whether a source is usable; in some cases, a record is confused (e.g. records of Loup, especially what is now separated out as Loup B), and we don't or shouldn't use it; in other cases, notwithstanding that the original record was by no proper scientist, there's modern scholarship devoted to even single attested words and evaluating what sound laws etc they fit and what implications that has for what family the language would've belonged to. (And many cases are in the middle somewhere.) In this case, it seems like there is modern scholarship accepting this as an attestation of a Dacian word and even speculating on what sound laws etc it indicates, e.g. centum vs satem or simply depalatization. If there is other modern scholarship arguing that this author was confused and mistook some other language's word for Dacian, please bring it to bear. - -sche (discuss) 18:34, 12 February 2024 (UTC)[reply]
I strongly agree with @-sche. Theknightwho (talk) 18:55, 12 February 2024 (UTC)[reply]
I honestly don't think we should create entries for the Native American terms attested only as mentions in non-scientific literature either. I also don't see how those would not be the same "interlanguage" as matryoshka is. Seems to me like we have one rule for well-attested languages and a completely different rule for those that aren't, and while sometimes that is justified (namely, number of quotes and/or cites in scientific literature), I don't think it is in this case. Thadh (talk) 17:32, 12 February 2024 (UTC)[reply]
You won’t really convince people with this stance. Americanists take all they get, and editors will put those mentions somewhere, interlanguage or not, the motivation is not shattered by this Cartesian abstraction; Native American communities have been inviting or seclusive to varying degrees and you can’t even strictly define what is science and what not: it also depends on the current technological, and intellectual, capabilities at any given time. Fay Freak (talk) 17:42, 12 February 2024 (UTC)[reply]
If we get a word like koekchuch, which was once registered here as Itelmen, now it is clear that the real spelling is unknown, but we have the original spelling from a Russian source. Does it hurt Itelmen? No, because if you scroll down, you get a category reference to all the Itelmen words in English. So, even if the original Itelmen word is lost, we have it still in English listed here in the right category. The other way of doing it is stupid. It is like the guys who read the Hebrew letter from Khazars and find a tribe called "wnntit" and are sure that it is the Vyatichi tribe. Tollef Salemann (talk) 21:00, 21 February 2024 (UTC)[reply]
If you need to scrape a lost language from every corner of the world just to get a few words written by a foreigner, you still can set these words together to find patterns you can use to compare it to other languages. But as I said, you have already the categories here for these kinds of words. The one smart idea which can be done is to create categories for every language, which are gonna contain all the words borrowed into other languages, so you can easily find an obscure term like barrabora listed both in Itelmen and English main topic. Now you find it only in English categories, because it is an English term. Tollef Salemann (talk) 21:13, 21 February 2024 (UTC)[reply]
Are you even sure that the spellings are ok? He gives translations into Syriac, but they are weird. Like, for lily he gives "sasa", which is very distant from all the Semitic names of this plant, not only because Greek has no "sh"-letter. His Latin seems ok tho. Tollef Salemann (talk) 21:35, 21 February 2024 (UTC)[reply]

Announcing the results of the UCoC Coordinating Committee Charter ratification vote

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Dear all,

Thank you everyone for following the progress of the Universal Code of Conduct. I am writing to you today to announce the outcome of the ratification vote on the Universal Code of Conduct Coordinating Committee Charter. 1746 contributors voted in this ratification vote with 1249 voters supporting the Charter and 420 voters not. The ratification vote process allowed for voters to provide comments about the Charter.

A report of voting statistics and a summary of voter comments will be published on Meta-wiki in the coming weeks.

Please look forward to hearing about the next steps soon.

On behalf of the UCoC Project team,

RamzyM (WMF) 18:24, 12 February 2024 (UTC)[reply]

Nice. Icantthinkofaname12 (talk) 20:39, 16 February 2024 (UTC)[reply]

Glossing entries based on where the referent is found

I'm a bit concerned by this user's large number of edits of this kind: [5]. See the user's talk page for my reasoning. Equinox 21:10, 17 February 2024 (UTC)[reply]

Yeah, sounds like a shit idea. PUC 21:11, 17 February 2024 (UTC)[reply]
@Equinox Yeah this is garbage. IMO if this user won't stop on their own they need to be blocked. Benwing2 (talk) 21:36, 17 February 2024 (UTC)[reply]
It doesn't fit well either in our context labels or our topical categories. The notion that the referent is to be commonly found principally in a particular region would be fine in a definition. DCDuring (talk) 23:44, 17 February 2024 (UTC)[reply]
@DCDuring, @Benwing2, @Equinox: do we want to revert this and/or other edits like this? It seems the edit still stands at the moment. Kiril kovachev (talkcontribs) 03:27, 18 February 2024 (UTC)[reply]
@Kiril kovachev Yes. Do you know how many there are? Benwing2 (talk) 03:34, 18 February 2024 (UTC)[reply]
@Benwing2 I'm afraid not, I only saw that we hadn't changed this particular one. Looking through the contributions is possible though: diff on brook seems a bit odd - is that a word only used in Northeastern US...? skillet? There are probably quite a few judging by this... Kiril kovachev (talkcontribs) 03:43, 18 February 2024 (UTC)[reply]
The label for brook approximated what I found in DARE. He might be using DARE, but DARE may often be too detailed for us. DCDuring (talk) 04:35, 18 February 2024 (UTC)[reply]
@DCDuring I am going through the changes; most appear garbage. You are welcome to review them more specifically but there are over 500. I doubt brook for example is only Northeastern US; it is poetic everywhere. Benwing2 (talk) 04:49, 18 February 2024 (UTC)[reply]
We are more a people's dictionary than a literary one. I reworded brook to more closely follow DARE. It may be more detail than we want and may need a reference. DCDuring (talk) 04:55, 18 February 2024 (UTC)[reply]
Saying brook is Northeast-only is just wrong if it's used everywhere in poetry (and I doubt it's only poetry). Benwing2 (talk) 04:58, 18 February 2024 (UTC)[reply]
@User:Benwing2 So, add some qualification. Lots of contributors make contributions that need correction. Reversion is abusive of a good-faith user. DCDuring (talk) 16:26, 18 February 2024 (UTC)[reply]
@DCDuring Well in the case of brook, you changed the qualifier from misleading (implying it was only used in the Northeastern US without saying as much), to outright wrong (saying it originated in New England), when our first quote is from Shakespeare in the 1590s, followed by one from the KJV. Theknightwho (talk) 18:55, 18 February 2024 (UTC)[reply]
You are right. I took the DARE perspective, which ignores usage outside the US completely AFAICT. DCDuring (talk) 19:58, 18 February 2024 (UTC)[reply]
I wonder whether brook is used in India. It does seem to be used in the UK, eastern Canada, eastern Australia, and eastern US.
@User:Benwing2 He also seemed to be working from DARE for skillet. Did you check DARE or have a better source? I think it wrong to revert first and ask questions later. DCDuring (talk) 04:51, 18 February 2024 (UTC)[reply]
@DCDuring I don't have access to DARE but on first glance many of the labels are patently wrong. I am reviewing them one-by-one but making educated guesses as to whether a change is plausible. Benwing2 (talk) 04:56, 18 February 2024 (UTC)[reply]
I'm sorry for troubling you, I explained my reasoning for many of my edits in my talk page and I can provide my reasoning for the more dubious choices if you like. I thought if I were doing anything wrong that it would've drawn more attention sooner. SwordofStorms (talk) 05:06, 18 February 2024 (UTC)[reply]
@SwordofStorms You still don't understand the difference between usage and reference. Terms like beignet, dirty rice, jambalaya etc. are used everywhere to refer to things that may originate in specific parts of the US; but that doesn't mean they should be labeled with those parts. Terms like bitching and grody may have originated in California but have spread everywhere in the US. DARE is also quite out of date. Benwing2 (talk) 05:11, 18 February 2024 (UTC)[reply]
@User:Benwing2 Early volumes of print DARE are old: volume 1 is copyright 1985, with older data, of course; there has been much loss of regional difference in the US as people move between regions, use national rather than local media, etc, but much remains. We are, among other things a historical dictionary, so mere age should not be a disqualification of contributions (or contributors!). DCDuring (talk) 16:04, 18 February 2024 (UTC)[reply]
I assume that would mean that 'bunny chow' shouldn't have a 'South Africa' label and 'bangers and mash' shouldn't have a UK label then. But they do so I do hope you understand the confusion on that part.
I'm sorry about the DARE entries but that's a source Wiktionary's sister site, Wikipedia uses liberally lol. I understood some of my entries were tenuous and I was hoping I'd be corrected if I were jumping the gun! I'm not well travelled! SwordofStorms (talk) 05:19, 18 February 2024 (UTC)[reply]
OK I reviewed this user's changes (and reverted most of them). User:SwordofStorms you also don't seem to understand what doublets are; just because two terms are etymologically related doesn't make them doublets (cf. water, hydro, und). Benwing2 (talk) 06:10, 18 February 2024 (UTC)[reply]
@User:Benwing2, @ChuckEntz Are User:Demonicallt and User:SwordofStorms the same? DCDuring (talk) 16:26, 18 February 2024 (UTC)[reply]
@DCDuring Looks like Wonderfool to me. Benwing2 (talk) 20:36, 18 February 2024 (UTC)[reply]
Definitely not WF. Wrong kind of mistakes. Chuck Entz (talk) 21:14, 18 February 2024 (UTC)[reply]
@Chuck Entz I mean that Demonicallt looks like WF, not SwordofStorms. Benwing2 (talk) 22:44, 19 February 2024 (UTC)[reply]
@Benwing2 Yes, Demonicallt is unquestionably WF. DCD asked whether they were the same, so it looked like you were saying "Yes, they're both WF" rather than "No, unlike SwordofStorms, Demonicallt is WF". Thank you for clarifying. Chuck Entz (talk) 23:16, 19 February 2024 (UTC)[reply]
Part of the problem has to do with the fact that topic and register/restricted-context labels appear identically. How can we blame a new contributor for being confused about this? We are compounding the problem by discouraging a new contributor by reverting good faith contributions based on knowledge of how experienced users (knowledgeable about the distinction we make between topic and register/etc.) view subjective beliefs or personal concept of what an entry feature should be. We should take this as an indication that we need to resolve the confusion in our label/topic display. DCDuring (talk) 16:04, 18 February 2024 (UTC)[reply]
The confusion is caused by taking over labelling by means of brackets from past print dictionaries. These labels look identical to additional information in parentheses such as by {{gloss}}. If one were to add some other decoration separating the labels from the header “the difference between usage and reference” would transpire more. Fay Freak (talk) 17:08, 18 February 2024 (UTC)[reply]
OK, @SwordofStorms if you were relying on DARE saying the terms really were regionally restricted, I apologize for my tone on your talk page, I can't fault someone for not realizing if DARE is not actually reliable; I would've thought it was reliable myself, unless I saw it saying something like that brook or cheesesteak was regionally restricted (does it really say that? I'm disappointed if it's so inaccurate/unreliable).
To DCDuring's point: yes, historical information, where it's accurate and useful, is worth preserving/indicating ... but we have to be careful, because as the words we're discussing demonstrate, (1) sometimes the supposed historical information is wrong, as with the claim that brook was originally New England (no, it was used already in old England), and (2) often, such "historical information" is not useful: e.g., from the perspective of sheer technical correctness, we could add "originally Britain" or "originally England" as a label on brook, and furthermore on stream and tens or hundreds of thousands of words which were first attested in Britain/England sometime before English-speakers even reached the Americas, but that's not useful, now is it? We have to be sure there was a period of time when a term was actually dialectal. - -sche (discuss) 19:59, 18 February 2024 (UTC)[reply]
I found that AHD took the trouble to have the following:
Our Living Language Traditional terms for “a small, fast-flowing stream” vary throughout the eastern United States especially and are enshrined in many place names. Speakers in the eastern part of the Lower North (including Virginia, West Virginia, Delaware, Maryland, and southern Pennsylvania) use the word run. Speakers in the Hudson Valley and Catskills, the Dutch settlement areas of New York State, may call such a stream a kill. Brook has come to be used throughout the Northeast. Southerners refer to a branch, and throughout the rural northern United States the term is often crick, a variant of creek.
Of course, they have the advantage on us of having the US (alone?) as their target market. DCDuring (talk) 20:23, 18 February 2024 (UTC)[reply]
@DCDuring @-sche I may have been a bit heavy-handed on the reversions but a lot of them appeared outright wrong to me such as saying fridge, freeway and frontage road are regionally restricted in the US (possibly 'frontage road' is regionally restricted but the regions given didn't make sense to me) and terms like arroyo and caliche are Texas-only (e.g. I grew up in Arizona and we used them there; as these are Spanish-origin terms maybe they are more present in the Western US but not only Texas). Some added definitions were just bizarre e.g. under bring this user added the following:
# {{lb|en|Florida}} To carry something inside; to have something.
@DCDuring: The main problem is that inline labels are the wrong tool for presenting DARE data. The goal of DARE is to document the variation in regional English usage in the United States. This is rarely simple and concise enough to represent in the amount of space available in front of the definition. When you use context labels, you have to leave things out, which makes what's left rather cryptic and/or ambiguous. Where it's of interest it should be presented as "Usage notes", which were designed for precisely this sort of thing. Chuck Entz (talk) 21:14, 18 February 2024 (UTC)[reply]
I had earlier written "but DARE may often be too detailed for us." I'm not certain that it is, but I mostly object to the manner in which this mass reversion was carried out: shoot first and ask questions later. (And disparage evidence rather than admit you have none to support your own preferences.) DCDuring (talk) 21:22, 18 February 2024 (UTC)[reply]
In general I left terms I didn't recognize except those referring to regional food items, where the confusion between usage and reference was present. My logic is that I'd rather remove incorrect info than try to correct things one by one where I don't have the resources to do so. In general this user doesn't seem to be applying common-sense tests to their labels (the "smell test") but taking some resources on face value. If this resource is DARE, then we know now that DARE is wrong or at least quite out of date in their data. Benwing2 (talk) 20:23, 18 February 2024 (UTC)[reply]
@User:Benwing2 How hard would it have been to correct arroyo (only heard it in context of descriptions of SW terrain and mostly mentions) and caliche (never heard of it before)? DARE supports the labels SwordofStorms provided. What source do you have to contradict that other than personal opinion?
Well, you probably won't have to worry too much about SwordofStorms coming back.
You have made me feel less embarrassed about my own occasional dyspeptic outbursts. DCDuring (talk) 20:36, 18 February 2024 (UTC)[reply]

Use of T:lang[edit]

See User talk:Benwing2#Italicising synonyms for taxonomic names for a relevant discussion.

Is there any kind of community position (for or against) on using {{lang}} like this? User:PUC objected, for no good reason, IMO. Incidentally, I object to PUC's rank-pulling and rudeness (see Special:History/schema logu and his/her talk page, respectively) in dealing with this issue, if that counts for anything. 0DF (talk) 23:12, 17 February 2024 (UTC)[reply]

There's no reason to do this that doesn't apply to almost every line on millions of pages. I don't see the point. Chuck Entz (talk) 23:23, 17 February 2024 (UTC)[reply]
The language is inherited from the page-language. For example it is lang="en" in the <html> element while lang="de" on German Wiktionary. The template {{lang}} is for marking a text as in another than the inherited language, so using it this way you abuse the template. Fay Freak (talk) 23:30, 17 February 2024 (UTC)[reply]
@Chuck Entz: I gave the rationale in this edit summary (i.e., w:Template:Lang#Rationale, specifically points 1, 2, 3, 6, and 7) and in this post (“{{lang}} skips the left-hand table of contents, which serves all readers, not just visually-impaired ones”). That was before Fay Freak informed me that “[t]he language is inherited from the page-language” (thanks for that, F.F.). In the light of that, only the section-linking caused by {{lang}} still applies as a reason; that can just as easily be achieved by using {{l|en}}, so I'll use that instead in future. 0DF (talk) 23:52, 17 February 2024 (UTC)[reply]
I've unblocked the page, but what do you have in mind exactly?
On another note, I apologize for the rudeness, but having to deal with large numbers of vandals or well-meaning but ultimately clueless editors (I'm not saying you're one of them, though) has made me a bit callous. I simply have lost the patience to explain to people how things are done here. When I see what looks like nonsense of any kind, I'm inclined to get rid of it as fast as possible so that I can go back to adding more material myself. Not the ideal mindset in an admin, but hopefully what I'm doing is still a net benefit for the project. PUC 00:04, 18 February 2024 (UTC)[reply]
@PUC: Thanks for the apology. It's all water under the bridge. The section-linking I have in mind is to skip the left-hand table of contents, which pushes all the actual entry-contents down, often off the screen. 0DF (talk) 00:10, 18 February 2024 (UTC)[reply]
@0DF It is not normal to use {{l|en|...}} in definitions. Just use bare links, unless the link is to the pagename itself. Benwing2 (talk) 00:04, 18 February 2024 (UTC)[reply]
@0DF: You decrease source-code readability and strain server-side execution time without sufficient reason, I think. The section-linking reason you state is a reason but sore meagre: consider that there is a collection of reasons why English (and translingual) sections are put at the very beginning of main-space pages. When I began here the first months I thought the section-linking also great and was enough excited to employ {{l}} in the same fashion, later I valued my fingers and the brain typing on them too much—before I even deadlifted with the same and systematically engaged in maxxing out mental resources. This does not say that it is not particularly desirable in individual cases to link specific senses on large target pages or entry sections, which is admitted by those who read the source-code. Fay Freak (talk) 00:06, 18 February 2024 (UTC)[reply]
@Benwing2, Fay Freak: OK, so in what cases is linking to language sections desirable (outside term-list sections, I mean)? 0DF (talk) 00:13, 18 February 2024 (UTC)[reply]
Whenever linking to non-English terms, and when linking to English terms in term-list sections and the like (as you call them), and when a specialized template is required e.g. {{cog}}, {{inh}}, {{bor}}, etc. Generally yes in lists, no in running text using {{l}}. Benwing2 (talk) 00:20, 18 February 2024 (UTC)[reply]
I will not give a comprehensive answer, since this question actually rarely occurs, as such, empirically. But in general specific links concern particular senses that do not immediately surface, by the typical viewing behaviour of a reader: a slangy sense might not be likely to be remembered in context and then sought out: like leg day links a specific sense of split; it does so in a holonyms section with |id= but by the same token it can be desirable to link senses in foreign-languages glosses, which I can not think to do in another way than by {{l}}. This is for senses however, with language sections the specific use can not be that high, since such links are not that specific. Fay Freak (talk) 00:22, 18 February 2024 (UTC)[reply]
@Benwing2, Fay Freak: So there are no cases where you would deem it desirable to link to the English-language section of an entry from a definition? No matter how long the table of contents? Is the left-hand table of contents not regarded as an issue? 0DF (talk) 00:27, 18 February 2024 (UTC)[reply]
@0DF: I have not found, or at least do not remember, which is tantamount, a case empirically, where it is sufficiently desirable for one to exert oneself to link an English-language section, in seven years. Know that tables of contents can and are also manipulated, as on bar. I am concerned for your end-user device, writing this on 27" QHD, high response over DisplayPort with high-end graphics card and wont mouse. But I make this value judgment even though the interphalangeal joints of my right index finger have already hurt many days during the last two years due to all the computer-work or doomscrolling. Fay Freak (talk) 00:49, 18 February 2024 (UTC)[reply]
@0DF I agree with User:Fay Freak here. I think it would be better on pages with excessively long TOC's to add the appropriate settings to keep the lower levels of the TOC closed by default (as is done on bar). Benwing2 (talk) 00:54, 18 February 2024 (UTC)[reply]
Also I mentioned one specific use case, which is when the link is to the page itself, because in that case a bare link shows up as unlinked boldface. Benwing2 (talk) 00:56, 18 February 2024 (UTC)[reply]
@Benwing2: Yes; noted with thanks. That is also a case in which {{lang}} doesn't work; only {{l}} will do there. 0DF (talk) 01:32, 18 February 2024 (UTC)[reply]
@Benwing2, Fay Freak: The table of contents for bar, despite being limited to the forty-nine level-two language headers only, still takes up two whole screens' worth of vertical space for me. That would be exactly the kind of case in which I would think that section-linking would be desirable. But, if The Community deems otherwise, so be it. 0DF (talk) 01:30, 18 February 2024 (UTC)[reply]
@0DF: For me three fourths of the TOC take up the whole screen and I am zoomed in 160%–170%, with no zoom where I can still read well the whole table takes 100%. I sometimes (not rarely; the behaviour is automatical after some use) click language sections in the TOC but yet I realize the order is pointless. I compare to Wikipedia’s default and prefer their TOCs technique now, which is from two years ago. In your preferences → Appearance their skin is Vector 2022. Should’ve tried before instead of maltreating my index as in 2021: It looks much, much better, also in comparison to Wikipedia! We might vote on it to make it default on Wiktionary, if basically editors weren’t energetic enough to even check out available layout improvements, busy with streamlining other parts of Wiktionary. Fay Freak (talk) 02:08, 18 February 2024 (UTC)[reply]
@0DF IMO it's not worth the effort to try and have a special list of terms where we link specifically to the English section; for one thing it's definitely not maintainable, for another these are almost always common English terms so the gain of saving a couple of mouse clicks/scrolls seems hardly worth it. User:Fay Freak's suggestion of using Vector 2022 is also an alternative; for me it wastes too much whitespace on the left and right but if the TOC issue is a big concern it may be worth it. Benwing2 (talk) 02:46, 18 February 2024 (UTC)[reply]
@Benwing2: I figure that the impression of waste is shaped by imagining yourself editing some Module code sections, which they don’t do that deeply at Wikipedia, hence their vote. For reading and editing in general it is of course more amenable by decreasing the paths the fingers have to move the mouse to edit and view the history and read it again, or switch between entry and discussion, hence the main actions there in the narrower middle column, and texts having only limited width makes as much sense as multiple columns in newspapers, more information with shorter eye moves. Even most other actions need no scrolling, but particularly the search, which is much more important on Wiktionary, but on Vector 2010 I have to scroll back to the top (or shift through the browser search bar) to search another word when I am in some L2 header section under the very top. Maybe we can make the gaps less extreme and more in alignment with Wiktionary through CSS, otherwise I must own that the Foundation’s designers have outslicked us in balancing the psychological and physiological effects of the interface upon editors and only-readers. Fay Freak (talk) 03:32, 18 February 2024 (UTC)[reply]
@Fay Freak, Benwing2: My screen resolution is 1,920 × 1,080 pixels, if that helps you to visualise. I tried the Vector (2022) skin, and that does indeed get rid of the problem, but at the acknowledged cost of instead wasting horizontal screen-space. However, I make a point of using mostly default settings, since those are the ones that all unregistered users will have, which I assume to be the vast majority of them, their being the ones for whom we should try to optimise things, other things being equal. Is there some way to apply {{col-auto}}-style columns to the table of contents in the Vector legacy (2010) skin in order to save vertical screen-space? 0DF (talk) 19:49, 18 February 2024 (UTC)[reply]
@0DF Maybe User:This, that and the other knows? This would involve some CSS magic. And User:Sarri.greek was also trying to work on changes to the TOC to save space. Benwing2 (talk) 20:33, 18 February 2024 (UTC)[reply]
Well, if you're asking, my personal preference would be to enable TabbedLanguages for everyone by default. And my second preference would be to enable Vector2022 for everyone by default... as DCDuring often says, we squander the width currently available to us in the vast majority of entries, and Vector2022 at least uses this space for something else.
Failing that, see User:This, that and the other/columnar toc mockup for a couple of mockups. I tend to prefer the second one. We could use media queries to ensure the columns only display on sufficiently large screens. This, that and the other (talk) 00:52, 19 February 2024 (UTC)[reply]
@This, that and the other: The second one looks good. How would it look on bar, restricted to level-2 headers only? The Vector (2022) skin or an improved version thereof might eventually be the way to go, but that's no reason not to improve the TOC problem with the Vector legacy (2010) skin in the meantime. 0DF (talk) 01:05, 19 February 2024 (UTC)[reply]
@0DF you can try it yourself by pressing F12 in your browser, going to the Console tab, and pasting the following code:
document.querySelector('.toc').style.display = 'flow-root'; document.querySelector('.toc > ul').style.columnWidth = '20em';
Note that, for it to work effectively, the {{TOC limit}} template should come after the {{character info}} template in the wikitext of the page. Currently these are the wrong way around at the bar page. This, that and the other (talk) 01:20, 19 February 2024 (UTC)[reply]
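The media-query idea mentioned above (only showing columns on sufficiently large screens) could be sketched roughly as follows. This is only an illustration built on the two console statements already given: the function name applyColumnarToc and the 60em threshold are assumptions, not values anyone proposed, and a real rollout would more likely use a CSS @media rule in the site stylesheet.

```javascript
// Hypothetical threshold: viewport width below which the TOC stays single-column.
const MIN_WIDTH_EM = 60;

// Applies (or removes) the columnar TOC styling depending on viewport width.
// `doc` and `win` are passed in so the function can be tried outside a browser.
function applyColumnarToc(doc, win) {
  const toc = doc.querySelector('.toc');
  const list = doc.querySelector('.toc > ul');
  if (!toc || !list) return false; // page has no TOC

  const wide = win.matchMedia(`(min-width: ${MIN_WIDTH_EM}em)`).matches;
  // Same two properties as the console snippet; cleared again on narrow screens.
  toc.style.display = wide ? 'flow-root' : '';
  list.style.columnWidth = wide ? '20em' : '';
  return wide;
}

// In a browser console this would be run as:
if (typeof document !== 'undefined' && typeof window !== 'undefined') {
  applyColumnarToc(document, window);
}
```

Passing the document and window in as parameters is just a convenience for experimenting; in a gadget they would be the globals.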
@This, that and the other: Sorry to be dense, but what do you mean by "the Console tab"? 0DF (talk) 13:41, 19 February 2024 (UTC)[reply]
@0DF When you press F12, a new panel should open up with various tabs along the top such as "Elements" (or "Inspector"), "Console", "Network" and so forth. Switch to the "Console" tab. This, that and the other (talk) 03:56, 20 February 2024 (UTC)[reply]
@This, that and the other: Unfortunately, all that happens when I press F12 is that my computer's Calculator program starts. 0DF (talk) 02:10, 21 February 2024 (UTC)[reply]
@0DF It may depend on your browser. Under Chrome on MacOS, for example, you need to go to View -> Developer -> JavaScript Console to get a console pane to pop up. Under Firefox on MacOS you need to go to Tools -> Browser Tools -> Browser Console. Benwing2 (talk) 02:42, 21 February 2024 (UTC)[reply]
@0DF Try pressing Fn+F12 or Ctrl+Shift+I (capital "i"). This, that and the other (talk) 03:10, 21 February 2024 (UTC)[reply]
@Benwing2, This, that and the other: Thank you both. Fn + F12 was the first thing I tried and it worked, so I didn't try the other methods.
@This, that and the other: The TOC on bar looked great with your code. I made the correction to the order of {{character info|㍴}} and {{TOC limit|2}} you recommended, but I prefer the four-column presentation of the TOC when those two templates are in the "wrong" order to the three-column presentation of the TOC when they're in the "right" order. Nevertheless, they are both a great improvement, and in both cases the resulting TOC fits on the first screen with plenty of space to spare.
@Benwing2, This, that and the other: I think the columnar TOC should be rolled out universally. I also think that the TOC should be limited to level-two and -three headers by default. What do you think? Does it call for a vote? 0DF (talk) 00:48, 22 February 2024 (UTC)[reply]
@0DF I'm not sure whether it needs a vote, but if nothing else a thorough discussion (and the two issues you mentioned should be considered separately). I would recommend starting a new BP discussion bringing up these two issues as separate subsections and soliciting feedback. Benwing2 (talk) 01:30, 22 February 2024 (UTC)[reply]
@Benwing2: Done. Let's see what comes of it. I'll be focussing on responding to that discussion about Byzantine Greek for the next few days. 0DF (talk) 02:22, 25 February 2024 (UTC)[reply]
My comment still stands, though the comments by others since better address the same things. If you have concerns about how definition lines are treated by browsers, you need to bring them to the attention of the community. Using a template that's not intended for the purpose on a random individual line while leaving millions of similar lines untouched that are not different in any meaningful way strikes me as misguided and hacky. As for @PUC, their explanations were straightforward and to the point. I would probably have reverted your edit myself. As for {{l|en}} that's used for linking to English entries, and the modules would be wasting time determining that there was nothing to link to. I spend a lot of my time patrolling CAT:E, and I know that there are several entries that intermittently show up there because of excessive system overhead. If this is done systematically, it will just make things much worse. In comparison to the efforts of people like @Benwing2, This, that and the other and @Theknightwho (which have not been without criticism), such methods are piecemeal, inefficient and silly. Chuck Entz (talk) 00:34, 18 February 2024 (UTC)[reply]
I think my only contribution in this particular sphere was creating the -lite templates! Still, 0DF has a point. Unmarked links in definition lines should take the reader to the English section of the linked entry. The objections to using {{l}} for this purpose are very reasonable, so perhaps the addition of #English to these links can be done via JavaScript. This, that and the other (talk) 06:15, 18 February 2024 (UTC)[reply]
If you can figure out how to do this, by all means implement it. Benwing2 (talk) 06:26, 18 February 2024 (UTC)[reply]
@This, that and the other: That would be a great solution if you could manage it. 0DF (talk) 19:50, 18 February 2024 (UTC)[reply]
@Benwing2 @0DF  Done, see [6]. This, that and the other (talk) 01:10, 19 February 2024 (UTC)[reply]
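For readers who want a feel for what appending #English to a link involves, here is a minimal sketch. It is not the code linked above: the function name withEnglishAnchor and the /wiki/ path prefix are assumptions, and it simply leaves alone any link that goes to another site, already has a section anchor, carries a query string (as red links do), or has a colon in the title.

```javascript
// Returns the link target with "#English" appended, or the original href
// unchanged when the link should be left alone.
function withEnglishAnchor(href, base) {
  const url = new URL(href, base);
  if (url.origin !== new URL(base).origin) return href; // other site
  if (url.hash !== '' || url.search !== '') return href; // anchored, or edit/red link

  const prefix = '/wiki/'; // assumed article path
  if (!url.pathname.startsWith(prefix)) return href;

  let title;
  try {
    title = decodeURIComponent(url.pathname.slice(prefix.length));
  } catch (e) {
    return href; // malformed percent-encoding
  }
  // A colon usually means another namespace (e.g. the Glossary appendix).
  if (title === '' || title.includes(':')) return href;

  url.hash = 'English';
  return url.href;
}

// Browser usage (commented out so the sketch also runs outside a browser):
// document.querySelectorAll('.mw-parser-output ol a').forEach(a => {
//   a.href = withEnglishAnchor(a.href, location.href);
// });
```

The colon test is deliberately crude; it trades a handful of missed mainspace titles for simplicity.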
@This, that and the other Thanks for this. This causes any red links to get turned orange by the OrangeLinks gadget, because it's still adding #English to the end, and OrangeLinks obviously can't find an English section on the target page. You should be able to filter these out, since the URL targets have a different format. Theknightwho (talk) 02:07, 19 February 2024 (UTC)[reply]
@Theknightwho thanks for noting this. Can you check to see if it's fixed now? This, that and the other (talk) 03:54, 19 February 2024 (UTC)[reply]
@This, that and the other Thanks. One other issue: it still fails with mainspace links which contain a colon (which covers anything in Category:English terms spelled with :), and I assume you've excluded anything with a colon as a way to catch interwiki links. A better way to do that would be to check for the class extiw. On the other hand, links to titles which include ? do seem to work, since it gets escaped to %3F, so that check's fine to keep. Theknightwho (talk) 04:09, 19 February 2024 (UTC)[reply]
@Theknightwho the : is mainly to catch links to other namespaces, such as the Glossary. I'm inclined to leave it as is. The number of English terms containing colons is already minuscule (cat:English terms spelled with :) and it's hard to imagine that any of these would be raw-linked from sense lines with any regularity. This, that and the other (talk) 04:40, 19 February 2024 (UTC)[reply]
@This, that and the other That's fair - you could always grab the array of namespaces, called wgContentNamespaces, and simply iterate over the array to eliminate them. Theknightwho (talk) 04:45, 19 February 2024 (UTC)[reply]
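A namespace-aware check along these lines might look like the sketch below. Note the swap: instead of iterating wgContentNamespaces, it looks the prefix up in the wgNamespaceIds map from mw.config (lower-cased namespace names mapped to numeric IDs), which seemed the more direct route. The function name isMainspaceTitle and the sample map are illustrative assumptions, and interwiki prefixes are not handled here (the extiw class check would cover those).

```javascript
// Decides whether a title belongs to the main namespace, given a map of
// lower-cased namespace names to IDs (in the shape of wgNamespaceIds).
function isMainspaceTitle(title, namespaceIds) {
  const colon = title.indexOf(':');
  if (colon === -1) return true; // no prefix at all

  // Normalise the prefix the way the map's keys are written.
  const prefix = title.slice(0, colon).trim().toLowerCase().replace(/ /g, '_');
  const known = Object.prototype.hasOwnProperty.call(namespaceIds, prefix);
  // A colon only moves the title out of mainspace if the prefix is a real,
  // non-main namespace; otherwise the colon is part of the term itself.
  return !(known && namespaceIds[prefix] !== 0);
}

// In a gadget the map would come from: mw.config.get('wgNamespaceIds')
// The values below are a small hypothetical sample for illustration only.
const sampleNamespaceIds = { '': 0, talk: 1, wiktionary: 4, wiktionary_talk: 5, appendix: 100 };
```

With this, a term that merely contains a colon is still treated as mainspace, while Appendix: or Wiktionary talk: links are filtered out.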
FWIW, I came up with a rough version that would handle the namespaces in proper internal wikilinks and would also correctly parse some sneaky links where the URL is spelled out (like this or this) though not those with sneaky interwiki links (like this or this):
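Array.from(document.querySelectorAll('.mw-parser-output ol a')).filter(e => e.href).filter(e => {
  const url = new URL(e.href);
  // Only consider links to this wiki that simply view a page.
  if (url.hostname !== mw.config.get("wgServerName")) return false;
  const action = url.searchParams.get("action");
  if (url.search !== "" && action !== "view" && action !== null) return false;
  // The title is either in ?title=... or in the path (e.g. /wiki/Title).
  let rawTitle = url.searchParams.get("title");
  if (rawTitle === null) {
    const match = url.pathname.match(mw.config.get("wgArticlePath").replace("$1", "(.+)"));
    if (!match) return false;
    // The path is still percent-encoded, unlike values read from searchParams.
    rawTitle = decodeURIComponent(match[1]);
  }
  // newFromText returns null for an invalid title instead of throwing.
  const title = mw.Title.newFromText(rawTitle);
  // Keep only links into namespaces 0 (main) and 118.
  return title !== null && [0, 118].includes(title.namespace);
})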
I tested it with these links. It could use some refactoring but I have to go to bed. — Eru·tuon 06:18, 19 February 2024 (UTC)[reply]
Again, I'm not convinced it's worth it given that (a) this code only affects sense lines (dodgy external links to Wiktionary in sense lines should be fixed rather than worked around) and (b) there are so few false-negative entries in existence. I'm not going to get in the way of anyone who wants to edit my code, but it's complete as far as I'm concerned. This, that and the other (talk) 06:46, 19 February 2024 (UTC)[reply]
@This, that and the other: Thanks for your work on this. Unfortunately, when I clicked on those "trial" links, none of them took me to the English section of the page. Do you know why that might be? 0DF (talk) 13:37, 19 February 2024 (UTC)[reply]
@This, that and the other: Actually, scratch that. It seems to work fine most of the time. Thank you! 0DF (talk) 13:40, 19 February 2024 (UTC)[reply]
So now we should give taxonomic links as {{l|mul}}? @Benwing2 Do you think at least part of this transition could be bottable? Although this probably needs a separate discussion. Catonif (talk) 16:08, 19 February 2024 (UTC)[reply]
@Catonif Hmm, this is an unintended consequence of this change. Yeah probably we will need to give taxonomic links using {{l|mul}}; I think it's too expensive to consider trying to automate this by looking up each link from JavaScript to see whether it has a Translingual and/or English section. This should be bottable essentially by looking for raw links in definitions that link to terms that have Translingual but not English sections. Benwing2 (talk) 22:28, 19 February 2024 (UTC)[reply]
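The bot pass described here could be sketched as below, purely as an illustration. The function name relinkTranslingual is hypothetical, and sectionsOf stands in for whatever lookup a real bot would use to get a page's language sections; links that already carry a #section are left untouched.

```javascript
// Rewrites raw [[links]] in one definition line to {{l|mul|...}} when the
// target page has a Translingual section but no English one.
// `sectionsOf(title)` is an assumed lookup returning an array of L2 headers.
function relinkTranslingual(line, sectionsOf) {
  const rawLink = /\[\[([^\[\]|#]+)(?:\|([^\[\]]*))?\]\]/g;
  return line.replace(rawLink, (whole, target, label) => {
    const sections = sectionsOf(target.trim()) || [];
    if (!sections.includes('Translingual') || sections.includes('English')) {
      return whole; // English exists (or nothing known): keep the bare link
    }
    return label === undefined
      ? `{{l|mul|${target}}}`
      : `{{l|mul|${target}|${label}}}`;
  });
}
```

A real bot would also need to restrict itself to definition lines and skip links inside templates, which this sketch does not attempt.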
@Catonif, Benwing2, DCDuring: Why exactly is {{taxlink}} removed from blue links? I don't understand the reason for that practice. 0DF (talk) 22:46, 19 February 2024 (UTC)[reply]
@0DF Can you give me an example of this? Benwing2 (talk) 22:47, 19 February 2024 (UTC)[reply]
@Benwing2: Yes: Special:Diff/78116436/78128518. 0DF (talk) 23:24, 19 February 2024 (UTC)[reply]
@0DF: {{taxlink}} is primarily for tracking taxonomic names that don't have translingual entries yet- it adds the page to subcategories of Category:Entries using missing taxonomic names. For linking to a Translingual taxonomic name entry, it would be simpler to use {{l|mul}} or {{m|mul}}. Chuck Entz (talk) 23:49, 19 February 2024 (UTC)[reply]
@Chuck Entz: I'm not sure it would, given the functionality of {{taxlink}} brought up by DCDuring in User talk:Benwing2#Italicising synonyms for taxonomic names. {{taxlink}} with |nocat=1 seems like the easiest way to go. 0DF (talk) 00:39, 20 February 2024 (UTC)[reply]
@0DF I don't know much of anything about the taxonomic templates but this seems the wrong way to do things; instead there should be a template that implements the fancy italicizing behavior without adding a tracking category. (Maybe there already is such a template; there are several taxonomic templates in CAT:Taxonomic name templates.) Benwing2 (talk) 01:23, 20 February 2024 (UTC)[reply]
If I remember correctly {{taxlink}} came first, then it was upgraded to have the proper display. After looking at {{taxlink}}, I recalled that the italicization logic was later moved to Module:italics, which also handles some titles for quotation templates. That said, I don't think it should be added to a general-purpose workhorse like {{m}}. Chuck Entz (talk) 02:22, 20 February 2024 (UTC)[reply]
@Chuck Entz It could be done if we had a special langcode for taxonomic links, which imo is justified. Theknightwho (talk) 02:25, 20 February 2024 (UTC)[reply]
@Theknightwho AFAIK we already have an etym code for taxonomic links although it may not be used. Benwing2 (talk) 02:44, 20 February 2024 (UTC)[reply]
@Benwing2, Chuck Entz, Benwing2: The code mul-tax exists for just such a purpose, although it is currently unused. See User talk:Benwing2#Italicising synonyms for taxonomic names. 0DF (talk) 02:17, 21 February 2024 (UTC)[reply]
The real question is, why is a template called {{taxlink}} being used for missing taxonomic names? Surely this template should be called something like {{taxlink-check}}, and then, when an instance is checked and found valid, it can be changed to {{taxlink}}. I'm tempted to go to RFM. This, that and the other (talk) 04:00, 20 February 2024 (UTC)[reply]
{{taxlink}} evolved. I intended it to be short because I often type it. Originally, it was to simultaneously provide a way of counting incoming links to missing terms and also hide our lack of content by providing a link to WikiSpecies. (There is a variant that provides a link to WP and for alternative pagenames for links to Species and WP.) When we add a taxonomic entry I remove the template. Sometimes, as in the example given above, I add {{taxlink}} to all items in a list, as of hypernyms or hyponyms, and remove those for which we actually have entries.
The capability of automatically de-italicizing terms appearing within taxonomic names like "subsp.", "var.", "sect." etc. helps a lot for lists of terms (hypernyms, hyponyms). It isn't a big need for uses of taxonomic names otherwise, e.g., in definitions and etymologies, where occurrence is rare, manual wikitext is sufficient, and the "harm" of not having proper italicization is not serious. DCDuring (talk) 13:16, 20 February 2024 (UTC)[reply]
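To illustrate the de-italicizing behaviour described above, here is a minimal Python sketch. It is not the logic of Module:italics; the marker list contains only the three abbreviations named in the comment, and the function name is hypothetical.

```python
# Rank abbreviations that stay upright inside an otherwise italic name.
# Only the three mentioned in the discussion; a real list would be longer.
RANK_MARKERS = {"subsp.", "var.", "sect."}


def italicize_taxon(name):
    """Wrap a taxonomic name in wikitext italics (''...''), leaving
    rank abbreviations such as 'var.' outside the italic runs."""
    parts = []
    run = []  # consecutive tokens that should be italicized together
    for token in name.split():
        if token in RANK_MARKERS:
            if run:
                parts.append("''" + " ".join(run) + "''")
                run = []
            parts.append(token)
        else:
            run.append(token)
    if run:
        parts.append("''" + " ".join(run) + "''")
    return " ".join(parts)
```

For example, `italicize_taxon("Brassica oleracea var. capitata")` yields `''Brassica oleracea'' var. ''capitata''`, which is the kind of markup that is tedious to type by hand in long hyponym lists.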
@DCDuring The whole system of removing the template when a page has been created seems like a major waste of time, as we can check for this stuff automatically to determine whether to add the tracking category. It also means that systematising/improving taxonomic links is now a massive job, instead of something we could have done by using existing template calls. Theknightwho (talk) 13:27, 20 February 2024 (UTC)[reply]
The machine checking of entry existence seemed like a gross waste of machine time, leading to unnecessarily slow loading of pages with lots of instances of taxlink, or so I was told at the time. Even today, the WM documentation for ParserFunctions says:
ifexist limits
#ifexist: is considered an "expensive parser function"; only a limited number of which can be included on any one page (including functions inside transcluded templates). When this limit is exceeded, any further #ifexist: functions automatically return false, whether the target page exists or not, and the page is categorized into Category:Pages with too many expensive parser function calls. The name of the tracking category may vary depending on the content language of your wiki.
Were I not on the defensive for daring to have ambitions to include large numbers of taxa and had I the wit, I would have created another template, say, {{taxlinky}}, which retained taxonomic information. At the time, {{taxlink}} did not have any formatting capability and taxonomic entries received no technical support whatsoever, so there was not much point. To this day, taxonomic names are included in the same "language" as CJKV characters, despite there being more than 20K instances of {{taxon}} and sharing damned few commonalities with CJKV characters. ({{taxlink}} appears on more than 46K pages, very many with multiple instances.) In the process of removing {{taxlink}} instances I almost always find missing content and errors in entries, so the 'major waste of time' leads IMHO to quality improvement. DCDuring (talk) 16:12, 20 February 2024 (UTC)[reply]
@DCDuring It’s a minor check that’s carried out very easily, can be done without using the shitty parser function you’ve referred to, and does not involve someone spending many hours going around removing templates. I can only conclude you’re being defensive because of all the pointless time you’ve wasted doing it. Theknightwho (talk) 16:32, 20 February 2024 (UTC)[reply]
@Theknightwho: This seems unnecessarily combative. DC gave good reasoning and supporting evidence for doing things one way while also expressing an interest in having a second template {{taxlinky}} to preserve the information (presumably so they can manually change {{taxlink}} to {{taxlinky}} after we have an entry for the term, which sounds pretty close to what TTO suggested earlier). If it's possible to modernize the underlying code of {{taxlink}} so that it can automatically link to terms we have and use WikiSpecies for terms we don't have without causing excessive memory or CPU overhead, that seems like an ideal solution. JeffDoozan (talk) 17:06, 20 February 2024 (UTC)[reply]
@DCDuring @JeffDoozan I apologise. I get irritated when people bring up these kinds of micro-optimizations as a reason to keep doing things the same way; in this case, even using the parser function, the memory/performance impact would be a few milliseconds at most even with several of them, unless we start adding many hundreds of taxonomic links (at which point it would hit the limit of 500). It is possible to do it another way in Lua which does not have this limit, however. Theknightwho (talk) 17:12, 20 February 2024 (UTC)[reply]
@Theknightwho: I found that taxlink is already invoking lua to handle the complicated italics so I doubt there would be any performance impact to just convert the whole thing to lua. I found mw.title.new(page):exists as a way of checking that a page exists, but the linked documentation mentions that it's an "expensive" function. Is there an inexpensive way to check if a page exists in Lua? Going even further, is there an inexpensive way to check that the page exists and that it contains a Translingual L2, maybe doing whatever orangelinks does to inspect the page (I haven't looked)? Thanks! JeffDoozan (talk) 16:57, 24 February 2024 (UTC)[reply]
@JeffDoozan Yes - you can use :getContent(), which is not expensive. If the page doesn’t exist it returns nil, so it functions as a workaround. We already use this in several other places. Checking for a Translingual section can be done with get_section in Module:utilities. Theknightwho (talk) 17:00, 24 February 2024 (UTC)[reply]
@Theknightwho: Perfect! I misread the documentation, the call to mw.title.new() is apparently only expensive when using id and not when using title, so mw.title.new(PAGENAME):getContent() looks like a safe way of inspecting the contents. Thank you! JeffDoozan (talk) 17:09, 24 February 2024 (UTC)[reply]
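The decision logic discussed in this exchange can be sketched in Python for illustration (the real implementation would be Lua inside a module). Here `get_content` is a hypothetical callable standing in for `mw.title.new(PAGENAME):getContent()`, assumed to return `None` for a missing page just as the Lua call is said above to return nil; the section check is a plain regex, not `get_section` from Module:utilities.

```python
import re
from typing import Callable, Optional


def has_l2(content: Optional[str], language: str) -> bool:
    """True if the page text contains a level-2 heading for `language`.
    A missing page (content is None) never has the section."""
    if content is None:
        return False
    pattern = r"^==\s*" + re.escape(language) + r"\s*==\s*$"
    return re.search(pattern, content, re.MULTILINE) is not None


def taxlink_destination(title: str,
                        get_content: Callable[[str], Optional[str]]) -> str:
    """Decide where a taxonomic link should point: the local entry when a
    Translingual section exists, otherwise the external fallback."""
    if has_l2(get_content(title), "Translingual"):
        return "local"
    return "wikispecies"
```

With a dictionary as a stand-in page store, `taxlink_destination("Felis", store.get)` returns `"local"` only when `store["Felis"]` exists and contains a `==Translingual==` heading; a page that exists with only other language sections still falls back to the external link.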
I always had the feeling that taxon links should be left wrapped some way or the other, so they be later an object for manipulation in bulk, if ever needed. It is clear that the current state is not ideal. This, that and the other also seemingly underestimates the amount of work put into organism names; if we assumed two templates then calling one {{taxlink-check}} and the other {{taxlink}} is unacceptable for the former’s length, rather it would be {{taxcheck}} and {{taxlink}} or even {{taxl}}. I don’t know what you programmists can do, other than having two templates, but the problems can be outlined well. Fay Freak (talk) 17:14, 20 February 2024 (UTC)[reply]
See User talk:Benwing2#Italicising synonyms for taxonomic names for a relevant discussion.

Help wanted, lots of template calls with bad parameters[edit]

I built a tool to analyze the supported parameters of ~36,000 templates that don't invoke modules (except "Module:string" and "Module:ugly hacks") and then used that to validate all of the template calls in the main namespace and found nearly 100,000 calls to templates using unhandled parameters. There are a lot of errors in here that can be cleaned up by bot: typos, misnamed parameters, etc, and I made a cleanup config page where you can specify a template name, bad param name, good param name to have the bot rename or remove a parameter (this isn't automatic, I'll be verifying anything added here and running the bot manually). Additionally, there are probably some places where it might be worthwhile to modify the templates to support the parameters users are trying to use. Finally, there are places where parameters were once required by templates but are no longer used and can be just deleted. Please take a look at the list (warning: it's big and not mobile friendly), add any cleanups you know are safe to the config page, and share any thoughts you have on how else we can clean this up and what can be done to prevent similar errors. JeffDoozan (talk) 17:05, 18 February 2024 (UTC)[reply]

@JeffDoozan Thank you, this is going to be very helpful. I am looking at the Hungarian entries and I'm not sure why some of them are listed. For example, {{hu-decl-ek}} has a valid parameter ül, listed in the documentation and present in the code but is marked as an error in jel as follows: {{hu-decl-ek|je|l|et|acc2=t|ül=y}} bad param 'ül' on jel. My next question is if I correct the issues manually can I delete them from your list? Or are you planning to regenerate the list from time to time? Panda10 (talk) 17:45, 18 February 2024 (UTC)[reply]
@Panda10: I'll regenerate the list with new data automatically every few weeks. |ül= is mistakenly flagged as invalid because it contains "ü", which I had not included as a valid character for parameter names. I'll fix that bug and regenerate the list. JeffDoozan (talk) 17:51, 18 February 2024 (UTC)[reply]
@JeffDoozan Please also include ő as well. There is a parameter |no-vő= in another template. Panda10 (talk) 18:01, 18 February 2024 (UTC)[reply]
The filtering by valid-characters was a mistake that excluded a lot of valid parameter names. I've removed it entirely. The list should be updated with the improved parameter detection in the next 10 minutes. JeffDoozan (talk) 18:06, 18 February 2024 (UTC)[reply]
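For readers wondering what this kind of check involves, here is a rough Python sketch of the idea, not the actual tool. It assumes a template's supported parameters can be read off its `{{{name}}}` / `{{{name|default}}}` references, and that calls contain no nested templates or links. Parameter names are deliberately not filtered by character set, so names like `ül` or `no-vő` pass through.

```python
import re

# Matches the name in {{{name}}} or {{{name|default}}}; no character whitelist.
PARAM_RE = re.compile(r"\{\{\{([^{}|]+)")


def supported_params(template_source):
    """Parameter names referenced in a template's wikitext source."""
    return {name.strip() for name in PARAM_RE.findall(template_source)}


def call_params(call):
    """Parameter names used by a simple call like {{t|a|b|x=1}}.
    Unnamed fields are numbered 1, 2, 3, ... in order."""
    fields = call.strip()[2:-2].split("|")[1:]  # drop braces and template name
    names = []
    position = 0
    for field in fields:
        if "=" in field:
            names.append(field.split("=", 1)[0].strip())
        else:
            position += 1
            names.append(str(position))
    return names


def unhandled_params(call, template_source):
    """Names used in the call that the template source never references."""
    allowed = supported_params(template_source)
    return [name for name in call_params(call) if name not in allowed]
```

Given a made-up source such as `{{{1}}}{{{2|}}}{{{3|}}}{{{acc2|}}}{{{ül|}}}` (not the real code of {{hu-decl-ek}}), the call `{{hu-decl-ek|je|l|et|acc2=t|ül=y}}` produces no unhandled parameters, while a typo like `|ul=y` would be reported.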

Setting Classical Latin transcriptions to phonemic only[edit]

It is entirely unnecessary for a dictionary, of all things, to attempt narrow transcriptions of millennia-old pronunciations. It is even more unnecessary to insert all sorts of silly hot-takes like:

  • Complete absence of [s] in favour of "[s̱]"
  • ⟨z⟩ as a word-initial [d̪͡z̪] and intervocalic [z̪d̪͡z̪]
  • Complete absence of [j] and [w], even word-initially
  • Syllabification claimed to be phonemic (to be fair, not the only language with this issue)
  • Short vowels before /-m/ as half-long (ludicrous levels of claimed precision) but not raised (??): [ɪ̃ˑ ɛ̃ˑ ɔ̃ˑ ʊ̃ˑ]

Mind, all of this is simply presented as a matter of fact - the outputted transcriptions come with no disclaimer like 'phonetic details uncertain' or 'take with a grain spoonful mug of salt'.

I hope it's not too much of me to say that a word like divisibilitatem should be rendered as simply /diːwiːsibiliˈtaːtem/ and that the current output [d̪iːu̯iːs̠ɪbɪlʲɪˈt̪äːt̪ɛ̃ˑ] is best suited for a conlanging forum, not a lexicographical project with any pretense of professionalism. Nicodene (talk) 12:41, 19 February 2024 (UTC)[reply]

Yes. Fay Freak (talk) 15:17, 19 February 2024 (UTC)[reply]
Great I'm glad you agree. Nicodene (talk) 15:45, 19 February 2024 (UTC)[reply]
Support - put it in the bin. Theknightwho (talk) 20:46, 19 February 2024 (UTC)[reply]
Support. It feels like a conlang project. — Fenakhay (حيطي · مساهماتي) 20:51, 19 February 2024 (UTC)[reply]
Support. There's always a strong temptation to treat linguistic reconstruction as a magical portal into the past, when it's really just an educated guess based on incomplete evidence. We do have some contemporary evidence, and books have been written on the subject, but it's still just a guess. How would the pronunciation of a common foot soldier be different from that of a centurion, a member of a certain gens, or even an emperor? What about when giving an oration as opposed to when buying something in a market? Any language with that many speakers over that wide an expanse of time and space is going to have lots of variation with historical era, geography, and any number of social variables. Of course, most of those variations wouldn't make it into writing that would be preserved into modern times, but it's still like the parable of the blind men and the elephant to some extent. Chuck Entz (talk) 21:38, 19 February 2024 (UTC)[reply]
@Nicodene Support although it might be worth rendering final -m (as well as vowel + -ns-) as a long nasal vowel since that seems to have been universal. Benwing2 (talk) 23:04, 19 February 2024 (UTC)[reply]
BTW there used to be a {{cu-IPA}} that generated even more absurd pronunciations of Old Church Slavonic terms; it was nuked with extreme prejudice. Benwing2 (talk) 23:07, 19 February 2024 (UTC)[reply]
Support. We don't need every phonetic process that happens and we should really nuke other phonetic transcriptions when possible. Vininn126 (talk) 08:17, 20 February 2024 (UTC)[reply]
@Vininn126 Not sure I agree with the second part. It depends at what level the phonetic transcription is. Sometimes on the contrary we need to nuke the phonemic version, if it's too abstract and misleading. Benwing2 (talk) 08:19, 20 February 2024 (UTC)[reply]
I suppose with certain transcriptions, sure. My point is more that we have far too many phonetic transcriptions where phonemic ones would be better. Vininn126 (talk) 08:30, 20 February 2024 (UTC)[reply]
Again I believe broad phonetic transcriptions are usually the best. For example, a purely phonemic transcription of Catalan would render all voiced fricatives (which are actually pronounced more like approximants, as in Spanish) as stops; but this would be quite misleading for the language learner. What I think you're complaining about is narrow phonetic transcriptions, which I agree are usually unnecessary. Benwing2 (talk) 08:40, 20 February 2024 (UTC)[reply]
I don't have the same objection for modern languages since there the phonetic details are a matter of fact. I've not really found any that seem unnecessarily detailed; a transcription like [ɡɐˈrɨ] for instance is far more useful to a learner than a phonemic /ɡoˈri/.
If what you're getting at though is that having phonemic and phonetic transcriptions side-by-side is unnecessary, that I can agree with. If we're already going to show [kaˈβ̞a.ʝo], why have a preceding /kaˈbaʝo/? Nicodene (talk) 19:34, 20 February 2024 (UTC)[reply]
I don't agree with that, and would like to see a three-tier system: a broad phonetic transcription, and for people interested in more, a narrow phonetic transcription + a phonemic one, which would be hidden by default. PUC 20:23, 20 February 2024 (UTC)[reply]
We don't actually disagree - I was referring to the default state of an entry. Putting additional information in a drop-down menu sounds like a good way of going about it. Nicodene (talk) 21:19, 20 February 2024 (UTC)[reply]
This could be a good system. Vininn126 (talk) 16:26, 21 February 2024 (UTC)[reply]
Phonemic transcriptions are useful for users to be able to infer idiolectal pronunciations of words: You can't show how every single speaker talks, but by giving a phonemic transcription a reader can apply a set of rules to determine it. Thadh (talk) 10:57, 22 February 2024 (UTC)[reply]
Support. MuDavid 栘𩿠 (talk) 00:58, 21 February 2024 (UTC)[reply]
Support, but Benwing makes a good point about final nasals; I was going to make the same comment myself. This, that and the other (talk) 08:39, 21 February 2024 (UTC)[reply]
A phonemic representation of ⟨-Vm⟩ as /Ṽː/ isn't given by any scholar that I'm aware of, and it would be contradicted by the fact that a consonant is evidenced by the various spellings of the tan durum type (= tam + durum), which show final [m] assimilated to [n] in contact with the initial consonant of the following word.
That leaves the matter of length. Does any source give phonemic representations like */dekeːm/ for decem? A blank implementation would run into mistakes like */reːm kʷeːm/ for rem, quem - contradicted by the Romance outcomes like French rien and Spanish quién. Nicodene (talk) 16:21, 21 February 2024 (UTC)[reply]
@Nicodene AFAIK final -m was pronounced as homorganic to the next consonant in cases like 'tam durum' as you mention, but as a nasal vowel when a vowel or nothing followed, as shown by elision in poetry. I thought that was universally accepted. Maybe this isn't phonemic but we run into the standard issue with phonemic representations found in so many languages, which is that there are typically multiple representations that make sense. Benwing2 (talk) 21:00, 21 February 2024 (UTC)[reply]
@Benwing2 It's not that I question that this occurred in general (though I suspect monosyllabic words were resistant), it's rather that I can't see it being phonemic. If we try to make tam = /tãː/ work, then in light of tan durum we have to tack on an allophonic rule like "nasal vowels in contact with a following plosive forwards-eject a homorganic nasal consonant", which is more or less the reverse of what all sources describe (final /-m/, retained as a consonant in that context, allophonically deleted in others). I don't know of anyone who gives Classical Latin as a language with phonemic nasal vowels. Nicodene (talk) 21:46, 21 February 2024 (UTC)[reply]
Mild oppose: Keep a phonetic transcription with some generally agreed-on allophones, like the nasal vowels and nasal assimilation. Though I guess what's generally agreed-on is a can of worms. Maybe there's a lot of disagreement these days. I'm not really up-to-date with the evidence on retracted or apical s in Latin (aside from it being in Old Spanish, Old French, Old Galician-Portuguese) and open-mid long vowels and such. — Eru·tuon 02:12, 22 February 2024 (UTC)[reply]
To the best of my knowledge, Latin /s/ as anything but [s] is a minority view in the scholarship; total exclusion of [s], either rare or non-existent. Low-mid /ē ō/ is Calabrese's pet theory, fringe amongst scholars but enjoying a cult following on the internet (thanks to a youtuber popularising it).
Above all, though, if you see some validity in either of those then a phonemic representation is to your advantage, since it accommodates them.
Maybe it's just me, but coda nasal assimilation is so common cross-linguistically that I take it as a given, upon encountering a new dialect/language, unless shown otherwise. I think once I found one where coda nasals were all automatically [ŋ]? Nicodene (talk) 09:35, 22 February 2024 (UTC)[reply]
@Nicodene I am with Erutuon here that we should show non-obvious phonetic information whenever possible when it's well-accepted, including in dead languages. We are aiming for language learners and I think it's doubtful it would be obvious that coda nasal assimilation would happen (since it tends not to happen in English), and there's no possible way they could figure out that a written /m/ is actually a nasal vowel before a pause or another vowel. I believe this strongly about living languages, and logically this extends to dead languages where the scholarship is sufficiently clear. All sorts of weird things happen in phonemic representations. E.g. one well-respected editor around here argued seriously that the consensus of modern scholarship is that Spanish fui pronounced [fwi] and muy pronounced [muj] have phonemic representations /fui/ and /mui/ respectively, which to me is completely bizarre because it requires a lexically-sensitive rule to determine whether to pronounce a given /ui/ as [wi] or [uj]. I'm coming more and more to the belief that we should ditch completely any pretense of generating purely phonemic pronunciations and present whatever will be most useful to the language learner. For Latin this might mean deviating from a pure phonemic representation only to the extent of indicating the actual pronunciation of -m (and maybe -ns-, if the -n- as nasal vowel is well-accepted). Or it could mean the same but also showing /l/ as [l] before <i> and <l> and [ł] elsewhere; again AFAIK some variant of this view is fairly universally accepted. Benwing2 (talk) 10:11, 22 February 2024 (UTC)[reply]
Believe it or not, even that is not straightforward, as there isn't a consensus that non-velarised /l/ was in fact [l] as opposed to, say, a somewhat palatalised [lʲ]. (-ns- on the other hand isn't controversial at all, I should say.)
Dead languages are very much a different beast altogether. It's trivial to find an academic source giving phonetic transcriptions for Spanish or Catalan on-par with the ones we have, some even more detailed. It is on the other hand impossible to find (and believe me I've tried) even one serious academic source which provides Latin transcriptions with anything like the level of allophonic detail you are describing. Not Allen, not Leumann, not Sen, not Cser... I'll take an actual transcription from the latter for illustration: [ĩːferus]. It does mark the nasalisation we've come to love and cherish, yet pretty much nothing else at all. And you won't find a more elaborate transcription than that in the entire study (link). If a modern work of 200+ pages dedicated to Latin phonology and morphology doesn't embark on these adventures, what are we doing here exactly?
And supposing we go broad to the extent of Cser: now the problem is that we're going to mislead anyone who knows basic IPA but isn't a specialist in Latin. If I saw, for a language I don't know, a phonetic transcription like [ĩːferus] - with that level of detail put into the first phone - I'd think the rest of the transcription is similarly accurate - yet, I know for a fact the rest is simply left in phonemic form. Meanwhile Sen, whenever he can, gives Classical Latin in phonemic transcriptions (as I am also suggesting we do). These are cutting-edge works discussing fine details of Latin phonology - by no means can either author be accused of, say, incompetence or laziness in that regard. It's simply that nobody does such a thing with a dead language, in any sort of academic context. Whimsically whilst daydreaming, maybe.
ETA: Why not just have the label 'Classical' link to the Wiki page that describes what scholars do or do not agree on, phonetics-wise? Nicodene (talk) 12:55, 22 February 2024 (UTC)[reply]

Replacement of the trill [r] with non-trill [ɹ][edit]

There seem to be cases in general American English where the voiced alveolar approximant /ɹ/ is transcribed as the trill /r/. For example, 'brand' is transcribed as /brænd/ instead of /bɹænd/. If this sounds right, our team at CUNY will start to make corrections. Cpeng2 (talk) 22:12, 20 February 2024 (UTC)[reply]

@Cpeng2: that seems uncontroversial to me. — Sgconlaw (talk) 22:32, 20 February 2024 (UTC)[reply]
Thanks, @Sgconlaw! Me and other team members @Yaejunmyung and @Your future self will start the cleaning soon. Cpeng2 (talk) 00:28, 21 February 2024 (UTC)[reply]
@Cpeng2 Not sure if you intend to cover it, but this also affects RP English. Theknightwho (talk) 00:34, 21 February 2024 (UTC)[reply]
@Sgconlaw: Except that the correct symbol for broad phonemic transcription is /r/ when there's only one rhotic. --RichardW57 (talk) 13:48, 21 February 2024 (UTC)[reply]
@RichardW57: in English? Can't say I'm very familiar with this. I'm just following what is specified in "Appendix:English pronunciation". Again, if it is thought there needs to be a change to the table, then it should be discussed on this page so that consensus should be reached. — Sgconlaw (talk) 13:54, 21 February 2024 (UTC)[reply]
@Sgconlaw It’s a long-established practice to use /ɹ/, and I’m not fully convinced that /r/ is correct even in broad transcription, even if it is used by some publications. Theknightwho (talk) 14:01, 21 February 2024 (UTC)[reply]
I'm not a big fan of /ɹ/ because I think using a special IPA letter rather than /r/ makes it less obvious that the transcription is broad. (A plain alveolar approximant occurs as a lenited allophone of /r/ in languages like Italian and Greek, but sounds pretty different from the English "r" sound. The definition of "ɹ" is actually pretty vague, so it isn't improper to use it for either sound, but I think using it for English r can potentially give an air of false precision.) Neither /r/ nor /ɹ/ is "incorrect" in a holistic sense though. Notation for English R is discussed in "The Articulatory Phonetics of /r/ for Residual Speech Errors", Suzanne E. Boyce (Semin Speech Lang. 2015 Nov;36(4):257-70). Boyce writes: "Linguists agree that the rhotic liquid of English is a single phoneme and that certain articulatory movements must occur for a typical acoustic profile and an acceptable percept to occur. In the International Phonetic Alphabet’s notation, this sound is represented by /ɹ/, which specifies that the sound is an approximant with a primary constriction at a point along the palate that may range from alveolar to postalveolar to palato-velar. The American phonetic tradition, which is followed by most clinicians, is to use the Roman alphabet symbol /r/" (page 258) and "although many textbooks refer to /r/ as having an alveolar place of articulation, it is more accurate to say that it has a relatively undefined “palatal” or “postalveolar” primary place of articulation. As noted previously, this is in fact the current stance of the international phonetic association for the IPA symbol /ɹ/" (page 261-262). In the 1999 Handbook of the IPA, Peter Ladefoged's chapter on "American English" presents "'tɹævəlɚ" as a "broad phonemic transcription" of traveler.--Urszag (talk) 14:12, 22 February 2024 (UTC)[reply]

Emoji pronunciations in English[edit]

I recently noticed that we now have an entry (complete with pronunciation) for one emoji: 🧢. I'm curious whether others think this is a good precedent. Kylebgorman (talk) 13:07, 21 February 2024 (UTC)[reply]

The entry is pretty explicit that it is for the use of the emoji as a rebus for the slang word cap. To my mind, this isn't much different from including other types of alternative spellings such as ur, i18n, and so on. As with those, it seems reasonable to include most forms of detailed information at the main entry to avoid duplication, although including a short definition may be convenient, and I guess it can be argued that including the pronunciation may be useful in cases where it isn't obvious from the spelling itself.--Urszag (talk) 13:50, 21 February 2024 (UTC)[reply]
@Urszag I don't feel that strongly about this but I think "conventionalized English names for emojis" is rather different from conventionalized abbreviations. For one it's not clear whether all or even most emoji have a conventional name. Kylebgorman (talk) 19:30, 21 February 2024 (UTC)[reply]
The citations don't attest conventionalized English names for the emoji 🧢. Rather, they show use of 🧢 as a written representation of the word "cap".--Urszag (talk) 19:47, 21 February 2024 (UTC)[reply]

Theknightwho changes to Dravidian tree[edit]

User:Theknightwho took it upon himself to rename and restructure the Dravidian family tree. A South Dravidian superfamily does not have unilateral support, and is in fact not supported in the Kolipakam (2018) computational models. Please undo these changes. @Benwing2, -sche, Chuck Entz, Mahagaja, Mnemosientje --{{victar|talk}} 18:09, 21 February 2024 (UTC)[reply]

Apparently Victar missed the edit summary, so I shall repeat it here:
  • Changing Dravidian to the three branch model (North, Central, South), with (former) South and South-Central changed to South I and South II, in a new superfamily called South. Although the consensus at User talk:AleksiB 1945#Proto-Dravidian entries wasn't universal, it was generally in favour and it's also the model all our entries currently follow anyway.
The only person who opposed was you, and you self-admittedly are "really only interested in Dravidian terms borrowed into Sanskrit", and given your endless obstructionism/bullshitting in the face of anything you disagree with and the fact that User:AleksiB 1945 has been waiting for months for these changes, I decided to bite the bullet and make the change. So no, I will not "undo these changes".
What Victar has also missed is that a decision on this issue had to be made at some point, because we couldn't create proto-languages for the major Dravidian subfamilies until we made a decision as to what "Proto-South Dravidian" referred to; this has also been pending for a while. Given that our entries de facto follow the model I changed it to, that seemed by far the most sensible choice, as it's no use having the data say one thing but all our entries say another. Theknightwho (talk) 18:17, 21 February 2024 (UTC)[reply]
Also I should note that I was the one who added the previous Dravidian model in the first place - it's not like it had some kind of longstanding consensus behind it. Theknightwho (talk) 18:28, 21 February 2024 (UTC)[reply]
Again, this is another example of you making executive decisions without starting a proposal discussion first. If you really feel your change is warranted, undo it and start a proposal. But you won't, just as you didn't revert the last things you were told to revert because rules and practices clearly don't apply to you. --{{victar|talk}} 20:39, 21 February 2024 (UTC)[reply]
@Victar As I have already explained, this change was made in order to close a thread relating to proto-languages for Dravidian subfamilies which has been open for a long while now. You failed to comment on that thread. In making that decision, I consulted past threads in which you were a minority dissenter, and also noted the de facto state of our entries. This has already been debated at length, as you well know, so me opening yet another proposal would serve absolutely no purpose.
This has already been explained to you, and no rules or practices have been broken in coming to that decision. Theknightwho (talk) 20:51, 21 February 2024 (UTC)[reply]

@-sche, I could use your experience in language families on this. We've been only reconstructing Proto-Dravidian and not any subfamily branches for the following reasons:

  1. Some of these families are not agreed upon and may simply be areal, such as North Dravidian, with some scholars believing Brahui to be its own genetic branch, and the superfamily South Dravidian being highly disputed. Please see Dravidian_languages#Classification.
  2. The differences between proto reconstructions of the families, in many cases, would be none, creating redundant reconstructions.
  3. In part for the reasons above, most sources only reconstruct Proto-Dravidian in their etymologies.

To sort reconstructions that are only found in a single branch into categories, we started using labels, like is done with Proto-West-Germanic and many other languages. This seems the safer and more collated route. Pinging @Pulimaiyi, Kutchkutch. --{{victar|talk}} 21:26, 23 February 2024 (UTC)[reply]

@Victar That isn’t true, and misrepresents what @AleksiB 1945 has told you, and indeed what can plainly be seen in our reconstruction entries. You've just been reverting anyone who tried to make entries for them.
The South Dravidian branch is not “highly disputed” - I can find absolutely no evidence of that. I can see that there are different views as to whether South or South-Central should be grouped together, but recent scholarship has (strongly) trended towards doing so. Please stop misrepresenting things.
Finally, if you disagreed with adding additional proto-languages, you had ample time to object in the thread which has been open for two months, but you did not do so. It’s bizarre that suddenly you find a pressing need to claim it’s wrong now, and in this way; I wonder what the reason could possibly be.
Being frank, your conduct in this thread seems like an attempt to circumvent the consensus of discussions you don’t like through trying to impose a new one by canvassing editors you believe are going to be predisposed to your views, and by crying foul play by misrepresenting the circumstances in which the change was made. That’s made very obvious by the fact you haven’t pinged by far the largest contributor in Dravidian languages, which is very obviously because you know he won’t agree with you. It’s unacceptable. Theknightwho (talk) 22:49, 23 February 2024 (UTC)[reply]
I also note you didn’t ping @Illustrious Lock, but you did seemingly find the time to move the Proto-South Dravidian I entries they made to Proto-Dravidian. How strange. Anyone would think you’re not interested in genuine consensus. Theknightwho (talk) 23:25, 23 February 2024 (UTC)[reply]

@Theknightwho: Although the details may need to be discussed further, I agree with the administrative changes proposed by the Dravidian editors. However, I am unable to make the administrative changes myself, since I am an involved administrator in the dispute. The labels at Module:labels/data/lang/dra-pro were intended to be a temporary measure until a more sustainable conclusion is reached. Kutchkutch (talk) 03:20, 24 February 2024 (UTC)[reply]

@Kutchkutch Thank you - yes, that was my impression as well, and Victar’s argument here makes little sense to me. Theknightwho (talk) 12:43, 24 February 2024 (UTC)[reply]
For some further context, the discussion which @Victar did not participate in was explicitly pointed out to him in User:Victar#Proto-Dravidian labels at MOD:labels/data/lang/dra-pro, so it's not as though he wasn't aware. He much prefers forcing through what he wants at the expense of the other Dravidian editors. Theknightwho (talk) 20:27, 24 February 2024 (UTC)[reply]

Writing a bot for surnames and relevant statistics.

See this diff for an example of the information being added. It's a "Statistics" heading for proper-noun surnames, with information like "According to the 2010 United States Census, Gullage is the 55,121st most common surname in the United States, belonging to 373 individuals. Gullage is most common among White (61.7%) and Black (34.0%) individuals."

From what I can tell, it's been done in an ad-hoc manner, but it's on over thirty thousand pages.

The data is from the 2010 Census surnames file (there's no 2020 file available), which contains 162,253 names: all names with more than 100 people having them.

I'd like to write a bot which will, for each of these names, look for an English proper noun section, add # {{surname|en}}. if not present, and add the "Statistics" section if not present with the relevant data, possibly with a citation footnote like:

“Frequently Occurring Surnames from the 2010 Census”, in 2010 US Census, US Census Bureau, 2019 February 14, retrieved 2024-02-22
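As a rough illustration of the bot described above, the following sketch builds the "Statistics" sentence from one census record and checks whether a page already has the surname definition and the Statistics heading. The record shape (`name`, `rank`, `count`, `groups`) and all function names are assumptions made for this example; they are not the column names of the Census file, and no wiki API is called.

```javascript
// Hypothetical helpers for illustration only; names and record shape are assumptions.

// English ordinal suffix for a positive integer (1 -> "st", 11 -> "th", 22 -> "nd").
function ordinalSuffix(n) {
  const lastTwo = n % 100;
  if (lastTwo >= 11 && lastTwo <= 13) return "th";
  switch (n % 10) {
    case 1: return "st";
    case 2: return "nd";
    case 3: return "rd";
    default: return "th";
  }
}

// Insert thousands separators without relying on locale data.
function withCommas(n) {
  return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

// Build the sentence in the style quoted in the proposal.
// `groups` is assumed to be the (two) most common groups, already selected.
function statisticsSentence(record) {
  const { name, rank, count, groups } = record;
  const rankText = withCommas(rank) + ordinalSuffix(rank);
  const groupText = groups
    .map(g => `${g.label} (${g.percent.toFixed(1)}%)`)
    .join(" and ");
  return `According to the 2010 United States Census, ${name} is the ${rankText} most common surname ` +
    `in the United States, belonging to ${withCommas(count)} individuals. ` +
    `${name} is most common among ${groupText} individuals.`;
}

// Report which of the two additions a page's wikitext still lacks.
function missingParts(wikitext) {
  return {
    surnameDefinition: !/\{\{surname\|en[|}]/.test(wikitext),
    statisticsSection: !/^=+\s*Statistics\s*=+\s*$/m.test(wikitext),
  };
}
```

A real bot would additionally have to locate the English proper noun section before inserting anything, which is where the concern raised in the replies (the same spelling having a non-name entry) would need handling.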

I also found this Nature article about names which links to a large dataset, but it doesn't include frequency information and is just a very large list.

Is this data too encyclopedic, viz., WT:NOT? I see that it's in a lot of places, and it might be worthwhile to automate it, as well as to import a lot of names. Thoughts? This would be my first attempt at a bot, though I have plenty of programming experience. grendel|khan 16:51, 22 February 2024 (UTC)[reply]

@Grendelkhan I would like it, personally. More information like this seems good. The only thing is to be careful with automatically adding the definition, because the same spelling could have a form that's not a name, and then you wouldn't know where to put it. But I think it's a good idea. Whilst you're at it, could you also convert any usages you do find to use {{surnames-us-census}}? We had a discussion on that topic a few months ago, and I wanted to do it, but I've forgotten and it's fallen by the wayside. If you're interested, I would definitely support it :) Kiril kovachev (talkcontribs) 22:54, 22 February 2024 (UTC)[reply]
I'm not a huge fan of these, because it seems like trivia, and not particularly interesting trivia at that (beyond the most common names). Plus, it means that entries for surnames which are really rare in the US but common elsewhere end up being dominated by information about the US. Yeah, we could add census info for other countries as well, but that soon starts getting out of hand, as it's beyond the scope of a dictionary. Theknightwho (talk) 23:16, 22 February 2024 (UTC)[reply]
This is true, maybe we should limit it to a certain range so that we don't have names like the 10,000th most common name or whatever. I believe we previously removed a lot of those which had literally only one bearer, so we should also be careful not to add them back in. Kiril kovachev (talkcontribs) 21:28, 23 February 2024 (UTC)[reply]
I'm sure we had a whole discussion about the inclusion of this surname information elsewhere, which probably ended up without a consensus. — Sgconlaw (talk) 18:06, 25 February 2024 (UTC)[reply]
@Sgconlaw We did (at RFD), and it did. Theknightwho (talk) 13:35, 27 February 2024 (UTC)[reply]
@Theknightwho: right, so …? — Sgconlaw (talk) 15:30, 27 February 2024 (UTC)[reply]
Idea: could this data be hosted at Wikidata and dynamically read from there instead? —Fish bowl (talk) 23:51, 22 February 2024 (UTC)[reply]

Old Dutch lowercase toponyms

There are a bunch of Old Dutch toponyms entered using an initial lowercase letter and classified as nouns rather than proper nouns, e.g. ganipi = Gennep and budilio = Budel. These entries seem to have been created by User:Rua. This seems very strange to me, and the references given for these terms don't appear to support the usage of lowercase here. Any objections to renaming these with an initial capital letter and reclassifying as proper nouns? Benwing2 (talk) 22:02, 22 February 2024 (UTC)[reply]

Seems perfectly sound to me. Without a doubt they should at least be proper nouns, but since the references seem to cite the capitalized version, they should just be capitalized too in my opinion. Kiril kovachev (talkcontribs) 21:26, 23 February 2024 (UTC)[reply]
If the refs don't support it, spell it like they do. CitationsFreak (talk) 23:29, 23 February 2024 (UTC)[reply]
Is a capital letter used in the original contemporary source? —Rua (mew) 11:36, 27 February 2024 (UTC)[reply]
@Rua AFAICT it is, see the citations in [7] for example. But keep in mind that we capitalize Classical Latin proper names and toponyms despite there being no capitalization in the source (because the source didn't have any lowercase letters). Benwing2 (talk) 20:07, 27 February 2024 (UTC)[reply]
Why though? —Rua (mew) 17:59, 29 February 2024 (UTC)[reply]
Why do we capitalize Latin proper names? This is getting far afield of the issue but it's conventional to normalize ancient-language text, e.g. we separate U and V in Latin even though the source didn't do that, we add diacritics consistently to Ancient Greek text even though the source didn't always do that, etc. I don't know why I'm even explaining this to you; you already know it. Benwing2 (talk) 00:02, 1 March 2024 (UTC)[reply]

Wiktionary: a valuable tool in language preservation

https://diff.wikimedia.org/2024/02/23/wiktionary-a-valuable-tool-in-language-preservation/ Justin (koavf)TCM 14:52, 24 February 2024 (UTC)[reply]

New Spanish-Language Dictionary Template

Hello guys. I just made my first edit in en.Wiktionary and I'd like to share it with you. It's a new template for a Spanish-language dictionary. It's the Diccionario del Español de México (Dictionary of the Spanish of Mexico).

Template:R:es:DEM

I recommend using it instead of/along with Template:R:es:DRAE for words related to Mexico and Mexican culture as they are usually deeper or more accurate than DIRAE's.

“ocote”, in Diccionario del español de México, Segunda edición, Academia Mexicana de la Lengua, 2019

JaimeDes (talk) 15:41, 24 February 2024 (UTC)[reply]

Muchisimas gracias, but why does it link to an article at en.wp that doesn't exist? Were you trying to link to es.wp? —Justin (koavf)TCM 17:48, 24 February 2024 (UTC)[reply]
I didn't notice that. I just modified DRAE's template. In that case, I have homework for this weekend: I'll create the article. Thanks for mentioning it! JaimeDes (talk) 22:14, 24 February 2024 (UTC)[reply]
It's always good to have more references. I think this could be added to entries where it includes senses the DRAE lacks, such as campechana, caguama, and chocolate, or where it has an entry the DRAE does not, such as chípil. I'm not convinced the senses are that much better on ocote that it should *replace* the DRAE. Similarly, popote, huipil, and ejote don't seem much better in DEM vs DRAE. JeffDoozan (talk) 19:28, 24 February 2024 (UTC)[reply]
I do agree that it was a bad example; it seems that I picked the worst example possible without noticing. But I think you got the right idea about the DEM. JaimeDes (talk) 22:12, 24 February 2024 (UTC)[reply]

Two proposals concerning entries’ Tables of Contents (TOCs)

For those using the default Vector legacy (2010) skin (e.g., all unregistered users of this site), TOCs take up a lot of initial vertical space in entries with many headers. Two proposals to mitigate that follow. They derive from discussion in the section #Use of T:lang, above.

Proposal 1: Limit TOCs to displaying only level-2 and -3 sections by default

The template {{TOC limit}}, when transcluded on a page, limits which sections are displayed in the TOC for that page. The template defaults the limitation to level-2 and -3 sections (i.e., those whose headers are generated by bookended double [==] and triple [===] equals signs), although that can be varied by calling |1= or |limit= with any number besides 3. See bar for an example of a page that transcludes {{TOC limit|2}}, thereby limiting the page's TOC to displaying level-2 sections (i.e., language sections) only.

I propose that {{TOC limit}}'s default limitation should be made the default limitation to all TOCs (instituted at a fundamental level, not requiring the use of {{TOC limit}} on every page).

Rationale: The way TOCs are generated seems to have been designed with Wikipedia in mind. Wikipedia articles have few section headers relative to Wiktionary pages. Wikipedia TOCs are well suited to navigating their articles, whereas Wiktionary TOCs that include links to level-4, -5, and deeper sections are simply unwieldy. I propose the limitation to level-2 and -3 headers by analogy with print dictionaries' headwords, which usually give a term's part of speech (usually abbreviated, e.g. to n., sb., a., v., adv., vel sim.) and, in cases of homography, a numeral to differentiate terms with different etymologies. This limitation would result in the display of language sections, etymology sections, and (other than in cases of homography) pronunciation and part-of-speech sections, as well as some other sections (Alternative forms, Anagrams, etc.), but the hiding of other sections. (FWIW, I would happily see the limitation be to level-2 sections [language sections] only, à la bar, but I thought this proposal would be a more moderate change.)

This change would also improve the usability of TOCs for users of the Vector (2022) skin. 0DF (talk) 02:18, 25 February 2024 (UTC)[reply]
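To make the effect of the proposed default concrete, here is a minimal sketch that parses wikitext headings and keeps only those at or above a given level, which is the filtering that {{TOC limit}} is described as producing. It is an illustration written for this discussion, not how MediaWiki or the template is actually implemented; `parseHeadings` and `limitToc` are hypothetical names.

```javascript
// Illustrative only: hypothetical helpers, not MediaWiki's or {{TOC limit}}'s implementation.

// Parse wikitext headings such as "==English==" or "====Noun====" into {level, title}.
function parseHeadings(wikitext) {
  const headings = [];
  for (const line of wikitext.split("\n")) {
    // The same run of "=" must open and close the heading (backreference \1).
    const m = line.match(/^(={2,6})\s*(.+?)\s*\1\s*$/);
    if (m) headings.push({ level: m[1].length, title: m[2] });
  }
  return headings;
}

// Keep only headings whose level is at most `limit` (3 is the proposed default).
function limitToc(headings, limit = 3) {
  return headings.filter(h => h.level <= limit);
}
```

With the default of 3, an entry's language and etymology (or part-of-speech) headings stay in the TOC while level-4 and deeper headings such as Declension are dropped; passing 2 keeps language sections only, as on bar.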

Discussion

Personally, I oppose this, as it makes the TOCs useless (people like me can't use them to navigate to the specific sections they're interested in any more); I even dislike and am inconvenienced that we suppress lower-level headings on extremely long pages like a, although I tolerate it because I recognize that other people dislike extremely long TOCs and don't want to have to click the "hide" button to collapse them. - -sche (discuss) 21:10, 25 February 2024 (UTC)[reply]

@0DF: I want to see L4 headers (such as 'Declension'), even when they're at Level 5. --RichardW57 (talk) 22:19, 25 February 2024 (UTC)[reply]

A different Proposal 1b: On pages with an excessive number of L2 language sections, it is impossible to view the L3 sections; see, for example, A and te. My proposal is User:Sarri.greek/notes#TOC_hor_limit2, to be created by something like Module:User:Sarri.greek/toc2-hor, called from a template like Template:User:Sarri.greek/toc2-hor, with appropriate CSS at Template:User:Sarri.greek/toc2-hor/style.css to make it horizontal, but I do not know how to do it. I have been begging programmers to fix it. For other horizontal layouts, see the ToCs of the Chinese Wiktionary, such as A; the Vietnamese Wiktionary also has horizontal ToCs (with all sublevels). For fewer L2s, or if all levels should be shown, see #Proposal_2b below and User:Sarri.greek/notes#For_few_languages. ‑‑Sarri.greek  I 23:09, 25 February 2024 (UTC)[reply]
@Sarri.greek: I think you meant to link to zh:A, rather than zh. I like the way 维基词典 (Wéijī Cídiǎn) formats its TOCs. The default Vector(2022) skin shows just the language sections, each with an arrow to click for the drop-down submenu for that language section's headers. By contrast, the 旧版Vector(2010) skin shows all the sections, presented in two columns, with the language sections in the left column and all the subordinate sections in the right column, each one separated from the next by one standard space followed by • (a bullet, specifically a non-selectable one identical to that generated by a line-initial asterisk on this site). Let's consider how well that latter format could be applied to the English Wiktionary. Consider Wéijī Cídiǎn's page for A, for example. Besides an {{also}} transclusion, a couple of {{character info}} boxes, and, notably, a transclusion of {{TOC limit|2}} (although it doesn't seem to make a difference), it comprises 204 language sections, the longest-named of which are the two ten-character ones, 帕胡納爾-阿舍寧卡語 (Pàhúnàěr-āshěníngkǎyǔ?, “Ashéninka Pajonal”) and 塞爾維亞-克羅地亞語 (Sài’ěrwéiyà-kèluódìyàyǔ, “Serbo-Croatian”). Of the names of the 271 sections in the right column, 239 comprise two characters, 22 comprise three characters, and 10 comprise four characters; the longest non-language section names are only four characters long each. As a result, the widest entry in that page's TOC is the first, the 跨語言 (kuàyǔyán, “translingual word”) section, which looks like this:
1 跨語言         詞源1 •詞源2 •圖片 •另見 •拓展閲讀 •參考資料
If we were to apply the same format to the TOC line for an identical entry with section titles translated into English (with intercolumnar spacing based on that of the longest language-section name in en:A, namely “Kalo Finnish Romani”), it would look like this:
1 Translingual      Etymology 1 •Etymology 2 •Image •See also •Further reading •References
But, for readability, I think this would be better:
1 Translingual      Etymology 1 • Etymology 2 • Image • See also • Further reading • References
That looks OK, I think. Better than what we have at the moment.
@-sche, RichardW57: How do you feel about this “Proposal 1b”? 0DF (talk) 20:07, 27 February 2024 (UTC)[reply]
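The two-column layout discussed in this comment can be sketched as a small formatting function: pad each language name to the width of the longest one, then list its subsections separated by spaced bullets. This is only a plain-text mock-up under assumptions; `formatTocRows`, the `gap` parameter, and the input shape are hypothetical, and string length is counted in UTF-16 code units rather than rendered width, so it would not align mixed-width scripts correctly.

```javascript
// Plain-text mock-up only; function name, `gap`, and input shape are assumptions.

// sections: [{ language: "Translingual", subsections: ["Etymology 1", ...] }, ...]
function formatTocRows(sections, gap = 2) {
  // Width of the left column is set by the longest language name.
  const width = Math.max(...sections.map(s => s.language.length));
  return sections.map((s, i) => {
    const label = `${i + 1} ${s.language.padEnd(width + gap)}`;
    // Bullets are surrounded by spaces, as in the more readable variant above.
    return label + s.subsections.join(" • ");
  });
}
```

On a real page the same arrangement would more likely be produced with CSS than with padded text; the sketch only shows how the row content is assembled.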
@0DF: It lacks section numbers, which are important for navigation. --RichardW57 (talk) 22:34, 27 February 2024 (UTC)[reply]
@RichardW57: How are section numbers "important for navigation"? They aren't included in URLs to specific sections. 0DF (talk) 01:07, 28 February 2024 (UTC)[reply]
@0DF, full numbering (e.g. 2.1, 2.2, 3, 3.1) is indispensable at Wiktionaries (unlike Wikipedias) because our section titles are repetitive; especially in subsections of multiple etymologies, the numbers help the brain navigate! (Of course, positions may change in an electronic dictionary, but still... I find them so helpful!) Thank you. ‑‑Sarri.greek  I 02:35, 28 February 2024 (UTC)[reply]
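For reference, the full numbering mentioned here (2.1, 2.2, 3, 3.1) can be derived from heading levels alone. The sketch below is one way to do it, assuming headings arrive in page order as `{level, title}` objects; `numberSections` is a hypothetical name and this is not taken from MediaWiki's own TOC code.

```javascript
// Illustrative sketch; `numberSections` and the input shape are assumptions.

// headings: [{ level: 2, title: "English" }, { level: 3, title: "Etymology 1" }, ...]
function numberSections(headings) {
  const stack = [];    // heading levels of the currently open ancestors
  const counters = []; // counters[d] = running number at nesting depth d
  return headings.map(h => {
    // Close every open section that is at the same level or deeper.
    while (stack.length && stack[stack.length - 1] >= h.level) stack.pop();
    stack.push(h.level);
    // Forget counters of deeper levels, then bump the counter at this depth.
    counters.length = stack.length;
    counters[stack.length - 1] = (counters[stack.length - 1] || 0) + 1;
    return { number: counters.join("."), title: h.title };
  });
}
```

Because repeated titles such as "Noun" receive distinct numbers (1.1.1 versus 1.2.1), the numbering disambiguates them in a way the titles alone cannot.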
@Sarri.greek: I can't say I really get it, but OK. I suspect that neither of these proposals are going to go anywhere. 0DF (talk) 03:03, 28 February 2024 (UTC)[reply]
I think they should, @0DF. {alert/attn|bureaucrats}} The state of pages like A and te tells me, as a reader, that en.wiktionary does not care at all how its ToCs look. Why is this? There are 3-4 desirable styles: for too many languages, for many languages, for juxtaposed languages, for few languages. And if the __TOC__ is ever taken away (as it is forbidden in Vector2022), we should have a Wiktionary‑built ToC, because the structure of the contents is the responsibility of the editor, not the publisher. I hope, 0DF, that your proposals ignite some interest. Thank you. ‑‑Sarri.greek  I 07:55, 28 February 2024 (UTC)[reply]
@Sarri.greek: Thank you for your support. I admit to feeling quite disappointed at the largely negative response to these proposals. I may write a third proposal at some point, accommodating the various criticisms of the first two proposals that have been made in this discussion. In the meantime, however, I owe you and others responses in Wiktionary:Requests for moves, mergers and splits#Medieval Greek from Ancient Greek and elsewhere, so I won't hurry back to this. 0DF (talk) 22:32, 16 March 2024 (UTC)[reply]

Proposal 2: Display the contents of all TOCs in columns

The template {{col-auto}} is widely used in entries here to sort lists of terms (in sections such as Related terms and Derived terms) into neat columns.

Such presentation saves a lot of vertical space in entries that would otherwise be taken up by single unnecessarily long (but narrow) columns. That columnar arrangement can also be applied to TOCs. For a ready-made example approximating what that would look like, courtesy of This, that and the other, see User:This, that and the other/columnar toc mockup#using columns. Alternatively, you can see what that would look like on any page of your choosing. To do so, go to the page you want and, once there, open the Console tab and paste this code (courtesy, again, of This, that and the other) into it:

document.querySelector('.toc').style.display = 'flow-root'; document.querySelector('.toc > ul').style.columnWidth = '20em';

To open the Console tab, press either F12, Fn + F12, or Ctrl + ⇧Shift + I. Alternatively, if you're using MacOS and wish to navigate menus, go to View → Developer → JavaScript Console (on Chrome) or go to Tools → Browser Tools → Browser Console (on Firefox). Thanks go to This, that and the other for the keyboard shortcuts and to Benwing2 for the MacOS menu directions.
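The one-liner above throws an error on pages where no element matches `.toc` (for instance, a page without a TOC). A slightly more defensive variant, wrapping the same two style changes in a function, might look like the following; the function name `applyColumnarToc` and its `columnWidth` parameter are introduced here for illustration and are not part of the original snippet.

```javascript
// Same two style changes as the console one-liner, with null checks.
// `applyColumnarToc` is a hypothetical name introduced for this example.
function applyColumnarToc(doc, columnWidth = "20em") {
  const toc = doc.querySelector(".toc");
  const list = doc.querySelector(".toc > ul");
  if (!toc || !list) return false; // no TOC found on this page or in this skin
  toc.style.display = "flow-root";      // let the TOC box grow to contain its columns
  list.style.columnWidth = columnWidth; // the browser fits as many columns as the width allows
  return true;
}

// In a browser console on a Wiktionary page, one would call:
//   applyColumnarToc(document);
```

Passing the document in as an argument keeps the function free of global references, so a different column width can be tried by calling, for example, `applyColumnarToc(document, "15em")`.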

I propose that all TOCs display their contents in columns (using This, that and the other's code or code with equivalent effect).

Rationale: As they are, TOCs waste a lot of otherwise unused or little-used horizontal space at the top of pages. This change would save vertical space (which is used) by using that horizontal space (which is not used currently). Just as there is no controversy about the use of {{col-auto}}, I foresee no controversy about columnar presentation in TOCs. 0DF (talk) 02:18, 25 February 2024 (UTC)[reply]

Discussion

  1. Is there any particular problem with right-hand-side ("RHS") display (by default) of the table of contents, at least on personal computers?
  2. Will implementing either of these default options impinge on RHS display selected through gadgets or preferences?
  3. Is it possible to enjoy the benefits of proposal 1 (fewer levels in ToC) with RHS display?

Obviously, I really like RHS display and have been surprised that we don't have more who opt for it. DCDuring (talk) 16:07, 25 February 2024 (UTC)[reply]

@DCDuring:
 1. It pushes down images, any transclusions of {{character info}}, {{examples}}, {{wikipedia}}, etc., and any other RHS objects. It causes basically the same problems as left-hand-side TOCs, but to a subset of a page's contents, rather than the whole thing.
 2. Good question. Can you answer this, please, This, that and the other?
 3. It should be. What does the TOC look like for you on bar? If that page shows a RHS TOC displaying only language sections, the answer to your question is "yes".
My concern with these proposals is improving the experience of unregistered users, who are stuck with defaults. 0DF (talk) 17:30, 25 February 2024 (UTC)[reply]
The proposals still force down all content, eg. etymology, pronunciation, definitions, whereas RHS ToC forces down items that are less essential, less common ({{examples}}), or easily relocated (sister-project boxes).
I don't see how LHS ToC is better than RHS ToC for unregistered users.
RHS ToC displays well at bar. DCDuring (talk) 17:52, 25 February 2024 (UTC)[reply]
@DCDuring: The default is the LHS TOC. Whether the RHS TOC is better is outside the scope of this proposal, in the same way that whether the Vector (2022) skin is better is outside the scope of this proposal. 0DF (talk) 18:18, 25 February 2024 (UTC)[reply]
I am making an argument against the proposal on the grounds that there is a superior alternative. DCDuring (talk) 18:21, 25 February 2024 (UTC)[reply]
@DCDuring: Then write your own proposal in favour of making RHS TOCs the default. Don't hijack this proposal; all you'll achieve by doing so is sabotaging this attempt at improving things. 0DF (talk) 18:37, 25 February 2024 (UTC)[reply]
Proposal 2b for vertical ToCs. Example at User:Sarri.greek/notes#For_few_languages. TOC vertical-L2 with horizontal L3, 4, 5, etc. and TOC horizontal L2 with vertical L3,4,6. For an excessive number of L2 languages, see my #Proposal_1b as in User:Sarri.greek/notes#TOC_hor_limit2. These would be possible if a programmer undertakes the burden of creating modules and CSS that handle various styles of ToCs. I also think that the style and placing of ToCs is at the discretion of editors, regardless of which skin is around the bodytext, which may or may not produce an automatic TOC. Discussed with WMF at wikt:el:Wikiacademy/2023Vector#Modifications?. I truly hope that en.wiktionary could come up with multiple solutions and styles. Thank you ‑‑Sarri.greek  I 23:55, 25 February 2024 (UTC)[reply]
Ideally, I would want items in the menus individually collapsible. That way you could have the 1st-level only version, but click a control on the right of an item to view all of its sub-items. I have my doubts as to whether it can be done with the standard browser toolkit, but I can dream... Chuck Entz (talk) 00:02, 26 February 2024 (UTC)[reply]
@Chuck Entz: many different styles should be available. For 2 years now, at enWP, on the pages of module programmers, and at en.wikt, I have been asking for someone to please create something like wikt:el:Module:toc-test and wikt:el:Πρότυπο:toc-test, but with output like the manual wikt:el:Πρότυπο:test-ol. (I know nothing about Lua; I cannot do it.) ‑‑Sarri.greek  I 00:11, 26 February 2024 (UTC)
@Chuck Entz: individually collapsible too, as at fr.wiktionary (https://fr.wiktionary.org/wiki/table?useskin=vector), and fully numbered too. ‑‑Sarri.greek  I 00:15, 26 February 2024 (UTC)[reply]

"Tasmanian language"

Category:Tasmanian language is not a single language. There were an unclear number of distinct languages spoken on the island prior to the genocide of its native inhabitants, and all we have of them are wordlists in a wide range of mostly defective orthographies. For example, one word for "buttock" is variously transcribed as <leen.her>, <leieena>, <leng.in.ner>, and <liengana>. Many of the wordlists also mix data from different locations. One wordlist supposedly of Tasmanian even turned out to be Kaurna, a mainland Australian language.

I am not sure what to do with the data from Tasmanian languages, but clearly a situation like binearrenerepare, where itho, meener, munger, and nomemene are all given as "synonyms" of this first-person singular pronoun, is not ideal.--Saranamd (talk) 18:03, 25 February 2024 (UTC)[reply]

My stance is - if we don't even know what it is, better either not record it (always a good solution), or, if someone burns with desire to do so, put it under CAT:Undetermined lemmas. Thadh (talk) 18:33, 25 February 2024 (UTC)[reply]
Not recording something is the worst solution. We should check which lemmas go with which lang, and put them there, or put them in Undetermined Lemmas if we can't. CitationsFreak (talk) 06:38, 26 February 2024 (UTC)[reply]
Hard agree - I strongly oppose any situation where we can't record something because past data is imperfect (unless there is genuine reason to doubt it even existed at all). It only serves to erase its speakers from history even more than they already have been. Theknightwho (talk) 15:44, 26 February 2024 (UTC)[reply]
Nothing is erased from history, there are plenty of papers and books on the subject. "All words in all languages" does not mean we are obligated to document everything ever recorded by anyone. Thadh (talk) 15:59, 26 February 2024 (UTC)[reply]
@Thadh We can qualify entries by stating the limitations of any sources. Simply refusing to record anything that's imperfect or uncertain undermines the whole project. Theknightwho (talk) 16:49, 26 February 2024 (UTC)[reply]
There is a difference between "imperfect" and "we don't even know what this is". We shouldn't record the Voynich manuscript or Linear A or some Uugawoogan-English wordlist published in 1630 either. Thadh (talk) 16:51, 26 February 2024 (UTC)[reply]
@Thadh The first two examples are untranslated, while the third is not. The only reason we don't know what Linear A and the Voynich manuscript are written in is because they haven't been deciphered; whereas in the third example, you just don't think the source is very good. Completely different situations, and the third clearly falls under "imperfect". Theknightwho (talk) 17:21, 26 February 2024 (UTC)[reply]
It's not about not being good, it's about not correlating to the reality we currently witness. We can't identify the language with any one language that exists now and we don't have a good corpus to give us multiple accounts of this language, and until we can or do I don't think it makes sense for us to record these. Thadh (talk) 17:31, 26 February 2024 (UTC)[reply]
Indeed, we broke up "Tasmanian" and created codes for more specific Tasmanian languages (following the literature) several years ago, although I can't find the discussion at the moment (linked in the 2020 discussion that follows), and in 2020 the ISO followed our suit and retired xtz and split it into codes for specific languages, so we were even able to upgrade our exceptional codes (e.g. aus-pee) to ISO codes (e.g. xpw). AFAIR Category:Tasmanian language only still exists because we didn't have time to fully clean up every occurrence of xtz and then retire the code; if someone can clean up the last few entries in that category (binearrenerepare, bo, itho, lia, meener, munger, narrar, nomemene), ideally assigning them to the relevant more specific codes (see the "in 2020" link for a list), that'd be great. "Tasmanian" translations in water and one also need to be changed or removed. - -sche (discuss) 21:05, 25 February 2024 (UTC)[reply]
@-sche Thanks for this. binearrenerepare is now Port Sorell, itho is Pyemmairre, meener is Paredarerme, and munger is Peerapper in line with Crowley & Dixon 1981. I do not have access to the cited source and Crowley & Dixon do not give narrar, nomemene, or bo as attested Tasmanian 1SG pronouns so not sure where they come from.--Saranamd (talk) 16:44, 26 February 2024 (UTC)[reply]
nomemene+Tasmanian gets no Google Books hits and almost no Google web hits; I've RFVed it. Narrar is piped so that it links to "he" but displays "I", which does not give me confidence in the person who created the entry; it is mentioned in Henry Ling Roth, John George Garson, The Aborigines of Tasmania (1899), page 184, as the personal pronoun corresponding to "he, she" and has a parenthetical "Norman" after it; I'll RFV it too, and bo. - -sche (discuss) 18:29, 27 February 2024 (UTC)[reply]
@Saranamd I believe all entries and all uses of xtz have been dealt with, so I'm removing the code from the module. (Revert if this causes issues.) Can you assign lia/liya (the term for water, which we had an entry for at [[lia]]) to the relevant languages? It seems to be used in quite a few but I don't have access to C&D at the moment to check exactly which. I also removed marra as a translation of "one" if you can rescue it; beyond that, the only other mention of xtz was in boobialla (if you happen to know what language that term is from). - -sche (discuss) 23:20, 28 March 2024 (UTC)[reply]

Lojban cleanup again

@AugPi who has worked on Lojban and is still active. I would like to de-Lojbanize the grammatical terminology here on Wiktionary. I partly did this before; you can see for example, that Appendix:Lojban/vo'i has its header specified as ==Particle== instead of ==Cmavo==, as it did previously. But the template that defines the headword is still {{jbo-cmavo}} and it uses the Lojban-only POS cmavo. Furthermore, the categories of this term (besides Category:Lojban lemmas) are Category:Lojban cmavo, Category:Lojban cmavo of selma'o KOhA and Category:Lojban pro-sumti, and the definition of this term is as follows:

Repeats the x3 sumti of the main bridi of the current sentence.

To me, this reads as total gobbledygook, and I suspect 99.9% of those who come across this definition will have no idea what it means.

As a general rule, we don't use native grammatical terms in Wiktionary entries, but map them to the closest English terms, and Lojban should be no exception despite being a rather unusual language. I would like to do a more thorough cleanup where we replace Lojban grammatical terms with English ones everywhere. For example, the definition could be reworded something like this:

Repeats the third argument of the main clause of the current sentence.

assuming that argument is a good translation of sumti and clause of bridi. If necessary we can include the Lojban terms in parens following the English term, like this:

Repeats the third argument (sumti) of the main clause (bridi) of the current sentence.
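The rewording shown above is mechanical enough to sketch as a text substitution. The example below uses only the two equivalences tentatively suggested in this post (sumti as "argument", bridi as "clause") plus an ordinal for the place marker (x3 becoming "third"); the function name, the `keepLojban` flag, and the lookup tables are assumptions for illustration, and the real term list would come from whatever mapping is eventually agreed.

```javascript
// Sketch only: mappings follow the tentative suggestion in the post; names are hypothetical.
const ENGLISH_FOR = { sumti: "argument", bridi: "clause" };
const ORDINALS = { 1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth" };

// Replace Lojban grammatical terms in a definition with English ones.
// With keepLojban = true, the Lojban term follows in parentheses.
function englishDefinition(definition, keepLojban = true) {
  // One pass over the text, so inserted "(sumti)" glosses are never re-matched.
  return definition.replace(/\b(?:x([1-5]) )?(sumti|bridi)\b/g, (match, place, term) => {
    const ordinal = place ? ORDINALS[place] + " " : "";
    const english = ENGLISH_FOR[term];
    return keepLojban ? `${ordinal}${english} (${term})` : `${ordinal}${english}`;
  });
}
```

Applied to the definition of vo'i quoted earlier, this yields the two rewordings shown above, depending on whether `keepLojban` is set. Terms outside the lookup table are left untouched, so definitions using other Lojban terminology would still need manual review.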

User:AugPi, can you help me compile a list of Lojban terms and their best English equivalents? Feel free to ping anyone you know who works on or has worked on Lojban.

For the terms needing translation, we can start with those that have infected Module:headword/data:

1. For lemmas:

cmavo
cmavo clusters
cmene
fu'ivla
gismu
lujvo

2. For non-lemma forms:

rafsi

3. Other terms appearing in categories are:

brivla
selma'o
fu'ivla cmene
lujvo cmene
pro-bridi
pro-sumti
sumti tcita

4. Other terms in Category:jbo:Grammar are:

bridi
gadri
jufra
selbri
sumtcita (=sumti tcita?)
tanru

Thanks! Benwing2 (talk) 03:39, 29 February 2024 (UTC)[reply]

@Benwing2 I support Englishing the terminology, and I support putting the Lojban terminology in parentheses as you mention ("the third argument (sumti) of the main clause (bridi) of..."). I don't see the need to change anything in Category:jbo:Grammar, it seems to be correctly defining those Lojban terms; what am I missing? (Or were you not proposing to change anything in the category, just saying that the terms in the category are not terms that should be used in entries?) Prior discussions, for other people's reference: Wiktionary:Beer_parlour/2013/March#POS_labels_and_different_languages, Wiktionary:Beer_parlour/2022/January#Lojban_cleanup. - -sche (discuss) 23:30, 28 March 2024 (UTC)[reply]
@-sche Thanks. Unfortunately User:AugPi hasn't responded; they are moderately active but maybe not working on Lojban any more so maybe they don't care. What I am proposing is not just cleaning the definitions but also renaming the categories, so that e.g. Category:Lojban cmavo becomes Category:Lojban particles. You are right that nothing needs to change in Category:jbo:Grammar; I just mentioned this as a source of definitions for terms that might appear in Lojban categories and/or definitions. Benwing2 (talk) 23:39, 28 March 2024 (UTC)[reply]

Navajo category renames

Does anyone here at Wiktionary still work on Navajo these days? I am planning on renaming the Navajo verb categories to be more standard. In particular:

  1. Navajo verbs with prefix foo- will become Navajo terms prefixed with foo-; e.g. Category:Navajo verbs with prefix shó- -> Category:Navajo terms prefixed with shó-
  2. Navajo verbs with foo prefix bar- will become Navajo terms prefixed with bar- (foo); e.g. Category:Navajo verbs with disjunct prefix ʼá- -> Category:Navajo terms prefixed with ʼá- (disjunct)
  3. Category:Navajo terms with emphatic infix -x- -> Category:Navajo terms infixed with -x- (emphatic)
  4. Navajo verbs with classifier -foo- will become Navajo terms prefixed with foo- (classifier); e.g. Category:Navajo verbs with classifier -∅- -> Category:Navajo terms prefixed with ∅- (classifier)
  5. Category:Navajo verbs with peg element yi- -> Category:Navajo terms prefixed with yi- (peg element)
  6. Navajo verbs with postpositional prefix -foo will become Navajo terms prefixed with foo- (postposition); e.g. Category:Navajo verbs with postpositional prefix -aʼ -> Category:Navajo terms prefixed with aʼ- (postposition)
  7. Mainspace entries for terms that are prefixes but don't have a following hyphen, or do have a preceding hyphen, will be renamed.
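The category renames listed above can be sketched as a small rule table in Python. This is only an illustration under stated assumptions: the names `normalize_prefix`, `RULES` and `rename_category` are hypothetical, the regular expressions are inferred from the examples in the list, and this is not the script actually used for the renames. It covers items 1 to 6 only; item 7 (mainspace entries) is not modelled.

```python
import re


def normalize_prefix(affix: str) -> str:
    """Hypothetical helper: drop any surrounding hyphens, then add one trailing hyphen."""
    return affix.strip("-") + "-"


# (pattern, builder) pairs, tried in order. The specific rules must come
# before the generic "foo prefix bar-" rule, otherwise "postpositional prefix"
# would be tagged "(postpositional)" instead of "(postposition)".
RULES = [
    (re.compile(r"^Navajo verbs with prefix (\S+)$"),
     lambda m: f"Navajo terms prefixed with {normalize_prefix(m[1])}"),
    (re.compile(r"^Navajo terms with emphatic infix (\S+)$"),
     lambda m: f"Navajo terms infixed with {m[1]} (emphatic)"),
    (re.compile(r"^Navajo verbs with classifier (\S+)$"),
     lambda m: f"Navajo terms prefixed with {normalize_prefix(m[1])} (classifier)"),
    (re.compile(r"^Navajo verbs with peg element (\S+)$"),
     lambda m: f"Navajo terms prefixed with {normalize_prefix(m[1])} (peg element)"),
    (re.compile(r"^Navajo verbs with postpositional prefix (\S+)$"),
     lambda m: f"Navajo terms prefixed with {normalize_prefix(m[1])} (postposition)"),
    (re.compile(r"^Navajo verbs with (\w+) prefix (\S+)$"),
     lambda m: f"Navajo terms prefixed with {normalize_prefix(m[2])} ({m[1]})"),
]


def rename_category(title: str) -> str:
    """Return the proposed new title, or the title unchanged if no rule applies."""
    namespace = "Category:"
    has_namespace = title.startswith(namespace)
    name = title[len(namespace):] if has_namespace else title
    for pattern, build in RULES:
        match = pattern.match(name)
        if match:
            name = build(match)
            break
    return (namespace if has_namespace else "") + name


if __name__ == "__main__":
    print(rename_category("Category:Navajo verbs with disjunct prefix ʼá-"))
    # prints: Category:Navajo terms prefixed with ʼá- (disjunct)
```

The rule order is the one design choice worth noting: the generic pattern for "foo prefix bar-" is tried last so that the more specific wordings (classifier, peg element, postpositional prefix) get their own tags.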

The logic here:

  1. I renamed "verbs" -> "terms" in categories because the vast majority of such categories are for verbs. There are only five categories in Category:Navajo terms by prefix representing a total of 11 lemmas (mostly nouns). Navajo seems a heavily verb-centric language, so it can be assumed a given prefix is verbal, and if there are prefixes that can be both nominal and verbal and it's important to note the difference, this can be handled in the parenthetical tag.
  2. I standardized the use of hyphens in prefixes. The current etymologies are not consistent in the use of hyphens. IMO everything that's a prefix (where "prefix" means anything coming before the root) should have a following hyphen and no preceding hyphen.
  3. The use of the term "classifier" here does not follow the standard usage of this term (e.g. as in East Asian languages). For example, -ł- is defined as follows:
    The -ł- classifier or valence-change prefix, a causative-transitivizing prefix of active verbs that modifies the transitivity or valence and grammatical voice of a verb. It often transitivizes an intransitive -∅- (unmarked) verb:
    This doesn't appear to have anything to do with (e.g.) Chinese classifiers, which categorize a noun semantically, a bit like Indo-European genders. Instead it's simply a type of prefix. If "classifier" is the normal term in Navajo grammars, it's fine to maintain it as a parenthesized tag (as I have done), but it should not be placed in the Classifier POS.

Benwing2 (talk) 04:44, 29 February 2024 (UTC)

@Eirikr, who knows something about Navajo, even if he doesn't work with it. I've worked a little on other American Indian languages, and this is similar in the way the lines between inflectional morphology, derivational morphology and syntax tend to blur. I'm not sure if we can make these fit neatly into anything. When you can have a single word that means "I saw those two women walk this way out of the water", all bets are off. In the case of Cahuilla, the verb is the sentence, with subject pronouns, object pronouns, many adverbs/prepositions, etc. reduced to affixes, and separate words mostly just used to name the referents of the affixes. Chuck Entz (talk) 16:07, 29 February 2024 (UTC)
@Chuck, thanks for the ping. Much IRL is keeping me busy enough that my Wiktionary time is more limited.
@Benwing2, re: renaming the categories, I have no particular concerns. What you explain above all makes sense.
Specifically about the verb-valence morphemes, I agree that the "Classifier" part-of-speech would be a mistake. That said, these are commonly called "verb classifiers" in the literature, particularly the seminal resources by Young and Morgan, if memory serves. I'll dig up a couple of my dead-tree books later on and make sure I'm not mis-remembering.
(FWIW, I just tried googling for complex Navajo verbs, and ran across the Reddit thread "What makes Navajo considered so difficult?". Please take the "qcomplex5" poster there with a big grain of salt — they list a lot of "Navajo" examples that are patent gibberish. For example, they gloss bízhiʼ jį́ as "he/she is asleep", but as you can see from our entries, this is the two nouns "his/her name" + "day". The rest of their "Navajo" is similarly whackadoo.)
Anyway, like Chuck describes for Cahuilla, a Navajo verb incorporates many of the grammatical elements that are explicitly separate in many other languages, such that subject, object, etc. are all fused in as part of the "verb" word. Consider the relatively simple example of yishdlóósh, an intransitive verb meaning "I creep on all fours". This incorporates the first-person subject pronoun shí as that medial -sh-. Or compare ółtaʼ (s/he reads something unspecified; s/he studies, transitive, unspecified object), yółtaʼ (s/he reads it; s/he studies it, transitive, specific object), and then wóltaʼ (it is read, passive), which also involves a change in the so-called "classifier" from active / transitive -ł- to intransitive / passive / reflexive -l-.
I digress, but I hope that helps.  :) ‑‑ Eiríkr Útlendi │Tala við mig 20:29, 29 February 2024 (UTC)
@Eirikr Thanks, Eirikr. Yeah, this makes sense to me, and in some ways the use of a valence-changing "classifier" is simpler than English, where we have to use a different finite verb ("to be") to passivize and the main verb changes from a finite form to a participle. The valence-change prefix reminds me a bit of se in Romance languages, which is similarly ambiguous as to whether it's intransitive, passive or reflexive. Benwing2 (talk) 23:39, 29 February 2024 (UTC)
Done. Benwing2 (talk) 04:24, 16 March 2024 (UTC)

Sanskrit kṣ-aorist

@Dragonoid76 What is a Sanskrit kṣ-aorist meant to be? By the current categorisation rules it appears to simply be an s-aorist whose root ends in a velar before the sibilant, which at the very least invites the parallel of ts-aorists. Is it perhaps a confusion with sa-aorists, whose aorist stems all end in -kṣa and whose affix is sometimes called ksa? (Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Getsnoopy, Rishabhbhat, Dragonoid76): --RichardW57 (talk) 09:51, 29 February 2024 (UTC)

@RichardW57 Last I checked there was a lot of confusion in Sanskrit noun and verb classification. Benwing2 (talk) 10:11, 29 February 2024 (UTC)
@RichardW57 Yes, it is the same as the sa-aorist. Dragonoid76 (talk) 18:05, 29 February 2024 (UTC)
@Exarchus: So none of the 4 verbs in Category:Sanskrit verbs with kṣ-aorist should be there! RichardW57 (talk) 19:56, 29 February 2024 (UTC)

I created this as an experiment. It's meant to clean up cases like these, where we want to group languages together by family, but do not have a proto-form. (In practice, this does happen occasionally; any argument that there must always be a proto-form is absurd to me.) This allows 1. better formatting (with tooltips for borrowings, etc.) and 2. easier parsing. I'm bringing it up here in case anyone has any feedback on it. — SURJECTION / T / C / L / 22:11, 29 February 2024 (UTC)

@Surjection Looks good to me. I agree that trying to force a proto-form when there isn't one reconstructed isn't helpful. Benwing2 (talk) 23:35, 29 February 2024 (UTC)