Wiktionary:Beer parlour/2024/March


A way to more easily connect with readers

I have seen this idea thrown around some and I have had it myself - what if we had some official social media accounts where we can respond to readers, give polls, etc., that admins have access to? In theory readers can interact with us here, e.g. at the Information Desk, but I think that process can be a little opaque for the average person, and for some even intimidating. I also know that various Wikimedia projects have their own accounts on various platforms. Also, since it's an open project, the more input the better, theoretically. Vininn126 (talk) 08:45, 1 March 2024 (UTC)

I see you beat me to it. I have personally had a need for this on numerous occasions, finding Reddit and Twitter posts that (often in the form of a joke/curiosity) showed serious errors in our dictionary. A recent example is diff. A way to interact with the people that bring such mistakes to our attention is in my opinion very important.
I think the easiest way to maintain this is to simply create a couple of accounts and share their login information in the Admin channel on Discord, which automatically gives all admins that have joined the Discord server the means to manage these accounts. Then afterward we can post various polls and/or announcements after a consensus with the community, while also having the ability to quickly respond to feedback. Thadh (talk) 09:26, 1 March 2024 (UTC)[reply]
@Thadh I agree, but we should probably message other admins and send it to their emails potentially, so as not to have a barrier to get access. Vininn126 (talk) 09:29, 1 March 2024 (UTC)[reply]
I would prefer we do that based on requests. Many admins are not very active and I don't want the mail to go to some years-long unchecked inbox. Thadh (talk) 09:42, 1 March 2024 (UTC)[reply]
Some platforms we should consider: Reddit, Twitter (X), Facebook. Any other suggestions? Vininn126 (talk) 09:58, 1 March 2024 (UTC)[reply]
Only fans? Allahverdi Verdizade (talk) 10:19, 1 March 2024 (UTC)[reply]
You wish. I ain't doing a body reveal that easily. Vininn126 (talk) 10:23, 1 March 2024 (UTC)[reply]
I hate the idea that we are, in effect, endorsing/legitimizing and making more attractive these intrusive systems, but we are effectively forced into it by user preference for them. DCDuring (talk) 13:54, 1 March 2024 (UTC)
@DCDuring One thing we could do is also promote how to engage with the site more directly. By creating accounts on these sites with wider reach, we can bridge the gap for readers who are scared to edit and also show them how to start discussions, etc., potentially increasing editorship. Vininn126 (talk) 13:57, 1 March 2024 (UTC)
@DCDuring Alternatively, we could see it as engaging with the reality that users will not always come to the site directly in order to raise issues. Ignoring that isn't going to help anyone. Theknightwho (talk) 14:13, 1 March 2024 (UTC)[reply]
It's certainly not a bad idea. But, in the interest of protecting myself from "personalization", I waste my time at MW projects, not the commercial sites. DCDuring (talk) 14:50, 1 March 2024 (UTC)[reply]
@DCDuring What do you mean by personalization? Kiril kovachev (talk · contribs) 18:24, 2 March 2024 (UTC)
Generating content tailored to me, certainly including advertising, possibly already including or soon to include price discrimination. DCDuring (talk) 20:36, 2 March 2024 (UTC)[reply]
Mastodon. CitationsFreak (talk) 16:33, 1 March 2024 (UTC)[reply]
I second this. Allahverdi Verdizade (talk) 20:53, 2 March 2024 (UTC)[reply]
VK may be a good idea to attract any potential editors and/or readers from Russia. Thadh (talk) 20:45, 10 March 2024 (UTC)[reply]
I support this idea, but would it only be admins? I might like to have access as well. Ioaxxere (talk) 20:43, 1 March 2024 (UTC)[reply]
@Ioaxxere I think without a rigorous way to add users, it might devolve to a free-for-all. Also at the beginning, I think only truly trusted users should be given access. Perhaps there'd be a process for adding trusted users in the future. Vininn126 (talk) 20:46, 1 March 2024 (UTC)[reply]
We had an unofficial Twitter account created by WF. Only admins can see it, but there's some information at the deleted page for User:Wikt Twitterer (I believe they created it when they were using that account, but kept the Twitter account going after the Wiktionary account was blocked). Chuck Entz (talk) 23:56, 1 March 2024 (UTC)[reply]
Well considering no one has objected I think we can probably move forward. The question is what email to use when signing up for these accounts and what username to use. Vininn126 (talk) 10:49, 8 March 2024 (UTC)[reply]
We should probably create one to go with these accounts. Best not to advertise its address anywhere on-wiki though. Thadh (talk) 17:18, 8 March 2024 (UTC)[reply]
Use a Wikimedia address. Don't go third-party like Gmail etc. Equinox 17:25, 8 March 2024 (UTC)[reply]
That's a good point. Vininn126 (talk) 17:26, 8 March 2024 (UTC)[reply]
@Equinox How might we do that? Vininn126 (talk) 17:30, 8 March 2024 (UTC)[reply]
I assume we tell Wikimedia that we want to make a social media account. CitationsFreak (talk) 20:22, 8 March 2024 (UTC)[reply]
@Vininn126: Search me. When I said "we should stick with open tech like IRC" people laughed at me, and Discord won. But some day it will die, or become "Utility cloud computing" with a bill attached. It's smarter to keep free and open. You do it if you can. I'm not there anyway, being a transphobic nazi etc. Equinox 07:38, 29 March 2024 (UTC)[reply]

Bengali language

I intended to create a phonelist for Bengali. Is there anyone who can guide me through bot stuff? Arundhatisgupta (talk) 18:24, 1 March 2024 (UTC)[reply]

@Arundhatisgupta What do you mean by "phonelist"? What sort of bot work are you trying to do? (Keep in mind if you plan to do page edits using a bot, you need to get permission to do so.) Benwing2 (talk) 23:45, 5 March 2024 (UTC)[reply]

Restricting {{m}} in etymology sections

Wiktionary's etymology sections are not very machine-readable, and the main issue is the {{m}} template, which can be used in a wide variety of ways:

  • Origin within a language: A {{glossary|respelling}} of {{m|en|puisne}} (in puny)
  • Listing alternative forms of an etymon: From {{inh|en|enm|hed}}, {{m|enm|heed}}, {{m|enm|heved}}, {{m|enm|heaved}} (in head)
  • Listing related terms: More at {{m|en|Tyr}}, {{m|en|day}}. (in Tuesday)
  • Listing unrelated terms: Not related to {{m|en|Romanian}} or {{m|en|Roman}}. (in Rom)

I propose that {{m}} be used only for unrelated terms and that we create new templates for the other three cases. Ioaxxere (talk) 20:41, 1 March 2024 (UTC)[reply]
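
To illustrate the machine-readability concern, here is a minimal Python sketch that pulls every simple {{m}} call out of the four example etymologies quoted above. The regular expression and the `mentions` helper are illustrative assumptions, not existing Wiktionary tooling, and the pattern ignores nested templates and named parameters. The extracted pairs look identical whether the term is an etymon, an alternative form, a pointer to another entry, or an explicitly unrelated word, so a parser has only the surrounding prose to go on.

```python
import re

# Matches simple {{m|lang|term}} calls only; nested templates and named
# parameters are deliberately out of scope for this sketch.
M_TEMPLATE = re.compile(r"\{\{m\|([^|{}]+)\|([^|{}]+)\}\}")


def mentions(etymology: str) -> list[tuple[str, str]]:
    """Return a (language code, term) pair for each {{m}} call found."""
    return M_TEMPLATE.findall(etymology)


# The four usages listed in the proposal, keyed by the entry they come from.
examples = {
    "puny": "A {{glossary|respelling}} of {{m|en|puisne}}",
    "head": "From {{inh|en|enm|hed}}, {{m|enm|heed}}, {{m|enm|heved}}, {{m|enm|heaved}}",
    "Tuesday": "More at {{m|en|Tyr}}, {{m|en|day}}.",
    "Rom": "Not related to {{m|en|Romanian}} or {{m|en|Roman}}.",
}

for entry, text in examples.items():
    # Nothing in the extracted pairs says what role each term plays.
    print(entry, mentions(text))
```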

In the case of "More at...", that should be {{l}} anyway, since it refers to the entry and not the term. Theknightwho (talk) 21:29, 1 March 2024 (UTC)[reply]
Just in terms of the formatting produced, I dislike the use of {{l}} when used inline in other running text: {{m}} produces italicized text, which is visually distinct from the rest of the text.
For that matter, I understood the "l" in {{l}} to stand for "list", as this template was originally intended to only be used in lists of terms, where formatting to distinguish from other running text isn't needed. ‑‑ Eiríkr Útlendi │Tala við mig 05:09, 10 March 2024 (UTC)[reply]
That is an unrealistically lofty ideal. {{m}} has many other uses and you simply cannot sequester them all into separate templates. — SURJECTION / T / C / L / 21:36, 1 March 2024 (UTC)[reply]
I wouldn't mind that it no longer be used to italicize (often mis-italicize) taxonomic names. But there are no restrictions on its use at present, it has mostly been used for formatting, and there is no incentive for users to limit use. It seems particularly hard to imagine that we could get users to comply with different rules based on the L3/4/5/6 header they were editing in. Our filters are already getting intrusive and unhelpful. DCDuring (talk) 22:50, 1 March 2024 (UTC)
This doesn't seem like a good idea... Thadh (talk) 22:53, 1 March 2024 (UTC)[reply]
Being able to mention other terms is very, very useful. Vininn126 (talk) 22:56, 1 March 2024 (UTC)[reply]
Yeah, I don't think we need a proliferation of different templates. It will just make coding much harder. As it is, it's already difficult to learn how to use templates like {{en-verb}}, {{inflection of}}, and {{Module:quote|call_quote_template}}. — Sgconlaw (talk) 22:59, 1 March 2024 (UTC)[reply]
Oppose. I spend a lot of time fixing etymologies where someone copied the entire etymology from an entry in another language without changing the language codes. If people routinely get that wrong, they're not going to have a clue about the subtleties and intricacies proposed. You'll end up with people copying from one entry where they make sense to another where they're all wrong, or worse, partly wrong. Unlike with language codes, there's no reliable way to tell if they're being misused without knowing something about the etymology (if there were, you wouldn't need them in the first place). Basically, this proposal would give editors more ways to be wrong. Chuck Entz (talk) 23:35, 1 March 2024 (UTC)
Also, I don't know if we would want to do this in the name of machine-readability. Wiktionary is not really a database, so if we wanted it to be machine-readable, it would have had to have been one to begin with. Maybe Wikidata could handle this kind of thing instead. Kiril kovachev (talk · contribs) 00:50, 2 March 2024 (UTC)
@Kiril kovachev I think it's a balance - we need to be machine-readable to some extent, since some users rely on that to collate info from a wide array of entries (and it also helps our bots), but I'd agree that the templates suggested here would be a step too far, as I don't really see what advantage they'd provide. Theknightwho (talk) 18:16, 2 March 2024 (UTC)[reply]
That's true, I agree with your point here. It's nice to have a clearly-defined {{inh|en|grc|...}} kind of thing, but there're also ways in which our etymology section can be virtually free-form and forcing it to be more machine-readable would kill that flexibility. Such as the ways we may be using {{m}} right now. Kiril kovachev (talk · contribs) 18:23, 2 March 2024 (UTC)
It doesn't seem very useful to me. Are there plans to have machines doing something with our etymology sections anytime soon? At some point far enough in the future, improvements in machine comprehension of natural language might make it easier for machines to understand what humans write, rather than forcing humans to adjust how they write so machines can understand it. I think there are a lot of aspects of writing etymologies that are difficult to boil down to a fixed set of templates, so I'm not enthusiastic about us engaging in that project unless there's some real benefit we can point to. Simple etymologies already use templates, so this proposal seems to deal with a tail of complicated etymologies (do you know what percentage do contain {{m}}?). --Urszag (talk) 01:06, 2 March 2024 (UTC)
This seems to be a good attitude/strategy for such matters in general. DCDuring (talk) 17:52, 2 March 2024 (UTC)[reply]
Oppose. Makes things harder, would make me inclined to not do etymologies if it's such a pain, even if I know the word origin. If we must restrict codes in this way, introduce a nice GUI that creates the code from our menu choices or something. Equinox 18:27, 2 March 2024 (UTC)[reply]
Honestly, an extension of the New Entry Creator that can easily add etymologies and quotes would be nice... CitationsFreak (talk) 06:02, 3 March 2024 (UTC)[reply]
The lack of a template for this purpose has irked me in the past. We have nice templates for when a word comes from another language, but not for when it comes from another word in the same language (unless by some well-known process such as affixation).
Recently I wanted to generate a list of all English terms which are said to derive from another English term, but where that term's entry doesn't include the term derived from it in a "Derived terms" section. Such a list would help fill gaps in our "Derived terms" sections, but it's all but impossible to generate a comprehensive list like this with the current setup.
I would definitely support an effort to designate a template specifically for same-language derivations. It seems it would be possible to use {{af}} for this purpose with minor modifications to its code, and probably a new name too:
A {{glossary|respelling}} of {{af|en|puisne}}. Not related to {{m|en|some other term}}.
The other uses of {{m}} can be dealt with in other ways. This, that and the other (talk) 03:21, 4 March 2024 (UTC)[reply]
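
As a rough sketch of the gap report described above, the following Python assumes a dedicated same-language derivation template already exists. The name {{from}} is hypothetical here (it is only proposed in this thread), and the `entries` dictionary is toy data standing in for parsed entries rather than real dump output.

```python
import re

# {{from|lang|term}} is the HYPOTHETICAL same-language derivation template;
# only the simple two-parameter form is matched in this sketch.
FROM_TEMPLATE = re.compile(r"\{\{from\|([^|{}]+)\|([^|{}]+)\}\}")


def missing_derived_terms(entries: dict[str, dict], lang: str) -> list[tuple[str, str]]:
    """Return (source, derived) pairs where the derived entry names the source
    in its etymology but the source entry's Derived terms section omits it."""
    gaps = []
    for title, entry in entries.items():
        for code, source in FROM_TEMPLATE.findall(entry["etymology"]):
            if code != lang:
                continue  # only same-language derivations are of interest
            listed = entries.get(source, {}).get("derived", set())
            if title not in listed:
                gaps.append((source, title))
    return sorted(gaps)


# Toy data: "derived" holds the terms already listed under Derived terms.
entries = {
    "puny": {"etymology": "A {{glossary|respelling}} of {{from|en|puisne}}.", "derived": set()},
    "puisne": {"etymology": "", "derived": set()},
    "alphaish": {"etymology": "{{from|en|alpha}}", "derived": set()},
    "alpha": {"etymology": "", "derived": {"alphaish"}},
}

print(missing_derived_terms(entries, "en"))
```

With every same-language derivation written through one template, the report reduces to a single pass over the entries. With plain {{m}}, the script would first have to guess which mentions are actually etymons.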
@This, that and the other what do you think it should be called? For now, we could create a template which redirects to {{affix}}. In the future, I think {{af}} should be adapted into a generic "internal derivation" template. I think @Benwing2 is on the same page on this. Ioaxxere (talk) 17:30, 4 March 2024 (UTC)[reply]
I realized that {{from}} didn't exist—I think that's a good name. Another idea is to be able to adapt {{der}} to allow a faster way of writing {{der|en|en|term|nocat=1}}. Ioaxxere (talk) 18:58, 4 March 2024 (UTC)[reply]
Yes, I recall seeing a discussion about broadening the use of, and renaming, {{af}} in the past (not sure where or with whom).
{{from}} is an excellent name - good find. This, that and the other (talk) 00:08, 5 March 2024 (UTC)[reply]
Sounds like a skill issue on the part of the machines. Nicodene (talk) 11:13, 7 March 2024 (UTC)[reply]

deprecate Template:1

I have renamed this to {{cap}} and deprecated it per the discussion in WT:RFM, but User:Equinox reverted the deprecation claiming it will save them keystrokes. I would like to see what people think about keeping this deprecated. I don't see how two keystrokes makes much of a difference, and {{1}} is just about the worst alias imaginable. If keystroke savings is really a big deal, we could use something like {{ca}} or {{cp}}, both of which are currently undefined. Benwing2 (talk) 02:02, 3 March 2024 (UTC)[reply]

Looking at the RFD, I see that {{M}} was suggested as a new name for this template by User:This, that and the other. We should just switch the template's name to that. Saves the same amount of keystrokes as {{1}}, better alias. (Plus, there was no real consensus to deprecate it in the first place.) CitationsFreak (talk) 06:24, 3 March 2024 (UTC)[reply]
Equinox's argument is weak - "cap" falls under the fingers nicely (on a QWERTY keyboard at least) and is just as typeable as "1". As for alternative names, {{M}} (for "majuscule") is just okay, because of the existence of lowercase {{m}}. The other obvious single-letter shortcuts ({{C}} for "capital" and {{U}} for "uppercase") are already taken. Another alternative would be {{^}}, implying "raising" the first letter to uppercase. This, that and the other (talk) 07:04, 3 March 2024 (UTC)[reply]
@This, that and the other @CitationsFreak {{U}} is hardly used so we could easily repurpose it. Also how about {{uc}}? Benwing2 (talk) 08:07, 3 March 2024 (UTC)[reply]
Please don't deprecate, rename, etc. An uppercase template name, like {{M}}, is a bit worse than a lowercase one, like {{1}}. I, for one, appreciate any keystroke savings for my arthritic joints. DCDuring (talk) 13:29, 4 March 2024 (UTC)[reply]
I have known this template for only a few weeks, since editors always preferred bare links. The issue here is that it looks homographic to the non-italic linking template {{l}}, whereas we don’t want confusables. For this purpose anything cV seems to be bad already, looking like {{cat}}, {{c}} and {{C}}, the parameter |nocap=, and {{caps}} and {{cx}} and what not. I suppose Benwing2 wants to clean up {{caps}} too, though, since this is only used in about 200 entries having {{he-root}}.
Intuitively I propose {{up}} and {{high}} since the letters are close together, and high, on the keyboard. And {{}} which is Shift + AltGr + U on my standard xkeyboard-config layout, I’d actually use that, it looking exactly as much better than {{1}} as needed, Abloh’s 3% rule or something. Not seen DCDuring using the template, but the same concern can be valid for other editors and a rename can make it better. Fay Freak (talk) 14:29, 4 March 2024 (UTC)[reply]
I don't see the point of deprecating this template if multiple editors use it to save keystrokes. However I think we should be automatically subst-ing every instance of {{1}} and {{cap}} for the sake of readability. Ioaxxere (talk) 17:36, 4 March 2024 (UTC)[reply]
In the software industry, "deprecation" usually gives you a long time to deal with something. For example, Microsoft deprecated WebClient (a class used to perform Internet downloads), but it continues to work for many years. Also, there is usually a genuine stated rationale by which the replacement is better, not just a programmer's whim. You can joke "it's not a big deal", but it is longer to type cap than 1 (especially if you create thousands of entries, like I do) and there's also muscle memory, which is really important for older people: please understand this, even if you are young: it's ableism. In this case, it costs us literally nothing to retain the 1 page as a redirect, which makes the template work fine. Removing and breaking the redirect can be nothing but either (i) punishing "old dogs" who can't learn "new tricks", or (ii) a fascist march ahead that supports developers but not users who create the project. Equinox 03:43, 5 March 2024 (UTC)[reply]
@Benwing2: Some years ago, we had a very aggressive template editor who upset many people by placing his/her software design decisions over user needs. Please don't be that person again. There are democratic discussion tools to allow you to work it out without turning off things that really matter to me, as a person who creates hundreds of entries per month and never fucks with a template. Equinox 03:47, 5 March 2024 (UTC)[reply]
@Equinox Would {{L}} work as a compromise? It looks sufficiently different that I don't find it confusable, and it's only one extra keystroke. I don't like {{1}} because it looks almost the same as {{l}} in the code. Theknightwho (talk) 19:13, 5 March 2024 (UTC)[reply]
L is better than nothing. But I really don't see why it's killing anyone to retain the working redirect. Equinox 19:18, 5 March 2024 (UTC)
@Theknightwho @Equinox I was thinking of repurposing {{u}}. Not even a shift key extra and it's barely used; {{U}} can be used for user mentions if anyone cares. Please note that this change is not coming out of the blue; the discussions over getting rid of {{1}} have been going on for years, most recently in WT:RFM. I'm also not sure how useful or helpful it is to accuse me of being selfish, fascist, ableist and ageist, and IMO it's definitely not helpful to demand that no template be removed once it's created (or maintained over a several-year deprecation process, which is tantamount to the same thing). Benwing2 (talk) 22:53, 5 March 2024 (UTC)[reply]
@User:Benwing2 You should have said that at the start, to be honest. I feel that mentioning that would be more productive for you, since there is the same amount of joint movement in typing both {{1}} and {{u}}, so there could be no argument based on that. CitationsFreak (talk) 23:31, 5 March 2024 (UTC)[reply]
I'm getting to the point where I just use wikitext for everything I input. If the wikitext "required" for "proper formatting" is too hard, then I get the words right and leave something that doesn't necessarily conform to WT:ELE or whatever other norms we have for cleanup by others, who seem to like that kind of thing. If the next step is to filter such input, I'm out of here. DCDuring (talk) 01:07, 6 March 2024 (UTC)[reply]
@DCDuring I don’t have any strong views on the issue raised by this thread, but this attitude isn’t fair on other users, because you’re just creating clean-up work for others. The idea of using link templates outside of definition lines isn’t new, and it’s not complicated. Theknightwho (talk) 17:40, 7 March 2024 (UTC)[reply]
@User:Theknightwho Just more keystrokes and more learning overhead. I find it hard enough to try to make and keep taxonomic and related entries useful and to correct other users' mistaken and omitted uses of {{taxlink}}, {{vern}}, and now {{taxfmt}}. I don't undertake any non-morphological etymologies, instead inserting {{rfe}} (and getting complaints about that), because that's just more learning overhead, easily forgotten. I'm sure I get lots of descendants items wrong too. DCDuring (talk) 18:43, 7 March 2024 (UTC)[reply]
@Theknightwho, Benwing2: I’m with User:Ioaxxere above: Why not just automatically subst every instance of {{1}} (perhaps by bot), making this problem vanish? This template, whatever its name, is convenient for editors adding content but bad for readability; subst-ing would keep the convenience while resolving the problems of having such a template hang around in the code. — Vorziblix (talk · contribs) 02:57, 6 March 2024 (UTC)[reply]
@Vorziblix It's not possible to automatically do this except by periodically running a bot script. We only have a few things that currently run by periodic bot scripts, and AFAIK they are all triggered manually (by me, or in the case of {{t+}}, by User:Ruakh, although I don't know whether this still runs); in general I am reluctant to add more esp. to mainspace pages because they cause surprise for editors and are a maintenance burden. Also, for long words at least, it might be worse to have it duplicated in capitalized and lowercase forms than to have a (properly-named) template that wraps a single instance of the word. Benwing2 (talk) 03:04, 6 March 2024 (UTC)[reply]
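
For concreteness, here is a minimal Python sketch of the text transformation such a periodic bot run would apply. It assumes, since the thread does not spell this out, that {{1|word}} and {{cap|word}} expand to a link to the lowercase entry displayed with an initial capital, i.e. [[word|Word]]. The `subst_cap` helper is hypothetical and handles only the single-parameter form.

```python
import re

# Matches {{1|word}} or {{cap|word}} with exactly one positional parameter.
# The lookbehind skips {{{1|...}}} parameter references in template code.
CAP_TEMPLATE = re.compile(r"(?<!\{)\{\{(?:1|cap)\|([^|{}]+)\}\}")


def subst_cap(wikitext: str) -> str:
    """Replace each matching call with its ASSUMED expansion [[word|Word]]."""
    def expand(match: re.Match) -> str:
        word = match.group(1)
        # Capitalize only the first character; leave the rest untouched.
        return f"[[{word}|{word[0].upper()}{word[1:]}]]"
    return CAP_TEMPLATE.sub(expand, wikitext)


print(subst_cap("# {{1|abbreviation}} of {{m|en|foo}}."))
```

The expanded form repeats the word in lowercase and capitalized spellings, which is the duplication cost mentioned above for long words.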
Why aren't you 'reluctant' to do things that add more keystrokes? Is it because you aren't the one doing those keystrokes? Or do you think that our content is so good that all we have to do is pretty the dictionary up and let AI fill in the gaps? DCDuring (talk) 13:19, 6 March 2024 (UTC)[reply]
Needlessly snarky. Vininn126 (talk) 13:28, 6 March 2024 (UTC)[reply]
@DCDuring: Look, we can’t imagine well how it is to have arthritis and have to balance the concerns of joints and eyes of everyone, stop being so combative. Depending on the position of the keys, one or two keystrokes more may go easier for you than even one: if they are in a close area and if they are in the upper mid; 1 is at a corner and {{1}} strains the eyes of people with impeded and good eyesight in view of {{l}}. That’s why I have these three suggestions here, we might take two of: {{up}}, {{high}}, {{}}. I actually think a lot about keyboard layouts, the curly brackets are at the keys for 8 and 9 for me and for US standard <AD11> and <AD12> (the two right of O and P) and so these will be typed on one hand easily. Fay Freak (talk) 13:36, 6 March 2024 (UTC)[reply]
That's not snarky. I'm really concerned about attitude.
My eyesight isn't very good either. I've weighed the difference to me.
So, I'm just supposed to roll over? I haven't objected to the {{subst}} idea.
Why don't we have a thoroughgoing consideration of keystroke minimization. Why not use {{i}} for initial capitalization, instead of wasting it as a redirect to {{qualifier}}, when {{q}} also redirects thereto? DCDuring (talk) 15:45, 6 March 2024 (UTC)[reply]
In general the concerns of easy input and easy readability for future editors both have to be considered when naming templates. There are ways for editors to configure their own machines to make entry easier, e.g. I think an AutoHotkey script could be used on Windows to convert {{1| to {{cap|, or do anything similar like this, on the user's end.--Urszag (talk) 02:52, 7 March 2024 (UTC)[reply]
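
The same user-side rewrite can be sketched in Python instead of AutoHotkey, for example as a step an editor runs over text before saving. The `ALIASES` table and `normalize_aliases` helper are hypothetical. The only mapping shown is the {{1}} to {{cap}} rename discussed in this thread.

```python
import re

# Alias -> canonical template name. Only the rename from this thread is listed;
# an editor could add personal shortcuts of their own.
ALIASES = {"1": "cap"}


def normalize_aliases(wikitext: str, aliases: dict[str, str] = ALIASES) -> str:
    """Rewrite '{{alias|' or '{{alias}}' to use the canonical template name."""
    names = "|".join(re.escape(name) for name in aliases)
    # The lookbehind skips {{{1|...}}} parameter references; the lookahead
    # ensures the whole template name matched (so {{10|...}} is left alone).
    pattern = re.compile(r"(?<!\{)\{\{(" + names + r")(?=[|}])")
    return pattern.sub(lambda m: "{{" + aliases[m.group(1)], wikitext)


print(normalize_aliases("# {{1|abbreviation}} of {{l|en|something}}"))
```

This keeps the short name available at the keyboard while the saved wikitext uses the readable one, which addresses both concerns raised above without a site-side redirect.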
On the one hand I support the principle of things actually making sense, and nothing about the abbreviation "1" does. On the other hand it seems fairly harmless, and if it really is saving Equinox so much trouble, why not? Nicodene (talk) 11:10, 7 March 2024 (UTC)[reply]
Mehhh. I agree it's an unintelligible name (and therefore proposed at RFM that we make the 'main' name something more intelligible), but redirects are cheap and I don't see harm in leaving {{1}} as a redirect. Some prolific editors are clearly used to using it. (In any given couple of months, we have one or two entries which use {{altcaps}} and thus just display a redlink, because I or someone else has been unable to recall what the new name for that is.) I admit {{1}} is a particularly unintelligible name, though (unlike e.g. {{altcaps}}). - -sche (discuss) 06:04, 9 March 2024 (UTC)[reply]
Badly named redirects add cognitive burden to people trying to understand the wikicode, and redirects in general (esp. badly named ones) increase the tech debt; enough of them and the site becomes unmaintainable. This is why people like me and User:Theknightwho who put time into maintaining the site (rather than just using it) push back against having random redirects littering the site. I also still don't know why User:Equinox as well as User:DCDuring (who doesn't even use the alias) are so attached to this particular alias when I have proposed a more sensible redirect {{u}} that is the same number of keystrokes. (Not to mention that using any template requires 5-6 keystrokes due to the left brace and vertical bar, so I have a hard time buying the argument that a single extra shift key makes a huge difference. I should also add, Equinox accused me of ageism and ableism knowing almost nothing about me -- I am in fact older than him and have suffered my own spate of hand-related disability.) Benwing2 (talk) 06:18, 9 March 2024 (UTC)
Personally, I feel that most of the arguments that apply to {{1}} apply to {{u}} as well. Also, after enough uses of it, they will be using it in no time (like with the mandated use of "en" in the etym and quote fields). CitationsFreak (talk) 20:50, 9 March 2024 (UTC)[reply]
{{up}} is still clearer than {{u}} and easy enough to type. My tendency is always that single-letter templates are badly named if they might look like something else (e.g. usage templates, {{user}}) and as after all there are little more than twenty letters available. This is not strictly comparable to terminal commands either, where we use to have a -V synonym of a longer --word. The one-ASCII-character ones really need broad consensus, even unconscious one. I doubt that {{u}} for {{cap}} will have this habitation like {{m}} and {{l}} have. The difference is also that these, and {{q}}, have semantics, even if it only consists in wrapping a language other than the working one, that capitalization at the beginning of English glosses hasn’t. All rationalizations that I am uneasy about {{u}} and {{i}} for any purpose. So far I have only three one-letter template-codes I use and watch out for. Fay Freak (talk) 21:49, 9 March 2024 (UTC)[reply]
@Benwing2 "I don't know why users keep doing their user things, rather than the better thing that I, the programmer god, imposed upon them". I hope some day you will realise why users hate your guts. I have been here since 2008, LOOK AT MY TRACK RECORD, I am doing nothing to you, I am not hurting you, but YOU, BENWING, you are changing things, you are hurting me and making it hard for me to continue the free open source project. Don't you dare make me sound like the criminal here. Equinox 07:52, 29 March 2024 (UTC)[reply]
I do not "agree" with some changes made to quote-book etc by Benwing, but it doesn't ruin what I'm doing. I am glad there is someone out there boldly editing at that level, because that seems to me like a high-quality editor that can help out in complex code-related situations. I remember that I didn't agree with something that was going on related to Categories, too. But I'm kind of ambivalent most of the time on these things, and even if I "lose" an argument or whatever, ultimately it's not a big deal even if 100% of what I've ever done was just deleted. (I will mention that I recently (maybe Jan or Feb 2024?) started using the "{{u}}" in some citations that had underlined text in the original. I believe I found it in the code at a Wikipedia article and applied it here or something? I think Wiktionary should maintain code-level compatibility with Wikipedia unless Wiktionary would thereby lose some area of functionality.) Geographyinitiative (talk) 08:08, 29 March 2024 (UTC)
@Geographyinitiative I'm glad you don't have a problem. I do. Do you also hang around abortion clinics saying "I'm sorry you lost a baby but I didn't"? Come back when you are relevant. This really hurts my usability and makes it hard for me to keep up the famously huge productivity that I have here. Equinox 08:13, 29 March 2024 (UTC)[reply]

Report of the U4C Charter ratification and U4C Call for Candidates now available

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language.

Hello all,

I am writing to you today with two important pieces of information. First, the report of the comments from the Universal Code of Conduct Coordinating Committee (U4C) Charter ratification is now available. Secondly, the call for candidates for the U4C is open now through April 1, 2024.

The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. Community members are invited to submit their applications for the U4C. For more information and the responsibilities of the U4C, please review the U4C Charter.

Per the charter, there are 16 seats on the U4C: eight community-at-large seats and eight regional seats to ensure the U4C represents the diversity of the movement.

Read more and submit your application on Meta-wiki.

On behalf of the UCoC project team,

RamzyM (WMF) 16:25, 5 March 2024 (UTC)[reply]

Module Breaker

User:Module Breaker should be blocked. Also, why can't I edit Wiktionary:Vandalism in progress? Avessa (talk) 15:06, 6 March 2024 (UTC)[reply]

@Avessa: Thanks. Because it is a vandalism target obviously. You can do some useful edits and then edit that page if needed. The bar to become autoconfirmed is low. Fay Freak (talk) 17:34, 7 March 2024 (UTC)[reply]

Unlink more and most in English headwords

For example:

common (comparative commoner or more common, superlative commonest or most common)

I don't see the point of having links to more and most in this kind of entry. In my view, having excessive links makes a page less visually appealing and could invite misclicks. Would anyone oppose removing these links? Ioaxxere (talk) 22:02, 6 March 2024 (UTC)[reply]

Not really seeing why unlinking the words is necessary. Maybe learners of English would find the links helpful. — Sgconlaw (talk) 11:49, 7 March 2024 (UTC)[reply]
Because it is easy to click on touchscreens. One would find it helpful only one time theoretically: and then would not understand the English definitions anyway; anyone who would find necessity to click them is at the wrong place with a monolingual dictionary. Nobody said it is necessary, it is about optimization. Fay Freak (talk) 13:35, 7 March 2024 (UTC)[reply]
For that matter why have links to commoner and commonest? (Alright, maybe commoner is a special case because of commoner#Noun.) DCDuring (talk) 16:58, 7 March 2024 (UTC)[reply]
Because the link is what’s left from WT:ACCEL, or any red link crosslinguistically when there is a comparable situation with periphrastic adjective gradation without the gadget, leaving a red link to invite creation. It would be too distressing to create a page and have no link then. The links are made not with random logics but impulses in mind. Fay Freak (talk) 17:31, 7 March 2024 (UTC)[reply]
I don't understand your last sentence. What "random logics" and what "impulses"? DCDuring (talk) 19:58, 7 March 2024 (UTC)[reply]
You find something that makes sense, harmonizes with some aesthetic equation. But we have to ponder what will be clicked, by the typical, impulse-driven behaviours of readers and editors. The contradiction against analogy logic (synthetic comparatives vs. periphrastic ones) would barely be felt. Fay Freak (talk) 20:06, 7 March 2024 (UTC)[reply]
I was about to close this but I've just realized that further and furthest are also linked. I would like to expand this proposal last-minute to cover those as well. @Sgconlaw, ExcarnateSojourner, does this change your opinions at all? Ioaxxere (talk) 21:48, 6 April 2024 (UTC)[reply]
@Ioaxxere: mmmm, not really. I think there's no harm in leaving them linked. — Sgconlaw (talk) 21:52, 6 April 2024 (UTC)[reply]
In that case this passes 2-1. Ioaxxere (talk) 06:38, 7 April 2024 (UTC)[reply]

Wikimedia Canada survey

Hi! Wikimedia Canada invites contributors living in Canada to take part in our 2024 Community Survey. The survey takes approximately five minutes to complete and closes on March 31, 2024. It is available in both French and English. To learn more, please visit the survey project page on Meta. Chelsea Chiovelli (WMCA) (talk) 00:23, 7 March 2024 (UTC)[reply]

Revoking autopatrolled status from Kwamikagami

Some background: @Kwamikagami has been autopatrolled since April 2009, and currently has just over 34,500 edits. They're sporadically active, but when they do edit they tend to make changes to large numbers of entries very quickly, and they tend to focus on single-character entries or anything relating to IPA.

Me, @Benwing2, Vininn126, AG202 and others have been pretty concerned about their sloppy editing for a few months now, and their autopatrolled status makes it much harder to spot. Some examples off the top of my head (but there are literally hundreds like this):

  1. [1]: mass-adding languages with tons of mistakes: the Khoekhoe entry uses the wrong language code throughout, Bodo (India) has the wrong L2 header, and Dogri (even today) still doesn't have a headword template.
  2. [2] Deciding to merge ң and ӈ with no consensus or discussion, despite the fact this caused a bunch of issues for several languages. They also did this for a bunch of analogous letters.
  3. [3] Adding a bunch of stenoscript entries like w—O or adm with merged part of speech headers and/or no headword templates. I can see why they've done this - to avoid repetition - but the obvious and sensible thing to do would have been to start a discussion, not create ~100 more entries with the same issue ([4]). We should not be giving '''{{PAGENAME}}''' on the headword line, and anyone with autopatrolled status should know that.
  4. [5] Adding definitions like "?". No request or attention template - just "?".
  5. [6] Even looking at their most recent contributions, they've wrongly given the pronunciation IPA(key): /ɪkˈsaɪ.ən, -ɒn/ at Ixion. This should be IPA(key): /ɪkˈsaɪ.ən/, /-ɒn/ or IPA(key): /ɪkˈsaɪ.ən/, /ɪkˈsaɪ.ɒn/.

All of this creates a massive clean-up job for everyone else, and Kwamikagami has repeatedly proclaimed that they don't understand the problem, which quite frankly means I don't think they should be autopatrolled anymore. Theknightwho (talk) 22:26, 8 March 2024 (UTC)[reply]

Support. I actually believe we should block Kwami for at least a month since they refuse to acknowledge the problematic nature of many of their edits and continue doing the same thing after warnings. But revoking autopatroller status is a good start. Benwing2 (talk) 22:30, 8 March 2024 (UTC)[reply]
Yeah, I decided not to bring up the repeated refusal to understand consensus, since it didn't seem relevant to this particular issue, but that's definitely a much worse issue.
The clean-up job of their contributions is going to be huge. Theknightwho (talk) 22:33, 8 March 2024 (UTC)[reply]
Support. Vininn126 (talk) 22:31, 8 March 2024 (UTC)[reply]
Support. There's also Category:Translingual entries with incorrect language header (not all of them are Kwami's, but too many are). There are good arguments for treating language-specific characters as either the language itself or translingual, but not both. Most of the entries in this category use translingual templates and language codes under other language headers.
The problem they have in general seems to be making snap decisions without thinking things through, then sticking with those bad decisions until forced to abandon them. They know more than I do on a lot of things, but they don't make very good use of that knowledge. As for the whole stenoscript issue: they did actually ask for advice at the time, so that may not be the best example. Chuck Entz (talk) 01:44, 9 March 2024 (UTC)[reply]
@Chuck Entz I'm not sure I agree with you re the stenoscript: regardless of when they asked for advice or what the response was, they've still created ~100 entries which are in a completely unacceptable state and will need to be cleaned up by someone. Even if they got no response at all, what they did was definitely not the right thing to do, and is the kind of thing that has got some new users banned. Theknightwho (talk) 02:07, 9 March 2024 (UTC)[reply]
Support + they need a block. AG202 (talk) 04:58, 9 March 2024 (UTC)[reply]
@Theknightwho I confess that I've also used this forbidden headword line on sum#Multiple parts of speech and tht#Multiple parts of speech. I agree that it would be better as a template, but I think "multiple parts of speech" should be allowed as a POS header. Ioaxxere (talk) 06:49, 9 March 2024 (UTC)[reply]
No, it shouldn't. — SURJECTION / T / C / L / 09:53, 9 March 2024 (UTC)[reply]
It looks super objectionable. With little necessity, since at least with {{head}} you can just use |catN=. You might stretch WT:POS a bit by letting a part of speech header be followed by another part of speech header and then only the headword line, which probably contradicts basic publication logics of not having empty headers but would at least look better. For such alternative forms, to save vertical space, we could introduce headers like Pronoun · Adjective (i.e. in my example separated by middle dots). Years ago it was considered whether it would be better to have templates instead of headings, as on other Wiktionaries, only dismissed for Lua memory restrictions, to make appearance centrally manipulatable. Fay Freak (talk) 18:09, 9 March 2024 (UTC)[reply]
I don't think this would be widely accepted. It messes with categorization and not having a "head" template violates our practices. I would change it to how entries like obvi & unfort are. AG202 (talk) 20:04, 10 March 2024 (UTC)[reply]
@AG202 those entries are sensible, since each abbreviated word corresponds with a single part of speech. Compare tht, which would need to have four or five identical POS sections. Clearly a dedicated template would be preferable eventually, although for now I don't think a few missing categories are the end of the world. Ioaxxere (talk) 23:37, 10 March 2024 (UTC)[reply]
The repeated POS sections are what are required at this point per our policy. You should've brought it up with English editors at the very least if not everyone in general before creating those entries like that. It clearly violates our Entry Layout guidelines. AG202 (talk) 23:48, 10 March 2024 (UTC)[reply]
@Ioaxxere I agree with User:AG202. There are various imaginable ways of compressing repeated POS sections but (a) it needs discussion, (b) I doubt using a POS "Multiple parts of speech" is ideal in any case; certainly the actual parts of speech should be listed one way or another. Benwing2 (talk) 23:53, 10 March 2024 (UTC)[reply]
Support unfortunately, I think we exhausted other options. Kwami was given multiple written warnings and blocks for EACH of these mass edits and continued regardless. They also haven't really helped clean up or shown remorse for their problematic mass edits... - سَمِیر | Sameer (مشارکت‌ها · بحث) 08:47, 9 March 2024 (UTC)[reply]
Support + a block to fix up the entries. CitationsFreak (talk) 23:08, 9 March 2024 (UTC)[reply]
Support Ioaxxere (talk) 23:35, 10 March 2024 (UTC)[reply]

Revoked, given:

  1. The unanimous and overwhelming support.
  2. It only takes a nomination from one admin and approval from another for a user to gain autopatrolled status.
  3. This has been open for just over 2 days, which is about 6 times longer than it took for the original nomination to get approved and actioned ([7] [8] [9]).

Theknightwho (talk) 00:18, 11 March 2024 (UTC)[reply]

@Theknightwho Thank you. Benwing2 (talk) 00:26, 11 March 2024 (UTC)[reply]

Eastern Geshiza language

User:Geshiza has been asking about adding this language, but in the meanwhile has created a walled garden of over 30 entries with their own improvised categories, but no templates and no links to or from the rest of Wiktionary.

Adding this language won't be easy, because it's hard to tell what it really is. It's apparently a sub-sublect of Horpa (language code ero), but the Wiktionary article for that language doesn't have much detail about what it describes as "a cluster of closely related yet unintelligible dialect groups/languages". In one analysis of the groupings that it cites, there are 5 "varieties", of which "Central Horpa" has 3 "dialects", one of them being "Dgebshesrtsa (Geshezha 革什扎) (non-tonal)". Whether "Gesheza" and "Geshiza" are the same thing isn't explicitly stated, but another quote in the article makes that seem likely. At any rate, there's no mention at all of "Eastern Geshiza". Does anyone have access to any sources that will make sense out of all this? Chuck Entz (talk) 02:28, 10 March 2024 (UTC)[reply]

@Chuck Entz This sort of "do it then get permission" approach was done for Belter Creole as well. I am strongly opposed to allowing this to proceed as it sets a terrible precedent. I would suggest moving the contents into that user's space until it becomes clearer whether there's any hope of supporting this variety or these varieties. Benwing2 (talk) 03:11, 10 March 2024 (UTC)[reply]
@Benwing2 I don't think they're comparable at all. Belter Creole is a constructed language, whereas Eastern Geshiza seems to be a variety of Horpa, and I can see that a published grammar exists. I would much rather that we simply put a moratorium on any new entries until we've hashed out how it should be handled, but regardless of the language code they still belong in mainspace. Theknightwho (talk) 03:16, 10 March 2024 (UTC)[reply]
@Theknightwho Ultimately maybe so, but not remotely in the current state they're in, and I doubt simply asking or telling this user to stop will make them stop. Who's gonna restructure and clean up the entries once we sort out how many varieties are involved and whether they are L2's or etymology variants? You? If you're not willing to personally commit to doing this then IMO we should move these ill-structured entries to userspace and put them back, gradually, in a properly structured form, once we add the lect codes. Benwing2 (talk) 03:36, 10 March 2024 (UTC)[reply]
@Benwing2 @Theknightwho moving the entries to their userspace is probably fine. They seem to not understand templates (but they are making an effort, as they seem to be trying to make their entries match others here). We could have them practice using templates in their userspace and, once we feel like they understand how templates work, they can move the entries back themselves. — Sameer (مشارکت‌ها · بحث)
As someone who regularly patrols Abuse Filter 68, I can tell you that creating entries with no templates is more common than you might think. Usually it's not bad faith- just cluelessness. Chuck Entz (talk) 04:23, 10 March 2024 (UTC)[reply]
@Benwing2: Well, they've already been asked, but it's too soon to tell how they'll respond. Chuck Entz (talk) 03:55, 10 March 2024 (UTC)[reply]
@Chuck Entz @Benwing2 they responded and they indicated they will wait until everything is resolved before continuing to edit. — Sameer (مشارکت‌ها · بحث) 05:19, 10 March 2024 (UTC)[reply]
@Sameerhameedy Sounds good, thanks for making the request. Benwing2 (talk) 05:21, 10 March 2024 (UTC)[reply]

Language titles with category

Could the language.titles have a clickable link to their Category? (main, or lemmas, whatever?) Ideally, also with tooltip with their code? (would be very helpful!). At pages with many language sectors, it is very difficult to go down to the bottom and find the language.
e.g. [:Cat:Afar language|<span title="Afar (aa)">Afar</span>] Thank you! ‑‑Sarri.greek  I 12:12, 10 March 2024 (UTC)[reply]

I'd rather not add any templates to the headings. One could implement a JavaScript gadget that automatically does this, though. — SURJECTION / T / C / L / 12:31, 10 March 2024 (UTC)[reply]
M @Surjection, Thank you. I have no idea how it could be done. I would be delighted at the output. ‑‑Sarri.greek  I 12:57, 10 March 2024 (UTC)[reply]
I have a working prototype in User:Surjection/linkLanguageHeaders.js. You can add it to your common.js to test it. Perhaps it can be turned into a gadget if there is interest. — SURJECTION / T / C / L / 13:08, 10 March 2024 (UTC)[reply]
Yes, I think it was agreed awhile ago not to use templates in headings and IMO this is just as well. Benwing2 (talk) 00:25, 11 March 2024 (UTC)[reply]

I was not proposing a way to do it, I was just showing the desired result. I don't know what js is. I do not change default looks at platforms. As a reader, I would like to click language.titles, because I do not know what they are and Categories are too far away to click. Could, please, en.wiktionary rethink it? Thank you. ‑‑Sarri.greek  I 01:36, 14 March 2024 (UTC)[reply]

Hi, it should be available now through Special:Preferences under "Gadgets" as "Add links to language headings that point to the category of the corresponding language." — SURJECTION / T / C / L / 19:09, 14 March 2024 (UTC)[reply]
Ω! Μ @Surjection! you did this for me? Hooray! Thank you, thank you! I will find it immediately. You are too kind. I hope, lots of people will like it and that it become standard! ‑‑Sarri.greek  I 19:57, 14 March 2024 (UTC)[reply]
It works! it is wonderful; why not for everyone? why hidden in 'gadgets'... You are a magician M @Surjection. The default should be the 'best' and the most useful. ‑‑Sarri.greek  I 20:06, 14 March 2024 (UTC)[reply]
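
For readers wondering what a gadget of this kind involves, here is a minimal sketch. It is not the code of the actual gadget: every function name is hypothetical, and it assumes that each category is named "<Language> language" (as in the "Cat:Afar language" example above) and that each level-2 heading contains nothing but the language name.

```javascript
// Hypothetical helper names throughout; this is not the code of the actual gadget.

// Build the category path for a language heading.
// Assumption: the category is named "<Language> language", as in the
// "Cat:Afar language" example above.
function categoryPath(languageName) {
  const title = languageName.trim().replace(/ /g, '_') + '_language';
  return '/wiki/Category:' + encodeURIComponent(title);
}

// Tooltip text in the requested "Afar (aa)" form; the code is optional.
function tooltipText(languageName, code) {
  return code ? languageName + ' (' + code + ')' : languageName;
}

// Browser-only part: wrap the text of each level-2 heading in a link.
// Assumption: every <h2> inside the content area is a language heading
// whose text is exactly the language name.
function linkLanguageHeadings(doc) {
  doc.querySelectorAll('#mw-content-text h2').forEach(function (heading) {
    const name = heading.textContent.trim();
    if (!name || heading.querySelector('a')) return; // skip empty or already linked
    const link = doc.createElement('a');
    link.href = categoryPath(name);
    link.title = tooltipText(name);
    link.textContent = name;
    heading.textContent = '';
    heading.appendChild(link);
  });
}

// Only touch the page when a DOM is available (i.e. in a browser).
if (typeof document !== 'undefined') {
  linkLanguageHeadings(document);
}
```

The language code for the tooltip would have to come from some lookup that this sketch does not attempt; `tooltipText` simply falls back to the bare name when no code is supplied. Languages whose category does not follow the "<Language> language" pattern would also need special handling.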

Make default language titles with category

Great news! M @Surjection, has made a Gadget and we can click the Language.Titles to go to the category! I propose it become default, for all to use. Kiitos! kiitos Surjection! ‑‑Sarri.greek  I 20:27, 14 March 2024 (UTC)[reply]

I don't personally think it should be the default, since it can be a bit distracting and confusing to those who aren't used to it. — SURJECTION / T / C / L / 21:44, 14 March 2024 (UTC)[reply]
But, M @Surjection, you have made it so discreet and elegant! There are no colours, or anything 'loud' about it. I find it very helpful, because there are many names of languages unknown to us. I am delighted, and I wish all people could use it too. (you may not guess it, but lots of us do not go to Preferences. This was my first time, except for Global Pref. for Vector Classic for wikipedias, and fr.wikt). ‑‑Sarri.greek  I 22:27, 14 March 2024 (UTC)[reply]
Thank you @Surjection for doing this! I tried it out and it looks great. @Sarri.greek I think enabling it by default could very well be done a bit down the line but for the moment we should wait to make sure it doesn't have any unexpected interactions with anything else. Benwing2 (talk) 22:36, 14 March 2024 (UTC)[reply]
Mainio! hieno! -in honour of M Surjection, from now on, Finnish will be the language of interjections. .js will be renamed .surjs @Benwing, many wiktionaries have clickable Lang.titles. I was, so longing for it. At el.wikt, the visible labels {{lb}} before definitions, link to their Cat. Where, we see on top, the word of the label in host language, and sorted on top, its translation in the target language :) Anything to facilitate readers! ‑‑Sarri.greek  I 04:38, 15 March 2024 (UTC)[reply]
I'm also worried about how it will behave in the mobile view, specifically about if it makes the headings harder to click to expand. It does help get around the lack of categories in the mobile view, something which has always greatly irked me. — SURJECTION / T / C / L / 07:35, 15 March 2024 (UTC)[reply]

Two transliterations

A question (after endless discussions of how to transliterate Modern Greek at Module_talk:el-translit). I do not know about other languages, but at least for Modern Greek ISO offers two types of conversions.

  • TypeA = unique.conversion letter-to-letter transliteration, reversible (two-directional), used for international usage. Customs, machines etc when one-to-one translit is needed.
  • TypeB = slightly simplified, and pseudo-phonemic, calls it transcription (but not with IPA symbols), for national usage. For Greek, the only difference to TypeA are two macron diacritics.
  • ISO also introduces an idea of a 'level 3' mixed Type, more phonemic, for national usage, 'especially' when the above transliterations are very different from the pronunciation.

The question is: Does en.wiktionary have a rule that says: a) en.wikt is obliged to provide the official unique.conversion ISO transliterations. b) en.wiktionary also provides a more phonemic transliteration based on ISO and House Rules, through consensus.
If a) is yes, then we should have two transliterations (for some languages). Discussions would be needed only for b), saving a lot of our energy. Two translits, How? I propose

word (xxxxx© / xyyxxxyy) ...or I for ISO --please check the tooltips

Thank you. ‑‑Sarri.greek  I 12:42, 10 March 2024 (UTC)[reply]
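
To illustrate why only the letter-to-letter type is described as reversible, here is a small sketch. The six-letter table is a toy chosen for illustration, not the ISO conversion table, and all names are hypothetical. It assumes, following the description above, that the simplified type differs only by dropping macrons.

```javascript
// Toy table for illustration only; NOT the ISO conversion table.
// Each Greek letter maps to exactly one distinct Latin letter ("Type A" style).
const TYPE_A = new Map([
  ['η', '\u012B'], // i with macron
  ['ι', 'i'],
  ['ω', '\u014D'], // o with macron
  ['ο', 'o'],
  ['ν', 'n'],
  ['μ', 'm'],
]);

// "Type B" style table: the same mapping with the macrons removed.
const TYPE_B = new Map(
  Array.from(TYPE_A, ([source, target]) =>
    [source, target.normalize('NFD').replace(/\u0304/g, '')])
);

// Convert a string character by character; unknown characters pass through.
function transliterate(text, table) {
  return Array.from(text, (ch) => (table.has(ch) ? table.get(ch) : ch)).join('');
}

// Build the reverse table, or fail if two sources share one target.
function invert(table) {
  const reverse = new Map();
  for (const [source, target] of table) {
    if (reverse.has(target)) {
      throw new Error('Not reversible: "' + target + '" has more than one source');
    }
    reverse.set(target, source);
  }
  return reverse;
}
```

With this toy table, `invert(TYPE_A)` succeeds and a round trip restores the original letters, whereas `invert(TYPE_B)` throws, because η and ι both become "i" once the macron is dropped.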

@Sarri.greek Agreed, Persian is also running into this issue. After a discussion months ago it was agreed that Persian templates should have two transliterations (Classical + Iranian) but modules don't support that so we can't do anything rn. I believe Hebrew editors have wanted something similar as well. — Sameer (مشارکت‌ها · بحث) 18:21, 10 March 2024 (UTC)[reply]
@Sameerhameedy There is some language-specific support for this in place at the moment: the major example being Chinese (and I'm not referring to the separate languages grouped together), where several lects show two or three transliterations each in the dropdown; Cantonese has four, and Mandarin seven(!). Korean, Thai and Khmer also do this in various ways, too.
It's clear that there needs to be a language-neutral way of showing things like this, and (taking Mandarin as a benchmark) it shouldn't be limited to transliterations into the Latin script, either, given one of the systems is Zhuyin and another Cyrillic. Theknightwho (talk) 20:16, 10 March 2024 (UTC)[reply]
Thank you M @Sameerhameedy. Asking M @Theknightwho for languages mentioned with 3 or 4 transliterations. What is the legal status of these? By 'obligatory' for wiktionary to show, we mean: ISO-assigned for international transactions like exports. Is there one and only one topping the others? The problem we have here is: Because wiktionarians try to adapt ISO to something more useful to our readers, the discussions a. never end. and b. every 5 or so years, someone comes up with an alteration or a restoration of some letter conversion. This will never end. ‑‑Sarri.greek  I 22:43, 10 March 2024 (UTC)[reply]
@Sarri.greek As far as I can tell, they all have one system which is used for things like links (i.e. as the "transliteration" in the normal sense), and the others are only shown on the entry.
I don't think we're under any obligation to choose the ISO standard as the main transliteration, but if we don't, then it's a good idea to show it on the entry itself. Theknightwho (talk) 22:50, 10 March 2024 (UTC)[reply]
@Theknightwho, I see that such languages have boxes for transliterations = they can manage multiple solutions. I was thinking of languages that have one translit. next to PAGENAME, and disabled the option to add a second one. May I add a point:
ISOs have been criticised for poor results and unsuccessful conversions. Still, I am not proposing to reform ISO here. If ISO makes changes, we record them and update the official translit. I am proposing to free ourselves from the rigid 1st translit, which is not-to-be-debated. Also: how do wikipedians face this problem? Does en.wikt. have a liaison to en.wikipedia for questions or coordination? Thank you. ‑‑Sarri.greek  I 23:04, 10 March 2024 (UTC)[reply]
@Sarri.greek The community of Wikipedia editors who work on language entries seems much smaller than the community of Wiktionary editors, so whoever has the most stamina tends to win out. E.g. User:Mahmudmasri insisted on particular standards for transliteration and phonemic rendering of Egyptian Arabic that I disagree with, but I don't have the energy to fight him on this and he does have the energy to patrol all the relevant pages and edit-war as necessary to get his preferred system in place, so that is what Wikipedia has. Similarly for things like language names and family trees; User:Kwamikagami out-staminas everyone else. I definitely agree with User:Theknightwho that we are under no obligation whatsoever to choose ISO's or anyone else's proposed standard as the preferred/main transliteration. We need to do what's right for Wiktionary and hopefully maintain some consistency of approach across languages where feasible. Benwing2 (talk) 00:24, 11 March 2024 (UTC)[reply]
Thank you @Benwing2 About your comment (for general 'rules') >>we are under no obligation whatsoever to choose ISO's or anyone else's proposed standard as the preferred/main transliteration.<< (Also by @Theknightwho) The problem with not having some 'locked' directives, is, that talks be endless. Official things: (ISO, spelling directives of Academies or similar). Are not official things the first obligation of wikt? = credibility, stability, well-referenced, not subject to 'talks' and alterations. I dislike it too, but as a reader, I expect the info available. Otherwise, I would have to go elsewhere to get it. For some ISOs: Wiktionary's standards aspire to give better results than the official ISO :) That would be nice! But one has to see the comparison. ‑‑Sarri.greek  I 01:46, 11 March 2024 (UTC)[reply]
@Sarri.greek Yes, sometimes consensus is hard to achieve but we all know that some ISO standards are garbage and/or have no adoption, and many ISO standards simply have different aims than we do at Wiktionary. I think we should aim to not be gratuitously different from ISO standards where possible (e.g. we use ISO language codes whenever possible rather than incompatible ones), but at the same time not be bound by them (e.g. sometimes we merge lects that ISO considers different, and sometimes we split lects that ISO considers the same). Benwing2 (talk) 01:53, 11 March 2024 (UTC)[reply]
Ok, then @Benwing2. This is the end of this talk, so, my proposal for 2 transliterations is withdrawn. ‑‑Sarri.greek  I 02:07, 11 March 2024 (UTC)[reply]
@Sarri.greek I don't think you need to withdraw your proposal just to end the conversation :) ... I do think having multiple translits is an interesting idea to be potentially considered further. After all, this is not the first or second time this idea has come up. Benwing2 (talk) 02:11, 11 March 2024 (UTC)[reply]

Holding a discussion between two in this medium is difficult, I find one between so many impossible! I will only say that Dictionaries should be accessible (understandable to the "Man on the Clapham omnibus" — Oxford dictionaries appreciated this and have changed substantive to noun in their entries). I suspect that most people, not understanding IPA, use the transliteration as a guide to pronunciation. I hope that whoever makes a decision (the cynic in me says that it will probably be changed again next year) will bear the "man from Clapham" in mind.   — Saltmarsh 06:07, 11 March 2024 (UTC)[reply]

Ωωωω! my wise mentor and administrator for Greek, @Saltmarsh! Hear, hear! Thank you. ‑‑Sarri.greek  I 06:13, 11 March 2024 (UTC)[reply]

One system, multiple transliterations

For Vedic Sanskrit, transliteration is abused to show the placement of the accent. Our policy is not to show the placement in the spelling of the word. Now, for finite verbs incorporating prefixes, there are two possible placements for the same verb, depending on the grammatical usage of the verb. Is there an approved mechanism for showing the two transliterations, and if so, what is it and where, if anywhere, is it documented? Or do we only show the accent for finite verbs for the usage where a verb without a prefix would bear an accent? The placement in the other case appears to be reliably predictable if one can identify the prefix. --RichardW57m (talk) 11:07, 11 March 2024 (UTC)[reply]

There's an undocumented solution of using |tr2= in templates similar to {{head}}, which currently works for some (perhaps all) Sanskrit headword templates. @Theknightwho: I don't know whether it is likely to be declared a 'hack' and broken with disdain. It looks from the code of Module:headword that it is intended to work. --RichardW57m (talk) 15:48, 26 March 2024 (UTC)[reply]
@RichardW57m Stop tagging me if you’re just going to make rude comments. Theknightwho (talk) 15:52, 26 March 2024 (UTC)[reply]
@Theknightwho: Kindly advise whether this technique is safe to use. I'm not sure what to conclude from its lack of documentation. Perhaps the correct solution is to clone the headword module for Sanskrit, though I hope not. --RichardW57m (talk) 16:10, 26 March 2024 (UTC)[reply]

I refer to this marking of the accent as an abuse partly because transliteration-related categorisation assumes that explicit transliterations are exceptional and worthy of review, whereas it is the norm for words found in accented texts. --RichardW57m (talk) 11:07, 11 March 2024 (UTC)[reply]

How should we transliterate (into Japanese script or other scripts), romanize, and lemmatize Ryukyuan?

Previous discussions

The following previous discussions can have useful possibilities.

Information

Lately, the Ryukyuan orthography has been a mess. Works vary between hiragana, katakana, and mixed script. There are vowels and syllabic consonants that cannot be transcribed cleanly/properly using Japanese orthography, so the central vowel (ɨ for example, サ行) has been variously transcribed in Japanese script as シゥ, シィ, ス, す, スィ, ス𛅤 (CJK small katakana wi, if you cannot render this character), you name it. Aspirated and unaspirated consonants are also variously referred to as plain and glottalized consonants, and one of either is distinguished in hiragana or katakana. At Wiktionary we use an ad hoc transcription of inserting dakuten into the aspirated (Amami) and unaspirated (Okinawan/Kunigami), which is not used anywhere else. We also use an ad hoc method of including kanji in Ryukyuan languages, which some people do to transliterate Okinawa songs (but I can't find an example at the moment). Thus, 送り仮名 (okurigana) is basically another ad hoc transcription. In addition, we are basically duplicating kanji information from the Japanese entry, which requires more time and effort.

For the glottalized consonants such as [⸢ʔwáː] 'pig', should we do っわー, or ’わー?

Miyako has a special vowel, variously referred to as an apical vowel, laminal vowel, or fricative vowel (it is not a central vowel), which is variously transcribed as (S)ɨ, (S)ï, ʉ, ɿ, z, ü, you also name it. In fact, there are syllabic consonants in Ogami Miyako that cannot be transcribed cleanly in Japanese kana script, although there's a possibility that some Ogami words are actually reflections of a fricative vowel, as Kaneda Akihiro's vocabulary spreadsheet (from personal communication) does.

For romanizing, take Okinawan Shuri dialect [⸢ʔútɕínáː] for example. We could variously romanize it as ucinaa, 'ucinaa, ?ucinaa, uchinaa, uchinā, 'uchinā, you name it as well. And central vowels ([ɨ] in this instance) could either be transliterated as ï or ɨ (perhaps IPA only, so the former can be more plausible?), or we can transliterate [i] as yi and [ɨ] as i, and also have a glottalized initial as qV (as in qutyinaa). For aspiration, we could include <h> for aspiration, but nothing for unaspiration (or <'>), or include <'> for aspiration but nothing for unaspiration.

Finally, do we lemmatize at the kanji, the kana, or romanization? The current situation is just a total mess.

TL;DR: Transliteration and lemmatization of Ryukyuan needs a massive overhaul; it's a mess as of right now.

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Mcph2): This is an important discussion for the orthography and lemmatization of the Ryukyuan languages. Please come to a consensus. Chuterix (talk) 17:10, 11 March 2024 (UTC)[reply]

We should lemmatize at what native speakers have used the most, absent a standard orthography, regardless of if it seems inconsistent or "ad-hoc". Defective or variant orthographies are not specific to Ryukyuan, and in other cases, we list the variants as alternative forms with the "standard" or most-common form as the lemma. (Or in the case of two differently-pronounced words represented by the same orthography, we disambiguate in the etymology + pronunciation sections)
For Okinawan in particular, there are several works written in mixed script (kanji & kana), and it looks to be the traditional orthography as well, so I wouldn't support a move to solely kana, and definitely not the Latin script. The same level of research should be done for the other languages as well; if they are more often written in the Latin script or katakana, then shifts can be made, but the research needs to be done first. AG202 (talk) 17:25, 11 March 2024 (UTC)[reply]
As someone who does not read Japonic/Ryukyuan literature and cannot otherwise comment much on this, I would just like to register my (ignorant) doubts towards/concerns regarding Wiktionary constructs such as {{ryn-readings}} (the concept of on'yomi vs. kun'yomi, at least) and Category:Northern Amami-Oshima Han characters (the concept of "Ryukyuan kanji" in general). Kana orthography seems to be under-developed, let alone usage of kanji (or should it be the other way around? placenames, etc.). Are we just reapplying 標準語 kanji to Ryukyuan? (can we examine 1. Japonic dialects [using kanji seems non-problematic] 2. Chinese "dialects" [本字 debates, "unwritten", etc.] 3. Jeju [the concept of Sino-Jeju is discouraged on Wiktionary]? as a comparison point for this topic?) —Fish bowl (talk) 09:29, 14 March 2024 (UTC)[reply]

Recent change to government standard for Japanese

I broke this off into a subtopic because I do not understand Japanese (and therefore cannot check original sources) and I'm generally ignorant of CJK languages, but per Wiktionary:Grease_pit/2024/March#FYI:_Major_romanization_change_coming_in_Japan, the government standard in Japan for Japanese is now Hepburn. As AG202 notes above about "absent a standard orthography", I'm just suggesting that the government there may have a standard for Ainu, Ryukyuan, etc. as well and that standard may be Hepburn also. Sorry if my ignorance introduces noise. :/ —Justin (koavf)TCM 17:53, 11 March 2024 (UTC)[reply]

the government standard in Japan for Japanese is now Hepburn.

Notably, this is for romanization, which is included on various kinds of signage explicitly for foreigners, as part of the country's efforts to court tourism money. This shift to Hepburn has nothing to do with text written in Japanese or other Japonic languages, outside of this very limited context (signs for foreigners). ‑‑ Eiríkr Útlendi │Tala við mig 20:43, 12 March 2024 (UTC)[reply]

Wikimedia Foundation Board of Trustees 2024 Selection

You can find this message translated into additional languages on Meta-wiki.

Dear all,

This year, the term of 4 (four) Community- and Affiliate-selected Trustees on the Wikimedia Foundation Board of Trustees will come to an end [1]. The Board invites the whole movement to participate in this year’s selection process and vote to fill those seats.

The Elections Committee will oversee this process with support from Foundation staff [2]. The Board Governance Committee created a Board Selection Working Group from Trustees who cannot be candidates in the 2024 community- and affiliate-selected trustee selection process composed of Dariusz Jemielniak, Nataliia Tymkiv, Esra'a Al Shafei, Kathy Collins, and Shani Evenstein Sigalov [3]. The group is tasked with providing Board oversight for the 2024 trustee selection process, and for keeping the Board informed. More details on the roles of the Elections Committee, Board, and staff are here [4].

Here are the key planned dates:

  • May 2024: Call for candidates and call for questions
  • June 2024: Affiliates vote to shortlist 12 candidates (no shortlisting if 15 or fewer candidates apply) [5]
  • June–August 2024: Campaign period
  • End of August / beginning of September 2024: Two-week community voting period
  • October–November 2024: Background check of selected candidates
  • Board's Meeting in December 2024: New trustees seated

Learn more about the 2024 selection process - including the detailed timeline, the candidacy process, the campaign rules, and the voter eligibility criteria - on this Meta-wiki page, and make your plan.

Election Volunteers

Another way to be involved with the 2024 selection process is to be an Election Volunteer. Election Volunteers are a bridge between the Elections Committee and their respective community. They help ensure their community is represented and mobilize them to vote. Learn more about the program and how to join on this Meta-wiki page.

Best regards,

Dariusz Jemielniak (Governance Committee Chair, Board Selection Working Group)

[1] https://meta.wikimedia.org/wiki/Special:MyLanguage/Wikimedia_Foundation_elections/2021/Results#Elected

[2] https://foundation.wikimedia.org/wiki/Committee:Elections_Committee_Charter

[3] https://foundation.wikimedia.org/wiki/Minutes:2023-08-15#Governance_Committee

[4] https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_committee/Roles

[5] Even though the ideal number is 12 candidates for 4 open seats, the shortlisting process will be triggered if there are more than 15 candidates because the 1-3 candidates that are removed might feel ostracized and it would be a lot of work for affiliates to carry out the shortlisting process to only eliminate 1-3 candidates from the candidate list.

MPossoupe_(WMF) 19:57, 12 March 2024 (UTC)[reply]

Last month, this user added well over a thousand problematic Greenlandic entries over a day or two by scraping a Greenlandic dictionary site and running an unauthorized bot on their account. I blocked them from mainspace and the Reconstruction namespace as an unauthorized bot and asked for help at the Grease pit (see Wiktionary:Grease pit#Hundreds of Incomplete Greenlandic entries need to be cleaned up) on getting them up to Wiktionary standards. The consensus seemed to be that it would be best to just nuke them all, which I have since done, for the most part. Aside from copyvio concerns (compilation copyright, if nothing else), the verbatim inclusion of typos and other irregularities in the headwords showed that the bot run had been prepared with only minimal attention to the content. They have admitted that they don't speak Greenlandic at all (they're editing from Brazil).

The user responded by apologizing on their talk page and by attempting to clean the entries up using an alternate account and as an ip, for which those were blocked by others on grounds of block evasion.

We need to discuss what to do next. While their methods were wrong, their motivation was to add content to the dictionary. They have admitted their mistakes and agreed not to repeat them. I made a point of only blocking them from two namespaces so they could discuss things here and on talk pages. This should not be about punishment for anything they did, but about whether they can be trusted to edit responsibly and add worthwhile content.

Pinging participants in the Grease pit discussion: (@Benwing2, DCDuring, Thadh, Vininn126), and users I've seen editing Greenlandic entries: (@Gamren, Jakeybean, Tesco250). Chuck Entz (talk) 14:57, 13 March 2024 (UTC)[reply]

The only input I can give is on admin decisions - I definitely think we should WT:Assume good faith and discuss with this user and teach them. Unfortunately when it comes to specifically Greenlandic I am very unfamiliar. I do think that they should stick to languages whose text they can at least read and understand (and not just rely on something else). Perhaps this user shouldn't be editing Greenlandic at all. Vininn126 (talk) 15:02, 13 March 2024 (UTC)[reply]
Unfortunately our dictionary does suffer from a severe lack of terms in many languages. However, if we don't have any editors who know the language, there is nothing we can do about that. The best course of action in my opinion would be to simply remove all these contributions, because currently a larger problem we are facing as a dictionary is untrustworthiness, which in turn decreases the number of willing editors in these languages. Better to not have any entries in a language than to have hundreds of questionable quality and validity at best. Thadh (talk) 16:35, 13 March 2024 (UTC)[reply]
Untrustworthiness is probably mostly based on English entries. Maybe we need to start over with a clean sheet of virtual paper. DCDuring (talk) 18:46, 13 March 2024 (UTC)[reply]
@DCDuring: I feel like you are using some kind of tone that isn't being taken over into your writing. What are you saying? That we should re-do our whole dictionary? Also applies to the message below, I'm confused what your opinion is. Thadh (talk) 19:46, 13 March 2024 (UTC)[reply]
I found the argument given spurious. If our problem is that we are thought untrustworthy, I find it hard to believe that the problem can be anywhere other than English entries. If untrustworthiness is a reason to delete content, then it is English entries that should be deleted. DCDuring (talk) 20:02, 13 March 2024 (UTC)[reply]
Most of the readers I interact with don't even use the English entries, so I think you're talking about a whole other reader base. If a language's sections don't have any references and half of the time feature an incorrect translation, then this is a disservice to the readers, and we should remove or improve those sections. If you think an English entry does not fulfill our CFI, you should RFV it, too, but mostly our English entries are pretty well-formed and represent the language adequately, as they are proofread by hundreds of native speakers. Not at all the case with our other language sections. Thadh (talk) 20:51, 13 March 2024 (UTC)[reply]
I've barely glanced at English entries except occasionally to make Romance etymologies that bleed into them more consistent. Nicodene (talk) 12:52, 15 March 2024 (UTC)[reply]
I guess that either we don't need no stinking first-draft-level Greenlandic entries from a volunteer or we should be trying to recruit someone (from where?) to add them from scratch. DCDuring (talk) 18:44, 13 March 2024 (UTC)[reply]
Perhaps we could see whether there is some other language's wiktionary that has some good Greenlandic entries. da.wikt? is.wikt? DCDuring (talk) 20:02, 13 March 2024 (UTC)[reply]
@DCDuring Alas, no. da.wikt has ~ 100 Greenlandic entries and they're all extremely basic stubs with a single-word definition and nothing more. is.wikt is even worse, with only 2 basic stub Greenlandic entries. In general, many non-English Wiktionaries are slim pickings; the entries are typically OK only for the native language of the Wiktionary in question and often not even then. For many languages, en.wikt does a far better job than the corresponding language's own Wiktionary. Benwing2 (talk) 07:28, 14 March 2024 (UTC)[reply]
It seems a shame that there are so many resources for Greenlandic available from the Greenland Language Secretariat to evaluate and improve the first-draft/stub entries, but we can't find the linguistic talent motivated to improve the entries. Oh well. DCDuring (talk) 14:30, 14 March 2024 (UTC)[reply]
@DCDuring You are welcome to do the entries yourself. Theknightwho (talk) 00:35, 15 March 2024 (UTC)[reply]
Needless to say, please do consult a grammar (or more) before doing so. Thadh (talk) 09:10, 15 March 2024 (UTC)[reply]
I'll just request Greenlandic translations for the organisms that sometimes live there. Maybe I'll venture to add a Greenlandic entry for them too. DCDuring (talk) 12:48, 15 March 2024 (UTC)[reply]
As I understand the question, it is not about what to do or not do with Greenlandic entries, but whether to lift the blocks. My inclination is to remove the blocks while admonishing the user to constrain their edits in future to their languages of (reasonable) competence. Given the apologies, I feel we currently do not have more reason to distrust this user than any random new user.  --Lambiam 20:58, 20 March 2024 (UTC)[reply]
I haven't had time to review the user's edits in depth, but reading through the user's talk page and this discussion, I am inclined to agree: as far as the user/block is concerned, we can unblock and see whether subsequent edits are good or not. (If no other admin wants to beat me to that, and if no-one has objections, then someone can ping me in a few days and I'll unblock them.) - -sche (discuss) 18:07, 23 March 2024 (UTC)[reply]
Yes, this is fine with me too. When I have blocked users for long periods or indefinitely, it's generally because the user (a) thinks they know what they're doing but doesn't, (b) continues adding trash despite multiple warnings, and (c) doesn't respond to those warnings (or responds defensively and denies there's an issue). It's a good sign if a user apologizes and promises to change their behavior (although some users do that and then continue the same pattern of adding trash, so they need to be monitored). Benwing2 (talk) 18:26, 23 March 2024 (UTC)[reply]

Hoping to convene on practice regarding natural overlap of hyponyms and derived terms

There are many nouns for which a population of hyponyms and a population of derived terms will quite naturally have a substantial overlap. For example, in English, the noun list has laundry list, punch list, and dozens more. The theme is quite generalizable. What I propose here is to codify the principle that it is not wrong to show such terms both in a hyponyms section and in a derived terms section. Earlier I had avoided doing so because I anticipated that otherwise someone else might complain that having the same term twice on the page was "clutter". But there are some good reasons, regarding w:structured data, why allowing for the natural degree of double-posting is a good idea. Does anyone strenuously object to doing so? Note that column wrappers can be used, so there is no excuse for a section with lots of content not to auto-collapse. Thus, the user will not be presented with a giant unfolded list. Thanks. Quercus solaris (talk) 17:34, 13 March 2024 (UTC)[reply]

Definitely not wrong, actually encouraged imo. Thadh (talk) 17:37, 13 March 2024 (UTC)[reply]
I support this. "Laundry list" is both a derived term from "list" as well as a hyponym of "list". CitationsFreak (talk) 18:38, 13 March 2024 (UTC)[reply]
It is pure clutter on taxonomic name entries. I have limited derived terms to items that are not hyponyms, ie, accepted species names are hyponyms, not derived terms. No-longer-accepted species names that have been placed in other genera may appear as derived terms. The rule-driven, mechanical nature of the derivation of almost all tribe and family names and many order names makes their inclusion in any derived terms lists often of similarly low value, insufficient to warrant the clutter. OTOH, if we really want this duplication, there is nothing to prevent a bot from doing the job thoroughly. DCDuring (talk) 20:14, 13 March 2024 (UTC)[reply]
User:Sae1962 had a frequent habit of adding extremely basic hypernyms (i.e. going the other way), so he would take something like Hypertext Markup Language and add a hypernym of language. I consider this fairly unhelpful to human readers, though technically correct: it reminds me of reading Java or .NET programming documentation where you have a huge list of derived or inherited classes going all the way back to object, because everything is an object in the end. Equinox 20:20, 13 March 2024 (UTC)[reply]
I thought the Derived terms section was only for terms that could not fit under Hyponyms or some other section (though I suppose it would nearly always be Hyponyms). I think there are at least some pages that have an HTML comment in the Derived terms section warning users not to add terms that could be put into a hyponyms section or some other section. But I didn't bookmark anything and I don't see it on the policy page. Soap 20:53, 13 March 2024 (UTC)[reply]
Great points everyone. Thanks. Perhaps not a firm rule to be codified now. Wiktionary could impose firm consistency retroactively, later, if it ever feels the need. So where I'll leave it is that I'll respect and obey any existing setups that don't double-post (such as taxonomy, or entries with comments discouraging it). And I'll follow the principle that whichever method is used, just make sure that auto-collapse is keeping everything nice and orderly. Quercus solaris (talk) 16:08, 14 March 2024 (UTC)[reply]

Renaming "etymology-only language"

@Theknightwho, -sche I think the time has come to rename the term "etymology-only language" to something else. This term is cumbersome, and while it was accurate originally when the codes in question could be used only in etymology templates, it's long outgrown that particular use case. I would propose one of "dialect", "subvariety" or "sublect". "Dialect" is the most straightforward and arguably is exactly what these varieties are in most cases, but it's a bit of a loaded term given the longstanding language-vs-dialect controversy that happens with many language varieties. Thoughts? Benwing2 (talk) 04:57, 14 March 2024 (UTC)[reply]

I'll just add we treat Middle Polish as such, and I'm not sure dialect would be the best term for it. Unless we accept "dialect" to mean "any variant of"... Vininn126 (talk) 07:21, 14 March 2024 (UTC)[reply]
@Vininn126 Good point. That is why I suggested "subvariant" and "sublect". "Variant" on its own could sort of work but it feels too vague without some other qualifier, since "variant" and "lect", at least in some contexts, are generic terms covering any type of language. Benwing2 (talk) 07:25, 14 March 2024 (UTC)[reply]
@Benwing2, Vininn126: May I suggest merolect? The word sees no established usage, so we can thereby avoid any undesirable connotations, and with the etymological sense of "part-language", it means exactly what we want it to mean. 0DF (talk) 14:33, 14 March 2024 (UTC)[reply]
Variant is succinct, and covers all the different kinds of etym-only language: dialects, chronolects, regional varieties, written standards etc. Theknightwho (talk) 14:37, 14 March 2024 (UTC)[reply]
@Benwing2, at el.wikt we mark them as 'sublang' = subordinate languages. The weird thing here is that they can be donors but not receivers. How is this possible? The 'subordinate' or 'hosted' languages/varieties/dialects/whatever have Cat:Terms derived from this.sublang (donor to other languages) Cat:Sublang terms derived from X.language (as receivers) e.g. MedLat alchemia at wikt:el:alchemia has a Cat:Med.Lat terms borrowed from Arabic. ‑‑Sarri.greek  I 15:03, 14 March 2024 (UTC)[reply]
@Sarri.greek I'm not keen on this; I'm not sure about Greek, but in English the term "subordinate" implies a lesser status, which is likely to put some contributors off. Theknightwho (talk) 16:23, 14 March 2024 (UTC)[reply]
I meant, M @Theknightwho, that they are marked at module sublang=true. If the question is about the 'name' of all of them, it doesn't matter. But, how could this title convey that these languages are not allowed what the others are? They are code-only languages with only existence, in the template {{m}} and being a donor but never a receiver at etym.templates. My big surprise, worry, and question is: why are they not receivers?? Probably this is not the place to ask this. I just bring it up because it is relevant, and because I do not intend to open such a subject myself. ‑‑Sarri.greek  I 16:41, 14 March 2024 (UTC)[reply]
@Sarri.greek We do often include them in descendant sections. Also, the elephant in the room is Chinese, which we already subdivide for this purpose; we simply group them all under one header. Theknightwho (talk) 16:44, 14 March 2024 (UTC)[reply]
Please, please, Sir, think about it! @Theknightwho, Benwing2 why etymologies should be inaccurate? Medieval Latin alchemia and similar LaMed words, give the Cat:Latin terms derived from Arabic, which should include only a subcategory: Medieval Latin terms derived from Arabic. cf @el.wikt.Cat.Lat.from.Ar has only this subcat. The etymologies of descendants, should say 'from Med.lat' not 'from Lat'? because it is a medieval word. ‑‑Sarri.greek  I 16:58, 14 March 2024 (UTC)[reply]
@Sarri.greek This is an orthogonal point. I think you're asking for allowing etym-only languages in the |1= param of etym templates and categorize under e.g. CAT:Medieval Latin terms derived from Arabic in addition to CAT:Latin terms derived from Arabic. We don't currently do this but the trend is towards allowing etym-only languages in more places (hence this renaming discussion), so potentially we could allow this. IMO though this should be a separate discussion from what we should rename "etym-only language" to. Benwing2 (talk) 20:45, 14 March 2024 (UTC)[reply]
Agreed. Vininn126 (talk) 20:47, 14 March 2024 (UTC)[reply]
May I suggest variety? It is the most neutral commonly-used term that comes to mind. ‘A variety of Spanish’ brings up some seven million hits on Google. Nicodene (talk) 15:41, 14 March 2024 (UTC)[reply]
Yeah, this is probably a better suggestion than "variant", and is a widely-used term. Theknightwho (talk) 16:20, 14 March 2024 (UTC)[reply]
So far variety is my top option. I understand the logic of merolect but I think we should avoid obtusisms if possible. Vininn126 (talk) 16:23, 14 March 2024 (UTC)[reply]
@Nicodene, Theknightwho, Vininn126: I'm also happy with variety. 0DF (talk) 17:47, 14 March 2024 (UTC)[reply]
"Dialect" isn't ideal, because we already have dialectal data modules. Likewise, "variety" isn't ideal, because language data has a field for "varieties" that is just a list of names (see e.g. Category:English language). — SURJECTION / T / C / L / 18:41, 14 March 2024 (UTC)[reply]
@Surjection I’d argue the opposite: the two listed for English are both lects which would benefit from having a code of this type, so naming them “variety codes” makes total sense. Theknightwho (talk) 19:45, 14 March 2024 (UTC)[reply]
Can you say the same of every "variety" currently specified for every language? — SURJECTION / T / C / L / 20:26, 14 March 2024 (UTC)[reply]
Do you have an alternative suggestion? Vininn126 (talk) 20:27, 14 March 2024 (UTC)[reply]
My point is that we should avoid adopting terminology that is already used for something else. It's only going to make everything more confusing than it is. — SURJECTION / T / C / L / 20:42, 14 March 2024 (UTC)[reply]
I don't think this is something else, though - the whole point of the varieties field is to list more specific types of the main language, which is precisely what these codes are for. I've not been able to find a counter-example to that yet, since alternative names for the language itself should go under the "aliases" field instead. Theknightwho (talk) 21:24, 14 March 2024 (UTC)[reply]
@Surjection I think we shouldn't worry about existing internal names. The "dialectal data modules" are probably going away in any case (see my post in the Grease pit) and we can rename the language data field. Benwing2 (talk) 20:40, 14 March 2024 (UTC)[reply]
BTW since most people seem to support the term "variety", maybe we can call the internal data field "variant" or "lect". Benwing2 (talk) 20:41, 14 March 2024 (UTC)[reply]
Sure, renaming that field is another option, and then we can call etymology-only language "varieties". — SURJECTION / T / C / L / 20:42, 14 March 2024 (UTC)[reply]
@Surjection Can you provide an example of something which should go under the "varieties" field in the language data which shouldn't ever have an etymology-only code? Theknightwho (talk) 21:25, 14 March 2024 (UTC)[reply]
I'm not saying I know of any such cases. What I am saying is that nobody knows, until the work to check them is put in, that all of the currently registered "varieties" could reasonably have their own codes. — SURJECTION / T / C / L / 21:31, 14 March 2024 (UTC)[reply]
Alright - I'll do that. Theknightwho (talk) 21:56, 14 March 2024 (UTC)[reply]
Just FYI I suspect that everything that qualifies as a "variety" under the variety field can reasonably have an etym-only code. We have current etym-only codes for conventional dialects (regional lects/topolects), chronolects (e.g. Early Modern English), registers/sociolects (e.g. Katharevousa), cants (e.g. Polari), even writing systems (e.g. Wade-Giles). It might be useful to set up the ability to categorize etym-only varieties by the type of lect involved; currently this info is found only in the associated category and only at the level of regional lect vs. everything else. Also, if we get serious about adding etym-only codes for all varieties, we might want to split the data into submodules the way we currently do for full languages. Benwing2 (talk) 22:32, 14 March 2024 (UTC)[reply]
If we're adding etym-only codes for all varieties, do we still need a "varieties" field in Module:languages, or is it just redundant to Module:etymology languages/data and its "parent"/"3" field? (I think the/an original reason varieties were listed in Module:languages is so people searching the module for e.g. Twi would find what code covered it; we might want to retain "varieties" that have ISO codes but let Module:etymology languages/data handle all the other ones...?) - -sche (discuss) 23:10, 14 March 2024 (UTC)[reply]
My (half-serious) suggestion last time this came up was "subsumed variety", since that seems to be the distinguishing characteristic (?), that these are codes that are subsumed under other codes. There are some edge cases like substrates which none of the proposed names fit, e.g. the pre-Roman substrate of the Balkans—or as it was recently (non-consensusly?) renamed, Paleo-Balkan—is not really a "dialect" or "subvariety" or "subsumed variety" of anything, it's "one or more unknown languages from place X". I agree with Surjection we shouldn't be using the same name for two nonidentical things, so if we call these "varieties", we should consider whether to rename or retire the "varieties" field in Module:languages as discussed above.
BTW, re "dialect", another issue with that term is: are things like "Classical Latin" and "Late Latin" "dialects", per se? (If they are, we need to update the entry dialect.) - -sche (discuss) 23:10, 14 March 2024 (UTC)[reply]
Anyway, I think "variety" is fine as long as we decide what to do with the "varieties" field in Module:languages. (In particular, if we're giving every one of Module:language's "varieties" a code, do we still need the "varieties" field? Maybe we just make sure the "search for a language" thing on Module languages also searches through the list of variety codes, and that solves the issue that "varieties" were IIRC initially added to Module:languages to deal with, which was if someone wondered what code to enter e.g. "Twi" under.) - -sche (discuss) 18:11, 23 March 2024 (UTC)[reply]
@-sche I think for the moment we will need to have the "varieties" field because a lot of the info in that field isn't well-vetted, meaning it will take work to convert the info in that field into proper etym-only languages. Since the current practice is to not put varieties in the "varieties" field that also exist as etym-only languages, I think we should rename the field "other lects" (or "other variants" or even just "other varieties"). As for the search box on Module:languages, it looks like it already does what you want it to do; e.g. if I search for "Twi", the first entry that comes up is in Module:etymology languages/data, which is the module holding etym-only languages (aka variety codes). Benwing2 (talk) 18:21, 23 March 2024 (UTC)[reply]

Proposal

@Theknightwho, -sche, Vininn126, Surjection, Nicodene, 0DF, Sarri.greek Pinging the people who took part in the above discussion. I'd like to formally propose renaming "etym-only language" to "language variety" and rename the "varieties" field in Module:languages to "lects".

Note also: As per my recent discussion Wiktionary:Grease pit/2024/March#merging lect info, I'd like for the "varieties"/"lects" field to go away in favor of consolidated info somewhere, probably in the labels data modules (which is where I've put such information for Chinese, see Module:labels/data/lang/zh). Then the lects can be pulled out of the label data by looking for labels with the parent field (which indicates they are lects in a tree of such lects). But whether you like or dislike this approach and/or would prefer a different one is orthogonal to the above proposal, which is only about renaming terminology that has outlived its usefulness.
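To make the "pull the lects out of the label data" idea concrete, here is a minimal sketch in Python. It is only an illustration: the real label data lives in Lua modules whose exact layout is not shown in this thread, so the `LABELS` table, its placeholder entries, and both helper functions are assumptions made up for the example. The only thing taken from the proposal is that a label counts as a lect when it carries a parent field.

```python
# Hypothetical label data. The entry names and the "xx" language code are
# placeholders; the real Lua data modules may be laid out differently.
LABELS = {
    "Lect A": {"parent": "xx"},       # top-level lect of language "xx"
    "Lect A1": {"parent": "Lect A"},  # sub-lect of Lect A
    "Lect A2": {"parent": "Lect A"},
    "archaic": {},                    # ordinary label, no parent, so not a lect
}


def extract_lects(labels):
    """Return {lect name: parent name} for every label that has a parent field."""
    return {name: data["parent"] for name, data in labels.items() if "parent" in data}


def children_of(lects):
    """Group lects under their parent so the tree can be walked top-down."""
    tree = {}
    for name, parent in lects.items():
        tree.setdefault(parent, []).append(name)
    return tree


lects = extract_lects(LABELS)
tree = children_of(lects)
```

With data shaped like this, a separate "varieties"/"lects" list becomes derivable rather than hand-maintained, which is the point of the consolidation described above.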

Implementation: This should not be terribly hard as AFAIK we don't have any templates that specifically reference etymology-only languages using that name. We'd just need to rename Module:etymology languages and associated data modules to Module:language varieties, change any references to those modules in other code (which shouldn't be that many since most of them go through Module:languages anyway), and update documentation. We should also consider (at the same time or later) renaming the methods getNonEtymological, getNonEtymologicalName and getNonEtymologicalCode to getFullLanguage, getFullLanguageName and getFullLanguageCode. This can be done by renaming the methods in Module:languages while keeping the old ones as aliases until all callers are updated. Note that this is all purely internal and won't affect any mainspace Wikicode. Finally, we need to rename the "varieties" field in the languages extra data to "lects"; again this is all internal, and hardly anyone references or uses this data so it won't be very much effort.
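The "rename the methods while keeping the old ones as aliases" step can be sketched as follows. This is a Python analogue, not the actual Lua code in Module:languages: the `Language` class, its constructor, the snake_case method names, and the placeholder code for Medieval Latin are all assumptions for illustration. Only the old/new method naming (getNonEtymological to getFullLanguage) and the idea of temporary aliases come from the proposal.

```python
import warnings


class Language:
    """Toy stand-in for a language object; not the real Module:languages API."""

    def __init__(self, code, name, full=None):
        self.code = code
        self.name = name
        self._full = full  # the full language a variety belongs to, if any

    # New name (analogue of the proposed getFullLanguage).
    def get_full_language(self):
        return self._full if self._full is not None else self

    # New name (analogue of the proposed getFullLanguageCode).
    def get_full_language_code(self):
        return self.get_full_language().code

    # Old name kept as a thin alias until all callers are updated
    # (analogue of getNonEtymological).
    def get_non_etymological(self):
        warnings.warn(
            "get_non_etymological is deprecated; use get_full_language",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_full_language()


latin = Language("la", "Latin")
# "med-la-placeholder" is a made-up code used only for this example.
medieval_latin = Language("med-la-placeholder", "Medieval Latin", full=latin)
```

Because the alias just forwards to the new method, callers can be migrated one at a time, and the warning gives a way to find the ones that remain before the old name is dropped.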

Please indicate support, abstain, oppose, etc. below so that we have clear consensus for doing this.

Benwing2 (talk) 05:36, 2 April 2024 (UTC)[reply]

Support what you want since it is internal anyway and you needed to do something about redundant data fields. Fay Freak (talk) 05:43, 2 April 2024 (UTC)[reply]
Support Vininn126 (talk) 05:52, 2 April 2024 (UTC)[reply]
Support SURJECTION / T / C / L / 06:02, 2 April 2024 (UTC)[reply]
Support 0DF (talk) 06:46, 2 April 2024 (UTC)[reply]
Support, support! ‑‑Sarri.greek  I 10:13, 2 April 2024 (UTC)[reply]
@Theknightwho, -sche, Vininn126, Surjection, Nicodene, 0DF, Sarri.greek Sorry for the second ping. I went to implement the first part of this (`varieties` field -> lects) and I realize there's a small issue, which is that there is currently a `varieties` field for all three of languages, families and scripts, and "lects" is only applicable to languages. We could keep `varieties` for families and scripts except there are also "family varieties" renamed from etymology-only families (just Old and Middle Iranian languages). There are two other possibilities I can think of, which are "variant" and "subvariety". I think subvariety might be confusing in that people would think there's something inherently "lesser" about the subvarieties listed in the extra data, so I propose "variant". Benwing2 (talk) 19:43, 5 April 2024 (UTC)[reply]
@Benwing2: It's not as good, but since it's all internal, it doesn't matter much, so I'm also OK with variant. 0DF (talk) 20:25, 5 April 2024 (UTC)[reply]
Not to get off-topic, but given the way we use "Middle Iranian"—that we actually reconstruct terms in it, "Middle Iranian *foobar"—it does not make sense to me that we're calling it a family code. I seem to recall it was just one user who insisted we mustn't internally put it in the lect module? But maybe we just take a !vote and see whether people agree with that, if treating it as a family is also complicating other things. If it's not a lect, then I'm not sure it makes sense to be reconstructing terms in it; if we're reconstructing terms in it, then we're ipso facto treating it as a lect, similar to a reconstructed language (like how some people think Proto-Indo-European may have been more of a dialect continuum than a unitary lect, but we still treat it as a lect for our purposes). - -sche (discuss) 22:59, 5 April 2024 (UTC)[reply]
@-sche Pinging User:Vahagn Petrosyan who I think is the one who mostly uses it, and User:Theknightwho who may have opinions. IMO neither "Middle Iranian languages" nor "Old Iranian languages" should exist at all but it seems that Armenian specialists like to reconstruct terms in "unspecified Middle Iranian" and "unspecified Old Iranian". Maybe these should be treated as etym-only languages (aka language varieties) whose parent is a family (which is allowed), and called "unspecified Middle Iranian" etc. since that's what they are. I should note though that currently we have the variety field filled out in various places for all three of languages, families and scripts. If we eliminate "family varieties" "Middle Iranian" and "Old Iranian", we could keep the variety as variety for families and scripts and use lect for languages, although that might be slightly confusing; or we could rename "etym-only languages" to "lects" instead of "varieties". I dunno. Benwing2 (talk) 23:28, 5 April 2024 (UTC)[reply]
As I said before, I can stop using Middle Iranian and Old Iranian. Simply "Iranian" is good enough for me. Vahag (talk) 11:01, 7 April 2024 (UTC)[reply]
I can also, since you have a technical background providing a reason for it, basically simplification, as the sole reason I use etym-only-languages “Middle Iranian” and “Old Iranian” is periodization, me informing the reader that I have an idea whether the Greek term (for example) was borrowed 0–700 CE or 700–0 BCE. One can see internal differences between Old and Middle Iranian but they are only rough distributions. Inb4 Victar is annoyed due to witnessing a change he has not consulted about, in spite of seeing this discussion header – techbro imperialists removing languages, d'oh! (But really changing the internal relations of preserved langcodes does not change the dictionary content, so one can be bold about recoding them, as you do nothing without affording most diligent advice.) Fay Freak (talk) 12:33, 7 April 2024 (UTC)[reply]
Support variant even better! thank you for your hard work! ‑‑Sarri.greek  I 20:14, 5 April 2024 (UTC) PS The actual situation of it, is 'hosted' or 'subordinate to' = We find this language hosted Under the title 'XX language' Regardless of the reason (etymological, regional, or other). ‑‑Sarri.greek  I 21:14, 5 April 2024 (UTC)[reply]

Wiktionary really needs structured etymology

I've become convinced that Wiktionary's current etymology system, in which each entry contains the complete ancestry of a term, is creating massive problems that prevent Wiktionary from being a good etymological dictionary. Here are the problems:

  • Massive duplication: consider English puny and its earlier form puisne. We repeat the exact same information in different entries, blatantly violating the DRY principle. That's just two entries: the etymology of a widely-borrowed term like the ancestor of English sugar has to be duplicated across hundreds of languages. Often, editors don't bother and just write something like "see term#English for more details".
  • Entries falling out of sync: English nexus claims to derive from Proto-Indo-European *gned- or *gnod- through Latin necto. But necto was recently revised, claiming that its origin is "uncertain". Which entry is a reader meant to trust? This kind of inconsistency is actually encouraged by the current system, because after editing the etymology of a term, an editor has to hunt down and correct every single place where that etymology is referenced or copied. More often than not, they don't, and the result is that entries can drift out of sync and sometimes even contradict each other.
  • Redundant edits: editors spend large amounts of time expanding "derived terms" and "descendants" sections which is necessary only because of limitations in the current system. Because if we know that A is an ancestor of B, there's no point in also writing that B is a descendant of A—that's clearly implied. But we have to anyway, since there's no automated system that can make that logical step.

Structured etymologies would also let us do cool things, like create etymological trees and automatically find cognates and doublets across different languages.

Here is a simple model for creating structured etymologies:

  • Each etymology section of an entry needs to be associated with one or more etymons. An etymon is a term which is the ancestor of another with no intermediate steps. Thus, the etymon of puny is puisne, the etymon of puisne is Anglo-Norman puisné, and so on.
  • An entry can have more than one etymon. For example, English arrangement can be said to derive from English arrange, English -ment, and French arrangement.
  • There are different kinds of etymons: English fullwidth clearly derives from full +‎ width, but is also calqued from Japanese 全角 (zenkaku). The first two are morphological etymons, while the last is a semantic etymon. Another example is bullroar (sense 3) which is morphologically from bull +‎ roar but semantically from bullshit.
  • An etymon can also have a degree of certainty: the levels might be "certain", "likely", and "unlikely". This is an improvement from the current system, where something is either {{derived}} or it isn't. Sometimes, when editors aren't confident, they add |nocat=1, but this isn't a standardized practice.

Thus, to create a list of derived terms or descendants, all you would need to do is get a list of entries which have a particular term as an etymon.
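
The model above can be sketched in a few lines of Python. This is only an illustration, not an existing Wiktionary module: the `Etymon` class, the `"Language:term"` key format, and the function names are all assumptions made for the example. Each entry stores only its direct etymons, and the list of descendants is obtained by inverting that mapping.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Etymon:
    """One direct ancestor of an entry (hypothetical structure)."""
    term: str                      # e.g. "English:puisne"
    kind: str = "morphological"    # or "semantic"
    certainty: str = "certain"     # "certain", "likely" or "unlikely"


# entry -> its direct etymons only (examples taken from the post above)
ETYMONS = {
    "English:puny": [Etymon("English:puisne")],
    "English:puisne": [Etymon("Anglo-Norman:puisné")],
    "English:fullwidth": [
        Etymon("English:full"),
        Etymon("English:width"),
        Etymon("Japanese:全角", kind="semantic"),
    ],
}


def build_reverse_index(etymons):
    """Invert entry -> etymons into etymon term -> entries."""
    index = defaultdict(list)
    for entry, parents in etymons.items():
        for parent in parents:
            index[parent.term].append(entry)
    return index


def direct_descendants(term, etymons=ETYMONS):
    """Return the entries that name `term` as a direct etymon."""
    return sorted(build_reverse_index(etymons).get(term, []))
```

With this layout nobody writes a "descendants" list by hand: `direct_descendants("English:puisne")` is computed from what the puny entry already states.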

The main problem to consider is how all this can be accomplished. Here are the possibilities:

  1. Use Lua data modules, which are well-established on this wiki but are fairly unintuitive for new users and might cause performance issues.
  2. Get an extension like Wikibase, which is already used on Wikidata. To be clear, I'm not saying we should turn Wiktionary into Wikidata, but rather use a Wikidata-like structure for this application. The drawback is that this would require WMF developers to get something done (not their strong suit).
  3. Use bots. A bot can essentially function as a parser which converts high-level information into wikitext. This kind of thing is already being done with our {{anagrams}} system. This is the technically simplest solution, but would require someone to continually run a bot.
  4. Do nothing and keep writing etymologies manually. This is easier for now, but probably not a great long-term solution.

Another problem to consider is how this structured information should be presented to the reader. But all in all, I'm curious as to what the community thinks should be done. Ioaxxere (talk) 23:18, 14 March 2024 (UTC)[reply]

I agree that the current way etymologies are handled is problematic. If there is some technical solution to that, it would be great; I don't understand that side of things so am not sure what can be done.--Urszag (talk) 00:17, 15 March 2024 (UTC)[reply]
@Ioaxxere IMO none of your proposed solutions is workable. I would rather suggest a scraping solution. This is what we do for Descendants, for example, and it seems to work fairly well. (Your proposed solution #1 was tried for Descendants prior to implementing scraping, and failed, which led to the scraping solution.) This should not be too hard, but it might require the introduction of a few more templates to more clearly spell out the relations between etyms. Not sure. Benwing2 (talk) 00:46, 15 March 2024 (UTC)[reply]
@Benwing2 By scraping, do you mean creating an etymology equivalent of {{desctree}}? I think the main limitation of this is that you can only go in one direction (i.e. you wouldn't be able to use the etymology data to get a list of descendants). But I would definitely consider that an improvement over the current situation.
Actually: I thought of a way to resolve this, by using the category system. We already have categories like Category:English terms suffixed with -en. If we created categories for every single "terms descended from LEMMA" (there would be millions) this could theoretically be used to encode a tree. What do you think? Ioaxxere (talk) 02:15, 15 March 2024 (UTC)[reply]
@Ioaxxere Yes, something like {{desctree}}. It's true this wouldn't easily let you go from lists of descendants to ancestors and vice-versa, but (a) it would solve the other issues, (b) it's not clear in any case you would want to automate things in both directions in all situations; there are lots of complex cases involving etymologies that can't be neatly categorized and need descriptive text, and the Descendants lists and Etymology sections are conceptually different in their current implementations. Since the etymology sections are less structured than Descendants sections, some thought would have to go both into the conventions needed in Etymology sections so that the scraping result looks reasonable, and into how to implement the scraping itself and handle the various edge cases. Not something I have time to work on now but I agree it would be a good idea in the longer run and avoid lots of duplication and the inevitable bit rot associated with this (maybe there's a better term than "bit rot" to describe the inevitability of things getting out of sync when you have duplication). Benwing2 (talk) 03:28, 15 March 2024 (UTC)[reply]
I wholeheartedly agree but think you may be underestimating the sheer scale of the undertaking. Even with bot help it will take a lot of hands-on work by knowledgeable editors. My instinct is to pare this down to the basic core: set etymologies to point only one lemma higher and add the kind of 'scraper' currently being discussed. (And then call in the clean-up crew for a massive spectrum of languages...) As someone who edits mainly ety and desc sections, this alone, if feasible, would clear up 80% of my headaches. Nicodene (talk) 05:39, 15 March 2024 (UTC)[reply]
I don't disagree with 'make etymologies only point to the next level up' as a goal, but the issues that have come up when that's been discussed before, which you are probably aware of but which I want to make sure are mentioned here now that the idea itself has been mentioned, include (1) what if the next level up doesn't have an entry? e.g. I just added an etymology to chabot, but neither the Occitan chabotz nor the Latin capoceus exists to house the information that they ultimately seem to come from caput. (maybe in that case we add the full ety to chabot but also add a template that categorizes the entry as needing Occitan and Latin editors to help by creating those entries and moving the information thither?) (2) someone who's only interested in e.g. the etymology of English or French words now has to watchlist Occitan and Latin (etc) pages to see if that etymology gets changed; in some cases where minor languages see vandalism, this would make it less likely to be spotted (e.g. people try to add all kinds of weirdness to Kamboja, and if they were adding it to some less-watched Indian language instead, it'd get noticed less). (Also 3: if I want to know how many Old English words survive in English, and our English entries only point to Middle English, I'm stymied... but that one we could solve if the scraper/bot mass-adds that template that people currently use to add categories to etymologies.) - -sche (discuss) 06:39, 15 March 2024 (UTC)[reply]
For 1) I meant the next entry up which exists, and if it's a dead end, the etymology is left as-is. The proposed change wouldn't affect the chabot situation one way or another. For 2) I don't have an answer. Is vandalism that bad of a problem? Maybe I've just not noticed since I'm not the one dealing with it. 3) I wouldn't do this without including some kind of automatic categorization that runs through the etymology chain, or maybe regular bot sweeps that add/fix dercat. Not that I know how feasible that is, now that you mention it. Nicodene (talk) 09:44, 15 March 2024 (UTC)[reply]
For point 2 with Wikibase, the model is not Wikidata but c:Commons:Structured data. It's a new tab with data that bots can populate from key templates. Lua can access it recursively to create new and more powerful templates. Other tools like SPARQL can query the data. It's all about how to model metadata. Vriullop (talk) 09:33, 15 March 2024 (UTC)[reply]
I generally support this idea but I have no strong opinions on what the exact solution should be. I like being able to point to a specific etymon to generate structure from there, however that structure looks. Vininn126 (talk) 10:11, 15 March 2024 (UTC)[reply]
I've been thinking about this, and I'm starting to feel that a category system is the only sensible way to implement this. Here's how it would work:
  1. Start at an entry (say biology)
  2. Add an etymon template to the top of the etymology section. It might be formatted like this: {{etymons|en|id=life science|Biologie#German: biology|bio-#English: life|-logy#English: study}}. The |id= parameter defines the {{etymid}}, while the subsequent parameters link to etymons by their etymids.
  3. The {{etymons}} template adds the category Category:ety:biology (English: life science), which represents a node in the etymology tree (although the naming scheme isn't final).
  4. A bot creates the category with {{auto cat}}.
  5. {{auto cat}} scrapes the page biology and discovers the {{etymons}} template. Using this information, it adds Category:ety:biology (English: life science) into the categories Category:ety:Biologie (German: biology), Category:ety:bio- (English: life), and Category:ety:-logy (English: study).
Now, getting the descendants or derived terms of biology is as simple as seeing what entries are in Category:ety:biology (English: life science). There might be subpages, like Category:ety:biology (English: life science)/uncertain or Category:ety:biology (English: life science)/semantic, to include cases I discussed in my original post. But overall, the concept is essentially {{prefixsee}} or {{suffixsee}}, just for every term. @Benwing2, would you support implementing this template? It could coexist in parallel with the current system for now until we figure out a way forward.
To answer a few others:
  • @Nicodene: Yes! Having etymologies point only one lemma higher is the entire purpose of this proposal. Because if we have a chain A -> B -> C, there's no reason why C needs to "know" that it comes from A. It's implicit. The problem is that editors spend lots of time writing out the entire chain on every entry, and this is done in an inconsistent way. But there's no rush to implement this on a massive scale right away. As stated above, we should have this coexist with the current system.
  • @-sche: Those are honestly good questions. In the case of A -> B -> C, if B doesn't exist, it might be reasonable to create it as a "dummy entry" with an etymology section and nothing else. Another possibility is to just link A -> C. For the second point, I don't think we should be designing our systems with the expectation of vandalism. But yes, an editor would have to watch a variety of pages to follow an entire etymological chain. However, someone who's really only interested in English etymology wouldn't care about, say, which PIE root an entry comes from, because that's not English etymology. In the case of French chabot, we might do {{etymons|fr|id=fish|caput#Latin: head|q1=uncertain}}, which would add it into Category:ety:caput (Latin: head)/uncertain.
  • @Benwing2: I think the term you're looking for might be entropy.
Ioaxxere (talk) 14:43, 15 March 2024 (UTC)[reply]
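
A rough Python sketch of the category scheme in steps 1 to 5 above, for anyone who wants to see the mapping concretely. Everything here is an assumption for illustration: the `LANG_NAMES` table, the function names, and the naive split on `|` (which would break on nested templates or piped links). It only shows how one {{etymons}} call could yield the node category and its parent categories.

```python
# Hypothetical code-to-name table; a real implementation would use the
# wiki's own language data.
LANG_NAMES = {"en": "English", "fr": "French"}


def node_category(term, lang_name, etymid):
    """Build the category name for one node of the etymology tree."""
    return f"Category:ety:{term} ({lang_name}: {etymid})"


def parse_etymons(page_title, wikitext):
    """Return (own_category, parent_categories) for a flat {{etymons}} call."""
    text = wikitext.strip()
    if not (text.startswith("{{etymons|") and text.endswith("}}")):
        raise ValueError("not an {{etymons}} call")
    parts = text[2:-2].split("|")[1:]   # drop the template name
    lang_code, rest = parts[0], parts[1:]
    etymid = None
    parents = []
    for part in rest:
        if part.startswith("id="):
            etymid = part[3:]
        elif "#" not in part:
            continue                    # other named params, e.g. q1=uncertain
        else:
            term, _, tail = part.partition("#")
            lang_name, _, parent_id = tail.partition(": ")
            parents.append(node_category(term, lang_name, parent_id))
    own = node_category(page_title, LANG_NAMES[lang_code], etymid)
    return own, parents


own, parents = parse_etymons(
    "biology",
    "{{etymons|en|id=life science|Biologie#German: biology"
    "|bio-#English: life|-logy#English: study}}",
)
```

Here `own` is the category the entry is placed in, and `parents` are the categories that step 5 would place that category into.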
@Ioaxxere I think we need a solution that can work with existing etymologies. That probably means accessing a chain of data if possible from a single entry, and if that doesn't work, falling back to accessing from multiple entries in a chain. I don't think having an entirely new system in place in addition to the old system will work very well. Benwing2 (talk) 19:32, 15 March 2024 (UTC)[reply]
@Benwing2: What you're asking for seems impossible. The current system is ambiguous in that we link to an entry without specifying its etymid, meaning that "going up the chain" is rarely possible to do in an automated way. If your plan then involves specifying etymids for every etymology section, then we might as well just overhaul everything, because it's the same amount of work anyway. Ioaxxere (talk) 20:53, 15 March 2024 (UTC)[reply]
@Ioaxxere I think we'd have to examine some actual use cases before deciding it's impossible. In many cases there's only one Etymology section, for example. Benwing2 (talk) 21:00, 15 March 2024 (UTC)[reply]
@Benwing: the problem with this heuristic is that a) there's no guarantee that the single-etymology entry is actually the correct one (maybe the actual ancestor hasn’t been added yet) and b) it could break unpredictably if someone adds a new etymology section later. The basic use case, in my view, is to replace our current "from X, from Y, from Z" with just "from X" and have the rest be automatically filled in. Those on the Discord will have seen my struggles in trying to do just this. Ioaxxere (talk) 21:29, 15 March 2024 (UTC)[reply]
@Ioaxxere Sure but realistically I don't think trying to implement a completely new system will work. We need to find a solution that leverages what's already there. Benwing2 (talk) 21:35, 15 March 2024 (UTC)[reply]
@Benwing2: We can leverage our current data by using bots to convert etymology sections into a structured format, but this can only be done in situations where we are certain that errors won't be propagated. For example: if A is listed as an ancestor of B#Etymology_2, and we find B listed as a derived term or descendant at A#Etymology_2, we can fairly confidently connect A#Etymology_2 and B#Etymology_2. I have implemented this heuristic in my own script and it works very well. Ioaxxere (talk) 03:26, 16 March 2024 (UTC)[reply]
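
For illustration, the mutual-link heuristic described above might look roughly like this in Python. This is not the script mentioned in the comment; the data layout (dicts keyed by page title and etymology section number) and the function name are assumptions. A link is accepted only when exactly one section of the ancestor page lists the term back as a descendant, so ambiguous cases are left unresolved.

```python
def resolve_links(ancestors, descendants):
    """Connect etymology sections that point at each other.

    ancestors:   {(page, section): set of ancestor page titles}
    descendants: {(page, section): set of pages listed as descendants
                  or derived terms in that section}
    Returns {(page, section, ancestor_page): ancestor_section}.
    """
    links = {}
    for (page, section), parent_titles in ancestors.items():
        for parent in parent_titles:
            # Sections of the ancestor page that list `page` back.
            candidates = [
                sec
                for (title, sec), kids in descendants.items()
                if title == parent and page in kids
            ]
            if len(candidates) == 1:   # unambiguous: safe to connect
                links[(page, section, parent)] = candidates[0]
    return links


# Made-up data mirroring the A/B example above, plus an ambiguous case.
ancestors = {("B", 2): {"A"}, ("D", 1): {"C"}}
descendants = {
    ("A", 1): set(),
    ("A", 2): {"B"},
    ("C", 1): {"D"},
    ("C", 2): {"D"},   # both sections of C list D, so D stays unresolved
}
links = resolve_links(ancestors, descendants)
```

Skipping the ambiguous cases is the point of the design: a bot using this would only convert links it can confirm from both sides.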
@Ioaxxere Just FYI, before you set off to radically restructure etymologies, you need to (a) get consensus, (b) keep in mind what will be workable for the typical editor; ideally the system should be as little different as possible from what we have already. It's also better to do this stuff dynamically through scraping if at all possible, vs. requiring a bot to run periodically. Benwing2 (talk) 04:16, 16 March 2024 (UTC)[reply]
@Benwing2 It seems as though there's consensus for a change of some kind, but no agreement as to how it should be implemented. And that's something I'm also still thinking about... Ioaxxere (talk) 07:32, 17 March 2024 (UTC)[reply]
Ben's idea on something like {{desctree}} would imply that {{bor}}/{{inh}} would be able to check the pointed-at etymon and print information from there, and potentially go several pages back. If such a system were implemented, I think {{af}} should obviously be excluded: imagine printing the information for all the morphemes! It also wouldn't work for redlink pages, just the same as {{desctree}}. Vininn126 (talk) 07:42, 17 March 2024 (UTC)[reply]
But, as mentioned, every word would need etymid's and etymology sections, of course... Vininn126 (talk) 07:48, 17 March 2024 (UTC)[reply]
@Benwing2, Vininn126 I created a mockup of my concept at User:Ioaxxere/under. I created a module, Module:User:Ioaxxere/etymon, which can recursively go backwards through various entries to build a chain of etymons (no categories are involved). Currently, it can only handle "From X, from Y"-type etymologies, so we'll need more complex parameters to represent stuff like {{af}}. Please let me know what you think! Ioaxxere (talk) 05:15, 18 March 2024 (UTC)[reply]
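
The recursive walk described above can be sketched as follows. This is not the code of Module:User:Ioaxxere/etymon (which is Lua); it is a minimal Python illustration with an assumed data layout, in which each entry records a single direct etymon, plus a loop guard in case two entries name each other.

```python
def etymon_chain(entry, etymon_of, _seen=None):
    """Return the ancestors of `entry`, nearest first.

    etymon_of maps an entry to its single direct etymon; entries with
    no recorded etymon end the chain.
    """
    seen = _seen if _seen is not None else {entry}
    parent = etymon_of.get(entry)
    if parent is None:
        return []
    if parent in seen:
        raise ValueError(f"etymology loop at {parent!r}")
    seen.add(parent)
    return [parent] + etymon_chain(parent, etymon_of, seen)


# Each entry states only the next step up (example from the original post).
ETYMON_OF = {
    ("English", "puny"): ("English", "puisne"),
    ("English", "puisne"): ("Anglo-Norman", "puisné"),
}

chain = etymon_chain(("English", "puny"), ETYMON_OF)
```

The full "from X, from Y" chain is rebuilt on demand, so puny never has to repeat what puisne already says.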
@Ioaxxere I took a brief look. I really don't want to be a party pooper but my sense is the scraping needs to be a lot more sophisticated and more able to work with existing entries (I feel I've said this before). Manual or even manual-assisted conversion of large numbers of entries to a new system/template just won't be happening any time soon. The reason {{desctree}} works is that it works with existing entries without requiring everything to be converted to a new format (and to the extent things have been converted, like when I changed {{desc}} to accept multiple terms, it's been in a completely automated fashion). Benwing2 (talk) 05:22, 18 March 2024 (UTC)[reply]
@Benwing2 Take the example of father. Let's say I want the etymology to be synced up with its etymon, Middle English fader. But wait, do we want fader (Etymology 1) or fader (Etymology 2)? A human would obviously realize that the correct section is etymology 1. An automated scraping template could easily figure this out as well if we added heuristics like "Etymology 1 is a lot longer" and "Etymology 1 is on top" and "Etymology 1 links to father in its descendants section" and "Etymology 1 and father list the same ancestors" and "Etymology 1 is defined as father". The problem is that these heuristics can get arbitrarily complex and break in unpredictable ways. That's why we should be working towards using etymids.
Also, I have a question about {{desctree}}: how would I get it to scrape bar#Descendants_2? The entry doesn't have etymids, so this doesn't seem to be possible.

Manual or even manual-assisted conversion of large numbers of entries to a new system/template just won't be happening any time soon.

Would you be opposed to trying out a new system on a few entries, such as father and its five ancestors? Like {{desctree}}, this would have no effect on any other entry. Ioaxxere (talk) 06:11, 18 March 2024 (UTC)[reply]
@Ioaxxere: I see a number of possible problems. Can you assure me that they aren't?
1. Intermediate steps may be unattested.
2. Uncertainty as to the borrowing route. The OED has this problem with terms that may come from French or some form of Latin, and Thai has many words for which the ultimate source is Pali or Sanskrit, and indeed some which are blends of the two. A further problem is that many of these words were probably (but not certainly) borrowed via Old Khmer, where the spelling is chaotic. The path of mainland SE Asian loans from Pali or Sanskrit may be very uncertain.
3. Clusters of 'obvious' cognates, but for which there is no authoritative proto-form. Tai languages often show this.
4. Word A is derived in language X, and word B in language Y may be inherited from A in X or independently formed in Y. RichardW57m (talk) 16:51, 18 March 2024 (UTC)[reply]
@RichardW57m Here are my proposed resolutions.
1. User:-sche highlighted this issue with entries like French chabot, which a reference suggests derives from Latin caput through Vulgar Latin *capoceus (which is unlikely to ever be created). This could be written as: {{etymon|id=fish|der|unc|la>caput>head|text=perhaps from <1> (via Occitan, from unattested {{m+|VL.|*capoceus}})}}.
In natural language, this represents: French chabot (etymid: fish) may be derived from Latin caput (etymid: head), but this is uncertain. Also, the entry should display the text "perhaps from Latin caput (via Occitan, from unattested Vulgar Latin *capoceus)".
The template would also be able to automatically fetch the ancestors of Latin caput (etymid: head), although we probably don't want that in this case.
-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]
2. One example of this is in English crusado, which is partially borrowed from Spanish cruzado as well as Portuguese cruzado. This could be written as: {{etymon|id=crusader|bor|es>cruzado>cross|pt>cruzado>cross}}
In natural language, this represents: English crusado (etymid: crusader) is borrowed from either/both Spanish cruzado (etymid: cross) and Portuguese cruzado (etymid: cross). The template would automatically generate the text "Borrowed from Spanish cruzado and/or Portuguese cruzado." in the entry (the |text= parameter could be used to change the display text).
-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]
@Ioaxxere: What I particularly had in mind is cases where one possible source is 'inherited' from the other. (In the case of Pali and 'Sanskrit', this requires that we write in Wiktionarian.) This example doesn't address that issue. RichardW57m (talk) 12:46, 19 March 2024 (UTC)[reply]
@RichardW57m: it would be helpful if you told me the specific case you have in mind. However, in the example I gave it doesn't matter at all how the Spanish and Portuguese terms are connected. Ioaxxere (talk) 15:30, 19 March 2024 (UTC)[reply]
@Ioaxxere: I was having trouble finding a clean example. I think our etymologisers have wrongly assumed that a Sanskrit-based spelling implies borrowing from Sanskrit, rather than reshaping. A clean example is พันธุ์ (pan). --RichardW57m (talk) 16:48, 19 March 2024 (UTC)[reply]
@RichardW57m: Thank you for the example! Before I answer, I need to get some clarity on the meaning of "or" in that etymology. Does it mean that the term was borrowed either from Sanskrit or Pali (but definitely not both), or does it suggest that the term could have been borrowed from both languages at different points? This is the kind of ambiguity that I'm hoping to eliminate. Ioaxxere (talk) 17:38, 19 March 2024 (UTC)[reply]
@Ioaxxere: I find it hard to see how if it could have been borrowed from either then it could not have been borrowed from both. Indeed, it could have been borrowed from both simultaneously, and on multiple occasions. Now, in Thai, there are some cases where a word seems to have been borrowed from Pali and then the spelling upgraded to Sanskrit, e.g. ธมม (or had it become ธัมม?) from Pali being replaced by ธรรม (tam) with (rɔɔ) from Sanskrit (but with gemination of the letter seemingly being the borrowing of a Pali spelling pattern not applicable to (rɔɔ) in words of Pali origin). --RichardW57m (talk) 18:00, 19 March 2024 (UTC)[reply]
@RichardW57m: In that case, it seems like the etymology is equivalent to crusado. The code used in the Thai entry describes the immediate origin of the Thai term, so the fact that the Pali and Sanskrit terms are related doesn't actually change anything. So the code would be: {{etymon|th|id=breed|bor|sa>बन्धु>kinsman|pi>bandhu>kinsman}}, which might produce "Partially borrowed from Sanskrit बन्धु (bandhu) and Pali bandhu." Ioaxxere (talk) 18:47, 19 March 2024 (UTC)[reply]
@Ioaxxere: In which case {{etymon}} should not be used. --RichardW57m (talk) 09:53, 20 March 2024 (UTC)[reply]
@RichardW57m: I'm not sure what you mean by this. Is there something wrong with what I said? Ioaxxere (talk) 17:45, 20 March 2024 (UTC)[reply]
@Ioaxxere: Yes. It rather implies that there is some third source of the word.
Moreover, with the tight restriction on how far back tracing would go, our statements would not be compatible with the word actually being borrowed via Khmer. I did begin to have some doubts as to the origin of the apocope in this word; I am not totally sure that it is part of the mechanism of directly borrowing from Pali and Sanskrit, especially as some words have been borrowed without apocope. Possibly we can stick in 'ultimately' to cover ourselves. The dropping of final -a from the 'rough forms' (i.e. stems) of words has been incorporated in the way Thai borrows from Pali and Sanskrit, but not as a mandatory process. (I suspect some words may also have been borrowed as the first elements of compounds, thereby preserving the final -a.) --RichardW57m (talk) 18:12, 20 March 2024 (UTC)[reply]
@RichardW57m: I see what you mean. In that case, I suggest "Borrowed from Sanskrit बन्धु (bandhu) and/or Pali bandhu." If we want to allow the possibility of an intermediate step or steps, it could be "Derived from Sanskrit बन्धु (bandhu) and/or Pali bandhu." (in which case the template would be using |der rather than |bor). Also, by "borrowed via Khmer", are you referring to Khmer ព័ន្ធុ (pŏənthuʼ)? If so, that can easily be added in as well. Ioaxxere (talk) 19:07, 20 March 2024 (UTC)[reply]
@Ioaxxere: I'm not sure why you're excluding borrowing in the latter case, but these wordings are better. For borrowing via Khmer, I would have expected the final written vowel to have been silent, but the word you give is definitely a cognate. --RichardW57m (talk) 09:24, 21 March 2024 (UTC)[reply]
3. If there is no proto-form, the {{etymon}} template wouldn't have any ancestors listed and wouldn't be very useful. If for some reason the only thing we knew about English king was that it was cognate with German König, the entry might have: {{etymon|id=monarch}} Cognate with {{m+|de|König}}. (Note: for now, I'm not sure if it's possible to automatically get cognates as we would have to go up and then down the etymology tree, although it might be possible with category stuff).
-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]
4. This one's simple: English unlock (for example) is from Middle English unloken but also equivalent to un- +‎ lock. This could be written as: {{etymon|id=open lock|enm>unloken>unlock|afeq|un->inverse|lock>mechanism}}.
In natural language, this represents: English unlock (etymid: open lock) is inherited from Middle English unloken (etymid: unlock), and is also equivalent to English un- (etymid: inverse) + English lock (etymid: mechanism). The template would automatically generate the text "From Middle English unloken, equivalent to un- +‎ lock." in the entry.
-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]
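
To make the proposed {{etymon}} syntax from the examples above easier to reason about, here is a rough Python parser sketch. It is an assumption-laden illustration, not the template's actual implementation: the keyword set, the default relation `inh` (inferred from "is inherited from" in the unlock example), and the `default_lang` argument are all guesses, and the naive split on `|` cannot handle a |text= parameter containing nested templates such as the chabot example.

```python
RELATIONS = {"inh", "bor", "der", "af", "afeq"}   # assumed keyword set


def parse_etymon(template, default_lang):
    """Parse a flat {{etymon|...}} call into a small dict."""
    body = template.strip()
    if not (body.startswith("{{etymon|") and body.endswith("}}")):
        raise ValueError("not an {{etymon}} call")
    args = body[len("{{etymon|"):-2].split("|")
    result = {"id": None, "uncertain": False, "etymons": []}
    relation = "inh"                     # assumed default: inherited
    for arg in args:
        if arg.startswith("id="):
            result["id"] = arg[3:]
        elif arg == "unc":
            result["uncertain"] = True
        elif arg in RELATIONS:
            relation = arg               # applies to the etymons that follow
        elif ">" in arg:
            parts = arg.split(">")
            if len(parts) == 3:          # lang>term>etymid
                lang, term, etymid = parts
            elif len(parts) == 2:        # term>etymid, same language as entry
                lang = default_lang
                term, etymid = parts
            else:
                raise ValueError(f"cannot parse etymon {arg!r}")
            result["etymons"].append((relation, lang, term, etymid))
        # anything else (e.g. a bare language code) is ignored in this sketch
    return result


unlock = parse_etymon(
    "{{etymon|id=open lock|enm>unloken>unlock|afeq|un->inverse|lock>mechanism}}",
    default_lang="en",
)
```

Note that `un->inverse` splits cleanly into the term `un-` and the etymid `inverse`, since the hyphen belongs to the affix and only `>` is treated as a separator.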
@Ioaxxere: Actually, this type of assertion is one that I am trying to avoid. The situation I gave was, "Word A is derived in language X, and word B in language Y may be inherited from A in X or independently formed in Y". You are asserting that word B derives from word A. There are also cases where one can confidently note that, despite formal appearance, word B does not descend from word A. --10:40, 19 March 2024 (UTC) RichardW57m (talk) 10:40, 19 March 2024 (UTC)[reply]
Another interesting case of this would be something like rocznik, i.e. possibly from Old Polish, but scholars aren't sure. Vininn126 (talk) 12:59, 19 March 2024 (UTC)[reply]
@RichardW57m, Vininn126 Based on the current information at Polish rocznik, I would write this as {{etymon|id=year|unc|zlw-opl>rocznik>flower|af|rok>year|-nik>performer}}. I used the keyword afeq (equivalent affix) in a previous example but I'm beginning to doubt whether there's any need to have both it and af (affix). Ioaxxere (talk) 15:38, 19 March 2024 (UTC)[reply]
{{{eqaf}}} just seems to be {{surf}}. Vininn126 (talk) 15:40, 19 March 2024 (UTC)[reply]
Correct. Ioaxxere (talk) 15:47, 19 March 2024 (UTC)[reply]
@Ioaxxere: I think the question is about the Polish word. The formal match is fine, but the semantics don't work except in so far as an old formation may guide a new formation. --16:53, 19 March 2024 (UTC) RichardW57m (talk) 16:53, 19 March 2024 (UTC)[reply]
@Ioaxxere: It seems like a rather elaborate system, and for that reason it won't be easy to get widespread approval. I reiterate the earlier comment about slimming this down to its ‘core’, which is already very ambitious. Nicodene (talk) 12:19, 20 March 2024 (UTC)[reply]
@Nicodene Yes, this is the point I've been trying to make as well. Ideally we could leverage what we have already, possibly with some minimal changes (e.g. adding an extra etymid param or something if absolutely necessary). The tradeoff should be in the direction of adding more logic to the scraping function so that less explicit specification on the part of the editor is needed. Benwing2 (talk) 17:14, 20 March 2024 (UTC)[reply]

Update: I've added the template to father (& ancestors). In the end I didn't implement any text generation, since I don't think template-generated text can ever be better than human-generated text. Instead, I'm having it create an etymology tree which is pretty cool as well. Ioaxxere (talk) 00:28, 27 March 2024 (UTC)[reply]

@Ioaxxere: Did you get approval to add this? It doesn't seem like there's an actual consensus for it, and it is a major change. Additionally in the Proto-Indo-European entries that you've added it to, like at *peh₂-, it's causing weird spacing issues that weren't there before. You should only add it if there's a clear consensus for it. For testing purposes, you can use sandbox pages instead. CC: @Benwing2 AG202 (talk) 01:16, 27 March 2024 (UTC)[reply]
@AG202 @Ioaxxere I agree, please use sandbox pages until there's consensus to add this to mainspace pages. Benwing2 (talk) 01:29, 27 March 2024 (UTC)[reply]

Adding "Língua Geral" as a new language[edit]

It's a fairly well documented language, and represents a 150-year-old path between Old Tupi (tpw) and Nheengatu (yrl). Língua Geral has some evolutions that appear in both late Portuguese borrowings and Nheengatu, and are hard to show without it, like some changes in pronunciation (kunumĩ > kurumĩ) and meaning (paranã (sea) > paraná (“river”)). Língua Geral is also present in a good number of Brazilian toponyms, like Botuverá, and having an Etymology section saying "From Old Tupi" would just be wrong.

The cut between Língua Geral and Nheengatu is set at 1853 by most scholars, when the word "Nheengatu" was first used with the current meaning. The cut between Old Tupi and Língua Geral is a bit more nebulous. Navarro used 1700 for his dictionary, so I'd go with that. For the code, I suggest <tpw-lg>, as it comes from Old Tupi.

There existed two varieties, Língua Geral Amazônica and Língua Geral Paulista, but I think that they could just be pointed out using {{lb}} when needed, rather than two separate L2 headings (if this ever becomes a new L2). What do you think? Trooper57 (talk) 20:31, 16 March 2024 (UTC)[reply]

@Trooper57 I think what you are proposing is a full (L2 header) language. I don't know much about the differences between Old Tupi, Nheengatu and Língua Geral, but 150 years seems rather narrow a window for a full language; unless things evolved really fast, this could also (maybe better) be handled as an etym-only language variant of either the preceding or following stages. At least to me, the changes in pronunciation you give (kunumĩ > kurumĩ and paranã -> paraná with a semantic shift) do not seem indicative of a radical transformation in the language. Also, for a code I'd suggest maybe tpw-lig as we try to make the second component of a two-component language code have three chars. Benwing2 (talk) 21:57, 16 March 2024 (UTC)[reply]
Maybe an etym-only language would suffice. The trouble I was having was how to state that a word came from a later stage of Tupi, and not from the 16th century.
Also, as I understand it, the fast pace of LG comes from its marginalization: the Marquis of Pombal prohibited anything besides Portuguese, so it ended up being an unscripted, nonstandardized language. Trooper57 (talk) 00:26, 17 March 2024 (UTC)[reply]
@Trooper57 If we add tpw-lig as an etym-only language variant of Old Tupi, then all you need to do is use the code in place of tpw and it will show as "From Língua Geral ..." with the appropriate link to the Língua Geral category (which doesn't seem to exist but can be created). Benwing2 (talk) 00:32, 17 March 2024 (UTC)[reply]
@Trooper57 BTW "Língua Geral" and "Geral" are listed as other names for Nheengatu in our data, which is consistent with how Wikipedia describes things (i.e. Língua Geral being an older stage of Nheengatu rather than a later stage of Old Tupi). Benwing2 (talk) 00:39, 17 March 2024 (UTC)[reply]
There are authors that call everything from 1500 onwards "Língua Geral". It gets really confusing sometimes. Trooper57 (talk) 01:23, 17 March 2024 (UTC)[reply]
@Trooper57 OK. Let's wait a couple of days for anyone else to weigh in who might be knowledgeable about this topic (please ping anyone you think might be able to contribute), and then we can create an etym-only lang for Língua Geral, either tpw-lig or yrl-lig, whatever you think most appropriate. We can also create subvarieties *-lga (Língua Geral Amazônica) and *-lgp (Língua Geral Paulista) if you think this would be helpful (e.g. if you think these varieties will ever see fit to be cited in an etymology). Benwing2 (talk) 01:53, 17 March 2024 (UTC)[reply]
@RodRabelo7 and @NoKiAthami are the other Old Tupi editors I can think of. There's also @Arthur botelho, but he has been inactive since 2019. Trooper57 (talk) 02:22, 17 March 2024 (UTC)[reply]
@Benwing2, the grammars of the General Languages are very distinct from the ones of Old Tupi and Nheengatu, hence I agree with @Trooper57. An example: "Nitípe nde Caräíba?" in Old Tupi would be "Nda karaíba ruãpe endé?"; "Xe çüí nití oiabàb" would be "Xe suí i îababe'ymi", etc. In Nheengatu it would probably be similar, but still there are significant differences, in grammar and vocabulary. I honestly don't know the implications of "adding a new language" on Wiktionary, but undoubtedly the General Languages are independent languages. Those who call the language spoken from 1500 onwards Língua Geral (or even Nheengatu) should be completely ignored; for instance, Cândida Barros and Monserrat categorically state that the term isn't pertinent to the 16th century. In my opinion, Navarro's division is the most didactic one. Melhor feito do que perfeito ("better done than perfect")… Pinging Erick Soares3 and Bageense in case they have anything to add. RodRabelo7 (talk) 17:38, 23 March 2024 (UTC)[reply]
@RodRabelo7 One gauge is to consider the differences between Old, Middle and Modern English. Middle English covered a c. 450-year time window with significant differences in the grammar and phonology between Early and Late Middle English (e.g. Early Middle English had 4 cases and 3 genders, while Late Middle English had no cases and no genders), but we don't split Early and Late Middle English for various reasons, e.g. (a) there's no obvious line to draw between Early and Late Middle English; (b) overly fine splits scatter information in different places that might be better grouped together; (c) splits risk creating duplicate information, where essentially the same lemmas appear in several places, with inevitable bit rot as changes in one place don't get properly propagated to the other(s). If the original motivation was simply to better show etymological progression, that can be done without any issue with etym-only languages and the appropriate labels; we do that for example with the different stages of Latin, where Classical Latin, Late Latin, Early Medieval Latin, Medieval Latin, New Latin, etc. are all grouped under the L2 Latin header and labels used to identify distinct stages. Benwing2 (talk) 18:13, 23 March 2024 (UTC)[reply]
I think Latin would be a good example to follow in this case. For example, for paranã we could use {{lb|tpw|Língua Geral}} and say "3. (Língua Geral) river" to state that this sense appeared later. Trooper57 (talk) 18:31, 23 March 2024 (UTC)[reply]

Reconstruction:Latin → Reconstruction:Proto-Romance?

Periodically people leave me messages like this asking why the provided pronunciations can be so different from what they would expect from ‘Vulgar Latin’, not understanding that these are reconstructions that work backwards from Romance and not forwards from Latin. To be fair, it's not as if we make this easy to understand. Every page in question prominently says ‘Latin’ at the top, while the ‘Proto-’ and ‘Romance’ parts are buried deeper in the entries, in template labels that are (apparently) easy to miss. Examples that have confused people include *cordarium and *damnaticum.

So, would we be better off splitting out reconstructions based on Romance under a new name? An incidental benefit would be that the relatively few Classical or pre-Classical reconstructions such as *futo, which are currently buried under a mass of Romance forms, would be easier to find and compare. Naturally the objection can be made that most if not all of the reconstructions probably pre-date any meaningful split between Latin and Romance, and so a proper entry of this kind amounts to ‘Late Latin, but unattested’ - and this is why I've not been inclined to change the way that we handle these so far. But I've come to realize that a change in name would have the benefit of clarifying for people how these reconstructions work, which must be rather opaque for anyone but a specialist.

Then the follow-up question is: if we do this, then should all such entries fall under the umbrella of ‘Reconstruction:Proto-Romance’, with labels distinguishing lower-order reconstructions like Proto-Gallo-Romance, or should they be split up accordingly? Nicodene (talk) 12:58, 20 March 2024 (UTC)[reply]

@Nicodene Personally I'd rather not introduce a new L2 for Proto-Romance because the reconstructed forms look so much like Latin; but I understand your concerns as well. If we are to do this, IMO there should be only one L2 for Proto-Romance, at least for now, with lower-order reconstructions distinguished using labels. In any case AFAIK the inner structure of the Romance languages isn't that worked out? Benwing2 (talk) 17:18, 20 March 2024 (UTC)[reply]
Just throwing out there two "least-change" solutions to people being confused about the pronunciations; we could adopt either, both or (of course) neither. One is to start repeating the "Proto-Ibero-Romance" etc. labels as {{q}}s right before or after the pronunciation. The other is to start (additionally) providing the Classicizing / Classical-esque pronunciation, which is plainly what people are looking for in these and other Latin entries, and which they tend to use in the situations (like Latin classes at school) where people still use Latin; I think we should be providing it across all our Latin entries. (I know you've opposed that because the people using those pronunciations are not Classical Romans, but I think we could find some label to make that clear.) - -sche (discuss) 17:39, 20 March 2024 (UTC)[reply]
@-sche @Nicodene I agree with this. Note that for example, we provide the "modern Italianate Ecclesiastical" pronunciation on Classical Latin entries even though this isn't how the Romans pronounced things, and we provide the Egyptological pronunciation on Ancient Egyptian terms, which is totally artificial. Benwing2 (talk) 17:48, 20 March 2024 (UTC)[reply]
We assign a modern ecclesiastical pronunciation to words that are actually in use by modern speakers of Latin - which is neither *damnaticum, nor *cordarium, nor *muccicalium, nor *ramuscellum - ad infinitum. That is to say, no, there is absolutely zero sociolinguistic justification for doing this - much less any historical or phonological. Egyptological pronunciation is based on a scholarly convention - which incidentally arose because scholars needed a way to refer to attested words without knowing the vowels in them - and so not at all comparable.
To put it more concretely - it'd be exactly as if the OED cited a Proto-Germanic reconstruction and assigned it a modern cockney pronunciation based on its spelling. Or vice-versa: cited cockney slang in a reconstructed Anglo-Saxon pronunciation. That is to say, it'd be utterly mad either way. Nicodene (talk) 18:42, 20 March 2024 (UTC)[reply]
No word that is known only from Romance reconstruction exists ‘in Latin classes’, or in any dictionary of Latin, by definition. To assign a pronunciation from the first century BC to a reconstructed word that shepherds in the Balkans would have come up with ca. AD 900 would be fantasy - misinformation - the precise opposite of what any serious dictionary should be providing. What you are proposing is quite literally to feed the misconceptions of (some) people coming here for clarification. Just because a reconstruction is spelt for etymological reasons in traditional Latin style does not mean it is equivalent to any old word Cicero used in the first century BC that people actually use now and have used throughout the entire history of Latin as a literary language, from end to end.
Actually your comment has convinced me that splitting is really the best option. There is simply no other way I can see to prevent this problem recurring over and over again. Well, there is of course always the option of giving up and letting fantasy run amok forever, where comparing a Friulian word to a Catalan word magically results in a reconstruction with the phonetics of Cicero's era based on quite literally nothing more than the spelling that the reconstructed entries are given.
Do editors in Proto-Germanic have to deal with people inserting modern English spelling-pronunciations like /ɪˈtʃɹəʊnə/ for *aitrōną? Do editors in Proto-Slavic have to deal with people insisting on modern Russian Church Slavonic IPA? I imagine not. Nicodene (talk) 19:14, 20 March 2024 (UTC)[reply]
That is not how reconstruction works, for any language. The pronunciation given on a reconstructed entry is what can be deduced from the descendants, not some kind of modern spelling-pronunciation based on the letters used to spell the reconstruction.
@-sche: could you name any academic source that assigns to a Gallo-Romance reconstruction like *damnaticum a supposed Classical Latin pronunciation like [d̪ämˈnäːt̪ɪkʊ̃ˑ]? Or for that matter assigns a pronunciation like [foːrˈmäːt̪ɪkʊ̃ˑ] to a Gallo-Romance form like ⟨formaticum⟩ ‘cheese’, found for a couple of decades in Latin records from medieval France and never before or since? Or claims that modern English scholars discussing such words switch to Classical Latin phonetics, complete with trilled /r/, phonemic vowel length, and nasal vowels? Or that either word has been artificially resurrected by modern Latin enthusiasts, or is even known by them?
So long as the answer to all four questions is ‘no’, I don't see what there is to discuss. The pronunciation in question would be a fiction divorced not only from scholarship but also history and modern reality. The only remaining point is what you mentioned: that some uninformed passers-by want to see this fiction, because they look at a word spelt in Latin orthography and can't imagine any other pronunciation. And if that is to prevail over the aforementioned points, if this site is to deliberately place fiction over fact, then tell me and I will leave it in peace instead of trying to observe some modicum of academic rigour. Nicodene (talk) 13:32, 21 March 2024 (UTC)[reply]
@Nicodene Hi again. I'm not sure what your latest response is responding to, but I thought of this some more and I'm not convinced by your arguments. The thing is that the "Classicizing" pronunciation is essentially a modern invention that bears a certain similarity to the pronunciation of c. 50 BC but isn't the same. For example, the typical Classicizing pronunciation pronounces final -m as /m/, which is not how it was actually pronounced, and does not bother making any distinction between different types of written l. For these reasons, it *does* remind me a bit of the Egyptological pronunciation, and even more of the Modern Standard Arabic pronunciation, which (like the Classicizing pronunciation) is based on a specific era's pronunciation (that of Koranic Arabic of c. 600 AD) but with significant changes. MSA pronunciation can be given to all Arabic words, including ones that originated far after Koranic times, and there's nothing particularly wrong with assigning and listing such a pronunciation because (a) it's a modern invention anyway, and (b) it is in common use today. In fact there are further similarities, e.g. both the MSA and Classicizing pronunciations differ somewhat depending on the native languages of the speakers, because in various circumstances they match up certain written letters to the closest native-language phoneme instead of attempting a "true" representation of the original era's pronunciation. Benwing2 (talk) 21:41, 21 March 2024 (UTC)[reply]
@Benwing2 Sorry, I think I missed what @-sche was actually getting at. And it seems to be what you're getting at too. Namely that if we were to use an accurate label like ‘Modern Classicizing’ rather than ‘Classical’ then everything else becomes very easy.
In that case there's no longer any concern about ahistoricity and we can indeed ‘provide [this pronunciation] across all our Latin entries’. We'd no longer be portraying this as how for example a word attested in 9th-century Italy was actually pronounced, we'd simply be saying this is how a modern speaker would read it. And that's completely reasonable.
Also this makes it clear what pronunciation we should be showing - the standard that modern speakers follow, i.e. Allen's Vox Latina. No need to host wacko home-brewed theories like [z̪d̪͡z̪] anymore. Your idea of setting it all to [phonetic only] would also be well-suited for a modern convention if you're still in favour.
As far as the original topic is concerned though I can't agree with the idea that such pronunciations would be reasonable in a reconstructed entry, because in my view the only pronunciation that is valid on a reconstructed entry is one that is actually reconstructed from the descendants. Nicodene (talk) 23:27, 21 March 2024 (UTC)[reply]

Minimal viable quotation that satisfies the WT:QUOTE policy requirements

The quote-book template provides amazing flexibility and a great level of detail. However, this also creates an impression of a steep learning curve, so it's natural that some editors are reluctant to add quotations as a result. So I wonder, what is the absolute minimum amount of information that has to be provided to make a valid quotation?

My understanding is that it's always necessary to mention the author, because not doing so would constitute an act of plagiarism. But then there's the publication year, the title of the work, the information about the publisher, ISBN, a link to Google Books or some other durably archived source, an English translation for non-English quotations and many other details. How much of this can be omitted during the initial edits, done by inexperienced contributors, without becoming a problem? --Ssvb (talk) 22:29, 20 March 2024 (UTC)[reply]

I would say that the minimum viable quote has |title=, one of |year= or |date=, and |text=. Obviously |author= should be included when available, but that's not always the case. Ioaxxere (talk) 23:10, 20 March 2024 (UTC)[reply]
Agreed. I think in fact that if you omit any of |title=, |year=+|date= or |text=, you get a maintenance message asking for them; if not, that should be what happens. Other parameters, e.g. publisher, link to a durably archived source, etc. should also be present ideally but aren't strictly required. Benwing2 (talk) 04:07, 21 March 2024 (UTC)[reply]
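For illustration only, the rule of thumb suggested above (a |title=, one of |year= or |date=, and |text=, with |author= when available) can be sketched as a small Python check. This is not how {{quote-book}} is implemented: the function names and the idea of modelling a template call as a dictionary are assumptions made purely for the example.

```python
# Hypothetical helpers: not part of {{quote-book}} or any Wiktionary module.
# A template call is modelled as a plain dict of parameter name -> value.

def _present(params, name):
    # A parameter counts as supplied only if it has non-blank text.
    return bool(str(params.get(name, "")).strip())


def missing_quote_params(params):
    """Return the minimum parameters (per the rule of thumb) that are absent."""
    missing = []
    if not _present(params, "title"):
        missing.append("title")
    if not (_present(params, "year") or _present(params, "date")):
        missing.append("year or date")
    if not _present(params, "text"):
        missing.append("text")
    return missing


def recommended_but_absent(params):
    """Parameters that are not strictly required but should be given when known."""
    return [name for name in ("author",) if not _present(params, name)]


bare_bones = {
    "year": "1945",
    "title": "Manila Free Philippines",
    "text": "On the east China coast, onrushing Chinese forces captured Manoi [...]",
}
print(missing_quote_params(bare_bones))       # nothing required is missing
print(recommended_but_absent(bare_bones))     # the author is not given
print(missing_quote_params({"text": "..."}))  # title and year/date are missing
```

A check of this kind only says whether the bare minimum is present; whether that minimum also gives proper credit is a separate question.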
If mentioning the author is currently not required, then this probably needs to change. Because this conflicts with WT:FAQ ("the right to be properly credited for one’s works", "What is fair use? [...] Any such quotation must be properly credited.") and the copyright laws of many countries. The Belarusian paper dictionary in 5 volumes, directly copying from which instigated this discussion in the first place, only lists the text itself and the author's surname in its entry, but not the year or any other information. They had to make their entries compact due to the restrictions of the paper format, but out of all things, it was the author's name that was kept there. --Ssvb (talk) 10:02, 21 March 2024 (UTC)[reply]
"If mentioning the author is currently not required, then this probably needs to change." In context at WT:FAQ, "the right to be properly credited for one’s works" doesn't imply that authors should be required in quote-book. If ascertaining the "author" to ensure the "right to be properly credited for one's works" becomes an actual requirement, then the texts in the Bible or other anonymous works cannot be quoted, since the authors are uncertain. Proper credit for a work can come in many forms, some of which do not necessarily include identification of the author directly by some name. I guess "due diligence" might ask that you put "Anonymous" or similar on a text with uncertain authorship, but what about when authorship claims are contested? --Geographyinitiative (talk) 10:38, 21 March 2024 (UTC)[reply]
There's already a policy about WT:QUOTE#Debated_authorship. --Ssvb (talk) 11:23, 21 March 2024 (UTC)[reply]
It doesn't say what to do if we don't know the exact author's name, as with an unsigned editorial. CitationsFreak (talk) 15:17, 21 March 2024 (UTC)[reply]
My interpretation of "giving proper credit" is that some point of contact should be provided. It can be the author, an editor or a publisher house. So that it's clear that the Wiktionary editor doesn't try to claim authorship of the quoted text. Anyway, as a result of this discussion, I would just like to have some clear, simple and actionable instructions. Quotes from the Bible are useful, but they are more like a special case. My point is that people should be able to easily add quotes without unnecessary headaches in typical cases:
  • The authors are usually known for modern books, but the original year of publication is a pain, because Google Books often messes it up. See WT:Quotations/Resources#Google_Books.
  • Wikisource is a great place for finding quotes from older public domain books, so some simple guidelines specifically tailored for its usage with the quote-book template would be useful.
--Ssvb (talk) 11:50, 21 March 2024 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── Some thoughts:

  • I think the main point is to provide sufficient information so that another editor who wishes to verify the information can easily do so. For example, if the quotation can be found online, then please add the URL. I have seen really bad quotations where an editor provided a quote from, for example, The Guardian newspaper (presumably the one published in London? but no information was provided) some time in 2020—not even a complete date. It's not so bad if the quotation can be found through an online search, but if it can't then I'd have to reject this as a quotation. Ditto for quotations from works that I cannot find online and aren't even listed in Worldcat—I usually move these to the Citations page and mark them "unverifiable".
  • As to whether providing the author's name is mandatory, I'd say no but if we are unsure let's ask the Wikimedia Foundation for advice. It's definitely a good practice, though. Also, there's a difference between works still subject to copyright and works now in the public domain. A stricter standard may well apply to copyrighted works.

Sgconlaw (talk) 11:32, 21 March 2024 (UTC)[reply]

> mark them "unverifiable"
How do you do that? Does the quote-book template allow setting the "unverifiable" status and automatically put such quotations into their own category?
I think that it would be very useful for creating English translations. For example, I could translate Belarusian quotations to the best of my ability and set some kind of "needs to be proofread by a native English speaker" status for them. Then some native English speaker could fix the grammar and style issues, rephrase the translation if necessary, and set "needs to be verified whether the translation still conveys the same meaning" status. Then I could take a look again and remove this status if everything is fine. With an iterative process like this, the quality of translations could be potentially improved. --Ssvb (talk) 16:09, 21 March 2024 (UTC)[reply]
> the main point is to provide sufficient information so that another editor who wishes to verify the information can easily do so
That's a good point and I agree. In the case of modern books indexed by Google Books, just listing the ISBN in the quote-book template is probably good enough to make the information verifiable, and the duty of providing all the extra details can be delegated to Google, as long as the text of the quotation is itself searchable. I mean, it's probably okay to use "|isbn=" instead of "|author=" from the compliance point of view. Explicitly specifying the author is indeed a good practice, but things may get tricky for inexperienced Wiktionary editors when dealing with translated works. Another tricky case is when a book is a collection of multiple short stories by different authors. Again, just providing the title of the whole book and the ISBN is probably good enough, rather than trying to figure out who was the actual author of that particular text snippet. --Ssvb (talk) 16:55, 21 March 2024 (UTC)[reply]
@Ssvb I tend to disagree that just providing the ISBN would be enough; I think as a courtesy to the reader, you should supply the title, author, date and ideally page number. As you mention above, there are edge cases where it may be hard to determine the author, and in such a case I think it's OK to leave out the author, but the vast majority of the time, the author is right there on the front of the book (or in the table of contents if the book is a collection from multiple authors). I also completely agree with User:Sgconlaw that quotes need to be verifiable. I did a lot of rewriting of {{quote-book}} a few months ago and cleaned up all the bad parameter uses (formerly, parameters were not checked properly), and came across a lot of badly formatted quotes, most of which I was able to verify but sometimes it took a lot of work to do so, which is not what you want to force people to do. As for adding things like "unverifiable" and "needs to be proofread", there aren't things built into {{quote-book}} to let you do that currently. I think User:Sgconlaw is just writing [unverifiable] and putting it somewhere next to or within the quote (e.g. using {{attn}}); you'd have to ask them for sure. I could definitely see adding options to supply notes "unverifiable" and "needs to be proofread" automatically and add them to appropriate categories. Benwing2 (talk) 21:51, 21 March 2024 (UTC)[reply]
I've just been adding to such quotations on citations pages |footer={{small|Unable to verify this quotation.}}. — Sgconlaw (talk) 21:58, 21 March 2024 (UTC)[reply]
@Sgconlaw: Thanks! I made the following diff to give it a try. Still, as User:Benwing2 mentioned, having a dedicated option and a category for it might be useful. --Ssvb (talk) 14:22, 22 March 2024 (UTC)[reply]
@Ssvb: I'm not sure it should be used on the main entry page. I would suggest you move the entire quotation to the citations page. — Sgconlaw (talk) 15:59, 22 March 2024 (UTC)[reply]
@Sgconlaw: Removing the quotation from the main entry page would be less than ideal and I believe that my translations into English aren't too horrible. I'm just not fully confident about things like "useful for me" vs. "useful to me" and also wonder whether I'm possibly producing a kind of odd Yoda-style sentence composition in some of my translations.
Looks like I probably need something like a |t-check= parameter in the quote-book template, similar to WT:TRANS's approach to handling this. --Ssvb (talk) 15:28, 23 March 2024 (UTC)[reply]
@Benwing2: I mean that providing |title=|year=|text=|isbn= with a valid ISBN known to Google Books is better than providing just |title=|year=|text= and nothing else. But having |title=|year=|text=|author= with a valid author is even better. Page numbers are not always available in the newer ebook editions of older paper books.
When you are talking about badly formatted quotes, do you mean the uses of quote-book template, which incorrectly add advanced options with inaccurate/bogus information? Or was it something else? --Ssvb (talk) 00:12, 22 March 2024 (UTC)[reply]

I agree with the above: "My interpretation of "giving proper credit" is that some point of contact should be provided. It can be the author, an editor or a publisher house." How do you feel about Lufeng's quote here: diff? You may say "there are other cites with authors on them"- but consider that you exclude an important part of the literature of humanity that has an anonymous or semi-anonymous character, but can still be adequately identified with OCLC, etc. Formally identified authors/translators are important to identify, yes, but there's generalized "sources". I mean it's an interesting question, of course. I don't really know what's right legally speaking for the site. Many magazines and newspapers have articles with unidentified authorship, and "AP News" is the source for many articles in 20th century newspapers.
Or again, consider this cite for Citations:Manoi, which has no specific author except the city "CHUNGKING":

  • 1945 May 23, “Chinese Expand New Drive on East Coast”, in Manila Free Philippines, volume III, number 24, Manila, sourced from Chungking, →OCLC, page 1, column 5:
    On the east China coast, onrushing Chinese forces captured Manoi, Min river estuary coastal town nine miles east of Foochow. Other forces, driving northeast from Foochow, were reported at the outskirts of Lienkong on the coast.
    --Geographyinitiative (talk) 22:18, 21 March 2024 (UTC) (Modified)[reply]
@Geographyinitiative: I feel that these quotes are way too elaborate and detailed. So they are exactly the thing that scares away new contributors. Beginners wouldn't touch any of these excessively complex templates with a ten-foot pole.
The discussion is about how much can be left out, while still being an acceptable and useful contribution. There was a suggestion that just having "|title=, |year= and |text=" is enough. Yet my opinion is that such a bare minimum likely fails to "give proper credit" and a little bit more information is necessary, such as the "|author=" option. Or, as in your example, it may be the issue number, OCLC, page number, etc. However, I think that beginners should focus on just quotes from books and the "|author=" option, initially staying away from more complicated cases. --Ssvb (talk) 23:31, 21 March 2024 (UTC)[reply]
Here is "title year text", bare bones, for the above quote:
1945, Manila Free Philippines:
On the east China coast, onrushing Chinese forces captured Manoi, Min river estuary coastal town nine miles east of Foochow. Other forces, driving northeast from Foochow, were reported at the outskirts of Lienkong on the coast.

User:Wyang once called the website 烏煙瘴氣 (roughly, "a foul, murky atmosphere"). I don't think anyone will respect a website that is made up of only relatively simple quote cites. Those simplistic quote cites are part of the equation, but you don't want to hurt the high-quality quotes just because some quotes are simple.
I think the main problem is that it's not obvious to entry-level people that Template:quote-book is the place to go to find out about quote-book parameters. It's hidden from them.
WT:ATTEST says: "Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source." My goal in providing detail in quote cites is two-fold: (1) to allow someone looking for this in 50 years after Internet Archive is long gone to find it again, and (2) to give an atmosphere to this website of professionalism that encourages high-quality editing and citations. If I did "title year text" on my quote cites in my high school paper, the teacher would give me an F.
I also work on a lot of words that were ignored by Wiktionary and Wikipedia for about 20 years, and they will be ignored again when I'm gone. I expect no one will follow up after me and that the soft-power campaigns of authoritarian China will mostly reverse what I've done so far to illuminate some of this terminology that the CCP doesn't like people to know about. It's an area of English language vocabulary that is scorned by the field of Asian Studies.
All I have to cling to is that the cites I've done so far might be high-quality enough to convince some future admins in 2040 not to allow deletion of everything I did. If I just did "title year text" quote cites, my argument to those future admins would be significantly weakened, because they could say "well, who the fuck knows where that Manila Free Philippines bullshit quote was really located? that's just some bullshit from the cretins in 2024 who didn't have MindLinkAI." I'm adding indicia of authenticity with these details to stave off deletion in 2040, 2050, etc. I want to make it so that if you delete some of these high-quality quote cites, you have to feel palpably ashamed of yourself, unless you're totally dead inside. No one would feel ashamed deleting a "title year text" quote cite. --Geographyinitiative (talk) 00:08, 22 March 2024 (UTC)[reply]
@Geographyinitiative: I'm not asking you to reduce the level of detail in your quotes. Keep up the good work. I'm actually asking people to start using Template:quote-book instead of Template:uxi for adding quotes to Wiktionary. And this transition doesn't have to be difficult. --Ssvb (talk) 00:39, 22 March 2024 (UTC)[reply]
I'm telling you, we will always regret simplification to less than what exists now in quote-book. But we will also always regret throwing out things that are semi-anonymous. You give credit to the extent the work allows you to give credit, not less, not more. The work being cited is the guide to its citation. Cookie-cutter simplifications will not work. It is as beautiful and marvelous as it is without Stalinist dictates to simplify or to require specific authors. Geographyinitiative (talk) 02:22, 22 March 2024 (UTC)[reply]
@Geographyinitiative: This is getting ridiculous. I don't just "give credit to the extent the work allows you to give credit, not less, not more". I'm providing a lot of additional details when adding quotations myself. But I'm also occasionally fixing quotations like this. And please take a look at how people opt to just delete a ux template rather than convert it into a proper quotation. Why is this happening? What can we do to improve the current situation? I don't believe that it's productive to just shame people by claiming that allegedly "the teacher would give them an F". --Ssvb (talk) 21:40, 23 March 2024 (UTC)[reply]

Template:ko-etym-native without parameters is pointless and misleading

@AG202 @Solarkoid @Chom.kwoy @Tibidibi

Calling Template:ko-etym-native without parameters is currently explicitly allowed, producing the message "Of native Korean origin."

This to me seems completely pointless. How does this in any way help the reader? Not only that, it is misleading and gives the ety false confidence; if you have nothing to add to the ety, how can you be so sure that the word is of native origin to begin with?

I propose that this usage be deprecated so that it can eventually be phased out. Ideally this would be done by categorizing any entries with parameter-less invocations into a separate category (or just ) so that the respective entries can be dealt with appropriately. Lunabunn (talk) 00:46, 21 March 2024 (UTC)[reply]

I think that template is currently being used as a signal to indicate the average Korean speaker's intuition about that word, as being neither Sino-Korean nor "foreign", as this intuition sometimes has an effect on the word's usage and grammar (e.g. Sino-Korean words better go along with other Sino-Korean words when compounding, etc). So while this information has some uses, it should not be conflated with the actual etymology of the word, which is what the etymology section is for.
So I agree that it should be deprecated as well. Chom.kwoy (talk) 09:10, 21 March 2024 (UTC)[reply]
@Chom.kwoy: Are there any bright ideas as to what should replace it? There is a significant similar notion in Thai of a partition of vocabulary into 'native' and Pali/Sanskrit vocabulary, distinguished as 'blue' and 'red' in one English-language grammar, and associated with compounding rules (which have exceptions, most notoriously ผลไม้ (pǒn-lá-máai)). English also has a similar but sloppy categorisation, though the effects of 'marking' in the lexicon are shown by other information in lemmas' entries, so probably less useful to record. --RichardW57m (talk) 11:33, 21 March 2024 (UTC)[reply]
As an outsider (beginner in Korean studies), I find this useful information, for the reasons you (@Chom.kwoy) note.
As a long-time Wiktionary editor, I think "native Korean origin" is indeed etymological information, as etymology is about where a word comes from (its origin). I struggle to think where else in our WT:ELE entry structure that this kind of information would go. ‑‑ Eiríkr Útlendi │Tala við mig 18:56, 22 March 2024 (UTC)[reply]
If a word is truly of native Korean origin, that is of course etymological information. However, the issue is that "Of native Korean origin." is currently a copout that gets added to every entry under the sun that isn't an obvious loan (i.e. from English or Sino-Korean) and obscures actual etymological info based on "speaker intuition." As Chom.kwoy said, this speaker intuition is also important. It should not, however, be conflated with the true origin of a word.
See also examples from Surjection below on words that are clearly loans (and even described as such in the etymology section) but still end up using Template:ko-etym-native because of this template's double duty as "first attested in" and "of native origin." Lunabunn (talk) 01:18, 23 March 2024 (UTC)[reply]
Fully agree, as an outsider. The "first attested in" part should be moved into its own template (perhaps named {{ko-attest}}) and the rest removed, if not completely, then at least from the etymology sections. Category:Native Korean words must also go. — SURJECTION / T / C / L / 12:06, 21 March 2024 (UTC)[reply]
First attestation seems to me like part of etymology, in describing the start date (origin) of textual evidence; I've certainly been putting that info in the ===Etymology=== sections in Japanese entries, for years now. Where else should that go? ‑‑ Eiríkr Útlendi │Tala við mig 19:06, 22 March 2024 (UTC)[reply]
First attestation should stay in the etymology. I wasn't arguing it shouldn't, just that it be moved into its own template that could then be used in etymology sections. — SURJECTION / T / C / L / 21:35, 22 March 2024 (UTC)[reply]
Keep until a better alternative is found, don't simply remove. I liken this to Japanese on'yomi and kun'yomi - native Japanese and Sino-Japanese readings of Chinese characters (kanji). Apart from Sino-Korean, it also distinguishes from loanwords from European languages, more modern Japanese loanwords where occasional tensification of consonants occur.
Finding etymologies for all words may not be possible, but distinguishing loanwords from native words specifically for Korean has its value and is the practice of certain dictionaries and contributors.
Less categorical on Category:Native Korean words but I don't see why it should be removed. Anatoli T. (обсудить/вклад) 13:36, 21 March 2024 (UTC)[reply]
If our English entries had ever adopted a similar system, there would be people defending it too. "Of native origin" is not an etymology nor are entries perceived to be of "native" origin a valid category. I've seen this even be misused for compounds derived entirely from recent borrowings that nobody would ever call "native". In short, this system is nothing more than an absurd cop-out. — SURJECTION / T / C / L / 14:15, 21 March 2024 (UTC)[reply]
And to illustrate just how terrible of an idea it is to combine an attestation template with a "this is native I promise" template (which does not make the latter any less absurd), look at 가라치 (garachi) and 고두리 (goduri). Truly, some native Korean terms that were borrowed from Mongolic. — SURJECTION / T / C / L / 14:19, 21 March 2024 (UTC)[reply]
Here's the misuse too for good measure. — SURJECTION / T / C / L / 14:21, 21 March 2024 (UTC)[reply]
Definitely agree to remove. Logged in just to reply to this. Also thanks, Surjection, great examples as to why it's a bad/misleading template. Assuming right from the get-go that a word is of native origin simply won't do. Additionally, we use "Of native Korean origin." for everything attested, so adding those words to a category called 'Native Korean words' is really bad practice (Compare a Chinese loan ). Also, I share Surjection's suggestion that the template should be changed (i.e. remove parameterless option, categories) and renamed (to something like a ko-attest). - Solarkoid (talk) 14:40, 21 March 2024 (UTC)[reply]
The issue is whether this corresponds to a vocabulary marking (for an 'unmarked' value) that is relevant to Korean grammar. In English, the abstract lexicon has words marked as 'foreign' or 'Latinate', for which the unmarked value approximates to 'native', for which 'effectively native' may be a better term. And in English, the 'native' category includes such words of French origin as beak and beef. For grammar, being 'Sino-Korean' is reported above to be a matter of synchronic fact, not of historical fact. --RichardW57m (talk) 15:01, 21 March 2024 (UTC)[reply]
If it has significant relevance, then it should be documented in some way, but by using a better term than "native" and without abusing the etymology section to document it. — SURJECTION / T / C / L / 15:07, 21 March 2024 (UTC)[reply]
@Surjection: Agree. But let's not just delete the records, but rather eliminate them by conversion to the new way of documenting. --RichardW57m (talk) 15:32, 21 March 2024 (UTC)[reply]
Sure, I'd be fine with (a) splitting first attestation into its own template, (b) moving {{ko-etym-native}} out of the etymology section to somewhere else (maybe usage notes), (c) rewording the text displayed by the template as appropriate, and (d) getting rid of Category:Native Korean words and stopping the template from categorizing. — SURJECTION / T / C / L / 15:38, 21 March 2024 (UTC)[reply]
@Surjection: Thank you for the valuable input. I definitely see value in keeping some kind of label (CC: @Chom.kwoy), but otherwise agree that all of this is cruft that needs to be gone.
Perhaps, then, we may just deprecate the entire ko-etym-native template and then gradually replace it with a near-identical template with parameterless invocation and categorization removed? (We can still use Module:ko-etym for this new template.) We should be able to use either one of Category:Pages using deprecated templates or Category:Native Korean words to keep track of all the entries that need to be updated (eventually). Lunabunn (talk) 19:11, 21 March 2024 (UTC)[reply]
@Atitarev I am not sure how applicable your points are to Korean.
  1. I liken this to Japanese on'yomi and kun'yomi - native Japanese and Sino-Japanese readings of Chinese characters (kanji).
    • This is already covered by the Sino-Korean template, as you have noted.
  2. it also distinguishes from loanwords from European languages
    • This is already covered by the fact that, ya know, modern loanwords are marked as such. If an etym section does not say it is a loan, that is enough to perceive that it is not a modern European loanage such that this kind of thing would matter. Conversely, just because a word is not a modern European loan does not mean it is of native origin. See the many Mongolian/Japanese/Korean... loans, for example, that are now perceived as native. (e.g. 수라 (sura), 담배 (dambae), 김치 (gimchi))
  3. more modern Japanese loanwords where occasional tensification of consonants occur.
    • This is already explicitly covered by Template:ko-ipa. Note that not all words that receive this "loanword-like" tensing are loanwords, nor does every loanword receive this kind of tensing.
But most importantly, once you add a first attestation to Template:ko-etym-native, the "native origin" message disappears. So even if everything you said is true and indeed "native origin" is a useful label here, this template is still the worst of both worlds. Lunabunn (talk) 01:24, 23 March 2024 (UTC)[reply]
Support removal. We do need some kind of hidden category tracking though, like "Korean words with no etymological history" or something like that. AG202 (talk) 03:47, 25 March 2024 (UTC)[reply]

Splitting Etymology by Accentuation[edit]

(Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Getsnoopy, Rishabhbhat, Dragonoid76, RichardW57, Exarchus): When forms of a lemma differ in accentuation which is not normally marked in the orthography, my reading of WT:EL says that if we have entries for the forms, they should be recorded under different etymologies, as is done for the present and past of English read, which have different vowel sounds. I have accordingly followed that rule at Sanskrit अमृता (amṛtā, deathless), where the Vedic accent is on the first syllable of the word in the vocative and on the second syllable in the other cases. Should or could I instead have followed the example of Russian сковороды (skovorody, frying pan), where there is only one etymology section and the two pronunciations are given in the same pronunciation section, with the pronunciation tied to the relevant noun form entry by the accent shown on the Cyrillic? --RichardW57m (talk) 16:56, 21 March 2024 (UTC)[reply]

@RichardW57m I'm more inclined to say we should follow the example of сковороды (skovorody) for a few reasons:
1. "Sanskrit" represents a continuum of Indo Aryan languages, including Vedic, but in general most entries are Classical Sanskrit (which does not have the Vedic stress), i.e. Classical Sanskrit is the default and we tend to use "(Vedic)" as it is required.
2. Hence, it seems like needless complication, and right now the page for अमृता (amṛtā) is crazily complicated and tries to over-explain. Why are there so many entries for "Noun" for "sandhi form of अमृत (amṛta)" under various definitions? We should just have one entry like "sandhi form of अमृता (amṛtā)" which covers all the definitions and maybe put a "Usage Note" there if there is something specific to address. These are all non-lemma forms, and hence the actual stuff like Usage Notes and meaning nuances based on stress should probably be on the lemma page, as I understand.
3. There are tons of cases where an inflecting nominal/adjective has a feminine form that has its own specific definitions. We should just have one entry for the non-lemma of the masculine form and one entry for the special feminine meanings, to be succinct. Dragonoid76 (talk) 18:22, 21 March 2024 (UTC)[reply]
@Dragonoid76: The multiplicity arises from the decision taken by someone (not me) that there are separate noun and adjective for अमृत (amṛta). Having separated adjective and noun, in this case we get at least one lemma for each of the three genders. अमृता (amṛtā) is a form of each of these four lemmas, and for each lemma they include both vocative and non-vocative forms, so two semantically distinct pronunciations for each of the four. We thus end up with two adjective forms, one noun and five noun forms. (Forms identical with the lemma do not get separate entries; the feminine noun is a lemma.) We use the same PoS heading for both noun and noun form, in accordance with WT:EL#Part of Speech. My understanding is that forms of different lemmas should have different entries.
Vedic Sanskrit is a valid language and therefore its words are eligible for inclusion. I've cross-referenced the forms with different accents to cover words with pitch accents not indicated, as in the writing of Classical Sanskrit. --RichardW57 (talk) 02:07, 22 March 2024 (UTC)[reply]
@Dragonoid76: You seem to have overlooked specific meanings of the neuter form. --02:07, 22 March 2024 (UTC)[reply]
Even with the collapsing of what are currently nouns into the adjective, we would need different entries for semantically distinctive differently placed Vedic accents. I think we would still have to separate out at least three nouns, for 'ambrosia', 'kudzu' and 'root', and a bunch of feminine plant names. --RichardW57 (talk) 02:07, 22 March 2024 (UTC)[reply]
@RichardW57 What do you make of something like User:Dragonoid76/sandbox/अमृता. I think all the "See also" boxes are very excessive and will be quite hard to replicate on all pages since Sanskrit has quite a bit of syncretism and words with multiple meanings. Here, all different forms of words with the same pronunciation and with the same etymology are put under the same header. Dragonoid76 (talk) 05:49, 22 March 2024 (UTC)[reply]
@Dragonoid76: I don't object in principle to the pronunciation section, but I'm not sure it satisfies the requirements of Gangetic chauvinism, for it gives a clearly greater rôle to the Roman script than to Devanagari. I think we may need to add to the capabilities of {{sa-IPA}}.
The orthographically identical forms of the same part of speech should either be kept together or cross-linked, as a reader may think that a PoS section is exhaustive. Possibly, and this also applies to the current form of the page, the noun forms should have a link to the feminine noun, as it may not be obvious to the reader that the forms of the feminine noun will not have separate definitions. See also is quite appropriate for these cross-links.
As to the work involved, I would remind you that 'lexicographer' is famously defined as 'a harmless drudge'.
I'm not confident that the botanical senses have the same etymology as the more obviously derived senses. The use case for not grouping senses with the same etymology as a single word is the risk of etymologies being added for a single sense, which is more likely to happen with derivative lemmas, such as participles, than with case forms such as we have. Incidentally, these case forms don't all have the same lemma form, and I will correct that in the mock-up. --RichardW57m (talk) 11:32, 22 March 2024 (UTC)[reply]
@RichardW57m For the accentuation you can add the Devanagari accents like this: अ॒मृता॑ (amṛ́tā) ... अमृ॑ता (ámṛtā).
For me it makes simply no sense to have 'See also' for something on the same page. Exarchus (talk) 15:38, 23 March 2024 (UTC)[reply]
@Exarchus: Are you unaware of the senseif/etymid functionality? That enables one to link to specific elements within a language's section. --RichardW57 (talk) 16:39, 24 March 2024 (UTC)[reply]
@RichardW57 I see what you mean, one could end up on one section of a page and not notice the others, but is this really relevant here (as we are talking about non-lemma forms)? Why would someone link specifically to these forms instead of simply the main lemma? Exarchus (talk) 16:57, 24 March 2024 (UTC)[reply]
@Exarchus: Yes, it is relevant. When looking up अमृता with Vedic accent unknown, one will find a matching section, and quite likely forget to look for one with a different Vedic accent. Remember, Wiktionary's published aim is to cover every word, not every lemma.
The main utility of the etymids (I really wanted 'wordid' and was tempted to abuse senseid) is in these intra-page crosslinks, but there may be other occasions to link to them. --RichardW57m (talk) 10:17, 25 March 2024 (UTC)[reply]
@RichardW57m I'm still not sure how this is different from someone looking up the English noun swallow, then finding "(archaic) A deep chasm or abyss in the earth", reasoning that it's probably not archaic, so it's much more likely "The amount swallowed in one gulp; the act of swallowing", forgetting that there's also an 'Etymology 2'. Exarchus (talk) 11:04, 25 March 2024 (UTC)[reply]
@RichardW57m To put this discussion into perspective: of all the forms currently classified under 'Etymology 2' (ámṛtā), only the vocatives (dual and plural) of the adjective occur in the DCS. And none of these vocatives actually has a pitch accent, as they don't occur at the start of a pāda. (Even in Padapatha they are written अ॒मृ॒ता॒ (amṛtā).) Exarchus (talk) 22:20, 25 March 2024 (UTC)[reply]
@Exarchus: I don't think the DCS is exhaustive. We also have the policy of allowing regular inflected forms despite their lack of attestation. On the basis of this, I've been adding terms for Pali imperative singular actives when they are the same as another word, to avoid Pali inflection tables wrongly linking to words of other languages, regardless of whether I can find an attestation of the Pali form. The alternative approach would be to stub the vocative dual and plural forms out from the noun inflection tables for अमृत (amṛta) and अमृता (amṛtā).
This now gives us three different pronunciations with the same basic spelling!
However, I think I have a better solution, which is to use
{{sa-adj form|tr=amṛ́tā|tr2=amṛtā|tr3=ámṛtā}}, which currently yields
अमृता (amṛ́tā or amṛtā or ámṛtā)
This has implicit parameters |head=, |head2= and |head3=, which implicitly default to the page name.
We can then use the usage notes section to explain that dependent upon position, the vocative is either unaccented or is accented on the first syllable, while the other cases are accented on the second syllable. --RichardW57m (talk) 10:45, 26 March 2024 (UTC)[reply]
@RichardW57m Sounds like a good idea Exarchus (talk) 11:30, 26 March 2024 (UTC)[reply]
@Benwing2, Theknightwho This relies on the undocumented merger of the output of the terms - may this be relied upon? --RichardW57m (talk) 10:45, 26 March 2024 (UTC)[reply]
And I suppose that if one wants to link specifically to one of these forms (because it's in a citation), then one should be able to link directly to the correct form (either ámṛtā or amṛ́tā), so people shouldn't have to look at the other one. Exarchus (talk) 17:19, 24 March 2024 (UTC)[reply]
  • If I understand correctly, read is the exception, not the common practice in English entries. This search reveals many English terms like object and document with one etymology section and one pronunciation section that splits the pronunciations based on part of speech. — excarnateSojourner (ta·co) 15:59, 26 March 2024 (UTC)[reply]
    @excarnateSojourner: With regard to the method you searched with, looking for "{{a|noun}}", the documentation of {{accent}} says, "It should not be used for other qualifiers like noun, verb, adjective, and so on"! It does, however, say that {{qualifier}} is used for these, and indeed, it often is. WT:EL does not mandate following the example of read, but merely gives the example of lead as where one may use 'etymology' to label pronunciations. Your finding usefully supports the solution I currently favour; I can use {{q}} to grammatically label the pronunciations themselves in this Sanskrit case. --RichardW57m (talk) 16:51, 26 March 2024 (UTC)[reply]

Equivalent of Template:ellipsis of in compounds[edit]

I think we would need something like this. There are quite a few cases in e.g. Finnish where a word can stand for a compound using that word, if it is clear from context, e.g. kone. Currently many of these use {{short for}}, but that is not ideal. — SURJECTION / T / C / L / 23:25, 23 March 2024 (UTC)[reply]

FWIW, for me {{ellipsis of}} implies a multiword term, not a compound, which is why I don't feel like using it. — SURJECTION / T / C / L / 23:29, 23 March 2024 (UTC)[reply]
{{clipping of}}? Vininn126 (talk) 08:49, 24 March 2024 (UTC)[reply]
Doesn't really fit either - we are removing whole words, even if they are still part of a compound word. — SURJECTION / T / C / L / 08:57, 24 March 2024 (UTC)[reply]
I've switched these to {{ellipsis of}} for now - but I still feel it might be better to have a separate template for this. — SURJECTION / T / C / L / 14:42, 24 March 2024 (UTC)[reply]
Hmm... I think this kind of thing (where lexical) must be clipping, ellipsis, or "short for"; at least, I don't think we could realistically distinguish some fourth category ("Template:shortening of"?) from T:short for, ellipsis, and clipping; there's too little distinction and too much overlap.
I can find a few works discussing ellipses of compounds, and slightly more works discussing clipping compounds (many are discussing clipping words out of a spaced multi-word compound, but this seems to function identically to clipping unspaced compounds; as with kone, a manual for a washing machine might say to load things into the machine). And of course other works discuss some element of a compound being "short for" or "a shortening of" the compound. Quotes.
But we should perhaps consider at what point this is no longer lexical (no longer "one of the definitions of kone is pyykinpesukone") and is just hypernymy; I mean, you can also shorten sports car to car or Chinese food (Indian food, delicious food, more food, etc) to food if the meaning is clear from context... ("I ordered Chinese food. Later, when I was eating that [Chinese] food, I noticed it had onions in it.") - -sche (discuss) 20:09, 25 March 2024 (UTC)[reply]
@-sche @Surjection I am inclined to agree with -sche here. I do understand the idea that ellipses are multiword, but I think that is somewhat of an arbitrary distinction. Consider English vs. German, for example, where compounding is similar but English often writes the compounds open whereas German writes them closed. I even think {{short for}} should usually be rewritten as either ellipsis, clipping or abbreviation. Benwing2 (talk) 05:54, 26 March 2024 (UTC)[reply]

These are pretty much all letter entries where the entire entry uses the "mul" language code, but the header is that of a specific language (there's often a Pronunciation section using the language code that matches the header- but that's it).

My understanding is that there should be two types of letter entries:

  1. A translingual one giving the kind of information that's not specific to any one language, such as Unicode codepoint. For this, both the "mul" language code and the "Translingual" header are required. This always goes at the top of the page, though the language section may also include mathematical and other non-language-specific symbols that use the same character.
  2. A language-specific one giving the type of information that's typically different between languages, such as position in the language's alphabetic order, and its pronunciation. For this, the Wiktionary language code for a specific language and the header with the Wiktionary name of that language are required. This always goes in the same place on the page that any entry for that language would go. If there are multiple languages that use a given character, there should be multiple language sections.

IMO, there should be a language section (of the second of the two types above) for every language that uses that letter as part of its standard orthography. There probably should also be a single Translingual section.

There should never be a Translingual entry with the header for a specific language, and there should never be a section for a specific language with a Translingual header. There should also never be a pronunciation section in a Translingual entry, unless it's a phonetic symbol- "Translingual" isn't a language, so it has no speakers.

This is the prevailing practice at Wiktionary since we've had language codes, language headers, and translingual entries. Do we have this spelled out somewhere, so we can point to something when we tell people to stop doing otherwise?

Finally, I would like us to get rid of all the entries in this category, by fixing them. I see three ways to do this:

  1. keep the "mul" code and give them a Translingual header
  2. keep the language-specific header, but change all the language codes to match the language of the header.
  3. split them into two sections: a Translingual one at the top of the page with "mul" language codes, and a language-specific one in the body of the page with matching language-specific headers and language codes.

What does everyone think? Chuck Entz (talk) 01:38, 24 March 2024 (UTC)[reply]
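
To make the mismatch described above concrete, the following is a minimal Python sketch of how one might detect a language section whose headword template uses a code that disagrees with its L2 header. It rests on several assumptions: the name-to-code table is a hypothetical three-entry stand-in for Wiktionary's real language data, only `{{head|code|...}}` invocations are inspected (language-specific headword templates such as `{{mul-symbol}}` are not), and the sample wikitext is invented for illustration.

```python
import re

# Hypothetical, deliberately tiny table; a real check would need the full language data.
NAME_TO_CODE = {"Translingual": "mul", "Osage": "osa", "Korean": "ko"}

# A level-2 header is exactly two '=' on each side, e.g. ==Osage==.
L2_HEADER = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)
# First positional argument of {{head|...}} is the language code.
HEAD_CODE = re.compile(r"\{\{head\|([^|}]+)")


def mismatched_sections(wikitext):
    """Return (header name, code) pairs where a {{head}} code disagrees with its L2 header."""
    problems = []
    headers = list(L2_HEADER.finditer(wikitext))
    for i, match in enumerate(headers):
        name = match.group(1).strip()
        # The section body runs up to the next L2 header, or to the end of the page.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(wikitext)
        body = wikitext[match.end():end]
        expected = NAME_TO_CODE.get(name)
        if expected is None:
            continue  # unknown language name: skip rather than guess
        for code in HEAD_CODE.findall(body):
            if code.strip() != expected:
                problems.append((name, code.strip()))
    return problems


# Invented sample: an Osage section whose headword line uses the "mul" code.
bad_entry = "==Osage==\n===Letter===\n{{head|mul|letter}}\n"
good_entry = "==Translingual==\n{{head|mul|symbol}}\n==Osage==\n{{head|osa|letter}}\n"
```

Such a check would only locate candidates; which of the three fixes listed above applies to each entry still needs a human decision.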

Wouldn't it be easier to have a table under the Translingual L2 that contained columns for language name, pronunciation, serial position, alternative representations, and anything else that applied to more-or-less every language's use of the letter/symbol, with an extra column for language(-family)-specific info or, at least, links to sources for such info? The same idea could be applied to personal names. Maybe the same approach would be useful for taxonomic names, etc. DCDuring (talk) 14:16, 24 March 2024 (UTC)[reply]
That might work for letters, which are generally not treated as words in the various languages (as written), but personal names have all kinds of grammatical and etymological information that doesn't go well in a table format. Also, there's Special:PrefixIndex/Template:list:Latin script letters, etc. Chuck Entz (talk) 14:46, 24 March 2024 (UTC)[reply]
I worry I'm cutting the Gordian knot with too simple a solution, but the bulk of the entries currently in the category seem to be Osage entries created by a single user (whose defiance of other norms, like one POS header per POS, people have discussed a few times including recently), and to me, the simplest / easiest / "least-change" solution seems to be to just revise the language code to match the language header in line with how we also have e.g. 려 as ==Korean== and not ==Translingual==. This is what I would've done (quietly, as basic cleanup, without even thinking to have a BP discussion about it) if I had seen a new user adding these. Like with 려, there does not currently seem to be any codepoint information in 𐓶̋ to discuss whether or not to have a ==Translingual== section for. Once the Osage entries are changed, we can see whether any other entries in the category look like they would need different treatment, e.g. any letters which are actually used by more than one language, but I don't suppose it makes sense to discuss a table to compactly provide multiple languages' pronunciations of 𐓶̋ if only one language uses it. - -sche (discuss) 15:18, 24 March 2024 (UTC)[reply]
Note that Osage 𐓶̋ isn't a real letter, but a letter plus diacritic combination, which was encoded too late to be assigned a precomposed character in Unicode. I'm not sure if we have a policy on giving the code points (or even entries) for such things; we generally don't have entries for combinations of Indic letters and vowel 'diacritics'. I've a feeling that sentiment was against allowing such entries, insisting rather on status as letters, as seen in Scandinavia but generally not France or Germany, but with a counter-argument that Unicode characters in use were eligible. --RichardW57 (talk) 16:32, 24 March 2024 (UTC)[reply]
@-sche: I don't understand your comparison with 려 (ryeo), which does have codepoint information, and is a precomposed character. --RichardW57 (talk) 16:32, 24 March 2024 (UTC)[reply]
Chuck seemed to suggest we might need/want Translingual entries to handle non-language specific information like what Unicode codepoint(s) the letter has, but these letters do not currently have such information, so I'm saying the simple fix is just to fix the headword-line template to use the language code that corresponds to the L2. - -sche (discuss) 20:15, 24 March 2024 (UTC)[reply]
I stumbled across today. The entry seems to have "alternative forms" shared between two languages. Isn't this common?
Is there other information, besides the Unicode codepoint that is shared by all languages that use the letter/character/symbol? Like typographic realizations, as in ? DCDuring (talk) 23:53, 24 March 2024 (UTC)[reply]
@DCDuring: I don't think it's common, and such information probably belongs on Wikipedia rather than here. For an entry like Osage 𐓶̋, I think we simply want a head line template such as {{head|osa|letter|upper case|{{uc:{{PAGENAME}}}}|tr=-}}. If we chose to keep such entries, we can then worry about displaying the codepoints for the non-letters with PoS 'letter'. (We probably already have a template to do it.) Can we just go ahead and fix the Osage letters and letter-accent combinations? --RichardW57m (talk) 18:11, 25 March 2024 (UTC)[reply]
@-sche: When you wrote, "Like with 려", did you mean "Unlike with 려"? --RichardW57m (talk) 10:41, 25 March 2024 (UTC)[reply]
려 contains no information about what Unicode codepoint(s) a user would have to enter/use to obtain the character. Like with 려, 𐓶̋ contains no information about what Unicode codepoint(s) a user would have to enter/use to obtain the character. - -sche (discuss) 18:48, 25 March 2024 (UTC)[reply]
@-sche: The codepoint information for 려 is generated by the invocation {{character info}} at the top of the page. It's true that you have to work to get the NFD code sequence, as the vowel is given as a compatibility jamo rather than as a combining jamo. For me it displays at the top right of the page. If you click on a language-tagged link, you may have to scroll up to see that information box. --RichardW57 (talk) 05:38, 26 March 2024 (UTC)[reply]
The 'simple fix' hides the omission, so we don't remember to supply the lack. --RichardW57m (talk) 10:41, 25 March 2024 (UTC)[reply]
Belay this. The straightforward fix above hides nothing. --RichardW57m (talk) 18:16, 25 March 2024 (UTC)[reply]
I agree. And this shows once again why this user should've stayed blocked. AG202 (talk) 03:44, 25 March 2024 (UTC)[reply]
Anyway, unless someone has a cogent objection, I will just fix these entries with AWB soon. - -sche (discuss) 18:48, 25 March 2024 (UTC)[reply]
@-sche What are 'these' entries, and what changes will you make? I'm not sure that the fix for 🐀 is obvious. The English meaning of the emoji looks dependent on English - I wouldn't be surprised if it were associated with endearment in Thai usage - compare Thai หนู (nǔu). --RichardW57m (talk) 12:14, 26 March 2024 (UTC)[reply]
As I described above, the fix is to conform the few stray instances in each entry where the language code doesn't match the language the entry is declared to be in. For the rat, that means [11]. - -sche (discuss) 14:24, 26 March 2024 (UTC)[reply]
@-sche: OK, that {{mul-symbol}} does less than I thought it did. But just replacing 'mul-letter' by 'osa-letter' damages the entries - the undocumented template doesn't display the casing partner, which I'd rather calculate than enter manually. --RichardW57m (talk) 18:00, 26 March 2024 (UTC)[reply]
@Chuck Entz: A vote to prohibit creating an entry for every letter of every alphabetically written language gained significant support, but not enough to make it policy. As there are, according to Wiktionary:List_of_scripts, 5,114 languages on Wiktionary using the Roman script, I think it's a bad idea to do things that way before pages are split by language. --RichardW57m (talk) 11:55, 26 March 2024 (UTC)[reply]
The vote was Wiktionary:Votes/2020-07/Removing_letter_entries_except_Translingual - there was a majority in favour, but not a consensus. --RichardW57m (talk) 18:02, 26 March 2024 (UTC)[reply]

Chinese lect labels and categories[edit]

(Notifying Atitarev, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏): @Theknightwho Sorry to ping everyone again. I am going through and adding labels to Module:labels/data/lang/zh for missing Chinese lects and creating the associated categories, and it's revealing some issues in the way that categories are currently handled. To fill everyone in who isn't familiar with the unique way that Chinese is handled:

  1. All lects go under the Chinese L2 header.
  2. The 'Foo lemmas' categories are added for individual Chinese languages using the {{zh-pron}} template in the Pronunciation section, which lists all the possible pronunciations of a given term in different Chinese languages (including Old Chinese and Middle Chinese).
  3. There are also labels, added using {{lb|zh|...}} or {{tlb|zh|...}}, to identify that a given sense is defined only for specified Chinese lects.

The problem here is that different structures are used for the categories generated by the labels vs. the categories generated by {{zh-pron}}. In particular, a label like Mandarin, Hakka or Xiang places the term in (respectively) categories Category:Mandarin Chinese, Category:Hakka Chinese and Category:Xiang Chinese, which are subcategories of Category:Dialectal Chinese (which in turn is a subcategory of Category:Regional Chinese, which is ultimately under Category:Chinese language). It's true that Category:Mandarin Chinese is also a subcategory of Category:Mandarin lemmas, and similarly for Category:Hakka Chinese, Category:Xiang Chinese, etc., but the existence of two categories for each language seems redundant. To make matters worse, as I've been creating new categories for missing lects like Category:Dabu Hakka, I haven't been explicitly specifying the parent category as e.g. Category:Hakka Chinese, with the result that they're placed under Category:Regional Hakka (rather than a child of Category:Regional Chinese), which leads to a very different breadcrumb trail than older lect categories like Category:Hong Kong Hakka. I propose the following:

  1. Eliminate categories 'Foo Chinese' as much as possible. Labels like Mandarin and Hakka should directly categorize into Category:Mandarin lemmas and Category:Hakka lemmas. (It's true that such labels could potentially be used for non-lemma forms as well, but in practice this isn't an issue due to the fact that Chinese languages have almost no morphology to speak of.)
  2. Older-created lect categories like Category:Hong Kong Hakka should have their parents set to e.g. Category:Regional Hakka rather than Category:Hakka Chinese. This brings Chinese lects in line with non-Chinese lects, which always work this way.
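The two proposal points can be sketched roughly as follows. This is only an illustration under stated assumptions: the table names and both functions are hypothetical and do not reflect the actual structure of Module:labels/data/lang/zh.

```python
# Hypothetical sketch of the proposal; the names below are illustrative only
# and are NOT the real contents of Module:labels/data/lang/zh.

# Point 1: language-level labels categorize straight into "Foo lemmas".
LANGUAGE_LABELS = {"Mandarin", "Hakka", "Xiang"}

# Point 2: each lect label records the language it belongs to.
LECT_LABELS = {
    "Dabu Hakka": "Hakka",
    "Hong Kong Hakka": "Hakka",
}


def category_for_label(label):
    """Return the category a label would place a term in."""
    if label in LANGUAGE_LABELS:
        return f"Category:{label} lemmas"
    if label in LECT_LABELS:
        return f"Category:{label}"
    raise KeyError(f"unknown label: {label}")


def parent_category(lect_label):
    """Return the parent of a lect category: 'Regional <language>'."""
    return f"Category:Regional {LECT_LABELS[lect_label]}"
```

With this shape, old and new lect categories get the same breadcrumb trail, because the parent is always derived from the lect's language rather than set by hand.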

This leaves a few outstanding issues:

  1. What about the remaining lemmas in categories like Category:Malaysian Chinese and Category:Philippine Chinese? The problem here is that the lemmas are tagged just using the labels Malaysia and/or Philippines without properly identifying which language is involved. Maybe these can be recategorized by bot but I don't know enough about Chinese to do it without help.
  2. Related to this: Currently categories like Category:Malaysian Chinese and Category:Philippine Chinese are children (sometimes grandchildren, etc.) of Category:Overseas Chinese. Do we want to bother with per-language categories like Category:Overseas Hakka, Category:Overseas Hokkien, Category:Overseas Teochew, etc. or just put things like Category:Malaysian Hokkien directly under Category:Regional Hokkien?

Benwing2 (talk) 00:08, 26 March 2024 (UTC)[reply]

@Theknightwho Do you have any opinions here? As I'm filling out the labels and descriptions in Module:labels/data/lang/zh, the dual hierarchy is becoming more and more annoying. Benwing2 (talk) 00:05, 4 April 2024 (UTC)[reply]
I'm also thinking we should add a field to the language extra data and/or the {{auto cat}} call on the language page, which describes the language in more detail. For example, Category:Eastern Min Chinese currently has the definition "Terms or senses in a branch of the Sinitic languages spoken in eastern Fujian Province in southeast China, as well as in parts of extreme southern Zhejiang Province to the north of Fujian and in the Matsu Islands (belonging to Taiwan)", which gives a lot more information than the Category:Eastern Min language page, which just says it's a language spoken in China and some other countries. Benwing2 (talk) 00:09, 4 April 2024 (UTC)[reply]
@Benwing2 I've not been keeping up with the Chinese label discussions, so I'll need to catch up with everything first. I agree the dual system is silly, though. Theknightwho (talk) 02:43, 4 April 2024 (UTC)[reply]

Adding Transitional Proto-Norse as an etymology-only language

Transitional Proto-Norse is the name for the language found in Scandinavian runic inscriptions from around 550–650. It's basically the last stage of Elder Futhark writing before the transition to Younger Futhark (generally also considered the start of Old Norse), and has several orthographic traits in common with the YF. The relevant inscriptions also show innovations found in Old Norse, but not yet in classic Proto-Norse, e.g. syncopation and the merger of the 2nd and 3rd persons in the present tense indicative of verbs. I think having it as an etymological-only language would be useful. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 21:30, 26 March 2024 (UTC)[reply]

@Mårtensås No objections here; we can use a code like gmq-tra for this. Benwing2 (talk) 01:54, 28 March 2024 (UTC)[reply]
@Benwing2 That code would work. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 19:36, 28 March 2024 (UTC)[reply]
What sources do you have for this name? -- Sokkjō 06:55, 28 March 2024 (UTC)[reply]
It's well accepted in the literature. If you search "transitional period" "Proto-Norse" on Google you will find numerous digitised scholarly articles that use it. Düwel & Nedoma (Runenkunde 5th edition, p. 124):
Some Scandinavian inscriptions from around 600 already show characteristics of the younger Fuþark, e.g. hAborum (ᛡ A for a) and hagestumʀ (ᚨ a for an) on Stentoften (p. 25 f.) or uþArAbsbA (-sbA for -spā) on Björketorp (p. 51). Relatively few runic epigraphic texts have survived from the following period up into the 8th century; the nature of the corpus of these Übergangsinschriften (Engl. transitional inscriptions) is assessed in different ways (see, among others, Birkmann 1995, 219 ff.; Barnes 1998; Grønvik 2001a, 64 ff.; Schulte 2010; Stoklund 2010).
The references are:
  • Thomas Birkmann, Von Ågedal bis Malt. Die skandinavischen Runeninschriften vom Ende des 5. bis Ende des 9. Jahrhunderts (= RGA-E 12; Berlin – New York 1995).
  • Michael P. Barnes, The Transitional Inscriptions. In: Düwel / Nowak 1998, 448–461.
  • Ottar Grønvik, Über die Bildung des älteren und des jüngeren Runenalphabets (= OBG 29; Frankfurt/Main etc. 2001).
  • Michael Schulte, Der Problemkreis der Übergangsinschriften im Lichte neuerer Forschungsbeiträge. In: Askedal et al. 2010, 163–189.
  • Marie Stoklund, The Danish Inscriptions of the Early Viking Age and the Transition to the Younger Futhark. In: Askedal et al. 2010, 237–252.
ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 18:02, 28 March 2024 (UTC)[reply]
I've never seen the term "Transitional Proto-Norse" though. For many languages, we have etymology-only codes for late and early forms, and Early Old Norse and Late Proto-Norse are terms that actually have traction in the literature, so I would recommend either [non-ear] or [gmq-lat] instead. -- Sokkjō 19:51, 28 March 2024 (UTC)[reply]
Some pings: @Mnemosientje, Mahagaja -- Sokkjō 19:54, 28 March 2024 (UTC)[reply]
I prefer "Late Proto-Norse" as well. The phrase "transitional period of Proto-Norse" is common enough, but not "Transitional Proto-Norse" as a lect name. —Mahāgaja · talk 20:09, 28 March 2024 (UTC)[reply]

Etymology trees

I would like to add etymology trees to some of our entries. Here's an example which could potentially be used in mainspace. I personally think there's value in representing etymology in such a visual way, and these trees are extremely easy to create (my template can create them automatically). I'd like to hear your thoughts. Ioaxxere (talk) 02:15, 27 March 2024 (UTC)[reply]

Support The visual styling is helpful for understanding. Maybe in instances where the etymology is more than x levels deep, a tree could be created. I'm open to others' thoughts as well. —Justin (koavf)TCM 02:20, 27 March 2024 (UTC)[reply]
Oppose The example looks like {{desctree}}, just on a different page. --RichardW57m (talk) 13:48, 27 March 2024 (UTC)[reply]
@RichardW57m: I think you're misunderstanding the concept. A descendants tree and an etymology tree are exact opposites in that a descendants tree has all the descendants for a particular ancestor while an etymology tree has all the ancestors for a particular descendant. They only look similar when the tree is a simple chain (i.e., A -> B -> C -> D). Some etymology trees are much more complicated, like puny or every. Ioaxxere (talk) 21:20, 27 March 2024 (UTC)[reply]
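The distinction can be sketched with a toy graph. The letters are abstract placeholders rather than real entries, and the function names are hypothetical, not part of any existing template or module.

```python
from collections import defaultdict

# Each pair is (etymon, derived term). Toy data: A and B merge into C,
# and C has two descendants, D and E.
EDGES = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E")]


def build(edges):
    """Index the edges in both directions."""
    parents, children = defaultdict(list), defaultdict(list)
    for etymon, derived in edges:
        parents[derived].append(etymon)
        children[etymon].append(derived)
    return parents, children


def closure(start, links):
    """Collect every node reachable from `start` by following `links`."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                stack.append(nxt)
    return seen


PARENTS, CHILDREN = build(EDGES)


def etymology_tree(term):
    """All ancestors of one descendant."""
    return closure(term, PARENTS)


def descendants_tree(term):
    """All descendants of one ancestor."""
    return closure(term, CHILDREN)
```

Here `etymology_tree("D")` collects A, B and C, while `descendants_tree("C")` collects D and E; the two only coincide in shape when the graph is a simple chain.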
@Ioaxxere: So why not give examples of the trees for these? --RichardW57m (talk) 09:23, 28 March 2024 (UTC)[reply]
@RichardW57m: See #New design, #Feedback on proposed label designs, and Special:Permalink/78726876 for more examples. Ioaxxere (talk) 08:04, 1 April 2024 (UTC)[reply]
Support I concur. I think there should be a way the etymology section can be visualised since it can definitely make things clearer. It can also make things more codified. However, if this becomes permanent, a point to discuss would be lemmas whose etymology isn't clear, i.e. how would they be represented visually, or lemmas which have mixed etymology (for instance lemmas whose specific sense is influenced by a different language). نعم البدل (talk) 23:54, 27 March 2024 (UTC)[reply]
Mild Oppose. I am opposed in particular to the current implementation involving a separate {{etymon}} specification with information duplicated between the {{etymon}} call and the actual text of the Etymology sections. I have expressed my concerns above. Benwing2 (talk) 01:53, 28 March 2024 (UTC)[reply]
@Benwing2: I agree that duplication is not ideal. My original idea was to have the template automatically generate text and have that be the only thing in the etymology section, but I feel now that that's only really possible for the simplest cases. Another idea is to use special flags on our current templates, so something like From {{m|en|something}} might be replaced with From {{m|en|something<id:whatever><etymon>}} to mark that particular term as an etymon. Ioaxxere (talk) 05:58, 28 March 2024 (UTC)[reply]
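As a rough illustration of the inline-flag idea, the sketch below splits an argument such as `something<id:whatever><etymon>` into the term and its flags. The function name and the returned shape are assumptions made for this example; no existing module is claimed to work this way.

```python
import re

# Matches "<key>" or "<key:value>"; the value part is optional.
_MODIFIER = re.compile(r"<([^<>:]+)(?::([^<>]*))?>")


def parse_term(arg):
    """Split 'term<key:value><flag>' into (term, modifiers).

    Flags without a value (e.g. '<etymon>') are stored as True.
    """
    term = arg.split("<", 1)[0]
    modifiers = {}
    for key, value in _MODIFIER.findall(arg):
        modifiers[key] = value if value else True
    return term, modifiers
```

A tree generator could then pick out only the terms whose modifiers include `etymon`, so the etymology text would stay the single source of the data.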
@Ioaxxere As User:AG202 points out, this is a huge change, and needs a lot more thought and design before being rolled out. That's another reason I Oppose adding it to mainspace at this time. Benwing2 (talk) 06:02, 28 March 2024 (UTC)[reply]
Weak oppose: The issues with spacing need to be resolved, and there need to be more eyes on the matter. I like the idea, nonetheless, but it feels very rushed for such a big change. (I'm also not really sure I like the way it looks that much?) AG202 (talk) 02:03, 28 March 2024 (UTC)[reply]
Strong oppose: Looks just awful. -- Sokkjō 06:53, 28 March 2024 (UTC)[reply]
Strong support - looks pretty much the same as what we already have in descendant sections, and automating this has huge potential. Theknightwho (talk) 04:08, 29 March 2024 (UTC)[reply]
Support I think it has potential. I saw the redesigned tree and it looks very promising! That being said, I still would like to see the end product before fully implementing it. — Sameer مشارکت‌هابحث﴿ 07:07, 30 March 2024 (UTC)[reply]

New design

I've created a new etymology tree design. See below:

Let's do a new poll in regards to this design, as a few oppose votes were made on the basis of the template's appearance. Ioaxxere (talk) 07:46, 29 March 2024 (UTC)[reply]

Still Oppose, except this one straight up does not work well on mobile. Again, with a change this visible and massive, I'd avoid rushing it. It needs to be properly tested everywhere. I really do support the idea, but it needs to be done well. (Also one might argue that it'd need to have a formal vote, and if it does, you'd really need to have it entirely fleshed out) AG202 (talk) 07:57, 29 March 2024 (UTC)[reply]
I definitely agree that such a change would need a formal vote because it is a pretty large change to how the dictionary looks. Thadh (talk) 09:15, 29 March 2024 (UTC)[reply]
Shame it doesn't work on mobile, otherwise it's a great improvement over the first one. It makes things so much clearer. نعم البدل (talk) 15:34, 29 March 2024 (UTC)[reply]
@نعم البدل: What does it look like on your device? On my phone it seems to be working properly. Ioaxxere (talk) 15:55, 29 March 2024 (UTC)[reply]
@Ioaxxere: Sorry, I meant like it becomes too wide, not that it doesn't work (but that's generally with any visual template on this website). I tried it on my phone again, just now, and it turned out to be an anomaly before. It works better than other visual templates on this website. نعم البدل (talk) 16:03, 29 March 2024 (UTC)[reply]
@نعم البدل: Good to know! And thank you for the kind words—I've spent an absurd amount of time making those grey connectors pixel-perfect. Also note that the tree on English puny is one that is exceptionally wide. Here's a more representative example, for English father:

Proto-Indo-European *peh₂- + Proto-Indo-European *-tḗr
→ Proto-Indo-European *ph₂tḗr
→ Proto-Germanic *fadēr
→ Proto-West Germanic *fader
→ Old English fæder
→ Middle English fader
→ English father

No poll. Let's have a discussion about the design. Also, I think it should be working fine on mobile now. Ioaxxere (talk) 08:46, 29 March 2024 (UTC)[reply]

Oppose: If we ever started creating etymology trees for entries, it would have to be a very complex module, with ways to mark derivational types, certainty, alternatives, mergers, etc. Create such a module and then maybe we can explore its inclusion on entry pages with a proper vote. --{{victar|talk}} 19:58, 29 March 2024 (UTC)[reply]
Oppose. I'm impressed by the visual output and effort that went into this, but we are still a dictionary. Treeing around is amusing but unscientific. {{desctree}} is also in my opinion much abused. Common sense tells us when an etymology or a descendants branch should stop. Catonif (talk) 17:53, 30 March 2024 (UTC)[reply]
Abstain. Interesting idea but it is missing a distinction between inheritance and borrowing and seems rather elaborate to implement. I would rather we focus on your other idea of limiting ety sections to ‘one step up’ (when the ancestor entry exists) and filling in the rest with an automated module. Said module could then later be modified to output tree diagrams, perhaps, if that can be done well. Nicodene (talk) 22:42, 16 April 2024 (UTC)[reply]
Abstain I like the idea and the look; I get more information more quickly from such a small diagram than from a paragraph of text. But I think this idea should be worked out in more detail before it is adopted, as expressed by Victar. Which entries should qualify for etymology trees? —Caoimhin ceallach (talk) 13:15, 22 April 2024 (UTC)[reply]

Changing the letters sort order on Arabic dialects' category pages

I suggest changing the letter sort order on Arabic dialects' category pages so that the non-native Arabic letters come at the end.

For example, the current order for Hijazi Arabic is

آ أ إ ا ب پ ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ڤ ق ك ل م ن ه و ي

but it should be

آ أ إ ا ب ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ق ك ل م ن ه و ي . پ ڤ

with the non-native پ and ڤ coming at the end and separated by a dot, since they are not part of the original Arabic letters.

The Arabic sorting should be the same, and it should run from right to left as in the example (unlike the current Standard Arabic left to right), since it looks more appealing and correct. @Benwing2 عربي-٣١ (talk) 12:54, 27 March 2024 (UTC)[reply]

@عربي-٣١: A better argument would be examples of alphabetic orderings in use, as people have many ways of handling the alphabetical position of additional letters. --RichardW57m (talk) 13:36, 27 March 2024 (UTC)[reply]
The current order is more appealing and correct. The current order of the alphabet pays respect to internal etymological and graphical relations of the letters anyway. A division due to nonnativeness is outlandish. A German or English entry with é will also be sought under e, and so on for any Latin-script language. Interestingly German çöp is at the end of the order, and should arguably be because c is not much of a letter in German and since the 1901 spelling reform virtually only used in digraphs: I rather expect it with c anyway, as due to French and Czech borrowings, and the only reason it is otherwise now is the default Unicode order. But پ (p) has little reason not to be put to ب (b), as a specification of it, even less reason than ç to c. Fay Freak (talk) 14:58, 27 March 2024 (UTC)[reply]
I was only talking about the table where the letters are written at the top of each page (like this one: Category:Hijazi Arabic terms with IPA pronunciation). If so, then they should be removed altogether, since they are variants of the letters (like German ç is a variant of German c), not full letters, and most speakers do not use or pronounce them either.
The sorting table should be as follows on every Arabic page:
1- either with no پ ڤ گ ژ چ or any other non-native variants, or
2- with those variants put at the end; putting the variants between the letters would give the impression that they're part of the official language or the alphabetical order. عربي-٣١ (talk) 16:38, 27 March 2024 (UTC)[reply]
There is no such thing as an official letterset, nor does the category give an impression, it is to navigate for people who already have an idea about the writing system of the language and what could occur in an entry title. Fay Freak (talk) 17:03, 27 March 2024 (UTC)[reply]
Actually, there are all sorts of orders even for the Latin script. In Vietnamese, tone marks don't create separate letters, but vowel quality modifiers do. The modified vowel letters occur at the end of the alphabet in some Scandinavian alphabets. Looking at the listing of various Arabic script alphabets in Appendix:Arabic_script, I note that while پ (p) occurs in the same part of the alphabet as ب (b), its precise position varies. --RichardW57 (talk) 08:27, 28 March 2024 (UTC)[reply]
@Fay Freak. @Benwing2 Yes and I was talking only about the Arabic dialects and Standard Arabic, these additional letters are already sorted after the regular letters, but on the main table they are shown in the middle of it (پ after ب and ڤ after ف), so I suggest removing them or putting them at the end of it as they are sorted already
I did not mean to remove those variants completely عربي-٣١ (talk) 20:33, 28 March 2024 (UTC)[reply]
@عربي-٣١: A codepoint sort, which is what you are seeing, should not be confused with a considered order for sorting. --RichardW57 (talk) 03:04, 29 March 2024 (UTC)[reply]
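A small sketch of the difference, using escape sequences so the right-to-left letters do not scramble the code. The "considered" order below is only one possible choice made for this example (peh after beh, veh after feh), not an established standard.

```python
BEH, TEH, FEH = "\u0628", "\u062a", "\u0641"   # ب ت ف
PEH, VEH = "\u067e", "\u06a4"                  # پ ڤ

letters = [VEH, BEH, PEH, FEH, TEH]

# Plain string comparison sorts by codepoint, which pushes the additional
# letters after the basic block simply because of where Unicode encodes them.
codepoint_order = sorted(letters)

# A considered order is an explicit, language-specific ranking.
# This ranking is an assumption for illustration only.
CONSIDERED = [BEH, PEH, TEH, FEH, VEH]
RANK = {ch: i for i, ch in enumerate(CONSIDERED)}
considered_order = sorted(letters, key=lambda ch: RANK[ch])
```

The codepoint sort yields beh, teh, feh, peh, veh, while the explicit ranking keeps peh next to beh and veh next to feh.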

Aquitanian entries in reconstruction namespace

As of now, all pages in the Aquitanian lemmas category are in the reconstruction namespace, aside from a few personal names which are in the main namespace. Most of these words appear to be attested; they should be moved to main. -saph 🍏 13:47, 28 March 2024 (UTC)[reply]

@Saph668 Overall this sounds fine for attested terms except that when they are moved, they should have the source more clearly indicated and use the form as actually attested rather than reconstructed. Currently they just say "Known from Aquitanian inscriptions" but that says nothing about (a) which inscription it is, (b) what form the term appeared in the inscription, (c) what the context was. Benwing2 (talk) 20:29, 28 March 2024 (UTC)[reply]
User:UtherPendrogn haunting us till this day.
My understanding is that Aquitanian is unattested, with the reconstructions based solely on place and personal names found in Latin texts, much like Gaulish. If that is indeed the case, all the proper names should be moved to a Latin header or an Aquitanian reconstruction. --{{victar|talk}} 22:16, 28 March 2024 (UTC)[reply]
The Aquitanian corpus is limited to a few proper nouns in otherwise Latin inscriptions. While these proper nouns often contain elements from Proto-Basque, most entries in Category:Aquitanian lemmas are just duplicates of (sometimes dubious) Proto-Basque reconstructions written with an ad-hoc orthography. I've cleaned up a few of them (keeping them under the Aquitanian header), but I find the reconstruction entries pretty useless. If we were to move the actually attested forms to Latin, it might be a good idea to make Aquitanian an etymology-only language. Santi2222 (talk) 21:16, 18 April 2024 (UTC)[reply]

Japanese bot task proposal - on'yomi categorization

Hello, since I last posted (link broken, please Ctrl+F "Granularity of reading types"...) about the topic of whether we should specify the type of on'yomi (kan'on, goon, kanyouon, etc.) used by Chinese-derived terms (via {{ja-kanjitab}}'s |o= param) — which didn't get much activity, but I believe we agreed it would be good to specify — I've been making quite a few changes to that effect in which I simply go to the respective kanji pages, read off whether the on'yomi is kan'on or goon and just add it to the relevant entry (e.g. diff). This task is no trouble to automate, at least partially, since I could make the bot fetch all this data and see whether the reading types can be unambiguously labelled, or whether they'd overlap (I don't propose guessing, if a character has e.g. きょう as both goon and kan'on readings, which one it is). And if it's unambiguous, the reading types can simply be filled in. It also helps that some months ago I parsed out the complete readings of all the kanji we had covered at the time, so they could be accessed quite quickly (although they're a bit less current now).

Does this sound like a good/acceptable bot task? Thanks for any feedback, Kiril kovachev (talkcontribs) 14:03, 28 March 2024 (UTC)[reply]
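The "no guessing" rule could look roughly like this. The data layout, the placeholder keys K1 and K2, and the function name are all assumptions for illustration; the sample readings are made up and are not taken from any kanji page.

```python
# Hypothetical, made-up sample data: K1 and K2 stand in for kanji pages.
SAMPLE_READINGS = {
    "K1": {"goon": ["きょう"], "kanon": ["けい"]},
    "K2": {"goon": ["きょう"], "kanon": ["きょう"]},
}


def reading_type(kanji, reading, data):
    """Return the on'yomi type only if exactly one type lists the reading.

    Returns None when the reading is unknown or when it appears under
    more than one type, so the bot skips the case rather than guessing.
    """
    types = [t for t, readings in data.get(kanji, {}).items() if reading in readings]
    return types[0] if len(types) == 1 else None
```

In this sketch `reading_type("K1", "きょう", SAMPLE_READINGS)` gives "goon", while the same reading for K2 gives None because it is listed as both goon and kan'on.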

@Kiril kovachev Hi. Please take a look at [12], which is a script I wrote a while ago to do something very similar, which is pull out the type of reading from kanji pages and insert it into category pages of the form Category:Japanese terms spelled with FOO read as BAR, e.g. Category:Japanese terms spelled with 柄 read as ひ. It looks either for {{ja-readings}} or {{ja-kanjitab}}. As for your proposed bot task, yes it sounds fine to me. Benwing2 (talk) 20:26, 28 March 2024 (UTC)[reply]
@Benwing2 Oh, thanks for that. Do you think I should adapt your script to do the changes I need or did you just mean to have a look for ideas what needs to be checked, etc.? Kiril kovachev (talkcontribs) 20:43, 28 March 2024 (UTC)[reply]
@Kiril kovachev It's your choice. I'm not sure what state your scripts are in, but feel free to reuse/adapt the code or simply look at the logic to make sure you haven't missed anything. Benwing2 (talk) 20:50, 28 March 2024 (UTC)[reply]
Alright, thanks very much, I'll have a proper good look when I sit down and try to code it. Kiril kovachev (talkcontribs) 20:53, 28 March 2024 (UTC)[reply]

I noticed I was blocked permanently in March, while I did not edit anything these 2 months.

Can anyone give me an explanation? I don't know who is in charge now.

@Benwing2, Chuck Entz -- Huhu9001 (talk) 03:39, 29 March 2024 (UTC)[reply]

@Huhu9001 The block log shows that you were blocked in Nov 2023 by User:Theknightwho for one year from the Module space, which was changed earlier this March to a permanent block. You'll have to ask User:Theknightwho why he saw fit to block you like this. However, I do notice that several other people blocked you earlier, so this can't be chalked up to "just not getting along with a single administrator". Wyang in particular said "Repeat offender, discourteous, defiant", which means you don't recognize what you did wrong; IMO this doesn't bode well for an unblock. Benwing2 (talk) 03:48, 29 March 2024 (UTC)[reply]
@Benwing2 Huhu9001 has ignored everything I've said for months now, but I changed the block length because Huhu9001 had already had two one-month blocks from the module namespace, which made absolutely no difference to their behaviour. In March, I changed the block to a permanent one with the explanation Edit: this should require an appeal to expire, given the long-term nature of the abuse., because it seems highly unlikely that Huhu9001 is ever going to learn how to get along with other users, and I don't want to periodically have to deal with their (mis)behaviour every time a block expires. Theknightwho (talk) 03:54, 29 March 2024 (UTC)[reply]
Wyang himself left Wiktionary in a highly dishonorable manner which pretty much tells that he is himself "Repeat offender, discourteous, defiant" instead. And are you justifying a block now by another one several years ago? -- Huhu9001 (talk) 03:55, 29 March 2024 (UTC)[reply]

@Benwing2: Also I notice other well-managed Wikiprojects like English Wikipedia enforce a rule that:

Administrators must not block users with whom they are engaged in a content dispute; instead, they should report the problem to other administrators. Administrators should also be aware of potential conflicts involving pages or subject areas with which they are involved. It is acceptable for an administrator to block someone who has been engaging in clear-cut vandalism in that administrator's userspace.

(w:WP:BLOCKNO). I roughly remember someone told me this is a custom or somewhat "softer" rule here. Why does it never have any effect when I need it to protect me from abuse? Is Wiktionary giving its admins too much arbitrariness? -- Huhu9001 (talk) 04:01, 29 March 2024 (UTC)[reply]

The block breaks even Wiktionary's own blocking policy (Wiktionary:Blocking policy). For infinite block length:

  1. Blatant or confirmed sockpuppets created for the purpose of vandalism or block evasion.
    Sockpuppets, nope.
  2. Abuse, plagiarism, persona non grata type blocks, based on community consensus.
    Put aside "Abuse, plagiarism, persona non grata" part. TKW silently changed my block to infinite. What consensus did he get? What consensus did he ever get for any of my blocks?
  3. Bad username accounts, including: email addresses, exploitative names, copycats, offensive names, etc.
    Nope.
  4. CheckUser-identified bad sockpuppets.
    Same as #1, nope.

Is not even Wiktionary:Blocking policy treated seriously any more on Wiktionary? -- Huhu9001 (talk)

Is Wiktionary:Blocking policy a valid document? @Benwing2, Chuck Entz -- Huhu9001 (talk) 04:10, 29 March 2024 (UTC)[reply]

@Huhu9001 I would have more patience with you if you showed even a little bit of understanding of the pattern of bad behavior you've engaged in. Instead you resort to wikilawyering and casting aspersions at people who are no longer active and hence cannot defend themselves. Benwing2 (talk) 04:36, 29 March 2024 (UTC)[reply]
When have you ever had any patience for me? Did you try, let alone to stop TKW's abusive behaviour, to think carefully about my stance and reasoning even a single time? If so I simply have never seen it. "Wikilawyering" is a convenient accusation, but shouldn't there be any respect for rules, including Wiktionary:Blocking policy? For unprivileged normal users, what else can we rely on except for rules? -- Huhu9001 (talk) 04:46, 29 March 2024 (UTC)[reply]

Now since someone mentioned my block history. I can say I have never been blocked on any other Wikiprojects with the only exception being here. It is quite arguable whether the responsibility lies on my own side or on English Wiktionary's side, as English Wiktionary has almost always been failing to put meaningful checks on how its admins use their rights, just as I can see in the Wiktionary:Blocking policy case here. -- Huhu9001 (talk) 05:30, 29 March 2024 (UTC)[reply]

Japanese いぃ, うぅ, イィ, ウゥ

Bringing up this topic again because I got permanently blocked for this. Original discussion: Wiktionary:Beer_parlour/2023/August#Japanese いぃ, うぅ, イィ, ウゥ. -- Huhu9001 (talk) 04:35, 29 March 2024 (UTC)[reply]

To check whether they represent yi and wu or ī and ū, here are the top hits from Google. Hits of the same proper names, mojibake and cases where it is not possible to tell, like うぅ居酒屋 (name of an izakaya), are ignored.
* いぃ
*# 那珂宣伝部/いぃ那珂暮らし (a city promotion group of Naka 那珂, "good Naka"): ī
*# 【音量注意】"樋口いぃいいぃいぃぃいいい" ("Higuchi gooooood"): ī
*# 新喜劇アキ【いぃよぉ~講座】("good"): ī
*# 海を感じる、エモいぃ~スポット ("emotional", inflection ending): ī
*# いぃべあー楽天 (a company name, "e-Bear"): ī
*# 台本のないコメディーvol.5 ~全部アドリブでいぃよぉ~~("good"): ī
*# いぃ〜バンド(e-band)結束バンド、アソート: ī
* イィ
*# yee(イィ) (a fashion brand): yi
*# イィの英訳 - 英辞郎: yi
*# 10年後イィ女になるために!! ("good"): ī
*# Satomi (演歌)/イィ...女 ("good"): ī
*# 闇の洗礼をうけるがイィ! 暴君ハバネロ外伝 ("good"): ī
*# ヴィエイィニュアンセ (a brand name, "vieille nuance", French "vieille" /vjɛj/): yi?
*# 鳥イィ? | 上地雄輔 OFFICIAL SITE ("Like some birds?"): ī
* うぅ
*# うぅ・・・の人気イラストやマンガ/ううぅ、うぅ、あ・・(息が苦しい)/うぅぉおお/うぅぅ~暑いっ、コンビニ行こっ! (interjections "urgh", "hmm" or the likes): ū (many hits from different sites)
*# ひねくれ領主の幸福譚 性格が悪くても辺境開拓できますうぅ!(lengthening of the sentence's ending): ū
*# ぶぅふぅうぅ農園 (@boohoowoofarm): wu
*# さうすまうぅん sausumaUun (an artist from Sapporo?): ū
*# 【ホットペッパービューティー】デフィ(defi)のフォトギャラリー:カラフルぅうぅううぅー!("colorfuuuuul"): ū
* ウゥ
*# 腕時計 ヴィヴィアンウゥストウッド (a brand name, "Vivienne Westwood"): wu
*# ウゥ~~~ン、店を出てからは振り返りたくない店かもしんないな/ウゥ~ン・・・落ちた~??/ ウゥ〜(interjections "urgh", "hmm" or the likes): ū (many hits from different sites)
*# アズノウゥアズピンキー 美品 ロングシーズン ロゴ刻印ボタン (a brand name, "As Know As"): wu?
*: (Many ウゥ hits seem to be misspelling of ウィ, ウェ and ウォ, like "ウゥストウッド" above and "ミニウゥレット" for "ミニウォレット".)
My conclusion is that these syllables are more often ī and ū. Especially when in hiragana, they are almost always ī and ū. Wiktionary transliterating all of them into yi and wu is a mistake. yi and wu should be taken as special cases, like we have ヲチ (芸能人ヲチ, etc) as unusual wochi instead of regular ochi. -- Huhu9001 (talk) 08:58, 27 August 2023 (UTC)[reply]

Request for comments. (Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): -- Huhu9001 (talk) 04:35, 29 March 2024 (UTC)[reply]

Support stuff like かわいいぃいぃいぃいぃ would then become kawaiyiyiyiyi instead of kawaiiiiiiiii if いぃ transcribes just yi. をぅ•ヲゥ can be used to transcribe wu, although を/ヲ is only used for accusative. Transcription of yi is more controversial, because there are no modern single kana characters of [j] + [+front], as ye could theoretically be transcribed いぇ as in イェイ (iei, yay!), but otherwise looks like a blend of a [+high] + [+mid] which could be recognized as a glide. Chuterix (talk) 13:57, 29 March 2024 (UTC)[reply]
Looks legit, as long as it's somehow possible to choose which one to use in a given context. I agree that ī/ū seems more logical, based on your examples, but frankly I've never seen these used in a real scenario so I can't say much more with confidence. Kiril kovachev (talkcontribs) 02:35, 2 April 2024 (UTC)[reply]
@Chuterix @Kiril kovachev If you look at the examples given, almost all of the ī and ū examples are down to syllable lengthening in the same manner as English: "gooooood". However, this is an absurd standard to use: it's the equivalent of using a sentence like "it's hooooot today" to justify "oo" sometimes standing for short "o" in English. It doesn't - it's simply lengthening for emphasis, and isn't the kind of spelling we're ever going to lemmatise at. On the other hand, the yi and wu terms are actually terms in their own right. Theknightwho (talk) 00:42, 13 April 2024 (UTC)[reply]
@Theknightwho You're right... it would make sense that yours be the default, with that in mind. But I still do think we need both to be possible somehow, because how will we handle sentences (e.g. in {{ja-usex}}) where we need いぃ to equal ī? Kiril kovachev (talkcontribs) 14:38, 13 April 2024 (UTC)[reply]
@Kiril kovachev An override exists with the rom= parameter (though this should probably be changed to tr=). It might be worth having a flag like . and ^, but this is really rare: we have two terms which use wu (ウゥルカーヌス (Wurukānusu) and its alt form), and I don't think we have any yi terms, though it does show up in some French place names, like プイィ゠シュル゠ロワール (Puyi-shuru-Rowāru, Pouilly-sur-Loire). We have no terms that I can find which use these for ī or ū. Theknightwho (talk) 15:04, 13 April 2024 (UTC)[reply]
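The default-plus-override arrangement described here can be sketched as below. The table and the function are hypothetical; the `rom` argument only mirrors the rom= parameter mentioned above and is not the real template or module interface.

```python
# Default romanizations, following the current yi/wu treatment discussed
# in this thread. The table itself is an illustrative assumption.
DEFAULTS = {"いぃ": "yi", "うぅ": "wu", "イィ": "yi", "ウゥ": "wu"}


def romanize_digraph(kana, rom=None):
    """Return the manual override if one is given, else the default."""
    if rom is not None:
        return rom
    return DEFAULTS[kana]
```

So `romanize_digraph("ウゥ")` falls back to "wu", while a usage example that needs lengthening would pass the override explicitly, e.g. `romanize_digraph("いぃ", rom="ī")`.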
@Theknightwho Hm, okay, that's fine, I guess. I believe it's not optimal because the romanization would still need to be manually written out, whereas we usually automate it with the kana transcription. But no, that might not be worth fussing over given that is a very rare thing. I've not actually seen any uses of either of these till these discussions, so I won't worry about it any more. Thanks for the explanation. Kiril kovachev (talkcontribs) 15:13, 13 April 2024 (UTC)[reply]

Splitting WT:RFM[edit]

What do people think of splitting WT:RFM? It's currently at > 1MB in size, which is very large. We split the various RFV and RFD pages at half that size. Yes, we should be better about archiving conversations but there's only so far that gets you. There are two possibilities: Split along the same lines as RFV/RFD (which splits approximately into English, CJK, Italic, and everything else), or just split for now into English/Non-English. I am inclined to do the latter as a first split; we can always split later as needed. One issue is what to do with Templates, Categories and the like that occur in WT:RFM; maybe we should have a three-way split at first: (1) English lemmas (WT:RFME); (2) non-English lemmas (WT:RFMN); (3) non-mainspace pages, including categories, templates, languages and the like (WT:RFMO). Thoughts? Benwing2 (talk) 05:10, 29 March 2024 (UTC)[reply]

Makes sense. — Sgconlaw (talk) 05:16, 29 March 2024 (UTC)[reply]
It really doesn't matter how much you split it. If no one is closing discussions then the new subpages will get arbitrarily large eventually. Ioaxxere (talk) 08:03, 29 March 2024 (UTC)[reply]
In the past, some people suggested moving language mergers, splits, renames, etc to either the BP or a language-specific page (the latter of which is the better idea to avoid just inflating the BP while deflating RFM, heh). (The reason they're on RFM at all is that originally, merging or splitting a language entailed merging its specific template, merging an actual page in the manner RFM usually does.) I've come around to the idea: have a language-discussion page, maybe even Wiktionary:Language treatment/Discussions, and then archive the discussions to manageably-sized archive subpages of its talk page (which would entail moving the current contents of that page and its talk page to such subpages). Moving language discussions off RFM would knock it down from 1,012,358 bytes to 457,067 bytes (removing 555,291 bytes).
Ioaxxere is correct that if no-one is archiving, the split pages will just get large again, though. - -sche (discuss) 13:37, 29 March 2024 (UTC)[reply]
@-sche Support moving language change discussions somewhere else. Benwing2 (talk) 20:31, 29 March 2024 (UTC)[reply]
I've wanted to do this for a long time. The hard part is what to call it. Is there some type of place where people go to discuss or make decisions on language issues? I suppose something alliterative is better than nothing: maybe "Lect lounge" or "Lect library". Or the "Lect embassy"/"Lect office"/"Lect bureau"? Or maybe emulating some kind of international organization: "League of lects/languages"? "Language court"? "Language academy"? Or something random like the vaguely Lovecraftian "Glossonomicon"?
Another option would be to separate entries/terms from everything else, so that templates, categories, appendices, languages, etc. would go to "RFMO". Chuck Entz (talk) 22:44, 29 March 2024 (UTC)[reply]
Lect Lounge! Lect Lounge! (Not that I’m going to be spending much time there, unfortunately.) — Sgconlaw (talk) 22:52, 29 March 2024 (UTC)[reply]
I admire the creativity, but I think we should stick with something that clearly communicates the purpose of the page, which will not be a general "lounge" to discuss anything lect-related (compare WT:Etymology scriptorium), but will specifically host proposals to change the way Wiktionary divides up the world's languages. I'd prefer -sche's idea of using Wiktionary:Language treatment/Discussions, even if it is terribly anodyne and sits outside the existing "requests for..." structure. Another possibility would be to split RFM into WT:Requests for moves, mergers and splits/Entries and WT:Requests for moves, mergers and splits/Languages, although that leaves no scope for the relatively rare requests to merge templates, appendices etc. This, that and the other (talk) 09:56, 30 March 2024 (UTC)[reply]

Hard Lithuanian Dotting[edit]

Should dots be preserved for Lithuanian when the accent is marked on 'i'? I noticed that we have a head word pìrmas when I would have expected pi̇̀rmas. The preservation is done by inserting U+0307 COMBINING DOT ABOVE between the ASCII letter and the accent. (Some fonts barely show the difference.) WT:About Lithuanian is silent about the spelling of Lithuanian head words.

There seems to be a problem with stripping these combining dots as part of diacritic stripping for links. I would assume that that's minor rather than a show stopper. --RichardW57 (talk) 15:28, 29 March 2024 (UTC)[reply]

@RichardW57 Why would we need to insert U+COMBINING DOT ABOVE in the Lithuanian entry names? This isn't done anywhere else and the result looks bad in my browser. Something like pìrmas is completely unambiguous since Lithuanian doesn't have dotless i's in its repertoire. Benwing2 (talk) 19:25, 29 March 2024 (UTC)[reply]
It's always been the case that when you add a diacritic on top of i, it replaces the dot. I'd be surprised to see a font that didn't do this. Entry name replacements have no effect on displayed text. — Eru·tuon 21:58, 29 March 2024 (UTC)[reply]
@Benwing2, Erutuon: Historically, the suppression of the dot above in 'i' doesn't always happen, and in Vietnam and beside the Baltic there is some attachment to keeping the dot above when there is a diacritic above the letter. Unicode has decreed (https://www.unicode.org/versions/Unicode15.0.0/ch07.pdf pp293-4, section entitled 'Diacritics on i and j') that to keep that dot, it must be separately encoded as 'overdot', i.e. U+0307 COMBINING DOT ABOVE. Indeed, the Unicode Character Database (UCD) declares that in a Lithuanian locale, lowercasing LATIN CAPITAL LETTER I WITH GRAVE introduces U+0307; the presumption is that in Lithuanian contexts, the dot remains even when there is a diacritic above the 'i'. --RichardW57 (talk) 00:30, 30 March 2024 (UTC)[reply]
Now, as these diacritics aren't used in the normal writing of Lithuanian, examples of behaviour are hard to come by. However, I have found some quotations with these letters at https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=aa4570669a29839471f1a220ad2649a4bae0f5c5, an article by Vladas Tumasonis. The letters for showing accentuation are given in Figures 1-3 therein, and perhaps more usefully, there are short quotations from a dictionary, a missal and a grammar on pp19-20. --RichardW57 (talk) 00:30, 30 March 2024 (UTC)[reply]
Malfunctions in entry name replacement can change what should be blue text into red text, as I discovered to my surprise in the opening post in this section. Imagine what happens to links to Latin terms if macrons on Latin terms aren't stripped. (Macrons distinguish entry names for some languages.) --RichardW57 (talk) 00:30, 30 March 2024 (UTC)[reply]
@RichardW57 IMO we definitely do not want U+0307 in the actual titles of entries. If this needs to be done it can be done automatically but I would be opposed to that for Lithuanian. Benwing2 (talk) 00:41, 30 March 2024 (UTC)[reply]
@Benwing2: By 'title', do you mean page name or head word? --RichardW57 (talk) 00:53, 30 March 2024 (UTC)[reply]
@RichardW57 In the page name. It can be added automatically to headwords but as I said, I'd be opposed to that. As User:Erutuon said, most fonts automatically remove the dot when an accent is added to i, and IMO that is correct and perfectly fine for Lithuanian. Benwing2 (talk) 00:58, 30 March 2024 (UTC)[reply]
@Benwing2: But seemingly not to the editors of the dictionary, missal and grammar indirectly cited above! The intervening U+0307 shall not appear in page names, as it should be stripped out along with the acutes, graves and tildes above it. (Note that 'ė́ ' will be reduced to the single code point for 'ė'.) --RichardW57 (talk) 01:18, 30 March 2024 (UTC)[reply]
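For reference, the encoding and stripping described in the comments above can be sketched in Python. This is only an illustration under stated assumptions, not the code used by any Wiktionary module: the function names `retain_dot` and `page_name` are made up for the example, and the accent set is assumed to be just the combining grave, acute and tilde.

```python
import unicodedata

DOT_ABOVE = "\u0307"                      # U+0307 COMBINING DOT ABOVE
ACCENTS = {"\u0300", "\u0301", "\u0303"}  # combining grave, acute, tilde (assumed set)


def retain_dot(word: str) -> str:
    """Insert U+0307 before an accent on i or j so that the dot is kept."""
    out = []
    base = ""         # most recent base (non-combining) character
    has_dot = False   # whether that base already carries U+0307
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) == 0:
            base, has_dot = ch, False
        elif ch == DOT_ABOVE:
            has_dot = True
        elif ch in ACCENTS and base in ("i", "j") and not has_dot:
            out.append(DOT_ABOVE)
            has_dot = True
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


def page_name(headword: str) -> str:
    """Strip accent marks, and strip U+0307 only when it sits on i or j."""
    out = []
    base = ""
    for ch in unicodedata.normalize("NFD", headword):
        if unicodedata.combining(ch) == 0:
            base = ch
        elif ch in ACCENTS:
            continue  # accentuation marks never reach the page name
        elif ch == DOT_ABOVE and base in ("i", "j"):
            continue  # the retained dot on i/j is purely typographic
        out.append(ch)
    # Recomposing turns e + U+0307 back into the single code point for 'ė'.
    return unicodedata.normalize("NFC", "".join(out))
```

The dot is removed only after i or j because a blanket removal of U+0307 from the decomposed string would also turn 'ė' into plain 'e'.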
If Lithuanian editors actually want dots on all their is and js in headwords, it could be done. I know almost nothing about Lithuanian or these dictionary editors. Alternatively, Lithuanians could lobby font makers to program different glyphs when the language of the text is Lithuanian. — Eru·tuon 01:52, 30 March 2024 (UTC)[reply]
After reading the linked article on "Encoding of Lithuanian Accented Letters", my opinion is that retaining the dot in Lithuanian entries would be preferable as a matter of typographical style. I'm less sure however whether the best way to implement that on the technical level is to include U+0307 COMBINING DOT ABOVE (as a practical matter, pi̇̀rmas looks wrong in the font used by my browser: it shows up with an extra dot in addition to the tittle. pı̇̀rmas looks better but doesn't have correct centering, although it shares that problem with some other oblique letters with combining diacritics such as ī̆. When not oblique, pı̇̀rmas looks OK, although it's apparently not officially the right way to do this). Tumasonis seems to present this as a proposed encoding convention; is there evidence that it has actually been adopted by anyone in practice?--Urszag (talk) 01:59, 30 March 2024 (UTC)[reply]
@Urszag This seems to be one person's opinion, and the article is quite old (it's not dated but the last citation is from 1998). Do we have any evidence that standard Lithuanian practice actually retains dots over the i along with accents? If so, how is this implemented? BTW, by now, if the use of U+0307 were standard, certainly the fonts would have been fixed to display this correctly. The fact that it shows up wrong is a strong indication that this is not standard. Benwing2 (talk) 02:05, 30 March 2024 (UTC)[reply]
As RichardW57 said, this typographical form is also mentioned in The Unicode Standard Version 15.0 – Core Specification (published September 2022): "To express the forms sometimes used in the Baltic (where the dot is retained under a top accent in dictionaries), use i + overdot + accent (see Figure 7-2)" (page 293). So I guess it is officially the correct way to encode this after all and not just a proposal. But I don't know whether any digital fonts have bothered with it.--Urszag (talk) 02:10, 30 March 2024 (UTC)[reply]
@Benwing2 it may be of interest that the main Lithuanian academic dictionary retains the dot when applying accents to lowercase i: http://www.lkz.lt/?zodis=pirmas&id=22167950000. The encoding appears to be the "i" character plus a combining grave accent. This, that and the other (talk) 10:01, 30 March 2024 (UTC)[reply]
@Benwing2: What about the readability of Lithuanian pelė̃ (mouse)? The accentuation mark (a tilde) overstrikes the dot above the 'ė' in the font I'm getting for this page. The preview font actually moves the second mark to the right, overhanging subsequent letters. On the other hand, Tahoma handles both words correctly. Liberation Sans makes an effort, but squashes the marks above, while Liberation Serif handles 'pi̇̀rmas' by stacking but handles 'pelė̃ ' by writing the accents side by side. Side by side is not unknown for dot above and acute together on 'e' - call it 'accent kerning'. We see it in kūrė́jau 'Lord' on the first line quoted from the missal - Tumasonis p20. All three fonts, Tahoma, Liberation Serif and Liberation Sans stack the dot and acute on 'e'.
In short, there are fonts that support dot above and mark on top. --RichardW57 (talk) 23:14, 30 March 2024 (UTC)[reply]
@RichardW57 On my MacBook Pro under Chrome, pelė̃ written in upright (non-italic) font looks correct but pelė̃ in italic font has the tilde overhanging the space between the l and ė. Benwing2 (talk) 23:26, 30 March 2024 (UTC)[reply]

Lydian letters[edit]

Would it be possible to update Wiktionary's rendering of the Lydian script to reflect the new values of the letters 𐤮 (formerly ś, now s) and 𐤳 (formerly s, now š)? Antiquistik (talk) 08:41, 30 March 2024 (UTC)[reply]

@Antiquistik Yes although I'd like to hear from someone else who has some knowledge of Lydian (which isn't me). Benwing2 (talk) 20:18, 30 March 2024 (UTC)[reply]
Can you provide any more information, e.g. who changed the values? Links to information about why the new values are what they are? - -sche (discuss) 04:12, 6 April 2024 (UTC)[reply]
@Benwing2, -sche Per Diether Schurr (1999), Lydisches I: zur Doppelinschrift von Pergamon, the values of the letters 𐤮 (formerly ś) and 𐤳 (formerly s) were erroneously assigned. Lydian transcriptions of non-Lydian names show that 𐤮 is always used for the sound /s/ (that is, s) while 𐤳 is always used for the sound /ʃ/ (that is, š).
It also appears that the value of the letter 𐤡 (previously /b/) needs to be updated, since Schurr concludes that it instead represents /p/.
Schurr (1997), Lydisches IV: Zur Grammatik der Inschrift Nr. 22 (Sardes) also reassigned to the letter 𐤥 (previously "v") the new value of "w" to avoid confusion with the letter 𐤸 (transcribed using the Greek letter "ν").
These new values are now used as the standard Lydian transliteration. So it would be preferable to use them. Antiquistik (talk) 16:12, 7 April 2024 (UTC)[reply]
Thanks. OK, I can find both systems in use when I quickly search Google Books for some common words in one vs the other system (like laqrisa / laqriša), but the "new" system does seem clearer, if it also involves changing 𐤸 from ν (Greek nu, which seems like a horribly confusing choice) to "ñ" to match Lycian. Pinging @Vorziblix if you have any thoughts as the only active user to have edited Module:Lydi-translit. - -sche (discuss) 17:05, 7 April 2024 (UTC)[reply]
@-sche On a purely personal level, I would support the change from ν to ñ. However I cannot find the relevant literature advocating for any value change, all I can find for now is the use of different values without explanation. So I will leave whether or not to implement this specific change to be discussed here. Antiquistik (talk) 10:28, 8 April 2024 (UTC)[reply]
@-sche: It’s been too long since I looked at the relevant literature; at present I have no personal opinions one way or the other. — Vorziblix (talk · contribs) 19:34, 10 April 2024 (UTC)[reply]
@Vorziblix Would you object to updating the values of these Lydian letters on Wiktionary? Or can we go ahead with the changes? Antiquistik (talk) 06:49, 15 April 2024 (UTC)[reply]
@Antiquistik: No objections here! — Vorziblix (talk · contribs) 13:32, 16 April 2024 (UTC)[reply]
OK, let's think which letters need to be changed. Modern works that use s for 𐤮, š for 𐤳, w for 𐤥, p for 𐤡 (I can again find both systems in use even in recent works, searching for e.g. pira vs bira "house"), and ñ for 𐤸, do they also update any of the other letters? (We appear to already be using w for 𐤥.) Then we can change everything that needs to be changed in the module at once. (Does anyone update 𐤴 to anything different to avoid confusion between τ and t?) - -sche (discuss) 16:11, 15 April 2024 (UTC)[reply]
@-sche The letters 𐤮, 𐤳, 𐤥, and 𐤡 are the only ones requiring updates. There is, for now, no proposal to change the value of 𐤴 or of other letters by linguists covering the Lydian language. Antiquistik (talk) 05:53, 16 April 2024 (UTC)[reply]
@Antiquistik Can we please change the value of 𐤸 to ñ? Per User:-sche, there is support in recent works for this. Use of a mixture of Greek letters and Latin letters is bad enough, but it's IMO intolerable when the Greek letters look like unrelated Latin letters. Benwing2 (talk) 06:26, 16 April 2024 (UTC)[reply]
@Benwing2 Yes, that would be good too. Antiquistik (talk) 06:31, 16 April 2024 (UTC)[reply]
Full disclosure, 1) Despite 𐤸 reportedly occurring in several words that seem like they should be common, including certain case forms of demonstratives and pronominal clitics, I actually haven't managed to find very many works that contain words spelled with 𐤸 (Wiktionary doesn't have any, either, AFAICT), in order to determine how they transliterate it (I tried searching for such words with ν, with ñ, with n, with v ... couldn't find many works mentioning the words at all, in any spelling I could think to search for). 2) In the few works I was able to find, there were somewhat more using Greek nu, but yes, ñ is also found, and has the advantages of much greater clarity, plus agreement with Lycian where a similar letter is transliterated ñ.
OK, I guess I will change the module in a few days if no one brings any other issues forward... - -sche (discuss) 23:43, 16 April 2024 (UTC)[reply]
@-sche The Digital Philological-Etymological Dictionary of the Minor Ancient Anatolian Corpus Languages covers Lydian in its dictionary and corpus, and it transliterates 𐤸 as ν rather than ñ.
Which is why I find it preferable to leave it to further discussion here whether to update its value on Wiktionary or keep the present value. Antiquistik (talk) 22:07, 18 April 2024 (UTC)[reply]
I still think it should be ñ; it's going to be too confusing to have it look like a v. Benwing2 (talk) 22:11, 18 April 2024 (UTC)[reply]
@Benwing2 I don't disagree. Antiquistik (talk) 05:36, 19 April 2024 (UTC)[reply]
 Done - -sche (discuss) 01:22, 19 April 2024 (UTC)[reply]
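As an illustration of the value changes agreed in this thread, here is a minimal Python sketch that converts a romanization written in the older system to the newer one. It is not the code of Module:Lydi-translit; the name `update_romanization` and the table are assumptions made for the example, and only the five letters discussed above are covered.

```python
import unicodedata

# Old value -> new value for the letters discussed in this thread.
# Escapes: \u015b = s with acute, \u0161 = s with caron,
#          \u03bd = Greek nu,     \u00f1 = n with tilde.
UPDATES = {
    "\u015b": "s",       # 𐤮: formerly s-acute, now s
    "s": "\u0161",       # 𐤳: formerly s, now s-caron
    "b": "p",            # 𐤡: formerly b, now p
    "v": "w",            # 𐤥: formerly v, now w
    "\u03bd": "\u00f1",  # 𐤸: formerly Greek nu, now n-tilde
}
_TABLE = str.maketrans(UPDATES)


def update_romanization(old: str) -> str:
    """Convert an old-system romanization to the new values in one pass."""
    # NFC first, so that a decomposed s + combining acute is seen as one letter.
    return unicodedata.normalize("NFC", old).translate(_TABLE)
```

A single-pass translation is used because the changes overlap: chaining two separate replacements (s-acute to s, then s to s-caron) would wrongly turn every old s-acute into s-caron.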

Orthographic borrowing[edit]

If I say I'm going to Москва next year, using Cyrillic in the middle of the sentence (as some people do), and especially if I approximate the Russian pronunciation rather than sub in a different pronunciation like "Moscow", is that an "orthographic borrowing" of the Russian word, or am I just (basically) quoting the Russian word (if not code-switching)?
My understanding has been that "orthographic borrowing" is a phenomenon that happens almost exclusively in Asian languages, where a language like Japanese borrows only the written shape of a Chinese character but not an approximation of the Chinese pronunciation: that's what makes the borrowing an only "orthographic" borrowing and not just a regular "borrowing".
But I see that e.g. い-adjective#English is currently given as an "orthographic borrowing", although it's kind of the opposite of how we define "orthographic borrowing" on T:obor and in Appendix:Glossary, and of how e.g. Korean 葉書 is an orthographic borrowing: like with Москва, い-adjective is borrowing/quoting both the pronunciation and the spelling, a straight-up borrowing. This isn't the first time I've seen someone use "orthographic borrowing" in a way that seems incorrect (compare bauxite). Am I right about the scope of {{obor}} here? (Is there anything we could do to make the scope of {{obor}} clearer / discourage misuse?) - -sche (discuss) 04:01, 31 March 2024 (UTC)[reply]

link to another relevant discussion: Wiktionary:Information desk/2022/July#Is_this_Orthographic_Borrowing? - -sche (discuss) 05:19, 1 April 2024 (UTC)[reply]
@-sche Thanks. If we systemically restrict orthographic borrowings to logographic languages, we can implement that in the code, but if it's just a suggestion and we allow things like English CCCP to be considered orthographic borrowings, I think the best we can do is display an "are you sure?" type of warning. Benwing2 (talk) 05:25, 1 April 2024 (UTC)[reply]
Pondering: regardless of whether we restrict this to logographic scripts or allow CCCP (Latin) - СССР (Cyrillic), is there ever a situation where it makes sense to say a term in one alphabetic script was borrowed from another term in the same alphabetic script? Or would it be appropriate to, at least, make it throw an error whenever the terms are in the same alphabetic script (or perhaps, since this seems to be one of the biggest sources of errors, just when both terms are Latn)? Or would that cause problems, are there valid cases? I notice that back in February, Actarus176 switched a bunch of French {{bor|fr|en|foo}} to {{obor|fr|en|foo}}, but as in your sioux example, I don't think dinosaure (for example) is an "orthographic borrowing" of dinosaur (it's just a borrowing). This would also remove p͛- and ꝓ- from being orthographic borrowings (which seems reasonable to me). - -sche (discuss) 20:33, 19 April 2024 (UTC)[reply]
But I have started to think it would be best to systematically restrict obor to logographic scripts, and not consider CCCP to be orthographic borrowing. I'm thinking about the spectrum of things CCCP is in, like cyka where the k is the wrong shape, POCCNR where some letters aren't the same direction as in Cyrillic, etc... English is only imitating the Cyrillic letters, not borrowing them. English only renders Cyrillic СССР as CCCP because it already has similar-looking letters, but in a case like cyka or POCCNR, English doesn't borrow short к or И and Я, and it's not borrowing с (es) or р (er) either, because English having c (cee) and p (pee) predates any contact with Cyrillic, very different to the situation with e.g. Japanese or Akkadian, which did actually borrow glyphs from Chinese and Sumerian. I... guess we should have a poll or something? (I doubt this needs a full, capital-WT:V-Vote.) - -sche (discuss) 20:43, 19 April 2024 (UTC)[reply]
@-sche This is fine with me. I can add tracking to {{obor}} to see where it's used (a) with non-logographic scripts, (b) where source and destination are both Latin. Benwing2 (talk) 21:10, 19 April 2024 (UTC)[reply]
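The two tracking conditions mentioned above could look roughly like the following Python sketch. It is purely hypothetical: {{obor}} is implemented in Lua, the tracking label names are invented here, and the set of logographic script codes is an incomplete assumption chosen only for illustration.

```python
# Assumption: a small, incomplete set of script codes treated as logographic.
LOGOGRAPHIC_SCRIPTS = {"Hani", "Jpan", "Xsux"}


def obor_tracking(source_script: str, dest_script: str) -> list:
    """Return hypothetical tracking labels for one use of {{obor}}."""
    labels = []
    # (a) either side is written in a non-logographic script
    if (source_script not in LOGOGRAPHIC_SCRIPTS
            or dest_script not in LOGOGRAPHIC_SCRIPTS):
        labels.append("obor/non-logographic")
    # (b) source and destination are both Latin script
    if source_script == "Latn" and dest_script == "Latn":
        labels.append("obor/both-Latn")
    return labels
```

Under these assumptions, a Japanese term borrowed from Chinese characters (`"Hani"` to `"Jpan"`) produces no labels, CCCP (`"Cyrl"` to `"Latn"`) is tracked under (a) only, and a French term using {{obor}} with an English source is tracked under both.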
I suppose you could argue that the standard Finnish pronunciation of sioux, which sounds like /sjouks/, is an orthographic borrowing because the pronunciation is entirely based on the spelling and not the source language's pronunciation; but this could as well just be called a rather striking example of a spelling pronunciation. It depends on how we define "orthographic borrowing" and whether it's restricted to writing systems based on logograms (in which case orthographic borrowings can't occur in the Latin alphabet but could occur for example in cuneiform). Benwing2 (talk) 05:16, 31 March 2024 (UTC)[reply]
But yes, the case of い-adjective#English is definitely not an orthographic borrowing. Benwing2 (talk) 05:17, 31 March 2024 (UTC)[reply]
@-sche how about this scheme for your example? Ioaxxere (talk) 05:39, 31 March 2024 (UTC)[reply]
@Ioaxxere I don't understand your table. In the case of @-sche's example, I would say that an example like I'm going to Москва next year is just code-switching. It is similar to people who say "I just got back from Nicaragua" and pronounce the word "Nicaragua" exactly as it would be pronounced in Spanish; essentially they are inserting a Spanish word in the middle of an English sentence (never mind that it's spelled the same in both languages). Benwing2 (talk) 05:47, 31 March 2024 (UTC)[reply]
It seems like you're agreeing with me. Code-switching is when you briefly switch into another language. So if you say "I'm going to Москва next year", that's an English sentence with a Russian word. Since the word is intended to be as Russian as possible, we can't possibly call it a "borrowing" of any kind.
Also, to address the question directly, い-adjective is essentially equivalent to something like γ-ray, which I would call an unadapted borrowing compounded with an English term. Ioaxxere (talk) 05:54, 31 March 2024 (UTC)[reply]
@Ioaxxere: We currently have only two entries in Category:English orthographic borrowings from Russian: CCCP and cyka. I assume these are correctly categorized and hence *MOCKBA would also be an orthographic borrowing? J3133 (talk) 05:53, 31 March 2024 (UTC)[reply]
@J3133: Yes, I agree with that categorization since those terms are spelled according to Russian conventions, or their nearest ASCII equivalents, but have been adapted into English (from a quick search I see people pluralizing "cyka" as "cykas"). Ioaxxere (talk) 06:01, 31 March 2024 (UTC)[reply]
Benwing found the words for something I was struggling to (thank you), "spelling pronunciations"—and that bauxite-type entries are better viewed as "spelling pronunciations". (And "spelling pronunciation" isn't "orthographic borrowing", or else where's "CAT:English orthographic inheritances from Middle English" for when the pronunciation but not spelling changed on inherited words like one?) That's a good explanation for why intra-Latin-script borrowings are better viewed as "spelling pronunciations". Can we move kaolin out of Category:English orthographic borrowings from French on that basis?
And I appreciate Ioaxxere's comparison of い-adjective-type entries to γ-ray, though I think that even moreso than "γ-ray", "い-adjective" is just 'quoting' the other language/script, a la code-switching but in such a way that it's OK to have an English entry (whereas we don't have Москва#English).
I am inclined to agree "CCCP" is an orthographic borrowing, but I'm not 100% sure, because while on one hand it looks identical to the Cyrillic script form, it is still changing from Cyrillic to Latin script, and well, where is the line past which approximating one script via similar characters in another is no longer "orthographic borrowing"? E.g. if it were attested, would POCCNR for Россия (Rossija) be "orthographic borrowing", although it changes the direction of some letters? What if someone ASCIIizes "ㄸ-initial" [words in Korean] as "cc-initial"? Surely there is some point beyond which an ersatz representation is definitely not orthographic borrowing(?), so does it make more sense for the line to be "when the script changes" (so Latin-script CCCP is not an orthographic borrowing), or "when the written form is not identical" (so Latin-script CCCP counts but not POCCNR)? I'm unsure. (Maybe "orthographic borrowing" should even be restricted to logographic scripts, as Benwing mentions.) - -sche (discuss) 00:06, 1 April 2024 (UTC)[reply]
@-sche There is in fact an underused template {{spelling pronunciation}}, which I have used on kaolin. Maybe this template should be augmented to allow for "spelling-pronunciation borrowings" or some such; currently it only takes a single param (the current lang), and categorizes into e.g. Category:English spelling pronunciations. Benwing2 (talk) 05:00, 1 April 2024 (UTC)[reply]
Spelling | Pronounced /ˈmɑskaʊ/ | Pronounced /mɐskˈva/ (or its closest English equivalent)
<Moscow> | Borrowing | Why would you do this?
<Moskva> | Why would you do this? | Unadapted borrowing
<Москва> | Orthographic borrowing | You're just speaking Russian

Well let’s see what I thought out two months ago: orthographic borrowings are a transscriptural concept. A method of writing a language must be transferred upon the manner in which another language is written. Not the case with い-adjective#English because the term embeds content in another language, referring to the content of another language, i.e. it does not loan anything at all from Japanese. You might say that code-switching even exists inside of lexical units and when one is polyglot, that is, has at least started a lexicon of Japanese in one’s mind, then the lexicon of each individual language is permeable to transclude, fetch as an external resource, lexical units from another language also documented within the mind.
I'm going to Москва next year has no orthographic borrowing because the category of orthographic borrowing is a category of the dictionary sphere, not of sentence analysis, in other words because I'm going to Москва next year is not parsed by us as a lexeme which we could enter somewhere as such, claiming it to contain etymological relations.
As for Ioaxxere’s table, per my previously found dogmatics, that which bro calls “orthographic borrowing” is a heterogram, and “you’re just speaking Russian” is even more an “unadapted borrowing”, for I cannot admit that the language or code-switching nature changes depending on the spelling, also the constellation is empirically rare in so far as relevant for possible dictionary entries. Fay Freak (talk) 06:39, 31 March 2024 (UTC)[reply]

Feedback on proposed label designs[edit]

In this example, German Term was possibly borrowed from English term, which was derived from a combination of Latin anc2, Latin anc4, and Latin anc6, which each have a further uncertain inherited ancestor.

Proto-Italic *anc1 → (?) → Latin anc2 → (der.) → English term → (bor.?) → German Term
Proto-Italic *anc3 → (?) → Latin anc4 → (der.) → English term

My main question is: is the text too hard to read on small screens? The font size is 12 pixels, equivalent to putting text in a <small> tag.

I'm very interested if anyone has feedback and suggestions for improving this design. Pinging @Victar, Vininn126, who highlighted the need for these kinds of labels, and @Lunabunn, who inspired the current design. Ioaxxere (talk) 04:51, 31 March 2024 (UTC)[reply]

Nice work, I like it! While I personally don't care much for the highlighted backgrounds, I can see their value in making the labels stand out more. As for the font size, I think we have found a good balance. Lunabunn (talk) 04:55, 31 March 2024 (UTC)[reply]
As a contrasting opinion, I really rate the backgrounds. They look really nice on dark theme. Kiril kovachev (talkcontribs) 02:24, 2 April 2024 (UTC)[reply]
Definitely an improvement. Would tooltips be possible? Vininn126 (talk) 07:25, 31 March 2024 (UTC)[reply]

@Lunabunn, Vininn126: I made some adjustments and added a link to the glossary for the benefit of mobile users. Ioaxxere (talk) 20:04, 31 March 2024 (UTC)[reply]

A proposal for how future big template changes should be done.[edit]

Big template changes happen too quickly for some people. While some people (mostly those that keep up with/make the latest Wikt programming news) know what the latest changes to templates are, others have no clue until the big red "deprecated" banner or the Lua error text shows up. In addition, sometimes the big changes cause unforeseen errors that didn't exist before.

As such, I propose a new system for big template changes, to help users get used to the templates and programmers have fewer bugs. It goes something like this:

0. Make a big change to a template (which includes actions such as deprecating any template or changing the logic of a highly-used template).
1. Let your changed version and the unchanged one exist side-by-side, but with the changed one being encouraged and receiving updates.
2. See if there are any bugs, and fix them.
3. After some time (I'd say around a month for big fixes, and maybe a little more for people to get used to the changes), remove the unchanged template.

What do you think? CitationsFreak (talk) 07:26, 31 March 2024 (UTC)[reply]

@CitationsFreak What sorts of changes prompted this? Also keep in mind that logic changes to templates often cannot easily be done in the way you suggest. Benwing2 (talk) 08:03, 31 March 2024 (UTC)[reply]
My big concern with this proposal is that it'll result in templates changing names periodically, which will annoy basically everyone. How would this work with a template like {{l}}, for instance?Theknightwho (talk) 18:03, 31 March 2024 (UTC)[reply]
@Benwing2 The threads "deprecate Template:1" and "Accelerated English plurals generate the wrong template".
@Theknightwho I was thinking of renaming the unchanged template something like {{[template-name]-old}}, so that the new template can still be used as much as possible, but we have a backup in case something goes wrong. CitationsFreak (talk) 18:40, 31 March 2024 (UTC)[reply]
@CitationsFreak Would every page need to be bot converted to the old version and then changed over, or are you envisioning that people could choose which one to use? The former seems impractical, and the latter feels like a recipe for confusion, since we'll end up with a mish-mash of versions and we'll need to ensure every other relevant module is able to support both (which could get very messy). Theknightwho (talk) 18:49, 31 March 2024 (UTC)[reply]
@Theknightwho The latter, but only for some time, and with a later bot job to replace the old template with the new. The other templates should only recognize the old template if they did before, and clearly mark where the old template is being recognized in-code so that it can be deleted when its time comes. CitationsFreak (talk) 19:08, 31 March 2024 (UTC)[reply]
Should be uncontroversial. DCDuring (talk) 15:52, 31 March 2024 (UTC)[reply]
A month is not much time. Commercial APIs often give six months or a year. (Backwards compatibility is even better. You can still use Stripe's 2014 APIs if your HTTP header requests it.) Equinox 15:23, 1 April 2024 (UTC)[reply]
Does it really take a year for people to get used to new templates on Wikt? CitationsFreak (talk) 17:48, 1 April 2024 (UTC)[reply]
Also, we're not a commercial operation, development is difficult enough as it is, and the barrier to getting involved in module development is already high due to the learning curve. Theknightwho (talk) 17:56, 1 April 2024 (UTC)[reply]
That's exactly right. I don't think any of this is needed; I make a lot of changes and few of them have caused any issue. Best to handle any issues on a case by case basis. We only have a few "developers" working on their own time; we can't afford to put more barriers up. Benwing2 (talk) 19:43, 1 April 2024 (UTC)[reply]
I feel that the issue of having few developers is easily solved by adding more developers, both from this wiki and from outside it. (Also, is there a guidebook for developers? Wouldn't hurt to have one.) CitationsFreak (talk) 23:11, 1 April 2024 (UTC)[reply]
I doubt that we need a year, but some kind of statement of, 1., what the overall plans are and, 2., what particular code changes, monitoring categories, filters, etc. are involved should not be too much to ask. There have been many technical changes over the years that have not generated as much annoyance as some recent ones. It is definitely more fun to not ever answer to anyone. Having pesky users objecting to changes will definitely slow down the pace of change. But, will it slow down the pace of desirable change? And who gets to say what change is desirable, perps or victims? DCDuring (talk) 22:37, 1 April 2024 (UTC)[reply]
@DCDuring It would be much easier to work with you if you stopped acting like such a drama-queen. I have tried to have these conversations with you many times, but we never seem to get anywhere because things are rarely good enough for you, and the objections often boil down to your personal needs at the expense of everything else. Calling yourself a "victim" is seriously unhelpful. Theknightwho (talk) 23:22, 1 April 2024 (UTC)[reply]
  1. User:Theknightwho I apologize for working in a realm that is different in kind from a language and therefore has needs that do not fit into the normal framework. DCDuring (talk) 15:58, 2 April 2024 (UTC)[reply]
    @DCDuring And I'm working from a framework that includes more people than just you. Thanks. Theknightwho (talk) 18:01, 3 April 2024 (UTC)[reply]
    I'm sorry that my needs are different. DCDuring (talk) 18:56, 3 April 2024 (UTC)[reply]
@Theknightwho, DCDuring: I see it as an attempt to train developers about user perceptions. For example, @JeffDoozan's parameter checking module Module:checkparams is in principle a good idea, but dealing with its warnings for invocations of declension templates is a pain for whoever deals with them and seeing them is distinctly off-putting for anyone making other changes to the pages. One problem is a widespread lack of documentation of the templates, and the other is that some positional parameters have become obsolete as better ways of doing things became available or more widely known. And this is despite Jeff making efforts to reduce or eliminate its impact (albeit sometimes with the aid of other editors) when its warnings are misguided. --RichardW57m (talk) 09:37, 2 April 2024 (UTC)[reply]
They fix the bugs already, I see no issue. I don’t even keep up with programming news, page creation goes smoothly, and I don’t think there is manpower or mental capacity on either developer or editor side to run multiple template and module versions concurrently. BTW I used Arch. Fay Freak (talk) 23:01, 1 April 2024 (UTC)[reply]
I don't see an immediate problem, but by #3's "remove", I suggest redirecting, as someone will possibly want to use the new name or old name in the future without having seen the parallel testing period. No particular response to the rest of the proposal. —Justin (koavf)TCM 23:18, 1 April 2024 (UTC)[reply]
You mean the old and new templates, right? I can't see a situation where a person would want to use the old name of a template on purpose after everyone's gotten used to the change. An old version, sure, but not an old name (as in {{1}}). CitationsFreak (talk) 23:23, 1 April 2024 (UTC)[reply]
Correct: the names. And the reason someone would want to use the old name is "I haven't edited Wiktionary in three years, but I remember that this is how you do x". —Justin (koavf)TCM 23:36, 1 April 2024 (UTC)[reply]
This feels like us having to keep templates around forever, even if it is a redirect. I suppose there is nothing wrong with it. However, I do feel like the template name must be removed at some point. I'll mull over it. CitationsFreak (talk) 23:48, 1 April 2024 (UTC)[reply]
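
The "later bot job" mentioned above, which would replace the old template name with the new one once the transition period ends, could look roughly like the following. This is only a minimal sketch under stated assumptions: the function name rename_template and the template names "foo-old" and "foo" are hypothetical, and it treats template names as case-sensitive and ignores nowiki tags and comments, which a real bot would need to handle.

```python
import re


def rename_template(wikitext: str, old_name: str, new_name: str) -> str:
    """Rewrite {{old_name|...}} and {{old_name}} invocations to use new_name.

    Hypothetical helper for illustration; not an existing Wiktionary bot.
    """
    # Match "{{", optional whitespace, the exact old name, optional whitespace,
    # and require that a "|" or "}}" follows, so that a longer name such as
    # "foo-older" is left untouched.
    pattern = re.compile(
        r"\{\{\s*" + re.escape(old_name) + r"\s*(?=\||\}\})"
    )
    # A function is used as the replacement so that new_name is inserted
    # literally, with no backslash or group-reference processing.
    return pattern.sub(lambda match: "{{" + new_name, wikitext)


if __name__ == "__main__":
    page = "{{foo-old|en|word}} and {{foo-old}} but not {{foo-older|x}}"
    print(rename_template(page, "foo-old", "foo"))
```

Leaving a redirect at the old name, as suggested above, would remain a separate step from this text substitution.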

Derived terms[edit]

According to Appendix:Glossary:

derived terms
A post-POS heading listing terms in the same language that are morphological derivatives. [Italics mine]

According to Template:derived:

This template is used to format the etymology of terms derived from another language. [Italics mine]

This seems confusing. I think one of them should be renamed. Ioaxxere (talk) 19:40, 31 March 2024 (UTC)[reply]

@Ioaxxere
I see what you mean. It happens in etymologies that terms are derived from the same language, but we don't, maybe illogically, use the "derived" template in those cases, even though we do use a relationship of the same name when referring to the given word on the page of the word it's derived from.
Despite that I still don't like a rename. I didn't notice any problem till you mentioned it, so IMO it's not a big deal. Kiril kovachev (talkcontribs) 02:18, 2 April 2024 (UTC)[reply]
@Ioaxxere: I think the difficulty is that there aren't many (or any?) alternative words that can be substituted in place of derived. Thus, the same word has been used in different contexts. Do you have a suggestion as to an alternative word for one of the contexts? — Sgconlaw (talk) 17:48, 15 April 2024 (UTC)[reply]
@Sgconlaw: "Derived" in the second sense could be changed to "descended". Ioaxxere (talk) 17:54, 15 April 2024 (UTC)[reply]
@Ioaxxere: wouldn’t this clash with {{desc}}, although we would be using the word descended/descendant in the same sense? — Sgconlaw (talk) 18:46, 15 April 2024 (UTC)[reply]

Idea: Categorizing illustrated terms[edit]

I love illustrated pages on Wiktionary. They can give a visual grip to otherwise bland white pages. And that visual grip can, moreover, convey the culture of the speakers in a way transcending characters and phonetics.

Moreover, a fair bunch of language learners are also visual learners, and illustrations would only help. On top of that, Wiktionary can not only become the most comprehensive online dictionary, but also the most delightfully illustrated one. With time, this will draw more readers, and from there, more contributors, which leads to higher quality pages, which leads to more readers – you get the drill.

For this reason, I have a proposal: Get a bot to parse the lemmas of a given language, and drop them into categories such as "Fooian terms with illustrations". We already do this with "Fooian terms with quotations". Anyone on board?

P.S. This is unrelated to April Fools' Day :) Shoshin000 (talk) 20:38, 31 March 2024 (UTC)[reply]

@Shoshin000 You know what, this sounds good, and it would be even better if we systematically used a template for adding pictures onto entries that could do the categorization for us. Then there'd be no need for periodic bot tasks. The problem right now is that we just use the bare wiki syntax for images (or at least I do, I dunno), so I don't think it's able to be tracked ATM. We could consider changing over to a template for that, even if it's just a thin wrapper over the image syntax? Kiril kovachev (talkcontribs) 02:20, 2 April 2024 (UTC)[reply]
Yeah, and the bot would switch Wiki syntax to the template. Sounds more simple. Shoshin000 (talk) 11:09, 3 April 2024 (UTC)[reply]
How does having the category or the new template help achieve the goal of having more entries with good visuals?
We already have {{rfi|[language]}} which shows that someone thinks that the tagged L2 would benefit from an image. That template populates the language subcategories of Category:Requests for images by language. It is principally used for English (1961) and Translingual (taxonomic) (2249) entries.
As for pages with images there are 17,735 English noun lemma pages with "File:" and 5,410 with "Image:" and 906 English proper noun lemmas with "Image:" and 3,447 with "File:". There are 9,058 Translingual taxonomic pages with "File:" and 386 with "Image:". There would be some duplication in the total count of pages, but also many instances of multiple images in L2s and between L2s. There are also some English verb pages with images. Also many of the pages for letters and symbols have images.
Among the problems we have with images is the lack of informational rather than esthetic value. A picture of a tree from a distance may convey information about shape (for a "specimen" tree not growing in a forest). Plants that bear colorful flowers for a week in Spring are often illustrated only by a picture of the flower. More value is contained in an image that illustrates the probable reason for the name, e.g., red-bellied piranha. DCDuring (talk) 17:35, 3 April 2024 (UTC)[reply]
@DCDuring I don't think it would do that, I think it would just make it easier to discover and search pages with images on them. Question: how are you coming up with all these figures? I can't figure it out. Which is why we would benefit from such a category.
I also disagree about the purpose of images. The entry already defines the word, and the etymology would also explain anything about the origin of the word. The most useful thing an image can do is show you the most distinctive form of whatever's being defined. For example, I'll look up 木漏れ日 and immediately see what it's meant to be referring to, perhaps even without needing to see the definition; for words with slightly abstract or culturally-specific definitions, a picture really speaks a thousand words, so I think we should have as many pictures as possible for words where they make sense — they just make it much easier to understand some definitions. Not just for etymological reasons. Kiril kovachev (talkcontribs) 18:17, 7 April 2024 (UTC)[reply]
It is amazing what can be accomplished by using CirrusSearch, especially "insource:" with regular expressions.
Many of our English verbal definitions are laughably incomplete, not to mention the often-ambiguous glosses that pass for definitions in non-English L2s. That justifies more, rather than fewer, illustrations. In the case of vernacular names of organisms, what passes for "red" in definitions of "red-breasted this" or "red that" can vary dramatically. DCDuring (talk) 20:37, 7 April 2024 (UTC)[reply]
I don't know if this should necessarily be subdivided by language — consider a page like Ukraine, where the images are in the English section, but equally relevant to the other language sections, but it'd look bad to repeat them in every language section — but in my view, at least, it'd be fine if someone wants to periodically bot-categorize pages with images into Category:Pages with images or something (compare Category:Pages with broken file links, for cases where an image or audio is linked but doesn't work). - -sche (discuss) 17:11, 7 April 2024 (UTC)[reply]
It's possible to do it automatically, though any images added by templates wouldn't be included. Plus, we'd want to exclude any images which are part of a link template, since they're sometimes used for scripts which haven't been encoded yet. It's probably one of those jobs that sounds simple, but ends up being quite complicated. Theknightwho (talk) 20:52, 7 April 2024 (UTC)[reply]
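
As a rough illustration of the bot pass discussed in this thread, the sketch below scans a page's wikitext for bare image syntax and reports which language sections contain it. It is a minimal sketch under stated assumptions: the function name languages_with_images is hypothetical, language sections are assumed to be level-2 headings, and only bare [[File:...]] or [[Image:...]] links are detected. It does not address the caveats raised above: images added by templates are missed, and images embedded in link templates are not excluded.

```python
import re
from typing import List

# Assumption: a language section starts at a level-2 heading like "==English==".
# The [^=] after the opening "==" keeps level-3 headings ("===Noun===") out.
L2_HEADING = re.compile(r"^==\s*([^=].*?)\s*==\s*$", re.MULTILINE)

# Bare image syntax only: "[[File:" or "[[Image:", case-insensitive.
IMAGE_LINK = re.compile(r"\[\[\s*(?:File|Image)\s*:", re.IGNORECASE)


def languages_with_images(wikitext: str) -> List[str]:
    """Return the level-2 section names whose text contains a bare image link."""
    headings = list(L2_HEADING.finditer(wikitext))
    found = []
    for index, heading in enumerate(headings):
        # A section runs from the end of its heading to the next L2 heading,
        # or to the end of the page for the last section.
        if index + 1 < len(headings):
            section_end = headings[index + 1].start()
        else:
            section_end = len(wikitext)
        section = wikitext[heading.end():section_end]
        if IMAGE_LINK.search(section):
            found.append(heading.group(1))
    return found


if __name__ == "__main__":
    sample = (
        "==English==\n===Noun===\n[[File:Tree.jpg|thumb|A tree]]\n# a plant\n"
        "==French==\n===Noun===\n# arbre\n"
    )
    print(languages_with_images(sample))
```

A bot could use the returned section names to add a per-language category, or ignore them and add a single page-level category if, as suggested above, dividing by language turns out not to be useful.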

French Wiktionary[edit]

Gosh, what is happening over there? I visited it only to see this banner at the top:


L’accès à la base de données est désactivé pour une durée indéterminée

Plainte

Je vous écris cette lettre au nom et pour le compte de Monsieur (omis), afin de vous informer que j’ai déposé le 25 mars dernier une plainte officielle auprès de la Préfecture de Police de Versailles contre votre site Internet, car il est coupable d’avoir publié des nouvelles et des faits à caractère injurieux, susceptibles de porter atteinte à l’honneur et à la réputation de mon client.

Comme l’a rappelé la jurisprudence (par exemple, Cour de cassation 03/3956) « … l’utilisation d’un site Internet pour la diffusion d'images ou d'écrits susceptibles d’offenser une personne est une action susceptible de porter atteinte au patrimoine juridique de l’honneur, et constitue donc le délit de diffamation aggravée… ».

Nous regrettons d’avoir dû recourir à l’autorité judiciaire, mais le ton et le contenu de l’article en question ne laissaient pas la moindre place à la négociation.

Je vous prie d’agréer, Madame, Monsieur, l’expression de mes salutations distinguées.

Avocat (omis)

Ordonnance

En vertu de la plainte n° 2154021/24 déposée à l’Hôtel de Police de Versailles le 25/03/2024, il est notifié ce qui suit :

ayant été informé d’une infraction contre fr.wiktionary.org, étant directeur général de l’association Wiktionnaire,

la fermeture préventive du site fr.wiktionary.org est ordonnée car il fait l’objet d'une enquête pour l’infraction visée à l’article 595 du code pénal, avec la circonstance aggravante visée au paragraphe 3 du même article.

La fermeture doit être achevée au plus tard le lundi 1er avril 2024 à 15 heures.

Vous êtes également informé que, conformément aux dispositions du code de procédure pénale, vous avez été désigné comme avocat commis d’office par M. (omis).

Ceci est sans préjudice de votre droit de désigner un conseil de votre choix, en notifiant cette procuration par l’envoi, également par fax à (omis), d’une désignation formelle d’un conseil.

Versailles 31/03/24

Procureur (omis)

Avis aux utilisateurs et utilisatrices du site Wiktionnaire

Nous avons considéré qu’il était préférable d’occulter les noms des personnes et des utilisateurs directement concernés.

Il est impossible de trouver les mots justes pour exprimer le sentiment de perte profonde que cette décision nous laisse. Une seule chose est sûre : le Wiktionnaire ne s’arrête pas là.

L’esprit qui animait le projet jusqu’à hier est profondément blessé, mais la communauté saura trouver de nouveaux espaces pour repartir.

Restez en contact, le Wiktionnaire francophone.


Sgconlaw (talk) 22:41, 31 March 2024 (UTC)[reply]

For those of us who parles un petite rancais (like myself), DuckDuckGo says this means:
Access to the database is disabled for an indefinite period of time
Complaint
I am writing this letter to you in the name and on behalf of Monsieur (omitted), to inform you that on March 25 I filed an official complaint with the Prefecture of Police of Versailles against your website, because it is guilty of having published news and facts of an offensive nature, likely to damage the honor and reputation of my client.
As recalled in case law (e.g. Court of Cassation 03/3956) "... the use of a website for the dissemination of images or writings likely to offend a person is an action likely to harm the legal patrimony of honour, and therefore constitutes the offence of aggravated defamation...".
We regret that we had to resort to the judicial authority, but the tone and content of the article in question did not leave the slightest room for negotiation.
Please accept, Madam, Sir, the expression of my distinguished greetings.
Counsel (omitted)
Order
Pursuant to complaint no. 2154021/24 filed at the Versailles Police Station on 25/03/2024, the following is notified:
having been informed of an offence against fr.wiktionary.org, being the general manager of the Wiktionary association,
The preventive closure of the fr.wiktionary.org site is ordered because it is under investigation for the offence referred to in Article 595 of the Criminal Code, with the aggravating circumstance referred to in paragraph 3 of the same Article.
The closure must be completed no later than Monday, April 1, 2024 at 3 p.m.
You are also informed that, in accordance with the provisions of the Code of Criminal Procedure, you have been appointed as a court-appointed lawyer by M. (omitted).
This is without prejudice to your right to appoint counsel of your choice, by notifying such power of attorney by sending, also by fax to (omitted), a formal appointment of counsel.
Versailles 31/03/24
Prosecutor (omitted)
Notice to users of the Wiktionary site
We felt it was best to redact the names of the people and users directly involved.
It is impossible to find the right words to express the sense of deep loss that this decision leaves us with. Only one thing is certain: Wiktionary doesn't stop there.
The spirit that animated the project until yesterday is deeply wounded, but the community will be able to find new spaces to start again.
Stay in touch, the French-speaking Wiktionary.
Courtesy link: fr:. —Justin (koavf)TCM 23:44, 31 March 2024 (UTC)[reply]
fr:MediaWiki:Sitenotice. —Justin (koavf)TCM 23:46, 31 March 2024 (UTC)[reply]
@Àncilu: —Justin (koavf)TCM 23:51, 31 March 2024 (UTC)[reply]
Wow. I knew that British libel laws are seriously f***ed (i.e. extremely biased in favor of rich plaintiffs) but I didn't realize things are even worse in France. Benwing2 (talk) 23:54, 31 March 2024 (UTC)[reply]
🐟 This is April 1st 🐟 Àncilu (talk) 23:55, 31 March 2024 (UTC)[reply]
@Benwing2 @Koavf @Sgconlaw Àncilu (talk) 23:56, 31 March 2024 (UTC)[reply]
Sacre bleu! Zoot alors! —Justin (koavf)TCM 00:02, 1 April 2024 (UTC)[reply]
fr:Wiktionnaire:P/24. —Justin (koavf)TCM 00:03, 1 April 2024 (UTC)[reply]
@Àncilu: OH! Ha ha ha! Good one! I was mystified as to how content at the Wiktionnaire (as opposed to Wikipedia) could end up defaming someone… — Sgconlaw (talk) 01:39, 1 April 2024 (UTC)[reply]
I don’t see that anything could have happened either, even if it was not an April Fools' joke. @Àncilu added this to fr:MediaWiki:Sitenotice, so suppose he got an e-mail about a supposed proceeding. Whatever that proceeding is in France, a court proceeding or administrative proceeding, it has to be delivered to an actual person representative of French Wiktionary (international delivery, given Àncilu is in Italy (?), is also quite a feat), which doesn’t even have legal capacity, so there can be no pending case, the lawyer would be incompetent. Fay Freak (talk) 00:02, 1 April 2024 (UTC)[reply]
@Fay Freak I live in France. Àncilu (talk) 00:07, 1 April 2024 (UTC)[reply]
@Àncilu: Well, have you heard how they invented separation of powers in France? I doubt that they would bring a defamation case before the Préfecture de Police de Versailles 😵‍💫, because that would be a civil matter. Can we admit this as creativity? Fay Freak (talk) 00:37, 1 April 2024 (UTC)[reply]
@Fay Freak: I wrote it this way to make the April Fools' joke less realistic and avoid external problems, because people outside Wiktionary might misunderstand if it were mentioned even indirectly. Àncilu (talk) 12:07, 1 April 2024 (UTC)[reply]