Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!

March 2024

A way to more easily connect with readers

I have seen this idea thrown around some and I have had it myself - what if we had some official social media accounts where we can respond to readers, give polls, etc., that admins have access to? In theory readers can interact with us here i.e. at the Information Desk etc., but I think that process can be a little obtuse for the average person, and for some even intimidating. I also know that various Wikimedia projects have their own accounts on various platforms. Also, since it's an open project, the more input the better, theoretically. Vininn126 (talk) 08:45, 1 March 2024 (UTC)[reply]

I see you beat me to it. I have personally had a need for this on numerous occasions, finding Reddit and Twitter posts that (often in the form of a joke/curiosity) showed serious errors in our dictionary. A recent example is diff. A way to interact with the people that bring such mistakes to our attention is in my opinion very important.

I think the easiest way to maintain this is to simply create a couple of accounts and share their login information in the Admin channel on Discord, which automatically gives all admins that have joined the Discord server the means to manage these accounts. Then afterward we can post various polls and/or announcements after a consensus with the community, while also having the ability to quickly respond to feedback. Thadh (talk) 09:26, 1 March 2024 (UTC)[reply]

@Thadh I agree, but we should probably message other admins and send it to their emails potentially, so as not to have a barrier to get access. Vininn126 (talk) 09:29, 1 March 2024 (UTC)[reply]

I would prefer we do that based on requests. Many admins are not very active and I don't want the mail to go to some years-long unchecked inbox. Thadh (talk) 09:42, 1 March 2024 (UTC)[reply]

Some platforms we should consider: Reddit, Twitter (X), Facebook. Any other suggestions? Vininn126 (talk) 09:58, 1 March 2024 (UTC)[reply]

Only fans? Allahverdi Verdizade (talk) 10:19, 1 March 2024 (UTC)[reply]

You wish. I ain't doing a body reveal that easily. Vininn126 (talk) 10:23, 1 March 2024 (UTC)[reply]

I hate the idea that we are, in effect, endorsing/legitimizing and making more attractive these intrusive systems, but we are effectively forced into it by user preference for them. DCDuring (talk) 13:54, 1 March 2024 (UTC)[reply]

@DCDuring One thing we could do is also promote how to engage with the site more directly. By creating accounts on these sites with wider reach, we can bridge the gap for readers who are scared to edit and also show them how to start discussions, etc., potentially increasing editorship. Vininn126 (talk) 13:57, 1 March 2024 (UTC)[reply]

@DCDuring Alternatively, we could see it as engaging with the reality that users will not always come to the site directly in order to raise issues. Ignoring that isn't going to help anyone. Theknightwho (talk) 14:13, 1 March 2024 (UTC)[reply]

It's certainly not a bad idea. But, in the interest of protecting myself from "personalization", I waste my time at MW projects, not the commercial sites. DCDuring (talk) 14:50, 1 March 2024 (UTC)[reply]

@DCDuring What do you mean by personalization? Kiril kovachev (talk・contribs) 18:24, 2 March 2024 (UTC)[reply]

Generating content tailored to me, certainly including advertising, possibly already including or soon to include price discrimination. DCDuring (talk) 20:36, 2 March 2024 (UTC)[reply]

Mastodon. CitationsFreak (talk) 16:33, 1 March 2024 (UTC)[reply]

I second this. Allahverdi Verdizade (talk) 20:53, 2 March 2024 (UTC)[reply]

VK may be a good idea to attract any potential editors and/or readers from Russia. Thadh (talk) 20:45, 10 March 2024 (UTC)[reply]

I support this idea, but would it only be admins? I might like to have access as well. Ioaxxere (talk) 20:43, 1 March 2024 (UTC)[reply]

@Ioaxxere I think without a rigorous way to add users, it might devolve to a free-for-all. Also at the beginning, I think only truly trusted users should be given access. Perhaps there'd be a process for adding trusted users in the future. Vininn126 (talk) 20:46, 1 March 2024 (UTC)[reply]

We had an unofficial Twitter account created by WF. Only admins can see it, but there's some information at the deleted page for User:Wikt Twitterer (I believe they created it when they were using that account, but kept the Twitter account going after the Wiktionary account was blocked). Chuck Entz (talk) 23:56, 1 March 2024 (UTC)[reply]

Well considering no one has objected I think we can probably move forward. The question is what email to use when signing up for these accounts and what username to use. Vininn126 (talk) 10:49, 8 March 2024 (UTC)[reply]

We should probably create one to go with these accounts. Best not to advertise its address anywhere on-wiki though. Thadh (talk) 17:18, 8 March 2024 (UTC)[reply]

Use a Wikimedia address. Don't go third-party like Gmail etc. Equinox ◑ 17:25, 8 March 2024 (UTC)[reply]

That's a good point. Vininn126 (talk) 17:26, 8 March 2024 (UTC)[reply]

@Equinox How might we do that? Vininn126 (talk) 17:30, 8 March 2024 (UTC)[reply]

I assume we tell Wikimedia that we want to make a social media account. CitationsFreak (talk) 20:22, 8 March 2024 (UTC)[reply]

@Vininn126: Search me. When I said "we should stick with open tech like IRC" people laughed at me, and Discord won. But some day it will die, or become "Utility cloud computing" with a bill attached. It's smarter to keep free and open. You do it if you can. I'm not there anyway, being a transphobic nazi etc. Equinox ◑ 07:38, 29 March 2024 (UTC)[reply]

Bengali language

I intended to create a phonelist for Bengali. Is there anyone who can guide me through bot stuff? Arundhatisgupta (talk) 18:24, 1 March 2024 (UTC)[reply]

@Arundhatisgupta What do you mean by "phonelist"? What sort of bot work are you trying to do? (Keep in mind if you plan to do page edits using a bot, you need to get permission to do so.) Benwing2 (talk) 23:45, 5 March 2024 (UTC)[reply]

Restricting `{{m}}` in etymology sections

Wiktionary's etymology sections are not very machine-readable, and the main issue is the {{m}} template, which can be used in a wide variety of ways:

Origin within a language: A {{glossary|respelling}} of {{m|en|puisne}} (in puny)
Listing alternative forms of an etymon: From {{inh|en|enm|hed}}, {{m|enm|heed}}, {{m|enm|heved}}, {{m|enm|heaved}} (in head)
Listing related terms: More at {{m|en|Tyr}}, {{m|en|day}}. (in Tuesday)
Listing unrelated terms: Not related to {{m|en|Romanian}} or {{m|en|Roman}}. (in Rom)

I propose that {{m}} be used only for unrelated terms and that we create new templates for the other three cases. Ioaxxere (talk) 20:41, 1 March 2024 (UTC)[reply]
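To illustrate the machine-readability concern, here is a minimal Python sketch that pulls `{{m}}` calls out of etymology wikitext. It is an illustration only: the regular expression handles simple, non-nested templates, and the function names `extract_templates` and `mentions` are hypothetical, not part of any Wiktionary tooling. The two sample etymologies are taken from the examples listed in the proposal.

```python
import re

# Matches non-nested templates such as {{m|en|puisne}}; nested templates are
# out of scope for this sketch.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def extract_templates(wikitext):
    """Return (name, [args]) for each simple template call in wikitext."""
    results = []
    for match in TEMPLATE_RE.finditer(wikitext):
        name = match.group(1).strip()
        args = [arg.strip() for arg in match.group(2).split("|")]
        results.append((name, args))
    return results


def mentions(wikitext):
    """Return the (language code, term) pairs of every {{m}} call."""
    return [
        (args[0], args[1])
        for name, args in extract_templates(wikitext)
        if name == "m" and len(args) >= 2
    ]


examples = {
    "puny": "A {{glossary|respelling}} of {{m|en|puisne}}",
    "Rom": "Not related to {{m|en|Romanian}} or {{m|en|Roman}}.",
}
for entry, etymology in examples.items():
    print(entry, mentions(etymology))
```

Both entries yield plain (language, term) pairs, so a script working from the template calls alone cannot tell that "puisne" is the origin of "puny" while "Romanian" and "Roman" are explicitly unrelated to "Rom". That distinction lives only in the surrounding prose.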

In the case of "More at...", that should be {{l}} anyway, since it refers to the entry and not the term. Theknightwho (talk) 21:29, 1 March 2024 (UTC)[reply]

Just in terms of the formatting produced, I dislike the use of {{l}} when used inline in other running text: {{m}} produces italicized text, which is visually distinct from the rest of the text.

For that matter, I understood the "l" in {{l}} to stand for "list", as this template was originally intended to only be used in lists of terms, where formatting to distinguish from other running text isn't needed. ‑‑ Eiríkr Útlendi │^{Tala við mig} 05:09, 10 March 2024 (UTC)[reply]

That is an unrealistically lofty ideal. {{m}} has many other uses and you simply cannot sequester them all into separate templates. — SURJECTION ^{/ T / C / L /} 21:36, 1 March 2024 (UTC)[reply]

I wouldn't mind that it no longer be used to italicize (often mis-italicize) taxonomic names. But there are no restrictions on its use at present, it has mostly been used for formatting, and there is no incentive for users to limit use. It seems particularly hard to imagine that we could get users to comply with different rules based on the L3/4/5/6 header they were editing in. Our filters are already getting intrusive and unhelpful. DCDuring (talk) 22:50, 1 March 2024 (UTC)[reply]

This doesn't seem like a good idea... Thadh (talk) 22:53, 1 March 2024 (UTC)[reply]

Being able to mention other terms is very, very useful. Vininn126 (talk) 22:56, 1 March 2024 (UTC)[reply]

Yeah, I don't think we need a proliferation of different templates. It will just make coding much harder. As it is, it's already difficult to learn how to use templates like {{en-verb}}, {{inflection of}}, and {{Module:quote|call_quote_template}}. — Sgconlaw (talk) 22:59, 1 March 2024 (UTC)[reply]

Oppose. I spend a lot of time fixing etymologies where someone copied the entire etymology from an entry in another language without changing the language codes. If people routinely get that wrong, they're not going to have a clue about the subtleties and intricacies proposed. You'll end up with people copying from one entry where they make sense to another where they're all wrong- or worse, partly wrong. Unlike with language codes, there's no reliable way to tell if they're being misused without knowing something about the etymology (if there were, you wouldn't need them in the first place). Basically, this proposal would give editors more ways to be wrong. Chuck Entz (talk) 23:35, 1 March 2024 (UTC)[reply]

Also, I don't know if we would want to do this in the name of machine-readability. Wiktionary is not really a database, so if we wanted it to be machine-readable, it would have had to have been one to begin with. Maybe Wikidata could handle this kind of thing instead. Kiril kovachev (talk・contribs) 00:50, 2 March 2024 (UTC)[reply]

@Kiril kovachev I think it's a balance - we need to be machine-readable to some extent, since some users rely on that to collate info from a wide array of entries (and it also helps our bots), but I'd agree that the templates suggested here would be a step too far, as I don't really see what advantage they'd provide. Theknightwho (talk) 18:16, 2 March 2024 (UTC)[reply]

That's true, I agree with your point here. It's nice to have a clearly-defined {{inh|en|grc|...}} kind of thing, but there're also ways in which our etymology section can be virtually free-form and forcing it to be more machine-readable would kill that flexibility. Such as the ways we may be using {{m}} right now. Kiril kovachev (talk・contribs) 18:23, 2 March 2024 (UTC)[reply]

It doesn't seem very useful to me. Are there plans to have machines doing something with our etymology sections anytime soon? At some point far enough in the future, improvements in machine comprehension of natural language might make it easier for machines to understand what humans write, rather than forcing humans to adjust how they write so machines can understand it. I think there are a lot of aspects of writing etymologies that are difficult to boil down to a fixed set of templates, so I'm not enthusiastic about us engaging in that project unless there's some real benefit we can point to. Simple etymologies already use templates, so this proposal seems to deal with a tail of complicated etymologies (do you know what percentage contain {{m}}?).--Urszag (talk) 01:06, 2 March 2024 (UTC)[reply]

This seems to be a good attitude/strategy for such matters in general. DCDuring (talk) 17:52, 2 March 2024 (UTC)[reply]

Oppose. Makes things harder, would make me inclined to not do etymologies if it's such a pain, even if I know the word origin. If we must restrict codes in this way, introduce a nice GUI that creates the code from our menu choices or something. Equinox ◑ 18:27, 2 March 2024 (UTC)[reply]

Honestly, an extension of the New Entry Creator that can easily add etymologies and quotes would be nice... CitationsFreak (talk) 06:02, 3 March 2024 (UTC)[reply]

The lack of a template for this purpose has irked me in the past. We have nice templates for when a word comes from another language, but not for when it comes from another word in the same language (unless by some well-known process such as affixation).

Recently I wanted to generate a list of all English terms which are said to derive from another English term, but where that term's entry doesn't include the term derived from it in a "Derived terms" section. Such a list would help fill gaps in our "Derived terms" sections, but it's all but impossible to generate a comprehensive list like this with the current setup.

I would definitely support an effort to designate a template specifically for same-language derivations. It seems it would be possible to use {{af}} for this purpose with minor modifications to its code, and probably a new name too:

A {{glossary|respelling}} of {{af|en|puisne}}. Not related to {{m|en|some other term}}.

The other uses of {{m}} can be dealt with in other ways. This, that and the other (talk) 03:21, 4 March 2024 (UTC)[reply]
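As a rough sketch of the list described above, the Python below finds "Derived terms" gaps once same-language derivations can be identified mechanically, for example through a dedicated template. The function name `missing_derived_terms` and both input mappings are hypothetical. Extracting those mappings from real entries is the hard part the post describes, and it is not attempted here.

```python
def missing_derived_terms(etymology_sources, derived_terms):
    """Map each source term to derived terms its entry does not yet list.

    etymology_sources: {term: source term named in its etymology}
    derived_terms: {term: set of terms listed under "Derived terms"}
    """
    gaps = {}
    for term, source in etymology_sources.items():
        if term not in derived_terms.get(source, set()):
            gaps.setdefault(source, []).append(term)
    return {source: sorted(terms) for source, terms in gaps.items()}


# Toy data: "puny" from "puisne" comes from the thread; the rest is invented
# purely for illustration.
etymology_sources = {"puny": "puisne", "teacher": "teach", "teaching": "teach"}
derived_terms = {"puisne": set(), "teach": {"teacher"}}
print(missing_derived_terms(etymology_sources, derived_terms))
```

With the toy data, "puisne" is reported as missing "puny" and "teach" as missing "teaching", while "teacher" is skipped because it is already listed.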

@This, that and the other what do you think it should be called? For now, we could create a template which redirects to {{affix}}. In the future, I think {{af}} should be adapted into a generic "internal derivation" template. I think @Benwing2 is on the same page on this. Ioaxxere (talk) 17:30, 4 March 2024 (UTC)[reply]

I realized that {{from}} didn't exist—I think that's a good name. Another idea is to be able to adapt {{der}} to allow a faster way of writing {{der|en|en|term|nocat=1}}. Ioaxxere (talk) 18:58, 4 March 2024 (UTC)[reply]

Yes, I recall seeing a discussion about broadening the use of, and renaming, {{af}} in the past (not sure where or with whom).

{{from}} is an excellent name - good find. This, that and the other (talk) 00:08, 5 March 2024 (UTC)[reply]

Sounds like a skill issue on the part of the machines. Nicodene (talk) 11:13, 7 March 2024 (UTC)[reply]

deprecate Template:1

I have renamed this to {{cap}} and deprecated it per the discussion in WT:RFM, but User:Equinox reverted the deprecation claiming it will save them keystrokes. I would like to see what people think about keeping this deprecated. I don't see how two keystrokes makes much of a difference, and {{1}} is just about the worst alias imaginable. If keystroke savings is really a big deal, we could use something like {{ca}} or {{cp}}, both of which are currently undefined. Benwing2 (talk) 02:02, 3 March 2024 (UTC)[reply]

Looking at the RFD, I see that {{M}} was suggested as a new name for this template by User:This, that and the other. We should just switch the template's name to that. Saves the same amount of keystrokes as {{1}}, better alias. (Plus, there was no real consensus to deprecate it in the first place.) CitationsFreak (talk) 06:24, 3 March 2024 (UTC)[reply]

Equinox's argument is weak - "cap" falls under the fingers nicely (on a QWERTY keyboard at least) and is just as typeable as "1". As for alternative names, {{M}} (for "majuscule") is just okay, because of the existence of lowercase {{m}}. The other obvious single-letter shortcuts ({{C}} for "capital" and {{U}} for "uppercase") are already taken. Another alternative would be {{^}}, implying "raising" the first letter to uppercase. This, that and the other (talk) 07:04, 3 March 2024 (UTC)[reply]

@This, that and the other @CitationsFreak {{U}} is hardly used so we could easily repurpose it. Also how about {{uc}}? Benwing2 (talk) 08:07, 3 March 2024 (UTC)[reply]

Please don't deprecate, rename, etc. An uppercase template name, like {{M}}, is a bit worse than a lowercase one, like {{1}}. I, for one, appreciate any keystroke savings for my arthritic joints. DCDuring (talk) 13:29, 4 March 2024 (UTC)[reply]

I have known this template for only a few weeks, since editors always preferred bare links. The issue here is that it looks homographic to the non-italic linking template {{l}}, whereas we don’t want confusables. For this purpose anything cV seems to be bad already, looking like {{cat}}, {{c}} and {{C}}, the parameter |nocap=, and {{caps}} and {{cx}} and what not. I suppose Benwing2 wants to clean up {{caps}} too, though, since this is only used in about 200 entries having {{he-root}}.

Intuitively I propose {{up}} and {{high}} since the letters are close together, and high, on the keyboard. And {{↑}} which is Shift + AltGr + U on my standard xkeyboard-config layout, I’d actually use that, it looking exactly as much better than {{1}} as needed, Abloh’s 3% rule or something. Not seen DCDuring using the template, but the same concern can be valid for other editors and a rename can make it better. Fay Freak (talk) 14:29, 4 March 2024 (UTC)[reply]

I don't see the point of deprecating this template if multiple editors use it to save keystrokes. However I think we should be automatically subst-ing every instance of {{1}} and {{cap}} for the sake of readability. Ioaxxere (talk) 17:36, 4 March 2024 (UTC)[reply]

In the software industry, "deprecation" usually gives you a long time to deal with something. For example, Microsoft deprecated WebClient (a class used to perform Internet downloads), but it continues to work for many years. Also, there is usually a genuine stated rationale by which the replacement is better, not just a programmer's whim. You can joke "it's not a big deal", but it is longer to type cap than 1 (especially if you create thousands of entries, like I do) and there's also muscle memory, which is really important for older people: please understand this, even if you are young: it's ableism. In this case, it costs us literally nothing to retain the 1 page as a redirect, which makes the template work fine. Removing and breaking the redirect can be nothing but either (i) punishing "old dogs" who can't learn "new tricks", or (ii) a fascist march ahead that supports developers but not users who create the project. Equinox ◑ 03:43, 5 March 2024 (UTC)[reply]

@Benwing2: Some years ago, we had a very aggressive template editor who upset many people by placing his/her software design decisions over user needs. Please don't be that person again. There are democratic discussion tools to allow you to work it out without turning off things that really matter to me, as a person who creates hundreds of entries per month and never fucks with a template. Equinox ◑ 03:47, 5 March 2024 (UTC)[reply]

@Equinox Would {{L}} work as a compromise? It looks sufficiently different that I don't find it confusable, and it's only one extra keystroke. I don't like {{1}} because it looks almost the same as {{l}} in the code. Theknightwho (talk) 19:13, 5 March 2024 (UTC)[reply]

L is better than nothing. But I really don't see why it's killing anyone to retain the working redirect. Equinox ◑ 19:18, 5 March 2024 (UTC)[reply]

@Theknightwho @Equinox I was thinking of repurposing {{u}}. Not even a shift key extra and it's barely used; {{U}} can be used for user mentions if anyone cares. Please note that this change is not coming out of the blue; the discussions over getting rid of {{1}} have been going on for years, most recently in WT:RFM. I'm also not sure how useful or helpful it is to accuse me of being selfish, fascist, ableist and ageist, and IMO it's definitely not helpful to demand that no template be removed once it's created (or maintained over a several-year deprecation process, which is tantamount to the same thing). Benwing2 (talk) 22:53, 5 March 2024 (UTC)[reply]

@User:Benwing2 You should have said that at the start, to be honest. I feel that mentioning that would be more productive for you, since there is the same amount of joint movement in typing both {{1}} and {{u}}, so there could be no argument based on that. CitationsFreak (talk) 23:31, 5 March 2024 (UTC)[reply]

I'm getting to the point where I just use wikitext for everything I input. If the wikitext "required" for "proper formatting" is too hard, then I get the words right and leave something that doesn't necessarily conform to WT:ELE or whatever other norms we have for cleanup by others, who seem to like that kind of thing. If the next step is to filter such input, I'm out of here. DCDuring (talk) 01:07, 6 March 2024 (UTC)[reply]

@DCDuring I don’t have any strong views on the issue raised by this thread, but this attitude isn’t fair on other users, because you’re just creating clean-up work for others. The idea of using link templates outside of definition lines isn’t new, and it’s not complicated. Theknightwho (talk) 17:40, 7 March 2024 (UTC)[reply]

@User:Theknightwho Just more keystrokes and more learning overhead. I find it hard enough to try to make and keep taxonomic and related entries useful and to correct other users' mistaken and omitted uses of {{taxlink}}, {{vern}}, and now {{taxfmt}}. I don't undertake any non-morphological etymologies, instead inserting {{rfe}} (and getting complaints about that), because that's just more learning overhead, easily forgotten. I'm sure I get lots of descendants items wrong too. DCDuring (talk) 18:43, 7 March 2024 (UTC)[reply]

@Theknightwho, Benwing2: I’m with User:Ioaxxere above: Why not just automatically subst every instance of {{1}} (perhaps by bot), making this problem vanish? This template, whatever its name, is convenient for editors adding content but bad for readability; subst-ing would keep the convenience while resolving the problems of having such a template hang around in the code. — Vorziblix (talk · contribs) 02:57, 6 March 2024 (UTC)[reply]

@Vorziblix It's not possible to automatically do this except by periodically running a bot script. We only have a few things that currently run by periodic bot scripts, and AFAIK they are all triggered manually (by me, or in the case of {{t+}}, by User:Ruakh, although I don't know whether this still runs); in general I am reluctant to add more esp. to mainspace pages because they cause surprise for editors and are a maintenance burden. Also, for long words at least, it might be worse to have it duplicated in capitalized and lowercase forms than to have a (properly-named) template that wraps a single instance of the word. Benwing2 (talk) 03:04, 6 March 2024 (UTC)[reply]
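For concreteness, the text transformation that such a periodic bot run would apply could look like the Python sketch below. It rests on an assumption not spelled out in the thread: that `{{1|word}}` and `{{cap|word}}` render as a link to "word" displayed with an initial capital, i.e. `[[word|Word]]`. The function name `subst_cap` is hypothetical, and fetching and saving pages is left out entirely.

```python
import re

# Assumption: {{1|word}} and {{cap|word}} render as a link to "word" shown
# with an initial capital, i.e. [[word|Word]].
CAP_RE = re.compile(r"\{\{(?:1|cap)\|([^{}|]+)\}\}")


def subst_cap(wikitext):
    """Replace single-argument {{1|...}}/{{cap|...}} calls with piped links."""

    def expand(match):
        word = match.group(1)
        return "[[%s|%s]]" % (word, word[:1].upper() + word[1:])

    return CAP_RE.sub(expand, wikitext)


print(subst_cap("# {{1|plural}} of {{l|en|cat}}"))
```

The expanded form repeats the word in lowercase and capitalized variants, which is the duplication mentioned above as a drawback for long words. The `{{l|en|cat}}` call is left untouched because the pattern only matches the digit `1` or `cap`.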

Why aren't you 'reluctant' to do things that add more keystrokes? Is it because you aren't the one doing those keystrokes? Or do you think that our content is so good that all we have to do is pretty the dictionary up and let AI fill in the gaps? DCDuring (talk) 13:19, 6 March 2024 (UTC)[reply]

Needlessly snarky. Vininn126 (talk) 13:28, 6 March 2024 (UTC)[reply]

@DCDuring: Look, we can’t imagine well how it is to have arthritis and have to balance the concerns of joints and eyes of everyone, stop being so combative. Depending on the position of the keys, one or two keystrokes more may go easier for you than even one: if they are in a close area and if they are in the upper mid; 1 is at a corner and {{1}} strains the eyes of people with impeded and good eyesight in view of {{l}}. That’s why I have these three suggestions here, we might take two of: {{up}}, {{high}}, {{↑}}. I actually think a lot about keyboard layouts, the curly brackets are at the keys for 8 and 9 for me and for US standard <AD11> and <AD12> (the two right of O and P) and so these will be typed on one hand easily. Fay Freak (talk) 13:36, 6 March 2024 (UTC)[reply]

That's not snarky. I'm really concerned about attitude.

My eyesight isn't very good either. I've weighed the difference to me.

So, I'm just supposed to roll over? I haven't objected to the {{subst}} idea.

Why don't we have a thoroughgoing consideration of keystroke minimization. Why not use {{i}} for initial capitalization, instead of wasting it as a redirect to {{qualifier}}, when {{q}} also redirects thereto? DCDuring (talk) 15:45, 6 March 2024 (UTC)[reply]

In general, the concerns of easy input and easy readability for future editors both have to be considered when naming templates. There are ways for editors to configure their own machines to make entry easier, e.g. I think an AutoHotkey script could be used on Windows to convert {{1| to {{cap|, or do anything similar to this, on the user's end.--Urszag (talk) 02:52, 7 March 2024 (UTC)[reply]
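The same user-side idea can be sketched as a plain text substitution. The Python below is not an AutoHotkey hotkey. It is a hypothetical helper, `rename_template`, that an editor could run over drafted wikitext before saving, so that the short alias is typed but the longer name is what gets stored.

```python
import re


def rename_template(wikitext, old, new):
    """Rewrite calls to template `old` so that they call `new` instead."""
    # The lookahead requires "|" or "}}" right after the name, so {{1|...}}
    # matches but a longer name such as {{10|...}} does not.
    pattern = re.compile(r"\{\{\s*" + re.escape(old) + r"\s*(?=\||\}\})")
    return pattern.sub(lambda match: "{{" + new, wikitext)


print(rename_template("{{1|plural}} of {{l|en|cat}}", "1", "cap"))
```

Only the template name changes and the arguments are left alone. Unlike substitution, the result still depends on the renamed template existing on the wiki.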

On the one hand I support the principle of things actually making sense, and nothing about the abbreviation "1" does. On the other hand it seems fairly harmless, and if it really is saving Equinox so much trouble, why not? Nicodene (talk) 11:10, 7 March 2024 (UTC)[reply]

Don't touch the template. I say this as en.wikt's 2nd biggest contributor. P. Sovjunk (talk) 21:33, 8 March 2024 (UTC)[reply]

Mehhh. I agree it's an unintelligible name (and therefore proposed at RFM that we make the 'main' name something more intelligible), but redirects are cheap and I don't see harm in leaving {{1}} as a redirect. Some prolific editors are clearly used to using it. (In any given couple of months, we have one or two entries which use {{altcaps}} and thus just display a redlink, because I or someone else has been unable to recall what the new name for that is.) I admit {{1}} is a particularly unintelligible name, though (unlike e.g. {{altcaps}}). - -sche (discuss) 06:04, 9 March 2024 (UTC)[reply]

Badly named redirects add cognitive burden to people trying to understand the wikicode, and redirects in general (esp. badly named ones) increase the tech debt; enough of them and the site becomes unmaintainable. This is why people like me and User:Theknightwho who put time into maintaining the site (rather than just using it) push back against having random redirects littering the site. I also still don't know why User:Equinox as well as User:DCDuring (who doesn't even use the alias) are so attached to this particular alias when I have proposed a more sensible redirect {{u}} that is the same number of keystrokes. (Not to mention that using any template requires 5-6 keystrokes due to the left brace and vertical bar, so I have a hard time buying the argument that a single extra shift key makes a huge difference. I should also add, Equinox accused me of ageism and ableism knowing almost nothing about me -- I am in fact older than him and have suffered my own spate of hand-related disability.) Benwing2 (talk) 06:18, 9 March 2024 (UTC)[reply]

Personally, I feel that most of the arguments that apply to {{1}} apply to {{u}} as well. Also, after enough uses of it, they will be using it in no time (like with the mandated use of "en" in the etym and quote fields). CitationsFreak (talk) 20:50, 9 March 2024 (UTC)[reply]

{{up}} is still clearer than {{u}} and easy enough to type. My tendency is always that single-letter templates are badly named if they might look like something else (e.g. usage templates, {{user}}) and as after all there are little more than twenty letters available. This is not strictly comparable to terminal commands either, where we use to have a -V synonym of a longer --word. The one-ASCII-character ones really need broad consensus, even unconscious one. I doubt that {{u}} for {{cap}} will have this habitation like {{m}} and {{l}} have. The difference is also that these, and {{q}}, have semantics, even if it only consists in wrapping a language other than the working one, that capitalization at the beginning of English glosses hasn’t. All rationalizations that I am uneasy about {{u}} and {{i}} for any purpose. So far I have only three one-letter template-codes I use and watch out for. Fay Freak (talk) 21:49, 9 March 2024 (UTC)[reply]

@Benwing2 "I don't know why users keep doing their user things, rather than the better thing that I, the programmer god, imposed upon them". I hope some day you will realise why users hate your guts. I have been here since 2008, LOOK AT MY TRACK RECORD, I am doing nothing to you, I am not hurting you, but YOU, BENWING, you are changing things, you are hurting me and making it hard for me to continue the free open source project. Don't you dare make me sound like the criminal here. Equinox ◑ 07:52, 29 March 2024 (UTC)[reply]

I do not "agree" with some changes made to quote-book etc by Benwing, but it doesn't ruin what I'm doing. I am glad there is someone out there boldly editing at that level, because that seems to me like a high-quality editor that can help out in complex code-related situations. I remember that I didn't agree with something that was going on related to Categories, too. But I'm kind of ambivalent most of the time on these things, and even if I "lose" an argument or whatever, ultimately it's not a big deal even if 100% of what I've ever done was just deleted. (I will mention that I recently (maybe Jan or Feb 2024?) started using the "{{u}}" in some citations that had underlined text in the original. I believe I found it in the code at a Wikipedia article and applied it here or something? I think Wiktionary should maintain code-level compatibility with Wikipedia unless Wiktionary would thereby lose some area of functionality.) Geographyinitiative (talk) 08:08, 29 March 2024 (UTC)[reply]

@Geographyinitiative I'm glad you don't have a problem. I do. Do you also hang around abortion clinics saying "I'm sorry you lost a baby but I didn't"? Come back when you are relevant. This really hurts my usability and makes it hard for me to keep up the famously huge productivity that I have here. Equinox ◑ 08:13, 29 March 2024 (UTC)[reply]

Report of the U4C Charter ratification and U4C Call for Candidates now available

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language.

Hello all,

I am writing to you today with two important pieces of information. First, the report of the comments from the Universal Code of Conduct Coordinating Committee (U4C) Charter ratification is now available. Secondly, the call for candidates for the U4C is open now through April 1, 2024.

The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. Community members are invited to submit their applications for the U4C. For more information and the responsibilities of the U4C, please review the U4C Charter.

Per the charter, there are 16 seats on the U4C: eight community-at-large seats and eight regional seats to ensure the U4C represents the diversity of the movement.

Read more and submit your application on Meta-wiki.

On behalf of the UCoC project team,

RamzyM (WMF) 16:25, 5 March 2024 (UTC)[reply]

Module Breaker

User:Module Breaker should be blocked. Also, why can't I edit Wiktionary:Vandalism in progress? Avessa (talk) 15:06, 6 March 2024 (UTC)[reply]

@Avessa: Thanks. Because it is a vandalism target obviously. You can do some useful edits and then edit that page if needed. The bar to become autoconfirmed is low. Fay Freak (talk) 17:34, 7 March 2024 (UTC)[reply]

Unlink more and most in English headwords

For example:

common (comparative commoner or more common, superlative commonest or most common)

I don't see the point of having links to more and most in this kind of entry. In my view, having excessive links makes a page less visually appealing and could invite misclicks. Would anyone oppose removing these links? Ioaxxere (talk) 22:02, 6 March 2024 (UTC)[reply]

Not really seeing why unlinking the words is necessary. Maybe learners of English would find the links helpful. — Sgconlaw (talk) 11:49, 7 March 2024 (UTC)[reply]

Because it is easy to click on touchscreens. One would find it helpful only one time theoretically: and then would not understand the English definitions anyway; anyone who would find necessity to click them is at the wrong place with a monolingual dictionary. Nobody said it is necessary, it is about optimization. Fay Freak (talk) 13:35, 7 March 2024 (UTC)[reply]

For that matter why have links to commoner and commonest? (Alright, maybe commoner is a special case because of commoner#Noun.) DCDuring (talk) 16:58, 7 March 2024 (UTC)[reply]

Because the link is what’s left from WT:ACCEL, or any red link crosslinguistically when there is a comparable situation with periphrastic adjective gradation without the gadget, leaving a red link to invite creation. It would be too distressing to create a page and have no link then. The links are made not with random logics but impulses in mind. Fay Freak (talk) 17:31, 7 March 2024 (UTC)[reply]

I don't understand your last sentence. What "random logics" and what "impulses"? DCDuring (talk) 19:58, 7 March 2024 (UTC)[reply]

You find something that makes sense, harmonizes with some aesthetic equation. But we have to ponder what will be clicked, by the typical, impulse-driven behaviours of readers and editors. The contradiction against analogy logic (synthetic comparatives vs. periphrastic ones) would barely be felt. Fay Freak (talk) 20:06, 7 March 2024 (UTC)[reply]

Support. — excarnateSojourner (ta·co) 06:23, 26 March 2024 (UTC)[reply]

I was about to close this but I've just realized that further and furthest are also linked. I would like to expand this proposal last-minute to cover those as well. @Sgconlaw, ExcarnateSojourner, does this change your opinions at all? Ioaxxere (talk) 21:48, 6 April 2024 (UTC)[reply]

@Ioaxxere: mmmm, not really. I think there's no harm in leaving them linked. — Sgconlaw (talk) 21:52, 6 April 2024 (UTC)[reply]

@Ioaxxere Oh, you mean like at gone. No, that does not change my position. — excarnateSojourner (ta·co) 23:11, 6 April 2024 (UTC)[reply]

In that case this passes 2-1. Ioaxxere (talk) 06:38, 7 April 2024 (UTC)[reply]

Wikimedia Canada survey

Hi! Wikimedia Canada invites contributors living in Canada to take part in our 2024 Community Survey. The survey takes approximately five minutes to complete and closes on March 31, 2024. It is available in both French and English. To learn more, please visit the survey project page on Meta. Chelsea Chiovelli (WMCA) (talk) 00:23, 7 March 2024 (UTC)[reply]

Revoking autopatrolled status from Kwamikagami

Some background: @Kwamikagami has been autopatrolled since April 2009, and currently has just over 34,500 edits. They're sporadically active, but when they do edit they tend to make changes to large numbers of entries very quickly, and they tend to focus on single-character entries or anything relating to IPA.

Me, @Benwing2, Vininn126, AG202 and others have been pretty concerned about their sloppy editing for a few months now, and their autopatrolled status makes it much harder to spot. Some examples off the top of my head (but there are literally hundreds like this):

[1]: mass-adding languages with tons of mistakes: the Khoekhoe entry uses the wrong language code throughout, Bodo (India) has the wrong L2 header, and Dogri (even today) still doesn't have a headword template.
[2] Deciding to merge ң and ӈ with no consensus or discussion, despite the fact this caused a bunch of issues for several languages. They also did this for a bunch of analogous letters.
[3] Adding a bunch of stenoscript entries like w—O or adm with merged part of speech headers and/or no headword templates. I can see why they've done this - to avoid repetition - but the obvious and sensible thing to do would have been to start a discussion, not create ~100 more entries with the same issue ([4]). We should not be giving '''{{PAGENAME}}''' on the headword line, and anyone with autopatrolled status should know that.
[5] Adding definitions like "?". No request or attention template - just "?".
[6] Even looking at their most recent contributions, they've wrongly given the pronunciation IPA^(key): /ɪkˈsaɪ.ən, -ɒn/ at Ixion. This should be IPA^(key): /ɪkˈsaɪ.ən/, /-ɒn/ or IPA^(key): /ɪkˈsaɪ.ən/, /ɪkˈsaɪ.ɒn/.
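The pronunciation problem in the last item can be checked mechanically. What follows is only a minimal sketch, not an existing Wiktionary tool: the function name `has_merged_alternatives` is hypothetical, and it assumes that each pronunciation is meant to sit in its own pair of slashes, as in the corrected forms given above.

```python
import re


def has_merged_alternatives(ipa: str) -> bool:
    """Return True if any single /.../ span contains a comma.

    A comma inside one pair of slashes suggests that two alternative
    pronunciations were merged into one transcription.
    """
    # Each match is the text between one opening and one closing slash.
    spans = re.findall(r"/([^/]*)/", ipa)
    return any("," in span for span in spans)


# The form criticised above, followed by the two suggested corrections.
for candidate in (
    "/ɪkˈsaɪ.ən, -ɒn/",
    "/ɪkˈsaɪ.ən/, /-ɒn/",
    "/ɪkˈsaɪ.ən/, /ɪkˈsaɪ.ɒn/",
):
    print(candidate, has_merged_alternatives(candidate))
```

A check like this only flags candidates for review; it does not decide which corrected form is preferable.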

All of this creates a massive clean-up job for everyone else, and Kwamikagami has repeatedly proclaimed that they don't understand the problem, which quite frankly means I don't think they should be autopatrolled anymore. Theknightwho (talk) 22:26, 8 March 2024 (UTC)[reply]

Support. I actually believe we should block Kwami for at least a month since they refuse to acknowledge the problematic nature of many of their edits and continue doing the same thing after warnings. But revoking autopatroller status is a good start. Benwing2 (talk) 22:30, 8 March 2024 (UTC)[reply]

Yeah, I decided not to bring up the repeated refusal to understand consensus, since it didn't seem relevant to this particular issue, but that's definitely a much worse issue.

The clean-up job of their contributions is going to be huge. Theknightwho (talk) 22:33, 8 March 2024 (UTC)[reply]

Support. Vininn126 (talk) 22:31, 8 March 2024 (UTC)[reply]

Support. There's also Category:Translingual entries with incorrect language header (not all of them are Kwami's, but too many are). There are good arguments for treating language-specific characters as either the language itself or translingual, but not both. Most of the entries in this category use translingual templates and language codes under other language headers.

The problem they have in general seems to be making snap decisions without thinking things through, then sticking with those bad decisions until forced to abandon them. They know more than I do on a lot of things, but they don't make very good use of that knowledge. As for the whole stenoscript issue: they did actually ask for advice at the time, so that may not be the best example. Chuck Entz (talk) 01:44, 9 March 2024 (UTC)[reply]

@Chuck Entz I'm not sure I agree with you re the stenoscript: regardless of when they asked for advice or what the response was, they've still created ~100 entries which are in a completely unacceptable state and will need to be cleaned up by someone. Even if they got no response at all, what they did was definitely not the right thing to do, and is the kind of thing that has got some new users banned. Theknightwho (talk) 02:07, 9 March 2024 (UTC)[reply]

Support + they need a block. AG202 (talk) 04:58, 9 March 2024 (UTC)[reply]

@Theknightwho I confess that I've also used this forbidden headword line on sum#Multiple parts of speech and tht#Multiple parts of speech. I agree that it would be better as a template, but I think "multiple parts of speech" should be allowed as a POS header. Ioaxxere (talk) 06:49, 9 March 2024 (UTC)[reply]

No, it shouldn't. — SURJECTION ^{/ T / C / L /} 09:53, 9 March 2024 (UTC)[reply]

It looks super objectionable. With little necessity, since at least with {{head}} you can just use |catN=. You might stretch WT:POS a bit by letting a part of speech header be followed by another part of speech header and then only the headword line, which probably contradicts basic publication logics of not having empty headers but would at least look better. For such alternative forms, to save vertical space, we could introduce headers like Pronoun · Adjective (i.e. in my example separated by middle dots). Years ago it was considered whether it would be better to have templates instead of headings, as on other Wiktionaries, only dismissed for Lua memory restrictions, to make appearance centrally manipulatable. Fay Freak (talk) 18:09, 9 March 2024 (UTC)[reply]

I don't think this would be widely accepted. It messes with categorization and not having a "head" template violates our practices. I would change it to how entries like obvi & unfort are. AG202 (talk) 20:04, 10 March 2024 (UTC)[reply]

@AG202 those entries are sensible, since each abbreviated word corresponds with a single part of speech. Compare tht, which would need to have four or five identical POS sections. Clearly a dedicated template would be preferable eventually, although for now I don't think a few missing categories are the end of the world. Ioaxxere (talk) 23:37, 10 March 2024 (UTC)[reply]

The repeated POS sections are what are required at this point per our policy. You should've brought it up with English editors at the very least if not everyone in general before creating those entries like that. It clearly violates our Entry Layout guidelines. AG202 (talk) 23:48, 10 March 2024 (UTC)[reply]

@Ioaxxere I agree with User:AG202. There are various imaginable ways of compressing repeated POS sections but (a) it needs discussion, (b) I doubt using a POS "Multiple parts of speech" is ideal in any case; certainly the actual parts of speech should be listed one way or another. Benwing2 (talk) 23:53, 10 March 2024 (UTC)[reply]

Support unfortunately, I think we exhausted other options. Kwami was given multiple written warnings and blocks for EACH of these mass edits and continued regardless. They also haven't really helped clean up or shown remorse for their problematic mass edits... - سَمِیر | Sameer (^{مشارکت‌ها} · ^بحث) 08:47, 9 March 2024 (UTC)[reply]

Support + a block to fix up the entries. CitationsFreak (talk) 23:08, 9 March 2024 (UTC)[reply]

Support Ioaxxere (talk) 23:35, 10 March 2024 (UTC)[reply]

Revoked, given:

The unanimous and overwhelming support.
It only takes a nomination from one admin and approval from another for a user to gain autopatrolled status.
This has been open for just over 2 days, which is about 6 times longer than it took for the original nomination to get approved and actioned ([7] [8] [9]).

Theknightwho (talk) 00:18, 11 March 2024 (UTC)[reply]

@Theknightwho Thank you. Benwing2 (talk) 00:26, 11 March 2024 (UTC)[reply]

Eastern Geshiza language

User:Geshiza has been asking about adding this language, but in the meanwhile has created a walled garden of over 30 entries with their own improvised categories, but no templates and no links to or from the rest of Wiktionary.

Adding this language won't be easy, because it's hard to tell what it really is. It's apparently a sub-sublect of Horpa (language code ero), but the Wiktionary article for that language doesn't have much detail about what it describes as "a cluster of closely related yet unintelligible dialect groups/languages". In one analysis of the groupings that it cites, there are 5 "varieties", of which "Central Horpa" has 3 "dialects", one of them being "Dgebshesrtsa (Geshezha 革什扎) (non-tonal)". Whether "Gesheza" and "Geshiza" are the same thing isn't explicitly stated, but another quote in the article makes that seem likely. At any rate, there's no mention at all of "Eastern Geshiza". Does anyone have access to any sources that will make sense out of all this? Chuck Entz (talk) 02:28, 10 March 2024 (UTC)[reply]

@Chuck Entz This sort of "do it then get permission" approach was done for Belter Creole as well. I am strongly opposed to allowing this to proceed as it sets a terrible precedent. I would suggest moving the contents into that user's space until it becomes clearer whether there's any hope of supporting this variety or these varieties. Benwing2 (talk) 03:11, 10 March 2024 (UTC)[reply]

@Benwing2 I don't think they're comparable at all. Belter Creole is a constructed language, whereas Eastern Geshiza seems to be a variety of Horpa, and I can see that a published grammar exists. I would much rather that we simply put a moratorium on any new entries until we've hashed out how it should be handled, but regardless of the language code they still belong in mainspace. Theknightwho (talk) 03:16, 10 March 2024 (UTC)[reply]

@Theknightwho Ultimately maybe so, but not remotely in the current state they're in, and I doubt simply asking or telling this user to stop will make them stop. Who's gonna restructure and clean up the entries once we sort out how many varieties are involved and whether they are L2's or etymology variants? You? If you're not willing to personally commit to doing this then IMO we should move these ill-structured entries to userspace and put them back, gradually, in a properly structured form, once we add the lect codes. Benwing2 (talk) 03:36, 10 March 2024 (UTC)[reply]

@Benwing2 @Theknightwho moving the entries to their userspace is probably fine. They seem to not understand templates (but they are making an effort, as they seem to be trying to make their entries match others here). We could have them practice using templates in their userspace and, once we feel like they understand how templates work, they can move the entries back themselves. — Sameer (^{مشارکت‌ها · بحث})

As someone who regularly patrols Abuse Filter 68, I can tell you that creating entries with no templates is more common than you might think. Usually it's not bad faith- just cluelessness. Chuck Entz (talk) 04:23, 10 March 2024 (UTC)[reply]

@Benwing2: Well, they've already been asked, but it's too soon to tell how they'll respond. Chuck Entz (talk) 03:55, 10 March 2024 (UTC)[reply]

@Chuck Entz @Benwing2 they responded and they indicated they will wait until everything is resolved before continuing to edit. — Sameer (^{مشارکت‌ها · بحث}) 05:19, 10 March 2024 (UTC)[reply]

@Sameerhameedy Sounds good, thanks for making the request. Benwing2 (talk) 05:21, 10 March 2024 (UTC)[reply]

Language titles with category

Could the language.titles have a clickable link to their Category? (main, or lemmas, whatever?) Ideally, also with tooltip with their code? (would be very helpful!). At pages with many language sectors, it is very difficult to go down to the bottom and find the language.
e.g. [:Cat:Afar language|<span title="Afar (aa)">Afar</span>] Thank you! ‑‑Sarri.greek ^♫ I 12:12, 10 March 2024 (UTC)[reply]
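The output requested in the post above can be sketched as a small string-building function. This is an illustration only: `heading_link` is a hypothetical name, it is not how any gadget or module on Wiktionary is implemented, and it assumes the category is always named "<language name> language" as in the Afar example, which will not hold for every language.

```python
from html import escape


def heading_link(name: str, code: str) -> str:
    """Build wikitext linking a language heading to its category.

    The tooltip shows the language name with its code, e.g. "Afar (aa)".
    Assumption: the category title is "<name> language".
    """
    tooltip = escape(f"{name} ({code})", quote=True)
    label = escape(name)
    return (
        f"[[:Category:{name} language|"
        f'<span title="{tooltip}">{label}</span>]]'
    )


print(heading_link("Afar", "aa"))
```

For Afar with code aa this yields a link to Category:Afar language whose label is "Afar" and whose tooltip reads "Afar (aa)".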

I'd rather not add any templates to the headings. One could implement a JavaScript gadget that automatically does this, though. — SURJECTION ^{/ T / C / L /} 12:31, 10 March 2024 (UTC)[reply]

M @Surjection, Thank you. I have no idea how it could be done. I would be delighted at the output. ‑‑Sarri.greek ^♫ I 12:57, 10 March 2024 (UTC)[reply]

I have a working prototype in User:Surjection/linkLanguageHeaders.js. You can add it to your common.js to test it. Perhaps it can be turned into a gadget if there is interest. — SURJECTION ^{/ T / C / L /} 13:08, 10 March 2024 (UTC)[reply]

Yes, I think it was agreed awhile ago not to use templates in headings and IMO this is just as well. Benwing2 (talk) 00:25, 11 March 2024 (UTC)[reply]

I was not proposing a way to do it, I was just showing the desired result. I don't know what js is. I do not change default looks at platforms. As a reader, I would like to click language.titles, because I do not know what they are and Categories are too far away to click. Could, please, en.wiktionary rethink it? Thank you. ‑‑Sarri.greek ^♫ I 01:36, 14 March 2024 (UTC)[reply]

Hi, it should be available now through Special:Preferences under "Gadgets" as "Add links to language headings that point to the category of the corresponding language." — SURJECTION ^{/ T / C / L /} 19:09, 14 March 2024 (UTC)[reply]

Ω! Μ @Surjection! you did this for me? Hooray! Thank you, thank you! I will find it immediately. You are too kind. I hope, lots of people will like it and that it become standard! ‑‑Sarri.greek ^♫ I 19:57, 14 March 2024 (UTC)[reply]

It works! it is wonderful; why not for everyone? why hidden in 'gadgets'... You are a magician M @Surjection. The default should be the 'best' and the most useful. ‑‑Sarri.greek ^♫ I 20:06, 14 March 2024 (UTC)[reply]

Make default language titles with category

Great news! M @Surjection, has made a Gadget and we can click the Language.Titles to go to the category! I propose it become default, for all to use. Kiitos! kiitos Surjection! ‑‑Sarri.greek ^♫ I 20:27, 14 March 2024 (UTC)[reply]

I don't personally think it should be the default, since it can be a bit distracting and confusing to those who aren't used to it. — SURJECTION ^{/ T / C / L /} 21:44, 14 March 2024 (UTC)[reply]

But, M @Surjection, you have made it so discreet and elegant! There are no colours, or anything 'loud' about it. I find it very helpful, because there are many names of languages unknown to us. I am delighted, and I wish all people could use it too. (you may not guess it, but lots of us do not go to Preferences. This was my first time, except for Global Pref. for Vector Classic for wikipedias, and fr.wikt). ‑‑Sarri.greek ^♫ I 22:27, 14 March 2024 (UTC)[reply]

Thank you @Surjection for doing this! I tried it out and it looks great. @Sarri.greek I think enabling it by default could very well be done a bit down the line but for the moment we should wait to make sure it doesn't have any unexpected interactions with anything else. Benwing2 (talk) 22:36, 14 March 2024 (UTC)[reply]

Mainio! hieno! -in honour of M Surjection, from now on, Finnish will be the language of interjections. .js will be renamed .surjs @Benwing, many wiktionaries have clickable Lang.titles. I was, so longing for it. At el.wikt, the visible labels {{lb}} before definitions, link to their Cat. Where, we see on top, the word of the label in host language, and sorted on top, its translation in the target language :) Anything to facilitate readers! ‑‑Sarri.greek ^♫ I 04:38, 15 March 2024 (UTC)[reply]

I'm also worried about how it will behave in the mobile view, specifically about if it makes the headings harder to click to expand. It does help get around the lack of categories in the mobile view, something which has always greatly irked me. — SURJECTION ^{/ T / C / L /} 07:35, 15 March 2024 (UTC)[reply]

Two transliterations

A question (after endless discussions of how to transliterate Modern Greek at Module_talk:el-translit). I do not know about other languages, but at least for Modern Greek ISO offers two types of conversions.

TypeA = unique.conversion letter-to-letter transliteration, reversible (two-directional), used for international usage. Customs, machines etc when one-to-one translit is needed.
TypeB = slightly simplified, and pseudo-phonemic, calls it transcription (but not with IPA symbols), for national usage. For Greek, the only difference to TypeA are two macron diacritics.
ISO also introduces an idea of a 'level 3' mixed Type, more phonemic, for national usage, 'especially' when the above transliterations are very different from the pronunciation.

The question is: Does en.wiktionary have a rule that says: a) en.wikt is obliged to provide the official unique.conversion ISO transliterations. b) en.wiktionary also provides a more phonemic transliteration based on ISO and House Rules, through consensus.
If a) is yes, then we should have two transliterations (for some languages). Discussions would be needed only for b), saving a lot of our energy. Two translits, How? I propose

word • (xxxxx^© / xyyxxxyy^ⓦ) ...or I for ISO ^Ⓘ --please check the tooltips

Thank you. ‑‑Sarri.greek ^♫ I 12:42, 10 March 2024 (UTC)[reply]
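The difference between the two types described in the post above comes down to reversibility, which can be shown with a toy example. The table below is a hypothetical seven-letter mapping invented for illustration; it is not the ISO table, the input strings are unaccented toy strings, and the function names are assumptions. Type A is modelled as a one-to-one letter mapping, and Type B as the same output with macrons removed.

```python
import unicodedata

# Toy one-to-one table (illustrative only, NOT the ISO mapping).
_RAW = {"α": "a", "η": "ī", "ι": "i", "μ": "m", "ν": "n", "ο": "o", "ω": "ō"}
TYPE_A = {g: unicodedata.normalize("NFC", l) for g, l in _RAW.items()}
TYPE_A_INVERSE = {latin: greek for greek, latin in TYPE_A.items()}


def to_type_a(word: str) -> str:
    """Letter-to-letter conversion; every Greek letter has a unique output."""
    return "".join(TYPE_A[ch] for ch in word)


def from_type_a(text: str) -> str:
    """Invert to_type_a; possible only because the table is one-to-one."""
    text = unicodedata.normalize("NFC", text)
    return "".join(TYPE_A_INVERSE[ch] for ch in text)


def to_type_b(word: str) -> str:
    """Simplified conversion: the Type A output with macrons stripped."""
    decomposed = unicodedata.normalize("NFD", to_type_a(word))
    stripped = "".join(ch for ch in decomposed if ch != "\u0304")
    return unicodedata.normalize("NFC", stripped)


for toy in ("μονη", "μονι"):
    print(toy, to_type_a(toy), to_type_b(toy), from_type_a(to_type_a(toy)))
```

In this toy table the two input strings stay distinct under the Type A conversion and can be converted back, while the Type B conversion gives the same output for both, so the original spelling cannot be recovered from it.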

@Sarri.greek Agreed, Persian is also running into this issue. After a discussion months ago it was agreed that Persian templates should have two transliterations (Classical + Iranian) but modules don't support that so we can't do anything rn. I believe Hebrew editors have wanted something similar as well. — Sameer (^{مشارکت‌ها · بحث}) 18:21, 10 March 2024 (UTC)[reply]

@Sameerhameedy There is some language-specific support for this in place at the moment: the major example being Chinese (and I'm not referring to the separate languages grouped together), where several lects show two or three transliterations each in the dropdown; Cantonese has four, and Mandarin seven(!). Korean, Thai and Khmer also do this in various ways, too.

It's clear that there needs to be a language-neutral way of showing things like this, and (taking Mandarin as a benchmark) it shouldn't be limited to transliterations into the Latin script, either, given one of the systems is Zhuyin and another Cyrillic. Theknightwho (talk) 20:16, 10 March 2024 (UTC)[reply]

Thank you M @Sameerhameedy. Asking M @Theknightwho for languages mentioned with 3 or 4 transliterations. What is the legal status of these? By 'obligatory' for wiktionary to show, we mean: ISO-assigned for international transactions like exports. Is there one and only one topping the others? The problem we have here is: Because wiktionarians try to adapt ISO to something more useful to our readers, the discussions a. never end. and b. every 5 or so years, someone comes up with an alteration or a restoration of some letter conversion. This will never end. ‑‑Sarri.greek ^♫ I 22:43, 10 March 2024 (UTC)[reply]

@Sarri.greek As far as I can tell, they all have one system which is used for things like links (i.e. as the "transliteration" in the normal sense), and the others are only shown on the entry.

I don't think we're under any obligation to choose the ISO standard as the main transliteration, but if we don't, then it's a good idea to show it on the entry itself. Theknightwho (talk) 22:50, 10 March 2024 (UTC)[reply]

@Theknightwho, I see that such languages have boxes for transliterations = they can manage multiple solutions. I was thinking of languages that have one translit. next to PAGENAME, and disabled the option to add a second one. May I add a point:
ISOs have been criticised for poor results and unsuccessful conversions. Still, I am not proposing to reform ISO here. If ISO makes changes, we record them and update the official translit. I am proposing to free ourselves from the rigid 1st translit, which is not-to-be-debated. Also: how do wikipedians face this problem? Does en.wikt. have a liaison to en.wikipedia for questions or coordination? Thank you. ‑‑Sarri.greek ^♫ I 23:04, 10 March 2024 (UTC)[reply]

@Sarri.greek The community of Wikipedia editors who work on language entries seems much smaller than the community of Wiktionary editors, so whoever has the most stamina tends to win out. E.g. User:Mahmudmasri insisted on particular standards for transliteration and phonemic rendering of Egyptian Arabic that I disagree with, but I don't have the energy to fight him on this and he does have the energy to patrol all the relevant pages and edit-war as necessary to get his preferred system in place, so that is what Wikipedia has. Similarly for things like language names and family trees; User:Kwamikagami out-staminas everyone else. I definitely agree with User:Theknightwho that we are under no obligation whatsoever to choose ISO's or anyone else's proposed standard as the preferred/main transliteration. We need to do what's right for Wiktionary and hopefully maintain some consistency of approach across languages where feasible. Benwing2 (talk) 00:24, 11 March 2024 (UTC)[reply]

Thank you @Benwing2 About your comment (for general 'rules') >>we are under no obligation whatsoever to choose ISO's or anyone else's proposed standard as the preferred/main transliteration.<< (Also by @Theknightwho) The problem with not having some 'locked' directives, is, that talks be endless. Official things: (ISO, spelling directives of Academies or similar). Are not official things the first obligation of wikt? = credibility, stability, well-referenced, not subject to 'talks' and alterations. I dislike it too, but as a reader, I expect the info available. Otherwise, I would have to go elsewhere to get it. For some ISOs: Wiktionary's standards aspire to give better results than the official ISO :) That would be nice! But one has to see the comparison. ‑‑Sarri.greek ^♫ I 01:46, 11 March 2024 (UTC)[reply]

@Sarri.greek Yes, sometimes consensus is hard to achieve but we all know that some ISO standards are garbage and/or have no adoption, and many ISO standards simply have different aims than we do at Wiktionary. I think we should aim to not be gratuitously different from ISO standards where possible (e.g. we use ISO language codes whenever possible rather than incompatible ones), but at the same time not be bound by them (e.g. sometimes we merge lects that ISO considers different, and sometimes we split lects that ISO considers the same). Benwing2 (talk) 01:53, 11 March 2024 (UTC)[reply]

Ok, then @Benwing2. This is the end of this talk, so, my proposal for 2 transliterations is withdrawn. ‑‑Sarri.greek ^♫ I 02:07, 11 March 2024 (UTC)[reply]

@Sarri.greek I don't think you need to withdraw your proposal just to end the conversation :) ... I do think having multiple translits is an interesting idea to be potentially considered further. After all, this is not the first or second time this idea has come up. Benwing2 (talk) 02:11, 11 March 2024 (UTC)[reply]

Holding a discussion between two in this medium is difficult, I find one between so many impossible! I will only say that Dictionaries should be accessible (understandable to the "Man on the Clapham omnibus" — Oxford dictionaries appreciated this and have changed substantive to noun in their entries). I suspect that most people, not understanding IPA, use the transliteration as a guide to pronunciation. I hope that whoever makes a decision (the cynic in me says that it will probably be changed again next year) will bear the "man from Clapham" in mind. — Salt marsh ^☮ 06:07, 11 March 2024 (UTC)[reply]

Ωωωω! my wise mentor and administrator for Greek, @Saltmarsh! Hear, hear! Thank you. ‑‑Sarri.greek ^♫ I 06:13, 11 March 2024 (UTC)[reply]

One system, multiple transliterations

For Vedic Sanskrit, transliteration is abused to show the placement of the accent. Our policy is not to show the placement in the spelling of the word. Now, for finite verbs incorporating prefixes, there are two possible placements for the same verb, depending on the grammatical usage of the verb. Is there an approved mechanism for showing the two transliterations, and if so, what is it and where, if anywhere, is it documented? Or do we only show the accent for finite verbs for the usage where a verb without a prefix would bear an accent? The placement in the other case appears to be reliably predictable if one can identify the prefix. --RichardW57m (talk) 11:07, 11 March 2024 (UTC)[reply]

There's an undocumented solution of using |tr2= in templates similar to {{head}}, which currently works for some (perhaps all) Sanskrit headword templates. @Theknightwho: I don't know whether it is likely to be declared a 'hack' and broken with disdain. It looks from the code of Module:headword that it is intended to work. --RichardW57m (talk) 15:48, 26 March 2024 (UTC)[reply]

@RichardW57m Stop tagging me if you’re just going to make rude comments. Theknightwho (talk) 15:52, 26 March 2024 (UTC)[reply]

@Theknightwho: Kindly advise whether this technique is safe to use. I'm not sure what to conclude from its lack of documentation. Perhaps the correct solution is to clone the headword module for Sanskrit, though I hope not. --RichardW57m (talk) 16:10, 26 March 2024 (UTC)[reply]

I refer to this marking of the accent as an abuse partly because transliteration-related categorisation assumes that explicit transliterations are exceptional and worthy of review, whereas it is the norm for words found in accented texts. --RichardW57m (talk) 11:07, 11 March 2024 (UTC)[reply]

How should we transliterate (into Japanese script or other scripts), romanize, and lemmatize Ryukyuan?

Previous discussions

The following previous discussions can have useful possibilities.

Information

Lately, the Ryukyuan orthography has been a mess. Works vary between hiragana, katakana, and mixed script. There are vowels and syllabic consonants that cannot be transcribed cleanly/properly using Japanese orthography, so the central vowel (ɨ for example, サ行) has been variously transcribed in Japanese script as シゥ, シィ, ス, す, スィ, ス𛅤 (CJK small katakana wi (ヰ) if you cannot render this character), you name it. Aspirated and unaspirated consonants are also variously referred to as plain and glottalized consonants, and one of either is distinguished in hiragana or katakana. At Wiktionary we use an ad hoc transcription of inserting dakuten into the aspirated (Amami) and unaspirated (Okinawan/Kunigami), which is not used anywhere else. We also use an ad hoc method of including kanji in Ryukyuan languages, which some people do to transliterate Okinawa songs (but I can't find an example at the moment). Thus, 送り仮名 (okurigana) is basically another ad hoc transcription. In addition, we are basically duplicating kanji information from the Japanese entry, which requires more time and effort.

For the glottalized consonants such as [⸢ʔwáː] 'pig', should we do っわー, or ’わー?

Miyako has a special vowel, variously referred to as an apical vowel, laminal vowel, or fricative vowel (it is not a central vowel), which is variously transcribed as (S)ɨ, (S)ï, ʉ, ɿ, z, ü, you also name it. In fact, there are syllabic consonants in Ogami Miyako that cannot be transcribed cleanly in Japanese kana script, although there's a possibility that some Ogami words are actually reflections of a fricative vowel, as Kaneda Akihiro's vocabulary spreadsheet (from personal communication) does.

For romanizing, take Okinawan Shuri dialect [⸢ʔútɕínáː] for example. We could variously romanize it as ucinaa, 'ucinaa, ?ucinaa, uchinaa, uchinā, 'uchinā, you name it as well. And central vowels ([ɨ] in this instance) could either be transliterated as ï or ɨ (perhaps IPA only, so the former can be more plausible?), or we can transliterate [i] as yi and [ɨ] as i, and also have a glottalized initial as qV (as in qutyinaa). For aspiration, we could include <h> for aspiration, but nothing for unaspiration (or <'>), or include <'> for aspiration but nothing for unaspiration.
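The spread of possible romanizations listed above can be reproduced by treating each convention as a parameter. This is only a sketch to make the choices concrete: the function `romanize`, its parameters, and the phoneme list (the Shuri example with pitch marks removed) are hypothetical, and it covers just the glottal stop, the affricate, and long vowels, not aspiration or central vowels.

```python
MACRON = {"a": "ā", "i": "ī", "u": "ū", "e": "ē", "o": "ō"}

# The Shuri example above as a list of phonemes, pitch marks omitted.
UCHINAA = ["ʔ", "u", "tɕ", "i", "n", "aː"]


def romanize(phonemes, glottal="", affricate="ch", macron=False):
    """Spell a phoneme list under one set of romanization conventions.

    glottal:   how the initial glottal stop is written ("", "'", "?", "q")
    affricate: how [tɕ] is written ("c", "ch", "ty")
    macron:    write long vowels with a macron instead of doubling them
    """
    out = []
    for p in phonemes:
        if p == "ʔ":
            out.append(glottal)
        elif p == "tɕ":
            out.append(affricate)
        elif p.endswith("ː"):
            vowel = p[:-1]
            out.append(MACRON[vowel] if macron else vowel * 2)
        else:
            out.append(p)
    return "".join(out)


print(romanize(UCHINAA, affricate="c"))               # ucinaa
print(romanize(UCHINAA, glottal="'", affricate="c"))  # 'ucinaa
print(romanize(UCHINAA))                              # uchinaa
print(romanize(UCHINAA, macron=True))                 # uchinā
print(romanize(UCHINAA, glottal="q", affricate="ty")) # qutyinaa
```

Three independent choices already produce every variant listed for this one word, which is why settling each convention separately may be easier than comparing whole spellings.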

Finally, do we lemmatize at the kanji, the kana, or romanization? The current situation is just a total mess.

TL;DR: Transliteration and lemmatization of Ryukyuan need a massive overhaul; it's a mess as of right now.

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Mcph2): This is an important discussion for the orthography and lemmatization of the Ryukyuan languages. Please come to a consensus. Chuterix (talk) 17:10, 11 March 2024 (UTC)[reply]

We should lemmatize at what native speakers have used the most, absent a standard orthography, regardless of whether it seems inconsistent or "ad hoc". Defective or variant orthographies are not specific to Ryukyuan, and in other cases, we list the variants as alternative forms with the "standard" or most-common form as the lemma. (Or, in the case of two differently-pronounced words represented by the same orthography, we disambiguate in the etymology + pronunciation sections.)

For Okinawan in particular, there are several works written in mixed script (kanji & kana), and it looks to be the traditional orthography as well, so I wouldn't support a move to solely kana, and definitely not the Latin script. The same level of research should be done for the other languages as well; if they are more often written in the Latin script or katakana, then shifts can be made, but the research needs to be done first. AG202 (talk) 17:25, 11 March 2024 (UTC)[reply]

As someone who does not read Japonic/Ryukyuan literature and cannot otherwise comment much on this, I would just like to register my (ignorant) doubts towards/concerns regarding Wiktionary constructs such as {{ryn-readings}} (the concept of on'yomi vs. kun'yomi, at least) and Category:Northern Amami-Oshima Han characters (the concept of "Ryukyuan kanji" in general). Kana orthography seems to be under-developed, let alone usage of kanji (or should it be the other way around? placenames, etc.). Are we just reapplying 標準語 kanji to Ryukyuan? (can we examine 1. Japonic dialects [using kanji seems non-problematic] 2. Chinese "dialects" [本字 debates, "unwritten", etc.] 3. Jeju [the concept of Sino-Jeju is discouraged on Wiktionary]? as a comparison point for this topic?) —Fish bowl (talk) 09:29, 14 March 2024 (UTC)[reply]

Recent change to government standard for Japanese

I broke this off into a subtopic because I do not understand Japanese (and therefore cannot check original sources) and I'm generally ignorant of CJK languages, but per Wiktionary:Grease_pit/2024/March#FYI:_Major_romanization_change_coming_in_Japan, the government standard in Japan for Japanese is now Hepburn. As AG202 notes above about "absent a standard orthography", I'm just raising the possibility that the feds there may have a standard for Ainu, Ryukyuan, etc. as well and that standard may be Hepburn also. Sorry if my ignorance introduces noise. :/ —Justin (koavf)❤T☮C☺M☯ 17:53, 11 March 2024 (UTC)[reply]

the government standard in Japan for Japanese is now Hepburn.

Notably, this is for romanization, which is included on various kinds of signage explicitly for foreigners, as part of the country's efforts to court tourism money. This shift to Hepburn has nothing to do with text written in Japanese or other Japonic languages, outside of this very limited context (signs for foreigners). ‑‑ Eiríkr Útlendi │^{Tala við mig} 20:43, 12 March 2024 (UTC)[reply]

Wikimedia Foundation Board of Trustees 2024 Selection

You can find this message translated into additional languages on Meta-wiki.

Dear all,

This year, the term of 4 (four) Community- and Affiliate-selected Trustees on the Wikimedia Foundation Board of Trustees will come to an end [1]. The Board invites the whole movement to participate in this year’s selection process and vote to fill those seats.

The Elections Committee will oversee this process with support from Foundation staff [2]. The Board Governance Committee created a Board Selection Working Group, drawn from Trustees who cannot be candidates in the 2024 community- and affiliate-selected trustee selection process and composed of Dariusz Jemielniak, Nataliia Tymkiv, Esra'a Al Shafei, Kathy Collins, and Shani Evenstein Sigalov [3]. The group is tasked with providing Board oversight for the 2024 trustee selection process and with keeping the Board informed. More details on the roles of the Elections Committee, Board, and staff are here [4].

Here are the key planned dates:

May 2024: Call for candidates and call for questions
June 2024: Affiliates vote to shortlist 12 candidates (no shortlisting if 15 or fewer candidates apply) [5]
June-August 2024: Campaign period
End of August / beginning of September 2024: Two-week community voting period
October–November 2024: Background check of selected candidates
Board's Meeting in December 2024: New trustees seated

Learn more about the 2024 selection process - including the detailed timeline, the candidacy process, the campaign rules, and the voter eligibility criteria - on this Meta-wiki page, and make your plan.

Election Volunteers

Another way to be involved with the 2024 selection process is to be an Election Volunteer. Election Volunteers are a bridge between the Elections Committee and their respective community. They help ensure their community is represented and mobilize them to vote. Learn more about the program and how to join on this Meta-wiki page.

Best regards,

Dariusz Jemielniak (Governance Committee Chair, Board Selection Working Group)

[1] https://meta.wikimedia.org/wiki/Special:MyLanguage/Wikimedia_Foundation_elections/2021/Results#Elected

[2] https://foundation.wikimedia.org/wiki/Committee:Elections_Committee_Charter

[3] https://foundation.wikimedia.org/wiki/Minutes:2023-08-15#Governance_Committee

[4] https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_committee/Roles

[5] Even though the ideal number is 12 candidates for 4 open seats, the shortlisting process will be triggered if there are more than 15 candidates because the 1-3 candidates that are removed might feel ostracized and it would be a lot of work for affiliates to carry out the shortlisting process to only eliminate 1-3 candidates from the candidate list.

MPossoupe_(WMF) 19:57, 12 March 2024 (UTC)[reply]

User:GabMarquetto

Last month, this user added well over a thousand problematic Greenlandic entries over a day or two by scraping a Greenlandic dictionary site and running an unauthorized bot on their account. I blocked them from mainspace and the Reconstruction namespace as an unauthorized bot and asked for help at the Grease pit (see Wiktionary:Grease pit#Hundreds of Incomplete Greenlandic entries need to be cleaned up) on getting them up to Wiktionary standards. The consensus seemed to be that it would be best to just nuke them all, which I have since done, for the most part. Aside from copyvio concerns (compilation copyright, if nothing else), the verbatim inclusion of typos and other irregularities in the headwords showed that the bot run had been prepared with only minimal attention to the content. They have admitted that they don't speak Greenlandic at all (they're editing from Brazil).

The user responded by apologizing on their talk page and by attempting to clean the entries up using an alternate account and as an IP, both of which were then blocked by others on grounds of block evasion.

We need to discuss what to do next. While their methods were wrong, their motivation was to add content to the dictionary. They have admitted their mistakes and agreed not to repeat them. I made a point of only blocking them from two namespaces so they could discuss things here and on talk pages. This should not be about punishment for anything they did, but about whether they can be trusted to edit responsibly and add worthwhile content.

Pinging participants in the Grease pit discussion: (@Benwing2, DCDuring, Thadh, Vininn126), and users I've seen editing Greenlandic entries: (@Gamren, Jakeybean, Tesco250). Chuck Entz (talk) 14:57, 13 March 2024 (UTC)[reply]

The only input I can give is on admin decisions - I definitely think we should WT:Assume good faith and discuss with this user and teach them. Unfortunately when it comes to specifically Greenlandic I am very unfamiliar. I do think that they should stick to languages whose text they can at least read and understand (and not just rely on something else). Perhaps this user shouldn't be editing Greenlandic at all. Vininn126 (talk) 15:02, 13 March 2024 (UTC)[reply]

Unfortunately our dictionary does suffer from a severe lack of terms in many languages. However, if we don't have any editors who know the language, there is nothing we can do about that. The best course of action in my opinion would be to simply remove all these contributions, because currently a larger problem we are facing as a dictionary is untrustworthiness, which in turn decreases the number of willing editors in these languages. Better to not have any entries in a language than to have hundreds of questionable quality and validity at best. Thadh (talk) 16:35, 13 March 2024 (UTC)[reply]

Untrustworthiness is probably mostly based on English entries. Maybe we need to start over with a clean sheet of virtual paper. DCDuring (talk) 18:46, 13 March 2024 (UTC)[reply]

@DCDuring: I feel like you are using some kind of tone that isn't being taken over into your writing. What are you saying? That we should re-do our whole dictionary? Also applies to the message below, I'm confused what your opinion is. Thadh (talk) 19:46, 13 March 2024 (UTC)[reply]

I found the argument given spurious. If our problem is that we are thought untrustworthy, I find it hard to believe that the problem can be anywhere other than English entries. If untrustworthiness is a reason to delete content, then it is English entries that should be deleted. DCDuring (talk) 20:02, 13 March 2024 (UTC)[reply]

Most of the readers I interact with don't even use the English entries, so I think you're talking about a whole other reader base. If a language's sections don't have any references and half of the time feature an incorrect translation, then this is a disservice to the readers, and we should remove or improve those sections. If you think an English entry does not fulfill our CFI, you should RFV it, too, but mostly our English entries are pretty well-formed and represent the language adequately, as they are proofread by hundreds of native speakers. Not at all the case with our other language sections. Thadh (talk) 20:51, 13 March 2024 (UTC)[reply]

I've barely glanced at English entries except occasionally to make Romance etymologies that bleed into them more consistent. Nicodene (talk) 12:52, 15 March 2024 (UTC)[reply]

I guess that either we don't need no stinking first-draft-level Greenlandic entries from a volunteer or we should be trying to recruit someone (from where?) to add them from scratch. DCDuring (talk) 18:44, 13 March 2024 (UTC)[reply]

Perhaps we could see whether there is some other language's wiktionary that has some good Greenlandic entries. da.wikt? is.wikt? DCDuring (talk) 20:02, 13 March 2024 (UTC)[reply]

@DCDuring Alas, no. da.wikt has ~ 100 Greenlandic entries and they're all extremely basic stubs with a single-word definition and nothing more. is.wikt is even worse, with only 2 basic stub Greenlandic entries. In general, many non-English Wiktionaries are slim pickings; the entries are typically OK only for the native language of the Wiktionary in question and often not even then. For many languages, en.wikt does a far better job than the corresponding language's own Wiktionary. Benwing2 (talk) 07:28, 14 March 2024 (UTC)[reply]

It seems a shame that there are so many resources for Greenlandic available from the Greenland Language Secretariat to evaluate and improve the first-draft/stub entries, but we can't find the linguistic talent motivated to improve the entries. Oh well. DCDuring (talk) 14:30, 14 March 2024 (UTC)[reply]

@DCDuring You are welcome to do the entries yourself. Theknightwho (talk) 00:35, 15 March 2024 (UTC)[reply]

Needless to say, please do consult a grammar (or more) before doing so. Thadh (talk) 09:10, 15 March 2024 (UTC)[reply]

I'll just request Greenlandic translations for the organisms that sometimes live there. Maybe I'll venture to add a Greenlandic entry for them too. DCDuring (talk) 12:48, 15 March 2024 (UTC)[reply]

As I understand the question, it is not about what to do or not do with Greenlandic entries, but whether to lift the blocks. My inclination is to remove the blocks while admonishing the user to constrain their edits in future to their languages of (reasonable) competence. Given the apologies, I feel we currently do not have more reason to distrust this user than any random new user. --Lambiam 20:58, 20 March 2024 (UTC)[reply]

I haven't had time to review the user's edits in depth, but reading through the user's talk page and this discussion, I am inclined to agree: as far as the user/block is concerned, we can unblock and see whether subsequent edits are good or not. (If no other admin wants to beat me to that, and if no-one has objections, then someone can ping me in a few days and I'll unblock them.) - -sche (discuss) 18:07, 23 March 2024 (UTC)[reply]

Yes, this is fine with me too. When I have blocked users for long periods or indefinitely, it's generally because the user (a) thinks they know what they're doing but doesn't, (b) continues adding trash despite multiple warnings, and (c) doesn't respond to those warnings (or responds defensively and denies there's an issue). It's a good sign if a user apologizes and promises to change their behavior (although some users do that and then continue the same pattern of adding trash, so they need to be monitored). Benwing2 (talk) 18:26, 23 March 2024 (UTC)[reply]

Hoping to convene on practice regarding natural overlap of hyponyms and derived terms

There are many nouns for which a population of hyponyms and a population of derived terms will quite naturally have a substantial overlap. For example, in English, the noun list has laundry list, punch list, and dozens more. The theme is quite generalizable. What I propose here is to codify the principle that it is not wrong to show such terms both in a hyponyms section and in a derived terms section. Earlier I had avoided doing so because I anticipated that otherwise someone else might complain that having the same term twice on the page was "clutter". But there are some good reasons, regarding w:structured data, why allowing for the natural degree of double-posting is a good idea. Does anyone strenuously object to doing so? Note that column wrappers can be used, so there is no excuse for a section with lots of content not to auto-collapse. Thus, the user will not be presented with a giant unfolded list. Thanks. Quercus solaris (talk) 17:34, 13 March 2024 (UTC)[reply]

Definitely not wrong, actually encouraged imo. Thadh (talk) 17:37, 13 March 2024 (UTC)[reply]

I support this. "Laundry list" is both a derived term from "list" as well as a hyponym of "list". CitationsFreak (talk) 18:38, 13 March 2024 (UTC)[reply]

It is pure clutter on taxonomic name entries. I have limited derived terms to items that are not hyponyms, ie, accepted species names are hyponyms, not derived terms. No-longer-accepted species names that have been placed in other genera may appear as derived terms. The rule-driven, mechanical nature of the derivation of almost all tribe and family names and many order names makes their inclusion in any derived terms lists often of similarly low value, insufficient to warrant the clutter. OTOH, if we really want this duplication, there is nothing to prevent a bot from doing the job thoroughly. DCDuring (talk) 20:14, 13 March 2024 (UTC)[reply]

User:Sae1962 had a frequent habit of adding extremely basic hypernyms (i.e. going the other way), so he would take something like Hypertext Markup Language and add a hypernym of language. I consider this fairly unhelpful to human readers, though technically correct: it reminds me of reading Java or .NET programming documentation where you have a huge list of derived or inherited classes going all the way back to object, because everything is an object in the end. Equinox ◑ 20:20, 13 March 2024 (UTC)[reply]

I thought the Derived terms section was only for terms that could not fit under Hyponyms or some other section (though I suppose it would nearly always be Hyponyms). I think there are at least some pages that have an HTML comment in the Derived terms section warning users not to add terms that could be put into a hyponyms section or some other section. But I didn't bookmark anything and I don't see it on the policy page. —Soap— 20:53, 13 March 2024 (UTC)[reply]

Great points everyone. Thanks. Perhaps not a firm rule to be codified now. Wiktionary could impose firm consistency retroactively, later, if it ever feels the need. So where I'll leave it is that I'll respect and obey any existing setups that don't double-post (such as taxonomy, or entries with comments discouraging it). And I'll follow the principle that whichever method is used, just make sure that auto-collapse is keeping everything nice and orderly. Quercus solaris (talk) 16:08, 14 March 2024 (UTC)[reply]

Renaming "etymology-only language"

@Theknightwho, -sche I think the time has come to rename the term "etymology-only language" to something else. This term is cumbersome, and while it was accurate originally when the codes in question could be used only in etymology templates, it's long outgrown that particular use case. I would propose one of "dialect", "subvariety" or "sublect". "Dialect" is the most straightforward and arguably is exactly what these varieties are in most cases, but it's a bit of a loaded term given the longstanding language-vs-dialect controversy that happens with many language varieties. Thoughts? Benwing2 (talk) 04:57, 14 March 2024 (UTC)[reply]

I'll just add we treat Middle Polish as such, and I'm not sure dialect would be the best term for it. Unless we accept "dialect" to mean "any variant of"... Vininn126 (talk) 07:21, 14 March 2024 (UTC)[reply]

@Vininn126 Good point. That is why I suggested "subvariant" and "sublect". "Variant" on its own could sort of work but it feels too vague without some other qualifier, since "variant" and "lect", at least in some contexts, are generic terms covering any type of language. Benwing2 (talk) 07:25, 14 March 2024 (UTC)[reply]

@Benwing2, Vininn126: May I suggest merolect? The word sees no established usage, so we can thereby avoid any undesirable connotations, and with the etymological sense of "part-language", it means exactly what we want it to mean. 0DF (talk) 14:33, 14 March 2024 (UTC)[reply]

Variant is succinct, and covers all the different kinds of etym-only language: dialects, chronolects, regional varieties, written standards etc. Theknightwho (talk) 14:37, 14 March 2024 (UTC)[reply]

@Benwing2, at el.wikt we mark them as 'sublang' = subordinate languages. The weird thing here is that they can be donors but not receivers. How is this possible? The 'subordinate' or 'hosted' languages/varieties/dialects/whatever have Cat:Terms derived from this.sublang (as donors to other languages) and Cat:Sublang terms derived from X.language (as receivers); e.g. MedLat alchemia at wikt:el:alchemia has a Cat:Med.Lat terms borrowed from Arabic. ‑‑Sarri.greek ^♫ I 15:03, 14 March 2024 (UTC)[reply]

@Sarri.greek I'm not keen on this; I'm not sure about Greek, but in English the term "subordinate" implies a lesser status, which is likely to put some contributors off. Theknightwho (talk) 16:23, 14 March 2024 (UTC)[reply]

I meant, M @Theknightwho, that they are marked at module sublang=true. If the question is about the 'name' of all of them, it doesn't matter. But, how could this title convey that these languages are not allowed what the others are? They are code-only languages with only existence, in the template {{m}} and being a donor but never a receiver at etym.templates. My big surprise, worry, and question is: why are they not receivers?? Probably this is not the place to ask this. I just bring it up because it is relevant, and because I do not intend to open such a subject myself. ‑‑Sarri.greek ^♫ I 16:41, 14 March 2024 (UTC)[reply]

@Sarri.greek We do often include them in descendant sections. Also, the elephant in the room is Chinese, which we already subdivide for this purpose; we simply group them all under one header. Theknightwho (talk) 16:44, 14 March 2024 (UTC)[reply]

Please, please, Sir, think about it! @Theknightwho, Benwing2 why should etymologies be inaccurate? Medieval Latin alchemia and similar LaMed words give the Cat:Latin terms derived from Arabic, which should include only a subcategory: Medieval Latin terms derived from Arabic. Cf. @el.wikt, where Cat.Lat.from.Ar has only this subcat. The etymologies of descendants should say 'from Med.Lat', not 'from Lat', because it is a medieval word. ‑‑Sarri.greek ^♫ I 16:58, 14 March 2024 (UTC)[reply]

@Sarri.greek This is an orthogonal point. I think you're asking for allowing etym-only languages in the |1= param of etym templates and categorize under e.g. CAT:Medieval Latin terms derived from Arabic in addition to CAT:Latin terms derived from Arabic. We don't currently do this but the trend is towards allowing etym-only languages in more places (hence this renaming discussion), so potentially we could allow this. IMO though this should be a separate discussion from what we should rename "etym-only language" to. Benwing2 (talk) 20:45, 14 March 2024 (UTC)[reply]
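For anyone trying to picture the double categorization described above, a minimal sketch follows. It assumes a toy table in which each code records its name and its parent (if any); the codes, the `LANGS` table and the function name are placeholders for illustration only, not the actual contents or API of Module:languages or the etymology templates.

```python
# Hypothetical data: code -> (canonical name, parent code or None).
# "la-med" is a placeholder code for Medieval Latin, used here for illustration.
LANGS = {
    "la": ("Latin", None),
    "la-med": ("Medieval Latin", "la"),
    "ar": ("Arabic", None),
}


def derived_categories(dest_code, source_code):
    """List "X terms derived from Y" for dest_code and each of its ancestors.

    Walking up the parent chain means a term tagged with a variety code is
    also categorized under the full language the variety belongs to.
    """
    source_name = LANGS[source_code][0]
    categories = []
    code = dest_code
    while code is not None:
        name, parent = LANGS[code]
        categories.append(f"{name} terms derived from {source_name}")
        code = parent
    return categories


if __name__ == "__main__":
    for category in derived_categories("la-med", "ar"):
        print("CAT:" + category)
```

With a full-language code in the first position (`"la"`), the same function yields only the single Latin category, which matches current behaviour as described in the comment above.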

Agreed. Vininn126 (talk) 20:47, 14 March 2024 (UTC)[reply]

May I suggest variety? It is the most neutral commonly-used term that comes to mind. ‘A variety of Spanish’ brings up some seven million hits on Google. Nicodene (talk) 15:41, 14 March 2024 (UTC)[reply]

Yeah, this is probably a better suggestion than "variant", and is a widely-used term. Theknightwho (talk) 16:20, 14 March 2024 (UTC)[reply]

So far variety is my top option. I understand the logic of merolect but I think we should avoid obtusisms if possible. Vininn126 (talk) 16:23, 14 March 2024 (UTC)[reply]

@Nicodene, Theknightwho, Vininn126: I'm also happy with variety. 0DF (talk) 17:47, 14 March 2024 (UTC)[reply]

"Dialect" isn't ideal, because we already have dialectal data modules. Likewise, "variety" isn't ideal, because language data has a field for "varieties" that is just a list of names (see e.g. Category:English language). — SURJECTION ^{/ T / C / L /} 18:41, 14 March 2024 (UTC)[reply]

@Surjection I’d argue the opposite: the two listed for English are both lects which would benefit from having a code of this type, so naming them “variety codes” make total sense. Theknightwho (talk) 19:45, 14 March 2024 (UTC)[reply]

Can you say the same of every "variety" currently specified for every language? — SURJECTION ^{/ T / C / L /} 20:26, 14 March 2024 (UTC)[reply]

Do you have an alternative suggestion? Vininn126 (talk) 20:27, 14 March 2024 (UTC)[reply]

My point is that we should avoid adopting terminology that is already used for something else. It's only going to make everything more confusing than it is. — SURJECTION ^{/ T / C / L /} 20:42, 14 March 2024 (UTC)[reply]

I don't think this is something else, though - the whole point of the varieties field is to list more specific types of the main language, which is precisely what these codes are for. I've not been able to find a counter-example to that yet, since alternative names for the language itself should go under the "aliases" field instead. Theknightwho (talk) 21:24, 14 March 2024 (UTC)[reply]

@Surjection I think we shouldn't worry about existing internal names. The "dialectal data modules" are probably going away in any case (see my post in the Grease pit) and we can rename the language data field. Benwing2 (talk) 20:40, 14 March 2024 (UTC)[reply]

BTW since most people seem to support the term "variety", maybe we can call the internal data field "variant" or "lect". Benwing2 (talk) 20:41, 14 March 2024 (UTC)[reply]

Sure, renaming that field is another option, and then we can call etymology-only language "varieties". — SURJECTION ^{/ T / C / L /} 20:42, 14 March 2024 (UTC)[reply]

@Surjection Can you provide an example of something which should go under the "varieties" field in the language data which shouldn't ever have an etymology-only code? Theknightwho (talk) 21:25, 14 March 2024 (UTC)[reply]

I'm not saying I know of any such cases. What I am saying is that nobody knows, until the work to check them is put in, that all of the currently registered "varieties" could reasonably have their own codes. — SURJECTION ^{/ T / C / L /} 21:31, 14 March 2024 (UTC)[reply]

Alright - I'll do that. Theknightwho (talk) 21:56, 14 March 2024 (UTC)[reply]

Just FYI I suspect that everything that qualifies as a "variety" under the variety field can reasonably have an etym-only code. We have current etym-only codes for conventional dialects (regional lects/topolects), chronolects (e.g. Early Modern English), registers/sociolects (e.g. Katharevousa), cants (e.g. Polari), even writing systems (e.g. Wade-Giles). It might be useful to set up the ability to categorize etym-only varieties by the type of lect involved; currently this info is found only in the associated category and only at the level of regional lect vs. everything else. Also, if we get serious about adding etym-only codes for all varieties, we might want to split the data into submodules the way we currently do for full languages. Benwing2 (talk) 22:32, 14 March 2024 (UTC)[reply]

If we're adding etym-only codes for all varieties, do we still need a "varieties" field in Module:languages, or is it just redundant to Module:etymology languages/data and its "parent"/"3" field? (I think the/an original reason varieties were listed in Module:languages is so people searching the module for e.g. Twi would find what code covered it; we might want to retain "varieties" that have ISO codes but let Module:etymology languages/data handle all the other ones...?) - -sche (discuss) 23:10, 14 March 2024 (UTC)[reply]

My (half-serious) suggestion last time this came up was "subsumed variety", since that seems to be the distinguishing characteristic (?), that these are codes that are subsumed under other codes. There are some edge cases like substrates which none of the proposed names fit, e.g. the pre-Roman substrate of the Balkans—or as it was recently (non-consensusly?) renamed, Paleo-Balkan—is not really a "dialect" or "subvariety" or "subsumed variety" of anything, it's "one or more unknown languages from place X". I agree with Surjection we shouldn't be using the same name for two nonidentical things, so if we call these "varieties", we should consider whether to rename or retire the "varieties" field in Module:languages as discussed above.
BTW, re "dialect", another issue with that term is: are things like "Classical Latin" and "Late Latin" "dialects", per se? (If they are, we need to update the entry dialect.) - -sche (discuss) 23:10, 14 March 2024 (UTC)[reply]

Anyway, I think "variety" is fine as long as we decide what to do with the "varieties" field in Module:languages. (In particular, if we're giving every one of Module:languages' "varieties" a code, do we still need the "varieties" field? Maybe we just make sure the "search for a language" thing on Module:languages also searches through the list of variety codes, and that solves the issue that "varieties" were IIRC initially added to Module:languages to deal with, which was if someone wondered what code to enter e.g. "Twi" under.) - -sche (discuss) 18:11, 23 March 2024 (UTC)[reply]

@-sche I think for the moment we will need to have the "varieties" field because a lot of the info in that field isn't well-vetted, meaning it will take work to convert the info in that field into proper etym-only languages. Since the current practice is to not put varieties in the "varieties" field that also exist as etym-only languages, I think we should rename the field "other lects" (or "other variants" or even just "other varieties"). As for the search box on Module:languages, it looks like it already does what you want it to do; e.g. if I search for "Twi", the first entry that comes up is in Module:etymology languages/data, which is the module holding etym-only languages (aka variety codes). Benwing2 (talk) 18:21, 23 March 2024 (UTC)[reply]

Proposal

@Theknightwho, -sche, Vininn126, Surjection, Nicodene, 0DF, Sarri.greek Pinging the people who took part in the above discussion. I'd like to formally propose renaming "etym-only language" to "language variety" and rename the "varieties" field in Module:languages to "lects".

Note also: As per my recent discussion Wiktionary:Grease pit/2024/March#merging lect info, I'd like for the "varieties"/"lects" field to go away in favor of consolidated info somewhere, probably in the labels data modules (which is where I've put such information for Chinese, see Module:labels/data/lang/zh). Then the lects can be pulled out of the label data by looking for labels with the parent field (which indicates they are lects in a tree of such lects). But whether you like or dislike this approach and/or would prefer a different one is orthogonal to the above proposal, which is only about renaming terminology that has outlived its usefulness.
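As a rough illustration of pulling lects out of label data by looking for a parent field, here is a minimal sketch. The `LABELS` table, its shape and the helper names are assumptions made up for this example; the real label data modules may be structured quite differently.

```python
# Hypothetical label data: label name -> properties.
# A label with a "parent" key is treated as a lect in a tree of lects;
# a label without one (e.g. a register or usage label) is not.
LABELS = {
    "Mandarin": {"parent": "Chinese"},
    "Beijing": {"parent": "Mandarin"},
    "Cantonese": {"parent": "Chinese"},
    "colloquial": {},
    "archaic": {},
}


def extract_lects(labels):
    """Return {lect name: parent name} for every label that has a parent."""
    return {
        name: props["parent"]
        for name, props in labels.items()
        if "parent" in props
    }


def children_of(lects, parent):
    """Return the direct sub-lects of `parent`, sorted by name."""
    return sorted(name for name, p in lects.items() if p == parent)


if __name__ == "__main__":
    lects = extract_lects(LABELS)
    print(children_of(lects, "Chinese"))
    print(children_of(lects, "Mandarin"))
```

The point of the design is that the lect list is derived from one source of truth rather than maintained separately in a "varieties"/"lects" field.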

Implementation: This should not be terribly hard as AFAIK we don't have any templates that specifically reference etymology-only languages using that name. We'd just need to rename Module:etymology languages and associated data modules to Module:language varieties, change any references to those modules in other code (which shouldn't be that many since most of them go through Module:languages anyway), and update documentation. We should also consider (at the same time or later) renaming the methods getNonEtymological, getNonEtymologicalName and getNonEtymologicalCode to getFullLanguage, getFullLanguageName and getFullLanguageCode. This can be done by renaming the methods in Module:languages while keeping the old ones as aliases until all callers are updated. Note that this is all purely internal and won't affect any mainspace Wikicode. Finally, we need to rename the "varieties" field in the languages extra data to "lects"; again this is all internal, and hardly anyone references or uses this data so it won't be very much effort.
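The rename-with-aliases step can be sketched as follows. This is Python rather than the Lua the modules are written in, and the class, its fields and the snake_case method names are stand-ins invented for this example (mirroring getFullLanguage and getNonEtymological); it only shows the general pattern of keeping the old name as a thin alias until all callers are updated.

```python
import warnings


class Language:
    """Toy stand-in for a language object; not the real Module:languages API."""

    def __init__(self, code, name, parent=None):
        self.code = code
        self.name = name
        # For a language variety, `parent` is the object it is subsumed under;
        # for a full language it is None.
        self.parent = parent

    # New name.
    def get_full_language(self):
        """Return the full language this object belongs to (itself if full)."""
        lang = self
        while lang.parent is not None:
            lang = lang.parent
        return lang

    def get_full_language_code(self):
        return self.get_full_language().code

    # Old name, kept as an alias so existing callers keep working.
    def get_non_etymological(self):
        warnings.warn(
            "get_non_etymological is deprecated; use get_full_language",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_full_language()


if __name__ == "__main__":
    english = Language("en", "English")
    early_modern = Language("en-ear", "Early Modern English", parent=english)
    print(early_modern.get_full_language_code())
```

Emitting a warning from the alias is optional, but something like it makes it easier to find the remaining callers before the old names are removed.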

Please indicate support, abstain, oppose, etc. below so that we have clear consensus for doing this.

Benwing2 (talk) 05:36, 2 April 2024 (UTC)[reply]

Support what you want since it is internal anyway and you needed to do something about redundant data fields. Fay Freak (talk) 05:43, 2 April 2024 (UTC)[reply]

Support Vininn126 (talk) 05:52, 2 April 2024 (UTC)[reply]

Support — SURJECTION ^{/ T / C / L /} 06:02, 2 April 2024 (UTC)[reply]

Support 0DF (talk) 06:46, 2 April 2024 (UTC)[reply]

Support, support! ‑‑Sarri.greek ^♫ I 10:13, 2 April 2024 (UTC)[reply]

@Theknightwho, -sche, Vininn126, Surjection, Nicodene, 0DF, Sarri.greek Sorry for the second ping. I went to implement the first part of this (`varieties` field -> lects) and I realize there's a small issue, which is that there is currently a `varieties` field for all three of languages, families and scripts, and "lects" is only applicable to languages. We could keep `varieties` for families and scripts except there are also "family varieties" renamed from etymology-only families (just Old and Middle Iranian languages). There are two other possibilities I can think of, which are "variant" and "subvariety". I think subvariety might be confusing in that people would think there's something inherently "lesser" about the subvarieties listed in the extra data, so I propose "variant". Benwing2 (talk) 19:43, 5 April 2024 (UTC)[reply]

@Benwing2: It's not as good, but since it's all internal, it doesn't matter much, so I'm also OK with variant. 0DF (talk) 20:25, 5 April 2024 (UTC)[reply]

Not to get off-topic, but given the way we use "Middle Iranian"—that we actually reconstruct terms in it, "Middle Iranian *foobar"—it does not make sense to me that we're calling it a family code. I seem to recall it was just one user who insisted we mustn't internally put it in the lect module? But maybe we just take a !vote and see whether people agree with that, if treating it as a family is also complicating other things. If it's not a lect, then I'm not sure it makes sense to be reconstructing terms in it; if we're reconstructing terms in it, then we're ipso facto treating it as a lect, similar to a reconstructed language (like how some people think Proto-Indo-European may have been more of a dialect continuum than a unitary lect, but we still treat it as a lect for our purposes). - -sche (discuss) 22:59, 5 April 2024 (UTC)[reply]

@-sche Pinging User:Vahagn Petrosyan who I think is the one who mostly uses it, and User:Theknightwho who may have opinions. IMO neither "Middle Iranian languages" nor "Old Iranian languages" should exist at all but it seems that Armenian specialists like to reconstruct terms in "unspecified Middle Iranian" and "unspecified Old Iranian". Maybe these should be treated as etym-only languages (aka language varieties) whose parent is a family (which is allowed), and called "unspecified Middle Iranian" etc. since that's what they are. I should note though that currently we have the variety field filled out in various places for all three of languages, families and scripts. If we eliminate "family varieties" "Middle Iranian" and "Old Iranian", we could keep the variety as variety for families and scripts and use lect for languages, although that might be slightly confusing; or we could rename "etym-only languages" to "lects" instead of "varieties". I dunno. Benwing2 (talk) 23:28, 5 April 2024 (UTC)[reply]

As I said before, I can stop using Middle Iranian and Old Iranian. Simply "Iranian" is good enough for me. Vahag (talk) 11:01, 7 April 2024 (UTC)[reply]

I can also, since you have a technical background providing a reason for it, basically simplification, as the sole reason I use etym-only-languages “Middle Iranian” and “Old Iranian” is periodization, me informing the reader that I have an idea whether the Greek term (for example) was borrowed 0–700 CE or 700–0 BCE. One can see internal differences between Old and Middle Iranian but they are only rough distributions. Inb4 Victar is annoyed due to witnessing a change he has not consulted about, in spite of seeing this discussion header – techbro imperialists removing languages, d'oh! (But really changing the internal relations of preserved langcodes does not change the dictionary content, so one can be bold about recoding them, as you do nothing without affording most diligent advice.) Fay Freak (talk) 12:33, 7 April 2024 (UTC)[reply]

Support variant even better! thank you for your hard work! ‑‑Sarri.greek ^♫ I 20:14, 5 April 2024 (UTC) PS The actual situation of it, is 'hosted' or 'subordinate to' = We find this language hosted Under the title 'XX language' Regardless of the reason (etymological, regional, or other). ‑‑Sarri.greek ^♫ I 21:14, 5 April 2024 (UTC)[reply]

Wiktionary really needs structured etymology

I've become convinced that Wiktionary's current etymology system, in which each entry contains the complete ancestry of a term, is creating massive problems that prevent Wiktionary from being a good etymological dictionary. Here are the problems:

Massive duplication: consider English puny and its earlier form puisne. We repeat the exact same information in different entries, blatantly violating the DRY principle. That's just two entries: the etymology of a widely-borrowed term like the ancestor of English sugar has to be duplicated across hundreds of languages. Often, editors don't bother and just write something like "see term#English for more details".
Entries falling out of sync: English nexus claims to derive from Proto-Indo-European *gned- or *gnod- through Latin necto. But necto was recently revised and now says that its origin is "uncertain". Which entry is a reader meant to trust? This kind of inconsistency is actually encouraged by the current system, because after editing the etymology of a term, an editor has to hunt down and correct every single place where that etymology is referenced or copied. More often than not, they don't, and the result is that entries can drift out of sync and sometimes even contradict each other.
Redundant edits: editors spend large amounts of time expanding "derived terms" and "descendants" sections, which is necessary only because of limitations in the current system. If we know that A is an ancestor of B, there's no point in also writing that B is a descendant of A, since that's clearly implied. But we have to anyway, because there's no automated system that can make that logical step.

Structured etymologies would also let us do cool things, like create etymological trees and automatically find cognates and doublets across different languages.

Here is a simple model for creating structured etymologies:

Each etymology section of an entry needs to be associated with one or more etymons. An etymon is a term which is the ancestor of another with no intermediate steps. Thus, the etymon of puny is puisne, the etymon of puisne is Anglo-Norman puisné, and so on.
An entry can have more than one etymon. For example, English arrangement can be said to derive from English arrange, English -ment, and French arrangement.
There are different kinds of etymons: English fullwidth clearly derives from full +‎ width, but is also calqued from Japanese 全角 (zenkaku). The first two are morphological etymons, while the last is a semantic etymon. Another example is bullroar (sense 3) which is morphologically from bull +‎ roar but semantically from bullshit.
An etymon can also have a degree of certainty: the levels might be "certain", "likely", and "unlikely". This is an improvement from the current system, where something is either {{derived}} or it isn't. Sometimes, when editors aren't confident, they add |nocat=1, but this isn't a standardized practice.

Thus, to create a list of derived terms or descendants, all you would need to do is get a list of entries which have a particular term as an etymon.
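To make the model above concrete, here is a minimal sketch in Python (not Lua, and not tied to any existing module). All names in it (`Term`, `EtymonLink`, `ETYMONS`, `build_reverse_index`, `ancestors`) are hypothetical and exist only for illustration. It assumes each entry stores nothing but its immediate etymons, then shows that descendants fall out of a reverse index and that the full ancestry can be recovered by walking upward.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record types; nothing here mirrors an existing Wiktionary module.
@dataclass(frozen=True)
class Term:
    lang: str
    word: str


@dataclass(frozen=True)
class EtymonLink:
    etymon: Term
    kind: str = "morphological"  # or "semantic"
    certainty: str = "certain"   # or "likely", "unlikely"


# Each entry records only its immediate etymons (examples taken from the post).
ETYMONS = {
    Term("en", "puny"): [EtymonLink(Term("en", "puisne"))],
    Term("en", "puisne"): [EtymonLink(Term("xno", "puisné"))],
    Term("en", "fullwidth"): [
        EtymonLink(Term("en", "full")),
        EtymonLink(Term("en", "width")),
        EtymonLink(Term("ja", "全角"), kind="semantic"),
    ],
}


def build_reverse_index(etymons):
    """Map every etymon to the terms that name it as an immediate etymon."""
    index = defaultdict(list)
    for term, links in etymons.items():
        for link in links:
            index[link.etymon].append(term)
    return index


def ancestors(term, etymons):
    """Walk upward through immediate etymons, listing each ancestor once."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        for link in etymons.get(current, []):
            if link.etymon not in seen:
                seen.append(link.etymon)
                stack.append(link.etymon)
    return seen


REVERSE = build_reverse_index(ETYMONS)
```

Looking up `REVERSE[Term("en", "puisne")]` gives the terms derived directly from puisne, and `ancestors(Term("en", "puny"), ETYMONS)` rebuilds the chain that is currently written out by hand in each entry.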

The main problem to consider is how all this can be accomplished. Here are the possibilities:

Use Lua data modules, which are well-established on this wiki but are fairly unintuitive for new users and might cause performance issues.
Get an extension like Wikibase, which is already used on Wikidata. To be clear, I'm not saying we should turn Wiktionary into Wikidata, but rather use a Wikidata-like structure for this application. The drawback is that this would require WMF developers to get something done (not their strong suit).
Use bots. A bot can essentially function as a parser which converts high-level information into wikitext. This kind of thing is already being done with our {{anagrams}} system. This is the technically simplest solution, but would require someone to continually run a bot.
Do nothing and keep writing etymologies manually. This is easier for now, but probably not a great long-term solution.

Another problem to consider is how this structured information should be presented to the reader. But all in all, I'm curious as to what the community thinks should be done. Ioaxxere (talk) 23:18, 14 March 2024 (UTC)[reply]

I agree that the current way etymologies are handled is problematic. If there is some technical solution to that, it would be great; I don't understand that side of things so am not sure what can be done.--Urszag (talk) 00:17, 15 March 2024 (UTC)[reply]

@Ioaxxere IMO none of your proposed solutions is workable. I would rather suggest a scraping solution. This is what we do for Descendants, for example, and it seems to work fairly well. (Your proposed solution #1 was tried for Descendants prior to implementing scraping, and failed, which led to the scraping solution.) This should not be too hard, but it might require the introduction of a few more templates to more clearly spell out the relations between etyms. Not sure. Benwing2 (talk) 00:46, 15 March 2024 (UTC)[reply]

@Benwing2 By scraping, do you mean creating an etymology equivalent of {{desctree}}? I think the main limitation of this is that you can only go in one direction (i.e. you wouldn't be able to use the etymology data to get a list of descendants). But I would definitely consider that an improvement over the current situation.

Actually: I thought of a way to resolve this, by using the category system. We already have categories like Category:English terms suffixed with -en. If we created categories for every single "terms descended from LEMMA" (there would be millions) this could theoretically be used to encode a tree. What do you think? Ioaxxere (talk) 02:15, 15 March 2024 (UTC)[reply]

@Ioaxxere Yes, something like {{desctree}}. It's true this wouldn't easily let you go from lists of descendants to ancestors and vice-versa, but (a) it would solve the other issues, (b) it's not clear in any case you would want to automate things in both directions in all situations; there are lots of complex cases involving etymologies that can't be neatly categorized and need descriptive text, and the Descendants lists and Etymology sections are conceptually different in their current implementations. Since the etymology sections are less structured than Descendants sections, some thought would have to go both into the conventions needed in Etymology sections so that the scraping result looks reasonable, and into how to implement the scraping itself and handle the various edge cases. Not something I have time to work on now but I agree it would be a good idea in the longer run and avoid lots of duplication and the inevitable bit rot associated with this (maybe there's a better term than "bit rot" to describe the inevitability of things getting out of sync when you have duplication). Benwing2 (talk) 03:28, 15 March 2024 (UTC)[reply]

I wholeheartedly agree but think you may be underestimating the sheer scale of the undertaking. Even with bot help it will take a lot of hands-on work by knowledgeable editors. My instinct is to pare this down to the basic core: set etymologies to point only one lemma higher and add the kind of 'scraper' currently being discussed. (And then call in the clean-up crew for a massive spectrum of languages...) As someone who edits mainly ety and desc sections, this alone, if feasible, would clear up 80% of my headaches. Nicodene (talk) 05:39, 15 March 2024 (UTC)[reply]

I don't disagree with 'make etymologies only point to the next level up' as a goal, but the issues that have come up when that's been discussed before, which you are probably aware of but which I want to make sure are mentioned here now that the idea itself has been mentioned, include (1) what if the next level up doesn't have an entry? e.g. I just added an etymology to chabot, but neither the Occitan chabotz nor the Latin capoceus exists to house the information that they ultimately seem to come from caput. (maybe in that case we add the full ety to chabot but also add a template that categorizes the entry as needing Occitan and Latin editors to help by creating those entries and moving the information thither?) (2) someone who's only interested in e.g. the etymology of English or French words now has to watchlist Occitan and Latin (etc) pages to see if that etymology gets changed; in some cases where minor languages see vandalism, this would make it less likely to be spotted (e.g. people try to add all kinds of weirdness to Kamboja, and if they were adding it to some less-watched Indian language instead, it'd get noticed less). (Also 3: if I want to know how many Old English words survive in English, and our English entries only point to Middle English, I'm stymied... but that one we could solve if the scraper/bot mass-adds that template that people currently use to add categories to etymologies.) - -sche (discuss) 06:39, 15 March 2024 (UTC)[reply]

For 1) I meant the next entry up which exists, and if it's a dead end, the etymology is left as-is. The proposed change wouldn't affect the chabot situation one way or another. For 2) I don't have an answer. Is vandalism that bad of a problem? Maybe I've just not noticed since I'm not the one dealing with it. 3) I wouldn't do this without including some kind of automatic categorization that runs through the etymology chain, or maybe regular bot sweeps that add/fix dercat. Not that I know how feasible that is, now that you mention it. Nicodene (talk) 09:44, 15 March 2024 (UTC)[reply]

For point 2 with Wikibase, the model is not Wikidata but c:Commons:Structured data. It's a new tab with data that bots can populate from key templates. Lua can access it recursively to create new and more powerful templates. Other tools like SPARQL can query the data. It's all about how to model metadata. Vriullop (talk) 09:33, 15 March 2024 (UTC)[reply]

I generally support this idea but I have no strong opinions on what the exact solution should be. I like being able to point to a specific etymon to generate structure from there, however that structure looks. Vininn126 (talk) 10:11, 15 March 2024 (UTC)[reply]

I've been thinking about this, and I'm starting to feel that a category system is the only sensible way to implement this. Here's how it would work:

Start at an entry (say biology)
Add an etymon template to the top of the etymology section. It might be formatted like this: {{etymons|en|id=life science|Biologie#German: biology|bio-#English: life|-logy#English: study}}. The |id= parameter defines the {{etymid}}, while the subsequent parameters link to etymons by their etymids.
The {{etymons}} template adds the category Category:ety:biology (English: life science), which represents a node in the etymology tree (although the naming scheme isn't final).
A bot creates the category with {{auto cat}}.
{{auto cat}} scrapes the page biology and discovers the {{etymons}} template. Using this information, it adds Category:ety:biology (English: life science) into the categories Category:ety:Biologie (German: biology), Category:ety:bio- (English: life), and Category:ety:-logy (English: study).

Now, getting the descendants or derived terms of biology is as simple as seeing what entries are in Category:ety:biology (English: life science). There might be subpages, like Category:ety:biology (English: life science)/uncertain or Category:ety:biology (English: life science)/semantic, to include cases I discussed in my original post. But overall, the concept is essentially {{prefixsee}} or {{suffixsee}}, just for every term. @Benwing2, would you support implementing this template? It could coexist in parallel with the current system for now until we figure out a way forward.
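The category naming in the steps above can be sketched as follows. This is Python for illustration only, not the Lua that {{auto cat}} would actually need, and the helper names (`parse_etymons`, `node_category`, `parent_category`, `categories_for`) and the tiny `LANG_NAMES` table are assumptions. It does not handle nested templates or the "/uncertain" and "/semantic" subpages.

```python
# Assumption: a tiny stand-in for the real language data.
LANG_NAMES = {"en": "English"}


def parse_etymons(wikitext):
    """Split a flat {{etymons|...}} call into (lang code, etymid, parent params)."""
    parts = wikitext.strip()[2:-2].split("|")
    if parts[0] != "etymons":
        raise ValueError("not an {{etymons}} call")
    lang = parts[1]
    etymid = None
    parents = []
    for part in parts[2:]:
        if part.startswith("id="):
            etymid = part[3:]
        else:
            parents.append(part)
    return lang, etymid, parents


def node_category(word, lang_name, etymid):
    """Build the category name that stands for one node of the tree."""
    return f"Category:ety:{word} ({lang_name}: {etymid})"


def parent_category(param):
    """Turn 'Biologie#German: biology' into the category of that etymon."""
    word, rest = param.split("#", 1)
    lang_name, etymid = rest.split(": ", 1)
    return node_category(word, lang_name, etymid)


def categories_for(page, wikitext):
    """Return the page's own node category and the parent categories it joins."""
    lang, etymid, parents = parse_etymons(wikitext)
    own = node_category(page, LANG_NAMES[lang], etymid)
    return own, [parent_category(p) for p in parents]
```

Calling `categories_for("biology", ...)` on the example template yields the entry's own node category plus the three parent categories it would be placed in.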

To answer a few others:

@Nicodene: Yes! Having etymologies point only one lemma higher is the entire purpose of this proposal. Because if we have a chain A -> B -> C, there's no reason why C needs to "know" that it comes from A. It's implicit. The problem is that editors spend lots of time writing out the entire chain on every entry, and this is done in an inconsistent way. But there's no rush to implement this on a massive scale right away. As stated above, we should have this coexist with the current system.
@-sche: Those are honestly good questions. In the case of A -> B -> C, if B doesn't exist, it might be reasonable to create it as a "dummy entry" with an etymology section and nothing else. Another possibility is to just link A -> C. For the second point, I don't think we should be designing our systems with the expectation of vandalism. But yes, an editor would have to watch a variety of pages to follow an entire etymological chain. However, someone who's really only interested in English etymology wouldn't care about, say, which PIE root an entry comes from, because that's not English etymology. In the case of French chabot, we might do {{etymons|fr|id=fish|caput#Latin: head|q1=uncertain}}, which would add it into Category:ety:caput (Latin: head)/uncertain.
@Benwing2: I think the term you're looking for might be entropy.

Ioaxxere (talk) 14:43, 15 March 2024 (UTC)[reply]

@Ioaxxere I think we need a solution that can work with existing etymologies. That probably means accessing a chain of data if possible from a single entry, and if that doesn't work, falling back to accessing from multiple entries in a chain. I don't think having an entirely new system in place in addition to the old system will work very well. Benwing2 (talk) 19:32, 15 March 2024 (UTC)[reply]

@Benwing2: What you're asking for seems impossible. The current system is ambiguous in that we link to an entry without specifying its etymid, meaning that "going up the chain" is rarely possible to do in an automated way. If your plan then involves specifying etymids for every etymology section, then we might as well just overhaul everything, because it's the same amount of work anyway. Ioaxxere (talk) 20:53, 15 March 2024 (UTC)[reply]

@Ioaxxere I think we'd have to examine some actual use cases before deciding it's impossible. In many cases there's only one Etymology section, for example. Benwing2 (talk) 21:00, 15 March 2024 (UTC)[reply]

@Benwing2: the problem with this heuristic is that a) there's no guarantee that the single-etymology entry is actually the correct one (maybe the actual ancestor hasn’t been added yet) and b) it could break unpredictably if someone adds a new etymology section later. The basic use case, in my view, is to replace our current "from X, from Y, from Z" with just "from X" and have the rest be automatically filled in. Those on the Discord will have seen my struggles in trying to do just this. Ioaxxere (talk) 21:29, 15 March 2024 (UTC)[reply]

@Ioaxxere Sure but realistically I don't think trying to implement a completely new system will work. We need to find a solution that leverages what's already there. Benwing2 (talk) 21:35, 15 March 2024 (UTC)[reply]

@Benwing2: We can leverage our current data by using bots to convert etymology sections into a structured format, but this can only be done in situations where we are certain that errors won't be propagated. For example: if A is listed as an ancestor of B#Etymology_2, and we find B listed as a derived term or descendant at A#Etymology_2, we can fairly confidently connect A#Etymology_2 and B#Etymology_2. I have implemented this heuristic in my own script and it works very well. Ioaxxere (talk) 03:26, 16 March 2024 (UTC)[reply]
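A rough illustration of the reciprocal-link heuristic described in the comment above, written in Python. This is not the script mentioned there; the data layout and the name `confirmed_links` are assumptions. The sketch also adds one extra condition that the comment does not state: a link is accepted only when exactly one section on each side matches.

```python
def confirmed_links(sections):
    """Find ancestor/descendant pairs that both entries agree on.

    `sections` maps page -> {section number -> {"ancestors": set of pages,
    "descendants": set of pages}}. Returns tuples of
    (ancestor page, ancestor section, child page, child section).
    """
    links = []
    for child_page, child_sections in sections.items():
        for child_num, child in child_sections.items():
            for ancestor_page in sorted(child["ancestors"]):
                # Sections of the child that name this ancestor.
                naming = [
                    num
                    for num, sec in child_sections.items()
                    if ancestor_page in sec["ancestors"]
                ]
                # Sections of the ancestor that list the child back.
                matches = [
                    num
                    for num, sec in sections.get(ancestor_page, {}).items()
                    if child_page in sec["descendants"]
                ]
                # Assumption: accept only an unambiguous match on both sides.
                if len(naming) == 1 and len(matches) == 1:
                    links.append((ancestor_page, matches[0], child_page, child_num))
    return links


# Toy data shaped like the A/B example in the comment above.
SECTIONS = {
    "A": {
        1: {"ancestors": set(), "descendants": set()},
        2: {"ancestors": set(), "descendants": {"B"}},
    },
    "B": {
        1: {"ancestors": set(), "descendants": set()},
        2: {"ancestors": {"A"}, "descendants": set()},
    },
}
```

Here `confirmed_links(SECTIONS)` connects A#Etymology_2 with B#Etymology_2 and leaves everything else untouched, so pairs that are not confirmed from both sides would still need a human.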

@Ioaxxere Just FYI, before you set off to radically restructure etymologies, you need to (a) get consensus, (b) keep in mind what will be workable for the typical editor; ideally the system should be as little different as possible from what we have already. It's also better to do this stuff dynamically through scraping if at all possible, vs. requiring a bot to run periodically. Benwing2 (talk) 04:16, 16 March 2024 (UTC)[reply]

@Benwing2 It seems as though there's consensus for a change of some kind, but no agreement as to how it should be implemented. And that's something I'm also still thinking about... Ioaxxere (talk) 07:32, 17 March 2024 (UTC)[reply]

Ben's idea on something like {{desctree}} would imply that {{bor}}/{{inh}} would be able to check the pointed-at etymon and print information from there, and potentially go several pages back. If such a system were implemented, I think {{af}} should obviously be excluded: imagine printing the information for all the morphemes! It also wouldn't work for redlink pages, just the same as {{desctree}}. Vininn126 (talk) 07:42, 17 March 2024 (UTC)[reply]

But, as mentioned, every word would need etymid's and etymology sections, of course... Vininn126 (talk) 07:48, 17 March 2024 (UTC)[reply]

@Benwing2, Vininn126 I created a mockup of my concept at User:Ioaxxere/under. I created a module, Module:User:Ioaxxere/etymon, which can recursively go backwards through various entries to build a chain of etymons (no categories are involved). Currently, it can only handle "From X, from Y"-type etymologies, so we'll need more complex parameters to represent stuff like {{af}}. Please let me know what you think! Ioaxxere (talk) 05:15, 18 March 2024 (UTC)[reply]

@Ioaxxere I took a brief look. I really don't want to be a party pooper but my sense is the scraping needs to be a lot more sophisticated and more able to work with existing entries (I feel I've said this before). Manual or even manual-assisted conversion of large numbers of entries to a new system/template just won't be happening any time soon. The reason {{desctree}} works is that it works with existing entries without requiring everything to be converted to a new format (and to the extent things have been converted, like when I changed {{desc}} to accept multiple terms, it's been in a completely automated fashion). Benwing2 (talk) 05:22, 18 March 2024 (UTC)[reply]

@Benwing2 Take the example of father. Let's say I want the etymology to be synced up with its etymon, Middle English fader. But wait, do we want fader (Etymology 1) or fader (Etymology 2)? A human would obviously realize that the correct section is etymology 1. An automated scraping template could easily figure this out as well if we added heuristics like "Etymology 1 is a lot longer" and "Etymology 1 is on top" and "Etymology 1 links to father in its descendants section" and "Etymology 1 and father list the same ancestors" and "Etymology 1 is defined as father". The problem is that these heuristics can get arbitrarily complex and break in unpredictable ways. That's why we should be working towards using etymids.

Also, I have a question about {{desctree}}: how would I get it to scrape bar#Descendants_2? The entry doesn't have etymids, so this doesn't seem to be possible.

"Manual or even manual-assisted conversion of large numbers of entries to a new system/template just won't be happening any time soon."

Would you be opposed to trying out a new system on a few entries, such as father and its five ancestors? Like {{desctree}}, this would have no effect on any other entry. Ioaxxere (talk) 06:11, 18 March 2024 (UTC)[reply]

@Ioaxxere: I see a number of possible problems. Can you assure me that they aren't?

1. Intermediate steps may be unattested.

2. Uncertainty as to the borrowing route. The OED has this problem with terms that may come from French or some form of Latin, and Thai has many words for which the ultimate source is Pali or Sanskrit, and indeed some which are blends of the two. A further problem is that many of these words were probably (but not certainly) borrowed via Old Khmer, where the spelling is chaotic. The path of mainland SE Asian loans from Pali or Sanskrit may be very uncertain.

3. Clusters of 'obvious' cognates, but for which there is no authoritative proto-form. Tai languages often show this.

4. Word A is derived in language X, and word B in language Y may be inherited from A in X or independently formed in Y. RichardW57m (talk) 16:51, 18 March 2024 (UTC)[reply]

@RichardW57m Here are my proposed resolutions.

1. User:-sche highlighted this issue with entries like French chabot, which a reference suggests derives from Latin caput through Vulgar Latin *capoceus (which is unlikely to ever be created). This could be written as: {{etymon|id=fish|der|unc|la>caput>head|text=perhaps from <1> (via Occitan, from unattested {{m+|VL.|*capoceus}})}}.

In natural language, this represents: French chabot (etymid: fish) may be derived from Latin caput (etymid: head), but this is uncertain. Also, the entry should display the text "perhaps from Latin caput (via Occitan, from unattested Vulgar Latin *capoceus)".

The template would also be able to automatically fetch the ancestors of Latin caput (etymid: head), although we probably don't want that in this case.

-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]

2. One example of this is in English crusado, which is partially borrowed from Spanish cruzado as well as Portuguese cruzado. This could be written as: {{etymon|id=crusader|bor|es>cruzado>cross|pt>cruzado>cross}}

In natural language, this represents: English crusado (etymid: crusader) is borrowed from either/both Spanish cruzado (etymid: cross) and Portuguese cruzado (etymid: cross). The template would automatically generate the text "Borrowed from Spanish cruzado and/or Portuguese cruzado." in the entry (the |text= parameter could be used to change the display text).

-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]

@Ioaxxere: What I particularly had in mind is cases where one possible source is 'inherited' from the other. (In the case of Pali and 'Sanskrit', this requires that we write in Wiktionarian.) This example doesn't address that issue. RichardW57m (talk) 12:46, 19 March 2024 (UTC)[reply]

@RichardW57m: it would be helpful if you told me the specific case you have in mind. However, in the example I gave it doesn't matter at all how the Spanish and Portuguese terms are connected. Ioaxxere (talk) 15:30, 19 March 2024 (UTC)[reply]

@Ioaxxere: I was having trouble finding a clean example. I think our etymologisers have wrongly assumed that a Sanskrit-based spelling implies borrowing from Sanskrit, rather than reshaping. A clean example is พันธุ์ (pan). --RichardW57m (talk) 16:48, 19 March 2024 (UTC)[reply]

@RichardW57m: Thank you for the example! Before I answer, I need to get some clarity on the meaning of "or" in that etymology. Does it mean that the term was borrowed either from Sanskrit or Pali (but definitely not both), or does it suggest that the term could have been borrowed from both languages at different points? This is the kind of ambiguity that I'm hoping to eliminate. Ioaxxere (talk) 17:38, 19 March 2024 (UTC)[reply]

@Ioaxxere: I find it hard to see how if it could have been borrowed from either then it could not have been borrowed by both. Indeed, it could have been borrowed from both simultaneously, and on multiple occasions. Now, in Thai, there are some cases where a word seems to have been borrowed from Pali and then the spelling upgraded to Sanskrit, e.g. ธมม (or had it become ธัมม?) from Pali being replaced by ธรรม (tam) with ร (rɔɔ) from Sanskrit (but with gemination of the letter seemingly being the borrowing of a Pali spelling pattern not applicable to ร (rɔɔ) in words of Pali origin). --RichardW57m (talk) 18:00, 19 March 2024 (UTC)[reply]

@RichardW57m: In that case, it seems like the etymology is equivalent to crusado. The code used in the Thai entry describes the immediate origin of the Thai term, so the fact that the Pali and Sanskrit terms are related doesn't actually change anything. So the code would be: {{etymon|th|id=breed|bor|sa>बन्धु>kinsman|pi>bandhu>kinsman}}, which might produce "Partially borrowed from Sanskrit बन्धु (bandhu) and Pali bandhu." Ioaxxere (talk) 18:47, 19 March 2024 (UTC)[reply]

@Ioaxxere: In which case {{etymon}} should not be used. --RichardW57m (talk) 09:53, 20 March 2024 (UTC)[reply]

@RichardW57m: I'm not sure what you mean by this. Is there something wrong with what I said? Ioaxxere (talk) 17:45, 20 March 2024 (UTC)[reply]

@Ioaxxere: Yes. It rather implies that there is some third source of the word.

Moreover, with the tight restriction on how far back tracing would go, our statements would not be compatible with the word actually being borrowed via Khmer. I did begin to have some doubts as to the origin of the apocope in this word; I am not totally sure that it is part of the mechanism of directly borrowing from Pali and Sanskrit, especially as some words have been borrowed without apocope. Possibly we can stick in 'ultimately' to cover ourselves. The dropping of final -a from the 'rough forms' (i.e. stems) of words has been incorporated in the way Thai borrows from Pali and Sanskrit, but not as a mandatory process. (I suspect some words may also have been borrowed as the first elements of compounds, thereby preserving the final -a.) --RichardW57m (talk) 18:12, 20 March 2024 (UTC)[reply]

@RichardW57m: I see what you mean. In that case, I suggest "Borrowed from Sanskrit बन्धु (bandhu) and/or Pali bandhu." If we want to allow the possibility of an intermediate step or steps, it could be "Derived from Sanskrit बन्धु (bandhu) and/or Pali bandhu." (in which case the template would be using |der rather than |bor). Also, by "borrowed via Khmer", are you referring to Khmer ព័ន្ធុ (pŏənthuʼ)? If so, that can easily be added in as well. Ioaxxere (talk) 19:07, 20 March 2024 (UTC)[reply]

@Ioaxxere: I'm not sure why you're excluding borrowing in the latter case, but these wordings are better. For borrowing via Khmer, I would have expected the final written vowel to have been silent, but the word you give is definitely a cognate. --RichardW57m (talk) 09:24, 21 March 2024 (UTC)[reply]

3. If there is no proto-form, the {{etymon}} template wouldn't have any ancestors listed and wouldn't be very useful. If for some reason the only thing we knew about English king was that it was cognate with German König, the entry might have: {{etymon|id=monarch}} Cognate with {{m+|de|König}}. (Note: for now, I'm not sure if it's possible to automatically get cognates as we would have to go up and then down the etymology tree, although it might be possible with category stuff).

-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]

4. This one's simple: English unlock (for example) is from Middle English unloken but also equivalent to un- +‎ lock. This could be written as: {{etymon|id=open lock|enm>unloken>unlock|afeq|un->inverse|lock>mechanism}}.

In natural language, this represents: English unlock (etymid: open lock) is inherited from Middle English unloken (etymid: unlock), and is also equivalent to English un- (etymid: inverse) + English lock (etymid: mechanism). The template would automatically generate the text "From Middle English unloken, equivalent to un- +‎ lock." in the entry.
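As a sketch of how this parameter syntax might be read, here is a small Python parser for the flat forms used in the crusado and unlock examples. It is an illustration only, not the module's actual code. The function name `parse_etymon`, the returned dictionary layout, and the use of "inh" as the default keyword when none is given are all assumptions; the keywords bor, der, af, afeq and the modifier unc are the ones used in this thread. Nested templates (as in a |text= parameter) and a leading language-code parameter are not handled.

```python
KEYWORDS = {"inh", "bor", "der", "af", "afeq"}
MODIFIERS = {"unc"}
DEFAULT_KEYWORD = "inh"  # assumption: an etymon with no keyword is inherited


def parse_etymon(wikitext, entry_lang):
    """Parse a flat {{etymon|...}} call into its etymid and keyword groups."""
    parts = wikitext.strip()[2:-2].split("|")
    if parts[0] != "etymon":
        raise ValueError("not an {{etymon}} call")
    result = {"id": None, "groups": []}
    current = None

    def new_group(keyword):
        group = {"keyword": keyword, "uncertain": False, "etymons": []}
        result["groups"].append(group)
        return group

    for part in parts[1:]:
        if part.startswith("id="):
            result["id"] = part[3:]
        elif part in KEYWORDS:
            current = new_group(part)
        elif part in MODIFIERS:
            if current is None:
                current = new_group(DEFAULT_KEYWORD)
            current["uncertain"] = True
        else:
            if current is None:
                current = new_group(DEFAULT_KEYWORD)
            pieces = part.split(">")
            if len(pieces) == 3:
                lang, word, etymid = pieces
            elif len(pieces) == 2:
                # No language code: assume the etymon is in the entry's language.
                lang = entry_lang
                word, etymid = pieces
            else:
                raise ValueError(f"unrecognised etymon reference: {part}")
            current["etymons"].append((lang, word, etymid))
    return result
```

For the unlock example this gives one inherited group containing Middle English unloken and one afeq group containing English un- and lock, which is the same reading as the natural-language description above.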

-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]

@Ioaxxere: Actually, this type of assertion is one that I am trying to avoid. The situation I gave was, "Word A is derived in language X, and word B in language Y may be inherited from A in X or independently formed in Y". You are asserting that word B derives from word A. There are also cases where one can confidently note that, despite formal appearance, word B does not descend from word A. --RichardW57m (talk) 10:40, 19 March 2024 (UTC)[reply]

Another interesting case of this would be something like rocznik, i.e. possibly from Old Polish, but scholars aren't sure. Vininn126 (talk) 12:59, 19 March 2024 (UTC)[reply]

@RichardW57m, Vininn126 Based on the current information at Polish rocznik, I would write this as {{etymon|id=year|unc|zlw-opl>rocznik>flower|af|rok>year|-nik>performer}}. I used the keyword afeq (equivalent affix) in a previous example but I'm beginning to doubt whether there's any need to have both it and af (affix). Ioaxxere (talk) 15:38, 19 March 2024 (UTC)[reply]

{{{eqaf}}} just seems to be {{surf}}. Vininn126 (talk) 15:40, 19 March 2024 (UTC)[reply]

Correct. Ioaxxere (talk) 15:47, 19 March 2024 (UTC)[reply]

@Ioaxxere: I think the question is about the Polish word. The formal match is fine, but the semantics don't work except in so far as an old formation may guide a new formation. --RichardW57m (talk) 16:53, 19 March 2024 (UTC)[reply]

@Ioaxxere: It seems like a rather elaborate system, and for that reason it won't be easy to get widespread approval. I reiterate the earlier comment about slimming this down to its ‘core’, which is already very ambitious. Nicodene (talk) 12:19, 20 March 2024 (UTC)[reply]

@Nicodene Yes, this is the point I've been trying to make as well. Ideally we could leverage what we have already, possibly with some minimal changes (e.g. adding an extra etymid param or something if absolutely necessary). The tradeoff should be in the direction of adding more logic to the scraping function so that less explicit specification on the part of the editor is needed. Benwing2 (talk) 17:14, 20 March 2024 (UTC)[reply]

Update: I've added the template to father (& ancestors). In the end I didn't implement any text generation, since I don't think template-generated text can ever be better than human-generated text. Instead, I'm having it create an etymology tree which is pretty cool as well. Ioaxxere (talk) 00:28, 27 March 2024 (UTC)[reply]

@Ioaxxere: Did you get approval to add this? It doesn't seem like there's an actual consensus for it, and it is a major change. Additionally in the Proto-Indo-European entries that you've added it to, like at *peh₂-, it's causing weird spacing issues that weren't there before. You should only add it if there's a clear consensus for it. For testing purposes, you can use sandbox pages instead. CC: @Benwing2 AG202 (talk) 01:16, 27 March 2024 (UTC)[reply]

@AG202 @Ioaxxere I agree, please use sandbox pages until there's consensus to add this to mainspace pages. Benwing2 (talk) 01:29, 27 March 2024 (UTC)[reply]

Adding "Língua Geral" as a new language

It's a fairly well documented language, and represents a 150-year-old path between Old Tupi (tpw) and Nheengatu (yrl). Língua Geral has some evolutions that appear in both late Portuguese borrowings and Nheengatu, and are hard to show without it, like some changes in pronunciation (kunumĩ > kurumĩ) and meaning (paranã (“sea”) > paraná (“river”)). Língua Geral is also present in a good number of Brazilian toponyms, like Botuverá, and having an Etymology section saying "From Old Tupi" would just be wrong.

The cut between Língua Geral and Nheengatu is set at 1853 by most scholars, when the word "Nheengatu" was first used with the current meaning. The cut between Old Tupi and Língua Geral is a bit more nebulous. Navarro used 1700 for his dictionary, so I'd go with that. For the code, I suggest <tpw-lg>, as it comes from Old Tupi.

There existed two varieties, Língua Geral Amazônica and Língua Geral Paulista, but I think that they could just be pointed out using {{lb}} when needed, rather than having two separate L2 headings (if this ever becomes a new L2). What do you think? Trooper57 (talk) 20:31, 16 March 2024 (UTC)[reply]

@Trooper57 I think what you are proposing is a full (L2 header) language. I don't know much about the differences between Old Tupi, Nheengatu and Língua Geral, but 150 years seems rather narrow a window for a full language; unless things evolved really fast, this could also (maybe better) be handled as an etym-only language variant of either the preceding or following stages. At least to me, the changes in pronunciation you give (kunumĩ > kurumĩ and paranã -> paraná with a semantic shift) do not seem indicative of a radical transformation in the language. Also, for a code I'd suggest maybe tpw-lig as we try to make the second component of a two-component language code have three chars. Benwing2 (talk) 21:57, 16 March 2024 (UTC)[reply]

Maybe an etym-only language would suffice. The trouble I was having was how to state that a word came from a later stage of Tupi, and not from the 16th century.

Also, as I understand it, the fast pace of change in LG comes from its marginalization: the Marquis de Pombal prohibited anything besides Portuguese, so it ended up being an unscripted, nonstandardized language. Trooper57 (talk) 00:26, 17 March 2024 (UTC)[reply]

@Trooper57 If we add tpw-lig as an etym-only language variant of Old Tupi, then all you need to do is use the code in place of tpw and it will show as "From Língua Geral ..." with the appropriate link to the Língua Geral category (which doesn't seem to exist but can be created). Benwing2 (talk) 00:32, 17 March 2024 (UTC)[reply]
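For illustration, here is a rough sketch of what that might look like in the Etymology section of a hypothetical Portuguese entry. It assumes the proposed code tpw-lig has actually been created (it does not exist at the time of this discussion), and it uses kurumĩ, the later form mentioned above, as the source term:

```wikitext
===Etymology===
<!-- tpw-lig is the proposed etym-only code from this thread, not an existing one -->
From {{der|pt|tpw-lig|kurumĩ}}.
```

With an etym-only code set up as described, the language name shown should be "Língua Geral", while the term itself should still link to the Old Tupi section of the target page.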

@Trooper57 BTW "Língua Geral" and "Geral" are listed as other names for Nheengatu in our data, which is consistent with how Wikipedia describes things (i.e. Língua Geral being an older stage of Nheengatu rather than a later stage of Old Tupi). Benwing2 (talk) 00:39, 17 March 2024 (UTC)[reply]

There are authors that call everything from 1500 onwards "Língua Geral". It gets really confusing sometimes. Trooper57 (talk) 01:23, 17 March 2024 (UTC)[reply]

@Trooper57 OK. Let's wait a couple of days for anyone else to weigh in who might be knowledgeable about this topic (please ping anyone you think might be able to contribute), and then we can create an etym-only lang for Língua Geral, either tpw-lig or yrl-lig, whatever you think most appropriate. We can also create subvarieties *-lga (Língua Geral Amazônica) and *-lgp (Língua Geral Paulista) if you think this would be helpful (e.g. if you think these varieties will ever see fit to be cited in an etymology). Benwing2 (talk) 01:53, 17 March 2024 (UTC)[reply]

@RodRabelo7 and @NoKiAthami are the other Old Tupi editors I can think of. There's also @Arthur botelho, but he has been inactive since 2019. Trooper57 (talk) 02:22, 17 March 2024 (UTC)[reply]

@Benwing2, the grammars of the General Languages are very distinct from those of Old Tupi and Nheengatu, hence I agree with @Trooper57. An example: "Nitípe nde Caräíba?" in Old Tupi would be "Nda karaíba ruãpe endé?"; "Xe çüí nití oiabàb" would be "Xe suí i îababe'ymi", etc. In Nheengatu it would probably be similar, but still there are significant differences, in grammar and vocabulary. I honestly don't know the implications of "adding a new language" on Wiktionary, but undoubtedly the General Languages are independent languages. Those who call the language spoken from 1500 onwards Língua Geral (or even Nheengatu) should be completely ignored; for instance, Cândida Barros and Monserrat categorically state that the term isn't pertinent to the 16th century. In my opinion, Navarro's division is the most didactic one. Melhor feito do que perfeito ("better done than perfect")… Pinging Erick Soares3 and Bageense in case they have anything to add. RodRabelo7 (talk) 17:38, 23 March 2024 (UTC)[reply]

@RodRabelo7 One gauge is to consider the differences between Old, Middle and Modern English. Middle English covered a c. 450-year time window with significant differences in the grammar and phonology between Early and Late Middle English (e.g. Early Middle English had 4 cases and 3 genders, while Late Middle English had no cases and no genders), but we don't split Early and Late Middle English for various reasons, e.g. (a) there's no obvious line to draw between Early and Late Middle English; (b) overly fine splits scatter information in different places that might be better grouped together; (c) splits risk creating duplicate information, where essentially the same lemmas appear in several places, with inevitable bit rot as changes in one place don't get properly propagated to the other(s). If the original motivation was simply to better show etymological progression, that can be done without any issue with etym-only languages and the appropriate labels; we do that for example with the different stages of Latin, where Classical Latin, Late Latin, Early Medieval Latin, Medieval Latin, New Latin, etc. are all grouped under the L2 Latin header and labels used to identify distinct stages. Benwing2 (talk) 18:13, 23 March 2024 (UTC)[reply]

I think Latin would be a good example to follow in this case. For example paranã, we could use {{lb|tpw|Língua Geral}} and say "3. (Língua Geral) river" to state this sense appeared later. Trooper57 (talk) 18:31, 23 March 2024 (UTC)[reply]
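A minimal sketch of such a definition line on the Old Tupi entry, assuming "Língua Geral" is accepted as a label for tpw. The gloss is taken from the example above, and the entry's other senses are omitted here:

```wikitext
<!-- one sense line only; the label text is the one proposed in this thread -->
# {{lb|tpw|Língua Geral}} [[river]]
```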

Reconstruction:Latin → Reconstruction:Proto-Romance?

Periodically people leave me messages like this asking why the provided pronunciations can be so different from what they would expect from ‘Vulgar Latin’, not understanding that these are reconstructions that work backwards from Romance and not forwards from Latin. To be fair, it's not as if we make this easy to understand. Every page in question prominently says ‘Latin’ at the top, while the ‘Proto-’ and ‘Romance’ parts are buried deeper in the entries, in template labels that are (apparently) easy to miss. Examples that have confused people include *cordarium and *damnaticum.

So, would we be better off splitting out reconstructions based on Romance under a new name? An incidental benefit would be that the relatively few Classical or pre-Classical reconstructions such as *futo, which are currently buried under a mass of Romance forms, would be easier to find and compare. Naturally the objection can be made that most if not all of the reconstructions probably pre-date any meaningful split between Latin and Romance, and so a proper entry of this kind amounts to ‘Late Latin, but unattested’ - and this is why I've not been inclined to change the way that we handle these so far. But I've come to realize that a change in name would have the benefit of clarifying for people how these reconstructions work, which must be rather opaque for anyone but a specialist.

Then the follow-up question is: if we do this, then should all such entries fall under the umbrella of ‘Reconstruction:Proto-Romance’, with labels distinguishing lower-order reconstructions like Proto-Gallo-Romance, or should they be split up accordingly? Nicodene (talk) 12:58, 20 March 2024 (UTC)[reply]

@Nicodene Personally I'd rather not introduce a new L2 for Proto-Romance because the reconstructed forms look so much like Latin; but I understand your concerns as well. If we are to do this, IMO there should be only one L2 for Proto-Romance, at least for now, with lower-order reconstructions distinguished using labels. In any case AFAIK the inner structure of the Romance languages isn't that worked out? Benwing2 (talk) 17:18, 20 March 2024 (UTC)[reply]

Just throwing out there that two "least-change" solutions to people being confused about the pronunciations, which we could do either or both (or, of course, neither) of, are to start repeating the "Proto-Ibero-Romance" etc labels as {{q}}s right before or after the pronunciation, and/or to start (additionally) providing the Classicizing / Classical-esque pronunciation, which is plainly what people are looking for in these and other Latin entries and tend to use in the situations (like Latin classes at school) where people use Latin anymore, which I think we should be providing across all our Latin entries. (I know you've opposed that because the people using those pronunciations are not Classical Romans, but I think we could find some label to make that clear.) - -sche (discuss) 17:39, 20 March 2024 (UTC)[reply]

@-sche @Nicodene I agree with this. Note that for example, we provide the "modern Italianate Ecclesiastical" pronunciation on Classical Latin entries even though this isn't how the Romans pronounced things, and we provide the Egyptological pronunciation on Ancient Egyptian terms, which is totally artificial. Benwing2 (talk) 17:48, 20 March 2024 (UTC)[reply]

We assign a modern ecclesiastical pronunciation to words that are actually in use by modern speakers of Latin - which is neither *damnaticum, nor *cordarium, nor *muccicalium, nor *ramuscellum - ad infinitum. That is to say, no, there is absolutely zero sociolinguistic justification for doing this - much less any historical or phonological. Egyptological pronunciation is based on a scholarly convention - which incidentally arose because scholars needed a way to refer to attested words without knowing the vowels in them - and so not at all comparable.

To put it more concretely - it'd be exactly as if the OED cited a Proto-Germanic reconstruction and assigned it a modern cockney pronunciation based on its spelling. Or vice-versa: cited cockney slang in a reconstructed Anglo-Saxon pronunciation. That is to say, it'd be utterly mad either way. Nicodene (talk) 18:42, 20 March 2024 (UTC)[reply]

No word that is known only from Romance reconstruction exists ‘in Latin classes’, or in any dictionary of Latin, by definition. To assign a pronunciation from the first century BC to a reconstructed word that shepherds in the Balkans would have come up with ca. AD 900 would be fantasy - misinformation - the precise opposite of what any serious dictionary should be providing. What you are proposing is quite literally to feed the misconceptions of (some) people coming here for clarification. Just because a reconstruction is spelt for etymological reasons in traditional Latin style does not mean it is equivalent to any old word Cicero used in the first century BC that people actually use now and have used throughout the entire history of Latin as a literary language, from end to end.

Actually your comment has convinced me that splitting is really the best option. There is simply no other way I can see to prevent this problem recurring over and over again. Well, there is of course always the option of giving up and letting fantasy run amok forever, where comparing a Friulian word to a Catalan word magically results in a reconstruction with the phonetics of Cicero's era based on quite literally nothing more than the spelling that the reconstructed entries are given.

Do editors in Proto-Germanic have to deal with people inserting modern English spelling-pronunciations like /ɪˈtʃɹəʊnə/ for *aitrōną? Do editors in Proto-Slavic have to deal with people insisting on modern Russian Church Slavonic IPA? I imagine not. Nicodene (talk) 19:14, 20 March 2024 (UTC)[reply]

That is not how reconstruction works, for any language. The pronunciation given on a reconstructed entry is what can be deduced from the descendants, not some kind of modern spelling-pronunciation based on the letters used to spell the reconstruction.

@-sche: could you name any academic source that assigns to a Gallo-Romance reconstruction like *damnaticum a supposed Classical Latin pronunciation like [d̪ämˈnäːt̪ɪkʊ̃ˑ]? Or for that matter assigns a pronunciation like [foːrˈmäːt̪ɪkʊ̃ˑ] to a Gallo-Romance form like ⟨formaticum⟩ ‘cheese’, found for a couple of decades in Latin records from medieval France and never before or since? Or claims that modern English scholars discussing such words switch to Classical Latin phonetics, complete with trilled /r/, phonemic vowel length, and nasal vowels? Or that either word has been artificially resurrected by modern Latin enthusiasts, or is even known by them?

So long as the answer to all four questions is ‘no’, I don't see what there is to discuss. The pronunciation in question would be a fiction divorced not only from scholarship but also history and modern reality. The only remaining point is what you mentioned: that some uninformed passers-by want to see this fiction, because they look at a word spelt in Latin orthography and can't imagine any other pronunciation. And if that is to prevail over the aforementioned points, if this site is to deliberately place fiction over fact, then tell me and I will leave it in peace instead of trying to observe some modicum of academic rigour. Nicodene (talk) 13:32, 21 March 2024 (UTC)[reply]

@Nicodene Hi again. I'm not sure what your latest response is responding to, but I thought of this some more and I'm not convinced by your arguments. The thing is that the "Classicizing" pronunciation is essentially a modern invention that bears a certain similarity to the pronunciation of c. 50 BC but isn't the same. For example, the typical Classicizing pronunciation pronounces final -m as /m/, which is not how it was actually pronounced, and does not bother making any distinction between different types of written l. For these reasons, it *does* remind me a bit of the Egyptological pronunciation, and even more of the Modern Standard Arabic pronunciation, which (like the Classicizing pronunciation) is based on a specific era's pronunciation (that of Koranic Arabic of c. 600 AD) but with significant changes. MSA pronunciation can be given to all Arabic words, including ones that originated far after Koranic times, and there's nothing particularly wrong with assigning and listing such a pronunciation because (a) it's a modern invention anyway, and (b) it is in common use today. In fact there are further similarities, e.g. both the MSA and Classicizing pronunciations differ somewhat depending on the native languages of the speakers, because in various circumstances they match up certain written letters to the closest native-language phoneme instead of attempting a "true" representation of the original era's pronunciation. Benwing2 (talk) 21:41, 21 March 2024 (UTC)[reply]

@Benwing2 Sorry, I think I missed what @-sche was actually getting at. And it seems to be what you're getting at too. Namely that if we were to use an accurate label like ‘Modern Classicizing’ rather than ‘Classical’ then everything else becomes very easy.

In that case there's no longer any concern about ahistoricity and we can indeed ‘provide [this pronunciation] across all our Latin entries’. We'd no longer be portraying this as how for example a word attested in 9th-century Italy was actually pronounced, we'd simply be saying this is how a modern speaker would read it. And that's completely reasonable.

Also this makes it clear what pronunciation we should be showing - the standard that modern speakers follow, i.e. Allen's Vox Latina. No need to host wacko home-brewed theories like [z̪d̪͡z̪] anymore. Your idea of setting it all to [phonetic only] would also be well-suited for a modern convention if you're still in favour.

As far as the original topic is concerned though I can't agree with the idea that such pronunciations would be reasonable in a reconstructed entry, because in my view the only pronunciation that is valid on a reconstructed entry is one that is actually reconstructed from the descendants. Nicodene (talk) 23:27, 21 March 2024 (UTC)[reply]

Minimal viable quotation that satisfies the WT:QUOTE policy requirements

The quote-book template provides amazing flexibility and a great level of detail. However, this also creates an impression of a steep learning curve, so it's natural that some editors are reluctant to add quotations as a result. So I wonder, what is the absolute minimum amount of information that has to be provided to make a valid quotation?

My understanding is that it's always necessary to mention the author, because not doing so would constitute an act of plagiarism. But then there's the publication year, the title of the work, the information about the publisher, ISBN, a link to Google Books or some other durably archived source, an English translation for non-English quotations and many other details. How much of this can be omitted during the initial edits, done by inexperienced contributors, without becoming a problem? --Ssvb (talk) 22:29, 20 March 2024 (UTC)[reply]

Agreed. I think in fact that if you omit any of |title=, |year=+|date= or |text=, you get a maintenance message asking for them; if not, that should be what happens. Other parameters, e.g. publisher, link to a durably archived source, etc. should also be present ideally but aren't strictly required. Benwing2 (talk) 04:07, 21 March 2024 (UTC)[reply]
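A rough sketch of that minimum, using invented placeholder values. It assumes the first positional parameter is the language code of the entry being illustrated:

```wikitext
<!-- placeholder year, title and text; only the parameter names matter here -->
{{quote-book|en
|year=2001
|title=An Example Title
|text=A placeholder sentence that uses the '''headword'''.
}}
```

Adding |author= and |page= when they are known costs little and makes the quotation easier to verify.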

If mentioning the author is currently not required, then this probably needs to change. Because this conflicts with WT:FAQ ("the right to be properly credited for one’s works", "What is fair use? [...] Any such quotation must be properly credited.") and the copyright laws of many countries. The Belarusian paper dictionary in 5 volumes, directly copying from which instigated this discussion in the first place, only lists the text itself and the author's surname in its entry, but not the year or any other information. They had to make their entries compact due to the restrictions of the paper format, but out of all things, it was the author's name that was kept there. --Ssvb (talk) 10:02, 21 March 2024 (UTC)[reply]

> If mentioning the author is currently not required, then this probably needs to change.

In context at WT:FAQ, "the right to be properly credited for one’s works" doesn't imply that authors should be required in quote-book. If ascertaining the "author" to ensure the "right to be properly credited for one's works" becomes an actual requirement, then the texts in the Bible or other anonymous works cannot be quoted, since the authors are uncertain. Proper credit for a work can come in many forms, some of which do not necessarily include identification of the author directly by some name. I guess "due diligence" might ask that you put "Anonymous" or similar on a text with uncertain authorship, but what about when authorship claims are contested? --Geographyinitiative (talk) 10:38, 21 March 2024 (UTC)[reply]

There's already a policy about WT:QUOTE#Debated_authorship. --Ssvb (talk) 11:23, 21 March 2024 (UTC)[reply]

It doesn't say what to do if we don't know the exact author's name, as with an unsigned editorial. CitationsFreak (talk) 15:17, 21 March 2024 (UTC)[reply]

My interpretation of "giving proper credit" is that some point of contact should be provided. It can be the author, an editor or a publisher house. So that it's clear that the Wiktionary editor doesn't try to claim authorship of the quoted text. Anyway, as a result of this discussion, I just would like to have some clear, simple and actionable instructions. Quotes from the Bible are useful, but they are more like a special case. My point is that people should be able to easily add quotes without unnecessary headaches in typical cases:

The authors are usually known for modern books, but the original year of publication is a pain, because Google Books often messes it up. See WT:Quotations/Resources#Google_Books.
Wikisource is a great place for finding quotes from older public domain books, so some simple guidelines specifically tailored for its usage with the quote-book template would be useful.

--Ssvb (talk) 11:50, 21 March 2024 (UTC)[reply]

Some thoughts:

I think the main point is to provide sufficient information so that another editor who wishes to verify the information can easily do so. For example, if the quotation can be found online, then please add the URL. I have seen really bad quotations where an editor provided a quote from, for example, The Guardian newspaper (presumably the one published in London? but no information was provided) some time in 2020—not even a complete date. It's not so bad if the quotation can be found through an online search, but if it can't then I'd have to reject this as a quotation. Ditto for quotations from works that I cannot find online and aren't even listed in Worldcat—I usually move these to the Citations page and mark them "unverifiable".
As to whether providing the author's name is mandatory, I'd say no but if we are unsure let's ask the Wikimedia Foundation for advice. It's definitely a good practice, though. Also, there's a difference between works still subject to copyright and works now in the public domain. A stricter standard may well apply to copyrighted works.

— Sgconlaw (talk) 11:32, 21 March 2024 (UTC)[reply]

> mark them "unverifiable"

How do you do that? Does the quote-book template allow setting the "unverifiable" status and automatically put such quotations into their own category?

I think that it would be very useful for creating English translations. For example, I could translate Belarusian quotations to the best of my ability and set some kind of "needs to be proofread by a native English speaker" status for them. Then some native English speaker could fix the grammar and style issues, rephrase the translation if necessary, and set "needs to be verified whether the translation still conveys the same meaning" status. Then I could take a look again and remove this status if everything is fine. With an iterative process like this, the quality of translations could be potentially improved. --Ssvb (talk) 16:09, 21 March 2024 (UTC)[reply]

> the main point is to provide sufficient information so that another editor who wishes to verify the information can easily do so

That's a good point and I agree. In the case of modern books indexed by Google Books, just listing the ISBN in the quote-book template is probably good enough to make the information verifiable, and the duty of providing all the extra details can be delegated to Google, as long as the text of the quotation is itself searchable. I mean, it's probably okay to use "|isbn=" instead of "|author=" from the compliance point of view. Explicitly specifying the author is indeed a good practice, but things may get tricky for inexperienced Wiktionary editors when dealing with translated works. Another tricky case is when a book is a collection of multiple short stories by different authors. Again, just providing the title of the whole book and the ISBN is probably good enough, rather than trying to figure out who the actual author of that particular text snippet was. --Ssvb (talk) 16:55, 21 March 2024 (UTC)[reply]

@Ssvb I tend to disagree that just providing the ISBN would be enough; I think as a courtesy to the reader, you should supply the title, author, date and ideally page number. As you mention above, there are edge cases where it may be hard to determine the author, and in such a case I think it's OK to leave out the author, but the vast majority of the time, the author is right there on the front of the book (or in the table of contents if the book is a collection from multiple authors). I also completely agree with User:Sgconlaw that quotes need to be verifiable. I did a lot of rewriting of {{quote-book}} a few months ago and cleaned up all the bad parameter uses (formerly, parameters were not checked properly), and came across a lot of badly formatted quotes, most of which I was able to verify but sometimes it took a lot of work to do so, which is not what you want to force people to do. As for adding things like "unverifiable" and "needs to be proofread", there aren't things built into {{quote-book}} to let you do that currently. I think User:Sgconlaw is just writing [unverifiable] and putting it somewhere next to or within the quote (e.g. using {{attn}}); you'd have to ask them for sure. I could definitely see adding options to supply notes "unverifiable" and "needs to be proofread" automatically and add them to appropriate categories. Benwing2 (talk) 21:51, 21 March 2024 (UTC)[reply]

I've just been adding to such quotations on citations pages |footer={{small|Unable to verify this quotation.}}. — Sgconlaw (talk) 21:58, 21 March 2024 (UTC)[reply]

@Sgconlaw: Thanks! I made the following diff to give it a try. Still, as User:Benwing2 mentioned, having a dedicated option and a category for it might be useful. --Ssvb (talk) 14:22, 22 March 2024 (UTC)[reply]

@Ssvb: I'm not sure it should be used on the main entry page. I would suggest you move the entire quotation to the citations page. — Sgconlaw (talk) 15:59, 22 March 2024 (UTC)[reply]

@Sgconlaw: Removing the quotation from the main entry page would be less than ideal, and I believe that my translations into English aren't too horrible. I'm just not fully confident about things like "useful for me" vs. "useful to me", and I also wonder whether I'm possibly doing a kind of odd Yoda-style sentence composition in some of my translations.

Looks like I probably need something like |t-check= parameter support in the quote-book template. Similar to the WT:TRANS's approach to handling this. --Ssvb (talk) 15:28, 23 March 2024 (UTC)[reply]

When you are talking about badly formatted quotes, do you mean the uses of quote-book template, which incorrectly add advanced options with inaccurate/bogus information? Or was it something else? --Ssvb (talk) 00:12, 22 March 2024 (UTC)[reply]

I agree with the above: "My interpretation of "giving proper credit" is that some point of contact should be provided. It can be the author, an editor or a publisher house." How do you feel about Lufeng's quote here: diff? You may say "there are other cites with authors on them"- but consider that you exclude an important part of the literature of humanity that has an anonymous or semi-anonymous character, but can still be adequately identified with OCLC, etc. Formally identified authors/translators are important to identify, yes, but there's generalized "sources". I mean it's an interesting question, of course. I don't really know what's right legally speaking for the site. Many magazines and newspapers have articles with unidentified authorship, and "AP News" is the source for many articles in 20th century newspapers.
Or again, consider this cite for Citations:Manoi, which has no specific author except the city "CHUNGKING":

1945 May 23, “Chinese Expand New Drive on East Coast”, in Manila Free Philippines, volume III, number 24, Manila, sourced from Chungking, →OCLC, page 1, column 5:
On the east China coast, onrushing Chinese forces captured Manoi, Min river estuary coastal town nine miles east of Foochow. Other forces, driving northeast from Foochow, were reported at the outskirts of Lienkong on the coast.
--Geographyinitiative (talk) 22:18, 21 March 2024 (UTC) (Modified)[reply]

@Geographyinitiative: I feel that these quotes are way too elaborate and detailed. So they are exactly the thing that scares away new contributors. The beginners wouldn't touch any of these excessively complex templates with a ten-foot pole.

The discussion is about how much can be left out, while still being an acceptable and useful contribution. There was a suggestion that just having "|title=, |year= and |text=" is enough. Yet my opinion is that such a bare minimum likely fails to "give proper credit" and a little bit more information is necessary, such as the "|author=" option. Or, as in your example, it may be the issue number, OCLC, page number, etc. However, I think that beginners should focus on just quotes from books and the "|author=" option, initially staying away from more complicated cases. --Ssvb (talk) 23:31, 21 March 2024 (UTC)[reply]

Here is "title year text", bare bones, for the above quote:

1945, Manila Free Philippines:

On the east China coast, onrushing Chinese forces captured Manoi, Min river estuary coastal town nine miles east of Foochow. Other forces, driving northeast from Foochow, were reported at the outskirts of Lienkong on the coast.

User:Wyang once called the website 烏煙瘴氣 (roughly, "a foul, murky mess"). I don't think anyone will respect a website that is made up of only relatively simple quote cites. Those simplistic quote cites are part of the equation, but you don't want to hurt the high-quality quotes just because some quotes are simple.
I think the main problem is that it's not obvious to entry-level people that Template:quote-book is the place to go to find out about quote-book parameters. It's hidden from them.
WT:ATTEST says: "Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source." My goal in providing detail in quote cites is two-fold: (1) to allow someone looking for this in 50 years after Internet Archive is long gone to find it again, and (2) to give an atmosphere to this website of professionalism that encourages high-quality editing and citations. If I did "title year text" on my quote cites in my high school paper, the teacher would give me an F.
I also work on a lot of words that were ignored by Wiktionary and Wikipedia for about 20 years, and they will be ignored again when I'm gone. I expect no one will follow up after me and that the soft-power campaigns of authoritarian China will mostly reverse what I've done so far to illuminate some of this terminology that the CCP doesn't like people to know about. It's an area of English language vocabulary that is scorned by the field of Asian Studies.
All I have to cling to is that the cites I've done so far might be high-quality enough to convince some future admins in 2040 not to allow deletion of everything I did. If I just did "title year text" quote cites, my argument to those future admins would be significantly weakened, because they can say "well, who the fuck knows where that Manila Free Philippines bullshit quote was really located? that's just some bullshit from the cretins in 2024 who didn't have MindLinkAI." I'm adding indicia of authenticity with these details to stave off deletion in 2040, 2050, etc. I want to make it so that if you delete some of these high-quality quote cites, you have to feel palpably ashamed of yourself, unless you're totally dead inside. No one would feel ashamed deleting a "title year text" quote cite. --Geographyinitiative (talk) 00:08, 22 March 2024 (UTC)[reply]

@Geographyinitiative: I'm not asking you to reduce the level of detail in your quotes. Keep up the good work. I'm actually asking people to start using Template:quote-book instead of Template:uxi for adding quotes to Wiktionary. And this transition doesn't have to be difficult. --Ssvb (talk) 00:39, 22 March 2024 (UTC)[reply]

I'm telling you, we will always regret simplification to less than what exists now in quote-book. But we will also always regret throwing out things that are semi-anonymous. You give credit to the extent the work allows you to give credit, not less, not more. The work being cited is the guide to its citation. Cookie cutter simplifications will not work. It is as beautiful and marvelous as it is without Stalinist dictates to simplify or to require specific authors. Geographyinitiative (talk) 02:22, 22 March 2024 (UTC)

@Geographyinitiative: This is getting ridiculous. I don't just "give credit to the extent the work allows you to give credit, not less, not more". I'm providing a lot of additional details when adding quotations myself. But I'm also occasionally fixing quotations like this. And please take a look at how people opt to just delete an ux template rather than convert it into a proper quotation. Why is this happening? What can we do to improve the current situation? I don't believe that it's productive to just shame people by claiming that allegedly "the teacher would give them an F". --Ssvb (talk) 21:40, 23 March 2024 (UTC)[reply]

Template:ko-etym-native without parameters is pointless and misleading

@AG202 @Solarkoid @Chom.kwoy @Tibidibi

Calling Template:ko-etym-native without parameters is currently explicitly allowed, producing the message "Of native Korean origin."

This to me seems completely pointless. How does this in any way help the reader? Not only that, it is misleading and gives the ety false confidence; if you have nothing to add to the ety, how can you be so sure that the word is of native origin to begin with?

I propose that this usage be deprecated so that it can eventually be phased out. Ideally this would be done by categorizing any entries with parameter-less invocations into a separate category (or just ) so that the respective entries can be dealt with appropriately. Lunabunn (talk) 00:46, 21 March 2024 (UTC)[reply]
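As a rough illustration of the tracking step proposed above, the following is a minimal sketch of how parameter-less invocations could be located in raw wikitext before any tracking category exists. It assumes the entries are available as plain wikitext strings; the function name `find_bare_invocations`, the sample text and the `PARAMS` placeholder are hypothetical, and the pattern only covers the literal `{{ko-etym-native}}` form, not redirects or calls spread over several lines.

```python
import re

# Matches {{ko-etym-native}} with optional whitespace and no parameters.
# A call with parameters has a "|" after the name, so it does not match.
BARE_CALL = re.compile(r"\{\{\s*ko-etym-native\s*\}\}")


def find_bare_invocations(wikitext: str) -> list[int]:
    """Return the 1-based line numbers that contain a parameter-less call."""
    return [
        lineno
        for lineno, line in enumerate(wikitext.splitlines(), start=1)
        if BARE_CALL.search(line)
    ]


# Hypothetical sample: PARAMS stands in for whatever arguments an entry passes.
sample = """===Etymology===
{{ko-etym-native}}

===Etymology===
{{ko-etym-native|PARAMS}}
"""

print(find_bare_invocations(sample))
```

In practice the same check would more naturally live in the template or its module, which could add the affected pages to a tracking category directly; the scan above is only meant to show what counts as a parameter-less call.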

I think that template is currently being used as a signal to indicate the average Korean speaker's intuition about that word, as being neither Sino-Korean nor "foreign", as this intuition sometimes has an effect on the word's usage and grammar (e.g. Sino-Korean words better go along with other Sino-Korean words when compounding, etc). So while this information has some uses, it should not be conflated with the actual etymology of the word, which is what the etymology section is for.

So I agree that it should be deprecated as well. Chom.kwoy (talk) 09:10, 21 March 2024 (UTC)[reply]

@Chom.kwoy: Are there any bright ideas as to what should replace it? There is a significant similar notion in Thai of a partition of vocabulary into 'native' and Pali/Sanskrit vocabulary, distinguished as 'blue' and 'red' in one English-language grammar, and associated with compounding rules (which have exceptions, most notoriously ผลไม้ (pǒn-lá-máai)). English also has a similar but sloppy categorisation, though the effects of 'marking' in the lexicon are shown by other information in lemmas' entries, so probably less useful to record. --RichardW57m (talk) 11:33, 21 March 2024 (UTC)[reply]

As an outsider (beginner in Korean studies), I find this useful information, for the reasons you (@Chom.kwoy) note.

As a long-time Wiktionary editor, I think "native Korean origin" is indeed etymological information, as etymology is about where a word comes from (its origin). I struggle to think where else in our WT:ELE entry structure that this kind of information would go. ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:56, 22 March 2024 (UTC)[reply]

If a word is truly of native Korean origin, that is of course etymological information. However, the issue is that "Of native Korean origin." is currently a copout that gets added to every entry under the sun that isn't an obvious loan (i.e. from English or Sino-Korean) and obscuring actual etymological info based on "speaker intuition." As Chom.kwoy said, this speaker intuition is also important. It should not, however, be conflated with the true origin of a word.

See also examples from Surjection below on words that are clearly loans (and even described as such in the etymology section) but still end up using Template:ko-etym-native because of this template's double duty as "first attested in" and "of native origin." Lunabunn (talk) 01:18, 23 March 2024 (UTC)[reply]

Fully agree, as an outsider. The "first attested in" part should be moved into its own template (perhaps named {{ko-attest}}) and the rest removed, if not completely, then at least from the etymology sections. Category:Native Korean words must also go. — SURJECTION ^{/ T / C / L /} 12:06, 21 March 2024 (UTC)[reply]

First attestation seems to me like part of etymology, in describing the start date (origin) of textual evidence; I've certainly been putting that info in the ===Etymology=== sections in Japanese entries, for years now. Where else should that go? ‑‑ Eiríkr Útlendi │^{Tala við mig} 19:06, 22 March 2024 (UTC)[reply]

First attestation should stay in the etymology. I wasn't arguing it shouldn't, just that it be moved into its own template that could then be used in etymology sections. — SURJECTION ^{/ T / C / L /} 21:35, 22 March 2024 (UTC)[reply]

Keep until a better alternative is found, don't simply remove. I liken this to Japanese on'yomi and kun'yomi - native Japanese and Sino-Japanese readings of Chinese characters (kanji). Apart from Sino-Korean, it also distinguishes from loanwords from European languages, more modern Japanese loanwords where occasional tensification of consonants occur.

Finding etymologies for all words may not be possible, but distinguishing loanwords from native words specifically for Korean has its value and is the practice of certain dictionaries and contributors.

Less categorical on Category:Native Korean words but I don't see why it should be removed. Anatoli T. ^{(обсудить}/^вклад) 13:36, 21 March 2024 (UTC)[reply]

If our English entries had ever adopted a similar system, there would be people defending it too. "Of native origin" is not an etymology nor are entries perceived to be of "native" origin a valid category. I've seen this even be misused for compounds derived entirely from recent borrowings that nobody would ever call "native". In short, this system is nothing more than an absurd cop-out. — SURJECTION ^{/ T / C / L /} 14:15, 21 March 2024 (UTC)[reply]

And to illustrate just how terrible of an idea it is to combine an attestation template with a "this is native I promise" template (which does not make the latter any less absurd), look at 가라치 (garachi) and 고두리 (goduri). Truly, some native Korean terms that were borrowed from Mongolic. — SURJECTION ^{/ T / C / L /} 14:19, 21 March 2024 (UTC)[reply]

Here's the misuse too for good measure. — SURJECTION ^{/ T / C / L /} 14:21, 21 March 2024 (UTC)[reply]

Definitely agree to remove. Logged in just to reply to this. Also thanks, Surjection, great examples as to why it's a bad/misleading template. Assuming right from the get-go that a word is of native origin simply won't do. Additionally, we use "Of native Korean origin." for everything attested, so adding those words to a category called 'Native Korean words' is a really bad practice (compare a Chinese loan 요). Also, I share Surjection's suggestion that the template should be changed (i.e. remove parameterless option, categories) and renamed (to something like ko-attest). - Solarkoid (talk) 14:40, 21 March 2024 (UTC)

The issue is whether this corresponds to a vocabulary marking (for an 'unmarked' value) that is relevant to Korean grammar. In English, the abstract lexicon has words marked as 'foreign' or 'Latinate', for which the unmarked value approximates to 'native', for which 'effectively native' may be a better term. And in English, the 'native' category includes such words of French origin as beak and beef. For grammar, being 'Sino-Korean' is reported above to be a matter of synchronic fact, not of historical fact. --RichardW57m (talk) 15:01, 21 March 2024 (UTC)

If it has significant relevance, then it should be documented in some way, but by using a better term than "native" and without abusing the etymology section to document it. — SURJECTION ^{/ T / C / L /} 15:07, 21 March 2024 (UTC)

@Surjection: Agree. But let's not just delete the records, but rather eliminate them by conversion to the new way of documenting. --RichardW57m (talk) 15:32, 21 March 2024 (UTC)[reply]

Sure, I'd be fine with (a) splitting first attestation into its own template, (b) moving {{ko-etym-native}} out of the etymology section to somewhere else (maybe usage notes), (c) rewording the text displayed by the template as appropriate, and (d) getting rid of Category:Native Korean words and stopping the template from categorizing. — SURJECTION ^{/ T / C / L /} 15:38, 21 March 2024 (UTC)[reply]

@Surjection: Thank you for the valuable input. I definitely see value in keeping some kind of label (CC: @Chom.kwoy), but otherwise agree that all of this is cruft that needs to be gone.

Perhaps, then, we may just deprecate the entire ko-etym-native template and then gradually replace it with a near-identical template with parameterless invocation and categorization removed? (We can still use Module:ko-etym for this new template.) We should be able to use either one of Category:Pages using deprecated templates or Category:Native Korean words to keep track of all the entries that need to be updated (eventually). Lunabunn (talk) 19:11, 21 March 2024 (UTC)[reply]

@Atitarev I am not sure how applicable your points are to Korean.

I liken this to Japanese on'yomi and kun'yomi - native Japanese and Sino-Japanese readings of Chinese characters (kanji).
- This is already covered by the Sino-Korean template, as you have noted.
it also distinguishes from loanwords from European languages
- This is already covered by the fact that, ya know, modern loanwords are marked as such. If an etym section does not say it is a loan, that is enough to perceive that it is not a modern European loanage such that this kind of thing would matter. Conversely, just because a word is not a modern European loan does not mean it is of native origin. See the many Mongolian/Japanese/Korean... loans, for example, that are now perceived as native. (e.g. 수라 (sura), 담배 (dambae), 김치 (gimchi))
more modern Japanese loanwords where occasional tensification of consonants occur.
- This is already explicitly covered by Template:ko-ipa. Note that not all words that receive this "loanword-like" tensing are loanwords, nor does every loanword receive this kind of tensing.

But most importantly, once you add a first attestation to Template:ko-etym-native, the "native origin" message disappears. So even if everything you said is true and indeed "native origin" is a useful label here, this template is still the worst of both worlds. Lunabunn (talk) 01:24, 23 March 2024 (UTC)[reply]

Support removal. We do need some kind of hidden category tracking though, like "Korean words with no etymological history" or something like that. AG202 (talk) 03:47, 25 March 2024 (UTC)[reply]

Splitting Etymology by Accentuation

(Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Getsnoopy, Rishabhbhat, Dragonoid76, RichardW57, Exarchus): When forms of a lemma differ in accentuation which is not normally marked in the orthography, my reading of WT:EL says that if we have entries for the forms, they should be recorded under different etymologies, as is done for the present and past of English read, which have different vowel sounds. I have accordingly followed that rule at Sanskrit अमृता (amṛtā, “deathless”), where the Vedic accent is on the first syllable of the word in the vocative and on the second syllable in the other cases. Should or could I instead have followed the example of Russian сковороды (skovorody, “frying pan”), where there is only one etymology section and the two pronunciations are given in the same pronunciation section, with the pronunciation tied to the relevant noun form entry by the accent shown on the Cyrillic? --RichardW57m (talk) 16:56, 21 March 2024 (UTC)[reply]

@RichardW57m I'm more inclined to say we should follow the example of сковороды (skovorody) for a few reasons:

1. "Sanskrit" represents a continuum of Indo-Aryan languages, including Vedic, but in general most entries are Classical Sanskrit (which does not have the Vedic stress), i.e. Classical Sanskrit is the default and we tend to use "(Vedic)" where it is required.

2. Hence, it seems like needless complication, and right now the page for अमृता (amṛtā) is crazily complicated and tries to over-explain. Why are there so many entries for "Noun" for "sandhi form of अमृत (amṛta)" under various definitions? We should just have one entry like "sandhi form of अमृता (amṛtā)" which covers all the definitions and maybe put a "Usage Note" there if there is something specific to address. These are all non-lemma forms, and hence the actual stuff like Usage Notes and meaning nuances based on stress should probably be on the lemma page, as I understand.

3. There are tons of cases where an inflecting nominal/adjective has a feminine form that has its own specific definitions. We should just have one entry for the non-lemma of the masculine form and one entry for the special feminine meanings, to be succinct. Dragonoid76 (talk) 18:22, 21 March 2024 (UTC)

@Dragonoid76: The multiplicity arises from the decision taken by someone (not me) that there are separate noun and adjective for अमृत (amṛta). Having separated adjective and noun, in this case we get at least one lemma for each of the three genders. अमृता (amṛtā) is a form of each of these four lemmas, and for each lemma they include both vocative and non-vocative forms, so two semantically distinct pronunciations for each of the four. We thus end up with two adjective forms, one noun and five noun forms. (Forms identical with the lemma do not get separate entries; the feminine noun is a lemma.) We use the same PoS heading for both noun and noun form, in accordance with WT:EL#Part of Speech. My understanding is that forms of different lemmas should have different entries.

Vedic Sanskrit is a valid language and therefore its words are eligible for inclusion. I've cross-referenced the forms with different accents to cover words with pitch accents not indicated, as in the writing of Classical Sanskrit. --RichardW57 (talk) 02:07, 22 March 2024 (UTC)[reply]

@Dragonoid76: You seem to have overlooked specific meanings of the neuter form. --02:07, 22 March 2024 (UTC)[reply]

Even with the collapsing of what are currently nouns into the adjective, we would need different entries for semantically distinctive differently placed Vedic accents. I think we would still have to separate out at least three nouns, for 'ambrosia', 'kudzu' and 'root', and a bunch of feminine plant names. --RichardW57 (talk) 02:07, 22 March 2024 (UTC)[reply]

@RichardW57 What do you make of something like User:Dragonoid76/sandbox/अमृता. I think all the "See also" boxes are very excessive and will be quite hard to replicate on all pages since Sanskrit has quite a bit of syncretism and words with multiple meanings. Here, all different forms of words with the same pronunciation and with the same etymology are put under the same header. Dragonoid76 (talk) 05:49, 22 March 2024 (UTC)[reply]

@Dragonoid76: I don't object in principle to the pronunciation section, but I'm not sure it satisfies the requirements of Gangetic chauvinism, for it gives a clearly greater rôle to the Roman script than to Devanagari. I think we may need to add to the capabilities of {{sa-IPA}}.

The orthographically identical forms of the same part of speech should either be kept together or cross-linked, as a reader may think that a PoS section is exhaustive. Possibly, and this also applies to the current form of the page, the noun forms should have a link to the feminine noun, as it may not be obvious to the reader that the forms of the feminine noun will not have separate definitions. See also is quite appropriate for these cross-links.

As to the work involved, I would remind you that 'lexicographer' is famously defined as 'a harmless drudge'.

I'm not confident that the botanical senses have the same etymology as the more obviously derived senses. The use case for not grouping senses with the same etymology as a single word is the risk of etymologies being added for a single sense, which is more likely to happen with derivative lemmas, such as participles, than with case forms such as we have. Incidentally, these case forms don't all have the same lemma form, and I will correct that in the mock-up. --RichardW57m (talk) 11:32, 22 March 2024 (UTC)[reply]

@RichardW57m For the accentuation you can add the Devanagari accents like this: अ॒मृता॑ (amṛ́tā) ... अमृ॑ता (ámṛtā).

For me it makes simply no sense to have 'See also' for something on the same page. Exarchus (talk) 15:38, 23 March 2024 (UTC)[reply]

@Exarchus: Are you unaware of the senseif/etymid functionality? That enables one to link to specific elements within a language's section. --RichardW57 (talk) 16:39, 24 March 2024 (UTC)[reply]

@RichardW57 I see what you mean, one could end up on one section of a page and not notice the others, but is this really relevant here (as we are talking about non-lemma forms)? Why would someone link specifically to these forms instead of simply the main lemma? Exarchus (talk) 16:57, 24 March 2024 (UTC)[reply]

@Exarchus: Yes, it is relevant. When looking up अमृता with Vedic accent unknown, one will find a matching section, and quite likely forget to look for one with a different Vedic accent. Remember, Wiktionary's published aim is to cover every word, not every lemma.

The main utility of the etymids (I really wanted 'wordid' and was tempted to abuse senseid) is in these intra-page crosslinks, but there may be other occasions to link to them. --RichardW57m (talk) 10:17, 25 March 2024 (UTC)[reply]

@RichardW57m I'm still not sure how this is different from someone looking up the English noun swallow, then finding "(archaic) A deep chasm or abyss in the earth", reasoning that it's probably not archaic, so it's much more likely "The amount swallowed in one gulp; the act of swallowing", forgetting that there's also an 'Etymology 2'. Exarchus (talk) 11:04, 25 March 2024 (UTC)[reply]

@RichardW57m To put this discussion into perspective: of all the forms currently classified under 'Etymology 2' (ámṛtā), only the vocatives (dual and plural) of the adjective occur in the DCS. And none of these vocatives actually has a pitch accent, as they don't occur at the start of a pāda. (Even in Padapatha they are written अ॒मृ॒ता॒ (amṛtā).) Exarchus (talk) 22:20, 25 March 2024 (UTC)[reply]

@Exarchus: I don't think the DCS is exhaustive. We also have the policy of allowing regular inflected forms despite their lack of attestation. On the basis of this, I've been adding terms for Pali imperative singular actives when they are the same as another word, to avoid Pali inflection tables wrongly linking to words of other languages, regardless of whether I can find an attestation of the Pali form. The alternative approach would be to stub the vocative dual and plural forms out from the noun inflection tables for अमृत (amṛta) and अमृता (amṛtā).

This now gives us three different pronunciations with the same basic spelling!

However, I think I have a better solution, which is to use

{{sa-adj form|tr=amṛ́tā|tr2=amṛtā|tr3=ámṛtā}}, which currently yields

अमृता • (amṛ́tā or amṛtā or ámṛtā)

This has implicit parameters |head=, |head2= and |head3=, which implicitly default to the page name.

We can then use the usage notes section to explain that dependent upon position, the vocative is either unaccented or is accented on the first syllable, while the other cases are accented on the second syllable. --RichardW57m (talk) 10:45, 26 March 2024 (UTC)[reply]

@RichardW57m Sounds like a good idea Exarchus (talk) 11:30, 26 March 2024 (UTC)[reply]

@Benwing2, Theknightwho This relies on the undocumented merger of the output of the terms - may this be relied upon? --RichardW57m (talk) 10:45, 26 March 2024 (UTC)[reply]

And I suppose that if one wants to link specifically to one of these forms (because it's in a citation), then one should be able to link directly to the correct form (either ámṛtā or amṛ́tā), so people shouldn't have to look at the other one. Exarchus (talk) 17:19, 24 March 2024 (UTC)[reply]

If I understand correctly, read is the exception, not the common practice in English entries. This search reveals many English terms like object and document with one etymology section and one pronunciation section that splits the pronunciations based on part of speech. — excarnateSojourner (ta·co) 15:59, 26 March 2024 (UTC)[reply]
@excarnateSojourner: With regard to the method you searched for, looking for "{{a|noun}}", the documentation of {{accent}} says, "It should not be used for other qualifiers like noun, verb, adjective, and so on"! It does, however, say that {{qualifier}} is used for these, and indeed, it often is. WT:EL does not mandate following the example of read, but merely gives the example of lead as where one may use 'etymology' to label pronunciations. Your finding usefully supports the solution I currently favour; I can use {{q}} to grammatically label the pronunciations themselves in this Sanskrit case. --RichardW57m (talk) 16:51, 26 March 2024 (UTC)

Equivalent of Template:ellipsis of in compounds

I think we would need something like this. There are quite a few cases in e.g. Finnish where a word can stand for a compound using that word, if it is clear from context, e.g. kone. Currently many of these use {{short for}}, but that is not ideal. — SURJECTION ^{/ T / C / L /} 23:25, 23 March 2024 (UTC)[reply]

FWIW, for me {{ellipsis of}} implies a multiword term, not a compound, which is why I don't feel like using it. — SURJECTION ^{/ T / C / L /} 23:29, 23 March 2024 (UTC)[reply]

{{clipping of}}? Vininn126 (talk) 08:49, 24 March 2024 (UTC)[reply]

Doesn't really fit either - we are removing whole words, even if they are still part of a compound word. — SURJECTION ^{/ T / C / L /} 08:57, 24 March 2024 (UTC)[reply]

I've switched these to {{ellipsis of}} for now - but I still feel it might be better to have a separate template for this. — SURJECTION ^{/ T / C / L /} 14:42, 24 March 2024 (UTC)[reply]

Hmm... I think this kind of thing (where lexical) must be clipping, ellipsis, or "short for"; at least, I don't think we could realistically distinguish some fourth category ("Template:shortening of"?) from T:short for, ellipsis, and clipping; there's too little distinction and too much overlap.
I can find a few works discussing ellipses of compounds, and slightly more works discussing clipping compounds (many are discussing clipping words out of a spaced multi-word compound, but this seems to function identically to clipping unspaced compounds; as with kone, a manual for a washing machine might say to load things into the machine). And of course other works discuss some element of a compound being "short for" or "a shortening of" the compound. Quotes.
But we should perhaps consider at what point this is no longer lexical (no longer "one of the definitions of kone is pyykinpesukone") and is just hypernymy; I mean, you can also shorten sports car to car or Chinese food (Indian food, delicious food, more food, etc) to food if the meaning is clear from context... ("I ordered Chinese food. Later, when I was eating that [Chinese] food, I noticed it had onions in it.") - -sche (discuss) 20:09, 25 March 2024 (UTC)[reply]

@-sche @Surjection I am inclined to agree with -sche here. I do understand the idea that ellipses are multiword, but I think that is somewhat of an arbitrary distinction. Consider English vs. German, for example, where compounding is similar but English often writes the compounds open whereas German writes them closed. I even think {{short for}} should usually be rewritten as either ellipsis, clipping or abbreviation. Benwing2 (talk) 05:54, 26 March 2024 (UTC)[reply]

Category:Translingual entries with incorrect language header

These are pretty much all letter entries where the entire entry uses the "mul" language code, but the header is that of a specific language (there's often a Pronunciation section using the language code that matches the header- but that's it).

My understanding is that there should be two types of letter entries:

A translingual one giving the kind of information that's not specific to any one language, such as Unicode codepoint. For this, both the "mul" language code and the "Translingual" header are required. This always goes at the top of the page, though the language section may also include mathematical and other non-language-specific symbols that use the same character.
A language-specific one giving the type of information that's typically different between languages, such as position in the language's alphabetic order, and its pronunciation. For this, the Wiktionary language code for a specific language and the header with the Wiktionary name of that language are required. This always goes in the same place on the page that any entry for that language would go. If there are multiple languages that use a given character, there should be multiple language sections.

IMO, there should be a language section (of the second of the two types above) for every language that uses that letter as part of its standard orthography. There probably should also be a single Translingual section.

There should never be a Translingual entry with the header for a specific language, and there should never be a section for a specific language with a Translingual header. There should also never be a pronunciation section in a Translingual entry, unless it's a phonetic symbol- "Translingual" isn't a language, so it has no speakers.

This is the prevailing practice at Wiktionary since we've had language codes, language headers, and translingual entries. Do we have this spelled out somewhere, so we can point to something when we tell people to stop doing otherwise?

Finally, I would like us to get rid of all the entries in this category, by fixing them. I see three ways to do this:

keep the "mul" code and give them a Translingual header
keep the language-specific header, but change all the language codes to match the language of the header.
split them into two sections: a Translingual one at the top of the page with "mul" language codes, and a language-specific one in the body of the page with matching language-specific headers and language codes.

What does everyone think? Chuck Entz (talk) 01:38, 24 March 2024 (UTC)[reply]
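To make the mismatch described above concrete, here is a minimal sketch of a check that compares each L2 language header with the language code passed to `{{head}}` in that section. It is an illustration only: the name-to-code table is a tiny hand-written stand-in for Wiktionary's real language data, the function name `find_mismatches` is hypothetical, and only the generic `{{head|code|...}}` form is inspected, not language-specific headword templates.

```python
import re

# Small illustrative mapping; Wiktionary's real language data is far larger.
NAME_TO_CODE = {"Translingual": "mul", "Osage": "osa", "Korean": "ko"}

# Level-2 headers only: "==Osage==" matches, "===Letter===" does not.
L2_HEADER = re.compile(r"^==\s*([^=]+?)\s*==\s*$")
# Captures the first positional argument of {{head|...}}, i.e. the language code.
HEAD_CALL = re.compile(r"\{\{\s*head\s*\|\s*([^|}\s]+)")


def find_mismatches(wikitext):
    """Return (header, code) pairs where {{head|code|...}} disagrees with the L2 header."""
    mismatches = []
    current = None  # name of the language section we are currently inside
    for line in wikitext.splitlines():
        header = L2_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        for code in HEAD_CALL.findall(line):
            expected = NAME_TO_CODE.get(current)
            if expected is not None and code != expected:
                mismatches.append((current, code))
    return mismatches


# Hypothetical entry: an Osage header over a headword line that uses "mul".
bad_entry = """==Osage==

===Letter===
{{head|mul|letter}}
"""

good_entry = bad_entry.replace("{{head|mul|", "{{head|osa|")

print(find_mismatches(bad_entry))
print(find_mismatches(good_entry))
```

The second and third options in the list above both make such a check pass; they differ only in whether a separate Translingual section is added at the top of the page.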

Wouldn't it be easier to have a table under the Translingual L2 that contained columns for language name, pronunciation, serial position, alternative representations, and anything else that applied to more-or-less every language's use of the letter/symbol, with an extra column for language(-family)-specific info or, at least, links to sources for such info? The same idea could be applied to personal names. Maybe the same approach would be useful for taxonomic names, etc. DCDuring (talk) 14:16, 24 March 2024 (UTC)[reply]

That might work for letters, which are generally not treated as words in the various languages (as written), but personal names have all kinds of grammatical and etymological information that doesn't go well in a table format. Also, there's Special:PrefixIndex/Template:list:Latin script letters, etc. Chuck Entz (talk) 14:46, 24 March 2024 (UTC)[reply]

I worry I'm cutting the Gordian knot with too simple a solution, but the bulk of the entries currently in the category seem to be Osage entries created by a single user (whose defiance of other norms, like one POS header per POS, people have discussed a few times including recently), and to me, the simplest / easiest / "least-change" solution seems to be to just revise the language code to match the language header in line with how we also have e.g. 려 as ==Korean== and not ==Translingual==. This is what I would've done (quietly, as basic cleanup, without even thinking to have a BP discussion about it) if I had seen a new user adding these. Like with 려, there does not currently seem to be any codepoint information in 𐓶̋ to discuss whether or not to have a ==Translingual== section for. Once the Osage entries are changed, we can see whether any other entries in the category look like they would need different treatment, e.g. any letters which are actually used by more than one language, but I don't suppose it makes sense to discuss a table to compactly provide multiple languages' pronunciations of 𐓶̋ if only one language uses it. - -sche (discuss) 15:18, 24 March 2024 (UTC)[reply]

Note that Osage 𐓶̋ isn't a real letter, but a letter plus diacritic combination, which was encoded too late to be assigned a precomposed character in Unicode. I'm not sure if we have a policy on giving the code points (or even entries) for such things; we generally don't have entries for combinations of Indic letters and vowel 'diacritics'. I've a feeling that sentiment was against allowing such entries and instead insisted on status as letters, as seen in Scandinavia but generally not France or Germany, though with a counter-argument that Unicode characters in use were eligible. --RichardW57 (talk) 16:32, 24 March 2024 (UTC)

@-sche: I don't understand your comparison with 려 (ryeo), which does have codepoint information, and is a precomposed character. --RichardW57 (talk) 16:32, 24 March 2024 (UTC)[reply]

Chuck seemed to suggest we might need/want Translingual entries to handle non-language specific information like what Unicode codepoint(s) the letter has, but these letters do not currently have such information, so I'm saying the simple fix is just to fix the headword-line template to use the language code that corresponds to the L2. - -sche (discuss) 20:15, 24 March 2024 (UTC)[reply]

I stumbled across ꜫ today. The entry seems to have "alternative forms" shared between two languages. Isn't this common?

Is there other information, besides the Unicode codepoint that is shared by all languages that use the letter/character/symbol? Like typographic realizations, as in ꜫ? DCDuring (talk) 23:53, 24 March 2024 (UTC)[reply]

@DCDuring: I don't think it's common, and such information probably belongs on Wikipedia rather than here. For an entry like Osage 𐓶̋, I think we simply want a head line template such as {{head|osa|letter|upper case|{{uc:{{PAGENAME}}}}|tr=-}}. If we chose to keep such entries, we can then worry about displaying the codepoints for the non-letters with PoS 'letter'. (We probably already have a template to do it.) Can we just go ahead and fix the Osage letters and letter-accent combinations? --RichardW57m (talk) 18:11, 25 March 2024 (UTC)[reply]

@-sche: When you wrote, "Like with 려", did you mean "Unlike with 려"? --RichardW57m (talk) 10:41, 25 March 2024 (UTC)[reply]

려 contains no information about what Unicode codepoint(s) a user would have to enter/use to obtain the character. Like with 려, 𐓶̋ contains no information about what Unicode codepoint(s) a user would have to enter/use to obtain the character. - -sche (discuss) 18:48, 25 March 2024 (UTC)[reply]

@-sche: The codepoint information for 려 is generated by the invocation {{character info}} at the top of the page. It's true that you have to work to get the NFD code sequence, as the vowel is given as a compatibility jamo rather than as a combining jamo. For me it displays at the top right of the page. If you click on a language-tagged link, you may have to scroll up to see that information box. --RichardW57 (talk) 05:38, 26 March 2024 (UTC)[reply]

The 'simple fix' hides the omission, so we don't remember to supply the lack. --RichardW57m (talk) 10:41, 25 March 2024 (UTC)[reply]

Belay this. The straightforward fix above hides nothing. --RichardW57m (talk) 18:16, 25 March 2024 (UTC)[reply]

I agree. And this shows once again why this user should've stayed blocked. AG202 (talk) 03:44, 25 March 2024 (UTC)[reply]

Anyway, unless someone has a cogent objection, I will just fix these entries with AWB soon. - -sche (discuss) 18:48, 25 March 2024 (UTC)[reply]

@-sche What are 'these' entries, and what changes will you make? I'm not sure that the fix for 🐀 is obvious. The English meaning of the emoji looks dependent on English - I wouldn't be surprised if it were associated with endearment in Thai usage - compare Thai หนู (nǔu). --RichardW57m (talk) 12:14, 26 March 2024 (UTC)[reply]

As I described above, the fix is to conform the few stray instances in each entry where the language code doesn't match the language the entry is declared to be in. For the rat, that means [11]. - -sche (discuss) 14:24, 26 March 2024 (UTC)[reply]

@-sche: OK, that {{mul-symbol}} does less than I thought it did. But just replacing 'mul-letter' by 'osa-letter' damages the entries - the undocumented template doesn't display the casing partner, which I'd rather calculate than enter manually. --RichardW57m (talk) 18:00, 26 March 2024 (UTC)[reply]

@Chuck Entz: A vote to prohibit creating an entry for every letter of every alphabetically written language gained significant support, but not enough to make it policy. As there are, according to Wiktionary:List_of_scripts, 5,114 languages on Wiktionary using the Roman script, I think it's a bad idea to do things that way before pages are split by language. --RichardW57m (talk) 11:55, 26 March 2024 (UTC)[reply]

The vote was Wiktionary:Votes/2020-07/Removing_letter_entries_except_Translingual - there was a majority in favour, but not a consensus. --RichardW57m (talk) 18:02, 26 March 2024 (UTC)[reply]

Chinese lect labels and categories

(Notifying Atitarev, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏): @Theknightwho Sorry to ping everyone again. I am going through and adding labels to Module:labels/data/lang/zh for missing Chinese lects and creating the associated categories, and it's revealing some issues in the way that categories are currently handled. To fill everyone in who isn't familiar with the unique way that Chinese is handled:

All lects go under the Chinese L2 header.
The 'Foo lemmas' categories are added for individual Chinese languages using the {{zh-pron}} template in the Pronunciation section, which lists all the possible pronunciations of a given term in different Chinese languages (including Old Chinese and Middle Chinese).
There are also labels, added using {{lb|zh|...}} or {{tlb|zh|...}}, to identify that a given sense is defined only for specified Chinese lects.

The problem here is that different structures are used for the categories generated by the labels vs. the categories generated by {{zh-pron}}. In particular, a label like Mandarin, Hakka or Xiang places the term in (respectively) categories Category:Mandarin Chinese, Category:Hakka Chinese and Category:Xiang Chinese, which are subcategories of Category:Dialectal Chinese (which in turn is a subcategory of Category:Regional Chinese, which is ultimately under Category:Chinese language). It's true that Category:Mandarin Chinese is also a subcategory of Category:Mandarin lemmas, and similarly for Category:Hakka Chinese, Category:Xiang Chinese, etc., but the existence of two categories for each language seems redundant. To make matters worse, as I've been creating new categories for missing lects like Category:Dabu Hakka, I haven't been explicitly specifying the parent category as e.g. Category:Hakka Chinese, with the result that they're placed under Category:Regional Hakka (rather than a child of Category:Regional Chinese), which leads to a very different breadcrumb trail than older lect categories like Category:Hong Kong Hakka. I propose the following:

Eliminate categories 'Foo Chinese' as much as possible. Labels like Mandarin and Hakka should directly categorize into Category:Mandarin lemmas and Category:Hakka lemmas. (It's true that such labels could potentially be used for non-lemma forms as well, but in practice this isn't an issue due to the fact that Chinese languages have almost no morphology to speak of.)
Older-created lect categories like Category:Hong Kong Hakka should have their parents set to e.g. Category:Regional Hakka rather than Category:Hakka Chinese. This brings Chinese lects in line with non-Chinese lects, which always work this way.
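To make the difference in breadcrumb trails concrete, here is a rough sketch that walks parent links for the two schemes. The parent tables are simplified assumptions based on the description above (intermediate categories are omitted, and the parent of "Regional Hakka" is a guess); they are not read from the actual category tree, and `breadcrumbs` is a hypothetical helper.

```python
# Assumed, simplified parent links for the existing label-driven hierarchy.
CURRENT_PARENTS = {
    "Hong Kong Hakka": "Hakka Chinese",
    "Hakka Chinese": "Dialectal Chinese",
    "Dialectal Chinese": "Regional Chinese",
    "Regional Chinese": "Chinese language",
}

# Assumed parent links if lect categories hang off 'Regional Foo' instead.
PROPOSED_PARENTS = {
    "Hong Kong Hakka": "Regional Hakka",
    "Dabu Hakka": "Regional Hakka",
    "Regional Hakka": "Hakka language",  # assumption, by analogy with non-Chinese lects
}


def breadcrumbs(category: str, parents: dict[str, str]) -> str:
    """Follow parent links upward and return the trail from the root down."""
    trail = [category]
    while trail[-1] in parents:
        parent = parents[trail[-1]]
        if parent in trail:  # guard against accidental cycles in the data
            raise ValueError(f"cycle detected at {parent!r}")
        trail.append(parent)
    return " » ".join(reversed(trail))


print(breadcrumbs("Hong Kong Hakka", CURRENT_PARENTS))
print(breadcrumbs("Hong Kong Hakka", PROPOSED_PARENTS))
print(breadcrumbs("Dabu Hakka", PROPOSED_PARENTS))
```

Under the second table, older and newer lect categories end up with the same trail shape, which is the point of the second proposal.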

This leaves a few outstanding issues:

What about the remaining lemmas in categories like Category:Malaysian Chinese and Category:Philippine Chinese? The problem here is that the lemmas are tagged just using the labels Malaysia and/or Philippines without properly identifying which language is involved. Maybe these can be recategorized by bot but I don't know enough about Chinese to do it without help.
Related to this: Currently categories like Category:Malaysian Chinese and Category:Philippine Chinese are children (sometimes grandchildren, etc.) of Category:Overseas Chinese. Do we want to bother with per-language categories like Category:Overseas Hakka, Category:Overseas Hokkien, Category:Overseas Teochew, etc. or just put things like Category:Malaysian Hokkien directly under Category:Regional Hokkien?

Benwing2 (talk) 00:08, 26 March 2024 (UTC)[reply]

@Theknightwho Do you have any opinions here? As I'm filling out the labels and descriptions in Module:labels/data/lang/zh, the dual hierarchy is becoming more and more annoying. Benwing2 (talk) 00:05, 4 April 2024 (UTC)[reply]

I'm also thinking we should add a field to the language extra data and/or the {{auto cat}} call on the language page, which describes the language in more detail. For example, Category:Eastern Min Chinese currently has the definition "Terms or senses in a branch of the Sinitic languages spoken in eastern Fujian Province in southeast China, as well as in parts of extreme southern Zhejiang Province to the north of Fujian and in the Matsu Islands (belonging to Taiwan)", which gives a lot more information than the Category:Eastern Min language page, which just says it's a language spoken in China and some other countries. Benwing2 (talk) 00:09, 4 April 2024 (UTC)[reply]

@Benwing2 I've not been keeping up with the Chinese label discussions, so I'll need to catch up with everything first. I agree the dual system is silly, though. Theknightwho (talk) 02:43, 4 April 2024 (UTC)[reply]

Adding Transitional Proto-Norse as an etymology-only language

Transitional Proto-Norse is the name for the language found in Scandinavian runic inscriptions from around 550–650. It's basically the last stage of Elder Futhark writing before the transition to Younger Futhark (generally also considered the start of Old Norse), and has several orthographic traits in common with the YF. The relevant inscriptions also show innovations found in Old Norse, but not yet in classic Proto-Norse, e.g. syncopation and the merger of the 2nd and 3rd persons in the present tense indicative of verbs. I think having it as an etymological-only language would be useful. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 21:30, 26 March 2024 (UTC)[reply]

@Mårtensås No objections here; we can use a code like gmq-tra for this. Benwing2 (talk) 01:54, 28 March 2024 (UTC)[reply]

@Benwing2 That code would work. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 19:36, 28 March 2024 (UTC)[reply]

What sources do you have for this name? -- Sokkjō 06:55, 28 March 2024 (UTC)[reply]

It's well accepted in the literature. If you search "transitional period" "Proto-Norse" on Google you will find numerous digitised scholarly articles that use it. Düwel & Nedoma (Runenkunde 5th edition, p. 124):

Einige skandinavische Inschriften um 600 zeigen bereits Charakteristika des jüngeren Fuþark, z.B. hAborum (ᛡ A für a) und hagestumʀ (ᚨ a für aⁿ) auf Stentoften (S. 25 f.) oder uþArAbsbA (-sbA für -spā) auf Björketorp (S. 51). Aus der nachfolgenden Zeit bis in das 8. Jh. hinein sind relativ wenige runenepigraphische Texte überliefert; die Beschaffenheit des Korpus dieser Übergangsinschriften (engl. transitional inscriptions) wird verschieden beurteilt (s. u.a. Birkmann 1995, 219 ff.; Barnes 1998; Grønvik 2001a, 64 ff.; Schulte 2010; Stoklund 2010).

The references are:

Thomas Birkmann, Von Ågedal bis Malt. Die skandinavischen Runeninschriften vom Ende des 5. bis Ende des 9. Jahrhunderts (= RGA-E 12; Berlin – New York 1995).
Michael P. Barnes, The Transitional Inscriptions. In: Düwel / Nowak 1998, 448–461.
Ottar Grønvik, Über die Bildung des älteren und des jüngeren Runenalphabets (= OBG 29; Frankfurt/Main etc. 2001).
Michael Schulte, Der Problemkreis der Übergangsinschriften im Lichte neuerer Forschungsbeiträge. In: Askedal et al. 2010, 163–189.
Marie Stoklund, The Danish Inscriptions of the Early Viking Age and the Transition to the Younger Futhark. In: Askedal et al. 2010, 237–252.

ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 18:02, 28 March 2024 (UTC)[reply]

I've never seen the term "Transitional Proto-Norse" though. For many languages, we have etymology-only codes for late and early forms, and Early Old Norse and Late Proto-Norse are terms that actually have traction in the literature, so I would recommend either [non-ear] or [gmq-lat] instead. -- Sokkjō 19:51, 28 March 2024 (UTC)[reply]

Some pings: @Mnemosientje, Mahagaja -- Sokkjō 19:54, 28 March 2024 (UTC)[reply]

I prefer "Late Proto-Norse" as well. The phrase "transitional period of Proto-Norse" is common enough, but not "Transitional Proto-Norse" as a lect name. —Mahāgaja · talk 20:09, 28 March 2024 (UTC)[reply]

Etymology trees

I would like to add etymology trees to some of our entries. Here's an example which could potentially be used in mainspace. I personally think there's value in representing etymology in such a visual way, and these trees are extremely easy to create (my template can create them automatically). I'd like to hear your thoughts. Ioaxxere (talk) 02:15, 27 March 2024 (UTC)[reply]

Support The visual styling is helpful for understanding. Maybe in instances where the etymology is more than x levels deep, a tree could be created. I'm open to others' thoughts as well. —Justin (koavf)❤T☮C☺M☯ 02:20, 27 March 2024 (UTC)[reply]

Oppose The example looks like {{desctree}}, just on a different page. --RichardW57m (talk) 13:48, 27 March 2024 (UTC)[reply]

@RichardW57m: I think you're misunderstanding the concept. A descendants tree and an etymology tree are exact opposites in that a descendants tree has all the descendants for a particular ancestor while an etymology tree has all the ancestors for a particular descendant. They only look similar when the tree is a simple chain (i.e., A -> B -> C -> D). Some etymology trees are much more complicated, like puny or every. Ioaxxere (talk) 21:20, 27 March 2024 (UTC)[reply]
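The distinction can be sketched with a small graph. The data below uses the terms from the "father" example given elsewhere in this discussion; which etymons feed into which term is my reading of that example, and the function names are hypothetical. This is not the {{etymon}} implementation.

```python
# Each term maps to its immediate etymons (a term may have more than one).
ETYMONS = {
    "English father": ["Middle English fader"],
    "Middle English fader": ["Old English fæder"],
    "Old English fæder": ["Proto-West Germanic *fader"],
    "Proto-West Germanic *fader": ["Proto-Germanic *fadēr"],
    "Proto-Germanic *fadēr": ["Proto-Indo-European *ph₂tḗr"],
    "Proto-Indo-European *ph₂tḗr": [
        "Proto-Indo-European *peh₂-",
        "Proto-Indo-European *-tḗr",
    ],
}


def ancestors(term: str, etymons: dict[str, list[str]] = ETYMONS) -> list[str]:
    """Etymology tree: everything the given term ultimately comes from."""
    found: list[str] = []
    for parent in etymons.get(term, []):
        for item in [parent, *ancestors(parent, etymons)]:
            if item not in found:
                found.append(item)
    return found


def descendants(term: str, etymons: dict[str, list[str]] = ETYMONS) -> list[str]:
    """Descendants tree: everything that ultimately comes from the given term."""
    found: list[str] = []
    for child, parents in etymons.items():
        if term in parents:
            for item in [child, *descendants(child, etymons)]:
                if item not in found:
                    found.append(item)
    return found


print(ancestors("English father"))
print(descendants("Proto-Germanic *fadēr"))
```

On a simple chain the two walks visit the same terms in opposite directions; they diverge as soon as a term has several etymons (as with the two Proto-Indo-European elements here) or an ancestor has several descendants.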

@Ioaxxere: So why not give examples of the trees for these? --RichardW57m (talk) 09:23, 28 March 2024 (UTC)[reply]

@RichardW57m: See #New design, #Feedback on proposed label designs, and Special:Permalink/78726876 for more examples. Ioaxxere (talk) 08:04, 1 April 2024 (UTC)[reply]

Support I concur. I think there should be a way to visualise the etymology section, since it can definitely make things clearer. It can also make things more codified. However, if this becomes permanent, a point to discuss would be lemmas whose etymology isn't clear, i.e. how they would be represented visually, and lemmas which have mixed etymology (for instance lemmas in which a specific sense is influenced by a different language). نعم البدل (talk) 23:54, 27 March 2024 (UTC)[reply]

Mild Oppose. I am opposed in particular to the current implementation involving a separate {{etymon}} specification with information duplicated between the {{etymon}} call and the actual text of the Etymology sections. I have expressed my concerns above. Benwing2 (talk) 01:53, 28 March 2024 (UTC)[reply]

@Benwing2: I agree that duplication is not ideal. My original idea was to have the template automatically generate text and have that be the only thing in the etymology section, but I feel now that that's only really possible for the simplest cases. Another idea is to use special flags on our current templates, so something like From {{m|en|something}} might be replaced with From {{m|en|something<id:whatever><etymon>}} to mark that particular term as an etymon. Ioaxxere (talk) 05:58, 28 March 2024 (UTC)[reply]
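As a rough illustration of the inline-flag idea, the sketch below splits a link parameter such as `something<id:whatever><etymon>` into the term and its flags. The `<key:value>` / `<flag>` syntax is taken from the example above; the parsing rules and the function name `parse_link_param` are assumptions, not an existing module's behaviour.

```python
import re

# One angle-bracket modifier, e.g. <id:whatever> or <etymon>.
MODIFIER = re.compile(r"<([^<>]*)>")


def parse_link_param(param: str) -> tuple[str, dict[str, object]]:
    """Split 'term<key:value><flag>' into the bare term and a dict of flags."""
    flags: dict[str, object] = {}
    for body in MODIFIER.findall(param):
        key, sep, value = body.partition(":")
        # A modifier without a colon is treated as a boolean flag.
        flags[key] = value if sep else True
    term = MODIFIER.sub("", param)
    return term, flags


term, flags = parse_link_param("something<id:whatever><etymon>")
print(term, flags)
```

With something like this, the etymon marking would live in the same template call as the link text, so nothing needs to be stated twice.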

@Ioaxxere As User:AG202 points out, this is a huge change, and needs a lot more thought and design before being rolled out. That's another reason I Oppose adding it to mainspace at this time. Benwing2 (talk) 06:02, 28 March 2024 (UTC)[reply]

Weak oppose: The issues with spacing need to be resolved, and there need to be more eyes on the matter. I like the idea, nonetheless, but it feels very rushed for such a big change. (I'm also not really sure I like the way it looks that much?) AG202 (talk) 02:03, 28 March 2024 (UTC)[reply]

Strong oppose: Looks just awful. -- Sokkjō 06:53, 28 March 2024 (UTC)[reply]

Strong support - looks pretty much the same as what we already have in descendant sections, and automating this has huge potential. Theknightwho (talk) 04:08, 29 March 2024 (UTC)[reply]

Support I think it has potential. I saw the redesigned tree and it looks very promising! That being said, I still would like to see the end product before fully implementing it. — Sameer ^{﴾مشارکت‌ها・بحث﴿} 07:07, 30 March 2024 (UTC)[reply]

New design

I've created a new etymology tree design. See below:

Etymology tree of English puny

Proto-Indo-European *pós

Proto-Italic *posti

Latin post

Proto-Indo-European *íh₂

Latin ea

Latin posteā

Vulgar Latin *postius

Old French pois

Proto-Indo-European *ǵenh₁-

Proto-Indo-European *-tós

Proto-Indo-European *ǵn̥h₁tós

Proto-Italic *gnātos

Latin gnātus

Latin nātus

Old French né

Anglo-Norman puisné

English puisne

English puny

~~Let's do a new poll in regards to this design, as a few oppose votes were made on the basis of the template's appearance. Ioaxxere (talk) 07:46, 29 March 2024 (UTC)~~[reply]

Still Oppose, except this one straight up does not work well on mobile. Again, with a change this visible and massive, I'd avoid rushing it. It needs to be properly tested everywhere. I really do support the idea, but it needs to be done well. (Also one might argue that it'd need to have a formal vote, and if it does, you'd really need to have it entirely fleshed out) AG202 (talk) 07:57, 29 March 2024 (UTC)[reply]

I definitely agree that such a change would need a formal vote because it is a pretty large change to how the dictionary looks. Thadh (talk) 09:15, 29 March 2024 (UTC)[reply]

Shame it doesn't work on mobile, otherwise it's a great improvement over the first one. It makes things so much clearer. نعم البدل (talk) 15:34, 29 March 2024 (UTC)[reply]

@نعم البدل: What does it look like on your device? On my phone it seems to be working properly. Ioaxxere (talk) 15:55, 29 March 2024 (UTC)[reply]

@Ioaxxere: Sorry, I meant like it becomes too wide, not that it doesn't work (but that's generally with any visual template on this website). I tried it on my phone again, just now, and it turned out to be an anomaly before. It works better than other visual templates on this website. نعم البدل (talk) 16:03, 29 March 2024 (UTC)[reply]

@نعم البدل: Good to know! And thank you for the kind words—I've spent an absurd amount of time making those grey connectors pixel-perfect. Also note that the tree on English puny is one that is exceptionally wide. Here's a more representative example, for English father:

Proto-Indo-European *peh₂-

Proto-Indo-European *-tḗr

Proto-Indo-European *ph₂tḗr

Proto-Germanic *fadēr

Proto-West Germanic *fader

Old English fæder

Middle English fader

English father

No poll. Let's have a discussion about the design. Also, I think it should be working fine on mobile now. Ioaxxere (talk) 08:46, 29 March 2024 (UTC)[reply]

Oppose: If we ever started creating etymology trees for entries, it would have to be a very complex module, with ways to mark derivational types, certainty, alternatives, mergers, etc. Create such a module and then maybe we can explore its inclusion on entry pages with a proper vote. --{{victar|talk}} 19:58, 29 March 2024 (UTC)[reply]

Oppose. I'm impressed by the visual output and effort that went into this, but we are still a dictionary. Treeing around is amusing but unscientific. {{desctree}} is also in my opinion much abused. Common sense tells us when an etymology or a descendants branch should stop. Catonif (talk) 17:53, 30 March 2024 (UTC)[reply]

Abstain. Interesting idea but it is missing a distinction between inheritance and borrowing and seems rather elaborate to implement. I would rather we focus on your other idea of limiting ety sections to ‘one step up’ (when the ancestor entry exists) and filling in the rest with an automated module. Said module could then later be modified to output tree diagrams, perhaps, if that can be done well. Nicodene (talk) 22:42, 16 April 2024 (UTC)[reply]

Abstain I like the idea and the look; I get more information more quickly from such a small diagram than from a paragraph of text. But I think this idea should be worked out in more detail before it is adopted, as expressed by Victar. Which entries should qualify for etymology trees? —Caoimhin ceallach (talk) 13:15, 22 April 2024 (UTC)[reply]

Changing the letters sort order on Arabic dialects' category pages

I suggest changing the letter sort order on Arabic dialect category pages so that the non-native letters come at the end.

For example, the current order for Hijazi Arabic is

آ أ إ ا ب پ ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ڤ ق ك ل م ن ه و ي

but it should be

آ أ إ ا ب ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ق ك ل م ن ه و ي . پ ڤ

with the non-native پ and ڤ coming at the end and separated by a dot, since they are not part of the original Arabic alphabet.

Standard Arabic should be sorted the same way, and the table should run from right to left as in the example (unlike the current left-to-right layout for Standard Arabic), since that looks more appealing and correct. @Benwing2 عربي-٣١ (talk) 12:54, 27 March 2024 (UTC)[reply]

@عربي-٣١: A better argument would be examples of alphabetic orderings in use, as people have many ways of handling the alphabetical position of additional letters. --RichardW57m (talk) 13:36, 27 March 2024 (UTC)[reply]

The current order is more appealing and correct. The current order of the alphabet pays respect to internal etymological and graphical relations of the letters anyway. A division due to nonnativeness is outlandish. A German or English entry with é will also be sought under e, and so on for any Latin-script language. Interestingly German çöp is at the end of the order, and should arguably be because c is not much of a letter in German and since the 1901 spelling reform virtually only used in digraphs: I rather expect it with c anyway, as due to French and Czech borrowings, and the only reason it is otherwise now is the default Unicode order. But پ (p) has little reason not to be put to ب (b), as a specification of it, even less reason than ç to c. Fay Freak (talk) 14:58, 27 March 2024 (UTC)[reply]

I was only talking about the table of letters at the top of each category page (like the one at Category:Hijazi Arabic terms with IPA pronunciation). If that is so, then they should be removed from it altogether, since they are variants of the letters (as German ç is a variant of German c), not full letters, and most speakers do not use or pronounce them either.

The sorting table on every Arabic page should do one of the following:

1- either omit پ ڤ گ ژ چ and any other non-native variants, or

2- put those variants at the end. Putting the variants between the letters gives the impression that they are part of the official language or of the alphabetical order. عربي-٣١ (talk) 16:38, 27 March 2024 (UTC)[reply]

There is no such thing as an official letterset, nor does the category give an impression, it is to navigate for people who already have an idea about the writing system of the language and what could occur in an entry title. Fay Freak (talk) 17:03, 27 March 2024 (UTC)[reply]

Actually, there are all sorts of orders even for the Latin script. In Vietnamese, tone marks don't create separate letters, but vowel quality modifiers do. The modified vowel letters occur at the end of the alphabet in some Scandinavian alphabets. Looking at the listing of various Arabic script alphabets in Appendix:Arabic_script, I note that while پ (p) occurs in the same part of the alphabet as ب (b), its precise position varies. --RichardW57 (talk) 08:27, 28 March 2024 (UTC)[reply]

@Fay Freak. @Benwing2 Yes and I was talking only about the Arabic dialects and Standard Arabic, these additional letters are already sorted after the regular letters, but on the main table they are shown in the middle of it (پ after ب and ڤ after ف), so I suggest removing them or putting them at the end of it as they are sorted already

I did not mean to remove those variants completely عربي-٣١ (talk) 20:33, 28 March 2024 (UTC)[reply]

@عربي-٣١: A codepoint sort, which is what you are seeing, should not be confused with a considered order for sorting. --RichardW57 (talk) 03:04, 29 March 2024 (UTC)[reply]
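A small sketch of the difference between a codepoint sort and a considered (tailored) order, using a handful of letters written as escapes to avoid bidirectional display problems. The tailored order shown, with each added letter directly after the letter it is based on, is only one possible choice, picked for illustration; it is not the order any Wiktionary module uses.

```python
BEH, TEH, FEH, QAF = "\u0628", "\u062A", "\u0641", "\u0642"  # letters of the basic alphabet
PEH, VEH = "\u067E", "\u06A4"                                # added letters (p and v)

letters = [QAF, VEH, FEH, TEH, PEH, BEH]

# Codepoint sort: the added letters fall after everything in U+0620..U+064A.
codepoint_order = sorted(letters)

# Tailored sort: rank each letter by its position in an explicitly chosen alphabet.
TAILORED_ALPHABET = [BEH, PEH, TEH, FEH, VEH, QAF]
RANK = {letter: index for index, letter in enumerate(TAILORED_ALPHABET)}


def tailored_key(word: str) -> tuple[int, ...]:
    """Sort key for whole words: compare letter by letter using the tailored ranks."""
    return tuple(RANK[ch] for ch in word)


tailored_order = sorted(letters, key=tailored_key)

print([f"U+{ord(ch):04X}" for ch in codepoint_order])
print([f"U+{ord(ch):04X}" for ch in tailored_order])
```

Moving the added letters to the end, or omitting them from the table, would just be a different `TAILORED_ALPHABET`; the codepoint order is an accident of encoding and not a decision either way.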

Aquitanian entries in reconstruction namespace

As of now, all pages in the Aquitanian lemmas category are in the reconstruction namespace, aside from a few personal names which are in the main namespace. Most of these words appear to be attested, so they should be moved to main. -saph 🍏 13:47, 28 March 2024 (UTC)[reply]

@Saph668 Overall this sounds fine for attested terms except that when they are moved, they should have the source more clearly indicated and use the form as actually attested rather than reconstructed. Currently they just say "Known from Aquitanian inscriptions" but that says nothing about (a) which inscription it is, (b) what form the term appeared in the inscription, (c) what the context was. Benwing2 (talk) 20:29, 28 March 2024 (UTC)[reply]

User:UtherPendrogn haunting us till this day.

My understanding is that Aquitanian is unattested, with the reconstructions based solely on place and personal names found in Latin texts, much like Gaulish. If that is indeed the case, all the proper names should be moved to a Latin header or an Aquitanian reconstruction. --{{victar|talk}} 22:16, 28 March 2024 (UTC)[reply]

The Aquitanian corpus is limited to a few proper nouns in otherwise Latin inscriptions. While these proper nouns often contain elements from Proto-Basque, most entries in Category:Aquitanian lemmas are just duplicates of (sometimes dubious) Proto-Basque reconstructions written with an ad-hoc orthography. I've cleaned up a few of them (keeping them under the Aquitanian header), but I find the reconstruction entries pretty useless. If we were to move the actually attested forms to Latin, it might be a good idea to make Aquitanian an etymology-only language. Santi2222 (talk) 21:16, 18 April 2024 (UTC)[reply]

Japanese bot task proposal - on'yomi categorization

Hello, since I last posted (link broken, please Ctrl+F "Granularity of reading types"...) about the topic of whether we should specify the type of on'yomi (kan'on, goon, kanyouon, etc.) used by Chinese-derived terms (via {{ja-kanjitab}}'s |o= param) — which didn't get much activity, but I believe we agreed it would be good to specify — I've been making quite a few changes to that effect in which I simply go to the respective kanji pages, read off whether the on'yomi is kan'on or goon and just add it to the relevant entry (e.g. diff). This task is no trouble to automate, at least partially, since I could make the bot fetch all this data and see whether the reading types can be unambiguously labelled, or whether they'd overlap (I don't propose guessing, if a character has e.g. きょう as both goon and kan'on readings, which one it is). And if it's unambiguous, the reading types can simply be filled in. It also helps that some months ago I parsed out the complete readings of all the kanji we had covered at the time, so they could be accessed quite quickly (although they're a bit less current now).
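The "fill in only when unambiguous" rule could look roughly like the sketch below. The reading tables are placeholder data (the keys `kanji_A` and `kanji_B` and their readings are invented for illustration and not checked against any kanji entry), and the type names `goon`, `kanon` and `kanyoon` as well as the function name are assumptions about how the bot might label things.

```python
from typing import Optional

# Placeholder data: for each kanji, the on'yomi listed under each reading type.
READINGS = {
    "kanji_A": {"goon": ["きょう"], "kanon": ["けい"]},
    "kanji_B": {"goon": ["きょう"], "kanon": ["きょう"], "kanyoon": ["こう"]},
}


def classify_reading(kanji: str, reading: str) -> Optional[str]:
    """Return the reading type if exactly one type lists this reading, else None."""
    by_type = READINGS.get(kanji, {})
    matches = [rtype for rtype, readings in by_type.items() if reading in readings]
    # No guessing: zero matches or several matches both leave the parameter unset.
    return matches[0] if len(matches) == 1 else None


print(classify_reading("kanji_A", "きょう"))  # listed under one type only
print(classify_reading("kanji_B", "きょう"))  # listed under two types, so left alone
print(classify_reading("kanji_B", "こう"))
```

Entries for which the function returns `None` would simply be skipped by the bot and left for manual review.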

Does this sound like a good/acceptable bot task? Thanks for any feedback, Kiril kovachev (talk・contribs) 14:03, 28 March 2024 (UTC)[reply]

@Kiril kovachev Hi. Please take a look at [12], which is a script I wrote a while ago to do something very similar, which is pull out the type of reading from kanji pages and insert it into category pages of the form Category:Japanese terms spelled with FOO read as BAR, e.g. Category:Japanese terms spelled with 柄 read as ひ. It looks either for {{ja-readings}} or {{ja-kanjitab}}. As for your proposed bot task, yes it sounds fine to me. Benwing2 (talk) 20:26, 28 March 2024 (UTC)[reply]

@Benwing2 Oh, thanks for that. Do you think I should adapt your script to do the changes I need or did you just mean to have a look for ideas what needs to be checked, etc.? Kiril kovachev (talk・contribs) 20:43, 28 March 2024 (UTC)[reply]

@Kiril kovachev It's your choice. I'm not sure what state your scripts are in, but feel free to reuse/adapt the code or simply look at the logic to make sure you haven't missed anything. Benwing2 (talk) 20:50, 28 March 2024 (UTC)[reply]

Alright, thanks very much, I'll have a proper good look when I sit down and try to code it. Kiril kovachev (talk・contribs) 20:53, 28 March 2024 (UTC)[reply]

I noticed I was blocked permanently in March, while I did not edit anything these 2 months.

Can anyone give me an explanation? I don't know who is in charge now.

@Benwing2, Chuck Entz -- Huhu9001 (talk) 03:39, 29 March 2024 (UTC)[reply]

@Huhu9001 The block log shows that you were blocked in Nov 2023 by User:Theknightwho for one year from the Module space, which was changed earlier this March to a permanent block. You'll have to ask User:Theknightwho why he saw fit to block you like this. However, I do notice that several other people blocked you earlier, so this can't be chalked up to "just not getting along with a single administrator". Wyang in particular said "Repeat offender, discourteous, defiant", which means you don't recognize what you did wrong; IMO this doesn't bode well for an unblock. Benwing2 (talk) 03:48, 29 March 2024 (UTC)[reply]

@Benwing2 Huhu9001 has ignored everything I've said for months now, but I changed the block length because Huhu9001 had already had two one-month blocks from the module namespace, which made absolutely no difference to their behaviour. In March, I changed the block to a permanent one with the explanation Edit: this should require an appeal to expire, given the long-term nature of the abuse., because it seems highly unlikely that Huhu9001 is ever going to learn how to get along with other users, and I don't want to periodically have to deal with their (mis)behaviour every time a block expires. Theknightwho (talk) 03:54, 29 March 2024 (UTC)[reply]

Wyang himself left Wiktionary in a highly dishonorable manner which pretty much tells that he is himself "Repeat offender, discourteous, defiant" instead. And are you justifying a block now by another one several years ago? -- Huhu9001 (talk) 03:55, 29 March 2024 (UTC)[reply]

@Benwing2: Also I notice other well-managed Wikiprojects like English Wikipedia enforce a rule that:

Administrators must not block users with whom they are engaged in a content dispute; instead, they should report the problem to other administrators. Administrators should also be aware of potential conflicts involving pages or subject areas with which they are involved. It is acceptable for an administrator to block someone who has been engaging in clear-cut vandalism in that administrator's userspace.

(w:WP:BLOCKNO). I roughly remember someone told me this is a custom or somewhat "softer" rule here. Why does it never have any effect when I need it to protect me from abuse? Is Wiktionary giving its admins too much arbitrariness? -- Huhu9001 (talk) 04:01, 29 March 2024 (UTC)[reply]

The block breaks even Wiktionary's own blocking policy (Wiktionary:Blocking policy). For infinite block length:

Blatant or confirmed sockpuppets created for the purpose of vandalism or block evasion.
Sockpuppets, nope.
Abuse, plagiarism, persona non grata type blocks, based on community consensus.
Put aside "Abuse, plagiarism, persona non grata" part. TKW silently changed my block to infinite. What consensus did he get? What consensus did he ever get for any of my blocks?
Bad username accounts, including: email addresses, exploitative names, copycats, offensive names, etc.
Nope.
CheckUser-identified bad sockpuppets.
Same as #1, nope.

Is not even Wiktionary:Blocking policy treated seriously any more on Wiktionary? -- Huhu9001 (talk)

Is Wiktionary:Blocking policy a valid document? @Benwing2, Chuck Entz -- Huhu9001 (talk) 04:10, 29 March 2024 (UTC)[reply]

@Huhu9001 I would have more patience with you if you showed even a little bit of understanding of the pattern of bad behavior you've engaged in. Instead you resort to wikilawyering and casting aspersions at people who are no longer active and hence cannot defend themselves. Benwing2 (talk) 04:36, 29 March 2024 (UTC)[reply]

When have you ever had any patience for me? Did you ever try to think carefully about my stance and reasoning even a single time, let alone to stop TKW's abusive behaviour? If so I simply have never seen it. "Wikilawyering" is a convenient accusation, but shouldn't there be any respect for rules, including Wiktionary:Blocking policy? For unprivileged normal users, what else can we rely on except for rules? -- Huhu9001 (talk) 04:46, 29 March 2024 (UTC)[reply]

Now since someone mentioned my block history. I can say I have never been blocked on any other Wikiprojects with the only exception being here. It is quite arguable whether the responsibility lies on my own side or on English Wiktionary's side, as English Wiktionary has almost always been failing to put meaningful checks on how its admins use their rights, just as I can see in the Wiktionary:Blocking policy case here. -- Huhu9001 (talk) 05:30, 29 March 2024 (UTC)[reply]

Japanese いぃ, うぅ, イィ, ウゥ

Bringing up this topic again because I got permanently blocked for this. Original discussion: Wiktionary:Beer_parlour/2023/August#Japanese いぃ, うぅ, イィ, ウゥ. -- Huhu9001 (talk) 04:35, 29 March 2024 (UTC)[reply]

To check whether they represent yi and wu or ī and ū, here are the top hits from Google. Hits of the same proper names, mojibake and cases where it is not possible to tell, like うぅ居酒屋 (name of an izakaya), are ignored.
* いぃ
*# 那珂宣伝部/いぃ那珂暮らし (a city promotion group of Naka 那珂, "good Naka"): ī
*# 【音量注意】"樋口いぃいいぃいぃぃいいい" ("Higuchi gooooood"): ī
*# 新喜劇アキ【いぃよぉ～講座】("good"): ī
*# 海を感じる、エモいぃ～スポット ("emotional", inflection ending): ī
*# いぃべあー楽天 (a company name, "e-Bear"): ī
*# 台本のないコメディーvol．5 ～全部アドリブでいぃよぉ～～("good"): ī
*# いぃ〜バンド(e-band)結束バンド、アソート: ī
* イィ
*# yee(イィ) (a fashion brand): yi
*# イィの英訳 - 英辞郎: yi
*# 10年後イィ女になるために！！ ("good"): ī
*# Satomi (演歌)/イィ...女 ("good"): ī
*# 闇の洗礼をうけるがイィ！ 暴君ハバネロ外伝 ("good"): ī
*# ヴィエイィニュアンセ (a brand name, "vieille nuance", French "vieille" /vjɛj/): yi?
*# 鳥イィ？ | 上地雄輔 OFFICIAL SITE ("Like some birds?"): ī
* うぅ
*# うぅ・・・の人気イラストやマンガ/ううぅ、うぅ、あ・・（息が苦しい）/うぅぉおお/うぅぅ～暑いっ、コンビニ行こっ！ (interjections "urgh", "hmm" or the likes): ū (many hits from different sites)
*# ひねくれ領主の幸福譚 性格が悪くても辺境開拓できますうぅ！(lengthening of the sentence's ending): ū
*# ぶぅふぅうぅ農園 (@boohoowoofarm): wu
*# さうすまうぅん sausumaUun (an artist from Sapporo?): ū
*# 【ホットペッパービューティー】デフィ(defi)のフォトギャラリー：カラフルぅうぅううぅー！("colorfuuuuul"): ū
* ウゥ
*# 腕時計 ヴィヴィアンウゥストウッド (a brand name, "Vivienne Westwood"): wu
*# ウゥ～～～ン、店を出てからは振り返りたくない店かもしんないな/ウゥ～ン・・・落ちた～？？/ ウゥ〜(interjections "urgh", "hmm" or the like): ū (many hits from different sites)
*# アズノウゥアズピンキー 美品 ロングシーズン ロゴ刻印ボタン (a brand name, "As Know As"): wu?
*: (Many ウゥ hits seem to be misspellings of ウィ, ウェ and ウォ, like "ウゥストウッド" above and "ミニウゥレット" for "ミニウォレット".)
My conclusion is that these syllables are more often ī and ū. Especially when in hiragana, they are almost always ī and ū. Wiktionary transliterating all of them as yi and wu is a mistake. yi and wu should be treated as special cases, just as we have ヲチ (芸能人ヲチ, etc) as unusual wochi instead of regular ochi. -- Huhu9001 (talk) 08:58, 27 August 2023 (UTC)[reply]

Request for comments. (Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): -- Huhu9001 (talk) 04:35, 29 March 2024 (UTC)[reply]

Support. Stuff like かわいいぃいぃいぃいぃ would then become kawaiyiyiyiyi instead of kawaiiiiiiiii if いぃ transcribes just yi. をぅ•ヲゥ can be used to transcribe wu, although を/ヲ is only used for the accusative. Transcription of yi is more controversial, because there are no modern single kana characters of [j] + [+front], as ye could theoretically be transcribed いぇ as in イェイ (iei, “yay!”), but otherwise looks like a blend of a [+high] + [+mid] which could be recognized as a glide. Chuterix (talk) 13:57, 29 March 2024 (UTC)[reply]

Looks legit, as long as it's somehow possible to choose which one to use in a given context. I agree that ī/ū seems more logical, based on your examples, but frankly I've never seen these used in a real scenario so I can't say much more with confidence. Kiril kovachev (talk・contribs) 02:35, 2 April 2024 (UTC)[reply]

@Chuterix @Kiril kovachev If you look at the examples given, almost all of the ī and ū examples are down to syllable lengthening in the same manner as English: "gooooood". However, this is an absurd standard to use: it's the equivalent of using a sentence like "it's hooooot today" to justify "oo" sometimes standing for short "o" in English. It doesn't - it's simply lengthening for emphasis, and isn't the kind of spelling we're ever going to lemmatise at. On the other hand, the yi and wu terms are actually terms in their own right. Theknightwho (talk) 00:42, 13 April 2024 (UTC)[reply]

@Theknightwho You're right... it would make sense that yours be the default, with that in mind. But I still do think we need both to be possible somehow, because how will we handle sentences (e.g. in {{ja-usex}}) where we need いぃ to equal ī? Kiril kovachev (talk・contribs) 14:38, 13 April 2024 (UTC)[reply]

@Kiril kovachev An override exists with the rom= parameter (though this should probably be changed to tr=). It might be worth having a flag like . and ^, but this is really rare: we have two terms which use wu (ウゥルカーヌス (Wurukānusu) and its alt form), and I don't think we have any yi terms, though it does show up in some French place names, like プイィ゠シュル゠ロワール (Puyi-shuru-Rowāru, “Pouilly-sur-Loire”). We have no terms that I can find which use these for ī or ū. Theknightwho (talk) 15:04, 13 April 2024 (UTC)[reply]

@Theknightwho Hm, okay, that's fine, I guess. I believe it's not optimal because the romanization would still need to be manually written out, whereas we usually automate it with the kana transcription. But no, that might not be worth fussing over given that it is a very rare thing. I've not actually seen any uses of either of these till these discussions, so I won't worry about it any more. Thanks for the explanation. Kiril kovachev (talk・contribs) 15:13, 13 April 2024 (UTC)[reply]
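
For illustration only, the "default reading plus manual override" arrangement described in this thread could be sketched as below. This is not Wiktionary's romanisation module: the function name `romanize_pair` and the `lengthened` and `manual` parameters are hypothetical, with `manual` standing in for an override such as the rom= parameter mentioned above.

```python
from typing import Optional

# Hypothetical sketch, not the actual Wiktionary module.
# Default readings follow the yi/wu convention discussed in the thread.
DEFAULT_READING = {"いぃ": "yi", "イィ": "yi", "うぅ": "wu", "ウゥ": "wu"}

# Long-vowel readings (ī = U+012B, ū = U+016B), used only on request.
LENGTHENED_READING = {
    "いぃ": "\u012b",
    "イィ": "\u012b",
    "うぅ": "\u016b",
    "ウゥ": "\u016b",
}


def romanize_pair(kana: str, lengthened: bool = False,
                  manual: Optional[str] = None) -> str:
    """Romanise one of the four kana pairs discussed above.

    `manual` plays the role of a hand-written override and wins over
    everything else; `lengthened` selects the emphatic long-vowel reading.
    """
    if manual is not None:
        return manual
    table = LENGTHENED_READING if lengthened else DEFAULT_READING
    return table[kana]
```

With this arrangement the rare yi/wu terms need no extra markup, while a usage example containing emphatic lengthening would have to pass the flag or an explicit romanisation.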

Splitting WT:RFM

What do people think of splitting WT:RFM? It's currently at > 1MB in size, which is very large. We split the various RFV and RFD pages at half that size. Yes, we should be better about archiving conversations but there's only so far that gets you. There are two possibilities: Split along the same lines as RFV/RFD (which splits approximately into English, CJK, Italic, and everything else), or just split for now into English/Non-English. I am inclined to do the latter as a first split; we can always split later as needed. One issue is what to do with Templates, Categories and the like that occur in WT:RFM; maybe we should have a three-way split at first: (1) English lemmas (WT:RFME); (2) non-English lemmas (WT:RFMN); (3) non-mainspace pages, including categories, templates, languages and the like (WT:RFMO). Thoughts? Benwing2 (talk) 05:10, 29 March 2024 (UTC)[reply]

Makes sense. — Sgconlaw (talk) 05:16, 29 March 2024 (UTC)[reply]

It really doesn't matter how much you split it. If no one is closing discussions then the new subpages will get arbitrarily large eventually. Ioaxxere (talk) 08:03, 29 March 2024 (UTC)[reply]

In the past, some people suggested moving language mergers, splits, renames, etc to either the BP or a language-specific page (the latter of which is the better idea to avoid just inflating the BP while deflating RFM, heh). (The reason they're on RFM at all is that originally, merging or splitting a language entailed merging its specific template, merging an actual page in the manner RFM usually does.) I've come around to the idea: have a language-discussion page, maybe even Wiktionary:Language treatment/Discussions, and then archive the discussions to manageably-sized archive subpages of its talk page (which would entail moving the current contents of that page and its talk page to such subpages). Moving language discussions off RFM would knock it down from 1,012,358 bytes to 457,067 bytes (removing 555,291 bytes).
Ioaxxere is correct that if no-one is archiving, the split pages will just get large again, though. - -sche (discuss) 13:37, 29 March 2024 (UTC)[reply]

@-sche Support moving language change discussions somewhere else. Benwing2 (talk) 20:31, 29 March 2024 (UTC)[reply]

I've wanted to do this for a long time. The hard part is what to call it. Is there some type of place where people go to discuss or make decisions on language issues? I suppose something alliterative is better than nothing: maybe "Lect lounge" or "Lect library". Or the "Lect embassy"/"Lect office"/"Lect bureau"? Or maybe emulating some kind of international organization: "League of lects/languages"? "Language court"? "Language academy"? Or something random like the vaguely Lovecraftian "Glossonomicon"?

Another option would be to separate entries/terms from everything else, so that templates, categories, appendices, languages, etc. would go to "RFMO". Chuck Entz (talk) 22:44, 29 March 2024 (UTC)[reply]

Lect Lounge! Lect Lounge! (Not that I’m going to be spending much time there, unfortunately.) — Sgconlaw (talk) 22:52, 29 March 2024 (UTC)[reply]

I admire the creativity, but I think we should stick with something that clearly communicates the purpose of the page, which will not be a general "lounge" to discuss anything lect-related (compare WT:Etymology scriptorium), but will specifically host proposals to change the way Wiktionary divides up the world's languages. I'd prefer -sche's idea of using Wiktionary:Language treatment/Discussions, even if it is terribly anodyne and sits outside the existing "requests for..." structure. Another possibility would be to split RFM into WT:Requests for moves, mergers and splits/Entries and WT:Requests for moves, mergers and splits/Languages, although that leaves no scope for the relatively rare requests to merge templates, appendices etc. This, that and the other (talk) 09:56, 30 March 2024 (UTC)[reply]

Hard Lithuanian Dotting

Should dots be preserved for Lithuanian when the accent is marked on 'i'? I noticed that we have a head word pìrmas when I would have expected pi̇̀rmas. The preservation is done by inserting U+0307 COMBINING DOT ABOVE between the ASCII letter and the accent. (Some fonts barely show the difference.) WT:About Lithuanian is silent about the spelling of Lithuanian head words.

There seems to be a problem with stripping these combining dots as part of diacritic stripping for links. I would assume that that's minor rather than a show stopper. --RichardW57 (talk) 15:28, 29 March 2024 (UTC)[reply]

@RichardW57 Why would we need to insert U+COMBINING DOT ABOVE in the Lithuanian entry names? This isn't done anywhere else and the result looks bad in my browser. Something like pìrmas is completely unambiguous since Lithuanian doesn't have dotless i's in its repertoire. Benwing2 (talk) 19:25, 29 March 2024 (UTC)[reply]

It's always been the case that when you add a diacritic on top of i, it replaces the dot. I'd be surprised to see a font that didn't do this. Entry name replacements have no effect on displayed text. — Eru·tuon 21:58, 29 March 2024 (UTC)[reply]

@Benwing2, Erutuon: Historically, the suppression of the dot above in 'i' doesn't always happen, and in Vietnam and beside the Baltic there is some attachment to keeping the dot above when there is a diacritic above the letter. Unicode has decreed (https://www.unicode.org/versions/Unicode15.0.0/ch07.pdf pp293-4, section entitled 'Diacritics on i and j') that to keep that dot, it must be separately encoded as 'overdot', i.e. U+0307 COMBINING DOT ABOVE. Indeed, the Unicode Character Database (UCD) declares that in a Lithuanian locale, lowercasing LATIN CAPITAL LETTER I WITH GRAVE introduces U+0307; the presumption is that in Lithuanian contexts, the dot remains even when there is a diacritic above the 'i'. --RichardW57 (talk) 00:30, 30 March 2024 (UTC)[reply]

Now, as these diacritics aren't used in the normal writing of Lithuanian, examples of behaviour are hard to come by. However, I have found some quotations with these letters at https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=aa4570669a29839471f1a220ad2649a4bae0f5c5, an article by Vladas Tumasonis. The letters for showing accentuation are given in Figures 1-3 therein, and perhaps more usefully, there are short quotations from a dictionary, a missal and a grammar on pp19-20. --RichardW57 (talk) 00:30, 30 March 2024 (UTC)[reply]

Malfunctions in entry name replacement can change what should be blue text into red text, as I discovered to my surprise in the opening post in this section. Imagine what happens to links to Latin terms if macrons on Latin terms aren't stripped. (Macrons distinguish entry names for some languages.) --RichardW57 (talk) 00:30, 30 March 2024 (UTC)[reply]

@RichardW57 IMO we definitely do not want U+0307 in the actual titles of entries. If this needs to be done it can be done automatically but I would be opposed to that for Lithuanian. Benwing2 (talk) 00:41, 30 March 2024 (UTC)[reply]

@Benwing2: By 'title', do you mean page name or head word? --RichardW57 (talk) 00:53, 30 March 2024 (UTC)[reply]

@RichardW57 In the page name. It can be added automatically to headwords but as I said, I'd be opposed to that. As User:Erutuon said, most fonts automatically remove the dot when an accent is added to i, and IMO that is correct and perfectly fine for Lithuanian. Benwing2 (talk) 00:58, 30 March 2024 (UTC)[reply]

@Benwing2: But seemingly not to the editors of the dictionary, missal and grammar indirectly cited above! The intervening U+0307 shall not appear in page names, as it should be stripped out along with the acutes, graves and tildes above it. (Note that 'ė́ ' will be reduced to the single code point for 'ė'.) --RichardW57 (talk) 01:18, 30 March 2024 (UTC)[reply]
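
As a rough sketch of the stripping behaviour described in this comment (and not of Wiktionary's actual entry-name code), the following assumes that only the grave, acute and tilde are treated as accent marks and that an explicit U+0307 is dropped only after i or j. The function name `lt_entry_name` is hypothetical.

```python
import re
import unicodedata

ACCENTS = "\u0300\u0301\u0303"  # combining grave, acute, tilde
DOT_ABOVE = "\u0307"            # U+0307 COMBINING DOT ABOVE


def lt_entry_name(headword: str) -> str:
    """Reduce an accented Lithuanian headword to a page name (sketch)."""
    # Decompose so that every diacritic is a separate code point.
    decomposed = unicodedata.normalize("NFD", headword)
    # Drop the accent marks themselves.
    no_accents = "".join(ch for ch in decomposed if ch not in ACCENTS)
    # Drop an explicit dot only after i or j, so that the dot of ė survives.
    no_extra_dot = re.sub("([ij])" + DOT_ABOVE, r"\1", no_accents)
    # Recompose: e + U+0307 becomes the single code point ė (U+0117).
    return unicodedata.normalize("NFC", no_extra_dot)
```

Under these assumptions both the dotted and the undotted spellings of the headword map to the same page name, and a link only turns red if this stripping step is skipped or misconfigured.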

If Lithuanian editors actually want dots on all their is and js in headwords, it could be done. I know almost nothing about Lithuanian or these dictionary editors. Alternatively, Lithuanians could lobby font makers to program different glyphs when the language of the text is Lithuanian. — Eru·tuon 01:52, 30 March 2024 (UTC)[reply]

After reading the linked article on "Encoding of Lithuanian Accented Letters", my opinion is that retaining the dot in Lithuanian entries would be preferable as a matter of typographical style. I'm less sure however whether the best way to implement that on the technical level is to include U+0307 COMBINING DOT ABOVE (as a practical matter, pi̇̀rmas looks wrong in the font used by my browser: it shows up with an extra dot in addition to the tittle. pı̇̀rmas looks better but doesn't have correct centering, although it shares that problem with some other oblique letters with combining diacritics such as ī̆. When not oblique, pı̇̀rmas looks OK, although it's apparently not officially the right way to do this). ~~Tumasonis seems to present this as a proposed encoding convention; is there evidence that it has actually been adopted by anyone in practice?~~--Urszag (talk) 01:59, 30 March 2024 (UTC)[reply]

@Urszag This seems to be one person's opinion, and the article is quite old (it's not dated but the last citation is from 1998). Do we have any evidence that standard Lithuanian practice actually retains dots over the i along with accents? If so, how is this implemented? BTW, by now, if the use of U+0307 were standard, certainly the fonts would have been fixed to display this correctly. The fact that it shows up wrong is a strong indication that this is not standard. Benwing2 (talk) 02:05, 30 March 2024 (UTC)[reply]

As RichardW57 said, this typographical form is also mentioned in The Unicode Standard Version 15.0 – Core Specification (published September 2022): "To express the forms sometimes used in the Baltic (where the dot is retained under a top accent in dictionaries), use i + overdot + accent (see Figure 7-2)" (page 293). So I guess it is officially the correct way to encode this after all and not just a proposal. But I don't know whether any digital fonts have bothered with it.--Urszag (talk) 02:10, 30 March 2024 (UTC)[reply]

@Benwing2 it may be of interest that the main Lithuanian academic dictionary retains the dot when applying accents to lowercase i: http://www.lkz.lt/?zodis=pirmas&id=22167950000. The encoding appears to be the "i" character plus a combining grave accent. This, that and the other (talk) 10:01, 30 March 2024 (UTC)[reply]

@Benwing2: What about the readability of Lithuanian pelė̃ (“mouse”)? The accentuation mark (a tilde) overstrikes the dot above the 'ė' in the font I'm getting for this page. The preview font actually moves the second mark to the right, overhanging subsequent letters. On the other hand, Tahoma handles both words correctly. Liberation Sans makes an effort, but squashes the marks above, while Liberation Serif handles 'pi̇̀rmas' by stacking but handles 'pelė̃ ' by writing the accents side by side. Side by side is not unknown for dot above and acute together on 'e' - call it 'accent kerning'. We see it in kūrė́jau 'Lord' on the first line quoted from the missal - Tumasonis p20. All three fonts, Tahoma, Liberation Serif and Liberation Sans stack the dot and acute on 'e'.

In short, there are fonts that support dot above and mark on top. --RichardW57 (talk) 23:14, 30 March 2024 (UTC)[reply]

@RichardW57 On my MacBook Pro under Chrome, pelė̃ written in upright (non-italic) font looks correct but pelė̃ in italic font has the tilde overhanging the space between the l and ė. Benwing2 (talk) 23:26, 30 March 2024 (UTC)[reply]

Lydian letters

Would it be possible to update Wiktionary's rendering of the Lydian script to reflect the new values of the letters 𐤮 (formerly ś, now s) and 𐤳 (formerly s, now š)? Antiquistik (talk) 08:41, 30 March 2024 (UTC)[reply]

@Antiquistik Yes although I'd like to hear from someone else who has some knowledge of Lydian (which isn't me). Benwing2 (talk) 20:18, 30 March 2024 (UTC)[reply]

Can you provide any more information, e.g. who changed the values? Links to information about why the new values are what they are? - -sche (discuss) 04:12, 6 April 2024 (UTC)[reply]

@Benwing2, -sche Per Diether Schurr (1999), Lydisches I: zur Doppelinschrift von Pergamon, the values of the letters 𐤮 (formerly ś) and 𐤳 (formerly s) were erroneously assigned. Lydian transcriptions of non-Lydian names show that 𐤮 is always used for the sound /s/ (that is, s) while 𐤳 is always used for the sound /ʃ/ (that is, š).

It also appears that the value of the letter 𐤡 (previously /b/) needs to be updated, since Schurr concludes that it instead represents /p/.

Schurr (1997), Lydisches IV: Zur Grammatik der Inschrift Nr. 22 (Sardes) also reassigned to the letter 𐤥 (previously "v") the new value of "w" to avoid confusion with the letter 𐤸 (transcribed using the Greek letter "ν").

These new values are now used as the standard Lydian transliteration. So it would be preferable to use them. Antiquistik (talk) 16:12, 7 April 2024 (UTC)[reply]

Thanks. OK, I can find both systems in use when I quickly search Google Books for some common words in one vs the other system (like laqrisa / laqriša), but the "new" system does seem clearer, if it also involves changing 𐤸 from ν (Greek nu, which seems like a horribly confusing choice) to "ñ" to match Lycian. Pinging @Vorziblix if you have any thoughts as the only active user to have edited Module:Lydi-translit. - -sche (discuss) 17:05, 7 April 2024 (UTC)[reply]

@-sche On a purely personal level, I would support the change from ν to ñ. However I cannot find the relevant literature advocating for any value change, all I can find for now is the use of different values without explanation. So I will leave whether or not to implement this specific change to be discussed here. Antiquistik (talk) 10:28, 8 April 2024 (UTC)[reply]

@-sche: It’s been too long since I looked at the relevant literature; at present I have no personal opinions one way or the other. — Vorziblix (talk · contribs) 19:34, 10 April 2024 (UTC)[reply]

@Vorziblix Would you object to updating the values of these Lydian letters on Wiktionary? Or can we go ahead with the changes? Antiquistik (talk) 06:49, 15 April 2024 (UTC)[reply]

@Antiquistik: No objections here! — Vorziblix (talk · contribs) 13:32, 16 April 2024 (UTC)[reply]

OK, let's think about which letters need to be changed. Modern works that use s for 𐤮, š for 𐤳, w for 𐤥, p for 𐤡 (I can again find both systems in use even in recent works, searching for e.g. pira vs bira "house"), and ñ for 𐤸, do they also update any of the other letters? (We appear to already be using w for 𐤥.) Then we can change everything that needs to be changed in the module at once. (Does anyone update 𐤴 to anything different to avoid confusion between τ and t?) - -sche (discuss) 16:11, 15 April 2024 (UTC)[reply]

@-sche The letters 𐤮, 𐤳, 𐤥, and 𐤡 are the only ones requiring updates. There is, for now, no proposal to change the value of 𐤴 or of other letters by linguists covering the Lydian language. Antiquistik (talk) 05:53, 16 April 2024 (UTC)[reply]

@Antiquistik Can we please change the value of 𐤸 to ñ? Per User:-sche, there is support in recent works for this. Use of a mixture of Greek letters and Latin letters is bad enough, but it's IMO intolerable when the Greek letters look like unrelated Latin letters. Benwing2 (talk) 06:26, 16 April 2024 (UTC)[reply]

@Benwing2 Yes, that would be good too. Antiquistik (talk) 06:31, 16 April 2024 (UTC)[reply]

Full disclosure, 1) Despite 𐤸 reportedly occurring in several words that seem like they should be common, including certain case forms of demonstratives and pronominal clitics, I actually haven't managed to find very many works that contain words spelled with 𐤸 (Wiktionary doesn't have any, either, AFAICT), in order to determine how they transliterate it (I tried searching for such words with ν, with ñ, with n, with v ... couldn't find many works mentioning the words at all, in any spelling I could think to search for). 2) In the few works I was able to find, there were somewhat more using Greek nu, but yes, ñ is also found, and has the advantages of much greater clarity, plus agreement with Lycian where a similar letter is transliterated ñ.
OK, I guess I will change the module in a few days if no one brings any other issues forward... - -sche (discuss) 23:43, 16 April 2024 (UTC)[reply]

@-sche The Digital Philological-Etymological Dictionary of the Minor Ancient Anatolian Corpus Languages covers Lydian in its dictionary and corpus, and it transliterates 𐤸 as ν rather than ñ.

Which is why I find it preferable to leave it to further discussion here whether to update its value on Wiktionary or keep the present value. Antiquistik (talk) 22:07, 18 April 2024 (UTC)[reply]

I still think it should be ñ; it's going to be too confusing to have it look like a v. Benwing2 (talk) 22:11, 18 April 2024 (UTC)[reply]

@Benwing2 I don't disagree. Antiquistik (talk) 05:36, 19 April 2024 (UTC)[reply]

Done - -sche (discuss) 01:22, 19 April 2024 (UTC)[reply]
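
For reference, the values agreed on in this section can be summarised as a small lookup table. The sketch below is illustrative only and does not reproduce the contents of Module:Lydi-translit; the names `UPDATED_VALUES` and `transliterate` are hypothetical, and letters not discussed above are passed through unchanged.

```python
# Illustrative summary of the discussion; not the actual module code.
# Only the five letters whose values were discussed are listed.
UPDATED_VALUES = {
    "𐤮": "s",  # formerly ś
    "𐤳": "š",  # formerly s
    "𐤥": "w",  # "v" in older literature
    "𐤡": "p",  # formerly b
    "𐤸": "ñ",  # formerly ν (Greek nu)
}


def transliterate(text: str) -> str:
    """Replace the discussed Lydian letters; leave everything else as is."""
    return "".join(UPDATED_VALUES.get(ch, ch) for ch in text)
```

A complete transliteration table would also need the remaining letters of the script, which this discussion left unchanged.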

Orthographic borrowing

If I say I'm going to Москва next year, using Cyrillic in the middle of the sentence (as some people do), and especially if I approximate the Russian pronunciation rather than sub in a different pronunciation like "Moscow", is that an "orthographic borrowing" of the Russian word, or am I just (basically) quoting the Russian word (if not code-switching)?
My understanding has been that "orthographic borrowing" is a phenomenon that happens almost exclusively in Asian languages, where a language like Japanese borrows only the written shape of a Chinese character but not an approximation of the Chinese pronunciation: that's what makes the borrowing an only "orthographic" borrowing and not just a regular "borrowing".
But I see that e.g. い-adjective#English is currently given as an "orthographic borrowing", although it's kind of the opposite of how we define "orthographic borrowing" on T:obor and in Appendix:Glossary, and of how e.g. Korean 葉書 is an orthographic borrowing: like with Москва, い-adjective is borrowing/quoting both the pronunciation and the spelling, a straight-up borrowing. This isn't the first time I've seen someone use "orthographic borrowing" in a way that seems incorrect (compare bauxite). Am I right about the scope of {{obor}} here? (Is there anything we could do to make the scope of {{obor}} clearer / discourage misuse?) - -sche (discuss) 04:01, 31 March 2024 (UTC)[reply]

link to another relevant discussion: Wiktionary:Information desk/2022/July#Is_this_Orthographic_Borrowing? - -sche (discuss) 05:19, 1 April 2024 (UTC)[reply]

@-sche Thanks. If we systemically restrict orthographic borrowings to logographic languages, we can implement that in the code, but if it's just a suggestion and we allow things like English CCCP to be considered orthographic borrowings, I think the best we can do is display an "are you sure?" type of warning. Benwing2 (talk) 05:25, 1 April 2024 (UTC)[reply]

Pondering: regardless of whether we restrict this to logographic scripts or allow CCCP (Latin) - СССР (Cyrillic), is there ever a situation where it makes sense to say a term in one alphabetic script was borrowed from another term in the same alphabetic script? Or would it be appropriate to, at least, make it throw an error whenever the terms are in the same alphabetic script (or perhaps, since this seems to be one of the biggest sources of errors, just when both terms are Latn)? Or would that cause problems, are there valid cases? I notice that back in February, Actarus176 switched a bunch of French {{bor|fr|en|foo}} to {{obor|fr|en|foo}}, but as in your sioux example, I don't think dinosaure (for example) is an "orthographic borrowing" of dinosaur (it's just a borrowing). This would also remove p͛- and ꝓ- from being orthographic borrowings (which seems reasonable to me). - -sche (discuss) 20:33, 19 April 2024 (UTC)[reply]

But I have started to think it would be best to systematically restrict obor to logographic scripts, and not consider CCCP to be orthographic borrowing. I'm thinking about the spectrum of things CCCP is in, like cyka where the k is the wrong shape, POCCNR where some letters aren't the same direction as in Cyrillic, etc... English is only imitating the Cyrillic letters, not borrowing them. English only renders Cyrillic СССР as CCCP because it already has similar-looking letters, but in a case like cyka or POCCNR, English doesn't borrow short к or И and Я, and it's not borrowing с (es) or р (er) either, because English having c (cee) and p (pee) predates any contact with Cyrillic, very different to the situation with e.g. Japanese or Akkadian, which did actually borrow glyphs from Chinese and Sumerian. I... guess we should have a poll or something? (I doubt this needs a full, capital-WT:V-Vote.) - -sche (discuss) 20:43, 19 April 2024 (UTC)[reply]

@-sche This is fine with me. I can add tracking to {{obor}} to see where it's used (a) with non-logographic scripts, (b) where source and destination are both Latin. Benwing2 (talk) 21:10, 19 April 2024 (UTC)[reply]

I suppose you could argue that the standard Finnish pronunciation of sioux, which sounds like /sjouks/, is an orthographic borrowing because the pronunciation is entirely based on the spelling and not the source language's pronunciation; but this could as well just be called a rather striking example of a spelling pronunciation. It depends on how we define "orthographic borrowing" and whether it's restricted to writing systems based on logograms (in which case orthographic borrowings can't occur in the Latin alphabet but could occur for example in cuneiform). Benwing2 (talk) 05:16, 31 March 2024 (UTC)[reply]

But yes, the case of い-adjective#English is definitely not an orthographic borrowing. Benwing2 (talk) 05:17, 31 March 2024 (UTC)[reply]

@-sche how about this scheme for your example? Ioaxxere (talk) 05:39, 31 March 2024 (UTC)[reply]

@Ioaxxere I don't understand your table. In the case of @-sche's example, I would say that an example like I'm going to Москва next year is just code-switching. It is similar to people who say "I just got back from Nicaragua" and pronounce the word "Nicaragua" exactly as it would be pronounced in Spanish; essentially they are inserting a Spanish word in the middle of an English sentence (never mind that it's spelled the same in both languages). Benwing2 (talk) 05:47, 31 March 2024 (UTC)[reply]

It seems like you're agreeing with me. Code-switching is when you briefly switch into another language. So if you say "I'm going to Москва next year", that's an English sentence with a Russian word. Since the word is intended to be as Russian as possible, we can't possibly call it a "borrowing" of any kind.

Also, to address the question directly, い-adjective is essentially equivalent to something like γ-ray, which I would call an unadapted borrowing compounded with an English term. Ioaxxere (talk) 05:54, 31 March 2024 (UTC)[reply]

@Ioaxxere: We currently have only two entries in Category:English orthographic borrowings from Russian: CCCP and cyka. I assume these are correctly categorized and hence *MOCKBA would also be an orthographic borrowing? J3133 (talk) 05:53, 31 March 2024 (UTC)[reply]

@J3133: Yes, I agree with that categorization since those terms are spelled according to Russian conventions, or their nearest ASCII equivalents, but have been adapted into English (from a quick search I see people pluralizing "cyka" as "cykas"). Ioaxxere (talk) 06:01, 31 March 2024 (UTC)[reply]

Benwing found the words for something I was struggling to (thank you), "spelling pronunciations"—and that bauxite-type entries are better viewed as "spelling pronunciations". (And "spelling pronunciation" isn't "orthographic borrowing", or else where's "CAT:English orthographic inheritances from Middle English" for when the pronunciation but not spelling changed on inherited words like one?) That's a good explanation for why intra-Latin-script borrowings are better viewed as "spelling pronunciations". Can we move kaolin out of Category:English orthographic borrowings from French on that basis?
And I appreciate Ioaxxere's comparison of い-adjective-type entries to γ-ray, though I think that even moreso than "γ-ray", "い-adjective" is just 'quoting' the other language/script, a la code-switching but in such a way that it's OK to have an English entry (whereas we don't have Москва#English).
I am inclined to agree "CCCP" is an orthographic borrowing, but I'm not 100% sure, because while on one hand it looks identical to the Cyrillic script form, it is still changing from Cyrillic to Latin script, and well, where is the line past which approximating one script via similar characters in another is no longer "orthographic borrowing"? E.g. if it were attested, would POCCNR for Россия (Rossija) be "orthographic borrowing", although it changes the direction of some letters? What if someone ASCIIizes "ㄸ-initial" [words in Korean] as "cc-initial"? Surely there is some point beyond which an ersatz representation is definitely not orthographic borrowing(?), so does it make more sense for the line to be "when the script changes" (so Latin-script CCCP is not an orthographic borrowing), or "when the written form is not identical" (so Latin-script CCCP counts but not POCCNR)? I'm unsure. (Maybe "orthographic borrowing" should even be restricted to logographic scripts, as Benwing mentions.) - -sche (discuss) 00:06, 1 April 2024 (UTC)[reply]

@-sche There is in fact an underused template {{spelling pronunciation}}, which I have used on kaolin. Maybe this template should be augmented to allow for "spelling-pronunciation borrowings" or some such; currently it only takes a single param (the current lang), and categorizes into e.g. Category:English spelling pronunciations. Benwing2 (talk) 05:00, 1 April 2024 (UTC)[reply]


Spelling	Pronounced /ˈmɑskaʊ/	Pronounced /mɐskˈva/ (or its closest English equivalent)
<Moscow>	Borrowing	Why would you do this?
<Moskva>	Why would you do this?	Unadapted borrowing
<Москва>	Orthographic borrowing	You're just speaking Russian

Well, let’s see what I thought out two months ago: orthographic borrowings are a trans-scriptural concept. A method of writing one language must be transferred onto the manner in which another language is written. That is not the case with い-adjective#English, because the term embeds content in another language, referring to the content of another language, i.e. it does not loan anything at all from Japanese. You might say that code-switching even exists inside of lexical units: when one is polyglot, that is, has at least started a lexicon of Japanese in one’s mind, then the lexicon of each individual language is permeable enough to transclude, fetch as an external resource, lexical units from another language also documented within the mind.
I'm going to Москва next year has no orthographic borrowing because the category of orthographic borrowing is a category of the dictionary sphere, not of sentence analysis, in other words because I'm going to Москва next year is not parsed by us as a lexeme which we could enter somewhere as such, claiming it to contain etymological relations.
As for Ioaxxere’s table, per my previously found dogmatics, that which bro calls “orthographic borrowing” is a heterogram, and “you’re just speaking Russian” is even more an “unadapted borrowing”, for I cannot admit that the language or code-switching nature changes depending on the spelling, also the constellation is empirically rare in so far as relevant for possible dictionary entries. Fay Freak (talk) 06:39, 31 March 2024 (UTC)[reply]

Feedback on proposed label designs

In this example, German Term was possibly borrowed from English term, which was derived from a combination of Latin anc2, Latin anc4, and Latin anc6, which each have a further uncertain inherited ancestor.

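[Tree diagram; the bracketed labels sit on the connecting lines]
Proto-Italic *anc1 → [?] → Latin anc2 → [der.] → English term
Proto-Italic *anc3 → [?] → Latin anc4 → [der.] → English term
Proto-Italic *anc5 breaks the symmetry → [?] → Latin anc6 → [der.] → English term
English term → [bor.?] → German Term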

My main question is: is the text too hard to read on small screens? The font size is 12 pixels, equivalent to putting text in a <small> tag.

I'm very interested if anyone has feedback and suggestions for improving this design. Pinging @Victar, Vininn126, who highlighted the need for these kinds of labels, and @Lunabunn, who inspired the current design. Ioaxxere (talk) 04:51, 31 March 2024 (UTC)[reply]

Nice work, I like it! While I personally don't care much for the highlighted backgrounds, I can see their value in making the labels stand out more. As for the font size, I think we have found a good balance. Lunabunn (talk) 04:55, 31 March 2024 (UTC)[reply]

As a contrasting opinion, I really rate the backgrounds. They look really nice on dark theme. Kiril kovachev (talk・contribs) 02:24, 2 April 2024 (UTC)[reply]

Definitely an improvement. Would tooltips be possible? Vininn126 (talk) 07:25, 31 March 2024 (UTC)[reply]

@Lunabunn, Vininn126: I made some adjustments and added a link to the glossary for the benefit of mobile users. Ioaxxere (talk) 20:04, 31 March 2024 (UTC)[reply]

A proposal for how future big template changes should be done.

Big template changes happen too quickly for some people. While some people (mostly those who keep up with or make the latest Wikt programming news) know what the latest changes to templates are, others have no clue until the big red "deprecated" banner or the Lua error text shows up. In addition, the big changes sometimes introduce unforeseen errors that didn't exist before.

As such, I propose a new system for big template changes, to help users get used to the templates and programmers have fewer bugs. It goes something like this:

0. Make a big change to a template (which includes actions such as deprecating any template or changing the logic of a highly-used template).
1. Let your changed version and the unchanged one exist side-by-side, but with the changed one being encouraged and receiving updates.
2. See if there are any bugs, and fix them.
3. After some time (I'd say around a month for big fixes, and maybe a little more for people to get used to the changes), remove the unchanged template.

What do you think? CitationsFreak (talk) 07:26, 31 March 2024 (UTC)[reply]

@CitationsFreak What sorts of changes prompted this? Also keep in mind that logic changes to templates often cannot easily be done in the way you suggest. Benwing2 (talk) 08:03, 31 March 2024 (UTC)[reply]

My big concern with this proposal is that it'll result in templates changing names periodically, which will annoy basically everyone. How would this work with a template like {{l}}, for instance? Theknightwho (talk) 18:03, 31 March 2024 (UTC)

@Benwing2 The threads "deprecate Template:1" and "Accelerated English plurals generate the wrong template".

@Theknightwho I was thinking of renaming the unchanged template something like {{[template-name]-old}}, so that the new template can still be used as much as possible, but we have a backup in case something goes wrong. CitationsFreak (talk) 18:40, 31 March 2024 (UTC)

@CitationsFreak Would every page need to be bot converted to the old version and then changed over, or are you envisioning that people could choose which one to use? The former seems impractical, and the latter feels like a recipe for confusion, since we'll end up with a mish-mash of versions and we'll need to ensure every other relevant module is able to support both (which could get very messy). Theknightwho (talk) 18:49, 31 March 2024 (UTC)[reply]

@Theknightwho The latter, but only for some time, and with a later bot job to replace the old template with the new. The other templates should only recognize the old template if they did before, and clearly mark where the old template is being recognized in-code so that it can be deleted when its time comes. CitationsFreak (talk) 19:08, 31 March 2024 (UTC)
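
As an illustration only, the "later bot job" mentioned above could look roughly like the following Python sketch. It assumes the {{[template-name]-old}} naming suggested earlier in this thread; the function name `migrate_old_template` is hypothetical, and a real bot would also have to deal with redirects, aliases and parameter changes, which this sketch does not attempt.

```python
import re

def migrate_old_template(wikitext: str, name: str) -> str:
    """Rewrite {{<name>-old ...}} invocations to {{<name> ...}}.

    Assumption: the unchanged template was parked under "<name>-old",
    as proposed in the thread; parameters are left untouched.
    """
    # Match the opening braces, optional whitespace and the old name, then
    # require a pipe or closing brace next, so that a longer template name
    # such as "<name>-older" is left alone.
    pattern = re.compile(r"\{\{\s*" + re.escape(name) + r"-old(?=\s*[|}])")
    return pattern.sub("{{" + name, wikitext)
```

Because the rewrite only touches the template name, the old and new templates would need to accept the same parameters for this to be safe, which is one of the practical concerns raised in the replies.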

Should be uncontroversial. DCDuring (talk) 15:52, 31 March 2024 (UTC)[reply]

A month is not much time. Commercial APIs often give six months or a year. (Backwards compatibility is even better. You can still use Stripe's 2014 APIs if your HTTP header requests it.) Equinox ◑ 15:23, 1 April 2024 (UTC)[reply]

Does it really take a year for people to get used to new templates on Wikt? CitationsFreak (talk) 17:48, 1 April 2024 (UTC)[reply]

Also, we're not a commercial operation, development is difficult enough as it is, and the barrier to getting involved in module development is already high due to the learning curve. Theknightwho (talk) 17:56, 1 April 2024 (UTC)[reply]

That's exactly right. I don't think any of this is needed; I make a lot of changes and few of them have caused any issue. Best to handle any issues on a case by case basis. We only have a few "developers" working on their own time; we can't afford to put more barriers up. Benwing2 (talk) 19:43, 1 April 2024 (UTC)[reply]

I feel that the issue of having too few developers is easily solved by adding more developers, both from this wiki and from elsewhere. (Also, is there a guidebook for developers? Wouldn't hurt to have one.) CitationsFreak (talk) 23:11, 1 April 2024 (UTC)

I doubt that we need a year, but some kind of statement of (1) what the overall plans are and (2) what particular code changes, monitoring categories, filters, etc. are involved should not be too much to ask. There have been many technical changes over the years that have not generated as much annoyance as some recent ones. It is definitely more fun to not ever answer to anyone. Having pesky users objecting to changes will definitely slow down the pace of change. But will it slow down the pace of desirable change? And who gets to say what change is desirable, perps or victims? DCDuring (talk) 22:37, 1 April 2024 (UTC)

@DCDuring It would be much easier to work with you if you stopped acting like such a drama-queen. I have tried to have these conversations with you many times, but we never seem to get anywhere because things are rarely good enough for you, and the objections often boil down to your personal needs at the expense of everything else. Calling yourself a "victim" is seriously unhelpful. Theknightwho (talk) 23:22, 1 April 2024 (UTC)[reply]

User:Theknightwho I apologize for working in a realm that is different in kind from a language and therefore has needs that do not fit into the normal framework. DCDuring (talk) 15:58, 2 April 2024 (UTC)
@DCDuring And I'm working from a framework that includes more people than just you. Thanks. Theknightwho (talk) 18:01, 3 April 2024 (UTC)[reply]
I'm sorry that my needs are different. DCDuring (talk) 18:56, 3 April 2024 (UTC)[reply]

@Theknightwho, DCDuring: I see it as an attempt to train developers about user perceptions. For example, @JeffDoozan's parameter checking module Module:checkparams is in principle a good idea, but dealing with its warnings for invocations of declension templates is a pain for whoever deals with them and seeing them is distinctly off-putting for anyone making other changes to the pages. One problem is a widespread lack of documentation of the templates, and the other is that some positional parameters have become obsolete as better ways of doing things became available or more widely known. And this is despite Jeff making efforts to reduce or eliminate its impact (albeit sometimes with the aid of other editors) when its warnings are misguided. --RichardW57m (talk) 09:37, 2 April 2024 (UTC)[reply]

They fix the bugs already, I see no issue. I don’t even keep up with programming news, page creation goes smoothly, and I don’t think there is manpower or mental capacity on either developer or editor side to run multiple template and module versions concurrently. BTW I used Arch. Fay Freak (talk) 23:01, 1 April 2024 (UTC)

I don't see an immediate problem, but by #3's "remove", I suggest redirecting, as someone will possibly want to use the new name or old name in the future without having seen the parallel testing period. No particular response to the rest of the proposal. —Justin (koavf)❤T☮C☺M☯ 23:18, 1 April 2024 (UTC)[reply]

You mean the old and new templates, right? I can't see a situation where a person would want to use the old name of a template on purpose after everyone's gotten used to the change. An old version, sure, but not an old name (as in {{1}}). CitationsFreak (talk) 23:23, 1 April 2024 (UTC)

Correct: the names. And the reason someone would want to use the old name is "I haven't edited Wiktionary in three years, but I remember that this is how you do <var>x</var>". —Justin (koavf)❤T☮C☺M☯ 23:36, 1 April 2024 (UTC)[reply]

This feels like us having to keep templates around forever, even if it is a redirect. I suppose there is nothing wrong with it. However, I do feel like the template name must be removed at some point. I'll mull over it. CitationsFreak (talk) 23:48, 1 April 2024 (UTC)[reply]

Derived terms

According to Appendix:Glossary:

derived terms

A post-POS heading listing terms in the same language that are morphological derivatives. [Italics mine]

According to Template:derived:

This template is used to format the etymology of terms derived from another language. [Italics mine]

This seems confusing. I think one of them should be renamed. Ioaxxere (talk) 19:40, 31 March 2024 (UTC)[reply]

@Ioaxxere

I see what you mean. It happens in etymologies that terms are derived from the same language, but we don't, maybe illogically, use the "derived" template in those cases, even though we do use a relationship of the same name when referring to the given word on the page of the word it's derived from.

Despite that I still don't like a rename. I didn't notice any problem till you mentioned it, so IMO it's not a big deal. Kiril kovachev (talk・contribs) 02:18, 2 April 2024 (UTC)[reply]

@Ioaxxere: I think the difficulty is that there aren't many (or any?) alternative words that can be substituted in place of derived. Thus, the same word has been used in different contexts. Do you have a suggestion as to an alternative word for one of the contexts? — Sgconlaw (talk) 17:48, 15 April 2024 (UTC)[reply]

@Sgconlaw: "Derived" in the second sense could be changed to "descended". Ioaxxere (talk) 17:54, 15 April 2024 (UTC)[reply]

@Ioaxxere: wouldn’t this clash with {{desc}}, although we would be using the word descended/descendant in the same sense? — Sgconlaw (talk) 18:46, 15 April 2024 (UTC)[reply]

Idea: Categorizing illustrated terms

I love illustrated pages on Wiktionary. They can give a visual grip to otherwise bland white pages. And that visual grip can, moreover, convey the culture of the speakers in a way transcending characters and phonetics.

Moreover, a fair bunch of language learners are also visual learners, and illustrations would only help. On top of that, Wiktionary can not only become the most comprehensive online dictionary, but also the most delightfully illustrated one. With time, this will draw more readers, and from there, more contributors, which leads to higher quality pages, which leads to more readers – you get the drill.

For this reason, I have a proposal: Get a bot to parse the lemmas of a given language, and drop them into categories such as "Fooian terms with illustrations". We already do this with "Fooian terms with quotations". Anyone on board?

P.S. This is unrelated to April Fools' Day :) Shoshin000 (talk) 20:38, 31 March 2024 (UTC)[reply]

@Shoshin000 You know what, this sounds good, and it would be even better if we systematically used a template for adding pictures onto entries that could do the categorization for us. Then there'd be no need for periodic bot tasks. The problem right now is that we just use the bare wiki syntax for images (or at least I do, I dunno), so I don't think it's able to be tracked ATM. We could consider changing over to a template for that, even if it's just a thin wrapper over the image syntax? Kiril kovachev (talk・contribs) 02:20, 2 April 2024 (UTC)

Yeah, and the bot would switch Wiki syntax to the template. Sounds more simple. Shoshin000 (talk) 11:09, 3 April 2024 (UTC)[reply]

How does having the category or the new template help achieve the goal of having more entries with good visuals?

We already have {{rfi|[language]}} which shows that someone thinks that the tagged L2 would benefit from an image. That template populates the language subcategories of Category:Requests for images by language. It is principally used for English (1961) and Translingual (taxonomic) (2249) entries.

As for pages with images there are 17,735 English noun lemma pages with "File:" and 5,410 with "Image:" and 906 English proper noun lemmas with "Image:" and 3,447 with "File:". There are 9,058 Translingual taxonomic pages with "File:" and 386 with "Image:". There would be some duplication in the total count of pages, but also many instances of multiple images in L2s and between L2s. There are also some English verb pages with images. Also many of the pages for letters and symbols have images.

Among the problems we have with images is the lack of informational rather than esthetic value. A picture of a tree from a distance may convey information about shape (for a "specimen" tree not growing in a forest). Plants that bear colorful flowers for a week in spring are often illustrated only by a picture of the flower. More value is contained in an image that illustrates the probable reason for the name, e.g., red-bellied piranha. DCDuring (talk) 17:35, 3 April 2024 (UTC)

@DCDuring I don't think it would do that, I think it would just make it easier to discover and search pages with images on them. Question: how are you coming up with all these figures? I can't figure it out. Which is why we would benefit from such a category.

I also disagree about the purpose of images. The entry already defines the word, and the etymology would also explain anything about the origin of the word. The most useful thing an image can do is show you the most distinctive form of whatever's being defined. For example, I'll look up 木漏れ日 and immediately see what it's meant to be referring to, perhaps even without needing to see the definition; for words with slightly abstract or culturally-specific definitions, a picture really speaks a thousand words, so I think we should have as many pictures as possible for words where they make sense — they just make it much easier to understand some definitions. Not just for etymological reasons. Kiril kovachev (talk・contribs) 18:17, 7 April 2024 (UTC)[reply]

It is amazing what can be accomplished by using CirrusSearch, especially "insource:" with regular expressions.

Many of our English verbal definitions are laughably incomplete, not to mention the often-ambiguous glosses that pass for definitions in non-English L2s. That justifies more, rather than fewer, illustrations. In the case of vernacular names of organisms, what passes for "red" in definitions of "red-breasted this" or "red that" can vary dramatically. DCDuring (talk) 20:37, 7 April 2024 (UTC)

I don't know if this should necessarily be subdivided by language — consider a page like Ukraine, where the images are in the English section, but equally relevant to the other language sections, but it'd look bad to repeat them in every language section — but in my view, at least, it'd be fine if someone wants to periodically bot-categorize pages with images into Category:Pages with images or something (compare Category:Pages with broken file links, for cases where an image or audio is linked but doesn't work). - -sche (discuss) 17:11, 7 April 2024 (UTC)[reply]

It's possible to do it automatically, though any images added by templates wouldn't be included. Plus, we'd want to exclude any images which are part of a link template, since they're sometimes used for scripts which haven't been encoded yet. It's probably one of those jobs that sounds simple, but ends up being quite complicated. Theknightwho (talk) 20:52, 7 April 2024 (UTC)[reply]
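
To make the complication concrete, here is a rough Python sketch of how a bot might test a page's wikitext for bare image syntax while ignoring images that sit inside a template call. It is a simplification under stated assumptions: it skips images inside any template, not just link templates; it cannot see images that templates add at render time; and the names `strip_templates` and `has_bare_image` are made up for this example.

```python
import re

# Bare image syntax: [[File:...]] or [[Image:...]] (the two prefixes counted above).
IMAGE_LINK = re.compile(r"\[\[\s*(?:File|Image)\s*:", re.IGNORECASE)
# A template call that contains no further braces, i.e. an innermost one.
INNERMOST_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")

def strip_templates(wikitext: str) -> str:
    """Remove {{...}} template calls, innermost first, until none remain."""
    previous = None
    while previous != wikitext:
        previous = wikitext
        wikitext = INNERMOST_TEMPLATE.sub("", wikitext)
    return wikitext

def has_bare_image(wikitext: str) -> bool:
    """True if an image link appears outside every template call."""
    return IMAGE_LINK.search(strip_templates(wikitext)) is not None
```

An image whose caption contains a template is still detected, because only the template inside the caption is removed, not the surrounding image link.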

French Wiktionary

Gosh, what is happening over there? I visited it only to see this banner at the top:

L’accès à la base de données est désactivé pour une durée indéterminée

Plainte

Je vous écris cette lettre au nom et pour le compte de Monsieur (omis), afin de vous informer que j’ai déposé le 25 mars dernier une plainte officielle auprès de la Préfecture de Police de Versailles contre votre site Internet, car il est coupable d’avoir publié des nouvelles et des faits à caractère injurieux, susceptibles de porter atteinte à l’honneur et à la réputation de mon client.

Comme l’a rappelé la jurisprudence (par exemple, Cour de cassation 03/3956) « … l’utilisation d’un site Internet pour la diffusion d'images ou d'écrits susceptibles d’offenser une personne est une action susceptible de porter atteinte au patrimoine juridique de l’honneur, et constitue donc le délit de diffamation aggravée… ».

Nous regrettons d’avoir dû recourir à l’autorité judiciaire, mais le ton et le contenu de l’article en question ne laissaient pas la moindre place à la négociation.

Je vous prie d’agréer, Madame, Monsieur, l’expression de mes salutations distinguées.

Avocat (omis)

Ordonnance

En vertu de la plainte n° 2154021/24 déposée à l’Hôtel de Police de Versailles le 25/03/2024, il est notifié ce qui suit :

ayant été informé d’une infraction contre fr.wiktionary.org, étant directeur général de l’association Wiktionnaire,

la fermeture préventive du site fr.wiktionary.org est ordonnée car il fait l’objet d'une enquête pour l’infraction visée à l’article 595 du code pénal, avec la circonstance aggravante visée au paragraphe 3 du même article.

La fermeture doit être achevée au plus tard le lundi 1er avril 2024 à 15 heures.

Vous êtes également informé que, conformément aux dispositions du code de procédure pénale, vous avez été désigné comme avocat commis d’office par M. (omis).

Ceci est sans préjudice de votre droit de désigner un conseil de votre choix, en notifiant cette procuration par l’envoi, également par fax à (omis), d’une désignation formelle d’un conseil.

Versailles 31/03/24

Procureur (omis)

Avis aux utilisateurs et utilisatrices du site Wiktionnaire

Nous avons considéré qu’il était préférable d’occulter les noms des personnes et des utilisateurs directement concernés.

Il est impossible de trouver les mots justes pour exprimer le sentiment de perte profonde que cette décision nous laisse. Une seule chose est sûre : le Wiktionnaire ne s’arrête pas là.

L’esprit qui animait le projet jusqu’à hier est profondément blessé, mais la communauté saura trouver de nouveaux espaces pour repartir.

Restez en contact, le Wiktionnaire francophone.

— Sgconlaw (talk) 22:41, 31 March 2024 (UTC)[reply]

For those of us who parles un petite francais (like myself), DuckDuckGo says this means:

Access to the database is disabled for an indefinite period of time

Complaint

I am writing this letter to you in the name and on behalf of Monsieur (omitted), to inform you that on March 25 I filed an official complaint with the Prefecture of Police of Versailles against your website, because it is guilty of having published news and facts of an offensive nature, likely to damage the honor and reputation of my client.

As recalled in case law (e.g. Court of Cassation 03/3956) "... the use of a website for the dissemination of images or writings likely to offend a person is an action likely to harm the legal patrimony of honour, and therefore constitutes the offence of aggravated defamation...".

We regret that we had to resort to the judicial authority, but the tone and content of the article in question did not leave the slightest room for negotiation.

Please accept, Madam, Sir, the expression of my distinguished greetings.

Counsel (omitted)

Order

Pursuant to complaint no. 2154021/24 filed at the Versailles Police Station on 25/03/2024, the following is notified:

having been informed of an offence against fr.wiktionary.org, being the general manager of the Wiktionary association,

The preventive closure of the fr.wiktionary.org site is ordered because it is under investigation for the offence referred to in Article 595 of the Criminal Code, with the aggravating circumstance referred to in paragraph 3 of the same Article.

The closure must be completed no later than Monday, April 1, 2024 at 3 p.m.

You are also informed that, in accordance with the provisions of the Code of Criminal Procedure, you have been appointed as a court-appointed lawyer by M. (omitted).

This is without prejudice to your right to appoint counsel of your choice, by notifying such power of attorney by sending, also by fax to (omitted), a formal appointment of counsel.

Versailles 31/03/24

Prosecutor (omitted)

Notice to users of the Wiktionary site

We felt it was best to redact the names of the people and users directly involved.

It is impossible to find the right words to express the sense of deep loss that this decision leaves us with. Only one thing is certain: Wiktionary doesn't stop there.

The spirit that animated the project until yesterday is deeply wounded, but the community will be able to find new spaces to start again.

Stay in touch, the French-speaking Wiktionary.

Courtesy link: fr:. —Justin (koavf)❤T☮C☺M☯ 23:44, 31 March 2024 (UTC)[reply]

fr:MediaWiki:Sitenotice. —Justin (koavf)❤T☮C☺M☯ 23:46, 31 March 2024 (UTC)[reply]

@Àncilu: —Justin (koavf)❤T☮C☺M☯ 23:51, 31 March 2024 (UTC)[reply]

Wow. I knew that British libel laws are seriously f***ed (i.e. extremely biased in favor of rich plaintiffs) but I didn't realize things are even worse in France. Benwing2 (talk) 23:54, 31 March 2024 (UTC)[reply]

🐟 This is April 1st 🐟 Àncilu (talk) 23:55, 31 March 2024 (UTC)

@Benwing2 @Koavf @Sgconlaw Àncilu (talk) 23:56, 31 March 2024 (UTC)[reply]

Sacre bleu! Zoot alors! —Justin (koavf)❤T☮C☺M☯ 00:02, 1 April 2024 (UTC)[reply]

fr:Wiktionnaire:P/24. —Justin (koavf)❤T☮C☺M☯ 00:03, 1 April 2024 (UTC)[reply]

@Àncilu: OH! Ha ha ha! Good one! I was mystified as to how content at the Wiktionnaire (as opposed to Wikipedia) could end up defaming someone… — Sgconlaw (talk) 01:39, 1 April 2024 (UTC)[reply]

I don’t see that anything could have happened either, even if it was not an April Fools’ joke. @Àncilu added this to fr:MediaWiki:Sitenotice, so suppose he got an e-mail about a supposed proceeding. Whatever that proceeding is in France, a court proceeding or an administrative proceeding, it has to be delivered to an actual person representing French Wiktionary (international delivery, given Àncilu is in Italy (?), is also quite a feat), which doesn’t even have legal capacity, so there can be no pending case; the lawyer would be incompetent. Fay Freak (talk) 00:02, 1 April 2024 (UTC)

@Fay Freak I live in France. Àncilu (talk) 00:07, 1 April 2024 (UTC)

@Àncilu: Well, have you heard how they invented separation of powers in France? I doubt that they would bring a defamation case before the Préfecture de Police de Versailles 😵‍💫, because that would be a civil matter. Can we admit this as creativity? Fay Freak (talk) 00:37, 1 April 2024 (UTC)[reply]

@Fay Freak: I wrote it this way to make the April Fools' joke less realistic and to avoid external problems, because people outside the Wiktionary might misunderstand it if it were mentioned, even indirectly. Àncilu (talk) 12:07, 1 April 2024 (UTC)

April 2024

Request lemma

I am requesting a template, rf-lemma or rf-entry, meaning "creation of this lemma is wanted (urgently)". Why?

It is needed when parts of etymologies that refer to different entries are moved to a new empty page or to a new L2 section on the same page (following the principle "No repetitions"). E.g. I have moved part of the Etymology from Modern Greek ακανθόχοιρος (akanthóchoiros) to a new page for the red link to Koine ἀκανθόχοιρος (akanthókhoiros).

Is this OK? Should I stop moving them? I did not find a req-lemma at Category:Request templates (sooo many!). Is there another template for needed new pages? Alerting programmers MM Benwing2, Surjection. Thank you ‑‑Sarri.greek ^♫ I 17:41, 1 April 2024 (UTC)

@Sarri.greek: I believe {{rfdef|<lang>}} is the template you're looking for. However, there is not much point to just moving the etymology if the new word (ἀκανθόχοιρος in this case) doesn't even have a definition. Yes, we should avoid "repetitions", but if the new word doesn't have a definition, it's not "repetition" to simply not create the word in the first place.

Also, please be careful when moving information, as you moved the "displaced ..." part, which applies to the Greek lemma and not the Ancient Greek lemma. I also could not find the word in Hesychius.

There are also some minor formatting issues in your new page. Please be more careful in the future. --kc_kennylau (talk) 10:13, 2 April 2024 (UTC)[reply]

Thank you M @kc_kennylau. Sorry, I did not get a ping from my browser for your {reply}. Oh, and sorry for my mistake while moving material (I make more mistakes as, alas, I am getting older...). About a {wanted} or {requested} lemma template: it is especially needed when the 'other' language is on the same page, which happens a lot in Greek. There is no way (orange links and the like do not help) to make an urgent call to create a lemma. Pages like Wiktionary:Requested entries (Ancient Greek) are wishlists. But an urgent need means there is a gap, a need for a lemma. Such a template would ask editors who might be interested in creating lemmata in this language to begin first and mainly with the {wanted} calls. Thank you. ‑‑Sarri.greek ^♫ I 03:30, 4 April 2024 (UTC)

Etymology tree testing

As @AG202 and others pointed out, it is very important to test out major changes before implementing them in mainspace. The problem is that I don't know everyone's use cases. Therefore, I invite the community to suggest terms to create an etymology tree for. If the output is undesirable in some way, I can tweak the template to give a better result. Before commenting, please note the following:

1. No test trees will be created in mainspace.
2. The template requires each ancestor to have an entry. If your entry says "from English redlink1, French redlink2, from Latin redlink3", I can't really do much with that.
3. If there's a problem with the output, please give constructive criticism so I can fix it.

To start, here's an example of a hypothetical Swedish term which is borrowed from English father and calqued from Old English fæder. This would be generated by: {{etymon|sv|id=whatever|bor|en>father>male parent|calque|ang>fæder>father|tree=1}}

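Etymology tree

Proto-Indo-European *peh₂- → (?) → Proto-Indo-European *ph₂tḗr
Proto-Indo-European *-tḗr → (?) → Proto-Indo-European *ph₂tḗr
Proto-Indo-European *ph₂tḗr → Proto-Germanic *fadēr → Proto-West Germanic *fader → Old English fæder → Middle English fader → English father
English father → (bor.) → Swedish hypothetical
Old English fæder → (calq.) → Swedish hypothetical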

Ioaxxere (talk) 19:14, 1 April 2024 (UTC)[reply]
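
For anyone trying to read the call above, here is a small Python sketch of how its parameters appear to be laid out. It is inferred only from the examples in this thread and is not the template's actual implementation: the first positional parameter is taken to be the entry's language, a bare keyword such as `bor` or `calque` is taken to apply to the `lang>term>id` items that follow it, and `key=value` pairs are treated as named parameters. The names `Etymon` and `parse_etymon_call` are hypothetical.

```python
from typing import NamedTuple

class Etymon(NamedTuple):
    relation: str   # keyword preceding the item, e.g. "bor" or "calque"
    lang: str       # language code of the ancestor
    term: str       # the ancestor term
    sense_id: str   # disambiguating ID, empty if absent

def parse_etymon_call(call: str):
    """Split an {{etymon|...}} call into language, named params and etymons."""
    inner = call.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    parts = inner[2:-2].split("|")
    if parts[0].strip() != "etymon":
        raise ValueError("not an etymon call")
    named, positional = {}, []
    for part in parts[1:]:
        # Assumption: named parameters contain "=" and never contain ">".
        if "=" in part and ">" not in part:
            key, value = part.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            positional.append(part.strip())
    lang, rest = positional[0], positional[1:]
    etymons, relation = [], ""
    for item in rest:
        if ">" in item:
            fields = item.split(">") + ["", ""]  # pad in case the ID is missing
            etymons.append(Etymon(relation, fields[0], fields[1], fields[2]))
        else:
            relation = item  # keyword applies to the items that follow
    return lang, named, etymons
```

Applied to the Swedish example, this yields the language `sv`, the named parameters `id` and `tree`, and two etymons: a borrowing from `en` "father" and a calque of `ang` "fæder".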

Not sure why it's not working but

test:

{{etymon|hi|id=whatever|title=गुल|bor|fa-cls>گُل>flower|tree=1}} (escaped by kc_kennylau (talk) 09:53, 2 April 2024 (UTC))[reply]

my guess is that it does not support etymology codes? — Sameer ^{﴾مشارکت‌ها・بحث﴿} 08:08, 2 April 2024 (UTC)[reply]

I think this is best suited to a subpage in the template. I've moved these two examples to Template:etymon/testcases. I have also escaped the above code because it is generating an error and filling up Cat:E. --kc_kennylau (talk) 09:53, 2 April 2024 (UTC)

@Sameer: Yes, it turned out that the template was incorrectly rejecting etymology language codes. Here is how it should look:

Etymology tree

Proto-Iranian *wardah → Old Persian *vr̥dah → Middle Persian gwl → Classical Persian گُل (gul) → (bor.) → Hindi गुल (gul)

It seems like the Persian character is tall enough to slightly overflow the box. I'm not sure what I can do about this, since the font size is actually being pumped up by the {{m+}} template which is used within the box. Ioaxxere (talk) 14:33, 2 April 2024 (UTC)[reply]

@Ioaxxere: It seems that height:auto; could solve this problem; see example. --kc_kennylau (talk) 19:10, 2 April 2024 (UTC)[reply]

Fixed Ioaxxere (talk) 00:00, 3 April 2024 (UTC)[reply]

While I agree with the reservations expressed by others in the earlier March discussion on this same proposal, I have to say this looks pretty good already. If the infrastructure behind it were robust enough, it could become a pretty neat addition to the project. — Mnemosientje (t · c) 13:47, 2 April 2024 (UTC)[reply]

100% agree that this is visually interesting, more intelligible, and handy. I totally support this and have only minor cosmetic tweaks to suggest. —Justin (koavf)❤T☮C☺M☯ 16:49, 3 April 2024 (UTC)[reply]

@Ioaxxere A bit belated, but I like the idea. My only (minor) criticism is to ask why the text inside the boxes is so large. Is there any particular reason for that? Could we reduce it to the same size as the other ordinary text found in the rest of a given entry? (Or is it just that way on my display, for whatever reason? Not sure if it shows up normal-sized for other people.) — Vorziblix (talk · contribs) 19:46, 10 April 2024 (UTC)[reply]

@Vorziblix: Yes, the font size was (by default) slightly larger than regular text (16px versus 14px). I've gone ahead and made it smaller (although I'm not totally sure whether it's better this way...). Ioaxxere (talk) 20:13, 10 April 2024 (UTC)[reply]

@Ioaxxere: Thanks! IMO it looks great. — Vorziblix (talk · contribs) 15:11, 11 April 2024 (UTC)[reply]

This looks pretty neat! I want to suggest using a tabular format (left- or right-alignment, like two columns) instead of center-alignment for the boxes. It is much easier to read and compare similar data when they are aligned with each other, rather than arbitrarily aligned due to varying text width. (I'm not sure how it would best be implemented – maybe a consistent width for language name and etymon – but anyway should be tested on mobile screen sizes.) I also find the small "?" icon not immediately intuitive (despite the tooltip) and not emphatic enough for representing anything that is iffy (i.e. too easy to miss), and would prefer spelling out the word "uncertain" instead, perhaps with a change in box color contrast as well. It would be nice if the collapsible box header said "Etymology tree for <word>" instead of "Etymology tree". Hftf (talk) 22:10, 17 April 2024 (UTC)[reply]

@Hftf: Thank you for the feedback! 1) Using a tabular format would mean packing everything into a rectangular grid, which I don't think would look good. 2) My ideal solution for uncertainty would be to have a dashed line, but unfortunately this doesn't seem to be technically possible. I'm not sure I like the appearance of uncertain spelled in full — but I'm fine with doing that if other people agree. Another idea would be to change the colour of the box itself, so every uncertain etymon might be pink rather than beige. I don't want to make any major design changes at this point though, since I'm preparing to start a vote on the template in the next few days (see #Etymology tree vote). 3) The idea of the template is to be used on the entry page itself, and of course there's no need to remind the reader what entry they're on. However, if we started adding trees to other pages you would be absolutely right. Ioaxxere (talk) 17:16, 18 April 2024 (UTC)[reply]

Thanks for the response! I don't mean packing everything into a rectangular grid per se – but would it be possible to experiment with using fixed widths (couple hundred pixels, but overrideable) and a left (or right) alignment for the two primary pieces of information (language and etymon) at least? The dynamic non-space-constrained medium of the screen allows us to not need to rely heavily on abbreviations that often plague most printed dictionaries, while balancing screen size constraints, sleek aesthetics, information density, colorblindness accessibility, and other factors. I still think there is net value in a "redundant" header – taking a screenshot of just the tree widget to embed somewhere else, for instance, would benefit from it, and that space is being used for nothing anyway. Hftf (talk) 00:00, 19 April 2024 (UTC)[reply]

T:antsense, to finally clarify T:sense on antonyms

As has been discussed a number of times, including Wiktionary:Beer_parlour/2016/August#Suggestion_for_sense_tags_on_antonyms (which I stumbled upon while trying to find a different discussion where I suggested the same thing), it perennially confuses lots of people that we write things like "(to start work): clock in, clock on, punch in, go on the clock" ... so I've gone ahead and made the T:antsense template Benwing proposed in the 2016 discussion; you can see it in use now, displaying "(antonym(s) of "to end work"): clock in". Please feel free and encouraged to make the template better, find a better name, whatever... once it's in a state everyone's happy with, maybe we can bot-deploy it to entries and finally end this enduring source of confusion... - -sche (discuss) 02:26, 2 April 2024 (UTC)[reply]

@-sche Thanks! Benwing2 (talk) 05:14, 2 April 2024 (UTC)[reply]

I hope that one day we can delete this template because we will have placed all the antonyms under the relevant sense lines! This, that and the other (talk) 22:50, 2 April 2024 (UTC)[reply]

I am going to do a bot run in a couple of days to switch all occurrences of {{sense}} to {{antsense}} in Antonyms sections. Benwing2 (talk) 01:15, 4 April 2024 (UTC)[reply]

@-sche: I just now found out about this template, and honestly, it confused me even more, and in the same way you pointed out above ("big" does not have a sense that is translated as "antonym of big"...; and furthermore in some language it might!). In my opinion, the only way to solve this so that it is clear what is meant, is by doing something like suur#Antonyms (bar the new template), where the meaning of the antonym is given within the link template. Thadh (talk) 21:53, 11 April 2024 (UTC)[reply]

Hmm... so you interpret

Antonyms

(antonym(s) of "begin working time"): clock out

as saying antonym(s) of "begin working time" is one of the definitions of clock in? (or that it is the definition of clock out?) If anyone else interprets it that way, I hope they'll chime in; to me, the parenthetical (s) and the fact that part of the text is set off in quotation marks and quotes the definition further up the page makes it very unlike anything that appears in our definitions of words — I don't think any language has a single word that we would translate as "antonyms, plural, of quote "big" unquote", we would definitely format it differently, without quotation marks, and "antonym" would be exclusively singular — so I would not have expected anyone to interpret it as a definition. But your comment shows we could benefit from being even more explicit! What if we change the wording to (the following word(s) is/are antonym(s) of "begin working time"): clock out? (The beauty of this being a template is that we can change the wording in one place and it will propagate out to all entries.) - -sche (discuss) 22:15, 11 April 2024 (UTC)[reply]

We don't always use {{sense}} to give the gloss already used in the entry, sometimes we make it a little more specific. If an entry has a meaning "not X" (which happens quite often, English doesn't always have a good equivalent for these), then it would actually make sense for me to say "antonym of X" as a sense. The parenthesised "(s)" indeed does make it less likely, but it's also pretty easily missed.

It's possible that I interpret it that way because I am used to antonym sections using the sense as in the entry, rather than describing the following term, but on the other hand, so are our readers, right? Thadh (talk) 23:55, 11 April 2024 (UTC)[reply]

FYI: April Updates (Unicode)

https://mailchi.mp/a9bd0287cce4/testing-rickys-template-6362454

Note that it includes Wikimedia news. —Justin (koavf)❤T☮C☺M☯ 16:47, 3 April 2024 (UTC)[reply]

Automatic cognate generation

Many of our entries, like king#Etymology_1, include long lists of cognates. Would anyone be interested in a template that could be used to generate these kinds of lists automatically? The template would work by adding terms into a category which would be accessed to get a list of cognates. Ioaxxere (talk) 19:24, 3 April 2024 (UTC)[reply]

@Ioaxxere: That's really a job that is done by Proto-Germanic *kuningaz. The only benefit of this sort of template would be that if I had a Proto-Tai descendant on my watchlist, I wouldn't get alerted every time someone added the form from yet another Zhuang dialect. --RichardW57 (talk) 20:54, 3 April 2024 (UTC)[reply]

Like Richard said, I don't think this is really necessary, and I am personally of the opinion that these long lists of cognates do more harm than good, so we should rather remove them than make it easier to generate more of them. Thadh (talk) 20:58, 3 April 2024 (UTC)[reply]

I second both of the above comments. Nicodene (talk) 06:22, 10 April 2024 (UTC)

"terms spelled with"

(moved from User talk:Benwing2#"terms spelled with") We currently have two categories, Category:Hindi terms spelled with ॉ and Category:Translingual terms spelled with ◌ॉ, whose names are in conflict.

In terms of Unicode, the former is:

[...] U+0020 'SPACE'
U+0949 'DEVANAGARI VOWEL SIGN CANDRA O'

And the latter is:

[...] U+0020 'SPACE'
U+25CC 'DOTTED CIRCLE'
U+0949 'DEVANAGARI VOWEL SIGN CANDRA O'

Usually, U+25CC is included in the name of the category, such as in Category:Translingual terms spelled with ◌̺ which in Unicode is:

[...] U+0020 'SPACE'
U+25CC 'DOTTED CIRCLE'
U+033A 'COMBINING INVERTED BRIDGE BELOW'

Thus the name appears with the combining character U+033A being displayed on the dotted circle for demonstration purposes.

U+0949 is a combining character as well, and indeed if you try to highlight the character in the first category name, you are forced to select the space as well. However, visually the dotted circle is also rendered, at least on my browser.

Should we unify them? If so, which one should we choose? --kc_kennylau (talk) 00:58, 4 April 2024 (UTC)[reply]
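The code point sequences quoted above can be checked mechanically. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` and the variable names are made up for illustration, and only the trailing part of each category name (the space and what follows it) is examined.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# Tails of the category names discussed above (everything after "with").
hindi_tail = " \u0949"                # space + vowel sign
translingual_tail = " \u25cc\u0949"   # space + dotted circle + vowel sign
bridge_tail = " \u25cc\u033a"         # space + dotted circle + combining bridge

for label, tail in [("Hindi", hindi_tail),
                    ("Translingual", translingual_tail),
                    ("Inverted bridge", bridge_tail)]:
    print(label)
    for line in describe(tail):
        print("   ", line)

# General categories starting with "M" are marks, i.e. characters that
# attach to a preceding base character when rendered.
for ch in "\u0949\u033a":
    print(f"U+{ord(ch):04X} is a mark:", unicodedata.category(ch).startswith("M"))
```

Both U+0949 and U+033A are reported as marks, which is consistent with the observation that the first category name cannot be selected without also selecting the preceding space.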

It's not that simple. Whether dotted circle plus mark renders properly is, or has been, renderer and script dependent. Sometimes the result is two dotted circles! --RichardW57 (talk) 04:42, 4 April 2024 (UTC)[reply]

It's not "script dependent" under any conventional definition of script, nor should we let the fact that some misbehaving renderers render two circles there (resulting in a minor visual glitch) take priority over correctness/consistency. Lunabunn (talk) 21:17, 23 April 2024 (UTC)

Requiring attribution when moving from one Wiktionary page to another

Er, what do we think about this? [13] Equinox ◑ 13:59, 4 April 2024 (UTC)[reply]

Well I'm a little confused, since @Indigopari reverted @A westman's edit to haemoglobin that copied material from hemoglobin on the basis of giving no attribution, but then they immediately violated that rule less than 15 minutes later when copying material in the other direction. Theknightwho (talk) 14:22, 4 April 2024 (UTC)[reply]

The edit summary looks sufficient to me. It would have been better, though, to just make a minor edit and say "the previous edit copied from hemoglobin"- although it was obvious enough what they were doing. Chuck Entz (talk) 14:46, 4 April 2024 (UTC)[reply]

That's correct, though the attribution can be as simple as saying where you got it from in the edit summary. Of course, not every change is distinctive enough to require attribution- we're talking copyvio or plagiarism, not dotting i's and crossing t's. Chuck Entz (talk) 14:25, 4 April 2024 (UTC)[reply]

Oh woe, what kind of attribution should I have given? ✵ A Westman talk stalk 14:38, 4 April 2024 (UTC)[reply]

@A westman: There's a policy WT:FORMS about handling alt forms and soft redirects. Probably you shouldn't have duplicated the information in two entries, because they may easily get out of sync and it's a maintenance burden. Also was your deletion of the Welsh entry actually intended? --Ssvb (talk) 15:45, 4 April 2024 (UTC)[reply]

The deletion was unintended. ✵ A Westman talk stalk 16:49, 4 April 2024 (UTC)[reply]

One way to look at it is to treat the collective of Wiktionary editors as a single entity from the copyright standpoint. But if individual editors want to be always personally credited when moving every tiny bit of text from one entry to another, then this looks more like the case of malicious compliance. For example, in this diff I copied the texts of glosses from their English entries. Did I have to explicitly mention these entries in the edit's summary? Did I have to track the actual handles of the editors, who contributed these pieces of text in the first place?

In principle, the wiki engine could identify copied text fragments automatically, reducing the need for manual labour. Yes, this would be very resource-intensive and probably won't be implemented any time soon. But theoretically this can be done. --Ssvb (talk) 15:14, 4 April 2024 (UTC)
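As a rough illustration of what such automatic identification might look like, here is a minimal sketch using Python's standard `difflib`. It is not how MediaWiki works or would work: the function name `copied_fragments`, the `min_words` threshold, and the two sample texts are all assumptions made up for this example.

```python
from difflib import SequenceMatcher

def copied_fragments(source, target, min_words=5):
    """Return word runs of at least min_words that appear in both texts.

    Both texts are compared word by word; short coincidental matches
    (below min_words) are ignored.
    """
    src = source.split()
    tgt = target.split()
    matcher = SequenceMatcher(None, src, tgt, autojunk=False)
    return [
        " ".join(src[block.a:block.a + block.size])
        for block in matcher.get_matching_blocks()
        if block.size >= min_words
    ]

# Made-up sample texts, not the actual entry contents.
old_entry = "An iron-containing protein in red blood cells that transports oxygen around the body"
new_entry = "Alternative form. An iron-containing protein in red blood cells that transports oxygen to tissues."

print(copied_fragments(old_entry, new_entry))
```

Running a comparison like this for every edit against every other page is what would make the approach so expensive in practice; the sketch only compares two known texts.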

@Vahagn Petrosyan as someone who expressed interest in attribution in earlier discussions. Thadh (talk) 16:55, 4 April 2024 (UTC)[reply]

Wikipedia:Copying within Wikipedia may be relevant. Whatever the formal rules may be, mentioning in the edit summary that content is copied from a certain page is the decent thing to do. We spend a lot of time, energy and ability to write a good Wiktionary entry. We deserve some credit. Vahag (talk) 18:17, 4 April 2024 (UTC)[reply]

Some thought has been given to this issue on Wikipedia, and the result was w:Template:copied. The problem is that each individual's (surviving) contribution has to be acknowledged for as long as they retain copyright. Now, it can be done by implicitly referencing the change history of the source, but that only works while the change history remains accessible. That template attempts to protect this history, but I don't know how well that works. In practical terms, the history usually dies with the page. As {{copied}} on Wiktionary was deleted, I suspect the idea went down like a lead balloon on Wiktionary and the collective decision was to live dangerously.

The selection of which cognates to display is sufficiently original to be protected by copyright - the question is then whether the 'fair use'-like exemptions from copyright apply. Wikimedia seems only to worry about US law, but many of us might fall foul of local laws. --RichardW57 (talk) 17:59, 7 April 2024 (UTC)[reply]

Separate from the copyright question: A Westman, why were you moving the content from hemoglobin to haemoglobin in the first place? In the case of US/UK spelling differences, the thing we've been doing to be neutral (not inherently favouring one national variety or the other) is centralizing the content on the older entry, which in this case is hemoglobin (which is also, as an aside, the more common spelling). - -sche (discuss) 16:58, 4 April 2024 (UTC)[reply]

Because haemoglobin is used everywhere outside the US and matches the Greek and Latin spellings, it's not just a UK spelling. ✵ A Westman talk stalk 18:51, 4 April 2024 (UTC)[reply]

Also just using the older spellings is not "neutral" because most of them are in US English (inherently not neutral). Which makes sense because most Wikimedians are American afaik. ✵ A Westman talk stalk 18:53, 4 April 2024 (UTC)[reply]

@A westman Favoring UK/Commonwealth spellings is no more "neutral" than favoring US spellings. But in fact your AFAIK about most Wikimedians being American isn't even true; they are scattered across the world, and I actually get the sense more current Wiktionarians are British than American. (From what I've seen, there is a fairly random mixture of pages where the main version is hosted using the British spelling vs. the American spelling.) Benwing2 (talk) 19:40, 4 April 2024 (UTC)[reply]

For the last one, from what I've seen there really isn't. And I never said UK spellings are neutral; what I did say is that there is no way to be neutral. US spellings are localized to the US and to an extent Canada and the Philippines, but everywhere else CW English prevails. ✵ A Westman talk stalk 19:54, 4 April 2024 (UTC)

And anyway: https://stats.wikimedia.org/#/en.wiktionary.org/reading/page-views-by-country/normal%7Cmap%7Clast-month%7C(access)~desktop*mobile-app*mobile-web%7Cmonthly ✵ A Westman talk stalk 19:58, 4 April 2024 (UTC)[reply]

And that tells you precisely zero about who edits Wiktionary. Benwing2 (talk) 20:02, 4 April 2024 (UTC)[reply]

Fair enough but those statistics aren't available. ✵ A Westman talk stalk 20:03, 4 April 2024 (UTC)[reply]

Yeah, those who prefer UK spellings point to the number of countries that theoretically consider those standard (although how many people there actually use them, when English is not the main native language, is another matter), and those who prefer US spellings point to the number of greater uses of US vs UK spellings (greater number of works using them)... sometimes people try to work out how many native English users there are of one or the other... to avoid pages being moved back and forth because you come along in April and think your preference is the rational one, and then someone else comes along in May and thinks their own preference is the rational one, just use the older entry. However, when a spelling is not merely "alternative" but a national "standard", we could be using T:standard spelling of like this—that template postdates a lot of entries, so many don't use it yet. - -sche (discuss) 20:30, 4 April 2024 (UTC)[reply]

No part of the page reaches the threshold of originality, in the jurisdiction I know. Luckily, the definition was written by the late SemperBlotto in 2005, vouchsafing verisimilitude of lacking creativity. The collective of Wiktionary editors is no natural person and thus cannot be the author of a work, which is the requirement for copyright to arise, nor can there be co-authors if there is neither will nor imaginative power to create a joint work under a joint conception; instead contributions are driven by intrinsic logics of the subject matters and rarely collaborative. Fay Freak (talk) 17:17, 4 April 2024 (UTC)

@Fay Freak: The WMF:ToU#7._Licensing_of_Content mentions "the Wikimedia community" in its text, maybe not as the copyright holder, but it's still mentioned. The Terms of Use also explains that the outsiders only have to provide the article URL to comply with the attribution requirements of the license and that's sufficient. Not having this particular clause would render the content of Wikipedia unusable to be republished anywhere under the CC BY-SA license, because some individual Wikipedia editors would start pestering the re-publishers and demand to be given credit personally. Now back to Wiktionary. Reusing and copying parts of text from the corresponding English entries or from the cognates of the same word in other languages is rather common in Wiktionary. I have witnessed this myself many times. Can we possibly agree that mentioning this is generally not necessary in the edit summary? Because such copying either does not reach the threshold of originality or because the original text is still easily available just one wikilink away. I mean, via cognate links in the Etymology section or via links to the English words in the list of the foreign word senses. --Ssvb (talk) 01:09, 5 April 2024 (UTC)[reply]

Of course, either the threshold of originality is not reached (with list-like content, which is almost everything in Wiktionary, sometimes formulating weighing and illustrating aspects, as said in homoeopathic doses for what Blotto wrote there, who described chemistry like one is five), or anyone interested should ask himself and guess the provenience or attribution, or because authors with the licence agreed to collective attribution, even by implication of participation if not expressis verbis: I mean such a site works with a low entry barrier, we can’t rebase like in git commits. Even if a source got deleted one can just ask about the edit history if explaining a legitimate interest, this can play a role for establishing a copyright violation or at least proceeding, otherwise main character syndrome. Fay Freak (talk) 03:28, 5 April 2024 (UTC)[reply]

Some authors seem to have agreed to collective attribution, but I don't know an easy way of finding out who has. --RichardW57 (talk) 17:25, 7 April 2024 (UTC)[reply]

Limburgish nominal inflection

So this is apparently a meme in certain online communities. Looking at the table, even without any knowledge of Limburgish, it is clear that dative *berem cannot be correct and that the ‘locatives’ are all extremely suspect. Nor is it obvious what the ‘consonant mutation’ refers to (progressive assimilation?). The Limburgish page looks a lot more sane by contrast. 109.184.88.220 19:16, 5 April 2024 (UTC)[reply]

@BartGerardsSodermans You seem to have been the one who added these entries. Can you fix this? Benwing2 (talk) 05:14, 6 April 2024 (UTC)[reply]

@Benwing2 I've gone around and simply removed those, as I added most of them when I was transforming the original pages' bare wikitables into templates. I based this template on whoever originally added the inflection tables, but they seem either very specific to one specific dialect or just to have been wrong in describing nominal inflection in Limburgish. These questionable non-template inflection tables are still present in some cases, like geo, glee, hieër, hoes, kindj, krieëk, meule, wien, water, and hit. I haven't removed those yet as I didn't originally add them and you might want to ask the original contributor (if they are even active anymore) whether these are correct or can be removed. Though to my judgement they do seem problematic as well, so I'd be fine with removing those as well.

The "locative" case, for example, seems to have come from someone who was too eager in generalising a rule over a few actually existing examples (specifically hieër (“lord”) → hieëves (“to the lord”) & heim (“home”) → heives (“to (the) home”)); instead of being a locative case it may just as well be a regular suffix -ves (which may be related to suffixes like English -ward and Dutch -waarts).

As far as I can tell the dative is identical in most dialects to the nominative; though they were different at some point, I don't believe an -m has ever been present. The only dative marking that ever occurred seems to have been a simple -e. Similarly, the "consonant mutation" seems to just be progressive assimilation, which most spelling systems would not spell out anyway, so I also don't see why that was added initially. BartGerardsSodermans (talk) 06:37, 6 April 2024 (UTC)

@BartGerardsSodermans Thank you! The original contributor is User:Ooswesthoesbes, who is sporadically active; but given what you've said, I am inclined to simply remove all the problematic tables regardless. I have a hard time believing, for example, there is a separate locative case in Limburgish. Benwing2 (talk) 06:45, 6 April 2024 (UTC)[reply]

The inflections given are based on the "High Limburgish" standardisation. After a community consultation on the Limburgish Wiktionary, we decided to drop it altogether and instead use no standardisation. As BartGerardsSodermans indicates, the assimilation is generally left out in spelling, and as such can be dropped as well.

While some dialects do differentiate in genitive, it is mainly tonal, f.e. daa~g vs. daa\g.

My advice would therefore be to remove the templates or replace them with ones similar to those on the Limburgish Wiktionary. --Ooswesthoesbes (talk) 06:50, 6 April 2024 (UTC)

@BartGerardsSodermans Can you do this? Benwing2 (talk) 06:59, 6 April 2024 (UTC)[reply]

Yeah, I'll go ahead and remove the inflection tables for now. And maybe in the future I'll see about creating a correct template. BartGerardsSodermans (talk) 12:14, 8 April 2024 (UTC)[reply]

`{{tcl}}`

Pinging interested parties, feel free to ping more: @Benwing2, Surjection, Vininn126

Last December, WingerBot went through hundreds of pages replacing definitions of countries with this template, which automatically transcludes the English definition on the various language-specific pages. The only reason I noticed this is because this broke some quotation templates' display, but even if we ignore this issue (which should be fixed as soon as possible), in my opinion using this template across languages is a terrible idea:

Languages are very different with regards to how they perceive the world, and proper nouns and even place names are not different. When an English speaker sees "largest country in the world", a Russian speaker sees "home country", whereas a Ukrainian speaker... Well, you get it.

This becomes ever so relevant when you go down in size. A speaker of Votic will see the concept of Canada as fundamentally different from a speaker of Dogrib. While this is difficult to put into words, slight changes in the definition do matter! This becomes evident when you go to smaller-sized place names - The fact that Den Hoorn is located in Midden-Delfland is only really part of the worldview for the Dutch speakers in the region, for a Limburgish speaker it will already only be useful to know that it is located in South Holland, while for a speaker in Singapore this will just be a village in the Netherlands (or perhaps even in Europe!).

In any case I think the indiscriminate introduction of the template across languages by a bot should be undone, but I also suspect that it might be better to not use the {{tcl}} template at all: Definitions are ever changing, both in the language and in English, and while at one point the English definition might match the one in a given language, it is possible that this English definition will be optimised in the future, leaving the definition in the other language incorrect. Thadh (talk) 11:48, 6 April 2024 (UTC)

Strongly agree that any mass adoption of {{tcl}} should have been discussed properly, and if I were to decide, I'd burn that template with fire. Not only for technical reasons, but also because of the valid points raised here. — SURJECTION / T / C / L / 11:50, 6 April 2024 (UTC)

The technical issue aside, I am strongly of the opinion that there are absolutely terms that are the same semantically within languages, so removing it wholesale is not necessary. The argument comes across as ideological to me. trójkąt along with tons of other scientific terminology come to mind as being semantic matches. Vininn126 (talk) 11:55, 6 April 2024 (UTC)

What on earth is gained by using {{tcl}} to include a definition of Russia or triangle on an entry instead of just using a link and writing a short gloss? The tcl approach 1. is subject to whims by editors of the English entry, 2. wastes technical resources, 3. introduces risks of breakage with existing templates and gadgets (like the quotation issue that prompted this discussion), 4. makes it harder for editors to see where the definition comes from, 5. makes it harder for people who want to parse the Wiktionary entry data to get senses, etc. It is worse in literally every single way I can think of. — SURJECTION / T / C / L / 11:58, 6 April 2024 (UTC)

1) It provides more precise linking between terms. To your 1) this can be an upside or a downside for particular terms. To your 2) On 99% of potential matches, the difference is insignificant. 3) There's argument for every template/module for "potential breaking", let's not have any in that case. IPA modules can be changed, too, and break. 4) How? 5) For some people. It's also based on Wikidata, which I'm sure people can parse. Vininn126 (talk) 12:02, 6 April 2024 (UTC)

1 is meaningless. What is "more precise linking" between terms? 2 is never insignificant; this line of thinking is why we had memory errors for years. 4 is obvious; if you try to edit the page to fix a typo in the sense and just see a {{tcl}}, how is an editor not familiar with this template supposed to do anything? Your response to 5 I think just shows you're not familiar with this topic; forcing people to use the Wikidata API too just to display Wiktionary definitions makes no sense. If all you care about is the Wikidata lexeme ID, you can add that with {{senseid}}. {{tcl}} adds no value whatsoever. — SURJECTION / T / C / L / 12:08, 6 April 2024 (UTC)

As to memory issues - we have tons of templates that can cause these problems and it's always on shared pages. This does not stop us from including templates on pages that are not shared and are unlikely to be shared. We've had to come up with other ways to deal with memory issues for specific pages in the past. I don't see why we can't just limit its use instead of banning it entirely for this reason. Vininn126 (talk) 12:12, 6 April 2024 (UTC)

@Surjection Unless you can point to something concrete, complaints about use of resources are not helpful here. I see nothing about this template which uses anything significant, and vague hand-wringing about it is not productive. Theknightwho (talk) 12:13, 6 April 2024 (UTC)[reply]

Fetching and transcluding another page and parsing it in Lua is not free and is never going to be free. — SURJECTION / T / C / L / 12:15, 6 April 2024 (UTC)

@Surjection Correct, but it’s also a negligible cost outside of pages which do it to hundreds/thousands of other pages, and I cannot think of any examples where that would happen with this template. Theknightwho (talk) 12:17, 6 April 2024 (UTC)[reply]

We currently have 3,888 Latin-script languages. I bet you 99% of them would write the name of Samoa in the same way as English does. This is an issue in the long run (although to me the technical side is much less important than the lexicographical one). Thadh (talk) 12:22, 6 April 2024 (UTC)

@Thadh Which is a hypothetical future problem for when we have thousands of L2s on a single page.

Plus, by far the biggest cost involved is in grabbing the content of the page (which is something we can’t do anything about, since it calls back into PHP), whereas the actual parsing is relatively cheap. I know this, because I spent ages doing time profiles to see what the problem was. In your example, if they’re all calling back into PHP to get content from the same page, then that time cost won’t apply, since PHP uses a cache. Where it is a significant issue is in scraped transliterations, since every link grabs the content of new page(s). Theknightwho (talk) 12:27, 6 April 2024 (UTC)[reply]
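The caching argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: `get_page_content` stands in for the expensive call back into PHP, the counter and page titles are invented, and `functools.lru_cache` is used as a stand-in for whatever cache the real software uses.

```python
from functools import lru_cache

fetch_count = 0  # number of times the "expensive" fetch actually ran

@lru_cache(maxsize=None)
def get_page_content(title):
    """Hypothetical stand-in for fetching a page's wikitext."""
    global fetch_count
    fetch_count += 1
    return f"(wikitext of {title})"

def render_entry(transcluded_titles):
    """Pretend to render a page that pulls content from other pages."""
    return [get_page_content(title) for title in transcluded_titles]

# Case 1: a thousand language sections all transclude the same English page.
render_entry(["Samoa"] * 1000)
same_page_fetches = fetch_count

# Case 2: a thousand links each pull from a different page,
# as with scraped transliterations.
render_entry([f"page {i}" for i in range(1000)])
distinct_page_fetches = fetch_count - same_page_fetches

print(same_page_fetches, distinct_page_fetches)
```

In this model the first case costs a single real fetch however many sections reuse the page, while the second case costs one fetch per link, which matches the distinction drawn above between `{{tcl}}` and scraped transliterations.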

One consequence that hasn't been considered: this makes the master entries, in effect, templates that can't be given template-editor protection. Any bad edits to these entries will be propagated to all of the other entries, which will make them targets once the vandals catch on. Because they're entries with cross-linguistic significance they need lots of translations, and many of the translations are added by contributors who would be kept out if the page is protected. Page protection would also affect unrelated homographs. It might be possible to create an abuse filter that protects just the sense, with the level dictated by something added to the main entry (maybe a parameter in the senseid?)- but that definitely complicates things. At the very least, {{tcl}} raises the stakes for any decisions made regarding the senses being transcluded. Chuck Entz (talk) 21:34, 6 April 2024 (UTC)[reply]

I find the point being made here rather confusing. This dictionary is written in English and maintains a neutral point of view, so it does not refer to any country as a "home country" for instance. It's true that Votic and Dogrib speakers will think differently of Canada, but if the denotations of the words both refer to the UN member state, then the definitions, as written in English, should surely be identical. Further historical or cultural nuances are presented as distinct senses or usage notes. This, that and the other (talk) 05:58, 7 April 2024 (UTC)[reply]

The reason I did this was to avoid tons of duplication of definitions. It may seem "obvious" to define Canada as a country in North America, but there are lots of cases where it's far from obvious especially if you want the categorization to work out correctly, and it was very painful to try and keep manually synchronized the definitions of a hundred different terms for e.g. Vatican City. For any country in the Middle East, for example, there are issues such as what are the limits of the Middle East and Western Asia, etc. and how do we indicate that countries are part of both? Even for a relatively innocuous area like Europe, the boundaries of "Western Europe", "Eastern Europe", etc. aren't necessarily obvious and it can be problematic if you get it wrong. For many geographic terms (e.g. Palestine, Jerusalem, Crimea, Artsakh/Nagorno-Karabakh, Macedonia, ...) just coming up with an NPOV definition is hard, and furthermore the NPOV definitions may change over time, leading again to a massive synchronization effort if {{tcl}} isn't used. The technical objections made above seem largely theoretical and speculative to me since I haven't actually seen major issues arising, and I don't at all buy User:Thadh's claim that we need to phrase the definition of a given geographic term differently depending on the language in question. In fact I would say doing so can be quite problematic from an NPOV perspective. Benwing2 (talk) 06:37, 7 April 2024 (UTC)[reply]

NPOV has absolutely nothing to do with lexicography. A certain word has a certain meaning in a certain culture, distinct from other cultures, be it a noun, adjective, or a name. That meaning is what we record here, and it is impossible to do lexicography well by only documenting the English definition of the referent, rather than the definition through the speaker's pov. This includes things like omitting specific information which is not primary to the speakers: For instance, Mäkkylä would best be defined as "A village in the Leningrad Oblast" in Russian, and as "A village in Russia" in English. That has nothing to do with NPOV, it has to do with speaker experience. Thadh (talk) 09:08, 7 April 2024 (UTC)[reply]

I am perplexed that you want the definitions of Ingrian words to be contextualised for Ingrian speakers, even though these definitions are written in English. It's likely that in the Ingrian edition of Wiktionary, definitions would be written in the way you wish. However, English is a global language. I do not agree with the idea that different languages' entries on English Wiktionary should assume different levels of geographic foreknowledge on the part of the reader. Perhaps not what we usually mean by NPOV, but it's a similar principle. This, that and the other (talk) 11:39, 8 April 2024 (UTC)[reply]

@This, that and the other: If you want our dictionary to only be used by English speakers, then I am afraid you'll see the majority of the editors leave pretty quickly. The target audience of the Ingrian entries are Russian, Finnish and Estonian speakers, which is unsurprising, as I doubt you'll ever encounter even one English speaker who has even heard of the word "Ingrian" in his life. So forgive me for not wanting to adapt my definitions to people that will absolutely never use the entries, and to have information there that absolutely no reader will ever want. Thadh (talk) 11:50, 8 April 2024 (UTC)

@Thadh When you say If you want our dictionary to only be used by English speakers, then I am afraid you'll see the majority of the editors leave pretty quickly., do you not see how this implies the precise opposite of what you're saying? If it's not solely used by speakers of one language, then it's not safe to assume the reader's knowledge or interest on that basis. Theknightwho (talk) 17:03, 8 April 2024 (UTC)[reply]

Which is why I don't do that, but rather state that we should adapt the definition on the basis of what the speakers would denote. Thadh (talk) 17:54, 8 April 2024 (UTC)[reply]

@Thadh's argument doesn't make any sense to me. Why would English speakers not care that Den Hoorn is in a certain part of the Netherlands? Do you speak for all of us? As an English speaker, I strongly oppose removing this kind of geographical information on any entry. @Surjection's argument is also irrelevant, given that anyone trying to parse Wiktionary will use the HTML output, which isn't affected, rather than the wikitext. Therefore I support applying {{tcl}} whenever possible. Ioaxxere (talk) 20:59, 7 April 2024 (UTC)[reply]

@Ioaxxere: If you want more detailed information on Den Hoorn, there is a very good website for you called Wikipedia. Luckily for you, pretty much all our English entries have a handy link (and often more than one) to related articles. We also don't include information on past inhabitants of the village and how many schools there are. There is a reason for that: We are not an encyclopedia. Thadh (talk) 21:05, 7 April 2024 (UTC)[reply]

@Thadh Forcing users to go elsewhere because you’ve assumed people don’t care about information just adds inconvenience. What you’re essentially doing is applying the Sapir-Whorf hypothesis to place names, instead of the far more obvious explanation that it’s down to geographic proximity, which changes based on the speaker’s personal circumstances.

I agree that a speaker’s conceptions will change based on the language they’re speaking, but I do not agree that that applies to place names in general (edit for clarity: I’m not talking about poetic terms like Albion, which obviously are affected). Theknightwho (talk) 21:37, 7 April 2024 (UTC)[reply]

I'm sympathetic to the idea that e.g. "(an island and city-state in Southeast Asia, located off the southernmost tip of the Malay Peninsula; a former British crown colony)" does not necessarily need to be present on every single language's definition of Singapore (e.g. சிங்கப்பூர்) ... but only inasmuch as I think just defining it as "Singapore", pointing to the English entry, and letting the English entry do the heavy lifting (including noting any associations the place has in any particular cultures) could be enough. (And since {{tcl}} syncs/transcludes this content, rather than duplicating it in a way that would fall out of sync, I think it's fine.)
I don't think we can assume that the only people looking up a word in a given language are speakers from the main culture associated with the language, who live wherever the language is most commonly spoken; (native) Chinese speakers living outside China might only have as little need to know what specific sub-area of China Haikou is in as the average English-speaking American (but conversely, members of either group might want to know what specific region it was in); at the same time, they might have correspondingly more interest in exactly what part of Malaysia or the US, where they live, a nearby city (whether named in Chinese, Malaysian or English) is in. And IMO, information like (say) Mount Paektu being important to Koreans and Manchus should be noted in the English entry, not just the Korean entry. Only if a single place truly has salient/definitional cultural significance in a large number of languages would I consider not having at least a copy of all the info in the English entry, if it would balloon the English definition up too much.
Regarding the idea that these are "unprotectable templates": if every language's word for "Singapore" just linked to the English entry with no extra details, and someone vandalized it, then people coming from all different entries would still see the vandalism if they clicked through to the English entry... and if they didn't click through, then the vandalism would go unnoticed by them, whereas if the English definition is transcluded across many pages and gets vandalized, more people are in a position to see the issue and bring it to our attention... so IMO it seems like a wash on that front. - -sche (discuss) 22:22, 7 April 2024 (UTC)[reply]

@-sche: "I don't think we can assume that the only people looking up a word in a given language are speakers from the main culture associated with the language, who live wherever the language is most commonly spoken" - I assume nothing of the sort, but I am absolutely certain that the term denoted is the one that is used by "speakers from the main culture associated with the language, who live wherever the language is most commonly spoken". There is simply a difference between describing a referent and describing a term - when I as a Russian speaker say "Дортмунд (Dortmund)" I denote something different from what a German speaker would denote. The referent is the same, that is true, but the communicated information is not. Thadh (talk) 16:24, 8 April 2024 (UTC)[reply]

Unless you happen to be speaking to @Fay Freak or one of many other Russian-speakers who live in Germany, that is. Theknightwho (talk) 17:08, 8 April 2024 (UTC)[reply]

@Theknightwho: Good luck finding three quotes in Russian proving that the speaker encoded the knowledge that Dortmund is located in North Rhine-Westphalia into their used word, and the best of luck proving that for smaller languages. Thadh (talk) 17:57, 8 April 2024 (UTC)[reply]

@Thadh I'm pretty sure that Russian travel guides exist about Germany. Theknightwho (talk) 18:05, 8 April 2024 (UTC)[reply]

Okay, great. {{tcl}} can stay for Russian. Please remove it from Ingrian, Votic, Veps, Karelian, [insert any other Uralic language other than Finnish, Estonian and Hungarian], and while you're at it [any language that does not have travel guides written in it, which by my estimation is 98%].

I don't see why this would ever be a site-wide decision anyway. Regardless of what English editors may think, why should anyone choose whether or not to use this template for languages other than that language's editors? Thadh (talk) 18:10, 8 April 2024 (UTC)[reply]

@Thadh I have no idea why (or on what basis) you think the totality of speakers of these languages are ignorant of information about major cities in Germany such as what region they're in. It's completely baffling. Theknightwho (talk) 10:58, 9 April 2024 (UTC)[reply]

@Theknightwho: "Ignorant" and "Not encoding in their speech" are not the same thing. Thadh (talk) 12:17, 9 April 2024 (UTC)[reply]

@Thadh When using value-neutral terms for places, I encode as much information as the listener is able to infer from their knowledge of that place. Nothing more or less. Theknightwho (talk) 12:21, 9 April 2024 (UTC)[reply]

Also, maybe you haven't noticed, but there are currently two elderly speakers of Votic, and just some thirty of Ingrian. Would not be surprised if indeed the totality of these do not know where Dortmund is located. Thadh (talk) 12:21, 9 April 2024 (UTC)[reply]

Thadh being illogical again (→ belief perseverance). The correct perspective was outlined by the accusation of perplexity by This, that and the other. If you are a Russian, Finnish and Estonian speaker and use en.wiktionary.org with success then you are an English speaker to some degree, and assume an English speaker’s perspective. Theory of mind. He portrays the issue without a sense of proportion. The Wikipedia stuff doesn’t work either in the way Thadh suggests. I already had the problem of Malaysian place-names cited in 1978 by the Encyclopedia of Islam being unidentifiable to me, the suggestions on كَلَة (kala) being like a third of the mentioned suggestions. It is also psychiatrically interesting that Thadh illustratively expands upon the application of the Sapir-Whorf hypothesis after being called out for it, which is at this point willingly fallacious, stubborn. I mean I don’t say it is a disorder, giving lack of significance pervasiveness, allism must have even maladaptive interaction typically, we are all learning, but I must warn against such subjectivity, this is an unconstructive and dangerous personality trait. Fay Freak (talk) 18:15, 8 April 2024 (UTC)[reply]

I don't even know where to start, but I speak English and I still don't know what state any given American city is located in, even if I know English, let alone any city in any other of the hundreds of English-speaking countries and territories. Speaking a language to the degree of understanding glosses does not entail knowing anything about the culture at all. Furthermore, in this day of internet, you don't even have to know English to use our dictionary. Thadh (talk) 18:23, 8 April 2024 (UTC)[reply]

That’s why we take extra care about the place-name glosses being comprehensible to the naivest denominator. They are made exact to various granularity within the same gloss and at the same time robot-readable. I still freestyle within these limits and give entries a human touch: Ahlat (a town in Bitlis Province, Turkey; at Lake Van, 40 km northeast from Tatvan along the coastline). You don’t regularly succeed to be more intelligent than that either way. Fay Freak (talk) 18:35, 8 April 2024 (UTC)[reply]

@Fay Freak: I don't know if you've noticed, but this entire discussion is about removing this type of glossing. Thadh (talk) 18:40, 8 April 2024 (UTC)[reply]

@Thadh: The types of glossing aren’t mutually exclusive. English entries like Aksaray have to be improved, they are inexact because we didn’t know formatting, even using images as a replacement for coordinates. And {{transclude}} can have parameters for extra text like {{place}} has. Or if the link in {{place}} has an |id= then we don’t need to use {{tcl}} at all. It’s hardly the hundreds of pages WingerBot caught semi-automatically; I too was doomscrolling my watchlist in December and did not notice the theoretical offence. Fay Freak (talk) 19:52, 8 April 2024 (UTC)[reply]

I have a very specific objection to a use of {{tcl}} which I had forgotten. When I looked at Ancient Greek Ἰνδῐ́ᾱ (Indíā), I found that the definition was, "(chiefly historical, proscribed in modern use) India (a region of South Asia, traditionally delimited by the Himalayas and the Indus river; the Indian subcontinent)". What? A Classical Greek word proscribed in modern use? I then looked at the wikicode - {{tcl|grc|India|id=region}}. The problem is that the gloss is picking up {{lb|en|chiefly|historical|proscribed|_|in modern use}} from the English entry. We need some way to stop such labels being picked up.

A problem with correcting that entry is that I don't know what the Greek conception of 'India' was. But.. - while English glosses are supposed to be definitions, glosses for other languages are supposed to be translations. Perhaps I should just change the gloss of the Greek word to 'India'. --RichardW57 (talk) 19:56, 8 April 2024 (UTC)[reply]

I think this is actually a good example of what was being talked about above. Vininn126 (talk) 19:59, 8 April 2024 (UTC)[reply]

Or Ἀλαζώνιος (Alazṓnios), anachronistically using a country name in the definition which was invented two thousand years after the attestation. On the other hand, compare Κῦρος (Kûros) using a historically appropriate gloss. Vahag (talk) 20:12, 8 April 2024 (UTC)[reply]

That still works, you can’t circumscribe everything comprehensibly in historical terms, which become more ambiguous the more you go back, less the earliest Armenian historic times, which you do not well map year by year though unlike Europe’s maps of the 1900s, but another two millennia before, you see in what I wrote in the etymology section (tabrīz). And the historical sensibility or diachronic coherence gets lost by people’s manual actions, sigh, compare my original definition of Cyprus, thus circumspectly formulated because I added the Phoenician translation for the island of Cyprus (no country nor political unity existed). There is an inherent bias towards the present natural to language itself, its preservation and transmission up to the imagined readers of our working language, without anyone being Whorfian here. All techniques have intelligibility advantages. Fay Freak (talk) 20:48, 8 April 2024 (UTC)[reply]

There is a principal difference between countries (top-level states) and geographical or cultural regions corresponding to them sometimes (through the concept of a nation, such as tied to a demonym). I again applaud Wiktionary and whoever wrote the English dictionary entry Germany which has been afforded meticulousness in this respect. If we think about it like a programmer then they are different objects, which we could fetch. Wikipedians can only complain about it not being made out in the references though, which need to be interpreted as referring to one or the other or both. The concept would simplify historic accuracy, though even ethnicities only form in distinct times (e.g. Bosnians split off Serbians in the 13th century), but then again only most intelligent people like Vahagn, Richard, and Ben would even get the idea of what we try to achieve there, which is not usually expressed well in language – something for a later generation.

Psychology fact: Tasks become easier by sequentialization and big questions like “What is Russia?” can be projected: At another occasion I point out that even in the legal realm there is the diplomatic/international law answer, the civil law one, which is the de-facto state one (a term invented by Wikipedia), the internal administrative one, then here we have the cultural one (surely went somewhere through Belarus and Ukraine until everyone became sick of being the Russian world as they bombed culture away), which is equal to the regions typically settled by Russians at the Eastern frontiers as opposed to Turks (ethnopluralism~~, which we define incorrectly btw,~~ is rare, most people distinguish other people(s) by acculturation). Fay Freak (talk) 21:13, 8 April 2024 (UTC)[reply]

@RichardW57 There is in fact support for this in {{tcl}}; |nolb=+ or |nolb=1 makes it not pick up any labels, or you can give a semicolon-separated list of labels not to pick up. This needs to be documented. Benwing2 (talk) 21:17, 8 April 2024 (UTC)[reply]
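The |nolb= behaviour described here can be sketched roughly as follows. This is only an illustration in Python, not the actual module code behind {{tcl}}; the function name `filter_labels` and the exact matching rules (exact string comparison, whitespace trimmed around each entry) are assumptions made for the example.

```python
def filter_labels(labels, nolb=None):
    """Return the labels that would still be transcluded.

    Hypothetical model of |nolb=:
      - None or ""  -> keep every label
      - "+" or "1"  -> drop every label
      - otherwise   -> treat the value as a semicolon-separated list
                       of labels to drop
    """
    if nolb is None or nolb == "":
        return list(labels)
    if nolb in ("+", "1"):
        return []
    # Split on semicolons, ignoring empty entries and surrounding spaces.
    blocked = {part.strip() for part in nolb.split(";") if part.strip()}
    return [label for label in labels if label not in blocked]


if __name__ == "__main__":
    english_labels = ["historical", "proscribed"]
    print(filter_labels(english_labels))                 # nothing suppressed
    print(filter_labels(english_labels, "+"))            # everything suppressed
    print(filter_labels(english_labels, "proscribed"))   # one label suppressed
```

Under this model, an entry like Ἰνδῐ́ᾱ would pass |nolb=+ to avoid inheriting any English labels, or a list such as |nolb=proscribed to drop only the ones that do not apply.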

@Benwing2: Ἰνδῐ́ᾱ (Indíā) corrected accordingly. But when may one use undocumented features without being accused by perps of hacking? --RichardW57 (talk) 06:24, 9 April 2024 (UTC)[reply]

I wonder how often labels should vs. should not be transcluded; I wonder if |nolb=+ should be the default. If it's not too much work, maybe someone could make a list of what labels are used on the various definitions around Wiktionary that {{tcl}} transcludes, and how often, so we could get a sense of whether most labels can or can't be expected to apply across languages (e.g., if there are any foreign places for which the US and UK or NZ, etc, use different names, any "US" label would never carry over to Czech — maybe the template even already realizes this, to not carry over English-specific labels).
I'm not saying this is a good idea, if it would increase how much memory or other resources Module:languages uses for relatively small gain, but there is also the possibility of adding some kind of "long dead?" / "ceased to be spoken by native speakers in ancient times?" field (precise scope and name to be workshopped) to Module:languages or to some other module, which might need to be present only when the value is "true", from which not only could {{tcl}} know to suppress labels like "historical" or "obsolete" when a language is long-dead (since such labels are likely to be less accurate, and might be better added manually if accurate), but also "long-dead language term borrowed from modern language term" in {{der}}/{{bor}} like "Coptic terms derived from Greek" (permalink) could categorize into a "this is probably wrong, check me" category (in that case, Coptic borrowed from Ancient Greek, not modern Greek). (Again, not sure this is worth the cost, but mentioning it.) - -sche (discuss) 13:33, 9 April 2024 (UTC)[reply]

@Benwing: This discussion is delving into the theoretical of whether speakers of different languages denote different things, but meanwhile the technical issue with the quotations is not resolved yet. Thadh (talk) 12:50, 9 April 2024 (UTC)[reply]

After a discussion with @Theknightwho, it seems the main problem we share with the template is the fact that cultural baggage denoted by the term is now both transcluded (it shouldn't be, as English is a separate language and has separate connotations), and afaict excluded on the individual entries. By "cultural baggage" I mean for instance things like the difference between Burma and Myanmar.

We still disagree on the amount of geographical information that should be added to the entries (I personally am of the opinion that this should be minimal), but in any case things like "Largest country in the world" do not seem to be something that should be transcluded. Similarly, labels should not be transcluded by default, I don't see how that is desirable. @Benwing. Thadh (talk) 13:25, 9 April 2024 (UTC)[reply]

Coincidentally, my earlier comment here was prompted by finding a number of instances of the invalid category [language code]:Burma that were only there because someone put |cat=Burma in a {{sensid}} template instead of |cat=Myanmar. While it's nice that all these entries are in synch, the fact that a Tagalog entry has all the mistakes of its English counterpart isn't. IMO we shouldn't use this on entries that are contentious or vandalism-prone in any way. Chuck Entz (talk) 14:04, 9 April 2024 (UTC)[reply]

@Thadh I think you are right about not transcluding labels by default. This is behavior inherited from the original implementation by User:Fytcha, but it violates the principle of least surprise because often the labels won't be relevant or accurate and it's not obvious to the transcluder what the labels are. I'll change this so they are only transcluded if you use |lb=1 or |lb=+. BTW you should ping me as Benwing2; I don't log into my admin account much so I won't generally see pings to that account. Benwing2 (talk) 05:17, 10 April 2024 (UTC)[reply]

Oh whoops, sorry, I normally do but seemingly forgot this time. Thadh (talk) 07:55, 10 April 2024 (UTC)[reply]

@Thadh: I don't think I understand your position. To help me, could you tell me whether and in what way you disagree with how I've defined the various senses of English Bilohorivka and Ukrainian Білого́рівка (Bilohórivka), please? 0DF (talk) 16:58, 21 April 2024 (UTC)[reply]

@0DF: In my opinion, the fact that Bilohorivka was founded in 1720 is not dictionary material. Furthermore, the fact that Bilohorivka is located in a certain hromada in a certain raion is not in my opinion useful information for the English entry, and definitely not for most other languages; these details are probably useful for the Ukrainian entry though, which is why I think such information should not be transcluded. Thadh (talk) 17:33, 21 April 2024 (UTC)[reply]

I think this illustrates why the suggestion to categorically drastically reduce how much information is present is not workable. If "the fact that Bilohorivka is located in a certain hromada in a certain raion is not in my opinion useful information for the English entry", will you have "A village in Donetsk Oblast, Ukraine." twice, and "A village in Kharkiv Oblast, Ukraine." as nine different senses of Mykhailivka alongside six "A village in Luhansk Oblast, Ukraine."s, etc? Earlier, you even suggested reducing many definitions to just "a city in [country]" or "a city in [continent]"; if enough Chinese or Telugu or Nepali (etc) news reports have mentioned different Mykhailivkas, would you put "# A village in Ukraine." dozens of times in a row? Lots of places have this issue: Alexandrovka, Russia, Centerville, etc. The assumption that "a city in [country]" or even "a city in [province, country]" will sufficiently identify which place is meant is clearly wrong in many cases, and is not a safe assumption in general IMO, not only because multiple places may exist with the same English name, but because those places may not have the same name in other languages, given how prone languages are to having e.g. a more nativized name for one place than another place, or having a name they got from one language for one place and from another language for another place. Hence, when we know factual details of where a place like a particular Mykhailivka is, I think it's reasonable to include them. (Yes, in some cases, it will be unclear which one was meant, but that's a general problem with all words: it's also not clear which species of buttercup many uses of buttercup are about). - -sche (discuss) 20:09, 21 April 2024 (UTC)[reply]

@Thadh: I just created Andriivka to demonstrate many of the points -sche made in the meantime. You'll see from that entry that there are six Andriivky in Donetsk Oblast alone, and not only are there seven Andriivky in Poltava Oblast, five of those are in Poltava Raion, meaning hromady must be specified to distinguish those five villages as separate senses. In all likelihood, every one of those settlements (I'm excluding the hromada) would have the French translation Andriïvka, the Ukrainian translation Андрі́ївка (Andríjivka), the Russian translation Андре́евка (Andréjevka), and so on, but it would be pretty unsurprising if some of the Crimean Tatar or Mariupol Greek translations varied by sense; assuming such variation, it becomes increasingly impractical to deal with them all in a single translation table. Re the occasional postmodifiers, founding and disestablishment/abandonment/destruction dates give termini inter quos for literary references to those places (as living settlements, at least) and statements about de facto control or occupation clarify by implication that the main definition is de jure; those qualifiers are universal (translingual) in their relevance. 0DF (talk) 21:29, 21 April 2024 (UTC)[reply]

@0DF: But we don't distinguish the various meanings a term can have in other languages, we distinguish the meanings a term has in the language that the entry is in. We can deal with various terms for various Andriivky using qualifiers in the translation table, as we do with countless examples in other places, because English simply doesn't make the same distinctions as some other languages do. Compare for instance buzz#Translations_2, where Finnish makes a distinction between three different subsenses which are translated differently in Finnish, but are not distinguished in English. Splitting a translation table just for that is what's impractical.

A translation of the kind "One of various villages in the Poltava Oblast of Ukraine" would be enough for an English entry. Thadh (talk) 21:38, 21 April 2024 (UTC)[reply]

@Thadh: When people with no knowledge of chemistry talk about water, are they not all referring to a substance whose molecular structure is H₂O? 0DF (talk) 21:56, 21 April 2024 (UTC)[reply]

@0DF: No, they're not. And if there were a substance with a different molecular structure that would quench thirst and be wet, colourless and tasteless, it would also be called water. In fact, anything that doesn't fail any of these parameters to the speaker's knowledge would be water. Thadh (talk) 21:59, 21 April 2024 (UTC)[reply]

@Thadh: I think we're going to have to agree to disagree on this qua principle regulating how we should compose definitions. 0DF (talk) 22:53, 21 April 2024 (UTC)[reply]

@0DF: The issue with the {{tcl}} template being applied on the entire website means we cannot "agree to disagree"! I would love to just edit the languages I edit without getting constant interference from people who don't know these languages and don't want to, but recently I find myself constantly having to deal with these issues that have never arisen before. Why should an Ingrian entry for New York include the information that it's the "largest city in the state of New York and the largest city in the United States, a metropolis extending into neighboring New Jersey" instead of just stating it's "A city in the United States"?

And this applies to other topics discussed on the Beer Parlour in recent times as well: Why should an Ingrian etymology section link to the word "Borrowed from" Russian and show the term "Inherited from" Proto-Finnic? Why would an Ingrian entry with two or three definitions have "antonyms of sense" in the antonym section as a qualifier, rather than just the sense? I understand that you personally haven't participated in all these discussions, but this is getting more and more frustrating, having to deal with these ideas which are obviously centered at English (and perhaps a couple of other large languages), but are absolutely worthless for the majority of languages. We are currently the most usable, most complete dictionary of Ingrian in the world, I think our readers can live with having to figure out that antonym sections' sense template shows senses of the entry (rather than the antonym), they have to figure out how the dictionary works anyway! Having a bunch of text to "clarify" it just makes it all much more complex and offputting for any reader who doesn't enjoy wasting time. Thadh (talk) 23:10, 21 April 2024 (UTC)[reply]

@Thadh: These are side-issues, but why don't you use {{bor}} and {{inh}} instead of {{bor+}} and {{inh+}} if you don't like the latter's preambles? And why don't you use {{ant}} instead of Antonyms sections if you don't like the wordiness of "antonyms of sense"? 0DF (talk) 00:27, 22 April 2024 (UTC)[reply]

@0DF: I do use the simple templates, but I had to fight like hell for those to even be allowed in entries. As for the inline-antonyms: I have up to three quotes per sense, all inline, and sometimes up to four nym-sections. Inline nyms are not a good idea. And I don't understand why something that has worked well for the past ten years should suddenly change. Thadh (talk) 07:11, 22 April 2024 (UTC)[reply]

{{ant}} was created in 2017, hard to call inline nyms a "sudden change". They also are generally preferred by people I ask - they say it's easier to understand what's a nym in relation to what sense. The other way works but is messier. Vininn126 (talk) 07:13, 22 April 2024 (UTC)[reply]

@Vininn126: I am talking about antsense being the new change. Thadh (talk) 07:16, 22 April 2024 (UTC)[reply]

@Thadh: I also occasionally grumble to myself about certain changes with the thought of “did this really have to be made more difficult to understand or modify?”, but it's a very ephemeral thing when I do. On the whole, I just trust that most people are trying to improve things. Some people seem misguided in those attempts, but on the technical side of things, the main editors I'm aware of (Benwing2 and This, that and the other) are competent and thoughtful, so I just trust, when I don't understand exactly what they're doing, that their changes are sensible. I can only recommend picking your battles wisely and trying to be adaptable otherwise; resentment won't do you any good. And since I neither edit Ingrian nor enforce the use of {{tcl}}, we indeed can agree to disagree on this. 0DF (talk) 19:53, 22 April 2024 (UTC)[reply]

IMO if the consensus is that definitions of similar items in different languages should read similarly across the site (which is certainly what I believe), you should follow this e.g. for Ingrian even if you disagree with it. This particular issue doesn't seem to me like something that should be up to an individual language's editing community. Benwing2 (talk) 23:15, 21 April 2024 (UTC)[reply]

@Benwing2: You mean the consensus by people who edit English? That's absurd. Thadh (talk) 23:21, 21 April 2024 (UTC)[reply]

Also, maybe you haven't seen it in a while but our Chinese entries are the complete opposite of this "reading similarly", and I haven't seen anyone complain about that as much as I've seen these countless optimisations and regularisations being pushed onto smaller languages. Why you would ever think many of these changes would benefit anyone, editor and reader alike, is truly beyond me. Thadh (talk) 23:25, 21 April 2024 (UTC)[reply]

@Thadh Quite a lot of people have expressed that opinion in this thread, and I think you know very well that they edit a wide array of languages. I appreciate that you believe each language's community should have broad control over how their entries should look, but I also note that you only tend to invoke that when you're also claiming that you're the sole editor of a particular language, and that isn't really how Wiktionary consensus works. Theknightwho (talk) 00:46, 22 April 2024 (UTC)[reply]

@Theknightwho: That's not true, I invoke it all the time, I just happen to be the sole active editor of most languages I edit. But for Kashubian for instance, where I am one of many editors, I have the same ideas, yet follow the consensus of other Kashubian editors. Thadh (talk) 07:07, 22 April 2024 (UTC)[reply]

I sympathize that you don't like seeing more information than you are interested in, and re "a bunch of text to "clarify" [antonyms]", I've now encased the two extra words in a <span> so you can add .antonym-clarification { display: none; } to your css like this and {{antsense}} will display the same way as {{sense}}. I guess someone could modify the placename template to similarly encase raions (etc) in a CSS class that users could opt out of seeing, but I think this discussion demonstrates that just because you are not interested in the information doesn't mean that no-one is interested in it.
I mentioned migrant communities (and others mentioned tourists, and news media) as calling into question the idea that speakers of a language are not aware of or interested in the details of places far from where the language is most commonly spoken; you asserted that nonetheless "I am absolutely certain that the term denoted is the one that is used by 'speakers from the main culture associated with the language, who live wherever the language is most commonly spoken'", but ... it seems to me like your own idea that people in Ingria or Vietnam or wherever else are uninterested in the finer details of the location of a city somewhere far off (say, the US), the idea that "a city in America" or even "a city in the Americas" is all they care about knowing, suggests that the few Vietnamese (etc) speakers most likely to speak about such a city, the people creating most of the uses of the Vietnamese word for that city, may be the Vietnamese immigrants who live near it and know and "encode" exactly where it is. More generally, any time we have an Ingrian or Chinese, Vietnamese, etc word for New Orleans or New York because an Ingrian or Chinese or Vietnamese author used the word for it in a text, and then someone else comes here and specifically looks that up,* I just ... don't share the belief that neither of those people will have been interested in where the place is beyond "it's the name of various places in the Americas".
(*And it seems relevant to me that that's what's happening; we aren't stopping people in Asia on the street and randomly accosting them "did you know New York is the largest city in the state of New York?!": someone is coming here specifically to look for an English-language definition of what the Vietnamese (Ingrian, etc... or even English) word for New York (Bilohorivka, etc) is. In that context, I don't share the idea that they'll be so uninterested in the details of where it is that they'll be put off if we spend a few extra words offering that information to them.) - -sche (discuss) 04:47, 22 April 2024 (UTC)[reply]

@-sche: That's simply not good enough. What you're doing is acting as if everything I'm speaking of is my personal preference and an extreme minority opinion among the readers. Personalised CSS is only possible for logged-in users. The readers who will be put off from the enormous wall of text rendering their target language unreadable will not stick around long enough to create an account, find this discussion, and change their CSS - they'll simply stop using us.

As for the topic at hand: If we suddenly get an Ingrian speaker who lives in the US, but doesn't know New York and wants to know that it's the largest city of the state New York, then they can simply click the link to the English entry. Once there, if they want to know more still, they can click on the giant Wikipedia box on the right and read an encyclopedic article on the city. But the fact there may be someone who may have to go through those two clicks doesn't excuse a wall of text on the original entry.

Same thing for Andriivka: If you're interested in which Andriivky specifically are meant, you click on the giant floating box and see for yourself.

And I repeat, this is not me who has personal preferences and wants everyone to follow them. It is me advocating for the readers that I bring in. This can't be solved by personalised CSS or preferences, this is major issue and readers will leave because of it. And when readers leave, editors do, too. Thadh (talk) 07:39, 22 April 2024 (UTC)[reply]

@Thadh: And some languages use different words for water depending on what it's fit for, such as 'drinking water'. And the usual Thai word for water, น้ำ (náam), refers to fluids in general, such as oil, and in some compounds, to the solid residue left after driving H₂O off, e.g. น้ำตาล (nám-dtaan, “sugar”). --RichardW57 (talk) 23:09, 21 April 2024 (UTC)[reply]

Copying rhyme syllable counts from existing categories

The {{rhymes}} template takes |s=, which specifies the number of syllables in the relevant pronunciation of the term (not in the rhyme itself). This allows the template to categorize the term in e.g. Category:Rhymes:English/iː/1 syllable. Many English entries lack these syllable counts but are in categories like Category:English 1-syllable words. I've written a Python script to find English terms that are in exactly one of these syllable categories and that already have a rhyme with no syllable count specified, and add the syllable count specified by the category. I'm seeking consensus to run it (under my bot account) over all "English N-syllable words" categories. As usual I will start off slow to allow me to catch bugs early. — excarnateSojourner (ta·co) 01:46, 7 April 2024 (UTC)[reply]
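The script itself is not shown in this thread. As a rough illustration only, the core step described above might look like the sketch below. The function names, the regular expressions, and the check for per-rhyme parameters such as s1= are assumptions made for this example, and the sketch ignores language sections and nested templates.

```python
import re

# Matches category names of the form "English N-syllable words".
CATEGORY_RE = re.compile(r"^English (\d+)-syllable words$")
# Matches an English {{rhymes}} call that contains no nested templates.
RHYMES_RE = re.compile(r"\{\{rhymes\|en\|([^{}]*)\}\}")


def syllable_count(categories):
    """Return N if the page is in exactly one syllable category, else None."""
    counts = {int(m.group(1)) for c in categories if (m := CATEGORY_RE.match(c))}
    return counts.pop() if len(counts) == 1 else None


def add_syllable_count(wikitext, categories):
    """Add |s=N to English {{rhymes}} calls that have no syllable count yet."""
    n = syllable_count(categories)
    if n is None:
        return wikitext

    def fix(match):
        params = match.group(1).split("|")
        # Assumption: s= and per-rhyme s1=, s2=, ... both count as "already set".
        if any(re.match(r"s\d*=", p) for p in params):
            return match.group(0)
        return "{{rhymes|en|" + match.group(1) + "|s=" + str(n) + "}}"

    return RHYMES_RE.sub(fix, wikitext)
```

Pages in zero or in several syllable categories are returned unchanged, which matches the "exactly one of these syllable categories" condition in the proposal.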

Run it. CitationsFreak (talk) 07:08, 7 April 2024 (UTC)

@CitationsFreak This seems fairly innocuous to me. I think User:Surjection has already done similar runs for certain languages. Ideally this wouldn't be necessary and we'd have a pronunciation module that would automatically generate the pronunciation from a respelling along with the rhymes, but we're a long way off from that for English. The only issues I can really think of are cases where there are multiple possible pronunciations with different numbers of syllables, where the different pronunciations are tied to a specific dialect (e.g. secretary, normally 4 syllables in the US but 3 syllables in the UK), and for which the rhymes are different per dialect and thus the syllable counts need to be synchronized to the rhymes. Whether this actually happens I don't know, but you might want to generate a list of all the pages that have both multiple syllable counts and multiple rhymes, and manually review them to see if there are any of this nature (or just tell your bot not to touch them, and do them by hand). Benwing2 (talk) 07:31, 7 April 2024 (UTC)[reply]
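The review list suggested above could be produced with a filter along these lines. This is a sketch under assumptions, not the bot's code: the function name and the way rhymes are extracted are hypothetical, and the rhyme strings in the usage note are placeholders rather than the real rhymes of any entry.

```python
import re

# Matches category names of the form "English N-syllable words".
CATEGORY_RE = re.compile(r"^English (\d+)-syllable words$")
# Matches an English {{rhymes}} call that contains no nested templates.
RHYMES_RE = re.compile(r"\{\{rhymes\|en\|([^{}]*)\}\}")


def needs_manual_review(wikitext, categories):
    """True if the page has several syllable categories and several distinct rhymes."""
    syllable_cats = [c for c in categories if CATEGORY_RE.match(c)]
    rhymes = set()
    for match in RHYMES_RE.finditer(wikitext):
        # Positional parameters are rhymes; named parameters contain "=".
        rhymes.update(p for p in match.group(1).split("|") if p and "=" not in p)
    return len(syllable_cats) > 1 and len(rhymes) > 1
```

For example, a page with the categories "English 3-syllable words" and "English 4-syllable words" and two different rhymes would be flagged, so the bot could skip it and leave it for a human.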

@ExcarnateSojourner Sorry, too late at night, I pinged the wrong person. Benwing2 (talk) 07:31, 7 April 2024 (UTC)

It's fine, Benwing. (Also, thanks for asking permission, ExcarnateSojourner!) CitationsFreak (talk) 08:10, 7 April 2024 (UTC)

Labels 'UK' vs 'Britain'

Pinging @-sche because you often have thoughts about things like this. Currently we have two labels in Module:labels/data/regional, UK (alias United Kingdom) and Britain (aliases British, Great Britain, Brit), which display differently but categorize identically into British Foo where Foo is currently one of the five languages Bengali, English, Urdu, Vietnamese and Chinese. Yes, I know the UK and Great Britain aren't the same, but is there really enough of a linguistic distinction to merit two labels? I am 90% sure these labels are used promiscuously, with editors more or less randomly choosing one or the other. Given this, should we merge them? Benwing2 (talk) 07:00, 7 April 2024 (UTC)[reply]

Aren’t they merged in their categorization already? Because that’s only what I could propose. Otherwise I let editors write what they want to write. To avoid frustration that is.

I have added Irish English often enough, and with some likelihood “UK” means a term relevant due to the Kingdom’s politics or federal (yikes) legal system, whereas “Britain” means that I think a term is said in the ends in Scotland too since I know it from Northern England whereas Northern Ireland I would reserve to check. Your mileage differs no doubt.

Doesn’t really matter that editors are inconsistent in theology, for, as outlined, the editing process is affective rather than based on thorough linguistic field study, lucky googling + “I feel like it” (impeccable for experienced editors, who know what the dictionary profits from, don’t get it twisted). Fay Freak (talk) 08:40, 7 April 2024 (UTC)[reply]

To me "UK" seems less ambiguous, and so preferable (except for the fact that it might sound less pleasant to the ear?). I don't live there, so maybe I'm missing something.--Urszag (talk) 08:45, 7 April 2024 (UTC)

Technically, "Britain" excludes Northern Ireland while "UK" does not, but (a) I don't think that's a very meaningful distinction, as Scotland and Northern Ireland form a far more coherent linguistic unit than Great Britain to the exclusion of Northern Ireland, and (b) "British", the adjective, doesn't exclude Northern Ireland, so if we're categorising terms as "British X" then it makes more sense to use the "UK" label. Theknightwho (talk) 13:28, 7 April 2024 (UTC)

IIRC, a main reason these exist separately is that sometimes people wanted a noun (that didn't require typing "the" in front of it every time) and sometimes people wanted a word that fit in an adjectival/attributive slot. This is partly so that in {{label}}s people can write "dated in Britain" vs "UK dialects" (instead of wrong-sounding "Britain dialects") — I dimly recall "British" may have displayed as such at some point too, instead of being aliased to "Britain" — and partly because {{label}} isn't the only place these are used, there's also e.g. UK form of foobar vs wrong-sounding British form of foobar. (This issue of needing {{standard spelling of}} et al. to display something different is also why we hackily have "British spelling" and "British form" as different labels, btw; see this June 2020 TR discussion and the other discussions linked there.)
Offhand, it seems like uses of "Britain" could be folded into "UK" or, in a very few cases in labels, "the|_|UK", but I'd want us to first make sure these aren't also used in other places we haven't thought of where that wouldn't work. I am inclined to agree with the goal of merging them, because they technically refer to different terms as TKW says, but I doubt even 10% of uses intend to be conveying different things, so having the difference is basically creating inaccuracy (not to mention that they're not distinguished in categorization).
I am reluctant to even mention this, because I fear some people will use it as a reason to keep separate labels, but: another difference they could theoretically convey if there was any chance (which there clearly isn't, looking at how they're used at the moment) of people ever maintaining this difference, is that "Britain" existed at times when the "UK" did not, so theoretically some entry for a word that went obsolete centuries ago might technically be more accurately labelled "Britain" than "UK", but such a case could (and probably better should) be tagged with the relevant more specific labels like "England, Scotland, Wales" instead. - -sche (discuss) 15:38, 7 April 2024 (UTC)

@-sche So ... I actually introduced the capability in labels of having a language-specific postprocessing function, which is currently used in Chinese so that e.g. {{lb|zh|Jilu Mandarin|Jiaoliao Mandarin|and|Jianghuai Mandarin}} displays as (Jilu, Jiaoliao and Jianghuai Mandarin) and {{lb|zh|Zhangzhou}} displays as (Zhangzhou Hokkien) but {{lb|zh|Zhangzhou|_|Hokkien}} also displays as (Zhangzhou Hokkien) rather than as (Zhangzhou Hokkien Hokkien). Such an approach, or maybe some modification of it maybe with label-specific settings, could potentially be used to correct the display issues you've mentioned above without actually needing separate labels. I'd just need a full description of what should be displayed in which circumstances. Benwing2 (talk) 18:22, 7 April 2024 (UTC)[reply]
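To make the two display behaviours described above concrete, here is a small sketch of what such a postprocessing step could do. It is not the code in Module:labels; the function names are hypothetical, and the inputs are assumed to be the display forms of the labels in order, with "and" passed through as its own item.

```python
def collapse_shared_suffix(labels, suffix):
    """Drop a trailing ' suffix' from every label except the last one that has it."""
    tail = " " + suffix
    out = list(labels)
    last = max((i for i, label in enumerate(out) if label.endswith(tail)), default=-1)
    for i, label in enumerate(out):
        if i < last and label.endswith(tail):
            out[i] = label[: -len(tail)]
    return out


def join_labels(labels):
    """Join labels with ', ', but use plain spaces on either side of 'and'."""
    parts = []
    for i, label in enumerate(labels):
        if i > 0:
            parts.append(" " if "and" in (label, labels[i - 1]) else ", ")
        parts.append(label)
    return "".join(parts)


def drop_repeated_words(text):
    """Remove a word that immediately repeats the previous word."""
    words = text.split(" ")
    return " ".join(w for i, w in enumerate(words) if i == 0 or w != words[i - 1])
```

With these helpers, `join_labels(collapse_shared_suffix(["Jilu Mandarin", "Jiaoliao Mandarin", "and", "Jianghuai Mandarin"], "Mandarin"))` gives "Jilu, Jiaoliao and Jianghuai Mandarin", and `drop_repeated_words("Zhangzhou Hokkien Hokkien")` gives "Zhangzhou Hokkien".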

For "Britain" vs "UK", I (at least) wouldn't bother trying to have the template/module guess which one to display, because the odds of us being able to set it up to always display "Britain" in only those miscellaneous situations where that is desirable (without it accidentally displaying in situations where it is undesirable), and vice versa / mutatis mutandis for "UK", seem slim to me, and the benefits even if we do it right seem small; it seems easier to convert everything to "UK" and maybe manually add "_|the|_|" in the hopefully few places where "the UK" would be more euphonious than "UK".
For "British form" vs "British spelling", though... if the "spelling of" and "form of" templates like "alternative form of", "standard spelling of" etc could know to delete the word "spelling" from the display form of "British spelling" — so that {{altspell|en|foobar|from=British spelling}} displayed "British spelling of foobar" instead of "British spelling spelling of foobar" — then I think the labels "British form", "Canadian form", etc could be reduced to aliases of the "British spelling" labels, although we should check first whether those labels ("British form", "Canadian form" etc) have come to be used anywhere else in the years since I set them up... I do spy one use in from A to Zed which needs to be changed (probably to just say {{lb|en|UK}}) before we alias "British form" to "British spelling". - -sche (discuss) 19:15, 7 April 2024 (UTC)[reply]

@-sche It occurs to me there's a very simple solution to this issue, which is to provide a way of indicating that the label should display as written, e.g. {{lb|dated|in|!Britain}} which means that Britain should display as written rather than canonicalized to "UK" or whatever. It could easily be argued that this should actually be the default, but I don't know the ramifications of that. Benwing2 (talk) 04:08, 8 April 2024 (UTC)[reply]

Hmm. That might be useful in some situations which are currently handled by having different labels (with the same categorization or whatever), but the downside, especially if we allow that for all labels and their aliases, is that then (as people start to use it even in cases where it's the only label, not part of a "dated in..." or the like), lots of entries will start displaying different things, suggesting there is a difference. If some entries say "Canada" and others say "Canadian", I suspect anyone who actually notices the difference may wonder what it's trying to convey (is Canada the topic label, and Canadian the dialect label?), and if the answer is "we're not trying to convey a difference", then why do we have a difference? I should clarify that my initial comment in this discussion was not in support of these being separate, just answering what the reason people kept them separate was; I actually feel the same way about "Britain" vs "UK" as about "Canada" vs "Canadian": that anyone who actually notices the difference is liable to wonder if we're actually trying to convey that the words are used in different sub-areas of the British Isles. I don't think adding "|_|the|_" to labels that need to display "dated in..." [Britain → the UK] is onerous. But especially given recent cases of pushback to template changes, maybe we should ping some more British editors to make sure they're onboard. - -sche (discuss) 22:32, 11 April 2024 (UTC)[reply]

@-sche I have thought instead of making this opt-in only for certain labels, e.g. the labels data can mark that Britain should stay as such even when it's an alias of UK. There are cases where non-equivalent labels were being aliased (e.g. South Midlands as an alias of Midlands), which is problematic when the display gets changed. I have solved this so far by separating the labels entirely but this is a bit annoying to implement. Benwing2 (talk) 00:13, 12 April 2024 (UTC)

@-sche I implemented ! preceding a label to indicate that the label should be displayed as-is instead of converted to its canonical form. This is useful e.g. for yallah, which is labeled as Arab|_|!Australian so it displays as Arab Australian; otherwise it would show up as Arab Australia, which sounds wrong. Benwing2 (talk) 23:13, 16 April 2024 (UTC)[reply]
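The behaviour of the ! prefix can be illustrated with a short sketch. This is not the module's implementation: the alias table holds a single illustrative entry taken from the yallah example, the function name is hypothetical, and the idea that categorization would still follow the canonical label is an assumption.

```python
# Illustrative alias data only: "Australian" is treated as an alias of "Australia".
ALIASES = {"Australian": "Australia"}


def resolve_label(raw):
    """Return (display, canonical) for one label as typed in the template."""
    as_is = raw.startswith("!")
    name = raw[1:] if as_is else raw
    canonical = ALIASES.get(name, name)
    # With "!", show what the editor typed; otherwise show the canonical form.
    display = name if as_is else canonical
    return display, canonical
```

Here `resolve_label("!Australian")` displays "Australian" while still resolving to the canonical label "Australia", whereas `resolve_label("Australian")` displays "Australia".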

@-sche Specifically referring to your point about Britain existing before the UK, I think that any terms which fall within that period (1707-1800) are better labelled as "18th century". If the country is absolutely necessary for whatever reason, it would be best to use the term "Great Britain", which is the period-equivalent to "UK" (which is what it became at the start of 1801). Theknightwho (talk) 23:32, 7 April 2024 (UTC)[reply]

Update the text on main page

The license on the main page says CC-BY-SA 3.0, it was actually updated to 4.0. 2001:4455:25F:9A00:FD29:81AD:A478:9FA2 06:22, 8 April 2024 (UTC)

Done — SURJECTION / T / C / L / 07:07, 8 April 2024 (UTC)

Old Lombard

Ok, so first of all, I have listed "Old Lombard" as a dialect of Lombard. However, Old Lombard was spoken in the 13th-14th centuries, a fact I got from Lombard Wikipedia, meaning it was spoken in the same time as Old Spanish and Old French, etc. So I guess we could just add a code like roa-lmoa, (the final a for antich). That Northern Irish Historian (talk) 14:40, 10 April 2024 (UTC)[reply]

Definition of a neologism for loanwords

After the Spanish occupation, Tagalists in the early 20th century (Tagalog enthusiasts and promoters, who eventually became or influenced the Filipino language committee members) were promoting the use of Tagalog for academic abstract terms such as for science and arts because the terms used back then depended heavily on Spanish. They started coining new terms and intentionally borrowed (not by natural contact) other Philippine languages' terms (and also Malay) such as Cebuano batas for Spanish ley (“law”), Cebuano katarongan for Spanish justicia (“justice”), Malay bangsa for Spanish nación, and Malay guru for Spanish maestro (“teacher”). All of these made it into common use and survive today as Tagalog batas, katarungan, bansa, and guro. Since they were initially borrowed but made it to "mainstream" use, some people would never think of these as neologisms anymore.

Now, Tagalog has native words araw (“sun; day”) and buwan (“moon; month”) and the words can mean both the celestial object meaning, and the period of time meaning they were assigned to and can be perfectly understandable with context. In addition, Spanish had separate words for "sun" and "day" which are Spanish sol (“sun”) and Spanish día (“day”) respectively. Likewise, Spanish also has separate words for "moon" and "month" which are Spanish luna (“moon”) and Spanish mes (“month”).

With the sun/day, moon/month distinction of Spanish and with the goals of being "inclusive" to create a true "Philippine" language, there was a proposal back then as well to borrow Cebuano adlaw (“sun; day”) and Cebuano bulan (“moon; month”) but the Cebuano terms would only refer to the celestial objects sun and moon, and araw and buwan would be used for the time periods, day and month. The separation of words to refer to the time period and the celestial object did not make it into mainstream use, unlike the terms in the first paragraph, and Tagalog still uses the native terms today.

Currently, in Wiktionary, Tagalog adlaw (“sun”) and Tagalog bulan (“moon”) are listed as neologisms, since they were never used in practice or added to common dictionaries, but they are still listed in some books introducing neologisms, such as Maugnaying talasalitaang pang-agham Ingles-Pilipino (literally, “Relational Scientific Vocabulary English-Filipino”), and are still discussed in some papers, so they can actually satisfy the Criteria for Inclusion.

A user thought that the neologism label for these words should be removed due to the definitions provided at Wiktionary:Neologisms which are the following:

A more precise sense of neologism which has gained some support on Wiktionary is a word that

a) is new and perceived as new, although there is no precise age cutoff for newness;

b) has not yet been recognized as part of the standard language (often being written with scare quotes);

c) is not slang, colloquial, very informal, or technical, and;

d) is not merely derived from an already-existing term with no unexpected change in meaning, such as in the case of clippings, loanwords, and abbreviations.

A neologism that becomes part of the standard language should have the "neologism" label removed. A neologism that fails to become part of the standard language after an extended period of time but is not a protologism should be labeled nonstandard.

The user has said that they can just be interpreted as regular loanwords and not as neologisms.

However, I think adlaw and bulan by this definition are

1. new in the sense that an existing term araw/buwan already existed but the sun/day moon/month concept was being introduced, coinage in intention so could be arguably a protologism

2. did not become part of the standard Tagalog language, and you may only understand bulan and adlaw more likely if you come from the regional speakers such as Cebuano and the native terms are still used without separation

3. not slang, nor colloquially derived as it was introduced by intellectuals

However, they fail the fourth criterion, (d), because they are loanwords, borrowed with the same definition as in the source language, but I would argue that the words were not borrowed naturally (e.g. through an influx of Cebuano speakers, through interaction, or through Cebuano speakers gaining the political power to have their language used by Tagalogs).

So what do you think, is this a neologism still? Should the rules at Wiktionary:Neologisms be modified if it is still a neologism? Thanks. Ysrael214 (talk) 18:49, 10 April 2024 (UTC)[reply]

I still think we should count these as neologisms - they are loanwords, but they're not normal loanwords. If they're not neologisms, they are surely something else, since these have not come about naturally, but through a "top-down" approach in language development. — SURJECTION / T / C / L / 19:03, 10 April 2024 (UTC)

That's pretty much the definition of a learned borrowing. Trooper57 (talk) 19:05, 10 April 2024 (UTC)

Yet some learned borrowings are actually commonly used in their target languages; this word isn't. If it weren't borrowed from another language, everyone would agree to call it a puristic neologism. — SURJECTION / T / C / L / 19:24, 10 April 2024 (UTC)

Add a lang code to Template:a (Template:accent)?

In my quest to reduce the number of modules enumerating language-specific varieties I discovered yet another one, which is Module:accent qualifier/data. This is a real mess as labels from multiple languages are all jumbled together. Since there is no language code currently associated with {{a}}, clashes are a real problem and are solved in all sorts of ad-hoc ways, e.g. inexplicably, Lahore displays as Lahori Urdu but Lahori displays as Lahori Punjabi. I think the only way to make this clean is to add a language code to {{a}}. This would make it possible to separate the per-language uses and ultimately eliminate Module:accent qualifier/data entirely in favor of the label data. It would also make it possible to have some labels categorize, if we wanted that. Thoughts/support/opposition/etc.? Benwing2 (talk) 03:01, 11 April 2024 (UTC)[reply]
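As a sketch of why a language code removes the clash, the lookup could be keyed on the pair of language code and label rather than on the label alone. The table and function name below are hypothetical and are not the contents of Module:accent qualifier/data; ur and pa are used as the codes for Urdu and Punjabi.

```python
# Hypothetical per-language data: the same typed label can display differently
# depending on the language code passed to the template.
ACCENT_LABELS = {
    ("ur", "Lahore"): "Lahori Urdu",
    ("ur", "Lahori"): "Lahori Urdu",
    ("pa", "Lahore"): "Lahori Punjabi",
    ("pa", "Lahori"): "Lahori Punjabi",
}


def display_accent(lang, label):
    """Look the label up for this language; fall back to the text as typed."""
    return ACCENT_LABELS.get((lang, label), label)
```

With the language code available, "Lahore" and "Lahori" no longer need to be assigned arbitrarily to one language each.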

Support. Theknightwho (talk) 03:03, 11 April 2024 (UTC)

Yes please. This template and {{lookfrom}} (see RFDO) are just about the last frontier when it comes to templates that should have language codes but don't. This, that and the other (talk) 02:06, 12 April 2024 (UTC)[reply]

Support. Binarystep (talk) 11:59, 20 April 2024 (UTC)

@-sche I wrote a script to analyze existing uses of accent qualifiers overall and by language. See the results in User:Benwing2/analyze-accent-qualifier-20240420-dump. The good news is most accent qualifiers are used only by one or occasionally two languages, so disentangling them shouldn't be too hard. In the {{a}}-vs-{{lq}} topic I've been thinking it would be better to add a lang code to {{a}} and repurpose it as a general "non-categorizing labeler". I'm thinking it could stand for something like ancillary label or auxiliary label: "label" in that it works with the same labels as {{lb}} does, and "ancillary" in that it adds extra info to an existing something (pronunciation, synonym, derived term, etc.) that isn't the term itself (hence it doesn't categorize the term). Benwing2 (talk) 04:44, 23 April 2024 (UTC)[reply]

BTW only 265 of 4,328 distinct labels occur with more than one language; often with only two languages where the second language uses the label only once. Of these 265, 147 of the labels begin with a lowercase letter, meaning they are typically things like informal, nonstandard or misspelling rather than lects. Benwing2 (talk) 06:34, 23 April 2024 (UTC)[reply]
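For scale, 265 of 4,328 is roughly 6% of the distinct labels, and 147 of 265 is a little over half of the shared ones. The kind of tally behind such figures could be computed as in the sketch below, which is not the analysis script linked above; the function name and the input format, a list of (language code, label) pairs, are assumptions.

```python
from collections import defaultdict


def summarize_label_uses(uses):
    """Count distinct labels, labels shared across languages, and lowercase shared labels."""
    langs_per_label = defaultdict(set)
    for lang, label in uses:
        langs_per_label[label].add(lang)
    shared = [label for label, langs in langs_per_label.items() if len(langs) > 1]
    lowercase_shared = [label for label in shared if label[:1].islower()]
    return {
        "distinct": len(langs_per_label),
        "shared": len(shared),
        "lowercase_shared": len(lowercase_shared),
    }
```

Labels that begin with a lowercase letter, such as informal or nonstandard, are counted separately because they are usually not lects and so do not need disentangling by language.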

I mean, for my part I'm down with this, but I suspect it'll be enough of a change in people's habits that it'd be good to give people ample time to notice this and complain (😅). (Even splitting by language won't solve some of the possible sources of confusion, like "NY" being "New York" but "CA" being Canada not California, and "GA" being General American not Georgia.) - -sche (discuss) 16:27, 23 April 2024 (UTC)[reply]

@-sche How much time do you think is enough? It's been 12 days so far; I'm thinking a month should be enough. Not really sure how else to ensure everyone gets their say. BTW longer-term I think we should eliminate confusing labels like CA and GA in favor of slightly longer but unambiguous ones; but that can come after adding a language code and unifying accent qualifiers with labels. Benwing2 (talk) 23:49, 23 April 2024 (UTC)[reply]

Support. نعم البدل (talk) 16:31, 23 April 2024 (UTC)

rename all states, provinces, etc. to include the associated country in them

User:نعم البدل pointed me to an issue we currently have, which is that Category:Punjab refers specifically to the Indian state of Punjab when there are actually two Punjabs, one a state in India and the other a province in Pakistan. Currently we have a messy situation where some first-level administrative divisions contain the country in them (e.g. Category:Arizona, USA; Category:Ostrobothnia, Finland; Category:Herefordshire, England) but others don't (Category:Punjab and other states in India; Category:Trentino-Alto Adige and other regions in Italy; Category:Lampung and other provinces in Indonesia; etc.). Sometimes instead the name of the administrative division is in the category, e.g. Category:Gunma Prefecture and other prefectures of Japan, except for Category:Hokkaido and Category:Tokyo; Category:Chittagong Division and other divisions of Bangladesh; etc. I would like to propose we rename all such first-level administrative divisions to include the name of the country in them, although it might make sense to keep the existing UK situation where the category contains the constituent country (Category:Herefordshire, England instead of Category:Herefordshire, UK or Category:Herefordshire, United Kingdom). Benwing2 (talk) 03:29, 11 April 2024 (UTC)[reply]

@Benwing2: But England is a country (within the UK), just as Denmark is a country (within the Danish Realm).

Oppose I think this is stoking up problems for disputed territories like Crimea or Kherson. --RichardW57m (talk) 11:33, 11 April 2024 (UTC)[reply]

@RichardW57m The issue with Crimea is definitely an edge case, and there isn't even a CAT:Crimea or CAT:Kherson. We can leave disputed cases like this without any country in them; this is not a problem. The vast majority of states and provinces are not disputed, however, and as the issue with Punjab shows, there's a real problem with ambiguity when the country is not mentioned. Do you still oppose if we leave any problematic cases without an attached country? Benwing2 (talk) 22:20, 11 April 2024 (UTC)[reply]

Can't we do it the other way around? Make a list of duplicates and make it obligatory there? I'm personally fine with both approaches, but I do think in some cases explicitly stating the country may be overkill and/or potentially problematic. Thadh (talk) 22:26, 11 April 2024 (UTC)[reply]
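The "list of duplicates" alternative could be sketched as follows. The function names and the input format, a list of (division name, country) pairs, are assumptions for illustration, and the sample data in the usage note is limited to places named in this discussion.

```python
from collections import defaultdict


def ambiguous_divisions(divisions):
    """Map each division name found in more than one country to its countries."""
    countries = defaultdict(set)
    for name, country in divisions:
        countries[name].add(country)
    return {name: sorted(cs) for name, cs in countries.items() if len(cs) > 1}


def category_name(name, country, ambiguous):
    """Append the country only where the bare name would be ambiguous."""
    return f"{name}, {country}" if name in ambiguous else name
```

Given the pairs ("Punjab", "India"), ("Punjab", "Pakistan") and ("Lampung", "Indonesia"), only Punjab is reported as ambiguous, so it would become "Punjab, India" and "Punjab, Pakistan" while Lampung keeps its bare name.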

@Thadh That is possible; personally I like stating the country because e.g. I've never heard of Gunma Prefecture or Lampung, and I think it helps users if we give more context. Can you give examples where it's problematic to state the country (other than the already-identified cases like Crimea, and a few others that come to mind, such as Abkhazia, South Ossetia and Gaza)? Note also that these problematic cases have to be handled with special-purpose code in any case because their category text identifies the country they're part of. Benwing2 (talk) 22:37, 11 April 2024 (UTC)[reply]

I was thinking of regions with strong nationalistic movements, but no majority support for independence yet, or alternatively no international support for it. Things like calling Catalonia a part of Spain might strike a nerve with some people.

Think also of regions in Myanmar that are currently not controlled by the government. These are not disputed between multiple countries, they're disputed between one recognised country and a rebel group, which makes it pretty complex, and also very susceptible to rapid changes. Thadh (talk) 00:12, 12 April 2024 (UTC)[reply]

My general impression is that Wikimedia Commons and English Wikipedia encounter similar naming issues when there are two places with the same name, and those websites deal with these naming issues in more or less haphazard fashion, as Wiktionary does. A correct solution would include an extensive review of the policies on those websites so Wiktionary could make an intelligently conceived standard policy. What that solution would be, I cannot say. --Geographyinitiative (talk) 22:45, 11 April 2024 (UTC)

Support. Binarystep (talk) 11:59, 20 April 2024 (UTC)

Russian book quotations

Which of the following quotation formatting variants do you like better:

1877, Лев Толстой, Анна Каренина, Москва: Наука (1970), pages 57-61; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
Когда они вышли, карета Вронских уже отъехала. Выходившие люди все еще переговаривались о том, что случилось.
Kogda oni vyšli, kareta Vronskix uže otʺjexala. Vyxodivšije ljudi vse ješče peregovarivalisʹ o tom, što slučilosʹ.
When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
1877, Лев Толстой, Анна Каренина, Москва: Наука (1970), pages 57-61; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
Когда́ они́ вы́шли, каре́та Вро́нских уже́ отъе́хала. Выходи́вшие лю́ди всё ещё перегова́ривались о том, что случи́лось.
Kogdá oní výšli, karéta Vrónskix užé otʺjéxala. Vyxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
1877, Лев Толстой, Анна Каренина, Москва: Наука (1970), pages 57-61; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
Когда́ они́ вы́шли, каре́та Вро́нских уже́ отъе́хала. Выходи́вшие лю́ди всё ещё перегова́ривались о том, что случи́лось.
Kogdá oní výšli, karéta Vrónskix užé otʺjéxala. Vyxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
1877, Лев Толстой, Анна Каренина, Москва: Наука (1970), pages 57-61; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
Когда они вышли, карета Вронских уже отъехала. Выходившие люди все еще переговаривались о том, что случилось.
Kogdá oní výšli, karéta Vrónskix užé otʺjéxala. Vyxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
1877, Лев Толстой, Анна Каренина, Москва: Наука (1970), pages 57-61; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
Когда они вышли, карета Вронских уже отъехала. Выходившие люди все еще переговаривались о том, что случилось.
[Когда́ они́ вы́шли, каре́та Вро́нских уже́ отъе́хала. Выходи́вшие лю́ди всё ещё перегова́ривались о том, что случи́лось.]
Kogdá oní výšli, karéta Vrónskix užé otʺjéxala. Vyxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
1877, Лев Толстой, Анна Каренина, volume 1, Москва: Типо-литографія Т-ва И. Н. Кушнеровъ и К° (1903), page 87; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
Когда они вышли, карета Вронскихъ уже отъѣхала. Входившіе люди все еще переговаривались о томъ, что случилось.
Kogdá oní výšli, karéta Vrónskix užé otʺjéxala. Vxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
1877, Лев Толстой, Анна Каренина, volume 1, Москва: Типо-литографія Т-ва И. Н. Кушнеровъ и К° (1903), page 87; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
Когда они вышли, карета Вронскихъ уже отъѣхала. Входившіе люди все еще переговаривались о томъ, что случилось.
[Когда́ они́ вы́шли, каре́та Вронских уже́ отъе́хала. Входи́вшие лю́ди всё ещё перегова́ривались о том, что случи́лось.]
Kogdá oní výšli, karéta Vronskix užé otʺjéxala. Vxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.

The differences are basically:

1. |text= the original text from a modern paper book edition |t= English translation
2. |text= the text from a modern paper book edition, modified to add stress accents and "ё" letters |t= English translation
3. |text= the text from a modern paper book edition, modified to add stress accents, "ё" letters and wikilinks for all words |t= English translation
4. |text= the original text from a modern paper book edition |tr= romanized transcription with the added stress accents and "jo" where appropriate |t= English translation
5. |text= the original text from a modern paper book edition |norm= normalization of the Cyrillic text with the added accents and "ё" letters |t= English translation
6. |text= the original text from a pre-reform paper book edition |tr= romanized transcription with the added accents and "jo" where appropriate |t= English translation
7. |text= the original text from a pre-reform paper book edition |norm= normalization of the Cyrillic text to modern orthography with the added accents and "ё" letters |t= English translation

Right now variant 3 is used in Wiktionary, with some assistance from what is presumably @Benwing2's bot correcting the quotations (e.g. this diff or this diff). But the conversion to modern orthography and the addition of stress accents and ё letters could instead be done automatically by a Lua module, which would make it possible to implement variants 4, 5, 6 or 7. An extra benefit is instant feedback for the person editing a Wiktionary article. See the technical discussion here and a working demo of such automatic conversion at Module:User:Ssvb/ru-autoaccent/testcases. The automatic conversion can also be amended via |subst= overrides for the parts of the text that the automatic converter can't handle on its own because of ambiguity, such as уже́ (užé) vs. у́же (úže) or все (vse) vs. всё (vsjo). It's also possible to override the whole sentence via the |norm= parameter.

PS. I also noticed that the 1903 edition of "Анна Каренина" used the word "входившіе" ("coming inside") and the 1970 edition changed it to "выходившие" ("coming outside"). This just shows that we can't always fully trust the modern editions of books, so preserving the original pre-reform orthography from the old book editions may be useful in quotations.

--Ssvb (talk) 06:44, 11 April 2024 (UTC)
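To make the proposal above easier to follow, here is a minimal Python sketch of dictionary-assisted accenting. It is not the code of Module:User:Ssvb/ru-autoaccent (which is written in Lua); the word list, the function names and the dictionary-shaped `subst` argument are all assumptions made for illustration. Words with exactly one known accented form get accented, ambiguous words such as "уже" and "все" are left untouched, and a per-word override can resolve them.

```python
import re

ACUTE = "\u0301"                          # combining acute accent (stress mark)
YO, YO_UPPER, YE = "\u0451", "\u0401", "\u0435"   # Cyrillic yo, capital Yo, ye

# Hypothetical miniature word list. An apostrophe after the stressed vowel is
# only an ASCII stand-in here and is converted to the combining acute below.
RAW = "когда' они' вы'шли каре'та отъе'хала уже' у'же все всё ещё"
FORMS = [form.replace("'", ACUTE) for form in RAW.split()]

def plain(form):
    """Strip stress marks and turn yo into ye to obtain the lookup key."""
    return form.replace(ACUTE, "").replace(YO, YE)

# Lookup table: unaccented spelling -> set of known accented spellings.
INDEX = {}
for form in FORMS:
    INDEX.setdefault(plain(form), set()).add(form)

def transfer(word, accented):
    """Copy stress marks and yo from `accented` onto `word`, keeping its case."""
    out, i = [], 0
    for ch in accented:
        if ch == ACUTE:
            out.append(ACUTE)
            continue
        src = word[i]
        i += 1
        if ch == YO:
            out.append(YO_UPPER if src.isupper() else YO)
        else:
            out.append(src)
    return "".join(out)

# Runs of Cyrillic letters (basic Russian alphabet plus yo), case-insensitive.
WORD_RE = re.compile("[\u0430-\u044f\u0451]+", re.IGNORECASE)

def auto_accent(text, subst=None):
    """Accent unambiguous words; `subst` maps a plain word to a forced form."""
    subst = subst or {}

    def repl(match):
        word = match.group(0)
        key = plain(word.lower())
        if key in subst:                  # editor-supplied override wins
            return transfer(word, subst[key])
        forms = INDEX.get(key, set())
        if len(forms) == 1:               # exactly one candidate: safe to accent
            return transfer(word, next(iter(forms)))
        return word                       # unknown or ambiguous: leave as is

    return WORD_RE.sub(repl, text)
```

Calling `auto_accent` on an unaccented sentence leaves "уже" and "все" bare unless an override is passed, which mirrors the role that |subst= plays in the proposal.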

@Ssvb Hi, I've meant to respond earlier. In terms of the above variants, I would definitely be opposed to variants 4 and 6 where the stress is included only in the transliteration. If we take the approach of including the original unaccented, un-ё'd, unlinked text in the |text= param, we should use the |norm= param to include accents, ё's and links. Also I've been thinking there are ways in |subst= of avoiding having to repeat the unaltered text in most circumstances; e.g. if there's only one уже in the text, writing уже́ by itself instead of уже/уже́ should be enough. I've already taken this approach elsewhere in the Czech, Portuguese and Catalan pronunciation modules. I should also add, there are several edge cases your auto-accenting code really needs to handle properly; I had hoped by linking to my offline script you would glean those edge cases from the script, but you mostly dismissed them as bells and whistles (some are, some aren't). For example, in cases like до́ смерти (dó smerti), there's a multisyllabic word that must remain unstressed because of the preceding stressed preposition, whereas a naive approach would stress it. Benwing2 (talk) 07:01, 11 April 2024 (UTC)[reply]
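The shorthand idea for |subst= described above could look roughly like the following Python sketch. The comma-separated `from/to` syntax, the function names and the error behaviour are assumptions for illustration, not the actual parameter handling of any Wiktionary template. A bare accented word such as уже́ is expanded to the pair уже/уже́, and the shorthand is only accepted when the plain word occurs exactly once in the text.

```python
import re

ACUTE = "\u0301"                 # combining acute accent (stress mark)
YO, YE = "\u0451", "\u0435"      # Cyrillic yo and ye

def plain(form):
    """Strip stress marks and turn yo into ye."""
    return form.replace(ACUTE, "").replace(YO, YE)

def parse_subst(spec):
    """Parse 'from/to' items; a bare 'to' item derives 'from' automatically."""
    pairs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "/" in item:
            source, target = item.split("/", 1)
            pairs.append((source, target, False))
        else:
            # Shorthand: the source is the target without accents and yo.
            pairs.append((plain(item), item, True))
    return pairs

def apply_subst(text, spec):
    """Apply the substitutions to whole words of an unaccented text."""
    for source, target, shorthand in parse_subst(spec):
        pattern = re.compile(r"\b" + re.escape(source) + r"\b")
        if shorthand and len(pattern.findall(text)) != 1:
            # With several occurrences the editor has to be explicit.
            raise ValueError("shorthand needs exactly one match: " + source)
        text = pattern.sub(lambda m, t=target: t, text)
    return text
```

Matching on word boundaries matters here, because "уже" also occurs inside longer words such as "дружелюбный".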

@Benwing2: I haven't dismissed your offline script. I only mentioned that some of its functionality is clearly out of scope of my module. For example, the creation of wikilinks for the lemma forms of words. I don't think that doing lemmatization is practically feasible inside of a Lua module due to a much higher resources usage required for that. Additionally, I initially intended to implement auto-accenting for Belarusian quotations. And the code for processing the Russian "ё" from your offline script wasn't applicable to my use case (the dots above "ё" are mandatory in Belarusian texts and can't be omitted). I decided to put aside my Belarusian module plans and started implementing the auto-accenting code for the Russian language precisely because of your feedback. I believe that it would be easier to reach consensus and avoid friction when we are on the same page.

I like your idea about making the usage of |subst= simpler. Also thanks for your до́ смерти example. I have added it to the list of testcases and will update the code to handle it properly. BTW, my auto-accent code doesn't edit wiki pages, so it has no potential to do long lasting difficult to reverse damage. Of course, problems in it preferably should be fixed now, but there's no harm in initially deploying it even with a few minor bugs. The growth of the Wiktionary backup dump is the only thing that worries me, because the module data size is ~6MB right now. And each tiny adjustment of the dictionary regenerates it all for now, but it's possible to come up with an incremental updates scheme.

Do the curators of the Russian section of English Wiktionary have an opinion about making |text= faithful to the orthography of the original source and moving the accent markup to |norm=? --Ssvb (talk) 11:09, 11 April 2024 (UTC)

@Benwing2, @Atitarev: Could you please comment on this? I believe that, feature wise, I have finished the conversion functionality and I'm not aware of any remaining bugs in the algorithm or in the approach in general. But it performs a dictionary assisted conversion, so any errors or omissions in the existing Wiktionary entries snapshotted at https://dumps.wikimedia.org/enwiktionary/20240401/ will show up in the generated output. For example, the two testcases at the top of Module:User:Ssvb/ru-autoaccent/testcases rely on the existence of "антидилювиа́льный" and "за́ руку" entries. If somebody creates these entries right now, then the dictionary can be regenerated after the 20th of April upon the arrival of the next Wikimedia's backup dump. I can still tidy up the code to make it better commented, cleaner and faster, but now it's necessary to figure out what are the requirements and roadmap for integrating it into mainspace. --Ssvb (talk) 07:15, 13 April 2024 (UTC)[reply]

BTW, I wonder if the converter should keep the original spelling "антидилювіальныя" with "і" in its output or somehow visually highlight the word in other ways? My understanding is that it's the responsibility of a Wiktionary editor to provide the necessary |subst= crutches when adding a quotation. Automatic conversion can successfully and accurately do the bulk of the work, but still a human has to review the result and be ready to step in to add the necessary corrections. --Ssvb (talk) 07:41, 13 April 2024 (UTC)
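As an illustration of why a word like "антидилювіальныя" still needs human attention, here is a small Python sketch of letter-level normalization from pre-reform to modern orthography. It is only a sketch under stated assumptions, not the module's algorithm: it applies the letter replacements of the 1918 reform (ѣ to е, і to и, ѳ to ф, ѵ to и) and drops the word-final hard sign, but it deliberately does not touch grammatical endings such as -ыя.

```python
import re

# Letter replacements of the 1918 orthography reform, written as code points:
# yat -> ye, decimal i -> i, fita -> ef, izhitsa -> i (lower and upper case).
LETTERS = str.maketrans({
    "\u0463": "\u0435", "\u0462": "\u0415",   # yat
    "\u0456": "\u0438", "\u0406": "\u0418",   # decimal i
    "\u0473": "\u0444", "\u0472": "\u0424",   # fita
    "\u0475": "\u0438", "\u0474": "\u0418",   # izhitsa
})

# A hard sign directly before a word boundary, i.e. at the end of a word.
FINAL_HARD_SIGN = re.compile("[\u044a\u042a]\\b")

def normalise(text):
    """Letter-level normalization only; endings like -ыя are left alone."""
    text = text.translate(LETTERS)
    return FINAL_HARD_SIGN.sub("", text)
```

Applied to the Tolstoy quotation, this yields the modern spelling (without accents or ё), while "антидилювіальныя" becomes "антидилювиальныя", which still carries the old ending and would need a rule for endings or an editor override.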

@Ssvb Apologies for the delay in responding and apologies also for my slightly snippy comment about your earlier response. I need to look over your code in more detail, which I can probably do tomorrow (it's bed time for me now), but it feels to me like we need to resolve the issue of how to format quotations, transliterations and the like. I don't like the idea of having *only* the transliteration contain the accents; this is not how it's normally done here. Either the original or normalized version should contain the accents, too. This might need to differ between usexes (where it's probably OK to auto-accent the original) and quotes (where it's still to be resolved how to proceed). Also, just a note, any auto accenting you implement should err *STRONGLY* on the side of not putting in an accent if there's any doubt; better to have no accent than a wrong one. This goes especially for something that happens on the fly, because any review that the author does could become invalid due to a subsequent updating of the underlying data. Benwing2 (talk) 08:00, 13 April 2024 (UTC)[reply]

@Benwing2: I think that the Wiktionary quotation entries can have pretty detailed information in them and that the original unmodified spelling with all its archaic words and typos is a valuable piece of information, so it should be one of the template data fields. But the way how the information is presented to the end user in the browser is another matter. Having the original text, its automatic or manual normalization, the romanized transliteration and the English translation already makes it four lines of text in the browser instead of the current three lines. And things may look even more awkward if it's a multi-line poetry quotation. However, at the end of the day, the format of the presentation can be probably configurable on the frontend side and based on the end user's preferences. I mean, the end user may prefer to only see the modern normalized Cyrillic text and hide the original unaccented pre-1918 orthography to reduce the on-screen clutter, but this doesn't mean that we have to erase the original non-tampered quotations from the Template:quote-book instances themselves in the Russian entries.

As for erring *STRONGLY* on the side of not putting in an accent if there's any doubt, I think that with this approach none of the words can be safely accented. Because we can't rule out the possibility that the same word with a different stress position may be eventually added to the dictionary in the future. It's only possible to safeguard against this by not adding any accents at all. And the same goes for the prepositions "из", "за", "до", "у", "от", "со", "без", "по", "на" and many others. If we end up deciding not to accent any words adjacent to these prepositions at all, because of the risk of them being potentially a part of предложно-именные сочетания, then we would be doing a disservice to the users. I think that we should instead prioritize adding the missing dictionary entries to provide a good coverage for all of these cases. But maybe @Atitarev has an opinion on this?

As for the generation on the fly and future updates of the underlying data. I think that the module can probably automatically categorize quotations if it has troubles accenting something. I mean, let's suppose that Wiktionary initially had no entry for "за́мок" and the auto-accent module happily annotated this word as "замо́к" in one of the quotations. Now suppose that "за́мок" got eventually added to Wiktionary and the auto-accent module detected this inconsistency in a quotation. What's the best course of action? I think that just dropping the accent mark for "замок" and adding the page to some sort of a "need review" category would be a reasonable thing to do. --Ssvb (talk) 09:32, 13 April 2024 (UTC)[reply]
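The "за́мок" / "замо́к" scenario can be sketched in Python as follows. The category name and the function are hypothetical, and the lookup is simplified to lower-cased whole words. The point is only to show the proposed behaviour: once the dictionary knows more than one stress for a word, the on-the-fly accent is dropped and the page is flagged for review.

```python
ACUTE = "\u0301"   # combining acute accent (stress mark)

# Hypothetical maintenance category name, made up for this sketch.
REVIEW_CATEGORY = "Russian quotations needing accent review"

def accent_words(words, index):
    """Accent words with one known stress; flag the text if any is ambiguous.

    `index` maps an unaccented lower-case word to its set of accented forms.
    Returns the joined text and a list of maintenance categories.
    """
    out, needs_review = [], False
    for word in words:
        forms = index.get(word.lower(), set())
        if len(forms) == 1:
            out.append(next(iter(forms)))     # unambiguous: add the accent
        else:
            out.append(word)                  # unknown or ambiguous: no accent
            if len(forms) > 1:
                needs_review = True           # ambiguity appeared in the data
    categories = [REVIEW_CATEGORY] if needs_review else []
    return " ".join(out), categories
```

Running the same quotation against an older and a newer snapshot of the dictionary then gives an accented word in the first case and a bare word plus a review category in the second.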

@Benwing2, @Atitarev: BTW, is there a big comprehensive and freely available Russian dictionary in an electronic format with full stress marks information to validate the Wiktionary words data against it? I mean, something similar to the Belarusian https://github.com/Belarus/GrammarDB --Ssvb (talk) 09:59, 13 April 2024 (UTC)

@Ssvb Apologies, I missed this. There are lots of online Russian dictionaries but I don't know of any where you can download all the data. Doesn't mean it doesn't exist though. You might ping User:Cinemantique on ruwikt, they seem to be still somewhat active there and may well know. Benwing2 (talk) 03:11, 19 April 2024 (UTC)

BTW, instead of my kinda lame and seemingly artificial "за́мок" / "замо́к" example, here's something more practical: the current module testcases include the word Ока́ (Oká), which is accented because it isn't ambiguous. But what if somebody adds О́ко Сауро́на (Óko Sauróna) and its genitive form О́ка Сауро́на (Óka Sauróna) to Wiktionary in the future? Using something like "То есть по крайней мере, помимо облика Ока, Саурон имел и физический облик человека" [14] as a quotation? This would invalidate the automatically generated on-the-fly accenting in the old quotations of the word "Ока". And we need to be prepared for the situations like this. --Ssvb (talk) 11:54, 13 April 2024 (UTC)[reply]

@Ssvb: Of the modern text versions (1 to 5), I like 1 (text, transliteration in the strict sense, and translation) best, then 5 (accented Cyrillic as normalisation, which is then transliterated), but can stomach 4, where the transliteration includes the stress. 6 and 7 are a different text - the difference from the first five matters if one is demonstrating spelling, but as a quotation for the emboldened word they are equivalent. I think Wiktionary policy would actually call for 6 or 7, as being the earliest issue available to the editor. I don't think it makes any difference for establishing the latest date of use of the emboldened word. --RichardW57m (talk) 11:54, 11 April 2024 (UTC)

Aside: I think the examples have abused |newversion=, and the presentation of the dates seems dodgy. I wonder if I may do that when millennia have elapsed between composition and the printed version quoted, with the 2nd version's fields referring to a translation not eligible for supporting the words used in it. I don't like the style of the wikitext - it's hard to extract pieces of data from it. --RichardW57m (talk) 12:08, 11 April 2024 (UTC)[reply]

@RichardW57m: The |newversion abuse was suggested here by @Sgconlaw. The usefulness of it is at least twofold:

We get an English translation, written by a native English speaker, who was also a professional translator. And by contrast, if I translate the Russian text into English myself, then I risk accidentally constructing something ungrammatical. Currently Template:quote-book doesn't support something like a |t-check parameter.
It attests that Constance Garnett considered that particular English word to be an appropriate translation for that particular Russian word back in 1919.

As for the dodgy dates, WT:QUOTE says "The year should be that of the earliest edition known to use the word. Where feasible, the page number should be taken from the first edition, but if a later edition is used (e.g. paperback version, or digitised by Google Books), then the publication date should be added in parentheses after the publisher’s name."

My understanding is that "Anna Karenina" novel was initially published from 1875 to 1877 in the literary journal "The Russian Messenger". And then there were multiple book editions printed after that. Russian Wikisource uses the book from 1903 for its pre-reform orthography text and the book from 1970 for the modern orthography text. Now which years should be referenced in the quote-book template? I wish there were a simple non-ambiguous and easy to follow guideline related to adding Russian quotations in WT:ARU. People would have a lot less headache. --Ssvb (talk) 15:33, 11 April 2024 (UTC)[reply]

@Ssvb: I don't dispute that the abuse is very useful. I've asked the same question myself, and had not got a satisfying answer. I've had the same problem using translations that are sometimes CC-BY-SA, so it's a legal obligation as well as courtesy to acknowledge the translator, which latter applies if the translation has been released into the public domain by the translator. (Sometimes I've felt obliged to use a more literal translation.) I'd been resorting to putting the acknowledgement in the |footnote= field. I've felt particularly cribbed when the spelling of an ancient text seems very much 20th century or later and I'm having to quote it from a scan published in yet another document. So I looked at the template invocations to see if I could learn a new trick. --RichardW57m (talk) 16:04, 11 April 2024 (UTC)[reply]

@Ssvb: Hi. My preference is #3 with accents in the source language. Some people dislike linking each word. I think it's helpful for learners or someone who wants to analyse the text. The number of links could be reduced to some, e.g. difficult or rare words, and/or links could be removed for proper nouns, company/product names or words which are NOT supposed to be linked (i.e. have entries).

IMO, providing accents on unaccented words is incorrect, e.g. #4. It was the original old practice, which has been eradicated over time. Let's not reintroduce it. :)

We already have an established practice to accentuate each Russian word and supply the letter "ё" wherever a word appears. I belong to the group who wants to keep it that way, including quotes (and extend the good practice to other languages where appropriate). I go out of my way to provide accents for Cyrillic-based Slavic languages, vocalisations for Arabic, and more recently Persian and Urdu, nuqta spellings for Hindi terms.

Errors like you mentioned re "входившіе" are uncommon. Editors may prefer to quote one or the other or both.

These changes sometimes deal with not just the spelling but the grammar. онѣ́ (oně́) -> оне́ (oné) -> они́ (oní). The entry for оне́ (oné) shows where the original Pushkin pronunciation was preserved for the sake of rhyming. Anatoli T. ^{(обсудить}/^вклад) 03:04, 15 April 2024 (UTC)[reply]

I'm not a fan of any of these. It's better to cite the first edition if possible, and in this case the first edition is available online. The quotation should be faithful to the source—Tolstoy wrote in the 1800s, not in 2024, and the quotation should reflect that. See Middle English pyteuous for an extreme example of this. That said, the normalization should be included if the original quotation is hard to understand (which might be the case here). Also, Anna Karenina was written 1873–1877 and published 1875–1877 as well as 1878 (per Wikipedia), so I think that you having |year=1877 is inaccurate. I propose:

1873–1877, Л[евъ] Н[иколаевичъ] Толстой [Leo Tolstoy], Анна Каренина [Anna Karenina], first volume, Москва: Типографія Т. Рисъ, […], published 1878, page 103; English translation from Constance Garnett, transl., Anna Karenina: A Novel, Philadelphia, P.A.: George W. Jacobs & Company, 1919, page 86:
Когда они вышли, карета Вронскихъ уже отъѣхала. Входившіе люди все еще переговаривались о томъ, что случилось.
[Когда́ они́ вы́шли, каре́та Вронских уже́ отъе́хала. Входи́вшие лю́ди всё ещё перегова́ривались о том, что случи́лось.]
Kogdá oní výšli, karéta Vronskix užé otʺjéxala. Vxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.

Ioaxxere (talk) 14:51, 11 April 2024 (UTC)

@Ioaxxere: I wrote my previous comment right before I noticed that your reply had already been added. Thanks for all this additional information. I'm primarily interested in integrating the automatic text normalization auto-accenting Lua module, so focusing on the boilerplate with dates unfortunately may sidetrack the discussion a bit. That said, you raised a good question about whether the original quotation in the pre-reform orthography is easy or hard to understand for the intended Wiktionary users. I hope that somebody can answer it. --Ssvb (talk) 15:56, 11 April 2024 (UTC)

New label: ephemeral

There are many cases where a term briefly becomes extremely popular, and then fades into obscurity. Labelling these (dated) or (obsolete) doesn't feel right. Therefore, I propose the label (ephemeral) for this purpose along with an associated category. Some ephemeral terms in English:

Note that "ephemeral" only refers to time, not usage, so "ephemeral" terms can be slang, informal, or formal. Ioaxxere (talk) 01:06, 12 April 2024 (UTC)

In support of this concept. (Side note, what's the earliest term this label could be applied to?) CitationsFreak (talk) 02:02, 12 April 2024 (UTC)

Maybe terms like nervo-bilious and sulphite? Benwing2 (talk) 02:45, 12 April 2024 (UTC)

At first glance, the concept seems questionable. How would one characterize the difference between 'dated' and 'ephemeral' in such a way that users would care? How brief a period of 'popularity' (itself problematic) would we require for a definition to be deemed 'ephemeral'? Less than a decade? Would we need to define a class of curves of usage frequency over time? Moreover, as a practical matter, even with satisfactory definitions and criteria, systematic application based on objective facts seems very unlikely. DCDuring (talk) 14:18, 12 April 2024 (UTC)

How about "only used for X amount of years"? CitationsFreak (talk) 14:21, 12 April 2024 (UTC)

If facts to support 'X' were required, that label would rarely be used. DCDuring (talk) 15:04, 13 April 2024 (UTC)

Good point. I was thinking about this more, and I believe that the word should be heavily tied to a rapidly forgotten fad. CitationsFreak (talk) 01:56, 15 April 2024 (UTC)

What happens to such words after twenty more years of successful curation here on Wiktionary? After a certain period of time, the use of any word no longer regularly used should just be considered "dated", or even "obsolete". Therefore, if you are already making the judgment that a word is "ephemeral", it has presumably already lost its regular usage and should probably also just be considered "dated" from the point at which the judgment has been made, for the foreseeable future of Wiktionary, or until such time that the word were to come back into vogue.

If a word is inextricably linked to a particular event whence its ephemerality derives, then that event and its ephemeral real nature should probably just be mentioned in its definition, rather than creating a new label that really just means to say "no longer used". It would also be difficult to determine what the requisite maximum "lifespan" of a word should be to be given this label.

It's worth noting that a word can be both "dated" and "ephemeral", however. I'm reminded of both inkhorn terms from 16th and 17th century English: Latinate neologisms brought into English and used for a short period of time—as well as several reactionary neologisms of more Anglo-Saxon stock, created or resuscitated from the depths of time in response, such as inwit and gleeman.

Shakespeare as well coined many, many words in his works, and though many of them are now part and parcel of English, many others have faded into obscurity, such as appertainments, attasked, conspectuity, defunctive, dispunge, enacture, ensear, exsufflicate, immoment, imperceiverant, intrenchant, irregulous, oppugnancy, relume, reprobance, and rubious. Surely some of those words (and others I can't call to mind right now) found "ephemeral" usage for a short time after their coining?

Hermes Thrice Great (talk) 16:42, 12 April 2024 (UTC)

@Hermes Thrice Great: I don't like to use "dated" or "obsolete" to describe ephemeral terms, since those labels generally describe words which were in common use for a long time, maybe even centuries, before gradually falling out of use. Ioaxxere (talk) 17:20, 12 April 2024 (UTC)

Oh? I wasn't aware. I've put "dated" on old computer terminology that only existed for a decade or two. Equinox ◑ 17:24, 12 April 2024 (UTC)

@Ioaxxere: For me, "dated" mostly refers to terms associated with a previous generation- using them "dates" you. Such terms were usually only popular during that generation. We've also used it in the sense you're referring to, but that's not the primary meaning. Chuck Entz (talk) 17:48, 12 April 2024 (UTC)

@Equinox, Chuck Entz: I think you two are agreeing with me. I consider "a decade or two", or a generation (~20 years), a relatively long time in comparison with the "ephemeral" terms listed above. Ioaxxere (talk) 18:09, 12 April 2024 (UTC)

I'm trying to think if I could use this. There was a city in Hubei named Jingsha for a little over two years between 1994 and 1996, that might be ephemeral. Also, I'm thinking that some Cultural Revolution-connected geographical terms may be ephemeral in this sense, but reach three durably archived cites. --Geographyinitiative (talk) 15:32, 13 April 2024 (UTC)

Looks like discussion has died down. What do we think — yea or nay? Ioaxxere (talk) 06:58, 25 April 2024 (UTC)

It needs a definition for sure, otherwise I would oppose. Benwing2 (talk) 07:09, 25 April 2024 (UTC)

@Benwing2: Here's a definition we could add to Appendix:Glossary.

ephemeral: Popular for a relatively brief period of time (about a decade or less) before rapidly fading into obscurity. Thus, there is no period of time where an ephemeral term ever existed in the standard language. In general, a term might be identified as ephemeral if its usage trend graph is shaped like a spike. If a term is labelled as ephemeral, the entry should state the approximate time range in which the term was in use.

Ioaxxere (talk) 19:56, 25 April 2024 (UTC)
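One way to make the "spike" criterion in the proposed definition concrete is sketched below in Python. The thresholds (a ten-year window, and counting a year as active when usage reaches 10% of the peak) are assumptions chosen for illustration, not part of the proposal, and any per-year counts fed into it would have to come from a real corpus.

```python
def is_ephemeral(counts, max_years=10, floor=0.1):
    """Return True if yearly usage counts look like a short-lived spike.

    `counts` maps a year to a usage count. A year is "active" when its count
    is at least `floor` times the peak. The term qualifies when all active
    years fit in `max_years` and usage has faded before the data ends.
    """
    if not counts:
        return False
    peak = max(counts.values())
    if peak == 0:
        return False
    active = [year for year, n in counts.items() if n >= floor * peak]
    span = max(active) - min(active) + 1
    faded = max(active) < max(counts)     # not still in use in the last year
    return span <= max_years and faded
```

The `faded` check means a term that is still in active use in the most recent year of data is never classified as ephemeral, whatever its curve looks like so far.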

Still not sure this is useful. Of the three terms given above, one (bookshelf wealth) is a neologism still in use, and covidiot is too new to say for sure that it is passé (what if COVID keeps recurring?). I'd need to see several more words that clearly fit the category in order for me to judge it useful. My general concern is that if we keep adding usage terms with different definitions, people will become hopelessly confused as to which one to use and all of them will get diluted (similarly to having uncommon, rare, very rare, etc.). Benwing2 (talk) 20:42, 25 April 2024 (UTC)[reply]

AAVE vs. African-American English

We have 637 terms in Category:African-American Vernacular English and 13 in Category:African-American English. However, I can attest (being married to a Black woman who doesn't speak AAVE) that the vast majority of these terms are not restricted to AAVE. About the only ones I can think of that come to mind as really AAVE-only are terms like aks/axe/ax "ask" and fitna (= fixing to "going to") that are proscribed in Standard English, along with pronunciation spellings like smoove, mouf, foun' and skraight that represent AAVE-specific pronunciation features. (I should also note that we have uncharacterized entries for weird pronunciation spellings like debbil that sound to me like something out of Uncle Remus; these need to be marked as archaic or obsolete or something.) This suggests we either need to move the large majority of these terms under Category:African-American English or just merge the two (since I'm doubtful the average Wiktionarian will be able to keep them correctly categorized). Benwing2 (talk) 04:53, 12 April 2024 (UTC)[reply]

Well yeah, to the category African-American English, while still making the distinction in the entry if an editor intends to, so I had merged nouchi with Ivorian French from the beginning, because the antinomy only arises if you look upon all terms collectively from a bird's-eye view, via the category system.

Many of these terms draw our attention as slang (as in the Ivorian example, where you would think the language is the official language of a nation and hence there would be an even greater share of non-vernacular terms), so I doubt non-proscribed ones are the vast majority of what we have even documented. For either comprehensibility or markedness, most terms may be proscribed in use with the general American population; you probably proscribe less if it is your wife. It depends on which social circle you take as a point of reference. If it’s a white conservative one, then is it every single one in “Standard English”? The other terms are often just opaque as to their origin in African-American speech (dig) or particular meaning (Talk:tired).

But that may not be what you are trying to say; you think about Standard English from an Afro-American perspective (which you can very well assume), if that makes sense. But as you realize, the average Wiktionarian cannot consistently make sense of it, though he may have enough charity to get your idea, never having set foot in America but consuming her media, just as you have not been to Abidjan to accurately judge registers (this is just a statistical probability). Fay Freak (talk) 08:45, 12 April 2024 (UTC)

@Benwing2: What does "AAVE-only" mean? "Fitna" and "skraight" are the only ones that would be out of place in non-standard English dialogue out of the mouth of an Englishman of native origin. And as /skr/ is mostly of Danish origin in English, I wouldn't be surprised to find that the scream-stream merger came from the English West Country. --RichardW57m (talk) 09:55, 12 April 2024 (UTC)[reply]

I think typing shortcuts could help with future labelling and categorization. At present:

{{lb|en|AAVE}} generates (African-American Vernacular).

{{lb|en|AAE}} now generates (AAE) [could/should(?) generate African-American English].

The situation is, has been, and probably will be dynamic. Labels are likely to change and their application even more so. Empirical support for the application of the labels is likely to be scarce and soon (one or two decades) become dated or even deemed insulting. I don't think we can count on many contributors to make systematic revisions of fine categories under these circumstances. So labels that are not controversial and of broad application are probably best. "African-American English" would seem to fit the bill. Also, we do have register labels to add finer descriptions and for searches using Cirrus Search. DCDuring (talk) 13:04, 12 April 2024 (UTC)[reply]

Absolutely agree this is an issue. It's come up before, e.g. 2020 discussion here, linking to earlier discussions. Not only on Wiktionary but in the world people sometimes refer to the vocabulary of Black Americans in general, or "Black rappers' slang", "Black Twitter" etc as "AAVE", maybe due to not knowing what else to call it, and even in linguistics literature many uses of "African American English" use it as a synonym of "AAVE", so even that label doesn't unambiguously distinguish anything. Many uses of "MLE" on Wiktionary seem to similarly just mean "words Black rappers in Britain use" rather than specifically MLE. Sometimes white people have labelled entries "MLE, AAVE", just meaning Black rappers on both sides of the Atlantic use it, not that it manages to belong and be limited to the specific lect of AAVE and also the specific lect of MLE. I do feel some reluctance to do away with (merge into broader labels) specific AAVE or MLE labels, because there are a few things in those categories which belong to those specific lects, but it'll continue to be a constant, effort-intensive (and if the current state of the categories is anything to go on, losing) task to keep checking the categories' contents for wrong entries, because it's clear people both on Wiktionary and in the world at large don't distinguish "AAVE" from "words Black Americans use".
Yes, Remus-esque stuff like debbil or ebery needs its own category... as I said in 2020, I wouldn't even put them in whatever "African-American English" category we put the rest in (even with an "obsolete" label), because it's not clear they were mainly or ever used by Black people, as opposed to just by white people caricaturing Black people. Maybe create a label for use in {{pronunciation spelling of|en|foo|from=bar}}, to display something like "19th century white caricatures of Black speech" or something... - -sche (discuss) 14:49, 12 April 2024 (UTC)

"African-American Vernacular English" and "African-American English" are synonyms, so the categories should be merged. Ioaxxere (talk) 16:22, 12 April 2024 (UTC)[reply]

They're sometimes synonyms. But they're also sometimes not synonyms, sometimes African-American English includes AAVE but also other African-American sociolects of English. Most of what we currently have labelled "AAVE" is actually not AAVE, just AAE, as Benwing points out. But given the occasional synonymy and the widespread inability to consistently distinguish them, maybe we are better off moving all of this to a new label like "Black American English". - -sche (discuss) 17:51, 12 April 2024 (UTC)[reply]

@-sche I would support merging the two into a label like "Black American English". Benwing2 (talk) 20:51, 12 April 2024 (UTC)

Because any difference to Canadian Black English is also a black box? Has a parallel advantage if we introduce “Black British English”, which may or may not include Ireland, but we already have Category:Multicultural Toronto English, which would be another Black American English, and that like MLE only in the last three decades, so behind it is a substratum of Black English like in the US, not a recent “multiethnolect”, a substratum Britain did not have. Strange that the term Canadian Black English is a hapax. There is some truth to even linguistic literature being at a loss. Fay Freak (talk) 21:31, 12 April 2024 (UTC)[reply]

@Fay Freak: Multicultural Toronto English is emphatically not a type of American/Canadian Black English. They have completely different origins. Ioaxxere (talk) 21:38, 12 April 2024 (UTC)[reply]

@Ioaxxere: Isn’t that what I was saying? (Though I have doubts they will stay distinct later into the century.) It would be strange if we rename MLE to British Black English but keep “MTE” modelled after the MLE term, and also if we take advantage of the term “Black American English” but not “Black British English”, we are in a pickle.

I add that there is also “Black Canadian English”, as one can’t use arbitrary orders of English adjectives, also rare, but here referring to MTE. Fay Freak (talk) 21:43, 12 April 2024 (UTC)[reply]

My understanding is that the demographic surge of emancipated slaves and their descendants leaving the US South combined with barriers set up to keep them out of the mainstream created a distinctive type of Southern-based speech that almost swamped out the other varieties. Those other varieties aren't really part of AAVE, nor is that of those who made it into the mainstream (not to mention more recent immigrants from the Caribbean and from the rest of the world). The problem is that this complexity is almost invisible to those of us from other parts of US society, so it's too easy to conflate everything else with AAVE. Chuck Entz (talk) 23:08, 12 April 2024 (UTC)[reply]

We don't have a good factual basis for maintaining relatively fine distinctions, so a broad label that is subject to criticism, but defensible, is probably better than narrow ones, which are also subject to criticism. Most criticism doesn't seem to be based on systematic facts, just anecdotes and idiolects. DCDuring (talk) 15:01, 13 April 2024 (UTC)[reply]

Ukrainian IPA transcriptions, in particular concerning the vowel И

I’ve noticed that many (the majority) of Ukrainian entries here on English Wiktionary transcribe the Cyrillic letter И as [e], when in fact it should, in nearly every case (but not all), be transcribed as [ɪ] (see, for example, w:Help:IPA/Ukrainian). This is especially obvious when comparing the IPA transcriptions for the same word on Ukrainian Wiktionary and English Wiktionary. Here are a few examples:

en:читати has [t͡ʃeˈtate], but uk:читати has [t͡ʃɪˈtatɪ]
en:закінчити has [zɐˈkʲint͡ʃete], but uk:закінчити has [zaˈkint͡ʃɪtɪ]
en:життя has [ʒeˈtʲːa], but uk:життя has [ʒɪˈtʲːɑ]
en:робити has [rɔˈbɪte], but uk:робити has [rɔˈbɪtɪ]
hundreds more, no doubt…

It seems that most of the cases where the erroneous IPA transcriptions show up were added by the bot User:WingerBot, and the IPA transcriptions tend to be more correct with regard to this letter when a real user (who presumably knows Ukrainian) had added the transcription before the bot did its work on the Ukrainian corpus here on English Wiktionary. For example, in the entry ринок, the transcription, provided by a real user, has been given correctly as [ˈrɪnɔk].

I'm not sure if the bot could be reconfigured to go through these entries and fix them, but this would be preferable to having to go through all these entries manually, especially because not all Ukrainian entries are affected, and also there are indeed cases where [e] ought to be the IPA transcription for И.

Hermes Thrice Great (talk) 16:01, 12 April 2024 (UTC)[reply]

@Hermes Thrice Great I don't think this has anything to do with real users vs bots, because all of these entries seem to use {{uk-IPA}}, including the one you've given as an example of not having the problem. Evidently a distinction is made in the underlying module somewhere (which looks to be down to whether the syllable is stressed or not). It would be a really bad idea to go through all of these manually, because that would decouple the entries from the template, meaning that it's much harder to ensure entries are kept in sync. Theknightwho (talk) 16:18, 12 April 2024 (UTC)

@Theknightwho Sorry, you’re right, I should have looked into the template code. Yikes, this is a real mess then.

Hermes Thrice Great (talk) 16:50, 12 April 2024 (UTC)[reply]

@Hermes Thrice Great I expect it’s easy to fix (if it needs to be fixed - I don’t know what scheme @Benwing2 used). Also pinging @Atitarev. Theknightwho (talk) 16:59, 12 April 2024 (UTC)[reply]

@Theknightwho @Hermes Thrice Great This implementation predates me and yes it is simply merging и and е when they are unstressed. According to Ukrainian phonology:

/ɛ/ and /ɪ/ approach [e], which may be a shared allophone for the two phonemes.

This is equivocal as to whether these two sounds are actually the same; I imagine it depends on how formal the speaker is being. Maybe Anatoli can comment. Benwing2 (talk) 18:55, 12 April 2024 (UTC)[reply]
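For readers following the thread, here is a minimal sketch of the behaviour being described and of the proposed change. It is not the actual code behind {{uk-IPA}}; the function name `transcribe_vowel` and the `merge_unstressed` switch are hypothetical, and only the two vowels under discussion are modelled.

```python
# Hypothetical illustration only: this is NOT the module behind {{uk-IPA}}.
# It models just the two letters discussed in this thread.

# Stressed values: и as in ринок [ˈrɪnɔk]; е as the phoneme /ɛ/.
STRESSED = {"и": "ɪ", "е": "ɛ"}


def transcribe_vowel(letter: str, stressed: bool, merge_unstressed: bool = True) -> str:
    """Return an IPA symbol for Ukrainian и or е.

    merge_unstressed=True  models the behaviour described above:
                           unstressed и and е both come out as [e].
    merge_unstressed=False models the proposal: unstressed и stays [ɪ],
                           while unstressed е is still [e].
    """
    if stressed:
        return STRESSED[letter]
    if letter == "и" and not merge_unstressed:
        return "ɪ"
    return "e"
```

Under these assumptions, the final и of робити comes out as [e] with the merge switched on and as [ɪ] with it switched off, which matches the [rɔˈbɪte] versus [rɔˈbɪtɪ] contrast quoted at the top of the thread.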

Yes, that’s why I said in some cases it makes sense to transliterate it that way. And yes, it may approach a shared allophone in some cases, but especially at the end of a word in the infinitive form, and especially where there are the two separate vowels Е and then final И in the word, like for example перекладати, the difference should be transcribed. In the example I just linked, you can hear very clearly in the audio the difference in the two vowels, yet the IPA transcription given transcribes them the same, as [perekɫɐˈdate] instead of [perekɫɐˈdatɪ].

Hermes Thrice Great (talk) 01:39, 13 April 2024 (UTC)[reply]

@Benwing2, @Theknightwho, @Hermes Thrice Great: It’s not a mess. The module was based on one classical version of Ukrainian, which may not be common, known or accurate from the modern perspective. It was consistent. I don’t have any objection to changing to [ɪ] for the unstressed «и». I’m sure there are cases where the pronunciation is not considered modern or common but there are too many opinions on this. I prefer the scheme to be sourced. Anatoli T. (обсудить/вклад) 01:46, 13 April 2024 (UTC)

@Voltaigne might want to say something on the topic. Anatoli T. (обсудить/вклад) 02:03, 13 April 2024 (UTC)

To my orthoepically untrained ear, "и" sounds closer to [ɪ] in stressed and unstressed syllables, but can tend towards [e] when it occurs unstressed at the end of a word. In any event I share Atitarev's preference for a sourced transcription scheme. Voltaigne (talk) 11:19, 13 April 2024 (UTC)[reply]

@Voltaigne, @Benwing2, @Theknightwho, @Hermes Thrice Great: We can decide with a vote to change to [ɪ], since this is the most controversial and often mentioned discrepancy with the modern transcription. I am sure this can be sourced too. Anatoli T. (обсудить/вклад) 03:07, 15 April 2024 (UTC)

Mainspace Proto-West-Germanic?

ᚲᚨᛒᚨ and Proto-Germanic *kambaz are now in CAT:E because the former has been created in mainspace as a Proto-West-Germanic entry and the latter links to it. It's true that ᚲᚨᛒᚨ is known from writing on a comb from the 3rd century that was found near Erfurt or Frienstedt in Germany. It apparently isn't Old Norse, which is allowed in mainspace, or East Germanic. That leaves either Proto-Germanic or Proto-West-Germanic. I don't know enough about either to say what should be done, but we can't leave things the way they are. Chuck Entz (talk) 23:50, 12 April 2024 (UTC)

I see two alternative approaches:

1: Downgrade the mainspace errors for proto-languages that are rarely ever attested to editor-directed warnings, shown on editing and on read with {{attn}} (which we now see on every page with a gadget), and categorize such pages into “mainspace entries of a language almost always only reconstructed”, since at present we don’t categorize the error-affected section at all. I.e. raise maximum attention but still operate.

2: Or add some exception list for particular pages that only people privileged due to knowing what they do (autopatrollers) can modify, or even know about, which would raise enough awareness. I.e. have a preventive approach and annoy during the editing process already.

Personally I prefer the second as the more aggressive approach.

The appropriate language name though is probably Proto-West Germanic, in view of the dating and ending, having been created before we introduced Proto-West Germanic. Fay Freak (talk) 11:54, 13 April 2024 (UTC)[reply]

A list of approved mainspace pages could be a good idea for a case like this where the language will have very few, iff it isn't too memory-"expensive". My reaction would've been to just change PWG to function like Proto-Norse, if some terms in PWG are attested (if we take the attested terms to be PWG), but that has the downside that then people aren't warned if they forget to put the asterisk in front of any of the more numerous unattested words. - -sche (discuss) 14:09, 13 April 2024 (UTC)[reply]

@-sche Small lists of exceptions like this aren’t the kind of thing that impact memory use in a problematic way, so I wouldn’t worry about that. Theknightwho (talk) 14:19, 13 April 2024 (UTC)[reply]

Though a better solution might be some kind of manual override (an “anti-asterisk”), for use with reconstructed languages, which could be placed in headwords/links to suppress the error. It would have to be something relatively obscure, so it couldn’t be entered by mistake, and unlike an asterisk it wouldn’t actually display. Theknightwho (talk) 14:22, 13 April 2024 (UTC)[reply]

This has lower maintenance and hence annoyance than an exception list; we would have a tracking category filled by the parametral override, for such rare cases then, in place of a module-data exception list. Though categories have memory impact, it would barely ever not be on low-stress page titles. Fay Freak (talk) 17:52, 13 April 2024 (UTC)

Maybe ‼foobar or ꜝfoobar? Inspired by "sic!", but thinking !foobar itself might not be the best choice because someone might actually type that. Just spitballing. (Obviously, past some threshold, a reconstructed proto-language has enough attested terms to merit just treating it as attested, but I don't know if we want "one attested term" to be the threshold.) - -sche (discuss) 20:49, 13 April 2024 (UTC)[reply]
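To make the "anti-asterisk" idea concrete, here is a rough sketch of how such an override could behave. Everything in it is an assumption for illustration: the marker is simply the ‼ floated above, and the function name, the error class and the `tracked` flag (standing in for the tracking category) are hypothetical rather than anything in the existing modules.

```python
# Hypothetical sketch of the proposed override; not existing module code.
OVERRIDE = "\u203c"  # the "‼" marker suggested above (an assumption)


class MissingAsteriskError(ValueError):
    """Raised when a term in a reconstructed-only language lacks '*'."""


def normalize_link(term: str, reconstructed_only: bool):
    """Return (display_text, is_reconstruction, tracked).

    - '*foo' is a normal reconstruction and is displayed with its asterisk.
    - '‼foo' marks an attested term; the marker is stripped from the display,
      and the use is tracked when the language is normally reconstructed-only.
    - A bare 'foo' in a reconstructed-only language is still an error.
    """
    if term.startswith("*"):
        return term, True, False
    if term.startswith(OVERRIDE):
        return term[len(OVERRIDE):], False, reconstructed_only
    if reconstructed_only:
        raise MissingAsteriskError(f"missing asterisk: {term!r}")
    return term, False, False
```

With this shape, editors who forget the asterisk on an unattested word still get the error, while the rare attested term is let through and remains findable via tracking.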

"Forget to put asteriks in reconstructions"? If a word is a reconstruction, you'll never forget to put an asteriks if you know what your doing. Did such mistakes happened before? And who is gonna do such mistake? Is it not a policy we have here on Wiktionary to put asterikses for reconstructed words? Why worry yourself then? Tollef Salemann (talk) 19:48, 13 April 2024 (UTC)[reply]

Would that that were so! Sadly, not everyone knows, or at least manages, to do everything right all of the time... even in your own comment you forgot to put the apostrophe that goes in know what you're doing — or perhaps that was your point/joke, heh. The number of unattested or attested terms that have been RFVed, RFDed or RFMed to move them either to or out of the reconstruction namespace is non-trivial, and is just a fraction of the cases where someone incorrectly omitted or used an asterisk. - -sche (discuss) 20:49, 13 April 2024 (UTC)[reply]

Potentially irrelevant question: how certain are we that the inscription on the Erfurt-Frienstedt comb even means "comb"? It seems odd for a comb to be labelled "comb". Could it just be saying that the comb was owned by someone named "Kaba"? Ioaxxere (talk) 01:23, 14 April 2024 (UTC)[reply]

Nothing is certain, but some variation on this is exactly what would be expected as a word for comb in that time and place based on comparing attested terms in related languages. It's not a coincidence that the ancestor of this term is reconstructed as *kambaz. Also, writing in early Germanic cultures had dimensions not found in modern culture- it had some kind of intrinsic magic. Many of the objects from that period had explicit references to the evil things that were supposed to happen to anyone who stole them. The circumstantial evidence points to this being some kind of offering that was intentionally placed where it was, so the writing may well have had some deeper purpose. Chuck Entz (talk) 02:47, 14 April 2024 (UTC)[reply]

Actually, in Old Norse runic inscriptions, most short inscriptions on objects ending with -a are in fact names of the owners (where the "a" ending is a shortening of "á mik"). But in PWG it is not going to work the same way with this ending, of course. Anyway, inscriptions that name the object itself may be made not only for a magical purpose, as Chuck says, but also just for fun (like Mr. Tenevil did with object inscriptions in Chukotka). Tollef Salemann (talk) 07:45, 14 April 2024 (UTC)

Also, I don’t remember exactly how many, but there are some other objects with runic inscriptions on them giving the name of the object and not the owner. On the other hand, if the owner’s name is used, it is often used in the context of an ownership phrase, not the name alone. Not sure how it works for Germany tho. Tollef Salemann (talk) 07:55, 14 April 2024 (UTC)

clarify boilerplate on categories for terms derived from vs topical to fiction

Wiktionary:Tea room/2023/December#Category:English_terms_derived_from_Star_Wars shows we would benefit from clarifying the boilerplate on "Foo terms derived from [fiction]" vs "foo:[fiction]" categories, to better explain the difference. I think most people agree on the core difference (?) :

"CAT:English terms derived from Star Trek" is for terms like cloaking device, which was coined by Star Trek. (In this case, cloaking device happens to also not belong in CAT:en:Star Trek, because it's a general-English word and no longer has any closer connection to Star Trek than captain does.)
"CAT:en:Star Wars" is for terms like Glup Shitto, which is directly about Star Wars. (Glup Shitto does not belong in "...derived from Star Wars", because it's not used in Star Wars or derived from anything that is.)
Some terms, e.g. Dalek, both derive from and are directly about (a race from) a fictional franchise, and belong in both a topical category and a "derived from" category.

But sometimes a franchise coins a term X, or the franchise itself is named X, and someone else modifies X to XY: is XY "derived from" the franchise?

Does Vaderesque belong in "terms derived from Star Wars"? (Or Voldemortian in "terms derived from Harry Potter"?) The franchises only coined "Vader" and "Voldemort", someone else added the suffixes.
Does Warsie belong in "terms derived from Star Wars"? (Or Trekkie in "...derived from Star Trek"?)
Does Hand Solo belong in "terms derived from Star Wars"? It derives from modifying the name of a character.

Asking here rather than in the December TR in hopes of reaching more people. How people feel about these will help with rewriting the boilerplate: if the answer to any is "yes", we should change the "derived from" categories' text from saying terms "originated in" the franchise, to saying ~"originated in or are derived from", whereas if the answer to all three is "no" and we exclusively want terms coined by the franchise, not just derived from it, then we should rename the categories from "derived from" to "coined by"... - -sche (discuss) 05:15, 13 April 2024 (UTC)[reply]

PS how 'expensive' would it be for the modules that generate the categories' boilerplates to check, on either type of category, whether a corresponding category of the other type exists (or failing that, to let users input the other category into e.g. a "crossreference=" parameter), and link each category to the other so that users are aware of both and hopefully glean the distinction from comparing them? - -sche (discuss) 05:15, 13 April 2024 (UTC)[reply]

@-sche This is not expensive, and in any case the category pages aren't generally memory hogs. Benwing2 (talk) 21:52, 13 April 2024 (UTC)[reply]
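As a rough illustration of the cross-referencing asked about above, the sketch below maps each category name to its counterpart of the other type and only reports it if that page exists. The helper names, the tiny language table and the `existing` set (standing in for a page-existence check) are all hypothetical; the real category modules may be organised quite differently.

```python
import re

# Hypothetical, deliberately tiny language table for illustration.
LANG_CODES = {"English": "en"}
LANG_NAMES = {code: name for name, code in LANG_CODES.items()}


def counterpart(category: str):
    """Map 'English terms derived from X' <-> 'en:X'; None if neither shape fits."""
    m = re.fullmatch(r"(.+?) terms derived from (.+)", category)
    if m and m.group(1) in LANG_CODES:
        return f"{LANG_CODES[m.group(1)]}:{m.group(2)}"
    m = re.fullmatch(r"([a-z-]+):(.+)", category)
    if m and m.group(1) in LANG_NAMES:
        return f"{LANG_NAMES[m.group(1)]} terms derived from {m.group(2)}"
    return None


def crossreference(category: str, existing: set):
    """Return the counterpart category only if it is in `existing`."""
    other = counterpart(category)
    return other if other in existing else None
```

A manual "crossreference=" parameter, as suggested, would simply bypass `counterpart` for franchises whose two categories are not named in parallel.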

I note the following:

The "core difference" mentioned above seems somewhat different to what was mentioned by some editors during the Tea Room discussion, since it was posited there that "Category:English terms derived from Star Wars" includes all terms etymologically derived from terms invented in the franchise, while "Category:en:Star Wars" includes terms semantically related to the franchise.
I don't think there is anything inherent in the way either of the above categories is named which points one way or another. Derived is quite a general word. Thus, I think it is open to us to define the categories in whatever way achieves consensus.
I think terms like Hand Solo ought to be in one of these categories, because—like it or not—I'll bet editors will be adding them to such categories anyway.
Ideally, the categories should be defined in such a way that a term belongs in one or the other category, not both. (I assume "Category:English terms derived from Star Wars" will remain a subcategory of "Category:en:Star Wars". Thus, editors should put a term in the subcategory if it applies, and only put it in the parent category if the subcategory is inapplicable.)

I'm minded to suggest the following definitions for the categories:

"Category:English terms derived from Star Wars" is for terms which were coined in the franchise as well as terms etymologically derived from such terms, but excluding the name of the franchise itself unless it is used in-universe within the franchise.
- Comments: I wholeheartedly agree that the category should be limited to coinages to exclude general terms like cloaking device and moon which may happen to be used in the franchise. Extending the category to terms etymologically derived from franchise terms means that terms like Hand Solo and Vaderesque will be included.
"Category:en:Star Wars" is for terms which are neither coined in the franchise nor etymologically derived from such terms, but relate in some way to the franchise. Terms etymologically derived from the name of the franchise should be placed here, if the name is not actually used in-universe within the franchise.
- Comments: Thus, a term like Glup Shitto would be placed in this category. This category would also include terms like Star Wars Day, Trekkie, and Warsie since (I assume) the terms Star Wars and Star Trek aren't actually used in-universe within the franchise.

— Sgconlaw (talk) 21:51, 13 April 2024 (UTC)[reply]

I think Vaderesque, Warsie, and Hand Solo do not belong in Category:English terms derived from Star Wars. Ioaxxere (talk) 01:23, 14 April 2024 (UTC)[reply]

@Ioaxxere: why, and which category should they then be placed in? — Sgconlaw (talk) 03:19, 14 April 2024 (UTC)[reply]

French Translingual

There used to be five entries in the nonexistent category Category:French Translingual: ĵ, ŝ, ꞓ, ꞓ̂ and ꭒ. These are due to User:Kwamikagami; they used to e.g. define ĵ as "j with a circumflex" but Kwami changed them to try and capture some of their common uses, in this case by changing the definition to

# {{lb|mul|phonetics|French}} {{ng|The affricate sound of English ''[[j]]''.}}

This was leading to the weird categorization. I made a change a couple of weeks ago to add language restrictions to labels like French so they only are supposed to categorize for a subset of known good languages. This was broken but I just fixed it, so now these entries no longer categorize. This leaves the question, though, of what the definitions *SHOULD* be. I think Kwami's point was that the definition "j with a circumflex" doesn't carry much information, but OTOH I'm sure ĵ is used for purposes other than the now-stated one. Thoughts as to how to make the definitions better? Benwing2 (talk) 21:47, 13 April 2024 (UTC)[reply]

Does "French" mean this is only used (in this way) in the French language? Then the definition should be in a ==French== section, no? Or is there a "Frenchist" school of phonological notation the way there is an "Americanist" one (that uses "y" for /j/, etc), and people writing in multiple languages (but using this school's notation) would use this letter this way? Then I would think that either needs a different label, or needs to be in the definition, a la # {{ng|The affricate sound of English ''[[j]]'', in Frenchist phonological notation.}} replacing "Frenchist" with whatever the appropriate term is. (Even if using labels in this way no longer categorizes, we probably still want to track it, because it seems like ... well, a sign of an entry which should be cleaned up ...) - -sche (discuss) 23:09, 13 April 2024 (UTC)

@-sche That's a good idea, I will create some cleanup categories or tracking pages. I think the intent was to express the idea of a "Frenchist" school of notation but I don't know if any such thing actually exists. Benwing2 (talk) 23:31, 13 April 2024 (UTC)[reply]

Maybe any label which—with the language being mul—would normally generate "Category:[language name] Translingual", could generate a cleanup/attention category instead? (Maybe any regional label + Translingual should also be monitored, e.g. {{lb|mul|Appalachia}} / "Category:Appalachian Translingual"?) If it would not be too difficult or expensive, it would also be useful to track/categorize cases of "Category:[language name] [different language name]", e.g. "Category:German Irish", another persistent type of erroneous use of labels (when people use "Germany" etc as a topic label); the few categories where that is actually correct, e.g. "Category:Vietnamese Chinese", would need to be exempted. - -sche (discuss) 23:52, 13 April 2024 (UTC)[reply]

@-sche I added tracking for any label that fails a lang restriction, which should include both types you mention above but not e.g. Category:Vietnamese Chinese, which doesn't fail any lang restriction. Benwing2 (talk) 00:08, 14 April 2024 (UTC)[reply]
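A small sketch of the restriction-plus-tracking behaviour described above may help. The label data here is made up for illustration (in particular the languages a label is allowed to categorize for), and the tracking-key format is hypothetical; only the overall shape, categorize when the language is allowed and track otherwise, is taken from the discussion.

```python
# Hypothetical label data; the allowed-language sets are invented examples.
LABEL_LANGS = {
    "French": {"en", "de"},
    "Vietnamese": {"zh"},
}
LANG_NAMES = {"mul": "Translingual", "zh": "Chinese", "en": "English", "de": "German"}


def categorize(label: str, lang_code: str):
    """Return (categories, tracking_keys) for one label on one entry.

    A label categorizes only for languages in its allowed set; any other
    use adds no category and is recorded under a tracking key instead.
    """
    if lang_code in LABEL_LANGS[label]:
        return [f"Category:{label} {LANG_NAMES[lang_code]}"], []
    return [], [f"labels/lang-restriction-failed/{label}/{lang_code}"]
```

Under these assumptions, {{lb|mul|French}} yields no "Category:French Translingual" but does show up in tracking, while a Vietnamese label on a Chinese entry categorizes normally and is not tracked.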

@-sche BTW I am skeptical there is any such thing as a Frenchist phonetic school; the IPA was created in fact by French linguists and all French dictionaries I've ever seen use the IPA. Benwing2 (talk) 00:12, 14 April 2024 (UTC)[reply]

@Benwing2, -sche: User:Kwamikagami#Initiation à la phonétique may be relevant to this. 0DF (talk) 16:14, 21 April 2024 (UTC)[reply]

I suspect that Initiation à la phonétique is idiosyncratic to that author.

The ĵ, ŝ, ꞓ, ꞓ̂, ꭒ convention is used for transcriptions of other languages. It is a "Frenchist" system, in that AFAICT it's only ever found in French-language texts. But even so, it differs from e.g. <ă> in the Merriam-Webster and Random House transcriptions found in American dictionaries in the sense that the MW characters are used in English for English. For ĵ, ŝ, ꞓ, ꞓ̂, ꭒ, they're used in French-language texts to spell out other languages phonetically. I'd naively expect that to be covered by the word "translingual": it's rendering one language in a script legible in another.

A similar situation might be ʹ. Let's suppose for the sake of argument that, in its entry for being a transliteration of the Cyrillic soft sign, it only occurs in English-language texts. (ʹ of course is not so restricted, but we may be able to find Library of Congress transliterations that are.) Should that sense then be listed under 'English'? The words it appears in are never English, only Russian etc. If we placed it under 'English', I suspect our readers would understand it to mean that it's English. And if they were searching for it to understand Russian transliteration, would they know to look for it under 'English'?

If the Americanist phonetic notation were only attested from use in English-language sources, would we reclassify all the symbols as 'English'?

If we place the 'Frenchist' letters under 'French', and discover that people writing in other languages of Francophone countries, such as Wolof, used them in books written in their languages, would we then need to move them back to 'Translingual'?

I'd think that a system designed to represent one language in the context of another shouldn't be identified as being either, but as a cross-linguistic transcription. Anyway, the reason I'd tagged it as both 'phonetic' and 'French' is that it appears to be specifically designed to be understood by readers who assign French values to the letters of the Latin alphabet. kwami (talk) 19:23, 21 April 2024 (UTC)[reply]

Here (and on the preceding page 13) it's used to transcribe 'patois' pronunciations; I don't see which "Vincelles" is being discussed, but the 'dialect' appears to be Franco-Provençal.

There's a clearer description (though the copy is too light to be easily legible) here, in a French-language description of a Greek dialect. kwami (talk) 20:09, 21 April 2024 (UTC)[reply]

Hmm, I think I see your point. But it seems very debatable to me. I do think it would be clearer to explain the situation within the definition, rather than to {{label}} it as French Translingual; something like "A symbol used to [transliterate / respell] [whatever] in[to?] French." And while I can see your argument for having it as Translingual, if it's only used in French, I still see the argument that it's just French ... and I can see the argument that it may not be the kind of thing to include at all. I mean, we don't say ш is used to transliterate/respell English sh and German sch, and we don't say ja is used to respell я, it seems like some things are considered to be nonlexical. - -sche (discuss) 21:09, 21 April 2024 (UTC)[reply]

I don't see any qualitative difference between this and IPA, NAPA or Merriam-Webster transcriptions. True, we don't give ш as the Cyrillic transliteration of German sch, and doing so would mean potentially adding a huge number of additional definitions. Since we're English WK, should we restrict transliterations and phonetic transcriptions to those used in English-language sources?

The reason I added these in particular was that we had translingual sections in these articles that didn't have any actual content, and no evidence of translingual use. I've gotten in trouble for deleting sections with no content, but didn't want to leave them in such a bad state, so I did a search for translingual use and this is what I was able to find. kwami (talk) 21:33, 21 April 2024 (UTC)[reply]

@Kwamikagami, -sche, Benwing2: This system of phonetic representation appears to be a creation of Jean-Pierre Rousselot. AFAIK, it originates in:

L’abbé Rousselot (1887) “Introduction a l’étude des patois [Introduction to the study of dialects]”, in L’abbé Rousselot, J. Gilliéron, editors, Revue des patois gallo-romans [Journal of Gallo-Romance dialects] (in French), volume I, Paris · Neuchâtel: H. Champion · Attinger Frères, §§ II: « Système graphique » et III: « Analyse des sons » [§§ II: “Graphical system” and III: “Analysis of sounds”], pages 3–17

and is recapitulated in brief here. In the light of that, may I suggest the label “Rousselot phonetic notation”? 0DF (talk) 22:48, 21 April 2024 (UTC)[reply]

@Kwamikagami @0DF Are these symbols still in use? The citations given appear to be <= 1900. Benwing2 (talk) 23:00, 21 April 2024 (UTC)[reply]

Thanks, 0DF. That label works for me.

Can you see what the difference between 'nasale' and 'demi-nasal' is? Can that be rendered in Unicode?

Benwing, I thought my original sources were a bit later, but still early 1900s. Around the era that Americanist notation was being developed. kwami (talk) 23:09, 21 April 2024 (UTC)[reply]

@Kwamikagami But AFAIK the Americanist notation is still in use, and in any case has seen quite widespread adoption; my concern here is that these symbols might be idiosyncratic to one or a few authors from a particular time period. Benwing2 (talk) 23:11, 21 April 2024 (UTC)[reply]

I assume that they're defunct, just as many Americanist symbols are. (The [Americanist] system remains in use, but with a reduced inventory that gets closer to IPA over time. Some Americanist symbols aren't even supported by Unicode.) But this system does seem to have been used by a number of authors.

Anyway, the reason I added these was to have some content in the translingual sections of those articles, not because I thought they were particularly notable. My first impulse would be to delete those sections, but I've gotten burned doing that before. I wouldn't object to them being deleted though. kwami (talk) 23:21, 21 April 2024 (UTC)[reply]

@Kwamikagami: “Nasalité” is discussed in Rousselot 1887 (15–16), but I didn't glean the difference thence (but then, my French is poor). The graphic difference between “nasale” and “demi-nasale” is extremely slight (almost nonexistent) in both Rousselot 1887 and his 1891 recapitulation. I'll have a look at Unicode tildes to see whether they encode it.

@Benwing2: I don't know whether this system is still in use, but the texts that make use of it still exist, so the information is still valuable, IMO.

0DF (talk) 00:02, 22 April 2024 (UTC)[reply]

He doesn't mention 'demi-nasal' there, though he does speak of weak nasalization, where you need to place a mirror under the nose to even tell that it's there. I don't know if that's what he meant or not, but I was more concerned about whether the text could be digitized. kwami (talk) 00:14, 22 April 2024 (UTC)

@Kwamikagami: The greater "amplitude" of the demi-nasale tilde could be represented by ◌᷑ or ◌᷉, although neither is semantically ideal. 0DF (talk) 00:09, 22 April 2024 (UTC)[reply]

There's an encoding problem with the Vietnamese "apex", which really is just a tilde; the problem is that the Unicode tilde is used as a tone mark, so something else needs to be used for the true tilde (apex). So if there were a solution to Rousselot notation, that might could be used for Vietnamese as well. An IPA diacritic is used on Wiktionary and Wikisource, but that's not ideal, and the medievalist -ur won't work for various reasons I don't fully understand. kwami (talk) 00:18, 22 April 2024 (UTC)[reply]

@Kwamikagami: Judging by this image, that apex doesn't look like a tilde. w:Vietnamese apex uses ◌᷄ for that diacritic, which is imperfect, but probably as close as can be got using Unicode. Rousselot probably just used a different typeface's tilde for the demi-nasale. Or maybe one of the tildes is actually a perispomene (◌͂). Alternatively, but rather less probably, France had already had Vietnamese territories for fifteen years at the time Rousselot devised his système graphique, so one of those tildes might even be that Vietnamese tone mark (probably not the apex though, since that had been superseded by -ng long before then). 0DF (talk) 01:09, 22 April 2024 (UTC)[reply]

The modern, wavy form of the tilde is rather late. In the era the apex was used in Vietnamese, it looked just like the tilde in Portuguese, which was called the 'apex' at the time. The Vietnamese tone mark was evidently the perispomene, and got miscoded in Unicode. kwami (talk) 01:12, 22 April 2024 (UTC)[reply]

@Kwamikagami: Yes, the serpentine perispomene is, strictly speaking, incorrect, but I've seen it many times in nineteenth-century Greek texts. 0DF (talk) 01:37, 22 April 2024 (UTC)[reply]

bullet points, usage notes and etymologies

Diff was reverted, but wasn't it correct to add the bullet point? Don't we normally bullet usage notes? (And on the topic of bullet points, don't we normally not bullet etymologies? Because I sometimes see people bullet them.) Do we have enough consensus about whether these sections should vs shouldn't be bullet-pointed to add something to WT:ELE about it? - -sche (discuss) 00:34, 14 April 2024 (UTC)[reply]

@-sche Yes, I have gone through several times and added missing bullet points to Usage notes. I thought this was the standard, and also I agree there shouldn't be bullet points in etymologies. Benwing2 (talk) 00:39, 14 April 2024 (UTC)[reply]

I agree with User:Benwing2. Ioaxxere (talk) 01:23, 14 April 2024 (UTC)[reply]

I’m not sure a bullet point is needed if there is only one usage note, but don’t mind if one is added. On the other hand, I do think that it is occasionally desirable to use bullet points in etymologies for readability, for example, if a term is partly derived from two languages. — Sgconlaw (talk) 03:23, 14 April 2024 (UTC)[reply]

@Sgconlaw: That's true, bullet points are useful if an etymology often takes the form of a list, such as at kibosh. But I think we all agree that it shouldn't be the default, so accomplished for example shouldn't have a bullet. Ioaxxere (talk) 04:54, 14 April 2024 (UTC)[reply]

@Ioaxxere Yes, exactly. Benwing2 (talk) 05:03, 14 April 2024 (UTC)[reply]

@Ioaxxere: yes, single-sentence or paragraph etymologies shouldn’t be bulleted. — Sgconlaw (talk) 05:20, 14 April 2024 (UTC)[reply]

I agree with this, kibosh is fine. If we add something to WT:ELE my suggestion, subject to wordsmithing/improvement please, would be along the lines of: either "The first sentence of an etymology should not be bulleted. Other parts of the etymology may be bulleted if they are a list." if we think lists should always be introduced by something unbulleted ("Uncertain. Theories include:", etc), or at least "The paragraphs/sentences of the etymology section should not be bulleted unless they are a list." IMO a list must also contain more than 1 item, a "1-item list" is not a list and should just be unbulleted; for bullet points to be used, there should be multiple items IMO. - -sche (discuss) 15:20, 14 April 2024 (UTC)[reply]

FWIW, as of a database dump from last August (what I had handy), 3,834 pages contained Etymology===\n\* like abdominothoracic (very many of them added by just one user), and 24,770 pages contained Usage notes====\n[A-Z] like Abrahamic. I did this just to quickly get a rough figure, it does not account for numbered etymology sections or people using Usage notes at L3 or L5, and some pages have probably changed since August, to stop having the problem or to newly have the problem. Iff we agree on fixing these, maybe a bot (operating from a more up-to-date list) could remove the bullet from any etymologies where there is only one "item"/sentence/paragraph (and any etymologies like this, where the first item is bulleted and the second item is a bulleted affix template?), and we could see whether what's left is a small enough number to go through by hand? (In case some instances of multiple bullet points are OK.) - -sche (discuss) 15:21, 14 April 2024 (UTC)[reply]
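A rough sketch of the bot pass suggested above, in Python. It removes the bullet only when an Etymology section consists of a single bulleted line and leaves everything else for manual review. The function name and the section-matching regex are illustrative assumptions, not an existing bot's code, and the affix-template case mentioned above is not handled.

```python
import re

# Assumption: an Etymology heading (any level, optionally numbered) followed by
# its body, which runs until the next line that starts with "=" or end of text.
ETYM_SECTION = re.compile(
    r"^(=+)Etymology(?: \d+)?\1[ \t]*\n(.*?)(?=^=|\Z)",
    re.MULTILINE | re.DOTALL,
)


def _fix(match):
    body = match.group(2)
    content = [ln for ln in body.splitlines() if ln.strip()]
    # Only a lone top-level bullet counts as a "1-item list".
    if len(content) == 1 and re.match(r"\*(?![*:#])", content[0]):
        heading = match.group(0)[: match.start(2) - match.start()]
        return heading + re.sub(r"^\*[ \t]*", "", body, count=1, flags=re.MULTILINE)
    return match.group(0)


def unbullet_single_item_etymologies(wikitext: str) -> str:
    """Strip the bullet from etymology sections that hold exactly one line."""
    return ETYM_SECTION.sub(_fix, wikitext)
```

Sections with two or more non-empty lines come back unchanged, so genuine lists (and anything ambiguous) would still need a human to look at them.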

I would support this. Benwing2 (talk) 21:16, 14 April 2024 (UTC)[reply]

Noting the vast number of unbulleted usage notes, I do not support bulleting them universally. It would be interesting to compare the number of occurrences of Usage notes==+\n[^*] with Usage notes==+\n\*. I suspect the former will strongly prevail but I'm open to being proven wrong. This, that and the other (talk) 22:53, 14 April 2024 (UTC)[reply]

As of the database dump above, 26,925 mainspace pages (and 27,092 pages overall) contained Usage notes====\n\*, e.g. var gücüyle. 24,460 mainspace pages (and 24,770 pages overall) contained Usage notes====\n[A-Z], e.g. în ruptul capului. (88 mainspace pages contained Usage notes====\n(1|2|3|4|5|6|7|8|9|0|\[), e.g. tetranary and himmel, and I hope the number of pages where the usage notes start with bare non-English text not inside a template, or start with a lowercase letter, is negligible since both of those seem like issues to be cleaned up. I haven't provided numbers of pages where usage notes start with { because those are often templatized usage notes, and some templates provide bullet points and others don't, which AWB can't determine from scanning a database dump.) I'm surprised it's even that close, I thought bullet points were the standard... well, I won't add anything to ELE about bulleting or not bulleting usage notes without a wider discussion (with more input than this one has gotten), then.
BTW, regardless of whether they use bullet points or not, 721 mainspace pages were using L3 usage notes (no mainspace pages were using L2 usage notes); many should be L4. - -sche (discuss) 23:51, 14 April 2024 (UTC)[reply]
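For anyone wanting to rerun these counts on a newer dump, here is a minimal Python sketch of the tally. It assumes the dump has already been loaded into a dict mapping page title to wikitext (that loading step is not shown), and it uses the looser `==+` heading match rather than the exact `====` used for the figures above, so numbers will differ slightly. The labels and function name are made up for illustration.

```python
import re
from collections import Counter

# One pattern per style of usage-notes opening discussed in this thread.
PATTERNS = {
    "bulleted": re.compile(r"Usage notes==+\n\*"),
    "unbulleted_capital": re.compile(r"Usage notes==+\n[A-Z]"),
    "digit_or_bracket": re.compile(r"Usage notes==+\n[0-9\[]"),
    "templatized": re.compile(r"Usage notes==+\n\{"),
}


def tally_usage_note_styles(pages):
    """Count pages (not occurrences) matching each pattern at least once."""
    counts = Counter()
    for text in pages.values():
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[label] += 1
    return counts
```

A page with both a bulleted and an unbulleted usage-notes section is counted under both labels, which matches how a per-pattern page search behaves.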

@This, that and the other: Could you explain your objection to bulleting usage notes, please? I also thought bulleting was the standard. 0DF (talk) 15:59, 21 April 2024 (UTC)[reply]

Bullets should be used for lists of similar items, such as:

derived and related terms lists
your shopping list
lists exactly like this one.

Usage notes are not inherently listlike. They do not consist of "items". Rather, they are written using one or more full sentences, so it feels more natural to present them as distinct paragraphs. This also helps to visually distinguish the usage notes from the numbered lists of senses and bulleted lists of terms that generally surround them. This, that and the other (talk) 23:41, 21 April 2024 (UTC)[reply]

@This, that and the other: That makes sense. Would you agree with usage notes being bulleted when there are multiple topics in the section (i.e. when they're actually usage notes, rather than a single usage note)? 0DF (talk) 01:14, 22 April 2024 (UTC)[reply]

I think TTO's comments make sense except that I think multiple usage notes are a lot more list-like than paragraph-like, since they're generally unrelated to each other. Benwing2 (talk) 01:16, 22 April 2024 (UTC)[reply]

When we say ban bulleted etymologies, what do we mean? Would kobieta have to not include bullets? Vininn126 (talk) 16:12, 21 April 2024 (UTC)[reply]

As far as I can tell, all of the people who've commented here on the question of bulleting etymologies have said bulleted lists are fine but non-lists like accomplished are not. Is this being discussed somewhere else where someone has proposed a blanket "ban"? - -sche (discuss) 17:18, 21 April 2024 (UTC)[reply]

This seems fine to me, then. Vininn126 (talk) 17:32, 21 April 2024 (UTC)[reply]

User:TTObot

...is clearly (and openly) a bot operated by User:This, that and the other. The bot seems to be running some maintenance tasks of a few Wiktionary: namespace pages that contain lists for various tasks. While the user is trusted and the bot doesn't seem to be doing anything that dangerous, it should still in principle have the bot flag, but it does not, nor does it seem a vote was ever started to even obtain one. — SURJECTION ^{/ T / C / L /} 18:00, 14 April 2024 (UTC)[reply]

Well, by the reasoning in WT:BOT, bots need vote approval and oversight to keep them from running amok and leaving messes behind that nobody cleans up. That is not an issue here, since the bot has only edited list-type pages that are not expected to be created manually in the first place. Fay Freak (talk) 19:54, 14 April 2024 (UTC)[reply]

I still think it would be better to start a bot vote; I'm sure it won't have any problem passing. Benwing2 (talk) 21:15, 14 April 2024 (UTC)[reply]

@Surjection yes, in the end I suppose it should have the flag, and that requires a vote: Wiktionary:Votes/bt-2024-04/User:TTObot for bot status. Don't have a lot of time right now to put the finishing touches on it though. This, that and the other (talk) 22:50, 14 April 2024 (UTC)[reply]

@This, that and the other Thanks for creating the vote. Note that bot votes are generally only a week in length, so I think it's fine for you to shorten the vote to one week. Benwing2 (talk) 00:44, 16 April 2024 (UTC)[reply]

@Benwing2 the vote page template defaulted to 14 days, so I just rolled with it. I think we changed from 7 to 14 days when we doubled the length of votes a few years ago. This, that and the other (talk) 12:13, 16 April 2024 (UTC)[reply]

@This, that and the other I see, no problem. Benwing2 (talk) 20:40, 16 April 2024 (UTC)[reply]

Etymology tree vote

The etymology tree testing thread, which has run for two weeks now, has achieved good results with several minor bugs or problems being fixed [note 1] and every commenter being happy with the output. Therefore I feel that I'm ready to put the template up for a vote to establish consensus on whether it can be used in mainspace. I think that each language community should establish its own policy on where etymology trees may be used, which may well be "nowhere". [note 2] However, before I start any vote I would like to address some objections that were made in last month's thread.

@Benwing2 said that we should avoid duplicating information between the template and the etymology sections, and this is something I agree with. It's just that our current etymology sections are fairly inefficient at representing information and can be automated to a significant extent. One of the things I've been working on is automatic text generation, which you can see here.

@Victar commented that we would need "a very complex module, with ways to mark derivational types, certainty, alternatives, mergers, etc." All of these have been integrated into Module:etymon.

So I invite the community to discuss whether the module is ready for a vote, and if so, how it should be worded.

[note 1] Specifically: a) unbolded transliterations, b) fixed overflowing text, c) decreased font size, d) fixed visual bug on Firefox
[note 2] Note that {{etymon}} can run in a "silent mode" which produces no visible output, but passes along information to other entries. This could be added anywhere.

Ioaxxere (talk) 20:17, 14 April 2024 (UTC)[reply]

Edge cases may be difficult; some concepts in etymology, like Wanderwörter, are not as easy to convey (reference @benwing2). But it seems you should vote on it. It will also be difficult to display doublets or cognates. ADDSamuels (talk) 01:49, 15 April 2024 (UTC)[reply]

P.S: Should be voted as an opt-in additional. ADDSamuels (talk) 13:48, 15 April 2024 (UTC)[reply]

Will we have a policy of eliminating lies, such as the claim that most Indic lects are descended from Sanskrit as normally understood and as usually described by Wikipedia? (A solution is to direct the user to WT:About Sanskrit instead of to w:Sanskrit. Dubious edits on Wikipedia to promote the Wiktionary definition of 'Sanskrit' seem to have disappeared.) --RichardW57m (talk) 10:14, 15 April 2024 (UTC)[reply]

our current etymology sections are fairly inefficient at representing information and can be automated to a significant extent.

I think that's a problem that needs to be solved before deploying this. For this to be useful, it needs to be widespread, and for it to be widespread it needs an automated way to be integrated into most of the existing etymologies plus manual effort to integrate it where it can't be added automatically. How can we replace the existing information in the etymology sections with this template? Do you have a parser to convert existing etymology data to this template? How many etymologies can be automatically replaced and how many will need manual work? JeffDoozan (talk) 13:49, 15 April 2024 (UTC)[reply]

I think the biggest hurdle would be the fact that each etymology would need an ID. Those with {{etymid}} already present could just give that to the template, but those without could not. You could even probably generate IDs for most entries that only have 1 etymology (i.e. affixed adverbs and the like) and even only a few definitions, but you'd need a way to generate the actual ID name. I foresee a major hurdle with entries with many etymologies/definitions. Vininn126 (talk) 14:09, 15 April 2024 (UTC)[reply]

@JeffDoozan: User:Vininn126 is right in that assigning IDs is a significant challenge. Basically, if an entry says "borrowed from X", does it mean X (etymology 1) or X (etymology 2)? Sometimes the ID is specified, but not often enough to be very useful. However, I have been working on getting structured etymological data from Wiktionary. Here's a sample:

camarade#French>0 camaraderie#French>0
thio-#English>0 thiocresol#English>0
Reconstruction:Proto-Germanic/watrōną>0 water#English>2
flop#English>1 flopover#English>0
Conkright#English>0 Conkrights#English>0
Reconstruction:Proto-Indo-European/mey->3 Reconstruction:Proto-Italic/moinos>0

(>0 means that the entry only has a single etymology section)
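To make the sample format concrete, here is a small Python sketch that parses one such line into two nodes. It assumes each node is written as `title#Language>N` (with `#Language` absent for Reconstruction pages, as in the sample) and that the two nodes are separated by a single space. Reading the first node as the etymon and the second as the derived term is a guess from the sample, and the `Node` type and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Node(NamedTuple):
    title: str               # page title, e.g. "water"
    language: Optional[str]  # None when the line gives no "#Language" part
    etym_index: int          # 0 = the entry has a single etymology section

# Lazy first title so that titles containing spaces still split correctly.
EDGE = re.compile(r"^(.+?)>(\d+) (.+)>(\d+)$")


def parse_node(ref: str, index: str) -> Node:
    title, sep, language = ref.partition("#")
    return Node(title, language if sep else None, int(index))


def parse_edge(line: str):
    """Return (first_node, second_node) for one line of the sample format."""
    m = EDGE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised edge line: {line!r}")
    return parse_node(m.group(1), m.group(2)), parse_node(m.group(3), m.group(4))
```

For example, `parse_edge("flop#English>1 flopover#English>0")` yields a node for etymology 1 of English flop and a node for the single-etymology entry flopover.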

With that said, I don't think mass replacement needs to be a priority.

Ioaxxere (talk) 14:41, 15 April 2024 (UTC)[reply]

For what it's worth, I support the idea of this template. I think it will attract a lot of readers. Having asked various people online (hearsay evidence, I know, it would be nice to have an official account where we can poll people on other platforms...), they seem to generally love this. Probably not possible at the moment to deploy it large-scale, but is there anything stopping one from adding it to pages manually? Vininn126 (talk) 08:53, 16 April 2024 (UTC)[reply]

Support. Binarystep (talk) 10:12, 20 April 2024 (UTC)[reply]

@Ioaxxere: The family-tree–style presentation seems wasted on linear descent, as in the case of father. How does this handle branching derivation and cognates or other relations? I see real promise there. 0DF (talk) 02:10, 22 April 2024 (UTC)[reply]

@0DF: You can see a more branched example at Wiktionary:Beer parlour/2024/March#New design (note: it's a mockup, but the real thing looks very similar). Cognate generation is probably not going to happen (per #Automatic cognate generation). Ioaxxere (talk) 02:50, 22 April 2024 (UTC)[reply]

@Ioaxxere: Oh yeah, that looks great! Re cognates, I believe your respondents in that section were all under the impression (as I was) that you were proposing a template that would generate mere lists of cognates; IMO, presenting with a table like the one you present in this section would be an entirely different, and considerably more attractive, one. 0DF (talk) 03:57, 22 April 2024 (UTC)[reply]

@Ioaxxere I see you went ahead and created a vote. I don't think this proposal is ready for a vote yet. Benwing2 (talk) 03:04, 27 April 2024 (UTC)[reply]

Ordering entries differing in lexicographic spelling

Do we need a policy on ordering the entries of the forms of the same lemma when they appear on the same page? For example, Latin nigra and nigrā are two different entries on the same page, and are different case and gender forms of the same lemma, Latin niger, which is on a different page. Indeed, should such forms have different entries? I see that Latin mala (“bad”) and malā (“bad”) share the same entry, the writing with the macron only being distinguished as a label for the pronunciations! I may not even be consistent myself - I may have done the entry for Lithuanian Alfrede wrong, as the two locative singular forms correspond to different citation (nominative singular) forms. --RichardW57m (talk) 14:53, 15 April 2024 (UTC)[reply]

Recent changes to the citation templates

User:JeffDoozan has been making a lot of changes to the citation templates as of late with their bot, AutoDooz. Some have introduced errors, but the change that I have to object to above all is changing parameter |1= to |lang=, and despite me bringing this up on their talk page, they went ahead with their script today, regardless. The rationale given for this change is that it unifies the citation templates with the quotation templates, but this argument is fallacious because these templates have very different purposes. For one, we only specify the language of a work if it isn't English, which, mind you, isn't done at all in other citation formats, e.g. Harvard, Oxford, etc. Secondly, many journals are in multiple languages, and other works, like lists, are in no language at all. Making language a mandatory field for citation templates is all around a bad idea and was done with no community consensus. @Benwing2 -- Sokkjō 18:04, 15 April 2024 (UTC)[reply]

Including a language is not mandatory, as discussed here. Works in multiple languages can use a comma-separated list of language ids in either |1= or |worklang=. JeffDoozan (talk) 18:12, 15 April 2024 (UTC)[reply]

But it is mandatory, because if you use the template with just the numbered fields, you have to at the very least leave the field blank. -- Sokkjō 18:18, 15 April 2024 (UTC)[reply]

@Sokkjo That means it's not mandatory, then. I'm sure you'll be able to cope with leaving a parameter blank - it's only one extra button press. Theknightwho (talk) 23:42, 15 April 2024 (UTC)[reply]

Definitely this should have been discussed more before implementation. However, it's not clear to me it's wrong. It is a bit strange to have an optional langcode parameter in |1= but I do understand on the one hand the desire to harmonize the interfaces of quote-* and cite-* and on the other hand the fact that we don't categorize citations, so it's not strictly necessary to have the language specified for all citations. I actually think having numbered params (other than something like |1= for a language code) for these templates, whether quote-* or cite-*, is a bad idea, because they're hard to make sense of when reading the wikitext and highly error-prone when creating the wikitext (esp. since there are a lot of them and every citation template is different from every other one). I proposed eliminating numbered params for the quote templates but some people objected, so I just eliminated them on the less-used ones and kept them e.g. for {{quote-book}} and {{quote-journal}}. Yes it requires a bit more typing to use named params but inserting a quotation or citation takes a lot of typing anyway, so the net effect isn't that much. Benwing2 (talk) 00:42, 16 April 2024 (UTC)[reply]

@Benwing2: What other users and I do is use the numbered parameters for quick inline citations, i.e. {{cite-book|year|author|title}}. Having language as a first parameter would mean we would have to do {{cite-book||year|author|title}}, leaving a blank field, which is very prone to error, either from people forgetting to do so, or from people misinterpreting it as a typo. Certainly when creating a reference template, I use full parameter names. -- Sokkjō 01:48, 16 April 2024 (UTC)[reply]
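A toy Python illustration of the failure mode described above: how the numbered fields shift when the leading blank is forgotten. The field order (language, then year, author, title) is taken from the example in this comment, not from the actual template code. The splitting is deliberately naive (it would break on nested templates or piped links), and both function names are invented for the sketch.

```python
def positional_args(template_text: str):
    """Naively split '{{name|a|b|c}}' into its unnamed arguments."""
    inner = template_text.strip()[2:-2]
    _name, *args = inner.split("|")
    # Assumption: anything containing "=" is a named parameter and is skipped.
    return [a for a in args if "=" not in a]


def bind_positional(args, language_first: bool):
    """Pair unnamed arguments with field names under one of two layouts."""
    names = (["lang"] if language_first else []) + ["year", "author", "title"]
    return dict(zip(names, args))
```

With the language-first layout, `{{cite-book||1999|Smith|A Title}}` binds as intended, whereas `{{cite-book|1999|Smith|A Title}}` would put "1999" in the language slot and push every other value one field along.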

@JeffDoozan, so even though we label each parameter in citation templates, you moved all the instances of |lang= to |1=? Why? @Benwing2 -- Sokkjō 23:55, 21 April 2024 (UTC)[reply]

For consistency with the quote templates. JeffDoozan (talk) 23:56, 21 April 2024 (UTC)[reply]

And why did you unilaterally decide the templates should be unified to |1= and not |lang=? -- Sokkjō 00:07, 22 April 2024 (UTC)[reply]

What is true about it is that the text in langcode is in less than optimal places, being placed after journal names, while we always encode the language of the journal piece. Fay Freak (talk) 18:24, 15 April 2024 (UTC)[reply]

Multiple Quotations

Reading WT:ELE in response to the call to bullet usage notes, I noticed the following text:

Quotations are generally placed under the definition which they illustrate. If there is more than one being provided, or where this is not possible (e.g., a very early usage that does not clearly relate to a specific sense of the word), a separate section should be used.

Are we meant to take this seriously? If so, is there any guidance on how (or whether) to relate quotations to senses? Or any good examples of so doing? --RichardW57m (talk) 14:59, 16 April 2024 (UTC)[reply]

My preference (and I don't think I'm the only one) is to eradicate ====Quotations==== sections by assigning the quotations to senses or, if it's truly unclear whether they use any of the definitions we're providing, then moving them to the cites page — how weird it would be if quotations of senses that don't meet CFI were given extra prominence with their own section in the main entry! If ELE says that merely having more than one quotation means they should not go under the definition anymore, that definitely needs to be changed, yikes! We have so many entries where a definition is supported by more than one quotation under it... :o (prior vote btw) - -sche (discuss) 16:26, 16 April 2024 (UTC)[reply]

100% agreed. ===Quotations=== sections are unhelpful, even more so than ===Synonyms=== and the like because at least the synonyms sections (sometimes) identify which sense the given terms are synonyms of. Benwing2 (talk) 18:47, 16 April 2024 (UTC)[reply]

I agree. — Sgconlaw (talk) 18:57, 16 April 2024 (UTC)[reply]

@-sche: Most lemmas don't have any quotations, though a high proportion (very roughly half) are backed up by dictionaries. (And most entries are non-lemmas without quotations.) I actually found it quite hard to find senses with two or more quotations. I suspect it's mostly those senses that have been challenged that have multiple quotations. For Pali I'd been working on the principle of keeping the best set of three with the sense, but I may not have had any extras that seemed worth putting on a citations page. I think the number three may simply have been the WDL requirement. --RichardW57m (talk) 16:56, 17 April 2024 (UTC)[reply]

@RichardW57m I think you're right about this; senses that are obviously attestable often get only one quotation to illustrate them, but those that may be challenged are more likely to get three. Benwing2 (talk) 21:36, 17 April 2024 (UTC)[reply]

Tibetan script for Japhug

Currently, our Japhug entries are using IPA transcription as the main script. This follows almost all publications on (Kamnyu) Japhug, all of which are by Guillaume Jacques (and his co-authors). Guillaume Jacques' grammar on the language does mention: "The IPA orthography chosen to write Japhug in this grammar is probably not viable for use by native speakers, and an alternative writing system based on Tibetan script is preferable." The grammar does have a page and a little on how Tibetan script can be used for writing the language, though there aren't really many details given. After some discussion on Discord with @Thadh, we thought it best to bring it here to see if people have thoughts on how to proceed. To me, there are two possibilities. First is to keep the status quo and continue using IPA for lemmatization; Tibetan script could be added perhaps automatically to entries beside the headword. The second option would be to use Tibetan script for lemmatization, perhaps with IPA transcription as the romanization that appears beside headwords. — justin(r)leung _{{ (t...) | c=› }} 15:34, 16 April 2024 (UTC)[reply]

(I personally highly prefer the latter option). Thadh (talk) 15:36, 16 April 2024 (UTC)[reply]

If the Tibetan script is not attested, then using it only reinforces Wiktionary as a den of fabrications. --RichardW57m (talk) 15:51, 16 April 2024 (UTC)[reply]

@RichardW57m: It's not us who developed the script, it's described in the grammar. And the language itself is not written at all, it's only attested in grammars and scientific papers. If anything, people are going to thank us for using something that even resembles human language rather than a scientific transcription. Thadh (talk) 15:59, 16 April 2024 (UTC)[reply]

People could also say that we're making things up with no proof, and we're chauvinists assuming that a language uses this script. We need to find people writing the language down, and see what script they use. CitationsFreak (talk) 16:26, 16 April 2024 (UTC)[reply]

The language uses no script. In my opinion it's better to use some script (especially when it is proposed already!) than use no script. If at some point we hear from the speakers that they developed an alternative orthography, we can always switch to that. Thadh (talk) 16:31, 16 April 2024 (UTC)[reply]

@Thadh: Is Japhug in the Tibetan script used to convey meaning? If not, the words in the Tibetan script don't meet CFI. And it doesn't seem to meet the principle of independence, so being an LDL should be irrelevant. --RichardW57m (talk) 16:27, 16 April 2024 (UTC)[reply]

What are you even talking about? Thadh (talk) 16:29, 16 April 2024 (UTC)[reply]

If the language is not normally written, and the only orthography used when it is written (or written about, by scholars) is the IPA-ish transcription, then it seems like creating unnecessary hurdles to require people to figure out how to translate the words as they are actually written into someone's speculative "maybe if people wrote this in Tibetan script, they'd write it like this" orthography. I say we just continue using the attested (IPA-ish) orthography, until such time as a Tibetan orthography actually becomes used. - -sche (discuss) 16:31, 16 April 2024 (UTC)[reply]

Lemmatising at a practical orthography is standard practice for dictionaries. I think using IPA would be a grave mistake, and would also give out a "we don't really care" attitude. Thadh (talk) 16:35, 16 April 2024 (UTC)[reply]

I agree with User:-sche here. Lemmatizing at what is at this point essentially a con-script seems worse in many ways than using the IPA that publications actually use. Benwing2 (talk) 18:44, 16 April 2024 (UTC)[reply]

@-sche: "such time" - If ever. Population 3,000 and characterised as 'vulnerable'. --RichardW57m (talk) 16:04, 17 April 2024 (UTC)[reply]

Thanks for all your input. I sense that the general sentiment is to stick with the status quo and use IPA transcription for lemmatization purposes. I am sympathetic towards this option as well. (Sorry, Thadh!) The question now is whether to show a possible Tibetan orthography in entries at all, based on the schema described in Guillaume Jacques' grammar. — justin(r)leung _{{ (t...) | c=› }} 00:28, 18 April 2024 (UTC)[reply]

{{surf}}’s up

(Previous discussion here.)

What do people think of introducing a long-term replacement like {{sync}} ‘synchronically derivable from X + Y’? This is, in my view, the most precisely-defined proposal thus far, and that exact phrase is used in the linguistic literature. Nicodene (talk) 02:12, 17 April 2024 (UTC)[reply]

Because we don't want to make it easy for new normal users? I don't think synchronic or its morphs are part of the idiolect of more than 5% (1%?) of normal users. What we mean by surface analysis is not much better, but at least the words are understandable. Superficially would be much clearer, though it is too (negatively) evaluative. DCDuring (talk) 12:44, 17 April 2024 (UTC)[reply]

Solvable by simply linking to a glossary entry that explains the meaning. I doubt most visitors would know what a determiner is either, yet it would hardly be a reason to remove the correct linguistic term in favour of a fictitious alternative. Moreover our using a fictitious term gives readers the misleading impression that it actually exists. Nicodene (talk) 18:57, 17 April 2024 (UTC)[reply]

Not a real solution, just an excuse. It's like fine-print or shrink-wrap (non)disclosure IRL. DCDuring (talk) 13:20, 18 April 2024 (UTC)[reply]

Not an argument, unless you're to propose deleting ‘confix’, ‘determiner’, ‘deverbal’, etc. Nicodene (talk) 22:25, 18 April 2024 (UTC)[reply]

I generally support this but I'd like to hear how you propose to convert things over. Can we just replace all uses of {{surf}}? If not we will potentially get a messy situation with both {{surf}} and {{sync}} coexisting indefinitely. Benwing2 (talk) 23:34, 17 April 2024 (UTC)[reply]

Step one: {{surf}} is deprecated, step two: a bot converts {{surf|X|Y}} to {{sync|X|Y}} in cases where X and Y combined have the same spelling as the entry? (Allowing for the loss of a vowel at the end of X.) That should avoid most of the synchronically invalid combinations. Whatever is left will require human attention, unfortunately. I can take care of various European languages at least. Nicodene (talk) 05:24, 18 April 2024 (UTC)[reply]
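The spelling test in step two could look roughly like this in Python. It only decides whether X and Y concatenate to the entry's spelling, optionally dropping one final vowel from X. Parsing the template call out of the wikitext and doing the replacement are left out. The vowel set is an assumption that suits English-like orthographies and would need adjusting per language, and the function name is invented.

```python
# Assumption: plain Latin-script vowels; other languages would need their own set.
VOWELS = set("aeiou")


def is_spelling_transparent(entry: str, x: str, y: str) -> bool:
    """True if X + Y spells the entry, allowing loss of a final vowel of X."""
    left, right = x.strip("-"), y.strip("-")  # drop affix hyphens
    if left + right == entry:
        return True
    if left and left[-1].lower() in VOWELS and left[:-1] + right == entry:
        return True
    return False
```

Under this check boldly (bold + -ly) and lovable (love + -able) would pass, while again (on- + gain) and husband (house + bond) would fall through to the manual list.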

Note that {{sync}} already exists as a redirect to Template:syncopic form. (Personally, I think "sync" seems too much like the name for a template for syncing entries, and I'd rather find another short name for both "syncopic" and "synchronic" if possible.)
IMO, if we want to change the name or wording of {{surf}}, I don't know if it makes a difference (in the long run) whether we slowly change uses of one over to the other, or just move (redirect) the existing template to the new name and update the wording (and potentially switch all uses over to the new main name all at once). As far as I have been able to tell, even going by our glossary definitions of these things (which say a surface analysis is a synchronic one), the places {{surf}} should be used and the places "{{sync}}" should be used seem to be identical. It's true that {{surf}} is currently used in some places it should not be, e.g. again says "By surface analysis, on- +‎ gain (“against”)" which is wrong (no speaker who is unaware that again comes from Middle English thinks that it looks on the surface like it was formed in modern English by combining "on-" and "gain"! this should be a "For more, see..." set of links or something)... but A) such improper uses need to be cleaned up whether we move/rename/switch the template or not (and basically the same process that Nicodene outlines for switching between templates would work even if just making a list of all uses of {{surf}} and progressively removing valid ones, or invalid ones after correcting them), and B) like people use {{surf}} in wrong places, people would undoubtedly also use {{sync}} in wrong places, particularly because DCDuring is right that "synchronic" would be an even more opaque name to many people (and the short name "sync" just makes it seem like a template to sync entries or sync links to entries or something), so if they see {{sync}} in entries, I expect the main takeaway for at least some people will just be "this is the way to link whatever words are closest to being the parts that make up the word in question", and that's how they'll use it, even in places we don't want it used, like again.
That itself doesn't necessarily mean we shouldn't use "synchronic"; lots of other words we use are unfamiliar to laypeople, as Nicodene says; but it means I wouldn't expect renaming the template to be any help as far as making people only use it in the right places. I am actually sympathetic to DCDuring's point that the word clearest to laypeople would probably be "superficial", and indeed it's not hard to find linguistics literature discussing "superficial analysis" of how words are formed vs their actual etymologies, so maybe we should consider that. Heck, what if we made the wording something like "Superficially or synchronically derivable from x + y"? (We could probably even make it so that users could optionally suppress seeing "superficially or", in the same way people can opt out of or opt in to seeing {{,}}, so the 'clearer' word would be shown to laypeople, but logged-in linguists could choose to only see "synchronically".) - -sche (discuss) 16:41, 18 April 2024 (UTC)[reply]

@Nicodene: I agree that we should try to reduce ambiguity but as others have said we should be avoiding jargon. How about this wording?

boldly -> "Analyzable as bold +‎ -ly." (synchronic)
outrage -> "Etymologically equivalent to ultra +‎ -age." (diachronic)

The template names could be {{etyeq}} and {{anz}}.

Ioaxxere (talk) 16:50, 18 April 2024 (UTC)[reply]

Well, there is another way to avoid jargon.

It is worth noting that the ‘correct’ (synchronically valid) context for using {{surf}} is effectively identical to the correct context for using {{affix}} and {{compound}}. (The only difference seems to be that {{surf}} has been used alongside derivations from other languages, as in ‘from French kilomètre, by surface analysis kilo- + metre’. Changing the wording of {{affix}} and {{compound}} to ‘composed of X + Y’ would cover these and all other use-cases, I think.) That means we could clean up the whole mess like this:

1) Deprecate {{surf}}.

2) Have a bot replace {{surf}} with {{affix}} or {{compound}} using orthography as a guide. (If no affixes are involved, the bot assumes it's dealing with a compound.) The remaining transclusions are dumped into a list.

3) (Optional:) if said list is quite long, identify repeating patterns and have a bot clean those up. For instance if a Latin noun lemma ends in x, derivatives will often have ci or gi instead (pacificus < pax + -ficus).

4) Manually review whatever is left, assigning {{affix}} or {{compound}} where appropriate.
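Steps 2 and 3 can be sketched in Python as follows. The first helper picks a template name purely from whether any part carries an affix hyphen, as step 2 describes. The other two try the Latin x to ci/gi alternation from step 3. All three function names are hypothetical, and the alternation rule is only the single pattern mentioned above, not a general treatment of Latin stems.

```python
def choose_template(parts):
    """Step 2: a hyphen on any part signals an affix; otherwise assume a compound."""
    has_affix = any(p.startswith("-") or p.endswith("-") for p in parts)
    return "affix" if has_affix else "compound"


def latin_x_stem_variants(base: str):
    """Step 3: for a lemma ending in x, offer the ci- and gi- combining stems."""
    if base.endswith("x"):
        return [base[:-1] + "ci", base[:-1] + "gi"]
    return []


def matches_with_x_alternation(entry: str, base: str, suffix: str) -> bool:
    """True if some x-stem variant of the base plus the suffix spells the entry."""
    tail = suffix.strip("-")
    return any(variant + tail == entry for variant in latin_x_stem_variants(base))
```

So pacificus (pax + -ficus) would be picked up by the pattern pass, and whatever still fails both the plain spelling check and such patterns would go to the manual review in step 4.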

That leaves us with etymological relations like husband < house + bond. These can either be deleted outright (they are a bit silly, really) or else assigned to a new template, as @Ioaxxere describes. In the latter case I would favour a wording like ‘etymologically corresponds to X and Y’ to avoid implying any kind of synchronic validity. Nicodene (talk) 23:01, 18 April 2024 (UTC)[reply]

@Nicodene There's actually no need to use {{compound}} as {{affix}} handles compounds already. However I don't think it will work to just replace {{surf}} with {{affix}}; {{surf}} includes additional text to note that it's "surface etymology" aka synchronic. Replacing it as proposed will remove that information and lead to etym sections that don't read properly. However, I agree that cases like "husband < house + bond" should just be deleted; I don't really see the point. Benwing2 (talk) 23:05, 18 April 2024 (UTC)[reply]

Yes the conversion is a bit tricky because of the difference in wording.

From a look through the transclusions of {{affix}} I see we have the same diachronic issues there as we do with {{surf}}. One of the first transclusions of {{affix}} that comes up is the (rather brave) month < moon + -th.

Perhaps we can begin our cleanup with {{affix}}? The procedure would be something like this:

1) Change the wording to ‘composed of X + Y’. (Compatible with the use-cases of {{surf}}.)

2) Have a bot run through the transclusions of {{affix}}, removing preceding text like ‘from’/‘equivalent to’ and dumping entries where X + Y ≠ ⟨spelling of lemma⟩ into a list for further review.

Nicodene (talk) 23:30, 18 April 2024 (UTC)[reply]

@Nicodene I think {{af}} should not be preceded by any text. If we want a version of {{af}} preceded by text, it should be a separate template. We already ran down this road with {{bor}} and {{inh}}, which at one point had preceding text "Inherited from" and "Borrowing from" (note, not "Borrowed from" as you'd expect) and was later switched to not have that text. So I would advocate a separate template to express surface/synchronic derivations — just like we already have, except maybe it should be renamed and the wording corrected. Benwing2 (talk) 01:11, 19 April 2024 (UTC)[reply]

By analogy with those we would have {{af+}}, no? That would work fine. Nicodene (talk) 01:18, 19 April 2024 (UTC)[reply]

Yes, that would be fine with me too. Benwing2 (talk) 01:29, 19 April 2024 (UTC)[reply]

Great. Since a cleanup of {{affix}} transclusions doesn't require any template change, perhaps that can be the testing ground? Not sure how effective the aforementioned ‘orthographic method’ will be, even if the bot allows for the loss of a vowel at the end or beginning of a morpheme and accounts for alternations like x~ci/gi. If I had a list of flagged words I could comb through it, looking for additional patterns to teach to the bot until it can cut the list down to a size that humans could deal with manually. Nicodene (talk) 01:43, 19 April 2024 (UTC)[reply]

Can you clarify with some examples what exactly you'd want done in a testing-ground bot run? Benwing2 (talk) 02:53, 19 April 2024 (UTC)[reply]

A list of {{affix}} transclusions where the combined components do not orthographically make the lemma (moon + -th = *moonth ≠ month). Excluding cases where there is a discrepancy because a written vowel is lost (surprising < surprise + -ing) or x alternates with c(i)/g(i) (vocifer < vox + -fer).

Then I look through the list and find other rules for things to exclude (e.g. ad- + lumino = allumino, because /d/ in that prefix tends to assimilate). The goal is, eventually, to cut the list down to only invalid cases like ‘month = moon + -th’. Nicodene (talk) 05:56, 19 April 2024 (UTC)[reply]
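The flagging rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, only the loss of a final written vowel and the x ~ c(i)/g(i) alternation are modelled (vowel loss at the beginning of a morpheme is not), and the input is assumed to be a lemma plus the list of components taken from an {{affix}} call.

```ruby
# Hypothetical helper: remove the affix hyphens ("-th" -> "th", "bin-" -> "bin").
def strip_hyphens(part)
  part.gsub(/\A-|-\z/, "")
end

# Candidate spellings of one component. Non-final components may lose a
# final written vowel, or show c/ci/g/gi in place of a final x.
def variants(part, last)
  forms = [part]
  unless last
    forms << part[0...-1] if part =~ /[aeiou]\z/
    if part.end_with?("x")
      stem = part[0...-1]
      forms.concat([stem + "c", stem + "ci", stem + "g", stem + "gi"])
    end
  end
  forms
end

# True if some combination of component variants spells the lemma exactly.
def orthographically_composable?(lemma, components)
  parts = components.map { |c| strip_hyphens(c) }
  candidates = [""]
  parts.each_with_index do |part, i|
    last = (i == parts.length - 1)
    candidates = candidates.product(variants(part, last)).map(&:join)
  end
  candidates.include?(lemma)
end

# Entries for which this returns false would go into the review list.
puts orthographically_composable?("month", ["moon", "-th"])
puts orthographically_composable?("surprising", ["surprise", "-ing"])
puts orthographically_composable?("vocifer", ["vox", "-fer"])
```

With only these two rules, allumino < ad- + lumino would still be flagged; assimilation rules of that kind are what would be added after looking through the first list.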

@Nicodene: {{af}} and {{surf}} should not be merged since they designate different things. You can rename {{surf}} but it should be clearly distinguished from {{af}} and {{com}}. Ioaxxere (talk) 05:49, 19 April 2024 (UTC)[reply]

@Ioaxxere There isn’t any actual difference that I can see. Nicodene (talk) 05:55, 19 April 2024 (UTC)[reply]

@Nicodene: {{surf}} is used when the term wasn't actually formed within the language, but can still be analyzed as though it were. Saying that English binary is actually "from" bin- +‎ -ary, for example, would be completely ridiculous. Merging the templates would lead to important information being lost. Ioaxxere (talk) 17:53, 19 April 2024 (UTC)[reply]

@Ioaxxere No, it wouldn't. The proposed change is {{af+|bin-|-ary}} ‘composed of bin- + -ary’, which is correct, and the preceding ‘from Late Latin binarius’ will still be there. Also {{affix}} is used in exactly the same way (cf. the entry anti-Semitism) so no, this isn't showing any difference between the templates.

Also the conceptual dividing line isn't actually clear in most cases, as discussed previously regarding boldly, where (I'd argue) the assumption that the word was passed down in an unbroken chain across a thousand years, and never reformed from its components, is completely ridiculous. Even so, nothing about the proposed wording for {{af+}} affects this one way or another. Nicodene (talk) 18:53, 19 April 2024 (UTC)[reply]

You're right that {{af}} is used ambiguously, but that doesn't mean we should be converting a precise template ({{surf}}) to a vague one ({{af}} / {{af+}}). It's not about the wording but about the wikitext itself losing information. Ioaxxere (talk) 20:42, 19 April 2024 (UTC)[reply]

There is nothing precise about the Wiktionary-ism ‘surface analysis’, as all the preceding discussions about what it should mean have shown.

Zero information is being lost, because the preceding text (‘from Late Latin binarius’ and such) is not going to be deleted. Nicodene (talk) 21:15, 19 April 2024 (UTC)[reply]

Okay let’s simplify things. Cleaning up the entire website’s (mis)uses of affix or surf all at once would be a gargantuan task.

Far simpler proposal: have a bot replace {{surf}} with {{af+}} which generates a preceding ‘derivable from’. This phrasing strikes me as jargon-free but still precise. Thoughts? Nicodene (talk) 03:22, 20 April 2024 (UTC)[reply]

I can support this. Ioaxxere (talk) 18:51, 20 April 2024 (UTC)[reply]

Arabic-script affixes

As we all know, Latin-script affixes are lemmatised with a hyphen (such as -ed); what about affixes in Arabic script? Currently, Arabic affixes are lemmatised in plain form without any hyphen, but get a connecting character in the headword (such as at ون) or do not get one (such as at وت). Ottoman Turkish affixes are lemmatised with the connecting character in the pagetitle (such as ـدن). Pashto affixes randomly do or do not get a hyphen (such as -تون versus تون). What is the correct way to do this? MuDavid 栘𩿠 (talk) 02:43, 17 April 2024 (UTC)[reply]

@MuDavid I agree the current situation is a mess and should be harmonized. I would propose lemmatizing using the tatweel sign like in Ottoman Turkish, or if that is rejected, lemmatizing at the form without connector but placing the tatweel in the headword (like ون). Benwing2 (talk) 21:35, 17 April 2024 (UTC)[reply]
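For illustration only, the tatweel-based page titles proposed here could be produced by a helper along these lines. The function name and the :prefix/:suffix labels are hypothetical, and the sketch ignores any language-specific distinction between connecting and non-connecting affixes; the only fixed fact it relies on is that the tatweel character is U+0640.

```ruby
TATWEEL = "\u0640" # ARABIC TATWEEL

# Hypothetical helper: strip any ASCII hyphen or tatweel already present at
# the edges, then attach a tatweel on the side where the affix joins its base.
def tatweel_title(affix, kind)
  bare = affix.sub(/\A[-\u0640]+/, "").sub(/[-\u0640]+\z/, "")
  case kind
  when :suffix then TATWEEL + bare # the connector precedes a suffix in logical order
  when :prefix then bare + TATWEEL # the connector follows a prefix in logical order
  else raise ArgumentError, "unknown affix kind: #{kind.inspect}"
  end
end

puts tatweel_title("-تون", :suffix)
puts tatweel_title("ون", :suffix)
```

Titles that already use the tatweel are returned unchanged, so the same helper could be run over all three current conventions (hyphen, plain form, tatweel).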

@MuDavid, Benwing2:

Support lemmatising using the tatweel sign. 0DF (talk) 15:37, 21 April 2024 (UTC)[reply]

I’m fine with the tatweel (I’ll try to remember the word) in lemma and headline. But someone should then clean up the mess… MuDavid 栘𩿠 (talk) 00:57, 24 April 2024 (UTC)[reply]

@MuDavid, Benwing2, 0DF: Ottoman Turkish is lemmatized like that because I started filling the language late and correctly (recently mostly @Samubert96 is doing the job), as with ASCII hyphens the entry links become insufferable aesthetically and BiDi-wise, so we employ the so-called tatweel (a term barely attested in Arabic, and if so borrowed from the Unicode chart). The Arabic language pages, less so the Persian ones if I have observed correctly, were just kept where they had been created in Wiktionary’s dark ages, with the language-specific module adjustments in order not to change existing pages much. I noticed greater problems in 2019 and performed some complicated considerations due to Persian distinguishing connecting and non-connecting suffixes, after which Erutuon (talk • contribs) added the relevant capabilities to the modules, but whether the intended system is intelligible is open to critique. Fay Freak (talk) 01:44, 24 April 2024 (UTC)[reply]

@Fay Freak I did a lot of work on Module:affix to correctly support the different ways of handling affix indicators in Arabic-script languages; it's code I'd gladly get rid of if possible. If it's agreed to follow the Ottoman Turkish approach, I can clean up the other languages without too much difficulty, I think. Benwing2 (talk) 01:47, 24 April 2024 (UTC)[reply]

Russian clitics in English Wiktionary

@Atitarev @Benwing2: Currently the Russian clitic entries barely have any presence in English Wiktionary. But it's probably possible to borrow many of them from Russian Wiktionary.

Ruby script for extracting clitic information from https://dumps.wikimedia.org/ruwiktionary/20240401/ruwiktionary-20240401-pages-articles-multistream.xml.bz2:

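# Cyrillic letter ranges plus a few extra characters
LETTERS = "Ѐ-џҊ-ԧꚀ-ꚗѣѢѳѲѵѴʼ"
# combining acute accent (U+0301), used as the stress mark
AC = "\u0301"
# a stressed preposition: letters, a stress mark, optional further letters
PREPOS = "[#{LETTERS}]+#{AC}[#{LETTERS}]*"
# the unstressed word that follows the preposition
WORD = "[#{LETTERS}]+"
result = []
fh = ARGV[0] ? File.open(ARGV[0]) : STDIN
while (line = fh.gets)
  next unless line =~ /\|клитика/
  # skip empty entries (braces and pipe escaped so they match literally)
  next if line =~ /\|клитика=\{\{\{клитика\|\}\}\}/
  newentries = line.scan(/(#{PREPOS})\s+(#{WORD})/)
  result += newentries.map { |a| a[0] + " " + a[1] }
  nextline = fh.gets
  # report suspiciously formatted entries: no pair was found, or the
  # following line does not start with "}}" or "|"
  STDERR.puts "#{line}#{nextline}" unless !newentries.empty? && nextline =~ /^(\}\}|\|)/
end
# print results as wikilinks
puts result.sort.uniq.map { |w| "[[" + w.gsub(/#{AC}/, "") + "|" + w + "]]" }.join(" ")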

Here are the results (please note that only two-word pairs are listed, but some of them could have been a part of a longer phrase):

бе́з году бе́з соли бе́з толку во́ поле до́ дому до́ крови до́ ночи до́ смерти за́ бок за́ борт за́ версту за́ воду за́ волосы за́ год за́ голову за́ город за́ два за́ день за́ душу за́ зиму за́ зуб за́ лето за́ море за́ морем за́ мост за́ ноги за́ ногу за́ нос за́ ночь за́ полдень за́ полем за́ полночь за́ пять за́ реку за́ руки за́ руку за́ слово за́ спину за́ стену за́ три за́ угол за́ уши за́ хер за́ хрен за́ хуй за́ щеку и́з году и́з дому и́з лесу и́з носу и́зо дня на́ берег на́ бок на́ борт на́ бронь на́ ветер на́ воду на́ год на́ голову на́ два на́ день на́ дом на́ душу на́ зиму на́ зуб на́ лето на́ люди на́ людях на́ море на́ мост на́ ноги на́ ногу на́ нос на́ ночь на́ поле на́ пять на́ реку на́ руки на́ руку на́ семь на́ слово на́ смерть на́ спину на́ стену на́ сторону на́ три на́ угол на́ уши на́ хер на́ хрен на́ хуй на́ цепь не́ хер не́ хера не́ хуй не́ хуя о́б голову о́б ноги о́б ногу о́б руки о́б руку о́б стену о́б хер о́б хуй о́т году о́т часу по́ боку по́ ветру по́ воду по́ два по́ лесу по́ лугу по́ льду по́ морю по́ мосту по́ носу по́ полю по́ пять по́ семь по́ слову по́ три по́ уши по́ хер по́ хрен по́ хрену по́ хуй по́ цепи по́д воду по́д голову по́д гору по́д зиму по́д лето по́д ноги по́д ногу по́д руки по́д руку по́д хер по́д хуй при́ смерти

I have also corrected one inconsistently formatted entry there. Do the missing headword entries just need to be created in English Wiktionary? Or, similar to how it's done in Russian Wiktionary, could some kind of special template be introduced for listing clitics in the parent word entries?

BTW, as a person from Belarus, I personally find a lot of these clitics weird and unnatural to various extent. I understand the reason and necessity of the accent pattern in до́ смерти (dó smerti) or на́ хуй (ná xuj), because these word pairs are not to be interpreted literally and have a different sense of their own. But many others just feel to me like somebody is trying to sound deliberately poetic or archaic when reciting a fairy tale or something. And, for example, I doubt that anyone from Belarus would ever say "до́ дому" in their Russian speech, maybe because of the influence from the Belarusian дадо́му (dadómu)? The whole concept of the prepositions stealing stress from the next word doesn't exist in the Belarusian language. I understand that it's the standard Russian pronunciation norm and has to be acknowledged as such, I'm just trying to say that my Russian language competence is definitely lacking in this area. I'm primarily interested in the Russian clitics for the purpose of correctly handling them in the auto-accenting Lua module. --Ssvb (talk) 04:18, 17 April 2024 (UTC)[reply]

@Ssvb I am not a native Russian speaker; I'm sure Anatoli can answer better. I'll just note that Zaliznyak's grammatical dictionary notes the occurrences of such stress-stealing in the headwords of each word where it occurs. I would not necessarily recommend creating entries for all such combinations unless they are idiomatic. I think it's enough to note them in a usage note in the noun. I'm not sure if we need a special template for this, esp. since I imagine the conditions under which this stress-stealing occurs are rather varied and some of the expressions may be archaic, poetic, etc. as you note. Benwing2 (talk) 04:31, 17 April 2024 (UTC)[reply]

As an example, under нога it has a diamond symbol (indicating special usages) followed by this:

зá ногу, зá ноги; на́ ногу, на́ ноги; сиде́ть ногá зá ногу (ногá нá ногу); заки́нуть нóгу зá ногу (нóгу нá ногу); переступа́ть (перемина́ться) с ноги́ нá ногу; бро́сить (смотрéть) по́д ноги

This is a lot to unpack but I take it it is indicating the various expressions where stress-stealing occurs. Benwing2 (talk) 04:36, 17 April 2024 (UTC)[reply]

@Benwing2: Thanks! So these are recorded in {{m|ru|stressed_preposition word}} format in English Wiktionary. This makes them relatively easy to parse from https://dumps.wikimedia.org/enwiktionary/20240401/enwiktionary-20240401-pages-articles-multistream.xml.bz2 directly. They are just not present in https://kaikki.org/dictionary/Russian/ JSON, but it's a minor issue. --Ssvb (talk) 04:58, 17 April 2024 (UTC)[reply]

@Ssvb BTW if you want I can send you a copy of the 1980 edition. I think there's a newer one available online somewhere, but I'm not sure where; maybe User:Cinemantique would know. Benwing2 (talk) 05:17, 17 April 2024 (UTC)[reply]

Benwing, do you reckon ходить по воду (xoditʹ po vodu) is idiomatic, or should I never have created it? Tollef Salemann (talk) 16:54, 17 April 2024 (UTC)[reply]

@Tollef Salemann Not being a native Russian speaker I can't say for sure but I imagine if the entry should be there at all, it should be under по́ воду (pó vodu) as I bet you can use other verbs with it besides ходи́ть (xodítʹ). Benwing2 (talk) 21:29, 17 April 2024 (UTC)[reply]

Not really. Maybe you can replace ходить with идти, but it has the same conjugation forms other than the infinitive. Ехать по воду (to drive to get water) is also possible, but rare. Tollef Salemann (talk) 10:11, 18 April 2024 (UTC)[reply]

As someone who created a Russian clitic entry, I wonder if it is really worth doing, because it is not really a thing and it depends very much on dialect. We can of course take clitic stuff from a dictionary, but it is going to be useless in about 20-40 years (as for modern Russian dialects, it is useless anyway). I mean, it is important to register clitics, but it does not seem as easy as the dictionaries say. Tollef Salemann (talk) 06:49, 17 April 2024 (UTC)[reply]

@Tollef Salemann: These things are not reflected in spelling, so pronunciation may have diverged in different regions. Still only Moscow determines what is considered to be the official standard pronunciation. And "за́ руку" [18] seems to be legit. But "на́ берег"[19] - not so much and seems to primarily exist because of "Выходила на́ берег Катюша". --Ssvb (talk) 17:57, 17 April 2024 (UTC)[reply]

Moscow pronunciation is not the same as "standard" Russian pronunciation. The dictionaries often have differences, and they are being updated. Different primary schools may use different dictionaries as well. So the clitics seem like a mess to me. Tollef Salemann (talk) 10:16, 18 April 2024 (UTC)[reply]

@Ssvb Under берег, Zaliznyak says "на бе́рег // на́ берег" which seems to mean both are possible but the first is more common. It is similar to the entry directly above for снег (the entries are sorted alphabetically from the end), which reads "по сне́гу // по́ снегу". Note also that sometimes the part after the // is enclosed in brackets, presumably meaning that variant is dated or dialectal or something. Benwing2 (talk) 04:26, 19 April 2024 (UTC)[reply]

@Ssvb: How would these be clitic entries? They look like clitic plus noun phrases, presumably suggested for inclusion because of idiomatic meanings or possibly (though I'm not sure of the validity of so doing) because they are not readily recognised as such. --RichardW57m (talk) 15:36, 17 April 2024 (UTC)[reply]

@RichardW57m: They are relevant and deserve to be documented because they have different pronunciation at least by some speakers (the true authentic Russians). But unless they are really idiomatic, they don't need their own headword entries each (I agree with User:Benwing2). Please disregard the red links in my starter comment, they are a red herring. --Ssvb (talk) 18:10, 17 April 2024 (UTC)[reply]

Having heard my Russian from Sovietized Qazaqstani Germans and Tatars, most, save distinct idiomatic ones, which are figurative but not literal uses of the vulgarities and apparently за́ бок (zá bok) which I only now hear, seem optional to me, the oftener I try to recall them, up to individual preference or even mood, some stilted, similar to Ssvb.

и́зо дня (ízo dnja) seems like an archaism and бронь (bronʹ) I have never heard, and also not a coincidence that I never heard the set phrase за версту (za verstu) in either stress. Interestingly it shows that some such phrases, including при́ смерти (prí smerti), are idiomatic only in some regions and registers of the Russian language area (but до смерти (do smerti) has optionally either stress for me and is idiomatic anyhow).

I also think that some of the phrases only have peculiar meaning and stress due to using a particular, stressed, sense of the preposition, namely за́ зиму (zá zimu) and за́ ночь (zá nočʹ) and на́ ночь (ná nočʹ) (all quite obligatory), which also constitutes the reason of the said figurative senses stress-stealing.

There seems to be an overlooked, hardly satisfactorily described, part of Great Russian grammar that figurative prepositional phrases switch stress to attain emphasis for the figure. Fay Freak (talk) 22:21, 17 April 2024 (UTC)[reply]

@Fay Freak, @Ssvb, @Benwing2, @Tollef Salemann, @RichardW57m: Yes, a changed stress may mean additional meanings, either as a set expression or some metaphoric usage.

I think it's important to include {{&lit}} in all these collocations, so that people are aware that non-idiomatic meanings are also possible.

Let's look at на́ ноги (ná nogi) vs на но́ги (na nógi) - "on(to) legs/feet"

I would use the latter when talking about putting on (shoes, pants) or if something is placed on legs/feet (the former is also OK in this case) but in the expression встава́ть на́ ноги (vstavátʹ ná nogi) "to get to one's feet" (both literally and metaphorically), stressing the preposition (the former) would sound more natural. Anatoli T. (обсудить/вклад) 00:55, 19 April 2024 (UTC)[reply]

@Ssvb: Thanks for the ping. I think many of the clitics listed can be created but they have to be filtered case by case. Just a few things to consider

бе́з соли (béz soli) sounds weird. I never heard it. без со́ли (bez sóli) is not an expression, IMO.
Both до до́му (do dómu) and до́ дому (dó domu) are valid. The latter sounds a bit rustic or folkloric.
Both за́ голову (zá golovu) and за го́лову (za gólovu) are valid. The same is true for many cases.
{{&lit}} can be used to clarify that a term can be both idiomatic and unidiomatic. Many clitics will fall into that.

Off-topic: your comment "BTW, as a person from Belarus, I personally find a lot of these clitics weird and unnatural to various extent." surprises me. It's great if your Belarusian is better than your Russian but unfortunately, it seems not many Belarusians have mastered their own language. I heard 6% of Minsk citizens are fluent in Belarusian. I follow news from Belarus and many Belarusians gave interviews or answered reporters' questions in perfect Russian. Also, you can even be arrested for showing your preference to speak Belarusian over Russian. Anatoli T. (обсудить/вклад) 01:14, 18 April 2024 (UTC)[reply]

Hey, I say "béz soli", but "za nógi". Neither of them is really an expression, so why include such stuff? Only because of the clitics? But there are thousands of them. Tollef Salemann (talk) 10:05, 18 April 2024 (UTC)[reply]

@Tollef Salemann: I have never heard "бе́з соли" myself, but it is mentioned in Ushakov Dictionary. Why to include such stuff? It's useful to inform the users that the stress pattern may be unusual in some cases. And also stress can be marked automatically in quotations, the Lua module just needs to identify tricky cases and avoid marking stress in them. For now I can take the list from Russian Wiktionary, but it would be great to be able to rely only on the information from English Wiktionary alone. Russian Wiktionary lists less than 200 of them. English Wiktionary currently mentions them in notes of the declension tables, e.g. the "нога́" entry. --Ssvb (talk) 13:22, 18 April 2024 (UTC)[reply]
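The "identify tricky cases" step could look something like the sketch below. It is written in Ruby rather than Lua only to match the script above, the names STRESSED_PAIRS, LOOKUP and tricky_pairs are hypothetical, and the three sample pairs are taken from the list posted earlier in this thread. It assumes the input text carries no stress marks yet.

```ruby
ACUTE = "\u0301" # combining acute accent used as the stress mark

# A small sample of preposition + noun pairs with stress on the preposition.
STRESSED_PAIRS = ["за\u0301 руку", "на\u0301 ноги", "бе\u0301з толку"].freeze

# Map each unaccented pair to its accented form for quick lookup.
LOOKUP = STRESSED_PAIRS.to_h { |pair| [pair.delete(ACUTE), pair] }

# Return the accented form of every known pair found in the text.
# An auto-accenting pass could skip these spans or leave them for review.
def tricky_pairs(text)
  words = text.downcase.scan(/\p{Cyrillic}+/)
  words.each_cons(2).map { |a, b| LOOKUP["#{a} #{b}"] }.compact
end

puts tricky_pairs("Она взяла его за руку и ушла.").inspect
```

Since many of these pairs also occur with the ordinary stress, a match here only signals that the stress is ambiguous, not that the preposition must be stressed.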

@Ssvb I completely agree they should be documented and ideally in a standard way whenever possible. Benwing2 (talk) 22:34, 18 April 2024 (UTC)[reply]

@Tollef Salemann: It's interesting. I never heard "бе́з соли" but I won't argue with Ushakov. This usage is probably dated. Do you want to have a go and make an entry? Anatoli T. (обсудить/вклад) 01:05, 19 April 2024 (UTC)[reply]

@Atitarev FWIW, Zaliznyak doesn't mention бе́з соли under соль. Under толк, it says "бе́з толку [// без то́лку] (безрезультатно, напрасно)". Benwing2 (talk) 01:20, 19 April 2024 (UTC)[reply]

Under год, it says "нá год; зá год; с го́ду нá год; го́д о́т году [// от го́ду, от гóда]; из го́да [// и́з году] в го́д; бе́з году неде́ля". I don't know what без году неделя ("a week without a year"?) means. Benwing2 (talk) 01:26, 19 April 2024 (UTC)[reply]

@Benwing2: Thanks. бе́з году неде́ля (béz godu nedélja) is a jocular, often derogatory expression meaning a short time for something requiring a longer time. For example, it is used when someone claims to have a lot of experience even though they have worked in that area for only "a week without a year" (which makes it a negative duration). Please check some quotes at Russian Wiktionary [[без году неделя]]. Anatoli T. (обсудить/вклад) 02:55, 19 April 2024 (UTC)[reply]

(Don't rely too much on Google Translate here. It fails to translate the expression.) Anatoli T. (обсудить/вклад) 02:57, 19 April 2024 (UTC)[reply]

@Atitarev: If most of Wiktionary quotations and usage examples are consistently formatted (making them machine readable), then this information can be potentially used as a part of the training data for improving various AI models. Including, but not limited to, Google Translate. In other words, Google Translate can learn from Wiktionary, but not the other way around. --Ssvb (talk) 07:19, 19 April 2024 (UTC)[reply]

@Ssvb: Interesting perspective but I didn't mean to take the translation from Google Translate but shared my observation for people who don't know Russian and are not familiar with this expression. Google Translate did a poor job in this case and people shouldn't try to use it in this particular case (to understand its meaning). Even the literal translation doesn't quite clarify what it means, IMO. Anatoli T. (обсудить/вклад) 07:00, 20 April 2024 (UTC)[reply]

@Atitarev: I apologize for the misunderstanding and I didn't imply that you suggested that. My point is that Google Translate (or its alternatives/replacements) will definitely improve in the future. And Wiktionary, among other things, may be instrumental in making this happen. Which brings another aspect: AI art is currently resented by the artists, who believe that the AI is plagiarizing their work and destroying their jobs. And in a similar fashion, Google Translate may take advantage of the work of the Wiktionary editors without giving them any credit. Google Translate may eventually learn from your без году неделя entry and start translating it correctly, which is a good thing for the humankind in general. But the question is: how would you feel about this? Some people may prefer to make entries machine readable, the others may prefer to make them deliberately obfuscated as a way to combat the AI. My personal opinion is that the latter would be futile and counterproductive. And by the way, I have no horse in this race, as I'm not employed by Google Translate or by any similar services. --Ssvb (talk) 07:47, 20 April 2024 (UTC)[reply]

@Ssvb: No worries at all. I was just sort of amused as if we need to help Google to improve their algorithms. I am easy about it, though. If our entries help everyone, it's only better. I took them from Reverso, anyway. Not sure if I need to quote. Anatoli T. (обсудить/вклад) 07:53, 20 April 2024 (UTC)[reply]

I've made an entry за версту́ (za verstú)/за́ версту (zá verstu) from the list. I might make more, later. Anatoli T. (обсудить/вклад) 23:05, 18 April 2024 (UTC)[reply]

The entry for бе́з году неде́ля (béz godu nedélja) is created. Anatoli T. (обсудить/вклад) 07:21, 20 April 2024 (UTC)[reply]

template for INN spellings of drugs

I made these three edits this morning, each indicating in a slightly different way that a given entry is the INN spelling of the name of a drug. I wonder if we should have a standalone template for this. If not, could the {{altsp}} template be made such that we could put "INN" in the from field and have it link to Wikipedia or to the glossary? Best regards, —Soap— 08:58, 17 April 2024 (UTC)[reply]

And we could also use {{altform}}, which is perhaps more accurate. At first I avoided altform because I thought its labels could only be parenthemes, but it seems that I can type

{{altform|en|methamphetamine|from=INN}}

and get

INN form of methamphetamine

So if we could only have INN link to Wikipedia (or to a glossary entry, if we prefer), I think this would be the best solution of all. Ideally, it would also categorize just as labels like US do. Then anyone could see all the INN spellings (at least the ones we get to) all laid out in a list. —Soap— 09:15, 17 April 2024 (UTC)[reply]

Re {{altsp}} vs {{altform}}: use {{altsp}} if the pronunciation is the same, and {{altform}} if it's different. (There is an effort in the works to make {{altsp}} spell this out soon.) - -sche (discuss) 16:33, 17 April 2024 (UTC)[reply]

Let me see what I can do in terms of making {{alt sp}} be explicit about this. Since I assume metamfetamine would be pronounced differently than methamphetamine (/t/ instead of /θ/), it should use {{alt form}}. Benwing2 (talk) 23:30, 17 April 2024 (UTC)[reply]

@Soap What you are proposing can easily be done by adding an INN label. Benwing2 (talk) 23:32, 17 April 2024 (UTC)[reply]

That looks like a nightmare:

1) Not all dialects of English distinguish /t/ and /θ/.

2) What of the possibility of such a word being written with 't' but pronounced with /θ/?

Or is {{altform}} appropriate if it represents a distinct set of pronunciations? --RichardW57m (talk) 11:20, 18 April 2024 (UTC)[reply]

{{alt spell}} is intended for cases that are really just spelling variants with no pronunciation differences. For these purposes I don't think it matters if some dialects merge /t/ and /θ/, I would still use {{alt form}}. Benwing2 (talk) 22:26, 18 April 2024 (UTC)[reply]

@Soap: Could you give me your opinion on how I've specified this in besilate and cilexetil, please? 0DF (talk) 15:34, 21 April 2024 (UTC)[reply]

Thank you for doing that, as it provides a lot of information for the reader, and the etymology section is a lot more expansive than the headword line. But honestly what I wanted most was to get these terms into a category, such that a curious reader could browse through all of them at once, and using the etymology section won't do that. We could put a template within the etymology section that would, but then that template could just as well go into the definition line. I don't want to say "no thanks" because it's up to the community, not to me, but I think we should do something, and ideally I'd like to have the INN terms listed in a category. Best regards, —Soap— 17:20, 21 April 2024 (UTC)[reply]

@Soap: Yes, I agree that it would be good to have a category for these spellings. How feasible would it be to have a specific etymology template do this? I envisage something like {{INN respelling|language code|original spelling}}. {{INN respelling|en|besylate}} and {{INN respelling|en|cylexethyl}} would generate the etymologies as I wrote them whilst also adding the pages to Category:English INN respellings; I would imagine that automatic string analysis could be used to specify which substitutions had taken place (y → i for besylate → besilate; y → i and th → t for cylexethyl → cilexetil). 0DF (talk) 17:35, 21 April 2024 (UTC)[reply]
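A minimal sketch of that kind of string analysis, assuming a fixed table of substitutions: the RULES table below is illustrative only (y → i and th → t come from the examples above, ph → f from the metamfetamine case) and is not an official list, and the function name is hypothetical.

```ruby
# Illustrative substitution table (an assumption, not an official INN list).
RULES = { "th" => "t", "ph" => "f", "y" => "i" }.freeze

# Align the original spelling with the respelling, left to right.
# Returns the substitutions used (in order, with repeats), or nil if the
# rules cannot account for the difference. Backtracks where a letter could
# either match itself or start a multi-letter rule (e.g. "t" vs "th").
def inn_substitutions(original, respelling, i = 0, j = 0)
  return [] if i == original.length && j == respelling.length
  return nil if i == original.length || j == respelling.length

  if original[i] == respelling[j]
    rest = inn_substitutions(original, respelling, i + 1, j + 1)
    return rest if rest
  end

  RULES.each do |from, to|
    next unless original[i, from.length] == from && respelling[j, to.length] == to
    rest = inn_substitutions(original, respelling, i + from.length, j + to.length)
    return ["#{from} → #{to}"] + rest if rest
  end
  nil
end

puts inn_substitutions("besylate", "besilate").uniq.join(", ")
puts inn_substitutions("cylexethyl", "cilexetil").uniq.join(", ")
```

A template or module could use the resulting list to word the etymology and add the category; a nil result would mean the pair needs a manually written etymology.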

Proposed Reorganization of Regional Hokkien

I have some concerns about the current classification system for Regional Hokkien. The system appears to be based more on administrative divisions than on linguistic, particularly phonological, relationships. For example, the current classification treats Tong'an dialect as a branch of Xiamen dialect. While administratively Tong'an District is part of Xiamen City, the narrowly-defined Xiamen dialect (specifically the dialect of Xiamen City center, which is spoken in the southwest of Xiamen Island) actually belongs to the Zhangdong branch (漳東腔), differing from the true Tong'an dialect, which is used in Tong'an District, Xiang'an District, and Kinmen County.

If we were to simplify and categorize Hokkien into just three dialects—Quanzhou, Zhangzhou, and Xiamen—based solely on major geographical areas, it might be feasible. However, treating these as three distinct major divisions fails to accurately reflect the relationships among various other dialect points.

The current classification system

├── Hui'an (惠安)
├── Longyan (龍巖)
├── Non-Mainland
│   └── Taiwanese (臺灣)
│       ├── Hsinchu (新竹)
│       ├── Kaohsiung (高雄)
│       ├── Lukang (鹿港)
│       ├── Penghu (澎湖)
│       ├── Sanxia (三峽)
│       ├── Taichung (臺中)
│       ├── Tainan (臺南)
│       ├── Taipei (臺北)
│       └── Yilan (宜蘭)
├── Quanzhou (泉州)
│   ├── Anxi (安溪)
│   ├── Jinjiang (晉江)
│   ├── Lukang (鹿港)
│   ├── Philippine (菲律賓)
│   └── Yongchun (永春)
├── Xiamen (廈門)
│   └── Tong'an (同安)
└── Zhangzhou (漳州)
    ├── Changtai (長泰)
    ├── Medan (棉蘭)
    ├── Penang (庇能)
    ├── Taichung (臺中)
    ├── Yilan (宜蘭)
    ├── Zhangping (漳平)
    └── Zhao'an (詔安)

To address these issues, I propose adopting a new system based on the Dialectal Atlas of Southern Min (閩南地區方言地圖集) by Professor Ang Uijin. This system classifies the core Southern Min dialects into eight branches: Tong'an, Quanhai, Quanshan, Quanzhong, Zhangdong, Zhanghai, Zhangnan, and Zhangshan. The first four can be collectively referred to as "Quan dialects" (泉系方言), and the latter four as "Zhang dialects" (漳系方言).

Additionally, considering geographical, historical, and political factors, Taiwanese Hokkien could be established as a separate branch, with its internal dialects also categorized under these eight branches or their sub-branches. For example, Lukang Hokkien could be categorized under the Quanzhong branch.

This new classification system would not only allow for a more precise relationship between the dialects but also maintain expandability for future adjustments. We do not need to immediately add all sub-branches, but I believe this approach could significantly improve the way Hokkien dialects are related and classified in Wiktionary.

The proposed classification system (excluding Taiwanese)

Hokkien
├── Tong'an (同安腔方言)
│   └── Tong'an (同安話)
│       ├── Tong'an (同安腔)
│       └── Kinmen (金門腔)
├── Quanhai (泉海腔方言)
│   ├── Northern Quanhai (泉海腔北片)
│   │   ├── Chongwu (崇武話)
│   │   ├── Honglai Majia (洪瀨馬甲話)
│   │   ├── Luoyang (洛陽話)
│   │   └── Shanyao (山腰話)
│   ├── Old Quanhai (老泉海腔)
│   │   ├── Huizhong (惠中話)
│   │   └── Tuling (塗嶺話)
│   ├── Shishi (石獅片)
│   │   └── Shishi (石獅話)
│   └── Southern Quanhai (泉海腔南片)
│       ├── Jinjiang Fengze (晉江豐澤話)
│       └── Xunpu (蟳埔話)
├── Quanshan (泉山腔方言)
│   ├── Central Quanshan/Yongchun (泉山腔中片/永春話)
│   │   ├── Inner Yongchun (內永春腔)
│   │   └── Outer Yongchun (外永春腔)
│   ├── Northern Quanshan (泉山腔北片)
│   │   └── Dehua (德化話)
│   └── Southern Quanshan (泉山腔南片)
│       ├── Inner Anxi (內安溪腔)
│       ├── Longjuan (龍涓腔)
│       └── Outer Anxi (外安溪腔)
├── Quanzhong (泉中腔方言)
│   ├── Central Quanzhong (泉中腔中片)
│   │   ├── Nan'an (南安話)
│   │   └── Shishan (詩山話)
│   ├── Eastern Quanzhong (泉中腔東片)
│   │   └── Licheng (鯉城話)
│   ├── Southern Quanzhong (泉中腔南片)
│   │   └── Shuitou Dongshi (水頭東石話)
│   └── Western Quanzhong (泉中腔西片)
│       └── Yingdu Cannei (英都參內話)
├── Zhangdong (漳東腔方言)
│   ├── Mixed Zhangdong (混合漳東腔)
│   │   └── Xiamen (廈門話)
│   ├── Quanzhouized Zhangdong (泉化漳東腔)
│   │   ├── Chenjing (陳井話)
│   │   ├── Guankou (灌口話)
│   │   └── Xiamen Shanchang (廈門山場話)
│   └── Standard Zhangdong (正漳東腔)
│       └── Jiaomei (角美話)
├── Zhanghai (漳海腔方言)
│   ├── Haicheng (海澄話)
│   │   ├── Guanxun (官潯腔)
│   │   ├── Longjiao (隆教腔)
│   │   └── Standard Haicheng (正海澄腔)
│   ├── Nanjing (南靖話)
│   │   ├── Chuanchang (船場腔)
│   │   └── Nanjing (南靖腔)
│   ├── Northern Pinghe (平和話北片)
│   │   └── Pinghe (平和腔)
│   ├── Southern Pinghe (平和話南片)
│   │   ├── Mapu (馬鋪腔)
│   │   ├── Nanpu (南浦腔)
│   │   ├── Shiliu (石榴腔)
│   │   └── Wuzhai (五寨腔)
│   ├── Western Pinghe (平和話西片)
│   │   ├── Jiufeng (九峰腔)
│   │   ├── Luxi (蘆溪腔)
│   │   └── Xiazhai (霞寨腔)
│   ├── Yunxiao (雲霄話)
│   │   ├── Yunling (雲陵腔)
│   │   └── Yunxiao (雲霄腔)
│   └── Zhangpu (漳浦話)
│       ├── Fotan (佛壇腔)
│       ├── Qianting (前亭腔)
│       └── Standard Zhangpu (正漳浦腔)
├── Zhangnan (漳南腔方言)
│   ├── Dongshan (東山話)
│   │   ├── Dachan (大嵼腔)
│   │   ├── Dongshan (東山腔)
│   │   └── Tongling (銅陵腔)
│   ├── Sidu (四都話)
│   │   └── Sidu (四都腔)
│   └── Zhao'an (詔安話)
│       ├── Chencheng (陳城腔)
│       ├── Qianlou (前樓腔)
│       └── Zhao'an (詔安腔)
└── Zhangshan (漳山腔方言)
    ├── Eastern Zhangshan (漳山腔東片)
    │   └── Changtai (長泰話)
    ├── Northern Zhangshan (漳山腔北片)
    │   ├── Hexi (和溪話)
    │   ├── Hua'an (華安話)
    │   ├── Mafang (馬坊話)
    │   └── Xiandu (仙都話)
    └── Southern Zhangshan (漳山腔南片)
        ├── Longhai (龍海話)
        └── Xiangcheng (薌城話)

I would appreciate feedback on the proposed classification system from other editors. If you have a moment, please take a look and share your thoughts.

@Theknightwho, Singaporelang, Mar vin kaiser, Mlgc1998, 幻光尘, MistiaLorrelay, RcAlex36, Kangtw, your insights would be especially valuable. Thank you! --TongcyDai (talk) 16:32, 17 April 2024 (UTC)[reply]

@TongcyDai I am probably at least partly responsible for the current system; I've been trying to clean up the various Chinese lects and what we have results from what was there before (even messier) along with some changes I've attempted to make based on Wikipedia. I have no attachment whatsoever to the current system and would welcome some cleanup from someone who better understands the dialect situation. Also pinging @Wpi, ND381 who might have thoughts. Benwing2 (talk) 21:02, 17 April 2024 (UTC)[reply]

@Benwing2 I sincerely appreciate the substantial efforts you have dedicated to organizing and refining the classifications of Chinese lects on Wiktionary. It's clear that such a task is complex and challenging, and the progress achieved, even though not yet perfect, has greatly improved upon what was previously in place. Your dedication to enhancing these entries is invaluable to the community.

Thank you for your openness to new approaches and improvements. While referencing Wikipedia has been a good starting point for improving the Hokkien classifications, I have noticed some discrepancies and occasional inaccuracies across various aspects. This observation has inspired me to propose some adjustments based on specialized academic works, aiming to further refine our classification system on Wiktionary. --TongcyDai (talk) 06:44, 18 April 2024 (UTC)[reply]

@TongcyDai Of course. My only comment would be that I am generally in agreement with Justin that some of the intermediate nodes might not be needed, as they are generally the most controversial. Benwing2 (talk) 07:24, 18 April 2024 (UTC)[reply]

@TongcyDai Thanks for pointing this out. I am generally in favour of what you propose, although we might not need to go as detailed as the 片 level from Ang Uijin. I am also curious to know if there are better English translations of these names; are they taken from Ang or translated by yourself? — justin(r)leung { (t...) | c=› } 02:25, 18 April 2024 (UTC)

Thank you for your supportive feedback. I agree that while we may not need to delve into the level of detail as outlined by Ang Uijin, maintaining a broad framework would indeed be beneficial.

Regarding the translation of the dialect names, I translated them myself with Hanyu Pinyin, following Wiktionary's conventions, as no English equivalents were provided in the original Dialectal Atlas of Southern Min.

The atlas, including its maps and appendices, does not provide translations but does assign codes to different branches and dialect points, such as the Zhanghai dialect ("Jc"), Zhangpu subdialect ("Jc1"), and Qianting sub-subdialect ("Jc1.3"), where "J" presumably stands for Zhangzhou and "c" might denote coastal, although the specific romanization scheme used is unclear.

Additionally, the use of identical names at different hierarchical levels, such as Tong'an dialect (同安腔方言), Tong'an subdialect (同安話), and Tong'an sub-subdialect (同安腔)—with only their Chinese suffixes varying—presents a challenge for listing in Wiktionary. This overlap necessitates careful consideration of how to name these similarly titled branches to ensure clarity and adherence to Wiktionary's standards. If we need to classify these categories in such detail, it is crucial that we develop a systematic approach to differentiate and label them appropriately. --TongcyDai (talk) 07:28, 18 April 2024 (UTC)[reply]
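One possible way to keep the identically named levels apart is to key them by the atlas codes rather than by name. The sketch below is only an illustration: the three codes "Jc", "Jc1" and "Jc1.3" come from the comment above, but the general pattern (two letters, an optional number, an optional ".number") is an assumption extrapolated from those three examples, and the names `AtlasCode`, `parse_code` and `ancestors` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class AtlasCode(NamedTuple):
    branch: str                # e.g. "Jc"  (腔方言 level)
    subdialect: Optional[int]  # e.g. 1     (話 level)
    accent: Optional[int]      # e.g. 3     (腔 level)


# ASSUMED shape, inferred only from "Jc", "Jc1" and "Jc1.3":
# two letters, then optional digits, then optional ".digits".
_CODE = re.compile(r"^([A-Z][a-z])(?:(\d+)(?:\.(\d+))?)?$")


def parse_code(code: str) -> AtlasCode:
    """Split an atlas code into its branch, subdialect and accent parts."""
    m = _CODE.match(code)
    if m is None:
        raise ValueError(f"unrecognised atlas code: {code!r}")
    branch, sub, accent = m.groups()
    return AtlasCode(
        branch,
        int(sub) if sub else None,
        int(accent) if accent else None,
    )


def ancestors(code: str) -> List[str]:
    """Return the codes of all enclosing levels, outermost first."""
    c = parse_code(code)
    chain: List[str] = []
    if c.subdialect is not None:
        chain.append(c.branch)
    if c.accent is not None:
        chain.append(f"{c.branch}{c.subdialect}")
    return chain
```

With keys like these, three levels that share an English name would still have distinct identifiers, and the display label could be decided separately.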

@TongcyDai What is the difference between the three levels of Tong'an? If this is related to the dialect of an urban core vs. a larger grouping, one possibility is to use qualifiers like "Urban ...". This is what we've done with "Urban Shanghainese Wu" vs. just "Shanghainese Wu". Contrarily we also have both "Beijing Mandarin" to refer to the dialect of the city of Beijing and "Beijingic Mandarin" to refer to a larger grouping that includes Beijing and several other dialects, although I'm not entirely happy with this naming. Benwing2 (talk) 22:23, 18 April 2024 (UTC)[reply]

If Ang Uijin is a peer-reviewed authoritative source, then adopting his classification is a reliable decision. I cannot provide any feedback regarding the internal subclassifications. Since many Taiwanese dialects tend to lean towards Zhang or Quan, or a mix of the two, in terms of pronunciation, would it be possible to integrate them into the current classification, or does the unique vocabulary of Taiwanese warrant keeping all of them in a separate branch? Kangtw (talk) 08:33, 19 April 2024 (UTC)

@TongcyDai The well-known three-way split is often unhelpful, even if applied sensibly. But does an Ang-based eight-way split add clarity?

First off, a question that should not go unanswered is why dialect classification should be based on phonology alone.

Second, what is the benefit of labelling (say) an Amoy form “Zhangdong” instead of simply “Amoy” (or “central Amoy”, etc.)? Cities, towns & villages are much more objective & unbiased points of reference; “Zhangnan” or “Quanhai” are processed. Ang’s scholarship is (very) insightful, but the reality of dialect variation is messy. A fancy categorization scheme would seem to add to the mess.

Third, “overseas” (incl. 浙江) dialects of Hokkien are not being addressed, incl. the Penang-Medan dialect, arguably the only dialect of Hokkien that’s not dying. Ironically, the reason why ASEAN dialects of Hokkien are typically excluded from such discussions (while Taiwanese is often included) seems to be that they’re not spoken under Chinese (Tionghoa) administration.

All this said, it definitely makes no sense for 同安 Hokkien to be classified under “Xiamen”. Maybe we should first try to understand how that misclassification even came about, and when.

A very real problem is that our editors (some of whom don’t even speak the language) in aggregate seem to have scant access to all but a few Hokkien-speaking community-locales anyway. And of course that’s okay. But there’s no need for us to pretend to have the entire Hokkien-speaking world covered.

Again, a very relevant question is why we are thinking of this as a purely phonological endeavor. Cheers…. 61.222.241.251 12:08, 21 April 2024 (UTC)[reply]

Signed,

釆釆 (talk) 12:09, 21 April 2024 (UTC)[reply]

Here is an article Prof. Âng Ûi Jîn himself posted on Facebook in 2021 that suggests how futile it is to box Hokkien into, say, eight phono-dialects. Pay special attention to discussion of the so-called 漳山 dialect or accent and the form given by Lîm Kiàn Hui.

https://www.facebook.com/notes/1315932238750667/

And also a quote from another thread in this month's Beer Parlour that gets at another facet of the problem:

"We don't have a good factual basis for maintaining relatively fine distinctions, so a broad label that is subject to criticism, but defensible, is probably better than narrow ones, which are also subject to criticism." 釆 (talk) 11:41, 24 April 2024 (UTC)[reply]

@釆 Just curious, do you have an alternative suggestion? We have to do *something* (although "maintain the status quo" counts as "something" :) ...). Note that it is possible to give a specific dialect more than one parent. This is done in the current scheme with Lukang, which is both a Taiwanese and a Quanzhou dialect. The first parent determines the breadcrumb trail displayed at the top of the page. Benwing2 (talk) 21:15, 24 April 2024 (UTC)[reply]
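To make the multiple-parent idea concrete, here is a minimal sketch. It is not the actual Wiktionary module data: the `PARENTS` table, the function names, and the order of Lukang's two parents are all assumptions chosen for illustration. It only shows how a breadcrumb trail can follow the first parent while the full ancestor set still includes every parent.

```python
from typing import Dict, List, Set

# HYPOTHETICAL data: each lect lists its parents in order.
# The first parent is treated as the primary one.
PARENTS: Dict[str, List[str]] = {
    "Hokkien": [],
    "Non-Mainland": ["Hokkien"],
    "Taiwanese": ["Non-Mainland"],
    "Quanzhou": ["Hokkien"],
    "Lukang": ["Taiwanese", "Quanzhou"],  # order assumed for illustration
}


def breadcrumb(lect: str) -> List[str]:
    """Follow first parents up to the root; return the trail root-first."""
    trail = [lect]
    while PARENTS[trail[-1]]:
        trail.append(PARENTS[trail[-1]][0])
    return trail[::-1]


def all_ancestors(lect: str) -> Set[str]:
    """Collect every ancestor reachable through any parent."""
    seen: Set[str] = set()
    stack = list(PARENTS[lect])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen
```

Under this model, a Taiwanese dialect could be given one of the eight branches as a second parent without changing the trail shown at the top of its page.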

disambiguation of links to Wikipedia

If a Wiktionary page has multiple definitions where further reading on Wikipedia would be useful, maybe each Wikipedia link should be adjacent to that Wiktionary definition, instead of all the Wikipedia links getting shuffled into a separate section? For example (copy-paste from Wiktionary:Tea_room/2024/April#API):

i made a "sandbox" at Talk:API#proposed deviation from usual format. i think it might be more useful to put Wikipedia links with each definition instead of lumping all the Wikipedia links into a single =Further reading= section? --173.67.42.107 06:12, 10 April 2024 (UTC)[reply]

I like this idea, because to my mind it is more useful, direct, and intuitive to users. My gut will not be surprised if other Wiktionarians dislike it. The Beer parlour ("for policy discussion and cross-entry discussion") is the correct place to propose it, rather than the Tea room ("for questions concerning particular words"). Quercus solaris (talk) 16:35, 10 April 2024 (UTC)[reply]

END copy-paste --173.67.42.107 23:47, 17 April 2024 (UTC)[reply]

Very attentive. Would use. Fay Freak (talk) 00:24, 18 April 2024 (UTC)[reply]

A follow-up thought: In my opinion, if Wiktionary adopts this idea, then it would be best to use significant case for the WP link, instead of sentence case, irrespective of WP's page titles being sentence case. I have a solid reason for favoring this approach, which I can share if anyone requests it. TLDR: it's better for the Wiktionary environment. Quercus solaris (talk) 01:10, 18 April 2024 (UTC)[reply]

Yeah, you’re right. I’m not in favour of this because Wikipedia links are not that important, and splitting them up means possibly putting too much information adjacent to definitions which some readers may not like. — Sgconlaw (talk) 04:45, 18 April 2024 (UTC)[reply]

Yes, too much clutter and noise, distracting from the definitions. How about a small ref-style Wikipedia icon which jumps to the wp link in the "Further reading" section? Jberkel 07:46, 18 April 2024 (UTC)[reply]

Reintroduce T:jump 🤣?

I only voiced my agreement with the concept, not the design, which was not made for this place in the first place and which the IP has not implied to be final for use under the definition lines. Fay Freak (talk) 22:12, 18 April 2024 (UTC)

Kind of, but it's closer to how internal reference links/footnotes are handled. Maybe MediaWiki references could actually be used, if their design can be customised (ex. show a little wp logo instead of [1], [2] etc.) Jberkel 22:33, 18 April 2024 (UTC)[reply]

I've suggested something similar in October with a few possible alternatives. Einstein2 (talk) 22:40, 20 April 2024 (UTC)[reply]

OP here. i will try to read these discussions (April and October) soon, but my brain can't handle it right now.

i clicked the [ reply ] link after Einstein2's signed post. Does that automatically ping Einstein2? Does it automatically ping anyone/everyone who's posted in this subsection of the beer parlor? Now that others have posted, if i instead clicked the [ reply ] link after Sgconlaw's signed post (for example, if i wanted Sgconlaw to clarify that post, i wouldn't really be replying to anyone else), would clicking there ping the people who posted above (Quercus solaris, Fay Freak, Sgconlaw), but not the people who posted below (Jberkel and Einstein2)? i assume if i click [ edit ] instead, no one gets pinged?

i started this in Wiktionary's tea room (and should i go back there for this part?) because i was asking particularly about the API page, which includes the definition [[advanced]] [[primer]] [[ignition]]--sum of parts, but with each part having its own multiple definitions, leaving the reader to guess if API refers to a more difficult introductory text on any basic concept or a substance (used to prep wood or metal for painting) catching fire, or any number of other mix-and-match combinations of definitions for each of those three initial words. So in this case, the link to Wikipedia did not distract from the definition, but more or less provided the definition. So no matter how Wiktionary decides to link to Wikipedia most of the time, i wonder if API merits a deviation from usual format. ;-) :-P

With apologies and best wishes for happy editing,

173.67.42.107 04:36, 21 April 2024 (UTC)[reply]

No, just replying to a post doesn’t ping anyone. You have to add {{ping}} or {{reply}}. — Sgconlaw (talk) 05:14, 21 April 2024 (UTC)[reply]

To be more accurate, you have to link to their user page. I can do something like "No, I'm not pinging anyone- why do you ask?", and as long as I sign it, it becomes a ping. The last part is what many people don't understand: it doesn't matter how you format it; if the signature isn't added in the same edit as the ping, it won't work. If you add a ping in later edits, it just makes it look like you pinged someone, but no one receives the ping. The only exception is linking to the user page in an edit summary, which always works. Chuck Entz (talk) 05:28, 21 April 2024 (UTC)
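The rule described above can be modelled roughly as follows. This is a simplified sketch, not how the MediaWiki software actually implements notifications: the function name `pinged_users` is hypothetical, it only looks at raw wikitext added in a single edit, and it ignores templates such as {{ping}} (which would need expanding first) as well as the edit-summary exception.

```python
import re
from typing import Set

# Matches the user name in links such as [[User:Example]] or [[User:Example|text]].
USER_LINK = re.compile(r"\[\[\s*User:([^\]|#/]+)", re.IGNORECASE)
SIGNATURE = "~~~~"


def pinged_users(added_text: str) -> Set[str]:
    """Users who would be notified by ONE edit that adds `added_text`.

    Simplified model: a user-page link only counts as a ping when a
    signature is added in the same edit.
    """
    if SIGNATURE not in added_text:
        return set()
    return {name.strip() for name in USER_LINK.findall(added_text)}
```

For example, an edit adding `[[User:Einstein2|thanks]] ~~~~` would notify Einstein2 under this model, while adding the same link in a later, unsigned edit would notify no one.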

indented translations: "Ancient" or "Ancient Greek"?

Common practice indents Ancient Greek and Mycenaean Greek translations under a Greek header, where the header line itself lists Modern Greek translations. Usage isn't consistent, however, on whether the lines read "Ancient Greek" and "Mycenaean Greek" or just "Ancient" and "Mycenaean". What should it be? Personally I lean towards "Ancient Greek" and "Mycenaean Greek"; I notice for example that Arabic lect translations indented under the Arabic header always spell out "Egyptian Arabic", "Moroccan Arabic", etc. rather than just "Egyptian", "Moroccan", etc. The general principle I would advocate is when the indented language is a full L2 language, use the full name of that language. This is consistent with how both Arabic and Chinese translations are handled. Benwing2 (talk) 01:07, 19 April 2024 (UTC)[reply]

when we do a search it reads the string from the template, so e.g. this search (great choice of word, i know, but i didnt want to waste time searching for something else) produces

μοτός is an Ancient translation of the word pledget ("small absorbent pad").

Instead of reading "Ancient Greek". Likewise the same type of search will turn up results like "---- is a Cyrillic translation of ----" because Serbian and some other digraphic languages have the indented forms just read "Cyrillic". If we could change this it would be good. —Soap— 03:46, 19 April 2024 (UTC)[reply]

@Soap Agreed, but I spent about 20 minutes trying to find the source code that generates this and failed. User:Erutuon or User:This, that and the other do you have any idea how these translations are generated when you search? Benwing2 (talk) 04:13, 19 April 2024 (UTC)[reply]

BTW I have modified my script to sort translations to automatically indent all varieties of Greek listed under the family tree at Category:Ancient Greek language under a Greek header, and to rename Ancient -> Ancient Greek and Mycenaean -> Mycenaean Greek (and similarly for Epic, Ionic, Doric, Aeolic, Boeotian, etc.). I haven't run it yet pending consensus that this is the right thing to do. Benwing2 (talk) 07:05, 19 April 2024 (UTC)[reply]
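For anyone curious what such a rename amounts to, here is a rough sketch. It is not Benwing2's actual script: the function name, the `RENAMES` table and the assumed line shapes (`* Greek: ...` for the header line, `*: Label: ...` for indented lines) are illustrative assumptions, and only the two renames spelled out above are included.

```python
import re
from typing import Iterable, List

# Short label -> full language name (only the two stated explicitly above).
RENAMES = {"Ancient": "Ancient Greek", "Mycenaean": "Mycenaean Greek"}

# ASSUMED shape of an indented translation line: "*: Label: {{t|grc|...}}"
INDENTED = re.compile(r"^(\*:\s*)([^:]+)(:.*)$")


def rename_indented(lines: Iterable[str]) -> List[str]:
    """Spell out short labels on lines indented under a Greek header."""
    out: List[str] = []
    under_greek = False
    for line in lines:
        if line.startswith("* "):
            # A new top-level language line starts or ends the Greek block.
            under_greek = line.startswith("* Greek:")
        m = INDENTED.match(line)
        if under_greek and m and m.group(2) in RENAMES:
            line = m.group(1) + RENAMES[m.group(2)] + m.group(3)
        out.append(line)
    return out
```

Restricting the rename to lines under the Greek header leaves labels such as "Cyrillic" under other headers untouched, so script-based labels are a separate decision.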

@Benwing2 it looks to be User:Yair rand/FindTrans.js. We really should get that script out of userspace. This, that and the other (talk) 07:47, 19 April 2024 (UTC)[reply]

@This, that and the other Agreed. BTW for future reference how did you find that script? Maybe just had the presence of mind to search in User space? (I only searched in MediaWiki space.) Benwing2 (talk) 21:12, 19 April 2024 (UTC)[reply]

Yes, that's all it was. This, that and the other (talk) 02:28, 20 April 2024 (UTC)[reply]

(Notifying Mahagaja, Sartma): User:Erutuon User:-sche do any of you have an opinion about this? I notice that the use of "Ancient Greek" rather than "Ancient" is what the translation adder generates, and it's consistent with the handling of most other indented language sets, including Sami, Kurdish, Romani, Mari, Sorbian, Nenets, Arabic, Chinese, etc. The only exceptions where something other than a language name is specified in indented lines are (a) when a script is mentioned instead of a language name (e.g. underneath Serbo-Croatian, Mongolian, Old Church Slavonic, Javanese, Malay, etc.); (b) in some cases where etym-only varieties are indented instead of full languages (e.g. varieties of North Frisian, Ossetian and Albanian); (c) in the case of Bokmål and Nynorsk. Benwing2 (talk) 02:10, 21 April 2024 (UTC)[reply]

I agree with your general principle of spelling out full language names ("Ancient Greek" is better than "Ancient"), this also helps if anyone has the presence of mind to Ctrl+F and look for "Ancient Greek" when they can't find it in alphabetic order. I am not personally a fan of nesting translations at all, but I recognize that other people like it. To me, it seems unintuitive to call a language e.g. "Whatever" in its L2, and sort that L2 after Walloon and before Zulu, but then in translations tables, sort it under A — if we're not considering Whatever to be a dialect of Apache when it comes to having actual entries, it seems unintuitive to me to be subsuming it like a dialect of Apache in translations tables, and I have sometimes thought a translation was missing from a table (only to find it upon trying to add it) as a result of this. But I recognize that other people feel the opposite way and think sorting all Apachean languages under A (etc) is the more intuitive thing. - -sche (discuss) 02:53, 21 April 2024 (UTC)[reply]

@-sche Yes, I am of two minds about this for the reasons you state. I think it's especially problematic when the L2 language name doesn't include the language-set name in it. I asked about this specifically in the context of Aramaic in the Grease pit discussion that prompted this: besides lects ending in "Aramaic", there's also "Mlahsö", "Turoyo", "Classical Syriac", "Hulaulá", "Hértevin", "Koy Sanjaq Surat", "Lishana Deni", "Lishanid Noshan", "Lishán Didán", "Senaya", "Classical Mandaic" and "Mandaic". I have never heard for example of Mlahsö and would have no idea that it's nested under Aramaic instead of found under M. (OTOH I assume someone who adds a Mlahsö translation or goes looking for one will know that it's a variety of Aramaic.) Benwing2 (talk) 03:33, 21 April 2024 (UTC)[reply]

@Benwing2: Given that "Greek" is ambiguous between Ancient Greek and Modern Greek (and presumably other varieties and chronolects of Greek), might it be worth labelling Modern Greek translations as "Modern Greek" and nesting them under "Greek" like all the other varieties and chronolects of Greek? I've noticed Sarri.greek specify "Modern Greek" in several of her edit summaries, so perhaps she has an opinion regarding this. 0DF (talk) 17:22, 21 April 2024 (UTC)[reply]

(If you mean renaming the language everywhere,) On one hand, this would also solve the problem (discussed further up this page) of people not realizing "Greek" means the modern language, and so writing that this or that Coptic term derives from "Greek". On the other hand, "Greek" is clearly a ~~modern~~ more common name for the language than "Modern Greek", and "Modern Greek" would be a weird header to give long-obsolete terms from the early end of the time period el covers (centuries ago). On a balance, I don't think it's a good idea. (If you mean only renaming the language in translations tables, that still seems confusing, to have two names for the language in different places, and (again) to be labelling obsolete old terms as "Modern".) I think we're just stuck with some confusion. (It's certainly not the worst such confusion, compare cases where two languages are both called the same name, e.g. Riang, but one is sometimes also spelled Reang, so we call that one Reang and the other one gets exclusive use of Riang, probably confusing anyone who wants to add terms in Reang but is familiar with it being Riang and so adds the terms as Riang...) - -sche (discuss) 17:55, 21 April 2024 (UTC)[reply]

@-sche: I meant only in translation tables, although I note that το Βικιλεξικό uses Νέα ελληνικά (Néa elliniká, literally “New Greek”) for its el language header. IMO, it wouldn't be nearly as confusing as you suggest, since Modern Greek translations are currently given at "Greek"; they would still be given at "Greek" if we nested "Modern Greek" there, just without the Ancient/Modern confusability. To your other point, very many modern things are obsolete, and not just in language; remember that all of Shakespeare is (Early) Modern English. 0DF (talk) 21:52, 21 April 2024 (UTC)[reply]

New attestation template for Korean

@Solarkoid @Tibidibi @AG202 @Chom.kwoy

As per the discussion at Wiktionary:Beer parlour/2024/March#Template:ko-etym-native without parameters is pointless and misleading, I have created a new template {{ko-attest}}, which is demonstrated here. This offers a few advantages over Template:ko-nat, namely:

Language code is manually specified instead of guessed based on the year, allowing for more flexibility
Term transliteration and formatting takes advantage of Module:links, giving us automatic transliteration, |t=, and |alt= for free.
Volume, page number, and url have been added as optional parameters, allowing for more specificity
Work "presets" are defined as their own templates, making them much easier to add and customize
- See Module:ko-etym for the maintenance burden posed by the previous strategy of putting everything in the module
- As an example of customization, Template:User:Lunabunn/ko-attest/YB uses the 세종한글고전 database to automatically link to the relevant page or volume; this kind of thing would have been impossible or very difficult under the previous model

Example: automatic linking
{{ko-attest/YB|ᄃᆞᆯ〮|volume=1|page=1a}}. Output: First attested as Middle Korean ᄃᆞᆯ〮 (tól) in the Worinseokbo (月印釋譜 / 월인석보) 1:1a[20], 1459.

It does not automatically categorize the term as native Korean or derived/inherited/... from the source language

Especially for editors,

Work title/year/... formatting is handled by the module, making it much easier to cite works without dedicated templates
Only the language code is required with both the term and work being optional
- This reduces the burden of finding a first attestation when you only know the Middle Korean form (from e.g. a dictionary)
- Incomplete invocations such as these can be tracked using Module:debug/track
We can finally treat derivation separately from attestation. It is often the case that the attested form is not a direct ancestor of the entry form, and we can make this clear.

Given below are some example etymologies for words that are currently problematic:

Example: 부럽다 (bureopda)

{{ko-attest|ko-ear|부럽다|...}}, probably from an earlier {{inh|ko|okm|nocat=1|-}} {{com|okm|nocat=1|블다|-어ᇦ-|pos2=adjectivizer}}.

Equivalent to now-obsolete {{dbt|ko|notext=1|븗다}}, first [[attested]] as {{ko-attest/...|okm|h=none|븗다|...}} from {{com|okm|nocat=1|블다|-ㅸ-|pos2=adjectivizer}}.

Output:

First attested as Early Modern Korean 부럽다 (pwulepta) in ..., probably from an earlier 블다 (pulta) +‎ 어ᇦ (-eW-, adjectivizer).

Equivalent to now-obsolete 븗다 (beulda), first attested as Middle Korean 븗다 (pulpta) in ... from 블다 (pulta) +‎ ㅸ (-W-, adjectivizer).

Example: 서럽다 (seoreopda)

{{ko-attest/...|셟다|...}}.

The origin of this particular form is {{unc|ko|nocap=1}}. By analogy with words like {{m|ko|즐겁다}}, one may reconstruct {{com|okm|nocat=1||-어ᇦ-|alt1=*셜-|pos2=adjectivizer}}, but the existence of such a verb is dubious. Instead, it may simply be the result of reanalysis by analogy.

Output:

First attested as Middle Korean 셟다 (syelpta) in ....

The origin of this particular form is uncertain. By analogy with words like 즐겁다 (jeulgeopda), one may reconstruct *셜 (*syel) +‎ 어ᇦ (-eW-, adjectivizer), but the existence of such a verb is dubious. Instead, it may simply be the result of reanalysis by analogy.

This topic DOES intend to:

Propose bringing this new template into main space for us to start using
Encourage discussion about the benefits and shortcomings of this new template
Acknowledge the need of separating attestation and etymology, hopefully providing a foundation for seamless transition into structured etymologies in the future

This topic DOES NOT intend to:

Call for the immediate deprecation of Template:ko-nat or Module:ko-etym

I acknowledge that this needs more discussion and future-planning. Lunabunn (talk) 05:55, 19 April 2024 (UTC)[reply]

Would also love the opinions of @Surjection, @Eirikr, and @Atitarev (as well as anyone else I have not mentioned).

I apologize for the mass ping, but I would love to get this as nice as possible from the onset. Lunabunn (talk) 05:58, 19 April 2024 (UTC)[reply]

@Lunabunn: The work on providing references is great. I am not familiar with the sources and how to use them, but it seems someone needs, at least, to make some effort and find the entry. My original objection to simply removing {{ko-etym-native}} applied only when no replacement is offered. Of course, the more detailed the reference is, the better! Thanks.

Let's take "Sinjeung yuhap". Is it easily available online, and does it have an English translation? Anatoli T. (обсудить/вклад) 06:29, 19 April 2024 (UTC)

Most sources are going to lack English translations, unfortunately, but at least providing the original image as I have with YB seems to be a good addition.

As for the 신증유합 example, (as with most other "high-profile" texts,) scans are available online. The issue with linking to individual pages may be hosting, as many scans are not directly linkable as are Sejong DB's; Chom.kwoy might be able to shed some light here.

In addition, I will note that the |url= parameter is optional & all URL-related work is done in the work template. See Template:User:Lunabunn/ko-attest/YB. For works where we cannot get per-page links, we always have the flexibility to link instead to the entire PDF (or not link at all). Lunabunn (talk) 06:45, 19 April 2024 (UTC)[reply]

@Lunabunn: You have just LIED that "As per the discussion at Wiktionary:Beer parlour/2024/March#Template:ko-etym-native without parameters is pointless and misleading". It was pointed out that it is being used to record that the term feels 'native', a feeling that is not rigidly tied to the true etymology. Unfortunately, there has been no agreement that I have seen on a method of recording that. --RichardW57m (talk) 15:47, 19 April 2024 (UTC)[reply]

Like I said, I never called for ko-nat to be deprecated, nor did I claim that we reached a consensus. All I said was that I have created a new template as per the discussion---which is true---explicitly noting that further action will require more discussion. Your comment is thus explicitly outside of the scope of this thread.

Please do not resort to childish attacks such as calling me a "LIAR" solely based on your unfortunate assumption that everyone who doesn't personally agree with you must be malicious and deceitful. Lunabunn (talk) 16:13, 19 April 2024 (UTC)[reply]

That being said, for additional context that I have already provided in the original thread:

ko-nat doesn't print a "Of native Korean origin" message unless it is invoked incompletely without the attestation parameters. In such cases, this template would not be used anyway (as there is no attestation to speak of).

The only other difference is categorization, which editors can add manually (or, for the time being, even using ko-nat alongside this template). There is absolutely no difference from the user/reader's point of view. This template merely provides nicer formatting and more flexibility for editors. Lunabunn (talk) 16:20, 19 April 2024 (UTC)[reply]

I personally really like how this template is laid out. Template:ko-etym-native was already very hard to maintain, very rigid in how it works and just generally difficult to edit, at least for me. As I already mentioned and/or agreed in the aforementioned template's beer parlour thread, it made unnecessary and uncalled-for judgments and there was no good way around it. Besides, the name was pretty misleading to a lot of editors, in my opinion. I think this template just solves most of the problems associated with it. Plus, it is named for exactly what it does: it records Korean attestations. I'm sure we will need to discuss some stuff, mostly about picture URLs, but for now it should work. I was (and still somewhat am) of the mind that first attestations in Modern Korean etymology sections are unnecessary; however, until Middle Korean entries become sufficiently good for templates like this (which, who knows when it's gonna happen), I think it'll do just great. - Solarkoid (talk) 18:06, 19 April 2024 (UTC)

I am not an editor for KO entries, aside from minor maintenance stuff, as my Korean capabilities are very much at the "beginner" level currently. That said, from the perspective of a reader of KO entries, including occasionally the wikitext, I think this proposed template looks good, both in terms of output and wikitext. And as others have also noted, it "does what it says on the tin", unlike Template:ko-etym-native. +1 from me. 😄 ‑‑ Eiríkr Útlendi │Tala við mig 22:17, 19 April 2024 (UTC)

I'm very late, but Support from my end. AG202 (talk) 14:34, 26 April 2024 (UTC)[reply]

East Bergish

There's something of a slow-moving edit-war between Special:Contributions/2003:DE:374E:E270::/64 and Special:Contributions/Sarcelles happening across a variety of pages, revolving around whether Ostbergisch is A Thing (valid clade / lect) or not. Who do we have who is able to weigh in and, ideally, bring references to bear? (@Kolmiel, Korn? Not active recently.) (Tiny and limited prior discussion: in the ES.) - -sche (discuss) 20:38, 20 April 2024 (UTC)[reply]

Not something I'm an expert in, but reversions like this justify more explanation than "POV and other problems", and I see many, many similar removals by @Sarcelles. I also see Sarcelles pointing to various Wikipedia talkpages in various edit summaries, too, but these are wholly irrelevant to Wiktionary. Theknightwho (talk) 20:46, 20 April 2024 (UTC)[reply]

I didn't know of the ES discussion until a few minutes ago. A typical opinion removed by me was "Dutch basically is Low German". https://en.wikipedia.org/wiki/Talk:East_Bergish and https://en.wikipedia.org/wiki/Talk:Limburgish are among the talk pages frequented by me. Sarcelles (talk) 20:56, 20 April 2024 (UTC)

@Sarcelles: Wiktionary is a dictionary, so quotes are used to show usage, not verify facts. The most wrongheaded and vicious lie is fine as a quote as long as it uses the term in the spelling and with the definition being illustrated. Chuck Entz (talk) 21:14, 20 April 2024 (UTC)[reply]

What can we do with doubtful content? Sarcelles (talk) 21:16, 20 April 2024 (UTC)

Although, lest anyone get the wrong impression of our policies and practices, a quote with vicious misinformation is better on the Citations page than in an entry, and if there are enough other quotations to verify the sense, it may not be needed at all.
In the case of linguistic families, I'm sympathetic to the point of view that if we had, for example, an old book somewhere which erroneously included Hungarian in a list of Slavic languages, "The Slavic languages include Russian, Polish, Serbian, Hungarian, Slovene, and Bulgarian.", this might not actually be useful to us as a citation of Slavic, and/or factual considerations of whether Hungarian was actually Slavic might outweigh the existence of that cite, when it came to deciding whether to label a word as "Slavic, specifically Hungarian" (to give a parallel to some of the entries being discussed here). (But if there were many such cites using a different-than-usual sense, as may be the case with some of the Low German / Niederdeutsch cites being discussed, that would merit a separate sense line, yes, like obsolete taxonomic categories, etc.) - -sche (discuss) 22:03, 20 April 2024 (UTC)[reply]

It's not an issue about East Bergish, but about Ostbergisch. The English term is a protologism (failing WT:CFI), while the German term has three usages (hence passing WT:CFI).
Ostbergisch is not a universally used term, instead it's rather uncommon (cp. the label), but it exists (at least three usages, passing WT:CFI).
This stuff isn't in line with the quotations (e.g. 2021: "Ostbergisch, einer niederfränkischen Mundartgruppe"), and instead is rather defining Bergisch, another term and a totally different thing.
As for other stuff, like e.g.:
- Krefeldisch:
  - It's not a good explanation: Why remove a quotation with a usage of the term in the first place? And why remove the most recent and not the oldest?
  - Three usages are needed per WT:CFI, so the third quote is needed as well to prove the existence.
- Niederrheinisch:
  - This should be quite self-explanatory: The entry lists different senses of the term in question, some senses being old but attested and illustrated by older usages. Some quotes were removed. First they were removed as being "doubtful quotations" - but they aren't doubtful, they can be verified with books.google.com. Then they were removed as being "ages old". But WT also covers "old" (obsolete, archaic, dated) and other (e.g. technical, uncommon, rare, offensive) usages/senses. And "old" usages/senses are usually attested with old quotations which were provided to show the existence (cp. WT:CFI about attestation etc.).

--00:11, 21 April 2024 (UTC)

I doubt that either the IP author or I know the rules. How many quotations are advisable? What is done with obsolete concepts? Sarcelles (talk) 04:20, 21 April 2024 (UTC)

@Sarcelles: Well to put it bluntly, you are even supposed to have a POV, being a currently informed observer of past matters. Being a secondary source, Wiktionarians have some leeway to be creative and combine different views: notably w:WP:SYNTHESIS does not apply, which enables Wiktionary to have the most correct, balanced, perspectives on word histories, in comparison against references available. We don’t write biographies of living persons, all is in the linguistic material, every statement based on accessible corpora, right? That’s why we afford this approach. So you don’t have to start from references containing the language names in the first place but base glosses helpful to readers on your experience of usage, your general impression, which of course does not preclude additional research with the purpose of creating entries.

Clinging to the understandings of previous references does not even do justice to the diachrony and synchrony distinction, which is reflected in distinguishable styles and orders of glosses: some sort their word explanations from most common to rarest, others chronologically to give better credit to etymology, which too needs to be written, others logically (as field, which otherwise is unreadable), depending on how they felt.

If you are into psychobabble (I’ve recently grown fond of): intersubjectivity, being the actual goal rather than objectivity because our human readers, as social animals or neurotypes more interested in social proof and framing rather than pure reason anyways, requires greater mentalisation capabilities in the form of the editor being able to interpret what was behind an author’s utterances when he wrote a quote. We cannot go without our personal interpretation of the information available to us, for talking to other people—whenever you are not only speaking to yourself in the shape of private language—involves the hot potato of theory of mind, which we may be more or less explicit about: Talk:skoliosexual. Fay Freak (talk) 08:55, 21 April 2024 (UTC)[reply]

I also see that @Sarcelles continued to remove quotations even after the difference between WT and WP was explained to them. This isn't really acceptable, to be honest, especially not when "to a large extent irrelevant" isn't actually true. It's absolutely relevant that the removed quotations were being used to support usage. Theknightwho (talk) 19:05, 21 April 2024 (UTC)[reply]

I've been following the debate between Sarcelles and the Paderborn-based IP on enWP for quite some time (and also actively participated in some of the discussions). Sarcelles has been kind enough to bring to my attention that they also extend their bickering to various other Wikimedia projects, including English Wiktionary.

First of all, @Sarcelles, keep in mind that Wiktionary is about words, while Wikipedia is about things that pass the threshold of general notability. So e.g., Ostbergisch might be a contentious concept; for my part, I believe it is a useless label for an arbitrary residual artefact of rigid inclusion criteria (= Uerdingen line, Benrath line, en-Einheitsplural) for four surrounding major dialect groups (Kleverlandish, South Low Franconian, Ripuarian, Westphalian). But that's largely irrelevant for Wiktionary. The term is a neologism coined by German dialectologist Georg Cornelissen. Because of his affiliation to the LVR (Landschaftsverband Rheinland), this term has received an enormous online boost and consequently has gained some prevalence outside of Cornelissen's bubble in quite a few printed texts. So it clearly deserves an entry here.

The only things that need to be discussed here are:

Does the label "uncommon" sufficiently capture its usage?
Is the definition ("A Low Franconian variety, spoken in the German state of North Rhine-Westphalia") too much in-universe? We don't define "fish" as "A biological taxon", but with a simple descriptive definition. What about "A label for dialects spoken in Bergisches Land in the German state of North Rhine-Westphalia", maybe with direct attribution to Cornelissen?

As for the Paderborn IP entries, I can see a general problem with them being very ambitious in trying to capture all possible definitions of highly fluid terms like Niederrheinisch in German dialectology. Currently, we have four definitions in Usage notes. The second one is overly detailed and as a result wrong; Wenker and Wiesinger have used the term for quite similar concepts, but Wiesinger didn't see the Uerdingen line as the southern demarcation for Niederrheinisch, but rather the Akzentgrenze. And worst of all, the most common definition is lacking: most traditional dialects in the Niederrhein area are moribund or extinct; for most people, Niederrheinisch refers to the regional Umgangssprache of that area which is markedly different from colloquial Rheinisch (as spoken around Cologne) and Ruhrdeutsch. Bluntly speaking, I think that the IP's ambitions are occasionally not matched by sufficient familiarity with the relevant literature, leading to the odd inaccuracies and lacunae in the Usage notes of Niederrheinisch (and potentially other entries). –Austronesier (talk) 10:22, 21 April 2024 (UTC)

"for my part, I believe"
- I for my part prefer NPOV, it doesn't matter what I or some other WT editors think, believe or prefer: others obviously didn't share this belief, used Ostbergisch, and the term is sufficiently attested.
"Wiesinger didn't see"
1. Sometimes terms are used without giving a clear definition first, so Wiesinger (1975/2017) could have the term without a clear definition.
2. Wiesinger (1983, in Dialektologie vol. 2, HSK 1.2) has e.g. the following (which indeed could lead to another sense):
  - p. 859: "Das Niederfränkische am Niederrhein [...] Niederrheinisch oder Kleverländisch bezeichnet. [...] Obwohl es mangels der Lautverschiebung [...] als „niederdeutsch“ bezeichnet wird, [...]." [Some features from the vocalism and consonantism.]
  - p. 856 ("Karte 47.10"): Here several "Niederfränkisch-ripuarische Strukturgrenzen" are given.
3. In the given quote, Wiesinger (1975/2017) refers to Goossens and the Ürdingen line:
  "GOOSSENS [...] erwies sich auch ihm [...] die Ürdinger Linie [...] als die sprachliche Hauptscheide gegen das Westfälische im Osten und das Niederrheinische im Norden."
"A label for [...]"
- That's a start of a rather bad definition like "fish: a word/term for cold-blooded vertebrate animal that lives in water". (Cp. e.g. de.WT's Einsetz-Probe.)
"And worst of all, the most common defintion is lacking: [...] Niederrheinisch refers to the regional Umgangssprache of that area which is markedly different from colloquial Rheinisch (as spoken around Cologne) and Ruhrdeutsch."
- de.WP doesn't have that sense either. [It has: "Niederrheinisch ist [...] Mundarten des Niederrheins", and calls the regiolect which developed from the dialect(s) "Niederrhein-Deutsch". Though, albeit no surprise, it lacks several senses which are attested in en.WT.]
  Feel free to add this sense, but please don't forget quotes.

--21:13, 22 April 2024 (UTC)

"I for my part prefer NPOV, it doesn't matter what I or some other WT editors think, believe or prefer." Not the first part of the sentence, but the latter part of the sentence is a quite good description of the attitude of this user. What's more, this user seems to confuse WT with a talk page of German POV, German language, German manners and German quality. They are piling up arguments in favour of their views excessively. Sarcelles (talk) 05:37, 23 April 2024 (UTC)

en:List of fallacies, e.g. argumentum ad hominem. --2A01:599:640:4B27:905A:B12A:943E:F96B 06:04, 23 April 2024 (UTC)

This is a major conflict across Wikimedia. What do other users think? Sarcelles (talk) 06:19, 23 April 2024 (UTC)

which languages to indent?

Related to the above topic on whether Ancient Greek should be indented as "Ancient" or "Ancient Greek", there are several cases where the existing practice is equivocal as to whether to indent. MediaWiki:Gadget-TranslationAdder-Data.js contains a list of languages to indent, but some others tend to be indented as well. Some questionable cases:

German: The translation adder data puts Alemannic German, Kölsch and Palatinate German indented under German but not other German varieties. (Not to mention that it puts German Low German and Dutch Low Saxon indented under Low German instead.) The stats bear this out to some extent: Alemannic German is indented 795 times vs. non-indented 127 times, whereas Bavarian is indented 33 times vs. non-indented 452 times, and Pennsylvania German is indented 32 times vs. non-indented 355 times. But I don't see why Alemannic German should be treated differently from Bavarian, Pennsylvania German, East Central German, etc. What should be done? A fuller table looks like this:

Language	Indented	Non-indented
Alemannic German	795	127
Bavarian	33	452
Central Franconian	78	72
Cimbrian	2	144
East Central German	8	69
East Franconian	2	3
Kölsch	1	1
Luxembourgish	2	3476
Middle High German	55	43
Mòcheno	1	362
Old High German	47	169
Palatinate German	0	0
Pennsylvania German	32	355
Rhine Franconian	49	45
Swabian	7	81
Swabian German	3	0
Vilamovian	0	408

Low German: The translation adder data says to indent German Low German and Dutch Low Saxon, but not Middle Low German. In reality, it's 17 indented Middle Low German vs. 31 non-indented.
Greek: The translation adder data says to put Ancient Greek and Mycenaean Greek under Greek. Presumably other ancient varieties get indented too. But what about Pontic Greek, Mariupol Greek, Tsakonian, etc.? Currently it's 7 Pontic Greek indented vs. 86 non-indented, 0 Mariupol Greek indented vs. 14 non-indented, 3 Byzantine Greek indented vs. 1 non-indented, 5 "Cappadocian" indented vs. 12 "Cappadocian Greek" non-indented.
Persian: The translation adder data says to indent Iranian Persian, Classical Persian and Dari, but not Middle Persian or Old Persian. In reality, it's 27 indented Middle Persian vs. 256 non-indented, and 15 indented Old Persian vs. 84 non-indented.
Apache: The translation adder data says to indent Western Apache, Jicarilla and Chiricahua, but not Plains Apache or Lipan (probably an oversight). In reality, it's 2 indented Lipan vs. 6 non-indented and 2 indented Plains Apache vs. 9 non-indented. Note also that Navajo is on the same level phylogenetically as the other Apache varieties but is almost certainly excluded intentionally.
Irish: The translation adder data says to indent Old Irish and Middle Irish but not Primitive Irish (almost certainly an oversight). In reality it's 5 to 5 for Primitive Irish.
Khanty, Mansi, Rusyn: All recently split. Based on other patterns, probably they should all be indented but they're not mentioned in the translation adder.
Nenets: Not mentioned in the translation adder. Should Tundra Nenets and Forest Nenets be indented? Currently it's 3 Tundra Nenets indented vs. 56 non-indented and 2 Forest Nenets indented vs. 10 non-indented.
Nahuatl: Not mentioned in the translation adder. Should the various Nahuatl varieties be indented?
Malay: Not mentioned in the translation adder. Should the various Malay varieties (Brunei Malay, Ambonese Malay, Baba Malay, Pattani Malay, North Moluccan Malay, Manado Malay, etc.) be indented?

Etc. Benwing2 (talk) 06:03, 21 April 2024 (UTC)[reply]
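For anyone who wants to compare the German-variety counts above at a glance, they can be turned into indentation shares with a few lines of Python. This is only a sketch: the numbers are copied from the table (a subset of rows), while the 50% cut-off and the names `COUNTS`, `indent_share` and `mostly_indented` are illustrative assumptions, not anything the translation adder itself does.

```python
# Counts copied from the table above: language -> (indented, non-indented).
COUNTS = {
    "Alemannic German": (795, 127),
    "Bavarian": (33, 452),
    "Central Franconian": (78, 72),
    "Luxembourgish": (2, 3476),
    "Pennsylvania German": (32, 355),
    "Palatinate German": (0, 0),
}


def indent_share(indented, non_indented):
    """Return the fraction of occurrences that are indented, or None if there are none."""
    total = indented + non_indented
    return indented / total if total else None


def mostly_indented(counts, threshold=0.5):
    """List languages whose indented share exceeds the (assumed) threshold."""
    result = []
    for lang, (ind, non) in counts.items():
        share = indent_share(ind, non)
        if share is not None and share > threshold:
            result.append(lang)
    return result


if __name__ == "__main__":
    for lang, (ind, non) in COUNTS.items():
        share = indent_share(ind, non)
        print(lang, "n/a" if share is None else f"{share:.0%}")
```

Languages with no occurrences at all (such as Palatinate German in the table) are reported as "n/a" rather than as 0%, since there is no existing practice to measure.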

Indent all German except Ausbausprache Luxembourgish (we don’t indent Maltese either). Indent all Greek. I guess indent Apache due to terminology difficulties (as Aramaics). Of course chronolects of Irish. Indent all Nahuatls, I have no doubt, and Malays. Fay Freak (talk) 08:05, 21 April 2024 (UTC)[reply]

Don't indent Old (or Middle) High German IMO, I look for that under O (and M), just like Old English which we also don't indent. Don't indent Luxembourgish, which, as the proverb goes, has its own army. I am sceptical of the helpfulness of nesting Bavarian or Alemannic German, but maybe editors who add those languages can weigh in on what they would find helpful vs unhelpful. I am very ambivalent about indenting the rest. To me it seems normally unhelpful / unintuitive to take a language that we sort alphabetically when it comes to L2s, and that we indeed give its own L2s in recognition that it is not merely a dialect of something else, but then sort it like a dialect in translations tables; to me it only seems intuitive when it's a situation like Chinese where we also sort all the languages under one L2. (I am neutral about nesting Arabic, where at least all but one(?) of the nested languages have "Arabic" in the name and are probably thought of as dialects by many people.) But I recognize that other people find it more intuitive, for whatever reason, to sort L2s and translations differently. - -sche (discuss) 15:09, 21 April 2024 (UTC)[reply]

Error: There is no English to indent under. Otherwise I would suggest to indent Old and Middle English under it. Fay Freak (talk) 17:59, 21 April 2024 (UTC)[reply]

Interesting point! I mentioned Old English because there's a place in the code where ang and enm have specifically been commented out of being nested... I guess at some point in the distant past someone put them under an empty "English:" header. (Anyway, I oppose nesting Middle or Old High German, and am inclined to oppose nesting the others; it is unintuitive to sort "Old Norse" under "O" but "Old High German" under "G", IMO.) - -sche (discuss) 18:10, 21 April 2024 (UTC)[reply]

@-sche FYI there are two Arabic varieties (not counting Maltese) that don't have "Arabic" in their name: Hassaniya and Nubi. Nubi is a creole (although Juba Arabic is also a creole but has Arabic in its name, presumably because Juba by itself is a city), but I don't know why Hassaniya doesn't have Arabic in its name (cf. the Wikipedia article Hassaniya Arabic). Benwing2 (talk) 21:00, 21 April 2024 (UTC)[reply]

FWIW It was proposed to add "Arabic" to the name of Hassaniya at WT:RFM#Renaming_mey for consistency back in 2017, if anyone has time to make the rename. (I would suggest checking with anyone editing the language first to make sure they're still on board with a rename, but no one seems to be editing the language; our few entries seem to have been added years ago and the most recent editor to touch them was TongcyDai just helping with categorization and listing no knowledge of the language.) - -sche (discuss) 21:58, 22 April 2024 (UTC)[reply]

Repeating what I wrote in #indented translations: "Ancient" or "Ancient Greek"?, given that "Greek" is ambiguous between Ancient Greek and Modern Greek (and presumably other varieties and chronolects of Greek), might it be worth labelling Modern Greek translations as "Modern Greek" and nesting them under "Greek" like all the other varieties and chronolects of Greek? Note that this would not move Modern Greek translations to a new, unintuitive location, and it would prevent the conflation that Ancient Greek and (Modern) Greek often suffer. 0DF (talk) 01:56, 22 April 2024 (UTC)

@0DF There is a note in MediaWiki:Gadget-TranslationAdder-Data.js that reads:

//el:'Greek/Modern', don't nest Modern Greek (Atelaes)

So evidently this idea was considered and rejected. Maybe User:-sche has the history on this. (User:Atelaes has not been active for 10 years so this decision is very old at this point and could potentially be reconsidered.) Benwing2 (talk) 02:07, 22 April 2024 (UTC)[reply]

@Benwing2: Does anyone have contact details for Atelaes? Perhaps he could enlighten us with a rationale. Without one, that note is just an inscrutable dead hand. 0DF (talk) 02:15, 22 April 2024 (UTC)[reply]

@0DF I think we'd probably have better luck searching through Beer Parlour or Grease Pit archives and examining the changelog history of MediaWiki:Gadget-TranslationAdder-Data.js. Probably the rationale is there somewhere. BTW -sche is especially good at digging up relevant old discussions. Benwing2 (talk) 02:40, 22 April 2024 (UTC)[reply]

@Benwing2: -sche appears to have been here over six years longer than MediaWiki:Gadget-TranslationAdder-Data.js, so I am hopeful! 0DF (talk) 02:49, 22 April 2024 (UTC)[reply]

@Benwing2: For starters, the comment //el:'Greek/Modern', don't nest Modern Greek (Atelaes) has been present in MediaWiki:Gadget-TranslationAdder-Data.js since its creation. @Dixtosa, can you tell us why you added that comment, please? 0DF (talk) 02:55, 22 April 2024 (UTC)

@0DF Dixtosa split out the translation adder data in 2017 from MediaWiki:Gadget-TranslationAdder.js. This diff is the one that added this note to MediaWiki:Gadget-TranslationAdder.js in 2010, from User:Conrad.Irwin, who is long-gone. But the fact that it happened on 8 June 2010 indicates that probably the June 2010 Beer Parlour will indicate the rationale. Benwing2 (talk) 03:01, 22 April 2024 (UTC)

@Benwing2: I tried searching through the Beer parlour, the Grease pit, and the user talk pages for Atelaes and Conrad.Irwin. The only somewhat relevant thing I could find was this post by Atelaes, which states “Greek is solely a super-heading, with no actual content (the content is in its subheading under ‘Modern’)”, which suggests that Modern Greek did use to be nested under "Greek", as I propose. It seems that such nesting caused problems for a Javascript gadget of his, User:Atelaes/TargetedTranslations.js. Could that be the entire reason? Does anyone still use that gadget? 0DF (talk) 03:51, 22 April 2024 (UTC)[reply]

@0DF Well, it's been ported to the MediaWiki space under MediaWiki:Gadget-TargetedTranslations.js, so it's become an officially supported gadget. But I doubt it still has issues with this because there are tons of nested translations. It's more a case then of what we believe the right thing to do is. Benwing2 (talk) 03:57, 22 April 2024 (UTC)[reply]

@Benwing2: I did some testing with [[Middle Ages]] and [[water/translations]]. AFAICT, the way to “[s]elect preferred languages” using MediaWiki:Gadget-TargetedTranslations.js is completely different from that using User:Atelaes/TargetedTranslations.js: with the latter, one had to type the language name precisely; with the former, one toggles a star icon (⭐), which takes the place of the bullet to the left of each language. As with Atelaes’ script, if I select, for example, “Kurdish” (simpliciter) in the table in Middle Ages, it doesn't show me anything; however, if I select “Northern Kurdish” therein, it shows me “Kurdish: Northern Kurdish: Serdema Navîn ^(ku)” (following “historical period - ”). So yes, the issue with nested translations appears to have been resolved.
All that being said, if the selected language is not nested in the way the gadget expects it to be, it will fail to display the selected language's translation, even if it exists. For example, if I select “San Juan Atzingo Popoloca” in the translation table in Middle Ages, wherein it is a stand-alone unnested language, it shows me “San Juan Atzingo Popoloca: please add this translation if you can” (following the Northern Kurdish translation + “; ”); however, for the first table in [[water/translations]] (liquid H₂O), the gadget only shows me “Kurdish: Northern Kurdish: av ^(ku) f ”, even though the table also contains a translation for San Juan Atzingo Popoloca (⁴nta⁴). The reason for this is that San Juan Atzingo Popoloca is nested under “Popoloca” in that table. I deselected “San Juan Atzingo Popoloca” in Middle Ages, added *: San Juan Atzingo Popoloca: {{t-needed|poe}} nested under * Popoloca: to Dark Ages, and then reselected “San Juan Atzingo Popoloca” there; refreshing [[water/translations]], the gadget then showed me “Popoloca: San Juan Atzingo Popoloca: ⁴nta⁴”; refreshing [[Middle Ages]], the gadget fails to show me anything for San Juan Atzingo Popoloca. This, if anything, shows the technical value of consistency. The translations in [[water/translations]] are in many places listed contrary to the way they are currently listed by the translation-adding tool. Is there a bot that enforces the prescribed way of listing translations in translation tables? If so, can it be run on [[water/translations]]? 0DF (talk) 17:17, 22 April 2024 (UTC)[reply]

@0DF @-sche Hmm. I have a script I've been working on to indent languages that aren't indented but where the translation adder says they should be, but it currently doesn't go the opposite direction (i.e. unindent where the translation adder says not to indent) except in a few cases. If we are serious about being consistent with our indenting, we should probably fix the translation adder to know about the various cases identified in water/translations, e.g. Amuzgo, Chinantec, Coptic, Fula, Javanese, Kashmiri, Kipchak, Ladino, Mari, Mazahua, Mazatec, Me'phaa, Mixtec, Nahuatl, Otomi, Popoloca, Popoluca (are they different?), Talysh, Teke, Tepehua, Totonac, Zapotec and Zoque. -sche would probably disagree with all these cases; regardless I think we need a general principle concerning whether to indent or not rather than doing it in an ad-hoc, half-assed fashion. Then I can do a bot run to fix all the cases needing fixing. Even better IMO, however, would be to fix the targeted translations gadget to be less picky about whether a given language is indented or not. Benwing2 (talk) 21:06, 22 April 2024 (UTC)[reply]
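The last suggestion above, making the targeted-translations gadget less picky about nesting, amounts to matching on the language's own name rather than on its full path in the table. A rough Python sketch of that idea follows. The gadget itself is JavaScript; the table structure and the function name `find_translation` are hypothetical and are not taken from the gadget's code.

```python
def find_translation(table, language):
    """Search a translation table for `language`, whether top-level or nested.

    `table` maps a header either to a translation string or to a dict of
    languages nested under that header. Returns (path, translation), where
    `path` is the list of headers leading to the language, or None.
    """
    for header, value in table.items():
        if isinstance(value, dict):
            # Nested block: recurse into it, remembering the header we passed.
            found = find_translation(value, language)
            if found is not None:
                path, text = found
                return ([header] + path, text)
        elif header == language:
            return ([header], value)
    return None


# Two hypothetical tables listing the same language differently.
nested = {"Popoloca": {"San Juan Atzingo Popoloca": "⁴nta⁴"}}
flat = {"San Juan Atzingo Popoloca": "⁴nta⁴"}

if __name__ == "__main__":
    print(find_translation(nested, "San Juan Atzingo Popoloca"))
    print(find_translation(flat, "San Juan Atzingo Popoloca"))
```

With a lookup like this, the same selection would work on both pages regardless of whether the language is nested, which is the inconsistency described in the water/translations test above.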

Re "Popoloca, Popoluca (are they different?)": confusingly, yes, at least in theory. Scholars try to use -u- for Mixe-Zoque languages, -o- for Oto-Manguean languages. The Mixe-Zoque Popolucas in particular are, as the name suggests, not even all in one subfamily, let alone intelligible with one another, and they only share the name "Popoluca" (together with the Popolocas) because that was the Nahuatl pejorative for "(non-Nahuatl) gibberish". (Despite this etymology, the speakers of the languages have often gotten used to the name and find academics' proposed clearer replacements unfamiliar; Lynda Boudreault's 2018 A Grammar of Sierra Popoluca devotes some pages to this.) The various things we nest really seem to run a gamut from "related and sometimes considered varieties/dialects of one language (whether correctly/defensibly or not)" (Arabic, Chinese) to "I guess people might group these for convenience?" to "not related, but the names sure do sound similar". - -sche (discuss) 21:58, 22 April 2024 (UTC)

Although we have a lot of users in CAT:User el, I don't think we've ever had a very active Greek-speaking editor community of more than a few users (our lack of active Greek speakers has been a memorable problem when Greek questions have come up), so my guess would be that Atelaes and whoever else was actively editing Greek at the time may have briefly talked about it (potentially even somewhere now-inaccessible like IRC, where some parts of our language-code 'naming' schema were also hashed out) and just decided they personally didn't want it nested, so as Benwing says, it's probably just a case of whether we personally now want it nested. FWIW, not nesting it seems consistent with most other languages, where the modern or main language is not nested (Arabic, French, etc). I am not inclined to nest it, personally, but (pending any general decision on whether to nest vs stop nesting things in general) would defer to the few active Greek users we have now, which seems to be ... Sarri and Omnipaedista and maybe someone else? - -sche (discuss) 04:08, 22 April 2024 (UTC)[reply]

@Sarri.greek @Saltmarsh as two people who work on Modern Greek. Also @Omnipaedista. I am also inclined not to nest, for consistency with other languages, as you mention. Benwing2 (talk) 04:17, 22 April 2024 (UTC)

@Benwing2 @Sarri.greek I'm afraid that I haven't absorbed all the arguments above and, approaching semi-retirement, I'm neutral on the subject. New Ancient Greek users unused to Wiktionary will probably look under G; on the other hand, uniformity with the majority of languages might make sense. So I don't have strong feelings on the issue (nor, indeed, about whether Greek should be called Modern Greek). - Salt marsh ☮ 05:30, 22 April 2024 (UTC)

@-sche Please take a look at User:Benwing2/analyze-indented-translations-20240420-dump. I analyzed the existing occurrences of indenting. The table sorts first by the number of times a header (e.g. Chinese, Arabic, Kurdish, Serbo-Croatian) occurs with at least one indented language under it (where an indented language could be any translation with any label, including things like "Cyrillic"). Under each header is listed (alphabetically) all languages that occur under it anywhere. To the right are the counts of how many times the language occurs under the header in question and then how many times the language occurs total (indented or not, and indented under any header). I think we need to establish a general principle for whether to indent a language, and I propose the following:

Whenever a language is of the form "Qualifier Macrolanguage" e.g. "Saterland Frisian", "Iraqi Arabic", "Tundra Nenets", "Western Mari", "Ancient Greek", "Middle French", etc., it gets indented under the macrolanguage.
- The main advantage is that this makes it possible to quickly compare translations from similar languages. This, I think, is fundamentally why people like indented translations.
- The main disadvantage (as -sche would put it) is that it makes it harder to locate a given language's translation. (While this is true, I think it's partly negated by having a consistent policy of when we indent; our current ad-hoc situation gives us the worst of both worlds.)
If an L2 language is missing the name of the macrolanguage, but (a) logically goes under it (which should maybe exclude pidgins and creoles), and (b) does not have its own "army" (as the proverb goes), it also gets indented. Hence, Mlahsö, Turoyo, Mandaic go under Aramaic and maybe also Classical Syriac (although this could potentially be said to have its own "army"), but not Maltese under Arabic or Navajo under Apache. The criterion about having its own army is an attempt to balance interest in a language primarily for comparative purposes, which justifies indenting it (and happens more with obscure languages) vs. interest in the language for its own sake, which justifies not indenting it so it's easier to find (which happens more with more well-known and well-represented languages).

So I propose the following:

Use the table I linked to above as a starting point to compile a complete list of all macrolanguages and individual languages to indent under them.
Update the translation adder appropriately.
Run a script to indent any L2 languages not correctly indented. (I am a bit loath to automatically go the other direction because there may be unrecognized etym-only varieties nested underneath a language that we don't want to automatically unindent.)

Along with this proposal I propose that the names of L2 and etym-only languages as appearing indented under macrolanguage headers should *ALWAYS* match the actual name of the L2 or etym-only language, except for a small, well-defined set of exceptions. Possibly, for example, Bokmål and Nynorsk could be exceptions (the policy just enumerated would call for 'Norwegian Bokmål' and 'Norwegian Nynorsk'), although I must say I don't see a compelling reason to make such an exception. Benwing2 (talk) 07:40, 24 April 2024 (UTC)[reply]
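To make the proposed principle concrete, here is a minimal Python sketch of how the decision could be expressed. Everything in it is an assumption for illustration: the sets below are tiny hand-picked samples drawn from the examples in this thread, not the real lists (which would come from the analysis table and the translation adder data), and `indent_header` is a hypothetical name.

```python
# Hypothetical sample data; the real lists would be compiled from the table.
MACROLANGUAGES = {"Arabic", "Aramaic", "Frisian", "Greek", "French", "Mari", "Nenets"}

# Languages whose names lack the macrolanguage but that logically go under it.
EXTRA_MEMBERS = {"Mlahsö": "Aramaic", "Turoyo": "Aramaic", "Mandaic": "Aramaic"}

# Languages with their "own army": never indented, even if related.
STANDALONE = {"Maltese", "Navajo"}


def indent_header(language):
    """Return the header to indent `language` under, or None to leave it top-level."""
    if language in STANDALONE or language in MACROLANGUAGES:
        return None
    if language in EXTRA_MEMBERS:
        return EXTRA_MEMBERS[language]
    # "Qualifier Macrolanguage": the name ends with a known macrolanguage.
    # Longer names are tried first so that the most specific match wins.
    for macro in sorted(MACROLANGUAGES, key=len, reverse=True):
        if language.endswith(" " + macro):
            return macro
    return None


if __name__ == "__main__":
    for name in ["Saterland Frisian", "Ancient Greek", "Turoyo", "Maltese", "Old Norse"]:
        print(name, "->", indent_header(name))
```

The point of keeping the exceptions in explicit lists is that the "own army" criterion is a judgment call, so it has to be enumerated rather than derived from the name.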

One more issue: Should we use "Latin" or "Roman" in reference to Latin-script entries in languages such as Serbo-Croatian, Uzbek and Ladino? Serbo-Croatian, which is the split-script language with by far the largest number of translations, favors "Roman" but for other languages, current usage is split. I favor "Latin" as that is the name of the script used both here and in Wikipedia (the Wikipedia article is called Latin script). Stats are as follows:

Serbo-Croatian: Roman (14,895) vs. Latin (600);
Mongolian: Roman (3) vs. Latin (0);
Old Church Slavonic: Roman (6) vs. Latin (0);
Uzbek: Roman (72) vs. Latin (15);
Azerbaijani: Roman (83) vs. Latin (4);
Ladino: Roman (47) vs. Latin (59);
Fula: Roman (26) vs. Latin (41);
Kazakh: Roman (2) vs. Latin (14);
Turkmen: Roman (2) vs. Latin (2);
Pali: Roman (1) vs. Latin (5);
Tatar: Roman (7) vs. Latin (6);
Crimean Tatar: Roman (1) vs. Latin (8);

etc.

Benwing2 (talk) 05:38, 25 April 2024 (UTC)[reply]

OK, another issue I discovered as I try to update my script to normalize indented lang names: What about more specific varieties of existing L2's? This came up in the context of Hadrami Arabic (which is assigned an ISO 639-3 code ayh but where we merged it into Yemeni Arabic), but the same thing applies e.g. to Yemeni Arabic (San'ani), South Levantine Arabic (Palestinian), Lebanese Arabic, etc. Presumably we don't want to canonicalize these to the L2 name because that would lose information, but how should we format the extra lect info? Benwing2 (talk) 06:53, 25 April 2024 (UTC)[reply]

Latin-script Yiddish

I encounter Yiddish written in Latin script a lot (more often than I encounter it written in Hebrew script, actually, although my understanding is that overall Hebrew script is much more common?). Unlike with e.g. romanized Cyrillic, I do not get the impression that authors are using Latin script because they can't typeset Hebrew: Latin script is simply one of the scripts that people who use Yiddish use, like (and indeed to perhaps even a greater extent than) e.g. Arabic is one of the scripts people write Afrikaans in. I would therefore like to suggest that in the way we have Afrikaans entries like اِتْسْ with the definition "Arabic spelling of iets", or at the very least in the way we have Gothic entries like aiwaggeli as "Romanization of 𐌰𐌹𐍅𐌰𐌲𐌲𐌴𐌻𐌹", we should have entries for Latin-script Yiddish pointing people to the Hebrew spellings. - -sche (discuss) 21:06, 21 April 2024 (UTC)[reply]

@-sche I agree and I think there should be a better way of formatting it than the way done at اِتْسْ, which uses {{form of}}, and a better way of categorizing than using sccat= (maybe {{head}} should auto-categorize terms that are in a script other than the "dominant scripts" of the language, whatever those may be?). Either a new template, something like {{script spelling of}}, or a particular way of using the existing {{spelling of}} template. Benwing2 (talk) 22:05, 21 April 2024 (UTC)[reply]

Hmmm, looks like {{spelling of}} already does this if the thing in |2= is recognized as a script — and I was even the one who coded this up :) ... Benwing2 (talk) 23:08, 21 April 2024 (UTC)[reply]

The approach of using |sccat= in {{head}} currently includes the POS in the category name, whereas the approach with {{spelling of}} does not, since it isn't available. Which do you think is better? In the case of Afrikaans, we only have 2-3 terms per POS so probably they should be combined, but I could see the opposite argument being made when there are lots of such terms. Benwing2 (talk) 01:18, 22 April 2024 (UTC)[reply]

Well, some editors removed Latin-script Yiddish even though it is attested (did prescriptivism, anti-Judaism or anti-Germanism play a role in this?); cf. [22] & [23]. --2003:DE:374E:E207:18DF:B2BB:AAE0:CF74 21:15, 25 April 2024 (UTC)

Challenging pronunciations

Is {{rfv-pron}} the correct template for challenging pronunciations? It puts items in Category:Requests for references for pronunciations in Lithuanian entries, but I think references will often not be the resolution. Additionally, this is Wiktionary, not Wikipedia, so we are achieving a poor substitute for our goals when we merely select from other dictionaries. (The word that prompted this question was Lithuanian policija, where I am after the truth, not references.) --RichardW57m (talk) 09:33, 22 April 2024 (UTC)[reply]

You can reference pronunciations by means of primary sources innit. Corpus linguistics. Though I agree with the interpretation that the author of the template text was Wikipedia-infected and thus designed it otherwise. Fay Freak (talk) 16:03, 22 April 2024 (UTC)[reply]

Modify/deprecate NFCC or request re-enabling Special:Upload for all users?

I'm not sure whether the current situation is fair to us. On the one hand, we have a policy called Wiktionary:Non-free content criteria (NFCC), which describes how content, including files, may incorporate copyrighted non-free material under fair use. On the other hand, the well-known upload form, Special:Upload, is restricted to administrators here, which puts the NFCC policy somewhat in conflict with the upload configuration. So:

Do we still need the NFCC if there are no further local file uploads? If not, I guess these passages are no longer needed in the NFCC (one under the Policy section and two under Enforcement):
1. "...including copyrighted images, audio clips, videos and other media files."
2. "A file with a valid non-free-use rationale for some (but not all) of the places it is used in will not be deleted. Instead, the file should be removed from the places for which it lacks a non-free-use rationale, or a suitable rationale should be added."
3. "If a user suspects that a file does not meet these criteria, that user should list that file on WT:RFDO. Files may be deleted after discussion."
Or can the NFCC just be deprecated?
Are there reasons for us to accept admin-only local uploads? Shouldn't uploading be either fully disabled or re-enabled for all users for fair-use files?

Liuxinyu970226 (talk) 04:06, 24 April 2024 (UTC)[reply]

Or maybe do what Wikibooks does? Create an Uploader user group (like b:Wikibooks:Uploaders) and grant uploading permissions to that group, so that a specific number of users can upload while others can't? Liuxinyu970226 (talk) 04:14, 24 April 2024 (UTC)

It should be amended to explicitly state that only admins can upload. We should not allow general uploads and there should almost certainly be very, very few pieces of non-free media here. —Justin (koavf)❤T☮C☺M☯ 04:29, 24 April 2024 (UTC)[reply]

I don't support broadening the ability to upload images. The NFCC is badly drafted imho, but I suspect nobody else really cares, nor does it actually matter. On the other hand, I'd support adding a note along the lines of what Koavf proposes. We could also say "in the extremely rare case that an image upload is required, non-admins can make a request at the grease pit". This, that and the other (talk) 09:18, 24 April 2024 (UTC)[reply]

Added. This, that and the other (talk) 02:47, 25 April 2024 (UTC)[reply]

I don't see why image upload rights shouldn't be extended to autopatrollers. With that said, I'm grateful that we at least have a process for requesting uploads thanks to User:This, that and the other. Ioaxxere (talk) 06:02, 25 April 2024 (UTC)[reply]

@Ioaxxere Given the vanishingly small number of such images and the heavy restrictions on their presence, I would oppose this until/unless we significantly relax the restrictions on non-free images. Benwing2 (talk) 06:58, 25 April 2024 (UTC)

@Benwing2: "Vanishingly small" is one way to put it — there's exactly one! (on thagomizer). It seems like the current low quality is mostly due to disinterest as opposed to heavy restrictions. The NFCC criteria are actually fairly broad. Ioaxxere (talk) 07:12, 25 April 2024 (UTC)[reply]

Comment: not sure it's a good idea unless we have enough volunteers able and willing to assess uploads and determine if they meet any NFCC, or whether they need to be deleted as being copyright infringements. There are many more experienced volunteers doing this at the Commons and the English Wikipedia, and they have to deal with many people who either don't understand or don't care about copyright or any NFCC uploading material under incorrect licences. — Sgconlaw (talk) 17:56, 25 April 2024 (UTC)[reply]

CFI for constructed languages

to note: this started as a discussion on the Wiktionary Discord server, where I asked how to include a constructed language in Wiktionary. I was told that there was currently a debate on whether these should be included at all, with contention also on languages already on mainspace, such as Ido and Volapük.

I am a tokiponist and wish to see Toki Pona in Wiktionary in the future. the arguments listed against the inclusion of more conlangs and for the exclusion of the current ones seem weird to me. they are as follows:

these are not natural languages and have no native speakers, especially no native monolingual speakers. the several ways that the speakers' native languages affect the conlang lead to many different fragmented styles of speech, in turn leading to the standard way of speaking being the one defined by its creator. additionally, the way that L1 and L2 speakers interact with their language is different. for this reason, languages without native speakers should be removed.

Wiktionary's stated goal, as per WT:CFI, is to include "all words in all languages", and I don't see how these two arguments are relevant to the exclusion of these languages. in this discussion, I want to try to make a case for Toki Pona, as I'm not a speaker of any other excluded language.

first of all, even if these languages have no native speakers, they are still spoken (or have at some point in the past been spoken) by a large community. for Toki Pona, the latest census, from 2022, had more than 1400 respondents,^[1] and this counts only the speakers who did respond, as many didn't or weren't even reached by the survey. this figure is larger than or comparable to those for other languages on Wiktionary, such as Ido and Interlingua.^[2] there are possibly more Toki Pona speakers, and these would want definitions for their language.

many contributors in the sona pona community, a Toki Pona wiki, and I have been collecting data on works written in Toki Pona which were either physically published or freely licenced and available (or planned to be) on Wikisource. as of today, there are up to 125 authors, with many words having 100 or more independent citations.^[3] these follow the Wiktionary guidelines on durably archived works. if we were to include all of the available literature (possibly archived with the Internet Archive), these figures would certainly double.

there are many published books not only about the language but written in the language. this seems to be a big barrier for many languages, as book publishing is expensive and takes time (I have never published one, so I don't know).

in the Toki Pona community, there is no superior guiding authority. the "standard" is mostly decided by what people consider to be correct and what is more widespread. some of what was written by the creator herself has since (and even at the time) been considered weird, as her books reflect only the way she speaks, while speakers have full control over the language.

a question may arise whether actually non-notable languages might be included under this change of policy. however, this would only be the first step that a conlang must take, as it must also have a population of speakers, citations spanning more than a year, and possibly an ISO 639 code or something similar.

Previous discussions:

References:

^[1] jan Tamalu (2022). "Results of the 2022 Toki Pona census". Toki Pona census. GitHub. Retrieved 25 April 2024.
^[2] De Gruyter, W. "Phraseology in planned languages". Phraseology: an international handbook of contemporary research – Volume 2 via Google Books.
^[3] kulupu Menasewi, jan Juwan, jan Pensa (compilers) (2024). "Toki Pona word attestation for Wiktionary". Google Sheets. Retrieved 25 April 2024.

Juwan (talk) 14:20, 25 April 2024 (UTC)[reply]

@JnpoJuwan: You make some convincing points but can I ask why you'd like Toki Pona to be in mainspace? It seems like centralizing all Toki Pona-related information under Appendix:Toki Pona is the ideal approach given that Toki Pona has very few words in comparison with most languages. Ioaxxere (talk) 20:10, 25 April 2024 (UTC)[reply]

I'm generally against conlangs without native speakers as natives are what determine naturalness and correctness within a language. I think Toki Pona should remain an appendix language until things change. Vininn126 (talk) 20:24, 25 April 2024 (UTC)[reply]

Why should there be a need for "naturalness"? Conlangs aren't natural languages anyway.
And correctness can be judged on another level: Is a term formed correctly (like Esperanto nouns ending in -o)? Does a construction follow the language's rules (word-endings and word-order)? --2003:DE:374E:E207:DC5C:89A9:BD02:9B52 21:32, 25 April 2024 (UTC)

I personally think all conlangs other than Esperanto should be in the Appendix, so I would oppose moving Toki Pona to mainspace (and support moving Ido, Interlingua and Volapük out of mainspace). Benwing2 (talk) 20:31, 25 April 2024 (UTC)[reply]

I second this. Ioaxxere (talk) 20:36, 25 April 2024 (UTC)[reply]

And why? Is there any argument other than "I personally think"? --2003:DE:374E:E207:DC5C:89A9:BD02:9B52 21:32, 25 April 2024 (UTC)[reply]

We need to draw a line somewhere on what we call 'language'. Obviously allowing anything is untenable, as we will be flooded by one-person conlangs. Allowing anything where we can find at least three cites for at least one word is possible, but pretty useless: Having a handful of words in the namespace for a conlang with thousands because only three resources in the language are durably attested isn't great for anyone. "Conlangs with native speakers stay, those without go" is a pretty valid line to draw. Thadh (talk) 21:45, 25 April 2024 (UTC)[reply]

of course having a small number of words for a conlang because there are only three sources would not be helpful. but why jump to conclusions like this? what is the normal process for adding a language to Wiktionary for natural languages? Juwan (talk) 22:27, 25 April 2024 (UTC)

I should note, I expressed this same opinion in Wiktionary:Beer_parlour/2023/February#Is_it_time_to_look_at_Toki_Pona_again? and 4 people agreed with me, including several of our most prolific contributors. I agree with User:Thadh that we need to take native speakers of conlangs into account, and most conlangs have very few (if any) true native speakers. I make an exception for Esperanto because it is well-known to have several thousand native speakers, which AFAIK can't be said for any other conlang. Benwing2 (talk) 21:51, 25 April 2024 (UTC)[reply]

@Benwing2 I think the line should be whether there are/have been any native speakers at all - I don't like the idea of drawing the line at thousands. Theknightwho (talk) 22:35, 25 April 2024 (UTC)[reply]

@Theknightwho My concern about having "any speakers at all" for conlangs is that someone will then point to a survey somewhere claiming 2 native speakers in Hungary that will lead to some obscure conlang ending up in the mainspace. A lot of claims of native speakers for conlangs are exaggerated and I doubt there are any conlangs at all with monolingual native speakers, whereas all natural languages have or had monolingual native speakers. Benwing2 (talk) 22:54, 25 April 2024 (UTC)[reply]

That is not entirely true. Pidgins are natural languages which don't usually have any native speakers. And it's definitely possible - in some cases probably even likely - for a creole language to form without any monolingual native speakers. --Spenĉjo (talk) 23:27, 25 April 2024 (UTC)[reply]

There has to be some measure that can be followed. Otherwise, we'll have cases like the infamous case of a father trying to have his son become the first Klingon native speaker. AG202 (talk) 03:37, 26 April 2024 (UTC)[reply]

To review some of the arguments in that discussion:

[Toki Pona] isn't capable of expressing concepts outside its scope by design. Any translation from English into Toki Pona and back again would result in a distorted message.

This may have been thought to be the case as of a decade ago, but per @CitationsFreak's reply, it is not accurate. Speakers have discussed far-out-of-scope concepts such as non-Euclidean geometry, the mRNA vaccine, and the theory of relativity in Toki Pona. The lack of jargon doesn't preclude effective circumlocution.

These and other articles are context-rich enough that they can be translated into a natural language without an unusual level of distortion.

So you say that over a thousand people use Toki Pona. Sweet, but do they publish durably archived works? can Toki Pona entries meet our criteria for inclusion?

When I looked into this last year [2022], I was able to find only two such works, and they were by the same author. I don’t know whether that has changed.

The spreadsheet linked in @JnpoJuwan's original post is part of an effort to address this. We can point to:

Several other published books
A printed zine that has been ongoing since 2021
Creative works by other authors, particularly poems and comics, that have been included in those sources
Hundreds of transcriptions on Wikisource

The initial 120 words have been used in many works that we believe meet the attestation criteria, by well over a dozen authors each. So have several words beyond that set, in spite of being contested as "unofficial" for the better part of a decade.

At least a couple dozen words ought to qualify for "clearly widespread use", being present in 100 authors' works (out of 125 authors as of writing):

ala, e, jan, kama, ken, la, li, lili, lon, mi, mute, ni, ona, pali, pi, pona, sona, suli, tan, taso, tawa, tenpo, toki, wile

And maybe the threshold should be lower; I just want to be safe for the purposes of this discussion.

Yes, most of the Toki Pona entries are currently light on citations. I and several other editors intend to address this. It would be perfectly fair to wait to mainspace Toki Pona until the entries have rigorous attestation. (Of course, that is a different discussion than the idea of barring Toki Pona from mainspace forever.) AgentMuffin4 (talk) 00:00, 26 April 2024 (UTC)[reply]

As for durably archived works, it's worth adding that Toki Pona is usually said to have about 120 to 140 words, but according to the spreadsheet there are at least 150(!) words with works from 3 or more different authors that are or can be durably archived. Probably not all of those words would qualify for attestation, because in a few cases it's articles from 3 contributors of the same magazine, which doesn't make for very independent sources. But the vast majority can probably qualify, seeing as 138 words have 10 or more authors listed, and 124 words have 25 or more authors. --Spenĉjo (talk) 15:15, 26 April 2024 (UTC)[reply]

As for "natives are what determine naturalness and correctness within a language", this tends to be true for most natural languages, but definitely not all of them.

For example, if I'm not mistaken, in Swahili the L2 speakers vastly outnumber the L1 speakers. And even among L1 speakers, many are only first or second generation L1 speakers who learned to speak the L2 way of speaking, instead of learning the grammatically more complex traditional dialects still used among the Swahili people. I don't know for sure, but I'm fairly confident that the vast majority of Swahili content on Wiktionary describes usage and formal standards that were mostly shaped and decided by L2 speakers. This is also the case for many (if not most) creole languages.

And this is definitely also the case for Esperanto.

As a fluent Esperanto speaker and active member of the Esperanto community, in my opinion the Esperanto native speakers are made into a far bigger deal by non-Esperantists than they should be. Esperanto native speakers are mostly indistinguishable from L2 speakers who have actively used the language for a couple of years. In fact, because none of them have had an education in Esperanto, on average they even tend to be less proficient when it comes to spelling, grammar, and similar types of "formal correctness" than fluent L2 speakers, who have actively studied the language to become proficient.

What determines naturalness and correctness within Esperanto (and, in my opinion, in any language), is the active core speaker base - most influentially the published authors, the magazine editors, the teachers, the writers of textbooks and online courses, the people who spend a lot of time in discussions about how to best express certain concepts in the language, etc. Some of those people are native speakers, but the vast majority are not. People who have dedicated decades of their lives to Esperanto aren't in any meaningful way less valuable and influential than Esperanto native speakers. And if all native speakers were to magically disappear tomorrow, neither the language itself nor its notability would noticeably change.

So I emphatically disagree that "has native speakers" is a meaningful metric. It seems like a relatively easy line to draw in the sand (if you ignore dubious or exaggerated claims of native speakers of some conlangs), but it is not one that says a whole lot about the degree of activity and language proficiency of its community, the size and quality of its literature and music, the degree to which the language is owned by the speaking community (as opposed to being codified in formal standards determined by only a small number of people), or to which it is experiencing natural evolution despite its artificial origins, etc. You can have any and all of those things without native speakers, and you can also have a few native speakers without any of those, if you have one parent who is dedicated enough and knows what they're doing. --Spenĉjo (talk) 23:21, 25 April 2024 (UTC)[reply]

It is. If it has no native speakers then there is a high likelihood of it being a playground for psychiatric disorders. Those with schizoid, perfectionist, or avoidant personality traits. I formulate extra-carefully due to the multiformity of the language-capable hard cases we actually meet. It’s not funny.

Reality was too complex to control so we would have to invent peculiar judging standards to play against them. I don’t wanna play against them. I tell them to stop it and get back to learning something real, not to say useful, hopefully.

Only that a psychologist would also have to have a language special interest to finally write a paper about it; the incidence of such a thing was not high enough for it to happen, the visible distress of the maladaptive behaviour exhausted in producing paper and websites that will be read by few and probably vanish, from but free time—which you are in the right to expend but we are overqualified for. Fay Freak (talk) 00:13, 26 April 2024 (UTC)[reply]

To make absolutely sure I'm not misinterpreting:

Someone who speaks a language with no native speakers probably has a psychiatric disorder (!?);
Most constructed languages have no native speakers;
Therefore, only people with psychiatric disorders use these languages;
Quotes by people with psychiatric disorders don't count towards CFI (!?);
Therefore, these languages cannot meet CFI.

If there is some respectful interpretation here that does not involve strawmanning, ableism, pathologizing a hobby, or moving the goalposts, then I sincerely, profusely apologize for so wildly misconstruing it. AgentMuffin4 (talk) 02:09, 26 April 2024 (UTC)[reply]

This particular user sometimes posts very strange things that don't make a lot of sense; best to ignore it. Benwing2 (talk) 02:29, 26 April 2024 (UTC)[reply]

Everyone is talking about the required notability to get into mainspace. It makes me wonder if this means there is currently no notability requirement to get into an appendix. If there isn't, then can I make an appendix for my conlang that only I speak? (And if there is, then what's the point?) --2007GabrielT (talk) 00:27, 26 April 2024 (UTC)

@2007GabrielT: No. Though it be not clear to you from the WT:CFI text instituted by Wiktionary:Votes/pl-2020-12/CFI for appendix-only conlangs, for technical reasons of categories and templates we have a rigid system and there are no language headers allowed which are not previously agreed upon; it’s a whitelist, there too, this is what “at the community's discretion” means, you can’t even edit the language data needed to make a new one, see also Help:Adding and removing languages, and Category:Constructed languages shows what works. Fay Freak (talk) 01:38, 26 April 2024 (UTC)

@2007GabrielT User:Fay Freak's writing is sometimes hard to understand but they are correct that you can't just make an appendix for your own conlang; before doing that, you need to make a request in the Beer parlour for this and get consensus that the conlang is notable enough to be recorded in Wiktionary. If you ignore this process and just start creating appendix entries, your entries are liable to get deleted. The issue with the mainspace is that the bar is much higher for conlangs in the mainspace, which is why there are so few of them (few enough to be counted on one hand), and many of the ones that are there were essentially grandfathered in. (There used to be more, e.g. I think Novial and Interlingue used to be in the mainspace but were moved out due to a vote.) Benwing2 (talk) 07:36, 26 April 2024 (UTC)[reply]

If a conlang is notable it should be notable. Why would it matter where on the website it is? If it's notable enough to be on Wiktionary it should be notable enough to be… on Wiktionary. 2007GabrielT (talk) 15:11, 26 April 2024 (UTC)

@Lingo Bingo Dingo I assume you will be interested in this discussion. Thadh (talk) 13:31, 26 April 2024 (UTC)[reply]

Auto-protection of highly visible templates/modules

I just made a bot script that can do this - perhaps we should do this, given that the occasional spates of template vandalism we get (like earlier today) are highly disruptive. — SURJECTION ^{/ T / C / L /} 17:30, 25 April 2024 (UTC)[reply]

Strong support. Vininn126 (talk) 17:31, 25 April 2024 (UTC)[reply]

Support. Benwing2 (talk) 20:34, 25 April 2024 (UTC)[reply]

Strong support Theknightwho (talk) 21:41, 25 April 2024 (UTC)[reply]

Support - -sche (discuss) 22:10, 25 April 2024 (UTC)[reply]

Weak support - I think SCORE_TO_PROTECT is currently too low. 1000 is a mere 500 entries in mainspace; there are plenty of language-specific templates that are used in more entries than that, but still need to be edited semi-regularly. Lunabunn (talk) 19:54, 26 April 2024 (UTC)[reply]

Do you have any examples? Note that the bot would only give the pages semi-protection. — SURJECTION ^{/ T / C / L /} 19:58, 26 April 2024 (UTC)[reply]

@Lunabunn Semi-protection is a pretty low bar; AFAIK it just means you have to have an autoconfirmed account, which happens to all accounts after a certain (relatively low) number of edits. Benwing2 (talk) 20:01, 26 April 2024 (UTC)[reply]

@Surjection @Benwing2 Ah, upon only skimming through the script I was under the (mistaken) impression that it would lock the page. Semi-protection should be fine; thank you for the clarification!

Strong support Lunabunn (talk) 20:21, 26 April 2024 (UTC)[reply]

Support. Binarystep (talk) 05:49, 27 April 2024 (UTC)[reply]
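For readers following the threshold discussion above, here is a minimal Python sketch of the kind of check being debated. It is not the bot script itself, which is not reproduced in this thread. `SCORE_TO_PROTECT = 1000` comes from the discussion, the mainspace weight of 2 is inferred from the remark that 1000 is "a mere 500 entries in mainspace", and the weight for other namespaces and the function names are assumptions made for illustration.

```python
SCORE_TO_PROTECT = 1000  # threshold named in the thread
MAINSPACE_WEIGHT = 2     # inferred: a score of 1000 equals 500 mainspace entries
OTHER_WEIGHT = 1         # assumption: pages outside mainspace count once

def protection_score(mainspace_uses, other_uses=0):
    """Weighted count of pages that use a template or module."""
    return MAINSPACE_WEIGHT * mainspace_uses + OTHER_WEIGHT * other_uses

def should_semi_protect(mainspace_uses, other_uses=0):
    """True if the page is used widely enough to be semi-protected."""
    return protection_score(mainspace_uses, other_uses) >= SCORE_TO_PROTECT

print(should_semi_protect(500))  # True: exactly at the threshold
print(should_semi_protect(499))  # False: just below it
```

Under these assumptions, raising `SCORE_TO_PROTECT` would be the single change needed to exempt moderately used language-specific templates.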

Vote now to select members of the first U4C

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Dear all,

I am writing to you to let you know the voting period for the Universal Code of Conduct Coordinating Committee (U4C) is open now through May 9, 2024. Read the information on the voting page on Meta-wiki to learn more about voting and voter eligibility.

The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. Community members were invited to submit their applications for the U4C. For more information and the responsibilities of the U4C, please review the U4C Charter.

Please share this message with members of your community so they can participate as well.

On behalf of the UCoC project team,

RamzyM (WMF) 20:21, 25 April 2024 (UTC)[reply]

Punjabi vs. "Western Panjabi"

User:-sche Sorry to keep pinging you but I'm guessing you may know what's up here. Category:Western Panjabi language (code pnb) says all lemmas should be placed under Punjabi (code pa), and it's apparently been that way since at least 2020 (when User:Kutchkutch asked the same question in the Beer parlour), but we still have an L2 entry for Western Panjabi. WT:LT says nothing about Punjabi. I would like to clean this up properly because we still have a bunch of Western Panjabi categories as well as 366 translation entries for "Western Panjabi" (only 5 of which are nested under Punjabi) and 4 translation entries for "Western Punjabi" (3 of which are nested under Punjabi). Can we eliminate code pnb and agree on the spelling "Punjabi" instead of "Panjabi"? Benwing2 (talk) 00:05, 26 April 2024 (UTC)[reply]

I believe this is another case where people didn't get around to properly finishing a language merger. Dijan added the notice that "Western Panjabi" should just be "Punjabi" back in March of 2013, but apparently didn't actually remove the code (T:pnb wasn't deleted till November 2013, around the time the code was moved to Module:languages, where it's been ever since). I believe you can just finish the merger. Pinging @عُثمان who is AFAICT our only recently-active natively Punjabi-speaking editor. (Wikipedia sez of Western Panjabi "Its validity as a genetic grouping is not certain.[5] The terms "Lahnda" and "Western Punjabi" are exonyms employed by linguists, and are not used by the speakers themselves.[4]") - -sche (discuss) 03:20, 26 April 2024 (UTC)[reply]

Thanks. I was going to ask whether we need a Western P{a,u}njabi etym-only lang but I see in Module:labels/data/lang/pa that there's a Western Punjab label that categorizes into CAT:Pakistani Punjabi, which is probably good enough. Benwing2 (talk) 04:06, 26 April 2024 (UTC)[reply]

@-sche @Benwing2 Yes, I would agree this is an unfinished merger and that there is no reason to treat these separately. I also agree "Punjabi" is a preferable spelling as it is still the spelling used officially in India and Pakistan even though it is the more archaic one.

There are maintenance issues inherent in maintaining information about languages which are written in multiple writing systems. I focus most of my editing on Wikidata lexicographical data and long-term I don't see much future in maintaining separate entries for the same words in a different writing system. For example, it should be possible to get all the definitions, references, spellings, and dialectal forms of inflections from a single data entity like this: https://www.wikidata.org/wiki/Lexeme:L686283

(Technically there is nothing stopping anyone from implementing this on Wiktionary now; I would just personally like to focus my own efforts on the Punjabi-language Wiktionaries first.) عُثمان (talk) 14:05, 26 April 2024 (UTC)

@-sche @عُثمان I see. I didn't realize that Punjabi can be written in either Gurmukhi or Shahmukhi, and that information is duplicated across the two scripts in a haphazard fashion. IMO the information needs to be in one of the scripts, and propagated to the other either using a {{tcl}}-like solution or through soft redirects. What do you think is the best way of doing this? Which script should be the canonical source of information? (I know this probably has political ramifications but I don't see any way around this, except to do something like put the information in neither script but in the Latin script. A {{tcl}}-like solution would make it appear, to the user at least, that both scripts are equal, although it's a bit trickier to implement than soft redirects. The only other alternative I can think of is to haphazardly choose one or the other script as the source of information on a per-word basis, a bit like what is done for English when there are British vs. US spelling differences; but this seems very messy to me.) Also, are there cases of lexical differences across the two scripts, or is it reasonable to have the same information displayed in all cases for both scripts (with labels to distinguish India vs. Pakistan uses)? Benwing2 (talk) 04:07, 27 April 2024 (UTC)

One other thing is that there are parallel inflection templates like {{pa-noun-f-c}} and {{pnb-noun-f-c}} that claim to be respectively the "Punjabi" and "Western Panjabi" versions of the same underlying templates but are really just the Gurmukhi and Shahmukhi equivalents. I propose to rename them e.g. {{pa-Guru-noun-f-c}} and {{pa-Arab-noun-f-c}} to reflect their actual purposes. Benwing2 (talk) 04:40, 27 April 2024 (UTC)[reply]

@Benwing2 There are no lexical differences across the scripts and every word form can be represented in either script. A majority of speakers live in Pakistan, but an issue with treating one over the other as canonical comes up with homographs. The words حال (from Arabic via Persian حال) and ہال (from English hall) are both spelled ਹਾਲ in Gurmukhi, while the words ਤੂੰ and ਤੋਂ are both توں in Shahmukhi.

The inflection tables are overdue for a complete overhaul, particularly the verb ones, and the way verbs are lemmatized on enwiktionary is poorly suited to the language as it uses a form which doesn't exist for all verbs.

These problems could all be avoided using Wikidata lexicographical data, which uses numerical IDs. I set up a demo on pnbwiktionary using it; https://pnb.wiktionary.org/wiki/%D9%85%D8%B9%D9%86%DB%8C%D9%B0

Regarding the language codes: Wikimedia has been using pa and pnb for the two scripts for a very long time and the respective Wiktionary projects use these codes. It doesn't matter too much, but pnb is more consistent with other Wikimedia projects. عُثمان (talk) 05:11, 27 April 2024 (UTC)[reply]

@عُثمان You've brought up several issues. My thoughts:

If we use a {{tcl}}-type solution and are consistent in treating one script as canonical, we can avoid the issue of homographs by using etym ID's of some sort (either Wikidata ID's or English descriptive words) for the words that become homographs in the canonical script but are separated in the other script. It's also possible to use one script as canonical except in the cases where that script has homographs and the other doesn't. As for using Wikidata lexicographical data, I don't know what that would entail. Can you explain how Wikidata works? From what I've seen, Wikidata does not do as good a job as English Wiktionary at representing lexicographical information, and the information structures are extremely different and likely incompatible (for example, AFAIK no one at the English Wiktionary has ever been consulted by Wikidata on how to best represent lexicographical information). I also think it would be difficult for contributors to use; they'd have to understand the Wikidata information structure as well as Wiktionary's structure, and overall I would rather keep all the information in Wiktionary itself if at all possible. However, maybe I'm wrong here.
As for pa vs. pnb, if we're trying to retire pnb at English Wiktionary in favor of pa, I don't really think it would be good to continue to use pa vs. pnb as a proxy for the different scripts when we already have script codes (Guru and (pa-)Arab) to directly represent the differences.
As for the way verbs are lemmatized being wrong, can you explain more? I am happy to write a bot script to move the terms as necessary, but I don't know Punjabi very well so I'd need a little help.
As for overhauling verbs, note that a couple of years ago I overhauled Hindi verbs and wrote a proper module to implement them. Maybe this module could serve as a basis for Punjabi? How different are Hindi and Punjabi verbs?

Benwing2 (talk) 05:52, 27 April 2024 (UTC)[reply]
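
The etym-ID idea above can be sketched roughly as follows. This is only an illustration in Python, not how {{tcl}} or any existing Wiktionary module works: the `Entry` record, the ID strings (`haal-persian`, `haal-english`, `tuun`, `ton`) and the helper names are all hypothetical. The four spellings and the two etymology notes are the ones given in the thread above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One word, keyed by an ID instead of by its spelling in either script."""
    etym_id: str      # hypothetical ID; could equally be a Wikidata lexeme ID
    gurmukhi: str
    shahmukhi: str
    note: str = ""


# Spellings taken from the discussion; the IDs are made up for this sketch.
ENTRIES = [
    Entry("haal-persian", "ਹਾਲ", "حال", "from Arabic via Persian"),
    Entry("haal-english", "ਹਾਲ", "ہال", "from English hall"),
    Entry("tuun", "ਤੂੰ", "توں"),
    Entry("ton", "ਤੋਂ", "توں"),
]


def index_by_spelling(entries: list[Entry], script: str) -> dict[str, list[str]]:
    """Map each spelling in the given script to the IDs of the words spelled that way."""
    index: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        index[getattr(entry, script)].append(entry.etym_id)
    return dict(index)


def homographs(entries: list[Entry], script: str) -> dict[str, list[str]]:
    """Return only the spellings that are shared by more than one ID in that script."""
    return {
        spelling: ids
        for spelling, ids in index_by_spelling(entries, script).items()
        if len(ids) > 1
    }
```

With this layout, whichever script is treated as canonical, a page in the other script can ask for a word by ID, so a spelling that is ambiguous in one script never has to be resolved by spelling alone. `homographs(ENTRIES, "gurmukhi")` lists the spellings that would need an ID if Gurmukhi were canonical, and `homographs(ENTRIES, "shahmukhi")` does the same for Shahmukhi.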

include etym-only languages in WT:LOL

We have two pages listing Wiktionary languages: WT:LOL (which lists all L2 languages) and WT:LOL/S (which lists all etym-only languages plus various L2 languages meeting certain "special" criteria). When I encounter a particular lect and don't know whether it's an L2 or etym-only language, it's annoying to have to search in two places. I propose expanding WT:LOL to include both L2 and etym-only languages. This should not significantly impact the size of WT:LOL because there are 8,218 L2 languages and only 548 etym-only languages currently. Benwing2 (talk) 00:10, 26 April 2024 (UTC)[reply]

I'm fine with the idea in general, but it may be worthwhile to put these under a separate header. Thadh (talk) 11:48, 26 April 2024 (UTC)[reply]

Another header could be a good idea, but it might make sense to re-list its parent language for clarity? Vininn126 (talk) 12:05, 26 April 2024 (UTC)[reply]

@Thadh @Vininn126 Yes I was thinking of putting them under a separate header and including a column for the parent language (I think that's what Vininn was requesting?). Benwing2 (talk) 19:30, 26 April 2024 (UTC)[reply]

Something along those lines, just to make it clear what it is an etymology language of (I hope that syntax makes sense...). Vininn126 (talk) 19:32, 26 April 2024 (UTC)[reply]

make Template:q recognize lang-independent labels

People often put things like archaic, figurative, colloquial and the like in qualifiers using {{q}}. Earlier I proposed adding an {{lq}} template that took a language code and processed labels just like {{lb}}, but without categorizing. I'm coming round instead to the idea that what we should do is this:

Add a language code to {{a}} and make it work like {{lq}}, which won't then be needed (i.e. process its parameters like labels, but don't categorize);
make {{q}} process its parameters like labels, but only for lang-independent labels, and don't categorize. Currently all {{q}} does is display its stuff in italics surrounded by parens, but with this change it would auto-link terms like archaic, figurative and colloquial to the appropriate glossary entry, and would canonicalize certain labels unless preceded by an ! (which forces a label to display as-is, but still allows it to be linked appropriately). For example, {{q|hapax}} would display as (hapax legomenon) and be linked to the glossary, {{q|historic}} would display as (historical) with a glossary link, etc. It would also auto-recognize special parameters like _ and ;, just like {{lb}} does.
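
A rough sketch of the proposed {{q}} behaviour, written in Python purely for illustration (the real implementation would be a Lua module). Everything here is an assumption: the tiny `LABELS` table, the `Appendix:Glossary#…` anchors, and the reading of `_` as "join with a space" and `;` as "join with a semicolon", which is how I understand {{lb}} to treat them. Italics and categorization are left out.

```python
# Hypothetical lang-independent label data: alias -> canonical display text and glossary anchor.
LABELS = {
    "archaic": {"display": "archaic", "anchor": "archaic"},
    "figurative": {"display": "figurative", "anchor": "figurative"},
    "colloquial": {"display": "colloquial", "anchor": "colloquial"},
    "hapax": {"display": "hapax legomenon", "anchor": "hapax_legomenon"},
    "historic": {"display": "historical", "anchor": "historical"},
}

# Special parameters that change the separator before the next label (assumed semantics).
SEPARATORS = {"_": " ", ";": "; "}


def render_label(raw: str) -> str:
    """Link a recognized label to the glossary; pass unrecognized text through unchanged."""
    keep_as_is = raw.startswith("!")          # "!" suppresses canonicalization only
    key = raw[1:] if keep_as_is else raw
    entry = LABELS.get(key)
    if entry is None:
        return key                            # free text, e.g. {{q|of a person}}
    display = key if keep_as_is else entry["display"]
    return f"[[Appendix:Glossary#{entry['anchor']}|{display}]]"


def format_qualifier(*params: str) -> str:
    """Render the parameters of a {{q}} call as wikitext inside parentheses."""
    out = ""
    sep = ""                                  # nothing before the first label
    for param in params:
        if param in SEPARATORS:
            if out:                           # ignore a separator with nothing before it
                sep = SEPARATORS[param]
            continue
        out += sep + render_label(param)
        sep = ", "                            # default separator between labels
    return f"({out})"
```

Under these assumptions, `format_qualifier("hapax")` gives a link displayed as "hapax legomenon", `format_qualifier("!historic")` keeps the text "historic" but still links it to the glossary, and anything not in the table is shown as plain text, so existing free-text uses of {{q}} would display as before.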

Thoughts?

If people think we still need a simple "just italicize and add parens" template, we could use {{i}} for that purpose; currently {{i}} is an alias of {{q}} but they could be split, so that {{q}} does label processing and {{i}} doesn't. My instinct is this isn't necessary, but it is a possibility if people think it may be needed in certain cases. Benwing2 (talk) 07:49, 26 April 2024 (UTC)[reply]

This would be nice, and I agree better than a new template. Some people might be upset about the functionality change - either {{i}} could be made to take on the old functions, or I suppose the new ones. Not sure what the downsides are. But I think synching these things up would still be good. Vininn126 (talk) 07:55, 26 April 2024 (UTC)[reply]

The point of a qualifier is to point you towards an entry where the term will already be explained. Saving the one click you need in any case for all the other information that is on that page is just not worth it in my opinion. I would probably be fine with a tooltip, but links in my opinion are too distracting. Thadh (talk) 11:39, 26 April 2024 (UTC)[reply]

In the past, {{q}} was used to just present the text without parsing or expanding it. If you need to prop a door open, a rock will do just fine- no need to determine its exact location to the micrometer, the ambient temperature or the direction of the forces acting on it. I see that it now uses a module, which seems silly. I'm sure that the 121 instances of the template at "a" are contributing to the fact that it keeps drifting in and out of CAT:E. I would prefer to revert this template to its previous dumb state, and create a separate lua-powered version for more specialized things. Given its history, I would worry about retroactively changing the display in thousands of entries in subtle ways that would require manual checking to spot, as well as tripping up contributors who have been using this template for eons and have no reason to check the documentation to discover that it's been changed. Don't get me wrong- there's definitely a place for the Swiss-Army-knife approach- but we should have a few exceptions set aside, just in case. I don't think anyone wants a template used on every single character, with a separate data submodule for each codepoint, but we should be careful to avoid drifting too far in that direction, anyway. Chuck Entz (talk) 18:10, 26 April 2024 (UTC)[reply]

Revisiting the Deleter role proposal

I'd like to revisit the proposal that was voted down in Wiktionary:Votes/2021-12/Deleter role. Clearly whatever we have right now when it comes to the timely deletion of entries found at Category:Candidates_for_speedy_deletion is not working, if there are words there from last year that have still not been deleted. We shouldn't have 209 pages currently listed there. It makes the enforcement of rules and policies like WT:DEROGATORY, RFV, & RFD useless if users know that their entries won't be deleted in the end. I've had to start resorting to blanking entries so that the unverified information at the very least won't be there to stay. As such, I do think that it'd be helpful for a deleter role, so that these entries can be dealt with in a timely manner, barring increased activity from current admin. AG202 (talk) 14:29, 26 April 2024 (UTC)[reply]

@AG202: On some of the pages that you marked for speedy deletion, was there ever an RFV discussion? From what I understand WT:DEROGATORY only means that the RFV discussions should be shorter, not that any derogatory term can be marked for deletion without discussion.

On the more general topic: The issues as outlined in the original vote didn't disappear. If anything, most if not all of the people who would ever qualify for the "deleter role" should be made admins. Thadh (talk) 14:39, 26 April 2024 (UTC)[reply]

@Thadh: No, they do not need to be sent to RFV, provided that it's been within 2 weeks of entry creation. Per WT:DEROGATORY, I marked the entries, such as 13%, with the derogatory template before the 2 weeks was up. However, cites were not added before the deadline, so the entry was marked for speedy deletion. This is how the policy was understood to work at the vote, and how it's been implemented, until the recent lag in deletions. AG202 (talk) 14:55, 26 April 2024 (UTC)[reply]

@AG202: So basically nobody sees the entry anywhere except the creator and the one who marks it for deletion? That seems a bit overly strict, that just means IPs can't create derogatory terms without immediately adding quotes basically. Thadh (talk) 17:14, 26 April 2024 (UTC)[reply]

The wording at WT:DEROGATORY reads to me as AG202 explained it, and the discussion at the vote also said a motivation for the policy was to avoid cluttering RFD or RFV or wasting editor time and attention. What the policy is seems pretty straightforward to me (whether it is 'too strict' or not is a different matter, but admins should simply carry out the policy unless it is revised through consensus). As for a deleter role, I don't think there should be one, as I don't like the idea of giving special deletion powers to editors who wouldn't be suitable admins.--Urszag (talk) 17:24, 26 April 2024 (UTC)[reply]

The wording per Wiktionary:Votes/pl-2022-06/Attestation criteria for derogatory terms reads to me that the quotes need to exist, and only if the term is nominated for being of dubious language permeation (criteria accepted after Wiktionary:Votes/pl-2022-01/Handling of citations that do not meet our current definition of permanently archived) is deletion sped up by a deadline rather than the timeframes afforded to less offensive words. That way offensive IPs are deterred from creating their words in the first place, because from motivational psychology we know that rewards or punishment need to follow without too great a time lag, whereas taking up space is what certain kinds of trolls desire. You can’t point the finger at your mutt pooping on your carpet and scold him as a “bad dog” one month after the offence.

Editor time and attention must be “wasted” either way to some degree, and the terms are actually more likely to be, and more legitimately, deleted by an admin if you posted a banns ascertaining him of the matter. Legitimacy comes through participation, through procedure, you know it. Fay Freak (talk) 21:25, 26 April 2024 (UTC)[reply]

I mean, that's exactly what we talked about at the vote. We had a problem with IPs creating derogatory terms and them cluttering RFV/RFD. This was created to help limit that. AG202 (talk) 17:38, 26 April 2024 (UTC)[reply]

To me it seems odd to have a group of users who are simultaneously trusted enough to delete pages but not trusted enough to become admins. Do you have yourself in mind for the role? In any case, I agree that we need more admins patrolling Category:Candidates for speedy deletion. Ioaxxere (talk) 19:19, 26 April 2024 (UTC)[reply]

Kyrgyz transliteration

I had changed the Kyrgyz transliteration module Module:ky-translit to a simpler transliteration system that uses some of the Common Turkic Alphabet's letters differently. The reason is to declutter the text from excessive diacritics and simplify the alphabet.

I am currently working on Module:ky-IPA to help avoid confusion that might have been caused by my change. Nobody else seems to be working on Kyrgyz right now, so I am not interfering with anyone's work. Bababashqort (talk) 05:53, 27 April 2024 (UTC)[reply]
