Wiktionary:Beer parlour

From Wiktionary, the free dictionary
Latest comment: 3 hours ago by WordyAndNerdy in topic Reddit as a Source for WT:CFI


Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point of preserving all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!



Language code for Baltic German


I would like to request adding a language code for Baltic German on this platform. A lot of Estonian terms (and Latvian terms) are derived from Baltic German and there's currently no real way of displaying that, other than writing Baltic {{der|et|de|<term>}}, which not only looks ugly, but is also wrong. It categorizes the term into [[CAT:Estonian terms derived from German]], which is incorrect, as there is a clear distinction (at least in Estonian) between terms derived from (High) German and the Baltic German dialect spoken here. As such, the code could also be etymology-only. In essence, the Baltic German dialect is a vernacular dialectal form of a mixture of High and Low German with a clearly recognisable regional flavour (Estonian and Latvian dialects) in pronunciation, morphology, syntax and vocabulary. EKI (Institute of the Estonian Language) has an online dictionary of Baltic German, with a myriad of sources for various terms: https://arhiiv.eki.ee/dict/bss/. I feel like having a language code for Baltic German is justified. Joonas07 (talk) 19:31, 1 August 2024 (UTC)

My comprehension of German is pretty rudimentary, but from my understanding, there is a distinction between Baltic German and Central European German varieties, so I agree that this is justified. Joonas, do you know to what extent this is also true for Latvian or even Lithuanian? Danke. —Justin (koavf)TCM 19:37, 1 August 2024 (UTC)
Lithuanian barely has any influences from Baltic German, if at all. Are you asking whether this distinction exists the same way in Latvian? I'm not extremely familiar with Latvian, but I believe both of these languages have been influenced the same way by Baltic German, as the history is the same. A quick look at [[Category:Latvian terms derived from German]] as well makes me believe that is the case. Joonas07 (talk) 20:21, 1 August 2024 (UTC)
Ja, that was my question. I figured that the influence wouldn't be as strong due to the Polish–Lithuanian Commonwealth. —Justin (koavf)TCM 20:23, 1 August 2024 (UTC)
vernacular dialectal form of a mixture of High and Low German with a clearly recognisable regional flavour – this is fiction. It is either Standard High German with Baltic characteristics in vocabulary (e.g. Burkane, though admittedly these suffice for whole dictionaries, which also record borrowings from Low German and the local Baltic language) or it is Low German with a Baltic-influenced accent. I also speak German with a Slavic twang due to speaking Russian; that doesn’t mean I have created a new dialect or creole. Most commonly it is High German in the way that “Austrian German” is High German, or just German. w:de:Baltisches Deutsch states that already around 1600 High German “setzte sich durch” (prevailed) over Middle Low German – which seems exaggerated to me, but perhaps only by half a century, and Middle Low German ends in 1650 precisely at the point of being supplanted by High German for cultivated and literary purposes – and that around the first half of the 19th century the academic upper class was oriented towards “a trim Standard German”. But Baltic Germans were only upper class, so there is no third language for even diglossia to fit in.
Baltic German should be no more than a label on German, and occasionally Low German, for any traces of it remaining; a distinction between (High) German and the Baltic German dialect spoken [in Estonia, Latvia, in St. Petersburg or the Baltics in general] is incorrect, as it was not present for speakers. It was like Euro-English in Brussels: behind it in Brussels there are Flemish and French, and in Reval (now Tallinn) and Dorpat (now Tartu) there was Estonian and, on another level, Russian. Fay Freak (talk) 20:35, 1 August 2024 (UTC)
I would argue the Baltic German varieties developed enough from their High or Low German origins to warrant an etymology-only code. There is a noticeable difference between you speaking German with a Russian accent, and German settlers in Estonia and Latvia speaking a variety of their language for hundreds of years. I definitely disagree that the distinction wasn't present for the speakers. Besides, that isn't even that important, as the distinction is present in target languages. Joonas07 (talk) 20:55, 1 August 2024 (UTC)
“German settlers … variety of their language”: there you have it, their language of the mainland.
The distinction is not present in the remaining sources either. It must be in use, not merely declared in target languages. Its texts look like Standard German texts with peculiar words, which of course we seek: e.g. the sentences quoted in Wörterschatz der deutschen Sprache Livlands (there aren’t actual dialect dictionaries), and all those quoted on w:de:Baltisches Deutsch.
Of course, a single word suffices for a Baltic German’s speech to be marked and ridiculed further west, which they themselves did not expect, since they knew only one German, Standard German, not a diglossic situation of local Standard German plus dialect as is now known from Switzerland and Arabic-speaking countries. Hence the quoted Harry Siegmund from Liepāja writes about his stay in Königsberg, given the sensitive nationalist climate in Germany: “Ich schwieg auch, weil ich fürchtete, mit meiner baltischen Sprechweise als Fremder aufzufallen und ihnen in jeder Hinsicht unterlegen zu sein.” – “I also kept silent for fear of attracting attention as a foreigner due to my Baltic mode of speech, and of being outgunned in every way.” It was a mode of speech. This is the situation in its last 150–250 years. Going further back, the variance is within the ordinary variance of Early New High German and Middle Low German as a whole. For a 16th-century text it is highly problematic to, e.g., classify it as specifically Swabian or Alemannic German instead of Early New High German, which was only just developing as a standard. And in the Baltics you don’t even have a solid basis of untarnished dialect speakers, because the peasants spoke Estonian and Latvian, and “German” referred to those who worked in administration and churches and to their language (occasionally also a Russian, an Englishman, or a Swede; your target-language sources may generalize it). This is also quite different from, e.g., the Volga German situation: those were homogeneous German societies with little, if any, Russian or Turkic encroachment until Sovietization.
This does not mean, though, that we can’t have “Baltic German” as an etymology-only language. As I implied with Austrian German, we can have a code; I probably would have added it myself if I had cared enough about the variety. For your purposes, however, you should know it is still German. It reminds me a bit of the pedants amongst Hungarian editors who liked to be sure whether a Hungarian word is borrowed “from German”, “from Austrian German” or “from Bavarian”. Linguistic works vary in the declaration. There is no actual idea behind such questions. Fay Freak (talk) 22:28, 1 August 2024 (UTC)
Yeah, I didn't mean it in the sense that it has developed into a language in its own right, rather that the variety of Standard German that is Baltic German has developed far enough to be notable. I didn't quite understand what you're getting at in the second half of your third paragraph. Re: Hungarian, it doesn't hurt to be exact. I don't know what you mean by "there is no idea behind such questions". For Estonian, it is often significant whether a word was borrowed from German or Baltic German. Joonas07 (talk) 23:30, 1 August 2024 (UTC)
This significance I understand. It might be from the historical German there, or it might have intruded into the standard from present-day Germany, or even its predecessor Reich. Similarly, one judges whether a word entered Ethiosemitic from Egyptian Arabic or Yemenite or Ḥijāzi usage, but all under one Dachsprache. I feared that you were trying to introduce a distinction that is impossible to make out, exaggerating diglossia. The Hungarian etymology statements are more fanciful than reliable in this respect. Fay Freak (talk) 23:52, 1 August 2024 (UTC)
What is your preferred code for Baltic German then? de-BAT? de-BLT? There seem to be no codes for geographic regions comparable to ISO 3166, and "region code" means something different. Fay Freak (talk) 22:37, 1 August 2024 (UTC)
That's a good question. Does it have to be in the format <language code>-<REGION CODE> to be an etymology-only code? Joonas07 (talk) 23:32, 1 August 2024 (UTC)
@Joonas07: There is no rule (compare the list of etymology-only languages in WT:LOL/E), so I can only ask about preferences. Fay Freak (talk) 23:52, 1 August 2024 (UTC)
I don't really know. de-bal maybe? ger-bal? The list you linked does indeed seem to have various different formats (btw, I really enjoy the abbreviation WT:LOL): some are just three-letter codes, but I don't know if there's an intuitive one for Baltic German that's not already in use. Some start with gsw-, which I gather is for High German varieties? So there seem to be some conventions. I'm open to suggestions. Joonas07 (talk) 00:23, 2 August 2024 (UTC)
@Joonas07: de-cle, for Curonia, Livonia and Estonia, because they called their storage chambers by Proto-Slavic *klětь and we store information here. I will go to sleep now, before implementing it. Fay Freak (talk) 00:43, 2 August 2024 (UTC)
Let's just do de-bal. That's analogous with other language varieties not from a specific country. Joonas07 (talk) 00:57, 2 August 2024 (UTC)
Done, @Joonas07. Fay Freak (talk) 11:25, 2 August 2024 (UTC)
@Joonas07: I have added the online dictionary of Baltic German for you as a reference template, {{R:de:BSS}}. Most of the dictionaries and whole sentences quoted from Baltic German therein are in Standard German. Schiller-Lübben is for Middle Low German. Fay Freak (talk) 23:04, 1 August 2024 (UTC)

AWB access


Hello, I would like to request access to the AutoWikiBrowser tool. I have been contributing significantly by adding entries in Old Tupi and Guaraní, and I often need to correct some inaccuracies in the entries of these languages. Furthermore, the creation of Old Tupi entries only really started to take off last year; we are in a somewhat unstable phase where some quotation templates are occasionally renamed. For what it's worth, I already have access to AutoWikiBrowser on enwiki. Thank you, RodRabelo7 (talk) 04:48, 2 August 2024 (UTC)

This seems uncontroversial, based on edits such as this. I'm not familiar with the language, but your work seems reasonable to me. Please ping me if no one else grants access in a week. I'll try to check in on this thread to see if there are any other comments. —Justin (koavf)TCM 04:53, 2 August 2024 (UTC)
Just in case, a decent and modern Old Tupi grammar in English is Ferraz Gerardi's A Role and Reference Grammar Description of Tupinambá. RodRabelo7 (talk) 04:56, 2 August 2024 (UTC)
Obrigado. —Justin (koavf)TCM 05:04, 2 August 2024 (UTC)
@Koavf: pinging, as requested. RodRabelo7 (talk) 20:29, 9 August 2024 (UTC)
https://en.wiktionary.org/w/index.php?title=Wiktionary%3AAutoWikiBrowser%2FCheckPageJSON&diff=80991102&oldid=80906256 Obrigado for your service. —Justin (koavf)TCM 20:33, 9 August 2024 (UTC)

Hyphenation for Row-Splitting versus a Word that Might or Might Not Normally have Hyphenation


Dear Wiktionary: If a word could normatively be interpreted as either needing hyphenation or not needing hyphenation, and it is hyphenated by a row-splitting hyphenation, how do I take a verbatim quote of that sentence for a Wiktionary citation? This actually comes up A LOT for me, because formal Wade-Giles includes hyphenation, while informal Wade-Giles and postal romanization do not include hyphenation, so many words "could go either way". What I did in this case (diff on the Zichang page) was make a context-based decision (i.e. this sentence did not fall out of a coconut tree; in the context of the book and the other usage of the word in a different entry of the dictionary, it appears that the authors likely intended this hyphen to be more than just a row-splitting hyphenation). But I also want to imagine what could be unburdened by what has been before (that is, the author may have intended non-hyphenation for this specific instance, even if the publisher did hyphenate for the row-split, and even if the same word was hyphenated elsewhere, and even if other similarly situated words in the book are hyphenated). Thanks for any guidance. Yours Truly, --Geographyinitiative (talk) 11:26, 2 August 2024 (UTC) (Modified)

I'm not sure how familiar you are with CSS and HTML, but have you by chance seen these web design solutions?
I think these kinds of solutions will work for what you're going for, which may involve inserting raw HTML/CSS rather than a template or other wikitext. —Justin (koavf)TCM 11:41, 2 August 2024 (UTC)
Thanks. Okay, I'm looking at this, but does this coding allow me to signal to the reader that, within the context of the published book, there is an ambiguity as to whether the hyphen is merely a row-splitting hyphen or actually a part of the word proper (i.e. the hyphen would have been included if the word were not on the edge of a row)??? --Geographyinitiative (talk) 11:47, 2 August 2024 (UTC)
I think your solution of an HTML comment is probably the best you can do. —Justin (koavf)TCM 11:52, 2 August 2024 (UTC)
If the surrounding context makes the intended usage clear (for example, if the same document has examples within a single line of the same word/proper noun spelled with or without a hyphen, or of analogously formed words or names), it seems fine to follow that. In cases where that can't be determined, I would say it should be considered whether these specific quotations are really essential.--Urszag (talk) 12:20, 2 August 2024 (UTC)
If it’s significant (for example, because a term has both hyphenated and non-hyphenated forms), I indicate this as “roly[-]poly” in a quotation. You can also use the template {{quote-gloss}}, which results in “roly[-]poly”. — Sgconlaw (talk) 15:26, 2 August 2024 (UTC)
As to Sgconlaw's comment, I am not sure if the hyphen is a gloss on the quote, and I don't want to misuse quote-gloss, though I see how this could be good.
Concerning Urszag's comment (with which I have usually agreed) that "In cases where that can't be determined, I would say it should be considered whether these specific quotations are really essential": I have to admit that I have followed that line of thinking before. However, I have later come to feel that I really don't want to cause a bias in my citations by just blatantly ignoring a category of ambiguous situations in English. So I really want to embrace the citations as I come to them. There should be a normative way to deal with this category of scenario besides "fuck it". This is not a lowly or vulgar usage of English; this is a category of ambiguity that is baked into English, and I believe Wiktionary should have a way to confront the situation head-on and properly cite them as what they are. In the above quote, the word "An-ting" is not the essential word; the essential word is "Tzu-ch'ang", which is super rare because it is a Wade-Giles name derived from a communist-only Chinese original (Taiwan did not use it), and the book is pretty authoritative. So I want to deal with "An-ting" in the "right" way that fully acknowledges the ambiguity, rather than do my grab-ass horseshit of writing something in the HTML. I came up with that shit ages ago as a work-around; now, I want to fucking do it beautifully and make it clear to the reader of the quote what the fuck is happening, and unambiguously tell the reader that there is a potential ambiguity. --Geographyinitiative (talk) 22:35, 2 August 2024 (UTC)
@Geographyinitiative, Sgconlaw: The purpose of {{quote-gloss}} is to contain text not present in the original text, so An{{quote-gloss|-}}ting would mean "there was no hyphen in the original, but there was supposed to be", which is probably not your goal. The OED does something like this: An-ting [variant reading Anting]. That said, I prefer something more explicit, like this: An-ting [or Anting, if the hyphen is a line-breaking hyphen]. Ioaxxere (talk) 01:53, 8 August 2024 (UTC)
@Ioaxxere: ah, true. In that case I’d go with the first option I suggested, which is to indicate the hyphen as “[-]”. — Sgconlaw (talk) 05:08, 8 August 2024 (UTC)
Following Ioaxxere's comment, from now on I plan to explore the possibility of making these kinds of edits (diff). It's so murky. --Geographyinitiative (talk) 07:07, 8 August 2024 (UTC)

Unless someone comes up with a better solution, I'm going to leave this quote (diff) as is. I'm going to eventually take this topic to the Grease Pit to see if a real solution can be created for this kind of ambiguous situation. However, right now, I don't think this quote is a good "model case" for the larger problem, because I really feel that the context of the book itself more heavily favors the hyphenated form "An-ting" over unhyphenated "Anting". But I'll keep this case in mind and come back to it later; please ping me if you have more help/input/advice on the topic generally. --Geographyinitiative (talk) 23:56, 2 August 2024 (UTC)

@Ioaxxere: I thought the OED indicates variant readings when there are multiple versions of the same work, and some use one form of a term and some use another form. That’s how I interpreted it anyway. — Sgconlaw (talk) 11:21, 8 August 2024 (UTC)
I do agree that it would be nice to have a standardized solution. I've come across this issue more than once and it can be annoying when it's a rare word and I'm trying to figure out whether the hyphenated form is more common or not. Andrew Sheedy (talk) 05:00, 3 August 2024 (UTC)
Yeah guys, please keep me in mind if you come up with a good solution for this. I will keep Sgconlaw's use of quote-gloss in mind. But I really want to give readers of a quote the full picture, and not (a) ignore the potential ambiguity, (b) just opt not to use the quote, or (c) use the quote anyway without fully acknowledging the potential ambiguity in a way the reader can see, and without misusing quote-gloss (in my opinion), using a work-around or similar, or relying on my personal assessment of what the author meant to pick one over the other. --Geographyinitiative (talk) 10:22, 3 August 2024 (UTC)
Bit late here, but I follow the OED in using [-] in these kinds of cases. @Geographyinitiative This, that and the other (talk) 10:14, 8 August 2024 (UTC)
@This, that and the other Is that right? Is there a paper about this? I'd like to learn the finer points of when they use [-], and use it the same way they do. It looks very bizarre to me, so I want to be 100% clear about what I'm doing if I follow that method: (1) EXACTLY what specific situations is it used in? (2) EXACTLY how is it formatted? (3) Do other dictionaries deal with this issue in a similar manner? (4) Is there any clear policy-level guidance on this issue anywhere in Wiktionary? If not, why not? Should it be created? --Geographyinitiative (talk) 11:02, 8 August 2024 (UTC)
@Geographyinitiative Upon closer inspection it seems I may be wrong; the OED appears to use the tilde (~) for this purpose instead. But I definitely picked up the habit of using [-] from somewhere; it's not my own invention! This, that and the other (talk) 00:56, 12 August 2024 (UTC)

List and topic categories again (how many types, and how to name them)


I notice CAT:en:Waterfalls says it's for "names of specific waterfalls, not merely terms related to waterfalls, [nor] types of waterfalls." But even before I added to it, almost all its contents were related/type terms (and I could add more: byfall, catadupe, maybe spray bow, foambow, plunge pool, stickle, huck).
I could solve this by changing "Waterfalls" to a "related-to" category; in this case, that wouldn't even cause other languages much hassle, as other languages barely use it. However... I think it is reasonable to have a category for specific Falls too, like we have for cities. But what could it be called?
We use "CAT:en:NAME" for both set categories ("terms for seasons, not merely terms related to seasons. It may contain [...] types of seasons [or...] names of specific seasons"), related-to categories ("This is a "related-to" category. It should contain terms directly related to winter"), and name lists. In our schema, the category for terms related to waterfalls or which are types of waterfalls, and the category for names of specific waterfalls, should both be "CAT:en:Waterfalls" AFAICT.
And because type isn't predictable from name, some people (reasonably!) think e.g. Category:en:Cities is named like a set category and put "capital city", eperopolis et al in it (and where else should these go?), while other people think it's named like a related-to category, or (yes) a name category... so, like many categories, its contents are a mix.
A solution would be to specify the purpose in the name: ":en:set:Seasons", ":en:topic:Winter", ":en:names:Cities"... but this highlights another issue: does it make sense that :en:Winter can include wintery, but :en:Seasons says it shouldn't contain seasonal? Maybe not! (And is it unmaintainable, anyway? It seems like in practice the more fine-grained distinctions we assert, the less well people maintain them.)
Should we merge "sets" into "related-to" categories, so "CAT:en:Seasons" could contain summer and seasonal? (In theory, set categories could just be ====Hyponyms==== sections of entries like [[season]], not needing to be categories at all.) - -sche (discuss) 21:24, 3 August 2024 (UTC)

@-sche: I would be in favour of merging the two types of categories, as I don't really think the distinction is easy to maintain. Alternatively, if it is felt that in some cases it is appropriate to have a "name" category, maybe the default should be a related-to category (for example, "Category:Cities") and the "name" category should be a subcategory called "Category:Names of cities". — Sgconlaw (talk) 23:57, 7 August 2024 (UTC)
@-sche: I prefer a naming scheme that makes the purpose clear, so you might have "types of waterfalls", "waterfalls", and "particular waterfalls" for the three kinds of categories. Ioaxxere (talk) 01:36, 8 August 2024 (UTC)
@Ioaxxere: I don’t feel it’s necessary to distinguish between “Category:Waterfalls” and “Category:Types of waterfalls”. I’m somewhat concerned that if we have distinctions which are too fine we are just going to get editors dumping everything in “Category:Waterfalls”. — Sgconlaw (talk) 11:23, 8 August 2024 (UTC)

Something I found neat in our PIE entries is the feature in WT:AINE allowing the splitting of reconstructed PIE terms by morpheme with hyphens in the alt parameter of links in Derived and Related terms. Not only does it allow more derivation transparency, but also you can square-bracket link the individual morphemes involved so less familiar visitors can be taken to the compositional morphemes to learn more about them.

I would like to amend WT:Reconstructed terms to allow this practice to be used on other proto-language pages, not just PIE (and not on non-proto-language entries).

The amendments to WT:Reconstructed terms#Entries would be something like this, derived from language at WT:AINE:

Separating hyphens can be used in the displayed form of links in Derived terms and Related terms sections of proto-language pages to clarify the formation, as long as it is not used in the page name itself.

Ceso femmuin mbolgaig mbung, mellohi! (投稿) 08:54, 4 August 2024 (UTC)

Strong support; I have been doing this for non-proto reconstructions in, e.g., Prakrit and Ashokan Prakrit. It will be nice as a frequent reader of Proto-Indo-European entries as well, though I understand that this is obviously not always possible when there are factors like sandhi in play. Svartava (talk) 09:24, 4 August 2024 (UTC)
Please don't do this for lower-branched Uralic proto-languages. I think this is not helpful for agglutinative languages overall.
Not sure I like it for PIE, either, but it is kind of a tradition in IE linguistics, so I guess. For languages where this isn't done in the literature, I'm not sure it's helpful. Thadh (talk) 09:51, 4 August 2024 (UTC)
As Thadh points out, this is something that needs to be decided on a language-by-language basis. If Proto-North Caucasian feels this works best for them, godspeed. What I am opposed to is doing so on Proto-Italic and Proto-Celtic entries, as you have been doing. Those need to go to a vote, because the status quo is not to have hyphens. --{{victar|talk}} 17:23, 4 August 2024 (UTC)
...which is why I posted here in the first place, to narrow down the boundaries of such a vote? — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 18:29, 4 August 2024 (UTC)
Since when have we needed votes for a content issue like this? Theknightwho (talk) 19:21, 4 August 2024 (UTC)
I don't see any policy prohibiting morpheme hyphens elsewhere... — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 19:29, 4 August 2024 (UTC)

Vote has been drafted


@Svartava, Thadh I have started a vote at Wiktionary:Votes/2024-08/Allow hyphens in link displays for Indo-European proto-languages. Feel free to discuss or ask for amendments. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 19:20, 4 August 2024 (UTC)

This is a silly vote. As you pointed out, there is no policy prohibiting hyphens in entry links, let alone alternatives to links. Again, it is up to communities to decide what conventions they use. If you want to change status quo conventions for Proto-Italic, start a vote on that, like at Wiktionary:Requests_for_deletion/Reconstruction#Proto-Italic_terms_with_only_one_descendant. --{{victar|talk}} 21:55, 4 August 2024 (UTC)
@Victar You are the one who suggested a vote. Pick a lane. Theknightwho (talk) 01:56, 6 August 2024 (UTC)
I suggested a vote specific to Proto-Italic or Proto-Celtic. --{{victar|talk}} 03:08, 6 August 2024 (UTC)

Latin months: nouns or proper nouns? Capitalized or uncapitalized?


Another Latin "proper noun" question. Currently, there seems to be no standardization in how we format entries for Latin month names. Aprīlis (April) only has a capitalized entry, and is marked as an Adjective and Noun. Maius is marked as an Adjective and Proper noun; there is a stub at maius noting it is an Alternative letter-case form. On the other hand, iānuārius is used as the main entry (Adjective and Noun) while Iānuārius is marked as an Alternative letter-case form. Contributing further to the mess, Category:la:Months includes multiple variants of some names such as Jānuārius.

What should the main entries be, what POS should be used, and how much information should be included in the alternative case form entries? In English, the POS of months is treated as "Proper noun". Urszag (talk) 11:10, 4 August 2024 (UTC)

Do any Latin dictionaries indicate when something is a proper noun? (In English, one hurdle to consulting other dictionaries about whether some class of word is a common noun or proper noun has been that many lazily have just one 'noun' category into which everything goes.) I seem to recall the fact that Russian month names are listed as uncapitalized common nouns being the result of a discussion where Russian editors argued for that based on how Russian references/speakers treated them.
Do you have a sense of whether modern editions of Latin texts usually capitalize month names, the way they usually capitalize personal and place names? Poking around Google Books, it looks to me like "modern" Latin texts (actually, everything that turns up, from texts written in Latin the 1500s and 1600s to recent editions of ancient Roman works) almost always capitalizes month names, which suggests the capitalized forms should be the main entries. - -sche (discuss) 15:52, 4 August 2024 (UTC)Reply
I'm not familiar with any Latin dictionary that indicates proper nouns. Typically they just mark nouns or proper nouns by providing the gender (m, f, n); DMLBS also makes some use of the label "sb." (substantive) for both nouns and proper nouns. In my experience, capitalization is the usual editorial convention.--Urszag (talk) 17:26, 4 August 2024 (UTC)Reply
Since it seems (by capitalization) that Latin dictionaries treat months as capitalized proper nouns, I would argue we should do the same. Likewise the adjectives should be capitalized. Benwing2 (talk) 18:39, 4 August 2024 (UTC)Reply
Capitalisation doesn't mark proper nouns: Several dictionaries also capitalise other adjectives like Homēricus (Homeric), Rōmānus (Roman) and common nouns like Rōmānus m (Roman (person)).
As for months and spellings: It's also a matter of attestion. Is always both like Februārius and februārius attested?
Likewise for months and POS: Is always both mēnsis Februārius/februārius (or something like: Kalendae Februariae/februariae, Nonae Februariae/februariae, Idus Februariae/februariae) and simply Februārius/februārius m attested? --16:18, 6 August 2024 (UTC)
I don't really understand the second part of this comment. Ancient texts don't use capitalization, so there is no relevant ancient attestation distinguishing the two. Pretty much every modern edition I've seen (or modern Latin works, such as "Lingua Latina Per Se Illustrata") follows the convention of capitalizing the names of Latin months. This isn't restricted to English editors either: you can see "Augustus" capitalized in French texts such as the Gaffiot dictionary. I did see some lowercase examples of Latin month names on Google Books (e.g. "mensis augustus") so they are also attested, but I'm confident that uppercase is currently the more usual convention.--Urszag (talk) 15:02, 7 August 2024 (UTC)
Another example of capitalization: "Datum Romae, apud S. Petrum, die XIX mensis Martii, in sollemnitate Sancti Ioseph, anno MMXVIII, Pontificatus Nostri sexto" in Pope Francis's Gaudete et exsultate (2018).--Urszag (talk) 15:25, 7 August 2024 (UTC)
I found an older discussion from when month names were moved to lowercase versions: Wiktionary:Tea_room/2015/June#Latin_month_names. It looks like EncycloPetey based this on (some edition of?) "Josip Lučić Spisi Dubrovačke Kancelarije, a series of legal documents in Latin from Ragusa in the late 13th century". I'm not convinced yet that the cited text is representative of medieval usage as a whole, or that medieval usage should be relevant compared to the typical usage of more recent centuries, but I wanted to link to that discussion for greater context. I have already started moving the names (back) to capitalized versions based on the input from -sche and Benwing2.--Urszag (talk) 15:38, 7 August 2024 (UTC)
The publication in question was the source of citations, used because it was the easiest at hand, and because the text had both capital and lowercase lettering. A search of other medieval records containing dates should be able to furnish additional citations, as long as the scribe wrote out month names rather than numbers. At the time of the earlier discussion, the Latin months were treated as adjectives because the available citations in both classical and medieval Latin demonstrated use as adjectives. Modern dictionaries and Modern Latin do use capitalized forms, but Augustus is not a good example, since it specifically derives from the name of a person. Capitalization of months like october and november would be stronger evidence for capitalization, but as I say, evidence at the time suggested the practice of capitalizing month words was a modern practice. --EncycloPetey (talk) 16:30, 7 August 2024 (UTC)
Thank you for the clarification; so this is a compilation which is being cited as showing multiple independent examples of medieval usage? I guess it seems to me that the first question to be resolved (before getting into the question of what typical medieval usage was) would be whether capitalization on Wiktionary should be based on modern capitalization practices (e.g. "Datum Romae, Laterani, die XV mensis Octobris, in memoria sanctae Teresiae a Iesu, anno MMXXIII, Pontificatus Nostri undecimo", Est utique fiducia/C'Est La Confiance, 2023) or on medieval capitalization practices. I think that in general, we follow modern practices for spelling Latin words in entry titles; e.g. the use of "ae" and "oe" rather than æ, ę, œ, although I guess it is often difficult to distinguish between Classical conventions and modern conventions.--Urszag (talk) 17:09, 7 August 2024 (UTC)

Our treatment of MIA reconstructions


@Pulimaiyi, Kutchkutch, Svartava (feel free to ping others, no idea who is interested in this stuff these days): There are many terms that are only attested across several New Indo-Aryan languages but not at any earlier stages of Indo-Aryan. Sources like Turner's {{R:CDIAL}} reconstruct ancestral forms for such cognate sets, but due to phonological degradation (e.g. consonant cluster assimilation) the reconstructions can only go back to Proto-Middle Indo-Aryan rather than a language we clearly know how to deal with like Proto-Indo-Aryan or Proto-Sanskrit.

For the past couple years our strategy has been to call these reconstructions Proto-Ashokan Prakrit, which is a language we made up and not a label that is really used in any literature (0 hits on Google). We settled on Ashokan Prakrit since it is likely the ancestor of all New Indo-Aryan languages (including "Dardic") and we didn't have a later node that unifies NIA subfamilies, since e.g. we used to treat Prakrit and Apabhramsha as collections of languages.

Now that we have codes for unified Prakrit and unified Apabhramsha, I think we should move any Proto-Ashokan Prakrit terms without "Dardic" descendants (e.g. *𑀟𑀼𑀓𑁆𑀓𑀭 (*ḍukkara, pig)) to Proto-Prakrit. Proto-Prakrit is a term used in scholarly literature on IA historical linguistics, including by Turner. Also, this way we are not overclaiming the age of the word.

One edge case to consider is that often, a term may be constrained to non-Dardic NIA but also happen to have a descendant in Kashmiri; an example is *𑀝𑁄𑀓𑁆𑀓 (*ṭokka, basket). Kashmiri is the "Dardic" IA language that is most in-contact with plains Indo-Aryan (particularly Punjabi). I think this should also be called Proto-Prakrit but we can debate this. Ideally, we reserve Proto-Ashokan Prakrit for any NIA terms with non-Kashmiri Dardic cognates. —AryamanA (मुझसे बात करेंयोगदान) 20:40, 4 August 2024 (UTC)

I agree. Some related followup Qs:
1. How do we want to handle cases like *dākka, which Turner reconstructs here with both a long vowel and a consonant cluster (a combination generally considered invalid in Middle Indo-Aryan)? It appears that Turner is reconstructing Old Indo-Aryan. In this case, do we want to say that the descendant is Sanskrit *डाक्क (ḍākka), Prakrit *𑀟𑀸𑀓𑁆𑀓 (*ḍākka), Ashokan Prakrit *𑀟𑀸𑀓𑁆𑀓 (*ḍākka), or Prakrit *𑀟𑀓𑁆𑀓 (*ḍakka)?
2. Is Proto-Prakrit a separate language or just a shorthand for referring to reconstructed Prakrit? I haven't seen any Proto-Ashokan Prakrit language in Wiktionary, so I'm guessing what you're referring to is reconstructed Ashokan Prakrit, right? Dragonoid76 (talk) 18:57, 5 August 2024 (UTC)
One more question: what are the cases where it makes sense to reconstruct "Sanskrit", as opposed to "Proto-Prakrit" or "Proto-Ashokan Prakrit"? Can we make (or does there already exist?) a clear decision on these cases? For example:
Dragonoid76 (talk) 20:00, 5 August 2024 (UTC)
I also agree. For Dardic descendants, and also Pali descendants of Turner reconstructions, we might want a Proto-Middle Indo-Aryan, but that way the age of any word would obviously be implied to be greater than it would be if it were called "Proto-Prakrit". I'm also open to Sanskrit reconstructions, which do seem better suited in some cases like *ध्वजदण्ड (dhvajadaṇḍa), *तिथिवार (tithivāra), etc., and which reconstruction fits better can easily be decided on a case-by-case basis (given the low number of MIA editors). I would also like to point out that despite being less frequent, early MIA like Pali does show both a long vowel and a consonant cluster, and even some Prakrit words do that, so I don't think it would be very problematic to have Proto-Prakrit reconstructions with both a long vowel and a consonant cluster. Svartava (talk) 04:55, 6 August 2024 (UTC)
@AryamanA: Hello! It's great to see you active again. By an incredible coincidence, @Svartava and I have been discussing on Discord, for the past few weeks, the idea of having a Proto-Prakrit code. Having a Proto-Prakrit code is surely less problematic than taking Turner reconstructions (which were intended by Turner to be Sanskrit) and showing them as Ashokan Prakrit, a practice unique to Wiktionary. Moreover, in Ashokan reconstructions, we spell out the geminated stops (case in point: *𑀝𑁄𑀓𑁆𑀓 (*ṭokka)) but we know that gemination was not reflected in spelling in the edicts of Ashoka. So we have to either change these reconstructions to Proto Prakrit or render them in the Latin script. Also, Ashokan needs to be set as the ancestor of Dardic (it's not, for now).
@Dragonoid76: To address your queries: a long vowel followed by a geminated consonant cluster is uncommon, but not invalid in MIA, as cases like dātta definitely exist. As for Prakrit entries in the reconstruction namespace vs Proto-Prakrit as a separate code, I am of the opinion that since Prakrit has been merged, we might as well use Prakrit reconstructions. As for your next question of how to decide between Ashokan vs Pkt reconstruction vs Sanskrit, as Aryaman said, if it has non Kashmiri Dardic reflexes, it will be an Ashokan reconstruction. As of now, inc-ash is not set to be the ancestor of inc-dar-pro but that can be fixed. I believe it should be, because Shahbazgarhi Ashokan shows many features which can be said to be ancestral to the corresponding features in Dardic. Deciding between Sanskrit and Ashokan can be much more challenging, given Ashokan contains sounds like /ṣ/, /ś/ and non simplified consonant clusters. So ciṣṭa might well be early MIA. One rule of thumb I'd use is, compounds where the components are discernable as Sanskrit words are Sanskrit, such as *bhaginī-putra. -- 𝘗𝘶𝘭𝘪𝘮𝘢𝘪𝘺𝘪(𝘵𝘢𝘭𝘬) 05:18, 6 August 2024 (UTC)
@AryamanA: Moving a few entries from reconstructed early MIA Ashokan Prakrit to reconstructed middle MIA Proto-Prakrit seems to be uncontroversial since that was the original proposal.
@Dragonoid76, Pulimaiyi: Regarding, Is Proto-Prakrit a separate language or just a shorthand for referring to reconstructed Prakrit?
I agree with
since Prakrit has been merged, we might as well use Prakrit reconstructions
rather than creating a new code for Proto-Prakrit. This is because creating a new code for Proto-Prakrit would mean that we would have to decide whether it is an ancestor of, a descendant of, or contemporaneous with the merged Prakrit language. Furthermore, Prakrit reconstructions are usually one-off for special cases unlike protolanguages such as Proto-Indo-Iranian. Protolanguages such as Proto-Indo-Iranian are entirely reconstructed while Middle Indo-Aryan is a mixture of attested and reconstructed terms.
Would the script continue to be Brahmi? … we have to either change these reconstructions to Proto Prakrit or render them in the Latin script
When Proto-Indo-Aryan reconstructions were moved to Ashokan Prakrit reconstructions, it was a delight to see them in Brahmi script instead of Latin script. Then, Victar started this discussion WT:Beer_parlour/2021/March#Reconstructions_in_Latin_script
Victar: I'd like get a discussion going about adding a guideline to WT:PROTO that states that all reconstructions should be in Latin script. Most already are, but here's a list of the ones that buck that standard…Ashokan Prakrit … Sanskrit
Mahāgaja: Devanagari seems perfectly natural to me
Victar: If we're going by academia, reconstructions will always usually be in Latin script, which does also go for Sanskrit and Avestan. Seeing RC:Sanskrit/लुट्टति is rather weird to my eyes
I agree with Fay Freak’s comment. However, I also see what Victar meant. Academia in the English language will probably not consider Wiktionary’s reconstructions seriously if they are not in the Latin script.
At Talk:बद्ध, AryamanA said
It is not useful to reconstruct with the idiosyncracies of Ashokan Brahmi being applied, in comparative linguistics we care about the phonology not orthography.
If the idiosyncrasies of Brahmi are not to be applied to reconstructed Brahmi, and if we care about the phonology not orthography, then that might suggest that the Latin script might be used for reconstructions if the Latin script better represents the phonology. However, it could be argued that even the Latin script has idiosyncrasies of its own.
Question 1: @Pulimaiyi, Svartava: If middle MIA reconstructions continue to be in Brahmi, would the anusvara be used for homorganic nasal consonants, or would they be written as the Brahmi equivalents of ङ् ञ् ण् न् म्? The middle MIA convention is to use the anusvara. RC:Ashokan Prakrit/𑀟𑀗𑁆𑀓 uses ङ्, while RC:Ashokan Prakrit/𑀫𑀡𑀺𑀕𑀁𑀞𑀺 uses the anusvara.
As for … how to decide between Ashokan vs Prakrit reconstruction vs Sanskrit, … if it has non Kashmiri Dardic reflexes, it will be an Ashokan reconstruction … given Ashokan contains sounds like /ṣ/, /ś/ and non simplified consonant clusters … ciṣṭa might well be early MIA
What this means is that there will be reconstructions at three stages:
OIA (Sanskrit)
Early MIA (Ashokan Prakrit)
and Middle MIA (Prakrit)
By analogy with RC:Sanskrit/चिष्ट, RC:Ashokan Prakrit/𑀧𑀝𑁆𑀞𑀸𑀦 might be moved to RC:Ashokan Prakrit/𑀧𑀱𑁆𑀝𑀸𑀦 especially since there is a Kashmiri descendant K. paṭhān m. (see Reconstruction_talk:Ashokan_Prakrit/𑀧𑀝𑁆𑀞𑀸𑀦#*paṣṭāna?). However, RC:Ashokan Prakrit/𑀙𑁄𑀝𑁆𑀝 has a Kashmiri descendant, but it does not resemble early MIA.
Question 2: @Pulimaiyi, Svartava: With such a scheme, shouldn’t we use the ====Reconstruction notes==== section to explain why a particular stage was chosen for a reconstruction rather than another stage (in addition to other details)?
For example, when I look at
RC:Sanskrit/ध्वजदण्ड
RC:Sanskrit/तिथिवार
RC:Sanskrit/उन्नग्न
RC:Sanskrit/स्यालभार्या
it always takes me a few minutes to justify why these are being reconstructed as OIA (Sanskrit) rather than middle MIA because of
Special:Permalink/65062470#बुभुक्ष्
Pulimaiyi: Sanskrit reconstructions are very rare in wiktionary and are generally not favoured by wiktionary's convention … Sanskrit reconstructions are not favoured by wiktionary's convention because of the lack of reliable reconstruction sources to base it on.
See also:
Reconstruction talk:Sanskrit/ध्वजदण्ड
We already have RC:Sanskrit/तिथिवार, which is why I even thought of creating this reconstruction. Or else, I'd have simply added {{inh|hi|sa||*ध्वजदण्ड}}, without linking it.
User talk:Inqilābī#Status of {{R:CDIAL}} reconstructions
Kutchkutch: Do you have a opinion on whether RC:Sanskrit/उन्नग्न should be modified to a Prakrit form or remain as [it] appear[s] in {{R:CDIAL}}?
CDIAL Introduction:
Many of the headwords, like so much of classical Sanskrit vocabulary, are in reality Middle Indo-Aryan clothed, for the convenience of presentation, in an earlier phonetic dress
Inqilābī: No idea, but it might be the case that Turner reconstructs both OIA and MIA terms.
Talk:सलहज
PUC: Wow, the phonetic erosion was rather strong in there! No?
AryamanA: Yep. It's syālabhāryā > sālahāyya > sallahayya > salhaj
At one point I was deciding whether RC:Sanskrit/तिथिवार should be moved to Ashokan Prakrit and then decided not to. If the ====Reconstruction notes==== section explicitly explains why that particular stage was chosen (in addition to other details), then that would clear up the confusion.
At RC:Sanskrit/युट् despite saying,
Turner posits that all forms of this root may have originated from *युट्ट which was a MIA replacement for युक्त
it seems that the justification for having RC:Sanskrit/युट् as Sanskrit rather than middle MIA is that we agreed not to have middle MIA roots at Talk:घोट#𑀖𑀼𑀝𑁆𑀝𑁆-_(ghuṭṭ-). However, early MIA CAT:Ashokan Prakrit roots are permissible according to the following statement in that discussion:
Ashokan Prakrit roots are tolerated because *we* consider the unattested terms in Turner's dictionary to be Ashokan Prakrit
One rule of thumb I'd use is, compounds where the components are discernable as Sanskrit words are Sanskrit, such as *bhaginī-putra
The components of RC:Ashokan Prakrit/𑀫𑀡𑀺𑀕𑀁𑀞𑀺 are discernable as Sanskrit, but I placed it in MIA rather than OIA (Sanskrit). *bhaginī-putra differs from RC:Ashokan Prakrit/𑀫𑀡𑀺𑀕𑀁𑀞𑀺 because it has the Kashmiri descendant K. bĕnathᵃr m.
The relationship between reconstructed MIA and Dardic languages has been discussed several times such as at
Reconstruction talk:Ashokan Prakrit/𑀕𑀼𑀧𑁆𑀨𑀸
The existence of a Dardic cognate could suggest that this word existed in late Old Indo-Aryan/early MIA: this is precisely why initially a code for "Proto MIA" was proposed so that Pali and Dardic could be included; but that idea did not garner much support and we had to settle for Ashokan Prakrit instead, which albeit quite pervasive, unfortunately does not extend to Pali and Dardic.
Special:Diff/73057977 at RC:Ashokan Prakrit/𑀕𑀸𑀟𑁆𑀟
Any other way to deal with Dardic terms cognate with Ashokan prakrit without having to reconstruct Sanskrit?
Special:Diff/73407835 at گاڑے#Torwali
Apparently there are more Dardic terms than just Kashmiri corresponding to CDIAL 4116 *gāḍḍa 'cart'
Kashmiri is the "Dardic" IA language that is most in-contact with plains Indo-Aryan (particularly Punjabi)
Although Kashmiri is the most spoken Dardic language, the other Dardic languages are also in contact with “plains Indo-Aryan”, which might explain گاڑے#Torwali. RC:Sanskrit/चिष्ट has the Shina descendant چٹھ#Shina. Also, the CDIAL Introduction derives the Khowar term ātʌpik from reconstructed MIA:
Khowar ātʌpik `to have high fever' must rest either upon a late MIA. *ātapp- (newly formed compound with ā from tappaï) or upon MIA. *āttapp- with analogical -tt- (after type ā-tt- < ā-tr-, etc.). The head-word ātapyatē under which the Khowar word appears is thus in reality a Middle Indo-Aryan word in Old Indo-Aryan form.
What is probably meant by “Punjabi” here is “Punjabic languages” such as Pahari-Potwari and Hindko in addition to the standardised Majhi Punjabi.
Urdu as a lingua franca is also in contact with Dardic languages to a significant extent.
Pashto is another lingua franca that is in contact with Dardic languages in Khyber-Pakhtunkhwa and Afghanistan. Although Pashto is an Iranian language, Pashto borrows from Urdu and Punjabic languages including Lahnda/Saraiki. Perhaps it is too much of a stretch for a Dardic language in Khyber-Pakhtunkhwa or Afghanistan to have a “plains Indo-Aryan” term through Pashto. For example,
RC:Ashokan Prakrit/𑀕𑀸𑀟𑁆𑀟 → ګاډی#Pashto → گاڑے#Torwali
(See CAT:Pashto borrowed terms) Perhaps there is a possibility with RC:Ashokan Prakrit/𑀧𑀝𑁆𑀞𑀸𑀦 that a Dardic language acquired the term first and then it spread to “plains Indo-Aryan”.
Kutchkutch (talk) 16:00, 6 August 2024 (UTC)
Turner's etymons are basically Sanskritized protoforms, which also counts Nuristani reflexes (including those that are inherited from Proto-Indo-Iranian and those that are borrowed from Gandharan Prakrit per Halfmann). Ashokan Prakrit here is misconstrued as a generic proto-Prakrit (in fact the ancestors of Sinhalese and Dhivehi left the subcontinent before Ashokan Prakrit was even attested; one can argue also for the early separation of Dardic, which is best interpreted as an areal grouping of northwestern Indo-Aryan languages). We should start by reconstructing intermediate proto-languages that are based on uncontroversial subgroups (like Insular Indic, Kalasha-Khowar, Shinaic, Kashmiric, or the Kamta dialects, even Proto-Nuristani). People have reconstructed Proto-Hindko and Proto-Tharu before. The issue here is that the Indo-Aryan languages in general form a vast dialect continuum that cannot be easily visualized by the standard tree model. Kwékwlos (talk) 02:11, 17 September 2024 (UTC)

etymology sections and a lack of standardization on detail


We have basically zero standards on the level of detail one should put in an etymology section. Some will only list the direct ancestor, regardless of whether it's derived from another language or not (DeJulio); others will list the ancestors of a word all the way back to, say, Latin (like dictionary or this Malay term for June); and others still will go all the way back to PIE or similar. That's not even getting into entries like admiral, orange or pizza that start to look like run-on paragraphs, with stuff like cognates and miscellaneous etymological detail.

I do recognize a pattern of more common or popular words having the larger etymology sections, but that's not really the "problem" here, and the longer sections are all pretty much on topic even if they get rambly. For one, we aren't Wikipedia, and these long paragraphs are a bit unwieldy for the average reader (read: eyesore). I probably wouldn't have broached this topic this time last year, given the merit of having the full etymology on the same page, which was probably the original intent; however, with the introduction of the {{etymon}} template, among other technical revolutions on the site this year, there are now much better ways (IMO) to present the info to the average reader. Another argument for reducing these large sections would be synchronicity, as I've encountered plenty of cases where one entry is missing details provided by another, or has an error that the other had fixed.


Now, it might sound like I'm advocating for said "only list the direct ancestor" situation, but honestly my main gripe with how things are is mostly just the presentation of the info. I've brought up on the Discord the suggestion that if entries are going to go the distance of providing an exhaustive etymology, it doesn't need to be presented in paragraph form, particularly given that it's mostly just three- to five-word statements like (now presenting how it could be presented instead of paragraphs):

- Word A from language A

- Word B from language A

- Word C from Language B

- Word D from Proto Language

Akaibu (talk) 06:39, 6 August 2024 (UTC)

I mentioned this on Discord: with the {{etymon}} template, I don't think it'll hit widespread usage until it's easier to use than the basic etymology templates like {{der}}, {{bor+}}, {{inh}}, etc. etc. Having to learn/use IDs and the whole system is daunting for the average editor. I do agree though that our etymologies do need cleanup in terms of what to display. A lot of times I'll just show the initial borrowing and put "ultimately from" for entries like Hawaiian ʻApekanikana or Yoruba Alibéníà. AG202 (talk) 17:08, 6 August 2024 (UTC)
@AG202: The goal of {{etymon}} is to connect entries like puzzle pieces, so I think the main problem currently is that very few entries are using it. In the future it will hopefully be saving massive amounts of time on stuff like categorization, finding derived terms, and writing out long etymological chains by hand. Ioaxxere (talk) 19:56, 6 August 2024 (UTC)
I'm in favor of increased usage of etymon to reduce the problem of different etymology sections not being in sync with each other, but I agree that in its current form the template is not simple enough to be easily used (e.g. the ID system is cumbersome, and the conditions for when not to use "from" are not intuitive).--Urszag (talk) 20:25, 6 August 2024 (UTC)
We've previously discussed - and seemingly agreed upon - how to make the syntax of {{etymon}} more intuitive. Due to the unfortunate choice of title I cannot link the thread directly, so here is the URL in plaintext: https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2024/June#{{etymon}}
Incorporating Benwing's last suggestion, we'd have something like:
{{ety|en#X|clever#Y|-ly#Z}} “[cleverly is] from clever + -ly”
{{ety|en#X|enm:charitee#Y}} “[charity is] from Middle English charitee”
{{ety|ru#X|de:montieren#Y|-овать#Z}} “[монтировать is] from German montieren + Russian -овать”
The X, Y, Z following the hashtags are IDs, and various additional parameters can be added like |inh=1 |bor=1 |blend=1 |backformation=1
If the syntax were like that, I'd actually be happy to use it as a general-purpose etymology template. Nicodene (talk) 23:48, 6 August 2024 (UTC)
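To make the proposed argument conventions concrete, here is a minimal Python sketch of how such an argument list might be read. It is purely illustrative: `parse_ety`, `Etymon` and `EtyCall` are hypothetical names, not part of any existing Wiktionary module. It assumes that the first argument gives the entry's own language and ID, that a later argument without a `lang:` prefix inherits the entry language (as clever, -ly and -овать do above), and that named flags like `inh=1` are simply collected as-is.

```python
# Hypothetical reader for the proposed {{ety|...}} argument list.
# Assumptions (not an existing module): first argument is "<lang>#<ID>";
# later arguments are "[<lang>:]<term>#<ID>", where a missing language
# prefix defaults to the entry language; "key=value" arguments are flags.
from dataclasses import dataclass, field


@dataclass
class Etymon:
    lang: str
    term: str
    ident: str


@dataclass
class EtyCall:
    lang: str
    ident: str
    etymons: list = field(default_factory=list)
    flags: dict = field(default_factory=dict)


def parse_ety(args):
    """Parse the template's arguments (braces and template name stripped)."""
    positional = []
    flags = {}
    for arg in args:
        if "=" in arg:
            # Named parameter such as inh=1 or bor=1.
            key, value = arg.split("=", 1)
            flags[key] = value
        else:
            positional.append(arg)

    # First positional argument: language and ID of the current entry.
    entry_lang, entry_id = positional[0].split("#", 1)
    call = EtyCall(lang=entry_lang, ident=entry_id, flags=flags)

    for arg in positional[1:]:
        # The ID follows the last "#".
        term_part, ident = arg.rsplit("#", 1)
        # An optional "lang:" prefix overrides the entry language.
        if ":" in term_part:
            lang, term = term_part.split(":", 1)
        else:
            lang, term = entry_lang, term_part
        call.etymons.append(Etymon(lang, term, ident))
    return call
```

Read this way, `parse_ety("ru#X|de:montieren#Y|-овать#Z".split("|"))` gives the entry language `ru` plus two etymons, German montieren and Russian -овать. A real implementation would live in a Lua module; the sketch only models how the positional IDs and language prefixes could be interpreted.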

Reminder! Vote closing soon to fill vacancies of the first U4C

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Dear all,

The voting period for the Universal Code of Conduct Coordinating Committee (U4C) is closing soon. It is open through 10 August 2024. Read the information on the voting page on Meta-wiki to learn more about voting and voter eligibility. If you are eligible to vote and have not voted in this special election, it is important that you vote now.

Why should you vote? The U4C is a global group dedicated to providing an equitable and consistent implementation of the UCoC. Community input into the committee membership is critical to the success of the UCoC.

Please share this message with members of your community so they can participate as well.

In cooperation with the U4C,

-- Keegan (WMF) (talk) 15:30, 6 August 2024 (UTC)

Micronations inclusion criteria


Micronations are not explicitly mentioned in Wiktionary:Criteria for inclusion#Place names, yet we have at least seven pages for micronations on Wikt. Seeing as they do count as place-names, I am asking here for input on whether or not micronations should be allowed to have their own entries, or just be subject to the same criteria as any other entry. FWIW, I am of the opinion that they should be allowed to have entries and, for clarity, be added to the aforementioned policy, as legal scholars tend to classify them as political entities, which are already allowed entries on Wikt. Would appreciate any feedback or comments, including any opposition to this proposal! Kindest regards, LunaEatsTuna (talk) 03:30, 7 August 2024 (UTC)

I've argued on the Discord that they should be counted, because they are names of places, and could be seen as already being included in CFI.
My reasoning for this is as follows:
1. We include "[h]uman settlements: cities, towns, villages, etc."
2. Micronations are human settlements, in the sense that they have/had people who live in them. (We also list ghost towns with 0 people in them, so people actually living in them isn't a concern.)
3. As such, micronations are implicitly included in CFI.
Regardless of whether you think my reasoning is sound, I do feel that they should be included, as they can achieve the same level of being talked-about as the towns in Arizona, or even more, in some cases (such as Sealand). CitationsFreak (talk) 04:00, 7 August 2024 (UTC)
I firmly believe that they are not currently included under our current criteria for Wiktionary:Criteria for inclusion#Place names. Looking at the list found at w:List of micronations, I would be hard-pressed to say that our policy states that we should include all of them. Most of them have no people living in them, and some don't even have an actual territory. A resort, a farm, a bank, two sculptures, straight-up fraud, and more should not be included by default as purported micronations. While I don't necessarily support our current policy that includes ghost towns & unincorporated communities with no people living in them, those at least receive recognition from an actual state and can be found on official government documents.
A lot of micronations are essentially "I made this up". Some should fall under WT:COMPANY. For example, I simply do not think that the Principality of Snake Hill, from a "family in New South Wales who were unable to afford their taxes seceded from Australia", should be included by default here. Some micronations are just online communities, and I don't think we'd want to open the floodgates to the name of just any online community that declares itself a micronation. A number of them claim territory that they don't even live on. It just rings as unserious, frankly, and our place names policy is broad enough as is. They don't rise to the level that an actual unrecognized state like Somaliland does.
That being said, I would support a policy to explicitly include notable micronations such as Sealand, but I'm not yet sure what the notability criteria should be. But for now, I'd say that they fall under this policy: "Most manmade structures, including buildings, airports, ports, bridges, canals, dams, tunnels, individual roads and streets, as well as gardens, parks, and beaches may only be attested through figurative use.", if even that. Or they could go into the Appendix. AG202 (talk) 04:28, 7 August 2024 (UTC)
I would like to point out this line from the rationale of the CFI place names vote: "[T]he categories are left open-ended to allow more of our existing entries." This means that if a specific type of place is not explicitly spelled out, that does not mean it falls outside the criteria.
Also, the regular CFI criterion protects us from having to deal with every little obscure micronation made up by a ten-year-old in their bedroom. I would say that any micronation that is mentioned in three+ independent sources over a period of one year should be included. If enough people talk about, say, Melchizedek, then I'd say it's notable enough for us.
(Plus, is a fake nation really more similar to an airport or street or anything else mentioned in that sentence than a nation?) CitationsFreak (talk) 04:47, 7 August 2024 (UTC)
"Plus, is a fake nation really more similar to an airport or street or anything else mentioned in that sentence than a nation?" Yes? I'm almost certain major and even some minor airports have more notability and usage than the vast majority of the micronations listed. Let alone actual nations and sovereign states. Like I said, some of them are literally a singular building. Also, honestly, our CFI criterion doesn't protect us, considering all we need are 3 Usenet comments, or at this point simply 3 tweets. AG202 (talk) 05:02, 7 August 2024 (UTC)
I meant that in terms of function. A micronation acts like a nation, with its own government and rulers and flag and so on. This is unlike an airport or a street, which doesn't.
Also, like I said, I am using the CFI standard, "use in durably archived media, conveying meaning, in at least three independent instances spanning at least a year". If these conditions are met when people are talking about a building, why shouldn't it be in Wikt? CitationsFreak (talk) 05:08, 7 August 2024 (UTC)
Because they are explicitly excluded by WT:CFI#Place names, unless they have figurative usage, which is exactly my point. If we included buildings and such by default, I wouldn't be replying here, but that's not the case. We can't simply have someone rebrand a building or company or farm or something similar as a "micronation", get 3 independent usages, and then bam, we include it by default. That just does not align with how I'd expect our policies to be read. And looking at the list from WP, based on the references they have, I would expect the vast majority, if not all, of them to pass if we include them by default. AG202 (talk) 05:15, 7 August 2024 (UTC)
I would totally expect a reader to look up Sealand, since it is famous, but not the name of any other sea fort. (However, I wouldn't expect a reader to look up Bob's Principality of North-East Main Street.)
(In the Discord, I had also suggested "a new rule for microstates, that says something like 'Ignore all references to the founding of the state[, since they are not independent]'". What do y'all think?) CitationsFreak (talk) 05:33, 7 August 2024 (UTC)
Well, a micronation is a territory around a building plus a government. So we include them by comparison with, or because of their partial identity with, human settlements, neighbourhoods and countries, just as we even include fictional countries. This does not rule out that some should still be excluded under our inclusion criteria, because they are more similar to constructed languages, for instance.
We should be more concerned with violations of WT:BRAND by their artificialities. Some are organized like a cult, a club or a criminal organization, though we include 'ndrangheta, Hamas, Islamic Revolutionary Guard Corps and the Unification Church; I am thinking of the Reichsbürger here, whose constructs, however, aren’t considered micronations. Somewhere it does go too far. We won’t agree that their being noted in references per se supports their inclusion, though notability is important; just for clarity, and to avoid confusion with Wikipedia, Wiktionary editors will avoid mentioning notability in the CFI, fearing that it would not even be understood in the same way as you understand it by those who don’t edit Wikipedia.
We can make RFDs for any reason later if the current inclusion situation gets out of hand; I don’t see a benefit in a theoretical community agreement on inclusion criteria specific to micronations. It is right, necessary and sufficient that we have discussed it; this will help us later to find out what goes too far. Fay Freak (talk) 13:15, 7 August 2024 (UTC)
Micronations aren't all one kind (Liberland denotes a specific area, Obsidia is a movable rock; some are oft-mentioned, some scarcely-mentioned), so IMO we shouldn't add blanket acceptance of all micronations to CFI. But if enough people use a term like Liberland to refer to a given area, I don't see an obvious dividing line between that and other coinages for specific (or nebulous!) regions—which may not have administrative significance or population—we don't bat an eye at: the Triangle, the Golden Strip, Trójmiasto, Mariana Trench, not to mention terms where sovereignty is disputed, like Northern Cyprus, Judea and Samaria, or Donetsk People's Republic. If there are cites to support it, I don't see a reason not to include Liberland: but that doesn't mean we should define these as real nations; I might lead with Liberland being a name for a particular area (used by people who claim it's a nation), and likewise might merge the first two senses of Seborga and just mention that the town is claimed to be a micronation.
It's true anyone can make up a micronation and we could be flooded, but we can RFD things where needed (if we don't blanket-include them), and AFAICT people could already coin and flood us with coinages for non-micronation regions: if people start calling an arbitrary U-shaped snake of land from Hamburg down to Hannover and east to Berlin and north up to Waren "HaHaBeWare" (or something as self-promotional as some micronations, like "Rachel's Backyard"), not asserting it to be a micronation but just saying "this is a name for this region, à la the Golden Strip", I don't currently see on what basis we wouldn't include that... (Also, while Obsidia, which I mentioned above, is less a placename and more like Ishango bone or Einang Stone, we seem to be deciding at RFD to keep such "names of specific individual stones and bones", so maybe Obsidia is fine too? I don't know; I'm more sceptical of it, and we don't currently have other stone-names I checked like Stone of Scone, but I'm curious why Ishango bone would get a pass and not Obsidia... maybe we want to reconsider including Ishango bone?) I am, as always, liable to change my mind as I hear more arguments... - -sche (discuss) 21:36, 7 August 2024 (UTC)

Billion: a thousand millions or a million millions

Garner still uses the regular plural in definitions of this kind. Thus, for trillion, he states that in Great Britain it traditionally means "a million million millions".

When it comes to defining nominal meanings as distinct from numerals, should the wording not reflect this? JMGN (talk) 16:13, 8 August 2024 (UTC)

la Luynes

(Notifying PUC, Jberkel, Nicodene, AG202, Benwing2): There has been a conflict over what to do with the headword line (pinging the article creator @Olybrius). My understanding is that the article "la" seems always to be used with the name of this river and is not capitalised, but I don't think we should change the headword line. What should we do? Are there other names in the same situation? This is not like La Défense, where "La" is lexicalised as part of the name and is always capitalised (although some websites, perhaps by mistake, leave it uncapitalised). --kc_kennylau (talk) 19:29, 8 August 2024 (UTC)

@Kc kennylau This is very common with French rivers as well as other entities, e.g. most countries (les États-Unis, la France, but just Israël). We don't have a general policy on how to handle this; in English, there is now a param |the=1 for cases like this, which displays "the" in the headword (e.g. the White House; but not always used, cf. the Castro, a well-known district in San Francisco). In German, {{de-noun}} also has special support for this. I'm not sure about other languages. Benwing2 (talk) 19:42, 8 August 2024 (UTC)
For that matter, most (all?) rivers in English use the as well. Benwing2 (talk) 19:42, 8 August 2024 (UTC)
I do feel that we should maybe add articles in the headword for things like la Barbade or l’Alabama. It makes it clearer for learners, especially since not every region or country uses an article. It's brought up quite often in French-learning spaces. AG202 (talk) 20:58, 8 August 2024 (UTC)
I agree. I don't know if it's necessary for rivers because AFAIK all rivers take an article, but for countries and regions it varies from term to term and is very useful to include. That's why it's included in English and German, for example. Benwing2 (talk) 21:17, 8 August 2024 (UTC)
I also agree. CitationsFreak (talk) 03:55, 9 August 2024 (UTC)
I don't see the point of adding the article; knowing the gender is enough. See Nil, Rhône, Meuse, Rhin, Danube, etc. PUC 19:46, 8 August 2024 (UTC)
This is my view as well.
For Luynes, if the concern is that a reader may not know to use la (as opposed to *les), that can be clarified in a usage note. Nicodene (talk) 21:32, 8 August 2024 (UTC)
Does your belief apply only to rivers, or also to countries and regions (see above)? If the latter, my concern is that these usage notes would need to be added to every country and region, and would be more compactly conveyed in the headword (following the example of English and German, among others). Benwing2 (talk) 21:35, 8 August 2024 (UTC)
Personally I'd only add usage notes when something deviates from the pattern. Of the countries mentioned so far that's just Israël. Nicodene (talk) 22:41, 8 August 2024 (UTC)
I'd just find it easier for learners to include the definite article in the headword, since it's not particularly common to find them with the indefinite article. As for the prepositions used, we definitely have to include usage notes or usexes (like fr.wikt), having seen this page for countries and this page for U.S. states, which is what I've done at pages like Alabama and Barbade. AG202 (talk) 04:37, 9 August 2024 (UTC)
It seems we're tackling several topics at once, but getting back to the topic of rivers specifically, the indefinite article is also in use: "Pour une Seine plus propre" ("For a cleaner Seine"). Therefore, not only is it not useful to add the article in the headword, it's also potentially misleading. We don't feel the need to mention that Thames is used with an article; why should it be any different for French names? We're a dictionary, not a grammar book. PUC 12:01, 9 August 2024 (UTC)
To be clear, I don't think rivers need an article displayed. However, I will say that in English, with United States, you can definitely say "(for) a cleaner United States" as well; it's just not particularly common, and it's generally understood as a possibility, so I don't think it's misleading to show the definite article. AG202 (talk) 13:34, 9 August 2024 (UTC)
What reason would there be to follow this approach that would not just as well justify adding articles to all French nouns? Nicodene (talk) 21:34, 10 August 2024 (UTC)
Because not all countries/regions use an article. It's a class of words that has its own special rules/usages. AG202 (talk) 00:09, 11 August 2024 (UTC)
Neither do all nouns. Nicodene (talk) 00:36, 11 August 2024 (UTC)
I think you are missing the point. There are semantic reasons why some common nouns take articles and some don't, but there are no such reasons for proper nouns referring to countries and regions; it's essentially arbitrary. Benwing2 (talk) 00:38, 11 August 2024 (UTC)
It's rather that I don't see why the “default” state for a class of words should need to be marked every time, as opposed to just the exceptions. Nicodene (talk) 03:50, 11 August 2024 (UTC)
Because from a French learner's perspective, while it's clear that every noun has an article, it's not necessarily assumed with a country name considering how other languages handle countries. I've seen it happen so many times in French-learning spaces, where folks are confused as to which countries use an article, what gender they are, what prepositions they use, etc. etc. AG202 (talk) 01:19, 12 August 2024 (UTC)
Then mark the nouns and proper nouns that deviate from the general pattern of taking articles? I don't see the problem at all. Nicodene (talk) 01:25, 12 August 2024 (UTC)
How are readers of this dictionary (which is an English dictionary, intended for English speakers, whose countries don't normally come with articles) going to magically know these Wiktionary-specific conventions? Benwing2 (talk) 01:28, 12 August 2024 (UTC)
I'd be very surprised if there exists a single dictionary of French with entries/headwords like le chat, la femme, l'homme, la France, Israël, janvier, le zèbre. Nicodene (talk) 01:34, 12 August 2024 (UTC)
Most if not all bilingual dictionaries leave out lots of pertinent info; that doesn't mean we need to do the same (and User:AG202 and I have already said there is a material difference between 'le chat' and 'la France', which you seem to be willfully ignoring). Benwing2 (talk) 02:23, 12 August 2024 (UTC)
Also, several monolingual and bilingual French sources do list them: see French Wikipedia, the French government, the Canadian government, the International Labour Organization, the UN's term database, and the EU, in addition to Quebec's government showing it using the "visiter" examples. I don't know why we can't do the same, especially since the project is aimed at English speakers. AG202 (talk) 02:33, 12 August 2024 (UTC)
If the existence of an exception to a rule means that everything that does follow that rule needs to be marked as such, then you should also, logically, do the same with all French nouns. Nicodene (talk) 02:35, 12 August 2024 (UTC)
(Notifying PUC, Jberkel, Nicodene, Benwing2): I'm bringing this back to the forefront, specifically for countries, because I keep seeing this issue pop up in French-learning spaces. And I'd like to mention again that other native-French sources do this as well: French Wikipedia, the French government, the Canadian government, the International Labour Organization, the UN's term database, and the EU, in addition to Quebec's government. Clearly it's something that's important for folks writing in French to know, and it's fairly unpredictable, especially from the perspective of a non-native. It wouldn't add too much to simply add the article for the countries that use one. It's a part of French that works so fundamentally differently from English (since most countries don't use a definite article at all in English) that it'd be helpful to display within English Wiktionary and wouldn't cause any harm. I also wouldn't want it tucked away in an Appendix, since we'd have to link there anyway, so we might as well make it clear on the page without doing that.
An example would be with Canada: it'd simply display le Canada on the headword line. This would immediately tell someone unfamiliar with French that they should expect to see and use an article with the country. AG202 (talk) 03:18, 7 September 2024 (UTC)
@CitationsFreak, @Kc kennylau as well AG202 (talk) 03:19, 7 September 2024 (UTC)
I support this wholeheartedly. Benwing2 (talk) 03:21, 7 September 2024 (UTC)
Support. CitationsFreak (talk) 06:04, 7 September 2024 (UTC)
Support. This sounds very user friendly to me and I can't see it doing any harm. Andrew Sheedy (talk) 06:10, 14 September 2024 (UTC)
What would it be for Luynes? --kc_kennylau (talk) 22:56, 10 September 2024 (UTC)
@Kc kennylau: I honestly would prefer to keep this specific proposal to countries, since those are more clearly defined; with rivers I'm more undecided. AG202 (talk) 12:46, 11 September 2024 (UTC)

Bot rights

We really should have a policy for removing bot rights from accounts that have been inactive for a reasonable period. We could say a bot is temporarily inactive after 2 years and permanently inactive after 3 years. For example, NanshuBot and Websterbot have not edited since 2003, and TheCheatBot has made no contributions since 2008. The bot owner must be notified before rights are removed. Any bot whose rights are removed due to temporary inactivity must be restorable at the request of the owner. However, if the rights are permanently taken away after a longer period, reinstating them would require another vote. Let me know what you think. — Fenakhay (حيطي · مساهماتي) 02:45, 9 August 2024 (UTC)

@Fenakhay Sounds good to me. 2 years sounds good for a temporary revocation but for a permanent revocation maybe 5 years; 3 years seems maybe too close to 2 years. I would add that if a bot owner requests that the bot rights be restored, this doesn't reset the clock; if they ask for a restoration but don't do anything with their bot, then the bot is still subject to permanent revocation after the relevant period from the last edit performed by the bot. Benwing2 (talk) 04:14, 9 August 2024 (UTC)
Support Vininn126 (talk) 13:35, 9 August 2024 (UTC)
Support BABR (talk) 07:06, 10 August 2024 (UTC)
Knowing which accounts are bots is surely of great value to researchers and others who are studying patterns of contribution to this wiki. Given the special importance of the bot flag as a way of distinguishing non-human contributors to our entries, I'd rather deal with the account compromise risk by indefinitely blocking inactive block accounts than by taking away the bot group. This, that and the other (talk) 11:41, 11 August 2024 (UTC)
Should this say "inactive bot accounts" rather than "inactive block accounts", @This, that and the other? LeadingTheLifeOfRiley (talk) 22:36, 11 August 2024 (UTC)
Yes! This, that and the other (talk) 00:48, 12 August 2024 (UTC)
I agree with TTO here. When a bot hasn't edited for a long time, it doesn't suddenly turn into a human being. So if the only goal is to prevent the bot account from getting compromised, we might as well use a block. Ioaxxere (talk) 06:03, 15 August 2024 (UTC)
Yes, it confused me that neither Fenakhay nor Benwing2 actually gave a rationale for the proposal, so I had to read between the lines and assume it was to minimise the risk of account compromise, in which case blocking is the more appropriate solution in this context.
If it's needed to make my position clearer, I Oppose the proposal as put. This, that and the other (talk) 01:28, 17 August 2024 (UTC)
Support @Benwing2's suggestions. — Sgconlaw (talk) 20:44, 13 August 2024 (UTC)
Support removal after 5 years' inactivity. The bot operators could be dead now, but we don't know for sure. DonnanZ (talk) 08:56, 16 August 2024 (UTC)
Hmm. This is tricky. I'm weakly inclined to Oppose as written and Support TTO's alternative idea to block inactive bots, because I appreciate that declaring inactive bots to have become human (!), or to have ceased being bots, is probably unhelpful... but I think the real issue is that we're using the bot flag to mean two different things at the same time, and that becomes a problem in situations like this where only one of the two things is true. We mean both "this account is [present tense] authorized to operate as a bot" and "this account is a bot". Perhaps what we really need is to have the devs add a new user group for "inactive or unauthorized bot" (which has no special rights), to which inactive, or recently-active and blocked unauthorized, bots can be switched...? But even then, blocking such bots seems advisable, so I am persuaded that blocking is a better way to accomplish the goals of "continue to indicate which edits came from a bot" and "prevent the accounts from making edits". - -sche (discuss) 04:33, 17 August 2024 (UTC)
A comparison between bot activity and non-bot activity by the bot account holder may be useful. There may be cases where the bot account holder is still active but not using their bot. DonnanZ (talk) 12:32, 18 August 2024 (UTC)

Glottonym tweaks: Franco-Provençal, Venetian → Francoprovençal, Venetan

These changes would bring Wiktionary in line with the naming conventions of modern English scholarship, as found in for instance the Oxford Guide to the Romance languages (2016).

Context:

  • Francoprovençal has been the name used in French scholarship since the 1970s. Removing the older hyphen lessened the misleading impression that the language is some sort of secondary blend of French and Provençal (Occitan). There is also an element of typographical convenience.
  • Veneto has always been the name used in Italian scholarship, if I'm not mistaken, with Veneziano predominantly or exclusively reserved for the varieties spoken in Venice and environs, as opposed to the rest of the Venetan domain (Ve1, Ve3‒7).

Nicodene (talk) 22:05, 9 August 2024 (UTC)

Support. The Venetan proposal in particular has been a long-awaited change, and given that part of modern Anglophone scholarship handles this sensibly, we have little reason to stay behind. Catonif (talk) 22:15, 9 August 2024 (UTC)
Support. Never heard of Venetan, but if this is the accepted term, so be it. Benwing2 (talk) 07:40, 10 August 2024 (UTC)
Thoughts, @Apisite, IvanScrooge98, Samubert96, Sartma, Ultimateria, Urszag, Word dewd544?
(Active users who speak Venet[i]an or have contributed to its entries.)
Nicodene (talk) 20:52, 13 August 2024 (UTC)
Thanks for pinging me. I am pretty indifferent to the hyphen question for Francoprovençal, while I am not fully convinced about Venetan; after all, Venetia is the anglicized name for the region of Veneto (if the linguistic reasoning is to distinguish the specific dialect of Venice from the language as a whole). But if Venetan is now most common in English-language professional literature, then I don’t think there is much to debate. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 21:21, 13 August 2024 (UTC)
The region's name occurs ~15 times more often in English as Veneto than Venetia, according to a Google search for “region of ____” (119000 results versus 7960). The latter occurs generally in historical as opposed to modern contexts.
Also at the moment we have no (reasonable) way to indicate a term used in Venice proper, as opposed to, say, Padua. A dialect label like Venetian would be identical to the name we currently use for the overall language (contra, as mentioned, the name used in linguistics). Nicodene (talk) 22:05, 13 August 2024 (UTC)
Yeah, as I said, I get the reasoning. The thing is that Venetian, despite most commonly referring to things from Venice specifically, is not a strictly technical term the way Venetan is, and using such a technical term strikes me as a bit off given that this project is aimed not at linguists but at the general public. And we could still label entries from the dialect of Venice as Venice, Venice dialect, Venice Venetian or something along those lines. But, again, that doesn’t mean I strongly oppose changing Venetian to Venetan. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 22:19, 13 August 2024 (UTC)
The general public in Italy would be surprised to hear the dialect of, say, Padua described as veneziano. E.g. on Italian Wikipedia, Dialetto padovano redirects to this page, where veneziano is mentioned solely as an external entity: “the varieties spoken in the most important centres… have been influenced by veneziano”.
So this is more about the general public of English-speaking countries, which isn't aware that such a language exists, as opposed to a local variety of (Standard) Italian. Nicodene (talk) 23:00, 13 August 2024 (UTC)
Fair enough. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 23:09, 13 August 2024 (UTC)
How do you pronounce "Venetan"? Benwing2 (talk) 23:20, 13 August 2024 (UTC)
For me it's /ˈvɛnətən/ < /ˈvɛnətəʊ/ (≈Italian /ˈvɛneto/) + /-ən/. Nicodene (talk) 23:31, 13 August 2024 (UTC)
@Benwing2: I would rather pronounce the term as /ˈvɛneɪtʌn/. --Apisite (talk) 10:49, 14 August 2024 (UTC)
Support If we are not going to have separate h2 for the main dialect groups of the Venetan language, then we must go for Venetan. As @Nicodene said, Venetian is the dialect of Venetan spoken in and around Venice. For instance, Paduans, Vicentines and Trevisans speak Paduan, Vicentine and Trevisan respectively, not Venetian. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 15:27, 15 August 2024 (UTC)
@Benwing2 Shall we go ahead, then? Nicodene (talk) 18:00, 22 August 2024 (UTC)

Synthesised audio files (again)

Hello, I'm still new here, so I'm not sure if I'm posting in the right place, but WT:TEA seems to be about individual words and my concern here is wider. There is a previous discussion of synthesised audio files at Wiktionary:Beer_parlour/2024/June#synthesized_audio_files, but as far as I can see it was archived before reaching a firm conclusion, and I'm not sure what I'm supposed to do on encountering a batch of low-quality synthesised audio files. They were added by a user whose only contributions seem to have been made on two days in July, so I'm not sure if they're still active or are a known user who likes to contribute under different usernames. I'm sure somebody ought to have a word with the uploader, but that would best be done by someone with more experience than me.

The audio at fucking Nora is very unnatural, especially in intonation, and the one at paucilingual is of something else entirely. I suppose I could boldly revert all the additions, but I'm not sure whether the files themselves should be deleted or how to initiate that process. Moreover, the previous discussion seems to have no firm consensus on whether all synthesised audio files should be removed, or only the ones obviously of poor quality, and some of this batch seem somewhat reasonable (although all are obviously synthetic).

The previous discussion did seem to be inching towards developing some kind of process that an editor can follow when they encounter such files, but it doesn't look like a final consensus was reached on that either, or that it was written up on an appropriate help page. So while I'm flagging this particular batch up now, I think it would be helpful for there to be guidance available on what I'm supposed to do in future. LeadingTheLifeOfRiley (talk) 22:32, 11 August 2024 (UTC)

@LeadingTheLifeOfRiley I personally think synthesized audio should not exist on Wiktionary, because even the best TTS is not nearly good enough to equal a native speaker's pronunciation. This is also probably why there's a request that only native speakers record audio, since non-natives might make "mistakes". In that case there's no doubt that programs will make mistakes, so I think synthetic audio should not be put on entries. Kiril kovachev (talk · contribs) 18:09, 19 August 2024 (UTC)
Seems bad but if it's going to happen, then it would be more sensible to generate them programmatically on demand instead of uploading tons of files. 2A00:23C5:FE1C:3701:A5E9:B57F:509D:732E 14:53, 30 August 2024 (UTC)
Strong oppose synthesized audio. Vininn126 (talk) 15:09, 30 August 2024 (UTC)
OP here. Not synthesized audio. How dare you. JapanYoshi (talk) 10:17, 6 September 2024 (UTC)
@LeadingTheLifeOfRiley: I don't believe that JapanYoshi's audios were TTS, but I do think that they fell under our problem of not having enough guidelines as to what an audio file should actually be on Wiktionary. They shouldn't be just anyone recording a word but should follow the guidelines that other dictionaries have set, with a very specific intonation + pronunciation. AG202 (talk) 00:41, 14 September 2024 (UTC)
Why on earth do people keep accusing JapanYoshi of adding synthesized audio? They sound perfectly natural to me. The intonation is higher-pitched and more sing-songy than in typical dictionary audio files, but I really don't get why people think it's synthesized. What a horrible way to welcome a user making good-faith contributions. Other people have assumed this user isn't a native speaker, but I'm not even convinced of that. Why not ask on their talk page and provide further guidance before jumping on all their contributions as harmful?
@JapanYoshi, my apologies for the assumptions people are making about your contributions. Your audios could probably use some improvement to sound more natural, but this response to your recordings must be very frustrating. Andrew Sheedy (talk) 06:07, 14 September 2024 (UTC)

Beautifying English etymology sections (2)

By my count, Wiktionary:Beer_parlour/2023/September#Beautify_etymology_sections resulted in consensus that:

  • English etymologies should start with "From" (or similar) rather than just a link.
  • English etymologies should end with a period.

Thus, for example, unhoaxable, currently {{prefix|en|un|hoaxable}}, is converted to: From {{prefix|en|un|hoaxable}}. These changes will be going forward in a week's time for English only unless there are concerns that need to be addressed. (Notifying @Benwing2) Ioaxxere (talk) 23:25, 11 August 2024 (UTC)

My only comment on the matter is that we should use the + templates that support adding that, and make sure we don't get accidental repetition of "from", such as "from Derived from English example". Akaibu (talk) 01:02, 12 August 2024 (UTC)
Trust me that I know what I'm doing. As for + templates, there isn't currently consensus for adding them everywhere, so it's done on a per-language basis. Benwing2 (talk) 01:24, 12 August 2024 (UTC)
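As a rough illustration only (this is not the code of the bot actually doing the job), the two agreed rules could be sketched in Python as below. The function name `beautify_etymology` and the `SELF_INTRODUCING` set are hypothetical; the set assumes, for illustration, that templates such as {{blend}}, {{clipping}} and {{back-formation}} display their own opening words, which is the kind of "from" doubling raised above.

```python
import re

# Hypothetical, illustrative set: templates assumed to render their own
# lead-in wording (e.g. "Blend of ..."), so "From " must not be prepended.
SELF_INTRODUCING = {"blend", "clipping", "back-formation"}

# Captures the template name at the very start of the text, e.g. "prefix".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|}]+)")


def beautify_etymology(text: str) -> str:
    """Sketch: add a leading "From " to a bare template and a final period."""
    text = text.strip()
    if not text:
        return text
    match = TEMPLATE_NAME.match(text)
    # Prepend "From " only when the section opens directly with a template
    # that does not supply its own wording; existing prose is left alone.
    if match and match.group(1).strip() not in SELF_INTRODUCING:
        text = "From " + text
    # End with a period unless the text already ends in sentence punctuation.
    if not text.endswith((".", "!", "?")):
        text += "."
    return text


print(beautify_etymology("{{prefix|en|un|hoaxable}}"))
# → From {{prefix|en|un|hoaxable}}.
```

Limiting the "From " prefix to sections that open directly with a template means text that already starts with prose (such as "From ..." or "Derived from ...") is never given a second lead-in.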

Orthography guidelines for Venet(i)an

Currently there are no formalised guidelines on which orthography to lemmatise Venetan terms in, which has led to at least three different orthographies being used inconsistently at the moment. I have written a concise list of guidelines at WT:About Venetian/sandbox (the definition section will of course have to be updated if the Venetian → Venetan renaming goes through). I'm not familiar with the bureaucracy needed to get that out of the sandbox and make it official: is this BP post enough if it receives enough support? Catonif (talk) 10:11, 12 August 2024 (UTC)

@Catonif: When there are no active editors and you are the only one editing the language, you can just impose a standard first and change it later if people who have something to say about it appear. If there are others who have an opinion, you can ping them on an About: page and discuss it there. I don't think a BP discussion is needed unless you personally want input. Thadh (talk) 12:31, 12 August 2024 (UTC)
Right, thank you. @Sartma maybe you have input? Catonif (talk) 17:08, 12 August 2024 (UTC)
I’ll chime in. I have made occasional edits to Venetian entries and I also thought some consistency was needed, so thanks for working this out. Your proposal is very similar to the standardization I was thinking of, even though I don’t vibe with a couple of things: always marking ⟨è ò⟩ but not ⟨é ó⟩, and using ⟨qu⟩ rather than ⟨cu⟩. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 21:39, 13 August 2024 (UTC)
Hi @IvanScrooge98, thank you for the input! About the accents, the different opinions by the three modern competing standards are
  1. only on /ɛ ɔ/, according to Grafia Veneta Unitaria (1995), which I followed
  2. only on /e o/ according to Brunelli (2012)
  3. on neither, according to Grafia Veneta Internazione Moderna (2017).
What are you proposing, accents on both /ɛ ɔ/ and /e o/? If so, that seems a bit unnecessarily cluttered to me. FWIW, vec.wikt also follows GVU 1995 in this regard.
About etymological Q I have no strong feelings; yet again from GVU: "While recognising that phonetically there is no difference from cu + vowel, in keeping with the principle of adapting, as far as possible, to Italian spelling habits, one will write aqua […]. Nor, however evident the identity may be, will the spelling be levelled in every case, either to q (aqua, quor, squola) or to c (àcua, cuòr, scuòla)." Admittedly, both Brunelli and GVIM seem to later ditch this principle and opt for cu always, and I'm ready to do so as well for the sake of scientific consistency, but my ultimate goal is to balance out and find the middle point of different forces, two contrasting ones being what the standard guidelines prescribe and what is actual everyday practice. My question is: how familiar would cu spellings be for the average speaker (or rather, writer/reader) of Venetan? If it looks relatively natural, then I can agree on switching to cu. If, on the other hand, it looks too unnatural and "artificial", or even straight-up wrong, then I'd rather keep the qu, which as far as I understand is still by far the commonest. Catonif (talk) 10:22, 14 August 2024 (UTC)
I’d go for accent on neither regarding ⟨e⟩ and ⟨o⟩. I think we should treat them like other vowels.
As you said, my main point about ⟨cu⟩ is consistency. In any case, other attested orthographies are normally included under “alternative forms”, and so we can have, e.g. aqua, àqua, àcua and the like to point to the main entry acua, under which they would be listed together. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 10:35, 14 August 2024 (UTC)
The thing is, as apparently admitted by the proponents themselves, qu is only more common due to the influence of Italian orthography, where the u in cu normally represents a full vowel and not a semivowel. This distinction does not exist in Venetian, just as it doesn’t in Spanish, for example, and I think we should proceed accordingly. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 10:35, 14 August 2024 (UTC)
@IvanScrooge98 Alright, I can bend towards no accent for the mid vowels on paroxytone (piane) words. This has the disadvantage of making the orthography ambiguous, making IPA sections necessary, but brings the scheme closer to both everyday practice and GVIM and arguably decreases visual clutter. As for ⟨cu⟩ I'm still unsure how natural it looks for speakers, but whatever. I'll make these two changes to the page and then officialise it into mainspace in a couple of days if no further input is given. Catonif (talk) 12:46, 14 August 2024 (UTC)
Thanks again for working on this! [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 13:14, 14 August 2024 (UTC)
@Catonif: Native speakers of Venetan dialects use any sort of spelling. Very few native speakers are even aware of the various spelling proposals, so they tend to apply what they learned in school for Italian. I have no issue with taking a bold approach here and going with what we prefer. My preference would be to use <cu> all the time. As for vowels, I would always indicate them in the headword, since some words in different dialects are distinguished only by their openness (I say dòcia, but 5 km from where I live they say dócia). I wouldn't write any accent in the entry name; there's too much vocalic variation depending on the dialect to fix a certain accent in the entry name. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 15:19, 14 August 2024 (UTC)
Hello @Sartma! Been a while. About accents in the headword, I'd prefer to avoid them as much as possible, given that sometimes they would also be in the entry name (for oxytone and proparoxytone terms) and sometimes only in the headword, which, although clear to us, I believe would confuse readers. Hence, I'd just omit accents on all paroxytone terms and leave dócia~dòcia shenanigans to the pronunciation section. Catonif (talk) 09:25, 16 August 2024 (UTC)
@Catonif: Been a while indeed. I had to take a break from Wiktionary's toxic environment. You're right, we should use the pronunciation section for those cases. 👍 — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 11:16, 16 August 2024 (UTC)
Alright, done. The guidelines are now official. Catonif (talk) 11:05, 19 August 2024 (UTC)
You didn’t change them as agreed though! XD
I edited the page. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 11:23, 19 August 2024 (UTC)
Derp! I don't know how I could forget. 🤦‍♂️ Thank you for updating them. Catonif (talk) 11:31, 19 August 2024 (UTC)

Character info box redesign

I've redesigned the box generated by {{character info}} to create a better mobile experience by using space more efficiently. You can see how it looks here. Unfortunately I'm not familiar with the inner workings of Module:character info, so I may need help in that respect. Ioaxxere (talk) 06:10, 13 August 2024 (UTC)

If the only differences are in CSS, we can just integrate it into Template:character info/style.css. body.mw-mf can be used to detect the mobile version. — SURJECTION / T / C / L / 19:15, 16 August 2024 (UTC)
@Surjection: I meant that the desktop and mobile versions of the new design have identical HTML, but both have very different HTML from the current design, so the module would have to be changed. Ioaxxere (talk) 20:30, 17 August 2024 (UTC)
@Ioaxxere This is just my opinion, but the desktop version seems a bit too small in the new design. Comparing the page 🝬 that you use as an example, the current one puts the Unicode/HTML entity on a line above the name of the character, which keeps it out of the way; yours, on the other hand, puts it in line with the name, which seems to take up too much space, considering the box has also been shrunk.
I also like the liberal use of space employed by the current design on the bottom, which takes up another row to display the Unicode codepoints for the previous and next characters (which I also prefer), and correspondingly, there's a gap underneath the name of the character block, which makes it feel less cramped.
If there isn't a problem with the space usage in your opinion on the desktop version, would it be possible to keep it quite similar to how it is already in terms of spaciousness? I enjoy the way that it doesn't feel constrained right now and has all the space it needs to show all the information, whereas the new design seems too small for me (since it would easily have room to grow if I viewed it on any page as it is).
What do you think? Kiril kovachev (talkcontribs) 18:18, 19 August 2024 (UTC)Reply
@Kiril kovachev: That's a good point, so I'll see if I can make the new design use a similar amount of space on desktop to the current design. But there is some information that I think is unnecessary, like the code points of the neighbouring characters which I think are always one less or one more than the current page's code point. Ioaxxere (talk) 19:26, 19 August 2024 (UTC)Reply
@Ioaxxere Yeah, you might be right about that, it is technically not very useful. Maybe you can disregard that part, since it's just my arbitrary preference, but I guess I've gotten used to how it is at the minute. But I don't want to impede your change if you'd rather remove it. Kiril kovachev (talkcontribs) 22:38, 19 August 2024 (UTC)Reply

Applying ux to English entries (2)

Last month in this discussion, @JeffDoozan told me that his bot followed some strict rules when applying {{ux}}. I would like to gain consensus for the following bot job for English:

  • Apply {{ux}} even when the usage example contains a wikilink (like at attire#Noun) or multiple bolded items.
  • Apply {{ux}} even when the usage example doesn't start with an uppercase letter, like at proper subset#Noun or protusible#Adjective.
  • Apply {{ux}} even when the usage example is a phrase (rather than full sentences), like at puffing#Noun.
  • Apply {{co}} when the usage example contains two words, not including a leading "a" or "the", like at rabbit-proof#Adjective.

This algorithm isn't perfect and may result in some usage examples being misclassified, but it is still a big improvement over not using templates at all. Ioaxxere (talk) 19:21, 13 August 2024 (UTC)Reply
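The fourth rule above ({{co}} versus {{ux}}) can be sketched as a simple word-count heuristic. This is only an illustration under assumptions: the function name `choose_template` is hypothetical, it is not the actual code of JeffDoozan's bot, and it assumes wiki markup (wikilinks, bolding) has already been stripped so that plain whitespace-separated words are counted. The example sentences are made up.

```python
def choose_template(example: str) -> str:
    """Return "co" for a two-word collocation, otherwise "ux".

    Hypothetical sketch of the proposed rule; assumes `example` is plain
    text with wiki markup already removed.
    """
    words = example.split()
    # Ignore a leading "a" or "the" when counting words.
    if words and words[0].lower() in ("a", "the"):
        words = words[1:]
    # Exactly two remaining words: treat it as a collocation ({{co}}).
    return "co" if len(words) == 2 else "ux"


print(choose_template("a rabbit-proof fence"))  # co
print(choose_template("The guests arrived in formal attire."))  # ux
```

Under this rule, longer collocations of three or more words would fall through to {{ux}}.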

I have no problem with that. I suspect there are plenty of three-word collocations though. Andrew Sheedy (talk) 03:50, 18 August 2024 (UTC)Reply
@Ioaxxere This is because there are some usage examples that are entered as plain text, without using a template, right? I support these rules in that case. Kiril kovachev (talkcontribs) 18:01, 19 August 2024 (UTC)Reply
@Kiril kovachev: Yes, exactly. Ioaxxere (talk) 19:26, 19 August 2024 (UTC)Reply
@JeffDoozan: It doesn't seem like anyone is opposed to this, so feel free to run the bot at your convenience. Ioaxxere (talk) 21:35, 28 August 2024 (UTC)Reply

planning to standardize names of categories like Category:Semantic loans from English

For historical reasons, we have umbrella categories like Category:Semantic loans from English that are missing the normal "by language" terminology. Only some such etymology categories are this way, cf. Category:Semantic loans from English vs. Category:Pseudo-loans from English by language. I am planning on renaming these to conform to standard umbrella category terminology, e.g. Category:Semantic loans from English -> Category:Semantic loans from English by language. This specifically applies to:

  • Phono-semantic matchings from LANG
  • Semantic loans from LANG
  • Terms borrowed from LANG
  • Terms calqued from LANG
  • Terms derived from LANG
  • Terms inherited from LANG
  • Terms partially calqued from LANG
  • Transliterations of LANG terms

Benwing2 (talk) 03:16, 15 August 2024 (UTC)Reply

Sounds good to me. Ioaxxere (talk) 06:05, 15 August 2024 (UTC)Reply
Sounds fine. Vininn126 (talk) 10:45, 16 August 2024 (UTC)Reply
Support for consistency. — excarnateSojourner (ta·co) 03:59, 20 August 2024 (UTC)Reply

Wardian

We have Wardian case (and Wardian cases), but no entry at Wardian. Should we? What should such an entry say?

Are there other, similar examples? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:09, 15 August 2024 (UTC)Reply

{{def-uncertain}} for graphemes

Could we add an option to this template to indicate that a reading is uncertain, for extinct scripts that are not completely deciphered? This came up for me with the Linear B glyphs, many of which we know are syllabic letters rather than logograms (Unicode even separates them into different blocks), but there is uncertainty or dispute over which syllable they transcribe. Saying their "definition" is uncertain is a weird way of putting that; normal parlance is "reading". We might also want "reading" to trigger different categories (or perhaps rename the existing categories "terms with uncertain meaning or reading").

For the logographic Linear B glyphs, the current wording is fine IMO, but in other scripts "reading" might be appropriate to logograms as well. There may also be glyphs that have both phonographic and logographic uses, in which case we might want "reading" in one section and "meaning" in the other.

There are cases in other scripts where this might apply to words written in an opaque phonographic script (e.g. Akkadograms) -- we might know the meaning but not the reading, or vice versa. Possibly in some cases we'd want to say that both the reading and meaning are uncertain, rather than a binary choice. kwami (talk) 22:22, 15 August 2024 (UTC)Reply

"der" template for phonological influence from substrate

Hey all, apologies if this has been asked before (cursory archive search suggests it hasn't), just checking in to clarify whether or not the "der" template can be used for languages that influenced a term, but which aren't ancestors of said term. Sanskrit नड (naḍa, reed), of Indo-European origin, is thought to have become retroflexed via substrate influence, most likely Dravidian. User:Djkcel says that Dravidian should be marked with a "der" template, but I'm not so sure about this, and the der template page makes no mention of this case, whether to confirm or deny. The Dravidian hasn't quite "hybridized" with the Sanskrit term (which would make "der" more clearly suitable), but it has exerted influence on it. Does anyone have any clue on this? User:Mahagaja, User:Vininn126. Thanks. Agamemenon (talk) 23:19, 16 August 2024 (UTC)Reply

I certainly see "influence" being a case for using der in quite a few entries such as English arbour, Spanish barrueco, Bulgarian агро- (agro-), German Runde, Dutch automaat, Sardinian piaghere, etc. I don't know that there is a template for designating "influence" from another language but perhaps that would be better than der... DJ K-Çel (contribs ~ talk) 00:14, 17 August 2024 (UTC)Reply
Since we even categorize loan meanings, any substitution, as “derivations” from a foreign language, yes. See also the graph of foreignisms. (No templates for loan renditions and loan creations yet, as far as I know.) Fay Freak (talk) 01:16, 17 August 2024 (UTC)Reply
{{der}} has often been used in such cases. Vininn126 (talk) 07:16, 17 August 2024 (UTC)Reply

Category/template for terms that are not derived from another language, i.e. derived internally

We currently have no categorization/template for terms that come from within a given language and are not derived or borrowed from another, such as compounds (jackhammer), clippings (motherfuck), affixation (frotteurism), and other methods of internal word formation. Even if the aforementioned templates were given whatever is needed to denote such internal derivation, my observations suggest that a dedicated template would likely be needed, as some etymologies don't slot neatly into an existing etymology template and are usually just given the mention or link template. Akaibu (talk) 22:41, 17 August 2024 (UTC)Reply

What do you mean? jackhammer is a compound term, so it's categorised as a compound, which is implied to be derived internally. Theknightwho (talk) 22:56, 17 August 2024 (UTC)Reply
@Theknightwho cases like booze, trevally and squeegee. Akaibu (talk) 23:14, 17 August 2024 (UTC)Reply
Okay, those are alterations - I don’t know if they form a coherent class of terms, though. Theknightwho (talk) 23:19, 17 August 2024 (UTC)Reply
@Theknightwho there's more than just alterations, peruse the top few hundred examples of https://petscan.wmcloud.org/?psid=29112298 for more cases Akaibu (talk) 23:49, 17 August 2024 (UTC)Reply
Isn't {{Template:from}} meant for generalized within-language derivations? Although it seems it may not actually function any differently from 'mention' at the moment, and as mentioned, more specific templates like the affix template should be used if appropriate. See e.g. puny, which was brought up when the etymon template was being introduced as an example of this type of derivation.--Urszag (talk) 00:55, 18 August 2024 (UTC)Reply
I wasn't actually aware this existed. Theknightwho (talk) 01:15, 18 August 2024 (UTC)Reply
That template currently doesn't categorize; if it could be made to do so and be used for such internal derivations, that would suffice. Akaibu (talk) 01:58, 18 August 2024 (UTC)Reply

Reconstructions and scripts

I was informed by @Victar that reconstruction entries on Wiktionary are rendered in Latin script corresponding to their vocalisation while only attested words are rendered in the language's original script, which is how Old Persian entries are already handled on Wiktionary.

However @Mellohi! changed the letters k, w and y in the Gaulish reconstructions to c, u and i so as to reflect the forms in which they are attested in Latin, and they argue that reconstructed words should be spelt like the other attested words in the language.

I have also come across a Hittite reconstruction using cuneiform in the form of Hittite *𒊭𒀀𒆪𒉿𒀭, which is even more problematic than regular reconstructions because of the large number of phonetic values per sign and the large number of signs corresponding to each phonetic value in cuneiform, which meant that even attested words in the many languages using cuneiform did not have fixed spellings in the script.

Can a fixed rule be established for all reconstructions in all languages? Personally, I argue for the phonetic-based use of the Latin script because reconstructions in the original scripts are not always feasible or predictable due to lack of standardisation in pre-modern scripts. Antiquistik (talk) 12:55, 18 August 2024 (UTC)Reply

That's not feasible in cases where the script form is reconstructable, even if the pronunciation is not; e.g. I don't think our Ancient Greek reconstructions should be converted to the Latin script. Theknightwho (talk) 13:30, 18 August 2024 (UTC)Reply
That's fair. In this case, there will need to be criteria established to decide which languages' reconstructions should be in Latin script and which should be in their native scripts. Antiquistik (talk) 13:49, 18 August 2024 (UTC)Reply
@Antiquistik I agree in principle, though there may be cases where one or other is warranted in the same language, depending on what the evidence for the reconstruction is. Theknightwho (talk) 10:17, 19 August 2024 (UTC)Reply
@Theknightwho This is fair as well, though I suppose a bit more tricky. I think it needs to be discussed more thoroughly. Antiquistik (talk) 11:21, 19 August 2024 (UTC)Reply
To quote what I wrote on your talk page:
"It's true, most academic reconstructions are written in Latin script, for clarity, and that's especially the case for languages like cuneiform Old Persian, which are orthographically unpredictable and difficult to read. To the point of Primitive Irish, I can't find any author that reconstructs it in Ogam,{{R:sga:McCone:1986|page=245}} and there is probably a good argument to reconstruct it in Latin script as well, and I would support that if brought up in a discussion."
I would absolutely support moving RC:Hittite/𒊭𒀀𒆪𒉿𒀭 to RC:Hittite/šākuwan. Hittite cuneiform has dozens of alternative characters and logograms. --{{victar|talk}} 20:03, 18 August 2024 (UTC)Reply
I would Support requiring reconstructions for languages in scripts that are incompatible with the Latin alphabet to be converted to it, whether by transliteration, transcription, or spelling out the phonetics. Cuneiform is a partially logographic script, so the correspondence between spelling and phonetics is pretty loose in many cases. As you progress toward a more-or-less phonemic alphabet or abugida, it's more of a gray area. I wouldn't object to keeping reconstructions in such a script if there are other reasons to do so. In cases such as Gaulish, which is already in the Latin script, minor adjustments for compatibility with attested spellings aren't that big a deal. Chuck Entz (talk) 20:38, 18 August 2024 (UTC)Reply
Editors naturally make the correct choices. Clarity is the key criterion here. Middle Persian had multiple shit scripts only so it is reconstructed in Romanization, Old Persian is also not transparent enough. Akkadian does not have the problem since we lemmatize at Latin script per language-specific decision, while for Sumerian I can imagine some to prefer cuneiform, but the Sumerian internet community is not at Wiktionary yet.
People working with a language often think in the script directly instead of what it stands for; one reads with visual memory. For Old South Arabian it is likely that one would use their script and forgo Romanization since knowledge of many vowel values is wanting and acquaintance with the script is expected if one does anything at all with the language, and in a similar vein I made Punic reconstructions to avert entryism (an interesting context to apply this term, yes), also motivated by the idea that no one should enter ancient Romanizations (and Grecizations, hell why is this word unattested) outside of an appendix (as they would get out of hand and contain more textual corruption than truth). Fay Freak (talk) 21:28, 18 August 2024 (UTC)Reply
"Grecizations" is probably unattested because the usual word ("Hellenizations") is easier to say. Andrew Sheedy (talk) 22:31, 18 August 2024 (UTC)Reply
@Fay Freak This would align with @Chuck Entz's comment that terms in logographic scripts would be less accurate to reconstruct than those recorded in more phonemic ones. For example, I would not necessarily support moving the Gothic and Prakrit reconstructions to the Latin script either, primarily because their representations in their respective scripts are certain to be accurate to how they would have been historically written (although I would not necessarily oppose such a move either if it was seriously proposed on Wiktionary).
Now, there is also the question of the extent to which various scripts have been used for the concerned languages as well. Given that this discussion started with concerns regarding Gaulish reconstructions, I think it is fair to take into consideration the fact that Gaulish was a primarily oral language which did not extensively use writing. I approve of primarily using the Latin script to render Pali on Wiktionary for the similar reason that Pali never had any one single primary script, and I think Wikipedia, though an independent project from Wiktionary, is also correct in primarily using the Latin script for rendering Sanskrit.
All this is to say, in addition to the question of the accuracy of reconstructions in scripts like cuneiform which had very loose spelling conventions which I have already addressed elsewhere in this discussion, there are also many more layers of nuance to take into account when choosing which reconstructions to Romanise and which ones to represent in scripts their languages are attested in. Antiquistik (talk) 11:46, 19 August 2024 (UTC)Reply
@Victar: It is also extremely difficult to find authors that write Hittite in cuneiform, but that doesn't mean we shouldn't do it. I think our reconstructions should be given in the script the language used, unless of course that is not possible. Thadh (talk) 10:37, 19 August 2024 (UTC)Reply
That certainly wouldn't be possible for the vast majority and even in cases where some (proto-)Hittite person was speaking a (proto-)Hittite word, the odds are very good that person was illiterate anyway, so it does make sense to me to try to normalize all these into a Latin script, as someone who does not know about Ancient Near East languages or the contemporary scholarship on them. —Justin (koavf)TCM 10:40, 19 August 2024 (UTC)Reply
I don't think the literacy of past speakers is relevant to reconstructions. Theknightwho (talk) 11:22, 19 August 2024 (UTC)Reply
@Theknightwho I would say that it depends on the specific language. It's fair to opt for Romanisation for a primarily oral language which barely used writing.
Though, when it comes to Hittite, the issue is instead about the reliability of reconstruction in the language's script, given that Hittite was written in cuneiform. And cuneiform being a mixed logographic-syllabographic script meant that it had very loose spelling conventions, not to mention that in cuneiform a single character could have several different phonetic values and a single phonetic value could be represented by several different signs. Antiquistik (talk) 11:54, 19 August 2024 (UTC)Reply
@Antiquistik To clarify: the important issue here is the reliability of reconstructions. In the case of Hittite, your (and Victar's) argument has been that we cannot reliably reconstruct terms in cuneiform, and therefore we shouldn't have reconstructions in it; for Gaulish, the fact that it had no particular literary tradition also prevents us from being able to reliably reconstruct an authentic representation, because no such representation ever existed in the first place. In such cases, we use a normalised Latin script to represent morphemes (though I admit the distinction becomes confusing for languages that have actually been attested using the Latin script, such as Gaulish, even if it was only done on an ad hoc basis). However, at no point does the literacy of the majority of speakers play any part in this: it's true that "the odds are very good that person was illiterate anyway", but that was true for Ancient Greek, Gothic and Prakrit, too. The fact is that it doesn't matter, for our purposes. Theknightwho (talk) 12:07, 19 August 2024 (UTC)Reply
@Theknightwho My bad, I forgot to clarify myself. Indeed, I don't think literacy of the speakers is a factor regarding whether or not reconstruction entries should be in scripts that were used to write the languages they are from.
Ability to create a reliable/accurate reconstruction in the script, and presence or lack of a literary tradition in a particular script or several specific scripts, should dictate whether or not to Romanise reconstruction entries. If either one of those is missing (e.g. the 1st one for Hittite; the 2nd one for Gaulish; both 1st and 2nd one for oral-only languages), then the reconstructions should be Romanised. Antiquistik (talk) 12:19, 19 August 2024 (UTC)Reply
@Antiquistik Agreed. Theknightwho (talk) 12:23, 19 August 2024 (UTC)Reply
@Thadh The problem is that there was not one single fixed way of spelling words in scripts like cuneiform, or even Egyptian and Anatolian hieroglyphs and Linear A and B, for that matter. There were multiple ways in which words could be written even if using a very limited set of characters, and, in the case of cuneiform, a single character could have several different phonetic values and a single phonetic value could be represented by several different signs. This makes any process of reconstructing using cuneiform and similarly-functioning scripts extremely unreliable in terms of accuracy. Antiquistik (talk) 11:25, 19 August 2024 (UTC)Reply
@Antiquistik: We often normalise the entries for such languages anyway, using one 'standard' spelling with multiple attested variants. Don't see how that would be problematic for the reconstructions.
As for @Koavf's point: If we're using arbitrary signs for recording a spoken language, might as well use a native script. What I said wasn't about "Proto-Hittite", it was about actual, recorded Hittite - scholarly consensus isn't always the best thing for us to follow. In the case of orthography, it's not. You're not arguing for reconstructing Middle English using Canadian syllabics just because it had a messy orthography - seems a bit disingenuous to force Latin onto those other languages, no? Thadh (talk) 14:31, 19 August 2024 (UTC)Reply
@Thadh What you are proposing is feasible for abjads, abugidas and alphabets, but not with logographic and/or syllabographic scripts like cuneiform and the like.
Hittite cuneiform alone had four signs for /ša/, one sign for /sak/, eight signs for /ku/, and four signs for /wa/.
And there were no rules regarding which signs to give precedence in cuneiform, which simply makes it too uncertain how to normalise an unattested term in a language using this script. Antiquistik (talk) 15:18, 19 August 2024 (UTC)Reply
@Antiquistik: I know how Hittite works. I also know these signs have the same value and were used interchangeably, so we can actually decide ourselves which one to give a priority. We can, for instance, say that ša1 is from now on the 'prioritised' form, with the others being added based on attestation - there, problem solved, all reconstructions now use ša1. That is exactly how we would treat any other attested language.
For logographic scripts, yes, that won't work, but most languages have a segmental alternative - the only deciphered language I can think of that doesn't is Sinitic, and even there we can often just use the modern sign for e.g. classical Chinese. Thadh (talk) 15:26, 19 August 2024 (UTC)Reply
I don't know that it's disingenuous: it's just the most convenient manner to write these reconstructed forms. It may well be the case that the literature uses cuneiform to write them, but it could also easily be the case that they use Latinized forms. There would be nothing wrong in principle with Cherokee or Sinhala script or whatever, but I find it highly unlikely that is what the sources use. —Justin (koavf)TCM 20:59, 19 August 2024 (UTC)Reply
Support having reconstructed entries for poorly attested languages with weird scripts in Latin script. Making reconstructed Primitive Irish entries in Ogam is a pain in the ass. —Caoimhin ceallach (talk) 20:46, 19 August 2024 (UTC)Reply
@Caoimhin ceallach: This online keyboard easily solves the Ogham issue. I think it might even be how those written-in-ogham titles exist in the first place. But I don't mind changing reconstructed Primitive Irish to romanized entry names, although scholars generally romanize Primitive Irish in all-caps. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 21:44, 20 August 2024 (UTC)Reply

Arbitrary break: recap

So we have the following categories of related issues:

  • Primitive Irish: Requires this online keyboard to type in conveniently, and its attested inscriptions consist of virtually exclusively personal names.
  • Gaulish: Written in Latin script natively, the "Romanized" reconstruction pages are inconsistent with attested native Gaulish orthography. E.g. the reconstruction pages spell /k/ with the letter K when the actual Gauls spelled the sound with the letter C, and the reconstructions tend to use -y- even though Gauls never spelled /j/ like that (they used the letter I).
  • Cuneiform and Persian scripts: general nightmare to handle digitally, Romanized reconstruction entries are preferred for these to avoid hassles.

Ceso femmuin mbolgaig mbung, mellohi! (投稿) 21:44, 20 August 2024 (UTC)Reply

Pinging the other participants in this discussion: @Caoimhin ceallach, @Chuck Entz, @Fay Freak, @Koavf, @Thadh @theknightwho, @Victar. Antiquistik (talk) 07:49, 23 August 2024 (UTC)Reply
1. Could you add that to Wiktionary:About_Primitive_Irish? I think all hacks like that should be shared to help less savvy editors like me.
2. I agree with using C, I, U, also because that's what Delamarre's {{R:cel:DLG}} does, although he doesn't mark unattested forms as reconstructed, which we should definitely not copy.
3. Yes. —Caoimhin ceallach (talk) 13:12, 23 August 2024 (UTC)Reply
@mellohi!, Caoimhin ceallach Is it accurate to say that Gaulish was natively written in the Latin script? It appears to have used Greek and Greek-derived scripts like the Lugano script before the Roman conquest, and Latin epigraphically under Roman rule, but it seems to have otherwise been a primarily oral language with no literary form.
I must also note that Wiktionary sometimes uses its own spelling conventions, especially for reconstructions (see the Old Median reconstructions by @Victar compared to how they are presented in academic literature). Antiquistik (talk) 08:26, 24 August 2024 (UTC)Reply
I disagree on the cuneiform front. Linear B is also not great for writing in either, but that doesn't mean we shouldn't use it. Just normalise some phonetic spelling, and then list attested spellings, just like any well-covered language is (e.g. Old East Slavic). Thadh (talk) 17:49, 23 August 2024 (UTC)Reply
Just to be clear, these are guidelines for only reconstructions. Attested forms of Old Persian and Hittite cuneiform, ogham Primitive Irish, etc., should still be written in their attested scripts. --{{victar|talk}} 19:18, 23 August 2024 (UTC)Reply
Yes, I understand. I would still strongly prefer using the native script in a normalised transcription over Latin. Thadh (talk) 12:09, 5 September 2024 (UTC)Reply
It seems that this discussion is now stagnant. Have we come close to any decision, or do we still need to discuss anything further? Antiquistik (talk) 14:01, 2 September 2024 (UTC)Reply
Pinging participants again to see if we have any decision so far @Caoimhin ceallach, Chuck Entz, Fay Freak, Koavf, mellohi!, Thadh, theknightwho, Victar. Antiquistik (talk) 07:03, 5 September 2024 (UTC)Reply

St and St. abbreviations of Saint

I would like to discuss the idea of sorting St and St. under Saint in place name categories in particular. This idea, though radical, is not as daft as it sounds, as Oxford and Collins do this in their printed dictionaries. That fact can't be proved online, you need to own, or refer to, the actual volumes to find out.

There has been a long discussion in the Grease Pit (Wiktionary:Grease_pit/2024/August#What_to_do_with_St?) over the treatment of the abbreviations, which are currently mixed up with all other entries beginning with St-. This is hardly satisfactory, and User:Theknightwho has been the obstacle to change. I did prepare an example with St Neots of proposed sorting under Saint using a sortkey, but TKW saw fit to revert it.

This user has become far too used to getting his or her own way since becoming an admin, and needs reining in by more senior admins. I am non-admin, and don't want adminship, but this leaves me open to being downtrodden, despite over 255,000 edits since 2013. DonnanZ (talk) 15:31, 18 August 2024 (UTC)Reply

For context: English entries starting with "St" and "St." are not "mixed up": they're simply sorted according to the method that was decided last year in this thread, where we agreed that English sortkeys should ignore spaces and punctuation, in accordance with how most English dictionaries seem to do it. I reverted Donnanz's attempt to change all the sortkeys to manual sort=St., which put them out-of-step with all other English entries, and Donnanz has spent the last few days acting like this is about my personal preference versus his, for some bizarre reason, despite me and @Urszag explaining numerous times that he can't just ignore established consensus; I'm still not sure if Donnanz actually understands that, going by what he's just commented above. In fact, my personal preference would have been to not ignore spaces and punctuation in the first place, as I've told him several times, which would have avoided them being "mixed up" (as he calls it), but that's not what we decided.
That all being said, I would oppose any change to sortkeys to treat "St" and "St." as "Saint", because it makes automatic sorting impossible when it appears in the middle of terms, as it's impossible to distinguish from the abbreviation for "street": compare Bury St Edmunds (saint) with Bow St. Runner (street). Theknightwho (talk) 16:21, 18 August 2024 (UTC)Reply
It would not be necessary to apply a sortkey to those examples, only to entries beginning with St or St. Thus the sortkey would need to be selective, and only applied manually, not automatically, to those that need it. With the limited numbers of entries this is not insurmountable. DonnanZ (talk) 16:55, 18 August 2024 (UTC)Reply
If "St" and "St." are supposed to be sorted like "Saint", then Stoke St Gregory should sort before Stokes Bay and stokesia, because Stoke Saint Gregory would be sorted before them. By default, Stoke St Gregory is sorted after them, which means we're being inconsistent if we only sort "St" and "St." like "Saint" at the start of a term. Here's an alphabetically-sorted column template which demonstrates it: Theknightwho (talk) 17:13, 18 August 2024 (UTC)Reply
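The ordering difference in the Stoke St Gregory example can be sketched in Python, assuming the current default sortkey can be approximated as "lowercase, then drop spaces and punctuation". This is only an approximation: Wiktionary's real sortkey logic lives in its Lua modules and may differ (e.g. for diacritics), and `default_sortkey` and `saint_sortkey` are hypothetical names. The regular expression that expands a standalone "St" or "St." to "Saint" is purely illustrative.

```python
import re


def default_sortkey(term: str) -> str:
    # Approximation: case-insensitive, ignoring spaces and punctuation.
    # Only ASCII letters and digits are kept, so this is not suitable for
    # terms with diacritics.
    return re.sub(r"[^0-9a-z]", "", term.lower())


def saint_sortkey(term: str) -> str:
    # Hypothetical variant: expand a standalone "St" or "St." to "Saint"
    # before computing the default key. It cannot tell "Saint" from "Street".
    return default_sortkey(re.sub(r"\bSt\b\.?", "Saint", term))


terms = ["stokesia", "Stoke St Gregory", "Stokes Bay"]
print(sorted(terms, key=default_sortkey))
# ['Stokes Bay', 'stokesia', 'Stoke St Gregory']
print(sorted(terms, key=saint_sortkey))
# ['Stoke St Gregory', 'Stokes Bay', 'stokesia']
print(saint_sortkey("Bow St. Runner"))
# bowsaintrunner
```

The last line shows the ambiguity raised in the thread: a blanket expansion would also sort Bow St. Runner (street) as if it contained "Saint".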
Some of those would never appear in a place category. I don't think we should be looking for inconsistencies in sorting five or six characters into an entry. Stokes Bay is always going to sort before Stokesley, Stoke sub Hamdon, Stoke Trister and Stoke Works, and Stokes County would only appear in a US list, not an English one. There is no mixing of the two category lists. DonnanZ (talk) 18:37, 18 August 2024 (UTC)Reply
Okay, so you're advocating for a sorting system that isn't even internally consistent. Let me change that oppose to a strong oppose. There are plenty of other examples where this happens, so taking issue with the specific one I gave completely misses the point; nevermind the fact that you're advocating for sorting place name categories in a special way that adds an annoying maintenance burden to ensure they're sorted consistently, and doesn't really make sense: either we do it everywhere, or we don't do it at all. Theknightwho (talk) 18:46, 18 August 2024 (UTC)Reply
I need to hear from other editors. DonnanZ (talk) 18:58, 18 August 2024 (UTC)Reply
Also you're not even correct about them not being mixed anyway: Stokesley and Stoke St Gregory both appear in Category:en:Places in England, so they would appear in the wrong order. Theknightwho (talk) 18:52, 18 August 2024 (UTC)Reply
It used to be, or may still be, the bibliographical standard to treat abbreviations at the beginnings of terms (not sure about what happens when they occur in the middle) as if spelled out in full for sorting purposes. Thus St and St. were treated as if spelled as Saint, and Mc and M‘ (e.g., in McDonald) as Mac. That being said, I can see how this is not obvious to the average user who may expect to see all the Saints grouped together in one lot, and Sts in another. It may also cause difficulty in sorting as Theknightwho pointed out, though I wonder if there is a technical way to say "sort St and St. as Saint". At this point I remain undecided. — Sgconlaw (talk) 18:55, 18 August 2024 (UTC)Reply
@Sgconlaw Not without solving the "street" issue. It's inherently ambiguous, so it would always have to be manual; see Bow St. Runner. Theknightwho (talk) 18:57, 18 August 2024 (UTC)Reply
@Theknightwho: it can't be that common for there to be entries with Street abbreviated as St. I would imagine the use of St to mean Saint is much more prevalent in dictionary entries. What if we defaulted St and St. to mean Saint, and provided a parameter to override it manually? (This is, of course, on the assumption that there is consensus that St and St. should be treated like Saint, and perhaps the Mac situation should be treated that way as well, which has yet to be decided.) — Sgconlaw (talk) 20:38, 18 August 2024 (UTC)Reply
@Sgconlaw: Guess who created Bow St. Runner? It was TKW, today. DonnanZ (talk) 20:53, 18 August 2024 (UTC)
@Sgconlaw I don't really see what benefit this extra maintenance work brings, as it's adding additional (ongoing) work for no clear purpose; the point of sorting terms in categories is to make them findable, while this achieves the opposite by defying user expectations, especially if we only apply it to some abbreviations and not others. FWIW, the OED Online sorts "St. Elmo's fire" under "St", not "Saint"; it's the same for all other saint terms. For instance, "stag", "St. Agatha's letters", "stage cloth": precisely the same method we currently use. Theknightwho (talk) 21:01, 18 August 2024 (UTC)
US place categories are not mixed with English place categories, and this applies to all countries. DonnanZ (talk) 19:29, 18 August 2024 (UTC)
@Donnanz: It seems like if we do this, we should sort every other abbreviation under its expanded form for consistency, which will be problematic. Ioaxxere (talk) 05:27, 19 August 2024 (UTC)

FWIW, although my personal intuition would be to sort things as spelled ('respecting' spaces), thus "sack, saint, Saint Bernard, sap, St, St Andrew's cross, St Elmo's fire, stab, stand, stellar", the few dictionaries I've managed to find "St Whatever" terms in don't match either my intuition or each other: The Webster's New College Dictionary, Third Edition sorts "standpoint, St. Andrew's cross, standstill, ..., stellular, St. Elmo's fire, stem"; Webster's New Universal Unabridged Dictionary, Second Edition has all the "Saint" and "St." terms as run-ins under "saint", alphabetized there as "Saint Bernard, ..., Saint Valentine's Day, St.-Agnes's-flower, St. Andrew's cross"; and ... unhelpfully, those are the only two I've managed to find "St" terms in.
I am not inclined to sort "St Whatever" terms under "saint Whatever" here: I think more people would look for "st..." terms under "st..." than would look for "st..." terms under "sa...". - -sche (discuss) 05:58, 19 August 2024 (UTC)
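The "sort as spelled, respecting spaces" intuition can be made concrete. If spaces are kept and compared as ordinary characters, a space sorts before any letter, and that reproduces the order given above. This sketch uses plain code-point comparison after case-folding. It is not the Unicode Collation Algorithm, and it is not Wiktionary's actual sort-key code.

```python
def respecting_spaces(term: str) -> str:
    # Keep spaces: U+0020 sorts before every letter, so "St Elmo's fire"
    # ("st e...") lands before "stab" ("sta...").
    return term.casefold()


terms = ["stellar", "St Elmo's fire", "sap", "Saint Bernard", "stab",
         "St", "sack", "stand", "St Andrew's cross", "saint"]

print(", ".join(sorted(terms, key=respecting_spaces)))
# sack, saint, Saint Bernard, sap, St, St Andrew's cross, St Elmo's fire, stab, stand, stellar
```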

@-sche The OED follows the same method as The Webster's New College Dictionary, Third Edition, though I don’t have a print copy to hand. Theknightwho (talk) 09:09, 19 August 2024 (UTC)
@-sche: (edit conflict) Yes, that's the unknown factor. Where do users expect to find them, under Saint or St? Currently we have alphabetical sorting like Staffordshire Moorlands, St Agnes, Stagsden. My attempt to group all saint entries together in an orderly fashion, where St Agnes would be followed by St Albans etc., was thwarted by TKW, who has thwarted me at every turn. I would be much happier if we could do that, so we can discuss that here too. DonnanZ (talk) 09:30, 19 August 2024 (UTC)
I expect to find "St. Foo" and "St Foo" at "Saint Foo", not after "Solisbury" and before "Stanford" or whatever. —Justin (koavf)TCM 10:38, 19 August 2024 (UTC)
I'm not sure where I would expect to find them (whether at Sa or St), but I would certainly expect to find all "St" terms together. At least, that's what would be most useful. Andrew Sheedy (talk) 17:57, 19 August 2024 (UTC)
If we do that, then we should stop ignoring spaces in sorting altogether, because it would be a really bad idea to take spaces into account only in this one case, because it's inconsistent. Pinging @DCDuring @J3133 @RichardW57 @Vriullop @Benwing2 who participated in the last discussion about this. Theknightwho (talk) 18:16, 19 August 2024 (UTC)
My understanding is that systems that face a broad user population have lots of special cases or very general architectures because users have diverse, complex, and seemingly contradictory needs. If we have one "special case", we will probably have others. What would be a way to accommodate them? DCDuring (talk) 19:45, 19 August 2024 (UTC)
I would say letting people choose how they sort the lemmas in some sense would be the only logical move. CitationsFreak (talk) 18:41, 31 August 2024 (UTC)
I have a problem with User:Theknightwho reverting my experiments. Get off my back, they will be reverted when I have studied them. DonnanZ (talk) 19:18, 19 August 2024 (UTC)
Testing proved |sort=Staa| works well. I expect I will be told "No, we can't do that." DonnanZ (talk) 22:51, 19 August 2024 (UTC)
@Donnanz Why would we sort St Georges-super-Ely as sort=Staa, as you just tested? That isn't consistent with anything that's been proposed. Theknightwho (talk) 22:55, 19 August 2024 (UTC)
It's being proposed now, as another option. DonnanZ (talk) 23:07, 19 August 2024 (UTC)
@Donnanz Okay, so I repeat the question: why would we sort St Georges-super-Ely as sort=Staa, as you just tested? Theknightwho (talk) 23:10, 19 August 2024 (UTC)
As Andrew Sheedy said above: "I would certainly expect to find all "St" terms together." DonnanZ (talk) 23:17, 19 August 2024 (UTC)
@Donnanz Yeah, but crudely shoving them in a random place, inconsistently from everything else, is not the way to achieve that. In Category:English lemmas, you'd be inexplicably placing them after sta. Theknightwho (talk) 23:21, 19 August 2024 (UTC)
This is a kludge, which is inadvisable because it is not obvious to other editors why this particular sort key has been used. If there is consensus that St should be sorted as if spelled as Saint, then a proper technological solution should be developed. The focus of the discussion ought to be on determining what the consensus is. — Sgconlaw (talk) 23:25, 19 August 2024 (UTC)
Precisely. I think there's probably consensus for not ignoring spaces, but I don't think there's consensus for treating "St" and "St." as "Saint". Theknightwho (talk) 23:27, 19 August 2024 (UTC)
I don't want random sorting, I would have to do more testing there, my intention is to segregate the saints from other St- places, placing them at the beginning of the St- entries. But judging by your past actions, you will disagree with that. Goodnight. DonnanZ (talk) 23:32, 19 August 2024 (UTC)
@Donnanz I've told you at least 5 times what my personal view is, but you obviously haven't listened. Theknightwho (talk) 23:40, 19 August 2024 (UTC)
@Donnanz Why do you keep adding sort=Staa to various entries with the edit summary "test"? What is being tested here? Theknightwho (talk) 11:56, 20 August 2024 (UTC)
I wanted to see whether random sorting occurs. No, it doesn't. In the Welsh list St Asaph, St Davids and St Georges-super-Ely appear in proper alphabetical order between Square and Compass and Stackpole at the moment. Those edits can be reverted once you have checked them. An alternative would be using sort=Stz, which in theory would sort them after the other St- entries. At least I am trying to find a solution to the problem, your personal view seems to stop you from looking for one. DonnanZ (talk) 12:34, 20 August 2024 (UTC)
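The effect of a manual key like sort=Staa can be reproduced in a few lines. The sketch makes three assumptions:

- The automatic key is the title, case-folded, with spaces removed (as described elsewhere in this thread).
- A sort= value simply replaces that key.
- Equal keys fall back to the page title.

The helper names are hypothetical.

```python
from typing import Dict, List, Optional


def default_key(title: str) -> str:
    # Assumed automatic key: case-folded, spaces dropped.
    return title.casefold().replace(" ", "")


def category_order(titles: List[str],
                   overrides: Optional[Dict[str, str]] = None) -> List[str]:
    overrides = overrides or {}

    def key(title: str):
        manual = overrides.get(title)
        primary = manual.casefold() if manual is not None else default_key(title)
        # Entries with equal sort keys fall back to the page title.
        return (primary, title.casefold())

    return sorted(titles, key=key)


welsh = ["Stackpole", "St Davids", "Square and Compass",
         "St Georges-super-Ely", "St Asaph"]
saints = ["St Asaph", "St Davids", "St Georges-super-Ely"]

print(category_order(welsh))
print(category_order(welsh, {title: "Staa" for title in saints}))
```

Without overrides the saints follow Stackpole, because "stackpole" < "stasaph". With sort=Staa they move between Square and Compass and Stackpole, matching the order reported above. The sketch also shows the objection raised in reply: nothing in the string "Staa" tells a later editor why it was chosen.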
@Donnanz As @Sgconlaw pointed out, any manual sorting like that is a kludge, which we wouldn't want to use. As several users have said already, we shouldn't be carving out a special exception just for these, because the same issues apply to all kinds of other entries as well: all the entries starting "N.", "S.", "E." or "W.", any with "U.S." and so on. The problem is that you're not looking at the bigger picture, and don't seem to understand that your personal preference for these particular entries does not justify being inconsistent in how we sort things overall. It's not complicated.
You (still!) don't seem to grasp that you personally not liking something does not automatically make it a problem that needs to be solved: given that the sorting you don't like is the method used by the OED, it's clearly not nonsense; it just isn't what you'd prefer. If there is consensus for taking spaces into account when sorting (which would group "St." terms together), then we can do that, but if there isn't, then we won't. Again: this is a very simple concept that someone with your level of experience should understand by now, but you keep reverting back to the same arguments time and again and ignore everything that doesn't align with your view, which is not becoming of an editor with over 200,000 edits. You also constantly make things personal, which is not on. Instead of telling me what I think, absorb what I actually say. Theknightwho (talk) 12:54, 20 August 2024 (UTC)
I noted Sgconlaw's comment, and the aversion to sortkeys. However, no one has come up with a better solution, let alone an automatic solution, AFAIK. So this issue may never be resolved. And being bossed about is a definite turn-off. I am not trying to make things personal, just make an observation. What about personal attacks on me? I have to accept them, it seems. It applies both ways. DonnanZ (talk) 13:32, 20 August 2024 (UTC)
@Donnanz I just gave you a general solution in my last comment, and it's far from the first time I've mentioned it. What personal attacks on you are you referring to? Theknightwho (talk) 13:35, 20 August 2024 (UTC)
Referring to personal attacks on me was a general comment, but they can occur. But a "general solution", are you referring to taking spaces into account when sorting? In general sorting, probably not. I created White Ball along with alt form Whiteball earlier; the latter won't be sorted. But White Ball is sorted before Whitechapel, which is fine, and Whitechapel is followed by White City, as can be expected. However special cases, such as here, can occur. We have to get our heads around those somehow. DonnanZ (talk) 14:51, 20 August 2024 (UTC)
@Theknightwho: I just found some odd sorting for "the" though, The Charltons, Theddingworth, Theddlethorpe All Saints, Theddlethorpe St Helen, The Gorge, Themelthorpe, The Stukeleys. DonnanZ (talk) 20:14, 20 August 2024 (UTC)
@Donnanz It's the same logic used for "white" above. If you remove the spaces, you can see it: "thecharltons", "theddingworth", "theddlethorpeallsaints", "theddlethorpesthelen", "thegorge", "themelthorpe", "thestukeleys" etc. The same thing happens with "red dog", "red drum", "rede", "redeem", "red ensign", "red hat", and so on. Theknightwho (talk) 20:39, 20 August 2024 (UTC)
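The explanation above can be checked directly: with spaces removed before comparison, both lists fall into exactly the order reported in the thread. This is a minimal sketch that assumes the key is "case-fold, then drop spaces". That is an approximation, and the real sort-key code may strip or normalise other characters as well.

```python
def ignoring_spaces(term: str) -> str:
    # "The Gorge" is compared as "thegorge", "red ensign" as "redensign".
    return term.casefold().replace(" ", "")


places = ["The Stukeleys", "Themelthorpe", "The Gorge", "Theddingworth",
          "Theddlethorpe St Helen", "The Charltons", "Theddlethorpe All Saints"]
reds = ["red hat", "redeem", "red dog", "rede", "red ensign", "red drum"]

print(sorted(places, key=ignoring_spaces))
print(sorted(reds, key=ignoring_spaces))
```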
Yeah, it's odd-looking, rather than wrong sorting. I see Wikipedia does the same (List of United Kingdom locations: The-Thh). We have to live with it, though "Charltons, The" has occurred to me. OK. DonnanZ (talk) 20:55, 20 August 2024 (UTC)
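"Charltons, The" corresponds to the familiar filing convention of moving a leading article to the end. The sketch below is hypothetical and is not something Wiktionary currently does. It applies the inversion only to a standalone leading "The", so "Theddingworth" is untouched, and otherwise keeps the space-dropping key discussed above.

```python
import re

# A standalone leading "The" followed by whitespace; "Theddingworth" does not match.
LEADING_THE = re.compile(r"^(the)\s+(.+)$", re.IGNORECASE)


def filing_key(title: str) -> str:
    match = LEADING_THE.match(title)
    if match:
        # "The Charltons" -> "Charltons, The"
        title = f"{match.group(2)}, {match.group(1)}"
    return title.casefold().replace(" ", "")


places = ["The Stukeleys", "Themelthorpe", "The Gorge", "Theddingworth",
          "Theddlethorpe St Helen", "The Charltons", "Theddlethorpe All Saints"]

print(sorted(places, key=filing_key))
```

The "The ..." places then sort among the C, G and S names in the full category, rather than being interleaved with "Thedd..." and "Them...".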
@Donnanz The problem is that the solution would require us to add spaces back in to compounds (e.g. "Whiteball" → "White Ball"), since it would need to know "redeye" should be "red eye", but "redye" should not be "red ye", so it would be a monumental effort to do it all manually. Theknightwho (talk) 21:58, 20 August 2024 (UTC)
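A toy word-segmentation routine shows why the spaces cannot simply be put back automatically. Given only a word list, both "redeye" and "redye" have plausible-looking splits, and nothing in the string says which split is right. The vocabulary below is a tiny hypothetical sample, not Wiktionary data.

```python
from functools import lru_cache
from typing import List, Set


def segmentations(word: str, vocabulary: Set[str]) -> List[str]:
    """Return every way of splitting `word` into items of `vocabulary`."""

    @lru_cache(maxsize=None)
    def split_from(start: int) -> List[List[str]]:
        if start == len(word):
            return [[]]  # one way to split the empty remainder: no pieces
        results = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in vocabulary:
                for rest in split_from(end):
                    results.append([piece] + rest)
        return results

    return [" ".join(parts) for parts in split_from(0)]


# Tiny hypothetical vocabulary, just enough to show the ambiguity.
vocab = {"red", "eye", "redeye", "re", "dye", "ye", "redye"}

print(segmentations("redeye", vocab))  # ['red eye', 'redeye']
print(segmentations("redye", vocab))   # ['re dye', 'red ye', 'redye']
```

A bot could at best propose candidates like these. Choosing between "red ye" and "re dye" still needs a human, which is the manual effort described above.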
OK. I see there is a Derived terms section for the, but I don't bother with cataloguing any. There must be thousands of entries with that word. It occurs in a surprising number of place names; the road on two sides of Twickenham Green is named simply as "The Green". DonnanZ (talk) 22:45, 20 August 2024 (UTC)
I did find a template which allows manual sorting, which is useful with Saint and St. It has limited applications, I think, but it does make a nonsense out of the stance taken by User:Theknightwho. DonnanZ (talk) 17:28, 8 September 2024 (UTC)

Coming soon: A new sub-referencing feature – try it!


Hello. For many years, community members have requested an easy way to re-use references with different details. Now, a MediaWiki solution is coming: The new sub-referencing feature will work for wikitext and Visual Editor and will enhance the existing reference system. You can continue to use different ways of referencing, but you will probably encounter sub-references in articles written by other users. More information on the project page.

We want your feedback to make sure this feature works well for you:

Wikimedia Deutschland’s Technical Wishes team is planning to bring this feature to Wikimedia wikis later this year. We will reach out to creators/maintainers of tools and templates related to references beforehand.

Please help us spread the message. --Johannes Richter (WMDE) (talk) 10:36, 19 August 2024 (UTC)


AWB whitelist request


Request to be added to the AWB whitelist to clean up [[Category:wikipedia with redundant first parameter]] -saph668 (usertalkcontribs) 13:48, 19 August 2024 (UTC)


What do y'all think about placement of possibly related terms? My thinking is to put them under See also and mark them (via qualifier) as possibly related (or probably related, when the likelihood is high); this notion reserves the Related terms section for only terms known with very high certainty to be related. But I will put them under Related terms (with the same qualifier) if most people prefer that. No big deal, as it doesn't come up very often. Just bouncing it off the beer parlour wall. If a consensus arises (or if one already did, in some talk namespace or other), it could be notated at Wiktionary:Related terms. Thanks all. Quercus solaris (talk) 01:22, 21 August 2024 (UTC)

@Quercus solaris I do it this way too, reserving the "Related terms" section only for words that are 100% related, otherwise it basically spreads the false information that two words "are" related rather than just maybe being related. The qualifier idea is good though. Kiril kovachev (talkcontribs) 16:54, 21 August 2024 (UTC)
My personal preference would be to have them under "Related terms" and note the uncertainty in a qualifier. But I don't really add related terms, so I'd rather leave the decision to those who do. Andrew Sheedy (talk) 18:54, 21 August 2024 (UTC)

Removing information from Azerbaijani articles written in Abjad alphabet


Hello,

  1. I added many Azerbaijani words written in the Azerbaijani Abjad (Perso-Arabic alphabet) to Wiktionary. However, at some point, @Əkrəm Cəfər wrote to me on the page and asked if these words were Ottoman Turkish. I proved that they were not, since I used a lot of literature and dictionaries of the Azerbaijani language. Then I noticed that he had simply deleted the information from these pages and given a link to the Latin versions of these words. I was offended by this, since most Azerbaijanis write in the Abjad script. This person is a citizen of the Azerbaijani Republic, which promotes and uses the Latin alphabet (I am not against this and am even in favour of it, but the Abjad alphabet is also part of this language). However, Azerbaijanis are an indigenous people not only in the Azerbaijani Republic but also (if we take into account the subethnic groups) in Georgia, Iran, Iraq, Russia and Afghanistan. Azerbaijanis from these countries did not accept the Latin alphabet as the main script; for example, in Russia the official script for Azerbaijani is Cyrillic. But that's not the point. The point is that they revert all my edits on the page. This raises the next question: why do templates like Template:az-arabic-noun for the Abjad alphabet exist if some users delete them and link to the Latin version instead?
  2. The second question is related to the article müvəllidülma. @Fenakhay just deleted the word written in Abjad and renamed the page to müvəllidülma. I had also written that the word is formed from one Arabic root, one Arabic affix and one Persian word, but he replaced the etymology with Ottoman Turkish. (Given that this Ottoman word is not in Wiktionary, this is not an argument, but still.) Maybe he has some evidence? Or did he do this because he considers the Ottoman language more "prestigious" than Azerbaijani?
  3. The third question is related to the constant rollbacks of information from articles written in the Abjad alphabet. I constantly encounter the objection that "this word does not exist in modern Azerbaijani". This is because the ancestor of the Azerbaijani language is not defined in Wiktionary, or rather it is defined as Old Anatolian Turkish, which is too ancient an ancestor. For comparison, for Turkish the ancestor is given as Ottoman Turkish and then Old Anatolian Turkish, which is logical. But it turns out that modern Azerbaijani has no ancestor for the period from the 15th to the beginning of the 20th century (according to various sources, modern Azerbaijani may begin in 1922-1923, when the USSR occupied Azerbaijan, or in 1928-1939, when the USSR switched the Azerbaijani language to the Latin alphabet). However, historically, the ancestor of Azerbaijani was considered to be Ajami Turkish ("Turkish of Persia"). It was the language of the Qajars, Afshars, Qizilbash, etc., and it is also the ancestor of Qashqayi and possibly of Khalaji, etc. It is known under different names, but this is the most common, since most often it was simply called Turkî (Turkish, Turc, Turk). I could write Azerbaijani articles in the Abjad alphabet under this language so as not to encounter restrictions, but as I understand it, that is not possible at the moment.

Please help me with this issue, since I have a lot of literature and I want to create pages indicating these words, but I encounter restrictions from other users. Sebirkhan (talk) 12:19, 22 August 2024 (UTC)

well, well, well… since i'm mentioned here, and i feel that i'm being blamed in a disastrously manipulative way, i, with all my humility, consider myself righteous to say a few words in order to defend myself. jimmy mcgill ahh intro
first of all, i'd like to talk about my promotion of the supremacy of the almighty and most glorious of all writing systems. as seen here and in the discussion on their talk page, the user does not hesitate to appeal to manipulative fallacies, such as the classic ones: fabricating an enemy, blaming others, and self-victimization, as seen here, here, and here: …he considers the Ottoman language more "prestigious" than Azerbaijani?. even the simple act of creating a discussion on this page is a very nice example of that, since they just couldn't manage to provide an argument, tried to end the convo asap on their user talk page, and just want to continue agitating.
azerbaijani has been contributed to wiktionary for years, in all 3 writing systems: latin, cyrillic and arabic (aka perso-arabic or abjad), and nobody denies nor asks for the opposite of that. however, handling three scripts for a language can never be an easy task, since there are quite a few options for how to approach this situation. so far, the widely applied solution is keeping the main entries in latin-script pages, providing an {{az-variant}} to make the spellings in other scripts easier to access, and using the {{spelling of}} template with the adequate script code, which links to the main entry, spelled in latin. i have already provided arguments about why this makes sense and why we should continue doing things this way (in the reply to this message). i humbly think and believe that north azeri (the one written in latin script) is the only one that's regulated, and the only one that's recognized as an official language. south azerbaijani, on the other hand, (written in arabic script) is not regulated and doesn't have a widely accepted standard orthography. in addition, in my view, latin-script azerbaijani is more accessible for people on the internet (and in real life in general, in the world outside iran), is better documented, and has overwhelmingly better online support than the arabic-script one. apple not having an arabic-script keyboard for azerbaijani is just one simple instance of this. people would immediately think of north azerbaijani if we don't explicitly mention "southern", even though the macrolanguage includes both. these, i believe, could be the reasons why the latin script was selected for the main azerbaijani entries. we've been doing this for years, and it is the de facto solution for creating azerbaijani entries on wikt. this is the reason why i cleaned the entry in the first place. unfortunately, this caused an edit war, which we're trying to solve.
by the way, they also suggested having duplicate entries for each script, which i object to, as it'd be inconsistent and hard to maintain.
and about that ottoman thing… the orthography of the azerbaijani language was quite similar to that of the ottoman turkish language until both languages switched to the latin script at approximately the same time (the end of the 1920s). and that dated orthography differs significantly from the current persian-style orthography and the underrated varliq standard. that's why i thought it could be an ottoman turkish word, accidentally input as azerbaijani. then they provided some dictionaries that are older than my grandma, and that's why i suggested marking them as {{lb|az|dated}}. we just forgor this due to this silly thing we've been kept busy with, such a tragicomedy
in conclusion, i don't and cannot ever have any objections to azerbaijani being written using different scripts, despite their attempts to make it look as if i do. i am just a man of keeping things tidy and clean, appropriately formatted. that's all. it's such a shame that i've wasted more than 2 hours writing this. i just don't understand why they keep insisting on their nationalist views while not being able to provide reasonable arguments. i remember that i overcame this when i was 14, like a year ago or smth. it shouldn't be that hard and we shouldn’t be struggling with such shenanigans.
p.s. i see no problem with discussing the legitimacy of the status quo approach to azerbaijani entries, but i'd prefer reasonable arguments, instead of that our armenian friend will be grateful bs.
p.s. 2. i feel like there could be better reasons for why the latin script is (and should be) perceived as the main one, so, if you have any ideas, or you think that i'm wrong, i'd be thankful if you just threw them below. tia :D əkrəm. 14:47, 22 August 2024 (UTC)
You are confusing the concepts again, I can't speak or write in a language called "South Azerbaijani". My grandfather was from the Ganjabasar region, which now is the east of the modern Azerbaijan Republic. I have never spoken to a person who speaks South Azerbaijani and have never read a book written in this language. How does the alphabet relate to this or that dialect? The problem is that you are deleting information from pages where the word is written in the Azerbaijani Abjad alphabet. I am talking about information that is missing from the page of the word written in the Latin alphabet. Why do you use the template you mentioned in relation to the Abjad alphabet with a link to the Latin alphabet, and not vice versa? Sebirkhan (talk) 15:14, 22 August 2024 (UTC)
well, the term south azerbaijani is just an alias i used to indicate the modern azerbaijani language, if that helps. if you're NOT talking about the modern language, well… obviously, {{lb|az|<dated|archaic|obsolete>}} is, in my humblest-to-god opinion, our only choice.
about that deletion, thing… JUST GO AND FUCKING READ WHAT THE FUCK I TOLD YOU, OKAY? umm, wait, actually, this is not okay. i wouldn't want to use such a language. all right, what about this: i kindly ask you to read my arguments and not to act like they never happened. please. i'm not going to write the same thing twice, just waitin' till you open your eyes and start seeing things you've never seen before.
oh and btw please use an autocorrection tool like grammarly or smth before you post your reply here, tia. əkrəm. 21:49, 22 August 2024 (UTC)
@Sebirkhan You didn't prove anything: all of the dictionaries you provided are from the early 1900s and late 1800s, despite the fact that there was an Azerbaijani orthography reform in the 1980s. None of the dictionaries you provided prove that those spellings are still recognized in the modern Iranian-Azerbaijani alphabet.

Secondly, you need to calm down and assume good faith; it is absolutely disrespectful to accuse people you disagree with of having evil motives. For the record, Akram and Fenakhay were simply enforcing a long-standing Wiktionary policy to consolidate all language information in one place. If an entry is repeated in multiple places, then when one version of the entry is updated, all the others become outdated. In those instances, it can be years before someone notices one entry is outdated. — BABRtalk 15:27, 22 August 2024 (UTC)
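The maintenance argument in the comment above can be shown with a small model. It uses Azerbaijani kitab "book" (Cyrillic китаб, Perso-Arabic کتاب) as a sample word. The dictionaries are a hypothetical stand-in for pages, and the pointer mapping only loosely mirrors {{spelling of}}; this is not how MediaWiki actually stores anything.

```python
# Approach 1 (hypothetical): duplicate the full entry under every script.
duplicated = {
    "kitab": ["book"],
    "китаб": ["book"],
    "کتاب": ["book"],
}

# Approach 2: one main entry; other scripts only point to it.
main_entries = {"kitab": ["book"]}
spelling_of = {"китаб": "kitab", "کتاب": "kitab"}


def lookup(spelling):
    """Follow a {{spelling of}}-style pointer to the main entry, if any."""
    return main_entries[spelling_of.get(spelling, spelling)]


# An editor adds a (placeholder) sense, touching only the Latin-script page.
duplicated["kitab"].append("NEW SENSE")
main_entries["kitab"].append("NEW SENSE")

print(duplicated["کتاب"])  # ['book']: this copy is now out of date
print(lookup("کتاب"))      # ['book', 'NEW SENSE']
```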
Sorry, but I think you need to read my entire text to understand that the problem is a little broader than you thought. I cannot use Abjad because they delete it. The problem is that the ancestor of the Turkish language is given as Ottoman Turkish, which was used until the 1920s. This completely solves the problem in the case of the Turkish language. At the same time, there is no solution to this problem for the Azerbaijani language: the ancestor of the Azerbaijani language is given in Wiktionary as Old Anatolian Turkish, which was used until the 14th century at the latest (and which is also the ancestor of Ottoman Turkish, Ajami Turkish and Turcomani). Where is the ancestor of the Azerbaijani language in the period from the 15th century to the 1920s? Where can I write these words so that Latin-script editors do not delete information from the article and replace it with a template referring to the Latin spelling? Sebirkhan (talk) 15:38, 22 August 2024 (UTC)
Please, someone, create the language category for Ajami Turkish (https://www.wikidata.org/wiki/Q110812703) so that it can be set as the ancestor of the Azerbaijani language. It would look like this: the Azerbaijani language comes from Ajami Turkish, which comes from Old Anatolian Turkish.
I do not know how to do it in Wiktionary. Sebirkhan (talk) 17:08, 23 August 2024 (UTC)
I'm surprised this hasn't been added yet. Nicodene (talk) 20:44, 23 August 2024 (UTC)
The Wikipedia article Q110812703 was authored from late 2022, in the whole lot of languages. Guess Azerbaijanis organize well and their DMN made new projects during the pandemic. But they have missed what we have done all the time on English Wiktionary. I added Classical Azerbaijani two years earlier. Nobody outside global Azerbaijan could have expected such a preference for another term, even Azerbaijani Turkologists in the West, which seem content with this label. Unlike Classical Persian it has not been added as an ancestor of Azerbaijani for no discernible reason later, after our language data modules have been rewritten and reorganized multiple times and only acquired the option of us setting an L2 language to have an ancestor in an etymology-only variety of itself.
Either way editors should be aware what they do here. They compartmentalize Southern Azerbaijani incorrectly if the target audience of this dictionary necessarily reads Latin characters. It matters less then what speakers mostly use. Anyone seeking out en.wiktionary.org can get through Arabic Azerbaijani forms redirecting to Latin spellings, there is little grounds for animosity. Fay Freak (talk) 04:48, 24 August 2024 (UTC)
Hello, so can you please add Ajem-Turkic (aka Ajami Turkish, Ajami Turkic) as an ancestor? As for the term Classical Azerbaijani (which is listed as a variety of modern Azerbaijani in Wiktionary), I will try to explain why it is not entirely appropriate (as is, in general, the use of the word "Azerbaijani" for the ancestor of the Azerbaijani language). Ajem-Turkic is the ancestor of several languages, but whether these are separate languages or dialects is a question on which there is no clear consensus; I mean Qashqai, Afshari, Iraqi Turcoman, Sonqori and Qizilbash. In the book The Turkic varieties of Iran, Christiane Bulut says (page 406) that the written language for these varieties was Ajem-Turkic from the 16th century onwards. It is a good term. It is also a good term to use because it does not require each of these languages to have an ancestor called "Classical Qashqai" (or Old Qashqai), "Classical Sonqori", etc., especially considering that their vocabularies are identical to each other with a few exceptions. It is obvious that they all descended from one ancestor; the only question now is what to call this ancestor. The above-mentioned languages/dialects have no relation to the region called Azerbaijan, and their speakers have never called their language Azerbaijani. 178.46.58.85 11:49, 24 August 2024 (UTC)
m["trk-ajm"] = {
    "Ajami Turkish",  -- canonical name
    110812703,        -- Wikidata item (Q110812703)
    "trk-ogz",        -- family (Oghuz)
    "fa-Arab",        -- script
    ancestors = "trk-oat",  -- Old Anatolian Turkish
    entry_name = {["fa-Arab"] = "ar-entryname"},
}
192.71.227.211 15:20, 24 August 2024 (UTC)
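An ancestors field like the one proposed above creates a chain that etymology templates can walk. This is a minimal Python sketch over a hypothetical three-language subset. It assumes Azerbaijani ("az") were given "trk-ajm" as its ancestor, as requested in this thread. Real data modules have many more fields, and the ancestors of Old Anatolian Turkish itself are deliberately left out here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical subset: code -> (canonical name, ancestor code or None).
LANGUAGES: Dict[str, Tuple[str, Optional[str]]] = {
    "trk-oat": ("Old Anatolian Turkish", None),  # its own ancestors omitted
    "trk-ajm": ("Ajami Turkish", "trk-oat"),
    "az": ("Azerbaijani", "trk-ajm"),  # the change requested in this thread
}


def ancestor_chain(code: str) -> List[str]:
    """Follow ancestor links from `code`, guarding against cycles."""
    names: List[str] = []
    seen = set()
    current: Optional[str] = code
    while current is not None:
        if current in seen:
            raise ValueError(f"ancestor cycle at {current!r}")
        seen.add(current)
        name, current = LANGUAGES[current]
        names.append(name)
    return names


print(" < ".join(ancestor_chain("az")))
# Azerbaijani < Ajami Turkish < Old Anatolian Turkish
```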

Sign up for the language community meeting on August 30th, 15:00 UTC


Hi all,

The next language community meeting is scheduled in a few weeks—on August 30th at 15:00 UTC. If you're interested in joining, you can sign up on this wiki page.

This participant-driven meeting will focus on sharing language-specific updates related to various projects, discussing technical issues related to language wikis, and working together to find possible solutions. For example, in the last meeting, topics included the Language Converter, the state of language research, updates on the Incubator conversations, and technical challenges around external links not working with special characters on Bengali sites.

Do you have any ideas for topics to share technical updates or discuss challenges? Please add agenda items to the document here and reach out to ssethi(__AT__)wikimedia.org. We look forward to your participation!

MediaWiki message delivery (talk) 23:20, 22 August 2024 (UTC)

User:ColumbaBushBot


Hi everyone - I recently started a vote, Wiktionary:Votes/bt-2024-08/User:ColumbaBushBot_for_bot_status, for bulk-renaming Assyrian Neo-Aramaic inflection templates, i.e. Category:Assyrian_Neo-Aramaic_inflection-table_templates.

Here's some examples of changes it could be used for

Anyhoo - I invite everyone in the community to discuss and share their thoughts. ColumbaBush (talk) 07:03, 23 August 2024 (UTC)

Hi everyone - voting ends on Sept 3rd, so if you haven't yet had the chance to vote, your participation would be appreciated. ColumbaBush (talk) 04:34, 2 September 2024 (UTC)

Blocked 1 week


Hello, I would like to know why my (@Sebirkhan) account was blocked. It says "Re-adding previously deleted entries", but I just tried to create a new page: Ajami Turkish 178.46.58.85 20:11, 23 August 2024 (UTC)

@178.46.58.85 JSYK the proper way to request an unblock is to use {{unblock}} on your talk page. — BABRtalk 20:36, 23 August 2024 (UTC)
thank you 194.87.107.107 20:59, 23 August 2024 (UTC)
You created Ajami Turkish and then Fenakhay moved it to Ajami Turkic 35 minutes later. Then an hour later you created Ajami Turkish again, Fenakhay deleted it, and a few minutes later you created it again, and he deleted it again. I think Fenakhay moved it because in English Turkish refers to the Oghuz language of Turkey, and Turkic refers to other related languages that are in the Turkic family. — Eru·tuon 20:49, 23 August 2024 (UTC)
But why does the Turkish page say that Turkish is a synonym of Turkic? As an Azerbaijani Turk, I can say that the word Turkic is used for things like Turkic runes, Turkic tribes and other ancient things (it is also common to all Turkic nations), but in the case of Azerbaijani we use "Turkish", for example Azeri Turkish, Azerbaijani Turkish (see: w:Azerbaijani_language). So it is not an incorrect word but a synonym, and I was blocked because I used a synonym? 194.87.107.107 20:58, 23 August 2024 (UTC)
Ajami Turkish is not attested in English, but only your protologism. In Turkic linguistics, the variety is known as Ajem-Turkic or Ajami Turkic, which is why I moved it to the latter form. Continuing to recreate a protologism three times is disruptive and block-worthy no matter what. — Fenakhay (حيطي · مساهماتي) 21:01, 23 August 2024 (UTC)
as I know "Ajami Turkic" is protologolism of H. Boeschoten, i did not find any other sources that would not refer to him. Can you share links to the term you mentioned above from at least three different authors so that we know for sure that this term is preferable?
Also Ajami Turkish is tranlation if original word "Turkî Ajami" 178.46.58.85 21:13, 23 August 2024 (UTC)Reply

New Wiktionary logo: request for feedback (replacing 维 with 維)

Hi, I've previously proposed making this change from 维 to 維 and received positive support and suggestions. User:Cypp0847 has now kindly created an SVG, which I would like to seek your opinion on. Here is the original (current version) and here is the newly created one.

Proposed new Wiktionary logo

I think something like this looks good, if anyone has any thoughts on, for example, font selection, stroke thickness, or anything else I'd love to hear it. I was thinking a thinner font might look clearer when displayed in small size, but would like to get y'all's thoughts on it. It might also be better to have this as a separate file on Commons in the meantime. In any case looking forward to seeing this come to fruition! Thanks, ChromeGames (talk) 23:55, 23 August 2024 (UTC)

sounds good to me — nd381 (talk) 04:12, 24 August 2024 (UTC)
sounds boring to me Zebres rouges (talk) 08:39, 24 August 2024 (UTC)
Thanks for tagging me in this thread. I was able to find two fonts that can display 維, but they are either a bit heavier (the one now shown) or much thinner. I'd be grateful if anyone could propose a better font so that I can improve on it. Thanks. Cypp0847 (talk) 12:40, 24 August 2024 (UTC)
If you can find a version with thinner strokes, that would be great, but I'm happy enough with it either way. Andrew Sheedy (talk) 15:24, 24 August 2024 (UTC)
I like how the proposed logo looks. Nicodene (talk) 01:05, 27 August 2024 (UTC)
looks nice to me Chihunglu83 (talk) 10:14, 29 August 2024 (UTC)
I’m not sure why the change is needed or desirable. The rationale should be explained. Also, the current font for the new character is too heavy—one with thinner strokes would be better. — Sgconlaw (talk) 11:49, 29 August 2024 (UTC)
维 is not part of Japanese, Korean, or Vietnamese as far as I am aware, though I have not studied those languages closely. If I am right about that, then 维 is China (PRC)-specific. 維, on the other hand, is a character used in Cantonese, Hokkien, Japanese, Korean, other local languages in China, languages in Taiwan (ROC), and Vietnam, and is acknowledged in the China (PRC) official standards and dictionaries for Mandarin. It is rooted in 2,000 years of kaishu. So, from this kind of superficial view, I would say yes, it seems like this is desirable, to show inclusiveness toward all cultures and traditions that use this character. (In the alternative, I would propose an ancient form of the character; the logo seems mildly presentist to my eye.) --Geographyinitiative (talk) 19:03, 31 August 2024 (UTC) (Modified)
Yes, 维 is specific to the People's Republic and thus to Mandarin, for all intents. However, 维 is evidently in the logo specifically to represent Mandarin, and as an abbreviation of 维基词典, which is Wiktionary's Mandarin name.
Note that Wiktionary is not called 維基-anything in Hokkien, Vietnamese, etc. The 维 in the logo is fundamentally Mandarin, and fundamentally unrelated to most of the other languages of the Sinosphere. (talk) 07:12, 2 September 2024 (UTC)
Seconding Sgconlaw’s concerns. The font weight is definitely too heavy on the character in the proposed logo. And I, too, don’t see what the change accomplishes; the glyphs in the logo are obviously intended as a representative sample of writing systems, not supposed to cover all writing systems exhaustively. (In any case, if the latter were the case, much more radical changes would be needed.) — Vorziblix (talk · contribs) 12:50, 3 September 2024 (UTC)
@Sgconlaw, Vorziblix, see the discussion linked in the first sentence of OP's original post. This was already discussed fairly extensively and the rationale was explained more clearly there. There was fairly strong consensus for the change in that discussion. Andrew Sheedy (talk) 17:55, 3 September 2024 (UTC)
@Andrew Sheedy: Thanks. It does indeed look like there was a good consensus in favor, so I’ll defer to the rest of the community on switching to the traditional character. I still strongly think we need a lighter font weight, though. — Vorziblix (talk · contribs) 18:05, 3 September 2024 (UTC)
@Andrew Sheedy: thanks for highlighting the earlier discussion. Well, I have no objection if the change is simply to replace the word in simplified script with one in traditional script. As mentioned, a less-heavy font should be used. — Sgconlaw (talk) 22:02, 3 September 2024 (UTC)

Unverified sequences of quotation marks

We have multiple articles for combinations of opening and closing quotation marks, most saying only that they "enclose a quotation in some languages" but give no examples. In our navigation template {{quotation marks}}, ‟ ” and ‛ ’ were listed only for Italian. However, when I went to the Italian wiki and checked their sources (e.g. this source), I found that the more common sequences “ ”, ‘ ’ were said to be used. Thus we have no attested languages for ‟ ” or ‛ ’, and I tagged them for verification and removed them from the navigation template. If they can be verified to be used somewhere, it would be nice to have an example or two. kwami (talk) 01:16, 25 August 2024 (UTC)

They supposedly have been used in Greek for second-level quotations: Nick Nicholas's website (2005-04-23) cites Haralambous §1.6.1, who says "Also interesting is the case of the second level quotes. Here, quotes of the size and shape of the English ones are used, but the opening quotes are inverted, similar in form to raised small round guillemets : ‟εισαγωγικά”. Fortunately these quotes are provided by the Unicode standard (U+201F and U+201D, the latter being the same closing double quotes as in English) ; the author knows no other language in which this combination of double quotes might be used."--Urszag (talk) 02:41, 25 August 2024 (UTC)
Thanks! I'll restore the link but move it to Greek.
But that looks like it's only the double quotes. I've left the tag for the single ones. kwami (talk) 02:54, 25 August 2024 (UTC)
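Because ‟ and “ (or ‛ and ‘) are nearly indistinguishable in many fonts, it can help to confirm exactly which code points an entry title or a quotation actually contains before tagging or verifying it. A minimal Python sketch using the standard `unicodedata` module; the `PAIRS` labels merely restate the discussion above and `describe` is a hypothetical helper.

```python
import unicodedata

# Quotation-mark pairs from the discussion above, as (opening, closing).
PAIRS = {
    "Greek second level (per Haralambous)": ("\u201f", "\u201d"),
    "single pair, still unverified": ("\u201b", "\u2019"),
    "common English-style double": ("\u201c", "\u201d"),
}


def describe(pair):
    """Return 'U+XXXX NAME' for each character in a quotation-mark pair."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in pair]


for label, pair in PAIRS.items():
    print(label, "".join(pair), describe(pair))
```

This makes it easy to see that the Greek pair differs from the English one only in its opening mark (U+201F instead of U+201C), as the Haralambous quote says.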

When a contraction re-separates...

While creating "betch" I found several examples of betchya getting split back into betch ya (or betch'ya), but, like, how the crap do I format an entry for this verbal "betch"? It's certainly not a contraction, but is it really a verb form? (Are there still headword templates for verb forms in English, BTW? I have not been editing much for well over a decade...) Circeus (talk) 06:59, 25 August 2024 (UTC)

@Circeus Maybe creating an entry for betch ya (as alt form of betchya) and having {{only used in|en|betch ya}} at betch? Einstein2 (talk) 21:56, 25 August 2024 (UTC)

Long etymology finder.

Akaibu made a tool that lists etymologies that have over 1,000 bytes. Link: https://public-paws.wmcloud.org/63256795/over_1000_etymologies.txt CitationsFreak (talk) 21:33, 25 August 2024 (UTC)
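The tool's own code isn't shown here, so this is only a guess at the kind of check involved: a Python sketch assuming "bytes" means the UTF-8 size of the etymology text (the unit MediaWiki uses for page sizes) and that etymology sections have already been extracted into a dict of title to text. `long_etymologies` and the sample data are hypothetical.

```python
def long_etymologies(etymologies, limit=1000):
    """Return titles whose etymology text exceeds `limit` bytes when
    encoded as UTF-8, longest first."""
    sizes = {title: len(text.encode("utf-8")) for title, text in etymologies.items()}
    over = [title for title, size in sizes.items() if size > limit]
    return sorted(over, key=lambda title: sizes[title], reverse=True)


# Hypothetical sample data: 600 Latin letters are 600 bytes,
# but 600 Arabic letters (2 bytes each in UTF-8) are 1,200 bytes.
sample = {
    "latin": "a" * 600,
    "arabic": "\u0628" * 600,
    "huge": "b" * 1500,
}
print(long_etymologies(sample))
```

One consequence of counting bytes rather than characters: Cyrillic, Arabic and most other non-Latin letters take two or more bytes each in UTF-8, so etymologies quoting those scripts reach the 1,000-byte threshold with fewer characters than Latin-script ones.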

Changing the look of the transliteration in quotation templates

Hello. I was playing around with quotation templates (I work on several languages that use Cyrillic as their native script), and I noticed the way the transliteration is displayed there. Currently it looks like this:

  • 2021 January 13, “Э̄лы ма̄т ле̄ккарыт рӯпитан ма̄гыс э̄рнэ ма̄шинат ёвтве̄сыт”, in Лӯима̄ сэ̄рипос, volume 1235, number 1, page 6:
    Э̄лы ма̄т ле̄ккарыт рӯпитан ма̄гыс э̄рнэ ма̄шинат ёвтве̄сыт.
    È̄ly māt lēkkaryt rūpitan māgys è̄rnè māšinat ëwtwēsyt.
    Useful machines were bought for doctors working on far away land.

There are several things I think it would be useful to change:

  1. Aligning the transliteration with the given text (currently it is aligned with the translation, which is quite confusing)
  2. Making the transliteration smaller and gray
  3. Moving the transliteration above the given text
Here is how it looks in my head.

I noticed the {{zh-x}} template does essentially what I mentioned above, so I think it wouldn't be terribly out of place, since this stylistic decision already exists on Wiktionary (and honestly looks much neater):

  1. few; little in number; less; not many
    很少人知道實情 [MSC, trad.]
    很少人知道实情 [MSC, simp.]
    Hěn shǎo rén zhīdào shíqíng. [Pinyin]
    Very few people know the truth.
    只有很少麵包和餅乾 [MSC, trad.]
    只有很少面包和饼干 [MSC, simp.]
    Zhǐyǒu hěn shǎo miànbāo hé bǐnggān. [Pinyin]
    There is only very little bread and biscuits.

Any thoughts?

--Kaarkemhveel (talk) 18:03, 26 August 2024 (UTC)

I do like the idea of the transliteration being grey and possibly smaller, but I would prefer it below the original text - that way it is clear that the quote is in Cyrillic and transliterated in Latin, and not originally in Latin and normalised to Cyrillic. Thadh (talk) 18:08, 26 August 2024 (UTC)
I do think we should consider the official dark mode that is slowly being rolled out. The current color is similar to the background color of the dark mode, which seems undesirable as it can make the text difficult to read. Also I don't agree with moving the transliteration above the original text, as it implies the original text is in Latin. The transliterations are supposed to aid in reading the original script, not replace them. — BABRtalk 18:21, 26 August 2024 (UTC)
Based on some testing, this color seems to be easy to read in both dark and light mode. This is the color used by Arabic conjugation templates. There might be a better color; this is just the one I found that worked. The color used by timestamps is also good in both modes, but I don't know the exact hex code for that color. — BABRtalk 18:26, 26 August 2024 (UTC)
In my original post, I described my experience with the light mode, and in my view the issue is the alignment and lack of contrast: the current look of quotation templates is confusing, since it's not clear right away where the native script is. The original text should be the main focus, and the transliteration should look complementary and less distracting. One option is to visually separate it from the original text by making the transliteration gray and/or smaller, and possibly adding a slightly larger gap between the two. Kaarkemhveel (talk) 18:34, 26 August 2024 (UTC)
@Babr: This isn't relevant, because naturally the text colour will be changed in dark mode if necessary. Ioaxxere (talk) 02:21, 27 August 2024 (UTC)
I also like the idea, we can possibly make it so only for the light mode (and think later when the dark mode is rolled out and understood fully), but as Thadh said, I would prefer [the transliteration] below the original text - that way it is clear that the quote is in Cyrillic and transliterated in Latin, and not originally in Latin and normalised to Cyrillic. Svartava (talk) 18:25, 26 August 2024 (UTC)
I also think it makes more sense for the transliteration to be aligned with the original text rather than the translation. Normalizations could be like this, too. Trooper57 (talk) 18:30, 26 August 2024 (UTC)
@Kaarkemhveel: I'm confused by what you mean with the Chinese example. {{zh-x}} does the exact same thing as the Cyrillic example, where the transliteration is aligned with the translation. The only difference is that the translit font color is slightly different. AG202 (talk) 18:54, 26 August 2024 (UTC)
Yes, I didn't even notice that, sorry! I think because it's perfectly readable, even though the alignment is the same as in the current Q template. Hieroglyphs look distinct partially because they are blue (because of the links) and partially because they are slightly bigger and differ greatly from the Latin text (which I don't think would work with Cyrillic-Latin situation). But the transliteration is gray, nevertheless. Again, sorry for this contradiction, it's totally my fault! Kaarkemhveel (talk) 19:02, 26 August 2024 (UTC)
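Whether a gray transliteration stays readable in both modes can be checked numerically with the WCAG 2.x contrast ratio, which is defined from the relative luminance of the two colours. A Python sketch of that formula; the colour BABR linked is not reproduced here, so `#767676` is just an example gray, and `DARK_BG` is a hypothetical placeholder, not necessarily the background Wiktionary's dark mode actually uses.

```python
def _linear(channel_8bit):
    """Linearize one 8-bit sRGB channel (WCAG 2.x relative-luminance definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a colour given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg, bg):
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Placeholder backgrounds: plain white, and a hypothetical near-black dark mode.
LIGHT_BG = "#ffffff"
DARK_BG = "#101418"


def ratios_in_both_modes(fg):
    """Contrast of `fg` against the light and the placeholder dark background."""
    return contrast_ratio(fg, LIGHT_BG), contrast_ratio(fg, DARK_BG)


light, dark = ratios_in_both_modes("#767676")
print(f"light mode {light:.2f}:1, dark mode {dark:.2f}:1")
```

WCAG's AA level asks for at least 4.5:1 for normal-size text. With these particular placeholder backgrounds, no single colour reaches 4.5:1 against both at once, which supports Ioaxxere's point that the colour would simply be switched in dark mode rather than shared between modes.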

Dear Wiktionary

it is your old friend/enemy (frenemy) Equinox. I stayed away for a quarter of a year, but I have started doing nasty IP edits again.

It started out just the way that I would touch Wikipedia (like fixing a comma, tiny corrections) and then I thought "oh I have to add such-and-such a missing word", and then I risked being a damn "Wiktionarian" again. If you remember, I first arrived on your site in 2008 when I found the WT:REE page (I can't remember why, but I do remember "darkwave" [a great music genre] and "countline" [chocolate industry], and SemperBlotto repeatedly unfairly banning me for being Wonderfool, which made no sense to me whatsoever, but I understood it in retrospect, because I was British and funny).

In future you may recognise me by my excellent definitional skills, my IP address (which changes every 12 hours or so, and is usually a long IPv6 one beginning with a "2"), and my ABSOLUTELY NOT editing Spanish like Wonderfool. And by not having a user account. And not posting on talk pages and starting fights. I can't do "community". I was always best at just adding missing words.

Meanwhile in the real world I have been developing some cute software and walking in some hills. (And I'm about three quarters of the way through Tolstoy's War and Peace. Jesus, I am a voracious reader but this novel is murdering me.)

Be kind to the IP that begins with a two,

best wishes ~ E 2A00:23C5:FE1C:3701:6450:40E4:A6B9:8E69 21:56, 26 August 2024 (UTC)

If real, I felt I had seen a few IP edits that had your vibe. If not real, very WF-esque. Vininn126 (talk) 22:00, 26 August 2024 (UTC)
I have met a demoscene guy with severe social phobia who reminds me of you (this is a compliment). Same sort of personality. 2A00:23C5:FE1C:3701:6450:40E4:A6B9:8E69 06:10, 27 August 2024 (UTC)
thanks for writing back to us. i wouldn't have seen this if i didn't still have your talk page on my watchlist. i'm glad to see you still helping out, but i'm also glad you're able to keep yourself busy; there are other things you need to do. For some people social media platforms can become an addiction, and i think Wiktionary can be just the same. Soap 22:49, 26 August 2024 (UTC)
Thanks Soapy. 2A00:23C5:FE1C:3701:6450:40E4:A6B9:8E69 06:10, 27 August 2024 (UTC)
I am convinced that it is like that. It’s just not prevalent in the overall population enough to have a name as its own behavioral addiction. w:WP:Wikipediholism test is not humorous to me. People just start to understand their dependence on technology, and disregard of it, between all education. Gotta balance the benefit you draw yourself from maintaining this elaborate vocabulary sheet. I mean surely you gained one or the other competence from it for your CV. It even directly increased my grades in the first state examination when I could answer more obscure questions because I repeatedly went through a law paper for quotes. You can choose your special interest wisely. Fay Freak (talk) 12:32, 27 August 2024 (UTC)
Welcome back! I thought I would have to add all the missing OED words myself... Ioaxxere (talk) 02:21, 27 August 2024 (UTC)
You will only have to do about 96% of them. 2A00:23C5:FE1C:3701:6450:40E4:A6B9:8E69 06:10, 27 August 2024 (UTC)
I think you were splendid doing community, greatly economical in posting on talk pages and engaging in fights. But who cares about my opinion concerning community. I overanalyze it pathologically. Fay Freak (talk) 12:32, 27 August 2024 (UTC)
A belated farewell and welcome back from me as well. — Fytcha T | L | C 15:35, 27 August 2024 (UTC)
Oh, that's why you use different IPs! Tollef Salemann (talk) 13:49, 28 August 2024 (UTC)
Welcome back, mate. You will notice your old user name is much shorter than any damned IP #. What's it to be, "Equinox 2"? DonnanZ (talk) 20:23, 1 September 2024 (UTC)

Adding Proto-Romance pronunciation to Latin entries

Currently, our Proto-Romance reconstructions, such as Reconstruction:Latin/adbracchiare, feature a reconstructed pronunciation. I think it would be neat if our attested entries with inherited descendants had the same thing, since after all we know they continued to exist in the same language. I'm aware that Proto-Romance doesn't necessarily correspond to an actual historical lect, but I nonetheless think that the reconstructed pronunciation would grant readers insight as to how Latin evolved during the first few centuries AD. Naturally some kind of disclaimer could be added if necessary. @Nicodene has outlined some sound changes here: User:Nicodene/sandbox. Other reconstructed pronunciations could potentially be added as well, such as Proto-Western-Romance or Proto-Balkan-Romance. What do you guys think? (Notifying Fay Freak, Brutal Russian, JohnC5, Benwing2, Lambiam, Mnemosientje, Nicodene, Sartma, Al-Muqanna, SinaSabet28): Ioaxxere (talk) 02:21, 27 August 2024 (UTC)

We had this once, more or less--it was labeled as "Vulgar Latin". Aside from various bugs, the biggest problem was that it wasn't actually accurate for a strict definition of "Proto-Romance": it included sound changes that were never completed in some Romance languages, such as lenition of intervocalic stops (often not seen in Italian or Romanian) or lowering of short /u/ to /o/ (not seen in Romanian or Sardinian). We certainly shouldn't do that. I think displaying a pronunciation that actually represents the common ancestor of all Romance languages is of limited usefulness: there aren't a huge amount of sound changes shared 100% between all Romance, so the forms will often not be all that different from the Classical Latin form. In addition, a few changes are somewhat uncertain: e.g. when I was looking into Cj → Cʲ while working on a Wikipedia article on w:Palatalization in the Romance languages, I discovered that some languages such as Sardinian show clusters in forms such as vineam > [bind͡za]~[ˈbind͡ʒa] or corium > [ˈkorju]~[ˈkord͡zu]~[ˈkord͡ʒu]. Those could be secondary developments of earlier [nʲ] and [rʲ] respectively, but they could alternatively indicate that fusion of [nj] and [rj] into unitary palatalized consonants never occurred in these languages at all ... while stuff like this doesn't make a huge difference to the reconstructed form, it's an example of how we don't actually have complete certainty about the details of Proto-Romance phonology and phonetics. You could say the same about our knowledge of Classical Latin phonology and phonetics, of course, but I think the arguments in favor of including that are a lot stronger than the arguments for including a Proto-Romance pronunciation on Latin entries. Anyway, despite those concerns, I'm not definitely opposed to the idea.--Urszag (talk) 05:03, 27 August 2024 (UTC)
The spoken vernaculars that evolved from Latin after the fall of the Western Roman Empire began as a dialect continuum from which the various Romance languages arose. Not only did the loss of intensive Empire-wide interaction and of a centre with a prestige standard lead to a divergence in the vocabularies, but this must inevitably also have caused the local pronunciations to start diverging, with (I believe) uncoordinated sound changes. The best we can do, IMO, is to identify Proto-Romance as the last stage of Colloquial Latin, as it was spoken at the fall of the Empire. Wikipedia has an article on the phonological changes from Classical Latin to Proto-Romance. I surmise that most of these changes actually stem from before these days and reflect the phonological differences between the prestige standard (Classical Latin) and the spoken Colloquial Latin as they were already during the late days of the Empire.  --Lambiam 07:25, 27 August 2024 (UTC)

Proposal to add category "Names of sciences"

There are Category:Sciences and Category:en:Sciences here, and d:Q32855022 elsewhere. Lemmas like "anticorrelated" could stay in "Category:Sciences" (NOT a name of science) whereas lemmas like "geology" would thrive better in "Category:Names of sciences". Taylor 49 (talk) 18:37, 28 August 2024 (UTC)

I support such a distinction, but I would name it "Branches of science". Einstein2 (talk) 19:00, 28 August 2024 (UTC)
Category:Geology should still be under Category:Sciences, not under the proposed new category. "Names of sciences" possibly makes it clearer that "geology" should be there but Category:Geology should not. The Swedish lemmas "lingvistik" and "språkvetenskap" are two names of sciences, but refer to the same branch of science. Taylor 49 (talk) 20:28, 28 August 2024 (UTC)
The proposed new category would be considered a set category (i.e. containing terms for sciences and not merely terms related to sciences). Although there have been some discussions in recent months, currently, the type of a category (set category vs. related-to category) is mentioned in its description but not its name, and existing set categories don't have "Names of" in their name. I think "Branches of science" fits better in the current naming system. Einstein2 (talk) 20:50, 28 August 2024 (UTC)
Alternatively the existing Category:Sciences could be renamed to "Category:Branches of science" (much bot work needed) holding the subcategories like Category:Geology, and then "Category:Sciences" could become the set category for lemmas like "geology", "lingvistik" and "språkvetenskap". Taylor 49 (talk) 21:57, 28 August 2024 (UTC)
@Taylor 49: Well, if both of those categories existed, I would be surprised to find lingvistik etc. in Sciences instead of Branches of science. I find it better to have Branches of science (or Names of sciences as you proposed) as a set category with only terms for names of branches, while Sciences would contain every term related to sciences + subcats with terms related to specific sciences. But this is probably not a perfect solution either. See also Wiktionary:Beer parlour/2023/December#Category:en:Landforms, where similar questions are raised. Einstein2 (talk) 21:09, 5 September 2024 (UTC)

Decimal point etymologies

Some of our Arabic entries, like تسمين, are using a new layout with decimal etymologies (Etymology 4.1, Etymology 4.2, Etymology 4.3...). Doesn't this look weird to anyone else? In any case, these don't seem to be allowed headers under Wiktionary:Entry layout. @Benwing2, Fenakhay. Ioaxxere (talk) 21:46, 28 August 2024 (UTC)

Hi, I’m having Internet issues so I can’t respond in detail but imo this is the best way of handling related groups of etymologies in Arabic, which arise due to the underspecified script. Benwing2 (talk) 21:57, 28 August 2024 (UTC)
If you have a better idea let me know. Benwing2 (talk) 21:57, 28 August 2024 (UTC)
@Benwing2 Do these differ only by pronunciation? If so, would headers like "Pronunciation 1" and "Pronunciation 2" work, as used in Chinese? My concern with these is that any etymology given under a heading like "Etymology 4.1" is ambiguous, as I don't know which parts apply to all 4.X sections and which only apply to 4.1. Theknightwho (talk) 15:05, 29 August 2024 (UTC)
@Theknightwho No, usually not. For example, unvocalized كاتب could be كَاتِب (kātib) (a form I active participle meaning "writing/writer") or كَاتَبَ (kātaba) (a form III verb meaning "to correspond with (someone)"); unvocalized كتاب could be كِتَاب (kitāb) (a noun meaning "book") or potentially كَتَّاب (kattāb) (an agent noun "scribe") or maybe كُتَاب (kutāb) (plural of a different noun, or of an adjective); unvocalized كتب could be one of many things (a form I verb كَتَبَ (kataba, to write) or potentially a stative form I verb كَتِبَ (katiba) or كَتُبَ (katuba), or a form II verb كَتَّبَ (kattaba, to cause to write), or كُتُب (kutub) the plural of كِتَاب (kitāb, book), or potentially a stative active participle كَتِب (katib) or a verbal noun كُتْب (kutb), etc.). All of these are derived from the same underlying root ك ت ب (k t b) but derived by different inflectional and derivational processes, and are all different terms, typically a mix of lemma and non-lemma forms, usually with unpredictable meanings. In a related language like Maltese that is written in the Latin script, they would all be written differently, and each has its own etymology. In cases that differ only in pronunciation, they are indeed placed under the same Etymology section, but mostly that isn't the case. If the etymology applies to all 4.X sections I will place it under 4.1; it is usually quite clear whether this applies to the root (and hence to all sections) or only to the term in question. We could come up with a different way of handling this but I can't think of any that wouldn't require nesting down to L6 headers (which aren't explicitly allowed per WT:EL and don't display very well). 
Most languages don't have such underspecified writing systems so this issue doesn't appear anywhere but in Semitic languages and maybe some other Afroasiatic languages; other languages like Persian and Urdu that use the same underspecified Arabic writing system don't follow the same Semitic inflectional principles so typically don't run into this issue so much. Benwing2 (talk) 18:47, 29 August 2024 (UTC)
@Benwing2 Alright - I see what you mean. Level 6 headers are reasonably widespread in Chinese and Japanese entries due to the "Pronunciation X" headers, though, and I haven't noticed any issues with them. Theknightwho (talk) 19:04, 29 August 2024 (UTC)
What I mean is the level 6 headers look identical to the level 5 headers, so it gets confusing to distinguish what goes under what. Benwing2 (talk) 19:41, 29 August 2024 (UTC)
Ben asked us on Wiktionary:Beer parlour/2023/July § Etymology sections like 1.3, 2.1. I already got used to the formatting, and I think it makes parsing – not in a computer sense, but mentally apperceiving – the page you provided as an example less disorganized.
“A constantly experimental attitude toward everything – that's all we need.” As it is put by B.F. Skinner. Fay Freak (talk) 22:07, 28 August 2024 (UTC)
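To make Benwing2's point concrete: once the short vowels and shadda are dropped, several distinct vocalized words from the k-t-b examples above collapse onto one unvocalized spelling, which is the single page title they all share. A small Python sketch; the forms are written with `\u` escapes so the combining marks are explicit, and `skeleton` and `group_by_skeleton` are hypothetical helpers, not functions of any Wiktionary module.

```python
import unicodedata
from collections import defaultdict

# Vocalized forms from the examples above. Combining marks used:
# fatha U+064E, damma U+064F, kasra U+0650, shadda U+0651.
VOCALIZED = {
    "kātib": "\u0643\u064e\u0627\u062a\u0650\u0628",         # active participle
    "kātaba": "\u0643\u064e\u0627\u062a\u064e\u0628\u064e",  # form III verb
    "kitāb": "\u0643\u0650\u062a\u064e\u0627\u0628",         # 'book'
    "kattāb": "\u0643\u064e\u062a\u0651\u064e\u0627\u0628",  # 'scribe'
    "kataba": "\u0643\u064e\u062a\u064e\u0628\u064e",        # 'to write'
    "kutub": "\u0643\u064f\u062a\u064f\u0628",               # plural of kitāb
}


def skeleton(word):
    """Drop nonspacing marks (Unicode category Mn): short vowels, shadda, etc."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def group_by_skeleton(forms):
    """Map each unvocalized spelling to the transliterations that share it."""
    groups = defaultdict(list)
    for translit, word in forms.items():
        groups[skeleton(word)].append(translit)
    return dict(groups)


for bare, translits in group_by_skeleton(VOCALIZED).items():
    print(bare, translits)
```

Each printed group corresponds to one page (كاتب, كتاب, كتب) holding several unrelated-but-cognate terms, which is the situation the numbered "Etymology 4.1, 4.2..." sub-sections are trying to organize.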

How do we handle morphemes that are not affixes?

E.g. selen- is templated as a prefix, but in most of the examples it is either the root (e.g. selenic) or part of a compound. I don't see a way to handle this in the English-language POS templates or with the general 'head' template. kwami (talk) 02:47, 31 August 2024 (UTC)

Yes, it isn't clear what part-of-speech one could reasonably assign to a cranberry morpheme. Nicodene (talk) 04:24, 31 August 2024 (UTC)
How about just "combining form", "bound form" or something analogous? Or should it be broken up into functional sections like "prefix" (the OED argues that in some words it functions as a prefix), "root", "first element in compound words" etc? kwami (talk) 05:13, 31 August 2024 (UTC)
How about just classifying it as a noun? The hyphen and the definition then clarify that it only exists in derivatives and compounds. —Caoimhin ceallach (talk) 16:48, 31 August 2024 (UTC)
I don't agree with classifying it as a noun, as it's clearly not. We already have a POS combining form which is probably the right thing to use here (IMO). Benwing2 (talk) 20:50, 31 August 2024 (UTC)
I didn't see that option in the parameter list. I agree that's the best option. kwami (talk) 22:59, 31 August 2024 (UTC)
Changed the POS. The option is not automated for English, and we still have 'prefixsee' because that's how derived words are formatted, but it's a start. kwami (talk) 23:14, 31 August 2024 (UTC)

Can we establish a way to label separable phrasal verbs (verbs with an adverb) and inseparable phrasal verbs (verbs with a preposition) in English?

In English, there are:

  • separable phrasal verbs (verbs with an adverb)
  • inseparable phrasal verbs (verbs with a preposition, some sources call them "prepositional verbs")

Intransitive phrasal verbs and three-part phrasal verbs are always inseparable, but in other cases, there's no easy way to tell whether a phrasal verb is separable or inseparable. Learners have to look up this information in dictionaries.

Well, they certainly won't find this information in Wiktionary, because it doesn't provide it. A rare exception is the "usage notes" section, e.g. try on, count on, turn back, hear out, run into 5.172.255.165 18:30, 31 August 2024 (UTC)

Hi, see Wiktionary:Beer_parlour/2024/July#consensus_on_inclusion/exclusion_of_"someone"_in_multiword_English_verb_lemmas. This was discussed just a month ago but wasn't resolved. It turns out to be a bit more complicated than just distinguishing cases that are separable vs. inseparable, and that's where we got stuck. Benwing2 (talk) 18:57, 31 August 2024 (UTC)

Citations talk namespace

The citations talk namespace has only two pages: Citations talk:Frühgeburtskernikteri and Citations talk:Tai Ji Men. Maybe the namespace can be deleted with any discussions going to the entry's main talk page. Ioaxxere (talk) 05:56, 1 September 2024 (UTC)

I don't think this is possible, is it? Doesn't every namespace need an equivalent talk namespace? Theknightwho (talk) 13:02, 1 September 2024 (UTC)
@Theknightwho: If it isn't possible, we can discourage it by creating an edit filter and hiding the portlet link when you're at a Citations page (this is already the case). A bot could also be created to move all existing content from Citations talk to Citations on a regular basis as well. If we decide not to do this then the link should not be getting hidden in my opinion. Ioaxxere (talk) 15:21, 1 September 2024 (UTC)
Whether or not it's possible, the RFD discussions mentioned on those pages should definitely go to the talk pages, as per our normal practice. They shouldn't be on a Citations talk page that most users won't know about, let alone think to look at. Andrew Sheedy (talk) 21:44, 1 September 2024 (UTC)
It looks like we had an edit filter that stopped Citations talk pages from being created, but I don't know why it was turned off. Theknightwho (talk) 00:32, 2 September 2024 (UTC)
Interesting: Erutuon turned it off on 17 August 2020, saying "Added entry in MediaWiki:Titleblacklist, so this filter is unnecessary." But the two pages mentioned above were both created after that, so either something went wrong with the title blacklist entry, or the issue is that both Citations-talk pages were created by sysops, who may have the rights to defy the blacklist (and because they were making the edit via a gadget, may not even have realized they were doing so). I see a few possible solutions, including editing aWa to automatically change "Citations talk:" to "Talk:", or re-enabling the edit filter (which would mean attempts to archive a discussion with aWa fail and require the user to manually go back and fetch the content and manually archive it, which is a faff...). - -sche (discuss) 06:04, 2 September 2024 (UTC)
@-sche: Yes, it looks like those entries were created by admins using aWa, so that gadget should probably be set to archive things under the main talk page. Courtesy pinging @Erutuon, who also might be interested to work on this. Ioaxxere (talk) 06:57, 2 September 2024 (UTC)
Tangentially related — today I rewrote MediaWiki:Gadget-DocTabs.js and in the process encountered two more edge cases that are currently not handled correctly: template documentation talk (e.g. Template talk:pi-alt/documentation) and module documentation talk (e.g. Module talk:inc-pra-Brah-translit/documentation). I'm mentioning this here because the issue is caused by the same things that @-sche brought up above. Ioaxxere (talk) 06:57, 2 September 2024 (UTC)
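-sche's first suggestion, having aWa write to "Talk:" wherever it would otherwise write to "Citations talk:", amounts to a one-line title rewrite. aWa is a JavaScript gadget and its code is not reproduced here; the Python function below is a hypothetical illustration of the rule only, and it deliberately leaves every other target (including the documentation-talk edge cases Ioaxxere mentions) unchanged.

```python
CITATIONS_TALK = "Citations talk:"


def archive_talk_page(target):
    """Hypothetical rule: if an archive target is in the Citations talk
    namespace, use the entry's main talk page instead."""
    if target.startswith(CITATIONS_TALK):
        return "Talk:" + target[len(CITATIONS_TALK):]
    return target  # all other targets are left as they are


print(archive_talk_page("Citations talk:Tai Ji Men"))
```

The same mapping would also tell a cleanup bot where to move the two existing Citations talk pages.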

Superstition category

I propose we add a Superstition related-to category placed under Folklore (as suggested by @Qwertygiy and @Theknightwho on Discord). Vininn126 (talk) 12:26, 1 September 2024 (UTC)Reply

Interwiki

See also: #What should I do

May I create such interwiki links? ПростаРечь (talk) 12:26, 1 September 2024 (UTC)Reply

I have no problem with this. Theknightwho (talk) 22:07, 5 September 2024 (UTC)Reply

@PUC May I hear your opinion? ПростаРечь (talk) 06:24, 3 September 2024 (UTC)Reply

Unnamed parameters in etymology templates

In etymology templates such as {{bor}}, {{inh}} and all the others, I think it would be convenient if the unnamed parameters (other than the language code(s)) were used for additional words. This would avoid typing out, for example, {{inh|FOO|BAR|TERM1}}, {{m|BAR|TERM2}}, {{m|BAR|TERM3}}, which could be replaced by {{inh|FOO|BAR|TERM1|TERM2|TERM3}} (and similarly for {{cog}}, {{der}}, etc.).

  • This is similar to how the templates: {{alt}} and {{desc}} function.
  • In this proposed pattern, the meaning(s) of the word(s) would strictly be entered using |tN= (t1, t2, t3, ... correspond to TERM1, TERM2, TERM3; for convenience, |t= would be the same as |t1=).
  • Similarly, the term(s) to be displayed, if different from the term linked, would strictly be entered using |altN=.
  • Example: {{cog|FOO|TERM1||MEANING1}}, {{m|FOO|TERM2||MEANING2}} would become {{cog|FOO|TERM1|t1=MEANING1|TERM2|t2=MEANING2}}.
  • This would be especially helpful when using typing aids, since subst:chars would only have to be typed once.
  • For more convenience, we could change |altN= to |aN=, |dN= (display), or |sN= (show), to minimize the slight extra typing this proposal would require.
  • This proposal can also be extended to other linking templates like {{m}} and {{l}} if there is consensus.

This was raised on WT:Discord some time ago as well, but the discussion died down without follow-up. Svartava (talk) 17:52, 1 September 2024 (UTC)Reply
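
A minimal sketch of how the proposed multi-term syntax could be read, written in Python purely for illustration; Wiktionary's real templates are backed by Lua modules, and this is not their code. The function name `parse_multi_terms` and its argument layout are hypothetical. It relies on the standard MediaWiki behaviour that named parameters such as |t1= do not consume positional numbers, so {{cog|FOO|TERM1|t1=MEANING1|TERM2|t2=MEANING2}} arrives as the positional values FOO, TERM1, TERM2 plus the named values t1 and t2. Treating |alt= as shorthand for |alt1= is an assumption; the proposal only states this for |t=.

```python
def parse_multi_terms(positional, named, num_lang_codes=1):
    """Hypothetical reader for the proposed multi-term syntax.

    positional: unnamed parameters in order, e.g. ["FOO", "TERM1", "TERM2"]
    named: named parameters, e.g. {"t1": "MEANING1", "t2": "MEANING2"}
    num_lang_codes: leading language codes (1 for {{cog}}, 2 for {{inh}})
    """
    langs = positional[:num_lang_codes]
    terms = positional[num_lang_codes:]

    # Per the proposal, |t= is shorthand for |t1=; |alt= is handled the
    # same way here (an assumption, not part of the proposal).
    named = dict(named)
    for base in ("t", "alt"):
        if base in named and base + "1" not in named:
            named[base + "1"] = named.pop(base)

    items = []
    for i, term in enumerate(terms, start=1):
        items.append({
            "term": term,
            "alt": named.get(f"alt{i}"),  # display form, if different
            "gloss": named.get(f"t{i}"),  # meaning
        })
    return langs, items


langs, items = parse_multi_terms(
    ["FOO", "TERM1", "TERM2"], {"t1": "MEANING1", "t2": "MEANING2"}
)
print(langs)  # ['FOO']
for item in items:
    print(item)
```

Each term is paired with its own gloss and display form by index, which is the one-to-one correspondence the proposal describes for TERM1/t1, TERM2/t2, and so on.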

I don't recall often having to list a word as being inherited from multiple lemmas per ancestor, or having multiple cognates in a single related language. What are some example entries where this would be useful?--Urszag (talk) 18:45, 1 September 2024 (UTC)Reply
I get that this varies with languages/families or editors, but I very often have to do this, e.g. 𑀮𑁄𑀡𑀺𑀬, 𑘄𑘡𑘿𑘮𑘰𑘯𑘰, इंदूर etc. When the ancestor or cognate has alternative forms, I always like to mention (as well as read) them whenever relevant. Svartava (talk) 18:58, 1 September 2024 (UTC)Reply
@Svartava It would be better to use // syntax (e.g. {{m|en|foo//bar}} gives foobar), but we still need to work out how to handle transliterations with that format, since some languages (e.g. Chinese) use slashes but only have one translit. Theknightwho (talk) 00:01, 2 September 2024 (UTC)Reply
@Theknightwho: So in that case, would t1 and alt1 correspond to foo and t2 and alt2 to bar? Alternatively, this might also be workable: {{m|LANG|foo<t:meaning><tr:xlit>}} along with the original proposal, similar to {{desc}}. Svartava (talk) 03:43, 2 September 2024 (UTC)Reply
@Svartava I initiated the change to {{desc}} to the syntax you're proposing, but I'm not currently convinced this is needed for {{bor}}, {{inh}}, etc. since as User:Urszag noted it doesn't seem so common to have multiple ancestral lemmas of a given term in the same language. If we add support for this I'd propose allowing for comma-separated lemmas in a single param with inline modifiers, e.g. your example of इंदूर (indūr) would use something like {{inh+|mr|pra|*𑀇𑀁𑀤𑁅𑀭,*𑀇𑀁𑀤𑀯𑀼𑀭,𑀇𑀁𑀤𑀧𑀼𑀭}}. This syntax is already supported for all form-of templates; to allow for embedded commas in a lemma, the comma is only recognized as a separator if not followed by a space and not preceded by a backslash. Benwing2 (talk) 03:45, 5 September 2024 (UTC)Reply
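
The separator rule Benwing2 describes (a comma splits lemmas only if it is not followed by a space and not preceded by a backslash) can be sketched with a single regular expression. This Python version is illustrative only and is not the code of Wiktionary's form-of modules; `split_lemmas` is a hypothetical name, and removing the backslash from an escaped comma is an assumption about how the escape is displayed.

```python
import re

# A comma separates lemmas only when it is NOT preceded by a backslash
# and NOT followed by a space.
SEPARATOR = re.compile(r"(?<!\\),(?! )")


def split_lemmas(param):
    parts = SEPARATOR.split(param)
    # Assumption: an escaped comma "\," is displayed as a plain comma.
    return [part.replace("\\,", ",") for part in parts]


print(split_lemmas("foo,bar"))    # ['foo', 'bar']
print(split_lemmas("foo, bar"))   # ['foo, bar']
print(split_lemmas(r"foo\,bar"))  # ['foo,bar']
```

The space check lets ordinary multi-word lemmas such as "foo, bar" pass through intact, while the backslash escape covers the rarer case of an unspaced comma inside a single lemma.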
@Benwing2 That looks like a nice idea. How do we provide the meanings and alternative display forms in the form-of templates? I don't see a way to do it with |t=T1,T2 or with |t1=T1 and |t2=T2. Svartava (talk) 03:53, 5 September 2024 (UTC)Reply
@Svartava Use inline modifiers. Benwing2 (talk) 03:54, 5 September 2024 (UTC)Reply
Personally I would support the original proposal. To me it seems tidiest and given how {{desc}} was made to work I think it'd be nice if all templates would work the same for the sake of consistency. I don't think this is rare by any means, I've had to write this a number of times and a quick search insource:/\{(der|inh|bor|cog)[^}]+\}\}, \{\{m\|/ reveals 35k hits throughout the project. However I'd also be happy with Benwing's solution. It does feel like extra syntax to be aware of, but it seems a good compromise with editors who'd prefer to continue using unnamed parameters for |alt= and |t=. As a separate note, I also liked the idea of shortening |alt= to something like |d=, all things aside. Catonif (talk) 10:46, 10 September 2024 (UTC)Reply
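
For readers unfamiliar with the search above: the pattern looks for an etymology template ({{der}}, {{inh}}, {{bor}} or {{cog}}) immediately followed by ", {{m|", i.e. the hand-written extra-term pattern the proposal would replace. Below, the same pattern is tried with Python's `re` module purely for illustration. CirrusSearch's insource: regex dialect is not identical to Python's in general, though this pattern only uses basic features (escaped braces, a negated character class, alternation); that the two agree here is an assumption. The sample strings are made up.

```python
import re

# The pattern from the insource: search above, used unchanged.
PATTERN = re.compile(r"\{(der|inh|bor|cog)[^}]+\}\}, \{\{m\|")

samples = [
    "{{inh|mr|pra|TERM1}}, {{m|pra|TERM2}}",  # extra term added by hand
    "{{inh|mr|pra|TERM1}}",                   # single term only
    "{{cog|sa|TERM1}} and {{m|sa|TERM2}}",    # not joined by ", "
]
for text in samples:
    print(bool(PATTERN.search(text)), text)
```

Only the first sample matches, so the hit count approximates how often an etymology template is directly followed by a manually linked extra term.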
While the possibility of specifying multiple parents in one template is entirely agreeable, I don’t think that unnamed parameters are the way to go. Upending the elegant, laconic and ubiquitous link-display-translation syntax in order to optimise this relatively uncommon situation is not worth it, and any other solution, whether parameter- or comma-based, is preferable. ―⁠Biolongvistul (talk) 12:43, 10 September 2024 (UTC)Reply

Creating Wiktionary Pages for Generation Z Slang

Hello,

Today I read w:List of Generation Z slang on Wikipedia and created audio on Lingua Libre for a couple of the slang terms, including two that don't have English Wiktionary pages yet. Is Wikipedia's list of Generation Z slang sufficient evidence that these terms exist, or do I need to further attest their existence by digging up quotes on the Internet?

bouncing on it:(file)
big yikes:(file)

Thank you Flame, not lame (talk) 00:14, 2 September 2024 (UTC)Reply

We do have a page about that actually: Appendix:Gen Z slang and there is a relevant discussion about deleting it which discusses some of the issues you mention. —Justin (koavf)TCM 00:29, 2 September 2024 (UTC)Reply
I'll keep these questions in the Beer Parlour, since it is more popular, and the page you linked is not likely to get noticed. Flame, not lame (talk) 00:44, 2 September 2024 (UTC)Reply
Okay, so attestation requirements are the same for Gen Z slang as they are for any other words, so citing Wikipedia itself is not sufficient. Unfortunately, the nature of youth slang makes it so that you're less liable to find durable citations, but the good news is that requirements for what can be in the appendix namespace are much more lax (and not really codified), so if you wanted to add content there, it would be appreciated. —Justin (koavf)TCM 00:56, 2 September 2024 (UTC)Reply

(ab)use and other unusual portmanteau-like terms

In my recent research into various potential sources/concordances, I have stumbled across a class of terms that, as best I can tell, has no linguistic name (or at least nobody I have brought this up with could recall one), and of which, to my understanding, we currently have only one example, (s)he, which in my opinion is only a partial one. The terms are as follows:

What is unusual about these is that, unlike a traditional portmanteau, they don't blend meanings to form a new meaning; they basically rely on the sum of their parts. None of these terms can be "synonymized" the way (s)he can be (by singular they), and I found two other examples of terms that can be:

These are also notable because, unlike the above, they are not simple bracketing of a prefix.


I will also note two other examples of technically attestable terms of the same class that I found, though I am unsure whether those attestations would actually be accepted; that, however, is outside the scope of the topic I'm broaching here:

  • (g)old - gold + old (I understand this as coming from the phrase "old but gold"; probably "synonymizable" with sense 2 of the adjective vintage)
  • (nick)name - nickname + name (even with quote context the exact nature of the use of this eludes me)

Currently I haven't been able to find any such examples, but I suspect this kind of term isn't exclusive to prefixes; take the following hypothetical term as an example of what one could look like:

Basically, I bring all this up because I'm unsure whether these are something we "can" document, and even more so how exactly we would go about documenting and defining them, since describing them as just another blend/portmanteau seems inapt. Akaibu (talk) 03:54, 2 September 2024 (UTC)Reply

I'm not aware of a specific term for this either. A juxtaposition of perspectives, like the one that excarnateSojourner describes, occurs when the bracketed element is a negative or pejorative prefix. The result is often a bit cheeky; cf. (in)famous, (mis)adventures. Nicodene (talk) 04:53, 2 September 2024 (UTC)Reply
A juxtaposition of perspectives, again, isn't a requirement, as (wo)man shows, unless you go by the binary definition of gender which, yea lmao. It certainly seems like the most common case, though. Akaibu (talk) 05:51, 2 September 2024 (UTC)Reply
As I said: “when the bracketed element is a negative or pejorative prefix”. Nicodene (talk) 06:05, 2 September 2024 (UTC)Reply
I have also seen slashes used, as in dis/like, mis/fortune, and we do also have s/he and Latino/a. My initial reaction is that I'm not sure it makes sense to include just any term in this class, though, like (mis)fortune, mis/fortune, (wo)man, or e.g. person(s), which all seem relatively SOP—maybe we want to only include ones where the meaning is not obvious (maybe (g)old?) or where there are THUB translations? I don't know... I'm aware we include unspaced, unpunctuated single words regardless of whether they're SOP, but when punctuation clearly tells the reader what the 'parts' are that they need to look up, it seems like we take that into account, since we don't have e.g. North/South: we seem to rely on someone who sees North/South Dakota to know to fill in the missing part and look up North Dakota and South Dakota separately, and on the face of it, it seems like we could similarly rely on someone who sees mis/fortune or (mis)fortune or misfortune(s) to look up misfortune and fortune and misfortunes separately...? - -sche (discuss) 06:56, 2 September 2024 (UTC)Reply
@-sche I would say they aren't SOP because their use applies both meanings to a single object simultaneously. Your example of North/South Dakota is different because that would be a single term referring to multiple things; you wouldn't expect someone to say "New York (City)" to refer to both the city and the state, whereas with (mis)fortune and such, you're only talking about one thing (someone's fortune or misfortune). Akaibu (talk) 16:40, 4 September 2024 (UTC)Reply
Re slashes: we also have wo/man, a form of the mentioned (wo)man. J3133 (talk) 16:55, 4 September 2024 (UTC)Reply

Request for Old Parthian/Proto-Parthian

I would like to request for the creation of templates, modules and codes to enable supporting Old Parthian/Proto-Parthian on Wiktionary. Is this feasible? (Also, pinging @Victar here because they might have useful insights regarding this) Antiquistik (talk) 08:37, 2 September 2024 (UTC)Reply

To what ever end? --{{victar|talk}} 05:09, 3 September 2024 (UTC)Reply
@Victar To add the earlier (Old Iranian) forms of certain attested Parthian (Middle Iranian) lemmas. For example, if I were to create a page for Friyapat in Parthian, I would like to accompany it with an entry on the reconstructed older form, *Friyapatiš in Old/Proto-Parthian, where I can further explain its etymology.
Besides, even on present Parthian entries, the only option now available is to have the terms' further etymologies be labelled as "Old Iranian," which is very flawed when the pre-Middle Iranian forms of these terms can be reconstructed.
I did leave a message regarding this on your talk page, but it seems you might have missed it. Antiquistik (talk) 06:09, 3 September 2024 (UTC)Reply
I've been away for a few weeks.
I would write the etymology of Parthian 𐭐𐭓𐭉𐭐𐭕 (prypt /⁠Friyapāt⁠/) as, "From earlier *Friyapatiš, whence Achaemenid Elamite [script needed] (pír-ri-ia-bat-ti-iš), Ancient Greek Φριαπίτης (Phriapítēs), from Proto-Iranian *FriHyápatiš, from Proto-Indo-Iranian *PriHyápatiš, from *priHyás (beloved) +‎ *pátiš (master). Cognate with Sanskrit प्रियपति (priyápati)." --{{victar|talk}} 19:09, 3 September 2024 (UTC)Reply
@Victar Isn't this alternative a tad cumbersome though? And should I use the code for Parthian itself for the earlier form? Antiquistik (talk) 02:48, 4 September 2024 (UTC)Reply
Honestly, no more cumbersome than the next. --{{victar|talk}} 07:55, 4 September 2024 (UTC)Reply
@Victar And do I use the code for Parthian itself? Or does Wiktionary have a code for unlabelled languages similar to Wikipedia's mis? Antiquistik (talk) 08:38, 4 September 2024 (UTC)Reply
The only reason for giving an earlier form to Parthian terms is if borrowings point to a more archaic form, like with 𐭐𐭓𐭉𐭐𐭕 (prypt /⁠Friyapāt⁠/). Otherwise, the inherited form should be Proto-Iranian. --{{victar|talk}} 07:12, 12 September 2024 (UTC)Reply

Announcing the Universal Code of Conduct Coordinating Committee

Original message at wikimedia-l. You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Hello all,

The scrutineers have finished reviewing the vote and the Elections Committee have certified the results for the Universal Code of Conduct Coordinating Committee (U4C) special election.

I am pleased to announce the following individual as a regional member of the U4C, who will serve a term until 15 June 2026:

  • North America (USA and Canada)
    • Ajraddatz

The following seats were not filled during this special election:

  • Latin America and Caribbean
  • Central and East Europe (CEE)
  • Sub-Saharan Africa
  • South Asia
  • The four remaining Community-At-Large seats

Thank you again to everyone who participated in this process, and much appreciation to the candidates for their leadership and dedication to the Wikimedia movement and community.

Over the next few weeks, the U4C will begin meeting and planning the 2024-25 year in supporting the implementation and review of the UCoC and Enforcement Guidelines. You can follow their work on Meta-Wiki.

On behalf of the U4C and the Elections Committee,

RamzyM (WMF) 14:07, 2 September 2024 (UTC)Reply

Have your say: Vote for the 2024 Board of Trustees!

Hello all,

The voting period for the 2024 Board of Trustees election is now open. There are twelve (12) candidates running for four (4) seats on the Board.

Learn more about the candidates by reading their statements and their answers to community questions.

When you are ready, go to the SecurePoll voting page to vote. The vote is open from September 3rd at 00:00 UTC to September 17th at 23:59 UTC.

To check your voter eligibility, please visit the voter eligibility page.

Best regards,

The Elections Committee and Board Selection Working Group

MediaWiki message delivery (talk) 12:15, 3 September 2024 (UTC)Reply

tautological definitions

The following subscript letters are "defined" simply as subscript letters, with no other content. For example, U+2093 LATIN SUBSCRIPT SMALL LETTER X (ₓ) is defined only as "subscript x".

Graphical descriptions are not definitions. I added an actual definition to , but the rest IMO should be deleted, unless someone wants to go through and add some content. They are:

, , , , , , , , , , , .

kwami (talk) 13:55, 3 September 2024 (UTC)Reply

Should we remove automated "more X" and "most X" from headers of English adjectives?

I'm here per the suggestion of user:Theknightwho.

There are many adjectives where we use the header {{en-adj|-}} because we haven't attested comparative forms, and that creates the note "not comparable". But almost all of these are comparable, especially in poetry. This came up for Iapetian, where it's easy to see how it might be comparable even if there are no hits at GBooks. That took me to Japhetic, which we also claimed was not comparable, but where I found several instances of "more Japhetic" on GBooks ("some Japhetides are clearly more Japhetic than others"; "beyond the more Japhetic order of minstrelsy"), and possibly one of "most" (no preview on that one).

As Theknightwho pointed out, there are probably very few cases where an English adjective is truly not comparable, because combining an adjective with "more" and "most" is extremely productive -- it's not like countability, where some nouns truly aren't countable. I agree with Theknightwho that we probably shouldn't automatically list these forms: they don't add any information that "adjective" doesn't already. We wouldn't want to claim they exist if we can't attest to them (something like "most Iapetian" may have never been used in the history of English), yet we shouldn't make a positive claim that they can't exist either, unless we have a source to back up that claim.

For inflected comparatives (nicer, nicest), we'd want to continue to provide them in the header of course. But those are generally easy to attest.

In the few cases where that's inadequate, we can always expand on the header template, like we do at fun#Adjective, or add usage notes. kwami (talk) 01:07, 4 September 2024 (UTC)Reply

It sounds like the problem is not the more/most forms, but the text generated when someone suppresses them via {{en-adj|-}}. It sounds like what would solve the issue is to either add a ! parameter (like {{en-noun|!}} has) to {{en-adj}}, or just change the wording of {{en-adj|-}} itself to say the forms are not attested rather than that they don't exist. It doesn't seem like removing the mention of more/most forms, when they're attested, would improve anything; indeed, IMO it would make things a bit worse, because if some entries list comparative/superlative forms (as entries like nice and ugly will need to, as you say), then not listing comparatives on other entries suggests they don't grade/inflect. (Granted, we could retain some mention like "inflects the usual way", but what's the usual way? Might as well go ahead and spell out "inflects using more/most"... and then we might as well just keep listing the forms, we're not short on ink.)
We have the same problem with {{en-noun}}: most or all entries that currently use {{en-noun|-}} should properly use {{en-noun|!}}, because just like with the adjectives you mention, the issue is that the inflected forms aren't thrice-attested, not that they positively cannot exist. Comparing today's X to last century's, or this universe's X to a hypothetical mirror universe's, I can discuss how the Xs [plural] differ even if X is one of the nouns we claim are uncountable. Our whack-a-mole / cite-a-mole approach means we list e.g. "engagingness" as uncountable even though there are two cites of it. - -sche (discuss) 06:32, 4 September 2024 (UTC)Reply
There is a distinct grammatical class (POS) of mass nouns, such as water and bread, that are uncountable with their default definition. (You can say at a restaurant "we'll have 2 waters", or "all the waters of the Earth", but those are distinct definitions and can be listed separately.)
There is nothing comparable with adjectives. Whether they inflect is a matter of word length and lexicalization, not being a distinct POS. They're more like strong vs weak verbs.
We can already suppress the comment with {{en-adj|?}} or {{head|en|adj}}; the problem is that those get converted to {{en-adj|-}} (as happened to Iapetian) and then make the claim that a comparative form does not exist.
Unlike nouns and verbs in their inflections, all adjectives can form comparative phrases with 'more' and 'most', unless those would be redundant with existing inflected forms that we already mention. We don't mention which nouns can be pluralized with "some" (some water, some bread), or which verbs can be made past with did, so I don't see how not mentioning comparative forms with more, most is a problem. And do we even want to bother attesting them? But I agree that it would be better to say that such phrasing is not attested than to claim it cannot be. kwami (talk) 18:17, 4 September 2024 (UTC)Reply
At the risk of discussion creep, I feel like this is a persistent problem with virtually any case where no information is displayed or stored: it's not clear whether there is no such value or whether the value is just obvious and what you'd intuit. To use a somewhat similar example, on d: if someone has no death date listed, that could be because he's not dead or because we simply haven't added it to the database yet. Putting a death date of "no value" at least explicitly says "no death date applies here: he's still alive". I feel the same way here: as a native English speaker I would probably not think that "Japhetianer" or "Japhetianest" are words, and "more/most Japhetian" is probably the correct way to say this, so having a few extra words in the entry that say "the more/most form is the correct one" really causes no harm and clarifies what could otherwise be confusing or look like an omission. So I think we should retain "more/most" in headings. —Justin (koavf)TCM 18:49, 8 September 2024 (UTC) Actually, let me think this through more... I'll retain this in case anyone finds the direction I was headed meaningful. —Justin (koavf)TCM 18:50, 8 September 2024 (UTC)Reply
For me, redundant info is not the problem, or at least not important enough to worry about. It's our positive claim that the more/most forms don't exist that's the issue. I've also used DonnanZ's {en-adj|?} solution, or else {head|en|adj}, but people go through and change them all to {en-adj|-} even after I explain to them what I've done, because a search of GBooks does not come up with any tokens of the more/most forms. If GBooks doesn't have them, that's taken as proof that they do not exist and, more importantly, cannot exist. In their opinion, the solution is to change the wording produced by {en-adj|-}, and in the meantime they'll continue to make false claims on Wk. kwami (talk) 20:25, 8 September 2024 (UTC)Reply

Ambonym Attestation

Hello,

I recently created a Wiktionary page for the neologism "ambonym." The term was coined in Human1011's YouTube video. [3] Etymology Nerd also covered this topic. How do I attest this new word? Perhaps I can use better word choices in the etymology and definition.

Thank you Flame, not lame (talk) 01:18, 4 September 2024 (UTC)Reply

I would say look it up in Google Books and Internet Archive, see if you find any work that uses the term. CitationsFreak (talk) 05:53, 4 September 2024 (UTC)Reply
I don't think the term meets our CFI. The only durably archived mention I find is a coinage of the same term but with a different meaning. And it doesn't appear widespread in online sources. Einstein2 (talk) 15:57, 4 September 2024 (UTC)Reply
I haven't thought too hard about it yet, but my initial thought is that any contranym can meet the definition of Human1011's proposed word. Am I wrong? Quercus solaris (talk) 21:33, 4 September 2024 (UTC)Reply
Not all contranyms are ambonyms, and Etymology Nerd did a pedantic essay on ambonyms, so let's go with Human1011's new word because he is a smart young man. Flame, not lame (talk) 21:35, 4 September 2024 (UTC)Reply
I just realized that yes, the ambonymy concept is about the relation of the contranym to others, not just about the contranymy alone. Quercus solaris (talk) 21:41, 4 September 2024 (UTC)Reply
If other people use it, then we can list it. We don't have a "smart young man" clause. CitationsFreak (talk) 23:32, 4 September 2024 (UTC)Reply

Arabic voiceless velar fricative notation

The Arabic voiceless velar fricative (خ) is currently rendered as ḵ on Wiktionary. I would however argue that, since this diacritic form is already used for begadkefat notation in Hebrew and Aramaic, the organic and non-begadkefat Arabic خ should instead be rendered using ḫ, just like how this phoneme is rendered for other Semitic languages like Old North Arabian, Old South Arabian and Akkadian, and for the non-Semitic but still Afroasiatic Ancient Egyptian language. Antiquistik (talk) 10:28, 4 September 2024 (UTC)Reply

@Antiquistik I oppose this. ḫ is too confusable with ḥ, the pharyngeal fricative. Benwing2 (talk) 03:37, 5 September 2024 (UTC)Reply
@Benwing2 We already use both ḫ and ḥ for, respectively, the velar and pharyngeal fricatives, for Old North Arabian, Old South Arabian and Ancient Egyptian without this causing issues so far, don't we? Antiquistik (talk) 06:52, 5 September 2024 (UTC)Reply
Oppose, consistently using an underline for fricatives is much more understandable. ḵ is the fricative equivalent of k, ḡ of g, ṯ of t and ḏ of d. Using diacritics in a consistent manner is preferable IMO and randomly using ḫ would be much less clear. — BABRtalk 01:39, 6 September 2024 (UTC)Reply
Agree, and I'll add that @Antiquistik's concern with "organic" vs. "begadkefat" variants across different languages runs into the same problem with ḡ, ṯ and ḏ, yet we aren't proposing replacing every one of those with some other random symbol. Benwing2 (talk) 04:43, 6 September 2024 (UTC)Reply
@Babr @Benwing2 I am sorry, but saying that I am proposing replacing ḵ with a random symbol comes across as disingenuous when ḫ is already the notation for the velar fricative in the standard Romanisation schemes for several Semitic languages like Old North Arabian, Old South Arabian, Akkadian, and it is also used as such in some Romanisation schemes for Arabic.
In fact, in my opinion, the DIN 31635 scheme, which notes the velar fricative as ḫ, is among the better transliteration schemes for Arabic. It's a German transliteration scheme, so I doubt English Wiktionary would adopt it wholesale, but it is used widely enough in English literature on Arabic that I think we should consider it.
Though I must note that the argument with regards to organic vs begadkefat was proposed by @Fay Freak in a previous discussion Wiktionary:Beer parlour/2024/May#Arabic and Hebrew transliteration
I have no problem with disagreements with or opposition to my proposal per se, but it is not a random choice. Antiquistik (talk) 07:31, 6 September 2024 (UTC)Reply
@Antiquistik I did not say it was a random symbol; I meant its usage would be random, since it wouldn't match our otherwise consistent pattern for marking fricatives. I think using diacritics in a simple and consistent manner is desirable, as it makes transliterations more easily understood by readers. — BABRtalk 18:01, 6 September 2024 (UTC)Reply
@Babr That's fair. I suppose I should have argued for switching to the DIN 31635 transliteration scheme altogether, especially given that I think its other letters provide better consistency with the Wiktionary notations of languages with heavy loaning from Arabic, like the various registers of Hindustani. Antiquistik (talk) 12:34, 10 September 2024 (UTC)Reply
Oppose. — Fenakhay (حيطي · مساهماتي) 07:42, 6 September 2024 (UTC)Reply
I have no strong feelings for either scheme, but there are lots of casual readers and in particular native speakers without experience in Semitist tradition that do not participate in discussions but might be snubbed by ḫ. For now we at least have the advantage of there only being one diacritic variant of k and h each, and , which is as Benwing2 implies optically though not comparatively advantageous, and last but not least status quo bias and complete correspondence with the English edition of the Hans Wehr transliteration, only its rings made larger. Your, albeit excellent, centre of gravity in language comparison begs us here to “cause issues”, I am sorry. Fay Freak (talk) 16:20, 6 September 2024 (UTC)Reply

What should I do

What should I do if my edits are reverted?
I left a message on PUC talk page.
I didn't get a response.
I left a message here.
I didn't get a response. ПростаРечь (talk) 18:30, 4 September 2024 (UTC)Reply

  • @ПростаРечь I don't know that I can give you a full answer as I don't work in the reconstruction namespace myself, but Wiktionary generally doesn't use manually-added interwiki links, because the title of an entry in the English Wiktionary will be the same as the title of the corresponding entry on any other Wiktionary. I do see that the Russian Wiktionary does not seem to have a dedicated namespace for reconstructed terms as we do here, but the way to solve this problem (if we see it as a problem at all) would probably be to change something at Wikidata, rather than manually adding interwiki links between all reconstructed terms in all Wiktionaries. — excarnateSojourner (ta·co) 21:31, 4 September 2024 (UTC)Reply

Recategorizing quotation navigation templates by bot

We have a collection of templates such as {{Douglas Adams quotation templates}} that are used on the documentation pages of quotation templates (such as Template:RQ:Adams Hitchhiker/documentation) to list other quotation templates for works by the same author. Currently these are categorized in cat:Navigation templates (and e.g. cat:English quotation templates), but they are shoved to the front by their sort keys rather than being listed under each letter.

I'm seeking consensus to create cat:Quotation template navigation templates (or cat:Quotation navigation templates if people prefer) as a subcategory of cat:Navigation templates, and use my bot account to recategorize these templates there. — excarnateSojourner (ta·co) 00:11, 5 September 2024 (UTC)Reply

Support This, that and the other (talk) 10:11, 5 September 2024 (UTC)Reply
No objection: perhaps the name should be "Category:English quotation navigation templates" which can then be a child category of both "Category:English quotation templates" and "Category:Navigation templates". Some of our quotation navigation templates are multilingual (for example, {{Bible quotation templates}} and {{Don Quixote quotation templates}}), in which case they should also be child categories of "Category:French quotation navigation templates", "Category:Spanish quotation navigation templates", and so on. — Sgconlaw (talk) 12:15, 5 September 2024 (UTC)Reply

Request for extended mover right

I plan on enforcing the changes to the Ottoman Turkish encoding guidelines which were formalised into WT:AOTA (see disc) but never really put into practice. To do so I would need to move and merge about 300 entries, and since I would rather not leave unwanted redirects behind, nor flood CAT:D for someone else to tediously go through, I request the extended mover right so as to save both myself and others the hassle. Catonif (talk) 15:53, 5 September 2024 (UTC)Reply

Granted. More than trusted user. Vininn126 (talk) 16:38, 5 September 2024 (UTC)Reply
@Vininn126 Thank you! Catonif (talk) 17:02, 5 September 2024 (UTC)Reply

Request for autopatroller rights

I wish to request autopatroller rights, as I plan to edit certain fully protected pages following my collaboration with GLAAD. I already have autopatroller rights on Commons, and I edit in good faith and make good contributions, with only minor mistakes, as I am still not an expert on everything. Juwan (talk) 22:04, 6 September 2024 (UTC)Reply

I'm inclined to say yes, but I have a few thoughts. You are a cooperative editor and fast to learn, and your experience on Wikipedia has definitely helped you in learning the ropes. I'm hesitant, as I think there are still some fairly common ins and outs of the site you're not familiar with yet, and your edit count is just on the edge for many users (not that edit count is everything, but I feel in this particular case it's a useful metric). If no other admins have any qualms, then I think it's probably in order. I'd like to see what others say. Vininn126 (talk) 22:07, 6 September 2024 (UTC)Reply
If there is anything specific you think I should learn, please mention it on my talk page or on the Discord server; not just for this particular request, but in general, I want to know! Juwan (talk) 22:14, 6 September 2024 (UTC)Reply
@JnpoJuwan IMO these terms need to be approached with a lot of care. I agree with @Vininn126 that it might make sense to make a list of the changes you propose and post this in the Tea Room. FWIW I am not an expert in these sorts of terms; I think User:-sche knows more. Benwing2 (talk) 03:19, 7 September 2024 (UTC)Reply
thanks for the advice! I will go ahead and do that. Juwan (talk) 13:11, 7 September 2024 (UTC)Reply

Reference bibliographies

Is there any central or per-language bibliography in Wiktionary?

The references, and particularly the reference templates, make a treasure trove of sources on etymology, usage and other linguistic matters. But one usually encounters them only at random, under the references heading of this or that article. Is there a section of Wiktionary where you can find them grouped? In the top-level language categories I saw some "Category:<lang xx> reference templates". But untranscluded, they are completely opaque, and there is no telling the grand etymological dictionary from the note about a single word, or the old from the new.

What I'm looking for (or propose) is more like Appendix:Bulgarian bibliography, which lists the major recurring sources, and possibly more works, by topic. Danny lost (talk) 22:04, 6 September 2024 (UTC)Reply

The category you mentioned is usually better. As far as transparency goes, this depends on the template and on whether someone has written documentation for it. It's also usually important to read the foreword of the given dictionary and any reviews, which are often not given. So the answer is no, there usually is not a grand explanation of each source, unfortunately. Sometimes more details are provided on pages such as WT:About Bulgarian, but not always. Vininn126 (talk) 22:09, 6 September 2024 (UTC)Reply
Actually what I asked for is found at Wiktionary:Reference templates / Wiktionary:REFT. Though it is a bit of a random selection at the moment. Danny lost (talk) 20:41, 9 September 2024 (UTC)Reply

East Frisian

Does anybody know what code we could use for East Frisian dialects that are not Saterlandic, like Upgant, Wangerooge, Harlingerland, and Wursten? That Northern Irish Historian (talk) 00:38, 7 September 2024 (UTC)Reply

It's been a while, but if memory serves, ISO messed things up by using East Frisian for a Low Saxon lect and leaving Saterland Frisian as the only linguistically Frisian eastern lect with a code. We made do by making all the Frisian East Frisian lects part of Saterland Frisian's code. Pinging @-sche who was more directly involved and @Theknightwho who might know more about the current state of the codes. I don't think anyone was happy with the way we left it back then, so it wouldn't hurt to revisit the whole mess. Chuck Entz (talk) 02:03, 7 September 2024 (UTC)Reply
So that means using stq for these dialects? That Northern Irish Historian (talk) 15:24, 8 September 2024 (UTC)Reply

"the act of being"

There are 91 hits (enwikt entries), mostly in definitions, for this grammatically correct but IMHO semantically odd expression. Be is the ultimate stative verb, contrasting with a dynamic verb, which describes an action. "The act of being" makes for a good phrase for a poetic, spiritual, or motivational bit of writing or speech, but it seems completely out of place for definitions.

One simple change that might work for many of these definitions is to drop "The act of" and leave "Being". Another is to replace "act" with "condition" or "state" or, imitating other dictionaries, replace "act" with "condition, state, or quality" or similar.

Am I missing something? If I am not, this seems to show that we need to up our game to improve Wiktionary's most basic product, its definitions. I don't know what means there might be to prevent or cure such inappropriate expressions in definitions. Prevention may be hopeless because almost all newbies and some not-so-new-bies are convinced that it is trivial to write a definition. ("Style guide? Style guide! We don't need no stinking style guide.") Perhaps the alternative is to record common phrases that are almost always wrong for a dictionary definition, scan the dictionary periodically for such expressions, and correct them. The correction could be simple (automatic or semiautomatic) replacement of the offending expression, but probably more extensive rewording would be better. DCDuring (talk) 02:57, 7 September 2024 (UTC)Reply
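The periodic scan suggested above could start as simply as the following Python sketch. It assumes the entry's wikitext is already in hand and that definition lines start with "#" while usage examples and quotations start with "#:" and "#*"; the phrase list and the function name find_flagged are hypothetical, not an existing Wiktionary tool.

```python
# Hypothetical sketch: flag definition lines in an entry's wikitext that contain
# phrases considered unsuitable for definitions. The phrase list is an assumption.
FLAGGED_PHRASES = ["the act of being"]


def find_flagged(wikitext, phrases=FLAGGED_PHRASES):
    """Return (line number, phrase, line) for each flagged definition line."""
    hits = []
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        body = line.lstrip("#")
        # Keep only definition lines ("# ...", "## ..."); skip other lines and
        # usage examples ("#:") or quotations ("#*").
        if body == line or body[:1] in (":", "*"):
            continue
        lowered = body.lower()
        for phrase in phrases:
            if phrase in lowered:
                hits.append((lineno, phrase, line))
    return hits


sample = """==English==
# The act of being dethroned.
#: An example mentioning the act of being.
# Something unrelated."""

for hit in find_flagged(sample):
    print(hit)
```

As noted above, a hit would only mark the definition for a human to reword, not trigger an automatic replacement.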

In most cases, "being" in this context is being used not as a stative verb, but as an auxiliary. E.g. "being dethroned" is not necessarily stative: it can refer to an action.--Urszag (talk) 03:09, 7 September 2024 (UTC)Reply
The OED, based on my quick look at it (specifically, "dis-...-ment" words), uses the wording "The act of ___ing, the fact of being ____ed." I think leaving off "The act of" could work. I don't think it's sufficient to replace "act" with "condition", "state", or "quality", as Urszag points out. "His dismemberment was swiftly accomplished" refers to an action of dismembering, of which the subject is the passive recipient. It is not a state or condition of being dismembered (though the definitions worded with "act" fail to capture the possibility of the word being used that way). That being said, note the way we actually do handle this at dismemberment with two definitions, by contrast with, say, "disenthronement," which uses "The act of being". Andrew Sheedy (talk) 03:12, 7 September 2024 (UTC)Reply

Maybe a better way of looking at this is to divide the instances of "the act of being" into two categories:

  1. those of the form "act of being [ADJ]" in which be is a copula.
  2. those of the form "act of [present progressive passive verb] (=being [PAST PART.])" in which be is an auxiliary.

I don't think there are many other cases to distinguish.

I believe that all of the instances with be as a copula are simply wrong, as the English copula is stative.

The other case is one of the awkward and unjustified use of act when the purported actor/agent is actually a patient. DCDuring (talk) 21:39, 7 September 2024 (UTC)Reply
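The two-way split above can be approximated mechanically when triaging hits, as in this Python sketch. It is a crude heuristic of my own, not anything Wiktionary runs: it skips adverbs ending in -ly, then treats a following word ending in -ed, or one from a short hypothetical list of irregular participles, as a past participle (be as auxiliary) and anything else as a copular complement. A human still has to check each case.

```python
import re

# A handful of irregular past participles; a real list would need to be far longer.
IRREGULAR_PARTICIPLES = {"born", "driven", "eaten", "given", "known",
                         "seen", "shown", "taken", "thrown"}

PATTERN = re.compile(r"\bact of being ((?:\w+ly )*)(\w+)", re.IGNORECASE)


def classify_being(definition):
    """Guess whether 'being' in 'act of being X' is a copula or a passive auxiliary.

    Heuristic only: -ly adverbs are skipped, then an -ed word or a listed
    irregular participle counts as a past participle (auxiliary).
    """
    match = PATTERN.search(definition)
    if match is None:
        return None
    word = match.group(2).lower()
    if word.endswith("ed") or word in IRREGULAR_PARTICIPLES:
        return "auxiliary"
    return "copula"


print(classify_being("The act of being dethroned."))
print(classify_being("the act of being intentionally obtuse"))
```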

Icelandic útúrsnúningur (the act of being intentionally obtuse) and Hungarian foglalkozás (the act of being occupied with something) are both fine, I think. The former despite seeming to be a copulative phrase, the latter despite seeming to be a passive progressive phrase. In actual fact they're both durative non-passive actions and the word act positively contributes to the definition. On the other hand the definition of Swedish umgänge (the act of being with people) is indeed better off rephrased. It's a case-by-case question. —Caoimhin ceallach (talk) 19:37, 8 September 2024 (UTC)Reply
Not only does the English seem to be a copulative phrase, it is a copulative phrase. Being intentionally obtuse is not a "durative non-passive action" because it is not an action. If the word were defined as "pretending or intending to be obtuse" or "acting obtuse" that would be different. I am not too surprised that the problem has to be remedied on a case by case basis. As I said above: "The correction could be simple (automatic or semiautomatic) replacement of the offending expression, but probably more extensive rewording would be better". DCDuring (talk) 20:05, 8 September 2024 (UTC)Reply
What I meant was that the words I quoted denote durative non-passive action. If they are most clearly glossed by "act of being [ADJ]" or "act of being [PAST PART.]," I don't see that as a big issue. "Act of pretending/intending to be obtuse" would be inaccurate and "act of acting obtuse" is tautological. —Caoimhin ceallach (talk) 21:24, 8 September 2024 (UTC)Reply
What I am saying is that "act of being" never belongs in a definition because be is always either a copula or an auxiliary with an -ed-form making a passive verb. The predicate with a copula is never an act, in any normal sense of the word act (See act#Noun.). When being is a component of a passive, the subject of the passive is not an agent, but rather a patient. Patients do not act in any normal sense of the word act. Looking at occurrences of "the act of being" in Google Books should be sufficient indication that the expression is not normal English. It occurs principally in works of metaphysics. DCDuring (talk) 01:39, 9 September 2024 (UTC)Reply
My take is that it's a way to emphasize the verb POS, since an act has to involve a verb. A number of non-European languages use stative verbs where we use adjectives; we have a relatively small number (look, seem, sound, smell, to name a few), but those aren't what we think of when verbs are mentioned. Likewise, English has lost its passive morphology, and instead uses constructions made of forms already in use for other purposes. Saying "the act of being" makes it clear that action is referred to, even if it's being done by someone or something else. Of course, both are at the expense of not making literal sense, but their practitioners aren't paying attention to that aspect. Chuck Entz (talk) 03:44, 9 September 2024 (UTC)Reply
So we can use an expression commonly used in English only in metaphysics, theology, and spiritualism in our English definitions and glosses? DCDuring (talk) 11:34, 9 September 2024 (UTC)Reply
I get your argument. I get that being per se isn't an act, nor is being tired, or being eaten alive. But being intentionally obtuse is, and being occupied with something is too, despite them being grammatically analogous to the former examples. I think the semantics of the constituent parts is more important than the formal properties of the word being. If we're talking about formal logic you may have an argument, but this is a dictionary and clarity is all that matters. —Caoimhin ceallach (talk) 12:35, 9 September 2024 (UTC)Reply
If I thought that the "act of being" definitions were clear, I wouldn't have bothered introducing this topic. I think "act" allows for whatever modest active role a patient may have, but at the expense of muddying the basic definition.
I'm not in the slightest concerned about formal logic and didn't use it. I am concerned about not confusing ordinary folks who might see our definitions, not just on our site, but also via Google Search and entities like OneLook.com (which give us a very favored treatment), but don't have hyperlinks, hovertext, etc. to explain things and cover over some of the inadequacies of our definitions.
Andrew Sheedy noted that, given the need to define terms of the form dis-...-ment, the OED uses the wording "The act of ...ing, the fact of being ...ed." I don't think we should be too proud to learn from and follow their example. I don't recall ever seeing the wording "the act of being" used in any dictionary's definitions.
In the case of Icelandic útúrsnúningur (the act of being intentionally obtuse), I propose "intentional obtuseness", "pretending to be obtuse", "feigning obtuseness". "Intentional", "pretending", and "feigning" all convey an active role for the intender, pretender, or feigner.
For Hungarian foglalkozás (the act of being occupied with something), all the meanings of the verb Hungarian foglalkozik are active and are so worded, excepting "be engaged in", one of the cloud of synonymous glosses for definition 4, which could easily be worded as "engage in". Hungarian -ás seems to function much as English -ing. Normally we don't reword all the definitions of the verb at each form of the verb or each term derived from the verb, unless there are new meanings or, perhaps, restrictions of meanings. Leaving the definition of Hungarian foglalkozás as "verbal noun of foglalkozik" without the sense-obscuring gloss seems preferable to the current state. DCDuring (talk) 23:09, 9 September 2024 (UTC)Reply
I concede that some of your suggestions might work better than what we currently have. The problem however with English -ing (verbal noun) is that it is homonymous with the present participle. That's an ambiguity which I presume the admittedly not-so-beautiful "the act of"-formulation is intended to avoid. Alternatively we could choose a formulation like "the feigning of obtuseness" or "the engaging in an activity" for verbal nouns, but I'm not sure if this is better. Leaving out a gloss altogether seems even worse to me because not everyone has a perfect intuition for what a verbal noun is.
Perhaps you can make the case better for why ordinary folks might be confused by definitions like "act of being [ADJ]/[PAST PART.]" in cases where the phrase "being [ADJ]/[PAST PART.]" clearly denotes an agentive activity, because I'm still not entirely convinced. —Caoimhin ceallach (talk) 00:06, 10 September 2024 (UTC)Reply

Korean determiner form of adjectives

Take for example "진정한": I couldn't find any determiner forms of Korean adjectives in this dictionary. Should we create entries for them, make them redirect to (for example) 진정하다, or not create pages for them at all? If we're going to create entries for them, what should we write? 列维劳德 (talk) 00:33, 9 September 2024 (UTC)Reply

I see 좋은 is a redirect although it used to be an entry. Justin the Just (talk) 12:19, 9 September 2024 (UTC)Reply
Honestly, I'd keep them as redirects, but I'll CC the other Korean editors: (Notifying TAKASUGI Shinji, Atitarev, HappyMidnight, Tibidibi, Quadmix77, Kaepoong, The Editor's Apprentice, Saranamd), @Solarkoid. AG202 (talk) 22:34, 10 September 2024 (UTC)Reply
@AG202 I'm not keen on having hard redirects for what are basically non-lemmas. Theknightwho (talk) 22:41, 10 September 2024 (UTC)Reply
@列维劳德: At some stage I created a few determiner forms but they were frowned upon. There are just too many forms, even determiners. It's best to focus on lemmas. It's a similar situation with Japanese verbs and adjectives. Anatoli T. (обсудить/вклад) 04:20, 11 September 2024 (UTC)Reply
[edit]

These don't link to Chinese — for example, the first link on jan4 goes to 人#Cantonese instead of 人#Chinese. Are there any objections to me changing this? -saph668 (usertalkcontribs) 17:55, 9 September 2024 (UTC)Reply

RQ for template editor

I've been working on adding support for dark mode with @Ioaxxere, and there are many small locked templates that are unreadable in dark mode. I'm mainly creating a dark mode color palette using recommendations from MediaWiki. So far it's coming along great, but some templates that need adjustments don't have any classes and can't be adjusted from my CSS page. — BABRtalk 22:16, 9 September 2024 (UTC)Reply

No objections from me. Benwing2 (talk) 04:08, 10 September 2024 (UTC)Reply
@Benwing2 No objections yet. Do you think ~29 hours is enough time or should we wait a bit longer? — BABRtalk 03:09, 11 September 2024 (UTC)Reply
Give it a day or two and if no one says anything I'll add you. Benwing2 (talk) 04:07, 11 September 2024 (UTC)Reply
@Benwing2, it's been about two days now, I think. — BABRtalk 20:05, 12 September 2024 (UTC)Reply
@Babr Thanks for reminding me, you are now a template editor! Benwing2 (talk) 20:08, 12 September 2024 (UTC)Reply

About a new translation parameter for quote-book

For various historical reasons, it was quite common for Albanian books and periodicals to be printed with an Italian translation alongside the Albanian text. These translations are very valuable because they were written by the authors themselves, which tells us more or less exactly what the authors meant by each word they wrote. For this reason I think they should not be omitted whenever these works are quoted, as we would be losing important context. Take for example akull (§ ety 2), where we deduce the otherwise unattested dialectal meaning "arrow" only thanks to the Italian translation of the quote, which is, however, not shown in the entry; the same goes for kacadre. For this reason, on the entries tipos and gëluhë I temporarily used the parameter |origtext=, which does exactly what I'm looking for, except that it prints the label "original", which is wrong here, as the original text is actually the Albanian one.

So I propose creating a parameter |transltext= (and the corresponding |transl... parameters) that would work essentially like |origtext= but with the label "translation". Admittedly this parameter name could look confusing at first, given that we also have |transl= and |text=, but I can't think of a better one. Catonif (talk) 23:18, 9 September 2024 (UTC)Reply

@Catonif I'm thinking maybe instead of generalizing this slightly, so that there would be an |alttext= param (and maybe |alttext2=, etc.) along with a param |altprefix= or something to specify the prefix used for this text. Original text, normalized text, translations, etc. are all instances of "alternative" text that you might want to show, and supporting multiple types of alternative text would let you e.g. put renderings in multiple languages if it's important to do so. For a case where this might be useful, see the example in the {{quote-book}} documentation of Italian text translated from a French translation of the Zhuang-zi. There's no current way of inserting all of the original Old Chinese text, the first-level French translation, the second-level Italian translation and the modern English rendering. Similarly sometimes people have found it important to include multiple renderings in English, e.g. a "poetic" translation and a literal translation; the poetic one could use |alttext=. Pinging @Sgconlaw and @Vininn126 who may have thoughts about this. Benwing2 (talk) 03:15, 12 September 2024 (UTC)Reply
BTW these parameters would actually be added to Module:usex so they are also available to {{ux}}, {{quote}} and the like. Benwing2 (talk) 03:17, 12 September 2024 (UTC)Reply
Hi @Benwing2, to me this sounds like a very sensible solution, you have my full support. Just to be clear, this wouldn't imply the removal of the |norm= parameter, right? Catonif (talk) 08:15, 12 September 2024 (UTC)Reply
@Catonif Right, that param would remain. Benwing2 (talk) 08:24, 12 September 2024 (UTC)Reply
This sounds fine to me. Vininn126 (talk) 09:20, 12 September 2024 (UTC)Reply
I like this. Especially |alttextN=. Thereby we do not need to conceive, or it is agnostic to, why an editor wants multiple texts. Otherwise it becomes intellectually challenging to distinguish the multiple formatting options, new version, old version, original version, translation text and so on. So stick with |text= / |passage= for the entry language’s quote and before- and after-texts the labelling of which the editor can choose, including for |t= in the case that something is to be said about the translation. Fay Freak (talk) 01:14, 13 September 2024 (UTC)Reply
It's useful to have an unambiguous and well defined interpretation for the standard parameters, such as |text=, |norm= or |t=. This makes quotations machine readable. At the same time, the potential great flexibility and freedom of |alttext= makes it great for humans, but possibly difficult to parse for machines. --Ssvb (talk) 04:32, 13 September 2024 (UTC)Reply
@Benwing2: Will this new |alttext= model allow differentiating professional English translations from published books and the English translations provided by Wiktionary contributors?
For example, the Belarusian quotation from бачыць right now lists the English translation done by Mary Mintz in 1989 in the |t= parameter together with the |newversion=English translation from parameter. But the |t= parameter is normally supposed to be used for the translations authored by Wiktionary editors, right? --Ssvb (talk) 04:18, 13 September 2024 (UTC)Reply
I like this, and think it would be useful to have in the language-specific usex templates, too, like {{ja-usex}} which sometimes include translations to multiple languages like on だはんで. Or, migrate the language-specific code from the language templates into Module:usex and use that everywhere. Pinging @Fish bowl who might have some good ideas about this. JeffDoozan (talk) 13:58, 14 September 2024 (UTC)Reply
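To make the proposed |alttext= / |alttext2= / |altprefix= scheme concrete, here is a Python sketch of how numbered alternative texts might be paired with their prefixes and ordered. It is a hypothetical model only: the real change would be made in Lua in Module:usex, and the |altprefix2= naming, the fallback label "alternative text" and the function names are assumptions, not settled parts of the proposal. Keeping each alternative text paired with an explicit prefix also keeps the output predictable for machine readers, as raised above.

```python
import re

ALTTEXT_PARAM = re.compile(r"alttext(\d*)")


def collect_alt_texts(args):
    """Pair each |alttextN= with its |altprefixN=, ordered by N.

    Hypothetical model of the proposed parameters; |alttext= counts as N = 1.
    """
    found = {}
    for name, value in args.items():
        match = ALTTEXT_PARAM.fullmatch(name)
        if match is None or not value:
            continue
        suffix = match.group(1)
        index = int(suffix) if suffix else 1
        # Assumed fallback label when no prefix is supplied.
        prefix = args.get("altprefix" + suffix, "alternative text")
        found[index] = (prefix, value)
    return [found[i] for i in sorted(found)]


def render(args):
    """Quotation text first, then each alternative text, then the gloss in |t=."""
    lines = [args["text"]]
    lines += [f"{prefix}: {text}" for prefix, text in collect_alt_texts(args)]
    if args.get("t"):
        lines.append(args["t"])
    return "\n".join(lines)


example = {
    "text": "<Albanian text>",
    "alttext": "<author's Italian translation>",
    "altprefix": "Italian translation",
    "t": "<English translation>",
}
print(render(example))
```

The placeholders stand in for a real quotation such as the one at akull; a second rendering (for example a literal English translation alongside a poetic one) would simply go in |alttext2=.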

zzxjoanw Definition

Hello,

How can we make a dictionary entry for zzxjoanw? I made a pronunciation recording yesterday, and the word is said to be of Maori origin. It apparently means "drum", "fife", or "conclusion". There is a Wikipedia page for it, and I added my recording to Wikipedia a moment ago. Can anybody attest to it?

Thank you Flame, not lame (talk) 00:34, 10 September 2024 (UTC)Reply

It's a known hoax. Besides which, Polynesian languages like Maori pretty much always end syllables with a vowel, nor do they have double consonants, and Maori doesn't use j, x, or z in native words. Chuck Entz (talk) 04:45, 10 September 2024 (UTC)Reply
@Chuck Entz: You don't generally get b, c, d, f, l, q, s, v and y in Maori either, though the place and lake Waihola (known for its black swans) springs to mind. I'm not sure if 'ng' and 'wh' (pronounced like an f) combined qualify as Maori letters. DonnanZ (talk) 11:20, 12 September 2024 (UTC)Reply
Maybe define "zzxjoanw" as an English word that is a fake Maori word. Flame, not lame (talk) 08:13, 10 September 2024 (UTC)Reply
@Flame, not lame: The pronunciation is at Appendix:English dictionary-only terms/zzxjoanw. J3133 (talk) 05:05, 10 September 2024 (UTC)Reply
Writing as an NZer, we don't need this. There shouldn't be an entry for Waikikamukau either, a fake Maori place name pronounced "why kick a moo cow". DonnanZ (talk) 10:54, 12 September 2024 (UTC)Reply
I can't believe it. "Don't talk to me." 💔 Flame, not lame (talk) 22:14, 12 September 2024 (UTC)Reply
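The phonotactic points made above (no j, x or z; no doubled or clustered consonants; syllables ending in a vowel) can be turned into a quick mechanical check, sketched here in Python. It uses the basic alphabet as described in this thread, with ng and wh as single letters and macron vowels accepted. It is a rough filter, not a full account of Maori phonology, and as DonnanZ's Waihola example shows, a few genuine names fall outside it.

```python
# Rough (C)V check based on the points above; not a full account of Maori
# phonology. Long vowels with macrons are accepted as vowels.
VOWELS = set("aeiouāēīōū")
CONSONANTS = {"h", "k", "m", "n", "p", "r", "t", "w", "ng", "wh"}


def tokenize(word):
    """Split a word into letters, treating the digraphs ng and wh as single letters."""
    word = word.lower()
    tokens, i = [], 0
    while i < len(word):
        if word[i:i + 2] in ("ng", "wh"):
            tokens.append(word[i:i + 2])
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def is_maori_shaped(word):
    """True if every letter is in the basic alphabet, no two consonants are
    adjacent, and the word ends in a vowel."""
    tokens = tokenize(word)
    if not tokens:
        return False
    previous_was_consonant = False
    for token in tokens:
        if token in VOWELS:
            previous_was_consonant = False
        elif token in CONSONANTS:
            if previous_was_consonant:
                return False  # consonant cluster or doubled consonant
            previous_was_consonant = True
        else:
            return False  # letter outside the basic alphabet, e.g. j, x, z
    return not previous_was_consonant  # closed final syllable is not allowed


for word in ("zzxjoanw", "tangata", "whakapapa"):
    print(word, is_maori_shaped(word))
```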

Letter articles without definitions, Latin Extended Additional block

Hi. Per request, I'm consolidating requested deletions here rather than posting 'rfd' multiple times. The following articles have no content, apart from the Unicode name being used in place of a definition. There is no indication of which languages they may be used in, or if they're translingual as claimed by the header. Previous consensus has been that the Unicode name, or a paraphrase of it, does not qualify as a definition.

I'd be interested in seeing what these letters are used for, if anyone can document them, but I've failed to find anything myself.

Thanks. kwami (talk) 07:25, 10 September 2024 (UTC)Reply

Latin: words with suffixes (and prefixes?)

I couldn't find habitasne. I eventually decoded it as habitās + -ne. @Nicodene has advised that words like habitasne do not merit a separate entry, but I feel it's worth a broader discussion, expanding on the original conversation.


I wouldn't mind so much not having a separate entry for words like habitasne if there were an easier way to find them. For example, if the search results could be coerced into showing something more useful, and/or if it could be presented (perhaps not hyperlinked) in a table of conjugations or similar. Having the word appear somewhere within the main verb entry could presumably help the search page to list it too — it seems to work this way for bailémosles, which doesn't have its own entry, but is mentioned within the bailar entry.

Slightly off-topic: is it possible to force the search to only look within a specified language or languages? For example, if the user 'knows' that the word is Latin, then can they avoid being shown results from English and other irrelevant languages?

A table of conjugations can perhaps (?) be automatically generated (and manually edited where necessary), and minimised by default. I'm thinking of the Spanish verb conjugation tables, which include not just the 'simple' conjugations, but also every (?) combination with pronoun suffixes too, such as in the Selected combined forms of bailar table: many words in the table do have their own entries too (such as bailarme and bailándolas), and are correspondingly linked, but also many — presumably less common — words don't (such as bailémosles and báilenla), which are somehow redlinked without being coloured red in that table. The convention adopted for Spanish verbs must surely have expanded the number of entries, but was still deemed worthwhile.

For people familiar with Latin it may seem trivial. But suppose I encounter a hypothetical new Latin word *previviruminsmaste. I would rather not have to 'manually' try various combinations in order to decode it: for example, would it be defined in one single entry (previviruminsmaste), or across two entries (pre + viviruminsmaste or previ + viruminsmaste or previviruminsmas + te or previvirumins + maste etc.), or across three entries (pre + viviruminsmas + te or previ + virumins + maste etc.)?

Alternatively, if there is no desire to add conjugation tables or similar to each main verb entry, how about adding a link to a grammar page listing various Latin suffixes (and prefixes?) that can commonly be applied?

—DIV (2001:8004:44F0:5AD4:F864:E084:92A9:DBF4 00:51, 11 September 2024 (UTC))Reply

The problem is that -ne belongs to a class of "enclitic" particles that can in principle be suffixed to just about any Latin word, not just verbs. See the discussion at Talk:fasque#Deletion_debate. It would be unwieldy and not very useful to display all of these forms even in a collapsed table. Ideally we would have redirects, but one of the many structural flaws of Wiktionary as a MediaWiki-based multilingual dictionary is that it doesn't make it easy to set up automatic redirects for cases like this.--Urszag (talk) 01:01, 11 September 2024 (UTC)Reply
One can vaguely dream of a search feature which, in the event of a query finding no results, displays results matching that query minus various common clitics. Nicodene (talk) 01:24, 11 September 2024 (UTC)Reply
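A minimal version of that fallback, stripping each known enclitic and checking whether the remainder is an existing page title, might look like the Python sketch below. The enclitic list and the set of page titles are hypothetical stand-ins; a real gadget would need per-language clitic lists and access to the actual page index.

```python
# Assumed enclitic list for Latin; a real gadget would need per-language lists.
LATIN_ENCLITICS = ("que", "ne", "ve")


def clitic_fallback(query, known_pages, enclitics=LATIN_ENCLITICS):
    """Return (page, enclitic) pairs for a query; enclitic is None on a direct hit."""
    if query in known_pages:
        return [(query, None)]
    matches = []
    for enclitic in enclitics:
        if query.endswith(enclitic) and len(query) > len(enclitic):
            host = query[: -len(enclitic)]
            if host in known_pages:
                matches.append((host, "-" + enclitic))
    return matches


pages = {"habitas", "fas"}  # hypothetical set of existing page titles
print(clitic_fallback("habitasne", pages))  # [('habitas', '-ne')]
print(clitic_fallback("fasque", pages))     # [('fas', '-que')]
```

Because page titles for Latin omit macrons, habitās lives at "habitas", so plain string stripping is enough in this simple case; stacked clitics or the Arabic prefixes discussed below would need a more elaborate segmenter.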
Many languages have similar issues; e.g. we don't lemmatize every word with 's added. Arabic single-letter conjunctions and prepositions regularly cliticize onto the next word, which can momentarily trip up even native speakers, e.g. Fenakhay was looking for the rare verb اِسْتَسْأَلَ (istasʔala) and momentarily interpreted the form أستسألني as a form of this verb أَسْتَسْأَلُنِي (ʔastasʔalunī) (maybe "I ask myself", although this may be ungrammatical) when it's actually أَ (ʔa) (which marks a yes-no question) + سَ (sa) (future-tense prefix) + تَسْأَلُ (tasʔalu, you ask) + نِي (nī, me), i.e. "will you ask me?". The problem, as noted by User:Urszag, is that many types of clitics can be added onto pretty much any word, e.g. -que, -ne and -ve attach to the first word, whatever it is. Potentially we could write a JavaScript add-on that helps with this; we already have JavaScript gadgets that auto-redirect certain forms to certain other forms, and so it's not out of the question to write a gadget that tries to analyze a nonexistent word into its components, as User:Nicodene notes. But this would have to be quite complex and ideally would have machine learning attached to figure out the likely language and do the segmentation. Benwing2 (talk) 04:05, 11 September 2024 (UTC)Reply
There was a discussion a while ago about having our templates (modules), whenever they transliterate an Arabic word, output all the common methods of transliteration of Arabic, but then set them [except the Wiktionary Standard one] to "display: none" (or hide them in HTML comments), so that searching for them in our on-site search engine, or on Google, will find the relevant pages, without the extra content actually being displayed to readers of the entry itself. I'm not sure whether anyone got around to implementing that, but the same basic idea suggests itself to me here: have Latin-specific headword or inflection templates produce (but not display) text like "suffixed with -ne: foobarne, suffixed with -que: foobarque", etc, so that searches for those things find the relevant entry (and ideally the 'results snippet' even includes the "suffixed with..." text so the reader can work out what's going on). (Edited to add: the discussion was in December 2022.) - -sche (discuss) 19:16, 11 September 2024 (UTC)Reply
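The hidden-text idea could be sketched as follows in Python; the real implementation would live in the Lua headword or inflection modules, and the function names here are hypothetical. Whether on-site search and Google actually index text hidden with display: none is an open question this sketch does not settle.

```python
import html


def hidden_search_text(forms, enclitics=("que", "ne", "ve")):
    """Build text like 'suffixed with -ne: foobarne' for each enclitic."""
    parts = []
    for enclitic in enclitics:
        joined = ", ".join(form + enclitic for form in forms)
        parts.append(f"suffixed with -{enclitic}: {joined}")
    return "; ".join(parts)


def as_hidden_span(text):
    """Wrap text in an HTML span hidden from readers (display: none)."""
    return f'<span style="display: none">{html.escape(text)}</span>'


print(as_hidden_span(hidden_search_text(["habitas"], ("ne",))))
# <span style="display: none">suffixed with -ne: habitasne</span>
```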
Other dictionaries of course do these automatic redirects, since the user chooses his language in the search form. MediaWiki is not built for languages, though we may be better than the other dictionaries by virtue of the other things MediaWiki is designed for. One probably does need some frequency data for any target language, if not so-called machine learning, which sounds so undebuggable as to be out of scope for a wiki. What is needed is a list of relevant clitics and the mechanisms by which they attach, editable by trusted editors, plus a toggle for the searcher to restrict results to a specific language. I rather believe in someone just doing it with an external site … Fay Freak (talk) 19:46, 11 September 2024 (UTC)Reply

I’m not a TTS

Last month @Fytcha deleted my contributions to Wiktionary accusing my voice recordings of being “computer-generated”. They are not. I request them to be brought back. JapanYoshi (talk) 00:23, 13 September 2024 (UTC)Reply

Your audio files have weird singsong intonation and missing or mispronounced sounds. Whether they're generated by a computer or by a human, we would rather have audio generated by people who speak the language well. - -sche (discuss) 01:34, 13 September 2024 (UTC)Reply
@-sche: There's no missing sound. The /p/ at the end of poop in poop pipe is [p̚] which is perfectly valid in GA (in fact, [p] would sound unnatural). With that said this one sounds a little non-native to me. I don't think JapanYoshi is using TTS, unless it's some kind of super-advanced AI that sounds exactly like a human, in which case I wouldn't be opposed to it anyway. Ioaxxere (talk) 06:39, 14 September 2024 (UTC)Reply
I hope this thread is not used in the future to assert consensus for the use of TTS. I'd be strongly opposed. The takeaway is that JapanYoshi's audios may not be TTS but they are incredibly unnatural sounding. Vininn126 (talk) 09:13, 14 September 2024 (UTC)Reply
I don't get the sense that the audios were recorded by someone who doesn't speak the language well. I think any unnaturalness is simply due to over-enunciation. So the solution is very simple: we just explain to JapanYoshi that we want audio files to sound as natural as possible, not over-enunciated, and if we find their speech a little unusual, then it should be labelled with the appropriate region. Of course, if JapanYoshi does have an idiosyncratic accent, then we might consider letting them know that it's important for the audio to be fairly representative of a given accent. But let's not jump all over this person and their contributions without even welcoming them to Wiktionary and giving them a little guidance! Andrew Sheedy (talk) 16:30, 14 September 2024 (UTC)Reply
As I said in the other thread, there's a certain type of pronunciation that should be expected in a dictionary. You can look at the OED, Merriam-Webster, Dictionary.com, etc. to find it. Even if they may sound like TTS (or be TTS), they still have a very specific and clear intonation & enunciation that's been missing from your audios. Remember that audios are meant to illustrate a standard pronunciation and are especially used by non-natives for learning/illustration purposes. Honestly, I really do think that we should have an explicit policy as to what should be expected from audios reflecting standard speech, as a lot of recent ones have been problematic for various reasons. I've been getting frustrated with the less-than-par quality that we've been allowing recently, and I also really do not have the time to go through each one, removing the ones that sound noticeably non-standard. CC: @-sche, @Benwing2, @Theknightwho. AG202 (talk) 17:29, 14 September 2024 (UTC)Reply
@AG202 I agree with you about the need for good-quality audio files but would an explicit policy accomplish anything? I assume the people who are uploading bad audio files are unlikely to read the policy even if it's in their face. Maybe a more relevant issue IMO concerns User:DerbethBot, which auto-uploads audio files. I have encountered lots of problems due to this; some of them are old but I don't know if they are all fixed and I can't tell from looking over the github source code because it isn't well documented. As an example, all audio files for any Arabic-script and Hebrew-script terms should be blacklisted because the scripts are underspecified. Yet there are definitely lots of Arabic script audio files present that have been uploaded by DerbethBot, and I don't know whether this can still happen. Benwing2 (talk) 20:30, 14 September 2024 (UTC)Reply
@Benwing2: An explicit policy would at least give us something to point to if we added some kind of filter or warning. It would also give less wiggle room to argue about what is and isn't "good audio", as we've seen happen time and time again. Re: bots like User:DerbethBot: as stated on their page, "Administrators: if this bot is malfunctioning or causing harm, please block it." If it's been seen time and time again that this bot is problematic in the way it adds audios, why hasn't it been blocked at any point since 2008? Honestly, it baffles me a bit. AG202 (talk) 00:45, 15 September 2024 (UTC)Reply
@AG202 Because I don't know if the bot is still causing issues; I'm waiting for User:Derbeth to respond. I feel somewhat uncomfortable about the idea of a bot that auto-adds audio files without any way of verifying the correctness of the actual content, but not enough to insist on shutting it down outright. Benwing2 (talk) 01:15, 15 September 2024 (UTC)Reply
Doesn't that bot operate based on a whitelist? Vininn126 (talk) 06:26, 15 September 2024 (UTC)Reply
I don't think so, maybe it has whitelists for sites but generally it uses blacklists I think. Benwing2 (talk) 06:29, 15 September 2024 (UTC)Reply
I think that a workable solution would be to implement some sort of a voting system at https://lingualibre.org/ (the project under the Wikimedia Foundation's umbrella?) and grade audio recordings based on that. The downvoted audios or their submitters could be blacklisted. But the problem is that not all words even have audio recordings available, and beggars can't be choosers: https://lingualibre.org/wiki/List:Eng/Lemmas-without-audio-sorted-by-number-of-wiktionaries --Ssvb (talk) 18:06, 15 September 2024 (UTC)Reply
(Previous discussion: Wiktionary:Beer parlour/2024/August § Synthesised audio files (again))
@JapanYoshi: Hey. I'm willing to take you at your word that your audio files are not computer-generated and I therefore apologize for my mistake. When I read the BP thread I linked to above, I reviewed your audio files and found there to be multiple issues with them. On top of that, some files, especially File:En-fucking Nora.ogg, sounded "robotic" to me; to my ears, there's something weird going on in the higher frequencies of that file, exactly the same kind of artifact often encountered in synthesized audio files. However, that could just as well be your recording equipment or maybe I'm just mishearing things.
All that said, even if they are not the product of speech synthesis software, there are still issues to be addressed with your audio files, as others in this discussion have also pointed out. I won't touch your audio files anymore and I wouldn't revert you if you revert my removal. — Fytcha T | L | C 19:58, 14 September 2024 (UTC)Reply
Having just noticed this, I'm going to block the user for unacceptable conduct (bigotry). For my part, I set the block length to indef because my feeling is that a user whose 31 edits are "queer people / theorists advocate pedophilia and bestiality" and unnatural audio and defense thereof (which, in light of the other thing, someone could even speculate could be trolling) is NOTHERE, but if the user makes an unblock request that persuades another admin that they should be unblocked, I (of course) won't stand in the way of that. - -sche (discuss) 21:28, 14 September 2024 (UTC)Reply
WTF, -sche? You blocked them for asking a question "why don't we have such an entry?" I'm disappointed. Denazz (talk) 21:39, 14 September 2024 (UTC)Reply
Yikes on their part. AG202 (talk) 00:46, 15 September 2024 (UTC)Reply
I oppose an indefinite block. I don't really understand what they were getting at with that question you blocked them for, but I really don't understand what happened to assuming good faith. It seems we assume bad faith now until a user can prove otherwise. Andrew Sheedy (talk) 04:00, 15 September 2024 (UTC)Reply
Well, it sure reads to me like that post is trolling. However, if they ask for an unblock and can provide a reasonable explanation why their post was not trolling, I would not be opposed to reducing the block to e.g. a month. Benwing2 (talk) 04:08, 15 September 2024 (UTC)Reply
Well, can I ask for an unblock on their behalf? I assume they were asking why we were missing a term, albeit a vulgar/offensive one. Sure, it wasn't a very clear request, but it wasn't blockworthy. Like if I asked "I heard 'you are a honky donkey' in a film used against a white person from Ottawa. Can we add it?" Denazz (talk) 07:21, 15 September 2024 (UTC)Reply
No, it's very obvious what type of place they're coming from. Judith Butler has never argued for those topics, nor is she their advocate, and anyone who's seriously taken a look at queer theory would know that those topics are not a part of it. Don't be obtuse, saying that they're just asking "why don't we have xyz entry". It's at best trolling and at worst bigotry to equate queer theory and queerness with those unrelated repulsive topics. I'm disappointed but not surprised. AG202 (talk) 15:40, 15 September 2024 (UTC)Reply
I know little of LGBT-studies, and their stance on the issue is not my concern. The user deserves another chance. BTW, if anything creative comes out the discussion, let it be the creation of q***r! Denazz (talk) 07:26, 18 September 2024 (UTC)Reply

zh-pron order

Hey, I think Wade-Giles should appear above Tongyong Pinyin in Template:zh-pron, on the basis of wider usage of Wade in personal names, historically, etc. I don't know where to make that change in the code but I think it would be non-controversial. It seems too minor for beer parlor and grease pit. If anyone that sees this could do that, I would appreciate it, or tell me what to do and where could I take this. Thanks. --Geographyinitiative (talk) 22:58, 13 September 2024 (UTC)Reply

I'm not sure if this change should be done. Pinyin is, to my understanding, more widely used in transliterating Chinese than Wade-Giles presently, so it should go first. CitationsFreak (talk) 00:17, 14 September 2024 (UTC)Reply
Hanyu Pinyin, not Tongyong Pinyin. MuDavid 栘𩿠 (talk) 00:51, 14 September 2024 (UTC)Reply
Oh yeah, of course Hanyu Pinyin is on top, I'm just saying that it should be: Hanyu Pinyin, then Bopomofo (so-called "Zhuyin"), then Wade-Giles, then Tongyong Pinyin, then the rest. I think that's more in line with volume of usage of those systems. --Geographyinitiative (talk) 09:29, 14 September 2024 (UTC)Reply

Stormy Weather 🌧️

so I'm letting you know I'm probably going to disappear because I'm not happy and I feel so remorseful because I write stupid, idiotic, asinine edit summaries. I'm such an annoying girl with my low-quality audios and painful voice. my life is so stressful. I'm crying as I type this message, and I don't know when I'll stop.

I'm so, so sorry! I really am! "Don't talk to me." 💔 Flame, not lame (talk) 14:38, 14 September 2024 (UTC)Reply

Hehe, don't worry about things so much, Flamey. The audios are cool, we're mostly happy with them - the thing is, there'll always be more audios to record, it is an impossible project to complete. Oh, and in general edit summaries are, like, the least important aspect of amateur lexicography ever.... One more thing: we all have our moments of wanting to quit WT. Maybe take a break, focus on your studies, whatever. Hopefully see you back soon :) Denazz (talk) 15:47, 14 September 2024 (UTC)Reply
I know edit summaries aren't very important as long as my edits themselves (mainly audios) are not vandalism. also if I'm not promoting bigotry. I don't plan on quitting WT, but I'm really stressed out. "Don't talk to me." 💔 Flame, not lame (talk) 17:20, 14 September 2024 (UTC)Reply
@Flame, not lame: I think people tend to have more dramatic ups and downs when they're young, so it's understandable that you're feeling this way, but don't worry about any of that stuff. Everyone gets a bit of criticism here and there, but really, we're happy to have the help. By all means, take a break, but feel free to come back whenever and we'll be happy to have you back. Andrew Sheedy (talk) 16:26, 14 September 2024 (UTC)Reply
"Give it some time. You're young. It will pass."
I will remember that. even though I'm really, really, really dealing with pain right now, I know I'm going to give into temptation and edit again, but I felt the need to put an apology online because I don't want to get myself in trouble for an innocent mistake. "Don't talk to me." 💔 Flame, not lame (talk) 17:21, 14 September 2024 (UTC)Reply
It won't be long before this will be a minor bump in the road you can barely see in the rear-view mirror. That's not to say you won't run into more, but don't worry about this one.
If you're really, really embarrassed by your edit summaries, an admin can hide them. Normally we just do that when there's information that might cause harm, to deprive vandals or spammers of an audience, or to not give ideas to future vandals.
In this case, since these edit summaries are completely unrelated to the edits, there's nothing that needs to be kept for the record, so I see no problem with just hiding them in the same way we delete entries that the creators decide are mistakes (as long as no one else has edited them). Chuck Entz (talk) 01:24, 15 September 2024 (UTC)Reply
I'm trying to move on. Great to see I'm not in a bunch of trouble.
I'm not embarrassed about my edit summaries, but I am trying to avoid getting myself in trouble or accused of vandalism. In reality, I am submitting useful audios but adding humor. I now add /* Pronunciation */ to the beginning of summaries followed by a joke so people know I am a sincere editor. Anyway, it is absurd how quick some people are to pronounce me a vandal without clicking on the timestamp or looking at the "diff" in the history. "Don't talk to me." 💔 Flame, not lame (talk) 01:33, 15 September 2024 (UTC)Reply
You'll probably be embarrassed in a few years, Flamey!!! Wonderfool, at your age, kept on edit-summarying about themselves, oh how silly they were. Denazz (talk) 07:25, 15 September 2024 (UTC)Reply
Yesternight I showed my mom how to put audio on Wiktionary, and I did include /* Pronunciation */ in the edit summary followed by a joke. "Don't talk to me." 💔 Flame, not lame (talk) 09:05, 15 September 2024 (UTC)Reply
"At your age"
I don't share my age. "Don't talk to me." 💔 Flame, not lame (talk) 13:28, 15 September 2024 (UTC)Reply
Let me also say that I appreciate your edits and while your edit summaries are silly, the content you've added is great. I hope you continue to contribute here. —Justin (koavf)TCM 15:47, 15 September 2024 (UTC)Reply
You're such a sweetie. "Don't talk to me." 💔 Flame, not lame (talk) 15:51, 15 September 2024 (UTC)Reply
Takes one to know one. Keep up the good work, friend. Onwards and upwards, please. —Justin (koavf)TCM 15:52, 15 September 2024 (UTC)Reply
yes "Don't talk to me." 💔 Flame, not lame (talk) 16:41, 15 September 2024 (UTC)Reply

Archiving failure at payback's a bitch

This deleted entry was just recreated, and I went to the talk page to see the details of the deletion. The only thing on the talk page was a discussion from January of 2013 where Equinox and I explained that the {{rfd}} template had to stay because its deletion was still being discussed (that put the entry on my watchlist, which is how I found out it was recreated). The page itself was deleted in April of 2013 as having failed RFD, but no mention of that made it onto the talk page. It was only after wading through the revision history of WT:RFD from 2013 that I was able to find that it had been added to the RFD for life's a bitch, and I found the discussion was archived at Talk:life's a bitch.

How should we handle this? The new definition is odd and probably wrong, but not SOP. I'm not really sure what to do with the talk page so the previous RFD is reflected there, and right now it seems a bit off to tell someone that their good-faith page creation is going to be deleted because of something not explained anywhere findable from the entry or its talk page. If someone wants to retag the new entry and send it to RFDE, that might help. Chuck Entz (talk) 00:51, 15 September 2024 (UTC)Reply

Should personal attacks be restored?

Let's say that Alice and Bob get into a heated personal argument. Insults lobbed, caps lock engaged, lawyers threatened, the whole nine yards. But then Caroline decides to hide the comments made by Alice and Bob. The question is, should the comments be restored?

My understanding of the policy is that they should be, since it's common etiquette not to edit others' comments. However, it does feel wrong to leave such incivility out in the open. CitationsFreak (talk) 07:56, 15 September 2024 (UTC)Reply

I say they should be if only lest discords be continued by editing others’ comments. Any bidding to hide or keep hidden exaggerated argumentative behaviour specifically – as opposed to e.g. data protection violations where findability in machine searches has to be considered, which loses relevance in so far as pseudonymity is kept, and circular arguments of editors that have been hidden by a click-to-open box in ultimately inoffensive cases due to the otherwise unpalatable bulk of the discussion – sets a perverse incentive.
Empirically, it has also been shown that the cases occurring on Wiktionary – I won’t point at the cases – would have fared better if people had gotten over the matter by just remembering they are not the main character and not reading any further, instead of getting heated in edit summaries.
The probability of success in forgetting such microtraumata without even any cognitive reframing effort is great, while only complexity is engaged by hiding their agents in edit histories. Fay Freak (talk) 09:17, 15 September 2024 (UTC)Reply
I think hiding, but not deleting, is fine. I'd favor doing so whenever motives, values, attitudes, or beliefs are attributed to a user. DCDuring (talk) 17:34, 15 September 2024 (UTC)Reply

Request to change libelous block statement

Hello, I am User:Gapazoid. I was blocked by User:Surjection for an edit to MAP that promoted the idea that there are people who are attracted to minors and are against inappropriate adult-minor relationships (so-called "anti-contact pedophiles" or "non-offending pedophiles"). Furthermore, I identified myself as being both attracted to minors and fundamentally against any sexual contact between adults and minors.

The stated reason for my block is "Unnacceptable conduct: pedophilia advocacy". To the average person, "pedophilia advocacy" implies that I was promoting inappropriate adult-minor relationships, which is not the case, as I have repeatedly explained. It is very important to me that the public is not misinformed about my values. In my discussion with Surjection on their talk page, they stated that the exact reason for my block was "promoting the idea that there are non-offending pedophiles". This statement I have no issue with, as it is truthful and cannot be misconstrued.

Therefore, I am requesting that the stated reason for my block be changed from "Unacceptable conduct: pedophilia advocacy" to "Unacceptable conduct: Promoting the idea that there are non-offending pedophiles". 76.35.75.228 18:28, 15 September 2024 (UTC)Reply

Quotations asserting dubious, clickbaity, controversial, deceptive or sometimes even blatantly false statements or conjectures

Today an IP user made this edit, adding a quotation about Putin allegedly admitting that he is "hunting" his opponents. To put it mildly, I'm not a fan of Putin, but I personally think that the concept of "hunting" can be better illustrated using some other quotations, which don't mention Putin, Biden, Trump or any other modern politician. And also without trying to promote any political, religious, ideological or corporate narratives. I checked the WT:QUOTE#Choosing_quotations policy and it doesn't say anything on this topic. Does Wiktionary need some kind of a formal rule to prevent it from turning into an outlet of propaganda for various modern ideologies, hoaxes and conspiracy theories?

That said, really old books may contain quotations, which mention obsolete morally questionable barbaric traditions or some outdated unscientific information. Do they get a free pass on the basis of their age? --Ssvb (talk) 20:22, 15 September 2024 (UTC)Reply

Adding a controversial quote to an entry for a non-controversial term that doesn't need the controversial aspect to illustrate the meaning is definitely a violation of WT:NPOV. I'm not talking about politically charged or bigoted terms, which would have bigoted or politically charged usage that we would be wrong not to document. I'm talking about converting an entry for an ordinary term into a political statement by adding a quote that illustrates something political that doesn't need to be there to understand the term. Sure, there are lots of really outrageous things people have said that happen to contain the word "the", but that's not a reason to add them to the entry for "the" as quotes or usage examples.
I've reverted and hidden the edit that added that quote. Chuck Entz (talk) 20:50, 15 September 2024 (UTC)Reply
It is not a controversial term but its meaning range has to be described in a controversial context: “hunting” in the figurative sense always involves various kinds of, well, predatory behaviour designed to offend. In other words not the term, but the meanings being charged is enough for the quotes to be charged. Only that the Belarusian entry has not been that elaborate. It should get one gloss or at least quote that is normal and then the user might add propaganda, with some hyperbole.
I cannot agree with the general rule given that I normally browse political websites and find quotes there about specific people or brands that everyone relates to, even find them when I specifically search terms, because that’s what publications are about since humans are social animals, and if you aspire to be in a leading political office, the more so as an eternal leader, you have to bear being an example of everything. I would be more concerned if the shown example skewed the usage already, like the anti-abortion propaganda redefining infant the other day. Fay Freak (talk) 18:29, 16 September 2024 (UTC)Reply

templatizing raw category references more generally

We have a mechanism that automatically tracks raw category references. See subcategories of Category:Entries with language name categories using raw markup by language and Category:Entries with topic categories using raw markup by language. I have a script to templatize these into calls to {{cln}} and {{C}}, respectively, grouping multiple categories into a single call as much as possible. I have run the script on all the pages in various individual common languages (e.g. Spanish, French, German, Russian, Italian, several others) but not generally. As discussed in Wiktionary:Grease_pit/2024/September#user_sandboxes_showing_up_in_categories and elsewhere, there are two major disadvantages in using raw category references: (1) sort keys are not properly generated for languages that have custom sort orders, meaning that the pages are wrongly ordered in the categories in question; (2) transclusions of pages containing raw category references into userspace pages result in the userspace pages wrongly showing up in the categories in question. My script has been extensively tested through its use on the various common languages mentioned above. I propose running the script on all the pages in all languages under the above maintenance categories; or maybe, on all but certain languages (e.g. excluding English if people object for some reason to this). The only downside I can think of is a slight increase in Lua memory usage, which could theoretically push some pages over the limit if they're close to it; but AFAIK, with the increase in memory limits late last year, no pages are close to the current limit. Instead, we're hitting the template expansion limits before the memory limits, and I don't think the use of Lua-based templates here will increase the template expansion size (and in any case, the number of such categories on a given page is usually small). 
If for some reason any given page ends up hitting a limit as a result of this, we can back out the changes for that particular page, whitelist it in the code that generates the contents of the subpages of Category:Entries with language name categories using raw markup by language etc., and blacklist it in my templatize-categories script so a future run won't affect it. Benwing2 (talk) 23:12, 15 September 2024 (UTC)Reply

I personally am in favor of this; it would save me a lot of time and energy. -BRAINULATOR9 (TALK) 03:48, 17 September 2024 (UTC)Reply
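For readers unfamiliar with what such a templatizing pass does, here is a minimal Python sketch. It is not Benwing2's actual script; it assumes it is handed the wikitext of a single language section plus that language's code and canonical name, rewrites unsorted raw links of the form [[Category:Spanish X]] into one {{cln}} call and [[Category:es:X]] into one {{C}} call, and leaves links with an explicit sort key, or belonging to other languages, untouched. The function name `templatize` is hypothetical.

```python
import re

# Raw category link WITHOUT a sort key, plus an optional trailing newline.
# Links like [[Category:Spanish nouns|key]] do not match and are left alone.
RAW_CAT = re.compile(r"\[\[Category:([^\]|]+)\]\]\n?")
PLACEHOLDER = "\x00"  # marks where a matched link was removed


def templatize(text: str, lang_code: str, lang_name: str) -> str:
    """Group raw language-name and topic categories into {{cln}} and {{C}} calls.

    Assumes `text` is one language section (e.g. only the ==Spanish== part).
    """
    cln_cats: list[str] = []    # from [[Category:Spanish X]]
    topic_cats: list[str] = []  # from [[Category:es:X]]

    def collect(m: re.Match) -> str:
        cat = m.group(1).strip()
        if cat.startswith(lang_code + ":"):
            topic_cats.append(cat[len(lang_code) + 1:])
        elif cat.startswith(lang_name + " "):
            cln_cats.append(cat[len(lang_name) + 1:])
        else:
            return m.group(0)  # unrelated category: keep the raw link
        return PLACEHOLDER

    out = RAW_CAT.sub(collect, text)
    if PLACEHOLDER not in out:
        return text  # nothing to templatize

    calls = []
    if cln_cats:
        calls.append("{{cln|" + lang_code + "|" + "|".join(cln_cats) + "}}")
    if topic_cats:
        calls.append("{{C|" + lang_code + "|" + "|".join(topic_cats) + "}}")
    block = "\n".join(calls) + "\n"

    # Put the grouped calls where the first raw link was; drop the other markers.
    out = out.replace(PLACEHOLDER, block, 1)
    return out.replace(PLACEHOLDER, "")
```

Grouping everything into at most two calls per section mirrors the "as few calls as possible" goal described above, and routing categories through the templates is what gives them language-aware sort keys.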

{{senseno}} is a nasty hack created by IP 70.* (who seems to have disappeared). Per discussion with User:Vininn126, I am planning on renaming it to {{senselink}} and not having it display the sense number by default, because I don't believe this is the best way of referring to senses; instead, senses should be named using a summary of the meaning or by POS. Also, the code in Module:senseno to determine the number of the sense is really terrible. Besides renaming it, I plan to clean it up and add a required param to specify either a meaning summary (arbitrary text), a request for the sense number (maybe +# or something) or a part of speech (maybe pos:POS or something). As an example of what I mean, under raz you have:

{{senseno|pl|counting|uc=1|pos=numeral}} is a generalization of {{senseno|pl|instance|pos=noun}}, for which see {{cog|zlw-opl|raz}}. For this use, compare {{cog|ru|раз}}. {{senseno|pl|case|uc=1|pos=noun}} is a semantic narrowing of {{senseno|pl|instance|pos=noun}}.

which displays as

Sense 1 is a generalization of sense 1, for which see Old Polish raz. For this use, compare Russian раз (raz). Sense 1 is a semantic narrowing of sense 1.

This is really awful. Much better would be "the sense one is a generalization of (one) time, for which see (etc.)", which would make a lot more sense. Benwing2 (talk) 06:13, 16 September 2024 (UTC)Reply

Anything for more specificity would be preferable. Vininn126 (talk) 06:19, 16 September 2024 (UTC)Reply
No objection to adding an option for a gloss (which I already add manually), but I'm not sure it's a good idea to remove the ability to display a sense number altogether. If an entry has many senses, having the sense number may be a quick way of determining which is the relevant sense referred to. Could you provide an example of how the template would display if the sense number is removed?
On a separate note, perhaps there should also be options for adding the etymology number (for example "etymology 1, sense 2") and the part of speech ("noun sense 1"), though I'm not sure how this would be displayed if the sense number is ultimately removed as proposed. — Sgconlaw (talk) 19:43, 16 September 2024 (UTC)Reply
@Sgconlaw I'm not suggesting removing the ability to display a sense number, but just requiring that if the user wants a sense number, they request it explicitly using e.g. {{senselink|pl|#}} or {{senselink|pl|+#}}, instead of having the template default to displaying a sense number. Basically the second param is required and specifies either a short gloss, a request for a sense number or a part of speech. Benwing2 (talk) 19:54, 16 September 2024 (UTC)Reply
@Benwing2: ah, I see. Could you provide some mock-ups of how the template will display if different parameters (sense number, gloss, part of speech, etc.) are used? — Sgconlaw (talk) 20:24, 16 September 2024 (UTC)Reply
@Sgconlaw
  • {{senselink|pl|#}} displays as "sense 1" or whatever, linked appropriately
  • {{senselink|pl|pos:noun}} displays as "the noun sense", linked appropriately
  • {{senselink|pl|(one) time}} displays as "(one) time", linked appropriately; alternatively it could display as "the sense (one) time", but then you'd need a way of suppressing the words "the sense"
If you think it would be useful, I can provide syntax to include the etymology number; maybe {{senselink|pl|##}} displays as "etymology 1, sense 2" or whatever. Benwing2 (talk) 21:41, 16 September 2024 (UTC)Reply
I think the template was designed for use in image captions, where brevity is valuable. I'm not hugely supportive of any changes, especially if it's going to introduce yet more of this bespoke syntax ("funny characters to memorise") that seems to be in vogue these days. This, that and the other (talk) 01:49, 17 September 2024 (UTC)Reply
@This, that and the other The problem is that this template is not only used for image captions. Would you rather have two different templates, one for image captions and another for etymology sections? That seems a worse solution. Benwing2 (talk) 02:02, 17 September 2024 (UTC)Reply
@Benwing2 surely it wouldn't be too much trouble to have {{senseno}} to generate a sense number, and {{senselink}} to generate a sense link? Obviously the situation at raz is silly, but it's hardly the fault of {{senseno}} that it's being forced to do bad things (I certainly would not have attempted to use it on an entry with more than one etymology section!). Anyway maybe I need to think about this some more. This, that and the other (talk) 02:09, 17 September 2024 (UTC)Reply
@Benwing2: One feature that I would like is a |t= parameter like our other templates. So you would be able to do something like Sense 2 ("something something") comes from [...] without having to hack it together in wikitext. It could be generated automatically as well but in some cases you want to shorten the gloss a bit. Ioaxxere (talk) 03:44, 17 September 2024 (UTC)Reply
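To make the proposed required second parameter concrete, here is a minimal Python sketch of how it could be dispatched, following Benwing2's mock-ups above: # for a sense number, ## for etymology plus sense number, pos:POS for a part of speech, and anything else as a free-text gloss. It is not Module:senseno; the sense and etymology numbers are assumed to be supplied by the caller (the real module would have to work them out from the page), linking is not modelled, and `senselink_label` is a hypothetical name.

```python
from typing import Optional


def senselink_label(spec: str,
                    sense_no: Optional[int] = None,
                    etym_no: Optional[int] = None) -> str:
    """Return the display text for a hypothetical {{senselink|LANG|spec}} call."""
    if not spec:
        raise ValueError("the second parameter is required")
    # Check "##" before "#" so the longer request wins.
    if spec == "##":
        if sense_no is None or etym_no is None:
            raise ValueError("'##' needs both an etymology and a sense number")
        return f"etymology {etym_no}, sense {sense_no}"
    if spec == "#":
        if sense_no is None:
            raise ValueError("'#' needs a sense number")
        return f"sense {sense_no}"
    if spec.startswith("pos:"):
        return f"the {spec[len('pos:'):]} sense"
    return spec  # free-text gloss such as "(one) time", shown as given
```

Making the parameter required means the bare sense number is no longer what you get by accident; an editor has to ask for it explicitly.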

That pesky dot in Template:pedia

The dot at the end of {{pedia}} is an absolute nuisance. I suppose it was put there because most of our reference templates end with a dot, but (a) Wikipedia isn't a reference, and (b) the template is so frequently used in running text that the dot becomes a pain to deal with. If you want to write a sentence like "See {{pedia|Something}}", you have to remember to leave off the dot at the end, because the template adds it for you!

Would there be support for a bot run to add a dot after every instance of {{pedia}}, after which we can remove the dot itself from the template? The same would need to be done with the other bolded, logo-adorned Wikimedia project link templates like {{specieslite}}. This, that and the other (talk) 01:47, 17 September 2024 (UTC)Reply

Yes definitely. Benwing2 (talk) 02:03, 17 September 2024 (UTC)Reply
I frequently use {{pedia}} for references, but I have no objection to the removal of the dot from the template. There are other templates it could be removed from too. DonnanZ (talk) 08:19, 18 September 2024 (UTC)Reply
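The proposed bot run is mechanical: append a dot after each call so that the rendered output stays the same once the template stops emitting its own. Here is a minimal Python sketch, assuming the wikitext is processed as a plain string. `add_trailing_dots` is a hypothetical name, and calls that contain nested templates or use a different capitalisation ({{Pedia}}) are deliberately skipped and would need manual review.

```python
import re


def add_trailing_dots(text: str, names: tuple[str, ...] = ("pedia",)) -> str:
    """Append "." after every {{NAME}} / {{NAME|...}} call for the given names.

    Only calls without nested braces are matched, so {{pedia|{{...}}}} is
    left untouched rather than risk a wrong edit.
    """
    alternatives = "|".join(re.escape(n) for n in names)
    pattern = re.compile(r"\{\{(?:" + alternatives + r")(?:\|[^{}]*)?\}\}")
    return pattern.sub(lambda m: m.group(0) + ".", text)
```

Because every existing call gets a dot, places where editors had already typed a dot after the template will still show two dots, exactly as they do today; those can be cleaned up separately. The same function covers {{specieslite}} and the other project link templates by passing their names.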

Minitoc

{{minitoc}} has been added to 436 of our longest entries so far. I think now that we've had time to get used to it we can decide on a policy for when it can be added. I've personally found it very useful on mobile, especially with User:Ioaxxere/minitoc.js, so I would support adding it to all entries with at least five language sections (probably via a continuously running bot job). Note that the template uses some complex rules to show or hide itself on different skins which you can see here. Ioaxxere (talk) 03:44, 17 September 2024 (UTC)Reply

I love it. And I also love the initiative that has been taken to address a real, longstanding problem.
A few minor comments about the look of the template (I know that's not the purpose of this section!):
  • Why is it collapsed by default? The regular TOC is not collapsed by default, so why should this one be?
  • The space before the bullet should be a non-breaking space.
  • It serves no purpose when Tabbed Languages is enabled, so something should be added to MediaWiki:Gadget-TabbedLanguages.css to hide the minitoc.
As for the thrust of your point, I feel that a better threshold would be - add {{minitoc}} to entries where the regular TOC exceeds, say, 25 lines (this could be easily calculated by a bot). This, that and the other (talk) 04:46, 17 September 2024 (UTC)Reply
One thing I find confusing: why does it have to be implemented by a template in each page's source code (whether added manually or by a bot)? Is it impossible to add logic to whatever code generates and displays the normal TOC?--Urszag (talk) 07:31, 17 September 2024 (UTC)Reply
@This, that and the other: You can make it uncollapsed by default by clicking the link in the sidebar and clicking "Show table of contents" (see diff). @Urszag: Yes, we cannot control how the TOC is generated to my knowledge. Ioaxxere (talk) 02:57, 18 September 2024 (UTC)Reply
Thanks @Ioaxxere for addressing the issues. As for collapsing, I was talking about the default state for logged-out desktop users. The default state of the MediaWiki TOC is uncollapsed; the default state of {{minitoc}} is collapsed. What is the rationale for the inconsistency?
In any event, I Support wider use of this template. This, that and the other (talk) 04:57, 18 September 2024 (UTC)Reply
@This, that and the other: We could try that. I think we would have to change something in MediaWiki:Gadget-defaultVisibilityToggles.js or MediaWiki:Gadget-VisibilityToggles.js? Not sure (@Erutuon?). Ioaxxere (talk) 19:56, 18 September 2024 (UTC)Reply
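Both proposed thresholds (at least five language sections, or a regular TOC longer than 25 lines) are easy for a bot to approximate from wikitext. Here is a rough Python sketch, assuming one TOC line per =heading= line in the page source; headings produced by templates, __NOTOC__ and other magic words are ignored, and all function names are hypothetical.

```python
import re

# A wikitext heading on its own line, e.g. "==English==" or "===Noun===".
HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def heading_levels(wikitext: str) -> list[int]:
    """Level of each heading in source order (2 for ==X==, 3 for ===X===, ...)."""
    return [len(m.group(1)) for m in HEADING.finditer(wikitext)]


def toc_line_count(wikitext: str) -> int:
    # Approximation: the regular TOC shows one line per heading.
    return len(heading_levels(wikitext))


def language_section_count(wikitext: str) -> int:
    # Language sections are the level-2 (==Language==) headings.
    return sum(1 for level in heading_levels(wikitext) if level == 2)


def needs_minitoc_by_toc(wikitext: str, max_lines: int = 25) -> bool:
    """Proposal: add {{minitoc}} when the regular TOC exceeds max_lines lines."""
    return toc_line_count(wikitext) > max_lines


def needs_minitoc_by_languages(wikitext: str, min_languages: int = 5) -> bool:
    """Proposal: add {{minitoc}} to entries with at least min_languages sections."""
    return language_section_count(wikitext) >= min_languages
```

The TOC-length rule also catches single-language entries with many etymologies and senses, which the language-count rule would miss.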

ISO dab mislabeled; unicase Latin letters called "symbols"

If we enter 'ISO' in the {lb} tag, it's converted to the generic "international standards". However, ISO frequently contrasts with other international standards. For example, at , the ISO definition contrasts with the IAST definition. Both are international standards, but currently one is labeled "IAST" while the other is labeled just "international standards". Could we restore the ISO tag to its actual meaning?

Also, at that same article, the letters of IAST and ISO transliteration fall under the heading "letter", while the letters of NAPA and UPA transcription fall under the heading "symbol". They're all alphabetic letters; the only difference is that NAPA and UPA are unicase, while ISO and IAST are, sometimes, cased. ISO and IAST are also often unicase, because they transliterate unicase scripts, but there is an allowance for capital forms. That is, however, the only effective difference, and we don't call the letters of unicase scripts like Georgian "symbols". Some languages are written only in the IPA, NAPA etc. alphabet, and it's weird to say they're written in "symbols". Shouldn't we just have two "letter" headers, one with casing and one without?

To be clear, IMO actual alphabetic symbols -- e.g. those like e and pi used in math, chemistry and the like -- should remain under the "symbol" heading, because they're not used as letters of an alphabetic script. kwami (talk) 09:21, 17 September 2024 (UTC)Reply

split up WT:RFM

This page is over 1MB in length, which is too big. I propose a three-way split:

  1. Requests for language splits/mergers/additions go to WT:RFLM = Wiktionary:Requests for language moves, mergers and splits.
  2. Requests not related to individual terms (i.e. non-mainspace, non-Reconstruction and non-Appendix-only-language pages such as categories, modules, templates, etc.) go to WT:RFMO = Requests for moves, mergers and splits/Others (compare WT:RFDO).
  3. Requests related to individual terms remain on WT:RFM.

Benwing2 (talk) 04:02, 18 September 2024 (UTC)Reply

Support 2 and 3. I support 1 in principle too; I feel the name is a little wordy, and I would like something like WT:Lect workshop, but since it's a request page, not a discussion room, this might not be so great. Maybe WT:Requests to rename, merge or split languages (you don't "move" a language) or simply WT:Language treatment requests (a la WT:LT). In any event I don't want to hold this effort up with petty disagreements; it's exactly these language discussions that make the RFM page so large, so they are the priority for moving off the page.
While we're solving this issue, we should also deal with the issue that there is no natural place to archive language-oriented RFM discussions, yet archiving them for posterity is especially important. The current situation is that they are archived across WT:Language treatment/Discussions and Wiktionary talk:Language treatment/Discussions, but this is clearly unsustainable. I propose to resolve this issue by archiving completed discussions to yearly archive subpages of the new venue, such as WT:____/Archive/2024. We can't use yearly discussion subpages in the same manner as BP, because the request discussions need to be closed, handled and archived; it won't do if they silently disappear into oblivion at the end of each year. This, that and the other (talk) 05:10, 18 September 2024 (UTC)
I can agree with not using "move" for the L2 discussion page. Vininn126 (talk) 20:21, 18 September 2024 (UTC)
@Benwing2: I don't really care too much, but I oppose doing this on the basis of page size alone, because that can easily be resolved by just archiving old discussions (why are there still conversations from 2015?). Also keep in mind that Wiktionary:Request pages will get really crowded with an additional column. Ioaxxere (talk) 19:53, 18 September 2024 (UTC)
Some conversations from 2015 are still unresolved. I don't think we should archive unresolved discussions even if they're 10 years old. Benwing2 (talk) 19:56, 18 September 2024 (UTC)

Reddit as a Source for WT:CFI


I've largely avoided bringing this up since I frankly didn't have the energy, but I've seen one too many such entries now. I'd like to start a discussion about whether or not we should include Reddit as a source for WT:CFI. Right now, it's been included de facto, mainly because CFI has essentially stopped being enforced for online quotes (the main people who enforced it are either inactive, overwhelmed, or no longer care), but looking at entries like j*et (etym 2, sense 2), I'm very averse to having any entry that's based solely on Reddit quotes. Reddit is much more anonymous than, for example, Twitter, and there's little way to verify whether each account is a distinct person. The content is also not durable. Three Reddit users are simply not enough for us to include a word; it starts to harm our credibility and comes off as honestly unserious, putting us on a par with Urban Dictionary for some of the recent terms.

The last time this was brought up was at Wiktionary:Beer parlour/2022/September § Reddit, where the vote was 9-9, which meant that it shouldn't have been included by default, but again enforcement hasn't really been a thing. As such, I want to open up the discussion again. AG202 (talk) 05:22, 18 September 2024 (UTC)

That being said, I would be okay with Reddit quotes showing usage, providing that there's ample evidence that it's not a complete nonce word (ex: references, much more than 3 quotes, etc.). AG202 (talk) 05:26, 18 September 2024 (UTC)
I'd be OK with preventing Reddit from being an exclusive source of quotes. But I don't like the idea of just outright banning it. I added a slang term at one point (slang senses of dead) which was pretty well unciteable apart from Reddit and Twitter (and very difficult to find cites for on Twitter). Yet I hear and see this sense all the time in real life (texting, conversations) among my peers. A lot of slang just slips through the cracks if you're not able to use internet sources, so until there's an alternative, I'd prefer that we be generous in what we include. Andrew Sheedy (talk) 05:35, 18 September 2024 (UTC)
Well yes, I'm not saying that we should ban it. I will say, though, that that particular example has been around for quite some time (I've used it since at least 2017 after checking my texts), and shouldn't be hard to find at all in other media, let alone Twitter. It's mentioned in this 2021 CNN article, this 2023 USA Today article, this 2022 The Atlantic article, and used in this CharlotteWeekly article, for example. There's even a cite in the OED (no paywall) for this term. There's definitely no need for Reddit here, and I'm more so surprised that we didn't have it before considering how widespread it is, though maybe that's a sign. AG202 (talk) 05:52, 18 September 2024 (UTC)
@AG202: I don't think Reddit should count for CFI in general. But CFI states that "a term should be included if it's likely that someone would run across it and want to know what it means", and this is clearly met for jeet. But if the issue is Reddit specifically, then we could find quotes from other websites as well. Generally I try to add a mix of Reddit and Twitter citations since both sites are easily searchable and archivable. Ioaxxere (talk) 19:46, 18 September 2024 (UTC)

Reddit is an important tool for attesting fandom and video game slang. For all its flaws, Reddit seems to be the last mature, widely used, and openly accessible social-media platform. There's a decent built-in search function on the site that can be used to find quotes. Plus Reddit is indexed by Google and will likely remain so in the near term. The fact that Reddit is generally used for casual discussion also makes it an ideal source for quotes. I had a lot of success finding short illustrative quotes on Reddit. That's just not as easy to do on Tumblr. Tumblr's search functionality is basically non-existent, and the site's text content tends toward cryptic in-jokes and long essays. Reddit also potentially has some safeguards against linkrot. AFAIK you can't change account or subreddit names. Comments are also preserved after account deletion unless manually deleted beforehand. Whereas on Twitter and Tumblr, links can break with account name changes, as well as with account deletion or suspension. More broadly, I'd argue that Reddit is a modern-day Usenet. Wiktionary needs to keep pace with the evolving landscape of social media to document new and emerging language. The vast majority of fandom slang never makes it into print publications. I found it wasn't uncommon for fandom slang with a decade-plus history on Reddit to have zero hits on Google Books, Google Scholar, Issuu, etc. Sometimes the search needs to start where language is actually being used. WordyAndNerdy (talk) 08:01, 18 September 2024 (UTC)

I agree with you all, but I agree with the status quo as well, for the given reasons: no particular formulation will find agreement, precisely because we have already come close enough to including what we should include without being gameable by three Reddit usernames. The inclusion criteria have always been about what usage exists, not about which three quotes are shown on the front. Sometimes you don’t take a quote even from a book, if it is the heading of a table, if the context is too confusing or extensive for the quote to be read by anyone, or if the isolated sentence is factually or politically misleading. When we have a word with three Reddit quotes, three Twitter quotes, three Telegram quotes or whatever (these are just examples), after which the editor called it a day, it is understood that one site is not enough, just as a word from the universe of one video game is not enough. I don’t find it more ambiguous than is apt. Fay Freak (talk) 08:58, 18 September 2024 (UTC)
I think Reddit should be a less broad analog to the "clear widespread use" clause. The main problem with Reddit is independence: as I said before when we first discussed using social media, you sometimes have the equivalent of a group of friends who hang out together in school and develop their own in-group vocabulary. The idea is to find a way to filter out the in-group stuff for narrow groups and look for the terms that people use not because it's part of participating in the specific group but because that's the way they talk online in general. I'm not sure exactly how to implement it, but that's the idea. Chuck Entz (talk) 13:58, 18 September 2024 (UTC)
Exactly the above. Also @WordyAndNerdy: I get what you're saying and I do think Reddit is useful; I just think that we need more stringent criteria. Three uses are not enough. Also, I would be against counting deleted accounts for CFI (unless we're able to find the original account and cite that), as that completely goes against our criteria for independent uses by different people. AG202 (talk) 16:44, 18 September 2024 (UTC)
These concerns have come up in discussions about Usenet. The "independent" clause in CFI applies to authors. It doesn't apply to publishers. Attesting an entry with cites from a single newsgroup or subreddit would be like attesting an entry with only New York Times articles. Sometimes terms are restricted to a specific community. Warhammer slang is easier to find on /r/warhammer40k than on cute animal subreddits. I usually gathered "extra" cites as a personal practice. But one needn't go out of their way as I did. Only derogatory terms and limited documentation languages are held to different standards under CFI. Three cites should be enough for attesting English slang off Reddit if it's enough to attest marketing buzzwords.
CFI's one-year requirement is effective at filtering out most nonsense. It's more difficult than one might expect to make fetch happen. Friend groups often lack the dedication and/or social cohesion necessary to push in-jokes/protologisms into wider use. This is not a problem encountered with any regularity in the wild. Bored kids will try dropping their coinages directly onto Wiktionary and move on when they're rejected (perhaps after a bit of feet-stomping). Few if any will go to the trouble of creating three Reddit sockpuppets and letting their comments age a year. And if anyone actually wants to play that kind of 4D chess it would be entirely possible for them to game protologisms onto Wiktionary with two friends and three letters to local newspapers.
As for deleted accounts, this seems no different from including quotes from anonymous authors, articles without bylines, etc. There can never be 100% forensic certainty that any three quotes are from three different people. However, it's often possible to infer authorship from style, timestamps, discussion flow, etc. on Reddit. "Deleted_account" replying to "ILoveGarfield82" multiple times in a single chain can reasonably be assumed to be the same user. There won't be many instances in which the only quotes available are from a deleted account anyway. Reddit's karma system disincentivizes account deletion and anything that meets the one-year provision of CFI is almost guaranteed to have been used by more than one person. WordyAndNerdy (talk) 20:24, 18 September 2024 (UTC)