Wiktionary:Requests for moves, mergers and splits

This page is designed to discuss moves (renaming pages), mergers and splits. Its aim is to take the burden away from the Beer Parlour and Requests for Deletion where these issues were previously listed. Please note that uncontroversial page moves to correct typos, missing characters etc. should not be listed here, but moved directly using the move function.

  • Appropriate: Renaming categories, templates, Wiktionary pages, appendices, rhymes and occasionally entries. Merging or splitting categories, templates, Wiktionary pages, appendices, rhymes.
  • Out of scope: Merging entries which are alternative forms or spellings or synonyms such as color/colour or traveled/travelled. Unlike Wikipedia, we don’t redirect in these sorts of situations. Each spelling gets its own page, often employing the templates {{alternative spelling of}} or {{alternative form of}}.
  • Tagging pages: To tag a page, you can use the general template {{rfm}}, as well as one of the more specific templates {{move}}, {{merge}} and {{split}}.

Note that discussions for splitting, merging, and renaming languages are often also held here, and should be archived to WT:LTD when closed.

2015

Reviving the earlier discussion, I'm still bothered by the fact that we have two different categories for names. But the previous discussion also made it clear that it's not as easy as just merging them.

CodeCat 00:45, 10 November 2015 (UTC)[reply]

FWIW, what I am going to say is somewhat off-topic and maybe I'm in the minority on that, but I would not mind using the naming system "Category:English xxxx" for all topical categories: Category:en:Chess -> English terms related to chess (or any better name along those lines). --Daniel Carrero (talk) 00:59, 10 November 2015 (UTC)[reply]
"Category:en:Transliteration of personal names" could be renamed to "Category:English names transliterated from other languages", I suppose. What's the matter with the demonyms category? It contains demonyms, as expected. Would it be better titled "English demonyms", on the model of "English phrases"? - -sche (discuss) 06:02, 10 November 2015 (UTC)[reply]
"Category:en:Transliteration of personal names" would be better named "English transliterations of (foreigners') personal names". Notice the existence of e.g. Category:Latvian transliterations of English names. Names of non-English speakers are not English names. I agree with CodeCat that place names belong to topic categories. --Makaokalani (talk) 14:32, 10 November 2015 (UTC)[reply]
Here's the old discussion if anyone wants to read it. - excarnateSojourner (talk | contrib) 15:58, 12 April 2022 (UTC)[reply]
Category:en:Place names was deleted by Equinox in 2017-05 because it was empty. Category:Transliteration of personal names (and its language-specific subcategories) was moved to Category:Foreign personal names in 2021-09 with the help of WingerBot. - excarnateSojourner (talk | contrib) 16:14, 12 April 2022 (UTC)[reply]
@ExcarnateSojourner There being no opposition here, only support (albeit mostly old support), and no opposition or interest when I brought this up in the BP, let's revise whatever needs to be revised to put (at a minimum) all given names and surnames into subcategories of Category:Names by language, instead of some of them being in subcategories of Category:Names. The split is haphazard and arbitrary; I see the intention — put a name that was given within English in one top-level category and a name transliterating a foreign name in a different top-level category — but in practice that's not maintained, since e.g. Alexandra in the context of discussing ancient Greek is transliterating the Ancient Greek name, Sergei has been given to babies born in the Anglosphere (and to characters in English fiction), and we don't maintain such a split with place names. - -sche (discuss) 16:01, 24 April 2023 (UTC)[reply]
It making no sense to have Alexandra (in works about ancient Greece where it's romanizing a Greek name), Alexandra (in fiction about ancient Greece where it's a given name), Alexandra (as borne by British or American people today), Sonya, Vadim and Vladimir divided haphazardly into two different top-level categories, "Names" vs "Names by language", I'm now (attempting) editing the modules to consolidate them into "Names by language" subcategories. - -sche (discuss) 14:37, 5 May 2023 (UTC)[reply]
(Assistance solicited at Module talk:names#en:Russian_male_given_names,_etc.) - -sche (discuss) 14:48, 5 May 2023 (UTC)[reply]

Recategorize Category:Demonyms and Category:Ethnonyms

Pinging some editors from the discussion above: @User:Rua, @User:Daniel Carrero

As I explained in the discussion about exonyms above, renaming the language-specific subcategories of cat:Demonyms properly will require removing it from the topic category tree and adding it to the set category tree. We should similarly recategorize cat:Ethnonyms, another child of cat:Names that did not yet exist when this discussion started. I propose recategorizing them into Category:Terms by semantic function subcategories by language, unless someone can find a better place, and renaming them cat:Demonyms by language and cat:Ethnonyms by language. — excarnateSojourner (talk · contrib) 06:55, 25 February 2023 (UTC)[reply]

@ExcarnateSojourner @-sche I am going to take a stab at implementing this. Can you help with what the renames should be? I understand the separation between poscat categories and topic categories should be "lexical" vs. "semantic" but I sometimes have trouble putting this into practice. A tentative list based on what's already been proposed:
  1. 'DESTLANGCODE:SOURCELANG male given names' -> 'DESTLANG male given names transliterated from SOURCELANG'; same for 'female given names', 'surnames', etc. This doesn't work; these are not DESTLANG names but SOURCELANG names rendered into DESTLANG. So I propose 'DESTLANG renderings of SOURCELANG male given names' or similar. ("Transliteration" isn't quite right; sometimes these are transliterations, sometimes respellings, sometimes mere borrowings (cf. Italian Clinton).)
  2. 'LANGCODE:Foreign personal names' (a grouping category) -> 'LANG foreign personal names'
  3. 'LANGCODE:Demonyms' -> 'LANG demonyms'
  4. 'LANGCODE:Ethnonyms' -> 'LANG ethnonyms'
  5. 'LANGCODE:Exonyms' -> 'LANG exonyms'
  6. 'LANGCODE:Letter names' -> 'LANG letter names'
  7. 'LANGCODE:Couple nicknames' -> 'LANG couple nicknames'
  8. 'LANGCODE:Named roads' -> 'LANGCODE:Names of roads' and remove from 'LANGCODE:Names'
  9. 'LANGCODE:Named prayers' -> 'LANGCODE:Names of prayers' and remove from 'LANGCODE:Names'
What about the following:
  1. Subcategories of 'LANGCODE:Demonyms':
    1. 'LANGCODE:Armenian demonyms'?
    2. 'LANGCODE:Celestial inhabitants'?
      1. 'LANGCODE:Ufology' -> stays as a topic category.
    3. 'LANGCODE:Latvian demonyms'?
    4. 'LANGCODE:Nationalities'
    5. 'LANGCODE:Tribes'
      1. 'LANGCODE:Celtic tribes'
      2. 'LANGCODE:Germanic tribes'
      3. 'LANGCODE:Native American tribes'
      • See also 'LANGCODE:Mongolian tribes' under 'LANGCODE:Ethnonyms'.
  2. Subcategories of 'LANGCODE:Ethnonyms':
    1. 'LANGCODE:Mongolian tribes' -> Goes wherever 'LANGCODE:Celtic tribes', 'LANGCODE:Germanic tribes' and 'LANGCODE:Native American tribes' go.
  3. 'LANGCODE:Place names' -> Delete and reclassify the terms under them using {{place}} so they end up in 'Places in FOO'.
  4. 'LANGCODE:Places' -> Leave as a topic category but remove 'LANGCODE:Names' as a parent?
  5. Script-specific variants of 'LANGCODE:Letter names': 'LANGCODE:Arabic letter names', 'LANGCODE:Devanagari letter names', 'LANGCODE:Imperial Aramaic letter names', 'LANGCODE:Korean letter names', 'LANGCODE:Latin letter names'?
  6. Subcategories of 'LANGCODE:Nicknames':
    1. 'LANGCODE:Nicknames' itself? This is a grouping category.
    2. 'LANGCODE:Nicknames of individuals'?
    3. 'LANGCODE:City nicknames'?
    4. 'LANGCODE:Country nicknames'?
      1. 'LANGCODE:Racist names for countries' -> Terminate with extreme prejudice, see WT:BP.
    5. 'LANGCODE:Sports nicknames' -> either 'LANGCODE:Sports team nicknames', 'LANGCODE:Nicknames of sports teams', 'LANG sports team nicknames', 'LANG nicknames of sports teams'
      • See also 'LANGCODE:Couple nicknames' above.
  7. 'LANGCODE:Onomastics' -> stays as topic category but should not have 'LANGCODE:Names' as one of its parents.
  8. 'LANGCODE:Language families'? Regardless, it should not have 'LANGCODE:Names' as one of its parents.
  9. 'LANGCODE:Languages'? Regardless, it should not have 'LANGCODE:Names' as one of its parents.
  10. 'LANGCODE:Taxonomic names' and subcategories:
    1. 'LANGCODE:Taxonomic names' itself?
    2. 'Taxonomic eponyms by language': Already a pos category.
    3. 'Specific epithets' -> 'Translingual specific epithets'?
Other topic categories not directly reachable through 'LANGCODE:Names' but needing consideration:
  1. 'LANGCODE:Ships (fandom)' and numerous subcategories ('LANGCODE:F/F ships (fandom)', 'LANGCODE:M/M ships (fandom)', 'LANGCODE:Heterosexual ships (fandom)', 'LANGCODE:Homosexual ships (fandom)', 'LANGCODE:Polyamorous ships (fandom)', 'LANGCODE:RPF ships (fandom)')
  2. 'LANGCODE:Horse given names'
Benwing2 (talk) 07:14, 31 October 2023 (UTC)[reply]
@-sche Wondering if you missed my ping. I know my post is long, so take your time in responding. Benwing2 (talk) 06:36, 4 November 2023 (UTC)[reply]
Sorry, didn't mean to ignore your ping, but got distracted by life after seeing it. As far as the categories for "English renderings of Ukrainian names" (or whatever), I have no strong preference for any particular name at this time. My immediate concern was just with addressing the odd point of bifurcation where "native English placename like Warwick or Alberta; English rendering of an Armenian placename like Stepanakert; English rendering of a personal name someone gave a baby born in Ukraine like Volodymyr" are in one top-level category system ("LANGCODE:Names", named like 'set' categories), and "personal name someone gave a baby born in Canada" is in a different top-level category system ("LANGNAME names", treated like a quasi-part of speech). It's hard to decide where exactly to split the spectrum of categories we're dealing with here, if we're wanting to keep e.g. "John" in "Category:English male given names" at that (part-of-speech-esque) category name, but wanting to consider some things like Category:en:Native American tribes to be clearly a set/list category (a set/list of tribes); my immediate point was just that I don't see a sound basis for considering "John, Jane" a POS-type (LANGNAME) category but "Volodymyr, Sergei" a LANGCODE:-set-type category — surely they're both one or both the other, and the greater momentum seems to be towards considering "names" a POS-type/LANGNAME category. But maybe we should think about that more carefully and consider them all to be "sets"? (But then, "Category:English verbs" is also just a category containing the set of English verbs. Hmm... should we perhaps allow only things that are truly "parts of speech" to have "Category:LANGNAME foobars" names, and make all the "names" categories that contain John and Volodymyr into set categories? Should that be the direction in which we eliminate the bifurcation of the 'John' vs 'Volodymyr' categories?)
I do think even keeping names in two subcategories like "English given names" vs "English renderings of Ukrainian names"/"English renderings of Chinese names"/etc [whatever we call those categories] based on, in effect, whether they were born in Ukraine vs to a Ukrainian family in Canada (or in China vs to a Chinese family in America) may be less than ideal; e.g. what do we do if a transliterated Ukrainian or Chinese name is common in English-language fiction? What about if it's a German name; does the fact that those names are "natively" Latin script make the threshold for considering them to have become "English names" lower? Does it make a difference if the fiction is set in lightly-fictionalized Germany or Ukraine or China, vs in a space future or a generic medievalesque Middle Earth / Westeros? But I don't have time to think through and suggest any proposal for any better approach to that yet.
"LANG foreign personal names" (e.g. "English foreign personal names") sounds a bit odd; would "LANG renderings of foreign personal names" (aligning with your proposed "DESTLANG renderings of SOURCELANG male given names") be better, iff we're sticking with moving "Names" categories to LANGNAME names and not LANGCODE names?
I will try to respond more, and to the rest, later. - -sche (discuss) 17:54, 4 November 2023 (UTC)[reply]
@-sche Thanks for your comments. I have no issue with "LANG renderings of foreign personal names". I see your point about the line between nativized foreign-origin names and renderings of actual foreign names being fuzzy, but there does feel to me like a distinction, esp. in languages like Latvian that tend to respell foreign names according to Latvian spelling conventions, and the distinction is fairly clearly made in reality between e.g. the large number of Russian names respelled according to Latvian conventions (and used e.g. by the large population of Russians in Latvia) vs. the smaller number of Russian-origin names that have become nativized for naming of ethnic Latvians. In a multi-ethnic society like the US or Canada where nationality and ethnicity aren't always clearly distinguished, things get a lot fuzzier, although it still feels like there's some sort of distinction between names like Volodymyr or Volha that are unlikely to be borne by anyone other than someone who is Ukrainian (resp. Belarusian) or whose parents or grandparents are Ukrainian (resp. Belarusian), vs. a name like Vladimir or Olga that might be given to someone with no particular connection to Russia. As for whether these should use LANGNAME-type or LANGCODE-type naming, I'm not sure although I gather the distinction is supposed to be lexical vs. semantic, if that helps at all. Benwing2 (talk) 23:57, 4 November 2023 (UTC)[reply]
I guess we should stick with LANGNAME naming for given names / surnames, then, at least for now. (Switching gears for a moment to address a different aspect:) Regarding "horse given names", we also have (but apparently don't currently categorize) dog given names like Scruffy, Fido, and Spot, and we have Polly as a name for a parrot, and Mittens, Kitty, Socks for cats (also e.g. Miming in Cebuano). Perhaps we should merge all the different animals into one category for "animal given names". To me, at least, it seems intuitive to then handle this category in whatever way we handle the human given name categories—so, if we're naming the category that contains 'John' "English male given names", then 'Fido' goes in "English animal given names", or if we're using language codes, then use codes for both. (Back to the first gear:) We also have names that belong to specific individual people (Confucius, Cicero) or animals (Laika, and mythically Cerberus, Garm); we seem to put these in LANGCODE-set categories; I suppose the rationale is that the category that contains "Confucius, Cicero" contains a set of individuals, whereas "John" and "Jane" are 'less restricted'... in practice, people have undoubtedly also named babies 'Confucius' and 'Cicero', but if we demonstrate that, then we add a {{given name}} sense, so I guess we're fine leaving the individuals in LANGCODE-set categories and the {{given name}}s in LANGNAME categories... I guess this also explains the difference between nicknames (LANGNAME nicknames) and relationship names (the category contains a set of specific ships)...? nevermind, "Category:Nicknames" doesn't contain what I would've expected ("Bob, Jim, Tom" for Robert, James, Thomas) - -sche (discuss) 18:45, 5 November 2023 (UTC)[reply]
@-sche This all sounds good to me. I think I'll start on the renames in a couple of days depending on how the comments go. Benwing2 (talk) 21:45, 5 November 2023 (UTC)[reply]
Just checking, when your "list based on what's already been proposed" includes "'LANGCODE:Demonyms' -> 'LANG demonyms'" but then your follow-up proposal is for Subcategories of 'LANGCODE:Demonyms': like 'LANGCODE:Armenian demonyms'?, you're proposing to not actually rename "'LANGCODE:Demonyms' -> 'LANG demonyms'", right? I'm just checking that we're going to handle "Demonyms" and the subcategories like "Armenian demonyms" the same way, either all using LANGCODEs or all using LANGNAME. I could see handling the categories that actually have the word "demonyms" in their name either way, but since some of the other subcategories like "LANGCODE:Native American tribes" do seem more like set categories, maybe it's best to consider the whole batch to be set categories and stick with LANGCODE names like they have at present? (But maybe move them out of the "Names" category?)
"Couple nicknames" is an interesting case, because intuitively it seems like those and (relation)ship names should be handled the same way, since they seem like the exact same thing: "Lumity" is the portmanteau name for the two specific individuals Luz Noceda and Amity Blight, and Billary is the portmanteau name for the two specific individuals Bill Clinton and Hillary Clinton... maybe LANGCODE:Couple nicknames should be renamed "LANGCODE:Couples" to be more clearly a set category? and moved out from under the "names" category, since we don't categorize ship names as "names"? - -sche (discuss) 02:34, 6 November 2023 (UTC)[reply]
@-sche Thanks for pointing out that inconsistency. Rua's point a while ago was that 'Native American tribes' is named correctly as a set category because the contents are "names of Native American tribes" but 'Armenian demonyms' isn't named correctly as the contents aren't "names of Armenian demonyms". Rua suggested renaming 'Demonyms' -> 'Peoples' although that seems a bit strange to me as the term 'demonym' is fairly well established, and furthermore a distinction could be made between nominal demonyms and adjectival demonyms (note, we have {{demonym-noun}} and {{demonym-adj}} for these two, respectively), which is clearly a lexical distinction. That suggests maybe they should all be considered lexical categories, esp. since I think something like Category:en:Exonyms doesn't make sense as a set category (being an exonym is completely a lexical property). If we are to make Category:en:Armenian demonyms a lexical category, IMO it should be Category:English demonyms for Armenians as Category:English Armenian demonyms doesn't make much sense. As for CAT:en:Couples, that seems ambiguous so maybe it should be CAT:en:Nicknames of couples or something (which would be in keeping with future names like CAT:Types of stars and such). Benwing2 (talk) 02:54, 6 November 2023 (UTC)[reply]
"CAT:en:Nicknames of couples" works. Or should it even be "Nicknames of pairs", since it currently contains a few things like Bushbama, or should we remove those? (We don't categorize e.g. Republicrat as anything but "US politics".)
Good point about exonyms. "Demonyms", or at least the things currently in the "Demonyms" categories, seem to straddle the line between being a set category like "Occupations", vs being lexical like "Exonyms"... ugh, as you said earlier, it's hard to pin down and "put into practice" the difference, since so many of these categories exist in a grey area with characteristics of both. Like: it would not technically be wrong AFAICT to say "Category:English male given names and Category:English nouns are set categories containing the set of all English male given names or nouns respectively" (it would just be madness, heh). And in the other direction, isn't being a placename as much a lexical property as being a given name? But should they go into the same top-level "LANGNAME names" category, or is that madness?
Thinking aloud for a moment, I guess one difference is whether a term refers to one specific entity, or to an open-ended cast, which would rationalize why "John" and "Bob"—as names that can be given to an open-ended variety of people, new babies every day—are in (or belong in, in the case of "Volodymyr") "LANGNAME names" categories, whereas "Baghdad Bob" (individual's nickname), "Billary" and "Lumity" (real and fictional couples' nicknames) and e.g. "Saskatchewan" and "Yerevan" (placenames) refer to specific entities, and so are LANGCODE set categories...? So then, since demonyms like "Saskatchewanian" and "Yerevanian" also refer to an open-ended set of people (new babies born in Saskatchewan every day), and as you say, 'being a demonym' can be argued to be a lexical property like 'being an exonym', that justifies them being "LANGNAME demonyms" categories...? (Then the "type of"-set categories, like the category for "the set of all types of stars" or "the set of Native American tribes", are LANGCODE-set categories for a different reason.) - -sche (discuss) 19:04, 6 November 2023 (UTC)[reply]
@-sche Yes, that seems to make a lot of sense. BTW I have written the script to move topic (langcode) categories to lexical (langname) categories and I'm probably going to run it on exonyms first. Benwing2 (talk) 19:59, 6 November 2023 (UTC)[reply]
@-sche I have moved the exonyms and foreign-personal-names categories. Benwing2 (talk) 03:30, 7 November 2023 (UTC)[reply]
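For illustration only, the langcode-to-langname rename pattern discussed above (e.g. 'LANGCODE:Demonyms' -> 'LANG demonyms') might be sketched in Python roughly as follows. This is not the script mentioned in the thread: the function name `topic_to_lexical` and the tiny `LANG_NAMES` table are hypothetical stand-ins, and the sketch assumes the label's first letter can simply be lowercased.

```python
# Hypothetical, deliberately tiny code-to-name table used only for this sketch.
# Wiktionary's real language data is far larger and lives elsewhere.
LANG_NAMES = {"en": "English", "lv": "Latvian", "is": "Icelandic"}

PREFIX = "Category:"


def topic_to_lexical(category: str) -> str:
    """Map 'Category:CODE:Label' to 'Category:LANGNAME label'.

    Assumption: the label begins with an ordinary capitalised word
    (Demonyms, Exonyms, Letter names), so lowercasing its first letter
    is safe.
    """
    if not category.startswith(PREFIX):
        raise ValueError(f"not a category title: {category!r}")
    code, sep, label = category[len(PREFIX):].partition(":")
    if not sep or not label:
        raise ValueError(f"no language-code prefix in {category!r}")
    if code not in LANG_NAMES:
        raise ValueError(f"unknown language code: {code!r}")
    return f"{PREFIX}{LANG_NAMES[code]} {label[0].lower()}{label[1:]}"


if __name__ == "__main__":
    for title in ("Category:en:Demonyms", "Category:is:Exonyms"):
        print(title, "->", topic_to_lexical(title))
```

Labels that begin with a proper noun, or that need rewording rather than a mechanical rename (the thread suggests 'Category:English demonyms for Armenians' for 'Category:en:Armenian demonyms'), would need an explicit per-category mapping instead of this rule.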
  • @Benwing2 Sorry for being absent here. I'm glad to see discussion happening and generally support your proposals. A few specific comments:
8. 'LANGCODE:Named roads': Why not 'LANGCODE:Roads' (and remove from 'LANGCODE:Names')?
9. 'LANGCODE:Named prayers': Why not 'LANGCODE:Prayers' (and remove from 'LANGCODE:Names')?
5. Regarding letter names, see also cat:Letters.
— excarnateSojourner (talk · contrib) 21:18, 22 November 2023 (UTC)[reply]
@ExcarnateSojourner The main reason for including the word "named" is that otherwise it might not be clear whether the categories are set-type or related-to categories. Benwing2 (talk) 00:57, 23 November 2023 (UTC)[reply]
Relevant to the discussion above about creating a general animal given names category, this discussion points out "Ralph" for a raven, as well as "Rover" as another dog name. Whenever the situation with human names is sorted out, I suggest moving "LANGCODE:Horse given names" ("is:Horse given names") to "LANGNAME animal given names" ("Icelandic animal given names"), unless anyone has objections... (or we could add a general "animal given names" category and retain subcategories for specific animals if one or more languages had a lot of names for them, as might be the case for dogs and horses...) - -sche (discuss) 17:24, 11 November 2023 (UTC)[reply]

2016

Nkore-Kiga

As can be seen at w:Nkore-Kiga language, Kiga [cgg] should definitely be merged into Nyankore [nyn]. Unfortunately, this might require a rename to something that is both hyphenated and considerably less common than just plain "Nyankore" (though that is, strictly speaking, merely the name of the main dialect). —Μετάknowledgediscuss/deeds 05:21, 18 September 2016 (UTC)[reply]

I'm not sure. WP suggests the merger was politically motivated, but many reference works do follow it. Ethnologue says there is "Lexical similarity [of] 78%–96% between Nyankore, Nyoro [nyo], and their dialects; 84%–94% with Chiga [cgg], [...and] 81% with Zinza [zin]" (Kiga, meanwhile, is said to be "77% [similar] with Nyoro [nyo]"), as if to suggest nyn is about as similar to cgg as to nyo, and indeed many early references treat Nkore-Nyoro like one language, where later references instead prefer to group Nkore with Kiga. Ethnologue mentions that some authorities merge all three into a "Standardized form of the western varieties (Nyankore-Chiga and Nyoro-Tooro) [...] called Runyakitara [...] taught at the University and used in internet browsing, but [it] is a hybrid language." (For comparison, Ethnologue says English has 60% lexical similarity to German.) - -sche (discuss) 00:16, 2 June 2017 (UTC)[reply]
Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

Itneg lects

See w:Itneg language. All the dialects have different codes, but we really should give them a single code and unify them. I came across this problem with the entry balaua, which means "spirit house" (but I can't tell in which specific dialect). It's also known as Tinggian (with various different spellings), and this may be a better name for it than Itneg. —Μετάknowledgediscuss/deeds 02:09, 23 September 2016 (UTC)[reply]

Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

Paraguayan Guaraní [gug]

I just noticed that we have this for some reason. Guaraní is a dialect continuum that is quite extensive, both in inter-dialect differences and in geography, and certain varieties have been heavily influenced by Spanish or Portuguese. That said, our Guaraní [gn] content is, as far as I can tell, pretty much entirely on Paraguayan Guaraní, which for some reason has a different code, [gug]. My attention was brought to this by User:Guillermo2149 changing L2 headers (I have not reverted his edits, but they do cause header-code mismatch). We could try splitting up the Guaraní dialects, but it would be hard to choose cutoffs and would definitely confuse potential editors, of which we have had more since Duolingo released a Guaraní course. I think the best choice is to merge [gug] into [gn] and mark words extensively for which dialects or countries they are used in. @-sche Μετάknowledgediscuss/deeds 01:29, 1 November 2016 (UTC)[reply]

Support merging gn and gug. - -sche (discuss) 14:33, 1 November 2016 (UTC)[reply]
Don't forget there's also [gui] and apparently also [tpj]. - -sche (discuss) 04:28, 16 May 2017 (UTC)[reply]

2017

Merger into Scandoromani

I propose that the Para-Romani lects Traveller Norwegian, Traveller Danish and Tavringer Swedish (rmg, rmd and rmu) be merged into Scandoromani. TN, TD and TS are almost identical, mostly differing in spelling (e.g. tjuro (Sweden) vs. kjuro (Norway) meaning 'knife', gräj vs. grei 'horse' etc.). WP treats them as variants of Scandoromani. My langcode proposal could be rom-sca, or maybe we could just use rmg, which already has a category. --176.23.1.95 20:19, 25 January 2017 (UTC)[reply]

I support this. Traveller Norwegian is sometimes referred to as Tavring, and, to be honest, I've never heard anyone use the term Traveller Norwegian for a language. People call it Taterspråk or Fantemål instead, even though books describe these as derogatory terms. The other problem is that we in fact have two different Norwegian Traveller languages (the Romani-based and the Månsing-based). So it looks like a total mess right now. Tollef Salemann (talk) 07:55, 2 April 2023 (UTC)[reply]
I don't think this makes sense if the orthographies are consistently different, which seems to be the case. Otherwise, we could use the same logic to merge quite a few of the Slavic languages, which obviously doesn't make sense. Theknightwho (talk) 13:43, 2 April 2023 (UTC)[reply]
OK, but Traveller Norwegian is not quite the right term, because the Romani-based TN has two or more branches, which are quite different from each other, while the main one is almost the same as the Swedish and often had the same name(s). Meanwhile, there is also a Germanic TN version, unrelated to the Romani-ish TN variations. I mean, we need at least two more L2s in this case, even if we are going to merge TN and Swedish Tavring.
PS: there is also Swedish stuff like Knoparmoj and Loffarspråk and more, and they still have remnants in some rare Swedish/Norwegian sociolects. Maybe they also need their own L2? Or can we treat them as sociolects? Tollef Salemann (talk) 13:59, 2 April 2023 (UTC)[reply]

Yenish

The Yenish "language" (which we call Yeniche) was given the ISO code yec, despite being clearly not a separate language from German. Instead, it is a jargon which Wikipedia compares to Cockney (which has never had a code) and Polari (which had a code that we deleted in a mostly off-topic discussion). The case of Gayle, which is similar, is still under deliberation at RFM as of now. Most tellingly, German Wiktionary considers this to be German, and once we delete the code, we should make a dialect label for it and add the contents of de:Kategorie:Jenisch to English Wiktionary. @-sche Μετάknowledgediscuss/deeds 00:49, 7 April 2017 (UTC)[reply]

I don't see how that's most tellingly; I don't know about the German Wiktionary, but major language works frequently treat things as dialects of their language that outsiders consider separate languages.--Prosfilaes (talk) 03:01, 10 April 2017 (UTC)[reply]
The (linked) English Wikipedia article even says "It is a jargon rather than an actual language; meaning, it consists of a significant number of unique specialized words, but does not have its own grammar or its own basic vocabulary." Despite the citation needed that follows, that sentence is about accurate, as such this should be deleted. -- Pedrianaplant (talk) 10:53, 30 April 2017 (UTC)[reply]
(If kept, it should be renamed.)
There are those who argue that Yenish should have recognition (which it indeed gets, in Switzerland) as a separate language. And it can be quite divergent from Standard German, with forms that are as different as those of some of the regiolects we consider distinct. Many examples from Alemannic or Bavarian-speaking areas are better considered Alemannic or Bavarian than Standard German. But then, that's a sign that it is, as some put it, a cant overlaid onto the local grammar, rather than a language per se. Ehh... - -sche (discuss) 03:22, 9 July 2017 (UTC)[reply]

What's the difference? --Barytonesis (talk) 20:19, 17 April 2017 (UTC)[reply]

Apparently (Google n-grams) the term could be used with or without an object. The definition should be somewhat different. An example of use without a direct object is "to rake over the coals of failure". I don't know how to word this in a substitutable way. It seems to mean something like "to belabor (something negative (result, process), obvious from context) as if in reprimand". DCDuring (talk) 15:14, 3 January 2018 (UTC)[reply]

Move entries in CAT:Khitan lemmas to a Khitan script

The Khitan wrote using a Siniform script. Are these Chinese transcriptions of Khitan? —suzukaze (tc) 02:22, 13 August 2016 (UTC)[reply]

I'm a little confused about what's going on here. Are you RFV-ing every entry in this category? Or are you just looking for evidence that Khitan was written using this script? —Mr. Granger (talkcontribs) 12:45, 13 August 2016 (UTC)[reply]
The Khitans had their own script. These entries use the Chinese script. —suzukaze (tc) 17:30, 13 September 2016 (UTC)[reply]
I understand that, but I don't understand what your goal is with this discussion. If you want to RFV every entry in the category, then I'd like to add {{rfv}} tags to alert anyone watching the entries. If you want to discuss what writing systems Khitan used, maybe with the goal of moving all of these entries to different titles, then I'm not sure RFV is the right place for the discussion. (Likewise with the Buyeo section below.) —Mr. Granger (talkcontribs) 17:55, 13 September 2016 (UTC)[reply]
Moved to RFM. - -sche (discuss) 21:04, 30 April 2017 (UTC)[reply]

This should be handled with {{liushu}}, since jiajie is one of the six categories (liushu). — justin(r)leung { (t...) | c=› } 18:36, 17 May 2017 (UTC)[reply]

Can both of these templates be renamed to include a language code? —CodeCat 19:01, 17 May 2017 (UTC)[reply]
{{jiajie}} should be merged with {{liushu}}, which could be renamed as {{Han liushu}}, following {{Han compound}} and {{Han etym}}. It might not be a good idea to use a particular language code because these templates are intended for use in multiple languages now. They used to be used under Translingual, but we have decided to move the glyph origin to their respective languages. — justin(r)leung { (t...) | c=› } 20:22, 17 May 2017 (UTC)[reply]
You can use script codes as prefixes too. We have Template:Latn-def, Module:Cans-translit and such. —CodeCat 20:26, 17 May 2017 (UTC)[reply]

Should perhaps be moved to long story? W3ird N3rd (talk) 06:42, 9 August 2017 (UTC)[reply]

In contrast to long story short, neither seems entryworthy to me. They are quite transparent. Checking “long story” in OneLook Dictionary Search, one notes that none of those references finds it inclusionworthy, whereas “long story short” in OneLook Dictionary Search shows some coverage. DCDuring (talk) 11:01, 9 August 2017 (UTC)[reply]

sense: Noun: "(aviation) A large multi-engined aircraft. The term heavy normally follows the call-sign when used by air traffic controllers."

In the aviation usage AA21 heavy ("American Airlines flight 21 heavy") the head of the NP is AA21, heavy being a qualifying adjective indicating a "wide-bodied", ergo "heavy", aircraft.

Move to noun with any adjustments required. DCDuring (talk) 13:19, 24 August 2017 (UTC)[reply]

@DCDuring You're proposing we move from noun to noun? Did you mean from noun to adjective? - excarnateSojourner (talk | contrib) 05:57, 18 October 2022 (UTC)[reply]
I don't know what I meant 5 years ago, but that's what I mean now: move it to adjective. Though it would be good to confirm that there is not sufficient attestation of heavies and/or [DET] heavy. DCDuring (talk) 12:48, 18 October 2022 (UTC)[reply]
I can find the plural in reference to large (sometimes restricted to widebody) commercial aircraft and heavy bombers (sometimes 2-engine, always at least 4-). Also "heavy" motor vehicles (eg. large trucks, esp semis). I'm not entirely sure what heavy refers to when used by the pilot of a Cessna. DCDuring (talk) 12:57, 18 October 2022 (UTC)[reply]

Renaming mey

We currently have it as "Hassaniya" (which we used to spell as Hassānīya; those macra were removed along the way, presumably by Liliana, although I don't see any discussion; MG deleted the old category once it was empty). To match the other colloquial Arabic languages, it should be "Hassaniya Arabic". (Note: if Arabic is merged, this will become moot.) —Μετάknowledgediscuss/deeds 07:07, 16 September 2017 (UTC)[reply]

This seems a bit different from most of the other forms of Arabic which are "[Adjective referring to a place] Arabic", where just calling the lect "Libyan" (etc) would be more awkward. Still, I have no objection to a rename, though I don't have time to rename all the categories right now. I also notice that, while Hassaniya is probably still the most common spelling overall, it seems like Hassaniyya started to become more common around 2003. - -sche (discuss) 04:03, 29 December 2017 (UTC)[reply]
 Done. Benwing2 (talk) 07:32, 25 April 2024 (UTC)[reply]

Categories about country subdivisions to include the country name

This will include at least the following:

Categories for certain things that are located within these subdivisions will also be named, e.g. Category:Cities in Aomori (Prefecture) → Category:Cities in Aomori Prefecture, Japan. —Rua (mew) 13:07, 16 October 2017 (UTC)[reply]

Support. I oppose the existence of categories with language code like "en:" in the first place, but what is proposed here seems to be an improvement over the status quo. --Daniel Carrero (talk) 20:27, 20 October 2017 (UTC)[reply]
I would have opposed a lot of these, but I was too late on the scene. DonnanZ (talk) 15:51, 12 November 2017 (UTC)[reply]
Support all except Category:Abkhazia, Georgia (for which I abstain as I do not properly understand the political situation explained by User:Palaestrator verborum). - excarnateSojourner (talk|contrib) 03:34, 29 October 2021 (UTC)[reply]
US states were moved by MewBot (talkcontribs) in 2017. - excarnateSojourner (talk | contrib) 22:00, 27 April 2022 (UTC)[reply]

The rename has been put on hold until there is a clear consensus either way. Please vote! —Rua (mew) 15:11, 14 November 2017 (UTC)[reply]

@Rua It looks sane to me if politics are let out. But why is Abkhazia in Georgia though it is an independent state, statehood only depending on factual prerequisites and not on diplomatic recognition which has nothing to do with it? Where does the Crimea belong to? (article Sevastopol is only in Category:en:Ukraine because it has not really been edited since 2014.) I can think of two solutions: First possibility: We focus on geographical and cultural constants. Second possibility: We focus on the actual political power. I disprefer the second slightly because it can mean much work in cases of war (i.e. how much the Islamic state holds etc., or say the current factions in Libya). But in neither case Abkhazia is in Georgia. But the first possibility does not even answer what the Crimea belongs to, i.e. I am not sure if it is historically correct to speak of the Crimea as Ukraine. And geographical terms are often fuzzy and subject to editorial decisions. All seems so easy if you start your concepts from the United States, which do not even have a name for the region they are situated in. And even for the USA your idea is questionable because the constituent states of the United States are states in their own right (Teilstaat, Gliedstaat in German), as is also the case for the Federal Republic of Germany and the Russian Federation partially (according to the Russian constitution only those of the 85 subjects are states which are called Republic, not the Oblasti etc.). Is Tatarstan Russia? Not even Russians can agree with such a sentence, as in Russia one sharply distinguishes русские and россияне, Россия and Российская федерация. Technically Ceuta and Melilla are in Morocco because Spain is not in Africa. Also, Kosovo je Srbija, and it would become just a coincidence if a place important in Serbian history is listed as X, Kosovo or X, Serbia. Palaestrator verborum (loquier) 16:06, 14 November 2017 (UTC)[reply]

@Rua: Most of these categories like Category:en:Special wards in Tokyo are back on the {{delete}} list. I think these should be removed again for the time being. DonnanZ (talk) 18:02, 14 November 2017 (UTC)[reply]

  • Starting with the above, I don't know how the Tokyo ward system works, but I imagine it's a subdivision of the city. In England wards are subdivisions in cities, boroughs, local government districts, and possibly counties. "Wards in" is the natural usage.
Municipalities similarly. For example in Norway there are hundreds of municipalities (kommuner) which are subdivisions within counties (fylker). Some of these can be large, especially in the north, but so are the counties in the north. To me "municipalities in" is the natural wording.
States and provinces in the USA and Canada: In nearly all cases it is unnecessary to add the country name as the names are unambiguous. The only exception I can think of is Georgia, USA. This could also apply to prefectures in Japan and states in India (is there a Punjab in Pakistan?). DonnanZ (talk) 18:52, 14 November 2017 (UTC)[reply]
Yes, there is, like there is in India. Maybe categorisations should be abundant? Cities can belong to Punjab as well as to Punjab, India, and the Crimea is part of administration of both the Russian Federation and the Republic Ukraine at least for some purposes in the Republic Ukraine. We can make the least thing wrong by adding Sheikh Zuweid (presuming it exists) as well to the Islamic State as to the Arab Republic of Egypt, because we do not want to judge morally and formally states and terror organizations are indistinguishable. On the other hand of course we need sufficient data to relate towns to administrative divisions and ISIS presumably does not publish organigrams. Palaestrator verborum (loquier) 19:44, 14 November 2017 (UTC)[reply]

2018 — February

2018 — March

This is extremely trivial, not to mention something that could be found even if it were not categorised. I think that it suits an appendix much better, so I propose that its contents be moved to Appendix:English words ending in -gry. —Μετάknowledgediscuss/deeds 03:23, 15 March 2018 (UTC)[reply]

A benefit to having it as a category is that theoretically it ought to be addable by the headword templates examining the pagename (like "English terms spelled with Œ"), which, if implemented (...if it could be implemented without excessive memory costs), would allow it to be kept up to date automatically. - -sche (discuss) 17:16, 15 March 2018 (UTC)[reply]
That is true, but I don't really think we should be using headword templates to collate trivia. —Μετάknowledgediscuss/deeds 17:47, 15 March 2018 (UTC)[reply]
Delete per proponent. --Per utramque cavernam 18:09, 31 May 2018 (UTC)[reply]
Is there something like Category:English lemmas but sorted from the end, like anger, ranger, hunger, angry, hungry? --幽霊四 (talk) 19:40, 6 February 2021 (UTC)[reply]
At http://tools.wmflabs.org/dixtosa/ you can get a list of all entries in any category that end with any string you like. —Mahāgaja · talk 20:58, 6 February 2021 (UTC)[reply]
Support the proposed move per nom. - excarnateSojourner (talk|contrib) 05:00, 29 October 2021 (UTC)[reply]
Meh. Mehhhhhh. On one hand, I still like the idea of a category which can be populated automatically any time a new relevant entry is added. OTOH, it's very trivial. Well, it would be simple for someone to copy the current contents of the category over to the appendix and then remove the category from the entries (maybe with AWB to speed things up). - -sche (discuss) 09:04, 28 December 2023 (UTC)[reply]
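The two lookups raised in this thread, finding entries that end in a given string and listing lemmas "sorted from the end", can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list is taken to be already available as plain strings (the sample is just the five words mentioned above), and the function names `ending_with` and `sorted_from_end` are made up here, not part of any Wiktionary tool or template.

```python
def ending_with(words, suffix):
    """Return the words that end with the given suffix, keeping input order."""
    return [w for w in words if w.endswith(suffix)]


def sorted_from_end(words):
    """Sort words by their reversed spelling so that shared endings group together."""
    return sorted(words, key=lambda w: w[::-1])


# Sample data only: the five words mentioned in the discussion above.
sample = ["hungry", "anger", "angry", "hunger", "ranger"]

print(ending_with(sample, "gry"))
print(sorted_from_end(sample))
```

Sorting on the reversed spelling is what makes words with the same ending appear next to each other, which a category sorted from the start of the word cannot do.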

2018 — April

Entries for Japanese prefecture names that end in 県 (ken, prefecture)

I would like to request the move of the content of entries like 茨城県 (Ibaraki-ken, literally Ibaraki prefecture) to simply 茨城 (Ibaraki, Ibaraki), cf. Daijisen. 県 is not an essential part of the name.

(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Atitarev, Dine2016, Poketalker, Cnilep, Britannic124, Fumiko Take, Dine2016): Suzukaze-c 03:19, 19 April 2018 (UTC)[reply]

As a counterargument, Shogakukan's 国語大辞典 entry for 茨城 (Ibaraki) has one sense listed as 「いばらきけん(茨城県)」の略 ("Ibaraki-ken" no ryaku, "short for Ibaraki-ken"), and the 茨城 page on the JA Wikipedia is a disambig pointing to 茨城県 as one possible more-specific entry. ‑‑ Eiríkr Útlendi │Tala við mig 03:52, 19 April 2018 (UTC)[reply]
(edit conflict) It seems like a two-word phrase to me. I am not a native speaker, but I think that if someone asked "水戸市は何県?" ((in) What prefecture is Mito?) then "茨城です。" (It's Ibaraki) would be a correct answer. Entries such as 奈良 and 広島 should have both the city and the prefecture. (I see that 奈良 currently does.) Cnilep (talk) 04:01, 19 April 2018 (UTC)[reply]
茨城県です would also be correct and probably more common. At least 東京 and 東京都 are clearly distinguished. No one in Izu Ōshima would say he/she is from 東京. — TAKASUGI Shinji (talk) 04:04, 19 April 2018 (UTC)[reply]
Yes, 茨城県 is also correct. And if someone asked どこの出身? (Where are you from?) the answer would probably be 奈良県 rather than 奈良, or else expect a follow-up question. But I don't think that is necessarily a matter of word boundaries. Compare Pittsburgh, Pennsylvania and Pittsburgh, Kansas; the fact that it is usually necessary, and always acceptable to specify the latter doesn't mean that Pittsburgh on its own is not a proper noun. By the same token, I think that 茨城 (et alia) is a word. That's the point I had in mind. I will say nothing about what is more common. I don't even have good intuitions about frequency in my native language. Cnilep (talk) 04:54, 19 April 2018 (UTC)[reply]
I fully agree that 茨城 is a term worthy of inclusion. I also think that 茨城県 is a term worthy of inclusion. We have entries for both New York and New York City, and even New York State. Similarly, I think we should have entries for [PREFECTURE NAME], and also for [PREFECTURE NAME] and [PREFECTURE NAME] and [PREFECTURE NAME], etc., as appropriate. ‑‑ Eiríkr Útlendi │Tala við mig 05:03, 19 April 2018 (UTC)[reply]
I believe New York is a special case because there is both the state and the city. We have Washington State, but we don't have City of Chicago or State of Oregon. —Suzukaze-c 18:40, 19 April 2018 (UTC)[reply]
A lot (maybe all?) of the prefecture names minus the 県 (-ken) suffix are polysemous. Listing a few from the north to the south, limiting just to geographical senses, and just in the same regions at that:
  • 青森 (Aomori): a prefecture and a city
  • 岩手 (Iwate): a prefecture, a city, and a township
  • 秋田 (Akita): a prefecture and a city
  • 山形 (Yamagata): a prefecture, a city, and a village
  • 宮城 (Miyagi): a prefecture, a county, a township, a rural area (ancient Japan), a village, an island, and a mountain
  • 福島 (Fukushima): a prefecture, a city, and a township
  • 新潟 (Nīgata): a prefecture, a city, a park, and a village
  • 栃木 (Tochigi): a prefecture and a city
  • 茨城 (Ibaraki): a prefecture, a county, and a township
Jumping south a bit to touch on Anatoli's example further below:
  • 奈良 (Nara): a prefecture, a city, a township, and a village
I am consequently in support of including both the bare name, and the qualified name(s), much as we already do for similar situations with English terms. ‑‑ Eiríkr Útlendi │Tala við mig 21:35, 19 April 2018 (UTC)[reply]
They are polysemic because most prefectures were named after their capital city during the abolition of the han system. Exceptions include 埼玉 and 沖縄, where cities are named after their prefecture. — TAKASUGI Shinji (talk) 12:23, 23 April 2018 (UTC)[reply]
Generally support. Less duplication is good, and it is not much different from Chinese etc. for which we generally delemmatise, if not completely hard-redirect, these forms. Wyang (talk) 04:49, 19 April 2018 (UTC)[reply]
Support. For a dictionary, I think we don't need to keep entries with both prefecture name and prefecture, despite the usage but it's always helpful to provide usage notes (e.g. normally used with 県: ~県) and usage examples, e.g. 奈良県(ならけん) (Nara ken, Nara (prefecture)). --Anatoli T. (обсудить/вклад) 05:45, 19 April 2018 (UTC)[reply]

2018 — July

After some discussion on Category talk:Baybayin script (that went a bit off-topic), some of the Indian language editors (@Bhagadatta, Msasag and myself) have agreed that this category should be renamed to Category:Eastern Nagari script, the reasons being (1) several languages other than Bengali use this script, and (2) the Bengali alphabet is just a subset of this script and lacks some of the glyphs used by other Bengali-script languages (most prominently Assamese which has a separate r-glyph). I want to make sure that there are no objections to this by editors who were not in the discussion. —AryamanA (मुझसे बात करेंयोगदान) 02:06, 20 July 2018 (UTC)[reply]

google:assamese+site:unicode.org —Suzukaze-c 02:16, 20 July 2018 (UTC)[reply]

@Asm sultan, Dubomanab Kutchkutch (talk) 05:35, 21 July 2018 (UTC)[reply]

Support -- Bhagadatta (talk) 08:38, 21 July 2018 (UTC)[reply]

2018 — August

Nahuatl is sometimes treated as a language, and sometimes as a family of languages. Right now, Wiktionary is treating it as both simultaneously, which doesn't make sense. "Nahuatl" should be removed as a language. --Lvovmauro (talk) 11:55, 30 August 2018 (UTC)[reply]

I agree the current arrangement doesn't make sense; it is a relic of very early days on Wiktionary, and has persisted mostly because it's not entirely clear how intelligible the varieties are and hence whether it's better to lump them all into nah, or retire nah and separate everything. But enough varieties are not intelligible that I agree with retiring nah (or perhaps finally converting it to a family code). - -sche (discuss) 20:34, 31 August 2018 (UTC)[reply]
I think a family code for Nahuan languages is really needed since there are many cases where we don't know specifically which variety a word was borrowed from. --Lvovmauro (talk) 09:55, 9 September 2018 (UTC)[reply]
@Lvovmauro: OK, thanks to you and a few other editors, all words with ==Nahuatl== sections have been given more specific headers. However, as many as a thousand translations remain to be dealt with before the code can be made a family code and Category:Nahuatl language moved on over to Category:Nahuan languages. - -sche (discuss) 06:48, 19 September 2018 (UTC)[reply]
A disturbingly large number of these translations are neologisms with no actual usage. Some of them don't even obey the rules of Nahuatl word formation. --Lvovmauro (talk) 11:03, 19 September 2018 (UTC)[reply]
@Lvovmauro: Feel free to remove obvious errors / unattested neologisms. If a high proportion of the translations are bad, it might even be reasonable to start presuming they're bad and just removing them, since they already suffer from the problem of using an overbroad code. - -sche (discuss) 00:28, 21 October 2018 (UTC)[reply]
Someone with more time on their hands than me at the moment will need to delete all the subcategories of Category:Nahuatl language, and then the category itself, in preparation for moving 'nah' from the language-code module to the family-code module so the categories won't be recreated by careless misuse of 'nah' in the labels etc of 'nci' entries. - -sche (discuss) 00:24, 21 October 2018 (UTC)[reply]
Five years on, I've reviewed the situation here. There are no Nahuatl entries anymore, which is good progress. However, two pressing issues are stopping us from fully retiring this language code:
  • There are still about 450 "Nahuatl" (nah) translations in English entries. I suppose these need manual review. This should not be too difficult if one can find word lists for some of the best-attested Nahuatls.
  • Many languages have at least one word said to be derived from Nahuatl (presumably this is the word for "chocolate" in most cases). This could be solved by making Nahuatl an etymology-only language, or by changing these etymologies to refer generically to "a Nahuan language".
This, that and the other (talk) 09:25, 1 November 2023 (UTC)[reply]

Mecayapan Nahuatl saltillos

A number of Mecayapan Nahuatl words are currently written with U+0027 APOSTROPHE, which is a punctuation mark and not a letter. And a couple are using U+02BC MODIFIER LETTER APOSTROPHE, which is the wrong shape for this language. They should all be written with U+A78C LATIN SMALL LETTER SALTILLO instead.

--Lvovmauro (talk) 09:48, 31 August 2018 (UTC)[reply]

Or perhaps they should just be moved to use the Modifier Letter Apostrophe, cf WT:RFM#Entries_in_CAT:Taos_lemmas_with_curly_apostrophes, to avoid over-proliferation of different apostrophe-ish letters. I think we should try to be consistent within the Nahuatl languages, at least, in which codepoint we use. - -sche (discuss) 20:26, 31 August 2018 (UTC)[reply]
Most Nahuan languages don't use any sort of apostrophe. Mecayapan is unusual. --Lvovmauro (talk) 01:54, 1 September 2018 (UTC)[reply]
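The replacement proposed at the top of this section can be sketched in Python as follows. This is a minimal illustration, not an existing bot or module: the function name `to_saltillo` and the sample string are hypothetical placeholders (not an attested Mecayapan Nahuatl word), and the sketch assumes that every U+0027 or U+02BC in a title stands for the saltillo rather than for real punctuation.

```python
import unicodedata

APOSTROPHE = "\u0027"           # APOSTROPHE (punctuation)
MODIFIER_APOSTROPHE = "\u02bc"  # MODIFIER LETTER APOSTROPHE
SALTILLO = "\ua78c"             # LATIN SMALL LETTER SALTILLO


def to_saltillo(title: str) -> str:
    """Replace both apostrophe-like characters with the small saltillo.

    Assumption: the title contains no apostrophe used as genuine punctuation.
    """
    return title.replace(APOSTROPHE, SALTILLO).replace(MODIFIER_APOSTROPHE, SALTILLO)


# Hypothetical placeholder title, used only to show the code-point change.
old_title = "ta'ti"
new_title = to_saltillo(old_title)
for ch in new_title:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

If the modifier letter apostrophe were chosen instead, as suggested in the reply above, only the value of the target constant would change.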

2018 — September

It’s not about goon but go-on. Most books on Japanese seem to use kan-on and go-on with a hyphen rather than the correctly Romanized kan’on and goon. — TAKASUGI Shinji (talk) 15:42, 22 September 2018 (UTC)[reply]

2018 — October

I propose to rename Category:Korean determiners to Category:Korean adnominals, just like Category:Japanese adnominals. Korean gwanhyeongsa are grammatically almost identical to Japanese rentaishi or adnominals, which may or may not be determiners. Gwanhyeongsa are generally divided into three classes: demonstrative gwanhyeongsa, numeral gwanhyeongsa, and qualifying gwanhyeongsa ([4]). The last ones are not determiners. (pinging @Atitarev, Eirikr, Garam, HappyMidnight, KoreanQuoter) — TAKASUGI Shinji (talk) 23:31, 10 October 2018 (UTC)[reply]

Support. --Garam (talk) 08:21, 12 October 2018 (UTC)[reply]
Tentatively Support. Let's check with User:Wyang who was also involved and had an opinion in a related discussion on the group of words ending in 적 (的, jeok). --Anatoli T. (обсудить/вклад) 02:42, 13 October 2018 (UTC)[reply]
I feel determiner is the more common name for this in English; the different definitions of these terms across languages should not be a concern - e.g. we also use adjective differently for Korean. adnominal may be confused with the -eun, -neun, -eul, -deon forms of Korean verbs and adjectives. Wyang (talk) 03:57, 13 October 2018 (UTC)[reply]
@Wyang: The problem is that Category:Korean determiners contains words other than determiners. It will be all right to have both Category:Korean adnominals and Category:Korean determiners without renaming if you want, just like Category:Japanese adnominals and Category:Japanese determiners. — TAKASUGI Shinji (talk) 10:31, 13 October 2018 (UTC)[reply]

@Tibidibi, AG202 —Fish bowl (talk) 11:32, 7 February 2022 (UTC)[reply]

2018 — November

Language request: Old Cahita

Mayo and Yaqui are mutually intelligible and sometimes considered to be a single language called Cahita. But their speakers apparently consider them to be distinct languages, and they have distinct ISO codes (mfy and yaq) and are currently treated distinctly by Wiktionary.

I'm not requesting that they be merged, but separating them is a problem because an important early source, the Arte de la lengua cahita conforme à las reglas de muchos peritos en ella (published 1737 but written earlier) treats them as a single language, and also includes an extinct dialect called Tehueco. I'd like to add words from the Arte but I can't list them specifically as either Mayo or Yaqui.

One solution would be to treat the language of the Arte as a distinct historical language, "Old Cahita", which would then be the ancestor of Mayo and Yaqui. The downside is there only seems to be one linguist currently using this name. --Lvovmauro (talk) 11:32, 4 November 2018 (UTC)[reply]

On linguistic grounds, it seems like we should merge Yaqui and Mayo. Jacqueline Lindenfeld's 1974 Yaqui Syntax says "Yaqui and Mayo are sufficiently similar to be mutually intelligible", the Handbook of Middle American Indians says "the modern known representatives of Cahitan—Yaqui and Mayo—are mutually intelligible", and various more general references say "Yaqui and Mayo are mutually intelligible dialects of the Cahitan language", "The Yaqui and Mayo speak mutually intelligible dialects of Cahita". (There are political considerations behind the split, which a merger might upset, so adding Old Cahita would also work, but we have tended to be lumpers...) - -sche (discuss) 23:03, 18 November 2018 (UTC)[reply]
I wouldn't object to merging them. --Lvovmauro (talk) 08:58, 19 November 2018 (UTC)[reply]

Merging Classical Mongolian into Mongolian

"Classical Mongolian" refers to the literary language of Mongolia used from the 17th to the 19th century, created through a language reform associated with increased Buddhist cultural production (this started in the 16th century, but language standardization took place later). In the 20th century, (outer) Mongolia became independent from China and later adopted a Cyrillic orthography based on the spoken language, while Inner Mongolia kept her Uyghur script.

The literary language of Inner Mongolia continues Classical Mongolian in terms of its orthography as well as most of its grammar (to an extent that Janhunen (?) calls the situation bilingual). Modern varieties, in both Outer and Inner Mongolia, have greatly expanded their lexicons through borrowing of modern terms, but they also both consider all of Classical Mongolian lexicon to be a part of their language, and will put it in their dictionaries, even transcribed into Cyrillic.

The actual problem I have with this division is that when it comes to borrowings from (Classical) Mongolian, we sometimes cannot ascertain whether they precede the 20th century or not, or more common still, we know they precede the 19th century (and post-date the 16th), but they obviously come from a spoken variety and not "Classical Mongolian" as a literary language. Crom daba (talk) 17:14, 15 November 2018 (UTC)[reply]

Yes. I find it also strange that Wiktionary distinguishes Ottoman Turkish from Turkish, it’s like distinguishing pre-1918 Russian from “Russian”, or like one reads about “Ottoman Turks” instead of “Turks”. Also Kazakh and the other Turkic languages do not get extra codes for Arabic spelling, this situation is even more comparable, innit. Kazakhs in China write in Arabic script, Mongols in China in Mongolian script, but the languages are two and not four. Or also it sounds as with Pali. Am I correct to assume that Classical Mongolian texts get reedited in Cyrillic script? Then you could base all on Cyrillic and make Mongolian script soft redirects, because even words died out before the introduction of Cyrillic can be found in Cyrillic. Fay Freak (talk) 15:23, 17 November 2018 (UTC)[reply]
@Fay Freak, the situation is similar to Turkish, but it creates less problems there since the Arabic script Turkish is obsolete and most relevant loans are pre-Republican.
In principle it could be possible to collapse all of Mongolian into Cyrillic, but this would be extremely politically incorrect.
Collapsing everything (potentially even Buryat, Daur and Middle Mongolian) into Uyghur script, like we do with Chinese, would perhaps make more sense, but 1) it's a pain to enter 2) Cyrillic is generally more accessible and useful to our users and (Outer) Mongolians 3) most of my materials are in Cyrillic 4) it corresponds poorly to the spoken forms 5) its Unicode encoding corresponds poorly to its actual form 6) the encoding doesn't correspond that well to the spoken form either. Crom daba (talk) 16:50, 18 November 2018 (UTC)[reply]
This is tricky, because as far as language headers and having entries for terms in the language, it seems like we could often resolve which language a word is in(?) by knowing the date of the texts it's attested in. It is, as you say, etymologies where it's hardest to ascertain dates. (Still, if we merged the lects, we could retain an "etymology only" code for borrowings that were clearly from Classical Mongolian, like is done for Classical Persian, etc.) I'm having a hard time finding any references on the mutual intelligibility of the two stages; most references are concerned with the intelligibility or non-intelligibility of modern Khalkha, Kalmyk, etc. If we kept the stages separate, etymologies could always say something like "from Mongolian foo, or a Classical Mongolian forerunner". - -sche (discuss) 22:50, 18 November 2018 (UTC)[reply]
@-sche, yes, the Persian model would be desirable.
It doesn't make much sense to speak of intelligibility between Classical and Modern Mongolian, Classical Mongolian is exclusively a written language, its spelling reflects the phonology of 13th-century Mongolian (early Middle Mongolian). The same spelling is used in Modern Mongolian as written in Uyghur script.
The biggest problem with Classical Mongolian is how redundant it is. For any word that is shared between modern and classical periods, and that is probably most of the lexicon, we would need to make two identical entries in Uyghur script for modern and classical Mongolian. Crom daba (talk) 11:18, 19 November 2018 (UTC)[reply]
That seems not unlike how we handle Serbo-Croatian and Hindi-Urdu. — [ זכריה קהת ] Zack. 14:25, 30 November 2018 (UTC)[reply]
Indeed. The way we handle them sucks. Crom daba (talk) 12:52, 1 December 2018 (UTC)[reply]
I agree. All this duplication is a huge waste of resources. Per utramque cavernam 13:22, 1 December 2018 (UTC)[reply]
Not exactly; Serbo-Croatian and Hindi-Urdu have redundant entries in different scripts on different pages, while I understand Crom daba's point to be that we would need to have redundant ==Mongolian== and ==Classical Mongolian== entries on the same pages for most Mongolian/Uyghur script words, which would be more like having duplicate Bosnian and Croatian entries on the same pages, not our current system. And Serbo-Croats are testier about their language(s) being lumped than speakers of Classical Mongolian... ;) - -sche (discuss) 17:29, 3 December 2018 (UTC)[reply]
OK, does anyone object to the merge? If not, I can try to do it with AutoWikiBrowser later, or Crom or others could start reheadering our small number of Classical Mongolian entries, fixing any wayward translations, etc. For etymologies of terms that are known to derive from Classical Mongolian, we should be able to just move cmg over to Module:etymology languages/data. - -sche (discuss) 17:29, 3 December 2018 (UTC)[reply]
@Crom daba, Fay Freak I made the few ==Classical Mongolian== entries we had into ==Mongolian== entries (labelled "Classical Mongolian" unless there was already a modern Mongolian section on the same page), but many of the categories still need to be deleted, and one needs to check whether anything else is left that would break before "cmg" is moved from being a language code to being an etymology-only code. - -sche (discuss) 02:46, 27 September 2020 (UTC)[reply]
There's no full correspondence between different Mongolian scripts and none of the scripts is totally phonetic. It's not just the spelling, the phonologies are different but sometimes one script represents the true or historical pronunciation and it's not necessarily Cyrillic, which is strange. There are words that only exist in one or the other, which is quite understandable, cf. modern ᠱᠠᠹᠠ (šafa, sofa) in Inner Mongolia (from 沙發/沙发 (shāfā)) and софа (sofa, sofa) in outer Mongolia (from софа́ (sofá)). I support the merge, though I am curious if classical Mongolian terms are equally representable in Cyrillic and Arabic scripts. In other words, are there terms in classical Mongolian, which are different from modern and there's no Cyrillic form for them? I think I saw them.
Duplication of entries is a waste. You may think I am biased but I think Mongolian should be presented/lemmatised in Cyrillic (Uyghurjin should also be available in all entries where it can be found) - for which resources are much more accessible. (Serbo-Croatian should be lemmatised on the Roman alphabet, on the other hand, let's finish the senseless duplications of entries)
Also supporting the Ottoman Turkish/Turkish merge. --Anatoli T. (обсудить/вклад) 03:25, 27 September 2020 (UTC)[reply]
@Atitarev In Mongol khelnii ikh tailbar toli we see the term уйгуржин бичиг is described as ‘монгол бичгийн дундад эртний үеийн хэлбэр’ (‘early form of the Mongolian/Khudam script’). Middle Mongolian in uigurjin with its own rules shall not be equated with the later ‘Classical’-Modern script and orthography. I maintain uigurjin (with its specific glyph forms and spelling rules) shall be treated as a term only for Middle Mongolian.
Similarly I also object to treating Northern Yuan – Qing (‘Classical’) Mongolian and Modern Mongolian-script Mongolian as one literary language standard. In fact orthographic standardisations and modifications make written Modern Mongolian so different from Classical. Personally I’d like to display a historical feature of this language collectively under ‘Classical Mongolian’, as only this term directly interlinks with an Inner Asian historical and linguistic tradition. LibCae (talk) 16:40, 7 May 2021 (UTC)

2018 — December

Renaming agu

We currently call this "Aguacateca", but "Aguacateco" is much more common. (Wikipedia opts for "Awakatek", which is rapidly becoming more common but is probably not there yet — not that we can't be crystal-ballsy if we want to when it comes to names rather than entries.) —Μετάknowledgediscuss/deeds 05:42, 19 December 2018 (UTC)[reply]

You're right that several modern (and a few older) sources seem to use Awakatek. In turn, historically Aguacatec has been used in the titles of many reference works on it, and seems like it may be the most common name (ngrams), although it's also the name of the people-group. (Others: Awakateko, Awaketec, Qa'yol, Kayol, and various spellings of Chalchitec, sometimes considered a distinct lect.) - -sche (discuss) 04:31, 19 August 2020 (UTC)
Indeed, the most common name by a long shot is Aguacatec, followed by Awakatek (but these are also names of the people-group), followed by Awakateko, then Aguacateco, and in dead last, our current name of Aguacateca. Can we rename to Aguacatec? - -sche (discuss) 07:02, 28 December 2023 (UTC)
  • Support renaming to Aguacatec. Also being the name of the "people-group" is hardly an argument against it; the same is true of a huge number of languages including French, Welsh, Manx and the vast majority of language names ending in -ish. —Mahāgaja · talk 07:22, 28 December 2023 (UTC)[reply]
    Oh, to clarify, I didn't intend that as an argument against using that name, but as a qualification on the data; comparing which term is more common can't easily determine which is the most common name of the language if one term is also used for something else (the name of the people). But Aguacatec seems to be the most common name in e.g. the books about it in Glottolog's bibliography, too. Who has a bot that does renames? This one involves few enough entries that it could be done by hand, but it seems like the tasks that would need to be done are the same for many (all?) language renames, so it should be bottable... - -sche (discuss) 07:51, 28 December 2023 (UTC)[reply]

2019 — January

"comparative adjectives" > "adjective comparative forms"

Apparently there was a recent vote to remove the ambiguity of comparative and superlative categories. What I don't understand is why the name "comparative adjectives" was chosen, which suggests a lemma category, yet it's now being subcategorised under non-lemmas. Lemma subcategories are named "xxx POSs", as can be seen in Module:category tree/poscatboiler/data/lemmas. Non-lemma subcategories are named "POS xxx forms", visible in Module:category tree/poscatboiler/data/non-lemma forms. Therefore, the obvious place for comparative forms of adjectives is the "adjective comparative forms" category we used to have. The new name, although voted on, stands out as an exception among all of our existing categories and is inconsistent. It should therefore either be renamed back to reflect its non-lemma status, or it should be moved back under its original lemma parent category. —Rua (mew) 23:57, 10 January 2019 (UTC)[reply]
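For readers following the naming argument, the two schemes described above ("xxx POSs" for lemma subcategories, "POS xxx forms" for non-lemma subcategories) can be illustrated with a minimal Python sketch. This is not how the category modules are implemented; the function names are hypothetical and plurals are formed naively by appending "s".

```python
def lemma_style_category(language: str, qualifier: str, pos: str) -> str:
    """Hypothetical helper: the 'xxx POSs' scheme used for lemma subcategories."""
    # Naive pluralisation; good enough for 'adjective', 'noun', 'verb'.
    return f"{language} {qualifier} {pos}s"


def non_lemma_style_category(language: str, pos: str, qualifier: str) -> str:
    """Hypothetical helper: the 'POS xxx forms' scheme used for non-lemma subcategories."""
    return f"{language} {pos} {qualifier} forms"


# The name chosen by the vote follows the lemma-style pattern ...
voted_name = lemma_style_category("English", "comparative", "adjective")
# ... while the earlier name follows the non-lemma pattern.
earlier_name = non_lemma_style_category("English", "adjective", "comparative")
```

Under these assumptions `voted_name` is "English comparative adjectives" and `earlier_name` is "English adjective comparative forms", which is the inconsistency the nomination points to.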

@Surjection, Erutuon —Rua (mew) 00:09, 11 January 2019 (UTC)

The vote was here: Wiktionary:Votes/2018-07/Restructure comparative and superlative categories. — Eru·tuon 00:13, 11 January 2019 (UTC)[reply]
Participles are not lemmas yet they are called "(language) participles", so it's not as if the comparatives/superlatives would exactly be exceptions of some kind. They even have their own "participle forms" categories! The former also applies to gerunds. — surjection?09:13, 11 January 2019 (UTC)[reply]
And to make it clear, "adjective/adverb comparative/superlative forms" categories are to be made obsolete as a direct result of the vote. — surjection?09:16, 11 January 2019 (UTC)[reply]
Yes, and that should be undone, because as I said, the name "comparative adjectives" suggests that they are lemmas because of our existing naming scheme. Participles are non-lemmas by virtue of being participles, but adjectives are lemmas, so "comparative adjectives" are also lemmas. Are you implicitly proposing to rename all non-lemma categories to this new scheme, e.g. "dual adjectives", "plural nouns", "possessive nouns", "feminine adjectives"? If the vote is upheld then I will propose this change to make things consistent again. —Rua (mew) 12:00, 11 January 2019 (UTC)[reply]
I certainly would not assume "comparative adjectives" refer to lemmas in any way as much as "participles" don't. If we go back to "adjective comparative forms", what do you suggest for the name of the category with inflected forms of such? And don't just say "put them in 'Adjective forms'", because that at the very least isn't consistent as I stated below. In the old system, there was no consistency at all - inflected forms of comparatives and superlatives went to either the same category as them or Adjective forms without any sort of rule. — surjection?12:17, 11 January 2019 (UTC)[reply]
I would not even categorise inflected forms of comparatives in a special way. They are just adjective forms. I don't even think comparatives should be categorised separately at all, there is no obvious need to do so. The example of possessive forms is perhaps the best parallel, since they have inflection tables of their own in Northern Sami and many other languages. Do you propose renaming them to "possessive nouns" so that there can be a separate "possessive noun forms" category? —Rua (mew) 12:28, 11 January 2019 (UTC)[reply]
If you feel comparatives too don't need a special category, I'm personally fine with bunching all of them under "adjective forms", but that will too need wider consensus to implement. When it comes to those possessive nouns, I would argue comparatives and superlatives are closer to participles than to those possessive forms, which is why I believe they're not a good parallel and should be considered separately. — surjection?12:40, 11 January 2019 (UTC)[reply]
Why? —Rua (mew) 12:46, 11 January 2019 (UTC)[reply]
Many participle forms develop into adjectives of their own right and some comparative/superlatives too have developed into their own forms. Possessive forms by comparison basically never have, showing that they are fundamentally different in some way. — surjection?12:49, 11 January 2019 (UTC)[reply]
In fact, unlike this new system which has parallels, I'm fairly sure the old system of having "adjective comparative forms" but then the forms of comparatives under "adjective forms" is more of an exception. — surjection?09:32, 11 January 2019 (UTC)[reply]
Not really. We don't have separate non-lemma categories for everything in Module:category tree/poscatboiler/data/lemmas and in fact we don't need to. Under the old system, all comparative forms could be categorised under "adjective comparative forms", so that includes all case forms of comparatives. There was never any need to separately categorise forms of comparatives. In fact I'm generally opposed to subcategorising non-lemmas, so that's why I moved everything in Dutch to just "adjective forms". We don't need a subcategory for every possible type of non-lemma form. However, if we do have them, then they should be named consistently. —Rua (mew) 12:00, 11 January 2019 (UTC)[reply]
We don't have separate non-lemma categories for the reason that many of them are simply not inflectable in and of themselves. Again, participles have separate categories for the main participle and inflected forms of such - why should this not apply to comparative and superlative adjectives? — surjection?12:17, 11 January 2019 (UTC)
What I get out of your argument is that you think "POS xxx forms" should become "xxx POSs" when the form has its own inflections. But then what about cases like English, where comparatives don't have their own forms and are simply adjective forms? Or cases like Dutch or Swedish, where there are multiple superlative forms but their inflections are shown on the lemma? How is an editor supposed to know what the name of the category for any particular adjective form is, when some of them are named differently from others? —Rua (mew) 12:28, 11 January 2019 (UTC)[reply]
That is indeed my argument for comparatives and superlatives due to their so far horridly inconsistent handling. In the case of English and all other languages, they will only have "comparative adjectives", no "comparative adjective forms", much like English would have "participles" that too aren't lemmas but would not have "participle forms". In cases like Dutch, Swedish and such where comparative/superlative forms are more numerous, those need to be handled on a language by language basis, ideally to choose one of the forms as the most lemma-esque (such as which form dictionaries primarily use to describe the comparative/superlative of an adjective), and if not one can be decided, it is more of a tricky situation (possibly all into "comparative/superlative adjective forms"?). Editors in turn can rely on other existing entries and eventually remember these entries much like the existing ones are, or use language-specific headword templates. Yes, the new system is by no means perfect, but I would argue it is miles better than what we had before. — surjection?12:38, 11 January 2019 (UTC)[reply]
But again, how is an editor of these languages supposed to know that, while adjective forms normally go in "adjective xxx forms", it is somehow different for comparative and superlative forms? You still haven't answered this. Your argument is based on sublemma-ness, but this differs per language, not all languages treat comparatives and superlatives as sublemmas. The categorisation should allow for both treatments depending on the needs of the individual language, not force a particular treatment on all languages. The fact that you think it makes sense for Finnish doesn't mean it makes sense for English. Now we have Category:English comparative adjectives for an adjective form, but Category:English noun plural forms for a noun form. How is that consistent? —Rua (mew) 12:45, 11 January 2019 (UTC)[reply]
I did already answer that question - read the latter part of my previous response. Many a time has an editor checked an existing entry to see how something is formatted, and I doubt there would be a single editor that has never done that. Many of the languages with comparatives and superlatives set up have language-specific headword templates, and many of those too have ACCEL which can too give the correct headword category autom- oh wait, it can't anymore since someone removed that capability. — surjection?12:49, 11 January 2019 (UTC)[reply]
You have not answered the question. An editor cannot, based on the rule that non-lemma categories are named "adjective xxx forms", guess the correct name of the category for comparative forms, whereas they could before. Instead, there is now a single exception that comparatives are named "comparative adjectives". Where are all the other "xxx POSs" categories for non-lemmas? Again, are you proposing that all non-lemmas be renamed to match this new scheme? If not, what justifies this single exception? —Rua (mew) 12:54, 11 January 2019 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── Which question exactly have I not answered? The question was "how would an editor of these languages know the correct name for the categories?", which I have now answered not less than twice in my two previous responses. Instead, what it seems you are arguing is that the new scheme creates inconsistency in terms of the category names for non-lemma forms. Indeed, if other derivations are shown to be just like participles or comparative/superlatives, I'm happy to agree to move them under a similar scheme as well, but the possessive forms you brought up above are not an example of such. — surjection?12:58, 11 January 2019 (UTC)[reply]

Since it seems that this is the new norm for naming categories, I have proposed to rename all existing categories to match the new naming scheme at WT:BP. —Rua (mew) 13:16, 11 January 2019 (UTC)[reply]

@Rua Given the edits you have made to the templates and modules are still in place, are you willing to revert those yourself or are you asserting that you are overriding the consensus established by the vote? — surjection?21:10, 11 January 2019 (UTC)[reply]

See also Category talk:Terms making reference to character shapes by language.

Perhaps they could be merged, or perhaps both could be kept (Japanese: characters; letters?), but the naming should be consistent, at the least. —Suzukaze-c 11:08, 20 January 2019 (UTC)[reply]

Merge, perhaps into Category:Terms derived from character shapes by language (a bit shorter, and inclusive of non-letter characters). - excarnateSojourner (talk | contrib) 04:50, 28 April 2022 (UTC)[reply]

2019 — February

These should be merged, I think. Per utramque cavernam 12:39, 2 February 2019 (UTC)[reply]

Yes, IMO, into someone's blood runs cold, with hard redirects from both. DCDuring (talk) 15:43, 2 February 2019 (UTC)[reply]
I would support a hard redirect. Imetsia (talk) 23:34, 1 August 2021 (UTC)[reply]
Formerly entitled Category:Taxonomic eponyms

As above. —Rua (mew) 13:35, 2 February 2019 (UTC)[reply]

As with Category:Specific epithets. DCDuring (talk) 15:41, 2 February 2019 (UTC)[reply]
@Benwing2, Rua, DCDuring: I guess there is nothing to move here and this can be solved by an addition to module data so that we can auto-cat after adding {{cln|langcode|taxonomic eponyms}} in entries. I mean, in order to categorize the {{named-after}} stuff more specifically. Fay Freak (talk) 23:45, 7 November 2020 (UTC)[reply]
I think all of these that are entire taxonomic names must be Translingual, by virtue of being taxonomic names. The ones that are specific epithets would have the same language code for the taxonomic eponyms as for the specific epithet. DCDuring (talk) 01:02, 8 November 2020 (UTC)[reply]
@DCDuring: I am not exactly sure what you mean. I mean that “taxonomic eponyms” can be added to the topical data or to the etymological data (Category:Taxonomic names, the supercategory of Category:Taxonomic eponyms, resides in the former for some reason, but I devise the taxonomic eponym categories as motivated by etymological description, so the latter it should be), whereas Category:Taxonomic eponyms cannot because it cannot generally be applied onto all languages (only to Translingual and perhaps Latin words that also are epithets). @Rua mixed up different issues here, the reasoning “as above” is not comprehensible thus. Fay Freak (talk) 12:00, 9 November 2020 (UTC)[reply]
The question then is whether Translingual appears as "Translingual " or "mul:"? I have thought that "specific epithets" is a category having to do with the usage of the term. Thus the categorization should be the result of a label or of a non-gloss definition. DCDuring (talk) 18:57, 9 November 2020 (UTC)
Since "Translingual" is a junk supercategory, not comparable to our language categories, based on an attribute of the usage of some terms. The category includes CJKV characters, airport codes, other international abbreviations, symbols, and codes, some non-taxonomic scientific terms, and who-knows-what-else, as well as taxonomic names. The effort to act as if every linguistic entity in Wiktionary fits into a relatively well-defined hierarchy of language families, languages, and dialects comes a-cropper with the entities thrown into Translingual, just as the taxonomic naming system has its troubles with hybridisation and trans-taxon gene transfer (eg, from viruses or from the assimilation of prokaryotes into eukaryotes as organelles).
Specific epithets have a function within taxonomic terms that has nothing whatsoever to do with the fact that taxonomic names are used translingually, but has everything to do with names in the taxonomic/biological "language". 'Specific epithet' is a grammatical role within certain classes of taxonomic names. DCDuring (talk) 22:02, 9 November 2020 (UTC)[reply]
@Rua, DCDuring, Fay Freak: Heads up that I amended Module:category tree/poscatboiler/data/terms by etymology to standardize these categories and so we now have Category:Taxonomic eponyms by language. I realize this makes the deletion discussion a little more confusing, since the main category has changed, so just giving visibility to the subcats Category:Arabic taxonomic eponyms, Category:English taxonomic eponyms, and Category:Translingual taxonomic eponyms and the fact that the main category under discussion was emptied and deleted for being empty. I've put the notice on the new main category and changed this subheading. —Justin (koavf)TCM 15:54, 13 March 2022 (UTC)[reply]
By which of our definitions of eponym is Anna's hummingbird an eponym? DCDuring (talk) 16:05, 13 March 2022 (UTC)[reply]

Seems to be inconsistently integrated in so far as the latter in its name contains “verbs” but the former does not contain “noun”, and the latter gets categorized as Category:Lemmas subcategories by language but the former as Category:Terms by etymology subcategories by language. Outside the category structure we have Category:Taos deverbal nouns which nobody has noticed. I have no tendency towards any gestalt so far, and I can’t decide either. Furthermore somebody will have to make a complement {{denominal}} for {{deverbal}} – so far there is only an Arabic-specific {{ar-denominal verb}}. Fay Freak (talk) 18:31, 25 February 2019 (UTC)[reply]

A lot of this is redundant to our suffix derivation categories. In many cases, the suffix used already determines what something is derived from. For example, -ness always forms deadjectival nouns, it can't really be anything else. —Rua (mew) 18:47, 25 February 2019 (UTC)[reply]
Please see Wiktionary:Etymology_scriptorium/2018/May#основать. Per utramque cavernam 19:13, 25 February 2019 (UTC)[reply]
True, for “a lot”, and if you know the deep intricacies of Wiktionary’s category structure.
Category:Russian deverbals that contains now 53 entries has only entries the etymology of which consists in just removing the verb ending and using the stem. I see we have for this case Category:Russian words suffixed with -∅ – we just need to implement something like Category:Latin words suffixed with -o that is split by purpose of the suffix, Category:Latin words suffixed with -o (denominative), Category:Latin words suffixed with -o (compound verb) and so on, which is bare laudable. Now you only need to tell people, @Rua, how to create this id stuff, for to me it is a secret thus far.
However this does not work with nonconcatenative morphology thus far – you may link the previous discussions on those infix categorization matters here, but even if that pattern collecting is solved the derived terms listed at صَلِيب (ṣalīb, cross), for instance, would only be categorized by pattern but nothing would imply that the terms are denominal –, and the point I have made about the categorization and naming of these categories is still there. But I give you the green light in any case, if you want to replace all those “[language] deverbals” and “[language] denominal verbs” categorizations by suffigation categories of the format “[language] words suffixed with -∅ [deverbal]”, as well as if it concerns action towards categorization of nonconcatenative morphology language terms, since your idea of uniformity is correct. Fay Freak (talk) 19:49, 25 February 2019 (UTC)
Nonconcatenative morphology is still an underexplored part of Wiktionary, which is kind of annoying. But quite often, we simply show the concatenative part as the affix, and then leave a usage note saying what other changes occur when this form of derivation is used. For example on Northern Sami -i and -hit. —Rua (mew) 20:40, 25 February 2019 (UTC)[reply]
How to create an affix category with an id: add the id to the definition line in the affix's entry with {{senseid|language code|id}}, add {{affix|language code|affix|id1=id}} (at minimum) to the etymology section of a term that uses the affix, find the resulting red-linked category and create it with {{auto cat}}. — Eru·tuon 20:51, 25 February 2019 (UTC)[reply]
Thanks, this is easier than I imagined, so it takes the category name from {{senseid}}. I thought it is in some background module data. Now where to document it? Add it to the documentation of {{affix}} under |idN=? This is the main or even only use of this parameter in this template, right? Fay Freak (talk) 21:18, 25 February 2019 (UTC)[reply]
It's not that {{senseid}} has any effect on the category name, but that a category with a parenthesis after it, such as Latin words suffixed with -tus (action noun), expects a matching {{senseid}} in the entry for -tus, in this case {{senseid|la|action noun}} because the link in the category description points to -tus#Latin-action_noun, which is the format of the anchor created by {{senseid}}. The |id= type parameters, including in {{affix}}, generally create a link of that type. In {{affix}}, the parameter also has the effect of changing the category name. Sorry, I am not sure if I am explaining this clearly. — Eru·tuon 22:36, 25 February 2019 (UTC)[reply]
You explain this clearly. I just rolled it up from that side that I need to choose the name in {{senseid}} that I want to have in the category name so later with affix I will categorize in a reasonably named category because in other cases the id can be arbitrary – not that {{senseid}} has an effect on the category name. Fay Freak (talk) 22:53, 25 February 2019 (UTC)
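As an illustration of the exchange above, here is a minimal Python sketch of how the id ties the category name to the anchor, using the Latin -tus example from the thread. It is not the actual template or module code; the function names are hypothetical, and only the suffix case and the two output formats quoted above are modelled.

```python
from typing import Optional


def suffix_category(language: str, suffix: str, sense_id: Optional[str] = None) -> str:
    """Hypothetical helper: category title for terms using a suffix, optionally with an id."""
    title = f"{language} words suffixed with {suffix}"
    if sense_id:
        # The id given to {{affix}} via |idN= shows up in parentheses.
        title += f" ({sense_id})"
    return title


def senseid_anchor(language: str, suffix: str, sense_id: str) -> str:
    """Hypothetical helper: link target that the category description points to."""
    # Assumes spaces in the id become underscores, as in '-tus#Latin-action_noun'.
    return f"{suffix}#{language}-{sense_id.replace(' ', '_')}"


category = suffix_category("Latin", "-tus", "action noun")
anchor = senseid_anchor("Latin", "-tus", "action noun")
```

With these inputs `category` is "Latin words suffixed with -tus (action noun)" and `anchor` is "-tus#Latin-action_noun", so the category only links correctly if the entry for -tus carries {{senseid|la|action noun}}.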
Our affix system is not sufficient to handle morphological derivation we have to deal with (unless you want us to introduce lambdas...) Serbo-Croatian hardly has the intricacy of Arabic conjugation, but there are plenty of nouns that are created from verbal roots through apophony, and this needs to be categorized somehow. Crom daba (talk) 17:24, 2 March 2019 (UTC)[reply]
@Crom daba At least for Indo-European, we do have a system for handling combinations of affixation + ablaut, like on *-os (notice the parentheses showing the root grade) and -ος (-os). Our current system totally fails where there is no affix, though, a case which also exists in Indo-European. For example, there are some Indo-European forms of derivation, called "internal derivation", which are built entirely around changing ablaut grades and accents: *krótus (strength) > *krétus (strong) or τόμος (tómos, slice) > τομός (tomós, sharp). We have no systematic way to indicate this kind of derivation, but it is sorely needed. —Rua (mew) 23:42, 30 April 2019 (UTC)[reply]

2019 — March

1 member in this category, whose purpose I cannot discern and whose name seems like poor English to me. Note: "dismissal" is in Module:labels/data and should be removed from there if this fails. —Μετάknowledgediscuss/deeds 04:16, 31 March 2019 (UTC)[reply]

@Metaknowledge: What about Category:English dismissals? —Suzukaze-c 04:21, 31 March 2019 (UTC)[reply]
Thanks, suzukaze. Accordingly moved to RFM with both cats listed; I now see what the intent is, but I still think the name is bad. —Μετάknowledgediscuss/deeds 04:24, 31 March 2019 (UTC)[reply]
I think it fits in the same idea as Category:en:Greetings, but with a different naming scheme. "Greetings" should probably not be a set category, because sets group words by semantics (i.e. what the words refer to), rather than by function. —Rua (mew) 21:16, 7 April 2019 (UTC)[reply]
Add Category:Punjabi dismissals. Module:labels/data specifies that [[Category:<Language name> dismissals]] be added whenever the context label "dismissal" is used, but nothing has been added for this in the relevant category data module. This apparently predates Module:labels/data, since it was migrated in with all the other usage labels in August, 2013. We should either do something with these categories or get rid of the categorization in Module:labels/data- it's silly to have things showing up in Category:Categories with invalid label just because someone added a context label. Chuck Entz (talk) 17:39, 17 October 2020 (UTC)[reply]

2019 — April

Topical and set categories group terms based on what they refer to, but this category doesn't contain terms for greetings, it contains terms that are greetings. In other words, the name of the category refers to the word itself, not to its meaning, like Category:English nouns and unlike Category:en:Colors. So the category shouldn't be named and categorised like a set category, but instead should be named Category:English greetings. It belongs somewhere in Category:English phrasebook or Category:English terms by semantic function or something like that. —Rua (mew) 21:21, 7 April 2019 (UTC)[reply]

As above, these terms do not refer to farewells, they are farewells: the category name pertains to the word rather than the meaning. —Rua (mew) 21:27, 7 April 2019 (UTC)[reply]

Support per nom. - excarnateSojourner (talk|contrib) 06:41, 29 October 2021 (UTC)[reply]
Support. I agree with your reasoning. Tc14Hd (talk) 21:35, 3 September 2023 (UTC)[reply]

As above. —Rua (mew) 21:28, 7 April 2019 (UTC)[reply]

Support per nom. - excarnateSojourner (talk|contrib) 06:42, 29 October 2021 (UTC)[reply]

Again, as above. —Rua (mew) 21:28, 7 April 2019 (UTC)[reply]

Support per nom. - excarnateSojourner (talk|contrib) 06:43, 29 October 2021 (UTC)[reply]

Category:Translingual numerals or Category:Translingual numeral symbols

Discussion moved from WT:RFDO#Category:Translingual numerals or Category:Translingual numeral symbols.

We currently have both Category:Translingual numerals and Category:Translingual numeral symbols. If there's a difference, I'm not sure what it is. If not, I'm assuming we should merge one into the other. -- Beland (talk) 21:22, 26 April 2019 (UTC)

Numerals can be words (one, two in spelling alphabets), while numeral symbols are not (Roman numerals). The difference is subtle, but I think it is there. — surjection??18:51, 19 October 2021 (UTC)[reply]

Wiktionary:English entry guidelines vs "About (language)" in every other language

Some years ago, there was an RFM to rename all these pages, the discussion of which is archived at Wiktionary talk:English entry guidelines#RFM discussion: November 2015–August 2018. The original nomination mentions "and likewise for other languages", meaning that the intent was to rename these pages in parallel for every language. In the end, only the English page was moved, so that now the English page has a name different from all the others. User:Sgconlaw suggested starting a new discussion instead of moving the pages after the RFM has long been closed.

My own opinion on this is to rename the pages in other languages to match the English one. That was the original intent of the first RFM, and the new name better describes what these pages are for. The name "about" instead suggests something like a Wikipedia page where you can write any interesting fact about the language, which is of course not what they're actually for. Some discussion may be needed regarding the shortcuts of all these pages. They currently follow the format of WT:A(language code), so e.g. WT:AEN but also WT:ACEL-BRY with hyphens in the name. The original shortcuts should probably be kept, at least for a while, but we may want to think of something to match the new page name as well. —Rua (mew) 13:00, 29 April 2019 (UTC)[reply]

Support. —Μετάknowledgediscuss/deeds 18:17, 29 April 2019 (UTC)[reply]
Support renaming for accuracy and consistency. —Ultimateria (talk) 22:32, 14 May 2019 (UTC)[reply]
Support. – Jberkel 23:53, 14 May 2019 (UTC)
@Metaknowledge Apologies, I missed this from a year ago. I'll go ahead and rename. Benwing2 (talk) 00:51, 29 March 2021 (UTC)[reply]
@Metaknowledge FYI, this may take a little while. Lots of these pages have redirects to them and MediaWiki doesn't handle double redirects, so I have to find all the links to these pages (at least, those in redirects) and fix them. Benwing2 (talk) 01:19, 29 March 2021 (UTC)[reply]
@Benwing2: You mean you have to fix the redirects themselves, right? I hope that we can continue to use the WT:AFOO redirects even after the moves are complete. —Μετάknowledgediscuss/deeds 01:28, 29 March 2021 (UTC)[reply]
@Metaknowledge Yes, the redirects need to be fixed to point to the new pages. Benwing2 (talk) 01:31, 29 March 2021 (UTC)[reply]
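The redirect fix-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script actually used: the page titles are placeholders in the spirit of WT:AFOO, the function name is hypothetical, and redirects and moves are modelled as plain dictionaries rather than fetched from the wiki.

```python
def retarget_redirects(redirects: dict, moves: dict) -> dict:
    """Return a copy of `redirects` whose targets follow the page moves in `moves`.

    `redirects` maps a redirect title to its current target;
    `moves` maps an old page title to its new title.
    """
    fixed = {}
    for source, target in redirects.items():
        seen = set()
        # Follow chained moves, guarding against accidental cycles.
        while target in moves and target not in seen:
            seen.add(target)
            target = moves[target]
        fixed[source] = target
    return fixed


# Placeholder titles: a shortcut that pointed at the old "About" page.
redirects = {"WT:AFOO": "Wiktionary:About Foo"}
moves = {"Wiktionary:About Foo": "Wiktionary:Foo entry guidelines"}
fixed = retarget_redirects(redirects, moves)
```

After the call, `fixed["WT:AFOO"]` points directly at the new title, so the shortcut keeps working without a double redirect.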
@Metaknowledge One more thing: Some 'About' pages aren't just "About LANG". What should we rename the following?
  1. WT:About Algonquian languages: Does WT:Algonquian languages entry guidelines work, or should it just be WT:Algonquian entry guidelines?
  2. WT:About sign languages: Should it be WT:Sign languages entry guidelines, WT:Sign language entry guidelines, or something else?
  3. WT:About Arabic/Egyptian, WT:About Arabic/Moroccan, WT:About Chinese/Cantonese, WT:About Chinese/Cantonese/Taishanese, WT:About Chinese/Gan, WT:About Chinese/Hakka, WT:About Chinese/Jin, ... (other Chinese varieties), WT:About Lingala/Old: Does WT:Arabic/Egyptian entry guidelines, WT:Chinese/Cantonese/Taishanese entry guidelines, etc. work, or should we normalize to e.g. WT:Egyptian Arabic entry guidelines, WT:Cantonese entry guidelines, WT:Gan entry guidelines (or WT:Gan Chinese entry guidelines?), WT:Hakka entry guidelines (or WT:Hakka Chinese entry guidelines?), WT:Old Lingala entry guidelines, etc.? Cf. also Wiktionary:About Contemporary Arabic.
  4. Other subpages: Wiktionary:About Chinese/phonetic series, Wiktionary:About Chinese/phonetic series 2, Wiktionary:About Chinese/references, Wiktionary:About Chinese/tasks, Wiktionary:About French/Todo, Wiktionary:About German/Todo, Wiktionary:About German/Todo/missing a-d (and others), Wiktionary:About Greek/Glossary, Wiktionary:About Greek/Draft new About Greek, Wiktionary:About Hungarian/Participles, Wiktionary:About Hungarian/Todo, Wiktionary:About Japanese/Etymology, Wiktionary:About Korean/Romanization, Wiktionary:About Korean/references, Wiktionary:About Korean/Historical forms, Wiktionary:About Norwegian/Layout1, Wiktionary:About Norwegian/Layout2, Wiktionary:About Norwegian/Layout3, Wiktionary:About Spanish/Todo (probably completely outdated), Wiktionary:About Spanish/Todo/missing a-d (and others), Wiktionary:About Swahili/missing a-z, Wiktionary:About Tibetan/references, Wiktionary:About Vietnamese/references
  5. Wiktionary:About Japanese-English bilingual: What about this?
  6. Wiktionary:About Han script, Wiktionary:About Hangul script: Does WT:Han script entry guidelines work, or should it just be Wiktionary:Han script guidelines or something else?
  7. Wiktionary:About International Phonetic Alphabet, Wiktionary:About given names and surnames, Wiktionary:About undetermined languages: Not languages.

Benwing2 (talk) 01:57, 29 March 2021 (UTC)[reply]

@Benwing2: 1. I don't think we need the word "languages". 2. The second option sounds more grammatically correct. 3 & 4. I would go with subpages, but you may want to hold off on those, as some of the pages are heavily used and links to them will have to be fixed. Opinions solicited: @Justinrleung, suzukaze-c, Atitarev, Tibidibi. 5. It should be moved somewhere very inconspicuous; we could even delete it and nobody would miss it. 6. I guess the former? 7. The first one is now fine, the second can stay where it is, and the third seems somewhat useless (but @-sche may have an opinion). —Μετάknowledgediscuss/deeds 02:29, 29 March 2021 (UTC)
No opinion, although I am of the belief that many of our WT:<CJK> pages should be in the Appendix instead. —Suzukaze-c (talk) 04:33, 29 March 2021 (UTC)[reply]
I would probably like WT:Chinese entry guidelines/Cantonese, WT:Chinese entry guidelines/Gan, etc. for the ones in 3 so that they are still treated as subpages of WT:Chinese entry guidelines. — justin(r)leung { (t...) | c=› } 05:57, 29 March 2021 (UTC)
I agree with Justinrleung WRT the Korean pages as well.--Tibidibi (talk) 01:51, 31 March 2021 (UTC)[reply]
I think there's nothing on Wiktionary:About Algonquian languages that requires that page to exist, anyway, and am just going to make it a hard redirect to the About Proto-Alg. page instead of the soft redirect which is currently its entire contents, keeping the old edit history and old talk page comments. - -sche (discuss) 18:58, 26 July 2021 (UTC)
Note: There is another open discussion below on this exact topic. - excarnateSojourner (talk | contrib) 23:58, 21 October 2022 (UTC)[reply]

from Wiktionary:English entry guidelines to Wiktionary:About English (currently only a redirect)

Reason: to align it with all other WT:About LANGUAGE pages, such as:

--幽霊四 (talk) 18:44, 6 February 2021 (UTC)[reply]

See “Wiktionary talk:English entry guidelines#RFM discussion: November 2015–August 2018”. — SGconlaw (talk) 21:33, 6 February 2021 (UTC)[reply]
@Sgconlaw, Rua: Partial closure of the RFM was clearly not the best solution. Someone with a bot should move all of these and update the redirects. Rua, would you be willing to do that? —Μετάknowledgediscuss/deeds 23:04, 8 March 2021 (UTC)[reply]
@Benwing2, would you be interested in helping out with this mess? —Μετάknowledgediscuss/deeds 23:18, 21 July 2021 (UTC)[reply]
@Metaknowledge I looked into this a while ago and never finished it, sorry, because of various complexities. I will try to look into this soon. Benwing2 (talk) 02:24, 24 July 2021 (UTC)
@Benwing2 Any update on this? - excarnateSojourner (talk | contrib) 18:01, 27 April 2022 (UTC)[reply]
You know a discussion page has become too large and stale when there are two open discussions on the exact same topic. - excarnateSojourner (talk | contrib) 00:02, 22 October 2022 (UTC)[reply]
Support move Wiktionary:English entry guidelines --> Wiktionary:About English. Taylor 49 (talk) 22:13, 12 April 2024 (UTC)[reply]

2019 — May

toponyms

I think the categories for toponyms (e.g. English terms derived from toponyms) should be moved to a category just called [language] toponyms (e.g. English toponyms). It feels inconsistent to have English terms derived from toponyms while also having English eponyms. —Globins (yo) 01:14, 6 May 2019 (UTC)[reply]

A term derived from a toponym is an eponym, but is not a toponym itself. So the current names make sense. —Rua (mew) 11:45, 9 May 2019 (UTC)[reply]
Sense 2 for toponym is "a word derived from the name of a place," and the entry mentions eponym as a coordinate term. —Globins (yo) 00:04, 10 May 2019 (UTC)[reply]
@Globins Wiktionary's category structure only follows the first definition, which is the more common meaning. We shouldn't mix up the two definitions. —Rua (mew) 17:52, 13 May 2019 (UTC)[reply]
@Rua: In that case, English eponyms should be moved to English terms derived from eponyms since our current category name follows the less common definition of eponym. —Globins (yo) 21:16, 14 May 2019 (UTC)[reply]
Not really. An eponym is derived from a name. A toponym is a name. So a term derived from a toponym is derived from a name, but a term derived from an eponym is derived from another word that is then derived from a name. They're not equivalent. —Rua (mew) 21:18, 14 May 2019 (UTC)[reply]
I think "eponymic terms" would be better if you want to preserve the "name that a term is derived from" sense of eponym (as opposed to the "term derived from a name" sense). "Terms derived from eponyms" seems odd, maybe tautological, to me because a name is not inherently an eponym, but only when we are discussing the fact that a term is derived from it. — Eru·tuon 21:35, 14 May 2019 (UTC)[reply]
@Globins Do you have any response to Rua or Erutuon? It would be nice to mark this discussion as resolved if it isn't going to go anywhere. - excarnateSojourner (talk | contrib) 03:58, 19 October 2022 (UTC)[reply]
@ExcarnateSojourner I think I agree with Erutuon's category name suggestion then. —Globins (yo) 17:56, 19 October 2022 (UTC)[reply]

This was previously submitted to deletion, but kept (why it wasn't RFMed instead I don't know). —Rua (mew) 18:46, 19 May 2019 (UTC)[reply]

Support. DonnanZ (talk) 18:50, 19 May 2019 (UTC)[reply]
@Rua, Sgconlaw: The word "automobile" is not common in British English, but I think "car" is used everywhere, hence my preference for Category:Car parts. DonnanZ (talk) 09:22, 3 June 2019 (UTC)[reply]
But car is more ambiguous. DCDuring (talk) 10:28, 3 June 2019 (UTC)[reply]
I don't mind one way or another, but the whole category tree then needs to be renamed for consistency. (@Donnanz: how is car ambiguous? Do you mean it could be confused for, say, a train carriage or something?) — SGconlaw (talk) 10:34, 3 June 2019 (UTC)[reply]
Well, car is used especially in US English for a railroad car (either freight or passenger), and can be used in BrE for a railway passenger carriage. I feel the word auto can be ambiguous as well; "auto parts" can be used in the UK, but "car parts" is preferred. The word "auto" isn't used for a motor car in the UK. There is another category, Category:Automotive, so Category:Automotive parts may be a solution. DonnanZ (talk) 13:52, 3 June 2019 (UTC)[reply]
I was employed in the motor trade for many years, supplying car parts of all descriptions, even body shells on one or two occasions. DonnanZ (talk) 14:23, 3 June 2019 (UTC)[reply]
In that case it seems to me that "Category:Automobile parts" is least ambiguous. I'm not sure "Category:Automotive" is well named (why an adjective?); "Category:Road transport" would be better. — SGconlaw (talk) 15:15, 3 June 2019 (UTC)[reply]
Category:Nautical also uses an adjective, and there may be others. DonnanZ (talk) 15:46, 3 June 2019 (UTC)[reply]
Yeah, not too hot on that one either. My suggestion would be "Category:Water transport". — SGconlaw (talk) 15:58, 3 June 2019 (UTC)[reply]
As long as I can type {{lb|en|car part(s)}} and get the topical category Category:en:Automobile parts, my increasingly arthritic fingers would be happy. DCDuring (talk) 16:50, 3 June 2019 (UTC)[reply]
I can only sympathise. Depending on the outcome here, if you feel like fiddling around with modules I think Module:category tree/topic cat/data/Technology is the right one. DonnanZ (talk) 11:36, 4 June 2019 (UTC)[reply]

2019 — June

As has been pointed out here, "have" isn't part of the term. Chuck Entz (talk) 12:24, 25 June 2019 (UTC)[reply]

As I see it, have isn't part of the metaphor, but it is part of an expression that is not in turn a form of tie someone's hands. The passive (one's) hands are/were/being/been tied are such forms, though none make for a good lemma entry or likely searches. DCDuring (talk) 13:38, 25 June 2019 (UTC)[reply]
@DCDuring: Thanks. Also the second meaning of tied: restricted (which even offers the quotation: but the county claims its hands are too tied) --Backinstadiums (talk) 14:25, 25 June 2019 (UTC)[reply]
It's still a metaphor: a county doesn't directly have hands. DCDuring (talk) 17:39, 25 June 2019 (UTC)
For an example of tie someone’s hands being used in the active voice: “It will tie our hands for another nine years with respect to a labor contact [sic] with no layoff clauses and raises that are built in.”
In general, for any expression of the form “⟨VERB⟩ someone’s ⟨NOUN⟩”, there is a corresponding expression “have/get one’s ⟨NOUN⟩ ⟨VERBed⟩”. For example, cut someone’s hair → have one’s hair cut. Or knock someone’s socks off → get one’s socks knocked off. Or lower someone’s ears → have one’s ears lowered. If the expression is an idiom, sometimes we have one, sometimes the other, and sometimes both. --Lambiam 21:24, 25 June 2019 (UTC)
Indeed. Unless the active form is very uncommon, I'd prefer it as the lemma. I don't think that we would be wrong to have both the active-voice expression and the have and/or get expressions, even though we could argue that it is a matter of grammar that one can transform certain expressions in the way Lambiam describes. DCDuring (talk) 22:31, 25 June 2019 (UTC)

Request to merge Haitian Vodoun Culture language [hvc] into Haitian Creole language [ht]

According to Wikipedia and Ethnologue, hvc "appears to not be an actual language, but rather an assortment of words, songs, and incantations – some secret – from various languages once used in Haitian Vodoun ceremonies". Our only entries for it are Langaj and Langay, i.e. the two forms of the lect's name for itself. I suggest we consider it a variety of ht instead. Thoughts? Pinging @EncycloPetey as the creator of the two entries, although he hasn't been around for over a month. —Mahāgaja · talk 12:00, 28 June 2019 (UTC)

Although Ethnologue says it is "probably not a separate language", it does not say to which language it might belong. So merging it into another language would be original research, unless a source documents its inclusion in Haitian Creole. Nota bene: at the time I created the entries, neither WP nor Ethnologue expressed doubts about the distinctness of Haitian Vodoun Culture language. I am aware of its current doubtfulness, but it could also be considered a liturgical language in its own right. Without some authoritative statement, I'd hesitate to merge it into another language. There is more than one language spoken in Haiti. --EncycloPetey (talk) 23:28, 28 June 2019 (UTC)
Support. Hebblethwaite says in the excellent Vodou Songs in Haitian Creole and English that this is not only not a language, but that "[t]he words and chunks have mostly become incomprehensible to Vodouists and have a ritual or mystical purpose." Apparently the langaj used with different loa can be from entirely different languages from different parts of Africa. The entries we have are indisputably Haitian Creole (and which I think should not be capitalised according to orthographic rules). —Μετάknowledgediscuss/deeds 04:44, 31 March 2020 (UTC)[reply]
I can find little about this apart from the references mentioned already. The Encyclopedia of Language and Linguistics says it is "used as a second language only", as does Toyin Falola, Niyi Afolabi, and Adérónké Adésolá Adésànyà's Migrations and Creative Expressions in Africa... (2008), which says on page 157 "Langay, also referred to by linguists as “Haitian Vodoun Culture Language” (Gordon 2005), is used in Haiti as a second language for religion, song, and dance. Although it clearly has some Haitian Creole words, it is assumed that some of its vocabulary may be African.", as if they aren't even familiar enough with it to be sure what its vocabulary is. Interestingly, Jeffrey E. Anderson's article on it in The Voodoo Encyclopedia (2015), says that some pieces of it are attested from speakers/songs in the Mississippi Valley and its Voodoo tradition, outside Haiti / Vodou, which suggests the ISO's awkward FYROM-esque designation is, well, awkward. (Anderson does caution that "most [records] show little sign of langaj apart from a few words and some personal names of spirits", and "the origin of those that do appear to incorporate langaj is often unclear; the tendency of early authors to uncritically assume that Haitian Vodou and Mississippi Valley Voodoo were essentially the same thing renders it possible that some songs reportedly belonging to the Mississippi Valley may actually have been Haitian".) From what little I could find, it seems like a set of vocabulary (rather than a language per se) that might be compared to e.g. pandanus-avoidance vocabulary or Polari. (Procedurally, it will be subject to the same attestation requirements either way, and can be labelled and categorized.) - -sche (discuss) 07:54, 2 August 2020 (UTC)[reply]

2019 — August

devil a ...

I request that someone renames the entry devil a bit to devil a, because in practice "devil a" can be followed by any noun phrase, and it is neither practical nor desirable to have a headword for every combination. I then would also add divil a as a separate headword with no more content than referring it to "devil a" as a dialect variant. JonRichfield (talk) 08:13, 18 August 2019 (UTC)[reply]

We say ourselves in the entry for oxymoron that its use to mean "contradiction in terms" is loose and sometimes proscribed (despite the fact that many people use it this way nowadays). We say much the same thing at contradiction in terms as well.

The so-called oxymorons in this category are all or almost all contradictions in terms, where the contradiction is accidental or comes about only by interpreting the component words in a different way from their actual meanings in the phrase. An oxymoron in the strict sense has an intentional contradiction. I think we should be more precise about this, in the same way as we already are with using the term "blend" instead of "portmanteau", which has a narrower meaning. I therefore suggest we move this page to "Category:English contradictions in terms" (but see my second comment below). Likewise for any corresponding categories for other languages. — Paul G (talk) 06:51, 25 August 2019 (UTC)[reply]

On second thoughts, I think this category should be retained but restricted to true oxymorons, such as "bittersweet" and "deafening silence". Ones such as "man-child" and "pianoforte" are not intended to be oxymoronic and are only accidentally contradictions in terms. — Paul G (talk) 17:18, 26 August 2019 (UTC)[reply]

Support. Andrew Sheedy (talk) 01:30, 27 January 2020 (UTC)[reply]

2019 — September

Church Slavonic from Old Church Slavonic

Discussion started at Wiktionary:Beer_parlour/2019/September#I_want_to_add_Church_Slavonic_terms. A new language code for a newer version of Church Slavonic? --Anatoli T. (обсудить/вклад) 12:01, 16 September 2019 (UTC)[reply]

2020 — January

'Cities in Foo' and 'Towns in Foo'

@Donnanz, Fay Freak, Rua I'm not sure what the real difference is between a city and a town, and I suspect most people don't know either. For this reason I think we should maybe merge the two into a single 'Cities and towns in Foo' category. Benwing2 (talk) 03:54, 17 January 2020 (UTC)[reply]

I oppose this merger. I would not think to look for a category with such an unintuitive name, and I do not know of any examples where this is problematic. Wikipedia seems to be able to choose which word to use without trouble, so why can't we? —Μετάknowledgediscuss/deeds 05:36, 17 January 2020 (UTC)[reply]
Eliminating one of them is a good idea where there is no meaningful distinction between cities and towns. But that's going to be a country-specific decision: England makes the distinction, the Netherlands does not. I think in cases without a distinction, we should keep "cities" and eliminate "towns". —Rua (mew) 10:15, 17 January 2020 (UTC)
I wouldn't recommend merging them. It's a complex subject though, and the rules defining cities and towns can differ from country to country, and from state to state in the USA; I have come across "cities" with a population of less than 1,000 in the USA, sometimes around 50, but apparently they have that status. Cities in the UK have that status as granted by a monarch, towns can be harder to define in metropolitan areas, and villages can call themselves towns if they have a town council. Some villages large enough to be towns prefer to keep the village title. DonnanZ (talk) 10:34, 17 January 2020 (UTC)[reply]
The odds that editors will accurately/consistently distinguish these categories when adding (the templates that generate) them ... seem low. However, even if the categories are merged, that problem will remain on the level of the displayed definitions. And, apparently some users above want to keep them distinct. So, meh. - -sche (discuss) 05:36, 18 January 2020 (UTC)
I can see arguments for both sides, actually. The idea needs a lot more thought, as you would probably have to drag in villages etc. as well. DonnanZ (talk) 14:15, 19 January 2020 (UTC)[reply]
Could merge them into Municipalities in Foo and have the various alternatives point to that category. Of course there are some "cities" which contain several municipalities, but I don't think there is a word which comprises every form of village/town/hamlet/city/urban area. - TheDaveRoss 12:47, 21 January 2020 (UTC)[reply]
In New York State alone, we have cities, towns, villages (which are subdivisions of towns), and unincorporated places, all of which exist within counties, except NY City, which is coextensive with 5 counties, each of which is coextensive with a borough of the City. The identities and borders of these places in NYS are generally fairly stable, though subject to occasional revision. Legislative and judicial districts are separate, with legislative districts changing after each decennial census. Census-designated places form a parallel structure with relationships to the state systems. The census system has the virtue of being uniform for the entire US, but the borders of many census places do not necessarily correspond to the borders of larger governmental units such as states and counties. Within New York State there are lists of each type of jurisdiction. In principle each US state has its own names for classes of jurisdictions. Finally, in popular practice, place names for inhabited places can differ from the names of governmental units and tend to have different boundaries even when the names are the same.
In light of the lack of homogeneity even within the US, let alone between countries, I think we need to respect national and state and provincial naming systems. If there is a worldwide system for categorizing places, we could also follow that, but I have not heard of such a system. Does the EU have some uniform system?
In the absence of any generally accepted uniform universal or near-universal system for categorizing places, I think we need to accept the fact that nations and semi-sovereign parts of nations (eg, US states, Canadian provinces) each have their own naming systems, which are accepted within their boundaries. I think it would be foolish for us to attempt to have our own system for categorizing places and derelict for us to fail to use the various national and subnational categories.
If the categories then don't lend themselves to a uniform universal categorization system, too bad. DCDuring (talk) 17:55, 20 September 2020 (UTC)[reply]

This has both a Jamaican Creole section and an English section labeled as "Jamaican". I undid the removal of the English section by an IP before I realized what they were doing, but I don't want to revert myself and take it off the radar before someone else has a look at it to make sure the English section really is unnecessary. Chuck Entz (talk) 04:39, 23 January 2020 (UTC)[reply]

@Dentonius Any thoughts? Vox Sciurorum (talk) 00:12, 29 November 2020 (UTC)[reply]
A few of the quotes look like Patwa. But the others suggest that the word is used and generally understood by English speakers. Our -claat words are popular in Hip Hop music and are often used by African Americans (among others). I'd keep the English section. — Dentonius 08:03, 29 November 2020 (UTC)[reply]
I testify that it is correct to have an English section because of widespread use in Multicultural London English, and the pronunciation section is correct. This is also true for bloodclaat and bumboclaat. I.e. while “normal” Britons say bloody many of them new English say bloodclaat instead, and raasclaat. Fay Freak (talk) 11:07, 1 December 2020 (UTC)[reply]

2020 — February

Merge with out on a limb. Canonicalization (talk) 11:25, 9 February 2020 (UTC)[reply]

L.S., LS, lectori salutem, locus sigilli

The only related Wiktionary entry that now appears is that of L.S.; no entries appear for the terms being abbreviated (!), and a search of the full terms does not even currently link to the abbreviation/initialism page. (Nor does a search of LS bring one to a disambiguation of L.S. and LS!) All of these issues can be rectified easily by any registered editor with a reasonable understanding of disambiguation and markup (e.g., through creation of disambiguating tags and pages, and through duplication of relevant content for new definitions pages based on the abbreviation page).

Note, as an academic, I will not regularly or traceably work on Wiktionary, because of its lack of sourcing requirements for entry and note content. This leaves it with no basis for veracity, its persistent state, and a poor state indeed. (This weakness is more significant than that of Wikipedia, which is weak in largest part for its failure to adhere to its own rules and guidelines regarding sourcing.) Cheers. 2601:246:C700:19D:49BF:AECD:6AA6:2E34 16:26, 25 February 2020 (UTC)

I just don't understand what you're after. I reverted your edit because it radically changed what seemed to be an ok entry. We don't have disambiguation pages on Wiktionary and I suggest you read up on our rules and guidelines before you start deleting info again. --Robbie SWE (talk) 18:11, 25 February 2020 (UTC)[reply]
  • Kept: I assume lectori salutem and locus sigilli are not used (in their unabbreviated forms) in English and are SoP in Latin, meaning we should not have entries for them under either language. L.S. still has literal translations of the Latin, though. — excarnateSojourner (ta·co) 02:52, 7 April 2024 (UTC)[reply]

2020 — March

We have four definitions here and probably ought to have one; compare centripetal force, with its one, simple def. —Μετάknowledgediscuss/deeds 15:36, 12 March 2020 (UTC)[reply]

See w:History of centrifugal and centripetal forces. The article just lacks cites, dates, etc to support the historical definitions. I suppose that we should just give up on trying to cover the historical definitions and leave that to our betters at WP. DCDuring (talk) 17:55, 12 March 2020 (UTC)[reply]

Attested only in West Germanic, so it should be moved to Reconstruction:Proto-West Germanic/smalt. I did that already, but Rua reverted me, so I'm bringing it here for discussion. —Mahāgaja · talk 20:25, 16 March 2020 (UTC)[reply]

It cannot have been formed in Proto-West Germanic, as there was no productive means to do so. Therefore, it must have existed in Proto-Germanic. —Rua (mew) 20:26, 16 March 2020 (UTC)[reply]
Ablaut was still very productive in Proto-West Germanic, as it is still today in Germanic languages. One might claim there is no productive means to form dove as the past tense of dive or snuck as the past tense of sneak in modern English, and yet they exist. —Mahāgaja · talk 20:31, 16 March 2020 (UTC)[reply]
I'm not convinced that ablaut was productive even in the most recent stage of Proto-Germanic, let alone Proto-West Germanic. What evidence do we have that it was? —Rua (mew) 20:39, 16 March 2020 (UTC)[reply]
Well, if the fact that it's productive in the modern Germanic languages doesn't convince you, I don't know what will. —Mahāgaja · talk 20:59, 16 March 2020 (UTC)[reply]
In what way is it productive in the modern languages? —Rua (mew) 10:16, 17 March 2020 (UTC)[reply]
I just said: we still use it in modern English to form past tenses. And often in ways that don't even have parallels, so it can't be simply analogy. Dive/dove can be formed from drive/drove, but sneak/snuck and drag/drug don't have direct parallels that allow us to call them simple analogy, because there aren't any other verbs in /iːk/ → /ʌk/ or /æɡ/ → /ʌɡ/, so the only way speakers can have created them is by knowing that the language has a general process of ablaut. And even in Proto-Germanic *smultą/*smaltą isn't exactly a productive pattern: PG didn't generally create exact synonyms of nouns by changing their ablaut grade without any other affixation. So this derivation is just as irregular in PG as it is in PWG, so why not call it PWG since it doesn't exist outside of West Germanic? —Mahāgaja · talk 11:37, 17 March 2020 (UTC)
@Mahagaja: English drag derives from a strong verb, and was influenced by Old Norse draga with its indicative past "dró-". English sneak could also be derived from a strong verb but why it has snuck is beyond me, possibly by analogy. dove comes from a strong verb Proto-Germanic *dūbaną and dive from *dūbijaną. English drive has as its past participle "drove, drave, driv" with driv being the original, drove possibly from *draib and drave possibly from before Middle English? "draib" (PG) -> "drāf" (OE) -> "drove" (E). None of this points towards productivity of ablaut or of the -an suffix but that English can reshape strong verbs by merging weak verbs or reshaping their pattern through analogy. 𐌷𐌻𐌿𐌳𐌰𐍅𐌹𐌲𐍃 𐌰𐌻𐌰𐍂𐌴𐌹𐌺𐌹𐌲𐌲𐍃 (talk) 21:13, 20 March 2020 (UTC)
Consider also yeet, which is a running joke among young people online. They've decided that the past tense form is yote, which, given that it's completely made up, has no historical process to explain it whatsoever. That means that whoever made this up was aware of ablaut and (sort of) how it works.
Besides, this wouldn't be the first case of a form appearing to be in complete violation of all the rules of historical linguistics. It's always been a matter of probability, with the occasional exception proving the rule. Those poor early Germanic people didn't have access to the Neo-Grammarian literature, so they can be excused for getting it wrong now and then... Chuck Entz (talk) 06:13, 21 March 2020 (UTC)[reply]
I agree ablaut is still (marginally / semi)productive in modern English (this, page 59 from the 1973 Meeting Handbook of the Linguistic Society of America, says "It is also reasonably clear that semiproductive processes like English ablaut are the subject of general rules. For example, new items like snuck, dove, and drug attest to the viability of the ablaut M - rules."). I lack enthusiasm for figuring out whether this is PWGmc or PGmc. - -sche (discuss) 04:30, 18 August 2020 (UTC)[reply]
  • FWIW this noun is possibly the source of Finnish malto (flesh (of fruit)), which would require a NWG dating at least. Unfortunately not assured, though… there is also a partly homonymous Proto-Northern Finnic *malto (soft) with a different Germanic etymology proposed (currently given at malto). In fact this even suggests to me that *smalt might be, instead of a regular derivative of any kind, just a meld of the competing *smultą with the adjective *maltaz, which both can be reconstructed at least for NWG. (The latter's reflexes include ON maltr > maltur and OHG malz.) --Tropylium (talk) 17:31, 1 January 2021 (UTC)[reply]

Retiring Moroccan Amazigh [zgh]

We renamed this code from "Standard Moroccan Amazigh" to "Moroccan Amazigh", but failed to note that the "standard" part was key. This is a standardised register of the dialect continuum of Berber languages in Morocco, promoted by the Moroccan government since 2011 as an official language. Marijn van Putten says this is essentially Central Atlas Tamazight [tzm], but most of the people producing texts in it are native speakers of Tashelhit [shi], so there is a bit of re-koineisation. However, if we move forward with good coverage of the Berber languages, every entry in [zgh] will be a duplicate of [tzm] or else a duplicate of [shi] marked with some sort of dialectal context label. By the way, the fact that there is an ISO code seems to be a political consideration rather than a linguistic one; compare the case of "Filipino", which we merged into Tagalog, or "Standard Estonian", which we merged into Estonian. @Fenakhay, -sche —Μετάknowledgediscuss/deeds 21:31, 16 March 2020 (UTC)

Hmm, I see it's a rather recent attempt at standardization, too. I don't feel like I know enough about Tamazight to be confident about what to do, but it does seem like, if this is based on tzm, it could be handled as tzm (perhaps even, instead of putting "non-[ordinary-]tzm" entries at shi+label, they could be tzm+label, unless they're obviously shi words). - -sche (discuss) 15:44, 19 March 2020 (UTC)[reply]
Generally, it seems the [shi] words are quite obvious; the main differences between [tzm] and [shi] are lexical (as far as I can tell, [tzm] has more internal diversity w/r/t phonology than differences with [shi]). But they're in a continuum anyway, and WP claims that there's debate on where to draw the dividing line. —Μετάknowledgediscuss/deeds 16:35, 19 March 2020 (UTC)[reply]
And “Moroccan Amazigh” does not sound like a language name anyway if you have not been told it is one, it seems like “Berber as spoken in Morocco”, another reason to remove it. Fay Freak (talk) 15:59, 21 March 2020 (UTC)[reply]

2020 — April

to blue in the face (now a redirect to until one is blue in the face).

In addition to all the tense, person, and number variants (also contractions) of the current entry one can find variants omitting the pronoun, adding adverbs, using till or 'til instead of until; [VERB] oneself blue in the face; go|become|turn blue in the face; and blue-in-the-face and blue in the face as adjectives outside any of these expressions. The unchanging core of these is the set phrase blue in the face. It also has medical use (synonym cyanotic), which renders the figurative sense evolution and meaning obvious. DCDuring (talk) 17:39, 15 April 2020 (UTC)

The reconstructed infinitive form is useful to understand what the underlying verb is but it is never used in a sentence to convey meaning, like Azerbaijani *imək, Uzbek *emoq. —92.184.116.176 23:50, 20 April 2020 (UTC)[reply]

@Allahverdi Verdizade: Seems to me like reconstructions are not meant for this purpose. Is there a better way to lemmatise this verb? —Μετάknowledgediscuss/deeds 23:38, 8 March 2021 (UTC)[reply]
@Metaknowledge: I agree, and I don't think it's needed for any purpose, at least not for Azerbaijani. There was a user (or anon?) who insisted on adding those "underlying" verbs and creating templates for them, but I never understood the linguistics behind this reasoning. Allahverdi Verdizade (talk) 23:42, 8 March 2021 (UTC)[reply]
@Allahverdi Verdizade: So can we just delete them? How should they be lemmatised? —Μετάknowledgediscuss/deeds 01:27, 9 March 2021 (UTC)[reply]
You could lemmatize imiş as a free morpheme-form of -miş, i.e. Allahverdi Verdizade (talk) 01:36, 9 March 2021 (UTC)[reply]

2020 — May

to turn state's evidence.

Most use of state's evidence is clearly of state + 's + evidence. I haven't found any use that is suggestive of a restriction to a witness's testimony, except with the use of turn. Also compare “turn state's evidence”, in OneLook Dictionary Search, with “state's evidence”, in OneLook Dictionary Search. DCDuring (talk) 14:16, 13 May 2020 (UTC)

They can't be state + 's + evidence when the phrase encompasses proceedings where the prosecutor is not a state (e.g., a municipality, county, or country). bd2412 T 05:32, 6 August 2020 (UTC)[reply]
Sense 3 of state should cover it. I think if 3(a) doesn't cover it, then "Never do anything against conscience even if the state demands it." is not an appropriate citation thereof; I think Einstein would consider national, state, and city governments all part of "the state".--Prosfilaes (talk) 07:04, 17 August 2020 (UTC)[reply]

I wonder if these all ought to be merged into some entry akin to "play the ____ card" or something. There appear to be other words substituted aside from victim, race, and gender. Tharthan (talk) 22:09, 21 May 2020 (UTC)[reply]

I lament that our way of handling snowclones is not optimal, banishing them to appendix-space, such that the choices here amount to 'have these multiple similar entries in the mainspace where users find them' or 'banish them to a tidy but less-findable appendix'. However, I see that we have a sense at card for this (although the definition could use some work), and between putting a link there and redirects from these entries, I suppose we could get by with migrating these to the snowclone appendix. Centralizing them does seem sensible since there are so many. ("Play the religion card" also exists.) - -sche (discuss) 23:56, 21 May 2020 (UTC)[reply]
Maybe a title like play the prejudice card. — This unsigned comment was added by 2600:387:9:9::bf (talk) at 14:37, 2022 September 4.
Interesting idea. Perhaps there would be an extensive entry for play the (something) card, but with full entries for the main attestable instances (eg, race/gender and perhaps victim), derived terms, and a usage note about "(something)". Play the X card seems to be something that would be highly productive, unless its use in too many cases would be deemed a microaggression. Attestation for play the (something) card would have to be limited to "somethings" other than the forms that have their own attestation. Other instances that I can readily find are disability, oppression, and queer. The uses of feminist and bully don't fit the "victim" semantics, which might warrant a second figurative definition for play the (something) card in addition to a {{&lit}} "definition". DCDuring (talk) 15:10, 4 September 2022 (UTC)
BTW, we are not alone in having an entry for these, but MWOnline only has one for use/play the race/gender card. Collins and Cambridge Advanced Learner's have only play the race card. DCDuring (talk) 15:21, 4 September 2022 (UTC)
Besides those, there's "play the poverty card", "play the gay card", "play the abuse card", "play the disabled card", "play the rape card", etc., as well as ones which, as you say, seem like they may have different semantics (e.g. some uses of "play the Muslim card" in reference to legislation ?to get Muslim support?, and some uses of "play the Holocaust card"?) ... it seems too productive to have entries for every attested X (it becomes SOP). Should this be in the mainspace as play the something card, or at Appendix:Snowclones/play the X card like Appendix:Snowclones/X is the new Y? For snowclones like this that require placeholders other than "someone"/"one" in the title, we seem to in recent years prefer to put them in Appendix:Snowclones/ rather than in mainspace, but I do see a handful of mainspace titles where "something" is a placeholder, like give something a go. If we redirect all the variations people might search for, add usexes to the relevant sense we list at card, and maybe add a usex to whichever sense of play is relevant, it should be sufficiently findable. - -sche (discuss) 16:48, 4 September 2022 (UTC)[reply]
I'd favor having a full entry for any term (presumably they would be attestable) that another dictionary had. It is unfortunate that our basic search engine searching for "play the disabled card" (with or without quotes) does not take a user to any of our existing play the X card entries. (I have added test entries for play the card and play the something card.) That would imply that we could use hard redirects for as many attestable instances of the snowclone as seem likely to help users. It may well be that the hard redirects should go to the snowclone appendix subpages, but there is no particular reason to do so in preference to a mainspace entry. Concern about the aesthetics of headwords with placeholders seems misplaced. And (who knows?) someone might actually search for the expression using a placeholder and find it if it were in principal namespace. DCDuring (talk) 20:27, 4 September 2022 (UTC)
Also, as the MWOnline entry shows play is not strictly essential; it can be replaced by use, among other verbs, such as deploy. So, perhaps a sense of card is an appropriate target for redirects. But I doubt that the entry for card is the right place for an intelligible presentation. For one thing, any etymology (sense derivation), usage notes, and derived terms or collocations (eg, race card) would necessarily be separated from the relevant definition for the polysemous noun, so as not to appear on the same screen. And, even if they did, that they belonged together would not be at all obvious. I realize that this kind of argument, if applied, might make for some inconsistency in our presentation of snowclones and might violate a strict reading of idiomaticity, but cases like this may merit exceptional treatment. DCDuring (talk) 21:05, 4 September 2022 (UTC)[reply]
Onelook finds "play the race card" and "play the gender card" in various dictionaries, but "play the victim card" only in us, and it seems unlike the others in other ways, too: card seems unnecessary, as the same meaning is expressed by play the victim. As you note, all of these can also be found with other verbs, like "use". I am inclined to redirect play the victim card to either card's relevant sense or play the something card. It has a Swedish translation; if there are others, I would think play the victim would be the better THUB location. I'm not sure what to do about play the gender card and play the race card; on one hand, each is in other Onelook dictionaries; OTOH, you can swap out "play" for "use", "gender" for other things ("sexism", "sex", "woman", and with different meaning other things like "religion", etc), it's not a set phrase and the kernel of idiomaticity is obviously some smaller part, maybe just card, not the whole phrase. - -sche (discuss) 17:03, 7 September 2023 (UTC)[reply]
But the current definitions of both card and play the something card need improvement before anything can be sensibly merged there. - -sche (discuss) 17:05, 7 September 2023 (UTC)[reply]
Is the entry for race card good enough? DCDuring (talk) 07:50, 8 September 2023 (UTC)[reply]
As to card, we may need two additional definitions, one for the general metaphorical sense, another (subsense?) for the more specific sociopolitical use. As Equinox observed elsewhere, the metaphor of a competitive card game must be understood for the expression to make any sense at all.
(figuratively) A ploy of potentially advantageous use in a situation viewed as analogous to a card game.
The only card left for him to play was playing dumb.
An invocation of an emotionally or politically charged issue or symbol, as in a political competition.
race card, gender card
HTH. DCDuring (talk) 08:28, 8 September 2023 (UTC)[reply]

I don't think this is a special phrase with "you're", it sounds like a phrasal verb be on. They want a fight? They're on! She issued a challenge, so she's on!. You can also use it in reference to the fight itself, e.g. the fight is on. 76.100.241.89 18:51, 23 May 2020 (UTC)[reply]

Just noting to compare good on you→good on someone above. — 69.120.69.252 02:46, 24 May 2020 (UTC)[reply]
You're on might be considered distinct because it is usually a speech act, indicating acceptance of a bet or dare. DCDuring (talk) 17:34, 24 May 2020 (UTC)[reply]
Hmm, perhaps. But the IP is right that "on" can be used with other pronouns. I suppose the question is whether this is better viewed as someone is on, be on, or just on: we already have a sense for this at on, "(informal) Destined, normally in the context of a challenge being accepted; involved, doomed. "Five bucks says the Cavs win tonight." ―"You're on!" Mike just threw coffee onto Paul's lap. It's on now." - -sche (discuss) 04:25, 1 August 2020 (UTC)[reply]

2020 — July

I suggest that this entry be moved to Reconstruction:Proto-Slavic/vьlkodlakъ, since the -dl- cluster in the Czech descendant vlkodlak indicates that the cluster was still present in the Proto-Slavic form and was reduced to -l- in the other descendants. --108.20.184.19 00:44, 10 July 2020 (UTC)[reply]

User:Bezimenen, seems sensible? PUC 12:02, 10 July 2020 (UTC)
@PUC: I have no objections to the move; however, I'm not entirely sure that *vьlkodlakъ was the primary form. Semantically, it makes sense to analyze the lemma as Proto-Slavic *vьlkodolkъ = *vьlkъ (wolf) + *dolka (skin) + *-ъ with -ol- > -la- metathesis or possibly *vьlkodьlakъ (less likely in view of East Slavic forms with *-olo-, e.g. Russian вурдала́к (vurdalák, vampire)[1] /first recorded in written form in 18-19 cent./). You should consult with User:Rua in regard to which form should be created - *vьlkodlakъ or *vьlkodolkъ. I'm not so familiar with the style that Wiktionary likes to follow. Безименен (talk) 12:25, 10 July 2020 (UTC)
If the original form had -dl-, why do we not see it in the other languages that preserve it, such as Polish? —Rua (mew) 13:25, 10 July 2020 (UTC)[reply]
Not sure, but looking again at the entry, it seems not only Czech but also Serbo-Croatian and Slovene preserve the -dl- as well. --108.20.184.19 16:51, 10 July 2020 (UTC)[reply]
Serbo-Croatian (and, I believe, Slovene) never preserves Proto-Slavic -dl- clusters, so the Serbo-Croat form indicates either some such form as Proto-Slavic *vьlkodolkъ or a later epenthesis of -d- by analogy with dlaka. — Vorziblix (talk · contribs) 16:20, 27 July 2020 (UTC)[reply]

An example of w:U and non-U English, which probably should be decided for the latter. While “scent” can possibly be broader, this category also has the danger of just about including anything that has a strong odour naturally. Hence I included بَارْزَد (bārzad, galbanum) and جُنْدُبَادَسْتَر (jundubādastar, castoreum). The English category has a weak six entries since created in 2011. But even Category:en:Perfumes includes dubious things. I doubt perfumes are something that can be categorized well – it’s basically anything smelly? –, maybe delete all? Fay Freak (talk) 01:09, 27 July 2020 (UTC)[reply]

I think a case could be made for "scent" being not something that smells, but smell itself (like musk and maybe putridity). I don't see any reason why perfumes can't be categorized. I don't think it's meant to include anything that could be used as the scent of a perfume, but words that specifically describe perfumes. For instance, cologne isn't "cologne-scented", it's the name of a type of perfume; jasmine is a plant, but it is also used as the word for a perfume, not just to describe a perfume (you could say, "She always wore a liberal quantity of jasmine" and not just "She always wore a liberal quantity of jasmine-scented perfume". Of course, you could also say "She always wore a liberal quantity of Autumn Breeze" because it's a proper noun, but I don't think you could say "She always wore a liberal quantity of lilac". Instead you would say "lilac perfume".) Andrew Sheedy (talk) 03:07, 27 July 2020 (UTC)[reply]
So keep Category:Perfumes, in case I wasn't clear. I'd lean towards keeping Category:Scents as well, but I'd have to hear a few more opinions first. Hearing the value of having the category for other languages would be helpful. Andrew Sheedy (talk) 03:09, 27 July 2020 (UTC)[reply]

2020 — August

Adjective section should be merged into noun section.

I do not believe that the adjective shows a word that is truly an adjective, rather than a noun used attributively. Moreover, the noun section lacks a definition like "an organism or object with a blue tail", which is precisely the sense claimed by the adjective section.

Is this page used for merging of sections of the same entry, in the same language? DCDuring (talk) 13:59, 31 August 2020 (UTC)[reply]

Yes. The listed derived bird names are actually compounds blue + tail + X, as becomes obvious in German, e.g. Blauschwanz-Fruchttaube, for which nobody would create an adjective entry. Fay Freak (talk) 14:05, 31 August 2020 (UTC)

2020 — September

ungjetë to a new ungjet

Ungjetë is actually just a variant of ungjet, which is the standard form. ArbDardh (talk) 19:13, 12 September 2020 (UTC)

Tagged but not posted here: Merge with up to something.

Be is not essential to the idiom. Some other copulas work, eg. seem, appear, look. DCDuring (talk) 05:13, 20 September 2020 (UTC)[reply]

2020 — October

IMO it does not make sense to have some terms categorized directly into Category:Regional English (not its subcategories) and other terms categorized directly into Category:English dialectal terms, because in practice no-one seems to be maintaining a distinction as far as putting one kind of entry in one and another in the other; it seems haphazard as to whether an entry uses e.g. {{lb|en|US|regional}} / {{lb|en|UK|regional}} like pope, mercury, jack, snap, wedge, phosphate, tab, or gob, or else uses {{lb|en|US|dialectal}} / {{lb|en|UK|dialectal}} like pope (!), admire, haunt, on, sook, book, yinz, and gon. Many of the {{lb|en|US|dialectal}} / {{lb|en|UK|dialectal}} terms go on to specify which regions they're used in, like "Pittsburgh and Appalachia" or "Northern England" or "Scotland". And we put every more specific dialect category as a subcat of "Regional", not of "Dialectal". I'm not entirely sure which category the entries in the two top-level categories should be consolidated into, but I'm inclined to think they should go in one or the other. Or do we want to try to implement some distinction? (At the very least, entries that use "regional" but then go on to specify the regions, like "US, regional, Pittsburgh", can drop the unnecessary "regional".) The one situation I can think of where simply changing "regional" to "dialectal" would not work is that some entries are labelled "regional AAVE". Thoughts? - -sche (discuss) 01:06, 10 October 2020 (UTC)

  • I personally think that dialectal and regional terms should be separated, since a term for something in a region from an out-of-region dialect should be categorized into both regional dialects. -- 65.92.244.147 16:29, 22 November 2020 (UTC)
  • I think the real problem is that it's not clear what we mean when we say something is dialectal. Linguistically, a dialect can be any speech variety that is separate from the rest of the language. With a language such as English that has multiple standards, you could say that much of the language is dialectal, though no one uses the term that way. I suspect there may be a value judgment involved: dialectal English is the way local people talk when they're not using proper English. Regional has less of that: I say potayto and you say potahto, but that's just a matter of geography. Theoretically, sociolects like AAVE and Cockney would be better described as dialectal than regional, but I'm not sure whether they're described as either. For a lot of people, though, it's probably whatever it's called in the references they check (or copy from). Chuck Entz (talk) 18:21, 22 November 2020 (UTC)[reply]
"dialectal English is the way local people talk when they're not using proper English".
What, pray tell, is proper English? General Australian? Standard Canadian English? General American (*had trouble including that as a suggestion with a straight face*)? Standard Indian English?
If someone were to suggest that whatever is arbitrarily declared to be the 'standard' dialect of the English in their country is thus "proper English", and every other dialect is not, then that is obvious nonsense. I get that that is the reason why you used the phrasing value judgement, but if what you suggest to be going on is actually going on, then that is a problem.
Wiktionary aims to be descriptive, not prescriptive. So if the category "Regional English" is being used to suggest that certain dialectal terms are more "proper" than others, then we need to get rid of one category or the other. Tharthan (talk) 18:42, 22 November 2020 (UTC)[reply]
I'm not agreeing with the value judgment. I was too lazy this morning to put everything in quotation marks. The basic problem is that this terminology goes back to earlier academic standards and it's hard to tell what it means in a more modern context. A dialectologist or other linguist would probably have a more rigorous definition, but we don't seem to. Chuck Entz (talk) 19:36, 22 November 2020 (UTC)[reply]

There was a discussion about this in 2014 which was closed (in 2016) after little input, but: should this be -trix? The only word listed as a derivative of this which is not -trix is ambassadrix, and viewing it as containing a suffix *"-rix" while simultaneously viewing ambassadress as containing "-ess" is not consistent anyway (why not view it as -ix, at that point? or more compellingly, as a blend influenced by -trix?). Perhaps if there were two more "-rix"es, it could suggest "-rix" had become an alternative form derived from -trix (although again, the lack of a verb *ambassade makes viewing ambassadrix as *ambassade suffixed with -rix rather than ambassador blended with -trix questionable), but the main form appears to be -trix. (Or, actually, the main process by which English acquires -(t)rix words appears to be borrowing directly from Latin without the application of any suffix in English in the first place.) No? - -sche (discuss) 03:00, 10 October 2020 (UTC)[reply]

There may be some cases where the term doesn't exist in Latin. Either way, this page should be moved to -trix. Ultimateria (talk) 22:25, 15 October 2020 (UTC)[reply]
I agree, merge into -trix. - excarnateSojourner (talk | contrib) 06:11, 7 March 2022 (UTC)[reply]

Finally merged. The majority of claimed derivations of "-rix" were words borrowed from Latin and not suffixed with any English suffix at all. The rest were all -trix, apart from ambassadrix. fr.Wikt might or might not want to also update their own corresponding "English words suffixed with -rix" category. - -sche (discuss) 20:48, 29 March 2024 (UTC)[reply]

2020 — November

Category:en:Artificial languages

This should probably be moved to Category:Conlangs, because:

. Hazarasp (parlement · werkis) 11:29, 3 November 2020 (UTC)[reply]

The odd choice of wording was intended to avoid the topical category conflicting with Category:Constructed languages, which is a holding category for those languages. Given that our MediaWiki trappings make it impossible to resolve this conflict, I support this proposal as a better compromise. —Μετάknowledgediscuss/deeds 06:16, 8 November 2020 (UTC)[reply]
@Metaknowledge Do you mean that Category:en:Constructed languages would conflict with Category:Constructed languages? Would Category:en:Conlangs be an option? - excarnateSojourner (talk | contrib) 06:23, 7 March 2022 (UTC)[reply]
Strictly speaking, it's not a conflict between Category:en:Constructed languages and Category:Constructed languages, but between two possible versions of Category:Constructed languages. Our structure for topical categories (categories for entries associated with a particular topic) and set categories (categories for names of things) consists of an umbrella category that holds all the language-specific categories that are the same as the umbrella category, but prefixed with the language code of the language in question. In other words, you can't have Category:en:Constructed languages without an umbrella category called Category:Constructed languages. The problem is that Category:Constructed languages can't be both an umbrella category for language-specific categories and the container for things like Category:Esperanto language and Category:Volapük language. In order to have language-specific categories, you have to have an umbrella category that doesn't conflict with the container category for constructed languages. Currently that umbrella category is Category:Artificial languages, but the proposal here is to change it to Category:Conlangs. Thus, Category:en:Artificial languages would become Category:en:Conlangs. Chuck Entz (talk) 07:22, 7 March 2022 (UTC)
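The naming scheme described above can be sketched in a few lines of Python. This is only an illustration of the string convention (an umbrella name prefixed with a language code); the function name `language_category` is hypothetical and is not part of Wiktionary's modules.

```python
def language_category(lang_code: str, umbrella: str) -> str:
    """Build a language-specific category name from an umbrella category.

    Assumption: the convention is 'Category:<code>:<umbrella name>',
    as in the examples given in the discussion.
    """
    prefix = "Category:"
    # Accept the umbrella name with or without its namespace prefix.
    name = umbrella[len(prefix):] if umbrella.startswith(prefix) else umbrella
    return f"{prefix}{lang_code}:{name}"


print(language_category("en", "Category:Artificial languages"))
print(language_category("en", "Conlangs"))
```

Under this convention, renaming the umbrella category changes every language-specific name derived from it, which is why the choice of umbrella name matters.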

One should almost certainly be an alt-form of the other. I’m not sure which is best as lemma or whether the pronunciations should be identical. — Vorziblix (talk · contribs) 05:16, 17 November 2020 (UTC)[reply]

It looks like this is only citable with a pronoun, so the lemma should be zijn kat sturen. ←₰-→ Lingo Bingo Dingo (talk) 17:40, 26 November 2020 (UTC)[reply]

2020 — December

Tagged by Adam78 in July 2019, but apparently never listed. The specific diffs for the taggings are Special:Diff/53620744 and Special:Diff/53620742. The entries seem to have some distinct definitions listed, with take advantage of having "To exploit, for example sexually." and take advantage having "To profit from a situation deliberately." They also seem to share the definition "To use or make use of."/"To make use of something." Of final note, take advantage has a quotation with a usage that is not followed by of. Feel free to move this entry into the 2019 section if appropriate. —The Editor's Apprentice (talk) 19:58, 1 December 2020 (UTC)[reply]

2021 — January

Two assorted groups of adjective and adverb senses. Merge? Equinox 14:48, 5 January 2021 (UTC)[reply]

I'd bet that you couldn't come up with definitions on the merged entry that were both complete and substitutable as both adjective and adverb in such definitions. Also I'd expect that synonyms might need to be distinguished by PoS. DCDuring (talk) 16:22, 5 January 2021 (UTC)
I don't mean that Adj and Adv should be merged, but rather that the two named entries should be merged. Equinox 21:48, 5 January 2021 (UTC)[reply]
Support for the sake of deduplication. - excarnateSojourner (talk|contrib) 06:52, 29 October 2021 (UTC)[reply]
Hmm. I think the adjective is normally all-out. The adverb seems to mostly be "all out". So it seems like each POS is best situated where it is, on its own page...? - -sche (discuss) 05:14, 31 March 2024 (UTC)[reply]

These are terms that were historically used in the Dutch East Indies, perhaps to some degree also in Malay-speaking territories of the Dutch East India Company. A rename to Category:Dutch_East_Indies_Malay makes the most sense. It is doubtful that a category "Netherlands Malay" is needed because the number of speakers of Malay in the Netherlands is not very high. ←₰-→ Lingo Bingo Dingo (talk) 19:57, 10 January 2021 (UTC)[reply]

Support, assuming this can be demonstrated with cites for all the entries. Categories for polities that existed in the past is a good practice when merited; cf. Category:Rhodesian English. —Μετάknowledgediscuss/deeds 22:54, 10 January 2021 (UTC)[reply]

Move to “merry Christmas and a happy New Year” as merry and happy are adjectives which do not need to be capitalized, and per the quotation and the entries merry Christmas and happy New Year. J3133 (talk) 08:55, 15 January 2021 (UTC)[reply]

The capitals seem to be common, albeit not mandatory ([5], [6], [7]). Other forms might be:
As alternative forms (comma or not, different capitalisation) get their own entries, keep what exists. As for the main entry, which form is the most common? --幽霊四 (talk) 18:58, 6 February 2021 (UTC)
I see the capitalized form more often than the other forms (as illogical as it is), so oppose. But a usage note or at least an alternative forms section should make note of the other forms ("New Year" is rarely lowercase, in my experience). Andrew Sheedy (talk) 22:41, 4 April 2021 (UTC)[reply]
Similar entries include Merry Xmas, Happy Christmas, Happy Holidays, and Happy Thanksgiving. (There are a few in lowercase as well.) - excarnateSojourner (talk|contrib) 07:01, 29 October 2021 (UTC)[reply]
Equinox moved Happy Holidays to happy holidays, saying "caps not required by default, though of course banners and greeting cards etc. often use title casing". - excarnateSojourner (talk | contrib) 20:56, 21 April 2022 (UTC)[reply]

Merge; both contain the same code. J3133 (talk) 12:30, 20 January 2021 (UTC)[reply]

These seem clumsy anyway. {{m}} and its friends already know not to italicize scripts other than Latn; can't {{quote-book}} and its friends be taught the same thing? —Mahāgaja · talk 10:09, 30 January 2021 (UTC)[reply]

2021 — February

Significant overlap. Equinox 18:43, 6 February 2021 (UTC)[reply]

Merge (note that both have additional translations and reconnoiter was WOTD). J3133 (talk) 09:08, 11 February 2021 (UTC)[reply]

I've merged the verb translations for reconnoiter and reconnoitre. They are all now on reconnoiter#Translations, and reconnoitre#Translations points to this with {{trans-see}}. Voltaigne (talk) 12:38, 20 March 2021 (UTC)

Not synonyms, of course, but certain senses overlap almost entirely (except people have edited one and not the other without realising). Equinox 04:12, 14 February 2021 (UTC)[reply]

An approach would be to put all and only the true definitions that most commonly use a given spelling in that spelling and also have a definition in each saying that it is a synonym of the other spelling. That might not be exactly true, but would be close. To rely on the other term appearing in related terms seems a bit weak. DCDuring (talk) 04:19, 14 February 2021 (UTC)
Yeah, I think that's what we may have to do, with glosses in the {{synonym of}}s to make clear that each entry being a {{synonym of}} the other is not (just) circular. Like egoist vs egotist (we are not the only dictionary to have a sense line defining each of those terms as the other, in addition to other definitions). - -sche (discuss) 19:45, 14 February 2021 (UTC)[reply]
If possible and if properly executed, the approach I advocate gets you out of circularity for each individual definition. DCDuring (talk) 23:00, 14 February 2021 (UTC)[reply]

I would do this myself (it's SOP), but I don't know gender and other grammatical details. Chuck Entz (talk) 07:12, 16 February 2021 (UTC)[reply]

2021 — March

Equinox 00:38, 1 March 2021 (UTC)[reply]

Merge into whichever is used most often. - excarnateSojourner (talk|contrib) 07:03, 29 October 2021 (UTC)[reply]

Probably verb main lemma Oxlade2000 (talk) 11:06, 13 March 2021 (UTC)[reply]

I think this should be moved to sprong in het duister and converted to a noun. The forms with maken are not overwhelmingly common compared to other uses. ←₰-→ Lingo Bingo Dingo (talk) 14:56, 28 March 2021 (UTC)[reply]

2021 — April

There are several "rotation" senses that are patchily duplicated between these entries. Equinox 19:39, 9 April 2021 (UTC)[reply]

The definitions are different (I think fight shy is better; the other is too vague) and it seems that the entries should be merged anyway. Note that fight shy can occur alone, without of. Equinox 02:45, 12 April 2021 (UTC)[reply]

2021 — May

Should be moved to spit in the face of, since one can spit in the face of, e.g., the law, the government, hip-hop culture, and other non-people nouns. Imetsia (talk) 23:48, 7 May 2021 (UTC)[reply]

Actually it should be spit in one's face, as there are already many such set phrases involving a genitive construction where the variable object is represented as one's. For example: change one's mind, though there is also change someone's mind, which is redundant and should probably be deleted. 69.120.64.15 02:31, 10 May 2021 (UTC)
In this case (spitting) it should be "someone's", because "one" spits in "someone" else's face. We use "one" where the phrase is constructed so that it happens to oneself. Equinox 20:27, 23 May 2021 (UTC)[reply]
On the other hand, it's conceivable that someone says, "How dare you spit in my face?", meaning that the person addressed has treated the speaker disrespectfully. — SGconlaw (talk) 17:44, 4 June 2021 (UTC)[reply]
Spit in quoting is "TFFFF" or "PTFFFU". 155.137.27.93 15:05, 29 March 2024 (UTC)[reply]
Ah I see now, the distinction is that one's constructions are supposed to be reflexive. The distinction in titling however is not obvious and I wish it were made clear somewhere. — 69.120.64.15 03:57, 5 June 2021 (UTC)[reply]

Canonical name of "mep"

Currently, the canonical name of the language in WT is spelled Miriwung, even though every primary/secondary source I could find recommended the spelling Miriwoong, as that is consistent with the language's own orthography, while the spellings "Miriwung" and "Miriuwung" are considered nonstandard. Can someone fix it? --Numberguy6 (talk) 14:47, 8 May 2021 (UTC)[reply]

It's not exactly hard to find sources spelling it as Miriwung, but I'm sure you're right. @-sche? —Μετάknowledgediscuss/deeds 22:52, 21 July 2021 (UTC)[reply]

Can/should the Irish religion senses at these two entries be merged somehow? Equinox 20:26, 23 May 2021 (UTC)[reply]

2021 — June

All of the other Proto-Tocharian entries so far use ⟨y⟩ for this phoneme */j/, equivalent to Adams' ⟨i̯⟩. This is also the letter used on the Wikipedia article for Proto-Tocharian and in the standard romanization of Tocharian languages, which we use, not to mention for the corresponding phoneme in PIE, *y. It would be nonsensical and confusing to use ⟨j⟩ instead for the Proto-Tocharian stage only. The page was created recently (April), so presumably its creator just forgot to check the existing entries. — 69.120.64.15 03:37, 5 June 2021 (UTC)[reply]

Wait, apparently there is a distinction in how Adams uses ⟨i̯⟩ versus ⟨y⟩ for Proto-Tocharian, but I have no clue what it is. (It has nothing to do with PIE *d versus *y, for instance, and nothing to do with laryngeals.) — 69.120.64.15 04:18, 5 June 2021 (UTC)[reply]
Ok, it seems to be non-phonemic and have to do with the following vowel. */jä/ (⟨ä⟩ ≈ IPA /ɨ/) and */jē/ are written ⟨i̯ä⟩ and ⟨i̯ē⟩ respectively, but /jV/ for all other vowels seem to use ⟨y⟩. I doubt this is a necessary distinction for Wiktionary to make, since it seems entirely predictable from environment, but I'm still unsure what purpose it is meant to serve. @GabeMoore, might you be able to weigh in? — 69.120.64.15 04:34, 5 June 2021 (UTC)[reply]

Suggest making when all you have is a hammer, everything looks like a nail the primary and if all you have is a hammer, everything looks like a nail the alt form. Rationale: sounds better and more hits on Google in a 4:3 ratio. Cheers, Facts707 (talk) 17:19, 12 June 2021 (UTC)[reply]

Support for the sake of deduplication. - excarnateSojourner (talk|contrib) 07:14, 29 October 2021 (UTC)[reply]

I'm all but certain that one can't have a word without pronounced vowels, but I feel that it reads better if it's explicitly stated anyway. Johano 01:15, 15 June 2021 (UTC)[reply]

One can't? Hmmm... Chuck Entz (talk) 03:28, 15 June 2021 (UTC)[reply]
@Chuck Entz: Thanks for the chuckle. — Fytcha T | L | C 04:08, 13 January 2022 (UTC)[reply]
@Fytcha: Shhh... Chuck Entz (talk)
Those categories also have some questionable entries, particularly with Welsh loanwords like cwm and crwth... Seems like it just checks for the absence of a/e/i/o/u/y. – Guitarmankev1 (talk) 12:48, 24 June 2021 (UTC)[reply]
Maybe "Category:English words without vowel letters"? —Mahāgaja · talk 14:03, 24 June 2021 (UTC)[reply]
Yeah (except maybe "English terms"), that would also reduce how dumb it looks that the category includes lots of numbers which are quite regularly pronounced with vowels, and things where the vowels have merely been obscured (b****cks), and abbreviations that aren't even "words" per se, like BHD. - -sche (discuss) 22:03, 8 July 2021 (UTC)[reply]
I support moving to Category:English terms spelled without vowels. - excarnateSojourner (talk|contrib) 07:19, 29 October 2021 (UTC)[reply]

Yeah I think the category should be renamed then. Ffffrr (talk) 20:50, 16 December 2021 (UTC)[reply]

Also, why is it a subcategory of Category:English shortenings? Sure, a lot of shortenings omit the vowels, but the converse isn't true: hmm, grr, 1984 (unless every number is a shortening of its spelled out form, which doesn't seem all that useful). Do I need to start a separate request to remove a subcategory? Medmunds (talk) 18:53, 18 March 2022 (UTC)[reply]

Sorry, that was off topic here. Answered my own question; moving this to Category talk:English words without vowels. — Medmunds (talk) 00:20, 21 March 2022 (UTC)[reply]
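
The check guessed at above ("it just checks for the absence of a/e/i/o/u/y") can be sketched as follows. This is only an illustration of why Welsh loanwords, numerals and abbreviations would all land in the same category; the function name and the vowel set are assumptions for the example, not the actual categorization logic.

```python
# Hypothetical sketch of a purely orthographic "no vowel letters" test.
VOWEL_LETTERS = set("aeiouy")  # assumed set, taken from the comment above


def lacks_vowel_letters(term: str) -> bool:
    """Return True if the term contains none of a/e/i/o/u/y (case-insensitive)."""
    return not any(ch in VOWEL_LETTERS for ch in term.lower())


if __name__ == "__main__":
    for term in ["cwm", "crwth", "hmm", "grr", "1984", "BHD", "b****cks", "rhythm"]:
        print(term, lacks_vowel_letters(term))
```

A test like this says nothing about pronunciation, which is presumably why a name along the lines of "spelled without vowels" describes the result better than "without vowels".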

2021 — July[edit]

Like other Proto-Dravidian compound words, should not contain a hyphen. — 69.120.64.15 23:33, 19 July 2021 (UTC)[reply]

A bit fiddly: one entry is a verb and the other a noun, and they both have multiple senses with slight distinctions that should be ironed out. Equinox 13:13, 25 July 2021 (UTC)[reply]

2021 — August[edit]

Module:category tree/poscatboiler/data/lemmas already links to "punctual adverbs" but this module does not include "temporal location adverbs", although these apparently correspond to each other. (The former is defined as "adverbs that express a single point or span in time".) Also, "temporal location" looks a bit like a contradiction in terms.

For your convenience, the links to the other two category pairs are

Thanks in advance. Adam78 (talk) 17:21, 18 August 2021 (UTC)[reply]

@Adam78 @-sche @DCDuring I created a poscat for 'point in time adverbs' and unless there are objections I'm going to move 'temporal location adverbs' to this new name. This is consistent with the fact that discussions of temporal adverbs typically divide them into three categories, called "frequency", "duration" and "point in time". See for example [12] quoting Klein (1994) Time in language, as well as [13]. Note also [14], which distinguishes "adverbs of time", "adverbs of duration" and "adverbs of frequency". To me this new name, even if a bit bulky, is more or less self-explanatory and far better than either "temporal location" (which suggests a spatial location and seems like an oxymoron) or "punctual" (which suggests aspectual semantics rather than simply an adverb that references a particular relative or absolute point in time). Benwing2 (talk) 06:02, 31 October 2023 (UTC)[reply]
Shouldn't the name use hyphens (ie, 'point-in-time adverbs') to reduce ambiguity? (I'm reminded of 'eats shoots and leaves' and other cautionary examples.)
The association of temporal with spatial location is a natural one. Our category names, especially the linguistic ones, are mostly inside baseball, of little importance to normal users, but possibly liked by people who like neat categories or find them useful, like perhaps ESL teachers.
I suppose that we could have similar categories for spatial adverbs: 'point location' and 'direction' come to mind, but 'area/region/space location' may also be meaningful and distinct.
A test for a set of categories that bears on its intelligibility and utility is whether the members of the set are mutually exclusive and collectively exhaustive. (See w:MECE principle.) I don't think that PoS subcategories tend to be. In real-world category structures 'collectively exhaustive' is achieved by having categories like 'adverbs not otherwise categorized'. Is that what we are doing with direct placement in the hypernym categories? DCDuring (talk) 16:59, 31 October 2023 (UTC)[reply]
I am fine with renaming these to "English point-in-time adverbs".
Side issue: some of the entries in the category seem incorrectly categorized: "twenty-four seven" does not seem like it's designating a point in time the way "in three days time" does, and I'm going to RFV sense 2 of "from dawn to dusk". - -sche (discuss) 05:28, 31 March 2024 (UTC)[reply]
 Done. @-sche:, please feel free to fix the categorization of the English time adverbs; some of them definitely look misclassified. Benwing2 (talk) 07:01, 31 March 2024 (UTC)[reply]

Overlap with armenisht. Also, it may be an adjective not a noun Wubble You (talk) 19:01, 27 August 2021 (UTC)[reply]

2021 — September[edit]

Names of sah, alt, xgn-kha and request for Soyot[edit]

The Constitution of the Republic of Sakha (Yakutia) (https://iltumen.ru/constitution) officially used язык саха referring to the language sah. A government decree («О Правилах орфографии и пунктуации языка саха») which approved the language’s current orthography, used язык саха instead of якутский язык from its annexe. However, this usage is not mandatorily popularised. I suggest Sakha to be adopted instead of Yakut due to the Constitution reference.

Since atv ‘Northern Altai’ is not a single language/dialect but a group of several (Kumandy, Chelkan & Tubalar), atv should be split into subcodes. Furthermore, Southern Altai is only a classifying term; Altai, as an official term, is suggested for alt.

Khamnigan xgn-kha, as a transitional dialect (with conservative phonology) between Buryat and Mongolian, its simple name may not create ambiguity.

In addition I also request a code for Soyot. It will help contrasting Sayan Turkic languages. LibCae (talk) 06:36, 2 September 2021 (UTC)[reply]

The Constitution of the Republic of Sakha is not our guide to using English names. In the case of [sah], most scholarly descriptions use "Yakut" (e.g. The Turkic Languages), there are far more raw Google hits for "Yakut language" than "Sakha language", and Google Ngrams show a preference for "Yakut" that has not waned over time (but we don't know past 2008, after which the data are incomplete).
I can't comment on the other code requests, but it would be more convincing if there were some evidence in favour of the need for these codes and their distinctiveness from their closest relatives. —Μετάknowledgediscuss/deeds 16:11, 2 September 2021 (UTC)[reply]
I don’t see the argument how more information would come to light if we split Northern Altai. Surely also Northern Altai and Southern Altai are the most usual names, in either English or Russian. For that number of speakers Northern Altai has, how could there be a benefit? The major factor for editors is what sources they use, whether they indicate the sources and whether those are clear about the place of origin. I had many books about “the Aramaic dialect of [village X]” where I don’t know which damn language code of Wiktionary it is supposed to belong to, Wiktionary making codes centered around city A and B but not village X, in the end I ignored to add anything. Fay Freak (talk) 17:00, 2 September 2021 (UTC)[reply]
Oppose renaming Yakut
Support splitting atv
Support renaming alt to Altai
Abstain regarding xgn-kha
Support creating a code for Soyot, quite strongly so. Allahverdi Verdizade (talk) 17:13, 2 September 2021 (UTC)[reply]

Not the same as take a knee TVdinnerless (talk) 23:56, 2 September 2021 (UTC)[reply]

What's the difference? - excarnateSojourner (talk|contrib) 07:35, 29 October 2021 (UTC)[reply]

Merge into checked and entering? —Suzukaze-c (talk) 06:11, 3 September 2021 (UTC)[reply]

Saltillo in Rapa Nui[edit]

Previous discussion: User talk:Kwamikagami#Saltillo
Pinging @Kwamikagami, Metaknowledge as users that were already part of the discussion. For others, the following TLDR:

  • User:Kwamikagami has moved a few Rapa Nui pages en masse from a straight apostrophe (U+0027) to a saltillo (U+A78C)
  • The reason they give for this is that, since unicode classifies the apostrophe as a punctuation mark, rather than a letter, it shouldn't be used as a letter, and thus the visually similar saltillo should be used.
  • The counter-reason given is that Unicode's classification is arbitrary and has little to do with actual usage in the language, which we as Wiktionary want to follow.
  • There is one mention of the saltillo being open to usage, in Kieviet (2007).
  • There is yet to be found at least one usage of the saltillo for Rapa Nui in the wild, since both Kieviet (2007) and schoolbooks published by the Chilean government use either a straight apostrophe (U+0027) [This one is most common], or a curly apostrophe (U+2018) provided with a font that renders it similar to a prime (U+2032). Other grammar books and dictionaries use any of the three characters.
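
For reference, the Unicode classification that the argument above turns on can be inspected with Python's standard `unicodedata` module. This is a minimal sketch; the dictionary of candidates is my own selection. Note that Unicode names U+2018 LEFT SINGLE QUOTATION MARK (the "6"-shaped mark), while the "9"-shaped curly apostrophe is U+2019, so both are listed.

```python
import unicodedata

# Candidate characters for the Rapa Nui glottal stop (selection is illustrative).
CANDIDATES = {
    "straight apostrophe": "\u0027",
    "left single quotation mark": "\u2018",
    "right single quotation mark": "\u2019",
    "prime": "\u2032",
    "modifier letter apostrophe": "\u02BC",
    "saltillo": "\uA78C",
}


def describe(ch: str) -> tuple:
    """Return (code point, general category, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


if __name__ == "__main__":
    for label, ch in CANDIDATES.items():
        code, category, name = describe(ch)
        # Categories starting with "P" are punctuation; those starting with "L" are letters.
        print(f"{label:30} {code}  {category}  {name}")
```

Only U+02BC and U+A78C carry a letter category; the others are classed as punctuation, which is the whole of the Unicode-based argument. Whether that classification should decide page titles is the question under discussion.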

I believe we should move these pages back to a straight apostrophe, and set the use of the straight apostrophe in stone at WT:ARAP. What do others think? Thadh (talk) 10:53, 3 September 2021 (UTC)[reply]

We have four sources:
We have Du Feu, who used a special font because the usual fonts available to her were inadequate for Rapa Nui, which required two special letters (the glottal stop and the engma). If the ASCII apostrophe were adequate for glottal stop, there would've been no need for a special letter.
We have Kieviet, who states that, now that Unicode provides for the saltillo, there is no longer a need for a special font.
We have the ministry dictionary, which uses an apostrophe letter -- not ASCII input with smart quotes, because it has the '9' shape at the beginning of a word.
We have the ministry educational material, which uses a hodgepodge of ASCII apostrophes, curly apostrophes and curly quotation marks -- that is, sometimes '1' shaped, sometimes '9' shaped and sometimes '6' shaped, with little consistency. Presumably we wish to aim for better than that, even if it is common.
In most languages that use an apostrophe-like letter for glottal stop, it's common to substitute a keyboard <'>, but that doesn't mean we should do the same. When writing Chechen, it's common to use a digit <1> for palochka, but again that doesn't mean we should do the same. When writing Ossetian, it's common to use a Latin rather than Cyrillic æ, but if you did that in a domain name, it would likely be tagged as phishing. The shortcuts people take with typography may be common, but a dictionary is expected to be more professional. kwami (talk) 15:10, 3 September 2021 (UTC)[reply]
To briefly summarise the important points of what I said on Kwamikagami's talk page: This move should have been raised here first, so the weight of the evidence should have to point to the saltillo for us not to move it back. Kwami is from Wikipedia, and believes that we should be "more professional", even at the cost of ignoring all actual usage in a language community. (He has not, to the best of my knowledge, taken me up on my suggestion that he should go to the Wikipedias of languages like Rapa Nui and Hausa that use the apostrophe, and tell them that they're doing it wrong — just us.) I was open to the possibility that the saltillo might see actual use, but the fact that it doesn't makes this seem to be all about the Unicode specifications, which are not relevant to a descriptive dictionary. As a result, I support moving back to the apostrophe. —Μετάknowledgediscuss/deeds 16:15, 3 September 2021 (UTC)[reply]
There are several recent cases where @Mahagaja has advocated a particular Unicode character instead of the straight apostrophe in such cases, but I don't remember the specifics off the top of my head. Chuck Entz (talk) 16:18, 3 September 2021 (UTC)[reply]
We do need to use Unicode correctly. The straight apostrophe (U+0027) and curly apostrophe (U+2018) are punctuation marks and should not be used as letters. That's what the saltillo (U+A78C) and modifier letter apostrophe (U+02BC) are for. If using punctuation marks as letters were acceptable, Unicode wouldn't have bothered creating those characters. Using punctuation marks for letters is as bad as mixing Latin and Cyrillic (which is something we used to do for Montenegrin Serbo-Croatian, but don't anymore), as Kwami points out, and just because other sources do it doesn't mean we should. We can, of course, have hard redirects from spellings with the more easily typable straight apostrophe, or put the correct page name in {{also}} if the spelling with the straight apostrophe exists (as a punctuation mark) in another language. But Kwami was quite right to move these Rapa Nui pages to the spelling using the correct character, and they should not be moved back. —Mahāgaja · talk 16:39, 3 September 2021 (UTC)[reply]
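
The hard-redirect idea mentioned above can be sketched as a simple title mapping. This is an illustration under assumptions: the function name is hypothetical, the example strings are placeholders rather than attested Rapa Nui words, and nothing here reflects how redirects are actually created on the wiki.

```python
from typing import Optional

SALTILLO = "\uA78C"             # LATIN SMALL LETTER SALTILLO
STRAIGHT_APOSTROPHE = "\u0027"  # APOSTROPHE (the easily typed key)


def redirect_title(page_title: str) -> Optional[str]:
    """Return the straight-apostrophe spelling that could redirect to page_title.

    Returns None when the title contains no saltillo, since no redirect is needed.
    """
    if SALTILLO not in page_title:
        return None
    return page_title.replace(SALTILLO, STRAIGHT_APOSTROPHE)


if __name__ == "__main__":
    # Placeholder strings only, not real entries.
    for title in ["x\uA78Cy", "\uA78Cxy", "xy"]:
        print(repr(title), "->", repr(redirect_title(title)))
```

The same mapping works in the other direction if the straight apostrophe is kept as the page title and the saltillo spelling becomes the redirect.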
@Mahagaja: So if nearly everyone writing text in a given language (say, tens of millions of people) use a character that you consider "wrong", we should still avoid it because it doesn't respect Unicode? Whatever happened to descriptivism? (And if you think this is a silly hypothetical, it's not — I just described the situation with the apostrophe in Hausa.) —Μετάknowledgediscuss/deeds 21:56, 3 September 2021 (UTC)[reply]
It reminds me of when I started adding entries in the Cupeño language and had to figure out how to deal with a letter that the (pre-Unicode) main source defends as being very easy to replicate by filing bits off the $ key on a typewriter. People work with what they have available, and it doesn't always fit neatly into the right categories. Chuck Entz (talk) 23:00, 3 September 2021 (UTC)[reply]
@Metaknowledge: If tens of millions of people used Rapa Nui, it would have its own keyboard layout and the saltillo would be easy to type for them. Descriptivism applies to language, not orthography. It's not anti-descriptivist to say that recieve is a misspelling, and using an apostrophe as a letter is also a misspelling. The only difference is that using an apostrophe instead of a saltillo isn't a mistake that can be made when writing by hand or by typewriter or that can be detected in a photocopy or a scan, so it's more subtle (like mixing Latin and Cyrillic), but it's still a mistake. —Mahāgaja · talk 06:49, 4 September 2021 (UTC)[reply]
@Mahagaja: As I said, my example wasn't a hypothetical. There are somewhere around 60 million native speakers of Hausa per WP. Mac offers lots of keyboards for lots of languages, including one for Hawaiian complete with ʻokina, but it doesn't provide a Hausa one. When I search for Hausa keyboards on Google, they provide the apostrophe and quotation marks, but no character designated by Unicode as a letter. So are you really maintaining that nearly all typed material in Hausa is misspelt? —Μετάknowledgediscuss/deeds 07:11, 4 September 2021 (UTC)[reply]
Yes, though of course that's not the Hausa users' fault, it's the fault of the software companies that care more about providing support for a minority language spoken by 24,000 people in the United States than about providing support for a language spoken by tens of millions of people in Africa (i.e. systemic racism). I don't blame Hausa users for doing the best they can with the materials available to them, and I know it's unrealistic to expect them all to type &#700; instead of just hitting the apostrophe key, but as a dictionary it's our responsibility to do things the right way rather than the easy way. —Mahāgaja · talk 07:29, 4 September 2021 (UTC)[reply]
@Mahagaja: Systemic racism is the root cause of lots of annoying things, but some of those things are set in stone. At this point, Hausa users have no reason to follow Unicode rules even when they can. I'm sure the editors at Hausa Wikipedia can figure out how to get the "correct" character if they wanted to, but I see that you too have no interest in going over there and telling them they're doing it wrong. I have a radical idea: let's respect their choices. —Μετάknowledgediscuss/deeds 17:57, 4 September 2021 (UTC)[reply]
I have a better idea. We'll let Hausa Wikipedia worry about Hausa Wikipedia, and we'll worry about Wiktionary, which, as I said, has a responsibility to use Unicode correctly, even when other Wikimedia projects use it wrong. —Mahāgaja · talk 18:11, 4 September 2021 (UTC)[reply]
I thought we had a responsibility to document languages, not be Unicode purists. —Μετάknowledgediscuss/deeds 18:33, 4 September 2021 (UTC)[reply]
If Wiktionary or our browsers represent the languages incorrectly, because they follow the Unicode definition that punctuation marks are punctuation marks, then we are not documenting the languages correctly. If a language commission chooses a specific Unicode point that is one thing, but that's seldom the case. Since we by necessity choose a Unicode point for each letter regardless, we might as well choose one that represents the language well. kwami (talk) 01:17, 11 September 2021 (UTC)[reply]
Just to jump in quickly, as someone who is Nigerian and has had to go through the process of creating my own keyboards to be able to type properly in Yorùbá and as someone who is learning Hawaiian, while there definitely is systemic racism when it comes to African languages, I really would not pit them against Hawaiian. Hawaiian still lacks a ton of support, sometimes even less than Hausa, Igbo, & Yorùbá (see: spellcheck on PC Microsoft Word or language packs for Windows), and people are still trying to get more support for it. At the same time, Hawaiian is more than just "a minority language spoken by 24,000 people in the United States", it is an indigenous language that currently is the product of tons of effort gone towards revitalizing it and making sure that it's well-supported. And so, please do not pit them against each other saying that Hawaiian having more support (even though it doesn't) is systemic racism. The communities are aiming for very similar goals and are all dealing with racism in our own ways, not from each other.
Re: the main issue at hand. I would go with what the speakers of the language use. It's similar to what we do for Hausa, Igbo, & Yorùbá tones. No matter how annoying it can be, since the majority of speakers don't write tones out, we don't put them in page titles and only in headword lines, since we want people to be able to find words that they see "in the wild", which will often not be tone-marked. So it's a similar issue here, if the majority of speakers and majority of texts don't use the special character and it's not hard prescribed, then the page title shouldn't change, and the special character can be put in the headword line. That's my personal take on that issue. AG202 (talk) 13:48, 11 September 2021 (UTC)[reply]
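
The page-title versus headword-line split described above can be sketched as stripping tone marks while keeping the underdots. This is a rough illustration, not the actual Wiktionary module logic; the set of combining marks treated as tone marks is an assumption.

```python
import unicodedata

# Assumed tone marks: combining grave, acute and macron. The dot below (U+0323) is kept.
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}


def page_title(headword: str) -> str:
    """Derive a tone-less page title from a fully tone-marked headword."""
    decomposed = unicodedata.normalize("NFD", headword)
    stripped = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for headword in ["Yor\u00F9b\u00E1", "o\u0323\u0300ro\u0323\u0300"]:
        print(headword, "->", page_title(headword))
```

The tone-marked form would then appear only in the headword line, so that a reader who meets the unmarked spelling "in the wild" still finds the entry.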

This isn't a case of leaving out elements like tone marks. All RS's for Rapa Nui use the glottal stop. It's a matter of deciding which Unicode point to use for it, not whether to include it.

Re. the poor support for W. African languages, that's not racism so much as bias in the interests of the people developing Unicode. When Unicode decided they would no longer accept precomposed Latin, there was a call for people to get what they needed in before the deadline. But the respondents were all working on European languages. After West Africans started complaining that Unicode didn't adequately support their languages, the Unicode people realized they'd fucked up. At least, the ones I've talked to say they wished they'd realized what was going to happen, and spent more time on major African languages than on minor European languages.

Now that there are supplemental planes, there's room for more precomposed Latin. But as computers improve, there's less and less need for it, so I doubt they'll start accepting precomposed letters again.

I find it amazing that you could write Yoruba without tone. I mean, you can write it without vowels as long as you include the tone! kwami (talk) 23:06, 18 October 2021 (UTC)[reply]

There are many reasons that folks don't write Yoruba with tones, partly because of a lack of solid education, partly because of a lack of technological support, partly because you can (usually) tell what you mean from context, and a ton more. There was a solid seminar done last year at the British Library about it actually, but yea it's complicated. I wish that precomposed characters could be brought back, but that's a pipe dream. I don't think that you can write it without vowels as long as you include the tone though, as Yoruba is very vowel-heavy, and it'd get confusing quickly.
In terms of the question at hand though, I brought up the comparison more of a way to show how prescribed writing & everyday writing can interact on Wiktionary. If the majority of speakers type/write one way in informal & formal registers, that way should be the way that should be primarily reflected on Wiktionary, while the prescribed way can be shown in the headword line or an alternative form or whatever. However, I don't know the specifics of the situation with Rapa Nui, so I won't comment directly on the specifics of addressing it. AG202 (talk) 23:15, 18 October 2021 (UTC)[reply]
My impression of Yoruba, from the very very little I think I know of it, is that in fluent colloquial speech the vowels tend to assimilate to each other, and even consonants sometimes drop out, so that you might be left with a long [ooooo] with a bunch of tones and just a few consonants. It's the tone that makes it comprehensible. But that's by ear; I guess it wouldn't work well in writing.
But Hausa, yeah, I can see omitting the tone without any problem, except maybe the need to dab an occasional word. You might learn to write those few words with tone, the way accent marks distinguish homonyms in Romance languages, and otherwise ignore it. And some languages mark changes in tone, rather than the tone of each syllable. But I doubt that would work for Yoruba either. kwami (talk) 23:23, 18 October 2021 (UTC)[reply]
Input needed
This discussion needs further input in order to be successfully closed. Please take a look!
I have a suggestion that might be able to square this circle, but it's a bit awkward to explain so bear with me:
  • There are situations when it makes sense to remember the difference between the orthographic character a person intends to write, and the Unicode character which they actually use. A good example of this is the full stop "." (U+002E), which is also used in English (and Translingually) to represent the decimal point. We all agree that a full stop and decimal point are two different things, because any competent French translator would have to treat them differently, but the important thing is that that remains true regardless of which Unicode codepoints we happen to use. Indeed, it's true whether or not we're even encoding the characters at all. The same is true in French with the decimal point and the comma, too. Equally, nobody who receives "A-" on their homework is receiving "A dash" or "A hyphen".
  • Conversely, just because I write in full-width doesn't mean that any of us actually think "ｊ" has a distinct identity to "j" etc. There might be technical, historical and/or stylistic reasons why we have both, but the point is that we consider them to have the same orthographic identity.
  • However, none of this prevents us from having a particular manual of style when it comes to certain characters. If we want to start using the en dash "–" (U+2013) or minus sign "−" (U+2212) in places where people intended to use them (i.e. intended characters with those orthographical identities), then that's fine. It would be no more of a problem than our choice to use a clear, black, legible font on white by default, when the original might be scrawled on a barely legible manuscript. Obviously there are no codepoints to interpret in cases such as that. Hell, a lot of the time the codepoints "used" are actually just whatever the OCR software vomited up anyway. Just like with misspellings, there needs to be some genuine intention, and it needs to be considered with respect to the orthographical identity of the characters, and not the codepoints they happened to pick.
  • A final point is that writers of a language don't necessarily know their own language perfectly, or they might not perceive a conscious distinction between two characters that does actually exist, because the context usually makes it so obvious (e.g. the full stop and the decimal point). It's not enough to say "yes, they intended to write an apostrophe because that's what they used". Are they really treating it as one?
I don't know enough about Rapa Nui to know whether the saltillo is the most appropriate character, but I hope that's a framework that makes it easier to determine the answer. Theknightwho (talk) 16:00, 7 July 2022 (UTC)[reply]
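
The gap between orthographic identity and code point can be made concrete with Unicode compatibility normalization. In this minimal sketch (the helper name is my own), NFKC folds a full-width letter onto its ordinary counterpart, but it does not fold the minus sign or en dash onto the hyphen-minus, and it cannot split the single code point U+002E into "full stop" and "decimal point".

```python
import unicodedata


def same_after_compat_folding(a: str, b: str) -> bool:
    """Return True if two strings are identical after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


if __name__ == "__main__":
    # Full-width j (U+FF4A) versus ordinary j: same identity, different code points.
    print(same_after_compat_folding("\uFF4A", "j"))
    # Hyphen-minus versus minus sign (U+2212): normalization keeps them apart.
    print(same_after_compat_folding("-", "\u2212"))
    # Hyphen-minus versus en dash (U+2013): also kept apart.
    print(same_after_compat_folding("-", "\u2013"))
```

So normalization settles some identity questions mechanically, while others, such as whether a typed apostrophe is really intended as a letter, have to be decided from how writers treat the character.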

See WT:Etymology scriptorium/2021/September#korku, -u and -i

This Turkish suffix entry is probably the same as -i, and possibly , due to vowel harmony. While I don't know much about Turkish, the fact that this was created as the only Turkish edit ever by this contributor and the other two were created by a veteran contributor who is a native speaker has to count for something. Chuck Entz (talk) 05:45, 18 September 2021 (UTC)[reply]

-u is a harmonized form of -i, as are -ı and -ü. The canonical forms of suffixes are those with i and e. I disagree with the current policy of essentially providing the same definition 4 (or 2) times, see for instance the situation with -im, -ım, -um, -üm where only one contains all meanings and etymological information. I'm in favor of keeping the harmonized realizations of the suffixes as separate articles but I'm strongly in favor of converting the non-canonical forms into simple referral pages (see -dük). --Fytcha (talk) 09:59, 18 September 2021 (UTC)[reply]
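
The four-way harmony behind these variants can be sketched as a lookup on the last vowel of the stem. This is a simplified illustration with hypothetical names; it expects lowercase input (Python's default case mapping does not handle Turkish dotted and dotless i) and ignores consonant alternations and exceptions.

```python
# Last stem vowel -> harmonized realization of the canonical suffix vowel i.
HARMONY = {
    "a": "ı", "ı": "ı",   # back unrounded
    "e": "i", "i": "i",   # front unrounded
    "o": "u", "u": "u",   # back rounded
    "ö": "ü", "ü": "ü",   # front rounded
}


def harmonized_i(stem: str) -> str:
    """Return the variant of the -i suffix that follows the given lowercase stem."""
    for ch in reversed(stem):
        if ch in HARMONY:
            return HARMONY[ch]
    raise ValueError(f"no vowel found in stem: {stem!r}")


if __name__ == "__main__":
    for stem in ["kork", "gel", "yaz", "gör"]:
        print(stem, "->", "-" + harmonized_i(stem))
```

Under this view -u, -ı and -ü carry no information beyond -i plus the stem, which is the case for turning the non-canonical pages into simple referrals.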

Apparently the same thing. Reduce one entry to a synonym? Equinox 16:08, 19 September 2021 (UTC)[reply]

2021 — October[edit]

Sundanese by @Rankf. I'm guessing this should be lowercase. —Μετάknowledgediscuss/deeds 07:20, 2 October 2021 (UTC)[reply]

·~ dictátor·mundꟾ 18:24, 15 October 2021 (UTC)[reply]

Agree. DovaModaal (talk) 09:55, 16 October 2022 (UTC)[reply]

Mölmsch[edit]

We have an entry at Männeken in Mölmsch (see w:de:Mölmsch (Dialekt)), which is a dialect of Brabantian spoken in Mülheim, Germany. There is no code for Brabantian, which we consider a dialect of Dutch. So what do we do with Männeken? I'm really not comfortable calling it Dutch, even labeled {{lb|nl|Brabant}} (especially since Mülheim isn't in either Belgian Brabant or Dutch Brabant). So I'd like to create a new code, but for what? One specific to Mölmsch? One for Kleverländisch (the subvariety of Brabantian that Mölmsch belongs to)? One for Brabantian? What do others think? —Mahāgaja · talk 18:42, 16 October 2021 (UTC)[reply]

According to Wikipedia it's not part of Brabantian or Dutch, but it belongs to Low Franconian, to which also belong Dutch (nl) and Limburgs (li). LVR differentiates the following Low Franconian dialects in the Rhineland: Kleverlandish, South Low Franconian, Ostbergisch. Mülheim is classified as Ostbergisch. So some possibilities are:
  • Have an umbrella code for Low Franconian in Germany.
  • Have a code for Kleverlandish and another for Ostbergisch. South Low Franconian, according to Wikipedia, is a synonym (see Limburgisch) or holonym (see Limburgish#Expanded) of Limburgish. In the first case, there's already the code li. In the second case, either all Limburgish entries would have to be moved to South Low Franconian, or there would be South Low Franconian except Limburgish or maybe South Low Franconian in Germany.
--Myrelia (talk) 21:09, 20 October 2021 (UTC)[reply]
Since there's already a Wikipedia article and Wikidata item for the Bergish dialects, I'm creating a code for Bergish and not worrying about the rest. —Mahāgaja · talk 13:04, 11 February 2022 (UTC)[reply]

Move to Wikipedia?
It's encyclopaedic and not really about words, for example the etymology of footsies isn't explained (related to foot?). --Myrelia (talk) 21:09, 20 October 2021 (UTC)[reply]

There is good stuff but it's mostly written like a long essay or book. Equinox 00:53, 4 November 2021 (UTC)[reply]
This would almost certainly not survive on Wikipedia. I have mixed feelings about the Appendix namespace here, as it seems a lot of things go there that would never be acceptable in the main dictionary, but that appendix pages are so hard to find for the casual user that it doesn't really bring us down. Soap 11:36, 13 April 2023 (UTC)[reply]
If we do end up deleting this, I'd hope we could try to contact the editor who wrote most of it (see User_talk:DKThel). There are other wikis that could host content like this where they wouldn't be pushed into the background like our appendix pages are. Admittedly the trade-off for that is having ads and using a site that is itself harder to find. Soap 11:47, 13 April 2023 (UTC)[reply]
There's definitely no point in moving it to Wikipedia, since it was originally moved here from Wikipedia, so they already decided they don't want it and foisted it on us. That's why it's written so encyclopedically. If anyone's interested in it, they should clean it up to make it more dictionarian; otherwise we should just delete it. —Mahāgaja · talk 12:34, 13 April 2023 (UTC)[reply]
A lot of these, to be honest, should just have their own entries. AG202 (talk) 19:53, 13 April 2023 (UTC)[reply]

2021 — November[edit]

These two entries link to each other rather confusingly and there may be redundancy in it. Equinox 00:52, 4 November 2021 (UTC)[reply]

This entry seems to have been created in title case by mistake; while proper nouns are capitalized in Esperanto, "eŭkaristio" is not a proper noun and thus should be moved to eŭkaristio. --Martelkapo (talk) 19:51, 10 November 2021 (UTC)[reply]

What does actual usage look like? Eucharist is a capitalized common noun in English; maybe it is in Esperanto too. —Mahāgaja · talk 20:25, 10 November 2021 (UTC)[reply]

Seems to be an alternative spelling of sfacimma, or the other way round. I don't know which should be made the main entry. --Akletos (talk) 09:03, 13 November 2021 (UTC)[reply]

Sfacimma is an alternative spelling of sfaccimma, which is how the word has been mostly written in the past years; the IPA pronunciation of the word is nowadays always /ʃfat͡ʃˈt͡ʃimmə/, with the voiceless postalveolar affricate having always the gemination, hence the spelling cc. Antomanu14 (talk) 13:59, 10 December 2022 (UTC)[reply]

Senses 1 and 3 seem to be the same thing. Maybe this could just be reduced to "dated form of guarantee", even. Equinox 10:56, 13 November 2021 (UTC)[reply]

I think it's been an alternative form of guarantee for a couple of centuries. Only recently (this decade) has it become much less common than guarantee. DCDuring (talk) 17:39, 13 November 2021 (UTC)[reply]
OTOH, MWOnline has differing, but overlapping, definitions for the two terms. DCDuring (talk) 17:42, 13 November 2021 (UTC)[reply]

In Britain the -y spelling seems to be almost exclusively used. Equinox 04:39, 18 November 2021 (UTC)[reply]

(There's whanghee too, but that is suitably an alt-form stub entry.) Equinox 11:22, 20 November 2021 (UTC)[reply]

2021 — December[edit]

Renaming [nlo][edit]

Wikipedia uses the phrase "Ngul (including Ngwi)" to describe this language, which we currently call "Ngul", but this paper indicates that these are just two of several synonyms, and uses "Ngwi" as the primary name. We should follow suit. —Μετάknowledgediscuss/deeds 00:19, 21 December 2021 (UTC)[reply]

Renaming [amf][edit]

We currently call this language "Hamer-Banna", after two of its dialects; WP uses "Hamer". This hyphenated name is found in the literature, though it excludes the third dialect, Bashaɗɗa. Modern publications, following the lead of Petrollino's grammar, use the spelling "Hamar" for that dialect. As I see it, if we stick with the hyphenated name, we should change it to "Hamar-Banna", but we could also consider elevating the name of the primary dialect to cover the language as a whole, as WP does, though in that case we should use "Hamar" instead. —Μετάknowledgediscuss/deeds 07:56, 22 December 2021 (UTC)[reply]

Equinox 05:43, 26 December 2021 (UTC)[reply]

Indus Valley Language

We currently have this language, which Wikipedia refers to as the Harappan language, as [xiv]. I suggest that we retire the code, because the language is undeciphered and its script has not been encoded, so there is nothing to add to Wiktionary in the foreseeable future. I also suggest that we retire the script code [Inds], which is only used for this language. @AryamanA Μετάknowledgediscuss/deeds 07:14, 28 December 2021 (UTC)[reply]

Merging Yoruba dialects

Currently, we have codes for [mkl] "Mokole" (see Mokole language (Benin)), [cbj] "Ede Cabe", [ica] "Ede Ica", [idd] "Ede Idaca", [ijj] "Ede Ije", [nqg] "Ede Nago", [nqk] "Kura Ede Nago", [xkb] "Manigri-Kambolé Ede Nago", and [ife] "Ifè" (all of which are lumped into Ede language). These lects are all very close to Yoruba proper (which they use for formal and liturgical purposes), and spoken by people who are considered ethnic Yorubas; moreover, they are included in the Global Yoruba Lexical Database. I have added them as dialects of [yo] "Yoruba" in MOD:labels/data/subvarieties, but treating Yoruba as a macrolanguage means we must remove these codes. (Note: the family code [alv-ede] would have to be removed as well.) @AG202, Oniwe, Oníhùmọ̀ Μετάknowledgediscuss/deeds 07:29, 28 December 2021 (UTC)[reply]

Merge, obviously again Ethnologue’s fabrications, which were then copied over from Wikipedia and some other “encyclopedias” with their impractical credulity towards this reference. Fay Freak (talk) 07:54, 28 December 2021 (UTC)[reply]
If anything I would keep the Ede family code and change the lects to be etymology-only languages (edit: excluding probably Ifè since it is much more documented), but putting them all under Yoruba I unfortunately oppose for now. The Western Ede languages as seen here have a higher degree of separation from Nuclear Yoruba, and it checks out more when comparing, at the very least, the words and phrases of Ifè to nuclear Yoruba: Ifè-French Dictionary, Peace Corps - IFÈ O.P.L. WORKBOOK, J'apprends l'ife: Langue Benue-Congo du Togo. While there are obviously words that are shared due to them being related languages, it doesn't feel like a dialect of Yoruba (to me at least), so I feel uncomfortable grouping it under Yoruba. Though I do admit that I haven't really looked into the other Ede languages nearly as much. Edit: This paper may be helpful and at least shows some of the differences between Ifè & Yoruba and some aspects of the dialect continuum. Obviously some Ede varieties are much closer to Yoruba, but then I wonder what to do about the other ones. AG202 (talk) 15:09, 28 December 2021 (UTC)[reply]
@AG202: Thanks for the sources. The question of whether to lump a code is in part based on how much extra work is entailed; would you be willing to work through a subsample to see how much we would just be duplicating Yoruba entries, and how much would be distinct? I'm not sure what you're actually advocating, because making them etymology-only languages (which you say you support) would require merging them (which you say you oppose). —Μετάknowledgediscuss/deeds 07:18, 29 December 2021 (UTC)[reply]
@Metaknowledge Yea, sorry for that being unclear. I oppose the merger under solely Yoruba. Regarding the etymology-only part, I would support having all the Ede lects (excluding Ifè) under the header "Ede" and then differentiating on the definition line which Ede lect it is, mainly because they have much less coverage than Ifè, and it's harder to tell their mutual intelligibility. (Though as mentioned I'm not as well-versed with the other lects, so I might be entirely wrong about their continuum.) In terms of working through a subsample, I am up for doing so, though I am swamped at the moment so it'd definitely take a while, but from what I've seen so far, I'd be worried about putting possible Ifè terms like ɖíɖì (belt) or àntã̀ (chair) under a Yoruba header and keeping nice clear entries for readers. AG202 (talk) 07:52, 29 December 2021 (UTC)[reply]
Looks reasonable. To clarify, my main note relates to the observation that the language names currently in the data are too unnatural to find use and do not even meet our CFI, which again means there is no entrotopy for those who know the languages to assign material to the designations with little doubt, as there is little to confirm the meanings of the language names, which should be a consideration if you devise new namings, in so far as you would like to have not a private language but something more or less obvious to new editors as to what the language codes are for. So I did not mean that there cannot be a split in a different manner, or a smaller merge, but the current ones should be recognized as off the wall, and then there will have to be something that interrelates the remaining codes if one stumbles upon one, else it will be a recurring problem that an editor did not see the distinction between the available language codes. Fay Freak (talk) 01:36, 30 December 2021 (UTC)[reply]

2022 — January

Equinox 10:35, 1 January 2022 (UTC)[reply]

probably the same thing. Br00pVain (talk) 13:19, 17 January 2022 (UTC)[reply]

Seemingly synonyms. Equinox 19:00, 18 January 2022 (UTC)[reply]

Tagged but not listed in August 2021 by User:Caoimhin ceallach, providing the reason:

I'm in favour of moving this page to *én. As {{R:ine:LIPP|page=221|vol=2}} shows, there is no evidence that points to an initial laryngeal and Greek and Vedic speak against it.

I've redacted the preceding quote by incorporating the reference in the superscript. Thadh (talk) 11:16, 25 January 2022 (UTC)[reply]

We reconstruct all PIE terms with an initial laryngeal on the project, per current PIE theory, so *én = *h₁én. Sidenote, {{R:ine:LIPP}} is an embarrassment in the academic community, and should never be used as a primary source. --Sokkjo (talk) 02:24, 8 February 2023 (UTC)[reply]
can you elaborate on why you think {{R:ine:LIPP}} is an embarrassment? --Ioe bidome (talk) 15:58, 4 March 2023 (UTC)[reply]
@Ioe bidome: His hypothetical system of deriving roots from particles is largely considered crackpottery. --– Sokkjō 20:42, 4 March 2023 (UTC)[reply]
@Sokkjo @Ioe bidome I don't care where we hold this conversation, as long as you reply. I'm the second person who has asked you to elaborate. If you can't, your concern will have to be dismissed. —Caoimhin ceallach (talk) 08:04, 27 March 2023 (UTC)[reply]
I have elaborated on why academics largely reject {{R:ine:LIPP}}, and referred you to this unfavorable review, DOI:10.1515/zcph-2019-0009. That's all neither here nor there, as on this project, we subscribe to laryngeal theory, which also calls for word-initial laryngeals before vowels. If you wish to make an argument for why we should do away with that standard, feel free to start a discussion in the WT:Beer parlour, but as is, your move request is unwarranted. --– Sokkjō 08:52, 27 March 2023 (UTC)[reply]
@Sokkjo
  • You seem to have not read that review. If you did you'd see that it is overwhelmingly positive:
    • "Ce sera un ouvrage de référence pour longtemps."
    • "Ces remarques ne retirent rien à l’importance de l’ouvrage, qui peut servir de base tant à une recherche synchronique éclairée consacrée à tel ou tel groupe de langues qu’à une étude proprement comparative."
  • The other review I'm aware of is also overwhelmingly positive:
    • "In this massive, and truly monumental, two-volume work that was years in the making, author George Dunkel (henceforth D) draws on the extensive research, and the literally dozens of articles, that he has done throughout his distinguished career as an Indo-Europeanist, investigating the uninflected bits and pieces – the ἄπτωτα (áptota), the indeclinabilia¹ – of the Indo-European lexicon that are so indispensable to the phrasal and sentential syntax and to discourse and text structure in all the family’s languages." https://www.jbe-platform.com/content/journals/10.1075/dia.33.4.05jos
  • Nothing about this thing takes away from laryngeal theory.
  • I'm going to ask again: please elaborate on your misgivings about LIPP. —Caoimhin ceallach (talk) 10:28, 27 March 2023 (UTC)[reply]
Since I'm guessing you don't have academic access to the second page:

Reste une réserve. Malgré la prudence de l’auteur, les processus de formation des grammèmes qu’il étudie relèvent, par définition, de la reconstruction, et l’ouvrage n’étudie pas de manière détaillée les processus qui ont lieu à date historique. Parfois le lecteur peut avoir l’impression que le système restitué est d’une complexité qui le rend typologiquement invraisemblable; ainsi, vol. 1, pp. 24–26, l’auteur pense pouvoir reconstruire pour l’indo-européen quatre thèmes pronominaux qui relèvent de l’exophore proximale, deux qui relèvent de l’exophore distale, et quatre thèmes anaphoriques (George Dunkel écrit que les thèmes liés à l’exophore proximale et distale ne sont pas en contraste sémantique les uns avec les autres, mais seulement avec l’absence de déixis; ce point est obscur aux yeux du recenseur). Une telle richesse en thèmes démonstratifs nécessiterait une explication. Au demeurant l’opposition entre exophore proximale et distale n’est pas nécessairement suffisante pour couvrir tous les thèmes de l’indo-européen, qui a pu posséder par exemple trois degrés d’exophore.
En fait il peut sembler que la reconstruction des grammèmes indo-européens est vouée au flou, faute de données permettant d’étudier, notamment, la sémantique exacte des éléments concernés aux différents stades chronologiques et dans les différentes aires géographiques à prendre en compte.

Again, I'm not here to argue about LIPP -- that's beside the point. The point is that the established convention we follow on the project for reconstructing PIE is that #VC- only possibly exists in pronouns, if even there. See {{R:ine:IEL|52}}. – Sokkjō 23:03, 27 March 2023 (UTC)[reply]
@Sokkjo, I read the whole review. I even quoted from the second page. As I said before, a reservation does not equal an invalidation.
The validity of LIPP is very much on point. I would like to mention an alternative reconstruction *én (and if others agree move the page), which is supported by evidence instead of resting on some misplaced assumption. You preclude any discussion by rejecting the evidence out of hand.
In addition, I would like to continue citing LIPP, so your violent objection to it ("embarrassment to the academic community") is relevant to me. I think it is fair to say that if you could back up your objection you would have done so by now.
I am of course aware that roots had a CₓVCₓ structure. There are good reasons for assuming this. However, this is not the case for suffixes, it is not the case for pronouns, it is not the case for adverbs and it is, indeed, not the case for particles.
Your "established convention" that #VC- entries aren't allowed, doesn't exist. If you think otherwise please point to it. WT:AINE does not mention the phonotactics of entries. And at any rate, WT:RECONS clearly says that "variants and disputed forms can then be addressed in great detail within the text of the pages themselves". If you don't want *én on the page, you need to have (at the very least) substantive arguments why the evidence supporting it is wrong.
But I'd ask you to please be more careful about your references. You keep quoting things which don't support your position. {{R:ine:IEL|52}}: "It seems that onsetless initial syllables (#VC-) were rare", i.e. not nonexistent. LIPP, the first systematic study of Indo-European particles, documents evidence for a substantial number of exactly these. —Caoimhin ceallach (talk) 01:23, 29 March 2023 (UTC)[reply]
I'm aware of what I cited, "rare" meaning they are limited to pronouns, and to continue on to the following sentence, "It is common practice now to reconstruct initial laryngeals even when not strictly provable". You seem to be under the impression that I created this "common practice" and set that convention here on the project. I'm honored you think I have that seniority, but despite my contributing here for over a decade, it long precedes me. If you want to argue against the status quo, not just on this project, but in academia at large, the weight is on you to do so. – Sokkjō 03:51, 29 March 2023 (UTC)[reply]

2022 — February

The art/literature senses are defined very differently at the two entries, which seems like a problem. One is already tagged for cleanup, so, good luck! Equinox 00:21, 1 February 2022 (UTC)[reply]

Different spellings of the same word, from Yiddish. 70.172.194.25 19:46, 2 February 2022 (UTC)[reply]

Members:

@Justinrleung, RcAlex36, 沈澄心 Fish bowl (talk) 05:55, 6 February 2022 (UTC)[reply]

@Fish bowl: Gansu means actual Gansu in China, but Gansu Dungan should be its own label perhaps. I'm not sure why those entries are labelled specifically as Gansu Dungan, though, because do we know if it's not used in other varieties of Dungan? Pinging @Mar vin kaiser to know why he chose to label it as Gansu Dungan specifically. — justin(r)leung (t...) | c=› } 06:03, 6 February 2022 (UTC)[reply]
@Justinrleung: There's this website, I can't find the link now, that was like a mini Dungan dictionary, and for some of its words, it has a dialectal label. I think I got it from there. --Mar vin kaiser (talk) 08:39, 6 February 2022 (UTC)[reply]
@Mar vin kaiser: This? I know these words are marked as Gansu here, but I wonder if we need to specify it as Gansu specifically when we don't know if other Dungan varieties use it. — justin(r)leung (t...) | c=› } 09:02, 6 February 2022 (UTC)[reply]
@Justinrleung: Oh, I added the label Gansu with the assumption that it's specifying that it's only used in Gansu. Aren't there just two dialects, Gansu and Shaanxi? --Mar vin kaiser (talk) 14:03, 6 February 2022 (UTC)[reply]

Merge Category:Hokkien, Category:Hokkien Chinese; and perhaps move Category:Hainanese depending on the result of the previous

Category:Hokkien is an etymology language, while Category:Hokkien Chinese belongs to the {{dialectboiler}} system.

Category:Hainanese is presently both.

Fish bowl (talk) 11:10, 7 February 2022 (UTC)[reply]

@Fish bowl @Justinrleung @RcAlex36 @沈澄心 @AG202 IMO we should delete Category:Hokkien Chinese and recategorize the lemmas under it to Category:Hokkien. This is consistent with the treatment of other etymology languages, particularly since Hokkien is considered a dialect of the Min Nan language and not a dialect of "Chinese" (which is not a language). If you don't mind, I will go ahead and do this. (While we're at it, we should rename the Amoy etymology language to Xiamen Hokkien, which is currently a dialect category but not an etymology language, and give it a standardly-formed etymology code. Its current code is nan-xm, which is badly formatted; etymology codes should consist of sections of three letters, hence nan-xia. Same goes for nan-ph -> nan-phi, nan-qz -> nan-qua, nan-zz -> nan-zha, nan-jj -> nan-jin.) Benwing2 (talk) 05:29, 16 September 2023 (UTC)[reply]
I also think we should upgrade Hokkien to a full language, esp. seeing as Min Nan is itself not a language but a macrolanguage. Benwing2 (talk) 05:30, 16 September 2023 (UTC)[reply]
Agree that we should treat Hokkien as a full language - this feels long overdue. I think in general each lect listed in {{zh-pron}} should be treated as a full language in its own right, which means Sichuanese (currently with etymology code [cmn-sic]) and Leizhou (currently lacks a code, I would suggest [nan-lei] or [nan-lz]) would be upgraded. We might also want to add more etymology codes, but that might warrant a separate discussion.
I however oppose changing the 3-2 letter codes, which are much easier to memorise (since this is just taken from the first letter of each syllable) and also are consistent with the location codes used in {{zh-pron}}. Changing them means that we would need to deal with two separate, inconsistent systems.
Regarding the category name issue, for some reason we also have categories like Cat:Mandarin Chinese, Cat:Cantonese Chinese, Cat:Hakka Chinese, Cat:Min Nan Chinese, etc. alongside the regular lemma categories. I don't really care about their treatment as long as the approach is consistent. – wpi (talk) 17:18, 16 September 2023 (UTC)[reply]
@Wpi It is a pain to have nonstandard etym codes like this, as it requires adding code to various places to handle them. I don't see why the 3-2 codes are easier to memorize; the proposed 3-3 codes consistently use the first three letters of the lect in question, which is standard practice at Wiktionary, whereas the 3-2 codes aren't consistent (nan-ph is not the first two syllables of "Philippine"). In terms of the location codes in {{zh-pron}}, we should rename the latter to match the 3-3 codes. However, as a first step if you don't object, I will promote Hokkien to a full language, and we can continue the discussion on etym codes; in this case we should maybe eliminate Category:Hokkien in favor of Category:Hokkien Chinese for consistency with the other such categories, although in general we need to rethink the naming of these categories. Benwing2 (talk) 19:21, 17 September 2023 (UTC)[reply]
I think one reason that 3-2 codes are easier to memorize is that {{zh-pron}} uses 2-letter codes for dialects of Hokkien. However, if it makes more sense for codes to be 3-3 to be consistent with other languages, I wouldn't mind it. I agree that whatever we do, we should make it consistent with CAT:Mandarin Chinese, CAT:Gan Chinese, CAT:Xiang Chinese, etc., (which means the easiest thing to do is to have CAT:Hokkien Chinese). — justin(r)leung (t...) | c=› } 18:09, 19 September 2023 (UTC)[reply]
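The renames proposed earlier in this thread (nan-xm to nan-xia and so on) and the "sections of three letters" rule can be sketched as a small check. This is only an illustration: the name `is_three_letter_code` and the regular expression are assumptions made for the example, not anything taken from Wiktionary's modules, and region-suffixed codes such as cmn-TW would need a looser pattern.

```python
import re

# Renames proposed in the discussion above (old 3-2 code -> proposed 3-3 code).
PROPOSED_RENAMES = {
    "nan-xm": "nan-xia",
    "nan-ph": "nan-phi",
    "nan-qz": "nan-qua",
    "nan-zz": "nan-zha",
    "nan-jj": "nan-jin",
}

# Assumed reading of "sections of three letters":
# lowercase three-letter groups joined by hyphens.
_THREE_LETTER_SECTIONS = re.compile(r"[a-z]{3}(?:-[a-z]{3})*")


def is_three_letter_code(code: str) -> bool:
    """Return True if every hyphen-separated section has exactly three lowercase letters."""
    return _THREE_LETTER_SECTIONS.fullmatch(code) is not None


if __name__ == "__main__":
    for old, new in PROPOSED_RENAMES.items():
        print(old, is_three_letter_code(old), "->", new, is_three_letter_code(new))
```

Under this reading every old 3-2 code fails the check and every proposed 3-3 code passes it, which is the inconsistency the rename is meant to remove.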

Category:Classical Chinese presently has ~270 pages.

Category:Chinese literary terms presently has ~8,000 pages.

(Thankfully, Category:Literary Chinese lemmas has 0 pages.)


Category:Classical Chinese is currently described as:

{{dialectboiler|zh|the 5th century BC to 2nd century AD, and continued as a [[literary language]] until the 20th century}}

[[Category:Old Chinese lemmas]]
[[Category:Middle Chinese lemmas]]
[[Category:Literary Chinese lemmas]]


Category:Korean Classical Chinese is a child category of Category:Classical Chinese, and may be a rationale for keeping Category:Classical Chinese in some form.

Fish bowl (talk) 11:17, 7 February 2022 (UTC)[reply]

@Fish bowl @Wpi @Justinrleung @RcAlex36 @沈澄心 I think we need to do something about this. I propose the following:
  1. At a minimum, we should merge Category:Classical Chinese with the Category:Literary Chinese language by renaming the latter to Category:Classical Chinese language.
  2. I also think we should demote the resulting Category:Classical Chinese language to an etym-only language of Category:Chinese language. It's purely a literary construct and not on the same level as the spoken varieties. Note for example that we don't have separate languages for Classical Latin, Koranic Arabic or Modern Standard Arabic.
  3. Finally, we should consider merging Category:Chinese literary terms into Category:Classical Chinese, as proposed above, but I don't have the background to know whether this is a good idea.
Any objections if I carry out #1 and #2? Benwing2 (talk) 19:13, 17 September 2023 (UTC)[reply]
@Benwing2: "Literary Chinese" is generally regarded as the same as "Classical Chinese", but reading w:Classical_Chinese#Definitions I think we might want to keep them separate for lexicographic purposes, by treating Literary Chinese as the later stages. Note that Classical Chinese is also distinct from "literary Chinese" (note the capitalisation), although they overlap in certain places (such as Ming/Qing era usage).
Using the hypothetical conjunction "if" as an example, (chéng) and 向使 (xiàngshǐ) are found in Qin-Han era Classical Chinese, but not in Ming-Qing era Classical Chinese (which uses words closer to modern usage instead, i.e. literary terms) - I wouldn't call the former literary terms; they are more like obsolete or archaic ones.
Early Classical Chinese (Qin-Han) is significantly different from modern Chinese in terms of grammar and pronunciation, and a reasonably educated person would have a hard time understanding a text even with annotations; Tang-Song era Classical Chinese is still somewhat incomprehensible, with a couple of terms that are obsolete by modern standards; late Classical Chinese (Ming-Qing) is more fuzzy and one might simply call it literary Chinese; in early modern times these are all considered to be one thing, which is why we have the misnomer Literary Chinese.

I think #1 of your proposal would be relatively uncontroversial, though I would wait to see input from others.
#2 is questionable, depending on what we regard Chinese to be. Because we have everything placed under Chinese, this is like stuffing everything from Old Latin (or even earlier) to Neo-Latin into a subvariety of a Latin-Romance language without treating Latin itself as a language, which is a very awkward thing.
Classical Chinese has its own quotations (and as Fish bowl has mentioned we have Category:Korean Classical Chinese and Category:Vietnamese Classical Chinese - the quotations for these varieties are also placed under the Chinese L2), which are categorised as Category:Literary Chinese terms with quotations. Changing Classical Chinese to etymology-only would mean these quotations have nowhere to go - it is often impossible to discern where they should otherwise be treated. I would rather counter-propose that Category:Old Chinese language and Category:Middle Chinese language be treated as etymology-only variants of Classical Chinese - OC and MC are essentially just snapshots of the sound system at a particular time point in the history of Chinese.
#3 is 1000% a no, though I would support it if one day we were to accept Altaic languages as valid :)
Fish bowl's proposal might have been motivated by the similarities between late Classical Chinese and literary terms in modern Chinese, but a more in-depth look would suggest that this is untrue. – wpi (talk) 06:09, 19 September 2023 (UTC)[reply]
@Wpi Old Chinese and Middle Chinese are more conventional languages, even if semi-reconstructed, so I would argue they should stay as full languages, whereas Classical Chinese is somewhat of an artificial construct, and normally we place those as etym languages. I think the issue here is that there is more than one Classical Chinese, whereas most Classical Foo languages are fairly unified. This suggests we should separate Classical Chinese into something like Old Classical Chinese or Early Classical Chinese (an etym language of Old Chinese), Middle Classical Chinese (an etym language of Middle Chinese), and Late Classical Chinese (an etym language of Chinese?). Benwing2 (talk) 06:20, 19 September 2023 (UTC)[reply]
A few issues here. I think we've been kind of sloppy when it comes to the literary/classical distinction. Most entries have been using "literary" since that was the norm back in the day. "Classical" came later as the "Classical" label was introduced to other languages, which is probably why we have fewer uses of this label. While I think the Literary/Classical distinction is useful, I wonder how we should be making the distinction in labelling. If a term is used in both Classical Chinese and Literary Chinese, such as 首 "head" (now labelled "archaic"), do we have to label it with both? Whatever we decide on, I think we also need to think about how this is organized in {{zh-x}}.
In principle, I think #1 would be something okay to do. #2 doesn't seem okay per Wpi. The issue with Classical Chinese is that it cannot fit neatly in OC or MC because as Wpi said, these are snapshots of the phonology. There are also Classical/Literary Chinese works written way after the Middle Chinese period, but not necessarily able to be considered as any modern Chinese variety. #3 would probably need to be worked out entry by entry. Some entries should probably be moved to Classical Chinese, but others may be used in highly formal modern writing. It might be difficult to distinguish the two given our previous usage of "literary" as a label. We would need to set stricter definitions for what goes where. — justin(r)leung (t...) | c=› } 03:33, 20 September 2023 (UTC)[reply]
@Justinrleung @Wpi @Fish bowl Pinging the people who previously participated as well as @Theknightwho. I am trying to convert all the bespoke variety codes in {{zh-x}} to standard codes. I added lzh-lit for "Literary Chinese", specifically the later stage of Classical Chinese; but this conflicts with the name of lzh. I really think we should rename lzh, probably to Classical Chinese. The only issue with this term is that sometimes Classical Chinese specifically seems to refer to the 5th century BC - 2nd century AD period, as in Category:Classical Chinese, or some other similar time period. I'm thinking maybe we need a different term for this: Han Classical Chinese? Although strictly speaking, the Han dynasty only began in the 3rd century BC. Or "Late Old Classical Chinese"? Please note, I also added lzh-pre for "Pre-Classical Chinese" corresponding to the old CL-PC code; but I have no idea if this makes any sense, as it seems awfully similar to Old Chinese. Benwing2 (talk) 04:47, 27 March 2024 (UTC)[reply]
I should add, I also added a code cmn-bec for "Beijingic Mandarin", which is the primary branch of Mandarin that includes Beijing and environs. This is described in Wikipedia under Beijing Mandarin (division of Mandarin) whereas Beijing Mandarin itself (code cmn-bei) is described under Beijing dialect. The term "Beijingic" comes from Glottolog. This was added to correspond to the M-UIB code added and used primarily by User:Dokurrat. Since M-UIB is described as "dialectal Beijingesque Mandarin", I assume it approximately corresponds to the Beijingic primary branch. Note that the existence of Beijingic is somewhat controversial as some researchers place Beijing and surrounding dialects into Northeastern Mandarin. I also added labels (but not etym codes) for all primary Mandarin branches and many individual dialects under these branches; basically, any dialect that had 4 or more mentions among the labels as well as any dialect where I could find a corresponding English Wikipedia page describing it. (There are more dialects with Chinese Wikipedia pages but I haven't yet found them all.) Eventually I think we should assign etym codes to most or all of these dialects but for the moment I'm mostly just collecting them into labels; once I have a fairly complete set of labels it will be easier to assign codes in a semi-consistent fashion. Also, I am ready to push the code to allow both new (standard) and old (bespoke) variety codes in {{zh-x}} and then convert all uses to the new codes, but this can't run until my current run obsoleting {{zh-noun}} and {{zh-hanzi}} finishes. (It's run for ~ 22 hours so far and has maybe 14 hours to go.) Benwing2 (talk) 07:01, 27 March 2024 (UTC)[reply]
BTW here is the current mapping I have worked out from old bespoke {{zh-x}} codes to standard codes:
old_to_std_code_mapping = {
  "MSC": "cmn",
  "M-BJ": "cmn-bei",
  "M-TW": "cmn-TW",
  "M-MY": "cmn-MY",
  "M-SG": "cmn-SG",
  "M-PH": "cmn-PH",
  "M-TJ": "cmn-tia",
  "M-NE": "cmn-noe",
  "M-CP": "cmn-cep",
  "M-GZ": "cmn-gua",
  "M-LY": "cmn-lan",
  "M-S": "zhx-sic",
  "M-NJ": "cmn-nan",
  "M-YZ": "cmn-yan",
  "M-W": "cmn-wuh",
  "M-GL": "cmn-gui",
  "M-XN": "cmn-xin",
  "M-UIB": "cmn-bec",
  "M-DNG": "dng",
  
  "CL": "lzh",
  "CL-TW": "lzh-cmn-TW",
  "CL-C": "lzh-yue",
  "CL-C-T": "lzh-tai",
  "CL-VN": "lzh-VI",
  "CL-KR": "lzh-KO",
  "CL-PC": "lzh-pre",
  "CL-L": "lzh-lit",

  "CI": "lzh-cii",
  
  "WVC": "cmn-wvc",
  "WVC-C": "yue-wvc",
  "WVC-C-T": "zhx-tai-wvc",

  "C": "yue",
  "C-GZ": "yue-gua",
  "C-LIT": "yue-lit",
  "C-HK": "yue-HK",
  "C-T": "zhx-tai",
  "C-DZ": "zhx-dan",

  "J": "cjy",
  
  "MB": "mnp",
  
  "MD": "cdo",
  
  "MN": "nan-hbl",
  "TW": "nan-hbl-TW",
  "MN-PN": "nan-pen",
  "MN-PH": "nan-hbl-PH",
  "MN-T": "nan-tws",
  "MN-L": "nan-luh",
  "MN-HLF": "nan-hlh",
  "MN-H": "nan-hnm",
  
  "W": "wuu",
  "SH": "wuu-sha",
  "W-SZ": "wuu-suz",
  "W-HZ": "wuu-han",
  "W-CM": "wuu-chm",
  "W-NB": "wuu-nin",
  "W-N": "wuu-nor",
  "W-WZ": "wuu-wen",
  
  "G": "gan",

  "X": "hsn",
  
  "H": "hak-six",
  "H-HL": "hak-hai",
  "H-DB": "hak-dab",
  "H-MX": "hak-mei",
  "H-MY-HY": "hak-hui-MY",
  "H-EM": "hak-eam",
  "H-ZA": "hak-zha",
  
  "WX": "wxa",
}

Benwing2 (talk) 04:52, 27 March 2024 (UTC)[reply]
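A minimal sketch of how a table like the one above might be applied while both old and new codes are accepted. The function name `convert_variety_code` and its pass-through behaviour are assumptions made for illustration, not the actual module code; only a short excerpt of the table is repeated so the example runs on its own.

```python
# Excerpt of the mapping above, repeated so the example is self-contained.
OLD_TO_STD = {
    "MSC": "cmn",
    "M-BJ": "cmn-bei",
    "CL": "lzh",
    "C": "yue",
    "C-LIT": "yue-lit",
}


def convert_variety_code(code: str) -> str:
    """Return the standard code for an old bespoke code (hypothetical helper).

    Codes that are already standard pass through unchanged; anything
    unrecognised raises, so a typo is not silently kept.
    """
    if code in OLD_TO_STD:
        return OLD_TO_STD[code]
    if code in OLD_TO_STD.values():
        return code
    raise ValueError(f"unknown variety code: {code!r}")


if __name__ == "__main__":
    for code in ("M-BJ", "yue", "CL"):
        print(code, "->", convert_variety_code(code))
```

Raising on unknown codes is a design choice for the sketch: during a bulk conversion it surfaces misspelled codes instead of carrying them forward.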

On second thought, I don't think we should have yue-wvc and zhx-tai-wvc as they seem to be too similar to lzh-yue and lzh-tai, plus WVC-C is used only once in 萊苑 (and none for WVC-C-T). @Fish bowl who added the ux in 萊苑 for comment. – wpi (talk) 13:07, 27 March 2024 (UTC)[reply]
@Wpi What is the difference between lzh-yue (CL-C) and yue-lit (C-LIT)? The former occurs 35 times and the latter 415 times. Benwing2 (talk) 18:09, 27 March 2024 (UTC)[reply]
@Wpi WVC-C-T seems to be used 4 times: 亞利務, 交帶, 粗市畔 and 赤市. WVC-C is used on 10 pages: 公仔, 小粉紅, 市卜頃, 格仔, 正埠 (3 usexes), 羅生, 苛例, 菜苑 (2 usexes), 萊苑 and 葛崙. Benwing2 (talk) 02:30, 28 March 2024 (UTC)[reply]
WVC-C can probably be merged into C-LIT, but I don't have any particular suggestion for the *-C-T codes.
Mentioning this again: perhaps we should use a bipartite system giving the text language and the pronunciation language, such as lzh/cmn-TW (Literary Chinese in Taiwanese Mandarin pronunciation). Fish bowl (talk) 03:49, 30 March 2024 (UTC)[reply]
@Fish bowl Thanks for bringing this up; I missed it last time. In fact my recent overhaul of Module:zh-usex/data implemented something very similar. Essentially, there is the variety code, which is typically an etym-only language code, and then for each such variety there is a second "norm code" that is used for romanization (i.e. pronunciation) purposes. There's nothing preventing us from implementing your suggestion on top of this, if it proves necessary. Benwing2 (talk) 04:15, 30 March 2024 (UTC)[reply]
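The bipartite idea (a text-language code plus a pronunciation or "norm" code, written like lzh/cmn-TW) could be sketched as follows. Everything here is hypothetical: `VarietySpec`, `parse_variety` and the rule that a bare code serves as its own norm are assumptions for illustration and do not describe how Module:zh-usex/data actually works.

```python
from typing import NamedTuple


class VarietySpec(NamedTuple):
    text_code: str  # language of the written text, e.g. "lzh"
    norm_code: str  # language whose romanization/pronunciation is used, e.g. "cmn-TW"


def parse_variety(spec: str) -> VarietySpec:
    """Split 'text/norm' into its two parts (hypothetical syntax).

    Assumption: a spec without a slash uses the same code for both parts.
    """
    text_code, sep, norm_code = spec.partition("/")
    if not text_code or (sep and not norm_code):
        raise ValueError(f"malformed variety spec: {spec!r}")
    return VarietySpec(text_code, norm_code or text_code)


if __name__ == "__main__":
    print(parse_variety("lzh/cmn-TW"))
    print(parse_variety("yue"))
```

In a real implementation the default norm for a bare code would more likely come from a per-variety data table than from the code itself.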
@Fish bowl @Wpi Just to confirm: All three of WVC-C (yue-wvc) "Written vernacular Cantonese", CL-C (lzh-yue) "Classical Cantonese" and C-LIT (yue-lit) "Literary Cantonese" can be merged? This is based on @Wpi suggesting merging WVC-C with CL-C and @Fish bowl suggesting merging WVC-C with C-LIT. I assume that if these were different they would refer to different time periods (?), but I don't know if there's enough difference to warrant separation. Benwing2 (talk) 05:02, 30 March 2024 (UTC)[reply]
@Benwing2: Incorrect. I believe the problem here is that we have multiple understandings of the usage of the Cantonese codes. Generally speaking, there are these types:
  1. modern spoken Cantonese
  2. modern written vernacular Chinese (e.g. Hong Kong Chinese)
  3. written vernacular Chinese from 19th/early 20th century
  4. Classical Chinese that uses Cantonese pronunciation
  5. spoken Cantonese from 19th/early 20th century (e.g. dictionaries from missionaries)
They are especially difficult to tell apart when the phrase/sentence is short and does not contain many grammatical features.
Justin, Alex and I use C for #1 and #5 (or the newly added C-HK when applicable), C-LIT for #2, and CL-C for #3 and #4.
Fish Bowl uses C-GZ for #1, WVC-C for #2 and #3, and CL-C for #4. (Please correct me if my understanding is incorrect.)
I think what we need to do here is to change WVC-C into C-LIT or CL-C according to the context. – wpi (talk) 12:54, 30 March 2024 (UTC)[reply]
@Wpi OK thanks and apologies for my confusion, I haven't encountered before (as a linguist) the situation where there's a big gap between the written and spoken forms and multiple ways of pronouncing a given written form. Benwing2 (talk) 20:11, 30 March 2024 (UTC)[reply]
I should add, does anyone mind my renaming the old {{zh-x}} codes to the new ones? As shown in the above table, no information will be lost because there is a one-to-one mapping between the old and new codes. Benwing2 (talk) 02:56, 31 March 2024 (UTC)[reply]
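The one-to-one claim can be checked mechanically before any rename is run. The sketch below uses hypothetical helper names (`duplicated_targets`, `invert`) and a small excerpt of the table; it only shows the shape of such a check and is not part of any existing bot or module.

```python
from collections import Counter
from typing import Dict, List


def duplicated_targets(mapping: Dict[str, str]) -> List[str]:
    """Return the new codes that more than one old code maps to."""
    counts = Counter(mapping.values())
    return sorted(code for code, n in counts.items() if n > 1)


def invert(mapping: Dict[str, str]) -> Dict[str, str]:
    """Build the new -> old table; this only works if the mapping is one-to-one."""
    dupes = duplicated_targets(mapping)
    if dupes:
        raise ValueError(f"mapping is not one-to-one; shared targets: {dupes}")
    return {new: old for old, new in mapping.items()}


if __name__ == "__main__":
    excerpt = {"C": "yue", "C-LIT": "yue-lit", "WVC-C": "yue-wvc"}
    print(invert(excerpt))

    # Illustration only: folding WVC-C into C-LIT, as suggested earlier in the
    # thread, would make the table many-to-one and no longer reversible.
    merged = dict(excerpt, **{"WVC-C": "yue-lit"})
    print(duplicated_targets(merged))
```

A reversible table matters only if the old codes ever need to be recovered; once codes are deliberately merged, that property is given up by design.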
I don't particularly support the usage of the C-GZ [Guangzhou Cantonese] tag, and think that (for zh-usex at least) it can be safely merged into C [standard Cantonese]. For 2 (modern HK) I also use C-LIT.
I'd also like to comment that "gua" in the replacement code "yue-gua" feels very foreign and unintuitive. —Fish bowl (talk) 07:13, 31 March 2024 (UTC)[reply]
@Fish bowl Hmm. Do you have a better suggestion in place of yue-gua? Benwing2 (talk) 08:00, 31 March 2024 (UTC)[reply]
@Fish bowl Is your thought that we should use just "Cantonese" (yue, or yue-can as suggested in the RFM discussion below) as the language code? This would be parallel to the normal handling of Latin, where Classical Latin terms are usually identified as just "Latin" (code la), although there's also a code for "Classical Latin" (code la-cla or CL.). The use of Guangzhou Cantonese specifically (yue-gua) as a code would then be restricted to cases where it's important to distinguish usage that is specific to urban Guangzhou speech as opposed to Standard Cantonese. I think this is also parallel to the use of cmn (Standard Mandarin) vs. cmn-bei for Beijing Mandarin. Benwing2 (talk) 00:48, 1 April 2024 (UTC)[reply]
yue-gua: maybe yue-gzh? Keeping the initials is more sane IMO but I also remember that you wrote your own guideline in one of the other discussions.
code: I think we are on the same page, yes. —Fish bowl (talk) 08:10, 2 April 2024 (UTC)[reply]

They look mergeable. Equinox 20:36, 11 February 2022 (UTC)[reply]

Yes, I meant the English sections. Merge request stands! Equinox 02:51, 13 March 2023 (UTC)[reply]

Seems like meat puppet sense #6 is the same as meatpuppet. 70.172.194.25 06:10, 19 February 2022 (UTC)[reply]

The conjunction وَ (wa) is not part of the phrase really. The phrase does occur frequently with it, but this is mainly owing to the "idiomaticness" of conjunctions in Arabic, mostly in prose. It is a sentence in itself, roughly "There is no(thing) equal to", not like the English adverbials that have comparable meanings (such as particularly and especially or above all). The entry should therefore be moved to لَا سِيَّمَا (lā siyyamā), with the "variant" with وَ (wa) deleted. Roger.M.Williams (talk) 18:11, 24 February 2022 (UTC)[reply]

2022 — March

Slavic phylogeny

East Slavic codes

Following up a long discussion on the Old East Slavic About: page, I'd like to propose the following splits:

  • Split off Old Ruthenian (zle-ort)
  • Set Old Ukrainian (zle-ouk) and Old Belarusian (zle-obe) as etymology-only descendants and labels of Old Ruthenian
  • Set Ukrainian (uk), Belarusian (be) and Rusyn (rue) as descendants of Old Ruthenian
  • Change Old Russian (zle-oru) to Middle Russian (zle-mru) and set this as a label of Russian (ru)

On the final point there was quite some discussion, and I personally support making Middle Russian as a full-fledged code, but since we couldn't reach consensus, I propose making that a separate discussion if need be.

The proposed historical borders of the languages are as follows:

  • Old East Slavic (until the 14th century)
  • Middle Russian (=Moscow Literary language; 14th century-18th century) [Peter the Great's reforms]
  • Old Ruthenian (='West Russian' Literary language; 14th century-19th century) [Kotliarevsky's Eneïd]

Pinging @Atitarev, ZomBear, Useigor, Ентусиастъ, Benwing2, Rua, Ogrezem. I apologise if I forgot anyone. Thadh (talk) 12:43, 2 March 2022 (UTC)[reply]
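As a rough illustration of the tree this proposal would produce, here is a minimal Python sketch. It is not real module data: the `PARENT` table and the `ancestors` helper are hypothetical, languages are keyed by name rather than code, and the link from Russian to Old East Slavic is an assumption that the proposal leaves unchanged rather than something it states.

```python
# Parent links under the proposal. Keys and values are language names;
# this table is illustrative only and is not taken from any Wiktionary module.
PARENT = {
    "Old Ruthenian": "Old East Slavic",   # new split
    "Old Ukrainian": "Old Ruthenian",     # etymology-only / label
    "Old Belarusian": "Old Ruthenian",    # etymology-only / label
    "Ukrainian": "Old Ruthenian",
    "Belarusian": "Old Ruthenian",
    "Rusyn": "Old Ruthenian",
    "Middle Russian": "Russian",          # label of Russian
    "Russian": "Old East Slavic",         # assumption: unchanged by the proposal
}


def ancestors(name: str) -> list[str]:
    """Return the chain of ancestors of `name`, nearest first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


if __name__ == "__main__":
    print(ancestors("Ukrainian"))
    print(ancestors("Middle Russian"))
```

Under these assumptions, Ukrainian, Belarusian and Rusyn reach Old East Slavic only through Old Ruthenian, while Russian descends from it directly.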

I still support only the introduction of Old Ruthenian, which is missing but as before, I don’t claim to be an expert on the matter. The Russian corpus in the other discussion was helpful. When I filtered on “Middle Russian”, I think I was able to find a couple of words, which are now considered obsolete. The rest were words, which just need to be respelled to find quotes in (early) Modern Russian. I found a few different ways to abbreviate and also numerous misspellings. Overall I sort of feel why these additional splits are not so popular - little strong evidence to work with. Middle Russian may be allowed to be added, let’s just look for good cases.
To make decisions easier, why don’t we add a couple of specific examples for each new language code proposed - something to work with. (They can be vocab, grammar or pronunciation cases). The proponents should have examples in mind to make the case(s) stronger. We can work together on confirming or disputing those cases. --Anatoli T. (обсудить/вклад) 22:57, 2 March 2022 (UTC)[reply]
I'll see if I can make a list of features that distinguish Middle Russian from (Modern) Russian. In any case, for the time being, treating Middle Russian like Old East Slavic makes little sense to me, especially if we're splitting off Ruthenian (otherwise we get some kind of Dutch-Afrikaans situation), so we could go ahead with that now and in the meantime continue discussing MR's position as a separate code. Thadh (talk) 23:30, 2 March 2022 (UTC)[reply]
(edit conflict) You can use any of the examples already in discussions used as evidence, e.g. онтарь/оньтарь, агистъ, etc. BTW, I see that "Old Russian" was used incorrectly by ZomBear when actually talking about Middle Russian. "Old Russian" = "Old East Slavic". The Russian term for Middle Russian is старору́сский (starorússkij) but Old East Slavic (Old Russian) is древнеру́сский (drevnerússkij). --Anatoli T. (обсудить/вклад) 00:21, 3 March 2022 (UTC)[reply]
Quick update, I've found a relevant discussion from three years ago, Wiktionary talk:About Russian#Middle Russian?. Also, The Russian Language before 1700 (Matthews 1953) argues your and Fay Freak's point (that Middle Russian is too similar to modern Russian to warrant a linguistic distinction). Fun point, it also provides съмьрть's accentuation :0. I'll still look for differences in the corpora, but if the languages are too similar I guess I don't mind keeping the two together - as long as the descendants sections don't get too cluttered, I'm fine. Thadh (talk) 00:02, 3 March 2022 (UTC)[reply]
BTW, I didn’t get back to you on the concern I have in regards to introduction of word stresses in Old East Slavic. My reason being there are many cases where assumptions can go wrong based on descendants. We should only use referenced data. Well, we don’t have native speakers to prove us wrong, do we? —Anatoli T. (обсудить/вклад) 23:03, 2 March 2022 (UTC)[reply]
Sure, but of course we can still use sound laws for words without referencing the specific word's reconstruction. A word like съмь́рть will have the stress on the second syllable, because otherwise the Russian term would be something like **со́мерть rather than сме́рть. However, I wouldn't know where to look for any reference on this specific word, and googling "съмь́рть" returns no results. Thadh (talk) 23:30, 2 March 2022 (UTC)[reply]
Of course, there could be strong (?) assumptions on vowels, which became silent (i.e. they are unstressed) but I wouldn't be so sure even on e.g. вода́ (vodá) (if it weren't referenced), since the word is stressed on the first syllable in some Ukrainian dialects, if you know what I mean. --Anatoli T. (обсудить/вклад) 00:21, 3 March 2022 (UTC)[reply]
@Thadh: I support your suggestions. Ентусиастъ (talk) 16:19, 3 March 2022 (UTC)[reply]
I have already spoken before. I'm for it too.--ZomBear (talk) 00:57, 4 March 2022 (UTC)[reply]
@Thadh: Unfortunately, I see that the discussion has stopped again. It's been almost a month since anyone has written anything. Every day I look forward to the solution of this issue with the Old Ruthenian language. --ZomBear (talk) 07:32, 21 March 2022 (UTC)[reply]
Done. What we need now is to split all pages into either Old East Slavic, Russian (with the Middle Russian label) or Old Ruthenian (with or without the Old Belarusian/Old Ukrainian label). Thadh (talk) 18:43, 21 March 2022 (UTC)[reply]
I also removed Old Novgorodian as the child of Old East Slavic. Vininn126 (talk) 08:52, 4 October 2023 (UTC)[reply]

@Thadh how about adding more etymology only language codes? Modern dictionaries use more than just Old Belarusian/Ukrainian. I saw Middle Bulgarian, Old Slovak, Old Slovene, Old Serbian, Old Croatian, Old Serbo-Croatian, Old Bulgarian, Old Upper Sorbian, Old Lower Sorbian. Possibly Middle Czech and Middle Polish also would be useful sometimes. Old Sorbian was also used by Boryś (Old Sorbian peleš as cognate for Polish pielesze), however we can't just link to both Lower and Upper Sorbian at once, so that would require full support for this language (?). Scientific publications mention Old Polabian as language of Polabian Slavs in Middle Ages, it is used usually for proper nouns like given names, theonyms, toponyms, sometimes ordinary words mentioned in Latin texts and it is always reconstructed language, I would like to have it tho. Sławobóg (talk) 14:32, 28 May 2022 (UTC)[reply]

@Sławobóg What I'll need from you in order to determine if the splits are worth it is:
- Exact boundaries of the languages' stages
- You need to check how much literature there is in the earlier stages of the language.
- You need to check how much the languages differ from their modern stages.
Once you do that, we can continue the conversation about splitting them. It seems pointless to split a language off just because there are two inscriptions in some dusty old book. Thadh (talk) 15:15, 28 May 2022 (UTC)[reply]
@Thadh: IMO Middle Polish would benefit greatly from the split.
  • Boundaries: As it is with extinct languages, there aren't really any exact boundaries, but it's usually defined as between the 16th and the 18th century; Polish Wiktionary has settled on years 1500 to 1750 to account for Doroszewski's dictionary.
  • Literature: There are two major corpora, accessible on the SPXVI and ESXVII websites.
  • Differences: I reckon the spelling and pronunciation differences, especially the employment of "slanted vowels" (samogłoski pochylone, I have no idea what their name is in English), should be enough.
Plus, like, this would help with attestation. Hythonia (talk) 11:08, 30 July 2022 (UTC)[reply]
Middle Polish is also thusly defined on Wikipedia. I also think it would make more sense to have Middle Polish as an LDL. The alternative would be having a label. If we split, we'd have to add Middle Polish both to Proto-Slavic descendant entries as well as intermediates on etymologies. Vininn126 (talk) 11:52, 30 July 2022 (UTC)[reply]
Also pinging @KamiruPL, as an editor for Old Polish. Do you think we should fully split Middle Polish, create a label, or some other alternative? Vininn126 (talk) 13:44, 30 July 2022 (UTC)[reply]
@Vininn126: I treat Arabic before the spread of printing in the Arab world, which is from 1800 (Napoléon brought the press to Egypt, which was then a state business that over time was rented by privates who would copy it), as LDL. The reason becomes more obvious for Hebrew where we are eager to include hapax legomena in the Tanakh and due to lacking distinctness of the Modern to the Biblical language, from which the former has been resurrected, have little desire to split. This is in analogy to the split of English from Middle and Old English, where basically the split happens following the new medium of printed books—accordingly if Polish literacy in the same fashion starts only somewhere in the 18th century then we become stricter only then.
Circumventing attestation criteria is no reason to split language headers, as your perception about whether something is another language is the same and only disingenuously modified by that consideration of its description. So more appropriate attestation criteria – and I think of the many carefully collected variants sadly left even unmentioned as a consequence of no sense of proportion applied to the teleology of our rules – by no means should serve motivation to split languages; we can already derive them by the accepted statutory interpretation methods.
To be clear, since legal thinking is unwonted and mysteriously strange to many in spite of people rightly being appointed for it in any society: In this case this is really just systematic interpretation: Since the community authoring the policies was biased towards English but the splits of other languages wrought comparative inconsistency with its situation according to which it has been split by chronolects, we break the criteria down to be suited for the languages they were only roughly devised for. Fay Freak (talk) 09:51, 31 July 2022 (UTC)[reply]
In all honesty a label is likely the best option. Vininn126 (talk) 10:05, 31 July 2022 (UTC)[reply]
@Hythonia @Sławobóg @KamiruPL I've gone ahead and added Middle Polish as a label. Vininn126 (talk) 12:11, 8 August 2022 (UTC)[reply]
I've thought about this more, and I think there might be a case for Middle Polish as an L2. If we agree it should be split, I can help convert the existing entries to Middle Polish.
Here is my reasoning:
Old Polish, Middle Polish, modern Polish, and Silesian are four lects that are hard to separate accurately. Part of this argument hinges on Silesian, which we currently treat as an L2, and I don't see that changing. There are political, historical, and linguistic reasons:
===Why Silesian should be an L2===
  • Its speakers feel strongly that it is a language, not a dialect. The Polish linguists pushing that it is a dialect include Jan Miodek, a notable prescriptivist who pushes more nationalistic views of how languages should be treated, and I believe that treating Silesian as a dialect is done partially to stifle any sense of individuality to further Polish control. However, I recognize that theory has some tinfoil-hat conspiracist vibes to it, so I'll stick to the point that its speakers strongly feel it is a language.
  • Significant linguistic difference: Silesian has a different phonology to Polish, and other grammatical features, such as retaining the Proto-Slavic aorist in an analytical past tense, as opposed to a more agglutinative/morphological one in Polish. It also recently has undergone strong standardization, as can be seen on silling.org and the ślabikŏrzowy szrajbōnek.
  • Significant lexical differences: Silesian differs quite a bit from Polish in terms of lexical information. Core inherited words are of course similar, but look at other Slavic languages. It's also been heavily "Policized", but so has Kashubian, which we also treat as an L2 and is recognized as a separate minority language in Poland, and both Kashubian and Silesian are recognized by ISO and Glottolog.
  • Finally, the key point to the overall argument: Silesian is a descendant of Middle Polish. Most claims that it is Czechoslovakian are refuted by Silesian philologists.
===Why Middle Polish should maybe be an L2===
So if we decide that Silesian is an L2, that would give Middle Polish multiple descendants. This would "fix" many inherited etymologies, such as wszystek. This would also fix Latinate borrowings, where Silesian inherited an older pronunciation of Latinate words, and also the chain generally works better as Learned borrowing into Middle/Old Polish -> Polish + Silesian, as opposed to setting multiple Learned borrowings.
Furthermore, Middle Polish was significantly different from Modern Polish in terms of phonology and grammar (I recently updated the Middle Polish Wikipedia page). In terms of lexical content - there were significant shifts, I would say less than the standard differences between Slavic languages, but there were still trends, and dictionaries such as {{R:pl:SXVI}}, {{R:pl:SXVII}}, and occasionally {{R:pl:SJP1807}} or {{R:pl:SJP1900}} would be key in this. Furthermore, Middle Polish is otherwise resource poor, and should be treated as an LDL, label or not. Having it as an L2 is cleaner in terms of citations.
If we agree that this should be done, I would recommend setting the cutoff dates as c. 1500-c. 1780, with a language code of zlw-mpl. Vininn126 (talk) 12:39, 24 April 2023 (UTC)[reply]
@Atitarev@Fay Freak@Hythonia@Sławobóg@Thadh@ZomBear@Ентусиастъ Vininn126 (talk) 17:30, 24 April 2023 (UTC)[reply]
Update: there is debate as to whether Silesian should be listed as from Old Polish or Middle Polish, which really affects the above argument. Vininn126 (talk) 14:53, 25 April 2023 (UTC)[reply]
Just flagging up that it's possible to give Middle Polish an etymology-only language code, and to set it as the ancestor of Polish (and Silesian, if desired). This would be a way to keep its entries under the Polish L2, while allowing etymologies to formally mention it. In turn, Middle Polish could have Old Polish set as its ancestor.
Of note is the fact we already have Middle Russian, Old Ukrainian, Old Belarusian, Middle Bulgarian and Early Modern Czech, which are all currently handled in the same way. Theknightwho (talk) 16:14, 25 April 2023 (UTC)[reply]

Old Slovak ?

How about adding code for the Old Slovak (zlw-osk) as well. In the same {{R:sla:ESSJa}} (ЭССЯ), especially in recent editions, Old Slovak is constantly listed separately. In this case, etymology-only code is sufficient. --ZomBear (talk) 07:32, 21 March 2022 (UTC)[reply]

@ZomBear @Thadh @Sławobóg @Vininn126 What is the current state of this? I notice that Middle Russian is an etym-only language of Russian and has two codes zle-mru and zle-oru, which looks very suspect. I also think Middle Polish has in fact been made an etym language of Polish. Benwing2 (talk) 06:24, 19 September 2023 (UTC)[reply]
I still believe that at least there should be an etym-code for the Old Slovak language. It is also necessary to combine Czech & Slovak into the “Czech–Slovak family” in Slavic languages tree, as was done with Lechitic (zlw-lch) F. ZomBear (talk) 06:54, 19 September 2023 (UTC)[reply]
I know that Sławobóg also wanted to split Old Slovak. As to grouping them and giving them a family lang code, I'm not sure. Perhaps Moravian should also be split and placed in this family. @Zhnka Vininn126 (talk) 07:51, 19 September 2023 (UTC)[reply]
I'm pretty certain that this question isn't as straightforward as you make it out to be, and I read on multiple occasions that the similarities between Standard Czech and Standard Slovak arose due to Czech's influence on Slovak and that dialectal evidence shows no evidence of genetic relationship closer than on the West Slavic level. So I would like a more detailed discussion on this. Thadh (talk) 08:10, 19 September 2023 (UTC)[reply]
@Benwing2 @Thadh @ZomBear @Sławobóg I was reading up on w:sk:Dejiny slovenčiny#11. až 18. storočie, and it seems like there were huge phonological and grammatical changes, IMO upon reading it enough to split Old Slovak into an L2. There also appears to be a dictionary Historický slovník slovenského jazyka that could be used as a source. So I propose that we split Old Slovak. Vininn126 (talk) 10:15, 1 October 2023 (UTC)[reply]
Also @Zhnka for the tactical ping. Vininn126 (talk) 10:49, 1 October 2023 (UTC)[reply]
Support. @Vininn126 I just created a template for Historical Dictionary of the Slovak Language {{R:sk:HSSJ}}. It contains more than 70,000 words from the pre-literary period (before the 18th century) of the Slovak language. This is a really good source for Old Slovak. ZomBear (talk) 11:29, 1 October 2023 (UTC)[reply]
@ZomBear We should be careful, however, Old Slovak is best described as 9th-14th centuries. Vininn126 (talk) 11:33, 1 October 2023 (UTC)[reply]
@Vininn126 it’s just great what’s in this dictionary, when quoting, the year or century when the word was recorded is indicated. For example, voda (“water”), it can be seen that the oldest evidence for this word is 1473, 1585 and 1376. ZomBear (talk) 11:50, 1 October 2023 (UTC)[reply]
Support. Sławobóg (talk) 13:14, 1 October 2023 (UTC)[reply]
I have split Old Slovak and given it the code zlw-osk. Vininn126 (talk) 19:26, 3 October 2023 (UTC)[reply]

There still seems to be a lot of overlap here, e.g. the chandelier sense. Is there any sense of the word that cannot be spelled both ways? Equinox 03:47, 4 March 2022 (UTC)[reply]

Any English word ending with -er occasionally shows up as -re. It doesn't seem like this needs a tag and discussion, though. Some editor at lustre was just wrong: There's no reason to say "alternate form of luster" + 3 repeated senses that're already at luster. Maybe add a usage note if some American speakers tend to still use re unexpectedly more often in some cases.
That said, the luster entry is currently a bit off. 'Shininess', '5-year period', and 'den' all get spelled with an -re in standard British English but using it for 'one who lusts' would still seem like a misspelling. The alt form needs to be with each etym that uses it and not headlining like it is now. — LlywelynII 23:40, 13 June 2023 (UTC)[reply]

Fish bowl (talk) 05:38, 4 March 2022 (UTC)[reply]

Fish bowl (talk) 14:55, 6 March 2022 (UTC)[reply]

Some circularity, with each form linking to the other for certain senses. Equinox 00:20, 8 March 2022 (UTC)[reply]

I think I was trying to show that the open form was used more commonly for some definitions and the closed for others. Is there pure circularity remaining? DCDuring (talk) 17:18, 9 March 2022 (UTC)[reply]
I have tried to make clearer the differences and have simplified the "Further reading" sections. I don't see why they should be moved, merged, or split, whichever it is that you are seeking. DCDuring (talk) 18:08, 9 March 2022 (UTC)[reply]

These look like the same word. 70.172.194.25 09:17, 9 March 2022 (UTC)[reply]

It's not, just a doublet that came into the language by a different (rather convoluted) route. Serynga is listed as an alternative form at English seringa, but it looks like it's really a borrowing from French, where it's an alternative form of French seringa. The spelling is no doubt influenced by the taxonomic name.
From what I can gather, Latin syringa developed into Dutch sering, which was borrowed into Portuguese as the name for rubber plants in the genus Hevea and into French for the syringa, Philadelphus coronarius, both with an "a" added. English borrowed Portuguese seringa for the rubber plant and French serynga for the syringa.
If you're not confused by all of this, you're not paying attention... Chuck Entz (talk) 15:40, 9 March 2022 (UTC)[reply]
So, are the definitions in both entries correct? Because they currently claim to both have the same first two definitions... in which case we should either have {{syn}} crosslinks between them or reduce both senses on one to {{synonym of}} (+gloss) of the other. - -sche (discuss) 08:07, 28 March 2022 (UTC)[reply]

Possible move to align names

I made Category:English terms spelled with underscore a few hours ago and I just now discovered Category:Translingual terms spelled with low line. "Low line" and "underscore" refer to the same character: which one should be the name used? I've never encountered "low line" before, but it looks like this has some popularity. Thoughts? —Justin (koavf)TCM 06:43, 12 March 2022 (UTC)[reply]

I too would've thought "underscore" was the usual name; I'm intrigued to see that "low line" pages have apparently been around longer. My !vote is to consolidate the pages on "underscore". (Related issue: "Unsupported titles/Low line interfix" is an unintuitive name for that page; it seems like unsupported titles are more often named in more plainly descriptive ways; now that the page no longer has an interfix on it, can we change it to something like "hyphen underscore hyphen"?) - -sche (discuss) 03:31, 13 March 2022 (UTC)[reply]
I support moving any categories that use low line to use underscore instead, since underscore is more common in my experience, it is the title of the Wikipedia article, and our entry for low line was missing the punctuation definition until I added it just now. - excarnateSojourner (talk | contrib) 17:31, 3 May 2022 (UTC)[reply]
@ExcarnateSojourner: I created the “low line” category, because we have the character at “Unsupported titles/Low line”. You stated that you support moving the categories specifically, however, which would not be consistent with the entries. J3133 (talk) 14:27, 13 May 2023 (UTC)[reply]
@J3133, ExcarnateSojourner, -sche, Koavf I have gone ahead and merged these both into underscore: these categorisations are now automatic, and I really, really didn't want to add a special case giving them different names for the sake of this glacial discussion. I simply picked the one which most speakers are more familiar with. If anyone has strong feelings about changing it to "low line" then I'm happy to change it.
In terms of making page titles match, there are only 4 entries which actually use "low line" in the title: -_-, >_<, _ _ and _ itself, as they're the only pages where we have no other choice but to do it that way. In most instances, we can just use a space, as the "Unsupported titles/" prefix is enough to differentiate it: compare snake case and snake_case. Theknightwho (talk) 17:58, 14 May 2023 (UTC)[reply]
<3 —Justin (koavf)TCM 20:45, 14 May 2023 (UTC)[reply]
I'll probably move the low line entries if no one objects. — excarnateSojourner (talk · contrib) 02:15, 15 May 2023 (UTC)[reply]

Defined recursively in terms of each other. Equinox 22:36, 12 March 2022 (UTC)[reply]

70.172.194.25 09:38, 13 March 2022 (UTC)[reply]

No RFM needed for alternative forms. ―⁠Biolongvistul (talk) 07:28, 22 August 2023 (UTC)[reply]
@Biolongvistul: On the contrary- one of these should be made the main entry and the other an alternative form (if you can call it that: it's really just two slightly different ways of writing the exact same thing). Chuck Entz (talk) 08:02, 22 August 2023 (UTC)[reply]
Just did that. ―⁠Biolongvistul (talk) 08:04, 22 August 2023 (UTC)[reply]

Lots of duplicate information, including but not limited to translations. — Fytcha T | L | C 19:27, 13 March 2022 (UTC)[reply]

in line with Category:Redlinks by language. —Fish bowl (talk) 03:57, 14 March 2022 (UTC)[reply]

move Category:Chinese terms with uncreated forms to Category:Chinese redlinks/zh-see

I don't remember doing this lol. It makes sense though; 肉体的 points to 肉體的#Chinese which doesn't exist. —Fish bowl (talk) 21:27, 25 February 2023 (UTC)[reply]

move Category:Sino-Vietnamese words with uncreated Han etymology to Category:Vietnamese redlinks/vi-etym-sino

Not sure whether "descrescendo" should be a misspelling or alternative form, though. It has quite a few hits on Google Books. 70.172.194.25 23:49, 15 March 2022 (UTC)[reply]

If it's kept at all, it should be a {{misspelling of}}. —Mahāgaja · talk 07:56, 16 March 2022 (UTC)[reply]

2022 — April

Is there a difference? w:Political subdivisions redirects to w:Administrative division. —Fish bowl (talk) 04:05, 4 April 2022 (UTC)[reply]

The intended distinction (which, when I spot-check a few categories, actually seems to be decently well maintained) seems to be as IP 70.172 says. But I am inclined to agree that the current names don't convey a meaningful distinction. If we want to continue having separate categories for "county, burgh, kingdom, ..." vs "Mayo, Yorkshire, Idaho, ...", it would be better to devise more distinct names for the categories... - -sche (discuss) 23:14, 20 February 2023 (UTC)[reply]
IP is right. I just came here because Ottoman Turkish قضا (kaza) was in the wrong category, and pushed the panic button. The naming should be something more intelligent. Fay Freak (talk) 03:33, 21 February 2023 (UTC)[reply]
I agree that the names are highly confusing. Maybe we should rename the first one “types of administrative division”, or something similar. Incidentally, that’s exactly the name of the corresponding en.wikipedia category. 70.172.194.25 03:39, 21 February 2023 (UTC)[reply]
Now the yerba gave me the idea. We just name the latter “named political subdivisions”, to avert the exemplified mistake. The former shall not be renamed because it is added manually while the other is a mediate effect of Template:place etc. I also briefly thought about going to Wikipedia to see how they do but we don’t have the same problems. Fay Freak (talk) 03:47, 21 February 2023 (UTC)[reply]

also Talk:point-blank#merge with point blank. – Jberkel 23:51, 12 April 2022 (UTC)[reply]

Oxford and Collins list only point-blank for both adjective and adverb. DonnanZ (talk) 15:03, 22 December 2023 (UTC)[reply]

Cantonese: main entry at 𧿒腳 or 䂿腳?

Fish bowl (talk) 07:42, 13 April 2022 (UTC)[reply]

Proposal to rename Ottawa (otw) to Odawa

I think Ottawa should be renamed to Odawa; it's the more common English name used to refer to the language nowadays, and preferred by speakers. What do you think? /mof.va.nes/ (talk) 15:47, 15 April 2022 (UTC)[reply]

So far as I can tell, the two senses refer to the same thing. Is this a case where differing terminology between chemistry and physics means that it's worth keeping both to better aid understanding? If so, we should probably clarify that they aren't referring to different things.

If I'm wrong and they are actually distinct, could someone with more knowledge than I have make that clearer? Theknightwho (talk) 14:42, 16 April 2022 (UTC)[reply]

The redundancy was added in diff, I've merged the senses. There may be another sense, to which the first etymology (positron + ium) would apply, for positronium conceived of in sci-fi etc as an element or substance a la uranium, polonium, unobtainium, etc. - -sche (discuss) 01:09, 29 May 2022 (UTC)[reply]

To מ־ש־ך, to be consistent with other Hebrew entries. 70.172.194.25 18:13, 29 April 2022 (UTC)[reply]

2022 — May

This is only an English entry, and on English Wikipedia it is not capitalized inside the middle of sentences. The rationale for capitalizing it in 2007 was that it is a German language entry, except there has never been a German language section on this page. -- 65.92.246.142 03:20, 13 May 2022 (UTC)[reply]

See also: #toponyms

Man has taken a quick look and found that eponym is a term not restricted to words after persons, bare logically since ὄνυμα (ónuma) merely means name, i.e. in modern linguistic terminology proper noun, so we might reckon that our definition under eponym (inconsistent with the single adjective definition and Wikipedia) is a prescriptivist legend (of the dark mid 20·00s to which the categorizations and definitions date) and we rather have to move the category to Category:Terms derived from anthroponyms by language to attain consistency with Category:Terms derived from toponyms by language and mislead the public less. Fay Freak (talk) 20:31, 16 May 2022 (UTC)[reply]

a few or few

We have a few fries short of a Happy Meal (created by @Equinox) and few cards short of a full deck, few cards shy of a full deck, few sandwiches short of a picnic, few X short of a Y, which were moved/created by @TNMPChannel. J3133 (talk) 08:21, 29 May 2022 (UTC)[reply]

IMO these should all be at "a few...", since a few means something quite different from few. —Mahāgaja · talk 09:21, 29 May 2022 (UTC)[reply]
Whichever form we lemmatize, I guess we might as well leave redirects from the other. Several of these also have variants like google books:"several fries short of a" Happy Meal / happy meal, google books:"several cards short of a" full deck / full pack, which presumably need hard or soft redirects. - -sche (discuss) 19:00, 29 May 2022 (UTC)[reply]
I definitely agree that all of the headwords mentioned should be at "a few ...". Unfortunately there are probably more (attestable?) alternatives besides what -sche has found. Redirects from "few ..." are especially useful because many with beginning knowledge of English seem to have problems with English determiners. DCDuring (talk) 19:23, 29 May 2022 (UTC)[reply]
This is a good issue to raise. I've mentioned before that, with proper nouns, we don't seem to have (or at least we don't consistently use) anything about the determiner/article: I mean it's the Eiffel Tower and the Cold War, but ∅ Dijkstra's algorithm and ∅ Greenpeace. Proper nouns aside, I usually drop the determiner/article from entry titles unless it seems absolutely 100% necessary all the time. But that's pretty vague and comes out of my wacky head. Equinox 01:53, 4 June 2022 (UTC)[reply]
@Equinox: Yes, but it's [ a few ] [ cards short of a full deck ], not [ a ] [ few cards short of a full deck ] (note the alternative form one card short of a full deck, where "one" replaces "a few" ) Chuck Entz (talk) 02:14, 4 June 2022 (UTC)[reply]
Maybe. Yeah. I would imagine "some few..." etc. might be possible. But even I have better things to do than attest them. Just an observation. Equinox 02:17, 4 June 2022 (UTC)[reply]
It's a snowclone with many possible variants. I don't think many people are going to look up few or short of expecting to find this full phrase. And those words aren't in every variant anyway ... one can also say "two cards shy of a full deck" which uses neither of them.
What would be nice is if the Appendix namespace was in the default search space so that the snowclone page might at least turn up in a search. As it stands, I don't think we need all these mainspace pages since they are all exact synonyms of each other, but if we delete them there will be no way for a naive user to find the snowclone pages unless they somehow know that it's tucked away in the Appendix. Soap 20:05, 30 June 2023 (UTC)[reply]

2022 — June

I've created the page using the wrong character. It should be moved to دلكو instead. Dohqo (talk) 07:21, 18 June 2022 (UTC)[reply]

@Dohqo: Because there's already a Persian entry on the same page, and the character is correct for Persian, it doesn't make sense to move the page. Just delete the Old Anatolian Turkish section from this page and create it on the correct page. You can do it all yourself, no admin rights needed. —Mahāgaja · talk 08:36, 18 June 2022 (UTC)[reply]

Given that our planet's name is usually capitalized, I think this should be moved to Category:Flat Earth. Binarystep (talk) 03:51, 20 June 2022 (UTC)[reply]

Duplicate definitions and potentially missing parts of speech. — Fytcha T | L | C 15:42, 22 June 2022 (UTC)[reply]

Delete and merge into Category:Places in France or some appropriate equivalent. France does not have dependencies: all its overseas territories are integrated into the French Republic. See also: Category:ca:Dependent territories of France, Category:zh:Dependent territories of France, Category:nl:Dependent territories of France, Category:en:Dependent territories of France, Category:fi:Dependent territories of France, Category:fr:Dependent territories of France, Category:de:Dependent territories of France, Category:el:Dependent territories of France, Category:hu:Dependent territories of France, Category:ga:Dependent territories of France, Category:it:Dependent territories of France, Category:ja:Dependent territories of France, Category:lv:Dependent territories of France, Category:nrf:Dependent territories of France, Category:nb:Dependent territories of France, Category:nn:Dependent territories of France, Category:pl:Dependent territories of France, Category:pt:Dependent territories of France, Category:rar:Dependent territories of France, Category:ro:Dependent territories of France, Category:ru:Dependent territories of France, Category:es:Dependent territories of France, Category:sv:Dependent territories of France, Category:tr:Dependent territories of France, Category:vi:Dependent territories of France, and Category:vo:Dependent territories of France. —Justin (koavf)TCM 04:10, 23 June 2022 (UTC)[reply]

@Koavf What do you call New Caledonia, French Polynesia and such if not dependent territories? Benwing2 (talk) 00:40, 24 June 2022 (UTC)[reply]
New Caledonia is a sui generis overseas collectivity of France. It has membership in the French parliament and France's rule of law and citizenship extend there just like in Corsica or Guadeloupe or Lyons. None of these are dependencies: they are all first-level administrative divisions of the French Republic. —Justin (koavf)TCM 00:48, 24 June 2022 (UTC)[reply]
I want a category for all overseas territories of France, and I don't much care about the technicalities. What is the right category? Benwing2 (talk) 01:52, 24 June 2022 (UTC)[reply]
I would have absolutely no objection to Category:Overseas France as a subcat of Category:Places in France and that can include everywhere other than Metropolitan/European France (the mainland, Corsica, and other nearby islands). Seems sensible to me. —Justin (koavf)TCM 04:00, 24 June 2022 (UTC)[reply]
That is hard to do in the current framework without major hacking. There used to be Category:Collectivities in France populated by these entities, will that work? Are they all collectivities? Benwing2 (talk) 05:05, 24 June 2022 (UTC)[reply]
Guadeloupe, Mayotte etc are not collectivities. Unfortunately Justin is right, we need CAT:Overseas France if we're going to be strictly correct here. This, that and the other (talk) 04:20, 26 June 2022 (UTC)[reply]

Wiktionary:Requested entries (Chinese)/Taishanese Chinese

Wiktionary:Requested entries (Chinese)/Wu Chinese

Wiktionary:Requested entries (Chinese)/Min Dong Chinese

Wiktionary:Requested entries (Chinese)/Min Bei Chinese

Wiktionary:Requested entries (Chinese)/Jin Chinese

Wiktionary:Requested entries (Chinese)/Gan Chinese

Wiktionary:Requested entries (Chinese)/Teochew Chinese

These were recently moved by @Apisite from their own user namespace to the Wiktionary namespace under "Requested entries (Chinese)". None of these pages are requested entries; they are pronunciation requests. I'm not entirely sure where these should be moved instead, but I don't think they're in the right place currently. — justin(r)leung (t...) | c=› } 16:59, 23 June 2022 (UTC)[reply]

Seems like they should be subcategories of Category:Requests for pronunciation in Chinese entries. —Mahāgaja · talk 20:50, 23 June 2022 (UTC)[reply]

Duplicate content, move all to fiber. Ngram — Fytcha T | L | C 19:17, 26 June 2022 (UTC)[reply]

@Fytcha: A word of caution: anything involving pondian variation should be handled carefully. There are good arguments for going either way on most of these, and we don't want to start any kind of conflict. Our general practice has been to arbitrarily go with whichever version was first, though it's been a while since one of these came up. Chuck Entz (talk) 20:17, 26 June 2022 (UTC)[reply]
In this case, fibre is older, but by only 14 hours. Also, the translation tables are all already at fibre, so I feel like making fibre the primary spelling and fiber the alternative spelling will be less work. —Mahāgaja · talk 21:09, 26 June 2022 (UTC)[reply]
From Google N-Grams: Since 1911 fiber has been more common. As of 2009 it is about three times as common. DCDuring (talk) 21:14, 26 June 2022 (UTC)[reply]
We don't apply that when it comes to AmEng/BrEng differences. Theknightwho (talk) 21:37, 26 June 2022 (UTC)[reply]
Who says? DCDuring (talk) 21:51, 26 June 2022 (UTC)[reply]
Also, since 2016 fiber has been more common in Google's British English N-Gram corpus and six times more common in the American English corpus. DCDuring (talk) 21:54, 26 June 2022 (UTC)[reply]
@Chuck Entz: I see. If that is de facto policy then the meat should go to fibre. However, if I could have devised the policy, I would have made it so that it always aligns with the frequency because that way the users land on the non-redirecting spelling more often. — Fytcha T | L | C 22:26, 26 June 2022 (UTC)[reply]
We actually had an attempt by a Russian internet troll (geolocating to Crimea) to get us arguing about UK vs. US issues, but it went nowhere. At the time I just thought it was odd, but with the revelations after Trump was elected I finally put two and two together and realized what was going on. I still have no idea why they even bothered, since our discussion forums aren't exactly the center of the universe. I do know that the mutual respect between our US and UK editors, helped by this kind of practice, was the main reason it was such a non-issue. Chuck Entz (talk) 23:20, 26 June 2022 (UTC)[reply]
@Chuck Entz I have it on my to-do list to build a template that duplicates the material from the "primary" entry, which should hopefully circumvent issues like this anyway. I've done something similar with Tangut already (e.g. see 𗁘 (*rjijr²), 𗁩 (*tẽ¹), 𗀏 (*par²)), though the implementation would need some tweaking. Theknightwho (talk) 00:18, 27 June 2022 (UTC)[reply]

2022 — July

Sense 3: Relating to the spoken rather than written form of a word or name, as opposed to orthographic.

Feels like this could be merged in some way with sense 1: Relating to the sounds of spoken language. Theknightwho (talk) 08:44, 7 July 2022 (UTC)[reply]

I suppose this might be trying (unclearly) to express the sense used in "a phonetic spelling" (one based on how it sounds) as contrasted with, say, "a phonetic sketch of Urama" (one describing its phonology). Whether this merits a different sense I'm not sure. - -sche (discuss) 23:30, 23 July 2022 (UTC)[reply]
Meh, kept distinct; I've added a usex to try to clear things up. - -sche (discuss) 21:17, 29 March 2024 (UTC)[reply]

Inconsistent capitalization of I/internet slang

We capitalize it as a label — (Internet slang), but not in the category name — Category:English internet slang. When I was adding this category I thought it would also be capitalized, like in the label (but it was a red link). J3133 (talk) 09:16, 15 July 2022 (UTC)[reply]

Not just for English slang; see Category:Internet slang by language and Category:Internet laughter slang by language.  --Lambiam 15:30, 15 July 2022 (UTC)[reply]
I am aware that the slang category is not exclusive to English; however, @Lambiam, what is our solution? J3133 (talk) 15:49, 15 July 2022 (UTC)[reply]
The regular approach is to list these at WT:RFM. This seems, however, a place where proposals go to linger in limbo: there is an unresolved category move request (WT:RFM § Category:WC) from 2015. The sledgehammer approach is to create a vote at WT:VOTE.  --Lambiam 17:20, 15 July 2022 (UTC)[reply]
Good point; I have moved it. J3133 (talk) 17:33, 15 July 2022 (UTC)[reply]
@J3133 What did you move? Category:English internet slang has not been moved since 2019. - excarnateSojourner (talk | contrib) 00:16, 23 October 2022 (UTC)[reply]
@ExcarnateSojourner: I moved this discussion from the Beer parlour. J3133 (talk) 18:11, 13 March 2023 (UTC)[reply]
Personally, I would lowercase the label (and anything else). On the other hand, Google Books Ngrams suggests Internet is more common. That said, it's less work to lowercase the label than to move all the categories... - -sche (discuss) 23:33, 23 July 2022 (UTC)[reply]
According to capitalization of Internet, cited by excarnateSojourner, the trend of capitalising the I in internet is decreasing. So I'll support lowercasing the I in the label. The higher rank of Internet in Google Ngram Viewer is maybe because internet sometimes occurs at the beginning of a sentence and is thus capitalised. Sbb1413 (he) (talkcontribs) 08:07, 9 March 2023 (UTC)[reply]
Looks like it was originally capitalized. Mnemosientje (talkcontribs) lowercased it back in 2019. - excarnateSojourner (talk | contrib) 00:15, 23 October 2022 (UTC)[reply]
The capitalization of Internet is a whole thing, but for what it's worth Wikipedia does capitalize it. - excarnateSojourner (talk | contrib) 00:15, 23 October 2022 (UTC)[reply]
@ExcarnateSojourner: Wikipedia's inconsistent too. Compare Category:History of the Internet and Category:People related to the internet. However, the capitalized spelling does seem to be more common in category names, so I support capitalizing Category:English internet slang. Binarystep (talk) 03:11, 13 March 2023 (UTC)[reply]
It should be capitalised. There is such a thing as "an internet" or internetwork (generic; although you very rarely hear this terminology any more), versus "the Internet" (the global thing we all use all the time). Same deal with "the Web" versus (I suppose) "a web" although I don't remember even the most braggart webmasters using the latter. As always, citable usage trumps what I say, but I am historically correct. Equinox 03:14, 13 March 2023 (UTC)[reply]

Split Category:Thieves' cant into subcategories by language

Currently, this category contains 123 English entries, 1 Japanese entry, and 3 Yiddish entries. This is inconsistent with how categories usually work (compare Category:English fandom slang or Category:English Polari slang). I suggest that we split Category:Thieves' cant into Category:English thieves' cant, Category:Japanese thieves' cant, and Category:Yiddish thieves' cant, allowing for more subcategories if the need arises. Binarystep (talk) 11:33, 21 July 2022 (UTC)[reply]

Sounds reasonable. 98.170.164.88 11:41, 21 July 2022 (UTC)[reply]
Yeah. The category description suggests it was originally intended only for English (compare Category:Rotwelsch, for one or two languages not directly specified in the name). If multiple languages have thieves' cants, as seems to be the case, then this should be split per nom. - -sche (discuss) 23:35, 23 July 2022 (UTC)[reply]
I assumed this was a mistake, to be quite honest. Seems odd to assume this would only exist in English in the first place, really. Theknightwho (talk) 23:40, 23 July 2022 (UTC)[reply]
In fairness, the Wikipedia article and most books I can find about it take it as given that it's an English thing; I've occasionally even seen it capitalized as if people thought of it as the name of a specific lect. Apparently the term for it in other languages and other time periods of English is criminal slang, I now realize. Hmm, now I wonder whether we should split this after all, since then the "thieves' cant" and "criminal slang" categories would overlap. But the current name is clearly too ambiguous, since people are adding non-English entries to it. Maybe we should move the English entries to "English thieves' cant" (for the historical lect) and disperse the other languages and any modern English developments to Category:Criminal slang by language? - -sche (discuss) 23:51, 23 July 2022 (UTC)[reply]
Other languages definitely have thieves' cants, but they might not use the term "thieves' cant". Rotwelsch is German thieves' cant, but it's just called Rotwelsch. —Mahāgaja · talk 06:56, 30 July 2022 (UTC)[reply]
It seems reasonable to use "criminal slang" as the proper category for such terms. "English thieves' cant" can be made a subcategory of that. — Sgconlaw (talk) 07:01, 30 July 2022 (UTC)[reply]
I think that’s a good idea, because that can be broken down by language, and allows for categories like this to go under the language categories. No doubt there are numerous lects of criminal slang in English alone. Theknightwho (talk) 21:23, 14 August 2022 (UTC)[reply]
OK, I moved the three Yiddish entries and one Japanese entry to "criminal slang". For the English entries, are we renaming the historical English "Thieves' cant" lect(s) to "English thieves' cant" for clarity? And then are we
  1. making the "thieves' cant" label English-only, i.e. changing it from always adding plain_categories = { "Thieves' cant" }, regardless of language code to always adding plain_categories = { "English thieves' cant" }, (a subcategory of "English criminal slang"), and ongoingly removing uses outside English?
  2. or allowing for other languages to have their own "thieves' cant" subcategories of "criminal slang" (which entails changing the label to use pos_categories so each language could have its own "thieves' cant" subcategory of "criminal slang")?
- -sche (discuss) 20:39, 21 August 2022 (UTC)[reply]
@-sche: I’d say leave “Thieves’ cant” as English-only. It seems a peculiarly archaic English expression rather than a general term of art. — Sgconlaw (talk) 11:33, 17 September 2022 (UTC)[reply]
Support, and I dislike the usage of criminal slang, this might differ by language. Vininn126 (talk) 14:02, 17 August 2023 (UTC)[reply]
IMO we should use "criminal slang" for the general set of terms. As others have pointed out, "cant" has a primarily historical usage, and not all criminals are thieves. I would make Category:Thieves' cant a subcategory of Category:English criminal slang, though. Benwing2 (talk) 18:41, 17 August 2023 (UTC)[reply]
I suppose having both and making thieves' cant a subcategory would be a good solution. Vininn126 (talk) 19:02, 17 August 2023 (UTC)[reply]
OK, if I am reading the discussion above as saying "thieves' cant" should only be used for English (for the historical lect) and other languages, and non-Thieves'-Cant English criminal slang, should use "criminal slang", then (1) maybe we should capitalize it "Thieves' Cant" for clarity, and (2) is there a way to make {{lb}} treat "thieves' cant" as "criminal slang" if it's used with any other language besides en? I just had to change a Korean instance to "criminal slang" today, because it was categorizing the Korean entry into "Category:Thieves' cant" along with English terms. (Alternatively, if it would be easier: make "thieves' cant" default to categorizing into "LANGUAGE criminal slang" and then special-case use with en.) - -sche (discuss) 00:08, 18 August 2023 (UTC)[reply]
@-sche Yes, the latter is possible. I think capitalizing Thieves' Cant is a good idea. Benwing2 (talk) 00:35, 18 August 2023 (UTC)[reply]
OK, I believe I've made it so that "Thieves' Cant" is English-only and any other language that tries to use it gets categorized as "criminal slang". If a language has a more specific label like Rotwelsch, use that. - -sche (discuss) 21:19, 29 March 2024 (UTC)[reply]
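As an illustration of the behaviour described above (an English-only "Thieves' Cant" label, with every other language falling back to "criminal slang"), here is a minimal Python sketch. It is not the code in Wiktionary's label modules, which are written in Lua; the function name `categories_for_label`, its parameters, and the exact category names are assumptions made for the example.

```python
# Hypothetical sketch only: these names do not come from Wiktionary's modules.
CRIMINAL_SLANG = "criminal slang"


def categories_for_label(label: str, lang_code: str, lang_name: str) -> list[str]:
    """Return the categories a usage label would add for an entry in one language."""
    normalized = label.strip().lower()
    if normalized == "thieves' cant":
        if lang_code == "en":
            # English keeps the dedicated category for the historical lect
            # (category name assumed for illustration).
            return ["Thieves' Cant"]
        # Any other language is categorized as its own criminal slang instead.
        return [f"{lang_name} {CRIMINAL_SLANG}"]
    if normalized == CRIMINAL_SLANG:
        return [f"{lang_name} {CRIMINAL_SLANG}"]
    # Labels outside this sketch add no categories here.
    return []
```

Under these assumptions, a Korean entry tagged "thieves' cant" would land in "Korean criminal slang" rather than alongside the English terms, which is the case that prompted the change.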

One is tagged as obsolete and defined as "A kind of furnace used in refining, to separate the metal from cinders and other foreign matter.", another not obsolete defined as "A furnace in which slags of litharge left in refining silver are reduced to lead by being heated with charcoal." Good luck to the potential merger. Dunderdool (talk) 18:06, 29 July 2022 (UTC)[reply]

DoggoLingo is the jargon used in doge memes. This should be changed to "Category:English DoggoLingo," since it contains only English terms, and to remain consistent with similar categories (e.g. Category:English 4chan slang). WordyAndNerdy (talk) 05:29, 30 July 2022 (UTC)[reply]

@WordyAndNerdy: 4chan is not only English: see DoggoLingo: “A form of English-language Internet slang related to dogs”. Compare Category:Rotwelsch, etc. J3133 (talk) 06:11, 30 July 2022 (UTC)[reply]
"Category:[Language] [word type]" is the standard naming convention of lexical categories. Category:English irregular nouns, Category:English onomatopoeias, Category:English fandom slang, etc. This category contains only English-language DoggoLingo terms, and thus the correct name should be "Category:English DoggoLingo". German-language DoggoLingo terms would go under "Category:German DoggoLingo", French DoggoLingo would go under "Category:French DoggoLingo", etc. (Presuming this meme has spread to other languages.) WordyAndNerdy (talk) 06:24, 30 July 2022 (UTC)[reply]
@WordyAndNerdy: I just stated why it does not have “English” in the title: see DoggoLingo: “A form of English-language Internet slang related to dogs”. For the same reason, Category:Rotwelsch is not “German Rotwelsch”, etc. J3133 (talk) 06:29, 30 July 2022 (UTC)[reply]
We could use some empirical data here. Does DoggoLingo or a close equivalent actually exist in German or French? If it does, that provides some reason to approve this proposal (and possibly to update the relevant articles). If not, it provides some reason to reject it. 98.170.164.88 06:40, 30 July 2022 (UTC)[reply]
I agree with this. WordyAndNerdy, do you have proof that Internet slang related to dogs (i.e., of the type of DoggoLingo) exists in other languages, and would use the same name derived from English slang? J3133 (talk) 06:45, 30 July 2022 (UTC)[reply]
*deep existential sigh* English-language lexical categories have an established naming convention. I have never seen an English-language lexical category that was just "Category:Word type" (e.g. "Category:Fandom slang", "Category:Military slang", etc.) in 10+ years of contributing. Can't speak for lexical categories in other languages, but if someone wants to change an established convention, they need to do so by obtaining consensus, not by unilaterally imposing a new standard. This is an extremely straightforward request and having to get bogged down in bureaucratic discussions like this means less time for doing productive things like attesting Internet slang. WordyAndNerdy (talk) 07:15, 30 July 2022 (UTC)[reply]
The difference is that “DoggoLingo” is a proper noun, not just another word type. J3133 (talk) 07:18, 30 July 2022 (UTC)[reply]
*deeper existential sigh* This reasoning is, to be perfectly frank, bizarre and arbitrary. Twitch-speak is a proper noun too. Guess what the relevant English-language lexical category is named? There's an established convention, and this category's name doesn't follow it. WordyAndNerdy (talk) 07:27, 30 July 2022 (UTC)[reply]
A proper noun that specifically refers to English, if you are not already aware. Like Rotwelsch is a proper noun referring to German. J3133 (talk) 07:30, 30 July 2022 (UTC)[reply]
The Wikipedia article defines DoggoLingo as an "Internet language" and doesn't specify that it's limited exclusively to English in said definition. In any case, this is completely perpendicular to the issue of what the category should be named. No one had to prove the existence of Dutch Twitch-speak, Korean Twitch-speak, etc. to create "Category:English Twitch-speak." That's what the category ought to be named following the established naming convention of English-language lexical categories. (And given that you haven't incorporated this category into the category tree module -- which is like step two of creating a new category -- maybe it isn't prudent to act as if you have special expertise or authority in this area.) WordyAndNerdy (talk) 07:51, 30 July 2022 (UTC)[reply]
You are the one acting like you have authority here, though. Category:Rotwelsch is not in the category tree either, not sure what your point is. J3133 (talk) 07:54, 30 July 2022 (UTC)[reply]
Let consensus decide, instead of assuming this is a “straightforward request”. J3133 (talk) 08:02, 30 July 2022 (UTC)[reply]
Support. I'm reminded of the discussion I started about how to handle Category:Thieves' cant. Binarystep (talk) 06:37, 31 July 2022 (UTC)[reply]
And the consensus of that discussion seems to be that "Thieves' cant" is a strictly English historical example of criminal slang, and that the non-English entries in Category:Thieves' cant should be moved to language-specific criminal slang subcategories, which is the opposite of this proposal.
It's true that there's a naming convention to put language names in category names, but that doesn't apply to this kind of entry, and saying it does shows a misunderstanding of the convention. While there's nothing to stop other languages from having their own equivalents to DoggoLingo, it seems to have been created by English-speakers using humor based on the peculiarities of the English language. If other languages come up with their own equivalents, I sincerely doubt that they would be called DoggoLingo. DoggoLingo is a variety of English, just like pig Latin and double Dutch, and "English DoggoLingo" would be redundant. Chuck Entz (talk) 08:13, 31 July 2022 (UTC)[reply]
Funny you should mention Pig Latin, since the category for that is called Category:English Pig Latin terms. Binarystep (talk) 02:54, 1 August 2022 (UTC)[reply]
For the record, that category is very poorly formatted, unsurprisingly. There’s no overarching Category:Pig Latin or Category Pig Latin terms, nor does there seem to be other languages linked to it, so there really shouldn’t be an English label there. AG202 (talk) 11:02, 1 August 2022 (UTC)[reply]
I agree, the English label should be removed; see the RFM. J3133 (talk) 11:16, 1 August 2022 (UTC)[reply]
As I did not vote: oppose per Chuck Entz. J3133 (talk) 10:02, 31 July 2022 (UTC)[reply]

2022 — August

As AG202 stated in the DoggoLingo category RFM, “There’s no overarching Category:Pig Latin or Category Pig Latin terms, nor does there seem to be other languages linked to it, so there really shouldn’t be an English label there.” This was after Chuck Entz used the argument there that “English DoggoLingo” would be redundant, “just like pig Latin”, then Binarystep pointed out that the Pig Latin category does use the English label—redundantly. J3133 (talk) 11:24, 1 August 2022 (UTC)[reply]

This idiom is far more versatile than the specific and somewhat informal phrasing we have here (which doesn't even match the quotation we have). It's a fully fledged verb phrase: see the examples at Teaching grandmother to suck eggs.

Two points: there is such a wide range of familiar terms for grandmother that can be used in this phrase that I think it's best to stick with "grandmother". However, I think it's worth investigating whether it's more common with or without the possessive pronoun (here "one's"); to me it sounds more natural with it, but there are citations both ways. 86.145.59.120 18:42, 14 August 2022 (UTC)[reply]

We also have teach grandma how to suck eggs. J3133 (talk) 08:19, 15 August 2022 (UTC)[reply]
I'm somewhat inclined to pick a most common or general negative form to lemmatize like not teach grandmother how to suck eggs, and also have the positive form (maybe teach grandmother to suck eggs since a possessive doesn't seem required? or if a pronoun is more common, then redirect the pronounless form to the pronouned form, either works). This is both because it's unclear how many translations can have the negative removed and because in general, as I said in the discussion of all it's cracked up to be further up this page, when we redirect a negative expression to a positive one or vice versa there's a risk that a reader who doesn't notice they were redirected will come away thinking the phrase means the opposite of what it actually means. To avoid duplication we could make the negative form almost a soft redirect, defining it like "To not teach grandmother to suck eggs (presume to give advice to someone who is more experienced)" or even "To not teach grandmother to suck eggs (see that entry)"; I don't know, I don't like splitting content across multiple pages, but I also think it's risky to silently strip away the negative polarity with a seamless little redirect and expect IPs who sometimes don't even notice they're on Wiktionary and not Wikipedia to notice and understand that the polarity of the headword has changed and thus that the definition of the term they looked up is the opposite of the one we're giving them. - -sche (discuss) 15:20, 19 August 2022 (UTC)[reply]
Negative polarity is "licensed" in many forms, starting with the negative being separated from the rest of the expression: conditionals, questions, infinitives with certain verbs (eg, try to) or other expressions (eg. hard to). These might lead someone to look up the positive form. I think that a "negative-polarity item" label (with link to WP or our Glossary), usage examples with adjoining and disjoint not and n't, and redirects would enable us to use the positive form as the lemma. I don't see how to use redirects in the other direction. Even usage examples would be problematic with not in the headwords. DCDuring (talk) 21:07, 19 August 2022 (UTC)[reply]
What I mean is, I'm somewhat inclined to have both "not teach grandmother to suck eggs" defined as "not give advice to someone more experienced", and then also "teach grandmother to suck eggs" defined as "give advice to someone more experienced", redirecting all the various negative forms to the first one and the positive forms to the second one. But I'm not opposed to only having the positive form and redirecting everything to it; I do dislike splitting content across multiple pages, I just also think there's always a danger when someone types "not teach grandmother to suck eggs" into the search bar and is seamlessly sent to "teach grandmother to suck eggs" where they read a definition that's inverted from that of the term they typed in and which they think they looked up. - -sche (discuss) 21:46, 19 August 2022 (UTC)[reply]

Split [zhx-pin] into [cnp] and [csp]

[zhx-pin] is an etym-only code added back in 2014 (diff) as [pinhua] and later renamed to [zhx-pin] in 2019. [cnp] and [csp] are ISO 639-3 codes added in January 2020. Note that the current data module incorrectly suggests [yue] (Cantonese) to be the parent of [zhx-pin], but they are generally considered to be distinct, which is mentioned in ISO's comment on the change request. -- Wpi31 (talk) 14:40, 23 August 2022 (UTC)[reply]

Support 12:29, 6 October 2022 (UTC)[reply]
Support — justin(r)leung (t...) | c=› } 16:10, 6 October 2022 (UTC)[reply]
Support Theknightwho (talk) 15:39, 13 December 2023 (UTC)[reply]

Given this has been open for over a year, I'm going to close this as split. Theknightwho (talk) 15:39, 13 December 2023 (UTC)[reply]

Should be moved to a different title as “gender-neutral” is misleading: e.g., femxle, mxn, and womxn are not gender-neutral (only one is—Mixter). Created by WordyAndNerdy who stated “This is the senseid name used to link the -x- infix. Maybe a different name would work better, but this senseid is already baked into links in entries and the category name.” J3133 (talk) 11:03, 24 August 2022 (UTC)[reply]

I have no strong feelings in this matter. This was created after -x, which required a disambiguating sense-id, as there are multiple distinct senses. The simplest solution here would be to just cut "gender-neutral" out of the category name since there is currently only one English sense for the infix. WordyAndNerdy (talk) 11:41, 24 August 2022 (UTC)[reply]
If we decide this is best handled as an affix, I agree we should try to find a better name (possibly just "...words infixed with -x-" as WordyAndNerdy says). Whether this is best handled as an affix is under discussion at Wiktionary:Tea room/2022/August#uses_of_x:_to_cover_at_x_or_as_affixes_-x_and_-x-?. - -sche (discuss) 23:21, 24 August 2022 (UTC)[reply]
The four entries have been moved to Category:English terms infixed with -x- for now. If this is felt not to be an infix at all, that can be worked out in the tea room discussion or a fresh RFM. - -sche (discuss) 21:25, 29 March 2024 (UTC)[reply]

Should have one's heart in one's boots be moved to just one's heart in one's boots because it also occurs without have (e.g. when someone stands/waits/etc google books:"with her heart in her boots")? That is why have was dropped from one's heart in one's mouth, according to the edit history. FWIW all three expressions can be found without even the pronouns, as in google books:"heart in throat". - -sche (discuss) 10:35, 28 August 2022 (UTC)[reply]

2022 — September

Plenty of overlap, spesh with translations. Maybe there's just one species called this, maybe two... something for the animal nerds here... you know who you are Almostonurmind (talk) 00:48, 8 September 2022 (UTC)[reply]

Not finding evidence that O. dalli is ever called "bighorn" or "bighorn sheep". It's called Dall sheep or thinhorn sheep AFAICT. —Mahāgaja · talk 07:09, 8 September 2022 (UTC)[reply]
Formally, that's probably true, though I doubt most people make the distinction consistently colloquially. But that wouldn't be particular to bighorn. I think people who didn't make the distinction would be just as likely to use bighorn sheep when describing Dall's sheep. Andrew Sheedy (talk) 15:18, 21 October 2022 (UTC)[reply]
I have split both into two subsenses and RfVed the O. dalli subsenses. I have not yet found any evidence that either term is applied to O. dalli. I would include O. dalli and thinhorn sheep under See also at both of these entries. DCDuring (talk) 15:57, 21 October 2022 (UTC)[reply]

Re-merge Kven and Meänkieli into Finnish

@-sche, Chuck Entz, Rua, Tropylium, Hekaheka, Surjection, Brittletheories, Mölli-Möllerö

In the previous discussion on this topic ([1]) it seems everyone has agreed that it's best to merge Kven and Meänkieli into Finnish. However, the discussion was closed without actually merging the codes, and currently we (again) have 40 Kven and 30 Meänkieli lemmas, many of which are also duplicated as Finnish for the reasons discussed in the above discussion. Has anyone changed their opinion or does anyone have anything to add to this or can we actually go ahead and merge the languages?

I guess related to this is also the question of how to handle dialectal morphology of Finnish dialects, but maybe that's a bit out of scope for this discussion. Thadh (talk) 16:24, 23 September 2022 (UTC)[reply]

The strongest arguments in favour of splitting them are political and should therefore be ignored. Our task is to best present the most information, and that would best be achieved by merging the three lects. The few dozen new dialectal terms will fit in quite well with the 1250 pre-existing ones. brittletheories (talk) 16:49, 23 September 2022 (UTC)[reply]
Incubator says "Wikimedia does not decide for itself what is a language and what is a dialect. We follow the ISO 639 standard." This means that it's up to the agency that grants language codes, not to us, right? Meänkieli and Kven have written standards so they should stay as they are. (In my view, Tver Karelian should also be treated as a language so I could add Tver Karelian words without knowing if they're used in the more usual "vienankarjala" dialect.) Mölli-Möllerö (talk) 19:55, 23 September 2022 (UTC)[reply]
The Incubator standards are not the same as our standards. Our language treatment does not strictly follow ISO 639. — SURJECTION / T / C / L / 20:33, 23 September 2022 (UTC)[reply]
@Mölli-Möllerö: On the Tver Karelian issue, you could also just leave the first parameter of {{krl-regional}} empty or |1=? it, and it will automatically be sorted in Category:Karelian term requests, and I'll be able to add the terms later. Or you could use either {{R:krl:KKS}} or another Viena source, the correspondences are usually quite easy. Thadh (talk) 20:44, 23 September 2022 (UTC)[reply]
Wrong. There's a big difference between Wikimedia's administrative needs and the lexical needs of a dictionary. As for written standards: the world is full of languages with multiple written standards: Brazilian and European Portuguese, European and Canadian French, Austrian and German German, etc. We can't let others decide for us- each case needs to be considered on its own. We've chosen to merge languages treated as separate by ISO and recognize languages with no ISO codes. In other cases we've gone with the ISO. Chuck Entz (talk) 20:59, 23 September 2022 (UTC)[reply]
For outsiders, Meänkieli (in Sweden) and Kven (in Norway) are languages or rather dialects that have become languages by virtue of being across the border (the Finnish-Swedish border and the Finnish-Norwegian border, respectively). Finnish speakers can easily understand nearly 99% of Meänkieli or Kven, and the main differences are either dialectal features also found in Far Northern Ostrobothnian dialects or (the lack of) recent developments within the past 200 years (in one or the other).
Linguistically they are 100% dialects, but politically both Sweden and Norway respectively have recognized them as separate languages, which is also what their speakers think. A more cynical person might say that they have deluded themselves into thinking their language is not Finnish in order to avoid persecution of Finnish that was prevalent in Sweden and Norway in the 19th and 20th centuries ("Finnish? what Finnish? we're not speaking Finnish, it's Meänkieli/Kven").
How Wiktionary best handles cases like these, I don't know. 200 years is not enough for what is generally a phonologically conservative language to become anywhere near unrecognizable. It could be compared to how Karelian is now almost universally treated as a separate language, even though it forms a dialect continuum and has been diverging now for at least about 800 years (ever since the 1323 Treaty of Nöteborg).
Finnish sources almost exclusively consider Meänkieli and Kven to be dialects, even more so when these sources are linguistic-oriented (some other sources take a political stance and recognize that they are considered "minority languages" in their respective countries). — SURJECTION / T / C / L / 20:34, 23 September 2022 (UTC)[reply]
"The main differences are either dialectal features also found in Far Northern Ostrobothnian dialects or (the lack of) recent developments within the past 200 years (in one or the other)"... and the additional Swedish/Norwegian loanwords found in Meänkieli/Kven, of course. But many of these are also found in Finnish dialects. — SURJECTION / T / C / L / 21:37, 23 September 2022 (UTC)[reply]
The divergence of Karelian from Finnish, FWIW, almost certainly goes back at least 1200 years (to the archeological / mentioned-in-Novgorod-sources Old Karelian culture). The initial split-off of Northern Finnish dialects is probably about as old too.
What I would think of as the best argument against treating Meänkieli and Kven as languages is that they're not even internally well-defined — typically they're just catch-all terms for "Northern Finnish in Sweden" and "Northern Finnish in Finnmark" with relatively various dialects encompassed by each. There's some efforts (schoolbooks, etc.) towards a "standard" Meänkieli based on the Torne Valley dialect but I don't think it could be called actually standardized just yet. I suppose one thing we could do is to document whatever is done on this specifically under "Meänkieli" and leave anything else as dialectal Finnish, but that might be a bit premature still too. --Tropylium (talk) 07:44, 24 September 2022 (UTC)[reply]
I would not say that "everybody" agreed on the merger. I didn't. I can only comment on Meänkieli but I would not be surprised if similar argumentation also applied to Kven:
  • The overall small number of Meänkieli words in Wiktionary only proves that we don't have an active editor in Meänkieli. There seem to be some 30,000 entries in this Meänkieli-Finnish-Swedish dictionary[15]
  • The small sample of words we have proves nothing of similarity of the vocabularies. If you study the dictionary I mentioned (press "tutki") you'll find that there are considerable differences between Finnish and Meänkieli. In addition to vocabulary, conjugation of verbs seems to differ (e.g. Meänkieli: tukeat - Finnish: tuet - English: you support).
  • This article[16] promotes the opinion that Meänkieli is a dialect. However the writers admit that the two are not readily mutually understandable: Finnish-speakers usually understand Meänkieli relatively well, partly because of their knowledge of Swedish, but for Meänkieli speakers Finnish isn't as easy. If we took a Finn who does not know a word of Swedish, they would be lost with a Meänkieli speaker.
  • This article[17] starts from the maxim that Meänkieli is a dialect of Finnish but finishes with the conclusion that at the end of the day it is the speakers of a language themselves who decide the status of a language/dialect. Meänkieli speakers have made their opinion clear: they want it treated as a language. How competent are we to second-guess their point of view? Has any of us studied Meänkieli more than superficially?
Here is also a link to a Kven-Norwegian dictionary[18]--Hekaheka (talk) 09:44, 24 September 2022 (UTC)[reply]
To be fair all these points would still hold for Ingrian and Savonian dialects, too, and of Ingrian dialects I'm fairly certain no Finnish speaker would readily understand them much better than, say, Izhorian or Karelian. Thadh (talk) 09:51, 24 September 2022 (UTC)[reply]
A clear-cut solution would be to stick to ISO. Ingrian has an ISO code, Savo hasn't. Is Ingrian currently treated as a Finnish dialect? I think it shouldn't be. --Hekaheka (talk) 12:05, 24 September 2022 (UTC)[reply]
You're confusing Ingrian (inkeroinen) and Ingrian (inkerin (suomalainen)). The first one is the same as Izhorian and is handled as a distinct language, has an ISO code, and is spoken by the Orthodox Izhorians. The latter one is the same as Ingrian Finnish and is handled as a Finnish dialect, does not have an ISO code, and is spoken by the Lutheran Ingrian Finns. My remark concerned the latter. Thadh (talk) 13:46, 24 September 2022 (UTC)[reply]
I've come around to say that I think they should be merged. We don't consider Valencian, Ulster Scots or Lemko (the linguistic case is very similar between those examples and this one) to be their own languages despite political arguments that they should be considered as such (and even some recognition like in the ECRML). We shouldn't do so here either. And don't even mention the whole thing going on with Serbo-Croatian... The general trend on en.wikt seems to be to consider the linguistic argument more important than any political ones (which I can appreciate). — SURJECTION / T / C / L / 11:51, 3 October 2022 (UTC)[reply]
As a Norwegian, I find it odd that there is a proposal to merge Kven with Finnish - as Kven is an officially recognized minority language in Norway (Finnish is not). I do not agree with this merge, for the following reasons:
  • At least in Norway, Kven and Finnish are considered separate languages. You are able to get elementary school education and books in Kven (but not in Finnish, as far as I know) - you can even study Kven at the University of Tromsø and receive a bachelor's and master's degree in the language (there is a Finnish one as well, and they are considered two separate degrees). Kven people are considered a separate ethnicity, along with their language, descended from Finns/Finnish.
  • Political reasons are of course relevant, not just linguistic ones. The average Kven speaker has never set foot in Finland, never studied any Finnish, nor consumed any part of Finnish culture and media (music, literature, etc.). An argument was that Finnish speakers understand 99% of Kven - as a Norwegian I understand up to 99% of Swedish and Danish, but they are not getting merged into one language called Scandinavian (for political reasons).
  • If merged, then in theory thousands of new Finnish entries on Wiktionary would emerge, in the form of "dialectal" words which are actually Kven words. If someone bothered to add them all (I, stubbornly, might) - then every Kven word and declension would need to be added under Finnish, and certain words and forms which don't even exist in Finnish dialects in Finland would be present. Every Kven word, even if the nominative singular is identical to Finnish, has a separate declension chart, every single one - there would then need to be a separate template to show these (I think Finnish Wiktionarians would be quite annoyed by this).
  • Kvens in Norway have fought very hard for their language, they have gotten their own language institute with a promotion of literature and culture in the Kven language - erasing their language from Wiktionary and treating it as a dialect of a language they don't even speak would be a huge slap in the face. Finns in Finland who speak a dialect of Finnish also all know standard Finnish; Kven people do not. If a Kven person handed in an essay at a school in Finland, every other word would be marked as wrong or a typo. Supevan (talk) 22:49, 2 November 2022 (UTC)[reply]
This entire argument can be boiled down to "Kven is standardized". So are Valencian and Croatian, but we still don't treat them as separate languages. — SURJECTION / T / C / L / 14:57, 5 November 2022 (UTC)[reply]
@Surjection: Actually, Kven isn't firmly standardised afaik. Thadh (talk) 14:58, 5 November 2022 (UTC)[reply]
We should. Supevan (talk) 17:35, 5 November 2022 (UTC)[reply]
@Supevan Most of these points were already raised for Meänkieli. I will try to answer them anyways.
1) First, our standard procedure is to emphasise linguistics over politics, even when much more controversial (see WT:Serbo-Croatian).
2) Secondly, and most importantly, you claim all Kven inflection should be incorporated into Finnish. This is false. There is already a ridiculous amount of variation in the inflection of the various Finnish dialects, and none of it is represented here. We simply do not have the capacity to maintain 30 different tables containing dozens of inflected forms. Additionally, natives do not stick to one variety of Finnish but mix standard Finnish grammar with that from various dialects and registers. It would also be naive to assume that Kven speakers all use one well-defined standard themselves. A language with a morphology as rich as that of Finnish leaves much space for variation.
3) You say, "thousands of new Finnish entries [– –] would emerge, in the form of 'dialectal' words which are actually Kven words", but this is only true if one assumes Kven not to be a collection of Finnish dialects, which is not a popular opinion among linguists. Besides, only a small number of these terms are exclusive to the Ruija dialects.
brittletheories (talk) 13:46, 27 January 2023 (UTC)[reply]

2022 — October

ghc: Classical Gaelic aka Early Modern {Irish / Gaelic}

I’d like to propose adding the Classical Gaelic language with the code ghc to Wiktionary, ie. split the ghc (called Hiberno-Scottish Gaelic in Ethnologue) code from ga and gd.

The code had existed on Wiktionary before (due to being an accepted ISO-639-3 code) but it was merged in 2013 into ga and gd and the move was backed by two arguments: “it seems crazy the number of Irishes we have over time” and “[t]here's no reason 17th-century Irish can't be simply ga (…) [s]peakers of Modern Irish have no more difficulty reading Geoffrey Keating than speakers of Modern English have reading Shakespeare” and I believe this merger was a mistake – especially when we also keep Old and Middle Irish distinct.

The first argument is just a subjective view (on a language with pretty good attestation from the 4th century CE until today) of a person not familiar with the history of Goidelic languages. Also, we somehow have no problem with the amount of old- and East-Slavics that we have (reconstructed Proto-Slavic, Old Church Slavonic, Old East Slavic, Old Novgorod, Old Ruthenian – all of them often listing exactly the same forms).

The second one is not applicable to the stage in question in general. The Early Modern stage of Irish and Scottish Gaelic covers the language from the early 13th century up to the late 17th century. That’s half a millennium of language change. Now, we treat this 13th century stage as modern Irish (mostly, rarely as Scottish Gaelic), thus the Irish label and the ga language code is supposed to cover everything Irish from conservative 13th century literature up to colloquial 21st century language.

Also important to note is the term Classical Gaelic (sometimes Classical Irish) – generally applied to the literary standard created in late 12th century and used consistently in dán díreach over the centuries and taught in bardic schools of late medieval and early modern Ireland and Scotland. From The linguistic training of the mediaeval Irish poet, Brian Ó Cuív (1973), DIAS, →ISBN:

Nowadays we regard Early Modern Irish as beginning about the end of the twelfth century. This view is based on the fact that from that time on professional poetry has two distinctive features, the first linguistic, the second metrical. On the linguistic side we can observe the poets using, as a literary medium, a standard language which seems to have had as its basis a normative or prescriptive grammar. On the metrical side we have, from about the year 1200 on, a clear-cut distinction between strict versification in syllabic metres, dán díreach, and other types, such as óglachas and brúilingeacht. (…). I have discussed this development in a recent article in Éigse where I have suggested that the final stage in the development was reached in the second half of the twelfth century.

(…)

I have implied that the vernacular showed variation from place to place, and we may be sure that even within any one area it showed variation between speakers of one age-group and another. The master-poets did not balk at these difficulties. What they did, it would appear, was to examine fairly thoroughly the various current forms of speech against the background of the existing literary usage, taking into account both morphology and syntax. If what they observed of the language at that time had been written down and identified according to regions, and if the manuscripts containing their observations had survived the vicissitudes of the intervening centuries, we would have to-day a fascinating and unique collection of descriptive linguistic material. However, what the poets did was to co-ordinate this material to produce a prescriptive grammar. I suggest that the resultant literary language about the year 1200 had the following elements:

  1. A large residuum of the older language surviving in all areas in modern form (i.e. allowing for phonetic changes, etc.),
  2. Variant forms which had been in use in the language for some considerable time and which were retained either generally or regionally in the modern language,
  3. Modern speech-forms in general use, adopted to the exclusion, or near exclusion, of the corresponding old forms where such forms had existed,
  4. Modern speech-forms, possibly in use at local level only, adopted beside surviving old forms,
  5. Modern speech-forms showing variation, possibly reflecting regional differences, adopted to the exclusion of the corresponding old forms where such forms had existed,
  6. An archaic element consisting of forms which, being either obsolete or obsolescent, were not normal in the ordinary language.

This language was a codified standard of late 12th century spoken Gaelic dialects and includes some features long lost in modern Gaelic languages (and as Ó Cuív mentioned, probably not common already in 13th c.): infixed object pronouns, conjugated copula, accusative-of-motion, accusative direct objects, etc. It is well attested in bardic poetry of 13th–17th centuries and its grammar is described in bardic grammatical tracts (some of the most important of those were published by Osborn Bergin as Irish Grammatical Tracts and by Lambert McKenna as Bardic Syntactical Tracts). Morphologically and syntactically this language is closer to Middle Irish than to Modern Irish or Scottish Gaelic.

(It also was not used in prose – it was purely a poetic standard. Even parts of poems written in prose do not adhere to it, and also the aforementioned grammatical tracts do not follow it closely in their main text, they just explain its rules and give examples in verse that do follow the standard closely.)

So giving Geoffrey Keating as an example of a ghc-language author was cherry-picking a prose author from the most recent stage of the 500-year-long period.

It was also claimed in the old discussion that Early Modern words can be added to either Irish or Scottish Gaelic as obsolete or archaic, wherever they are attested. But how exactly do you classify words used by Scottish poets writing a praise-poem for a Connacht king and preserved in an Irish manuscript? Or a Munster poet living in Scotland and writing for Scottish lords? That happened, and both Irish and Scottish poets used the same standard, and as I understand, they were trained in the same bardic schools.

The lack of Classical Gaelic as an acknowledged stage on Wiktionary limits its usefulness to anyone interested in pre-18th century Irish. It also makes us list some nonsensical forms. For example the verb ibh (drink) is the standard bardic verb meaning ‘drink’, but as far as I’m aware it hasn’t been used in spoken Irish for centuries. But Wiktionary lists inflected forms of Irish verbs and thus we list the regular present tense form ibheann in that article – this form is not attested in the Irish historical corpus in texts from 1600–1926 even once. The standard bardic 3rd person sg. of this verb is ibhidh (independent) or ibh (dependent, with ibheann a hypothetical but unattested(?) variant).

We also list adhaigh (night) which, in this nominative form, does not exist anywhere outside of bardic poetry for centuries, having been replaced by dative form oíche, oidhche. And yet not that long ago we listed nonsensical unattested forms *adhaighe and *adhaigheanta as if it were a regular modern variant.

Those terms exist in FGB and Dinneen’s dictionary (which generally are modern Irish dictionaries), but that’s just because they do sometimes list medieval words as literary.

There are also terms used in the grammatical tracts like taoibhréim for ‘genitive’ which are not used or understood today, not even listed in FGB or Dinneen’s – I won’t add them to Wiktionary because I am not sure if I should do that under ga, gd, or nowhere. And whether I should provide inflection for them or not. I did add the classical sense and examples for nar but am not sure if it was the right thing to do, since that also has not been part of the language for centuries.

Thus my proposal is to restore ghc as a separate language code under the header Classical Gaelic. Since the language was a prescriptive literary standard, I wouldn’t consider it the direct ancestor of Irish or Scottish Gaelic but rather an independent well-attested historical stage (that sometimes would be useful in etymology sections though). I don’t see a problem with continuing to include 16th (maybe 15th) century ⁊ later Irish and Scottish Gaelic prose under the modern languages, but I would consider everything from bardic verse and grammatical tracts to be Classical Gaelic instead. If, for a given lemma, usage differed in prose and poetry, usage notes could clarify what was classical and what was early modern.

Alternative solution: if the “crazy number of Irishes” is an issue, we could merge Old Irish (sga), Middle Irish (mga) and add Classical Gaelic to them – especially since the line between Old and Middle Irish often isn’t clear (while there are some unambiguously Old Irish texts, many Middle Irish manuscripts contain Old Irish stories that mix older and newer forms, the two stages also use mostly the same spelling although that changes later due to some MIr. sound changes) – all three under a single heading like Early Irish, Early Gaelic, or similar. Then, how the forms changed could be documented inside that entry, somewhat similarly to what we do with Ancient Greek (which tries to cover over 2 millennia of development). That’s also what the Dictionary of the Irish Language (the historical dictionary for Old/Middle Irish) does, it covers everything from 7th century until 17th century (but does it in a very confusing, hard to use way if you’re not very well familiar with all the sources it cites – we could be better here). // Silmeth @talk 19:32, 4 October 2022 (UTC)[reply]

This proposal seems pretty reasonable to me. I particularly agree with the points about the inflectional morphology of 13th-17th century Classical Gaelic which is severely under-represented on Wiktionary at the moment. It doesn't make much sense to ignore the inflectional morphology of this significant stage in the language as the developments since then have been too extensive to count under 21st century varieties. This includes the nominal morphology with the extra accusative case, dual number, and dative plural endings, and an extensive range of copular and verbal forms with personal endings that are now long gone from any modern variety. It would be hard to imagine either L1 or L2 speakers of today's Gaelic being able to recognise such forms without a significant degree of study. I think making space for Classical Gaelic makes the most sense. It certainly doesn't make any sense to lump it all under Modern Irish when it served as a literary standard for both Ireland and Scotland during the Medieval period.
I would say that if we don't assign a space to this period of Gaelic then we should at least consider a massive upgrade to the inflectional morphology under both Modern Irish and Scottish Gaelic (because why only one and not the other?). One of the challenges with upgrading the forms for the modern varieties would be within the realm of the significant orthographic developments that have occurred since Classical Gaelic. I'm not sure what the best approach would be to achieve this in order to represent a wide range of inflections that no longer occur in either variety, both having their own orthographic standards. It might be easier to achieve with Scottish Gaelic since its orthography is generally more conservative, and closer to that of Classical Gaelic. Though, this ultimately suggests that it would be easier to set a single space for Classical Gaelic with its own orthographic and morphological standards and definitely less work overall while achieving a much better representation of this historical form that is barely being represented at all at the moment.
Currently, the jump between Old/Middle Irish to Modern Irish and Scottish Gaelic is too great a leap to properly show the respective etymologies and inflections. I hope this proposal goes ahead and am excited to see what this could lead to. Erisceres (talk) 21:59, 4 October 2022 (UTC)[reply]
I forgot the pings: @Marcas.oduinn, Mahagaja, Mellohi!, Moilleadóir, Rua, Catsidhe, Embryomystic, Akerbeltz – not sure who else is active in (historical) Gaelic // Silmeth @talk 11:43, 5 October 2022 (UTC)[reply]
I'm still not convinced that Early Modern Irish and Classical Gaelic can't be adequately covered with "ga", "gd", and generous use of the {{lb|ga|archaic}} and {{lb|ga|obsolete}} labels, but whatever. —Mahāgaja · talk 12:10, 5 October 2022 (UTC)[reply]
What about a special label like we do with {{lb|pl|Middle Polish}}? Vininn126 (talk) 12:40, 5 October 2022 (UTC)[reply]
Should we then put them under ga or gd? I guess ga by default cause most of the sources are from Ireland, but then what about those Scottish poets and some Scottish manuscripts? Should we just list basically the same bardic language under two different headers?
What about inflection tables? Should we remove the inflection forms from ibh completely, or replace them with classical ones marking each of them as classical? What with stuff like feic which classically was faic (doesn’t exist right now, its Scottish Gaelic cognate’s entry is there though) and had forms like 1st sg. do-chiú, ad-chiú – should I create the entries faic and faicsin under Irish as “classical/obsolete form/verbal noun of feic” and add obsolete/classical forms to the main modern entry? What about ag derived from OIr. *aicc ‘see’ used in phrases ag so (+ accusative) ‘this is’, ag sin (+ accusative) ‘that is’ – from which modern Irish sin fear, sin é an fear and Sc. Gaelic seo mo mhàthair, etc. phrases come, but have dropped the ag part and were reanalyzed as copular?
Should we list classical pronunciation under modern Irish and Sc. Gaelic headers? That’s important for the classical poetry (syllabification especially differs from modern languages) and often not reflective of modern spelling (although, truth be told, dialectal pronunciations often diverge from the standard spelling too).
I mean, sure we could treat anything 13th century+ as modern Irish (and sometimes Sc. Gaelic) but if we do, I’d like to have some clear policy on it, and one that makes it clear to the reader that a given form is not modern. And as I wrote above, in such a case I’d rather group classical language under one header with Old and Middle Irish.
I know Old Irish scholars often do use the term Modern Irish to refer to anything 13th century+, but it does not make forms like adhaigh, faic, do-chiú, fhiora, caiméal, meic, ibhidh, inéasad, -fuile, do bhádhas, etc. modern (coincidentally, thanks to the choice of imperative as the headword form of verbs, the hypothetical classical and modern-Irish-in-pre-reform orthography headwords often are the same) // Silmeth @talk 13:12, 5 October 2022 (UTC)[reply]
This proposal seems more than reasonable to me. We cannot adequately accommodate Classical Gaelic under gd/ga plus an archaic label because that would simply create a weird overlap with words which are archaic in MODERN ga/gd but aren't as far back as Classical Gaelic. For example gd has the archaic verb fimir 'must' which became archaic/defunct when the dialects around Inverness died off in the last 50-100 years but AFAIK it's not even attested as a form in Classical Gaelic.
I don't think the number of Irishes is crazy, if it comes across as such to some people, it's perhaps merely a reflection of the unusually long history of writing in Ireland which goes back further than other less well documented languages which only joined the writing party centuries later. Akerbeltz (talk) 13:26, 5 October 2022 (UTC)[reply]
  • Support I am convinced that the grammar and vocabulary of Classical Gaelic is distinct enough from today's Irish that they should be treated separately. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 15:38, 5 October 2022 (UTC)[reply]
Support Well, I'm convinced. embryomystic (talk) 20:26, 5 October 2022 (UTC)[reply]
  • Support I am not an expert on the history of the Gaelic languages, but it seems to me that Silmeth who is much more knowledgeable than I am has presented a very well argued case for adopting the use of the code ghc in this way.
I don’t know where the code ghc “Hiberno-Scottish Gaelic” came from. I guess it originated with Ethnologue and mostly from an ill-conceived idea of covering the in-between Gaelic of Rathlin Island. Rathlin Island Gaelic was never large or different enough to merit a language code all of its very own, and sadly it has long since gone as a living language. Hence the code has been abandoned by Ethnologue and marked as “historical” by ISO 639-3. And since then the code ghc seems to have been increasingly “repurposed”, or “more clearly defined” if you like, a lot further back in time. The ISO 639-3 page for ghc links to “ISO 639:ghc” on Wikipedia, which redirects to “Early Modern Irish”. It also links to code hibe1235 on Glottolog, which links to the Wikipedia page “Classical Gaelic”. It seems to me to be a good idea to allow code ghc on Wiktionary and to define it clearly to be the Classical Gaelic of the bardic schools.
At times when we need to force languages into a genetic tree structure, I guess that the Goidelic tree would show pgl as ancestor of sga, which is ancestor of mga, which is immediate ancestor of all of ghc, ga, gd and gv? (Rather than ghc being ancestor of ga, gd and gv.) I guess this will make the descendant “trees” on Wiktionary look a bit odd at times, with ghc forms similar to parent mga forms being shown on the same level as the modern languages ga, gd and gv.
--Caoimhin (talk) 14:31, 6 October 2022 (UTC)[reply]
@Caoimhin: Yeah, that’s how I imagine it, pgl → sga → mga → {ghc, ga, gd, gv}.
Except that the ghc head-words would often look closer to modern Irish and/or Scottish Gaelic forms – because the spelling was changing in late Middle Ages. (But sometimes, well, longer, eg. classical mathghamhain, Irish mathúin, Gaelic mathan) – many (especially later) classical manuscripts use spelling close to pre-reform modern Irish spelling (or rather – pre-1950s Irish used spelling pretty close to classical). Hence many modern normalized editions of classical texts use basically the spelling of Dinneen’s Dictionary, but there are exceptions – notably Eoin Mac Cárthaigh’s modern edition of the “first” or “Introductory” (though it’s really neither) tract, The Art of Bardic Poetry: A New Edition of Irish Grammatical Tracts I, normalized the spelling in much older fashion, to something closer to what the tract itself prescribes (eg. Gáoidheilg instead of Gaoidhilg or Gaoidheilg; brég instead of bréag, a ttigh instead of i dtigh, etc.). I think we should stick to Dinneen’s spelling though, I believe that’s the most common practice nowadays. We can of course list all variant spellings.
The inflected forms, if we list them, would be closer to Middle Irish with modernized spelling.
One thing we’ll need to settle is how we lemmatize verbs:
  • we use imperative for modern Irish, Sc. Gaelic, Manx and 3rd sg. for Old and Middle Irish,
  • DIL uses 3rd person for classical and so does the word-list in aforementioned The Art of Bardic Poetry,
  • léamh.org glossary (mostly based on word list in Aithdioghluim Dána by McKenna) uses 1st person sg., as does Dinneen (for modern pre-reform Irish).
I don’t have a strong preference in any direction here. // Silmeth @talk 15:25, 6 October 2022 (UTC)[reply]
@Caoimhin: also, you made me think about the code. I see Wikipedia has claimed the ghc code is intended for Gáoidhealg Chlasaiceach at least since 2007, and I can’t find anything publicly available on the Internet that would use the name Hiberno-Scottish Gaelic for anything else. But of course your suggestion that the intention could be to represent Rathlin Gaelic (being basically a Scottish Gaelic variety native to Ireland) makes perfect sense! So I wonder if we have here an example of a mistake made on Wikipedia due to misunderstanding forming the reality instead of describing it. Do you know if there are any documents that would explain the original intention of the code and whether they’d be publicly available? I see the SIL ISO-639-3 website doesn’t list anything (except for noting the type change to “historical” in 2019).
I guess Ethnologue ed. 15 would be the place to look (but I can’t access that)? I managed to verify that ed. 12 doesn’t list this code or language under Ireland or United Kingdom. // Silmeth @talk 22:32, 6 October 2022 (UTC)[reply]
Ethnologue ed. 15 (2005), page 565:
Gaelic, Hiberno-Scottish (Gaoidhealg, Hiberno-Scottish Classical Common Gaelic) [ghc] Extinct. Ireland and Scotland. Class: Indo-European, Celtic, Insular, Goidelic. Lg Dev: Roman script. Bible: 1690. Other: Archaic literary language based on 12th century Irish, formerly used by professional classes in Ireland until the 17th century and Scotland until the 18th century. vso.
98.170.164.88 22:39, 6 October 2022 (UTC)[reply]
Ah, so indeed it was intended for the classical language, before Wikipedia. That’ll make me sleep a bit easier! :) Also I had no idea Ethnologue is available in Internet Archive, didn’t cross my mind to look there. Thanks for the quote and link! // Silmeth @talk 23:07, 6 October 2022 (UTC)[reply]
Ah, very good! So I was wrong in guessing that ghc was originally meant to refer to Rathlin Island Gaelic and the like. I was misled a bit by the name “Hiberno-Scottish Gaelic”, and by the fact that the code originated with Ethnologue, which normally only deals with living (or very recently extinct) languages. I suppose the ‘c’ in ‘ghc’ maybe even stood for “classical”. By the way, I notified a few people much more knowledgeable than myself and got replies from John Cowan (linguistics, computing, language codes) and David Stifter (Old Irish, early Celtic). Both said that the proposal sounded reasonable to them in principle, even though this was not their main area of expertise. John Cowan reminded me that it was not just the “Classical Gaelic of the bardic schools” as I wrote but that the grammars were aimed also at the prose-writing needs of lawyers, etc. David Stifter was strongly against any suggestion of combining Old and Middle Irish, with a reminder that Old Irish embedded in Middle Irish manuscripts is still Old Irish, not Middle Irish, and that it was normally very clear which is which. Mark Wringe made the comment that the term “Classical Gaelic” refers solely to a written language, and that the terms “Early Modern Gaelic” or “Early Modern Irish” are used when talking about phonology, language evolution, dialects and suchlike. That is ok, since written language is what Wiktionary primarily deals with at the moment, but I guess it means that we should avoid associating descriptions like “aka Early Modern {Irish / Gaelic}” with the code ghc. // --Caoimhin (talk) 13:07, 7 October 2022 (UTC)[reply]
@Caoimhin:

Mark Wringe made the comment that the term “Classical Gaelic” refers to a written language – but that is ok, since written language is what Wiktionary primarily deals with at the moment.

I would strongly disagree with that. It was a literary language, artificially mixing forms from multiple dialects – yes, but definitely not mainly written. The tracts do focus on pretty minute details of the pronunciation and often mention when something might be spelt differently to how it’s pronounced – and they do emphasize that it is the pronunciation that is important for the poem: rhyming, syllabification, etc., depend on the pronunciation, not on the spelling (and the rules for delenition for example are very strict). We might not know the exact values of the vowels and some consonants (especially how the poets pronounced dh and th – they seem to have merged with gh, sh during Middle Irish, but are still treated differently for the purposes of poems), but the bardic tradition was definitely based on the Gaelic sounds. // Silmeth @talk 16:31, 7 October 2022 (UTC)[reply]
By the way, thank you very much for getting input from all those scholars! I do appreciate it (and didn’t even dare to hope for getting any of them involved in this in any way), it is really great! :) // Silmeth @talk 16:46, 7 October 2022 (UTC)[reply]
@Caoimhin: oh, sorry, I somehow missed the later edits earlier. I totally agree that we should clearly distinguish Early Modern dialects and Classical Gaelic though! And I agree the title I wrote here is bad in this regard.
That’s why I suggested we don’t treat ghc as the ancestor of modern languages, and continue to allow 16th (or 15th) century and later prose to be included in the modern languages, and that whenever we deal with something outside of the standard, we clearly mark it as non-classical (even if described under Classical Gaelic).
For example it’s clear that the preposition do-chum has existed since Old Irish (dochum) continuously up to today (chun), and was often used in the early modern prose, but the grammatical tracts deem it incorrect in verse and AFAIK bardic poetry does avoid its usage (using gus an etc. instead). I think we should include do-chum as a lemma under ghc but mark it as (proscribed) and explain its status in the poetry under Usage notes. I hope this approach is reasonable. // Silmeth @talk 09:19, 10 October 2022 (UTC)[reply]
Support. I'm always of the notion that with proper care, separating L2s out like this will be much more beneficial in the long-run for editors and readers. It looks like with a separate header, Classical Gaelic would have more attentiveness towards it rather than being lumped under (Modern) Irish or Scottish Gaelic and having misaligned inflected forms. AG202 (talk) 05:39, 11 October 2022 (UTC)[reply]

Support. A sound proposal. We should try to reflect historical facts rather than imposing the later divergence on the Classical language. I’d hardly say I was “active in (historical) Gaelic”, but thanks for the ping. ☸ Moilleadóir 05:18, 11 October 2022 (UTC)[reply]

Support. You've made a very good case for this. I look forward to this being created, wiktionary could serve as a brilliant resource for this stage of the language's history. Moling Luachra (talk) 07:21, 12 October 2022 (UTC)[reply]

It’s been over a month and a half with multiple voices of support and no direct opposition (except for one “I'm still not convinced (…) but whatever” voice). So… what happens next? Should I proceed somehow myself? // Silmeth @talk 21:41, 21 November 2022 (UTC)[reply]

Been ~3 months now. Any opinions, suggestions, directions? (Pinging, because I’ve seen ye involved in language treatment changes: @-sche, Mahagaja, Metaknowledge) // Silmeth @talk 15:27, 9 January 2023 (UTC)[reply]
Instead pinging @Benwing, who is more skilled in technical matters and given his participation in the Brythonic dispute. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 02:19, 21 January 2023 (UTC)[reply]
@Benwing2: Wrong account pinged. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 20:53, 21 January 2023 (UTC)[reply]
@Mellohi! I don't think I received your ping and I just noticed this now. What is the request exactly again? Add the code 'ghc' = Classical Gaelic? Can you spell out exactly what needs to be changed in the language data? Apologies, I'm not very familiar with the staging of Irish/Gaelic. Also I'd like to get any comments/thoughts from long-time multi-language editors User:-sche, User:DCDuring and User:Mahagaja to make sure there are no blocking objections. Benwing2 (talk) 07:36, 28 February 2023 (UTC)[reply]
Mahagaja already gave his opinion above: “I'm still not convinced that Early Modern Irish and Classical Gaelic can't be adequately covered with "ga", "gd", and generous use of the {{lb|ga|archaic}} and {{lb|ga|obsolete}} labels, but whatever.”
As for what the proposal is, as I wrote above:

Thus my proposal is to restore ghc as a separate language code under the header Classical Gaelic. Since the language was a prescriptive literary standard, I wouldn’t consider it the direct ancestor of Irish or Scottish Gaelic but rather an independent well-attested historical stage (that sometimes would be useful in etymology sections though). I don’t see a problem with continuing to include 16th (maybe 15th) century ⁊ later Irish and Scottish Gaelic prose under the modern languages, but I would consider everything from bardic verse and grammatical tracts to be Classical Gaelic instead. If, for a given lemma, usage differed in prose and poetry, usage notes could clarify what was classical and what was early modern.

ie. ghc as Classical Gaelic, which would be a separate “leaf node” in Goidelic tree descending from Middle Irish, itself not a direct ancestor of any modern languages, pgl → sga → mga → {ghc, ga, gd, gv}.
(Also not sure what your definition of “long-time multi-language editors” is, but I’ve been editing entries for multiple languages on Wiktionary since 2010 – even if not nearly as actively as some others here). // Silmeth @talk 10:17, 28 February 2023 (UTC)[reply]
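For readers unfamiliar with the staging, the proposed arrangement can be sketched as a simple parent map. This is a hypothetical illustration only: the names `PARENT`, `ancestors` and `is_leaf` are invented here and do not reflect how Wiktionary's language data modules actually store ancestry. It just shows ghc hanging off mga as a leaf, alongside ga, gd and gv.

```python
# Hypothetical sketch of the proposed Goidelic tree:
#   pgl -> sga -> mga -> {ghc, ga, gd, gv}
# This is NOT Wiktionary's real language-data format.
PARENT = {
    "sga": "pgl",  # Old Irish descends from Primitive Irish
    "mga": "sga",  # Middle Irish descends from Old Irish
    "ghc": "mga",  # Classical Gaelic: a leaf under Middle Irish
    "ga": "mga",   # Irish
    "gd": "mga",   # Scottish Gaelic
    "gv": "mga",   # Manx
}


def ancestors(code):
    """Return the chain of ancestors of a code, nearest first."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain


def is_leaf(code):
    """A code is a leaf if no other code lists it as its parent."""
    return code not in PARENT.values()


print(ancestors("ga"))
print(is_leaf("ghc"))
```

Under this arrangement ghc never appears in the ancestor chain of ga, gd or gv, which is the point of treating it as an independent literary standard rather than an intermediate stage.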
I am not aware of any problem that is solved or caused by such a change at this time. I suppose that there is a potential issue with the code being used for another language by those who determine such things, but that could probably be solved easily. DCDuring (talk) 13:35, 28 February 2023 (UTC)[reply]
The code is an official ISO 639-3 code, not a local Wiktionary code, so it's not going to be used for another language. —Mahāgaja · talk 14:02, 28 February 2023 (UTC)[reply]
For my part, I have basically the same opinion as Mahagaja, that this seems unnecessary — it seems like the lect / words can be (sometimes already are) handled under existing codes/L2s — but I'm not going to stand in the way of the numerous people above who support doing it (which is why I didn't bother saying anything before). It's a finite corpus, so it's not the sort of open-ended issue that splitting two questionably distinct living languages would be, and as long as we define a clear cutoff date / criterion as to when something is ghc vs ga/gd, there shouldn't be any more problems with deciding which language a word/text belongs to than we already face with other languages (e.g. if the exact date of a text is uncertain and borderline between Middle and modern English). And I hear the point that we already split a lot of Slavic stages. So, whatever. - -sche (discuss) 00:25, 1 March 2023 (UTC)[reply]
I’ll just reiterate that this is a standardized literary language and we do have a kind of standardized vocabulary too (tracts on declensions and irregular verbs listing lots of paradigms). The morphology is also drastically different from modern Sc. Gaelic and significantly different from any modern Irish dialect or standard. So it’s not an attempt at creating a language stage out of nothingness, the bards of 13th–17th centuries made the whole “defining what is Classical Gaelic” thing for us pretty well already.
Also, regarding the “this seems unnecessary” concern – as someone who’s read a fair bit of 19th and 20th century Irish literature (and Wiktionary was quite useful with that), I almost never use Wiktionary when reading Classical Gaelic, because it’s pretty much useless to me here. I occasionally look at Old Irish lemmata here when I can guess the Old Irish form, but never at modern Irish entries – despite ga and gd supposedly covering the “early modern” period. And whenever I consider adding something classical to Wiktionary, I typically give up as I have no idea where I should include it and how exactly I should handle it (see my comments above about taoibhréim, also adhaigh and nar).
I mostly use eDIL (mostly Old and Middle Irish dictionary, and very difficult to use by an amateur), Léamh.org glossary (website dedicated to Early Mod.Ir., with word-lists scraped from EMI text editions, also not the greatest “dictionary”), and if those fail – Dinneen’s dictionary (pre-reform, early 20th c., modern Irish dictionary that has a lot of vocab used in poetry). But I think Wiktionary could become the most useful dictionary for Classical Gaelic, if it dealt with it in its own right.
I guess a specific clear policy on handling Classical Gaelic and earlier parts of Early Modern (13th–15th c., I’d imagine) stuff instead of treating it as a separate language could work too (but then, since the language was already described by its main users in treatises and already has an official ISO code… why not just treat it as such?) // Silmeth @talk 10:34, 1 March 2023 (UTC)[reply]
I've stayed out of this so far, but I have two opinions on the subject.
1. We know that there were differences between Irish and Scottish as far back as the Book of Deer. I would have thought that Classical Gaelic, describing the uniting literary register which held the Irish and Scottish dialects together until the Gaelic Academic establishment fell apart over the 17C, would be a good catchall for that 1200-c.1650 period where there aren't clear signs of regionalisation. It would also tend to emphasise how extended the Modern Irish period is. By labelling everything after 1200 ga, you're mixing words which can't have been coined or borrowed before the 20th century with words which can't be attested after the 15th century. That seems to me to be a bit more extreme than "archaic". Maybe my expectations are wrong.
2. To a first degree of approximation, ghc would seem to be as good a code as we've got for Early Modern Irish, so maybe it would make sense to have pgl → sga → mga → ghc → {ga, gd, gv}.
But then, what would I know. Catsidhe (verba, facta) 10:43, 1 March 2023 (UTC)[reply]
@Catsidhe: I suggested keeping ghc as the leaf node, and not the ancestor of the other languages, as:
a) I’m not sure how Manx fits into this (I know Christopher Lewin often cites Classical Gaelic forms when discussing Manx etymologies, but he’s not stating directly they derive from Classical Gaelic forms, I take this practice rather as providing a “Pan-Gaelic” cognate form, to make it easier to connect it to the other Goidelic langs; also AFAIK the Classical G. literary tradition has never existed in Man),
b) Classical Gaelic mostly is an artificial prescriptive literary tradition and there are developments happening in the background, so not all modern Irish forms derive from something approved as Classical Gaelic or attested in the early modern period at all, Scottish Gaelic often keeping its own features dating to before Classical: hiatus in motha, the p in piuthar, etc.)…
but I agree that at least pgl → sga → mga → { (ghc → {ga, gd}), gv} in general could work and be helpful in etymologies. // Silmeth @talk 11:04, 1 March 2023 (UTC)[reply]
@Silmethule I see nothing has been done on this. Reading through this, User:-sche and User:Mahagaja seem to be not in favor but everyone else is in favor. Are either of you two's objections blocking? Also, one possibility is introducing an etymology-only code ghc that's listed as ancestral to ga and gd. That is similar to what has been done for Middle Polish (cf. User:Vininn126). Or should we make a full language? Benwing2 (talk) 18:53, 17 August 2023 (UTC)[reply]
I'm not blocking creation of ghc, but I'm not planning to do it myself either. As I understand the discussion above, the supporters specifically don't want ghc listed as an ancestor of the modern languages, because ghc is a purely literary language, and the modern languages descended from the colloquial languages that were contemporary with ghc. —Mahāgaja · talk 19:08, 17 August 2023 (UTC)[reply]
ghc has been created. — SURJECTION / T / C / L / 15:51, 1 February 2024 (UTC)[reply]

Probably some crossover here. And alternative forms like duck-decoy and decoy duck to be made. GreyishWorm (talk) 23:01, 7 October 2022 (UTC)[reply]

Split into black book GreyishWorm (talk) 22:32, 9 October 2022 (UTC)[reply]

It would seem that have no chill is the proper lemma for this verb. (Probably not possible to instead treat no chill as a noun phrase because it's very awkward to define other than as the lack of the definition for chill, which would be SOP.) 86.144.233.189 13:50, 12 October 2022 (UTC)[reply]

Not always "have". "I got no chill" is also heard, for instance. Equinox 19:08, 17 October 2022 (UTC)[reply]

Are these the same? The Kiowa language does not appear to be related to Shoshone, nor does the Wikipedia article on the Kiowa people claim that they are from North Platte, Nebraska. I have a hunch that Kioway is an alt form of something, and this seems like the most obvious answer, but someone should check and try to make sense of this before merging them. 98.170.164.88 07:56, 14 October 2022 (UTC)[reply]

Ah, it's likely that Webster was referring to the North Platte River and not the specific city of North Platte, NE (which is where our entry Kioway currently links). But the rest of the confusion remains. 98.170.164.88 08:00, 14 October 2022 (UTC)[reply]
Shoshonean is an old term for the northern part of the Uto-Aztecan languages, from Shoshone. Many of the names for the Numic languages are only loosely correlated with linguistic reality, so terms like "Shoshone", "Paiute" and "Ute" are kind of hard to pin down without qualifiers. There is a Shoshoni language, but peoples like the Timbisha and the Bannock are also called Shoshone.
Kiowa is part of the Tanoan languages, which may very well be related to Uto-Aztecan as the Aztec–Tanoan languages, but linguists have yet to completely connect the dots. It was speculative in 1913, and it's still not definitively established in 2022. It reminds me of the Achilles and the Tortoise paradox.
It's all part of the confusion that results from early efforts to classify wide-ranging nomadic peoples who have moved into different regions and adopted different cultural patterns and lifestyles. Just as the Comanche were Great Basin Numic people who moved to the Great Plains and adopted a Plains Indian culture, the Kiowa also moved from the pueblos into the Great Plains and adopted a similar culture.
I would make Kioway a simple alternative form of Kiowa and lose the Interesting Facts™ in the definition. Chuck Entz (talk) 16:17, 14 October 2022 (UTC)[reply]

Distinguish slang terms from terms with slang senses

I think it is worth splitting cat:Terms with slang senses by language (and subcategories) out of cat:Slang by language (and subcategories) similarly to how we have cat:Terms with dated senses by language distinct from cat:Dated terms by language and cat:Terms with uncommon senses by language distinct from cat:Uncommon terms by language. I am developing a crossword game that uses Wiktionary data (which I do not wish to link as it is associated with my real-life identity), and it would be useful to me if the categories made this distinction for English in particular. - excarnateSojourner (talk | contrib) 02:47, 25 October 2022 (UTC)[reply]

Examples of English terms that have non-slang primary senses, but are currently in cat:English slang because of less common slang senses: aardvark, absolute zero, AC/DC, acid. - excarnateSojourner (talk | contrib) 02:52, 25 October 2022 (UTC)[reply]

2022 — November

The definitions we give for all three terms are essentially identical, but the forms differ because they are borrowed from different Chinese lects (Mandarin, Cantonese, and Taishanese itself, respectively). Should these use {{alt form}} or {{syn of}}? 98.170.164.88 23:08, 14 November 2022 (UTC)[reply]

Oof, yes; as it stands, the entries make it seem like these refer (respectively) to the inhabitants of three different places. - -sche (discuss) 02:15, 15 November 2022 (UTC)[reply]
Merged into the first form which, per ngrams, is the most common. (For the place rather than the -ese, Taishan is particularly lopsidedly more common than the alternatives.) - -sche (discuss) 07:26, 6 March 2023 (UTC)[reply]

These two are essentially the same phrase sharing the same meaning, with the more common 食花生 being derived from the other. – Wpi31 (talk) 12:00, 18 November 2022 (UTC)[reply]

Requesting to move snowsquall to a space-separated form snow squall. The unspaced form doesn't appear to have been used purposefully or frequently, if at all, in the past or present. It also does not appear to be used by either the US-American NWS or the Canadian MSC, and hasn't appeared in any online news coverage. Bailmoney27 (talk) 19:14, 19 November 2022 (UTC)[reply]

I support this. I've never seen the bunched spelling before and I've been following winter weather for many years. It does seem to be in use, but distinctly less common. Wikipedia's favoring of the bunched spelling seems to be largely a matter of the article having been created early in Wikipedia's lifetime, and with a radar scan from 2004 featuring that bunched spelling. Essentially, we had a model to follow and we stuck to it, but it happens that most people, including the national weather services of both the US and Canada, prefer the two-word form. Soap 21:22, 10 December 2022 (UTC)[reply]
Support. Binarystep (talk) 06:55, 21 December 2022 (UTC)[reply]

Well, I decided to just move the page myself, as it's been up here unopposed for six months, and because I want to fill in the usual "see also" hatnote which would require that both spellings exist. Since this would make a non-admin move impossible, I moved the page before I put in the hatnote. Soap 11:19, 13 April 2023 (UTC)[reply]

(And its sister categories in other languages.) This is currently a subcat of English terms by orthographic property, but this is not an orthographic property. I suggest moving it, but I don't know whither.​—msh210 (talk) 11:30, 21 November 2022 (UTC)[reply]

I'd say CAT:English terms by lexical property. —Mahāgaja · talk 08:37, 22 November 2022 (UTC)[reply]

2022 — December

These are just alternative case forms, but they have slightly different glosses and large translation tables on both pages. —Al-Muqanna المقنع (talk) 21:10, 2 December 2022 (UTC)[reply]

Etymologies 1 and 2 (including translations) should be merged. J3133 (talk) 05:02, 10 December 2022 (UTC)[reply]

Not sure this is a good idea. Etymology 1 is directly imitative; etymology 2 is from the French. — Sgconlaw (talk) 18:55, 5 January 2023 (UTC)[reply]
@Sgconlaw: I meant that “Etymology 1” and “Etymology 2” (but not “Etymology 3”) in one entry should be merged with the respective etymologies in the other entry. J3133 (talk) 19:02, 5 January 2023 (UTC)[reply]
@J3133: mmm, I'm seeing only one etymology section in ha ha, and only two in ha-ha (for English, that is; all the other language sections have only one etymology section as well). — Sgconlaw (talk) 19:14, 5 January 2023 (UTC)[reply]
@Sgconlaw: Sorry, I meant haha (which has three etymologies), not ha ha—fixed. J3133 (talk) 19:17, 5 January 2023 (UTC)[reply]
@J3133: ah, ha ha! I take it you mean that etymology 1 in haha and ha-ha duplicate each other, so one entry should be made the main lemma and the other converted to an alternative form; and likewise for etymology 2 in those entries. — Sgconlaw (talk) 19:19, 5 January 2023 (UTC)[reply]
Yes, that’s correct. J3133 (talk) 19:29, 5 January 2023 (UTC)[reply]

Shouldn't this be in the Reconstruction namespace? Tbh I'm not sure why we need an entry for this at all, even granting that the suffixes -iġ and -eġ are alternative forms of each other. If this specific non-attested form were mentioned in secondary literature then I could see a case for it, but I can't find anything. To be generous, it's plausible that a version with an /e/ vowel existed in Anglo-Saxon speech, if the versions of the suffix were interchangeable. For now at least, I'll just leave this at RFM, but feel free to send this to RFD if desired. 98.170.164.88 10:49, 11 December 2022 (UTC)[reply]

Shouldn't this be in the Reconstruction namespace? Special:PrefixIndex/Reconstruction:Old Persian already includes plenty of other entries for names not directly attested in Old Persian sources, but found in Greek, Elamite, Semitic, etc. 98.170.164.88 11:04, 11 December 2022 (UTC)[reply]

@Skiulinamo. Seems like IP has a point, but I don't know enough about the topic. Thadh (talk) 12:02, 19 December 2022 (UTC)[reply]

Currently this redirects to arsed and, further to the discussion in the Tea Room, I propose that we undo the redirect. After all we aren't currently redirecting can't be fucked or can't be bothered. It seems better to have stub entries for all synonyms of can't be bothered listing them as alternative forms only, with all the synonyms and translations listed on the same page. Though I'm not suggesting creating be arsed and be fucked, we should probably keep be bothered as a translation hub and for the purpose of distinguishing it from the rare word bebothered as we currently do. --Overlordnat1 (talk) 01:49, 14 December 2022 (UTC)[reply]

I agree. I hate these redirects to single words - they rarely make sense without the rest of the term, and they're unintuitive even for experienced users. Theknightwho (talk) 23:39, 4 January 2023 (UTC)[reply]
Agree — Saltmarsh🢃 08:10, 5 January 2023 (UTC)[reply]
Support undoing the redirect. lattermint (talk) 21:56, 15 August 2023 (UTC)[reply]
I'd be happy to redirect can't be bothered to an appropriate sense of bother#Verb.
I don't know whether there are any other uses of fuck to mean "bother", nor of arse with that meaning.
Why wouldn't we RfV arse#Verb "To make, to bother" if the redirect doesn't seem right? If virtually the only usage with the "bother" sense is can't be arsed there is no reason for this not to be a lemma. DCDuring (talk) 23:44, 15 August 2023 (UTC)[reply]
@DCDuring It is possible to use it separately, but it's not common, and I strongly suspect it's a back-formation(?) from can't be arsed. For example, "can you be arsed with this? Me neither." You can do the same thing with fuck, too. Theknightwho (talk) 12:54, 16 August 2023 (UTC)[reply]

The capitalization of these entries is inconsistent, even though they are all coordinate terms for different views on the same issue. Note that Miaphysite and dyophysite don't (currently) exist, while both capitalizations of monophysite do. Also, some of these have adjective senses and some don't. Not technically a request for a move, merger, or split, but it's a similar issue to what often comes up here, so this seemed like a fitting venue. 70.172.194.25 11:37, 19 December 2022 (UTC)[reply]

I agree, they should have the same capitalisation for the main lemmas, and lower-case makes most sense IMO. —Al-Muqanna المقنع (talk) 03:48, 27 February 2023 (UTC)[reply]

It seems the two terms are sometimes (erroneously?) used interchangeably. But maybe not. Flackofnubs (talk) 10:39, 25 December 2022 (UTC)[reply]

Each of these entries contains overlapping definitions (and similar etymologies), and they should probably be merged into a single entry as they seem to be alternative spellings of the same term. OED2 has sirkar and circar. Einstein2 (talk) 14:20, 26 December 2022 (UTC)[reply]

@Einstein2 as a speaker of Indian English, I confirm that sirkar, sircar, sarkar, circar are all alternative romanisations of Hindi/Urdu/Persian sarkār. I'll make sarkar as the main entry if there are no objections, as virtually all modern romanisation systems of Hindi/Urdu/Persian provide sarkār (formal) or sarkar (informal). Sbb1413 (he) (talkcontribs) 04:23, 6 January 2024 (UTC)[reply]

Both etymologies give "stupid person" as a definition. Perhaps some etymology can be merged, or moved to Dumbo, or just mentioned at Wikipedia. Flackofnubs (talk) 22:06, 27 December 2022 (UTC)[reply]

2023 — January

The entry one fell swoop is lemmatized at the noun phrase. one foul swoop redirects to that. Meanwhile, the prepositional phrase in one foul swoop has its own separate entry. I think the latter should drop the "in" for consistency. Perhaps it could even be given as an {{alt form}} or {{syn of}} the main entry, but I'm not sure. 70.172.194.25 08:22, 5 January 2023 (UTC)[reply]

IMHO, at the very least, one foul swoop needs explanation and therefore needs a full entry. Also, it has a distinct pronunciation and fell and foul are not close cognates, so they don't seem to be alternative forms of one another. One foul swoop seems to refer to (be derived from) one fell swoop. If one foul swoop gets the main entry I think it deserves, then in one foul swoop should redirect thereto. DCDuring (talk) 16:59, 5 January 2023 (UTC)[reply]
I agree that one foul swoop deserves a separate entry and that in one foul swoop should redirect thither. —Mahāgaja · talk 11:31, 6 January 2023 (UTC)[reply]
Agreed. As in one fell swoop redirects to one fell swoop, redirecting in one foul swoop to one foul swoop would seemingly be the only logical and consistent course of action. --Overlordnat1 (talk) 11:43, 6 January 2023 (UTC)[reply]

parlez vous, parleyvoo, and parley-vous are all treated as separate words

parlez vous, parleyvoo, and parley-vous, whilst having the exact same meanings and roughly the same pronunciation, all have their own pages and the others are listed as synonyms. Two have the meaning of “a Frenchman”, one has “the French language” and all of them have “to speak a foreign language, especially French”. Are these not all the same word, with differing spellings? -CanadianRosbif (talk) 10:37, 7 January 2023 (UTC)[reply]

We should probably merge them into parlez vous but list the other two spellings as alternative forms. There is also the song 'Mademoiselle from Armentieres' aka 'Hinky Dinky Parley Voo'[19] which has the form parley voo, the spaced version of parleyvoo, though I don't think this bawdy WW1 song would be a good example to include in our entry, fun though it is, as it's not clear what the final refrain of parley voo at the end of each line is actually supposed to mean. There is also a version that appears in the final credits of Peter Jackson's film They Shall Not Grow Old which can be found on YouTube and which is where I first came across the song. --Overlordnat1 (talk) 02:21, 8 January 2023 (UTC)[reply]

The senses and translations should probably be listed under one page, with the other being listed as an alternative form (adjective sense only for fucked up). I'd personally prefer the more common one be listed as the main lemma, but I'm open to suggestions otherwise. AG202 (talk) 11:29, 9 January 2023 (UTC)[reply]

Yeah, they're clearly the same expression in two different spellings. The hyphen doesn't even change the meaning since it's not an attributive noun; I think this is just a matter of people's spelling preferences. I made the hyphenated spelling an alternate of the spaced spelling and will merge the translations later. Soap 07:53, 12 July 2023 (UTC)[reply]

The correct form is “Etóña”, with an acute diacritic, as written on Wikipedia. 100.undentifieduser (talk) 20:05, 18 January 2023 (UTC)[reply]

Currently, {{lb|de|EU politics}} categorizes as Category:European politics and the lede in Category:European politics says "terms related to politics of the European Union." I don't dispute that this ridiculous misnomer is widespread but we don't do ourselves any favors by leaning into it. I propose that we repurpose Category:European politics and make it the category of all European (i.e. taking place in or relating to the continent of Europe) politics categories and entries, not just those related to the politics of the European Union. Entries and categories pertaining to EU politics should instead be part of Category:EU politics which itself should be a subcategory of Category:European politics. — Fytcha T | L | C 08:22, 19 January 2023 (UTC)[reply]

Support. That makes sense, especially since CAT:Swiss politics is currently a subcategory of CAT:European politics even though Switzerland isn't in the EU yet. —Mahāgaja · talk 08:37, 19 January 2023 (UTC)[reply]
Support. It wouldn't make sense to have Category:en:UK politics moved to Category:EU politics post-Brexocalypse but it would make sense to have both of these as subcategories of Category:European politics. --Overlordnat1 (talk) 10:19, 19 January 2023 (UTC)[reply]
Support. — Fenakhay (حيطي · مساهماتي) 10:24, 19 January 2023 (UTC)
Support Vininn126 (talk) 10:40, 19 January 2023 (UTC)
Support J3133 (talk) 12:31, 19 January 2023 (UTC)
Support although maybe it should be called 'European Union politics' as we tend to avoid abbreviations in category names. Benwing2 (talk) 05:07, 27 January 2023 (UTC)
Support Prefer Benwing's variant. As a matter of curiosity, would the current unpleasantness in Ukraine belong in the repurposed Category:European politics? DCDuring (talk) 14:34, 27 January 2023 (UTC)
Support But the EU politics version. It's consistent with Category:US politics, which does use an abbreviation. Theknightwho (talk) 16:28, 27 January 2023 (UTC)
@Theknightwho This is true but at the same time we have CAT:New Zealand politics not #CAT:NZ politics. In general I actually think we should replace 'Fooan politics' with 'Politics of Foo'; this is in keeping with CAT:History of the United States, CAT:Languages of the United States, CAT:Political subdivisions of the United States, etc. Besides the politics categories, there are no categories that abbreviate US or UK except for a few odd stragglers (e.g. Category:Upper Midwest US English), while there are hundreds of categories that spell out 'United States'. Similarly, we already have CAT:European Union (not #CAT:EU). Benwing2 (talk) 06:50, 28 January 2023 (UTC)
Fair point. We should probably change "US politics" to "United States politics" and "UK politics" to "United Kingdom politics", in that case. Best to be consistent with country/supranational entity names. Theknightwho (talk) 14:00, 12 July 2023 (UTC)[reply]
Support and prefer Benwing's variant. — excarnateSojourner (talk · contrib) 05:51, 30 January 2023 (UTC)[reply]
Support, and apparently both EU and UK need to be spelt out for consistency, but this is a secondary issue. Fay Freak (talk) 22:26, 4 March 2023 (UTC)[reply]

2023 — February

Personally, I'm from the US and I've only ever seen/heard "pompom". Ultimateria (talk) 18:26, 1 February 2023 (UTC)[reply]

Merge into Reconstruction:Proto-Indo-European/(s)mel-. Most modern sources agree these are part of one and the same root. The only descendant that (traditionally) requires PIE *a is Latin malus, which fits semantically better with the gloss at *(s)mel- anyway. In fact it is unnecessary to reconstruct *a at all, in light of *mo > *ma unrounding in an open syllable with coda resonant (see de Vaan:2011 p. 8: 7.1; p. 360), the same process that resulted in mare (sea) < *móri. In any case the reconstruction of the vowel is irrelevant to whether the Latin, Slavic and Germanic words are cognate, despite the last sentence of the Latin etymology 1 described at malus. — 69.121.86.13 19:31, 3 February 2023 (UTC)[reply]

No idea why both templates exist when the only difference is the upper- or lower-case N/n. --Liuxinyu970226 (talk) 08:16, 5 February 2023 (UTC)

I agree, and have raised this issue before. I think they should be merged. @Erutuon? — Sgconlaw (talk) 05:15, 8 February 2023 (UTC)[reply]

Church Slavonic and Moravian

Technically, Old Church Slavonic and Church Slavonic should be two separate languages (?), but we only have the former, probably because of the small number of editors. These languages are always treated as two different languages in etymology. For now, in etymologies and on Proto-Slavic pages (*viňaga), we trick it as Church Slavonic: {{l|cu|асдф}} or Church Slavonic: {{desc|cu|асдф|nolb=1}}. That is not very convenient; we should have a separate etycode for Church Slavonic.

We should also have an etycode for Czech Moravian, which is also pretty often used in Proto-Slavic pages (and many etym dictionaries); Serbo-Croatian has templates like that (ckm, sh-kaj, sh-tor). Sławobóg (talk) 12:53, 5 February 2023 (UTC)

@Павло Сарт, Atitarev, Kamen Ugalj, Skiulinamo, Rua, ZomBear, Bezimenen, IYI681, Vininn126 pinging some people that might be interested. Thadh (talk) 13:03, 5 February 2023 (UTC)[reply]
Support @Sławobóg I completely agree with you, we need a separate etymological code for the usual Church Slavonic language. I have constantly wondered why it is not there. --ZomBear (talk) 19:32, 5 February 2023 (UTC)
Support for Church Slavonic Безименен (talk) 13:45, 7 February 2023 (UTC)[reply]
Oppose for Czech Moravian: there would be 20-30 more regional varieties that could spring up if one started Balkanizing Slavic languages + I don't want to give food for thought to Z-Russians. There are already talks for forging Novorussian, Transnistrian, or Lipovan Russian in order to justify their expansive aspirations over former Imperial Russian territories. Безименен (talk) 13:45, 7 February 2023 (UTC)

I also propose to do away with similar problems in the tree of Slavic languages once and for all. I suggest:

  • South Slavic:
1. Add etymological code for Old Serbo-Croatian (zls-osh). With a redirect to modern Serbo-Croatian. It occurs regularly in {{R:sla:ESSJa}}.
2. Add etymological code for Old Slovene (zls-osl). With a redirect to modern Slovene. It occurs regularly in {{R:sla:ESSJa}}.
3. Move the Macedonian language to the descendant of Old Church Slavonic, as it was done some time ago with the Bulgarian language.
4. Add etymological code for Church Slavonic (cu-chu). Perhaps even with a division into Russian Church Slavonic (cu-rcu), Serbian Church Slavonic (cu-scu) and others, if any.
  • West Slavic:
1. Add etymological code for Middle Polish (zlw-mpl). With a redirect to modern Polish or (?). @KamiruPL, Vininn126
2. Add etymological code for Old Slovak (zlw-osk). With a redirect to modern Slovak. It is high time to do it! It occurs regularly in {{R:sla:ESSJa}}. Especially if even Early Modern Czech (cs-ear) was awarded a separate code.
3. Possibly add a family code for the Czech–Slovak languages (zlw-csk)? Just as there is one for Lechitic (zlw-lch).
4. Possibly add etymological code for "Old Sorbian" (see Wendish/Lusatian ?) (zlw-osb)? Perhaps with a redirect to Upper Sorbian or (?).
  • East Slavic:
1. Rename the etymological codes Old Ukrainian (zle-ouk) & Old Belarusian (zle-obe) → Middle Ukrainian (zle-muk) & Middle Belarusian (zle-mbe), respectively. Another user made a similar request about six months ago (Wiktionary:Beer parlour/2022/September#“Old Ruthenian” language). The reasoning is that the "Old" stages of those languages are "parts" of Old East Slavic until the 14th c. (as indicated on English Wikipedia).
2. Probably it is worth removing the Old Novgorod from the descendants of the Old East Slavic. Make it a separate and parallel ancient language in the East Slavic subgroup. --ZomBear (talk) 19:32, 5 February 2023 (UTC)[reply]
3. Add etymological code for Pannonian Rusyn with a redirect to Rusyn (rue).
  • PS: LOL, I'm serious, add an etymological code for "Early Proto-Slavic" (sla-ear) (?) with a redirect to Proto-Balto-Slavic (?). Because Wiktionary "for the standard" uses a rather late version of the Proto-Slavic language. And sometimes in the Etymology section it may be necessary to indicate an earlier form, and the presence of a separate etym-code for "Early PSl." would not be superfluous. --ZomBear (talk) 19:50, 5 February 2023 (UTC)[reply]
I don't think any "Old Sorbian" is attested. Both Upper Sorbian and Lower Sorbian are attested only from the 16th century, and they were already distinct at that point. In theory there could be a code for Proto-Sorbian, but it would have to be a full-fledged protolanguage, not an etymology-only language. —Mahāgaja · talk 20:17, 5 February 2023 (UTC)[reply]
@Mahagaja Yeah, I'm not sure about "Old Sorbian" either. This suggestion is only a possibility. I relied on the fact that in {{R:sla:ESSJa}} sometimes there are words with abbreviations "ст.-луж."/"др.-серболуж." ("старолужицкий"/"древнесерболужицкий" = translation "Old Sorbian") without specifying where the word belongs - to the Upper or Lower Sorbian language. --ZomBear (talk) 21:09, 5 February 2023 (UTC)
@ZomBear: I agree with most of your suggestions, except for Old Serbo-Croatian and Old Sorbian. Serbs and Croats never had an organized shared language until the 17th-18th century. One could perhaps talk about an Old Serbo-Croatian stage in the development of the Dinaric Slavic complex, but there never was a common language that could be associated with this period (leaving aside the Bosno-Rascian recension of Church Slavonic or Glagolitic Croatian). The same holds in even greater magnitude for Sorbian. Sorbs may self-identify as one people ethnically, but linguistically their languages are noticeably divergent.
PS I also don't see much educational value in copying all the distinctions that you can find in ESSJa. Note that it often gives old spellings that precede various spelling reforms, dialectal forms which don't follow any orthographic standard, and morphological variants (like diminutive forms, etc.) which don't contribute much additional insight; it also provides local colloquial meanings which are clearly recent innovations, etc. I personally prefer a more concise and economical presentation for reconstructed terms rather than having 10-15 dialectal spellings of Serbo-Croatian or those monstrosities that are given as dialectal variants of Polish/Bulgarian/Slovenian by ESSJa. In my opinion, such information should go on the respective page of the daughter language, rather than bloating the Proto-Slavic Descendants section.
PS2 Early Proto-Slavic is a useful designation; however, I don't know exactly where one should draw the border between Early, Middle and Late Proto-Slavic and what notation should be applied. Безименен (talk) 13:30, 7 February 2023 (UTC)
As it stands, Middle Polish is listed as a variant of Modern Polish. We do see some significant phonological changes and a few semantic ones as well, however, it's hard to say whether it should have its own code or not. Even if it did, it would certainly be a redirect to Modern Polish, seeing as it's a period of only about 1250 years. (1500-1750). Vininn126 (talk) 13:36, 7 February 2023 (UTC)[reply]
@Vininn126: That's 250 years. —Mahāgaja · talk 15:16, 7 February 2023 (UTC)[reply]
The one and the two are right next to each other.

The prefix is from Glottolog, which is a proper noun. The capital G should be included in the article's name.

18:13, 15 February 2023 (UTC)

English. As the entry says "capitalization varies". I see no compelling reason that this shouldn't be a noun sense at boot with "always with 'the'", or something of the sort. Chuck Entz (talk) 23:43, 18 February 2023 (UTC)[reply]

(BTW the Saatse Boot is also referred to as "the Boot".) - -sche (discuss) 23:21, 20 February 2023 (UTC)[reply]
The Boot meaning the Saatse Boot should be somewhere uppercase, I think, whether Boot or the Boot or The Boot I'm not entirely sure, because it functions as a proper noun place name. I'm not familiar with how the (b/B)oot meaning Louisiana is used; in the one cite in the entry, or others I can imagine like referring to LA as America's boot, it seems like a metaphorical general sense for something or somewhere boot-shaped. So it may be an RFV question, does Saatse-style use as a proper noun place name exist (for either place ... I can't actually find the Saatse one in books, either, only online). - -sche (discuss) 17:51, 24 April 2023 (UTC)

:-), :-( → :), :(

The forms with noses are pretty dated at this point and not in widespread use. I think it'd be better if the noseless forms were the main entries, perhaps with a note on the older forms indicating that they were used first. Binarystep (talk) 20:24, 25 February 2023 (UTC)[reply]

2023 — March

Polish Silesian and Silesian

@Shumkichi @KamiruPL The Cieszyn Silesia Polish category has many terms that should probably be moved to Silesian proper. Can we figure out which ones we need to fix? Vininn126 (talk) 12:29, 8 March 2023 (UTC)[reply]

Also maybe @Hythonia, @Sławobóg Vininn126 (talk) 12:30, 8 March 2023 (UTC)[reply]
Idk where Silesian proper starts and Silesian Polish ends so I don't think I'll be of much help o_ _ _ _ _ _ _ _ _ _ _ _ O Maybe let's just assume they'd all be used in Silesian anyway, and then we can add Polish headers to the few entries that can be considered dialectal Polish after we find some sources later??? Shumkichi (talk) 13:33, 8 March 2023 (UTC)[reply]
@Vininn126, Shumkichi Not to throw a monkey wrench into this discussion but ... I read the Wikipedia article on Silesian and it seems there's debate over whether it's a separate language as well as a not-yet-established writing system. Given this, I wonder if it wouldn't be better to unify Silesian and Polish similarly to the way that all Chinese lects as well as Serbo-Croatian are unified. The motivation here is practical: it's significantly more difficult to implement and maintain all the infrastructure for two separate L2's vs. one unified L2, and the minority status of Silesian means it's likely to not get much love as a separate L2 (compare the situation with Jeju vs. Korean and Scots vs. English). Benwing2 (talk) 06:19, 16 March 2023 (UTC)[reply]
@Benwing2 I've actually been trying to do some research on this. One problem with that system is the politics involved - there is a considerable Silesian group that considers it separate. I've also been trying to do some research on the pronunciation, but there are some major differences that point to Silesian having come from an older variant of Polish, as opposed to a modern one. And as to the orthography, recently, Ślabikorz śląski was introduced and has been fairly widely adopted, even silling.org has a normalizer - I've included all of this in WT:About Silesian, and I would actually like to go through all the entries and do a major cleanup. I've even been trying to set up other infrastructure. Vininn126 (talk) 09:59, 16 March 2023 (UTC)
As to the fact of it coming from an older variant - there are significant sound differences, such as maintaining distinctions from previous long vowels, having more of a 7 vowel system like in Italian, and some significant grammatical differences like continuing the old aorist in a past tense system that's completely different. Vininn126 (talk) 10:22, 16 March 2023 (UTC)[reply]
@Vininn126 I think it's a mistake to conflate whether language A and B are different languages with whether they need separate L2's in Wiktionary. IMO the latter question should be determined by what makes for less work and duplication. If the majority of terms in Silesian are the same as in Polish (which I suspect they are), it might make sense to unify them. The current set of lemmas is non-representative in that it mostly covers lemmas that are different in Silesian. Benwing2 (talk) 15:25, 16 March 2023 (UTC)[reply]
@Benwing2 In order to determine that we need more data, and currently there aren't any major Silesian dictionaries aside from Silling, which is relatively new, and it's currently doing a massive import of words. Currently they are importing a Polish-Silesian dictionary, so based on that alone it would suggest a lot of sharing. However, further work needs to be done to determine how different they really are. As someone who works with it more, I'd say it's not any more different than some of the differences between other Slavic languages, which are remarkably similar. Vininn126 (talk) 15:34, 16 March 2023 (UTC)
@Vininn126: Makes sense, thanks. Benwing2 (talk) 15:42, 16 March 2023 (UTC)[reply]
@Benwing2 And I think you didn't understand his point. Silesian is not a dialect of Polish since it doesn't come from modern Polish - they both come from Middle Polish (or you could call it Middle Silesian, it doesn't matter, it's just that Polish's always had more speakers, hence the privileged position of Polish over other dialects). That's why your comparison to Serbo-Croatian makes no sense since S-C. is a single language with most of its officially recognised "varieties" not even being different dialects nor even subdialects but simple local variants with at most a few different words, lol. Silesian and Polish, on the other hand, are full of seemingly small but SYSTEMATIC differences that all add up to them being sufficiently different (more so than e.g. Czech and Slovak, I'd say). And the important thing is that they differ not only in vocabulary but also in syntax.
"If the majority of terms in Silesian are the same as in Polish (which I suspect they are)" - no, they are not the same, and your suspicion is wrong. It's as if you looked at the spelling of some Kashubian words and compared them to their Polish cognates - yes, their orthographies are quite similar but it's just a superficial similarity. Shumkichi (talk) 20:17, 16 March 2023 (UTC)
@Shumkichi Don't get all worked up over this. You didn't even read the first line of my comment: "I think it's a mistake to conflate whether language A and B are different languages with whether they need separate L2's in Wiktionary." Benwing2 (talk) 20:33, 16 March 2023 (UTC)[reply]
@Benwing2 I'm not worked up??? And I did read it, that's why I said the orthographies are different, and that's enough NOT to merge Silesian entries with Polish ones. Polish has an official body that regulates its orthography so it can't use two different spelling norms that also differ in pronunciation. Capisci? Shumkichi (talk) 20:55, 16 March 2023 (UTC)[reply]
Also, according to your argument, we should merge Czech and Slovak. But KKK, as they say in Polent. Shumkichi (talk) 20:56, 16 March 2023 (UTC)[reply]
Alright, let's cool it here. It seems like Silesian is here to stay at least for the time being. Vininn126 (talk) 21:17, 16 March 2023 (UTC)[reply]

We have two different entries for the same thing, while links generated with {{m}} or {{l}} like *vьśegъda link to the latter (vьsegъda), as they seem to ignore ś in Proto-Slavic reconstructions, which IMO is unexpected. This has led to the former (vьśegъda) being ignored and forgotten recently. I guess both entries should be merged and the language modules should be tweaked to make Proto-Slavic stuff ś-aware? // Silmeth @talk 12:28, 15 March 2023 (UTC)

@Silmethule Converting ś to s seems intentional, and asserts that there's no separate ś phoneme in Proto-Slavic. Reconstructing ś seems ahistorical to me; it's rather that the third (and second ...) palatalizations occurred post-Proto-Slavic. Benwing2 (talk) 06:27, 16 March 2023 (UTC)[reply]
@Benwing2: but it has different reflexes in different branches. So, either those palatalizations happened post-Proto-Slavic and *ś is a valid dia-phoneme projected back and reconstructing *s in those places for Proto-Slavic is wrong, or it was an actual Proto-Slavic phoneme with some value separate from both *s and *š that merged with those at a later stage – in which case we’re justified to reconstruct *ś and *s is wrong. In either case, unless we undo all progressive and 2nd regressive palatalizations of *x (and all the other sounds? there are traces of non-palatalization in *otьcь in the east too), we need to treat *ś as a (dia)phoneme of its own and *s is wrong. Also WT:About Proto-Slavic seems to treat *ś as a separate phoneme (and even ascribes a specific IPA value to it). // Silmeth @talk 10:00, 16 March 2023 (UTC)
@Silmethule What do the primary sources say? Benwing2 (talk) 15:10, 16 March 2023 (UTC)[reply]
@Benwing2: What primary sources? Proto-Slavic is a reconstructed, not directly attested, language.
If you mean etymological dictionaries and historical linguistic papers – depends, you get all sorts of things (*vьšь in Polish dictionaries, *vьsь in some southern ones, non-palatalized *vьxъ in Vasmer, etc.) – although in general progressive and 2nd regressive palatalizations are commonly marked. But *x is problematic as it has different reflexes in the west vs south+east; hence Derksen’s notation with *ś, as he puts it:

The introduction of *ś, on the other hand, could not be avoided, cf. *vьśь ‘all’ vs. *vьsь ‘village’

// Silmeth @talk 17:04, 16 March 2023 (UTC)[reply]
@Silmethule We need some other people to weigh in. The current situation with no ś was done intentionally so we shouldn't change it willy-nilly. Benwing2 (talk) 20:35, 16 March 2023 (UTC)[reply]
OK. I’ll leave some pings then: @Fay Freak, Ivan Štambuk, Sławobóg, Thadh, Useigor, Vorziblix, ZomBear. // Silmeth @talk 20:54, 16 March 2023 (UTC)[reply]
I already agreed that we should use ś. Third palatalisation is only absent in Old Novgorodian and most of our entries already do apply the sound law to stops, so I don't see why we should treat the sibilant any differently. Thadh (talk) 22:23, 16 March 2023 (UTC)[reply]
I agree that *ś should be used. Make the main reconstruction - *vьśegъda, and the form *vьsegъda (maybe?) as a redirect. ZomBear (talk) 06:43, 17 March 2023 (UTC)[reply]

Renaming Proto-Mon-Khmer to Proto-Austroasiatic

Proto-Mon-Khmer is deprecated. The name of Category:Proto-Mon-Khmer language needs to be changed to Category:Proto-Austroasiatic language, just like how we have Category:Proto-Sino-Tibetan language rather than Category:Proto-Tibeto-Burman language. See the Wikipedia article on Austroasiatic languages to get an idea of why Mon-Khmer is no longer valid, because Munda and Nicobarese are simply regular branches that are sisters of the other so-called Mon-Khmer languages.

The page names can simply be renamed, and the lemmas do not need to be changed. Category:Proto-Sino-Tibetan language is a perfect example of this. The Proto-Sino-Tibetan lemmas are actually all Proto-Tibeto-Burman reconstructed forms by James A. Matisoff, who considers Tibeto-Burman to be a branch of Sino-Tibetan. Now, more scholars are thinking that Chinese is simply another regular sister branch of the various Sino-Tibetan languages out there, rather than its own special branch. The same goes for Mon-Khmer.

So how can this name change be done? Ngôn Ngữ Học (talk) 22:23, 18 March 2023 (UTC)[reply]

Formerly:

  • Austroasiatic
    • Munda
    • Mon-Khmer (which Shorto reconstructed)
      • (about a dozen branches)

Now the consensus is that the tree has a rake-like structure (per Sidwell):

  • Austroasiatic
    • (about a dozen branches including Munda)

That's why Mon-Khmer is an obsolete term now.

Similarly, with Sino-Tibetan, it formerly was:

  • Sino-Tibetan
    • Chinese
    • Tibeto-Burman (which Matisoff reconstructed)
      • (dozens of branches)

Now the consensus among many scholars is that the tree has a rake-like structure with many "fallen leaves" (quoting George van Driem), making Tibeto-Burman obsolete:

  • Sino-Tibetan
    • (dozens of branches including Chinese)

Ngôn Ngữ Học (talk) 22:27, 18 March 2023 (UTC)[reply]

Support. If this change happens we should delete Category:Mon-Khmer languages. Benwing2 (talk) 23:41, 18 March 2023 (UTC)[reply]
Abstain. I prefer to wait until an actual new reconstruction of Proto-Austroasiatic is published to do the move (see what I wrote at Wiktionary:About Proto-Mon-Khmer), but I do not actually oppose moving now. However, if the move does happen, I would like to see a line like "This reconstruction is from Shorto (2006) for the obsolete concept of Proto-Mon-Khmer, and should not be treated as actual reconstruction of Proto-Austroasiatic, which as of now has not yet fully materialized, and is simply "placeholder" for the actual Austroasiatic etymologies" (probably as a template) added as a warning to every reconstruction item. I very much want the same thing to happen to "Proto-Sino-Tibetan", considering a lot of them are nowhere near actual Proto-Sino-Tibetan, and the reconstruction items themselves are "icky" to say the least. PhanAnh123 (talk) 01:52, 19 March 2023 (UTC)
@PhanAnh123: Take a look at Sidwell's Proto-Austroasiatic reconstruction and Shorto's Proto-Mon-Khmer reconstruction. Sidwell's inclusion of Munda and Nicobarese had virtually no impact on his Proto-Austroasiatic reconstruction (versus if he had only included the "Mon-Khmer" languages) because he considered Munda to be highly innovative and restructured, with few original retentions from Proto-Austroasiatic. Furthermore, it would be very confusing to have duplicates for both Proto-Austroasiatic and Proto-Mon-Khmer. I would just merge them as Proto-Austroasiatic. Ngôn Ngữ Học (talk) 19:25, 19 March 2023 (UTC)[reply]
I have no intention of keeping Proto-Austroasiatic and Proto-Mon-Khmer separated (I consider Proto-Mon-Khmer to be likely a ghost after all); what I mean is that we should either keep the entries as they are until an actual Proto-Austroasiatic reconstruction comes about, or move the "Proto-Mon-Khmer" items to Proto-Austroasiatic but with the warning added. I know what you mean by "inclusion of Munda and Nicobarese had virtually no impact", because like Sidwell, I do think these branches are quite innovative; however, that does not mean I agree to move Shorto's Proto-Mon-Khmer reconstruction to Proto-Austroasiatic without any warning, since Austroasiatic linguistics has progressed quite a lot even outside of those two branches. The vocalism in Shorto (2006) was reconstructed in a very rudimentary way, which the reconstruction of the descendant branches as well as the recent "sneak peek" at the Proto-Austroasiatic reconstruction by Sidwell improved upon; furthermore, the syllable structure itself has also slightly changed: it is now thought that a glottal stop was phonetically present in any Proto-Austroasiatic word that ended in a pure vowel (meaning any word ending in *aːj would still have *aːj, but those ending in **aː would automatically become *aːʔ), plus there is the status of *ʄ- that very much awaits assessment in the actual reconstruction of Proto-Austroasiatic. Like I said, I don't oppose moving, but there must be strings attached. PhanAnh123 (talk) 01:53, 20 March 2023 (UTC)
@PhanAnh123, Ngôn Ngữ Học Such a warning can be added by bot to the top of all entries if both of you agree. Benwing2 (talk) 03:30, 20 March 2023 (UTC)[reply]
@Benwing2: Agree, a warning placed by a bot should be sufficient. Also @PhanAnh123, we can use Sidwell & Rau (2015) for some of the basic Swadesh list words, but a full reconstruction of Proto-Austroasiatic is currently being done by Sidwell. It should come out in a few years. Ngôn Ngữ Học (talk) 10:19, 20 March 2023 (UTC)[reply]
We are all in agreement then, so obviously now I support moving. With this Munda cognates can be directly added to the entries. PhanAnh123 (talk) 10:29, 20 March 2023 (UTC)[reply]
Agree on the support.
Abstain Support. I've seen assertions that Mon and Khmer actually form a subgroup within the traditional Mon-Khmer grouping. Of course, it could be something messy as with Indo-European, where we have at least Indo-Iranian and Balto-Slavonic. --RichardW57m (talk) 16:19, 21 March 2023 (UTC)[reply]
There is no such thing as a Mon+Khmer grouping within Mon-Khmer. Some classifications propose Eastern, Southern, and Northern groupings within Mon-Khmer, but none of them put Monic and Khmeric together. Please consult the Austroasiatic languages article on Wikipedia to get a basic refresher of all the major previous classifications. Ngôn Ngữ Học (talk) 15:04, 23 March 2023 (UTC)
The cited articles do show that their crown group is larger than Monic + Khmeric, but it does look as though we don't need to worry about anyone using 'Mon-Khmer' to denote their (weak) association. --RichardW57m (talk) 11:36, 28 March 2023 (UTC)[reply]

Renaming Proto-Hmong to Proto-Hmongic

  1. Category:Proto-Hmong language needs to be changed to Category:Proto-Hmongic language. See Hmongic languages and Hmong language on Wikipedia.
  2. Category:Proto-Mien language needs to be changed to Category:Proto-Mienic language. See Mienic languages and Iu Mien language on Wikipedia.

The Hmong-Mien language tree is like this:

  • Hmong-Mien
    • Hmongic
      • Hmong
      • (dozens of languages)
    • Mienic
      • Iu Mien
      • (several languages)

Proto-Hmong thus refers only to Hmong, not Hmongic. There are dozens of Hmongic languages that are not Hmong. They include Hmu, Pa Hng, Bunu, She, and others.

Same goes for Proto-Mienic. Proto-Mien technically refers to Proto-Iu Mien, but does not include Kim Mun, Biao Min, and Dzao Min.

Ngôn Ngữ Học (talk) 22:23, 18 March 2023 (UTC)[reply]

Support. If we make this change we also need to rename the families, i.e. Category:Hmong languages -> Category:Hmongic languages and Category:Mien languages -> Category:Mienic languages. This is similar to the change from Category:Korean languages -> Category:Koreanic languages, which was implemented in Jan 2022. Benwing2 (talk) 23:45, 18 March 2023 (UTC)[reply]
Support. Theknightwho (talk) 17:57, 1 June 2023 (UTC)[reply]

They are defining the same thing, using various grades of nautical jargon. Van Man Fan (talk) 10:39, 24 March 2023 (UTC)

I don't think this would normally be spelled with a hyphen, at least not as a verb. heave-to with a hyphen looks like a noun, probably meaning "the act of heaving to", though as a landlubber I don't know if such a noun exists. —Mahāgaja · talk 10:47, 24 March 2023 (UTC)[reply]

Renaming Wiradhuri (wrh) to Wiradjuri

I think we need to change this one because—so far as I can tell—"Wiradjuri" is the most current and by far most common English spelling for this language since at least the 1980s, as opposed to our current spelling (see "Category:Wiradhuri language"). "Wiradjuri" is also the form used in official signage and communication (see for instance: local shire boundary signage; a city council webpage; a unit from NSW state school curriculum; cultural information from the National Indigenous Australians Agency—a federal government agency). Helrasincke (talk) 03:44, 29 March 2023 (UTC)[reply]

Support. Even on Glottolog, where they use the -dh- form, the -dj- form (and then -dg- forms) is more common in the names of the reference works about it they have catalogued. - -sche (discuss) 20:54, 1 April 2023 (UTC)[reply]

@Benwing2? This, that and the other (talk) 12:44, 27 December 2023 (UTC)[reply]

@This, that and the other  Done. Benwing2 (talk) 04:49, 1 January 2024 (UTC)[reply]

2023 — April

kaffir should probs be the main form. It is probably (talk) 08:11, 13 April 2023 (UTC)

You may be right as kaffir seems to be slightly more widely used than kafir, though oddly enough we (and Wikipedia) have an entry for Kafiristan and not Kaffiristan (which is a far more prevalent form on GoogleBooks). Though on a raw Google search 'Kafir' is twice as popular as 'Kaffir' and 'Kafiristan' is a lot more popular than 'Kaffiristan' and there does seem to be a slight tendency of late to differentiate the 2 words so that 'kaffir' is the South African insult and 'kafir' is the Islamic one. --Overlordnat1 (talk) 09:06, 13 April 2023 (UTC)

Splitting Haketia from Ladino

So from doing a lot of research and hearing testimonies from elders who speak this North African Judeo-Spanish language, I think there should be a separate list and code for Haketia. It has been associated as just a dialect of Ladino but that is not the case. Haketia has consonants and words directly from Arabic that are never used in Ladino as well as an array of different phrases and spellings. It is a separate language. Let me know if this can be done. I have a lot of words, pronunciations and phrases ready for adding to it after it is set up. Shukur/thanks. Nevermiand. (talk) 18:43, 16 April 2023 (UTC)[reply]

Looking at google books:"od's niggers" and google books:"odd's niggers", it seems like the O is always (or almost always?) capitalized, as if treated as a name, like Odd, which we currently only have as Norwegian but which is also attestable in English—as a non-God-related given name, I mean. Odd might also be attestable as a minced oath for God, given the variety of other oaths like this I see used or mentioned in old books, including Odd's pittikins, Odd's blood, Odd's hounds, Odd's dickens, Od's fish, Od's heft. For od's bobs the hits are more split, but that entry too should possibly be capitalized. - -sche (discuss) 04:32, 19 April 2023 (UTC)[reply]

IMO these are usually uncapitalised in later use, though it's quite hard to tell because it's usually the first thing in a sentence so gets a capital anyway. And older texts, pre-mid-19th century, would capitalise nouns fairly commonly anyway. But conventionally ods bodikins, od rat it, odzooks etc. are written with small Os. The OED and Chambers both lemmatise od and derivatives uncapitalised. Ƿidsiþ 13:32, 18 September 2023 (UTC)[reply]

2023 — May

“akrasia” is currently listed as the alternative spelling of “acrasia”, which contradicts Wikipedia, as well as the fact that “acratic” is (correctly) listed as the alternative of “akratic”. Also, “akrasia” has 4.5× as many Google results as “acrasia” does. (There’s probably a better metric I could cite, but oh well.) IMO we should swap the two and make “akrasia” the main one. —⁠Will ⁠• ⁠B[talk] 23:30, 5 May 2023 (UTC)[reply]

Tagged a long time ago, I'm just bringing it here. I tend to think they should be merged to be up to, as all of the citations include a form of to be. 76.100.240.27 19:54, 8 May 2023 (UTC)[reply]

Other static copulas can replace be. Should we consider be a generic static copula as we might consider do a generic transitive verb and something a generic NP? DCDuring (talk) 01:17, 18 September 2023 (UTC)[reply]

Should be put behind oneself methinks. 76.100.240.27

Also whip it on someone should be at whip it on and trust someone to should be at trust to. 76.100.240.27
I agree that trust to is a worse location for the expression than trust someone to, though both are worse than trust + to, IMHO.
Doesn't whip it on require a person (or personified object) as complement? I suppose we could handle that with a label. Also, it is possible that there might be another meaning involving inanimate objects or other expressions. It would probably then be easier on users to be able to compare meanings. DCDuring (talk) 00:42, 23 May 2023 (UTC)[reply]
I would not move this entry. People say "I put it behind me" and "You need to put it behind you and move on", not *"I put it behind myself" and *"You need to put it behind yourself and move on". —Mahāgaja · talk 06:48, 23 May 2023 (UTC)[reply]

Middle Polish (yet again)

I propose we make Middle Polish an Etymology only code with the language code zlw-mpl. Would be very useful for linking and mentioning. @Sławobóg @ZomBear @Mahagaja @Thadh @KamiruPL Vininn126 (talk) 13:11, 11 May 2023 (UTC)[reply]

I know this is a boring topic that many people find irrelevant, but in what contexts would this code be used? (e.g. are there a lot of Ruthenian borrowings from Middle Polish?) Thadh (talk) 16:03, 11 May 2023 (UTC)[reply]
Or a lot of Middle Polish only inheritances from Old Polish, or Silesian derivations. Vininn126 (talk) 16:13, 11 May 2023 (UTC)[reply]
@Thadh in the Old Ruthenian there are a lot of Polish borrowings, just the period of the 1500-1700s. This is, to some extent, one of its features that alienated the modern Ukrainian and Belarusian languages first from Middle Russian and then from modern Russian. --ZomBear (talk) 16:46, 11 May 2023 (UTC)[reply]
  • To me, as the editor of the Old Ruthenian entries, this would be helpful. In the Old Ruthenian zle-ort language (existed in the period ~ 1387-1798), there are extremely many Polish borrowings. Words borrowed in the 1400s (before 1500) have to be indicated from Old Polish zlw-opl, everything is fine here. But borrowings in the period of 1500-1700 have to be indicated as borrowed from the modern Polish language. The presence of a separate code for Middle Polish would solve this inaccuracy. --ZomBear (talk) 16:42, 11 May 2023 (UTC)[reply]
@ZomBear @Sławobóg Done, thanks @Theknightwho! Vininn126 (talk) 14:00, 18 May 2023 (UTC)[reply]
@Vininn126 thank you and everyone who contributed to this. I have already created the first one, the Old Ruthenian гартова́ти (hartováti), where it is listed as a borrowing from Middle Polish. ZomBear (talk) 17:55, 18 May 2023 (UTC)[reply]

molly-mawk is given as an alternative form of mollemoke, and not of mollymawk. The etymologies given for those two are half-different too, while both mention fulmars. There's probably some obsolete taxonomy in there too, so a taxo-specialist's eyes would be more than welcome. Skisckis (talk) 20:40, 11 May 2023 (UTC)[reply]

English. There seems to be some conflation between the two. {{lb|en|China}} categorizes into the former, though people often do mean the latter, which only has 3 entries. For example, typhoon shelter, Hong Kong foot, add oil, and aiya are labelled as both {{lb|en|China}} and {{lb|en|Hong Kong}}.

Also, "Chinese English" technically includes Hong Kong English by the criteria of geography, but linguistically and lexicographically speaking, there is very little influence on HKE from the mainland, which means there are not many instances where we actually need to categorize into both; the existing ones in the category that I'm aware of are (excluding the four already mentioned above) joss stick, Ins, KMT, and ACG. Note that this also causes abominations like the one at ACG, which is meant to include Taiwan as well. (We can ignore Macau for the sake of simplicity, since the English used there is basically a toned down version of formal HKE) – Wpi (talk) 17:29, 30 May 2023 (UTC)[reply]

Off-topic: In my opinion, KMT and joss stick are not regional forms of English; indeed, the latter is currently not labelled as such. (Indeed, 'joss' is not so labelled, though it's not part of my active vocabulary.) --RichardW57m (talk) 09:18, 2 June 2023 (UTC)[reply]

2023 — June

English. Needs splitting into vulcanian. Probably some crossover Elevenpluscolors (talk) 08:03, 7 June 2023 (UTC)[reply]

@Elevenpluscolors Per the OED entry, no, it shouldn't be split but geological senses (and probably the cuckold sense too) should be separately listed as appearing with a lower-case initial letter. Whichever is currently the more common form should be the main entry. The other one should still have those senses but use the template for alternative case form of. — LlywelynII 22:04, 9 June 2023 (UTC)[reply]

Okinoerabu and Tokunoshima

Discussion moved from Wiktionary:Beer parlour/2023/June.

These are two Ryukyuan languages that we currently call Oki-No-Erabu and Toku-No-Shima, because that’s how they’re spelled in ISO 639. However, literature invariably uses the unhyphenated forms, and they’re also much easier to read.

Could we please therefore rename them to the unhyphenated forms? Theknightwho (talk) 19:39, 4 June 2023 (UTC)[reply]

I dislike the EN penchant for glomming Japanese names into long undifferentiated strings, as I find that this instead makes them harder to read, and it erases the distinction between the actual component terms.
In some cases, the resulting interpretation or partial-expansion goes sideways, as we see at w:Tokunoshima, where the English text describes this as "Tokuno Island" -- the no portion is simply the genitive particle の (no), so as Japanese, this is better thought of as "Toku Island".
Name derivation, for those inclined to dive into the details...
  • The Japanese historical record bears this out, with the first mention in a 699 text as 度感. At the time, this may have been pronounced as something like twokom or dwokom, based on the Middle Chinese readings and known man'yōgana sound values, although some sites render this as toku or doku; it is not clear to me where the ku reading for 感 comes from. At any rate, the no is not part of the base of the name.
  • For those interested and who can read Japanese, here are several references at the Kotobank aggregator site. Search the page for 度感.
  • See also this entry at Nihon Jiten, which also lists 度感嶋 as an attested spelling with the pronunciation Toku Shima, further evidence that the base name is simply Toku and that the no is the particle.
That aside, I do see that w:Tokunoshima language lists the alternative rendering "Toku-No-Shima", and the w:Okinoerabu dialect cluster similarly lists the alternative rendering "Oki-no-Erabu". A quick-and-dirty Google hits comparison (including "the" to filter for English hits):
In the English-language web, the allthewordsruntogether renderings appear to be most common. Meanwhile, the Language Subtag Registry based on ISO 639 and maintained by IANA (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) does indeed use the hyphenated descriptors.
Meh. After digging into this some, I realize I just don't care all that much one way or the other. ‑‑ Eiríkr Útlendi │Tala við mig 22:09, 9 June 2023 (UTC)[reply]
Searching on Google Scholar, it seems the unhyphenated forms are more common, but I concur with Eirikr's views that they look worse.
However, I would suggest that if we were to retain the hyphens, the two languages should be renamed to "Oki-no-Erabu" and "Toku-no-shima" (or the rarer "Toku-no-Shima"), since these are more common on Google Scholar, and also because "no" is a particle that shouldn't be capitalised in a proper noun, cf. Southend-on-Sea, Stoke-on-Trent or von, de, etc. in surnames. – Wpi (talk) 11:20, 21 June 2023 (UTC)[reply]

Should these categories be merged? Many terms in the -cide categories end in -icide, and thus should be moved, unless we decide not to make this distinction. J3133 (talk) 12:45, 14 June 2023 (UTC)[reply]

Support. -icide isn't really even a real suffix, it's just -i- + -cide. Ioaxxere (talk) 00:02, 16 June 2023 (UTC)[reply]
What (s)he said. Nicodene (talk) 16:09, 17 June 2023 (UTC)[reply]
Support unless someone comes up with a really good argument otherwise. DCDuring (talk) 16:17, 17 June 2023 (UTC)[reply]

English. There's gotta be some overlap between pinnulated and pinnulate. Someone smarter than me could have a go at fixing it. Sub zero Temps (talk) 12:55, 23 June 2023 (UTC)[reply]

The pronoun doesn't have to be present: google books:"queer the pitch". Should the lemma be moved to queer the pitch, or should that be a synonym, ...? - -sche (discuss) 18:12, 28 June 2023 (UTC)[reply]

I am familiar with the similar queer the deal, defined by "NetLingo" as "To ruin a potential business deal or arrangement despite all favorable odds. For example, 'They are a liberal company, so don't queer the deal by letting them know our conservative tactics.'" The "deal" version is more common with "the" than with possessive pronouns. But I wonder whether the right approach isn't to make both the "the" and the "someone's" versions redirect to the right sense of queer#Verb, adding usage examples there. DCDuring (talk) 20:15, 28 June 2023 (UTC)[reply]
Sense 4 of queer#Verb is the right definition for these. I wouldn't have called these dated, but then I'm dated. DCDuring (talk) 20:21, 28 June 2023 (UTC)[reply]

2023 — July

Tagged over 5 years ago but never listed, a request to merge {{desc-top}} and {{desc-bottom}} into {{des-top}} and {{des-bottom}}. Both create collapsible two-column tables of descendants, but {{desc-top}} provides only the header "Descendants" while {{des-top}} provides the header "Descendants of [term] in other languages". Seems reasonable to me; at any rate, we certainly don't need both. —Mahāgaja · talk 12:27, 4 July 2023 (UTC)[reply]

I prefer the text generated by {{des-top}}, but the name {{desc-top}}. Why don't we move {{des-top}} and {{des-bottom}} to {{desc-top}} and {{desc-bottom}}? I don't think anything bad would happen, at least not in principal namespace. DCDuring (talk) 13:52, 4 July 2023 (UTC)[reply]
As long as there's a hard redirect from the name not selected, it doesn't matter to me which name is selected. In fact, given our fondness for long names with short redirects, we could move {{des-top}} to {{descendants-top}} and have redirects from both short names. —Mahāgaja · talk 14:12, 4 July 2023 (UTC)[reply]
Why not. DCDuring (talk) 14:14, 5 July 2023 (UTC)[reply]
Do template redirects slow down page rendering at all? Soap 17:06, 12 July 2023 (UTC)[reply]
Merge. I agree here with User:DCDuring about preferring the name {{desc-top}} because it matches {{desc}}. I feel like "Descendants of [term] in other languages" is a bit wordy; maybe it should just say "Descendants in other languages" since the term is implicit? However, I'm not strongly attached to this, and we can always change the wording after the merger. Note that this falls under the purview of WT:RFDO#remove lesser-used column templates where I proposed removing both sets of templates and replacing them with a more general collapsing template (e.g. something like {{box-top|<d>}} for descendants), but that is maybe a longer-term discussion. Benwing2 (talk) 21:45, 24 July 2023 (UTC)[reply]
Actually I really like "Descendants in other languages" because then it isn't necessary to format the word. At the moment, {{des-top}} just puts the term in italics no matter what, but non-Latin scripts aren't supposed to be italicized here, and even without italics, text is supposed to be tagged as the correct language. And changing the text to "Descendants in other languages" is a much simpler solution than changing the template so that you have to type {{desc-top|grc}} (or whatever) and then have the template or module format the text appropriately. —Mahāgaja · talk 06:55, 25 July 2023 (UTC)[reply]

English. cotton, cotton on, cotton on to, cotton to. – Jberkel 18:57, 10 July 2023 (UTC)[reply]

Correct language names

Could you correct Juǀ'hoan to Juǀʼhoan, Kwak'wala to Kwakʼwala, and K'iche' to Kʼicheʼ? There's no punctuation in the ethnonyms. If we want to use assimilated English forms, then the latter would be Quiché; I'm not sure about Juǀʼhoan. kwami (talk) 19:16, 13 July 2023 (UTC)[reply]

  • Support. To clarify for people using low-resolution screens: the request is to use the modifier letter apostrophe character ʼ rather than the typewriter apostrophe '; the categories are currently at Category:Juǀ'hoan language (ktz) and Category:K'iche' language (quc). Our usual practice is to use the spelling most common in contemporary English-language discussions of the language. Which is more common in current books and journal articles, Kʼicheʼ or Quiché? —Mahāgaja · talk 19:30, 13 July 2023 (UTC)[reply]
    Just to be clear, I personally don't care about ASCII substitutions in category names; what I'm concerned about is proper headers in the dictionary entries. But it's fine by me if the two go together.
As for Kʼicheʼ or Quiché, the English-language lit has been moving from the Spanish form to the ethnonym. That's an ongoing trend, though of course not universal (e.g. 'German', 'Greek', 'Armenian' etc.). kwami (talk) 21:15, 13 July 2023 (UTC)[reply]
The L2 headers and category names do need to match, at least for readers using tabbed browsing. Otherwise, the categories won't appear in the correct language tab. I think there are also bots that require the L2 header to be the canonical language name in order to work properly. —Mahāgaja · talk 22:20, 13 July 2023 (UTC)[reply]
Okay. Works for me. kwami (talk) 22:24, 13 July 2023 (UTC)[reply]
@Kwamikagami Normally at Wiktionary we use typewriter apostrophes rather than curly single quotes, and this issue is somewhat controversial, so this change is unlikely to happen without significant further discussion and consensus. Benwing2 (talk) 04:27, 24 July 2023 (UTC)[reply]
I'm not requesting quote marks. That would also be incorrect. Rather, since we are attempting to use the endonym, IMO it should be the glottal stop or ejective diacritic that's in the orthography. kwami (talk) 04:41, 24 July 2023 (UTC)[reply]
Indeed, no one is advocating curly single quotes. The modifier letter apostrophe is a different character; it's a letter, not a punctuation mark. There are several other language names besides these two that ought to be using it. —Mahāgaja · talk 06:23, 24 July 2023 (UTC)[reply]
Sarci, for example, which was just moved to its endonym (minus tone marking). But I thought I'd wait to see how things went before attempting a more comprehensive proposal. kwami (talk) 06:27, 24 July 2023 (UTC)[reply]
Support - this isn't a matter of using curly quotes vs straight ones; it's a matter of using the correct letter instead of punctuation. We already do this extensively in entries for languages that use it anyway. Theknightwho (talk) 15:39, 24 July 2023 (UTC)[reply]
Going through WT:LOL, these are the languages whose names have the modifier letter apostrophe at Wikipedia but the typewriter apostrophe here:
Other languages with typewriter apostrophe whose Wikipedia article uses a different character include:
  • gez Ge'ez → Geʽez with ʽ (U+02BD modifier letter reversed comma)
  • hps Hawai'i Pidgin Sign Language → Hawaiʻi Pidgin Sign Language with ʻ (U+02BB modifier letter turned comma)
  • num Niuafo'ou language → Niuafoʻou with ʻ (U+02BB modifier letter turned comma)
  • tct T'en → Tʻen with ʻ (U+02BB modifier letter turned comma)
  • tsl Ts'ün-Lao → Tsʻün-Lao with ʻ (U+02BB modifier letter turned comma)
I support making all of these changes. —Mahāgaja · talk 19:54, 24 July 2023 (UTC)[reply]
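For readers trying to tell the look-alike characters in this thread apart, here is a minimal sketch (not part of any comment above) that prints the code point, Unicode name and general category of each character using Python's standard `unicodedata` module. The list of characters is an assumption based on the ones named in the discussion; the curly quote U+2019 is included only for contrast.

```python
import unicodedata

# Characters named in the discussion above (the selection is an assumption):
# typewriter apostrophe, curly right single quote, and the three modifier letters.
CANDIDATES = ["'", "\u2019", "\u02BC", "\u02BB", "\u02BD"]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    for ch in CANDIDATES:
        # Categories starting with "P" are punctuation; "Lm" is a modifier letter.
        print(*describe(ch))
```

The typewriter apostrophe and the curly quote report punctuation categories, while the three modifier characters report `Lm` (modifier letter), which is the letter-versus-punctuation distinction being argued over here.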
I oppose these changes. What is the actual benefit? From the above discussion, there are at least three different Unicode apostrophe-like characters involved, which are easily confused, and it will make it significantly harder to type the language names into headers, categories and the like. This is going to be a major pain in the ass for people like me who will have to clean up wrongly-typed apostrophes in language headers in innumerable articles created by IP's and other occasional contributors, who are unlikely to be able to type the right character. Furthermore, even with these changes, the language names in many cases will not actually match their endonym spelling; cf. the proposed Oʼodham, which is actually spelled ʼOʼodham natively with two apostrophes. Similarly, as pointed out by User:Kwamikagami, our spelling of the CAT:Tsuut'ina language doesn't include the tone mark that is present in the native orthography, and wouldn't even with the change in apostrophe. I should add that Wikipedia uses these Unicode chars specifically because Kwami went around renaming all the articles (formerly they used the straight apostrophes), and is not consistent, e.g. the article on the name of the people is still at O'odham with a straight apostrophe. Glottolog uses straight apostrophes for O'odham; so does the Endangered Languages Project. In general, our policy is to use the *English* names for languages; we are not forced to use the exact native spelling. While I agree it's a good idea to approximate the spelling (e.g. avoiding exonyms where possible), I disagree we have to take this to the extreme of using the "correct" Unicode apostrophes (which I bet you will find native speakers not using in many cases as well). Benwing2 (talk) 20:22, 24 July 2023 (UTC)[reply]
Other people's carelessness in using Unicode is no excuse for us to be careless, and anyway, language names can always be inserted by typing {{subst:\|xyz}}, which doesn't involve any non-ASCII characters. Latin a and Cyrillic а look identical in every font and font style too, but substituting one for the other is an error; it's no different with ' and ʼ. —Mahāgaja · talk 07:05, 25 July 2023 (UTC)[reply]
I think you're missing the point. We don't include Cyrillic letters in language names, either. Benwing2 (talk) 07:13, 25 July 2023 (UTC)[reply]
I know that. My point is that using ' where ʼ belongs is as bad as using Cyrillic letters in Latin-script language names. —Mahāgaja · talk 07:24, 25 July 2023 (UTC)[reply]
I would support the changes, but only if they're truly the most used forms in terms of literature. Ideally we'd have people from each community give their opinions here, but alas, we're not afforded that. If the specific respective unicode apostrophe is used in literature, then we can use it here too. I can see the problem with inputting the apostrophes that's been brought up, but let's be real here, how many people are actually working on these languages to where this'd be a serious problem? I feel like this could be fixed with just an about:XYZ page or something. These languages unfortunately don't get enough traction. But again, I'd only support this if it can be proven that they're the forms used in English literature. AG202 (talk) 01:49, 17 August 2023 (UTC)[reply]
@AG202 I agree with you, that is one of the points I made above, which has gotten lost in this thread. Benwing2 (talk) 02:08, 17 August 2023 (UTC)[reply]
Ahh, got it, missed that, apologies. AG202 (talk) 02:11, 17 August 2023 (UTC)[reply]
Hmm... like Benwing, my initial inclination is to oppose this, because the odds of anyone being able to type names with the fancy characters when adding entries is low (and given recent events, I wonder if one or more admins would block people for 'adding wrong language names' if people keep typing the names they're able to type). OTOH, I recognize that we require entries themselves to be input using correct spellings (with accents etc) and not in hacky ways... If we had a system like the French Wiktionary where no-one had to type the language names (instead only typing language codes, which only consist of easily-typeable ASCII characters), then changing the displayed character would be less of a problem (though still hard for navigating to categories, etc). Do we have a template with a simple short name people could subst: to produce the untypeable names, so they could write =={{subst:langname|foo-bar}}== to get ==Fooʾbar==? Or if we took this type of functionality and had a button people could periodically press (hosted on here like that Javascript is, not as a Python script on the computer of a user who might leave the project or be too busy to run it) that would search the database for instances of the typeable names and update them to the untypeable names, then it would be less of a problem (although it'd still be creating an unending maintenance task). - -sche (discuss) 16:22, 16 August 2023 (UTC)[reply]
We do have {{subst:x2i}} that will convert the string _> to ʼ, but more helpfully we have (as I mentioned above) {{subst:\}}, which converts a language code to its canonical name. —Mahāgaja · talk 21:55, 16 August 2023 (UTC)[reply]
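The periodic clean-up idea raised above (finding headers typed with the typeable name and rewriting them to the canonical one) could look roughly like the following sketch. It is purely illustrative: the `RENAMES` table and the `normalize_headers` function are hypothetical names, the mapping only covers the three renames requested at the top of this section, and nothing here reflects an existing Wiktionary bot or gadget.

```python
import re

# Hypothetical mapping from the typeable (ASCII-apostrophe) names to the
# proposed names with U+02BC; only the three renames requested above.
RENAMES = {
    "Ju\u01C0'hoan": "Ju\u01C0\u02BChoan",  # U+01C0 is the dental click letter
    "Kwak'wala": "Kwak\u02BCwala",
    "K'iche'": "K\u02BCiche\u02BC",
}

# Matches a level-2 header such as "==K'iche'==" alone on a line.
# Level-3 headers ("===Noun===") do not match, because the captured name
# must start with a character other than "=".
L2_HEADER = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.MULTILINE)


def normalize_headers(wikitext):
    """Rewrite known typeable language names in L2 headers; leave the rest alone."""

    def fix(match):
        name = match.group(1)
        if name in RENAMES:
            return f"=={RENAMES[name]}=="
        return match.group(0)  # unknown header: return it unchanged

    return L2_HEADER.sub(fix, wikitext)


if __name__ == "__main__":
    print(normalize_headers("==K'iche'==\n\n===Noun===\n"))
```

A sketch like this only addresses the header text; as noted in the thread, it would still be a recurring maintenance task, and it says nothing about category names or links.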
Even with these workarounds, it seems extra work for no gain. There is no rule that says we need to follow native orthography to the T in our English names for languages; otherwise we'd have Deutsch in place of German, and русский in place of Russian, etc. I have seen no arguments that indicate why having these special apostrophes in language names gains us anything except some nebulous sense of "correctness". Benwing2 (talk) 23:07, 16 August 2023 (UTC)[reply]
Deutsch is the endonym. What we're talking about here is using the proper Unicode characters for whichever name we decide to use. The apostrophe is a punctuation mark, and the glottal stop is not punctuation. Using the letter for glottal stop is analogous to using en-dashes and minus signs rather than hyphens. kwami (talk) 00:28, 17 August 2023 (UTC)[reply]
Deutsch is the endonym
Yes exactly. The exonym can have apostrophes while the endonym has Unicode whatever. Nothing wrong with that. Benwing2 (talk) 00:56, 17 August 2023 (UTC)[reply]
@Benwing2 I think we’re getting too focused on Unicode. The thing we should care about is what character is actually intended, which isn’t necessarily the same as what they actually wrote. To use an analogy: we don’t lemmatise the palochka with the numeral 1 or Latin l, even though both are probably more common than the actual palochka character, and that’s because we all know that the writer intended to use a palochka irrespective of what character they actually wrote in Unicode. Theknightwho (talk) 02:18, 17 August 2023 (UTC)[reply]
@Theknightwho I think we'll just have to agree to disagree here. I don't think the analogy you are making here with palochka is very applicable and you're still missing the point made by User:AG202 about what's the most common usage in scholarly and other English sources. Benwing2 (talk) 02:24, 17 August 2023 (UTC)[reply]
@Benwing2 The whole reason I brought it up is as an example of when the most common usage isn’t necessarily an indicator of what’s most appropriate. I’ve also seen plenty of typography mistakes in scholarly sources, too, or fonts that map common characters to a glyph of what is actually intended. You can’t just rely on the codepoint. Theknightwho (talk) 02:27, 17 August 2023 (UTC)[reply]
Just to be clear, when I said common usage, I meant what character is actually intended, not necessarily parsing specifically based on codepoints. However, this isn't an easy task for sure, unfortunately. AG202 (talk) 02:49, 17 August 2023 (UTC)[reply]
Doesn't matter whether it's the endonym or exonym: the apostrophe is a punctuation mark, and these are not punctuation marks. Yes, we can substitute, and that's common enough. We could also use a hyphen for a minus or a double hyphen for an em dash -- those substitutions are common too -- but that doesn't mean we should do that. We could substitute click letters with exclamation marks and pipes. But if we want Wiktionary to look professional, then IMO we should typeset it professionally, and not use ASCII substitutes just because they're easier to type. kwami (talk) 04:06, 17 August 2023 (UTC)[reply]

English. Move to bee's knees: like shits for "the shits". 恨国党非蠢即坏 (talk) 03:58, 24 July 2023 (UTC)[reply]

Maybe, but does bee's knees attestably occur other than as a part of the bee's knees? DCDuring (talk) 15:33, 24 July 2023 (UTC)[reply]
@DCDuring: Are attestations needed to leave "the" out? Other entries seemingly just always leave "the" out, like United States. 恨国党非蠢即坏 (talk) 10:52, 8 August 2023 (UTC)[reply]
It would be better if we had some evidence for all similar cases. OTOH, if we think new normal users are able to use the failed-search page, then they would find [[bee's knees]], even if they searched for "the bee's knees" (and vice versa). I personally think that normal users can't be assumed to make good use of that page. DCDuring (talk) 12:53, 8 August 2023 (UTC)[reply]
We are quite inconsistent about whether we include the or not, e.g. cat's pyjamas redirects to the cat's pyjamas, contrary to the direction of the the shits → shits redirect. It would be better to try to decide on a general approach rather than move entries piecemeal. DCDuring, you argued in favor of redirecting verb oneself to verb even when it's never attested other than with a reflexive pronoun; it seems to me the same logic would make it better to centralize content at bee's knees, too. "The" is dropped from constructions like this when they're used attributively and in certain other cases (peruse the cites at google books:"and bee's knees"), and in headlinese ("Mayor Says New Parks Are Bee's Knees"). I pointed this out about Talk:The Rock, too. - -sche (discuss) 16:15, 16 August 2023 (UTC)[reply]
Note that this entry was already moved to the bee's knees, per a previous RFM discussion (see the talk page for a link to it). Andrew Sheedy (talk) 02:29, 18 August 2023 (UTC)[reply]
save your clicks 恨国党非蠢即坏 (talk) 02:49, 21 August 2023 (UTC)[reply]

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


bee's knees

Suggest merging the bee's knees into bee's knees. Per redirect at the the cat's pyjamas. I'm not sure of Wiktionary SOP, so noting here. HTH. Quiddity (talk) 23:10, 22 August 2013 (UTC)[reply]

I would suggest, instead. merging the other way: is bee's knees ever used in any other combination than with "the", as in the bee's knees? Chuck Entz (talk) 00:34, 23 August 2013 (UTC)[reply]
I too would suggest unifying them as the bee's knees. Mglovesfun (talk) 19:44, 23 August 2013 (UTC)[reply]
Moved. - -sche (discuss) 04:17, 1 February 2016 (UTC)[reply]


2023 — August

@-sche, Sgconlaw {{circa2}} was created apparently to work around the fact that {{circa}} adds (or added) a comma automatically. Now that I'm changing {{circa}} (along with {{ante}} and {{post}}) not to do this, I don't see any use for {{circa2}} and propose merging it into {{circa}}. Benwing2 (talk) 21:47, 1 August 2023 (UTC)[reply]

@Benwing2: it seems like {{circa}} and {{circa2}} serve different purposes. The template {{circa}} (and {{ante}} and {{post}}) appear to have been created for quotations in entries that do not use quotation templates. That is why the year appears in bold and there is a comma after the year. On the other hand, {{circa2}} is for adding circa or c. before a year in other contexts, such as in etymology sections or image captions. I suppose {{circa}} and {{circa2}} could be merged, but then some parameter would have to be added to allow for switching between the two formats. Alternatively, if all quotations using {{circa}}, {{ante}}, and {{post}} were replaced with quotation templates, then {{ante}} and {{post}} could be eliminated and {{circa2}} could be renamed as {{circa}}. — Sgconlaw (talk) 05:55, 2 August 2023 (UTC)[reply]
@Sgconlaw I have eliminated the aftercomma from {{circa}}, {{ante}} and {{post}}. What differences remain? Just the boldface? That seems a pretty small thing to have two templates for, esp. given the horrible naming. Benwing2 (talk) 06:17, 2 August 2023 (UTC)[reply]
BTW {{ante}} etc. frequently appear inside of quotation templates. What is the way to do without them? Benwing2 (talk) 06:18, 2 August 2023 (UTC)[reply]
@Benwing2: can you give an example of {{ante}} being used in a quotation template? I haven't come across this before. — Sgconlaw (talk) 06:40, 13 August 2023 (UTC)[reply]
E.g. in cornuto:
{{quote-book|en|year={{ante|1597}}|first=William|last=Shakespeare|authorlink=William Shakespeare|title={{w|The Merry Wives of Windsor}}|section=Act 3, Scene 5|passage=No, Master Brook, but the peaking '''cornuto''' / her husband, Master Brook, dwelling in a continual / 'larum of jealousy, comes me in the instant of our / encounter, after we had embraced, kissed, protested, / and, as it were, spoke the prologue of our comedy}}
I am cleaning all of them up to use e.g. a. 1597 instead. Note that |origyear=, |year_published=, etc. now support a., c. and p. prefixes. Benwing2 (talk) 08:22, 13 August 2023 (UTC)[reply]
Hmm. I would not merge these as things stand now, with them having the differences re bolding that they do, and the differences in where they're used: they currently serve different purposes. (Since we don't normally bold years in etymologies, descendants lists, etc, a template used in etymologies to qualify a year as circa shouldn't bold the year either, whereas we do normally bold years at the start of quotation metadata, so a template that supplies circa there should bold the year.) However, if we replace all of the relatively few (~680) uses of {{circa2}} with just the spelled-out word "circa" — formatted however: "circa", "c.", whatever we decide — rather than a template, we could just delete {{circa2}}. And/or if we made sure all uses of {{circa}} were inside quotation templates (not manually-formatted quotations), then we could presumably have the quotation templates know that if year={{circa|####}}, then format #### in bold (but don't bold circa?), and then if {{circa}} and {{circa2}} stopped differing in the formatting they apply, they could be merged. - -sche (discuss) 16:06, 16 August 2023 (UTC)[reply]
@Benwing2, -sche: strange, I've just received a ping from -sche relating to this discussion from 2023. What's happening with {{circa2}}, anyway? — Sgconlaw (talk) 14:06, 29 March 2024 (UTC)[reply]
Was the ping specifically to this discussion? Fascinating. (If it was just to this general page, I might speculate that my recent removal-then-readdition of a bunch of discussions pinged people somehow, even though it shouldn't because I think you have to add four tildes at the same time as linking someone's username to ping them.) Pings seem to be wonky lately; AG202 pinged me in an edit summary on this entry recently and I didn't get the ping, only noticing that it existed because I had the entry in my watchlist and was looking at the edit history. - -sche (discuss) 14:59, 29 March 2024 (UTC)[reply]
@-sche @Sgconlaw I just got a bunch of pings that claim to be from -sche but were actually old responses of mine *TO* -sche. Strange. As for {{circa2}}, I am not sure anymore; I think when I looked into this a while ago, I concluded they indeed serve slightly different purposes, although the naming is definitely bad. Benwing2 (talk) 19:32, 29 March 2024 (UTC)[reply]
I notice all the pings are from 6 hours ago and are in WT:RFM specifically, so I think they are indeed related to your removal/readdition of discussions at that time. Benwing2 (talk) 19:38, 29 March 2024 (UTC)[reply]
@-sche: when I clicked on the notification I was led to this discussion. Anyhoo, about this discussion, @Benwing2: what about changing {{circa}} to {{circa-quote}}, and {{ante}} and {{post}} similarly since they are all intended only for use with quotations (though they should be phased out in favour of the {{quote-*}} templates), and then renaming {{circa2}} to {{circa}}? Would that be confusing? — Sgconlaw (talk) 19:43, 29 March 2024 (UTC)[reply]
Welp! I apologize to everyone who just got a bunch of pings from that, then. 😮😅 - -sche (discuss) 20:10, 29 March 2024 (UTC)[reply]

Ktunaxa, Secwepemctsín

Could we rename Kutenai (kut) to Ktunaxa, and Shuswap (shs) to Secwepemctsín please? The first names are the Anglicized terms for the languages, and are somewhat outdated and/or not in use among speakers. GKON (talk) 22:46, 12 August 2023 (UTC)[reply]

@-sche Can you weigh in here? There is nothing wrong per se with having exonyms for languages (we say "German" not "Deutsch" for example), and I note that Wikipedia still uses Kutenai and Shuswap. The main issue in my view is (a) avoid pejorative terms, and (b) use the most common terms as found in English-language sources. Benwing2 (talk) 23:37, 15 August 2023 (UTC)[reply]
For Shuswap, almost no-one uses Secwepemctsín in English, either in books overall as tracked by Ngram Viewer, or in reference works about the language at Glottolog. For kut, Kutenai was the main name (in reference works/Glottolog and overall/Ngrams) until a few years ago, when Ktunaxa started to just barely overtake it. - -sche (discuss) 17:45, 16 August 2023 (UTC)[reply]
That is true; however, I would argue that for Shuswap, the use of this term is declining, as seen in Ngram. The replacement is looking like Secwepemc, which is another word for the language that is kind of a good middle ground between Shuswap and Secwepemctsín, wouldn't you say? Also, the actual communities in Secwepemc traditional territory mostly use Secwepemc. For example, if there is some quote or phrase on a billboard in Shuswap, the billboard will say that it's in Secwepemc. Another real-life example was a board in Banff town, which had greetings in multiple languages. Among them were Blackfoot, Stoney, Ktunaxa, and Plains Cree; apart from Ktunaxa, these are all Anglicized terms. However, the greeting in Shuswap was said to be in Secwepemc.
Shouldn't we be using this term, seeing as it gets the most use in these modern times? GKON (talk) 17:09, 20 August 2023 (UTC)[reply]

Should be merged into Category:Christianity. Ioaxxere (talk) 17:38, 13 August 2023 (UTC)[reply]

@Ioaxxere No: the category covers terms mainly used by religious figures, not terms that merely relate to Christianity. Many of the terms in Category:Thai ecclesiastical terms, for example, have nothing to do with Christianity. Theknightwho (talk) 22:21, 15 August 2023 (UTC)[reply]

(Merged from a request to merge into Category:en:Christianity)
Not clear to me how these are supposed to be distinguished. The boilerplate description at Category:Ecclesiastical terms by language says "terms used only by religious figures", but that's manifestly wrong for the terms at Category:English ecclesiastical terms which are also variously used by commentators like historians or musicologists who may or may not be religious themselves. In reality the category, certainly for English, seems to just contain terms topically related to Christian churches—not just religion in general—and these should be listed under Category:Christianity instead. The "ecclesiastical" label should perhaps also be made an alias of "Christianity". @Andrew Sheedy —Al-Muqanna المقنع (talk) 15:56, 29 August 2023 (UTC)[reply]

This is actually already on this page! See Ioaxxere's discussion above. It was pointed out that not all the terms are related to Christianity. However, I do agree that "ecclesiastical" is not the best label. Simply labelling according to religion would be preferable, I think. Andrew Sheedy (talk) 17:42, 29 August 2023 (UTC)[reply]
@Andrew Sheedy: Oops, completely missed that. I'll merge the discussions (and add a template to save anyone else making the same mistake). @Theknightwho The Thai category is very interesting, looking through it, but it seems to be describing a very different thing from the English category—maybe the problem is specifically how the English category is being used? —Al-Muqanna المقنع (talk) 18:03, 29 August 2023 (UTC)[reply]
It seems like everything which is in this category would be better off in a specific religion's category or, if pan-religious, in the "religion" category. (But many things currently in the "religion" categories are Christianity-specific, as I raised at Wiktionary:Information desk/2023/August#Christianity_terms_labelled_broadly_"religion" and intend to deal with at some point.) The widespread misuse of the label / category for terms that are better in other categories means we might be better off retiring it, although the other possibility is making it an alias of "religion" and then trying to monitor misuse, which we have to do with "religion" already anyway. - -sche (discuss) 16:33, 6 September 2023 (UTC)[reply]
In the Thai case it might be useful to distinguish between terms that are topically relevant to religions and terms used in religious contexts. I'm not convinced that distinction is generally useful, though: stuff like PBUH would certainly fall into the latter category but I think the (Islam) context label does the job (and labelling it "ecclesiastical" would come off as decidedly odd in general). My inclination would also be to merge it in the way you describe, so moving it to the relevant religion(s) or to the overall religion category if it's non-specific, but that leaves more complicated stuff like the POS subcategories at Category:Thai ecclesiastical terms up in the air given that we don't generally do that kind of breakdown for topic categories. —Al-Muqanna المقنع (talk) 17:19, 6 September 2023 (UTC)[reply]
Do the Thai POS subcategories make any sense or can they simply be deleted? "to kill (a god, high priest, or royal person)." does not seem to be an "ecclesiastical verb"-as-different-from-a-"verb", any more than deicide is an "English ecclesiastical noun", it seems to just include religious figures in its scope. And if specific verbs are only used by Buddhists (or whatever), then using the usual POS categories and then also using {{lb}} would seem to be the normal way of handling that, right? - -sche (discuss) 03:13, 7 September 2023 (UTC)[reply]

Both {{lj}} and {{jaru}} are aliases for {{ruby/ja}}, which calls {{ruby}} and wraps it using {{lang|ja|...}}. Now, why doesn't {{ruby}} take a lang code in the first place? That is strange. But the aliases are terrible; I propose eliminating them both in favor of {{rja}}, which is a logical shortening of "ruby/ja". We could have for example {{rko}} for Korean ruby, if it is so needed. Pinging the Japanese work group (Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): , sorry for the wide ping. Benwing2 (talk) 08:14, 15 August 2023 (UTC)[reply]

All three -- {{lj}}, {{jaru}}, and {{ruby/ja}} -- were the creations of Fumiko Take. They did very little to document any of the templates or aliases they created, and if dim memory serves, they were even aggressively oppositional when asked to provide documentation.
  • Stepping back -- what is the use case for this infrastructure? Do we not already have functional ruby text provided by {{ja-r}}?
Granted, {{ruby/ja}} offers the ability to specify arbitrary ruby text -- but I struggle to think of when we'd actually want that. It's used to great effect in manga, when authors will not uncommonly spell a word to convey a particular sense, and gloss it with ruby to indicate a different word entirely -- but for a dictionary, this is aberrant behavior outside of direct quotes of such texts. I suspect that, in most cases, {{ja-r}} would do just fine for our needs.
  • Stepping back a bit further -- do we need ruby text at all?
Serious question. Tiny kana over the kanji is something that only provides value to people who can already read kana, and is otherwise likely to confuse anyone unfamiliar with Japanese typography (which is probably the greater part of our user base). If a given user can already read kana, they are likely to be savvy enough to be able to match up any provided romanized string to the kanji, much as we get when using {{m|ja|TERM|tr=romanization}}.
I argue that kana ruby text over kanji is snazzy, but it also presents usability issues.
At any rate, I would welcome an overhaul of the listed {{lj}}, {{jaru}}, and {{ruby/ja}} templates. ‑‑ Eiríkr Útlendi │Tala við mig 08:34, 15 August 2023 (UTC)[reply]
@Eirikr I'm in full agreement that these are superfluous to {{ja-r}} (and {{ryu-r}}), but I disagree that we should be getting rid of rubytext. I think the aim should be to incorporate rubytext into {{l}}, {{m}} (et al). The infrastructure for language-specific formatting in links already exists (and is already used by Chinese and the Chinese lects to generate simplified forms), so we could add something for the Japonic languages that essentially reimplements {{ja-r}} (for the relevant language). Theknightwho (talk) 22:09, 15 August 2023 (UTC) Forgot to ping Benwing2. Theknightwho (talk) 22:12, 15 August 2023 (UTC)[reply]
Just to add a bit further to this - I'd also like to automate much of the kanji/kana mapping which is currently necessary with {{ja-r}}. It won't be possible to do away with it entirely, due to redlinks or when there are multiple possibilities, but {{ja-pos}} (and all the other headword templates) are able to do this already by looking at the input for {{ja-kanjitab}}, so there's no reason why link templates shouldn't be able to do this as well.
This would greatly simplify a lot of the complexity encountered when adding Japanese links, which would help with the usability issues Eirikr mentions. Theknightwho (talk) 22:17, 15 August 2023 (UTC)[reply]
@Theknightwho Is there a way to automatically convert {{ruby}} to {{ja-r}}? There are over 1,000 uses of {{ruby}} (often appearing as {{lj}}) and I'd like to get rid of them if possible. Benwing2 (talk) 22:36, 15 August 2023 (UTC)[reply]
@Benwing2 It doesn’t look like it’ll be straightforward, as the syntax is pretty different unfortunately. I’ll need to look at it more in-depth to get a better idea. Theknightwho (talk) 23:11, 15 August 2023 (UTC)[reply]
Blah. So much crappy East Asian code (and templates) out there. Even if the conversion is possible automatically in only say 80% of the cases, that would probably be good enough, as we can do the remainder by hand or just leave them. If for example there are cases that can be handled using {{ruby}} and not with {{ja-r}} that is probably fine, but we should not have two ways of doing the same thing and randomly use one or the other. Benwing2 (talk) 23:31, 15 August 2023 (UTC)[reply]
@Benwing2 Yeah, I suspect a conversion is possible, and as a last resort 1,000 uses is doable manually if a few of us handle it.
On the subject of crappy East Asian templates (and before I forget), it’s worth you having a look at the templates reliant on Module:th and Module:km as well. Theknightwho (talk) 01:30, 16 August 2023 (UTC)[reply]
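As a rough illustration of what the easy portion of such a conversion might look like, here is a minimal Python sketch. It assumes the bracket markup `[base](reading)` accepted by {{ruby}}, and a {{ja-r}} call of the form `{{ja-r|TEXT|KANA}}`; the function name is hypothetical, and anything containing links, nested templates or pipes is deliberately skipped for manual review.

```python
import re

# One ruby pair in the bracket markup: [base](reading)
_PAIR = re.compile(r"\[([^\[\]]+)\]\(([^()]+)\)")

def ruby_to_ja_r(markup: str, plain: bool = False):
    """Return a {{ja-r}} call for simple bracket markup, or None if unsure."""
    base = _PAIR.sub(r"\1", markup)     # keep only the base text
    reading = _PAIR.sub(r"\2", markup)  # keep only the readings
    # Leftover brackets, braces or pipes mean links or nesting: skip those.
    if re.search(r"[\[\]{}|]", base + reading):
        return None
    # plain=True asks for unlinked text without romanization.
    suffix = "|linkto=-|tr=-" if plain else ""
    return "{{ja-r|" + base + "|" + reading + suffix + "}}"
```

Under these assumptions, `ruby_to_ja_r("[振](ふ)り[仮](が)[名](な)")` gives `{{ja-r|振り仮名|ふりがな}}`. It does not check that {{ja-r}} can actually align the kana with the kanji, so cases with arbitrary ruby (a reading that spells a different word) would still need to stay on {{ruby}}.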
@Theknightwho, my usability concern is not about editing, it's about reading, and about accessing the text as it is rendered in the browser.
  • On the reading side, things like 漢方(かんぽう) (kanpō) are visually unclear to anyone not already somewhat familiar with Japanese typography -- it looks like the entire block of kanji + furigana together is the Japanese "word", ruby and all, when in fact the Japanese term is 漢方. Even if a reader understands that the ruby text is not actually part of this term, the kana are only useful for someone who already knows how to read kana. The kana are also superfluous, as we already include a romanization, which provides the same information just in a different script.
  • In terms of the accessibility of the rendered text, for reasons obscure to me, the <ruby> element in the HTML seems to render the Japanese term un-copyable. If I select the text "things like 漢方(かんぽう) (kanpō) are" as rendered, and hit CTRL+C and then try to paste that somewhere, I only get "things like (kanpō) are" -- the Japanese text itself is missing entirely. Meanwhile, if I select the text "the Japanese term is 漢方." and do the same, I get "the Japanese term is 漢方." -- the pasted text includes everything I expected.
I'm curious, why do you think we should use ruby more? ‑‑ Eiríkr Útlendi │Tala við mig 23:00, 15 August 2023 (UTC)[reply]
@Eirikr I think realistically, most Japanese entries are going to be used by people already familiar with Japanese enough to know what the function of the rubytext is. Although we’re a dictionary in English, that doesn’t change the reality that most dictionary entries are of little use to a complete novice.
You’re right about there being an issue from a copy and paste point of view, and it’s something that it would be good to solve if at all possible. I’m sure there is a solution, but I’d need to look into it. Theknightwho (talk) 23:07, 15 August 2023 (UTC)[reply]
Also, just adding that the rubytext does actually serve an additional purpose to the romanisation, as it shows the reading for each kanji; romanisation can’t do that (unless we used rubytext for that instead, which I don’t think would be very helpful as it wouldn’t show semantic word breaks). Theknightwho (talk) 23:14, 15 August 2023 (UTC)[reply]
If folks are familiar enough with Japanese to where they understand both kana and how furigana (kana used as ruby text) work, then they also have some idea of how Japanese phonemes break down, and how kanji readings work -- so again, furigana wind up largely superfluous to the only audience that knows how to use them.
I really think we (speaking generally) get too caught up in technical details and the coolness factor, and lose sight of usability and usefulness. Outside of those manga-esque cases where the spelling and the intended reading are really orthogonal, like 騎士(ナイト) (naito), I honestly don't think that furigana are useful enough to offset the negative impacts on usability.
... One idea occurs to me. Is there any easy way of toggling ruby display on and off? Thinking further, would there be any way of indicating in the wikicode if ruby is really needed (as in the 騎士(ナイト) (naito) example, otherwise anyone who can read Japanese that looks at 騎士 would expect to read it as kishi), or if the ruby is optional (such as when the ruby just indicates the regular reading of a given spelling)? ‑‑ Eiríkr Útlendi │Tala við mig 19:51, 16 August 2023 (UTC)[reply]
@Eirikr I know next to nothing about Japanese but I can see how ruby text is useful. For example, I can read Cyrillic but I don't know the ins and outs of irregular pronunciations in Russian; in cases like that we show a respelling in Cyrillic as well as give the IPA, and I think the Cyrillic respelling is useful. I imagine there are plenty of Japanese learners who will be able to read Hiragana (it's probably one of the first things taught) but have difficulty with Kanji (keep in mind it takes around 10 years for native speakers to learn to read and write Kanji, and probably only a few weeks to learn Hiragana). Benwing2 (talk) 20:30, 16 August 2023 (UTC)[reply]
I was afraid of some confusion, and indeed, here we have it.  :)
Speaking specifically about ruby for Japanese -- I grant that there are plenty of other use cases in other languages. By no means do I advocate for getting rid of {{ruby}}. I'm looking solely at the use case for {{ruby/ja}} and redirects.
→ For Japanese itself, how is ruby using kana any more useful than simply providing the romanization in parentheses? ‑‑ Eiríkr Útlendi │Tala við mig 20:58, 16 August 2023 (UTC)[reply]
@Eirikr The ruby text seems to allow for convenient markup of running Japanese text without interrupting the flow; putting romanizations in parens in the middle of a sentence would interrupt the flow, which is why it gets added at the end. I could imagine putting romanization in ruby text but it seems that isn't conventional. Benwing2 (talk) 21:05, 16 August 2023 (UTC)[reply]
{{usex|ja}} and {{ja-usex}} put romanization afterwards, not mid-text. I can't think of any case where a romanization would be inserted in the middle of an otherwise-running Japanese text.
  • {{usex|ja|これは見本です。|This is an example.|tr=Kore wa mihon desu.}}
これは見本です。
Kore wa mihon desu.
This is an example.
  • {{ja-usex|これは見本です。|これ は みほん です。|This is an example.}}
これは見本(みほん)です。
Kore wa mihon desu.
This is an example.
We could also leverage {{ja-r}}.
  • {{ja-r|これは見本です。|^これ は みほん です。|This is an example.|linkto=-}}
これは見本(みほん)です。 (Kore wa mihon desu., This is an example.)
In terms of the wikicode used to call the templates, I'd argue that {{ruby/ja}} is more of a mess, and the syntax is confusingly different from the rest of our Japanese infrastructure.
From the markup example on the Module:ja-ruby page (what {{ruby/ja}} actually invokes):
  • [[振る|[振](ふ)り]][[仮名|[仮](が)[名](な)]]
Yuck. Granted, part of the problem here is borderline link abuse, but by way of comparison, we could use {{ja-r}} to similar effect, with a more straightforward syntax:
  • {{ja-r|[[振る|振り]][[仮名]]|ふりがな}}
()仮名(がな) (furigana)
Separately, in looking for examples of {{lj}} just now, I'm finding cases where {{lj}} seems to have been used as a replacement for {{lang|ja}} -- there are no ruby characters provided. See this snippet of the wikicode source at 会う#Japanese, for instance:
  • {{quote-book|ja|year=1923|author={{lj|夢野久作}}|title={{lj|約束}}}}
This kind of template misuse should probably be cleaned up as part of this overhaul. ‑‑ Eiríkr Útlendi │Tala við mig 22:32, 16 August 2023 (UTC)[reply]
@Eirikr I completely agree. BTW I have added language prefix support to the quote-* templates, so you can just write this:
{{quote-book|ja|year=1923|author=ja:夢野久作|title=ja:約束}}
and it produces this:
1923, 夢野久作, 約束:
which ought to be the same as (ab)using {{lj}}. Benwing2 (talk) 02:13, 17 August 2023 (UTC)[reply]
Brilliant, thank you! That looks to be much more elegant of a solution. Cheers! ‑‑ Eiríkr Útlendi │Tala við mig 16:47, 17 August 2023 (UTC)[reply]
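For the cleanup side, the misuse pattern above could be rewritten mechanically. The following Python is a sketch only: it assumes that {{lj}} appears directly as the whole value of |author= or |title=, and it only touches calls with no ruby brackets, nested templates or pipes inside. The function name is hypothetical.

```python
import re

# {{lj|TEXT}} as the whole value of author= or title=, where TEXT contains
# no ruby brackets, nested templates or pipes.
_LJ_PLAIN = re.compile(r"\|(author|title)=\{\{lj\|([^\[\]{}|]+)\}\}")

def lj_to_prefix(wikitext: str) -> str:
    """Rewrite |author={{lj|X}} as |author=ja:X inside quotation templates."""
    return _LJ_PLAIN.sub(r"|\1=ja:\2", wikitext)
```

Calls that do carry ruby markup are left alone, since the language prefix says nothing about ruby.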
@Eirikr For reference, on MacOS, using Chrome, when I copy the text with Ruby in it and paste it into TextEdit I get this:
things like 漢方
かんぽう
(kanpō) are
The same thing happens using Safari, which suggests it's an OS issue, although possibly there are carriage returns in the underlying text that are leading to this. Benwing2 (talk) 23:10, 15 August 2023 (UTC)[reply]
Geez. I asked this one in February and again in March to update the documentation of Module:languages/data/2 for the "generate_forms" stuff that is otherwise largely unexplained. With the promise "I'll add it shortly" half a century passed and the documentation is still nowhere to find. Now he suddenly jumps out and complains how Japanese does not follow the Chinese model... -- Huhu9001 (talk) 01:54, 16 August 2023 (UTC)[reply]
@Huhu9001 Not interested in your drama. Theknightwho (talk) 04:02, 16 August 2023 (UTC)[reply]
"Eliminating them both": does "both" mean t:ruby/ja and t:ruby, or t:lj and t:jaru? -- Huhu9001 (talk) 08:36, 15 August 2023 (UTC)[reply]
@Huhu9001, Eirikr My original proposal was to rewrite {{lj}} and {{jaru}} into {{rja}} as a shortcut for {{ruby/ja}}, but given what Eirikr says, maybe we don't need either of them, or {{ruby/ja}} for that matter. It sounds like maybe the best thing is for {{ruby}} to take a language code and use it to wrap the generated text appropriately, and to simply use {{ruby|ja|FOO}} when you really need to display arbitrary ruby that can't be handled by {{ja-r}}. Then we can get rid of {{ruby/ja}} and its shortcuts. Thoughts? Benwing2 (talk) 19:42, 15 August 2023 (UTC)[reply]
That sounds like a saner approach (using something like {{ruby|ja|FOO}}), but I say this in ignorance of the implementation details. ‑‑ Eiríkr Útlendi │Tala við mig 20:36, 15 August 2023 (UTC)[reply]
T:ruby sometimes serves to prevent double wrapping of language HTML classes, mainly in |title= or |chapter= of quotation templates, like this one |title={{lw|ko|s:님의 침묵/생의 예술|{{ruby|[生](생)의 [藝](예)[術](술)}}|tr=Saeng-ui yesul}} in .
If anyone wants to get rid of t:ruby and replace it with t:ja-r entirely, that could mean you will have to type {{ja-r|.....|linkto=-|tr=-}} every time you want just pure text but nothing else. -- Huhu9001 (talk) 01:54, 16 August 2023 (UTC)[reply]
@Huhu9001 We seriously need to avoid having to wrap one template in another. Maybe we need to make {{ruby}} smarter so that it can handle cases like the one above. Can you enumerate other cases where {{ruby}} gets wrapped in another template, or vice-versa, that can't simply be replaced by the equivalent of {{lang|FOO|{{ruby|...}}}}? Benwing2 (talk) 03:43, 16 August 2023 (UTC)[reply]
There are some cases when you want to ruby only a part of text. Then it can be done like: {{lang|LANG|unrubied text, blahblah, {{ruby|somehow rubied text}}, more blahblah}}. One such usage is in 閣下. -- Huhu9001 (talk) 04:14, 16 August 2023 (UTC)[reply]
@Huhu9001 Assuming that {{ruby}} is modified to do lang markup, why can't you just wrap the whole text in {{ruby}} and only annotate the portion of text you want the Ruby stuff added to? Benwing2 (talk) 04:26, 16 August 2023 (UTC)[reply]
  1. Sometimes wrapping the whole text produces repetition. It may become:
    {{lang|LANG|unrubied text, blahblah, {{furigana|text to be rubied|ruby}}, more blahblah}} vs
    {{furigana|LANG|unrubied text, blahblah, text to be rubied, more blahblah|unrubied text, blahblah, ruby, more blahblah}}
  2. Sometimes the wrap is not t:lang. It may be t:quote, like when you have {{quote|ja|text=(ruby text)...}}. How do you "wrap the whole text" for this one?
-- Huhu9001 (talk) 05:05, 16 August 2023 (UTC)[reply]
@Huhu9001 You need to think outside the box a bit. For #1, we're talking for the moment about {{ruby}} not {{furigana}}, but {{furigana}} can be made smarter like {{ruby}} is, so that you can annotate part of the text. For #2, {{quote}} should be modified not to language-tag text that already is language-tagged, so it's OK to write {{ruby|ja|...}} inside of {{quote}}; and/or we make a ruby-quote template, similar to how we already have {{ja-x}}; and/or we add built-in support to {{quote}} for ruby text. In general, having to manually wrap using both {{lang}} and {{ruby}} inside of each other is super ugly and should be avoided. Benwing2 (talk) 05:22, 16 August 2023 (UTC)[reply]
@Huhu9001 What I'm probably going to do is modify {{ruby}} so it takes a language param, but you can write {{ruby|-|...}} to force no language wrapping, so that if you really want to embed one template in another, you can do it without fear. Benwing2 (talk) 05:27, 16 August 2023 (UTC)[reply]
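The proposed behaviour can be illustrated with a small Python sketch. This is not the actual module code: the function name, the bare `<span lang="...">` wrapper (simpler than what {{lang}} really emits) and the exact HTML are assumptions made for illustration.

```python
import html
import re

# One ruby pair in the bracket markup: [base](reading)
_PAIR = re.compile(r"\[([^\[\]]+)\]\(([^()]+)\)")

def render_ruby(lang: str, markup: str) -> str:
    """Render [base](reading) markup as HTML ruby, optionally language-tagged."""
    escaped = html.escape(markup, quote=False)
    body = _PAIR.sub(r"<ruby>\1<rp>(</rp><rt>\2</rt><rp>)</rp></ruby>", escaped)
    if lang == "-":
        # Caller opted out: an enclosing template is expected to add the tag.
        return body
    return f'<span lang="{html.escape(lang)}">{body}</span>'
```

With `"-"` the output can be nested inside another template's language wrapper without double tagging; with a real code such as `"ja"` the call stands on its own and unannotated text in the same argument is simply passed through.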
I do not think inside or outside the box. I just tell you the current situation because you asked. -- Huhu9001 (talk) 05:40, 16 August 2023 (UTC)[reply]
I don't see any real need for {{lj}} or {{jaru}}, but I haven't looked at any current uses. It seems to me that {{ruby|ja|}} should suffice. As regards e.g. {{ja-r}}, it makes sense to me to use hiragana ruby with kanji, as this is fairly commonly done in Japanese-learning materials. It seems to me (again naively, without having done any specific research into the question) that users are likely to include a fair number of Japanese language learners. Cnilep (talk) 01:49, 16 August 2023 (UTC)[reply]

We state that Commonwealth of the Bahamas is the official name; however, I created the alternative form Commonwealth of The Bahamas, which Wikipedia states is the official name, providing a reference to “The Constitution of the Commonwealth of The Bahamas”. Per this, I capitalized the at Bahamas (i.e., “Official name: Commonwealth of The Bahamas”). Which should be the main entry? J3133 (talk) 06:15, 27 August 2023 (UTC)[reply]

Translingual.

Asked to request a move because this is not attested as 'translingual', being so far found only in Northern Thai. I don't know why we'd want to request a move rather than just fixing the section header. kwami (talk) 14:14, 27 August 2023 (UTC)[reply]

See WT:Beer parlour/2023/July, especially Chuck Entz's reply of 01:32, 17 July 2023 (UTC) in WT:Beer parlour/2023/July#Translinguality of Characters in Thai Block. --RichardW57 (talk) 07:18, 29 August 2023 (UTC)[reply]
@Kwamikagami: The analogy is that we would be moving อ‍ย#Translingual to อ‍ย#Northern Thai - and such things are ultimately likely to become pages of their own. --RichardW57m (talk) 08:43, 29 August 2023 (UTC)[reply]
If that's what this forum is for, then sure. I thought 'moving' meant renaming pages. kwami (talk) 10:14, 29 August 2023 (UTC)[reply]
That's why this is a test case, to confirm that it is indeed an appropriate forum for such changes as @Chuck Entz advised. For precedent, see #busy above, whose main-space alerting template is {{rfm-sense}}. --RichardW57m (talk) 12:21, 29 August 2023 (UTC)[reply]

2023 — September

Oriya (or) -> Odia (or)

Previous discussions on the matter:

Same issue, Oriya should be renamed to Odia (same thing applies to the script). According to multiple discussions, our language names are based on common English usage, and it looks like the consensus is Odia as of late. I'll copy-paste what I said last time:

"Glottolog, Collins Dictionary, Wikidata, Wikimedia, and the first instance of "Odia" or "O[r]iya" on the language's own Wiktionary seen at the top here all have made the change to and use "Odia" as the (primary) spelling as well. Also, see this discussion on the English Wikipedia about the matter, with the eventual decision being to move to "Odia" based on the rapidly increasing usage at the time 6 years ago (which is bound to have increased by now). [...] More sources: Cambridge Dictionary, the Oxford English-Odia Dictionary (ISBN: 9780199474554), Microsoft, Google (another Google source), the Concise Oxford Dictionary of Linguistics, and multiple Wikimedia blog posts made by Odia natives which label Odia wiki projects using "Odia".

The main primary source, dictionary-wise, that I've been able to find in support of "Oriya" is the OED, but even in the old Lexico dictionary they preferred "Odia". Dictionary.com lemmatizes both, but interestingly enough only puts the name of the ethnic group at "Odia". Ethnologue also prefers "Odia" for the name of the language. As stated two years ago, it feels very strange that we're very much in the minority that still uses the old name. CC: @Benwing2 AG202 (talk) 15:09, 6 September 2023 (UTC)[reply]

Support. Common name for it in English. CitationsFreak (talk) 15:16, 6 September 2023 (UTC)[reply]
Support. I don't think the OED is that relevant in this case, their wheels turn slowly at the best of times and their last citation is from 2001. —Al-Muqanna المقنع (talk) 15:42, 6 September 2023 (UTC)[reply]
I think the rename from Oriya -> Odia was a bit pointless but I won't oppose a rename on common-usage grounds. Benwing2 (talk) 23:39, 6 September 2023 (UTC)[reply]
Support per above, this change is well supported by sources. MSG17 (talk) 00:16, 7 September 2023 (UTC)[reply]
Oppose I now prefer the term 'Odious', but am prepared to compromise and use the term 'Oriya' for the script. Note that the official character names all contain 'ORIYA'. --RichardW57 (talk) 10:26, 9 September 2023 (UTC)[reply]
Unicode is notoriously slow to update names if that’s what you’re referencing, but I can’t take the first point seriously. AG202 (talk) 23:49, 9 September 2023 (UTC)[reply]
Unicode character names are immutable. The closest they come is aliases, as when it was pointed out that several pairs of Lao character names were the wrong way round. --RichardW57 (talk) 18:08, 10 September 2023 (UTC)[reply]
So, they don't change with the times? CitationsFreak (talk) 21:47, 10 September 2023 (UTC)[reply]
@CitationsFreak They’re never changed, even when they’re completely wrong. It’s about ensuring backwards compatibility. Theknightwho (talk) 02:45, 18 September 2023 (UTC)[reply]
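The point about character names is easy to check with Python's standard `unicodedata` module, which exposes the formal names. The helper name below is made up for illustration.

```python
import unicodedata

def script_word(char: str) -> str:
    """Return the first word of a character's formal Unicode name."""
    return unicodedata.name(char).split()[0]

# U+0B05 is the first independent vowel of the script.
print(unicodedata.name("\u0B05"))  # ORIYA LETTER A
```

The formal name still says ORIYA regardless of which name a project uses for the language or script.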
 Done. Benwing2 (talk) 04:28, 1 November 2023 (UTC)[reply]

I'd do it myself, but there are some translations to deal with, all of which seem to be plural. Once we've singularized them it's a piece of cake. Jewle V (talk) 15:30, 6 September 2023 (UTC)[reply]

@Jewle V: If the singulars are attested. It could be as non-trivial as singularising poultry and cattle. --RichardW57 (talk) 10:53, 9 September 2023 (UTC)[reply]
The singular is certainly possible and probably easy to attest, I could easily imagine someone saying or writing “I put my right running shoe on before my left one”. The real issue is that the entirety of the definition and translations should be at running shoe and running shoes should be left as a stub entry, simply defined as ‘plural of running shoe’ as that’s how we normally handle plurals. —-Overlordnat1 (talk) 11:28, 9 September 2023 (UTC)[reply]
I would compare/contrast this with glass / glasses. A glass would not refer to half of glasses ("spectacles") or pair of glasses in current speech. Other similar terms are scissors, pants. Thus, there is some reason to have the main entry at running shoe. OTOH, one could easily define running shoe as "one of a pair of running shoes" (finessing the fact that running shoes in the definition means "plural of running shoe"). I don't think we need to be stingily "consistent" in our definitions with only one lemma for running shoe and running shoes. DCDuring (talk) 15:35, 9 September 2023 (UTC)[reply]
The issue is not whether the singular exists in English. That is not a problem. The question is what happens in other languages. Possibly it's no worse than having to enter singulative and lemma form as for Welsh for mouse. My conceptual model is where the base foreign word means 'footwear', and additional words are needed to express a single shoe. --RichardW57 (talk) 18:33, 10 September 2023 (UTC)[reply]
Uncountable footwear? DCDuring (talk) 19:03, 10 September 2023 (UTC)[reply]
I agree that there’s no such thing as uncountable footwear but if you look at Category:en:Footwear then you’ll find 12-14 things listed as plural only and 6-7 as normally plural. The plural only entries are as follows: Birks, daisy roots, dubes (perhaps this should be capitalised like Birks?), elastic sides, gamashes, heels, hold-ups, Jesus boots, purser’s crabs, rubber shoes, studs and street shoes with uppers and wellies somewhat unclearly categorised but arguably in this group. The normally plural entries are as follows: basketball shoes, bobby socks, Doc Martens, oversocks, seven-league boots and Wellington boots (with wellies somewhat ambiguously labelled). Notice that Jimmy Choos isn’t treated as normally plural (like Doc Martens is) or always plural (like Birks and dubes are), so we have inconsistency between brands too! --Overlordnat1 (talk) 23:33, 10 September 2023 (UTC)[reply]
@Overlordnat: In English footwear is uncountable. The inconsistencies may reflect real usage differences. Interference from much more common senses of the singular may prevent the singular from being successfully used. This applies to glasses and heels, I think. There might be a different interference involving the brandname slangs/shortenings, like Dubes, Birks. Doc Martens involves a terminal s providing an additional reason to give it a plural interpretation. The most generic terms, like sock/socks, shoe/shoes, boot/boots, legging/leggings may all be used similarly (a rebuttable presumption). The longer terms may be relatively uncommon in the singular because, say, oversock-ness might not come up in the kind of discourse where left and right do. DCDuring (talk) 01:46, 11 September 2023 (UTC)[reply]
@RichardW57: I was interested in whether the foreign words that mean "footwear" were uncountable in their languages. What would that tell us about running shoe and running shoes? DCDuring (talk) 01:46, 11 September 2023 (UTC)[reply]
@DCDuring: a corollary to that is whether such words are required to have the dual number in languages that allow it, since human bilateral symmetry makes that the most natural way to characterize sets of footwear. That might help distinguish plural-only as an indeterminate amount vs. a specific number that the grammar just doesn't express. Chuck Entz (talk) 02:22, 11 September 2023 (UTC)[reply]
Doesn’t really matter what happens in other languages, as the interpretation of the terms mentioned is specific to the grammar of the language rather than the lemma wherein we refer thereto: which number the unmarked form is assigned. Hence organisms that are not usually individualized default to الْجَمْع (al-jamʕ, plural; collective) in Arabic: دُلْب (dulb, plane) may actually be multiple plane trees but دُلْبَة (dulba) is a single one, and دُلْبَات (dulbāt), plural in form, is a few, paucal in sense, and I don’t need to say anything specific at plane. This generally applies to shoes: جَزْمَة (jazma, boots, shoes) is collective; native speakers use to be embarrassed when I ask them how they call a single shoe then, as it would be supposedly useless to have a single shoe; the plurals given in the entry are for the universal sorter. Unable to deploy this inside baseball, intuitively understood by but never explicitly formulated to language users, we simply translate citation forms with citation forms. Fay Freak (talk) 03:18, 11 September 2023 (UTC)[reply]
It's too bad that most normal English users wouldn't recognize "dual" in the relevant sense, so that it could be used in a label. Maybe we could have a generic usage note (preferably templatized and not too long) for the various terms that are often used in English pair of expressions, that refers to the existence of dual as a case in other languages. (Would a similar approach be useful for English uncountable terms, explaining common use in expressions like '[amount] of X' (eg, '2 pints of milk') and '[container] of X' (eg, 'glass of milk', 'vast sea of milk')?). DCDuring (talk) 14:13, 11 September 2023 (UTC)[reply]

Because moving a single page and renaming a language are actually very different processes. Jewle V (talk) 16:09, 6 September 2023 (UTC)[reply]

Not really: Moving a language entails moving all category pages in that language and/or merging all pages with pages of another language. Thadh (talk) 16:18, 6 September 2023 (UTC)[reply]
@Jewle V, Thadh: I'm pretty sure that when Jewle V wrote renaming a language, he meant changing the language. In this case:
  1. The request should be to split off "Wiktionary:Requests for changing the language".
  2. Thadh's remark is beside the point.
Notifying the other interested parties: @Kwamikagami, Benwing2, Chuck Entz.
If we do make the split to RFM, what should the page name and template be? I suggest WT:RFL and {{rfl}}. I suggest this new template record current and proposed, existing language code, and be applicable to both language sections and senses. --RichardW57 (talk) 10:47, 9 September 2023 (UTC)[reply]
If we make the split, I think there'll be a flood of requests (some in batches of about 50) over a few months and then things will go quiet. --RichardW57 (talk) 10:57, 9 September 2023 (UTC)[reply]
Huh? How is my remark beside the point? Say you rename the Foo language to the Bar language. You'll have to move CAT:Foo lemmas to CAT:Bar lemmas. If you merge the Foo and Bar languages, you'll have to merge CAT:Foo lemmas and CAT:Bar lemmas. If you instead split the Foobar language into the Foo and Bar languages, you split CAT:Foobar lemmas into CAT:Foo lemmas and CAT:Bar lemmas. Thadh (talk) 14:10, 9 September 2023 (UTC)[reply]
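The three cases in the comment above (rename, merge, split) can be sketched as a list of category moves. This is only an illustrative Python sketch: the function name `plan_category_moves`, the `kind` parameter and the merged name "Foobar" used in the example are assumptions made here, only the "lemmas" category from the comment is modelled, and it is not how any Wiktionary module or bot actually works.

```python
def plan_category_moves(operation, sources, targets, kind="lemmas"):
    """Return (old_category, new_category) pairs for a language operation.

    Hypothetical helper: 'rename' is one-to-one, 'merge' is many-to-one,
    'split' is one-to-many. Only a single category kind is modelled.
    """
    # Expected number of source/target languages; None means "any number".
    expected = {
        "rename": (1, 1),
        "merge": (None, 1),
        "split": (1, None),
    }
    if operation not in expected:
        raise ValueError(f"unknown operation: {operation!r}")
    want_sources, want_targets = expected[operation]
    if want_sources is not None and len(sources) != want_sources:
        raise ValueError(f"{operation} needs exactly {want_sources} source language(s)")
    if want_targets is not None and len(targets) != want_targets:
        raise ValueError(f"{operation} needs exactly {want_targets} target language(s)")
    # Every source category feeds every target category.
    return [
        (f"CAT:{source} {kind}", f"CAT:{target} {kind}")
        for source in sources
        for target in targets
    ]


# The examples from the comment; "Foobar" as the merged name is an assumption.
print(plan_category_moves("rename", ["Foo"], ["Bar"]))
print(plan_category_moves("merge", ["Foo", "Bar"], ["Foobar"]))
print(plan_category_moves("split", ["Foobar"], ["Foo", "Bar"]))
```

In all three cases the work is expressed as category moves, which is the point the comment makes; a real split would additionally need someone to decide which entries go to which target.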
@Thadh: Because I don't think that is what @Jewle V was talking about. If it was, we are not 'in this case', and your remark is not beside the point. --RichardW57 (talk) 23:17, 9 September 2023 (UTC)[reply]
@Jewle_V, Chuck Entz: We've already had some of this discussion at #อ‍ย above.
One big difference between moving a page and changing the language is that links instantly completely break. With moving a page, hard redirects may remain. --RichardW57 (talk) 10:31, 9 September 2023 (UTC)[reply]
A bit of history: before Lua, language codes were handled by templates named after them, so you used {{en}} where we now use other templates with a language code parameter. In those days, changing the language Foo with language code xyz to the language Bar with the language code xzy involved actually moving the [[Template:xyz]] template to [[Template:xzy]]. After we retired all the language-code templates (which freed up an enormous number of two-letter and three-letter template names and aliases, by the way), we still kept language-code and language-name matters at rfm because none of the existing venues made sense as an alternative.
It comes close to making sense, since renaming a language could be interpreted as moving the content into new language sections, categories, etc., and changing a language code means moving the templates and modules named after it to new names. When we decide that varieties of a given language are really independent languages, that's indeed splitting the old language, and deciding that a separate language is really just part of another one is indeed merging the two languages.
Renaming rfm would be a really bad idea, since we do lots of things here, like moving, merging and splitting of entries, that have nothing to do with language names or codes. If we decide to split off this function, there are a few possibilities I can think of for naming of the new page:
  1. Looking at what kind of places are symbolic of interaction between sovereign international entities in the same way that beer parlour and tea room are symbolic of gathering for discussions (maybe an embassy or the UN?), or of interaction between people of different languages and/or nationalities.
  2. Basing it somehow on Language treatment, which is where these matters get documented.
  3. Coming up with a name based on a literal explanation of the function of the new page in the same way all the RFD, RFM and RFV pages are named. "Requests for renaming language" doesn't work, since the function involves specific languages rather than language as a whole, and it involves a lot more than just changing language names.
Chuck Entz (talk) 18:58, 9 September 2023 (UTC)[reply]
And a subpage? Fay Freak (talk) 03:28, 11 September 2023 (UTC)[reply]

The same sense is in both places; it shouldn't be. Either leave it at hinge where it has a usage note about the "(up)on", or move it to hinge upon and reduce the relevant sense-line at hinge to a {{used in phrasal verbs|en|hinge on}} pointer. - -sche (discuss) 01:16, 7 September 2023 (UTC)[reply]

'Druther have a redirect from hinge on to hinge. Personally, I am loath to assert that hinge on is a phrasal verb. If it really is one, we should have an entry for it.
OTOH, MWOnline has entries for both hinge#Verb and hinge on. They have a definition that says "used with (up)on" and other dictionaries (not just idioms/phrasal-verbs dictionaries) have hinge on entries. DCDuring (talk) 20:20, 7 September 2023 (UTC)[reply]

There are two senses for "To agree as a second person to", which are even acknowledged in the entry itself. One's enough, I guess. Jewle V (talk) 12:35, 9 September 2023 (UTC)[reply]

I would vote for putting the sense under the top one, because that's where people will look for it. Additionally, even if the true etymology is from Latin, it's certainly not widely seen that way, since people will say "I second this" ... "I third this", and so on, rather than using whatever the appropriate Latin word would be. Soap 16:53, 9 September 2023 (UTC)[reply]
All senses are from Latin, displacing native twoth. The problem here is that the "agree" sense is partly (and originally) from Etymology 3, and partly rederived from Etymology 1. Rather than reduplicating the sense a better solution would probably be to stick to one section and note the reinforcement in the etymology. —Al-Muqanna المقنع (talk) 18:24, 9 September 2023 (UTC)[reply]
Rather, it muscled in on the territory of native other, for which at Wiktionary we have to go back to Old English ōþer. I can only find twoth as part of a compound ordinal, which is a new function for the meaning 'second'. --RichardW57 (talk) 23:44, 9 September 2023 (UTC)[reply]

They're the same word. While you're at it, Middle English porthors, which I just sloppily created to house the Chaucer quote, needs a tidy too. This, that and the other (talk) 05:37, 14 September 2023 (UTC)[reply]

Hope - valleys

There are two senses which are probably the same. Etymology 3 - A hollow; a valley, especially the upper end of a narrow mountain valley when it is nearly encircled by smooth, green slopes; a combe. and Etymology 4 - A sloping plain between mountain ridges Jewle V (talk) 18:48, 14 September 2023 (UTC)[reply]

English. To be moved to be going to. Most modern grammars and some dictionaries treat be going to (" ~ will") as an idiom. The inflection line for [[going to]] has long been "(be) going to". Edit summaries show that contributors here have thought the expression included be. I can't think of another copula that could substitute for be. I also can't picture anything other than adverbials like yet, still, later, and some other short temporal adverbs (with or without not) appearing between be and going to. IOW, it's close to being a set phrase. The adverbial insertions would look good in some of the usage examples. DCDuring (talk) 18:51, 14 September 2023 (UTC)[reply]

Support. (Though it occurs to me that the pronunciations and "gonna" altforms will need to be handled if it's moved.) —Al-Muqanna المقنع (talk) 19:53, 14 September 2023 (UTC)[reply]
Indeed. DCDuring (talk) 23:52, 14 September 2023 (UTC)[reply]
I think colloquially the "be" can be elided (especially when there's pro-drop), so it would be best to keep it at going to IMO. — justin(r)leung (t...) | c=› } 17:36, 19 September 2023 (UTC)[reply]
Yeah, I did wonder about that after writing the above and apparently in Early Modern usage (according to the chapter I cited as a source for the etymology) it appears without be as well. So I'm not sure. It might be worth first collecting attestations without "be" to verify how to treat the elided use (e.g. it could be moved and the elided version turned into an informal altform). —Al-Muqanna المقنع (talk) 18:08, 19 September 2023 (UTC)[reply]
Aren't our normal users better off with a main entry at an unelided form with redirects from any elided forms? In this case, were we to have an additional entry at going to, we should have at least three definitions at going to, to wit, 1. "elided form of be going to"; 2. Used other than figuratively or idiomatically: see going, to.; 2.1 "moving toward" (subsenses for 2.1.1 progressive verb, 2.1.2, for gerund?); and, possibly, 2.2, et seq. DCDuring (talk) 22:11, 19 September 2023 (UTC)[reply]
I think I'm thinking of the "be elision" kind of as "be elision" in other cases, such as with "be" + adjective. In other words, the "be" is not core to the construction. But perhaps this is debatable. — justin(r)leung (t...) | c=› } 02:49, 20 September 2023 (UTC)[reply]
I would consider this expression to be distinguishable from, say, about to/be about to because there are copulative verbs that can occupy the be slot (seem and the other perception copulas.). DCDuring (talk) 12:53, 20 September 2023 (UTC)[reply]

Donut is common in American English while doughnut is common in British English. Both are considered valid spellings. Therefore, a merge is unnecessary. — This unsigned comment was added by Netizen3102 (talk • contribs) at 06:30, 15 September 2023 (UTC).

This was first proposed some two and a half years ago above at #donut, doughnut but no one has discussed it. It wouldn't be a redirect, of course, just a reduction of donut to {{alternative spelling of|doughnut}} instead of having two full-fledged entries saying the same thing. Merging seems like a good idea to me. —Mahāgaja · talk 07:13, 15 September 2023 (UTC)[reply]
The donut spelling was probably popularized by the Dunkin' Donuts chain simply because it fit better on the sign. For what it's worth, the Garfield strip text search has 126 donuts and 38 doughnuts, with a clear trend towards the shorter spelling in more recent strips. ngrams shows doughnut ahead, but not by much. Soap 14:58, 18 September 2023 (UTC)[reply]
There are a few terms like donut hole (the US health insurance thing) that will almost always occur with just one of the two spellings, but I agree this is best handled with a merge, whichever spelling we should choose to stabilize on. Soap 15:00, 18 September 2023 (UTC)[reply]
I never knew about the Garfield text database, that's awesome. It's interesting to see that in the 1980s and '90s, doughnut was more common, though donut also occurred, but since 1997 he has used the spelling donut exclusively. —Mahāgaja · talk 15:06, 18 September 2023 (UTC)[reply]
I should point out that while I support a merge, I can see a good argument for each of them being the preferred spelling, so I don't think we should start the discussion on the presumption that doughnut will be the winning target. With usage being so nearly equal, this is a difficult decision to make. Soap 15:05, 18 September 2023 (UTC)[reply]
The "doughnut" page is older than the "donut" page. Per policy, this means that the doughnut page must be the main lemma. CitationsFreak (talk) 15:28, 18 September 2023 (UTC)[reply]
Wait what? Is that an actual thing? AG202 (talk) 15:31, 18 September 2023 (UTC)[reply]
Whether policy or practice, that would be a historical thing, designed to harness pondism to encourage more rapid creation of entries while providing a simple rule to avoid conflicts otherwise hard to resolve in the absence of satisfactory corpora. I think it also gave an edge to right-ponders.
The donut spelling may be the future of this term, judging by the number of children's book titles having that spelling. But just looking at Google Books, doughnut otherwise seems to have a narrow frequency edge. I judged by counting the number of Books pages with determiner + [do/dough]nut spellings. Since Google's algorithms are opaque, perhaps this approach is not trustworthy. I didn't look, for example, at COCA or any other corpus. We should probably give more weight to recent (21st century) relative frequency. 15:38, 18 September 2023 (UTC)DCDuring (talk)
@AG202: See WT:AEN#Regional differences. —Al-Muqanna المقنع (talk) 16:13, 18 September 2023 (UTC)[reply]
Note the top of the page: "This is a Wiktionary policy, guideline or common practices page. Specifically it is a policy think tank, working to develop a formal policy." Presumably a formal policy would be voted on. I think more recent practice (also not policy) is to try to use relative frequency in current (actually, recent past) usage to decide on which is the main entry. DCDuring (talk) 17:25, 18 September 2023 (UTC)[reply]
I prefer having the main entry at doughnut not because of frequency but simply because donut is considered a misspelling in some English-speaking countries, while doughnut is not considered a misspelling anywhere. It's similar to realise/realize (and all other -ise/-ize verbs) in that way. —Mahāgaja · talk 17:45, 18 September 2023 (UTC)[reply]
Ah. Cryptoprescriptivism! DCDuring (talk) 18:07, 18 September 2023 (UTC)[reply]
Thanks yeah, I'm glad that's not a formal policy. I think that I'd definitely prefer that it be based on relative frequency. AG202 (talk) 18:14, 18 September 2023 (UTC)[reply]
I want to point out that the spelling doughnut is also an example of U vs. non-U English. It is easy to see how the Smithsonian Magazine writing that Pres. Berdymukhamedov “performed doughnuts in his rally car” does not correspond to the everyday reality to anyone but a bunch of super-rich. Asbos called, as I have found today, “boy racer” in Britain, Tuner- und Autoposerszene by the police in Germany, obviously would be flabbergasted when confronted with thus highbrow a spelling. Not at all laddish to write that somebody “did doughnuts at a cemetery”: we see the journalist, or editorial interference (one reason why I distrust word frequency statistics made by the help of newspapers).
Because we, though championing a heart for the rich, have as much a democratic as an elitarian mindset, I believe it to follow that all should be put at donut, to avoid duplication and misleading labels.
I have seen on the page sely how alternative form pages can also claim that an alternative spelling is particularly or more used in particular meanings, if that is felt necessary. Fay Freak (talk) 12:41, 24 December 2023 (UTC)[reply]
Merged by -sche. J3133 (talk) 03:02, 30 March 2024 (UTC)[reply]

Akan varieties

@-sche This is another mess. Wikipedia has an article Akan languages yet according to both Glottolog and Ethnologue, all varieties are mutually intelligible and better classified as dialects, and indeed we have a single Category:Akan language (code 'ak'). The correct family tree seems to include a top level division into Fante, Twi and Wasa, all of which have ISO 639-3 codes (respectively fat, twi, wss; and Twi has the ISO 639-1 code 'tw' as well). Twi in turn is divided into Asante, Akuapem and Bono. Fante and all three Twi varieties have their own literary standards, and there is also a unified Akan literary standard based primarily on Akuapem. Up until recently, we had {{dialectboiler}} categories for Fante and Twi, called Category:Fante Akan and Category:Twi Akan. I added etym-only varieties for those two as well as for the Twi lects of Asante, Akuapem and Bono. Then I discovered we also have separate languages under Akan for Category:Abron language (= Bono), Category:Wasa language and Tchumbuli (which has no lemmas, and I have no idea what it is). None of these Akan languages have very many lemmas (< 10 each), and as mentioned Tchumbuli has none. I would recommend either we convert Akan into a family and fix up the hierarchy appropriately, or (preferably) we maintain the single Akan language and convert the sublanguages into etym-only varieties. The list of varieties under Category:Akan language is also somewhat messed up (e.g. what is 'Twi-Fante'?), but that is less important. Benwing2 (talk) 18:10, 17 September 2023 (UTC)[reply]
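The hierarchy described in the comment above can be written down as a small tree, which makes it easier to see what "convert the sublanguages into etym-only varieties" would nest under what. The following Python sketch uses only the names and ISO codes given in the comment; the variable names and the `lineage` helper are assumptions made here for illustration, and nothing in it reflects the actual layout of Wiktionary's language data modules.

```python
# Parent -> children, as described in the comment (Fante, Twi, Wasa at the
# top level; Asante, Akuapem, Bono under Twi).
AKAN_TREE = {
    "Akan": ["Fante", "Twi", "Wasa"],
    "Twi": ["Asante", "Akuapem", "Bono"],
}

# ISO 639-3 codes mentioned in the comment (Twi also has ISO 639-1 "tw").
ISO_639_3 = {"Fante": "fat", "Twi": "twi", "Wasa": "wss"}


def lineage(variety, tree=AKAN_TREE, root="Akan"):
    """Return the chain from a variety up to the root, e.g. Bono -> Twi -> Akan."""
    parent_of = {
        child: parent for parent, children in tree.items() for child in children
    }
    chain = [variety]
    while chain[-1] != root:
        # Raises KeyError for a name that is not in the tree at all.
        chain.append(parent_of[chain[-1]])
    return chain


print(lineage("Bono"))
print(lineage("Fante"))
```

Under the preferred option in the comment, every name below "Akan" in this tree would be an etym-only variety of the single language with code 'ak'; under the alternative, "Akan" would be a family and the nodes below it languages.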

Looking into the history (of the codes, on Wiktionary), I think the sub-dialects simply escaped notice at the time Twi, Fante, and Akan were merged. I note that the two Wasa entries we have are identical to Akan, and the Abron ones are very similar. I would merge them; AFAIK the difference was historically in spelling, not in speech, and since the 70s also not anymore in spelling. (I entered the Abron entries a year before the lects were merged, using a reference published two years before the speakers of Abron and the other dialects of Akan unified their orthographies. The Wasa entries were added in 2021 by a Japanese editor, also using an old pre-reform ref, which the user also used for the Akan spelling: we should check what the modern spelling is...) Re "Twi-Fante" being listed as a "variety" of Akan: it was originally listed as an alternative name of Akan; when 'alternative names for the language' and 'names of varieties' were split into being separate parameters, someone must've mis-assigned it. - -sche (discuss) 06:02, 23 September 2023 (UTC)[reply]

English: to '(all) kidding aside', which would be a small change to the existing entry at kidding aside.

IMHO this is an example of a more economical and informative way of presenting information about some multi-word entries. Some MWEs can accept the insertion of one and only one word into the expression. Strictly speaking, there is nothing wrong with our existing approach, except that both forms tend to become, or risk becoming, full entries, as in this case. If we believe this approach is right in this case, I would put the general idea to BP consideration and, if considered necessary, a vote.

Some standardization of approach to optional terms (like this one) and dummy terms (someone, something, someone or something) in English headwords might be worth codifying, either as a policy or as a guideline. Similarly '(be)' might be a desirable element of the headword display when some other copulative verbs could replace be. Also some standardization of the use of hard redirects might be helpful. DCDuring (talk) 18:40, 18 September 2023 (UTC)[reply]

English. to under someone's wing (now a redirect to under one's wing), per recent discussions on use of one for reflexive verbs and other cases where a subject is identical to the referent of one. I don't care whether we would keep the (new) redirect after the move. DCDuring (talk) 00:24, 22 September 2023 (UTC)[reply]

English. I have moved senses (and their translations) between on the way and on one's way, making on the way a lemma with most of the definitions from on one's way rather than an alternative form. It occurs to me that there might be a pondian difference that would account for the previous arrangement. In any event, please review the changes. DCDuring (talk) 15:00, 23 September 2023 (UTC)[reply]

At least some categories which were thought (at the time they were created) to be restricted to a single language exist at "one-off" names, e.g. we have Category:Top-level domain codes and Category:European food additive numbers as direct subcategories of Category:Translingual language, instead of them being named Category:mul:Top-level domain codes and Category:mul:European food additive numbers and being in the set-category system. I have two questions:

  1. If a set category is truly restricted to one language (e.g. Translingual), should we leave it at whatever prefixless name it may have, or move it to "mul:" (or whatever other language code is appropriate) and put it into the "set category" system, even if it only exists for one language?
  2. Do the categories named above actually only exist in one language (Translingual)? Should .გე, .հայ, .한국, etc go in the same category as .de, or would they belong in "ka:Top-level domain codes" (and "hy:", "ko:", etc)?

- -sche (discuss) 23:11, 24 September 2023 (UTC)[reply]

@-sche My current tendency is to only create topic and set categories if they exist (or may exist) for more than one language. However, I think this is probably the wrong thing to do. The poscatboiler system supports language-specific categories like Category:Bulgarian conjugation 2.1 verbs and I don't see why we can't do the same in the topic category system. (BTW the poscatboiler system now handles all categories of all sorts except for topic and set categories. I've been thinking for a while of making it handle topic/set categories as well and eliminate the separate topic category system; this would make it possible to consolidate the generic category code into the poscatboiler system, so there's only one unified category system.) For #2, I'm not really sure, but my instinct is that non-Latin-script top level domains should also be translingual. Note for example that Korea created Korean-specific Latin-script TLDs like .kia, .samsung and .hyundai (see .kr on Wikipedia); if these are translingual I don't see why the Korean-script ones shouldn't be. Benwing2 (talk) 02:22, 26 September 2023 (UTC)[reply]

Yes, English is the primary language in Australia, the US, Britain, etc. but certainly not the only one. So I propose:

Note that Category:Irish slang refers (logically) to the Irish language, not to Irish English. (For some reason there is no Category:Canadian slang or Category:New Zealand slang.) Benwing2 (talk) 02:05, 26 September 2023 (UTC)[reply]

Support This, that and the other (talk) 02:02, 8 November 2023 (UTC)[reply]
Support Mahāgaja · talk 08:12, 8 November 2023 (UTC)[reply]

There are multiple constellation systems, but we only have one category for all constellations - contrast this with Category:Chinese astronomy which is a subcategory of Category:Astronomy. In the label tree there is already {{lb|zh|Chinese constellation}} which categorises into Category:LANG:Chinese astronomy and Category:LANG:Constellations, and therefore makes these categories very messy (see e.g. Category:zh:Constellations where terms ending in are in the European system while the rest are the Chinese ones - I'm in the process of adding more for the latter). Also note that there are still a bunch more that have been incorrectly categorised, e.g. Ox which has {{lb|en|astronomy}} rather than {{lb|en|Chinese constellation}}, so there would be a decent number of terms to warrant a split.

I think we can just make said label categorise into Category:Chinese constellations, which would be a subcategory of Category:Chinese astronomy and Category:Constellations. (technically they can be called asterisms but the distinction isn't really clear so I'll just go with constellation) wpi (talk) 09:19, 28 September 2023 (UTC)[reply]

2023 — October

References:

J3133 (talk) 08:57, 1 October 2023 (UTC)[reply]

This, that and the other (talk) 09:58, 1 October 2023 (UTC)[reply]

We have 5 football codes under Category:Football:

  1. Category:Australian rules football
  2. Category:Canadian football
  3. Category:Gaelic football
  4. Category:Football (American)
  5. Category:Football (soccer)

I imagine this was done this way because in the US, "football" universally refers to American football (or occasionally Canadian football, which is quite similar), and never to soccer (except in the names of certain soccer clubs, which often call themselves "football clubs" (F.C. for short) in imitation of European football clubs). But it looks out of place, and Canada similarly refers to Canadian football as just "football" but our category is Category:Canadian football not Category:Football (Canadian). Wikipedia has its article on American football at American football (logically) and similarly for Commons. BTW once we rename Category:Football (American) to Category:American football, we might consider renaming the soccer category to Category:Association football (consistent with Wikipedia), but that's a separate can of worms. Benwing2 (talk) 04:01, 4 October 2023 (UTC)[reply]

I would think that our contributors could tolerate a lack of parallelism in topical categories where the base terms reflect common usage and the differentia are in parentheses. This seems like overtidying, letting one's own personal preferences for parallelism override broader, user-oriented considerations. The (non)problem only appears in the Category:Football page. DCDuring (talk) 12:16, 4 October 2023 (UTC)[reply]
I would've guessed it was done this way so someone typing "Footba..." into Hotcat (or typing "Category:Footba..." into search) would notice that they needed to specify rather than just using bare "Football". I'm not wedded to the current names, but I don't see a compelling reason to change them, either. - -sche (discuss) 05:56, 3 November 2023 (UTC)[reply]

New language codes for nested Persian translations

Per Wiktionary:Beer_parlour/2023/October#Persian_nested_translations_-_split_or_labelled?

@Sameerhameedy, @Benwing2, @Theknightwho.

New codes and labels, under "Persian" to work with MediaWiki:Gadget-TranslationAdder.js

  1. "prs" - Dari
  2. "fa-cls" - Classical Persian

Considering "fa-ira" for Iranian Persian. Anatoli T. (обсудить/вклад) 05:13, 4 October 2023 (UTC)[reply]

Don't we normally use ISO 3166 codes for countries? I'd say it should be "fa-IR". —Mahāgaja · talk 09:24, 4 October 2023 (UTC)[reply]
@Mahagaja: Not sure what is right in this case but it must have been done.
Both "prs" and "fa-ira" seem already working but {{t+|اَفْغانِسْتان}} fails to link to fa:افغانستان
Since the code is already working (apart from the interwiki links), automatic nesting should be possible as well.
Need to make "fa-ira" link to "fa" Wiktionary, just like "cmn" links to "zh" Wiktionary. {{t+|cmn|阿富汗}} to zh:阿富汗
@Benwing2, @Sameerhameedy, @Theknightwho: can someone please fix the interwiki link? I think it was @Ruakh who made it work for Mandarin. I'll take a look at nesting. Anatoli T. (обсудить/вклад) 00:07, 13 October 2023 (UTC)[reply]
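The behaviour being asked for above (a translation code that links to a differently coded Wiktionary, as "cmn" does with "zh") amounts to a small override table. The Python sketch below is purely illustrative: the names `WIKI_PREFIX_OVERRIDES` and `interwiki_target` are assumptions made here, and it does not describe how the translation templates or the data modules actually store this mapping.

```python
# Translation language code -> Wiktionary edition prefix, for codes whose
# edition differs from the code itself. "cmn" -> "zh" is the existing case
# cited in the comment; "fa-ira" -> "fa" is the requested one.
WIKI_PREFIX_OVERRIDES = {
    "cmn": "zh",
    "fa-ira": "fa",
}


def interwiki_target(lang_code, term):
    """Build the interwiki link target for a translation (hypothetical helper)."""
    # Fall back to the language code itself when there is no override.
    prefix = WIKI_PREFIX_OVERRIDES.get(lang_code, lang_code)
    return f"{prefix}:{term}"


print(interwiki_target("cmn", "阿富汗"))
print(interwiki_target("fa-ira", "افغانستان"))
```

Whether "prs" and "fa-cls" should also point at the "fa" edition is not stated in the discussion, so they are deliberately left out of the table.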
Actually the new codes still don't work with the translation-adder. Some changes to Module:languages/data submodules need to happen. Anatoli T. (обсудить/вклад) 00:19, 13 October 2023 (UTC)[reply]
@Mahagaja: "fa-ira" is correct per Module:etymology_languages/data Anatoli T. (обсудить/вклад) 00:33, 13 October 2023 (UTC)[reply]
Update: @Sameerhameedy: Language code "prs" can now be used for automatic nested translations: Persian\Dari. Just use the language code "prs" in the translation adder but I wasn't able to tweak modules for "fa-ira" or "fa-cls". Anatoli T. (обсудить/вклад) 02:58, 13 October 2023 (UTC)[reply]

Articles with national anthems

Compare Afrikaans Die Stem and English The Call of South Africa with French Marseillaise (instead of La Marseillaise) and English Star-Spangled Banner (instead of The Star-Spangled Banner). J3133 (talk) 14:45, 6 October 2023 (UTC)[reply]

English. To to someone's taste.

Don't or shouldn't we restrict the use of one's to reflexive definitions? DCDuring (talk) 21:35, 11 October 2023 (UTC)[reply]

Reflexivity is a property of verbs. This restriction appears to be the practice for verbs: do one's utmost, eat one's fill, wrap one's head around. As far as I can see this has not been codified in any guideline. Of course, *do someone's utmost rubs one’s grammar judgement the wrong way. No such rule is systematically applied for other categories than verbs. E.g., we have over one's head, at one's fingertips, one's turn in the barrel, ..., but at someone's service, none of someone's business, up someone's alley, ... .  --Lambiam 18:21, 12 October 2023 (UTC)[reply]
Maybe so. But no OneLook reference has to one's taste, whereas MWOnline has to someone's taste. I wonder what the OED does. DCDuring (talk) 22:19, 12 October 2023 (UTC)[reply]
The 1933 OED has a sense
7. The fact or condition of liking or preferring something; inclination. liking for; † appreciation.
Under that sense, there is a quotation
The other girl is more amusing, more to my taste.
 --Lambiam 04:37, 13 October 2023 (UTC)[reply]
So OED 1933 did not have even a run-in entry for any form of this as an idiom. Did they have any illustrations of the sense used without "to ((some)one's)"? Also, they didn't seem to care too much about substitutability either. DCDuring (talk) 17:52, 13 October 2023 (UTC)[reply]

pro-shipper to proshipper

proshipper seems to be the more common formatting. pro-shipper is less searched on Google, for example. –MJLTalk 17:29, 13 October 2023 (UTC)[reply]

Merging Mengisa and Leti (Cameroon); Rename Leti (Indonesia) to Leti

Per Wikipedia, Leti and Mengisa are the exact same thing (Leti is spoken by the Mengisa). We currently don't have the Mengisa language. Googling "Mengisa language" and "'Leti language' 'Cameroon'" shows about an equal number of results. I wonder if we could rename leo to "Mengisa" (since the two names are equally used), thus also freeing up space to rename lti to "Leti", making editing both more accessible. Any objections? Thadh (talk) 16:46, 18 October 2023 (UTC)[reply]

@Thadh No objections. Benwing2 (talk) 04:51, 19 October 2023 (UTC)[reply]

2023 — November

The current name is obscure, a holdover from when you did actually have to supply a label to {{poscatboiler}} and its friends. Now that I have updated the corresponding error message to remove the reference to labels, which is now a purely Lua-internal term, I think it's time to also rename this category. This, that and the other (talk) 00:30, 2 November 2023 (UTC)[reply]

@This, that and the other I support a move of this nature. Maybe "Unrecognized categories" is a simpler name that conveys the same gist? If this seems too short, how about "Categories unrecognized by the category code" (or "... category system") or similar? In a sense, being "unrecognized" is the actual error; being "undefined in the category system" is the cause of the error. I'd like to move away from the term "category tree"; this is indeed the current name of the top-level module that implements category-handling, but I've been thinking of merging the Module:category tree system with the poscatboiler subsystem and calling the result simply Module:category (the topic subsystem would become a subsystem of Module:category, and the current special handling of "langname" categories in the poscatboiler subsystem might become the "lexical" subsystem of Module:category, since the intent of these categories is to represent lexical rather than topical/semantic properties of terms). All of this reorganization would be under the hood and not affect the data modules other than likely moving them to different roots, e.g. Module:category tree/poscatboiler/data/terms by etymology might become Module:category/lexical/terms by etymology or something. Benwing2 (talk) 09:22, 2 November 2023 (UTC)[reply]
@Benwing2 It would be great to retire the name "poscatboiler". I was even thinking of sending {{poscatboiler}} to RFDO, as it seems completely obsolete other than a lingering use on a tiny handful of categories that still confuse {{auto cat}}, like Cat:Reddit slang by language.
However, I'm quite partial to the name "category tree". Why do you feel it should be moved away from? It feels like the term would be intuitively understood even by someone who doesn't know that the term has a precise computer-science definition, whereas a "category system" sounds more abstract and difficult to reason about.
I agree that the name "Unrecognized categories" is too short, but I'm not too fussed what new name we choose for this cat, so long as it clearly emphasises that this is an irregular/error condition. You could argue that "unrecognized" sounds a bit like the system isn't smart enough to recognize this category yet, when it is really just a matter of a missing definition in a list, but as I said, I'm not fussed. This, that and the other (talk) 11:21, 2 November 2023 (UTC)[reply]
@This, that and the other Yes I completely agree with getting rid of "poscatboiler". I've long wanted this gone, but couldn't think of a better name. In truth it's the "subsystem that handles everything except topic categories", and it could fairly easily be made to handle topic categories as well, in which case it would become "the one subsystem to rule them all" and then there'd be no particular need to have subsystems as they're currently defined at all. The thing I don't like about "category tree" is that categories don't actually form a tree, since there are multiple parents to each category. It's true that we put breadcrumbs at the top based on the first parent of every category, which gives the impression of a tree, but the other parents are still there and appear at the bottom (and in addition all children appear in the list of subcategories, whether or not they have the category in question as their first parent). Also when looking for the module that handles categories, I'd expect it to be called simply "category" or "categories"; "category tree" suggests (to me at least) that it's some sort of ancillary module. Benwing2 (talk) 20:21, 2 November 2023 (UTC)[reply]
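The point that categories do not form a tree can be made concrete with a small sketch. The Python below is illustrative only: the category names and the `PARENTS` table are hypothetical and are not taken from Module:category tree. It assumes each category lists its parents in order and that breadcrumbs follow the first parent, as described above.

```python
# Hypothetical category data: each category maps to its parents,
# with the "first parent" (used for breadcrumbs) listed first.
PARENTS = {
    "root": [],
    "by usage": ["root"],
    "by topic": ["root"],
    "slang": ["by usage", "by topic"],  # two parents, so not a tree
}


def breadcrumbs(cat):
    """Follow only the first parent of each category up to the root."""
    trail = [cat]
    while PARENTS[trail[-1]]:
        trail.append(PARENTS[trail[-1]][0])
    return list(reversed(trail))


def is_tree(parents):
    """A strict tree allows at most one parent per category."""
    return all(len(ps) <= 1 for ps in parents.values())


def children(cat):
    """All subcategories, whether or not `cat` is their first parent."""
    return sorted(c for c, ps in PARENTS.items() if cat in ps)
```

Here `breadcrumbs("slang")` gives the tree-like impression (`root`, `by usage`, `slang`), while `children("by topic")` still lists `slang`, which is the mismatch being described.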
I forgot to mention, a similar issue to CAT:Reddit slang by language occurs with CAT:Sanskritic formations by language. This was solved not using {{poscatboiler}} directly but by putting "Sanskritic formations by language" as a raw category definition in Module:category tree/poscatboiler/data/terms by etymology. Benwing2 (talk) 21:07, 2 November 2023 (UTC)[reply]
BTW I fixed the issue for these two categories, which now use {{auto cat}} with no special hacks needed. Let me know if you know of any other such categories. Benwing2 (talk) 22:52, 2 November 2023 (UTC)[reply]
One more thing ... we have both CAT:en:Reddit and CAT:English Reddit slang. Should they be merged? Benwing2 (talk) 22:54, 2 November 2023 (UTC)[reply]
It's true that categories don't form a strict tree. "Category system" would be more accurate in that regard. But the user-facing name doesn't have to be the same as the module name.
There are still four cats using {{poscatboiler}} (plus one in a comment) and six cats using {{topic cat}}.
As for the two Reddit cats, I'm surprised you would want to merge them... it seems clear enough to me that Cat:en:Reddit is a "related-to" topic category for terms about Reddit - the metalanguage, like subreddit - while Cat:English Reddit slang is a "terms by usage" category for slang supposedly used on Reddit, like landchad (the term's denotation having nothing to do with Reddit itself). Having said that, I am not sure that "Reddit slang" is a particularly viable subcat of "Internet slang"; a lot of these terms didn't originate on Reddit and don't actually seem restricted to Reddit. It probably only seems that way because of the prominence of Reddit as a site where people use Internet slang heavily. This, that and the other (talk) 23:50, 2 November 2023 (UTC)[reply]
@This, that and the other There should be no more direct calls to {{poscatboiler}} or {{topic cat}}. I see your point about Reddit-related topics vs. Reddit slang. If you think 'Reddit slang' is unviable, maybe create a RFM discussion about merging it into 'Internet slang'; your point about this seems well-taken. Benwing2 (talk) 00:30, 3 November 2023 (UTC)[reply]

The description of Cat:English Reddit slang reads:

English slang terms whose usage is typically restricted to users of the website Reddit.

However, I am not sure that "Reddit slang" is a particularly viable subcat of "Internet slang"; a lot of these terms didn't originate on Reddit and aren't actually restricted to Reddit. It probably only seems that way because of the prominence of Reddit as a site where people use Internet slang heavily.

The only terms that genuinely seem to belong in this category are downdoot, updoot, and AITA and its related terms (ESH, NTA, NAH, YTA). This isn't enough for a slang category IMO. The terms AMA and karma farm can persist in Cat:en:Reddit without being in this category, because the denotation has to do with Reddit itself. This, that and the other (talk) 02:36, 3 November 2023 (UTC)[reply]

@This, that and the other Support. I agree with your reasoning and I imagine there are few terms with currency on Reddit that haven't "escaped" into the wider Internet. Benwing2 (talk) 06:55, 3 November 2023 (UTC)[reply]
Support per nom. - -sche (discuss) 18:03, 3 November 2023 (UTC)[reply]
I would say karma farm is another one that's Reddit-specific, given it references Reddit karma (upvotes). Theknightwho (talk) 19:10, 5 November 2023 (UTC)[reply]
I support, I guess. Heyandwhoa (talk) 01:27, 7 February 2024 (UTC)[reply]
@This, that and the other: You're right that "Reddit slang" isn't a discrete category in and of itself (aside from the few you mentioned), but there are plenty of terms which are restricted to particular subreddits. Therefore I Oppose this proposal as written. Maybe the category should be moved to Category:English Internet slang originating from Reddit? By the way, karma and its derivatives aren't actually Reddit-specific, as the concept exists on (and possibly originates from) Slashdot (see e.g. karma whore). Ioaxxere (talk) 21:23, 6 April 2024 (UTC)[reply]
Even if "Reddit" isn't enough for a category, isn't it enough for its own label? And if a usage or, at least, a term originated in Reddit, shouldn't there be an etymology saying so? DCDuring (talk) 22:13, 6 April 2024 (UTC)[reply]

"Cat:LANG terms spelled with INDIVIDUAL EMOJI": merge all to "Cat:LANG terms spelled with emoji"[edit]

Most of these categories, like Cat:English terms spelled with 👄, will only ever contain one or two entries each. Better to merge them all into a single Cat:English terms spelled with emoji cat. This, that and the other (talk) 02:40, 3 November 2023 (UTC)[reply]

@This, that and the other Support. Also it's not clear such categories will be very useful even if they contain more than a handful of entries. Similar reasoning led to CAT:English terms spelled with numbers etc. instead of categories for individual numbers. Benwing2 (talk) 06:57, 3 November 2023 (UTC)[reply]
Support Fay Freak (talk) 17:00, 3 November 2023 (UTC)[reply]
Support per nom, at least in general. If some particular emoji does have a large number of uses, I am happy to discuss either splitting it off or just dual-categorizing it also into a specific subcategory for itself. - -sche (discuss) 18:05, 3 November 2023 (UTC)[reply]
Support Sgconlaw (talk) 20:22, 4 November 2023 (UTC)[reply]
Support - though this may be a little tricky to implement, since what counts as an emoji is not always straightforward. Theknightwho (talk) 13:57, 7 November 2023 (UTC)[reply]
@Theknightwho Hmmm, is there no emoji-specific block or Unicode property? Benwing2 (talk) 08:16, 8 November 2023 (UTC)[reply]
@Benwing2 There’s a property, but quite a few characters can be either emoji or plain characters, and the default display form depends on factors like viewing device, variation selectors etc. Theknightwho (talk) 12:45, 8 November 2023 (UTC)[reply]
Can you give an example of a character that can be either an emoji or a plain character? I would think any of the "ASCII art" faces drawn with plain characters like :-) etc. are emoticons rather than emojis. —Mahāgaja · talk 13:23, 8 November 2023 (UTC)[reply]
@Mahagaja One example is ↗︎. It tends to default to plain on desktop and emoji on mobile. We ran into this issue because it's used in IPA, so the template automatically adds a variation selector to force the plain display. Theknightwho (talk) 14:14, 8 November 2023 (UTC)[reply]
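As a rough illustration of the variation-selector approach mentioned above, here is a minimal Python sketch. It is not the actual template or module code; the function name and the `targets` parameter are hypothetical. It relies only on the Unicode facts that U+FE0E (VARIATION SELECTOR-15) requests text presentation and U+FE0F (VARIATION SELECTOR-16) requests emoji presentation.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text (plain) presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation


def force_text_presentation(text, targets):
    """Append VS15 after every character in `targets`.

    An existing VS15 or VS16 directly after a target character is
    dropped, so the result carries exactly one VS15 per target.
    """
    out = []
    prev_was_target = False
    for ch in text:
        if prev_was_target and ch in (VS15, VS16):
            # Already replaced by the VS15 appended below.
            prev_was_target = False
            continue
        out.append(ch)
        prev_was_target = ch in targets
        if prev_was_target:
            out.append(VS15)
    return "".join(out)
```

For example, `force_text_presentation("a\u2197b", {"\u2197"})` returns the arrow followed by U+FE0E, which asks renderers for the plain form. Whether a given device honours the selector is outside the scope of this sketch.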

Other examples are , , (from Wiktionary:Grease pit/2022/August#Preventing_emoji_display_in_titles). - -sche (discuss) 16:49, 8 November 2023 (UTC)[reply]
The full list of single-Unicode-codepoint emoji here. Not all of these characters are to be treated as emoji by us though. We could just make a crude filter that treats anything in the Unicode range U+1F300 to U+1FAFF as an emoji, plus a list of others that need including, like U+2693 ⚓ (for w⚓). This, that and the other (talk) 00:44, 9 November 2023 (UTC)[reply]
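The crude filter suggested above might look roughly like the following Python sketch. It is an assumption-laden illustration, not the code that was eventually deployed: the names `EMOJI_RANGE`, `EXTRA_EMOJI`, `is_emoji` and `spelled_with_emoji` are hypothetical, and the extras set contains only the one code point named in the comment.

```python
# Range proposed in the discussion: U+1F300 to U+1FAFF inclusive.
EMOJI_RANGE = range(0x1F300, 0x1FAFF + 1)

# Hand-maintained additions outside that range (hypothetical list;
# only U+2693 ANCHOR is named in the discussion).
EXTRA_EMOJI = {0x2693}


def is_emoji(ch):
    """Crude test: in the main range, or explicitly listed as an extra."""
    cp = ord(ch)
    return cp in EMOJI_RANGE or cp in EXTRA_EMOJI


def spelled_with_emoji(title):
    """True if any character of the page title passes the crude test."""
    return any(is_emoji(ch) for ch in title)
```

Under these assumptions, a title such as `w⚓` is caught through the extras list, while a character like U+2197 (the arrow discussed above) is not treated as an emoji unless someone adds it by hand.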
Support J3133 (talk) 11:32, 8 November 2023 (UTC)[reply]
Support Almost did this manually before J3133 pointed me here. Worth asking if single emoji entries would be included in such a category. –Vuccala (talk) 22:58, 3 December 2023 (UTC)[reply]
 Done. Benwing2 (talk) 02:56, 7 February 2024 (UTC)[reply]
@Vuccala BTW this does not include single emoji entries, nor entries in languages that don't have a list of "standard characters" in the corresponding language data module. Benwing2 (talk) 02:58, 7 February 2024 (UTC)[reply]
@Benwing2 Cat:English terms spelled with emoji contains only 2 right now. Do all the others (see end of list here) have to be added manually or something? And will the same be done for translingual terms with emoji? Vuccᴀʟᴀ (talk) 03:59, 7 February 2024 (UTC)[reply]
@Vuccala This issue will correct itself automatically over time, as pages get regenerated. (You can speed it up by null-saving the pages in question, if you want.) Benwing2 (talk) 04:09, 7 February 2024 (UTC)[reply]

Splitting Mazurian

I would like to open a discussion about the pros and cons of splitting Masurian as an L2 with the langcode zlw-maz and as a descendant of Old Polish. I would also like to preface this by saying that while I am leaning towards a split, I am not dead-set on it. The argument is as follows:

w:Masurian dialects would benefit a lot from having a separate L2. There are significant differences in pronunciation (extra vowels nonexistent in Polish and a loss of quite a few consonants), grammar (different endings from standard Polish), and vocabulary, especially outside the "core" vocabulary. Even a significant number of basic forms end up looking different from Polish, and it has many inflections and conjugations. I could place them in the tables for Polish, but it might get cluttered. I would like to also point out that {{R:pl:SgOWiM}} exists as a good, reliable source for entries.

Problems of splitting: most people do consider this specifically a dialect, even most speakers, and most forms of it today are heavily Polonized. However, at least up until the 20th century it was distinct and much more difficult to understand in comparison to standard Polish. My problem is that some of these differences are so vast it might not make sense to put them all under Polish. Vininn126 (talk) 21:43, 12 November 2023 (UTC)[reply]

A point for not splitting is that some other dialects of Polish might be equally divergent, such as Łowicz, in some respects. So what might be better is including multiple declension tables and the like. (Notifying KamiruPL, BigDom, Hythonia, Tashi, Sławobóg): @Benwing2, @PUC, @Thadh Vininn126 (talk) 12:58, 14 November 2023 (UTC)[reply]
Here is some sample text The little prince in Mazurian. This channel has some other examples. As someone with high proficiency in Polish, I can understand large parts of it but there's also a significant portion that is very difficult, maybe 65% for me. Vininn126 (talk) 17:28, 14 November 2023 (UTC)[reply]
@Vininn126 As you know, I tend to lean towards not splitting in cases of doubt, while Thadh leans towards splitting. Comparisons to multi-dialect languages like Occitan and Ancient Greek might be useful. In this case I don't know, but I think we're hampered by the lack of standardization. Benwing2 (talk) 23:06, 14 November 2023 (UTC)[reply]
@Benwing2 There is a notation system widely used for Masurian which is present in the Wikipedia article that I'd be able to use for WT:About Masurian if split. Also, using this system would yield 1) a different pagename, 2) a different pronunciation section (as the notation system is based on the different pronunciation), 3) a different definition section at least outside of "core" vocab, and core vocab would only share 1-2 defs, as opposed to all of the obsolete senses as well, and 4) a different conjugation/declension section as well. Vininn126 (talk) 08:53, 15 November 2023 (UTC)[reply]
I’d be in favour of the split. As a native Polish speaker I find it difficult to understand some Mazurian texts and e.g. parts of this Mazurian rendition of Colors of the Wind, Farbi Zietrżu would be straight ungrammatical in Polish (the infinitive construction in cÿsz ti słicháł zilkä wicz ‘have you heard the wolf howl’, which looks more Czech than Polish and in Pl. would have to be reworded as ‘czyś ty słyszał jak wilk wyje’ or ‘wycie wilka’ or something, but the infinitive doesn’t work).
Also I’ll note that Mazurian also keeps some phonemes long gone from standard Polish (like the /r̝/ phoneme written in the song above which Polish merged with ż /ʐ/).
And, @Vininn126, could you include me too in Polish-related discussions when pinging people? I feel left out ;-) // Silmeth @talk 12:28, 15 November 2023 (UTC)[reply]
@Silmethule I can add you to the Polish ping group. Yes, the completely different set of phonology and grammar are both big points for me. Masurian also keeps reflexes of Old/Middle Polish pochylone vowels while getting rid of quite a few consonants. Reading up on the Wikipedia article, quite a few experts also claim it's a language. Vininn126 (talk) 14:16, 15 November 2023 (UTC)[reply]
Having boned up more on Polish dialectology I'm definitely leaning now more towards a split. I haven't been able to find another dialect (that we would mark as such) as divergent as Masurian. There's also a big gap of mutual intelligibility. Vininn126 (talk) 15:48, 20 November 2023 (UTC)[reply]
I'll also add I was wrong in the original post - the Masurians had a stronger sense of identity even more so than the neighboring regions. Vininn126 (talk) 16:47, 20 November 2023 (UTC)[reply]
I'm still wavering, upon listening to more recordings. It might be possible to automatically generate pronunciation sections (even though they would be very, very different), and then it would just be a matter of giving special definitions a label and then I suppose conjugation/declensions... Vininn126 (talk) 09:17, 28 November 2023 (UTC)[reply]
@Silmethule @Mahagaja Another question would be the langcode. Is the one I proposed best? I doubt it. At this point I'm fairly sure we are splitting. Vininn126 (talk) 13:48, 7 December 2023 (UTC)[reply]
@Vininn126 Depending on the choice of Mazurian vs. Masurian, it should be zlw-maz or zlw-mas. Benwing2 (talk) 22:21, 7 December 2023 (UTC)[reply]
@Benwing2 You're right, so it's probably gonna end up being zlw-mas. Vininn126 (talk) 22:23, 7 December 2023 (UTC)[reply]
I'm going to go ahead with this today and make an entry. I've also been able to contact someone educated in this lect and they'll be able to check anything that I (or potentially we, me and him) make. There is a weak consensus it should be split, and if it's handled right I think it will be much better than smushing everything into Polish. Vininn126 (talk) 17:55, 8 December 2023 (UTC)[reply]
@Benwing2 @Mahagaja @Silmethule Sorry for all the pings as of late. I figured now would be a good time to take a pause and look at the current state of things after the decision. We currently have 428 Masurian lemmas, Appendix:Masurian pronunciation, Appendix:Masurian Swadesh list, along with various infrastructure. I know this is a lot of material, I ask you to please take a look at some of these and give your input, and I thought now would be a good time before things got too big, and also at this point I am going to slow down.
Of the existing lemmas, I added mostly cognates, so there aren't many words unique to Masuria, but there are plenty of definitions and of course, pronunciations. I haven't been able to do any work with declensions, as Masurian declensions are too complicated for me at the moment, but I can assure you there are plenty of differences.
I also know I gave the impression I was gung-ho for a split, and also for a split for Goral, which isn't the case, I simply found resistance everywhere I went when trying to add Masurian information - some felt it clogged up the main Polish entry, didn't want particular information, other times I heard that it's remarkably different.
Having added all these terms, I can still see it going either way. On one hand, having it split as a language is a view held by some linguists, but not all (always a problem), and I think the orthography we few Masurian editors have been using easily demonstrates the phonemic difference (the template is phonemic except for (literally) 1 or (potentially) 2 phones, those being the ones represented by <ä> (which might be phonemic) and <ÿ> (which I believe is phonemically /i/)).
However, if we merged, as I have seen various reactions to the split, and understandably so, I'd have a few questions.
What would be the best way to represent Masurian pronunciation? We could ignore spelling and put everything under the Polish spelling, using a respelling in the pronunciation module. This is the approach I take with Middle Polish, and it serves me well. For Masurian-only terms (such as szmanta), I'd prefer to keep {{zlw-mas-IPA}}, similar to what we currently have with {{zlw-mpl-IPA}}. However this leaves us with the issue of <ä> and <ÿ>.
Another potential approach would be to keep the spellings, but I'd be less sure about this, as it works better for British/American English. One potential issue this would solve is the problem of standard Polish definitions absent from Masurian.

One other potential issue is the fact that Masurian would ideally be treated as an LDL. Currently Middle Polish is (not standardly!) treated as an LDL, despite being part of Polish, and it would be a shame to see the potential for someone to RFV all of them (perhaps they won't, but the option exists) and have certain very real terms deleted just because it's considered part of a WDL.

I know there's been a lot of talk about this lately, hopefully there isn't too much fatigue. That is why I decided it might make more sense to review this now and press on later. Vininn126 (talk) 23:40, 18 January 2024 (UTC)[reply]
I was asked by Vininn to add my two cents on the issue, so here I go.
I must say I am worried about using language splits in order to circumvent the WT:WDL policy. I understand the frustration of having dialectal terms left undocumented, but there is no way to objectively draw a line between one dialect and another. In the end the smallest unit of a complete language system is an idiolect, and between that and a language family any grouping is ultimately either political or arbitrary.
I'm not sure how to define what is and isn't a language. I would say ISO codes are a good start, and after that splits may be warranted provided that there is abundant literature in the lect, a solid written language, or some major problems in mutual intelligibility... Knowing how Slavic languages are, the last one is probably not the case with these Polish lects. I don't know enough about them to comment on the first two.
With historical lects, a different issue comes up. In my opinion, it is only possible to treat a standard language as a WDL after its standardisation, and so I would prefer lects like Middle Polish to stand separate, like Old Ruthenian, and in my opinion the same should be done with Middle Russian (although this discussion led nowhere). Thadh (talk) 13:26, 19 January 2024 (UTC)[reply]
@Thadh As to intelligibility, as mentioned above, I'd say that Masurian (and to a lesser extent Goral) is as intelligible as two other Slavic languages, so somewhat, but also quite difficult for a lot of people. Middle Polish is also the period when standardization really began and, to some extent, solidified. Vininn126 (talk) 13:42, 19 January 2024 (UTC)[reply]
@Thadh, Vininn126: regarding mutual intelligibility, my subjective opinion is that Middle Polish is easier for a modern Polish speaker than Masurian (if not because of anything else, then due to exposure in school to 16th and 17th century texts) – but since modern standard Polish does continue the standard that was established during Middle Polish period, I think there’s more to it. Masurian truly feels “foreign”. So if we’re willing to keep Middle Polish as a separate lang, IMO Masurian deserves the treatment too.
But then, regarding the factors of attestation in literature, separate grammar, recognition in separate ISO code, etc. – we’ve merged Classical Gaelic with modern Gaelic langs and it’s still not split – despite having its own ISO code, having very rich literature in 13th–18th centuries, its own grammar schooling tradition, established (if changing in time) spelling conventions, etc. So even if we acknowledge those factors provide good guidance, we definitely don’t always follow it very closely. // Silmeth @talk 14:20, 19 January 2024 (UTC)[reply]

They seem to be the same. 212.179.254.67 08:39, 13 November 2023 (UTC)[reply]

They're not. Not all multiword terms are phrases by our reckoning, and the multiword term category contains more than 10 times as many entries as the phrase category. —Mahāgaja · talk 08:49, 13 November 2023 (UTC)[reply]

The descriptions in the categories are

Hebrew groups of words elaborated to express ideas, not necessarily phrases in the grammatical sense.

and

Hebrew lemmas that are an idiomatic combination of multiple words.

If those are different to each other, the descriptions should say how they are different. 212.179.254.67 09:53, 13 November 2023 (UTC)[reply]

I agree the descriptions aren't clear, but "phrases" in Wiktionary are a grammatical concept and indicate things that can't be clearly classified as nouns, verbs, adjectives and the like, while any POS can be multiword. Benwing2 (talk) 23:09, 14 November 2023 (UTC)[reply]

Then can someone update the category descriptions? 212.179.254.67 15:01, 3 December 2023 (UTC)[reply]

Some overlap here Jewle V (talk) 21:28, 16 November 2023 (UTC)[reply]

Japanese. Move to CBS#Japanese and NHK#Japanese. —Fish bowl (talk) 06:05, 19 November 2023 (UTC)[reply]

2023 — December

Proposal for several languages without ISO codes

Tagging @-sche and @Benwing2 who are likely to be interested in this. Here is a list of languages that currently lack ISO codes, with a brief explanation as to why they probably justify an L2 code. In a couple of cases, we're never likely to have more than a handful of entries for the language in question due to the scant number of attestations we have, but I don't think that should be used as justification for exclusion.

Baltic

  • Splitting Galindian (xgl) into East Galindian (xgl-eas) and West Galindian (xgl-wes).
    This seems to have been a genuine mistake by the ISO: "Galindian" refers to two separate extinct languages within the Baltic family, which don't even seem likely to have been part of the same sub-branch. Both are poorly attested, however.
    What is there to add in either language? WP says both are "poorly attested", but I'm having trouble finding whether they are actually attested or this is just an editor's euphemism for "not attested". (All I've found so far is a random website mentioning that some placenames are known or inferred for "Galindian".) This would help with deciding whether to just retire xgl, add full codes for East and West, or add etymology-only codes for them. - -sche (discuss) 19:29, 16 January 2024 (UTC)[reply]

Creoles and pidgins

  • Scots-Yiddish (crp-syi)
    A Scots-Yiddish creole spoken in the first half of the 20th century. Attestations are scanty, but some records do exist.
    I'd like to see good evidence that this is a genuine creole (or even pidgin) rather than Scots with some Yiddish loanwords or simple code-switching. Pidgins rarely arise when there are only two languages in contact, and not all pidgins undergo creolization. —Mahāgaja · talk 07:36, 8 December 2023 (UTC)[reply]
    Yeah, I don't think we have enough evidence of this being a real, distinct language to add it. (Several of the relatively few works "in" the "language" appear to be inventing, or as they put it, "reimagining" it like a conlang.) - -sche (discuss) 19:05, 16 January 2024 (UTC)[reply]

Dravidian

Created. Theknightwho (talk) 01:18, 3 February 2024 (UTC)[reply]
  • Malamuthan (dra-mal)
    A small tribal language related to Malayalam - we have quite a few of these already, and I see no obvious reason to exclude this one.
    I'm having trouble finding any reference works about this; Mikhail S. Andronov (in A Comparative Grammar of the Dravidian Languages and A Grammar of the Malayalam Language in Historical Treatment) speaks of "the Malamuttan dialect". Perhaps we should just wait until someone has content they're wanting to add in this lect, to judge how distinct it is. - -sche (discuss) 19:38, 16 January 2024 (UTC)[reply]
    @-sche I'm not sure if you've seen it, but pages 37 to 39 of Tribal Languages of Kerala has some information about it, which notes a number of distinctive qualities; not least because they have a very strong tradition of isolating themselves from outsiders. That paper cites a 1981 reference work, but I assume it's in Malayalam. Theknightwho (talk) 14:35, 20 February 2024 (UTC)[reply]

Germanic

  • Greenlandic Norse (gmq-grn)
    A descendant of Old Norse spoken in Greenland until sometime in the 15th century, which diverged likely due to isolation (compare Icelandic and Norn). Some linguistic innovations and conservations have been noted, though the number of attestations is relatively small.
    Oppose: This is considered a dialect of Old West Norse, for which we already have a code: non-own. --{{victar|talk}} 19:22, 7 December 2023 (UTC)[reply]
    @Victar That's an etymology-only code, not a full language code. Theknightwho (talk) 20:22, 7 December 2023 (UTC)[reply]
    I'm aware. This is a subdialect of a larger dialect. --{{victar|talk}} 20:30, 7 December 2023 (UTC)[reply]
    My initial inclination is to keep treating this as ==Old Norse== as far as L2s go (or if we really want to, treat it as ==Old West Norse== and upgrade OWN to being attested like Proto-Norse). Various Old Norse dialects including this one have some differences from one another, but I do not know that it makes sense to speak of Greenlandic Norse as a "descendant" of Old Norse when it was contemporaneous and stopped being spoken at around the same time as other Old Norse, and other members of the dialect continuum do not seem to have had trouble understanding it, or at least modern scholars don't (given the uncertainty over whether various texts or inscriptions represent Greenlandic Norse or e.g. the Icelandic dialect of Old Norse, and that it sometimes even comes down to just the shapes of runes rather than anything about which letters or words are used); it seems like we can continue to treat it as a dialect in the dialect continuum. It would be reasonable to add an etymology-only code, for use in various Greenlandic terms' etymologies (since we are extremely free with these, and have ety-only codes even for things like en-NNN vs en-US ... I see we even have "en-US-CA" although this does not appear to be used anywhere and I am going to suggest it be deleted along with Template:User en-us-ca...). - -sche (discuss) 20:12, 16 January 2024 (UTC)[reply]
Closing this by giving it the etymology-only code non-grn under Old West Norse. Theknightwho (talk) 01:33, 7 February 2024 (UTC)[reply]

Indo-Aryan

  • Kishtwari (inc-kst)
    Closely related to Kashmiri (and sometimes classified as a dialect), but only retains partial mutual intelligibility, and (unlike Kashmiri) appears to be written using the Takri script.
    Oppose: I have never seen Ka/ishtwari referred to anything other than a dialect of Kashmiri, alongside Kohistani, Poguli, Rambani, and Siraji. --{{victar|talk}} 08:32, 8 December 2023 (UTC)[reply]
    @Victar Poguli has an ISO code, so I’m not sure how much value your assertion has. Theknightwho (talk) 08:42, 8 December 2023 (UTC)[reply]
    And just because an ISO code exists, doesn't mean we on the project should create a language for it. Often times, village dialects have codes just because someone put out a paper on it, not because it's any more unique than any other dialect on the continuum of dialects. --{{victar|talk}} 09:30, 8 December 2023 (UTC)[reply]
    @Victar It calls into question the value of your statement that you have never seen it referred to as a language, if you’re putting it on the same level as a lect which does, in fact, have a language code. It also directly contradicts your previous statement as to the weight we should put on language codes. There is also the matter of the Takri script. Theknightwho (talk) 09:44, 8 December 2023 (UTC)[reply]
    It doesn't contradict my opinion at all. In my experience, particularly when it comes to Indo-Iranian, ISO over-assigns language codes, so trying to give a language code to a dialect when even ISO doesn't is saying something. --{{victar|talk}} 10:22, 8 December 2023 (UTC)
    @Victar None of which is relevant to the fact there is evidence it isn’t even written with the same script - please present something more substantive than a personal hunch, or a selective approach to the weight you put on language codes. Theknightwho (talk) 10:29, 8 December 2023 (UTC)[reply]
    A language written in multiple scripts is practically a hallmark of Indo-Iranian languages and to cite that as a reason to call it a different language would be naive. --{{victar|talk}} 10:39, 8 December 2023 (UTC)[reply]
    @Victar You’re being highly misleading: when a “dialect” is written in a different script, its speakers do not consider themselves to be speaking the same language, and it’s also highly divergent (to the point where it is tonal, unlike Kashmiri), then it creates a compelling case for separating it out. Theknightwho (talk) 10:44, 8 December 2023 (UTC)[reply]
    That is such an absurd statement. Script usage is frequently dependent on region and religion. Most literate Kashmiri speakers write in Perso-Arabic but the Hindu population uses Devanagari, regardless of any dialectal differences. Also, I can't find any paper that states Kishtwari is any more or less tonal than standard Kashmiri. You're overreliant on a Wikipedia article for your facts. --{{victar|talk}} 11:41, 8 December 2023 (UTC)
    @Victar Except this is the Takri script and it is directly related to “dialectal” differences, so your comparison is nonsensical because it shows that script usage in this case is affected by the lect, not other factors like religion. Standard Kashmiri isn’t tonal at all, as you very well know. Theknightwho (talk) 11:48, 8 December 2023 (UTC)[reply]
    Yes and the Kishtwari dialect is spoken in the region of the Kishtwar Valley, and the use of Takri is regional. Again, no paper I read remarks anything on tone. Unless you can provide a paper, your statement is meaningless. --{{victar|talk}} 11:57, 8 December 2023 (UTC)[reply]
    @Victar We also have a code for Haryanvi, considered a dialect of Hindi. So should it be removed? Word0151 (talk) 12:48, 8 December 2023 (UTC)
    🤷 Plenty of Hindi project users that can decide that. --{{victar|talk}} 01:33, 9 December 2023 (UTC)
  • Urtsuniwar (inc-unr)
    Closely related to Kalasha, but appears to be divergent enough to constitute a separate language with around 70% mutual intelligibility (compare Spanish/Portuguese with 85-90%).
    Oppose: Urtsuniwar is a synonym for Kalasha, see Decker (1992). Some speakers just use more Khowar borrowings than others. --{{victar|talk}} 08:32, 8 December 2023 (UTC)[reply]
    @Victar Patently untrue - numerous references in the sources provided by WP (and elsewhere), and you’ve failed to explain the issue of mutual intelligibility. Theknightwho (talk) 08:45, 8 December 2023 (UTC)
    How is it "patently untrue"? Did you read Decker (1992): "Kalasha speakers in the Urtsun Valley sometimes call their language Urtsuniwar." I did explain the "issue of mutual intelligibility" -- speakers of Kalasha use varying degrees of Khowar borrowings. --{{victar|talk}} 09:30, 8 December 2023 (UTC)[reply]
    @Victar 70% mutual intelligibility is far below the threshold typically used to classify something as a dialect (80-85%) - the fact that one citation says they are the same does not discount the wealth of evidence to the contrary. Theknightwho (talk) 09:44, 8 December 2023 (UTC)[reply]
    What "wealth of evidence"? The first reference on the Wiki page literally lists Urtsuniwar under "Other Names" for Kalasha, beside Bashgali, Kalashwar, Kalashamon, and Kalash. Shall we make Kalashwar its own language as well? Another reference there is titled, I shit you not, "Kalasha of Urtsun". --{{victar|talk}} 10:22, 8 December 2023 (UTC)[reply]
    @Victar Insufficient levels of mutual intelligibility, as stated several times. Theknightwho (talk) 10:32, 8 December 2023 (UTC)

Iranian

  • Gorgani (ira-gor)
    An extinct Caspian language attested in the 14th century, which appears to have formed a dialect continuum with Mazanderani. Previous discussion here.
    Oppose: The few texts we have in Gorgani are almost indistinguishable from Old Tabari, the ancestor of Mazanderani, and should be considered a dialect of it, not its own language. There are actually more differences between Old Tabari and Mazanderani, but, like Classical Persian and Modern Persian, we treat them as the same language, in large part due to their use of an abjad alphabet. @Fay Freak --{{victar|talk}} 19:35, 7 December 2023 (UTC)[reply]
    @Victar In all seriousness: given you clearly respect the views of Borjian, how do you explain his apparent change in view from the line you quoted from 2004 and his 2008 paper on Gorgani in which he invariably refers to it as a language (not a dialect)? Theknightwho (talk) 22:43, 7 December 2023 (UTC)[reply]
    By its only being apparent. If you search for such a distinction. I’ve just looked into the 2008 paper again just for you. Normal(ly) people don’t look upon the statistical distribution of the employment of “language” and “dialect” in previous publications to find “changes in view” of linguists. Their views are rarely that sophisticated that one could make meta publications as one does on philosophers, and even then following such a bright shiny object is not an argument. language has multiple languages like sublanguage, including dialect, and one is not only not always anxious to make a distinction, there is usually nothing gained at all from such a “turf war”. All is language and words, rarely isolects or lexemes. Whether or not something should be treated separately is decided long before you realize you could beat the topics of this dichotomy again to fill your publication history.
    In this case the talk of “language”, I may argue, is purposefully misleading people, to market one’s publication career. It’s just much more zhoosh to publish about whole “languages” than dialects. But it’s okay to embellish things a bit since the core message of a paper does not hinge on these concepts. All historical sciences use to be much less exact in their design than that of the jurist who has the peculiar task to weigh or find a balance for a final decision. Like how I formulate etymologies in probability terms is secondary to what information is provided, in other words: it is mostly rhetorics to present the material, the related forms, reconstructions, and bibliography—this is the science, the result is of little practical relevance, unlike in the legal art where in the end you get a sentence or recommend an action. There is a principal misunderstanding of what linguistic papers are about here I can make out. Benwing noticed. You take publications of an author and read them with an exactitude that they don’t provide, with “research results” that they didn’t care about. One could enjoy that there are still naive academics whose subjects are recondite enough for their not bewaring of a lawyer around the corner attempting to misinterpret them. Fay Freak (talk) 00:35, 8 December 2023 (UTC)[reply]
    @Fay Freak This seems like a very cynical answer, and it’s difficult to see how you’re not simply accusing Borjian of academic dishonesty. Also Benwing2 didn’t add anything on this topic - he simply asked for consensus. Theknightwho (talk) 08:48, 8 December 2023 (UTC)[reply]

Nuristani

  • Zemiaki (iir-zem)
    Spoken by around 500 people and related to Waigali, but I'm not seeing any indication it should be treated as a dialect in the literature.
    Oppose: Morgenstierne (1974) calls it a dialect of Waigali, and Edelman (1999) is unsure, labeling it "jazyk/dialekt". We should play it safe and treat it like a dialect. --{{victar|talk}} 21:46, 7 December 2023 (UTC)[reply]

Tungusic

  • Alchuka (tuw-alk)
    A language in the Jurchenic branch (i.e. close to Jurchen and Manchu), which went extinct at some point in the 1980s. Records of the language aren't great, but there are a handful of works which go into detail.
  • Bala (tuw-bal)
    A very similar situation to Alchuka above, though the language may still be moribund.
  • Kili (tuw-kli)
    Formerly thought to be a dialect of Nanai (a Southern Tungusic language), but now thought to be a Northern Tungusic language influenced by Nanai due to geographical proximity; it had 40 speakers in 1990, and is likely moribund.
With no objections, creating these three. Theknightwho (talk) 18:28, 4 February 2024 (UTC)

Yeniseian(?)

  • Jie (qfa-yen-jie)
    Likely to be a Yeniseian language (though possibly Turkic), with only a single attestation from the 4th century (though it wouldn't be the first).
In the absence of objections, I'll create this, given the number of potential entries is capped at 4. Given the contention over its affiliation, und-jie is preferable as a code. Theknightwho (talk) 16:57, 4 February 2024 (UTC)

Unknown

  • Xiongnu (und-xnu)
    Attested only via Old Chinese records of the language [edit: and potentially some inscriptions - see below], but nevertheless, a handful of terms have been recorded (and we can, at least, make broad reconstructions as to how they would have been read): e.g. the Old Chinese borrowing 谷蠡.

Theknightwho (talk) 16:03, 4 December 2023 (UTC)

Oppose Xiongnu (Old Chinese is Old Chinese). West Galindian is also unattested. Is East Galindian attested outside of borrowings? If not, maybe keep as a substrate language? Provisional support Zemiaki, Kishtwari, Urtsuniwar, based on the assumption there are no good arguments to keep these together. Abstain for the others: poorly attested, extinct languages are usually subject to a lot of debate and usually dictionary entries in these don't turn out well, but they at least seem valid. Thadh (talk) 16:25, 4 December 2023 (UTC)
@Thadh The issue with Galindian is that we need to deal with the present situation, since having a single language code for both is simply incorrect. Re Xiongnu, I'm not referring to borrowings - I'm referring to specific records of the Xiongnu language in Old Chinese sources. Theknightwho (talk) 16:30, 4 December 2023 (UTC)[reply]
@Theknightwho: Do you mean mentions of terms à la Uindiorix, or do you actually mean texts à la Luwian? Because in the former case, I'm inclined to call it a borrowing rather than an attestation, whereas the second one is fair enough. Thadh (talk) 17:18, 4 December 2023 (UTC)[reply]
@Thadh It's a bit tricky - for example, see [21], where Vovin argues (quite convincingly) that they're inscriptions in Xiongnu which used Old Chinese characters for their semantic values, except for terms that needed to be transcribed phonetically, such as titles or personal names. There's obviously precedent for this - compare Japanese, Korean, Vietnamese etc. Theknightwho (talk) 18:01, 4 December 2023 (UTC)[reply]
@Thadh: Discussion will be considerably less confusing if people put their Supports, Opposes and Abstains under each individual case rather than grouping them together at the bottom. —Mahāgaja · talk 18:06, 4 December 2023 (UTC)[reply]
@Mahagaja: I had quite general remarks: Living languages - split. Unattested languages - no split. Rest - abstain. I think repeating this ten times is a bit overkill. Thadh (talk) 21:12, 4 December 2023 (UTC)[reply]
I'm usually sympathetic to adding extinct language X even if it's only attested as quotations/mentions/etc in old records in language Y, as long as we're sure X was a language (and different from, not just a dialect of, Y or another language). With Xiongnu, it seems like no one is sure which of various unrelated ethnolinguistic families the Xiongnu people and language(s) might have been from, or even if it was composed of multiple ethnolinguistic groups. That last part gives me pause. Are scholars generally in agreement that the attested words from the Xiongnu are all in one language, or is this like e.g. "Loup" where it's multiple different languages? (We currently have Category:Loup B language, but this is questionable and it seems good that we don't have any entries.) - -sche (discuss) 21:15, 4 December 2023 (UTC)[reply]
@-sche A lot of that lack of certainty comes from two factors:
  • Because Xiongnu is filtered through Old Chinese characters, any kind of reconstruction therefore relies on us being able to accurately reconstruct the readings of those characters. This is something that is gradually improving, and - for example - we are in a much better position to make this kind of judgment than Pulleyblank was in the 1960s
  • There’s been a huge amount of (understandable) speculation as to whether the Xiongnu and the Huns were one and the same. If I had to put money on it I’d say they probably were related, but I strongly suspect there was a large dialect continuum involved (just as there was with the Mongolian languages a millennium later). However, I’m certainly not proposing we merge Hunnic with Xiongnu or anything as radical as that. What we do know is that the inscriptions which were found were created by the same Xiongnu who are written about in Old Chinese sources, because they were excavated in the old Xiongnu capital of Longcheng in Mongolia, which was discovered quite recently. The question is whether they’re in Old Chinese or Xiongnu, but I’m inclined to agree with Vovin that the evidence suggests the latter.
Theknightwho (talk) 03:36, 5 December 2023 (UTC)

Move {{→}} to {{borrowed arrow}} (or some other English words). Wiktionary:Templates § Naming templates says, "If you can, try to avoid using characters outside the ASCII encoding". A single-character, non-ASCII template name with no aliases makes it hard to use, hard to link to, and hard to search for (at least for people like me with ASCII keyboards). — excarnateSojourner (ta·co) 20:44, 14 December 2023 (UTC)

Support but let's please keep {{→}} as an alias. Catonif (talk) 17:21, 3 January 2024 (UTC)

Sanskrit.

(Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Inqilābī, Getsnoopy, Rishabhbhat, Dragonoid76): The attested orthography for Sanskrit does not use naked hal karima to form consonant clusters. Instead, it uses touching letters or ligatures. (The spelling can be viewed properly using Noto Sans Sinhala.) The spelling should therefore be සංස‍්කෘත, not සංස්කෘත. --RichardW57 (talk) 08:45, 21 December 2023 (UTC)[reply]

Also pinging @AleksiB 1945. --RichardW57 (talk) 08:49, 21 December 2023 (UTC)[reply]

I am re-starting this discussion with a clearer proposal: move Category:Shanghainese to Category:Shanghainese Wu, to distinguish it from Category:Shanghainese Chinese. Support, oppose, comments? - -sche (discuss) 08:04, 28 December 2023 (UTC)[reply]

  • Support. Benwing2 (talk) 01:57, 1 January 2024 (UTC)
  • Strong oppose. @Musetta6729 and I have discussed this previously in private and have already cleaned up Shanghainese Chinese, which we both found unnecessary as most of the terms in it can be classified as either "chiefly Shanghainese (Wu)" or just plain Shanghainese. As correctly identified previously, the Chinese category contained mostly Wu terms, which we have already dealt with. We have already dealt with the majority of the category's pages, and left four that could also be removed:
    • 鄉下人乡下人 (shián-gho-gnin), 硬盤硬盘 (ngan-boe), and 硬盤人硬盘人 (ngan-boe-gnin) are all generally "xenophobic" terms that can be classed as "chiefly Shanghainese (Wu)" (or something similar)
    • 三環三环 (sé-gue) is a geographical term that pertains to the city of Shanghai. We can simply remove the Shanghainese Chinese label and deal with it much like the other geographical terms, cf. 筲箕灣筲箕湾 as just one example
If we implement these two measures, the Chinese category will be completely vacated and can potentially be removed. Even if we do not remove it, I would like for at least some dignity to be given to Shanghainese, as the to-be completely unused label will get the succinct "Shanghai" name while the language of urban Shanghai will be relegated to the term "Shanghainese Wu", which to be frank, we both found somewhat insulting. — 義順 (talk) 12:57, 26 January 2024 (UTC)[reply]
@ND381 I am confused why you think "Shanghainese Wu" is insulting, unless you deny that Shanghainese is a variety of Wu. As for the label, that is an orthogonal discussion and we can change it any way we want. Benwing2 (talk) 19:48, 26 January 2024 (UTC)[reply]
@Benwing2: Wu is a grouping of languages. No one speaks "Wu". We treat it as part of Chinese for practical reasons, but the Wu languages are quite divergent from the rest of Chinese, and presumably fairly distinct from each other. I suppose they see it as analogous to "English West Germanic" or "Ukrainian East Slavic". Chuck Entz (talk) 20:30, 26 January 2024 (UTC)[reply]
@Chuck Entz If there is a disambiguation issue, I don't see the problem with adding the language family onto the end, compare CAT:Silesian East Central German (to distinguish it from CAT:Silesian language, which is Slavic). Maybe their point is rather that they think just "Shanghainese" should refer to the Wu variety, and anything else have a qualifier. Benwing2 (talk) 20:43, 26 January 2024 (UTC)[reply]
I would like to add a bit onto what has already been said here. Shanghai is incredibly complex sociolinguistically, and what is referred to as "Shanghainese" (on wiktionary as much as elsewhere) tends to be the city-centre varieties that developed during the course of the last centuries as a lingua franca between the original Shanghai locals and migrant populations from nearby areas who now constitute a major part of Shanghai.
But Shanghai in fact has a whole range of regional languages - a range of Wu varieties, in fact, which can all be fairly divergent from each other but still very much maintain mutual contact and influence internally. When someone speaks of "Shanghainese", if they don't specify non-city-centre Shanghainese, then one would usually assume they are talking about city-centre or something adjacent to that. But "Shanghainese Wu" feels then more vague somehow as to whether it refers to any dialect, sociolect or topolect that can be considered "a Wu variety of Shanghai which is not necessarily city-centre", a label which is not in itself necessarily useful, and can potentially even be quite confusing in my opinion.
As of now we have been adding modifiers such as "urban" or "suburban" in front of "Shanghainese" when we come across situations where we need to clarify, and that's been working alright. But coming back to the original point, I think it is also just that "Shanghainese Chinese" - which currently is used as "Standard Mandarin terms found in Shanghai" (the language itself not being native to Shanghai, simply spoken in Shanghai for being the official national language) - should arguably not take precedence to the Chinese varieties that are native to Shanghai instead. — Musetta6729 (talk) 02:47, 27 January 2024 (UTC)[reply]
This discussion is very much more of a footnote, but the fact that the significantly more irrelevant category gets the label that the language is meant to have (ie. I would prefer for S’nese the language to get "Shanghainese" or even just "Shanghai" like how other non-top level groups/lects are handled) and instead we have to settle for the (intentionally obtuse?) mouthful that is "Shanghainese Wu" — not even Northern Wu à la Quanzhou Hokkien or Hong Kong Cantonese. Again, this is very much not the main point and from your profile I'm assuming you don't know that much about socioling and language politics in the area so it would be I suppose easier to leave the discussion here
The main problem is still just the category: S’nese Wu is unnecessarily obtuse and if we can get back to the point of whether or not we can just clear S’nese Chinese’s four remaining pages we can have a more fruitful consensus — 義順 (talk) 20:34, 26 January 2024 (UTC)[reply]
@ND381 Thanks for your comment. I am fine with your proposal to empty the remaining four terms from the category and remove or rename it. Benwing2 (talk) 20:39, 26 January 2024 (UTC)
As there have not been any negative comments regarding the vacating of Category:Shanghainese Chinese, I have removed all four remaining entries in the category.
Regarding the situation of the naming convention, unless there are any further objections, the current Category:Shanghainese Wu should be renamed to just "Shanghainese", and S'nese Chinese is to be either kept as is, renamed to something like "Standard Chinese in Shanghai", or deleted. Of the three options, I believe the last one would be best, as there genuinely isn't a need for it: "chiefly Shanghainese" would cover for most if not all cases of words in Standarin that are used in Shanghai, as those terms almost/always originate from the local variety anyways. Misspellings or Shanghainese-influenced sayings in Standarin that are not found in Shanghainese should perhaps be labelled with "influenced by Shanghainese", if, again, that is necessary, which I highly doubt.
For the time being, the "Shanghainese Wu" label will be renamed to "Shanghainese" as per above discussions, and to stay in line with other "-(n)ese" labels (cf. Hainanese, Sichuanese). If for whatever reason S'nese Chinese (ie. Standarin used in Shanghai that isn't "chiefly Shanghainese") is actually needed, unless there are any objections, something along the lines of "Standard Chinese, Shanghai" or "influenced by Shanghainese" (if appropriate) is to be used, though again, there really aren't any words that would warrant this designation. — 義順 (talk) 13:50, 31 January 2024 (UTC)[reply]
Apologies for the late reply; I too am fine with renaming or removing "Category:Shanghainese Chinese" (and updating Module:labels). In some similar situations we've used noun forms instead of adjectives to make this kind of distinction, e.g. "Category:Switzerland German" (for de) was renamed to that name to distinguish it from "Swiss German" the Alemannic lect, so if we need a category for "standard Chinese / Mandarin terms chiefly found in Shanghai", it would fit the overall schema to name it something like "Category:Shanghai Chinese"... but if people just don't want such a category, and want {{lb|zh|Shanghainese}} / {{lb|cmn|Shanghainese}} to throw an error and put the entry in a cleanup category so someone can re-code it as a wu entry, that works too... - -sche (discuss) 01:30, 6 March 2024 (UTC)[reply]
@ND381 @-sche I have deleted the empty category Category:Shanghainese Chinese. {{lb|zh|Shanghai}} does not currently categorize into Category:Shanghainese Wu, but IMO it probably should, for consistency with the labels Changzhou, Hangzhou, Huzhou, Ningbo, Suzhou, Wenzhou, all of which categorize into the corresponding Wu category (Category:Changzhounese Wu, Category:Hangzhounese Wu, Category:Huzhounese Wu, Category:Ningbonese Wu, Category:Suzhounese Wu, Category:Wenzhounese Wu). ND381 (I think) suggested renaming Category:Shanghainese Wu -> Category:Shanghainese; I am not opposed to this but if we are to do it we should (a) rename some or all of the other *nese Wu categories, (b) come up with a consistent and clearcut rule for which lects get called Foonese and which ones get called Foo Wu (possibly the separation is for major urban varieties vs. all others?), (c) harmonize the resulting category names (whatever they are) to the names of the corresponding etymology-only varieties in Module:etymology languages/data. Also if for some reason we find the need to create Category:Shanghainese Chinese, it should have a corresponding label that makes its scope clear, i.e. NOT Shanghai but either Shanghai Mandarin or Shanghai Standard Chinese, depending on the contexts in which the term is used and found. Benwing2 (talk) 05:00, 2 April 2024 (UTC)[reply]

Old, less-focused discussion which evinced some support for this:

(Notifying Atitarev, Tooironic, Suzukaze-c, Justinrleung, Mar vin kaiser, Geographyinitiative, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Thedarkknightli, Michael Ly): I have no idea what the intended difference between these two categories is, but in practice there's none. The former gets triggered by the Shanghainese Wu label while the latter gets triggered by either Shanghai or Shanghainese. They should be merged. Benwing2 (talk) 04:03, 11 October 2020 (UTC)[reply]

Comment: If we are trying to make a distinction, one category should be referring to Shanghainese Wu, and another should be referring to any variety spoken in Shanghai (i.e. both Shanghainese Wu and Mandarin). I don't know if this distinction should/can be made, though. — justin(r)leung (t...) | c=› } 04:06, 11 October 2020 (UTC)[reply]
@Justinrleung: Can 硬盤人 be an example that is used in "general Chinese in Shanghai"? -- 07:45, 11 October 2020 (UTC)[reply]
@沈澄心: Yes, I think so. — justin(r)leung (t...) | c=› } 07:49, 11 October 2020 (UTC)[reply]
I guess the issue then is, do we have native Shanghainese speakers here who can make this distinction? It looks to me like most entries in both categories are Wu terms. Benwing2 (talk) 22:05, 11 October 2020 (UTC)[reply]
If we have any entries that make this distinction (and one such entry has been convincingly adduced above), then merger would result in losing information. Do you want Shanghai-specific Mandarin terms to go uncategorised as such? —Μετάknowledgediscuss/deeds 03:26, 12 October 2020 (UTC)[reply]
@Benwing2, Metaknowledge: @Thedarkknightli probably knows the Mandarin terms and may know some of the Wu terms. For Shanghainese, we have some resources we can consult, so it's the Mandarin terms that are more difficult to figure out. The terms that are in CAT:Shanghainese are Wu for sure (and I would prefer to call the category "Shanghainese Wu" to make it clear). We would need to sift through the CAT:Shanghainese Chinese category to check what's actually Wu and relabel them with "Shanghainese Wu" or just "Wu". BTW, there might be some need to revamp other labels/categories, like "Sichuan" displaying as "Sichuanese" and categorizing to CAT:Sichuanese Mandarin, which could be confusing when we introduce terms in Sichuanese Hakka or Xiang (which we might have some already). — justin(r)leung (t...) | c=› } 03:40, 12 October 2020 (UTC)[reply]
(edit conflict) A native Shanghainese speaker would be User:辛时雨 but he is not very active.
What we lack with regional labels (and this is specific to Chinese, since the merger needs to work for varieties and subvarieties) is the ability to add variety-specific categories: {{lb|zh|Shanghai|Wu}} is meant not only to label a term but also to categorise it as Shanghainese Wu, whereas {{lb|zh|Shanghai}} is for general Chinese, esp. Mandarin. --Anatoli T. (обсудить/вклад) 03:43, 12 October 2020 (UTC)
I think you would need to use {{lb|zh|Shanghai Wu}} or something, not {{lb|zh|Shanghai|Wu}}, since I don't think the same label ("Shanghai") can categorize into two categories. Anyway, add my voice to those saying that if there is intended to be a distinction here, the category names (and, probably, boilerplate texts) should be made clearer. We could also consider "see also"-style crossreferencing them, like Category:Louisiana French and Category:Louisiana Creole French language. - -sche (discuss) 17:26, 13 October 2020 (UTC)[reply]
Rename to Category:Shanghai Wu (阿拉) and Category:Shanghai Chinese (硬盤人). —Fish bowl (talk) 06:36, 6 February 2022 (UTC)[reply]
list of entries to examine
Special:Search/incategory:"Shanghainese" incategory:"Mandarin lemmas"
Special:Search/incategory:"Shanghainese Chinese" incategory:"Mandarin lemmas"

Needs merging with Venus's comb and possibly Venus comb. Ultimateria (talk) 20:11, 30 December 2023 (UTC)[reply]

Indeed, according to Google NGrams, since about 1960 Venus comb is the most common form, with the others more or less in a tie. DCDuring (talk) 22:45, 30 December 2023 (UTC)
But there are two species with the common names, one a plant, the other a shellfish. I'll see whether “Scandix pecten veneris”, in OneLook Dictionary Search, and “Murex pecten”, in OneLook Dictionary Search, can help resolve this. DCDuring (talk) 22:50, 30 December 2023 (UTC)
Still not sure about Murex pecten's vernacular names.
The main issue here is strictly a matter of orthographic rules: how do you spell the combination of the possessive clitic, 's, with a word that ends in "s" in the singular? I was taught " s' ", but it looks like professionally edited works have used " s's " as well, or just avoided the issue by omitting the clitic. There's variation along those lines for both the plant and the mollusk. I suspect the differences in occurrence of the spellings have as much to do with time and place of publication as with any difference between usage of the plant name vs. the animal name. Chuck Entz (talk) 02:17, 31 December 2023 (UTC)

Move {{}} to {{reshaped arrow}} (or something similar) to make it easier to type, as with #Template:→ above. — excarnateSojourner (ta·co) 22:54, 30 December 2023 (UTC)[reply]

Move {{asdfg}} to {{protologism warning}}. I found the name of this template confusing when I stumbled across it. — excarnateSojourner (ta·co) 23:00, 30 December 2023 (UTC)[reply]

2024 — January

This is a Vietnamese-specific template. Move to {{vi-chu Han form of}} or similar, and delete the redirect. Compare {{vi-Nom form of}} (whose redirect {{Nom form of}} should be deleted). @Benwing2 This, that and the other (talk) 01:47, 1 January 2024 (UTC)[reply]

@This, that and the other Support. Benwing2 (talk) 01:58, 1 January 2024 (UTC)
@This, that and the other Actually, I think this should just be {{vi-Han form of}}. Note that the writing system for Nom characters is called chữ Nôm, and correspondingly chữ Hán for Han characters. Maybe we can ping some Vietnamese editors? (Notifying Mxn, PhanAnh123): Benwing2 (talk) 05:38, 1 January 2024 (UTC)[reply]
Agreed, the term chữ Hán is specific to Vietnamese, and we can drop the “chữ” from the template name for consistency with {{vi-Nom form of}}. Minh Nguyễn 💬 17:51, 1 January 2024 (UTC)[reply]

Medieval Greek from Ancient Greek

Please split Medieval Greek, as in Wiktionary:Beer_parlour/2024/January#Petition_to_upgrade_Medieval_Greek, from Category:Ancient Greek language. (I am sorry that my browser has difficulty reading much of this page.) ‑‑Sarri.greek  I 09:45, 2 January 2024 (UTC)

Support. The request is to split grk-gkm Medieval Greek out of grc Ancient Greek. Previous discussion at Wiktionary:Beer parlour/2023/March#Medieval Greek. @Fay Freak, Al-Muqanna, Nicodene, Vahagn Petrosyan, JohnC5, Benwing2, -sche, the people who participated in that discussion which (like most discussions at Wiktionary, unfortunately) ended inconclusively. By the way, we've been using gkm as if it were an ISO 639-3 code, but in fact it isn't one. A request was made for that code many years ago, but it's never been approved or denied. Therefore if the split is approved, we need to use the exceptional code grk-gkm. —Mahāgaja · talk 11:10, 2 January 2024 (UTC)[reply]
Support, but only if any editors are willing to clean up the mess left behind by the split, otherwise this should wait a bit. Also, we have to first figure out which of the many modern Greek varieties (Standard Greek, Mariupol Greek, Pontic Greek, Italiot Greek, Tsakonian, etc.) are to be descendants of Medieval Greek, and which shouldn't. Thadh (talk) 11:39, 2 January 2024 (UTC)[reply]
I'm fairly familiar with Attic Greek, but not with Medieval apart from what I've read on Wikipedia. The sources that I've typically used for Ancient Greek entries when I used to create them don't cover Medieval. I wouldn't be opposed if you and a team of other people familiar with Medieval want to split it. I don't know if I can be of much use unless there are bugs in modules or something. — Eru·tuon 08:25, 4 January 2024 (UTC)[reply]
Thank you. I will "clean up the mess left behind by the split", @Thadh. Only 248 words need fixing, plus all related Modern Greek (el) etymologies; I have a list of 711 corrections. I do a lot of Medieval Greek at el.wiktionary, so please do not worry: I will not destroy anything. I need one week to fix everything. Please (@Erutuon) also note that in Module:grc-pronunciation, the Period section for Template:grc-ipa-rows, Template:grc-ipa-rows-byz, and Template:grc-ipa-rows-koi needs to say "10th century Medieval" (or "Mediaeval", according to your house rules), not "Byzantine". Adding med1 and med2 to its /data would also be a nice addition. I am very happy to resume work on Medieval Greek! ‑‑Sarri.greek  I 04:51, 6 January 2024 (UTC)
I suppose actually the lines for Medieval Greek should be removed from {{grc-IPA}} and moved into a separate {{grk-gkm-IPA}}. Likewise the option for |dial=gkm needs to be removed from all grc inflection tables and new grk-gkm inflection tables created. —Mahāgaja · talk 08:19, 6 January 2024 (UTC)[reply]
@Mahagaja, no, not needed. IPA will be with parameter period=byz1 (or period=med1, if Erutuon might give an alias to this parameter). Also: learned medieval inflections are identical to the standard ancient inflections and there is no need to provide them separately. Nothing different. At el.wikt, if we care to repeat them, we add title: learned medieval inflection as in ancient greek. But we shall not provide any of that now. Never mind for vulgar inflections (I'll let you know about these) Thank you for your concern. ‑‑Sarri.greek  I 08:26, 6 January 2024 (UTC)[reply]
We're really not supposed to use one language's templates in another language's entries, so if grk-gkm and grc are two different languages, then we're really not supposed to use things like {{grc-IPA}}, {{grc-decl}}, {{grc-adecl}}, and {{grc-conj}} in grk-gkm entries. And there may still be some differences; for example, does Medieval Greek ever use the dual number? If not then the dual shouldn't be shown in {{grk-gkm-decl}} and {{grk-gkm-conj}} as it is in {{grc-decl}} and {{grc-conj}}. —Mahāgaja · talk 09:10, 6 January 2024 (UTC)[reply]

Thank you. (Sorry, this page makes my Chrome browser unresponsive, and it is often difficult to write here.) Thank you, @Mahagaja. The code gkm is in wide use, although it has not yet been approved by ISO; there have been attempts to draw attention to its acceptance, and I will notify you if something changes officially. At el.wikt there are also the dialectal codes gkm‑crt and gkm‑cyp as subordinate codes.
Thank you, @Thadh. I will check all insource:xxx and intitle:xxx occurrences of the relevant words and correct them. For the update of Module:families/data/hierarchy#Hellenic and Module:etymology languages/data#gkm, I quote here the official Greek source: Modern Greek Dialects What is a dialect? - Research Centre for Modern Greek Dialects, Academy of Athens

Nowadays we consider as dialects the Pontiac (in which the Greek of Crimea-Mariupol are included), the Cappadocian, the Tsakonian and the Southern Italian. All the other regional variants of the Modern Greek Standard are known as idioms. In particular, the Cretan and Cypriot idioms are exceptionally known as dialects, thus acknowledging an intermediate level of language variation.

All the modern Greek dialects, namely Cappadocian (cpg), Italiot (grk-ita), and Pontic (pnt, which includes the Mariupol idiom), as well as Modern Greek (el) itself, come from Medieval Greek, except Tsakonian (tsd), which is a special case. Thank you ‑‑Sarri.greek  I 13:07, 2 January 2024 (UTC)

A bit off-topic, but most researchers I have read claim Mariupol Greek is, in fact, not a Pontic lect and doesn't share much, if anything, with Pontic that it doesn't also share with other Greek lects. Thadh (talk) 13:34, 2 January 2024 (UTC)
I kinda doubt editors are willing to clean up, or review the dialectology of the Abstandsprachen. The ideological distinction is barely worth the effort for that and for always checking in which chronolect a word has been used, an argument I often use, as we do not go completely without distinction if we don’t split at the L2 level: now it means we write a label if we know and abstain if we don’t bother. The result could become more often that someone doesn’t add a valid entry or etymological note due to fear of making a mistake. Fay Freak (talk) 19:46, 2 January 2024 (UTC)[reply]

I oppose the change in name from “Byzantine Greek” to “Medi(a)eval Greek” for referring to this chronolect. I’m undecided about the split itself. @Sarri.greek: Could you point us to some well-developed Byzantine Greek entries in το Βικιλεξικό to give us some idea what they’d look like, and to what extent they’d contrast with Ancient Greek and Modern Greek entries, please? 0DF (talk) 02:19, 7 January 2024 (UTC)[reply]

@0DF. _For the term, professors of linguistics might answer your question (ref). _Examples Παραδείγματα at wikt:el:Κατηγορία:Μεσαιωνικά ελληνικά. ‑‑Sarri.greek  I 08:45, 7 January 2024 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Sarri.greek: Thank you for your response. I'll address the παραδείγματα first.
That category you linked (el:Κατηγορία:Μεσαιωνικά ελληνικά = “Category:Mediaeval Greek”) contains 1,804 entries, so I hope you'll forgive me that I only checked out the first column of entries (from el:ἀβαμπαρλιέρης to el:ἀλλάγιον — 63 pages). Of those, none of the gkm entries contained IPA transcriptions, and the only ones with inflection tables are el:αἰγοβοσκός and el:ἀλλάγιον. Those don't appear to be what I'd call "well-developed". As to contrast, the declension tables in αἰγοβοσκός and ἀλλάγιον are identical to Ancient Greek ones, even including the δυϊκός (duïkós, dual). As they are, those 63 entries suggest there would be no benefit to splitting gkm out of grc and that doing so would only create useless redundancy. That being said, I suspect that there could be some value in the split in the cases of entries like el:-άγρα, el:-αινα, and el:-αλγία, which present (currently unseized) opportunities to explain the loss of the accusative , the loss of the dative entirely, and the collapse of the Ancient nominative–vocative plural -αι and accusative plural -ᾱς into the Modern -ες. I also see cases like the Modern Greek entry καλοκαίρι (kalokaíri, summertime, summer), which currently traces the word's etymology, via Byzantine Greek καλοκαίριν (kalokaírin, good season, good weather), to Ancient Greek καλοκαίριον (kalokaírion, fine weather). It would be great to know how καλοκαίριν (kalokaírin) declines; that being said, is there any reason why its declension couldn't be showcased perfectly well as a {{lb|grc|Byzantine}} {{alternative form of|grc|καλοκαίριον}}?
Now to the nomenclatural issue.
I've taken a look at the authority you cited; for the benefit of others reading this, here are its bibliographical details:

  • David Holton with Geoffrey Horrocks, Marjolijne Janssen, Tina Lendari [Stamatina Lentari], Io Manolessou, and Notis Toufexis [Panagiotis Toufexis] (2019) The Cambridge Grammar of Medieval and Early Modern Greek, four volumes, Cambridge · New York · Port Melbourne · New Delhi · Singapore: Cambridge University Press, →DOI, →ISBN, →LCCN

The authors' rationale for their disuse of the term Byzantine Greek is to be found in the introduction to the work, in this paragraph from page xix:

The system of periodization that we have used is not based on external criteria, which might relate to historically significant dates, such as wars, conquest or independence. For this reason we do not employ the term “Byzantine Greek”: for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not “Byzantine” in a political sense. Our criteria are instead internal ones, based on clusters of important linguistic changes that we see as occurring around 1100, 1500 and 1700 (for details see Holton 2010, Holton/Manolessou 2010). Consequently, we employ the following terminology in order to denote sub-periods of the history of Greek, terms that also conveniently correspond to those widely used for periodization in Western historical thought: Early Medieval (EMedG) from about 500 to 1100; Late Medieval (LMedG) from about 1100 to 1500; Early Modern (EMG) from about 1500 to 1700.

Appeals to authority are all well and good, but that is poor reasoning. Yes, politics affect language, and the Byzantine Empire, whilst it existed, was (I think you'll agree) the political, cultural, and linguistic "centre of gravity" of the Greek world. The authors write that “for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not ‘Byzantine’ in a political sense” (my emphasis); however, a person's language doesn't (immediately) change with political borders. Earlier op. cit., on page iii, there occurs the sentence “The geographical area where Greek has been spoken stretches from the Aegean Islands to the Black Sea and from Southern Italy and Sicily to the Middle East, largely corresponding to former territories of the Byzantine Empire and its successor states.” Doesn't that show the centrality of that polity to the history of the Greek language during this period? The authors' reason is weak, and I reject it.
I see another problem here, which is that Holton et al. seem to be treating this chronolect as existing between AD ~500 and ~1700. As you probably know, the Middle Ages (a.k.a. the Mediaeval period) are traditionally bookended by the falls of two Roman Empires, starting with the fall of the Western Roman Empire in AD 476 and ending with the fall of the Eastern Roman Empire (i.e. the Byzantine Empire) in 1453; it's not too much of a stretch to push it later, to 500–1500, but I don't know any informed person who calls the seventeenth century mediaeval, so we couldn't call this chronolect “Medi(a)eval Greek”. Holton et al. are not alone in this, either: on page xviii op. cit. they mention the “dictionary of Kriaras and the Vienna-based Lexikon zur byzantinischen Gräzität”; that “dictionary of Kriaras” is Emmanuel Kriaras' Λεξικό της Μεσαιωνικής Ελληνικής Δημώδους Γραμματείας, 1100–1669 (Dictionary of Mediaeval Greek Vernacular Literature, 1100–1669, my emphasis). Maybe the Greek Μεσαίωνας (Mesaíonas) is conceived of differently from the English Middle Ages. It would be possible to call the chronolect “Mesaeonic Greek”, but we'd very much be neologising there; I could only find one instance of meseonic, so the adjective alone wouldn't even satisfy the criteria for inclusion.
Finally, I note that the other dictionary mentioned alongside Kriaras' is entitled Lexikon zur byzantinischen Gräzität (Lexicon of Byzantine Graecity), so it's apparent that not everyone rejects the term Byzantine Greek. Indeed, a text search for the string byzantin (case- and diacritic-indifferent) in the bibliography of The Cambridge Grammar of Medieval and Early Modern Greek (which occupies pages xxxvii–clxvi thereof) finds 201 instances. Some of those may be false positives, but that search would also have missed any instances hyphenated across a line break (byz-antin, byzan-tin, vel sim.) or in languages that spell the word bizant- or otherwise. My point is that Byzantine Greek is still a common term and one we should use.
0DF (talk) 09:23, 8 January 2024 (UTC)[reply]
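
As a rough illustration of the kind of case- and diacritic-indifferent string search described above, here is a minimal Python sketch. The function names and sample strings are hypothetical and chosen only for illustration; this is not the tool that produced the count of 201.

```python
import unicodedata


def fold(text):
    """Return text with combining diacritics removed and case folded."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def count_matches(text, needle="byzantin"):
    """Count non-overlapping occurrences of needle, ignoring case and diacritics."""
    return fold(text).count(fold(needle))
```

A search of this kind finds "Byzantine" and "byzantinischen" alike, but, as the comment above points out, it misses a word hyphenated across a line break and any spelling such as bizant-.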

A bit of a nitpick, Byzantine Greek isn't any better than Medieval Greek as a label for the language after the fall of the Byzantine Empire. Strictly speaking it wasn't Byzantine Greek at that point, but Ottoman. But either term applies well to the majority of the period. — Eru·tuon 00:42, 9 January 2024 (UTC)[reply]
I don't like the term Byzantine Greek because a naive reader could think it referred to a regional dialect rather than a chronolect. It would be easy for someone to think it referred to Greek as spoken in Byzantium as early as the time of Alexander the Great, and that it would not refer to Greek as written in Athens or Alexandria in AD 600. Also, 0DF, Holton et al. explicitly do not call the period from 1500 to 1700 medieval; they call it Early Modern Greek, just as we call the English of the same period Early Modern English. Wiktionary already uses 1453 as the border between grc and el; there's no reason separating grk-gkm out from grc should entail shifting the starting date of el later than it currently is. —Mahāgaja · talk 08:02, 9 January 2024 (UTC)[reply]
I don't have much of a stake in this but I also favour Medieval Greek, though I wouldn't be opposed to having Byzantine Greek as an etym-only language attached to it. Theknightwho (talk) 08:43, 9 January 2024 (UTC)[reply]
Side issue: if we split Medieval from Ancient, I suppose the Byzantine flag which is currently used for Ancient Greek in the "Add country flags next to language headers" gadget will need to be moved to Medieval Greek, and Ancient Greek will either need a new flag or no flag. - -sche (discuss) 19:48, 12 January 2024 (UTC)[reply]
Preferably none. —Mahāgaja · talk 22:23, 12 January 2024 (UTC)[reply]

@Mahagaja, Erutuon, Thadh, since I do not see any more objections: _phase_1: I have already cleaned up Modern Greek etymologies involving gkm (need 70 more to do, also supplying sources, ipa etc), to be ready for the term Medieval instead of Byzantine. This is

These steps are for the name change. If you give permission and agree to the upgrade from grc, then _phase_2 is the move from Module:languages/data/3/g to Module:languages/data/exceptional. The working alias gkm is already in place, and I will be able to proceed with corrections to section titles, sources, etc., wherever needed, especially where Modern etymologies need a Medieval lemma. Thank you for your help. ‑‑Sarri.greek  I 10:56, 3 February 2024 (UTC)

There are objections. I would like to add that I too oppose renaming from Byzantine Greek or extending its time frame past the 15th century. Nicodene (talk) 02:17, 4 February 2024 (UTC)[reply]
@Nicodene, I have suggested nothing about post 15th century = Early Modern Greek which we deal with in polytonic at el.wikt, not monotonic. But we are at _phase_1 now, which is to rename 'Byzantine language' to Medieval Greek. I am glad that you are interested in periodisation of Hellenic language; it is rare that non hellenists are interested or take time to study this. We can discuss it, if you wish at our Talk pages? Thank you ‑‑Sarri.greek  I 02:35, 4 February 2024 (UTC)[reply]
(Why not here?)
I see. For the record I do support splitting it out of Ancient Greek, even if the (prescriptively correct, 'learned') inflections are going to be largely the same.
So far I don't see any real argument against the label 'Byzantine'. The point about political control is a bit spurious, as the label 'Byzantine' is in no way limited to the political level. It is civilisational.
The point about 'Byzantine Greek' being misinterpretable as 'the dialect of the colony of Byzantion' might be convincing if not for the unlikelihood of someone being simultaneously knowledgeable enough about history to even be aware of the (let's be honest) rather unimportant pre-Constantine city, yet also historically illiterate enough to be unaware of what 'Byzantine' means 99 times out of 100. Nicodene (talk) 02:58, 4 February 2024 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Sarri.greek: Respectfully, I think you're being too hasty with this. I acknowledge I've been slow to respond; that has in large part been due to my work researching Atticism (see it, Citations:Atticism, and some of the word's relations) in connection with the more substantive question of whether there is value in splitting gkm from grc. My understanding of that matter is more-or-less in line with this paragraph from the website for Trinity College Dublin's 2024 International Byzantine Greek Summer School (IBGSS):

Byzantine Greek is the dominant form of Greek written during the Byzantine Empire (AD 330–1453). The spoken language changed significantly in this period and came close to Modern Greek, but most Byzantine authors use conservative forms of Greek that looked back to Classical Attic, the Hellenistic Koine and Biblical Greek. Therefore much of the vocabulary, morphology and syntax of Byzantine Greek are not significantly different from Classical Greek, which makes this course a suitable preparation also for reading Classical literature and the New Testament.

But to the matter of the nomenclature: I had previously been arguing that Byzantine Greek is just as good a term as Medieval Greek, but it appears that they may not be entirely synonymous. Please see the quotations I've collected at Citations:Medieval Greek. You'll see that Evangelinos Apostolides Sophocles uses the term Byzantine Greek (for 330–1453) and remarks that “if the expression Mediæval Greek is to be used at all, it should be restricted to the language of [the second epoch of the Byzantine period]” (622–1099), whereas Irach Jehangir Sorabji Taraporewala states that “Byzantine Greek is a direct development from the literary dialect of the second transition period [300–600]” but that “[l]iterary Mediaeval Greek [1000–1450] is a development of the colloquial of the previous (Neo-Hellenic [= Byzantine Greek]) period [600–1000]”; those two sources directly contradict on the details, but they both distinguish the two chronolects. Edward Augustus Freeman speaks explicitly of “a literature, mediæval Greek or Romaic, as distinguished from Byzantine” and the writer for UNESCO discusses in a single sentence borrowings into “Byzantine Greek”, “mediaeval Greek”, and “Neo-Greek”; they appear to have particular time periods in mind, but I'm not sure what they are. And George Leonard Huxley refers to “Byzantine Greek” and “mediaeval Greek language and literature” in consecutive sentences, presumably synonymously, but not obviously so. Many more sources use both terms within the same work, without it being clear whether the terms mean different things or whether they're making a distinction without a difference. Can you explain these distinctions? Are they valid? If not, why not? If so, do you propose more than one offshoot to grc? If not, why not? If so, how many, and what should they be?
@Erutuon: I would argue that, in the same way that Greek writers contemporaneous with but geographically outside the bounds of the Byzantine Empire may nevertheless conform to Byzantine literary norms, Greeks writing after the Empire's fall may, from inertia or nostalgia, also conform to Byzantine literary norms, despite the change in their political context. By contrast, the Middle Ages are strictly chronological and have an exact terminus in the 1453 fall of Constantinople.
@Mahagaja: In my experience, Byzantium is used far more frequently to refer to the Byzantine Empire than it is to refer to the city; most people are unaware that the usage is originally a synecdoche and, whilst a lot of people know Istanbul used to be called Constantinople, far fewer know that Constantinople used to be called Byzantium (and fewer still know that Byzantium used to be called Lygos, but I digress). As such, I don't think that it is at all likely that a naïve reader would make that mistake. A mistake I know some people make, however, is with the qualifiers High or Upper and Low or Lower in geographical and geographically-based terms like Upper Egypt vs. Lower Egypt and High German vs. Low German, with High and Upper mistaken to mean "north(ern)" and Low and Lower used to mean "south(ern)"; I assume the confusion arises from the conventional orientation of maps in the Anglosphere. Despite that confusion, I would not, and I doubt you would, advocate replacing those terms with ones less susceptible to such naïve confusion. For another example, I'm sure a naïve reader could mistake Andalusian Arabic for Arabic spoken in the (present-day) Spanish region of Andalusia; the synonym Moorish Arabic is not susceptible to that confusion, so should we use that instead? There are other confusables as well, I'm sure. ⸻ Re Holton et al., I know they don't call Greek 1500–1700 "Medieval"; the fact that I quoted above a paragraph of theirs that ends "Early Modern (EMG) from about 1500 to 1700" should make that clear. My meaning was that Holton et al. are treating Greek 500–1700 as a single chronolect, which they call "Medieval and Early Modern Greek" and which Kriaras calls Μεσαιωνική Ελληνική (Mesaionikí Ellinikí). Holton et al. make a point of saying that their “system of periodization…is not based on external criteria” and that their “criteria are instead internal ones, based on clusters of important linguistic changes that [they] see as occurring around 1100, 1500 and 1700”. If we did the same, that might indeed entail shifting the starting date of el later than it currently is.
@-sche: I don't have country flags beside language headers turned on and neither am I inclined to turn them on, but if you're interested in having them, you could use the Argead star (commons:File:Vergina Sun WIPO.svg) for Ancient Greek; the English Wikipedia uses that image in its country infoboxes as the flag of the Empire of Alexander the Great, as well as in many other places.
@Nicodene: I largely agree with you, but if we're going to split out gkm, wouldn't it be better to give the inflections that show the changes taking place between Ancient and Modern Greek? Wouldn't it be rather redundant if they had the same inflectional information as that given in Ancient Greek entries?
0DF (talk) 03:46, 4 February 2024 (UTC)[reply]

More than one set of inflections could be shown - the learned and Atticising versus the humble and 'demotic', at least by the time of the Digenes Akritas. Or, working with one set of inflection tables, cases or endings falling out of vernacular use could be placed in brackets with an explanatory note regarding register. Apart from that there would be differences in phonology and in various cases semantics as well. Nicodene (talk) 03:56, 4 February 2024 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Nicodene: To give us all some idea of the kind of inflectional variability we're dealing with, I added a table to βαθύς (bathús) of its Byzantine forms. There's already a lot there, but that's an underrepresentation, if anything. Annoyingly for our purposes, Holton et al. specifically omit the dative from their paradigms, despite the fact it occurs:

Nominative, genitive and accusative cases continue to exist in LMedG and EMG. The dative case, however, had gradually disappeared from the spoken language during the first millennium and its main functions were reassigned (see Humbert 1930, Lendari/Manolessou 2003, Horrocks ²2010: 183–5, 284, Holton/Manolessou 2010: 546–7). Nonetheless, datives survive in many of the written texts that this Grammar is based on, though mainly in documents and other texts in mixed or higher registers, and they may have a range of inherited functions. Particularly common are datives governed by the prepositions ἐν and σύν. Because the dative had ceased to be part of the spoken vernacular by about the 10th c., dative forms are not included in the paradigms set out in the chapters that follow.¹
¹The only exception that has been made is the dative reciprocal pronoun ἀλλήλοις, on the basis that its occurrence, which is quite rare, seems to be as much a lexical survival as a morphosyntactic feature (see 5.12).

—volume II, § 1.1, pages 241–242

and even in novel formations:

In addition to instances like the above, which could be deemed grammatically “correct” (i.e. in accordance with AG morphology and syntax), we also find dative forms with innovative phonology, stress or morphology, or new lexical items: [τουποθεσίᾳ, Ρεθέμνει, βοθρακοῖς, παρρησιᾷ, Ὀγκριᾷ, Ἀράβοις, ἑταιρίδαις, νήσαις, συνπάσοις, Ἑλλήνοις, δοράτοις, ὀξέοις, ἐμπιστευτηόδες, ἐμπιστευτιόδαις (toupothesíāi, Rethémnei, bothrakoîs, parrhēsiâi, Onkriâi, Arábois, hetairídais, nḗsais, sunpásois, Hellḗnois, dorátois, oxéois, empisteutēódes, empisteutiódais)]
Of particular interest is the use of dative forms for loanwords: [σούγλᾳ, μπασταρδικῷ, σερραγίῳ, ὀντάσι (soúglāi, mpastardikôi, serrhagíōi, ontási)]

ibidem, pages 242–243

Moreover, Holton et al. exclude Atticist texts entirely (“the texts on which this Grammar is based – i.e. texts that are not systematically archaizing” — ibidem, page 243); accordingly, if we're to produce accurate and (in aspiration) exhaustive inflection tables, we shall have to supply the missing Attic forms and datives.
Holton et al. mention the dual number, as far as I can tell, exactly twice in their entire four-volume grammar:

The AG reciprocal pronoun (“one another”) had dual (gen. ἀλλήλοιν) and plural (gen. ἀλλήλων) numbers, and was declined for gender and case (genitive, accusative and dative).

—volume II, § 5.12, page 1,183

The London manuscript can be consulted at http://www.bl.uk/manuscripts/. Ms. Athous Pandel. 538, edited by Vasileiou (2003) has the unusual form εγκρεμίζεσθον Varl. & Ioas. (Pantel.) 303, which is unlikely to be an archaic dual (as the subject is 2 sg.), and probably a writing mistake for ἐγκρεμίζεσουν.

—volume III, § 4.3.1.2, page 1,551, footnote 54

so I don't know whether to infer from their silence that the dual saw no use in Byzantine Greek, or that its use was restricted to Atticist texts, and that it is for that reason that Holton et al. make no mention of it.
Certainly, we can't rely on Holton et al. alone to guide what we do about Byzantine Greek. Nevertheless, that table at βαθύς (bathús) is something concrete to work from. 0DF (talk) 23:37, 7 February 2024 (UTC)[reply]

@0DF The effort is quite admirable, thank you. I can't imagine it is sustainable across hundreds of entries, so generating variants with an automated template would be the long-term approach. The tables would probably include a prominent disclaimer like 'not all forms necessarily attested'. The automated romanisation can probably be prevented somehow to alleviate crowding. Nicodene (talk) 23:19, 8 February 2024 (UTC)[reply]
@Nicodene: I agree that the transliterations take up too much space and that they probably are best removed by default from Byzantine inflection tables. I also agree with including a prominent disclaimer of the kind you describe. I got the forms of βαθύς I added to that table from Holton et al., volume II, pages 746–757, wherein βαθύς serves as their paradigm for “Adjectives with Originally 3rd-Declension Endings” (§ 3.3), specifically “Oxytone Adjectives in -ύς” (§ 3.3.1). On the basis of Holton et al., volume I, page xxxiii (“When whole words are enclosed in brackets in the tables, the forms in question may reasonably be assumed to have existed, but no example has been located in the LMedG and EMG texts examined, e.g. (μιανοῦ), (χρυσοῦ).”), I presented each form which they give in parentheses instead with a preceding asterisk, as is standard in historical linguistics. Despite there being many forms already, the forms given by Holton et al. are an under-representation, if anything (Holton et al., volume I, page xxxvii, prefacing the Bibliography: “Classical, post-classical, early medieval and other learned Byzantine texts are not included below.”; volume II, page 746, below the synoptic table for βαθύς: “Residual [scil. inherited Attic] forms, e.g. βαθέος, βαθεῖς, are not included in the above table, but will be discussed below where relevant.”; ibidem, page 242: “dative forms are not included in the paradigms set out in the chapters that follow”; ibidem, page 243: “the texts on which this Grammar is based – i.e. texts that are not systematically archaizing”); the forms Holton et al. give are only those non-dative forms which occur in lower-register texts written 1100–1700: a rather limited subset of the “Medieval and Early Modern Greek” whole that you'd reasonably expect that they're trying to describe. 
On the other hand, if we are to adhere to the 1453 cut-off for Byzantine Greek, we need to be careful to exclude those forms that occur only in texts from the seventeenth, the sixteenth, and/or the latter half of the fifteenth centuries.
I am increasingly recognising that inflection tables for Byzantine Greek terms ideally require certain features that are different from those that befit inflection tables either for Ancient Greek terms or for Modern Greek terms. One of those features would be the indication of pronunciation for each form, because the vocalic mergers of Byzantine Greek render its graphemes surjective upon its phonemes (i.e., with the exception of the bijective α/a/ and ου/u/, each vowel may be written in multiple ways, namely: αι, ε/e̞/; ο, ω/o̞/; ει, η, ι/i/; οι, υ, υι/y/, then, upon the completion of iotacism in the eleventh century, /i/) and because the representation of significant phonological processes that Byzantine Greek underwent (synizesis and various deletions) are only haphazardly reflected in spelling; this would call for a tie-in between a module such as that behind {{grc-IPA}} on the one hand and modules such as those behind {{grc-decl}}, {{grc-adecl}}, and {{grc-conj}} on the other. Another desirable feature, in the light of Holton et al., volume I, page xxxiii (“smaller tables classify the allomorphs as ‘General’ (if they occur widely in the texts examined), ‘Restricted’ (if they are found in only part of the period covered by the Grammar, or only in certain areas or certain types of text), or ‘Rare’ (if their occurrence is very limited)”), would be the means seamlessly to mark each form for its respective period, locale, genre, register, and frequency. The need for bespoke inflection tables, distinct from those designed for Ancient and Modern Greek, is an infrastructural and thematic argument in favour of treating Byzantine Greek terms separately from Ancient Greek terms on the one hand and Modern Greek terms on the other. 0DF (talk) 22:51, 6 March 2024 (UTC)[reply]
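
To make the many-to-one relationship between vowel spellings and sounds concrete, here is a minimal Python sketch of the correspondences listed in the comment above. The table and function names are hypothetical; the values are only those given in that comment, with /y/ becoming /i/ once iotacism is complete. This is an illustration, not a proposal for how {{grc-IPA}} or any other module should be written.

```python
# Vowel spellings and their values, as listed in the comment above.
VOWELS = {
    "α": "a", "ου": "u",            # the two one-to-one cases
    "αι": "e̞", "ε": "e̞",
    "ο": "o̞", "ω": "o̞",
    "ει": "i", "η": "i", "ι": "i",
    "οι": "y", "υ": "y", "υι": "y",  # /y/ until iotacism is complete
}


def phoneme(grapheme, after_iotacism=False):
    """Return the vowel value for a spelling; /y/ becomes /i/ after iotacism."""
    value = VOWELS[grapheme]
    if after_iotacism and value == "y":
        return "i"
    return value


def spellings_by_phoneme(after_iotacism=False):
    """Group spellings by the vowel they represent."""
    groups = {}
    for grapheme in VOWELS:
        groups.setdefault(phoneme(grapheme, after_iotacism), []).append(grapheme)
    return groups
```

Grouping the spellings this way shows why a reader cannot recover the spelling from the sound: after iotacism, six different spellings in this table all stand for /i/, which is the argument for showing a pronunciation beside each inflected form.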

phase 1

Notifying administrators for grc @Mahagaja, JohnC5, Erutuon, and also @Thadh, Theknightwho, Benwing2: more than one month has passed. Am I to proceed with _phase_1: rename Byzantine to Medieval? Do I have permission from administrators to start? Would an admin help by updating Module:etymology languages/data to use "Medieval Greek", with aliases = {"Byzantine Greek"}? (Because I am not an administrator, I cannot make the change myself.) Thank you.
On some other points: (I did not expect σχοινοτενεῖς, prolix discussions on this page, but rather at the corresponding Beer parlour discussion. Nevertheless, I am obliged to respond and clarify:)

  • Early Modern Greek: @Mahagaja, yes, the phase 1453-1669 (the termination of Cretan literature) is Early Modern Greek (πρώιμη νεοελληνική), interchangeably 'Late Medieval' (όψιμη μεσαιωνική). Why? _1. Because of its retained mediaevalisms, many prominent linguists use the terms 'Late Medieval' and 'Early Mod.Gr.' interchangeably (we can discuss this further). And mainly _2. I would not propose a split of Modern Greek or further splits in general. We study it under Med.Gr. because its original script is polytonic, and all modules, translit, IPA, etc. are already in place (probably with some modifications, or a few templates to be developed).
  • Period versus Style/Register. Hellenistic Koine (or even the Attic dialect) is used by authors long after the 6th century (even until the 20th century, in the form of Katharevousa). The typology (inflections etc.) of their words follows the grammatical rules of Ancient Greek. (A label like learned may be used for some medieval Koine-style neologisms that might be of interest for Medieval Greek.) We will not duplicate existing Ancient Greek inflections.
  • Polytonic original script. Please note that conservative Greek linguists of the past century would 'correct' forms in their editions according to Anc.Gr. rules, while the progressive ones (who were prosecuted during these polemic times; see the Trial of Accents), like Kriaras, at some point switched to monotonic. Nowadays, it is inconceivable to change the script of an original source in a critical edition. Please note that everything Greek up to 1982 was written in polytonic. Nowadays everything, ancient texts too, might be seen (e.g. on the internet or in new books) written monotonically, because it is easy to type and cheap to print.
  • Polytypy in Hellenic: @Nicodene, yes, it is a fact. It is a stubborn language: fluctuation in suffixes runs through all grk. See modern verbs like εκλέγω#Conjugation, Template:el-conjug-'ακούω'. One cannot avoid Modern Greek inflections because of their many allomorphs (and see how much is omitted at Appendix:Greek_verbs#Omitted!).
  • Will Medieval Greek acquire inflections at once? No. It takes some time for discussions, proposals, and trials to crystallise a method. A Working/Trials/Feedback.for.Med page would be a good way to start. (Please check some first attempts at wikt:el:παλληκάριον, wikt:el:σκοῦπα, wikt:el:Template:gkm-κλίση-ουσ, and, for a neologism that is nevertheless learned, i.e. in the ancient fashion, wikt:el:ἀπόκρεως.) Med.Greek does not have a dative. The table by our learned friend 0DF at βαθύς is a fusion of Koine datives into Med. for the difficult adjective categories in -ύς and -ής, which preserve lots of learned forms. We do not add the dative for Mod.Greek either (el.wikt has Mod. terms with a dative, but not in its tables).
    I have not proposed any trial for an Appendix of clitic paradigmata and/or tables with the distinction of 'expected versus attested' forms yet.

Why try to form a neoteric Section 'Medieval Greek' here rather than at el.wikt? Because here, there are so many learned and informed editors: experts -some, professionals-, who can help with their bibliography, their valued opinion in this project. At el.wikt I am totally alone in this project and I found it exhausting to update and patrol, make trials, have no feedback, no help for Med.Greek. All experts are assembling here. This wiktionary is the avant-garde of all wikts.
Admins! Please help to begin this project. Give me permission to start with _phase_1: rename Byzantine to Medieval. Allow _phase_2 (upgrade from etymol.language to an autonomous section), so that I can use the title Medieval Greek for poor Τζέτζης who has been waiting for this for a long time. Help, please, please, allow this long phase of Greek at en.wikt to exist! Thank you. ‑‑Sarri.greek  I 04:58, 9 February 2024 (UTC)[reply]

@Sarri.greek I tried to read through this discussion. It is confusing because there are two different issues (rename Byzantine -> Medieval, and split out Byzantine/Medieval from Ancient Greek). For issue 1 (rename), it looks like maybe two people (User:0DF and User:Nicodene) disagree with the name change and up to four are in support (User:Sarri.greek, User:Thadh, User:Mahagaja and User:Theknightwho). This is possibly enough for a rename but I feel uncomfortable without a clearer consensus, esp. given that I'm not sure whether User:Fay Freak opposes the name change and/or split (their prose is, as is typical, somewhat impenetrable). User:Erutuon and User:-sche seem willing to accept one or both changes but without a strong opinion. For issue 2 (split), it looks like User:Sarri.greek, User:Thadh, User:Mahagaja and User:Nicodene are in favor of a split, while User:0DF is undecided, User:Fay Freak possibly opposes (?), and User:Theknightwho has not expressed an opinion. Can all the people I just named let me know (1) did I get your opinion correct on both issues and (2) if not, what is your opinion, both about issue 1 (the rename) and issue 2 (the split)? Benwing2 (talk) 05:29, 9 February 2024 (UTC)[reply]
Yes, support. Thadh (talk) 11:19, 9 February 2024 (UTC)[reply]
Support — long overdue!   — Saltmarsh🢃 06:26, 9 February 2024 (UTC)[reply]
@Benwing2: I was more warning with respect to the ambiguous consequences, without obstructing. If people are willing to invest work for a split, it is not my due to oppose it, since I do not expect to do Greek in the medium-term anyway, as it is low on my priority list, relatively to other interesting languages – I have not even followed the forthgoing of the discussion and don’t know what you all exactly intend, especially with respect to the 300–600 time, when I have derived Arabic terms from Byzantine Greek when I am not really sure whether they are from before Islam or right after it or a century later etc. and it might be split to Late Koine and Medieval Greek, which I am not particularly keen to revisit either and Greek editors might be good enough to pinpoint. Fay Freak (talk) 07:07, 9 February 2024 (UTC)[reply]
@Fay Freak OK thank you, that clears things up. Benwing2 (talk) 07:12, 9 February 2024 (UTC)[reply]
Thank you @Fay Freak for not opposing. Indeed the period of Late Koine 300-600 (600 accepted as turning point with original-Greek parts of Novellae at Iustinianos legal reforms, -languagewise, while history has a different periodisation-), is under the jurisdiction of Ancient Greek administrators. As seen at {{R:DGE}} and Bailly2020: these dictionaries extend to authors of up to 6-9th, 10th, 13th centuries, when such authors use Koine as high register. ‑‑Sarri.greek  I 07:36, 9 February 2024 (UTC)[reply]
@Benwing2, sorry to bother you again: what is going to happen? Would you like me to call more people to vote? Mr @A. T. Galenitis who edits all phases of Greek including Medieval is away. As you see, not many are interested in Greek. But, I am, I am: I am willing and available! Every year, less and less people will be voting. In the end, I will be the only voter! I am awaiting and anxious to start editing. Thank you. ‑‑Sarri.greek  I 18:26, 15 February 2024 (UTC)[reply]
@Sarri.greek I'd like to give it a couple of weeks. As it is looking, the split seems pretty clear and the name change is leaning towards, although User:0DF has not added their votes yet. Note that in general you should not canvass votes, i.e. ping people specifically for voting purposes esp. if you believe they will vote in a particular way that you desire. Benwing2 (talk) 00:51, 18 February 2024 (UTC)[reply]
@Benwing2, of course, of course! People vote if they agree, not because I called them. I am just informing people with whom we have been discussing about this for more than a year, people that have -or want to- edit Greek. Just Mr A. T. Galenitis, an excellent editor, who supports strongly. But they do not come very often, and they do not get messages except from their Talkpages. I always check Related changes for el and for grc, and I am sorry to say, that there are very few people interested. Perhaps some editors doing very many languages, create some exotic lemmata. Thank you very much, I can wait, I know how busy you are. ‑‑Sarri.greek  I 08:53, 18 February 2024 (UTC)[reply]
Thank you very much @Sarri.greek for bringing this to my attention and for putting once again the effort for this very worthwhile change. Indeed, I have been rather inactive lately, but as the creator of many gkm lemmata I am adamant on the need for this split with arguments which have been repeated multiple times. I would be more than happy to put the required work for my own lemmata and create more while at it. Regarding the naming, both approaches have some historical value (with varying power of persuasion) to them yet from a functional point of view it doesn't make much sense to oppose the recent literature and main body of research within the field where "Medieval Greek" has become dominant (vide Holton's et al. recent monumental Cambridge Grammar of Medieval and Early Modern Greek) A. T. Galenitis (talk) 17:09, 16 March 2024 (UTC)[reply]
I am sorry that I write so schoenotenically; I have difficulty with concision. I'm also sorry that I have taken so long to respond; I have done a lot of research regarding this topic since η Δις Σαρρή first petitioned the Beer parlour for these changes. As you'll see below, whilst I still oppose the change of this chronolect's name, I have come to support its split into a lect with its own L2 header, at least in principle. I feel I should explain my position, especially regarding my “concern…that η Δις Κατερίνα Σαρρή has a different understanding of what this vote endorses from the understanding of the other voters here”.
Δις Σαρρή· When you write things like:
I am left with the impression that you want the label “Medieval Greek” to refer only to the relevant period's basilect of the Greek diglossia. If that is your position, what then happens to the acrolect of that period? Does it remain part of Ancient Greek (grc)? And if so, should Katharevousa be treated similarly? Ultimately, is post-Classical Greek to be split primarily by register? Perhaps I've misinterpreted you, but if so, please clarify your position. If this is your position, you should make it explicit, so that everyone knows exactly what's being voted on. Perhaps this is what Fay Freak meant by “ambiguous consequences”. I could support either one, be it a split by period or by register. Here's a litmus test: In what variety of the Greek macrolanguage was the Suda originally written?
What I could not support is a split by period that excludes from Byzantine Greek its higher-register elements. You seem to want to do that when you say “Med.Greek does not have Dative.” and “We do not add Dative at Mod.Greek either”. It is untrue that Byzantine Greek does not have the dative; on the contrary, as Staffan Wahlgren writes, “The most important observation…is that the dative is so surprisingly alive and productive in such a wide range of Byzantine texts.” (Wahlgren 2014: Abstract) Even Holton et al. (2019: II, 241–243), whom I've already quoted at length above, acknowledge that “datives survive in many of the written texts that th[eir] Grammar is based on” and that “[p]articularly common are datives governed by the prepositions ἐν and σύν”, before recording their decision nevertheless to exclude all datives (except ἀλλήλοις) with the single sentence “Because the dative had ceased to be part of the spoken vernacular by about the 10th c., dative forms are not included in the paradigms set out in the chapters that follow.” — Blink and you'll miss it! And those datives aren't all just learned preservations; especially noteworthy is the Early Modern Cretan Greek noun ἐμπιστευτιός (empisteutiós), which is one of the “[w]ords belonging to [a] paradigm [which] have only been found in LMedG and EMG texts from Cyprus. In all cases these words are local variants of masculine words in -τής…. The earliest examples are from Assizes B (15th-c. ms).” (Holton et al. 2019: II, 451), and which has the dative plural form ἐμπιστευτηόδες (empisteutēódes) attested in a sixteenth-century text.
As a general concern, I think you lean on Holton et al. too much: their work has a far more limited scope than is immediately apparent. As Martin Hinterberger writes, despite the recent appearance of the Cambridge Grammar of Medieval and Early Modern Greek, it is not the “comprehensive linguistic description of written Byzantine Greek (in all its multifarious variants) [which] remains one of the desiderata of Byzantine literary studies” (Hinterberger 2021: 21); in my opinion, though not (explicitly) Hinterberger's, Holton et al. have treated the Greek of 1100–1700 “as a degenerated, deficient form of classical Greek, [which they have ignored,] or as an immature form of modern Greek” (Hinterberger 2021: 37). We should not do the same.
I want to end this on a note of praise. I admire the enthusiasm and hard work you pour into this. If I have the effect of applying brakes, please understand that I do so only to ensure clarity prevails and that the best decisions are taken, even if it might not seem that way to you. I notice that you are writing a module to handle the declension of all Greek nouns. I think this is a worthwhile effort, and it has a precedent in Module:zlw-lch-headword. It would certainly be good to have a common theme for all Greek nominal declension, since that would avoid such aesthetically objectionable clashing as currently exists in Λεϊβνίτιος (Leïvnítios). Keep up the good work! 0DF (talk) 01:51, 24 March 2024 (UTC)[reply]

Rename to Medieval Greek

  1. Support ‑‑Sarri.greek  I 05:54, 9 February 2024 (UTC)[reply]
  2. Support   — Saltmarsh🢃 06:26, 9 February 2024 (UTC)[reply]
  3. Oppose - Byzantine is the more common term and no valid argument has been given against it. Nicodene (talk) 08:19, 9 February 2024 (UTC)[reply]
    Thank you @Nicodene for your support for this language. Yes, the term Byzantine is extremely common because we have Byzantine studies, Etudes Byzantines at Sorbonne, Byzantine Music, Byzantine Iconography, Byzantine Empire and so on. But I do not recall any language taking its name from an empire e.g. Roman Empire Latin, British Empire English? Is there any example? Mandarin perhaps as non-linguistic term? The term was used pre-2000 influenced from the very common 'Byzantine' epithet. Greek linguists also used it, but later preferred the term 'μεσαιωνικός', medieval. But, thanks anyway. ‑‑Sarri.greek  I 08:38, 9 February 2024 (UTC)[reply]
    The actual comparison to *[British Empire English] would be *[Byzantine Empire Greek], which nobody says either. And it'd be strange to argue that British English, British music, and British art are all "named after an empire" just because there was also a British Empire. They're all named after Britain and the British people, just as all the things you mention are named after Byzantium and the Byzantines. Nicodene (talk) 09:08, 9 February 2024 (UTC)[reply]
    @Sarri.greek: As Nicodene wrote, Byzantine Greek isn't named for the Byzantine Empire; rather, both are named for the Byzantines, who are named for Byzantium. Languages are usually named for people, places, or polities (and polities are usually named for either of the former). Because of what people and places can be named for, this can result in pretty weird language names. For example, Big Nambas (nmb) and Nez Perce (nez) are named for peoples with the same designation, and those peoples are named for their codpieces and misnamed for the Chinooks' nose piercings, respectively. Toponymically, East, South, and West Bird's Head are named for Bird's Head, a peninsula of Papua that looks, indeed, like a bird's head; I can only assume that Port Sandwich (psw) was named for the Vanuatuan coastal settlement that has since been renamed Lamap; and Western Desert (nine dialect codes) is named for desert areas in western Australia (chiefly Western Australia). Many creoles have strange names. Other language names are odd for etymological reasons; for example, Ukrainian (uk, literally “borderlandese”, although this etymology is disputed) and Zamboanga Chavacano (cbk, literally “poor-taste mooring-place”). And then there are names that are picturesque, like Cœur d’Alêne (crd, literally “heart of awl”), Hill (mrj) and Meadow Mari (mhr), Large (hmd) and Small Flowery Miao (sfm), and Blue (hnj), Green (also hnj), and White Hmong (mww). By comparison, Byzantine Greek is not at all strange or particularly romantic (pun intended).
    I admit I got a bit carried away with the examples there. Sign languages are generally more clearly named for polities; for example, American (ase) and British Sign Language (bfi); compare the more obscure Maritime Sign Language (nsr). Dari (prs and gbz) supposedly derives from Classical Persian دربار (darbār, royal court) and one could argue that Dano-Norwegian is named for the political union Denmark–Norway. However, the language name most unambiguously named for an empire is probably Imperial Aramaic (arc), named for the Neo-Assyrian, Chaldean, and especially Achaemenid Empires. Finally, consider Ashokan Prakrit, which goes one step further by being named for a specific emperor, namely the Mauryan Emperor Ashoka the Great (regnavit circa 268–232 BC). 0DF (talk) 00:34, 7 March 2024 (UTC)[reply]
  4. Support (changed from abstain) Both names seem about equally common, and I don't really care which one we use. I'm not opposed to either name. Thinking about it some more, I've decided I prefer "Medieval". —Mahāgaja · talk 09:53, 9 February 2024 (UTC)[reply]
  5. Support Thadh (talk) 18:32, 15 February 2024 (UTC)[reply]
  6. Abstain (changed from support) Following the contributions of user 0DF to the discussion, I also see the merit of the term Byzantine Greek. Most importantly, I understand that I require additional reading before coming to a final conclusion. For the time being, abstaining (i.e. agreeing with either terminology to be adopted). A. T. Galenitis (talk) 21:28, 21 March 2024 (UTC)[reply]
  7. Oppose To avoid further perceptions of prolixity, I shall be terse:
    Reasons for “Byzantine Greek”:
    1. As I've argued before, the language should be called Byzantine Greek “because its production is inextricably linked to Byzantine civilization” (Hinterberger 2021: 22).
    2. Other things being equal, endonymy is desirable. However, ready apprehensibility by Anglophone readers often supersedes this consideration. The Byzantines usually called themselves Ῥωμαῖοι (Rhōmaîoi, literally Romans), their country Ῥωμανία (Rhōmanía), and their language Ῥωμαϊκή (Rhōmaïkḗ). English Romaic and Rhomaic exist, but I wager they're little-known, and likely to be mistaken as relating to Romani or Romanian. Ancient Greek Ἕλληνες (Héllēnes) exists, but is not specific to the Byzantine period, and “Hellenic Greek” would be too easily confused with Hellenistic Greek, i.e. Koine Greek. There's Ancient Greek Γραικοί (Graikoí), but that's used for the macrolanguage “Greek”. There is marginal self-reference by Byzantines to their histories as Βυζαντιακαὶ (Buzantiakaì) and to themselves as Βυζάντιοι (Buzántioi), so “Byzantine Greek” is endonymic. By contrast, no people in the Middle Ages called themselves “Mediaeval” anything.
    3. “Byzantine” is a fairly familiar term to the average educated Anglophone. It is an epithet applied to a great many disciplines, journals, and phenomena pertaining to the empire of that name (v. e.g. [1], [2], [3]), the vast majority of the primary sources for which are written in Byzantine Greek. Cet. par., it is desirable that referents systematically related in such a manner should share a nomenclature. I doubt that those various disciplines would adopt the relatively cumbersome “Mediaeval Greek X” nomenclature to replace the relatively concise “Byzantine X” nomenclature, and it would be ungrammatical to do so in compound modifiers such as Serbo-Byzantine.
    4. The alphabetical and chronological orders of the three chronolects of Greek (that are written in the Greek alphabet) are the same. For any word homographic in the three chronolects — many (most?) consonant-initial ((pro)par)oxytones — this allows one to trace its development from Ancient Greek, through Byzantine Greek, and all the way up to the Greek of the present day by scrolling down the page and reading in order: a boon for comprehension. This serendipity would be lost if Byzantine Greek were renamed Mediaeval Greek.
    Reasons against “Mediaeval Greek”:
    1. Mediaeval means “of or pertaining to the Middle Ages (Latin Medium Aevum)”, but those Middle Ages were not universally significant. Traditionally, the Middle Ages are regarded as beginning in 476 with the fall of the Roman Empire in the West and as ending in 1453 with the fall of the Roman Empire in the East. Linguistically, the former had a considerable impact on Medieval Latin: the dissolution of Roman institutions, radical decentralisation, vernacular drift, development of feudalism, and immigration of unassimilated peoples led to linguistic innovations and borrowing on a massive scale; often regarded as corruptions, various attempts were made to restore Classical Latinity, as in the Carolingian Renaissance, but these saw only partial success until the triumph of humanist Ciceronianism in the Italian Renaissance. Thus, Mediaeval Latin was succeeded by Renaissance Latin and then by New Latin. This makes the epithet “Mediaeval” highly suited to that chronolect of Latin. By contrast, Byzantine Greek saw no such dissolution, decentralisation, or feudalism, at least not until the Fourth Crusade; for Greek, the fall of 1453 was vastly more consequential than the fall of 476 — the opposite was true for Latin. This makes the epithet “Mediaeval” highly unsuited to that chronolect of Greek. For more, see Kaldellis 2019: ch. 4 (“Byzantium Was Not Medieval”), pp. 75–92.
    2. The adjective has four justifiable spellings: mediaeval, medieval, mediæval, mediëval. Byzantine has only one. Cet. par., that a term's spelling be uncontested is desirable.
    3. The English Wikipedia has three articles entitled “Medieval X” for languages (Medieval Greek, Hebrew [4th–19th CC.!], and Latin); in other articles I saw, they give Medieval Catalan as a synonym of Old Catalan, Medieval Spanish and Old Castilian as synonyms of Old Spanish, and for Galician–Portuguese they give the five synonyms Medieval Galician, Medieval Portuguese, Old Galician, Old Galician–Portuguese, and Old Portuguese. That would give the impression that, in language names, medieval and old are synonymous; not so Medieval Greek, which has the synonym Middle Greek (alongside Byzantine Greek and Romaic). Middle and Old are much more common as chronolect descriptors than Medieval (CAT:en:Languages has 2 members named “Medieval X”, 25 named “Middle X”, and 64 named “Old X”). AFAIK, no one calls Byzantine Greek “Old Greek”. IMO, “Middle X” only really works for languages with a threefold chronolectal division designated “Old–Middle–New X” or “Old–Middle– X”. Greek, however, has a four- or even six-fold division — Mycenaean–Ancient–Byzantine–Modern or Mycenaean–Homeric–Classical–Koine–Byzantine–Modern — one would be hard-pressed, especially in the latter, to describe the Byzantine chronolect as being in the “Middle”.
    4. Pace Κ. Α. Τ. Γαληνίτη, it is not at all apparent that the term “‘Medieval Greek’ has become dominant”, and contra Holton et al., here are uses of Byzantine Greek from three authors, with many more available. The ISO received three proposals in 2006–2009 to create new codes for Medieval Greek gkm, Ecclesiastical Greek ecg, and Katharevousa Greek elr; last year, the ISO rejected them all, partly due to “the lack of consensus among them” (p. 2). It is noteworthy that § 4 of the original change request for Medieval Greek gkm gave the language's name as “Middle Greek” and said of it that “[t]he language is distinct from Ancient Greek in vocabulary, phonology, and grammar, and displays linguistic attributes which are characteristically Byzantine and uncharacteristic of Ancient Greek” [my emphasis], whereas the first page of the request for the new language code element gkm gave, as the reason for preferring the name “Middle Greek” over the autonym “Romaiki” and the alternative names “Byzantine Greek” and “Medieval Greek”, that “Middle Greek” was the “[m]ost common amongst scholars” (!); it's only because Anastassia Loukina emailed SIL International to write that “the more common term used in Greek linguistics to refer to this stage of Greek is ‘Medieval Greek’ rather than ‘Middle Greek’” that the proposal was changed (by the ISO?) to one for “Medieval Greek”, although Δις Loukina merely asserted her claim, not citing anything. Is there any real evidence that any one term predominates?
    Alas! So much for avoiding prolixity…
    @A. T. Galenitis, Benwing2, Erutuon, Fay Freak, Mahagaja, Nicodene, Saltmarsh, Sarri.greek, -sche, Thadh, Theknightwho: For those of you who have voted or who intend to vote, I humbly request that you consider what I've written. For those of you not voting, I ping you in case you're interested and because you've taken part in this discussion before. To all of you, I apologise for the length of this post; I seem not to be very good at brevity. 0DF (talk) 07:37, 20 March 2024 (UTC)[reply]
    I've read all you wrote above but am not convinced by it, certainly not enough to change my vote. Points 2 and 4 pro Byzantine strike me as irrelevant, and point 3 sounds like it could equally be an argument to use the term "Anglo-Saxon" instead of "Old English", which I trust no one in this day and age still wants to do. None of the arguments contra Medieval strike me as particularly strong. —Mahāgaja · talk 07:56, 20 March 2024 (UTC)[reply]
    And what argument for 'medieval' struck you as strong? Nicodene (talk) 08:27, 20 March 2024 (UTC)[reply]
    I think somewhere in this discussion or an earlier one I said I prefer "medieval" because it makes it clear that the lect in question is a chronolect, not a regiolect. —Mahāgaja · talk 08:37, 20 March 2024 (UTC)[reply]
    Wut, even if Greek writing is located far off in Arabia or Ethiopia, I still call it Byzantine Greek provided it matches the period. Fay Freak (talk) 11:23, 20 March 2024 (UTC)[reply]
    Right, but calling it Medieval Greek makes it clearer that what's relevant is the time period, not the location. —Mahāgaja · talk 11:39, 20 March 2024 (UTC)[reply]
    The case can be made that 'Medieval' is chronologically explicit, but it is simply unimaginable that anyone could know the term Byzantine yet mistake Byzantine Greek for a regional label. Nicodene (talk) 11:59, 20 March 2024 (UTC)[reply]
    I don't find that unimaginable at all. It's certainly more plausible than someone thinking Byzantine Greek referred to overly complex or intricate Greek, but we can't entirely rule that interpretation out either. —Mahāgaja · talk 12:56, 20 March 2024 (UTC)[reply]
    It would require someone who knows about the city of Byzantium and yet is unaware of the existence of the Byzantine Empire, in other words a person that does not exist. As for the other potential sense of ‘Byzantine’, that is simply not an argument as it applies just as well to someone mistaking ‘medieval Greek’ as referring to a brutal or savage dialect. Nicodene (talk) 13:16, 20 March 2024 (UTC)[reply]
    Was Byzantine Greek also used outside the borders of the Empire? —Mahāgaja · talk 13:32, 20 March 2024 (UTC)[reply]
    Certainly, as it doesn't have to do with borders either.
    If anyone has ever actually used ‘Byzantine Greek’ to distinguish one variety of Greek from another based on region or geopolitical control I've yet to see any sign of it. Nicodene (talk) 13:59, 20 March 2024 (UTC)[reply]
    So the language in question is used outside of the geographical area denoted by "Byzantine" but not outside of the chronological era denoted by "Medieval". That's why I prefer to call it Medieval Greek. —Mahāgaja · talk 14:11, 20 March 2024 (UTC)[reply]
    ‘Byzantine’ is not a geographical area.
    The one, and only, valid point in this is as stated above - that ‘Medieval’ is more chronologically transparent. Nicodene (talk) 14:21, 20 March 2024 (UTC)[reply]
    @Mahāgaja: Thank you for reading my rather overlong post. Responding to your points:
    1. Do you regard point 2 pro Byzantine as irrelevant because you disagree with the statement “other things being equal, endonymy is desirable”? If so, I understand you, since that statement is my axiom for that point. Otherwise, I would appreciate a rationale.
    2. I don't see how you could call point 4 pro Byzantine irrelevant for this project. In a dictionary of Byzantine Greek only, it indeed would be irrelevant, but since that's not what Wiktionary is, it's simply an error to call that point “irrelevant”.
    3. AFAICT, “Anglo-Saxon” — itself a compound modifier — is on all fours with “Old English” in terms of its suitability for forming compound modifiers. That seems like a disanalogy to me.
    4. Whereas “mediaeval” is traditionally clear vis-à-vis period (viꝫ 476–1453), a lot of usage muddies the waters. Jacques Le Goff throughout his career (or at least from 1977 onward) sought to extend the Middle Ages into “the eighteenth century, when, he believe[d], the European nation-states properly emerged” (Kaldellis 2019: ch. 4, p. 77). And conversely, some scholars of chronologically preceding and succeeding fields annex parts of the Middle Ages to their own periods: “The field of ‘late antiquity’ has been pushed by some to the early Carolingians (i.e., to the ninth century), whereas at the other end some historians of early modernity have reached back to claim everything after the twelfth century, when the European economy embarked upon a trajectory that would arc to modernity. With late antique and early modern historians claiming so much territory, that leaves only a rump Middle Ages squeezed around the turn of the millennium. [¶] Byzantium has little standing or stake in this debate.” (ibidem: pp. 77–78)
    0DF (talk) 15:27, 20 March 2024 (UTC)[reply]
    I do disagree with the statement "other things being equal, endonymy is desirable". At Wiktionary, as at Wikipedia, what matters is what a language is commonly known as in English, not what its native name is. That's why we call German German, not Deutsch, and Dutch Dutch, not Nederlands. And no ancient language was known to its speakers with modifiers like "Old", "Ancient", "Classical", "Primitive" and so forth. And you yourself point out that Greek speakers of the era under discussion generally referred to their languages as (the Greek equivalent of) Romaic; but absolutely no one here is suggesting that Wiktionary's canonical name for this language should be Romaic. So that point is actually not an argument in favor of Byzantine at all; it's an argument against both Byzantine and Medieval. Point 4 is irrelevant because that's simply not a consideration we have ever had or ever should have. The names "Old Irish", "Middle Irish" and "Irish" are in reverse alphabetical order; so what? —Mahāgaja · talk 15:57, 20 March 2024 (UTC)[reply]
    @Mahagaja: Re “what matters is what a language is commonly known as in English”, I already wrote that “ready apprehensibility by Anglophone readers often supersedes th[e endonymy] consideration”, so we don't disagree on the overriding importance of that. However, given a choice between two English names identical in their recognisability (which is an instance of that “other things being equal” qualifier), would you really maintain that endonymy wouldn’t even be a consideration to break the tie? That's not a strictly irrational position, but I would be surprised if you held it. Anyway, with regard to Romaic, Byzantine, and Mediaeval, my point is that Romaic would be best in terms of endonymy, but its obscurity disqualifies it; whereas Byzantine and Mediaeval are comparably familiar to educated Anglophones, so Byzantine’s endonymy can break that tie. Is my position on this point any clearer now? That “Point 4” is nothing other than a consideration about page layouts which has some bearing on this issue; I'm not saying that it's a be-all and end-all, just that it's a relevant consideration, even if other considerations are primary. 0DF (talk) 00:09, 21 March 2024 (UTC)[reply]

Split from Ancient Greek

  1. Support, as creator of this proposal ‑‑Sarri.greek  I 05:54, 9 February 2024 (UTC)[reply]
  2. Support   — Saltmarsh🢃 06:26, 9 February 2024 (UTC)[reply]
    Thank you @Saltmarsh, my guru, mentor and administrator at Modern Greek! I promise to work as you have taught me. ‑‑Sarri.greek  I 06:33, 9 February 2024 (UTC)[reply]
  3. Support Nicodene (talk) 08:13, 9 February 2024 (UTC)[reply]
  4. Support —Mahāgaja · talk 08:26, 9 February 2024 (UTC)[reply]
  5. Support Thadh (talk) 18:32, 15 February 2024 (UTC)[reply]
  6. Support A. T. Galenitis (talk) 16:46, 16 March 2024 (UTC)[reply]
  7. Support in principle — I am concerned, however, that η Δις Κατερίνα Σαρρή has a different understanding of what this vote endorses from the understanding of the other voters here. 0DF (talk) 07:46, 20 March 2024 (UTC)[reply]
    See § phase 1 (above) for an explanation of this comment. 0DF (talk) 01:55, 24 March 2024 (UTC)[reply]

?

Happy month: καλό μήνα (kaló mína), @Benwing2, Mahagaja and everyone! Are we still on hold? I would like so much to come back, but how? having to write {m|gkm|xxx} all the time in pages with Ancient title... for example, @παπᾶς. I need: a month to review what exists. A year to do some labels for Learned Medieval (=archaisms and Hellenistic style), for Early Modern Greek (with medievalisms), some ready-to-fill-in inflection tables, some reference templates etc. I cannot even start without a code. Thank you. ‑‑Sarri.greek  I 17:00, 1 March 2024 (UTC)[reply]

@Sarri.greek: I'm working on responses. Sorry for the delay. Please bear with me. 0DF (talk) 02:06, 2 March 2024 (UTC)[reply]
Oh, M @0DF. What do you mean 'working on responses'? Please do not flood this page? We understand you are against. I shall make a special workpage-plan for MedGr once it is allowed. And with a talk page, and sections for every subject about it, where you can write as long texts as you like. Thank you. ‑‑Sarri.greek  I 06:04, 2 March 2024 (UTC)[reply]
@Sarri.greek It looks like we have consensus for both changes, esp. for the split: 6-0 plus one undecided (User:0DF) for the split, 5-2 for the rename (User:Nicodene and User:0DF opposing). User:0DF, you never gave a response concerning the rename. Do you have anything you'd like to register (e.g. concerns, alternative suggestions, etc.)? Keep in mind that renames are easier to do than splits, so if for some reason it's decided in the future to undo the rename or switch to a third term, it wouldn't be such a big deal. Benwing2 (talk) 01:46, 17 March 2024 (UTC)[reply]
Thank you all, thank you M @Benwing2! Great Sunday! I'm ready to start work! and will be checking the changes. I have prepared a trial-User:Sarri.greek/About Medieval Greek (in the pattern of WT:About Ancient Greek), a trial Template:User:Sarri.greek/gkm-IPA which needs to 'show' visibility, and more. Proposals and suggestions for the first-time-presentation of MedGr are welcome and needed from everyone, especially the administrators of Ancient and Modern Greek. e.g. at User About's Talkpage (or open an extra page?, please tell me, Sir, and everyone.) Thank you. ‑‑Sarri.greek  I

English. Move to open-pit (POS??) This, that and the other (talk) 05:16, 19 January 2024 (UTC)[reply]

It would be an adjective, but is anything other than mines ever called open-pit? —Mahāgaja · talk 08:03, 19 January 2024 (UTC)[reply]
Mining can also be open-pit, as can work in older texts.
(In my part of the world people sometimes talk of open-cuts (open-cut mines, a synonym we don't have - I wonder if it is highly regional). So I was really wondering if open-pit could be a noun that is used attributively in open-pit mine. It's hard to tell.) This, that and the other (talk) 09:16, 19 January 2024 (UTC)[reply]
This seems like a cleanup operation, covering several entries and potential entries.
No other OneLook dictionary has open-pit mine. MWOnline, Oxford, Dictionary.com, and Collins have open-pit, Collins having it as a noun. (Attestable as noun, but SoP?) Also we have some of opencast, open-cast, open cast. We should use GoogleNGrams to determine the most common for each of the -pit, -cast, and -cut forms, use Google Books/News to determine which are attestable, include all attestable forms as alt forms, and make sure that at least the main forms show the main form of the other groups as synonyms. There is also the possibility that some of these are used adverbially. DCDuring (talk) 15:18, 19 January 2024 (UTC)[reply]
Not checked to confirm that most usage is about mines and similar. DCDuring (talk) 15:49, 19 January 2024 (UTC)[reply]

Move definition to aeon and mark eon as an American alternative spelling of aeon, so as to align with Wikipedia and as aeon was borrowed from the Latin aeon, not eon. eonian, eonic and light eon included. A Westman talk stalk 03:22, 24 January 2024 (UTC)[reply]

Don't. eon is demonstrably (at GoogleNGrams) more common: eons recently thrice as common as aeons and eon more common than aeon. (Plural is nearly three times as common as singular.) DCDuring (talk) 13:07, 24 January 2024 (UTC)[reply]
Agreed. As a general rule we don't change the spelling of terms with Pondian differences once the entry has settled on one spelling or another. (There are exceptions, e.g. if British spelling allows both A and B equally and American spelling prefers B, I think it would be reasonable to move a term spelled as A to B.) Benwing2 (talk) 02:53, 27 January 2024 (UTC)[reply]
If we check frequency, we should be ready to change which is the main entry. Google NGrams makes it easy, though it covers books (only?). Whether the criterion should be recent usage or all usage is a matter of judgment, at least for now. One can also search in News for usage by location (nation, province/state?) of the source. DCDuring (talk) 16:06, 27 January 2024 (UTC)[reply]

English 's and -'s

I understand that the distinction between 's and -'s is that the former is a contraction of is, was or has and the latter is a possessive, but I think this distinction is likely to be lost on the majority of Wiktionary users and is better made by merging both pages to 's and making the distinction using different Etymology sections. As it is, there is some duplication between these two entries. Benwing2 (talk) 22:44, 31 January 2024 (UTC)[reply]

We should be able to have something at -'s#English that directs users to the appropriate etymology section at 's. (Is -'s an alternative form, as we use the term, of 's?) DCDuring (talk) 13:10, 1 February 2024 (UTC)[reply]
(Oppose unless it can be demonstrated that we don't normally lemmatize suffixes like this at titles with hyphens.) I'm very sympathetic to the fact that content being somewhere that some people don't expect is a problem, and to the need to prominently flag when content is on a different page than some people expect, not just in this kind of case, but also e.g. when we usually lemmatize singulars but occasionally put some senses at the plural, or usually lemmatize without the but occasionally have some senses at separate the X entries, or when we lemmatize phrasal verbs outside the main verb entry. I'm a big fan of Template:used in phrasal verbs and "See..." links like at message. But if lemmatizing the possessive at -'s is technically correct and is consistent with how we treat other suffixes, then we should continue lemmatizing at -'s and just take whatever other measures we can to obnoxiously prominently crosslink it to and from the other page... because if we make an exception and lemmatize this page at an incorrect title, it's inconsistent with other entries... do we also move -'#English? What about -'s#German and -'#German? What about -s? And that inconsistency confuses other users and editors who do understand our system, and look in the right/expected place, only to find that the content isn't there because we moved it to an incorrect/inconsistent place to try to outsmart them. I think we have to do things consistently (e.g. if suffixes usually start with hyphens, do so here too), and use prominent "See also..." links where necessary. For verbs linking to phrasal verbs, and for things like message, such links can just be on definition lines; here, I'd be fine with the link taking the form of a big T:LDL-esque yellow box or something if people want, if people feel a ===See also=== link is insufficient. Obviously, any incorrect duplication should be cleaned up. - -sche (discuss) 14:31, 29 March 2024 (UTC)[reply]

2024 — February

I'm not sure what to do with the "plural only" section, should it be moved to string attached as sense 2? DonnanZ (talk) 10:46, 4 February 2024 (UTC)[reply]

@Donnanz I would do this and add a label 'chiefly in the plural' to the singular, and move the usage note about "no strings attached" to the singular. In general it's fine to have a plural-only entry for a term that also exists in the singular, but only when the singular and plural-only terms have different meanings. Here there doesn't seem to be a difference in meaning. Benwing2 (talk) 03:26, 7 February 2024 (UTC)[reply]
@Benwing2:  Done. Thanks for your input. DonnanZ (talk) 10:14, 7 February 2024 (UTC)[reply]

Howdy folks! Am wondering if it may be a good or a bad idea to add a new language code for Solombala English, which is a very sparsely attested pidgin that has some features in common with Russenorsk. It has only 20 known words, and two of them were obviously misunderstood by the later translators (but can be seen in the original sources). All the words, as far as I know, are presented here: w:ru:Соломбальский английский язык (I added some commentary and sources there as well, but a long time ago). The main reason for my request is that Solombala may be useful in the etymology of some Russenorsk words. Tollef Salemann (talk) 17:43, 9 February 2024 (UTC)[reply]

Support. Theknightwho (talk) 08:54, 25 February 2024 (UTC)[reply]

Created as crp-slb, since this has been open for a couple of weeks, and no-one else seems to have much to say. @Tollef Salemann.

I have given it the Cyrillic and Latin script codes because, having checked, the original 1849 source uses (pre-reform) Russian Cyrillic, but modern sources seem to prefer a Latin transcription exclusively: e.g. "vat ju vanted, asej!" is actually "ватъ ю вантетъ, асей!" in the 1849 source (pp. 406-7); note that вантетъ (vantet) has been transcribed as vanted, for instance. I can't find the 1867 source referred to, but I assume it's also in Cyrillic.

Please let me know if you think we should be handling the scripts in a different way, though. Theknightwho (talk) 09:30, 25 February 2024 (UTC)[reply]

Thank you! There is also "my" instead of "tu". This was a mistake of Broch's, I guess, and it seems like I'm the only one who noticed it. There is also a funny story with his translation of "milek", because it was used in some adult context. As far as I remember, there is no original Latin-script Solombala, but I'm going to first check through all the sources to be sure. The 1867 source took me a while to find last year, but I remember it wasn't impossible. Tollef Salemann (talk) 11:07, 25 February 2024 (UTC)[reply]
@Tollef Salemann Alrighty - let me know if you think we should remove Latn. I should have also said that I've also set it to use Russian transliteration, for obvious reasons. Theknightwho (talk) 03:04, 27 February 2024 (UTC)[reply]

Proto-Italic. Identical to Reconstruction:Proto-Italic/attā. The two should be merged at this page.

Moving Pai-lang to Bailang

Bailang (白狼) is a Lolo-Burmese language attested in one source from the 1st century, and also happens to be the earliest recorded Tibeto-Burman language. We currently use the name "Pai-lang" (derived from Wade-Giles), but modern sources overwhelmingly use the Pinyin-derived "Bailang", and have done for quite some time. Theknightwho (talk) 02:40, 17 February 2024 (UTC)[reply]

@Theknightwho Support. Benwing2 (talk) 02:56, 17 February 2024 (UTC)[reply]

Moved. Theknightwho (talk) 03:23, 25 February 2024 (UTC)[reply]

Adding Proto-Bai

While we do currently have a family covering the Bai languages (sit-bai), we don't have the code sit-bai-pro for Proto-Bai. There are a few publications out there which give Proto-Bai reconstructions, and our Macro-Bai comparative vocabulary list contains 469 from Wang Feng's Comparison of languages in contact: the distillation method and the case of Bai (2006). I'm not suggesting that we blindly create entries for all of these, but we already reference Proto-Bai in four entries anyway: Lama Bai ɕy³³, Southern Bai ɕy³³, Central Bai xuix and Chinese (shān), so there's already a need for the code. Theknightwho (talk) 03:13, 17 February 2024 (UTC)[reply]

Support — 義順 (talk) 02:46, 18 February 2024 (UTC)[reply]

Created - given there's a real need for it, and no objections have been forthcoming. Theknightwho (talk) 04:58, 23 February 2024 (UTC)[reply]

Moving Nung to Nùng

Three reasons:

  • Both of these names are common in the literature, but newer and more professional publications seem to prefer Nùng, which I suspect is down to typesetting no longer being an issue.
  • It matches our general treatment of other languages in Vietnam, such as Tày, Ná-Meo, Ts'ün-Lao etc.
  • Other than headings, this spelling is already used in entries in the form of labels (e.g. bân, nãhm, nưhng, slao, slíhm).

Theknightwho (talk) 06:08, 17 February 2024 (UTC)[reply]

Moved - given it's consistent with our handling of it elsewhere. Theknightwho (talk) 04:08, 25 February 2024 (UTC)[reply]

@Theknightwho I think you should have waited on this. You only waited a week and no one supported the move. I don't agree with the general principle that we should include all the native-language accents and other Unicode chars in the Wiktionary names of languages. These names should reflect the *English* usage of such names, not the native-language usage. Possibly the name change was justified in this particular case but I don't want any precedent set that would justify e.g. moving the name O'odham to ʼOʼodham (or even worse, a half-ass rename like Kwami did to the Wikipedia article Oʼodham language, which includes the "Unicodified" apostrophe in the middle of the name but omits the apostrophe in the beginning). Benwing2 (talk) 05:02, 25 February 2024 (UTC)[reply]
@Benwing2 I'm not trying to set any precedent - I just noted that it's generally spelled with the diacritic in more recent (English) publications, and our entries already spelled it that way outside of the headings. Given no-one opposed it, it didn't seem like an issue. I'll wait longer in future, though. Theknightwho (talk) 05:21, 25 February 2024 (UTC)[reply]

Converting Min Nan into a family

Currently, we classify Min Nan (nan) as a language, despite it being a family of several Chinese lects. Because of this, the way we treat those lects is arbitrary and inconsistent.

  • Hokkien and Hainanese are both classified as etymology-only languages, despite Hokkien covering several major (dia)lects in its own right, and it being very common for entries to have a large number of Hokkien readings. It's not currently possible to add Hainanese to {{zh-pron}}, but it's also on the roadmap. In terms of how they are used, nothing distinguishes them from how we handle any of the full languages under the Chinese header, so there's no reason to classify them like this.
  • On the other hand, Teochew and Leizhou Min are classed as full languages, but they both have Min Nan set as their "ancestor", which is nonsense. I assume this was done so that the family tree looked right (see Category:Old Chinese language), but this has clearly happened because editors think of Min Nan as a family, not a singular language.

Currently, there is a pending request at the ISO in order to split Min Nan into a macrolanguage (though I won't address those which we don't currently have codes for, since that discussion is for another time).

  1. nan should be converted to a family code.
  2. Hainanese (nan-hai) should be converted to a full language.
  3. Hainanese, Hokkien (nan-hok), Leizhou Min (zhx-lui) and Teochew (zhx-teo) should be on the immediate level below.
  4. Given the large number of entries with numerous Hokkien readings, there are two options:
    1. Convert Hokkien to a full language, with Quanzhou, Zhangzhou and Xiamen etymology-only languages, possibly with the addition of Taiwanese Hokkien.
    2. Convert Hokkien to a family, and have Quanzhou Hokkien, Zhangzhou Hokkien and Xiamen Hokkien as full languages on the level below. I have no opinion on whether Taiwanese Hokkien (which is split out in the ISO proposal) should be treated separately if we do this.

Theknightwho (talk) 13:07, 17 February 2024 (UTC)[reply]
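
For illustration only, the arrangement in points 1 to 3 plus option 1 of point 4 could be pictured roughly as the data below. This is a sketch in Python, not Wiktionary's actual language-data format: the dict layout, the `kind` labels and the `children` helper are assumptions made for this example, and the etymology-only varieties are listed by name because the proposal gives no codes for them.

```python
# Hypothetical picture of the proposed hierarchy (points 1-3, option 1 of point 4).
# Names and codes are taken from the proposal above; the layout is an assumption.
PROPOSED = {
    "Min Nan": {"code": "nan", "kind": "family", "parent": None},
    "Hainanese": {"code": "nan-hai", "kind": "language", "parent": "Min Nan"},
    "Hokkien": {"code": "nan-hok", "kind": "language", "parent": "Min Nan"},
    "Leizhou Min": {"code": "zhx-lui", "kind": "language", "parent": "Min Nan"},
    "Teochew": {"code": "zhx-teo", "kind": "language", "parent": "Min Nan"},
    # Etymology-only varieties under Hokkien; the proposal gives no codes for these.
    "Quanzhou": {"code": None, "kind": "etymology-only", "parent": "Hokkien"},
    "Zhangzhou": {"code": None, "kind": "etymology-only", "parent": "Hokkien"},
    "Xiamen": {"code": None, "kind": "etymology-only", "parent": "Hokkien"},
}


def children(name):
    """Return the names directly below `name`, sorted alphabetically."""
    return sorted(n for n, d in PROPOSED.items() if d["parent"] == name)
```

Under option 2 of point 4, Hokkien would instead be marked as a family and the three varieties below it as full languages.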

Support the first three bullet points, but Weak oppose on the fourth:
  • a potential slippery slope: Singapore, Penang, Longyan, etc. could warrant full languagehood if ZXQ and Taiwan are split
  • treatment of the above would be ambiguous due to the nature of Hokkien potentially not being monophyletic and the fact that eg. Taiwanese can’t really be called “a dialect of Amoynese” despite their shared transitionary nature
  • to draw a parallel with Northern Wu, Shanghainese and Suzhounese, both not being full languages, occupy a very similar genealogical level when compared to ZXQ, though as far as the current trajectory is going, they will not be gaining full language-hood any time soon
Just my two cents — 義順 (talk) 02:57, 18 February 2024 (UTC)[reply]
@ND381 Just to be clear, does that mean you support option 1 of point 4? Theknightwho (talk) 14:42, 18 February 2024 (UTC)[reply]
ah yeah I misread what that said — yes, I would be in support of option 1 of the fourth point — 義順 (talk) 16:38, 18 February 2024 (UTC)[reply]
@ND381 What do you mean by "transitionary"? (talk) 11:31, 28 February 2024 (UTC)[reply]
I don't particularly know too much about Hokkien linguistics (I do Northern Wu) but from what I understand Amoynese and Taiwanese both exhibit features of both Zhangzhou and Quanzhou lects — 義順 (talk) 12:01, 28 February 2024 (UTC)[reply]
@ND381 I see. This is the common wisdom, I guess.
In truth, it makes little sense to pretend that "Zhangzhou" & "Quanzhou" are cardinal dialects. For one thing, there is a great deal of variation within what are supposed to be "Zhangzhou" Hokkien & "Quanzhou" Hokkien. Quemoy & Tâng-oaⁿ 同安 dialects of "Quanzhou" Hokkien, as a clear example, are themselves "transitional to Zhangzhou". So the entire "Zhangzhou-Quanzhou" framework is made of duct tape. "Zhangzhou-Quanzhou" reflects Confucian administrative loyalties more than anything else, as the English terminology (via Mandarin Pinyin) suggests. And the exclusion of Amoy Hokkien from "Quanzhou" is arbitrary & inconsistent in itself. So, there's "nothing there", even if certain isoglosses unsurprisingly bundle along the old prefectural border. (talk) 08:59, 29 February 2024 (UTC)[reply]
Similar to ND381, Support the first three points. The second subpoint of point 4 is a terrible idea, since it leaves out Zhangzhou-Quanzhou mixed varieties of Hokkien, which is one of the reasons why "Hokkien" isn't monophyletic. It's also unclear whether dialects like Jinjiang and Philippine Hokkien would be subsumed under Quanzhou. While we're at this, we would also need to see how certain other varieties of Min Nan are dealt with under the structure based on the first three points, namely Longyan (including Zhangping), Datian, Youxi, southern Zhejiang and Zhangzhou-based varieties spoken in Guangdong/Guangxi. While the Language Atlas of China groups Longyan with other Quan-Zhang varieties, it seems that it traditionally isn't considered "Hokkien". We might also want to see where Hailufeng Min fits here. (I'm writing this in a little rush, so there might be more points that come along after.) — justin(r)leung (t...) | c=› } 14:30, 18 February 2024 (UTC)[reply]
@Justinrleung No, "Longyan" is most definitely not part of Hokkien, either linguistically or sociolinguistically.
Hai Lok Hong Hoklo is clearly parallel to Hokkien & Teochew.
The Hokkien dialects of southern Zhejiang are clearly part of Hokkien.
Many or most pieces seem poised to fall into place. (talk) 11:36, 28 February 2024 (UTC)[reply]
@ I agree with you on this - Longyan should definitely be treated separately. I omitted it from the proposal because I specifically wanted to address the issue of whether we should treat Southern Min as a family, so I only mentioned the codes we currently have. It’s not supposed to be comprehensive, and in fact I was hoping it could set the stage for further additions, as I thought this change should probably happen before we add anything else. Theknightwho (talk) 13:07, 28 February 2024 (UTC)[reply]
No particular vote as I don't think I'm qualified enough to discuss Southern Min here as I very rarely edit it, but I share similar views with ND and Justin based on my limited understanding of the internal structure of Southern Min after reading Kwok (2018).
I reckon the treatment of Zhongshan Min should perhaps also be discussed here, given that Glottolog treats it as a subbranch of Southern Min, although it seems like some of it is Eastern Min. Either way I think it will need a code. – wpi (talk) 14:09, 23 February 2024 (UTC)[reply]
Seconding this. Apparently, so-called "Zhongshan Min" is three mutually unintelligible languages, two of which may not belong to the NAN family (?) at all. (talk) 11:42, 28 February 2024 (UTC)[reply]
I don't have much knowledge of the relationship between ZQX Hokkien and other Hoklo varieties like Chaozhou and Hainan.
However, the Amoy, Quanzhou and Zhangzhou varieties are mutually intelligible to some extent. Linguistically, the Amoy varieties should be treated as a dialect of the ZQX language, just as Irish deirfiúr contains various pronunciations from the dialect locations in Ireland.
As for whether Taiwanese (Taigi) should be treated as a full language or as a dialect of Hokkien, it's something like the Serbo-Croatian language separation issue.--Yoxem (talk) 10:50, 28 February 2024 (UTC)[reply]
@Theknightwho Supporting Item 2.
Not opposing Item 1 (nor Item 3) in this context, but — even disregarding misplaced outliers — how much evidence is there that these languages (say, Hainanese & Hokkien) belong to one family in a historical sense? (Wikipedia doesn’t treat Singlish & Jamaican Creole, for instance, as being in the same language family as English. Or do we use the term “family” differently around here?)
Supporting Item 4.1, excluding Taiwanese.
The “Zhangzhou-Quanzhou-Amoy” split reflects the mapping of Confucian loyalties. It corresponds somewhat to linguistic reality, but attempts to package “Zhangzhou” Hokkien & “Quanzhou” Hokkien in a systematic manner seem to give off more smoke than light, as suggested by Mar_vin_kaiser’s comment clarifying what “Zhangzhou Hokkien” should mean.
So so-called “Zhangzhou” Hokkien or “Quanzhou” Hokkien or Amoy Hokkien are all just Hokkien. The “Zhangzhou-Quanzhou” split reflects Confucian psychology, not linguistic reality, and “Amoy” was set up as a third group not for linguistic but for Confucian or face-related (“face truce”) reasons. If some words have lots of pronunciations, in part this reflects the sociolinguistic reality of a wide range of dialects being recognized as a single language. Also, marginal pronunciations seem to find their way into Wiktionary for Hokkien much more than for most other languages, but as long as they exist (and not just idiolectally) & are non-extinct, this is good & well. If extinct or poorly attested pronunciations are swelling the ranks, methods may need examined, but that’s for some other day.
There is something to be said for treating Penang-Medan Hokkien as another language. Even w/o getting into the genesis of Penang Hokkien, the phonology of the variety seems to bend the rules of plain Hokkien. But the convention seems to be to treat it as a dialect within Hokkien, and this in turn reflects the sociolinguistic reality. (talk) 11:54, 28 February 2024 (UTC)[reply]

Pinging @Mar vin kaiser, Singaporelang, Mlgc1998, 幻光尘, LeCharCanon, MistiaLorrelay, Kangtw, The dog2, TagaSanPedroAko, Janinga Chang, Yoxem, 汩汩银泉, RcAlex36, Geographyinitiative for comment, who are all users who've edited recently that have some knowledge of Min Nan. Theknightwho (talk) 11:16, 27 February 2024 (UTC)[reply]

Thanks for calling - but actually I'm not proficient on the historical & comparative linguistics of Minnan, so I'll report the opinion from @S.G.Junge1997 who is currently working on various Southern Han varieties (I'm doing so because he's currently suffering from IP block).
“As almost all the Sinitic languages that we discuss here, including Southern Min, Northern Wu and so-on, are de facto macrolanguages, it would be not proper to list just some variety of these macrolanguages as distinct languages while to consider other least-concerned languages a part of the huge dialect continuum, not mentioned the phonological, lexicological or genetic differences between the least-concerned varieties are much larger than these varieties with metropolitan native speakers. Janinga Chang (talk) 15:55, 27 February 2024 (UTC)[reply]
...Taking Southern Min as an example, the macrolanguage Southern-Min itself is emerged among a group of coastal Min varieties in Dàtián, Fújiàn and surrounding area. Genetically, Southern Min can be divided into three varieties, the Western varieties used in Lóngyán and Zhāngpíng, Fújiàn Province, some remnants in Guǎngdōng Province (namely Zhōngshān Hokkien and some varieties of Leizhou Min), while the majority of Southern Min languages are in fact dialects of the massive Eastern varieties, including Chaozhou, Southern Min proper and Taiwanese Southern Min, these varieties shared a huge amounts of vocabularies and intelligibility, with only some of the characteristic vocabularies shared inside different branches. I'm not arguing about not list Chaozhou and Southern Min proper as different languages, but if one should consider listing Chaozhou and Quanzhou-Zhangzhou Southern Min or even Taiwanese Southern Min as separate languages appropriate, they must consider listing Dàtián qiánlù, Dàtián hòulù, Kǒngfūhuà, Sūbǎnhuà, Yànshí-Báishā, Lóngyán proper, Yǒngfú-Héxī, Zhāngpíng proper, Xīnqiáo-Xīnán and other small varieties concerned way less as distinct languages as well, (apart from Dàtián qiánlù and Dàtián hòulù, all these languages are different varieties of Western branch of the Southern Min which are using in different valleys around Lóngyán, most of which have less native speakers than 10k and are critically endangered, and although most of these languages share some common features, their differences in vocabularies and phonologies make them less intelligible internally than most of Eastern branch varieties, even not considering Chaozhou and Southern Min proper as different languages, some of these languages are still so diverse to be okay to be listed as separated) as it wouldn't be so appropriate to have "endangered" language varieties with often more than 1000k metropolitan native speakers listing as different languages while ignoring the real endangered 
languages with less than 10k native speakers and trying to hide their differences using a leftover garbage can discarded by the metropolitan people who think their language is absolutely unique.”
Although this might sound offensive to some who values the traditional Quanzhou-Zhangzhou-Amoy-Taiwan layout more, his opinion is definitely worth considering since he had actually been to Longyan for fieldworks for several times. Janinga Chang (talk) 16:05, 27 February 2024 (UTC)[reply]
Hi! I Support the first three points, same as the ones above. I also reject the second subpoint of point 4 for the reasons mentioned. For the first subpoint of point 4, I support making Hokkien a full language. As for "etymology-only languages", I find it vague to say that a word from language X originates from "Zhangzhou Hokkien" when the way we've been using the term "Zhangzhou Hokkien" is the dialect specific to Zhangzhou city proper, and the word might have borrowed it not from Zhangzhou city proper. Seeing the reply of S.G.Junge1997, I'd be open to proposing Datian Min be listed as a separate language. --Mar vin kaiser (talk) 16:13, 27 February 2024 (UTC)[reply]
@Mar vin kaiser Just FYI: "etymology-only language" is a misnomer; a much better description is "variant", as it covers everything from written standards like British English (en-GB) to chronolects like Old Latin (itc-ola) to regional varieties like Penang Hokkien (nan-pen). The thing that matters is that they're "part of" a full language (or, in some cases, another etym-only language). We already have codes for a few varieties of Hokkien, so that part isn't proposing anything new; just that they're nested under the new language code for Hokkien, instead of as sub-variants like they are now. Theknightwho (talk) 17:00, 27 February 2024 (UTC)[reply]
@Theknightwho: Thanks for explaining! Then I see no problem with it. If ever, my question is why it should not be extended to Penang Hokkien, Singapore Hokkien, and Philippine Hokkien. --Mar vin kaiser (talk) 17:07, 27 February 2024 (UTC)[reply]
@Janinga Chang Seconding parts of this. It was careless for all these varieties to have been anonymously swept into NAN w/o careful examination & debate beforehand. (talk) 12:09, 28 February 2024 (UTC)[reply]
Support as well 1., 2., 3., and 4.1. as per further explanation of Theknightwho about variants under/part of Hokkien as a full language, e.g. Quanzhou, Zhangzhou, Xiamen, Penang, Singaporean, Philippine, Taiwanese, etc. etc. and also later expansion of no. 3 as well for the others under nan as a family to be their own as full languages under the nan family/branch of Min of Sinitic if they show divergent enough linguistic features and are realistically practically socially regarded by their speakers as separate from their closest of kin anyways by now, such as those mentioned above by Justinrleung and S.G.Junge1997 and those listed in the ISO pending request and other more there may be. Also, 4.2 is a bad idea due to there still being a lot of structurally similar or reasonably identical enough terms shared with these variants (ZXQ++) still tying them together despite some observable differences, whether in phonemic structure, vocabulary choices, tonal differences, and other tendencies of these variants. The gulf of difference with these variants (ZXQ++) is not yet like the difference with say what makes nan-hok, zhx-teo, nan-hai, zhx-lui, etc. different from each other, enough to definitively split them.
Also pinging as well other users I remember seeing them edit or create nan entries before: @Fish bowl, @Wikijb, @, @TongcyDai, @A-cai, @Hongthay for comment on this. Mlgc1998 (talk) 20:45, 27 February 2024 (UTC)[reply]
Support the first three bullet points. RcAlex36 (talk) 04:30, 29 February 2024 (UTC)[reply]
Thanks for calling and sorry for my bad English.
For point 4, I Support option 1 and do not support option 2.
As a native speaker (of ZC), I think the differences between Zhangzhou Hokkien and other Hokkien tongues are too small to split them into separate languages. I dare say they are just accents of Minnan/Hokkien.
For Teochew, Leizhou-ish and Hainanese, indeed their "ancestor" is not the Min Nan, but they are also southern descendant languages of ancient Min too — different to northern descendants like Fuzhou-ish.
(ZC: the Zhangzhou City accent of Hokkien)
MistiaLorrelay (talk) 10:06, 29 February 2024 (UTC)[reply]

Split with option 1 of point 4, given the overwhelming support in the last two weeks. Taking inspiration from @Benwing2's process to split the Khanty languages above (see #Splitting Khanty Languages), I think this is what needs to happen:

  1. Assign new language codes to Hokkien (nan-hbl) and Hainanese (nan-hnm), and change over Leizhou Min (zhx-lui → nan-luh) and Teochew (zhx-teo → nan-tws). For the sake of forward-compatibility, I've used the proposed codes from the pending ISO proposal, since that will make things simpler if they're accepted.
  2. Assign a temporary family code to Min Nan (zhx-nan), which will be used while nan still exists as a language code.
  3. Track any uses of the nan code.
  4. Move all current {{nan-*}} templates to {{nan-hbl-*}}, since they all relate to Hokkien.
  5. Convert any existing entries with the Min Nan headword to the relevant language (which I suspect will be Hokkien in the vast majority of cases, if not 100%).
  6. Change any references to nan to use the appropriate code. Again, I suspect Hokkien will predominate.
  7. Change any references to the existing etymology-only codes to use the appropriate code.
  8. Delete nan as a language code, and add it as a family code, replacing the temporary code zhx-nan mentioned above.
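
As a rough illustration of steps 3, 6 and 7, the code changes could be sketched as a lookup table plus a review flag. This is only a sketch, not the bot or module code actually used: the names `RENAMES`, `NEEDS_REVIEW` and `remap` are made up for this example, and mapping the old etymology-only codes nan-hok and nan-hai straight onto nan-hbl and nan-hnm is an assumption based on the steps above.

```python
# Old code -> new code. The two zhx-* renames are stated in step 1;
# the nan-hok and nan-hai rows are an assumption drawn from steps 1 and 7.
RENAMES = {
    "zhx-lui": "nan-luh",  # Leizhou Min
    "zhx-teo": "nan-tws",  # Teochew
    "nan-hok": "nan-hbl",  # Hokkien (assumed mapping)
    "nan-hai": "nan-hnm",  # Hainanese (assumed mapping)
}

# Bare "nan" cannot be converted mechanically: each use has to be tracked and
# assigned to the relevant language by hand (steps 3, 5 and 6).
NEEDS_REVIEW = {"nan"}


def remap(code):
    """Return (new_code, needs_review) for a language code found in an entry."""
    if code in NEEDS_REVIEW:
        return code, True
    return RENAMES.get(code, code), False
```

For example, `remap("zhx-teo")` gives `("nan-tws", False)`, while `remap("nan")` leaves the code alone and flags it for a human to check, since Hokkien is only expected to be the right answer in most cases, not all.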

At this point, I also suggest that we start a new thread to discuss any additional languages which should be added to the Min Nan family, as several have been suggested above. Theknightwho (talk) 18:52, 2 March 2024 (UTC)[reply]

I Support points 1-3. However, ZXQ, Taiwanese, Penang, Singapore, and Philippine are really just variable accents with some regional vocabulary, like English dialects throughout England (are all those words recorded in Wiktionary too? They can't be treated as separate languages though?). Here in Taiwan, Taiwanese is getting more and more standardised as the years pass, but I agree with another post comparing it to Serbo-Croatian (all accents of a single Stokavian dialect). There are different regional words used in Taiwan, but we start to understand them all as synonyms and I don't even know anymore which words belong to which specific location, like 日頭花 vs 太陽花, or 葉仔 vs 樹葉 vs 樹仔葉 vs 樹葉仔. I frequently travel throughout Southeast Asia and try to use Taiwanese in Penang and Singapore as much as possible. As someone mentioned, Penang has some interesting phonology, but I'm still able to hold conversations with taxi drivers--they speak in their way and I in mine. Though in Penang I've encountered drivers who talk freely at length and at times I find it hard to understand some of the details--they probably understand Taiwanese better than the other way around due to television dramas. But this interaction would not be possible for Chaozhou, which I consider so different as to be a separate language, and also Hainan and Leizhou--the phonology is far too different and they grammatically use different words. I feel that adding all the various regional pronunciations for ZXQ/Taigi clutters Wiktionary, and I believe that a unifying meta-spelling that enables regional pronunciations to be deduced through a few simple rules would be better. I think it's best to mention whether a location has a completely separate word for something, rather than providing multiple pronunciations of the same word/字/morphemes. I also dislike the clutter and use of "invented" alternate romanisations that are not widely used or accepted, and that nobody can actually read. POJ or, better, TâiLô functions just fine. Kangtw (talk) 09:36, 5 March 2024 (UTC)[reply]
Sorry, when I posted support above, the green + button did not automatically appear when I posted. In spite of that, please consider my vote. Kangtw (talk) 09:39, 5 March 2024 (UTC)[reply]
@Kangtw The vote has actually already closed, but everyone seems to have shared your view that Hokkien shouldn’t be split and should be treated as one language, so that’s how I’ve been carrying it out. Theknightwho (talk) 18:34, 7 March 2024 (UTC)[reply]

Move Hachijo Japanese to Hachijo language & more

Discussion suggested by @Theknightwho. Please share any thoughts you may have.

Recent scholarly consensus is that Hachijo is distinct from Japanese, being one of the earliest branches of, if not outright parallel to, (Old) Japanese. Hachijo thus warrants conversion into a full language.

Similarly, Tugaru-ben and Satuma-ben could also be elevated to language level, though the exact classification of Kyuusyuu lects and the affinity of Tugaru-ben with the rest of Japanese are still not widely agreed upon. This is to be discussed. I am personally ambivalent on the issue, though this has already been brought up in preliminary discussions on the enwikt Discord.

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): — 義順 (talk) 21:06, 18 February 2024 (UTC)[reply]

Support. I am not knowledgeable enough to have an opinion on this, but we will need to make a decision on whether it's a descendant of Old Japanese or not before adding this. (Also, less important, but I'd prefer we used the name Hachijō with the macron.) Theknightwho (talk) 01:50, 19 February 2024 (UTC)[reply]
Abstain, as a person who cares about historical linguistics, but doesn't actually keep up on the literature. Cnilep (talk) 03:12, 19 February 2024 (UTC)[reply]
Technical Queries:
  • Do we have an ISO code?
  • If we don't, how will we organize this in our infrastructure?
Also, +1 for @Theknightwho's points that 1) we need to be clear about the provenance of Hachijō (daughter language of OJP? or niece, sharing an even older parent?), and 2) we should use the macron spelling as "Hachijō". ‑‑ Eiríkr Útlendi │Tala við mig 17:22, 19 February 2024 (UTC)[reply]
There's no ISO code for it, from what I can see, so we'd have to treat it with an exceptional code, per the guidelines at WT:Languages. Something like jpx-hcj. AG202 (talk) 17:37, 19 February 2024 (UTC)[reply]
Good choice. I will raise it in a separate thread, but the issue of spelling also affects some of the Ryukyuan languages, where the names we've lifted from the ISO are suboptimal and aren't really used outside of contexts directly related to the ISO standard itself. Theknightwho (talk) 19:40, 20 February 2024 (UTC)[reply]

Since there hadn't been any activity in a while, I've gone ahead and created Hachijō with the langcode jpx-hcj, as a descendant of Eastern Old Japanese (an etym-only language part of Old Japanese). Currently, it has the two templates {{jpx-hcj-head}} and {{jpx-hcj-kanjitab}}, though more language-specific templates can be created as needed. I've converted most terms that were in Category:Hachijō Japanese, but the remaining ones are more complex, so I wanted to leave them to somebody with a better idea of what should be done with them. Pinging (Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2, ND381, AG202): — This unsigned comment was added by Theknightwho (talkcontribs) at 06:19, 15 March 2024 (UTC).[reply]

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): Redoing pings, since they didn't work as I forgot to sign (which should teach me not to do this kind of thing just before I go to bed). Theknightwho (talk) 16:33, 15 March 2024 (UTC)[reply]

Making Sichuanese a full language

We currently handle Sichuanese (cmn-sic) as an etymology-only variant of Mandarin. Four things to consider:

  • Its treatment in Chinese entries is on the same level as that of Mandarin (proper) and Dungan: i.e. it is listed in the "Mandarin" section, in which "Mandarin" is used in the broad sense. Given that Dungan (which is treated as a full language) is also listed there, there's no reason why Sichuanese should be any different. See , which gives Mandarin (), Sichuanese (nyü3) and Dungan (nü, II). Note that Sichuanese transliteration is already handled automatically, since the transliteration module handles it as an independent lect.
  • Sichuanese is Southwestern Mandarin, whereas we use Mandarin to refer pretty much exclusively to the modern standard of Beijing Mandarin (except for that heading in the Chinese pronunciation table, already mentioned). They are pretty distinct.
  • As with Hokkien and Hainanese (as mentioned in #Converting Min Nan into a family), the designation of etymology-only language seems to have been arbitrary, and was likely influenced by the fact it doesn't have its own ISO code.
  • The line is much blurrier here, but it was raised in a past thread about Shanghainese (re the category Category:Shanghainese Chinese) that categories with names such as Category:Sichuanese Mandarin cause confusion, because of the distinction between Sichuanese [Mandarin] and standard Mandarin as spoken in Sichuan.

Two further considerations:

  • We might instead want to make Southwestern Mandarin a full language, with Sichuanese itself being part of that.
  • The code would likely need to be changed, as retaining cmn-sic would be misleading if we split it out of cmn.

Theknightwho (talk) 19:29, 20 February 2024 (UTC)[reply]

Support Sichuanese, Oppose SW Mando — 義順 (talk) 20:25, 20 February 2024 (UTC)[reply]
@ND381 How do you propose we handle SW Mandarin that isn't Sichuanese in the future? Theknightwho (talk) 04:07, 25 February 2024 (UTC)[reply]
Either along provincial boundaries and make areal groups (they're all similar enough) or just use Li's atlas. It just feels kind of weird to have "Southwestern Mandarin" as a language since it's very much a linguistic concept — 義順 (talk) 08:13, 25 February 2024 (UTC)[reply]
@ND381 Wouldn't that amount to splitting it into multiple languages, though? I'm not sure if that's warranted. Theknightwho (talk) 08:17, 25 February 2024 (UTC)[reply]
Yes, it would. If you don't see it being practical I can make it practical by adding a few hundred Wuhanese/Kunmingese entries next Sunday. Plus, I really don't see any other functional alternative for resolving this issue — 義順 (talk) 08:25, 25 February 2024 (UTC)[reply]
@ND381 I have no opinion - I just want to make sure we don't add Sichuanese and then find we need to move everything to SW Mandarin later on when adding other SW Mandarin lects. If you think they need to be separate languages then that's totally fine. Theknightwho (talk) 08:49, 25 February 2024 (UTC)[reply]
Right, thanks for the clarification. I really don't see Southwestern Mandarin needing a language-level code any time in the future, as we have Wuhanese implementation and expansion set as a long term goal (see here) and, as mentioned, Southwestern Mandarin doesn't really act like a single coherent unit outside of linguistics even when compared to ones we have like Xiang or Jin, which have "Hunanese" and "Shanxinese" connotations respectively — 義順 (talk) 08:57, 25 February 2024 (UTC)[reply]

Created as zhx-sic. The practical effect of this is relatively minor, since we've already been treating it like a full language anyway. Theknightwho (talk) 18:54, 27 February 2024 (UTC)[reply]

Add etymology-only codes for Proto-Anglo-Frisian and Proto-North Sea Germanic

As variants of Proto-West Germanic. This should hopefully be relatively uncontroversial, since we already have a healthy number of entries in Category:Anglo-Frisian Germanic and Category:North Sea Germanic, and there's a need for these due to both (sub-)families being mentioned in various etymology sections:

No doubt there are many more entries where these could be referred to. Theknightwho (talk) 02:19, 27 February 2024 (UTC)[reply]

@Theknightwho Anglo-Frisian is a well-established clade but I'm not so sure about North Sea Germanic. Cf. Wikipedia's comment:
North Sea Germanic, also known as Ingvaeonic /ˌɪŋviːˈɒnɪk/, is a postulated grouping of the northern West Germanic languages that consists of Old Frisian, Old English, and Old Saxon, and their descendants.
Ingvaeonic is named after the Ingaevones, a West Germanic cultural group or proto-tribe along the North Sea coast that was mentioned by both Tacitus and Pliny the Elder (the latter also mentioning that tribes in the group included the Cimbri, the Teutoni and the Chauci). It is thought of as not a monolithic proto-language but as a group of closely related dialects that underwent several areal changes in relative unison.
Benwing2 (talk) 04:36, 27 February 2024 (UTC)[reply]
@Victar as a major PWG editor.
Not to mention the fact PWG is already pretty controversial (@Mårtensås had some strong opinions on the topic).
I don't think an etym-only code for either is needed at this time, as the supposed differences were very minor, and we don't represent it in our PWG entries afaik. So while the label signifies a term's distribution, it is still supposedly the same language as any other PWG reconstruction in the model we handle. Thadh (talk) 07:24, 27 February 2024 (UTC)[reply]
I've never had a need for either, and North Sea Germanic is generally considered an areal grouping. -- Sokkjō 07:39, 27 February 2024 (UTC)[reply]
I can see the argument against NSG, but there is very clearly a need for Proto-Anglo-Frisian based on the etymologies mentioned above. It’s not about whether any particular editor has a need for it themselves, and nobody is suggesting we create separate entries for them outside of PWG. Theknightwho (talk) 11:00, 27 February 2024 (UTC)[reply]
@Theknightwho I see you created a category Category:Old Frisian terms derived from North Sea Germanic languages as well as Category:Elfdalian terms derived from North Sea Germanic languages‎ and Category:Elfdalian terms derived from Anglo-Frisian languages‎. Why did you do that, since this discussion is far from resolved? Benwing2 (talk) 22:29, 27 February 2024 (UTC)[reply]
@Benwing2 I've already removed the North Sea Germanic family, as I thought better of it. The question of whether we have an Anglo-Frisian clade is separate from whether we have a protolanguage for it (and that category was created back in November). Theknightwho (talk) 22:35, 27 February 2024 (UTC)[reply]
Ignoring the fact that a genetic Anglo-Frisian family is disputed, as far as I'm aware, no one has published "Proto-Anglo-Frisian" reconstructions, not even Boutkan or Siebinga, so we wouldn't even have anyone to cite. -- Sokkjō 00:57, 28 February 2024 (UTC)[reply]
@Sokkjo Then someone will need to deal with the etymology sections in those entries. Either we mention Anglo-Frisian reconstructions with a proper language code, or we don't mention them at all. Theknightwho (talk) 01:43, 28 February 2024 (UTC)[reply]
Which entries, these: CAT:Anglo-Frisian Germanic? -- Sokkjō 02:11, 28 February 2024 (UTC)[reply]
@Sokkjo English welkin (which refers to an "Anglo-Frisian Germanic" term), while Old English hriþer and metegian, Old Frisian hrither, and Saterland Frisian dusse all explicitly give Anglo-Frisian reconstructions. Theknightwho (talk) 02:15, 28 February 2024 (UTC)[reply]
Amended. -- Sokkjō 04:16, 28 February 2024 (UTC)[reply]
@Sokkjo You should also look at the entries mentioned in the North Sea Germanic list at the top of the thread. Once they're dealt with, I'll close this request as resolved. Theknightwho (talk) 06:44, 28 February 2024 (UTC)[reply]
@Theknightwho Before resolving this, we need to clear up whether to let the existing 'Anglo-Frisian' family stand. You created it in November without discussion and it's not clear to me from this discussion whether there's consensus in its favor. Benwing2 (talk) 07:11, 28 February 2024 (UTC)[reply]
@Benwing2 To explain the reasoning: I understood it to be an uncontroversial clade, which was reinforced by the existence of Category:Anglo-Frisian Germanic. I may have misunderstood the implications of that category, though. Theknightwho (talk) 07:26, 28 February 2024 (UTC)[reply]
@Theknightwho I think what this shows is that all additions of clades, and more generally any addition of languages or families, needs discussion beforehand, no matter how uncontroversial it seems. Benwing2 (talk) 07:53, 28 February 2024 (UTC)[reply]
@Theknightwho I see you also created the "High German" family back in November. Let me reiterate, you need to not create any more languages or families without discussion. Benwing2 (talk) 01:25, 1 March 2024 (UTC)[reply]

2024 — March

Discussion moved to Wiktionary:Requests for deletion/Others#Appendix:Adjectives indicating shape.

Merging Tupinambá (tpn) into Old Tupi (tpw)

Tupinambá has only 3 entries, i, and ý, which are already covered by Old Tupi, i, and 'y/y. Also, Old Tupi is used as an umbrella term for all Tupi dialects in Wiktionary, so having a separate heading for Tupinambá doesn't make much sense. Trooper57 (talk) 17:11, 9 March 2024 (UTC)[reply]

I also wanted to merge Tupinikin (tpk) for the same reason; I just realised there's a page for it. This one is basically blank, except for an empty maintenance category. Trooper57 (talk) 21:15, 9 March 2024 (UTC)[reply]
tpw (Old Tupi) got merged into tpn (Tupinambá) in 2022, so we should probably follow suit. I don’t really understand why Tupinikin (tpk) should be merged, though. Theknightwho (talk) 21:52, 9 March 2024 (UTC)[reply]
It's the same case as Tupinambá: what they call "Tupinikin language" is the variant of Old Tupi spoken by the Tupinikin people. I called them dialects, but the difference is like General American versus Southern American English: they differ in pronunciation on some points and call some things by different words, but aren't languages on their own. The category is just gonna stay blank forever as all lemmas will be put in Old Tupi anyway. Also, both Tupinambá language and Tupiniquim language redirect to Tupi language on Wikipedia.
About the code, I chose tpw over tpn because I prefer the name "Old Tupi", since it's neutral. I don't mind changing the code if we keep the name. Trooper57 (talk) 22:44, 9 March 2024 (UTC)[reply]
@Trooper57 For reference ISO merged Old Tupi and Tupinambá to tpn, and the code tpw was deprecated. It also seems that all varieties of Tupi are extinct. If Tupinambá & Old Tupi [tpn] are not significantly different from Tupiniquim [tpk] perhaps they should all be merged into Tupi [tpn]? - سَمِیر | Sameer (مشارکت‌ها · بحث) 21:54, 9 March 2024 (UTC)[reply]
It seems theknightwho already said that while I was typing so my comment is now pointless 😞. - سَمِیر | Sameer (مشارکت‌ها · بحث) 21:56, 9 March 2024 (UTC)[reply]

Originally at 翖侯, copied to 翕侯. I do not remember why I chose 翖侯.

Should the 翖侯 entry point to 翕侯 instead? And if so, can the page history be merged from 翖侯 to 翕侯? —Fish bowl (talk) 08:20, 7 March 2024 (UTC)[reply]

etymology codes for remaining Chinese varieties in Module:zh-usex/data

Current variety code | Description | Current langcode | Proposed langcode | Current romanization | Comment
MSC | MSC | cmn | cmn | Pinyin |
M-BJ | Beijing Mandarin | cmn | cmn-bei? [SUGGESTION] | Pinyin |
M-TW | Taiwanese Mandarin | cmn | cmn-TW? [SUGGESTION] | Pinyin |
M-MY | Malaysian Mandarin | cmn | cmn-MY? [SUGGESTION] | Pinyin |
M-SG | Singaporean Mandarin | cmn | cmn-SG? [SUGGESTION] | Pinyin |
M-PH | Philippine Mandarin | cmn | cmn-PH? [SUGGESTION] | Pinyin |
M-TJ | Tianjin Mandarin | cmn | cmn-tia? [SUGGESTION] | Pinyin |
M-NE | Northeastern Mandarin | cmn | cmn-nea cmn-noe? [SUGGESTION] | Pinyin |
M-CP | Central Plains Mandarin | cmn | cmn-cpl cmn-cep? [SUGGESTION] | Pinyin |
M-GZ | Guanzhong Mandarin | cmn | cmn-gua? [SUGGESTION] | Pinyin | Guanzhong
M-LY | Lanyin Mandarin | cmn | cmn-lan? [SUGGESTION] | Pinyin |
M-S | Sichuanese | zhx-sic | zhx-sic | Sichuanese Pinyin |
M-NJ | Nanjing Mandarin | cmn-njn | cmn-njn cmn-nan? [SUGGESTION; cmn-njn NOT DEFINED] | Nankinese Pinyin |
M-YZ | Yangzhou Mandarin | cmn-yaz | cmn-yaz cmn-yan or cmn-yzh? [SUGGESTION; cmn-yaz NOT DEFINED] | IPA | IPA as a placeholder
M-W | Wuhanese | cmn-wuh | cmn-wuh? [NOT DEFINED] | IPA |
M-GL | Guilin Mandarin | cmn-gli | cmn-gli cmn-gui? [SUGGESTION; cmn-gli NOT DEFINED] | IPA | IPA as a placeholder
M-XN | Xining Mandarin | cmn-xin | cmn-xin? [NOT DEFINED] | IPA | IPA as a placeholder
M-UIB | dialectal Mandarin | cmn | cmn-bei-unk? [DO WE NEED THIS?] | Pinyin | UIB stands for "unidentified Beijingesque"; this is only used for dialects with similar phonology to one of Beijing dialect or MSC
M-DNG | Dungan | dng | dng | Cyrillic |
CL | Classical Chinese | cmn | cmn-cla lzh-cmn? [SUGGESTION] | Pinyin |
CL-TW | Classical Chinese | cmn | cmn-cla-TW lzh-cmn-TW? [SUGGESTION] | Pinyin (Taiwanese Mandarin) |
CL-C | Classical Chinese | yue | cmn-cla-TW lzh-yue? [SUGGESTION] | Jyutping |
CL-C-T | Classical Chinese | zhx-tai | zhx-tai-cla lwz-tai? [SUGGESTION] | Wiktionary |
CL-VN | Vietnamese Literary Sinitic | vi | ??? [DO WE NEED THIS?] | Sino-Vietnamese |
CL-KR | Korean Literary Sinitic | ko | ??? [DO WE NEED THIS?] | Sino-Korean |
CL-C | Classical Chinese | yue | yue-cla lzh-yue? [SUGGESTION; DUPLICATE ENTRY] | Jyutping |
CL-PC | Pre-Classical Chinese | cmn | cmn-pcl lzh-pre? [SUGGESTION] | Pinyin |
CL-L | Literary Chinese | cmn | cmn-lit lzh-lit? [SUGGESTION] | Pinyin |
CI | Ci | cmn | cmn-cip lzh-cip? [SUGGESTION] | Pinyin |
WVC | Written Vernacular Chinese | cmn | cmn-wrv cmn-wvc? [SUGGESTION] | Pinyin |
WVC-C | Written Vernacular Chinese | yue | yue-wrv yue-wvc? [SUGGESTION] | Jyutping |
WVC-C-T | Written Vernacular Chinese | zhx-tai | zhx-tai-wrv zhx-tai-wvc? [SUGGESTION] | Wiktionary |
C | Cantonese | yue | yue | Jyutping |
C-GZ | Guangzhou Cantonese | yue | yue-gua? [SUGGESTION] | Jyutping |
C-LIT | Literary Cantonese | yue | yue-lit? [SUGGESTION] | Jyutping |
C-HK | Hong Kong Cantonese | yue | yue-HK? [SUGGESTION] | Jyutping |
C-T | Taishanese | zhx-tai | zhx-tai | Wiktionary |
C-DZ | Danzhou dialect | yue-dan | yue-dan? [NOT DEFINED] | IPA | IPA as a placeholder
J | Jin | cjy | cjy | Wiktionary |
MB | Min Bei | mnp | mnp | Kienning Colloquial Romanized |
MD | Min Dong | cdo | cdo | Bàng-uâ-cê / IPA |
MN | Hokkien | nan-hbl | nan-hbl | Pe̍h-ōe-jī |
TW | Taiwanese Hokkien | nan-hbl | nan-hbl-TW? [SUGGESTION] | Pe̍h-ōe-jī |
MN-PN | Penang Hokkien | nan-hbl | nan-pen | Pe̍h-ōe-jī |
MN-PH | Philippine Hokkien | nan-hbl | nan-hbl-PH? [SUGGESTION; we have nan-plp but this is badly named] | Pe̍h-ōe-jī |
MN-T | Teochew | nan-tws | nan-tws | Peng'im |
MN-L | Leizhou Min | nan-luh | nan-luh | Leizhou Pinyin |
MN-HLF | Haklau Min | nan-hlh | nan-hlh | IPA | IPA as a placeholder
MN-H | Hainanese | nan-hnm | nan-hnm | Guangdong Romanization |
W | Wu | wuu | wuu | Wugniu |
SH | Shanghainese | wuu | wuu-sha | Wugniu |
W-SZ | Suzhounese | wuu | wuu-szh | Wugniu |
W-HZ | Hangzhounese | wuu | wuu-hzh | Wugniu |
W-CM | Shadi Wu | wuu | wuu-sha wuu-chm? [SUGGESTION; wuu-sha conflicts with suggestion for Shanghainese] | Wugniu | wuu-cm? including Chongming, Haimen, Changyinsha etc
W-NB | Ningbonese | wuu | wuu-ngb | Wugniu |
W-N | Northern Wu | wuu | wuu-nor? [SUGGESTION] | Wugniu | general northern wu, incl. transitionary varieties
W-WZ | Wenzhounese | wuu-wz | wuu-wen? [SUGGESTION; wuu-wz NOT DEFINED and badly formatted] | Wugniu |
G | Gan | gan | gan | Wiktionary |
X | Xiang | hsn | hsn | Wiktionary |
H | Sixian Hakka | hak | hak-six? [SUGGESTION] | Pha̍k-fa-sṳ |
H-HL | Hailu Hakka | hak | hak-hai? [SUGGESTION] | Taiwanese Hakka Romanization System |
H-DB | Dabu Hakka | hak | hak-dab? [SUGGESTION] | Taiwanese Hakka Romanization System |
H-MX | Meixian Hakka | hak | hak-mei? [SUGGESTION] | Hakka Transliteration Scheme |
H-MY-HY | Malaysian Huiyang Hakka | hak | hak-hui? [SUGGESTION] | IPA | IPA as a placeholder
H-EM | Hakka | hak | hak-emo hak-eam? [SUGGESTION] | IPA | Early Modern Hakka, IPA as a placeholder
H-ZA | Zhao'an Hakka | hak | hak-zha? [SUGGESTION] | Taiwanese Hakka Romanization System |
WX | Waxiang | wxa | wxa | IPA |

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho We currently have 66 bespoke "variety codes" for Chinese lects in Module:zh-usex/data for use by {{zh-x}} (there are 67 entries in the data module, but CL-C occurs twice). The module maps them to language codes (full and etymology-only), but in a non-unique fashion. I haven't counted but I'm guessing maybe 40% of the variety codes have existing full or etymology-only language codes. I propose creating etymology-only codes for the remaining ones and then doing a bot run to replace the bespoke codes with Wiktionary language codes. In the above table I list my suggestions. Note that some of the currently listed lang codes don't exist and some of them are badly named or formatted and should be renamed. Benwing2 (talk) 05:52, 10 March 2024 (UTC)[reply]
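To make the "non-unique" point above concrete, here is a rough Python sketch that groups a handful of variety codes by the language code they currently map to. The dictionary holds only a sample of rows copied from the table, and the name shared_langcodes is invented for this illustration; the real data is Lua in Module:zh-usex/data and may be laid out quite differently.

```python
from collections import defaultdict

# Sample rows from the table above: variety code -> current langcode.
# This is an illustrative subset, not the full module data.
VARIETY_TO_LANG = {
    "MSC": "cmn",
    "M-BJ": "cmn",
    "M-TW": "cmn",
    "M-S": "zhx-sic",
    "M-DNG": "dng",
    "C": "yue",
    "C-HK": "yue",
    "MN": "nan-hbl",
    "TW": "nan-hbl",
}


def shared_langcodes(mapping):
    """Return the langcodes that more than one variety code maps to."""
    groups = defaultdict(list)
    for variety, lang in mapping.items():
        groups[lang].append(variety)
    # Keep only langcodes shared by two or more variety codes.
    return {lang: varieties for lang, varieties in groups.items() if len(varieties) > 1}


if __name__ == "__main__":
    for lang, varieties in shared_langcodes(VARIETY_TO_LANG).items():
        print(lang, "<-", ", ".join(varieties))
```

Any langcode that shows up in the result cannot be mapped back to a single variety code, which is why a bot run would need new etymology-only codes for those varieties first.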

lmao are we finally getting around to extirpating these?
Personally I believe that giving every dialect branch and family (and location?) its own code will produce an unwieldy alphabet soup. I propose using full names for clarity and simplicity.
The Classical Chinese codes seem silly; perhaps we should use a bipartite system giving the text language and the pronunciation language, such as lzh/cmn-TW (Literary Chinese in Taiwanese Mandarin pronunciation).
(Can we do {{zh-pron}} next?) —Fish bowl (talk) 06:16, 10 March 2024 (UTC)[reply]
Yeah I don't know much about Classical Chinese; I just added codes for them to correspond to the existing variety codes. I have no objection to merging some of the variety codes. The main disadvantage to using full names in etym codes is that they're long to type. Benwing2 (talk) 06:23, 10 March 2024 (UTC)[reply]
(Counterpoint to the "alphabet soup" concern: these 2-letter ad-hoc codes have worked so far(?) —Fish bowl (talk) 06:33, 10 March 2024 (UTC))[reply]
@Fish bowl We're going to need codes for quite a few of these anyway, when they eventually get added to {{zh-pron}}, so we might as well give them proper codes now. Quite a few of them will need to be made full languages at some point, but giving them etym-only codes is probably fine for now. Theknightwho (talk) 06:42, 10 March 2024 (UTC)[reply]
Support the proposal generally. I don't have strong feelings about the particular etymology codes that are proposed here. While we're at it, perhaps what is called Literary Cantonese should be renamed to Hong Kong Written Chinese or the like. "Literary Cantonese" is kind of a confusing label because it's more like a Mandarin-based register with varying degrees of influence from Cantonese and Literary Chinese. — justin(r)leung (t...) | c=› } 06:37, 10 March 2024 (UTC)[reply]
Support in general. CL-* and CI should perhaps be lzh-* instead of cmn-*, or what Fish Bowl has said above. The WVC ones could use *-wvc instead of *-wrv - the former would be easier to remember. Danzhou should be zhx-dan, since it is often grouped under Yue just for convenience but is not really a Yue lect. Penang Hokkien should be nan-hbl-pen as it's a Hokkien dialect. Other than these no strong feelings, though I would caution against using too many of the syllables that are frequently found in place names (e.g. zhou as zh in Suzhou wuu-szh and Hangzhou wuu-hzh; these are already defined so it's fine), otherwise we will run out of possible letter combinations very soon. Also concur with Justin's view on "Literary Cantonese". – wpi (talk) 07:00, 10 March 2024 (UTC)[reply]
@Wpi @Theknightwho Yeah I think we should try to write up a proposed set of conventions for new etymology language codes. Generally I try to use the first three letters of the lect name unless that creates ambiguity (e.g. I used nea for Northeastern because nor would be ambiguous with Northern), but beyond that some thought is required. Benwing2 (talk) 07:28, 10 March 2024 (UTC)[reply]
@Benwing2 @Wpi I’d oppose having 9 letter codes except where it’s unavoidable (e.g. some proto-languages), since they’re awkward and difficult to remember. In cases like Penang Hokkien, just using the family code as a prefix is probably fine. Theknightwho (talk) 14:36, 10 March 2024 (UTC)[reply]
@Theknightwho Yes that is totally reasonable. Benwing2 (talk) 19:39, 10 March 2024 (UTC)[reply]
Support
Shadi could be wuu-chm (ie. Chongming) to avoid overlap with Shanghainese if you so wish. wuu-nor looks fine. — 義順 (talk) 09:23, 10 March 2024 (UTC)[reply]
Support the general proposal, but I'm wondering if we should have something for colloquial putonghua. There are some colloquial terms that are not specific to the Beijing dialect of Mandarin, and probably some that are more used among non-native Mandarin speakers in southern China. The dog2 (talk) 14:29, 10 March 2024 (UTC)[reply]
Comment: I updated some of the suggested codes based on comments and based on my attempts to be more consistent. I am using the following logic for defining codes:
  1. Use the first three letters of the variety/dialect/lect name if possible.
  2. If that causes ambiguity:
    1. If the lect name has three components, use the first letter of each.
    2. If the lect name has two components and one of them begins with a digraph, use the digraph along with the first letter of the other, e.g. Yangzhou could be abbreviated yzh.
    3. Otherwise, if the lect name has two components, use the first two letters of the first component followed by the first letter of the second, e.g. Early Modern could be abbreviated eam and Northeast(ern) could be abbreviated noe.
In response to User:The dog2's comments about colloquial Putonghua, would the current M-UIB variety code suffice? The comment next to it reads: UIB stands for "unidentified Beijingesque"; this is only used for dialects with similar phonology to one of Beijing dialect or MSC.
In response to User:Wpi: I updated the Classical Chinese codes to use lzh-*. What about 'Vietnamese Literary Sinitic' and 'Korean Literary Sinitic'? Are these actual lects that are essentially Vietnamese/Korean-influenced usage of Classical Chinese, or are they merely the use of Chinese terms in Vietnamese and Korean? In the former case we could adopt codes lzh-VI and lzh-KO; in the latter case I'm not sure we need any codes. Benwing2 (talk) 23:32, 10 March 2024 (UTC)[reply]
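The abbreviation rules listed in the comment above could be sketched roughly as follows in Python. This is only one reading of those rules: the function name propose_code, the idea of passing the lect name already split into components, and the list of digraphs (zh, ch, sh) are all assumptions made for this illustration, and the rules do not say what to do if the fallback abbreviation is itself already taken.

```python
# Assumption: "digraph" means a pinyin initial such as zh, ch or sh.
DIGRAPHS = ("zh", "ch", "sh")


def propose_code(components, taken):
    """Suggest a three-letter suffix for a lect name.

    components: the lect name split into parts, e.g. ["yang", "zhou"].
    taken: suffixes that would be ambiguous or are already in use.
    """
    parts = [p.lower() for p in components]
    first_three = "".join(parts)[:3]
    if first_three not in taken:
        return first_three  # rule 1: first three letters of the name
    if len(parts) == 3:
        return "".join(p[0] for p in parts)  # rule 2.1: initials
    if len(parts) == 2:
        first, second = parts
        if first.startswith(DIGRAPHS):
            return first[:2] + second[0]  # rule 2.2: digraph + initial
        if second.startswith(DIGRAPHS):
            return first[0] + second[:2]  # rule 2.2: initial + digraph
        return first[:2] + second[0]  # rule 2.3: two letters + initial
    raise ValueError("no rule covers this name; pick a code by hand")


if __name__ == "__main__":
    print(propose_code(["yang", "zhou"], taken={"yan"}))
    print(propose_code(["north", "east"], taken={"nor"}))
```

The caller supplies the split into components because the examples in the comment divide names in different ways (syllables for Yangzhou, words for Early Modern), so no automatic splitting is attempted here.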
@Theknightwho @Chuck Entz We have errors coming from the undefined codes that currently occur in Module:zh-usex/data. We either need to temporarily change the undefined codes to one of the currently in-use codes, or go ahead and define etymology-only language codes corresponding to the undefined codes (not necessarily using the codes already present; see my suggestions above). The set of undefined codes causing errors is wuu-wz (Wenzhounese), cmn-gli (Guilin), cmn-xin (Xining), cmn-njn (Nanjing), cmn-yaz (Yangzhou), yue-dan (Danzhou). Benwing2 (talk) 23:39, 10 March 2024 (UTC)[reply]
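The kind of check that would surface these errors can be sketched in Python as below. The MODULE_CODES dictionary is a sample of rows from the table, and DEFINED is a purely hypothetical stand-in for whatever list of full and etymology-only codes actually exists; neither reflects the real Lua data structures, and undefined_codes is a name made up for this example.

```python
# Sample of variety code -> langcode pairs taken from the table above.
MODULE_CODES = {
    "W-WZ": "wuu-wz",
    "M-GL": "cmn-gli",
    "M-XN": "cmn-xin",
    "M-NJ": "cmn-njn",
    "M-YZ": "cmn-yaz",
    "C-DZ": "yue-dan",
    "M-S": "zhx-sic",
    "M-DNG": "dng",
    "C": "yue",
}

# Hypothetical set of codes assumed to be defined already.
DEFINED = {"cmn", "yue", "dng", "zhx-sic", "wuu", "hak"}


def undefined_codes(module_codes, defined):
    """Return, in sorted order, the langcodes used but not defined."""
    return sorted({code for code in module_codes.values() if code not in defined})


if __name__ == "__main__":
    for code in undefined_codes(MODULE_CODES, DEFINED):
        print("undefined:", code)
```

Either remedy mentioned in the comment makes the result empty: pointing the affected variety codes at a code that is already defined, or adding the missing codes to the defined set.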
@Benwing2 Of those, I'm pretty sure Danzhou and Nanjing should be full languages for sure, but I'm unsure about the others. @wpi, justinrleung? Theknightwho (talk) 23:43, 10 March 2024 (UTC)[reply]
@Theknightwho Full languages take longer to gain consensus. IMO if there are no objections we can define them for now as etym-only languages and upgrade them later when the discussion has played out. (E.g. I have heard it said that Wenzhounese itself consists of several mutually incomprehensible dialects, meaning potentially we would need several full languages at some point.) Benwing2 (talk) 23:49, 10 March 2024 (UTC)[reply]
@Benwing2 Sure. In the case of Danzhou, it should probably be made a child of Chinese (zh) for now, then. It's traditionally been counted as a Yue lect, but more recently it's been treated as an unclassified divergent lect; however, we treat Yue as a family (zhx-yue), and reserve the code yue for Cantonese. Whether or not Danzhou is part of Yue, it's definitely not part of Cantonese in the sense we're defining it as. Theknightwho (talk) 23:54, 10 March 2024 (UTC)[reply]
I don't know about calling it "Beijing-esque". One thing is that you don't really get the erhua in Taiwan, or when people from southern China speak Mandarin. And someone from Beijing will always distinguish between 咱們 and 我們 when speaking standard Mandarin, but you don't see that among people from southern China when they speak Mandarin. Should we just have a generic "Southern Chinese Mandarin", then? The dog2 (talk) 00:09, 11 March 2024 (UTC)[reply]
@The dog2 "Southern Chinese Mandarin" seems problematic because Mandarin is a huge area with lots of diversity, and here you don't mean "Mandarin as spoken in the southern part of the Mandarin-speaking area" so much as "Southern Standard Mandarin". Benwing2 (talk) 00:30, 11 March 2024 (UTC)[reply]
@Benwing2: What I meant is Mandarin as spoken today in traditionally non-Mandarin-speaking areas like Fujian and Guangdong. The dog2 (talk) 01:40, 11 March 2024 (UTC)[reply]
@The dog2 Right, but conceptually your "Southern Chinese Mandarin" is completely different from e.g. Southwestern Mandarin. The latter refers to the lects spoken natively in the southwestern part of the Mandarin-speaking area but the former refers not to the native Mandarin lects in the southern part of the Mandarin-speaking area (which would be something like Jianghuai Mandarin) but to the variety of Standard Mandarin (which is a northern Mandarin variety) as spoken in southern regions that natively don't speak Mandarin at all. Benwing2 (talk) 01:46, 11 March 2024 (UTC)[reply]
@Benwing2: It's a little more complicated than that these days. In some places like Nanning and Fuzhou, most of the younger generations can't speak the local dialect anymore, and now speak standard Mandarin as their native language. The dog2 (talk) 04:27, 11 March 2024 (UTC)[reply]
@The dog2 OK sure (what you are describing is unfortunately happening everywhere), but are you objecting to the term "Southern Standard Mandarin"? ("Southern Chinese Mandarin" seems to me both problematic for the reasons I have outlined, and redundant in that "Mandarin" is a variety of "Chinese".) Benwing2 (talk) 04:34, 11 March 2024 (UTC)[reply]
@Benwing2: Maybe let's ask Justinrleung what he thinks is a good name, because I can't really think of one. And it really depends on which part of China. In the Teochew-speaking areas, the dialect is still going very strong. And people from Fuzhou often lament that people from southern Fujian have preserved the dialects much better than in Fuzhou. The dog2 (talk) 05:05, 11 March 2024 (UTC)[reply]
@Benwing2, The dog2: I don't really know what the purpose of this "Southern Standard Mandarin" would be for zh-x or elsewhere on Wiktionary. Terms that are chiefly used in the south but still considered standard aren't usually marked as southern, and there isn't really a cohesive variety, just possibly some shared tendencies. — justin(r)leung (t...) | c=› } 05:17, 11 March 2024 (UTC)[reply]
The "Beijingesque" tag is chiefly used by @Dokurrat. Dokurrat, can you explain this tag a bit? —Fish bowl (talk) 22:32, 11 March 2024 (UTC)[reply]
@Fish bowl: Sometimes a word or expression exists in both Beijing dialect and my native lexicon and I would construct an example sentence for it. In such cases, I feel weird labelling my example sentence as "Beijing dialect", as I'm not a Beijing dialect speaker. And hence I created this "UIB" tag thing. Now that I review this thing, I think I could've just used a tag that says "Mandarin" instead. I have no issue retiring the "UIB" tag or renaming it or doing whatever y'all see fit with it. Dokurrat (talk) 08:20, 13 March 2024 (UTC)[reply]
@Benwing2, Theknightwho: I'm not sure about Nanjing being a full language but not other varieties of Mandarin (Yangzhou is also Jianghuai, for example, so why would it be differentially treated?) Danzhou can be a full language since its status is disputed. BTW, I'm not exactly sure about "Cantonese" being not the same as "Yue" in current practice on Wiktionary. It just seems like that because all Cantonese entries in jyutping are based on Standard Cantonese (because Jyutping is inherently created for Standard Cantonese) and translations are 99.9% in Standard Cantonese because of our editors' knowledge; however, in "zh-dial" and "zh-pron", "Cantonese" means "Yue". I'm not exactly down with the idea that yue = Cantonese and zhx-yue = Yue, since it's kind of different from how we're treating nan, for example. — justin(r)leung (t...) | c=› } 01:00, 11 March 2024 (UTC)[reply]
@Justinrleung I think it would be sensible to have a major thread each for Mandarin and Yue to hash out how we handle them, in a similar fashion to what we've been doing for (Southern) Min, since it would help to iron out these kinds of issues, as I think the current piecemeal approach leads to a lot of confusion. In particular, there's the issue you point out as to what we should be using the yue and cmn codes for. Theknightwho (talk) 01:06, 11 March 2024 (UTC)
@Theknightwho: Yes, I agree. — justin(r)leung (t...) | c=› } 01:12, 11 March 2024 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── I added etym codes for the lects in Module:zh-usex/data that were causing errors, using my proposed codes above. They are marked as temporary, pending further discussion. Benwing2 (talk) 22:38, 11 March 2024 (UTC)[reply]

 Done. I omitted the things listed as "DO WE NEED THIS?" above and also omitted "Literary Chinese" because I have no idea how it differs from just lzh, which is also "Literary Chinese". Please note, there are lots more lects mentioned in Module:labels/data/lang/zh and in qualifiers in Chinese thesaurus entries; I will post separately about these. Benwing2 (talk) 23:55, 17 March 2024 (UTC)

Additional Southern Min languages

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381, Benwing2): Following the various discussions relating to Min in the last month or so, now seems a good time to propose the additional Southern Min varieties which we've been missing:

  1. Zhenan Min (nan-zhn)
  2. Datian Min (nan-dtn)
  3. Longyan Min (nan-lnx) - sometimes grouped as part of Hokkien
  4. Sanxiang Min (nan-zsh) - one of the Zhongshan Min lects; the other two are apparently Eastern Min
  5. Swatow Min (nan-swt) - also known as Shantou
  6. Hoklo Min (nan-hlh) - also known as Hailufeng or Haklau Min; currently etym-only but should be made a full language
  7. Proto-Southern Min (nan-pro) - see Appendix:Proto-Southern Min reconstructions

Although we will want codes for all of these, it might not be desirable to count all of them as separate languages. I also suspect the list is far from complete. Theknightwho (talk) 19:32, 10 March 2024 (UTC)[reply]

Support although (a) are we stuck with the above codes (i.e. they are proposed ISO 639 standard codes)? If not some of them could stand to be rationalized; (b) we should clarify earlier rather than later whether these should be full or etym codes (although for Chinese I suppose it makes less difference than elsewhere as the L2 header used is always "Chinese"). Benwing2 (talk) 19:37, 10 March 2024 (UTC)[reply]
Swatow Min is classified under Teochew, so we do not need additional codes for it. The term "Hoklo" is a bit ambiguous because Hokkien speakers will consider "Hoklo" to refer to Hokkien. The dog2 (talk) 19:44, 10 March 2024 (UTC)[reply]
@The dog2 The difficulty with "Teochew" as a name is that it refers to two different things: (1) what Wikipedia calls Chaoshan Min as a whole, and (2) the specific lect as spoken in Chaozhou, which it calls the Teochew dialect. We will still need a code for it either way, but the question is whether it should be an etymology-only code or a full language code. Theknightwho (talk) 19:54, 10 March 2024 (UTC)[reply]
The first definition of "Teochew" already has a code for it. It is "zhx-teo". But I'd be open to changing it to be in line with those of the other Southern Min dialects. In Southeast Asia, the term "Teochew" in common parlance is generally understood to refer to the first definition. The dog2 (talk) 20:00, 10 March 2024 (UTC)
@The dog2 Yeah, that makes sense. Just as a side point, the Teochew code was changed to nan-tws with the split of Min Nan, because it makes sense to give all the Southern Min codes the nan prefix, and the pending ISO code is tws. Theknightwho (talk) 20:21, 10 March 2024 (UTC)[reply]
@Theknightwho: Thanks for starting this discussion. There are a few issues here.
  1. Zhenan Min might be a confusing name because Southern Zhejiang has both Southern Min and Eastern Min varieties; we may want to look into what other names we can use.
  2. Datian Min might need to split further into Qianlu and Houlu dialects.
  3. Does Longyan Min cover all Southern Min varieties spoken in the prefecture city of Longyan? Otherwise, there are several (sub)varieties of Longyan Min.
  4. Swatow/Shantou should probably not be separate from Teochew - it's rare to consider them different varieties.
  5. I personally prefer Hailufeng over Hoklo for the varieties of Southern Min spoken in Haifeng/Lufeng, since Hokkien may also be called Hoklo.
— justin(r)leung (t...) | c=› } 20:11, 10 March 2024 (UTC)[reply]
@Theknightwho
1. “Zhenan Southern Min” lies within Hokkien, both sociolinguistically & in terms of intelligibility. It’s pretty much an overseas cluster of Hokkien (and not only b/c it arrived by sea), and should be discussed in that context.
2. Yes, but “Datian Min” is not one language. Which “Datian Mins” belong within “Southern Min” (in any meaningful sense) is a question yet to be thoroughly considered.
3. Yes. “Longyan Min” is sociolinguistically not-Hokkien as well as mutually unintelligible vs Hokkien.
4. Yes. (Not sure if the other two are “Eastern Min”, but that’s a whole other ballgame.)
5. Swatow “Min” is part of Teochew, as others have pointed out.
6. Yes, most definitely. BTW, “Hoklo” refers to the language cluster that includes this language, Hokkien, Teochew, Taiwanese, & maybe others. So “Hoklo” & “Haklau” would be cognate non-synonyms, kind of like “Thai” & “Tai”, but not as striking.
7. Maybe the supposed proto-language should be fleshed out first? (+ I apologise if this is obvious, but Kwok’s “reconstructions” seem to be something quite different from what we usually mean by reconstruction. Also note (as with the ONESELF line) how much data it just flat-out ignores or omits (in this case perhaps in order to hang on to the presumed characters-of-etymology 家 & 己). (talk) 13:45, 11 March 2024 (UTC)[reply]

Beserman

(Notifying Thadh, Tropylium, Surjection): Recently I’ve been adding Beserman Udmurt entries (Category:Beserman Udmurt), and Beserman seems less similar to Udmurt than I initially expected (at least in terms of vocabulary and phonology). Beserman is usually considered to be a 'special' dialect of Udmurt, and it has recently also acquired its own written standard. As far as I can see, it definitely seems more convenient to create separate Beserman entries. Otherwise, I'm afraid Udmurt might get pretty messy, with a Beserman alternative form listed for most Udmurt entries. A lot of information on the Beserman dialect can be found at http://beserman.ru/. I'll be glad to hear your opinions on this. Илья А. Латушкин (talk) 19:52, 13 March 2024 (UTC)

At minimum, most of the Beserman entries so far should not be listed as synonyms. Most are simply the result of a regular sound change from ы /ɨ/ to ө. Currently it seems this is also transcribed on here as /ʌ/ and transliterated as å, where at least the latter seems weird; most often I have seen the sound described as /ə/ (= Finno-Ugric transcription ə̑, which beserman.ru also seems to use). In any case, these could be easily accommodated similar to differences between e.g. English dialects, as alternate pronunciations + spellings (besides, this is not unique to Beserman but is paralleled by other dialects). A few other phenomena also come down to simple systematic pronunciation differences, e.g. the replacement of ӧ by /e/. It is unclear to me (and per current literature, it seems, also to Uralistics at large) how much else really differs between Beserman and even standard Udmurt. --Tropylium (talk) 20:07, 13 March 2024 (UTC)
@Tropylium: The usage of {{synonym of}} stems from my usage of that format in Komi Izhma entries, e.g. асывыы (asyvyy). It's probably indeed a good idea to mark them as altforms, but the issue I have is mostly that Komi Izhma is actually semi-standardised alongside standard Komi, and the same issue is also present in Beserman.
On the differences between it and standard Udmurt, I honestly can't say a lot as I haven't worked too much with the language. It does feature some unique sound changes from the Proto-Permic language that set it apart from the other Udmurt dialects, like being the only Permic lect to (consistently) differentiate between the reflexes of *u and *ü. It also seems to have a national identity separate from other Udmurts. But other than that I would have to refer to Ilya, as they've worked with the language more closely. Thadh (talk) 20:47, 13 March 2024 (UTC)
Sorry, whose *ü and where? Beserman has a few unique-looking cases of /ə/ (< ? *ɨ), but only in words where southeastern Udmurt more generally also shows /ʉ/ (the generally accepted historical scenario is that Beserman arises from the SE dialects of Udmurt, after a migration towards the north leaves them slightly isolated). --Tropylium (talk) 21:03, 13 March 2024 (UTC)[reply]
Lytkin's. I'm talking of words like мөнөнө (månånå, to go) and зөмөнө (zåmånå, to dive). And I do take issue with your identification of the vowel as being a schwa, it most definitely isn't one. If you listen to actual recordings I think you'll agree that it is a low vowel, sometimes even as open as [a]. Thadh (talk) 21:30, 13 March 2024 (UTC)[reply]
/ə/ is not my identification but what reference literature insists on calling it, e.g. the late Keľmakov's monographs on Udmurt dialectology like Udmurtin murteet (1994), Диалектная и историческая фонетика удмуртского языка (2003). A lot of beserman.ru's recordings do sound more like [ʌ] or [ɐ], I agree. This could be a recent development; also, e.g., the loss of ӧ is only post-WW1. --Tropylium (talk) 20:43, 14 March 2024 (UTC)
Overall Permic languages have undergone some shifts in the recent century, also including the delabialisation of ӧ (ö) in practically all varieties of Komi. Since we are primarily a descriptive dictionary of the modern languages (earlier stages are a bonus!) I think we should stick to the modern pronunciation. The transcription of the vowel as å was taken over from Komi-Yazva, which has a very similar vowel written the same way. Thadh (talk) 09:07, 15 March 2024 (UTC)[reply]
I know nothing about Udmurt, but I do agree that unless and until Beserman is considered a separate language, its entries should be formatted along the lines of {{alt form|udm|аску|from=Beserman}} rather than as synonyms of primary-dialect forms. —Mahāgaja · talk 21:40, 13 March 2024 (UTC)[reply]
@Tropylium I have found some other sound correspondences between Udmurt and Beserman:
1. йырси ~ йөрчө 'hair', кырси ~ көрчө 'son-in-law'
2. кеч ~ кесь 'goat', ӟуч ~ дюсь 'Russian'
3. син ~ синь 'eye', кин ~ кинь 'who', нин ~ нинь 'linden'
4. тэй ~ тей 'louse', дӥсь ~ дись 'clothes', дэрем ~ дерем 'shirt'
5. ӝӧк ~ ӟек 'table', ӝыт ~ ӟөт 'evening', ӝужыт ~ ӟужөт 'high'
6. ньөм ~ ним 'name', йөвор ~ ивор 'news'
7. сылал ~ слал 'salt', плем ~ пилем 'cloud'
Илья А. Латушкин (talk) 18:24, 14 March 2024 (UTC)[reply]
FWIW most of this is also within normal phonetic variation for Udmurt dialects, the /Te/ > /Tʲe/ change is the only systematic feature I don't recall seeing reported before (makes sense though, helps for not entirely losing the э/ӧ contrast).
One thing to consider is that even if we created Beserman separately, we'd then still want to note all forms like these in Udmurt entries, just now as etymological cognates rather than pronunciation variants. It might not save substantial work altogether. The etymologist in me at least thinks this would be probably the nicer option though, if you're already creating separate entries anyway. And it would be more consistent also with how we have split Komi-Zyrian and Komi-Permyak, instead of treating them as variants of single "Komi". --Tropylium (talk) 19:43, 14 March 2024 (UTC)[reply]
The same thing has come to my mind as well, and at first sight the differences between Komi-Zyrian and Komi-Permyak do not seem to be much larger than those between Udmurt and Beserman.
I've found two more sound correspondences (1. ӟуч ~ дюсь 'Russian', ӟеч ~ десь ‘good’, 2. ньыль ~ ниль ‘four’, выль ~ виль ‘new’) and some Beserman words not found in standard Udmurt (most of them Turkic loanwords), e.g. бикем ‘aunt’, биягам ‘husband's older brother’, бийөм ‘mother-in-law’, ўармиська ‘brother-in-law’, писяй ‘cat’ (also found as ‘писэй’ in dial. Udmurt), … Some other, more sporadic, vowel correspondences have also come up: изьыны ~ узьөнө ‘to sleep’, губи ~ гиби ‘mushroom’, чорыг ~ чорог ‘fish’, сюрес ~ сьөрес ‘road’, бугро ~ бөгра ‘felling’, … Илья А. Латушкин (talk) 08:50, 15 March 2024 (UTC)

More etym codes for Chinese varieties, part 1

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho Hopefully this ping isn't too noisy. There are two more sources of Chinese lects here at Wiktionary that I have found that may need etym-only codes: qualifiers in thesaurus entries and labels in Module:labels/data/lang/zh. The following table is derived from thesaurus qualifiers (I computed this as part of converting nan codes and qualifiers to appropriate lect codes):

Qualifier | Count | Comment | Wikidata entry (if any)
ACG | 1 | Does this mean "Anime, Comics, Gaming"? Not a lect. |
Anxi Hokkien | 2 | Need lect code? |
Australia | 1 | Ambiguous |
Buddhism | 5 | Not a lect |
Buddhist temple | 8 | Not a lect |
Chinese landscape garden | 1 | Not a lect |
Christianity | 1 | Not a lect |
Classical Chinese or in compounds | 1 | Ambiguous |
Classical Chinese | 59 | Ambiguous |
Classical | 8 | Ambiguous |
Eastern Min; Southern Min | 1 | Ambiguous |
Fuzhou | 1 | Ambiguous |
Guangdong | 1 | Ambiguous? |
Guiyang | 1 | Need lect code? Per w:Southwestern Mandarin, a subvariety of the Kun-Gui variety of Southwestern Mandarin | Q15911623
Harbin Mandarin | 1 | Need lect code; a variety of Northeastern Mandarin | Q1006919
Harbin | 2 | (same as above) |
Hong Kong | 24 | Ambiguous |
Hong Kong><tr:pot1 | 1 | Ambiguous |
Hsinchu & Taichung Hokkien | 1 | ??? Do we need two lect codes? Wikidata has a "Taichung Accent" (Q10914070) but it is a variety of Mandarin; can't find Hsinchu Hokkien in Wikipedia or Wikidata |
Internet slang | 9 | Not a lect |
Internet | 2 | Not a lect |
Japanese calligraphy | 1 | Not a lect |
Jilu Mandarin | 1 | Need lect code; primary subdivision of Mandarin | Q516721
Jinhua Wu | 1 | Need lect code | Q13583347
Korean calligraphy | 1 | Not a lect |
Liuzhou Mandarin | 2 | Need lect code? | Q7224853
Liuzhou | 1 | (same as above) |
Longyan Min | 2 | Need lect code (but will likely be transitioning to a full language, see #Additional Southern Min languages); per Wikipedia, a variety of Hokkien, but that may be wrong | Q6674568
Luoyang Mandarin | 1 | Need lect code; a variety of Central Plains Mandarin | Q3431347
Luoyang | 3 | (same as above) |
Macau | 2 | a variety of Cantonese? Do we need a lect code? |
Mainland China | 3 | Ambiguous |
Mainland | 2 | Ambiguous |
Malaysia | 11 | Ambiguous |
Mandalay Taishanese | 1 | an overseas variety of Taishanese; Do we need a lect code? |
Min | 12 | Ambiguous |
Muping Mandarin | 1 | Do we need a lect code? This may be a variety of Shandong Mandarin (Q3285432) |
Muping | 2 | (same as above) |
Nanchang Gan | 1 | Need lect code | Q3497239
Northern China | 1 | Ambiguous |
Northern Mandarin | 2 | Ambiguous |
Philippines | 1 | Ambiguous |
Pinghua | 1 | Ambiguous |
Pingxiang Gan | 3 | Do we need a lect code? A variety of Yiliu Gan Chinese (Q8053438) |
Qing Dynasty | 1 | Not a lect |
Sichuanese or Internet slang | 1 | Sichuanese = zhx-sic; Internet slang = not a lect |
Singapore | 13 | Ambiguous |
Son of Heaven | 2 | What is this? Not a lect. |
Southeast Asia; dated or dialectal in Mainland China | 1 | Ambiguous |
Southwestern Mandarin | 2 | Need lect code | Q2609239
TCM | 3 | Traditional Chinese Medicine? Not a lect. |
Taichung & Tainan Hokkien | 1 | Do we need a lect code or two? See above under "Hsinchu & Taichung Hokkien" for Taichung Hokkien. Tainan Hokkien is mentioned in Wikipedia as being the prestige dialect of Taiwanese Hokkien but can't find it in Wikidata. |
Tainan Hokkien | 1 | (see above) |
Taiwan | 24 | Ambiguous |
Taiwanese | 2 | Ambiguous |
Taiyuan | 1 | Need lect code? Variety of Jin Chinese | Q10941068
Taoism | 1 | Not a lect |
Thailand | 2 | Ambiguous |
Urumqi | 2 | Need lect code? Variety of Lanyin Mandarin | Q10878256
Wanrong | 1 | This is a mountain indigenous township in Taiwan; I don't know what lect is being referred to, and whether it's even Chinese. Refers to Wanrong County in Shanxi; a variety of Central Plains Mandarin, mentioned in the Great Dictionary of Modern Chinese Dialects; apparently a subvariety of Fenhe Mandarin (Q10379509) |
Xi'an Mandarin | 1 | subvariety of Guanzhong Mandarin (Q3431648); not sure if it needs to be distinguished from Guanzhong | Q123700130
Xi'an | 1 | (same as above) |
Xinzhou | 3 | Need lect code? Variety of Jin Chinese, doesn't seem to have Wikidata entry |
Yinchuan | 1 | Need lect code? Variety of Lanyin Mandarin |
Yongchun Hokkien | 1 | Need lect code? | Q65118728
Yudu Hakka | 1 | Need lect code? | Q19856416

There are 14 lects among the above qualifiers with Wikidata entries that I could find, and some others apparently without Wikidata entries that might need a code. Benwing2 (talk) 03:12, 18 March 2024 (UTC)[reply]

@Benwing2 Thanks for putting this together. On Longyan Min in particular, it's likely going to be separated out as a full language as per #Additional Southern Min languages, despite Wikipedia calling it a variety of Hokkien. Theknightwho (talk) 03:27, 18 March 2024 (UTC)[reply]
@Theknightwho Ah, I see that now, thanks. Benwing2 (talk) 03:33, 18 March 2024 (UTC)[reply]
@Benwing2: Wanrong refers to Wanrong County in Shanxi; this is a variety of Mandarin (Central Plains IIRC). — justin(r)leung (t...) | c=› } 03:32, 18 March 2024 (UTC)[reply]

More etym codes for Chinese varieties, part 2

@Theknightwho, Justinrleung Only pinging the people who responded to part 1 above. Here are the uncoded Chinese varieties with labels in Module:labels/data/lang/zh. As above, some have Wikidata items and some are too unspecific or ambiguous to turn into etym-only lects. Some are also clearly full languages or even families.

Canonical label | Label aliases | Comment | Wikidata item (if any)
dialectal Cantonese | | Not specific enough |
Changzhounese | Changzhou dialect, Changzhou Wu | subvariety of Northern (Taihu) Wu | Q1021819
Chuzhou Wu | Chuzhou dialect, Lishuinese, Lishui dialect, Fujian Wu, Lishui Wu | a variety of Chu-Qu Wu, a Southern Wu language; confusable with Quzhou Wu; not in Wikidata? |
Coastal Min | coastal Min | Not specific enough |
Datian Min | | likely becoming a full language | Q19855572
dialectal Eastern Min | dialectal Min Dong | Not specific enough |
Gansu Dungan | | basis of the Soviet written standard for Dungan; not in Wikidata? |
dialectal Gan | | Not specific enough |
Guangxi Mandarin | | This is possibly the same as Guiliu (Gui-Liu) Mandarin (supervariety of Guilin Mandarin) | Q11111664
dialectal Guangxi Mandarin | | Not specific enough |
dialectal Hakka | | Not specific enough |
Hong Kong Hakka | | Mentioned in the Wikipedia w:Hakka Chinese article | Q2675834
Huzhounese | Huzhou dialect, Huzhou Wu | subvariety of Northern (Taihu) Wu | Q15901269
Inland Min | inland Min | Not specific enough |
Jianghuai Mandarin | Jiang-Huai Mandarin, Lower Yangtze Mandarin, Huai | primary branch of Mandarin | Q2128953
Jiaoliao Mandarin | Jiao-Liao Mandarin | primary branch of Mandarin | Q2597550
Jilu Mandarin | Ji-Lu Mandarin | primary branch of Mandarin? | Q516721
dialectal Jin | | Not specific enough |
Korean Classical Chinese | | Not quite sure what this is and how to classify it; one of the Module:zh-usex/data lects that was skipped |
Linshao Wu | Linshao, Linshao dialect, Lin-Shao Wu, Lin-Shao dialect, Lin-Shao | subvariety of Northern (Taihu) Wu; not in Wikidata? |
Liuzhou Mandarin | | a variety of Southwestern Mandarin | Q7224853
dialectal Mandarin | | Not specific enough |
Min | | Not specific enough |
Nanning Pinghua | | a variety of Southern Pinghua Chinese; not in Wikidata? |
North America | North American | Not specific enough |
Pinghua | | A family, not a language |
Shaoxing Wu | Shaoxingnese, Shaoxingese, Shaoxing dialect | variety of Linshao Wu, in turn a variety of Northern (Taihu) Wu | Q7489194
Shehua | | its own branch of Chinese | Q24841605
Shuangfeng | | dialect of Old Xiang | Q10911980
Siyi | | a Yue language? Includes Taishanese | Q2391679
Southern Min | Min Nan | Not specific enough |
dialectal Southern Min | dialectal Min Nan | Not specific enough |
Southern Wu | | appears to be a Wu subfamily, including at least three languages |
Standard Written Chinese | SWC | Per User:justinrleung, this refers to Standard Mandarin = Putonghua, different from Written vernacular Chinese which refers to the standard written vernacular varieties of the Qing and Ming dynasties, as opposed to Classical/Literary Chinese (NOTE: Wikipedia's Standard Written Chinese confusingly redirects to Written vernacular Chinese, and Wikipedia's article on that covers time periods from the Ming dynasty to the present, not just through the end of the 19th century) | Q727694
Sujiahu | Su-Jia-Hu Wu, Sujiahu Wu, Su-Jia-Hu | a subvariety of Northern (Taihu) Wu |
Vietnamese Classical Chinese | | Not quite sure what this is and how to classify it; one of the Module:zh-usex/data lects that was skipped |
dialectal Wu | | Not specific enough |
Wuzhou Wu | Jinhua dialect, Jinhuanese, Wuzhou, Wuzhou dialect, Jinhua Wu | one of the Southern Wu languages | Q2779891
dialectal Xiang | | Not specific enough |
Xinjiang | | subvariety of Lanyin Mandarin? Includes Urumqi Mandarin (Q10878256) |
Xinqu Wu | Quzhounese, Quzhou dialect, Shangraonese, Shangrao dialect, Xinzhou dialect, Xinzhou Wu, Quzhou Wu, Shangrao Wu | a variety of Chu-Qu Wu, a Southern Wu language | Q6112429

Benwing2 (talk) 04:32, 18 March 2024 (UTC)[reply]

@Benwing2: Huzhounese is Q15901269. Guangxi Mandarin should be approximately the same as Guiliu Mandarin, which is Q11111664. Hong Kong Hakka is Q2675834. Standard Written Chinese is usually referring to the modern standard, whereas Written Vernacular Chinese seems to refer to written vernacular Mandarin in the Yuan, Ming and Qing dynasties.
BTW, Xinzhou dialect as an alias for Xinqu Wu is problematic, since Xinzhou is ambiguous. Xinzhou Jin is a completely different variety from a different Xinzhou. — justin(r)leung (t...) | c=› } 06:19, 18 March 2024 (UTC)[reply]
@Justinrleung Thank you for finding those entries! I think we should remove all aliases that read 'Foo dialect' and consider only allowing aliases that include the language name in them. It is unfortunate that Wikipedia puts the primary entries for various Chinese lects under 'Foo dialect' instead of 'Foo Wu', 'Foo Jin', etc. for precisely the reason you mention. Even in the case of the same location mentioned, it's quite possible for a given location to have multiple dialects of different languages. Benwing2 (talk) 07:02, 18 March 2024 (UTC)[reply]
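The alias clean-up proposed here could be prototyped roughly as follows. This is only a sketch under stated assumptions: the function name `classify_aliases` and the `LANGUAGE_NAMES` tuple are hypothetical and are not taken from Module:labels/data/lang/zh, and the rule is one possible reading of the proposal ('Foo dialect' aliases are removed, aliases containing a language name are kept, anything else is left for manual review). The sample aliases are the ones listed for Xinqu Wu in the table above.

```python
# Hypothetical list of language names that make an alias unambiguous.
LANGUAGE_NAMES = ("Wu", "Jin", "Mandarin", "Gan", "Hakka", "Min",
                  "Xiang", "Yue", "Cantonese", "Hokkien")


def classify_aliases(aliases, language_names=LANGUAGE_NAMES):
    """Sort label aliases into 'keep', 'remove' and 'review' buckets."""
    result = {"keep": [], "remove": [], "review": []}
    for alias in aliases:
        words = alias.split()
        if words and words[-1] == "dialect":
            # 'Foo dialect' says nothing about which language is meant.
            result["remove"].append(alias)
        elif any(word in language_names for word in words):
            # The language name is part of the alias, e.g. 'Quzhou Wu'.
            result["keep"].append(alias)
        else:
            # E.g. 'Quzhounese': needs a human decision.
            result["review"].append(alias)
    return result


xinqu_aliases = ["Quzhounese", "Quzhou dialect", "Shangraonese",
                 "Shangrao dialect", "Xinzhou dialect", "Xinzhou Wu",
                 "Quzhou Wu", "Shangrao Wu"]
buckets = classify_aliases(xinqu_aliases)
print(buckets["remove"])  # ['Quzhou dialect', 'Shangrao dialect', 'Xinzhou dialect']
```

Forms such as "Quzhounese" end up in the review bucket on purpose, since the proposal only says to consider restricting aliases to those that include the language name.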
@Benwing2: Thanks for tabulating these.
re: removing aliases that read 'Foo dialect', there are some dialects whose affiliation is not extremely clear, e.g. Huizhou dialect (not to be confused with Huizhou Chinese which is czh) and so we labelled it as "Huicheng dialect" ("Huizhou dialect" would also work but that will certainly be confused with czh).
Often the labels are used to produce the desired display text rather than categories, which is why there is a relatively large number of |_| in {{lb|zh}}. One slightly extreme example would be 鐳#Etymology 2 sense 3, {{lb|zh|Malaysia|&|Singapore|_|Cantonese|Hakka|Southern Min|;|Xiamen|Quanzhou|Zhangzhou|_|Hokkien|;|slang|_|in|_|Hong Kong Cantonese}}, which is actually representing a large number of lects but it's not categorised properly due to the limits of {{lb}}. This is why sometimes you will find labels like {{lb|zh|Taiwan Hokkien and Hakka}} so that the desired result is achieved, even though it should actually be {{lb|zh|Taiwanese Hokkien|Taiwanese Hakka}}.
I would suggest searching for additional items in the form of {{lb|zh|Foo|_|Cantonese}} or {{lb|zh|Bar|_|Wu}}, which should unveil more unencoded dialects, some of which may already be covered in the previous section (e.g. something as mundane as {{lb|zh|Xiamen Hokkien}} isn't a recognized label so often it is inputted as {{lb|zh|Xiamen|_|Hokkien}}). (This is also why there is a relative abundance of Wu dialects in the labels data, probably the result of some dedicated user who added them.)
I'll go over the actual individual lects later. – wpi (talk) 12:55, 18 March 2024 (UTC)[reply]
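A search of the kind suggested above might look something like the sketch below. It is an illustration only: the regular expression and the function name `joined_labels` are assumptions, it only handles simple {{lb|zh|...}} calls without nested templates, and it is not the script anyone here actually ran. It collects the label sequences that are glued together with `_`.

```python
import re

# Matches a simple {{lb|zh|...}} call with no nested templates inside it.
LB_RE = re.compile(r"\{\{lb\|zh\|([^{}]*)\}\}")


def joined_labels(wikitext):
    """Return the label sequences glued together with '_' in {{lb|zh|...}}."""
    results = []
    for match in LB_RE.finditer(wikitext):
        params = [p.strip() for p in match.group(1).split("|")]
        group = []      # labels glued together so far
        glue = False    # True right after a '_' parameter
        for param in params:
            if param == "_":
                glue = True
                continue
            if glue and group:
                group.append(param)
            else:
                if len(group) > 1:
                    results.append(" ".join(group))
                group = [param]
            glue = False
        if len(group) > 1:
            results.append(" ".join(group))
    return results


print(joined_labels("{{lb|zh|Xiamen|_|Hokkien}}"))  # ['Xiamen Hokkien']
```

Run on the 鐳 example quoted above, this yields "Singapore Cantonese", "Zhangzhou Hokkien" and "slang in Hong Kong Cantonese", which shows the same limitation: Malaysia, Xiamen and Quanzhou are not attached to any lect name, so the output would still need manual review.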
Personally I prefer to assign full language codes to a group, while the representative dialect(s) spoken in a specific place will have an etym-only code.
  • Australia, Malaysia, Singapore, Thailand etc.: these may need a code for each lect (as appropriate), e.g. Malaysian Cantonese, Thailand Teochew (Malaysia may need to be further subdivided by location, we already have Penang Hokkien) [see also my previous comment]
  • Guangdong: usually means Cantonese+Teochew+maybe Taishanese+maybe Leizhou+maybe Hainan, this should be replaced accordingly
  • Hong Kong, Macau: usually refers to the standard form of Chinese (not necessarily Cantonese, but often somewhat influenced by Cantonese) spoken in HK/Macau respectively [zh-HK and zh-MO?]
  • Taiwan: similar to above [zh-TW?]
  • Hsinchu & Taichung Hokkien: there may be some need to create code for the Taiwanese Hokkien dialects, but I'll defer to others for this (but IIRC Hsinchu is predominantly Hakka speaking?)
  • Mandalay Taishanese: might need a code but probably won't be used much
  • Shehua: a branch parallel to Neo-Hakka (which we call Hakka/which is the only part of "Hakka" that we have coverage of), "She" is likely the more common academic term (but this clashes with She the Hmong-Mien language, both names share the same etymology). [zhx-she?]
    • (the ancestor of Neo-Hakka and She is parallel to Paleo-Hakka, but this is another rabbit hole, plus coverage of it is relatively poor)
  • Anxi Hokkien, Yongchun Hokkien, Muping Mandarin, Wanrong: seems relatively minor to be assigned a code? I'm not certain however.
Some comments (partly based on my observation of the usage in {{lb|zh}} and also based on our[my] plans to increase coverage of dialects), grouped by branch:
  • Gan: label-wise we usually have Nanchang [gan-nan?], Lichuan [gan-lic?], Pingxiang [gan-pin?], Taining [gan-tai?], Yongxiu [gan-yon?]. These are all locations rather than subgroups (my understanding is that the subgrouping of Gan is quite undeveloped). It's worth noting that our Gan coverage is extremely lacking (due to both lack of data and lack of motivated editors), and most likely we will only have these locations in the foreseeable future.
  • Hakka: Sixian may need to be divided into North Sixian/South Sixian. We might also want to add the rest of the Taiwanese Hakka dialects. Coverage of Yudu Hakka [hak-yud?] and Hong Kong Hakka [hak-HK?] seems OK.
  • Huizhou: this group is too small to have any meaningful subdivision, I think at most we can assign a code to Jixi [czh-jix?].
  • Jin: I think we could have Taiyuan [cjy-tai?] and Xinzhou [cjy-xin?]. The other dialects have poorer coverage. (I didn't find any usage of Xinzhou Wu)
  • Wu: besides the mentioned ones, we may also need Danyang Wu? I'll defer to ND381 and Musetta6729.
  • Eastern Min: representative dialect is Fuzhou [cdo-fuz?], other possible inclusion would be Fuqing [cdo-fuq?] and maybe Ningde [cdo-nin?]. The rest seems too sporadic.
  • Xiang: Changsha [hsn-cha?], Shuangfeng [hsn-shu?], Loudi [hsn-lou], Hengyang [hsn-hya] are major dialects. The coverage situation is similar to Gan.
  • Mandarin: the ones mentioned should be added generally.
  • Pinghua: Southern Pinghua [csp] is usually considered to be part of Yue. Worth noting Nanning Pinghua and Nanning Cantonese are different though.
  • Cantonese/Yue: I think we should add Siyi Yue [yue-siy?/zhx-siy?] and demote Taishanese [zhx-tai] to a variety of it. The usage of [yue] to refer to Cantonese or Yue is pending discussion. Other ones that could be added include Yangjiang [yue-yan?/zhx-yan?] and Dongguan [yue-don?], while the rest seems to have relatively poor coverage.
  • Southern Min is already dealt with elsewhere
  • Puxian Min: I believe this can have Putian [cpx-put?] and Xianyou [cpx-xia?]?
wpi (talk) 16:37, 18 March 2024 (UTC)[reply]
@Wpi Thank you for all the details! I just realized there is a third source of varieties here at Wiktionary, which is the dialectal data found in the data modules for {{zh-dial}}, specifically Module:zh/data/dial. For example, under 討食讨食 you have a whole set of "dialectal synonyms of 要飯要饭 (yàofàn, to beg for food)" in addition to the Thesaurus entries for 乞討乞讨 (qǐtǎo) fetched using {{syn-saurus}}. Ultimately IMO we should probably merge the dialectal data in the {{zh-dial}} modules with the Thesaurus entries, but that is another can of worms. For now I'll just note that the {{zh-dial}} data conveniently comes with links to English or Chinese Wikipedia entries so it should be easy to find the relevant Wikidata items. *HOWEVER*, there are an absolute ton of varieties listed; I count 1,122 of them currently. (Of these, 969 have Wikipedia links, but many of these links are to geographic entries rather than dialectal entries.) I doubt all of these varieties need to be assigned etym-only codes. I think one way to pare them down is to go through the dialectal data and count how many synonyms there are for each variety. This should reveal which varieties are important enough to warrant codes (I imagine a lot of the varieties listed have no synonyms at all in the data). Benwing2 (talk) 22:32, 18 March 2024 (UTC)[reply]
Please see User:Benwing2/zh-dialect-counts. This table lists all the varieties/dialects found among the dialectal synonym data along with counts, the Chinese dialect group they're in and the Wikipedia link, if any. (There are 2,787 terms currently listed in the data.) I'm thinking we can start with the first 100 or 200 varieties listed, figure out what to do with them, and go from there. Also, the script I wrote to combine the counts with the variety data in Module:zh/data/dial output the following warnings concerning varieties for which there are synonyms but which aren't in Module:zh/data/dial:
WARNING: Found variety 'Luoyang' not in variety data
WARNING: Found variety 'Zhumadian' not in variety data
WARNING: Found variety 'Pingdingshan' not in variety data
WARNING: Found variety 'Zhoukou' not in variety data
WARNING: Found variety 'Xuchang' not in variety data
WARNING: Found variety 'Nanyang' not in variety data
WARNING: Found variety 'Luohe' not in variety data
Benwing2 (talk) 23:24, 18 March 2024 (UTC)[reply]
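For illustration, the counting step described above could be sketched along these lines. This is not the script that was actually used: the data shapes are assumptions (a mapping from each term to a mapping of variety name to its list of synonyms, and a mapping of known variety names standing in for Module:zh/data/dial), and `count_varieties` is a hypothetical name.

```python
from collections import Counter


def count_varieties(synonym_data, variety_data):
    """Count synonyms per variety and flag varieties missing from variety_data.

    synonym_data: {term: {variety_name: [synonym, ...]}}   (assumed shape)
    variety_data: {variety_name: metadata}                 (assumed shape)
    """
    counts = Counter()
    for by_variety in synonym_data.values():
        for variety, synonyms in by_variety.items():
            if synonyms:  # ignore varieties with no synonyms for this term
                counts[variety] += len(synonyms)
    warnings = [
        f"WARNING: Found variety '{variety}' not in variety data"
        for variety in counts
        if variety not in variety_data
    ]
    # Highest counts first; ties broken alphabetically.
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return rows, warnings


synonym_data = {
    "要飯": {"Beijing": ["要飯"], "Luoyang": ["要飯", "討飯"]},
    "乞丐": {"Beijing": ["叫花子"], "Taipei": []},
}
variety_data = {"Beijing": {}, "Taipei": {}}
rows, warnings = count_varieties(synonym_data, variety_data)
print(rows)      # [('Beijing', 2), ('Luoyang', 2)]
print(warnings)  # ["WARNING: Found variety 'Luoyang' not in variety data"]
```

Sorting by count makes it straightforward to look only at the first 100 or 200 varieties, and varieties with no synonyms at all never appear in the output.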
@Wpi In response to some of your comments:
  1. As for 'Foo dialect' issues, I think in cases like 'Huicheng dialect' where the affiliation isn't clear, we should just identify them as 'Huicheng Chinese'. It's true that we usually do that for top-level groups but I think it's better in this case than using "dialect".
  2. I will search for labels specified using _ and such. Hopefully the usage isn't too inconsistent.
  3. Concerning your statement "I prefer to assign full language codes to a group, while the representative dialect(s) spoken in a specific place will have an etym-only code", what is the alternative you are responding to? Is it further full-language splits (e.g. with Southern Min)?
  4. For zh-HK, zh-MO, you say "standard language". If this is Cantonese, maybe we should use yue-HK, yue-MO?
  5. For the specific lect comments, I don't know enough to respond but it all looks reasonable. User:Theknightwho, what do you think of the proposal to demote Taishanese to a variety of Siyi Yue?
Benwing2 (talk) 05:25, 19 March 2024 (UTC)[reply]
In re point #2, see User:Benwing2/zh-label-sets. Benwing2 (talk) 06:41, 19 March 2024 (UTC)[reply]
OK, only a few uses of labels involving 'Foo dialect', and only one involving a label actually listed in Module:labels/data/lang/zh, which was 𠀫𠀪 (which, BTW, is being RFV'd) using 'Hangzhou dialect':
  28 Huicheng dialect
   4 eye dialect
   3 ancient Chu dialect
   1 title=zh:Grammaire du dialect
   1 southern dialect
   1 some Mandarin with a Southern Chinese dialect
   1 of one's speech of the local dialect
   1 ancient Qi or Wu dialect
   1 ancient Qi dialect
   1 [[w:Luoyang dialect
   1 Sòng-Lǔ dialect
   1 Sichuan dialect
   1 Shaanxi dialect
   1 Northeastern dialect
   1 Ningyuan dialect
   1 Hangzhou dialect
I changed that one usage to 'Hangzhounese' and deleted all the 'Foo dialect' labels. We might want to add something for the 'Huicheng dialect' labels (cf. your mention above of this). Benwing2 (talk) 08:10, 19 March 2024 (UTC)[reply]
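A tally like the one above could be produced along these lines. This is a hedged Python sketch, not the method actually used: the regular expression only handles simple, un-nested {{lb|zh|...}} calls, and the function name and sample wikitext are hypothetical.

```python
import re
from collections import Counter

# Matches {{lb|zh|...}} and captures the pipe-separated labels. Simplified:
# real wikitext may nest templates or use named parameters.
LB_ZH = re.compile(r"\{\{lb\|zh\|([^{}]*)\}\}")

def count_dialect_labels(wikitexts):
    """Count every label containing the word 'dialect' across the given texts."""
    counts = Counter()
    for text in wikitexts:
        for match in LB_ZH.finditer(text):
            for label in match.group(1).split("|"):
                label = label.strip()
                if "dialect" in label:
                    counts[label] += 1
    return counts

# Invented sample entries, for illustration only.
sample = [
    "{{lb|zh|Huicheng dialect}} ...",
    "{{lb|zh|Huicheng dialect|colloquial}} ...",
    "{{lb|zh|Hangzhou dialect}} ...",
]
dialect_counts = count_dialect_labels(sample)
for label, n in dialect_counts.most_common():
    print(f"{n:4d} {label}")
```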
@Benwing2:
re #3, I'm referring to when we are assigning the codes, i.e. groups like Siyi will have a full code whereas local dialect points like Taishanese will have etym-only codes.
re #4, it's basically Standard Written Chinese as used in Hong Kong/Macau. It should be "written/used" not "spoken" as I previously mentioned. There's a difference between yue-HK (Hong Kong Cantonese) and zh-HK (Hong Kong), it's a bit like Norwegian Nynorsk vs Norwegian Bokmål.
Also pinging @Justinrleung for comments to specific lects.– wpi (talk) 11:31, 19 March 2024 (UTC)[reply]
@Wpi OK thanks. As for #3, I agree with your idea of the separation between full and etym-only languages going along group lines. As for #4, didn't realize there is this difference but it makes sense. Benwing2 (talk) 15:04, 19 March 2024 (UTC)[reply]
Thoughts on Wu codes (locality codes are just suggestions):
  • Northern Wu subbranches imo don't really need codes but individual localities would be beneficial. Of which:
Changzhounese wuu-chz
Danyangese wuu-dan
Shaoxingese wuu-shx
are in need of codes (due to relative abundance of data, and will also be gaining zh-pron support soon). Some others to consider may include
Cixinese wuu-cix
Huzhounese wuu-huz
and all the other lects currently in Module:wuu-pron/sandbox. We are currently still working on it so it may be worth delaying the addition of these lect codes until we finish the Northern Wu overhaul.
  • Currently extant Northern Wu localities (Hangzhounese, Ningbonese, Shadi Wu, Shanghainese, Suzhounese) should all be listed under Northern Wu (wuu-nor) in the family tree on (and any other system that may handle language families).
  • Southern Wu wise, I believe these would be helpful to have in the future, as we will be adding pages/making modules for them as soon as possible:
Jinhuanese / Wuzhou Wu wuu-jih
Taizhounese / Taizhou Wu wuu-tai
Lishunese / Chuzhou Wu wuu-lis
Shangraonese / Xinzhou Wu wuu-shr
in descending order of importance. I decided to split "Chuqu Wu" as is described on the chart as there is no clear consensus as to how the non-coastal non-Northern Wu bits should be split, but in general these three areas (Wuzhou, Chuzhou, Xinzhou) can be seen reflected in some way.
  • A Southern Wu code (wuu-sou) should not be made. It is likely not a familial grouping but rather just a term to use to contrast it with Northern Wu. There have been some preliminary studies that investigate whether it does form a coherent family, but results are mixed and sample sizes are small.
Regarding why there are so many Northern Wu localities, yes, muset & I added them, as unlike Hokkien for instance, the sociolinguistic attitude towards these lects is first and foremost the locality rather than the family (which contrasts with the "Hokkien" identity).
@Musetta6729 - only other active Wu editor: let us know if you have any other/conflicting ideas — nd381 (talk) 19:38, 19 March 2024 (UTC)[reply]
@ND381 Thank you! I will probably take all your suggestions. Benwing2 (talk) 20:26, 19 March 2024 (UTC)[reply]
Just only got the chance to look at this thread now - in terms of Wu I definitely agree with everything that ND has said so far, just two things I would like to mention:
First: Having Urban Shanghainese as a variety (maybe under something like wuu-ush) along with simply "Shanghainese" (wuu-sha) might be useful. This is due to a variety of reasons, but mainly that Contemporary "Urban" Shanghainese has showcased more convergent evolution with say, Ningbonese or Suzhounese during the last century, and has become more sociolinguistically and identity-wise distinct from many Non-Urban varieties surrounding it. With only the label "Shanghainese" now it is tricky to disambiguate between categories such as:
  • Primarily urban inventions not used in non-urban varieties, or that have spread out to non-urban regions as still recognisably "urbanite" speech
  • Common invention/retention in Non-Urban Shanghai varieties that are rare/obsolete/not used in Urban Shanghainese
  • Inventions in Non-Urban Shanghainese that are not geographically restricted to one specific region of Shanghai
  • Usage attested in both 1850s City-Center Shanghainese and contemporary Non-Urban, but not Contemporary Urban Shanghainese
Especially because all of this variance is also deeply interconnected with notions of locality, of new and old, of class, ethnicity and other sociolinguistic variables when looked at from an Urban Shanghainese standpoint. All of this has led to the use of ad hoc labels along with the Shanghainese tag like "old-period", "chiefly non-urban/suburban", "rare or obsolete" etc which is definitely not ideal. By having Urban Shanghainese as a variety I expect that this would be easier to manage - and as we go on to add more coverage on Non-Urban Shanghainese varieties we should hopefully be able to have more specific variety codes for lots of the Non-urban Shanghainese varieties too.
The second thing is a bit more minor - Suhujia (蘇滬嘉 - see linked Chinese Wikipedia article) might be a more commonly used term than Sujiahu (蘇嘉滬), which we seem to have now. The grouping seems to be somewhat areal and vaguely defined to me and I am doubtful of the extent to which having it might be useful, but nevertheless it's a fairly widely accepted grouping so thought I would bring this up in case we end up making the decision to add it. Musetta6729 (talk) 04:38, 24 March 2024 (UTC)[reply]

Redid Chinese labels

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho I redid the label structure in Module:labels/data/lang/zh. I added missing labels corresponding to the new lects in Module:etymology languages/data, canonicalized the labels to include the group name (e.g. Xiamen Hokkien instead of just Xiamen), and added shorter aliases. Duplication is avoided in something like {{lb|zh|Xiamen Hokkien|Quanzhou Hokkien|and|Zhangzhou Hokkien}} (or equivalently, {{lb|zh|Xiamen|Quanzhou|and|Zhangzhou}}) by a new Chinese-specific label postprocessing function in Module:labels/data/lang/zh/functions, which attempts to remove duplicate group names as well as duplicate occurrences of "Taiwanese" in {{lb|zh|Taiwanese Hokkien|and|Taiwanese Hakka}} or similar. Please let me know if you don't like the output in specific situations and I will tweak the function. Note that I removed the label Taiwanese Hokkien and Hakka and all its aliases, after converting all occurrences to use multiple labels like {{lb|zh|Taiwanese Hokkien|and|Taiwanese Hakka}} or similar. I also changed a few categories to better reflect the lect name, e.g. the label Philippine Hokkien now categorizes into Category:Philippine Hokkien instead of Category:Philippine Chinese. Benwing2 (talk) 00:50, 20 March 2024 (UTC)[reply]

@Benwing2: Thanks for setting this up. The function looks like it works well generally, but there are some cases where it might lead to confusion, such as {{lb|zh|Taiwanese Hokkien|Taiwanese Hakka}} showing up as "Taiwanese Hokkien, Hakka", which could mean the unintended "Hakka (in general) and Taiwanese Hokkien". Perhaps one way to prevent this is to only remove duplicate group names when there is an "and" somewhere in the chain? Is that something that could be done? — justin(r)leung (t...) | c=› } 06:56, 20 March 2024 (UTC)[reply]
@Justinrleung Yup, I can do that, thanks for the suggestion. Benwing2 (talk) 17:08, 20 March 2024 (UTC)[reply]
@Justinrleung This should be done. Let me know if you see anything else needing fixing. Benwing2 (talk) 03:25, 22 March 2024 (UTC)[reply]
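For readers following along, the behaviour agreed on in this thread (only collapse a repeated group name when the chain contains an explicit "and") might look roughly like the Python sketch below. It is not the code in Module:labels/data/lang/zh/functions; the function names and the `GROUPS` tuple are hypothetical, and the sketch handles only a shared trailing group name, not the repeated "Taiwanese" prefix case.

```python
GROUPS = ("Hokkien", "Hakka", "Cantonese")  # hypothetical list of group names

def split_group(label):
    """Return (place, group) if the label ends in a known group name, else None."""
    for group in GROUPS:
        suffix = " " + group
        if label.endswith(suffix):
            return label[: -len(suffix)], group
    return None

def dedupe_group_names(labels):
    """Drop a shared trailing group name from all but the last label,
    but only when the chain contains an explicit "and"."""
    if "and" not in labels:
        return list(labels)
    real = [label for label in labels if label != "and"]
    parts = [split_group(label) for label in real]
    if len(real) < 2 or any(part is None for part in parts):
        return list(labels)
    if len({group for _place, group in parts}) != 1:
        return list(labels)
    last_index = max(i for i, label in enumerate(labels) if label != "and")
    result = []
    for i, label in enumerate(labels):
        if label == "and" or i == last_index:
            result.append(label)
        else:
            result.append(split_group(label)[0])
    return result

print(dedupe_group_names(["Xiamen Hokkien", "Quanzhou Hokkien", "and", "Zhangzhou Hokkien"]))
print(dedupe_group_names(["Taiwanese Hokkien", "Taiwanese Hakka"]))
```

Without an "and" the chain is returned unchanged, which avoids the ambiguous "Taiwanese Hokkien, Hakka" rendering raised above.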

I can't find anyone reconstructing *kwh₂et-. Most sources seem to have trouble deriving the Slavic, Latin, and Armenian words from the same root. They probably don't belong here, but I don't know enough about these languages to decide. —Caoimhin ceallach (talk) 17:02, 23 March 2024 (UTC)[reply]

Old Armenian քացախ (kʻacʻax) is certainly not an inherited term, I will remove it. Vahag (talk) 17:23, 23 March 2024 (UTC)[reply]

Do we want this (perhaps under a better name), in which case we'd be splitting Category:English rebracketings, or do we want to merge it back into Category:English rebracketings? There are only 39 entries in the rebracketings category, mostly this type, so if we split by rebracketing type, it seems like we'd end up with maybe two dozen entries in this /n/ category, and then a handful of mistaken additions or removals of /l-/, /t-/, or /-s/ to put into their own categories (or leave in the main category). - -sche (discuss) 00:48, 25 March 2024 (UTC)[reply]

Seems like this was handled by emptying the cat. RFM-deleted This, that and the other (talk) 09:21, 23 April 2024 (UTC)[reply]
Yeah; for clarity/transparency, a single entry had been placed into this category and I removed it a while ago and forgot to update this discussion. Thanks! - -sche (discuss) 20:38, 23 April 2024 (UTC)[reply]

Ramifying/filling out Yue Chinese

(Notifying Atitarev, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏): Apologies once again for the wide ping, as I haven't received any responses to some of my other pings. I added a bunch of labels for Yue Chinese lects, but it is revealing some issues:

  1. We correctly classify Yue as a family, but it contains only two languages (Cantonese language and Taishanese language). Meanwhile per Wikipedia and Glottolog there are something like seven primary branches:
    1. Yuehai Yue, which is more or less Cantonese proper.
    2. Siyi Yue, which includes Taishanese.
    3. Goulou Yue, most notably including Yulin dialect and its sublect Bobai dialect.
    4. Yongxun Yue, with Nanning Yue as the representative dialect.
    5. Gaoyang Yue, most notably including Yangjiang Yue.
    6. Wuhua Yue.
    7. Qinlian Yue, partly intelligible with standard Cantonese.
  2. We are using the code yue for Cantonese proper and zhx-yue for the Yue family, which is inconvenient and contrary to ISO 639-3 usage.

I propose:

  1. Change to using yue for the family and use some more specific code for Cantonese, either yue-can or yue-yue (for Yuehai Yue).
  2. Create L2 languages for each of the above seven groups. We can reuse the "Cantonese language" for Yuehai Yue. This shouldn't entail any real splitting per se as we already have Yue as a family rather than a language.
  3. Demote Taishanese to an etym-only variety of Siyi Yue and assign it a code yue-tai in place of zhx-tai.

Please also note, in the labels I created, the canonical name for each label has "Cantonese" in it for all sublects of Yuehai Yue but "Yue" for Yuehai Yue itself and for all other lects. Almost everything called "Foo Cantonese" (except for variants of standard Cantonese) has an alias "Foo Yue", but not the other way around. For example, the Dongguan dialect is called "Dongguan Cantonese" because it is a variety of Yuehai Yue, and has "Dongguan Yue" as an alias; but the Yulin dialect is called "Yulin Yue" and does NOT have "Yulin Cantonese" as an alias, since it is a variety of Goulou Yue rather than Yuehai Yue. Benwing2 (talk) 22:17, 28 March 2024 (UTC)[reply]
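The naming rule in the last paragraph can be restated as a small Python sketch. The function and its parameters are hypothetical and purely illustrative; the actual label data lives in the label modules and is not generated this way as far as this discussion states.

```python
YUEHAI = "Yuehai Yue"

def label_names(place, branch, standard_variant=False):
    """Return (canonical_name, aliases) following the naming rule described above.

    place: the locality, e.g. "Dongguan".
    branch: the primary Yue branch the lect belongs to, e.g. "Goulou Yue".
    standard_variant: True for variants of standard Cantonese, which get
        no "Foo Yue" alias (assumption: this is what the exception means).
    """
    if branch == YUEHAI:
        canonical = f"{place} Cantonese"
        aliases = [] if standard_variant else [f"{place} Yue"]
    else:
        canonical = f"{place} Yue"
        aliases = []  # deliberately no "Foo Cantonese" alias outside Yuehai
    return canonical, aliases

print(label_names("Dongguan", "Yuehai Yue"))
print(label_names("Yulin", "Goulou Yue"))
```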

Thanks for the ping. Here are some of my questions, to make sure I understand this better:
  1. What would the categories of a normal entry like 不嬲 look like? I'm asking this because "Cantonese" and "Taishanese" are more recognisable than "Yuehai Yue" and "Siyi Yue" and I'm wondering if these more obscure names would end up in the entry. If this works like the other Chinese splits, I suppose the categories would not change, and just the categories of the categories would change?
  2. We have plans (maybe) to include more Yue languages than just Cantonese and Taishanese, which primarily means expanding the scope of the "pronunciation" section of the entries, and this would also generate more categories. Would your proposal benefit this project because we could more easily categorise the new Yue languages to come?
  3. While normal entries written using Chinese characters have the "Chinese" L2 header, romanisations have their respective header per language, such as xiànglái having the Mandarin L2 header and boán-liân having the Hokkien L2 header. We don't seem to do the same for Cantonese, and the pronunciation sections also don't link to the Cantonese romanisations, and I also can't seem to find any Cantonese L2 header. This might have been decided in an earlier policy that I don't know about, so I guess my question is, would it create problems if you demote Taishanese to an etym-only language?
  4. Per your last point I tried to google "Yulin Yue" but the main results are about someone named Yulin Yue, so I tried to google "Yulin Yue" + language and got 235 hits, while "Yulin Cantonese" got me 73 hits (and "Yulin Cantonese" + language got me only 8 hits). This isn't a question per se, just a comment about how little-known other Yue languages are.
  5. I feel like I just have to insert a comment about the choice of Mandarin exonyms vs. Cantonese exonyms vs. endonyms. I think the first option is generally how we do things (except for the names of the main branches), and I suppose this is just the result of the general scholarship, and I'm not really trying to subvert this practice, but I would just like to raise some awareness to this phenomenon.
The above. Apologies if 1999. --kc_kennylau (talk) 23:01, 28 March 2024 (UTC)[reply]
@Kc kennylau Thanks much for the detailed questions! In response to your questions, let me see if I can answer:
  1. There are two types of categories: (1) L2 language categories (e.g. Category:Mandarin lemmas); (2) etym-language categories (e.g. Category:Xi'an Mandarin). Under my proposal, we would probably use "Cantonese" in place of "Yuehai Yue" as the L2 language name, since they seem more or less equivalent; but "Siyi Yue" would be the L2 language subsuming Taishanese. This means that a Taishanese term would be categorized both under Category:Siyi Yue lemmas and Category:Taishanese Yue (or maybe just Category:Taishanese; there is some flexibility in the choice of etym-language categories). So essentially, things like Category:Taishanese lemmas would go away in favor of Category:Siyi Yue lemmas + Category:Taishanese Yue, but Category:Cantonese lemmas would remain (possibly with additional more specific categories like Category:Guangzhou Cantonese or Category:Hong Kong Cantonese, both of which already exist).
  2. This proposal is somewhat orthogonal to how we handle the pronunciation section entries; the ones for Cantonese and Taishanese can remain as-is, but might categorize differently (as explained above).
  3. If there were romanizations under a Taishanese header, they would have to be renamed to have Siyi Yue as the header and a label Taishanese attached, to make it clear that the romanizations are specifically Taishanese. (Similarly, entries like boán-liân used to be under a Min Nan header before Hokkien got split out as an L2 language.) But since we don't seem to have any such romanizations, this issue won't arise (at least for now).
  4. As for the obscurity of Yue varieties other than Cantonese and Taishanese, I completely agree. The terminology isn't well-worked out and the term "Cantonese" is particularly problematic since it variously refers specifically to (a) the speech of Guangzhou specifically; (b) the more general Yuehai Yue language that Guangzhou speech is part of [which is what I'm defining it as]; and (c) the entire Yue family. This issue doesn't seem to come up so much for other groups like Mandarin and Wu.
  5. As for Mandarin vs. Cantonese/Yue naming, I am not wedded to using the Mandarin terms; I just chose them because that is what Glottolog and Wikipedia largely use. If the consensus is to use Cantonese-language terms for all lects or to use native terms (endonyms), we can do that as well. I am guessing the Mandarin terms see more usage just out of a sort of default familiarity (pretty much everyone who works with Chinese languages is familiar with Mandarin but many aren't familiar with Cantonese or other varieties, and several Yue varieties don't even have standard romanization schemes). Benwing2 (talk) 23:50, 28 March 2024 (UTC)[reply]
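To make point 1 concrete, here is a minimal Python sketch of the categorization it describes. The mapping `ETYM_ONLY` and the function `lemma_categories` are hypothetical names for illustration; the category names are the ones floated in the proposal and may change.

```python
# Hypothetical mapping from an etym-only lect to
# (parent L2 language, etym-language category).
ETYM_ONLY = {
    "Taishanese": ("Siyi Yue", "Category:Taishanese Yue"),
}

def lemma_categories(lect):
    """Return the lemma categories a term in `lect` would receive under the proposal."""
    if lect in ETYM_ONLY:
        parent, etym_category = ETYM_ONLY[lect]
        return [f"Category:{parent} lemmas", etym_category]
    # Full (L2) languages keep their own lemma category.
    return [f"Category:{lect} lemmas"]

print(lemma_categories("Taishanese"))
print(lemma_categories("Cantonese"))
```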
I support the move in general (with a strong preference of using yue-can), however here's a couple of problems I can foresee with this proposal:
  1. Goulou actually forms a dialect continuum with Southern Pinghua language, and therefore nowadays [csp] is usually thought of as part of Yue, but weirdly it has a separate language code. Should [csp] be included as well?
  2. Yongxun is a (quite recent) descendant of Cantonese spoken in the major towns and cities in the Pearl River with minor influences from the substrate Goulou varieties. Personally I don't think it should be a separate branch.
  3. As I mentioned before, there are (at least) two distinct varieties of Yue spoken in Nanning, we currently call them Nanning Cantonese (under Yongxun) and Nanning Pinghua (under Goulou-Southern Ping). How can the two be distinguished if it is renamed to "Nanning Yue"?
wpi (talk) 04:19, 29 March 2024 (UTC)[reply]
@Wpi Thanks very much for responding. In response to your issues:
  1. I don't know enough about Pinghua to answer, but I note that Wikipedia's Pinghua article asserts that Pinghua has been treated as its own dialect group, separate from Yue, in most textbooks and surveys written since the 1980's. As for dialect continuums, there are many places where different branches form dialect continuums with each other but are still separated. (As an example, Western Bulgarian forms a dialect continuum with Torlakian, which in turn forms a dialect continuum with (other varieties of) Serbo-Croatian. Serbo-Croatian is considered a Western South Slavic language and Bulgarian an Eastern South Slavic language; despite what the Wikipedia article on Torlakian says, it's more often considered part of Serbo-Croatian than Bulgarian.) Maybe User:Justinrleung or User:沈澄心 can comment? There's an additional issue that if we group Southern Pinghua with Yue, what do we do with Northern Pinghua?
  2. Likewise I don't know enough about Yongxun Yue to have a firm opinion; in any case it seems like we won't have any lemmas in it, so whether we make it its own L2 or group it with some other L2 (which one? Cantonese or Goulou?) wouldn't make much difference.
  3. I think this is only an issue if (1) we leave Yongxun as its own group and (2) we put Southern Pinghua under Yue. If Yongxun is e.g. grouped with Cantonese and Pinghua left as-is, the current names are fine. If both dialects get considered non-Cantonese Yue, then one solution is to clarify them as 'Nanning Yongxun Yue' and 'Nanning Pinghua Yue' or something.
Benwing2 (talk) 04:55, 29 March 2024 (UTC)[reply]

Recategorize terms with "uncertain" etymologies outside of Category:Terms with unknown etymologies by language

Terms that use {{uncertain}} should have their own category, separate from the terms that use {{unknown}}, as they are on separate levels. There should be a new category like Category:Terms with uncertain etymologies. I'd also personally prefer the term "unclear" over "uncertain," but that's a separate issue. AG202 (talk) 04:45, 31 March 2024 (UTC)[reply]

@AG202 What about renaming Category:Terms with unknown etymologies by language to Category:Terms with unknown or uncertain etymologies by language (although I suppose this somewhat nullifies the point of having two templates)? In practice, people are not maintaining the distinction between "uncertain" and "unknown" but use both terms fairly promiscuously. Benwing2 (talk) 05:35, 31 March 2024 (UTC)[reply]
The distinction is too vague to warrant two separate categories. -- Sokkjō 03:15, 4 April 2024 (UTC)[reply]
I agree there is not enough distinction. I think the distinction some people hope for is "unknown means no-one has any ideas, uncertain means people have ideas" (?), but I'm sure I've seen even other dictionaries use "Uncertain." as the complete etymology for a word they have no ideas about, and conversely I've seen things like "Unknown. Theories include..."; there is no logical or maintainable distinction; if you're not certain what the etymology is, you don't know (with certainty) what it is (you just hypothesize), and conversely if it's unknown you're not certain what it is. I would not object to renaming the category as Benwing proposes, but I would also not object to just merging "unknown" into "uncertain" (or vice versa). - -sche (discuss) 15:55, 4 April 2024 (UTC)[reply]
The argument is fallacious because editors regularly do not have precious knowledge about existence and extent of previous attempts, so template application is quite a guess and theology. Given that the different categorization invites wasteful concerns of editors (adding to the learning curve load), I do not only not see the utility of it but also reckon it harmful, and am also sure that Metaknowledge would position himself likewise, as confronted by my argument about underspecified species names vs. uncertain meaning words on Talk:بركة. If you go from unknownness to uncertainty you can also visit underspecification and other more “science-theoretical” details that can only be left to philosophy papers nobody will actually want to write. Fay Freak (talk) 16:27, 4 April 2024 (UTC)[reply]

@Mellohi!, I think I have to disagree with your recent move, from RC:Proto-Germanic/sōwulą to RC:Proto-Germanic/sō(e)l, and subsequent page blanking of RC:Proto-Germanic/sōwulō. Really such a thing should have been discussed first. Though I am aware of Kroonen's reconstruction of *sōel,[1] reconstructing *sōwVl-, however, has long and wide support,[2][3][4][5][6] with *sōwVl- > *sōl- being a later development.[7] @Leasnam, Mahagaja, Mnemosientje -- Sokkjō 00:46, 4 April 2024 (UTC)[reply]

I do not see where Ringe claimed *sōl is a later development. Ringe just lists a bunch of intermediate steps to "?PGmc *sōl". I believe he put a question mark because West Germanic evidence for the l-stem form (for which Old Norse influence can be ruled out) is virtually non-existent. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 02:17, 4 April 2024 (UTC)[reply]
Ringe lists *sōwul before *sō̄l in the development chain, which could be considered Proto-Germanic, and Gothic 𐍃𐌰𐌿𐌹𐌻 (sauil) cannot derive from *sōl, so it must come from an earlier form. -- Sokkjō 02:56, 4 April 2024 (UTC)[reply]
The Gothic second vowel came from the *-e- of the oblique stem *sh₂wén- being inserted into the nominative singular before the oblique stem itself was remodeled to later lead to the spinoff *sunnǭ (Ringe says this on page 277). — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 03:19, 4 April 2024 (UTC)[reply]
Ringe does indeed suggest that, but it would be leveled from *sōwul ~ *sawen-. I would reconstruct Proto-Germanic *sōwul ~ *sawiniz, with Gothic from leveled *sawil- (or *sōwil-). Proto-Germanic *sunnǭ is a different can of worms altogether. I'd like to hear from the people I pinged. -- Sokkjō 03:26, 4 April 2024 (UTC)[reply]
Could we compromise on *sō(w)(e)l? I do think it makes sense to give the athematic form as the main entry. —Mahāgaja · talk 07:54, 4 April 2024 (UTC)[reply]
@Mahagaja: That's honestly even worse. Perhaps a compromise would be moving it to *sōl, with a paradigm of *sōl ~ *sawiniz, and the etymology stating it from earlier *sōwul ~ *sawiniz, from PIE *sóh₂wl̥ ~ *sh₂wéns. -- Sokkjō 08:17, 4 April 2024 (UTC)[reply]
Having just reminded myself of the existence of 𐍃𐌰𐌿𐌹𐌻 (sauil), which is the oldest attested form and almost certainly athematic (not a neuter a-stem as the entry says), I don't like the idea of naming the entry something that won't yield it, and *sōl won't, but *sō(w)el will. —Mahāgaja · talk 08:37, 4 April 2024 (UTC)[reply]
@Mahagaja: As mentioned above, the Gothic could be leveled from the genitive, per Ringe, similar to how *fōr ~ *fuiniz became PWG *fuir. I also don't see how Proto-Germanic *sōwel could yield the ON and WG forms. -- Sokkjō 09:02, 4 April 2024 (UTC)[reply]
Then we arrive back at the current reconstruction of *sō(e)l. Gothic is from *sōel and Northwest Germanic is from *sōl (which could be contracted from either *sōel or *sōul). But considering the Gothic genitive of 𐍃𐌰𐌿𐌹𐌻 (sauil) was almost certainly *sunnins it doesn't seem like a plausible source for leveling. Surely leveling would have given *sunnō (neuter ōn-stem, not feminine ō-stem) from an analogy hairtins : hairtō :: sunnins : X, X = sunnō. —Mahāgaja · talk 09:17, 4 April 2024 (UTC)[reply]
Are you really so sure the Gothic must be athematic? I can't find sources stating that attested sunnin belongs to sauil, which all sources I've checked - Miller, Lehmann, and Köbler - call a neuter a-stem without mentioning an athematic paradigm, not sure if I'm missing something. Gothic generally tends to level pretty heavily in such cases; consider e.g. 𐍆𐍉𐌽 (fōn), 𐍅𐌰𐍄𐍉 (watō). Can you name a clear example where this did not happen? — Mnemosientje (t · c) 18:07, 4 April 2024 (UTC)[reply]
Just noticed that our etymology at sauil refers to Kroonen, who does indeed claim this. It's a fascinating idea - but I can't really say the evidence seems to support it very strongly. — Mnemosientje (t · c) 18:37, 4 April 2024 (UTC)[reply]
There's no evidence of sauil being an a-stem either, though. No genitive *sauilis or dative *sauila occurs. I think sunnin belongs to sauil for a few reasons: (1) sauil and sunnin appear in Mark and nowhere else, and no other word for "sun" appears in Mark; (2) where the feminine ō-stem sunno appears in other books, it's a perfectly well behaved, regular feminine ō-stem; why on earth would it have an irregular dative sunnin?; (3) -in is the regular dative singular ending for neuter n-stems like hairto, not to mention funin and watin in the other r/n-stems. Thus I think sauil was a highly irregular neuter l/n-stem in Gothic, with sauil in the nominative/accusative, sunnin in the dative, and presumably *sunnins in the genitive. But only the translator of Mark used it; other people used sunno. I suspect that the sauil/sunnin word was archaic already at the time, and the Mark translator felt (as so many Bible translators do to this day) that the Bible is the appropriate place for archaic language. Other translators used sunno, which I bet was the normal, everyday word for "sun". —Mahāgaja · talk 20:04, 4 April 2024 (UTC)[reply]
@Mahagaja: Despite all of Kroonen's ad hoc shoehorning, there is simply no way to get a geminate *-nn- in Proto-Germanic from an L/n-stem, and *sunnǭ f (sun) (and *sunnô m (sun)) instead must be a secondary formation(s), likely from lost adjective *sunnaz (sunny), perhaps from either PIE *sh₂un-wó-s (per Ringe), or *sh₂un-t-nó-s (per Hilmarsson). Did these two terms, an l/n- and n-stem, merge into a single paradigm in Gothic? -- maybe -- but 𐍃𐌿𐌽𐌽𐌹𐌽 (sunnin) cannot be used as the basis for the shape to the genitive of this word. -- Sokkjō 21:51, 4 April 2024 (UTC)[reply]
I haven't read Kroonen (I'm always hampered in these discussions by my lack of access to recent sources), but I see no reason why sunnin can't come from *sulnin with the same doubly marked -nin of funin. At any rate, there is no plausible way that sunnin can come from feminine *sunnǭ, and what is the evidence for a masculine *sunnô other than sunnin itself? —Mahāgaja · talk 06:29, 5 April 2024 (UTC)[reply]
@Mahagaja: I'm not sure how one would arrive at *sulnin, but *-ln- in Germanic yields *-ll-, cf. *allaz. Gothic 𐍃𐌿𐌽𐌽𐌹𐌽 (sunnin) would indeed then derive from masculine *sunnô and is cognate with Proto-West Germanic *sunnō m. -- Sokkjō 07:09, 5 April 2024 (UTC)[reply]
For another take of the matter here's Klimp (2013), the section on this word starting from p. 108. Klimp basically says that the remodeled oblique stem *sunn- came from reshaping original *swen- after some other secondary related word in *sunn- that was created by other means. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 08:00, 5 April 2024 (UTC)[reply]
It's the same take. The Klimp dissertation is just echoing his professor, Kroonen. -- Sokkjō 08:14, 5 April 2024 (UTC)[reply]
@Mahagaja, Mnemosientje: I went and created an entry for PG *sunnô m. Quite a few sources tend to support Gothic 𐍃𐌿𐌽𐌽𐌹𐌽 (sunnin) being the dative to an an-stem, which, again, brings doubt to an l/n-stem surviving into Proto-Germanic. I think perhaps we should restore this entry to the conservative path of having two entries, *sawilą for the Gothic, and *sōlō for the WG and ON. Mnemosientje, do we know the gender of Crimean Gothic sune? -- Sokkjō 23:57, 6 April 2024 (UTC)[reply]
Also maybe worth mentioning, Kroonen reconstructs Proto-Germanic *haulaz as possibly from: nom. *kéh₂ul̥ > *hōl, gen. *kh₂ul-ós > *kulaz, loc. *kh̥₂uéli > *kaweli. Maybe a late-PIE root noun could be argued for, cf. *wósn̥, *h₃éngʷn̥:
  • (proterokinetic) *sóh₂wl̥ ~ *sh₂wél-s > PG (*sōwul >) *sōl ~ *sawilis;
  • or (amphikinetic) *sóh₂wl̥ ~ *sh₂ul-és ~ *sh₂wéli > PG (*sōwul >) *sōl ~ *sulaz ~ *sawili.
Thoughts? -- Sokkjō 04:18, 7 April 2024 (UTC)[reply]

References

  1. ^ Kroonen, Guus (2013) “*sōel- ~ *sunnōn-”, in Etymological Dictionary of Proto-Germanic (Leiden Indo-European Etymological Dictionary Series; 11), Leiden, Boston: Brill, →ISBN, page 463
  2. ^ Nedoma, Robert (2017–2018) “Chapter IX: Germanic”, in Klein, Jared S., Joseph, Brian D., Fritz, Matthias, editors, Handbook of Comparative and Historical Indo-European Linguistics: An International Handbook (Handbücher zur Sprach- und Kommunikationswissenschaft [Handbooks of Linguistics and Communication Science]; 41.2), Berlin, Boston: De Gruyter Mouton, →ISBN, § The documentation of Germanic, page 877:*sōwulō
  3. ^ Orel, Vladimir (2003) “*sōwelan ~ *sowelō”, in A Handbook of Germanic Etymology[1], Leiden: Brill, →ISBN, page 361
  4. ^ Mallory, J. P., Adams, D. Q., editors (1997), “*séhₐul”, in Encyclopedia of Indo-European culture, London, Chicago: Fitzroy Dearborn Publishers, page 556:*sōwilō
  5. ^ Hellquist, Elof (1922) “sol”, in Svensk etymologisk ordbok [Swedish etymological dictionary]‎[2] (in Swedish), Lund: C. W. K. Gleerups förlag, page 821:*sōwil-
  6. ^ Pokorny, Julius (1959) “sā́u̯el-, sāu̯ol-, suu̯él-, su̯el-, sūl-”, in Indogermanisches etymologisches Wörterbuch [Indo-European Etymological Dictionary] (in German), volume 3, Bern, München: Francke Verlag, page 881:*sōwila-; *sōwulā
  7. ^ Ringe, Donald (2006) From Proto-Indo-European to Proto-Germanic (A Linguistic History of English; 1)‎[3], Oxford: Oxford University Press, →ISBN, page 136

Categories for entries "spelled with" Ideographic Description Characters

Such as Category:Translingual terms spelled with ⿰.

Is there any use to separating these categories by the exact character used? Would it not be better to have an overarching category Category:Translingual entry titles using ideographic description sequences or similar. This, that and the other (talk) 05:23, 5 April 2024 (UTC)[reply]

  • @This, that and the other Support. The existing categories are especially problematic when you have multiple ideographic description characters, such as ⿰⿳⿰SIR木阝. However, why are you proposing to use "entry titles" in the category instead of just "terms"? Benwing2 (talk) 06:31, 5 April 2024 (UTC)[reply]
    Well, the term itself is not spelled with the ideographic description character. That's just a consequence of the fact the character is not encoded in Unicode. Nobody would consider these characters to be part of the spelling of the term. Moreover, it's ludicrous to say that ⿰亻尭 is spelled with ⿰ when is not – they are both equally composed of two CJK characters placed side-by-side (not sure of the technical CJK term for that). Compare this to Category:Translingual terms spelled with ◌́, which includes terms that use the combining accent character as well as those using precomposed Unicode characters, hence truly containing all terms spelled with the accent. This, that and the other (talk) 09:31, 5 April 2024 (UTC)[reply]
    Hah, I see you didn't actually argue for the use of the word "spelled". Whoops! I guess my argument against "terms" still runs along the same lines though. The terms themselves do not use these sequences; it is the Unicode encodings of the entry titles that do. This, that and the other (talk) 09:33, 5 April 2024 (UTC)

"Foo phrasal verbs with particle (bar)", "Foo compound verbs with bar", "Foo compound verbs with base verb bar"

Originally suggested by User:Arafsymudwr, with support from User:This, that and the other. Examples:

There are various issues here:

  1. The word "bar" here isn't always best described as a "particle". Sometimes it's an adverb, sometimes an adjective, sometimes a preposition, sometimes multiple words as in Category:Irish phrasal verbs with particle (ar bun), where ar bun is defined as a predicative adjective. Best to avoid specifying a part of speech.
  2. Similarly, when the verb is being classified by the base verb, sometimes the word "base verb" is present, sometimes it isn't.
  3. The categories where the part of speech isn't specified are a bit unclear/ambiguous; e.g. what exactly is the relationship specified in a category like Category:Azerbaijani compound verbs with aparmaq?
  4. "Compound verb" vs. "phrasal verb"; should we be making this distinction? From looking up "compound verb" and "phrasal verb", it seems a phrasal verb is a type of compound verb, specifically one formed with an adverb or similar word.

To make things consistent and unified, I tentatively propose the following:

  1. For phrasal verbs classified by the adverb/particle, use the form Foo phrasal verbs formed with bar.
  2. For compound verbs classified by the verb, use the form Foo compound verbs formed with bar; alternatively, Foo compound verbs formed with base verb bar.

Benwing2 (talk) 06:27, 5 April 2024 (UTC)

Support. However, specifically for English phrasal verbs as an exceptional case, would quotation marks assist? For instance, English phrasal verbs formed with "bar". Otherwise we end up with confusing category names like English phrasal verbs formed with with and English phrasal verbs formed with in ("within"), and while we can italicise the relevant word on the category page itself, it's my understanding this is not currently possible on the entries themselves or in parent categories. This, that and the other (talk) 09:39, 5 April 2024 (UTC)
@This, that and the other Hmm. I see your point. Wondering what others think. Benwing2 (talk) 18:35, 5 April 2024 (UTC)
This is a fair concern, and English is not the only language where leaving the word unmarked could be confusing. The current categories' use of parentheses to set the words apart, while effective, seems unusual. Quotation marks seem reasonable, and for consistency I would suggest implementing them across the board rather than just for English. - -sche (discuss) 02:54, 7 April 2024 (UTC)
@Arafsymudwr @This, that and the other @-sche I support -sche's suggestion of implementing quotation marks across the board, and hence propose Foo {phrasal,compound} verbs formed with "bar". Note that we have a fourth variant popping up in Special:WantedCategories that should be subsumed into this same format: Category:Estonian compound verbs with the particle ette and similar. The only issue then is whether to use straight quotes or curly quotes; I propose straight quotes because they are significantly easier to type, and we can always "normalize" to curly quotes in the displayed title of the category page, just like we can italicize the word inside of quotes. Benwing2 (talk) 06:41, 23 April 2024 (UTC)
The argument that straight quotes are easier to type holds less water in a situation where these names will rarely need to be typed out in full. But yes, I think we don't use curly quotes in page names anywhere on this wiki, so we should probably stick to straight quotes. This, that and the other (talk) 08:01, 23 April 2024 (UTC)

Channel (the Channel), the Channel

Hub (the Hub), the Hub

J3133 (talk) 13:08, 5 April 2024 (UTC)

Polish entry; should perhaps be moved to grób pobielany (the singular); see pl:grób pobielany. Hythonia (talk) 10:40, 7 April 2024 (UTC)

Inclined to speedy move. Vininn126 (talk) 17:30, 8 April 2024 (UTC)

English. Move/convert to Appendix. Any red-linked item included in this automatically causes that page to be "wanted", thereby clogging Special:WantedPages with pages for which all or almost all of the "wants" are created by the template. There are now 13 such redlinks.

Other templates of a similar nature exist, but should probably be handled one at a time. DCDuring (talk) 20:28, 10 April 2024 (UTC)

Category:French French and other redundancies

I propose to rename such categories (e.g. Category:French French, Category:English English, Category:German German, etc.) to less redundant and silly-sounding names. I have already eliminated Category:Spanish Spanish in favor of Category:Peninsular Spanish and am now doing the same for Category:Portuguese Portuguese in favor of Category:European Portuguese, in both cases using the preferred terminology in Wikipedia. (Note: Even though these appear to be slightly changing the scope of the categories, no information is lost as it's merely a recategorization in the labels module. Furthermore, the new category names either didn't exist formerly or, in the case of Category:European Portuguese, had no members.) I propose in general using the country name if there isn't a better/more standard term of some other format, e.g. Category:France French, Category:England English, Category:Germany German. Wikipedia has varying and inconsistent solutions for these cases, namely French of France, English language in England and German Standard German. A format like "French of France" is possible but is inconsistent with the general Wiktionary practice for naming varieties of a given language, which puts the language at the end. Benwing2 (talk) 05:46, 14 April 2024 (UTC)

Support. "France French" fits existing practice, as you say; we already use nouns rather than adjectives in some other cases, like "Switzerland German" to avoid the ambiguity of "Swiss German".
In fact, now that it's possible to have labels categorize differently for different languages, we could consider changing "Switzerland French" and "Switzerland Italian" back to "Swiss...", since those two are not ambiguous and were just collateral damage of people wanting to rename the German category.
But in the other direction... I wonder if we should consider changing not only "French French" but also e.g. "French Yiddish" to "France Yiddish", and "Vietnamese Chinese" to "Vietnam Chinese": I wonder if we should in general try to avoid categories that look like "[language name] [language name]". But that's probably a bigger discussion... - -sche (discuss) 05:58, 15 April 2024 (UTC)
@-sche OK, I'll change Switzerland French/Italian back to Swiss French/Italian. I should note that there are other cases where the noun country form is preferable to the adjective one. For example, there used to be a category 'British Indian English' capturing terms used in British India, but then User:نعم البدل created a label British Pakistani and Category:British Pakistani English intended for terms used by modern British Pakistanis (i.e. British people of Pakistani origin), and that made me realize there could easily be a parallel "British Indian English" consisting of terms used by modern British Indians, so I renamed the existing category to Category:British India English, which is hopefully unambiguous. (However, if we ever create Category:British Indian English for modern usage, it could still be ambiguous, so maybe in that case we should consider either renaming the putative Category:British Indian English to something like Category:Modern British Indian English, or conversely renaming Category:British India English to something like Category:Colonial British India English.) Benwing2 (talk) 06:52, 15 April 2024 (UTC)
In keeping with avoiding "[language name] [language name]", we should change CAT:Luxembourgish French and CAT:Luxembourgish German to CAT:Luxembourg French and CAT:Luxembourg German. "Luxembourgish German" especially could be interpreted as being a synonym of Luxembourgish since it is a High German variety (although that term is rarely if ever used, unlike Swiss German). —Mahāgaja · talk 08:20, 15 April 2024 (UTC)
In the vein of "Peninsular Spanish", it occurs to me that "French French" could be "Metropolitan French" (though then people unfamiliar with that term might think it means French spoken in metropolises, so I don't know if that's better or worse than "France French"). "England English" seems to be an actual term I can find in use (contrasted with e.g. "American English" and "Australian English"). - -sche (discuss) 15:52, 15 April 2024 (UTC)

Proto-West Germanic. Shouldn't this be *awjā? -saph 🍏 13:48, 19 April 2024 (UTC)

Nope, the reconstruction is correct. -- Sokkjō 17:44, 19 April 2024 (UTC)
Yeah, sorry, I confused ōn and ō stems. Never mind. -saph 🍏 18:51, 19 April 2024 (UTC)