
Wiktionary:Language treatment requests

From Wiktionary, the free dictionary



This is the page for proposing changes to Wiktionary's language treatment practices, including language renaming, merging and splitting.

Use this page if you want to propose a non-trivial change to:

For issues pertaining to a single language, such as orthography, start a conversation on the discussion page of the language considerations page (the so-called "About LANG" page), or the beer parlour if no such page exists.

Archiving: Language treatment requests, once closed and (if applicable) acted upon, are archived on Wikipedia-style archive subpages. These can be found at Wiktionary:Language treatment requests/Archives and in the list below:

Language treatment requests: Archive index

2016


Nkore-Kiga


As can be seen at w:Nkore-Kiga language, Kiga [cgg] should definitely be merged into Nyankore [nyn]. Unfortunately, this might require a rename to something that is both hyphenated and considerably less common than just plain "Nyankore" (though that is, strictly speaking, merely the name of the main dialect). —Μετάknowledgediscuss/deeds 05:21, 18 September 2016 (UTC)

I'm not sure. WP suggests the merger was politically motivated, but many reference works do follow it. Ethnologue says there is "Lexical similarity [of] 78%–96% between Nyankore, Nyoro [nyo], and their dialects; 84%–94% with Chiga [cgg], [...and] 81% with Zinza [zin]" (Kiga, meanwhile, is said to be "77% [similar] with Nyoro [nyo]"), as if to suggest nyn is about as similar to cgg as to nyo, and indeed many early references treat Nkore-Nyoro like one language, where later references instead prefer to group Nkore with Kiga. Ethnologue mentions that some authorities merge all three into a "Standardized form of the western varieties (Nyankore-Chiga and Nyoro-Tooro) [...] called Runyakitara [...] taught at the University and used in internet browsing, but [it] is a hybrid language." (For comparison, Ethnologue says English has 60% lexical similarity to German.) - -sche (discuss) 00:16, 2 June 2017 (UTC)
Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

Itneg lects


See w:Itneg language. All the dialects have different codes, but we really should give them a single code and unify them. I came across this problem with the entry balaua, which means "spirit house" (but I can't tell in which specific dialect). It's also known as Tinggian (with various different spellings), and this may be a better name for it than Itneg. —Μετάknowledgediscuss/deeds 02:09, 23 September 2016 (UTC)Reply

 Support. Itnëg is listed as just one language by the KWF with its variants listed, and we know that their database has been consulted by actual speakers of the language especially pertaining to its dialects.
This has been a problem for me as well with Kankanaey, wherein it also has separate ISO codes, and is reflected here in Wiktionary. Noting that this merger request is from 2016, I hope this gets action soon. — 🍕 Yivan000 viewtalk 05:51, 17 May 2025 (UTC)Reply
@Yivan000 Here's the thing, I cannot say that Itneg is one language at all. The dialects are far too different from one another, especially if it is spoken by a certain tribe. An example is how Maeng Itneg is very much closer to Kankanaey than to Inlaud Itneg, which is closer to Ilocano. While Maeng Itneg uses Min-, Inlaud Itneg uses Ag-. Adasen and Binongan also use Ag- while Masadiit, Moyadan, Gobang, Mabaka, and Banao tribes use Man-. Itneg/Tinguian cannot be called one language alone. That's why plenty of Itneg people who go to other parts of IP municipalities in Abra use Ilocano to communicate with other tribes, because their Itneg will be so different. They will understand some, but not much. Amianero (talk) 06:38, 8 June 2025 (UTC)
Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

Paraguayan Guaraní [gug]


I just noticed that we have this for some reason. Guaraní is a dialect continuum that is quite extensive, both in inter-dialect differences and in geography, and certain varieties have been heavily influenced by Spanish or Portuguese. That said, our Guaraní [gn] content is, as far as I can tell, pretty much entirely on Paraguayan Guaraní, which for some reason has a different code, [gug]. My attention was brought to this by User:Guillermo2149 changing L2 headers (I have not reverted his edits, but they do cause header-code mismatch). We could try splitting up the Guaraní dialects, but it would be hard to choose cutoffs and would definitely confuse potential editors, of which we have had more since Duolingo released a Guaraní course. I think the best choice is to merge [gug] into [gn] and mark words extensively for which dialects or countries they are used in. @-sche —Μετάknowledgediscuss/deeds 01:29, 1 November 2016 (UTC)

 Support merging gn and gug. - -sche (discuss) 14:33, 1 November 2016 (UTC)
Don't forget there's also [gui] and apparently also [tpj]. - -sche (discuss) 04:28, 16 May 2017 (UTC)

2017


Merger into Scandoromani


I propose that the Para-Romani lects Traveller Norwegian, Traveller Danish and Tavringer Swedish (rmg, rmd and rmu) be merged into Scandoromani. TN, TD and TS are almost identical, mostly differing in spelling (e.g. tjuro (Sweden) vs. kjuro (Norway) meaning 'knife', gräj vs. grei 'horse' etc.). WP treats them as variants of Scandoromani. My langcode proposal could be rom-sca, or maybe we could just use rmg, which already has a category. --176.23.1.95 20:19, 25 January 2017 (UTC)Reply

I support it. Traveller Norwegian is sometimes referred to as Tavring, and, to be honest, I've never heard anybody use the term Traveller Norwegian for a language. People rather call it Taterspråk or Fantemål, even though books state that these are derogatory terms. The other problem is that we in fact have two different Norwegian Traveller languages (the Romani-based and the Månsing-based). So it looks like a total mess right now. Tollef Salemann (talk) 07:55, 2 April 2023 (UTC)
I don't think this makes sense if the orthographies are consistently different, which seems to be the case. Otherwise, we could use the same logic to merge quite a few of the Slavic languages, which obviously doesn't make sense. Theknightwho (talk) 13:43, 2 April 2023 (UTC)Reply
OK, but Traveller Norwegian is not quite the right term, because the Romani-based TN has two or more branches, which are quite different from each other, while the main one is almost the same as the Swedish one and often had the same name(s). Meanwhile, there is also a Germanic TN version, unrelated to the Romani-based TN variations. I mean, we need at least two more L2s in this case, even if we are going to merge TN and Swedish Tavring.
PS: there is also Swedish stuff like Knoparmoj and Loffarspråk and more, and they still have remnants in some rare Swedish/Norwegian sociolects. Maybe they also need their own L2? Or can we treat them as sociolects? Tollef Salemann (talk) 13:59, 2 April 2023 (UTC)

Yenish


The Yenish "language" (which we call Yeniche) was given the ISO code yec, despite being clearly not a separate language from German. Instead, it is a jargon which Wikipedia compares to Cockney (which has never had a code) and Polari (which had a code that we deleted in a mostly off-topic discussion). The case of Gayle, which is similar, is still under deliberation at RFM as of now. Most tellingly, German Wiktionary considers this to be German, and once we delete the code, we should make a dialect label for it and add the contents of de:Kategorie:Jenisch to English Wiktionary. @-sche —Μετάknowledgediscuss/deeds 00:49, 7 April 2017 (UTC)

I don't see how that's most tellingly; I don't know about the German Wiktionary, but major language works frequently treat things as dialects of their language that outsiders consider separate languages.--Prosfilaes (talk) 03:01, 10 April 2017 (UTC)Reply
The (linked) English Wikipedia article even says "It is a jargon rather than an actual language; meaning, it consists of a significant number of unique specialized words, but does not have its own grammar or its own basic vocabulary." Despite the citation needed that follows, that sentence is about accurate, as such this should be deleted. -- Pedrianaplant (talk) 10:53, 30 April 2017 (UTC)Reply
(If kept, it should be renamed.)
There are those who argue that Yenish should have recognition (which it indeed gets, in Switzerland) as a separate language. And it can be quite divergent from Standard German, with forms that are as different as those of some of the regiolects we consider distinct. Many examples from Alemannic or Bavarian-speaking areas are better considered Alemannic or Bavarian than Standard German. But then, that's a sign that it is, as some put it, a cant overlaid onto the local grammar, rather than a language per se. Ehh... - -sche (discuss) 03:22, 9 July 2017 (UTC)Reply

2018


Category:Nahuatl language


Nahuatl is sometimes treated as a language, and sometimes as a family of languages. Right now, Wiktionary is treating it as both simultaneously, which doesn't make sense. "Nahuatl" should be removed as a language. --Lvovmauro (talk) 11:55, 30 August 2018 (UTC)Reply

I agree the current arrangement doesn't make sense; it is a relic of very early days on Wiktionary, and has persisted mostly because it's not entirely clear how intelligible the varieties are and hence whether it's better to lump them all into nah, or retire nah and separate everything. But enough varieties are not intelligible that I agree with retiring nah (or perhaps finally converting it to a family code). - -sche (discuss) 20:34, 31 August 2018 (UTC)Reply
I think a family code for Nahuan languages is really needed since there are many cases where we don't know specifically which variety a word was borrowed from. --Lvovmauro (talk) 09:55, 9 September 2018 (UTC)Reply
@Lvovmauro: OK, thanks to you and a few other editors, all words with ==Nahuatl== sections have been given more specific headers. However, as many as a thousand translations remain to be dealt with before the code can be made a family code and Category:Nahuatl language moved on over to Category:Nahuan languages. - -sche (discuss) 06:48, 19 September 2018 (UTC)Reply
A disturbingly large number of these translations are neologisms with no actual usage. Some of them don't even obey the rules of Nahuatl word formation. --Lvovmauro (talk) 11:03, 19 September 2018 (UTC)Reply
@Lvovmauro: Feel free to remove obvious errors / unattested neologisms. If a high proportion of the translations are bad, it might even be reasonable to start presuming they're bad and just removing them, since they already suffer from the problem of using an overbroad code. - -sche (discuss) 00:28, 21 October 2018 (UTC)Reply
Someone with more time on their hands than me at the moment will need to delete all the subcategories of Category:Nahuatl language, and then the category itself, in preparation for moving 'nah' from the language-code module to the family-code module so the categories won't be recreated by careless misuse of 'nah' in the labels etc of 'nci' entries. - -sche (discuss) 00:24, 21 October 2018 (UTC)Reply
Five years on, I've reviewed the situation here. There are no Nahuatl entries anymore, which is good progress. However, two pressing issues are stopping us from fully retiring this language code:
  • There are still about 450 "Nahuatl" (nah) translations in English entries. I suppose these need manual review. This should not be too difficult if one can find word lists for some of the best-attested Nahuatls.
  • Many languages have at least one word said to be derived from Nahuatl (presumably this is the word for "chocolate" in most cases). This could be solved by making Nahuatl an etymology-only language, or by changing these etymologies to refer generically to "a Nahuan language".
This, that and the other (talk) 09:25, 1 November 2023 (UTC)Reply


Language request: Old Cahita


Mayo and Yaqui are mutually intelligible and sometimes considered to be a single language called Cahita. But their speakers apparently consider them to be distinct languages, and they have distinct ISO codes (mfy and yaq) and are currently treated distinctly by Wiktionary.

I'm not requesting that they be merged, but separating them is a problem because an important early source, the Arte de la lengua cahita conforme à las reglas de muchos peritos en ella (published 1737 but written earlier) treats them as a single language, and also includes an extinct dialect called Tehueco. I'd like to add words from the Arte but I can't list them specifically as either Mayo or Yaqui.

One solution would be to treat the language of the Arte as a distinct historical language, "Old Cahita", which would then be the ancestor of Mayo and Yaqui. The downside is that there only seems to be one linguist currently using this name. --Lvovmauro (talk) 11:32, 4 November 2018 (UTC)

On linguistic grounds, it seems like we should merge Yaqui and Mayo. Jacqueline Lindenfeld's 1974 Yaqui Syntax says "Yaqui and Mayo are sufficiently similar to be mutually intelligible", the Handbook of Middle American Indians says "the modern known representatives of Cahitan—Yaqui and Mayo—are mutually intelligible", and various more general references say "Yaqui and Mayo are mutually intelligible dialects of the Cahitan language", "The Yaqui and Mayo speak mutually intelligible dialects of Cahita". (There are political considerations behind the split, which a merger might upset, so adding Old Cahita would also work, but we have tended to be lumpers...) - -sche (discuss) 23:03, 18 November 2018 (UTC)Reply
I wouldn't object to merging them. --Lvovmauro (talk) 08:58, 19 November 2018 (UTC)

Merging Classical Mongolian into Mongolian


"Classical Mongolian" refers to the literary language of Mongolia used from 17th to 19th century created through a language reform associated with increased Buddhist cultural production (this started in the 16th century, but language standardization took place later). In the 20th century, (outer) Mongolia became independent from China and later adopted a Cyrillic orthography based on the spoken language, while Inner Mongolia kept her Uyghur script.

The literary language of Inner Mongolia continues Classical Mongolian in terms of its orthography as well as most of its grammar (to an extent that Janhunen (?) calls the situation bilingual). Modern varieties, in both Outer and Inner Mongolia, have greatly expanded their lexicons through borrowing of modern terms, but they also both consider all of Classical Mongolian lexicon to be a part of their language, and will put it in their dictionaries, even transcribed into Cyrillic.

The actual problem I have with this division is that when it comes to borrowings from (Classical) Mongolian, we sometimes cannot ascertain whether they precede the 20th century or not, or more common still, we know they precede the 19th century (and post-date the 16th), but they obviously come from a spoken variety and not "Classical Mongolian" as a literary language. Crom daba (talk) 17:14, 15 November 2018 (UTC)Reply

Yes. I also find it strange that Wiktionary distinguishes Ottoman Turkish from Turkish; it's like distinguishing pre-1918 Russian from “Russian”, or like reading about “Ottoman Turks” instead of “Turks”. Also, Kazakh and the other Turkic languages do not get extra codes for Arabic spelling; this situation is even more comparable, innit. Kazakhs in China write in Arabic script, Mongols in China in Mongolian script, but the languages are two and not four. It also sounds like the case of Pali. Am I correct to assume that Classical Mongolian texts get re-edited in Cyrillic script? Then you could base everything on Cyrillic and make Mongolian-script entries soft redirects, because even words that died out before the introduction of Cyrillic can be found in Cyrillic. Fay Freak (talk) 15:23, 17 November 2018 (UTC)
@Fay Freak, the situation is similar to Turkish, but it creates less problems there since the Arabic script Turkish is obsolete and most relevant loans are pre-Republican.
In principle it could be possible to collapse all of Mongolian into Cyrillic, but this would be extremely politically incorrect.
Collapsing everything (potentially even Buryat, Daur and Middle Mongolian) into Uyghur script, like we do with Chinese, would perhaps make more sense, but 1) it's a pain to enter 2) Cyrillic is generally more accessible and useful to our users and (Outer) Mongolians 3) most of my materials are in Cyrillic 4) it corresponds poorly to the spoken forms 5) its Unicode encoding corresponds poorly to its actual form 6) the encoding doesn't correspond that well to the spoken form either. Crom daba (talk) 16:50, 18 November 2018 (UTC)Reply
This is tricky, because as far as language headers and having entries for terms in the language, it seems like we could often resolve which language a word is in(?) by knowing the date of the texts it's attested in. It is, as you say, etymologies where it's hardest to ascertain dates. (Still, if we merged the lects, we could retain an "etymology only" code for borrowings that were clearly from Classical Mongolian, like is done for Classical Persian, etc.) I'm having a hard time finding any references on the mutual intelligibility of the two stages; most references are concerned with the intelligibility or non-intelligibility of modern Khalkha, Kalmyk, etc. If we kept the stages separate, etymologies could always say something like "from Mongolian foo, or a Classical Mongolian forerunner". - -sche (discuss) 22:50, 18 November 2018 (UTC)Reply
@-sche, yes, the Persian model would be desirable.
It doesn't make much sense to speak of intelligibility between Classical and Modern Mongolian, Classical Mongolian is exclusively a written language, its spelling reflects the phonology of 13th-century Mongolian (early Middle Mongolian). The same spelling is used in Modern Mongolian as written in Uyghur script.
The biggest problem with Classical Mongolian is how redundant it is. For any word that is shared between modern and classical periods, and that is probably most of the lexicon, we would need to make two identical entries in Uyghur script for modern and classical Mongolian. Crom daba (talk) 11:18, 19 November 2018 (UTC)Reply
That seems not unlike how we handle Serbo-Croatian and Hindi-Urdu. — [ זכריה קהת ] Zack. 14:25, 30 November 2018 (UTC)Reply
Indeed. The way we handle them sucks. Crom daba (talk) 12:52, 1 December 2018 (UTC)
I agree. All this duplication is a huge waste of resources. Per utramque cavernam 13:22, 1 December 2018 (UTC)
Not exactly; Serbo-Croatian and Hindi-Urdu have redundant entries in different scripts on different pages, while I understand Crom daba's point to be that we would need to have redundant ==Mongolian== and ==Classical Mongolian== entries on the same pages for most Mongolian/Uyghur script words, which would be more like having duplicate Bosnian and Croatian entries on the same pages, not our current system. And Serbo-Croats are testier about their language(s) being lumped than speakers of Classical Mongolian... ;) - -sche (discuss) 17:29, 3 December 2018 (UTC)Reply
OK, does anyone object to the merge? If not, I can try to do it with AutoWikiBrowser later, or Crom or others could start reheadering our small number of Classical Mongolian entries, fixing any wayward translations, etc. For etymologies of terms that are known to derive from Classical Mongolian, we should be able to just move cmg over to Module:etymology languages/data. - -sche (discuss) 17:29, 3 December 2018 (UTC)Reply
@Crom daba, Fay Freak I made the few ==Classical Mongolian== entries we had into ==Mongolian== entries (labelled "Classical Mongolian" unless there was already a modern Mongolian section on the same page), but many of the categories still need to be deleted, and one needs to check whether anything else is left that would break before "cmg" is moved from being a language code to being an etymology-only code. - -sche (discuss) 02:46, 27 September 2020 (UTC)
There's no full correspondence between different Mongolian scripts and none of the scripts is totally phonetic. It's not just the spelling, the phonologies are different but sometimes one script represents the true or historical pronunciation and it's not necessarily Cyrillic, which is strange. There are words that only exist in one or the other, which is quite understandable, cf. modern ᠱᠠᠹᠠ (šafa, sofa) in Inner Mongolia (from 沙發 / 沙发 (shāfā)) and софа (sofa, sofa) in Outer Mongolia (from софа́ (sofá)). I support the merge, though I am curious whether Classical Mongolian terms are equally representable in Cyrillic and Arabic scripts. In other words, are there terms in Classical Mongolian which are different from the modern ones and for which there's no Cyrillic form? I think I saw them.
Duplication of entries is a waste. You may think I am biased but I think Mongolian should be presented/lemmatised in Cyrillic (Uyghurjin should also be available in all entries where it can be found) - for which resources are much more accessible. (Serbo-Croatian should be lemmatised on the Roman alphabet, on the other hand, let's finish the senseless duplications of entries)
Also supporting the Ottoman Turkish/Turkish merge. --Anatoli T. (обсудить/вклад) 03:25, 27 September 2020 (UTC)
@Atitarev In Mongol khelnii ikh tailbar toli we see that the term уйгуржин бичиг is described as ‘монгол бичгийн дундад эртний үеийн хэлбэр’ (‘early form of the Mongolian/Khudam script’). Middle Mongolian in uigurjin with its own rules shall not be equated with the later ‘Classical’-Modern script and orthography. I maintain that uigurjin (with its specific glyph forms and spelling rules) shall be treated as a term only for Middle Mongolian.
Similarly, I also object to treating Northern Yuan – Qing (‘Classical’) Mongolian and Modern Mongolian-script Mongolian as one literary language standard. In fact, orthographic standardisations and modifications make written Modern Mongolian quite different from Classical. Personally I’d like to display the historical features of this language collectively under ‘Classical Mongolian’, as only this term directly interlinks with an Inner Asian historical and linguistic tradition. LibCae (talk) 16:40, 7 May 2021 (UTC)


Renaming agu


We currently call this "Aguacateca", but "Aguacateco" is much more common. (Wikipedia opts for "Awakatek", which is rapidly becoming more common but is probably not there yet — not that we can't be crystal-ballsy if we want to when it comes to names rather than entries.) —Μετάknowledgediscuss/deeds 05:42, 19 December 2018 (UTC)Reply

You're right that several modern (and a few older) sources seem to use Awakatek. In turn, historically Aguacatec has been used in the titles of many reference works on it, and seems like it may be the most common name (ngrams), although it's also the name of the people-group. (Others: Awakateko, Awaketec, Qa'yol, Kayol, and various spellings of Chalchitec, sometimes considered a distinct lect.) - -sche (discuss) 04:31, 19 August 2020 (UTC)
Indeed, the most common name by a longshot is Aguacatec, followed by Awakatek (but these are also names of the people-group), followed by Awakateko, then Aguacateco, and in dead last, our current name of Aguacateca. Can we rename to Aguacatec? - -sche (discuss) 07:02, 28 December 2023 (UTC)Reply
  • Support renaming to Aguacatec. Also being the name of the "people-group" is hardly an argument against it; the same is true of a huge number of languages including French, Welsh, Manx and the vast majority of language names ending in -ish. —Mahāgaja · talk 07:22, 28 December 2023 (UTC)Reply
    Oh, to clarify, I didn't intend that as an argument against using that name, but as a qualification on the data; comparing which term is more common can't easily determine which is the most common name of the language if one term is also used for something else (the name of the people). But Aguacatec seems to be the most common name in e.g. the books about it in Glottolog's bibliography, too. Who has a bot that does renames? This one involves few enough entries that it could be done by hand, but it seems like the tasks that would need to be done are the same for many (all?) language renames, so it should be bottable... - -sche (discuss) 07:51, 28 December 2023 (UTC)Reply

2020


Retiring Moroccan Amazigh [zgh]

Discussion moved from Wiktionary:Requests for moves, mergers and splits#Retiring Moroccan Amazigh [zgh].

We renamed this code from "Standard Moroccan Amazigh" to "Moroccan Amazigh", but failed to note that the "standard" part was key. This is a standardised register of the dialect continuum of Berber languages in Morocco, promoted by the Moroccan government since 2011 as an official language. Marijn van Putten says this is essentially Central Atlas Tamazight [tzm], but most of the people producing texts in it are native speakers of Tashelhit [shi], so there is a bit of re-koineisation. However, if we move forward with good coverage of the Berber languages, every entry in [zgh] will be a duplicate of [tzm] or else a duplicate of [shi] marked with some sort of dialectal context label. By the way, the fact that there is an ISO code seems to be a political consideration rather than a linguistic one; compare the case of "Filipino", which we merged into Tagalog, or "Standard Estonian", which we merged into Estonian. @Fenakhay, -sche —Μετάknowledgediscuss/deeds 21:31, 16 March 2020 (UTC)

Hmm, I see it's a rather recent attempt at standardization, too. I don't feel like I know enough about Tamazight to be confident about what to do, but it does seem like, if this is based on tzm, it could be handled as tzm (perhaps even, instead of putting "non-[ordinary-]tzm" entries at shi+label, they could be tzm+label, unless they're obviously shi words). - -sche (discuss) 15:44, 19 March 2020 (UTC)Reply
Generally, it seems the [shi] words are quite obvious; the main differences between [tzm] and [shi] are lexical (as far as I can tell, [tzm] has more internal diversity w/r/t phonology than differences with [shi]). But they're in a continuum anyway, and WP claims that there's debate on where to draw the dividing line. —Μετάknowledgediscuss/deeds 16:35, 19 March 2020 (UTC)Reply
And “Moroccan Amazigh” does not sound like a language name anyway if you have not been told it is one, it seems like “Berber as spoken in Morocco”, another reason to remove it. Fay Freak (talk) 15:59, 21 March 2020 (UTC)Reply

2021


Canonical name of "mep"


Currently, the canonical name of the language in WT is spelled Miriwung, even though every primary/secondary source I could find recommended the spelling Miriwoong, as that is consistent with the language's own orthography, while the spellings "Miriwung" and "Miriuwung" are considered nonstandard. Can someone fix it? --Numberguy6 (talk) 14:47, 8 May 2021 (UTC)Reply

It's not exactly hard to find sources spelling it as Miriwung, but I'm sure you're right. @-sche? —Μετάknowledgediscuss/deeds 22:52, 21 July 2021 (UTC)

Names of sah, alt, xgn-kha and request for Soyot


The Constitution of the Republic of Sakha (Yakutia) (https://iltumen.ru/constitution) officially used язык саха referring to the language sah. A government decree («О Правилах орфографии и пунктуации языка саха») which approved the language’s current orthography, used язык саха instead of якутский язык from its annexe. However, this usage is not mandatorily popularised. I suggest Sakha to be adopted instead of Yakut due to the Constitution reference.

Since atv ‘Northern Altai’ is not a single language/dialect but a group of several (Kumandy, Chelkan & Tubalar), atv shall be split into subcodes. Furthermore, Southern Altai is only a classifying term; Altai, as the official term, is suggested for alt.

As Khamnigan [xgn-kha] is a transitional dialect (with conservative phonology) between Buryat and Mongolian, its simple name should not create ambiguity.

In addition I also request a code for Soyot. It will help contrasting Sayan Turkic languages. LibCae (talk) 06:36, 2 September 2021 (UTC)Reply

The Constitution of the Republic of Sakha is not our guide to using English names. In the case of [sah], most scholarly descriptions use "Yakut" (e.g. The Turkic Languages), there are far more raw Google hits for "Yakut language" than "Sakha language", and Google Ngrams show a preference for "Yakut" that has not waned over time (but we don't know past 2008, after which the data are incomplete).
I can't comment on the other code requests, but it would be more convincing if there were some evidence in favour of the need for these codes and their distinctiveness from their closest relatives. —Μετάknowledgediscuss/deeds 16:11, 2 September 2021 (UTC)Reply
I don’t see the argument how more information would come to light if we split Northern Altai. Surely also Northern Altai and Southern Altai are the most usual names, in either English or Russian. For that number of speakers Northern Altai has, how could there be a benefit? The major factor for editors is what sources they use, whether they indicate the sources and whether those are clear about the place of origin. I had many books about “the Aramaic dialect of [village X]” where I don’t know which damn language code of Wiktionary it is supposed to belong to, Wiktionary making codes centered around city A and B but not village X, in the end I ignored to add anything. Fay Freak (talk) 17:00, 2 September 2021 (UTC)Reply
 Oppose renaming Yakut
 Support splitting atv
 Support renaming alt to Altai
 Abstain regarding xgn-kha
 Support creating a code for Soyot, quite strongly so. Allahverdi Verdizade (talk) 17:13, 2 September 2021 (UTC)Reply

Renaming [nlo]


Wikipedia uses the phrase "Ngul (including Ngwi)" to describe this language, which we currently call "Ngul", but this paper indicates that these are just two of several synonyms, and uses "Ngwi" as the primary name. We should follow suit. —Μετάknowledgediscuss/deeds 00:19, 21 December 2021 (UTC)Reply

Renaming [amf]


We currently call this language "Hamer-Banna", after two of its dialects; WP uses "Hamer". This hyphenated name is found in the literature, though it excludes the third dialect, Bashaɗɗa. Modern publications, following the lead of Petrollino's grammar, use the spelling "Hamar" for that dialect. As I see it, if we stick with the hyphenated name, we should change it to "Hamar-Banna", but we could also consider elevating the name of the primary dialect to cover the language as a whole, as WP does, though in that case we should use "Hamar" instead. —Μετάknowledgediscuss/deeds 07:56, 22 December 2021 (UTC)Reply

Indus Valley Language


We currently have this language, which Wikipedia refers to as the Harappan language, as [xiv]. I suggest that we retire the code, because the language is undeciphered and its script has not been encoded, so there is nothing to add to Wiktionary in the foreseeable future. I also suggest that we retire the script code [Inds], which is only used for this language. @AryamanA Μετάknowledgediscuss/deeds 07:14, 28 December 2021 (UTC)

Merging Yoruba dialects


Currently, we have codes for [mkl] "Mokole" (see Mokole language (Benin)), [cbj] "Ede Cabe", [ica] "Ede Ica", [idd] "Ede Idaca", [ijj] "Ede Ije", [nqg] "Ede Nago", [nqk] "Kura Ede Nago", [xkb] "Manigri-Kambolé Ede Nago", and [ife] "Ifè" (all of which are lumped into Ede language). These lects are all very close to Yoruba proper (which they use for formal and liturgical purposes), and spoken by people who are considered ethnic Yorubas; moreover, they are included in the Global Yoruba Lexical Database. I have added them as dialects of [yo] "Yoruba" in MOD:labels/data/subvarieties, but treating Yoruba as a macrolanguage means we must remove these codes. (Note: the family code [alv-ede] would have to be removed as well.) @AG202, Oniwe, Oníhùmọ̀ Μετάknowledgediscuss/deeds 07:29, 28 December 2021 (UTC)

Merge, obviously again Ethnologue’s fabrications, which were then copied over from Wikipedia and some other “encyclopedias” with their impractical credulity towards this reference. Fay Freak (talk) 07:54, 28 December 2021 (UTC)Reply
If anything I would keep the Ede family code and change the lects to be etymology-only languages (edit: excluding probably Ifè since it is much more documented), but putting them all under Yoruba I unfortunately oppose for now. The Western Ede languages as seen here have a higher degree of separation from Nuclear Yoruba, and it checks out more when comparing, at the very least, the words and phrases of Ifè to nuclear Yoruba: Ifè-French Dictionary, Peace Corps - IFÈ O.P.L. WORKBOOK, J'apprends l'ife: Langue Benue-Congo du Togo. While there are obviously words that are shared due to them being related languages, it doesn't feel like a dialect of Yoruba (to me at least), so I feel uncomfortable grouping it under Yoruba. Though I do admit that I haven't really looked into the other Ede languages nearly as much. Edit: This paper may be helpful and at least shows some of the differences between Ifè & Yoruba and some aspects of the dialect continuum. Obviously some Ede varieties are much closer to Yoruba, but then I wonder what to do about the other ones. AG202 (talk) 15:09, 28 December 2021 (UTC)Reply
@AG202: Thanks for the sources. The question of whether to lump a code is in part based on how much extra work is entailed; would you be willing to work through a subsample to see how much we would just be duplicating Yoruba entries, and how much would be distinct? I'm not sure what you're actually advocating, because making them etymology-only languages (which you say you support) would require merging them (which you say you oppose). —Μετάknowledgediscuss/deeds 07:18, 29 December 2021 (UTC)Reply
@Metaknowledge Yea, sorry for that being unclear. I oppose the merger under solely Yoruba. Regarding the etymology-only part, I would support having all the Ede lects (excluding Ifè) under the header "Ede" and then differentiating on the definition line which Ede lect it is, mainly because they have much less coverage than Ifè, and it's harder to tell their mutual intelligibility. (Though as mentioned I'm not as well-versed with the other lects, so I might be entirely wrong about their continuum.) In terms of working through a subsample, I am up to do so, though I am swamped at the moment so it'd definitely take a while, but from what I've seen so far, I'd be worried about putting possible Ifè terms like ɖíɖì (belt) or àntã̀ (chair) under a Yoruba header and keeping nice clear entries for readers. AG202 (talk) 07:52, 29 December 2021 (UTC)
Looks reasonable. To clarify, my main note relates to observation that the language names currently in the data are too unnatural to find use and are not even meeting our CFI, which again means there is no entrotopy for those who know the languages to assign material to the designations with little doubt, as there is little to confirm the meanings of the language names, which should be a consideration if you devise new namings, in so far as you would like to not have private language but more or less obvious to new editors what the language codes are for. So I was not to mean that there cannot be a split in a different manner, or a smaller merge, but the current ones should be recognized as off the wall, and then there will have to be something that interrelates the remaining codes if one stumbles upon one, else it will be a reoccurring problem that an editor did not see the distinction of the available language codes. Fay Freak (talk) 01:36, 30 December 2021 (UTC)Reply

2022


Category:Gansu Chinese → Category:Gansu Mandarin? Category:Gansu Dungan?


Members:

@Justinrleung, RcAlex36, 沈澄心 Fish bowl (talk) 05:55, 6 February 2022 (UTC)

@Fish bowl: Gansu means actual Gansu in China, but Gansu Dungan should be its own label perhaps. I'm not sure why those entries are labelled specifically as Gansu Dungan, though, because do we know if it's not used in other varieties of Dungan? Pinging @Mar vin kaiser to know why he chose to label it as Gansu Dungan specifically. — justin(r)leung (t...) | c=› } 06:03, 6 February 2022 (UTC)Reply
@Justinrleung: There's this website, I can't find the link now, that was like a mini Dungan dictionary, and for some of its words, it has a dialectal label. I think I got it from there. --Mar vin kaiser (talk) 08:39, 6 February 2022 (UTC)Reply
@Mar vin kaiser: This? I know these words are marked as Gansu here, but I wonder if we need to specify it as Gansu specifically when we don't know if other Dungan varieties use it. — justin(r)leung (t...) | c=› } 09:02, 6 February 2022 (UTC)Reply
@Justinrleung: Oh, I added the label Gansu with the assumption that it's specifying that it's only used in Gansu. Aren't there just two dialects, Gansu and Shaanxi? --Mar vin kaiser (talk) 14:03, 6 February 2022 (UTC)Reply

Merge Category:Hokkien, Category:Hokkien Chinese; and perhaps move Category:Hainanese depending on the result of the previous


Category:Hokkien is an etymology language, while Category:Hokkien Chinese belongs to the {{dialectboiler}} system.

Category:Hainanese is presently both.

Fish bowl (talk) 11:10, 7 February 2022 (UTC)Reply

@Fish bowl @Justinrleung @RcAlex36 @沈澄心 @AG202 IMO we should delete Category:Hokkien Chinese and recategorize the lemmas under it to Category:Hokkien. This is consistent with the treatment of other etymology languages, particularly since Hokkien is considered a dialect of the Min Nan language and not a dialect of "Chinese" (which is not a language). If you don't mind, I will go ahead and do this. (While we're at it, we should rename the Amoy etymology language to Xiamen Hokkien, which is currently a dialect category but not an etymology language, and give it a standardly-formed etymology code. Its current code is nan-xm, which is badly formatted; etymology codes should consist of sections of three letters, hence nan-xia. Same goes for nan-ph -> nan-phi, nan-qz -> nan-qua, nan-zz -> nan-zha, nan-jj -> nan-jin.) Benwing2 (talk) 05:29, 16 September 2023 (UTC)Reply
I also think we should upgrade Hokkien to a full language, esp. seeing as Min Nan is itself not a language but a macrolanguage. Benwing2 (talk) 05:30, 16 September 2023 (UTC)Reply
Agree that we should treat Hokkien as a full language - this feels long overdue. I think in general each lect listed in {{zh-pron}} should be treated as a full language in its own right, which means Sichuanese (currently with etymology code [cmn-sic]) and Leizhou (currently lacks a code; I would suggest [nan-lei] or [nan-lz]) would be upgraded. We might also want to add more etymology codes, but that might warrant a separate discussion.
I however oppose changing the 3-2 letter codes, which are much easier to memorise (since this is just taken from the first letter of each syllable) and also are consistent with the location codes used in {{zh-pron}}. Changing them means that we would need to deal with two separate, inconsistent systems.
Regarding the category name issue, for some reason we also have categories like Cat:Mandarin Chinese, Cat:Cantonese Chinese, Cat:Hakka Chinese, Cat:Min Nan Chinese, etc. alongside the regular lemma categories. I don't really care about their treatment as long as the approach is consistent. – wpi (talk) 17:18, 16 September 2023 (UTC)Reply
@Wpi It is a pain to have nonstandard etym codes like this, as it requires adding code to various places to handle them. I don't see why the 3-2 codes are easier to memorize; the proposed 3-3 codes consistently use the first three letters of the lect in question, which is standard practice at Wiktionary, whereas the 3-2 codes aren't consistent (nan-ph is not the first two syllables of "Philippine"). In terms of the location codes in {{zh-pron}}, we should rename the latter to match the 3-3 codes. However, as a first step if you don't object, I will promote Hokkien to a full language, and we can continue the discussion on etym codes; in this case we should maybe eliminate Category:Hokkien in favor of Category:Hokkien Chinese for consistency with the other such categories, although in general we need to rethink the naming of these categories. Benwing2 (talk) 19:21, 17 September 2023 (UTC)Reply
I think one reason that 3-2 codes are easier to memorize is that {{zh-pron}} uses 2-letter codes for dialects of Hokkien. However, if it makes more sense for codes to be 3-3 to be consistent with other languages, I wouldn't mind it. I agree that whatever we do, we should make it consistent with CAT:Mandarin Chinese, CAT:Gan Chinese, CAT:Xiang Chinese, etc., (which means the easiest thing to do is to have CAT:Hokkien Chinese). — justin(r)leung (t...) | c=› } 18:09, 19 September 2023 (UTC)Reply
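To make the code-format point in this thread concrete, here is a minimal Python sketch (not actual module code). It assumes that a "standardly-formed" etymology code means hyphen-separated sections of exactly three lowercase letters, as described above; the names `STANDARD_CODE`, `PROPOSED_RENAMES`, `is_standard` and `normalize` are hypothetical and exist only for illustration.

```python
import re

# Assumption: a standard etymology code is two or more hyphen-separated
# sections of exactly three lowercase letters (e.g. "nan-xia").
STANDARD_CODE = re.compile(r"[a-z]{3}(?:-[a-z]{3})+")

# Renames proposed in the discussion: old 3-2 code -> proposed 3-3 code.
PROPOSED_RENAMES = {
    "nan-xm": "nan-xia",
    "nan-ph": "nan-phi",
    "nan-qz": "nan-qua",
    "nan-zz": "nan-zha",
    "nan-jj": "nan-jin",
}


def is_standard(code: str) -> bool:
    """Return True if every section of the code has exactly three letters."""
    return STANDARD_CODE.fullmatch(code) is not None


def normalize(code: str) -> str:
    """Map an old 3-2 code to its proposed 3-3 form; leave other codes alone."""
    return PROPOSED_RENAMES.get(code, code)
```

Under these assumptions, any place that currently special-cases the 3-2 codes could call something like `normalize` once and then rely on a single format check, which is roughly the maintenance argument made for the 3-3 codes.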


Slavic phylogeny


Old Slovak ?


How about adding code for the Old Slovak (zlw-osk) as well. In the same {{R:sla:ESSJa}} (ЭССЯ), especially in recent editions, Old Slovak is constantly listed separately. In this case, etymology-only code is sufficient. --ZomBear (talk) 07:32, 21 March 2022 (UTC)Reply

@ZomBear @Thadh @Sławobóg @Vininn126 What is the current state of this? I notice that Middle Russian is an etym-only language of Russian and has two codes zle-mru and zle-oru, which looks very suspect. I also think Middle Polish has in fact been made an etym language of Polish. Benwing2 (talk) 06:24, 19 September 2023 (UTC)Reply
I still believe that at least there should be an etym-code for the Old Slovak language. It is also necessary to combine Czech & Slovak into the “Czech–Slovak family” in Slavic languages tree, as was done with Lechitic (zlw-lch) F. ZomBear (talk) 06:54, 19 September 2023 (UTC)Reply
I know that Sławobóg also wanted to split Old Slovak. As to grouping them and giving them a family lang code, I'm not sure. Perhaps Moravian should also be split and placed in this family. @Zhnka Vininn126 (talk) 07:51, 19 September 2023 (UTC)Reply
I'm pretty certain that this question isn't as straightforward as you make it out to be, and I read on multiple occasions that the similarities between Standard Czech and Standard Slovak arose due to Czech's influence on Slovak and that dialectal evidence shows no evidence of genetic relationship closer than on the West Slavic level. So I would like a more detailed discussion on this. Thadh (talk) 08:10, 19 September 2023 (UTC)Reply
@Benwing2 @Thadh @ZomBear @Sławobóg I was reading up on w:sk:Dejiny slovenčiny#11. až 18. storočie, and it seems like there were huge phonological and grammatical changes, IMO upon reading it enough to split Old Slovak into an L2. There also appears to be a dictionary Historický slovník slovenského jazyka that could be used as a source. So I propose that we split Old Slovak. Vininn126 (talk) 10:15, 1 October 2023 (UTC)Reply
Also @Zhnka for the tactical ping. Vininn126 (talk) 10:49, 1 October 2023 (UTC)
 Support. @Vininn126 I just created a template for Historical Dictionary of the Slovak Language {{R:sk:HSSJ}}. It contains more than 70,000 words from the pre-literary period (before the 18th century) of the Slovak language. This is a really good source for Old Slovak. ZomBear (talk) 11:29, 1 October 2023 (UTC)Reply
@ZomBear We should be careful, however: Old Slovak is best described as 9th-14th centuries. Vininn126 (talk) 11:33, 1 October 2023 (UTC)
@Vininn126 What’s great about this dictionary is that, when quoting, it indicates the year or century when the word was recorded. For example, for voda (“water”), it can be seen that the oldest evidence for this word is from 1473, 1585 and 1376. ZomBear (talk) 11:50, 1 October 2023 (UTC)
 Support. Sławobóg (talk) 13:14, 1 October 2023 (UTC)
I have split Old Slovak and given it the code zlw-osk. Vininn126 (talk) 19:26, 3 October 2023 (UTC)

East Slavic codes


Following up a long discussion on the Old East Slavic About: page, I'd like to propose the following splits:

  • Split off Old Ruthenian (zle-ort)
  • Set Old Ukrainian (zle-ouk) and Old Belarusian (zle-obe) as etymology-only descendants and labels of Old Ruthenian
  • Set Ukrainian (uk), Belarusian (be) and Rusyn (rue) as descendants of Old Ruthenian
  • Change Old Russian (zle-oru) to Middle Russian (zle-mru) and set this as a label of Russian (ru)

On the final point there was quite some discussion, and I personally support making Middle Russian as a full-fledged code, but since we couldn't reach consensus, I propose making that a separate discussion if need be.

The proposed historical borders of the languages are as follows:

  • Old East Slavic (until the 14th century)
  • Middle Russian (=Moscow Literary language; 14th century-18th century) [Peter the Great's reforms]
  • Old Ruthenian (='West Russian' Literary language; 14th century-19th century) [Kotliarevsky's Eneïd]

Pinging @Atitarev, ZomBear, Useigor, Ентусиастъ, Benwing2, Rua, Ogrezem. I apologise if I forgot anyone. Thadh (talk) 12:43, 2 March 2022 (UTC)Reply

I still support only the introduction of Old Ruthenian, which is missing but as before, I don’t claim to be an expert on the matter. The Russian corpus in the other discussion was helpful. When I filtered on “Middle Russian”, I think I was able to find a couple of words, which are now considered obsolete. The rest were words, which just need to be respelled to find quotes in (early) Modern Russian. I found a few different ways to abbreviate and also numerous misspellings. Overall I sort of feel why these additional splits are not so popular - little strong evidence to work with. Middle Russian may be allowed to be added, let’s just look for good cases.
To make decisions easier, why don’t we add a couple of specific examples for each new language code proposed - something to work with? (They can be vocab, grammar or pronunciation cases). The proponents should have examples in mind to make the case(s) stronger. We can work together on confirming or disputing those cases. --Anatoli T. (обсудить/вклад) 22:57, 2 March 2022 (UTC)
I'll see if I can make a list of features that distinguish Middle Russian from (Modern) Russian. In any case, for the time being, treating Middle Russian like Old East Slavic makes little sense to me, especially if we're splitting off Ruthenian (otherwise we get some kind of Dutch-Afrikaans situation), so we could go ahead with that now and in the meantime continue discussing MR's position as a separate code. Thadh (talk) 23:30, 2 March 2022 (UTC)Reply
(edit conflict) You can use any of the examples already in discussions used as evidence, e.g. онтарь/оньтарь, агистъ, etc. BTW, I see that "Old Russian" was used incorrectly by ZomBear when actually talking about Middle Russian. "Old Russian" = "Old East Slavic". The Russian term for Middle Russian is старору́сский (starorússkij) but Old East Slavic (Old Russian) is древнеру́сский (drevnerússkij). --Anatoli T. (обсудить/вклад) 00:21, 3 March 2022 (UTC)Reply
Quick update, I've found a relevant discussion from three years ago, Wiktionary talk:About Russian#Middle Russian?. Also, The Russian Language before 1700 (Matthews 1953) argues your and Fay Freak's point (that Middle Russian is too similar to modern Russian to warrant a linguistic distinction). Fun point: it also provides съмьрть's accentuation :0. I'll still look for differences in the corpora, but if the languages are too similar I guess I don't mind keeping the two together - as long as the descendants sections don't get too cluttered, I'm fine. Thadh (talk) 00:02, 3 March 2022 (UTC)
BTW, I didn’t get back to you on the concern I have in regards to introduction of word stresses in Old East Slavic. My reason being there are many cases where assumptions can go wrong based on descendants. We should only use referenced data. Well, we don’t have native speakers to prove us wrong, do we? —Anatoli T. (обсудить/вклад) 23:03, 2 March 2022 (UTC)Reply
Sure, but of course we can still use sound laws for words without referencing the specific word's reconstruction. A word like съмь́рть will have the stress on the second syllable, because otherwise the Russian term would be something like **со́мерть rather than сме́рть. However, I wouldn't know where to look for any reference on this specific word, and googling "съмь́рть" returns no results. Thadh (talk) 23:30, 2 March 2022 (UTC)Reply
Of course, there could be strong (?) assumptions on vowels, which became silent (i.e. they are unstressed) but I wouldn't be so sure even on e.g. вода́ (vodá) (if it weren't referenced), since the word is stressed on the first syllable in some Ukrainian dialects, if you know what I mean. --Anatoli T. (обсудить/вклад) 00:21, 3 March 2022 (UTC)Reply
@Thadh: I support your suggestions. Ентусиастъ (talk) 16:19, 3 March 2022 (UTC)
I have already spoken before. I'm for it too.--ZomBear (talk) 00:57, 4 March 2022 (UTC)
@Thadh: Again, unfortunately, I see that the discussion has stopped again. It's been almost a month since no one has written anything. Every day I look forward to the solution of this issue with the Old Ruthenian language. --ZomBear (talk) 07:32, 21 March 2022 (UTC)Reply
Done. What we need now is to split all pages into either Old East Slavic, Russian (with the Middle Russian label) or Old Ruthenian (with or without the Old Belarusian/Old Ukrainian label). Thadh (talk) 18:43, 21 March 2022 (UTC)Reply
I also removed Old Novgorodian as the child of Old East Slavic. Vininn126 (talk) 08:52, 4 October 2023 (UTC)

@Thadh how about adding more etymology only language codes? Modern dictionaries use more than just Old Belarusian/Ukrainian. I saw Middle Bulgarian, Old Slovak, Old Slovene, Old Serbian, Old Croatian, Old Serbo-Croatian, Old Bulgarian, Old Upper Sorbian, Old Lower Sorbian. Possibly Middle Czech and Middle Polish also would be useful sometimes. Old Sorbian was also used by Boryś (Old Sorbian peleš as cognate for Polish pielesze), however we can't just link to both Lower and Upper Sorbian at once, so that would require full support for this language (?). Scientific publications mention Old Polabian as language of Polabian Slavs in Middle Ages, it is used usually for proper nouns like given names, theonyms, toponyms, sometimes ordinary words mentioned in Latin texts and it is always reconstructed language, I would like to have it tho. Sławobóg (talk) 14:32, 28 May 2022 (UTC)Reply

@Sławobóg What I'll need from you in order to determine if the splits are worth it is:
- The exact boundaries of the languages' stages.
- How much literature there is in the earlier stages of the language.
- How much the languages differ from their modern stages.
Once you do that, we can continue the conversation about splitting them. It seems pointless to split a language off just because there are two inscriptions in some dusty old book. Thadh (talk) 15:15, 28 May 2022 (UTC)Reply
@Thadh: IMO Middle Polish would benefit greatly from the split.
  • Boundaries: As it is with extinct languages, there aren't really any exact boundaries, but it's usually defined as between the 16th and the 18th century; Polish Wiktionary has settled on years 1500 to 1750 to account for Doroszewski's dictionary.
  • Literature: There are two major corpora, accessible on the SPXVI and ESXVII websites.
  • Differences: I reckon the spelling and pronunciation differences, especially the employment of "slanted vowels" (samogłoski pochylone, I have no idea what their name is in English), should be enough.
Plus, like, this would help with attestation. Hythonia (talk) 11:08, 30 July 2022 (UTC)Reply
Middle Polish is also thusly defined on Wikipedia. I also think it would make more sense to have Middle Polish as an LDL. The alternative would be having a label. If we split, we'd have to add Middle Polish both to Proto-Slavic descendant entries and as an intermediate in etymologies. Vininn126 (talk) 11:52, 30 July 2022 (UTC)
Also pinging @KamiruPL, as an editor for Old Polish. Do you think we should fully split Middle Polish, create a label, or some other alternative? Vininn126 (talk) 13:44, 30 July 2022 (UTC)Reply
@Vininn126: I treat Arabic before the spread of printing in the Arab world, which is from 1800 (Napoléon brought the press to Egypt, which was then a state business that over time was rented by privates who would copy it), as LDL. The reason becomes more obvious for Hebrew where we are eager to include hapax legomena in the Tanakh and due to lacking distinctness of the Modern to the Biblical language, from which the former has been resurrected, have little desire to split. This is in analogy to the split of English from Middle and Old English, where basically the split happens following the new medium of printed books—accordingly if Polish literacy in the same fashion starts only somewhere in the 18th century then we become stricter only then.
Circumventing attestation criteria is no reason to split language headers, as your perception about whether something is another language is the same and only disingenuously modified by that consideration of its description. So more appropriate attestation criteria – and I think of the many carefully collected variants sadly left even unmentioned as a consequence of no sense of proportion applied to the teleology of our rules – by no means should serve motivation to split languages; we can already derive them by the accepted statutory interpretation methods.
To be clear, since legal thinking is unwonted and mysteriously strange to many in spite of people rightly being appointed for it in any society: In this case this is really just systematic interpretation: Since the community authoring the policies was biased towards English but the splits of other languages wrought comparative inconsistency with its situation according to which it has been split by chronolects, we break the criteria down to be suited for the languages they were only roughly devised for. Fay Freak (talk) 09:51, 31 July 2022 (UTC)Reply
In all honesty a label is likely the best option. Vininn126 (talk) 10:05, 31 July 2022 (UTC)
@Hythonia @Sławobóg @KamiruPL I've gone ahead and added Middle Polish as a label. Vininn126 (talk) 12:11, 8 August 2022 (UTC)
I've thought about this more, and I think there might be a case for Middle Polish as an L2. If we agree it should be split, I can help convert the existing entries to Middle Polish.
Here is my reasoning:
Old Polish, Middle Polish, modern Polish, and Silesian are four lects that are hard to separate accurately. Part of this argument hinges on Silesian, which we currently treat as an L2, and I don't see that changing. There are political, historical, and linguistic reasons:
===Why Silesian should be an L2===
  • Its speakers feel strongly that it is a language, not a dialect. The Polish linguists pushing that it is a dialect include Jan Miodek, a notable prescriptivist who pushes more nationalistic views of how languages should be treated, and I believe that treating Silesian as a dialect is done partially to stifle any sense of individuality to further Polish control. However, I recognize that theory has some tinfoil-hat conspiracist vibes to it, so I'll stick to the point that its speakers strongly feel it is a language.
  • Significant linguistic difference: Silesian has a different phonology to Polish, and other grammatical features, such as retaining the Proto-Slavic aorist in an analytical past tense, as opposed to a more agglutinative/morphological one in Polish. It also recently has undergone strong standardization, as can be seen on silling.org and the ślabikŏrzowy szrajbōnek.
  • Significant lexical differences: Silesian differs quite a bit from Polish in terms of lexical information. Core inherited words are of course similar, but look at other Slavic languages. It's also been heavily "Policized", but so has Kashubian, which we also treat as an L2 and is recognized as a separate minority language in Poland, and both Kashubian and Silesian are recognized by ISO and Glottolog.
  • Finally, the key point to the overall argument: Silesian is a descendant of Middle Polish. Most claims that it is Czechoslovakian are refuted by Silesian philologists.
===Why Middle Polish should maybe be an L2===
So if we decide that Silesian is an L2, that would give Middle Polish multiple descendants. This would "fix" many inherited etymologies, such as wszystek. This would also fix Latinate borrowings, where Silesian inherited an older pronunciation of Latinate words; the chain also generally works better as a learned borrowing into Middle/Old Polish -> Polish + Silesian, as opposed to setting multiple learned borrowings.
Furthermore, Middle Polish was significantly different from Modern Polish in terms of phonology and grammar (I recently updated the Middle Polish Wikipedia page). In terms of lexical content - there were significant shifts, I would say less than the standard differences between Slavic languages, but there were still trends, and dictionaries such as {{R:pl:SXVI}}, {{R:pl:SXVII}}, and occasionally {{R:pl:SJP1807}} or {{R:pl:SJP1900}} would be key in this. Furthermore, Middle Polish is otherwise resource poor, and should be treated as an LDL, label or not. Having it as an L2 is cleaner in terms of citations.
If we agree that this should be done, I would recommend setting the cutoff dates as c. 1500-c. 1780, with a language code of zlw-mpl. Vininn126 (talk) 12:39, 24 April 2023 (UTC)Reply
@Atitarev @Fay Freak @Hythonia @Sławobóg @Thadh @ZomBear @Ентусиастъ Vininn126 (talk) 17:30, 24 April 2023 (UTC)
Update: there is debate as to whether Silesian should be listed as from Old Polish or Middle Polish, which really affects the above argument. Vininn126 (talk) 14:53, 25 April 2023 (UTC)
Just flagging up that it's possible to give Middle Polish an etymology-only language code, and to set it as the ancestor of Polish (and Silesian, if desired). This would be a way to keep its entries under the Polish L2, while allowing etymologies to formally mention it. In turn, Middle Polish could have Old Polish set as its ancestor.
Of note is the fact we already have Middle Russian, Old Ukrainian, Old Belarusian, Middle Bulgarian and Early Modern Czech, which are all currently handled in the same way. Theknightwho (talk) 16:14, 25 April 2023 (UTC)Reply
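The etymology-only arrangement described in this comment can be sketched in Python as follows. This is a hedged illustration rather than the actual data format: the `Lect` class, the `LECTS` table and the `entry_header` function are hypothetical names, and the Silesian link is included only because the comment mentions it as optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lect:
    name: str
    # Set only for etymology-only lects: the full language under whose
    # L2 header their entries are kept.
    full_language: Optional[str] = None
    # The lect that etymologies may formally name as the ancestor.
    ancestor: Optional[str] = None


LECTS = {
    "Old Polish": Lect("Old Polish"),
    "Middle Polish": Lect(
        "Middle Polish", full_language="Polish", ancestor="Old Polish"
    ),
    "Polish": Lect("Polish", ancestor="Middle Polish"),
    # Optional in the suggestion above ("and Silesian, if desired").
    "Silesian": Lect("Silesian", ancestor="Middle Polish"),
}


def entry_header(name: str) -> str:
    """Return the L2 header under which entries for this lect would live."""
    lect = LECTS[name]
    return lect.full_language or lect.name
```

With this shape, `entry_header("Middle Polish")` gives the Polish header, so no entries move, while `LECTS["Polish"].ancestor` still lets an etymology mention Middle Polish as an intermediate stage.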

Proposal to rename Ottawa (otw) to Odawa


I think Ottawa should be renamed to Odawa; it's the more common English name used to refer to the language nowadays, and it is preferred by speakers. What do you think? /mof.va.nes/ (talk) 15:47, 15 April 2022 (UTC)

Re-merge Kven and Meänkieli into Finnish


@-sche, Chuck Entz, Rua, Tropylium, Hekaheka, Surjection, Brittletheories, Mölli-Möllerö

In the previous discussion on this topic ([1]) it seems everyone has agreed that it's best to merge Kven and Meänkieli into Finnish. However, the discussion was closed without actually merging the codes, and currently we (again) have 40 Kven and 30 Meänkieli lemmas, many of which are also duplicated as Finnish for the reasons discussed in the above discussion. Has anyone changed their opinion or does anyone have anything to add to this or can we actually go ahead and merge the languages?

I guess related to this is also the question of how to handle dialectal morphology of Finnish dialects, but maybe that's a bit out of scope for this discussion. Thadh (talk) 16:24, 23 September 2022 (UTC)Reply

The strongest arguments in favour of splitting them are political and should therefore be ignored. Our task is to best present the most information, and that would best be achieved by merging the three lects. The few dozen new dialectal terms will fit in quite well with the 1250 pre-existing ones. brittletheories (talk) 16:49, 23 September 2022 (UTC)
Incubator says "Wikimedia does not decide for itself what is a language and what is a dialect. We follow the ISO 639 standard." This means that it's up to the agency that grants language codes, not to us, right? Meänkieli and Kven have written standards so they should stay as they are. (In my view, Tver Karelian should also be treated as a language so I could add Tver Karelian words without knowing if they're used in the more usual "vienankarjala" dialect.) Mölli-Möllerö (talk) 19:55, 23 September 2022 (UTC)Reply
The Incubator standards are not the same as our standards. Our language treatment does not strictly follow ISO 639. — SURJECTION / T / C / L / 20:33, 23 September 2022 (UTC)Reply
@Mölli-Möllerö: On the Tver Karelian issue, you could also just leave the first parameter of {{krl-regional}} empty or |1=? it, and it will automatically be sorted in Category:Karelian term requests, and I'll be able to add the terms later. Or you could use either {{R:krl:KKS}} or another Viena source, the correspondences are usually quite easy. Thadh (talk) 20:44, 23 September 2022 (UTC)Reply
Wrong. There's a big difference between Wikimedia's administrative needs and the lexical needs of a dictionary. As for written standards: the world is full of languages with multiple written standards: Brazilian and European Portuguese, European and Canadian French, Austrian and German German, etc. We can't let others decide for us- each case needs to be considered on its own. We've chosen to merge languages treated as separate by ISO and recognize languages with no ISO codes. In other cases we've gone with the ISO. Chuck Entz (talk) 20:59, 23 September 2022 (UTC)Reply
For outsiders, Meänkieli (in Sweden) and Kven (in Norway) are languages or rather dialects that have become languages by virtue of being across the border (the Finnish-Swedish border and the Finnish-Norwegian border, respectively). Finnish speakers can easily understand nearly 99% of Meänkieli or Kven, and the main differences are either dialectal features also found in Far Northern Ostrobothnian dialects or (the lack of) recent developments within the past 200 years (in one or the other).
Linguistically they are 100% dialects, but politically both Sweden and Norway respectively have recognized them as separate languages, which is also what their speakers think. A more cynical person might say that they have deluded themselves into thinking their language is not Finnish in order to avoid persecution of Finnish that was prevalent in Sweden and Norway in the 19th and 20th centuries ("Finnish? what Finnish? we're not speaking Finnish, it's Meänkieli/Kven").
However Wiktionary best handles cases like these, I don't know. 200 years is not enough for what is generally a phonologically conservative language to become anywhere near unrecognizable. It could be compared to how Karelian is now almost universally treated as a separate language, even though it forms a dialect continuum and has been diverging now for at least about 800 years (ever since the 1323 Treaty of Nöteborg).
Finnish sources almost exclusively consider Meänkieli and Kven to be dialects, even more so when these sources are linguistic-oriented (some other sources take a political stance and recognize that they are considered "minority languages" in their respective countries). — SURJECTION / T / C / L / 20:34, 23 September 2022 (UTC)Reply
"The main differences are either dialectal features also found in Far Northern Ostrobothnian dialects or (the lack of) recent developments within the past 200 years (in one or the other)"... and the additional Swedish/Norwegian loanwords found in Meänkieli/Kven, of course. But many of these are also found in Finnish dialects. — SURJECTION / T / C / L / 21:37, 23 September 2022 (UTC)Reply
The divergence of Karelian from Finnish, FWIW, almost certainly goes back at least 1200 years (to the archeological / mentioned-in-Novgorod-sources Old Karelian culture). The initial split-off of Northern Finnish dialects is probably about as old too.
What I would think of as the best argument against treating Meänkieli and Kven as languages is that they're not even internally well-defined — typically they're just catch-all terms for "Northern Finnish in Sweden" and "Northern Finnish in Finnmark" with relatively various dialects encompassed by each. There's some efforts (schoolbooks, etc.) towards a "standard" Meänkieli based on the Torne Valley dialect but I don't think it could be called actually standardized just yet. I suppose one thing we could do is to document whatever is done on this specifically under "Meänkieli" and leave anything else as dialectal Finnish, but that might be a bit premature still too. --Tropylium (talk) 07:44, 24 September 2022 (UTC)Reply
I would not say that "everybody" agreed on the merger. I didn't. I can only comment Meänkieli but I would not be surprised if similar argumentation would also apply for Kven:
  • The overall small number of Meänkieli words in Wiktionary only proves that we don't have an active editor in Meänkieli. There seem to be some 30,000 entries in this Meänkieli-Finnish-Swedish dictionary[1]
  • The small sample of words we have proves nothing of similarity of the vocabularies. If you study the dictionary I mentioned (press "tutki") you'll find that there are considerable differences between Finnish and Meänkieli. In addition to vocabulary, conjugation of verbs seems to differ (e.g. Meänkieli: tukeat - Finnish: tuet - English: you support).
  • This article[2] promotes the opinion that Meänkieli is a dialect. However the writers admit that the two are not readily mutually understandable: Finnish-speakers usually understand Meänkieli relatively well, partly because of their knowledge of Swedish, but for Meänkieli speakers Finnish isn't as easy. If we took a Finn who does not know a word of Swedish, they would be lost with a Meänkieli speaker.
  • This article[3] starts from the maxim that Meänkieli is a dialect of Finnish but finishes with the conclusion that at the end of the day it is the speakers of a language themselves who decide the status of a language/dialect. Meänkieli speakers have made their opinion clear: they want it treated as a language. How competent are we to second-guess their point of view? Has any of us studied Meänkieli more than superficially?
Here is also a link to a Kven-Norwegian dictionary[4]--Hekaheka (talk) 09:44, 24 September 2022 (UTC)Reply
To be fair all these points would still hold for Ingrian and Savonian dialects, too, and of Ingrian dialects I'm fairly certain no Finnish speaker would readily understand them much better than, say, Izhorian or Karelian. Thadh (talk) 09:51, 24 September 2022 (UTC)Reply
A clear-cut solution would be to stick to ISO. Ingrian has an ISO code, Savo hasn't. Is Ingrian currently treated as Finnish dialect? I think it shouldn't. --Hekaheka (talk) 12:05, 24 September 2022 (UTC)Reply
You're confusing Ingrian (inkeroinen) and Ingrian (inkerin (suomalainen)). The first one is the same as Izhorian and is handled as a distinct language, has an iso code, and is spoken by the orthodox Izhorians. The latter one is the same as Ingrian Finnish and is handled as a Finnish dialect, does not have an iso code, and is spoken by the lutheran Ingrian Finns. My remark concerned the latter. Thadh (talk) 13:46, 24 September 2022 (UTC)Reply
I've come around to say that I think they should be merged. We don't consider Valencian, Ulster Scots nor Lemko (the linguistic case is very similar between those examples and this one) to be their own languages despite political arguments that they should be considered as such (and even some recognition like in the ECRML). We shouldn't do so here either. And don't even mention the whole thing going on with Serbo-Croatian... The general trend on en.wikt seems to be to consider the linguistic argument more important than any political ones (which I can appreciate). — SURJECTION / T / C / L / 11:51, 3 October 2022 (UTC)Reply
As a Norwegian, I find it odd that there is a proposal to merge Kven with Finnish - as Kven is an officially recognized minority language in Norway (Finnish is not). I do not agree with this merge, for the following reasons:
  • At least in Norway, Kven and Finnish are considered separate languages. You are able to get elementary school education and books in Kven (but not in Finnish, as far as I know) - you can even study Kven at the University of Tromsø and receive a bachelor's and master's degree in the language (there is a Finnish one as well, and they are considered two separate degrees). Kven people are considered a separate ethnicity, along with their language, descended from Finns/Finnish.
  • Political reasons are of course relevant, not just linguistic ones. The average Kven speaker has never set foot in Finland, never studied any Finnish, nor consumed any part of Finnish culture and media (music, literature, etc.). An argument was that Finnish speakers understand 99% of Kven - as a Norwegian I understand up to 99% of Swedish and Danish, but they are not getting merged into one language called Scandinavian (for political reasons).
  • If merged, then in theory thousands of new Finnish entries on Wiktionary would emerge, in the form of "dialectal" words which are actually Kven words. If someone bothered to add them all (I, stubbornly, might) - then every Kven word and declension would need to be added under Finnish, and certain words and forms which don't even exist in Finnish dialects in Finland would be present. Every Kven word, even if the nominative singular is identical to Finnish, has a separate declension chart, every single one - there would then need to be a separate template to show these (I think Finnish Wiktionarians would be quite annoyed by this).
  • Kvens in Norway have fought very hard for their language, and they have gotten their own language institute with a promotion of literature and culture in the Kven language - erasing their language from Wiktionary and treating it as a dialect of a language they don't even speak would be a huge slap in the face. Finns in Finland who speak a dialect of Finnish also all know standard Finnish; Kven people do not. If a Kven person handed in an essay at a school in Finland, every other word would be marked as wrong or a typo. Supevan (talk) 22:49, 2 November 2022 (UTC)Reply
This entire argument can be boiled down to "Kven is standardized". So is Valencian and Croatian, but we still don't treat them as separate languages. — SURJECTION / T / C / L / 14:57, 5 November 2022 (UTC)Reply
@Surjection: Actually, Kven isn't firmly standardised afaik. Thadh (talk) 14:58, 5 November 2022 (UTC)Reply
We should. Supevan (talk) 17:35, 5 November 2022 (UTC)Reply
@Supevan Most of these points were already raised for Meänkieli. I will try to answer them anyways.
1) First, our standard procedure is to emphasise linguistics over politics, even when much more controversial (see WT:Serbo-Croatian).
2) Secondly, and most importantly, you claim all Kven inflection should be incorporated into Finnish. This is false. There is already a ridiculous amount of variation in the inflection of the various Finnish dialects, and none of it is represented here. We simply do not have the capacity to maintain 30 different tables containing dozens of inflected forms. Additionally, natives do not stick to one variety of Finnish but mix standard Finnish grammar with that from various dialects and registers. It would also be naive to assume that Kven speakers all use one well-defined standard themselves. A language with a morphology as rich as that of Finnish leaves much space for variation.
3) You say, "thousands of new Finnish entries [– –] would emerge, in the form of 'dialectal' words which are actually Kven words", but this is only true if one assumes Kven not to be a collection of Finnish dialects, which is not a popular opinion among linguists. Besides, only a small number of these terms are exclusive to the Ruija dialects.
brittletheories (talk) 13:46, 27 January 2023 (UTC)Reply

2023


Church Slavonic and Moravian


Technically, Old Church Slavonic and Church Slavonic should be two separate languages (?), but we only have the former, probably because of the small number of editors. These languages are always treated as two different languages in etymology. For now, in etymologies and on Proto-Slavic pages (*viňaga), we work around this with Church Slavonic: {{l|cu|асдф}} or Church Slavonic: {{desc|cu|асдф|nolang=1}}. That is not very convenient; we should have a separate etycode for Church Slavonic.

We should also have an etycode for Czech Moravian, which is also used pretty often on Proto-Slavic pages (and in many etymological dictionaries); Serbo-Croatian has templates like that (ckm, sh-kaj, sh-tor). Sławobóg (talk) 12:53, 5 February 2023 (UTC)Reply

@Павло Сарт, Atitarev, Kamen Ugalj, Skiulinamo, Rua, ZomBear, Bezimenen, IYI681, Vininn126 pinging some people that might be interested. Thadh (talk) 13:03, 5 February 2023 (UTC)Reply
 Support @Sławobóg I completely agree with you, we need a separate etymological code for the usual Church Slavonic language. I constantly thought about it, why is it not there.. --ZomBear (talk) 19:32, 5 February 2023 (UTC)Reply
 Support for Church Slavonic Безименен (talk) 13:45, 7 February 2023 (UTC)Reply
 Oppose for Czech Moravian: there would be 20-30 more regional varieties that could spring if one started Balkanizing Slavic languages + I don't want to give food for thought to Z-Russians. There are already talks for forging Novorussian, Transnistrian, or Lipovan Russian in order to justify their expansive aspirations over former Imperial Russian territories. Безименен (talk) 13:45, 7 February 2023 (UTC)Reply
 Support for Church Slavonic AshFox (talk) 11:51, 6 January 2025 (UTC)Reply
 Support for Church Slavonic, or at least there should be a concrete way to handle non canonical words. Chihunglu83 (talk) 11:55, 6 January 2025 (UTC)Reply
@Sławobóg @AshFox @Павло Сарт, @Atitarev, @Rua, @ZomBear, @Bezimenen, @IYI681, @Thadh
It's been a long time, but rereading this thread, we can see at least 5 people for splitting Church Slavonic. I propose to split Church Slavonic and give it etymology codes for the two variants, which I think best matches consensus by number of votes, even if there is disagreement within that. As to Moravian, I think it would be a safe split, but we only had 2 people speak up on it. I'd like the input of other Czech editors. I'll also add Wiktionary:Language_treatment_requests#East_Lechitic_typology and say that the dialect groups all got etymology codes and it has not led to more codes and has overall been a massive benefit. Vininn126 (talk) 12:11, 6 January 2025 (UTC)Reply
@Vininn126: What two variants do you mean? Russian and Serbian? Russian and Croatian? I think this was always the issue with splitting, because we don't have enough people that could comment on which varieties can and cannot be handled together. Thadh (talk) 16:07, 6 January 2025 (UTC)Reply
Perhaps those in favor of splitting could comment. Vininn126 (talk) 16:15, 6 January 2025 (UTC)Reply
It appears it should have 4 variants. Vininn126 (talk) 09:48, 7 January 2025 (UTC)Reply
I have made zls-chs at Module:languages/data/exceptional. As for etycodes for the recensions and setting the East South Slavic languages as descendants, this thread should be expanded. At this time an etycode for Moravian needs more input as well. Vininn126 (talk) 10:24, 7 January 2025 (UTC)Reply
> Etym-codes for recensions of Church Slavonic. AshFox (talk) 12:09, 11 January 2025 (UTC)Reply

I also propose to do away with similar problems in the tree of Slavic languages once and for all. I suggest:

  • South Slavic:
1. Add etymological code for Old Serbo-Croatian (zls-osh). With a redirect to modern Serbo-Croatian. Meets regularly in {{R:sla:ESSJa}}.
2. Add etymological code for Old Slovene (zls-osl). With a redirect to modern Slovene. Meets regularly in {{R:sla:ESSJa}}.
3. Move the Macedonian language to the descendant of Old Church Slavonic, as it was done some time ago with the Bulgarian language.
4. Add etymological code for Church Slavonic (cu-chu). Perhaps even with a division into Russian Church Slavonic (cu-rcu), Serbian Church Slavonic (cu-scu) and others, if any.
  • West Slavic:
1. Add etymological code for Middle Polish (zlw-mpl). With a redirect to modern Polish or (?). @KamiruPL, Vininn126
2. Add etymological code for Old Slovak (zlw-osk). With a redirect to modern Slovak. It was high time to do it! Meets regularly in {{R:sla:ESSJa}}. Especially if even Early Modern Czech (cs-ear) was awarded a separate code.
3. Possibly add (family code) a Czech–Slovak languages (zlw-csk) ?. Just like there are Lechitic (zlw-lch) F.
4. It's possible: add etymological code for "Old Sorbian" (see Wendish/Lusatian ?) (zlw-osb)? Perhaps with a redirect to Upper Sorbian or (?).
  • East Slavic:
1. Rename etymological codes Old Ukrainian (zle-ouk) & Old Belarusian (zle-obe) → Middle Ukrainian (zle-muk) & Middle Belarusian (zle-mbe), respectively. A similar request from another user was about six months ago (Wiktionary:Beer parlour/2022/September#“Old Ruthenian” language). Therefore, with "Old" for those languages, these are "parts" of Old East Slavic until the 14th c. (this is indicated on the en.Wikipedia).
2. Probably it is worth removing the Old Novgorod from the descendants of the Old East Slavic. Make it a separate and parallel ancient language in the East Slavic subgroup. --ZomBear (talk) 19:32, 5 February 2023 (UTC)Reply
3. Add etymological code for Pannonian Rusyn with a redirect to Rusyn (rue).
  • PS: LOL, I'm serious, add an etymological code for "Early Proto-Slavic" (sla-ear) (?) with a redirect to Proto-Balto-Slavic (?). Because Wiktionary "for the standard" uses a rather late version of the Proto-Slavic language. And sometimes in the Etymology section it may be necessary to indicate an earlier form, and the presence of a separate etym-code for "Early PSl." would not be superfluous. --ZomBear (talk) 19:50, 5 February 2023 (UTC)Reply
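For readers following the list above, the "etymology code with a redirect" idea can be pictured as a small lookup table. This is only an illustrative Python sketch, not how Wiktionary's language modules are actually written: the table name and the helper function are hypothetical, the codes are the ones proposed in this thread (not a record of what has been implemented), and proposals whose redirect target was left open with "(?)" are deliberately omitted.

```python
# Hypothetical illustration of "etymology-only code -> full language" redirects.
# Codes and names are taken from the proposal in this thread; this is NOT
# Wiktionary's real data format and does not reflect what was implemented.
PROPOSED_ETYM_CODES = {
    "zls-osh": ("Old Serbo-Croatian", "Serbo-Croatian"),
    "zls-osl": ("Old Slovene", "Slovene"),
    "zlw-osk": ("Old Slovak", "Slovak"),
}


def redirect_target(code):
    """Return the full language that a proposed etymology-only code points to."""
    _name, parent = PROPOSED_ETYM_CODES[code]
    return parent


def variety_name(code):
    """Return the display name of the proposed etymology-only variety."""
    name, _parent = PROPOSED_ETYM_CODES[code]
    return name
```

Under this picture, an etymology section could name the older stage ("Old Slovene") while links would still resolve to entries of the modern language.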
I don't think any "Old Sorbian" is attested. Both Upper Sorbian and Lower Sorbian are attested only from the 16th century, and they were already distinct at that point. In theory there could be a code for Proto-Sorbian, but it would have to be a full-fledged protolanguage, not an etymology-only language. —Mahāgaja · talk 20:17, 5 February 2023 (UTC)Reply
@Mahagaja Yeah, I'm not sure about "Old Sorbian" either. This suggestion is only possible. I relied on the fact that in {{R:sla:ESSJa}} sometimes there are words with abbreviations "ст.-луж."/"др.-серболуж." ("старолужицкий"/"древнесерболужицкий" = translation "Old Sorbian") without specifying where the word belongs - to the Upper or Lower Sorbian language. --ZomBear (talk) 21:09, 5 February 2023 (UTC)Reply
@ZomBear: I agree with most of your suggestions, except for Old Serbo-Croatian and Old Sorbian. Serbs and Croats never had an organized shared language until 17-18 century. One could perhaps talk about an Old Serbo-Croatian stage in the development of the Dinaric Slavic complex, but there never was a common language that could be associated with this period (leaving aside the Bosno-Rascian recension of Church Slavonic or Glagolitic Croatian). The same holds in even greater magnitude for Sorbian. Sorbs may self-identify as one people ethnically, but linguistically their languages are noticeably divergent.
PS I also don't see much educational value in copying all the distinctions that you can find in ESSJa. Note that it often gives old spellings that precede various spelling reforms, dialectal forms which don't follow any orthographic standard, morphological variants (like diminutive forms, etc.) which don't contribute much additional insight, it provides local colloquial meanings which are clearly recent innovations, etc. I personally prefer a more concise and economic presentation for reconstructed terms rather than having 10-15 dialectal spellings of Serbo-Croatian or those monstrosities that are given as dialectal variants of Polish/Bulgarian/Slovenian by ESSJa. In my opinion, such information should go on the respective page of the daughter language, rather than bloating the Proto-Slavic Descendants section.
PS2 Early Proto-Slavic is a useful designation; however, I don't know exactly where one should draw the border between Early, Middle and Late Proto-Slavic and what notation should be applied. Безименен (talk) 13:30, 7 February 2023 (UTC)Reply
As it stands, Middle Polish is listed as a variant of Modern Polish. We do see some significant phonological changes and a few semantic ones as well, however, it's hard to say whether it should have its own code or not. Even if it did, it would certainly be a redirect to Modern Polish, seeing as it's a period of only about 1250 years. (1500-1750). Vininn126 (talk) 13:36, 7 February 2023 (UTC)Reply
@Vininn126: That's 250 years. —Mahāgaja · talk 15:16, 7 February 2023 (UTC)Reply
The one and the two are right next to each other.

Polish Silesian and Silesian


@Shumkichi @KamiruPL The Cieszyn Silesia Polish category has many terms that should probably be moved to Silesian proper. Can we figure out which ones we need to fix? Vininn126 (talk) 12:29, 8 March 2023 (UTC)Reply

Also maybe @Hythonia, @Sławobóg Vininn126 (talk) 12:30, 8 March 2023 (UTC)Reply
Idk where Silesian proper starts and Silesian Polish ends so I don't think I'll be of much help o_ _ _ _ _ _ _ _ _ _ _ _ O Maybe let's just assume they'd all be used in Silesian anyway, and then we can add Polish headers to the few entries that can be considered dialectal Polish after we find some sources later??? Shumkichi (talk) 13:33, 8 March 2023 (UTC)Reply
@Vininn126, Shumkichi Not to throw a monkey wrench into this discussion but ... I read the Wikipedia article on Silesian and it seems there's debate over whether it's a separate language as well as a not-yet-established writing system. Given this, I wonder if it wouldn't be better to unify Silesian and Polish similarly to the way that all Chinese lects as well as Serbo-Croatian are unified. The motivation here is practical: it's significantly more difficult to implement and maintain all the infrastructure for two separate L2's vs. one unified L2, and the minority status of Silesian means it's likely to not get much love as a separate L2 (compare the situation with Jeju vs. Korean and Scots vs. English). Benwing2 (talk) 06:19, 16 March 2023 (UTC)Reply
@Benwing2 I've actually been trying to do some research on this. One problem with that system are the politics involved - there is a considerable Silesian group that consider it separate. I've also been trying to do some research on the pronunciation, but there are some major difference that point to Silesian having come from an older variant of Polish, as opposed to a modern one. And as to the orthography, recently, Ślabikorz śląski was introduced and has been fairly widely adapted, even silling.org has a normalizer - I've included all of this in WT:About Silesian, and I would actually like to go through all the entries and do a major cleanup. I've even been trying to set up other infrastructure. Vininn126 (talk) 09:59, 16 March 2023 (UTC)Reply
As to the fact of it coming from an older variant - there are significant sound differences, such as maintaining distinctions from previous long vowels, having more of a 7 vowel system like in Italian, and some significant grammatical differences like continuing the old aorist in a past tense system that's completely different. Vininn126 (talk) 10:22, 16 March 2023 (UTC)Reply
@Vininn126 I think it's a mistake to conflate whether language A and B are different languages with whether they need separate L2's in Wiktionary. IMO the latter question should be determined by what makes for less work and duplication. If the majority of terms in Silesian are the same as in Polish (which I suspect they are), it might make sense to unify them. The current set of lemmas is non-representative in that it mostly covers lemmas that are different in Silesian. Benwing2 (talk) 15:25, 16 March 2023 (UTC)Reply
@Benwing2 In order to determine that we need more data, and currently there aren't any major Silesian dictionaries aside from Silling, which is relatively new and is currently doing a massive import of words. Currently they are importing a Polish-Silesian dictionary, so based on that alone it would suggest a lot of sharing. However, further work needs to be done to determine how different they really are. As someone who works with it more, I'd say it's not any more different than some of the differences between other Slavic languages, which are remarkably similar. Vininn126 (talk) 15:34, 16 March 2023 (UTC)Reply
@Vininn126: Makes sense, thanks. Benwing2 (talk) 15:42, 16 March 2023 (UTC)Reply
@Benwing2 And I think you didn't understand his point. Silesian is not a dialect of Polish since it doesn't come from modern Polish - they both come from Middle Polish (or you could call it Middle Silesian, it doesn't matter, it's just that Polish's always had more speakers, hence the privileged position of Polish over other dialects). That's why your comparison to Serbo-Croatian makes no sense since S-C. is a single language with most of its officially recognised "varieties" not even being different dialects nor even subdialects but simple local variants with at most a few different words, lol. Silesian and Polish, on the other hand, are full of seemingly small but SYSTEMATIC differences that all add up to them being sufficiently different (more so than e.g. Czech and Slovak, I'd say). And the important thing is that they differ not only in vocabulary but also in syntax.
"If the majority of terms in Silesian are the same as in Polish (which I suspect they are)" - no, they are not the same, and your suspicion is wrong. It's as if you looked at the spelling of some Kashubian words and compared them to their Polish cognates - yes, their orthographies are quite similar but it's just a superficial similarity. Shumkichi (talk) 20:17, 16 March 2023 (UTC)Reply
@Shumkichi Don't get all worked up over this. You didn't even read the first line of my comment: "I think it's a mistake to conflate whether language A and B are different languages with whether they need separate L2's in Wiktionary." Benwing2 (talk) 20:33, 16 March 2023 (UTC)Reply
@Benwing2 I'm not worked up??? And I did read it, that's why I said the orthographies are different, and that's enough NOT to merge Silesian entries with Polish ones. Polish has an official body that regulates its orthography so it can't use two different spelling norms that also differ in pronunciation. Capisci? Shumkichi (talk) 20:55, 16 March 2023 (UTC)Reply
Also, according to your argument, we should merge Czech and Slovak. But KKK, as they say in Polent. Shumkichi (talk) 20:56, 16 March 2023 (UTC)Reply
Alright, let's cool it here. It seems like Silesian is here to stay at least for the time being. Vininn126 (talk) 21:17, 16 March 2023 (UTC)Reply


Renaming Proto-Mon-Khmer to Proto-Austroasiatic


Proto-Mon-Khmer is deprecated. The name of Category:Proto-Mon-Khmer language needs to be changed to Category:Proto-Austroasiatic language, just like how we have Category:Proto-Sino-Tibetan language rather than Category:Proto-Tibeto-Burman language. See the Wikipedia article on Austroasiatic languages to get an idea of why Mon-Khmer is no longer valid, because Munda and Nicobarese are simply regular branches that are sisters of the other so-called Mon-Khmer languages.

The page names can simply be renamed, and the lemmas do not need to be changed. Category:Proto-Sino-Tibetan language is a perfect example of this. The Proto-Sino-Tibetan lemmas are actually all Proto-Tibeto-Burman reconstructed forms by James A. Matisoff, who considers Tibeto-Burman to be a branch of Sino-Tibetan. Now, more scholars are thinking that Chinese is simply another regular sister branch of the various Sino-Tibetan languages out there, rather than its own special branch. Same goes for Mon-Khmer.

So how can this name change be done? Ngôn Ngữ Học (talk) 22:23, 18 March 2023 (UTC)Reply

Formerly:

  • Austroasiatic
    • Munda
    • Mon-Khmer (which Shorto reconstructed)
      • (about a dozen branches)

Now the consensus is that the tree has a rake-like structure (per Sidwell):

  • Austroasiatic
    • (about a dozen branches including Munda)

That's why Mon-Khmer is an obsolete term now.

Similarly, with Sino-Tibetan, it formerly was:

  • Sino-Tibetan
    • Chinese
    • Tibeto-Burman (which Matisoff reconstructed)
      • (dozens of branches)

Now the consensus among many scholars is that the tree has a rake-like structure with many "fallen leaves" (quoting George van Driem), making Tibeto-Burman obsolete:

  • Sino-Tibetan
    • (dozens of branches including Chinese)

Ngôn Ngữ Học (talk) 22:27, 18 March 2023 (UTC)Reply
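The difference between the two tree shapes above can also be shown as data. The following Python sketch is an illustration only: the variable and function names are hypothetical, and the "about a dozen branches" are abbreviated to two placeholders (Monic and Khmeric, both named elsewhere in this thread). It shows that under the rake-like tree every branch sits directly under the root and there is no Mon-Khmer node left for a proto-language to attach to.

```python
# Simplified, hypothetical trees; only two of the "about a dozen" branches
# are listed, purely as placeholders.
OLD_TREE = {
    "Austroasiatic": {
        "Munda": {},
        "Mon-Khmer": {"Monic": {}, "Khmeric": {}},
    }
}

RAKE_TREE = {
    "Austroasiatic": {"Munda": {}, "Monic": {}, "Khmeric": {}},
}


def path_to(tree, target, trail=()):
    """Return the tuple of node names from the root to target, or None if absent."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == target:
            return here
        found = path_to(children, target, here)
        if found is not None:
            return found
    return None


old_path = path_to(OLD_TREE, "Khmeric")    # passes through the Mon-Khmer node
new_path = path_to(RAKE_TREE, "Khmeric")   # hangs directly off the root
```

In the old tree a Khmeric reconstruction would descend from Proto-Mon-Khmer; in the rake-like tree its only ancestor is Proto-Austroasiatic, which is the reason given for the rename.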

Support. If this change happens we should delete Category:Mon-Khmer languages. Benwing2 (talk) 23:41, 18 March 2023 (UTC)Reply
Abstain. I prefer to wait until an actual new reconstruction of Proto-Austroasiatic is published to do the move, see what I wrote at Wiktionary:About Proto-Mon-Khmer, but I do not actually oppose moving now. However, if the move does happen, I would like to see a line like "This reconstruction is from Shorto (2006) for the obsolete concept of Proto-Mon-Khmer, and should not be treated as actual reconstruction of Proto-Austroasiatic, which as of now has not yet fully materialized, and is simply "placeholder" for the actual Austroasiatic etymologies" (probably as a template) to be added as a warning for every reconstruction item. I very much want the same thing to happen to "Proto-Sino-Tibetan", considering a lot of them are nowhere near actual Proto-Sino-Tibetan, and the reconstruction items themselves are "icky", to say the least. PhanAnh123 (talk) 01:52, 19 March 2023 (UTC)Reply
@PhanAnh123: Take a look at Sidwell's Proto-Austroasiatic reconstruction and Shorto's Proto-Mon-Khmer reconstruction. Sidwell's inclusion of Munda and Nicobarese had virtually no impact on his Proto-Austroasiatic reconstruction (versus if he had only included the "Mon-Khmer" languages) because he considered Munda to be highly innovative and restructured, with few original retentions from Proto-Austroasiatic. Furthermore, it would be very confusing to have duplicates for both Proto-Austroasiatic and Proto-Mon-Khmer. I would just merge them as Proto-Austroasiatic. Ngôn Ngữ Học (talk) 19:25, 19 March 2023 (UTC)Reply
I have no intention of keeping Proto-Austroasiatic and Proto-Mon-Khmer separated (I consider Proto-Mon-Khmer to be likely a ghost after all); what I mean is that we should either keep the entries as they are until an actual Proto-Austroasiatic reconstruction comes about, or move the "Proto-Mon-Khmer" items to Proto-Austroasiatic but with the warning added. I know what you mean by "inclusion of Munda and Nicobarese had virtually no impact", because like Sidwell, I do think these branches are quite innovative; however, that does not mean I agree to move Shorto's Proto-Mon-Khmer reconstruction to Proto-Austroasiatic without any warning, since Austroasiatic linguistics has progressed quite a lot even outside of those two branches. The vocalism in Shorto (2006) was reconstructed in a very rudimentary way, which the reconstruction of the descendant branches as well as the recent "sneak peek" at the Proto-Austroasiatic reconstruction by Sidwell improved upon; furthermore, the syllable structure itself is also slightly changed: it is now thought that a glottal stop was phonetically present in any Proto-Austroasiatic word that ended in a pure vowel (meaning any word that ended in *aːj would still have *aːj, but those that ended in **aː would automatically become *aːʔ), plus there is the status of *ʄ- that very much awaits assessment in the actual reconstruction of Proto-Austroasiatic. Like I said, I don't oppose moving, but there must be strings attached. PhanAnh123 (talk) 01:53, 20 March 2023 (UTC)Reply
@PhanAnh123, Ngôn Ngữ Học Such a warning can be added by bot to the top of all entries if both of you agree. Benwing2 (talk) 03:30, 20 March 2023 (UTC)Reply
@Benwing2: Agree, a warning placed by a bot should be sufficient. Also @PhanAnh123, we can use Sidwell & Rau (2015) for some of the basic Swadesh list words, but a full reconstruction of Proto-Austroasiatic is currently being done by Sidwell. It should come out in a few years. Ngôn Ngữ Học (talk) 10:19, 20 March 2023 (UTC)Reply
We are all in agreement then, so obviously now I support moving. With this Munda cognates can be directly added to the entries. PhanAnh123 (talk) 10:29, 20 March 2023 (UTC)Reply
Agree on the support.
Abstain Support. I've seen assertions that Mon and Khmer actually form a subgroup within the traditional Mon-Khmer grouping. Of course, it could be something messy as with Indo-European, where we have at least Indo-Iranian and Balto-Slavonic. --RichardW57m (talk) 16:19, 21 March 2023 (UTC)Reply
There is no such thing as a Mon+Khmer grouping within Mon-Khmer. Some classifications propose Eastern, Southern, and Northern groupings within Mon-Khmer, but none of them put Monic and Khmeric together. Please consult the Austroasiatic languages article on Wikipedia to get a basic refresher of all the major previous classifications. Ngôn Ngữ Học (talk) 15:04, 23 March 2023 (UTC)Reply
The cited articles do show that their crown group is larger than Monic + Khmeric, but it does look as though we don't need to worry about anyone using 'Mon-Khmer' to denote their (weak) association. --RichardW57m (talk) 11:36, 28 March 2023 (UTC)Reply

Renaming Proto-Hmong to Proto-Hmongic

  1. Category:Proto-Hmong language needs to be changed to Category:Proto-Hmongic language. See Hmongic languages and Hmong language on Wikipedia.
  2. Category:Proto-Mien language needs to be changed to Category:Proto-Mienic language. See Mienic languages and Iu Mien language on Wikipedia.

The Hmong-Mien language tree is like this:

  • Hmong-Mien
    • Hmongic
      • Hmong
      • (dozens of languages)
    • Mienic
      • Iu Mien
      • (several languages)

Proto-Hmong thus refers only to Hmong, not Hmongic. There are dozens of Hmongic languages that are not Hmong. They include Hmu, Pa Hng, Bunu, She, and others.

Same goes for Proto-Mienic. Proto-Mien technically refers to Proto-Iu Mien, but does not include Kim Mun, Biao Min, and Dzao Min.

Ngôn Ngữ Học (talk) 22:23, 18 March 2023 (UTC)Reply

Support. If we make this change we also need to rename the families, i.e. Category:Hmong languages -> Category:Hmongic languages and Category:Mien languages -> Category:Mienic languages. This is similar to the change from Category:Korean languages -> Category:Koreanic languages, which was implemented in Jan 2022. Benwing2 (talk) 23:45, 18 March 2023 (UTC)Reply
 Support. Theknightwho (talk) 17:57, 1 June 2023 (UTC)Reply


Okinoerabu and Tokunoshima

Discussion moved from Wiktionary:Beer parlour/2023/June.

These are two Ryukyuan languages that we currently call Oki-No-Erabu and Toku-No-Shima, because that’s how they’re spelled in ISO 639. However, literature invariably uses the unhyphenated forms, and they’re also much easier to read.

Could we please therefore rename them to the unhyphenated forms? Theknightwho (talk) 19:39, 4 June 2023 (UTC)Reply

I dislike the EN penchant for glomming Japanese names into long undifferentiated strings, as I find that this instead makes them harder to read, and it erases the distinction between the actual component terms.
In some cases, the resulting interpretation or partial-expansion goes sideways, as we see at w:Tokunoshima, where the English text describes this as "Tokuno Island" -- the no portion is simply the genitive particle (no), so as Japanese, this is better thought of as "Toku Island".
That aside, I do see that w:Tokunoshima language lists the alternative rendering "Toku-No-Shima", and the w:Okinoerabu dialect cluster similarly lists the alternative rendering "Oki-no-Erabu". A quick-and-dirty Google hits comparison (including "the" to filter for English hits):
In the English-language web, the allthewordsruntogether renderings appear to be most common. Meanwhile, the Language Subtag Registry based on ISO 639 and maintained by IANA (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) does indeed use the hyphenated descriptors.
Meh. After digging into this some, I realize I just don't care all that much one way or the other. ‑‑ Eiríkr Útlendi │Tala við mig 22:09, 9 June 2023 (UTC)Reply
Searching on Google Scholar, it seems the unhyphenated forms are more common, but I concur with Eirikr's views that they look worse.
However, I would suggest that if we were to retain the hyphens, the two languages should be renamed to "Oki-no-Erabu" and "Toku-no-shima" (or the rarer "Toku-no-Shima"), since these are more common on Google Scholar, and also because "no" is a particle that shouldn't be capitalised in a proper noun, cf. Southend-on-Sea, Stoke-on-Trent or von, de, etc. in surnames. – Wpi (talk) 11:20, 21 June 2023 (UTC)Reply


Correct language names


Could you correct Juǀ'hoan to Juǀʼhoan, Kwak'wala to Kwakʼwala, and K'iche' to Kʼicheʼ? There's no punctuation in the ethnonyms. If we want to use assimilated English forms, then the latter would be Quiché; I'm not sure about Juǀʼhoan. kwami (talk) 19:16, 13 July 2023 (UTC)Reply

  •  Support. To clarify for people using low-resolution screens: the request is to use the modifier letter apostrophe character ʼ rather than the typewriter apostrophe '; the categories are currently at Category:Juǀ'hoan language (ktz) and Category:K'iche' language (quc). Our usual practice is to use the spelling most common in contemporary English-language discussions of the language. Which is more common in current books and journal articles, Kʼicheʼ or Quiché? —Mahāgaja · talk 19:30, 13 July 2023 (UTC)Reply
    Just to be clear, I personally don't care about ASCII substitutions in category names; what I'm concerned about is proper headers in the dictionary entries. But it's fine by me if the two go together.
As for Kʼicheʼ or Quiché, the English-language lit has been moving from the Spanish form to the ethnonym. That's an ongoing trend, though of course not universal (e.g. 'German', 'Greek', 'Armenian' etc.). kwami (talk) 21:15, 13 July 2023 (UTC)Reply
The L2 headers and category names do need to match, at least for readers using tabbed browsing. Otherwise, the categories won't appear in the correct language tab. I think there are also bots that require the L2 header to be the canonical language name in order to work properly. —Mahāgaja · talk 22:20, 13 July 2023 (UTC)Reply
Okay. Works for me. kwami (talk) 22:24, 13 July 2023 (UTC)Reply
@Kwamikagami Normally at Wiktionary we use typewriter apostrophes rather than curly single quotes, and this issue is somewhat controversial, so this change is unlikely to happen without significant further discussion and consensus. Benwing2 (talk) 04:27, 24 July 2023 (UTC)Reply
I'm not requesting quote marks. That would also be incorrect. Rather, since we are attempting to use the endonym, IMO it should be the glottal stop or ejective diacritic that's in the orthography. kwami (talk) 04:41, 24 July 2023 (UTC)Reply
Indeed, no one is advocating curly single quotes. The modifier letter apostrophe is a different character; it's a letter, not a punctuation mark. There are several other language names besides these two that ought to be using it. —Mahāgaja · talk 06:23, 24 July 2023 (UTC)Reply
Sarci, for example, which was just moved to its endonym (minus tone marking). But I thought I'd wait to see how things went before attempting a more comprehensive proposal. kwami (talk) 06:27, 24 July 2023 (UTC)Reply
 Support - this isn't a matter of using curly quotes vs straight ones; it's a matter of using the correct letter instead of punctuation. We already do this extensively in entries for languages that use it anyway. Theknightwho (talk) 15:39, 24 July 2023 (UTC)Reply
Going through WT:LOL, these are the languages whose names have the modifier letter apostrophe at Wikipedia but the typewriter apostrophe here:
Other languages with typewriter apostrophe whose Wikipedia article uses a different character include:
  • gez Ge'ez → Geʽez with ʽ (U+02BD modifier letter reversed comma)
  • hps Hawai'i Pidgin Sign Language → Hawaiʻi Pidgin Sign Language with ʻ (U+02BB modifier letter turned comma)
  • num Niuafo'ou language → Niuafoʻou with ʻ (U+02BB modifier letter turned comma)
  • tct T'en → Tʻen with ʻ (U+02BB modifier letter turned comma)
  • tsl Ts'ün-Lao → Tsʻün-Lao with ʻ (U+02BB modifier letter turned comma)
I support making all of these changes. —Mahāgaja · talk 19:54, 24 July 2023 (UTC)Reply
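For readers who find these characters hard to tell apart on screen, the following minimal Python sketch (an illustration added for clarity, not part of any Wiktionary tooling) prints the code point, Unicode name and general category of each apostrophe-like character mentioned in this thread. It uses only the standard `unicodedata` module; the names `CANDIDATES`, `describe` and `is_letter` are made up for this example. The right single quotation mark is included only for comparison, since curly quotes came up above.

```python
import unicodedata

# Apostrophe-like characters mentioned in this discussion:
# U+0027, U+02BC, U+02BB, U+02BD, plus U+2019 for comparison.
CANDIDATES = ["'", "\u02bc", "\u02bb", "\u02bd", "\u2019"]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def is_letter(ch):
    """True if Unicode classifies the character as a letter (category L*)."""
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    for ch in CANDIDATES:
        code, name, cat = describe(ch)
        kind = "letter" if is_letter(ch) else "not a letter"
        print(code, name, cat, kind)
```

The three modifier letters fall in category `Lm` (modifier letter), while the typewriter apostrophe and the curly quote fall in punctuation categories, which is the distinction being argued over here.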
I oppose these changes. What is the actual benefit? From the above discussion, there are at least three different Unicode apostrophe-like characters involved, which are easily confused, and it will make it significantly harder to type the language names into headers, categories and the like. This is going to be a major pain in the ass for people like me who will have to clean up wrongly-typed apostrophes in language headers in innumerable articles created by IPs and other occasional contributors, who are unlikely to be able to type the right character. Furthermore, even with these changes, the language names in many cases will not actually match their endonym spelling; cf. the proposed Oʼodham, which is actually spelled ʼOʼodham natively with two apostrophes. Similarly, as pointed out by User:Kwamikagami, our spelling of the CAT:Tsuut'ina language doesn't include the tone mark that is present in the native orthography, and wouldn't even with the change in apostrophe. I should add that Wikipedia uses these Unicode chars specifically because Kwami went around renaming all the articles (formerly they used the straight apostrophes), and is not consistent, e.g. the article on the name of the people is still at O'odham with a straight apostrophe. Glottolog uses straight apostrophes for O'odham; so does the Endangered Languages Project. In general, our policy is to use the *English* names for languages; we are not forced to use the exact native spelling. While I agree it's a good idea to approximate the spelling (e.g. avoiding exonyms where possible), I disagree we have to take this to the extreme of using the "correct" Unicode apostrophes (which I bet you will find native speakers not using in many cases as well). Benwing2 (talk) 20:22, 24 July 2023 (UTC)Reply
Other people's carelessness in using Unicode is no excuse for us to be careless, and anyway, language names can always be inserted by typing {{subst:\|xyz}}, which doesn't involve any non-ASCII characters. Latin a and Cyrillic а look identical in every font and font style too, but substituting one for the other is an error; it's no different with ' and ʼ. —Mahāgaja · talk 07:05, 25 July 2023 (UTC)Reply
I think you're missing the point. We don't include Cyrillic letters in language names, either. Benwing2 (talk) 07:13, 25 July 2023 (UTC)Reply
I know that. My point is that using ' where ʼ belongs is as bad as using Cyrillic letters in Latin-script language names. —Mahāgaja · talk 07:24, 25 July 2023 (UTC)Reply
I would support the changes, but only if they're truly the most used forms in terms of literature. Ideally we'd have people from each community give their opinions here, but alas, we're not afforded that. If the specific respective unicode apostrophe is used in literature, then we can use it here too. I can see the problem with inputting the apostrophes that's been brought up, but let's be real here, how many people are actually working on these languages to where this'd be a serious problem? I feel like this could be fixed with just an about:XYZ page or something. These languages unfortunately don't get enough traction. But again, I'd only support this if it can be proven that they're the forms used in English literature. AG202 (talk) 01:49, 17 August 2023 (UTC)Reply
@AG202 I agree with you, that is one of the points I made above, which has gotten lost in this thread. Benwing2 (talk) 02:08, 17 August 2023 (UTC)Reply
Ahh, got it, missed that, apologies. AG202 (talk) 02:11, 17 August 2023 (UTC)Reply
Hmm... like Benwing, my initial inclination is to oppose this, because the odds of anyone being able to type names with the fancy characters when adding entries is low (and given recent events, I wonder if one or more admins would block people for 'adding wrong language names' if people keep typing the names they're able to type). OTOH, I recognize that we require entries themselves to be input using correct spellings (with accents etc) and not in hacky ways... If we had a system like the French Wiktionary where no-one had to type the language names (instead only typing language codes, which only consist of easily-typeable ASCII characters), then changing the displayed character would be less of a problem (though still hard for navigating to categories, etc). Do we have a template with a simple short name people could subst: to produce the untypeable names, so they could write =={{subst:langname|foo-bar}}== to get ==Fooʾbar==? Or if we took this type of functionality and had a button people could periodically press (hosted on here like that Javascript is, not as a Python script on the computer of a user who might leave the project or be too busy to run it) that would search the database for instances of the typeable names and update them to the untypeable names, then it would be less of a problem (although it'd still be creating an unending maintenance task). - -sche (discuss) 16:22, 16 August 2023 (UTC)Reply
We do have {{subst:x2i}} that will convert the string _> to ʼ, but more helpfully we have (as I mentioned above) {{subst:\}}, which converts a language code to its canonical name. —Mahāgaja · talk 21:55, 16 August 2023 (UTC)Reply
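The periodic clean-up idea raised above could be sketched roughly as follows. This is only an illustration in Python, not an existing Wiktionary gadget or bot: the `CANONICAL` table and the `fix_l2_headers` function are hypothetical names, and the two table entries simply assume that the renames proposed in this thread were adopted.

```python
import re

# Hypothetical table: typeable spelling -> canonical spelling.
# Assumes the renames proposed in this thread were adopted
# (U+02BC modifier letter apostrophe replaces U+0027).
CANONICAL = {
    "K'iche'": "K\u02bciche\u02bc",
    "Kwak'wala": "Kwak\u02bcwala",
}

# Matches a level-2 header such as ==K'iche'== on a line of its own.
# Deeper headers (===Noun===) do not match, because the name may not
# contain "=" and must be followed by exactly "==" at the end of the line.
L2_HEADER = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.MULTILINE)


def fix_l2_headers(wikitext):
    """Rewrite L2 headers whose name has a known canonical spelling."""
    def replace(match):
        name = match.group(1)
        if name in CANONICAL:
            return "==" + CANONICAL[name] + "=="
        # Unknown names are left exactly as they were typed.
        return match.group(0)

    return L2_HEADER.sub(replace, wikitext)
```

Because only exact matches from the table are rewritten, running such a pass repeatedly would be harmless, although it would still be the unending maintenance task described above.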
Even with these workarounds, it seems extra work for no gain. There is no rule that says we need to follow native orthography to the T in our English names for languages; otherwise we'd have Deutsch in place of German, and русский in place of Russian, etc. I have seen no arguments that indicate why having these special apostrophes in language names gains us anything except some nebulous sense of "correctness". Benwing2 (talk) 23:07, 16 August 2023 (UTC)Reply
Deutsch is the endonym. What we're talking about here is using the proper Unicode characters for whichever name we decide to use. The apostrophe is a punctuation mark, and the glottal stop is not punctuation. Using the letter for glottal stop is analogous to using en-dashes and minus signs rather than hyphens. kwami (talk) 00:28, 17 August 2023 (UTC)Reply
Deutsch is the endonym
Yes exactly. The exonym can have apostrophes while the endonym has Unicode whatever. Nothing wrong with that. Benwing2 (talk) 00:56, 17 August 2023 (UTC)Reply
@Benwing2 I think we’re getting too focused on Unicode. The thing we should care about is what character is actually intended, which isn’t necessarily the same as what they actually wrote. To use an analogy: we don’t lemmatise the palochka with the numeral 1 or Latin l, even though both are probably more common than the actual palochka character, and that’s because we all know that the writer intended to use a palochka irrespective of what character they actually wrote in Unicode. Theknightwho (talk) 02:18, 17 August 2023 (UTC)Reply
@Theknightwho I think we'll just have to agree to disagree here. I don't think the analogy you are making here with palochka is very applicable and you're still missing the point made by User:AG202 about what's the most common usage in scholarly and other English sources. Benwing2 (talk) 02:24, 17 August 2023 (UTC)Reply
@Benwing2 The whole reason I brought it up is as an example of when the most common usage isn’t necessarily an indicator of what’s most appropriate. I’ve also seen plenty of typography mistakes in scholarly sources, too, or fonts that map common characters to a glyph of what is actually intended. You can’t just rely on the codepoint. Theknightwho (talk) 02:27, 17 August 2023 (UTC)Reply
Just to be clear, when I said common usage, I meant what character is actually intended, not necessarily parsing specifically based on codepoints. However, this isn't an easy task for sure, unfortunately. AG202 (talk) 02:49, 17 August 2023 (UTC)Reply
Doesn't matter whether it's the endonym or exonym: the apostrophe is a punctuation mark, and these are not punctuation marks. Yes, we can substitute, and that's common enough. We could also use a hyphen for a minus or a double hyphen for an em dash -- those substitutions are common too -- but that doesn't mean we should do that. We could substitute click letters with exclamation marks and pipes. But if we want Wiktionary to look professional, then IMO we should typeset it professionally, and not use ASCII substitutes just because they're easier to type. kwami (talk) 04:06, 17 August 2023 (UTC)Reply


Ktunaxa, Secwepemctsín


Could we rename Kutenai (kut) to Ktunaxa, and Shuswap (shs) to Secwepemctsín please? The first names are the Anglicized terms for the languages, and are somewhat outdated and/or not in use among speakers. GKON (talk) 22:46, 12 August 2023 (UTC)Reply

@-sche Can you weigh in here? There is nothing wrong per se with having exonyms for languages (we say "German" not "Deutsch" for example), and I note that Wikipedia still uses Kutenai and Shuswap. The main issue in my view is (a) avoid pejorative terms, and (b) use the most common terms as found in English-language sources. Benwing2 (talk) 23:37, 15 August 2023 (UTC)Reply
For Shuswap, almost no-one uses Secwepemctsín in English, either in books overall as tracked by Ngram Viewer, or in reference works about the language at Glottolog. For kut, Kutenai was the main name (in reference works/Glottolog and overall/Ngrams) until a few years ago, when Ktunaxa started to just barely overtake it. - -sche (discuss) 17:45, 16 August 2023 (UTC)Reply
That is true; however, I would argue that for Shuswap, the use of this term is declining, as seen in Ngram. The replacement looks to be Secwepemc, which is another word for the language that is kind of a good middle ground between Shuswap and Secwepemctsín, wouldn't you say? Also, the actual communities in Secwepemc traditional territory mostly use Secwepemc. For example, if there is some quote or phrase on a billboard in Shuswap, the billboard will say that it's in Secwepemc. Another real-life example was a board in Banff town, which had greetings in multiple languages. Among them were Blackfoot, Stoney, Ktunaxa, and Plains Cree; apart from Ktunaxa, these are all Anglicized terms. However, the greeting in Shuswap was said to be in Secwepemc.
Shouldn't we be using this term, seeing as it gets the most use in these modern times? GKON (talk) 17:09, 20 August 2023 (UTC)Reply

Akan varieties


@-sche This is another mess. Wikipedia has an article Akan languages yet according to both Glottolog and Ethnologue, all varieties are mutually intelligible and better classified as dialects, and indeed we have a single Category:Akan language (code 'ak'). The correct family tree seems to include a top level division into Fante, Twi and Wasa, all of which have ISO 639-3 codes (respectively fat, twi, wss; and Twi has the ISO 639-1 code 'tw' as well). Twi in turn is divided into Asante, Akuapem and Bono. Fante and all three Twi varieties have their own literary standards, and there is also a unified Akan literary standard based primarily on Akuapem. Up until recently, we had {{dialectboiler}} categories for Fante and Twi, called Category:Fante Akan and Category:Twi Akan. I added etym-only varieties for those two as well as for the Twi lects of Asante, Akuapem and Bono. Then I discovered we also have separate languages under Akan for Category:Abron language (= Bono), Category:Wasa language and Tchumbuli (which has no lemmas, and I have no idea what it is). None of these Akan languages have very many lemmas (< 10 each), and as mentioned Tchumbuli has none. I would recommend either we convert Akan into a family and fix up the hierarchy appropriately, or (preferably) we maintain the single Akan language and convert the sublanguages into etym-only varieties. The list of varieties under Category:Akan language is also somewhat messed up (e.g. what is 'Twi-Fante'?), but that is less important. Benwing2 (talk) 18:10, 17 September 2023 (UTC)Reply

Looking into the history (of the codes, on Wiktionary), I think the sub-dialects simply escaped notice at the time Twi, Fante, and Akan were merged. I note that the two Wasa entries we have are identical to Akan, and the Abron ones are very similar. I would merge them; AFAIK the difference was historically in spelling, not in speech, and since the 70s also not anymore in spelling. (I entered the Abron entries a year before the lects were merged, using a reference published two years before the speakers of Abron and the other dialects of Akan unified their orthographies. The Wasa entries were added in 2021 by a Japanese editor, also using an old pre-reform ref, which the user also used for the Akan spelling: we should check what the modern spelling is...) Re "Twi-Fante" being listed as a "variety" of Akan: it was originally listed as an alternative name of Akan; when 'alternative names for the language' and 'names of varieties' were split into being separate parameters, someone must've mis-assigned it. - -sche (discuss) 06:02, 23 September 2023 (UTC)Reply

New language codes for nested Persian translations


Per Wiktionary:Beer_parlour/2023/October#Persian_nested_translations_-_split_or_labelled?

@Sameerhameedy, @Benwing2, @Theknightwho.

New codes and labels, under "Persian" to work with MediaWiki:Gadget-TranslationAdder.js

  1. "prs" - Dari
  2. "fa-cls" - Classical Persian

Considering "fa-ira" for Iranian Persian. Anatoli T. (обсудить/вклад) 05:13, 4 October 2023 (UTC)Reply

Don't we normally use ISO 3166 codes for countries? I'd say it should be "fa-IR". —Mahāgaja · talk 09:24, 4 October 2023 (UTC)Reply
@Mahagaja: Not sure what is right in this case but it must have been done.
Both "prs" and "fa-ira" seem to be working already, but {{t+|اَفْغانِسْتان}} fails to link to fa:افغانستان
Since the code is already working (apart from the interwiki links), automatic nesting should be possible as well.
Need to make "fa-ira" link to "fa" Wiktionary, just like "cmn" links to "zh" Wiktionary. {{t+|cmn|阿富汗}} to zh:阿富汗
@Benwing2, @Sameerhameedy, @Theknightwho: can someone please fix the interwiki link? I think it was @Ruakh who made it work for Mandarin. I'll take a look at nesting. Anatoli T. (обсудить/вклад) 00:07, 13 October 2023 (UTC)Reply
Actually the new codes still don't work with the translation-adder. Some changes to Module:languages/data submodules need to happen. Anatoli T. (обсудить/вклад) 00:19, 13 October 2023 (UTC)Reply
@Mahagaja: "fa-ira" is correct per Module:etymology_languages/data Anatoli T. (обсудить/вклад) 00:33, 13 October 2023 (UTC)Reply
Update: @Sameerhameedy: Language code "prs" can now be used for automatic nested translations: Persian\Dari. Just use the language code "prs" in the translation adder but I wasn't able to tweak modules for "fa-ira" or "fa-cls". Anatoli T. (обсудить/вклад) 02:58, 13 October 2023 (UTC)Reply
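The requested behaviour, where a sublanguage code links to the Wiktionary edition of its parent language, can be illustrated with a small lookup table. The sketch below is in Python purely for illustration; the real logic lives in Wiktionary's Lua modules and JavaScript gadgets and is not reproduced here. The table and function names are hypothetical. The `cmn` to `zh` mapping is the existing behaviour described above, and `fa-ira` to `fa` is the behaviour being requested.

```python
# Hypothetical override table: language code -> interwiki prefix.
INTERWIKI_OVERRIDES = {
    "cmn": "zh",     # existing behaviour described in the thread
    "fa-ira": "fa",  # behaviour requested in the thread
}


def interwiki_prefix(lang_code):
    """Return the Wiktionary edition prefix to link to for a language code.

    Codes without an override are assumed to link to their own edition.
    """
    return INTERWIKI_OVERRIDES.get(lang_code, lang_code)


def interwiki_link(lang_code, term):
    """Build an interwiki link target such as 'zh:阿富汗'."""
    return f"{interwiki_prefix(lang_code)}:{term}"
```

A table-driven fallback of this kind keeps the special cases in one place, so adding another nested code would only need one more entry.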


Splitting Mazurian


I would like to open a discussion about the pros and cons of splitting Masurian as an L2 with the langcode zlw-maz and as a descendant of Old Polish. I would also like to preface this by saying that while I am leaning towards a split, I am not dead-set on it. The argument is as follows:

w:Masurian dialects would benefit a lot from having a separate L2. There are significant differences in pronunciation (extra vowels nonexistent in Polish and a loss of quite a few consonants), grammar (different endings from standard Polish), and vocabulary, especially outside the "core" vocabulary. Even a significant number of basic forms end up looking different from Polish, and it has many inflections and conjugations. I could place them in the tables for Polish, but it might get cluttered. I would like to also point out that {{R:pl:SgOWiM}} exists as a good, reliable source for entries.

Problems of splitting - most people do consider this specifically a dialect, even most speakers, and most forms of it today are heavily Polonized. However, at least up until the 20th century it was distinct and much more difficult to understand in comparison to standard Polish. My problem is that some of these differences are so vast it might not make sense to put them all under Polish. Vininn126 (talk) 21:43, 12 November 2023 (UTC)Reply

A point for not splitting is that some other dialects of Polish might be equally as divergent, such as Łowicz, in some respects. So what might be better is including multiple declension tables and the like. (Notifying KamiruPL, BigDom, Hythonia, Tashi, Sławobóg): , @Benwing2, @PUC, @Thadh Vininn126 (talk) 12:58, 14 November 2023 (UTC)Reply
Here is some sample text The little prince in Mazurian. This channel has some other examples. As someone with high proficiency in Polish, I can understand large parts of it but there's also a significant portion that is very difficult, maybe 65% for me. Vininn126 (talk) 17:28, 14 November 2023 (UTC)Reply
@Vininn126 As you know, I tend to lean towards not splitting in cases of doubt, while Thadh leans towards splitting. Comparisons to multi-dialect languages like Occitan and Ancient Greek might be useful. In this case I don't know, but I think we're hampered by the lack of standardization. Benwing2 (talk) 23:06, 14 November 2023 (UTC)Reply
@Benwing2 There is a notation system widely used for Masurian which is present in the Wikipedia article that I'd be able to use for WT:About Masurian if split. Also, using this system would yield 1) a different pagename, 2) a different pronunciation section (as the notation system is based on the different pronunciation), 3) a different definition section, at least outside of "core" vocab, and core vocab would only share 1-2 defs, as opposed to all of the obsolete senses as well, and 4) a different conjugation/declension section as well. Vininn126 (talk) 08:53, 15 November 2023 (UTC)Reply
I’d be in favour of the split. As a native Polish speaker I find it difficult to understand some Mazurian texts, and e.g. parts of this Mazurian rendition of Colors of the Wind, Farbi Zietrżu, would be straight ungrammatical in Polish (the infinitive construction in cÿsz ti słicháł zilkä wicz ‘have you heard the wolf howl’, which looks more Czech than Polish and in Pl. would have to be reworded as ‘czyś ty słyszał jak wilk wyje’ or ‘wycie wilka’ or something, but the infinitive doesn’t work).
Also I’ll note that Mazurian also keeps some phonemes long gone from standard Polish (like the /r̝/ phoneme written in the song above which Polish merged with ż /ʐ/).
And, @Vininn126, could you include me too in Polish-related discussions when pinging people? I feel left out ;-) // Silmeth @talk 12:28, 15 November 2023 (UTC)Reply
@Silmethule I can add you to the Polish ping group. Yes, the completely different set of phonology and grammar are both big points for me. Masurian also keeps reflexes of Old/Middle Polish pochylone vowels while getting rid of quite a few consonants. Reading up on the Wikipedia article, quite a few experts also claim it's a language. Vininn126 (talk) 14:16, 15 November 2023 (UTC)Reply
Having boned up more on Polish dialectology I'm definitely leaning now more towards split. I haven't been able to find another dialect (that we would mark as such) as divergent as Masurian. There's also a big gap of mutual intelligibility. Vininn126 (talk) 15:48, 20 November 2023 (UTC)Reply
I'll also add I was wrong in the original post - the Masurians had a stronger sense of identity even more so than the neighboring regions. Vininn126 (talk) 16:47, 20 November 2023 (UTC)Reply
I'm still wavering, upon listening to more recordings. It might be possible to automatically generate pronunciation sections (even though they would be very, very different), and then it would just be a matter of giving special definitions a label and then I suppose conjugation/declensions... Vininn126 (talk) 09:17, 28 November 2023 (UTC)Reply
@Silmethule @Mahagaja Another question would be the langcode. Is the one I proposed best? I doubt it. At this point I'm fairly sure we are splitting. Vininn126 (talk) 13:48, 7 December 2023 (UTC)Reply
@Vininn126 Depending on the choice of Mazurian vs. Masurian, it should be zlw-maz or zlw-mas. Benwing2 (talk) 22:21, 7 December 2023 (UTC)Reply
@Benwing2 You're right, so it's probably gonna end up being zlw-mas. Vininn126 (talk) 22:23, 7 December 2023 (UTC)Reply
I'm going to go ahead with this today and make an entry. I've also been able to contact someone educated in this lect and they'll be able to check anything that I (or potentially we, me and him) make. There is a weak consensus it should be split, and if it's handled right I think it will be much better than smushing everything into Polish. Vininn126 (talk) 17:55, 8 December 2023 (UTC)Reply
@Benwing2 @Mahagaja @Silmethule Sorry for all the pings as of late. I figured now would be a good time to take a pause and look at the current state of things after the decision. We currently have 428 Masurian lemmas, Appendix:Masurian pronunciation, Appendix:Masurian Swadesh list, along with various infrastructure. I know this is a lot of material, I ask you to please take a look at some of these and give your input, and I thought now would be a good time before things got too big, and also at this point I am going to slow down.
Of the existing lemmas, I added mostly cognates, so there aren't many words unique to Masuria, but there are plenty of definitions and of course, pronunciations. I haven't been able to do any work with declensions, as Masurian declensions are too complicated for me at the moment, but I can assure you there are plenty of differences.
I also know I gave the impression I was gung-ho for a split, and also for a split for Goral, which isn't the case, I simply found resistance everywhere I went when trying to add Masurian information - some felt it clogged up the main Polish entry, didn't want particular information, other times I heard that it's remarkably different.
Having added all these terms, I can still see it going either way. On one hand, having it split as a language is a view held by some linguists, but not all (always a problem), and I think the orthography we few Masurian editors have been using easily demonstrates the phonemic difference (the template is phonemic except for (literally) 1 or (potentially) 2 phones, those being the ones represented by <ä> (which might be phonemic) and <ÿ> (which I believe is phonemically /i/)).
However, if we merged, as I have seen various reactions to the split, and understandably so, I'd have a few questions.
What would be the best way to represent Masurian pronunciation? We could ignore spelling and put everything under the Polish spelling, using a respelling in the pronunciation module. This is the approach I take with Middle Polish, and it serves me well. For Masurian-only terms (such as szmanta), I'd prefer to keep {{zlw-mas-IPA}}, similar to what we currently have with {{zlw-mpl-IPA}}. However, this leaves us with the issue of <ä> and <ÿ>.
Another potential approach would be to keep the spellings, but I'd be less sure about this, as it works better for British/American English. One potential issue this would solve is the problem of standard Polish definitions absent from Masurian.

One other potential issue is the fact that Masurian would ideally be treated as an LDL. Currently Middle Polish is (not standardly!) treated as an LDL, despite being part of Polish, and it would be a shame to see the potential for someone to RFV all of them (perhaps they won't, but the option exists) and have certain very real terms deleted just because it's considered part of a WDL.

I know there's been a lot of talk about this lately, hopefully there isn't too much fatigue. That is why I decided it might make more sense to review this now and press on later. Vininn126 (talk) 23:40, 18 January 2024 (UTC)Reply
I was asked by Vininn to add my two cents on the issue, so here I go.
I must say I am worried about using language splits in order to circumvent the WT:WDL policy. I understand the frustration of having dialectal terms left undocumented, but there is no way to objectively draw a line between one dialect and another. In the end the smallest unit of a complete language system is an idiolect, and between that and a language family any grouping is ultimately either political or arbitrary.
I'm not sure how to define what is and isn't a language. I would say ISO codes are a good start, and after that splits may be warranted provided that there is abundant literature in the lect, a solid written language, or some major problems in mutual intelligibility... Knowing how Slavic languages are, the last one is probably not the case with these Polish lects. I don't know enough about them to comment on the first two.
With historical lects, a different issue comes up. In my opinion, it is only possible to treat a standard language as a WDL after its standardisation, and so I would prefer lects like Middle Polish to stand separate, like Old Ruthenian, and in my opinion the same should be done with Middle Russian (although this discussion led nowhere). Thadh (talk) 13:26, 19 January 2024 (UTC)Reply
@Thadh As to intelligibility, as mentioned above, I'd say that Masurian (and to a lesser extent Goral) is as intelligible as two other Slavic languages, so somewhat, but also quite difficult for a lot of people. Middle Polish is also the period when standardization really began and, to some extent, solidified. Vininn126 (talk) 13:42, 19 January 2024 (UTC)Reply
@Thadh, Vininn126: regarding mutual intelligibility, my subjective opinion is that Middle Polish is easier for a modern Polish speaker than Masurian (if not because of anything else, then due to exposure in school to 16th and 17th century texts) – but since modern standard Polish does continue the standard that was established during Middle Polish period, I think there’s more to it. Masurian truly feels “foreign”. So if we’re willing to keep Middle Polish as a separate lang, IMO Masurian deserves the treatment too.
But then, regarding the factors of attestation in literature, separate grammar, recognition in separate ISO code, etc. – we’ve merged Classical Gaelic with modern Gaelic langs and it’s still not split – despite having its own ISO code, having very rich literature in 13th–18th centuries, its own grammar schooling tradition, established (if changing in time) spelling conventions, etc. So even if we acknowledge those factors provide good guidance, we definitely don’t always follow it very closely. // Silmeth @talk 14:20, 19 January 2024 (UTC)Reply

Proposal for several languages without ISO codes


Tagging @-sche and @Benwing2 who are likely to be interested in this. Here is a list of languages that currently lack ISO codes, with a brief explanation as to why they probably justify an L2 code. In a couple of cases, we're never likely to have more than a handful of entries for the language in question due to the scant number of attestations we have, but I don't think that should be used as justification for exclusion.

Baltic

  • Splitting Galindian (xgl) into East Galindian (xgl-eas) and West Galindian (xgl-wes).
    This seems to have been a genuine mistake by the ISO: "Galindian" refers to two separate extinct languages within the Baltic family, which don't even seem likely to have been part of the same sub-branch. Both are poorly attested, however.
    What is there to add in either language? WP says both are "poorly attested", but I'm having trouble finding whether they are actually attested or this is just an editor's euphemism for "not attested". (All I've found so far is a random website mentioning that some placenames are known or inferred for "Galindian".) This would help with deciding whether to just retire xgl, add full codes for East and West, or add etymology-only codes for them. - -sche (discuss) 19:29, 16 January 2024 (UTC)Reply

Creoles and pidgins

  • Scots-Yiddish (crp-syi)
    A Scots-Yiddish creole spoken in the first half of the 20th century. Attestations are scanty, but some records do exist.
    I'd like to see good evidence that this is a genuine creole (or even pidgin) rather than Scots with some Yiddish loanwords or simple code-switching. Pidgins rarely arise when there are only two languages in contact, and not all pidgins undergo creolization. —Mahāgaja · talk 07:36, 8 December 2023 (UTC)Reply
    Yeah, I don't think we have enough evidence of this being a real, distinct language to add it. (Several of the relatively few works "in" the "language" appear to be inventing, or as they put it, "reimagining" it like a conlang.) - -sche (discuss) 19:05, 16 January 2024 (UTC)Reply

Dravidian

Created. Theknightwho (talk) 01:18, 3 February 2024 (UTC)Reply
  • Malamuthan (dra-mal)
    A small tribal language related to Malayalam - we have quite a few of these already, and I see no obvious reason to exclude this one.
    I'm having trouble finding any reference works about this; Mikhail S. Andronov (in A Comparative Grammar of the Dravidian Languages and A Grammar of the Malayalam Language in Historical Treatment) speaks of "the Malamuttan dialect". Perhaps we should just wait until someone has content they're wanting to add in this lect, to judge how distinct it is. - -sche (discuss) 19:38, 16 January 2024 (UTC)Reply
    @-sche I'm not sure if you've seen it, but pages 37 to 39 of Tribal Languages of Kerala has some information about it, which notes a number of distinctive qualities; not least because they have a very strong tradition of isolating themselves from outsiders. That paper cites a 1981 reference work, but I assume it's in Malayalam. Theknightwho (talk) 14:35, 20 February 2024 (UTC)Reply

Germanic

  • Greenlandic Norse (gmq-grn)
    A descendant of Old Norse spoken in Greenland until sometime in the 15th century, which diverged likely due to isolation (compare Icelandic and Norn). Some linguistic innovations and conservations have been noted, though the number of attestations is relatively small.
     Oppose: This is considered a dialect of Old West Norse, for which we already have a code: non-own. --{{victar|talk}} 19:22, 7 December 2023 (UTC)Reply
    @Victar That's an etymology-only code, not a full language code. Theknightwho (talk) 20:22, 7 December 2023 (UTC)Reply
    I'm aware. This is a subdialect of a larger dialect. --{{victar|talk}} 20:30, 7 December 2023 (UTC)Reply
    My initial inclination is to keep treating this as ==Old Norse== as far as L2s go (or if we really want to, treat it as ==Old West Norse== and upgrade OWN to being attested like Proto-Norse). Various Old Norse dialects including this one have some differences from one another, but I do not know that it makes sense to speak of Greenlandic Norse as a "descendant" of Old Norse when it was contemporaneous and stopped being spoken at around the same time as other Old Norse, and other members of the dialect continuum do not seem to have had trouble understanding it, or at least modern scholars don't (given the uncertainty over whether various texts or inscriptions represent Greenlandic Norse or e.g. the Icelandic dialect of Old Norse, and that it sometimes even comes down to just the shapes of runes rather than anything about which letters or words are used); it seems like we can continue to treat it as a dialect in the dialect continuum. It would be reasonable to add an etymology-only code, for use in various Greenlandic terms' etymologies (since we are extremely free with these, and have ety-only codes even for things like en-NNN vs en-US ... I see we even have "en-US-CA" although this does not appear to be used anywhere and I am going to suggest it be deleted along with Template:User en-us-ca...). - -sche (discuss) 20:12, 16 January 2024 (UTC)Reply
Closing this by giving it the etymology-only code non-grn under Old West Norse. Theknightwho (talk) 01:33, 7 February 2024 (UTC)Reply

Indo-Aryan

  • Kishtwari (inc-kst)
    Closely related to Kashmiri (and sometimes classified as a dialect), but only retains partial mutual intelligibility, and (unlike Kashmiri) appears to be written using the Takri script.
     Oppose: I have never seen Ka/ishtwari referred to as anything other than a dialect of Kashmiri, alongside Kohistani, Poguli, Rambani, and Siraji. --{{victar|talk}} 08:32, 8 December 2023 (UTC)Reply
    @Victar Poguli has an ISO code, so I’m not sure how much value your assertion has. Theknightwho (talk) 08:42, 8 December 2023 (UTC)Reply
    And just because an ISO code exists, doesn't mean we on the project should create a language for it. Often times, village dialects have codes just because someone put out a paper on it, not because it's any more unique than any other dialect on the continuum of dialects. --{{victar|talk}} 09:30, 8 December 2023 (UTC)Reply
    @Victar It calls into question the value of your statement that you have never seen it referred to as a language, if you’re putting it on the same level as a lect which does, in fact, have a language code. It also directly contradicts your previous statement as to the weight we should put on language codes. There is also the matter of the Takri script. Theknightwho (talk) 09:44, 8 December 2023 (UTC)Reply
    It doesn't contradict my opinion at all. In my experience, particularly when it comes to Indo-Iranian, ISO over-assigns language codes, so trying to give a language code to a dialect when even ISO doesn't is saying something. --{{victar|talk}} 10:22, 8 December 2023 (UTC)Reply
    @Victar None of which is relevant to the fact there is evidence it isn’t even written with the same script - please present something more substantive than a personal hunch, or a selective approach to the weight you put on language codes. Theknightwho (talk) 10:29, 8 December 2023 (UTC)Reply
    A language written in multiple scripts is practically a hallmark of Indo-Iranian languages and to cite that as a reason to call it a different language would be naive. --{{victar|talk}} 10:39, 8 December 2023 (UTC)Reply
    @Victar You’re being highly misleading: when a “dialect” is written in a different script, its speakers do not consider themselves to be speaking the same language, and it’s also highly divergent (to the point where it is tonal, unlike Kashmiri), then it creates a compelling case for separating it out. Theknightwho (talk) 10:44, 8 December 2023 (UTC)Reply
    That is such an absurd statement. Script usage is frequently dependent on region and religion. Most literate Kashmiri speakers write in Perso-Arabic but the Hindu population uses Devanagari, regardless of any dialectal differences. Also I can't find any paper that states Kishtwari is any more or less tonal than standard Kashmiri. You're overreliant on a Wikipedia article for your facts. --{{victar|talk}} 11:41, 8 December 2023 (UTC)Reply
    @Victar Except this is the Takri script and it is directly related to “dialectal” differences, so your comparison is nonsensical because it shows that script usage in this case is affected by the lect, not other factors like religion. Standard Kashmiri isn’t tonal at all, as you very well know. Theknightwho (talk) 11:48, 8 December 2023 (UTC)Reply
    Yes and the Kishtwari dialect is spoken in the region of the Kishtwar Valley, and the use of Takri is regional. Again, no paper I read remarks anything on tone. Unless you can provide a paper, your statement is meaningless. --{{victar|talk}} 11:57, 8 December 2023 (UTC)Reply
    @Victar We also have a code for Haryanvi, considered a dialect of Hindi. So should it be removed? Word0151 (talk) 12:48, 8 December 2023 (UTC)Reply
    🤷 Plenty of Hindi project users that can decide that. --{{victar|talk}} 01:33, 9 December 2023 (UTC)Reply
  • Urtsuniwar (inc-unr)
    Closely related to Kalasha, but appears to be divergent enough to constitute a separate language with around 70% mutual intelligibility (compare Spanish/Portuguese with 85-90%).
     Oppose: Urtsuniwar is a synonym for Kalasha, see Decker (1992). Some speakers just use more Khowar borrowings than others. --{{victar|talk}} 08:32, 8 December 2023 (UTC)Reply
    @Victar Patently untrue - numerous references in the sources provided by WP (and elsewhere), and you’ve failed to explain the issue of mutual intelligibility. Theknightwho (talk) 08:45, 8 December 2023 (UTC)Reply
    How is it "patently untrue"? Did you read Decker (1992): "Kalasha speakers in the Urtsun Valley sometimes call their language Urtsuniwar." I did explain the "issue of mutual intelligibility" -- speakers of Kalasha use varying degrees of Khowar borrowings. --{{victar|talk}} 09:30, 8 December 2023 (UTC)Reply
    @Victar 70% mutual intelligibility is far below the threshold typically used to classify something as a dialect (80-85%) - the fact that one citation says they are the same does not discount the wealth of evidence to the contrary. Theknightwho (talk) 09:44, 8 December 2023 (UTC)Reply
    What "wealth of evidence"? The first reference on the Wiki page literally lists Urtsuniwar under "Other Names" for Kalasha, beside Bashgali, Kalashwar, Kalashamon, and Kalash. Shall we make Kalashwar its own language as well? Another reference there is titled, I shit you not, "Kalasha of Urtsun". --{{victar|talk}} 10:22, 8 December 2023 (UTC)Reply
    @Victar Insufficient levels of mutual intelligibility, as stated several times. Theknightwho (talk) 10:32, 8 December 2023 (UTC)Reply

Iranian

  • Gorgani (ira-gor)
    An extinct Caspian language attested in the 14th century, which appears to have formed a dialect continuum with Mazanderani. Previous discussion here.
     Oppose: The few texts we have in Gorgani are almost indistinguishable from Old Tabari, the ancestor of Mazanderani, and should be considered a dialect of it, not its own language. There are actually more differences between Old Tabari and Mazanderani, but, like Classical Persian and Modern Persian, we treat them as the same language, in large part due to their use of an abjad alphabet. @Fay Freak --{{victar|talk}} 19:35, 7 December 2023 (UTC)Reply
    @Victar In all seriousness: given you clearly respect the views of Borjian, how do you explain his apparent change in view from the line you quoted from 2004 and his 2008 paper on Gorgani in which he invariably refers to it as a language (not a dialect)? Theknightwho (talk) 22:43, 7 December 2023 (UTC)Reply
    By its only being apparent. If you search for such a distinction. I’ve just looked into the 2008 paper again just for you. Normal(ly) people don’t look upon the statistical distribution of the employment of “language” and “dialect” in previous publications to find “changes in view” of linguists. Their views are rarely that sophisticated that one could make meta publications as one does on philosophers, and even then following such a bright shiny object is not an argument. language has multiple languages like sublanguage, including dialect, and one is not only not always anxious to make a distinction, there is usually nothing gained at all from such a “turf war”. All is language and words, rarely isolects or lexemes. Whether or not something should be treated separately is decided long before you realize you could beat the topics of this dichotomy again to fill your publication history.
    In this case the talk of “language”, I may argue, is purposefully misleading people, to market one’s publication career. It’s just much more zhoosh to publish about whole “languages” than dialects. But it’s okay to embellish things a bit since the core message of a paper does not hinge on these concepts. All historical sciences use to be much less exact in their design than that of the jurist who has the peculiar task to weigh or find a balance for a final decision. Like how I formulate etymologies in probability terms is secondary to what information is provided, in other words: it is mostly rhetorics to present the material, the related forms, reconstructions, and bibliography—this is the science, the result is of little practical relevance, unlike in the legal art where in the end you get a sentence or recommend an action. There is a principal misunderstanding of what linguistic papers are about here I can make out. Benwing noticed. You take publications of an author and read them with an exactitude that they don’t provide, with “research results” that they didn’t care about. One could enjoy that there are still naive academics whose subjects are recondite enough for their not bewaring of a lawyer around the corner attempting to misinterpret them. Fay Freak (talk) 00:35, 8 December 2023 (UTC)Reply
    @Fay Freak This seems like a very cynical answer, and it’s difficult to see how you’re not simply accusing Borjian of academic dishonesty. Also Benwing2 didn’t add anything on this topic - he simply asked for consensus. Theknightwho (talk) 08:48, 8 December 2023 (UTC)Reply

Nuristani

  • Zemiaki (iir-zem)
    Spoken by around 500 people and related to Waigali, but I'm not seeing any indication it should be treated as a dialect in the literature.
     Oppose: Morgenstierne (1974) calls it a dialect of Waigali, and Edelman (1999) is unsure, labeling it "jazyk/dialekt". We should play it safe and treat it like a dialect. --{{victar|talk}} 21:46, 7 December 2023 (UTC)Reply

Tungusic

  • Alchuka (tuw-alk)
    A language in the Jurchenic branch (i.e. close to Jurchen and Manchu), which went extinct at some point in the 1980s. Records of the language aren't great, but there are a handful of works which go into detail.
  • Bala (tuw-bal)
    A very similar situation to Alchuka above, though the language may still be moribund.
  • Kili (tuw-kli)
    Formerly thought to be a dialect of Nanai (a Southern Tungusic language), but now thought to be a Northern Tungusic language influenced by Nanai due to geographical proximity; it had 40 speakers in 1990, and is likely moribund.
With no objections, creating these three. Theknightwho (talk) 18:28, 4 February 2024 (UTC)Reply

Yeniseian(?)

  • Jie (qfa-yen-jie)
    Likely to be a Yeniseian language (though possibly Turkic), with only a single attestation from the 4th century (though it wouldn't be the first).
In the absence of objections, I'll create this, given the number of potential entries is capped at 4. Given the contention over its affiliation, und-jie is preferable as a code. Theknightwho (talk) 16:57, 4 February 2024 (UTC)Reply

Unknown

  • Xiongnu (und-xnu)
    Attested only via Old Chinese records of the language [edit: and potentially some inscriptions - see below], but nevertheless, a handful of terms have been recorded (and we can, at least, make broad reconstructions as to how they would have been read): e.g. the Old Chinese borrowing 谷蠡.

Theknightwho (talk) 16:03, 4 December 2023 (UTC)Reply

 Oppose Xiongnu (Old Chinese is Old Chinese). West Galindian is also unattested. Is East Galindian attested outside of borrowings? If not, maybe keep as a substrate language?  Provisional support Zemiaki, Kishtwari, Urtsuniwar, based on the assumption there are no good arguments to keep these together.  Abstain for the others: poorly attested, extinct languages are usually subject to a lot of debate and usually dictionary entries in these don't turn out well, but they at least seem valid. Thadh (talk) 16:25, 4 December 2023 (UTC)Reply
@Thadh The issue with Galindian is that we need to deal with the present situation, since having a single language code for both is simply incorrect. Re Xiongnu, I'm not referring to borrowings - I'm referring to specific records of the Xiongnu language in Old Chinese sources. Theknightwho (talk) 16:30, 4 December 2023 (UTC)Reply
@Theknightwho: Do you mean mentions of terms à la Uindiorix, or do you actually mean texts à la Luwian? Because in the former case, I'm inclined to call it a borrowing rather than an attestation, whereas the second one is fair enough. Thadh (talk) 17:18, 4 December 2023 (UTC)Reply
@Thadh It's a bit tricky - for example, see [6], where Vovin argues (quite convincingly) that they're inscriptions in Xiongnu which used Old Chinese characters for their semantic values, except for terms that needed to be transcribed phonetically, such as titles or personal names. There's obviously precedent for this - compare Japanese, Korean, Vietnamese etc. Theknightwho (talk) 18:01, 4 December 2023 (UTC)Reply
@Thadh: Discussion will be considerably less confusing if people put their Supports, Opposes and Abstains under each individual case rather than grouping them together at the bottom. —Mahāgaja · talk 18:06, 4 December 2023 (UTC)Reply
@Mahagaja: I had quite general remarks: Living languages - split. Unattested languages - no split. Rest - abstain. I think repeating this ten times is a bit overkill. Thadh (talk) 21:12, 4 December 2023 (UTC)Reply
I'm usually sympathetic to adding extinct language X even if it's only attested as quotations/mentions/etc in old records in language Y, as long as we're sure X was a language (and different from, not just a dialect of, Y or another language). With Xiongnu, it seems like no one is sure which of various unrelated ethnolinguistic families the Xiongnu people and language(s) might have been from, or even if it was composed of multiple ethnolinguistic groups. That last part gives me pause. Are scholars generally in agreement that the attested words from the Xiongnu are all in one language, or is this like e.g. "Loup" where it's multiple different languages? (We currently have Category:Loup B language, but this is questionable and it seems good that we don't have any entries.) - -sche (discuss) 21:15, 4 December 2023 (UTC)Reply
@-sche A lot of that lack of certainty comes from two factors:
  • Because Xiongnu is filtered through Old Chinese characters, any kind of reconstruction relies on us being able to accurately reconstruct the readings of those characters. This is something that is gradually improving, and - for example - we are in a much better position to make this kind of judgment than Pulleyblank was in the 1960s.
  • There’s been a huge amount of (understandable) speculation as to whether the Xiongnu and the Huns were one and the same. If I had to put money on it I’d say they probably were related, but I strongly suspect there was a large dialect continuum involved (just as there was with the Mongolian languages a millennium later). However, I’m certainly not proposing we merge Hunnic with Xiongnu or anything as radical as that. What we do know is that the inscriptions which were found were created by the same Xiongnu who are written about in Old Chinese sources, because they were excavated in the old Xiongnu capital of Longcheng in Mongolia, which was discovered quite recently. The question is whether they’re in Old Chinese or Xiongnu, but I’m inclined to agree with Vovin that the evidence suggests the latter.
Theknightwho (talk) 03:36, 5 December 2023 (UTC)Reply

2024


Medieval Greek from Ancient Greek


Please split Medieval Greek, as in Wiktionary:Beer_parlour/2024/January#Petition_to_upgrade_Medieval_Greek, from Category:Ancient Greek language. (I am sorry that my browser has difficulty reading much of this page.) ‑‑Sarri.greek  I 09:45, 2 January 2024 (UTC)Reply

 Support. The request is to split grk-gkm Medieval Greek out of grc Ancient Greek. Previous discussion at Wiktionary:Beer parlour/2023/March#Medieval Greek. @Fay Freak, Al-Muqanna, Nicodene, Vahagn Petrosyan, JohnC5, Benwing2, -sche, the people who participated in that discussion which (like most discussions at Wiktionary, unfortunately) ended inconclusively. By the way, we've been using gkm as if it were an ISO 639-3 code, but in fact it isn't one. A request was made for that code many years ago, but it's never been approved or denied. Therefore if the split is approved, we need to use the exceptional code grk-gkm. —Mahāgaja · talk 11:10, 2 January 2024 (UTC)Reply
Note: The proposal in question was rejected on Hallowe’en 2023. 0DF (talk) 19:54, 19 June 2024 (UTC)Reply
 Support, but only if any editors are willing to clean up the mess left behind by the split, otherwise this should wait a bit. Also, we have to first figure out which of the many modern Greek varieties (Standard Greek, Mariupol Greek, Pontic Greek, Italiot Greek, Tsakonian, etc.) are to be descendants of Medieval Greek, and which shouldn't. Thadh (talk) 11:39, 2 January 2024 (UTC)Reply
I'm fairly familiar with Attic Greek, but not with Medieval apart from what I've read on Wikipedia. The sources that I've typically used for Ancient Greek entries when I used to create them don't cover Medieval. I wouldn't be opposed if you and a team of other people familiar with Medieval want to split it. I don't know if I can be of much use unless there are bugs in modules or something. — Eru·tuon 08:25, 4 January 2024 (UTC)Reply
Thank you. I will "clean up the mess left behind by the split", @Thadh. Only 248 words need fixing, plus all related Modern Greek (el) etymologies; I have a list of 711 corrections. I do a lot of Medieval Greek at el.wiktionary; please do not worry, I will not destroy anything. I need one week to fix everything. Please, (@Erutuon) also note that in Module:grc-pronunciation, the section Period for Template:grc-ipa-rows, Template:grc-ipa-rows-byz, Template:grc-ipa-rows-koi needs to say 10th century Medieval (or Mediaeval, according to your HomeRules), not 'Byzantine'. Adding med1 and med2 to its /data would also be a nice addition. I am very happy to resume work for med.greek! ‑‑Sarri.greek  I 04:51, 6 January 2024 (UTC)Reply
I suppose actually the lines for Medieval Greek should be removed from {{grc-IPA}} and moved into a separate {{grk-gkm-IPA}}. Likewise the option for |dial=gkm needs to be removed from all grc inflection tables and new grk-gkm inflection tables created. —Mahāgaja · talk 08:19, 6 January 2024 (UTC)Reply
@Mahagaja, no, not needed. IPA will be with parameter period=byz1 (or period=med1, if Erutuon might give an alias to this parameter). Also: learned medieval inflections are identical to the standard ancient inflections and there is no need to provide them separately. Nothing different. At el.wikt, if we care to repeat them, we add title: learned medieval inflection as in ancient greek. But we shall not provide any of that now. Never mind for vulgar inflections (I'll let you know about these) Thank you for your concern. ‑‑Sarri.greek  I 08:26, 6 January 2024 (UTC)Reply
We're really not supposed to use one language's templates in another language's entries, so if grk-gkm and grc are two different languages, then we're really not supposed to use things like {{grc-IPA}}, {{grc-decl}}, {{grc-adecl}}, and {{grc-conj}} in grk-gkm entries. And there may still be some differences; for example, does Medieval Greek ever use the dual number? If not then the dual shouldn't be shown in {{grk-gkm-decl}} and {{grk-gkm-conj}} as it is in {{grc-decl}} and {{grc-conj}}. —Mahāgaja · talk 09:10, 6 January 2024 (UTC)Reply

Thank you (sorry, this page makes my Chrome browser unresponsive, and it is often difficult to write here). Thank you @Mahagaja. The code gkm is in wide use, although still not activated by ISO; there have been attempts to draw attention to its acceptance, and I will notify you if something changes officially. At el.wikt there are also dialectal gkm‑crt and gkm‑cyp as subordinate codes.
Thank you @Thadh, I will check all insource:xxx and intitle:xxx occurrences of relevant words and correct them. For the update of Module:families/data/hierarchy#Hellenic and Module:etymology languages/data#gkm, I submit here (quoted) the official Greek source: Modern Greek Dialects What is a dialect? - Research Centre for Modern Greek Dialects, Academy of Athens

Nowadays we consider as dialects the Pontiac (in which the Greek of Crimea-Mariupol are included), the Cappadocian, the Tsakonian and the Southern Italian. All the other regional variants of the Modern Greek Standard are known as idioms. In particular, the Cretan and Cypriot idioms are exceptionally known as dialects, thus acknowledging an intermediate level of language variation.

All the modern Greek dialects (Cappadocian.cpg, Italiot.grk-ita, Pontic.pnt, which includes the Mariupol idiom) and Modern Greek.el itself come from Medieval Greek, except Tsakonian.tsd, which is a special case. Thank you ‑‑Sarri.greek  I 13:07, 2 January 2024 (UTC)Reply

A bit off-topic, but most researchers I have read claim Mariupol Greek is, in fact, not a Pontic lect and doesn't share much, if anything, in common with Pontic that it doesn't also share with other Greek lects. Thadh (talk) 13:34, 2 January 2024 (UTC)Reply
I kinda doubt editors are willing to clean up, or review the dialectology of the Abstandsprachen. The ideological distinction is barely worth the effort for that and for always checking in which chronolect a word has been used, an argument I often use, as we do not go completely without distinction if we don’t split at the L2 level: now it means we write a label if we know and abstain if we don’t bother. The result could become more often that someone doesn’t add a valid entry or etymological note due to fear of making a mistake. Fay Freak (talk) 19:46, 2 January 2024 (UTC)Reply

 I oppose the change in name from “Byzantine Greek” to “Medi(a)eval Greek” for referring to this chronolect. I’m undecided about the split itself. @Sarri.greek: Could you point us to some well-developed Byzantine Greek entries in το Βικιλεξικό to give us some idea what they’d look like, and to what extent they’d contrast with Ancient Greek and Modern Greek entries, please? 0DF (talk) 02:19, 7 January 2024 (UTC)Reply

@0DF. _For the term, professors of linguistics might answer your question (ref). _Examples Παραδείγματα at wikt:el:Κατηγορία:Μεσαιωνικά ελληνικά. ‑‑Sarri.greek  I 08:45, 7 January 2024 (UTC)Reply

──────────────────────────────────────────────────────────────────────────────────────────────────── @Sarri.greek: Thank you for your response. I'll address the παραδείγματα first.
That category you linked (el:Κατηγορία:Μεσαιωνικά ελληνικά = “Category:Mediaeval Greek”) contains 1,804 entries, so I hope you'll forgive me that I only checked out the first column of entries (from el:ἀβαμπαρλιέρης to el:ἀλλάγιον — 63 pages). Of those, none of the gkm entries contained IPA transcriptions, and the only ones with inflection tables are el:αἰγοβοσκός and el:ἀλλάγιον. Those don't appear to be what I'd call "well-developed". As to contrast, the declension tables in αἰγοβοσκός and ἀλλάγιον are identical to Ancient Greek ones, even including the δυϊκός (duïkós, dual). As they are, those 63 entries suggest there would be no benefit to splitting gkm out of grc and that doing so would only create useless redundancy. That being said, I suspect that there could be some value in the split in the cases of entries like el:-άγρα, el:-αινα, and el:-αλγία, which present (currently unseized) opportunities to explain the loss of the accusative , the loss of the dative entirely, and the collapse of the Ancient nominative–vocative plural -αι and accusative plural -ᾱς into the Modern -ες. I also see cases like the Modern Greek entry καλοκαίρι (kalokaíri, summertime, summer), which currently traces the word's etymology, via Byzantine Greek καλοκαίριν (kalokaírin, good season, good weather), to Ancient Greek καλοκαίριον (kalokaírion, fine weather). It would be great to know how καλοκαίριν (kalokaírin) declines; that being said, is there any reason why its declension couldn't be showcased perfectly well as a {{lb|grc|Byzantine}} {{alternative form of|grc|καλοκαίριον}}?
Now to the nomenclatural issue.
I've taken a look at the authority you cited; for the benefit of others reading this, here are its bibliographical details:

  • David Holton with Geoffrey Horrocks, Marjolijne Janssen, Tina Lendari [Stamatina Lentari], Io Manolessou, and Notis Toufexis [Panagiotis Toufexis] (2019), The Cambridge Grammar of Medieval and Early Modern Greek, four volumes, Cambridge · New York · Port Melbourne · New Delhi · Singapore: Cambridge University Press, →DOI, →ISBN, →LCCN

The authors' rationale for their disuse of the term Byzantine Greek is to be found in the introduction to the work, in this paragraph from page xix:

The system of periodization that we have used is not based on external criteria, which might relate to historically significant dates, such as wars, conquest or independence. For this reason we do not employ the term “Byzantine Greek”: for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not “Byzantine” in a political sense. Our criteria are instead internal ones, based on clusters of important linguistic changes that we see as occurring around 1100, 1500 and 1700 (for details see Holton 2010, Holton/Manolessou 2010). Consequently, we employ the following terminology in order to denote sub-periods of the history of Greek, terms that also conveniently correspond to those widely used for periodization in Western historical thought: Early Medieval (EMedG) from about 500 to 1100; Late Medieval (LMedG) from about 1100 to 1500; Early Modern (EMG) from about 1500 to 1700.

Appeals to authority are all well and good, but that is poor reasoning. Yes, politics affect language, and the Byzantine Empire, whilst it existed, was (I think you'll agree) the political, cultural, and linguistic "centre of gravity" of the Greek world. The authors write that “for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not ‘Byzantine’ in a political sense” (my emphasis); however, a person's language doesn't (immediately) change with political borders. Earlier op. cit., on page iii, there occurs the sentence “The geographical area where Greek has been spoken stretches from the Aegean Islands to the Black Sea and from Southern Italy and Sicily to the Middle East, largely corresponding to former territories of the Byzantine Empire and its successor states.” Doesn't that show the centrality of that polity to the history of the Greek language during this period? The authors' reason is weak, and I reject it.
I see another problem here, which is that Holton et al. seem to be treating this chronolect as existing between AD ~500 and ~1700. As you probably know, the Middle Ages (a.k.a. the Mediaeval period) are traditionally bookended by the falls of two Roman Empires, starting with the fall of the Western Roman Empire in AD 476 and ending with the fall of the Eastern Roman Empire (i.e. the Byzantine Empire) in 1453; it's not too much of a stretch to push it later, to 500–1500, but I don't know any informed person who calls the seventeenth century mediaeval, so we couldn't call this chronolect “Medi(a)eval Greek”. Holton et al. are not alone in this, either: on page xviii op. cit. they mention the “dictionary of Kriaras and the Vienna-based Lexikon zur byzantinischen Gräzität”; that “dictionary of Kriaras” is Emmanuel Kriaras' Λεξικό της Μεσαιωνικής Ελληνικής Δημώδους Γραμματείας, 1100–1669 (Dictionary of Mediaeval Greek Vernacular Literature, 1100–1669, my emphasis). Maybe the Greek Μεσαίωνας (Mesaíonas) is conceived of differently from the English Middle Ages. It would be possible to call the chronolect “Mesaeonic Greek”, but we'd very much be neologising there; I could only find one instance of meseonic, so the adjective alone wouldn't even satisfy the criteria for inclusion.
Finally, I note that the other dictionary mentioned alongside Kriaras' is entitled Lexikon zur byzantinischen Gräzität (Lexicon of Byzantine Graecity), so it's apparent that not everyone rejects the term Byzantine Greek. Indeed, a text search for the string byzantin (case- and diacritic-indifferent) in the bibliography of The Cambridge Grammar of Medieval and Early Modern Greek (which occupies pages xxxvii–clxvi thereof) finds 201 instances. Some of those may be false positives, but that search would also have missed any instances hyphenated across a line break (byz-antin, byzan-tin, vel sim.) or in languages that spell the word bizant- or otherwise. My point is that Byzantine Greek is still a common term and one we should use.
0DF (talk) 09:23, 8 January 2024 (UTC)
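For anyone wishing to reproduce a count of the kind described in the preceding comment, a minimal sketch of a case- and diacritic-indifferent text search follows, in Python. The function names and the sample strings are hypothetical illustrations and are not taken from the bibliography itself. As the comment already observes, such a search misses instances hyphenated across a line break and spellings such as bizant-.

```python
import unicodedata


def fold(text):
    """Case-fold the text and strip combining diacritics after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def count_hits(text, needle="byzantin"):
    """Count non-overlapping occurrences of needle, ignoring case and diacritics."""
    return fold(text).count(fold(needle))


# Hypothetical sample, not drawn from the actual bibliography.
sample = (
    "Byzantinische Zeitschrift; Études byzantines; BYZANTINA; "
    "Studi bizantini; Byz-\nantine"
)
print(count_hits(sample))  # prints 3: the bizant- and hyphenated cases are missed
```

A fuller search would also need to rejoin words split across line breaks before counting.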

A bit of a nitpick: Byzantine Greek isn't any better than Medieval Greek as a label for the language after the fall of the Byzantine Empire. Strictly speaking it wasn't Byzantine Greek at that point, but Ottoman. But either term applies well to the majority of the period. — Eru·tuon 00:42, 9 January 2024 (UTC)
I don't like the term Byzantine Greek because a naive reader could think it referred to a regional dialect rather than a chronolect. It would be easy for someone to think it referred to Greek as spoken in Byzantium as early as the time of Alexander the Great, and that it would not refer to Greek as written in Athens or Alexandria in AD 600. Also, 0DF, Holton et al. explicitly do not call the period from 1500 to 1700 medieval; they call it Early Modern Greek, just as we call the English of the same period Early Modern English. Wiktionary already uses 1453 as the border between grc and el; there's no reason separating grk-gkm out from grc should entail shifting the starting date of el later than it currently is. —Mahāgaja · talk 08:02, 9 January 2024 (UTC)
I don't have much of a stake in this but I also favour Medieval Greek, though I wouldn't be opposed to having Byzantine Greek as an etym-only language attached to it. Theknightwho (talk) 08:43, 9 January 2024 (UTC)
Side issue: if we split Medieval from Ancient, I suppose the Byzantine flag which is currently used for Ancient Greek in the "Add country flags next to language headers" gadget will need to be moved to Medieval Greek, and Ancient Greek will either need a new flag or no flag. - -sche (discuss) 19:48, 12 January 2024 (UTC)
Preferably none. —Mahāgaja · talk 22:23, 12 January 2024 (UTC)

@Mahagaja, Erutuon, Thadh, since I do not see any more objections: _phase_1: I have already cleaned up Modern Greek etymologies involving gkm (70 more remain to be done, also supplying sources, IPA, etc.), to be ready for the term Medieval instead of Byzantine. This is

These steps are for the name change. If you give permission and agree to the upgrade from grc, then for _phase_2 (from Module:languages/data/3/g to Module:languages/data/exceptional) the working alias gkm is already in place, and I will be able to proceed with corrections to section titles wherever needed, sources, etc., especially where Modern etymologies need a Medieval lemma. Thank you for your help. ‑‑Sarri.greek  I 10:56, 3 February 2024 (UTC)

There are objections. I would like to add that I too oppose renaming from Byzantine Greek or extending its time frame past the 15th century. Nicodene (talk) 02:17, 4 February 2024 (UTC)
@Nicodene, I have suggested nothing about the post-15th-century period (= Early Modern Greek), which we deal with in polytonic at el.wikt, not monotonic. But we are at _phase_1 now, which is to rename 'Byzantine language' to Medieval Greek. I am glad that you are interested in the periodisation of the Hellenic language; it is rare that non-Hellenists are interested or take time to study this. We can discuss it on our Talk pages, if you wish? Thank you ‑‑Sarri.greek  I 02:35, 4 February 2024 (UTC)
(Why not here?)
I see. For the record I do support splitting it out of Ancient Greek, even if the (prescriptively correct, 'learned') inflections are going to be largely the same.
So far I don't see any real argument against the label 'Byzantine'. The point about political control is a bit spurious as the label 'Byzantine' is in no way limited to the political level. It is civilisational.
The point about 'Byzantine Greek' being misinterpretable as 'the dialect of the colony of Byzantion' might be convincing if not for the unlikelihood of someone being simultaneously knowledgeable enough about history to even be aware of the (let's be honest) rather unimportant pre-Constantine city, yet also historically illiterate enough to be unaware of what 'Byzantine' means 99 times out of 100. Nicodene (talk) 02:58, 4 February 2024 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Sarri.greek: Respectfully, I think you're being too hasty with this. I acknowledge I've been slow to respond; that has in large part been due to my work researching Atticism (see it, Citations:Atticism, and some of the word's relations) in connection with the more substantive question of whether there is value in splitting gkm from grc. My understanding of that matter is more-or-less in line with this paragraph from the website for Trinity College Dublin's 2024 International Byzantine Greek Summer School (IBGSS):

Byzantine Greek is the dominant form of Greek written during the Byzantine Empire (AD 330–1453). The spoken language changed significantly in this period and came close to Modern Greek, but most Byzantine authors use conservative forms of Greek that looked back to Classical Attic, the Hellenistic Koine and Biblical Greek. Therefore much of the vocabulary, morphology and syntax of Byzantine Greek are not significantly different from Classical Greek, which makes this course a suitable preparation also for reading Classical literature and the New Testament.

But to the matter of the nomenclature: I had previously been arguing that Byzantine Greek is just as good a term as Medieval Greek, but it appears that they may not be entirely synonymous. Please see the quotations I've collected at Citations:Medieval Greek. You'll see that Evangelinos Apostolides Sophocles uses the term Byzantine Greek (for 330–1453) and remarks that “if the expression Mediæval Greek is to be used at all, it should be restricted to the language of [the second epoch of the Byzantine period]” (622–1099), whereas Irach Jehangir Sorabji Taraporewala states that “Byzantine Greek is a direct development from the literary dialect of the second transition period [300–600]” but that “[l]iterary Mediaeval Greek [1000–1450] is a development of the colloquial of the previous (Neo-Hellenic [= Byzantine Greek]) period [600–1000]”; those two sources directly contradict on the details, but they both distinguish the two chronolects. Edward Augustus Freeman speaks explicitly of “a literature, mediæval Greek or Romaic, as distinguished from Byzantine” and the writer for UNESCO discusses in a single sentence borrowings into “Byzantine Greek”, “mediaeval Greek”, and “Neo-Greek”; they appear to have particular time periods in mind, but I'm not sure what they are. And George Leonard Huxley refers to “Byzantine Greek” and “mediaeval Greek language and literature” in consecutive sentences, presumably synonymously, but not obviously so. Many more sources use both terms within the same work, without it being clear whether the terms mean different things or whether they're making a distinction without a difference. Can you explain these distinctions? Are they valid? If not, why not? If so, do you propose more than one offshoot to grc? If not, why not? If so, how many, and what should they be?
@Erutuon: I would argue that, in the same way that Greek writers contemporaneous with but geographically outside the bounds of the Byzantine Empire may nevertheless conform to Byzantine literary norms, Greeks writing after the Empire's fall may, from inertia or nostalgia, also conform to Byzantine literary norms, despite the change in their political context. By contrast, the Middle Ages are strictly chronological and have an exact terminus in the 1453 fall of Constantinople.
@Mahagaja: In my experience, Byzantium is used far more frequently to refer to the Byzantine Empire than it is to refer to the city; most people are unaware that the usage is originally a synecdoche and, whilst a lot of people know Istanbul used to be called Constantinople, far fewer know that Constantinople used to be called Byzantium (and fewer still know that Byzantium used to be called Lygos, but I digress). As such, I don't think that it is at all likely that a naïve reader would make that mistake. A mistake I know some people make, however, is with the qualifiers High or Upper and Low or Lower in geographical and geographically-based terms like Upper Egypt vs. Lower Egypt and High German vs. Low German, with High and Upper mistaken to mean "north(ern)" and Low and Lower used to mean "south(ern)"; I assume the confusion arises from the conventional orientation of maps in the Anglosphere. Despite that confusion, I would not, and I doubt you would, advocate replacing those terms with ones less susceptible to such naïve confusion. For another example, I'm sure a naïve reader could mistake Andalusian Arabic for Arabic spoken in the (present-day) Spanish region of Andalusia; the synonym Moorish Arabic is not susceptible to that confusion, so should we use that instead? There are other confusables as well, I'm sure. ⸻ Re Holton et al., I know they don't call Greek 1500–1700 "Medieval"; the fact that I quoted above a paragraph of theirs that ends "Early Modern (EMG) from about 1500 to 1700" should make that clear. My meaning was that Holton et al. are treating Greek 500–1700 as a single chronolect, which they call "Medieval and Early Modern Greek" and which Kriaras calls Μεσαιωνική Ελληνική (Mesaionikí Ellinikí). Holton et al. make a point of saying that their “system of periodization…is not based on external criteria” and that their “criteria are instead internal ones, based on clusters of important linguistic changes that [they] see as occurring around 1100, 1500 and 1700”. If we did the same, that might indeed entail shifting the starting date of el later than it currently is.
@-sche: I don't have country flags beside language headers turned on and neither am I inclined to turn them on, but if you're interested in having them, you could use the Argead star (commons:File:Vergina Sun WIPO.svg) for Ancient Greek; the English Wikipedia uses that image in its country infoboxes as the flag of the Empire of Alexander the Great, as well as in many other places.
@Nicodene: I largely agree with you, but if we're going to split out gkm, wouldn't it be better to give the inflections that show the changes taking place between Ancient and Modern Greek? Wouldn't it be rather redundant if they had the same inflectional information as that given in Ancient Greek entries?
0DF (talk) 03:46, 4 February 2024 (UTC)

More than one set of inflections could be shown - the learned and Atticising versus the humble and 'demotic', at least by the time of the Digenes Akritas. Or, working with one set of inflection tables, cases or endings falling out of vernacular use could be placed in brackets with an explanatory note regarding register. Apart from that there would be differences in phonology and in various cases semantics as well. Nicodene (talk) 03:56, 4 February 2024 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Nicodene: To give us all some idea of the kind of inflectional variability we're dealing with, I added a table to βαθύς (bathús) of its Byzantine forms. There's already a lot there, but that's an underrepresentation, if anything. Annoyingly for our purposes, Holton et al. specifically omit the dative from their paradigms, despite the fact it occurs:

Nominative, genitive and accusative cases continue to exist in LMedG and EMG. The dative case, however, had gradually disappeared from the spoken language during the first millennium and its main functions were reassigned (see Humbert 1930, Lendari/Manolessou 2003, Horrocks ²2010: 183–5, 284, Holton/Manolessou 2010: 546–7). Nonetheless, datives survive in many of the written texts that this Grammar is based on, though mainly in documents and other texts in mixed or higher registers, and they may have a range of inherited functions. Particularly common are datives governed by the prepositions ἐν and σύν. Because the dative had ceased to be part of the spoken vernacular by about the 10th c., dative forms are not included in the paradigms set out in the chapters that follow.¹
¹The only exception that has been made is the dative reciprocal pronoun ἀλλήλοις, on the basis that its occurrence, which is quite rare, seems to be as much a lexical survival as a morphosyntactic feature (see 5.12).

volume II, § 1.1, pages 241–242

and even in novel formations:

In addition to instances like the above, which could be deemed grammatically “correct” (i.e. in accordance with AG morphology and syntax), we also find dative forms with innovative phonology, stress or morphology, or new lexical items: [τουποθεσίᾳ, Ρεθέμνει, βοθρακοῖς, παρρησιᾷ, Ὀγκριᾷ, Ἀράβοις, ἑταιρίδαις, νήσαις, συνπάσοις, Ἑλλήνοις, δοράτοις, ὀξέοις, ἐμπιστευτηόδες, ἐμπιστευτιόδαις (toupothesíāi, Rethémnei, bothrakoîs, parrhēsiāî, Onkriāî, Arábois, hetairídais, nḗsais, sunpásois, Hellḗnois, dorátois, oxéois, empisteutēódes, empisteutiódais)]
Of particular interest is the use of dative forms for loanwords: [σούγλᾳ, μπασταρδικῷ, σερραγίῳ, ὀντάσι (soúglāi, mpastardikōî, serrhagíōi, ontási)]

ibidem, pages 242–243

Moreover, Holton et al. exclude Atticist texts entirely (“the texts on which this Grammar is based – i.e. texts that are not systematically archaizing” — ibidem, page 243); accordingly, if we're to produce accurate and (in aspiration) exhaustive inflection tables, we shall have to supply the missing Attic forms and datives.
Holton et al. mention the dual number, as far as I can tell, exactly twice in their entire four-volume grammar:

The AG reciprocal pronoun (“one another”) had dual (gen. ἀλλήλοιν) and plural (gen. ἀλλήλων) numbers, and was declined for gender and case (genitive, accusative and dative).

volume II, § 5.12, page 1,183

The London manuscript can be consulted at http://www.bl.uk/manuscripts/. Ms. Athous Pandel. 538, edited by Vasileiou (2003) has the unusual form εγκρεμίζεσθον Varl. & Ioas. (Pantel.) 303, which is unlikely to be an archaic dual (as the subject is 2 sg.), and probably a writing mistake for ἐγκρεμίζεσουν.

volume III, § 4.3.1.2, page 1,551, footnote 54

so I don't know whether to infer from their silence that the dual saw no use in Byzantine Greek, or that its use was restricted to Atticist texts, and that it is for that reason that Holton et al. make no mention of it.
Certainly, we can't rely on Holton et al. alone to guide what we do about Byzantine Greek. Nevertheless, that table at βαθύς (bathús) is something concrete to work from. 0DF (talk) 23:37, 7 February 2024 (UTC)

@0DF The effort is quite admirable, thank you. I can't imagine it is sustainable across hundreds of entries, so generating variants with an automated template would be the long-term approach. The tables would probably include a prominent disclaimer like 'not all forms necessarily attested'. The automated romanisation can probably be prevented somehow to alleviate crowding. Nicodene (talk) 23:19, 8 February 2024 (UTC)
@Nicodene: I agree that the transliterations take up too much space and that they probably are best removed by default from Byzantine inflection tables. I also agree with including a prominent disclaimer of the kind you describe. I got the forms of βαθύς I added to that table from Holton et al., volume II, pages 746–757, wherein βαθύς serves as their paradigm for “Adjectives with Originally 3rd-Declension Endings” (§ 3.3), specifically “Oxytone Adjectives in -ύς” (§ 3.3.1). On the basis of Holton et al., volume I, page xxxiii (“When whole words are enclosed in brackets in the tables, the forms in question may reasonably be assumed to have existed, but no example has been located in the LMedG and EMG texts examined, e.g. (μιανοῦ), (χρυσοῦ).”), I presented each form which they give in parentheses instead with a preceding asterisk, as is standard in historical linguistics. Despite there being many forms already, the forms given by Holton et al. are an under-representation, if anything (Holton et al., volume I, page xxxvii, prefacing the Bibliography: “Classical, post-classical, early medieval and other learned Byzantine texts are not included below.”; volume II, page 746, below the synoptic table for βαθύς: “Residual [scil. inherited Attic] forms, e.g. βαθέος, βαθεῖς, are not included in the above table, but will be discussed below where relevant.”; ibidem, page 242: “dative forms are not included in the paradigms set out in the chapters that follow”; ibidem, page 243: “the texts on which this Grammar is based – i.e. texts that are not systematically archaizing”); the forms Holton et al. give are only those non-dative forms which occur in lower-register texts written 1100–1700: a rather limited subset of the “Medieval and Early Modern Greek” whole that you'd reasonably expect that they're trying to describe. 
On the other hand, if we are to adhere to the 1453 cut-off for Byzantine Greek, we need to be careful to exclude those forms that occur only in texts from the seventeenth, the sixteenth, and/or the latter half of the fifteenth centuries.
I am increasingly recognising that inflection tables for Byzantine Greek terms ideally require certain features that are different from those that befit inflection tables either for Ancient Greek terms or for Modern Greek terms. One of those features would be the indication of pronunciation for each form, because the vocalic mergers of Byzantine Greek render its graphemes surjective upon its phonemes (i.e., with the exception of the bijective α/a/ and ου/u/, each vowel may be written in multiple ways, namely: αι, ε/e̞/; ο, ω/o̞/; ει, η, ι/i/; οι, υ, υι/y/, then, upon the completion of iotacism in the eleventh century, /i/) and because the representation of significant phonological processes that Byzantine Greek underwent (synizesis and various deletions) are only haphazardly reflected in spelling; this would call for a tie-in between a module such as that behind {{grc-IPA}} on the one hand and modules such as those behind {{grc-decl}}, {{grc-adecl}}, and {{grc-conj}} on the other. Another desirable feature, in the light of Holton et al., volume I, page xxxiii (“smaller tables classify the allomorphs as ‘General’ (if they occur widely in the texts examined), ‘Restricted’ (if they are found in only part of the period covered by the Grammar, or only in certain areas or certain types of text), or ‘Rare’ (if their occurrence is very limited)”), would be the means seamlessly to mark each form for its respective period, locale, genre, register, and frequency. The need for bespoke inflection tables, distinct from those designed for Ancient and Modern Greek, is an infrastructural and thematic argument in favour of treating Byzantine Greek terms separately from Ancient Greek terms on the one hand and Modern Greek terms on the other. 0DF (talk) 22:51, 6 March 2024 (UTC)
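The vowel correspondences listed in the preceding comment can be sketched as a simple lookup table, here in Python. The names (`VOWELS`, `vowel_table`, `spellings`) are hypothetical and do not correspond to any existing Wiktionary module; the values are only those given in the comment, and the single before/after switch is a simplifying assumption for what was presumably a gradual change.

```python
# Vowel grapheme -> phoneme values as listed in the comment (before the
# completion of iotacism). Illustrative only; not an existing module's data.
VOWELS = {
    "α": "a", "ου": "u",
    "αι": "e̞", "ε": "e̞",
    "ο": "o̞", "ω": "o̞",
    "ει": "i", "η": "i", "ι": "i",
    "οι": "y", "υ": "y", "υι": "y",
}


def vowel_table(after_iotacism=False):
    """Return the grapheme -> phoneme table, merging /y/ into /i/ if requested."""
    if not after_iotacism:
        return dict(VOWELS)
    return {g: ("i" if p == "y" else p) for g, p in VOWELS.items()}


def spellings(phoneme, after_iotacism=False):
    """List every grapheme that can spell the given phoneme."""
    table = vowel_table(after_iotacism)
    return sorted(g for g, p in table.items() if p == phoneme)


print(spellings("a"))                       # one spelling only
print(spellings("i"))                       # three spellings
print(spellings("i", after_iotacism=True))  # six spellings once /y/ has merged
```

The growing list of spellings for /i/ illustrates why spelling alone cannot be mapped back to a single written form, which is the motivation given for showing a pronunciation beside each inflected form.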

phase 1

notifying administrators for grc @Mahagaja, JohnC5, Erutuon also @Thadh, Theknightwho, Benwing2 More than one month has passed. Am I to proceed with _phase_1: rename Byzantine to Medieval? Do I have permission from administrators to start? Would an admin help with Module:etymology languages/data, to set "Medieval Greek" with aliases = {"Byzantine Greek"}? (Because I am not an administrator, I cannot intervene.) Thank you.
On some other points: (I did not expect σχοινοτενεῖς, prolix discussions on this page, but at the corresponding Beer talk. Nevertheless, I am obliged to respond and clarify:)

  • Early Modern Greek: @Mahagaja, yes, the phase 1453-1669 (termination of Cretan literature) is Early Modern Greek (πρώιμη νεοελληνική), interchangeably 'Late Medieval' (όψιμη μεσαιωνική). Why? _1. Because of its retained mediaevalisms, many prominent linguists use the terms 'Late Medieval' and 'Early Mod.Gr' interchangeably (we can discuss further). And mainly _2. I would not propose a split of Modern Greek or further splits in general. We study it under Med.Gr. because its original script is polytonic; all modules, translit, IPA, etc. are already in place (probably with some modifications, or a few templates still to develop).
  • Period versus Style/Register. Hellenistic Koine, or even the Attic dialect, is used by authors long after the 6th century (even until the 20th century in the form of Katharevousa). The typology (inflections etc.) of their words follows the grammatical rules of Ancient Gr. (A label like learned may be used for some medieval Koine-style neologisms that might interest Med.) We will not duplicate existing Ancient Greek inflections.
  • Polytonic original script. Please note that conservative Greek linguists of the past century would 'correct' forms in their editions according to Anc.Gr. rules, while the progressive ones (who were prosecuted during these polemic times: Trial of Accents), like Kriaras, at some point switched to monotonic. Nowadays, it is inconceivable to change the script of an original source in a critical edition. Please note that everything Greek up to 1982 was written polytonic. Nowadays everything, ancient too, might be seen (e.g. on the internet, in new books) written monotonically, because it is easy to type and cheap to print.
  • Polytypy in Hellenic: @Nicodene, yes, it is a fact. It is a stubborn language: fluctuation in suffixes runs through all grk. See modern verbs like εκλέγω#Conjugation, Template:el-conjug-'ακούω'. One cannot avoid Modern Greek inflections because of their too many allomorphs (and see how much is omitted: Appendix:Greek_verbs#Omitted).
  • Will Medieval Greek acquire inflections at once? No. It takes some time for discussions, proposals, trials, to crystallise a method. A Working/Trials/Feedback.for.Med page would be a good way to start (please check some first attempts at wikt:el:παλληκάριον, wikt:el:σκοῦπα, wikt:el:Template:gkm-κλίση-ουσ, and a neologism, but learned = in the ancient fashion, at wikt:el:ἀπόκρεως). Med.Greek does not have Dative. Our learned friend 0DF's table at βαθύς is a fusion of Koine datives into Med. for the difficult categories -ύς, -ής of adjectives with lots of learned forms preserved. We do not add Dative at Mod.Greek either (Mod.terms with dative@el.wikt but not at its Tables).
    I have not proposed any trial for an Appendix of clitic paradigmata and/or tables with the distinction of 'expected versus attested' forms yet.

Why try to form a neoteric Section 'Medieval Greek' here rather than at el.wikt? Because here, there are so many learned and informed editors: experts (some of them professionals) who can help with their bibliography and their valued opinion in this project. At el.wikt I am totally alone in this project and I found it exhausting to update and patrol, make trials, have no feedback, no help for Med.Greek. All experts are assembling here. This wiktionary is the avant-garde of all wikts.
Admins! Please help to begin this project. Give me permission to start with _phase_1: rename Byzantine to Medieval. Allow _phase_2 (upgrade from etymol.language to an autonomous section), so that I can use the title Medieval Greek for poor Τζέτζης who has been waiting for this for a long time. Help, please, please, allow this long phase of Greek at en.wikt to exist! Thank you. ‑‑Sarri.greek  I 04:58, 9 February 2024 (UTC)

@Sarri.greek I tried to read through this discussion. It is confusing because there are two different issues (rename Byzantine -> Medieval, and split out Byzantine/Medieval from Ancient Greek). For issue 1 (rename), it looks like maybe two people (User:0DF and User:Nicodene) disagree with the name change and up to four are in support (User:Sarri.greek, User:Thadh, User:Mahagaja and User:Theknightwho). This is possibly enough for a rename but I feel uncomfortable without a clearer consensus, esp. given that I'm not sure whether User:Fay Freak opposes the name change and/or split (their prose is, as is typical, somewhat impenetrable). User:Erutuon and User:-sche seem willing to accept one or both changes but without a strong opinion. For issue 2 (split), it looks like User:Sarri.greek, User:Thadh, User:Mahagaja and User:Nicodene are in favor of a split, while User:0DF is undecided, User:Fay Freak possibly opposes (?), and User:Theknightwho has not expressed an opinion. Can all the people I just named let me know (1) did I get your opinion correct on both issues and (2) if not, what is your opinion, both about issue 1 (the rename) and issue 2 (the split)? Benwing2 (talk) 05:29, 9 February 2024 (UTC)
Yes, support. Thadh (talk) 11:19, 9 February 2024 (UTC)
Support — long overdue! — Saltmarsh🢃 06:26, 9 February 2024 (UTC)
@Benwing2: I was more warning with respect to the ambiguous consequences, without obstructing. If people are willing to invest work for a split, it is not my due to oppose it, since I do not expect to do Greek in the medium-term anyway, as it is low on my priority list, relatively to other interesting languages – I have not even followed the forthgoing of the discussion and don’t know what you all exactly intend, especially with respect to the 300–600 time, when I have derived Arabic terms from Byzantine Greek when I am not really sure whether they are from before Islam or right after it or a century later etc. and it might be split to Late Koine and Medieval Greek, which I am not particularly keen to revisit either and Greek editors might be good enough to pinpoint. Fay Freak (talk) 07:07, 9 February 2024 (UTC)
@Fay Freak OK thank you, that clears things up. Benwing2 (talk) 07:12, 9 February 2024 (UTC)
Thank you @Fay Freak for not opposing. Indeed the period of Late Koine 300-600 (600 being accepted as the turning point, languagewise, with the original-Greek parts of the Novellae in Iustinianos' legal reforms, while history has a different periodisation) is under the jurisdiction of Ancient Greek administrators. As seen at {{R:DGE}} and Bailly2020: these dictionaries extend to authors of up to 6-9th, 10th, 13th centuries, when such authors use Koine as high register. ‑‑Sarri.greek  I 07:36, 9 February 2024 (UTC)
@Benwing2, sorry to bother you again: what is going to happen? Would you like me to call more people to vote? Mr @A. T. Galenitis who edits all phases of Greek including Medieval is away. As you see, not many are interested in Greek. But, I am, I am: I am willing and available! Every year, fewer and fewer people will be voting. In the end, I will be the only voter! I am awaiting and anxious to start editing. Thank you. ‑‑Sarri.greek  I 18:26, 15 February 2024 (UTC)
@Sarri.greek I'd like to give it a couple of weeks. As it is looking, the split seems pretty clear and the name change is leaning towards, although User:0DF has not added their votes yet. Note that in general you should not canvass votes, i.e. ping people specifically for voting purposes esp. if you believe they will vote in a particular way that you desire. Benwing2 (talk) 00:51, 18 February 2024 (UTC)
@Benwing2, of course, of course! People vote if they agree, not because I called them. I am just informing people with whom we have been discussing this for more than a year, people who have edited, or want to edit, Greek. Just Mr A. T. Galenitis, an excellent editor, who supports strongly. But they do not come very often, and they do not get messages except from their Talkpages. I always check Related changes for el and for grc, and I am sorry to say that there are very few people interested. Perhaps some editors doing very many languages create some exotic lemmata. Thank you very much, I can wait, I know how busy you are. ‑‑Sarri.greek  I 08:53, 18 February 2024 (UTC)
Thank you very much @Sarri.greek for bringing this to my attention and for once again putting in the effort for this very worthwhile change. Indeed, I have been rather inactive lately, but as the creator of many gkm lemmata I am adamant about the need for this split, with arguments which have been repeated multiple times. I would be more than happy to put in the required work for my own lemmata and create more while at it. Regarding the naming, both approaches have some historical value (with varying power of persuasion) to them, yet from a functional point of view it doesn't make much sense to oppose the recent literature and main body of research within the field where "Medieval Greek" has become dominant (vide Holton's et al. recent monumental Cambridge Grammar of Medieval and Early Modern Greek). A. T. Galenitis (talk) 17:09, 16 March 2024 (UTC)Reply
I am sorry that I write so schoenotenically; I have difficulty with concision. I'm also sorry that I have taken so long to respond; I have done a lot of research regarding this topic since η Δις Σαρρή first petitioned the Beer parlour for these changes. As you'll see below, whilst I still oppose the change of this chronolect's name, I have come to support its split into a lect with its own L2 header, at least in principle. I feel I should explain my position, especially regarding my “concern…that η Δις Κατερίνα Σαρρή has a different understanding of what this vote endorses from the understanding of the other voters here”.
Δις Σαρρή· When you write things like:
I am left with the impression that you want the label “Medieval Greek” to refer only to the relevant period's basilect of the Greek diglossia. If that is your position, what then happens to the acrolect of that period? Does it remain part of Ancient Greek (grc)? And if so, should Katharevousa be treated similarly? Ultimately, is post-Classical Greek to be split primarily by register? Perhaps I've misinterpreted you, but if so, please clarify your position. If this is your position, you should make it explicit, so that everyone knows exactly what's being voted on. Perhaps this is what Fay Freak meant by “ambiguous consequences”. I could support either one, be it a split by period or by register. Here's a litmus test: In what variety of the Greek macrolanguage was the Suda originally written?
What I could not support is a split by period that excludes from Byzantine Greek its higher-register elements. You seem to want to do that when you say “Med.Greek does not have Dative.” and “We do not add Dative at Mod.Greek either”. It is untrue that Byzantine Greek does not have the dative; on the contrary, as Staffan Wahlgren writes, “The most important observation…is that the dative is so surprisingly alive and productive in such a wide range of Byzantine texts.” (Wahlgren 2014: Abstract) Even Holton et al. (2019: II, 241–243), whom I've already quoted at length above, acknowledge that “datives survive in many of the written texts that th[eir] Grammar is based on” and that “[p]articularly common are datives governed by the prepositions ἐν and σύν”, before recording their decision nevertheless to exclude all datives (except ἀλλήλοις) with the single sentence “Because the dative had ceased to be part of the spoken vernacular by about the 10th c., dative forms are not included in the paradigms set out in the chapters that follow.” — Blink and you'll miss it! And those datives aren't all just learned preservations; especially noteworthy is the Early Modern Cretan Greek noun ἐμπιστευτιός (empisteftiós), which is one of the “[w]ords belonging to [a] paradigm [which] have only been found in LMedG and EMG texts from Cyprus. In all cases these words are local variants of masculine words in -τής…. The earliest examples are from Assizes B (15th-c. ms).” (Holton et al. 2019: II, 451), and which has the dative plural form ἐμπιστευτηόδες (empisteftiódes) attested in a sixteenth-century text.
As a general concern, I think you lean on Holton et al. too much: their work has a far more limited scope than is immediately apparent. As Martin Hinterberger writes, despite the recent appearance of the Cambridge Grammar of Medieval and Early Modern Greek, it is not the “comprehensive linguistic description of written Byzantine Greek (in all its multifarious variants) [which] remains one of the desiderata of Byzantine literary studies” (Hinterberger 2021: 21); in my opinion, though not (explicitly) Hinterberger's, Holton et al. have treated the Greek of 1100–1700 “as a degenerated, deficient form of classical Greek, [which they have ignored,] or as an immature form of modern Greek” (Hinterberger 2021: 37). We should not do the same.
I want to end this on a note of praise. I admire the enthusiasm and hard work you pour into this. If I have the effect of applying brakes, please understand that I do so only to ensure clarity prevails and that the best decisions are taken, even if it might not seem that way to you. I notice that you are writing a module to handle the declension of all Greek nouns. I think this is a worthwhile effort, and it has a precedent in Module:zlw-lch-headword. It would certainly be good to have a common theme for all Greek nominal declension, since that would avoid such aesthetically objectionable clashing as currently exists in Λεϊβνίτιος (Leïvnítios). Keep up the good work! 0DF (talk) 01:51, 24 March 2024 (UTC)Reply

Rename to Medieval Greek

  1.  Support ‑‑Sarri.greek  I 05:54, 9 February 2024 (UTC)Reply
  2.  Support   — Saltmarsh🢃 06:26, 9 February 2024 (UTC)Reply
  3.  Oppose - Byzantine is the more common term and no valid argument has been given against it. Nicodene (talk) 08:19, 9 February 2024 (UTC)Reply
    Thank you @Nicodene for your support for this language. Yes, the term Byzantine is extremely common because we have Byzantine studies, Etudes Byzantines at Sorbonne, Byzantine Music, Byzantine Iconography, Byzantine Empire and so on. But I do not recall any language taking its name from an empire e.g. Roman Empire Latin, British Empire English? is there any example? Mandarin perhaps as non-linguistic term? The term was used pre-2000 influenced by the very common 'Byzantine' epithet. Greek linguists also used it, but later, preferred the term 'μεσαιωνικός', medieval. But, thanks anyway. ‑‑Sarri.greek  I 08:38, 9 February 2024 (UTC)Reply
    The actual comparison to *[British Empire English] would be *[Byzantine Empire Greek], which nobody says either. And it'd be strange to argue that British English, British music, and British art are all "named after an empire" just because there was also a British Empire. They're all named after Britain and the British people, just as all the things you mention are named after Byzantium and the Byzantines. Nicodene (talk) 09:08, 9 February 2024 (UTC)Reply
    @Sarri.greek: As Nicodene wrote, Byzantine Greek isn't named for the Byzantine Empire; rather, both are named for the Byzantines, who are named for Byzantium. Languages are usually named for people, places, or polities (and polities are usually named for either of the former). Because of what people and places can be named for, this can result in pretty weird language names. For example, Big Nambas (nmb) and Nez Perce (nez) are named for peoples with the same designation, and those peoples are named for their codpieces and misnamed for the Chinooks' nose piercings, respectively. Toponymically, East, South, and West Bird's Head are named for Bird's Head, a peninsula of Papua that looks, indeed, like a bird's head; I can only assume that Port Sandwich (psw) was named for the Vanuatuan coastal settlement that has since been renamed Lamap; and Western Desert (nine dialect codes) is named for desert areas in western Australia (chiefly Western Australia). Many creoles have strange names. Other language names are odd for etymological reasons; for example, Ukrainian (uk, literally “borderlandese”, although this etymology is disputed) and Zamboanga Chavacano (cbk, literally “poor-taste mooring-place”). And then there are names that are picturesque, like Cœur d’Alêne (crd, literally “heart of awl”), Hill (mrj) and Meadow Mari (mhr), Large (hmd) and Small Flowery Miao (sfm), and Blue (hnj), Green (also hnj), and White Hmong (mww). By comparison, Byzantine Greek is not at all strange or particularly romantic (pun intended).
    I admit I got a bit carried away with the examples there. Sign languages are generally more clearly named for polities; for example, American (ase) and British Sign Language (bfi); compare the more obscure Maritime Sign Language (nsr). Dari (prs and gbz) supposedly derives from Classical Persian دربار (darbār, royal court) and one could argue that Dano-Norwegian is named for the political union Denmark–Norway. However, the language name most unambiguously named for an empire is probably Imperial Aramaic (arc), named for the Neo-Assyrian, Chaldean, and especially Achaemenid Empires. Finally, consider Ashokan Prakrit, which goes one step further by being named for a specific emperor, namely the Mauryan Emperor Ashoka the Great (regnavit circa 268–232 BC). 0DF (talk) 00:34, 7 March 2024 (UTC)Reply
  4.  Support {{abstain}} Both names seem about equally common, and I don't really care which one we use. I'm not opposed to either name. Thinking about it some more, I've decided I prefer "Medieval". —Mahāgaja · talk 09:53, 9 February 2024 (UTC)Reply
  5.  Support Thadh (talk) 18:32, 15 February 2024 (UTC)Reply
  6.  Abstain {{support}} Following the contributions of user 0DF to the discussion, I also see the merit of the term Byzantine Greek. Most importantly, I understand that I require additional reading before coming to a final conclusion. For the time being, abstaining (i.e. agreeing with either terminology to be adopted). A. T. Galenitis (talk) 21:28, 21 March 2024 (UTC)Reply
  7.  Oppose To avoid further perceptions of prolixity, I shall be terse:
    Reasons for “Byzantine Greek”:
    1. As I've argued before, the language should be called Byzantine Greek “because its production is inextricably linked to Byzantine civilization” (Hinterberger 2021: 22).
    2. Other things being equal, endonymy is desirable. However, ready apprehensibility by Anglophone readers often supersedes this consideration. The Byzantines usually called themselves Ῥωμαῖοι (Rhōmaîoi, literally Romans), their country Ῥωμανία (Rhōmanía), and their language Ῥωμαϊκή (Rhōmaïkḗ). English Romaic and Rhomaic exist, but I wager they're little-known, and likely to be mistaken as relating to Romani or Romanian. Ancient Greek Ἕλληνες (Héllēnes) exists, but is not specific to the Byzantine period, and “Hellenic Greek” Hellenistic Greek Koine Greek. There's Ancient Greek Γραικοί (Graikoí), but that's used for the macrolanguage “Greek”. There is marginal self-reference by Byzantines to their histories as Βυζαντιακαὶ (Buzantiakaì) and to themselves as Βυζάντιοι (Buzántioi), so “Byzantine Greek” is endonymic. By contrast, no people in the Middle Ages called themselves “Mediaeval” anything.
    3. “Byzantine” is a fairly familiar term to the average educated Anglophone. It is an epithet applied to a great many disciplines, journals, and phenomena pertaining to the empire of that name (v. e.g. [1], [2], [3]), the vast majority of the primary sources for which are written in Byzantine Greek. Cet. par., it is desirable that referents systematically related in such a manner should share a nomenclature. I doubt that those various disciplines would adopt the relatively cumbersome “Mediaeval Greek X” nomenclature to replace the relatively concise “Byzantine X” nomenclature, and it would be ungrammatical to do so in compound modifiers such as Serbo-Byzantine.
    4. The alphabetical and chronological orders of the three chronolects of Greek (that are written in the Greek alphabet) are the same. For any word homographic in the three chronolects — many (most?) consonant-initial ((pro)par)oxytones — this allows one to trace its development from Ancient Greek, through Byzantine Greek, and all the way up to the Greek of the present day by scrolling down the page and reading in order: a boon for comprehension. This serendipity would be lost if Byzantine Greek were renamed Mediaeval Greek.
    Reasons against “Mediaeval Greek”:
    1. Mediaeval means “of or pertaining to the Middle Ages (Latin Medium Aevum)”, but those Middle Ages were not universally significant. Traditionally, the Middle Ages are regarded as beginning in 476 with the fall of the Roman Empire in the West and as ending in 1453 with the fall of the Roman Empire in the East. Linguistically, the former had a considerable impact on Medieval Latin: the dissolution of Roman institutions, radical decentralisation, vernacular drift, development of feudalism, and immigration of unassimilated peoples led to linguistic innovations and borrowing on a massive scale; often regarded as corruptions, various attempts were made to restore Classical Latinity, as in the Carolingian Renaissance, but these saw only partial success until the triumph of humanist Ciceronianism in the Italian Renaissance. Thus, Mediaeval Latin was succeeded by Renaissance Latin and then by New Latin. This makes the epithet “Mediaeval” highly suited to that chronolect of Latin. By contrast, Byzantine Greek saw no such dissolution, decentralisation, or feudalism, at least not until the Fourth Crusade; for Greek, the fall of 1453 was vastly more consequential than the fall of 476 — the opposite was true for Latin. This makes the epithet “Mediaeval” highly unsuited to that chronolect of Greek. For more, see Kaldellis 2019: ch. 4 (“Byzantium Was Not Medieval”), pp. 75–92.
    2. The adjective has four justifiable spellings: mediaeval, medieval, mediæval, mediëval. Byzantine has only one. Cet. par., that a term's spelling be uncontested is desirable.
    3. The English Wikipedia has three articles entitled “Medieval X” for languages (Medieval Greek, Hebrew [4th–19th CC.!], and Latin); in other articles I saw, they give Medieval Catalan as a synonym of Old Catalan, Medieval Spanish and Old Castilian as synonyms of Old Spanish, and for Galician–Portuguese they give the five synonyms Medieval Galician, Medieval Portuguese, Old Galician, Old Galician–Portuguese, and Old Portuguese. That would give the impression that, in language names, medieval and old are synonymous; not so Medieval Greek, which has the synonym Middle Greek (alongside Byzantine Greek and Romaic). Middle and Old are much more common as chronolect descriptors than Medieval (CAT:en:Languages has 2 members named “Medieval X”, 25 named “Middle X”, and 64 named “Old X”). AFAIK, no one calls Byzantine Greek “Old Greek”. IMO, “Middle X” only really works for languages with a threefold chronolectal division designated “Old–Middle–New X” or “Old–Middle– X”. Greek, however, has a four- or even six-fold division — Mycenaean–Ancient–Byzantine–Modern or Mycenaean–Homeric–Classical–Koine–Byzantine–Modern — one would be hard-pressed, especially in the latter, to describe the Byzantine chronolect as being in the “Middle”.
    4. Pace Κ. Α. Τ. Γαληνίτη, it is not at all apparent that the term “‘Medieval Greek’ has become dominant”, and contra Holton et al., here are uses of Byzantine Greek from three authors, with many more available. The ISO received three proposals in 2006–2009 to create new codes for Medieval Greek gkm, Ecclesiastical Greek ecg, and Katharevousa Greek elr; last year, the ISO rejected them all, partly due to “the lack of consensus among them” (p. 2). It is noteworthy that § 4 of the original change request for Medieval Greek gkm gave the language's name as “Middle Greek” and said of it that “[t]he language is distinct from Ancient Greek in vocabulary, phonology, and grammar, and displays linguistic attributes which are characteristically Byzantine and uncharacteristic of Ancient Greek” [my emphasis], whereas the first page of the request for the new language code element gkm gave, as the reason for preferring the name “Middle Greek” over the autonym “Romaiki” and the alternative names “Byzantine Greek” and “Medieval Greek”, that “Middle Greek” was the “[m]ost common amongst scholars” (!); it's only because Anastassia Loukina emailed SIL International to write that “the more common term used in Greek linguistics to refer to this stage of Greek is ‘Medieval Greek’ rather than ‘Middle Greek’” that the proposal was changed (by the ISO?) to one for “Medieval Greek”, although Δις Loukina merely asserted her claim, not citing anything. Is there any real evidence that any one term predominates?
    Alas! So much for avoiding prolixity…
    @A. T. Galenitis, Benwing2, Erutuon, Fay Freak, Mahagaja, Nicodene, Saltmarsh, Sarri.greek, -sche, Thadh, Theknightwho: For those of you who have voted or who intend to vote, I humbly request that you consider what I've written. For those of you not voting, I ping you in case you're interested and because you've taken part in this discussion before. To all of you, I apologise for the length of this post; I seem not to be very good at brevity. 0DF (talk) 07:37, 20 March 2024 (UTC)Reply
    I've read all you wrote above but am not convinced by it, certainly not enough to change my vote. Points 2 and 4 pro Byzantine strike me as irrelevant, and point 3 sounds like it could equally be an argument to use the term "Anglo-Saxon" instead of "Old English", which I trust no one in this day and age still wants to do. None of the arguments contra Medieval strike me as particularly strong. —Mahāgaja · talk 07:56, 20 March 2024 (UTC)Reply
    And what argument for 'medieval' struck you as strong? Nicodene (talk) 08:27, 20 March 2024 (UTC)Reply
    I think somewhere in this discussion or an earlier one I said I prefer "medieval" because it makes it clear that the lect in question is a chronolect, not a regiolect. —Mahāgaja · talk 08:37, 20 March 2024 (UTC)Reply
    Wut, even if Greek writing is located far in Arabia or Ethiopia, I still call it Byzantine Greek provided it matches the period. Fay Freak (talk) 11:23, 20 March 2024 (UTC)Reply
    Right, but calling it Medieval Greek makes it clearer that what's relevant is the time period, not the location. —Mahāgaja · talk 11:39, 20 March 2024 (UTC)Reply
    The case can be made that 'Medieval' is chronologically explicit, but it is simply unimaginable that anyone could know the term Byzantine yet mistake Byzantine Greek for a regional label. Nicodene (talk) 11:59, 20 March 2024 (UTC)Reply
    I don't find that unimaginable at all. It's certainly more plausible than someone thinking Byzantine Greek referred to overly complex or intricate Greek, but we can't entirely rule that interpretation out either. —Mahāgaja · talk 12:56, 20 March 2024 (UTC)Reply
    It would require someone who knows about the city of Byzantium and yet is unaware of the existence of the Byzantine Empire, in other words a person that does not exist. As for the other potential sense of ‘Byzantine’, that is simply not an argument as it applies just as well to someone mistaking ‘medieval Greek’ as referring to a brutal or savage dialect. Nicodene (talk) 13:16, 20 March 2024 (UTC)Reply
    Was Byzantine Greek also used outside the borders of the Empire? —Mahāgaja · talk 13:32, 20 March 2024 (UTC)Reply
    Certainly, as it doesn't have to do with borders either.
    If anyone has ever actually used ‘Byzantine Greek’ to distinguish one variety of Greek from another based on region or geopolitical control I've yet to see any sign of it. Nicodene (talk) 13:59, 20 March 2024 (UTC)Reply
    So the language in question is used outside of the geographical area denoted by "Byzantine" but not outside of the chronological era denoted by "Medieval". That's why I prefer to call it Medieval Greek. —Mahāgaja · talk 14:11, 20 March 2024 (UTC)Reply
    ‘Byzantine’ is not a geographical area.
    The one, and only, valid point in this is as stated above - that ‘Medieval’ is more chronologically transparent. Nicodene (talk) 14:21, 20 March 2024 (UTC)Reply
    @Mahāgaja: Thank you for reading my rather overlong post. Responding to your points:
    1. Do you regard point 2 pro Byzantine as irrelevant because you disagree with the statement “other things being equal, endonymy is desirable”? If so, I understand you, since that statement is my axiom for that point. Otherwise, I would appreciate a rationale.
    2. I don't see how you could call point 4 pro Byzantine irrelevant for this project. In a dictionary of Byzantine Greek only, it indeed would be irrelevant, but since that's not what Wiktionary is, it's simply an error to call that point “irrelevant”.
    3. AFAICT, “Anglo-Saxon” — itself a compound modifier — is on all fours with “Old English” in terms of its suitability for forming compound modifiers. That seems like a disanalogy to me.
    4. Whereas “mediaeval” is traditionally clear vis-à-vis period (viꝫ 476–1453), a lot of usage muddies the waters. Jacques Le Goff throughout his career (or at least from 1977 onward) sought to extend the Middle Ages into “the eighteenth century, when, he believe[d], the European nation-states properly emerged” (Kaldellis 2019: ch. 4, p. 77). And conversely, some scholars of chronologically preceding and succeeding fields annex parts of the Middle Ages to their own periods: “The field of ‘late antiquity’ has been pushed by some to the early Carolingians (i.e., to the ninth century), whereas at the other end some historians of early modernity have reached back to claim everything after the twelfth century, when the European economy embarked upon a trajectory that would arc to modernity. With late antique and early modern historians claiming so much territory, that leaves only a rump Middle Ages squeezed around the turn of the millennium. [¶] Byzantium has little standing or stake in this debate.” (ibidem: pp. 77–78)
    0DF (talk) 15:27, 20 March 2024 (UTC)Reply
    I do disagree with the statement "other things being equal, endonymy is desirable". At Wiktionary, as at Wikipedia, what matters is what a language is commonly known as in English, not what its native name is. That's why we call German German, not Deutsch, and Dutch Dutch, not Nederlands. And no ancient language was known to its speakers with modifiers like "Old", "Ancient", "Classical", "Primitive" and so forth. And you yourself point out that Greek speakers of the era under discussion generally referred to their languages as (the Greek equivalent of) Romaic; but absolutely no one here is suggesting that Wiktionary's canonical name for this language should be Romaic. So that point is actually not an argument in favor of Byzantine at all; it's an argument against both Byzantine and Medieval. Point 4 is irrelevant because that's simply not a consideration we have ever had or ever should have. The names "Old Irish", "Middle Irish" and "Irish" are in reverse alphabetical order; so what? —Mahāgaja · talk 15:57, 20 March 2024 (UTC)Reply
    @Mahagaja: Re “what matters is what a language is commonly known as in English”, I already wrote that “ready apprehensibility by Anglophone readers often supersedes th[e endonymy] consideration”, so we don't disagree on the overriding importance of that. However, given a choice between two English names identical in their recognisability (which is an instance of that “other things being equal” qualifier), would you really maintain that endonymy wouldn’t even be a consideration to break the tie? That's not a strictly irrational position, but I would be surprised if you held it. Anyway, with regard to Romaic, Byzantine, and Mediaeval, my point is that Romaic would be best in terms of endonymy, but its obscurity disqualifies it; whereas Byzantine and Mediaeval are comparably familiar to educated Anglophones, so Byzantine’s endonymy can break that tie. Is my position on this point any clearer now? That “Point 4” is nothing other than a consideration about page layouts which has some bearing on this issue; I'm not saying that it's a be-all and end-all, just that it's a relevant consideration, even if other considerations are primary. 0DF (talk) 00:09, 21 March 2024 (UTC)Reply

Split from Ancient Greek

  1.  Support, as creator of this proposal ‑‑Sarri.greek  I 05:54, 9 February 2024 (UTC)Reply
  2.  Support   — Saltmarsh🢃 06:26, 9 February 2024 (UTC)Reply
    Thank you @Saltmarsh, my guru, mentor and administrator at Modern Greek! I promise to work as you have taught me. ‑‑Sarri.greek  I 06:33, 9 February 2024 (UTC)Reply
  3.  Support Nicodene (talk) 08:13, 9 February 2024 (UTC)Reply
  4.  Support —Mahāgaja · talk 08:26, 9 February 2024 (UTC)Reply
  5.  Support Thadh (talk) 18:32, 15 February 2024 (UTC)Reply
  6.  Support A. T. Galenitis (talk) 16:46, 16 March 2024 (UTC)Reply
  7.  Support in principle — I am concerned, however, that η Δις Κατερίνα Σαρρή has a different understanding of what this vote endorses from the understanding of the other voters here. 0DF (talk) 07:46, 20 March 2024 (UTC)Reply
    See § phase 1 (above) for an explanation of this comment. 0DF (talk) 01:55, 24 March 2024 (UTC)Reply

?


Happy month: καλό μήνα (kaló mína), @Benwing2, Mahagaja and everyone! Are we still on hold? I would like so much to come back, but how? having to write {m|gkm|xxx} all the time in pages with Ancient title... for example, @παπᾶς. I need: a month to review what exists. A year to do some labels for Learned Medieval (=archaisms and Hellenistic style), for Early Modern Greek (with medievalisms), some ready-to-fill-in inflection tables, some reference templates etc. I cannot even start without a code. Thank you. ‑‑Sarri.greek  I 17:00, 1 March 2024 (UTC)Reply

@Sarri.greek: I'm working on responses. Sorry for the delay. Please bear with me. 0DF (talk) 02:06, 2 March 2024 (UTC)Reply
Oh, M @0DF. What do you mean 'working on responses'? Please do not flood this page? We understand you are against. I shall make a special workpage-plan for MedGr once it is allowed. And with a talk page, and sections for every subject about it, where you can write as long texts as you like. Thank you. ‑‑Sarri.greek  I 06:04, 2 March 2024 (UTC)Reply
@Sarri.greek It looks like we have consensus for both changes, esp. for the split: 6-0 plus one undecided (User:0DF) for the split, 5-2 for the rename (User:Nicodene and User:0DF opposing). User:0DF, you never gave a response concerning the rename. Do you have anything you'd like to register (e.g. concerns, alternative suggestions, etc.)? Keep in mind that renames are easier to do than splits, so if for some reason it's decided in the future to undo the rename or switch to a third term, it wouldn't be such a big deal. Benwing2 (talk) 01:46, 17 March 2024 (UTC)Reply
Thank you all, thank you M @Benwing2! Great Sunday! I'm ready to start work! and will be checking the changes. I have prepared a trial-User:Sarri.greek/About Medieval Greek (in the pattern of WT:About Ancient Greek), a trial Template:User:Sarri.greek/gkm-IPA which needs to 'show' visibility, and more. Proposals and suggestions for the first-time-presentation of MedGr are welcome and needed from everyone, especially the administrators of Ancient and Modern Greek. e.g. at User About's Talkpage (or open an extra page?, please tell me, Sir, and everyone.) Thank you. ‑‑Sarri.greek  I
I don't have a ton to add to this discussion, any work to offer, or any great expertise - but in terms of periodizing, I wonder if it would also make sense to periodize Koine or Classical as separate from Ancient (in the sense that I guess Ancient Greek sort of goes until 300 BC, and Koine/Classical goes until whenever we consider Byzantine/Medieval to start). My main thought here is that beyond new vocabulary borrowings replacing other vocabulary, or changes in grammatical forms or pronunciations, it is my extremely amateur perception that meanings, of some words at least, gradually shifted over the Ancient->Classical->[Byzantine to Medieval]->Modern period, especially as a result of Christianization. Or possibly that most attested pre-medieval Greek texts are Classical rather than Ancient texts. I also think it's fine to call it Medieval Greek, that seems to be what English Wikipedia uses anyway. Anyway, that's my very late 2 cents to add to this discussion. -Furicorn (talk) 09:57, 31 August 2024 (UTC)Reply
@Furicorn: Thank you for your contribution. Currently, Ancient Greek is all Greek prior to 1453 except for that written in Linear B (which is Mycenaean Greek). Classical Greek and Koine Greek are not synonyms. The core of Classical Greek is the Attic Greek of the 5th century BC. Koine Greek is the form of Greek that developed as a consequence of the language's spread by the empire of Alexander the Great. It would certainly be possible to split Greek five ways — Mycenaean–Ancient/Classical–Koine–Byzantine/Mediaeval–Modern — but I expect that would result in a lot of redundancy, and I'm not sure it would be worth it. In reality, there is more difference between Homeric Greek and the rest of what we currently call Ancient Greek than there is between Classical Greek and Byzantine Greek, but that is not the split that was originally proposed in this discussion. As to what to call this chronolect, “that seems to be what English Wikipedia uses” is not a very strong argument unless you can tell us why it uses that name. 0DF (talk) 19:17, 12 September 2024 (UTC)Reply

Continuation (originally on Sarri's talk page)


(moved from User_talk:Sarri.greek/2024#The_old_discussion_in_its_new_place)

Hello, Sarri. It was good of you to create that updated signpost. How is your health nowadays? Do you feel up to answering those questions I posed you at WT:LTR#Medieval Greek from Ancient Greek yet? No pressure if not; I just thought I'd check. All the best. 0DF (talk) 05:00, 27 September 2024 (UTC)Reply

Hello M @0DF, thank you for your interest. Healthwise, I am under therapies (sometimes very hectic). I apologise, that I cannot remember your questions. I am typing with difficulty and I cannot participate in discussions that are too long.
If they rename 'Byzantine' to 'Medieval' or 'Mediaeval Greek', I will be able to check all occurrences. If they split Medieval Greek from Ancient Greek, I will slowly edit the not so many pages involved, marking the too many unmarked Koine entries too.
Attn @Benwing2, Chuck Entz as linguists and bureaucrats: I believe I will have the time to do it. It is simple: There IS a mediaeval period for Greek (grk) (working code everywhere: gkm, or more 'officially' proposed here as grk-gkm). Please include it in en.wiktionary, filling a gap of some 10 centuries from grk's c.3,000+ history. I might make a few very simple templates needed when necessary.
I will be happy to answer questions here; excuse my short answers. Thank you ‑‑Sarri.greek  I 13:28, 27 September 2024 (UTC)Reply
@Sarri.greek: I'm sorry to hear it's still rough for you. The questions are all in WT:LTR#Medieval Greek from Ancient Greek, but that became quite a long discussion by the end. It doesn't seem very ethical to subject you to more questioning in your current condition. Since renaming the chronolect was and is a matter of some contention, what's stopping you from being “able to check all occurrences” of Byzantine [Greek]? 0DF (talk) 01:16, 28 September 2024 (UTC)Reply
@0DF, thank you. gkm automatically gives 'Byzantine'. I have already cleaned up all old manual edits for the language. If administrators that are professional linguists do not prefer the title 'Medieval' to 'Byzantine', there is nothing I can do. ‑‑Sarri.greek  I 06:00, 28 September 2024 (UTC)Reply

@0DF @Sarri.greek I don't think either of you is going to change their mind with further discussion, so I don't think more questions are in order. Given that there is a (bare) supermajority of 4-2 (pro: @Sarri.greek @Saltmarsh @Thadh @Mahagaja; con: @Nicodene; @0DF) with one abstention, I am inclined to go ahead with the rename of Byzantine -> Medieval. More importantly, there is strong support for splitting Medieval/Byzantine out of Ancient Greek, and I don't want this name dispute to be a blocking issue. If it turns out that later on we decide to go back to the name Byzantine, that is not hard to do and I can do it by bot (I've done plenty of such renames before). @Sarri.greek Please be aware that logistically, splitting out gkm into its own L2 language requires adopting a temporary code for either the new L2 language or the old etym-only language while both are coexisting, until everything is moved. My inclination is actually to do the following:
  1. Set up tracking for both the gkm and newly adopted grc-gkm codes.
  2. Rename the etym-only code gkm -> grc-gkm by bot. Leave its name as "Byzantine Greek".
  3. Remove the etym-only tracking for gkm once there are no more references. Leave the tracking for grc-gkm; this will make Sarri's job easier below.
  4. Create a new L2 language gkm named "Medieval Greek". (Having them have different names is fortuitous as it will avoid some complaints about duplicate language names.)
  5. Sarri, over time, moves the relevant entries to the ==Medieval Greek== header and cleans up any existing references to grc-gkm.
  6. When all references to grc-gkm are gone, we can remove this etym-only code.
Also pinging @Theknightwho and @Surjection (who have been involved in prior language splits) for any technical comments. Benwing2 (talk) 07:12, 28 September 2024 (UTC)Reply
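The six steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: the `Lect` record, the `run_plan` function and the in-memory registry are hypothetical stand-ins, not Wiktionary's actual language-data modules, and the tracking in steps 1 and 3 is reduced to counting references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lect:
    """Hypothetical record for one language code (not Wiktionary's real data model)."""
    name: str
    kind: str                     # "etym-only" or "L2"
    parent: Optional[str] = None  # parent code for etym-only lects


def run_plan(registry, references):
    """Walk through steps 2 to 6 on a toy registry and a list of code references."""
    # Step 2: rename the etym-only code gkm -> grc-gkm, keeping the name "Byzantine Greek".
    registry["grc-gkm"] = registry.pop("gkm")
    references = ["grc-gkm" if code == "gkm" else code for code in references]
    # Step 3: the old code can only be reused once nothing refers to it any more.
    assert references.count("gkm") == 0
    # Step 4: create the new L2 language under the freed code.
    registry["gkm"] = Lect("Medieval Greek", "L2")
    # Step 5: entries are moved over time; the sketch moves them all at once.
    references = ["gkm" if code == "grc-gkm" else code for code in references]
    # Step 6: remove the temporary code once it is unreferenced.
    if references.count("grc-gkm") == 0:
        del registry["grc-gkm"]
    return registry, references


registry = {
    "grc": Lect("Ancient Greek", "L2"),
    "gkm": Lect("Byzantine Greek", "etym-only", parent="grc"),
}
final_registry, final_refs = run_plan(registry, ["gkm", "grc", "gkm"])
print(final_registry["gkm"].name, final_refs)
```

The point of the temporary code is visible in the ordering: `gkm` never means two things at once, because the etym-only lect is moved aside before the L2 language is created.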
M @Benwing2, thank you so much for your reply and thorough plan! I can see how busy you are, dealing with so many languages. I appreciate your work, and your decision; a true gift to grk but also to me, personally. Please note that admin @Mahagaja has proposed the official code grk-gkm, not grc-gkm, as it is a period of the Hellenic language (grk). I'll follow your edits closely and will do my best, a bit slowly, but diligently; I shall rename Categories, update wikidata and do all the work where administrators need not be bothered. I search with insource: and intitle: I might make a little label to produce: Late Medieval or Early Modern Greek +cat, if I encounter words of 1500, 1600. High register (with datives etc, similar to Koine) can be covered by {lb|gkm|learned}. Thank you, thank you. ‑‑Sarri.greek  I 08:31, 28 September 2024 (UTC)
@Sarri.greek I'm not sure the context behind grk-gkm but grc-gkm is a temporary label used because it refers to a variety of grc. The temporary label will go away once everything is converted to the L2 language gkm. Benwing2 (talk) 08:40, 28 September 2024 (UTC)Reply
@Sarri.greek with apologies - this is really beyond my "pay grade" and terra incognita to me. I rarely venture there. To you personally Sarri - I have friends who have been through similar tribulations and worries, my best wishes.   — Saltmarsh 17:48, 28 September 2024 (UTC)Reply
@Benwing2: I'll agree that no one objects to splitting Byzantine Greek out of Ancient Greek, and that the split should go ahead; however, I don't see how “this name dispute [is] a blocking issue”. Why can't the split take place without changing the name? Unfortunately, I think the original discussion suffered for its obscure location in WT:RFM (now moved to the even more specialised WT:LTR); perhaps there would have been greater engagement had it taken place in WT:BP, where Sarri.greek initially posted about it. To remedy this, shall I draft a vote about the naming issue?
AFAICT, most of what needs to be done on “the front end” is to edit the 289 member-entries of Category:Byzantine Greek to rename, split out, or duplicate their contents as appropriate; of its member-subcategories, only Category:Byzantine surnames needs (presumably) to be renamed Category:Byzantine Greek surnames. As for the changes on “the back end”, I don't really understand why that six-stage process is necessary. Would it not be sufficient to make {{lb|grc|Byzantine}} (and its aliases) categorise into the temporary topic category Category:grc:Byzantine Greek until all the relevant entries are edited to use the new L2 header? I apologise if that is a naïve question. 0DF (talk) 09:14, 30 September 2024 (UTC)Reply
M @0DF. They are not Byzantine. Not necessarily. I would edit under such a title only for historical, artistic fields, probably at wikipedias. Thank you. ‑‑Sarri.greek  I 09:29, 30 September 2024 (UTC)Reply
@Sarri.greek: Sorry, what aren't Byzantine? 0DF (talk) 09:39, 30 September 2024 (UTC)Reply
@Sarri.greek: Do you mean the surnames? Are they, rather, Koine? Shall I recategorise them? 0DF (talk) 14:14, 30 September 2024 (UTC)Reply
M @0DF, aimez-vous les byzantinismes? ‑‑Sarri.greek  I 16:44, 30 September 2024 (UTC)Reply
@Sarri.greek: I'm sure that's very witty, and perhaps I should respond « Pas du tout ! » or something; I'm also sure you're better acquainted with French literature than I am. But to interpret you literally, rather than literarily, no, I don't tend to adopt overcomplicated solutions, and I fail to see how my technically naïve suggestion to Benwing is in any way more complicated than the plan he laid out. 0DF (talk) 17:40, 30 September 2024 (UTC)Reply
@0DF I don't honestly see the need to relitigate this with a formal vote. WT:RFM is the normal place where language moves and splits used to happen (and now WT:LTR). AFAIK everyone has been pinged and had a chance to comment, and the process requesting yes/no votes has been open far longer than a standard vote. However, I will defer to what User:-sche says, who has been the overall person shepherding language moves through; do you think a formal vote is needed? As for the plan I suggested, yes this is necessary because of the existence of the current gkm code; the labels are not the only place that Byzantine/Medieval Greek is being referred to. Benwing2 (talk) 19:08, 30 September 2024 (UTC)Reply
@Benwing2: I figured I'd wait a while before replying, with a view to letting things cool off and in the hope that Ms Sarri might acquiesce to my supplication for a rationale, but no such luck. If this were a līs, the prosecution would be expected at least to make a case. That has yet to happen; consequently, a judgment notwithstanding the verdict is appropriate. There has also been canvassing (see Special:Diff/78033207, Special:Diff/78033252); those are grounds for a “retrial”, surely. We might expect a safer finding from a superior court. In the meantime, I reiterate my suggestion that we make the split without changing the name, since there are no objections to doing that. 0DF (talk) 00:53, 21 October 2024 (UTC)Reply
Plan for Medieval Greek

Dear M @Benwing2, I keep checking Watchlists and your contributions, waiting for your #plan for Medieval Greek. (WT:LTR#Medieval Greek from Ancient Greek) I know how busy you are with more important languages. But I am available for Greek and waiting... Long hours in front of the computer... Thank you ‑‑Sarri.greek  I 20:02, 18 October 2024 (UTC)

@Sarri.greek: I ask again that you present some actual reasons for this proposed name-change, especially since you said in February that you “wouldn't mind terribly either” name. What's changed? 0DF (talk) 00:53, 21 October 2024 (UTC)Reply
[by Sarri.greek: Please excuse a short recap of a long discussion.] The essence of my proposal (as at Jan2024petition & sources for documenting our lemmata in March2023) was to humbly inform the community of editors of the English Wiktionary of the newest developments on Medieval Greek language studies, defined now as a Medieval period of a language instead of "Byzantine" language as we often have been seeing at chairs of "Byzantine Studies" in universities all over the West. This was not MY opinion, but the opinion of professors like w:David Holton, w:Geoffrey Horrocks and many others. I humbly asked the linguists of en.wikt to take a look at their introduction at T:Cambridge Grammar of Medieval and Early Modern Greek
p.xvii … "as Greek scholarship was relatively slow to catch up with the advances made in textual criticism and editorial practice for medieval texts in other major European languages. Over the past thirty or so years much has changed in relation to the situation described above"
And they conclude at p.xix "For this reason we do not employ the term “Byzantine Greek”: for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not “Byzantine” in a political sense. Our criteria are instead internal ones, based on clusters of important linguistic changes that we see as occurring around 1100, 1500 and 1700"
Whether there are linguists specialising in Greek (ancient, medieval or modern) who object, I did not hear any references from editors who object to the name "Medieval". The vote for naming languages is not about how one feels about it, but to give a chance to all the community to bring in references and enrich the information available.
Who is the linguist that opposes to the term "Medieval Greek" or the existence of it as a distinct period? ‑‑Sarri.greek  I 20:17, 21 October 2024 (UTC)Reply
@Sarri.greek @0DF We're looking for consensus rather than litigating something in a court of law, so I am reluctant to do something like a judgment notwithstanding the verdict, which would involve overturning the consensus. I asked on Discord in the #hellenic channel and several additional people expressed support to one degree or another for the term "Medieval" and none for "Byzantine", so I am going to go ahead with the rename. Keep in mind that consensus does not have to be (and often is not) completely unanimous, and that renames are relatively easy to undo if for some reason this needs to happen. At the same time, however, a few people on Discord expressed strong reservations about the split and also said the amount of work required to effect the split might be a lot more than we think. Combined with the fact that, in my experience, merges are even harder than splits, this suggests to me that we should go slow in putting a split into practice. Let's first effect the rename, get the kinks worked out, and only then revisit the issue and look more deeply into what the split will involve. Benwing2 (talk) 20:38, 21 October 2024 (UTC)
No problem with "slow", thank you M @Benwing2. Who are the linguists (contemporary ones) referred who think that it is not a distinct period? ‑‑Sarri.greek  I 20:55, 21 October 2024 (UTC)Reply
@Sarri.greek I'll let them reveal themselves if they want; they said they didn't want to participate in this discussion to avoid causing further upset and distress. Benwing2 (talk) 01:02, 22 October 2024 (UTC)Reply
@Sarri.greek, Benwing2: My apologies for taking a while to reply here. I am exceptionally busy in real life ATM, so haven't had the combination of time and lucidity to respond until now.
@Sarri.greek: In response to your question (“Who is the linguist that opposes to the term ‘Medieval Greek’…?”): I already cited above the Byzantinist professor Αντώνιος Καλδέλλης (Antónios Kaldéllis), whose 2019 book, Byzantium Unbound, has a chapter explicitly entitled “Byzantium Was Not Medieval”, which I excerpted in Citations:medieval. (Kaldellis is an Athens-born Greek, which I mention because it seems to matter to you.) The gist of his argument is that the Greek world didn't undergo the Middle Ages (the approximate millennium of notional benightedness intermediate between the dissolution of Classical civilisation and its Renaissance – literally “rebirth”), so the adjective “mediaeval” is improper when applied to it. Really, the closest analogue to the Middle Ages in the Greek world was the Τουρκοκρατία (Tourkokratía) of 1453–1821. Sure, you can call 1821 Greece's “delay[ed R]enaissance”, though only if you call 1453 its “delayed Fall”, but then it becomes clear that Latin Europe's Middle Ages and the Greek world's “Middle Ages” have basically nothing to do with each other. I have more to say, but I'll leave it there to keep it short.
@Benwing2: I applied the “court of law” analogy because you used the term relitigate, that's all. Still, I think it was illuminating: I'm sure you won't deny that this whole proposal is ill-conceived and ill-pursued. I do find it galling that Ms Sarri can so grossly mischaracterise the foregoing discussion with statements like “I did not hear any references from editors who object to the name ‘Medieval’” having, by dint of her sheer obstinacy, eventually found a forum that will wave her proposal through; independently of the merits of her proposals, that kind of evasive and dishonest behaviour should not be rewarded. Finally, you wrote that you are “reluctant to do something like a judgment notwithstanding the verdict, which would involve overturning the consensus,” before immediately invoking a conversation on Discord in order to overturn the consensus! How is such a double standard justifiable? But as a general principle, there is no way that “people on Discord said” should carry such weight, least of all when people on-wiki have not been made privy to the text of the discussion and those who opined on Discord decline to “own” their comments in the publicly-accessible on-wiki record. Wiktionary:Discord server neither permits nor prohibits such trust-me-bro invocations, but w:Wikipedia:Discord#Consensus clearly states that the relevant part of Wikipedia's policy on consensus (“Consensus is reached through on-wiki discussion or by editing. Discussions elsewhere are not taken into account. In some cases, such off-wiki communication may generate suspicion and mistrust.”) applies. We should have the same regulation, and judging by the comments by AG202, Mnemosientje, Sgconlaw, CitationsFreak, and koavf in Wiktionary:Beer parlour/2024/October#change in color to nyms, usex, affixusex, such a regulation has considerable community support.
0DF (talk) 17:56, 3 November 2024 (UTC)Reply

Solombala English


Howdy folks! I am wondering whether it would be a good or a bad idea to add a new language code for Solombala English, a very sparsely attested pidgin that has some features in common with Russenorsk. It has only 20 known words, and two of them are obviously misunderstood by the later translators (but can be seen in the original sources). All the words, as far as I know, are presented here: w:ru:Соломбальский английский язык (I added some commentary and sources there as well, but a long time ago). The main reason for my request is that Solombala may be useful in the etymology of some Russenorsk words. Tollef Salemann (talk) 17:43, 9 February 2024 (UTC)

Support. Theknightwho (talk) 08:54, 25 February 2024 (UTC)Reply

Created as crp-slb, since this has been open for a couple of weeks, and no-one else seems to have much to say. @Tollef Salemann.

I have given it the Cyrillic and Latin script codes because, having checked, the original 1849 source uses (pre-reform) Russian Cyrillic, but modern sources seem to prefer a Latin transcription exclusively: e.g. "vat ju vanted, asej!" is actually "ватъ ю вантетъ, асей!" in the 1849 source (pp. 406-7); note that вантетъ (vantet) has been transcribed as vanted, for instance. I can't find the 1867 source referred to, but I assume it's also in Cyrillic.

Please let me know if you think we should be handling the scripts in a different way, though. Theknightwho (talk) 09:30, 25 February 2024 (UTC)Reply
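To make the script point concrete, here is a minimal sketch of how the 1849 Cyrillic phrase maps to Latin. It is an assumption-laden toy: the `TABLE` mapping covers only the letters that occur in the quoted phrase, it drops every ъ (which is only safe here because each ъ in the phrase is word-final), and it is not the Russian transliteration module that the language is actually set to use.

```python
# Hypothetical letter table, limited to the letters of the quoted 1849 phrase.
TABLE = {
    "в": "v", "а": "a", "т": "t", "ю": "ju",
    "н": "n", "е": "e", "с": "s", "й": "j",
    "ъ": "",  # assumption: every hard sign in this phrase is word-final and silent
}


def translit(text):
    """Transliterate letter by letter; anything outside TABLE passes through unchanged."""
    return "".join(TABLE.get(ch, ch) for ch in text)


print(translit("ватъ ю вантетъ, асей!"))  # vat ju vantet, asej!
```

The result ends in "vantet", matching the source spelling, rather than the "vanted" found in modern Latin transcriptions.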

Thank you! There is also "my" instead of "tu". This was a mistake by Broch, I guess, and it seems like I'm the only one who noticed it. There is also a funny story with his translation of "milek", cuz it was used in some adult context. As far as I remember, there is no original Latin-script Solombala, but I'm gonna check through all the sources first to be sure. The 1867 source took me a while to find last year, but I remember it wasn't impossible. Tollef Salemann (talk) 11:07, 25 February 2024 (UTC)
@Tollef Salemann Alrighty - let me know if you think we should remove Latn. I should have also said that I've also set it to use Russian transliteration, for obvious reasons. Theknightwho (talk) 03:04, 27 February 2024 (UTC)Reply


Converting Min Nan into a family


Currently, we classify Min Nan (nan) as a language, despite it being a family of several Chinese lects. Because of this, the way we treat those lects is arbitrary and inconsistent.

  • Hokkien and Hainanese are both classified as etymology-only languages, despite Hokkien covering several major (dia)lects in its own right, and it being very common for entries to have a large number of Hokkien readings. It's not currently possible to add Hainanese to {{zh-pron}}, but it's also on the roadmap. In terms of how they are used, nothing distinguishes them from how we handle any of the full languages under the Chinese header, so there's no reason to classify them like this.
  • On the other hand, Teochew and Leizhou Min are classed as full languages, but they both have Min Nan set as their "ancestor", which is nonsense. I assume this was done so that the family tree looked right (see Category:Old Chinese language), but this has clearly happened because editors think of Min Nan as a family, not a singular language.

Currently, there is a pending request at the ISO in order to split Min Nan into a macrolanguage (though I won't address those which we don't currently have codes for, since that discussion is for another time).

  1. nan should be converted to a family code.
  2. Hainanese (nan-hai) should be converted to a full language.
  3. Hainanese, Hokkien (nan-hok), Leizhou Min (zhx-lui) and Teochew (zhx-teo) should be on the immediate level below.
  4. Given the large number of entries with numerous Hokkien readings, there are two options:
    1. Convert Hokkien to a full language, with Quanzhou, Zhangzhou and Xiamen etymology-only languages, possibly with the addition of Taiwanese Hokkien.
    2. Convert Hokkien to a family, and have Quanzhou Hokkien, Zhangzhou Hokkien and Xiamen Hokkien as full languages on the level below. I have no opinion on whether Taiwanese Hokkien (which is split out in the ISO proposal) should be treated separately if we do this.

Theknightwho (talk) 13:07, 17 February 2024 (UTC)Reply

 Support the first three bullet points, but  Weak oppose on the fourth:
  • a potential slippery slope: Singapore, Penang, Longyan, etc. could warrant full languagehood if ZXQ and Taiwan are split
  • treatment of the above would be ambiguous due to the nature of Hokkien potentially not being monophyletic and the fact that eg. Taiwanese can’t really be called “a dialect of Amoynese” despite their shared transitionary nature
  • to draw a parallel with Northern Wu, Shanghainese and Suzhounese, both not being full languages, occupy a very similar genealogical level when compared to ZXQ, though as far as the current trajectory is going, they will not be gaining full language-hood any time soon
Just my two cents — 義順 (talk) 02:57, 18 February 2024 (UTC)Reply
@ND381 Just to be clear, does that mean you support option 1 of point 4? Theknightwho (talk) 14:42, 18 February 2024 (UTC)Reply
ah yeah I misread what that said — yes, I would be in support of option 1 of the fourth point — 義順 (talk) 16:38, 18 February 2024 (UTC)Reply
@ND381 What do you mean by "transitionary"? (talk) 11:31, 28 February 2024 (UTC)Reply
I don't particularly know too much about Hokkien linguistics (I do Northern Wu) but from what I understand Amoynese and Taiwanese both exhibit features of both Zhangzhou and Quanzhou lects — 義順 (talk) 12:01, 28 February 2024 (UTC)
@ND381 I see. This is the common wisdom, I guess.
In truth, it makes little sense to pretend that "Zhangzhou" & "Quanzhou" are cardinal dialects. For one thing, there is a great deal of variation within what are supposed to be "Zhangzhou" Hokkien & "Quanzhou" Hokkien. Quemoy & Tâng-oaⁿ 同安 dialects of "Quanzhou" Hokkien, as a clear example, are themselves "transitional to Zhangzhou". So the entire "Zhangzhou-Quanzhou" framework is made of duct tape. "Zhangzhou-Quanzhou" reflects Confucian administrative loyalties more than anything else, as the English terminology (via Mandarin Pinyin) suggests. And the exclusion of Amoy Hokkien from "Quanzhou" is arbitrary & inconsistent in itself. So, there's "nothing there", even if certain isoglosses unsurprisingly bundle along the old prefectural border. (talk) 08:59, 29 February 2024 (UTC)Reply
Similar to ND381,  Support the first three points. The second subpoint of point 4 is a terrible idea, since it leaves out Zhangzhou-Quanzhou mixed varieties of Hokkien, which is one of the reasons why "Hokkien" isn't monophyletic. It's also unclear whether dialects like Jinjiang and Philippine Hokkien would be subsumed under Quanzhou. While we're at this, we would also need to see how certain other varieties of Min Nan are dealt with under the structure based on the first three points, namely Longyan (including Zhangping), Datian, Youxi, southern Zhejiang and Zhangzhou-based varieties spoken in Guangdong/Guangxi. While the Language Atlas of China groups Longyan with other Quan-Zhang varieties, it seems that it traditionally isn't considered "Hokkien". We might also want to see where Hailufeng Min fits here. (I'm writing this in a little rush, so there might be more points that come along after.) — justin(r)leung (t...) | c=› } 14:30, 18 February 2024 (UTC)Reply
@Justinrleung No, "Longyan" is most definitely not part of Hokkien, either linguistically or sociolinguistically.
Hai Lok Hong Hoklo is clearly parallel to Hokkien & Teochew.
The Hokkien dialects of southern Zhejiang are clearly part of Hokkien.
Many or most pieces seem poised to fall into place. (talk) 11:36, 28 February 2024 (UTC)Reply
@ I agree with you on this - Longyan should definitely be treated separately. I omitted it from the proposal because I specifically wanted to address the issue of whether we should treat Southern Min as a family, so I only mentioned the codes we currently have. It’s not supposed to be comprehensive, and in fact I was hoping it could set the stage for further additions, as I thought this change should probably happen before we add anything else. Theknightwho (talk) 13:07, 28 February 2024 (UTC)Reply
No particular vote as I don't think I'm qualified enough to discuss Southern Min here as I very rarely edit it, but I share similar views with ND and Justin based on my limited understanding of the internal structure of Southern Min after reading Kwok (2018).
I reckon the treatment of Zhongshan Min should perhaps also be discussed here, given that Glottolog treats it as a subbranch of Southern Min, although it seems like some of it is Eastern Min. Either way I think it will need a code. – wpi (talk) 14:09, 23 February 2024 (UTC)
Seconding this. Apparently, so-called "Zhongshan Min" is three mutually unintelligible languages, two of which may not belong to the NAN family (?) at all. (talk) 11:42, 28 February 2024 (UTC)Reply
I don't have much knowledge of the relationship between ZQX Hokkien and other Hoklo varieties like Chaozhou and Hainan.
However, the Amoy, Quanzhou and Zhangzhou varieties are mutually intelligible to some extent. Linguistically, the Amoy varieties should be treated as a dialect of the ZQX language, just like the Irish entry deirfiúr, which contains various pronunciations from dialect locations in Ireland.
As for whether Taiwanese (Taigi) should be treated as a full language or as a dialect of Hokkien, it's something like the Serbo-Croatian language separation issue.--Yoxem (talk) 10:50, 28 February 2024 (UTC)
@Theknightwho  Supporting Item 2.
Not opposing Item 1 (nor Item 3) in this context, but — even disregarding misplaced outliers — how much evidence is there that these languages (say, Hainanese & Hokkien) belong to one family in a historical sense? (Wikipedia doesn’t treat Singlish & Jamaican Creole, for instance, as being in the same language family as English. Or do we use the term “family” differently around here?)
 Supporting Item 4.1, excluding Taiwanese.
The “Zhangzhou-Quanzhou-Amoy” split reflects the mapping of Confucian loyalties. It corresponds somewhat to linguistic reality, but attempts to package “Zhangzhou” Hokkien & “Quanzhou” Hokkien in a systematic manner seem to give off more smoke than light, as suggested by Mar_vin_kaiser’s comment clarifying what “Zhangzhou Hokkien” should mean.
So so-called “Zhangzhou” Hokkien or “Quanzhou” Hokkien or Amoy Hokkien are all just Hokkien. The “Zhangzhou-Quanzhou” split reflects Confucian psychology, not linguistic reality, and “Amoy” was set up as a third group not for linguistic but for Confucian or face-related (“face truce”) reasons. If some words have lots of pronunciations, in part this reflects the sociolinguistic reality of a wide range of dialects being recognized as a single language. Also, marginal pronunciations seem to find their way into Wiktionary for Hokkien much more than for most other languages, but as long as they exist (and not just idiolectally) & are non-extinct, this is good & well. If extinct or poorly attested pronunciations are swelling the ranks, methods may need examined, but that’s for some other day.
There is something to be said for treating Penang-Medan Hokkien as another language. Even w/o getting into the genesis of Penang Hokkien, the phonology of the variety seems to bend the rules of plain Hokkien. But the convention seems to be to treat it as a dialect within Hokkien, and this in turn reflects the sociolinguistic reality. (talk) 11:54, 28 February 2024 (UTC)Reply

Pinging @Mar vin kaiser, Singaporelang, Mlgc1998, 幻光尘, LeCharCanon, MistiaLorrelay, Kangtw, The dog2, TagaSanPedroAko, Janinga Chang, Yoxem, 汩汩银泉, RcAlex36, Geographyinitiative for comment, who are all users who've edited recently that have some knowledge of Min Nan. Theknightwho (talk) 11:16, 27 February 2024 (UTC)Reply

Thanks for calling - but actually I'm not proficient on the historical & comparative linguistics of Minnan, so I'll report the opinion from @S.G.Junge1997 who is currently working on various Southern Han varieties (I'm doing so because he's currently suffering from IP block).
“As almost all the Sinitic languages that we discuss here, including Southern Min, Northern Wu and so-on, are de facto macrolanguages, it would be not proper to list just some variety of these macrolanguages as distinct languages while to consider other least-concerned languages a part of the huge dialect continuum, not mentioned the phonological, lexicological or genetic differences between the least-concerned varieties are much larger than these varieties with metropolitan native speakers. Janinga Chang (talk) 15:55, 27 February 2024 (UTC)Reply
...Taking Southern Min as an example, the macrolanguage Southern-Min itself is emerged among a group of coastal Min varieties in Dàtián, Fújiàn and surrounding area. Genetically, Southern Min can be divided into three varieties, the Western varieties used in Lóngyán and Zhāngpíng, Fújiàn Province, some remnants in Guǎngdōng Province (namely Zhōngshān Hokkien and some varieties of Leizhou Min), while the majority of Southern Min languages are in fact dialects of the massive Eastern varieties, including Chaozhou, Southern Min proper and Taiwanese Southern Min, these varieties shared a huge amounts of vocabularies and intelligibility, with only some of the characteristic vocabularies shared inside different branches. I'm not arguing about not list Chaozhou and Southern Min proper as different languages, but if one should consider listing Chaozhou and Quanzhou-Zhangzhou Southern Min or even Taiwanese Southern Min as separate languages appropriate, they must consider listing Dàtián qiánlù, Dàtián hòulù, Kǒngfūhuà, Sūbǎnhuà, Yànshí-Báishā, Lóngyán proper, Yǒngfú-Héxī, Zhāngpíng proper, Xīnqiáo-Xīnán and other small varieties concerned way less as distinct languages as well, (apart from Dàtián qiánlù and Dàtián hòulù, all these languages are different varieties of Western branch of the Southern Min which are using in different valleys around Lóngyán, most of which have less native speakers than 10k and are critically endangered, and although most of these languages share some common features, their differences in vocabularies and phonologies make them less intelligible internally than most of Eastern branch varieties, even not considering Chaozhou and Southern Min proper as different languages, some of these languages are still so diverse to be okay to be listed as separated) as it wouldn't be so appropriate to have "endangered" language varieties with often more than 1000k metropolitan native speakers listing as different languages while ignoring the real endangered languages with less than 10k native speakers and trying to hide their differences using a leftover garbage can discarded by the metropolitan people who think their language is absolutely unique.”
Although this might sound offensive to some who value the traditional Quanzhou-Zhangzhou-Amoy-Taiwan layout more, his opinion is definitely worth considering, since he has actually been to Longyan for fieldwork several times. Janinga Chang (talk) 16:05, 27 February 2024 (UTC)
Hi! I  Support the first three points, same as the ones above. I also reject the second subpoint of point 4 for the reasons mentioned. For the first subpoint of point 4, I support making Hokkien a full language. As for "etymology-only languages", I find it vague to say that a word from language X originates from "Zhangzhou Hokkien" when the way we've been using the term "Zhangzhou Hokkien" is the dialect specific to Zhangzhou city proper, and the word might have borrowed it not from Zhangzhou city proper. Seeing the reply of S.G.Junge1997, I'd be open to proposing Datian Min be listed as a separate language. --Mar vin kaiser (talk) 16:13, 27 February 2024 (UTC)Reply
@Mar vin kaiser Just FYI: "etymology-only language" is a misnomer; a much better description is "variant", as it covers everything from written standards like British English (en-GB) to chronolects like Old Latin (itc-ola) to regional varieties like Penang Hokkien (nan-pen). The thing that matters is that they're "part of" a full language (or, in some cases, another etym-only language). We already have codes for a few varieties of Hokkien, so that part isn't proposing anything new; just that they're nested under the new language code for Hokkien, instead of as sub-variants like they are now. Theknightwho (talk) 17:00, 27 February 2024 (UTC)Reply
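The "part of" relationship described above can be pictured as a chain of parent links that always ends at a full language. The following is a rough sketch only: the `PARENT` and `FULL` tables and the `full_language` helper are hypothetical, the codes are the ones mentioned in this thread, and the parent links shown are illustrative assumptions rather than a copy of the real language data.

```python
# Hypothetical parent links for variants ("etymology-only languages").
PARENT = {
    "en-GB": "en",         # British English, a written standard
    "itc-ola": "la",       # Old Latin, a chronolect
    "nan-pen": "nan-hok",  # Penang Hokkien, nested under another variant (assumed)
    "nan-hok": "nan",      # Hokkien, currently a variant of Min Nan
}
FULL = {"en", "la", "nan"}  # full languages in this toy example


def full_language(code):
    """Follow parent links until a full language is reached."""
    seen = set()
    while code not in FULL:
        if code in seen or code not in PARENT:
            raise ValueError(f"no full language found for {code!r}")
        seen.add(code)
        code = PARENT[code]
    return code


print(full_language("nan-pen"))  # nan
```

Under the proposal, only the link for Hokkien changes: it becomes a full language itself, and its existing variants resolve to it instead of to Min Nan.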
@Theknightwho: Thanks for explaining! Then I see no problem with it. If ever, my question is why it should not be extended to Penang Hokkien, Singapore Hokkien, and Philippine Hokkien. --Mar vin kaiser (talk) 17:07, 27 February 2024 (UTC)Reply
@Janinga Chang Seconding parts of this. It was careless for all these varieties to have been anonymously swept into NAN w/o careful examination & debate beforehand. (talk) 12:09, 28 February 2024 (UTC)Reply
 Support as well 1., 2., 3., and 4.1. as per further explanation of Theknightwho about variants under/part of Hokkien as a full language, e.g. Quanzhou, Zhangzhou, Xiamen, Penang, Singaporean, Philippine, Taiwanese, etc. etc. and also later expansion of no. 3 as well for the others under nan as a family to be their own as full languages under the nan family/branch of Min of Sinitic if they show divergent enough linguistic features and are realistically practically socially regarded by their speakers as separate from their closest of kin anyways by now, such as those mentioned above by Justinrleung and S.G.Junge1997 and those listed in the ISO pending request and other more there may be. Also, 4.2 is a bad idea due to there still being a lot of structurally similar or reasonably identical enough terms shared with these variants (ZXQ++) still tying them together despite some observable differences, whether in phonemic structure, vocabulary choices, tonal differences, and other tendencies of these variants. The gulf of difference with these variants (ZXQ++) is not yet like the difference with say what makes nan-hok, zhx-teo, nan-hai, zhx-lui, etc. different from each other, enough to definitively split them.
Also pinging as well other users I remember seeing them edit or create nan entries before: @Fish bowl, @Wikijb, @, @TongcyDai, @A-cai, @Hongthay for comment on this. Mlgc1998 (talk) 20:45, 27 February 2024 (UTC)Reply
 Support the first three bullet points. RcAlex36 (talk) 04:30, 29 February 2024 (UTC)Reply
Thanks for calling and sorry for my bad English.
For point 4, I  Support option 1 and do not support option 2.
As a native speaker (of ZC), I think the differences (between Zhangzhou Hokkien and other Hokkien tongues) are too small to split them into separate languages. I dare say they are just accents of Minnan/Hokkien.
For Teochew, Leizhou-ish and Hainanese, indeed their "ancestor" is not the Min Nan, but they are also southern descendant languages of ancient Min too — different to northern descendants like Fuzhou-ish.
(ZC: the Zhangzhou City accent of Hokkien)
MistiaLorrelay (talk) 10:06, 29 February 2024 (UTC)Reply

Split with option 1 of point 4, given the overwhelming support in the last two weeks. Taking inspiration from @Benwing2's process to split the Khanty languages above (see #Splitting Khanty Languages), I think this is what needs to happen:

  1. Assign new language codes to Hokkien (nan-hbl) and Hainanese (nan-hnm), and change over Leizhou Min (zhx-lui → nan-luh) and Teochew (zhx-teo → nan-tws). For the sake of forward-compatibility, I've used the proposed codes from the pending ISO proposal, since that will make things simpler if they're accepted.
  2. Assign a temporary family code to Min Nan (zhx-nan), which will be used while nan still exists as a language code.
  3. Track any uses of the nan code.
  4. Move all current {{nan-*}} templates to {{nan-hbl-*}}, since they all relate to Hokkien.
  5. Convert any existing entries with the Min Nan headword to the relevant language (which I suspect will be Hokkien in the vast majority of cases, if not 100%).
  6. Change any references to nan to use the appropriate code. Again, I suspect Hokkien will predominate.
  7. Change any references to the existing etymology-only codes to use the appropriate code.
  8. Delete nan as a language code, and add it as a family code, replacing the temporary code zhx-nan mentioned above.

At this point, I also suggest that we start a new thread to discuss any additional languages which should be added to the Min Nan family, as several have been suggested above. Theknightwho (talk) 18:52, 2 March 2024 (UTC)Reply
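
The code changeover in steps 1, 3 and 6 above can be sketched roughly as follows. This is only an illustration, not the actual module code: the function name `migrate_code` and the choice to default bare nan to Hokkien while flagging it for review are assumptions, based on the expectation stated above that Hokkien will predominate.

```python
# Hypothetical sketch of steps 1, 3 and 6; names here are illustrative only.

# Step 1: etymology-only codes that simply change over to a new code.
RENAMED_CODES = {
    "zhx-lui": "nan-luh",  # Leizhou Min
    "zhx-teo": "nan-tws",  # Teochew
}

# Assumption: uses of the old catch-all code default to Hokkien.
DEFAULT_FOR_NAN = "nan-hbl"


def migrate_code(code):
    """Return (new_code, needs_review) for a language code found in an entry."""
    if code in RENAMED_CODES:
        # A one-to-one rename needs no human review.
        return RENAMED_CODES[code], False
    if code == "nan":
        # Steps 3 and 6: track the use and let an editor confirm the lect.
        return DEFAULT_FOR_NAN, True
    # Any other code is left untouched.
    return code, False
```

For example, `migrate_code("zhx-teo")` yields the new Teochew code with no review flag, while `migrate_code("nan")` yields the Hokkien code together with a flag saying that someone should check the entry.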

I Support points 1-3. However, ZXQ, Taiwanese, Penang, Singapore, and Philippine are really just variable accents with some regional vocabulary, like English dialects throughout England (are all those words recorded in Wiktionary too? They can't be as separate languages though?). Here in Taiwan, Taiwanese is getting more and more standardisation as the years pass, but I agree with another post comparing it to Serbo-Croatian (all accents of a single Stokavian dialect). There are different regional words used in Taiwan, but we start to understand them all as synonyms and I don't even know anymore which words belong to which specific location, like 日頭花 vs 太陽花, or 葉仔 vs 樹葉 vs 樹仔葉 vs 樹葉仔. I frequently travel throughout Southeast Asia and try to use Taiwanese in Penang and Singapore as much as possible. As someone mentioned, Penang has some interesting phonology, but I'm still able to hold conversations with taxi drivers--they speak in their way and I in mine. Though in Penang I've encountered drivers who talk freely at length and at times I find it hard to understand some of the details--they probably understand Taiwanese better than the other way around due to television dramas. But this interaction would not be possible for Chaozhou, which I consider so different as a separate language, and also Hainan and Leizhou--the phonology is far too different and they grammatically use different words. I feel that adding all the various regional pronunciations for ZXQ/Taigi clutters Wiktionary, and I believe that a better unifying meta-spelling would be better that enables regional pronunciations to be deduced through a few simple rules. I think it's best to mention whether a location has a completely separate word for something, rather than providing multiple pronunciations of the same word/字/morphemes. I also dislike the clutter and use of "invented" alternate romanisations that are not widely used or accepted, nor can anybody actually read. 
POJ or better, TâiLô, function just fine. Kangtw (talk) 09:36, 5 March 2024 (UTC)Reply
Sorry, when I posted support above, the green + button did not automatically appear when I posted. In spite of that, please consider my vote. Kangtw (talk) 09:39, 5 March 2024 (UTC)Reply
@Kangtw The vote has actually already closed, but everyone seems to have shared your view that Hokkien shouldn’t be split and should be treated as one language, so that’s how I’ve been carrying it out. Theknightwho (talk) 18:34, 7 March 2024 (UTC)Reply
@Theknightwho what remains to be done here? Cat:Min Nan language looks mostly empty. This, that and the other (talk) 09:51, 2 October 2024 (UTC)Reply

Add etymology-only codes for Proto-Anglo-Frisian and Proto-North Sea Germanic

As variants of Proto-West Germanic. This should hopefully be relatively uncontroversial, since we already have a healthy number of entries in Category:Anglo-Frisian Germanic and Category:North Sea Germanic, and there's a need for these due to both (sub-)families being mentioned in various etymology sections:

No doubt there are many more entries where these could be referred to. Theknightwho (talk) 02:19, 27 February 2024 (UTC)Reply

@Theknightwho Anglo-Frisian is a well-established clade but I'm not so sure about North Sea Germanic. Cf. Wikipedia's comment:
North Sea Germanic, also known as Ingvaeonic /ˌɪŋviːˈɒnɪk/, is a postulated grouping of the northern West Germanic languages that consists of Old Frisian, Old English, and Old Saxon, and their descendants.
Ingvaeonic is named after the Ingaevones, a West Germanic cultural group or proto-tribe along the North Sea coast that was mentioned by both Tacitus and Pliny the Elder (the latter also mentioning that tribes in the group included the Cimbri, the Teutoni and the Chauci). It is thought of as not a monolithic proto-language but as a group of closely related dialects that underwent several areal changes in relative unison.
Benwing2 (talk) 04:36, 27 February 2024 (UTC)Reply
@Victar as a major PWG editor.
Not to mention the fact PWG is already pretty controversial (@Mårtensås had some strong opinions on the topic).
I don't think an etym-only code for either is needed at this time, as the supposed differences were very minor, and we don't represent it in our PWG entries afaik. So while the label signifies a term's distribution, it is still supposedly the same language as any other PWG reconstruction in the model we handle. Thadh (talk) 07:24, 27 February 2024 (UTC)Reply
I've never had a need for either, and North Sea Germanic is generally considered an areal grouping. -- Sokkjō 07:39, 27 February 2024 (UTC)Reply
I can see the argument against NSG, but there is very clearly a need for Proto-Anglo-Frisian based on the etymologies mentioned above. It’s not about whether any particular editor has a need for it themselves, and nobody is suggesting we create separate entries for them outside of PWG. Theknightwho (talk) 11:00, 27 February 2024 (UTC)Reply
@Theknightwho I see you created a category Category:Old Frisian terms derived from North Sea Germanic languages as well as Category:Elfdalian terms derived from North Sea Germanic languages and Category:Elfdalian terms derived from Anglo-Frisian languages. Why did you do that, since this discussion is far from resolved? Benwing2 (talk) 22:29, 27 February 2024 (UTC)Reply
@Benwing2 I've already removed the North Sea Germanic family, as I thought better of it. The question of whether we have an Anglo-Frisian clade is separate from whether we have a protolanguage for it (and that category was created back in November). Theknightwho (talk) 22:35, 27 February 2024 (UTC)Reply
Ignoring the fact that a genetic Anglo-Frisian family is disputed, as far as I'm aware, no one has published "Proto-Anglo-Frisian" reconstructions, not even Boutkan or Siebinga, so we wouldn't even have anyone to cite. -- Sokkjō 00:57, 28 February 2024 (UTC)Reply
@Sokkjo Then someone will need to deal with the etymology sections in those entries. Either we mention Anglo-Frisian reconstructions with a proper language code, or we don't mention them at all. Theknightwho (talk) 01:43, 28 February 2024 (UTC)Reply
Which entries, these: CAT:Anglo-Frisian Germanic? -- Sokkjō 02:11, 28 February 2024 (UTC)Reply
@Sokkjo English welkin (which refers to an "Anglo-Frisian Germanic" term), while Old English hriþer and metegian, Old Frisian hrither, and Saterland Frisian dusse all explicitly give Anglo-Frisian reconstructions. Theknightwho (talk) 02:15, 28 February 2024 (UTC)Reply
Amended. -- Sokkjō 04:16, 28 February 2024 (UTC)Reply
@Sokkjo You should also look at the entries mentioned in the North Sea Germanic list at the top of the thread. Once they're dealt with, I'll close this request as resolved. Theknightwho (talk) 06:44, 28 February 2024 (UTC)Reply
@Theknightwho Before resolving this, we need to clear up whether to let the existing 'Anglo-Frisian' family stand. You created it in November without discussion and it's not clear to me from this discussion whether there's consensus in its favor. Benwing2 (talk) 07:11, 28 February 2024 (UTC)Reply
@Benwing2 To explain the reasoning: I understood it to be an uncontroversial clade, which was reinforced by the existence of Category:Anglo-Frisian Germanic. I may have misunderstood the implications of that category, though. Theknightwho (talk) 07:26, 28 February 2024 (UTC)Reply
@Theknightwho I think what this shows is that all additions of clades, and more generally any addition of languages or families, needs discussion beforehand, no matter how uncontroversial it seems. Benwing2 (talk) 07:53, 28 February 2024 (UTC)Reply
@Theknightwho I see you also created the "High German" family back in November. Let me reiterate, you need to not create any more languages or families without discussion. Benwing2 (talk) 01:25, 1 March 2024 (UTC)Reply

Merging Tupinambá (tpn) into Old Tupi (tpw)

Tupinambá has only 3 entries, i, and ý, which are already covered by Old Tupi, i, and 'y/y. Also, Old Tupi is used as an umbrella term for all Tupi dialects in Wiktionary, so having a separate heading for Tupinambá doesn't make much sense. Trooper57 (talk) 17:11, 9 March 2024 (UTC)Reply

I also wanted to merge Tupinikin (tpk) for the same reason; I just realised there's a page for it. This one is basically blank, except for an empty maintenance category. Trooper57 (talk) 21:15, 9 March 2024 (UTC)Reply
tpw (Old Tupi) got merged into tpn (Tupinambá) in 2022, so we should probably follow suit. I don’t really understand why Tupinikin (tpk) should be merged, though. Theknightwho (talk) 21:52, 9 March 2024 (UTC)Reply
It's the same case as Tupinambá: what they call "Tupinikin language" is the variant of Old Tupi spoken by the Tupinikin people. I called them dialects, but the difference is like that between General American and Southern American English: they differ in pronunciation on some points and call some things by different words, but aren't languages on their own. The category is just gonna stay blank forever as all lemmas will be put in Old Tupi anyway. Also, both Tupinambá language and Tupiniquim language redirect to Tupi language on Wikipedia.
About the code, I chose tpw over tpn because I prefer the name "Old Tupi", since it's neutral. I don't mind changing the code if we keep the name. Trooper57 (talk) 22:44, 9 March 2024 (UTC)Reply
@Trooper57 For reference ISO merged Old Tupi and Tupinambá to tpn, and the code tpw was deprecated. It also seems that all varieties of Tupi are extinct. If Tupinambá & Old Tupi [tpn] are not significantly different from Tupiniquim [tpk] perhaps they should all be merged into Tupi [tpn]? - سَمِیر | Sameer (مشارکت‌ها · بحث) 21:54, 9 March 2024 (UTC)Reply
It seems theknightwho already said that while I was typing so my comment is now pointless 😞. - سَمِیر | Sameer (مشارکت‌ها · بحث) 21:56, 9 March 2024 (UTC)Reply
Discussion moved from WT:RFM.

It seems that these are the names of the same language. Infovarius (talk) 16:05, 19 July 2025 (UTC)Reply

(Notifying NoKiAthami, RodRabelo7, Trooper57): please discuss. Juwan (talk) 13:24, 20 July 2025 (UTC)Reply
They are the same. Merge Category:Tupinikin language too, which doesn't even exist: it's based solely on Glottolog's list. Trooper57 (talk) 13:58, 20 July 2025 (UTC)Reply
Agree. Tupinamba, Tupinikin, etc., as far as I know are generally understood as a same language, with of course different dialects. NoKiAthami (talk) 14:26, 20 July 2025 (UTC)Reply
(Moved from RFM to here. BTW, linking another related discussion, Wiktionary:Beer parlour/2023/September#About the Tupi-Guarani family.) - -sche (discuss) 05:23, 23 July 2025 (UTC)Reply
Pinging @Benwing2. What’s the process to change this? Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 19:43, 17 September 2025 (UTC)Reply
@Polomo We would have to first eliminate uses of the codes to be deprecated. There are no Tupinambá or Tupinikin lemmas, which makes things a lot easier, but there are 186 terms with Tupinambá translations. We need to decide (a) what name to use ("Old Tupi" or just "Tupi"?) and (b) what code to use (keep tpw, or switch to tpn following ISO 639-3? I would suggest the latter). Once we've decided those questions, we do something like this:
  1. Make tpw be an alias of tpn (or vice versa, depending on what is chosen as the canonical code), and fix the small number of requests that reference tpk to reference tpn (or tpw, whatever is canonical).
  2. Delete the category pages for Tupinambá and Tupinikin.
  3. Delete the language entries for Tupinambá and Tupinikin.
  4. Use a bot to switch all uses of tpw to tpn (or vice versa) and change the Translation headings from "Tupinambá" to "Old Tupi" (or whatever name is chosen; if these are really Tupinambá-specific translations, we might want to add an indication of this next to the translation; but they may just be generic Old Tupi terms; someone will have to review them manually). If there are both Old Tupi and Tupinambá translations for the same term, they will have to be cleaned up manually.
  5. Remove the aliases.
I might have the order slightly off here, but it's close enough. I can do most of the steps but some of them require help from someone who knows the language. Benwing2 (talk) 20:02, 17 September 2025 (UTC)Reply
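
Steps 1 and 4 above can be sketched roughly as follows. This is a hedged illustration rather than actual bot code: it assumes tpn is chosen as the canonical code and "Old Tupi" as the canonical name (both still undecided in this thread), and the names `resolve_code` and `merge_translation_lines` are made up for the example.

```python
# Hypothetical sketch of steps 1 and 4; the canonical code and name are
# assumptions, since the discussion has not yet settled either question.
CANONICAL_CODE = "tpn"
CANONICAL_NAME = "Old Tupi"

# Step 1: the codes to be deprecated behave as aliases of the canonical code.
ALIASES = {"tpw": CANONICAL_CODE, "tpk": CANONICAL_CODE}

# Translation headings that should all end up under the canonical name.
MERGED_NAMES = {"Old Tupi", "Tupinambá", "Tupinikin"}


def resolve_code(code):
    """Map an alias to the canonical code; other codes pass through unchanged."""
    return ALIASES.get(code, code)


def merge_translation_lines(lines):
    """Rename merged headings in a list of (language_name, term) pairs.

    Returns (merged_lines, needs_manual_review). Review is requested when the
    same entry had translations under more than one of the merged names.
    """
    merged = []
    names_seen = set()
    for name, term in lines:
        if name in MERGED_NAMES:
            names_seen.add(name)
            name = CANONICAL_NAME
        if (name, term) not in merged:  # drop exact duplicates after renaming
            merged.append((name, term))
    return merged, len(names_seen) > 1
```

An entry that only has a Tupinambá translation is renamed without a flag, while an entry carrying both Old Tupi and Tupinambá translations comes back flagged, matching the manual cleanup described in step 4.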

Additional Southern Min languages

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381, Benwing2): Following the various discussions relating to Min in the last month or so, now seems a good time to propose the additional Southern Min varieties which we've been missing:

  1. Zhenan Min (nan-zhn)
  2. Datian Min (nan-dtn)
  3. Longyan Min (nan-lnx) - sometimes grouped as part of Hokkien
  4. Sanxiang Min (nan-zsh) - one of the Zhongshan Min lects; the other two are apparently Eastern Min
  5. Swatow Min (nan-swt) - also known as Shantou
  6. Hoklo Min (nan-hlh) - also known as Hailufeng or Haklau Min; currently etym-only but should be made a full language
  7. Proto-Southern Min (nan-pro) - see Appendix:Proto-Southern Min reconstructions

Although we will want codes for all of these, it might not be desirable to count all of them as separate languages. I also suspect the list is far from complete. Theknightwho (talk) 19:32, 10 March 2024 (UTC)Reply

Support although (a) are we stuck with the above codes (i.e. they are proposed ISO 639 standard codes)? If not some of them could stand to be rationalized; (b) we should clarify earlier rather than later whether these should be full or etym codes (although for Chinese I suppose it makes less difference than elsewhere as the L2 header used is always "Chinese"). Benwing2 (talk) 19:37, 10 March 2024 (UTC)Reply
Swatow Min is classified under Teochew, so we do not need additional codes for it. The term "Hoklo" is a bit ambiguous because Hokkien speakers will consider "Hoklo" to refer to Hokkien. The dog2 (talk) 19:44, 10 March 2024 (UTC)Reply
@The dog2 The difficulty with "Teochew" as a name is that it refers to two different things: (1) what Wikipedia calls Chaoshan Min as a whole, and (2) the specific lect as spoken in Chaozhou, which it calls the Teochew dialect. We will still need a code for it either way, but the question is whether it should be an etymology-only code or a full language code. Theknightwho (talk) 19:54, 10 March 2024 (UTC)Reply
The first definition of "Teochew" already has a code for it. It is "zhx-teo". But I'd be open to changing it to be in line with that of the other Southern Min dialect. In Southeast Asia, the term "Teochew" in common parlance is generally understood to refer to the first definition. The dog2 (talk) 20:00, 10 March 2024 (UTC)Reply
@The dog2 Yeah, that makes sense. Just as a side point, the Teochew code was changed to nan-tws with the split of Min Nan, because it makes sense to give all the Southern Min codes the nan prefix, and the pending ISO code is tws. Theknightwho (talk) 20:21, 10 March 2024 (UTC)Reply
@Theknightwho: Thanks for starting this discussion. There are a few issues here.
  1. Zhenan Min might be a confusing name because Southern Zhejiang has both Southern Min and Eastern Min varieties; we may want to look into what other names we can use.
  2. Datian Min might need to split further into Qianlu and Houlu dialects.
  3. Does Longyan Min cover all Southern Min varieties spoken in the prefecture city of Longyan? Otherwise, there are several (sub)varieties of Longyan Min.
  4. Swatow/Shantou should probably not be separate from Teochew - it's rare to consider them different varieties.
  5. I personally prefer Hailufeng over Hoklo for the varieties of Southern Min spoken in Haifeng/Lufeng, since Hokkien may also be called Hoklo.
— justin(r)leung (t...) | c=› } 20:11, 10 March 2024 (UTC)Reply
@Theknightwho
1. “Zhenan Southern Min” lies within Hokkien, both sociolinguistically & in terms of intelligibility. It’s pretty much an overseas cluster of Hokkien (and not only b/c it arrived by sea), and should be discussed in that context.
2. Yes, but “Datian Min” is not one language. Which “Datian Mins” belong within “Southern Min” (in any meaningful sense) is a question yet to be thoroughly considered.
3. Yes. “Longyan Min” is sociolinguistically not-Hokkien as well as mutually unintelligible vs Hokkien.
4. Yes. (Not sure if the other two are “Eastern Min”, but that’s a whole other ballgame.)
5. Swatow “Min” is part of Teochew, as others have pointed out.
6. Yes, most definitely. BTW, “Hoklo” refers to the language cluster that includes this language, Hokkien, Teochew, Taiwanese, & maybe others. So “Hoklo” & “Haklau” would be cognate non-synonyms, kind of like “Thai” & “Tai”, but not as striking.
7. Maybe the supposed proto-language should be fleshed out first? (+ I apologise if this is obvious, but Kwok’s “reconstructions” seem to be something quite different from what we usually mean by reconstruction. Also note (as with the ONESELF line) how much data it just flat-out ignores or omits (in this case perhaps in order to hang on to the presumed characters-of-etymology 家 & 己). (talk) 13:45, 11 March 2024 (UTC)Reply

Beserman

(Notifying Thadh, Tropylium, Surjection): Recently I've been adding Beserman Udmurt entries (Category:Beserman Udmurt), and Beserman seems less similar to Udmurt than I initially expected (at least in terms of vocabulary and phonology). Beserman is usually considered to be a 'special' dialect of Udmurt, and it has recently also acquired its own written standard. As far as I can see, it definitely seems more convenient to create separate Beserman entries. I'm afraid that otherwise Udmurt might get pretty messy, with a Beserman alternative form for most Udmurt entries. A lot of information on the Beserman dialect can be found on http://beserman.ru/. I'll be glad to hear your opinions on this. Илья А. Латушкин (talk) 19:52, 13 March 2024 (UTC)Reply

At minimum most of the Beserman entries so far should not be listed as synonyms. Most are simply the result of a regular sound change from ы /ɨ/ to ө. Currently it seems this is also transcribed on here as /ʌ/ and transliterated as å, where at least the latter seems weird; most often I have seen the sound described as /ə/ (= Finno-Ugric transcription ə̑, which beserman.ru also seems to use). In any case, these could be easily accommodated similar to differences between e.g. English dialects, as alternate pronunciations + spellings (besides, this is not unique to Beserman but is paralleled by other dialects). A few other phenomena also come down to simple systematic pronunciation differences, e.g. the replacement of ӧ by /e/. It is unclear to me (and per current literature, it seems, also to Uralistics at large) how much else really differs between Beserman and even standard Udmurt. --Tropylium (talk) 20:07, 13 March 2024 (UTC)Reply
@Tropylium: The usage of synonym of stems from my usage of that format in Komi Izhma entries, e.g. асывыы (asyvyy). It's probably indeed a good idea to mark them as altforms, but the issue I have is mostly that Komi Izhma is actually semi-standardised alongside standard Komi, and the same issue is also present in Beserman.
On the differences between it and standard Udmurt, I honestly can't say a lot as I haven't worked too much with the language. It does feature some unique sound changes from the Proto-Permic language that set it apart from the other Udmurt dialects, like being the only Permic lect to (consistently) differentiate between the reflexes of *u and *ü. It also seems to have a national identity separate from other Udmurts. But other than that I would have to refer to Ilya, as they've worked with the language more closely. Thadh (talk) 20:47, 13 March 2024 (UTC)Reply
Sorry, whose *ü and where? Beserman has a few unique-looking cases of /ə/ (< ? *ɨ), but only in words where southeastern Udmurt more generally also shows /ʉ/ (the generally accepted historical scenario is that Beserman arises from the SE dialects of Udmurt, after a migration towards the north leaves them slightly isolated). --Tropylium (talk) 21:03, 13 March 2024 (UTC)Reply
Lytkin's. I'm talking of words like мөнөнө (månånå, to go) and зөмөнө (zåmånå, to dive). And I do take issue with your identification of the vowel as being a schwa, it most definitely isn't one. If you listen to actual recordings I think you'll agree that it is a low vowel, sometimes even as open as [a]. Thadh (talk) 21:30, 13 March 2024 (UTC)Reply
/ə/ is not my identification but what reference literature insists calling it, e.g. the late Keľmakov's monographs on Udmurt dialectology like Udmurtin murteet (1994), Диалектная и историческая фонетика удмуртского языка (2003). A lot of beserman.ru's recordings do sound more like [ʌ] or [ɐ], I agree. This could be a recent development, also e.g. the loss of ӧ is only post-WW1. --Tropylium (talk) 20:43, 14 March 2024 (UTC)Reply
Overall Permic languages have undergone some shifts in the recent century, also including the delabialisation of ӧ (ö) in practically all varieties of Komi. Since we are primarily a descriptive dictionary of the modern languages (earlier stages are a bonus!) I think we should stick to the modern pronunciation. The transcription of the vowel as å was taken over from Komi-Yazva, which has a very similar vowel written the same way. Thadh (talk) 09:07, 15 March 2024 (UTC)Reply
I know nothing about Udmurt, but I do agree that unless and until Beserman is considered a separate language, its entries should be formatted along the lines of {{alt form|udm|аску|from=Beserman}} rather than as synonyms of primary-dialect forms. —Mahāgaja · talk 21:40, 13 March 2024 (UTC)Reply
@Tropylium I have found some other sound correspondences between Udmurt and Beserman:
1. йырси ~ йөрчө 'hair', кырси ~ көрчө 'son-in-law'
2. кеч ~ кесь 'goat', ӟуч ~ дюсь 'Russian'
3. син ~ синь 'eye', кин ~ кинь 'who', нин ~ нинь 'linden'
4. тэй ~ тей 'louse', дӥсь ~ дись 'clothes', дэрем ~ дерем 'shirt'
5. ӝӧк ~ ӟек 'table', ӝыт ~ ӟөт 'evening', ӝужыт ~ ӟужөт 'high'
6. ньөм ~ ним 'name', йөвор ~ ивор 'news'
7. сылал ~ слал 'salt', плем ~ пилем 'cloud'
Илья А. Латушкин (talk) 18:24, 14 March 2024 (UTC)Reply
FWIW most of this is also within normal phonetic variation for Udmurt dialects, the /Te/ > /Tʲe/ change is the only systematic feature I don't recall seeing reported before (makes sense though, helps for not entirely losing the э/ӧ contrast).
One thing to consider is that even if we created Beserman separately, we'd then still want to note all forms like these in Udmurt entries, just now as etymological cognates rather than pronunciation variants. It might not save substantial work altogether. The etymologist in me at least thinks this would be probably the nicer option though, if you're already creating separate entries anyway. And it would be more consistent also with how we have split Komi-Zyrian and Komi-Permyak, instead of treating them as variants of single "Komi". --Tropylium (talk) 19:43, 14 March 2024 (UTC)Reply
The same thing has come to my mind as well, and at first sight the differences between Komi-Zyrian and Komi-Permyak do not seem to be much larger than those between Udmurt and Beserman.
I've found two more sound correspondences (1. ӟуч ~ дюсь 'Russian', ӟеч ~ десь ‘good’, 2. ньыль ~ ниль ‘four’, выль ~ виль ‘new’) and some Beserman words not found in standard Udmurt (most of them Turkic loanwords), eg. бикем ‘aunt’, биягам ‘husband's older brother’, бийөм ‘mother-in-law’, ўармиська ‘brother-in-law’, писяй ‘cat’ (also found as ‘писэй’ in dial. Udmurt), … Also some other, more sporadic, vowel correspondences have come up: изьыны ~ узьөнө ‘to sleep’, губи ~ гиби ‘mushroom’, чорыг ~ чорог ‘fish’, сюрес ~ сьөрес ‘road’, бугро ~ бөгра ‘felling’, … Илья А. Латушкин (talk) 08:50, 15 March 2024 (UTC)Reply
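
To make the "regular correspondence vs. separate form" question in this thread a bit more concrete, here is a minimal sketch that applies three of the correspondences mentioned above (ы ~ ө, ӧ ~ е, ӝ ~ ӟ) to standard Udmurt spellings and reports which attested Beserman forms they fail to predict. The rule list is a deliberate simplification assumed for illustration, not a full description of Beserman phonology, and the function names are made up.

```python
import unicodedata

# Assumption: only three correspondences from this thread, applied in order.
CORRESPONDENCES = [
    ("ы", "ө"),  # the regular ы > ө change
    ("ӧ", "е"),  # replacement of ӧ by /e/
    ("ӝ", "ӟ"),  # as in ӝыт ~ ӟөт 'evening'
]


def nfc(text):
    """Normalise to NFC so composed and decomposed letters compare equal."""
    return unicodedata.normalize("NFC", text)


def predict_beserman(udmurt_word):
    """Apply the simplified correspondences to a standard Udmurt spelling."""
    word = nfc(udmurt_word)
    for udmurt_sound, beserman_sound in CORRESPONDENCES:
        word = word.replace(nfc(udmurt_sound), nfc(beserman_sound))
    return word


def unpredicted(pairs):
    """Return the Udmurt ~ Beserman pairs that the rules do not account for."""
    return {u: b for u, b in pairs.items() if predict_beserman(u) != nfc(b)}


# Pairs quoted earlier in this thread.
ATTESTED = {
    "ӝӧк": "ӟек",      # 'table'
    "ӝыт": "ӟөт",      # 'evening'
    "ӝужыт": "ӟужөт",  # 'high'
    "кырси": "көрчө",  # 'son-in-law'
}
```

With these rules the three pairs from item 5 are fully predictable, whereas кырси ~ көрчө is not, because it also involves the рс ~ рч correspondence from item 1. Forms of the first kind are the ones that could plausibly be handled as pronunciation variants.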

More etym codes for Chinese varieties, part 1

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho Hopefully this ping isn't too noisy. There are two more sources of Chinese lects here at Wiktionary that I have found that may need etym-only codes: qualifiers in thesaurus entries and labels in Module:labels/data/lang/zh. The following table is derived from thesaurus qualifiers (I computed this as part of converting nan codes and qualifiers to appropriate lect codes):

| Qualifier | Count | Comment | Wikidata entry (if any) |
| ACG | 1 | Does this mean "Anime, Comics, Gaming"? Not a lect. | |
| Anxi Hokkien | 2 | Need lect code? | |
| Australia | 1 | Ambiguous | |
| Buddhism | 5 | Not a lect | |
| Buddhist temple | 8 | Not a lect | |
| Chinese landscape garden | 1 | Not a lect | |
| Christianity | 1 | Not a lect | |
| Classical Chinese or in compounds | 1 | Ambiguous | |
| Classical Chinese | 59 | Ambiguous | |
| Classical | 8 | Ambiguous | |
| Eastern Min; Southern Min | 1 | Ambiguous | |
| Fuzhou | 1 | Ambiguous | |
| Guangdong | 1 | Ambiguous? | |
| Guiyang | 1 | Need lect code? Per w:Southwestern Mandarin, a subvariety of the Kun-Gui variety of Southwestern Mandarin | Q15911623 |
| Harbin Mandarin | 1 | Need lect code; a variety of Northeastern Mandarin | Q1006919 |
| Harbin | 2 | (same as above) | |
| Hong Kong | 24 | Ambiguous | |
| Hong Kong><tr:pot1 | 1 | Ambiguous | |
| Hsinchu & Taichung Hokkien | 1 | ??? Do we need two lect codes? Wikidata has a "Taichung Accent" (Q10914070) but it is a variety of Mandarin; can't find Hsinchu Hokkien in Wikipedia or Wikidata | |
| Internet slang | 9 | Not a lect | |
| Internet | 2 | Not a lect | |
| Japanese calligraphy | 1 | Not a lect | |
| Jilu Mandarin | 1 | Need lect code; primary subdivision of Mandarin | Q516721 |
| Jinhua Wu | 1 | Need lect code | Q13583347 |
| Korean calligraphy | 1 | Not a lect | |
| Liuzhou Mandarin | 2 | Need lect code? | Q7224853 |
| Liuzhou | 1 | (same as above) | |
| Longyan Min | 2 | Need lect code (but will likely be transitioning to a full language, see #Additional Southern Min languages); per Wikipedia, a variety of Hokkien, but that may be wrong | Q6674568 |
| Luoyang Mandarin | 1 | Need lect code; a variety of Central Plains Mandarin | Q3431347 |
| Luoyang | 3 | (same as above) | |
| Macau | 2 | a variety of Cantonese? Do we need a lect code? | |
| Mainland China | 3 | Ambiguous | |
| Mainland | 2 | Ambiguous | |
| Malaysia | 11 | Ambiguous | |
| Mandalay Taishanese | 1 | an overseas variety of Taishanese; Do we need a lect code? | |
| Min | 12 | Ambiguous | |
| Muping Mandarin | 1 | Do we need a lect code? This may be a variety of Shandong Mandarin (Q3285432) | |
| Muping | 2 | (same as above) | |
| Nanchang Gan | 1 | Need lect code | Q3497239 |
| Northern China | 1 | Ambiguous | |
| Northern Mandarin | 2 | Ambiguous | |
| Philippines | 1 | Ambiguous | |
| Pinghua | 1 | Ambiguous | |
| Pingxiang Gan | 3 | Do we need a lect code? A variety of Yiliu Gan Chinese (Q8053438) | |
| Qing Dynasty | 1 | Not a lect | |
| Sichuanese or Internet slang | 1 | Sichuanese = zhx-sic; Internet slang = not a lect | |
| Singapore | 13 | Ambiguous | |
| Son of Heaven | 2 | What is this? Not a lect. | |
| Southeast Asia; dated or dialectal in Mainland China | 1 | Ambiguous | |
| Southwestern Mandarin | 2 | Need lect code | Q2609239 |
| TCM | 3 | Traditional Chinese Medicine? Not a lect. | |
| Taichung & Tainan Hokkien | 1 | Do we need a lect code or two? See above under "Hsinchu & Taichung Hokkien" for Taichung Hokkien. Tainan Hokkien is mentioned in Wikipedia as being the prestige dialect of Taiwanese Hokkien but can't find it in Wikidata. | |
| Tainan Hokkien | 1 | (see above) | |
| Taiwan | 24 | Ambiguous | |
| Taiwanese | 2 | Ambiguous | |
| Taiyuan | 1 | Need lect code? Variety of Jin Chinese | Q10941068 |
| Taoism | 1 | Not a lect | |
| Thailand | 2 | Ambiguous | |
| Urumqi | 2 | Need lect code? Variety of Lanyin Mandarin | Q10878256 |
| Wanrong | 1 | This is a mountain indigenous township in Taiwan; I don't know what lect is being referred to, and whether it's even Chinese. Refers to Wanrong County in Shanxi; a variety of Central Plains Mandarin, mentioned in the Great Dictionary of Modern Chinese Dialects; apparently a subvariety of Fenhe Mandarin (Q10379509) | |
| Xi'an Mandarin | 1 | subvariety of Guanzhong Mandarin (Q3431648); not sure if it needs to be distinguished from Guanzhong | Q123700130 |
| Xi'an | 1 | (same as above) | |
| Xinzhou | 3 | Need lect code? Variety of Jin Chinese, doesn't seem to have Wikidata entry | |
| Yinchuan | 1 | Need lect code? Variety of Lanyin Mandarin | |
| Yongchun Hokkien | 1 | Need lect code? | Q65118728 |
| Yudu Hakka | 1 | Need lect code? | Q19856416 |

There are 14 lects among the above qualifiers with Wikidata entries that I could find, and some others apparently without Wikidata entries that might need a code. Benwing2 (talk) 03:12, 18 March 2024 (UTC)Reply
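
The figure of 14 can be reproduced from the last column of the table. The snippet below is just a bookkeeping sketch: the dictionary is copied from that column, the `SAME_AS` mapping reflects the rows marked "(same as above)", and the helper name `wikidata_item` is made up for the example.

```python
# Qualifier -> Wikidata item, copied from the last column of the table above.
WIKIDATA_BY_QUALIFIER = {
    "Guiyang": "Q15911623",
    "Harbin Mandarin": "Q1006919",
    "Jilu Mandarin": "Q516721",
    "Jinhua Wu": "Q13583347",
    "Liuzhou Mandarin": "Q7224853",
    "Longyan Min": "Q6674568",
    "Luoyang Mandarin": "Q3431347",
    "Nanchang Gan": "Q3497239",
    "Southwestern Mandarin": "Q2609239",
    "Taiyuan": "Q10941068",
    "Urumqi": "Q10878256",
    "Xi'an Mandarin": "Q123700130",
    "Yongchun Hokkien": "Q65118728",
    "Yudu Hakka": "Q19856416",
}

# Short qualifiers that the table marks "(same as above)".
SAME_AS = {
    "Harbin": "Harbin Mandarin",
    "Liuzhou": "Liuzhou Mandarin",
    "Luoyang": "Luoyang Mandarin",
    "Muping": "Muping Mandarin",
    "Xi'an": "Xi'an Mandarin",
}


def wikidata_item(qualifier):
    """Return the Wikidata item for a thesaurus qualifier, or None if unlisted."""
    canonical = SAME_AS.get(qualifier, qualifier)
    return WIKIDATA_BY_QUALIFIER.get(canonical)


# Distinct lects that have a Wikidata item of their own.
DISTINCT_LECTS = set(WIKIDATA_BY_QUALIFIER.values())
```

Items that appear only inside a comment (for example the Shandong Mandarin item under Muping) are left out on purpose, since they identify a parent variety rather than the qualifier itself.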

@Benwing2 Thanks for putting this together. On Longyan Min in particular, it's likely going to be separated out as a full language as per #Additional Southern Min languages, despite Wikipedia calling it a variety of Hokkien. Theknightwho (talk) 03:27, 18 March 2024 (UTC)Reply
@Theknightwho Ah, I see that now, thanks. Benwing2 (talk) 03:33, 18 March 2024 (UTC)Reply
@Benwing2: Wanrong refers to Wanrong County in Shanxi; this is a variety of Mandarin (Central Plains IIRC). — justin(r)leung (t...) | c=› } 03:32, 18 March 2024 (UTC)Reply

More etym codes for Chinese varieties, part 2

@Theknightwho, Justinrleung Only pinging the people who responded to part 1 above. Here are the uncoded Chinese varieties with labels in Module:labels/data/lang/zh. As above, some have Wikidata items and some are too unspecific or ambiguous to turn into etym-only lects. Some are also clearly full languages or even families.

Canonical label | Label aliases | Comment | Wikidata item (if any)
dialectal Cantonese | | Not specific enough |
Changzhounese | Changzhou dialect, Changzhou Wu | subvariety of Northern (Taihu) Wu | Q1021819
Chuzhou Wu | Chuzhou dialect, Lishuinese, Lishui dialect, Fujian Wu, Lishui Wu | a variety of Chu-Qu Wu, a Southern Wu language; confusable with Quzhou Wu; not in Wikidata? |
Coastal Min | coastal Min | Not specific enough |
Datian Min | | likely becoming a full language | Q19855572
dialectal Eastern Min | dialectal Min Dong | Not specific enough |
Gansu Dungan | | basis of the Soviet written standard for Dungan; not in Wikidata? |
dialectal Gan | | Not specific enough |
Guangxi Mandarin | | This is possibly the same as Guiliu (Gui-Liu) Mandarin (supervariety of Guilin Mandarin) | Q11111664
dialectal Guangxi Mandarin | | Not specific enough |
dialectal Hakka | | Not specific enough |
Hong Kong Hakka | | Mentioned in the Wikipedia w:Hakka Chinese article | Q2675834
Huzhounese | Huzhou dialect, Huzhou Wu | subvariety of Northern (Taihu) Wu | Q15901269
Inland Min | inland Min | Not specific enough |
Jianghuai Mandarin | Jiang-Huai Mandarin, Lower Yangtze Mandarin, Huai | primary branch of Mandarin | Q2128953
Jiaoliao Mandarin | Jiao-Liao Mandarin | primary branch of Mandarin | Q2597550
Jilu Mandarin | Ji-Lu Mandarin | primary branch of Mandarin? | Q516721
dialectal Jin | | Not specific enough |
Korean Classical Chinese | | Not quite sure what this is and how to classify it; one of the Module:zh-usex/data lects that was skipped |
Linshao Wu | Linshao, Linshao dialect, Lin-Shao Wu, Lin-Shao dialect, Lin-Shao | subvariety of Northern (Taihu) Wu; not in Wikidata? |
Liuzhou Mandarin | | a variety of Southwestern Mandarin | Q7224853
dialectal Mandarin | | Not specific enough |
Min | | Not specific enough |
Nanning Pinghua | | a variety of Southern Pinghua Chinese; not in Wikidata? |
North America | North American | Not specific enough |
Pinghua | | A family, not a language |
Shaoxing Wu | Shaoxingnese, Shaoxingese, Shaoxing dialect | variety of Linshao Wu, in turn a variety of Northern (Taihu) Wu | Q7489194
Shehua | | its own branch of Chinese | Q24841605
Shuangfeng | | dialect of Old Xiang | Q10911980
Siyi | | a Yue language? Includes Taishanese | Q2391679
Southern Min | Min Nan | Not specific enough |
dialectal Southern Min | dialectal Min Nan | Not specific enough |
Southern Wu | | appears to be a Wu subfamily, including at least three languages |
Standard Written Chinese | SWC | Per User:justinrleung, this refers to Standard Mandarin = Putonghua, different from Written vernacular Chinese which refers to the standard written vernacular varieties of the Qing and Ming dynasties, as opposed to Classical/Literary Chinese (NOTE: Wikipedia's Standard Written Chinese confusingly redirects to Written vernacular Chinese, and Wikipedia's article on that covers time periods from the Ming dynasty to the present, not just through the end of the 19th century) | Q727694
Sujiahu | Su-Jia-Hu Wu, Sujiahu Wu, Su-Jia-Hu | a subvariety of Northern (Taihu) Wu |
Vietnamese Classical Chinese | | Not quite sure what this is and how to classify it; one of the Module:zh-usex/data lects that was skipped |
dialectal Wu | | Not specific enough |
Wuzhou Wu | Jinhua dialect, Jinhuanese, Wuzhou, Wuzhou dialect, Jinhua Wu | one of the Southern Wu languages | Q2779891
dialectal Xiang | | Not specific enough |
Xinjiang | | subvariety of Lanyin Mandarin? Includes Urumqi Mandarin (Q10878256) |
Xinqu Wu | Quzhounese, Quzhou dialect, Shangraonese, Shangrao dialect, Xinzhou dialect, Xinzhou Wu, Quzhou Wu, Shangrao Wu | a variety of Chu-Qu Wu, a Southern Wu language | Q6112429

Benwing2 (talk) 04:32, 18 March 2024 (UTC)

@Benwing2: Huzhounese is Q15901269. Guangxi Mandarin should be approximately the same as Guiliu Mandarin, which is Q11111664. Hong Kong Hakka is Q2675834. Standard Written Chinese is usually referring to the modern standard, whereas Written Vernacular Chinese seems to refer to written vernacular Mandarin in the Yuan, Ming and Qing dynasties.
BTW, Xinzhou dialect as an alias for Xinqu Wu is problematic, since Xinzhou is ambiguous. Xinzhou Jin is a completely different variety from a different Xinzhou. — justin(r)leung (t...) | c=› } 06:19, 18 March 2024 (UTC)
@Justinrleung Thank you for finding those entries! I think we should remove all aliases that read 'Foo dialect' and consider only allowing aliases that include the language name in them. It is unfortunate that Wikipedia puts the primary entries for various Chinese lects under 'Foo dialect' instead of 'Foo Wu', 'Foo Jin', etc. for precisely the reason you mention. Even in the case of the same location mentioned, it's quite possible for a given location to have multiple dialects of different languages. Benwing2 (talk) 07:02, 18 March 2024 (UTC)
@Benwing2: Thanks for tabulating these.
re: removing aliases that read 'Foo dialect', there are some dialects whose affiliation is not extremely clear, e.g. Huizhou dialect (not to be confused with Huizhou Chinese which is czh) and so we labelled it as "Huicheng dialect" ("Huizhou dialect" would also work but that will certainly be confused with czh).
Often the labels are used to produce the desired display text rather than the categories, which is why there is a relatively large number of |_| in {{lb|zh}}. One slightly extreme example would be 鐳#Etymology 2 sense 3, {{lb|zh|Malaysia|&|Singapore|_|Cantonese|Hakka|Southern Min|;|Xiamen|Quanzhou|Zhangzhou|_|Hokkien|;|slang|_|in|_|Hong Kong Cantonese}}, which actually represents a large number of lects but is not categorised properly due to the limits of {{lb}}. This is why you will sometimes find labels like {{lb|zh|Taiwan Hokkien and Hakka}}, used so that the desired result is achieved, even though it should actually be {{lb|zh|Taiwanese Hokkien|Taiwanese Hakka}}.
I would suggest searching for additional items in the form of {{lb|zh|Foo|_|Cantonese}} or {{lb|zh|Bar|_|Wu}}, which should unveil more unencoded dialects, some of which may already be covered in the previous section (e.g. something as mundane as {{lb|zh|Xiamen Hokkien}} isn't a recognized label, so it is often input as {{lb|zh|Xiamen|_|Hokkien}}). (This is also why there is a relative abundance of Wu dialects in the labels data, probably the result of some dedicated user who added them.)
I'll go over the actual individual lects later. – wpi (talk) 12:55, 18 March 2024 (UTC)
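The search suggested above can be sketched roughly as follows. This is only an illustrative sketch in Python, not an existing Wiktionary script: the function name `joined_labels`, the `GROUPS` set and the sample wikitext are assumptions made for the example, and a real run would need to read entry text from a dump and use the full list of group names.

```python
import re
from collections import Counter

# Hypothetical set of group names to look for (an assumption for this sketch,
# not the contents of Module:labels/data/lang/zh).
GROUPS = {"Cantonese", "Hokkien", "Wu", "Hakka", "Mandarin"}

# Matches {{lb|zh|...}} calls that contain no nested templates.
LB_ZH = re.compile(r"\{\{lb\|zh\|([^{}]*)\}\}")

def joined_labels(wikitext):
    """Count 'Place Group' pairs written as Place|_|Group inside {{lb|zh|...}}."""
    counts = Counter()
    for match in LB_ZH.finditer(wikitext):
        args = match.group(1).split("|")
        # Slide a three-argument window (place, "_", group) over the call.
        for place, sep, group in zip(args, args[1:], args[2:]):
            if sep == "_" and group in GROUPS and place not in {"_", ";", "&"}:
                counts[f"{place} {group}"] += 1
    return counts

sample = (
    "{{lb|zh|Xiamen|_|Hokkien}} {{lb|zh|Foo|_|Cantonese}} "
    "{{lb|zh|Xiamen|Quanzhou|Zhangzhou|_|Hokkien|;|slang}}"
)
for label, n in joined_labels(sample).most_common():
    print(n, label)
```

Note that this only catches the place name immediately before the `_`; in a chain like Xiamen|Quanzhou|Zhangzhou|_|Hokkien it reports only "Zhangzhou Hokkien", so the earlier places would still need manual review.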
Personally I prefer to assign full language codes to a group, while the representative dialect(s) spoken in a specific place will have an etym-only code.
  • Australia, Malaysia, Singapore, Thailand etc.: these may need a code for each lect (as appropriate), e.g. Malaysian Cantonese, Thailand Teochew (Malaysia may need to be further subdivided by location, we already have Penang Hokkien) [see also my previous comment]
  • Guangdong: usually means Cantonese+Teochew+maybe Taishanese+maybe Leizhou+maybe Hainan, this should be replaced accordingly
  • Hong Kong, Macau: usually refers to the standard form of Chinese (not necessarily Cantonese, but often somewhat influenced by Cantonese) spoken in HK/Macau respectively [zh-HK and zh-MO?]
  • Taiwan: similar to above [zh-TW?]
  • Hsinchu & Taichung Hokkien: there may be some need to create code for the Taiwanese Hokkien dialects, but I'll defer to others for this (but IIRC Hsinchu is predominantly Hakka speaking?)
  • Mandalay Taishanese: might need a code but probably won't be used much
  • Shehua: a branch parallel to Neo-Hakka (which we call Hakka/which is the only part of "Hakka" that we have coverage of), "She" is likely the more common academic term (but this clashes with She the Hmong-Mien language, both names share the same etymology). [zhx-she?]
    • (the ancestor of Neo-Hakka and She is parallel to Paleo-Hakka, but this is another rabbit hole, plus coverage of it is relatively poor)
  • Anxi Hokkien, Yongchun Hokkien, Muping Mandarin, Wanrong: these seem too minor to be assigned a code? I'm not certain however.
Some comments (partly based on my observation of the usage in {{lb|zh}} and also based on our[my] plans to increase coverage of dialects), grouped by branch:
  • Gan: label-wise we usually have Nanchang [gan-nan?], Lichuan [gan-lic?], Pingxiang [gan-pin?], Taining [gan-tai?], Yongxiu [gan-yon?]. These are all locations rather than subgroups (my understanding is that the subgrouping of Gan is quite undeveloped). It's worth noting that our Gan coverage is extremely lacking (due to both lack of data and lack of motivated editors), and most likely we will only have these four locations in the foreseeable future.
  • Hakka: Sixian may need to be divided into North Sixian/South Sixian. We might also want to add the rest of the Taiwanese Hakka dialects. Coverage of Yudu Hakka [hak-yud?] and Hong Kong Hakka [hak-HK?] seems OK.
  • Huizhou: this group is too small to have any meaningful subdivision, I think at most we can assign a code to Jixi [czo-jix?].
  • Jin: I think we could have Taiyuan [cjy-tai?] and Xinzhou [cjy-xin?]. The other dialects have poorer coverage. (I didn't find any usage of Xinzhou Wu)
  • Wu: besides the mentioned ones, we may also need Danyang Wu? I'll defer to ND381 and Musetta6729.
  • Eastern Min: representative dialect is Fuzhou [cdo-fuz?], other possible inclusion would be Fuqing [cdo-fuq?] and maybe Ningde [cdo-nin?]. The rest seems too sporadic.
  • Xiang: Changsha [hsn-cha?], Shuangfeng [hsn-shu?], Loudi [hsn-lou], Hengyang [hsn-hya] are major dialects. The coverage situation is similar to Gan.
  • Mandarin: the ones mentioned should be added generally.
  • Pinghua: Southern Pinghua [csp] is usually considered to be part of Yue. Worth noting Nanning Pinghua and Nanning Cantonese are different though.
  • Cantonese/Yue: I think we should add Siyi Yue [yue-siy?/zhx-siy?] and demote Taishanese [zhx-tai] to a variety of it. The usage of [yue] to refer to Cantonese or Yue is pending discussion. Other ones that could be added include Yangjiang [yue-yan?/zhx-yan?] and Dongguan [yue-don?], while the rest seems to have relatively poor coverage.
  • Southern Min is already dealt with elsewhere
  • Puxian Min: I believe this can have Putian [cpx-put?] and Xianyou [cpx-xia?]?
wpi (talk) 16:37, 18 March 2024 (UTC)
@Wpi Thank you for all the details! I just realized there is a third source of varieties here at Wiktionary, which is the dialectal data found in the data modules for {{zh-dial}}, specifically Module:zh/data/dial. For example, under 討食 / 讨食 you have a whole set of "dialectal synonyms of 要飯 / 要饭 (yàofàn, to beg for food)" in addition to the Thesaurus entries for 乞討 / 乞讨 (qǐtǎo) fetched using {{syn-saurus}}. Ultimately IMO we should probably merge the dialectal data in the {{zh-dial}} modules with the Thesaurus entries, but that is another can of worms. For now I'll just note that the {{zh-dial}} data conveniently comes with links to English or Chinese Wikipedia entries so it should be easy to find the relevant Wikidata items. *HOWEVER*, there are an absolute ton of varieties listed; I count 1,122 of them currently. (Of these, 969 have Wikipedia links, but many of these links are to geographic entries rather than dialectal entries.) I doubt all of these varieties need to be assigned etym-only codes. I think one way to pare them down is to go through the dialectal data and count how many synonyms there are for each variety. This should reveal which varieties are important enough to warrant codes (I imagine a lot of the varieties listed have no synonyms at all in the data). Benwing2 (talk) 22:32, 18 March 2024 (UTC)
Please see User:Benwing2/zh-dialect-counts. This table lists all the varieties/dialects found among the dialectal synonym data along with counts, the Chinese dialect group they're in and the Wikipedia link, if any. (There are 2,787 terms currently listed in the data.) I'm thinking we can start with the first 100 or 200 varieties listed, figure out what to do with them, and go from there. Also, the script I wrote to combine the counts with the variety data in Module:zh/data/dial output the following warnings concerning varieties for which there are synonyms but which aren't in Module:zh/data/dial:
WARNING: Found variety 'Luoyang' not in variety data
WARNING: Found variety 'Zhumadian' not in variety data
WARNING: Found variety 'Pingdingshan' not in variety data
WARNING: Found variety 'Zhoukou' not in variety data
WARNING: Found variety 'Xuchang' not in variety data
WARNING: Found variety 'Nanyang' not in variety data
WARNING: Found variety 'Luohe' not in variety data
Benwing2 (talk) 23:24, 18 March 2024 (UTC)
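For readers who want to reproduce this kind of tally, the counting and the warnings can be sketched as below. This is a minimal Python sketch and not the script actually used: the data shapes (`synonym_data` mapping each term to a per-variety list of synonyms, `variety_data` mapping each variety to a record with a `group` field) and the function name `count_varieties` are assumptions, and the sample data is a toy example. "Count" is taken here to mean the number of terms for which a variety has at least one synonym.

```python
from collections import Counter

def count_varieties(synonym_data, variety_data):
    """Tally, per variety, how many terms have at least one synonym recorded.

    Returns (rows, warnings): rows are (variety, count, group) tuples sorted by
    descending count; warnings list varieties missing from variety_data.
    """
    counts = Counter()
    for by_variety in synonym_data.values():
        for variety, synonyms in by_variety.items():
            if synonyms:  # skip varieties with an empty synonym list
                counts[variety] += 1
    warnings = [
        f"WARNING: Found variety '{variety}' not in variety data"
        for variety in counts
        if variety not in variety_data
    ]
    rows = [
        (variety, n, variety_data.get(variety, {}).get("group"))
        for variety, n in counts.most_common()
    ]
    return rows, warnings

# Toy data in the assumed (hypothetical) shape.
synonym_data = {
    "要飯": {"Beijing": ["要飯"], "Luoyang": ["要飯"], "Guangzhou": []},
    "乞討": {"Beijing": ["乞討"]},
}
variety_data = {
    "Beijing": {"group": "Mandarin"},
    "Guangzhou": {"group": "Cantonese"},
}

rows, warnings = count_varieties(synonym_data, variety_data)
for row in rows:
    print(row)
for line in warnings:
    print(line)
```

Sorting by descending count makes it easy to take "the first 100 or 200 varieties" as a starting batch.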
@Wpi In response to some of your comments:
  1. As for 'Foo dialect' issues, I think in cases like 'Huicheng dialect' where the affiliation isn't clear, we should just identify them as 'Huicheng Chinese'. It's true that we usually do that for top-level groups but I think it's better in this case than using "dialect".
  2. I will search for labels specified using _ and such. Hopefully the usage isn't too inconsistent.
  3. Concerning your statement "I prefer to assign full language codes to a group, while the representative dialect(s) spoken in a specific place will have an etym-only code", what is the alternative you are responding to? Is it further full-language splits (e.g. with Southern Min)?
  4. For zh-HK, zh-MO, you say "standard language". If this is Cantonese, maybe we should use yue-HK, yue-MO?
  5. For the specific lect comments, I don't know enough to respond but it all looks reasonable. User:Theknightwho, what do you think of the proposal to demote Taishanese to a variety of Siyi Yue?
Benwing2 (talk) 05:25, 19 March 2024 (UTC)
In re point #2, see User:Benwing2/zh-label-sets. Benwing2 (talk) 06:41, 19 March 2024 (UTC)
OK, only a few uses of labels involving 'Foo dialect', and only one involving a label actually listed in Module:labels/data/lang/zh, which was 𠀫𠀪 (which, BTW, is being RFV'd) using 'Hangzhou dialect':
  28 Huicheng dialect
   4 eye dialect
   3 ancient Chu dialect
   1 title=zh:Grammaire du dialect
   1 southern dialect
   1 some Mandarin with a Southern Chinese dialect
   1 of one's speech of the local dialect
   1 ancient Qi or Wu dialect
   1 ancient Qi dialect
   1 [[w:Luoyang dialect
   1 Sòng-Lǔ dialect
   1 Sichuan dialect
   1 Shaanxi dialect
   1 Northeastern dialect
   1 Ningyuan dialect
   1 Hangzhou dialect
I changed that one usage to 'Hangzhounese' and deleted all the 'Foo dialect' labels. We might want to add something for the 'Huicheng dialect' labels (cf. your mention above of this). Benwing2 (talk) 08:10, 19 March 2024 (UTC)
@Benwing2:
re #3, I'm referring to when we are assigning the codes, i.e. groups like Siyi will have a full code whereas local dialect points like Taishanese will have etym-only codes.
re #4, it's basically Standard Written Chinese as used in Hong Kong/Macau. It should be "written/used", not "spoken" as I previously mentioned. There's a difference between yue-HK (Hong Kong Cantonese) and zh-HK (Hong Kong); it's a bit like Norwegian Nynorsk vs Norwegian Bokmål.
Also pinging @Justinrleung for comments to specific lects.– wpi (talk) 11:31, 19 March 2024 (UTC)
@Wpi OK thanks. As for #3, I agree with your idea of the separation between full and etym-only languages going along group lines. As for #4, didn't realize there is this difference but it makes sense. Benwing2 (talk) 15:04, 19 March 2024 (UTC)
Thoughts on Wu codes (locality codes are just suggestions):
  • Northern Wu subbranches imo don't really need codes but individual localities would be beneficial. Of which:
Changzhounese wuu-chz
Danyangese wuu-dan
Shaoxingese wuu-shx
are in need of codes (due to relative abundance of data, and will also be gaining zh-pron support soon). Some others to consider may include
Cixinese wuu-cix
Huzhounese wuu-huz
and all the other lects currently in Module:wuu-pron/sandbox. We are currently still working on it so it may be worth delaying the addition of these lect codes until we finish the Northern Wu overhaul.
  • Currently extant Northern Wu localities (Hangzhounese, Ningbonese, Shadi Wu, Shanghainese, Suzhounese) should all be listed under Northern Wu (wuu-nor) in the family tree on Category:Wu language (and any other system that may handle language families).
  • Southern Wu wise, I believe these would be helpful to have in the future, as we will be adding pages/making modules for them as soon as possible:
Jinhuanese / Wuzhou Wu wuu-jih
Taizhounese / Taizhou Wu wuu-tai
Lishuinese / Chuzhou Wu wuu-lis
Shangraonese / Xinzhou Wu wuu-shr
in descending order of importance. I decided to split "Chuqu Wu" as is described on the chart as there is no clear consensus as to how the non-coastal non-Northern Wu bits should be split, but in general these three areas (Wuzhou, Chuzhou, Xinzhou) can be seen reflected in some way.
  • A Southern Wu code (wuu-sou) should not be made. It is likely not a familial grouping but rather just a term to use to contrast it with Northern Wu. There have been some preliminary studies that investigate whether it does form a coherent family, but results are mixed and sample sizes are small.
Regarding why there are so many Northern Wu localities, yes, muset & I added them, as unlike Hokkien for instance, the sociolinguistic attitude towards these lects is first and foremost the locality rather than the family (which contrasts with the "Hokkien" identity).
@Musetta6729 - only other active Wu editor: let us know if you have any other/conflicting ideas — nd381 (talk) 19:38, 19 March 2024 (UTC)
@ND381 Thank you! I will probably take all your suggestions. Benwing2 (talk) 20:26, 19 March 2024 (UTC)
Only just got the chance to look at this thread now - in terms of Wu I definitely agree with everything that ND has said so far, just two things I would like to mention:
First: Having Urban Shanghainese as a variety (maybe under something like wuu-ush) along with simply "Shanghainese" (wuu-sha) might be useful. This is due to a variety of reasons, but mainly that Contemporary "Urban" Shanghainese has showcased more convergent evolution with say, Ningbonese or Suzhounese during the last century, and has become more sociolinguistically and identity-wise distinct from many Non-Urban varieties surrounding it. With only the label "Shanghainese" now it is tricky to disambiguate between categories such as:
  • Primarily urban inventions not used in non-urban varieties, or that have spread out to non-urban regions as still recognisably "urbanite" speech
  • Common invention/retention in Non-Urban Shanghai varieties that are rare/obsolete/not used in Urban Shanghainese
  • Inventions in Non-Urban Shanghainese that are not geographically restricted to one specific region of Shanghai
  • Usage attested in both 1850s City-Center Shanghainese and contemporary Non-Urban, but not Contemporary Urban Shanghainese
Especially because all of this variance is also deeply interconnected with notions of locality, of new and old, of class, ethnicity and other sociolinguistic variables when looked at from an Urban Shanghainese standpoint. All of this has led to the use of ad hoc labels along with the Shanghainese tag like "old-period", "chiefly non-urban/suburban", "rare or obsolete" etc which is definitely not ideal. By having Urban Shanghainese as a variety I expect that this would be easier to manage - and as we go on to add more coverage on Non-Urban Shanghainese varieties we should hopefully be able to have more specific variety codes for lots of the Non-urban Shanghainese varieties too.
The second thing is a bit more minor - Suhujia (蘇滬嘉 - see linked Chinese Wikipedia article) might be a more commonly used term than Sujiahu (蘇嘉滬), which we seem to have now. The grouping seems to be somewhat areal and vaguely defined to me and I am doubtful of the extent to which having it might be useful, but nevertheless it's a fairly widely accepted grouping so thought I would bring this up in case we end up making the decision to add it. Musetta6729 (talk) 04:38, 24 March 2024 (UTC)

Redid Chinese labels


(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho I redid the label structure in Module:labels/data/lang/zh. I added missing labels corresponding to the new lects in Module:etymology languages/data, canonicalized the labels to include the group name (e.g. Xiamen Hokkien instead of just Xiamen), and added shorter aliases. Duplication is avoided in something like {{lb|zh|Xiamen Hokkien|Quanzhou Hokkien|and|Zhangzhou Hokkien}} (or equivalently, {{lb|zh|Xiamen|Quanzhou|and|Zhangzhou}}) by a new Chinese-specific label postprocessing function in Module:labels/data/lang/zh/functions, which attempts to remove duplicate group names as well as duplicate occurrences of "Taiwanese" in {{lb|zh|Taiwanese Hokkien|and|Taiwanese Hakka}} or similar. Please let me know if you don't like the output in specific situations and I will tweak the function. Note that I removed the label Taiwanese Hokkien and Hakka and all its aliases, after converting all occurrences to use multiple labels like {{lb|zh|Taiwanese Hokkien|and|Taiwanese Hakka}} or similar. I also changed a few categories to better reflect the lect name, e.g. the label Philippine Hokkien now categorizes into Category:Philippine Hokkien instead of Category:Philippine Chinese. Benwing2 (talk) 00:50, 20 March 2024 (UTC)

@Benwing2: Thanks for setting this up. The function looks like it works well generally, but there are some cases where it might lead to confusion, such as {{lb|zh|Taiwanese Hokkien|Taiwanese Hakka}} showing up as "Taiwanese Hokkien, Hakka", which could mean the unintended "Hakka (in general) and Taiwanese Hokkien". Perhaps one way to prevent this is to only remove duplicate group names when there is an "and" somewhere in the chain? Is that something that could be done? — justin(r)leung (t...) | c=› } 06:56, 20 March 2024 (UTC)
@Justinrleung Yup, I can do that, thanks for the suggestion. Benwing2 (talk) 17:08, 20 March 2024 (UTC)
@Justinrleung This should be done. Let me know if you see anything else needing fixing. Benwing2 (talk) 03:25, 22 March 2024 (UTC)
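The behaviour agreed on in this exchange (drop repeated group names, but only when an "and" appears in the chain) can be illustrated with a small sketch. This is written in Python purely for illustration and is not the code in Module:labels/data/lang/zh/functions; the function name `dedupe_group_names` and the rule of comparing only the first or last word of each label are assumptions made to keep the example short.

```python
def dedupe_group_names(labels):
    """Drop a repeated group name (last word) or a repeated leading word
    from a chain of labels, but only if the chain contains "and".

    A shared last word is kept only on the final label; a shared first word
    is kept only on the first label. Anything else is returned unchanged.
    """
    if "and" not in labels:
        return list(labels)  # no "and": leave the chain alone to avoid ambiguity
    name_positions = [i for i, label in enumerate(labels) if label != "and"]
    names = [labels[i] for i in name_positions]
    if len(names) < 2 or any(" " not in name for name in names):
        return list(labels)

    first_words = {name.split(" ", 1)[0] for name in names}
    last_words = {name.rsplit(" ", 1)[1] for name in names}
    if len(last_words) == 1:
        mode = "suffix"
    elif len(first_words) == 1:
        mode = "prefix"
    else:
        return list(labels)

    result = []
    for i, label in enumerate(labels):
        if label == "and":
            result.append(label)
        elif mode == "suffix" and i != name_positions[-1]:
            result.append(label.rsplit(" ", 1)[0])  # "Xiamen Hokkien" -> "Xiamen"
        elif mode == "prefix" and i != name_positions[0]:
            result.append(label.split(" ", 1)[1])   # "Taiwanese Hakka" -> "Hakka"
        else:
            result.append(label)
    return result

print(dedupe_group_names(["Xiamen Hokkien", "Quanzhou Hokkien", "and", "Zhangzhou Hokkien"]))
print(dedupe_group_names(["Taiwanese Hokkien", "and", "Taiwanese Hakka"]))
print(dedupe_group_names(["Taiwanese Hokkien", "Taiwanese Hakka"]))
```

Requiring "and" means a chain such as Taiwanese Hokkien|Taiwanese Hakka is displayed in full, which avoids the "Taiwanese Hokkien, Hakka" reading raised above.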

Ramifying/filling out Yue Chinese


(Notifying Atitarev, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏): Apologies once again for the wide ping, as I haven't received any responses to some of my other pings. I added a bunch of labels for Yue Chinese lects, but it is revealing some issues:

  1. We correctly classify Yue as a family, but it contains only two languages (Cantonese language and Taishanese language). Meanwhile per Wikipedia and Glottolog there are something like seven primary branches:
    1. Yuehai Yue, which is more or less Cantonese proper.
    2. Siyi Yue, which includes Taishanese.
    3. Goulou Yue, most notably including Yulin dialect and its sublect Bobai dialect.
    4. Yongxun Yue, with Nanning Yue as the representative dialect.
    5. Gaoyang Yue, most notably including Yangjiang Yue.
    6. Wuhua Yue.
    7. Qinlian Yue, partly intelligible with standard Cantonese.
  2. We are using the code yue for Cantonese proper and zhx-yue for the Yue family, which is inconvenient and contrary to ISO 639-3 usage.

I propose:

  1. Change to using yue for the family and use some more specific code for Cantonese, either yue-can or yue-yue (for Yuehai Yue).
  2. Create L2 languages for each of the above seven groups. We can reuse the "Cantonese language" for Yuehai Yue. This shouldn't entail any real splitting per se as we already have Yue as a family rather than a language.
  3. Demote Taishanese to an etym-only variety of Siyi Yue and assign it a code yue-tai in place of zhx-tai.

Please also note, in the labels I created, the canonical name for each label has "Cantonese" in it for all sublects of Yuehai Yue but "Yue" for Yuehai Yue itself and for all other lects. Almost everything called "Foo Cantonese" (except for variants of standard Cantonese) has an alias "Foo Yue", but not the other way around. For example, the Dongguan dialect is called "Dongguan Cantonese" because it is a variety of Yuehai Yue, and has "Dongguan Yue" as an alias; but the Yulin dialect is called "Yulin Yue" and does NOT have "Yulin Cantonese" as an alias, since it is a variety of Goulou Yue rather than Yuehai Yue. Benwing2 (talk) 22:17, 28 March 2024 (UTC)
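The naming rule described in this paragraph can be summarised in a few lines. The following Python sketch is illustrative only: the function `label_names` and the branch strings are hypothetical, and the sketch ignores the stated exception for variants of standard Cantonese as well as the label for Yuehai Yue itself.

```python
def label_names(place, branch):
    """Return (canonical_name, aliases) for a Yue lect under the rule above.

    Sublects of Yuehai Yue are named "<place> Cantonese" with a
    "<place> Yue" alias; lects in other branches are named "<place> Yue"
    and get no "Cantonese" alias.
    """
    if branch == "Yuehai":
        return f"{place} Cantonese", [f"{place} Yue"]
    return f"{place} Yue", []

print(label_names("Dongguan", "Yuehai"))
print(label_names("Yulin", "Goulou"))
```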

Thanks for the ping. Here are some of my questions, to make sure I understand this better:
  1. What would the categories of a normal entry like 不嬲 look like? I'm asking this because "Cantonese" and "Taishanese" are more recognisable than "Yuehai Yue" and "Siyi Yue" and I'm wondering if these more obscure names would end up in the entry. If this works like the other Chinese splits, I suppose the categories would not change, and just the categories of the categories would change?
  2. We have plans (maybe) to include more Yue languages than just Cantonese and Taishanese, which primarily means expanding the scope of the "pronunciation" section of the entries, and this would also generate more categories. Would your proposal benefit this project because we could more easily categorise the new Yue languages to come?
  3. While normal entries written using Chinese characters have the "Chinese" L2 header, romanisations have their respective header per language, such as xiànglái having the Mandarin L2 header and boán-liân having the Hokkien L2 header. We don't seem to do the same for Cantonese, and the pronunciation sections also don't link to the Cantonese romanisations, and I also can't seem to find any Cantonese L2 header. This might have been decided in an earlier policy that I don't know about, so I guess my question is, would it create problems if you demote Taishanese to an etym-only language?
  4. Per your last point I tried to google "Yulin Yue" but the main results are about someone named Yulin Yue, so I tried to google "Yulin Yue" + language and got 235 hits, while "Yulin Cantonese" got me 73 hits (and "Yulin Cantonese" + language got me only 8 hits). This isn't a question per se, just a comment about how little-known other Yue languages are.
  5. I feel like I just have to insert a comment about the choice of Mandarin exonyms vs. Cantonese exonyms vs. endonyms. I think the first option is generally how we do things (except for the names of the main branches), and I suppose this is just the result of the general scholarship, and I'm not really trying to subvert this practice, but I would just like to raise some awareness to this phenomenon.
The above. Apologies if 1999. --kc_kennylau (talk) 23:01, 28 March 2024 (UTC)
@Kc kennylau Thanks much for the detailed questions! In response to your questions, let me see if I can answer:
  1. There are two types of categories: (1) L2 language categories (e.g. Category:Mandarin lemmas); (2) etym-language categories (e.g. Category:Xi'an Mandarin). Under my proposal, we would probably use "Cantonese" in place of "Yuehai Yue" as the L2 language name, since they seem more or less equivalent; but "Siyi Yue" would be the L2 language subsuming Taishanese. This means that a Taishanese term would be categorized both under Category:Siyi Yue lemmas and Category:Taishanese Yue (or maybe just Category:Taishanese; there is some flexibility in the choice of etym-language categories). So essentially, things like Category:Taishanese lemmas would go away in favor of Category:Siyi Yue lemmas + Category:Taishanese Yue, but Category:Cantonese lemmas would remain (possibly with additional more specific categories like Category:Guangzhou Cantonese or Category:Hong Kong Cantonese, both of which already exist).
  2. This proposal is somewhat orthogonal to how we handle the pronunciation section entries; the ones for Cantonese and Taishanese can remain as-is, but might categorize differently (as explained above).
  3. If there were romanizations under a Taishanese header, they would have to be renamed to have Siyi Yue as the header and a label Taishanese attached, to make it clear that the romanizations are specifically Taishanese. (Similarly, entries like boán-liân used to be under a Min Nan header before Hokkien got split out as an L2 language.) But since we don't seem to have any such romanizations, this issue won't arise (at least for now).
  4. As for the obscurity of Yue varieties other than Cantonese and Taishanese, I completely agree. The terminology isn't well-worked out and the term "Cantonese" is particularly problematic since it variously refers specifically to (a) the speech of Guangzhou specifically; (b) the more general Yuehai Yue language that Guangzhou speech is part of [which is what I'm defining it as]; and (c) the entire Yue family. This issue doesn't seem to come up so much for other groups like Mandarin and Wu.
  5. As for Mandarin vs. Cantonese/Yue naming, I am not wedded to using the Mandarin terms; I just chose them because that is what Glottolog and Wikipedia largely use. If the consensus is to use Cantonese-language terms for all lects or to use native terms (endonyms), we can do that as well. I am guessing the Mandarin terms see more usage just out of a sort of default familiarity (pretty much everyone who works with Chinese languages is familiar with Mandarin but many aren't familiar with Cantonese or other varieties, and several Yue varieties don't even have standard romanization schemes). Benwing2 (talk) 23:50, 28 March 2024 (UTC)
I support the move in general (with a strong preference for using yue-can); however, here are a couple of problems I can foresee with this proposal:
  1. Goulou actually forms a dialect continuum with Southern Pinghua language, and therefore nowadays [csp] is usually thought of as part of Yue, but weirdly it has a separate language code. Should [csp] be included as well?
  2. Yongxun is a (quite recent) descendant of Cantonese spoken in the major towns and cities in the Pearl River with minor influences from the substrate Goulou varieties. Personally I don't think it should be a separate branch.
  3. As I mentioned before, there are (at least) two distinct varieties of Yue spoken in Nanning, we currently call them Nanning Cantonese (under Yongxun) and Nanning Pinghua (under Goulou-Southern Ping). How can the two be distinguished if it is renamed to "Nanning Yue"?
wpi (talk) 04:19, 29 March 2024 (UTC)
@Wpi Thanks very much for responding. In response to your issues:
  1. I don't know enough about Pinghua to answer, but I note that Wikipedia's Pinghua article asserts that Pinghua has been treated as its own dialect group, separate from Yue, in most textbooks and surveys written since the 1980s. As for dialect continua, there are many places where different branches form dialect continua with each other but are still separated. (As an example, Western Bulgarian forms a dialect continuum with Torlakian, which in turn forms a dialect continuum with (other varieties of) Serbo-Croatian. Serbo-Croatian is considered a Western South Slavic language and Bulgarian an Eastern South Slavic language; despite what the Wikipedia article on Torlakian says, it's more often considered part of Serbo-Croatian than Bulgarian.) Maybe User:Justinrleung or User:沈澄心 can comment? There's an additional issue: if we group Southern Pinghua with Yue, what do we do with Northern Pinghua?
  2. Likewise I don't know enough about Yongxun Yue to have a firm opinion; in any case it seems like we won't have any lemmas in it, so whether we make it its own L2 or group it with some other L2 (which one? Cantonese or Goulou?) wouldn't make much difference.
  3. I think this is only an issue if (1) we leave Yongxun as its own group and (2) we put Southern Pinghua under Yue. If Yongxun is e.g. grouped with Cantonese and Pinghua left as-is, the current names are fine. If both dialects get considered non-Cantonese Yue, then one solution is to clarify them as 'Nanning Yongxun Yue' and 'Nanning Pinghua Yue' or something.
Benwing2 (talk) 04:55, 29 March 2024 (UTC)
  • I would prefer to have Southern Pinghua be kept as its own group separate from Yue. It seems that generally speakers of Southern Pinghua would call their varieties Pinghua, distinguished from Baihua (traditionally Yue varieties). The situation in Nanning is a case in point.
  • I don't have a strong opinion on whether Yongxun should be a branch. The Language Atlas of China does mention a few criteria for separating Yongxun out as its own branch, but it seems like those criteria are retentions rather than innovations (from a cursory glance).
— justin(r)leung (t...) | c=› } 18:43, 20 May 2024 (UTC)

There have been some discussions, and for reference this is our current categorization:

  1. Gwangfu Yue (廣府片) / Yuehai Yue (粵海片): the "main" branch of Yue that contains Cantonese (廣東話), which is the dominant language (besides Mandarin) within the Yue Chinese lects. Our current approach is to group other (more recent) descendants as sub-branches of this branch.
    1. Guan-Bao Yue (莞寶片/莞寶小片): contains Dongguan Cantonese (東莞話) which is genetically close to Cantonese but might be a bit hard to understand for Cantonese speakers because of the differences in phonology. Some classify it as a sister-branch of Gwangfu, but I think we prefer to group it under Gwangfu.
    2. Yong-Xun Yue (邕潯片/邕潯小片): contains Nanning Cantonese (南寧白話). Again this branch is sometimes considered separate from Gwangfu.
    3. Sanyi Yue (三邑小片): the Cantonese spoken in Sanyi (literally "three counties") is highly intelligible with Cantonese, but I want to group them together because they share the innovation that their Tone 4 ("light level") is particularly high.
    4. Xiangshan Yue (香山小片): contains Shiqi Cantonese (石岐話).
  2. Siyi Yue (四邑片): the second most famous branch of Yue that contains Taishanese (台山話). This branch is particularly distinct within Yue, and there should be no debate over the status of this branch.
  3. Gao-Lian Yue / Gao-Lei Yue (高廉片/高雷片): (The Lian 廉 here refers to the River Lian 廉江, which is unrelated to the Lianzhou 廉州 below, 145 km away.) This branch is a merger of the traditional categories Gao-Yang Yue (高陽片) and Wu-Hua Yue (吳化片). The brief reason for this merger is that Gaozhou Cantonese (高州白話, the Gao of Gao-Yang) is also sometimes classified with Wu-Hua Yue, so I think it's better to just merge the two branches. I chose this name because it was also used in earlier classifications for more-or-less the same span. This covers the Yue lects spoken in the Prefectures Yangjiang (陽江), Maoming (茂名), and Zhanjiang (湛江).
  4. Qin-Lian Yue (欽廉片): this category has more-or-less stayed the same across different classifications, but there are also (scholarly) opinions that this is more of a regional grouping than a proper genetic branch. The following sub-branches have also been proposed in a paper where Qin-Lian is challenged (where I have removed Qinzhou Cantonese (欽州白話), which we consider to be a descendant of Cantonese instead):
    1. Lianzhou Yue (廉州小片)
    2. Lingshan Yue (靈山小片)
    3. Xiaojiang (小江小片)
    4. Liuwanshan (六萬山小片)
  5. Gou-Lou Yue (勾漏片): this category is also quite consistent, with the main distinguishing feature being that voiced stop initials in Middle Chinese tend to become unaspirated. It is also quite distinct among the Yue lects. This lect is primarily spoken in Gwangxi instead of Gwangdong.
    1. Luo-Guang Yue (羅廣小片): this is the Gou-Lou Yue which is spoken in Gwangdong. It might be a misnomer because the Luo stands for the City Luoding (羅定) in the Prefecture Yunfu (雲浮), but there might be no Gou-Lou Yue spoken here.

(Notes for non-Chinese speakers: 片 = branch, 小片 = sub-branch, 話 = dialect.)

There are some remaining problems:

  • Where does the name "Cantonese" belong? Should the sub-branches of Gwangfu Yue also bear the label "Cantonese"?
  • I support using yue for the whole branch and yue-can for "Cantonese" proper.
  • How should we treat sub-branches? Should they have their own codes?
  • Should the names be A-B Yue or AB Yue?

I am also pinging the Chinese editors again for more opinions. (Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏, LittleWhole): --kc_kennylau (talk) 14:24, 23 May 2024 (UTC)Reply

Note that the proposed tree above is solely proposed by Kenny, and certain parts of it lack any sort of substantial discussion.
I strongly disagree with the proposed "Gao-Lian"/"Gao-Lei" group, as it clearly includes at least two groups with vastly distinct phonological features: Wu-Hua (1) has a three-way contrast with its voiced/implosive stops and (2) pronounces MC affricates (精 series) as dentals, while the Gao-Lei and Liangyang groups (1) only have a two-way contrast and (2) pronounce MC affricates (精 series) as affricates, among many other differences. Note that the reason why Wu-Hua is sometimes described as Gao-Lei (e.g. in Zhan Bowei's 廣東粵方言概要) is most likely due to the lack of data on Wu-Hua. I should also note that Wu-Hua is sometimes considered to be an incoherent group, but regardless that should not result in placing the entirety of Wu-Hua with Gao-Lei. As to the question of whether Liangyang is distinct or not, it seems to me that the arguments for a separate Liangyang group are stronger, especially because it has a tone system distinct from the surrounding dialects and an inflectional personal pronoun system for 1/2/3pl that is much more similar to Siyi.
Essentially, my view is identical to the divisions in Language Atlas of China (but not the classification of certain lects), with the exception of placing Yong-Xun under Guangfu (since the Yong-Xun "features" are also found in a lot of modern Guangfu lects or historical dictionaries/rime books, and it is well known that Yong-Xun is descended from Guangfu) and splitting out Liangyang from Gao-Yang (Yangjiang data is not mentioned at all in the Atlas!), and perhaps also splitting out Guan-Bao and Xiangshan (according to 廣東粵方言概要), but I am uncertain as to their position within the tree.
Moreover, it would be splitting hairs when we go for the subgroups (小片), as research is often lacking beyond first level groups (even if there is research being done, often there is only one work to reference from).
Some further comments:
  • I think the usage of "Cantonese" among Yue lects should be relatively liberal - the general rule would be to apply it to any Guangfu lect and any dialect described as 白話, e.g. Qinzhou, Gaozhou, Nanning.
  • Agree with the use of yue for the whole branch and yue-can for Standard Cantonese (i.e. what we are currently using yue for).
  • Regarding the use of hyphen, it should be present when the name is a combination of two names. Goulou is named after the mountain of Goulou, so there shouldn't be a hyphen.
wpi (talk) 16:10, 23 May 2024 (UTC)
Thanks, Kenny and Wpi. I generally agree with Wpi's points. Kenny's Gao-Lian/Gao-Lei should be at least two groups: Gao-Yang and Wu-Hua. I don't have a strong opinion on whether Gao-Yang should be split further. As for the structure of the tree, such as whether certain groups belong under certain groups, I feel like we can be agnostic and have them placed under Yue without thinking too much about the internal groupings; this would mean we could have Yong-Xun, Guan-Bao, Xiangshan, etc. as sisters to Guangfu unless we have really strong feelings about the grouping. Luo-Guang seems to be a very erroneous idea that we should not bother adopting at all. — justin(r)leung (t...) | c=› } 17:38, 23 May 2024 (UTC)Reply
Indeed, I should have emphasized that the tree above is not final, and I only posted it here to attract more discussion. Thank you for bringing that up.
I will talk about the Gao-Lian/Gao-Lei group here first and leave the other points to later replies.
  1. The "three-way contrast" is not as simple as it seems. The evolution of Middle Chinese stops in Wu-Hua is not consistent. According to 粤语“吴化片”商榷 (2016) by 邵慧君, Middle Chinese *b- became /pʰ/ in Wuyang, and in Huazhou it was distributed (irregularly) between /p/ and /pʰ/. Using Jyutdict I was able to verify this (see table below). Note how 婆 became /p-/ in Shangjiang and /pʰ-/ in Xiajiang, and 抱 is the other way round. According to the paper, *p- became /ɓ-/ in Wuyang just like in Huazhou, but even so, since *b- became universally /pʰ-/ in Wuyang, that would only be a two-way contrast. Of course, the "number" of labial plosives isn't the important point here, but rather "how" they correspond with Middle Chinese and with each other. The situation becomes even more complicated if we account for the influence of dominant languages in this area, and I believe that *b- > /pʰ-/ in Wuyang is the effect of Hakka.
    In summary, if you take *p- > /ɓ-/ as the defining feature of Wu-Hua, then it fails because it is not universal (even though you might attribute the remaining lects that have /p-/ as Cantonese influence); if you take the evolution of *b- instead, then it also fails because it is inconsistent between the lects.
  2. As for pronouncing 精 as dental, if you look at the map in 醉 in Jyutdict, you will find that indeed the four Wu-Hua languages recorded all have a dental /t-/. However, if you keep going up from there, you will find that the dental initials continue to Yulin (鬱林) of Goulou Yue, and then even to Wuzhou (梧州) of Gwangfu Yue. To the right, though disconnected, you will find that Taishanese and Kaiping (開平) of Siyi Yue also have a dental initial. Indeed, it is possible that the dental initial spread from Wu-Hua to Yulin, just like how the guttural "R" spread all throughout Europe. However, I don't see an argument of why it has to be genetic in Wu-Hua in the first place.
  3. According to the paper, Li Jian (李健) said that "鉴江源出粤西信宜市北部山区,南流经信宜、高州、化州、吴川四市入海。......整个流域粤语不但极为相似,而且南北渐变的痕迹也十分明显。" (paraphrase: the dialects of Xinyi, Gaozhou, Huazhou, and Wuchuan form a continuum). I don't think this observation can be attributed to a "lack of data". While the dialect in Gaozhou seems to me to be highly similar to Cantonese, I did find that interestingly the character 坐 has an /-ɛ/ final in Gaozhou and also in the Wu-Hua lects.
  4. As for the Liangyang group, I have not looked a lot into this, so I will take your side and assume that Liangyang should indeed form a group. However, this does not contradict with my proposed Gao-Lei group, where there can simply be a Liangyang sub-branch. I do wonder though how you view the "inflectional personal pronoun system" as you mentioned that is "much more similar to Siyi". Do you think Liangyang split off from Siyi, or do you think Proto-Cantonese had such a system that was lost in other lects, or do you think this feature arose by contact between Liangyang and Siyi?
| Character | Middle Chinese initial | Tone category | Zhanjiang (湛江) | Wuyang (吳陽) | Huazhou Shangjiang (化州上江) | Huazhou Xiajiang (化州下江) |
|  | *p- | level (平) | /pa/ | /pa/ | /ɓa/ | /ɓa/ |
|  | *ph- | departing (去) | /pʰa/ | /pʰa/ | /pʰa/ | /pʰa/ |
|  | *b- | level (平) | /pʰei/ | /pʰei/ | /pɛi/ | /pɛi/ |
| 婆 | *b- | level (平) | /pʰɔ/ | /pʰɔ/ | /pɔ/ | /pʰɔ/ |
| 抱 | *b- | rising (上) | /pʰoɐu/ | /pʰoɐu/ | /pʰɔu/ | /pɔ̯ɒu/ |
|  | *b- | departing (去) | /pʰei/ | /pʰei/ | /ɓɛi/ | /pɛi/ |
|  | *b- | entering (入) | /pʰaʔ/ | /pʰaʔ/ | /ɓak/ | /pak/ |
--kc_kennylau (talk) 19:53, 23 May 2024 (UTC)
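As an illustration of the point that the correspondences matter more than the raw number of labial plosives, the table above can be tabulated with a short script. This is only a sketch: the readings are copied from the table, while the helper names (initial_of, reflexes_by_lect) and the simplification of looking only at the initial consonant are assumptions made for this example, not part of anyone's analysis.

```python
from collections import defaultdict

LECTS = ["Zhanjiang", "Wuyang", "Huazhou Shangjiang", "Huazhou Xiajiang"]

# (Middle Chinese initial, tone category, readings in LECTS order),
# copied from the table in the comment above.
ROWS = [
    ("*p-", "level", ["pa", "pa", "ɓa", "ɓa"]),
    ("*ph-", "departing", ["pʰa", "pʰa", "pʰa", "pʰa"]),
    ("*b-", "level", ["pʰei", "pʰei", "pɛi", "pɛi"]),
    ("*b-", "level", ["pʰɔ", "pʰɔ", "pɔ", "pʰɔ"]),
    ("*b-", "rising", ["pʰoɐu", "pʰoɐu", "pʰɔu", "pɔ̯ɒu"]),
    ("*b-", "departing", ["pʰei", "pʰei", "ɓɛi", "pɛi"]),
    ("*b-", "entering", ["pʰaʔ", "pʰaʔ", "ɓak", "pak"]),
]


def initial_of(reading):
    """Return the labial initial of a reading: 'pʰ', 'ɓ', or 'p' (hypothetical helper)."""
    for initial in ("pʰ", "ɓ", "p"):  # test 'pʰ' before plain 'p'
        if reading.startswith(initial):
            return initial
    raise ValueError(f"unexpected initial in {reading!r}")


def reflexes_by_lect(rows):
    """Map each lect to {Middle Chinese initial: set of modern initials}."""
    table = {lect: defaultdict(set) for lect in LECTS}
    for mc_initial, _tone, readings in rows:
        for lect, reading in zip(LECTS, readings):
            table[lect][mc_initial].add(initial_of(reading))
    return table


if __name__ == "__main__":
    for lect, reflexes in reflexes_by_lect(ROWS).items():
        summary = ", ".join(
            f"{mc} > {'/'.join(sorted(found))}" for mc, found in reflexes.items()
        )
        print(f"{lect}: {summary}")
```

On this small sample, *b- has a single reflex in the Zhanjiang and Wuyang columns but two or three different reflexes in the two Huazhou columns, which is the inconsistency described in point 1. Seven characters are of course far too few to settle the classification.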
By the way, we have three Yue lects currently covered by zh-pron (see ), which are Dongguan Cantonese, Yangjiang Yue, and Yulin Yue. (COI: I added them.) Should we have language codes for these three varieties? Something like yue-dgx, yue-yjx, yue-ylx? --kc_kennylau (talk) 14:58, 25 May 2024 (UTC)
(Addendum: we just removed Yulin Yue) --kc_kennylau (talk) 15:00, 25 May 2024 (UTC)Reply
(You mean in addition to the two lects that have been here longer, so actually a total of four Yue lects now.) — justin(r)leung (t...) | c=› } 15:12, 25 May 2024 (UTC)Reply
Just to help me understand the "lay of the land", are there papers that specifically group the dialects traditionally classified as Gao-Yang and Wu-Hua together? If so, what is the name they use for such a grouping? (From the way this was described above, it feels a little original-researchy, which we don't want to do.) — justin(r)leung (t...) | c=› } 15:20, 25 May 2024 (UTC)Reply

──────────────────────────────────────────────────────────────────────────────────────────────────── (cc @Benwing2) After more discussion, @Justinrleung and @wpi have mostly agreed with the following tree (the codes are added by me):

  • Guangfu Yue (廣府片) yue-guf
  • Guan-Bao Yue (莞寶片) yue-gub
  • Xiangshan Yue (香山片) yue-xis
  • Yong-Xun Yue (邕潯片) yue-yox
  • Siyi Yue (四邑片) yue-siy
  • Liangyang Yue (兩陽片) yue-liy
  • Gao-Lei Yue (高雷片) yue-gal (defined as Gao-Yang in the Atlas minus Liangyang)
  • Wu-Hua Yue (吳化片) yue-wuh
  • Qin-Lian Yue (欽廉片) yue-qil
  • Goulou Yue (勾漏片) yue-gol

I also mostly agree with this, but I would just like to note that Guan-Bao, Xiangshan, and Yong-Xun (and likely Gao-Lei as well) are descended from Guangfu, and the last four (Gao-Lei, Wu-Hua, Qin-Lian, Goulou) branches are more areal than genetic. From what I can gather, the reason this structure is preferred over a more nested one is because currently all the genetic relationships are still not clear, as Justinrleung explained above.

I also don't know if some of the above branches should have "~ Cantonese" as an alias.

--kc_kennylau (talk) 13:30, 26 May 2024 (UTC)

Agree with the above list of groups. For Wiktionary purposes, we would simply treat all ten of them as direct descendants of Yue without being specific on their relationship. (yue "Yue" would be a family)
On top of these I think we should have the following full code:
  • yue-can, "Cantonese", equivalent to (some of) the current use of yue, parent yue-guf
and the following etymology codes:
  • yue-gzh, "Guangzhou Cantonese", equivalent to existing yue-gua, parent yue-can
  • yue-hkg, "Hong Kong Cantonese", equivalent to existing yue-HK, parent yue-can
  • yue-tai or yue-hsv, "Taishanese", equivalent to some of the existing zhx-tai, parent yue-siy
The "Cantonese" suffix could be applied to (dialects of) Guangfu, Guanbao, Xiangshan, Yongxun, and other "Baihua" varieties such as Qinzhou and Gaozhou, all of which are often considered to be related to Standard Cantonese.
wpi (talk) 14:11, 26 May 2024 (UTC)
Agree. --kc_kennylau (talk) 21:24, 29 May 2024 (UTC)Reply

Manipuri vs Meitei language


I propose we change it to Meitei as the language is predominantly spoken by the Meitei people. Meitei is not the only language indigenous to Manipur. There are other ethnic groups in Manipur who speak different languages. So there are many Manipuri languages, Meitei is only one of them. 178.120.0.250 10:40, 9 May 2024 (UTC)Reply

FWIW; this is about renaming what we call Manipuri to Meitei. I told the IP to come here, but in hindsight, perhaps WT:RFM would be a better venue.
At least the English Wikipedia seems to use Meitei as the primary name for the language. — SURJECTION / T / C / L / 11:10, 9 May 2024 (UTC)Reply
Sure, btw you can call me 178 if you want. It's a bit more specific. 178.120.0.250 11:31, 9 May 2024 (UTC)Reply
Yes, WT:RFM is the usual place for discussions about renaming languages. —Mahāgaja · talk 13:34, 9 May 2024 (UTC)Reply
I oppose the proposal as unneeded; the rename neither adds nor removes anything valuable. There aren't any active editors in the language, and if such a user comes along and finds a problem with the name, he will point that out naturally and the discussion will be fruitful. Discussing it now will only waste time, given that in this case the current name is obviously not obstructive. Word0151 (talk) 14:42, 9 May 2024 (UTC)
 Support seems like Wikipedia already changed the name. Not that we need to match Wikipedia, but if they changed it and the only interested editors here wanna change it too... why not? — Sameer مشارکت‌هابحث﴿ 15:52, 9 May 2024 (UTC)Reply
FWIW:
  • Google Ngrams shows "Manipuri language" having about 4x the usage of "Meitei language" and over 12x the usage of "Meithei language" in the most recent year (2019).
  • Wikipedia says that "Meitei" is now used by most Western scholars, although it's sourced to a single source (Chelliah), so take it with a grain of salt.
  • Wikipedia says that Indian government sources and the Indian constitution call it Manipuri, which is probably easily verifiable.
  • Ethnologue calls it "Meitei".
  • Glottolog calls it "Manipuri".
  • "Meitei" is closer to the endonym for the language.
  • As for Wikipedia's name choice, this happened in 2016 or earlier, and there is debate on the talk page about whether to call it Meitei or Manipuri, with the people in favor of Manipuri claiming it is the common name in English.
Benwing2 (talk) 08:36, 13 May 2024 (UTC)

Please help to sort out Scandoromani

See also: #Merger into Scandoromani

Lattjo dives! I have started to make some more Scandoromani entries, and there are 4 main problems on which I need advice before I can go on.

Problem 1. As far as I understand, Tavringer Romani is Swedish Scandoromani, also known as Traveller Swedish. Tavring is not something exclusively Swedish, and we already have Traveller Norwegian. Might it be a good idea to rename Tavringer Romani to Traveller Swedish? In any case, there is almost no difference between TS and TN, so might it be an even better idea to merge them into one L2 (Scandoromani)? See also the similar problem number 4 about Månsing.

"orthographies are consistently different, which seems to be the case", said Theknightwho once about this problem. But is it really a good reason?

Problem 2. A more serious one. Some of my first edits on Wiktionary were in Scandoromani, and back then I was dumb enough not to include sources in most of the entries I created. Now many of my sources are completely gone from the internet. I also remember that some entries (I don't remember exactly which) are not even from sources; I created them together with my former neighbor, an old drunk guy who spoke the language. I mean, I checked them in dictionaries and found them, though not all of them, and now I don't remember exactly which ones, and some of the dictionaries are gone.

Dictionaries I remember but cannot find: an old web 1.0 Norwegian website with a black background; a long English PDF in an ugly monospaced font comparing Scandoromani and Kalo; a scan of an old Swedish book with big fat letters.

Problem 3. What is "Tavringer Romani terms in nonstandard scripts"-category? The script is unspecified, so why is this category coming up?

Problem 4. What to do with Rodi and Månsing? They are jargons of Swedish and Norwegian, so how should we refer to them? I usually refer to them as jargons, using the code "sv" (Swedish) and specifying that the term is also used in Norwegian. I hope it's OK to do so. Otherwise, we may need them as independent L2s. Tollef Salemann (talk) 19:42, 15 June 2024 (UTC)

Glottonym tweaks: Franco-Provençal, Venetian → Francoprovençal, Venetan


(moved from Wiktionary:Beer parlour/2024/August#Glottonym tweaks: Franco-Provençal, Venetian → Francoprovençal, Venetan)

These changes would bring Wiktionary in line with the naming conventions of modern English scholarship, as found in for instance the Oxford Guide to the Romance languages (2016).

Context:

  • Francoprovençal has been the name used in French scholarship since the 1970's. Removing the older hyphen lessened the misleading impression that the language is some sort of secondary blend of French and Provençal (Occitan). There is also an element of typographical convenience.
  • Veneto has always been the name used in Italian scholarship, if I'm not mistaken, with Veneziano predominantly or exclusively reserved for the varieties spoken in Venice and environs, as opposed to the rest of the Venetan domain (Ve1, Ve3‒7).

Nicodene (talk) 22:05, 9 August 2024 (UTC)

Support, the Venetan proposal in particular has been a long-awaited change, and given that a part of modern Anglophone scholarship handles this sensibly, we have little reason to stay behind. Catonif (talk) 22:15, 9 August 2024 (UTC)
 Support. Never heard of Venetan but if this is the accepted term, so be it. Benwing2 (talk) 07:40, 10 August 2024 (UTC)Reply
Thoughts, @Apisite, IvanScrooge98, Samubert96, Sartma, Ultimateria, Urszag, Word dewd544?
(Active users who speak Venet[i]an or have contributed to its entries.)
Nicodene (talk) 20:52, 13 August 2024 (UTC)
Thanks for pinging me. I am pretty indifferent to the hyphen question for Francoprovençal, while I am not fully convinced about Venetan; after all, Venetia is the anglicized name for the region of Veneto (if the linguistic reasoning is to distinguish the specific dialect of Venice from the language as a whole). But if Venetan is now most common in English-language professional literature, then I don’t think there is much to debate. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 21:21, 13 August 2024 (UTC)Reply
The region's name occurs ~15 times more often in English as Veneto than Venetia, according to a Google search for “region of ____” (119000 results versus 7960). The latter occurs generally in historical as opposed to modern contexts.
Also at the moment we have no (reasonable) way to indicate a term used in Venice proper, as opposed to, say, Padua. A dialect label like Venetian would be identical to the name we currently use for the overall language (contra, as mentioned, the name used in linguistics). Nicodene (talk) 22:05, 13 August 2024 (UTC)Reply
Yeah, as I said, I get the reasoning. The thing is Venetian, despite being most commonly a word for stuff from Venice specifically, is not a strictly technical term like Venetan is—which is what comes to me a bit off given that this project is not directed to linguists but rather to the general public. And we could still label entries from the dialect of Venice as Venice, Venice dialect, Venice Venetian or something along those lines. But, again, it doesn’t mean I strongly oppose changing Venetian to Venetan. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 22:19, 13 August 2024 (UTC)Reply
The general public in Italy would be surprised to hear the dialect of, say, Padua described as veneziano. E.g. on Italian Wiki Dialetto padovano redirects to this page, where veneziano is mentioned solely as an external entity: “le parlate dei centri più importanti…sono state influenzate dal veneziano”.
So this is more about the general public of English-speaking countries, which isn't aware that such a language exists, as opposed to a local variety of (Standard) Italian. Nicodene (talk) 23:00, 13 August 2024 (UTC)Reply
Fair enough. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 23:09, 13 August 2024 (UTC)Reply
How do you pronounce "Venetan"? Benwing2 (talk) 23:20, 13 August 2024 (UTC)Reply
For me it's /ˈvɛnətən/ < /ˈvɛnətəʊ/ (≈Italian /ˈvɛneto/) + /-ən/. Nicodene (talk) 23:31, 13 August 2024 (UTC)Reply
@Benwing2: I would rather pronounce the term as /ˈvɛneɪtʌn/. --Apisite (talk) 10:49, 14 August 2024 (UTC)Reply
 Support If we are not going to have separate h2 for the main dialect groups of the Venetan language, then we must go for Venetan. As @Nicodene said, Venetian is the dialect of Venetan spoken in and around Venice. For instance, Paduans, Vicentines and Trevisans speak Paduan, Vicentine and Trevisan respectively, not Venetian. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 15:27, 15 August 2024 (UTC)Reply
@Benwing2 Shall we go ahead, then? Nicodene (talk) 18:00, 22 August 2024 (UTC)Reply
@Nicodene I'm finally getting around to this. For reference, here is (I think) the correct way to rename a language (e.g. "Venetian" -> "Venetan"):
  1. First, list all the categories in Wiktionary (this takes a little while as there are ~ 1,000,000 categories and the listing is only 5,000 per second). Then find all the categories containing the word "Venetian", e.g. using python3 list_pages.py --namespaces Category (it is not sufficient to use the prefix-listing functionality to list categories starting with "Venetian" because there are other categories with "Venetian" in it elsewhere than at the beginning). Use this list to generate a list of category renames to supply to a script such as my rename.py script.
  2. Then, download the latest dump file from https://dumps.wikimedia.org/ (beware, it may be up to 20 days out of date) and search through it for all occurrences of 'Venetian' (e.g. like this: bzcat enwiktionary-20241001-pages-articles.xml.bz2 | python3 find_regex.py -e '^.*Venetian.*$' --all --stdin > find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1).
  3. Then, change the name in the language module itself (e.g. Module:languages/data/3/v for 'vec' = Venet(i)an), then regenerate the code <-> canonical name caches by going to Module:languages/code to canonical name and clicking on the Update button.
  4. Then, rename the categories containing the old name, using the script input created in step #1. You want to do this soon after renaming the language itself. It should follow the language rename rather than precede, so that when each page gets regenerated as it's renamed, the {{auto cat}} regeneration succeeds.
  5. Then, rename the language in the header of the lemmas and non-lemma forms, e.g. like this: python3 rewrite.py --from '==[ \t]*Venetian[ \t]*==' --to '==Venetan==' --cats 'Venetian lemmas,Venetian non-lemma forms,Venetan lemmas,Venetan non-lemma forms' --diff --track-seen --comment 'rename Venetian language headers to Venetan per [[Wiktionary:Language_treatment_requests#Glottonym_tweaks:_Franco-Provençal,_Venetian_→_Francoprovençal,_Venetan]]' --save > rewrite.venetan-venetian-lemmas-non-lemma-forms.venetian-to-venetan.out.1.save. This should follow the category renames so that e.g. the new categories don't end up in Category:Empty categories. Note that we loop over both "Venetian" and "Venetan" lemmas and non-lemma forms (the latter last) so that we get any terms that were regenerated and moved categories between this step and the previous one, or while this step is in progress.
  6. Then, rename the language in references to it in various places (especially but not exclusively in translation sections), using the output of step #2 as a guide. To do this, download the pages containing the word "Venetian", something like this: python3 find_regex.py --pagefile <(extract_pagename.sh < find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1) -e 'Venetian' --text > find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.orig. Copy the file, e.g. cp find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.orig find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1. Edit the latter file appropriately to change all occurrences of Venetian to Venetan that need to be changed. Push the changes using e.g. python3 push_find_regex_changes.py --direcfile find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1 --origfile find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.orig --comment 'Venetian -> Venetan per [[Wiktionary:Language_treatment_requests#Glottonym_tweaks:_Franco-Provençal,_Venetian_→_Francoprovençal,_Venetan]]' --diff --save > push_find_regex_changes.find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.out.1.save.
Benwing2 (talk) 05:53, 15 October 2024 (UTC)
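For readers unfamiliar with the scripts named above, the core of steps 1 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not the actual rename.py or rewrite.py: the function names are hypothetical, the category list is assumed to have been fetched already, and anchoring the header pattern to a whole line is a choice made for this sketch (the pattern quoted in step 5 is unanchored).

```python
import re


def build_category_renames(categories, old, new):
    """Return (old_title, new_title) pairs for categories containing `old` as a whole word.

    Matching anywhere in the title (not just at the start) mirrors the remark in
    step 1 that prefix listing alone is not sufficient.
    """
    word = re.compile(rf"\b{re.escape(old)}\b")
    return [
        (title, word.sub(lambda _m: new, title))
        for title in categories
        if word.search(title)
    ]


def rename_language_header(wikitext, old, new):
    """Rewrite level-2 language headers such as '== Venetian ==' to '==Venetan=='.

    The pattern is anchored to a full line so that deeper headers
    (e.g. '===Venetian===') are left untouched; this is an assumption of the sketch.
    """
    header = re.compile(
        rf"^==[ \t]*{re.escape(old)}[ \t]*==[ \t]*$", re.MULTILINE
    )
    return header.sub(lambda _m: f"=={new}==", wikitext)


if __name__ == "__main__":
    sample_categories = [
        "Venetian lemmas",
        "Terms derived from Venetian",
        "Italian lemmas",
    ]
    for old_title, new_title in build_category_renames(
        sample_categories, "Venetian", "Venetan"
    ):
        print(f"{old_title} -> {new_title}")
    print(rename_language_header("== Venetian ==\n===Noun===\n", "Venetian", "Venetan"))
```

The generated list still needs a manual pass before it is fed to a bot: as noted further down in this thread, a category such as CAT:Venetian Italian refers to the regional Italian of Venice and should keep its name.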
@Benwing2 Nice! Thank you for your work, this is a good day. :) Catonif (talk) 21:09, 15 October 2024 (UTC)Reply
@Catonif Thank you! @Nicodene I tried to find all the remaining instances of Venetian that should be changed to Venetan, but some I'm not sure about, e.g. the "Venetian" dialect of Italian (should that be "Venetan"? is this actually referring to the Venetan language?). The remaining instances are here: User:Benwing2/venetian-to-venetan Please look over them and change any pages needing changing. Thanks! Benwing2 (talk) 21:17, 15 October 2024 (UTC)Reply
@Benwing2 I went through that list, only a few needed to be changed, very well done! By the Venetian dialect of Italian, do you mean CAT:Venetian Italian? That's fine, it is the regional Italian of the city of Venice. Catonif (talk) 21:51, 15 October 2024 (UTC)Reply
@Catonif Thank you! Yes, I was referring to that category. Benwing2 (talk) 21:54, 15 October 2024 (UTC)Reply
Thank you. I had no idea the process was so complicated.
I’ve gone through the list and made one correction. The other cases were already addressed by Catonif. Nicodene (talk) 22:29, 15 October 2024 (UTC)Reply
@Nicodene Also, I'd like to get more input before renaming 'Franco-Provençal' -> 'Francoprovençal'. No one above commented on this change, and the Wikipedia article on the language (which has a hyphen in it) says this:
Although the name Franco-Provençal appears misleading, it continues to be used in most scholarly journals for the sake of continuity. Suppression of the hyphen between the two parts of the language name in French (francoprovençal) was generally adopted following a conference at the University of Neuchâtel in 1969; however, most English-language journals continue to use the traditional spelling.
Benwing2 (talk) 21:29, 15 October 2024 (UTC)
It seems roughly 50/50 in English, judging by results from the last few years in Google Scholar. There doesn’t seem to be an official spelling in English, but there is one in both French and Italian (in both cases without the hyphen). The closest thing to an official English spelling that I could imagine is the one preferred by Oxford University, which is more or less the “capital” of anglophone scholarship in Romance Linguistics. Nicodene (talk) 22:49, 15 October 2024 (UTC)Reply
@Benwing2: In your edits here and here you changed Venetian to Venetan, citing this discussion. The thing is that you changed the names of external Wikimedia projects, which are still the "Venetian Wiktionary" and "Venetian Wikipedia" regardless of the spelling convention we use in our own entries. So I'm not sure those edits are worthwhile. Ioaxxere (talk) 23:06, 15 October 2024 (UTC)Reply
@Ioaxxere Oops, I didn't realize those are external links. Please undo them, thanks! Benwing2 (talk) 23:12, 15 October 2024 (UTC)Reply
OK went ahead and did this. Benwing2 (talk) 23:13, 15 October 2024 (UTC)Reply

Rename wca from Yanomámi to Yanomam

[edit]

I suggest we rename wca Yanomámi → Yanomam.

Our current name for this language (Yanomámi) is extremely confusing, given that its close relative guu, which we call Yanomamö, is also commonly called Yanomami (with or without various diacritics). In addition, the language family to which both of these languages belong is also called Yanomami, even by us (cf. Category:Yanomami languages). (The accent mark on Yanomámi is irrelevant; it may be present or not in any of these uses, so it doesn't help in distinguishing one from the other.)

Current practice in the academic literature is to call wca Yanomam, avoiding this confusion. See Helder Perri Ferreira, Yanomama Clause Structure, page 6: 'To avoid confusion then, the following terms are used in this thesis: [] Yanomam = either refers to a language of the Yanomami family or to its speakers. It corresponds to what Ramirez (1994a: 35) called the “Oriental super-dialect of Yanomami” or “Oriental Yanomami” (Yor). Migliazza (1972: 34) calls this language “Yanomam” as well.' Glottolog uses the similar term Yanomám; see here. Jacques Lizot's work tentatively follows Migliazza and also labels the variety as Yanomam, as does the Endangered Languages Project; see here. 'Yanomam' seems by far the most common designation for this lect in the current literature; it would make sense to rename the language accordingly. — Vorziblix (talk · contribs) 14:21, 28 August 2024 (UTC)Reply

Since I am intending to do some work with this language in the immediate future, I’m going to go ahead and make this change now to avoid having to make many more changes down the line. If there end up being any objections to the move, we can still discuss and undo the change then if needed. — Vorziblix (talk · contribs) 13:27, 3 September 2024 (UTC)Reply

East Lechitic typology

  • Relevant Wikipedia articles:

In this thread I would like once and for all to try and determine what should be and what shouldn't be an L2 on en.wiktionary based on linguistic, technical, and other criteria.

It's no secret that, when dealing with dialect clusters and groups, it can be a headache to determine all of this.

When it comes to Lechitic, the West (Polabian)/North (Pomeranian)/East isn't even a strong grouping anyway; much of East Lechitic didn't even undergo the so-called Lechitic ablaut (some linguists argue that it was later levelled, some argue it never took place), and Old Polish, like many other "Old" languages, is not a single language, but rather a group of dialects with varying phonological features and changes that can be shown to go to a single etymological form, even if that form wasn't omnipresent across all the lects it represents (for example Masuration is a very early change).

The lects in question are Silesian, Masurian, and Goral.

In terms of linguistics, Silesian doesn't differ from other dialect groups as much and shares much in common with Greater Polish and Lesser Polish. However, it has undergone a huge standardization recently, and the socio-linguistic aspect of all this cannot be ignored, either. In terms of technical aspects on Wiktionary, there's not much that it needs that is special, to be honest, but I feel its status as an L2 is fairly safe. I mention this for later points and for context. Mutual intelligibility between Silesian and Polish can vary vastly - depending on the vocabulary used it may be intelligible or not, typical of other Slavic languages.

Masurian was split initially for being incredibly divergent from Polish. It shares a fair amount with some neighboring dialects such as Kurpian, however, to a much greater extent, and mutual intelligibility between Masurian and Polish is limited. Even when using more common vocabulary, it can be difficult to understand, and also a large number of everyday terms differ either by etymology or by a significant number of phonemes. I feel the Appendix:Masurian Swadesh list demonstrates this well (Appendix:Polish Swadesh list for reference). As far as the orthography goes, Masurian is not a widely spoken lect, so levels of normalization within the culture are not high, but neither is its daily usage. It could be possible to normalize to a Polish orthography with a few additions (namely áéóôû, which we are going to need for other dialects anyway; an explanation can be found at w:Dialects of Polish). In terms of technical aspects, many Masovian dialects, such as Kurpian, might need similar support, such as a different declension module, as many more consonant alternations exist due to the decomposition of soft bilabials (i.e. budowa > budozie). Its status as an L2 is debatable.

Goral sits in between Silesian and Masurian in most regards. Culturally, it is one of the most spoken dialect groups (itself being a dialect group WITHIN the Lesser Polish dialect group, but the number of differences between dialects here is smaller than between other dialects within a dialect group) and its mutual intelligibility is much like that of the relationship between Silesian and Polish. Depending on the vocabulary used as well as the "thickness" of the speaker's accent, mutual intelligibility can vary wildly. In terms of orthography, pagenames would differ about as much as some other dialect entries. What I mean is that in Middle Polish you had so-called "slanted vowels" (áéó) which all developed differently in different dialects, as well as w:Masuration. Goral dialect would be spelled on the whole very similarly to other Lesser Polish dialect words, so lekarz would be lykorz for both groups. In terms of technical support, it would also need new declension templates, but it could be handled using most of the same infrastructure as the rest of the Polish dialects. However, one big difference is that many Goral dialects have initial stress, which stands in huge contrast to the rest of East Lechitic, which has penultimate stress.

Solutions:

  1. Split all. Keep Silesian and Masurian split and split Goral as well, setting it as a descendent of Old Polish.
  2. Status quo. Keep Silesian and Masurian split and do not split Goral.
  3. Remerge Masurian. Silesian remains an L2, and Masurian and Goral would be dialects of Polish.
  4. Remerge all.

I personally can see the first three options, or more specifically options 1 or 3. I'm strongly against merging Silesian, and I suspect most people here would be as well, but I am placing the option here for the sake of completeness. I have already set Polish dialects as LDL's on WT:About Polish, so questions of attestation can be put aside.

I am opting to leave out anything about Old Polish and Middle Polish here. Vininn126 (talk) 12:56, 1 September 2024 (UTC)Reply

I would prefer option 3. Almost no language is homogenous, and we can't endlessly split, we need to stop somewhere; I think written language is the most important thing for languages in (western) Eurasia: I'm pretty sure an average Masurian speaker will not see Standard Polish as a language separate from the one they write in day-to-day, and will have little problem to encode their variety in written Polish to a satisfactory degree. You can write a word like ony and pronounce it as /ónÿ/ without much of a problem. You can write /ôwtén/ as owten (which is probably attestable by the way!) and show you're a dialectal speaker. Just in the same way Finnish speakers write Finnish, Scots write Gaelic and Italians write Italian. Thadh (talk) 13:29, 1 September 2024 (UTC)Reply
Second idea: (Notifying KamiruPL, BigDom, Hythonia, Tashi, Sławobóg, Silmethule, Rakso43243, Skerillion): @Benwing2, @PUC, @Thadh How would you feel about having etymology codes for the major dialect groups? We have already one for Middle Polish which has been very useful. I could see it being very useful for having for example pl-GP for Greater Polish, pl-LP for Lesser Polish, pl-MS (or something similar) for Masovian, maybe pl-BOR or something for both Borderlands (but I'm not sure we need that one) and also potentially pl-gor for Goral. This would be very useful for etymologies as each group has different tendencies for borrowing and its relation to other languages, such as some Greater Polish dialects having some vocab in common with Kashubian, for example. Vininn126 (talk) 10:21, 12 September 2024 (UTC)Reply
Only for those that have an actual demonstrably significant number of borrowings into another language that set them apart from Standard Polish or other groups. Thadh (talk) 10:26, 12 September 2024 (UTC)Reply
This would be fairly easy to do if we consider dialectal borrowings - (dialectal) Prussian German often borrowed from Masovian dialects, Slovak dialects often borrowed from Goral/Lesser Polish. Greater Polish most assuredly gave certain words in dialects of Kashubian. I'm fairly sure we could find examples of non Standard Polish words for each, and the given lects mentioned are unlikely to have borrowed from other dialect groups. Vininn126 (talk) 10:31, 12 September 2024 (UTC)Reply
@Vininn126 I just received your "Second idea" ping: 4 days late. No objections to adding etym codes for the major dialect groups, but they should follow the standard etym code notation, hence pl-gre for Greater Polish, pl-les for Lesser Polish, pl-mas maybe for Masovian, pl-bor for Borderlands, pl-gor for Goral. Benwing2 (talk) 22:44, 15 September 2024 (UTC)Reply
And I assume you are for option 3 in the first. Vininn126 (talk) 05:21, 16 September 2024 (UTC)Reply
Yes, not strongly though; I trust whatever you think is best. Benwing2 (talk) 05:28, 16 September 2024 (UTC)Reply
Okay, I think everyone who's going to say something has said their piece. I have tried asking everyone for their opinion. The decision is: Remerge Masurian, don't split Goral, and don't Merge Silesian. Greater Polish, Lesser Polish, Masovian, and Goral will get their own etymology codes. I can implement this starting this week. Vininn126 (talk) 17:48, 25 September 2024 (UTC)Reply

Paraguayan Guaraní (again)


Guaraní is a mess. Its problems include a broken pronunciation module, dozens of conjugation templates with no documentation of what they are for[7][8] and a complete lack of references, but the worst one is the language codes. Currently, we have codes for both Guaraní (gn) and each one of its "varieties" (Chiripá (nhd), Classical Guaraní (gn-cls), Eastern Bolivian Guaraní (gui), Mbyá Guaraní (gun), Paraguayan Guaraní (gug) and Western Bolivian Guaraní (gnw)), and all of these are treated as distinct languages with their own L2 heading, which raises the question: if we have a heading for each variety, what even is Guaraní? Looking through the lemmas, it seems to be a duplicate of Paraguayan Guaraní, an issue that was already raised seven(!) years ago, with no consensus on changing anything. Also, Classical Guaraní is currently listed as a descendant of Guaraní and a sister language of Paraguayan Guaraní, which is not ideal.

My proposal is:

  • Making gn a family code, similarly to Tupi-Guarani (tup-gua), putting Classical Guaraní as the ancestor of Paraguayan Guaraní and moving everything from the Guaraní L2 to Paraguayan Guaraní.
    • The position of the ancestor is still not clear to me, though. To my understanding, what Wiktionary calls Classical Guaraní is the language used in the 17th-18th century Jesuit missions of Paraguay, Argentina and South Brazil. It's the ancestor of Paraguayan Guaraní for sure, but its relation to Mbyá and Chiripá is not well explained, and authors just calling everything "Guaraní" doesn't really help...
  • Another way would be doing the opposite: merge everything into gn, make Classical Guaraní its ancestor and use {{lb|gn|x}} for the different varieties. This would be especially counterproductive because we would end up merging Mbyá and Paraguayan, and they certainly aren't the same language. The problem is aggravated by Mbyá having a different spelling that uses X instead of CH.

Taggin' the only active Guaraní editors I know @RodRabelo7, Ovey 56 and @Theknightwho who seemed interested :p. Trooper57 (talk) 17:27, 13 September 2024 (UTC)Reply

Thanks! Finally someone spoke about it! Yes, the Wiktionary pages on Guarani are certainly a mess, but I'd say I liked more your first proposal, since the Guarani varieties are already considered by some as different languages.
Just some things on the language used by the Jesuits in their missions, it was its own language, just like the Jesuitic Nahuatl (I don't remember the language's official name).
I've wanted for so long for Guarani to be recognized as a group of languages rather than a language with so many different dialects, not only because of the differences in their vocabulary, pronunciation and intelligibility with one another, but because the contemporary Guarani peoples do not consider the group of varieties a single language.
I totally agree on editing the pages to show they are different languages, as well as changing the automatic name that pops up when the code "nhd" is used. It should be either "Nhandeva", "Yandeva" or "Nandeva", since "Chiripa" is an outdated term that some Yandeva people consider derogatory/insensitive. Junior Santos (talk) 13:17, 14 September 2024 (UTC)
Interesting, so Classical and Paraguayan Guaraní were actually spoken at the same time, with the first being like a "formal" version used by the Jesuits?
And I think the categories were created when these names were still in use lol, most of the Tupian languages have been left untouched for years. The Kaapor don't seem fond of "Urubu", too. Trooper57 (talk) 14:52, 14 September 2024 (UTC)Reply
Also pinging @Rodrigo5260 who commented on the issue on Discord. Trooper57 (talk) 14:54, 14 September 2024 (UTC)Reply
  • Thank you, Trooper57, for pinging me into this discussion. First of all, I would like to mention that I have indeed noticed this mess with the Guarani entries. I have worked on some (Paraguayan) Guarani entries, and from my experience, almost all Guarani (gn) entries are actually Paraguayan Guarani (gug). However, since it's more common to see just Guarani, I opted to record them that way... I agree with the question: if we have a code for each variety, what is actually Guarani? I must admit that I only know the differences between (Paraguayan) Guarani, the Mbyá, and the Kaiwá (to which I recently added some entries, such as yrygwasu). I am less familiar with the other varieties. Regarding Classical Guarani (I prefer the term Old Guarani, by analogy to Old Tupi), this is the origin of (Paraguayan) Guarani, Mbyá, and Kaiwá, at the very least. Old Guarani is to these varieties what Old Tupi is to Nheengatu, for example. I also note that I have created the very first entries for Old Guarani, such as cabayu and ĭgaratá. What to do? I'm not sure yet, but I would like others to share their ideas. By the way, it would be interesting if we could gather at least one dictionary for each variety to get a better idea of what we are dealing with. I have a dictionary for (Paraguayan) Guarani, Mbyá, Kaiwá, and, of course, the Montoya's vocabulary on the so-called Classical Guarani, Tesoro de la lengua guaraní. RodRabelo7 (talk) 04:05, 15 September 2024 (UTC)Reply
    Oh, and I'd support removing the diacritic from "Guaraní". "Guarani" is way better... RodRabelo7 (talk) 04:08, 15 September 2024 (UTC)Reply
    About the last part, I haven't found any dictionaries yet, but there's some Eastern Bolivian Guaraní vocab in this pdf by UNIBOL Guarani. Trooper57 (talk) 16:22, 15 September 2024 (UTC)Reply

Add Guachí


Guachí is an extinct language known to have been spoken in Argentina in the 19th century; the only record is a word list of 145 words, from 1845. Apparently, it's usually classified as Guaicuruan, but WP says the data is insufficient to demonstrate that. For reference, we already have Appendix:Guachí word list. Theknightwho (talk) 14:18, 17 September 2024 (UTC)Reply

Hi, in the future I'd recommend not adding a language even if you want to, but no one replies to your suggestion to add it in 10 days. In general you need at least one other person to look over and agree with your suggestion. Please don't take silence as consent. In this case you should have pinged User:-sche, who can give you thoughts. I'm personally a bit skeptical as to whether a single word list is enough data to indicate even that it's a separate language as opposed to either a dialect of an existing language or a mishmash of randomly collected words. Benwing2 (talk) 10:19, 28 September 2024 (UTC)Reply
Same thing goes for Kalašma, which you recently added with a similar "silence = consent" assumption. Benwing2 (talk) 10:20, 28 September 2024 (UTC)Reply

Changing the canonical name of kla from "Klamath-Modoc" to "Klamath"


Wiktionary's canonical name for the language kla, spoken by the Klamath and Modoc peoples, is currently "Klamath-Modoc", which reflects the fact that the two peoples spoke different dialects. I propose that it be renamed "Klamath", which is the name that sources discussing the language predominantly (though not universally) call it.

  • The Klamath Tribes themselves call the language "Klamath". (The Modoc Nation could conceivably have a stake in the language being called "Klamath-Modoc", but I can't find any references to the language by name on their website.)
  • Most of the academic literature I can find about the language identifies it as "Klamath". In particular, the works of Albert S. Gatschet and M. A. R. Barker, who each produced by far the most extensive and most cited documentation of the language, call it "Klamath".
    • The search string "Klamath language" yields significantly more results in both Google Scholar and JSTOR than the string "Klamath Modoc language".
  • The English Wikipedia article for the language has been titled "Klamath language" since 2011. Also, almost all sources in that article's bibliography refer to the language as "Klamath".

(In the interest of a fully informed discussion, it's worth noting that the following sources use the name "Klamath-Modoc": SIL International, Ethnologue, Glottolog, OLAC, and the California Language Archive.)

— Äþelwulf (talk) 20:56, 24 September 2024 (UTC)Reply

Is there anything I can do to elicit input on this matter? — Äþelwulf (talk) 20:19, 15 October 2024 (UTC)Reply
@Athelwulf Maybe ping User:-sche, who is often involved in these discussions? -sche, can you ping anyone else who you think might have relevant comments? Benwing2 (talk) 21:20, 15 October 2024 (UTC)Reply
BTW the fact that both Ethnologue and Glottolog use the name "Klamath-Modoc" is significant, although not decisive. Benwing2 (talk) 21:22, 15 October 2024 (UTC)Reply
You are right that "Klamath" is the more common term, and although it is hard to be sure how many uses of it mean the language [that encompasses both 'Klamath' and 'Modoc'] and how many mean the dialect ("Klamath-Modoc" is arguably clearer about the scope), probably our preference for using the most common name should lead us to use Klamath here.
It is interesting that there are almost no uses of the native name. ("Klamath" is derived from the Upper Chinook designation for all the natives of the Klamath River Basin, including the Klamath and Karuk and Shasta and Yurok — Modoc is at least [a clipped rendering of] a Klamath-Modoc word for that variety — and Victor Golla, in California Indian Languages (2022), page 135, notes that after "Gatschet used 'Klamath' as the specific ethnographic name for the Indians of the reservation on Upper Klamath Lake and for their dialect of Klamath-Modoc, [...] this usage soon became standard among anthropologists [but] there was [initially] reluctance, however, to extend the term to the Modocs, who had been treated as a separate tribe since the Modoc War of 1872-1873 and their subsequent removal to Oklahoma.") - -sche (discuss) 21:26, 21 October 2024 (UTC)

Ancestor of Azerbaijani


Hello, I wrote Wiktionary entries for Azerbaijani words written in the Azerbaijani Abjad (Turco-Perso-Arabic alphabet), but some other Azerbaijani users revert all my edits on the pages because the words are "too old for Azerbaijani". I constantly run into these rollbacks of entries written in the Azerbaijani Abjad alphabet, with the explanation that "this word does not exist in modern Azerbaijani". This is because the ancestor of the Azerbaijani language is not properly defined in Wiktionary; or rather, it is defined as Old Anatolian Turkish, but this is too ancient an ancestor. For comparison, the ancestor of Turkish (of the Turkish Republic) is given as Ottoman Turkish and then Old Anatolian Turkish, which is logical, since Ottoman Turkish was used until the 1920s. This completely solves the problem in the case of Turkish. There is no such solution for Azerbaijani: its ancestor is given in Wiktionary as Old Anatolian Turkish, which was used until the 14th century at the latest. Azerbaijani therefore has no ancestor for the period from the 15th century to the beginning of the 1920s (according to various sources, modern Azerbaijani can be dated from 1922-1923, when the USSR occupied Azerbaijan, or from 1928, when the USSR switched Azerbaijani to the Latin alphabet).
However, historically, the ancestor of Azerbaijani was considered to be Ajami Turkish (trk-ajm), the "Turkish of Persia", which was the language of the Qajars, Afshars, Qizilbashs, Qashqayi etc.; it is also the ancestor of the Iraqi Turkmen and Sonqori languages, and possibly of the Khorasani Turkish and Khalaji languages. For example, in the book The Turkic varieties of Iran, Christine Bulut says (page 406) that the written language for these languages had been Ajam Turkic since the 16th century. It is a good term. I could create entries for Azerbaijani words written in the Abjad alphabet under this language so as not to run into these restrictions, but as I understand it, that is not possible at the moment. Please help me with this issue, since I have a lot of literature and I want to create pages for these words, but I encounter restrictions from other users.

At the moment the Azerbaijani language page says that Azerbaijani comes from:

  • Proto-Turkic
  • Proto-Oghuz
  • Old Anatolian Turkish

but it should be

  • Proto-Turkic
  • Proto-Oghuz
  • Old Anatolian Turkish
  • Ajami Turkish

Please create the language category for Ajami Turkish (https://www.wikidata.org/wiki/Q110812703) and make it the ancestor of Azerbaijani. It would look like this: Azerbaijani comes from Ajami Turkish (trk-ajm), which comes from Old Anatolian Turkish:

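m["trk-ajm"] = {
	"Ajami Turkish", -- canonical name
	110812703, -- Wikidata item (Q110812703)
	"trk-ogz", -- family code
	"fa-Arab", -- script code
	ancestors = "trk-oat", -- Old Anatolian Turkish
	entry_name = {["fa-Arab"] = "ar-entryname"},
}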

Sebirkhan (talk) 19:31, 8 December 2024 (UTC)Reply
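The effect of the proposed entry on the ancestor chain can be illustrated with a small, self-contained Lua sketch. This is only an illustration under stated assumptions: the table `ancestor_of` and the function `ancestor_chain` are hypothetical names, not part of Wiktionary's modules, and the code "trk-ogz-pro" for Proto-Oghuz is assumed rather than taken from this thread.

```lua
-- Hypothetical, simplified data: each code maps to its direct ancestor.
-- This is NOT the real Wiktionary language data, only a model of the proposal.
local ancestor_of = {
	["az"] = "trk-ajm",          -- proposed: Azerbaijani < Ajami Turkish
	["trk-ajm"] = "trk-oat",     -- Ajami Turkish < Old Anatolian Turkish
	["trk-oat"] = "trk-ogz-pro", -- Old Anatolian Turkish < Proto-Oghuz (code assumed)
	["trk-ogz-pro"] = "trk-pro", -- Proto-Oghuz < Proto-Turkic
}

-- Walk the chain from a language up to its oldest listed ancestor.
local function ancestor_chain(code)
	local chain, seen = {}, {}
	local current = ancestor_of[code]
	while current and not seen[current] do
		seen[current] = true -- guard against accidental cycles in the data
		chain[#chain + 1] = current
		current = ancestor_of[current]
	end
	return chain
end

print(table.concat(ancestor_chain("az"), " < "))
```

With the "az" line removed and "az" pointed straight at "trk-oat", the same walk would skip Ajami Turkish, which is the current situation described above.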

January 2025


Etym-codes for recensions of Church Slavonic

Branch from: Church Slavonic and Moravian.

Recently Church Slavonic (zls-chs) was created as L2. Everything is fine, but Church Slavonic (CS) is divided into "dialects" (recensions or redactions), the spelling of which is very different in places. There is obviously a need for etymological codes for different variants of CS. The most famous recensions of CS are Croatian, Serbian and Russian. But in reality there are more. What are your suggestions, what can be done in this situation? What codes could be created? AshFox (talk) 09:00, 11 January 2025 (UTC)Reply

I recently studied the situation in the East Slavic variants of CS. Often the East Slavic variant of CS is simply called "Russian Church Slavonic" (RuCS), but this is a very, very simplified term, because the East Slavic variant of CS that existed in the times of Rus (around 988‒1450) is very different from the RuCS that exists now, which is very developed and whose spelling is extremely different from the archaic spelling of the times of Rus. The rules for using letters and spelling in modern RuCS can be found, for example, in Смирнова А. Е. (2024), Церковнославянский язык в таблицах. For example, the Greek name "Xenia" in modern RuCS orthography is written as Ѯе́нїѧ (Ksénija), while in the times of Rus it would have been *Ѯениꙗ (*Ksenija) < Gr. ξενία (xenía). Another significant difference is the presence of the reduced ъ (ŭ) / ь (ĭ) in RuCS during the Rus times and their complete absence now. And so on... In general, there are many differences between the modern RuCS and the RuCS of the Rus times, which does not allow us to perceive the Eastern Slavic version of CS from 988 AD to the present day as one "Russian Church Slavonic".

The tree part looks like this:

─┬ Church Slavonic (zls-chs)
 ├[-]┬ Old East Church Slavonic (zls-chs-orv)
 │   ├──── Russian Church Slavonic (zls-chs-ru)
 │   └──── Ruthenian Church Slavonic (zls-chs-rt)
AshFox (talk) 11:50, 11 January 2025 (UTC)Reply
Sounds sound. Fay Freak (talk) 12:57, 11 January 2025 (UTC)Reply
 Support. Etymology codes are an easy way to add precision without too much complexity. Vininn126 (talk) 12:59, 11 January 2025 (UTC)Reply
In the future, I think it would be desirable to have codes like this (names and codes themselves can be clarified/changed):
  • Czech-Moravian Church Slavonic (zls-chs-cs)
  • Bulgarian Church Slavonic (zls-chs-bg)
  • Macedonian Church Slavonic (zls-chs-mk)
  • Serbian Church Slavonic (zls-chs-sr)
  • Croatian Church Slavonic (zls-chs-cr)
  • Wallacho-Moldavian Church Slavonic (zls-chs-ro)
AshFox (talk) 18:43, 11 January 2025 (UTC)Reply
This division is slowly devolving into the tumultuous state of affairs from the times of Belić and Mladenov. I struggle to see how one can comprehensively differentiate between all of these renditions? Are we going to follow linguistic, historical, or geographical criteria?
For example, what label should one give, say, to Gregory Tsamblak's writings? Did he write in Bulgarian, in Serbian, in Wallacho-Moldavian, or in Ruthenian ChSl.? After all, in different periods of his life, he worked in different places.
Or what label should be given, e.g., to Didactic gospels? Its author is Constantine Preslavsky, but all surviving copies were written in Medieval Ruthenia?
IMO, if specification is required, just write the concrete source of the word. It will save us the hassle of splitting hairs.
All in all:  Support for renditions that can be localized (Czecho-Moravian, Croatian, Rascian) and  Oppose for the rest. Безименен (talk) 10:25, 19 February 2025 (UTC)Reply
PS Bulgarian/Macedonian Church Slavonic is split into Literary Schools and into different time periods. It is misleading to clump them all together. For example, a text written by Preslav Literary School would have more in common with Czecho-Moravian from the same time period, rather than with the Tarnovo Literary School which emerged after from XII cent. Безименен (talk) 10:41, 19 February 2025 (UTC)Reply
 Support. Chihunglu83 (talk) 21:45, 14 January 2025 (UTC)Reply

Old Albanian

Discussion moved from Wiktionary:Beer parlour/2025/January#Old Albanian.

For the purpose of helping users understand the Albanian language and its history, I believe there should be a new and separate language added to Wiktionary: Old Albanian. The Old Albanian language is considerably different from modern Albanian and having both would better help in studying the Albanian language. I think that if there are both Armenian and Old Armenian as separate languages, as well as many others, Old Albanian, too, should be added as separate. It is simply impossible to include such differing eras of the Albanian language under simply "Albanian", which includes modern Albanian of all dialects, as well as Old Albanian from over 500 years ago. The language has evolved considerably in phonology, forms, lexicon, and much more.

A single sentence in Old Albanian has become almost alien to modern Albanian speakers, although not entirely. Take, for example, Old Albanian, "Ënço, tyy të lusmë, Zot, të mujtunitë tat, e eja përse ti tue klenë të lutunitë tanë e të shpëtuomitë tanë na të jemi të denjë ën kësi perikuli të kuatëvet tinëve na me klenë dëlirunë e shelbuom." Shqypëtari (talk) 21:27, 14 January 2025 (UTC)Reply

If so, in view of likeness of attestation situation, the comparison should be made with Old Lithuanian, rather than Old Armenian. Fay Freak (talk) 21:31, 14 January 2025 (UTC)Reply
If anyone has the knowledge/time to critically analyze Old Gheg works of Buzuku (1555), Budi (1618–1621), Bogdani (1685), and the Old Tosk writings by Matranga (1592) and Variboba (1762), why not? I am just concerned it would be unsourced and turned into a huge mess. The initiator would also need to propose a way of spelling normalization. — This unsigned comment was added by Chihunglu83 (talk • contribs) at 14:13, 28 February 2025 (UTC).
I think it's best to write it in the modern Albanian alphabet as the sounds are identical, as far as I know. But I'm not really used to wiki and maybe we can use the script used in its earliest attestation, or use the script it was most commonly written in. However I think that would be a bit messy and so I'd say use the modern Albanian alphabet. Shqypëtari (talk) 16:11, 17 April 2025 (UTC)Reply
Also I'm not sure if this is possible but would we be able to make it a language where sources are required before creating entries? Shqypëtari (talk) 20:22, 17 April 2025 (UTC)Reply
Yeah, just write it as you think it to be modern, if the equivalent is sufficiently certain (probably is not if from Ottoman Turkish alphabet at least).
The driveby editors don’t follow our taste in sourcing standards anyhow and splitting languages would give leeway to double standards, so it is easier to stress best practices from the perspective of modernity. But this is my opinion not knowing Albanian, I just know the kind of people who edit here. Fay Freak (talk) 21:15, 17 April 2025 (UTC)Reply
Hi Shqypëtari! :) Unfortunately my first time greeting you is an  opposition. I'm glad you're interested in contributing to the historical side of the Albanian language, but I believe the best way to do it is under a single header. I don't think a split such as this one would make sense either in theory or in practice.
In theory, the claim of the language being considerably different stands only for the works of Buzuku and perhaps Matranga, which are indeed pretty wacky, but the following 17th c. works are much closer to the modern language. The sentence you give neatly shows the peculiarity of Buzuku's language, but it does not well represent the rest of what I assume you would like to treat as Old Albanian, notably Budi, Bogdani and Bardhi. There are modern-day varieties with a considerably higher discrepancy in phonology, forms, lexicon, and much more from the modern literary language than these authors are.
And in practice, as well, I am of the idea that building a robust infrastructure and editing habits that allow us to document the many dialectal and historical nuances of the language is a much preferable option to dividing information into different places. I have been adding alternative forms and quotes from the early authors under the Albanian header, and nothing in this process has been hindering, IMO, the efficiency of how the information is conveyed. On the contrary, the entries under the main header are enriched, and it is easier to see all in one place the various phonetic correspondences between historical attestations and modern dialectal forms, as well as semantic evolution. Catonif (talk) 21:31, 18 April 2025 (UTC)
Hello, thank you for replying. I'm not too familiar with the works of Budi, Bogdani, and Bardhi, and I think the main idea for me was that Old Albanian should be focused more on the older works, like in Meshari. If you think there are better alternative ways of treating Old Albanian, I understand. My main goal is really trying to add more detail to Albanian etymologies. For now, though not as often, I've been adding the forms seen in Buzuku's work under certain words, either as alternative forms or mentioning them in the etymology. Are there any example entries where I could see how you added quotes? Again, thank you. Shqypëtari (talk) 22:09, 18 April 2025 (UTC)Reply
@Shqypëtari Yes, I can get behind why Buzuku feels somewhat separate, there is a good number of books and papers dealing with its language and grammar alone, however it is not a good idea to have an entire L2 for a single work. These archaic forms can be added under ==Alternative forms== as well as ==Etymology== as you have been doing (even though personally it is my habit to keep the etymology section as clean as possible from the attested forms already mentioned in the altform section above). As for quotes, the parameter |norm= is very useful to bring the orthography to a more readable scheme, as examples from Buzuku: thërrmijë, zdryp, mnerë, asgjë. Catonif (talk) 13:30, 19 April 2025 (UTC)Reply
So generally if we are creating a historical word that did not survive till now, should we put that word under the same header but marked obsolete or maybe better to use head=* ? What would be the best way to handle it? Words such as mamës, nasip are currently unmarked (while marked with * on DPWA) and it would really cause confusion for Albanian learners... Also probably it needs to be in a separate category.

"Old Turkic" and "Bulgar"


See Wiktionary:Beer_parlour/2025/January#"Old"_and_"Orkhon"_Turkic,_plus_some_more for the discussion leading up to this

I request the creation of these following language headers:

AmaçsızBirKişi (talk) 10:06, 25 January 2025 (UTC)Reply

@Benwing2 can u help with this? Zbutie3.14 (talk) 01:50, 15 February 2025 (UTC)Reply
Are these new L2 languages or etymology variants of existing L2 languages? The former require a lot more consensus than the latter. Benwing2 (talk) 02:06, 15 February 2025 (UTC)Reply
I think they are etymology variants. We already have the codes [xbo] and [otk]. [trk-blg-pro] is an etymology code that covers the Bulgaric languages [xbo] and [cv], excluding [zkz]; being able to distinguish these would be nice. [trk-ajm] is also an etymological language, for [az] and [qxq].
AmaçsızBirKişi (talk) 14:15, 15 February 2025 (UTC)Reply
Also there are these 2 siberian turkic languages with no code
https://en.wikipedia.org/wiki/Soyot_language
https://en.wikipedia.org/wiki/Dukhan_language Zbutie3.14 (talk) 21:52, 15 February 2025 (UTC)Reply
What is proto-bulgaric for? Isn't trk-ogr what we use for oghuric? Zbutie3.14 (talk) 15:17, 15 February 2025 (UTC)Reply
It's mainly for distinguishing which stage of Bulgar the loanwords in other languages derive from, at least in theory.
Mongolic loans from the Oghur branch would branch off from Oghuric, but Hungarian/Church Slavonic loans would branch off from Bulgaric, for instance:
  • Oghur [trk-ogr]
    • Mongolic (borrowed from Oghur)
    • (...) (borrowed from Oghur)
    • Proto-Bulgaric [trk-blg-pro]
      • Church Slavonic (borrowed from Bulgar)
      • Hungarian (borrowed from Bulgar)
      • (...)
        • Chuvash [cv]
AmaçsızBirKişi (talk) 16:24, 15 February 2025 (UTC)Reply
OK I need a little more help. Etymology variants are always variants of something else (either another etymology variant or an L2 language) and can have their ancestor set separately (e.g. Old Italian is considered an etymology variant of Italian, but Italian has Old Italian as an ancestor). Currently xbo (Bulgar) is an L2 language with trk-pro (Proto-Turkic, another L2 language) as its ancestor. Would trk-blg-pro (Proto-Bulgaric) be an etym variant of trk-pro and would the ancestor chain go trk-pro -> trk-blg-pro -> xbo? And would xbo-dnb (Danube Bulgar) and xbo-vol (Volga Bulgar) be etym variants of xbo (Bulgar) and also have trk-blg-pro as their ancestor? (In a case like this I suspect we don't have to set the ancestor explicitly; xbo-dnb and xbo-vol would automatically have their ancestor as the same as xbo. @Theknightwho for verification.) And since we already have otk (Old Turkic) as an L2 language with trk-pro as its ancestor, would Orkhon Turkic (otk-ork) be an etym variant of otk and have trk-pro as its ancestor? Finally, where does trk-ajm (Ajem Turkic) fit? Currently, Azerbaijani has Old Anatolian Turkish (trk-oat) as its ancestor; presumably Ajem Turkic would slot in between Azerbaijani and Old Anatolian Turkish in the ancestor chain; would trk-ajm be an etym variant of trk-oat or of az? (i.e. which one is it more similar to?) And should the name be "Ajem Turkic" or "Ajami Turkic"? Benwing2 (talk) 22:09, 15 February 2025 (UTC)Reply
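The relationship described in the comment above (an etym variant hangs off a parent lect and, unless told otherwise, shares that parent's ancestor) can be sketched as follows. This is only an illustration in Python, not Wiktionary's actual module code: the `Lect` class, the `LECTS` table and both function names are hypothetical, and the inheritance rule is the one suspected in the comment, not a confirmed behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lect:
    code: str
    name: str
    parent: Optional[str] = None    # set only for etym variants: the lect this is a variant of
    ancestor: Optional[str] = None  # explicitly assigned ancestor, if any


# Hypothetical data table; codes are taken from the discussion.
LECTS = {
    "trk-pro": Lect("trk-pro", "Proto-Turkic"),
    "xbo": Lect("xbo", "Bulgar", ancestor="trk-pro"),
    "otk": Lect("otk", "Old Turkic", ancestor="trk-pro"),
    # Proposed etym variants, with no explicit ancestor of their own.
    "xbo-dnb": Lect("xbo-dnb", "Danube Bulgar", parent="xbo"),
    "xbo-vol": Lect("xbo-vol", "Volga Bulgar", parent="xbo"),
    "otk-ork": Lect("otk-ork", "Orkhon Turkic", parent="otk"),
}


def effective_ancestor(code: str) -> Optional[str]:
    """Return the explicit ancestor, else fall back to the parent's ancestor."""
    lect = LECTS[code]
    while lect.ancestor is None and lect.parent is not None:
        lect = LECTS[lect.parent]
    return lect.ancestor


def ancestor_chain(code: str) -> list:
    """Follow effective ancestors from the given code up to the root."""
    chain = [code]
    while (anc := effective_ancestor(chain[-1])) is not None:
        chain.append(anc)
    return chain
```

Under these assumptions, inserting Proto-Bulgaric between trk-pro and xbo would only require changing the `ancestor` of `xbo`; the two Bulgar variants would pick up the new chain without further edits.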
@BurakD53 wrote a paragraph about bulgar in the original thread
Looking at the family tree on here, https://en.wiktionary.org/wiki/Category:Proto-Turkic_language, Classical Azeri (az-cls) is a descendant of Azeri (az), so it goes az -> az-cls. It says that Classical Azeri is the form of Azeri used in the 16th - 20th century. Shouldn't it be the other way around, and renamed to Ajem? So it should be: Old Anatolian Turkish -> Ajem -> Azeri. Zbutie3.14 (talk) 23:47, 15 February 2025 (UTC)
What is going on with that page???? Old Turkic is supposed to descend from South Siberian, Salar is supposed to descend from Oghuz; why are the descendants of Common Turkic not listed as its descendants? There is no Sayan or Yenisei under South Siberian. Am I missing something or is the page just garbage? Zbutie3.14 (talk) 00:03, 16 February 2025 (UTC)
It's not garbage; it just needs some ancestors set. But as someone not familiar with the whole Turkic family tree, I need specific settings from you, @AmaçsızBirKişi and @BurakD53 before I make any changes. It sounds like there are some issues still to be worked out. Benwing2 (talk) 00:24, 16 February 2025 (UTC)Reply
I think the structure I've been working on this past week in https://en.wiktionary.org/wiki/User:Zbutie3.14/trtable is the most accurate we have right now, @AmaçsızBirKişi and @BurakD53 please look at it and tell me if anything needs to be changed Zbutie3.14 (talk) 00:44, 16 February 2025 (UTC)Reply
All right but I still need you and @AmaçsızBirKişi to review my suggestions above for how to put this info into language codes. Benwing2 (talk) 00:51, 16 February 2025 (UTC)Reply
Frankly, I'm not sure if there is a difference between Proto-Bulgar and Proto-Oghur. When we look at it, we would reconstruct Proto-Turkic *öküz as Proto-Bulgaric *ökür because in Hungarian, which contains loanwords from Bulgaric, it appears as ökör. Similarly, we would reconstruct the Turkish word yemiş as Proto-Bulgaric *yémilč based on Hungarian and Chuvash, and this would be the same in Proto-Oghur. In this case, I’m not sure if we can speak of two separate languages. If there is a difference between Proto-Bulgar and Proto-Oghur, you (@AmaçsızBirKişi) should explain what that difference is. As things stand, unfortunately, there doesn’t seem to be any differences.
The Soyot language is considered a dialect of Tofa, and the Dukhan language is considered a dialect of Tuvan. If you believe these languages are distinct enough that they shouldn't be classified as dialects, then you should identify and describe the specific points of divergence. Then we can decide whether they should be considered dialects or not.
Regarding Bulgaric, we know that Danube Bulgar and Volga Bulgar are clearly distinct from each other. I have mentioned this before. Their writing systems, languages, and the cultures they were influenced by are all different. A people who adopted Islam, the Arabic script, and fell under Mongol rule cannot be equated with a people who adopted Christianity, Greek and Cyrillic scripts, and eventually became Slavicized. That’s why, as I said before, the distinction between xbo-vol and xbo-dnb can be made. In fact, we could also add xbo-kbn (Kuban Bulgar), which we will discuss in relation to Hungarian loanwords. However, we don't have any inscriptions in this language.
On the site, Old Turkic includes both Orkhon Turkic and Yenisei Kyrgyz. There is already a separate code for Old Kyrgyz, but one could also be added for Orkhon Turkic. When placing it in the Descendants list, we should not forget that Yenisei Kyrgyz is a continuation of Orkhon Turkic. Even today, the Khakas people, who still bear the name "Kyrgyz" in the region, should be their descendants. After all, when we look at them, the Khakas, just like the Old Kyrgyz, are not Buddhists.
I'm not sure if Ajem Turkic and Classical Azerbaijani Turkic were actually a distinct language. Previously, I argued that they should be considered separate from Old Anatolian Turkish, but when I examined works supposedly written in the Azerbaijani region, I found nothing other than Old Anatolian Turkish. Whichever text I looked at, the language was OAT. This makes sense because there were two literary languages: one was Chagatai Turkic, also known as Eastern Turkic, and the other was Old Anatolian Turkic, also known as Western Turkic, which later evolved into Ottoman Turkish. Writers produced works in these two languages.
In short, I now believe that Azerbaijani should be classified under Old Anatolian Turkish. As for the term Ajem Turkic, it can be used not to indicate a distinct language but rather to refer to both Azerbaijani Turkic and Qashqai, since Ajami means Iranian in our language. Given that it refers to Turkic spoken in the Iranian region, this naming can be justified.
Regarding Classical Azerbaijani, there is no clear-cut distinction between it, Old Anatolian Turkish, and Ottoman Turkish. However, if a logical framework can be established and its distinction from other languages is clearly defined, perhaps a code could be assigned. Personally, I don't see this distinction clearly. Even in Fuzuli’s works, both ben and men appear within the same couplet. Maybe one distinguishing feature could be the use of -em instead of -üm as a suffix. Idk. BurakD53 (talk) 08:54, 16 February 2025 (UTC)Reply
@BurakD53 Thank you very much for your detailed comments. Keep in mind that etym variant codes can be assigned for lects that are not distinct enough to warrant treatment as a separate L2 language but are distinct enough that a separate lect code makes sense. As for the five proposed codes above, I think you're saying that Proto-Bulgar isn't needed; xbo-vol and xbo-dnb can be etym variants of xbo; Orkhon Turkic can be an etym variant of otk; and Ajem/Ajami Turkic is not a separate language, hence a code isn't needed, or at most it needs to be an etym variant of Ottoman Turkish. Is that right? Benwing2 (talk) 09:03, 16 February 2025 (UTC)
Yes, you're right. I don't think Ajem Turkic is necessary. The Oghuz classification in User:Zbutie3.14/trtable is completely suitable for me. I sincerely thank @AmaçsızBirKişi and @Zbutie3.14 for their efforts, and you as well for your evaluations. BurakD53 (talk) 09:35, 16 February 2025 (UTC)Reply
OK, just to clarify that I have this right:
  1. Proto-Bulgar (or should it be Proto-Bulgaric?): Same as Proto-Oghur, but we have no language for this. Should we create Proto-Oghur and assign it trk-ogr-pro? If so, should this be an L2 language or an etym variant of trk-pro?
  2. Danube Bulgar [xbo-dnb]: Make an etym variant of xbo (Bulgar).
  3. Volga Bulgar [xbo-vol]: Make an etym variant of xbo (Bulgar).
  4. Orkhon Turkic [otk-ork]: Make an etym variant of otk (Old Turkic)? Confusingly, we have Old Turkic and Old Uyghur as separate L2 languages, but Wikipedia says that Old Uyghur is a later dialect of Old Turkic. In that case, what is the difference between Wiktionary's Old Turkic and Old Uyghur lemmas? Should Old Uyghur have Old Turkic as an ancestor? Should Old Uyghur be merged into Old Turkic?
  5. Ajem/Ajami Turkic: Make it an alias of Classical Azerbaijani. Make Classical Azerbaijani the ancestor of modern Azerbaijani and Qashqai.
Benwing2 (talk) 09:55, 16 February 2025 (UTC)Reply
1. Proto-Oghur would be a more inclusive term, allowing us to include the Khazars as well. Additionally, there is a separate Oghur dialect known as the s-dialect, traces of which can be found in Hungarian and some Uralic languages. These loanwords contain sz instead of gy in word-initial position, as they were borrowed from a different dialect.
2.  Support
3.  Support
4. In the linguistic literature, Old Turkic includes Orkhon Turkic, Old Kyrgyz and Old Uyghur. However, Orkhon Turkic and Old Kyrgyz use the same script (although Old Kyrgyz has some special letters) and share the same religion, but belong to different regions. Old Uyghur, at least on the site, refers to texts written in the Old Uyghur script and associated with Manichaean or Buddhist traditions. The Old Uyghurs also produced works in the Orkhon script, but we classify those inscriptions under Orkhon Turkic on the site. For example, Irk Bitig, despite being a Manichaean divination book, is categorized as Orkhon Turkic. As far as I know, its language does not differ significantly from Orkhon Turkic. I would classify like this:
  • Old Turkic:
    • Orkhon Turkic: (written in the Old Turkic script around the Orkhon Basin between the 7th and 9th centuries)
      • Yenisei Kyrgyz: (written in the Old Turkic script with Yenisei variants around the Yenisei Basin between the 8th and 13th centuries)
      • Old Uyghur: (written in the Old Uyghur script, which derives from the Sogdian script, around the Mongolia, Hami, Turpan and Gansu regions between the 9th and 14th centuries)
        • Western Yugur:
Or this:
  • Old Turkic:
    • Orkhon Turkic:
      • Yenisei Kyrgyz:
    • Old Uyghur:
      • Western Yugur:
5.  Support BurakD53 (talk) 10:40, 16 February 2025 (UTC)Reply
Old Turkic entries are the entries of both Orkhon Turkic and Yenisei Kyrgyz. But Old Uyghur has different entries. Old Turkic is used as an umbrella term here, but Old Uyghur entries are treated separately. BurakD53 (talk) 10:45, 16 February 2025 (UTC)
The reason I wanted separate headers for Proto-Bulgar and Proto-Oghur is that they are definitely not the same language. pOghur is thought to have been spoken before the 1st century AD, while Proto-Bulgaric is much more recent (6th-13th centuries).
Proto-Bulgar is also known as West Old Turkic, which was concurrent with the East Old Turkic (i.e. Orkhon, Yenisei, Uyghur, Karakhanid)
For an example, I'd point to *bugday. The ideal way for the Oghuric descendants to be written would be like this:
  • pTurkic: *bugday
    • Early pOghur: *bugday ~ *buday
      • (bor) pMongolic: *buguday
      • (bor) pMongolic: *budagan
        • Late pOghur: *buɣδay
          • Early pBulgaric: *buɣzai̯
            • Late pBulgaric: *būza
              • (bor) Old Hungarian: buʒa
                • Hungarian: búza
              • Old Chuvash (MČ1): *pŭraĭ
                • Middle Chuvash (MČ2): *pŭri
                  • Chuvash: pări
Whether or not we need as much detail as this is up for debate, but having two different language codes for Proto-Bulgar and Oghur seems like a no-brainer to me.
(By the way, I have used 'Old Chuvash' in that entry for Proto-Bulgar, and that page also has some problems, but the desclist I've written above must be correct, here are the sources: [1] and [2])
  1. ^ Agyágasi, Klára (2019), Chuvash Historical Phonetics (Turcologica; 117), Wiesbaden: Harrassowitz, page 240
  2. ^ Róna-Tas, András; Berta, Árpád; Károly, László (2011), West Old Turkic: Turkic Loanwords in Hungarian (Turcologica; 84), volume 1, Wiesbaden: Harrassowitz Verlag, pages 186-188
AmaçsızBirKişi (talk) 11:28, 16 February 2025 (UTC)
    Is the dh > z change mentioned by Kashgari considered Proto-Bulgaric here? Do the Hungarian loanwords follow the dh > z pattern, or is this specific to just this word? BurakD53 (talk) 11:47, 16 February 2025 (UTC)Reply
    That δ > z is just a step in the larger Bulgaric sound shift of *-d- > -r-. In the book by Róna-Tas, it's dubbed the "second rhotacism" and the following chain of sound changes are given: pTurkic cluster *-Vgd- leniates to *-Vgδ- > *-Vɣz- > *-V̄z- > *-Vr- and finally to -V̆r-.
    I guess it is independent of the *-d- > *-y- sound shift present in other Turkic languages, but they may have affected each other.
    The -ɣ- deletion and the lengthening of the previous vowel seem to be a common theme before -d- in Bulgar. I don't know enough to call it regular, but see these for example:
    • pTurkic: *edgü ("good")
      • pOghur: *ed(ɣ?)i ~ -ü
        • pBulgaric: *edV
          • (bor) Old Hungarian: idʲ ("holy") [i > e change is regular]
            • Hungarian: egyház ("church")
    • pTurkic: *yogur- ("to knead")
      • pOghur: *ǯuɣur-
        • pBulgaric: **Cūr- (?)
          • (bor) Old Hungarian: dʲǖr-öd
            • Hungarian: gyúr ("to knead, pug")
      • pOghur: **ǯiɣur- (?)
        • pMongolic: *ǯigura-
        • pBulgaric: **Cǖr- (?)
            • (?bor) Hungarian: gyűr ("to crumple")
    Source: Same book and volume by Róna-Tas and Berta, pages 307-310, 411
    AmaçsızBirKişi (talk) 12:18, 16 February 2025 (UTC)Reply
    Forgot to add that these examples should also dispel any doubt as to whether or not to have a distinct Bulgar language code, apart from Oghur. Using Old Chuvash [cv-old] (c. 13th-15th century, following Volga Bulgar) for this would not be accurate at all. AmaçsızBirKişi (talk) 12:20, 16 February 2025 (UTC)
    Since we have adhine > ايرنى "erne" in Volga Bulgar, we can say that there is no trace of this z-shift in VB. Unfortunately, there are no recorded Volga Bulgar words that could serve as examples of this change. We can only confirm that the r-form exists for this specific word. However, if it is claimed that there was an intermediate stage *azne, considered Proto-Bulgaric, then this intermediate phase must have been significant, so we should have a language code. If we accept this, wouldn't Kashgari’s 11th-century record of azak (instead of ayak) for the Bulgars, Yemeks, Suvars, and some Kipchaks be classified as Proto-Bulgaric? But why wouldn’t Kuban Bulgaric *z < Proto-Bulgaric *dh > Volga Bulgaric r be considered a valid transition? Are we certain that Volga Bulgar evolved from an earlier *z? BurakD53 (talk) 12:43, 16 February 2025 (UTC)Reply
    I mean why not this:
    • pTurkic: *bugday
      • Early pOghur: *bugday
        • Late pOghur: *buɣδai̯
          • Kuban Bulgaric: *buɣzai̯
            • Late Kuban Bulgaric: *būza
              • (bor) Old Hungarian: buʒa
                • Hungarian: búza
          • Old Chuvash (MČ1): *pŭraĭ
            • Middle Chuvash (MČ2): *pŭri
              • Chuvash: pări
    BurakD53 (talk) 12:53, 16 February 2025 (UTC)Reply
    Hungarian and Slavic loanwords from Bulgar have a quite noticeable cut-off date, around the late 10th and early 11th centuries. Volga Bulgar, however, is attested two centuries later. Also, considering that the *-z- we are talking about would probably be a volatile and unstable sound, I don't see a problem with 10th-11th century Bulgar *-z- shifting to 13th-14th century Bulgar *-r-. Agyágasi also gives this chain of descendants for irne in Chuvash, for your information:
    • New Persian āδīna
      • (bor) Late Proto-Bulgar: **azinʲa ~ **arʲinʲa
        • Volga Bulgar: ايرنى (érne) [loss of palatalization, perhaps the intervocalic -z- is actually a palatal r, but who knows?]
          • Middle Chuvash (MČ1): *erne
            • Chuvash: irne
    There are good reasons for the palatalization of Proto-Bulgar -r-, and this chain of sound shifts is consistent with what I've given above (*-Vd- > *-Vδ- > *-V/V̄z- > (some intermediary shift) > *-Vr- and finally to -V̆r-).
    Source for the New Persian to Chuvash sound shifts: Agyágasi's book I've ref'd above, page 191.
    Maybe it's actually the Kuban Bulgar which is responsible for that shift, but I'd like to see some sources on Kuban Bulgar, if we even have any substantial material on that.
    AmaçsızBirKişi (talk) 13:11, 16 February 2025 (UTC)Reply

    ────────────────────────────────────────────────────────────────────────────────────────────────────

    The Kuban Bulgars seem to be the ancestors of the Volga Bulgars because, according to Tekin, they contributed words to Hungarian before the 8th century. We know that the Volga Bulgars migrated to the Volga Bulgar region from the Khazar state around the 9th century. Either the Kuban Bulgars were their ancestors or their cousins. As for Danube Bulgar, considering that the First Bulgarian Empire was founded in the 7th century, we can assume a similar background for them as well. To get the necessary answers, it would be useful to examine the Bulgar loanwords in Old Church Slavonic, which evolved into modern Bulgarian.
    However, I want to highlight an important point: foreign languages adapt and adopt sounds that are not present in their own languages. How can we be sure that the Hungarians didn't adapt the δ sound as z in their language? One of the strongest arguments supporting this theory appears to be Kaşgarî’s record. However, since Kaşgarî never actually visited the Bulgar and Suvar lands, this record is generally considered inaccurate.
    If all of this points to a proto-language with *z, I conclude that the Proto-Bulgar language should also have a code. Moreover, Proto-Bulgar already seems to refer to Kuban Bulgar. Danube Bulgar and Volga Bulgar must have evolved from it. @Benwing2
    • Proto-Oghur: *(r,l,lç,dh)
      • Proto-Bulgar: *(r,l,lç,z)
        • (bor) Old Hungarian:
          • Hungarian:
        • Volga Bulgar: (r,l,(l)ç,r)
          • (...)
            • Chuvash:
        • Danube Bulgar: *(r,l,?,?)
          • (bor) Old Church Slavonic
            • Bulgarian:
    BurakD53 (talk) 13:46, 16 February 2025 (UTC)Reply
    @AmaçsızBirKişi @BurakD53 OK, there is no code for either Proto-Oghur or Proto-Bulgar(ic). And I'm still not sure what the ask is in terms of L2 languages. Do you want two new L2 langs, one new L2 lang or no L2 langs? Keep in mind that just because there is borrowing at different stages doesn't mean we need different L2 langs in all cases; etym variants may be enough. For example, we currently have no L2 codes for Proto-anything in the Romance family (although there is a pending proposal for Proto-Romanian or similar), and in the Slavic family we have only one L2 code for Proto-Slavic. In Germanic we have two L2 codes, for Proto-Germanic and Proto-West Germanic (although Proto-West Germanic is still somewhat controversial as a concept; it was mainly Victar pushing for PWG as a separate L2 language). Benwing2 (talk) 20:16, 16 February 2025 (UTC)Reply
    I'm a bit confused. If oghur and proto-oghur are 2 different codes then shouldn't common-turkic and proto-common-turkic also be 2 different codes? We have a common-turkic code but no proto-common-turkic code. Common-turkic and oghur are both unattested so shouldn't we have only a proto-oghur and proto-common-turkic, no oghur and common-turkic code? Zbutie3.14 (talk) 21:15, 16 February 2025 (UTC)Reply
    This is correct; the same situation exists in the Oghuz languages as well. Yes, Proto-Oghuz is a necessity, but if we already have an Oghuz code and can add reconstruction to it in the descendants list, why would we need a separate Proto-Oghuz language code? I think we should add the Proto-Bulgar code, and if necessary, we can add the reconstruction next to the Oghur heading. BurakD53 (talk) 09:14, 17 February 2025 (UTC)Reply
    I figured out what the problem is. Right now common-turkic is a language. It should be a family, not a language. proto-common-turkic is the name of the language. This is why I was confused. Same should be done with oghur, oghur is a family and proto-oghur is a language. @Benwing2 first before adding new languages we should fix the stuff that's broken right now, so common-turkic should be made a family, proto-common-turkic should be a language, proto-oghur should be a language, the oghuz/kipchak/karluk/siberian families should be part of the common-turkic family, old turkic should be part of the south siberian family, and salar should be part of oghuz. Zbutie3.14 (talk) 14:21, 18 February 2025 (UTC)Reply
    @Zbutie3.14 I went ahead and renamed "Common Turkic" to "Proto-Common Turkic" and changed its code from trk-cmn to trk-cmn-pro, so that trk-cmn can be used as the code for the Common Turkic family (currently it's still an alias for trk-cmn-pro). I realize now I should have pinged @AmaçsızBirKişi and @BurakD53 for confirmation but it seems like an obvious thing to do. Benwing2 (talk) 23:55, 18 February 2025 (UTC)Reply
    I tried to convert all of the existing uses of trk-cmn based on the dump file and/or tracking in Special:WhatLinksHere/Wiktionary:Tracking/languages/trk-cmn. I am going to wait a day or two to see if any more uses pop up, and then create a Common Turkic family using the trk-cmn code. I'll deal with the other stuff at that point. Benwing2 (talk) 01:53, 19 February 2025 (UTC)Reply
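How the conversion described above was actually carried out is not stated in the thread. As a rough illustration only, a code rename of this kind could be sketched in Python as a whole-argument replacement inside template calls. The function name `convert_code` and the sample template calls in the tests are hypothetical, and a real run would also need to handle named parameters, modules and data tables, which this sketch ignores.

```python
import re

OLD_CODE = "trk-cmn"
NEW_CODE = "trk-cmn-pro"

# Match OLD_CODE only when it is an entire positional template argument:
# preceded by "|" and followed by either "|" or the closing "}}".
# This leaves an already-converted "trk-cmn-pro" untouched.
_PATTERN = re.compile(r"(?<=\|)" + re.escape(OLD_CODE) + r"(?=\||\}\})")


def convert_code(wikitext: str) -> str:
    """Replace the old language code with the new one in template arguments."""
    return _PATTERN.sub(NEW_CODE, wikitext)
```

The lookahead is what keeps the replacement from being applied twice, so the function can be rerun safely on pages that have already been converted.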
    thanks ur the best! <3 Zbutie3.14 (talk) 02:07, 19 February 2025 (UTC)Reply
    I think etym variants will suffice, in a similar vein to the cv-old and cv-mid we already have, @Benwing2.
    AmaçsızBirKişi (talk) 12:02, 17 February 2025 (UTC)Reply
    @AmaçsızBirKişi OK, can you specify exactly which codes you want and what should be their parent language? Benwing2 (talk) 21:21, 17 February 2025 (UTC)Reply
    I think we all agreed about xbo-vol, xbo-dnb, and otk-ork at least.
    • Oghur: (trk-ogr)¹
      • Bulgar: (xbo)²
        • Volga Bulgar: (xbo-vol, etym variant of xbo)
          • (...)
            • Chuvash: (cv)
        • Danube Bulgar: (xbo-dnb, etym variant of xbo)
    ----
    • Common Turkic:
      • Old Turkic: (otk)
        • Orkhon Turkic: (otk-ork, etym variant of otk)
        • Old Kyrgyz/Yenisei Kyrgyz: (otk-kir, etym variant of otk)
    To see if they will support it or have any suggestions to solve the problem @Bartanaqa @Yorınçga573 @Ardahan Karabağ @Blueskies006 @Vahagn Petrosyan @Samubert96 @Əkrəm Cəfər

    ¹@AmaçsızBirKişi thinks Proto-Oghur should come after this node. I support that.
    ²@AmaçsızBirKişi thinks this should be Proto-Bulgar, and I support that instead of the reconstruction, because I still think Proto-Bulgar is a -ð- language, not a -z- one.

    Guys If you are here, plz see also Wiktionary:Language treatment requests#Proto-Oghuz and Proto-Arghu to be able to enter recorded lemmas, if you support or not. Thanks.
    BurakD53 (talk) 07:19, 19 February 2025 (UTC)Reply
    @BurakD53 Can you redo your table, making the following distinctions:
    1. clearly distinguish full languages, etym languages and families;
    2. include all the intermediate nodes;
    3. boldface the stuff that needs adding;
    4. indicate, when language B is indented under language A, whether A is ancestral to B.
    In this case, I take it:
    1. Oghur (trk-ogr) is a family which already exists, but Proto-Oghur does not exist and needs to be added. Proto-Oghur (trk-ogr-pro) would be an etym variant of Proto-Turkic (trk-pro), just like Proto-Oghuz is.
    2. Bulgar is a full language which already exists, and has Proto-Oghur as its ancestor.
    3. Volga Bulgar and Danube Bulgar are etym variants of Bulgar, but there is not an ancestral relationship. NOTE: I am going to use xbo-dan instead of xbo-dnb, for consistency.
    4. Old Chuvash has Volga Bulgar as its ancestor; Middle Chuvash has Old Chuvash as its ancestor; Chuvash has Middle Chuvash as its ancestor. Anatri and Viryal are Chuvash etym variants but there is not an ancestral relationship.
    5. Common Turkic is a family that will be created. Proto-Common Turkic already exists and is an etym variant of Proto-Turkic.
    6. The Oghuz, Kipchak, Karluk and Siberian Turkic families will be placed under the Common Turkic family.
    7. Old Turkic will be placed under the South Siberian Turkic family, which is under Siberian Turkic.
    8. Orkhon Turkic will be created as an etym variant of Old Turkic, as Old Kirghiz already is.
    9. Are there any ancestor/descendant relationships among Old Turkic, Orkhon Turkic, Old Kirghiz and Old Uyghur?
    10. Salar will be placed under the Oghuz family per @Zbutie3.14.
    11. Pecheneg (an L2 language), Salchuq (an L2 language), Khazar (an L2 language) and Arghu (an etym variant of Proto-Turkic, with L2 language Khalaj as its descendant) are currently hanging directly off of Proto-Turkic. Should they be moved elsewhere?
    Benwing2 (talk) 07:49, 19 February 2025 (UTC)Reply
    Going off of Burak's comment, here is the full descendants list (based on ancestry):
    • Proto-Turkic: [trk-pro]
      • Oghur(ic): [trk-ogr] (FAMILY)
        • Proto-Oghur: [trk-ogr-pro] (ETYM) (#1)
          • Proto-Bulgar: [trk-blg-pro] (ETYM) (#2)
            • Volga Bulgar: [xbo-vol] (ETYM) ([xbo] should also work here #3)
              • Old Chuvash: [cv-old] (ETYM)
                • (...)
                  • Chuvash: [cv]
            • Danube Bulgar: [xbo-dan] (ETYM) ([xbo] should also work here #3)
              • (...) (borrowings)
          • Khazar: [zkz]
            • (...)
      • Common Turkic: [trk-cmn] (FAMILY)
        • Siberian Turkic: [trk-sib] (FAMILY)
          • Old Turkic: [otk]
            • Orkhon Turkic: [otk-ork] (ETYM) (#4)
            • Yenisei Turkic: [otk-kir] (ETYM) (#4)
            • Old Uyghur [oui]
              • (...)
          • (...)
        • Arghu: [trk-arg] (FAMILY)
        • Oghuz: [trk-ogz] (FAMILY)
        • Kipchak: [trk-kip] (FAMILY)
        • Karluk: [trk-kar] (FAMILY)
    ---
    /// Footnotes: ///
    '#1: Proto-Oghur, like you said, can be an etym variant of Proto-Turkic. It will have Proto-Bulgar and Khazar as its descendants. We might need to add Tuoba, Apar and so on if we reach a consensus or if the need arises. But those are very tentative, so I digress.
    '#2: Proto-Bulgar is the theoretical reconstruction of the Bulgaric languages, Danube and Volga (and also Kuban, but that's unattested) Bulgar. Its ancestor is Proto-Oghur and its descendants are Volga and Danube Bulgar variants, alongside the unsplintered Bulgar [xbo].
    '#3: The new Volga variant of Bulgar will have Old Chuvash (and the contemporary Chuvash) as its descendants. Danube Bulgar does not need a descendant, since it is a dead branch. It's there mainly because of loanwords into Hungarian, Church Slavonic and Romanian.
    '#4: Both Orkhon [otk-ork] and Yenisei Turkic [otk-kir] should have Old Turkic [otk] as their ancestor. We also might need to add Old Uyghur [oui] as a descendant of [otk]. There is a recurring issue of previous edits confusing Orkhon Turkic and Old Uyghur: people immediately assume a text to be Orkhon if it has runes, which is simply not the case. For example, almost half of the lemmas in the Orkhon Turkic mainspace cite Ïrḳ Bitig, a work in Old Uyghur. Separating these would be more accurate.
    ---
    /// Some more: ///
    1. Arghu is a descendant from the Common Turkic branch, as far as I am aware. The confusion stems from the fact that it is the earliest branch to diverge from other Turkics, but it is firmly in the Common Turkic family.
    2. I don't think it would be appropriate if we placed Old Turkic under South Siberian, that would be anachronistic. Yakuts and Dolgans have not migrated northwards at the time when Old Turkic was spoken.
    3. Orkhon - Yenisei - Uyghur have no ancestral relation to one another. They all stem from Old Turkic, that's all.
    4. Khazar is an Oghuric language. I've already talked about Arghu, and I do not know much about Salchuq or Pecheneg. We don't have any entries in either, so I don't think chopping them off from the family table (for now) is that much of an issue.
    Please let me know if I got something wrong!
    AmaçsızBirKişi (talk) 11:58, 19 February 2025 (UTC)Reply
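One point raised earlier in the thread is that indentation in a list like the one above mixes family membership with ancestry. A minimal Python sketch of how the two could be kept apart is given below. The `Node` class, the `TREE` literal and the function names are hypothetical, the elided "(...)" nodes are simply left out, and the kinds (L2, ETYM, FAMILY) follow the labels in the list, with unlabelled nodes assumed to be full L2 languages.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    code: str
    kind: str = "L2"  # "L2", "ETYM" or "FAMILY"
    children: list = field(default_factory=list)


# Partial transcription of the proposed list; "(...)" nodes are omitted.
TREE = Node("Proto-Turkic", "trk-pro", "L2", [
    Node("Oghur(ic)", "trk-ogr", "FAMILY", [
        Node("Proto-Oghur", "trk-ogr-pro", "ETYM", [
            Node("Proto-Bulgar", "trk-blg-pro", "ETYM", [
                Node("Volga Bulgar", "xbo-vol", "ETYM", [
                    Node("Old Chuvash", "cv-old", "ETYM"),
                ]),
                Node("Danube Bulgar", "xbo-dan", "ETYM"),
            ]),
            Node("Khazar", "zkz"),
        ]),
    ]),
    Node("Common Turkic", "trk-cmn", "FAMILY", [
        Node("Siberian Turkic", "trk-sib", "FAMILY", [
            Node("Old Turkic", "otk", "L2", [
                Node("Orkhon Turkic", "otk-ork", "ETYM"),
                Node("Yenisei Turkic", "otk-kir", "ETYM"),
                Node("Old Uyghur", "oui"),
            ]),
        ]),
        Node("Arghu", "trk-arg", "FAMILY"),
        Node("Oghuz", "trk-ogz", "FAMILY"),
        Node("Kipchak", "trk-kip", "FAMILY"),
        Node("Karluk", "trk-kar", "FAMILY"),
    ]),
])


def path_to(node, code, trail=()):
    """Return the list of nodes from the root down to the node with this code."""
    trail = trail + (node,)
    if node.code == code:
        return list(trail)
    for child in node.children:
        found = path_to(child, code, trail)
        if found:
            return found
    return []


def ancestors_of(code):
    """Codes above the given node, skipping families (which are not ancestors)."""
    return [n.code for n in path_to(TREE, code)[:-1] if n.kind != "FAMILY"]


def families_of(code):
    """Codes of the families the given node sits inside."""
    return [n.code for n in path_to(TREE, code)[:-1] if n.kind == "FAMILY"]
```

Note that Proto-Common Turkic does not appear as a node in the list, so in this sketch the ancestors of Old Turkic jump straight to Proto-Turkic.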
    About #4: After the collapse of the Göktürk State, the language used in the Old Uyghur runic inscriptions was no different from Orkhon Turkic. It was a continuation of the same written tradition in the same region, around the Orkhon basin. Therefore, I believe that texts written in the Orkhon script, such as Irk Bitig, should not be included under Old Uyghur entries. In academia, Old Uyghur Turkic is often used to refer to texts written in the Old Uyghur script, while Irk Bitig is frequently classified as Old Turkic. In his book Irk Bitig: Book for Omens, Talat Tekin did not use the term "Uyghur" even once for Irk Bitig. Instead, he simply referred to it as "Old Turkic" and described it as a Manichaean ny dialect. As we know, Orkhon Turkic is also a ny dialect. I think that's why Yorınçga includes it under Old Turkic instead of Old Uyghur. He can explain better. As stated in the source linked, Old Turkic texts written in the Orkhon script are referred to as the Manichaean dialect. See. All the Old Turkic texts written in the Orkhon script are referred to as the Manichaean ny dialect. BurakD53 (talk) 15:53, 19 February 2025 (UTC)
    Very well. I'll remove the quotations from Ïrḳ Bitig I added for Old Uyghur. Thanks for correcting me!
    The Dergipark article you linked is dead, by the way.
    AmaçsızBirKişi (talk) 16:18, 19 February 2025 (UTC)Reply
    [9] here. After reading a bit about it though, I'm not sure. Perhaps it would be more accurate to add it as Old Uyghur. Although it is written in the ny dialect, there are other differences, for example the use of the -gAy suffix for the future tense and of the ablative suffix -dIn. These are different from Orkhon Turkic. I take back what I said. BurakD53 (talk) 19:28, 19 February 2025 (UTC)
    I mean, sure why not? It was written in either year 930 or 942, way outside the range of other Turkic inscriptions (8th century).
    We can remove the IB from the quotations part and the entries that rely only on IB when we deprecate the [otk] in favor of [otk-ork] and [otk-kir]. For example, yél ("mane") is only attested in IB and nowhere else in the Orkhon script. Entries like that will need removal.
    AmaçsızBirKişi (talk) 19:58, 19 February 2025 (UTC)Reply
    Probably not all the Runic inscriptions after the collapse of the Gokturk state, but Irk Bitig should be considered as Old Uyghur. BurakD53 (talk) 19:30, 19 February 2025 (UTC)Reply
    I support the table.  Support. BurakD53 (talk) 16:05, 19 February 2025 (UTC)Reply
    @AmaçsızBirKişi @BurakD53 @Zbutie3.14 OK I tried to implement everything in the above table. Please review the results. Arghu is not currently a family but an etym variant of Khalaj, so I just set its ancestor to Proto-Common Turkic. Also I gave Proto-Bulgar the code trk-bul-pro instead of trk-blg-pro, for consistency. Possibly it should be xbo-pro, but I don't know if it's kosher to have a protolanguage that is "Proto-" of a language rather than a family. Benwing2 (talk) 06:21, 20 February 2025 (UTC)
    Salar should be under Oghuz branch, just like Turkmen. Other than that, it's perfect. Thanks for resolving this issue.
    AmaçsızBirKişi (talk) 10:55, 20 February 2025 (UTC)Reply
    Is Khazar Oghur? According to Wikipedia, it's disputed: https://en.wikipedia.org/wiki/Khazar_language Zbutie3.14 (talk) 13:36, 20 February 2025 (UTC)
    We are making quite a few requests, but may I ask for one more thing? Could we create three variants for qwm, just like we did for otk?
    • Proto-Turkic: [trk-pro]
      • Proto-Common-Turkic: [trk-cmn-pro]
        • Kipchak: [trk-kip] (FAMILY)
          • Cuman-Kipchak: [trk-kcu]
            • Kipchak: [qwm]
              • Cuman: [qwm-cum] (etym variant of qwm) (here what I ask for)
                • Crimean Tatar: [crh]
                  • Urum: [uum]
                • Karachay-Balkar: [krc]
                • Karaim: [kdr]
                • Krymchak: [jct]
                • Kumyk: [kum]
                • Armeno-Kipchak: [qwm-arm] (etym variant of qwm)
              • Mamluk-Kipchak: [qwm-mam] (etym variant of qwm)
    BurakD53 (talk) 07:30, 20 February 2025 (UTC)Reply
    what do you think? @AmaçsızBirKişi BurakD53 (talk) 07:33, 20 February 2025 (UTC)Reply
    Armeno-Kipchak must be a descendant of Cuman too, since Cuman was written in Crimea in the 14th century and Armeno-Kipchak was written in Crimea in the 17th century. Mamluk-Kipchak, written in Egypt in the 13th-16th centuries, can't be a descendant of Cuman. I will just edit the table to avoid further confusion. BurakD53 (talk) 07:44, 20 February 2025 (UTC)
    I added Cuman as an etym variant of qwm (Kipchak) and put Armeno-Kipchak under it, but I'm not sure about putting Crimean Tatar, Karachay-Balkar, etc. under Cuman. Currently the Kipchak-Cuman family (what you call Cuman-Kipchak) is under (a descendant of) the Kipchak language, whereas your tree above has them reversed. Can you edit your tree and label everything that's a family with the label "FAMILY" so we are completely clear what's going on? Also, Wikipedia asserts that "Cuman" and "Kipchak" are the same thing; see w:Cuman language. Benwing2 (talk) 23:15, 20 February 2025 (UTC)Reply
    Also ping @AmaçsızBirKişi @Zbutie3.14. Benwing2 (talk) 23:16, 20 February 2025 (UTC)Reply
    • Proto-Turkic: [trk-pro]
      • Proto-Common Turkic: [trk-cmn-pro]
        • Kipchak: [trk-kip] (FAMILY)
          • Cuman-Kipchak: [trk-kcu] (FAMILY)
            • Kipchak: [qwm]
              • Cuman: [qwm-cum] (etym variant of qwm, location Crimea)
              • Armeno-Kipchak: [qwm-arm] (etym variant of qwm, location Crimea)
              • Mamluk-Kipchak: [qwm-mam] (etym variant of qwm, location Egypt)
            • Crimean Tatar: [crh] (location Crimea)
              • Urum: [uum] (location Southeast Ukraine)
            • Krymchak: [jct] (location Crimea)
            • Karachay-Balkar: [krc] (location Caucasus)
            • Karaim: [kdr] (location Crimea, Poland)
            • Kumyk: [kum] (location Caucasus)
    BurakD53 (talk) 08:21, 21 February 2025 (UTC)
      @BurakD53 This appears to not properly indicate the ancestor/descendant relationships. Presumably Armeno-Kipchak is a descendant of Cuman? What about Crimean Tatar, Krymchak and/or Karaim? Can you explicitly indicate the ancestor of each lect where it differs from the containment relationships shown in the above table? Benwing2 (talk) 08:56, 21 February 2025 (UTC)Reply
      I don't have enough knowledge about these languages, so any comment I make could be incorrect. Yes, one is probably the ancestor or descendant of the other, but I'm saying this just based on location and the period. BurakD53 (talk) 09:03, 21 February 2025 (UTC)Reply
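The distinction being asked for here (which family a lect is listed in versus which lect it descends from) can be made explicit by storing the two relationships separately. What follows is only a minimal sketch in Python, not how Wiktionary's language data is actually stored: the `Lect` record, its field names and `ancestor_chain` are hypothetical, and the ancestor values simply encode the proposals made in this thread (Armeno-Kipchak under Cuman, Urum under Crimean Tatar), which are not settled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Lect:
    """Hypothetical record keeping family membership apart from ancestry."""
    code: str
    name: str
    family: str                        # code of the containing family
    ancestor: Optional[str] = None     # explicit ancestor lect, if one is asserted
    variant_of: Optional[str] = None   # full language this is an etym variant of


# Illustrative data taken from the tree in this thread; unsettled ancestors stay None.
LECTS: Dict[str, Lect] = {
    lect.code: lect
    for lect in [
        Lect("qwm", "Kipchak", family="trk-kcu"),
        Lect("qwm-cum", "Cuman", family="trk-kcu", variant_of="qwm"),
        Lect("qwm-arm", "Armeno-Kipchak", family="trk-kcu",
             ancestor="qwm-cum", variant_of="qwm"),
        Lect("qwm-mam", "Mamluk-Kipchak", family="trk-kcu", variant_of="qwm"),
        Lect("crh", "Crimean Tatar", family="trk-kcu"),
        Lect("uum", "Urum", family="trk-kcu", ancestor="crh"),
    ]
}


def ancestor_chain(code: str, lects: Dict[str, Lect]) -> List[str]:
    """Follow explicit ancestor links only; family containment is ignored."""
    chain: List[str] = []
    seen = {code}
    current = lects[code].ancestor
    while current is not None:
        if current in seen:
            raise ValueError(f"ancestor cycle at {current}")
        chain.append(current)
        seen.add(current)
        current = lects[current].ancestor
    return chain


if __name__ == "__main__":
    for code in LECTS:
        print(code, "->", ancestor_chain(code, LECTS))
```

With a layout like this, a lect whose ancestor nobody has committed to (Crimean Tatar, Mamluk-Kipchak) simply has an empty chain instead of inheriting an ancestor from the indentation of the table.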
    I think [qwm-cum] stands for the language of the Codex Cumanicus, right? If so, yes, we need that.
    AmaçsızBirKişi (talk) 10:53, 20 February 2025 (UTC)Reply
    Just noting, because Salchuq is mentioned a couple of times above, that Salchuq has been removed (from the ISO list of languages and from ours) as spurious per a discussion further down on this page, Wiktionary:Language treatment requests#Retiring Salchuq. - -sche (discuss) 23:55, 22 November 2025 (UTC)Reply
    I have never heard of Salchuq before. It is mentioned here a few times so I guess I completely missed it. Zbutie3.14 (talk) 01:29, 23 November 2025 (UTC)Reply

    Etymology-only codes and dialect labels for South Sumatran Malayic


    I would like to request etymology-only codes and dedicated dialect labels (not sure if this is the right place?) for South Sumatran Malayic varieties under the Musi and Central Malay dialect groups. These varieties used to have their own ISO 639-3 codes before they (except [liw], [vkk], and [pel]) were merged into [mui] and [pse] in 2008. Per McDowell & Anderbeck (2020), many of these lects do have their own salient distinguishing features, and they remain treated as separate languages in most Indonesian publications. Specific words from several of these varieties have been borrowed into Indonesian, and they need to be etymologized properly (attested terms only, per Wiktionary:About Indonesian#Regional Languages).

    Etymology-only languages currently needed:

    • [mui-plm] or [mui-plb] Palembang (formerly [plm])
    • [mui-syu] or [mui-sky] Sekayu (formerly [syu] in Ethnologue 13, pre-ISO)
    • [mui-lmt] Lematang (formerly [lmt])
    • [pse-bke] or [pse-ben] Bengkulu (formerly [bke])

    Not necessary, but may be useful for tracing etymon reflexes:

    • [pse-srj] Serawai (formerly [srj])

    Given the lack of universally accepted standard varieties in both [mui] and [pse] groupings, we also need to carefully label and categorize their entries according to their specific dialectal origin. I propose we adopt the classification given in McDowell & Anderbeck (2020), which retains most of the familiar local "language" labels (in italics).

    Musi dialect group [mui]

    • Upper Musi
      • Musi Proper (= Musi, formerly [mui] in the narrow sense)
        • Kelingi
        • Penukal
        • Sekayu
      • Pegagan (often misidentified as a dialect of Ogan [ogn])
      • Rawas (formerly [rws])
      • Col [liw]
    • Palembang–Lowland
      • Palembang (formerly [plm])
        • Palembang Lama (traditional variety which includes a polite register akin to Javanese krama, taught locally in Palembang schools since 2024)
        • Palembang Pasar (urban koiné used as a regional lingua franca within and beyond the city of Palembang)
        • Pesisir (rural coastal variety, formerly listed under [mly])
      • Lowland
        • Belide (formerly under [lmt] and [mly])
        • Lematang Ilir (= Lematang, formerly [lmt])
        • Penesak (formerly [pen])

    Central Malay dialect group [pse]

    • Oganic
      • Ogan (formerly [ogn])
      • Rambang
      • Enim (formerly [eni])
    • Highland
      • Bengkulu (formerly [bke])
      • Besemah (formerly [pse] in the narrow sense)
      • Lematang Ulu (identical to Besemah)
      • Lintang (formerly [lnt])
      • Semende (formerly [sdd])
      • Benakat
      • Serawai (formerly [srj])
        • Talo (*-a > [o], used by Adelaar to reconstruct Proto-Malayic)
        • Manna (*-a > [aw])
      • Kaur [vkk]
      • Pekal [pel]

    Currently I have started using some of these labels in entries, cf. katek, rete, and muanai. At the very least, I think we need dedicated labels and categories for the etymology-only languages proposed above + the already existing [pse-bsm] (Besemah). The category names for dialects of [pse] and [mui] may be appended with "Malay", e.g. Palembang Malay, Musi Malay, Ogan Malay, Semende Malay, etc.

    Note that prior to the merger of the codes (and up until now in Indonesia), the term "Palembang Malay" or "Palembang language" (bahasa Palembang) can only refer to the dialects under "Palembang" in particular, while "Musi language" (bahasa Musi) refers to dialects under "Musi Proper". The rest of the dialects are either treated as languages on their own, as dialects of Malay, or occasionally under other umbrella terms such as "Bengkulu language" (bahasa Bengkulu) for Highland [pse] dialects spoken in Bengkulu.

    I am indifferent to the issue of whether we should lump together [vkk] and [pel] with [pse], and [liw] with [mui]. In particular, [pel] is sometimes placed closer to [min] than to other [pse] lects (e.g. in Glottolog). Haji [hji] is an isolate within Malayic, sharing only ~60% of its lexicon with neighboring South Sumatran varieties, and is best treated as its own language. All [mui], [pse], and [hji] lects should be written in [Latn] as the default script, but [pse] also uses [Rjng], and [mui] is occasionally written with [ms-Arab]. Swarabakti (talk) 21:21, 25 January 2025 (UTC)

    Reconstruction:Common Romanian


    Common Romanian, also called ‘Proto-Romanian’, is the reconstructed common ancestor of Aromanian, Istro-Romanian, Megleno-Romanian, and Romanian. There is considerable scholarship on the subject. Sala 1976 treats the phonological aspects of the reconstruction in detail.

    We already host such reconstructions under ‘Reconstruction:Latin’, which is problematic for a number of reasons:

    • The name. No scholar refers to this reconstruction as ‘Latin’, and that name can easily mislead our readers.
    • The orthography. Spellings like *⟨oestricula⟩ are quite out-of-step with reconstructions like /ˈstrekʎe/.

    Proposed orthography: the phonemic transcriptions as they are now, except with some other way of indicating stress. For instance *strékʎe.

    Pinging @Word dewd544, @Catonif, @Bogdan, @Benwing2 as potentially interested parties.

    Nicodene (talk) 20:30, 30 January 2025 (UTC)Reply

    No objection here except possibly to the name; "Proto-Romanian" sounds a bit better IMO although I'm not familiar with the scholarship to know what's the most common term. Benwing2 (talk) 21:07, 30 January 2025 (UTC)Reply
    Is there any chance we could call it "Proto-Eastern Romance" since we group the languages in question together as the Eastern Romance languages? It gets a good number of Google hits. Also, both "Proto-Romanian" and "Common Romanian" are likely to be perceived as the ancestor of Romanian alone, not the other ones. —Mahāgaja · talk 21:09, 30 January 2025 (UTC)Reply
    I agree that it would be useful to have Proto-Romanian (or however we decide to call it), not just for Latin words, but also for borrowings from Albanian. But about this particular case, while I agree that there can be a reconstruction before the split into Romanian and Aromanian pronounced /strekʎe/, the word itself is older, a Late Latin *oestricula must have existed, as the diminutive suffix was no longer productive at the later stage of the language (Proto-Romanian). I also wonder if we can find an obscure descendant of *oestricula in some dialect of Northern Italian, as often happens with Romanian words that are from Late Latin. Bogdan (talk) 23:02, 30 January 2025 (UTC)Reply
    I’m not sure we can regard the criterion for Latin as ‘still having a productive reflex of -iculum’ in light of, for instance, Spanish -ejo.
    @Mahagaja: Italian is often included under the label Eastern Romance, unfortunately. A possible option without this issue is Proto-Balkan-Romance.
    Nicodene (talk) 03:43, 31 January 2025 (UTC)Reply
    Even if others include Italian under Eastern Romance, we don't. We already use that term with the label roa-eas for a family consisting of ro, ruo, rup, and ruq. Calling the protolanguage of that family Proto-Eastern Romance would be internally consistent. That other people define the Eastern Romance family differently doesn't really have any relevance to what we call the protolanguage. —Mahāgaja · talk 07:13, 31 January 2025 (UTC)Reply
    There has never been a discussion or vote on defining the label Eastern Romance, or using it on Wiktionary to begin with.
    The vast majority of the time the term Eastern Romance has a broader scope than those four languages.
    Nicodene (talk) 10:21, 31 January 2025 (UTC)Reply
    Personally, I would be fine with this if it only implies we would still handle the situation exactly as we do now, the only differences being the language name ("Common Romanian" instead of "Latin") and a more fitting orthography, which are the two issues listed here. But I oppose this if, as I understand it, this would take the role of a full-fledged language and hence also have terms inherited from attested Latin terms and terms borrowed from Slavic or some other Balkan language. This would increase the number of reconstructions to an excessive figure (approximately two thousand), an immense amount of work for little usefulness and greater informational clutter.
    Regarding the name, were the first approach I mentioned to go through, I would support "Common Romanian", or if we find it more coherent with the rest of the bunch, "Proto-Romanian". Any mention of "Eastern" or "Balkan Romance" I would vote against. Catonif (talk) 18:15, 31 January 2025 (UTC)

    February 2025


    Proto-Oghuz and Proto-Arghu to be able to enter recorded lemmas


    We have probably discussed this before on Discord and maybe on other discussion pages, but the Proto-Oghuz language should be eligible for entry. Some proto-languages can be added to Wiktionary without formal reconstruction. The same is needed for Proto-Arghu. We don't have a code for it, but Kashgarî recorded words in Proto-Arghu and we add these words as if they were Karakhanid, which is wrong. The same should be possible for Proto-Oghuz. A non-reconstructed language entry should be allowed for the Oghuz dialect recorded in the 11th century and earlier. As an example, I can add recorded lemmas in the Proto-Norse language, but can't for Proto-Oghuz. >>ᚺᚼᛁᛞᛉ<< BurakD53 (talk) 13:30, 18 February 2025 (UTC)

    @Benwing2 @AmaçsızBirKişi @Zbutie3.14 @Bartanaqa @Yorınçga573 @Ardahan Karabağ BurakD53 (talk) 17:15, 18 February 2025 (UTC)Reply
    yeah completely agree for having separate entries for non-reconstructed proto-oghuz/arghu entries cuz they are quite literally attested. Bartanaqa (talk) 23:36, 18 February 2025 (UTC)Reply
    @BurakD53 I'm not opposed but I assume there are very few such terms, is that right? If so, rather than simply turning off the "reconstructed" type for Proto-Oghuz and for Arghu, we (I) should implement the "anti-asterisk" feature mentioned in Wiktionary:Beer_parlour/2024/April#Mainspace_Proto-West-Germanic?. That way, they are still identified as reconstructed languages but you can create mainspace entries provided you identify them with the appropriate symbol (which might be a double exclamation point, !!). Benwing2 (talk) 07:54, 19 February 2025 (UTC)Reply
    Alright, that's great! I believe that more than 200 words from Kashgarî's Diwan, written in the 11th century, should be considered Proto-Oghuz because this language belonged to the *Tağlığ group, whereas today all Oghuz languages, including Salar, belong to the *Tağlı group. The information provided by Arab travelers about the Oghuz people and their language in the 11th century and earlier can also be considered. More than a dozen words from Proto-Arghu must have been recorded in the Diwan as well, since I can recall about a dozen myself. BurakD53 (talk) 14:58, 19 February 2025 (UTC)Reply
    This is داغ#Karakhanid, the word in Arghu. Can you show me how to change it to Proto-Arghu, so that I can edit Oghuz and Arghu lemmas in the same way? @Benwing2 BurakD53 (talk) 19:39, 19 February 2025 (UTC)
    @BurakD53 We have no Proto-Arghu or Arghu family yet; Arghu is currently an etym variety of Khalaj. Should the Arghu family be added? See my comments above. Benwing2 (talk) 07:38, 20 February 2025 (UTC)Reply
    It should be. The most distinctive feature of Arghu is that it changes the Old Turkic -ny- sound to -n-. Also, instead of using *emez and its variants like all other Turks or *degül like the Oghuz, it has its own way of saying "not." It is a language that has preserved primary long vowels and does not belong to the ayak group like the Oghuz, which makes it a whole different branch. BurakD53 (talk) 07:56, 20 February 2025 (UTC)
    In that case what should happen to the Arghu etym variety of Khalaj? Should it disappear in favor of Proto-Arghu? Benwing2 (talk) 08:06, 20 February 2025 (UTC)Reply
    It should be:
    • Arghu: [trk-arg]
      • Proto-Arghu: [trk-arg-pro] (some words are attested by Kaşgarî in 11th century)
        • Khalaj: [klj]
    BurakD53 (talk) 09:48, 20 February 2025 (UTC)Reply
    Are you saying that the Arghu language (etym variety) should be converted into a family? It isn't clear to me. BTW I don't think there are any actual Arghu language entries being referenced currently, because there's no page Category:Arghu or Category:Arghu Turkic or any such thing. Benwing2 (talk) 22:57, 20 February 2025 (UTC)Reply
    Arghu Turkic is attested in Diwanu Lügatit Türk, in the 11th century. There is no such category because we haven't added Arghu lemmas yet. The Arghu languages are a subfamily. See Argu languages. I have never heard of an etym variant of Khalaj called Arghu. Let's also ask @Xenos melophilos. The only record of Arghu is in the Diwan, and that's why the group is called Arghu. BurakD53 (talk) 23:51, 20 February 2025 (UTC)
    Arghu is a Common Turkic language, so it is z-Turkic. It is an adak group Turkic language according to *adak. It has primary long vowels. It is a -n- group language as in *koń. BurakD53 (talk) 00:00, 21 February 2025 (UTC)
    @BurakD53 So what should we do exactly? Should we rename the Arghu language to Proto-Arghu? Should we instead add Proto-Arghu and keep the Arghu language? And what should Proto-Arghu be an etym variant of? How distinct is it from Proto-Common Turkic and/or modern Khalaj? Benwing2 (talk) 08:58, 21 February 2025 (UTC)Reply
    Sorry to keep asking you the same questions but you need to be extremely explicit about all the various relationships. Maybe @AmaçsızBirKişi can help you. Benwing2 (talk) 08:59, 21 February 2025 (UTC)Reply
    Proto-Common Turkic is a -ny- language, while Proto-Arghu is a -n- language. Proto-Common Turkic uses *ermez for "not", Proto-Arghu uses da:g, Proto-Oghuz uses *degül. The difference between Khalaj and Arghu is that Arghu was attested in the 11th century, while Khalaj is a modern language spoken in the 21st century. So Proto-Arghu is the ancestor of Khalaj. Khalaj has Azerbaijani borrowings, while Arghu doesn't. Khalaj was probably influenced quite a lot by Persian. There are attested Arghu lemmas that do not survive in modern Khalaj. Do we have the words balık "mud", teşrüm "string ball", bitrik "peanut" in Khalaj? @Xenos melophilos can help better. I'm not sure we have all the attested Arghu words in Khalaj. That's normal because there is literally a millennium between them. Of course there must be phonological and maybe morphological differences too. BurakD53 (talk) 09:32, 21 February 2025 (UTC)
    I think we should remove the Arghu language and add Proto-Arghu, because no texts in the Arghu language have been found so far. The Arghu language should be Proto-Arghu, and we can reconstruct it with the help of the words attested in DLT. If we have an Arghu language without any reconstruction, there will be only 15 or 30, at most 50 lemmas, and that will be all we have for the language. So, I think Proto-Arghu is better. BurakD53 (talk) 09:40, 21 February 2025 (UTC)
    If you think the Proto language is unnecessary, just make it Arghu language. The only thing that matters to me is that Arghu can be presented as a separate branch from Karakhanid and as the ancestor of Khalaj. I honestly don't care about the rest. BurakD53 (talk) 10:13, 21 February 2025 (UTC)Reply
    Well you guys are mentioning me. What do I think?
    Arghu or protoarghu, I don't care; either way it'll look like protonorse (sometimes attested, sometimes not). What matters is that there should be a language section called arghu or protoarghu.
    Arghu is attested in Perso-Arabic script. In khalaj some dialects are more conservative than others, so arghu words that seem not to exist in khalaj could actually be preserved in some dialect.
    My point is that arghu should be a language apart, and not reconstructed, because there are words attested in the divan (just like protonorse). Xenos melophilos (talk) 14:50, 21 February 2025 (UTC)
    There should not be an "arghu" group because we have just one language with one descendant. An arghu or protoarghu language is fine. Xenos melophilos (talk) 14:52, 21 February 2025 (UTC)
     Support.
    AmaçsızBirKişi (talk) 19:02, 19 February 2025 (UTC)Reply

    harmonizing families and proto-languages, and other proto-language warnings


    (moved from Wiktionary:Beer parlour/2025/February)

    We have a whole host of warnings (17) issued concerning mismatches between proto-languages and families:

    1. Proto-Central Togo (alv-gtm-pro) does not have the expected name "Proto-Ghana-Togo Mountain", even though it is the proto-language of the Ghana-Togo Mountain languages (alv-gtm).
    2. Proto-Arawa (auf-pro) does not have the expected name "Proto-Arauan", even though it is the proto-language of the Arauan languages (auf).
    3. Proto-Arawak (awd-pro) does not have the expected name "Proto-Arawakan", even though it is the proto-language of the Arawakan languages (awd). [harmonize under Arawak]
    4. Proto-Ta-Arawak (awd-taa-pro) does not have the expected name "Proto-Ta-Arawakan", even though it is the proto-language of the Ta-Arawakan languages (awd-taa). [harmonize under Ta-Arawak]
    5. Proto-Basque (euq-pro) does not have the expected name "Proto-Vasconic", even though it is the proto-language of the Vasconic languages (euq). [keep as-is]
    6. Proto-Norse (gmq-pro) does not have the expected name "Proto-North Germanic", even though it is the proto-language of the North Germanic languages (gmq). [keep as-is but rename gmq-pro to non-pro]
    7. Proto-Kamta (inc-krn-pro) does not have the expected name "Proto-KRNB lects", even though it is the proto-language of the KRNB lects (inc-krn). [rename family to KRDS languages, keep proto-language as-is]
    8. Proto-Chumash (nai-chu-pro) does not have the expected name "Proto-Chumashan", even though it is the proto-language of the Chumashan languages (nai-chu).
    9. Proto-Maidun (nai-mdu-pro) does not have the expected name "Proto-Maiduan", even though it is the proto-language of the Maiduan languages (nai-mdu).
    10. Proto-Mixe-Zoque (nai-miz-pro) does not have the expected name "Proto-Mixe-Zoquean", even though it is the proto-language of the Mixe-Zoquean languages (nai-miz).
    11. Proto-Pomo (nai-pom-pro) does not have the expected name "Proto-Pomoan", even though it is the proto-language of the Pomoan languages (nai-pom).
    12. Proto-Mazatec (omq-maz-pro) does not have the expected name "Proto-Mazatecan", even though it is the proto-language of the Mazatecan languages (omq-maz).
    13. Proto-North Sarawak (poz-swa-pro) does not have the expected name "Proto-North Sarawakan", even though it is the proto-language of the North Sarawakan languages (poz-swa).
    14. Proto-Salish (sal-pro) does not have the expected name "Proto-Salishan", even though it is the proto-language of the Salishan languages (sal). [harmonize under Salish]
    15. Proto-Samic (smi-pro) does not have the expected name "Proto-Sami", even though it is the proto-language of the Sami languages (smi).
    16. Proto-Kuki-Chin (tbq-kuk-pro) does not have the expected name "Proto-Kukish", even though it is the proto-language of the Kukish languages (tbq-kuk). [harmonize under Kuki-Chin]
    17. Proto-Saka (xsc-sak-pro) does not have the expected name "Proto-Sakan", even though it is the proto-language of the Sakan languages (xsc-sak).

    We also have four warnings about proto-languages without associated families:

    1. Proto-Amuesha-Chamicuro (awd-amc-pro) has a proto-language code associated with the invalid code "awd-amc".
    2. Proto-Kampa (awd-kmp-pro) has a proto-language code associated with the invalid code "awd-kmp".
    3. Proto-Paresi-Waura (awd-prw-pro) has a proto-language code associated with the invalid code "awd-prw".
    4. Proto-Puroik (sit-khp-pro) has a proto-language code associated with the invalid code "sit-khp".

    We also have two weird miscellaneous warnings:

    1. Proto-Rukai (dru-pro) has a proto-language code associated with Rukai (dru), which is not a family.
    2. Kelantan Peranakan Hokkien (mis-hkl) has its canonical name ("Kelantan Peranakan Hokkien") repeated in the table of aliases.

    I can look into the second miscellaneous warning, but for the others, I mostly don't have enough context. Proto-Norse being the ancestor of the North Germanic languages is a special case because it's attested, but for the other mismatches, I imagine a lot of them are unintentional due to the existence of multiple names for the same family. It should be possible in many cases to rename either the family or proto-language to avoid the mismatch. Pinging @-sche and @Theknightwho who might know something about this; please feel free to ping others. Benwing2 (talk) 04:09, 19 February 2025 (UTC)
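For readers unfamiliar with how warnings like these arise, the first two lists can be reproduced with a small check. This is only a sketch in Python and not the actual module code: it assumes that the expected name is "Proto-" plus the family name with a trailing " languages" removed (which matches the "Proto-KRNB lects" wording in warning 7), and that the family code is the proto-language code without its "-pro" suffix. The function names and the small data tables are hypothetical; "Turkic languages" is included as an assumed matching case.

```python
from typing import Dict, List, Tuple


def expected_proto_name(family_name: str) -> str:
    """Assumed rule: drop a trailing ' languages' and prefix 'Proto-'."""
    suffix = " languages"
    base = family_name[: -len(suffix)] if family_name.endswith(suffix) else family_name
    return "Proto-" + base


def find_mismatches(
    families: Dict[str, str], protos: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (proto code, problem, detail) tuples, sorted by proto code."""
    warnings = []
    for code, name in sorted(protos.items()):
        if not code.endswith("-pro"):
            continue
        family_code = code[: -len("-pro")]
        family_name = families.get(family_code)
        if family_name is None:
            # Corresponds to the "invalid code" warnings.
            warnings.append((code, "invalid family code", family_code))
            continue
        expected = expected_proto_name(family_name)
        if name != expected:
            # Corresponds to the "does not have the expected name" warnings.
            warnings.append((code, "name mismatch", expected))
    return warnings


# Sample rows copied from the lists above, plus one assumed matching pair (trk).
FAMILIES = {
    "sal": "Salishan languages",
    "inc-krn": "KRNB lects",
    "euq": "Vasconic languages",
    "trk": "Turkic languages",
}
PROTOS = {
    "sal-pro": "Proto-Salish",
    "inc-krn-pro": "Proto-Kamta",
    "euq-pro": "Proto-Basque",
    "awd-kmp-pro": "Proto-Kampa",
    "trk-pro": "Proto-Turkic",
}

if __name__ == "__main__":
    for warning in find_mismatches(FAMILIES, PROTOS):
        print(warning)
```

Under this rule a mismatch can be cleared either by renaming the family or by renaming the proto-language; the check itself does not say which side to change.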

    In some cases, I think the family uses a different name to avoid having the same exact name as a (non-proto) language (as described in WT:FAM). For example, "Proto-Vasconic" gets only 13 Google Books hits (that actually use that term; the subsequent pages upon pages of results that Google returns don't use the term or sometimes even have any particular relevance — who knows why Google returns them), whereas I find 10+ pages [of ten uses each] of "Proto-Basque", so "Proto-Basque" is clearly the more common name for the language ... but without even checking whether "Basque languages" or "Vasconic languages" is more common for the family, I can see that one benefit to calling them "Vasconic languages" is that if they were called "Basque languages", then things like {{der|en|euq|-}} would display identically to {{der|en|eu|-}}. (That might not matter that much in that particular case, but for larger families it'd be confusing. However, {{der|en|qwm|-}} and {{der|en|trk-kip|-}} do display identically... so maybe we need to rename one of those, or find some way of solving this "same name" issue...)
    In some cases, the proto-language and family might really have different common names.
    In the case of Salish, it looks like the family could be renamed "Salish" to match the proto-language; "Proto-Salish" gets 11 pages of relevant Google Books results vs only 9 pages for "Proto-Salishan", and "Salish languages" is apparently also more common. - -sche (discuss) 05:04, 19 February 2025 (UTC)Reply
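The "same name" issue mentioned above (a family and a language displaying identically) is easy to test for before any rename. A rough Python sketch follows, assuming each code maps to a display name and a kind; the table layout and the function name are hypothetical and do not reflect the real data modules.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (display name, kind), where kind is "language" or "family"


def display_name_collisions(entries: Dict[str, Entry]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each display name shared by a language and a family to its (code, kind) pairs."""
    by_name = defaultdict(list)
    for code, (name, kind) in entries.items():
        by_name[name].append((code, kind))
    return {
        name: sorted(pairs)
        for name, pairs in by_name.items()
        if len({kind for _, kind in pairs}) > 1
    }


# Names as described in this discussion.
CURRENT = {
    "qwm": ("Kipchak", "language"),
    "trk-kip": ("Kipchak", "family"),
    "eu": ("Basque", "language"),
    "euq": ("Vasconic", "family"),
}

# What would happen if the Vasconic family were renamed to match Proto-Basque.
RENAMED = dict(CURRENT, euq=("Basque", "family"))

if __name__ == "__main__":
    print(display_name_collisions(CURRENT))
    print(display_name_collisions(RENAMED))
```

Running a check like this against a proposed rename would show whether harmonizing a family name with its proto-language creates a new clash of the Kipchak kind.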
    "Ta-Arawak" seems to be marginally more common than "Ta-Arawakan", if we wanted to synchronize that pair: on Google Scholar, "Ta-Arawak" gets 40 hits, "Ta-Arawakan" 26; on Google Books, each one gets about 14 hits (discounting a few which are not in English and are only using ta as a particle while mentioning the Arawak/an languages). "Proto-Ta-Arawakan" gets 1 GBooks hit and "Proto-Ta-Arawak" gets none; "Ta-Arawakan languages" returns 2 copies of 1 book, "Ta-Arawak languages" returns 1 book. On Google , "Ta-Arawakan languages" returns 0 hits while "Ta-Arawak languages" returns 7 (of which 3 are duplicates of a single work). - -sche (discuss) 18:31, 19 February 2025 (UTC)Reply
    @-sche What about Proto-Arawak vs. Arawakan? Wikipedia has w:Arawakan languages and w:Ta-Arawakan languages (although the w:Arawakan languages article uses "Ta-Arawak" in reference to the family). Since Ta-Arawakan is a subfamily of Arawakan, it seems we should be consistent in the names of these two families. (Meanwhile, confusingly, Category:Arauan languages is an apparently unrelated family; Wikipedia's article is at w:Arawan languages, which looks more "modern".) Benwing2 (talk) 00:59, 21 February 2025 (UTC)Reply
    Although both names seem to be common enough that the Google (Books) Ngram Viewer should be able to plot them (both seem to get well over 40 hits), it doesn't like the hyphens, so this claims no results, and I can't be sure whether this is actually a graph of "Proto-Arawak" or instead of how many books have "Proto" minus "Arawak". Nonetheless it seems like "Arawak" is more common, if we wanted to standardize everything on that. (Google Scholar also claims to find slightly more results for "Proto-Arawak" than "Proto-Arawakan", and significantly more for "Arawak" than "Arawakan".) - -sche (discuss) 18:32, 22 February 2025 (UTC)Reply
    For Kamta, I notice there's the added oddity that the language family/category is named "... lects" rather than "... languages", even though the languages in the category are named "Category: ... language". AFAICT, that part of the name should be regularized (from "lects" to "languages"). For the name itself, google books:"KRNB" languages Kamta turns up zilch (and I spy only three Google Scholar hits), but "Kamta languages" also turns up zilch (and if the family were renamed "Kamta" to match the proto-language, we would run into the Kipchak issue where {{der}} etc would return the same name whether the family or the [non-proto] language that's already called "Kamta" was called). Wikipedia uses a third name, "KRDS", which I can find a couple of Google Books and a couple of Google Scholar hits using. There are a couple of Google Books and Scholar hits for "proto-Kamta", and none for "Proto-KRNB" or "Proto-KRDS", so maybe we leave the proto-language name as "Proto-Kamta" but change the family from "KRNB lects" to "KRDS languages"? Or maybe some Indian-language editors have better knowledge/ideas: pinging User:AryamanA who created Category:Rajbanshi language (and you already pinged TKW, who created Category:Surjapuri language). - -sche (discuss) 18:32, 22 February 2025 (UTC)
    In general, I'd follow the literature; if they generally use a different name for the proto-language vs. the group by which the proto-language is reconstructed, so be it. If it's an even split between multiple names: sure, harmonize it for convenience. However, I have a few suggestions.
    • Rename "Kukish" to "Kuki-Chin" (Kuki-Chin is more common)
    • Change the code of Proto-Norse from gmq-pro to non-pro but keep the "Proto-Norse" name (since that's what the literature calls it). It doesn't really make sense for Old Norse to be non but Proto-Norse to have "gmq" instead.
    Ceso femmuin mbolgaig mbung, mellohi! (投稿) 17:06, 24 February 2025 (UTC)Reply
    Definitely, in cases where one name is more common for the proto-language and another for the group, I agree it's fine for them not to match. - -sche (discuss) 17:50, 25 February 2025 (UTC)Reply

    @-sche, Mellohi! I added the results so far in bold. There's a trend here in that so far generally the name of the proto-language has remained and the name of the family changed. I don't know if that applies to the remainder, though. Benwing2 (talk) 20:30, 24 February 2025 (UTC)Reply

    Medieval Greek 2025

    Pending from 2024 (Benwing plan)

    Waiting... [I understand that decisions are difficult for languages where the turnout is approx. 5 people]. It would be useful for correctly reviewing etymologies, Cat:Koine Greek and Cat:Modern Greek simultaneously. At the moment I feel 'blocked' because it will be hectic to have to go back and redo my reviews. I have written a Medieval Greek reminder every January since 2023. The stylistic use of Koine through centuries as high register & diglossia should not discourage or confuse this decision. Thank you. ‑‑Sarri.greek  I 15:39, 19 February 2025 (UTC)

    @Sarri.greek: Hello again. Could you explain what you refer to by “The stylistic use of Koine through centuries as high register & diglossia should not discourage or confuse this decision.”, please? 0DF (talk) 01:17, 20 February 2025 (UTC)Reply

    March 2025

    Code for Volga Turki?

    • Proto-Turkic: (trk-pro)
      • Proto-Common Turkic: (trk-cmn-pro)
          • Kipchak: (trk-kip) (FAMILY)
            • Kipchak-Bulgar: (trk-kbu) (FAMILY)
            • Volga Turki: (?)
              • Bashkir: (ba)
              • Tatar: (tt)

    >>Volga Türki<<

    BurakD53 (talk) 18:34, 3 March 2025 (UTC)Reply

    Data: Qul Ali Kıssa-i Yusuf and Volga Tatar tombstones (like Volga Bulgar inscriptions). - BurakD53 (talk) 18:37, 3 March 2025 (UTC)Reply
    @AmaçsızBirKişi @Zbutie3.14 - BurakD53 (talk) 18:39, 3 March 2025 (UTC)Reply
    fine with me Zbutie3.14 (talk) 23:28, 3 March 2025 (UTC)Reply
    Why not?  Support.
    @Benwing2 (I know we keep pinging you to add new lang codes for Turkic languages, but the thing is the previous ones were very inaccurate and missing.)
    AmaçsızBirKişi (talk) 10:08, 4 March 2025 (UTC)Reply
    @Benwing2 We need a code for Volga Turki: it is an L2 language under the Kipchak-Bulgar family and the parent of Bashkir and Tatar. There are also a lot of other things that need to be changed, but we can deal with those later, I guess. Zbutie3.14 (talk) 19:14, 13 March 2025 (UTC)
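
    The hierarchy proposed in this thread can be sketched as a simple parent map. This is only an illustrative Python sketch, not Wiktionary's actual language data format, and `VOLGA_TURKI` is a hypothetical placeholder, since no code for Volga Turki has been agreed on here. The `PARENT` table and the `chain` helper are likewise names invented for the illustration.

```python
# Hypothetical placeholder: the thread leaves the Volga Turki code open ("?").
VOLGA_TURKI = "<volga-turki>"

# Child -> parent, following the tree posted at the top of this thread.
# trk-kip and trk-kbu are families; trk-pro and trk-cmn-pro are proto-languages.
PARENT = {
    "trk-cmn-pro": "trk-pro",
    "trk-kip": "trk-cmn-pro",
    "trk-kbu": "trk-kip",
    VOLGA_TURKI: "trk-kbu",
    "ba": VOLGA_TURKI,  # Bashkir
    "tt": VOLGA_TURKI,  # Tatar
}


def chain(code):
    """Return the nodes above `code`, nearest first."""
    out = []
    while code in PARENT:
        code = PARENT[code]
        out.append(code)
    return out


if __name__ == "__main__":
    print(" > ".join(reversed(chain("tt"))))
```

    Under this sketch, Bashkir and Tatar share the same chain, with Volga Turki as the nearest node above them and Proto-Turkic as the most distant.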

    Some Sino-Tibetan considerations

    New sub-proto-languages

    I would like to propose some Sino-Tibetan sub-proto-languages:

    • Proto-Bodish (sit-bdi-pro), for Category:Bodish languages; a long list of Proto-Bodish forms is provided in Bodt's "East Bodish Revisited".
    • Proto-Tangkhulic (sit-tng-pro), for Category:Tangkhulic languages; David Mortensen has published extensively on this
    • Proto-Naish (sit-nas-pro), for Category:Naish languages; several reconstructions are given by Jacques and Michaud's "Approaching the historical phonology of three highly eroded Sino-Tibetan languages: Naxi, Na and Laze" (and also Li Zihe has his own reconstruction scattered across separate papers).
    • Ersuic languages (sit-ers) composed of ers (Ersu) and sit-liz (Lizu); its proto-language (which would thus be sit-ers-pro) is reconstructed by Yu 2012.

    Ceso femmuin mbolgaig mbung, mellohi! (投稿) 16:48, 6 March 2025 (UTC)Reply

    @Benwing2 @Thadh @Justinrleung for consideration. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 16:49, 6 March 2025 (UTC)Reply
    Definitely support for Proto-Bodish.
    I'm not very familiar with the rest, so I can't readily say how good the reconstructions are (but on first glance, they seem fine). For Naish, I'm a bit worried that Proto-Naish may not turn out to be that different from Proto-Naic - are there any reconstructions of the latter? When dealing with just five languages, I think a higher-order reconstruction would potentially be more interesting than a lower-order if the languages are closely related. On the other hand, if there's no work being done on these and it's not likely to be the case in the future, we might as well include Proto-Naish now, if it's already reconstructed. Thadh (talk) 17:31, 6 March 2025 (UTC)Reply
    Couldn't find any advancements to a Proto-Naic stage beyond Proto-Naish, no. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 17:42, 6 March 2025 (UTC)Reply
    No objections but I don't know a lot about the intermediate structure of Sino-Tibetan. I have heard there is a good deal of uncertainty, are we pretty sure these families are valid? If so, I'm fine with creating the relevant proto-languages. Benwing2 (talk) 20:13, 6 March 2025 (UTC)Reply
    Naish, Tangkhulic, and Ersuic are unchallenged. They should proceed without a hitch.
    Bodish... has an annoying terminology problem. Everyone agrees that Tibetic and East Bodish belong together, but the terminology used to refer to such a grouping varies wildly:
    • Tibetic + East Bodish alone are the basis of Bodt's "Proto-Bodic" reconstruction. But Bodt defines "Bodish" as synonymous with Tibetic and "Bodic" as Tibetic + East Bodish + Tamangic + West Himalayish.
    • Bodish = Tibetic + East Bodish according to Hill (Hill actually rejects East Bodish as a genetic group, but still considers its components overall Bodish); consequently Proto-Bodish is the ancestor of this grouping. A similar definition of "Bodish" is also used in Glottolog.
    • Shafer uses "Bodish" for two levels of grouping, the lower level consisting of what is now accepted as Tibetic + East Bodish.
    • The current definition of Category:Bodish languages is Tibetic + East Bodish + Tshangla (and for some bizarre reason 'Olekha, which certainly doesn't look Bodish at all to me).
    So basically, Bodt's "Proto-Bodic" is not a valid reconstruction for what he defines as "Bodic" (since it only uses two of the four Bodic branches for reconstruction), but it is valid for what Hill and Glottolog call "Bodish" (East Bodish + Tibetic) which essentially is a subgroup of what Tournadre and others call "Bodish" and Bodt calls "Bodic" (East Bodish + Tibetic + West Himalayish + Tamangic).
    In the end, I would like for Tshangla and 'Olekha to be removed from Category:Bodish languages since their Bodishness is dubious. This leaves behind East Bodish and Tibetic, whose proto-language will be Bodt's Proto-Bodic = Hill's Proto-Bodish with code sit-bdi-pro. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 00:53, 7 March 2025 (UTC)Reply
    Thanks for the details. So I take it you prefer the terminology "Bodish" for the narrower Tibetan + East Bodish, and "Bodic" for the wider group that includes Bodish + Tamangic + West Himalayish? Should we create an intermediate family "Bodic languages", since it doesn't currently exist? Benwing2 (talk) 01:01, 7 March 2025 (UTC)Reply
    No. Absolutely do not use "Bodic" that way. "Bodic" is used in other literature taking after Bradley which adds an additional branch (whatever branch Kiranti is in) on top of the four-branch "Bodish" sensu lato. I have no good ideas on what to call four-branch Bodish; "Bodish (sensu lato)"? — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 01:28, 7 March 2025 (UTC)Reply
    "Macro-Bodish" or "Greater-Bodish" is probably better. – wpi (talk) 13:31, 18 April 2025 (UTC)Reply

    Rearranging rGyalrongic and Tangut

    Tangut (txg) should be placed inside Category:Rgyalrongic languages, not treated like a sister to it.

    The whole rGyalrongic branch should have two subdivisions: West rGyalrongic (sit-wgy) consisting of Tangut txg, Horpa ero, and Khroskyabs jiq; and the other languages like Japhug sit-jap, Situ sit-sit, Zbu sit-zbu and Tshobdun sit-tsh belong to East rGyalrongic (sit-gya I guess).

    Ceso femmuin mbolgaig mbung, mellohi! (投稿) 17:09, 6 March 2025 (UTC)Reply

    Is there a reason to prefer sit-gya instead of sit-egy parallel to sit-wgy? - -sche (discuss) 07:13, 19 March 2025 (UTC)Reply
    This is because "Gyalrong" itself refers to East rGyalrongic alone. But I don't mind "egy" as a qualifier. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 16:44, 28 March 2025 (UTC)Reply

    Done: Tangkhulic, Ersuic, Naish and rGyalrongic reorganizations and proto-languages, since nobody objected. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 16:56, 5 April 2025 (UTC)

    Done: Bodish rearrangements as well. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 03:30, 7 April 2025 (UTC)

    'Fingallian' and 'Yola' i.e. the Early Modern English dialects of Fingal, and Forth and Bargy

    My argument here is that classing Fingallian and Yola as languages is inaccurate and that they are better classed as dialects of Early Modern English.

    As detailed by Hickey (2005, pp. 196-198), 'the dialect of Fingal' is attested in three 17th century poems which display a small number of features showing the influence of Irish Gaelic and a couple of relatively conservative features (namely Middle English /i:/ and past participle 'y-'). 'The dialect of Forth and Bargy' is attested slightly more substantively from the end of the 18th century with two longer glossaries and some short texts, mostly poems/songs. These display a larger number of divergent or conservative features (2005, pp. 199-202). Hickey (2002, 2005) is essentially the primary scholar on historical varieties of English in Ireland and is clear in referring to these as dialects. No reliable sources make any mention of Yola/Fingallian languages. Similarly, Oxford English Dictionary notes Forth and Bargy words as variant forms under Irish English (Wexford), 1800s such as 'af' and 'av' for 'if' here.

    These two dialects give a glimpse into the development of English in Ireland prior to the large-scale language shift that came in the following centuries. Whilst I recognise that the 'language vs. dialect' argument is mostly contrived and relative, it does not make any sense for these two to be classed as languages on Wiktionary, and any entries would be better described as dialectal Early Modern English. My view is that these varieties are not actually that different from contemporary or later dialects of English, e.g. Yorkshire or Cumbrian dialects, which are traditionally quite divergent from varieties of Southern English yet fall under English all the same. Further, the majority of words currently under the heading Fingallian are cited from a glossary of dialectal words from the 20th century and aren't strictly Fingallian anyway.

    Sources:

    • Hickey, R. (2002). A source book for Irish English. J. Benjamins Publishing Company.
    • Hickey, R. (2005). Dublin English: evolution and change. J. Benjamins Publishing Company.
    • Oxford English Dictionary, s.v. “if (conj. & n.), Forms,” accessed December 2024,

    MolingLuachra (talk) 21:00, 11 March 2025 (UTC)Reply

    @MolingLuachra These diverged before Early Modern English (which begins c. 1500), so what is the basis for treating them as part of it? They also have a separate ancestry to modern Irish English, as they developed out of the forms of Middle English brought over centuries earlier, so putting them under the heading "English" (which strictly refers to Early Modern English onwards) feels contrived. The fact that many Fingallian entries are wrong isn't really relevant, either - those entries just need to be corrected.
    Note also that Wiktionary is not Wikipedia - we aren't limited by whatever reliable sources choose to describe as a language. We merge some traditionally treated as separate (e.g. Serbo-Croatian, Catalan and Valencian), and separate others that are usually grouped together (e.g. Low German is split into Dutch Low Saxon and German Low German). Theknightwho (talk) 12:14, 12 March 2025 (UTC)Reply
    @Zff19930930 as a prolific Yola editor. —Mahāgaja · talk 06:22, 13 March 2025 (UTC)Reply
    Yola is much more conservative than Early Modern English. For instance, baake (bake) /baːk/ was heard and recorded in A Modern Glossary of the Dialect of Forth and Bargy, page 154. Thus, Yola can't be classified as a dialect of Early Modern English.
    There is a comment about Fingallian in A North-County Dublin Glossary, page 262.
    This district, Fingal, had in former times a dialect based on the 13th century colonial South-Western English of the Pale. Fingallian, of which we have only the slightest records, must have closely resembled the Forth dialect, recorded by Poole early in the last century; but, owing no doubt to its nearness to the capital, it did not keep its peculiarities so long. One naturally looks for traces of this ancient speech in a North-Dublin glossary, but they are few and doubtful.
    Some words give a flavour of Fingallian, particularly forms like fat for "what", fen for "when", ame for "them" or plack-keet for "placket". Fingallian did exist, and was extinct by the mid-19th century.
    I will clean some Irish English under the heading Fingallian. Zff19930930 (talk) 13:01, 13 March 2025 (UTC)Reply
    You could get fat, fen, etc. (pronounced with voiceless bilabial [ɸʲ], the Irish slender /f´/) in most of rural Ireland up to the 20th century, hence e.g. making fun of phwat is yer nam?! in An Béal Bocht by Myles na gCopaleen (and I wouldn’t be extremely surprised if there were still some old people with that in the strongest Gaeltacht areas). // Silmeth @talk 19:13, 13 March 2025 (UTC)
    A single conservative feature is not nearly enough to object to 'Yola' being a dialect of English. The conservative lack of vowel shift /iː/ → /ai/ and /aː/ → /eː/ is interesting and I'm not sure about that in particular, but all of the features of the dialects in Forth and Bargy/Fingal are widely attested elsewhere or are clear substrate features of Irish. As I said, scholarly consensus is unambiguous in referring to these as dialects of English, and my contention is that the classification for Wiktionary's purposes as 'languages' makes no sense when other much better attested and much more divergent varieties of historical and modern 'Englishes' are not 'languages'. My argument is essentially that 'Early Modern English' as used by Burnley (1992) refers to a period of the language's history c. 1500-1800. As 'Fingallian' and 'Yola' are dialects of English attested during this period, I think it makes sense to call them 'dialects of Early Modern English', especially given that this is the convention taken with other instances of dialectal or historical variation in Old English, Middle English and English, such as here, where variant spellings reflecting regional or historical differences are given as 'Alternative forms' under the headword 'fader'.
    I'm not sure what the relevance of the quote is, but the examples you give are sort of beside the point. As Silmeth pointed out, the substitution of Gaelic /ɸ/ for English /ʍ/ is not limited to Fingal/F&B and continued to be a feature of Irish English until very recently, if not still amongst some people. The shift in stress in a word like 'placket' is a feature of F&B, not Fingal (argued by O'Rahilly 1932 to be a result of Norman influence). You say that 'Fingallian' 'was extinct by the mid-19th century'; do you have any source for that? No reliable sources I can find say anything but that the dialect was only attested in three poems in the 17th century.
    MolingLuachra (talk) 15:00, 14 March 2025 (UTC)Reply
    Coming here from Wikipedia, if such dialects as traditional Somerset and Dorset English aren't considered separate, then neither should the dialect of Forth and Bargy be. Many authors, even while it was alive, talked about how similar they are. Fingallian even less so. Not to mention half the Fingallian etymologies are just wrong and made up by someone who clearly doesn't know much about Irish or the English of Ireland. I'm after correcting one that very clearly comes from Irish, and there's several others.
    Also, as @MolingLuachra said, Fingallian is attested by three poems in the 17th century and likely didn't last out the century; indeed, it's unknown if those poems were even written by speakers or rather people making fun of it. There's really absolutely no reason it should be considered separate here. Sionnachnaréaltaí (talk) 19:02, 14 March 2025 (UTC)
    Support. Dialects form a continuum, and it makes no sense to categorize them solely based on point of divergence. In this instance, it seems clear that Yola and Fingallian belong to the broader English continuum; they differ from prestige varieties, but so do most other traditional dialects, especially in the region. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 22:47, 17 March 2025 (UTC)
    Support. As I said earlier, there's no real reason to consider it separate apart from comparing it solely with the prestige language. Even the people who wrote about it while it was alive or recently deceased never considered the Forth and Bargy dialect (Yola) or Fingallian to be separate (and we have scant information on Fingallian to begin with). If we consider these two separate on linguistic features, we might as well consider every traditional dialect of English as a separate language; if we consider them separate based on geographic location, why are American English, Canadian English, Nigerian English, Indian English, modern Hiberno-English, et al. not considered separate languages? Really, it seems the differences are exaggerated because the nearest dialects to it in the continuum aren't spoken as strongly anymore, and because we compare it to modern prestige English, not to the continuum of its time.
    Sionnachnaréaltaí (talk) 14:07, 18 March 2025 (UTC)Reply
    Comparing it to the differences between American English, Canadian English, Nigerian English, Indian English, modern Hiberno-English seems like a pretty big exaggeration, given they are all widely mutually-intelligible with each other. Theknightwho (talk) 14:35, 18 March 2025 (UTC)Reply
    All the more reason to not consider it separate. As far as we know it was mutually intelligible to other dialects on the spectrum. If they're considered a single dialect continuum because of mutual intelligibility, then the Forth and Bargy dialect should be too based on what we know. Sionnachnaréaltaí (talk) 19:44, 18 March 2025 (UTC)Reply
    Modern prestige BrEng is completely intelligible to me but many traditional Scottish and Irish lects aren't. Hell, many English lects aren't, either, and as a US Southerner I still often struggle to understand rural AAVE. Are these all separate languages? ;P Chances are, most everyone can understand people from the next town over (at least as far as traditional dialectal boundaries go); that's what a dialect continuum is, no? 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 20:33, 18 March 2025 (UTC)Reply
    Prestige dialects of English (those you have mentioned, pretty much) have developed in parallel for a long time with consistent cross-pollination, thus in this era we cannot consider geography as the sole factor in the continuum. We instead need to examine each individual lect and compare it to all other lects that share similar features; in this case, as has been raised above, both Fingal and Yola share many features with other contemporaneous Irish lects, putting them in a similar position to other traditional regiolects; that is to say, I don't think anyone would be opposed to treating these as separate languages if e.g. traditional Yorkshire were also treated as such, but it isn't. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 20:38, 18 March 2025 (UTC)
    (Would you also merge Scots under ==English==?) I'm somewhat ambivalent, but am inclined to keep Yola separate; it has a divergent history (like Scots, which past discussions have also strongly though not unanimously kept separate), and has an ISO code, distinguishing it from some of the dialects people have pointed to above. Furthermore, it seems like handling it as ==English== would in practice mean deleting coverage of it, since it's not clear to me it would meet the criteria for inclusion (three uses-not-mentions) that English words are subject to. Fingallian OTOH (added without discussion) seems more dubious, particularly because it seems the only works "attesting" it may be parodies of it rather than actual records of it, and thus not reliable bases for entries (consider the differences between African-American English and parodies of it). - -sche (discuss) 03:22, 19 March 2025 (UTC)Reply
    Oppose merging for now for the above reasons. We have pretty strict inclusion criteria for English per WT:CFI & WT:WDL, so if merging means losing the coverage we have, I cannot support it. And FWIW the last time we tried to make a carve-out for variants of WDLs, it unfortunately did not pass: Wiktionary:Votes/2022-08/Regional and Obsolete variations as LDL's. AG202 (talk) 03:50, 19 March 2025 (UTC)Reply

    Merge Northern Kankanay [xnn] to just Kankanaey [kne]

    I would like for these to be merged (i.e. deleting Northern Kankanay `xnn` and moving them all to just Kankanaey `kne`). My reasons include:

    • Resources for Kankanaey don't differentiate whether they are just `kne` or specifically `xnn`, aside from a few SIL wordlists.
    • The most comprehensive dictionary source ({{R:kne:Vanoverbergh 1933}}, and basically all of Vanoverbergh's works) is created in Bauko. Bauko is located right in the middle of known `kne` speakers (northern Benguet) and `xnn` speakers (Sagada). I arbitrarily decided to use it under `kne`, but it is also valid to put it under `xnn`.
      • Bauko, Tadian, and Sabangan are the three municipalities that I find very difficult to categorize as either `kne` or `xnn`. Some sources say they are `kne`, while others contradict that.
    • From my research, the most defining difference is the way to say "yes" (`kne` is aw while `xnn` is owen). Other than that, it is really difficult to decide what language code to put a term under.
    • The SIL wordlists definitely have differences, however I think they are just dialectal differences, especially the "s" – "h", "r" – "l", and "man-" – "men-" allophones. I have documented some of these as just pronunciation variants.
    • Both sides have almost the same vocabulary. They just differ in pronunciation. It is unnecessary to have two entries for the same word, one for `kne` and one for `xnn`.
    • The speakers of `xnn`, known as Applai, still refer to the language that they speak as "Kankanaey".
    • {{R:kne:Ortograpiya 2016}} is the standard orthography of Kankanaey. Does it also apply to `xnn`?

    All of these `kne`-`xnn` headaches can be resolved by just merging them into `kne`. — 🍕 Yivan000 viewtalk 09:33, 12 March 2025 (UTC)Reply

    As added points:
    🍕 Yivan000 viewtalk 06:30, 17 May 2025 (UTC)Reply

    Proto-Luwian ?

    I occasionally encounter reconstructions for Proto-Luwian. Kloekhorst has some, Dunkel also has at least one. So shouldn't this be added as language? Exarchus (talk) 14:06, 23 March 2025 (UTC)Reply

    I think generally Proto-Luwian doesn't have a lot of difference from Proto-Anatolian, and the number of different lexemes is very limited. The classification of lower-branch Anatolian is also unclear, so we'll have a problem with deciding which languages are Luwian and which are not. Thadh (talk) 16:52, 28 March 2025 (UTC)Reply
    We already have a category Luwic languages. But it seems both 'Proto-Luwic' and 'Proto-Luwian' are in use, and Kloekhorst differentiates between them here: "This means that Lycian stems from a sister language to Proto-Luwian and that both can be regarded as distinct daughters of Proto-Luwic." But Kloekhorst in his earlier dictionary apparently considers Lycian part of 'PLuw.', given as "Proto-Luwian" on page xii. So I'm indeed not sure how established this classification is. Exarchus (talk) 17:24, 28 March 2025 (UTC)Reply

    Bali-Sasak-Sumbawa

    Bali-Sasak-Sumbawa languages don't belong to the Malayo-Chamic branch (since Malayo-Chamic only includes Malayic and Chamic languages). Can we remove Proto-Malayo-Chamic from their "ancestors" on their language data? Alfarizi M (talk) 23:55, 30 March 2025 (UTC)Reply

    Hello @User:Fenakhay, can you help me with this? I can't edit the modules. Here are the module pages Module:languages/data/3/b (Balinese), Module:languages/data/3/s (Sasak and Sumbawa). Alfarizi M (talk) 09:13, 20 May 2025 (UTC)Reply
    @Alfarizi M: Done. I've put them under Category:Bali-Sasak-Sumbawa languages (according to Wikipedia). Is that correct? — Fenakhay (حيطي · مساهماتي) 09:39, 20 May 2025 (UTC)
    Thank you so much! And it's correct. Alfarizi M (talk) 09:41, 20 May 2025 (UTC)Reply

    April 2025

    Merge codes cir and meg

    In 2023, ISO 639-3 merged code meg "Mea" into cir "Tîrî". I propose we do the same. This should hopefully be non-controversial. I also propose we use the spelling "Tiri" without accents, which is more common in the literature (which uses "Tiri" or "Tinrin" over "Tîrî" or "Tĩrĩ"). The only reason the Wikipedia article is at Tîrî language is because Kwami moved it there; putting random accents in Wikipedia article names is his m.o. Benwing2 (talk) 07:05, 7 April 2025 (UTC)Reply

    Support a merger. For the name, poking around Google Books and Google Scholar, the spelling I see most often is Tinrin, but I defer to you if Tiri is more common in more recent or more linguistic works, which I did not have time to try to quantify. - -sche (discuss) 18:32, 10 April 2025 (UTC)Reply
    I don't actually know whether Tiri or Tinrin is preferred; I was just expressing my dispreference for the forms 'Tîrî' (as Wikipedia has it) or 'Tĩrĩ' (as it also is written). I think we should go with 'Tinrin', which in any case is less likely to clash with other languages (for example, the northern Somali dialect is also known as 'Maxaa Tiri'). Benwing2 (talk) 19:10, 10 April 2025 (UTC)Reply

    Treat Category:E language as a Category:Tai languages

    Wikipedia claims that E language is a "Tai–Chinese mixed language".

    Luo & Deng (1998) (which argues for that mixed language stance) identifies 53 out of 98 Swadesh list vocabulary as Kra-Dai (which is still a majority of the vocabulary), and 33 out of 98 as Sinitic, but the latter includes several words that are miscategorised e.g. sɔŋ¹ (purportedly from  / ) is from *soːŋᴬ and ultimately from Middle Chinese (sraewng), ku¹ (purportedly from ()) is from *kuːᴬ.

    {{R:eee:Wei & Wei 2011}} suggests that the many supposedly Sinitic features in E suggested by Luo & Deng (1998) can also be found in other Tai or Kra-Dai languages. They further propose that E actually constitutes a third group of Zhuang, but I don't find this extremely convincing.

    Overall my impression is that E is just a Tai language with a very strong Sinitic influence, and I suggest that we simply set the parent of E eee to Tai tai. (with the added benefit of only having to link to the Proto Tai entry instead of listing cognates) – wpi (talk) 18:34, 12 April 2025 (UTC)Reply

    Meh. It seems impossible to tell whether it is underlyingly a Tai language which has heavily mixed with Chinese, or a Tai-Chinese mixed language. I have no strong feelings, but it seems like it should be possible (and if it is not currently possible, we should make it possible) to say that a term in a mixed (e.g. Tai-Chinese) language derives from a (e.g. Tai) protolanguage root—and link there for cognates—even if we don't reclassify the language as a descendant of solely that protolanguage. - -sche (discuss) 03:59, 19 April 2025 (UTC)Reply
    @-sche: I don't have very strong feelings about this, but I simply find that (a) from a linguistic perspective, the mixed language argument is less convincing than the other – by the same logic one could say that English is a mixed language due to its large French/Latinate vocabulary and French-influenced morphology (well, there are some who take such a view, but the general consensus is that English is a Germanic language), and (b) from an editing perspective, it will be easier to work on etymologies for a normal language (as opposed to a mixed language); see for example the often inconsistent etymology template usage in our pidgin and creole entries. – wpi (talk) 13:49, 20 April 2025 (UTC)
     Support. With languages lacking historical documentation there is often not a solid line to be drawn between mixed language vs. creole vs. heavy loanage vs. substratum effect, et cetera, but if a reasonably clear leaning can be discerned and is being commented on in literature, there should be no issue (for our purposes) treating it as straight inheritance. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 08:29, 21 April 2025 (UTC)Reply

    Rename Wára language [tci] to Upper Morehead

    Wikipedia has the language at Upper Morehead language, which is likely to be more distinct of a name than Wára (there are two other Wara languages given in Wikipedia), but even more to the point, the name Wára actually refers to one of the dialects of this language, not to the language itself. Although Ethnologue (and hence ISO 639-3) appears to use the term Wára for the language as a whole, Glottolog calls it Anta-Komnzo-Wára-Wérè-Kémä based on the five identifiable dialects. The only comment in Wikipedia about this language is:

    Upper Morehead, also known as Wára, is a Papuan language of New Guinea. Varieties are Wára (Vara), Kómnjo (Rouku), Anta, and Wèré (Wärä); these are divergent enough to sometimes be listed as distinct languages.

    So maybe at some point some of the dialects will be split into separate languages but at this point given the single ISO code and Glottolog's view, I would keep as a single language and use a term that does not match any individual dialect. @-sche? Benwing2 (talk) 05:53, 14 April 2025 (UTC)Reply

    Oof, the fact that not one but two dialects/lects of this language are sometimes spelled Wara (give or take some diacritics) seems confusing, but Wikipedia says "Upper Morehead" is also polysemous, sometimes denoting Arammba instead. I will try to find out how commonly it denotes this language vs Arammba. (Exonymic placename language names like "Upper Morehead" always feel a little bit kludgy to me, but sometimes it can't be avoided.) - -sche (discuss) 03:59, 23 April 2025 (UTC)Reply

    Add Hanlao language

    Hanlao language (漢佬話 or 旱澇話 in Chinese, both romanised as Hanlao) is spoken in the northern parts of Qinzhou, Guangxi, China. The primary sources are Luo (2016) (Bulletin of Chinese Linguistics #9 pp121-150, accessible via https://www.academia.edu/59519239/) and the Qinzhou City Annals.

    Its affiliation is unclear; most sources claim that it is either a Zhuang-ised Sinitic language or a mixed language between Tai and Sinitic. However, based on the description in Luo (2016) (e.g. 57 out of 97 Swadesh list words are cognate with Zhuang and 22 out of 97 with Sinitic), I believe the case is likely similar to Category:E language above, i.e. Hanlao is a heavily Sinitised Tai language. At any rate, it is clearly distinct from other Sinitic or Tai languages.

    There is no ISO code, so I propose tai-han. – wpi (talk) 18:37, 14 April 2025 (UTC)Reply
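
    For comparison, the Swadesh-list counts quoted for E (Luo & Deng 1998, as cited above) and for Hanlao (Luo 2016) can be turned into percentages. The following is only a rough Python sketch of that arithmetic; the `COUNTS` table and the `share` helper are names made up for the illustration, and the "other" figure is simply whatever remains after the two quoted counts.

```python
# Counts as quoted in the two threads: (Tai/Kra-Dai, Sinitic, list size).
COUNTS = {
    "E": (53, 33, 98),
    "Hanlao": (57, 22, 97),
}


def share(part, total):
    """Percentage of `total` made up by `part`, rounded to one decimal."""
    return round(100 * part / total, 1)


if __name__ == "__main__":
    for name, (tai, sinitic, total) in COUNTS.items():
        other = total - tai - sinitic  # items assigned to neither group
        print(name, share(tai, total), share(sinitic, total), other)
```

    On these figures, the Tai share is a majority of the list in both cases, and the Sinitic share is smaller for Hanlao than for E.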

    Support on adding the language; weak support on Tai inheritance: agree in principle per E above, but this specimen seems to have received less relevant coverage in literature. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 08:30, 21 April 2025 (UTC)
    Support adding it; neutral on how to classify it. I tried to do my part / due diligence and look for sources about it (so things don't just sit on this page getting little input); the few mentions of it I could find do support that it is a language (as opposed to e.g. a dialect of another Zhuang language; a main concern whenever anyone proposes to add a new language is to be sure it isn't already / better covered as another language). Luo's paper suggests that more central vocabulary is Zhuangic and more peripheral words come from Cantonese, Pinghua, and Hakka, yes? which would perhaps suggest it is indeed underlyingly Tai. - -sche (discuss) 23:03, 21 April 2025 (UTC)Reply

    Add Podlachian Language

    Podlachian is an East Slavic language spoken between the Narew and Bug rivers. It has its own website and a Wikipedia article, but there isn't anything about it in Wiktionary. Could I create some entries for it here? PGałązka (talk) 10:34, 22 April 2025 (UTC)

    @Underfell Flowey @AshFox @Ssvb @Sławobóg @Thadh @Benwing2 as users with any sort of regular contact with East Slavic. Vininn126 (talk) 10:53, 22 April 2025 (UTC)Reply
    Ok. I have any sort of regular contact with East Slavic too, I'm a half-Podlashuk. PGałązka (talk) 13:27, 22 April 2025 (UTC)Reply
    @PGałązka: One of the biggest problems is that the w:Podlachian language doesn't seem to have the ISO 639-3 code yet (look at the "Proposal for several languages without ISO codes" topic above). Also there are questions about the availability of citations in durably archived sources and about the number of potential Wiktionary contributors in this language. If it's just you alone and you eventually lose interest, then the Podlachian content may become a liability. Additionally, your "half-Podlashuk" self-assessed status is not very reassuring, as there were some hot topics Wiktionary:Beer_parlour/2025/April#Prohibit_AI-generated_content and Wiktionary:Beer_parlour/2025/April#Formally_allowing_removal_of_Babel_boxes_by_other_users_if_proficiency_is_contradicted recently. These are the details that would be useful to clarify. --Ssvb (talk) 14:24, 22 April 2025 (UTC)Reply
    "Once again they want to divide Ukrainian dialects into separate micro languages...". Actually I'm not against adding Podlachian, and even West Polesian... but East Slavic languages should be tidied up... the tree of East Slavic languages should ideally look like this (see below), which would best match reality... under such conditions I am only for adding Podlachian and West Polesian... and not any other inadequate options with "attempts to deduce Podlachian from the times of Kievan Rus". — AshFox (talk) 14:35, 22 April 2025 (UTC)
    * East Slavic:
    ** Old East Slavic:
    *** Middle Russian: [etym-only]
    **** Russian:
    *** Old Ruthenian:
    **** Middle Belarusian: [etym-only]
    ***** Belarusian:
    **** Middle Ukrainian: [etym-only]
    ***** Carpathian Rusyn:
    ***** Podlachian:
    ***** Ukrainian:
    ***** West Polesian:
    ** Old Novgorodian:
    *** Old Pskovian: [etym-only]
    
    OK. But I don't understand. Can I add this language or not? PGałązka (talk) 17:28, 23 April 2025 (UTC)
    I'm pretty sure only template editors and admins can add a language. —Mahāgaja · talk 20:18, 23 April 2025 (UTC)Reply
    I don't think that's what they were asking. @PGałązka: On a technical level, if you want to add Podlachian entries, it needs to be added to Module:languages (which, yeah, only a template editor or admin can do), but you need consensus from other people who make entries in East Slavic languages before it's added, hence the pings above. You should wait for their input; in my experience though it might be a little difficult to split off a new language code, since you'll also have to go through Ukrainian entries and decide whether they fall under Podlachian or not. Saph (talk) 21:59, 23 April 2025 (UTC)
    Sorry to revive this nearly-month-old thread, but my two cents as someone who does stuff with Carpathian and Pannonian Rusyn (formerly also Belarusian), has no ethno-linguistic dog in the fight, and has had a bit of contact with Podlachian media: I wouldn't classify Podlachian under Ukrainian or Belarusian. Old Ruthenian is as far as I'd confidently go in terms of ancestors, but Podlachian displays both features of Belarusian and Ukrainian, most notably akanie and /d͡zʲ/ from the Belarusian perspective, but also a greater prominence of /ɫe/ from the Ukrainian perspective. Not to mention that different varieties classified under the broad umbrella of Podlachian display different degrees of Ukrainian and Belarusian characteristics. The Maksymiuk standard of Podlachian largely doesn't take akanie into account for example (like Ukrainian), but Niczos from Sw@da x Niczos sings in a variety that has akanie (like Belarusian).
    Nonetheless I think separate classification is still a good idea, precisely because of this etymological ambiguity as to whether it belongs under Belarusian or Ukrainian or both or neither. In addition, Podlachian seems to be written in both the Latin and Cyrillic scripts (contrary to Maksymiuk's best efforts), and classifying them under either Belarusian or Ukrainian would just clog up the "Belarusian/Ukrainian terms spelled with X" categories, as it already is doing. Instead, one could look towards Serbo-Croatian as an example, and list Podlachian as written in both Cyrillic and Latin so it doesn't generate a million "spelled with" categories. Of course it does need to have a distinct ISO code first, or some code needs to be invented for classification within Wiktionary.
    @Ssvb: about availability of citations: the Wikipedia page indicates that there are several texts in Podlachian being published regularly, as well as novels, poetry and memoirs. That's more potential citations than the entirety of Solombala English, which seems to rely entirely on a small handful of sentences from the 1800s for actual usage. My concern is that relying on the Svoja.org website too much would create a disproportionate image and under-represent certain varieties of Podlachian.
    But that's just my two cents. Insaneguy1083 (talk) 11:22, 19 May 2025 (UTC)Reply
    @Insaneguy1083 Thanks for taking a look and posting your opinion. I also have done my own research of the available public information, but I still would like @PGałązka to first provide a lot more details (their self-assessed language competence and the geographical location of the place where they learned it, since there are many local variants), and then make a practical proposal for their vision of how things should be preferably handled. Ssvb (talk) 04:54, 20 May 2025 (UTC)Reply

    May 2025


    Rename Kulon-Pazeh [uun] to Pazeh–Kaxabu?


    “Kulon–Pazeh” refers to a linguistic subgroup containing Pazeh and the extinct Kulon (if it ever existed as a language). AFAIC this is obsolete terminology. The Kaxabu people are culturally related to the Pazeh. Should we use the modern term Pazeh–Kaxabu, as on Wikipedia, instead? I find it a bit confusing to use Kulon-Pazeh. Chihunglu83 (talk) 02:22, 10 May 2025 (UTC)

    Update: code uun is now a retired code; it was split into pzh and uon in 2022, while Kaxabu has not been recognized as a language. Chihunglu83 (talk) 15:09, 14 May 2025 (UTC)

    Old Slovene


    I propose to add a new South Slavic language: Old Slovene.

    Pinging @Vininn126, Linyker¹²³, Chihunglu83. Sławobóg (talk) 12:32, 14 May 2025 (UTC)Reply

    So far I am of two minds - the very early attestation is indeed noteworthy. There are a few other issues I have.
    1. It seems to be an exceedingly small number of lemmas.
    2. There are indeed unique features, but enough that we couldn't modify Slovene structure to support it?
    Linyker mentioned on the discord he wants to do some reading up on the subject matter soon. Vininn126 (talk) 07:17, 15 May 2025 (UTC)Reply
    I would not oppose as long as someone edits it. Chihunglu83 (talk) 16:29, 15 May 2025 (UTC)Reply
    I once suggested adding Old Slovene a year or more ago, but for a different reason. I'm glad someone else suggested this idea again. I support it! (although I understand that with my zero reputation, no one will take my opinion into account). AshFox (talk) 00:47, 21 June 2025 (UTC)Reply
    Some information: 1) w:Slovenes#History, 2) w:Carantanians#Language, 3) w:Slovene dialects#Evolution. AshFox (talk) 18:46, 21 June 2025 (UTC)Reply
    I made a template for quotes from "Freising manuscripts": {{RQ:zls-osl:BS|3|40|||}}
    AshFox (talk) 00:08, 23 June 2025 (UTC)Reply
    I added a dictionary template: {{R:zls-osl:BS}}
    AshFox (talk) 11:49, 23 June 2025 (UTC)Reply
    @Benwing2, good day. Help us, "we are stuck in this hole" and can't move further... Sławobóg and I would still like to get a separate language code for Old Slovene and together with him I will formalize in Wiktionary all the known lemmas (there are about half a thousand of them and the task has a final goal) of this unique, very archaic Slavic language of the 900s AD, which existed parallel to Old Church Slavonic. I hope you find some free time to give all this attention. Best regards, AshFox (talk) 13:52, 22 June 2025 (UTC)Reply
    I made a guideline. Sławobóg (talk) 20:59, 22 June 2025 (UTC)
    Respectfully, having read your guideline I don’t think you can present a full phonemic inventory with IPA values confidently assigned just like that. It’s bad philology and already contains some questionable elements, e.g. How do we know they pronounce PS *ť as [t͡ɕ] specifically?
    Next there is the issue of using your own transcription, whereby even assuming it’s a good alternative (I am not convinced) it probably violates wikipedia’s policy of own research. Wikipedia isn’t a place to publish unsourced speculation, but rather condense, summarise and make accessible research and academic dialogue on a given topic.
    This, coupled with your other comments in this thread, gives me the impression you’re pushing for this out of the novelty of having a new language added to Wiktionary rather than actually being able to sufficiently expand on the topic in its own right. As such I’m opposed to you doing this and think you should reconsider your position. Galloglach21 (talk) 16:12, 20 July 2025 (UTC)
    "How do we know they pronounce PS *ť as [t͡ɕ] specifically?." - this is what dictionary authors tell us, they analysed it and probably compared it with other early Slavic languages and texts, I suggest reading about Polabian;
    "it probably violates wikipedia’s policy of own research." - this is Wiktionary and not Wikipedia, we allow original research as long as it's good, one user created a new transcription for Slovincian and there is no problem with it;
    Any other arguments? Sławobóg (talk) 16:27, 20 July 2025 (UTC)Reply
    Oh, and ť was pronounced as [tʲ]. Sławobóg (talk) 16:30, 20 July 2025 (UTC)Reply
    In this very thread I comment on that transcription that I in fact did not create, but based on sourced material. Get your facts straight. Vininn126 (talk) 07:29, 21 July 2025 (UTC)Reply
    Yes, but you modified it, and that sourced material is not widely used/accepted. And as I said somewhere, my transcription is similar to already existing ones, just more precise. Sławobóg (talk) 17:05, 21 July 2025 (UTC)Reply
    I added a single letter for an uncovered morpheme. Vininn126 (talk) 20:20, 21 July 2025 (UTC)Reply
    @AshFox I don't have the requisite background to know whether this is a good idea or not. I do know we have an awful lot of Slavic and Baltic languages and I want to make sure adding another one is the right thing to do. @Vininn126 @Thadh thoughts? Benwing2 (talk) 21:16, 22 June 2025 (UTC)Reply
    I'm not very knowledgeable either. I think it may be best to finish the above-linked guideline and work out the issues before adding the code, but other than that I can't really say whether this code makes sense or not. Thadh (talk) 21:27, 22 June 2025 (UTC)Reply
    Guideline is finished. Sławobóg (talk) 06:57, 23 June 2025 (UTC)Reply
    Why aren't we using a scholar's transcription? One is sourceable and reflects scholarly work on the field. Vininn126 (talk) 07:12, 23 June 2025 (UTC)Reply
    There are many transcriptions; the website mentions just 2 of them. Igor Grdina's transcription is not that good: it's close to the original orthography, and for example ⟨s⟩ represents [ʒ] and ⟨z⟩ can represent [s] and [z], which is annoying; ⟨u⟩ represents [u] and [ɔ̃], ⟨c⟩ can be [t͡s] or [t͡ʃ], etc. My transcription fixes all these problems, and is close to the transcription made by Alexandr Vasiljevič Isačenko, to which I have no access anyway. Sławobóg (talk) 11:11, 23 June 2025 (UTC)
    @Sławobóg: Look at English: orthographies don't have to be phonetic, they just have to be consistent. Thadh (talk) 11:20, 23 June 2025 (UTC)Reply
    I know, but there are far fewer letters used than in English. And why would we prefer this transcription over Isačenko's? The transcription I made is better, easier to work with, and more compatible with modern Slovene, like Slovincian is with Kashubian. Plus it's going to be only me and AshFox who are going to be working on this language. Sławobóg (talk) 11:28, 23 June 2025 (UTC)
    The Slovincian transcription is based on one, in fact the only one, used by some people who have worked with the lect. Vininn126 (talk) 11:40, 23 June 2025 (UTC)Reply
    Nice. And my eťe bi dět naš ne segrěšil is very close to Isačenko's eťe bi dêt naš ne sɘgrêšil. Sławobóg (talk) 11:53, 23 June 2025 (UTC)Reply
    Clarification for those who did not understand, the example of transliteration of Исаченко is taken from here w:pl:Zabytki fryzyńskie#Transkrypcje. AshFox (talk) 12:14, 23 June 2025 (UTC)Reply
    We are ready. Sławobóg (talk) 19:59, 23 June 2025 (UTC)Reply
    @Sławobóg, I suggested yesterday that perhaps we should preserve such a feature of the original text's spelling as the method of conveying Proto-Slavic *y through the digraph ⟨ui⟩ [ɨ]. It looks like it was inspired by Old Church Slavonic with its ъ (ŭ) + і (i) > (ŭi = y). Examples from the text: ⟨buiti⟩ w/ alt. form ⟨biti⟩ “to be” (OCS бꙑти (byti), PSl *byti) or ⟨mui⟩ “we” (OCS мꙑ (my), PSl *my). Instead of the ⟨byti⟩ and ⟨my⟩ you suggested. Moreover, there are not many such words. And in the original text the letter ⟨y⟩ is not used even once. AshFox (talk) 12:10, 23 June 2025 (UTC)Reply
    No reason to keep it. Sławobóg (talk) 12:50, 23 June 2025 (UTC)Reply
    I also commented on discord that I'm not sure this needs a split or not and that more research is needed. It might be possible to include this in Slovene, maybe not. Vininn126 (talk) 21:32, 22 June 2025 (UTC)Reply
    Slovene was and still is a heterogenous linguistic grouping. To build a prescriptive account of the whole Old Slovene language based on a single document from a single dialect is methodologically ungrounded. At the very least, it has to be clarified that listed entries are in the Old Carinthian dialect. Even though no other dialects have been attested (so far), they certainly did exist. Безименен (talk) 10:14, 28 June 2025 (UTC)Reply
    How do you know it was still a heterogenous group? Anyway, I think this is not how we treat languages on Wiktionary. Even if an old lect has limited geography/texts or is really an early form of a dialect, we still have it as the old form of the main language in our language tree. It is stated in the guideline what exactly Old Slovene is for us. And it's not just for us; the language of these texts is commonly referred to as Old Slovene. We shouldn't name it another way "because there were other lects that are not attested". Sławobóg (talk) 09:31, 11 July 2025 (UTC)
    Based on what Sławobóg just replied, I am strong  Oppose for his approach towards the topic.
    I cannot speak from the name of all linguists out there, but a short search in the literature showed that (if not a majority, at least) many scholars think of the Freising manuscripts as AN early form of Slovene, not THE only form of Old Slovene. Drawing conclusions about a whole based on the properties of an individual member is a type of sampling bias, which often leads to composition fallacy and hasty generalization.
    This may be alright in the absolutist mindset of Sławobóg, but I don't think it's the correct way of dealing with things. Claiming that Old Slovene, in its entirety, preserved nasals or made phonemic distinction between Proto-Slavic ě vs e, because a single document allegedly showed it (btw, what's the proof of that), is methodologically ungrounded.
    If he wants to provide descriptive account of the Freising manuscript, let him do it. Making prescriptive interpretations, however, could lead to more harm than good. Безименен (talk) 06:36, 14 July 2025 (UTC)Reply
    This is not how languages work, especially here, bro. Sławobóg (talk) 09:06, 14 July 2025 (UTC)Reply
    I knew from the beginning how mediocre your grasp of linguistics is. Not sure why I even tried to engage in constructive conversation. 2A02:C7C:3848:1700:4434:D74:1E0A:DA16 16:44, 14 July 2025 (UTC)
    Good points. What about "Freising manuscript Slovene"? Tollef Salemann (talk) 17:20, 20 July 2025 (UTC)Reply
    This is literally what we're debating about. Linyker¹²³ (talk) 17:23, 20 July 2025 (UTC)Reply
    But how is Slovene not a heterogenous group? Anyway, this "Freising manuscript Slovene" looks more like "Mishnaic Hebrew" or "Medieval Dalecarlian" kind of language, not of L2-level, but very important. Tollef Salemann (talk) 17:39, 20 July 2025 (UTC)Reply
    What do you mean by "Good points"? He showed no evidence or arguments, he just did "a short searching in the literature"? Meanwhile our (Proto-)Slavic dictionaries {{R:sla:SP}}, {{R:sla:ESSJa}}, {{R:sla:EDSIL}} label it as Old Slovene. On top of that, {{R:cu:gorazd}} labels it as Church Slavonic (Derksen does it once too).
    "But how is Slovene not a heterogenous group?" If it always was heterogenous group then it is another argument FOR having it as L2 Old Slovene...
    I want to mention that he insulted me and then changed his message (no rules about that?). He did something like that before, and like before he actually never explained what's wrong with what I'm saying, just adding empty words like bias or generalization in nonsensical situations, or using political arguments against having ety-code for Moravian. :)
    "scholars think of the Freising manuscripts as AN early form of Slovene, not THE only form of Old Slovene" - what does that even mean for us? Nothing. We work with attested languages (not counting reconstructed ones). We have Old Polish, not Old East Lechitic, we have Old Church Slavonic as parent for Bulgarian/Macedonian and Sanskrit as parent for most Indian languages. Freising manuscripts are only Old Slovene texts, and that is enough for Wiktionary to have it as Old Slovene. What are arguments for having it as label of Slovene? Freising and modern Slovene are more different than Old Polish and Polish, should we merge both? I want to mention that current only Slovene editor supports new L2. Sławobóg (talk) 17:34, 21 July 2025 (UTC)Reply
    If these sources call it Old Slovene, then it is the official name. I personally don’t like it for the reasons mentioned above, but you are right, as they indeed call it Old Slovene, so let’s call it Old Slovene. The idea of a new L2 otherwise seems fun and may be useful for etymology sections. Also, IPA is necessary here, because the spelling looks very different from the expected pronunciation, but it is reconstructed IPA. How to deal with such stuff? When we have a recorded word, but no known pronunciation (like in Russenorsk olenamann). May it be a problem? Tollef Salemann (talk) 05:59, 22 July 2025 (UTC)
    We have reconstructed IPA for many dead languages, compare Polabian (similar situation to OSl), including automatic templates, like Old Czech, Old Polish, Old East Slavic, Old Greek, etc. Sławobóg (talk) 07:22, 22 July 2025 (UTC)Reply
     Oppose per Bezimenen. Thadh (talk) 21:04, 26 September 2025 (UTC)Reply
     Support we already have enough IMO Linyker¹²³ (talk) 13:27, 18 July 2025 (UTC)Reply
    Bump. Sławobóg (talk) 16:03, 31 August 2025 (UTC)Reply
     Support I support the addition of Old Slovene. Lerman (talk) 11:37, 26 September 2025 (UTC)Reply
    I prefer keeping it to etymology and descendant sections with langcodes sl, cu, und, or none, as it is one underdefined manuscript. I.e. no L2 sections. We haven't even seen examples for the alternatives due to the scanty Neo-Slovene coverage. The Palaeo-Slovene links don't work any more right now. Fay Freak (talk) 21:17, 26 September 2025 (UTC)Reply

    Add Khuzestani Arabic


    Spoken in Southern Khuzestan, Iran, as a branch from Iraqi Arabic with influences from Gulf Arabic, Luri and Persian. Around half a million speakers.

    • Wikipedia article: Khuzestani Arabic
    • No ISO code, acm-IR in IETF but we could use acm-ira like fa-ira

    Saam-andar (talk) 17:24, 25 May 2025 (UTC)Reply

    IMO we have too many Arabic L2's already and don't need another one, esp. as Wikipedia explicitly describes this as a dialect of Gilit Mesopotamian Arabic and not its own language. Benwing2 (talk) 19:34, 6 June 2025 (UTC)Reply
    Hmmm actually are you proposing this to be an etym-only language? That's probably OK. Benwing2 (talk) 19:36, 6 June 2025 (UTC)Reply
    @Benwing2 Fair enough, would be thankful to have it as an etym-only. Saam-andar (talk) 12:11, 7 June 2025 (UTC)Reply
    I've already responded to this request on Discord. It is merely a subdialect of Gelet-type dialects which are grouped under Iraqi Arabic. — Fenakhay (حيطي · مساهماتي) 23:36, 6 June 2025 (UTC)Reply

    June 2025


    Adding codes for Ohlone and Miwok families and proto-language reconstructions


    According to the book Ohlone/Costanoan Indians of the San Francisco Peninsula and their Neighbors, Yesterday and Today, the term Utian is derived from Proto-Costanoan uţxi ("two") + -ian, created by William F. Shipley in 1978. I wanted to add that to the Etymology section on Wiktionary, but Wiktionary doesn't have an appropriate language code.

    "Proto-Costanoan", more accurately Proto-Ohlone, is a lower-order reconstruction of Proto-Utian (nai-utn-pro), and the reconstructed proto-language of the Ohlone languages; its numerals were in 1990 reconstructed by Catherine A. Callaghan, who has also referenced it in her other publications. Callaghan uses the name "Proto-Costanoan", but the standard modern-day term for the family is Ohlone, and the term "Proto-Ohlone" is often used nowadays.

    With that in mind, I'd like to request that the Proto-Ohlone language be added to WT:LOL/S, with the code nai-ohl-pro. The Ohlone language family is not currently on Wiktionary, so I'd also like to request that a corresponding code be added to WT:LOF, with the code nai-ohl. For completeness, I also request the addition of the Miwok family (nai-miw) and Proto-Miwok (nai-miw-pro, also worked on by Callaghan), the other major subdivision of the Utian languages. Ookap (talk) 19:40, 12 June 2025 (UTC)Reply

    @Ookap I moved your requests to WT:LTR as that is where these sorts of requests are normally made. I don't know anything about Ohlone or Miwok or even who to ping other than @-sche; you might poke around to see who has edited terms in these families and ping them. Benwing2 (talk) 20:43, 12 June 2025 (UTC)Reply
    Thanks! I've updated Help:Adding and removing languages to mention this page instead of BP. Ookap (talk) 20:51, 12 June 2025 (UTC)Reply
    @Benwing2: I'm certainly not an expert (my main interest is ethnobiology), but I'm familiar with pretty much all of the languages of California in very general terms, so I will often at least have an opinion on them. California has been inhabited for a long time, has lots of geological barriers (not to mention covering a very large area) and has been out of reach of all the pre-Columbian civilizations of the Americas, so the historical linguistics is very complicated and hard to resolve into anything large-scale. You have some families that are better known elsewhere, like Uto-Aztecan, Na-Dene and Algic, but then you have a number of isolates and smaller families. There were a couple of ambitious proposals in the early days, the Hokan languages and Penutian languages, that still haven't been proven on the highest level, but there's been some progress on demonstrating the validity of many of the parts.
    The Utian languages are one of those "Penutian" parts where progress has been made. The Miwokan languages have always been accepted as a valid group and the Ohlone languages (I've always known them as Costanoan) as well (with some debate as to whether the latter are languages or dialects). I don't know much on the substance, but it seems to me like Utian should be worthy of Wiktionary recognition, and maybe the Yok-Utian languages. — This unsigned comment was added by Chuck Entz (talk • contribs) at 04:59, 13 June 2025 (UTC).
    Adding family codes for Miwok and Costanoan/Ohlone seems reasonable, and adding Proto-Miwok and Proto-Costanoan—unfortunately, I can find very few sources calling it "Proto-Ohlone" (which may mean we should also call the family "Costanoan" for consistency). - -sche (discuss) 03:03, 14 June 2025 (UTC)Reply
    IMO given the weight of sources we should be using "Costanoan" unless there is strong evidence of a recent shift towards "Ohlone", and the name of the proto-language needs to match the name of the family unless there's a really good reason for the divergence (which I don't see here). Benwing2 (talk) 03:18, 14 June 2025 (UTC)Reply
    I agree with Chuck Entz in that these are pretty accepted groupings. People aren't completely sure about Penutian and Yok-Utian (and I find at least Penutian a bit dubious), but Utian has long been very proven and accepted (and, along with Proto-Utian, is in fact already on Wiktionary). Similarly, the Miwok (or Miwokan) and Ohlone language families are very clearly accepted groupings, and for me should be on Wiktionary.
    With regard to whether to use "Ohlone" or "Costanoan"...unfortunately, almost all sources on Proto-Ohlone (Proto-Costanoan) are from the 1990s, meaning they use the name "Costanoan". Living in the area nowadays, I can say that the name "Costanoan" has completely fallen out of use for the ethnic group and language family, perhaps partially as part of recent efforts to revitalize their culture and languages. Most people here would likely not know what "Costanoan" meant, but know "Ohlone" well, and from what I know Ohlone people, while they might know the word, disclaim it as a colonizer term—even Wikipedia lists the ethnic group as "formerly known as Costanoan". With that said, given formal linguistic sources, most of which are older, mostly calling the language family Costanoan, I suppose I can understand why Wiktionary might want to call it that. My personal preference having grown up in the homeland of the Ohlone leans heavily toward "Proto-Ohlone" and "Ohlone languages", but my more important personal preference is that the proto-language is added, no matter the name. Ookap (talk) 08:07, 20 June 2025 (UTC)Reply

    Adding code for Proto-Ainu


    Several Ainu entries (such as プリ and アㇷ゚ト) show derivations from Proto-Ainu, but there's no corresponding code on WT:LOL/S so they can use templates as is the norm. Proto-Ainu, the reconstructed proto-language of the various Ainu dialects (or of the Ainuic family, already in Wiktionary as qfa-ain), has been reconstructed by Vovin (if not others), and we even have an appendix of reconstructions. Therefore, I request a code, likely ain-pro or qfa-ain-pro, be added, so Ainu entries can use proper Wiktionary formatting. Ookap (talk) 19:52, 12 June 2025 (UTC)Reply

    request from User:AmazingJus: Updating languages found in Module:languages/data/2 and Module:languages/data/3/k


    (moved from User talk:Theknightwho)

    Hi, could you update the data for these two languages to add a bit more flexibility?

    For Ewe (ee), it’d be great if diacritics were stripped at the entry level, specifically acute, grave, caron and circumflex. They correspond to high, low, rising and falling tones respectively.

    For Krio (kri), likewise, remove diacritics (but only acute, grave, circumflex) and also add a sort key with the following order:

    • ɛ after e
    • gb after g
    • kp after k (digraphs gb and kp are both treated as separate phonemes)
    • ɔ after o
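
    A minimal sketch of the two requests above, written in Python rather than the Lua used by Module:languages. The function names, the alphabet order apart from the four stated placements, and the sample strings in the usage note are assumptions for illustration only; none of the sample strings is claimed to be a real Ewe or Krio word.

```python
import unicodedata

# Combining marks named in the request.
ACUTE, GRAVE, CIRCUMFLEX, CARON = "\u0301", "\u0300", "\u0302", "\u030C"
EWE_TONE_MARKS = {ACUTE, GRAVE, CIRCUMFLEX, CARON}
KRIO_TONE_MARKS = {ACUTE, GRAVE, CIRCUMFLEX}


def strip_marks(text, marks):
    """Remove only the given combining marks; other diacritics survive."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)


# Assumed order: the basic Latin alphabet plus the four requested placements
# (ɛ after e, gb after g, kp after k, ɔ after o).
KRIO_ORDER = "a b c d e ɛ f g gb h i j k kp l m n o ɔ p q r s t u v w x y z".split()
KRIO_RANK = {unit: rank for rank, unit in enumerate(KRIO_ORDER)}


def krio_sort_key(term):
    """Build a sort key in which the digraphs gb and kp count as single units."""
    text = strip_marks(term.lower(), KRIO_TONE_MARKS)
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in KRIO_RANK:
            # Either a digraph, or the final single letter of the string.
            key.append(KRIO_RANK[pair])
            i += 2
        else:
            # Unknown characters sort after the whole alphabet.
            key.append(KRIO_RANK.get(text[i], len(KRIO_RANK) + ord(text[i])))
            i += 1
    return tuple(key)
```

    With this key, the made-up strings "ga", "go", "gbo", "ha" sort in that order, whereas a plain codepoint sort would put "gbo" before "go".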

    Cheers heaps — oi yeah nah mate amazingJUSSO ... [ɡəˈdæɪ̯]! 01:11, 9 June 2025 (UTC)Reply

    Also pinging user @Fenakhay — oi yeah nah mate amazingJUSSO ... [ɡəˈdæɪ̯]! 22:44, 11 June 2025 (UTC)
    Before implementing this, I'd like to hear some confirmation from other knowledgeable editors that these changes are correct, or at the very least, sources showing that (a) these diacritics are used in dictionaries, (b) the diacritics are not used in running text outside of dictionaries. Benwing2 (talk) 03:20, 14 June 2025 (UTC)Reply
    @Benwing2 For the Ewe language, the tone markings are based on Nuseline's Ewe-English dictionary and Basic Ewe for Foreign Students. In the latter source, it says "Note that native speakers of Ewe often leave the marking of tones aside. For learners of the language, however, the marking of tones is essential".
    For the Krio entries, the tones and letter orders are based on A Krio-English dictionary by Clifford Nelson Fyle. The Wikipedia article also says for the tones: "Three tones can be distinguished in Krio and are sometimes marked with grave (à), acute (á), and circumflex (â) accents over the vowels for low, high, and falling tones respectively but these accents are not employed in normal usage." — oi yeah nah mate amazingJUSSO ... [ɡəˈdæɪ̯]! 23:08, 14 June 2025 (UTC)Reply
    It seems like there isn't any update on this so far — feel free to have a look at these sources for reference @Benwing2 — oi yeah nah mate amazingJUSSO ... [ɡəˈdæɪ̯]! 01:22, 20 June 2025 (UTC)

    Levantine merger


    I found out from here that the split between North and South Levantine Arabic was dissolved by the ISO two and a half years ago. The 2023 discussion about merging them on Wiktionary didn't end up going anywhere because of a lack of available contributors, and I think that situation is even worse today, as the South Levantine Arabic project has left Wiktionary and North Levantine Arabic never sees much concerted activity.

    I think it's good for the Wiktionary merger to happen, but I have some concerns. I want to solicit ideas for making the work manageable, keeping in mind that it could easily fizzle out this time too.

    • After merging, I don't get how to account for all of Levantine. The expansion to Levantine Arabic adds a burden on everyone to know things about Levantine varieties they probably don't have knowledge of in order to make an entry complete.
    This burden is technically there now too, but I feel like the old ISO split creates small enough halves that it feels okay to get away with only focusing on one subvariety within those halves: North Levantine seems to mostly have had Lebanese Arabic contributors (just with a translit convention that's kind of? inclusive of Damascene) and South Levantine focuses on Palestinian because it was part of User:AdrianAbdulBaha's push to improve online resources for Palestinian Arabic. This feels harder to handwave away now.
    I do feel like focusing on as small of a comprehensive area as possible is how you get things done, which is why I'm concerned about expanding the scope of Levantine.
    • Relatedly I'm worried about the module/template infrastructure growing unmanageably large compared to the number of people available to maintain it (let alone maybe being unreadably spammy on transclusion). I'm lazily working on Module:User:Still, when you think about it/apc-IPA to account for some of the variation in what "North Levantine" was supposed to cover, where South Levantine Arabic never had something similar, but it seems like now it'd be good for an {{apc-IPA}} to also cover Palestinian and Jordanian varieties that I don't have very much knowledge of how to divide. The same goes double for {{ajp-conj}}.
    • Supposing the merger does go ahead, South Levantine Arabic has 3016 entries (128 non-lemmas) and North Levantine Arabic has 505 entries (9 non-lemmas). I feel like all of those entries will need to be checked for whether they're exclusively Palestinian/South Levantine, exclusively North Levantine, or shared in order to add the right term, sense, or accent labels.
    I think this has to be done by checking terms against published references (even for those of us currently active who do speak Levantine Arabic — at least for my part I don't know all of what's used and not used outside of my own dialect). Would it be of use to add some kind of "warning, this term needs to be assigned to a location — you can help by locating it in these references and then removing this warning" template under the L2 header of all 3k-ish merged entries to allow the effort to continue even as contributors come and go over time? (Just saw that this is what User:A455bcd9 suggested during the initial conversation as well)

    Might be overthinking. Pinging some Arabic contributors, including old ajp editors that may still be around: User:Fayçalmf, User:Fenakhay, User:Benwing2, User:SarahFatimaK

    Still, when you think about it (talk) 18:40, 15 June 2025 (UTC)

    @Still, when you think about it In my experience, mergers are always harder than splits, and you're running up against this reality. I think in practice it's fine to have a warning indicating that a given term was originally North Levantine or South Levantine and hasn't been assigned appropriate labels. I also think focusing on a limited set of dialects is sufficient, maybe just urban Syrian, Lebanese and Palestinian, or the three + Jordanian (I don't know how different Jordanian Levantine is from Palestinian Levantine). As for designing the templates themselves, we can maybe follow the approach of Occitan, which has been able to handle several dialects under one L2, and design the templates so that if someone knows the correct inflections for only a limited set of dialects, only those dialects get displayed. I can help with the coding aspects. There's also the Richard Harrell series of Syrian Arabic grammar and dictionaries, I know these are a bit old but generally I have found the series reliable. Benwing2 (talk) 18:55, 15 June 2025 (UTC)Reply
    Also, {{pt-IPA}} is able to handle several different Brazilian and European Portuguese dialects, and might produce some ideas as to how to handle the pronunciation differences. The general approach followed is to prefer a single spec that gives the maximal information (e.g. I know that some Levantine dialects have merged short ĭ and ŭ but others keep them apart; the "maximal information" would distinguish these two and the underlying code would merge them appropriately for the dialects that merge them), but allow different specs for different dialects. Benwing2 (talk) 18:58, 15 June 2025 (UTC)Reply
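    A minimal sketch of the "maximal information" idea described above, written in Python purely for illustration (not the language of the existing modules). The dialect labels, the `MERGERS` table, and the choice to merge short u into i are all assumptions made for the example, not an agreed inventory; the sample words are taken from elsewhere in this thread.

```python
import unicodedata

# Hypothetical merger tables: each dialect maps characters of the maximal
# spec onto what that dialect actually distinguishes. These two entries
# are illustrative assumptions only.
MERGERS = {
    "keeps i/u apart": {},        # conservative: nothing is merged
    "merges i/u": {"u": "i"},     # short u collapses into short i
}


def realize(spec, dialect):
    """Reduce a maximal-information spec to one dialect's form."""
    # NFC keeps long vowels such as ū as single code points, so the
    # short-vowel merger below cannot touch them by accident.
    spec = unicodedata.normalize("NFC", spec)
    return spec.translate(str.maketrans(MERGERS[dialect]))


def all_realizations(spec):
    """Return one form per known dialect from a single spec."""
    return {dialect: realize(spec, dialect) for dialect in MERGERS}
```

    Under these assumptions, a single spec such as jumle would yield jumle for the first dialect and jimle for the second; a dialect needing its own spec would simply bypass the table.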
    Finally, there is the issue of how to represent the script. All of the resources I'm familiar with use transcription, but Wiktionary prefers using the original script. I don't even know if there's a standard for how to represent the various dialects of Levantine Arabic in Arabic script, much less the specifics of how this works if it exists. Maybe you can help me understand this. Benwing2 (talk) 19:00, 15 June 2025 (UTC)Reply
    There's no top-down standard for dialects specifically, but in real life people write their dialects in the Arabic script, which I say is good for a non-specialist dictionary to reflect. (The exception is mostly Lebanese speakers in their early 30s and younger, who practically exclusively write in 3arabizi online, which would be good to document but has too many random variables). Descriptively/impressionistically, spelling matches Standard Arabic spelling, except for
    • stopped interdentals (always spelled with the plosive letter)
    • emphatics that all dialects have deemphasized (always spelled with the plain letter, not the emphatic letter, like ركد (rakad, to run))
    • feminine -i (almost always spelled ـي to match morphological reanalysis, as in ـكي ـتي)
    • 3ms -o, which Lebanese often spell ـو instead of ـه
    • other sound shifts that collapse a distinction that the Arabic script is supposed to indicate, where it's more correct to match the Fusha spelling but commonplace to respell phonetically (e.g. ق، ـوا، ـة, assibilated interdentals, emphatics in dialects that are losing emphasis across the board, etc)
    Still, when you think about it (talk) 14:25, 16 June 2025 (UTC)
    Speaking of standards, this reminds me that translits are an issue I forgot about. We don't have room to make them show different variants like we do with IPA. Would it honestly be acceptable to just do without translits? If not, I'm imagining a weird amalgam of different pronunciations, and it's a little awkward because even though they're trans-"liter"-ations they kind of suggest pronunciation info nevertheless:
    • بيض (bayð̣, eggs)
      In terms of pronunciation, the combo of -ay- and interdentals is rare, but in terms of translit that's the highest-info representation of what these letters spell
    • تلة (talle, hill)
      There are dialects with invariable ة (/⁠-a⁠/), but you can always derive that from the form of a dialect with ة (/⁠-a, -e⁠/) and not vice versa, hence the symbol -e here. (Or in the true spirit of translit do we want a special symbol only for ة?)
    • قبضاي (qabaḍāy, macho man)
      This is from an Ottoman Turkish /d/, which formally was loaned as /dˤ/, but some speakers with interdentals went on to associate this /dˤ/ with their /ðˤ/. Is the proper form قبضاي (qabaḍāy /⁠-ḍāy, -ð̣āy⁠/)? (I would prefer قبضاي (qabaḍāy /⁠qabaḍāy, qabað̣āy⁠/) but I don't want /q/ in |ts=)
    • تعتير (taʕtīr, tiʕtīr, miserable situation) ~ تعثير (taʕṯīr, tiʕṯīr)
      This interdental seems fine. The ت only spells t and the ث is only used for dialects with interdentals, in which it can spell . But I'm not sure how to smooth over the templatic variation in the first vowel. I guess a real "transliteration" would be تعتير (tʕtyr) ~ تعثير (tʕṯyr) (same goes for all examples, of course), but I don't think anyone would want that...
    • قاظان (qāẓān, water heater)
      (I actually thought nobody said this with an interdental, the IPA on that page is new to me. If they do, then قاظان (qāẓān /⁠-ẓān, -ð̣ān⁠/, water heater) in the same vein as قبضاي (qabaḍāy /⁠-ḍāy, -ð̣āy⁠/, macho man) above?)
    • ظروف (ẓrūf, circumstances)?
      Or ظروف (ẓrūf, ð̣rūf, circumstances), or ظروف (ẓrūf /⁠ẓrūf, ð̣rūf⁠/, circumstances)?
    • صغير (zḡīr, ẓḡīr, small) ~ زغير (zḡīr, ẓḡīr)
    Anything stand out as super wrong? The q feels a bit unfortunate because it's impossible not to try to pronounce it (and it's a minority pronunciation). I'm not sure how useful it is to have to make up our own WT:AR TR system like this vs. just not doing translits, if Wiktionary would allow that, and leaning only on the IPA available in entries. Still, when you think about it (talk) 15:32, 16 June 2025 (UTC)Reply
    We can remove the interdental pronunciation on قاظان if you're unsure about it. I'm not 100% either.
    About transliteration, we should probably stick to a standard. Probably one that matches urban South Levantine/urban Syrian. Alternate pronunciations can be represented by IPA, but having a standard would probably be more helpful. The amalgamation approach would probably be confusing. Fayçalmf (talk) 15:47, 16 June 2025 (UTC)Reply
    Okay, I think that's a better idea. Potential translit guidelines:
    • No imala, only ا (ā)
    • Only ق (ʔ) in native vocab
    • No interdentals, only ظ ذ ث (ẓ z s) and ض د ت (ḍ d t)
    • Distinguish lax from tense -e -i and -o -u as urban Syrians or urban Palestinians/Jordanians do (I guess by majority rule, e.g. I know a coastal Syrian guy with sani for سنة (year) but the default ought to be سنة (sine, sane, year))
      • What to do about -y -w? My own dialect has /maʃe/ مَشي (walking), /raʔe/ رأي (opinion), /ħelo/ حلو (sweet, pretty, nice), /ʒaru/ جرو (puppy) (MSA loan), but my understanding is these are all /-i -u/ in dialects that distinguish lax from tense final vowels. The Olive Tree Dictionary also gives ḥilew as an option for حلو, apparently.
        • Can we do مَشي (mašy, walking), رأي (raʔy, opinion), حلو (ḥilw, sweet, pretty, nice), جرو (jarw, puppy)?
        • Or just مَشي (maši, walking), رأي (raʔi, opinion), حلو (ḥilu, sweet, pretty, nice), جرو (jaru, puppy) and leave the details to the IPA? I actually like this better visually.
    • No diphthongs, except in cases where a dialect like Damascene will have them, like of course أو (ʔaw, or) and I think elatives like أوضح (ʔawḍaḥ, clearer, clearest) instead of ʔōḍaḥ
      • Can we actually just do diphthongs unconditionally? It wouldn't be faithful to most dialects' pronunciation but I'm wondering if it's an OK tradeoff.
    • Violate these guidelines if the word itself is in violation of them, especially e.g. if it's restricted to or in imitation of a dialect that has differing features
    Still, when you think about it (talk) 16:30, 16 June 2025 (UTC)
    I like these guidelines.
    I agree with maši, ḥilu instead of mašy, ḥilw.
    I'm not sure what to do about diphthongs. It could be a case by case basis like what Wikipedia does with British v American spellings & just leave it how the person who wrote the article put it in or we could standardise (only diphthongs vs no diphthongs except where they are in Damascene). Fayçalmf (talk) 17:17, 16 June 2025 (UTC)Reply
    Forgot one more thing to consider that's a whole headache of its own, which is the treatment of kasra and damma. I think when they're medial and closed or stressed we'd do best by trying to adhere to original i/u, as in:
    متل (mitl, like), ضفر (ḍufr, fingernail, toenail), صدر (ṣidr, chest), جملة (jumle, sentence), خلص (xiliṣ, he finished)?
    Quick review of other options, though:
    • Schwa them both like a lot of Lebanese and Syrian references do, but importantly not a lot of Palestinian Arabic references:
      متل (mətl, like), ضفر (ḍəfr, fingernail, toenail), صدر (ṣədr, chest), جملة (jəmle, sentence), خلص (xəleṣ, he finished)
    • Same but with ⟨i⟩ to be more inclusive of non-schwa-ing lects (like my idiolect, so this is just a personal pet peeve). This gets a bit strained when it comes to terms like بكرة ("bikra", bukra) that are still predominantly with u.
      متل (mitl, like), ضفر (ḍifr, fingernail, toenail), صدر (ṣidr, chest), جملة (jimle, sentence), خلص (xileṣ, he finished)
    • Adhere to one's own lect, but at least in my case this isn't of much use: I mostly merge to kasra outside of some sporadic retentions of damma, I systematically round this vowel around emphatics, and I don't feel like I have schwa.
      • If I were using ⟨i u⟩ I'd transcribe my dialect as: متل (mitl, like), ضفر (ḍufr, fingernail, toenail), صدر (ṣudr, chest), جملة (jumle, sentence), خلص (xuliṣ, he finished)
      • Otherwise if it were totally up to me my dialect would be: متل (metl, like), ضفر (ḍofr, fingernail, toenail), صدر (ṣodr, chest), جملة (jomle, sentence), خلص (xoleṣ, he finished)
      • This seems like something to care about in the IPA section, though, not in the translit.
    There's also -iC -uC in final syllables, which Cowell (Damascene) and the South Levantine project on here represent with e o. I would prefer i u for symmetry with the above and because as a bonus it's more inclusive of a common type of Lebanese variety that really does have [-iC] (which I believe often although not universally comes with -uC as well). We can separately figure out how to get more granular with lax/tense kasra and damma in terms of the IPA, though.
    خلص (xiliṣ, he finished)
    Lastly, there's the epenthetic vowel, which despite the fact that I sorta believe it's phonemic in many varieties I still believe shouldn't be represented in translit:
    متل (mitl, like)
    May have still forgotten other stuff, which I'll try to add as it comes to mind (+I'll ask anyone reading to do the same too!).
    Still, when you think about it (talk) 17:43, 16 June 2025 (UTC)
    I'm not a Levantine speaker but my vote is to maintain the i and u in transliteration according to conservative dialects that still maintain the original distinction clearly, except for words that don't exist in such dialects, where either i or schwa is fine. Dialects that merge the two can just ignore the distinction in pronunciation. This is problematic for dialects like yours where some u have merged with i but not all; either just ignore those dialects or show two transliterations, one with i and one with u. Hope this makes sense. Benwing2 (talk) 17:56, 16 June 2025 (UTC)Reply
    I'm not completely opposed to a merger, but I'm not really for it either. It's mostly personal bias because I do like them being split, but if most contributors agreed to merging, I wouldn't have an issue with that. Fayçalmf (talk) 22:50, 15 June 2025 (UTC)
    Regarding IPA for different dialects, we could do what the main Arabic articles do when listing other dialect pronunciation & have the "main" pronunciation along with subdialectal pronunciations listed under it
    • Example using قاضي
    • IPA(key): /ʔaː.dˤi/, [ˈʔɑːdˤɪ]
      • (Druze, Coastal Syria) IPA(key): /qaː.dˤi/, [ˈqɑːdˤɪ]
      • (Bedouin) IPA(key): /ɡaː.dˤi/, [ˈɡɑːdˤɪ]
      • (Fellahi) IPA(key): /kˤaː.ðˤi/, [ˈkˤɑːðˤɪ]
    This would allow showing diversity in pronunciation while not needing contributors to have extensive knowledge on different Levantine dialects. Fayçalmf (talk) 02:23, 16 June 2025 (UTC)
    @Fayçalmf Yes, this is very similar to how {{pt-IPA}} handles Portuguese pronunciations. We have a "general Brazilian" pronunciation (reflecting an amalgam of the most common features cross-dialectally, and approximately the way newscasters in Brazil speak) and a "general Portugal" pronunciation (approximately reflecting a cultured Lisbon pronunciation), and nested underneath each are specific Brazil and Portugal regional pronunciations. This is also similar to how {{es-IPA}} works. So this approach is definitely feasible. Benwing2 (talk) 02:28, 16 June 2025 (UTC)Reply
    I also like the split just because it gives us a smaller area to work with, but I can see why it's arbitrary and you can come up with other isoglosses to create whatever other split you would like. Relatedly, the other day I wanted to edit North Levantine Arabic منشان to add Western Neo-Aramaic miššōn- as a descendant and found that that page only has a South Levantine Arabic entry, and it felt bad to duplicate that whole thing to North Levantine Arabic just to add one tangential note. So this is my personal thinking. Still, when you think about it (talk) 14:30, 16 June 2025 (UTC)Reply
    Yeah. Realistically speaking, after the initial hurdle of tidying everything up post-merge, it would be really nice to have everything contained into one Levantine Arabic section. Would it be possible to have categories for terms that are either exclusively South or North Levantine like we already have for Lebanese Arabic, Syrian Arabic, Palestinian & Jordanian? Things like هم can be placed into the South Levantine category & هن in the North?
    Like:
    هم (homme) (enclitic form ـهم (-hom))
    1. (South Levantine) they
    --
    هن (hinne) (enclitic form ـهن (-hon, -yon, -on))
    1. (North Levantine, Galilee) they
    Fayçalmf (talk) 15:34, 16 June 2025 (UTC)Reply
    @Fayçalmf Yes we can easily create such categories. Benwing2 (talk) 17:23, 16 June 2025 (UTC)Reply
    @Still, when you think about it @Fayçalmf I moved this topic to WT:LTR, which is where we normally handle language splits and mergers. In order for this topic not to stall, it would help if one of you could create a list of what is needed compared with what we currently have, and think about drafting a plan of action. I can help with the latter, but am somewhat unsure about the former as I have not studied Levantine Arabic much (I took a couple of years of MSA classes back awhile ago when I was in school, and have studied Egyptian and Moroccan Arabic on my own in fair depth). Benwing2 (talk) 05:12, 17 June 2025 (UTC)Reply
    Of course. Could I get a little more elaboration on "a list of what is needed compared with what we currently have?" Fayçalmf (talk) 05:22, 17 June 2025 (UTC)Reply
    @Fayçalmf Ultimately what we want is a specific plan of action regarding steps to take to implement the merger. See for example Wiktionary:Grease_pit/2023/January#apc_and_ajp_merged, where I enumerated a possible plan of action for merging Levantine Arabic, and Wiktionary:Language_treatment_requests/Archives/2020-24#RFM_discussion:_February–March_2024, which has a similar but more recent plan of action for splitting Khanty into separate languages that was actually put into practice. Part of the work will be creating new modules and templates to handle the combined language, and we need at least a preliminary working version of these modules and templates before we put a lot of work into actually merging the lemmas. In order to create those templates, we need to know how they should behave, and this requires some input from Levantine speakers. The current North and South Levantine Arabic headword templates appear to be based on the Standard Arabic templates that @Fenakhay and I (among others) put together, but there are also South Levantine Arabic verb conjugation templates (there don't seem to be any such templates for North Levantine Arabic). The current templates are not designed for a multi-dialect language, so there will need to be some thinking about how to design them to handle the differences among Levantine dialects. One relatively simple way of handling different dialects is to have one headword line per dialect; see for example Galician querer, which has a line for the standard norm and another line for the reintegrationist norm, and similarly has two conjugation tables. Another approach is to not have anything in the headword; see for example Occitan alenar, which has 5 conjugation tables but nothing in the headword. This latter approach might not make sense for adjectives and nouns, because it requires a declension table for each adjective and noun, which might be overkill (e.g. for nouns, all you need to list is the plural).
So what I would need as a start is a specific design for the noun, verb and adjective headword templates, with some examples of what the input would be and how it might display. I would start with nouns (which are easier than adjectives) and start with examples, rather than trying to come up with a design right away. Pick some common nouns and think about how to best display them, and then come up with a template syntax for specifying the relevant forms. I can help with the template syntax if I have several examples of the nouns and their plurals (both in Arabic script and transliteration). After that we can tackle adjectives, and then verbs. Benwing2 (talk) 06:00, 17 June 2025 (UTC)Reply
    I might as well throw in a couple of other multistandard systems: آب has Urdu (sister lect to Hindi) and Persian (Classical Persian/Iranian Persian/Dari/Tajik) entries where you can see different approaches and examples of infrastructure used to present the different scripts and pronunciations. Note that some of these have their own language codes, scripts and L2 headers, but there are templates that tie them together. Not that I'm specifically recommending any of these for the case at hand, but it may spark some ideas. Chuck Entz (talk) 06:55, 17 June 2025 (UTC)
    - Create a bot to turn all ajp articles into apc, alongside changing headers to ==Levantine Arabic==. We either have it leave lemmas with both to be dealt with manually like the original thread suggests, or we manually merge any terms in both ajp & apc to apc before running the bot.
    - I'm not really sure how the ajp conjugation table works. If it can handle variations of the same form already (i.e. اطلع for South Levantine & طلاع for North), then great. If not, there need to be accommodations made.
    - New categories created for 'North Levantine Arabic' & 'South Levantine Arabic' to hold region exclusive terms à la the country categories.
    - There has to be an agreement regarding ر in IPA. ajp tends to use r/rˤ, while apc uses r/ɾ. I suggested earlier having a "standard" for IPA with regional variations below it, so we could use r for the standard and ɾ and rˤ for the regional pronunciations.
    Those are some bullet points I have for now. I'd still like to hear input from @Still, when you think about it as well as what to do with tables & modules that already exist for apc/ajp to make a finalised plan for merging. I'll add more if I think of more things we need later on. Fayçalmf (talk) 11:31, 17 June 2025 (UTC)Reply
    For sure, the ajp-conj template can be used as a base, but it needs updating to be able to handle variation. I like the Occitan example with multiple tables and I'll try to think about how to implement something similar.
    About the categories, I'm wondering if we can avoid recreating the North/South Levantine split. Would it be possible to stick to "chiefly Syrian, Lebanese", "chiefly Palestinian, Jordanian", alongside transitional areas? I was convinced by User:A455bcd9's reasoning in the ISO proposal that said that the division was somewhat arbitrary and not derived from literature.
    For IPA, I want to give all major pronunciation standards equal weight instead of deciding on one standard ourselves. This doesn't solve the issue of how to transcribe ر, but it does leave it up to individual accents to have it transcribed in their own way without interfering with ours. I'm coming up blank for now on what to do about it, though. Still, when you think about it (talk) 07:43, 18 June 2025 (UTC)Reply
    Not sure countries are the best boundaries. Rural vs urban is often bigger than country A vs B. There are also sectarian differences (esp. Druze). So unless we know that a word is only widespread inside one country's border, it's better to stick to traditional areas ("Jerusalem", "Damascus", "Beqaa Valley", etc.). A455bcd9 (talk) 07:52, 18 June 2025 (UTC)Reply
    Thanks! It's more daunting but you're right. Sorry about the two username mentions, I had mistaken you for inactive. Still, when you think about it (talk) 10:57, 24 June 2025 (UTC)Reply
    A standard for IPA doesn't have to be based on one country/region's pronunciation. It can be a generalisation, then any deviations from the generalised pronunciation can be accounted for as well in the IPA underneath the "standard" one.
    Some examples:
    تقيل / ثقيل
    -
    برداية
    -
    شطرنج
    • IPA(key): /-/, [ʃɑtˤˈɾˤɑnʒ], [ʃɑ.tˤɑ.ɾanʒ]
    -
    I like this method personally because contributors can just put in the most basic form of the pronunciation & nuance can be added in later by other contributors. If we do go with this method, then we have to figure out an order for them to go in so they're not random in every article. Fayçalmf (talk) 13:32, 18 June 2025 (UTC)
    Draft of a plan to merge North & South Levantine Arabic:
    1. Rename apc to "Levantine Arabic"
    2. Merge tables. Edit declension table to be able to accommodate North Levantine as well. (Note: I can't code, so I don't know the logistics of this step.)
    3. Create a bot to merge ajp into apc. Leave articles with both apc & ajp entries alone to be dealt with manually.
    4. Once everything is converted to apc, delete ajp from the language list.
    5. Levantine-speaking contributors will have to work on tidying things like IPA & formatting to meet new standards.
    @Still, when you think about it, @Benwing2, @A455bcd9, @Fay Freak
    Any thoughts/criticisms? Anything that needs to be added or something else that needs to be taken into account that I missed? Fayçalmf (talk) 20:20, 21 June 2025 (UTC)Reply
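    Steps 1 and 3 of the draft plan can be sketched as a pure text transformation, shown here in Python for illustration. This is only a rough sketch under stated assumptions: the function and constant names are hypothetical, it works on a page's wikitext as a plain string rather than through any bot framework, it only rewrites `|ajp` where it appears as a template argument, and it deliberately leaves template names such as {{ajp-conj}} and the ordering of L2 sections untouched.

```python
import re

SOUTH = "South Levantine Arabic"
NORTH = "North Levantine Arabic"
MERGED = "Levantine Arabic"

# Matches level-2 headers only (==Name==), not ===Noun=== and deeper.
L2_RE = re.compile(r"^==([^=\n]+)==[ \t]*$", re.M)
# Matches "ajp" used as a template argument, e.g. {{head|ajp|noun}}.
AJP_ARG_RE = re.compile(r"\|ajp(?=[|}])")


def convert_page(wikitext):
    """Return (new_wikitext, status) for one page.

    status is "manual" when both sections exist (left for a human),
    "untouched" when neither exists, and "converted" otherwise.
    """
    names = [m.group(1).strip() for m in L2_RE.finditer(wikitext)]
    if SOUTH in names and NORTH in names:
        return wikitext, "manual"
    if SOUTH not in names and NORTH not in names:
        return wikitext, "untouched"

    # re.split with one capture group yields:
    # [preamble, name1, body1, name2, body2, ...]
    parts = L2_RE.split(wikitext)
    out = [parts[0]]
    for name, body in zip(parts[1::2], parts[2::2]):
        if name.strip() == SOUTH:
            body = AJP_ARG_RE.sub("|apc", body)  # only inside this section
            name = MERGED
        elif name.strip() == NORTH:
            name = MERGED
        out.append("==" + name + "==" + body)
    return "".join(out), "converted"
```

    Pages flagged "manual" correspond to the entries with both apc and ajp sections that the plan leaves for human merging.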
    Looks like you guys can do it. I generally only have bookish knowledge so only added a few, often obsolete, Levantine terms, to finish some etymologies or circumstantial curiosity, when I could not think of much argument to sneak them in under the general Arabic header, and depending on the source it was sometimes left open which Levantine a term was gathered from (e.g. {{R:ar:Berggren}} claiming to have both Damascus and Jerusalem). Fay Freak (talk) 20:42, 21 June 2025 (UTC)Reply
    Hey, sorry I dodged this. Randomly don't have the free time I've had for the last month or so. Will do my best to keep on top of this nonetheless because I don't want to leave it half unfinished. These steps look good to me, I'm just still hung up on the small details: the specifics of IPA formatting (and getting my apc-IPA template to not be broken, although that's more of a side project) and, annoyingly, the i/u thing when it comes to verb headwords, as the "North Levantine" dialects I'm aware of leveled almost all Form I verbs to yiCCuC vs yiCCaC* whereas "South Levantine" dialects seem to maintain a robust distinction between yiCCiC and yiCCuC that's completely impenetrable to me. Fortunately this doesn't affect the lemma (which will be the past form), but I guess this means either spamming multiple headwords per entry or just not doing headword lines? I prefer the former.
    Verb conjugation tables may be easier to deal with. The different systems I'm aware of are
    1. Coastal Syrian katbit, katbīto
      • Also up in the mountains they use originally imperative forms like طلاع as the base of the 1sg
    2. Nearby Lebanese katbit, katbíto due to these dialects' a-elision, not as a purely morphological thing (do these dialects also have katbto?)
    3. North-ish Lebanese ʕaṭyit, staḥyit, ḡilyit (typically w/ nonpast 3fs+2fs taʕṭe, tistíḥe, tiḡle but somewhat rarely 2fs tistiḥye); ḥkī "speak!" (احكي not احكيه), ktōb, kōl
    4. Typical Lebanese and urban Syrian ʕaṭit, štarit, ḡilyit; katabit, katabíto; bimši, bḡanni, biḡannu (ignoring -i -u vs -e -o); tistíḥi~tistáḥi; stʕart, ḵtart; yitruk; ʔiḥki, ktōb, kōl
    5. South Lebanese tistḥi; stʕirt, ḵtirt
    6. Transitional South Lebanese~Galilean katabit~katabat, katabáto, kátabato; ʔiktub~ʔuktub, kōl
    7. Palestinian/Jordanian and Aleppine bamši, baḡanni, b(i)ḡannu (last one also found in Lebanese areas); tistáḥi~tistaḥi
    8. Palestinian and Jordanian yitrik; ʔuktub, kul "eat!"
    9. Jordanian and regional Palestinian? katbato
    There's some stuff I don't know the details of like the Palestinian distribution of eg ramaw vs ramu or what dialects do ḡilit.
    Conjugation tables technically don't need to show connecting forms so we can ignore the -o stuff to start with. I would like to represent 4 and 7, of course, plus 3 and 5 if possible. (My knowledge of lesser-used forms is poor the further south in the region we get.) This actually seems doable with minimal to no Lua, unless we want some logic to automatically show multiple tables.
    Still, when you think about it (talk) 10:55, 24 June 2025 (UTC)
    Regarding the i/u thing, it's possible to put both in the transliteration or just leave it to what the contributor put it in as. بكرة is already just (bukra), so leaving it as is would be fine, for example. Fayçalmf (talk) 11:47, 24 June 2025 (UTC)
    I really dislike multiple headwords on the same word. It's ugly. I think we could simply have the transliteration reflect the different pronunciations.
    مشى (maša) (non-past بمشي (bamši, bimši))
    The ajp article for أخد has 2 declension tables for بوخد & باخد. Perhaps we could do something similar?
    ===Conjugation===
    [Mock table, regular حكى conjugation for most dialects]
    ====Chiefly Lebanon====
    [Mock table, represents forms like حكيوا (ħakyu)]
    Just a suggestion. The current ajp table did a good enough job with إجا on apc (with much more coding), so this approach could work. Any other ideas? Fayçalmf (talk) 22:47, 24 June 2025 (UTC)Reply
    I see your point about multiple headwords -- maybe whenever it's needed we can equally just add a new L3 with {{alternative form of}} (just did this at جاتوه) -- and the fact that small variations seem easy enough to represent within one tr:
    كبس (kabas) (non-past يكبس (yikbis, yikbus), active participle كابس (kābis))
    The one last tr-related thing on my mind is when it comes to usexes and quotes. I believe the trans-"lit" for quotes should also follow pronunciation, like I did for Salam el-Rassi at أما (or to a lesser extent the yṣaḥḥ at واوا). I think usex translits can also just be in whatever dialect the usexer is most comfortable using or transcribing, especially because I don't see a reason to want to change the translits for all the ajp usexes. Can we enforce the use of an accent qualifier for usexes and quotes, like the (Lebanon) at the bottom of عبكرة?
    Also, that last part and the IPA business seems like it means it's worth sitting down and figuring out an acceptable set of discrete sub-accents/dialects to enforce consistent representations of, which should be a priority but not block the merger from happening to start with. Still, when you think about it (talk) 17:59, 26 June 2025 (UTC)
    I agree with your points about translit.
    Druze/Coastal Syria is already being represented in apc, and Bedouin & Galilee pronunciations in ajp. We could represent Fellahi accents too, and then for anything else have cities to represent them if needed (which Galilee already is doing).
    دكتور
    Should Imāla be its own subsection? I think it should be, with the exception of Lebanon-only words like ڤيتاس.
    I mentioned order before, should the subsections go alphabetically or do you think there's a better way to arrange them? No matter what, if we have multiple city specific pronunciations, those should definitely go alphabetically. Fayçalmf (talk) 11:40, 27 June 2025 (UTC)Reply
    Does the order they're in really matter? Most words won't require more than 3 variations anyway Fayçalmf (talk) 02:35, 29 June 2025 (UTC)Reply
    Actually, in terms of making a template, it does. How about the "standard," then Imāla, Druze/Coastal Syria (separated if need be), Bedouin, Fellahi, then anything else like hyperforeignisms or Galilee can be manually added underneath.
    Fayçalmf (talk) 03:53, 30 June 2025 (UTC)
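    The ordering proposed above could be expressed as a simple sort key. The sketch below is in Python for illustration only; `FIXED_ORDER`, the `(label, ipa)` tuple shape, and the decision to sort any remaining labels alphabetically after the fixed ones are assumptions (the proposal itself says the extras would be added manually underneath).

```python
# Proposed fixed order; labels not listed here fall to the end.
FIXED_ORDER = ["standard", "Imāla", "Druze", "Coastal Syria", "Bedouin", "Fellahi"]


def sort_pronunciations(entries):
    """Sort (label, ipa) pairs: fixed labels first, the rest alphabetically."""
    rank = {label: i for i, label in enumerate(FIXED_ORDER)}
    return sorted(
        entries,
        # Unknown labels all share the last rank and tie-break on the label.
        key=lambda entry: (rank.get(entry[0], len(FIXED_ORDER)), entry[0]),
    )
```

    With the قاضي example from earlier in the thread, the standard line would come first, followed by Druze, Bedouin and Fellahi, with any city-specific lines (e.g. Galilee) sorted after them.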
    Personal preference: no base form, each variant we list goes next to the others, but we put the more-urban options up top like you're doing here. Damascene, metro Lebanese, ?urban central Palestinian/Jordanian?, and then Druze, coastal Syrian, Beqaa/Qalamoun, Fellahi, and others? I found this classification of Palestinian and Jordanian dialects by Palva that may help decide on representative forms from down there, although it seems a bit outdated (on the one hand 1984 isn't at all long ago but on the other hand it says Galilean dialects predominantly preserve interdentals and /q/, which I know exists but I'm not sure it's predominant?).
    I am wondering if we can get by without the imala tag. I see the merit in referencing the common name for the phenomenon visible in some pronunciations, but it'll also add clutter.
    I'm admittedly dragging my feet on looking into botting the ajp->apc conversion but I believe that the only things we'll need in order to get started are that and maybe updated declension tables (since that infrastructure already exists). IPA pronunciations (since not much infrastructure already exists for them) can maybe be left as is to start with, with "Palestinian" appended to the current ajp accent quals and "Damascene" added as an accent qual for the current unlabeled apc pronunciations? Still, when you think about it (talk) 16:35, 1 July 2025 (UTC)Reply
    I can do without a base form & leaving the translit to be the "generalised" pronunciation instead. The Imāla tag would essentially be the same as "metro Lebanese," so if we're doing the latter, we don't need the former.
    I agree with the last part about adding quals. I think it would be helpful to have on articles before we get to manually adjusting things. Fayçalmf (talk) 18:55, 1 July 2025 (UTC)
    + We'll have to specify in the Levantine Arabic terms with /ɡ/ category that it's only for words that are pronounced with it in the majority of dialects. Otherwise, almost every word with ق would be viable to include. Also adding pre-existing ajp terms to the category that fit the criteria like جمبري, جول, أغورة, مزچان, etc.
    Same with /p/ (e.g. دبرس) and /v/ (e.g. فيديو) Fayçalmf (talk) 19:02, 1 July 2025 (UTC)Reply

    IPA transcription of Pannonian Rusyn "в" before a consonant

    I have discovered, both through a Pannonian Rusyn grammar book and listening to actual Pannonian speakers, that "в" before a consonant does not usually make a /v/ or /f/ sound, but rather it is more like the Ukrainian/Carpathian Rusyn/Slovak realization, where it's closer to /w/. The problem is, I don't know exactly which IPA symbol to use. The grammar book transcribes the sound as the Belarusian ў (it literally gives праўда (prawda) as an example of realization). The Carpathian Rusyn pronunciation template uses /w/. Ukrainian and Belarusian IPA templates use /u̯/. Whereas there doesn't seem to be a consistent way of transcribing it in Slovak IPA.

    Sources: In this video, at 7:28, the narrator says жовта (žovta), and at 7:38, the narrator says правдиве (pravdive). There's probably more examples in that video but those are just two in immediate succession. And I found the grammar book here, the в stuff is on page 16. (It's in Pannonian Rusyn, but I just wanted to prove that the ў stuff is actually in the book and that I'm not making it up.) Although the book does note that в before ч or ш is pronounced /f/, in words like вчас (včas) or вшелїяк (všeljijak).

    So which IPA symbol, /u̯/ or /w/, do we think should be used for в before a consonant in the rsk-IPA module? Pinging @Sławobóg, @Vininn126, @AshFox for your thoughts. Insaneguy1083 (talk) 23:37, 16 June 2025 (UTC)Reply

    A question you should be asking yourself is: is this phonemic or just phonetic? If it is only phonetic, your use of // is wrong. Vininn126 (talk) 04:09, 17 June 2025 (UTC)Reply
    Well it's square brackets in the IPA template. I'm just used to writing forward slashes more casually. So should it be [u̯] or [w] then, you reckon? Insaneguy1083 (talk) 06:12, 17 June 2025 (UTC)Reply
    I agree with what Ben said. Vininn126 (talk) 06:18, 17 June 2025 (UTC)Reply
    I don't think it matters that much; [w] might be better simply because it's more familiar to the average reader and easier to type. Benwing2 (talk) 05:15, 17 June 2025 (UTC)Reply
    I agree. There isn't a real difference between /u̯/ and /w/; the choice between them is more about what aspects you want to emphasize. Use /u̯/ if you want to categorize it as a vowel that is nonsyllabic in this environment; use /w/ if you want to categorize it as a consonant. In this case, we probably want to categorize it as a consonant since it alternates with /v/ in syllable-initial position, so /w/ is probably the better choice. But that doesn't mean /u̯/ is "wrong". —Mahāgaja · talk 06:29, 17 June 2025 (UTC)Reply
    Well put. Vininn126 (talk) 06:30, 17 June 2025 (UTC)Reply
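The rule discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual rsk-IPA module: the function name `realize_v` and the consonant inventory are hypothetical, the choice of [w] follows the preference expressed above, and word-final в (which the thread does not cover) is simply left as [v].

```python
# Hypothetical sketch of the rule from this thread; NOT the real rsk-IPA module.
# Assumed consonant letters of the Pannonian Rusyn alphabet (illustrative only).
CONSONANTS = set("бвгґджзйклмнпрстфхцчшщ")
# Per the grammar book cited above, в before ч or ш is pronounced [f].
DEVOICING_TRIGGERS = set("чш")


def realize_v(word):
    """Return one realization ('v', 'w' or 'f') for each в in the word."""
    lower = word.lower()
    result = []
    for i, ch in enumerate(lower):
        if ch != "в":
            continue
        # Empty string when в is word-final; that case is not covered by the
        # thread, so it falls through to the default [v] (an assumption).
        nxt = lower[i + 1] if i + 1 < len(lower) else ""
        if nxt in DEVOICING_TRIGGERS:
            result.append("f")
        elif nxt in CONSONANTS:
            result.append("w")
        else:
            result.append("v")
    return result
```

For the examples cited in the thread, `realize_v("жовта")` gives one [w], while `realize_v("вчас")` gives [f].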

    Pannonian Rusyn nonvirile?


    Sorry that I'm adding another topic on Pannonian after such a short interval, but whose idea was it to add nonvirile as a separate noun gender? None of the Pannonian dictionaries that I use specifically define nonvirile as opposed to masculine pluralia tantum. There's masculine p.t., there's feminine p.t., and neuter p.t. in Pannonian. Not even Czech or Slovak use nonvirile on here. Did someone follow the Polish model a little too hard? If anyone can show me specific and definitive Pannonian documentation that nonvirile is defined as a noun gender, then fine, but otherwise I'll be reverting all the existing NV nouns into their respective pluralia tantums. Insaneguy1083 (talk) 16:27, 19 June 2025 (UTC)Reply

    @Insaneguy1083 Before you just revert everything, see who added them and ping them to get their views. Maybe they had some reason, maybe not. Benwing2 (talk) 17:37, 20 June 2025 (UTC)Reply
    @Thadh Hi, I've removed the nonvirile noun gender for Pannonian Rusyn nouns, since none of the dictionaries I use specifically mention nonvirile as a gender as opposed to just pluralia tantum. Even череґи (čeregi), the noun which you specifically changed to be NV, is listed in the dictionary as, and I quote, ж. мн. (ž. mn.). And there are neuter pluralia tantum like уста (usta) which are listed as с. мн. (s. mn.) in the same dictionary. Czech and Slovak don't use NV on here either, nor any other Slavic languages outside of the immediate Polish-sphere. Insaneguy1083 (talk) 18:11, 20 June 2025 (UTC)Reply
    Did you specifically ignore what I said? I said ping them before reverting. Benwing2 (talk) 20:51, 20 June 2025 (UTC)Reply
    Well, I had already reverted before you sent the initial message. It's not as if there are that many NV nouns anyway. There's like 14 of them or something, if even that, and it's just a matter of changing a few characters in the rsk-noun template if there exists an actual justification to use NV as opposed to just pluralia tantum. Insaneguy1083 (talk) 21:05, 20 June 2025 (UTC)Reply
    @Insaneguy1083, Benwing2: Unlike Czech and standard Slovak, Pannonian Rusyn and afaik Eastern Slovak have a completely different gender system, where masculine human nouns have a different inflection than masculine animate, masculine inanimate, feminine or neuter:
    я жем желєного мужу // я жем желєних мужох
    я жем желєного коня // я жем желєни конї
    я жем желєни лимун // я жем желєни лимуни
    я жем желєне яблуко // я жем желєни яблука
    я жем желєну вишню // я жем желєни вишнї
    Now, I don't know if you notice this, but this is exactly the same system as in Polish. And as in Polish, it is impossible to tell from agreement whether a plural-only noun is masculine non-human, feminine or neuter, except for its inflection class, where mixed classes are still present. Now, it's nice that the Rusyn dictionaries you use have decided on some arbitrary gender for these nouns, but unfortunately we should be able to document any Pannonian Rusyn noun, which includes those that do not have an earlier dictionary entry. Furthermore, just like in Polish, there is nothing that makes череґи inherently feminine rather than masculine or neuter unless a singular *череґа exists. The third-person singular pronoun is the same for all genders, as are verbal endings.
    I would appreciate it if you did not unilaterally remove such things from modules without first understanding the motivation behind it. Thadh (talk) 23:43, 20 June 2025 (UTC)Reply
    @Thadh: That adjectival declension separating masculine personal (i.e. virile) and all others was already implemented in rsk-decl-adj. And if you had checked the referenced 2010 dictionary, you would have found that there does, in fact, exist череґа (čerega). To quote directly from the 2010 Rusyn-Serbian dictionary:

    череґи ж. мн. (єд. череґа) кул. листови, мафиши

    As you can see, it does point out the existence of a череґа (čerega), which on Wiktionary we can decline fully using rsk-decl-noun-f. And personally, I feel like if a Rusyn dictionary, written by Rusyns, indicates a singular form with a specified gender, then maybe we should take their word for it and implement this word as a feminine noun (arguably not even pluralia tantum to be honest, more like a feminine noun that is chiefly in the plural).
    I've read the 1997 dictionary's grammar section, and I've also read the entire nouns and adjectives sections of the 2005 edition of the dedicated Pannonian Rusyn grammar book Ґраматика руского язика (quote from page 35: &28.1. Меновнїки можу буц хлопского, женского и стреднього род. (&28.1. Menovnjiki možu buc xlopskoho, ženskoho i strednʹoho rod.)). By all indications, even Rusyns themselves writing about Rusyn grammar do not specifically differentiate a "non-masculine-personal" gender for any context, other than pointing out that the plural accusative form of adjectives have a different form based on whether the noun is masculine personal.
    It's nice that you'd like to document any Pannonian Rusyn noun, "which includes those that do not have an earlier dictionary entry". But the 2010 dictionary is pretty comprehensive (other than proscribed colloquial words like да (da)), and gives a gender for every pluralia tantum. And I feel like specifying the gender, e.g. to harmonize with etymology and cognates in the case of уста (usta), is rather important even if the resultant declension is the same with say a feminine p.t. or masculine inanimate p.t.. Insaneguy1083 (talk) 08:28, 21 June 2025 (UTC)Reply
    @Insaneguy1083: How do you not see that the fact that череґа exists is the reason the noun is feminine? Not all plural nouns have a singular though, even hypothetically. Those are the nonvirile nouns. Thadh (talk) 08:48, 21 June 2025 (UTC)Reply
    @Thadh: Bottom line, languages in the immediate Polish-sphere use nonvirile as a grammatical gender because it is specifically laid out as one (niemęskoosobowy) in the official Polish grammatical canon. For Pannonian Rusyn, nonvirile is NOT in itself defined as its own gender, there doesn't exist any *хлопскоособови (*xlopskoosobovi), and Rusyn dictionaries do as much to provide the gender of pluralia tantum like уста (usta) or дзвери (dzveri), even if the declension for non-masculine-personal nouns are all the same in the plural, even if there doesn't exist a singular form. It seems disingenuous (and frankly unnecessary) to group a series of nouns using the noun gender system of a completely different paradigm, just because there are perceived similarities to the Polish system. If Pannonian Rusyns themselves decide one day that they will start using the nonvirile classification and classifying nouns as such in their own dictionaries, fine. But for the time being, differentiating the adjectival declension using rsk-decl-adj seems very much sufficient to me to address the differences between virile and nonvirile nouns. I'm just following the official line here.
    @Vininn126 @Sławobóg as people who interact more with Polish-related entries, what are your thoughts on this? Insaneguy1083 (talk) 09:36, 21 June 2025 (UTC)Reply
    I think it's disingenuous to follow Slovak grammar (which the dictionaries in question in this case seemingly follow) to explain Pannonian Rusyn grammar. If a word is a pluralia tantum, not attested in the singular, and uses the same case agreement in the nominative and accusative, then it is simply not part of any gender other than "not masculine personal". There's no way to otherwise see what the gender is, and using etymology or other languages is not only not sustainable, it's dishonest. Thadh (talk) 09:48, 21 June 2025 (UTC)Reply

    Update Baltic (Golyad language & Dnieper Baltic group / classification of Galindian)


    Wiktionary has a Galindian language and a code xgl for it. But the problem is that this language code combines 2 different languages. (w:Galindians)

    1. Galindian xgl (synonym: West Galindian) ‒ language of West Baltic group, from Northeastern Poland. (w:Galindian language)
    2. Golyad language (synonym: East Galindian) ‒ language of Dnieper Baltic group (which was forgotten on Wiktionary), from area near Moscow, Russia. (w:Golyad language)

    The problem is comparable to a hypothetical Wiktionary that ignored the division between Low German nds and (High) German de and labelled both as "German", on the grounds that "after all, both come from Proto-West Germanic gmw-pro and the names are similar, so specifying which is which is unnecessary."

    I suggest:

    • 1. Galindian xgl which is on Wiktionary:
      1.1. Add a synonym "West Galindian" to it (or rename it altogether, this is at the discretion of the admins, but I think renaming is a bad idea).
      1.2. Move it to West Baltic bat-wes, next to Old Prussian prg. Galindian xgl is currently outside the group ‒ precisely because it combines 2 completely different languages from different Baltic groups.
    • 2. Add a 3rd group of Baltic languages (in addition to West Baltic and East Baltic) that was previously missing from Wiktionary:
    Proposed name: Dnieper Baltic (w:Dnieper Balts / w:Dnieper-Oka language)
    Synonyms: Dnieper-Oka Baltic, Eastern Peripheral Baltic
    Code: eg. bat-dni (similar to East Baltic bat-eas and West Baltic bat-wes)
    Proposed name: Golyad
    Synonyms: East Galindian / Scripts: Latin script (Latn)
    Code: eg. xgl-eas or considering the group ~ bat-dni-gol ~ bat-dni-gld.
    ─┬ Baltic (bat)
     ├[-]┬ Dnieper Baltic (bat-dni)
     │   └── Golyad (xgl-eas)
     ├[-]─ East Baltic (bat-eas)
     └[-]┬ West Baltic (bat-wes)
         └── Galindian (xgl)
    

    A stand-alone code for the Golyad language will help to better indicate the etymologies of some words: e.g. Old East Slavic голѧдь (golędĭ); many hydronyms and toponyms in Russian (e.g. the city Russian Волокола́мск (Volokolámsk)); and some words, dialectal and otherwise, not only in Russian, e.g. Russian кромса́ть (kromsátʹ, to shred). I don't know who actively edits the Baltic languages and who is better to ping... @Vininn126, what do you say? Sorry if this is not your section and I pinged you in vain. AshFox (talk) 20:39, 20 June 2025 (UTC)Reply

    UPDATE: Just noticed that @-sche raised a similar issue in January 2024: Wiktionary:Language treatment requests#Proposal for several languages without ISO codes. He also noticed that ISO has a mistake... they mistakenly combine 2 different languages in one Galindian xgl or something like that. AshFox (talk) 20:46, 20 June 2025 (UTC)Reply
    Just a comment: We have sooooooooo many different Balto-Slavic codes. Do we really need more? Can we get away with etym-only codes? Benwing2 (talk) 20:53, 20 June 2025 (UTC)Reply
    @Benwing2 Sorry, I understand that this is unfortunately not another code for a new dialect of the Polish language... but here is a situation where 2 different languages from completely different groups are mistakenly hidden under one language code. These are not 2 dialects of one language. These are separate languages that were territorially separated by 970‒1020 km. They did not even touch...
    I understand that we have many language codes for other Baltic extinct languages, half of which are not attested. But is this an excuse to completely forget about the separate Dnieper Baltic group and not fix the error with the ISO code xgl? AshFox (talk) 21:19, 20 June 2025 (UTC)Reply
    Wikipedia says neither (West) Galindian nor Golyad is actually attested in writing. If that's true, I don't think we should make L2's out of them, since there will never be lemmas in main space for them. On the other hand, making them etym-only means deciding what languages they're etym-only variants of. WP says (West) Galindian is "thought to have been a dialect of Old Prussian, or a Western Baltic language similar to Old Prussian", so it could be a variant of prg. But Golyad is apparently a variety of Dnieper-Oka, which is also unattested and has no ISO code, and which seems to be a branch of Baltic unto itself. Could we make it an etym-only variant of Proto-Baltic, which is itself already an etym-only variant of Proto-Balto-Slavic? Would we even want to, since however Dnieper-Oka and Golyad have been reconstructed, they're bound to be very different from PBS? Are there even published reconstructions of Dnieper-Oka and/or Golyad? —Mahāgaja · talk 21:28, 20 June 2025 (UTC)Reply
    @Mahagaja Who said that if a language is not attested in writing, it cannot be L2? If so, then we need to be consistent and other Baltic languages (Skalvian svx, Curonian xcu, Selonian sxl, Semigallian xzm) that are not documented in writing should be converted into etymological codes... but this is, in my opinion, a stupid limitation. AshFox (talk) 21:37, 20 June 2025 (UTC)Reply
    As for reconstructions of the Golyad language, I have not found a separate "Golyad dictionary" before, and I did not look for it. However, I have come across individual reconstructions in the entries of Russian etymological dictionaries under certain Russian words/toponyms/hydronyms that have Dnieper Baltic (Golyad) origin. For the sake of interest, I simply opened the Russian Wikipedia w:ru:Голядский язык article and chose the first link that came up ‒ Топоров В. Н. О балтийском элементе в Подмосковье // Baltistica [11] and even so, some reconstructions of the Dnieper-Oka Baltic roots that were the source of hydronyms near Moscow are already visible there. But this does not mean at all that it will be necessary to create purposefully reconstructed entries for Golyad on Wiktionary. It is possible only for those Russian words having Dnieper-Oka Baltic origin... as an example. AshFox (talk) 21:59, 20 June 2025 (UTC)Reply
    I agree with Mahagaja. These are substrates, not attested nor comparatively reconstructed languages, and shouldn't have L2s. If there are any other such languages we do consider L2s, then yes, these should also be removed. Thadh (talk) 23:47, 20 June 2025 (UTC)Reply
    The problem is that Balto-Slavic classification is thoroughly muddled by politics because Slavs ruled and subjugated Baltic speakers and exaggerated the importance of Slavic. There are those who consider the division of East and West Baltic to be at the same level as the Slavic branch and there are those in Baltic linguistics who refuse to even consider the idea because they don't want to be in the same language family with Slavic. I'm not qualified to say who's right, but I think it's better to spell things out clearly and thoroughly so that we don't get caught up in such disputes. Better to treat everything as separate so it can be located in the classifications of either version. This is especially true with Galindian, since it could be argued that any grouping that contained both parts would also contain everything else in the entire Balto-Slavic family. Chuck Entz (talk) 21:58, 20 June 2025 (UTC)Reply
    It is just that a demonym has been used twice, as Serbs and Sorbs are actually the same word, or we have to distinguish two Moldavias, and lots of oikonyms parallelly formed due to the limits of human creativity as applied on the limited language inventories, sometimes realistically close to each other as Schröttinghausen (33 kilometres or like seven hours by foot between each).
    And AshFox is fully competent an editor to express needful distinctions. It is a well-known fact that Slavic speakers displaced some Baltic and Uralic languages. Some would only have been spoken in a few of such villages but between a tad greater distances, as we know in detail from descriptions of Africa or Papua in the recent century. You also imagine, after the published notes of less armchairy linguists than we are, how much fun the field-work of describing them all is, and that hence the historical material offers some diluted randomness, but no one longs to establish reconstructions, almost guaranteed as national myths don't depend on these peripheralia, so I think Mahāgaja misunderstood the purpose of filling the language data tree with languages we know to have existed, without prejudice to their contents. Mereological disagreement? There is a point in keeping their addition low-threshold by already vouchsafing their template codes; one already expends some motivation to outline and request them all, that putting in the effort as well to scratch the bottom of the barrel for the last content in Trümmersprachen would appear prohibitive. Fay Freak (talk) 23:21, 20 June 2025 (UTC)Reply
    FTR, I think the unsigned January 2024 proposal to split Galindian was by Theknightwho; my contribution was "What is there to add in either language?"; I see now that the answer is "mentions in the etymologies of other languages". Galindian (Western Galindian) could be considered a variety of Old Prussian; if someone were proposing to add a code for it, I'd say make it an ety-only variant of prg; since the ISO/SIL and we already have a full code for it, we could just as well leave it as-is, unless someone particularly wants to reclassify it (it doesn't change much; either way, the only place the code's used is in etymologies). For Eastern Galindian / Dnieper Baltic, if it's not attested but it needs to be mentioned in some etymology sections, then any code we add should indeed be an etymology-only code, like Mahagaja and others have said; make the Baltic language family its parent (like Suevic has West Germanic as its parent). Do reference works, e.g. etymological dictionaries of the languages that are thought to have borrowed from the lect(s), tend to reconstruct Eastern Galindian and Dnieper Baltic as separate things that different terms derive from? I am wondering if we really need two separate codes or just one. - -sche (discuss) 01:17, 21 June 2025 (UTC)Reply
    Can anyone speak to whether both "Eastern Galindian"/"Golyad" and "Dnieper Baltic" need etymology-only codes, e.g. if they are commonly reconstructed separately by dictionaries of Russian etymologies and would need to be mentioned separately in our entries' etymology sections, or whether we could get by with one etymology-only code? E.g. if we only need to say that some Russian (and other) words may derive from EG/Golyad, then we don't necessarily need a code for Dnieper Baltic, we can just have an etymology-only code for EG/G with the Baltic family as its parent. - -sche (discuss) 22:04, 26 June 2025 (UTC)Reply
    @-sche A separate code for the Dnieper Baltic family is also needed, definitely. The Golyad language is the only clearly distinguishable language from the vast Dnieper Baltic family in ancient times, the geography of which initially affected the territory of not only the Moscow region, but also a huge part of western Russia, eastern Belarus and even northern Ukraine. On the territory of northern Ukraine there are many hydronyms of Baltic origin, and all of them are borrowed from Dnieper Baltic (work on this: Toporov V., Trubachev O. (1962) "Linguistic analysis of hydronyms of the Upper Dnieper region"). The Golyad language stands out only because it survived until the times of Kievan Rus and is clearly localized. But in the times before the Slavs settled on the lands of Dnieper Baltic there were other languages of this subgroup, but they have already been lost in the mists of time. The code for the Dnieper Baltic (bat-dni) family would be useful for the section Etymology of the names of rivers in Ukraine, Belarus, Russia that have Dnieper Baltic origins, but were located outside the territory of distribution of the Golyad tribe specifically. AshFox (talk) 07:03, 12 July 2025 (UTC)Reply
    OK, based on discussion here and at Wiktionary:Language treatment requests#Update Baltic #2 (unattested L2 languages → etym-only code) and what I've managed to find in reference works about Dnieper Baltic (not as much as I'd like!), my understanding is as follows; please correct me if I am mistaken about anything:
    There is evidence to suggest that one or more Baltic lects were spoken in the Dnieper-Oka area, besides Golyad. (This evidence is in the form of words in the other (Slavic) languages spoken in that area today which seem to derive from a Baltic substrate.) It's not certain how many lects "Dnieper Baltic" encompassed, and it's not even certain which branch of Baltic they belonged to (given that they're completely unattested), though some people argue for considering them to form their own third branch. (Some works are even more sceptical than others, e.g. Anthony Jakob, A History of East Baltic through Language Contact (2023), page 36, discusses the scholars "purporting to demonstrate a Baltic substrate in the hydronyms of the Upper Dnieper and Oka basins. The validity of this evidence has practically been taken for granted, and has remained absolutely central to discussions of the Baltic homeland [...but the] call for "tiefer gehende Sichtung und Diskussion" (1966: 2, fn.) seems to have largely remained unanswered, with later contributions rather looking to expand than critically assess the established material [...] In any case, the alleged pervasiveness of a Baltic substrate in the hydronymy of this area contrasts starkly with the almost complete absence of evidence of early substratal loans on a lexical level.") Nonetheless, enough reference works do assert that such a substrate existed that I agree we'd benefit from having a code (of some kind) so that we could report the theories of those works, that X or Y hydronym or word {{der}}ives from Dnieper Baltic. It seems to me that the more conservative approach may be to treat Dnieper Baltic as a substrate (with an etymology-only, "language"-type code), rather than as a family; this would be consistent with how we treat e.g. "Paleo-Hispanic", "Pre-Greek", etc (and then having a separate ety-only code for Golyad as a specific Dnieper Baltic lect would be comparable to how we have a code for e.g. Tartessian as a specific Paleo-Hispanic lect). So, I am inclined to add etymology-only codes for "Dnieper Baltic" (bat-dni) and "Golyad" (bat-gol). If there are objections to this, let me know. - -sche (discuss) 21:19, 14 July 2025 (UTC)Reply
    @-sche I thought that Dnieper Baltic would be added as a group (Language family code), as it is accepted on the English Wikipedia, next to Western and Eastern Baltic. But if you want to add both ‒ as etym languages (I'm not against it in general), then I would insist that Golyad be a derivative of Dnieper Baltic, and not a parallel language to it:
    └──┬ Dnieper Baltic (bat-dni) ? or better Dnieper-Oka Baltic (bat-dno)
    ㅤㅤ└── Golyad (bat-gol)
    Because Golyad is a part (specifically the language that survived longer than all until the 12th century AD) of the more extensive previously existing Dnieper Baltic. AshFox (talk) 04:51, 15 July 2025 (UTC)Reply
    Addition: I looked at the Ukrainian Wikipedia and the Russian Wikipedia. In both, Dnieper Baltic is referred to as a "language" and not "languages". Russian "Днепровско-окский язык" and Ukrainian "Дніпровсько-окська мова" which literally "Dnieper-Oka language". I think that we should then use not "Dnieper Baltic" (bat-dni), but something like "Dnieper-Oka Baltic" (bat-dno). AshFox (talk) 05:06, 15 July 2025 (UTC)Reply
    @-sche so what's your opinion? AshFox (talk) 10:14, 17 July 2025 (UTC)Reply
    OK, I have added etymology-only language codes for Dnieper Baltic bat-dni (based on the limited resources I was able to find on it, that name seemed most common) and Golyad bat-gol, and added West Galindian as an alias of xgl and specified that it belongs to the bat-wes family. If Dnieper Baltic or Golyad reconstructions are entered, the links generated are to Reconstruction:Proto-Balto-Slavic/... pages; caution is advisable when entering reconstructions, because if there's not a solid basis for a reconstruction it is liable to be deleted. - -sche (discuss) 20:33, 22 July 2025 (UTC)Reply
    @-sche Thanks for making these etym-codes. For example, I created Ukrainian Обе́ста (Obésta) and Russian Жи́здра (Žízdra), hydronyms of Dnieper Baltic origin. But I wanted to ask if it would be possible to slightly correct their position in the tree of Balto-Slavic languages: move them under the Baltic (bat) family, along with the other Baltic languages. That would be more correct. The current tree is shown first, followed by the requested one:
    [-]┬ Proto-Baltic (bat-pro) V
    ㅤ├[-]┬ Baltic (bat) F
    ㅤ│ㅤ├──── Curonian (xcu)
    ㅤ│ㅤ├[-]─ East Baltic (bat-eas) F
    ㅤ│ㅤ└[-]─ West Baltic (bat-wes) F
    ㅤ└[-]┬ Dnieper Baltic (bat-dni) V
    ㅤ ㅤ └──── Golyad (bat-gol) V
    [-]┬ Proto-Baltic (bat-pro) V
    ㅤ└[-]┬ Baltic (bat) F
    ㅤ ㅤ ├──── Curonian (xcu)
    ㅤ ㅤ ├[-]┬ Dnieper Baltic (bat-dni) V
    ㅤ ㅤ │ㅤ└──── Golyad (bat-gol) V
    ㅤ ㅤ ├[-]─ East Baltic (bat-eas) F
    ㅤ ㅤ └[-]─ West Baltic (bat-wes) F
    Similar to how it is done for Frankish, which is part of Proto-West Germanic but is in the Low Franconian languages category. AshFox (talk) 10:23, 24 July 2025 (UTC)Reply
    Done, I think. - -sche (discuss) 02:08, 29 July 2025 (UTC)Reply
    This is resolved, AFAICT. Striking it so it can be archived later. - -sche (discuss) 22:28, 5 August 2025 (UTC)Reply
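The outcome of this thread can be summarised as a small parent-pointer table. The sketch below is a hypothetical Python model, not Wiktionary's actual language data modules: the codes and names come from the discussion above, while the field names (`kind`, `parent`, `aliases`) and the `lineage` helper are assumptions made for illustration.

```python
# Hypothetical model of the tree as left at the end of this thread.
# Codes and names are from the discussion; the field names are illustrative only.
LECTS = {
    "bat": {"name": "Baltic", "kind": "family", "parent": None},
    "xcu": {"name": "Curonian", "kind": "language", "parent": "bat"},
    "bat-eas": {"name": "East Baltic", "kind": "family", "parent": "bat"},
    "bat-wes": {"name": "West Baltic", "kind": "family", "parent": "bat"},
    "xgl": {
        "name": "Galindian",
        "kind": "language",
        "parent": "bat-wes",
        "aliases": ["West Galindian"],
    },
    # Etymology-only codes added on 22 July 2025 and moved under Baltic afterwards.
    "bat-dni": {"name": "Dnieper Baltic", "kind": "etym-only", "parent": "bat"},
    "bat-gol": {"name": "Golyad", "kind": "etym-only", "parent": "bat-dni"},
}


def lineage(code):
    """Return the chain of codes from `code` up to the root of the table."""
    chain = []
    while code is not None:
        chain.append(code)
        code = LECTS[code]["parent"]
    return chain
```

Under this model, Golyad resolves through Dnieper Baltic to Baltic, while Galindian resolves through West Baltic, which is the separation the thread set out to achieve.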

    Proto-Oghuz and Proto-Arghu


    See the similar heading from February 2025.

    If it's not possible or difficult, I have another idea. Instead of making Proto-Oghuz anti-asterisk, we can try this:

    • Oghuz: [trk-ogz]
      • Proto-Oghuz: (trk-ogz-pro)
        • Middle Oghuz: (xqa-ogz) / (mid-ogz)
          • Old Anatolian Turkish: (trk-oat)
            • Gagauz: (gag)
              • Ottoman Turkish: (ota)
              • Balkan Gagauz Turkish: (bgx)
              • Turkish: (tr)
          • Classical Azerbaijani: (az-cls)
            • Azerbaijani: (az)
            • Qashqai: (qxq)
          • Salar: (slr)
          • Turkmen: (tk)
    • Arghu: [trk-arg]
      • Middle Arghu: (xqa-arg) / (mid-arg)
        • Khalaj: (klj)

    Middle- because they are one of the Middle Turkic languages. @Benwing2 @Surjection @AmaçsızBirKişi @Rttle1 @Ardahan Karabağ @Bartanaqa

    BurakD53 (talk) 01:53, 21 June 2025 (UTC)Reply

    If you ask me, if it were up to me, I would also want Chigil (xqa-chi) and Yaghma (xqa-yag) as Karakhanid dialects; unlike Oghuz and Arghu, these directly point to the Karakhanid language because they are from the same branch. What I mean is, according to Kaşgarlı, both belong to the Karluk tribal confederation. I would want them, but the problem is, you won’t give them to me. BurakD53 (talk) 02:15, 21 June 2025 (UTC)Reply
    Actually, the Proto-Oghuz period roughly corresponds to Middle Oghuz, most likely around the same time, but unfortunately, anti-asterisk doesn’t work. Either you’re too busy, or you’re not sure, or you prefer it to remain as a reconstructed structure. I would prefer Proto-Oghuz, but if it has to remain a reconstruction, then Middle Oghuz is okay for entering lemmas. BurakD53 (talk) 02:26, 21 June 2025 (UTC)Reply
    Yes, they should be roughly from the same time period, but if someone wants to keep them separate, I understand and support that too. I just want the Oghuz-related entries in a dialectical dictionary written in Karakhanid to be separated out and assigned to the Oghuz category. That’s my point of contention. BurakD53 (talk) 02:32, 21 June 2025 (UTC)Reply
    I lost my PDF of DLT a long time ago and am too lazy to download it again tbh, so I won't be able to take a look at the Chigil and Yaghma words. But I think it is unnecessary to add language codes for dialects. We can mention them with the code of Karakhanid. Ardahan Karabağ (talk) 09:52, 21 June 2025 (UTC)Reply
    Partially Support, but maybe we should use a different name than "Middle ...". We can also just change [trk-ogz(-pro)] into a non-asterisking descendant too, instead of adding new language codes.
    AmaçsızBirKişi (talk) 06:41, 21 June 2025 (UTC)Reply
    Oghuz is also totally okay for me, but I'm not sure if it's appropriate because it's also the name of the language family. BurakD53 (talk) 10:41, 21 June 2025 (UTC)Reply
    I agree. BurakD53 (talk) 10:43, 21 June 2025 (UTC)Reply
    Since I don't have enough information about Arghu, I'm assuming that "Middle Arghu" is the one that is attested in DLT? If so, then I propose Proto-Arghu > Middle Arghu > Arghu, but as @AmaçsızBirKişi said, we could use a different name for Middle Arghu. Ardahan Karabağ (talk) 09:45, 21 June 2025 (UTC)Reply
    I think Middle Arghu is a good choice because this kind of term is also in use in other language families. For example, Middle Chinese, Middle Mongolian, Middle English, etc. @AmaçsızBirKişi @Ardahan Karabağ BurakD53 (talk) 10:46, 21 June 2025 (UTC)Reply
    I mean, it is, if there is no Proto-Arghu without asterisks. If there is, I'm not sure how old Arghu branch is, maybe also... 🤷‍♂️ BurakD53 (talk) 10:50, 21 June 2025 (UTC)Reply
    @BurakD53 My apologies, I have been busy but I will look into the feasibility of implementing the anti-asterisk feature today or tomorrow at the latest. I don't think it will be difficult but I need to verify this in the code. Benwing2 (talk) 19:35, 21 June 2025 (UTC)Reply
    My request is:
    • Proto-Turkic: (trk-pro)
      • Oghuz: [trk-ogz] (family)
        • Proto-Oghuz: (trk-ogz-pro) <<<<<<<<<<
          • Salar: (slr)
          • Turkmen: (tk)
          • Old Anatolian Turkish: (trk-oat)
    Are all of these meant to be separate L2's? Even Cypriot Turkish? Can you clarify this? Benwing2 (talk) 19:03, 30 June 2025 (UTC)Reply
    Cypriot Turks formally use the Turkish language, so it shouldn't be. BurakD53 (talk) 19:08, 30 June 2025 (UTC)Reply
    it's just a dialect close to southwestern dialects BurakD53 (talk) 19:09, 30 June 2025 (UTC)Reply
    You need to make a full proposal indicating what is an etym-only language, what is an L2 language, who is the parent and ancestor of what, and what the existing situation is. It's too confusing in the form you've presented it, for someone like me who is not an expert on Old Turkic languages. Benwing2 (talk) 20:00, 30 June 2025 (UTC)Reply
    If we are going to add new langcodes, we should add Ajem-Turkic/Classical Azerbaijani where you left a question mark too, is my suggestion. There's previous discussion over it too, and I remember it was favored somewhat.
    AmaçsızBirKişi (talk) 19:14, 30 June 2025 (UTC)Reply
    There is az-cls code but it's the descendant of az, maybe it was a mistake BurakD53 (talk) 19:19, 30 June 2025 (UTC)Reply
    I don't have a clear opinion on this topic. As I said before, someone should extract the az-cls sources and clearly define what data this language is based on. BurakD53 (talk) 19:21, 30 June 2025 (UTC)Reply
    I believe [az-cls] would correspond to 14th-18/19th century Azerbaijani literature, like Chagatai. I'm sure at least someone in the future might get an interest and start creating such entries like the bulk of Ottoman entries we have
    AmaçsızBirKişi (talk) 19:35, 30 June 2025 (UTC)Reply
    I don't have the works in this language; I've heard of some but I can't find their PDFs. – BurakD53 (talk) 23:09, 30 June 2025 (UTC)Reply
    Also while we're at it Fuzuli, Şah İsmail etc. would be Classical Azerbaijani. And since some archaic words like yügüş, şol are technically unattested in the Latin script and therefore in Modern Azerbaijani it would belong at Classical Azerbaijani. Bartanaqa (talk) 19:26, 3 July 2025 (UTC)Reply
    The definition of Classical Azerbaijani clearly states it is a historical register of the Azerbaijani language, not a language in Wiktionary sense. Any quotation of Azerbaijani words from 16 to early 20th centuries should be entered under L1 Azerbaijani, if needed with the addition of the label {{lb|az|Classical}}, which is essentially a shorthand for the combination of labels {{lb|az|archaic}} and {{lb|az|poetic}}. Allahverdi Verdizade on a flying visit (talk) 09:11, 25 July 2025 (UTC)Reply
    Weird how this doesn't apply to Turkish/Ottoman Turkish. Bartanaqa (talk) 05:17, 26 July 2025 (UTC)Reply
    Sorry, not my area of expertise, but how does literature (both Turkish and English) usually label these DLT attestations? Looking up "Proto-Oghuz" "al-Kashgari" on Google Books did not give me the results I hoped for. I was wondering, we could keep Proto-Oghuz trk-ogz-pro as an etym-only code to Proto-Turkic, while keeping these forms under a new code and a new name, like Old Oghuz for example (trk-oog?), by analogy of the contemporary branches of Old Turkic, like Old Uyghur. Name and code may vary, take this generally as the two-code suggestion. Catonif (talk) 19:49, 30 June 2025 (UTC)Reply
    I don’t really think this is necessary, but honestly it doesn’t matter at all, because what we call it isn’t that important. I had actually suggested this at the beginning of the discussion. I thought Middle Oghuz might be an appropriate term. The naming was criticized, yet the Turkic languages of this period are referred to as Middle Turkic languages. So, calling it Middle Oghuz isn’t wrong, and calling it Old Oghuz isn’t wrong either. – BurakD53 (talk) 23:16, 30 June 2025 (UTC)Reply
    Since the languages we refer to as Middle Turkic lasted until the end of the Middle Ages, perhaps we made a mistake by using a broader term. So, Old Oghuz (trk-oog) is reasonable and appropriate. – BurakD53 (talk) 23:22, 30 June 2025 (UTC)Reply
    I didn’t quote the sentence that Kashgari wrote in Arabic, I included the usage example he provided in the sentence as a usage example. – BurakD53 (talk) 23:35, 30 June 2025 (UTC)Reply
    | Proto-Turkic | Old Oghuz (11th c.) | Modern Oghuz except Salar | Salar |
    | -g | -g | | |
    | k- | k- | g- generally | g- generally |
    | -gAn | -An | -An | -An for old words, -gAn as a suffix |
    | -gU | -AsI | -AsI | -gUsI as a suffix |
    | -gAk | -Ak | -Ak | -Ak |
    | *yarısgu | yarısa | yarasa | yarasan, yarsan, yersan |
    | ? | -gsI/-gsAk | ? | ? |
    | -gUçI | -dAçI | -IcI | -gUçI |
    | -dI | -dA | -DI | -ci |

    Old Oghuz is different than trk-oat or any other. – BurakD53 (talk) 00:15, 1 July 2025 (UTC)Reply

    How many words are we talking about? If it's like 10, it might not make sense to create a L2 language just for that. Benwing2 (talk) 03:22, 1 July 2025 (UTC)Reply
    more than 250 BurakD53 (talk) 09:36, 1 July 2025 (UTC)Reply
    There are 111 on my user page, and that's not even half of them. BurakD53 (talk) 09:43, 1 July 2025 (UTC)Reply
    OK, in that case it should be a separate L2 I think. The anti-asterisk feature is intended for cases where a language is primarily reconstructed but has a small number of scattered attestations, like Proto-West-Germanic. What you're describing sounds more like Proto-Norse, which we consider an attested language despite the "Proto-" prefix because it has a corpus of several hundred words. Benwing2 (talk) 20:32, 1 July 2025 (UTC)Reply
    OK, I see. BurakD53 (talk) 00:07, 2 July 2025 (UTC)Reply
    Am I going to have it? (trk-oog) – BurakD53 (talk) 21:54, 2 July 2025 (UTC)Reply
    I think this is reasonable. Proto-Turkic is something like 500 BC right? So Old Oghuz would be 1500 years later, which is a long time for linguistic developments to occur. @Catonif @AmaçsızBirKişi @BurakD53 what do you think? In order to create this I need to know:
    (a) which script(s) was/were the language written in? (Arabic? anything else? and is it Perso-Arabic specifically? We have a whole lot of different Arabic script varieties listed in Module:scripts/data)
    (b) what are the ancestor(s)? presumably just Proto-Oghuz?
    (c) what is the correct name? Old Oghuz or Middle Oghuz?
    (d) what are the direct descendant(s)? maybe Turkmen and Old Anatolian Turkish? Is Salar a descendant or does it descend from a sister language?
    (e) how different is this from Old Anatolian Turkish? Could we alternatively make this an etym-only variant of OAT?
    Benwing2 (talk) 02:27, 3 July 2025 (UTC)Reply
    (a) Arabic
    (b) Proto-Turkic > Proto-Oghuz > Old Oghuz
    (c) Old Oghuz or just Oghuz
    (d) Salar is a descendant. Turkmen, OAT and Salar are exact descendants.
    (e) No, we can’t. It is more archaic than Old Anatolian Turkish. The part I wrote as Modern Oghuz in the table above also applies to OAT: Old Oghuz temürgen, OAT demren; OO arqamak, OAT aramaq; OO tuğrağ, OAT tuğra; OO satğaşmaq, OAT sataşmaq; OO bâqırmaq, OAT bağırmaq; OO ö(:)tünç, OAT ödünç... So it is really different. BurakD53 (talk) 02:49, 3 July 2025 (UTC)Reply
    OO çekük, OAT çeküç BurakD53 (talk) 02:54, 3 July 2025 (UTC)Reply
    OK. Keep in mind it's possible for an ancestor of a language to be an etym-only variant of it (as with Old Italian vs. Italian) but if you think they're different enough that this doesn't make sense, I'll follow your advice. However we need to establish the time periods clearly; Wikipedia says that OAT was spoken from the 11th to the 15th centuries, which overlaps with the 11th century time frame for Old Oghuz. Benwing2 (talk) 02:59, 3 July 2025 (UTC)Reply
    Then Wikipedia is wrong, because the earliest Old Anatolian Turkish work was only written in the 13th century. Location: Eastern Anatolia. There are no written works before that. Mahmud al-Kashgari wrote his dictionary in the 11th century. Location: Middle Asia and probably part of Iran. If we assume Old Oghuz language as the language of the Oghuz Yabgu state, we can place it in the 9th-11th centuries, thus including the data recorded by Arab travelers passing through the Oghuz Yabgu territory during those years. BurakD53 (talk) 03:28, 3 July 2025 (UTC)Reply
    All right, once I hear from @Catonif and @AmaçsızBirKişi I will create the L2. Benwing2 (talk) 03:36, 3 July 2025 (UTC)Reply
     Support
    AmaçsızBirKişi (talk) 06:28, 3 July 2025 (UTC)Reply
     Support, thank you for the involvement. :) So what code are we settled on in the end? Because looking back at it xqa-ogz made perhaps more sense (not that it's a crucial detail, I don't mean to slow this down). Catonif (talk) 10:09, 3 July 2025 (UTC)Reply
    OK. Let's get xqa-ogz as an L2, so I can enter its entries. Later, if I have time to deal with xqa-arg and to find out how many lemmas there are, I'll request that one too.  Support BurakD53 (talk) 18:36, 3 July 2025 (UTC)Reply
    Shouldn't it be Old Oghuz rather than just Oghuz, which is properly the name of a family? What is the name in the literature? Benwing2 (talk) 18:44, 3 July 2025 (UTC)Reply
    Makes sense. xqa-ogz should be Old Oghuz. – BurakD53 (talk) 18:47, 3 July 2025 (UTC)Reply
    careful with the name "Old Oghuz" tho. Some Turkish scholars use it to refer to Old Anatolian Turkish Bartanaqa (talk) 18:47, 3 July 2025 (UTC)Reply
    @Bartanaqa Do you know how literature usually calls DLT Oghuz? Catonif (talk) 18:53, 3 July 2025 (UTC)Reply
    According to a book I have, titled "Ana-Oğuzca Durum Morfemleri" by Kenan Azılı:
    • Proto-Oghuz
      • Salar
      • Selchuk Oghuz
        • Turkmen
        • Horasan Turkmen
        • Old Anatolian Turkish
          • Turkish (<Ottoman)
          • Gagauz
          • Azerbaijani
    BurakD53 (talk) 19:12, 3 July 2025 (UTC)Reply
    Alternatively Medieval Oghuz is an option but Old Anatolian Turkish is also a medieval language. – BurakD53 (talk) 18:56, 3 July 2025 (UTC)Reply
    But I really liked it. It is also used for early Oghuzs in academia. – BurakD53 (talk) 19:00, 3 July 2025 (UTC)Reply
    I mean can't we just call it "Oghuz" or just like we are doing "Proto-Oghuz". Or alternatively "Middle Oghuz" but icl Old Oghuz might be the most fit. Bartanaqa (talk) 19:06, 3 July 2025 (UTC)Reply
    Other possible names are "Common Oghuz" (if this is truly the ancestor of all attested Oghuz languages) or "Early Oghuz". Benwing2 (talk) 19:13, 3 July 2025 (UTC)Reply
    Alright, sources I checked simply call these attestations "Oghuz", which for our scopes is too generic. "Middle Oghuz" would make sense but it is confusing to have a "Middle" older than an "Old" (OAT). "Common" and "Proto-" usually refer to theoretical concepts rather than attested languages. "Medieval" is undeniably true but perhaps too vague for a language with a definition this specific (i.e. DLT). "Early" is a synonym of "Old" much less common in language names. I say we adopt "Old", and if it isn't a recognised label we will make it one. Catonif (talk) 19:30, 3 July 2025 (UTC)Reply
    All right, we should just go with Old Oghuz or maybe Early Old Oghuz. If no further discussion I'll go with Old Oghuz. Benwing2 (talk) 20:03, 3 July 2025 (UTC)Reply
    Early Old Oghuz is a perfect match, this way it won't be confused with Old Anatolian Turkish. – BurakD53 (talk) 20:09, 3 July 2025 (UTC)Reply
    @BurakD53 @Catonif @Bartanaqa @AmaçsızBirKişi I created this language under the name "Early Old Oghuz" and put Salar, OAT and Turkmen as descendants. I don't know if putting Salar as the descendant is correct; it wasn't even specified as an Oghuz language, which I changed. Benwing2 (talk) 23:37, 3 July 2025 (UTC)Reply
    Thank you for all your efforts on this topic. And I also thank everyone else who has been involved with this topic for their support. Salar is an Oghuz language, and that's academically correct. It's not wrong this way. But we’ll get a clearer idea over time whether it actually descends from Early Old Oghuz. – BurakD53 (talk) 23:51, 3 July 2025 (UTC)Reply
    Thank you. Although "Early Old Oghuz" seems to imply the presence of "Late Old Oghuz" as another label. For what it's worth, in an informal vote on Discord "Old Oghuz" got 3/5 votes. Catonif (talk) 09:39, 4 July 2025 (UTC)Reply
    No, actually there is no need. Late Old Oghuz is Old Anatolian Turkish. Old Oghuz has been used to refer to OAT in academia. – BurakD53 (talk) 13:06, 4 July 2025 (UTC)Reply
    Yeah, since Turkmens used Chagatai to produce documents until the 18th century and the Salar just didn't, the only other medieval Oghuz language is OAT. Bartanaqa (talk) 13:25, 4 July 2025 (UTC)Reply
    I think Early Oghuz would be better. Yes, they are descendants of all Oghuz, because we're not sure if Oghuz in Karakhanid were homogeneous. Actually, we know that it wasn't. – BurakD53 (talk) 19:38, 3 July 2025 (UTC)Reply

    Name of the Yalë / Yale language

    I was looking at the page for bo and saw an entry for the language "Yale". I didn't know if this was an actual language, some kind of secret code used by Yale University students, or a spurious entry. When I tried to look for the language here on Wiktionary, I didn't find it at Yale, but I eventually found it at Category:Yale language which links to the Wikipedia page for the Yalë language. Before I can create an entry for the language's name, there is a question: should the language be canonically called Yale or Yalë on Wiktionary? And I believe this is a correct place to ask.

    Evidence: Currently the categories and entry section headers on Wiktionary do not use the diaeresis. The Wikipedia page, Wikidata Yalë (Q2992915), and ELP use Yalë. Glottolog does not use the diaeresis in the page title but the comment on endangerment does. SIL and a 2020 paper from SIL-PNG authors do not use the diaeresis. Note that this paper uses Yade in the pdf name, though it primarily uses Yale. The paper is authored by Aannestad based on data left by the Campbells, who died before the grammar could be written up formally (mentioned on pg 5). On page 10 it says: "Yale has also been called ‘Yade’ and ‘Yare’, due to different transcriptions of the sound [ɺ]; and in the Campbells’ orthography, it is properly spelled ‘Yalë’." This suggests to me that the canonical name here should be Yalë, but I don't think that's a decision I can or should make unilaterally.

    A decision may involve:

    • Creating a Yalë page with hatnote links between it and the Yale page
    • Updating section headings for existing entries (e.g. bo#Yale would be changed to bo#Yalë)
    • Updating category names (e.g. Category:Yale lemmas to Category:Yalë lemmas)
    • Creating a Wiktionary:About Yalë page, if it would contain something not already at Category:Yale language
    • Creating redirects or entries for common variant spellings/names for the language (e.g. Yale (language), Yadë, Yade, Yare, Nagatman, Nagatiman) to the canonical language name

    Misc:

    • My interest in this topic is mainly just internal consistency: if words of this language are present in Wiktionary, then the word naming the language should itself have an entry. (I'm not planning to make long term contributions on the topic.)
    • The language currently only has 8 terms here on Wiktionary and the English Wikipedia page actually defines more words from this language than Wiktionary does. I don't know if that means the language is essentially "out of scope" for Wiktionary or if it is currently a "stub".

    Solid kalium (talk) 23:51, 28 June 2025 (UTC)Reply

    Usually we don't include diacritic marks in the canonical names of languages if there is doubt as to whether the diacritic belongs. Wikipedia is not a good reference to use for this because they have a strong bias (based largely on Wikipedia user Kwamikagami) towards including diacritical marks regardless of what the literature prefers. Since Glottolog, SIL/Ethnologue and numerous sources agree on not including the diacritic, and there is no possibility of ambiguity or confusion, I would oppose a rename. Benwing2 (talk) 19:58, 30 June 2025 (UTC)Reply
    Thanks! I've added definition entries for this language and people at Yale#Etymology 2.
    I don't know if there's value in adding redirects or mentions at the alternate spellings/names. I haven't looked to see where they might be used other than in enumerated lists of alternate names. If you'd like me to add these, just let me know. Solid kalium (talk) 02:42, 1 July 2025 (UTC)Reply
    Also, just adding that I realized "in the Campbells’ orthography, it is properly spelled ‘Yalë’" means that the name of the language, in the language itself, in a particular orthography, is spelled with the diacritic. Which is distinct from the name of the language when writing about it in English. Solid kalium (talk) 03:00, 1 July 2025 (UTC)Reply
    Yup, exactly. As for creating entries for common variant spellings of the language name, that is completely fine as long as WT:CFI is respected, which means in practice that there exist at least three uses of the given spelling in "durably-archived media" or whatever (academic papers, etc.). I assume that this is the case for all of the spellings you list above. As for there being only 8 terms here, that just means no one has gotten around to adding more of them; no natural languages are out of scope for Wiktionary. The only thing is that if there's a practical orthography that has any use at all, it's best to cite the terms in that orthography (if possible) rather than using ad-hoc IPA-based spelling. I don't know what orthography the terms in the Wikipedia wordlist are written in so you'd have to poke around a bit to see if it matches the Campbells' orthography. Benwing2 (talk) 03:17, 1 July 2025 (UTC)Reply
    Thanks for taking the time to inform me! I'll keep this in mind when I contribute in the future. Solid kalium (talk) 15:53, 1 July 2025 (UTC)Reply

    July 2025

    Yeniseian languages

    The family tree's currently laid down like this, following older classification schema:

    • [qfa-yen-pro] Proto-Yeniseian:
      • [qfa-yno] Northern-Yeniseian:
        • (...)
      • [qfa-yso] Southern-Yeniseian:
        • (...)

    ...but it should be like this instead, as given in the Wiktionary:Proto-Yeniseian entry guidelines (and as per Vajda 2024:371, which is also the source we use on Wiktionary):

    • [qfa-yen-pro] Proto-Yeniseian:
      • [N/A] Ketic:
        • [ket] Ket:
        • [yug] Yug:
      • [N/A] Kottic:
        • [xss] Assan:
        • [zko] Kott:
      • [N/A] Arinic:
        • [xrn] Arin:
      • [N/A] Pumpokolic:
        • [xpm] Pumpokol:

    I do not request language codes for the family branches, but they could be useful in the future where we might have more branch-specific reconstructions.

    AmaçsızBirKişi (talk) 10:00, 2 July 2025 (UTC)Reply

    I don't think there's any way to reorganize the subfamilies of Yeniseian without creating codes for them. Also, I recently edited и and ит, both of which mention Proto-Ketic in their etymology sections, for which we have no code. If we make a code for the Ketic family, we may as well add "-pro" to it and create a protolanguage for the family while we're at it. —Mahāgaja · talk 10:56, 2 July 2025 (UTC)Reply
    Having something like [qfa-yke-pro?] (for Proto-Ketic) would be useful. Could a family/branch language code double as a proto-language code, too? To prevent clutter, of course.
    AmaçsızBirKişi (talk) 11:21, 2 July 2025 (UTC)Reply
    Proto-language codes are (almost?) always just the family code followed by -pro. They do need to be distinct. —Mahāgaja · talk 14:43, 2 July 2025 (UTC)Reply
    If so, then we would need:
    Ketic, Proto-Ketic: [qfa-yke(-pro)] (We have a ton of reconstructed Proto-Ketic lemmas.)
    Kottic, Proto-Kottic: [qfa-yko(-pro)] (Not as much as Ketic, but it's quite distinct from the rest.)
    Arinic, Proto-Arinic: [qfa-yrn(-pro)] (This one will also be useful for Xiong-nu lemmas, in literature it's called Old Arin, but Proto-Arinic works I guess.)
    Pumpokolic, Proto-Pumpokolic: [qfa-ypm(-pro)] (This one will also be useful for Xiong-nu and Jié lemmas.)
    Could you do this? I doubt if anyone would object to this at all.
    AmaçsızBirKişi (talk) 16:06, 2 July 2025 (UTC)Reply
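The naming pattern described above (a proto-language code is almost always the family code followed by "-pro") can be sketched in a few lines of Python. The family codes are the ones proposed in this comment, not codes confirmed to exist; `FAMILY_CODES`, `proto_code` and `proposed_codes` are hypothetical names used only for illustration.

```python
# Family codes as proposed in the comment above (assumed, not yet created).
FAMILY_CODES = {
    "Ketic": "qfa-yke",
    "Kottic": "qfa-yko",
    "Arinic": "qfa-yrn",
    "Pumpokolic": "qfa-ypm",
}


def proto_code(family_code):
    """Derive a proto-language code by appending '-pro' to the family code."""
    return family_code + "-pro"


def proposed_codes():
    """Map each proposed proto-language name to its derived code."""
    return {"Proto-" + name: proto_code(code) for name, code in FAMILY_CODES.items()}
```

The same rule gives the pattern seen in the existing `qfa-yen-pro` for Proto-Yeniseian, which is why the family code and the proto-language code have to stay distinct.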
    I know next to nothing about Yeniseian but in the interests of parsimony, can any of the above proto-languages be made etym-only varieties of Proto-Yeniseian? I don't know how old this family is or how different the various branches are. Benwing2 (talk) 03:02, 3 July 2025 (UTC)Reply
    Yes, we don't need more than just etym-only variants.
    AmaçsızBirKişi (talk) 06:25, 3 July 2025 (UTC)Reply
    @Benwing2 Will you implement these changes? You can go ahead and nuke the [qfa-yso] and [qfa-yno] langcodes too, since they are obsolete.
    AmaçsızBirKişi (talk) 09:10, 9 July 2025 (UTC)Reply
    Yes, I'll try to get this done in the next day or so. Benwing2 (talk) 19:10, 12 July 2025 (UTC)Reply
    Well? @Benwing2
    AmaçsızBirKişi (talk) 10:37, 3 August 2025 (UTC)Reply
    Done Benwing2 (talk) 21:32, 27 September 2025 (UTC)Reply

    Cumbric

    (Notifying RichardW57, Arafsymudwr, Llusiduonbach, Linguoboy, Silmethule, Brutal Russian, Mellohi!, Silmethule, AryamanA, Caoimhin ceallach, Exarchus, Mellohi!, Pulimaiyi, Victar): Although we have some lemmas in Cumbric in main space, the language is in fact totally unattested, only reconstructed. And it's not even reconstructed on the basis of an attested daughter language, but solely on the basis of place names in England and Scotland. Not enough is known about the language for us to say with any certainty how it differed from Proto-Brythonic, so I propose that we change Cumbric from being a full-fledged L2 language to being an etymology-only variant of Proto-Brythonic. Thoughts? —Mahāgaja · talk 07:15, 3 July 2025 (UTC)Reply

    For what it's worth, Jackson (1994, Language and history of early Britain, 4th ed.) states that three words are definitely Cumbric: kelchyn, galnes/galnys, and mercheta. That brings us to the argument we've had before here of whether we should consider these true Cumbric words or Latin words with Cumbric roots (because they occur in Latin texts). I'm not sure where we stand on this. —Caoimhin ceallach (talk) 22:50, 6 July 2025 (UTC)Reply
    Oh, that's true. I had forgotten about those. I don't think we've ever come to a consensus about how to handle words in barely attested languages that are only mentioned (not used) in a text in another language. Personally, I'm willing to keep Cumbric as a full language for the sake of these four terms (three different lexemes). But I do still think that any other Cumbric words should be listed as (reconstructed) Proto-Brythonic rather than reconstructed Cumbric, as we just don't know enough about it to distinguish reconstructed Cumbric from PBr. —Mahāgaja · talk 08:21, 7 July 2025 (UTC)Reply
    As I've stated in some earlier discussions, I am of the opinion that languages that are only attested through another language are unattested. An etym-only code seems fine specifically for these terms that seem to be borrowed from the language, but the attestation through another language also means the language is reshaped so much that any analysis becomes tricky.
    On that note, I think we have a bunch of other languages in a similar situation. For instance CAT:Thracian lemmas and CAT:Dacian lemmas seem to be filled with reconstructions that are based on borrowings, even though the attested material is so scarce, that a good reconstruction seems very difficult. I would also like to treat such terms as basically substrate terminology. Thadh (talk) 11:38, 7 July 2025 (UTC)Reply

    Ukrainian etym-codes

    Please add etymological codes for varieties of the Ukrainian language:

    AshFox (talk) 08:00, 4 July 2025 (UTC)Reply

    Support adding an ety-only code for Canadian Ukrainian (like we also have codes for things like Louisiana and Canadian French). An ety code for Podlachian also seems reasonable: it does entail considering it to be an ety-only variant of something, either Ukrainian or Belarusian (unless we cop out and set the parent to "Slavic" lol; let's not), but since as you point out we are already considering it a variety of Ukrainian, it seems reasonable to be internally consistent with ourselves and make the ety-only code a variant of Ukrainian. West Polesian seems to be in the same boat of being unclear whether it's Belarusian or Ukrainian; I can't speak to whether or not to add a code for it or what language to consider it a variant of. - -sche (discuss) 20:16, 8 July 2025 (UTC)Reply
    Ukrainian-Belarusian linguistic border
    @-sche West Polesian is, like Podlachian, a variety of Ukrainian. All other statements ‒ attributing West Polesian to allegedly Belarusian ‒ are just quoting what the Belarusian state propaganda says, which indiscriminately attributed all dialects that were within the borders of the Republic of Belarus to alleged "dialects of Belarusian". But languages do not follow the artificial borders of the state that were divided during the times of the USSR. All linguistic (not political) attempts to draw a line between the Belarusian and Ukrainian languages (see maps) ended up with the region called w:Beresteishchyna (w:uk:Берестейщина), where West Polesian is spoken, being within the boundaries of the dialect continuum of the Ukrainian language. AshFox (talk) 10:36, 9 July 2025 (UTC)Reply
    @AshFox: And any statement attributing West Polesian to Ukrainian follows the Ukrainian state propaganda, which considers any Ruthenian language except Belarusian proper as a variety of Ukrainian.
    West Polesian has its own orthography, and afaik some speakers even consider themselves a separate microethnos. I would not at all be surprised if about a third considers themselves Ukrainian speakers, a third Belarusian, and a third has no idea.
    You painting this as a "clear" and "transparent" issue is not helping. Thadh (talk) 12:26, 9 July 2025 (UTC)Reply
    @Thadh, no, there is no need to turn the arrows in the opposite direction here. Just take any linguistic map of the division of Ukrainian and Belarusian dialects and you will not find anywhere that the Beresteishchyna region and its dialects are considered Belarusian. The automatic attribution of the Beresteishchyna region dialects to Belarusian is a political division along state borders, not along the borders of dialect continua. AshFox (talk) 16:21, 9 July 2025 (UTC)Reply
    PS: I wouldn’t be surprised if you also attribute Ukrainian dialects in Kuban known as w:Balachka to the Russian language, simply because these dialects of the Ukrainian language ended up on the territory of the Russian Federation. AshFox (talk) 16:47, 9 July 2025 (UTC)Reply
    Why not put Polesian as a separate L2?
    PS. Why are the entries for the Podlachian language so badly shaped? Conjugation tables are just lists. Should I or somebody else try to fix them before adding any serious codes? Tollef Salemann (talk) 13:23, 9 July 2025 (UTC)Reply
    @Tollef Salemann There have been suggestions about adding a separate L2 code before. But the administration believes that any new L2 code in the Balto-Slavic branch is an unacceptable luxury. On the one hand, this would be a great option. But on the other, unfortunately, West Polesian is very poorly standardized and codified. I tried to find at least one dictionary, but I only found one made by enthusiasts from the West Polesian communities in Telegram on 35 pages.
    PS: As for Podlachian, there was also a request to add L2 code for it earlier, but no one expressed interest (#Add Podlachian Language). Podlachian is handled by only 1 person (native speaker), who has little experience, so he does it as best he can. And Conjugation tables look like this because they require separate templates, and regular Ukrainian Conjugation templates are not suitable. AshFox (talk) 16:32, 9 July 2025 (UTC)Reply
    @AshFox: Do you mean that @PGałązka is a native Podlachian speaker? To the best of my knowledge, this person only claimed to be "half-Podlashuk", which by itself could mean many different things and definitely does not guarantee any level of language proficiency. Unfortunately all my requests for clarification had been ignored so far. --Ssvb (talk) 16:35, 11 August 2025 (UTC)Reply
    @Ssvb well, I understand. I redid/specified Podlachian into a dialect of the Ukrainian language: Northern Ukrainian > Western Polesian Ukrainian > Podlachian Ukrainian. And there is no support at all from the administration regarding the L2 code for West Polesian, so I "crossed out" the topic. AshFox (talk) 17:28, 11 August 2025 (UTC)Reply
    Pardon me, what did you understand? Based on my guess using the available data, one of PGałązka's parents is originally from Podlaskie Voivodeship, the current place of residence is unknown (possibly outside of Podlaskie Voivodeship or even outside Poland). PGałązka's Podlachian language skills are likely nonexistent, but this person is definitely interested in this language due to it being the language of their ancestors. In his 2023 interview, Jan Maksymiuk explained that basically only old people can speak the Podlachian language, they didn't teach it to the youngsters, and now the youngsters are starting to show interest in the language. Is PGałązka one of the old people or one of such youngsters? As you can see, I'm leaning towards the latter, and I'm forced to make such guesses because it's been 4 months already with no feedback.
    I see that PGałązka is uncooperative, refuses to communicate with the other editors and simply copy/pastes information from the Jan Maksymiuk's dictionary, paying no attention to consensus or the existing entry guidelines. And this definitely doesn't look like a healthy situation to me.
    BTW, has anyone thought about trying to contact Jan Maksymiuk? --Ssvb (talk) 10:36, 13 August 2025 (UTC)Reply
    OK, I have added an etymology-only code for Canadian Ukrainian, uk-CA. Discussion can continue regarding the other two lects, but the existence of a Canadian variety of Ukrainian seems uncontroversial. - -sche (discuss) 21:45, 13 July 2025 (UTC)Reply
    @-sche God bless you. Thank you. AshFox (talk) 08:07, 14 July 2025 (UTC)Reply

    Podlachian and West Polesian: Ukrainian, Belarusian, or separate languages?

    Since the first discussion of Podlachian petered out without reaching a decision to do anything, and it looks like the discussion above might also fail to reach consensus on how to treat Podlachian or West Polesian, let me as a last resort ping all reasonably active users who list Babel-3 or higher knowledge of either Ukrainian or Belarusian (who haven't already commented above): @MaksOttoVonStirlitz, Mzajac, Alexander Mikhalenko, Kohannya, Alexdubr, GPodkolzin, NickK, Rayreat, Roman Popyk, Roman Shosirobe, Ukrenko, Underfell Flowey, Хтосьці, Ssvb, ɶLerman: do you have an opinion on how to treat Podlachian and/or West Polesian, and whether one or both are variants of Ukrainian, variants of Belarusian, or separate languages? - -sche (discuss) 22:34, 10 July 2025 (UTC)Reply

    No strong opinion, just a few thoughts. Both are examples of Slavic microlanguages, and I think they should be treated consistently with other microlanguages (which I did not research). Both of them are part of the dialect continuum: West Polesian is still one dialect away from the border on the Ukrainian side (w:uk:Загородські говірки to the north are considered the language border), while Podlachian is on a triple border (it becomes these same border w:uk:Загородські говірки further east and New mixed dialects of Polish further west). Of course it doesn't help that these microlanguages are not really codified — NickK (talk) 23:16, 10 July 2025 (UTC)Reply
    @-sche: I would probably make it a separate language. Although most speakers consider themselves Belarusians. However, according to the Экспедиция к белорусам Польши (2020) [Expedition to the Belarusians of Poland (2020)], many interlocutors called themselves Russian and their language Russian. It's hard for me to say anything. Lerman (talk) 23:38, 10 July 2025 (UTC)Reply
    I would not really trust the Russian Academy of Sciences after 2014 on what is Russian. Podlachian is by any means not close to Russian, it is probably close to Ukrainian, Belarusian and Polish first and only then comes Russian. Jan Maksymiuk, author of a Podlachian language standardisation project, calls the language more similar to Ukrainian but people more associating themselves with Belarusians, with Russian not even mentioned — NickK (talk) 00:54, 11 July 2025 (UTC)Reply
    The fact that it is supposedly "Russian" is, of course, nonsense. Podlachian is part of the Ukrainian dialect continuum, it is just on the periphery. This is confirmed by Jan Maksymiuk, this is confirmed by many old linguistic maps of the spread of dialects of the Ukrainian language. The fact that some residents say that they are "Belarusians" is simply due to territorial proximity to Belarus, and that's all. AshFox (talk) 08:22, 11 July 2025 (UTC)Reply
    @NickK: I see no reason not to trust the Russian Academy of Sciences. These are rather your personal prejudices, but I am not interested in them, I am interested in the results of the expedition. It is clear that the Podlachian language is not close to the Russian language. But this does not change the fact that, according to the results of the expedition, the majority of speakers (within the framework of, again, the expedition) call their language the Russian language. Unfortunately or fortunately, Jan was not the only one interested in this group. See also the second paragraph on page 182. Lerman (talk) 15:52, 13 July 2025 (UTC)Reply
    None of these sources mention that the majority of speakers call their language the Russian language. The first source only mentions that some speakers call their language русский; the second one states that people identify them, among others, as русские. The issue is that Podlachian distinguishes rosijśki and ruśki while Russian does not. Of course the word used in the Podlachian original was ruśki: ludowo-folklorystyczne określenia po-svojomu, po-našomu, po-ruśki, Na Pudlašy kryžovalisie kulturno-movny pôlśko-ruśki (potum biłoruśki i ukrajinśki) vpłyvy, te nazwy, które mu nadawano — po-svojomu, po-našomu, po-ruśki, po-chachłaćki etc. So the correct translation should be that some Podlachians identify themselves as Ruthenians and their language as the Ruthenian language, which is a very reasonable statement. Stating that the majority of speakers call their language the Russian language is very misleading. NickK (talk) 16:09, 13 July 2025 (UTC)Reply
    “called themselves Russian and their language Russian”: maybe “called themselves Ruthenians and their language Ruthenian” would be a better translation? You can’t really tell without context: the article is written in Russian, and Russian doesn’t make the distinction (both Ukrainian руський 'Ruthenian' and російський 'Russian' would be translated as русский into Russian). Хтосьці (talk) 09:08, 11 July 2025 (UTC)Reply
    Podlachian and West Polesian cannot be part of Belarusian. They are clearly micro-languages that belonged to or came out of the Ukrainian dialect continuum. Because they are on the periphery of the Ukrainian dialect continuum, they naturally have common features with neighboring Belarusian, but this does not make them part of Belarusian, as confirmed by numerous studies of the border between the Belarusian and Ukrainian languages and the corresponding maps (1, 2, 3, 4, 5, 6, 7, 8) based on them. Modern Belarusian linguistics classifies all dialects that are present on the territory of the Republic of Belarus as allegedly "Belarusian". But this is a political division, not a linguistic one. Languages did not follow the artificially created borders during the USSR.
    If Podlachian and West Polesian are ever given separate L2 codes, they should be listed as inherited from Middle Ukrainian (zle-muk). If Podlachian and West Polesian are assigned only etymological codes, then they must be indicated as part of Ukrainian (uk). PS: I would be happy if Podlachian and West Polesian would get separate L2 codes. I would gladly edit West Polesian. And @PGałązka would do something for Podlachian. But I am a realist and I understand that separate L2s are unlikely, the administration will never agree. AshFox (talk) 08:48, 11 July 2025 (UTC)Reply
    Agree with your arguments. I hope that some more people can contribute on the subject. Anyway, we need separate codes for the declension tables in the entries which are already created. Tollef Salemann (talk) 11:48, 11 July 2025 (UTC)Reply
    I have a pretty limited knowledge of both these languages, however I agree with NickK's assessment that they should be treated the same way other Slavic microlanguages are treated.
    If the consensus is reached that they are not to be treated as distinct languages, they should be listed as part of Ukrainian, since to me their origin as dialects of that language seems most plausible. Roman Shosirobe (talk) 20:34, 11 July 2025 (UTC)Reply

    L2 for Podlachian and West Polesian

    @Benwing2 I am contacting you with a request. I am requesting the addition of L2 codes for Podlachian (zle-pdl) and West Polesian (zle-wpo). This issue of Slavic microlanguages should have been finished long ago, when Rusyn was reorganized, even when @Thadh raised this issue about West Polesian in 2022. I would become a permanent editor of West Polesian, especially since it is very close to me as a Ukrainian. I would keep an eye on the section. I found several active Telegram communities where West Polesian speakers have collected widely available literature, dictionaries, dialect atlases, and books in West Polesian. They are very interested in this and would like to help me, and a couple of people would like to come to Wiktionary to add West Polesian entries themselves. I asked @Thadh on Discord about this and he is not at all against adding these microlanguages; he just said that he doesn't have enough time for it. AshFox (talk) 09:42, 12 July 2025 (UTC)Reply

    @AshFox I know little about Slavic microlanguages. How close are Podlachian and West Polesian to each other? If they are close, it might not make sense to have two separate L2's. Also I'm concerned that West Polesian doesn't appear standardized; we might well end up with the "Scots problem", because Scots is another non-standardized language: We have a bunch of poorly formatted Scots entries, no active editors, no consensus on what sources count for inclusion and under what orthography, no consensus on where the division between Scots and dialectal English occurs, and occasional IP's adding entries that might be totally bogus but where we have no one to check them. Before adding any new L2's I'd like to see these issues addressed:
    1. What orthography should be used? Do we normalize into a consistent orthography or do we just use whatever the original source does?
    2. What are the legitimate sources for these languages? If an entry is challenged in RFV, which sources count for inclusion purposes?
    3. What are the formatting conventions for these languages and how do we ensure the entries stay high-quality?
    4. Where is the boundary between these languages and Ukrainian, Belarusian, and other neighboring microlanguages?
    Benwing2 (talk) 19:05, 12 July 2025 (UTC)Reply
    Also pinging @-sche and @Vininn126 for any thoughts or other concerns. I know that -sche has wrestled with Scots, and Vin created Masurian as an L2 and later ended up thinking better of it. What were your reasons for doing this and do you have any concerns I haven't articulated? Benwing2 (talk) 19:08, 12 July 2025 (UTC)Reply
    For the record, Masurian is still very much an awkward case. Early Masovian as an entire group shares a LOT with early Pomeranian, and also later developments of Masurian are possibly equal in number to Kashubian. Vininn126 (talk) 19:34, 12 July 2025 (UTC)Reply
    Hello @Benwing2. Podlachian and West Polesian are close, but not close enough to be considered one language. Despite their closeness (which is typical for any related languages), they are two separate microlanguages, and Podlachian is standardized, uses the Latin alphabet (as opposed to the Cyrillic alphabet in West Polesian), and its literary version is very far from the more widely used form of West Polesian, which I would like to standardize on Wiktionary. Don't worry, what happened with Scots won't happen in this case. I am the editor of Old Ruthenian zle-ort (where the language has a significant variation in spelling in ancient times), and I managed to standardize everything, I follow the section. I have also been actively editing Old Novgorodian zle-ono for over a year, where in ancient times there was generally a mess (for example, 8 different letters were used to write /o/, and 6 different letters were used for /e/), but nothing terrible happened, and now everything in the Old Novgorodian section is normalized and there is no chaos. The same will happen with West Polesian, where in the future, as I work with it, I will write a clear guide that other participants can follow if they want to add something, and others can double-check it. There will be clear normalized spelling and a clear list of sources from which you can take lexical material. Regarding what happened to Masurian, when it was added and then cancelled, I am unfortunately not very familiar with Polish dialects and cannot judge. But as far as I know, Masurian was never considered a microlanguage. I don't think the same will happen to Podlachian and West Polesian.
    1. Regarding the orthography, I planned to use the alphabet developed by Nikolai Shelyagovich (1990). You can see its full alphabet in my draft: West_Polesian#Orthography. However, after talking to West Polesian speakers, they expressed ambiguous opinions about the Shelyagovich alphabet, suggesting to improve it by replacing 1 letter and adding +1 letter. I think the choice will be up to the administration, to use the original version of Shelyagovich alphabet in the first attempt at codification, or to allow the creation of their own normalization on Wiktionary.
    2. Regarding sources. The main sources are, first of all, a huge dialect atlas of Belarus[12][13], a dialect atlas of Western Ukraine[14], local atlases of dialects of Brest region[15][16][17][18], a two-volume dictionary of the West Polesian dialect[19], local dictionaries of dialects of Western Polesia[20][21][22][23][24][25]. I gave only a few examples. As I work with the language, I will create a guide and a bibliography (I have a lot of experience with bibliographies on Wiktionary), indicating a list of everything that is a legitimate source for West Polesian. Works like this[26] will also be useful.
    3. Perhaps I misunderstood the third question. But regarding the quality of dictionary entries, as I said above, I will create a guide with templates, normalization rules, and an acceptable bibliography, and in the future I will create templates for declensions and conjugations, etc. Are the dictionary entries I created earlier, such as Old Ruthenian гайстеръ (hajster), Old Novgorodian непърꙗ (nepŭrja), Old Pskovian дъжгь (dŭzʹgĭ), Carpathian Rusyn реселюв (reseljuv), etc., "of insufficient quality"? If they are acceptable, I see no reason why they would be worse for West Polesian.
    4. Regarding the boundaries. They literally run along the borders of the West Polesian dialects of the Ukrainian language, since West Polesian comes from the (Middle/Modern) Ukrainian dialect continuum. For example, on this map of Central and West Polesian dialects (at the junction of the borders of Ukraine, Belarus and Poland), West Polesian itself is seen mainly in pale yellow/orange on the territory of Belarus. Yellow on the territory of Poland is Podlachian. Even on the dialect atlas of Belarus (which will be one of the main sources), on many maps the southwestern region of Belarus is different and stands out in phonetics brightly. For example, map 5, map 6, map 7.
    PS: Once there is a separate Podlachian, I will help @Gałązka with the design of articles, help create the necessary templates, and also transfer to Podlachian from Ukrainian those articles that he has already created. AshFox (talk) 13:51, 14 July 2025 (UTC)Reply
    @Benwing2 so what's your opinion? AshFox (talk) 10:14, 17 July 2025 (UTC)Reply
    @Benwing2 so what's your opinion? #2 AshFox (talk) 05:25, 1 August 2025 (UTC)Reply
    So my thoughts are:
    1. We shouldn't add two L2's at once; one at a time, to make sure that the first one works out.
    2. We shouldn't add any Slavic lect L2's that aren't (a) standardized, and (b) with clear evidence that the standard is accepted by speakers and in use.
    3. You mention creating guides for templates, normalization rules, bibliography, etc. IMO this should be done before adding an L2.
    4. In general, having only one editor who works on a language isn't good unless it's pretty much guaranteed that this editor will stick around for a long time, because otherwise the language will get orphaned. I still foresee a repeat of the Scots debacle and want to avoid it.
    Benwing2 (talk) 07:24, 1 August 2025 (UTC)Reply
    @Benwing2
    1. Okay, let me focus on West Polesian first. And let's add it as L2.
    2. I will use the West Polesian standardization developed in 1990. I will standardize all entries (from existing paper dictionaries and atlases) to match that orthography.
    3. Just created a template for transliteration: Module:zle-wpo-translit. And Template:zle-wpo-categoryTOC. Well, I planned to create most of the templates after the separate code appeared, because for example I planned to use a new Module:bibliography, but it is impossible to create it without the code itself. An example of Bibliographies I have created previously: Old Novgorodian (data/zle-ono) or Carpathian Rusyn (data/rue), I want to do something similar for West Polesian.
    4. Unfortunately, in many Slavic languages there are few permanent editors (1 or 2 people) or none at all. But if there are people with initiative (like me), then that’s not bad. I've been on Wikipedia since 2012 and have been actively editing Wiktionary since 2021. I have no plans to leave. And we can all die at any moment, but if we are afraid of death, then it is impossible to live like that. I will edit Wiktionary and keep an eye on the sections that interest me for as long as I live. There won't be a disaster like with Scots, I assure you.
    AshFox (talk) 18:31, 1 August 2025 (UTC)Reply
    @Benwing2 I created a template for a West Polesian bibliography module in the sandbox. What is your further opinion? AshFox (talk) 13:58, 3 August 2025 (UTC)Reply
    @AshFox: You might be interested.
    1. Martynov, V. V., Tolstoj, N. I., editors (1968), Полесье: Лингвистика. Археология. Топонимика [Polesie: Linguistics. Archeology. Toponymy]‎[27] (in Russian), Moscow: Nauka
    2. Tolstoj, N. I., editor (1968), Лексика Полесья [Vocabulary of Polesia]‎[28] (in Russian), Moscow: Nauka
    Lerman (talk) 23:35, 24 July 2025 (UTC)Reply
    @ɶLerman Thanks, yes these are also good sources. I had the second book, «Лексика Полесья» (1968), but I hadn't noticed the first one, thanks for the tip. AshFox (talk) 11:24, 25 July 2025 (UTC)Reply

    If anyone discerns a consensus above, please comment; for my part, I get the sense we may just have to wait for scholars and speakers to document and classify and standardize these lects better before we can make any major changes. But it seems fine to add ety-only codes (are there any objections? if not, I will add them later), because whether they are viewed as dialects of one language or another or as separate languages, the existence of them does not seem to be in any serious doubt, and if words in them need to be mentioned (as comparanda etc) in etymologies, it seems reasonable to have a code to make that more convenient. Indeed, if people later decide to reclassify them (making them separate languages, or dialects of Belarusian instead of how we are currently de facto handling at least one of them, as dialects of Ukrainian), having tagged mentions of words in them with their own code will also make that process smoother. - -sche (discuss) 06:33, 1 August 2025 (UTC)Reply

    @-sche: I think it is possible to create codes, but I am not sure about the classification. A long time ago I encountered the problem of the lack of code for the Old Slovene language, but it seems that Sławobóg has now raised this issue and I support him in this. Lerman (talk) 20:55, 1 August 2025 (UTC)Reply
    Ultimately, the Ruthenian Tree should look like this:
    Old Ruthenian (zle-ort)
    ├─ Middle Belarusian (zle-mbe) V
    │  └─ Belarusian (be)
    └─ Middle Ukrainian (zle-muk) V
       ├─ Carpathian Rusyn (rue)
       ├─ Ukrainian (uk)
       │  └─ Canadian Ukrainian (uk-CA) V
       └─ West Polesian (zle-wpo)
    By the way, Carpathian Rusyn also needs to be transferred to the descendants of Middle Ukrainian. This is an old mistake that was forgotten to be corrected a couple of years ago. AshFox (talk) 09:42, 2 August 2025 (UTC)Reply
    Stop trying to push this through without any consensus. Thadh (talk) 19:02, 9 August 2025 (UTC)Reply
    @ɶLerman if u support Old Slovene then say it in Old Slovene topic. Sławobóg (talk) 15:21, 3 August 2025 (UTC)Reply
    @-sche: For the Podlachian language we seem to have no trustworthy cooperative editors at the moment (see my previous comment about @PGałązka above), so I don't see any particular urgency in getting this resolved until we know who is going to maintain the new L2.
    Also I wonder why is it named "Podlachian" and not "South Podlachian"? The northern part of Podlaskie Voivodeship is apparently ethnic Belarusian, including its administrative center Białystok, based on the maps posted by @AshFox. Simply labelling this literary standard as "Podlachian" may be misleading, because this kinda unreasonably calls dibs on the whole Podlaskie Voivodeship region. --Ssvb (talk) 11:07, 13 August 2025 (UTC)Reply
    As for just the ety-only codes without L2, does this mean that (South) Podlachian terms are only to be added to the etymology sections of Belarusian and Ukrainian entries as cognates? --Ssvb (talk) 11:28, 13 August 2025 (UTC)Reply
    Podlachian is not a Belarusian language. Just as West Polesian in southwestern Belarus is not a Belarusian language. AshFox (talk) 11:36, 13 August 2025 (UTC)Reply
    Belarusian entries normally list Polish, Russian and Ukrainian cognates in their etymology sections at the moment (despite the fact that Polish, Russian and Ukrainian are not Belarusian). In the same manner, the Belarusian entries could start listing Podlachian cognates too. But again, my main concern is the Podlachian orthography. Do we have a consensus about using Jan Maksymiuk's orthography in Wiktionary? --Ssvb (talk) 12:22, 13 August 2025 (UTC)Reply
    @Ssvb in order to be able to specify Podlachian cognates in Etymologies in other languages, Podlachian needs to be L2. And this seems unlikely to happen...
    Jan Maksymiuk's orthography is the only "official Podlachian orthography". There are no others. But since Podlachian is not L2 language, it is considered a dialect of the Ukrainian language... and this means that I have translated almost all the entries of the word into Ukrainian Cyrillic. AshFox (talk) 12:47, 13 August 2025 (UTC)Reply
    Well, only Jan Maksymiuk's orthography has the ambition to become "official". But different authors seem to be writing in their own individual orthographic variants. The Podlachian grammar book has an appendix titled "Przykłady zapisu tekstów podlaskich" with many examples of these variants. One example is Wiktor Stachwiuk's book "Подых тэмры", which was published in 2021 and used Cyrillic. --Ssvb (talk) 16:22, 13 August 2025 (UTC)Reply
    @Ssvb, unfortunately, @Benwing2 hinted that he does not want to add a new L2 for the requested languages.
    "Podlachian" is now defined as a Podlachian Ukrainian subdialect within the Western Polesian Ukrainian dialect group. AshFox (talk) 11:33, 13 August 2025 (UTC)Reply
    Benwing2 listed a reasonable set of requirements, and it's not impossible to satisfy them in principle, but we seem to be far from that milestone at the moment. First of all it's necessary to confirm whether the Jan Maksymiuk's orthography is acknowledged by the Podlachian locals as truly representing their native language and whether the other Podlachian authors have already adopted it in their works or have plans to adopt it. --Ssvb (talk) 12:08, 13 August 2025 (UTC)Reply
    @Ssvb If you haven't noticed, I've fulfilled 90% of Benwing2's requirements. But the problem is that the administration has no interest in adding anything. AshFox (talk) 12:40, 13 August 2025 (UTC)Reply
    @AshFox I have to respond to this. I am not in principle opposed to adding a new L2 language if the requirements can be met, but I don't think you've satisfied most of my requirements. For example, I said that there needs to be a standardized orthography that is accepted by native speakers, and if I'm not mistaken, you said on Jul 9 that there isn't currently one. This seems like the biggest blocker; if there isn't a standard orthography, or if there are multiple competing ones, then either we have to do a lot of original research (not ideal and could easily end up totally wrong) or we're likely to get a big mess of nonstandardized forms. You also need to identify which sources are reliable, and in what ways; in my experience, a large percentage of sources for any given language are unreliable in one respect or another, and it takes a good deal of research and cross-comparison of resources to figure this out. Yes, not all existing L2's had this work done before adding them, but given that we have the tenable alternative of including these microlanguages as varieties of existing L2's, I think it behooves us to get this right so we don't end up with another Scots. Benwing2 (talk) 19:19, 13 August 2025 (UTC)Reply
    @Benwing2 I said in #2 ‒ 18:31, 1 August 2025 that I decided to use the spelling developed in 1990. Here is Orthography in my sandbox. I also made a translit module for this Orthography. AshFox (talk) 19:33, 13 August 2025 (UTC)Reply
    Who developed this spelling, is it accepted by native speakers and are there competing orthographies? Have you audited your sources to determine which ones are reliable and how? Benwing2 (talk) 06:11, 14 August 2025 (UTC)Reply
    @Benwing2
    • Orthography was developed by Nikolai Shelyagovich during the first West Polesian codification in 1989 ‒ 1995 (eng.Wiki, more details ukr.Wiki / rus.Wiki).
    • I have spoken to West Polesian speakers and they generally accept Shelyagovich's Orthography, except for one letter. They have a negative attitude towards the introduction of the Cyrillic letter Ј, ј to denote /j/, instead of the Й, й that is familiar to Eastern Slavs. Although linguist G. Tsykhun (2001), on the contrary, praised this innovation.
    • There are no competing orthographies. There is simply an attempt to write West Polesian using the Belarusian alphabet.
    • I don't quite understand the last question... The sources I am using are already of the highest authority: Dialect Atlas of Belarus (DABM, 1963), Dialect Atlas of Ukraine (AUM, 1988), the works of the linguist H. Arkushyn on Western Polesian dialects (2012, 2014, 2016), well-known works about Polesia (1968a, 1968b), etc. I did not use any "unreliable sources" to begin with, so there was nothing that needed to be checked.
    AshFox (talk) 08:15, 14 August 2025 (UTC)Reply
    > "Podlachian" is now defined as a Podlachian Ukrainian subdialect within the Western Polesian Ukrainian dialect group.
    Defined by whom? --Ssvb (talk) 12:13, 13 August 2025 (UTC)Reply
    Linguists studying Ukrainian dialects... all Western Polissian (which also includes Podlachian) are dialects of the Ukrainian language. AshFox (talk) 12:50, 13 August 2025 (UTC)Reply

    ISO code changes: add ynb, oak; remove dek, nte

    Since last we looked, the ISO has added or removed codes for a few languages:

    • dek, which we'd already comment-flagged in Module:languages as possibly spurious, has been retired; we don't seem to have a category for it and searches like insource:"l dek", insource:"m dek", insource:"t dek" turn up nothing, so I think we can just remove it.
    • nte has been merged into eko; we have an empty Category:Nathembo language and do not appear to have any content, so I think we can just remove it.
    • ynb Yamben was added; it seems reasonable for us to also add the code (and some words).
    • oak Noakhali was added, a Bengali lect.
    • BTW, some of the codes they added in 2022 are also still missing (vjk, lvl, wtb, ikh, eud, dzd).

    - -sche (discuss) 05:44, 23 July 2025 (UTC)Reply

    Pinging our most active Bengali-speaking editors, @Smartiphone7, Sbb1413, Ash wki: do you have any opinions on whether Noakhali should be added as a separate language, or handled as a {{lb|bn|dialect}} of ==Bengali==? - -sche (discuss) 05:48, 23 July 2025 (UTC)Reply
    @-sche Although I have no problem with Noakhali as a separate language, the problem with the Bengali lect is that it does not have a standardized written form, unlike Chittagonian and Sylheti. Maybe the East Pakistan dictionary can be used to derive an ad hoc orthography for Noakhali. Sbb1413 (he) (talkcontribs) 09:37, 23 July 2025 (UTC)Reply
    I have removed "dek" and added "ynb". - -sche (discuss) 22:34, 1 August 2025 (UTC)Reply
    I have removed "nte". - -sche (discuss) 16:56, 13 August 2025 (UTC)Reply
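    The bookkeeping in this section amounts to a set comparison between the codes the ISO currently lists and the codes we carry. A minimal sketch in Python, assuming two hypothetical in-memory sets (`local_codes`, `iso_codes`) and a hypothetical `merged_into` map; none of these names correspond to anything in Module:languages, and the sample data is limited to codes mentioned above:

```python
def plan_code_changes(local_codes, iso_codes, merged_into=None):
    """Return (to_add, to_remove, merge_targets) for a local code list.

    local_codes: codes we currently carry (hypothetical input).
    iso_codes: codes the ISO currently lists (hypothetical input).
    merged_into: optional map of retired code -> code it was merged into.
    """
    merged_into = merged_into or {}
    # Codes the ISO lists that we lack are candidates for addition.
    to_add = sorted(iso_codes - local_codes)
    # Codes we carry that the ISO no longer lists are candidates for removal.
    to_remove = sorted(local_codes - iso_codes)
    # For removed codes that were merged, record where their content belongs.
    merge_targets = {c: merged_into[c] for c in to_remove if c in merged_into}
    return to_add, to_remove, merge_targets


# Illustrative data only, taken from the codes discussed in this section.
local_codes = {"dek", "nte", "eko"}
iso_codes = {"eko", "ynb", "oak"}
to_add, to_remove, merge_targets = plan_code_changes(
    local_codes, iso_codes, merged_into={"nte": "eko"}
)
print(to_add)         # ['oak', 'ynb']
print(to_remove)      # ['dek', 'nte']
print(merge_targets)  # {'nte': 'eko'}
```

    A code appearing in `to_add` is only a candidate: whether to add it as a separate language (as with oak here) remains an editorial decision.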

    Lemmatization for Indonesian regional languages

    Hello everyone, I propose that Balinese and Javanese (and other regional languages of Indonesia) be lemmatized in Latin script. Because both languages today are mostly written in Latin script by their speakers, the use of traditional script is rare and only used for special occasions, for example on certain road signs. Even almost all modern dictionaries of both support this, for example this Javanese dictionary and this official Balinese dictionary; also Google translator and wikis for both languages are written in Latin script too (unlike Hindi, Urdu or Bengali). Even if there is a concern that the use of Latin script is ambiguous, this can easily be resolved by using additional diacritical letters (e.g. "é", "è", "ò" for Javanese; or e.g. "é" for Balinese). Rentangan (talk, contribs) 08:39, 23 July 2025 (UTC)Reply

    Tagging @Austronesier @Rex Aurorum, @Swarabakti, @Sponge2490, @Wiktionarian89 and @Udaradingin for opinions. Rentangan (talk, contribs) 08:42, 23 July 2025 (UTC)Reply
    Totally agree. Udaradingin (talk) 12:36, 23 July 2025 (UTC)Reply
    Agree. I think it's better if their entries are standardized along with other Indonesian languages. Acehnese and Sundanese main entries are already written in the Latin script with diacritics. It makes sense for Balinese and Javanese to follow the same style. The regional script entry can be created with a spelling template, just like how Sundanese uses the {{su-hana}} template. Sponge2490 (talk) 10:15, 24 July 2025 (UTC)Reply
    Agree per the above comments. If anything, it is Old Javanese and Old Sundanese that should be lemmatized in their (attested) non-romanized forms, though IDK how practical (?) that would be. Swarabakti (talk) 14:26, 24 July 2025 (UTC)Reply
    Yeah, if the traditional scripts are really important, then they should be only lemmatized for the old(er) stage of the languages (for example, I think Classical Malay should be always linked in Jawi), although as you said before, it might not be practical sometimes. Tagging @Xbypass and @Suku Melayu for additional opinions. Rentangan (talk, contribs) 20:32, 24 July 2025 (UTC)Reply
    I support lemmatization in Latin script. Suku Melayu (talk) 20:37, 24 July 2025 (UTC)Reply
    After considering the Malay Arabic script, aka Jawi (which has limited vowel representation), I support lemmatization in Latin script for Malay entries. For Classical Malay, while I would prefer to have it in the Malay Arabic script, I have no knowledge of how the language was pronounced (that is, how the Malay Arabic script was read). Xbypass (talk) 08:46, 26 July 2025 (UTC)Reply
    In the case of Old Javanese: it is written in the Kawi/Old Javanese, Javanese, and Balinese scripts. While I would prefer to make the Kawi/Old Javanese script the main entry and the others soft-redirect entries, the Kawi/Old Javanese script has technical issues with font and entry support (Kawi was only added to the Unicode Standard in version 15.0, in September 2022), so lemmatization in Latin can act as a practical temporary solution until those technical issues are resolved. Xbypass (talk) 08:53, 26 July 2025 (UTC)Reply
    I agree, and I recall this has been brought up before. Personally, I don’t have much say in the matter since I'm not a native speaker, but I've noticed that there are some users who might be native speakers who strongly prefer using the traditional scripts as the main entry forms, which can make it difficult to reach consensus. Wiktionarian89 (talk) 02:26, 25 July 2025 (UTC)Reply
    Tagging @Mrachmad59 for opinion. Rentangan (talk, contribs) 03:46, 26 July 2025 (UTC)Reply
    Agree, and this will be easier for users from a usability perspective. Mrachmad59 (talk) 03:20, 27 July 2025 (UTC)Reply
    If it is lemmatised in Latin script with diacritics, the concerns about this idea are
    1. The Latin standard is written without diacritics.
    2. The proposed Latin spelling template, such as {{su-hana}}, does not redirect to a specific sense, but to an ambiguous page. Different senses have different, non-interchangeable traditional spellings but share the same Latin spelling.

    As for Indonesian, Indonesian entries have exactly this problem of differentiating sounds: a single entry ends up clustering several different pronunciations and becomes long-winded. While I appreciate the wish to have entries "standardized along with other Indonesian languages," I think this depends on the specific circumstances of the particular language.

    If it is lemmatised in traditional script, with the Latin entries using Romanisation templates, the Latin-script entry can redirect to the specific traditional-script entry and sense, and the entry does not become long-winded.

    Since "the basic idea is very simple: you have one main entry with most of the information and lists of alternative forms, and you have multiple alternative form entries that are there mostly to link to the main form" and a "good dictionary is characterized by its clarity, accuracy, and comprehensive coverage of language," I prefer the traditional script to the Latin script, as it keeps the clarity of the traditional spelling while keeping redirection from the commonly used Latin script accurate. It is comparable to the Wiktionary vote to keep Chinese entries in traditional script instead of the commonly used simplified script (Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese).

    Nevertheless, I know that there are not many people who understand the traditional script, but that can be an extra point in Wiktionary's favour in comparison to the modern dictionaries. Xbypass (talk) 08:39, 26 July 2025 (UTC)Reply
    Okay, I kind of see your points, but still, if we really persist in using traditional scripts, it feels very weird. Why? These languages are not like Bengali or Hindi, where there is active, significant usage of traditional scripts. And no, just because we lemmatize them (Indonesian regional languages) in Latin script doesn't mean we completely remove the alternative traditional script spellings. Also, continuing to insist on lemmatizing in traditional script even though most speakers are only proficient in Latin script does not seem neutral and may only make it more difficult for native speakers who really want to look up words and definitions in a dictionary (I even think the soft redirection mechanism you mentioned earlier is impractical for such a language context). Overcoming the ambiguity of Latin is very easy: just use additional diacritical letters, and if there is a nonstandard diacriticless spelling, then just create a new stub entry or etymology section and add, for example, {{nstd sp|..|..}}. Also, the template {{su-hana}} can be given a specific meaning, for example {{su-hana|kolot|t=old}}... and yes, I have to admit that these templates cannot redirect to a specific sense/etymology, but they could probably be improved in the future. The comparison you provided doesn't seem to apply to this case. Rentangan (talk, contribs) 10:50, 26 July 2025 (UTC)Reply
    If it's written without diacritics, it can easily be replaced using the {{nstd sp}} template, as shown in an entry added by Zayn Kauthar. Alternatively, both languages can be lemmatized in Latin spelling without diacritics since most speakers don’t typically write with diacritics, while listing the standard form (with diacritics) under the 'Alternative forms' section, though this will result in an "entry clustered with different pronunciation and made such long winded entry".
    Also, if the traditional script can’t currently be linked to a specific sense, the template can be modified to redirect to a sense marked with the {{senseid}} tag. Or we could just check which definition matches the traditional script listed in the head template. Sponge2490 (talk) 13:39, 26 July 2025 (UTC)

    If the template {{su-hana}} can be given a gloss, as in {{su-hana|kolot|t=old}}, then it defeats the purpose of "the basic idea is very simple: you have one main entry with most of the information and lists of alternative forms, and you have multiple alternative form entries that are there mostly to link to the main form", as the traditional script entry has to maintain information (at least the sense).

    As these templates do not at the moment redirect to a specific sense in Latin script, and I have not seen "the template can be modified to redirect to a sense marked with the {{senseid}} tag" or "we could just check which definition matches the traditional script listed in the head template" work in a real situation, I still prefer to lemmatise in traditional script. I know that "they could probably be improved in the future," but we are holding this discussion now, not in the future.

    The point is that diacritical letters are not the standard. The problem lies in maintaining clear convertibility between Latin and traditional script while keeping a single main entry. If lemmatization is done in Latin script, the traditional script entry has to include the "sense", which breaks the single main entry rule. If lemmatization is done in traditional script (as with Chinese entries), then no problem arises except for "practicality". Xbypass (talk) 00:07, 27 July 2025 (UTC)
    FYI, the Balinese {{ban-bali}} template already supports the |id= parameter, which can lead to a specific sense provided that a {{senseid}} is included in the definition. For example, adding {{senseid|ban|honey}} to the second definition of madu while changing the ᬫᬥᬸ definition to {{ban-bali|madu<id:honey>}} or {{ban-bali|madu|id=honey}} will redirect it to the honey definition and subsequently highlight it. Though I don't see similar functionality with the Javanese template. Sponge2490 (talk) 09:17, 27 July 2025 (UTC)
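    The setup described in the comment above might look roughly like the following sketch. The template calls are the ones quoted in the comment; the surrounding section layout and the placeholder first definition are assumptions made for illustration, not the actual content of either entry.

    ```wikitext
    <!-- In the Latin-script entry "madu" (layout assumed; only the senseid call is taken from the comment) -->
    ===Noun===
    {{head|ban|noun}}

    # (some other sense; placeholder)
    # {{senseid|ban|honey}} honey

    <!-- In the traditional-script entry "ᬫᬥᬸ" (layout assumed) -->
    ===Noun===
    {{head|ban|noun}}

    # {{ban-bali|madu|id=honey}}
    ```

    The inline form {{ban-bali|madu<id:honey>}} is given in the same comment as an alternative to the |id= parameter.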
    Nevertheless, it results in a long entry for madu and requires adding a definition in ᬫᬥᬸ, such as {{m|ban|ᬫᬥᬸ|t=honey}}. Meanwhile, adding {{senseid|ban|honey}} and {{ban-bali|madu|id=honey}} fails to give the correct position of the honey definition (instead of to fight), although it does correctly highlight the honey definition.

    Hence, as these templates do not at the moment redirect to a specific sense in Latin script, and I have not seen "the template can be modified to redirect to a sense marked with the {{senseid}} tag" or "we could just check which definition matches the traditional script listed in the head template" work in a real situation, I still prefer to lemmatise in traditional script. Xbypass (talk) 11:52, 31 July 2025 (UTC)
    The traditional/simplified characters dichotomy in Chinese entries is not quite comparable to the traditional scripts/Latin orthographies dichotomy in the regional languages of Indonesia, as the latter have developed largely independently of each other. That is to say, the traditional script spelling conventions often have had little influence on their Latin counterparts, though for Javanese in particular there have been attempts to harmonize orthographies in both writing systems. In any case, it is rather misleading to speak of the Latin forms commonly used by speakers of Indonesian regional languages as "romanisations" of the traditional scripts.
    Actually, I think the convention of lemmatizing the traditional forms in Chinese entries is the perfect analogy for the usage of {{nstd sp}} to soft-redirect entries with diacriticless Latin to the ones with diacritics; they are both basically the same writing system, with the former being "simplified" forms combining many of the latter forms.
    Of course, as you said, there is still the problem that the use of diacritics in regional languages of Indonesia is hardly standardized, but at least for larger languages there have been official guidelines outlining spelling conventions published by language regulators, in addition to various dictionaries for smaller languages that may also be used to derive ad hoc orthographies (if necessary).
    Also, I fail to see why disambiguating the senses in the traditional script entries would defeat the purpose of having the main entries with most (not all!) information at the Latin forms. It is not much different from e.g. the way Vietnamese chu Nom entries are directed to the relevant quoc ngu forms. Obviously, there is more to the entries than just simple glosses... Swarabakti (talk) 12:32, 27 July 2025 (UTC)
    ...which does not happen in the traditional script, as it is pretty standardised, similar to the Chinese one. Chữ Nôm, however, is no more standardised than the Latin orthography, which is not the case for the Indonesian traditional scripts. For instance, there are three ways to write "chữ Nôm" in chữ Nôm itself (字喃, 𡨸喃, or 𡦂喃). So Vietnamese has a different problem from the Indonesian traditional scripts.

    However, the Vietnamese case is similar to what Old Javanese needs, so, as I wrote, "While I prefers to make the Kawi/Old Javanese as main entry while makes others as soft direction entries, the Kawi/Old Javanese script has technical issue (as Kawi script was added to the Unicode Standard 15.0 in September 2022) on font and entry support, so lemmatization on Latin can act as practical temporary solution until Kawi/Old Javanese has no technical issue." Xbypass (talk) 11:42, 31 July 2025 (UTC)
    So, just because of a technicality of the templates, we suddenly ignore the fact that speakers of these languages use the Latin alphabet as their everyday, commonplace script? I don't think that's fair, and labeling Latin spellings as "romanization" creates the false impression that they're used like Thai or Burmese. Come on, even dictionaries out there, including the official ones, use the Latin alphabet. I must say it again, we can still add traditional script spellings when relemmatizing entries to the Latin script. Rentangan (talk, contribs) 00:41, 28 July 2025 (UTC)
    Well, you ignore the problem that relemmatizing these entries in Latin does not solve the problem of clear conversion from traditional script to Latin and vice versa. So, come on, people want a clear dictionary... not just another Latin one. Xbypass (talk) 11:34, 31 July 2025 (UTC)
    I know that you want a unique dictionary that utilizes both Latin and traditional script to resolve this kind of ambiguity, and it's also elegant, right? Just like how I was (and probably still am) stubborn about Indonesian transitive verbs (although I decided not to bother with this topic anymore; anyone can freely edit any Indonesian verbs), but the more I consider it, the more I think it's best to stick with common usage, right? And yeah, I shouldn't have jumped to certain decisions too quickly. Rentangan (talk, contribs) 11:48, 31 July 2025 (UTC)
    Well, to be honest, Wiktionary has an advantage in this kind of convertibility issue, which other dictionaries sacrifice. Lemmatization in Latin orthography lets this capability go to waste. However, I agree with your point that people write these languages in Latin and that we should accommodate this in Wiktionary. Hence, I have a suggestion: while lemmatization is done in traditional script, the Latin entry is allowed to have pronunciation sections (IPA, homophones, etc.), the entry headers are the normal ones (not the "-form" ones or "Romanization"), and the definitions are soft redirects using the romanisation template.

    So, the entry on the madu page would look like this:

    ==Balinese==
    ===Pronunciation===
    * {{IPA|ban|/ma.du/}}
    * {{rhymes|ban|du|s=2}}
    * {{hyphenation|id|ma|du}}
    ===Noun===
    {{head|ban|noun}}
    # {{romanization of|ban|ᬫᬤᬸ}}
    # {{romanization of|ban|ᬫᬥᬸ}}
    ===Verb===
    {{head|ban|verb}}
    # {{romanization of|ban|ᬫᬤᬸ}}

    Xbypass (talk) 12:10, 31 July 2025 (UTC)

    I don't like this compromise; it still gives the false impression that the languages are used like Thai (and not even close to Hindi) even though they're always written in Latin orthography by native speakers. If we actually want lemmatization in a traditional script, apply it to the old stages of the languages instead, not the modern ones. Also, @Mrachmad59, a native Javanese speaker, agrees with lemmatization in Latin. Rentangan (talk, contribs) 05:30, 3 August 2025 (UTC)
    And I don't like the proposal to lemmatise in Latin, as it is unclear. Do you think that Wiktionary's lemmatization in Traditional Chinese means that the majority uses Traditional Chinese? Xbypass (talk) 03:00, 6 August 2025 (UTC)
    No, many Chinese speakers are also competent in writing the traditional characters. And do you remember when you were linking Malay entries in Jawi? Sorry, I suspect that the only reason you prefer lemmatization in traditional scripts is personal preference. If you still want to edit Wiktionary, why not create audio pronunciations or check entries in Category:Indonesian entry maintenance? Rentangan (talk, contribs) 04:05, 6 August 2025 (UTC)
    Sorry @Xbypass if I sounded harsh. So, you're welcome to contribute in any area, even in both languages. I must admit that sometimes your edits can be very helpful, but please use Latin orthography (and again, we can still provide the traditional spelling), and besides, do readers come here more interested in having entries written in traditional script than Latin?
    To address your concern, what if ambiguity arises if Latin orthography is used? If you mean to distinguish a particular vowel (e-é, for example), then we could standardize the spelling directly—it's more legible to readers than the traditional script—or if we don't want to use Latin orthography with diacritics, we could simply mark the diacritics in the headword. What if your intention is ambiguous because it would "make it difficult" for readers to find the intended meaning for terms with diverse etymologies? Let's pause and think twice: is that the only reasoning we use to prioritize traditional scripts (of which many native speakers are incompetent) over Latin? After all, almost all readers, if they're genuinely curious about a term, they would scroll from the top to the end of the entry and find the intended meaning. Isn't it a good thing to have very long entries (in a reasonable way), because we're indirectly entertaining readers with a wealth of information that might otherwise be overlooked?
    We should reconsider that as responsible editors. I want this dictionary to be accessible to a reasonable audience; I don't want it to treat these languages like museum pieces. And don't forget that we're talking about Javanese and Balinese (and other regional languages in Indonesia), not Thai or Hindi. Because all native speakers are proficient in Latin script today, but only a small minority are truly literate in their traditional scripts. Rentangan (talk, contribs) 03:28, 11 August 2025 (UTC)

    No, many Chinese speakers are also competent in writing the traditional characters.

    Of course, many Chinese speakers can write in Traditional, but most Chinese writing is done in Simplified.

    To address your concern, what if ambiguity arises if Latin orthography is used? If you mean to distinguish a particular vowel (e-é, for example), then we could standardize the spelling directly—it's more legible to readers than the traditional script—or if we don't want to use Latin orthography with diacritics, we could simply mark the diacritics in the headword.

    Hence, to resolve inconsistency and ambiguity in the Latin orthography, Wiktionary would add a "new standard" that is only used in Wiktionary. Personally, I see this as adding more unclarity. We should reconsider that as responsible editors. Basically, lemmatising in Latin orthography will result in an entry with multiple traditional spellings, and the soft redirects in traditional orthography will have to include the sense, which then has to be maintained.

    What if your intention is ambiguous because it would "make it difficult" for readers to find the intended meaning for terms with diverse etymologies?

    That has been accommodated in homophone templates.

    Let's pause and think twice: is that the only reasoning we use to prioritize traditional scripts (of which many native speakers are incompetent) over Latin?

    Moreover, since "many native speakers are incompetent", it becomes all the more important to maintain entries in traditional orthography, as Wiktionary has the capability to maintain clear conversion from traditional orthography to Latin orthography; but that is not the main reason.

    After all, almost all readers, if they're genuinely curious about a term, they would scroll from the top to the end of the entry and find the intended meaning.

    Sure, if they are curious, they will click or hover over the link for the definition. Nevertheless, soft redirection from traditional orthography needs to maintain senses in the soft redirection page, while soft redirection from Latin orthography does not.

    We should reconsider that as responsible editors. I want this dictionary to be accessible to a reasonable audience; I don't want it to treat these languages like museum pieces. And don't forget that we're talking about Javanese and Balinese (and other regional languages in Indonesia), not Thai or Hindi. Because all native speakers are proficient in Latin script today, but only a small minority are truly literate in their traditional scripts.

    That is another reason to do lemmatisation in traditional script, as it unlocks the traditional script for a wider readership, but that is not the main point. The main point is that the traditional orthography (in the Javanese and Balinese cases) is more consistent and clear than the Latin orthography. A good dictionary should be comprehensive and easy to read, containing definitions that are clear and up-to-date; hence it should be lemmatised in the clearest orthography (for Javanese and Balinese, the traditional one). Xbypass (talk) 23:41, 11 August 2025 (UTC)
    Again, the Latin orthographies are not mere conversion of the traditional scripts (or vice-versa for that matter—unless you're following KBJ '92 Javanese script orthography). There is nothing unclear about having the traditional script entries soft-redirect to the relevant Latin ones, with disambiguating glosses for different senses if necessary. In fact it would probably be more helpful for readers to find out the different etymologies (and thus different traditional script spellings) for terms that are homonymic in Latin, if they are discussed at once in the Latin entries.
    One more thing to consider: while languages of Indonesia are not considered well-documented on the internet for the purpose of WT:ATTEST, forms in traditional scripts are especially harder to find quotations for. If traditional script forms remain the lemmas, we also need to clarify whether quotations of the Latin forms (in any spelling) suffice to verify these lemmas. Even then, IMO it would be pretty confusing to have quotations for entries with traditional script headwords given in their Latin forms. Alternatively, we can separate quotations by forms, but this is potentially even more confusing since most quotations available for these languages will likely be found in Latin entries instead of the main traditional script entries.
    If we lemmatize these languages in the Latin forms, we can provide Latin quotations (which are much more readily available) under the same entries as definitions and other information. Of course, many of the Latin materials available are not following the exact same standardized orthographies (neither are actual attested texts in traditional scripts btw). But this can be solved by having them respelled if necessary (cf. Pabaru Cina and anggur sempani). I'd say that respelling quotations in the same writing system is still more intuitive to readers than having them used to verify forms written in different scripts altogether. Swarabakti (talk) 15:21, 31 July 2025 (UTC)
    All quotations should be in one place. If you have quotations in Aksara Jawa, they should be in the lemma entry, no matter if that lemma is in Latin script. Suku Melayu (talk) 15:27, 31 July 2025 (UTC)
    Adding disambiguating glosses to the soft redirect pages means that we maintain both entries, which defeats the purpose of "the basic idea is very simple: you have one main entry with most of the information and lists of alternative forms, and you have multiple alternative form entries that are there mostly to link to the main form". Xbypass (talk) 03:04, 6 August 2025 (UTC)
    I'd like to add my two cents to the discussion.
    I've almost never used the Sundanese script in my day-to-day life. While it's great that Wiktionary has entries for the Sundanese script variants of words, having them as the main entries while relegating the more widely used Latin script to mere "Romanisation" can be convoluted and discouraging to people who want to use the website. I can't say the same for Javanese, but as a native Sundanese speaker, I think it's better for the Latin script to be the main entry rather than the Sundanese script, for the same reason that Malay entries don't use Jawi as their main form. Zayn Kauthar (talk) 13:59, 26 July 2025 (UTC)
    Should we start a vote for this matter? (also ping @User:DDG9912, @User:Ekirahardian). Rentangan (talk, contribs) 12:05, 31 July 2025 (UTC)
    @Rentangan: I think yes, but where will we put this vote? DDG9912 12:08, 31 July 2025 (UTC)

    Re-classify Baima as Tibetic

    This should be uncontroversial. Overall the fact we still have Qiangic languages as an actual group is a bit iffy. Thadh (talk) 18:30, 26 July 2025 (UTC)

    August 2025

    Carpathian Rusyn fix

    @-sche, @Benwing2. There is an inaccuracy in the position of Carpathian Rusyn rue in the language tree, it arose a couple of years ago when regional variants were added to Old Ruthenian zle-ort, and Carpathian Rusyn was forgotten at that time... I've been wanting to put this in order for a long time...

    Current tree:

    ┬ Old Ruthenian (zle-ort)
    ├─── Carpathian Rusyn (rue)
    ├─┬ Middle Belarusian (zle-mbe)
    │ └─ Belarusian (be)
    └─┬ Middle Ukrainian (zle-muk)
      └─ Ukrainian (uk)

    Proposed tree:

    ┬ Old Ruthenian (zle-ort)
    ├─┬ Middle Belarusian (zle-mbe)
    │ └─ Belarusian (be)
    └─┬ Middle Ukrainian (zle-muk)
      ├─ Carpathian Rusyn (rue)
      └─ Ukrainian (uk)

    Old Ruthenian is a language of the period 1450‒1800, which had 2 regional variants:

    1. Northern (territory of modern Belarus) ‒ is designated by the etym-code Middle Belarusian zle-mbe. From it developed modern Belarusian be.
    2. Southern (territory of modern Ukraine) ‒ is designated by the etym-code Middle Ukrainian zle-muk. From it developed modern Ukrainian uk... and Carpathian Rusyn rue.

    It is necessary to indicate Carpathian Rusyn as a descendant of Middle Ukrainian. It is not for nothing that Carpathian Rusyn was considered a dialect of the Ukrainian language for a long time, because it is derived from the Ukrainian dialect continuum. Carpathian Rusyn inherited all the characteristic "Ukrainian features" that developed in Middle Ukrainian, such as: "Ikavism" (w:uk:Ікавізм), that is, the transition /o/ > ⟨ô⟩ > ⟨u⟩ > ⟨ü⟩ > /i/ (only in Carpathian Rusyn it has an "unfinished" character, stopping at ⟨ü⟩, which is typical for the border dialects of Middle/Modern Ukrainian); the characteristic Ukrainian transition of yat ѣ > /i/; the typical Ukrainian transition of old и /i/ > /ɪ/; and others. These are the main "Ukrainian" phonetic features, which at the same time turn out to be common to rue and uk. AshFox (talk) 16:30, 5 August 2025 (UTC)

    This has been discussed many many times on Discord and from what I've seen, AshFox was the only one in favour of this, and many people (including me) were sceptical that this is a good idea. Thadh (talk) 17:42, 5 August 2025 (UTC)
    There was not so much “skepticism” among the majority as “indifference and unwillingness to be interested” in what is happening in the East Slavic languages. Many simply edit only, for example, Polish and do not care what is going on in the East Slavic languages... it is enough for them that there is Russian, Ukrainian and Belarusian and that's it, nothing more is needed... and how they are connected to each other is a matter of indifference to them. The main argument of the opponents was that the "Ukrainian phonetic features" that developed in the period of Middle Ukrainian and distinguish Ukrainian from all other Slavic languages (and these features are “miraculously” present in Carpathian Rusyn, which is why many considered and consider it part of the Ukrainian language)... developed in Carpathian Rusyn independently of the entire Southern Old Ruthenian (Middle Ukrainian) area. For me, the absurdity of this argument is similar to, for example, declaring “that the Russian language is not East Slavic, because perhaps the characteristic East Slavic feature “pleophony” (*or → oro, *ol → olo ...) in Russian developed independently of Ukrainian and Belarusian, and Russian had no period of commonality with them.” AshFox (talk) 18:51, 5 August 2025 (UTC)
    @AshFox: You're misrepresenting the argument. Features may travel across dialect/language boundaries, so one sound change that isn't even the same in the two varieties we are talking about (despite how often you might claim that /y ~ u/ and /i/ is basically the same thing, they aren't) is not necessarily enough to posit a branch, considering Ukrainian and Belarusian also underwent sound changes together that Rusyn did not (e.g. the prothetic /v-/ before /u/). Note also that Belarusian and Russian both underwent the labialisation of a stressed /je/ > /jo/ in exactly the same conditions while, according to our tree model, already being two distinct languages by that time. Thadh (talk) 23:28, 5 August 2025 (UTC)
    No, it is precisely the transition o > i (through the stages o > ô > u > ü > i) that is the unique "Ukrainian feature" ‒ Ikavism, which arose in Middle Ukrainian. Ikavism arose in the central dialects of Middle Ukrainian and spread from the core to the extreme dialects (according to George Shevelov)... and not all border dialects of the Ukrainian dialect continuum completed Ikavism... for example, in Polesia, or in the Carpathians... which is shown by Carpathian Rusyn with its penultimate stage of Ikavism ü (e.g. Ukrainian відкрити (vidkryty), Carpathian Rusyn вӱдкрити (vüdkryty)). The examples given, such as the prosthetic "v-", are not unique to the Ukrainian language and the Ukrainian dialect continuum... AshFox (talk) 03:02, 6 August 2025 (UTC)
    In the 17th‒18th centuries in Middle Ukrainian /ü/ (the penultimate stage of Ikavism) was attested, for example in Middle Ukrainian вюзъ (vjuz, cart), кюнь (kjunʹ, horse), вюсюмъ (vjusjum, eight)...
    There was no proper letter for /ü/, so they used the closest possible "ю" (ju). AshFox (talk) 03:19, 6 August 2025 (UTC)
    There is not enough historical linguistic research on Carpathian Rusyn as a language (as opposed to research on individual varieties) to place it under Middle Ukrainian. I appreciate the arguments for doing so, but this is original research territory.
    I would add that placing it under Old Ruthenian was also unwarranted, but that is tied to issues with the relevant Wikipedia article (see Ruthenian language), so I will start a discussion there. Engelseziekte (talk) 21:34, 12 August 2025 (UTC)

    Language family

    Hi, I would like a code for the Luri language family encompassing CAT:Northern Luri language, CAT:Bakhtiari language, and CAT:Southern Luri language as it's sometimes not specified further in etymologies. (also needed in descendants) Saam-andar (talk) 09:32, 10 August 2025 (UTC)

    @Benwing2 Can you add this? Saam-andar (talk) 09:48, 12 August 2025 (UTC)
    @Saam-andar I think this needs more discussion given that Wikipedia unequivocally asserts that Luri is a single language, not a family. I'm not sure there was even a discussion concerning whether to add the separate Luri lects as L2 languages; someone may have added the languages out of process. WT:LT has no mention of Luri anywhere. Benwing2 (talk) 03:46, 13 August 2025 (UTC)
    @Benwing2 Sorry for the late response, but most sources point to it being 2-3 languages [29] [30], which are apparently often mutually unintelligible.
    Two other things:
    • Judeo-Tat (jdt) currently descends from fa-cls, and Tat (ttt) from fa, when both should descend from fa-ear [31]
    • Saranamd informed me that Judeo-Persian (jpr) doesn't make sense as an L2, since it is just Classical Persian written in Hebrew script (and most JP entries are already under the Persian L2). There is Early Judeo-Persian, which could replace the current L2, maybe like this:
      • Late Middle Persian (pal-lat) V
        • Early New Persian (fa-ear) V
          • Caucasian Tat F
            • Tat (ttt)
            • Judeo-Tat (jdt)
          • Classical Persian (fa-cls) V
            • Judeo-Persian (jpr) V
            • [others]
        • Early Judeo-Persian (jpr-ear)
          • Judeo-Persian (jpr) Variety of fa-cls
    Saam-andar (talk) 13:24, 16 August 2025 (UTC)
    Lumping Judeo-Iranian as mere etymology-languages is valid; Saranamd and I already took that stance at Wiktionary:Beer parlour/2023/December § Deprecate Judeo-Persian?, but there were too few Iranian editors to take any measures. Fay Freak (talk) 17:31, 17 August 2025 (UTC)
    Only Judeo-Tat remains, which could also be made a variant of Tat, as many sources support a linguistic geographical distinction rather than a religious one. Saam-andar (talk) 11:41, 18 August 2025 (UTC)
    @Saam-andar IMO it doesn't make a lot of sense for Early Judeo-Persian to be an L2 but Judeo-Persian to be an etym variety, unless Judeo-Persian is not a descendant of Early Judeo-Persian but something else entirely. Benwing2 (talk) 20:30, 18 August 2025 (UTC)
    Yeah, I probably misrepresented that.
    <Judeo-Persian> is Classical Persian in the Hebrew script, while <Early Judeo-Persian> is a different dialect at the time of Early New Persian. Saam-andar (talk) 20:44, 18 August 2025 (UTC)
    OK that makes sense. How distinct is Early Judeo-Persian from Early New Persian? Is it distinct enough to merit an L2, and how many words do we have attested in this language? Benwing2 (talk) 00:10, 19 August 2025 (UTC)
    EJP developed from dialect(s) different from the Khorasan dialects of ENP, which in turn developed into the current New Persian.
    According to this (pp. 58-59), it shares more features with Middle Persian than with ENP (although it is similar to some southern ENP dialects, as in the Quran-e Qods).
    The EJP corpus is about 600 pages ([32] p. 241). Saam-andar (talk) 11:13, 19 August 2025 (UTC)

    Betawi dialects and etymology-only codes

    Currently we have etymology-only codes bew-kot for "Betawi Kota", bew-ora for "Betawi Ora", and bew-udi for "Betawi Udik", but these labels are mostly used in sociological studies, which often classify the Betawi-speaking communities in a concentric circles paradigm (cf. Knörr 2014 and Chaer 2015, inter alia). In glossaries and dictionaries (cf. Kähler 1966, Chaer 2009, Khoir & Widiatmoko 2012, Sunarji 2018) as well as dialectology studies (Grijns 1991) various different dialectal groupings are presented, but none of them follow the traditional concentric view of Betawi-speaking communities.

    In particular, suburban and rural dialects are rarely considered a single dialect, as opposed to the popular urban/central dialect, which does seem to form a cohesive cluster (see Grijns 1991 for a comprehensive discussion on its characteristics). The latter also happens to be the most studied dialect of Betawi (cf. Ikranagara 1975, 1980; Wallace 1976; Muhadjir 1981) and also the most commonly represented in popular media by a huge margin, to the point that many people in Indonesia fail to recognize the diversity of Betawi dialects other than Kota.

    I suggest that we remove the etymology-only codes for "Betawi Ora" and "Betawi Udik", as they are not primarily linguistic labels, while bew-kot may remain (as either "Kota" or "Urban Jakarta" Betawi). Indonesian borrowings from Betawi can be specified as bew-kot when it does show diagnostic characteristics of the dialect; otherwise, we should just use generic bew, since it is very rare for such borrowings to be from a specific regional dialect of Betawi other than Kota (except Bekasi, perhaps). Forms like Betawi lagi and udah (whence Indonesian lagi and udah), for example, are used cross-dialectally in Betawi (including by some speakers of Kota who retain /-ah/), while Betawi udè, dèh (whence Indonesian deh) are specific to (the more innovative speakers of) Kota.

    I'd also like to add dialect labels for Betawi (should I just go and create the module?), probably based on the classification in Grijns (1991) as it remains the single most comprehensive overview of Betawi regional variants to date.

    Tagging @Rex Aurorum and @-sche who previously discussed these codes. Swarabakti (talk) 13:24, 13 August 2025 (UTC)

    @Swarabakti Apologies for the delayed response. "bew-ora" and "bew-udi" do not appear to be used anywhere on Wiktionary; I added them at Rex Aurorum's request, but he hasn't used them, so seeing no objections or other comments here, I have removed them. I can find "Betawi Kota" mentioned in a decent number of linguistics texts, whereas I can't find many mentions of "Urban Jakarta Betawi", so I left "Betawi Kota" as that lect's name, but added "Urban Betawi" as an alias. If dialect labels are still needed, and particularly if you want to categorize dialectal Betawi entries, please list the Betawi dialect labels you want at Wiktionary:Category and label treatment requests (and I will try to check in a more timely fashion that they are fine to add, or ideally someone more knowledgeable of Betawi will). - -sche (discuss) 02:31, 17 September 2025 (UTC)
    @-sche Great, thanks. For Kota/Urban dialect, another alias that could perhaps be added is Betawi Tengahan (lit. "Central Betawi", but sometimes translated as "Middle Betawi", though this could be confused with a historical stage instead of a geographical variety). Swarabakti (talk) 02:52, 17 September 2025 (UTC)

    etymology-only code for Afghan Uzbek

    (Pinging others who have recently edited Uzbek: @Rumsor, @Lagrium, @LibCae, and anyone else I missed) Hello, I'd like to propose an etymology-only code for Afghan Uzbek, so that we can add automatic transliteration for Afghan Uzbek without affecting other variants of the Arabic script, such as Yangi Imlo (which has fairly significant orthographic differences).

    I would like to propose using a country sub-code, uz-afg or uz-af, though there is also the ISO-assigned code uzs, which we could use instead if people would prefer that. — BABRkurwa? 19:44, 16 August 2025 (UTC)

    Is there a difference between Afghan Uzbek, Uzbek in China and Uzbek in the Bukharian Emirate (the beginning of the USSR)? Rumsor (talk) 20:07, 16 August 2025 (UTC)Reply
    @Rumsor: Regarding Uzbek in China, supposedly they use a script based on Uyghur (according to Wikipedia; I could not find information elsewhere). I imagine there would be more information about that in Chinese sources.
    There are few resources (in English) comparing the pre-Soviet Uzbek of the Bukharian Emirate to modern dialects of Uzbek, but I imagine pre-Soviet Uzbek would have relatively few Russian loanwords, similarly to Afghan Uzbek. — BABRkurwa? 23:01, 16 August 2025 (UTC)Reply
    If we're going to add a new code, I'd prefer the sub-codes because it looks better. Lagrium (talk) 20:18, 16 August 2025 (UTC)Reply
    Agreed, I generally prefer the sub-codes aesthetically and because they are a bit more obvious about what they mean. — BABRkurwa? 23:03, 16 August 2025 (UTC)Reply
    Just for reference, uz-af is nonstandard; it should be either uz-afg or uz-AF. Benwing2 (talk) 20:22, 18 August 2025 (UTC)Reply
    I see, I slightly prefer uz-afg FWIW. No one has opposed but it also doesn't seem like anyone else had any strong feelings on the matter (due to no one else working on Afghan Uzbek). I'm not super familiar with the process involved here, but as we are only discussing an etymology-code and not an L2, could we go ahead with this? My main concern is being able to add automatic transliteration for Afghan-Uzbek without affecting Yangi Imlo — BABRkurwa? 05:24, 3 September 2025 (UTC)Reply
    @Babr Yes, generally the standards are significantly lower for etym-only codes because they don't entail major changes to entries. I will add uz-afg. Benwing2 (talk) 06:03, 3 September 2025 (UTC)Reply
    @Babr Added. You will have to let me know when we have an appropriate translit module. Benwing2 (talk) 06:12, 3 September 2025 (UTC)Reply
    I actually had a module made already, Module:uz-afg-translit. It was in my userspace until now, but I've been using it to generate romanizations as I don't have an Uzbek keyboard. — BABRkurwa? 06:30, 3 September 2025 (UTC)
    ^I have enabled the transliteration module for Afghan Uzbek. (I don't need a discussion to do that right? If so I can undo) — BABRkurwa? 07:31, 3 September 2025 (UTC)Reply
    I think that's totally fine. Benwing2 (talk) 07:34, 3 September 2025 (UTC)Reply
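For anyone picking sub-codes later, here is a minimal sketch of the shape check implied by the comment above (uz-afg or uz-AF is fine, uz-af is not). The pattern and the function name `looks_standard` are assumptions made for illustration only; they cover just the single-subtag cases named in this thread and are not taken from any Wiktionary module.

```python
import re

# Assumed shapes, inferred only from the comment in this thread:
# a base language code, a hyphen, then either a three-letter lowercase
# subtag (uz-afg) or a two-letter uppercase one (uz-AF).
_SUBCODE = re.compile(r"[a-z]{2,3}-(?:[a-z]{3}|[A-Z]{2})")


def looks_standard(code: str) -> bool:
    """Return True if the code has one of the two assumed shapes."""
    return _SUBCODE.fullmatch(code) is not None


for candidate in ("uz-afg", "uz-AF", "uz-af"):
    print(candidate, looks_standard(candidate))
```

Longer codes with more than one subtag would need a wider pattern; this sketch deliberately does not guess at those.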

    rename Sauraseni Prakrit ?

    As mentioned above by @Babr, the spelling Sauraseni instead of Shauraseni or Śauraseni is misleading. I don't have a strong opinion as to which of those two it should be. @Svartava Exarchus (talk) 16:00, 17 August 2025 (UTC)Reply

    Apparently we already give the spelling "Śaurasenī", for example at 𑀅𑀅𑀁 Exarchus (talk) 16:16, 17 August 2025 (UTC)Reply
    Ditto for Category:Kasmiri Apabhramsa and Category:Maharastri Apabhramsa then. I suppose we have a slight preference for the Unicode forms, one major consideration being that h in the English digraph sh tends to be read as an aspiration sign in these words, and the Indian editors are actually more annoyed by it than I am. This historical linguistics subject is so specialist that common usage – which is also largely restricted to drive-by mentions in lexica we copied our language lists from originally – can hardly be regarded as weighty. Fay Freak (talk) 17:21, 17 August 2025 (UTC)Reply
    Ditto again for Paisaci Prakrit. IDK why the Prakrits are spelt like that, but they should use digraphs or diacritics. Until I saw the Wikipedia page, I genuinely thought Shauraseni/Śauraseni was pronounced with a [s-]. — BABRkurwa? 17:53, 17 August 2025 (UTC)
    @Babr, Exarchus: This was raised before: Category talk:Prakrit language#Renaming a few lects, but there was a dispreference against using ‘sh’ and ‘ch’. Additionally, WT:LANGNAME specifically advises against using diacritics in the canonical name. – Svārtava (tɕ) 09:58, 18 August 2025 (UTC)
    Lots of languages use diacritics. We even have the ǁAni language.
    Or Pará Gavião, ancestor: Proto-Northern Jê. Exarchus (talk) 10:44, 18 August 2025 (UTC)Reply
    Hmm, renaming to the diacriticized spelling makes sense then. @Kutchkutch: Thoughts? – Svārtava (tɕ) 11:13, 18 August 2025 (UTC)Reply
    There were only a couple of people who disliked sh and ch and not for good reasons IMO. I would much prefer the use of sh and ch to diacritics, in accordance with WT:LANGNAME. Wikipedia also uses sh in e.g. w:Shauraseni Prakrit. The use of ch for a palatal affricate is fairly standard in Indian city and language names already. The use of diacritics in some really obscure language names sometimes does occur, esp. for languages that are normally spelled in the Latin script with diacritics, but that doesn't really apply here and I would advise against it. Benwing2 (talk) 20:20, 18 August 2025 (UTC)Reply
    Those with ü only that I found are: Khün, Mündü, Wichí Lhamtés Güisnay, Mün Chin, Natügu, Sabüm, Tai Nüa, Tübatulabal, San Pablo Güilá Zapotec, Güenoa, Volapük, Nüpode Huitoto. There are as many with ö and a few with ä.
    Note also the contradiction with Ashokan Prakrit.
    Some day the concern with typing Unicode will be wholly irrelevant, when we won't type the language name into L2 anymore but fetch by templates, as other Wiktionaries do. Fay Freak (talk) 14:32, 18 August 2025 (UTC)Reply
    Wikipedia also uses sh in e.g. Shauraseni Prakrit [] The use [] is fairly standard in Indian city and language names already.
    • In general, Wikipedia usage should not be used as a measure of how common a particular romanisation is especially for understudied languages.
    • It is appropriate to anglicise Indian city and language names because they are everyday words used by ordinary people.
    • However, the names of Prakrit lects are not everyday words used by ordinary people.
    I would much prefer the use of sh and ch to diacritics [] There were only a couple of people who disliked sh and ch and not for good reasons IMO.
    • sh, ch; they are being used by the government [as Hunterian transliteration], … not in linguistic works … Sanskrit and Prakrit are well-established English words, whereas the names of the Prakrit lects are more recent transliterations.
    • “h” followed by a consonant is interpreted as an aspiration sign (even if “s” itself is not aspirated) rather than being a digraph.
    • The issue with “h” can even be seen in the language name “Kutchi”, which is clearly anglicised. This name could confusingly be spelled as “Kacchi” with a single “h” for aspiration or “Kachchhi” with a doubled “hh” for aspiration.
    in accordance with WT:LANGNAME … The use of diacritics in some really obscure language names sometimes does occur, esp. for languages that are normally spelled in the Latin script with diacritics
    • Even if Latin script is not a canonical script for Prakrit lects, WT:LANGNAME does not definitively rule out diacritics in language names.
    • In this case, there is no single prevailing common English name.
    @Babr: I genuinely thought Shauraseni/Śauraseni was pronounced with a [s-]
    • “s” and “sh” can be merged as a single sound in many contexts.
    • Thus, the “ś” in “Śauraseni” could be pronounced as either “s” or “sh” depending on the speaker’s background even if it is etymologically “sh”.
    • Furthermore, the “Śauraseni” lect itself does not have the “sh” sound, so using the digraph “sh” is potentially misleading.
    • This historical linguistics subject is so specialist that common usage [as “sh”] [] can hardly be regarded as weighty.
    • Use of the names of Prakrit lects is confined to history, linguistics, Jainism and other scholarly fields that prefer IAST transliteration over anglicisation.
    @Fay Freak: The contradiction with Ashokan Prakrit is because Ashok is a common male given name, and Ashoka is a well-known historical figure with the English adjectival form Ashokan.
    @Exarchus, Svartava:
    • Therefore, renaming to the diacriticized spelling of “Sauraseni” seems to be more appropriate compared to “sh” even if having diacritics in the canonical name is generally not preferable.
    • “ś” would serve as a compromise between both the “s” and “sh” variants in addition to being a variant that is used in English.
    Ditto for Category:Kasmiri Apabhramsa and Category:Maharastri Apabhramsa
    • Kashmiri is an established term in English with several senses (even though the Kashmiri language is not descended from Category:Kasmiri Apabhramsa), so this situation would be comparable to “Ashokan Prakrit”.
    • However, “Maharashtri” is not an established term in English as the adjectival form of Maharashtra. The English adjectival form of Maharashtra is Maharashtrian (see diff). User:Equinox probably created the entry for Maharashtri (which has no non-linguistic referent) by looking at Wikipedia.
    Kutchkutch (talk) 13:22, 19 August 2025 (UTC)Reply
    For the record I don't think anyone got your ping, but I think you are being overly pedantic.
    Furthermore, the “Śauraseni” lect itself does not have the “sh” sound, so using the digraph “sh” is potentially misleading.
    English generalizes sounds all the time; most languages don't have an "r" sound, but that's not an argument that we shouldn't use an "r" in transliteration. Using digraphs in transliteration is extremely common, and while scientific works tend to prefer single letters, many of them do use digraphs as well. It's not as crazy as you are claiming it is. But on that note, I'm not necessarily opposed to using diacritics (but it's not my preference); I would just like to change the name to literally anything else that's more clear. — BABRkurwa? 05:35, 3 September 2025 (UTC)
    "I would just like to change the name to literally anything else that's more clear."
    my thoughts exactly Exarchus (talk) 07:17, 3 September 2025 (UTC)Reply

    Add Proto-Ainu

    We already have Appendix:Proto-Ainu reconstructions. There are also a number of (ill-formatted) Ainu etymology sections referring to Proto-Ainu (see Special:Search/insource:"Proto-Ainu"). – wpi (talk) 11:23, 31 August 2025 (UTC)Reply

    September 2025

    etym-only variants of Church Slavonic

    I see that Church Slavonic (zls-chs) has been created as an L2 language distinct from Old Church Slavonic (cu). Any objection to adding etym-only variants for the different recensions? I encountered a Serbo-Croatian term described by Matasović as having a cognate specifically in Russian Church Slavonic (Church Slavonic достизати (dostizati) vs. Serbo-Croatian dostizati) and it would be nice to have a corresponding etym-only code rather than having to write Compare Russian {{cog|zls-chs|...}}. (Maybe zls-chs-RU or zls-chs-ru or zls-chs-rus? Per Wikipedia there's also an Old Moscow recension.) Ping @Sławobóg @AshFox @ZomBear, @Bezimenen, @IYI681, @Thadh @Vininn126 as some people who may have opinions about this and/or be able to list out the recensions that are deserving of etym-only codes. Benwing2 (talk) 04:12, 10 September 2025 (UTC)Reply

    Fine by me. In general I find ety-codes to be safe, but these in particular as well. One reason they might not have been is that there was debate if some should be L2's. Vininn126 (talk) 08:01, 10 September 2025 (UTC)Reply
    I don't object--IYI681 (talk) 07:16, 10 September 2025 (UTC)Reply
    I was already against creating a single Church Slavic code in the first place. It makes no sense to have Church Slavic separate from Old Church Slavic but still handle the different Church Slavic recensions under one code. Doesn't make anything easier, just increases clutter and difficulty.
    By the way, "Old East Slavic Church Slavonic" is about as vague as can be and needs a very thorough description making it distinct from Old East Slavic itself, since borderline cases are now treated as Old East Slavic (basically third-person verbal endings are the major difference between the two from what I can tell). Thadh (talk) 09:44, 10 September 2025 (UTC)Reply
    I raised this topic 6 months ago. Many supported the addition of etymological codes for Church Slavonic, but that was the end of it. AshFox (talk) 01:20, 11 September 2025 (UTC)Reply
    Just a list I wrote on the Discord server:
    • zls-chs-orv Old East Slavic Church Slavonic
      • zls-chs-ru Russian Church Slavonic ✅
      • zls-chs-uk Ukrainian Church Slavonic ✅
    • zls-chs-cs Czech Church Slavonic
    • zls-chs-ro Romanian Church Slavonic
    • zls-chs-bg Bulgarian Church Slavonic
    • zls-chs-mk Macedonian Church Slavonic
    • zls-chs-hr Croatian Church Slavonic
    • zls-chs-sr Serbian Church Slavonic
    AshFox (talk) 01:26, 11 September 2025 (UTC)Reply
    Is the term Old East Slavic Church Slavonic used anywhere in scientific journals? Unsure about that. Otherwise Czech, Croatian, Macedonian CS and such are pretty different. Chihunglu83 (talk) 04:17, 13 September 2025 (UTC)Reply
    OK, since we seem to have a consensus (with one dissenter), I added etym-only codes for the Russian, Ukrainian (aka Rusyn, Belarusian) and Old Moscow recensions. The remainder from Czech Church Slavonic down to Serbian Church Slavonic are commented out for now (using HR for Croatia and RS for Serbia, consistent with their official country codes), as they are not well-described in Wikipedia (except for Serbian Church Slavonic, which is described in a confusing fashion) and don't seem to have Wikidata codes. I entirely left out Old East Slavic Church Slavonic and Old Ruthenian Church Slavonic pending clarification of whether these really exist and are used in scholarly journals. Benwing2 (talk) 04:56, 13 September 2025 (UTC)Reply
    @Benwing2 hi, could you make these Etymological codes in the form of a tree? Because these recensions go from one to another.
    • Old East Slavic Church Slavonic zls-chs-orv — 10th‒14th century (RusWiki)
      • Old Moscow Church Slavonic zls-chs-omo — 14th‒15th century (RusWiki, EngWiki)
        • Russian Church Slavonic zls-chs-ru — 16th / 17th century ‒ present (RusWiki, EngWiki), other names: "Synodal Church Slavonic"
      • Ukrainian Church Slavonic zls-chs-ua — 14th‒18th century / present (UkrWiki, RusWiki, EngWiki), other names: "Kiev Church Slavonic", "Ruthenian Church Slavonic".
      • Belarusian Church Slavonic zls-chs-be — 15th‒17th century (UkrWiki, RusWiki)
    Here is the most precise scheme of development of all East Slavic revisions of the Church Slavonic language. If you want precision on Wiktionary, this is it. But if you consider such detail unnecessary, then you can skip some revisions (I will tell you which ones, just tell me). I also suggest being consistent and using codes in lowercase letters zls-chs-RU/zls-chs-UA ➜ please change to zls-chs-ru/zls-chs-uk. AshFox (talk) 17:53, 13 September 2025 (UTC)Reply
    If this tree is too redundant, it can be reduced/combined to the 3 most common recensions:
    • Old East Slavic Church Slavonic zls-chs-orv — 10th‒14th century
      • Russian Synodal Church Slavonic zls-chs-ru — 16th century ‒ present
      • Ukrainian Church Slavonic zls-chs-uk — 14th‒18th century
    AshFox (talk) 18:01, 13 September 2025 (UTC)Reply
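To make the fuller tree proposed above easier to reason about, here is a rough sketch of it as a parent map. The codes are the ones proposed in this thread (not all of them exist), and hanging the tree off `zls-chs` is my own assumption for illustration. The names `PARENT` and `lineage` are hypothetical and do not come from any existing module.

```python
# Proposed recension codes mapped to their proposed parent.
# The "zls-chs" root is an assumption; the thread does not state it.
PARENT = {
    "zls-chs-orv": "zls-chs",      # Old East Slavic Church Slavonic
    "zls-chs-omo": "zls-chs-orv",  # Old Moscow Church Slavonic
    "zls-chs-ru": "zls-chs-omo",   # Russian (Synodal) Church Slavonic
    "zls-chs-ua": "zls-chs-orv",   # Ukrainian Church Slavonic
    "zls-chs-be": "zls-chs-orv",   # Belarusian Church Slavonic
}


def lineage(code: str) -> list[str]:
    """Walk from a code up to the root, returning every code on the way."""
    chain = [code]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


print(lineage("zls-chs-ru"))
```

In the reduced three-recension version, the Old Moscow entry would simply be dropped and Russian would point straight at the Old East Slavic node.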
    @Benwing2, in fact it seems doubtful to single out a separate code for the Old Moscow recension... Firstly, it existed for a very short time, ~200 years. Secondly, it differs little from the modern Russian (Synodal) Church Slavonic ‒ list of differences. The existence of a separate code for the Church Slavonic of the Eastern Slavs of the 10th-14th centuries is much more reasonable. I already mentioned it in my answer to Chihunglu83, but the Russian Wikipedia has a separate article about the "Old East Slavic Church Slavonic" with dates and its distinctive features. Regarding "Ruthenian Church Slavonic"... under what code can we unite Belarusian Church Slavonic and Ukrainian Church Slavonic? The Belarusian recension, like the Old Moscow recension, was also short-lived, and there are no distinctive features described anywhere. The same cannot be said about the Ukrainian recension, which is still partly used today.
    There are three main East Slavic recensions of Church Slavonic: "Old East Slavic (aka Old Russian)", "Russian (aka Synodal or New Moscow)" and "Ukrainian (aka Kievan)". The other two are very small: "Old Moscow" and "Belarusian". AshFox (talk) 20:07, 13 September 2025 (UTC)
    @AshFox OK I am happy to remove "Old Moscow Church Slavonic". My main concern about "Old East Slavic Church Slavonic" is essentially the same issue brought up by @Thadh: if this variant existed from the 10th to 14th centuries, it overlapped substantially with Old East Slavic itself and OCS, so (a) should it instead be considered a variant of OCS not CS, and (b) is it distinctive enough from Old East Slavic to have its own code? Benwing2 (talk) 21:24, 13 September 2025 (UTC)Reply
    @Benwing2 please add at least 3 more etymological codes that are currently first in line for necessity:
    • zls-chs-cs Czech Church Slavonic (EnWiki), aka Moravian-Czech Church Slavonic
    • zls-chs-hr Croatian Church Slavonic (EnWiki)
    • zls-chs-sr Serbian Church Slavonic (EnWiki)
    Controversial and, for now, second-in-line recensions:
    • zls-chs-orv Old East Church Slavonic
    • zls-chs-ro Romanian Church Slavonic, aka Wallachian-Moldavian Church Slavonic
    • zls-chs-mk Macedonian Church Slavonic
    • zls-chs-bg Bulgarian Church Slavonic (?)
    Currently, the first 4 (Czech, Croatian, Serbian, Old East) are actively used/mentioned on Wiktionary. AshFox (talk) 14:29, 18 September 2025 (UTC)Reply
    @Chihunglu83 yes. Here is the article on Russian Wikipedia, the correct title: Древнерусский извод церковнославянского языка = which is literally "Old East Slavic Church Slavonic". The difference from the modern Russian Church Slavonic is the presence of reduced vowels. This is simply Church Slavonic with (Old) East Slavic elements. Compare:
    "Real" Russian Church Slavonic, it has no reduced sounds and is very modern. Its main source is "Большой словарь церковнославянского языка Нового времени" (just created a module for it Module:bibliography/data/zls-chs). If you see that some word is labeled "Russian Church Slavonic", but at the same time there are reduced sounds and the word forms themselves are archaic ‒ this is in fact the "Old Russian Church Slavonic" = "Old East Slavic Church Slavonic". The term "Old Russian" is obsolete, therefore it is replaced by "Old East Slavic". AshFox (talk) 16:51, 13 September 2025 (UTC)
    @AshFox: Both OES and OCS could have влъкъ as a variant: OES graphically (metathesis happened quite often with these, possibly because it was more-or-less pronounced as a syllabic liquid) and OCS due to vowel harmonic tendencies. So such a feature would not be sufficient.
    {{RQ:orv:IS2}} can be said to be Church Slavonic with East Slavic features but we on Wiktionary consider it OES since the difference between Church Slavonic with East Slavic features and Old East Slavic with Church Slavonic features is almost non-existent. Thadh (talk) 06:02, 14 September 2025 (UTC)Reply
    "Russian Church Slavonic" of the Rus times (10-14th century) and "Russian Church Slavonic" that emerged in the 16/17th century and is currently used in Russia are completely different recensions of Church Slavonic. They cannot be labeled with the same code. Here is a dictionary of the new Russian (Synodal). It has a completely different spelling than that of the Kievan Rus times. There are no reduced sounds (except for the final ъ by tradition), strong reduced sounds have all gone into ъ/ь>о/е, weak reduced sounds are not written at all ъ/ь>Ø, non-etymological use of the letters ѧ/ѫ, use of new letters ї/й, etc. In order to have the opportunity in (!) exceptional cases to indicate a Church Slavonic word of the 10-14th century in East Slavic area, the code of the modern Russian (Synodal) will not work, it will be wrong. Therefore, I believe that in exceptional cases the code for "Древнерусского извода церковнославянского" (Old East Slavic Church Slavonic) is still needed for the ancient period. If we don't have a separate code for East Slavic Church Slavonic then it turns out that we will ignore the existence of Church Slavonic in the region of Rus' (modern Russia/Ukraine/Belarus) until the 14th century. AshFox (talk) 07:23, 14 September 2025 (UTC)Reply
    @AshFox: I never proposed that the CS from the middle ages be treated the same as the modern Russian CS, I'm just saying it's pretty difficult (or even impossible) to distinguish between the East Slavic CS in the middle ages and Old East Slavic. You still have not addressed that. Thadh (talk) 09:37, 14 September 2025 (UTC)

    Category:Linear A language

    Moved from Wiktionary:Beer_parlour/2025/September#Category:Linear_A_language

    Is this redundant to Category:Minoan language? Should it be deleted and its subcats reconfigured to be subcats of Minoan? —Justin (koavf)TCM 02:38, 14 September 2025 (UTC)Reply

    @Koavf Could you move this discussion to WT:LTR? BTW it definitely sounds like "Minoan language" and "Linear A language" refer to the same thing but since Ethnologue/ISO 639-3 accepted codes for both, it would be enlightening to see the justification for creating whichever one was created later. Benwing2 (talk) 20:15, 14 September 2025 (UTC)Reply

    Justin (koavf)TCM 20:18, 14 September 2025 (UTC)Reply

    Confusing label, claiming language status of a script, as if Cretan hieroglyphs were surely of a different language. Result of a technical mixup, to be deleted. Catonif (talk) 17:15, 16 September 2025 (UTC)Reply

    Split of Altai, Solon into distinct languages

    Hi everyone, here are my requests:
    1. Split Southern Altai into ‘Altai’, Telengit, and Teleut;
    2. Split Northern Altai into Kumandin, Tubalar, and Chelkan;
    3. Make Olguya Ewenki a distinct language.
    All the new languages have unique grammatical features. We need independent inflection tables. Right now we’re forced to add every entry of these languages as a ‘dialectal form’. LibCae, or ‘Lithuanian Lime’ 14:44, 22 September 2025 (UTC)

    @LibCae Whether you need to create more than one inflection table for a given form is not a criterion for splitting a language. Plenty of languages do this just fine. The issue is, are these generally considered separate languages by the relevant academic communities, or dialects of the same language? AFAICT, the consensus of Wikipedia, Glottolog and Ethnologue is that Southern Altai and Northern Altai are each a single language, not several languages. Based on this, I would Oppose this proposal. Benwing2 (talk) 06:09, 24 September 2025 (UTC)

    Bhaca

    Do we have a language code for Bhaca? ISO 639 does not. Should we? 0DF (talk) 02:31, 24 September 2025 (UTC)Reply

    Are you asking for an L2 language code or an etym language code? From browsing the academic sources, Bhaca seems generally to be considered a dialect (one of many of Xhosa, according to Glottolog), so I would Oppose creating an L2 code for this language, but have no issues creating an etym language code for it. Benwing2 (talk) 06:11, 24 September 2025 (UTC)
    @Benwing2: I'm not advocating anything in particular; I was asking the question neutrally. I read a comment from a Bhaca-speaker bemoaning the absence of any Bhaca dictionary, which is what prompted me to create the entry for English Bhaca. An etym.-lang. code would have the function of allowing that English word to be etymologised properly, at least. Bhaca and Xhosa are both Nguni languages, but it surprised me to read you write that Bhaca is considered a dialect of Xhosa, given that the former is a Tekela language, whereas the latter is a Zunda language, according to w:Nguni languages#Classification. Here are two sentences from w:Nguni languages#Comparative data and w:Bhaca language#Vocabulary, respectively, in fourfold English–Bhaca–Xhosa–Zulu parallel translation:
    1. I like your new sticks.
      • Ndi-ya-ti-thsandza ii-ntfonga t-akho etin-tsha. (Bhaca)
      • Ndi-ya-zi-thanda ii-ntonga z-akho ezin-tsha. (Xhosa)
      • Ngi-ya-zi-thanda izi-nduku z-akho ezin-tsha. (Zulu)
    2. Please buy me eggs and milk when you go out.
      • Bendicela undithsengele amaqandza nentusi na ukhamba. (Bhaca)
      • Bendicela undithengele amaqanda nobisi xa uhamba. (Xhosa)
      • Bengicela ungithengela amaqanda nobisi ma uhamba. (Zulu)
    There are lots of similarities there, but it's not obvious to me from those example sentences why Xhosa and Zulu should be considered separate languages and yet Bhaca should not, in terms of mutual intelligibility. Finally, w:Bhaca language#Vocabulary lists four pairs of Bhaca–Xhosa equivalent words (cognates?), namely
    1. Bhaca inkatinyana ― Xhosa intombazana
    2. Bhaca ukubhobha ― Xhosa ukuthetha
    3. Bhaca layi? ― Xhosa phi?
    4. Bhaca ukukshiksha ― Xhosa ukubetha
    Judging orthographically, those don't seem mutually intelligible to me. 0DF (talk) 20:38, 24 September 2025 (UTC)Reply
    I don't think it's easy to judge intelligibility based on a small sample of words. I said Bhaca is considered a Xhosa dialect only based on Glottolog, which is often wrong; in truth I know next to nothing about this linguistic area. I stand by what I said above, which is that we should defer to scholarly consensus on what is a language vs. dialect and not try to make our own judgments, because they will inevitably be biased. Benwing2 (talk) 21:19, 24 September 2025 (UTC)Reply
    @Benwing2: Sure, and this is hardly my area of expertise either, but let's not pretend that academics are necessarily paragons of objectivity themselves. I suspect that the scholarly consensus you cite mixed some ausbau considerations in with their abstand ones when they constructed their taxonomy. But no matter, is Bhaca to be an etymology-only language, to be treated as part of Xhosa, yes? If so, what form should its language code take? 0DF (talk) 00:41, 25 September 2025 (UTC)Reply
    @0DF If we were to create an etym-only code, it would likely be xh-bha. But we should probably wait a bit for some people who may be more knowledgeable to weigh in. Benwing2 (talk) 00:44, 25 September 2025 (UTC)Reply
    @Benwing2: OK, yes. I agree. 0DF (talk) 01:50, 25 September 2025 (UTC)Reply
    Input needed
    This discussion needs further input in order to be successfully closed. Please take a look!

    Renaming Category:Konyak languages

    This family's proto-language is called "Proto-Northern Naga", and this family also has a singular "Konyak language" as a member. To resolve the inconsistency between language and proto-language, and also avoid the confusion between an identically named family and language, I would like to rename this family "Northern Naga". @-sche @Thadh @Wpi You good with this? — mellohi! (Goodbye!) 00:28, 28 September 2025 (UTC)Reply

    Seems reasonable (both names for the family seem very roughly equally common, maybe Northern Naga is even more common), and would solve the issue in this case, though we probably still need to find some way of fixing the general case (discussed further up this page w.r.t. Kipchak, Salish, etc) that if a language and family have the same name then something like "From {{der|en|foo-bar|-}}" displays the same text whether foo-bar is a language or a family (making it impossible for a casual reader of the page to tell which is meant without checking the link target or categories). Maybe if a module detects that a language and a family have the same name, it could be made to change the text that the family displays, so "From Kipchak" (language) but "From the Kipchak languages" or "From one of the Kipchak languages"? - -sche (discuss) 15:33, 28 September 2025 (UTC)Reply
    Probably better to say "from a Kipchak language", though this would require detecting when the family name starts with a vowel to avoid things like "from a Indo-European language". Chuck Entz (talk) 15:59, 28 September 2025 (UTC)Reply
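As a rough illustration of the two suggestions above, here is a minimal sketch of how the display text could be chosen. It is written in Python rather than as an actual module, and every name in it (`indefinite_article`, `source_text`, `clashing_names`) is hypothetical. The article choice looks only at the first letter, which is an assumption that will not hold for every name.

```python
VOWEL_LETTERS = {"a", "e", "i", "o", "u"}


def indefinite_article(name: str) -> str:
    """Pick "a" or "an" from the first letter only (a spelling heuristic)."""
    return "an" if name[:1].lower() in VOWEL_LETTERS else "a"


def source_text(name: str, is_family: bool, clashing_names: set[str]) -> str:
    """Build the display text for an etymology source.

    Only a family whose name is also a language name gets the longer
    wording; everything else is displayed as before.
    """
    if is_family and name in clashing_names:
        return f"from {indefinite_article(name)} {name} language"
    return f"from {name}"


clashes = {"Kipchak", "Indo-European"}
print(source_text("Kipchak", False, clashes))
print(source_text("Kipchak", True, clashes))
print(source_text("Indo-European", True, clashes))
```

One known gap: a first-letter check gives "an" for names like Uralic, where English wants "a Uralic language", so a real implementation would probably need a short exception list.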

    Ancestor of Buriat and Kalmyk

    Morphologically, Buriat and Kalmyk should not be considered descendants of Classical Mongolian. Both preserve final -n from Middle Mongol, in contrast to Classical Mongolian, where the -n is hidden, e.g. MM usun > CM usu, but Buriat uhan, Kalmyk us°n.

    The list in Budaev (1992, Бурятские диалекты, pp. 36–37) shows lexical similarities between Buriat and Kalmyk. Oirat chronicles mention that the Buriats were part of them. I suggest a code for the new term ‘Oiratic’, as the ancestor of Buriat and Kalmyk and a descendant of MM.

    Classical Mongolian chronicles from steppe dumas already clarified that written CM had been introduced into Transbaikalia quite late. And there is even little relation between Cisbaikalian dialects and CM. LibCae (talk) 11:15, 29 September 2025 (UTC)Reply

    @Theknightwho We need to discuss this. LibCae (talk) 11:16, 29 September 2025 (UTC)Reply

    October 2025

    Retiring Salchuq

    We recognize Salchuq (slq) as a language, but that code has been deprecated after a request pointing out that Ethnologue is the only source asserting its existence and there's no independent evidence that the language has ever existed. I therefore suggest we retire this code. —Mahāgaja · talk 18:36, 4 October 2025 (UTC)Reply

    Let’s do this. It totally looks like an accident. Fay Freak (talk) 17:04, 10 October 2025 (UTC)Reply
    AFAICT that ISO change-request is correct that there is no evidence of this being a language, it looks like just a variant spelling of Seljuk (??), and that does not seem to be a language either. Because it was claimed to be a dialect of Azerbaijani, I'll ping az-knowing editors @Allahverdi Verdizade, BurakD53 in case they have any knowledge of it, but if not, let's just remove this. I wonder where this came from in the first place. - -sche (discuss) 07:33, 24 October 2025 (UTC)Reply
    Removed. - -sche (discuss) 04:49, 21 November 2025 (UTC)Reply

    Rendering Samalian

    It appears that Wiktionary does not currently support the Samalian language. Can this be corrected, please? Antiquistik (talk) 19:19, 8 October 2025 (UTC)Reply

    Pinging @Fay Freak, Benwing2. Antiquistik (talk) 08:57, 10 October 2025 (UTC)Reply
    @Kwamikagami should review his 2014 Wikipedia entry on the stele of Ördek-Burnu also. I find it called recently deciphered in →DOI and →DOI (why is it not licensed at the Wikipedia Library yet?).
    The language is not often mentioned in journal databases altogether, being a niche within niche excavation workout. You appear right though even according to older stances that this is an individual Northwest Semitic language, having striking features requiring it to be assumed not the same language community as Aramaic.
    The distance between Samalian and Aramaic is large enough to fit something else between it and Old Aramaic per Pardee 2009 (→DOI) pp. 52–53:
    As research progressed, however, it became clear that what must be termed the primary isogloss for Samalian, the marking of masculine plural substantives in the absolute state with {-w} (nominative case) or {-y} (oblique case) and without a following consonant, was absent from this text. As we shall see, the identification of the language of the new inscription as a previously unattested dialect of Aramaic, situated typologically between Samalian and Old Aramaic, appears to be required. Fay Freak (talk) 17:01, 10 October 2025 (UTC)Reply
    @Fay Freak Does this mean that Samalian can be added to the list of languages supported on Wiktionary? Antiquistik (talk) 17:30, 14 October 2025 (UTC)Reply
    Yes. Fay Freak (talk) 17:41, 14 October 2025 (UTC)Reply
    Good. So how will this be done? Antiquistik (talk) 18:15, 14 October 2025 (UTC)Reply
    @Fay Freak Antiquistik (talk) 02:47, 19 October 2025 (UTC)Reply
    @Benwing2: Can you add Samalian now? I have waited five weeks to give time for matters to be digested, but I have never added a new language yet nor consciously observed anyone doing it and feel a bit crazy if deciding this upon two voices mine including. Unlike Antiquistik might suspect, I have occasionally thought about the matter, but avoided it, and now prefer to avoid the impression of inaction. Fay Freak (talk) 23:13, 11 November 2025 (UTC)Reply
    Are you asking for an L2 language or an etymology language? I have no issue adding an etymology language for Samalian, but for adding an L2 language I'd like to hear more discussion about whether we really need a new L2 language for this, esp. given its sparse attestation. Pinging @-sche for thoughts. Benwing2 (talk) 06:40, 12 November 2025 (UTC)Reply
    I'm on the fence. Perusing the Google Scholar and Books results about it, I see that some scholars think it can be considered Aramaic, and some (my impression is: maybe not as many?) think it is a separate language. Pasting the texts of the lect's 3 inscriptions into a few different online word counters, it looks like we're only dealing with somewhere between 360 - 600 unique words (1450 - 1496 words in total, because some words occur multiple times); it seems to be functionally an 'extinct wordlist-only language' like we were [about further down] (just recorded in a few inscriptions instead of a few wordlists), and it seems like we would not be sparing ourselves much work by either choice (considering it Aramaic or making it a separate language), because if Wikipedia is correct, the three Samalian inscriptions use a different script (Phoenician) than the three scripts we currently lemmatize Aramaic under, so we already have to create a whole new set of pages in order to cover the inscriptions regardless of whether we set the L2 header on those pages to ==Samalian== or ==Aramaic==. (It also has some phonological differences from other Aramaic.) So I'm on the fence, but if our Semiticists want to have it as a separate language, that's probably OK. I'm trying to think who edits this area of Semitic and might have thoughts. Pinging @Cymelo, Kristian Lahdo. - -sche (discuss) 07:40, 13 November 2025 (UTC)Reply
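The spread in the unique-word figures above may come down to how each counter splits the text. This is a minimal sketch, not the method any of those online counters is known to use: it compares whitespace-only tokenization with a variant that also treats punctuation as a separator. The sample string is a made-up placeholder, not text from the inscriptions, and the function names are hypothetical.

```python
import re

def count_whitespace(text):
    """Return (total, unique) token counts, splitting on whitespace only."""
    tokens = text.split()
    return len(tokens), len(set(tokens))

def count_stripped(text):
    """Return (total, unique) token counts, treating any run of
    non-word characters (spaces, punctuation) as a separator."""
    tokens = [t for t in re.split(r"\W+", text) if t]
    return len(tokens), len(set(tokens))

# Placeholder sample (an assumption for illustration only).
sample = "foo. bar foo bar, foo"

print(count_whitespace(sample))  # "foo." and "foo" count as different words
print(count_stripped(sample))    # punctuation no longer splits the counts
```

With whitespace-only splitting, a word followed by a divider or punctuation mark is counted as a separate type, which inflates the unique count while leaving the total unchanged.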
    @-sche: It seems like I was not sufficiently clear. The Stele of Ördek-Burnu is a fourth inscription in Samʾalian, deciphered in 2013 due to its lacunae on ten lines, →DOI and →DOI – you had this last already in the bibliography of the stele at Wikipedia, but it was not referenceworky enough for Wikipedians to understand the paper's authors' understanding of having read Samʾalian – rather than any Semitic, which is obvious on first sight anyway after the transcription and translation, in contrast to the uncritically repeated 1915 claim of there perhaps being Luwian (which was itself undeciphered back then) –, so I gave another DOI on that above.
    Our library access, which failed in September, has been restored, though it seems like now every Wikipedian is catching up on what he has deferred and shifty pirates would not fail to be faster.
    As for the Wikipedia article on the Samalian language, it appears less wrong if you stress the primarily in “primarily known from three inscriptions”, and kudos to them for providing transcriptions in the original script for all three. Maybe @Onceinawhile or @Editor259 (but the last only active on that) wants to add the fourth, anyway.
    Here on Wiktionary I have not made the reference templates because I don't have the language code yet, and I assume arc would already be misleading. With that, I suppose that Antiquistik will quickly fix even the coverage of Samʔalian: 69 lines in these four inscriptions, I am not sure this is bad in comparison to other languages we have like Moabite (several inscriptions popped up for that in the recent years, and it's not just Hebrew although closest to it). Fay Freak (talk) 16:03, 20 November 2025 (UTC)Reply
    OK. My impression is that if y'all who have more knowledge of it want to add it, that seems reasonable to me. There seems to be a decent case for it being a language, and it's extinct with a limited corpus. (And if it later turns out that we should revisit and consider it Aramaic, it does not seem like merging it will be much more work than splitting would be if it were handled as Aramaic, since it seems like it exists in a different script than we lemmatize Aramaic in and so we would not need to e.g. combine two L2 sections on the same page and figure out whether similar definitions can be combined, just to bot-change the L2s into context labels.) Would you like me to add a code for it, or would you like to add the code? I would use "sem-sam" unless someone prefers something else. - -sche (discuss) 04:38, 21 November 2025 (UTC)Reply
    @-sche:: sem-sam is handy. I will observe how you add the langcode. The Phoenician script is also shared by Old Aramaic, but I hardly imagine anyone would rather have Samʔalian under the same L2 header only marked with an extraordinary label coordinating with Old Aramaic.
    Huehnergard in What is Aramaic (1995) →DOI coined the term of Aramoid or Proto-Aramoid to branch under, but due to the paucity of features himself found it equally likely that there is no genetic connection beyond the Proto-North-West Semitic level. This represented a restriction of the term Aramaic, when Samʔalian and the Dayr ʕAllā inscription were deemed loosely a “broader Aramaic”. When Tropper The World of the Aramaeans III p. 216 (2001) against this background writes that “in spite of these peculiarities ist das Samʾalische ein aramäischer Dialekt”, it includes a tribal argument and can mean Aramaean as well as Aramaic.
    The elephant in the room, brought up by the conceptualization of an Aramoid branch, is what we will do with the Deir Alla inscription, but this does not influence our decision on Samʔalian, which due to more corpus size exposes more arguments for linguistic distance (and hence is found in yet another branch proposed by Tropper, Proto-Syrian, under which Proto-Aramoid is put only for Proto-Aramaic and the Dayr ʿAlla inscription). Does @Antiquistik even want a stance on it? Fay Freak (talk) 12:29, 21 November 2025 (UTC)Reply
    @Fay Freak Sure, I would like a stance on the issue. Antiquistik (talk) 13:54, 21 November 2025 (UTC)Reply
    @Antiquistik, -sche: So I take it, for it is the first concern in my mind after waking up :/ I read Hackett's 1984 paper on its features. I almost wanted to say we can't add it because it does not even have a real name, apart from being only one inscription in oft-rotated fragments, but nobody says we cannot have provisional languages, actually a lot of those Ancient North Arabian and Old South Arabian languages are just named by finding areas, and Deir Alla language works and exists as a term, and in the case a new inscription is related we have not much to move
    Together with Samʔalian and Aramaic it only goes under Northwest-Semitic, since an “Aramoid” family will only be abused (we already have arbitrary use between the family code sem-ara and the fallback language code further specified by labels arc) and we also have isoglosses with Canaanite and even Arabic, in contrast with Huehnergard's thin branching argument. Fay Freak (talk) 00:48, 22 November 2025 (UTC)Reply

    OK, I added Samalian as a language with the code "sem-sam". - -sche (discuss) 05:46, 22 November 2025 (UTC)Reply

    Renaming Māori

    Discussion moved from WT:BP.
    was "Te Reo Māori standard regarding macron/tohutō usage"

    Bringing a post from Wiktionary_talk:List_of_languages here, as this may be a better place to discuss a change in standards:

    Having done some work creating more accurate audio pronunciations for some Māori language terminology on here, I've come to notice that the standard for the term itself is extremely inaccurate to how te reo is referred to both by Māori and by New Zealand English speakers. In Aotearoa New Zealand, the term 'maori' is considered highly inaccurate. The standard term is Māori, with 'Maaori' used where macrons are unavailable (or sometimes reflecting dialectal variations). I'm not sure how 'Maori' ended up as the standard, but quite literally any material created by Te Taura Whiri / The Māori Language Commission would indicate its inaccuracy. This is a dictionary; it should aim to accurately reflect the language as it is used. Lilith Itou (talk) 22:56, 8 October 2025 (UTC)Reply

    I think the best place to move this discussion is probably Wiktionary:Language treatment requests, and will move the discussion there. Judging by Google Books Ngrams, Maori was the overwhelmingly most common form for most of history, until about ten years ago; since then Māori has been more common. Perhaps that's enough time that we should make a change, though it's worth noting that because diacritics are not easy to type, preferring that as the canonical name would increase the difficulty of adding any terms in the language. (People could copy and paste ā or Māori from elsewhere, and might already need to do that when adding words which are themselves spelled with such diacritics, so perhaps people are willing to regard the increase in difficulty as manageable.) - -sche (discuss) 23:06, 8 October 2025 (UTC)Reply
    Hiya, thanks for your response. I'm not sure why google books displays that, but I can say as a near-native speaker with a professional background that the lack of macrons has been considered an error in written Māori at least since the establishment of Te Taura Whiri i te Reo Māori in 1987.
    How tohutō have been displayed has varied over time, but Māori relies heavily on them to differentiate between different words. 'Maori' isn't a real word in te reo, and with many other terminology a policy of excluding the tohutō would lead to confusion.
    An example of this would be the difference between terms like wahine and wāhine - the latter is a plural. Many words in Māori may mean completely different things with or without the tohutō (here's a relevant article explaining this).
    I understand that this may provide some 'inconvenience' for people contributing to Māori entries on Wiktionary, but if the alternative is actively misrepresenting a language (which, I note, has suffered many instances of intentional misrepresentation throughout modern history) then I think people should simply deal with it. If people wish to contribute regularly to Māori language articles, there are many options for macrons. Windows has a built-in Māori keyboard which takes a couple seconds to set up, mobile users usually have macrons by default, and I'm sure some equivalent exists for desktop iOS users.
    inb4 'this is English language Wiktionary' - Māori is an official language of New Zealand with government departments dedicated to setting standards for its writing both i te reo Māori and in regular New Zealand English usage. Because of this, tohutō usage in 'Māori' is standard in New Zealand English. You would be hard pressed to find any reputable NZ English sources which still leave out the tohutō in writing. Lilith Itou (talk) 23:28, 8 October 2025 (UTC)Reply
    @Lilith Itou The way we normally decide how to spell a given language name (and not just inclusion or not of accents, but more broadly) is based on actual usage in English sources, preferring recent sources over older ones, and preferring scholarly sources if necessary. We don't go by the spelling in the language itself, we don't go by government standards, and we need to reflect usage globally, not just in New Zealand. That said, the Ngrams data is enough to convince me it's probably OK to change the spelling to include the macron. Benwing2 (talk) 23:41, 8 October 2025 (UTC)Reply
    As you may notice in my post, I mentioned that the use of Māori in NZ English does consider tohutō standard. If you google 'maori' right now, you will almost certainly find a majority of results include the macron - including English language Wikipedia. I'm sympathetic to the idea that it must reflect global use - but this seems like an odd way to handle a language spoken almost solely in a single country, especially a country with English as the most common language. If you'd like English sources discussing 'Māori' and its usage, I can provide as many as you need. One of these would be the Oxford English Dictionary. Lilith Itou (talk) 23:50, 8 October 2025 (UTC)Reply
    I'm not sure why it's odd to insist that we reflect global use; Wiktionary is specifically a global English dictionary (whereas e.g. the Oxford English Dictionary is largely British in scope), and the issue of global use is a standard across all language name choices. As @-sche notes, there is a crossover point a few years ago (for me it appears to be around 2019), where the spelling Māori became more common than Maori, so I am fine with the rename. I used similar criteria when moving the canonical entry for the large Balearic island from Majorca to Mallorca (but contrastingly maintaining the existing preference of Minorca over Menorca, since usage does not yet clearly show a preference of one over the other in recent scholarly sources; and I should add, Chrome underlines Menorca in red, but does not do the same for Minorca, Majorca or Mallorca). Also keep in mind in naming language articles particularly, Wikipedia often departs from common usage in favor of diacritics because of a single user (Kwamikagami), who acts as if he "owns" the language articles and has gone around doing lots of out-of-process moves of article names to include diacritics. Benwing2 (talk) 00:03, 9 October 2025 (UTC)Reply
    Having looked at Wikipedia's RfC thread regarding the use of macrons for Māori on English language pages, I highly recommend checking it out. The main point I'd bring up from there is that Google Ngrams likely under-represents the instance of macron usage, as "many words published with diacritics get transcribed into Google's database without them". If Ngrams is the touchstone used to determine common usage, then a 'recent' uptick in the use of Māori on there may be indicative of greater prevalence than the raw data would show. Lilith Itou (talk) 00:10, 9 October 2025 (UTC)Reply
    Well, we do have many, many language names with diacritics and weird characters, all the way from reasonable Guaraní to ǃXóõ, whose exclamation mark isn’t even an exclamation mark character.
    By the way, thanks for moving the discussion — I realized I directed User:Lilith Itou to the wrong place, and when I came to check it was already here. Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 23:39, 8 October 2025 (UTC)Reply
    @Polomo See my comment just above. Some diacritic-ful spellings were added out of process, so I wouldn't necessarily assume just because there exist such names that we should use diacritics everywhere. The issue of typing their names is a real one, and I would probably object to ǃXóõ in particular as it's nearly impossible to type that correctly without copy-pasting it. Benwing2 (talk) 23:45, 8 October 2025 (UTC)Reply
    Agreed; Xoo is just a notable outlier. We do have, nevertheless, a bunch of language names with acute accents and macrons. Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 00:17, 9 October 2025 (UTC)Reply
    With the same disclosure that I left on WT:LOL, I am a friend of Lilith Itou's, she has pointed me to this discussion, I don't speak te reo Māori, I would note that a similar RfC on the English Wikipedia resulted in a consensus to add tohutō to articles there. LivelyRatification (talk) 23:40, 8 October 2025 (UTC)Reply
    I hadn't read this until LivelyRatification posted it here, but having gone over it the discussion summary provides an excellent overview of the topic! I would highly recommend anybody interested in contributing to this debate to read it. Lilith Itou (talk) 00:04, 9 October 2025 (UTC)Reply
     Support renaming per nom. Juwan (talk) 10:16, 9 October 2025 (UTC)Reply
     Oppose. I am in favour of using native terms, but this seems to be complicating the spelling with no real reason. It's not like it's unclear what language is meant, and macrons are a real headache to add to L2s when editing. I wonder if we could theoretically come up with a technical solution that displays a macron on the term 'Maori' automatically for logged-out users or something. But from the point of editing - this is not a good idea. Thadh (talk) 11:39, 9 October 2025 (UTC)Reply
    what an odd perspective. if one is editing Māori terms, they would likely already have easy access to the macron in their keyboard. for passer-by edits, the diacritic can be typed in a US-International keyboard, a standard mobile keyboard or, if as a last resort, in the advanced options panel during page editing. Juwan (talk) 11:48, 9 October 2025 (UTC)Reply
    I have a US-international keyboard and afaik I cannot write a macron without copy-pasting the combining diacritic. I have edited Tokelauan for a long while, I have never had a macron on my keyboard, I had to copy-paste it every time.
    I don't think we should expect anyone to use diacritics for L2s (which by the way is also the form used for searching categories). I would also be in favour of changing Guaraní to Guarani etc. Thadh (talk) 11:57, 9 October 2025 (UTC)Reply
    Typing {{subst:\|mi}} will generate the canonical language name without any copy/pasting. —Mahāgaja · talk 12:25, 9 October 2025 (UTC)Reply
    Actually (and forgive me for changing the subject here) “Guaraní” has a diacritic due to Spanish influence, not from Guarani itself (where it’s spelled “guarani”). Estigarribia, author of an excellent recent grammar of (Paraguayan) Guarani, even mentions this, preferring the term without the diacritic. In my view, all varieties of Guarani should have the diacritic removed, but regarding the current proposal, I  support it for the reasons presented by the nominator (and hope they continue recording pronunciation audios for Māori). Yacàwotçã (talk) 16:20, 9 October 2025 (UTC)Reply
    Thanks, I intend to! Lilith Itou (talk) 21:18, 9 October 2025 (UTC)Reply
    @Thadh "I have never had a macron on my keyboard, I had to copy-paste it every time." I write my own keyboard layouts on Linux... Exarchus (talk) 13:18, 25 October 2025 (UTC)Reply
    You need the advanced characters menu for a bunch of other editing necessities, like a lot of {{lang-IPA}} and {{lang-pr}} templates use special characters to distinguish sounds. Clearly most publications have no issue with typing the character, or else it would not be the most frequent choice. And editing a language without being able to type it properly seems like a struggle either way. Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 14:29, 9 October 2025 (UTC)Reply
    Can't we just have a bot swap the spellings from Maori to Māori? — mellohi! (Goodbye!) 23:30, 9 October 2025 (UTC)Reply
    There's an issue with categories, for example, as @Thadh points out; the categories will have the macron in them, making them harder to type (esp. as it occurs near the beginning, so autocomplete will be of less help). Benwing2 (talk) 23:37, 9 October 2025 (UTC)Reply
    FWIW I also support the spelling Guarani without diacritic (Ngrams shows both spellings essentially tied.) Benwing2 (talk) 23:39, 9 October 2025 (UTC)Reply
    I don't see why we can't just bot-create redirects for people trying to search for categories, and within entries we usually categorize via language codes anyway. — mellohi! (Goodbye!) 00:54, 10 October 2025 (UTC)Reply
    I understand that Special:Search makes searching for these names quite easy; by typing Category:!Xoo into it, I get suggested Category:ǃXóõ language. For adding categories to a page, that is usually done with {{cln}} and {{C}} rather than regular wikitext syntax (a bot replaces instances of those), and those take the language code. Even HotCat suggests the correct categories for me. The only issue I could see arising from that would be people manually adding categories (without templates) and not typing the language names correctly, but a bot could definitely identify instances of that, no? Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 03:22, 10 October 2025 (UTC)Reply
    @Polomo This is interesting; apparently auto-complete has gotten a lot smarter than it used to be, and now for example will correctly auto-complete Category:Naʼvi language (with some sort of Unicode apostrophe; FWIW I would probably have opposed putting a Unicode apostrophe in the language name, and I suspect it was added out-of-process) when you type CAT:Na'vi with a plain apostrophe, as well as auto-completing Category:ǃXóõ language when you type CAT:!Xoo. So I assume it would be smart enough to do the same for Māori vs. typed Maori. This obviates some of my concerns about autocompleting categories. FWIW I would strongly oppose a bot adding a bunch of redirect categories from e.g. Maori to Māori, as @Mellohi! proposes; it adds a bunch of junk and presents a false picture of the extant categories, as the bot-created categories will always be running behind the actual extant categories. Benwing2 (talk) 06:43, 10 October 2025 (UTC)Reply
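The autocomplete behaviour described above can be approximated with diacritic folding. This is a minimal sketch under stated assumptions, not a description of how Special:Search is actually implemented: it strips combining marks after Unicode decomposition, and uses a small hypothetical table (`EXTRA`) for characters that decomposition leaves untouched, such as the click letter U+01C3 and the modifier apostrophe U+02BC. The title list and the `suggest` helper are likewise invented for illustration.

```python
import unicodedata

# Hypothetical extra mappings for lookalike characters that Unicode
# decomposition does not reduce to their ASCII counterparts.
EXTRA = {"\u01c3": "!", "\u02bc": "'"}

def fold(text):
    """Map lookalikes to ASCII, drop combining marks, and casefold."""
    text = "".join(EXTRA.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def suggest(prefix, titles):
    """Return the titles whose folded form starts with the folded prefix."""
    key = fold(prefix)
    return [t for t in titles if fold(t).startswith(key)]

# Illustrative titles, written with escapes so the code points are explicit.
titles = [
    "M\u0101ori language",            # Māori
    "\u01c3X\u00f3\u00f5 language",   # ǃXóõ
    "Na\u02bcvi language",            # Naʼvi
]

print(suggest("Maori", titles))
print(suggest("!Xoo", titles))
print(suggest("Na'vi", titles))
```

Under this kind of folding, a plainly typed name still finds the canonical title, which is the property the comment above relies on.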
     Support As a New Zealander, changing the language's name to Māori is eminently sensible, since it is definitely normative here both in the linguistic literature and in governmental, media and political discussions regarding the language. However, the claim that the macronless form Maori is "extremely inaccurate to how te reo is referred to both by Māori and by New Zealand English speakers" is somewhat of an overstatement, especially since the widespread use of the tohutō is very recent, both in the language's name and more generally (note the use of Maori without the macron in the title):
    • 1993, Winifred Bauer, William Parker, Te Kareongawai Evans, Maori (Descriptive grammars), Abingdon, Oxon: Routledge, →ISBN, →OCLC:
      Later, a macron was used to mark vowel length, but it was used only sporadically, and not, for instance, on maps, road signs, etc. Williams’s Dictionary from the 3rd edition onwards uses the macron on head words only. The text-books used most widely for the teaching of Maori all use the macron for vowel length. This is despite a long campaign by Bruce Biggs and colleagues at the University of Auckland to write long vowels as double vowels. There are advantages both linguistically and pedagogically to doing so, but the proposal met with no support from the Maori community at large: they feel that their language looks “clumsy”, “silly”, etc written with double vowels, and prefer the macron. Many still see no need at all to mark vowel length.
    Hazarasp (parlement · werkis) 11:55, 9 October 2025 (UTC)Reply
    It's worth noting that some of the back and forth regarding macron usage (particularly between the double-vowel versus tohutō) is due to dialectal differences and iwi affiliations. The Kiingitanga-aligned iwi have used double-vowels for a long time, as they prefer them, though most Māori see the macrons as the best way of representing the differences between sounds. That said, given the current Kiingitanga is usually referred to as 'Ngā-wai-hono-i-te-pō' instead of the double-voweled version, it's possible that this last holdout of the double vowel may also be moving towards the tohutō.
    On the subject of the 'recent' use of the tohutō, it's true that its usage throughout the wider population is relatively recent - but this is not due to a lack of necessity. As I indicated, long-vowels can completely change the meaning of a word, and that has always been the case. The first usages of macrons date back to some of the earliest Māori written texts (the English Wikipedia page for the Māori Language contains a lot of good sourcing on this matter, so I won't go out of my way to repeat what it says). As this article correctly indicates, macron usage as standard or near-standard in the teaching of te reo Māori can be traced back at least 50 years.
    The 'recent' uptick in its usage broadly mostly reflects English language news media changing policy to accurately reflect the writing of te reo, at first sporadically and then all together around 2017-18 - rather than any confusion amongst Māori or linguists as to whether the long vowel requires marking. Lilith Itou (talk) 21:07, 9 October 2025 (UTC)Reply
    • Regarding the Kīngitanga/Tainui use of double vowels, my guess is that they took up the practice from Biggs since he was from Ngāti Maniapoto, so it would date back to the 1950s when he began to teach and study Māori academically. I do agree with you that it is likely to be displaced by the tohutō though.
    • My emphasis was upon Māori loanwords in English when I referred to the "very recent" introduction of the macron, since this discussion is about one (the word Māori as used in the English-language Wiktionary). The quote I offered was probably ill-advised since it's about the use of macrons in Māori itself, though that could still be argued to be comparatively recent, especially since macroned forms obviously didn't get instantaneously adopted universally when they were mandated in the education system.
    • As for the claim that macrons are "necessary" to mark vowel length, they are obviously "necessary" in terms of current Māori orthographic practice. However, plenty of languages make do with "defective" orthographies that neglect certain phonemic distinctions, and vocalic length contrasts are among the most frequently elided in such systems. For instance, while the orthographies of Samoan and Tongan theoretically employ a macron (fakamamafa and toloi respectively) to mark vowel length, it is frequently ignored in actual practice. However, having a non-defective orthography is obviously very convenient in language learning; the fact that the tohutō has spread to the extent that it is considered mandatory is possibly connected to the fact that Māori is no longer primarily a first language.
    Hazarasp (parlement · werkis) 05:28, 10 October 2025 (UTC)Reply
     Support the change, per above (quite unrelated, but the entry for tohutō also didn't have a macron). Trooper57 (talk) 21:39, 24 October 2025 (UTC)Reply

     Support — I am most convinced by the evidence that adding the macron/tohutō will not thwart Special:Search's autocomplete feature. It seems any difficulties introduced by the macron/tohutō are sufficiently slight that they are worth tolerating. Perhaps diacritics in L2 headers are not the bane I had thought them to be. 0DF (talk) 09:06, 30 November 2025 (UTC)Reply

    Splitting Category:Puroik language (suv) into a language family


    According to Lieberherr (2015), Puroik is not "one language", but multiple; the Bulu and Chayangtajo/Sanchu "dialects" have little-to-no mutual intelligibility. Thus, Puroik on Wiktionary should be split into separate languages and "Puroik" (suv) proper be redefined as a family. Following Lieberherr's study, we should split Puroik into at least 3 languages:

    • Bulu Puroik (suv-bul)
    • Kojo-Rojo Puroik (suv-krj)
    • Chayangtajo Puroik (suv-cht)

    If this split goes ahead, Category:Proto-Puroik language should have its code changed from sit-khp-pro to suv-pro. — mellohi! (Goodbye!) 00:09, 10 October 2025 (UTC)Reply

    @AryamanA, Thadh, -sche Pinging for thoughts. — mellohi! (Goodbye!) 00:13, 10 October 2025 (UTC)Reply
    Not opposed per se, but I'd like to see more than one source making this claim, to establish some sort of consensus in the field, rather than just trusting what one source says. Keep in mind that mergers are a lot harder to execute than splits, so splitting a language is akin to a one-way-door decision (once you make it, it's hard to undo). Also, is there anyone actually working on this language/group or are you just proposing this for theoretical correctness? Benwing2 (talk) 06:36, 10 October 2025 (UTC)Reply
    I made a few entries a while back. I think the split makes sense, and, as Thadh mentioned, reconstructing Proto-Puroik is non-trivial. —AryamanA (मुझसे बात करेंयोगदान) 00:00, 12 October 2025 (UTC)Reply
    If we have Proto-Puroik it only makes sense to have multiple Puroik languages. But I'm not very familiar with this language group. Thadh (talk) 14:32, 10 October 2025 (UTC)Reply

    Runic Script for Elfdalian


    According to Wikipedia, Dalecarlian runes were used to write Elfdalian as recently as last century. Is there any reason not to add the script code "Runr" to code "ovd" in Module:languages/data/3/o? Chuck Entz (talk) 04:25, 11 October 2025 (UTC)Reply

     Support — Let's cite some modern runes! 0DF (talk) 00:21, 12 October 2025 (UTC)Reply
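As a rough illustration of what listing an extra script for a language implies, the sketch below checks whether a term is written entirely in characters from the Unicode Runic block (U+16A0 to U+16FF). It is only an assumption-laden sketch: it does not reproduce Wiktionary's own script detection, the function name is hypothetical, and it does not assume that every Dalecarlian letterform is covered by that block.

```python
# Bounds of the Unicode "Runic" block.
RUNIC_START, RUNIC_END = 0x16A0, 0x16FF

def uses_runic(term):
    """Return True if every non-space character of the term
    falls inside the Unicode Runic block."""
    letters = [ch for ch in term if not ch.isspace()]
    return bool(letters) and all(
        RUNIC_START <= ord(ch) <= RUNIC_END for ch in letters
    )

print(uses_runic("\u16a0\u16a2"))  # two characters from the Runic block
print(uses_runic("dalska"))        # Latin-script placeholder string
```

A term mixing runes with Latin letters would fail this check, so a real implementation would need a policy for mixed-script spellings.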

    Adding Proto-Sabaki


    I think it’d be a good idea to add Proto-Sabaki reconstructions. The Sabaki language group is now well-established as a genetic grouping, see Sabaki languages at Wikipedia for example. The book Swahili and Sabaki: A Linguistic History by Nurse & Hinnebusch gives a thorough review of its status and lists a great many reconstructions. Pinging @Tbm, Smashhoof, HeavenlyAestheticist as Swahili/Bantu editors that may want to weigh in. MuDavid 栘𩿠 (talk) 03:29, 16 October 2025 (UTC)Reply

    I think that's a great idea. I recently tried to add Proto-Sabaki to the etymology section of Swahili palilia but got the error "Sabaki languages (bnt-sab) is not set as an ancestor of Swahili (sw)", so this should be adjusted as well. tbm (talk) 10:16, 16 October 2025 (UTC)Reply
    Sounds like a good idea to me. HeavenlyAestheticist (talk) 10:36, 16 October 2025 (UTC)Reply
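The "is not set as an ancestor" error quoted above reflects a check along the ancestor chain of a language. The following is a minimal sketch of such a check over a hypothetical, simplified table; it is not the actual module logic. The code `bnt-sab-pro` for Proto-Sabaki and the parent links shown are assumptions made for illustration.

```python
# Hypothetical data: each language code maps to its direct ancestor,
# or None when no ancestor is recorded.
ANCESTOR = {
    "sw": "bnt-sab-pro",        # assumed code for Proto-Sabaki
    "bnt-sab-pro": "bnt-pro",   # assumed link to Proto-Bantu
    "bnt-pro": None,
}

def has_ancestor(lang, candidate):
    """Walk up the ancestor chain of `lang` and report whether
    `candidate` appears anywhere on it."""
    seen = set()
    current = ANCESTOR.get(lang)
    while current is not None and current not in seen:
        if current == candidate:
            return True
        seen.add(current)  # guard against accidental cycles in the data
        current = ANCESTOR.get(current)
    return False

print(has_ancestor("sw", "bnt-sab-pro"))
print(has_ancestor("bnt-pro", "sw"))
```

Until the proto-language is registered and linked in the data, a lookup of this kind cannot succeed, which matches the adjustment requested above.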

    (Notifying Theknightwho, Benwing2): Could this be implemented? MuDavid 栘𩿠 (talk) 03:13, 24 October 2025 (UTC)Reply

    Let's wait for @-sche, our obscure languages expert, to weigh in. If no answer in a few days, we can go ahead. Unfortunately I know nothing about Sabaki. Benwing2 (talk) 03:29, 24 October 2025 (UTC)Reply
    There's a good amount of literature discussing the existence of Proto-Sabaki, the dating of it, and phonemes or sound changes it had, and a fair amount that reconstructs at least some words in it. Not being a Bantu specialist, I don't have a sense of how accepted particular reconstructions are or if different authors' are inconsistent, and defer to the Bantu-knowing editors above about how many reconstructions are suitable to add, but it seems reasonable to add the language: as MuDavid says, it's established that it existed. - -sche (discuss) 07:14, 24 October 2025 (UTC)Reply
    All right, I'm convinced and have gone ahead and added it. Benwing2 (talk) 07:27, 24 October 2025 (UTC)Reply
    Thanks a lot! MuDavid 栘𩿠 (talk) 03:40, 27 October 2025 (UTC)Reply

    Kariri (kzw) etymology-only languages: Dzubukuá, Kipeá, Pedra Branca, and Sabujá


    I have recently talked with administrator @Polomo about this. My initial suggestion here is to split Kariri into its attested varieties, but only as etymology-only languages. This is mainly because I am not yet convinced treating them as fully independent languages is the best solution here, and I am apparently the only one interested.

    Most recent works seem to treat them as separate languages rather than dialects of a single one. This is especially true for the best attested varieties, Dzubukuá (a catechism and a manuscript with some sentences) and Kipeá (a grammar and a catechism). Bernard de Nantes, author of the works in Dzubukuá, drew a parallel between their linguistic differences and those between Portuguese and Spanish (“the Kariris called Dzubukuá whose language is as different from that of the Kariris called Kipeá as Portuguese is from Castilian”). As for Pedra Branca and Sabujá, they are attested only in vocabularies collected by von Martius (see here and here, with some additional words scattered throughout the work).

    As an example, the cognate of English maize is recorded as madiki (Dzubukuá), masichí / masikí (Kipeá), mosiccih (Pedra Branca), and maschicöh (Sabujá). Here, although they are cognates, madiki apparently actually meant manioc, while masichí really meant corn. Splitting them into etymology-only languages would make it easier to handle such cases in the Etymology section rather than Alternative forms. In the entry masichí, for example, we could have: “Ultimately from Proto-Arawak *marikɨ (‘corn’) as a Wanderwort. Cognate with Dzubukuá madiki, Pedra Branca mosiccih, and Sabujá maschicöh.” Something similar happens with badzé, which in Kipeá is attested with the meanings tobacco and to divine, but in Dzubukuá Badze appears only as the name of a god.

    I propose the codes kzw-dzu, kzw-kip, kzw-ped, and kzw-sab.

    (P.S. 1: As for masichí, there is even a reconstruction of this term for Proto-Kariri: *masiki. I am not suggesting the creation of Proto-Kariri for now though, since that is the only reconstruction I have found.) (P.S. 2: There are also three vocabularies recorded in the past century, but from a time when no closely related languages were still spoken. For now I prefer to disregard these, except perhaps for occasional mentions in the Etymology section when they help clarify a meaning, as done in yacá, where these vocabularies seem to show a semantic shift occurred from fox to dog.) (P.S. 3: I have been using here and hopefully henceforth for Dzubukuá the orthography from the appendix of Queiroz's (modern) grammar, and everywhere for Kipeá I follow Mamiani's instructions, which are easy to grasp.) Yacàwotçã (talk) 05:25, 19 October 2025 (UTC)Reply

    Pinging @Trooper57 in case they're interested. Any suggestions are welcome. Yacàwotçã (talk) 05:26, 19 October 2025 (UTC)
    How would having these etymology-only codes help? You can’t exactly say a word in “Kariri” is cognate with one in “Pedra Branca”. What you may want is a similar treatment to (also Chinese), which are listed under one heading... I don't really know why.
    Then you could lemmatize, say, Kipeá, and list the other languages’ words as alt forms. But I’m not sure this would work, given, say, madiki never meant manioc, only its alt form.
    In practice, I believe this could only be handled by having multiple lemma pages for each set of cognates, and at that point they should be under different headings any way. We have a separate code for languages like São Paulo Kaingáng.
    In any case, probably best to look into other possibilities. Hope someone can explain how Prakrit is organized. Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 14:59, 19 October 2025 (UTC)
    @Polomo I'm not sure if the languages you mentioned are really comparable to this case. I was thinking of Coptic, where cognates of different varieties are treated as alternate forms, but it seems to me as a layperson that those are actually much closer to each other than the Kariri varieties, and also much better attested. Anyway, just so I understand, are you suggesting splitting them into separate languages? Or are you just questioning my idea so we can refine it better? Personally I'd be fine with treating them as independent languages (and it would make my life considerably easier), but I wonder if there's any issue with that or if reverting to the current state later would be too much of a hassle. Yacàwotçã (talk) 05:52, 20 October 2025 (UTC)
    Yeah, I’m suggesting treating them as separate languages. It can’t get much clearer than when an Italian missionary in the early 18th century recognizes it. Also, it seems this comparison with Portuguese and Spanish is pretty common... that guy Couto de Magalhães used it for Nheengatu vs. Guarani. Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 10:41, 20 October 2025 (UTC)
    I also Support treating them as separate languages; they seem different enough. Trooper57 (talk) 19:07, 23 October 2025 (UTC)
    Support adding them as etymology-only languages at a minimum. I would also support treating them as separate full languages, but I may not be aware of the technical problems that may cause. Apparently, it's a lot easier to split than it is to merge (I don't know why; ask Benwing2 if you want an explanation). 0DF (talk) 19:52, 23 October 2025 (UTC)
    Polomo, given your statement and that of Trooper57, I agree with the division as separate languages. When you do so, if you indeed do, please let me know so that I can adjust the current entries. Thanks, Yacàwotçã (talk) 10:17, 26 October 2025 (UTC)
    I lack the technical knowledge of what needs to be done in these cases. Either Mr. ’Wing or another of our more technical editors needs to pitch in. Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 14:58, 26 October 2025 (UTC)
    @Polomo @Yacàwotçã I have no issue adding etymology codes for the different varieties, but if we are proposing a full L2 split I feel that needs a bit more discussion, especially given that all these lects are extinct and some are known only from single word lists. Pinging @-sche for thoughts. Benwing2 (talk) 20:41, 3 November 2025 (UTC)
    Yeah, actually, are there no modern-day studies on this matter? It seems pretty reasonable to say Kipeá and Dzubukuá are separate languages even without them, but for the remaining two it’s more iffy. Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 20:52, 3 November 2025 (UTC)
    It gives me pause that Wikipedia says they are "generally considered dialects of a single language"; if you are correct that more recent works tend to regard them as separate languages, it'd be good to update our sibling project. After that... when he was active, Metaknowledge took (and I was persuaded to accept) the view that for extinct wordlist-only languages, it can actually be tidier and more conservative to give each one its own (full) code, if there's no clear case that they are the same language (and if they are mostly referred to under their separate names in literature, rather than as one language): that way, people can just enter and link to each one under the name it's most known by, rather than entering it under an umbrella language name + labels, and even if we later decide to (re)merge them, the number of words in most of them being so small and well-defined means that shouldn't be nearly as difficult as merging e.g. large living languages. - -sche (discuss) 02:37, 6 November 2025 (UTC)
    Makes sense. I guess it depends, as you said, on whether there's general consensus that it's the same language being referred to in different wordlists. Benwing2 (talk) 02:55, 6 November 2025 (UTC)

    Early Modern Portuguese

    Add "Early Modern Portuguese" as an etymology-only language, like we already do for Early Modern English (en-ear) and Early Modern Spanish (es-ear); macau is surely one term that comes to mind. Trooper57 (talk) 04:08, 26 October 2025 (UTC)

    Support. leixar and the dixer verb forms come to mind (though we have entries for neither right now). Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 15:15, 26 October 2025 (UTC)
    Though I’m not sure how this is supposed to be added in an entry. Some English entries have [[Category:Early Modern English]] and only label the term as “obsolete”; others have {{lb|en|EME|obsolete}}; others yet have just {{lb|en|EME}}. Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 15:20, 26 October 2025 (UTC)
    I'd only label it as Early Modern; obsolete is already implied. Trooper57 (talk) 15:29, 26 October 2025 (UTC)
    But then it won’t categorize under Category:Portuguese obsolete terms/forms; that cat would need to be manually added. It seems to me like it’s most sensible to add both labels, actually. Polomo ⟨⁠ ⁠oi!⁠ ⁠⟩ · 16:20, 26 October 2025 (UTC)
    You can make it add both categories with a single label in Module:labels/data/lang/pt. Trooper57 (talk) 16:23, 26 October 2025 (UTC)
    Then it wouldn't work with "obsolete forms" hmmm. Trooper57 (talk) 16:31, 26 October 2025 (UTC)

    Removing and/or clarifying Church Slavonic

    @Benwing2, AshFox, Sławobóg, Vininn126

    Recently, Church Slavonic has been split off from Old Church Slavonic. In my opinion, the current state of affairs is a tangled mess that is worse than the original situation:

    Pre-split we just called all Church Slavonic varieties 'Old Church Slavonic'. Agreed, that naming is probably not the best, and could be changed, but the handling of these various lects was still consistent - we had a defined set of language varieties that were closer to each other than to other languages under one L2.

    What we have now is the following: Canonical Church Slavonic has been split off, with the rest still under one L2 under the name 'Church Slavonic'. There are many problems with this:

    • The canonical language is almost or completely identical to the language of many following centuries of Church Slavonic, and so basically the entirety of the canonical OCS can be duplicated into CS.
    • The modern varieties are more distinct from each other than they are from (canonical) OCS.

    This makes the split both arbitrary and more of a mess than before the split happened. To make an analogy, imagine we had Belarusian, Carpathian Rusyn and Ukrainian starting from the 1500s under one L2 named 'Ruthenian'. The CS split was basically like splitting off Old Ruthenian specifically from 1500 to 1550, while keeping Belarusian, Carpathian Rusyn and Ukrainian as well as Old Ruthenian from 1550 to the 1800s as one single language.

    So, I propose either of the following two solutions:

    1. (the easiest option) We just re-merge all Church Slavonic varieties into one L2, name it 'Church Slavonic', and proceed with finding regional/temporal labels for anything we want.
    2. (the better, but more complex option) We actually do the homework and find out what varieties are distinct enough from each other to warrant their own L2, leading to multiple CS varieties (likely OCS, RusCS and SerbCS, with a well-defined cutoff date from OCS to the different modern varieties).

    Since in the last few years nobody has tried to do the latter, and I personally definitely don't have time for that, I suppose the former option is more realistic, and we should probably go ahead with it before the time we can split the language properly. Thadh (talk) 11:48, 27 October 2025 (UTC)

    @Thadh I Support either one of your options; the split out of OCS seems similar to splitting Classical Latin out from "all other Latin varieties", which doesn't make much sense since "all other Latin varieties" is not a clade and Classical Latin is closer to Late Latin than Late Latin is to Medieval Latin. Benwing2 (talk) 20:45, 3 November 2025 (UTC)
    Oppose. 1 - I don't see how CS + labels is better than OCS-CS split + labels. 2 - It's overkill; we don't need 10 вода or богъ CS lemmas. Having separate lemmas for canon and non-canon is good enough. I don't think anything should be changed. Sławobóg (talk) 21:31, 9 November 2025 (UTC)
    I won't let my vote count on an issue of languages I don't edit, but for what it's worth I am also a firm believer that a good labelling infrastructure under a single header would be the best solution in terms of both practical management and (trusting your judgement) linguistic accuracy. Splitting, with all its benefits, still seems a nearsighted decision, and the earlier we revert it the less work you guys will have to do. To avoid the issue of "polluting" the original corpus, which as far as I understand is one of the main reasons for the split of Middle Armenian, there should also be some categorisation in place to contain terms which are attested already from canonical OCS. Catonif (talk) 16:47, 30 November 2025 (UTC)

    Renaming Senhaja De Srair (sjs)

    Suggestion to change the capitalization of the language name to 'Senhaja de Srair', which is the more common way of spelling it. Lankdadank (talk) 12:11, 30 October 2025 (UTC)

    @Lankdadank Support; it appears that de here is just the French word for "of" so correctly it should be lowercase. Benwing2 (talk) 20:49, 3 November 2025 (UTC)
    Pinging @Fenakhay and @Wad Yaẓimẓn for comment. Should not be a very controversial change to make, I think. Lankdadank (talk) 11:26, 5 November 2025 (UTC)
    Support per nom, "de" indeed appears to be significantly more common. - -sche (discuss) 21:29, 5 November 2025 (UTC)
    Support. — Fenakhay (حيطي · مساهماتي) 13:54, 14 November 2025 (UTC)

    November 2025

    Move Cajun from etymology-only to a full language

    I am a Louisiana Cajun with family who have been native speakers of the language/dialect for as long as it has existed. I was a native speaker as a child, but as I grew up in exclusively anglophone communities, my accent and understanding of it have been mostly subsumed into the school French I learned from middle school through college (which was entirely based on the Parisian standard). However, I have recently been gifted my dad's copy of the books Cajun Self-Taught and A Dictionary of the Cajun Language by Rev. Jules O. Daigle. These books were originally published in the 1950s I believe, but the copies I have were given by the author to my dad in 1984. Daigle documents his native dialect, that being the Cajun language as it was spoken in the early 1900s. Obviously the standard he advocates for is somewhat different from the variety that exists today, but I think adding it to Wiktionary would provide a wonderful baseline to allow modern native Cajun speakers to add usage notes or more specific modern-day slang that they have grown up with. I desperately want to digitize this entire dictionary and add it to Wiktionary, both for the sake of future generations of Cajuns and non-Cajuns alike, and as a learning exercise allowing me to familiarize myself with the regional vocabulary as I add it.

    I am aware that Wiktionary currently has 467 words under Category:Louisiana French and an additional 160 under Category:Cajun French, but this pales in comparison to the amount of vocabulary in the books I have (roughly 25,000 English terms listed, most with multiple Cajun translations, as well as an additional section with at least 8000 Cajun words). I would just not be comfortable adding all of this documentation under the existing French section headers, as I (as well as the author of this book) believe that Cajun is its own regional language variety that deserves to be taken seriously, not just concatenated under French, with the expectation being that everything is the same as a default and only the differences are worth noting. Tangerines404 (talk) 21:00, 9 November 2025 (UTC)

    I'd also be happy to provide photos of the books or their pages if anyone doubts the legitimacy of this project, or if they are just curious to see it! It's a very interesting resource. Tangerines404 (talk) 21:02, 9 November 2025 (UTC)
    I agree with @-sche here; I would oppose splitting this out based on mutual intelligibility if nothing else. As an aside, I think it's unfortunate that regional lects don't get legitimacy (especially, it seems, in Europe, but not exclusively there) unless they have the title "language" by their name. As a result we have lots of people pushing for microsplits of regional lects on the theory that this is the only way for them to be recognized as distinctive. Benwing2 (talk) 00:44, 10 November 2025 (UTC)
    In an ideal world, a decision about whether to continue to treat Cajun French as a variety of ==French== or split it off as a separate language would be based on a linguistic analysis of whether Cajun French and other French are generally similar and mutually intelligible, or whether they are very divergent and mutually unintelligible. (Because we do not live in an ideal world, there is also a second factor which might influence some people with regard to whether to split it off or treat it as French, which is that French words have to be used in three different books or newspapers etc in order to meet WT:CFI, whereas a separate language would have a lower bar to entry.) On the linguistic front, my current understanding from linguistics literature and from anecdotes from a couple speakers I've met is that Cajun and Louisiana French are intelligible—but sound dated and with occasional unfamiliar words—to speakers of e.g. Parisian French (and vice versa). I found a copy of Daigle's dictionary online, and though he does call it "a separate and distinct language in its own right", he also seems to acknowledge that it is mutually intelligible (his intro talks about allowing "Cajuns [to] become familiar with numerous standard French words and integrate them into their vocabulary; and French-speaking persons of other nations [to] enrich their particular type of French with our unique Cajun vocabulary", etc). When I spot-check various words in his dictionary, most also exist in other varieties of French.
    My impression is that Wiktionarians would not want a situation where lots of words were duplicated under both ==French== and ==Cajun French== headers just to indicate that the words also exist in Cajun; my understanding is that it would be preferable to use usage notes (possibly even using a template to provide the same wording every time) to indicate, on words that are common in other varieties of French but not Cajun, something like "This word is not used in Cajun French, where X is used instead", and if a word is entirely European-French-specific (absent not only from Cajun but from e.g. Canadian and African French), it should have a {{label}} indicating that it's France- or Europe-specific, just like Cajun-specific words should and currently do have labels (and if a word is common to all French including Cajun, then it doesn't need any label but you could add Daigle's dictionary to the ===References=== or ===Further reading=== section, or add Cajun quotations, etc).
    (You know this and I know this, but I'll mention for the benefit of anyone else reading this that Cajun French and Louisiana French are also different from the Louisiana Creole language, which we already have as a separate full language.)
    If you think Cajun French is so divergent as to be a separate language from French as spoken in Canada and France etc, it'd be helpful if you could lay out differences, and provide some example paragraphs in Cajun French vs other/France French.
    If the English Wiktionary decides against separating Cajun French, one other thing I will mention is that the French Wiktionary tends to be more permissive when it comes to letting things be separate languages — even languages that are relatively similar to French, e.g. fr:Catégorie:berrichon, and they would also be in a great position to judge whether Cajun French and other varieties of French are one language or not, if you would like to suggest this project there. - -sche (discuss) 22:22, 9 November 2025 (UTC)
    In an ideal world, the status of the Cajun (or Cajun French) language would not be in question in and of itself, in my opinion -- while mutual intelligibility is the generally agreed-upon standard for professional linguistics today, ultimately the distinction between "language" and "dialect" is much more perceptual (in the minds of both actual speakers, and non-speakers with adjacency to the speech communities) and sociocultural than it is academic. In contexts like "American English" or "European French," it is easy to write off these individual regional and contextual differences between linguistic standards as "dialects," because these languages share both a professional and academic written standard, as well as a vast amount of literature and media produced in said language, that is intended to be intelligible to all speakers within said country.
    Cajun does not have such a standard. Throughout its over 400 year history, Acadian/Cajun culture has never had any amount of national prestige... it was subject to French colonial rule, then Spanish, then Anglo-American cultural encroachment as Louisiana became more firmly part of the United States. In the past 100-150 years or so, the Louisiana education system has actively suppressed the use of the language as opposed to English in all academic or professional contexts, leading to massive amounts of generation loss from grandparents to parents to children.
    My point for recounting all of this history is simply to say that the fact Cajun is considered purely a "dialect of French" is inseparable from the fact that it exists as a substrate to (Southern American) English. Thus, modern (past the 1950s or so) appeals to linguistic similarities with French are for the most part attempts to legitimize the lect, and make an argument that children who grow up speaking it will thus be granted access to the vast amount of cultural products originating in le monde francophone. While it is theoretically true that adult, literate Cajuns who have received both French and English education today will be able to read a basic elementary-level text in European French... historically many of the most "vernacular" varieties of Cajun have not been written down nearly as much as the more formal "prestige" varieties (which tend to be closer to European French due to the influence of French Catholic priests introducing their linguistic standards), and even the written standards that do exist tend to underemphasize as many differences as possible. Illiterate-to-French speakers of these less prestigious varieties of the Cajun language, ESPECIALLY monolingual ones who do not also speak English, would have extreme difficulty attempting to read an average daily print of Le Monde, even if they were provided with basic phonics training explaining how to read the French writing system in all its quirks. Tangerines404 (talk) 17:55, 10 November 2025 (UTC)
    Daigle's books are themselves particularly guilty of the "appealing towards the written standard of French" strategy which I am describing: Daigle was himself not a linguist, but he was in fact a priest, and he was educated in written English, French, and Latin. He therefore attempts to argue that the Cajun language deserves recognition due to its unique position as borrowing words from French and English both, just as English borrowed Latin and French and retained older Germanic roots. He argues that vernacular speech varieties he sees as "errors" should not be taught... but by doing so, he luckily documents some of these unique regional grammatical and phonetic features, such that I would be able to preserve them as valid dialectal variants on Wiktionary if this project were accepted!
    Just to provide a few examples of what I'm referring to:
    The phonetic inventory of Cajun seems to be entirely different from European French, not just in pronunciation but in terms of actual core phonemes. Daigle uses a "phonetic code" of his own invention which makes it clear that, among other things, the language has a distinct /æ(:)/ phoneme which he transcribes "ai"; it is different from the two "ā"s of "traiter" (/e:/ or /ei/), the "e" of "mettre" (/mɛt/, final /r/ is only preserved in liaison contexts in Cajun) and the "a" of "rat" /rɑ/. The phoneme has both a nasal variant in contexts like "ain" /æ̃/ and "aine" /æ̃n~æn/, the indefinite articles, as well as occurring word-finally in "air," which seems to correspond with standard French /ɛʁ/, but also occurs in non-rhotic varieties in forms like sha (cher). He also describes how some accents use forms like "a" and "alle" (likely /æ æl/) for the third person feminine pronoun, implying a potential split of several /ɛ/ forms into this phoneme, which is either equivalent to or semi-merged with /a/.
    Daigle's phonetic code is used continually throughout Cajun Self-Taught, and he often uses it to vastly underemphasize significant differences between the written and spoken forms: such as suis "sū" /sy/, "Juliet" /jy.li.ɛt/, equivalent to standard /ʒɥi.e/, and "cette année" /stan.ei/. And again, Daigle's dialect is from the early 1900s and very conservative: he mentions as "common errors" forms like "carculer" for "calculer," "djab" for "diable," "éstora" for "restaurant", and "gonier" for "gagnier" (different vowels).
    On the topic of pronouns, the grammatical features of elision and liaison function very differently in Cajun. While Daigle proscribes against "a/alle", he describes the standard masculine and feminine 3p pronouns as "i" and "e," with the forms "il" and "elle" only occurring before a word with a following vowel. He also describes the first and second pronouns "je" and "te" as being elided "almost always," producing forms like "j'su" for "I'm" or "t'été" for "you were." (I am aware that this type of elision is documented in other French varieties as well, I simply describe it to emphasize the further spoken differences from the written standard). There is also the unique second-plural form "vooz ot" (vous autres), which is used exclusively for plural "vous." Singular "vous" is only used for "persons of special dignity," and both pronouns take the standard conjugation, not the unique -ez form of standard French. He does, however, describe another regional verb conjugation, -on /ɔ̃/, which is used for the third person plural form in examples like "ils avons pas d'argent" or "ils étions pas la".
    Some varieties also seem to have lost the liaison process entirely, likely due, again, to lack of literacy with standard French spelling: Daigle describes forms like "zoiseau" for bird or "zarbe" for grass. He calls these "the type of mistakes made by children and others who cannot read their language," but also calls it a "very widespread custom," implying that it is or was broadly commonplace in many colloquial varieties of the language.
    Daigle also describes a unique derivational frequentative infix, -aill- /ai/, which is used to produce verb forms such as
    roder /ro.de/ "to roam" -> "rodailler" /ro.dai.e/ "to gallivant, run around"
    couper /ku.pe/ "to cut" -> "coupailler" /ku.pai.e/ "to chop into pieces"
    It can also have an "objectionable" connotation:
    haler /ha.le/ "to pull" -> "halailler" /ha.lai.e/ "to tug at roughly or obnoxiously"
    sauter /so.te/ "to jump" -> "sautailler" /so.tai.e/ "to jump up and down, objectionably, noisily"
    While all of the verbs on the left are French words (although Euro-French lost h aspiré a long time ago, and roder has an entirely different meaning), the verbs on the right would be, as far as I am aware, completely uninterpretable to a monodialectal inhabitant of France.
    There are other unique verb constructions as well, such as a present progressive form using "apré(s)" (not après): "j'su aprés parler" = "I am speaking (now)." This usage is similar to Haitian Creole, but après is apparently also used as a conjunction meaning "as" or "according to," as well as as a "quasi-verb:" "va aprés d'l'eau" = "Go get some water" (roughly "go after some water").
    Hopefully, all of this serves as enough to prove that, regardless of whether you call it a distinct language from European French or a dialect of the global French language, Cajun is its own distinct linguistic force, with sub-dialects and alternate standards within it (including many more I don't have the energy to describe unless I know that documentation will be actually seen by Wiktionary users researching Cajun). I firmly believe that opening up the wiki to displaying it as its own language would allow better documentation of pronunciation variants, morphemic differences, unique verb conjugation charts, etc. Tangerines404 (talk) 19:05, 10 November 2025 (UTC)
    It’s a bit of a ‘monkey’s paw’ wish. On the one hand, yes — that makes documentation much easier. On the other hand, now someone — which will likely be you yourself — will have to decide what spelling(s) to use for Cajun lemmas and what inflections to put in the inflection tables. Is there a specific widely-recognized standard which isn’t effectively Standard French? Nicodene (talk) 22:42, 18 November 2025 (UTC)
     Oppose, though I am sympathetic to your cause as a Cadjin myself.
    I agree with -sche. I would estimate that perhaps 90 to 95% (or a similarly large percentage) of the Louisiana French lexicon is shared in some fashion with "Standard" French. And importantly, the two varieties have fundamentally similar grammar. Most trans-Atlantic misunderstandings, I think, come from differences in vocabulary, semantics, and pronunciation. I am of the opinion that it would be a much more manageable task to develop the French Wiktionary the way -sche described. But that does not mean that there aren't challenges to that approach:
    • LF is sorely lacking conjugation information on Wiktionary. However, we do not yet have a canonical template for that, given that LF conjugation is not standardized and thus highly variable. For example: ils mange(nt) ~ ils mangeont, vous êtes ~ vous est, les haricots sont pas salés ~ ...est pas salés, etc. Additionally, LF has a different pronoun set to account for. Standard French has (je, tu, il, elle, on, nous, vous, ils, elles) while LF has (je, tu, il, alle, on, ça, (nous), nous-autres, (vous), vous-autres, ils, eusse, eux-autres) and all their variants. Regardless, I would be happy to help develop an appropriate conjugation template.
    • LF is not sufficiently supported by the {{fr-IPA}} template. Pronunciation sections could also indicate that LF does not exhibit aspirated h.
    • When the word "Cajun" is used in a label template, it could refer either to "Acadian" words (i.e. LF words found only among historic Acadian communities) or as an increasingly proscribed synonym for LF. (I have not found the ambiguity helpful as an editor.) So we will have to figure out whether to phase out the label completely or to enforce a more careful usage.
    • It is difficult to format grammatical gender information when Standard and Louisiana French are at odds (which is rather often).
    • Finally, terms labeled "North America" do not show up in LF categories. This is not an editing challenge per se, but it does annoy me.
    Anyway, to answer Nicodene's question, the Louisiana francophone community has increasingly adopted a form of Standard French orthography to represent the language, though some speakers (as they have every right to do) have chosen to write their speech in their own way. The word "she", for example, has been written as elle, alle, al, a', and a (in this example, pronounced [a]). I do not know of a widely accepted alternative orthography.
    Addenda:
    • As I understand it, LF [æ] is an allophone of /ɛ/ that usually occurs before /r/ (or in the case of cher, before a zeroed /r/). I have not heard of any circumstance where [æ] was distinctive in Louisiana. Its indistinctiveness is more apparent in Louisiana Creole, which has a very similar phonetic system: vær [væɾ] ~ vèr [vɛɾ] ~ [vɛ] (all meaning "green", cf. French vert). Likewise, [ɑ] is an allophone of /a/. If I remember correctly, the [a]/[ɑ] merger is even more widespread in Louisiana than it is in France.
    • Liaison is still alive and well in LF (e.g. ils ont [i‿zɔ̃], Grande Île [ɡɾɑ̃‿tɪl]). Cases like z-oiseau and z-haricot are better explained by rebracketing (cf. Louisiana Creole zwazo and zariko, French licorne, English apron).
    • -aill(er) is not unique to Louisiana.
    • après is a perfectly acceptable form of aprés (and is in my experience much more common), though the latter is indeed more phonetic. I can't speak for Haitian Creole, but après as a present progressive marker is also found in Québec.
    Monsuu (talk) 08:58, 2 December 2025 (UTC)

    Shaetlan

    Hello. I am a complete novice on Wiktionary and so I would appreciate some advice on what could turn into quite a large, multi-faceted umbrella project about Shaetlan and related languages.

    For context, I am one of the people behind the Shaetlan language activism group I Hear Dee. Prof. Dr. Viveka Velupillai and I spearheaded an application to have Shaetlan (previously "Shetland dialect") recognised as a language in its own right. As of 15th October 2025, Shaetlan has been assigned an ISO 639-3 code scz, meaning it is now classified separately from Scots.

    The name of the language's article on English Wikipedia is still "Shetland dialect" at the moment; however, I have initiated a discussion to have this changed. My preferred option (the autonym Shaetlan) is unlikely at this stage to win the debate. The discussion is still ongoing, so if anyone would like to chip in about whether it should be “Shetland (language)” or “Shetland language” (or something else), please feel free to contribute to the discussion there.

    I also have a draft which I intend to put in place once a decision is made on the language name on Wikipedia, which I would encourage contributors to this discussion to read.

    My questions will concern a number of different aspects of the categorisation of words on Wiktionary to reflect this new change.

    Shaetlan


    We would like to review existing entries, delete some bogus ones, redirect some existing entries that have uncommon spellings, and try to flesh out the entries and etymologies from our existing dictionary database which Prof. Velupillai has compiled in FieldWorks Language Explorer. Any known automated methods of doing any of this (particularly converting the FLEx data to Wikitext) would be greatly appreciated.

    As the language now has its own ISO 639-3 code (scz) on the basis that it is a mixed language which is related to, but separate from, Scots, we feel it would be most logical to have Shaetlan entries listed under their own Shaetlan heading, and that any existing (Shetland) entries under Scots headings be moved to Shaetlan headings. This will invite the question of which terms are Shaetlan-specific and which are found both in Shaetlan and pan-Scots - advice on how to separate these most easily, on whatever scale is feasible right now, would be appreciated.

    Shetland English


    The same as above - we could do with reviewing, deleting, redirecting and fleshing out. In particular, we need to make the distinction much clearer between what constitutes Shaetlan vs. Shetland English - both varieties co-exist, and will share some terms, but they are not the same. I presume these would be best kept under English main headings and then listed as (Shetland) underneath, as many seem to be already.

    Insular Scots


    Now that it has been demonstrated that Shaetlan is a language in its own right, Insular Scots as a categorisation is increasingly redundant. I would suggest phasing out the Insular Scots categorisation entirely in favour of more precise terminology, i.e. Shaetlan, Orcadian, or both as appropriate. Orcadian, as it does not (yet) have an ISO code, should remain as a subheading under Scots for the time being.

    Norn


    As is to be expected with a mixed language there is considerable overlap between Norn terms and Shaetlan terms. While there are two sizeable Norn dictionaries, the vast majority of the words listed in these dictionaries were collected after Norn is widely considered to have died out, and neither of the writers of these dictionaries (Jakob Jakobsen or Hugh Marwick) claimed to have spoken to any Norn speakers, only Shaetlan / Orcadian speakers. So, these Norn wordlists ought to be considered as primarily Shaetlan / Orcadian terms that derive from unattested Norn forms. Only a very scanty handful of words from Norn conversation fragments, songs, riddles, etc., recited by rememberers, could reasonably be argued to constitute actual attested Norn. I think it would be sensible to try to sort this out as well.

    I also suspect there will be some Nynorn terms that have crept in and been classified as actual Norn - this could do with a check as well. I will stay out of the conversation about whether Nynorn ought to be included as a language here or not, it's not at all my area of expertise!


    While a Shaetlan language Wiktionary project is a tempting long-term goal, I think prioritising the English language resource first would be a more productive place to start with more opportunity for support and to allow us to familiarise ourselves with the tools, rules and procedures.

    I hope what I have proposed above is a sensible course of action - if anyone has any comments or advice I'd gladly hear what more experienced users have to say.— 🐗 Griceylipper (✉️) 23:37, 12 November 2025 (UTC)

    • Regarding Shaetlan:
    Having worked with this language before to a limited extent in the course of evaluating what words from old reference works about Shetlandic might be includable under the then- (and still at this moment-) current treatment of words from it as being either English or Scots, I have thought before that it was odd that it was treated here as being entirely just Scots or English, given its different origins and different vocabulary. I agree there is something here which merits being treated as a separate language.
    It exists in a situation comparable to that of Scots in the sense that, as you note, some texts/words from Shetland are not Shetlandic but just plain English (e.g. quoting from a random Shetland Times article, "...geared up to deal with the thousands of holidaymakers who will now throng the north in search of sleet, drizzle, Beltane Rees and Lammas Spaets? I think we should be told. Meanwhile the truth of the old Bressay adage has been proved yet again: ..."), or are Scots, but just like we nonetheless manage to include the Scots language (spending just a little extra effort to evaluate what texts and words are Scottish English vs Scots), I believe it is possible to include at least some Shetlandic.
    (Because this Wiktionary is written in English and tries to call things by the names they are most commonly known by in English—which may not be autonyms, hence e.g. French and German, not Français and Deutsch—it should probably be entered here as "Shetlandic"; it does not seem like it is commonly called "Shaetlan" in English at this time.)
    The other concern I want to mention (besides distinguishing what is Shetland English or Scots vs Shetlandic) is making sure words and spellings that are added are attested in more places than just one recent language revitalization project's website. In this case, the project is academic and I see Velupillai has also published a book about it and mentions that the language is still used 'in the wild' by some users of sites like Instagram, and there are older texts in or about it, all of which is helpful, but please do note that any words added here need to be attested elsewhere [besides Wiktionary and WMF projects], ideally in print (though living uses of the language online could also be considered); Wiktionary is not currently set up to be a place for people to publish their research, or for speakers to come and enter words they personally know but which are not otherwise attested (although there has been limited discussion about whether or not we could find ways to enable that). This is in part because it's very difficult for us to know who is really a speaker or researcher and trustworthy, and we want to avoid things like that one non-speaker writing lots of Scots Wikipedia articles and Wiktionary entries.
    • Regarding "bogus" entries: once a language code for Shetlandic is added (assuming there are not objections to that), "bogus" entries should generally be tagged with {{rfv|scz}} and then added, via the little "(+)" button that template generates, to WT:RFVE, so people can look for attestations and determine whether they are attested in any language (Shetlandic, English, or Scots). (On a technical note @ our technically adept editors, once/if a language code for Shetlandic is added, if other people do not come here and object, my instinct is that someone should update the {{rfv}} (and rfd, etc?) templates to put Shetlandic RFVs on WT:RFVE, as is done with Scots, since people who edit English or Scots are probably best-positioned to help with Shetlandic RFVs.)
    • Regarding Norn: I would want to check what other reference works and editors think before any changes are made; for now, I would not support making changes to Norn. When people who are promoting one language say "records of this other old language should actually be considered to be not that language but our language", that could be right, but there's also an obvious potential that that perspective might not be in sync with other perspectives.
    - -sche (discuss) 21:51, 19 November 2025 (UTC)
    @Griceylipper @-sche already made reference to some of the problems with the Scots entries, namely that a lot of them were essentially made-up terms created by a language-revitalization enthusiast rather than legitimately verifiable terms. But there are other issues, in that a lot of the entries are just stubs, and I'm pretty sure a lot of them contain incorrect information. Furthermore, there are no active Scots editors on the English Wiktionary, which means no one is able to correct the errors and sort out the remaining made-up terms from the verifiable terms. The poor state of Scots is such that often it is better to look for the equivalent dialectal English entry (since many or possibly most Scots terms are also found in one form or another in northern dialects of English, such as Yorkshire English) than the Scots entry itself on a given term. FWIW, we had a very similar issue with Manx; thankfully we eventually had a knowledgeable Manx editor come along who put in the time to separate out the bogus Manx terms from the legitimate ones, and given this list I went and deleted several hundred (maybe over a thousand) bogus Manx terms. (But the state of Manx is still somewhat poor as well.)
    Minority L2 languages also have an unfortunate tendency to attract incompetent editors who don't understand the English Wiktionary's inclusion criteria, don't know the language in question and don't have any idea how to separate reliable sources from unreliable ones -- but think they're competent, and tend to do drive-by editing in lots of minority languages. This has been a perennial problem occurring over and over again here, to the extent that I've gotten very skeptical of any editor who edits several languages and has even some indications that they don't know what they're doing, and have blocked several such editors who showed no interest in changing their ways after several warnings. So I would suggest you take the sourcing requirements very seriously, and make sure to create entries that not only are properly formatted but have references justifying their existence (it can be in the form of an item, or ideally several such items, under Further Reading; you don't need to put an inline citation by every definition), and ideally quotations from those references indicating the usage. You should consider making a Wiktionary:Shetlandic entry guidelines page before you add any entries, modeling it after some of the other entry guidelines pages that are well-written and relatively complete. @Thadh may be able to point you to some such pages and give you some good advice about pitfalls to avoid, as he works extensively in minority languages and learned the hard way about the deleterious effect of poorly sourced entries (i.e. he was overly excited at first and created a bunch of entries for some languages that he eventually realized were so bad that it was better to delete all entries in the language and start over). Also if there isn't a standard orthography (which I suspect there isn't), you will have to think long and hard what orthography to use and how to handle competing spellings (there isn't a single right answer to this issue). 
Benwing2 (talk) 05:04, 21 November 2025 (UTC)
    @-sche @Benwing2 Thank you both for your replies. I am glad to hear you think Shaetlan as an L2 language on Wiktionary is viable, with some effort. As you may or may not know, the discussion on Wikipedia is very slowly coming to the conclusion that it should be either Shetland (language) or Shetland language. Shetlandic is a controversial term and is widely disliked by native speakers, which is why I am advocating against it myself. I'm hoping any day now someone will close the discussion so I can get on with more edits outside the one article I've been concentrating on.
    I totally appreciate I personally cannot be the sole source of a word only because I am involved with I Hear Dee. I just want to point out very clearly that we (that's me and Prof. Velupillai) are not at all in the business of coining Shaetlan neologisms. For a minority language that's endangered, and has been stigmatised for hundreds of years, coming up with tons of neologisms is a sure-fire way to alienate native speakers. We are far more concerned with trying to create a standard orthography for the language's existing vocabulary, which doesn't really have a widely adopted standard yet. Prof. Velupillai (who, it is worth noting, is an established typologist and contact linguist) and I follow strict typological principles: that is, we aim for consistency, without being over-dogmatic. We use community intuition where there is one, but in the case of clashes or when in doubt, we use either the phonological or the etymological principle. Yes, sometimes that does mean taking a word that has for a long time been spelled commonly in an English-influenced fashion and giving it a spelling more representative of its community intuition in digitalk, etymology, pronunciation, or a compromise of all of the above. But we try to consider all aspects as much as we can while we do this. Sometimes (perhaps unexpectedly) that includes making bizarrely-spelled words a lot more English looking - e.g. John Graham of The Shetland Dictionary spelled a word meaning "a severe gale" as vaelensi, when it really ought to be just valency. We are very open about the fact that we lean very strongly towards pragmatism over purism in our approach.
    Can I also point out that she and I have been made consultants for the Oxford English Dictionary on matters relating to Shaetlan as of 21st August this year (I don't think we are listed as consultants on their website yet, but I have correspondence I can forward regarding this if required).
    It is for this reason that (and I know this is going to sound extremely biased, but) I would use the I Hear Dee orthography for entries here. To compare, a brief overview of Shetland dictionaries, in chronological order:
    • Thomas Edmondston: Very old. Combines Shetland and Orkney vocabulary with very little distinction in one book, so not ideal for our purposes.
    • James Stout Angus: The definitions are generally good, but the entry orthography is bespoke with lots of diacritics that has seen zero adoption by Shaetlan writers or the general public.
    • Jakob Jakobsen: The largest dictionary (over 1,000 pages), very highly detailed, with extremely meticulous (almost obsessive) attention to detail with phonetics, but all "normalised" entries are rendered using Scandinavian principles of orthography. This frequently makes sense for words of Norn origin (which nearly all are, and we will borrow these spellings if it is appropriate), but for words commonly spelled based on strong community intuition that is most familiar with Anglian spelling principles, a word like peerie rendered as piri would be alienating. P.S. I have transcribed 1 of the 2 volumes of Jakobsen's Dictionary on Wikisource. With templates that can expand every single abbreviation. By hand. Lockdown was a potent drug.
    • John Graham: Probably the most popular dictionary, but it makes no claim to being comprehensive and is frequently internally inconsistent to an annoying degree. It is also founded on spurious principles. I respect that effort was made by the author in this dictionary but I don't think it should be held in nearly as high a regard as it seems to be held in by the general public.
    • A & A Christie-Johnson: The most modern and (I would posit) the best print dictionary available. It is a remarkable achievement by amateurs with a keen interest in our language. This is the closest of all dictionaries to our spellings, but again there are some (usually minor) internal inconsistencies and areas where we slightly disagree about rendering phonetic considerations.
    A common problem with these dictionaries is that they all treat Shaetlan as a dialect in that the only words that feature are Shaetlan-specific. We however realise that leaves large gaping holes with all the "boring" connecting words that are commonly used in Shaetlan but aren't frequently identified as being Shaetlan because they often share meanings and spellings with Scots or English. That doesn't however make it not Shaetlan as well - we strive to include as many of these as we can.
    I hope you can appreciate with what I've said above that we are trying our level best to be thorough and not make a mess of things just to be stubborn because we are attached to one particular spelling of a word, or whatever.
    When you mention that a word must be attested elsewhere, does that mean a particular spelling must be attested elsewhere? Or for example, can a usage/meaning be attested in one source with one spelling, and then a different spelling of the same word be used as the standard, consistent entry name with a reference to our own dictionary on I Hear Dee? Just want to make sure I have the rules right.
    I would be inclined to allow social media usage of a term to be permissible in the case that a published print attestation couldn't be located. There is a very healthy corpus of Shaetlan digitalk to be found especially on Facebook (e.g. on the Wir Midder Tongue group). There is even a surprisingly active meme scene in Shetland that has some well-known terms I can think of - e.g. come wi da mince ("hurry up"), or grip his pilly (an exclamation shouted at Up Helly Aa, lit. "grab his penis") - that I doubt would have any print reference.
    In the name of openness and transparency, you mentioned Prof. Velupillai's book Shaetlan: a young language wi aald røts - well, I am the other named author of that book. I haven't yet set up a user page here but I am quite happy to put my name and association with I Hear Dee very clearly on my user page for disclosure. I am a native Shaetlan-speaker - if you want proof, you can listen to a radio programme I recorded with my wife for BBC Radio Shetland (still available on BBC Sounds for a couple of weeks if you can access it in your country!)
    For the code scz and getting all the templates on Wiktionary set up to work with it, what is the process there - do I have to wait till the end of the year until whatever lists it's on from ISO get updated? Or can someone implement it after a discussion here? If so, can that discussion start now?
    Re: treatment of Norn, there are a number of sources which ask the question of what constitutes Norn vs Shaetlan with Norn etymologies, it is a commonly addressed topic:
    Re: drive-by editing - I will say now for the record that my languages of interest are Shaetlan, Shetland English, & Norn. Anything past that (even Lowlands Scots!) is out of my wheelhouse. If anybody catches me editing anything more than a typo on any other language, I've probably been replaced by a doppelganger.
    I am quite happy to adopt strict sourcing requirements. In fact it's something I've wanted to do for a while (collecting lots of refs to dictionaries and other attestations in one place for each entry), so if that's enshrined in the entry guidelines that's fine by me.
    I will gladly accept any advice or reading material from @Thadh or anyone else about existing entry guidelines I could look at or any general advice to avoid problematic zealots causing problems. Scots Wikipedia was a hell of a mess, that's the last thing I want to happen with Shaetlan. — 🐗 Griceylipper (✉️) 13:05, 22 November 2025 (UTC)
    I think I maybe messed up my pings? @-sche @Benwing2 @Thadh (apologies for a double ping if it worked the first time!) — 🐗 Griceylipper (✉️) 13:11, 22 November 2025 (UTC)
    @Griceylipper Thanks for the (extremely) detailed response! You have convinced me that you will take our quality standards seriously. As for adding a new language, we don't have to wait for ISO to do anything. Our language codes are independent of ISO's; we try to use ISO codes whenever possible but many languages in Wiktionary don't have corresponding ISO codes (so we use bespoke codes designed not to conflict with ISO format codes, for future-proofing), and conversely in some cases we've made the decision not to use a given ISO code (e.g. we intentionally have a single Serbo-Croatian language instead of separate Bosnian, Serbian, Croatian and Montenegrin languages). In a few cases (e.g. North and South Levantine Arabic) ISO has merged two codes into one and we've decided to follow along but haven't yet implemented it; in general, mergers are a lot harder than splits to implement for languages with a non-trivial number of entries, and we don't currently have the appropriate SME support to implement the Levantine Arabic merger. If you look at WT:LT you can see a detailed (although incomplete) list of cases where we deviate from ISO language decisions. If @-sche agrees, we can go ahead and add scz as a new L2 under the name "Shetland", although I would still recommend you create the page Wiktionary:Shetland entry guidelines before diving in and creating entries for the language. (Note that it can and likely will be a work in progress that you will update as you go along, but you should try to include as much as possible beforehand, both to ensure consistency of entries and as something to point to in case some other drive-by editor happens to start adding entries.) You can take a look at a few existing guidelines to help you create this; Category:Wiktionary language considerations has the full list of them, which range from the rather spare WT:Afar entry guidelines to the extremely detailed WT:Indonesian entry guidelines. 
It should cover the sort of info you mentioned above: what spelling system to use, what references to use, what counts as Shetland vs Norn, how to format an entry, etc.
    As for the spelling system, IMO you are free to choose whatever spelling system you think is most appropriate. As for attestation, it is fine to normalize terms into a given orthography; they don't need to be attested in the exact spelling you choose as long as it's clear it's the same word. Note also that Shetland is an LDL (Limited Documentation Language; see WT:WDL for the list of "well-documented languages"; all others are LDLs), meaning it suffices to have a single attestation that adheres to the WT:CFI principles (e.g. that it is a reliable source and durably archived). (WDLs like English require three attestations for doubtful words.)
    As for social media attestations, that policy is unsettled at this point. From a technical standpoint, most social media attestations fail the "durably archived" requirement, but as you point out, a lot of words clearly exist but are not attested in print sources. As a result, you will find people quoting from Twitter (aka X) and other social media sources. I would just say, tread lightly here and only quote from social media for words not attestable in any other fashion.
    If I missed any question you have, please let me know and I'll try to answer it to the best of my ability. Benwing2 (talk) 21:37, 22 November 2025 (UTC)
    ──────────────────────────────────────────────────────────────────────────────────────────────────── Re Jakobsen's Norn dictionary, I get the sense that it may be necessary to evaluate words individually, with an eye to whether it's a case where A) Jakobsen has recorded it as a remembered Norn word, without evidence that it's in use in the Shetland language (in which case I don't see a secure basis for automatically assuming that it not only existed in Norn but furthermore was in use as a Shetland word, any more than we could automatically assume a remembered Tequiraca word is used in Spanish, or assume remembered Dama words are Mende, etc), or B) Jakobsen provides examples of the word still being in use in Shetland. Ironically(?), I get a rather different impression in this regard from scholarly literature about the dictionary than from reading the dictionary itself: Reading Barnes' writings, I get the sense that he takes Jakobsen's dictionary of Norn to be describing (remnants of) Norn, and I don't discern, at least in the two papers I read, that he thinks the dictionary instead describes Shetland (in one paper, btw, he presents some other things he regards as true Scandinavian Shetland Norn, and as an aside opines that "Modern Shetland dialect is Scots"); similarly, reading Knooihuizen's paper, I get the impression he is treating Jakobsen's dictionary as recording memories of Norn coloured by Scots (e.g. dropping preaspiration) due to being recorded after Norn ceased to be a living language, but I did not notice anywhere that Knooihuizen thought the words were Shetland words instead of Norn (though perhaps I missed something) . . . even though some of the usage examples in Jakobsen's dictionary itself do seem to show that some of the words were [not just Norn but] in use in a clearly non-Norn/non-Scandinavian language.
    Re spellings: if most of the spellings you're using are also used in other works/sites/etc, I hope people here will be OK with other words being normalized to be consistent. (If most of the spellings you're using are not the spellings that are attested, I would be less supportive, because as I said to someone about Pattani Malay recently, my understanding is that Wiktionary tries to be descriptive of what words and spellings exist, not prescriptive of ones that haven't been adopted yet. But it sounds like the spellings you are using often correspond to elsewhere-attested spellings. Yes?)
    Re using "Shetland" as the name, and "scz" as the code, OK. I will wait a little while longer before adding it, while these remaining issues are discussed.
    - -sche (discuss) 04:45, 30 November 2025 (UTC)
    Also, re the Norn entries that exist on Wiktionary ... as I look through more of them, you are right that they all need to be checked — independent of what language one considers Jakobsen's words to be — because I notice that in many cases it's unclear what the source of them is: brennja, for example, is AFAICT neither the word for fire in e.g. Hildina nor in Jakobsen's dictionary (he has brenna, which I suppose brennja could be someone's alternative orthography of?); "brennja" "Norn" gets no Google Books or Scholar hits and the only hit on the web is a paper on Scribd. (Since there are only 126 Norn entries, I might just try to go through them all and RFV as needed, complementarily to anything you're planning with regard to going through them and moving ones for which Jakobsen has examples of use in Shetland-language sentences...) - -sche (discuss) 06:11, 4 December 2025 (UTC)

    Rename Proto-Saka to Proto-Tumshuqese-Khotanese


    Following {{R:txb:Peyrot:2018}}, {{R:ine-toc:Dragoni:2022}}, and {{R:ine:Bernard:2025}}, I propose renaming Proto-Saka [xsc-sak-pro] to Proto-Tumshuqese-Khotanese [xsc-tkh-pro]. The latter is admittedly more cumbersome, but Proto-Saka(n) tends to be used primarily in a cultural/ethnological sense rather than a strictly linguistic one. For reference, WT:Etymology_scriptorium/2018/January#PII_language_codes. --{{victar|talk}} 08:37, 18 November 2025 (UTC)

    Actually, let's table it, but could someone add Proto-Tumshuqese-Khotanese as a variant name under [xsc-sak-pro]? --{{victar|talk}} 02:49, 19 November 2025 (UTC)
    @Victar Done (a couple of days ago). Benwing2 (talk) 06:56, 4 December 2025 (UTC)

    Rename Kamkata-viri to Katë (and other stuff)


    Following Halfmann (2021) - Terminological Proposals for the Nuristani Languages and Halfmann (2024) - A Grammatical Description of the Katë Language (Nuristani), I propose renaming Kamkata-viri (the designation used by Richard Strand) to Katë. Kwékwlos (talk) 20:22, 25 November 2025 (UTC)

    EDIT: The Nuristani languages shouldn't be classified into "Northern" and "Southern" branches; each language should be a primary branch as per Halfmann (2021). Kwékwlos (talk) 22:51, 27 November 2025 (UTC)

    Renaming "Mobilian" (ISO: mod) to "Mobilian Jargon"


    As explained in Drechsel 2008 (available for download here: [33]), Mobilian "proper" and Mobilian Jargon are, misleadingly, not the same language. The former was a (poly)synthetic vernacular of uncertain origin spoken by the Mobile people in the environs of Mobile Bay. The latter was an analytic pidgin and lingua franca with a mostly Choctaw/Chickasaw lexicon, spoken widely throughout the modern American South as a second language. Wiktionary's nascent coverage of "Mobilian" to this point is entirely made up of Mobilian Jargon entries. I propose a rename for the sake of accuracy and to prevent confusion in the future. In my view, "Mobilian Jargon" is the most appropriate option, as the term is well-represented in the literature and much more widely recognizable than any of the various endonymic variants (e.g. Yama, Yamá, Yamma, etc.). Monsuu (talk) 21:31, 30 November 2025 (UTC)

    (Greater) Binanderean languages


    The Binanderean languages should be treated as a branch, probably still within Trans–New Guinea (for now). Per Smallhorn (2011), this group includes Binandere [bhg], Yekora [ykr], Suena [sue], Zia [zia], Orokaiva [okv], Ewage-Notu [nou] (currently unclassified here), Korafe-Yegha [kpr], Baruga [bjz] and Doghoro [dgx]. Guhu-Samane is considered an isolate marginally related to the Binanderean languages, together forming the Greater Binanderean languages. The proto-language Proto-Binanderean (excluding Guhu-Samane) is reconstructed by Smallhorn (2011). — justin(r)leung (t...) | c=› } 16:59, 3 December 2025 (UTC)