Wiktionary:Beer parlour/2023/November

Vector2022 letter to el.wiktionary - Discussion

A letter was sent to us at el.wiktionary saying that by November 11th a new skin, Vector2022 (like this, or this example), will be applied as the default desktop view.
Discussion in English is ongoing here for everyone to join. Regardless of aesthetics, I am worried about the loss of __TOC__ (placed at all our Appendices and such pages) and very sad to have interwiki links, which we click constantly, hidden in a dropdown. We are now trying to substitute these, manually, or with some Templates. If this skin is intended for wiktionaries too, not only wikipedias, could we ask for wiktionary‑specific modifications? We would be interested in your opinion and support. Thank you. ‑‑Sarri.greek  I 23:53, 1 November 2023 (UTC)

Parsing policy

Why is Wiktionary:Parsing categorized as a Wiktionary policy?  --Lambiam 08:39, 2 November 2023 (UTC)[reply]

You'll have to ask @Koavf, who added the category. PUC 08:49, 2 November 2023 (UTC)
Because I couldn't think of anything better. —Justin (koavf)TCM 08:59, 2 November 2023 (UTC)[reply]
Resolved: now categorized as just Wiktionary.  --Lambiam 17:58, 2 November 2023 (UTC)[reply]

Splitting Ancient Greek

@Mahagaja, Sarri.greek, Saltmarsh (please ping any other interested users)

Currently, Ancient Greek is handled as one (macro)language. This means that while Attic and Homeric Greek have a very good coverage, other lects like Aeolic and Doric are mostly an afterthought. For instance, the inflection tables note that the "dialectal" inflections are discussed in the appendix.

AFAIK, until Koine Greek there was no standardised Hellenic variety at all. Homeric Greek had a lot of influence on the various lects, but everyone mostly wrote in their own vernacular. As such it makes sense to me to split the Ancient Greek lects into major dialect groups, also considering the fact the various lects differ quite strongly. I imagine two scenarios:

  • A very rough division (Arcado-Cypriot, Ionic (incl. Attic), Aeolic, West Greek (incl. Doric)).
  • A more detailed division (Arcadian, Cypriot, Attic, Western Ionic, Eastern Ionic, Thessalian, Boeotian, Lesbian, Doric, Northwestern Greek)

This would help on the following fronts: Most importantly, it would increase the possibilities in covering the various dialects from inscriptions, as well as (Lesbian) Aeolic in Sappho's work or (Eastern) Ionic in Herodotus. It will also make etymological coverage for Tsakonian historically accurate. As a bonus it would also finally give Proto-Hellenic more credibility, and make it much easier to provide descendants in the form of various languages, rather than various dialects of one single language.

I'm eager to hear your thoughts on this. Thadh (talk) 11:57, 2 November 2023 (UTC)[reply]

I don't think splitting grc into multiple languages is necessary to achieve any of those goals. All of your desiderata are achievable with the status quo of having the dialects be etymology-only varieties of Ancient Greek. Splitting grc up would simply unlink the less well covered dialects from the very useful infrastructure (templates and modules) we have in place. If there are ways in which the existing templates and modules are inadequate for the less popular dialects, I think it makes more sense to improve the templates and modules to accommodate them. —Mahāgaja · talk 12:31, 2 November 2023 (UTC)[reply]
The only Ancient Greek dialect which was certainly NOT mutually intelligible with the others is Arcado-Cypriot, which, like Scots compared to English, is very conservative in nature. I think it is the only one which needs to be split.
As for the other four, Attic, Doric, Aeolic and Ionic, they were probably no more different than modern English dialects, with the exception that English orthography is practically the same everywhere. Ελίας (talk) 14:20, 2 November 2023 (UTC)
Splitting the language up would just make things more complicated: more language codes to keep track of and more knowledge of Ancient Greek dialectology required to do simple things like adding a quote. Chuck Entz (talk) 15:08, 2 November 2023 (UTC)[reply]
Splitting would be a nightmare for Greek borrowings in other languages. We would not know which dialect code to use. Vahag (talk) 19:17, 2 November 2023 (UTC)[reply]
I agree with User:Mahagaja. We already have the infrastructure in place for handling several dialects of Ancient Greek; the focus on Attic and Homeric Greek is simply due to the fact that there are a whole lot more sources for these dialects than for the others. Look at the current situation with Scots, which is almost completely neglected; that's what would happen if we split off various of the Ancient Greek dialects. IMO if there's any split that makes sense, it's splitting the later stages of Greek (e.g. Medieval Greek) into a different L2 language, and I know there has already been a discussion about this initiated by User:Sarri.greek, although it didn't end up anywhere. (Note, I'm not expressing a specific opinion on whether this split is the best thing as I don't know enough about Medieval Greek.) Benwing2 (talk) 20:46, 2 November 2023 (UTC)[reply]
@Benwing2. It did not (Medieval Greek, March2023), and I intend to renew the petition once a year, so that I can resume my work (now mainly on Koine and Med.Greek) at en.wiktionary. I hope that en.wiktionary handles 'languages', period phases as well as dialects (which it calls 'languages' too), according to bibliography, not because of the personal interests of editors. The love of wiktionarians for Homer and dialectal Greek is commended, but may I remind you, that an Athenian of the 5th century would comfortably listen to Doric at theatre plays -amidst the Peloponnesian War- (a label marking dialects, I think, suffices). Speaking of phases, the label Koine (grc-koi), also needs some care, because it covers many centuries. Although en.wikt/arians dislike Koine and Medieval Greek, they did exist, and all bibliography accepts it, the variety of opinions regarding only termini. Thank you Sir, for bringing this issue up, it really was a blow to me, the neglect with which it has been suppressed. ‑‑Sarri.greek  I 00:16, 3 November 2023 (UTC)[reply]
@Sarri.greek It is unfortunately common for Wiktionary discussions to peter out with no action taken. Feel free to create another Beer Parlour discussion, and make it a simple request to split Medieval Greek (with an appropriately defined time period) from Ancient Greek. The last discussion was long and I am not sure exactly what the objections were. You might want to state the prior objections and give rebuttals, but the fewer words used, the better, otherwise people are likely to not read it. Benwing2 (talk) 00:24, 3 November 2023 (UTC)[reply]
@Benwing2, sorry: Why is a discussion needed for the obvious? Does en.wiktionary need discussions to handle well-referenced linguistic issues, be they kinds of borrowings, languages, dialects, etc.? My first paragraph at Medieval Greek, March2023 is quite short, very clear, mentions the reference-support, and I was amazed that the blah blah had to drag that far. The sysops of a wiktionary or of a wikipedia need to just take a brief look at the bibliography to get the picture; one does not need to be a specialist on the language. If the sysops of enwikt abstain from taking a look on the grounds that they are not specialists, it will never, ever be implemented. If there were non-anonymous, professional consultants for wikiprojects, discussions would deal only with tech matters and the details of implementing things. ‑‑Sarri.greek  I 00:38, 3 November 2023 (UTC)

We should choose a Word of the Year

Choosing a Word of the Year seems to be a popular dictionary tradition. The problem is that most of the picks are godawful, either being some neologism that no one has heard of (goblin mode, lol) or having no apparent relationship to what actually happened in the last year. I think the dictionary world needs some people who can take this job seriously.

My proposal is generative, reflecting the sophisticated generative AI models released throughout late 2022 and early 2023, such as: ChatGPT (November 2022), GPT-4 (March 2023), and DALL-E 3 (August 2023). Google Trends data shows a significant increase in searches for "generative" and other AI-related terms throughout 2023 [1].

Do you guys think this would be a good idea? Pinging @Sgconlaw, Lingo Bingo Dingo. Ioaxxere (talk) 20:13, 3 November 2023 (UTC)[reply]

Agreed. It can be based on some actual data like increased percentage of views. Maybe a top 10 or unranked list of five? —Justin (koavf)TCM 20:31, 3 November 2023 (UTC)[reply]
Interesting. I have a few questions:
  • Who chooses the word? Is there to be a panel, or is the word to be voted on? Or is it to be based on actual data, as @Koavf suggests?
  • Presumably the word has to have gained currency in the preceding year? Or, to put it another way, should Word of the Year 2023 be featured in 2024?
  • On what date does the word get featured?
Sgconlaw (talk) 20:34, 3 November 2023 (UTC)[reply]
@Sgconlaw, Koavf I think the word should be chosen by a WT:VOTE in which anyone can nominate candidates. The winner can be featured on the main page around late December to early January. Ioaxxere (talk) 19:11, 4 November 2023 (UTC)[reply]
Good thinking. —Justin (koavf)TCM 19:12, 4 November 2023 (UTC)[reply]
@Ioaxxere: I suppose the Word of the Year will be featured somewhere on the Home Page? We'll need to think about the layout of the page and where the WOTY will appear—above the WOTD box, or elsewhere? Will it stay up for a whole year? — Sgconlaw (talk) 20:09, 4 November 2023 (UTC)[reply]
No, I meant that it would only be featured around late December to early January. Ioaxxere (talk) 20:31, 4 November 2023 (UTC)[reply]
It seems a fun idea and I encourage you to pursue it, but I am not going to take part in setting it up or choosing words. I do have an alternative proposal for a WotY: jailbreak, which is in my opinion a lexically more interesting word than generative. ←₰-→ Lingo Bingo Dingo (talk) 20:36, 4 November 2023 (UTC)[reply]
We could ask ChatGPT for suggestions, were it not for the 2021 cutoff date :) Jberkel 20:56, 4 November 2023 (UTC)
Ok, I've created Wiktionary:Votes/2023-11/Word of the Year Ioaxxere (talk) 22:50, 5 November 2023 (UTC)[reply]
@Ioaxxere: it probably doesn’t need to be that formal a vote. An ordinary vote here at the Beer Parlour is sufficient. — Sgconlaw (talk) 23:06, 5 November 2023 (UTC)[reply]
I agree that it doesn't have to be that formal, but I think it's fine to keep it as a formal vote. It will give it more prominence, since it will show up on everyone's Watchlists. Andrew Sheedy (talk) 23:13, 5 November 2023 (UTC)[reply]

incipient edit war on Module:ar-headword

Module:ar-headword has long had the ability to mark a personal/non-personal distinction on nouns, since it affects the agreement and pluralization patterns (non-personal nouns take feminine singular agreement in the plural and often use different plural forms). User:Fenakhay removed this functionality without explanation, and when I asked them why, they gave no justification other than "doesn't make sense". I undid this as a contentious change made without consensus, and Fenakhay reverted my undo, claiming that the onus is on me to find consensus to undo his change. AFAIK this isn't at all how Wiktionary consensus works; the onus is on the person making the change to seek consensus if the change is controversial. In my view, this information is useful and important, and similar to the animacy marking in Slavic languages (compare also Romanian, which has a class of gender-changing nouns that are marked as "neuter" on the lemma). Fenakhay thinks this info is not useful mainly based on the fact that it's typically not marked in Arabic dictionaries (but from what I've seen, Arabic dictionaries are deficient in many respects compared with the best dictionaries of other major inflected languages, and leave out lots of info useful for non-native speakers). Benwing2 (talk) 01:26, 4 November 2023 (UTC)

No Arabic dictionary, be it monolingual or bilingual, marks “animacy” on words. The addition is unjustifiable. Non-natives making stuff up and reinventing how Arabic gender is listed because they read it in a grammar book... Typical. — Fenakhay (حيطي · مساهماتي) 01:31, 4 November 2023 (UTC)
Furthermore, it is not about “animacy” but about being sentient or not. For anything that's not sentient, including animals, the adjective is inflected in the feminine singular. For example: تِلْكَ ٱلْكِلَابُ ٱلْحَمْرَاءُ تَنْبَحُ (tilka l-kilābu l-ḥamrāʔu tanbaḥu, “those red dogs bark”). As you can see, the adjective أَحْمَر (ʔaḥmar) is inflected in the feminine singular, as are the determiner تِلْكَ (tilka) and the verb itself.
If a learner wants to know if a word refers to a sentient or non-sentient, they only need to ask themselves if the referred is a person or not. — Fenakhay (حيطي · مساهماتي) 01:48, 4 November 2023 (UTC)[reply]
"Sentient" is another word for "person". "Animacy" may not be the right word, but there are different levels, e.g. Polish, Ukrainian, etc. have "inanimate/animate/person" (a three-way distinction), as opposed to only "inanimate/animate" in Russian, etc. We can discuss terminology. Anatoli T. (обсудить/вклад) 02:13, 4 November 2023 (UTC)
Notifying Arabic editors: (Notifying Alarichall, Atitarev, Benwing2, Mahmudmasri, Erutuon, عربي-٣١, Fay Freak, Assem Khidhr, Fixmaster, Roger.M.Williams, Zhnka, Sartma): Fenakhay (حيطي · مساهماتي) 01:34, 4 November 2023 (UTC)[reply]
Adding grammatical information is a plus, especially if it helps to determine how words are used in a sentence. Native speakers may find it intuitive but I don't know if the person/non-person agreement is never taught at school in Arabic speaking countries.
Let's look at these examples (yes, from a grammar book in English)
Persons:
  1. الْمُعَلِّمُونَ مُجْتَهِدُونَ (al-muʕallimūna mujtahidūna, “the teachers (m-p) are diligent”), personal pronoun: هُمْ (hum)
  2. الْمُعَلِّمَاتُ مُجْتَهِدَاتٌ (al-muʕallimātu mujtahidātun, “the teachers (f-p) are diligent”), personal pronoun: هُنَّ (hunna)
Non-persons:
  1. الْأَقْلَامُ جَدِيدَة (al-ʔaqlāmu jadīda, “the pens (m-p) are new”), personal pronoun: هِيَ (hiya)
  2. الطَّاوِلَاتُ كَبِيرَة (aṭ-ṭāwilātu kabīra, “the tables (f-p) are big”), personal pronoun: هِيَ (hiya)
The adjectives for non-persons in the plural take the feminine singular form.
Another example:
السُّودُ (as-sūdu, the blacks), الْبِيضُ (al-bīḍu, the whites) - these can only refer to humans
I don't quite know what the Arabic gender structure was and is now at Wiktionary but I think we need to distinguish persons from non-persons.
(By the time I've typed my answer, I see new edits appeared) Anatoli T. (обсудить/вклад) 02:06, 4 November 2023 (UTC)[reply]
It is a simple equation:
  • if WORD1 (in the plural) refers to a human being, then the adjectives are inflected according to the gender/number of the word.
  • if WORD2 (in the plural) refers to a non-human; be it an object, a concept or an animal, then the adjectives are inflected in the feminine singular.
Sorry but this is a grammar rule and doesn't add any information to the word itself. It is not rocket science. — Fenakhay (حيطي · مساهماتي) 02:12, 4 November 2023 (UTC)[reply]
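The rule stated above can be sketched in code. This is only a minimal Python illustration, not anything taken from Module:ar-headword; the `Noun` class, its field names, and `adjective_agreement` are hypothetical names chosen for the example, and the dual is ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    """Hypothetical record for a noun form; not a Wiktionary data structure."""
    gender: str     # "m" or "f"
    number: str     # "s" (singular) or "p" (plural); dual is ignored here
    personal: bool  # True if the noun refers to human beings


def adjective_agreement(noun: Noun) -> tuple[str, str]:
    """Return the (gender, number) that an agreeing adjective takes."""
    # Plurals of non-human nouns (objects, concepts, animals) take
    # feminine singular agreement.
    if noun.number == "p" and not noun.personal:
        return ("f", "s")
    # Otherwise the adjective follows the noun's own gender and number.
    return (noun.gender, noun.number)


teachers = Noun(gender="m", number="p", personal=True)   # al-muʕallimūna
pens = Noun(gender="m", number="p", personal=False)      # al-ʔaqlāmu
print(adjective_agreement(teachers))
print(adjective_agreement(pens))
```

Under these assumptions the only input the rule needs beyond gender and number is the personal/non-personal flag, which is the piece of information the thread is debating whether to record.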
@Fenakhay. No rocket science, true but I find it useful what agreement to use dependent on the sense. Like @Thadh also mentioned below, it depends on the sense. The same applies to Slavic languages for many words (not trying to make Slavic and Semitic similar to each other but I find similarities) Anatoli T. (обсудить/вклад) 02:15, 4 November 2023 (UTC)[reply]
We are taught about عَاقِل (ʕāqil, sentient) and غَيْر عَاقِل (ḡayr ʕāqil, non-sentient) in school. — Fenakhay (حيطي · مساهماتي) 02:13, 4 November 2023 (UTC)[reply]
@Fenakhay: Thanks. Do you think labelling "sentient/non-sentient" is inappropriate in the Arabic headword (more than one if that's a case for specific words)? "person/non-person" is just another way of expressing the same thing, which is also used in grammar books. Anatoli T. (обсудить/вклад) 02:19, 4 November 2023 (UTC)[reply]
I'm told that this personal/non-personal distinction in verbal/adjectival agreement is always evident from the noun's meaning, and not inherent to a lemma by itself; In that case, it seems like something we might put as a note in inflection tables, but I don't think it needs to be added to headwords. As for plurality patterns, that doesn't seem like a strong enough argument by itself. Thadh (talk) 01:42, 4 November 2023 (UTC)[reply]
It can be left out in the plural, marking all, even the animates, as pl, but editors unknown to us will feel a need to be more explicit (I know the edit patterns of casual site visitors) and will, for example, mark inanimate plurals as feminines. To avoid inconsistencies and have clear models, it is sensible that we have specific gender markers at plural POS. While singular entries typically have enough noise, I wouldn’t want to add on every page “hey, did you know that plural forms of inanimate nouns agree with feminine singular forms in Arabic?” Fay Freak (talk) 02:34, 4 November 2023 (UTC)
The declension table shows "sound masculine plural"/"sound feminine plural" for sentient (person) nouns. The gender labels can only be applied to sentient (person) nouns to avoid too much "noise".
It can be compared to Czech nouns where only masculine nouns differ by animate/inanimate. The animacy for feminines/neuters is unimportant (no grammatical changes). Anatoli T. (обсудить/вклад) 02:42, 4 November 2023 (UTC)[reply]
@Benwing2: You have not answered my question implied on Fenakhay’s talk page whether after the removal the site uses less Lua memory or processing time. Since I am not invested in essentialist dogmatic distinctions, the consideration that there were only about ten pages, out of myriads, an amount of pages that for Wiki pages necessarily constitutes an “error margin” conditioned by the negligence resulting from project participation being voluntary, that were actually using the removed genders, in combination with the computing principle of toning down complexity, favouritises the removal. I am much less concerned with what would “make sense” abstractly than you might expect: though comparative conceptualization is attractive, the implementation’s predictable effect upon and explainability to occasional readers and editors is of concern. If there are some instructions that can be chosen for a template then one needs to understand what one would try to achieve on pages with it, what to signify to readers, otherwise it is not “useful”. If it were “useful info”, man would have marked it, isn’t it? The entries do not appear to be adrift of accurate, exhaustive grammatical information. Didn’t feel a need nor note and suddenly Benwing opened our eyes that without marking the genders by the theoretically envisioned method we were missing out on something all the time? Here I am concerned with not making entries overfraught with information few uninvited in our particular circle would understand—such as claiming other genders than feminine and masculine in the singular. The point to make, which eventual editors also attempt to make at those pages no matter our choice, is most appropriately noted at plural entries: the being a plural of something but agreeing with feminine singular vs. the being masculine plural and the being feminine plural. Fay Freak (talk) 02:18, 4 November 2023 (UTC)[reply]
I support the view that sentient/non-sentient (person/non-person) should be added as an option to the headword/tables or usage notes, even if it hasn't been regularly done. We could say that Slavic word animacy is not important either. Come on, it's common sense, right? (:sarcasm:)
I will comply with whatever is decided, though. It bothers me, also that we are marking non-sentient plural nouns as "plural", which is kind of misleading. To me, it seems we have to distinguish three types of plurals, which govern different adjectives, verbs and pronouns. Anatoli T. (обсудить/вклад) 02:29, 4 November 2023 (UTC)[reply]
Honestly I don’t really know where the marking non-sentient plural nouns as “plural” comes from, somewhen I recognized it as the correct thing. (Many old entries mark inanimates falsely as m-p or f-p after this rule.) Fay Freak (talk) 02:37, 4 November 2023 (UTC)[reply]
I would use "m-p", "f-p" and something like "np" for non-sentient nouns. (Was it used before?) Anatoli T. (обсудить/вклад) 02:45, 4 November 2023 (UTC)
@Fay Freak I am trying to understand your comment, but the difference in processing speed and memory between having the extra gender distinctions and not having them is negligible. Benwing2 (talk) 02:56, 4 November 2023 (UTC)[reply]
Actually, @Fay Freak, how should we mark non-sentient plural nouns in your opinion? Should the gender be marked? This has been raised several times. Also, knowing what the original gender (in the singular) of those nouns was seems irrelevant grammatically. In fact, for some words I came across, it may be impossible or difficult to determine if they are feminine singular or (non-sentient) plural. Anatoli T. (обсудить/вклад) 02:55, 4 November 2023 (UTC)[reply]
Probably plural-inanimate (with some abbreviation), this would make learners more aware of the agreement, and fewer editors would make the mistake of marking as feminine singular, as—due to the morphologic relations which a language user is aware of, and to avoid the claim of masculine inanimates switching their gender in the plural—I would prefer to say it is not feminine singular; technically it means that the verbs and adjectives used with the inanimate plurals are also not feminine singular but of the same inanimate plural gender having form syncretism with feminine singular but we won’t muddle the tables with this observation. I know those cases where one is unsure whether something is plural of something or just an alternative form and/or by itself a singular, this is no specific problem, since in such cases there are also masculine singular inanimates. Fay Freak (talk) 03:09, 4 November 2023 (UTC)[reply]
Animacy is culture-specific, and the Slavic languages do not allow for personal beliefs. ("Я съел вкусный зайца" would be ungrammatical regardless of whether you think a hare is animate). From what I understand, in Arabic this isn't the case, and the speaker does decide whether to assign animacy (/sentiency) to a noun or not. If I misunderstand, do tell me, because in that case I will change my opinion above. Thadh (talk) 03:32, 4 November 2023 (UTC)[reply]
@Thadh: In Arabic, the usage is also quite grammatical. Please see the simplest Arabic examples I used in my post above. "sound masculine plural" would be inappropriate for e.g. non-sentient nouns (non-humans). The distinction is not between animate/inanimate but between persons and non-persons (animals fall into the same category as things). Anatoli T. (обсудить/вклад) 03:38, 4 November 2023 (UTC)[reply]
I don't doubt non-persons cannot be agreed to with personal markers, but is the other way around also true? If a word is not clear to be a person or not (e.g. mythical creatures)? Thadh (talk) 04:06, 4 November 2023 (UTC)[reply]
@Thadh The situation in the Slavic languages is not quite so clear-cut, AFAIK. The Russian terms for things like "bacteria" and "virus" may or may not be animate, depending on the speaker (e.g. scientists tend to view the terms as animate, others mostly not) and Czech is known to have a large number of "facultative animates" (things like mushrooms that may or may not be considered animate, depending on the speaker, and things like salami that are clearly inanimate but nonetheless treated as animate by some speakers). Benwing2 (talk) 04:56, 4 November 2023 (UTC)[reply]
@Thadh: The other way around is also true.
  1. In هٰؤُلَاءِ أَوْلادٌ (hāʔulāʔi ʔawlādun, “these are boys”), هٰؤُلَاءِ (hāʔulāʔi, these) (m. pl) can only refer to sentient (rational) nouns.
  2. In هٰذِهِ كُتُبٌ (hāḏihi kutubun, “these are books”), هٰذِهِ (hāḏihi, this) can refer to any feminine singular or non-sentient (irrational) plural nouns.
The plurals of non-sentient nouns are treated as feminine singular. They use the pronoun هِيَ (hiya, she), which also means "they" for non-sentient plurals.
Native speakers may shed more light on how mythical creatures are declined but will it make a difference for this discussion? Slavic languages also have corner cases. Anatoli T. (обсудить/вклад) 05:34, 4 November 2023 (UTC)[reply]
Just adding a separate perspective as an Anglophone student of Arabic. I don't have great expertise in the language and I use Wiktionary a lot when reading Arabic because it is more informative and more easily navigable than traditional dictionaries. (I mostly edit when I try to look up a word in Wiktionary and realise that a word or a sense is missing.) I really appreciate having as much grammatical information as possible: the transliterations into Latin script, full inflection tables, information about gender, etc. Wiktionary is unlike traditional dictionaries in providing all these. It was especially useful to me at the beginning of my Arabic-studying journey: Wiktionary helped me stick with learning Arabic, and this in turn has encouraged me to keep contributing to Wiktionary. So although I don't have particularly well informed opinions about marking sentience, I would generally encourage including and keeping information that may seem obvious to native-speakers but is not obvious to students—including and perhaps especially total beginners. Alarichall (talk) 08:01, 4 November 2023 (UTC)[reply]

How consensus works

An important point is being missed in this discussion. I'd like to get clarity on this. If a module, template or other practice has been stable for a long time, then any contentious change needs consensus before the change is made, and should be left in the status quo until consensus is achieved to change it. User:Fenakhay seems to disagree with this principle, based on their consistent attempts to force through the change being discussed above, and their serial reversions of my undos. Fenakhay claims as justification for this change that "there was no vote when the functionality was originally added", which seems quite spurious, as there rarely is such a vote. As an example, there was certainly no vote that led to the current state of Latin verbs using "I" forms, but the practice has long been stable, hence I am seeking consensus in the BP to change this. Similarly there was no vote that led to Ancient Greek being treated as a single L2 rather than several dialect-specific L2's, and User:Thadh rightly created a BP discussion instead of unilaterally introducing changes and then demanding that anyone wanting to undo the change needs consensus to do so. Benwing2 (talk) 03:09, 4 November 2023 (UTC)[reply]

I agree with the principle that any such changes (provided there is either an active community for the language, the language has a large amount of readers, or the editor in question isn't an editor of the language), in 'core' matters, including headwords and language treatment, should be discussed first. Thadh (talk) 03:26, 4 November 2023 (UTC)[reply]
He thought it is not contentious. It is inflammatory to claim he seems to disagree with the principle. Since practical use in the future was not demonstrated, on the contrary. The discussion has become theoretical by large now since no one is realistically hindered in expanding upon our Arabic entries. Accurate though that the reasoning formulated was spurious. Accurate also that, for this but theoretical effect of the particular present state of the module, there is a negligible status quo bias in favour of the previous module’s state, which would of course be changed anyway if you turn out to have the better view, after the discussion which no one has been prevented from kicking off if he care, this is surely a consideration when someone is WT:BOLD. I understand you cherished your own work and intellectual input that went into the module; if you actually planned to use the contested features within the next days it would be a different matter, but this is not the case, hence we are rightly apathic to whether your version or Fenakhay’s edits stay in the near future: We are making consensus now, and whether or not one of you two gets the provisory last word—you could edit war on it and nothing would change in the world, futile! Again, we try to think what people would realistically use in the entries. Fay Freak (talk) 03:32, 4 November 2023 (UTC)[reply]

Does 'terms borrowed back into LANG' include cases where the borrowing was from an ancestor?

I am cleaning up remaining cases where 'twice-borrowed terms' occurs, since the category has been renamed. There are, for example, 57 cases in CAT:Greek twice-borrowed terms, all of which appear to have the category added manually and where the chain of borrowing was typically Greek <- Ottoman Turkish <- Ancient Greek. Do these count as "borrowed back into Greek" terms? Similarly, there are several French terms borrowed from English which ultimately were borrowed from Old French. Do these count as "borrowed back into French" cases? Yet another example is wasei kango terms (Japanese coinages made from Chinese words) that are borrowed back into Chinese (we have around 100 of them). Most Japanese borrowings of Chinese words occurred during Middle Chinese, yet the {{wasei kango}} template considers them 'borrowed back into Chinese' terms and adds the category manually. If we do consider these to be "borrowed back into" terms, this should be handled automatically, and either way, we should remove the manually added categories (ignoring cases similar to fakaleitī, where the etymology is incomplete so the category wouldn't get added automatically). Benwing2 (talk) 05:57, 4 November 2023 (UTC)

Hmm... on one hand, if we say these don't count, then it's kind of arbitrary that terms from ancient Hebrew or ancient zh borrowed via another language back into modern Hebrew or Chinese can be categorized, whereas terms from ancient Greek borrowed back into modern Greek can't, just because we previously and unrelatedly decided it was most practical to handle Ancient and modern Greek under separate L2s, but ancient and modern Hebrew and Chinese under (mostly) one L2 apiece. And a term that went from early Middle English to e.g. (middle) French to late Middle English can be categorized, but a term that went from late Middle English to (middle) French to Early Modern English can't be (which is, again, arbitrary). It means decisions about whether it makes sense to handle two different languages under one L2 will start being influenced by whether people want to be able to consider the language(s) to have twice-borrowed terms, which seems undesirable.
On the other hand, if we say these do count, do we have a "cutoff mechanism", so that we're not considering a term that went "PIE → Latin → English" to have been "borrowed back into English"? (That's not a rhetorical question; do we already have some module in which we record that "Old English, Middle English, modern English" count as stages of 'a language' in a way that "Proto-Indo-European, Proto-Germanic, English" don't? It seems plausible we might.) - -sche (discuss) 06:50, 4 November 2023 (UTC)[reply]
@-sche That's a very good question that I didn't think of. AFAIK we don't have a built-in way currently of specifying that e.g. Old English is an earlier stage of English from this perspective whereas the ancestor of Old English (Proto-West-Germanic) is not. We do have a distinction between object inheritance (which represents an "is-a" relationship, e.g. US English is a kind of English, Mandarin Chinese is a kind of Chinese) and ancestrality (Middle English is an ancestor of English, and Old Italian is an ancestor of Italian even though it's also an etym-language variant of Italian). However, the ancestrality chain for English goes all the way back to PIE. I do think this can be determined automatically in most cases by looking for shared words at the end of the language name, and this accords with most people's sense of "early stage of a language": English and Old English share a word at the end, and Western Neo-Aramaic and its ancestor Aramaic share a word at the end assuming hyphens separate words, whereas English and Proto-West-Germanic don't. Benwing2 (talk) 07:00, 4 November 2023 (UTC)[reply]
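The name-based heuristic described above could be sketched roughly as follows. This is only an illustration in Python, not existing module code; `last_word` and `shares_final_word` are hypothetical helper names, and hyphens are treated as word separators, as the comment assumes.

```python
def last_word(language_name: str) -> str:
    """Return the final word of a language name, treating hyphens as spaces."""
    return language_name.replace("-", " ").split()[-1].lower()


def shares_final_word(language: str, ancestor: str) -> bool:
    """Heuristic: an ancestor counts as an 'earlier stage' of a language
    if both names end in the same word."""
    return last_word(language) == last_word(ancestor)


print(shares_final_word("English", "Old English"))
print(shares_final_word("Western Neo-Aramaic", "Aramaic"))
print(shares_final_word("English", "Proto-West-Germanic"))
```

With the examples from the comment, the first two pairs match and the third does not. Because the check depends only on how languages happen to be named, renaming a language would change its result.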
@Benwing2 I'm not a fan of that approach, because it's still totally arbitrary: Buryat and Mongolian are both descendants of Classical Mongolian, but your approach would only apply between CM and Mongolian, not CM and Buryat. The only reason we consider one to be Mongolian and the other not is for historical and political reasons, and if we renamed Mongolian to Khalkha (which we very plausibly could) then suddenly it would change the status of all these terms. You could make the same argument for all the Langues d'oïl other than French with respect to Old French, for example. One of the strengths of the current set-up is that it gets around the issue of which language is the "true" main descendant, and I'd oppose adding it in. Theknightwho (talk) 07:06, 4 November 2023 (UTC)[reply]
@Theknightwho There's also the practical issue that there's no way of distinguishing "back-borrowings" between A and B and regular borrowings using templates such as {{der}} or {{bor}}. Either we'd need to create an explicit {{bbor}} = "back-borrowing" or similar, or we'd have to make a new version of {{der}} that can have multiple levels of the chain inside its parameters. For example, replacing the following:
From {{inh|en|enm|orenge}}, {{m|enm|orange}}, from {{der|en|fro|pome orenge|t=fruit orange}}, influenced by the place name {{m|en|Orange}} (which is from Gaulish and unrelated to the word for the fruit and color) and by {{der|en|pro|auranja}} and calqued from {{der|en|roa-oit|melarancio}}, {{m|it|melarancia}}, compound of {{m|it|mela|t=apple}} and {{m|it|[[un]]'[[arancia]]|t=an orange}}, from {{der|en|ar|نَارَنْج}}, from Early {{der|en|fa-cls|نارنگ|tr=nārang}}, from {{der|en|sa|नारङ्ग|t=orange tree}},<ref name="OnlineED">{{R:Online Etymology Dictionary|entry=orange}}</ref> from {{der|en|dra-pro|*nār-}} (compare {{cog|ta|நார்த்தங்காய்}}, compound of {{m|ta|நரந்தம்|t=fragrance}} and {{m|ta|காய்|t=fruit}}; also {{cog|te|నారంగము}}, {{cog|ml|നാരങ്ങ}}, {{cog|kn|ನಾರಂಗಿ}}).
We'd have something like this:
From {{der|en|<<inh:enm:orenge>>, {{m|enm|orange}}, from <<ibor:fro:pome orenge<t:fruit orange>>>, influenced by the place name {{m|en|Orange}} (which is from Gaulish and unrelated to the word for the fruit and color) and by <<der:pro:auranja>> and <<ical:roa-oit:melarancio>>, {{m|it|melarancia}}, compound of {{m|it|mela|t=apple}} and {{m|it|[[un]]'[[arancia]]|t=an orange}}, from <<ibor:ar:نَارَنْج>>, from Early <<ibor:fa-cls:نارنگ<tr:nārang>>>, from <<ibor:sa:नारङ्ग<t:orange tree>>>,<ref name="OnlineED">{{R:Online Etymology Dictionary|entry=orange}}</ref> from <<ibor:dra-pro:*nār->> (compare {{cog|ta|நார்த்தங்காய்}}, compound of {{m|ta|நரந்தம்|t=fragrance}} and {{m|ta|காய்|t=fruit}}; also {{cog|te|నారంగము}}, {{cog|ml|നാരങ്ങ}}, {{cog|kn|ನಾರಂಗಿ}})}}.
The basic idea is that you can stuff an entire sentence into the second parameter of {{der}} (or whatever), and inheritance/borrowing/calque/etc. relationships are placed inside of <<...>>, similar to {{place}}. The variants ibor:, iinh:, ical:, etc. stand for "indirect borrowing", "indirect inheritance", etc. and indicate that the term in question is borrowed/inherited from the preceding-specified term; this lets the code have access to the full etymology tree, meaning it can do things like automatically find back-borrowings and other interesting phenomena. Benwing2 (talk) 07:45, 4 November 2023 (UTC)[reply]
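To make the proposed notation concrete, here is a minimal sketch of how the `<<type:lang:term<mod:value>>>` segments could be pulled out of such a sentence into an ordered chain. Everything in it is an assumption for illustration: the regular expressions, the `Step` record, and the rule that a leading `i` on a known relationship code marks it as indirect are not taken from any existing module.

```python
import re
from dataclasses import dataclass, field

# Assumed set of direct relationship codes; "ibor", "iinh", "ical", "ider"
# are then read as their indirect variants.
DIRECT_TYPES = {"inh", "bor", "der", "cal"}

# <<type:lang:term>> optionally followed by <key:value> modifiers before the final >>
SEGMENT_RE = re.compile(r"<<(\w+):([\w-]+):([^<>]+?)((?:<\w+:[^<>]*>)*)>>")
MODIFIER_RE = re.compile(r"<(\w+):([^<>]*)>")


@dataclass
class Step:
    relation: str          # e.g. "inh", "bor", "cal"
    indirect: bool         # True if the step hangs off the preceding term
    lang: str              # language code, e.g. "fro"
    term: str
    modifiers: dict = field(default_factory=dict)


def parse_chain(text: str) -> list[Step]:
    """Return the etymology steps found in `text`, in order of appearance."""
    steps = []
    for rel, lang, term, mods in SEGMENT_RE.findall(text):
        indirect = rel.startswith("i") and rel[1:] in DIRECT_TYPES
        steps.append(Step(
            relation=rel[1:] if indirect else rel,
            indirect=indirect,
            lang=lang,
            term=term,
            modifiers=dict(MODIFIER_RE.findall(mods)),
        ))
    return steps


sample = ("From <<inh:enm:orenge>>, from <<ibor:fro:pome orenge<t:fruit orange>>>, "
          "from <<ibor:dra-pro:*nār->>")
for step in parse_chain(sample):
    print(step)
```

With the chain available as data, a later pass could walk it to look for a language that reappears further down, which is the kind of automatic back-borrowing detection mentioned above.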
@Theknightwho If we do include back-borrowings of this sort, I would rephrase it not as "what is the (single) true descendant of a given language" but "how far up the chain do earlier stages go"? That means that e.g. Scots and English (since we treat them as separate L2's) could both have Middle English and Old English as earlier stages, but not Proto-West-Germanic, and similarly the various modern Oïl languages would all have Old French as an earlier stage but not Proto-Gallo-Romance. Benwing2 (talk) 07:55, 4 November 2023 (UTC)[reply]
@Benwing2 I feel like the natural cut-off is to only include attested languages, but that may be too broad. Theknightwho (talk) 08:45, 4 November 2023 (UTC)[reply]
@Theknightwho Does that mean Latin counts as an earlier stage of French? Benwing2 (talk) 08:58, 4 November 2023 (UTC)[reply]
Well I suppose it is, and I suppose it’s somewhat interesting to see borrowings back and forth between language families: compare Old/Middle Chinese terms borrowed into Old/Middle Japanese, where the Japanese descendant has been borrowed into Mandarin. Intuitively, those seem notable to me. Theknightwho (talk) 09:15, 4 November 2023 (UTC)[reply]
Any attempt at automation runs into the problem of determining when a given language started. This can’t reliably be determined by their conventional names, as mentioned in the above discussion. It may be best to simply let the status quo stand, leaving users free to decide this on a case-by-case basis. Nicodene (talk) 11:21, 4 November 2023 (UTC)[reply]
Here is a suggestion for a cutoff mechanism excluding “PIE → Latin → English”. Consider a borrowing pattern A → ... → B → C, in which A is an ancestor of C. If A begat another descendant D before the term completed the leg B → C of its interlingual trip, where D is considered a genuinely different language from C, not a kind of dialect of C, this is a cutoff for the notion that the term was borrowed “back” (Toto, I’ve a feeling we're not in A-land anymore). There will remain cases that are on the fence (what is “genuinely different”?), but this excludes most of the obvious cases (PIE *mel- → ... → French mal → English mal) while allowing Ancient Greek κουκκούμιον (koukkoúmion) → Ottoman Turkish گوگم → Greek γκιούμι (gkioúmi) and Middle Dutch bolwerc → French boulevard → Dutch boulevard.  --Lambiam 16:37, 7 November 2023 (UTC)[reply]
Spitballing: have an extra parameter for each language like isStageOf so e.g. ang would be set to enm, and enm to en and sco. Alternatively, store this in a separate module* only {{bor}} et al. access, so it doesn't inflate the size of the module that {{l}}, {{lb}}, {{head}} et al. access. What to consider a stage of what is subjective in places, but I don't think avoiding automating it avoids the problem, since we still need to know whether it's right if an editor manually categorizes a term, so people don't (intentionally, or even unawarely) edit-war over it.
For my part, I'm not sure I would consider Latin to be just an earlier "stage" of French, because Latin split into so many languages and French is not considered the "Modern Latin" (actual la-Latin is). So it'd be useful for us to decide that, regardless of whether we're categorizing manually or by module.
The question also extends to descendants of French, English, etc: if a term in Middle English was borrowed into (middle) French, then borrowed from modern French by Jamaican Creole, was it "borrowed back into Jamaican Creole"? I'm inclined to say no. OTOH an edge case like "term used in colonial-era English texts from Jamaica, borrowed into another unrelated language there, and then borrowed by Jamaican Creole" is the sort of thing I'd suggest allowing manual categorization of.
*In a separate module, each chain could also be separate, if other people actually do want to categorically allow any English term borrowed into another language and then into Jamaican Creole to count as twice-borrowed, and of course allow an Old English term borrowed into [stages of] French and then back into English to count as twice-borrowed, but don't want to consider an Old English term borrowed into French and then into Jamaican Creole to be twice-borrowed. Just have one chain "ang, enm, en" and another "en, jam", and {{der|jam|ang}} would see that no chain contained both "jam" and "ang" and so not count it as 'borrowed back'. - -sche (discuss) 15:16, 4 November 2023 (UTC)[reply]
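A small sketch of the separate-chains idea, assuming the data is stored as plain lists of language codes. The names `STAGE_CHAINS` and `in_same_chain` are made up for the example, and the chains listed are only the ones mentioned in this comment, not a proposal for the real data.

```python
# Hypothetical data: each list is one chain of stages that count as
# "the same language" for the purpose of borrowed-back categorization.
STAGE_CHAINS = [
    ["ang", "enm", "en"],   # Old English, Middle English, English
    ["ang", "enm", "sco"],  # Old English, Middle English, Scots
    ["en", "jam"],          # English, Jamaican Creole
]


def in_same_chain(entry_lang: str, source_lang: str, chains=STAGE_CHAINS) -> bool:
    """True if some single chain contains both language codes."""
    return any(entry_lang in chain and source_lang in chain for chain in chains)


print(in_same_chain("en", "ang"))   # Old English term returning to English
print(in_same_chain("jam", "en"))   # English term returning to Jamaican Creole
print(in_same_chain("jam", "ang"))  # no chain holds both, so not "borrowed back"
```

Keeping the chains separate, rather than merging them into one tree, is what lets "ang" and "jam" stay unrelated even though each is linked to "en".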
I've been doing cleanup of {{bor}} vs. {{der}}, and the same issue comes up there: {{bor}} should only be used for borrowing into the language of the entry, but people tend to see the word "borrowed" in an etymology and use {{bor}}, regardless of the steps in between. This is easy to sort out when an English entry uses {{bor}} for the borrowing of an Ancient Greek word into Latin, but there are lots of cases such as English entries where the borrowing occurred in Middle English or Old English, or Indonesian entries where the borrowing was into Classical Malay. I can see how it could get really sticky in cases like borrowings between Scots and English, since they're both descended from Middle English but English speakers tend to think of English as the "real" continuation of Middle English. Then there are the Norwegian lects and their relationship with Danish.
Another thing I see a lot of is the use of {{inh}} for ancestors of terms that were borrowed from a related language, so someone might use {{inh|nb|gem-pro}} for a term that was borrowed from Middle Low German, but that's a separate issue. Chuck Entz (talk) 16:02, 4 November 2023 (UTC)[reply]
@Chuck Entz it's good to hear you clarify the point about {{bor}} only being used for "borrowing into the language of the entry", as opposed to borrowings earlier in the chain of derivations. I suspected this to be the case, but it is not actually spelled out at Template:borrowed/documentation (which only has a note about language stages) or WT:ETYM (which contains the confusing, vague wording "If any step of a word’s history is a borrowing, this step should be flagged as such") - I wonder if you could add it to our documentation as appropriate? This, that and the other (talk) 23:39, 7 November 2023 (UTC)[reply]

@Benwing2 about your "cleaning up remaining cases where 'twice-borrowed terms' […] for example, 57 cases in CAT:Greek twice-borrowed terms, all of which appear to have the category added manually and where the chain of borrowing was typically Greek <- Ottoman Turkish <- Ancient Greek. Do these count as "borrowed back into Greek" terms?", - and I add: also Ancient Greek > Latin > some European languages > Modern Greek. The Greek case (relation of ancient-modern) is special. Greek dictionaries use two different terms: αντιδάνειο (antidáneio) (literally counter-loanword, marked in strict sense as "to borrow back", Rückwanderer), and αναδανεισμός (anadaneismós) ανά (aná)+δανεισμός (daneismós) (re-borrowing, to borrow again the same word, like your doublets). May I ask, please, for a clarification of the definitions of the linguistic terms (also at Glossary) twice-borrowed and reborrowing and their difference? I translated αντιδάνειο as twice-borrowed at the above 57 Greek words. ++Probably I should have used 'reborrowed'. Thank you. ‑‑Sarri.greek  I 00:00, 8 November 2023 (UTC)[reply]

@Sarri.greek "Borrowed back into the same language" means the chain of borrowing was X -> non-descendant Y -> X. (The above discussion is whether the two X's can be different languages in a parent-child relationship.) "Doublet" means one of two terms that originated in the same term but arrived in the destination language by two different paths. "Twice-borrowed" is being phased out in favor of "Borrowed back into the same language", and "reborrowed" doesn't have a formal definition here. Can you clarify what the difference between αντιδάνειο and αναδανεισμός is? Benwing2 (talk) 00:13, 8 November 2023 (UTC)[reply]
@Benwing2 I have difficulty understanding X - nondescendant... w:en:Reborrowing also puzzles me, because it mixes up 'to borrow again = doublet' and 'to borrow back = Rückwanderer'
αντιδάνειο (antidáneio) = thewordXX at language A > thewordXX (perhaps altered) to some OTHER language > thesamewordXX (perhaps altered) back to A2, a later phase of language A. Example: αψέντι (absinthe). For Greek, we mark it only when dictionaries say so; we do not make up markings from our own judgement. The other term 'αναδανεισμός' is the 'doublets borrowed two times resulting in two different forms' (like your fire, pyre), but Greek dictionaries do not comment on it. ‑‑Sarri.greek  I 00:31, 8 November 2023 (UTC)[reply]

Appendix cruft in Citations[edit]

e.g. Citations:spectre. I don't think these citations for fancruft appendices should be in "real" citations space, mixing with the useful stuff that meets WT:CFI. Thoughts? Equinox 16:28, 5 November 2023 (UTC)[reply]

Yes, get them out of there. — SURJECTION / T / C / L / 17:30, 5 November 2023 (UTC)[reply]
Why? —Justin (koavf)TCM 01:00, 6 November 2023 (UTC)[reply]
Because the sense they are for is never going to meet CFI. — SURJECTION / T / C / L / 06:58, 6 November 2023 (UTC)[reply]
Below, I will try to explore the idea that "the sense they are for is never going to meet CFI". (P.S.: If the issue is with Template:item, please see diff below.)
On WT:CFI at WT:FICTION it is written:
"Terms originating in fictional universes which have three citations in separate works, but which do not have three citations which are independent of reference to that universe may be included only in appendices of words from that universe, and not in the main dictionary space."
At Wiktionary:Criteria for inclusion/Fictional universes, linked at WT:FICTION, it is further written:
"This is a Wiktionary policy, guideline or common practices page. []
"These are examples of the criteria for inclusion as applied to terms originating in fictional universes such as Star Wars, Star Trek, Lord of the Rings, Harry Potter, and Dungeons and Dragons. Examples below include lightsaber, protocol droid, Darth Vader, and Vulcan.
"Such terms which have three citations in separate works, but which do not have three citations that are independent of reference to that universe, may be included only in appendices of words from that universe, and not in the main dictionary space."
I don't care if you delete every quote and every bs fiction fancruft appendix. That's cool, I can 100% understand it. However, I might feel that the claim: "the sense they are for is never going to meet CFI" seen above in this diff seems unusual viewed in light of the above-quoted portions of WT:CFI and the ancillary policy page. I'm only reading the plain language and I have no awareness of other policies or practices that may nullify these passages. --Geographyinitiative (talk) 07:56, 6 November 2023 (UTC) (Modified)[reply]
I think the Citations namespace can be used as a place to show that a term is on its way to meeting CFI. I support having the Mass Effect cites there (although there should be a cite that refers to this sense without a mention of the video game.) CitationsFreak (talk) 19:25, 5 November 2023 (UTC)[reply]
I agree with this stance. The citations in the Citations namespace should either count towards meeting CFI or, for particularly rare terms, help clarify the meaning when context alone is insufficient. The namespace should not be used for senses that are not CFI-compliant to begin with. Andrew Sheedy (talk) 19:48, 5 November 2023 (UTC)[reply]
@Andrew Sheedy, CitationsFreak, Daniel Carrero, Equinox, Surjection See Appendix talk:Mass Effect for context. You guys work out what you want to do; this is totally experimental for me and I don't really care. I would encourage you not to judge the Mass Effect cites by Citations:spectre (MOVED TO Citations:Spectre), but instead by one of the better ones: Citations:Ardat-Yakshi. Bro that Citations page kicks ass, as I believe you will agree. Anyway, it's all theoretically identical to Citations:protocol droid which has been around for decades with no problem. lol lmao &c. --Geographyinitiative (talk) 23:43, 5 November 2023 (UTC) (Modified)[reply]
You are correct, it is a very good Citations page. Not ready for a main entry, but something worth being there. CitationsFreak (talk) 23:47, 5 November 2023 (UTC)[reply]
Yes, but the entry at protocol droid was deleted. Jberkel 00:01, 6 November 2023 (UTC)[reply]
@Jberkel Please see Appendix:Star Wars, where 'protocol droid' is listed in an in-universe fancruft appendix containment zone with a link to the ancient page Citations:protocol droid. My goal with the recent Citations pages for Mass Effect in-universe words was to do something similar in most respects. --Geographyinitiative (talk) 00:25, 6 November 2023 (UTC)[reply]
Yes, but different rules apply for entries that have been deleted, the citations page is usually kept for archival purposes etc., and sometimes it is created from the citations of the deleted entry. Ardat-Yakshi has not been deleted, and it'll probably never get created. Jberkel 07:44, 6 November 2023 (UTC)[reply]
@Jberkel Thank you for your response. Please see diff, which explains the basis in WT:CFI for creating Citations:Ardat-Yakshi (1) without any intent to create Ardat-Yakshi, and (2) solely for the purpose of upholding Ardat-Yakshi's appearance on Appendix:Mass Effect. --Geographyinitiative (talk) 07:59, 6 November 2023 (UTC) (Modified)[reply]
Then why not add the citations to the Appendix namespace, something like Appendix:Mass Effect/Citations, or Appendix:Mass Effect/Ardat-Yakshi/Citations? Adding lots of entries to Citations: without the intent of creation makes it kind of pointless. It's just noise. Jberkel 08:25, 6 November 2023 (UTC)[reply]
@Jberkel Thanks for your reply. This is an interesting proposal and would be a unique page as far as I am presently aware. When I put citations on the Citations:Ardat-Yakshi page, I was merely following the link automatically generated at Appendix:Mass Effect when using Template:item (which is used for 'protocol droid' at Appendix:Star Wars), where it is written:
"In appendices that contain lists of headwords and definitions, this template returns a headword.
"Additionally, this template also always generates a link to the respectives talk and citations page." --Geographyinitiative (talk) 08:44, 6 November 2023 (UTC)[reply]
Created by one single editor without much consensus, it seems. "For an appendix that uses this template extensively, see Appendix:The Legend of Zelda". The fact that the mentioned Appendix is no more is a giveaway. Jberkel 08:52, 6 November 2023 (UTC)[reply]

This was really a thought-provoking discussion for me, and I'd like to let you see what I'm coming out of this with.
My conclusion is to proceed with putting fiction-universe word cites on the normal Citations pages unless/until I see tangible examples made by other people of a Citations page like Jberkel mentions. Whenever I see that, I'll change over to that. But in the interim, I think that it just makes sense to put them on the normal Citations pages, because you never know how these words might "break out" at some point, like hobbit, lightsaber, etc. Because I think anyone would agree the fiction-universe words do still have to meet normal three cites/independent/spanning a year rules somehow, on some page somewhere. Where else would they go than in the normal Citations page? When I see something different, I'll do that, but until then, yeah, I think it's commanded by Wiktionary's CFI guidelines that I continue, unless I assert that the words are in "clearly widespread use". Which I don't, and it would be foolish to assert. SEE ALSO: Wiktionary talk:Votes/pl-2008-01/Appendices for fictional terms. --Geographyinitiative (talk) 13:50, 13 November 2023 (UTC) (Modified)[reply]

what makes an exonym and can it be automated?[edit]

We have CAT:Exonyms by language and its subcats. But how different do the source and target language renderings need to be for it to be counted as an exonym? There are obvious cases like Germany vs. Deutschland, Finland vs. Suomi, Egypt vs. Arabic مِصْر (miṣr). But CAT:English exonyms also includes Rome vs. Italian Roma, Seville vs. Spanish Sevilla, Milan vs. Italian Milano and even Hesse vs. German Hesse (??) and Tierra del Fuego vs. Spanish Tierra del Fuego (????), as well as close renderings of names written in other scripts like Kyiv vs. Ukrainian Ки́їв (Kýjiv) and literal translations like Sugarloaf Mountain vs. Portuguese Pão de Açúcar (literally sugar loaf). I'm asking because I wonder if it's possible to automate this in {{place}}; otherwise the categories are forever doomed to be incomplete. The case of transliteration may be impossible to automate, but just considering cases of borrowing from the same script, does any change at all in the spelling count? What about simple dropping of accents, like Peru vs. Spanish Perú? Benwing2 (talk) 03:53, 7 November 2023 (UTC)[reply]
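For the same-script case, one way to frame the question is to sort name pairs into "identical", "differs only in diacritics", and "differs otherwise", and then decide which buckets count. The sketch below is a hypothetical helper in Python using only the standard library; it is not part of {{place}}, and the bucket names are made up for the example.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (Perú -> Peru)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def spelling_difference(name: str, endonym: str) -> str:
    """Classify how a name differs in spelling from the native name."""
    if name == endonym:
        return "identical"
    if strip_diacritics(name).casefold() == strip_diacritics(endonym).casefold():
        return "diacritics only"
    return "different"


for name, endonym in [
    ("Tierra del Fuego", "Tierra del Fuego"),
    ("Peru", "Perú"),
    ("Seville", "Sevilla"),
]:
    print(name, "/", endonym, "->", spelling_difference(name, endonym))
```

This says nothing about pronunciation or transliteration; it only shows that the purely orthographic cases can be separated mechanically once a policy is chosen.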

Maybe someone was thinking of the occasional difference in pronunciation with Hesse (that it's only sometimes pronounced like in German, and sometimes loses a syllable)?? though I'm not sure it's sensible to be that persnickety. Calling Tierra del Fuego an exonym (added in diff) seems wrong no matter how you look at it. Maybe someone just figured that calling it anything in ==English== was an exonym since the native name is ==Spanish==??? Calling Kyiv an exonym also seems wrong (added in diff), but maybe someone thought the fact that English speakers don't write "a building in Ки́їв was bombed last night" means they're using an exonym?? I can follow the train of thought that if the locals call something "Pão de Açúcar", then for English to call it "Sugarloaf Mountain" rather than e.g. "Pao de Acucar mountain" is an exonym, like if Germans called Philadelphia "Bruderliebe" it would be an exonym. But yes, we clearly need to establish some guidelines for what counts as an exonym, if people are categorizing everything up to and including "calling it Tierra del Fuego instead of Tierra del Fuego"! :o Personally, I would not consider adapting a name to another language's orthography and phonology to make it an exonym (so I would not count Kyiv, Peru or Tierra del Fuego as exonyms), or at least, I would suggest that if we were defining exonym as "any alteration whatsoever, even to approximate a phoneme a language doesn't have by one it does have, or to render it into a language's script", then it's too broad to be worth categorizing. - -sche (discuss) 19:16, 7 November 2023 (UTC)[reply]
The definition of exonym agreed upon by the United Nations Group of Experts on Geographical Names is:
Name used in a specific language for a geographical feature situated outside the area where that language is spoken, and differing in its form from the name used in an official or well-established language of that area where the geographical feature is located. [2]
"Differing in its form" is pretty vague and all-encompassing, and seems to cover every example mentioned except for Tierra del Fuego, which has the same spelling and same pronunciation in the native language and in English.
A Polish research article [3] asks the same questions, and comes up with a whopping eleven different categories of exonym. I don't think we necessarily need to be that specific, but I agree that the term "exonym" is too broad for our purposes of categorizing them. I would propose the following divisions:
  • Endophones - the paper lists Turkish İtalya as an endophone of Italian Italia, as the pronunciation is fully identical but the spelling is different.
  • Endographs - France, Paris, and Argentina are examples of English names that are spelled identically to the endonym, but pronounced with notable differences. I'm unsure whether this should include mere diacritic differences like Perú, or whether those should count as entirely different characters.
  • Exographs - 北京 to Beijing or Peking, Ки́їв to Kyiv or Kiev, Brasil to Brazil. The exonym is not spelled with the same characters as the native name, and the pronunciation is not identical, but there is a systematic effort to adapt the endonym directly into the phonemes of the language.
  • Cognate exonyms - München to Munich, Napoli to Naples, 日本 to Japan. There is a significant change in both pronunciation and in spelling, but it is still derived from an endonym. The research paper divides this further based on the specific nature of the change, but that is likely overkill for us.
  • Calqued exonyms - Pão de Açúcar to Sugarloaf Mountain, Dutch Nederland to French Pays-Bas or Irish An Ísiltír. Perhaps German Niederlande and English Netherlands. The endonym is translated instead of directly borrowed.
  • True exonyms - Germany, Holland (in reference to the Netherlands), and Egypt are all examples of names that aren't derived from a modern endonym. They could be further classified into whether or not there is a historical root endonym (as in Egypt) or a partial name has become representative of a whole (as in Holland) but again, that might be overly specific.
Qwertygiy (talk) 23:14, 7 November 2023 (UTC)[reply]
"-graphs" categories seem pointless to me, because any time two languages use different scripts, every placename would go in the category (since we've previously decided that when a term is used in e.g. English in its native e.g. Cyrillic script and not adapted to English letters—like when Москва is used in English—it's code-switching). Only the last three ("Munich", "Pays-Bas", "Germany") seem interesting to me, so for my part my suggestion is to categorize either the three of those, or even just the last one ("true exonyms"). I suppose we should also consider whether it is likely that anyone not involved in the discussion (here?) that decides on standards will understand or maintain any standards (since users are currently putting even things like Tierra del Fuego and Kyiv in), and hence whether it's worth categorizing exonyms at all. (Another thorny question is what to do when speakers of language A lived in a place, developed a name for it, and then got forcibly relocated by speakers of language B. Is "Bakhmut" now an exonym for what the Russians call Артёмовск [Artyomovsk], now that the latter occupy it?) - -sche (discuss) 17:59, 8 November 2023 (UTC)[reply]
According to both sources I found, yes. An exonym can become an endonym, and vice versa, for various reasons. America and Wales, for example -- they were names first applied by people who had never even seen the land in question, but nobody could reasonably argue that English is not the dominant tongue, let alone officially recognized language, of these areas, and that these words are officially used by the English-speaking natives. By the UN definition, that makes them endonyms for at least the last two centuries.
This, one could easily argue, makes the distinction between endonym and exonym almost pointless from an etymological standpoint. Qwertygiy (talk) 21:23, 8 November 2023 (UTC)[reply]

Reconstructed senses[edit]

There exists a problem with some terms that exist half-way between a reconstruction and not. An example of this would be krok#Old Polish, which has an attestation in krokiem, but that is a lexicalized case form. Currently I have krok set as a reconstructed sense. I know @Thadh has had a similar issue and can provide some other examples where a categorizing label would be helpful. More input overall is needed. Vininn126 (talk) 14:11, 7 November 2023 (UTC)[reply]

I don't remember any specific examples, but this is a widely occurring feature, especially in parent languages with a relatively small or specialised corpus. Thadh (talk) 16:25, 7 November 2023 (UTC)[reply]
IMO, if we considered krokiem a mere inflected {{form of}} krok (with the same POS, and a definition just saying it's an inflected form), then it'd be fine to lemmatize the lemma form krok rather than the inflected form, because the word is attested, just not the nominative inflected form. I'm certain we've already done that for various Latin words. But if krokiem is a distinct word with an entirely different part of speech(!) and its own definition, not a mere inflected form of krok, then krok shouldn't be in mainspace, it should be a Reconstruction: page like Reconstruction:Old Norse/bljúgr or Reconstruction:Old Norse/blettr, because the word krok isn't attested, only the separate word krokiem is attested (like Equinox's recent remark that one can't use the existence of nothingize to add nothingizer, even though the derivation is obvious). - -sche (discuss) 18:54, 7 November 2023 (UTC)[reply]
Perhaps Old Polish krok would be better as a true reconstruction, but there are still instances where a lemma is attested but a sense is not except e.g. in derived terms and children. Vininn126 (talk) 18:57, 7 November 2023 (UTC)[reply]
I think I see what you mean. AFAIK the correct thing to do even then is to put the unattested sense in the Reconstruction namespace. I agree it's not great that users have to go between two places to find all the content about the word, but putting senses that we know aren't attested into mainspace also seems bad. Old English docce (for example) is a reconstruction because it's not attested on its own, even though it existed in the ancestor language and the child language and is attested in Old English compounds. (And for Bär (boar), I had to demonstrate that it was actually attested on its own and not only in compounds, otherwise Brown*Toad was arguing for deleting it.) - -sche (discuss) 19:53, 7 November 2023 (UTC)[reply]
These are still very different things to the issue at hand.
Imagine language Foo, with the descendant Bar and the reconstructed ancestor Proto-Foo. We have the word foo, which in Bar means "tongue, language", and the reconstructed meanings of Proto-Foo are also "tongue, language". Now, consider a scenario where the only attested sense in the Foo language is "language". Yet, based on both the ancestor and the descendant we can be quite certain that it also meant "tongue".
This is essentially what we're dealing with, it doesn't have anything to do with derived terms, just basic descent. Thadh (talk) 21:38, 7 November 2023 (UTC)[reply]
Yeah, that's the same situation as *docce and e.g. *picga which existed in the ancestor and descendant. (Indeed, docce and picga may have a stronger case for having existed—as each has attested derived terms which use the sense, in addition to having something it descended from and into—than something that "doesn't have anything to do with derived terms, just basic descent".) If the sense isn't attested, my understanding is that (at present) it's supposed to go in the Reconstruction namespace.
But I'm just saying that's my understanding of what the current norm is; I don't mean to come across as defending it; I'm not opposed to changing it, though I'm not convinced that putting senses we know are unattested into mainspace is appropriate. If we do put them in mainspace, the "(reconstructed)" label looks good.
I have wondered for a long time whether we should start using the presence or absence of colour to signal differences in our entries more often (e.g. to distinguish labels indicating restriction to a particular jargon, vs labels that merely indicate topic without any restriction, like "anatomy" on elbow), and here one idea would be, if we want to put reconstructed senses in mainspace, could we colour the sense's background grey like {{Webster 1913}} (and kind of like {{LDL}})? - -sche (discuss) 01:10, 8 November 2023 (UTC)[reply]
@-sche IMO not a bad idea. Accessibility UI guidelines call for being careful with colors due to colorblind users, but this can be handled by looking up the color pairs to avoid (esp. for those with red-green colorblindness), or simply using color vs. no color, as you suggest. Benwing2 (talk) 23:01, 8 November 2023 (UTC)[reply]
@-sche: It's not the same situation - *docce and *picga aren't attested in any sense, rather than just in some of them - but I think we're on the same page about there not being any current solutions other than dumping a word into the Reconstruction namespace. I'm not sure colours are a good idea, but {{lb|LANG|reconstructed}} or {{lb|LANG|unattested}} should, in my opinion, be satisfactory.
To illustrate what I mean I found an example of a reconstructible sense in an attested word: Old East Slavic лѣсъ (lěsŭ) has multiple senses, including "timber". This sense is shared in both Russian and Ukrainian, and I'm pretty sure Belarusian has it, too. However, in Old Ruthenian, only the sense "forest" is attested, whereas "timber" isn't. Thadh (talk) 14:50, 10 November 2023 (UTC)[reply]

Picture dictionary image sizes[edit]

The 'picture dictionary' maps on e.g. Abkhazia and Tbilisi are very large, perhaps because Georgia is a horizontally wide, vertically short country, since I notice that the maps of tall-and-thin countries like Palestine are not as wide, but are correspondingly comically tall (I can't view the top and bottom at the same time unless I zoom way out). On mobile, and on my computer at the level of zoom I normally use, the map on Tbilisi crowds the definition entirely off the screen (although on my computer, if I zoom one step out, I see the definition on the left and image on the right coexisting in what I imagine is the expected harmonious way). Should we make the maps a bit smaller, at least a little closer to the size of regular images? - -sche (discuss) 18:27, 7 November 2023 (UTC)[reply]

@-sche It would be great if the image could resize with the screen width. User:This, that and the other, User:Erutuon or User:Sokkjo as our resident CSS experts, is that possible? For me, the map at Tbilisi is totally fine and occupies only the rightmost 20% or so of the width, but I have a very wide monitor and the Chrome window takes up most of the monitor's width. Benwing2 (talk) 00:08, 8 November 2023 (UTC)[reply]
The problem is WT:PICDIC is a dumpster fire. It should've been rebuilt to use mw:Extension:ImageMap to allow for rescaling. Instead, WT:Picture dictionary/en:Georgia-map has a set width and can't be resized. --{{victar|talk}} 08:01, 8 November 2023 (UTC)[reply]
But maybe the better question is, do we need such an interactive map on a wikt entry? File:Administrative Divisions of Georgia (country) - en.svg would probably suffice and is scalable. --{{victar|talk}} 17:30, 8 November 2023 (UTC)[reply]
I wonder why the map of Georgia is shown at all? Tbilisi is a huge capital city in Georgia, while Abkhazia doesn't even consider itself a part of Georgia. It would be better and more informative to have a map of Tbilisi and a map of Abkhazia instead. Tollef Salemann (talk) 10:38, 10 November 2023 (UTC)[reply]

Splitting Serbo-Croatian, or at a minimum supporting standardized lects alongside it[edit]

@Anarhistička Maca, Vorziblix, Benwing2 for visibility.

Hi,

This thread is to propose splitting Serbo-Croatian into Serbian, Croatian, Bosnian and Montenegrin, or at least supporting those as separate L2 languages alongside it. This is far from the first BP thread on that subject, so I won't rehash all the details and controversy, but rather focus my argument on current realities and precedent from other Wiktionaries:

  • "Serbo-Croatian" is a polarizing term in the countries of former Yugoslavia. Croatian linguistics and society, in particular, reject it soundly and vocally. As a result, we're likely making it harder to recruit and retain Croatian editors. This probably extends to the other 3 affected countries. As this is fundamentally a volunteer-driven project, I find it self-defeating to argue for Serbo-Croatian "unity" from an abstract linguistic viewpoint that's disconnected from the reality on the ground.
  • Four of the other Wiktionaries with over a million entries - German, French, Greek and Russian - all support, at a minimum, Serbian, Croatian and Bosnian. Some of them also have "Serbo-Croatian", which predictably lags behind the other lects in coverage. That's not too dissimilar from e.g. Croatian vs. Serbo-Croatian Wikipedia.
  • While the standard languages are mutually intelligible, changes in orthography, accentuation and diachronic development can result in entries that are more complex than they should be, e.g. kći. This also affects things that should be simple - like {{ux}} and {{uxi}} - but which in reality require judgment about which standard to pick.
  • The actual vote back in 2009 to unify the lects under the "Serbo-Croatian" L2 header ended up as "no consensus". I lack the historical background on how it became the norm anyway.

It's not lost on me even a little bit what a gargantuan task it would be to properly divvy up the 50k+ existing Serbo-Croatian entries. I see that as a gradual, multi-year process, partly dependent on us being able to recruit more BCMS-speaking volunteers (which some of us are trying to do). But I believe we should start somewhere. To that end, I propose:

  • keeping Serbo-Croatian as an L2 through the potentially lengthy period until a proper "split" is achieved. That would ensure we don't immediately break or have to redo the etymologies of borrowings in other languages.
  • adding Serbian, Croatian, Bosnian and Montenegrin as L2s with their respective ISO codes.
  • investigating options to bootstrap Croatian, Bosnian and possibly Montenegrin by noting that their entries would be Latin-alphabet-only, and have certain labels that distinguish them as such. The bootstrap process should include criteria for the "safe deletion" of the corresponding SCr entry, e.g. when it's not linked to from entries in other languages.
  • considering promoting Kajkavian and Chakavian to L2s as "Kajkavian Croatian" and "Chakavian Croatian", for the following reasons:
    • mutual intelligibility between those dialects and Shtokavian (the basis for BCMS) is limited, esp. in the case of Kajkavian
    • they have independent literary traditions going back centuries. That's part of their successful respective bids to get their own ISO language codes.

These are, of course, just a handful of initial steps - I'd be happy to discuss any sub-projects under the overall "split" umbrella, and we'll likely have a number of those between automation and manual work. My best-case outcome for this project looks like this:

  • smaller, simpler individual entries
  • better coverage of the living, modern state of each variety
  • happy contributors and an easier time in attracting more of them

As always, I'm looking forward to your thoughts!

Cheers,

Chernorizets (talk) 04:14, 8 November 2023 (UTC)[reply]

@Chernorizets Ugh, I am strongly opposed to this. All standardized Serbo-Croatian lects are strongly mutually intelligible and we would be doing a big disservice to our readers to duplicate the information four times over across 50k entries. Whether Chakavian and Kajkavian should be considered separate L2's is a completely separate matter, but in terms of standard Bosnian, Croatian, Serbian and Montenegrin, definitely not. If the main issue is the term "Serbo-Croatian" itself, that can potentially be renamed if we can find a suitable replacement term. Benwing2 (talk) 04:20, 8 November 2023 (UTC)[reply]
BTW I think a better use of resources would be to figure out how to reduce the duplication between Latin and Cyrillic equivalent entries, using transclusion or similar. Benwing2 (talk) 04:22, 8 November 2023 (UTC)[reply]
@Benwing2 who in your mind are the readers to whom the split would be a disservice? Is it language learners? Because there are plenty of e.g. Croatian-specific or Serbian-specific textbooks, apps and educational media. Is it people looking up a word they found somewhere? Because they'd either be looking up rijeka or reka or река (which are separate articles today anyway), not some "amalgam" of the three, and it might be useful to know that e.g. the first is the standard Bosnian and Croatian spelling, while the latter two are the standard Serbian spellings. Chernorizets (talk) 04:32, 8 November 2023 (UTC)[reply]
If your concern is indicating the usage as standard Croatian, Serbian, Bosnian, etc., that is easy to do without a massive splitting effort. We currently indicate this particular difference as Ijekavian vs. Ekavian in the "Alternative forms" section, because the split between the two doesn't exactly correspond with the split between countries. The terms "Ijekavian" and "Ekavian" link to a Wikipedia section explaining what these terms mean. We could easily add the terms "Ijekavian", "Ekavian" etc. to the headword. In general, splitting into different L2's would result in a huge additional and unnecessary barrier to entry for editors, who would have to duplicate most info across something like 7 entries (Croatian Latin, Serbian Latin, Serbian Cyrillic, Bosnian Latin, Montenegrin Latin, Serbo-Croatian Latin, Serbo-Croatian Cyrillic, maybe also Montenegrin Cyrillic), which would inevitably result in steady divergences as some entries get updated when others don't. This would cause a ton of confusion for readers who would wonder why the Serbian entry lists definitions A, B, D, E while the Croatian entry lists definitions B, C, E, F, when in reality all definitions apply to both. The quality of the resulting coverage of Serbo-Croatian terms would decline (probably quite significantly within a few years).
I have been told that the differences between standard Serbo-Croatian varieties are less than the differences between American, British and Australian English, and certainly less than the differences between European and Brazilian Portuguese. Would you support a split of English and Portuguese for similar reasons to what you proposed above? Benwing2 (talk) 05:35, 8 November 2023 (UTC)[reply]
@Benwing2 I'm not sure how this is different from the situation with any other close group of lects. The amount of coverage will always depend on the amount of investment by volunteer editors. I don't see it as a problem that e.g. Croatian may end up having more/different entries compared to Serbian, or the other way around, just like I don't see it as a requirement that every article on Croatian Wikipedia needs to have a corresponding article on Serbian Wikipedia.
I definitely wouldn't expect an editor to have to create N entries, and that was never implied in my proposal. As for the possible divergence of senses, that's a valid concern, but consider that sometimes that's actually what we'd want, e.g. zrak. Right now, we're using per-country labels to reflect the fact that this is the common word for "air" in Croatia and Bosnia, whereas in Serbia it's vazduh. I'd argue that something like this is more confusing to a user rather than less. The languages also sometimes differ as to how loanwords and neologisms are incorporated into the lexicon - e.g. sebić is a trendy Croatian word for "selfie", rather than the direct loan selfi. We could continue doing what we do today and remember to tag things with the right country label, but I don't see how that's any less exhausting than if we could just create the entry as a Croatian entry.
As to English and Portuguese - if the people who speak these languages one day decided that "English" and "Portuguese" are divisive, offensive or otherwise polarizing terms, and changed their constitutions to call their official languages something else, then at that point I'd support a split. To the best of my understanding, while we're not there with English and Portuguese, we're there with Serbo-Croatian. Chernorizets (talk) 05:59, 8 November 2023 (UTC)[reply]
@Chernorizets You seem to be confusing the term "Serbo-Croatian" with the linguistic reality that there's only one language involved. As I said above, we can easily use a different term; this is what the ICC did, for example, using Bosnian/Croatian/Serbian or something similar. Benwing2 (talk) 06:08, 8 November 2023 (UTC)[reply]
@Benwing2 Wiktionary L2 names are used in many ways, and it's hard for a casual observer (or even a less casual one) to discern that something is a "term" vs a prescriptive statement or something else. I can get behind just doing something about the "term" (name?), although it's not obvious to me what we'd choose.
This is informative: https://en.m.wikipedia.org/wiki/Declaration_on_the_Common_Language. The text of the declaration doesn't give the common language a name, despite arguing that it's a common, pluricentric language. Take a look at the Croatian version of this article. How does that inform your opinion?
I suggested L2s because of precedent elsewhere and because I don't offhand have a better alternative. I'm open to other ideas. Chernorizets (talk) 06:44, 8 November 2023 (UTC)[reply]
@Chernorizets Yes, they are finessing the issue of choosing a name. Wikipedia uses "Serbo-Croatian". I have a book on Serbo-Croatian grammar called "Bosnian, Croatian, Serbian, a Grammar" by Ronelle Alexander subtitled "with Sociolinguistic Commentary" that has a long section on the sociolinguistic issues. By its title the book is implicitly endorsing the BCS naming convention (and even has the letters B C S highlighted in yellow on the cover and grouped together vertically). I know some linguists use terms like "South Slavic Dialect Continuum", although that often includes Slovenian as well. I am not attached to any particular name. Benwing2 (talk) 06:53, 8 November 2023 (UTC)[reply]
@Benwing2 I still think it's telling how the EN version of this article lacks a "Criticism" section, whereas:
I gather from these articles that the official Serbian position is that the common language is most correctly called "Serbian", with three additional codified varieties, whereas the Croatian position is that the declaration is as DOA as the notion of "Serbo-Croatian" (paraphrasing). I suppose any choice of name is going to leave someone unhappy, but the choice of English WP and Wikt to stick with "Serbo-Croatian" is likely appealing to the least number of people.
I think it might be worthwhile reaching out to some of the admins or bureaucrats of DE, FR, EL and RU Wiktionary to ask how they decided to support both "Serbo-Croatian" and the individual lects as L2s. Admittedly, besides RU Wikt, the number of lemmas is relatively small, but I'd assume it was still a conscious decision. Thoughts? Chernorizets (talk) 08:46, 8 November 2023 (UTC)[reply]
@Chernorizets It may have been the "course of least resistance" and stemmed from individual Serbian and/or Croatian contributors. I notice there are many more Serbian lemmas than Croatian lemmas in ruwikt. Benwing2 (talk) 09:04, 8 November 2023 (UTC)[reply]
@Chernorizets: My thought is that they didn’t make a conscious decision but followed the flock of Wikipedia, just like you attempt to do emphatically. en.Wiktionary imported their WT:List of languages thence and hence Ethnologue. Lots of them had to be deleted because even their names are unattested outside, like Saʽidi Arabic and their separate treatment is unnatural to editors and readers, and this is not becoming a concerning thought for Wikipedia editors, like terminology or linguistic concepts in general. Bright shiny object. According to Wikipedia, Rāziḥīy “may be a surviving Old South Arabian language”—because one single author promoted his publication career strategically with this spectacular claim. Because reflection on attention or source criticism would be OR/TF. On Wikipedia the illogical published claim wins against balanced treatment, unless there are forces to also fear to be not mainstream. “It’s in a source!!!” And not fringe unless demonstrated otherwise. Some Dan Polansky can always come up with a skewed statistical argument that his view is the majority. (The majority is actually silent.) Which disregards practical benefit of the reader, who is supposed to become smarter, get his thoughts in order, and not politically correct first. I always prefer a superficial slant if the information density is high. The practical benefit here is one will tell the reader exactly if and in so far as one thinks if something is regional, but there is generally no data on this. If we split then information has to be verified in additional iterations to make sure whether something is Croatian, Serbian, Bosnian, Montenegrin. Fay Freak (talk) 09:31, 8 November 2023 (UTC)[reply]
@Fay Freak I'm not "emphatically" attempting to do anything. I'm seeking opinions on the consensus-based forum that is Beer Parlour, and I have no special powers or privileges to "get my way" on this. Your colorfully phrased assumption that either Wikipedia got it wrong w.r.t. BCMS, or that other Wiktionaries just blindly followed Wikipedia's example, is just that - an assumption. You might be right, you might be wrong, but the thing that's actually troubling to me is that there's no appetite to even reach out to those other projects and find out. It smacks of the "we know better" attitude of EN Wiktionary that I've observed on more than one occasion. Chernorizets (talk) 02:06, 9 November 2023 (UTC)[reply]
@Benwing2 I like this disclaimer on the Talk page of the "Croatian language" article on WP:
Croatian is a standardized register of a language which is also spoken by Serbs, Bosniaks, and Montenegrins. In English, this language is generally called "Serbo-Croat(ian)". Use of that term in English, which dates back at least to 1864 and was modeled on both Croatian and Serbian nationalists of the time, is not a political endorsement of Yugoslavia, but is simply a label. As long as it remains the common name of the language in English, it will continue to be used here on Wikipedia.
I think we'd be well-advised to put a version of this paragraph at the top of Category:Serbo-Croatian language, and maybe also on Wiktionary:About Serbo-Croatian. If nothing else, at least this would give a more principled reason for our choice of the name, rather than ease-of-use arguments or individual editors' interpretation of linguistics (however well-informed and good-intentioned they might be). Chernorizets (talk) 02:29, 9 November 2023 (UTC)[reply]
@Chernorizets: No objections to this. Benwing2 (talk) 03:02, 9 November 2023 (UTC)[reply]
@Benwing2 how would we make it happen? I'm not super familiar with language cat configuration. Chernorizets (talk) 03:19, 9 November 2023 (UTC)[reply]
@Chernorizets You might just put the text on the category page itself, above the call to {{auto cat}}; if we add it to the modules, it will require some work as there's currently a special handler for 'Foo language' categories that doesn't have any provision for customized category text. Benwing2 (talk) 03:26, 9 November 2023 (UTC)[reply]
Oppose furiously. All argument rests on social proof, i.e. conjectural contributor attitudes in place of linguistic realities, and even it does not work, as there will be contributors less attracted to editing this language, either because they like Yugoslavia or because they are historically conscious or diachronically oriented or because the individual languages lose relevancy – I don’t speak Croatian, Serbian or Bosnian, I speak an exotic like Serbo-Croatian! Which curiously has a consequence that I can well speak with people in Germany, hence pick up words, without knowing whether it was Croatian, Serbian or Bosnian or Montenegrin or what. Don’t ask tribes, that’s racist and given the vivid history, inflammatory; speakers are just currently in the process of forgetting the difference. Hence also why would one see from a text on the internet whether it is one or the other? For most native speakers here on Wiktionary I am not sure either even if I have chatted with them in Serbo-Croatian. If it isn’t long enough or just somewhat idiosyncratic and playing upon another regiolect, or just old, one might not know or be sure. Language categorization cannot depend on the place of publication. Fay Freak (talk) 04:34, 8 November 2023 (UTC)[reply]
@Fay Freak both Wikipedia, and the four large Wiktionaries I mentioned, give editors the choice between Serbo-Croatian, which is not the name of an official language in any country today (AFAIK), and the four independent standards. You raise the good point that some editors might prefer to work in Serbo-Croatian, and I'm fine with that. I just don't understand what's so different about English Wiktionary in particular that we'd want to deny the opportunity to create Serbian, Croatian, Bosnian or Montenegrin entries. Chernorizets (talk) 04:41, 8 November 2023 (UTC)[reply]
@Chernorizets: Perhaps it would even be being different for difference’s sake, so someone has a choice with this project as opposed to others :) But seriously, if I can talk to people and be understood, as employing their language essentially, and essentially correctly and not some pidgin, it is the same language and they don’t have separate ones. This means other projects purposefully fail at logic or economic allocation of resources. If you want to be supported by Croatian institutions to make a dictionary then you might limit your scope to Croatian for simplicity’s sake or the like, as getting the countries in hasn’t worked out well before 🫨. Humans doing business is nasty. And you are from the US, it is easy to ignore particularist politics and imagine languages inside of this country without identification by the users and there appraise the differences, as experimental science works; it is a bit like medical diagnosis or something: People don’t necessarily have the very same thing they claim, wrong beliefs about the body are widespread, so about the tongue. Fay Freak (talk) 05:07, 8 November 2023 (UTC)[reply]
@Fay Freak for what it's worth, I'm from Bulgaria. More importantly though, my thoughts have been shaped by talking to Serbian and Croatian speakers, as well as by comparing the way Serbian and Croatian Wikipedia cover the same contentious topics around language and identity. Of course, you could make the argument that I'm working with a limited sample of people - and therefore opinions - but that would be true for any Wiktionarian.
As to the criterion of mutual intelligibility, even on English Wiktionary we rely more on convention than purely linguistic reasoning. E.g. Catalan and Occitan are largely mutually intelligible, esp when you consider dialects, but we keep with established custom and treat them separately. I could probably dig up a bunch more examples like this, even with smaller language distances involved, and I will if you ask me to, but my point is that EN Wiktionary doesn't have a uniform treatment of "nearby" lects, perhaps because the world doesn't have one either. Croatian is an official language of the European Union. Serbo-Croatian is not. How we handle this is up for debate, but the reality of it isn't. Chernorizets (talk) 05:39, 8 November 2023 (UTC)[reply]
What is it about kći that would be improved by splitting the lects? Sure, the alt-forms are complex, but only one form is labelled dialectally. In fact, I find the labels quite useless; "by analogy with oblique stem forms" and "apheretic variant" are etymological information, not the stuff of context labels or qualifiers. This, that and the other (talk) 07:09, 8 November 2023 (UTC)[reply]
@This, that and the other it was just the most recent example I'd seen of a word with multiple variants, some of which - when you click on them - say they're regional. So the Chakavian stuff would go in a Chakavian entry, and the regional versions would go under the correct region. It may not be the best example - see zrak for some country label soup. Chernorizets (talk) 08:58, 8 November 2023 (UTC)[reply]
@Chernorizets Honestly that doesn't look so bad to me. Benwing2 (talk) 09:04, 8 November 2023 (UTC)[reply]
Strongly oppose splitting Bosnian/Serbian/Croatian/Montenegrin etc etc. I'm all for splitting languages, but this would be splitting based exclusively on politics. We can't please everyone. I don't however have the background knowledge to say anything about splitting Kajkavian/Chakavian/Shtokavian. Thadh (talk) 08:58, 8 November 2023 (UTC)[reply]
I have mixed feelings. I don't think it's purely a political split as stated above; that is a gross exaggeration. However, politics does play a major role in this. I think a split based on nation would be a bad idea, but I'd be curious if splitting by various lects would make more sense. I think we'd have to see some examples of what splitting would potentially do to really understand if it makes sense lexically. Vininn126 (talk) 10:17, 8 November 2023 (UTC)[reply]
This split would be a waste of time and resources. One can underline that a certain feature is specific to Serbian, Croatian, Bosnian, or Montenegrin even as of now. There is no need to overcomplicate the matter. Furthermore, the actual linguistic differences in the SCr dialect continuum lie within the former dialectal subgroups, which had been (mostly) supplanted by the standard. One would learn almost nothing in regard to them from the split of the standards. 86.185.23.196 11:07, 8 November 2023 (UTC)[reply]
Unfortunately, I'd have to weakly oppose this. I'm one of the biggest proponents of splitting languages, but it looks like in this case, there's already a strong linguistic consensus that these lects are standardized varieties of the same language. Maybe the name of the current L2 could change if it's not as clear? Also, for the record, while we're at it, our 3 Norwegian lects should really really be combined into 1. AG202 (talk) 13:36, 8 November 2023 (UTC)[reply]
@AG202 I think there is general consensus outside of Norwegian (and maybe other Scandinavian) editors to merge Norwegian, but last time this discussion came up, the Norwegian editors were strongly opposed. I agree the current situation is non-ideal, to say the least. Benwing2 (talk) 21:33, 8 November 2023 (UTC)[reply]
Honestly, I feel like we’d need to bite the bullet one way or another. The split seems to be just based on non-linguistic reasons. It’d be best to try and convince everyone else in that case, even though I hate to say it. AG202 (talk) 01:59, 9 November 2023 (UTC)[reply]
I don't see any good reason for comparison between Norwegian and Serbo-Croatian. They are in two very different situations. Ignoring the politics and history, there are at least two opposite tendencies which become clear in Norwegian (since the 1920s) and Serbo-Croatian (since the 1990s). While the Norwegian mess is over time being more and more united, as Bokmål and Nynorsk are slowly becoming closer to each other, mainly because they are used in the same country, the Serbo-Croatian dialects are going their separate ways because they are no longer in the same country. Funnily enough, the languages which are considered part of Serbo-Croatian are divided into cross-border dialects. So if you ask me, Serbo-Croatian (as well as Norwegian Nynorsk) is just a standard spelling of those dialect continuums, and I don't see any reason for splitting. Norwegian Bokmål, on the other hand, is just Danish which became Norwegianized over time, and its grammar, lexicon and pronunciation are hard to merge together with the rural Norwegian dialects, which have more similarities to Swedish and Faroese. Anyway, I'm not so deep into the Serbo-Croatian stuff, so I should abstain from opposing/supporting its splitting. Tollef Salemann (talk) 17:34, 20 November 2023 (UTC)[reply]
@AG202 Yeah, although I do actually support merging Norwegian, I should point out that the Danish ancestry of Bokmål is a genuine linguistic point. Anyway, this is all a bit off-topic. Theknightwho (talk) 17:50, 20 November 2023 (UTC)[reply]
Oppose This basically boils down to a conflict between a prescriptive and a descriptive approach. The separate-language approach is strongly, vehemently prescribed due to politics and to reaction against abusive language policies of the past. It's perfectly understandable, but Wiktionary is a descriptive dictionary. Chuck Entz (talk) 15:29, 8 November 2023 (UTC)[reply]
@Chuck Entz it looks like we have very few - if any - active BCMS contributors, but this wasn't always the case since we have 50k+ BCMS lemmas (which is on the high end for Slavic languages on EN Wikt). I've been trying to "root cause" this, and I wonder if the term we've adopted (as well as the 2009 vote, which didn't pass btw) have something to do with it. I participate in a large online community (> 55k members) of Slavic-language speakers and enthusiasts, and I'm trying to encourage some of them to become Wiktionary editors. Having seen more than one heated discussion around how this language ought to be named, I just fear that even if we could attract volunteers, we might not be able to retain them, particularly if they're from Croatia. I'm searching for the right answer - I admit I don't have it. This proposal was based on precedent in other large Wiktionaries, as well as Wikipedia. Chernorizets (talk) 01:57, 9 November 2023 (UTC)[reply]
@Chernorizets Maybe, maybe not. You should look at who contributed them; I suspect a lot of them come from User:Ivan Štambuk, who identifies himself as a native Serbo-Croatian speaker but who has not been active since 2019 at the latest. For many languages, there are relatively few contributors, but they are often prolific, and so when they go inactive, the language stops getting new lemmas. This seems to have happened with Latvian, for example, where most terms (I think) were added by User:Pereru, who has been inactive since 2015. His entries are characterized by extremely thorough etymologies, BTW. It would be interesting to look at the number of lemmas over time; you can presumably access this by looking at the page history of Wiktionary:Statistics/generated, which is updated fairly often going back to 2010. You could also write a script to parse the Nov 1 dump with complete edit history; see this link. Benwing2 (talk) 03:01, 9 November 2023 (UTC)[reply]
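For the dump-parsing suggestion above, a minimal sketch of what such a script might look like, under stated assumptions: the input is an uncompressed MediaWiki XML export containing full revision history, an entry counts as "Serbo-Croatian" from the first revision whose wikitext has a `==Serbo-Croatian==` L2 header, and only main-namespace pages (`<ns>0</ns>`) are counted. The function name and the sample data are made up for illustration, and the sketch ignores entries that later lost the header.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: an entry "exists" once some revision carries this L2 header.
L2_HEADER = re.compile(r"^==\s*Serbo-Croatian\s*==\s*$", re.MULTILINE)


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def first_header_year_counts(xml_source):
    """Count main-namespace pages by the year the L2 header first appeared."""
    counts = Counter()
    first_seen = None   # earliest matching revision timestamp for the current page
    in_main = False     # True once the current page is known to be in namespace 0
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start":
            if name == "page":
                first_seen, in_main = None, False
            continue
        if name == "ns":
            in_main = (elem.text or "").strip() == "0"
        elif name == "revision":
            timestamp = text = None
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "timestamp":
                    timestamp = child.text
                elif child_name == "text":
                    text = child.text or ""
            if in_main and timestamp and text and L2_HEADER.search(text):
                # ISO 8601 timestamps sort correctly as plain strings.
                if first_seen is None or timestamp < first_seen:
                    first_seen = timestamp
            elem.clear()  # free the revision text; full-history dumps are huge
        elif name == "page":
            if first_seen is not None:
                counts[first_seen[:4]] += 1
            elem.clear()
    return counts


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>voda</title><ns>0</ns>
<revision><timestamp>2009-03-01T00:00:00Z</timestamp><text>==Czech==
x</text></revision>
<revision><timestamp>2010-05-01T00:00:00Z</timestamp><text>==Czech==
x
==Serbo-Croatian==
y</text></revision>
</page>
<page><title>Talk:voda</title><ns>1</ns>
<revision><timestamp>2010-06-01T00:00:00Z</timestamp><text>==Serbo-Croatian==
</text></revision>
</page>
</mediawiki>"""

if __name__ == "__main__":
    print(dict(first_header_year_counts(io.BytesIO(SAMPLE))))
```

For a real dump, the file object would presumably be opened through a decompressor first (for example `bz2.open(path, "rb")` if the file is bzip2-compressed), and summing the yearly counts cumulatively would approximate the lemma-growth curve.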
Weak oppose I'd support separate L2s for Kajkavian and Chakavian (the latter might need its own Ekavian/Ikavian split with its intricacies), and a rename of S-C to "Bosnian, Croatian, Serbian, Montenegrin," but IMO this should be abbreviated in heads or templates to BCMS; if a shorter term is suggested which doesn't ruffle any feathers I'm open to it as well. In order to not totally isolate Kaj and Cha where there isn't a common etymon with S-C, or when such a page is simply missing, I'd like etym sections to state "(Un)related to BCMS/Kaj/Cha," because I really do like the fact that there are links to the other lects' terms on a given entry (due to closeness and the marginal nature of the non-prestige lects on Wikt), and the fact that listing all forms interdialectally can add confusion or clutter, especially for learners, when terms might be given as eg. syns but only be synonymous in a certain lect (for example, ščap is Kaj but its synonym is given as the prestige štap, unmarked as standard, and not a different Kaj variant or the Croatia-only šćap (which you might expect a Kajkavian speaker to use more due to proximity)).
I think we also need a unified policy for writing alt spelling and jat reflexes. For example, on the Cyrillic page дрво I wrote the ux using the standard Serbian spelling седети, and on Latin drvo the ijekavian spelling sjediti. However, there is also an alternate ijekavian spelling, sjedjeti, which goes unmentioned. This could be solved with using slashes to show alternate spellings on the same level, and showing dialectally/regionally differing spellings/words on the entries of their specific (regional) lect.
Also, the issue of Montenegrin-specific letters needs to be addressed. Anarhistička Maca (talk) 22:52, 8 November 2023 (UTC)[reply]
I think the issue with any unified group of lects is finding a way to more easily mark which lect a word belongs to. It's very easy to mark a word as being specific to a lect, and an unmarked term is supposedly universal, but if it's in all but a few areas it's clunky to write everything out. I think we need a better system for that, but I'm not sure what. Vininn126 (talk) 23:03, 8 November 2023 (UTC)[reply]
@Anarhistička Maca regarding Kajkavian and Chakavian, things may not be as clear-cut as I had thought. Each of those is actually a dialect group rather than a single variety and, for instance, it turns out that some Chakavian subdialects are closer to Shtokavian than others. I've started a thread in a large online group asking native speakers of Shtokavian to rate their amount of mutual intelligibility with Chakavian and Kajkavian - comments have begun coming in, and I hope more people participate so that I can get a diverse set of perspectives. Chernorizets (talk) 02:15, 9 November 2023 (UTC)[reply]
@Chernorizets Yes, Chakavian is known for this. If you look at various Proto-Slavic pages, you'll see that all listed Chakavian descendants (or at least the ones I added) include the particular town where the term is used; this is consistent with usage in Derksen's Etymological Dictionary of the Slavic Inherited Lexicon (Leiden). Benwing2 (talk) 02:51, 9 November 2023 (UTC)[reply]
Oppose The merger was made on common sense. --Anatoli T. (обсудить/вклад) 22:54, 8 November 2023 (UTC)[reply]
Oppose. — Fenakhay (حيطي · مساهماتي) 02:12, 9 November 2023 (UTC)[reply]
Oppose due to seemingly clear mutual intelligibility and other points as raised in previous discussions. MedK1 (talk) 11:11, 9 November 2023 (UTC)[reply]
Abstain, because I don't speak any BSC, but I would like to comment. Several people here contrast politics with linguistic reality, but I don't think it's that easy, because the former has a strong effect on the latter. I'm reminded of the following quote from The Slavic Languages (Sussex & Cubberly, 2006, p. 74):
Naylor, writing in 1980, observed that “the linguistic differences between the two variants are no greater than those between British and American English and would not justify separating them into two separate languages” (1980: 68). This linguistic judgment has been overtaken by history, and it is difficult to conceive of a set of circumstances which would reunite Serbian, Croatian and Bosnian.
The majority of the political and cultural elite in former Yugoslavia are determined to have separate national languages and their determination makes it so. I think @Chernorizets makes good points and is sensibly cognizant of the downsides of their proposal. It seems to me that even if this is rejected now, it's only a matter of time that something similar is implemented. —Caoimhin ceallach (talk) 17:08, 22 November 2023 (UTC)[reply]

Interviews: Tell us about your experiences using Wikidata in the Wikimedia sister projects

Hello, the Wikidata for Wikimedia Projects team at Wikimedia Deutschland is investigating the different ways Wikidata is being used in the Wikimedia projects. If you would like to speak with us about your experiences with integrating Wikidata in Wikimedia wikis, please sign up for an interview in this registration form. Please note that currently, we are only able to conduct interviews in English.

For more information, visit our project page. Feedback is always welcome here. Thank you. Danny Benjafield (WMDE) (talk) 13:36, 8 November 2023 (UTC)[reply]

English Wiktionary policies related to pronunciation audio

Does updating the Help:Audio_pronunciations page require prior discussion and consensus in Beer parlour? In my opinion, the instructions are rather obsolete and misleading in their current form. More details can be found in the following Grease pit discussion [4] and I can repost my suggestions for improvement here if anyone is interested. —Ssvb (talk) 11:20, 9 November 2023 (UTC)[reply]

@Ssvb I'd say no; this sounds like the kind of non-contentious change that you can just go ahead and make. Benwing2 (talk) 20:30, 9 November 2023 (UTC)[reply]
@Benwing2: Thanks! I have updated Help:Audio_pronunciations with the hopefully non-contentious information about Lingua Libre, essentially describing the status quo.
Do you happen to know what's the status of Lingua Libre Bot? I see that its page says "Wikis in tests or needing approval: English Wiktionary". Is having a bot automatically editing English Wiktionary articles to add references to Lingua Libre pronunciation files even desirable? —Ssvb (talk) 00:02, 12 November 2023 (UTC)[reply]
@Ssvb I don't know what this bot is. There is a bot User:DerbethBot operated by User:Derbeth that auto-adds pronunciation files from various sources but I think Lingua Libre may be on its deny list because of the uneven quality of its audio files. Benwing2 (talk) 04:17, 12 November 2023 (UTC)[reply]

Pronunciations for Minor Geography

I would like to let you all know that I now want to use some old gazetteers- see Chuxiong and Akesu for examples- to do pronunciations for all the words in Category:en:Places in China and etc. words. Here's some I've done just now: Chengkou, Chenghai, Chenggu. I plan to do this on all of them, and make it nice. I will try to fill in the gaps for words that aren't listed in any gazetteers if I can feel certain that I'm giving a reasonable pronunciation (like Chengdong; for some words like Qingjin, idk if English speakers have spoken stuff like this aloud in English language conversation). This process will probably bring to light some confusing dilemmas and errors and what have you, which I plan to bring up at the tea room. This process will hopefully include IPA pronunciations as well when I can kind of match syllables one-to-one. Let me know if you see any problems with this plan. I hope this will help increase the value of the entries to the readers, attract more editors, and serve as an example for other areas of geography. --Geographyinitiative (talk) 13:00, 9 November 2023 (UTC)[reply]

@Geographyinitiative My main concern is that the pronunciation of lesser-known places may not be very stable in English. I see you dealt with this in the Chenggu article but I'd expect the same to happen everywhere. Also, old gazetteers may have outdated pronunciations -- the spelling is likely to significantly influence the pronunciation of foreign toponyms, and so places written in Wade-Giles or Postal Romanization will likely have different pronunciations from the same place written in Pinyin. Benwing2 (talk) 20:34, 9 November 2023 (UTC)[reply]
"spelling is likely to significantly influence the pronunciation" for sure! I've marveled for a few years now, as I've added pronunciations to Chinese placenames myself, checking whenever possible both modern dictionaries and actual spoken examples on youtube, that spelling pronunciations seem to be the only pronunciations people use for Chinese placenames. Even when the Mandarin pronunciation uses only phonemes that English also has, even educated speakers with knowledge of Chinese use spelling pronunciations—consistently pronouncing -wu- as /wu/ (and not like spoken Chinese /u/), -yu- as /ju/ (even though Chinese has no /j/), pronouncing the other consonants as they are written in pinyin rather than as they are pronounced in Mandarin, and the vowel letters in the same consistent ways... (This is also true for German and French people pronouncing Chinese placenames!) I've actually considered writing, and I suggest we do just write, a table or module that just converts pinyin to (English) IPA.
As you noticed about Cheng-, a few pinyin vowel letters (consistently) get pronounced in a (consistent) few different ways, and for any sufficiently common placename containing that vowel letter all of the ways can be found on Youglish, but the module just needs to output multiple options for those letters. I've yet to find any Chinese placename where a syllable is pronounced some particular way that it isn't also pronounced in other placenames that have that syllable, except when a name is too uncommon to find the full complement of possibilities—but I don't think that means the full complement doesn't exist, any more than we would think a pinyin placename was {{lb|British}} if it was so rare that the only three books it appeared in happened to all be British. Even stress is predictable (most commonly it's either on the last syllable or equal on all syllables). - -sche (discuss) 04:09, 10 November 2023 (UTC)[reply]
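The table-based idea above could be sketched roughly as follows in Python. This is only an illustration of "output multiple options for those letters": the table name, the function name and the IPA values in the table are assumptions made up for the example, not attested pronunciations and not an existing Wiktionary module.

```python
from itertools import product

# Hypothetical syllable table. Each pinyin syllable maps to one or more
# English-style renderings. The values are placeholders for illustration,
# not attested pronunciations.
SYLLABLE_OPTIONS = {
    "cheng": ["tʃʌŋ", "tʃɛŋ"],  # a vowel letter with more than one reading
    "gu": ["ɡu"],
    "wu": ["wu"],               # spelling pronunciation: -wu- as /wu/
    "yu": ["ju"],               # spelling pronunciation: -yu- as /ju/
}


def candidates(syllables):
    """Return every combination of per-syllable options, in table order."""
    options = [SYLLABLE_OPTIONS[s] for s in syllables]
    return ["".join(combo) for combo in product(*options)]


# A placename whose syllables each have n options yields the product of
# those counts, e.g. two candidates for ["cheng", "gu"].
for ipa in candidates(["cheng", "gu"]):
    print("/" + ipa + "/")
```

Stress is left out of the sketch; a real module would also need to split a placename into syllables before looking them up.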
When I recently added "chǔngʹko͞oʹ" to Chenggu, I looked back at my edit and thought to myself that there is no WAY that an English speaker, no matter their background, is going to read 'Chenggu' with a friggin "k" sound like in cat or tack. N O P E. But then again- perhaps. Perhaps in some hyperspecialized circumstances, it could happen- and indeed, the word would only be spoken in English in very special circumstances anyway. But to me, the "chǔngʹko͞oʹ" pronunciation is probably exclusive to someone thinking of the Ch'eng-ku/Chengku etc alternative forms of Chenggu.
When you look out over the inter-generational and inter-civilizational CHAOS of Citations:Hebei/Citations:Hopeh/Citations:Hopei, Citations:He'nan, Citations:Shanxi, Citations:Guangzhou, Citations:Yunan, Talk:Kuomingtang, and Citations:Xingjiang, you realize how ephemeral even the names of vast regions of China really are.
I will be thinking about your comments above and just correct me if you see anything that looks wrong. I'm open to whatever you have to say. --Geographyinitiative (talk) 10:18, 10 November 2023 (UTC) (Modified)[reply]

@-sche, Benwing2 holy shit guys I'm so excited. Have you seen my pronunciation work recently on the geography terms?? It's going really well I think. I have hoped to do this for a long time. Feel free to add IPA, I just haven't learned IPA yet. But I am planning to get to it. Geographyinitiative (talk) 14:57, 15 November 2023 (UTC)[reply]

Belarusian IPA notation

I see that the current Module:be-pronunciation produces IPA transcription [ˈzɫod͡zʲej] for the word зло́дзей (zlódzjej). But this somewhat differs from the sounds listed in the Belarusian phonology article (ɫ vs. l, o vs. ɔ, e vs. ɛ). Should it be [ˈzlɔd͡zʲɛj] instead? Or maybe something else? I'm not really familiar with IPA and would appreciate any help. —Ssvb (talk) 14:12, 9 November 2023 (UTC)[reply]

@Atitarev, @Benwing2: There exists Арфаэпічны слоўнік беларускай мовы (Orthoepic dictionary of the Belarusian language) published in 2017. It provides standard Belarusian literary pronunciation transcription for 117K words (albeit in Cyrillic notation) and can be potentially used as a source and reference for the pronunciation information of the Belarusian words in the English Wiktionary.
There was also an interview (in Belarusian) with the creators of this dictionary and an additional article about it. Basically, they developed an automatic converter program (similar to the be-pronunciation Lua module) with several hundred rules encoded in it. The results of this automatic conversion had been verified by the linguists, who analysed speech of theatre performers, voice actors and other professional Belarusian language users. Their conclusion was that the automatic converter was 98% accurate (made mistakes in roughly 2 words out of 100). For the paper edition of the dictionary, the automatic converter's mistakes had been corrected by humans, plus alternative variants of pronunciation had been added where appropriate. The authors of the dictionary also have their own website, which can be used to get non-perfect automatically generated transcriptions (also in IPA format) or query information from paper dictionaries.
Now I wonder. How should words like чэ́шскі (čéšski) be handled in Wiktionary articles? The above-mentioned dictionary lists two pronunciation variants in Cyrillic notation: "[чэ́шск'і] // [чэ́ск'і]". If I add manual overrides [ˈʧɛʂskʲi] and [ˈʧɛskʲi] via the {{IPA|be}} template, then the correct symbols for the Belarusian IPA notation still need to be clarified first (ɫ vs. l, o vs. ɔ, e vs. ɛ). Alternatively, maybe the Module:be-pronunciation could gain an extra feature to allow accepting transcription overrides in Cyrillic notation (directly copied from Арфаэпічны слоўнік беларускай мовы) and automatically convert them to IPA?
Ssvb (talk) 14:31, 9 November 2023 (UTC)[reply]
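To make the "Cyrillic override to IPA" idea concrete, here is a minimal Python sketch. It is not how Module:be-pronunciation works; the symbol table covers only the letters of the example word, its IPA values follow the symbols used elsewhere in this thread, and the stress placement (before the whole consonant cluster that precedes the stressed vowel) is a deliberate simplification. All names are hypothetical.

```python
# Illustrative symbol table (an assumption, not the module's real data).
# Only the letters needed for the example word are covered.
CYR_TO_IPA = {
    "ч": "t͡ʂ",
    "э": "ɛ",
    "ш": "ʂ",
    "с": "s",
    "к": "k",
    "і": "i",
}
VOWELS = set("эі")
ACUTE = "\u0301"   # combining acute: marks the stressed vowel
PALATAL = "'"      # the dictionary's apostrophe: palatalised consonant


def cyrillic_to_ipa(transcription: str) -> str:
    """Convert a bracketed Cyrillic transcription to bracketed IPA."""
    text = transcription.strip("[]")
    segments = []    # one IPA segment per Cyrillic letter
    is_vowel = []    # parallel flags for the segments
    stressed = None  # index of the stressed vowel in `segments`
    for ch in text:
        if ch == ACUTE:
            stressed = len(segments) - 1
        elif ch == PALATAL:
            segments[-1] += "ʲ"
        else:
            segments.append(CYR_TO_IPA[ch])
            is_vowel.append(ch in VOWELS)
    if stressed is not None:
        # Simplified syllabification: the stress mark goes before all
        # consonants that directly precede the stressed vowel.
        start = stressed
        while start > 0 and not is_vowel[start - 1]:
            start -= 1
        segments.insert(start, "ˈ")
    return "[" + "".join(segments) + "]"


print(cyrillic_to_ipa("[чэ\u0301шск'і]"))
print(cyrillic_to_ipa("[чэ\u0301ск'і]"))
```

The choice between ɫ and l, o and ɔ, or e and ɛ would live entirely in the table, so settling the notation question would not change the conversion logic.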
@Ssvb The way this is generally handled in various pronunciation modules is to use a respelling using the rules of the language in question, hence you could use чэ́скі as a respelling (although if the reduction of шс -> с is systematic, the module could be made to generate it automatically). As for ɫ vs. l, o vs. ɔ, e vs. ɛ, I can't answer that well enough as I don't know Belarusian; but User:Atitarev may be able to answer. Benwing2 (talk) 20:39, 9 November 2023 (UTC)[reply]
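As a rough illustration of generating the respelling automatically, the Python sketch below returns the headword plus a variant with шс reduced to с. The function name is hypothetical, and whether the reduction is systematic (and whether the result should be с or a long сː) is exactly what is still open in this thread, so the rule here is an assumption.

```python
def respelling_variants(word: str) -> list[str]:
    """Return the word itself plus a respelling with шс reduced to с."""
    variants = [word]
    if "шс" in word:
        # Assumed rule: every шс cluster may be reduced to с.
        variants.append(word.replace("шс", "с"))
    return variants


print(respelling_variants("чэ\u0301шскі"))
```

Each variant could then be passed through the normal pronunciation rules, which would give two IPA lines for a word like чэ́шскі and a single line for words without the cluster.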
@Benwing2, @Ssvb:
"э" should normally produce [ɛ]. Since this is straightforward (?), I have just changed it.
I think "о" is just [o] and we use [ɫ] for a hard, unpalatalised "л".
@Benwing2, I think @Ssvb is asking to allow [ˈt͡ʂɛʂskʲi] (respelled [чэ́шск'і]) as one of the allowed pronunciations. This is the same as in Russian (the "шск" part), where че́шский (čéšskij) is pronounced [ˈt͡ɕeʂskʲɪj], but in Belarusian [ˈt͡ʂɛskʲi] (respelled [чэ́ск'і]) is also correct (two variants).
There may be some small imperfections, which may be harder to iron out, since resources are rather poor. There is an unfinished discussion re [e] vs [ɛ] for the Russian module, there may be some similarities with Belarusian, e.g. where "э" should be [e] or where "е" should be [e]?
Module_talk:ru-pron#Stressed_е_not_followed_by_consonant_+_front_vowel_or_by_palatalised_consonant_should_be_ɛ? Anatoli T. (обсудить/вклад) 22:32, 9 November 2023 (UTC)[reply]
@Atitarev OK, I have no idea whether [e] or [ɛ] is more correct for Russian or Belarusian; as a native speaker of the former you'd know better. We can fix чэ́шскі to generate [шск] instead of or in addition to [сːк] (which variants are correct?). Benwing2 (talk) 22:38, 9 November 2023 (UTC)[reply]
@Benwing2, according to @Ssvb, [ˈt͡ʂɛʂskʲi] is correct but I think we should display both [ˈt͡ʂɛʂskʲi] and [ˈt͡ʂɛsːkʲi].
In the linked discussion, started by @User:SUM1, "этот" vs "эти" differ slightly: the former has [ɛ], the latter [e], but I would find it a bit hard to define this rule. More here: w:Russian_phonology#Front_vowels. Anatoli T. (обсудить/вклад) 22:55, 9 November 2023 (UTC)[reply]
@Atitarev There is a lot of detail in that article. It indicates the different allophones and their contexts but I'm not sure we want to go into that much detail; I think it would just overwhelm the language learner. Benwing2 (talk) 23:08, 9 November 2023 (UTC)[reply]
@Benwing2: I agree, the Russian module is fine. Thank you for all the efforts! Anatoli T. (обсудить/вклад) 23:11, 9 November 2023 (UTC)[reply]
@Atitarev I mean, right now there seem to be at least three different IPA notation flavours for Belarusian and this doesn't feel right:
It would be great if at least Wiktionary and Wikipedia could agree with each other on the right symbols choice for their IPA notations. The talk page of the Wikipedia article had some discussions. And the dictionary web interface shows some contact information for feedback too. —Ssvb (talk) 00:10, 10 November 2023 (UTC)[reply]
@Ssvb: They don't have to match 100%. It depends on the level of precision, which is always a point of discussion here for various languages. Compare the Russian Wiktionary [ˈfondəvɨɪ̯] with our: [ˈfondəvɨj] for the adjective: фо́ндовый (fóndovyj) or [ɛlʲɪkˈtronʲɪkə] vs our [ɪlʲɪkˈtronʲɪkə] for the noun электро́ника (elektrónika). It has to be consistent and clear, ideally decisions documented.
Do you agree, @Benwing2? Anatoli T. (обсудить/вклад) 00:52, 10 November 2023 (UTC)[reply]
@Atitarev Yes, absolutely. There is a lot of wiggle room in the IPA as well as "conventional" usages of IPA symbols that may not any longer reflect current reality (e.g. the use of [ʌ] for the vowel of cut, which at least in American English is actually [ɐ], and the values of most French nasal vowels; the symbols [ɑ̃ ɛ̃ ɔ̃] reflect the pronunciation of maybe 150 years ago, when the current pronunciations are more like [ɒ̃ æ̃ õ]). Benwing2 (talk) 01:56, 10 November 2023 (UTC)[reply]
@Atitarev: Regarding your examples of Russian. Wouldn't it make sense to list both [ɛlʲɪkˈtronʲɪkə] and [ɪlʲɪkˈtronʲɪkə] as alternative pronunciation variants for электро́ника (elektrónika)? And maybe have an audio sample for each variant if they actually differ. Here are Russian pronunciation audio files recorded in Lingua Libre by different users for various words starting with "элект-": . —Ssvb (talk) 06:46, 10 November 2023 (UTC)[reply]
@Ssvb, @Benwing2: The pronunciation module is based on defined rules. Unstressed "э" is normally reduced in natural pronunciation, just like "е". Some people's pronunciation is affected by the spelling in slow speech. This applies equally to "э" and "е", often to "я" but almost never to "о". It would be burdensome to provide a spelling pronunciation for each entry. The most natural and relaxed pronunciation was chosen. Anatoli T. (обсудить/вклад) 21:53, 12 November 2023 (UTC)[reply]
@Ssvb, @Benwing2: I think [ˈzɫod͡zʲej] is correct. Anatoli T. (обсудить/вклад) 22:36, 9 November 2023 (UTC)[reply]

"geographic region" vs. "administrative region"

I am trying to fix up the handling in {{place}} of entities identified as "regions". The problem is that in some countries, "region" has a specific administrative sense, often as a top-level subpolity underneath the country, whereas in others, "region" is merely a geographic and cultural term for an area with some sort of cohesion. This was leading to problems e.g. in France, which has political regions such as Normandy, Hauts-de-France and Provence-Alpes-Côte d'Azur (quite a mouthful) as well as geographic/cultural regions such as the Loire Valley. My solution for France was to use the term "administrative region" to refer to the political kind of region and just "region" to refer to the geographic/cultural kind. For France at least, this is confirmed by Wikipedia, which terms the political kind of region an "administrative region" e.g. in the page on Provence-Alpes-Côte d'Azur. OTOH, this doesn't seem to apply to all countries, e.g. Regions of Turkmenistan just refers to the 5 political regions as "regions". Nonetheless I'm thinking of requiring that political regions be declared as "administrative regions" in order to be categorized as political entities. The idea is that the category system would recognize "Regions of COUNTRY" categories for all countries just like all countries can currently have "Cities of COUNTRY" and "Rivers of COUNTRY" categories, but only for countries with political regions would "Administrative regions of COUNTRY" be recognized. An alternative is to use the term "political region", but I'm not sure that term has much currency. Note for example that the Wikipedia article Regions of Turkmenistan is categorized under Category:First-level administrative divisions by country.

Thoughts? Benwing2 (talk) 08:30, 10 November 2023 (UTC)[reply]
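The categorization rule proposed above can be sketched in Python as follows. This is not the actual {{place}} module code; the function name and the set of countries are hypothetical, with France included only because it is the example discussed in the proposal.

```python
# Hypothetical list of countries whose "regions" are political subdivisions.
COUNTRIES_WITH_ADMIN_REGIONS = {"France"}


def region_category(placetype: str, country: str) -> str:
    """Return the category for a region-like placetype in a country."""
    if placetype == "region":
        # Geographic/cultural regions are recognized for every country.
        return f"Regions of {country}"
    if placetype == "administrative region":
        # Political regions are recognized only where they exist.
        if country not in COUNTRIES_WITH_ADMIN_REGIONS:
            raise ValueError(f"{country} has no administrative regions")
        return f"Administrative regions of {country}"
    raise ValueError(f"unsupported placetype: {placetype}")


print(region_category("region", "France"))                 # e.g. Loire Valley
print(region_category("administrative region", "France"))  # e.g. Normandy
```

Under this scheme an entry declared as a plain "region" never lands in a political category, which is the separation the proposal is after.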

I think that cultural/geographic entities can be "regions", like America's Midwest and top-level administrative political boundaries can be "administrative divisions", e.g. Indiana is generically an "administrative division of the United States" and specifically a "state of the United States". This avoids problems where there are specifically-defined government "regions", such as Regions of Czechia. "Administrative region" also works as far as I'm concerned. —Justin (koavf)TCM 08:40, 10 November 2023 (UTC)[reply]
I am in favor of "administrative division" over "administrative region" and especially over "political region". For one, as mentioned, it's already standard on Wikipedia, and consistency is a strong argument. Additionally, the term "region" typically implies some level of geographic cohesion, which political borders don't necessarily display. Enclaves, exclaves, gerrymandering, and "wastebasket divisions" of leftover territory can all create somewhat-arbitrary collections of distant pieces of land that are lumped together under one government roof. Using the term "division" in this sense avoids any ambiguity.
"Political region" could be misconstrued even more strongly, implying areas where certain factions (Conservative, Democratic, Taliban, abortion stances, whatever) have the strongest support. (I don't think it should be used that way, but I'm sure someone with strong opinions would if that were the chosen label for this category.) Qwertygiy (talk) 17:31, 10 November 2023 (UTC)[reply]
My interpretation of Benwing's proposal was that "administrative region" was only to be used in cases where the administrative divisions are actually, officially called "regions". The local term for the administrative division (e.g. states, provinces, ...) will continue to be used in all other cases. If this is what is being proposed, it does not make sense to use the term "administrative division" in this context. @Benwing2 correct me if I am wrong here.
In any case I support the proposal as written. This, that and the other (talk) 00:13, 11 November 2023 (UTC)[reply]
@This, that and the other Yes, that's exactly right. Benwing2 (talk) 00:30, 11 November 2023 (UTC)[reply]

WT:PREFS show of hands - is anybody using it?

This page, which has pride-of-place linkage from the third position of our sidebar, has become very dusty. I just went through it and removed all the preferences that were non-functional or had been superseded by a gadget. I ended up removing around half the options. A number of those that remain are quite trivial visual changes, like relocating {{was WOTD}} to a different position on screen, indenting {{also}} more than usual, and changing the shape of bullets in the Monobook skin.

If WT:PREFS is as little-used as I believe it is, I'm inclined to decommission it entirely after creating gadgets for any preferences which are actually considered useful. Note that logged-out users can use gadgets via WT:Preferences/V2, which is linked via a "Preferences" link in the top-right of the page for these users. On the other hand, if a large number of the WT:PREFS preferences are in use by various people, it would be possible to un-phase-out the page and improve its visual appearance.

What I'm asking is:

  1. Does anybody use any of the features at WT:PREFS? If you're unsure, the answer is most likely "no". Anyone who uses these features would periodically have to re-enable them when switching computers or browsers - as the title implies, these preferences are per-browser and are not saved in your Wiktionary account.
  2. If yes to (1), which features do you currently have switched on? Do these features matter enough to you that you would like them to remain available?

This, that and the other (talk) 11:44, 10 November 2023 (UTC)[reply]

I do have settings turned on from there, but they really should be moved. Vininn126 (talk) 12:18, 10 November 2023 (UTC)[reply]
Having checked, however, none are turned on. I do remember in the past I had some turned on, but I don't remember which. Vininn126 (talk) 12:46, 10 November 2023 (UTC)[reply]
Nope. Equinox 12:34, 10 November 2023 (UTC)[reply]
No, but if I'd noticed they were there I'd have tried some of them out, and will do so now, specifically:
Edit sections without going to the edit screen. (Same as Ædit?)
Enable audio recording tool. (doesn't work AFAICT. See User:Yair rand/AddAudio.js.)
Filter watchlist and recent changes to only show changes for certain languages. (not compatible with enhanced watchlist and recent changes)
Add a button next to the search box to simplify inputting special characters. (now only works for he, eo, and ru)
I didn't yet look to see whether there are gadgets for these. DCDuring (talk) 17:40, 10 November 2023 (UTC)[reply]
Above are superseded or non-functional. (See comments in parentheses for examples.) DCDuring (talk) 20:05, 10 November 2023 (UTC)[reply]
Good point about AjaxEdit. I removed that from WT:PREFS. This, that and the other (talk) 00:02, 11 November 2023 (UTC)[reply]
I'm not using it. Of the various prefs listed there, "Hide the copyright warning in the edit window" seems like something we should not be allowing people to do. For "Disable the javascript redirect between pages that differ only in case", plausibly useful to people adding e.g. German nouns, maybe we should (or already do?) have a gadget that allows people to turn off auto-redirection in general. - -sche (discuss) 17:57, 10 November 2023 (UTC)[reply]
Ha, never realized it existed until now. — Sgconlaw (talk) 18:40, 10 November 2023 (UTC)[reply]
Strongly agree re the copyright warning - why the hell was that even added in the first place? Theknightwho (talk) 19:23, 10 November 2023 (UTC)[reply]
There's a copyright warning on Wikt? CitationsFreak (talk) 20:15, 10 November 2023 (UTC)[reply]
MediaWiki:Copyrightwarning —Justin (koavf)TCM 20:19, 10 November 2023 (UTC)[reply]
I have used it before, but some don't work and it doesn't persist. Since it's a failing kludge, it needs to be replaced with proper gadgets, where available. That old record audio in the browser one was great, but no one seems able or motivated to fix it. :/ —Justin (koavf)TCM 19:29, 10 November 2023 (UTC)[reply]
Yes, I've used it, though most of the ones I had selected became gadgets and I switched to using those. I currently have the following selected, though neither is that important to me or works properly:
  • Enable audio recording tool
  • Filter watchlist and recent changes to only show changes for certain languages
Andrew Sheedy (talk) 20:11, 10 November 2023 (UTC)[reply]

This is great input, thanks all. The feedback is more or less as I expected. I've proposed an action (convert to gadget or remove) for each WT:PREFS feature at User:This, that and the other/WT:PREFS dispositions - please take a look and speak up if you have strong feelings (or edit that page directly if you like). This, that and the other (talk) 00:14, 11 November 2023 (UTC)[reply]

This is too small a sample for action. I have added comments on the user page, which probably should be a temporary?/permanent? project page. DCDuring (talk) 14:11, 11 November 2023 (UTC)[reply]
My intent is to leave this discussion open for a month or so, noting that it is linked from a prominent box on WT:PREFS. So I invite further comment from anyone.
Also, thanks for your input on the user page. This is intended to be a time-limited project so we may as well leave it where it is, I feel. This, that and the other (talk) 21:32, 11 November 2023 (UTC)[reply]
I think that some of the wishes embodied in these 'Preferences' items are pretty good. I don't have any idea how difficult it would be to implement them durably, nor whether they will catch the interest of those with the capabilities to implement. DCDuring (talk) 00:21, 12 November 2023 (UTC)[reply]
I have now decommissioned WT:PREFS. A large red box was present on the page for two months inviting people to comment here - but nobody did. Five of the preferences have been migrated to gadgets that you can turn on in your preferences:
  • Appearance gadgets
    • Hide translation boxes entirely, instead of having them shown collapsed.
    • Display headers inline.
  • User interface gadgets
    • Add links to "nearby" entries at the top of each language section, similar to other online dictionaries.
  • Editing gadgets
    • Filter watchlist and recent changes to only show changes for certain languages.
  • Miscellaneous gadgets
    • Disable the automatic timed redirect between pages that differ only in case.
WT:Preferences for users without an account (previously WT:Preferences/V2) remains available and functional so that IP users can enable gadgets.
Thanks to all for contributing to this discussion. This, that and the other (talk) 05:37, 17 January 2024 (UTC)[reply]
Ah, thanks for doing all this. I missed this thread, but I raised a similar question back in April. I'm glad we've finally moved on to the newer and much more convenient interface. Soap 21:04, 20 January 2024 (UTC)[reply]

Stenoscript

We have around 100 entries for abbreviations used in Stenoscript, an English shorthand system. They are placed in English sections but often lack a valid PoS header due to their flexibility (e.g. ndv, rsp). How should we approach these terms? I am not sure how many works are published in Stenoscript (apart from manuals) that count towards attestation. If they are kept, they should all have some kind of header and proper categorization. (Pinging @Kwamikagami.) Einstein2 (talk) 18:33, 11 November 2023 (UTC)[reply]

@Einstein2 We have had problems before with the quality of Kwami's entries and I think this very discussion came up previously. We either need to use proper headers and categories or delete the entries. Benwing2 (talk) 04:20, 12 November 2023 (UTC)[reply]
We have several POS headers that do not correspond to parts of speech as the term is commonly understood, such as Ligature and Symbol. Some assignments are artificial; for example, calling the Japanese character a "syllable" is an act of desperation. A few are language-specific, such as Kanja. The list of allowed headers is not frozen; new additions can be proposed. (Determinative was added in 2022.) The main issue with Shorthand is the scarcity of applicable entries. While it could theoretically also do duty for other languages, non-alphabetic shorthand notations are practically inaddible. One possible approach is to classify, say, ak under the POS headers of Noun (act) and Verb (acknowledge) and add as Usage notes to e.g. the verb entry, “This shorthand can also be used for related words (acknowledges, acknowledged, acknowledging, acknowledgement, etc.)”.  --Lambiam 21:11, 12 November 2023 (UTC)[reply]

I've seen stenoscript used to abbreviate scattered words in handwriting, though not in printed works. I think it's worth listing them, though agree that we need some formal solution to the POS header problem. kwami (talk) 21:54, 13 November 2023 (UTC)[reply]

Somali Orthography

The Somali Latin alphabet is remarkably phonemic, with the exception of pitch accent and the front/back vowel distinction.

In my opinion, an umlaut should be used to distinguish /æ/, /ɛ/, /ɪ/, /ɞ/ and /ʉ/ from their "tense" counterparts /ɑ/, /e/, /i/, /o/, /u/. The same should obviously apply to long vowels, which are simply written as two vowels in a row, as in Finnish.

Pitch accent can be phonemically described with an acute diacritic, although it has three different phonetic realizations: high, low, and falling.

Just like in Latin and Ancient Greek, these diacritical marks should not be used in page names, but within the pages themselves.

This is obviously very difficult due to the surprising lack of written Somali sources. But I do believe it will have to be done eventually.

Ελίας (talk) 22:36, 11 November 2023 (UTC)[reply]

Somali has a standard orthography. It does not distinguish the tense and lax vowels ever. The pitch accent is sometimes marked (I believe the grave is used always, but maybe that's only for some pitches and acutes for others).
We shouldn't be creating a new orthography for a language that already has one. Thadh (talk) 22:59, 11 November 2023 (UTC)[reply]
IMO what User:Ελίας is asking for is not a new orthography but diacritics to identify important vowel distinctions. I would be surprised if there are no dictionaries that contain such diacritics. Benwing2 (talk) 04:21, 12 November 2023 (UTC)[reply]
@Benwing2: I'm yet to find one, which is very sad, because that was the main thing stopping me from editing the language. Thadh (talk) 09:11, 12 November 2023 (UTC)[reply]
My proposal is not to create a new orthography, it is simply to use diacritics within the pages themselves and not in the page name.
Compare an Ancient Greek entry such as "θυμός", which you can look up without using the macron to indicate vowel length, but which has the form θῡμός within the page itself.
This is reinforced by the fact that, as far as I know, the Ancient Greeks didn't indicate vowel length with the exception of eta and omega.
Ελίας (talk) 09:41, 12 November 2023 (UTC)[reply]
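The "diacritics in the page, not in the page name" convention can be illustrated with a short Python sketch. It is not Wiktionary's actual implementation; the function name and the two sets of marks are assumptions chosen to match the examples in this thread (the macron for Ancient Greek, and the diaeresis plus acute and grave for the Somali proposal).

```python
import unicodedata

# Assumed sets of combining marks to drop when forming a page name.
GREEK_LENGTH_MARKS = {"\u0304", "\u0306"}         # macron, breve
SOMALI_MARKS = {"\u0308", "\u0301", "\u0300"}     # diaeresis, acute, grave


def page_name(display_form: str, marks: set[str]) -> str:
    """Strip the given combining marks from a display form."""
    decomposed = unicodedata.normalize("NFD", display_form)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)


# θῡμός (with macron) is looked up under θυμός; the accent is kept.
print(page_name("\u03b8\u1fe1\u03bc\u03cc\u03c2", GREEK_LENGTH_MARKS))
```

Because only the listed marks are removed, the Greek accent survives while the length mark disappears; for Somali, every proposed diacritic would be removed, so the page name stays in the standard orthography.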
I agree with your proposal, FWIW. Benwing2 (talk) 09:47, 12 November 2023 (UTC)[reply]
Ancient Greeks didn't, but later scholars did, long before Wiktionary came along. If you can find any diacritics already in use (by anyone) outside of Wiktionary, you would absolutely get my support, but otherwise - no.
And @Benwing2: This situation is equivalent to adding diacritics to vowels in English to indicate what phoneme they represent. Thadh (talk) 11:01, 12 November 2023 (UTC)[reply]
@Thadh Well, in fact, there are several dictionaries that do just that for English. Benwing2 (talk) 11:27, 12 November 2023 (UTC)[reply]
Well, there aren't for Somali, not that I know of. I would be pleasantly surprised to be proven wrong. Thadh (talk) 11:30, 12 November 2023 (UTC)[reply]
@Thadh How about scholarly papers? I would think some of them would use notation like this to indicate the pronunciation. How do Somali dictionaries indicate pronunciation? Benwing2 (talk) 12:23, 12 November 2023 (UTC)[reply]
@Ελίας: Okay I have found one grammar discussing possibilities to distinguish the two vowel series (Nilsson 2020), he gives the possibility to notate Ä and Ą for the tense vowels (So /e/ <ë> or <ę> and so on). Honestly surprised to find any mention of diacritics used for this, but both solutions seem fine to me. Thadh (talk) 12:31, 12 November 2023 (UTC)[reply]
(By the way, I must note that he doesn't use either of these systems in the entirety of his grammar, he only mentions them) Thadh (talk) 12:40, 12 November 2023 (UTC)[reply]
I think it would be better to use only one of these, the umlaut. The reason for this is that, according to Nilsson, the "heavy" vowels are less common than "ordinary vowels", and also because the ogonek is less common on keyboards. (I must note, however, that according to Nilsson, /ɪ/ and /ɛ/ are ordinary vowels while /i/ and /e/ are heavy vowels) Ελίας (talk) 12:57, 12 November 2023 (UTC)[reply]
@Ελίας: Yes, so <ä>, <ë>, <ï>, <ö>, <ü> for /æ/, /e/, /i/, /ɞ/, /ʉ/ respectively. Thadh (talk) 18:23, 12 November 2023 (UTC)[reply]
@Thadh You're right, I got mixed up with Turkish where ï is a back vowel. The umlaut should indicate the front series /æ/, /e/, /i/, /ɞ/, /ʉ/. Thank you. Ελίας (talk) 19:59, 12 November 2023 (UTC)[reply]
And on pitch, I think we should follow what John Saeed mentions in his grammar: <á> for high, <a> for low, <à> for falling. This seems to be the accepted standard for pitch notation. Thadh (talk) 12:46, 12 November 2023 (UTC)[reply]
If you mean "Central Somali - A grammatical outline" by John I Saeed, then I couldn't find any mention of diacritics other than the acute accent. Also, it seems more parsimonious to use the fewest diacritics, as Somali apparently uses a mora-based pitch accent and not tonemes. The falling tone is phonemically a high-low sequence, so for a long falling /ɑ/ vowel it may be represented as "áa" instead of "àa". But if the consensus is, in fact, to use diacritics for the falling tone, then I agree that it should be followed, even if it is more complicated than it should be. Ελίας (talk) 13:17, 12 November 2023 (UTC)[reply]
@Ελίας: No, I mean Somali by John Saeed (1999) →ISBN. You should see what is used there (note that he mentions using the circumflex (â) for the phonological section only, so just skip over that, he then proceeds using the established practice). Thadh (talk) 18:16, 12 November 2023 (UTC)[reply]
@Thadh "The first observation is that the perceived three tone system can be simplified to two units by treating falling tone (FG) as a sequence of High (H) and low (L) tone." (p. 18-19) He also makes use of this "phonemic orthography" on page 19: *góol > gôol, *náyl > nâyl. I think it would be thriftier to treat the falling tone as high-low. Ελίας (talk) 20:15, 12 November 2023 (UTC)[reply]
@Ελίας: As I said, ignore his phonological section. Later in the book, grave is used throughout for where he uses a circumflex - gòol, nàyl. Thadh (talk) 20:58, 12 November 2023 (UTC)[reply]
@Thadh Then I agree, we should use the established practice. (As long as it is accurate, it doesn't make much of a difference to me.) If I am not mistaken, we have reached consensus. I am new to Wiktionary, and Wikimedia as a whole, so I am not quite sure of what needs to be done for this to become an official policy, but I would guess that a vote needs to be held? Ελίας (talk) 21:16, 12 November 2023 (UTC)[reply]
@Ελίας I don't think a vote is necessary for these sorts of things. Usually votes are for policies that apply to Wiktionary as a whole or for giving someone new privileges e.g. admin/bureaucrat/bot owner. Changes for individual languages just need consensus from the relevant editors. Benwing2 (talk) 21:20, 12 November 2023 (UTC)[reply]
@Ελίας: No vote needed. You should write WT:About Somali including a part about vowel marking in headwords (compare WT:AAA and other "WT:About ..." pages). Thadh (talk) 21:20, 12 November 2023 (UTC)[reply]
@Ελίας no need for a vote. Unless there's some conflict with site-wide rules, the community of editors for a given language have quite a bit of autonomy. The main question is whether there are other people who work with Somali that haven't had a chance to weigh in. Chuck Entz (talk) 21:23, 12 November 2023 (UTC)[reply]
I generally agree with Thadh. I would heavily caution against adding phonemic distinctions in the orthography that aren't used by natives and/or grammars in the language. We don't change our orthography in English here to add accent marks or to distinguish /θ/ & /ð/ or the many other inconsistencies in English. Same with a bunch of other languages. The headword line should match the orthography, while the Pronunciation section should show the actual pronunciation. There are languages that do have optional accent/tone/vowel markers like Hausa or Igbo, but those are also well-documented and can be found in dictionaries and such (and can be found occasionally with native writings), but if that's not the case with Somali, I would oppose any sort of additional change. AG202 (talk) 21:23, 12 November 2023 (UTC)[reply]
@AG202 English is a good example, actually. Wiktionary uses enPR in many entries. That's very similar to what I'm proposing. Ελίας (talk) 21:43, 12 November 2023 (UTC)[reply]
@AG202: Not sure English is a good counterexample; there are dictionaries in English that use diacritics to indicate pronunciation, and the Book of Mormon does too. Also English orthography is heavily irregular while I gather Somali orthography is quite regular other than not marking tone or vowel "heaviness". Benwing2 (talk) 22:59, 12 November 2023 (UTC)[reply]
@Benwing2: But my point is that even though some English dictionaries do that, we do not (nor do other major dictionaries afaik), and I haven’t seen anyone seriously propose that we include them on the headword line (and doubt that it’d have much support).
@Ελίας: If it’s just in the pronunciation section, then I’d support it, but in the headword line, no without other evidence. AG202 (talk) 23:10, 12 November 2023 (UTC)[reply]
@AG202 Could you clarify what you mean by evidence? Ελίας (talk) 23:13, 12 November 2023 (UTC)[reply]
@AG202 I guess I don't see the relevance here. Some languages like Russian and other Slavic languages do include such markings, some don't. English and Somali are nothing alike. Benwing2 (talk) 23:27, 12 November 2023 (UTC)[reply]
The analogy was to make an emphasis on what orthographies are actually used. In English, accent marks aren’t commonly used even in English dictionaries so we don’t put them in headword lines. We should apply the same standard and consistency to Somali and not invent things that aren’t used. That’s why I ask for sources to see if that distinction is actually made in multiple sources using Somali. AG202 (talk) 05:16, 13 November 2023 (UTC)[reply]
@AG202 IMO this is not a good analogy because English lexicography has well-established practices while I doubt the same can be said of Somali. I think a better comparison is to other African languages and their treatment in Wiktionary. From looking through Module:languages/data/2, quite a number of African languages have extra diacritics, although I don't know what the diacritics stand for. Benwing2 (talk) 05:28, 13 November 2023 (UTC)[reply]
The other African languages that I can speak of like Igbo and Hausa (as mentioned in my initial comment) have explicitly optional diacritics and characters that are cited in their standard orthographies and can be seen in multiple dictionaries and vocabularies (with input from natives). This also can be seen in materials for non-natives. I don’t know if the same can be said for Somali from what I’ve seen. AG202 (talk) 05:51, 13 November 2023 (UTC)[reply]
@Benwing2: Somali has tons of dictionaries. Here are some examples of how they handle these vowel diacritics.
  • De Larajasse (1897): pitch diacritic (acute for any) if not penultimate
  • Nakano (1976): no diacritics
  • Farah (1992): no diacritics
  • Farah (1995): no diacritics
  • Awde (1999): no diacritics
  • Adam (1999): no diacritics
  • ESL dictionary (2001): no diacritics
  • Puglielli (2012): no diacritics
I think this is pretty clear, no? I personally am partial to pitch marking, which is used by many grammars, but I am yet to see these tenseness marks in use. Thadh (talk) 12:42, 13 November 2023 (UTC)[reply]
@Ελίας The above is what I meant by evidence. Checking some other resources as well:
  • A Somali Newspaper Reader (1984): No diacritics
  • Colloquial Somali (1995): Pitch diacritics, acute
  • Somali (Saeed, 1999): Pitch diacritics (acute for high/stressed, grave for falling, and no accent for low), stating the following on the vowel distinction:

The relationship between the sets of front and back vowels is interesting. Firstly they are not simply phonetically conditioned variants and thus are not allophones in classical phonemic terms. Individual members of the major lexical categories, for example nouns, verbs and adjectives, must occur with a specific vowel quality and there are a number of minimal pairs […] However such minimal pairs are very few and for the most part the back/front distinction is important for correct pronunciation but not for distinguishing lexical meaning.

Looking at some orthographic examples here and especially the grammar from the language committee, there's no mention of umlauts or different orthographies for vowel qualities. I could also take a look at Af Soomaali Aan Ku Hadalno (hadallo), but also I'm not going to purchase it just for this, and I expect that it'd have the same orthography. Thus far, there's only been one source that's suggested umlauts but doesn't even use them itself, which is very telling. There are plenty of Somali writings to look at; you just have to be willing to take the effort to look for it since it's not as accessible online for various unfortunate reasons. It's not lacking literature at all.
Overall, again, I do not think that we should impose an orthography that's used very rarely (if at all). Some languages do not show phonemic distinctions in their orthography and that's fine; English doesn't either. That does not mean that we should add distinctions that aren't there in even optional spelling. It should only be added to the pronunciation section. I would also remove this addition to WT:About Somali as there hasn't been a consensus to add it. AG202 (talk) 16:29, 13 November 2023 (UTC)[reply]
@AG202 Once again, I would like to clarify: I am not proposing an orthographic reform for Somali. I am only proposing a form of phonemic notation, just like enPR, to be used on somali entries. I named the discussion "Somali Orthography" because I wanted to emphasize the shortcomings of Somali orthography, and also because I wasn't exactly sure of what my project was at the time. Ελίας (talk) 17:15, 13 November 2023 (UTC)[reply]
Then yes, I’d support a change in the pronunciation section, but what we have currently at WT:About Somali about changing the headword line cannot stay. AG202 (talk) 17:39, 13 November 2023 (UTC)[reply]
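As a rough illustration of the "diacritics within the page, not in the page name" idea discussed above, here is a minimal Python sketch. It is not how Wiktionary's own modules work; the function name `page_name`, the language keys, and the choice of which combining marks to strip are assumptions made only for this example. The Somali set covers the acute, grave and umlaut mentioned in this thread (the umlaut being a proposal, not established practice); the Ancient Greek set strips only the length marks, so that θῡμός maps to θυμός.

```python
import unicodedata

# Combining marks to drop when deriving a page name (assumed sets, for illustration only).
ACUTE, GRAVE, DIAERESIS, MACRON, BREVE = "\u0301", "\u0300", "\u0308", "\u0304", "\u0306"
STRIP_MARKS = {
    "so": {ACUTE, GRAVE, DIAERESIS},  # Somali: pitch marks and the proposed umlaut
    "grc": {MACRON, BREVE},           # Ancient Greek: vowel-length marks only
}

def page_name(display_form: str, lang: str) -> str:
    """Return display_form with the language's optional marks removed."""
    # Decompose so that each diacritic becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", display_form)
    kept = "".join(ch for ch in decomposed if ch not in STRIP_MARKS[lang])
    # Recompose whatever marks remain (e.g. the Greek accent on the omicron).
    return unicodedata.normalize("NFC", kept)

print(page_name("gòol", "so"))
print(page_name("θῡμός", "grc"))
```

The marks are listed per language because the two cases differ: the Greek accent belongs to the normal spelling and stays in the page name, while every mark in the Somali set is an addition to the standard orthography.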

Gay slang vs. LGBT slang

We currently have a valid {{lb|en|gay slang}} label with the corresponding category tree. On the other hand, a relatively large number of definitions are tagged as {{lb|en|LGBT|_|slang}} or {{lb|en|LGBT|slang}} as there is no valid "LGBT slang" label. Wikipedia considers the two terms synonymous. However, Category:English gay slang states: "English slang terms whose usage is typically restricted to homosexual people."

I suggest renaming the label and category to "LGBT slang" and revising the category description as I don't think there is a distinct slang vocabulary "restricted to homosexual people" (as opposed to other LGBT communities). This would also enable the proper categorization of the entries mentioned above. There is also Category:Transgender slang by language, which could potentially be a subcategory. Einstein2 (talk) 23:26, 12 November 2023 (UTC)[reply]

Hmm, I like your proposal for a new category, but why not leave the gay slang category intact, and make both that and transgender slang children of the new LGBT slang category?
There is definitely some specifically gay slang .... note the easily missed Category:Polari subcategory at the top there. Most of those words, such as eek, are opaque to modern readers, but we categorize old words just as much as new. Additionally I would say that there are some words that, while they might be known to other communities, are still gay slang because they refer specifically to gay people .... more of those terms are for men right now than for women.
Thanks, Soap 09:38, 13 November 2023 (UTC)[reply]
I support the notion of a new Category:LGBT slang (which we should define as accommodating all of 2SLGBTQQIA+) with more specific subcategories, both language-wise and regarding sexual and/or gender identity.  --Lambiam 15:41, 13 November 2023 (UTC)[reply]
Support - makes sense to have this. Theknightwho (talk) 20:45, 13 November 2023 (UTC)[reply]
Support but what do we do with the category Category:English gay slang? Most of the terms seem to be either LGBT or gay-male-specific, which suggests renaming the latter to Category:English gay male slang; but then someone knowledgeable will have to go through and recategorize the existing terms appropriately. Benwing2 (talk) 21:45, 13 November 2023 (UTC)[reply]

Proper definition of when Old Galician-Portuguese ends and Galician/Portuguese starts.

This is User:MedK1 and I feel like it's about time we define this once and for all. Paging @Stríðsdrengur as the only user who's edited an OGP page recently seemingly.

At some point, we need to draw a firm line. The Galician pages with quotes from the 13th century aren't okay. Portuguese obsolete forms like muyto are subject to the CFI like everything else in the language (it's a WT:WDL after all), and if somebody RFVs it (which I might soon), I don't doubt it'd fail.

That word and some others were most definitely much more used back in OGP's time, and so, properly adding them to OGP would greatly improve its word-count (currently depressing) and would prevent any interesting information from being removed out of Wiktionary due to limited attestation.

I believe these are the places we could draw the line:

This was supposed to be an exhaustive list. I'm partial to 1516 and 1536 myself; I feel like their reasons are the most strong, and Pero's writing in 1500 has a distinct lack of Renaissance (read: fancy) spellings compared to what we imagine as the start of Modern Portuguese; it's distinct from the 1516 and 1536 spellings as linked above, and worlds apart from the 1789 and 1890 dictionaries that I left... somewhere here in Wiktionary. Thoughts? 2804:18:7B:CB71:1:0:5BBF:CAD5 13:16, 13 November 2023 (UTC)[reply]

@MedK1 I am in agreement that we should not use quotes from Old Galician-Portuguese to illustrate modern Galician terms. User:Nicodene and/or User:Ultimateria might have thoughts as general Romance contributors, and there are various contributors to modern Portuguese who may want to weigh in; otherwise I think whatever you think is best is fine. Benwing2 (talk) 20:34, 13 November 2023 (UTC)[reply]
This was also discussed a few months ago. I defer to the views expressed by @Froaringus and @Sarilho1 on the matter. Nicodene (talk) 21:44, 13 November 2023 (UTC)[reply]
Adding some comments from linguistic sources:
'Most historians have considered Galician and Portuguese as varieties of the same language until around the 14th century [...] Towards the beginning of the 15th century, Galician and Portuguese already show some noticeable phonological differences [...]' - Martínez-Gil 1997, 'Word-final epenthesis in Galician', p. 332 in Issues in the phonology and morphology of the major Iberian languages.
'[...] from the 15th century, when the increasingly impermeable political frontier went up between Galicia and Portugal, Galician lost contact with its sister tongue, Portuguese.' - Hermida 2001, 'The Galician speech community', p. 115 in Multilingualism in Spain: Sociolinguistic and psycholinguistic aspects of linguistic minority groups.
'We shall use the term Galician-Portuguese (GP) for the medieval varieties spoken in Galicia and Portugal until roughly the Renaissance, although some consistent differences already existed during the late Middle Ages (Maia 1997). Following this period, we shall speak of Galician (Glc.) and Portuguese (Pt.) as different languages...' - Dubert & Galves 2016 'Galician and Portuguese', p. 412 in The Oxford guide to the Romance languages.
Nicodene (talk) 22:37, 13 November 2023 (UTC)[reply]
Very informative! Thanks for the link to the other topic; I see now that the man that was called the last remnant of Old GP is Gil Vicente, not Garcia de Resende. I find it pretty interesting that both of them died in 1536 though.
I did some digging through a few of the sources you've presented, and I wasn't able to figure out which differences they're talking about exactly. I was aware of slight differences in spelling trends such as -m/-n, but surely that can't be all, can it? When I think of the modern-day languages, the major, consistent phonological differences I can think about are the devoicing of G/J to X and the pronunciation of C/Zs as a dental fricative. I don't think it's a coincidence that Castilian Spanish shares the same features though; the simplest explanation is that they developed the sound changes at relatively the same time: the middle of the 16th-17th centuries[6]. However, that's past the 'beginning of the 15th century' deadline brought up by the scholars above. Since they can't be talking about these relatively big changes (that still don't even really affect comprehension), I'm not certain about what they might be alluding to.
Replying to points raised in the previous topic, I'm not at all happy with drawing the line at any point before 1400, and especially not at 1300. The amount of obscure lemmas that would have to be created for the modern languages and the amount of terms that wouldn't be able to be represented because they don't pass CFI (pre-Renaissance spellings were an anything-goes situation) would be ludicrous. I agree with Froaringus when he says "That was perhaps the last political opportunity for [Old] Galician-Portuguese to maintain its unity", but the part I agree the most with is his wording: "political opportunity". Ferdinand I's relinquishment of Galicia was a political move, and while it obviously had consequences to the language used in both territories, they can't and couldn't have been immediate. To draw the line over there is to draw the line with a political rather than linguistic base, which is exactly what I, Benwing and other people were voting against just now concerning Serbo-Croatian.
I mentioned consonant devoicing and C/Zs as some of the biggest phonological differences between the two languages. Both features were being 'developed' starting in the middle of the 16th century. I couldn't help but notice that the 1536 figure is pretty close to that timeline-wise. It seems to be a pretty consistent beat when it comes to differentiating Portuguese and Galician linguistically; what with the death of two notable (Old) Portuguese cultural figures, the confection of a Renaissance-styled dictionary and the beginnings of notable phonological changes in Galicia. MedK1 (talk) 00:40, 14 November 2023 (UTC)[reply]
@MedK1 I agree that the big phonological differences between Galician and Portuguese that are shared with Spanish are unlikely to be coincidental; Occam's Razor would dictate that they are due to mutual influence. Benwing2 (talk) 02:14, 14 November 2023 (UTC)[reply]
Hi. I'll insert myself here.
First: Most Galician philologists don't consider OGP a different language, but a different phase of the language. Galician medieval written production (tens of thousands of loose parchments, hundreds of books of inventories and the like; and then the general prose, to the exclusion of the lyrical production, which is a common endeavour) is roughly equivalent in size and quality to the Portuguese one, and is usually studied as an integral part of the curriculum of the language, not as, say, Latin, which is an ancestor language. Again, both literatures and written traditions have had their own autonomous life at least since the middle of the 14th century, and you shouldn't forcibly strip a language of its literature.
Second: Main early spelling differences (since the 13th century): Galician <nn>, <ñ>, <i>, <y> vs Portuguese <nh> for the palatal nasal; Galician <ll>, <l> vs Pt. <lh> for the palatal lateral; Galician <i>, <y> for the vowel vs. Pt. <i>, <y>, <h>. It is important to note that early Portuguese is much more homogeneous than early Galician, both in spelling and in the admission of dialectal features, because of the Portuguese royal chancellery. Also, already since the 13th century some differences between the two varieties are perceptible, both in vocabulary and, most notably, in verb conjugation, e.g., Galician disso/disse (MG dixo) vs. Pt. disse 'he/she said', Galician quisiste/quisische (MG quixeches) vs Pt. quisiste 'you wanted'...
Third: current MAIN phonetic differences in between both languages with an old origin (out of my head!):
- Galician loss of the phonemic opposition b / v (also affects northern Portuguese): attested since the 13th century, notable since 1400 (baca instead of vaca, "cow", since at least 1406).
- Galician devoicing and collapse of fricatives, etc., so there is no /ʃ/ vs. /ʒ/ opposition, and /s/ vs /z/ collapsed in the west but /z/ > /θ/ in the east (notable since 1400: sexa instead of seja, "it may be", attested since 1270; marso instead of março or marzo, "March", since 1314).
- Galician plurals of words ending in -l: animal > animaas > animás (in the East and the standard norm animais) vs. Pt. animais: since the late 14th century (rayaas "royals (a coinage)" 1391, oficiaas "officials" 1394).
- Galician loss of phonemic nasal vowels (ã > a in the East, ã > /aŋ/, /aN/ in the West). Since the 13th century, but most notable since 1400 and responsible for a good deal of divergent nouns and verbs between Pt. and Gz.: G umha Pt uma < OGP ûa, G engadir < êadir "to add", G sandar Pt sarar < OGP sãar "to cure", Gz servidume Pt servidão < OGP servidûe < Lat servitūdinem...
- Galician result of -ano > OGP -ão: MG irmão, irmãos > WG (irmaan >) irmán, irmáns CG/EG irmao, irmaos (vs. PT irmão) "brother, brothers": Most notable since 1400, but, for example yrmaan "brother" is already attested in 1338.
- Portuguese confusion of -ão / -am / -om > -ão/-am (since the 15th century?) vs. its absence in Galician: Pt. eles comeram 'they ate' vs Galician eles comeron 'they ate'.
- Portuguese <ch> /tʃ/ > /ʃ/, but Galician still /tʃ/ (also residually in N Portugal: 18th century?)
- Galician "gheada", /g/ produced as an aspirate (or regionally as /k/ after a nasal): 18th century.
References:
- Pär Larson (2018) La lingua delle cantigas: grammatica del galego-portoghese. 9788843093953.
(I'll add more at home) Froaringus (talk) 14:29, 14 November 2023 (UTC)[reply]
- Clarinda de Azevedo Maia (1986) Historia do Galego Português
- Fernando Venâncio (2019) Assim nasceu uma língua. 978-989-702-510-5.
- Ramón Mariño Paz (2017) Fonética e fonoloxía históricas da lingua galega. 978-84-9121-187-7.
- Xosé Manuel Sanchez Rei (2021) O Portugués esquecido. O galego e os dialectos portugueses setentrionais. 978-84-8487-537-6.
I should add here that Portuguese grammarians since the 16th century have addressed Portuguese and Galician as different languages, not because Galician was being influenced by Spanish, but because Galician felt distinctly rural, unsophisticated, and archaic to them. Froaringus (talk) 17:06, 14 November 2023 (UTC)[reply]
- Rübecamp, Rudolf (1932) “A linguagem das Cantigas de Santa Maria, de Afonso X o Sábio”, in Boletim de Filologia, volume I, pages 273–356
- Vaz Leão, Ângela (2000) “Questões de linguagem nas Cantigas de Santa Maria, de Afonso X”, in Scripta[7], volume 4, number 7, →DOI, retrieved 16 November 2017, pages 11–24
This latter author wrote (page 15, my translation) in reference to the language used in the Galician-Portuguese lyric tradition: "13th century literary Galician-Portuguese still constituted a unity, albeit an unstable one. Certainly the common spoken use was showing the future bifurcation into Galician and Portuguese. The same was happening to the literary language: inside that artificial unity the presence of advance notices of separation can be found". Froaringus (talk) 17:33, 14 November 2023 (UTC)[reply]
Sorry guys. Two other references:
- Rudolf Rübecamp (1930) Die Sprache der altgalizischen Cantigas de Santa Maria von Alfonso el Sabio
- Clarinda de Azevedo Maia (1996) "O Galego-Português medieval"
In the latter article the author defends the existence of a Medieval Galician-Portuguese language that goes beyond the lyrical Galician-Portuguese tradition (this existence is not undisputed), although she recognises that there are growing differences appreciable since the XIII century and that grew during the XIV and XV centuries: "beyond a large common base, Galician documents show some specific evolutions that eventually will constitute true Galician innovations". Later: "Since this common Galician-Portuguese phase the variants will follow different "historical pathways": responsible for this separation of Galician and Portuguese are some alterations of historical and political origin happening in both territories, some of them belonging to prior times but whose linguistic consequences were felt most notably from the middle of the XIV century and the beginning of the next century." Froaringus (talk) 18:29, 14 November 2023 (UTC)[reply]
Summarizing some historical facts:
- Galician-Portuguese evolved from Vulgar Latin in what Joseph Piel called Magna Galicia, that is, Galicia and northern Portugal. Pre-Latin Western Indo-European languages acted as substrate. The Germanic languages of the Sueves and Goths acted as early adstrata.
- The Arab invasion of the Iberian Peninsula produced a partial depopulation of the Douro river valley. Many bishops of what is today Portugal fled to Galicia, among them those of Coimbra, Lamego, Dume and Braga. The Arab presence in Galicia was ephemeral, if there was any at all. During the next century Galicians "reconquered" and repopulated much of northern Portugal: "Ad populandum": toponímia e repovoamento no sul da Galiza alto-medieval (https://revistas.ucm.es/index.php/RFRM/article/view/61690)
- During the 9th and 10th centuries Galicia and N Portugal constituted a unity, governed sometimes by kings of their own; the foundation of Santiago de Compostela as a pilgrimage centre brought people and culture from north of the Pyrenees.
- In the late 11th century Galicia was awarded to Count Raymond of Burgundy as a personal fiefdom. He aspired to succeed the king, but died before him. In any case, his son, the future Alfonso VII, was given the Kingdom of Galicia with the title of king. At the same time, his cousin Henry was crowned as first king of Portugal. Alfonso was supported by Galician noblemen and the archbishop of Santiago de Compostela. Henry was supported by the Portuguese nobility and by the recently reinstituted archbishopric of Braga. Galicia was divided in two: Portugal expanded south while Galicia, united with León, managed to maintain its independence from Castile. Both kings of that century lie in the Royal Pantheon of the Cathedral of Santiago de Compostela.
- Probably under the growing French influence, Galicia and Portugal developed a lyrical tradition similar to that of southern or northern France, or Sicily.
- By 1230 Galicia, León and Castile were united under a single king, each country maintaining the title of kingdom. Alfonso X became the most important patron of this Galician-Portuguese tradition. In 1290 Galician-Portuguese is first mentioned as a language of culture, by the Catalan Jofre de Foixà, courtier of the king of Sicily. The language is called, simply, gallego, Galician.
- During the 14th century, Galicia fought, and lost, for alternative kings to those who finally reigned in Castile. By then, the language was already known internally as galego and divergence with Portuguese was more evident. Galician noblemen paid for the translation of books based on the Roman of Troy, king Arthur, etc., and the production of works of history.
- During the 15th century Galicia became impermeable to royal power, and rivalry between noble families led to, for example, a knight taking a bishop prisoner for months, on at least two different occasions with different protagonists. A series of revolts ended circa 1470 with a true revolution that destroyed most castles across the country. Sadly the revolution was defeated by the lords who, in turn, would eventually be defeated by the Catholic Monarchs, who established a Royal Audience as the body of government and justice of the kingdom of Galicia, under the authority of a governor, also president of the Audience and Captain General with vice-royal powers (which is actually the structure used by the Spanish Empire later in the Americas and the Philippines). The interlocutor of the Governor was the Junta del Reino de Galicia, a representative assembly whose deputies were nominated by the cities. As a result, the most important noblemen were forced to go to Castile to work for the kings, and the economic powers of the many Galician monasteries were put under Castilian rule. So, most nobles and monasteries stopped issuing documents in Galician, and by 1530 Galician was seldom used in legal documents (beyond personal and place names, and concepts with poor translation into Spanish) but just in private letters, songs, theatre... At that time, the first Portuguese grammarians wrote their works, acknowledging Galician and Portuguese as two different languages.
So, 1500 could be Ok after more than two centuries of accumulative divergence, but keeping in mind that:
- at least for Galician studies, Old Galician or Galician-Portuguese is a period rather than a different language.
- whenever a big fish and a small fish are put together, the bigger one tends to eat the little one or make it disappear.
Froaringus (talk) 21:12, 14 November 2023 (UTC)[reply]
Wow, that's a lot of lines to read. Thank you for this and for all the references!
I don't have a lot to add, but I'd like to make some comments regarding your concerns at the very bottom of your post.
  • Portuguese philologists, just like Galician ones, see Medieval Portuguese as a period of one single language as well[8], just like "Classical Portuguese" right afterwards and the current "Modern Portuguese" period.
  • I see what you mean about big and small fish, but I believe you can rest assured none of that would happen here. As you mentioned, "Galician medieval written production is roughly equivalent in size and quality to the Portuguese one". They're both big fish in a big pond.
    • I actually think that it's more likely for the 'small fish' to 'disappear' if we keep the status quo, because medieval forms under the "Portuguese" L2 are subject to CFI. They're "safer" under the OGP L2.
Qwerty below me mentioned English being in a very similar situation to this, and some leniency being allowed for texts that fall "on the 'wrong side' of the line". With that and the fact Galician was actually already somewhat limited in usage by 1530 in mind (I didn't know that!), I too am perfectly alright with 1500 as the date. I think we've reached a consensus here! MedK1 (talk) 18:48, 15 November 2023 (UTC)[reply]
In Galician studies 1500 is, give or take, the limit most frequently used to define the end of Medieval Galician. In fact, even (less formal) late 15th century texts already sound and feel more like Middle Galician, but 1500 is certainly the most accepted date. Froaringus (talk) 08:49, 16 November 2023 (UTC)[reply]
While I am unfamiliar with the particular details of this divide, I do think it's prudent to draw parallels with the boundary where Middle English becomes Modern English and Middle Scots -- coincidentally, it's exactly the same time frame. There are solid arguments for 1476 (the first printing press in England) and 1535 (the first complete Bible printed in English), so rather than weigh the merits of both, the OED simply splits the difference by giving a date of 1500 and allowing for some leniency if context indicates a text falls on the "wrong side" of the line. To my knowledge, we follow the same approach. Qwertygiy (talk) 02:52, 14 November 2023 (UTC)[reply]
Any line between different historical periods of a language is arbitrary, and more of a smeared boundary. It's not like people were speaking OGP one year and Galician or Portuguese the next year. I've seen scholars in different languages use either historical events (e.g. the boundary between Old English and Middle English being around the Norman Conquest), or the time of publication of notable literary artifacts (per @Qwertygiy's examples for Middle vs Modern English).
Personally, I prefer the latter, because it's a concrete example of a coherent text that belongs to a particular language stage per scholarly consensus. It seems that your personal favorites are along that line too, so it seems like 1500 is a decent choice. From what I can tell (without having read the entirety of this thread), Froaringus is on board with that as well. Chernorizets (talk) 02:09, 16 November 2023 (UTC)[reply]
Yep. Sorry for the wall of text. Given more time I could have come up with something more compact and palatable, but I decided to act on the spot. And yes, I'm OK with circa 1500 :-) Froaringus (talk) 09:38, 16 November 2023 (UTC)[reply]
@Froaringus based on what I read after posting here, it sounds like the period from roughly 1400-1500 was effectively transitional, where the divergence between Galician and Portuguese was smaller towards the beginning and more pronounced towards the end. I'd anyway expect a transitional period rather than a sharp boundary. Put another way, 1500 sounds like the approximate time past which it would be hard to justify talking about a single Portuguese-Galician language. If my understanding is correct, then 1500 still makes sense, but editor discretion will probably still be needed if one is quoting from a document written in, say, 1485. Chernorizets (talk) 02:44, 18 November 2023 (UTC)[reply]
@Chernorizets There is a parallel in English; the end of Middle English is variously dated 1500-1550 AD, so a text from say 1525 could be assigned to either, I suppose. (For that matter, if we take the end of Early Middle English as when the case and gender system collapsed, it can be dated anywhere from c. 1200-1340 depending on the region ...). Benwing2 (talk) 08:08, 18 November 2023 (UTC)[reply]
Yep, I agree. Still I'll add a pair of things, for completeness:
- Usually 1500 is the year Galician philologists would give as the end of the Medieval period, but legal documents from, say, 1520 would still be cited as Medieval for strictly historical reasons (essentially, Galician being displaced by Spanish as the language of law and administration since circa 1480). But the linguistic features of those documents usually mark them as Middle Galician.
- When you compare some documents, or even books, issued or published in northern Portugal, near the boundary of both countries, around 1500, these tend to bridge the gap. I mean, by then Galician and standard Portuguese had already been going their own separate ways for quite some time, but they existed in a dialect continuum.
So, yes, in my opinion some editor discretion will be needed around that year (both before and after). So, I also agree with Benwing2. Froaringus (talk) 10:47, 18 November 2023 (UTC)[reply]
By the way: can someone look into the flag attributed to Old Galician-Portuguese? It's a minor issue, but it should represent not just the county or kingdom of Portugal, but also the kingdom of Galicia. There're plenty of flags and coats of arms on wikimedia: see Kingdom of Galicia. Froaringus (talk) 17:55, 18 November 2023 (UTC)[reply]

Turkish etmek verbs

Pinging Afb2011, Anlztrk, Flāvidus, Itidal, Johanna-Hypatia, Justthatboredguy, Lagrium, Moonpulsar, Newgrass 82, Orexan, PinkPanthress, Rd1978, Sabri76, Sedataltundal, Trimpulot, Whitekiko.

In Japanese, non-compound verbs are a practically closed class; very many verbs are a compound of a noun + する (“do”), for example 管理する (kanri suru, manage), literally “do management”. We have entries for some 7000 of such suru verbs. Per Wiktionary:About Japanese § Verb forms of nouns, these verbs do not have their own independent entries, but are accommodated together with the entry of the noun. (管理する is a hard redirect to 管理#Japanese.)

Turkish verbs do not form a closed class, but Turkish has an analogous construction of verbs that are a compound of a noun + etmek (“do”), for example idare etmek (manage), literally “do management”. We currently have entries for about 143 such etmek verbs, but the official Güncel Türkçe Sözlük of the Turkish Language Association has well over 1000 of these, from abandone etmek to zuhur etmek. Might it be an idea to follow the Japanese example? A user who, not knowing the verb tehcir etmek, encounters the phrase Türkiye’ye tehcir edilen Bulgaristan Türkleri will almost certainly begin by looking up the term tehcir. A further advantage would be that it is much easier to create entries for these etmek verbs by adding them after an existing entry for the noun than by creating a new page.  --Lambiam 12:20, 14 November 2023 (UTC)[reply]

I don't think that would be a bad idea in principle, but how would we deal with verbs derived in such a way which change the spelling of the noun, like zannetmek from zan, or keşfetmek from keşif?
Trimpulot (talk) 13:49, 14 November 2023 (UTC)[reply]
I don't think such verbs are relevant here. Their pages can stay just the way they are. Newgrass 82 (talk) 15:05, 14 November 2023 (UTC)[reply]
Indeed. The suggestion made here applies solely to multi-word verbs in which the last word is etmek, separated in the orthography from the (unchanged) noun by a space.  --Lambiam 21:27, 14 November 2023 (UTC)[reply]
The same applies for Korean 하다 (hada), Hindi करना (karnā), Persian کردن (kardan), but for some reason only Japanese gets the (IMO most sensible) treatment.--Saranamd (talk) 14:44, 14 November 2023 (UTC)[reply]
Eh for Korean, as a Korean learner and having interacted with other learners, we're much more likely to look up the 하다 forms directly. It might also get weird in terms of clutter and then also the verbs/adjectives where the "noun" isn't used outside of the stem. Ex: 은은 (euneun). Korean dictionaries, as you know, also have separate 하다 entries. AG202 (talk) 20:22, 14 November 2023 (UTC)[reply]
@AG202 There is a distinction between actually inseparable verbs/adjectives like 착하다 (chakhada) and ones like 사랑하다 (saranghada). 사랑하다 (saranghada) is clearly not actually one word because it can be split by particles and even entire NPs and adverbial phrases, e.g.
우리는 사랑을 참 아름답게 했다. (uri-neun sarang-eul cham areumdapge haetda, “We loved very beautifully.”)
Japanese has the same distinction, and the equivalents to 착하다 (chakhada) are given their own entries while the equivalents to 사랑하다 (saranghada) are grouped with the noun.
The reason the way Japanese does it is best is that when a noun gets updated (in terms of definitions, usage notes, etc.), the verb section is much more likely to get updated along with it if it's actually on the same page. Institutional dictionaries don't have this concern because they aren't reliant on volunteers and the people actually get paid to maintain consistency.--Saranamd (talk) 20:11, 15 November 2023 (UTC)[reply]
For Japanese, at least, the monolingual resources I'm familiar with all list definitions under the noun, indicating whether the noun can be used with suru for verb senses. Basically what we do here.
An example in the bilingual section of Weblio, for the noun 重視 (jūshi, “serious consideration; important regard”, literally “heavy + view”, noun, usable with する (suru) for verb senses): https://ejje.weblio.jp/content/%E9%87%8D%E8%A6%96 ‑‑ Eiríkr Útlendi │Tala við mig 20:24, 15 November 2023 (UTC)[reply]
I think Japanese gets this treatment because noun + する verbs are much more prominent in Japanese than noun + etmek verbs are in Turkish.
You're right that a person who sees "tehcir edilen" will probably go to the page tehcir if they're unfamiliar with Turkish, but tehcir etmek is already linked as a "derived term" in that page. Newgrass 82 (talk) 15:15, 14 November 2023 (UTC)[reply]
This is true for tehcir, but not in general. The page for alay does not link to alay etmek, the page for ameliyat does not link to ameliyat etmek, the page for dans does not link to dans etmek, and so on. The noun fark lists ten derived terms and three related terms, but fark etmek is not among them. The transitive senses of this verb are not easily guessed from the common meaning of the noun.  --Lambiam 22:14, 14 November 2023 (UTC)[reply]
Uplifting. Turkish pages getting a special treatment.
There is a special headword template in Japanese for those verbs: {{ja-verb-suru}}. See 監督.
https://en.wiktionary.org/wiki/Wiktionary:About_Japanese#Verb_forms_of_nouns
Needed here too?
Flāvidus (talk) 20:26, 14 November 2023 (UTC)[reply]
If we adopt this approach, we’ll create an analogous headword-line template for Turkish, perhaps named {{tr-verb-etmek}}.  --Lambiam 21:23, 14 November 2023 (UTC)[reply]
I think this will do the job:
{{head|tr|verb|head={{PAGENAME}} etmek|third-person singular simple present|{{PAGENAME}} eder}}
 --Lambiam 21:44, 14 November 2023 (UTC)[reply]
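For illustration, a minimal sketch of how a noun page such as idare might look if this approach were adopted, assuming the layout mirrors the Japanese suru convention described above. The section order and the glosses are only illustrative, and since {{tr-verb-etmek}} is at this point just a suggested name, the sketch spells out the generic {{head}} call with the page name written in full:

```wikitext
==Turkish==

===Noun===
{{head|tr|noun}}

# [[management]]

===Verb===
<!-- Hypothetical layout: the {{head}} call proposed above, with {{PAGENAME}} expanded to "idare" -->
{{head|tr|verb|head=idare etmek|third-person singular simple present|idare eder}}

# to [[manage]]
```

By analogy with 管理する, the page idare etmek could then presumably become a hard redirect of the form `#REDIRECT [[idare#Turkish]]`, though whether to redirect or keep a soft pointer is not settled in this thread.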
Totally fine by me. I can start working on adding more "etmek" verbs in my free time. That template can make our job easier + maybe we can create a category for these verbs. Moonpulsar (talk) 23:44, 14 November 2023 (UTC)[reply]
No part of this proposal makes any sense. Regarding Japanese entries, I can't wrap my head around why they would decide to do something so strange as this. Structures like this exist in many languages. No amount of productivity can justify this abomination. Completely unnecessary for Turkish and has no academic basis. This part about a user looking up "tehcir" boggles my mind. So? That's like saying if someone didn't know the meaning of do business, they'd look up the word business first. Yeah, that's kinda how the human brain works. Should we just put "do business" under the page for "business" then? I don't see how that constitutes valid grounds for this proposal.
Plus, if we were to do this then surely we'd have to do something similar for derivative suffixes? bilgi (knowledge) and bilgili (who has knowledge, knowledgeable) for instance have precisely the same amount of semantic and pragmatic difference as "management" and "to manage" after all. And it doesn't even take a full word like etmek, just a couple letters extra. Most Turkish suffixes are highly productive too, derived entries are just taking up too much space. What about compound verbs formed with yapmak, like alışveriş yapmak, hata yapmak, yol yapmak, açıklama yapmak, egzersiz yapmak etc. We could just chop them all down and stuff them into the entries for their root words and simplify the whole thing. Complete nonsense. Orexan (talk) 07:20, 15 November 2023 (UTC)[reply]
It is clear that you are against the proposal, but I do not quite see what your objection is. It does make sense to me.  --Lambiam 19:22, 15 November 2023 (UTC)[reply]
I agree with @Lambiam -- your opposition is plain, but your reasoning is opaque.
This proposal is specifically about multi-word compounds, where etmek is a separate word. This is not about suffixes. This is not about yapmak (although, in my ignorance of Turkish, I do not see why it couldn't be extended to that construction as well, so long as such constructions are lexically significant).
We did this for Japanese entries precisely because it makes sense -- Japanese can use all-purpose verb する (suru, to do) after almost any noun. There is no real value in creating separate entries for the nouns and then the verb forms using suru. There are exceptions, such as 愛する (aisuru), where the suru portion is analyzed instead as an integral part of the word, rather than as a separate addition, and for these we have full entries. This appears to be analogous to those cases in Turkish where the root noun has fused with the etmek, such as the zannetmek and keşfetmek examples above. For all other Japanese suru verbs, where the suru is deemed a separate word, we have entries just at the noun headword, and include a "Verb" section that describes how this noun works with suru to express verb senses.
If we instead have separate entries for the noun, and the noun + suru, we are forced to duplicate a lot of information, and we must ensure that all of the noun entries correctly point to the suru entries as well, for no appreciable gain in usability. ‑‑ Eiríkr Útlendi │Tala við mig 19:57, 15 November 2023 (UTC)[reply]
@Orexan I don't understand your point about yapmak. As far as I've seen, verbs formed with it are almost exclusively SoPs: in your example of hata yapmak, hata is pretty clearly just the direct object of the verb, while in a construction such as idare etmek, the noun is just providing semantic meaning to the verb, which can still take a direct object. Moreover, wouldn't it be a good thing to move all the verbs derived with etmek or even olmak, when possible, under their noun component, since, as you said, derived entries are taking up too much space? The modality of doing that (whether by following the example of Japanese suru verbs, or by following the example of some Ottoman Turkish entries, such as قادر, which show these verbs as collocations, or by doing something else entirely) can be discussed further, but I don't see a reason not to embrace the general idea. Trimpulot (talk) 18:34, 16 November 2023 (UTC)[reply]
Rather than make a separate entry for noun + etmek phrase, along with the proposed entries under the nouns in question, I suggest adding an informative note to the etmek article explaining its frequent use in verb phrase derivation. Perhaps likewise, mutatis mutandis, for yapmak too. Maybe even note that kılmak fulfills a similar role, albeit in a highly restricted sense; in related Turkic languages qilmoq (for example) gets wider use comparable to etmek. Johanna-Hypatia (talk) 21:31, 25 February 2024 (UTC)[reply]
This proposal is about how the Wiktionary data is structured, and has not much to do with Turkish in particular, since, as someone said, there are a number of other languages with the same or similar syntax.
Being a native speaker doesn't entitle anyone to a free ticket to a privileged opinion, since the person who proposed this, and others here endorsing it, are knowledgeable and/or linguists.
As for the exceptions with etmek that behave differently, they only prove the rule; such exceptions exist in Japanese as well.
QUOTE https://en.wiktionary.org/wiki/Wiktionary:About_Japanese#Verb_forms_of_nouns "Note however that some verbs ending in する behave differently, such as 愛する and other verbs with one kanji plus する. See 愛する."

Maybe we could ping some active Japanese contributors and admins to ask why they did something as strange as this. :)
Maybe you could move this discussion to an admin level, if one exists.
As for me: Not that I study Japanese or any other language, yet I visit Japanese entries a lot, and I remember being glad that the data was structured as it is.
Flāvidus (talk) 12:55, 15 November 2023 (UTC)[reply]

@Lambiam Just FYI, there was an RFD proposal awhile ago to delete all the Hindi करना (karnā) compound verbs as SOP but I objected and pointed out that many of them are not transparently derived from the base word. I don't think the proposal was to move them to the base word, but just to delete them. I am of two minds about whether to lemmatize them under the base term; for English we normally lemmatize phrasal verbs as such e.g. get up, take on rather than putting them under get and take, as some dictionaries do. At least for English this makes sense because common phrasal verbs often have a multitude of different meanings and there are a lot of phrasal verbs derived from common verbs like get and take, and putting them all under the base verb would get extremely unwieldy (as well as the fact that there's no clear structure for doing this in Wiktionary). This may work for Japanese because there appears to be only one light verb that most such verbal compounds are made from i.e. suru, but in Hindi there are several besides करना (karnā). Not sure about Turkish; are there others besides etmek? Benwing2 (talk) 09:26, 16 November 2023 (UTC)[reply]
Orexan mentioned compound verbs formed with yapmak, like hata yapmak. We currently list 19 such verbs, while the Turkish Wiktionary has 115 entries, but the only ones the two Wiktionaries have in common are ağda yapmak, banyo yapmak, sörf yapmak and şaka yapmak. They are unlike the etmek verbs in that, as far as I see, they are all intransitive; the object slot of the transitive verb yapmak is already taken up by the first component, like in English do battle, do business, make amends and make conversation. Also, some of those listed are in my opinion just a sum of parts; for example, araştırma yapmak is as transparent as English do research.  --Lambiam 11:02, 16 November 2023 (UTC)[reply]
PS. I just realized one can also use bir şaka yapmak, bir hata yapmak or hatalar yapmak, which sows some doubt in my mind we are dealing with honest verbs here, rather than idiomatic collocations. Compare pull rank, where one can say “he pulled his rank”.  --Lambiam 11:26, 16 November 2023 (UTC)[reply]
I don't see moving etmek phrasal verbs under the noun entry as a big improvement. If the meaning is predictable and SOP then it should not have an entry altogether and may be added as a collocation in the noun entry, as I have done in a number of OsmT. entries like لاف, قربان, قادر, etc., while if it isn't SOP, as in fark etmek, then it rightly deserves its entry. In this regard Turkish is not unlike most languages I know of, albeit accentuated. The Japanese situation is somewhat different: suru verbs have been analysed since time immemorial as a distinct class of verbs, so treating them as a separate POS under the noun entry is a neat practical compromise. In Turkish on the other hand this usage doesn't seem necessarily restricted to etmek, the same treatment should be given at least for olmak, possibly even for yapmak, and at that point one wonders where to stop. The point "one would look under the noun entry" makes sense, although this is not an isolated case, as has been pointed out. There are many situations, Turkish aside, where I struggled to realise I had to look under a multiword term to grasp the semantic evolution of the words together. This can be any type of multiword term, adjective + noun, preposition + adverb, verb + verb, etc. and not restricted to this kind of phrasal verbs. To conclude, I think SOP etmek (and yapmak, etc.) verbs should be made into collocations and deleted, while non-SOP ones should be kept as derived terms. Catonif (talk) 10:26, 17 November 2023 (UTC)[reply]

What are the attributes of these words from the Salic law around 500AD? I'm reading it as:

speak-1.SG.PRES.IND PRON-2.SG.ACC liberate-1.SG.PRES.IND villein-NOM.SG
‘I declare: Thee I liberate, villein.’

Is this correct, and under which language should they be classed? 500AD seems rather too early for Old Dutch, but we have no Frankish on WT anymore. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 17:49, 16 November 2023 (UTC) Minor formatting changes ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 17:55, 16 November 2023 (UTC) [reply]

Semitic verb lemma glosses

Hello. What do you think about adding a lemma gloss “third-singular masculine past” (or similar) to Semitic verbs to indicate what the citation form is? This was requested by a number of users for various languages (cf. Wiktionary:Beer parlour/2023/October#Changing Latin verb definitions to use "to ..." instead of "I ..."). Based on this, I added it to Arabic and Maltese, and was reverted by User:Fenakhay. He claims this information is unnecessary as language learners of Maltese and Arabic will know Semitic dictionary conventions. Thoughts? Benwing2 (talk) 06:53, 17 November 2023 (UTC)[reply]

Those decisions should be taken by each language community. And posting it in WT:BP as a way to attract unconcerned parties in an effort to enforce a decision on all Semitic languages is a really pathetic tactic to be honest. If you want to discuss those changes for Arabic, post them in Wiktionary Talk:About Arabic. You have been using this tactic many times to impose your rather unilaterally driven decisions and it is kinda getting annoying. — Fenakhay (حيطي · مساهماتي) 06:57, 17 November 2023 (UTC)[reply]
@Fenakhay: With all due respect, your criticism toward @Benwing2 is unfair, in my opinion. His contribution to the Arabic inflection modules is huge. No-one had enough patience and stamina to get these modules to the current level. (There's always room for fixes and improvement, of course.) Also, unconcerned parties won't contribute in discussions.
Rather than fighting, can we look at the topic at hand? :) If you look at various conjugation tables, e.g. Russian иска́ть (iskátʹ), the lemma "infinitive" appears on the first line, which makes also clear that it's the lemma, even if it's not said in words. Compare with the French chercher, German suchen.
On the other hand, it's not clear why the Bulgarian тъ́рся (tǎ́rsja) is the lemma. Same with the Macedonian бара (bara). I think it would be beneficial to include بَحَثَ (baḥaṯa) on a top line, since this is the lemma. It's the word users look for. I am not suggesting any particular design change, but it can be discussed if it's agreed on and the discussion continues in a positive way. Anatoli T. (обсудить/вклад) 08:50, 20 November 2023 (UTC)[reply]
And the fact that you haven’t pinged the concerned parties (Arabic and Hebrew editors) shows the motives of this thread. — Fenakhay (حيطي · مساهماتي) 07:02, 17 November 2023 (UTC)[reply]
Best discussed within the editor community of each language. I only have the tip of my big toe involved in Ge'ez (I hope to keep learning more Ge'ez to be more involved in the future), and as far as Fenakhay's claim goes, I think it's very reasonable to say that Ge'ez learners already know the citation form is the 3rd-person singular perfect while using "to X" glosses in the English, because that's what every textbook in fact does (Lambdin, Wright, Prochazka, the draft of Butts). I don't know about Maltese but you guys could check textbooks if there's an argument going on about this. When Fenakhay says there's a tradition in "Semitic languages" to do this, I don't think he means generalist Semitic linguistics literature, but actually does mean the textbook and dictionary tradition of every individual Semitic language (and it's possibly true).--Ser be être 是talk/stalk 07:20, 17 November 2023 (UTC)[reply]
In any case I wouldn't prefer that for Amharic either, for the same reason SBES mentions. Thadh (talk) 08:00, 17 November 2023 (UTC)[reply]
@Fenakhay Please don't attribute spurious motives to me; calling me "pathetic" is a rather strong term to use and reflects more on you than me. I have posted here because I prefer consistency across languages and would like to get a wide set of opinions, and I don't think that's an unreasonable position to take. On your request I am pinging (Notifying Alarichall, Atitarev, Esperfulmo, Erutuon, عربي-٣١, Fay Freak, Assem Khidhr, Fenakhay, Fixmaster, Roger.M.Williams, Zhnka, Sartma): the Arabic language editors. There is no workgroup for Hebrew so I'm not sure who the relevant editors are and I'm not even sure how active the Hebrew community is at this point. Benwing2 (talk) 09:40, 17 November 2023 (UTC)[reply]
@Benwing2: But that's not a gloss of the lemma! It's the non-gloss of identification of a form. It's an explanation of the headword of the entry, and should be de-emphasised. If you want support for a means of presenting that information, show the form to us. It's not documented, and what I saw for Latin (formerly under pluit) was bad, and is now gone. RichardW57m (talk) 11:13, 17 November 2023 (UTC)[reply]
What RichardW57m says. Already discreetly done even, in so far as we link verb morphology appendices, that say “the citation forms, which in Arabic means the 3rd-person masculine singular perfect” – perfect. Everything more is noisy. Fay Freak (talk) 14:02, 17 November 2023 (UTC)[reply]
@Benwing2: Could you please give me an example? As I remember, we used to learn Literary Arabic verbs in school by their masculine third person form in the past. Not sure about Modern Hebrew, though, perhaps Classical Hebrew is taught similarly to Classical Arabic. --Esperfulmo (talk) 14:25, 17 November 2023 (UTC)[reply]
Usage for Hebrew varies. Many use the 3sm qal perfective or simply the root (and that's not a simple rule for פ״ו verbs), but there are some dictionaries that use the present participle. I've a feeling there's a Hebrew dictionary out there that actually uses the infinitive - I've face-palmed on seeing the infinitive used for numerical algorithms for grouping languages by similarity. --RichardW57 (talk) 23:57, 25 November 2023 (UTC)[reply]

Proto-Ta-Ne Omotic

Request to add code otn-pro, which would include Bench, Gonga/Kefoid, Ometo, and Yemsa/Janjero (https://glottolog.org/resource/languoid/id/gong1255, see also Blažek 2008 in In Hot Pursuit of Language in Prehistory, as there is a basis for grouping on lexical similarity) Saph668 (talk) 12:15, 17 November 2023 (UTC)[reply]

I think @Saph668 has misunderstood my suggestion on WT:TR, which was to ask for the addition of the Ta-Ne Omotic language family (as given above) and its proto-language. I don't think 'otn' is a suitable code element, as it is assigned to an Otomi language. I think we may have to go for something starting omv-, perhaps omv-ggi for Gonga-Gimojan, Bender's name for the family. Or is there a problem with omv-ggi-pro for the Proto-language? --RichardW57m (talk) 13:45, 17 November 2023 (UTC)[reply]
@Saph668: Is it more frequently used than "North Omotic"? Also, otn is already used for Tenango Otomi, so we need to call it something else, like omv-otn. Furthermore, if there are no reconstructions yet done for the branch, it may be best to just keep it as a family without a reconstructed ancestor. Thadh (talk) 14:11, 17 November 2023 (UTC)[reply]
@Thadh: There are quite a few reconstructions around labelled as "North Omotic" - I'm not sure what their quality is. (I suspect there may be a lot of refinement to come.) The problem with nomenclature is that the sense doesn't seem stable - does it include the Dizoid and Mao groups? On the other hand, the "Ta-Ne Omotic" group does seem to be a stable concept, and is recognised by sceptical Glottolog; it is highly plausible that Proto-Ta-Ne Omotic actually existed. The same cannot be said of Proto-North Omotic - Glottolog does not accept the existence of the "North Omotic" group (or even Omotic). --RichardW57m (talk) 14:28, 17 November 2023 (UTC)[reply]
@RichardW57m: We decide which languages to include in a branch ourselves (of course on the basis of other sources), so we don't need to use a lesser-used term just to make it clear to others - we have family trees for that; but if it is truly more used/preferred by other sources, we can adopt it. And as for reconstructions, if there are at least solid sound correspondences, we can opt for a solution where we do add the proto-language, but agree not to create or link to any reconstructions, rather just putting {{inh|LANG|omv-otn-pro|-}} in the etymology for categorisation. This was also the plan with Proto-Cushitic (although I believe at some point people still started adding some shaky forms to the etymologies). Thadh (talk) 14:35, 17 November 2023 (UTC)[reply]
Apologies, I didn't put much thought into the language code... Regardless, RichardW57m is right, Ta-Ne Omotic is a more stable concept. Saph668 (talk) 14:37, 17 November 2023 (UTC)[reply]
The advantage of having a proto-language is that one can list the cognates under the reconstruction, rather than having n-1 cognates listed for each of n cognates, as opposed to the current clutter where inherited Tai words are slowly acquiring umpteen cognate Zhuang forms in the etymologies of the cognates. --RichardW57m (talk) 14:43, 17 November 2023 (UTC)[reply]

Maltese "words" ending with hyphen

(copying from Wiktionary talk:Beer parlour, where it was accidentally placed)

(Notifying Alarichall, Atitarev, Esperfulmo, Erutuon, عربي-٣١, Fay Freak, Assem Khidhr, Fenakhay, Fixmaster, Roger.M.Williams, Zhnka, Sartma): We now include Maltese words such as il- and tal- whose entry name contains a hyphen because it is never used without one (e.g. il-mara, tal-kelb), but this practice seems unprecedented in Wiktionary outside Maltese. Usually the hyphen indicates that the given entry is an affix, e.g. dés-, which combines to give words such as désordre without the hyphen.

I am bringing this up here to decide once and for all whether such Maltese entries should be included in Wiktionary with or without a hyphen. (So please avoid bringing up the argument that the current modus operandi for Maltese is to include the hyphen.)

Personally I can see both benefits and drawbacks, and a potential argument for the hyphens is that it seems a bit ridiculous to suggest that "x" is itself a word (on the basis of the current x-), since it has only one consonant and no vowels. I can also see another potential argument for the hyphens, namely that currently we accept apostrophes in entry titles (e.g. m’).

(P.S. lill- was deleted in 2017 by @Qehath.)

--kc_kennylau (talk) 02:36, 18 November 2023 (UTC)[reply]

@Kc kennylau Personally I don't see why a single consonant can't be a word; but regardless of that, il- is properly speaking a clitic, so if you're looking for analogies outside of Maltese, you should look for how clitics are handled in other languages. Russian has several clitics, for example (which can be found mixed into CAT:Russian particles, but maybe should be moved into CAT:Russian clitics), and the ones that attach to preceding words with a hyphen are written with a hyphen, e.g. -либо (-libo), -то (-to), -ка (-ka). This is somewhat analogous to English's apostrophe-s clitic ('s), where we include the apostrophe that joins the clitic to the preceding word. So this suggests that the current use of a hyphen is correct. (And I should add, Russian has single-consonant words written without a hyphen: б (b), ж (ž), ль (lʹ), etc. These are also clitics but are normally written as separate words, hence no hyphen.) Benwing2 (talk) 08:21, 18 November 2023 (UTC)[reply]
I see no reason why it shouldn't include a hyphen, given that it's spelled that way.--Urszag (talk) 11:25, 18 November 2023 (UTC)[reply]

(Notifying Alarichall, Atitarev, Benwing2, Esperfulmo, Erutuon, عربي-٣١, Fay Freak, Assem Khidhr, Fenakhay, Fixmaster, Roger.M.Williams, Zhnka, Sartma): There has been a mini edit-war on what is currently Template:mt-conj/VII+VIII (link to history), so I decided that it would be better to settle things here.

We have both stated our reasoning; I stated that Ġabra, Aquilina, and verb.mt all classify nsteraq etc. as Form VII, while @Fenakhay stated that Aquilina actually said that they belong to both Forms VII and VIII; so I have decided to produce what Aquilina has said:

"Another variant of Pattern 7 is obtained by prefixing in as for the seventh form and infixing t after the first radical as for the eighth form."

Here the keyword is "another variant of pattern 7", meaning that such verbs belong to Form VII; Aquilina mentioned the eighth form as a comparison, i.e. this form is like Form VIII, but it is actually not.

I hereby request @Fenakhay to produce what Aquilina has said that convinced him that they belong to Form VIII as well; and if he does so, I would also like to discuss what we should do in this situation.

I have come here with another problem: Ġabra conjugates the verb in the first person singular as "nsteraqt", which contradicts the form "instraqt" given by verb.mt and Aquilina, since the former has an extra "e". What should we do in this situation? Which form is the "correct" one? Or maybe they are variants? Or maybe they are underattested theoretical forms (so it doesn't really matter)?

--kc_kennylau (talk) 20:58, 18 November 2023 (UTC)[reply]

First of all, Ġabra and verb.mt are not reliable sources for verb classification as they contain many errors.
Second of all, Aquilina, in his own dictionary, says that, e.g., nxtegħel is VII+VIII, meaning it is a mix of both forms, which makes sense. And it is the same approach taken by other Arabic linguists for mixed forms in dialectal Arabic. Where is your quote from, as you haven't provided a source?
For your last point, it should be nstraqt as the cluster /str/ is permissible and I've confirmed it with a native speaker.
N.B. Could you not spam the Arabic workgroup because this doesn't concern Arabic itself. You can either ping the Maltese editors or create a dedicated workgroup for Maltese. — Fenakhay (حيطي · مساهماتي) 21:23, 18 November 2023 (UTC)[reply]
@Fenakhay: My quote is from P.159 in "Teach Yourself Maltese" by the same Joseph Aquilina.
I'll edit the templates to reflect the nstraqt forms etc.
Since you seem to be more familiar with the Maltese side of en.wikt, could you provide me with a list of editors that should be in said "dedicated workgroup for Maltese"?
--kc_kennylau (talk) 22:39, 18 November 2023 (UTC)[reply]
(cc @Fenakhay) Update: I have temporarily included a parameter named "keep4" to specify when the first root vowel should not be deleted in certain forms; I think the template T:mt-conj/VIII also has the same problem, e.g. steraq also has the wrong form steraqt.
--kc_kennylau (talk) 23:37, 18 November 2023 (UTC)[reply]
The necessity of the "keep4" parameter seems to be corroborated by verb.mt, which assigns two models to the form VIII verbs, called ltemaħ and ftakar. --kc_kennylau (talk) 14:20, 19 November 2023 (UTC)[reply]

Surnames with many different origins

Do we have a preference for whether/when to group various origins of surnames? E.g. Jiang has a lot of different etymology sections for the surname from Mandarin Jiāng as opposed to the one from Mandarin Jiāng as opposed to from Jiǎng, etc.; likewise Li.
In contrast, Wang, Wong, He and Hu have only one section covering all the origins; likewise other names I can find like Campbell, Meyer, Steen, Johnson, Ng, Bear and Doe.
On the face of it, the Wang/Johnson approach seems better to me, because separate ety sections would seem to require we go outside of lexicography and into genealogy in order to satisfy ATTEST/RFV, to show that not only did books mention people named Jiang, but that 3+ traced their ancestry specifically to family 姜 as opposed to 江, and that 3 books' Does took their name from deer as opposed to water, etc. (We also cover e.g. the two origins which led to the modern verb settle under one ety section for a similar reason, that in many cases they can't be teased apart.) What do you think? - -sche (discuss) 19:24, 19 November 2023 (UTC)[reply]

@-sche Seems OK to me to group etymologies like this for the reason you mention: it may not be easy to separate the origins (esp. for Chinese surnames). Benwing2 (talk) 00:07, 20 November 2023 (UTC)[reply]
See also this relevant BP discussion on surnames alt forms/doublets.
I would prefer that each etymology section includes only surnames from one language, since they sometimes have varying pronunciation/alternative forms/doublets of the surname in English, e.g. Hui is /hweɪ/ for the Mandarin-derived ones and /hu.i/ (or /hɵy̯/ in Hong Kong) for the Cantonese-derived one; or Lee and Choi which have different doublets for the Chinese and Korean ones. It would be somewhat cluttered if they are merged into one section but the same qualifiers repeat multiple times throughout. The exhaustive listing of Li is for sure silly though, and I agree that they 100% need to be merged. An approach like Wang is still OK to me though, but the etymologies should be formatted like the Chinese and Danish ones on Wang, which use proper templates and feel less wordy than the other ones that repeat "As a …".
I'm also curious as to what should be done for entries where there are some place names/other proper nouns that share an etymon with a surname, e.g. Wu, which would be extremely messy if the surnames are merged into one section (in either case of keeping the other proper nouns in a section separate from the surname, or just lumping everything into one massive section). – wpi (talk) 06:20, 21 November 2023 (UTC)[reply]
I don't know how Wiktionary should appropriately cover this situation. However, to shed light on the issue, I have just now demonstrated that all five etymologies could very likely meet WT:ATTEST on their own. Do with that what you will. --Geographyinitiative (talk) 07:15, 21 November 2023 (UTC) (Modified)[reply]
For an example of (what I previously described as) repeating qualifiers, see Liu. I don't really see there is a net benefit of putting all surnames under one etymology. wpi (talk) 14:57, 7 December 2023 (UTC)[reply]

Getting rid of "see also"

A passing thought: what do people think about the idea of (at some point in the future) deprecating "see also" sections? They would be superseded by the existing section types that express the actual nature of the connection between the words, like "hyponym", "meronym", (etymologically) "related terms", etc. The problem with "see also" is that it has no semantics. Equinox 12:28, 22 November 2023 (UTC)[reply]

Good luck with it... There's nothing inherently wrong with no semantic connection. Jewle V (talk) 12:32, 22 November 2023 (UTC)[reply]
(Oh, to clarify, I mean that "see also" doesn't express the nature of the connection — I'm not saying that the meanings of the see-also words have to be similar.) How else will our AI overlords learn how things fit together? hmmm. Equinox 12:34, 22 November 2023 (UTC)[reply]
Sometimes the fact there's no semantic connection is important. I'd like to think linking dim sum from dim and sum would benefit a curious passer-by. Jewle V (talk) 12:53, 22 November 2023 (UTC)[reply]
Hmmm, well, it seems to be a useful miscellaneous section, for example for terms which are semantically but not etymologically related to the entry and are of a different part of speech. That’s what the section seems to be currently used for. — Sgconlaw (talk) 13:15, 22 November 2023 (UTC)[reply]
We need an "other" heading for items that have some kind of connection with the headword, but not one of those that we have a specified home for. Sgconlaw's example is a very good one, but whatever homes we decree or add for specific relationships, there will always be more, especially in the minds of contributors. In some cases, there are terms that are readily confused with the headword, but aren't homophones, let alone homographs. In an entry that contains "not to be confused with X", I wish we would move 'X' to "See also". OTOH, I may have taken a step too far when I used the heading for disease vectors for an entry for a species of bacteria. DCDuring (talk) 18:54, 22 November 2023 (UTC)[reply]
@Equinox while I understand your concern that "See also" can be a bit of a wildcard/kitchen sink, I think it's a losing game to try to give a name to every kind of relationship that can exist - now and in the indefinite future - between two articles on Wiktionary. If you look at WT:EL, "See also" can furthermore point to "other pages on Wiktionary, including appendices and categories." I don't know how often we take advantage of that, but it could be a great way to increase the visibility of some of our appendices.
Another idea might be to expand the "See also" section of WT:EL with guidance on when not to put things in there, in favor of a more fitting section. It would also be nice to have a brief statement of the perceived or anticipated benefit of "See more" for Wiktionary users. Chernorizets (talk) 12:06, 23 November 2023 (UTC)[reply]
An example of a highly relevant semantic connection that is not necessarily syn or cot is s.v. interrogatee at See also > arrestee, detainee, suspect. Such a subclass can be described as "especially relevant Venn overlap". Strictly hierarchical relations such as hyponymy and hypernymy are also very important, but their limit is the strictly hierarchical, and not all relationships are solely hierarchical. Quercus solaris (talk) 20:01, 23 November 2023 (UTC)[reply]
PS: Granted that such things could be moved to a Thesaurus entry and then the See also section need not say anything except "See Thesaurus:blah". A question that follows, though, is, will people object to having Thesaurus entries that are rather sparse? The one for interrogatee could have an ant section and a small see also section and nothing else. If a consensus exists for allowing such sparse Thesaurus entries, then I could adhere to that method. Quercus solaris (talk) 20:20, 23 November 2023 (UTC)[reply]
It's a grab bag because our entries need grab bags for
  1. items a contributor knows belong in the entry but doesn't know where
  2. items whose semantic relation has a name that is unknown to any users other than some semanticists (which includes more than half of those listed in WT:ELE)
  3. particular examples are sgconlaw's and often-confused-with-the-headword (even excluding disease vectors!)
Putting these items in Thesaurus namespace means that users who are unaware that there should be things besides synonyms and antonyms there will be even less likely to see them and pursue them than now.
If someone wants to put in the effort to clean up a sample of, say, 100 See also sections and report findings, there might be something worthwhile to talk about. DCDuring (talk) 03:39, 24 November 2023 (UTC)[reply]
Agree on both of those two latter points. Quercus solaris (talk) 04:45, 24 November 2023 (UTC)[reply]
Would you rather that contributors put items under wrong headings or failed to put them in at all? We don't pay much attention to talk pages. DCDuring (talk) 15:36, 24 November 2023 (UTC)[reply]
My own opinion (and I suspect a commonly held opinion among Wiktionarians) is that the former is much better than the latter. (I grant that the question might have been asked rhetorically, but it's worth answering, as part of working through answers to the driving question in this thread.) Just let contributors enter logically connected things under "See also" and let someone else refine the placement later if they care to and are able. For example, I often find things at "See also" (from others) that I move up to "Synonyms" or "Coordinate terms" when that's in fact what they are. But when they don't fit there, they have to remain at "See also". Quercus solaris (talk) 17:22, 24 November 2023 (UTC)[reply]
Broadly agree with Sgconlaw, DCDuring and WF above, the semantic categories aren't exhaustive and there are cases where they might fit but pedantry over it doesn't help readers. —Al-Muqanna المقنع (talk) 18:44, 24 November 2023 (UTC)[reply]

feng

@Seoovslfmo Where would feng shui, feng-huang, Hai-feng go on the feng page if 'See also' were eliminated? Also, should these appear under feng under current rules? ([9]) See Wiktionary:Tea_room/2023/October#What_is_the_relationship_between_tai_and_tai_chi? and elsewhere. -Geographyinitiative (talk) 11:00, 27 November 2023 (UTC)[reply]

That's not for me to say, GI. I'm not a massive believer in always following the rules, as evidenced by my 1000+ (and counting!) blocks on the site! Seoovslfmo (talk) 11:08, 27 November 2023 (UTC)[reply]

Semantics and surface etymology

We define surface etymology, as of writing this post at least, as:

The apparent etymology of a term based on components occurring in the modern form of the language, such as earth + -en for earthen, which actually occurred in Old English as eorthene.

One question that arose on the Discord server is whether the meaning of the word needs to be accounted for when adding surface analysis to entries.
Consider the apple of discord (pun not intended), the Polish średni ("average, intermediate"):

  • We hold that Proto-Slavic *serda meant "the middle," but also "Wednesday." Its Old Polish descendant śrzeda meant only "Wednesday," and so does Polish środa now.
  • Proto-Slavic *serda gave rise to the adjective *serdьnъ, which meant "middle." Its Old Polish descendant śrzedni meant "middle, average" and so does Polish średni, roughly.

The clou is that the "middle" meaning of śrzeda/środa was lost in Old Polish already, yet we hold that średni is, by surface analysis, środa + -ni. However, surface analysis (again, going by our definition) requires that the components occur in the modern form of the language, and the etymology is apparent. It's pretty clearly not apparent that "Wednesday + [adjectivizing suffix]" would yield a word meaning "mediocre." So is surface etymology more about the meaning of its components, or of its components' etymons? Hythonia (talk) 14:21, 22 November 2023 (UTC) Pinging @PUC, Vininn126 as they were involved in the discussion initially. Hythonia (talk) 14:22, 22 November 2023 (UTC)[reply]

Support PUC 14:48, 22 November 2023 (UTC)[reply]
I have no strong feelings. I would lean towards accepting etymons, but if everyone agrees that it should also be lexemes then seems fine to me. I'd also like to take this time to say we really should change the wording/name of the template... Vininn126 (talk) 14:57, 22 November 2023 (UTC)[reply]
@PUC What do you support? It was an or-type question.
@Hythonia My understanding of surface etymology is basically that it's the answer a native speaker without a linguistic education would give. I think their answer would usually involve both form and meaning. Applying this as a test to your example, it's quite plausible that someone might derive "middle" from "Wednesday-ish", since Wednesday is after all the middle of the work week. —Caoimhin ceallach (talk) 17:39, 22 November 2023 (UTC)[reply]
I'm not sure that "without linguistic education" is exactly the right criterion - a lot of Polish speakers are not consciously aware of deverbals, but it's part of the modern language. (And yes, you can have surface analysis deverbals). Vininn126 (talk) 18:01, 22 November 2023 (UTC)[reply]
You're right, I was more thinking of knowledge of the historical development of their language and of historical linguistics in general. I don't know if grammatical knowledge has a strong interference effect on judgement. At any rate I meant a speaker should ideally decide purely based on linguistic intuition. —Caoimhin ceallach (talk) 18:22, 22 November 2023 (UTC)[reply]
This raises an interesting point - a lot of people don't realize that "responsible" comes from "response". I suppose the lack of a similar lexeme is the reason, but the point is that gauging intuition might not be so easy. Others might find that example obvious. Vininn126 (talk) 18:57, 22 November 2023 (UTC)[reply]
True, absent a systematic study of surface etymology you'd have to intuit other people's intuition, or restrict yourself to clear-cut cases, which the above case is not. —Caoimhin ceallach (talk) 12:58, 23 November 2023 (UTC)[reply]
@Hythonia if earthen is a representative example, then IMO średni is not like it. The English adjective closely tracks the semantics of its noun component, while the Polish one doesn't. It's also not necessarily relevant that other Slavic cognates, e.g. Bulgarian среден (sreden), do support the surface etymology a la Proto-Slavic. It's still the case that średni didn't just appear out of thin air, so if it were me, I'd say in the etymology section that the underlying morphology is that of Proto-Slavic (using {{affix}} with |nocat=1), but the semantic relationship was lost even in Old Polish. So, no {{surf}}. Chernorizets (talk) 11:52, 23 November 2023 (UTC)[reply]
So let's say we'd like to show etymological formations anyway, what would be the best approach for that? Perhaps the lexemes are lost but the etymons are not. Vininn126 (talk) 11:59, 23 November 2023 (UTC)[reply]
@Vininn126 {{surf}} is not the only way to express the morphology of a term - {{affix}} can do that too, and since it doesn't come with a magical incantation like "by surface etymology", you can write something more descriptive of the actual situation yourself. {{affix}} supports qualifiers and other params you can use in some combination to give more context on each morphological constituent. Was that your question, or something else? Chernorizets (talk) 12:12, 23 November 2023 (UTC)[reply]
This is why I'd be for changing the wording of surface analysis to allow for historical developments as well. It reduces the amount of ways to write this information. Why have two ways when we can have one? Vininn126 (talk) 12:15, 23 November 2023 (UTC)[reply]
@Vininn126 productive suffixes don't necessarily need an etymology template. E.g. Bulgarian компютърен (kompjutǎren, computer (rel. adj)) is a simple example of slapping the highly-productive -ен (-en) suffix to компютър (kompjutǎr, computer). I'd express that with {{affix}} rather than an etymology template. The point I'm trying to make is that one size will not fit all, and I'd rather we just clarify the cases where {{surf}} makes sense.
AFAICT "surface analysis" is a bit of a Wiktionary term, and right now it means "synchronic" + "in the present". If we were to change that, we'd need to change Appendix:Glossary and possibly a few other places. Doable, but maybe the way to go is to have a "synchronic analysis" template with a time referent, since synchronic can refer to a point of time in the past. It all depends on how common this is beyond the one Polish example that started the discussion. Chernorizets (talk) 21:01, 23 November 2023 (UTC)[reply]
I still don't see why we can't make a one-size-fits all. Maybe the current use doesn't allow for it, but why can't we modify it? Something to, for example, "morphologically". Vininn126 (talk) 21:08, 23 November 2023 (UTC)[reply]
@Vininn126 just a casual, quick look at the pages linking to {{surf}} indicates that it's being used by many languages, including a bunch of non-IE ones. At this time, I don't think a strong enough case has been made to change the template for everyone. If the issue with średni is particularly common in Polish, then maybe there should be a Polish template to account for it. If the issue is common across languages, then that would make a case for modifying {{surf}}, but this thread by itself doesn't demonstrate that. Just my POV. Chernorizets (talk) 04:18, 24 November 2023 (UTC)[reply]
@Chernorizets It's perfectly reasonable to want to make that kind of change. We should look at how most people are actually using it to inform us if they are using it for etymons or lexemes. If they are using it for etymons, there is a clear indication that people naturally look at it that way, if not, then well no. Just because it's used by many people doesn't mean it can't be changed. And creating a particular template for just Polish is a bad idea - lots of terms have lexically obscured etymologies. Vininn126 (talk) 09:10, 24 November 2023 (UTC)[reply]

FYI: The Low Down on the Greatest Dictionary Collection in the World

https://www.atlasobscura.com/articles/biggest-dictionary-collection Justin (koavf)TCM 19:15, 22 November 2023 (UTC)[reply]

@Koavf: fascinating! Thanks. — Sgconlaw (talk) 21:34, 22 November 2023 (UTC)[reply]

The full set of blog posts can be found via https://blogs.libraries.indiana.edu/tag/kripke-collection/ - Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:59, 26 November 2023 (UTC)[reply]

Besides the headword, where else should word stress be marked for Bulgarian terms?

(Notifying Atitarev, Benwing2, Bogorm, Bezimenen, Kiril kovachev):

Word stress is not marked in Bulgarian writing, except in rare cases of homographs like ѝ (ì, (to) her) vs и (i, and). Since stress is free and unpredictable, we indicate it on the headword line, and in the input to our pronunciation module in order to get correct IPA. For consistency with other Slavic languages, we use the acute accent to mark stressed vowels.

Where else (e.g. English translation sections) should word stress be indicated for Bulgarian entries? Currently, the tendency is "everywhere", but that's inconsistently applied, and I'm not sure if it's always the right thing to do. So here's a likely non-exhaustive list of situations, and my thoughts on them - I could use some more perspectives.

  • English translation sections
    I'm not sure about this one. Given that some languages use the acute accent as a diacritic, a user looking at a translation box might reasonably conclude that Bulgarian words are written with stress marks. It would take them an extra click, and possibly a scroll to the top of the page to realize that's not the case. One could argue that it gives a clue to the pronunciation, but I'm skeptical - in a translation box, Bulgarian words get transliterated, but the transliteration is not a good proxy for the actual pronunciation. For that, a user would still have to click on the term and navigate to its entry. In short, the stress doesn't add much, but can be distracting/confusing.
  • Descendant lists of terms in other languages from which Bulgarian borrowed/derived
    For pretty much the same reasons as for translation boxes, I'm skeptical it's of value to mark stress on Bulgarian terms when they are listed as one of several "Descendants" of another entry.
  • Etymologies of terms in other languages that are borrowed from Bulgarian
    This seems useful, because the Bulgarian term's stress might inform the phonology of the corresponding term in the borrowing language.
  • Related terms, derived terms, synonyms etc within a Bulgarian term's Wiktionary entry
    IMO other Bulgarian words related to the headword and included in its entry should have stress, as they do today. That's also the standard practice of Bulgarian dictionaries when they list related or derived terms.
  • Bulgarian quotations
    I think those should not indicate word stress. For one thing, there's a good number of Bulgarian works reflecting regional or dialectal speech, where stress may differ from the standardized variety for some words. I don't expect every editor to know when that's the case. For another, it can make it harder to Google the passage, since Google can get confused in the presence of diacritics. Finally, the quoted piece of text would look odd compared to any other piece of Bulgarian text off Wiktionary, for having all the stresses indicated.

This is what I could think of. I'd be curious to also hear from editors unfamiliar with Bulgarian, since I'm mainly guessing at what would be helpful vs distracting for such users.

Thanks,

Chernorizets (talk) 11:33, 23 November 2023 (UTC)[reply]

@Chernorizets I mostly agree with your assessment, in that it can be difficult for non-speakers of Bulgarian to realize that the real Bulgarian orthography doesn't even write that accent symbol and that it's just a dictionary convention. I remember a long while back, before I understood this was a thing, I went around deleting accents from Wikipedia because I thought they were just typos. I agree it may be good to minimize accents when they aren't relevant.
Specifically in reply to each of the points:
  • Translation sections:
There's not necessarily a need to have accents here, but I would personally lean towards keeping them around. If you look at certain other languages, e.g. Serbo-Croatian, Russian, you'll see that they use their fully-accented forms, but others in which pronunciation information is unpredictable don't (e.g. Japanese). The reason I think it's good is that most people will figure out early on (like I did, it only took being corrected once) that those signs are just for conveying stress, and so this will just save the need to hop to another page just to read the stress. AND, this can also disambiguate different stresses of a spelling with multiple possible stress locations (with different meanings) in the translation sections itself. If there are multiple valid stresses for the same exact sense, I would leave it out.
  • Descendants:
I would make the same argument as above, in that it can just be helpful to see it, but I'm not too bothered about whether we have it or not here. I'd say these two should both have the same policy though.
  • Etymologies, related terms, etc.: fully agree
  • Quotations:
Very strongly agree. This is one of the perfect places to indicate to readers that stress is not written, unlike Greek (except in really exceptional circumstances for most words). Keeping the grave on ѝ on the other hand helps to identify that this is written, which having too much accent noise in every single other word would invalidate.
In all, I'm honestly having a hard time forming a proper opinion on this, because on the one hand I do find the stresses helpful around the project, but also recognize that this can be misleading and trick people into thinking words should be written with accents all the time - especially considering many other languages do omit their accent notation, which makes us few Slavic languages stand out in an unhelpful way.
Kiril kovachev (talkcontribs) 11:59, 23 November 2023 (UTC)[reply]
@Kiril kovachev thanks for your thoughtful comments! All I would say is - I'd like to decouple the considerations for Bulgarian from what other languages do. East Slavic languages do indicate stress, but what Serbo-Croatian actually indicates is the different pitch accents and vowel length (rising, falling, etc). Macedonian typically has fixed stress so it's not indicated in most translation sections. Chernorizets (talk) 12:19, 23 November 2023 (UTC)[reply]
My two cents: I believe stress is useful for etymologies and descendants for the same reason: Reconstruction (/ etymology) of accentuation. Just like any other phonemic quality, Bulgarian stress is invaluable for the reconstruction of OCS and Proto-Slavic, just like stress/accent in other Slavic languages. So giving it in descendant sections strengthens the given reconstructed accent placement/value, while giving it in etymologies (for languages with free stress) is useful to explain the location of the stress on the descendant.
I definitely think giving them in mentions outside of Bulgarian entries where they do not impact these two points is overkill - so mentions, noncognates... I agree on quotations, but am undecided on usage examples, since there the usefulness for learners might outweigh its disadvantages. Thadh (talk) 12:40, 23 November 2023 (UTC)[reply]
@Thadh reconstruction pages are one of the situations I missed - in there, I'd def want the Bulgarian stress per the reasons you provided. I was talking specifically about something like καρφίτσα (karfítsa) which has several descendants including Bulgarian through borrowing - I'm not sure that's enhancing the Greek entry in any way. Chernorizets (talk) 12:59, 23 November 2023 (UTC)[reply]
I would like to extend this discussion to Russian (and potentially other East Slavic languages), which has the exact same situation. @Tetromino, Anarhistička Maca, SUM1 (forgive me if I forgot anyone).
No need to necessarily come to the same conclusion, but I think there will be common arguments. Thadh (talk) 12:20, 23 November 2023 (UTC)[reply]
Stress should be always marked, besides quotations. We are a dictionary and dictionaries do that. Argument about diacritics in other languages is bad, if one doesn't know basics about some language it is not important for him anyway. Sławobóg (talk) 12:40, 23 November 2023 (UTC)[reply]
@Sławobóg so e.g. French and Italian words should indicate stress in translation tables and elsewhere they appear? Chernorizets (talk) 21:33, 23 November 2023 (UTC)[reply]
Why would it be difficult for users to realize that Bulgarian is generally not written with accent marks? One can’t rely on dictionary mentions not being enriched with generally unwritten extra information. And the first thing in learning a language is its segments so everyone who looks for words in a language already has the idea of what is an unwritten addition. If stress might differ in dialectal works then do something else than adding stress marks to their quotations. Otherwise what is cited tends to be standard. Everything tends towards a standard.
It’s search engines’ job not to get confused. I fare quite well with DuckDuckGo so far finding Arabic text with and without diacritics whatever I enter, and I think Google is not completely dumb about the matter either, though to the extent of annoying with fuzzy matches. Firefox's search function Ctrl+F also has a toggle about whether accents shall be ignored, so this is really programmable. In five years the topic won’t be the same as today. Fay Freak (talk) 21:33, 23 November 2023 (UTC)[reply]
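As a rough illustration of the "this is really programmable" point, here is a minimal Python sketch of accent-insensitive matching for Bulgarian. It assumes that dictionary stress is encoded as the combining acute accent (U+0301), as in the headword convention described above, and that the grave on ѝ is orthographic and should survive. The function name `strip_stress` is hypothetical and is not part of any Wiktionary module or search engine.

```python
import unicodedata

# Assumption: dictionary stress is marked with the combining acute accent.
COMBINING_ACUTE = "\u0301"


def strip_stress(text: str) -> str:
    """Remove combining acute accents, leaving all other marks untouched."""
    # Decompose so that any precomposed accented letters expose their marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop only the acute; the grave of ѝ and the breve of й are kept.
    without_acute = decomposed.replace(COMBINING_ACUTE, "")
    # Recompose so that ѝ, й, ё etc. return to their usual single code points.
    return unicodedata.normalize("NFC", without_acute)


if __name__ == "__main__":
    # Both stress variants of the same spelling reduce to one search key.
    print(strip_stress("мо\u0301лив") == strip_stress("моли\u0301в"))
```

With a helper like this, a stressed quotation or translation could still be matched against unstressed running text, which is one possible way around the search concern raised earlier in the thread.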
@Fay Freak unlike Russian and Polish, for example, Bulgarian is a relatively unfamiliar language to most people. I'd agree that users who specifically use Wiktionary to look up Bulgarian terms will most likely be aware of the language's written convention. I suppose my comments are mostly about users who'd be getting their first exposure to Bulgarian via Wiktionary, or at least where it's one of their first reference sources. If you really don't know much about the language, it's not even clear that something like á is a stressed version of "a" vs. a different sound, e.g. a long vowel as in Czech. I'd prefer that people navigate to the entry to see the actual pronunciation, rather than form (possibly incorrect) guesses based on how we represent stress. Chernorizets (talk) 21:47, 23 November 2023 (UTC)[reply]
@Chernorizets: The conventions are developed right here. This is the dictionary and we are all the designers. You can find dictionaries with various levels of precision, accuracy, convention. Some only provide definitions or translations into other languages, others add genders, inflections, word stresses (including word stresses in inflected forms), pronunciations, etymologies, then providing synonyms, derivations, usage examples, etc. It depends on how you want to provide, your diligence, time available, etc.
All East Slavic and most South Slavic languages don't have a predictable word stress and stresses can change in forms. You can't assume users will know that; if they can't find it here, they won't know or they will search elsewhere.
Russian entries always had word stresses in the headword, Bulgarian terms didn't always have that. Bulgarian editors didn't bother that much. Now you have to add, otherwise the entry will fail.
Nobody will make you add stresses everywhere but it's something considered useful. At my first exposure to Bulgarian I was very surprised how stresses can differ between East Slavic and Bulgarian. (East Slavic languages differ between each other as well). Even Macedonian editors (Macedonian stress is mostly predictable but there are exceptions and some rules) discussed recently that it is important to provide stress to avoid misleading fellow Slavs. Users and learners tend to make assumptions, and they are very often wrong. Anatoli T. (обсудить/вклад) 22:16, 23 November 2023 (UTC)[reply]
@Atitarev I have no issue with stress in headwords - that's in fact the convention of all major Bulgarian monolingual dictionaries, and if anything I'm surprised there was a time we didn't have that. I'm taking it for granted, and my post is specifically about indicating stress outside of headwords.
I'm very much not assuming that people will know where Bulgarian stress falls by just looking at a word. What I'm trying to understand is whether e.g. indicating stress in a translation box is helpful, vs. effectively training users to go to the entry, where there's IPA, and due to @Kiril kovachev's efforts - more and more audio. The acute stress we use for Slavic languages written in Cyrillic can mean "vowel lengthening" rather than stress in other Slavic languages, like Czech and Slovak. So if I'm a user looking at a translation box that has Bulgarian, Macedonian, Czech and a bunch of other languages with actual diacritics as part of their regular orthography, I could make a wrong assumption about what the accent means, unless I click to go to the entry.
Again, I'm just asking questions. I myself have been putting stress everywhere, but lately I've been questioning that choice in certain situations. Chernorizets (talk) 22:36, 23 November 2023 (UTC)[reply]
@Chernorizets: Wrong assumptions are easily eliminated by a "preface" in a dictionary: Wiktionary:About Bulgarian, Wiktionary:Bulgarian transliteration. There are many letters and symbols that mean completely different things in different languages (even close ones). Users are smarter than that and we also help them by providing the info. For many years, a Macedonian editor consistently added word stresses in translations only, even though he made or edited very few entries. He considered stresses very important but Bulgarian anonymous contributors never-ever did that, which annoyed me personally. I wasn't aware of online dictionaries, so I struggled to find and add those stresses. I even pleaded with Bulgarian Wiktionarians once to reconsider their practice of making entries completely void of stresses. It's good that you raised these points. I strongly support and encourage adding stresses everywhere you can. If you don't do it yourself, others may. Multiple stresses are also supported by adding both: мо́лив (bg) (móliv), моли́в (bg) (molív). I do so with East Slavic languages, including normalising quotes. I don't think quotes should use e.g. the spelling ребенок (rebenok) when a dictionary entry is ребёнок (rebjónok, child). It's a regular convention not to use letter "ё" in writing. Anatoli T. (обсудить/вклад) 22:53, 23 November 2023 (UTC)[reply]
@Atitarev I would love to be able to communicate things like that using an appendix. However, I'm not sure how we can raise the visibility of these appendices. Wiktionary:About Bulgarian has 115 views in the past 3 months combined, some of which are me adding new content. By comparison, just last month there were ~250k Wiktionary page views from Bulgaria, and ~160M overall. In essence, the "preface" to the dictionary is invisible. If we had a way of making it more visible, I'd worry much less about users getting the wrong idea in a variety of situations, including stress. Chernorizets (talk) 00:16, 24 November 2023 (UTC)[reply]
@Chernorizets: IMO, the visibility should be the least of our concerns. The information is available and there are pronunciation sections to match. Many just figure it out themselves. Does everyone read a preface to a dictionary? If you know how things work, you don't have to. If you've seen Bulgarian entries, you don't have to scratch your head about the use of acute accents. I'd say it's with Czech/Slovak that it is kind of unusual (for speakers of other languages, apart from Hungarian and a few others) to know that an acute accent actually stands for a long vowel. You don't have to worry about that. Do you think a lot of people see trouble in using e.g. letter "q" for the sound /t͡ɕʰ/ in Mandarin? You can't cater for everyone who doesn't bother familiarising themselves with spelling rules, alphabets, transliteration symbols and their meanings. Should we worry too much if someone finds e.g. letters "č" or "ǎ", which are used to transliterate "ч" and "ъ", confusing or misleading, or is making wrong assumptions about how to read them? Anatoli T. (обсудить/вклад) 00:27, 24 November 2023 (UTC)[reply]
Here is my revised proposal, in view of the preceding conversation. Recall that I take for granted that stress should be displayed in headwords as well as related, derived etc terms within a Bulgarian entry, so this is about other places Bulgarian terms might show up.

Display Bulgarian lexical stress in:
  • collocations and usage examples provided to exemplify a term
  • translation sections of English entries
  • etymology sections of entries in other languages that represent borrowings from Bulgarian
    • relatedly, lists of cognates in etymology sections
  • "Descendants" sections of Old Church Slavic terms and Proto-Slavic reconstruction pages

Do not display Bulgarian lexical stress in:
  • quotations
  • "Descendants" sections of terms in other languages, e.g. Greek or Latin
Thanks,
Chernorizets (talk) 00:55, 26 November 2023 (UTC)[reply]
@Chernorizets: Note that {{desc}} allows multiple forms separated by "|", so adding more than one form would not be onerous.
Please take a look at Polish żebrak. Ukrainian and Belarusian descendants differ in stress from the Polish source: Belarusian жабра́к (žabrák) and Ukrainian жебра́к (žebrák), unlike the Polish, have stress on the last syllable. How is it not useful?
I used to have a Chrome plugin to quickly add acute accents, which has since disappeared. The accents are all available in the edit mode and can be copy-pasted.
Also, quotes can be a useful tool for language learners. We are a dictionary, not a collection of quotes. Russian books with stress marks throughout the book were always in demand when Russian was popular with learners. (It's not so due to known reasons). My point is, adding a stress is not breaking a quote, but a link may show the actual text. (With Chinese quotes, both traditional and simplified versions are given, regardless of the source version).
Stresses are always desired but they are not mandatory as it is now. It's up to the editor, in fact. Someone with more time on their hands and their perceptions might add stresses everywhere, if they feel like it. No need to make a rule about it, IMO. Anatoli T. (обсудить/вклад) 07:20, 26 November 2023 (UTC)[reply]
@Chernorizets I have just encountered this thread, and more or less read it. I generally agree with User:Atitarev that we should add stress marks pretty much everywhere. I know some people have argued that quotations in Russian (for example) should not have accent marks to be faithful to the original, but I personally think usefulness to the learner outweighs this consideration. In Russian, especially, changes in stress drastically affect the pronunciation due to unstressed vowel reduction, and I wouldn't be surprised to hear that non-native speakers putting the stress in the wrong place leads to natives often not understanding what's being said; this is a frequent issue with non-native pronunciations in English, which similarly has vowel reduction of unstressed syllables. As for Descendants, it seems strange to me to make a distinction between Descendants sections of OCS and Proto-Slavic terms vs. other terms (I think most people will not be able to remember such a rule or understand the logic of it) and I would definitely argue in favor of keeping the accent marks in these sections. Benwing2 (talk) 02:24, 27 November 2023 (UTC)[reply]
@Benwing2 I can get behind the idea that editors might find it tricky to remember to put the stress in some descendant sections, but not others. I'm happy to amend my proposal to say that stress should be indicated in all "Descendants" sections.

I wouldn't revert or "amend" edits by editors who include stress in quotations, but I'm very much not going to do that myself. A number of early modern Bulgarian works from the 18-19th centuries (pre-standardization) would actually indicate word stress in writing, using the acute accent no less. I would preserve that if I'm quoting from such a text, but otherwise I deem it anachronistic and distracting. I feel like collocations and usexes are more learner-friendly than book quotations anyway (or at least they can be, if we make it so), and I've already recommended that we indicate stress there.

For context, I'm not trying to establish inflexible "thou shalts" that every Bulgarian editor must follow. I am, however, working on a Bulgarian Lemma Improvement Project, and this discussion affects the guidance I want to provide in there. Thanks for your feedback!
Cheers,
Chernorizets (talk) 02:47, 27 November 2023 (UTC)[reply]
@Chernorizets Absolutely, there's no need to include word stress in quotations if you don't want to. For Russian they're often there because I wrote a script to add them (as well as link words to their lemma form) when it's unambiguous. Benwing2 (talk) 03:06, 27 November 2023 (UTC)[reply]
@Benwing2, @Chernorizets: My point as well. Editors don't have to go an extra mile, if it's hard, error-prone, etc. Anatoli T. (обсудить/вклад) 03:09, 27 November 2023 (UTC)[reply]
@Chernorizets, @Benwing2: You don't have to add stresses, it's good to have, maybe there will be a quicker way in the future. If a good practice is established, it will catch on but we don't want wrong stresses either. I have to deal with that in other languages. Anatoli T. (обсудить/вклад) 03:08, 27 November 2023 (UTC)[reply]
@Benwing2, @Chernorizets: Thanks. Other Slavic speakers would invariably mispronounce Bulgarian descendants of *poľe, *moře, etc. It's just (seems) counter-intuitive. Anatoli T. (обсудить/вклад) 03:05, 27 November 2023 (UTC)[reply]
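
For readers wondering how the two script-related points in this thread could work in practice (displaying stress while still linking to the unstressed entry name, and adding stress only "when it's unambiguous"), here is a minimal Python sketch. It is not the script Benwing2 mentions; the function names and the toy lexicon are hypothetical, and it assumes stress is written with the combining acute accent (U+0301).

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, assumed here to be the stress mark


def strip_stress(text: str) -> str:
    """Remove combining acute accents but keep other diacritics (e.g. й, ё)."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))


def add_stress(word: str, lexicon: dict[str, set[str]]) -> str:
    """Return the stressed spelling only when the lexicon lists exactly one."""
    candidates = lexicon.get(strip_stress(word), set())
    return next(iter(candidates)) if len(candidates) == 1 else word


# Hypothetical toy lexicon: unstressed spelling -> attested stressed spellings.
LEXICON = {
    "молив": {"мо\u0301лив", "моли\u0301в"},  # two accepted stresses: left unmarked
    "жебрак": {"жебра\u0301к"},               # one stress: safe to add
}
```

With this sketch, a word with two accepted stresses is returned unchanged, so an editor would still have to pick the form that fits the context by hand.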

Ideas regarding International Dark Sky Week[edit]

(Original Poster's Note: Originally from here; thanks, Sgconlaw (talkcontribs) for your advice.)

Any ideas for International Dark Sky Week regarding the Word of the Day? -- Apisite (talk) 22:20, 24 November 2023 (UTC)[reply]

noctalgia for sure (hot word). —Justin (koavf)TCM 22:48, 24 November 2023 (UTC)[reply]
The term nyctology has Quiet Quentin hits. --Apisite (talk) 23:07, 24 November 2023 (UTC)[reply]
How about light trespass? --Apisite (talk) 23:08, 24 November 2023 (UTC)[reply]
Four possible candidates: skyglow, scotobiology, satellite flare, dark-sky preserve. Qwertygiy (talk) 23:16, 25 November 2023 (UTC)[reply]
What about scototherapy? --Apisite (talk) 01:14, 27 November 2023 (UTC)[reply]
nyctophilia --Apisite (talk) 11:14, 29 November 2023 (UTC)[reply]

Transliterations for Itelmen and Russenorsk[edit]

Both Russenorsk and Itelmen use the Cyrillic alphabet. Russenorsk is usually attested in Latin script, but the spelling is often Norwegian, and does not correspond with transliterations used for languages with a Cyrillic alphabet. For example, the Russenorsk куры фра (kury fra) would be spelled rather as "kori fra" by the spelling rules used in Norwegian (which doesn't have an Ы-sound, though). But the transliteration would be something else, so I dared to spell it as "kurî fra", while Brochman spells it as "kury fra" for some reason.

The Itelmen people have two different ways of reading the letter "в": it may be /v/ or some kind of /ɣ/. I dared to show it in transliterations in entries like кувумнук. But at the same time there is a word "кив" spelled as киғ, and I can't find any source for where it came from, because the standard spelling, AFAIK, should be "кив".

For both Itelmen and Russenorsk I also used the letter "ň" in transliterations when the situation required it, but I'm not sure if this letter is OK to use for non-Slavic languages. In Turkmen, the letter "ň" is a completely different sound, so maybe it would be better to use some other letter.

Is it OK to use these self-made transliterations in these two cases? And if so, what rules should be followed while making such transliterations? Tollef Salemann (talk) 09:20, 26 November 2023 (UTC)[reply]

@Tollef Salemann: I would think of defining Wiktionary:Russenorsk transliteration (for Russenorsk) based on other transliteration policies. Since we don't have anything, something may work. If someone disputes it, the policy will change, as long as it's consistent and follows some rules. As for the letter ы (y), it is transliterated as "y" in Russian, as in Russian ры́ба (rýba, fish). You can choose "y" to distinguish it from "i" in Russenorsk вино (vino, wine) = vin (wine). The transliteration can be more phonetic, so using the same symbol for different letters is also OK. Anatoli T. (обсудить/вклад) 06:23, 27 November 2023 (UTC)[reply]

"surface analysis" -> "surface etymology" in Template:surf[edit]

User:DCDuring suggested on my talk page that the template {{surf}} (which is an alias for {{surface analysis}}) should read "By surface etymology ..." instead of "By surface analysis ...", because the template is intended specifically for etymology sections and "surface etymology" will be less opaque to non-linguist users. Currently the main name is {{surface analysis}} and {{surface etymology}} is an alias, but presumably if we make this switch, we should switch which name is considered the main one. Benwing2 (talk) 02:13, 27 November 2023 (UTC)[reply]

I care much more about what is displayed to (normal) users than what the name of templates or aliases might be. DCDuring (talk) 15:48, 27 November 2023 (UTC)[reply]
Support: Easier for readers to understand. – wpi (talk) 05:27, 27 November 2023 (UTC)[reply]
I question whether this is actually easier for users to understand. Why is "etymology" an easier word than "analysis"? I've always found it obvious what it meant, whereas "surface etymology" made me do a double take. I'm inclined to oppose unless there's actually reason to believe that users have difficulty with the current terminology. Andrew Sheedy (talk) 05:41, 27 November 2023 (UTC)[reply]
Oppose Word0151 (talk) 07:29, 27 November 2023 (UTC)[reply]
Oppose. Analysis refers to the process of understanding a word (or phrase) from its morphemes. It relates to the fact that words are frequently recoined, so tracing a word back to centuries before an instance of its use may not explain the origin of what was uttered - it may have been freshly recoined. --RichardW57m (talk) 13:27, 27 November 2023 (UTC)[reply]
'Analysis' is open-ended, raising questions that 'etymology' does not. Our definitions of the two terms say they are synonymous, but that only applies to the linguistics definition. A simple Google web search has the first three hits for "surface analysis" be for weather, the next for chemistry. Of the 125 hits Google showed, 0 were for linguistics; all but one were for chemistry/materials science and weather. The exception was for "attack surface analysis" (IT security). Excluding "materials", "chemistry", "weather", and "NOAA" from the search still yields not a single linguistics hit. IOW "surface analysis" has the meaning we would have for it only in a rarified linguistics context. So, if we intend to address a user population that is not primarily drawn from this tiny group, we should use the more readily understood surface etymology. DCDuring (talk) 15:43, 27 November 2023 (UTC)[reply]
@DCDuring: You are overlooking the certainty of a surface analysis being applied to 'surface analysis'. If we indicate the context, results improve - I quickly got 10 hits on the phrase "surface analysis of the text" and none for the phrase "surface etymology of the text". --RichardW57m (talk) 16:31, 27 November 2023 (UTC)[reply]
Examination of the Google web hits for "surface analysis of the text" shows that, even in linguistics, "surface analysis" has broader meaning than "surface etymology". Why choose a polysemic term when a transparent monosemic synonym is available? DCDuring (talk) 16:49, 27 November 2023 (UTC)[reply]
Check out 臺灣 and Taiwan where "terraced bay" etymology is discussed. Use of surf here [10] --Geographyinitiative (talk) 13:49, 27 November 2023 (UTC)[reply]
@Benwing2 it's probably worth noting that neither "surface analysis", nor "surface etymology" appear to be standard linguistics terms (with the meaning used on Wikt) - searching for them on Google reveals hits on Wiktionary, or other sites like Reddit where people ask each other what Wiktionary means by them. "Synchronic analysis" is a thing, but I'm not sure if it can be applied to single words. Different editors have used different ad-hoc ways to express the idea, such as "morphologically equivalent to". All in all I think the text of this template should receive attention from a professional linguist. Chernorizets (talk) 20:21, 27 November 2023 (UTC)[reply]
@User:Chernorizets The only discussion I found was a three-person thread at Reddit that wondered whether "surface" implied "superficial" and, thus, "erroneous". DCDuring (talk) 18:33, 28 November 2023 (UTC)[reply]
@Chernorizets I have a Ph.D. in linguistics so you could say I'm a professional linguist :) although my background is really computational linguistics. FWIW I've seen "synchronic analysis" in some technical books on the morphology of specific languages, which is why I tended to write "Synchronically analyzable as ..." before using this template. I would in fact prefer "synchronic" something rather than "surface" something, since "synchronic" has a well-defined linguistic meaning. Benwing2 (talk) 20:28, 27 November 2023 (UTC)[reply]
@Benwing2 good to know! :-) As for "synchronic", I would prefer it too from the POV of it having, as you say, a well-defined linguistic meaning, but I fear it would be opaque to users without much linguistics background. One way to address this would be to include a link to an appendix entry, which would be equivalent to the status quo (no user is reasonably expected to know what "surface analysis" means without looking it up). Another way would be to use wording that's still justified by linguistics, but perhaps more familiar-sounding to more people - for instance, "Morphologically equivalent to... ", or even just "Morphologically, ..." as @Vininn126 had suggested in another BP thread.
IMO, we should perhaps coalesce around no more than 3 possible wordings, and seek consensus either in here or in a vote. In all cases, there should be a link to some appendix term. Chernorizets (talk) 03:16, 28 November 2023 (UTC)[reply]
"Synchronic" and "morphological" are not really much help in demystifying what we are trying to say. I think that only historical, diachronic etymology is "real" etymology and objected to the "equivalent to" that was used to introduce the synchronic etymology. Now it seems better to me, so much so that I would have it introduce the etymology section, with the diachronic ("real") etymology following. DCDuring (talk) 18:33, 28 November 2023 (UTC)[reply]
I propose the formulation 'synchronically A + B'. It is concise and amply conveys what is intended. As mentioned, an appendix link can clarify matters for those unfamiliar with the term, as is already done with 'surface analysis'. Nicodene (talk) 14:57, 3 December 2023 (UTC)[reply]
@Nicodene Totally fine with me. As for User:DCDuring's concerns, I don't think "surface" will be more helpful than "synchronic" since to me (and apparently to many others) it suggests there's something wrong with the etymology. Benwing2 (talk) 19:09, 3 December 2023 (UTC)[reply]
This works, though I have a preference for "morphologically". Vininn126 (talk) 19:13, 3 December 2023 (UTC)[reply]
Oppose. This is User:MedK1 and I don't see how etymology is any less "opaque" than "analysis". The term is in the appendix anyways and since "analysis" is more common than "etymology", I'd actually be inclined to argue the opposite point and say 'analysis' is clearer. I'd support changing to synchronically analyzable though, as per Benwing, it is an established term in linguistics. I too got a little confused as to whether "surf" means it's erroneous. The way it is right now, it's completely ambiguous, the change to "etymology" changes nothing and it's still 100% ambiguous, and "synchronically analyzable" says "not necessarily" leaning toward 'no', as it says it is a thing you can do. The real problem is with the appendix's description, not the terminology. 2804:1B0:1900:E91A:D4AA:F5EB:3499:2286 00:26, 30 November 2023 (UTC)[reply]
@MedK1 Agreed. If we ignore non-linguists, it's fine. DCDuring (talk) 18:59, 3 December 2023 (UTC)[reply]
Sorry to repeat myself: The old wording "equivalent to" seems to avoid using jargon and avoid the (rare?, but real) impression that surface implied superficial. DCDuring (talk) 20:06, 3 December 2023 (UTC)[reply]
Support "equivalent to". Andrew Sheedy (talk) 23:56, 3 December 2023 (UTC)[reply]
Support "equivalent to". --RichardW57m (talk) 09:22, 4 December 2023 (UTC)[reply]

Japanese JAccent app - with recordings and pitch accent[edit]

(Notifying Eirikr, TAKASUGI Shinji, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): I am exploring the JAccent app (exists at least for iPhones), which provides Japanese pitch accent; it has good reviews. Unlike Daijirin, it's completely free. There may be other similar apps. I couldn't find their source yet.

Please share your feedback. BTW, I am not promoting any products, I just think it may be useful for the project. There are still many Japanese terms without pronunciation sections, let alone pitch accent info. Anatoli T. (обсудить/вклад) 03:31, 27 November 2023 (UTC)[reply]

The first "victim" of a new reference: (せっ)する (sessuru). Also in Module:ja-pron. Anatoli T. (обсудить/вклад) 03:41, 27 November 2023 (UTC)[reply]
AI-powered Japanese accent dictionary — this is weird. Is this glorified wording for "we asked a statistical model for the most likely pitch accent of any given word"? —Fish bowl (talk) 03:48, 27 November 2023 (UTC)[reply]
@Fish bowl: I think it's referring to usage examples, not individual word entries. I got some kind of warning when I first tried usage examples, something about AI generated, not necessarily matching entries. I dismissed it too quickly but it's worth double-checking. Anatoli T. (обсудить/вклад) 04:06, 27 November 2023 (UTC)[reply]
Some older comments (a few) say the app can't distinguish between Heiban and Odaka accent.
Are there a few Odaka examples, other than 羽根突(はねつ) (hanetsuki), which can be both? Anatoli T. (обсудить/вклад) 04:15, 27 November 2023 (UTC)[reply]
Actually, (はし) (hashi), which has an Odaka accent, shows correctly. Anatoli T. (обсудить/вклад) 04:17, 27 November 2023 (UTC)[reply]
(I also wonder if the reviews can be unreliable for the reason that the people who need the app the most would be least likely to spot mistakes. —Fish bowl (talk) 04:45, 27 November 2023 (UTC))[reply]
@Fish bowl: That's true but it's sometimes good for checking (semi-)negative comments, like lack of Heiban and Odaka accent distinction. I can't establish if that's the case, I checked one obvious one and it is OK.
Not sure if they include variant accents. Anatoli T. (обсудить/вклад) 04:52, 27 November 2023 (UTC)[reply]

The idea seems good. However, I have some concerns. I believe 99%+ of the entries are accurate, but checking a few tricky ones led me to conclude that it's not a reliable source.

1. Reading errors: 一期一会(いちごいちえ) (ichigoichie) was read as いっきいちえ (ikkīchie), and 十把一絡(じゅっぱひとから) (juppahitokarage) is also wrong.

2. As you mentioned, it can fail to identify odaka, such as 危機一髪(ききいっぱつ) (kikīppatsu).

3. For words with multiple acceptable Tokyo accents, only one is displayed, such as こころ (kokoro).

As a side note, even though Daijirin can cost money, there are EPWING files for older versions that are freely available, which are much more authoritative in my opinion. Shen233 (talk) 18:39, 27 November 2023 (UTC)[reply]

@Atitarev I'm kind of afraid of whether or not the AI is being used on individual words, too, which is hard to say because many compound words can be predicted using AI but there may be exceptions. Moreover, supposedly the sources of the app are https://accent.u-biq.org/ and https://www.gavo.t.u-tokyo.ac.jp/ojad/pages/using/search, the latter of which I know we already use and is a good source. I can't speak to the reliability of this new source, but I'm rather more on the doubtful side, and also wish to note that if the sources used are just the above, we should consult them directly instead of running the risk of duplicating AI-generated follies. But if you think it's good, then I would accept it being used. Kiril kovachev (talkcontribs) 22:23, 30 November 2023 (UTC)[reply]
@Shen233: Hi. Thanks for the feedback. Sorry, I didn't get around to responding. I can see the app doesn't have any variants. It only provides one version. But if that version is correct, should we still accept it? I don't insist. Anatoli T. (обсудить/вклад) 23:24, 30 November 2023 (UTC)[reply]
@Kiril kovachev, @Shen233: I can see the flaws, so yeah, not so reliable. I will check the links, thanks. Anatoli T. (обсудить/вклад) 23:28, 30 November 2023 (UTC)[reply]
@Anatoli T. Well again, for the vast majority of words it is fine. But for some, such as いくら (ikura), different parts of speech correspond to different pitch accents, so having one version is obviously not enough.
That aside, I'm still on the more cautious side: if you were ever to cite it for the sake of filling in the gaps for the many words without pitch accent, have some sort of disclaimer that the information comes from an app (possibly AI) and may be incorrect. That will inform the reader for sure. And if there's accent information from an authoritative dictionary, such as Daijirin or the NHK accent dictionary, then that's always preferred. Shen233 (talk) 23:37, 30 November 2023 (UTC)[reply]
@Shen233: As in my first post, e.g. (せっ)する (sessuru), if you open the entry, the reference says:
"JAccent app JAccent: Japanese dict with AI". Anatoli T. (обсудить/вклад) 23:50, 30 November 2023 (UTC)[reply]

Reference anchor for source for existence of spelling[edit]

If I cite a dictionary for the Mon spelling of a word in an LDL-language, how should I anchor a reference for it? An example of such a case is Pali ဂါဝဳ (gāvī, cow), which is a subsidiary lemma (arguably a non-lemma). I have currently added it to the headword line, where it could be misinterpreted as being cited as evidence of the gender. (The actual evidence for the gender would be indicated in the principal lemma, gāvī (cow); the dictionary cited for the spelling actually fails to tell us the gender.) --RichardW57m (talk) 16:23, 27 November 2023 (UTC)[reply]

In this particular case, the Mon spelling and reported meaning are exactly what we would expect, so the 'editor community' could impose conditions on the use of the dictionary if it ever bothered to list what are acceptable sources for a single mention. --RichardW57m (talk) 16:23, 27 November 2023 (UTC)[reply]

There is support inside of Module:headword for adding references on the headword itself. I need to check if this is exported to Module:headword/templates. Benwing2 (talk) 20:30, 27 November 2023 (UTC)[reply]
So I can add the interfacing code to Module:pi-headword if no-one objects to the layout
ဂါဝဳ[1] f
I'll be developing the code in Module:User:RichardW57/pi-headword. The capability doesn't appear to have been exported to {{head}}, so no early preview. --RichardW57 (talk) 22:27, 27 November 2023 (UTC)[reply]
@RichardW57 Looks fine to me. Benwing2 (talk) 22:35, 27 November 2023 (UTC)[reply]

Burmic Scripts for Pali[edit]

This is just an exploratory question.

Would it be excessive to select glyph styles for Pali by writers' native tongue and country? For example, Tai Tham glyphs are rather different between (1) Burma, (2) Northern Thailand and (3) northeast Thailand and Laos. (China mostly follows Burma - I am not sure of any big differences.) Similarly, a few Burmese script glyphs are very different between (1) Burmese practice, (2) Mon practice in Burma, (3) Mon practice in Thailand, (4) Thai practice - at least, in an allegedly 'Mon script' sample I've found, though it could just be a matter of the date of copying. I think there are some differences between Shan and Burmese practices, and apparently between Shan groups.

One could argue that we don't make any allowance for the vast variations in the Latin script over time, and using different styles could hinder readers rather than help them. --RichardW57m (talk) 12:53, 28 November 2023 (UTC)[reply]

@RichardW57 How are you proposing to do this selection? IMO it should be automatic (which would probably require different etym-language variants of Pali) or not at all. Benwing2 (talk) 03:40, 30 November 2023 (UTC)[reply]
@Benwing2: That's really a Grease Pit question. My thought had been that it would be done by Wiktionary script; Sanskrit already has multiple Wiktionary scripts for both the Bengali and Mongolian ISO 15924 scripts, though the Sanskrit script subdivisions seem rather to identify character repertoires. Entries in etymology-only languages are banned from main space, so I don't see how they would help with headwords. They would only help with quotations and the like. --RichardW57m (talk) 10:27, 30 November 2023 (UTC)[reply]
@RichardW57m Headwords now support etym-only languages as param 1, although it's not clear it's a good idea since the term itself presumably covers multiple etym-only languages, so I may remove it. But if your intent is to choose glyphs per a writer's native tongue and country, that can only happen in quotations anyway because those are the only things associated with an author. Benwing2 (talk) 10:42, 30 November 2023 (UTC)[reply]
@Benwing2: It's not an intent, merely an idea. --RichardW57m (talk) 12:43, 30 November 2023 (UTC)[reply]
Most instances of writing have a (human) writer, who is in general distinct from an author. For CFI-relevant quotations in most modern languages, the style seen is that of the typesetter. The orthographical and notably different typographical styles can mostly be identified by L1 and country, but not every difference in them creates a new way of writing Pali. --RichardW57m (talk) 12:43, 30 November 2023 (UTC)[reply]
Using etym-only headwords under the language heading 'Pali' could work, though I wonder what it does for categorisation. It also causes fragmentation where most writing systems employ essentially the same glyphs, but another has different glyphs. Perhaps we can take hints from the handling of Prakrit, where the different dialects can have multiple subtle differences. For a related issue where spellings also differ, see Pali ဣတ္ထိ (itthi, woman), where some of the inflected forms have culture-dependent forms. I hope to be able to combine the tables when I have developed flexible, automatedly footnoted inflection tables. The specifically Shan writing systems for Pali may be so different from the Burmese tradition that they may as well be treated as a separate script. --RichardW57m (talk) 12:43, 30 November 2023 (UTC)[reply]
@Benwing2 @RichardW57m I think it's possible to just use different script codes instead of making new etym-lang codes. It's possible to assign ur-Arab to a different font than fa-Arab for example. At least that's how I set up custom fonts on my css page. It seems like the different writing traditions take from the regional languages, so we should be able to specify fonts for those languages script codes and use those same script codes on Pali. (If I put sc=ur-Arab on an Arabic link, it'll use the Urdu font but still link to Arabic.) However the page title will always use the last specified font so we could run into issues with that, so we should probably only do this if the different writing styles have significant differences. - سَمِیر | Sameer (مشارکت‌ها · بحث) 20:48, 30 November 2023 (UTC)[reply]
@Sameerhameedy Thanks. There's another discussion floating around about fixing the issue with the page title using the last specified font; I think if we have Urdu and some other Arabic-script language that doesn't use Nastaliq, we should either use the other language's script or a generic Arabic script. Benwing2 (talk) 21:58, 30 November 2023 (UTC)[reply]
The discussion's at Wiktionary:Grease_pit/2023/November#Interwiki_links_and_display_title. --RichardW57m (talk) 11:24, 1 December 2023 (UTC)[reply]
@Sameerhameedy Yes, bespoke scripts should work when we get the style of the vernacular by choice of font. There are some cases where the 'correct' glyphs are selected from a font by language (I'm specifically looking at the Padauk font), which capability doesn't work on Internet Explorer and early (pre-Chromium) Microsoft Edge (neither of which should be used on the Internet), and I would be impressed by a renderer that could distinguish mnw-TH (Mon of Thailand) and mnw-MM (Mon of Burma), let alone carry over that distinction into Pali. I would rely on stylistic sets (feature ssxx) if I were writing a font to serve both. The capability for CSS to specify the 'language' by an OpenType language tag is at risk of being dropped - when I last checked, only Firefox supported it. --RichardW57m (talk) 12:29, 1 December 2023 (UTC)[reply]
P.S. I should have emphasised the word 'should'. For example, some Tai Khuen fonts don't support Pali, despite being written by monks! Their initial cluster support seems limited to those occurring in Tai Khuen, so Indic rearrangement isn't done for all Pali consonant stacks. --RichardW57m (talk) 12:43, 1 December 2023 (UTC)

'pejorative' probably better than 'offensive' as a descriptor for any given sense.

English synonyms in Italian entries

I noticed that a number of Italian entries have synonyms with English words: imho they should either be removed or moved to the sense line. Maybe an Italian editor could take a look.

Here's a list: cercoletto, gazzella dorcade, grolare, leone marino della Nuova Zelanda, orso grolare, orso kermode, orso nero, ossifraga del sud, otaria orsina antartica, otaria orsina del Capo, otaria orsina della Nuova Zelanda, pesce torcia, poiana di Harris, potto, rampichino bruno, spinarolo, squalo gattopardo, squalo longimano, sula australiana, tartaruga del muschio

Thanks, tbm (talk) 07:22, 29 November 2023 (UTC)[reply]

@Benwing2 This probably occurs in other languages as well. Leaving it uncorrected invites more of them. DCDuring (talk) 15:30, 29 November 2023 (UTC)[reply]
We have a wider problem with this when it comes to languages that some users feel are part of another language. For example, Malaysian terms being given as synonyms in Indonesian entries, and hundreds of Adyghe/Kabardian equivalent terms have the other listed as a synonym. Theknightwho (talk) 15:40, 29 November 2023 (UTC)[reply]
I've noticed this in some other languages too and was going to bring this up as a separate topic. tbm (talk) 03:52, 30 November 2023 (UTC)[reply]
The Italian entries (at least the ones I checked) are the old work of a user who was ultimately blocked for myriad problematic edits and unresponsiveness; I wouldn't venture to guess whether he ineptly copied the {{syn}}s from the English entries or ineptly added them. - -sche (discuss) 15:53, 29 November 2023 (UTC)[reply]
I looked at instances of {{syn|en}} that were on pages without English lemmas. There were many alternative and inflected forms of English, and there were some with seemingly wrong language codes, e.g., en for a Hindi word. The rest were Italian and Translingual. The Italian ones are errors, some best corrected by making sure that any existing entries for any of the English terms have the others as synonyms, or by placing them on the definition line, as User:tbm mentioned.
I am not sure whether multiple English words that correspond to a taxonomic name should appear as synonyms in the Translingual entry. I usually put three or fewer in the definition line, but for some common organisms there may be as many as a dozen English vernacular names. Is this how we would prefer to show vernacular names of organisms in all Translingual taxonomic name entries? The other possibilities include:
1. a Translations section
1a. using {{trans-see}} (only possible where there is a suitable English vernacular name)
2. a Synonyms section (i.e., not using {{syn}})
3. none of the above (the most common at present)
Does this need a distinct BP heading and/or a VOTE? DCDuring (talk) 16:18, 29 November 2023 (UTC)
@DCDuring I think you're asking about whether to put English synonyms under Translingual. I'd prefer not to do that because it sets a (bad) precedent for other languages. Instead I think we should strive to do (1) where possible and otherwise list them as synonyms under one or more of the English vernacular names (whichever seems most common). If you want to create a new BP header for this, go ahead, but I don't think we need a vote. As for the Italian terms, note that sometimes English synonyms are given that are wrong, e.g. under Italian squalo longimano we have it defined as oceanic whitetip shark and a synonym of whitetip reef shark is given, but per Wikipedia these are two different species. Benwing2 (talk) 03:38, 30 November 2023 (UTC)[reply]
@Benwing2 So, where do you think we should put non-English vernacular names for taxonomic names? Are they synonyms, translations, or to be ignored? We could just send people to Wikispecies, where Wikispecies has an entry. DCDuring (talk) 18:06, 30 November 2023 (UTC)[reply]
Didn't we have a vote to allow translation tables in Translingual entries for taxonomic names? Or is that not what you're asking? Andrew Sheedy (talk) 18:15, 30 November 2023 (UTC)[reply]
Yes, in 2016. Thanks. I'm still fighting the last war. The (non-policy) page WT:Translations still limits translations to English sections. I should change that and start adding Translations sections and some {{t-needed}}s for some languages spoken where the organisms are native. DCDuring (talk) 18:40, 30 November 2023 (UTC)[reply]
@DCDuring In that case I suppose it's fine to put these terms in translation tables under the Translingual taxonomic name. Benwing2 (talk) 22:00, 30 November 2023 (UTC)[reply]
I wonder how long until contributors start adding creative calques of binomial names to the Translingual translation tables. DCDuring (talk) 23:15, 30 November 2023 (UTC)[reply]

New language codes

@Theknightwho I noticed you adding some new language codes, in particular one for "Old Gorgani". Was there a discussion on this I missed? --{{victar|talk}} 19:33, 29 November 2023 (UTC)[reply]

@Victar It's just Gorgani, and no - it was an obvious gap that was mentioned in several descendants sections. Some entries referred to "Old Gorgani" (while still grouping it under the Caspian languages), so I retained that in those entries. Theknightwho (talk) 19:35, 29 November 2023 (UTC)[reply]
For changes like these, can you please add an edit summary? I'm seeing these changes appear in my watchlist, and I don't know what they're for. These are changes to massive language modules as well. AG202 (talk) 20:13, 29 November 2023 (UTC)
@AG202 Sure. Theknightwho (talk) 20:16, 29 November 2023 (UTC)[reply]
Adding language codes based on what's found in descendants sections is reckless and not a great practice, and they should probably all be reverted. Old Gorgani was added by a single, now mostly inactive, user. If anything, an Old Tabari code should be created and Old Gorgani made a dialectal etymology-only code under it, as I outlined in a proposal some years ago. But all that is neither here nor there -- adding language codes should be a task left to the people who work in and are familiar with the area. @-sche --{{victar|talk}} 20:47, 29 November 2023 (UTC)
And you added Daylami as well, woof. Daylami is unattested and we don't even know what branch of Iranian it belongs to. --{{victar|talk}} 21:26, 29 November 2023 (UTC)
@Victar Certainly with respect to Gorgani, I fail to see how anything you've said justifies non-inclusion. We can certainly discuss specifics, but simply calling for a revert is unhelpful at best. Theknightwho (talk) 21:38, 29 November 2023 (UTC)[reply]
"We can certainly discuss"? Are you an expert in Iranian languages now as well? My recommendation, as someone who's worked extensively in this area, is that both these codes be deleted. Old {{desc|ira-gor|راست|tr=rāst}} clearly indicates that this has not been well thought out. --{{victar|talk}} 22:01, 29 November 2023 (UTC)[reply]
@Victar What is your proposed solution to the fact that Gorgani is attested, and therefore requires a code of some sort? On what basis should it be considered part of Old Tabari? Theknightwho (talk) 22:08, 29 November 2023 (UTC)[reply]
THW, the status quo is that there is no "Gorgani" language code. The onus is on you to prove that, despite ISO not assigning it a code, it should exist. --{{victar|talk}} 00:14, 30 November 2023 (UTC)[reply]
@Victar I never said there was - I'm talking about the fact that the language evidently does exist, as you very well know.[12][13] The question, as I saw it, was whether we should follow your view that it is part of Old Tabari, which is something I cannot find corroborating evidence for. Theknightwho (talk) 00:20, 30 November 2023 (UTC)[reply]
Languages can have alternative names and exist on dialectal continuums. That does not make them different languages. --{{victar|talk}} 00:27, 30 November 2023 (UTC)[reply]
@Victar
  1. The first paper I've linked notes distinctive characteristics from other Caspian languages; specifically by direct comparison with Mazanderani.
  2. We do not have an Old Tabari code either (and it isn't recognised by the ISO), so the exact same issue arises. If you want to argue that they should all fall under the Mazanderani code, then I refer you back to point 1.
  3. We're not bound by the ISO, so even if we were to assign a code to Old Tabari (possibly as the Caspian protolanguage?), retaining Gorgani as part of that whilst keeping languages like Mazanderani separate would seem to be inconsistent, unless Gorgani were particularly conservative. Is it?
  4. Which alternative names are relevant here (that aren't merely alternative spellings like "Gurgani")?
  5. I agree that languages can exist on a continuum, but that does not preclude having separate codes: for example, Adyghe and Kabardian, or Chechen and Ingush. Again, it is irrelevant that the ISO separate them, because we frequently do choose to merge ISO language codes, but the existence of a dialect continuum is certainly not sufficient alone to justify this.
Theknightwho (talk) 00:44, 30 November 2023 (UTC)[reply]
THW, these are called dialectal features. To quote Habib Borjian, whom I've personally spoken to on this, from Borjian (2014): "Texts have survived also in the dialect of Gorgān, closely related to Tabari". But be my guest and show me what features of Gorgani merit it being its own language. --{{victar|talk}} 01:07, 30 November 2023 (UTC)
@Victar The same Habib Borjian who wrote "The Extinct Language of Gurgan: Its Sources and Origins" which I linked above? It has the opening line "One of the poorly studied Iranian languages is Gurgāni, the extinct language of Gurgān, the Persian province at the southeastern corner of the Caspian Sea." He specifically calls it the "Gurgāni language" numerous times throughout.
Sure, it was written in 2008, but I'd like to see something which actually evidences his change in view, and not a selective quote. Theknightwho (talk) 01:32, 30 November 2023 (UTC)[reply]
Hang on - I've found it: you're quoting Borjian from a 2004 paper (Māzandarān: Language and People), not 2014, which means it pre-dates the paper by the same author that I linked by 4 years, and it only mentions Gorgani in passing. That's not evidence of anything. Theknightwho (talk) 01:40, 30 November 2023 (UTC)[reply]
Probably none (no features that merit it), because I don’t really know how one can argue about that kind of thing. Even if we sat down in front of all the material, no convincing treatise would result; authorities on something this exotic appear about once in a generation, and we are then supposed to extract a conclusion for our purposes via hearsay. You realize that we are trying to make things out with a magnifying glass.
For such an undertaking I find one edition, Saul Shaked, "An Early Geniza Fragment in an Unknown Iranian Dialect", Acta Iranica 28 (1988), 219–235, and the following are all mentioned as publications on the lect: Edward G. Browne 1896, 69–86; Browne 1898; Browne 1907; Browne 1920, 365–375; Huart 1909.
So Victar, what should I do, in your view, if I add words from these texts? It is not a lot, at least. It would feel like an orphan or dead end to create such “Gorgani” entries, so there is a purely practical argument here in Victar’s favour, I suppose. I have ostentatiously not even started reference templates. Fay Freak (talk) 01:52, 30 November 2023 (UTC)
@Fay Freak To be fair, the very author that Victar claims the authority of has written a paper that specifically addresses the very question we're asking in this thread, and clearly speaks of it as a distinct language in its own right while noting numerous distinctive features (8 of them are listed on page 690 alone) in comparison to Mazanderani (aka Tabari). With a language this obscure, it's about as decisive as we could hope for. Theknightwho (talk) 01:59, 30 November 2023 (UTC)[reply]
@Theknightwho: If there weren’t distinct features, there would be no talk of it at all, and it would not even be a dialect. Reading this paper untainted by previous discussion, I would not conceive the notion that it is a language “in its own right” but that it is a peripheral dialect of Old Mazanderani. Any city of Lower Egypt has or has had greater difference from the neighbouring one and we even merged Lower with Upper Egyptian Arabic for distinction being impractical, admittedly because we know the living languages better. Not to speak of the ridiculous Low German continuum or Swiss German where each valley is likely to have greater difference if one searches for them (as the author did). The unknown tends to be aggrandized if a bright shiny object is brought to attention as an anchor, no known information balancing it out. Practically I would use a label Old Mazanderani or Old Ṭabarī, like Old Amharic and Old Harari, as dedicated chronolectal staging for usually occasional bears no benefit, and within that Gorgānī or Gurgānī to keep the peculiar material separate. In the rare cases where one would pin a Persian word down to be borrowed from Gorgānī one wouldn’t even technically need an etymology code, one would state borrowing from “Gorgani Mazanderani” if not for the incongruous sound, “Gorgani (Mazanderani)”.
A maiore ad minus Daylami and Shāhmirzādī are nothing, Gilaki and Mazanderani dialects, according to the very Borjian 2008 paper. Fay Freak (talk) 03:24, 30 November 2023 (UTC)[reply]
@Theknightwho I have to agree with User:Victar and User:Fay Freak here. You should not unilaterally add language codes, esp. L2 ones. Even in seemingly straightforward cases, like Old Slovak, discussion and consensus are needed before adding any codes. I would suggest you remove the codes and convert any uses in Descendants sections to mention the lects in question without using any codes, and then post a summary of what you think ought to be added, inviting discussion. Benwing2 (talk) 03:47, 30 November 2023 (UTC)
@Theknightwho Can you go ahead and undo your changes? Otherwise I'm going to revert everything you added to Module:languages/data/exceptional between Nov 26 and Nov 29, since I can't tell what is legitimate or not, and you have mixed a whole lot of changes together (and without proper changelog messages in most cases). Benwing2 (talk) 01:16, 4 December 2023 (UTC)[reply]
@Benwing2 Yes, okay - but I don't feel confident that a discussion on these two languages is going to be a fruitful one, given how Victar has conducted himself in this thread. It's extremely frustrating. Theknightwho (talk) 01:21, 4 December 2023 (UTC)[reply]
You added several other codes without discussion as well, but only reverted these two I mentioned. --{{victar|talk}} 04:19, 4 December 2023 (UTC)[reply]
@Theknightwho I have to agree with Victar again; all additions should be removed, and you should post a list of the proposed new languages and why they should be added. I imagine some of them will be noncontroversial and necessary, but some may need discussion. In general User:-sche is a good resource for obscure lects and from my experience is unlikely to be stubborn or intransigent, so you should ping them when you post the new list. Benwing2 (talk) 04:41, 4 December 2023 (UTC)[reply]

User:Theknightwho is continuing to add more language codes without any discussion. @Benwing2, -sche, Chuck Entz. --{{victar|talk}} 04:35, 6 December 2023 (UTC)[reply]

@Victar Which language codes? I already wrote a message on his talk page about new family codes that he added and the need to get consensus on them. Were there individual language codes added as well? Benwing2 (talk) 05:13, 6 December 2023 (UTC)[reply]
@Benwing2 No, but @Victar has never let honesty get in the way of a good vendetta. Theknightwho (talk) 05:16, 6 December 2023 (UTC)[reply]
@Benwing2: Please see the language family codes they added to Dravidian. --{{victar|talk}} 05:26, 6 December 2023 (UTC)[reply]
@Victar Please learn to read the comments you are responding to. Theknightwho (talk) 05:28, 6 December 2023 (UTC)[reply]
@Benwing2: He still hasn't reverted those edits. Can you go ahead and do that? --{{victar|talk}} 02:14, 7 December 2023 (UTC)[reply]
@Benwing2 This is starting to feel vindictive and counterproductive from Victar at this point, especially when he’s made it plain he doesn’t wish to discuss things directly. Theknightwho (talk) 02:47, 7 December 2023 (UTC)[reply]

User:MedK1 wants to use AutoWikiBrowser

Hi, that's me. On my user page, I have a little list of words I'd like to take care of. What I need to do is sort through the list and, on pages where the reintegrated spelling matches the standard spelling, add {{gl-reinteg-conj}} under {{gl-conj}}. It's a small change, but since there are roughly 1k–2k articles there, it'd take a long, long time to do without a bot.

If I'm allowed to do this, I plan on starting with the verbs in the "-ar" section that don't end in "-ear", as those are the easiest ones to do. The other verbs need a {{gl-reinteg-verb}} under {{gl-verb}} too.

I also want to add links pointing to {{R:gl:Estraviz}} on pages that need them and, in the future, check which pages erroneously (as per WT:AROA-OPT... or rather, as per the "Proper definition of ..." section a bit above) have roa-opt templates, then replace them once the proper roa-opt pages have been created.

So yeah, could I please get perms to use AWB here? I'm guessing the Discord is not the right place to ask for it. 2804:1B0:1900:E91A:D4AA:F5EB:3499:2286 00:39, 30 November 2023 (UTC)[reply]

@MedK1 I added you. Benwing2 (talk) 03:13, 30 November 2023 (UTC)[reply]
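
For anyone following along, here is a rough sketch of the edit MedK1 describes above, written as plain Python rather than as an actual AutoWikiBrowser rule. The function name `add_reinteg_conj` and the regular expression are hypothetical, and the sketch assumes that {{gl-conj}} sits on a line of its own and that {{gl-reinteg-conj}} can be called without parameters; neither assumption is confirmed in the thread. Deciding whether the reintegrated spelling actually matches the standard one is left to the editor.

```python
import re

# Hypothetical helper for illustration only; it is not part of
# AutoWikiBrowser or of any existing Wiktionary tooling.
# Matches a {{gl-conj}} call (with or without parameters) that occupies
# a whole line. Assumes the parameters contain no nested templates.
GL_CONJ = re.compile(r"^(\{\{gl-conj(?:\|[^{}\n]*)?\}\})[ \t]*$", re.MULTILINE)


def add_reinteg_conj(wikitext: str) -> str:
    """Insert {{gl-reinteg-conj}} on the line after each {{gl-conj}} call.

    Text that already contains {{gl-reinteg-conj}} is returned unchanged,
    so applying the edit twice does not duplicate the template.
    """
    if "{{gl-reinteg-conj" in wikitext:
        return wikitext
    # Assumption: a bare {{gl-reinteg-conj}} with no parameters is valid.
    return GL_CONJ.sub(lambda m: m.group(1) + "\n{{gl-reinteg-conj}}", wikitext)
```

The early return is a deliberate choice: a semi-automated run is likely to revisit pages, so the sketch prefers skipping a page over risking a duplicate template.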