Wiktionary:Beer parlour

Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!

Beer parlour archives

July 2022

Iranian villages

Do we really need Iranian villages and Ethiopian rivers listed as alternative meanings of an English term? Imho this clutter does not serve our readers, hampering navigation through the page. --Ghirlandajo (talk) 21:59, 2 July 2022 (UTC)

  • P.S. I removed this stuff but was instantly reverted with the advice to take my grievances elsewhere... --Ghirlandajo (talk) 22:00, 2 July 2022 (UTC)
These are all valid and attested senses. They should stay in our dictionary. ·~ dictátor·mundꟾ 22:04, 2 July 2022 (UTC)
More meaningful than seventy Americocentric bum-fuck ghost towns / census-designated places in Nowhere, USA. —Fish bowl (talk) 23:33, 2 July 2022 (UTC)
Holy hell, this is even worse than I thought lmao. And yet we've deleted Charizard, let alone the litany of other issues. AG202 (talk) 02:37, 3 July 2022 (UTC)
I don’t mind keeping these, but it does feel a bit eyeroll-inducing when so much time has gone into adding this stuff, when we’re missing a lot of far more notable places elsewhere. Theknightwho (talk) 10:16, 3 July 2022 (UTC)
@Ghirlandajo See: Wiktionary:Criteria for inclusion § Place names. AG202 (talk) 23:39, 2 July 2022 (UTC)
@Ghirlandajo You were told to take the issue to RFV, which is the appropriate place, not to “take your grievances elsewhere”. You also claimed in your edit summary that they aren’t English terms, which is a different argument to the one you’re making here. Theknightwho (talk) 02:18, 3 July 2022 (UTC)
  • Crowdsourcing means relying on volunteers for contributions. If we impose restrictions on what the volunteers want to enter, they tend to stop contributing. Some start by contributing low-value entries that interest them for some reason (their local toponyms, occupational terms, local slang, etc) but eventually stay to contribute material that is of greater value. If a contributor makes contributions that are more trouble (poor formatting, wording, etc) than they are worth, the contributor can be reasoned with and then blocked, if reason doesn't work. DCDuring (talk) 17:14, 3 July 2022 (UTC)
    Well said. Theknightwho (talk) 14:27, 4 July 2022 (UTC)
    The statement of principle may be good, but the particular contribution was made by a veteran contributor, who could do better than contribute probably unattestable toponyms. DCDuring (talk) 14:40, 4 July 2022 (UTC)

Personal attacks by User:Inqilābī

User:Inqilābī has started making personal attacks against editors who disagree with their proposal to exclude organization fullnames from CFI: here they called User:Fay Freak "misguided" for pointing out the flaws in the proposed change in policy. I excised this personal attack upon seeing it; a day later, Inqilābī reverted my edit and posted a message on my talkpage claiming that editors shouldn't edit each other's comments, even to proactively remove personal attacks. After I pointed out that removing personal attacks is one of the few circumstances in which editing another editor's comments is acceptable, and re-removed the "misguided" part of the relevant comment, they stealthily readded the attack in the process of adding other new comments and also called me "misguided" for calling out their use of personal attacks, while trying to claim, in the face of contrary evidence, that they "don’t[sic] make personal attacks". Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 21:49, 2 July 2022 (UTC)

LOL. ·~ dictátor·mundꟾ 21:58, 2 July 2022 (UTC)
@Inqilābī What "LOL", huh? You're one of the most annoying editors here (inb4 says fckn Shumkichi, I know, I know :3). I still remember your unfunny "joke" about Russia and Ukraine, the distaste remains. Two quick questions: 1) why are Indians so pro-Russian, is it some kind of Stockholm syndrome or what? 2) why do we even allow pro-Russian users to edit in the first place? I think we should introduce some form of thought police here because everyone must unanimously support Ukraine (i.e. be a normal, decent person - why is that so hard for some of you?). I will only forgive you if you publicly admit your guilt and repent for your words. Shumkichi (talk) 11:05, 4 August 2022 (UTC)
Double LOL! You okay? Long live Poland 🇵🇱 :D !!! Even though it can’t into space without Soviet assistance… 🥲 (and jokes aside— for anyone wondering, Shumkichi is a liar). ·~ dictátor·mundꟾ 00:05, 5 August 2022 (UTC)
@Inqilābī How am I a liar? You were joking about Ukraine to piss me off a few weeks after the war broke out so who's the liar? Btw. do you only have Internet Explorer in India or what? Because this joke about space is like 1000000000000 years old. We call ppl like you "bezbek" in Poland, look it up. Shumkichi (talk) 08:54, 5 August 2022 (UTC)
@Shumkichi: You’re a liar because I didn’t even mention the word Ukraine in any conversations with you. And no, this joke about space is a current meme (hint: Polandball). PS. Don’t assume in which country I live. It doesn’t matter. ·~ dictátor·mundꟾ 00:20, 6 August 2022 (UTC)
@Inqilābī Jeez, which part do you not understand? You didn't have to explicitly mention the name of the country to make a stupid and disgusting, given the bad timing, joke about it. Btw. are you seriously trying to play the race card? Then you're more Americanised than you realise :D They're the ones who incorrectly use this word for any kind of discrimination or even for simple impoliteness. Wow, the American cultural hegemony seems to work, good job you guys. Ppl like you genuinely make me think that Richard Lynn was right. Shumkichi (talk) 01:26, 6 August 2022 (UTC)
Why do you insist on personal attacks (calling someone a liar) instead of attacking what they say (ie, what you believe to be false)? The person could be mistaken. The person might be joking. The person might be trolling you. YOU may have misread/misinterpreted what was said. Also, you might try taking in w:Fundamental attribution error, w:Attribution bias, etc. DCDuring (talk) 13:21, 6 August 2022 (UTC)
True. But this is an exceptional case where unfortunately Shumkichi started spamming in the BP. ·~ dictátor·mundꟾ 00:11, 7 August 2022 (UTC)
LOL. I am misguided, for other reasons and why am I even here at 1:48 in the morning reading such accusations.
Also, as academics we love people being misguided. Fay Freak (talk) 23:49, 2 July 2022 (UTC)
I don't know the context of this at all, and I agree that people should not make personal attacks but I'm not sure if "misguided" counts as a personal attack. Saying someone is misguided sounds to me similar to saying you believe they are wrong in a given situation; it may be a bit rude but it doesn't sound especially personal, unless something else was also said. Also, in general you should not edit other people's comments; it may be acceptable in the case of egregious attacks (e.g. if someone uses racist language or profanity), but usually it is better to leave them alone and let the words speak for themselves. Benwing2 (talk) 00:09, 3 July 2022 (UTC)
For the record, the phrase that was used by me was ‘misguided opinion’, which was totally apropos, considering the critical and somewhat sharp comments of FF I was responding to (while trying to justify what a wordbook is supposed to be). The accuser attempted to make the false impression that I called other people ‘misguided’ so as to make the non-issue look as huge as possible, but haply, has failed. Whoop whoop pull up should consider taking a wiki-break to relax: sometimes wiki-editing can have an effect on our sanity. Peace be upon all. ·~ dictátor·mundꟾ 11:42, 3 July 2022 (UTC)
Inqilābī has merely called an opinion misguided so you're mischaracterizing what has actually happened. But even if they had called somebody else misguided, that would of course still totally be within the bounds of the acceptable and wouldn't warrant altering somebody else's comments. — Fytcha T | L | C 〉 12:45, 3 July 2022 (UTC)
Yes, it’s simple: You may only edit someone’s comment if he would want it, or his lawyer would advise him it is good, but not if you or somebody else wants it, as this is a falsification worse than any unfounded personal attack. Really, aren’t all the shams on the internet worse than its abundance of distaste and injury? “Personal attack” is a strong word and we may just keep to the standards according to which something is allowed or disallowed as an insult—which is roughly said that an utterance is permissible if it has sufficient proportion of reference to the publicly relevant subject matter (Sachbezug). Fay Freak (talk) 14:55, 3 July 2022 (UTC)
Disagree with the first, agree with the second. Even if they'd want to make the change, let them make it themselves. I'd possibly make an exception for broken links/formatting, but that's it. Theknightwho (talk) 16:01, 3 July 2022 (UTC)
Yes, you would have to conclude that he specifically wants somebody else to correct, an assumption exceedingly rarely permissible, so you disagree not, defining the putative agreement sufficiently narrow. It is but the thought of rightful negotiorum gestio. Fay Freak (talk) 21:03, 3 July 2022 (UTC)
...I think you lost me there. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 02:07, 4 July 2022 (UTC)
@Whoop whoop pull up: if that was a personal attack, then your response to it was, too, and double for this thread. By claiming that Inqilābī's behavior was so serious that you had to take emergency action to deal with it, you're making assertions about their intent and their ability to moderate their behavior. While Inqilābī has never exactly been known for excessive tact, the fact that no one else objected or apparently even noticed there was a problem should have given you pause. This was an open discussion with participation by a number of people who could have taken action, including admins, so there was no reason for you to intervene. Chuck Entz (talk) 15:55, 3 July 2022 (UTC)
If so, then apologies to @Inqilābī for blowing it out of proportion. Peace out. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 16:14, 3 July 2022 (UTC)
It is always better to direct one's negative comments to observable actions rather than direct them against the actor. By better I mean less likely to cause unproductive interpersonal conflict. DCDuring (talk) 17:27, 3 July 2022 (UTC)
@Whoop whoop pull up no, no, personal attacks are generally fine, but I'm a proud hypocrite and since I also don't like the editor you're talking about, I will support you this time. Shumkichi (talk) 11:07, 4 August 2022 (UTC)
Best wishes. ·~ dictátor·mundꟾ 00:09, 5 August 2022 (UTC)
@Inqilābī British Raj - see, I also know how to use Wikipedia, what do I win? PS. At least ping me if you're responding to me, do I really have to teach you some manners? Shumkichi (talk) 08:59, 5 August 2022 (UTC)
Keep it on your watchlist, do I really have to teach you etiquettes? I only ping people for serious stuffs, I do not ping the bored, annoying, racist people out there… Anyways, that Wikipedia article is irrelevant to your argument. ·~ dictátor·mundꟾ 00:20, 6 August 2022 (UTC)

Without having seen the joke in question, I demand, in the name of human rights, human dignity, and a reasonable respect for intellectual freedom, a qualified right of editors to make unfunny jokes within the scope of building the dictionary project. This is not a joke project, but unfunny jokes must not be a plus or minus to any administrative concerns unless they are just totally outside the scope of building the dictionary. There is no need for a right to tell funny jokes; everyone loves those. What's precious and important is the ability to tell an unfunny joke. --Geographyinitiative (talk) 13:07, 4 August 2022 (UTC)

Gurgani language

Hello, could the Gurgani language be added to Wiktionary? It is an extinct Northwestern Iranian language attested in some four books written around 1400. --Saranamd (talk) 11:33, 3 July 2022 (UTC)

I don't have any problem adding this, and the Glottolog code gurg1241 is a good sign that this is recognised by linguists. Despite the name of the Wikipedia article (Gorgani), I note that Gurgani seems to be in much more common use, so I agree we should be using that as the name.
It's one of the Caspian languages, which we've given the code ira-csp, so shall we give this one ira-gur? Theknightwho (talk) 13:16, 3 July 2022 (UTC)

Can the language code be changed?

There isn't a code for Malto; instead, it has codes for its two main dialects, kmj (Kumarbhag Paharia) and mjt (Sauria Paharia), even though they are mutually intelligible and commonly considered the same language. AleksiB 1945 (talk) 13:56, 3 July 2022 (UTC)

By default we've used the ISO 639 language codes, which split them, but that doesn't always make the most sense. I note from the Wikipedia article that the lexical similarity is 80%, which is quite a lot lower than the similarity between many European languages such as Spanish and Portuguese (89%), French and Italian (89%) and various others. Are you sure this makes sense? Theknightwho (talk) 14:32, 4 July 2022 (UTC)
Wait, 89% lexical similarity? How are they considered different languages? Apart from that, almost all articles take them as dialects of Malto, like [1], [2] and [3]. AleksiB 1945 (talk) 15:04, 4 July 2022 (UTC)
80%. Spanish/Portuguese and French/Italian both have 89% lexical similarity. Theknightwho (talk) 16:52, 5 July 2022 (UTC)

(German) Categorically removing adverbially used adjectives

(Notifying Matthias Buchmeier, -sche, Jberkel, Mahagaja, Fay Freak):

As Wiktionary:Requests for deletion/Non-English § extrem seems to soon conclude with deletion, I wanted to find out whether the majority view is that we exclude these forms in general. If consensus exists, I will amend WT:ADE accordingly giving everybody the license to delete these fake adverbs whenever encountered. I will of course take proper care and formulate it carefully so as to not exclude anything remotely interesting (if the adverb use has gained additional senses for instance).

My arguments are laid out in that thread in more detail, but just to recap:

  1. The majority of German grammarians don't consider adverbially used adjectives to be adverbs, i.e. to belong to the lexical category of adverbs.
  2. No serious monolingual dictionary that I'm aware of (not even online dictionaries) has them.
  3. They clutter the adverb category and make it hard to find true adverbs.
  4. They don't provide any meaningful insight: extrem: adj. extreme, adv. extremely; böse: adj. evil, adv. evilly; händchenhaltend: lol

A related issue: adjectives can be used in 3 ways (attributively, predicatively, adverbially). Whenever a given adjective can only be used in 1 of these 3 ways, we have a way to denote that fact: attributive only (Hamburger, siebte): {{de-adj|pred:-}}; predicative only (barfuß): {{de-adj|predonly}}; adverbial only: adverb entry. However, whenever an adjective can be used in exactly 2 of these 3 ways (but not the third), we currently have no way of documenting that. Some work would have to be done (perhaps also in connection with the adjective templates, @Benwing2) if the concern is that we hereby lose the information of which adjectives cannot be used adverbially (this information is not currently present because almost all adjectives have no adverb section, but it could theoretically be there if we had all possible adjective->adverb conversions as PoS headers). See the RFD thread for a more in-depth discussion on this (especially the proposed categories and the examples to which this applies).

So: should "boring" (i.e. ones that have nothing special about them; see above) adverbially used adjectives be excluded from having separate adverb entries? — Fytcha T | L | C 〉 01:25, 4 July 2022 (UTC)


  1. Support. — Fytcha T | L | C 〉 01:25, 4 July 2022 (UTC)
  2. Support. This feels similar to the issues brought up in the essersi discussion. Things that are entirely predictable and are easy for language learners to grok tend to clutter up entries and categories if they are specified explicitly. (IMO this argument does not apply so much to inflected forms of most inflected European languages because the forms typically are complicated and not easy for language learners. That's why for example I created a script to generate Russian non-lemma forms and am in favor of doing similar things for Romance languages, Arabic, etc. Turkish is a different story; same for Arabic nouns and verbs with object suffixes, but Hebrew object-suffixed nouns and verbs are far less predictable and hence it makes sense to include them.) BTW I can add the necessary support to {{de-adj}}. I'm guessing it's impossible to have an adjective that's both attributive-only and predicative-only, so we just need to add an indicator of whether an adjective can be used adverbially, and since most can, it should be something like noadv to indicate that it cannot be used adverbially (right?). Benwing2 (talk) 01:41, 4 July 2022 (UTC)
    @Benwing: That would be great and very welcome if you could change the templates accordingly. To summarize:
    • siebte can only be used attributively
    • barfuß can only be used predicatively
    • klein can be used attributively and predicatively, but not adverbially
    • monatlich can be used attributively and adverbially, but not predicatively
    Adverbially only doesn't make sense and I'm not sure whether predicative+adverbial exists. Something like noattr, nopred, noadv, as you propose, would be good in addition to (or as a replacement for?) predonly and pred:- (though the latter should IMO be renamed to attronly). — Fytcha T | L | C 〉 09:39, 4 July 2022 (UTC)
    It appears that siebte can be used predicatively, but in German it is then somehow often considered a nominalization and capitalized: Brömmel wurde Siebte in 4:26,92 Minuten;[4] Bei den Mädchen holte sich Mathilda Bertaggia souverän den Titel, Paula Kristan wurde Siebte;[5] Teresa Stadlober belegte Platz neun im Final Climb und wurde Siebte in der Gesamtwertung.[6] (If they were men, they’d become Siebter.) Not all uses are capitalized: Luise Werner wurde siebte in Ihrer Wertung;[7] Nadine Kuhl wurde siebte in der Gewsichtsklasse bis 57 Kg in der gleichen Altersklasse;[8] Die Schweiz ist nicht siebte geworden, sondern fünfte, mit gleich viel Punkten wie Dänemark und Schweden.[9]  --Lambiam 19:54, 5 July 2022 (UTC)
  3. Support  --Lambiam 19:56, 5 July 2022 (UTC)


  1. Oppose. As I said at the deletion discussion for extrem, CFI says "all words in all languages"; German zero-derived adverbs are words in a language. On the other hand, CFI says nothing about excluding completely predictable derived forms without morphological change. We're not paper, so we don't have to worry about saving space like major German dictionaries do. The further discussion at that thread also shows significant doubt thrown on the assertion that all German adjectives predictably form zero-derived adverbs. —Mahāgaja · talk 08:08, 4 July 2022 (UTC)
    @Mahagaja: Yes, all words in all languages. The German adverb word extrem doesn't exist though; this supposed adverb is not part of the German lexis as has been demonstrated sufficiently. — Fytcha T | L | C 〉 09:41, 4 July 2022 (UTC)
    @Fytcha: So when I say Das ist extrem unwahrscheinlich I'm not speaking German? —Mahāgaja · talk 09:54, 4 July 2022 (UTC)
    @Mahagaja: The difference between lexical and syntactical categories has also been explained sufficiently. — Fytcha T | L | C 〉 09:55, 4 July 2022 (UTC)
    @Fytcha: Well, the distinction between adverbs and adjectives used adverbially has been asserted, but without any evidence that there's actually a difference. —Mahāgaja · talk 10:11, 4 July 2022 (UTC)
    @Mahagaja: Sorry, I was being unnecessarily unfriendly to you in my previous reply. To expand on it a little: in that sentence, extrem is an Adverbial, a member of the syntactic, relational category of adverbially-acting components. Adverbiale is a category that exists only in relation to a phrase, just like Objekte. However, it is not a member of the lexical category Adverbien. This is the view of the majority of German grammarians according to the de.wiki articles that I've cited in the RFD thread. To provide some more literature on this (including actual arguments, not just expert opinion):
    • 2012 October 24, Petra M. Vogel, Wortarten und Wortartenwechsel: Zu Konversion und verwandten Erscheinungen im Deutschen und in anderen Sprachen (Studia Linguistica Germanica)[10], Walter de Gruyter, →ISBN, OCLC 300492507, page 212:
      Das spiegelt sich auch in der unterschiedlichen Terminologie für solche Erscheinungen als 'adjektivische Adverbien' (z.B. ADMONI ⁴1982: 204) oder 'adverbiale Adjektive' (z. B. EISENBERG ³1994: 220). Ich plädiere hier mit EISENBERG für letztere Lösung, und zwar aufgrund der lexikalischen und formalen Übereinstimmung mit flektierten Adjektiven und setze hier parallel dazu ein Nullallomorph an.
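      This is also reflected in the differing terminology for such phenomena, as 'adjectival adverbs' (e.g. ADMONI ⁴1982: 204) or as 'adverbial adjectives' (e.g. EISENBERG ³1994: 220). Here I side with EISENBERG in favour of the latter solution, namely on the grounds of the lexical and formal correspondence with inflected adjectives, and, in parallel with these, I posit a zero allomorph.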
    • 2016 December 17, Peter Eisenberg, Grundriss der deutschen Grammatik[11], Springer, →ISBN, 6.1 Abgrenzung und Begriffliches, page 204:
      Kein terminologischer Glücksfall ist das Nebeneinander der Begriffe Adverb und Adverbial. Meistens - aber längst nicht immer - wird Adverb als kategorialer, Adverbial als relationaler Begriff verwendet. Wir folgen diesem Usus und gebrauchen ›Adverbial‹ synonym mit ›adverbiale Bestimmung‹ als Bezeichnung für eine syntaktische Relation (s.u.).
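      The coexistence of the terms Adverb and Adverbial is not a terminological stroke of luck. Usually, but by no means always, Adverb is used as a categorial term and Adverbial as a relational term. We follow this usage and use ›Adverbial‹ synonymously with ›adverbiale Bestimmung‹ ('adverbial modifier') to designate a syntactic relation (see below).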
    • 2017 December 18, Peter Eisenberg, Grundriss der deutschen Grammatik[12], Springer, →ISBN, page 209:
      Wir schließen uns dieser Position nicht an, sondern plädieren für eine Zuweisung zu den Adjektiven und wollen nur noch von adverbialen Adjektiven sprechen. Zu den drei genannten Argumenten: (1) adverbiale Adjektive haben viele Stellungsmöglichkeiten mit Adverbien gemeinsam, viele andere aber nicht, beispielsweise die in 2. Auf weitere syntaktische Besonderheiten wird weiter unten eingegangen. (2) Adverbiale Adjektive sind auf das Verb bezogen, das ist unstrittig. Nennt man sie deshalb Adverbien, so müssen die Adverbien anders benannt werden, denn sie sind gerade nicht auf das Verb bezogen (dazu 6.1). (3) Die Zuweisung zu den Adverbien aufgrund von Unflektiertheit beruht auf einem systematischen Irrtum. Adverbien sind einelementige (›uneigentliche‹) Paradigmen, die wir deshalb als nichtflektierbar bezeichnet haben (2.1). Nichtflektierbar und unflektierbar is nicht dasselbe. Die Kurzform des Adjektivs, wie sie in prädikativer und adverbialer Position erscheint, ist nicht markiert in Hinsicht auf Genus, Numerus und Kasus und deshalb unflektiert. Das Paradigma, dem sie angehört, ist aber keineswegs nichtflektierbar. Wer dieses Kriterium zum ausschlaggebenden für eine Zuweisung zu den Adverbien macht, müßte jedenfalls auch das prädikative Adjektiv zu den Adverbien zählen (so im Prinzip in Droescher 1974). Das beseitigt zwar nicht die Verwechslung von nichtflektierbar und unflektiert, ist bezüglich des einmal gemachten Fehlers aber konsequent. Auch aus der Sicht einer traditionellen Kategorienlehre hätte die Klassifikation als Adverbien unerwünschte Konsequenzen. Fast alle Adjektive können adverbial verwendet werden. Würde man sie in dieser Verwendung als Adverbien klassifizieren, wären die Adjektive Homonyme einer Teilklasse der Adverbien. Die Kategorie Adjektiv würde nur Elemente enthalten, zu denen es auch ein homonymes Adverb gäbe (Grundzüge: 621). Eine der vier lexikalischen Hauptkategorien hätte ihre Eigenständigkeit verloren.
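      We do not adopt this position, but instead argue for assigning them to the adjectives, and will from now on speak only of adverbial adjectives. On the three arguments mentioned: (1) adverbial adjectives share many possible positions with adverbs, but many others not, for example those in 2. Further syntactic peculiarities are discussed below. (2) Adverbial adjectives relate to the verb; that is undisputed. If they are called adverbs for that reason, then the adverbs must be given a different name, because they precisely do not relate to the verb (on this, see 6.1). (3) Assigning them to the adverbs on the grounds that they are uninflected rests on a systematic error. Adverbs are single-element ('improper') paradigms, which we have therefore described as non-inflectable (2.1). Non-inflectable and uninflected are not the same thing. The short form of the adjective, as it appears in predicative and adverbial position, is not marked for gender, number and case and is therefore uninflected. The paradigm to which it belongs, however, is by no means non-inflectable. Anyone who makes this criterion decisive for assignment to the adverbs would in any case also have to count the predicative adjective among the adverbs (as, in principle, in Droescher 1974). Although this does not eliminate the confusion of non-inflectable and uninflected, it is at least consistent with regard to the error once made. From the point of view of a traditional theory of categories, too, classifying them as adverbs would have undesirable consequences. Almost all adjectives can be used adverbially. If they were classified as adverbs in this use, the adjectives would be homonyms of a subclass of the adverbs. The category adjective would contain only elements for which there would also be a homonymous adverb (Grundzüge: 621). One of the four main lexical categories would have lost its independence.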
    • 2018 June 22, “Adverb”, in grammis - Grammatisches Informationssystem[13]:
      Adjektive, die von Haus aus ja flektierbar sind, gelten also auch dann nicht als Adverbien, wenn sie in adverbialer Funktion vorkommen: So kommen die Adjektive hastig (ein hastiger Schritt) und laut (eine laute Stimme) in den Sätzen Peter isst hastig und Fritz hat laut gelacht als Adjektive in der Funktion eines Adverbiales vor.
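      Adjectives, which are by nature inflectable, are therefore not regarded as adverbs even when they occur in an adverbial function: thus the adjectives hastig (ein hastiger Schritt, 'a hasty step') and laut (eine laute Stimme, 'a loud voice') occur as adjectives in the function of an adverbial in the sentences Peter isst hastig ('Peter eats hastily') and Fritz hat laut gelacht ('Fritz laughed loudly').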
    Note that Eisenberg is considered more or less the standard reference for German grammar. — Fytcha T | L | C 〉 11:17, 4 July 2022 (UTC)


  1. On balance, I'm weakly inclined to support this, but only weakly. Grammarians do seem to feel that these (adjectives used in adverbial roles) don't belong to the category of the adverb part of speech, and because (nearly) any adjective can be used in this way, it does fill the Adverbs category with noise; compare if we put languages' stative verbs in both the verb and adjective categories to swamp out true [nisba, etc] adjectives. On the other hand, is this different from crowding the noun category with nominalizations of verbs? Meh. I also wonder if turning the presence or absence of an "adverb" section into a question of attestation is more like turning the presence or absence of e.g. a plural into a question of attestation (good), or like turning the presence or absence of a dative feminine singular weak declension slot in an adjective table into a question of attestation (bad, misleading, in that if readers see the section/slot in many entries and then find it missing from one, it would be reasonable for them to gather there's some rule preventing the adjective from being used adverbially or in the dative feminine singular in the weak declension, when in fact the absence is just because we only found two but not three cites of that because the adjective itself was already rare). - -sche (discuss) 22:21, 16 July 2022 (UTC)
  2. I have to admit ignorance, because I must have been absent when Adverbs were introduced in school grammar. I could just as well oppose in support of Mahagaja, in favor of continuing the discussion, though it's difficult to argue against the communis opinio, which is echoed by w:de:Adverb as well. There remains a caveat:
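  • "Die Zugehörigkeit bestimmter Wörter zur Kategorie Adverb ist häufig umstritten, da nicht immer klar ist, ob die vorgeschlagenen Kriterien nur auf Wörter der Kategorie Adverb zutreffen würden" ("The membership of certain words in the category of adverb is often disputed, since it is not always clear whether the proposed criteria would apply only to words of the category adverb.")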
Maybe that's why they commit to nothing more than two very broad criteria that would include Auto. Vogel begins with notable references for further discussion to continue with Latin adjectives, which exhibit agreement similar to einen schnellen Wagen. We do include some participles as adjectives (verschwindend).
Eisenberg's counterexample "ein hastiger Schritt" is a rash decision [14]. Ein hastig' Schritt would sound archaizing but not wrong, because there are thousands of compounded collocations without a linking vowel. The expression is idiomatic and hastig is debatable. It can be nominalized from pret. er schritt hastig [voran], and the diphthong of the more felicit Schreiten may be the same as for strittig / streitig which points to High German only for the latter.
  • extrem is a useful example because the suffix of extra is possibly adverbial.
scheinbar vs. anscheinend is another neat example of adverbs. The suffixes may point to different grammatical functions that should be reflected in their distribution. Compounds with the first sense rely on the bare root schein- (pseudo-) instead. The -bar suffix is a curious case because it is able to calque -abel (indiskutabel, doch diskutierbar) with an apical approximant, or -e with an uvular approximant /baːɐ̯/ on the other hand (IMHO); Low German feste points to an adverb suffix, s. v. gerne. Latin extremum, -issimus are deceptively similar to German -sam. E.g. arbeitsam pertains literally to a werkwoord, the superlative -ste may be used pronominally.
By the way, the "als" in Vogel's use in a context with "solche" breaks my parsing. By the same token I have to reconsider my "wie" in use to introduce enums. In combination, and in the spirit of the topic, one might try to read an adverbial pronoun phrase all solche, or aliquis in the post field. The use of *-līk is at any rate remarkable. 2A00:20:6007:A3E4:A4A1:3DFA:B280:FB48 18:18, 5 August 2022 (UTC)

Should alternative forms be in topical categories?

For example, should both mesail and mezail be in CAT:en:Armor, or only the lemma? Should both azure and azur (which have out-of-sync pronunciations) be in heraldic categories, or only the lemma? On one hand, having alt forms in the category makes it easier to find the entry you want if you're searching the category for one spelling unaware a different one is lemmatized; OTOH, having half a dozen spellings of chamfron, multiple spellings of mesail, mamelière, affronté/affrontée/affronty, etc cluttering up space makes it harder to look over how many actually-different words are in a category. - -sche (discuss) 10:12, 4 July 2022 (UTC)

I personally never add alternative forms, alternative spellings or even entries equipped with {{synonym of}} to topical categories, and I would like to keep it that way. Thadh (talk) 10:54, 4 July 2022 (UTC)
(Relevant thread: Special:Permalink/67301887#Topic cats.) I've also never done so and I think it's pretty redundant that way. Maybe we should write this down somewhere as a guideline. —Svārtava (talk) • 12:28, 4 July 2022 (UTC)
Synonyms should be kept. — Fytcha T | L | C 〉 12:41, 4 July 2022 (UTC)
Agree to disagree. Thadh (talk) 13:10, 4 July 2022 (UTC)
I add them to topical categories where it makes sense (e.g. regional), but it's a bit silly for (e.g.) an offensive synonym to be in Category:British English but not in Category:English offensive terms. Theknightwho (talk) 14:35, 4 July 2022 (UTC)
It depends on the synonym. If the synonym is completely different in form from the main term, I often add it to the category. Chuck Entz (talk) 14:40, 4 July 2022 (UTC)
The terms double bogey and buzzard both refer to the same thing in golf, but the names for it are very different. I don't see any reason to only categorize one of them in "en:Golf". A dictionary is a directory of words, not of unique concepts. If some words happen to have the same denotation that doesn't make them less valid as entries; whereas mere spelling variations are clearly a different matter. 14:44, 4 July 2022 (UTC)
In principle, terms defined as synonyms deserve fuller treatment than alternative forms, eg, etymology, derived terms. If the synonym is used in fewer or more senses than the term it is defined as a synonym of, it may need a fuller set of definitions or a usage note. DCDuring (talk) 14:46, 4 July 2022 (UTC)
Yes, agreed (also with 98., Chuck and Theknightwho above). Synonyms should be full articles, alternative forms should be stubs. — Fytcha T | L | C 〉 14:49, 4 July 2022 (UTC)
Add me to the list of agrees. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 15:20, 4 July 2022 (UTC)
I find the alternate forms more bothersome and clutter-y than useful. Nicodene (talk) 12:38, 4 July 2022 (UTC)
@-sche, Thadh, Nicodene: I have created Wiktionary:Votes/pl-2022-07/Stubifying alternative forms which formalizes this and a couple of other things related to alternative forms that should be clear but haven't been followed consistently on Wiktionary. — Fytcha T | L | C 〉 13:37, 4 July 2022 (UTC)
@Fytcha: I think references should be allowed (perhaps even obligatory if there are any!) for altforms, since these are the main source of attestation in LDL languages. Thadh (talk) 13:41, 4 July 2022 (UTC)
@Thadh: In the current text they're allowed if they differ from the reference in the main form. Referring to a different entry in a dictionary also falls under "differing". Is this too restrictive in your opinion? — Fytcha T | L | C 〉 13:44, 4 July 2022 (UTC)
@Fytcha: Yes, because entries can give both the main (canonic) form and the alternative form together, while counting as verification for both. Thadh (talk) 13:49, 4 July 2022 (UTC)
@Thadh: Fair enough, I'll change it. — Fytcha T | L | C 〉 13:52, 4 July 2022 (UTC)
As I interpret WT:RfV, alternative forms need three cites, but cites using alternative forms count to support separate senses at the lemma. Is that our policy or practice? DCDuring (talk) 14:13, 4 July 2022 (UTC)
I personally think you might've jumped the gun by creating the vote after only 3 hours of discussion here; that's not enough time for enough folks to even read what's going on, let alone timezone shenanigans. AG202 (talk) 21:37, 4 July 2022 (UTC)
@AG202: I've only just seen this reply now. You can ping me whenever, it doesn't annoy me. As to your comment: no, I haven't jumped the gun because I didn't create the vote only because of this discussion; the discussion was merely the last straw. This particular issue has been a long-term grievance of mine that I run into almost every single day. Just moments ago, marvelous and marvellous were posted to WT:RFM which is yet another perfect illustration of the problems with the status quo. — Fytcha T | L | C 〉 20:00, 6 July 2022 (UTC)
Synonyms: yes. Alt forms: it's another of those synchronisation problems as with color/colour. I suppose we need some bot or template solution in the long term, or things will inevitably fall out of sync. Equinox 14:46, 4 July 2022 (UTC)
Thanks. (I agree synonyms should be treated as their own words with categories etc, although I'm not sure how I feel about using labels that suggest x is a {{lb|en|medicine}} {{synonym of}} y, as if y might be the nonmedical word, when in fact y is also a {{lb|en|medicine}} word...) I intend to go through heraldry terms later, as we have many duplicate entries that should be alt forms, and wanted to know whether to add categories like someone did to mezail or not. I hope the vote doesn't get derailed by people's feelings about national standard spellings; if so, perhaps these could be excepted. - -sche (discuss) 19:09, 4 July 2022 (UTC)
I agree with Equinox and -sche here: synonyms should be excluded. Also, I agree that national spellings should be excluded as well. There's no current written policy stating that terms should be lemmatized at any one alternative form, with WT:FORMS saying "In particular, while some editors try to make the “main” entry correspond to the most common form —and some sysops actively encourage this— the official policy is that all the forms are equally valid. [...] It is not mandatory to make an alternative form entry's content consist exclusively of an alternative form link; especially in cases when all forms are obscure, a gloss is permitted for each form, although it is usually best to indicate that other forms exist. [emphasis added by me]" Though that emphasized part is not ~official~ policy, it does make me lean toward allowing categorization for alternative forms. Also, I don't like that we'd be removing grammatical information like "countable" & "uncountable" from the alternative forms. AG202 (talk) 21:34, 4 July 2022 (UTC)
@Fytcha Also does this only apply to words with {{alternative form of}} on the definition line? What about dialectal forms? Should those not be included in categories as well? I feel that they should. I don't see much reason for ẹkà to be stripped of imagery, categorization, and information, just because the standard ọkà exists, as it limits the information and can hide the term from the categories that it should be in. Same with any difference in regional spellings and standards. I really feel that this is really majority-language-centric as well, like sure the different spellings of naïveté might not need full categorization, but that's far from the case with all alternative forms across all languages, so this proposal does too much for me now. AG202 (talk) 21:44, 4 July 2022 (UTC)
Also what about languages that have different scripts? Would アィヌモシㇼ lose info because aynumosir exists? What about Serbo-Croatian? What about translations at 1st vs first? The phrasing "or should correctly be defined as such" is very vague. AG202 (talk) 22:43, 4 July 2022 (UTC)
@AG202: Writing the vote, I did think about Serbo-Croatian which is exactly what made me add that "(or should correctly be defined as such)" disclaimer because, as it stands, we treat the Cyrillic and Latin entries of any given Serbo-Croatian word as equal, thus neither should correctly be the alt-form of the other (I do acknowledge though that it is worded poorly and I'm open to suggestions). In other words, Serbo-Croatian is excluded from this (and from the looks of it Ainu too, though I don't know the policies surrounding it). Abbreviations, initialisms, clippings etc. are likewise exempt from this vote (as they are not listed in WT:ALTER).
Dialectal forms on the other hand will be stubified which is actually in part what motivated me to create this vote. Not only in English (see the insane amount of duplication and also, unsurprisingly, discrepancy between fiber and fibre), but also in my native language: see the endless duplication of etymologies and alt-forms in the alt-forms of Chatz. When I added one, I had to go to all entries separately and add it, which is definitely not what we want to do. This is not maintainable design. I'm also not a fan of ẹka/ọka: what if a synonym needs to be added, qualified or removed? What if the etymology needs to be expanded? Having multiple non-stubs always leaves the door open for discrepancies to creep in. — Fytcha T | L | C 〉 23:27, 4 July 2022 (UTC)
@Fytcha "What if the etymology needs to be expanded? Having multiple non-stubs always leaves the door open for discrepancies to creep in." Then editors should update the entries? I already have to do it when I add {{see also}} to entries or whenever I have to update template usage or whatever. I don't see the worth in removing important content from dialectal entries because it's not "maintainable". That brings in the same issue that @-sche mentioned with national standards. And then what if someone is looking up that specific common dialectal term? It's easier to see all the information on that singular page and have it listed, rather than having to click to another page. If you want to stubify coverage for the languages you work in that's fine and honestly that seems like something that could be done on an About page, but as for the languages that I've worked in, I wouldn't approve it, especially when we've been told to try and condense Yorùbá into dialects rather than separate languages with their own headers. Also, dialectal variation of this kind isn't listed on WT:ALTER either to be fair. AG202 (talk) 00:27, 5 July 2022 (UTC)
fibre & fiber should just be cleaned up on their own. It also brings up the question of which one is the lemmatized form, which has been avoided for reasons like this. AG202 (talk) 00:29, 5 July 2022 (UTC)
I am (slowly) working on a solution to situations where we don't want to prioritise one spelling over another. I don't think it's appropriate to say one is the lemma, because it depends where you're from. Theknightwho (talk) 16:49, 5 July 2022 (UTC)
@AG202: sure, it would be nice if all our editors did the right thing and updated both entries, but that's not what really happens. The average editor sees something is missing from the one entry, so they add it. Then they go off and do something else. The number of editors who actually go to the other entry is minuscule. Out of those, the number who take the time to check whether sense #18 that they just added is the same as sense #23 in the other entry, let alone figure out where their new sense fits in the completely different order of senses in the other entry is not much greater than the number of participants in this thread. Come to think of it, there's a good chance that the only people who do that are the participants in this thread... Chuck Entz (talk) 07:38, 5 July 2022 (UTC)
@AG202: Besides what Chuck said (which I 100% agree with), I'm really not convinced that my proposal makes the average reader's experience any worse. I can see a point for retaining the image in ẹka because there isn't any major source for discrepancies introduced by it but the etymology and synonyms need to go. Yes, the user needs to click on ọka to get to the etymology and synonyms now, but the experienced user will do that anyway. You know why? Because they have learned that in 4 out of 5 cases the information in such alt-form entries is incomplete.
Also, WT:ALTER does list this kind of dialectal variation, at least that's how I understand the "regional variations" bullet point (yes, the examples given only differ by orthography but I don't understand that bullet point to be restricted to that). — Fytcha T | L | C 〉 11:03, 5 July 2022 (UTC)
@Fytcha That bullet point should be clarified then because it’s not clear. And sure we could move the etymology & synonyms to the other entry, but I still don’t think that it’s necessary to remove categorization or things like that. @Chuck Entz there are a LOT of things that Wiktionary is lacking on or that editors don’t do or update or clean up, whatever, but that doesn’t mean that we should put that laziness into policy and significantly alter the experience for the editors that do do that and find it useful. AG202 (talk) 13:46, 5 July 2022 (UTC)
@AG202: Addendum: a gloss will still be permitted under this vote because it is part of the {{alternative form of}} template. Countable/uncountable and other grammatical categories are allowed per the vote but I lean towards disallowing them as labels. They're again a source of potential copy-paste errors, discrepancies, incompleteness etc. Secondly, they, for a long time, used to look weird to me because they read like "(uncountable) Alternative form of .."; so is it the uncountable counterpart of a normally countable noun? That's how it naively reads at least. -sche seems to share this perception, given their above comment. I'm also not sure whether such grammatical labels in alt-forms even do any good; when you stop and ponder about a word, don't you move on to the non-alt-form anyway? It doesn't matter that the alt-form lacks such-and-such label because it lacks everything else too; normally you'd move to the main form anyway.
@-sche: The issue with your second suggestion in diff is that obsolete (/dated / rare / uncommon) alt-forms should not have labels either; they should be defined using {{obsolete form of}} etc. so I'm not really sure which example to put there to avoid the rightfully raised issue about pondian bias while not sacrificing illustrative value. — Fytcha T | L | C 〉 23:58, 4 July 2022 (UTC)
"I'm also not sure whether such grammatical labels in alt-forms even do any good; when you stop and ponder about a word, don't you move on to the non-alt-form anyway? It doesn't matter that the alt-form lacks such-and-such label because it lacks everything else too; normally you'd move to the main form anyway." I don't as often as you'd think, especially if the entries are actually well-done, and it's an issue of national/dialectal/regional variation. If the important info is already there, then I don't need to go to another location for it. AG202 (talk) 00:31, 5 July 2022 (UTC)
Several problems with this.
  1. At the end of the day, alternative forms are still real words. You're proposing that we essentially turn them into redirects, which would only make topical categories useless for readers. What happens if, for instance, someone tries to find shasqua in Category:en:Swords, and isn't familiar with the shashka spelling we use for the main entry?
  2. What we consider to be the "primary" spelling generally isn't based on any objective criteria other than "someone made this page first". In the case of US vs UK spellings, our policy is to pick the main entry completely at random to avoid giving a preference to either dialect.
  3. Alternative spellings already share most categories (usually automatic ones) with their main entries. I don't see why you're focusing on topic categories in the first place.
Binarystep (talk) 01:27, 7 July 2022 (UTC)
  1. If somebody tries to look up a certain word that they already know, why would they look into that category then? Your proposed use case doesn't exist. What does exist however is that somebody looks into a topical category and finds that the same semantic item is represented by 5 different elements in that category because of slight spelling differences. How is that useful?
  2. Yes, this is a problem (one that should be changed IMO) but luckily it only affects US vs. UK differences. For all other spelling variations we're usually wise enough to just follow the usage, i.e. put the main entry into the most used form.
  3. Nobody browses through the 276k entries in Category:English countable nouns (such categories exist mainly for query purposes) but people absolutely manually browse through our small topical categories which justifies the difference in treatment. — Fytcha T | L | C 〉 02:00, 7 July 2022 (UTC)
    1. Fair enough, that was bad phrasing on my part. People might expect to find a term in a topical category, though, and not having it there would only confuse readers. There's also the fact that, at the end of the day, these are still equally valid words, and there's no lexical justification for treating them like redirects. The reason we even use templates like Template:alternative form of has nothing to do with these spellings being less "real", it's because it'd be a pain in the ass to keep definitions in sync across multiple pages. On the other hand, decategorizing these pages feels more akin to placing them in Category:English non-lemma forms or labelling them as misspellings. Going back to my previous example, there's nothing that makes shasqua less relevant to the topic of swords than shashka.
    2. Admitting it's a problem doesn't fix it. What you're proposing would immediately turn topic categories into a mishmash of random spellings.
    3. So? How does it actually harm readers to learn that a word can be spelled in several ways? This is a solution to a nonexistent problem.
    There's also the fact that a significant number of alternate spellings are etymologically distinct from their more common forms, such as caféteria, cameleopard, choc-a-bloc, gasolene, フューシャ, and ホクシャ. These are distinct terms, and they should be treated as such. Binarystep (talk) 03:03, 7 July 2022 (UTC)

I'm just a humble editor, and I don't understand the concept of a 'topical category' versus another type of category. But I think that, at minimum, Taipei and Taibei should appear in the relevant categories in which only one of either could appear- in this case, one (Taipei) will appear in the Wade-Giles category and one (Taibei) in the Hanyu Pinyin category. Now as to Chongqing and Chong Qing, you may say "it's clutter to include both in the Hanyu Pinyin category". I'm okay with that logic, but I think it would be fun and informative to see all the variants in one category, because only a select few words will be able to achieve (as in, 3 good cites) variants. --Geographyinitiative (talk) 22:00, 4 July 2022 (UTC)

An example of a topical category is Category:en:Soups. It contains members like bird's nest soup, New England clam chowder and powsowdie, which are linguistically unrelated other than being English uncountable nouns, but have an obvious semantic relationship, all being related to the topic soup. --Lambiam 19:30, 5 July 2022 (UTC)

When an alt form, abbreviation, phrase, etc is pronounced like its lemma, components, etc.

Currently, absence of a pronunciation on an alt form can mean either (a) it's pronounced like the lemma, or (b) no-one added it yet (e.g., mezail lacks a pronunciation but is probably /z/ unlike mesail's /s/). IMO we'd benefit from a template saying ~"like other entry" for when an alt form is pronounced like the lemma, to distinguish that from oversight. This would also be useful when some/all pronunciations of an abbreviation, phrase, etc are the same as the full word. E.g. etc. is pronounced like et cetera but they present pronunciations in differing orders; etc. could just point to et cetera. Or look what I did in Latin@, which has some of its own pronunciations which must be listed, but which can be pronounced like "Latino and/or Latina": it'd be dumb IMO to repeat on Latin@ all the ways Latino (/ləˈtinoʊ/, /læˈtinoʊ/) and and (/ænd/, /ɛnd/, /ənd/, /ən/, /æn/, /ɛn/, ...) can be pronounced. - -sche (discuss) 19:35, 4 July 2022 (UTC)

If something is pronounced like the lemma, it's an alt spelling, not an alt form. For abbreviations that's more difficult, but I think we should assume it's also pronounced like the written-out form, and otherwise not call it an abbreviation but rather state that it's an abbreviation in the etymology section.
For things like Latin@ such a template might be useful though. Thadh (talk) 19:40, 4 July 2022 (UTC)
Fair point, in theory "same pronunciation" vs "different pronunciation which no-one has added yet" could be indicated by always using {{alternative spelling of}} for the first one and {{alternative form of}} for the second. But the number of people who use the templates this way seems ... lower than the number who just use one or the other template for either kind of thing. (It also gets tricky if two forms/spellings have different etymologies, e.g. -er vs -or, which would suggest they are different forms and not different spellings, but they are not actually pronounced distinctly.) (It's something DCDuring has talked about: the danger of fine distinctions that are only maintained by adepts is that in practice they are not maintained. There are situations where it frustrates me, because I am one of the adepts who would like to maintain some distinction, but it is what it is.) - -sche (discuss) 19:56, 4 July 2022 (UTC)
Well, the fact people don't follow a practice doesn't mean we don't have a practice or that the introduction of another template would solve that.
The point about -er, -or is a good one, but I think the best solution for that would be creating a second section for -or and call it an alternative spelling of -er (since that is probably the most historically honest representation of what happened). Thadh (talk) 20:05, 4 July 2022 (UTC)
It's further complicated by the fact that "form" refers to different things in different languages. Chinese has a strict definition of what it means to be a form, and it also doesn't make sense to refer to spellings. There is the "synonym" template, but it doesn't have the wide variety such as "obsolete form of", "rare form of" etc. Theknightwho (talk) 20:35, 4 July 2022 (UTC)
Minor point: please don't use the word "like" (which might suggest mere partial similarity). Perhaps "same as", or "see". Equinox 11:37, 5 July 2022 (UTC)
Good point. - -sche (discuss) 22:16, 17 July 2022 (UTC)
Another place this might be useful is with names, which at least some users have been disinclined to call "alternative spellings" of other names (fair; does Ashlee think her name is a mere variant of Ashley as opposed to her own name? or as DCDuring said, does someone whose birth certificate says Jim think his legal name is a mere hypocoristic of James, or does he think it's a separate name?), some of which are pronounced the same. Then again, sometimes distinctions are asserted. (Widsith and I and some others discussed Sara vs Sarah a while ago, where either one can be found, in videos from the US, UK, Australia, etc, pronounced in any of the ways the others are, yet a distinction is sometimes asserted to be supposed to exist.) - -sche (discuss) 22:16, 17 July 2022 (UTC)

Template:R:de:Reverso, Template:R:tr:Reverso etc.

Do we want to have these? I've seen them be added to more and more entries lately. They're just links to a free corpus search engine and not a particularly good one at that. It contains tons of unchecked, machine-translated stuff. The Turkish and Romanian results are particularly poor but the German ones (which I think is one of their best corpora) are still pretty bad at times.

If you search for instance fresh as a daisy in the English-Turkish corpus, you get a lot of results along the lines of papatya gibi taze / papatya kadar taze. This is a non-idiomatic literal translation (I've asked a native speaker) and Google Books also doesn't contain any uses.

If you search called it a day, you get kıyamet gününde onu çağırırız which is utter nonsense. If you search the same phrase in the English-German corpus, you get some correct and idiomatic translations but the fourth translation there is Es wurde gerade gestrichen über der Schimmel und nannte es ein Tag. which is not only a non-idiomatic literal translation of the idiom but also grammatically bogus. If you look it up in the English-Romanian corpus, the first hit is Se pare că calmar murdărie numit o zi prea. which is just incomprehensible gibberish.

If you search all that jazz in their English-Romanian corpus, you get many literal translations actually containing the word jazz (there is no such idiom in Romanian).

It's worth noting that these are the first three English idioms that came to mind. I didn't even have to search to find all these errors. And these are just the problems with translations that at least match, because many pairs don't. Probably has to do with the software that matches the sentences of the two running texts.

If we add search engine buttons to our articles, we may as well link Google "term" site:.tr because that usually at least yields texts written by native speakers (or by humans at all). In my opinion, we should only link to newspaper-grade corpora (the ones that e.g. {{R:DWDS}} provides are good), not to machine-translated garbage.

Pinging also @LinguisticMystic as the creator. Please stop adding these for the time being until they have been discussed. Thanks. — Fytcha T | L | C 〉 23:51, 6 July 2022 (UTC)

Thank you for your feedback. Reverso has a lot of good info especially for smaller languages, and some errors as well. The Swedish and Ukrainian parts are marked as Beta. The other that is frequently used is https://tr-ex.me/ which might also contain errors due to corpus misalignment. Most of these use ML technology to align text corpora. Do you have a site suggestion that you would prefer to use here? LinguisticMystic (talk) 08:21, 7 July 2022 (UTC)
I use this dictionary, but I do not think we should have a link to it. Vininn126 (talk) 09:28, 7 July 2022 (UTC)
They clearly generally aren’t just machine-translated. They are mostly movie subtitles, made by professional translators and fans, perhaps sometimes by way of PEMT – idioms are always stressing the capabilities of translators, as well as very technical language, but those corpora are worth checking for anything colloquially needed by bare languages. Fay Freak (talk) 10:06, 7 July 2022 (UTC)
There are many machine translations if you check various collocations and similar things. It's very hit or miss. Vininn126 (talk) 10:08, 7 July 2022 (UTC)
The dictionary part of Reverso is quite good by the way. Do you have any suggestions for freely available general parallel corpora for smaller languages? LinguisticMystic (talk) 12:10, 7 July 2022 (UTC)
Yes, not exclusively ML. It also depends on the language, the German ones tend to overall be of higher quality. But even something like 20% unsupervised machine translations (which is a lowball figure for Turkish and Romanian) makes this unacceptable to link to. And for German there isn't really a need because we do have high quality, newspaper-grade corpora search engines to link to, for instance {{R:UniLeipzig}} and {{R:DWDS}}. — Fytcha T | L | C 〉 13:05, 7 July 2022 (UTC)
I think we should remove all these, from my experience this is just a slightly better version of Google Translate, and God knows we don't want thát site to be templatised. Let's stick to paper sources as much as possible. Thadh (talk) 12:51, 7 July 2022 (UTC)
Seconded. brittletheories (talk) 16:44, 7 July 2022 (UTC)
To be clear, we may all be talking about different things. I was referring to Reverso Context, which shows phrases from bilingual corpora. The slightly-better-than-GT parts on other subdomains are a different piece of cake, which I do not even use, apparently approximately knowing the value. Fay Freak (talk) 11:38, 8 July 2022 (UTC)
We're not talking about different things: what Reverso does is indiscriminately extract translations by bot from a wide array of websites. Let's take a look at also for instance. Per sentence, the translation is extracted from:
  1. A subtitled video from "Learn to Dance Tango" (a website, unsurprisingly, about learning to dance Tango).
  2. Unknown
  3. Unknown
  4. An article on the website Kömmerling, some kind of Real Estate website. Mind you, the translation's not even there, so we don't know where that comes from.
  5. Unknown
  6. Some kind of very suspicious website that I don't even want to know what it does, again with no translation there.
  7. An article on IVS Dosing Technology, some kind of steam company; The translation was extracted from the English version of the website.
  8. A brochure about renovating museums, the translation is once again from an unknown source.
  9. Unknown
I would go on, but unfortunately further sentences are behind a paywall request to make an account. Note that each sentence had its source indicated as "Various sources". Honestly, even our entries, if unsourced, are a more trustworthy source, because at least you certainly know who the contributor is. Moreover, using whole versions of websites in other languages as a direct source for translations is not a good idea. Thadh (talk) 12:11, 8 July 2022 (UTC)
I personally use context.reverso.net to look for modern usage of Italian terms, because Italian dictionaries (including extremely good ones like Treccani) tend to be quite prescriptive, and as a result often omit modern usages that they don't like or include archaic usages that they do like. I use the left side of Italian-English examples; the English on the right side is sometimes garbled but you can usually get a sense of what's going on. The Italian usually looks pretty good but maybe this is partly because I'm not a native speaker. However, I would not advocate quoting from this source for precisely the reasons articulated by various people above; the quality is uneven and the English, even when idiomatic, isn't always correctly aligned with the foreign language. Benwing2 (talk) 13:48, 8 July 2022 (UTC)
@Thadh, Benwing2: “a wide array of websites” is often exactly what you want, and good as long as the crawler detects whether its sources are human-translated. For technical products you don’t get any other bilingual corpora – I mean, tango? Realtor jargon? Museums? Or fitness and fashion? Exactly the kinds of things that you wanna know but then do not find in from-scratch dictionaries, and I slowly had to add by checking the corresponding content of websites on the same topic but in various languages – the bot is for finding those that have been translated.
I caveat that I have mostly used Reverso Context for Arabic. It has many dialogues by reason of their dumping subtitles, and indeed the idiomaticity problems are what humans from a completely differently structured language and culture have struggled with; principally humans work like bots if they aren’t (paid to be) too creative. So I have long followed their request to make an account (so they can understand what topics people search, right? I also let Firefox and KDE etc. track usage so my interest counts) and look for the best examples of umpteen, since indeed the first examples cannot be relied upon to be the most relevant; likewise dropping a phrase into Reverso Context has often been the easiest solution if man has been too dull to parse it into European terms. But I have never felt a need to link it, in spite of my contributions to the Arabic entries by themselves amounting to a whole dictionary. It could be a discussion page template though, “just look {{R:Reverso|source language|target language}}”. Fay Freak (talk) 19:56, 8 July 2022 (UTC)
@Fay Freak The problem here IMO is that people will use the template for mainspace pages, as they have already done. Benwing2 (talk) 01:05, 9 July 2022 (UTC)
@Thadh: Good analysis, thank you. This helps explain why there are so many machine-translated or not even matching pairs. I would vote to delete if there was an RFDO for these templates. — Fytcha T | L | C 〉 13:59, 10 July 2022 (UTC)

Splitting Category:Form-of templates into lemma and non-lemma template categories

I think this needs to be done desperately. The uses of lemma form-of templates and non-lemma form-of templates are diametrically opposed. When I'm looking for a specialized alternative-form-esque template, all the non-lemma form-of templates are just noise in this category and table. A sortable column in the table would be a start but seeing that the table is not maintained, we should perhaps just split the category into two sub-categories (perhaps Category:Variant-of templates and Category:Inflection-of templates or just simply Category:Lemma form-of templates and Category:Non-lemma form-of templates). Pinging @Benwing2 as you did massive work around all of these. — Fytcha T | L | C 〉 15:46, 7 July 2022 (UTC)

@Fytcha The table hasn't been updated in awhile because although it's autogenerated, the input file for the autogeneration is a manually curated file. I am in the process now of writing a script to autogenerate the input file as well by parsing the template code for each of the form-of templates. As for separating them out, lemma vs. non-lemma may be a bit tricky because some templates may be used for both purposes. However, there's a natural three-way separation into those that are defined using inflection_of_t (which means they take inflection tags, like {{inflection of}}, {{archaic inflection of}}, {{participle of}} and a few others); those defined using form_of_t (which have their text specified using an arbitrary text string like alternative {{glossary|letter case|letter-case}} form of, and which take approximately the same params as {{m}} and {{l}}); and those defined using tagged_form_of_t (which have their text specified using inflection tags, and which also take the same params as {{m}} and {{l}}, such as {{definite singular of}}, {{active participle of}}, etc.). The latter are the closest to what you probably think of as non-lemma form-of templates, although there are some weird outliers like {{alternative reconstruction of}} (which is defined using the tags alternative and reconstruction; this is a holdover from an earlier format and should be corrected). So maybe we can do a first-pass split based on which function is used to define the template, and make any necessary manual corrections. Benwing2 (talk) 01:26, 8 July 2022 (UTC)Reply[reply]
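The first-pass split proposed above, keyed on which helper function defines each template, could look roughly like the following. This is a hypothetical sketch, not the actual script being written: it assumes each template's source is available as a plain string and simply searches it for inflection_of_t, tagged_form_of_t or form_of_t. The category labels are placeholders, and anything matching zero or several helpers is left for the manual corrections mentioned above.

```python
import re

# Hypothetical mapping from defining helper to a placeholder category label.
# The helper names come from the discussion; the labels are illustrative only.
HELPER_CATEGORIES = {
    "inflection_of_t": "inflection-tag templates",
    "tagged_form_of_t": "tagged form-of templates",
    "form_of_t": "text form-of templates",
}

# Word boundaries stop "form_of_t" from matching inside "tagged_form_of_t",
# because "_" counts as a word character in Python regular expressions.
HELPER_PATTERNS = {
    name: re.compile(r"\b" + re.escape(name) + r"\b") for name in HELPER_CATEGORIES
}


def classify_template(source: str) -> str:
    """Return a first-pass category for one template, given its source text."""
    found = [name for name, pat in HELPER_PATTERNS.items() if pat.search(source)]
    if len(found) == 1:
        return HELPER_CATEGORIES[found[0]]
    # No helper, or more than one: flag it for a human to sort out.
    return "needs manual review"


def split_templates(sources: dict[str, str]) -> dict[str, list[str]]:
    """Group template names (keys) by the first-pass category of their source."""
    groups: dict[str, list[str]] = {}
    for name, source in sorted(sources.items()):
        groups.setdefault(classify_template(source), []).append(name)
    return groups
```

Matching on the defining function rather than on the template's name or documentation keeps the split mechanical; outliers such as {{alternative reconstruction of}} would still land in the "tagged" group and need fixing by hand.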

Weird Indonesian rhymes

User:Xbypass has created a ton of pages with rhyme categories looking like this (on bubar):

{{rhymes|id|bar|ar|r|s=2}}

This creates rhyme categories for rhymes like /bar/ and /r/, which makes no sense to me. Is there something special about Indonesian rhymes, or is this user simply misusing the template? Benwing2 (talk) 01:10, 8 July 2022 (UTC)

Rhymes in Indonesian are used in the oral poetic form of the pantun. In its most basic form, the pantun consists of a quatrain which employs an abab rhyme scheme.
First example
Tanam selasih di tengah padang,
Sudah bertangkai diurung semut,
Kita kasih orang tak sayang,
Halai-balai tempurung hanyut.
Second example
Singapura negeri baharu,
Tuan Raffles menjadi raja,
Bunga melur, cempaka biru,
Kembang sekuntum di mulut naga.
I think it makes sense for me to write rhymes for bubar as {{rhymes|id|bar|ar|r|s=2}}. So what is it that makes no sense? Xbypass (talk) 03:20, 8 July 2022 (UTC)
@Xbypass Normally "rhyme" specifically means the end of a word as far left as the stressed vowel, but no more. It does not normally include the consonant preceding that vowel. In the examples you give, there is nothing to suggest that Indonesian rhyme is any different, and the Wikipedia article you link says nothing about rhyme being defined differently in Indonesian than elsewhere. Where is the stress in Indonesian words? I take it that raja and naga have stress on the last syllable, hence the rhyme in /a/ is totally normal. As for baharu and biru, the rhyme is clearly /u/, not /ru/, since your other examples show that the consonant preceding the rhyme vowel is not part of the rhyme. In order to justify a rhyme like /r/, you need to show several examples that rhyme only in the last consonant, whereas you've shown none so far. Benwing2 (talk) 13:40, 8 July 2022 (UTC)
I don’t know if the rules for what constitutes a rhyme are different for Indonesian, but the standard rule for a perfect rhyme requires the onsets of the stressed syllables to be different, so while brutality is a perfect partner in rhyme for reality, it is not so perfect for mortality. The pair baharu–biru seems less than perfect. --Lambiam 15:25, 8 July 2022 (UTC)
The examples of drama poetry at the Indonesian Wikipedia show that the different-onset rule is not in force, or that the stressed nuclei are not required to be identical (or both). --Lambiam 15:41, 8 July 2022 (UTC)
@Lambiam, having read the Wikipedia article on rhyme, I think we have different understandings of the term. @Benwing2 follows the specific sense of rhyme, i.e. perfect rhyme, while I follow the general sense, which allows syllabic and half rhymes to be classified as rhymes alongside perfect rhymes. As far as I know, Indonesian poetry, such as pantun and drama poetry, follows the general sense and not necessarily perfect rhyme. Xbypass (talk) 09:56, 9 July 2022 (UTC)
Yes, the understanding of rhyme in Indonesian (and Malay in general) is different in the first place because stress only plays a minor role in Indonesian prosody. It may depend on the background of the speaker (people from North Sumatra and South Sulawesi, for instance, have very prominent stress that comes with higher pitch and vowel lengthening), but generally, stress has a "floating" character. And in the Riau-Johore area, the cradle of the pantun, it actually falls on the final syllable. So it's not surprising that for rhymes, the final syllable is crucial. So baharu–biru is a perfect rhyme because of the shared onset.
What strikes me as off in the bubar example is -r. Shared final consonants are, as far as I know, not enough to make a rhyme. –Austronesier (talk) 11:06, 9 July 2022 (UTC)
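The two readings of "rhyme" in this thread can be contrasted in a short sketch. Under Benwing2's definition, the rhyme is the stressed vowel plus everything after it, with the preceding onset excluded. Under the looser final-syllable reading that Austronesier describes, the whole stressed syllable counts. This is a hypothetical illustration, not how the {{rhymes}} template works: it assumes words are already split into (onset, nucleus, coda) syllables, and it assumes stress on the final syllable, which per the discussion only holds for some varieties.

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    onset: str    # consonant(s) before the vowel; may be ""
    nucleus: str  # the vowel
    coda: str     # consonant(s) after the vowel; may be ""


def rhyme(syllables: list[Syllable], stress_index: int = -1) -> str:
    """Strict rhyme: the stressed vowel and everything after it.

    The onset of the stressed syllable is excluded. stress_index defaults
    to the final syllable, which is an assumption about Indonesian.
    """
    i = stress_index % len(syllables)  # normalise negative indices
    stressed = syllables[i]
    tail = "".join(s.onset + s.nucleus + s.coda for s in syllables[i + 1:])
    return stressed.nucleus + stressed.coda + tail


def syllable_rhyme(syllables: list[Syllable], stress_index: int = -1) -> str:
    """Looser reading: the whole stressed syllable (onset included) onward."""
    i = stress_index % len(syllables)
    return syllables[i].onset + rhyme(syllables, i)


# bubar split as bu.bar; baharu as ba.ha.ru; biru as bi.ru
bubar = [Syllable("b", "u", ""), Syllable("b", "a", "r")]
baharu = [Syllable("b", "a", ""), Syllable("h", "a", ""), Syllable("r", "u", "")]
biru = [Syllable("b", "i", ""), Syllable("r", "u", "")]

print(rhyme(bubar), syllable_rhyme(bubar))           # ar bar
print(rhyme(baharu), rhyme(biru))                    # u u
print(syllable_rhyme(baharu), syllable_rhyme(biru))  # ru ru
```

Under either reading, bubar yields -ar or -bar, never a bare -r, which is the point Benwing2 and Austronesier both raise; the readings differ only on whether baharu–biru share -u or -ru.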

Are writings unrelated to Wiktionary's goals allowed on user pages?

I found WT:USER, but it is not an official policy. 09:40, 8 July 2022 (UTC)

Yes, within reason. There is a reasonable degree of freedom as to what people can put in their user space; as long as it is not problematic, it will probably be left alone. If you use it as a personal blog, advertising platform, etc., someone will probably delete it. - TheDaveRoss 12:39, 8 July 2022 (UTC)

deleting stress marks

Discussion moved from Wiktionary talk:Pronunciation.

User Theknightwho has repeatedly deleted stress marks, arguing that English monosyllables aren't stressed (?), that stress is "only relative" (as are vowels and consonants), which is grossly ignorant. Lexical stress is phonemic in English, even in monosyllables. If we're going to have a phonemic transcription, shouldn't it be the phonemic description of the word? kwami (talk) 05:08, 8 July 2022 (UTC)

We do not include stress marks on monosyllabic English words when used in isolation, which I have already explained to you (though other users can comment on other languages). Please also tag someone if you’re going to start talking about them in a public forum. Theknightwho (talk) 09:50, 8 July 2022 (UTC)
It's decided on a language-by-language basis, but I got the impression that English entries didn't mark stress on monosyllabic words, just like many other Germanic languages (Dutch, Afrikaans and Saterland Frisian at the very least).
That doesn't hold true for every language, and is very much a choice for each language community. For instance, Russian terms need to have stress marked, even if they are monosyllabic, because vowel reduction depends on the word's position relative to the stress. Thadh (talk) 10:13, 8 July 2022 (UTC)
I’m reasonably sure English includes it if stress is mandatory, but that’s pretty rare, and tends to relate to specific senses of words like the. Theknightwho (talk) 10:17, 8 July 2022 (UTC)
These are actually instances of (emphatic) prosodic stress, not word stress, although it may have an impact on the pronunciation. --Lambiam 15:06, 8 July 2022 (UTC)
Fair point. Theknightwho (talk) 16:05, 8 July 2022 (UTC)
What monosyllabic words would that even affect in Russian? Vininn126 (talk) 12:17, 8 July 2022 (UTC)
@Vininn126 See а, по, на. Thadh (talk) 12:22, 8 July 2022 (UTC)
Ah, fair enough. I think in the more "lexical" words, as opposed to the grammatical ones, it wouldn't be necessary. Vininn126 (talk) 12:24, 8 July 2022 (UTC)
True, but then again, they aren't marked at the moment. Perhaps they should be, though, since what is lexical and what isn't isn't always clear-cut. Thadh (talk) 12:31, 8 July 2022 (UTC)
I think most small prepositions (and also articles) tend to be more grammatical than lexical. Vininn126 (talk) 12:33, 8 July 2022 (UTC)
To be honest, it seems to be the same as with English. We only include stress marks for these where it has semantic relevance. Theknightwho (talk) 13:26, 8 July 2022 (UTC)
'Grossly ignorant' is harsh; no need for that.
As for the matter at hand, I do not see why stress should not be marked in English monosyllables. Clearly there are ones that do carry a lexical stress (girl, door, play) and ones that do not (a, and, or). Nicodene (talk) 15:12, 8 July 2022 (UTC)
Yes, I was harsh, but they were edit-warring over their ignorance, and have a history of making POINTy edits and other troll-like behaviour, so I wasn't feeling sympathetic. I am happy to see that they're arguing the point here. kwami (talk) 17:34, 8 July 2022 (UTC)
No, I reverted your attempt to unilaterally impose your view on a project page without any prior discussion. You started this discussion by misrepresenting what I said, and now you’re misrepresenting why you started it. Just stop. Theknightwho (talk) 18:10, 8 July 2022 (UTC)
Perhaps you didn't look at what you were doing. Paul G deleted, without discussion, stress marks that had been stable for a decade. I reverted him. You then reverted me, again without discussion. If you want to make a change to a long-standing consensus, fine, but when your changes are contested, you need to argue your case rather than just edit-warring. kwami (talk) 06:59, 10 July 2022 (UTC)
You’ve been told what the long-standing consensus is. A handful of inconsistent mistakes are irrelevant. Nobody cares about your Wikipedia-style rules lawyering here. Theknightwho (talk) 10:52, 10 July 2022 (UTC)
Told by who? You? You have no support here, and you're contradicting a decade of consensus. kwami (talk) 17:26, 10 July 2022 (UTC)
What decade of consensus have I overturned here? You're the one arguing that we should include stress marks on monosyllabic English words, which is something that we don't do, other than a handful of exceptions which have already been explained. Or are you dishonestly trying to claim that, because a handful of words on WT:Pronunciation included them, the consensus is for a mish-mash, and on that page only? Because that would be hilarious. Theknightwho (talk) 17:38, 10 July 2022 (UTC)
[phonetic] and /phonemic/ are two different things. For over a decade, all of the phonemic transcriptions on that page indicated phonemic stress. kwami (talk) 18:25, 10 July 2022 (UTC)
Okay, but current practice is that we include them on neither. Make your arguments otherwise, but don't claim that we do something that we don't, particularly when the page itself is silent on the topic (and therefore not authoritative). Theknightwho (talk) 18:28, 10 July 2022 (UTC)

@Nicodene, Thadh, Lambiam Could I have some help here, please? Theknightwho is edit-warring to change a decade-long consensus, perversely claiming they're defending consensus. kwami (talk) 17:36, 10 July 2022 (UTC)

Stress is phonemic in English at the lexical level. A phonemic transcription therefore needs to indicate whether or not a word, even a monosyllabic word, has lexical stress, or else it's not phonemic. Being in isolation doesn't matter, because the phonemic structure of a word is (more or less) independent of that. True, in the major lexical classes monosyllabic words will nearly always have lexical stress, but omitting the stress from a phonemic transcription means that the word does not have lexical stress. I.e., the transcription will be in error. It's also not entirely predictable, since some words ostensibly in major lexical classes have been grammaticalized. Omitting stress from monosyllabic words in English (or Russian) because it's "relative" would be like omitting tone from monosyllables in a register tone language like Yoruba because it's "relative" (which it is), despite the language having a contrast between stressed and unstressed or high and low tone. That's true regardless of whether there is associated vowel reduction in words without lexical stress that can be independently transcribed. Unstressed syllables in English may have reduced vowels, but that's not a general rule.
Several major English dictionaries use the IPA stress marks for non-IPA values: a monosyllable without a stress mark is stressed, while a monosyllable with a stress mark is a disyllable. Wikt could imitate that usage, but then we would not have a phonemic transcription. The vowel letters ⟨ᵻ⟩ and ⟨ᵿ⟩ have been removed from Wikt, despite being convenient and being used by the OED, because they're not proper IPA. By that argument, we shouldn't copy the OED's non-IPA conventions for stress marks either.
(I don't mind marking the non-phonemic distinction between primary and secondary stress, which is prosodic rather than lexical, as long as we don't use "secondary" stress to indicate that an unstressed vowel is not reduced, as MW does. The OED is generally a good model in such cases.) kwami (talk) 17:33, 8 July 2022 (UTC)
I thought I'd copy an answer from my talk page, where it appears that Theknightwho may not understand the difference between phonetic and phonemic:
"Stress marks were present in all the phonemic transcriptions [on the project page], and none of the phonetic ones (monosyllabic). I don't know if that's good practice, but it's common enough to exclude irrelevant lexical details or to add extra-lexical elements to a broad phonetic transcription. A phonemic transcription, on the other hand, needs to include all phonemic distinctions. You're effectively arguing that English monosyllables don't have phonemic stress, which is contradicted by RS's on the subject." kwami (talk) 18:19, 10 July 2022 (UTC)
I do understand the difference, as you very well know from our previous conversations - that's a pretty obvious attempt at poisoning the well. That response you've just posted wasn't actually pertinent to anything I'd said, which was about what the current consensus is. Theknightwho (talk) 18:25, 10 July 2022 (UTC)
Then why do you make ridiculous arguments like it being a "mish mash" to have different phonetic and phonemic transcriptions? Either you don't understand, or you're making arguments in bad faith in an attempt to score points. kwami (talk) 18:27, 10 July 2022 (UTC)
I haven't seen any convincing justification for treating phonetic and phonemic transcriptions differently when it comes to stress. Theknightwho (talk) 18:31, 10 July 2022 (UTC)
Was there actually a discussion about this at some point? Nicodene (talk) 15:04, 11 July 2022 (UTC)

UCSUR Pages

I know a lot of Unicode code points have gotten pages. Should we do the same for UCSUR code points? If so, should we also make some pages for words using UCSUR scripts, such as in toki pona? GTbot2007 (talk) 21:00, 8 July 2022 (UTC)

We don’t do conlangs, so why should we do conscripts? (Those few that we allow don’t have conscripts.) We would also have completeness problems due to copyright. Fay Freak (talk) 21:19, 8 July 2022 (UTC)
Some of the appendix-only conlangs do have conscripts. Also, Cistercian numerals are in UCSUR and are NOT conscripts. GTbot2007 (talk) 21:37, 8 July 2022 (UTC)
The main obstacle is that they aren't part of Unicode, so there's no guarantee that any given site visitor will see the same thing as anyone else. We do have a display hack that allows us to control what site visitors see in certain pages created to work around MediaWiki syntax constraints, but I'm not sure if that would work for Private Use Area codepoints. Someone would have to set it up for you, anyway. Chuck Entz (talk) 00:30, 9 July 2022 (UTC)
It's just like CJK Unified Ideographs Extension G: not many people can see it, but that's why there is an image on the pages. --GTbot2007 (talk) 01:23, 9 July 2022 (UTC)
Not really. There, those who see anything at all see the same characters. Not so here. The whole UCSUR thing is an unofficial agreement among certain groups. Different fonts/systems can and do use Private Use Area codepoints to encode completely different characters. See WT:Grease pit/2019/July#Unicode Private Use Area Characters for "Jurchen Script", WT:Beer parlour/2021/December#Karia alphabet request, WT:Grease pit/2022/May#Unicode Private Use Area and WT:About Han script#Vietnamese for some other uses of Private Use Area sections that have been discussed in recent years. Chuck Entz (talk) 03:18, 9 July 2022 (UTC)
Technically there is nothing official about Unicode either. Also, as far as I can tell, UCSUR is kinda different from those because different fonts agree on UCSUR. --GTbot2007 (talk) 09:22, 10 July 2022 (UTC)
Other agreements exist, such as MUFI, which are incompatible with UCSUR. Theknightwho (talk) 11:15, 10 July 2022 (UTC)
Then we can document both, because a word can have more than one meaning. The only PUA standards I see a lot of people using are UCSUR, MUFI and SMuFL, so we could do just those. --GTbot2007 (talk) 02:31, 21 July 2022 (UTC)
Unicode is an international standard supported by virtually all modern computers. I'm not sure if it's more or less of a standard than, say, the kilogram, which is more omnipresent but faces more outright opposition from the pound and stone. In any case, there's a huge difference between a standard supported by Apple, Microsoft, IBM, Adobe, W3C and just about everyone else, and a standard not officially supported by anyone. --Prosfilaes (talk) 03:44, 12 July 2022 (UTC)
We could work with conflicting PUA interpretations for different scripts if the entries typically have many characters, so that few words have more than one reading. The page title may be gibberish (and it is sometimes not much better for Tai Tham, where there is no control over the font), but the head lines can usually be assigned a sensible font depending on the language. (Of course, as with Old Tamil, there may need to be separate entries for different encodings, in this case Unicode < 14.0 and Unicode >= 14.0.) So the requirement would have to be the existence of 'widely' available fonts for the various encodings. We could likewise accommodate MUFI characters, but in general for MUFI I think our support for normal lexical entries should be restricted to a transliteration service.
For Tengwar, there is confusion over whether the original encoding was superseded by what looks like a Unicode proposal. In general, I think we should require a reasonably well-standardised encoding, but we can cope with two encodings. In principle, it's not much worse than multiple orthographies. --RichardW57m (talk) 15:49, 11 July 2022 (UTC)
Ten years ago I had a discussion on Meta of Mongolian in the PUA, part of which I screenshot above; I now see just boxes. How does “ ” display to you?
I don't see the need, either. The reason why Tolkien's scripts aren't encoded in Unicode is that they have possible IP constraints and the Tolkien estate isn't going to waive them. Same thing with pIqaD and Paramount. So why should we push the line? Yes, the Cistercian numerals are currently only in UCSUR, but they're generally out of scope anyway. --Prosfilaes (talk) 03:44, 12 July 2022 (UTC)
Ok, but there are some scripts that have nothing to do with IP, such as sitelen pona. --GTbot2007 (talk) 02:31, 21 July 2022 (UTC)
On the system at which I'm currently sat, it displays as well as ຘັມມະ (dhamma), i.e. a row of empty boxes. At least they're blue.
Wikipedia depicts pIqaD and Tengwar, and there are serious doubts as to the IP protection, especially in US law. --RichardW57m (talk) 11:10, 12 July 2022 (UTC)
In general, we shouldn't have entries at or using PUA codepoints, for the reasons outlined above (they mean and display different things to different people), for which reason the usual Unsupportedpages title-changing method also wouldn't work, unless perhaps we could change the title to an image. If we want to include images of e.g. Tengwar script at Appendix:Sindarin, then we can discuss the legality of that, but we shouldn't be using PUA codepoints. Prosfilaes' text shows up as boxes for me, whereas Richard's shows up as Lao script (except the first character, which is a box). - -sche (discuss) 00:41, 16 July 2022 (UTC)
The text I gave is entirely composed of Lao script characters; my point is that even assigned characters are gibberish if you lack the means to display them. Now, text of a known language will display if the script is defined and a font is successfully nominated for it. The same goes for text in the PUA. What matters is that there is an agreed encoding* for it. Indeed it has been the case that if a suitable font is not specified for it, typically by a user's CSS file, Pali entries in the Sinhala script will display as some other word. That does not delegitimise the entry. --RichardW57 (talk) 05:32, 16 July 2022 (UTC)
  • In the case of words in the Tai Tham script, it seems that there doesn't have to be a firmly agreed encoding for it. --RichardW57 (talk) 05:32, 16 July 2022 (UTC)
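Whatever is decided, detecting whether a page title uses PUA code points at all, which -sche's position would require, is straightforward. The sketch below is a hypothetical illustration (no existing Wiktionary tool is implied). It relies on Python's unicodedata module, which reports private-use characters with General_Category "Co". It can only say that a character is private-use, not what it means, since UCSUR, MUFI and other agreements can assign different characters to the same code point.

```python
import unicodedata

# Unicode's Private Use Areas, all of which have General_Category "Co":
#   U+E000..U+F8FF      BMP Private Use Area
#   U+F0000..U+FFFFD    Supplementary Private Use Area-A (plane 15)
#   U+100000..U+10FFFD  Supplementary Private Use Area-B (plane 16)


def is_private_use(ch: str) -> bool:
    """True if the single character ch is a private-use code point."""
    return unicodedata.category(ch) == "Co"


def private_use_code_points(title: str) -> list[str]:
    """List the private-use code points in a page title as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in title if is_private_use(ch)]
```

For example, a title containing U+E000 would be reported as ["U+E000"], while the Lao text ຘັມມະ, which is assigned in Unicode but often shows as boxes, would not be flagged; display problems and private use are separate issues, as RichardW57 notes.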

Propose statements for the 2022 Election Compass

You can find this message translated into additional languages on Meta-wiki.

Hi all,

Community members in the 2022 Board of Trustees election are invited to propose statements to use in the Election Compass.

An Election Compass is a tool to help voters select the candidates that best align with their beliefs and views. Community members will propose statements for the candidates to answer using a Likert scale (agree/neutral/disagree). The candidates’ answers to the statements will be loaded into the Election Compass tool. Voters will use the tool by entering their answers to the statements (agree/disagree/neutral). The results will show the candidates that best align with the voter’s beliefs and views.

Here is the timeline for the Election Compass:

July 8 - 20: Community members propose statements for the Election Compass

July 21 - 22: Elections Committee reviews statements for clarity and removes off-topic statements

July 23 - August 1: Volunteers vote on the statements

August 2 - 4: Elections Committee selects the top 15 statements

August 5 - 12: candidates align themselves with the statements

August 15: The Election Compass opens for voters to use to help guide their voting decision

The Elections Committee will select the top 15 statements at the beginning of August. The Elections Committee will oversee the process, supported by the Movement Strategy and Governance team. MSG will check that the questions are clear, there are no duplicates, no typos, and so on.


Movement Strategy and Governance

This message was sent on behalf of the Board Selection Task Force and the Elections Committee
Mervat (WMF) (talk) 09:46, 10 July 2022 (UTC)

Search result ordering?

Discussion moved to Wiktionary:Grease_pit/2022/July#Search_result_ordering?.

eliminating Template:projectlink etc.

We have a zillion ways of linking to Wikipedia, Wikisource, etc. I am going to clean them up. This does *NOT* involve eliminating qualitatively different ways of linking, only redundant templates that do the same thing. We have a lot of qualitatively different link templates (IMO too many), from heaviest to lightest (these are just the ones I've discovered so far; there may be others):

But on top of that, each of these has a bunch of aliases:

We also have {{wpl}}, recently created by User:LinguisticMystic, which vectors to {{projectlink/Wikipedia}} but takes a language code as its first param.

The situation is similar for other projects, e.g. {{projectlink|source}} (and its aliases {{projectlink|wikisource}}, {{sourcelite}}, {{PL:source}}, {{PL:wikisource}}) do exactly the same thing (i.e. they produce the same-looking results) as {{source}} but use different code.

I am planning on eliminating redundant aliases, esp. the longer ones, and different access points for the same thing. For example, I don't think we need {{projectlink}} at all, and it's not used that much (about 1300 uses); use {{pedia}} or a similar dedicated template. I also think {{pedlink}} (177 uses) and {{WPLook}} (no uses) are unnecessary, and long aliases like {{wikipedia-slim}} for {{slim-wikipedia}} are pointless. {{projectlink/Wikipedia}} should not be the main entry; it should sit at {{pedia}}, which is by far the most-used alias, and {{projectlink/Wikipedia}} eliminated. {{wpl}} is a convenient interface for non-English uses, but it should be given a different name, as its existing name is terrible; I suggest {{R:WP}} (or repurpose {{R:W}}, which is currently unused) in line with other reference templates used in References and Further Reading sections. Benwing2 (talk) 03:56, 11 July 2022 (UTC)

  • We only need one shortcut {{wp}} for {{wikipedia}} ({{wiki}} is a bad name as there are lots of wiki- projects).
  • We only need one shortcut {{swp}} for {{slim-wikipedia}}.
  • We only need {{pedia}} and an alternative interface {{R:WP}} ({{pedialite}} is a bad name, both in that it's a longer alias and in that the -lite series of templates normally refers to non-Lua variants of Lua-backed templates, whereas {{pedialite}} is in fact backed by Lua; but obsoleting it is a bit problematic because it's used quite a lot).
  • We only need {{w}}; it's already as short as possible.
  • We don't need {{pedlink}} or {{WPLook}} at all, as I mentioned above.

Benwing2 (talk) 04:12, 11 July 2022 (UTC)

Oh hell, I found some more: {{in wikipedia}}, {{PL:dab}} (a bad name), {{wtorw}} (a horrible name), {{w2}} (another bad name), {{vern}} ("Hey Vern!"). Each has their own code. Seems like each user feels the need to create their own version, all similar but slightly different. "You are in a maze of twisty little passages, all alike". Benwing2 (talk) 04:29, 11 July 2022 (UTC)
@Benwing2 {{vern}} is different. Its main purpose is to track redlinks for vernacular names of taxa. The WP link is only intended as a temporary substitute for the link to the entry; once the WT entry exists, the template links to the WT entry and adds a different tracking category so it can be replaced with an ordinary wikilink. Chuck Entz (talk) 04:45, 11 July 2022 (UTC)
@Chuck Entz I see, thanks. Benwing2 (talk) 05:11, 11 July 2022 (UTC)
"Temporary" will probably mean a very long time where {{vern}} is used for uncommon vernacular names, including mistaken spellings, e.g. using non-ASCII letters, with the wrong gender in the specific epithet, with "ii" instead of "i" for specific epithet endings (or vice versa), etc. DCDuring (talk) 12:13, 11 July 2022 (UTC)
For reference:
Aliased template | Canonical template | # of uses
Template:wikipedia | Template:wikipedia | 263940 (202453 not counting alias uses)
Template:Wikipedia | Template:wikipedia | 3731
Template:wiki | Template:wikipedia | 1326
Template:wp | Template:wikipedia | 56430
Template:slim-wikipedia | Template:slim-wikipedia | 11892 (3275 not counting alias uses)
Template:swp | Template:slim-wikipedia | 8392
Template:slim-wp | Template:slim-wikipedia | 207
Template:wikipedia-slim | Template:slim-wikipedia | 18
Template:projectlink/Wikipedia | Template:projectlink/Wikipedia | 48734 (478 not counting alias uses)
Template:pedia | Template:projectlink/Wikipedia | 34297
Template:pedialite | Template:projectlink/Wikipedia | 12983
Template:projectlink/wikipedia | Template:projectlink/Wikipedia | 354
Template:projectlink/pedia | Template:projectlink/Wikipedia | 620
Template:R:Wikipedia | Template:projectlink/Wikipedia | 2
Template:R:W | Template:projectlink/Wikipedia | 0
Template:WPLook | Template:WPLook | 0
Template:pedlink | Template:pedlink | 168 (165 not counting alias uses)
Template:pedialink | Template:pedlink | 3
Template:w | Template:w | 268233 (267984 not counting alias uses)
Template:W | Template:w | 249
Template:wikipedia-inline | Template:w | 0
Template:in wikipedia | Template:in wikipedia | 4
Template:PL:dab | Template:PL:dab | 7 (2 not counting alias uses)
Template:PL:disambig | Template:PL:dab | 5
Template:PL:disambiguation | Template:PL:dab | 0
Template:wtorw | Template:wtorw | 43
Template:w2 | Template:w2 | 6667
Template:ruby/ja-w2 | Template:ruby/ja-w2 | 484 (0 not counting alias uses)
Template:jarw | Template:ruby/ja-w2 | 14
Template:w2/ja | Template:ruby/ja-w2 | 0
Template:wj | Template:ruby/ja-w2 | 470
Template:zh-wikipedia | Template:zh-wikipedia | 19031 (3 not counting alias uses)
Template:zh-wp | Template:zh-wikipedia | 19028

Benwing2 (talk) 05:48, 11 July 2022 (UTC)
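As a rough illustration of what "eliminating redundant aliases" means mechanically, the table above can be read as an alias-to-canonical mapping and applied to wikitext. This is a hypothetical, naive regex sketch, not the bot tooling actually used for the cleanup. It covers only a few rows of the table, matches template names case-sensitively because the table lists {{W}} and {{w}} separately, and ignores complications such as underscores in names, subst:, or <nowiki> sections.

```python
import re

# A few rows of the alias -> canonical mapping from the table above.
ALIASES = {
    "Wikipedia": "wikipedia",
    "wiki": "wikipedia",
    "slim-wp": "slim-wikipedia",
    "wikipedia-slim": "slim-wikipedia",
    "W": "w",
    "pedialink": "pedlink",
    "PL:disambig": "PL:dab",
    "jarw": "ruby/ja-w2",
}

# Longer names are tried first. The lookahead requires the name to be followed
# (after optional spaces) by "|" or "}", so "wiki" cannot match the start of
# "wikipedia" or "wikilink".
_ALIAS_RE = re.compile(
    r"\{\{(\s*)("
    + "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
    + r")(\s*)(?=[|}])"
)


def canonicalize(wikitext: str) -> str:
    """Rewrite template calls that use a listed alias to the canonical name."""
    return _ALIAS_RE.sub(
        lambda m: "{{" + m.group(1) + ALIASES[m.group(2)] + m.group(3), wikitext
    )
```

Rewriting toward the canonical page name is only one option; the proposal above instead keeps short shortcuts such as {{wp}} and {{swp}}, which would just mean different values in the mapping.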

I am for cleaning this up. Vern is important. While we are at it, can we decide where to put these links? Either under the L2 or under Further reading? Vininn126 (talk) 12:54, 11 July 2022 (UTC)
{{wtorw}} and {{pedlink}} are effective duplicates. They can be summarily merged. (In fact they're also identical to {{vern}}, but that has a reason to exist separately.) This, that and the other (talk) 12:47, 12 July 2022 (UTC)
I don't believe that either {{pedlink}} or {{wtorw}} categorizes, as {{vern}} does, into Category:Entries missing English vernacular names of taxa, {{vern}}'s "reason to exist separately". DCDuring (talk) 13:54, 12 July 2022 (UTC)
I am planning on eliminating {{projectlink}} and {{projectlinks}}. {{projectlinks}} will be replaced by multiple invocations of {{projectlink}}, and then {{projectlink}} will be replaced by using project-specific templates, as follows: use {{pedia}} for Wikipedia; use {{specieslite}} for Wikispecies (because there are 15,256 uses of {{specieslite}} and <= 21 uses of {{PL:species}} or {{projectlink|species}}); use {{PL:PROJECT}} for other Wikimedia projects (e.g. {{PL:books}} for Wikibooks, {{PL:source}} for Wikisource, etc.). @Surjection there is a gadget MediaWiki:Gadget-AggregateInterprojectLinks.js that you have worked on recently that may be affected by these changes; not sure, because the JavaScript appears to operate on the resulting HTML rather than the raw wikitext. Benwing2 (talk) 05:33, 13 July 2022 (UTC)
Hmmm, calling a template beginning with PL: doesn't work for some reason; I'll need a different solution. Benwing2 (talk) 07:14, 13 July 2022 (UTC)
I suggest {{R:wsource}} in place of {{PL:source}}, {{R:wquote}} in place of {{PL:quote}}, etc. The total number of references to all projects other than Wikipedia and Wikispecies is ~500, so this sort of change won't be a big deal. I also propose making the first parameter a language code, but omittable for English, and the second parameter the page to link to, but omittable if the linked page is the same as the page title. That way, we can have {{R:wp}} to replace the badly named {{wpl}}, for linking to Wikipedia in Further Reading sections. Benwing2 (talk) 07:42, 13 July 2022 (UTC)
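The parameter convention proposed here (a language code in |1=, defaulting to English; the target page in |2=, defaulting to the page title) can be illustrated with a small resolver. This is a hypothetical Python sketch, not the real template, which would be written in wikitext or Lua. The function name and the argument dictionary are assumptions, and it treats an empty parameter the same as an omitted one; the URL shape https://LANG.wikipedia.org/wiki/PAGE is the standard Wikipedia article URL.

```python
from urllib.parse import quote


def resolve_wp_link(args: dict[str, str], page_title: str) -> str:
    """Resolve the proposed |1=/|2= parameters into a Wikipedia article URL.

    |1= is a language code (omitted or empty means English);
    |2= is the target page (omitted or empty means the current page title).
    """
    lang = args.get("1", "").strip() or "en"
    target = args.get("2", "").strip() or page_title
    # Wikipedia article URLs use underscores for spaces; percent-encode the rest.
    return f"https://{lang}.wikipedia.org/wiki/{quote(target.replace(' ', '_'))}"
```

So a bare {{R:wp}} on the page dog would link to the English article "dog", {{R:wp|fr}} on chien to the French article "chien", and {{R:wp||Canis familiaris}} on dog to the English article "Canis familiaris".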
Also, I propose renaming User:Fish bowl's badly named {{w2}} template to {{lw}}, since it's effectively a cross between {{l}} and {{w}}; it works similarly to {{w}} but takes approximately the parameters of {{l}} (and even uses {{l}} under the hood). The badly named {{ruby/ja-w2}} + variants {{jarw}} and {{wj}} will become {{ja-lw}}. Benwing2 (talk) 07:47, 13 July 2022 (UTC)
Why not merge {{w}} and {{w2}}? Thadh (talk) 09:08, 13 July 2022 (UTC)
Just a thought on the unification of templates ("what do we need?"): for all of them, we need (i) a link to PAGENAME, (ii) an alternative display text for the link, and (iii) a lang= parameter (en by default). The formats we need:
1) a box, {{wikipedia}}; a text= parameter could be a useful addition?
2) a written-out expression with a visible logo, {{pedia}}
3) an inline link, like {{w}} as used in definitions, with a tooltip and some kind of visible feature to warn the reader that we are linking to a different Wikimedia project. For example:
for Wikipedias, light blue plus a tooltip, e.g. Hominidae (used on the Greek Wiktionary at el:Template:w)
for other Wikimedia projects, light green plus a tooltip, e.g. Hominidae (used on the Greek Wiktionary at el:Template:wsp, el:Template:s)
Thak you, @Benwing2, for your efforts. ‑‑Sarri.greek  I 18:18, 13 July 2022 (UTC)
@Thadh When you say merge {{w}} and {{w2}}, which interface should be used? I suspect the large majority of uses of {{w}} are for English Wikipedia and rely on the default language code of en, whereas the large majority of uses of {{w2}} are for non-English Wikipedias and rely on the convenience of specifying the language code in |1=. From above, there are > 250,000 uses of {{w}}, so it would take some doing to convert them all (not impossible; I converted all uses of {{inflection of}} to put the language code in |1=, and there were about 1.4 million of them). Benwing2 (talk) 00:28, 14 July 2022 (UTC)
@Sarri.greek Thanks for your comments. I agree with your approach of having a three-way distinction between boxes, "Further Reading"-style lines and inline links, and I also like the idea of color-coding the cross-project links. My proposal for naming these templates for projects other than Wikipedia is probably as follows: (1) for a box, {{wikisource}}, {{wikiquote}}, {{wikibooks}}, etc. (these already exist); (2) for a line, {{R:wsource}}, {{R:wquote}}, {{R:wbooks}}, etc.; (3) for an inline link, {{wsource}}, {{wquote}}, {{wbooks}}, etc. Some of the latter already exist. Benwing2 (talk) 03:30, 14 July 2022 (UTC)
@Benwing2, excellent. Let me thank you this time properly, with an n. ‑‑Sarri.greek  I 04:09, 14 July 2022 (UTC)
@Benwing2: I meant, create an RFM and see which one gathers more support. I personally have never used {{w2}}, yet used {{w|lang=X}} quite regularly. Thadh (talk) 07:00, 14 July 2022 (UTC)
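A minimal sketch of the parameter defaulting proposed above for the {{R:...}} line templates: the first parameter is a language code that falls back to English, and the second is the target page that falls back to the current page title. It is written in Python purely to illustrate the rule; on-wiki it would live in a template or module. The function name project_link and the PROJECT_DOMAINS table are hypothetical, and the https://<lang>.<project>.org/wiki/<Title> shape is the usual Wikimedia URL pattern, not something specified in this thread.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical table: proposed template suffix -> Wikimedia project domain.
PROJECT_DOMAINS = {
    "wp": "wikipedia.org",
    "wsource": "wikisource.org",
    "wquote": "wikiquote.org",
    "wbooks": "wikibooks.org",
}


def project_link(project: str, pagename: str, lang: Optional[str] = None,
                 target: Optional[str] = None) -> str:
    """Build the URL a proposed {{R:w...}} template might link to.

    lang   -- template |1=; falls back to English when omitted.
    target -- template |2=; falls back to the current page title when omitted.
    """
    lang = lang or "en"
    target = target or pagename
    domain = PROJECT_DOMAINS[project]
    # MediaWiki URLs use underscores for spaces; percent-encode anything else unsafe.
    return f"https://{lang}.{domain}/wiki/{quote(target.replace(' ', '_'))}"
```

For example, project_link("wp", "Eiffel Tower") yields a link to the English Wikipedia article, while project_link("wsource", "physics", lang="de", target="Faust") overrides both defaults.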

How to request examining whether senses are different

Example: these two sense definitions for the verb sight:

  1. (transitive) To register visually.
  2. (transitive) To get sight of (something).
    to sight land from a ship

I do not understand the difference, which may be due to these senses not being different, but also to the definitions being insufficiently precise, compounded with a lack of aptly illustrative examples. I could propose a merger of the senses at WT:RFM, but I’m not at all sure they should be merged, and there is no applicable template for flagging this. I could request verification of these senses, using {{rfv-sense}}, but the issue is not really verification; it is, rather, enlightenment. What is the proper, or best, way to flag this as an issue for discussion?  --Lambiam 07:45, 11 July 2022 (UTC)

{{rfd-redundant}} Vininn126 (talk) 09:36, 11 July 2022 (UTC)
Thanks.  --Lambiam 11:23, 11 July 2022 (UTC)

Non-Roman Usernames

WT:Usernames_and_user_pages recommends that user names should be easy for English-speakers to pronounce and type. User names like 沈澄心 fail this massively, but are saved by an exemption for the Wikimedia common login. The example user appears to be active on Mandarin Wikipedia. Is there any plan for getting round the problem with names like this? Ideally it would be possible to set up synonyms (as opposed to alternative, softly linked accounts, as with my accounts RichardW57 and RichardW57m, where the latter is relatively insecure). This solution would probably have to be implemented at a Wikimedia/Phabricator level. --RichardW57m (talk) 12:05, 11 July 2022 (UTC)

I don't think this requirement can be enforced, due to unified login. If you have difficulty typing my username, you can ping @Dringsim instead. I will receive notifications via E-mail (response may be relatively slow). 13:04, 11 July 2022 (UTC)
@沈澄心: 'Dringsim' is an example of a 'softly linked account'. Thank you for taking the time to set it up. As you noted, the solution is inferior to it being a synonym for your main account. --RichardW57m (talk) 15:46, 11 July 2022 (UTC)
Yes. (Actually I created it 2 years ago, but forgot to use it XD) 15:56, 11 July 2022 (UTC)
@RichardW57m This is something I was thinking of just today as I tried to ping an editor with an Arabic name. Maybe we should require all users to have a pingable alternative username like Dringsim above. However, this is not enough in itself, as 1) it does not make them recognisable and 2) the alt account is not obvious to anyone seeing the user for the first time. I think those editors should make some allusion to the alt in their nickname. For example, the person above would be called 沈 (@Dringsim) or something to that effect. brittletheories (talk) 11:20, 12 July 2022 (UTC)

Improve on the simplistic intransitive/transitive verb distinction

We have first class support (labels and categories) for the distinction of transitive and intransitive verbs but next to no support for verbs taking objects in other grammatical cases. This leads to tons of inconsistencies:

Turkish:
  1. {{lb|tr|transitive|dative}} in yardım etmek
  2. {{lb|tr|intransitive|with dative case}} in acımak
  3. {{lb|tr|intransitive|+ ablative}} in bıkmak
  4. {{lb|tr|intransitive| + ablative case}} in iğrenmek
  5. {{q|with ablative case}} in nefret etmek

German:
  1. {{lb|de|with dative}} in helfen
  2. {{lb|de|with dative object}} in dienen
  3. {{lb|de|transitive|or|intransitive|+ [[dative]]}} in schreiben
  4. {{+obj|de|dative}} in weichen
  5. {{lb|de|dated|reflexive|with genitive}} in enthalten
  6. {{lb|de|archaic|or|elevated|transitive|_|+ genitive}} in zeihen
  7. {{qualifier|takes the genitive of the direct object}} in bedürfen

Apart from the inconsistencies and missing categorization, it is also not clear whether "transitive|dative" means that the verb takes an accusative and a dative object (as in holen) or only a dative object (as in yardım etmek). I don't think {{+obj}} (experimental since 2014) is a satisfactory solution; if we document the existence of an accusative object in the label, it is only reasonable to expect the same for objects in other cases (as reflected by how widespread these non-standard case annotations within labels are).

Ideally, we would show exactly which objects are present, how they semantically relate to the verb and whether they're optional, but I can't think of a good way to present all this information without bloating everything immensely. The sense in question (6) of schreiben could be labeled with something like {{lb|de|(acc)<what is being written>|dat<to whom it is being written>}} or, without the semantic information, {{lb|de|(acc)|dat}}, which would place it in Category:German verbs that optionally take an accusative object and Category:German verbs that take a dative object, while something like {{lb|de|noobj}} could be used for verbs that take no objects in any case (both senses). I personally find this potentially parenthesized list of cases vastly superior to the simple transitive/intransitive distinction. While it appears to break the long-standing tradition of distinguishing between transitive and intransitive verbs, it is always possible to make {{lb|en|transitive}} an alias of {{lb|en|acc}}; the presentation of this object information can also always be tailored to the specific language (i.e. {{lb|en|acc}}/{{lb|en|(acc)}}/{{lb|en|noobj}} can be made to display (transitive)/(transitive, intransitive)/(intransitive) as usual for English and other languages with simple case systems).

This could also be extended to prepositional objects, but IIRC I proposed that a while back and it didn't garner the best feedback.

Pinging @Benwing2. — Fytcha T | L | C 〉 21:59, 12 July 2022 (UTC)
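A rough sketch of how the case-list labels proposed above ({{lb|de|(acc)|dat}}, {{lb|de|noobj}}, each with an optional <gloss>) might be parsed into display text and categories. Python is used only for illustration; the real logic would sit in Wiktionary's label code. The abbreviation table, the English display overrides, the non-English display wording and the "take no object" category name are all assumptions; the two German category names are the ones given in the proposal.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical abbreviation table for the case codes used in the labels.
CASE_NAMES = {"acc": "accusative", "dat": "dative", "gen": "genitive", "ins": "instrumental"}

# Hypothetical per-language override: English keeps its traditional labels.
ENGLISH_DISPLAY = {"acc": "transitive", "(acc)": "transitive, intransitive", "noobj": "intransitive"}

# "(acc)" = optional object, "dat" = required object, "<...>" = optional semantic gloss.
LABEL_RE = re.compile(r"^(?:\((?P<opt>[a-z]+)\)|(?P<req>[a-z]+))(?:<(?P<gloss>[^>]*)>)?$")


def parse_object_label(label: str) -> Tuple[str, bool, Optional[str]]:
    """Split e.g. '(acc)<what is being written>' into (case, optional, gloss)."""
    m = LABEL_RE.match(label)
    if not m:
        raise ValueError(f"unrecognized object label: {label!r}")
    case = m.group("opt") or m.group("req")
    if case != "noobj" and case not in CASE_NAMES:
        raise ValueError(f"unknown case code: {case!r}")
    return case, m.group("opt") is not None, m.group("gloss")


def display(lang_code: str, label: str) -> str:
    """Text a label might display; English maps onto transitive/intransitive."""
    case, optional, gloss = parse_object_label(label)
    key = f"({case})" if optional else case
    if lang_code == "en" and key in ENGLISH_DISPLAY:
        return ENGLISH_DISPLAY[key]
    if case == "noobj":
        return "no object"
    text = f"+ {CASE_NAMES[case]}"
    if optional:
        text = "optionally " + text
    if gloss:
        text += f" ({gloss})"
    return text


def categorize(lang_name: str, labels: List[str]) -> List[str]:
    """Category names the labels might add, following the proposal's wording."""
    cats = []
    for label in labels:
        case, optional, _ = parse_object_label(label)
        if case == "noobj":
            cats.append(f"{lang_name} verbs that take no object")
            continue
        name = CASE_NAMES[case]
        article = "an" if name[0] in "aeiou" else "a"
        verb = "optionally take" if optional else "take"
        cats.append(f"{lang_name} verbs that {verb} {article} {name} object")
    return cats
```

With this sketch, categorize("German", ["(acc)", "dat"]) reproduces the two categories named for schreiben, and display("en", "(acc)") gives the familiar "transitive, intransitive".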

Isn't transitive more strictly "that which has an accusative argument and also a passive participle"? Granted, having some sort of argument is something we should be documenting, but terminology here matters. Vininn126 (talk) 22:53, 12 July 2022 (UTC)
@Vininn126: Are you asserting that languages that don't have a passive voice don't have transitive verbs? And what counts as a passive participle? What I interpret as Thai participles are unmarked for voice, and don't even need to be analysed as zero-derivation participles. --RichardW57m (talk) 09:24, 13 July 2022 (UTC)
That has been a definition I have always heard, at least within certain languages; i.e. in Polish you have a verb like obejść that takes an accusative object but is considered (by most sources) to be intransitive as it lacks a passive participle. It's even a bit of a pseudo... not shibboleth, but pop-linguistics question to ask people to form the passive voice with that verb. Vininn126 (talk) 09:27, 13 July 2022 (UTC)
@Vininn126: This is the first time I've heard this distinction. It appears to be an idiosyncrasy of Polish grammar specifically (see the section w:Transitive_verb#In_Polish). Also, the rule seems to be "takes a direct object or has a passive participle" which would also make more sense to me. This website names verbs that take the genitive (in the positive) or instrumental but have a passive participle and are thus classified as transitive. This can still easily be accommodated for under the {{lb|en|(acc)}} etc. proposal as this logic for which case annotations lead to which things being displayed can be specified on a per-language basis. — Fytcha T | L | C 〉 10:31, 13 July 2022 (UTC)
Sure, that was the assumption I originally had before hearing certain Polish grammarians. I think it's more intuitive anyway. It doesn't really change my ultimate feelings on such a proposal - I do think it's worth it to discuss how we represent transitivity. Vininn126 (talk) 10:37, 13 July 2022 (UTC)
There's also {{indtr}} for verbs taking a prepositional complement. It doesn't really work for languages with a case system, though. PUC – 22:56, 12 July 2022 (UTC)
@Fytcha What you are proposing with labels is very similar to what I have implemented for {{+obj}}. I prefer doing it using a separate template to the right of the definition rather than mixing case/prepositional usage with semantic labels. I also pinged you elsewhere about this; see User:Benwing2/test-obj for a bunch of examples of what I implemented. What I implemented doesn't currently categorize, but that is not hard to add if we come up with a good scheme for doing it. My main unhappiness with what I implemented is with the appearance; I tried various possibilities, including different colors, to distinguish (a) literal words (prepositions/postpositions/conjunctions/etc.); (b) cases and related grammatical labels (subjunctive, reflexive, etc.); (c) glosses; (d) the word "or". If you or someone else can come up with a good scheme for how this should look, I can implement this and push it to production. BTW there is also {{+preo}}/{{+posto}} (used for a similar purpose to {{indtr}}), which my new {{+obj}} subsumes. Benwing2 (talk) 03:13, 13 July 2022 (UTC)
@Benwing2: Thanks a lot, the new template seems great. I do, however, agree that we need to come up with a solution to make it more easily human-readable. The biggest problem to me is that it is not immediately visually apparent where the definition actually is (i.e. where the labels end and the objects begin). The band-aid solution would be distinguishing background colors for the three components (labels, definitions, objects; even just giving the objects a different background color does a lot), but I can imagine why people would oppose that. If we moved all grammatical information after the sense[1], that would also make it a lot easier to read. Further, if we are linking to the cases anyway, we could also think about using the standard three-letter abbreviations for them to save some space. And on a related note, I personally don't like the semicolon, given that the template already comes with such an easily identifiable or anyway.
I'm not quite sure how this meshes with the transitive, intransitive, reflexive etc. labels. To me it seems like we have split the information about which objects a verb takes into two visually separated areas (most editors seem to share this perception, given that they usually document indirect objects in the labels; there's a definite tendency to document both together, as they're the same kind of information after all). The confusion over whether transitive+dative means that it takes one or two objects (one in the accusative, one in the dative), as I've highlighted above, also persists. I don't know how to solve this outside the radical approach of also moving the accusative object(s) after the definition line (which would be opposed by the community if it came to a vote). The examples for beginnen (no label, +obj transitive), anziehen (transitive, +dative), erinnern (ditransitive, +accusative +genitive) in User:Benwing2/test-obj perfectly demonstrate this issue. But even if we leave these points unsolved and only concern ourselves with non-accusative and prepositional arguments, it is a big improvement.
1: Apart from us treating accusative and objects in other cases differently (documenting the former in labels and the latter either with nonstandard labels or after the sense with {{+obj}}, {{+preo}} etc.), the same is also done for nouns where countability is a label but other grammatical information comes after the sense (see e.g. Ort). I honestly think we are too far gone and that there will never be community consensus to rectify this, so we have to make the best out of the inconsistent situation which your template improvements do. — Fytcha T | L | C 〉 11:53, 13 July 2022 (UTC)
I think it is good to add such term-specific grammatical info in a standardized, machine-interpretable way. It may need some effort to get this right. The example above for labelling schreiben indicates an optional direct object, but the indirect object is equally optional. Isn't in fact the general situation that these roles are optional? We may want to mark the less usual situation that they are semi-mandatory, as for the Turkish verb yemek (to eat) – you cannot ask, *yediniz mi? ; it has to be yemek yediniz mi? or bir şeyler yediniz mi?. A minor point is that in Germanic languages the indirect object (marked by a case, not in preposition form) tends to precede the direct object, so the preferable order is {{lb|de|dat<to whom something is being written>|acc<what is being written>}}.  --Lambiam 10:05, 13 July 2022 (UTC)
I think we can further this by talking about syntax in general. IMO {{+preo}}/{{+posto}} aren't bad, but how should I handle a noun like bitwa? I currently have the prepositions next to the headword to signify that those prepositions apply to all meanings, but I feel like that is a sub-optimal solution. Also, how can we handle things like certain conjunctions, such as чтобы and żeby? Finally, what about Slavic verbs that take accusative arguments except when negated, in which case the genitive is used? Would we just mark them as using the accusative, and the reader has to know to use the genitive otherwise? Vininn126 (talk) 10:58, 13 July 2022 (UTC)
I think there are a number of repeated language-specific issues with such awkward behaviours. We have French reflexives that take objects, copulas that are occasionally passivised (English 'be'!), and Slavic (+ at least some Uralic) accusative/genitive switching. Hebrew has some similar behaviour to the last. It's not clear to me that Latin deponent verbs can't be transitive, though I don't think they can be used in the passive. --RichardW57m (talk) 11:50, 13 July 2022 (UTC)
I'd like to chime in with a perspective from the Japanese language.
  • From what I've read, and recall from earlier in my own education, English grammar distinguishes between syntactically intransitive verbs, verbs that have no direct object in a given sentence, and syntactically transitive verbs, verbs that do have a direct object in a given sentence. Thus, I eat uses the verb eat intransitively, and I eat spaghetti uses the verb eat transitively with the object spaghetti.
Japanese grammar distinguishes between semantically intransitive verbs, called 自動詞 (jidōshi, literally self-acting + word), and semantically transitive verbs, called 他動詞 (tadōshi, literally other-acting + word). Verbs are classed as one or the other, regardless of the syntax of a given sentence. Thus, 私は食べる (watashi wa taberu, literally I [TOPIC] eat) uses the semantically transitive verb 食べる (taberu, to eat) without an explicitly stated object, and 私はラーメンを食べる (watashi wa rāmen o taberu, literally I [TOPIC] ramen [OBJECT] eat) uses the semantically transitive verb 食べる (taberu, to eat) with the explicitly stated object ラーメン (rāmen), and in both cases the verb taberu is classed as a 他動詞 (tadōshi, semantically transitive verb).
  • Occasionally, there are exceptions that can be puzzling to the language learner. One specific example is that semantically intransitive verbs can sometimes take a direct object marked with the object / accusative particle を (o), particularly verbs of motion or temporal persistence as used in expressions describing a time or place through which the action of the verb happens. This might be things like 私は道を歩く (watashi wa michi o aruku, literally I [TOPIC] road [OBJECT] walk). The verb 歩く (aruku, to walk) is classed as a 自動詞 (jidōshi, semantically intransitive verb), as the action of the verb does not inherently involve the agent acting upon something else, but rather the agent does something that affects the agent themselves, in a specifically non-reflexive fashion. The grammatical object in this sentence references the spatial context through which the action occurs, and does not indicate an object that is semantically required by the action of the verb.
  • Some non-mainstream grammatical analyses posit the existence of things like "dative subjects" and "nominative objects" in Japanese. What I've read of these so far boils down to apparent confusion introduced by analyzing Japanese grammar in specific constructions from the perspective of the idiomatic English translations of those constructions. As Japanese, and ignoring the colloquial English equivalents, these putative oddities evaporate. (For those interested, read through this thread at Japanese Stack Exchange.)
  • With regard to some of the discussion above, pretty much all Japanese verbs have a "passive" form, even intransitives, due in part to the additional use of the passive as a method of indirection, for purposes of increased politeness, and/or to express capability. Thus, we can say things like どこへ行かれますか (doko e ikaremasu ka, “where [DIRECTION] [go PASSIVE POLITE] [QUESTION]” → “where are you going? [POLITE / HONORIFIC]”), or いよいよ来られました (iyoiyo koraremashita, “finally [come PASSIVE POLITE PAST]” → “you finally came [POLITE / HONORIFIC]”).
In light of these aspects of how Japanese verbs operate, the presence of direct objects or passive forms cannot be used as any clear indicator of transitivity / intransitivity in Japanese. ‑‑ Eiríkr Útlendi │Tala við mig 23:11, 13 July 2022 (UTC)
@Fytcha Can you create some mocks of how you think things ought to look? I am not so good at user interfaces, having always done back-end software development and rarely front-end work. BTW I'm not opposed to stuffing all this info into the label before the definition if people think this is the best way; my preference is to put it after mostly because I think the definition itself is more important, and having a bunch of semantic labels + syntactic info all before the definition can make the definition get lost. And yes I don't think we need to solve every issue to create something far better than what we have today. For example, in a language where the concept of transitive vs. intransitive doesn't make sense, you can simply avoid those labels in favor of specifying the case explicitly. Benwing2 (talk) 03:25, 14 July 2022 (UTC)
@Benwing2: Unfortunately, I currently don't have the time to initiate some big change, but seeing that the transitive/intransitive distinction is more complicated than I had initially thought, I don't think I can stand behind my initial proposal anymore anyway. I think your proposed template that subsumes {{+preo}}, {{+posto}} and {{indtr}} is a big improvement, and I don't have a whole lot to suggest in terms of presentation apart from what I've already said (getting rid of the semicolon before the or). The presentation can always be changed retrospectively; what really counts is that we have the verbs' argument data present in wikicode in a consistent format. — Fytcha T | L | C 〉 22:22, 18 July 2022 (UTC)
A further thought, however:
What if we included something like a syntax header, or put that under Usage notes? Also, should these be collapsible? Vininn126 (talk) 22:27, 18 July 2022 (UTC)

Lowercase at definitions

Could you please make the templates {{alternative form of}} and {{diminutive of}} start with lowercase by default (as in definitions)? And could we also standardise a lowercase default for the similar templates in Category:Form-of templates? We keep having to add nocap=1 everywhere in definitions. I take it that in this dictionary, an initial capital letter is used when we write a word that is always written with a capital first letter (depending on the language). At present, e.g., the impression is that Alternatives or Diminutives are proper nouns in English. (Please keep in mind that en.wiktionary is in English but has an international audience: it is crucial to know how words are normally written.)
This is unlike etymologies, where we do need cap=1: there we have full text made up of sentences (starting with a capital and ending with a full stop).
Otherwise, we are left only with the template {{form of}}, which behaves normally (lowercase initial by default), but then we need to add categories with {{cln}}.
Is there a link to the house rules for commas, capitals and the like? Is there any assurance that the use of full stops and capitals will remain unchanged in the future? (Sometimes I use nodot=1 and nocap=1 provisionally, for fear that templates might be changed in the future.) Thank you. ‑‑Sarri.greek  I 17:18, 13 July 2022 (UTC)

An alternative approach that wouldn't burden those entering English alternative forms etc. would be simple templates that called the above templates and specified "nocap=1". This seems like a Grease Pit matter, or you could make such simple templates yourself, which might trigger some veteran template editor to make more definitive templates.
As to "house rules" (i.e. a style guide), you could propose one. Our practice is to have full definitions for English terms with initial capitals and a full stop. The usual practice for entries in other languages favors simple glosses with no initial caps and no full stop. DCDuring (talk) 18:08, 13 July 2022 (UTC)
Style guide: WT:STYLE. Equinox 18:19, 13 July 2022 (UTC)
O!! @DCDuring, Equinox, thank you. Now I see at Wiktionary:Style_guide#Patterns that en.wikt starts definitions with capitals, even short phrases like these! But why? Why in such short explanations? E.g. To walk like...: I might think the word is To instead of to. I might understand the initial capital if there were two sentences in a much longer text... I have asked the same at fr.wikt, where e.g. for fr:Sunday they write *Dimanche instead of dimanche etc... So confusing. I'm so sorry! That makes us (us = speakers of languages other than the host language) doubt every Wiktionary definition, especially in Wiktionaries of languages we do not understand: we never know whether the capital is part of the word or not... ‑‑Sarri.greek  I 18:58, 13 July 2022 (UTC)
So BTW, @DCDuring, Equinox, Sarri.greek: a while ago I proposed making all form-of templates generate an initial capital and final period when the language is English, and no initial capital or final period when the language isn't English. IMO this would solve the perennial issue of whether to make these templates capitalize and add a period or not, and this is consistent with what User:DCDuring said above about glosses vs. full definitions. Some people objected to this approach in the past, but I'd like to ask that it be reconsidered; the current situation with form-of templates is an utter mess. Benwing2 (talk) 03:19, 14 July 2022 (UTC)
Small note: @Benwing2, I do not understand "in English"? Everything here, in en.wikt, is in English. About initial capitals on words that are not normally capitalized: I call it "the curse of Wikipedia", where everything is lemmatized with a capital, regardless of how a word is written. :) ‑‑Sarri.greek  I 04:17, 14 July 2022 (UTC)
I don't think he said "in English", he said "is English", i.e. when the language of the term being defined is English (not the language of the definition). The reason there is an inconsistency between English and non-English languages is that definitions of English words are descriptive, whereas definitions of non-English words are typically just one-word translations. It is exceedingly rare for the capitalization practices here to create ambiguity. Just as is the case for capitalization at the beginning of sentences, you should generally assume that the capitalization at the beginning of definitions is grammatical/aesthetic and not actually part of the word. Andrew Sheedy (talk) 04:24, 14 July 2022 (UTC)
I would be in favour of that. The inconsistencies created by templates bother me. Andrew Sheedy (talk) 04:24, 14 July 2022 (UTC)
(edit conflict) @Sarri.greek I should have been clearer, what I meant by "the language is/isn't English" is that the language of the term is/isn't English. User:Andrew Sheedy explained it well. English-language terms like physics have full-sentence definitions:
# The branch of [[science]] concerned with the study of the properties and [[interaction]]s of [[space]], [[time]], [[matter]] and [[energy]].
So the corresponding non-lemma forms, alternative forms and the like would have their definitions formatted like full sentences as well. Greek φυσική (fysikí), Russian фи́зика (fízika) and similar terms in other languages have their definitions as simple glosses (e.g. just # [[physics]]), so the non-lemma forms, alternative forms, etc. would be formatted likewise. That would be much better than the current mess at Category:Form-of templates. All form-of templates would take params |nocap= and |nodot= that could be used in the definitions of English-language terms (and for non-English-language terms would either have no effect or throw an error), as well as a param |cap= that could be used in the definition of a non-English-language term if you really wanted it (and for an English-language term would either have no effect or throw an error). Benwing2 (talk) 04:30, 14 July 2022 (UTC)
Thank you, @Andrew Sheedy, Benwing2, for explaining. Still, the example above (# The branch of [[science]] ...) is a sentence. OK, I understand it as such. But
Diminutive of blah: I do not understand it as a sentence. It is just three words, no verb!
Never mind, though, if it is a custom of English dictionaries. Thanks. ‑‑Sarri.greek  I 04:44, 14 July 2022 (UTC)
It is indeed just a custom, which some dictionaries follow and others don't. We're not saying that the definitions are sentences, just that they're written in sentence style (which means, with the first word capitalized and a period/full stop at the end). Andrew Sheedy (talk) 04:51, 14 July 2022 (UTC)
Just to be clear, the definition of physics above is NOT a sentence, concerned being a past participle used to modify branch. But, it is formatted with initial caps and a full stop, in the same way that sentences are, as is customary here and in most print English dictionaries and many online ones. The only English dictionaries that I am familiar with that have full sentence definitions are the Collins COBUILD series.
Usually a definition of an English word is written to have the same grammatical function as the PoS header indicates, e.g. an adjective definition would consist of one or more adjectives or adjectival phrases. An advantage of that is that the definition can, however awkward-sounding, be substituted (for noun definitions, after trimming any initial determiner) for the defined word in any usage to test the correctness of the definition. Non-gloss definitions characterize the usage, but don't fill the syntactic role of the term they define. DCDuring (talk) 14:39, 14 July 2022 (UTC)
@Sarri.greek: We had a vote, Wiktionary:Votes/2020-01/Definitions of English terms should start with a capital and end with a full stop (which ended in no consensus), so perhaps a vote could also be held regarding the capitalization of alternative-form definitions. J3133 (talk) 09:16, 15 July 2022 (UTC)
Thank you for the link, @J3133. I now see it is customary for English dictionaries. It is not in Greek dictionaries, though, where the opposite style is usual, which is why I was so perplexed. ‑‑Sarri.greek  I 09:23, 15 July 2022 (UTC)
But cambridge.org and oxforddict here are some examples. ‑‑Sarri.greek  I 09:25, 15 July 2022 (UTC)
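A minimal sketch of the rule proposed in this thread for form-of templates: add an initial capital and a full stop when the term's language is English, leave both off otherwise, with |nocap= and |nodot= as English-only overrides and |cap= as a non-English-only override. The proposal leaves open whether a misplaced parameter should be ignored or rejected; this sketch assumes it raises an error. The function name is hypothetical, and Python is used only to illustrate the logic.

```python
def format_form_of(lang_code: str, text: str, *, cap: bool = False,
                   nocap: bool = False, nodot: bool = False) -> str:
    """Apply the proposed capital/full-stop rule to text like 'diminutive of blah'."""
    if lang_code == "en":
        # English terms: sentence style unless explicitly suppressed.
        if cap:
            raise ValueError("|cap= only applies to non-English terms")
        if not nocap:
            # Uppercase only the first character; str.capitalize() would lowercase the rest.
            text = text[:1].upper() + text[1:]
        if not nodot:
            text += "."
    else:
        # Non-English terms: gloss style unless a capital is explicitly requested.
        if nocap or nodot:
            raise ValueError("|nocap= and |nodot= only apply to English terms")
        if cap:
            text = text[:1].upper() + text[1:]
    return text
```

Under this rule, the English case would render as "Diminutive of blah." and a Greek term's form-of line as plain "diminutive of blah", so editors would no longer need to add nocap=1 by hand for non-English entries.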

Learned borrowing vs. borrowing templates?

Since there is now a dedicated {{lbor}} template in addition to plain {{bor}}, how are we planning to use these as per official policy, particularly for Latin and Ancient Greek borrowings into the modern descendants of these languages, or for borrowings from them into any language in general (since they are essentially dead languages that have had a lot of impact on technical, academic and scientific domains and on international vocabulary)? Wouldn't the vast majority of borrowings from Latin (whether into Romance languages or not) technically be "learned"? Do we need to go back and convert several thousand entries to this (using a bot)? Word dewd544 (talk) 19:58, 13 July 2022 (UTC)

They would and they do need to be converted. Thadh (talk) 20:11, 13 July 2022 (UTC)
Are borrowings from Latin into, say, New High German always learned or are there some edge cases (e.g. religious vocabulary) where natural borrowing may have occurred? — Fytcha T | L | C 〉 20:14, 13 July 2022 (UTC)
I'd also like to point out that a large number of these supposedly classical borrowings are actually {{internationalism}}s, typically coined in English, French or German. brittletheories (talk) 20:49, 13 July 2022 (UTC)
@Brittletheories: AFAICT, {{internationalism}} should only be used where it's not clear which language directly the term was borrowed from. Finnish hypoteettinen is marked as an internationalism (because it is not known through which language exactly the term entered Finnish; at least that's how it should be) whereas it is known for Romanian ipotetic, even though they have the same ultimate source. — Fytcha T | L | C 〉 20:59, 13 July 2022 (UTC)
That is our current approach. I do wonder if we should tag internationalisms in the future alongside an ultimate source, but that's a conversation for a different day. Vininn126 (talk) 21:08, 13 July 2022 (UTC)

Bot task: checking for uncreated German terms

Hello, I am looking to put my bot (KovachevBot) to work to do a task involving the dict.cc database, a German dictionary consisting of ~1.2 million terms. I'm looking to use Pywikibot to simply check whether each lemma in the dictionary exists on Wiktionary, and from that, to create a vocabulary list which I will publish here on Wiktionary so that others can see what words have yet to be added in German. I've managed to trim down the dictionary to only around 400,000 non-redundant words, so hopefully this would reduce the burden on the servers when connecting to each page to check for its existence. I just wanted to gather approval/dismissal as to whether this is a constructive idea and how I should go about doing it so as to not put overmuch load on Wiktionary.

I've published the code on GitHub so anyone who can read Python can have a look for bugs, etc., though it seems to be working fine on the small-scale test I did for ~30 terms. I intend to expand the script to group words by their part of speech for added convenience: they would appear under different headings for each POS on the generated dump page.

Thanks very much for any feedback, Kiril kovachev (talk) 20:35, 13 July 2022 (UTC)

@Kiril kovachev: You don't need a bot to do that. It's much faster to download the database dumps and to check whether the entries exist in there. Note, however, that we won't be able to add all entries from dict.cc here because they have laxer inclusion criteria than we do (this only concerns a minority of entries). You can post the finished lists to your user space. Under no circumstances, though, should entries be automatically generated from the dict.cc data. — Fytcha T | L | C 〉 20:45, 13 July 2022 (UTC)
Thanks, that sounds way easier, haha. I forgot those existed. And, of course, no auto-generation can happen, not least because dict.cc doesn't allow the data to be re-published, apparently not even excerpts. I don't know what this means for whether I'll be able to upload the finished list, but anyway, thanks a lot for the input! Kiril kovachev (talk) 20:47, 13 July 2022 (UTC)
@Kiril kovachev, Fytcha IANAL but I am fairly sure you can't copyright a list of words. User:BD2412 can probably say more. Benwing2 (talk) 03:16, 14 July 2022 (UTC)
It depends on whether the compiler exercised discretion in compiling the list. If they were to purport to list all words in the German language, then their list would just be a raw fact that anyone could permissibly compile. If they claim to exercise some judgment in determining which words to include, they have a stronger claim to copyrightability. When we previously created User:Brian0918/Hotlist (a list of missing words from other large English dictionaries), as I recall we ultimately generated a list of words common to two such dictionaries, so that it was not actually a list from any one of them. bd2412 T 04:27, 14 July 2022 (UTC)
  • The best thing is just to do it, and if there's a legal issue we can undo it later. Dunderdool (talk) 19:25, 14 July 2022 (UTC)
@Kiril kovachev: For similar tasks you could also use the scraped data from https://kaikki.org/dictionary/German/ . This might be even faster, but I am no expert. --MrBeef12 (talk) 08:56, 19 July 2022 (UTC)
@Kiril kovachev: You could also try @Erutuon's entry index, hosted on toolforge. I had a look at the dict.cc data; its licensing scheme is a bit weird. But as long as the data is not re-published, it seems fine. I think the most interesting use of this list could be to identify missing multiword entries. – Jberkel 18:11, 20 July 2022 (UTC)
Oh, that looks splendid; it'll save me a lot of trouble scanning through the colossal dump files :)
But, you're right, I do think the dict.cc license is a touch bizarre. I also don't get what the author means by "republishing the data", because that seems to include the words in the dictionary themselves, not just the translations... in that case, it wouldn't be possible to re-publish the generated list, or...? In any case I suppose I'll get in touch :')
Thanks for the great advice, Kiril kovachev (talk) 19:36, 20 July 2022 (UTC)
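For anyone trying the dump-based approach Fytcha suggests, here is a minimal Python sketch of the title check. It assumes a gzipped, one-title-per-line file such as the `all-titles-in-ns0` file published alongside the Wiktionary dumps (the exact file name is an assumption; check the dumps site), and that dump titles use underscores in place of spaces. The function names are hypothetical, not part of Pywikibot or any dump tooling.

```python
import gzip
from typing import Iterable, List, Set


def load_titles(path: str) -> Set[str]:
    """Load one page title per line from a gzipped titles file (assumed format)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return {line.rstrip("\n") for line in fh}


def missing_terms(candidates: Iterable[str], existing: Set[str]) -> List[str]:
    """Return candidates with no page of that exact title, in input order, deduplicated.

    Spaces are converted to underscores to match the assumed dump title format.
    Case is preserved, because page titles are case-sensitive.
    """
    missing, seen = [], set()
    for term in candidates:
        term = term.strip()
        title = term.replace(" ", "_")
        if term and title not in existing and title not in seen:
            seen.add(title)
            missing.append(term)
    return missing
```

Note that this only finds terms with no page at all: a page that exists may still lack a German section (e.g. it has only an English entry). Checking for the German section itself would need the full page-content dump or a per-language extract such as the kaikki.org data mentioned above.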

Philippine languages subfamilies, and descendant proto-languages of Proto-Philippine

Perhaps time to create subfamilies for the Philippine languages and some descendant proto-languages of Proto-Philippine. The existing category does look too crowded in its present state.

Proposed family tree for Philippine languages (and existing languages to be grouped therein):

  • Batanic
    • Yami (tao)
    • Ivatan (ivv)
    • Ibatan (ivb)
  • Northern Luzon
    • Ilocano (ilo)
    • Arta (atz)
    • Dicamay Agta (duy)
    • Isnag (isg)
    • Pamplona Atta (att)
    • Villa Viciosa Agta (dyg)
    • Ibanag (ibg)
    • Itawit (itv)
    • Yogad (yog)
    • Central Cagayan Agta (agt)
    • Gaddang (gad)
    • Ga'dang (gdg)
    • Dupaningan Agta (duo)
    • Dinapigue Agta (phi-din)
    • Casiguran Dumagat Agta (dgc)
    • Nagtipunan Agta (phi-nag)
    • Pahanan (apf)
    • Meso-Cordilleran
      • Northern Alta (aqn)
      • Southern Alta (agy)
      • Central-Southern Cordilleran
        • Isinai (inn)
        • Binongan Itneg (itb)
        • Limos Kalinga (kmk)
        • Tanudan Kalinga (kml)
        • Lubuagan Kalinga (knb)
        • Southern Kalinga (ksc)
        • Batad Ifugao (ifb)
        • Amganad Ifugao (ifa)
        • Mayoyao Ifugao (ifu)
        • Tuwali Ifugao (ifk)
        • Balangao (blw)
        • Central Bontoc (lbk)
        • Eastern Bontoc (ebk)
        • Northern Bontoc (rbk)
        • Southern Bontoc (obk)
        • Southwestern Bontoc (vbk)
        • Kankanaey (kne)
        • Southern Kankanay (xnn)
        • Ilongot (ilk)
        • Pangasinan (pag)
        • Ibaloi (ibl)
        • Karao (kyj)
  • Central Luzon
    • Kapampangan (pam)
    • Abenlen Ayta (abp)
    • Ambala Ayta (abc)
    • Bolinao (smk)
    • Botolan Sambal (sbl)
    • Mag-Anchi Ayta (sgb)
    • Mag-Indi Ayta (blx)
    • Sambali (xsb)
    • Remontado Agta (agv)
  • Northern Mindoro
    • Alangan (alj)
    • Iraya (iry)
    • Tadyawan (tdy)
  • Greater Central Philippine
    • Central Philippine
      • Tagalog (tl)
      • Bikol
        • Bikol Central (bcl)
        • Southern Catanduanes Bicolano (bln)
        • Buhi'non Bikol (ubl)
        • Libon Bikol (lbl)
        • Miraya Bikol (rbl)
        • Iriga Bicolano (bto)
        • Northern Catanduanes Bicolano (cts)
      • Bisayan
        • Tausug (tsg)
        • Butuanon (btw)
        • Surigaonon (sgd)
        • Tandaganon (tgn)
        • Cebuano (ceb)
        • Waray-Waray (war)
        • Waray Sorsogon (srv)
        • Hiligaynon (hil)
        • Capiznon (cps)
        • Bantayanon (bfx)
        • Porohanon (prh)
        • Masbatenyo (msb)
        • Masbate Sorsogon (bks)
        • Romblomanon (rol)
        • Asi (bno)
        • Aklanon (akl)
        • Kinaray-a (krj)
        • Inonhan (loc)
        • Ratagnon (btn)
        • Cuyonon (cyo)
        • Caluyanun (clu)
      • Mansakan
        • Davawenyo (daw)
        • Mansaka (msk)
        • Kalagan (kqe)
        • Tagakaulu Kalagan (klg)
        • Mamanwa (mmn)
    • Southern Mangyan
      • Buhid (bku)
      • Western Tawbuid (bnj)
      • Eastern Tawbuid (twb)
      • Hanunoo (hnn)
    • Palawanic
      • Brooke's Point Palawano (plw)
      • Central Palawano (plc)
      • Southwest Palawano (plv)
      • Central Tagbanwa (tgt)
      • Palawan Batak (bya)
    • Subanen
      • Western Subanen (suc)
      • Central Subanen (syb)
      • Northern Subanen (stb)
    • Danao
      • Iranun (ill)
      • Maguindanao (mdh)
      • Maranao (mrw)
    • Manobo (mno)
      • several languages already correctly assigned to mno
    • Gorontalo-Mongondow
      • Bolango
      • Buol
      • Gorontalo
      • Suwawa
      • Mongondow
      • Ponosakan
  • Ati
  • Kalamian (phi-kal)
    • several languages already correctly assigned to phi-kal
  • Southern Mindanao
    • Koronadal Blaan
    • Sarangani Blaan
    • Tboli
    • Tiruray
  • Sangiric
    • Talaud
    • Sangir
    • Ratahan
  • Minahasan
    • Tontemboan (tnt)
    • Tombulu (tom)
    • Tonsea (txs)
    • Tonsawang (tnw)
    • Tondano (tdn)
  • Umiray Dumaget Agta

Also requesting the addition of some descendant proto-languages of Proto-Philippine.

  • Proto-Northern Luzon
  • Proto-Central Philippine
    • Proto-Bisayan
  • Proto-Sangiric
  • Proto-Minahasan

—⁠This unsigned comment was added by TagaSanPedroAko (talk • contribs) at 06:10, 14 July 2022 (UTC).

  • Support, although I think we don't have to go into detail with the internal branching of the Bisayan languages (and maybe also of the Northern Luzon and Central Luzon languages).
@TagaSanPedroAko Is there a reason why you have left the languages of Mindoro unassigned? The longstanding working hypothesis is that they can be assigned to the Northern Mindoro and Southern Mindoro (nested in Greater Central Philippines) subgroups. Also, it would be of great help to anyone willing to execute this request if you added the language codes. –Austronesier (talk) 16:42, 14 July 2022 (UTC)
@Austronesier: I simplified the proposed family tree as noted, and also grouped the Mindoro languages accordingly. -TagaSanPedroAko (talk) 20:23, 14 July 2022 (UTC)
@Wiktionarian89 What do you think about this proposal? –Austronesier (talk) 15:40, 18 July 2022 (UTC)
This seems like a great idea. Plus, Bantik (bnq) should also be assigned to this group, specifically to the Sangiric subgroup. Wiktionarian89 (talk) 13:40, 20 July 2022 (UTC)

Standardise the choice of the lemma and alternative forms of Cantonese, and allow Jyutping titles as lemmas if there are no other alternatives?

Background (you may skip this if you're familiar enough with Cantonese): There are some Cantonese words without corresponding character(s) in standard written Chinese. For these words, speakers use a character with the same or a similar sound (e.g. ), use a character with the same or a similar etymology (e.g. ), create new characters that encode both the sound and the meaning of the word (e.g. 𨋢), or, among Hong Kong Cantonese speakers, use a non-standard romanisation that is often based on English phonology, e.g. hea, gur. These methods work fine for monosyllabic words, where we lemmatize the most common form, although there can be many alternative forms for the reasons mentioned (cf. hea). For multisyllabic words this becomes problematic: various combinations of alternative forms appear sporadically, which makes it difficult to determine the most common form. Some dictionaries try to assign obscure characters that share a similar etymology (see yue:虢礫緙嘞 and yue:犖确, for example), but this method is rather arbitrary and creates the same problems with alternative forms. Some dictionaries use dual titles, e.g. words.hk: 啹/gur.


  1. If there is an indisputable form, use that as the lemma form.
  2. If the forms have distinct pronunciations but share the same/similar meaning(s), the lemma form should be considered separately. (e.g. 尋日 and 琴日)
  3. If there is enough written usage to justify a particular form as the most common written form, that form should be used as the lemma (e.g. hea, ), even in cases where it may not fit the word's etymology.
  4. If there is enough written usage but there are multiple written forms sharing similar usage, the lemma form should prefer etymology (among the forms found in usage) if possible, otherwise on a first-come-first-served basis.
  5. If a word does not have enough written usage to establish the most common written form, the lemma form should be determined via etymology, provided that this form has actual uses, not only mentions, and that its etymology is logical and actually makes sense. (e.g. 欷欷歔歔)
  6. If the lemma form of a word could not be determined via the rules above, then Jyutping titles will be used as a last resort, where the Jyutping will be determined based on the most common pronunciation. (e.g. the word with the pseudo-etymological form 虢礫緙嘞 would be at gwik1 lik1 kaak1 laak1) Note: This only involves around 20-30 entries by my estimation, most of which are onomatopoeia and Kra-Dai loans.
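Rules 1 and 3 to 6 read as a priority-ordered decision procedure, so here is a rough Python sketch of how they could be applied to a single word. It is an illustration only: the `WrittenForm` fields and the `min_uses` and `dominance` thresholds are hypothetical stand-ins, since the proposal leaves "enough written usage" and "most common" to editors' judgement, and rule 2 (distinct pronunciations) is assumed to have been applied already by treating such words separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WrittenForm:
    text: str           # the written form
    uses: int           # attested uses found (mentions excluded)
    etymological: bool  # editor's judgement that the etymology is logical
    added_order: int    # order in which the form was first recorded (lower = earlier)


def choose_lemma(forms: List[WrittenForm], jyutping: str,
                 min_uses: int = 3, dominance: float = 2.0) -> str:
    """Hypothetical sketch of proposed rules 1 and 3-6, tried in order."""
    used = sorted((f for f in forms if f.uses > 0),
                  key=lambda f: f.uses, reverse=True)
    if used and used[0].uses >= min_uses:
        top = used[0]
        # Rules 1 and 3: one form is clearly the most common written form.
        if len(used) == 1 or top.uses >= dominance * used[1].uses:
            return top.text
        # Rule 4: several forms with similar usage; prefer etymological
        # forms, otherwise first come, first served.
        similar = [f for f in used if f.uses * dominance >= top.uses]
        pool = [f for f in similar if f.etymological] or similar
        return min(pool, key=lambda f: f.added_order).text
    # Rule 5: too little usage; take an etymologically sound form that has
    # at least one actual use (list is already sorted by uses).
    etym_used = [f for f in used if f.etymological]
    if etym_used:
        return etym_used[0].text
    # Rule 6: last resort, the Jyutping of the most common pronunciation.
    return jyutping
```

For a word with no usable written form at all, such as the example in rule 6, `choose_lemma([], "gwik1 lik1 kaak1 laak1")` falls through to the Jyutping title.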

Rationale: By lemmatizing words under a particular form, we imply (or mislead users into thinking) that it is the correct form of the word, especially for words in the final category. Note that almost all Google results for 虢礫緙嘞 are mentions rather than uses. (e.g. although is etymologically more correct, the non-verb senses of are almost always written as in Hong Kong Cantonese, and therefore should move to to reflect usage)

This will involve changing quite a number of pages, but I believe it is worth the effort to do so, considering the benefit it brings. This set of principles can also be applied similarly to other Chinese languages in the future if necessary.

See also: this vote, where monosyllabic Jyutping entries were allowed as non-lemmas.

What are your thoughts on this? (Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly): -- Wpi31 (talk) 05:43, 15 July 2022 (UTC)

@Wpi31: Thanks for drafting the proposal. I think I generally agree with this, and it seems to capture what most editors have been subconsciously following. I do have some reservations about certain details, such as lemmatizing at the most common form when the etymological character is apparent. Sometimes this might cause issues where we're using different forms for cognates outside of Yue (or even within Yue), e.g. instead of . I know this would give the impression of implying a correct form, but it is probably inevitable unless we want duplication of information. It is probably better to write usage notes giving details on which form is more common in which variety. Jyutping should really be a last resort where there are no attested characters (Chinese or Latin). Kwik1 lik1 kwaak1 laak1 (and its variants) do have written forms in dictionaries, and so I would tend towards following what other dictionaries have instead of resorting to Jyutping, even if the characters are rather obscure. — justin(r)leung (t...) | c=› } 06:00, 15 July 2022 (UTC)
@justinrleung: Regarding the first issue you mentioned, personally I feel it's better to treat such cognates separately, as in vs . I'm also fine with putting everything in the same entry, but more care should be taken. In many cases readers would not go to the usage notes section: e.g. if a non-native speaker visits to see what it means in Cantonese, they will see the {{zh-alt form|惜}} stating means "to kiss", which fits their context, and simply assume that 惜 is the more correct one without even going to the page, let alone reading the usage notes. Perhaps we could modify {{zh-see}} and {{zh-alt form}} to allow mentioning something along the lines of "𡃶 is an alternative form of 惜, but 𡃶 is a more common form, and 錫 is the most common form", and maybe add a {{zh-common form}} that mentions the common forms in a format similar to {{zh-syn}} (so that it doesn't involve scrolling down to usage notes)?
On the second part, I think Jyutping should also be used when the etymology does not make sense, as in the case of gwik1 lik1 kaak1 laak1. Usually when characters are created or coined for related sounds used in the same word, they share the same radicals/components, as in 徘徊, 蝴蝶, 葡萄, which is why I don't feel like using or its variants, since it tries to approximate the pronunciation but ignores other things almost completely. The characters are also too obscure and arbitrary for approximating sounds: if these characters were only chosen to represent their sounds, they should be common enough for the general public to know their pronunciation, and one might as well have chosen the less accurate 棘力卡啦 instead. (Also compare the straightforward 直不甩.) Wpi31 (talk) 07:38, 15 July 2022 (UTC)
@Wpi31: For the first issue, I think the same issues would kind of come up with other kinds of variants, like simplified Chinese or variant traditional forms (common ones include and ). Alternative forms are usually put in {{zh-forms}} if they apply to all definitions, but I would usually use {{zh-alt-inline}} for variants that are particular to certain senses.
For the second issue, I think the obscurity/arbitrariness of characters is kind of subjective. Unattested forms would be ruled out by WT:ATTEST. (Since Cantonese is an LDL, we would usually only require one attestation [mention or use] from a reliable source.) — justin(r)leung (t...) | c=› } 08:09, 15 July 2022 (UTC)

Romance Etymology headers

The etymology of Romagnol bo says the word is from Latin bōs (cow), but there are actually several stages from Latin onward, with as many phonetic changes: bo < ʙŏᴠᴇ < bōs (the intermediate form doesn't undergo metaphony because of the final -e). Are all these changes and phonetic rules allowed on Wiktionary, and are the intermediate forms required to be in ꜱᴍᴀʟʟ ᴄᴀᴘꜱ characters? BandiniRaffaele2 (talk) 19:10, 15 July 2022 (UTC)

It probably went through Vulgar Latin *boem < Latin bōvem; compare Friulian bo. I have not seen small caps used in etymologies on the English Wiktionary. I don’t think the phonetic rules involved in the development need to be spelled out, unless something non-obvious is going on. We even give the etymology of Middle Persian gwl as “[f]rom Old Persian *vr̥da-” without any further explanation of the sound laws involved – in this case a quite dramatic but entirely regular change.  --Lambiam 20:23, 15 July 2022 (UTC)
Of course, Romagnol bo descends from Latin bovem and not its nominative form bōs; it's not like bōs > bove, magically. So should the etymology be 'From Latin bovem'? That is surely more accurate, but it would require users to click twice to get to the Latin lemma entry, though presumably they are already familiar with navigating non-lemma entries.
Should the etymology then be 'From Latin bovem, accusative singular form of bōs'? This lets users get to the lemma entry immediately without misleading them into thinking the term descends from the nominative form, but it seems unnecessarily long.
Should it be 'From Latin bovem', but with the link actually taking you to the bōs page? The page bovem exists, and I'd feel betrayed and deceived clicking on a link that takes me to a different page than the one it promised.
A hybrid like bovem, bōs?
This, I believe, is the same problem as with verbs, which was discussed some time ago (Wiktionary:Beer parlour/2022/February#Changing the default citation form for Latin verbs), where the Latin lemma form does not coincide with the Romance lemma form. In the end the discussion died without any agreement, and since there is no agreement, verb entries across Wiktionary follow different practices (see capire, dolere, cenar, mantenere). Catonif (talk) 18:29, 18 July 2022 (UTC)
We did have another thread about it here. Perhaps it is time to formally discuss the 'link form A, but display form B' strategy on the Beer Parlour. A bit odd, granted, but it is my favourite of the options that remain after eliminating the apparently too radical idea of changing the Latin lemmas.
I have, incidentally, used 'hybrids' in cases where a Proto-Italic entry is involved. For instance, the etymology for Spanish hender:
'From Old Spanish fender, from Latin findere, findō, from Proto-Italic *findō, from Proto-Indo-European *bʰeyd- (to split).'
Nicodene (talk) 20:19, 18 July 2022 (UTC)
Oh, sorry. I didn't know that behind the scenes this was still being discussed. Catonif (talk) 20:57, 18 July 2022 (UTC)

Movement Strategy and Governance News – Issue 7

Movement Strategy and Governance News
Issue 7, July-September 2022. Read the full newsletter

Welcome to the 7th issue of Movement Strategy and Governance News! The newsletter distributes relevant news and events about the implementation of Wikimedia's Movement Strategy recommendations, other relevant topics regarding Movement governance, as well as different projects and activities supported by the Movement Strategy and Governance (MSG) team of the Wikimedia Foundation.

The MSG Newsletter is delivered quarterly, while the more frequent Movement Strategy Weekly will be delivered weekly. Please remember to subscribe here if you would like to receive future issues of this newsletter.

  • Movement sustainability: Wikimedia Foundation's annual sustainability report has been published. (continue reading)
  • Improving user experience: recent improvements on the desktop interface for Wikimedia projects. (continue reading)
  • Safety and inclusion: updates on the revision process of the Universal Code of Conduct Enforcement Guidelines. (continue reading)
  • Equity in decisionmaking: reports from Hubs pilots conversations, recent progress from the Movement Charter Drafting Committee, and a new white paper for futures of participation in the Wikimedia movement. (continue reading)
  • Stakeholders coordination: launch of a helpdesk for Affiliates and volunteer communities working on content partnership. (continue reading)
  • Leadership development: updates on leadership projects by Wikimedia movement organizers in Brazil and Cape Verde. (continue reading)
  • Internal knowledge management: launch of a new portal for technical documentation and community resources. (continue reading)
  • Innovate in free knowledge: high-quality audiovisual resources for scientific experiments and a new toolkit to record oral transcripts. (continue reading)
  • Evaluate, iterate, and adapt: results from the Equity Landscape project pilot (continue reading)
  • Other news and updates: a new forum to discuss Movement Strategy implementation, upcoming Wikimedia Foundation Board of Trustees election, a new podcast to discuss Movement Strategy, and change of personnel for the Foundation's Movement Strategy and Governance team. (continue reading)

Mervat (WMF) (talk) 12:58, 15 July 2022 (UTC)

"Deleter" user group?

What do people think about creating a "deleter" user group that has the ability to delete pages? Currently you have to be an admin to be able to delete pages, but I'd like to be able to delete pages from my non-admin account as I try to do all my editing from there. I find that the main reason I have to switch to my admin account is to be able to delete pages. Other admin-only actions (e.g. changing page protection) occur more rarely. I also imagine it may be useful to be able to grant the deletion right to certain non-admins, similarly to how we have a "template editor" user group that gives the ability to edit many otherwise-protected templates and modules. Benwing2 (talk) 00:04, 18 July 2022 (UTC)

I think this has come up before. My main question is whether this comes with the ability to undelete and possibly also see deleted revisions. In your case it doesn't matter, since you're basically an admin using a non-admin account. For others, though, that might be more problematic. @Svartava got into trouble because they were using their extended mover privileges to delete things (I'm not sure exactly how), but that was a unique combination of factors that may not apply to anyone else who might be granted the role. Chuck Entz (talk) 00:27, 18 July 2022 (UTC)
@Chuck Entz Deleting, undeleting and seeing deleted revisions are all different user rights. There are actually a whole host of user rights related to deletion, at least the following:
Delete tags from the database (deletechangetags)
Delete and undelete specific log entries (deletelogentry)
Delete and undelete specific revisions of pages (deleterevision)
Delete pages (delete)
Mass delete pages (nuke)
Search deleted pages (browsearchive)
Undelete a page (undelete)
View deleted history entries, without their associated text (deletedhistory)
View deleted text and changes between deleted revisions (deletedtext)
So potentially we could create a "deleter" user group without the ability to undelete or view deleted entries. (I would be fine with this as I don't have occasion to undelete pages very often, and I'm not sure if I've ever found the need to view deleted revisions.) Benwing2 (talk) 01:39, 18 July 2022 (UTC)
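To make the point that these are independent rights concrete, here is a small Python model of group permissions. On a real wiki this is PHP configuration (`$wgGroupPermissions[group][right]`), not Python; the group names `deleter-minimal` and `deleter-extended` are hypothetical, the first matching the delete-only variant above and the second the set Svārtava suggests in the next comment.

```python
# Deletion-related MediaWiki right names listed in the thread.
DELETION_RIGHTS = frozenset({
    "deletechangetags", "deletelogentry", "deleterevision", "delete",
    "nuke", "browsearchive", "undelete", "deletedhistory", "deletedtext",
})

# Hypothetical groups, shaped like $wgGroupPermissions[group][right] = true.
GROUP_PERMISSIONS = {
    # Delete only: no undeleting, no viewing of deleted content.
    "deleter-minimal": {"delete": True},
    # The broader set suggested from Special:ListGroupRights.
    "deleter-extended": {
        right: True
        for right in ("delete", "undelete", "deletedtext", "deletedhistory",
                      "mergehistory", "suppressredirect")
    },
}


def granted(group: str) -> set:
    """Rights explicitly set to True for a group (unknown groups get none)."""
    return {r for r, ok in GROUP_PERMISSIONS.get(group, {}).items() if ok}


def can(group: str, right: str) -> bool:
    return right in granted(group)


def withheld_deletion_rights(group: str) -> list:
    """Deletion-related rights the group does NOT get, sorted alphabetically."""
    return sorted(DELETION_RIGHTS - granted(group))
```

Under this model, `withheld_deletion_rights("deleter-extended")` lists `browsearchive`, `deletechangetags`, `deletelogentry`, `deleterevision` and `nuke`, i.e. the revision-hiding and mass-deletion rights stay admin-only.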
Thank you @Benwing2 for pointing out these rights. (Imetsia might want to comment.) I strongly support adding a group (deleter/eliminator/closer) with deleting and undeleting rights, with or without revision (un)hiding rights (no strong opinion on that). I think the most sensible set is delete, undelete, deletedtext, deletedhistory, mergehistory and suppressredirect from Special:ListGroupRights. See also d:Q10862160 for the group's existence or proposal elsewhere: it exists on fawiki, jawiki, hiwiki (no longer), ptwiki, ruwiki, urwiki, viwiki and viwikibooks, and on the other wikis listed there it has been proposed. —Svārtava (talk) • 12:52, 18 July 2022 (UTC)
"I'm not sure exactly how": maybe by moving to another namespace and suppressing the redirect? 00:37, 18 July 2022 (UTC)
@Chuck Entz, about a year and a half ago, when I first got extended mover rights, I was sort of a newbie with only 6 months of editing. I didn't really know the rules or how things work (nor should I have been granted that right out of process in the first place) and impulsively moved a bunch of pages tagged for speedy deletion to uncreated Hindi/Sanskrit pages and changed their content accordingly, which was not what the move feature is for; secondly, it messed up the page history. —Svārtava (talk) • 12:52, 18 July 2022 (UTC)
Wiktionary:Votes/2021-12/Deleter role. Fish bowl (talk) 01:57, 18 July 2022 (UTC)
The bar is, and should be, quite low to be an admin. It is also not difficult to request a page be deleted if one is not an admin. This seems like adding complications without solving any existing problem. If someone wants to be able to delete stuff, request to be an admin. - TheDaveRoss 14:43, 18 July 2022 (UTC)
I very much favor this idea; I created the last vote about it, after all! There are many benefits that come with it, as I've expounded on previously. The two main policy obstacles others raised last time were:
  1. What should the nomination process be?
    It should ideally be robust enough to prevent an unqualified person from becoming a deleter, but not so restrictive that it becomes barely any different from an admin-scale vote. I favor a process by which one admin nominates, and then two other admins have to approve the nomination. Similar to what we do at WT:Whitelist, but a bit more rigorous.
  2. What permissions does the role include, paying particular attention to undeletion, viewing hidden revisions, and unhiding revisions?
    I would favor all the permissions. According to my vision, the role would basically be an admin role but without the blocking power. There are several users for whom such a role would be appropriate. Imetsia (talk) 18:26, 18 July 2022 (UTC)
@TheDaveRoss I wonder why you think the bar should be low to be an admin; admins can really fuck up the site if they want, so I think adminship shouldn't be given out willy-nilly. Also I definitely believe in separating admin and non-admin accounts, similarly to how you wouldn't normally do all your work in Unix as root; it's too easy to mess something up accidentally. Benwing2 (talk) 02:00, 19 July 2022 (UTC)
@Benwing2: How can admins mess the site up? None of the actions an admin can take is irreversible, and most are easily reversed. The most damaging things (aside from what people with database access can do) can be done by people not even logged in: they can still run a bot, and if they hop around IPs it could be very challenging to find all of the edits they make, especially if it was something like deleting the Russian section from every page where it exists because they don't like Putin. The fact is, the more admins we have, the less damage admins can do, because there are more people around to see and stop the problem, and then fix it. I agree that we shouldn't give the rights out to anyone who asks, but anyone who sticks around for a while and makes good-faith, quality edits can block/delete in my book. It should also be easier to take the tools away if someone abuses them, or acts like a jerk. It also makes it less "special" to be an admin if the majority of regular contributors are admins, and it shouldn't be special to be an admin. Everyone who contributes should have an equal voice in the project; if it feels like there is a cabal who have all of the "power", that doesn't foster a democratic community. - TheDaveRoss 12:38, 19 July 2022 (UTC)
@TheDaveRoss: I think you are too sanguine about giving out powerful tools. (Let's give everyone a gun so that there are always guns around to stop any potential mass shooter. Right?!) You can protect pages against non-admins, you can block accounts if necessary, etc., but none of these things work against admins. It is true that someone who was truly malicious could theoretically, e.g., use a bot and attack the site using lots of accounts or IPs, but people of that nature are thankfully rare; most of the worst damage comes from people who are misguided and stubborn, and think they are doing the right thing when in fact they aren't. I'd call this sort of breakage semi-vandalism, and it's much harder in practice to correct than true vandalism, because true vandalism is obvious and gets identified and reverted immediately, while semi-vandalism may hang around for a while before being noticed, and in the meantime others may make legitimate changes on top of it. Benwing2 (talk) 05:57, 20 July 2022 (UTC)
One personal tip on vandalism is to sneak some in amidst hundreds of good entries (another one is to keep on pestering the decent users to create a toxic environment so they don't want to return [not easy]). When I went admin rogue (five times), I didn't last long on my vandalism spree, and it couldn't have taken more than a few minutes to undo anything. To be fair though, if I had used a bot and been more meticulous it could have been much more devastating. But a tiny drop in the ocean is the worst we can expect; nobody's going to be able to destroy the website! Dunderdool (talk) 10:11, 20 July 2022 (UTC)
@Dunderdool, TheDaveRoss Just FYI, Wonderfool makes it sound like the only issues have been the occasional times he "went rogue" and started vandalizing, but in fact I've spent a great deal of time and effort cleaning up Wonderfool's mistakes. User:Equinox knows what I'm talking about. (The same goes, I should add, for User:SemperBlotto and his and his bot's mistakes, along with certain other editors who continue to make sloppy edits even after repeatedly being warned to be more careful. I may need to start blocking these editors to get their attention ...) This is a good illustration of "semi-vandalism"; the mistakes are mixed in with good commits so it's very hard to find and fix them. Wonderfool makes huge batches of "drive-by" edits and has never shown much inclination to fix his own mistakes. Benwing2 (talk) 22:43, 24 July 2022 (UTC)
Most of the mistakes eventually get fixed. Dunderdool (talk) 22:59, 24 July 2022 (UTC)
@Dunderdool Exactly. Someone else will always come along to fix your shit, so who cares? Benwing2 (talk) 01:20, 25 July 2022 (UTC)
Ben is getting the idea that there might be a difference between occasional mistakes and deliberate sabotage. Look, when you are eight fucking colons indented, just give it up. He rejoices in wasting your time. Equinox 01:26, 25 July 2022 (UTC)
What does that have to do with a deleter role? The damage he does as an admin is obvious and quickly reverted; the damage he does as a drunk editor is the perfidious stuff that sticks around for years until someone spots it. Neither situation would change if a deleter role were created. - TheDaveRoss 12:29, 25 July 2022 (UTC)
@Benwing2: Your gun analogy is flawed, since the consequences of misusing a gun are extremely high, whereas if someone misuses admin buttons we can take them away and undo everything they did with them without much effort. The tools are just not that powerful, and they are not abused particularly frequently. It is probably too hard to take the tools away when someone isn't playing nice, but I would advocate making it easier to take them away rather than keeping them out of the hands of people who might use them productively. - TheDaveRoss 12:16, 20 July 2022 (UTC)
@Surjection, because you were one of the lead opponents last time, how would you respond to the two questions I posed? Is there any form of the deleter role that you would vote in favor of? Imetsia (talk) 17:55, 24 July 2022 (UTC)
No. The points I raised earlier still stand today. If we can trust someone to (un)delete pages, we should be able to trust them with the other sysop tools too. — SURJECTION / T / C / L / 18:15, 24 July 2022 (UTC)
My thought is: it is not necessary. People who need this can already do it. Unlike some people who may be more diplomatic (or afraid), I don't mind saying that we don't need to argue back and forth with Wonderfool (as above), nor to provide tools to people like "Svartava". Equinox 01:15, 25 July 2022 (UTC)
I voted against this proposal in the last vote, and my views haven’t changed. — Sgconlaw (talk) 04:40, 25 July 2022 (UTC)

Italian fully accented forms

Why don't we use the head template in Italian entries to display the fully accented form? An older discussion from 2017 settled on hyphenation or IPA, since they already show the stress, but that's hardly a reason. Non-mandatory accents (that is, those that are not word-final) are sometimes written to avoid homography (e.g. sùbito, princìpi, pèsca); they are not just something to put in the pronunciation section. Moreover, we already do it for verbs (only on the lemma entry, though), so let's be consistent about this. Catonif (talk) 20:41, 18 July 2022 (UTC)

@Catonif, Imetsia, Sartma I was the one who added the accent marks to Italian verbs. It's true that we routinely add accent marks to some languages to indicate stress (various Slavic languages, for example), and macrons to some languages to indicate length (Old English, Latin, Ancient Greek, ...). The trickiness in Italian is (a) that accents are in fact standardly written when word-final, but not otherwise, and adding them everywhere might be confusing; (b) some monosyllabic words are standardly written with an accent, some without it. In languages with added diacritics, we normally provide for automatically stripping them out when generating links to terms, which would be logical here except for (b), which makes it hard to implement. Interested in hearing what native Italian speakers think. Benwing2 (talk) 01:57, 19 July 2022 (UTC)
The written accent can be:
  • mandatory: word-finally and on certain monosyllables; it is already reflected in the standard spelling and cannot be removed (e.g. è, così, dovrà)
  • non-mandatory: rarely written, though often used to avoid homography or mispronunciation (e.g. còrso, nartèce). This is also the kind of accent you've added in verbs. The general rule is that you could even write uòvo, in case for some reason you don't want anyone to pronounce it ùovo or uóvo (e.g. Treccani uses uòvo).
    • useless: in plain disyllabic words where the accented vowel is à, ì or ù (e.g. càpo, tìpo, pùpo). Treccani doesn't use it, but it's what happens with verbs (see stìmo at stimare), and I'd keep it for consistency.
  • forbidden: in monosyllables where the accent isn't mandatory, it is forbidden (e.g. *é (and), *à (to), * (it has), * (I know), *ài (to the)). The head template should display these terms without accents.
Side note: the archaic conjugations of avere (ò, ài, à, ànno) are alternative forms with a mandatory accent.
TL;DR: let's do the same as Treccani does, except that we mark the accent even when it's 'useless'. Catonif (talk) 08:31, 19 July 2022 (UTC)
It should be noted that, while marking stress in words such as càpo/tìpo/pùpo may be useless to anyone at all familiar with Italian, it is not so to anyone unfamiliar with it. Nicodene (talk) 08:50, 19 July 2022 (UTC)
@Benwing2, @Catonif, @Nicodene: As long as we have the IPA transcription(s), I would only write the accent when it's mandatory. Wiktionary shouldn't be prescriptive, so we should never indicate accents on E's and O's (there are native speakers that only have 5 vowels, and even native speakers who have 7 don't all agree on whether accented E's and O's are open or closed in a given word). E.g. I say "pésca" for both (prescribed) "pésca" (fishing) and "pèsca" (peach), "dòccia" for prescribed "dóccia", etc. All Italian dictionaries are prescriptive when it comes to pronunciation, which is in clear contradiction with WT:NPOV, so I would avoid copying what they are doing. Sartma (talk) 00:15, 20 July 2022 (UTC)
That's not how it works. The dictionaries are in fact descriptive when it comes to the standard/traditional pronunciation. By your reasoning, we should remove practically all of our pronunciations, including for languages other than Italian, so long as there exist regional varieties to which they do not apply. Not gonna happen. I suggest simply adding additional regional pronunciations if you want representation. Nicodene (talk) 05:00, 20 July 2022 (UTC)
@Nicodene: No, not for Italian. There's no Italian native speaker who coherently pronounces words as shown in Italian dictionaries (unless they took pronunciation classes). Italian dictionaries are not descriptive when it comes to pronunciation, they're prescriptive. That's also the main reason why we don't write accents on every word that's not accented on the penultimate, like they do in Spanish. There are no native speakers of "standard Italian" in Italy, that's a constructed language we learn in school, and while most might manage to get to a good level of written standard Italian, on the pronunciation front no Italian native speaker speaks like the dictionary. Sartma (talk) 06:31, 20 July 2022 (UTC)
I do not have it in me to take this seriously. Want to tear down the standard Italian pronunciations on Wiktionary? Have at it. Sounds like a permaban speedrun.
And no, mid-vowel disagreements are not in the slightest the reason why you 'don't write accents on every word that's not accented on the penultimate'. The assumption that the lack of a consistent written distinction must need some special explanation like that is baseless, considering that it is the norm, not an exception, for a writing system not to indicate all of a language's phonemic information. Humans do not need comprehensive training wheels to read their own language, and they rarely care enough to consistently accommodate foreign learners. For Italian in particular, the lack of a Latin precedent for directly marking vowel quality is also relevant. Nicodene (talk) 07:25, 20 July 2022 (UTC)
@Nicodene: "Want to tear down the standard Italian pronunciations on Wiktionary? Have at it."
That's really not what I said. Talk about straw man! Please read my comments again.
I believe pronunciation issues belong in the pronunciation section. I didn't say we should "tear down standard Italian pronunciation on Wiktionary". I'm just saying that I don't think we should add non-standard accents on Italian headwords, and explained that pronunciations in Italian dictionaries are indeed prescriptive, which is an undisputed fact. That's all. It's such an obvious thing to anyone who has studied Italian linguistics (it's one of the first things you learn at university) that to be honest I'm not sure why you're reacting so badly to it...
About not writing accents in all words that aren't accented on the penultimate: you talk about the lack of a Latin precedent as being relevant, but the same is true for Spanish, and they indicate all accents anyway, so I'm not sure about that line of reasoning. I don't think anybody was ever afraid of accents and other diacritic signs in Romance Europe; you just need to have a look at older documents, which have all kinds of them, and pretty much all Romance languages use quite a wide variety of them. It's Germanic Europe that has a diacritic phobia. XD Sartma (talk) 07:43, 20 July 2022 (UTC)
You were making an argument about pronunciation, and now you're surprised that someone found it relevant to pronunciation sections.
We're not talking about some ludicrous 'diacritic phobia'. We are talking about the fact that most of the world's writing systems do not convey all phonemic information. Want Romance examples? Catalan, Portuguese, and Campidanese Sardinian spelling does not consistently distinguish close-mid vowels from open-mids either. Nicodene (talk) 08:05, 20 July 2022 (UTC)
@Nicodene: Please re-read my comments. The whole thread is about adding diacritics to Italian headwords, not about the pronunciation section, and my comments are on that topic. You can't discuss Italian accent diacritics without talking about pronunciation; that's the very reason why one would use them.
I'm fluent in English and Japanese, probably two of the languages with the weirdest writing/spelling systems in the world, so I'm very well aware that most writing systems do not convey all phonemic information. The point is that if you add diacritics to Italian entries you'll have to decide for one or the other pronunciation when it comes to E's and O's, and if you follow an Italian dictionary, you'd be giving a prescriptive pronunciation. Sartma (talk) 08:23, 20 July 2022 (UTC)
@Sartma We already have 'decided for [sic] one or the other pronunciation when it comes to E's and O', your attempt to pretend that the pronunciation section is somehow irrelevant notwithstanding. I do not understand your personal crusade against Italian dictionaries anyway. As has already been explained, nobody is stopping you from adding regional pronunciations as well, which has already been done on some entries.
'Prescriptive' this, 'prescriptive' that. We get it: Standard Italian pronunciation is *evil*, and you have arrived to save the human race from it. We all appreciate your heroic efforts. Just try not to let it bother you that it is actually descriptive to describe a standard pronunciation without banning others. Or that practically any pronunciation given on Wiktionary is 'prescriptive' by your standards, since there exist dialects or accents that differ from it. Nicodene (talk) 09:11, 20 July 2022 (UTC)
@Nicodene: With all your straw men you make discussing with you really unpleasant. I would appreciate it if you would stop putting words in my mouth that I never said, thank you. If you need to quote me, just copy-paste what I wrote between quotation marks and spare me your personal and absurd interpretations.
Prescriptive is prescriptive. It's neither good nor bad. It was my understanding that we don't do that on Wiktionary; that's why I'm pointing it out. Especially since that's what I was told when talking about analogous issues for Japanese (see below).
Again: I never said I want to change the pronunciation section. Please stop using that straw man. We're not talking about the pronunciation section in this thread, why do you keep bringing that up?
All standard languages are "formalised" to some degree, that's part of the process of standardisation, but while a lot of languages are based on an actually spoken variety, Italian isn't. There have never been native speakers of Italian to begin with, and there are none to this day. That's why Treccani has no issues saying that l’italiano standard viene ad essere una lingua artificiale, senza nessun reale equivalente in nessuna varietà effettivamente parlata da una concreta comunità linguistica all’interno del territorio nazionale ("The result is that Standard Italian is an artificial language, without any real equivalent in any variety actually spoken by an actual linguistic community within the national territory."). That's the reality of the language you are dealing with, whether you like it or not. And it shouldn't shock anyone. Sartma (talk) 09:59, 20 July 2022 (UTC)
Oh please, spare me the tactical dishonesty. You know full well the logical conclusion to what you're arguing.
Let's suppose that I accepted your argument that indicating a standard Italian pronunciation of mid-vowels in the part-of-speech section on Wiktionary is simply unacceptable because 'muh prescriptivism'. Your mission now is to convince me that indicating the same mid-vowel in the above pronunciation section, both in clear IPA characters and in the exact same è/é system found in the part-of-speech section, is somehow acceptable. Good luck.
Nothing that you have said about Standard Italian is not incredibly obvious. That is how formal/educated pronunciation works in any number of other languages, including English. Shall we start stripping our standard English pronunciations, etc. from Wiktionary? They're prescriptivist after all, surely. Nicodene (talk) 10:37, 20 July 2022 (UTC)
@Nicodene: I never said it's unacceptable. Here is what I said:
"As long as we have the IPA transcription(s), I would only write the accent when it's mandatory."
At this point I assume you're not misreading/misinterpreting/misrepresenting what I write on purpose, but you just can't help yourself, so I'm gonna stop here. I said what I had to say anyway.
You do have terrible reading comprehension skills, though, so I strongly recommend you do some training in that area. I would start by refraining from imposing your thoughts on a text and reading what's written, not what you want to read into it. Sartma (talk) 13:26, 20 July 2022 (UTC)
While you may not have literally said that it was 'unacceptable', you said in the very next sentence that 'we should never indicate accents on E's and O's'. Is there such an important difference between saying that we should never do something on Wiktionary and saying that something is unacceptable on Wiktionary, or are you (like last time) using a quibble over semantics as a deflection tactic?
You have not made the slightest attempt to explain how, per your own reasoning (see the above quote, plus for instance 'Italian dictionaries are prescriptive when it comes to pronunciation, which is in clear contradiction with WT:NPOV'), we would not have to remove the stressed mid-vowel distinctions indicated in Italian dictionaries from our pronunciation sections as well. (Keep in mind that the same pronunciation sections use both IPA and é/ó/ò/è, i.e. 'accents on E's and O's', which you specifically said 'we should never indicate'.) Either accept that your reasoning is wrong or take a consistent stance against showing Standard Italian mid-vowels. Nicodene (talk) 14:59, 20 July 2022 (UTC)
@Nicodene: Well done for actually going back and reading what I wrote. Next step is quoting me in full, not just using clippings to make them say something they're not saying. My full sentence was: "Wiktionary shouldn't be prescriptive, so we should never indicate accents on E's and O's". If Wiktionary is non prescriptive, then we shouldn't indicate accents on headwords. I didn't say something absolute like adding accents on headwords "is simply unacceptable", as you put it. Yes, there is a difference there. Take all the time you need to see it.
If you tell me that Wiktionary is fine with a degree of prescriptivism, then ok, let's add the accents.
I also clearly told you where I'm coming from, and that "standard accent is prescriptive so not good on Wiktionary" was the reasoning I was given when my proposal to add accents to Japanese romanisations was rejected. So I'm basically just repeating what I was told by other Wiktionary editors. That's why I also suggested it would be a good idea to clarify this question, so that everybody is on the same page. Sartma (talk) 15:56, 20 July 2022 (UTC)
Your having said 'Wiktionary shouldn't be prescriptive' before that quote does absolutely nothing to change the following 'so we should never indicate accents on E's and O's'. Had you said 'if Wiktionary shouldn't be prescriptive', then this new interpretation that you are trying to claim ('if Wiktionary is non prescriptive...') would apply, and you would in fact be giving a conditional rather than the absolute that you actually gave (and as you have ludicrously tried to deny just now with 'I didn't say something absolute'). As it is now, you are simply lying. Nicodene (talk) 16:16, 20 July 2022 (UTC)
@Nicodene: Anyway, if it really isn't a problem to give the dictionary "standard" pronunciation of a language here on Wiktionary, we should add this info somewhere, so everybody here is on the same page. When I proposed to revise the pronunciation section on Japanese entries and also pitched the idea to add the accents on JP romanizations, I was told by Japanese editors that "the pitch accent marked in most dictionaries is specific to the "standard" variety of Japanese, which is inappropriately prescriptive for our mission here at Wiktionary" and that "doing so would imply that Tokyo accent is the only pitch accent for all of Japanese, which is incorrect"... Sartma (talk) 07:26, 20 July 2022 (UTC)
(I'll answer here instead of in the other branch of the discussion for reasons of space and practicality) @Sartma is actually making a reasonable point. What you are describing is sadly the case in the North. In central Italy's not-too-rural areas, though, by now, in spite of the Treccani quote, Dante's Conlang is spoken as a mother tongue by nearly everyone. There is a notable number of words that have different mid-vowel openness in different areas, but not to the extent that should prohibit us from displaying the standard variants.
All of this, though, seems to me completely beside the point. Accents are occasionally written, and when they are, it's based on the standard, not on the pronunciation of the writer. For example, if a text has pèsca ("peach") written with the diacritic, then whether or not you pronounce it the same way as pésca ("fishing"), the accented spelling pèsca still means "peach, and peach only". It becomes a sort of definition-reasoned-difference in spelling, kinda like a and ha. While pronouncing /ˈpeska/ for peach is a legitimate regional pronunciation, writing pésca for peach is just wrong. When Standard Italian decides the Standard Spelling for the Standard Words, it also decides which Standard Accents should be on there (even though they are Standardly Invisible).
This is the difference between this and the Japanese question. Italian written accents are standardized, Japanese written pitch accent is not (well, since it's not writable).
Let the nonstandard pronunciations be in the pronunciation section, and the standardly accented word be in the head template.
<sidenote> To @Nicodene: I recommend not bringing Latin into the discussion, since the standard pronunciation is no stranger to changing vowel quality as it pleases, making the rest of the peninsula endure the pain of having to treat things like spòso, sógno or édera as standard and having their own pronunciations, which actually reflect Latin, labeled as "regional" (just wait till I get to join the Crusca and change this). </sidenote>
@Benwing2 I forgot to address your second point (the first one being addressed in the big older message in which I forgot to ping you). As for links, I would only keep the mandatory accents in them. Even if we wanted all accents, though, it would still be pretty logical to strip them away (just remove every non-word-final accent). Still, this could probably cause some problems, for example, I think, in loanwords from French or Spanish. In general, putting written accents on links would be pretty overwhelming and might mislead non-Italian users into thinking that Italian is actually spelled like that. Catonif (talk) 15:32, 20 July 2022 (UTC)
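The link rule Catonif describes (keep only the mandatory, word-final accents and strip every other grave or acute accent) is simple enough to sketch. The Python below is a minimal illustration under stated assumptions, not how any Wiktionary module actually works: the function name `strip_nonfinal_accents` is hypothetical, only grave and acute accents are considered, and, as Catonif notes, loanwords such as French terms with non-final accents would still need special handling.

```python
import re
import unicodedata

# Combining grave (U+0300) and acute (U+0301) accents, as produced by NFD decomposition.
_STRESS_ACCENTS = {"\u0300", "\u0301"}


def strip_nonfinal_accents(term):
    """Drop grave/acute accents from every letter except the last one of each word.

    Hypothetical helper for illustration only: word-final (mandatory) accents,
    as in "è", "così" or "dovrà", are kept; non-final ones, as in "pèsca", are removed.
    """

    def fix_word(match):
        # Decompose so that "è" becomes "e" followed by a combining grave accent.
        decomposed = unicodedata.normalize("NFD", match.group(0))
        # Position of the last base (non-combining) character in the word.
        last_base = max(
            i for i, ch in enumerate(decomposed) if not unicodedata.combining(ch)
        )
        # Keep an accent only if it sits on the word's final letter.
        kept = [
            ch
            for i, ch in enumerate(decomposed)
            if ch not in _STRESS_ACCENTS or i > last_base
        ]
        # Recompose so the output uses the usual precomposed characters.
        return unicodedata.normalize("NFC", "".join(kept))

    # Normalize first so that precomposed accented letters count as word characters for \w.
    return re.sub(r"\w+", fix_word, unicodedata.normalize("NFC", term))


if __name__ == "__main__":
    for word in ["pèsca", "sùbito", "dovrà", "così", "è", "l'uòvo"]:
        print(word, "->", strip_nonfinal_accents(word))
```

Because this one-pass rule treats any accent on a word's last letter as mandatory, it cannot tell a legitimate final accent from one of the "forbidden" monosyllabic spellings listed earlier (such as *à or *é); those would still have to be kept out at the head-template level.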
@Catonif: Regional Italian is definitely spoken as a mother tongue by most Italians these days, but there are no native speakers of Standard Italian (with the exact pronunciation given in dictionaries). One might reach a good level of spoken Standard Italian in school, but since no-one really cares about vowel openness or any other phonetic issues, one would retain their Regional Italian vowels (and consonants, and phonotactic doubling, etc.). English (both General American and RP) pronunciation is based on actual spoken varieties, and so is Japanese. There are/were actual communities of native speakers who speak/spoke like that. That's not the case for Italian. Standard Italian was never natively spoken by anybody (as Treccani clearly says), and still isn't.
(I'll reply to the rest later, have to go now). Sartma (talk) 16:16, 20 July 2022 (UTC)
RP is based on 'actual spoken varieties' and Standard Italian is not? That's a creative claim. Please explain what fundamental difference you are imagining here. Nicodene (talk) 16:23, 20 July 2022 (UTC)
@Nicodene: Standard Italian is a constructed language, there were no native speakers of it back then when they constructed it and there are none now. Sartma (talk) 16:58, 20 July 2022 (UTC)
@Sartma: you're describing any written language with the possible exception of those with a logographic writing system. Thadh (talk) 17:36, 20 July 2022 (UTC)
@Thadh: I'm not sure all written languages have the same history as Italian... I know that German has a similar one, even worse possibly, since it's a sort of Frankenstein language that came out of a mixture of two different chancery languages (but I don't remember the details). Was Russian the same? Were there no native speakers of Russian before they made it the standard language? Sartma (talk) 21:06, 20 July 2022 (UTC)
@Sartma: you're mixing up "language" and "standard language". No, nobody speaks exactly as dictated by the standard language, that is impossible unless the criteria are incredibly lax. I doubt there was even one person in the history of English who consistently pronounced all vowels and consonants in all positions and contexts exactly as dictated by RP or GA, and used prescribed grammar at all times. Thadh (talk) 21:28, 20 July 2022 (UTC)
@Thadh: I'm talking about pronunciation on a phoneMic level, not a phoneTic one. é/è and ó/ò in Standard Italian are phoneMic. RP pronunciation is based on the spoken language of the South of England, and native speakers there will pronounce all the phonemes of a word like they are given in a British English dictionary. They'll say /bʌs/ and not /bʊs/ (northern pronunciation) for bus. Sure, there will be phoneTic differences depending on the area, but not phoneMic ones. Sartma (talk) 21:52, 20 July 2022 (UTC)
First of all, the claim that no Italian speaker's phonological system conforms to Standard Italian's is one that desperately needs a source.
Secondly, phonological analysis isn't something a standard language can dictate: It can prescribe a phonetic pronunciation of words, which can then be analysed phonologically by linguists. And if you do that, the ò-o merger can simply be analysed as a phonetic merger of the two within Standard Italian's phonological system. I'm sure other speakers merge vowels differently. Thadh (talk) 05:26, 21 July 2022 (UTC)
@Thadh: Gian Luigi Beccaria, one of the most famous Italian linguists, says it clearly:
“in Italia, chi più chi meno, tutti parlano con qualche venatura regionale. Non c’è nessuno in Italia che possieda l’italiano standard come lingua materna.” (in Italy, more or less everyone speaks with some regional influence. There’s no one in Italy who has Standard Italian as a native language)
I'm not sure what you mean by a "phonological system" not conforming to Standard Italian. What I'm saying is that no Italian native pronounces the language consistently as it appears in dictionaries. The pronunciation given in a dictionary is no-one's native pronunciation in Italy. People in Florence might get most of the vowels right, but they pronounce consonants differently, so not even they speak like the dictionary prescribes.
Italian was created and fixed as a language way before any native speaker of Italian even existed. At the time, people were all speaking different vernaculars. Italy itself didn't even exist. I don't know how many other countries had a national language (spoken by no-one) before even having a nation. When Italy became a country in 1861 "Italians" didn't exist. One of the most famous sentences we learn in school is that once they unified the Italian territory the main task was to "create Italians": "Abbiamo fatto l'Italia, ora dobbiamo fare gli Italiani" (We created Italy, now we need to create Italians). All this despite having Italian as a codified language (on paper, not spoken by anyone) since around 1600. Italians started "learning" Italian only with the diffusion of the radio and the TV (they did study it in school, but the major forces for the actual spreading were radio and TV. Most people before my parents' generation (talking about those coming out of the second world war and before) would barely complete the first 5 years of school). We literally had "Italian classes" on radio and TV. My grandmother died 4 years ago and she never spoke one word of Italian (just to give you an idea of how recent this whole thing is).
It's also strange to think of people pronouncing O's and E's differently as a question of "merger". It's not like there was a spoken Italian to begin with and then, at a later stage, people in different regions started to merge sounds. The differences in pronunciation reflect those of the vernacular varieties spoken in a place way "before" Italian made its entrance in the "spoken" scene. Since there were no native speakers of Italian to mimic (as there aren't to this day), everybody just pronounced it the way that came naturally to them, reflecting their own regional language.
You guys should pay me. All this information is available everywhere and is nothing special; it's no secret to anybody. I'm literally giving you lessons on Italian historical linguistics for free! XD Sartma (talk) 07:34, 21 July 2022 (UTC)
Simply put, the problem is that you are not familiar enough with other languages to realize that Italian is in no way a special case here. What you have written can be said—adjusting for particularities of time, place, and so on—about RP/'the Queen's English', about Standard French, about Standard German, and any number of other cases.
Incidentally, nobody owes you anything, let alone money, for spamming trivial information that is available just about anywhere (even the not-particularly-good Wikipedia page on Italian) mixed in with your own questionable notions, such as the claim that 'people in Florence might get most of the vowels right, but they pronounce consonants differently' in a discussion where you yourself insisted 'I'm talking about pronunciation on a phoneMic level, not a phoneTic one'. For the record, neither la gorgia nor the intervocalic deaffrication is 'phoneMic'. Nicodene (talk) 09:45, 21 July 2022 (UTC)
@Nicodene: Here, learn something: Who has the most neutral accent in Italian? Sartma (talk) 11:14, 21 July 2022 (UTC)
Rather than wasting fifteen (more) minutes of my life on this, why don't you tell me directly: does the random Youtuber that you have cited actually corroborate the notion of phonemic consonant differences between Florentine and the standard? With a timestamp preferably. Nicodene (talk) 12:14, 21 July 2022 (UTC)
@Nicodene: You crazy? I'm happy to waste your time. You're wasting mine, so it's just fair, don't you think? Watch the whole thing and learn something. Sartma (talk) 13:08, 21 July 2022 (UTC)
@Sartma Answer the simple yes-or-no question and (if it's 'yes') I will be happy to watch the entire video. Nicodene (talk) 14:19, 21 July 2022 (UTC)
@Nicodene, Sartma (1) No, the video displays the phonetic gorgia as the only difference from Standard Italian. (2) What does it matter if the gorgia is phonemic or phonetic? Sartma himself said that Florentine people "might get the vowels right". Weren't we talking about accents? Are accents on consonants? (3) Do you guys even remember why this was relevant in the first place? (4) The way you ended up speaking with each other is the most infantile I've seen on the wikt and makes me value the efficiency of the whole project way less. Snap back to your reasonable selves, please! Excuse my straightforwardness, but this has gotten kind of ridiculous, I'm sure you agree. Catonif (talk) 19:27, 21 July 2022 (UTC)
@Catonif: The gorgia is not phonemic, and I never said it was. As you said, it's irrelevant. I was talking about é/è and ó/ò.
The only thing I've been saying is that the pronunciation you find in an Italian dictionary is prescriptive. There is no native speaker in Italy that natively pronounces words the way prescribed by the dictionary (unless they study diction). Standard Italian pronunciation is not a native Italian pronunciation, so can't be "described", can only be "prescribed". That's all. Not sure why this fact is so shocking.
Considering that Wiktionary shouldn't be prescriptive, I wouldn't add non-mandatory accents to Italian headwords. Dictionary pronunciation is given in the Phonetic section both in the IPA transcription and in the hyphenation anyway, so it would just be redundant. Sartma (talk) 22:26, 21 July 2022 (UTC)
> The gorgia is not phonemic, and I never said it was. As you said, it's irrelevant. I was talking about é/è and ó/ò.
Oh really? Then what, exactly, did you mean when you said 'people in Florence [...] pronounce consonants differently' in a discussion about 'pronunciation on a phoneMic level' where you yourself just admitted that the gorgia is irrelevant?
> Standard Italian pronunciation is not a native Italian pronunciation, so can't be "described", can only be "prescribed". That's all. Not sure why this fact is so shocking.
What's shocking is that you still have not grasped that exactly the same can be said about RP (etc.)
> Considering that Wiktionary shouldn't be prescriptive, I wouldn't add non-mandatory accents to Italian headwords.
Meanwhile you are fine with having the same mid-vowels indicated two different ways directly above the part-of-speech section? How is that not 'prescriptive', if we accept your argument? I seem to recall you saying 'Wiktionary shouldn't be prescriptive, so we should never indicate accents on E's and O's'. Nicodene (talk) 06:24, 22 July 2022 (UTC)
1) Thank you. There is no small irony in posting that video, along with the remark 'learn something', in response to a comment that already mentioned the gorgia toscana and that it is not phonemic. 2–3) It is relevant because he himself chose to make it relevant by declaring 'I'm talking about pronunciation on a phoneMic level, not a phoneTic one' as part of an attempt to deny the parallels between RP and Standard Italian. Nicodene (talk) 20:43, 21 July 2022 (UTC)
@Thadh: And from what I read in Wikipedia, both General American and RP English pronunciations are based on actually spoken varieties. Standard Italian pronunciation is not, and that's why no native Italian speaker ever consistently pronounces words the way they appear in an Italian dictionary. That's also why Italian dictionary's pronunciation is prescriptive, and not descriptive. Sartma (talk) 21:12, 20 July 2022 (UTC)
You know full well that Standard Italian is and was based on traditional Tuscan, very much 'actually spoken'. Nicodene (talk) 20:16, 20 July 2022 (UTC)
@Nicodene: You have a very mystified knowledge of the history of the Italian language. Standard Italian was never based on the spoken language of Florence, but on a literary, written version of it that was a mixture of Florentine, Latin and other vulgars. Here, copy-paste this into Google translate and read the part in bold out loud 10 times:
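"Il modello di lingua che viene codificato è «il toscano urbano della classe colta di Firenze» (Galli de’ Paratesi 1984: 60), cioè una varietà scritta, un registro letterario con influenze latineggianti e di altri volgari, e non il fiorentino parlato. Non tutte le caratteristiche del fiorentino sono quindi accolte dallo standard. L’italiano standard in effetti non ha mai, fin dalla codificazione cinquecentesca, coinciso esattamente con il fiorentino, e sin dal Seicento ha accolto, data anche la mancanza fra il tardo Cinquecento e l’avanzato Ottocento di un centro preminente che imponesse una norma, innovazioni di varia provenienza. La distanza dal fiorentino si è ancora accresciuta dopo l’Unità d’Italia, nonostante i tentativi puristici di imporre il fiorentino moderno come modello, in particolare per la pronuncia." ("The model of language that is codified is «the urban Tuscan of the educated class of Florence» (Galli de’ Paratesi 1984: 60), that is, a written variety, a literary register with Latinizing influences and influences from other vernaculars, and not spoken Florentine. Not all the features of Florentine are therefore adopted by the standard. In fact, ever since its sixteenth-century codification, Standard Italian has never coincided exactly with Florentine, and since the seventeenth century it has taken in innovations of varied origin, partly because between the late sixteenth century and the later nineteenth century there was no preeminent centre to impose a norm. The distance from Florentine grew even further after the Unification of Italy, despite purist attempts to impose modern Florentine as the model, particularly for pronunciation.") (from Treccani: Italiano Standard) Sartma (talk) 20:59, 20 July 2022 (UTC)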
That is adorable. You know what it sounds like? A middle-schooler smugly proclaiming 'No, you're wrong, humans didn't evolve from monkeys, they evolved from apes' and triumphantly pointing at a Wikipedia page, never pausing to think about whether apes themselves evolved from monkeys and so humans through them as well. Likewise, you have apparently failed to realize that 'il toscano urbano' ('urban Tuscan') was itself based on 'il fiorentino parlato' ('spoken Florentine') (not to mention the other Tuscan varieties), whatever additional 'influenze latineggianti e di altri volgari' ('Latinizing influences and influences from other vernaculars') it had. Did it never occur to you that literary English also has Latinizing influences and no shortage of influence from, say, French and Norse?
I have to wonder how you imagined this happening, if you ever stopped to think about it at all. Did you picture Dante Alighieri sitting down one afternoon and saying 'You know what? Imma just invent an entire phonology, an entire set of noun and verb inflexions, and an entire grammar and syntax out of thin air just for the lolz' and presto, we have Standard Italian in its embryonic form? If so, it is nothing short of a miracle how well the result aligned in all of these aspects with Old Florentine. Nicodene (talk) 06:29, 21 July 2022 (UTC)
@Nicodene: Sorry, I'm tired of keeping up with your straw men. You're more interested in belittling me than in having a discussion about the actual topic of this thread, and while I can excuse that behaviour the first couple of times (we can all slip, we're all human), I'm not willing to engage with you further if that's just your default. So, ciao ciao! Sartma (talk) 08:55, 21 July 2022 (UTC)
I'll take this as your giving up on even trying to argue the point. Nicodene (talk) 09:34, 21 July 2022 (UTC)
@Nicodene: Of course you do. I'm not surprised. Sartma (talk) 13:16, 21 July 2022 (UTC)
Feel free to demonstrate otherwise by actually arguing the point instead of devolving into vaguely flippant one-liners or other irrelevancies. Nicodene (talk) 14:01, 21 July 2022 (UTC)
Oh no, that was the least relevant part of the entire message lol. I should have ordered my points better. Catonif (talk) 16:56, 20 July 2022 (UTC)
@Catonif: I know, sorry! Don't really have time atm. Will reply to the rest later! Sartma (talk) 17:07, 20 July 2022 (UTC)
@Catonif: "This is the difference between this and the Japanese question. Italian written accents are standardized, Japanese written pitch accent is not (well, since it's not writable)."
It's not really "written accents" that are standardised, but the very pronunciation of Standard Italian. Accents just follow, as a way to show that.
Japanese pitch accent is also standardised and it can easily be indicated on the romanisation, but apparently even just giving the standard accent on headwords is "prescriptive, and not in accord with our mission on Wiktionary"...
As a matter of fact, though, it's pretty much the same case. The fact that accents can be more easily added to an actual Italian word than to a Japanese one doesn't change the fact that what's standardised in either case is the pronunciation, not the way to indicate it. Sartma (talk) 23:09, 20 July 2022 (UTC)
@Catonif: I agree with you that should there be the need to disambiguate two words that are prescribed in Standard Italian as having two different vowels, one would use the accent given in a dictionary, but only because that's what dictionaries prescribe. Left to their own devices, without a dictionary to check, most Italians would be lost. Nothing in this process is "descriptive". Mind that the vast majority of Italian homophones are written exactly the same and are impossible to disambiguate; we're talking about a very tiny minority of cases here. There's no way to disambiguate "la vecchia porta la sbarra" ("the old lady carries the bar" or "the old door is blocking her"?), for example, and no-one really cares either.
Either way, we do indicate the prescribed accent in the Pronunciation section under hyphenation so I don't see the need to add it on the headword too. It would just be redundant. Sartma (talk) 21:31, 20 July 2022 (UTC)

Let's recap. Reasons to agree:

  • Because we already do this
    • For consistency, by analogy with Latin, Slavic languages, Tagalog, etc.
    • For consistency with the verb lemmas, to which @Benwing2 already added this feature that I very much enjoy
  • Because the pronunciation section is not the right place for this
    • Not all pages have (or should have) a pronunciation section. While I don't dislike the {{it-pr}} module per se, it does take a notable amount of space, and seems kind of out of place in small entries like nottolone. This is especially true for non-lemma forms: how could I know that sagginassero is stressed on the antepenult? That is, without having to look for it in the gigantic conjugation table on the lemma entry.
    • IPA might be misleading in regional-only words, which are actually never pronounced as the Standard key suggests.
  • Because literally every Italian dictionary does this, let Wiktionary not be the weird kid

Problems arising:

  • Monosyllables' accent is sometimes forbidden
    • Solution: never write an accent on monosyllables unless it is mandatory (see a longer explanation in my second post in this thread).
  • Non-Florentine accents
    • Solution 1: write both the standard and regional accent form (see porgere), but it doesn't look very good and I don't like the idea of having two links one right after the other that take me to the same exact page.
    • Solution 2: write the standard only. While the regional pronunciations are used and thus should be described, they are never used in the written language and are only pronounced. If something is never written and only pronounced, their place is the pronunciation section (this is the one thing that flipped the thread over).

Note: I don't know if here "headword" means "also in links". What I mean by headword is what is displayed by the head template. I'd leave the links alone. Being the newbie here, I'd like to ask whether people here usually vote before going into the practical details of the implementation. It would seem sensible, given the disagreement, but I don't think there are enough people interested in this for a vote to have any actual statistical meaning. I hope this doesn't turn into Does even Italian exist? again. Catonif (talk) 17:19, 22 July 2022 (UTC)

@Catonif: Thanks for the recap! I think that all the reasons you give are quite weak, though:
  1. Consistency with some other languages (Latin, some Slavic languages, Tagalog): As far as I know, inter-linguistic consistency on Wiktionary is not a thing. Each language has its own practices and standards, so the only thing that matters is intra-linguistic consistency. Moreover, Italian has the issue of compulsory accents vs non-compulsory accents (very rare), plus words that never take an accent (the monosyllables you mentioned), which complicates things a lot. The other languages we mentioned don't have a similar issue, and the diacritics used to indicate accents and vowel length are clearly different from any diacritic used in the orthography of those languages (Latin, Ancient Greek, Slavic languages), so no confusion ever arises.
  2. Consistency with the verb forms: these forms are a recent addition by @Benwing2 and I don't know whether the topic was discussed and voted upon back then before implementation. I take it there wasn't a vote, otherwise we wouldn't be here talking about this? If that's the case, I don't see why we should seek consistency in that direction. What we should do is remove all accents from Italian verbal headwords (they can stay in all other form and inflection tables). That would solve the issue and no further problems would arise.
  3. Pronunciation section not the right place: You don't give any reason why the Pronunciation section shouldn't be the right place, but your considerations about space are not an issue on Wiktionary, since Wiktionary is not paper.
  4. IPA: What we could do about {{it-pr}} is mark the standard IPA transcription and hyphenation as "Standard Italian", and add any other regional pronunciations after that under "Regional Italian" (with {{q}} or something to indicate the region/area/city). Sartma (talk) 11:01, 23 July 2022 (UTC)
@Sartma: (1) 'Inter-language consistency isn't a thing'. Generally speaking, wouldn't it be better if it was? (1b) Compulsory and non-compulsory accents don't seem to cause a mess: if it's word-final, it's compulsory. The solution for monosyllables is also simple and already mentioned: do not write accents on monosyllables unless spelling makes them mandatory. (3) 'Wiktionary is not paper' means 'Write as much information as you want, because hardware is cheap'; that has little to do with this, since writing {{it-pr}} requires only 9 bytes. What I'm talking about is the use of space on the actual displayed page, which should be concise. (4) Take for example the Romanesco word "fusaglie" [fu'sajje]: [jj] is the result of a yeist merger of /ʎʎ/ and /j/ (and [s] a merger of /s/ and /z/, though this is of marginal importance). In this case we can't know which one of the two to use in the standard, since the word doesn't exist in the standard. This is a very simple example, and for this one we could probably also use a phonetic transcription, but for words used in a bigger region, there isn't a single phonetic realization. The thing is, I might not actually care/know about the phonetics and phonemics of every dialect, and only want to write the stress, but I would first have to solve this problem. In the end I settled for an ad hoc transcription only because I wanted to display the accent, which I'd deem not ideal. (note) {{a}} is used instead of {{q}} in pronunciations.
Tbh, I'm really about to just give up on this proposal. I thought it would be easily accepted, but it's kinda draining me and it's not like I'm particularly keen on this anyway. Catonif (talk) 12:12, 24 July 2022 (UTC)
  1. I agree with you that inter-language consistency would be better. It would definitely be my preference, too. But here on Wiktionary it's not a thing. I was told that quite clearly on more than one occasion by more than one editor/admin, so it's just a matter of accepting that that's the way it is... (1b) The final/non-final rule is not immediately obvious, and I can see how a reader could be confused by this difference. That question apart, it remains that before @Benwing2 added the accent on verb headwords, there was no question of consistency in the direction of "adding accents". As I said above, all problems and possible confusions are solved if we delete the accents from the verbs, especially considering that (as far as I understand) there was no vote on the subject.
  2. "Wiktionary is not paper" has repercussions on style too. As far as I know there is no rule mandating "concision", so I'm afraid that is just your preference...
  3. Zanichelli has fusaglia, notes that it's Roman/Romanesco and that it's used mainly in the plural, and has no issue giving "/fuˈsaʎʎa/ (or /fuˈzaʎʎa/)" as its standard pronunciation. That word does exist in Regional Italian, so of course it has a Standard Italian pronunciation (all Regional Italian words have a Standard Italian pronunciation, since we're still talking about Italian, not about a "dialect").
I understand your frustration, it's part of working on a shared project like Wiktionary and I think we all went through it. You have to compromise on everything, often with people who know less than you but still have a voice, and often more power than you. Results are often sub-optimal, but there's nothing you can do about that. It's something you learn to deal with/accept at some point.
That said, I could agree with indicating the accented vowel or syllable in a different way. What about underlining them? Like this: fusa̱glia, saggina̱ssero, nottolo̱ne, etc.? Sartma (talk) 18:31, 24 July 2022 (UTC)
I have an even better idea: using the grave/acute accents employed by Italian dictionaries, against which you have not provided a single logically consistent argument and which you are so far alone in opposing. Nicodene (talk) 19:28, 24 July 2022 (UTC)
@Nicodene: Learn something:
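Pronuncia dell'Italiano e fonologia dell'Italiano (copy-paste into Google Translate):
"Una delle conseguenze più vistose del fatto che l’italiano è una lingua scarsamente parlata si ha sul piano fonetico. Manca uno standard parlato, perché la pronuncia dell’italiano che si è formata a partire dall’unificazione ha subito una forte interferenza delle fonologie locali: più che essere una vera e propria fonologia, dunque, è stata per molto tempo soltanto una mera pronuncia, ovvero una resa orale dello scritto (Mioni 1993; Schmid 1999; Bertinetto & Loporcaro 2005). Non esiste un corrispondente italiano della received pronunciation inglese: la pronuncia delle persone colte «in ogni regione è più simile alla pronuncia delle persone incolte della stessa regione che alla pronuncia delle persone colte di altre regioni» (Lepschy & Lepschy 1981: 13)." ("One of the most conspicuous consequences of the fact that Italian is a scarcely spoken language is seen at the phonetic level. There is no spoken standard, because the pronunciation of Italian that took shape from Unification onwards has undergone strong interference from local phonologies: rather than being a true phonology, then, it was for a long time merely a pronunciation, that is, an oral rendering of the written language (Mioni 1993; Schmid 1999; Bertinetto & Loporcaro 2005). There is no Italian counterpart to English Received Pronunciation: the pronunciation of educated people 'in every region is more similar to the pronunciation of uneducated people of the same region than to the pronunciation of educated people of other regions' (Lepschy & Lepschy 1981: 13).") Sartma (talk) 23:14, 24 July 2022 (UTC)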
@Sartma I see we've circled back to 'Standard Italian pronunciation don't real, so we shouldn't have it here'. It seems that the point about logical consistency has been lost on you for the second or third time now, since the consequence of what you're trying to argue would be removing all Standard Italian pronunciations from Wiktionary, if, that is, you are actually right.
Of course even a cursory search for 'Standard Italian phonology' brings up reliable sources speaking of it as a real thing, shockingly enough. Luciano Canepari has published an entire book describing it in fine detail (Italian Pronunciation & Accents). In the same vein, see also Martin Krämer's The Phonology of Italian. For incidental comments in other sources, see for instance the Oxford Guide to the Romance Languages (e.g. 'in standard Italian, the phonemic contrast /s/ ~ /z/ is neutralized in preconsonantal contexts' or 'educated standard Italian speakers nowadays may have either no paragogic vowel or just a weak vocalic post-consonantal release akin to an "excrescent" vowel'), Lori Repetti's Phonological Theory and the Dialects of Italy (e.g. 'vowel length in Standard Italian'), Clivio & Danesi's The Sounds, Forms, and Uses of Italian (e.g. 'phonetic feature missing from the pronunciation of standard Italian'), or Michele Loporcaro's Facts, theory and dogmas in historical linguistics (e.g. 'for Standard Italian, the experimental-phonetic literature shows that there is a gradual decrease in stressed vowel length as the number of syllables to the right of the stressed one increases'). We can go on all day with more examples. Nicodene (talk) 08:24, 25 July 2022 (UTC)

@Sartma, Nicodene, Catonif Apologies for not reading all the discussion here, there was way too much arguing past each other. IMO Nicodene you are more guilty than others in this discussion of taking a negative tone, although Sartma I don't really understand your assertions that no one speaks Italian as a native language. (This may well have been true in 1861 when Italy was unified, but generations of schooling have resulted in large numbers of people who do speak essentially according to the standard, especially those whose native languages are/were quite dissimilar to the standard. Something very similar has happened with Standard German.) Also, User:Thadh is very correct in noting that the situation is not so different from other languages. The solution used in Wiktionary across all languages that have a standard form is to describe that standard; Sartma, I don't understand at all your desire to remove information on standard Italian from Wiktionary just because some people don't speak that way. I would be strongly opposed to using an ad-hoc method of indicating stress, such as underlining the stressed syllable, given that there is a nearly universally used standard way of doing so using acute and grave accents, which also conveniently indicates the quality of the stressed vowel. In other words I largely agree with Catonif in this regard (and would not be opposed to marking the stress and vowel quality on the headword, including of non-lemma forms, as we already do with verbs). Benwing2 (talk) 06:26, 25 July 2022 (UTC)

@Benwing2: When I say that no-one "speaks" Standard Italian as a native language, I don't mean that there are no Italian native speakers and I'm not talking about grammar or vocabulary. I never said that. There are native speakers of Italian, of course, but the pronunciation of the Italian they speak is not the standard as indicated in the dictionary. There is no place in Italy where the standard pronunciation is native. I mean that Standard Italian doesn't have a native phonology of its own, that the pronunciation given in dictionaries is no-one's native pronunciation. It has never been at any moment in history, and it definitely isn't nowadays either. And that should shock no Italian (if they were born there and lived there, this should just come as an obvious piece of information to them, unless they're in denial). I understand that we all tend to think things are elsewhere just like they are at home, so Italian must be just like any other standardised national language in the world ("you're just too ignorant to know that", Nicodene would add), but it really isn't the case, and it would be nice to be able to have a bit more of a nuanced discussion here without going crazy and becoming offensive. I've literally just posted an article about Italian pronunciation by Treccani written in 2011 where scholars are quoted saying l’italiano è una lingua scarsamente parlata ("Italian is a scarcely spoken language"), which is why manca uno standard parlato ("there is no spoken standard"). Italian pronunciation più che essere una vera e propria fonologia (...) è stata per molto tempo soltanto una mera pronuncia, ovvero una resa orale dello scritto ("rather than being a true phonology, for a long time it has been merely a pronunciation (here they mean "reading"), i.e. an oral rendition of the written form").
They also clearly state that non esiste un corrispondente italiano della Received Pronunciation inglese ("there is no Italian equivalent to English Received Pronunciation").
In the same article, they say that Un modello di standard parlato sarebbe il cosiddetto fiorentino emendato (...). Il modello è stato poco praticato nell’insegnamento scolastico, poiché per certi versi artificioso (...), e in pratica non è appreso da nessun parlante come lingua materna: vale piuttosto come punto di riferimento normativo. ("A model for a spoken standard would be the so-called "emended Florentine". This model has been rarely used in scholastic education, since in a sense it is artificial, and as a matter of fact no speaker acquires it as a native language: it works rather as a normative point of reference.").
Of course if you ask Nicodene, they must all be ignorant scholars that know nothing about languages and Treccani is just a shitty Encyclopedia for ignorant people, but my question to you would be: how many linguistic sources of those other languages we're talking about would be fine with writing in 2011 that their respective Standard languages "have no true and real phonology" but they're just a "reading" of the written standard form? And that their spoken standard language has no native speakers?
I understand that this discussion is probably hitting on a lot of people's cognitive bias, but that doesn't make it less true. Italian dictionaries are prescriptive in the stricter sense of the term: they are normative, they tell you what you should say, despite no Italian speaker natively possesses that phonology system. Sartma (talk) 08:59, 25 July 2022 (UTC)
@Sartma Here, learn something:
'There is one accent which is not connected with a specific locality, though it is rather more southern than northern in its overall character. This is RP, which is short for Received Pronunciation... Despite the advantages of RP as a regionally neutral accent, it has not displaced the local accents of England... RP is clearly a minority accent... Until World War II RP was also the exclusive accent of the BBC and it is still especially prominent there' (Gramley & Patzold 1992, A Survey of Modern English, p. 309).
For comparison, some 3% or so of Italians speak the standard (from here, p. 11). For the other parallels, such as relative prominence in the media, I assume you can connect the dots yourself. With all due respect to Treccani, the situation is comparable to that of RP in the UK. This is not the first time, incidentally, that I've disagreed with that source on matters outside of Italian lexicography, considering that it incorrectly derives Spanish botilla (< bota + -illa < Latin buttis + -ella) from Latin butticula (whence Spanish botija, with /x/ not /ʎ/).
As for this:
> despite no Italian speaker natively possesses that phonology system
Tuscans do, considering that we're 'talking about pronunciation on a phoneMic level, not a phoneTic one' (and, in any case, we only provide phonemic transcriptions for Italian). Granted, it is not always a perfect match on a lexical basis (e.g. Tuscans often have /ɔ/ rather than /wɔ/ in certain words), but neither is any regional variety of English to RP, that I'm aware of. Nicodene (talk) 11:15, 25 July 2022 (UTC)
@Nicodene: I hope you'll excuse me for ignoring the random paper you found. It was written in 1983 for a random conference in Finland, with a bibliography where the oldest reference work is dated 1982. Hardly something worth quoting in any serious discussion. Bash Treccani as much as you wish, but at least make an effort to find something better? (besides, even in that paper they say that Italian dictionaries are prescriptive, lol).
Also, "Tuscans" don't all speak with the vowels indicated in dictionaries. Tuscany is a big region, and there are a lot of Tuscans with different vowels, so that's just not true. Sartma (talk) 11:45, 25 July 2022 (UTC)
@Sartma The figures come from Luciano Canepari's Italiano standard e pronunce regionali, which is not available online, hence I linked a paper that extensively quotes from it. Perhaps if you'd cared to look carefully, you might have realized that yourself.
'besides, even in that paper they say that Italian dictionaries are prescriptive, lol'
Lol omg lmao xD indeed. Why don't you look up RP + 'prescriptive' and see what you'll come up with? There's a world of possibilities out there, waiting only for your click. You'll also have to explain to me, as I already asked you to, how it is 'prescriptive' for Wiktionary to describe a standard pronunciation if regional ones are permitted as well.
I like how you've provided "Tuscans" with scare quotes. Anyway, if you personally believe that Tuscans in general differ, on the phonemic level and in some actually significant number of words (say- 10%?), from Standard Italian with regards to /ˈe/ versus /ˈɛ/, or the corresponding back vowels, I'd like to actually see a source for that totally legit, definitely-not-made-up-on-the-spot notion. Bear in mind that this has nothing to do with the overall 'phonology system', as you put it. Nicodene (talk) 11:58, 25 July 2022 (UTC)
@Nicodene: the difference between RP and the standard pronunciation in Italian dictionaries is that RP came into being as a linguistic reality first, and only afterwards was it described (and sometimes prescribed). Italian standard pronunciation never existed and still doesn't exist outside of a dictionary, so it can only be prescribed, it can't be described. Florentine and Tuscan people speak with a Florentine or Tuscan accent(s), they don't speak Standard Italian. Sartma (talk) 13:36, 25 July 2022 (UTC)
Standard Italian pronunciation very much does exist; cf. the several reliable sources that I quoted earlier and the most recent one as well. I'm sorry, but this is simply not something you can deny without descending to the level of flat-eartherism. At this point I have to wonder if you are in fact trolling, considering that you directly admitted 'I'm happy to waste your time'.
'Florentine and Tuscan people speak with a Florentine or Tuscan accent(s), they don't speak Standard Italian' is such a complete non sequitur that I am flabbergasted. We were clearly talking about differences between Tuscan and the standard. If you do not actually have an answer, just say so.
Edit: just for good measure, another set of sources:
  • Bertinetto & Loporcaro, The Sound Pattern of Standard Italian as Compared with the Varieties Spoken in Florence, Milan and Rome. No need to cite a specific page here, as the entirety of it is à propos. Per the abstract, 'this paper is a condensed presentation of the phonetics and phonology of Standard Italian...'
  • Claudia Vigario et al., Phonetics and Phonology: Interactions and Interrelations, p. 141: 'The elision of voiced vowels also serves to increase the complexity of consonant clusters, creating heavy or superheavy syllables not licensed in standard Italian phonology...'
  • Gabriel et al., Manual of Romance Phonetics and Phonology, p. 276: '...in southern varieties of Italian, a word like roba 'things, stuff' can be produced with a geminate [bː]: [rɔbːa], and the duration of geminates is typically phonetically longer than in Standard Italian'.
  • Gibson & Gil, Romance Phonetics and Phonology, p. 92: 'Some phonetic information for contemporary standard Italian is provided in Ladefoged and Maddieson (1996: 218–21). According to these authors, Italian intervocalic rhotics are realized as trills... The apical alveolar trill is also reported by Bertinetto and Loporcaro (2005: 133) to be the "unmarked allophone" of the rhotic phoneme in standard Italian'.
Also on the previous page: 'Contemporary standard Italian inherited its phonemic inventory from the Tuscan (Florentine) dialect and is phonetically very close to the variety spoken in Florence and other Tuscan areas...'
  • Ledgeway & Maiden, The Cambridge Handbook of Romance Linguistics, p. 240: 'As for standard Italian, stressed vowels in open syllables undergo lengthening, while all unstressed vowels are short...'
  • Maiden et al., The Dialects of Italy, p. 67: '...the dialectal data provided evidence of the CG constituent, while such evidence was lacking from the phonology of Standard Italian'. Nicodene (talk) 15:41, 25 July 2022 (UTC)
@Benwing2: About using accents as Italian dictionaries do, I'm not against that in principle. But by doing so we are being prescriptive. If that's OK here on Wiktionary, then let's go for it. In the past I was told that this is not accepted here on Wiktionary. @Eirikr, for instance, talking about indicating pitch accent for Japanese standard pronunciation, told me:
  • "I would like to draw your attention to the second point at Wiktionary:What_Wiktionary_is_not -- our aim with Wiktionary is to describe how words are used, not to prescribe how words should be used. Specifying a pitch accent pattern in all of our romanizations is overly specific and incorrectly prescribes pronunciation for that term."
  • "the pitch accent marked in most dictionaries is specific to the "standard" variety of Japanese, which is inappropriately prescriptive for our mis