Wiktionary:Beer parlour/2021/October


Definitions of Letters

As words of a particular language, many letters have definitions such as "the second letter of the Welsh alphabet". (The Welsh entries themselves are not quite so bad, as they also spell out the letter and give its predecessor and successor.) Such definitions are intrinsically unstable, because letters may be inserted into an alphabet or removed from it. For example, the letter 'j' has been added to the Welsh alphabet since I was a child, and as a result of drawing on different sources we now have the opening definition "the fourteenth letter of the Welsh alphabet" for both J and L! Likewise, as a result of the deletion of letters, both Ll and N are defined as 'the 14th letter of the Spanish alphabet'. --RichardW57m (talk) 11:11, 1 October 2021 (UTC)
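To make the instability concrete, here is a small illustrative Python sketch (not anything that runs on Wiktionary) that derives each ordinal from an alphabet list. The Welsh list is the modern 29-letter alphabet, shown with and without j; the Spanish list is only the opening stretch of the older alphabet in which ch and ll counted as letters, deliberately truncated at n.

```python
# Illustrative only: ordinals computed from an alphabet list shift whenever
# a letter is inserted into, or removed from, that list.

# Modern Welsh alphabet (29 letters), shown first without "j".
WELSH_WITHOUT_J = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
    "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th", "u", "w", "y",
]
# Insert "j" after "i" to get the current alphabet.
WELSH_WITH_J = WELSH_WITHOUT_J[:13] + ["j"] + WELSH_WITHOUT_J[13:]

# Only the opening letters of the older Spanish alphabet, in which
# "ch" and "ll" counted as letters; truncated at "n".
SPANISH_OLD_PREFIX = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i",
                      "j", "k", "l", "ll", "m", "n"]
# Dropping "ch" and "ll" shifts every later letter down.
SPANISH_NEW_PREFIX = [x for x in SPANISH_OLD_PREFIX if x not in ("ch", "ll")]


def ordinal(alphabet, letter):
    """Return the 1-based position of a letter in the given alphabet list."""
    return alphabet.index(letter.lower()) + 1


if __name__ == "__main__":
    print(ordinal(WELSH_WITHOUT_J, "L"), ordinal(WELSH_WITH_J, "J"))  # 14 14
    print(ordinal(SPANISH_OLD_PREFIX, "Ll"), ordinal(SPANISH_NEW_PREFIX, "N"))  # 14 14
```

Two sources that disagree only about whether j (or ch and ll) is in the list will each produce a "14th letter" for a different letter, which is exactly the J/L and Ll/N clash described above.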

I therefore feel that it would be appropriate to change the definitions of one-character letters from "the nth letter of the WW alphabet" to "the letter of the WW alphabet used as the header word of this entry", and to add "It is the nth letter of the WW alphabet" to the "Trivia" section of the entry. History may cause the Trivia section to expand. Multi-character letters would be handled by analogy. As boldly making this change might be considered vandalism, what do people feel about this proposal? Does it need a vote? --RichardW57m (talk) 11:11, 1 October 2021 (UTC)

Should we be documenting the use of letters in non-additive numbering systems, such as 'Section 5(c)'? The most significant feature of such systems is that some letters are not used in such lists. I can see an argument that such documentation belongs in a grammar rather than a lexicon. --RichardW57m (talk) 11:11, 1 October 2021 (UTC)

I feel like this discussion will be pointless if the vote about letter entries passes. Thadh (talk) 11:23, 1 October 2021 (UTC)
@Thadh: How so? Are you assuming that all the letter entries of a language can be squeezed into a single table? --RichardW57m (talk) 12:30, 1 October 2021 (UTC)
@RichardW57m: Not necessarily in a table, but they probably won't look the same way they do now, so it doesn't make much sense to discuss how they look in entries before we know where the vote is heading. Thadh (talk) 13:47, 1 October 2021 (UTC)

Rhyming categories for Middle Chinese

I think all the data for Middle Chinese rhymes are already there; those data were sourced from rhyme dictionaries in the first place. Is there a plan for actually implementing Middle Chinese rhyming categories? This may even be a fairly good case for automation. --Frigoris (talk) 16:53, 1 October 2021 (UTC)

HSK lists of Mandarin words update

Currently, Wiktionary has Appendix:HSK list of Mandarin words accumulating all the vocabulary of the old (pre-2010) HSK test. Recently, the exam was reformed, and the lists of words and characters were published. See this pdf for official specifications. Thus, I propose to update the appendix.

I made drafts of the new HSK word lists:

HSK Beginner (levels 1-3): all three levels

HSK Intermediate (levels 4-6): level 4, level 5, level 6

HSK Advanced (levels 7-9): a-h, j-s, sh-zh

The words are OCRed from the paper, and then converted into traditional characters with some manual corrections. I think some proofreading is still needed.

The following problems arise here:

  1. What should be done with the old appendix?
  2. How should the new appendix be divided? The current version of the HSK has 9 levels grouped in 3 ranks. The high levels (7-9) are not delimited, but they contain roughly as many words as all the preceding levels combined (5636 vs 5456). Note that it's computationally heavy to have a huge number of words in Template:zh-l on a single page.
  3. There is a category tied to the old word lists, see Category:Mandarin by difficulty level. You may want to reorganize it.
  4. Many words in the HSK can be considered SoPs, and some of them were previously deleted on that ground (see the red links on my drafts).
  5. Many words in the HSK have optional erhua. How should they be listed in the new Appendix?
  6. I think everyone would agree on the inclusion of traditional forms of the words, but what should be done about the variant pronunciations (Taiwanese or colloquial Mainland) not listed in the official HSK paper? Should they also be included? --YousuhrNaym (talk) 23:51, 3 October 2021 (UTC)

Let's talk about the Desktop Improvements



Have you noticed that some wikis have a different desktop interface? Are you curious about the next steps? Maybe you have questions or ideas regarding the design or technical matters?

Join an online meeting with the team working on the Desktop Improvements! It will take place on October 12th, 16:00 UTC on Zoom. It will last an hour. Click here to join.

Agenda:
  • Update on the recent developments
  • Sticky header - presentation of the demo version
  • Questions and answers, discussion


The meeting will not be recorded or streamed. Notes will be taken in a Google Docs file. The presentation part (first two points in the agenda) will be given in English.

We can answer questions asked in English, French, Polish, and Spanish. If you would like to ask questions in advance, add them on the talk page or send them to sgrabarczuk@wikimedia.org.

Olga Vasileva (the team manager) will be hosting this meeting.

Invitation link

We hope to see you! SGrabarczuk (WMF) 15:09, 4 October 2021 (UTC)

Unifying the transliteration of ʾalef and ʿayin in Semitic languages

Dear Wiktionary Semitists, I'd like to bring to your attention the current lack of consistency in how ʾalef and ʿayin are transliterated across Semitic languages. Have a look at the following pages and compare transliterations, for example:

  1. Reconstruction:Proto-Semitic/ʕaśar-#Descendants.
  2. Reconstruction:Proto-Semitic/tišʕ-
  3. Reconstruction:Proto-Semitic/šabʕ-

The inconsistency is both inter- and intra-linguistic. It is quite confusing, and since it's basically just a stylistic question, I'd like to start a discussion on whether we should unify to the more traditional (but not user-friendly, since they're small and difficult to tell apart) /ʾ/ and /ʿ/ or the more modern (and much more user-friendly) /ʔ/ and /ʕ/. Opinions? Thoughts? Let's discuss! —This unsigned comment was added by Sartma (talk • contribs) at 12:22, 5 October 2021 (UTC).

For Amharic, ʾ and ʿ are the ones in use, and since these aren't contrastive, I would like to keep following that practice. I don't have a strong opinion on other Semitic languages, but /ʔ/ and /ʕ/ do seem more user-friendly in languages where that distinction is relevant. Thadh (talk) 19:30, 5 October 2021 (UTC)
I'd rather have consistency between languages that frequently appear together, like the Ge'ez-script languages or Arabic topolects. I don't see any reason why there should be consistency between all Semitic languages, which only appear next to each other on protolanguage entries. —Μετάknowledgediscuss/deeds 20:32, 5 October 2021 (UTC)
In my own handwritten notes I find I'm using the IPA symbols, as they are just clearer. We don't have to use pure IPA in transcriptions, but the traditional little curly apostrophes, barely readable in a printed book, become impossible in a computer typeface. The IPA symbols magnify them and make them readable. If you're going to use š rather than sh in transcriptions, you're halfway to pure phonetic symbols. The apostrophes are appropriate for semi-technical formats like maps and history books, but for a more linguistic purpose, use clear, readable, unambiguous symbols. --Hiztegilari (talk) 20:58, 5 October 2021 (UTC)
I support the IPA symbols except for the Gəʿəz-script languages, in which field the half rings seem uncontested and where, as mentioned, the distinction is also less contrastive. For Akkadian I don’t know. Fay Freak (talk) 22:57, 5 October 2021 (UTC)
I have seen that in the Routledge volume The Semitic Languages, most authors use ʔ and ʕ even when they use conventional non-IPA symbols otherwise, e.g. ʔǝgziʔ-ä sämay yä-ṣnǝʕ mängǝśt-ǝyä (Butts, chapter "Gǝʕǝz"). I don't know if this is a general trend, but consistently using ʔ and ʕ in place of ʾ and ʿ is nothing unseen. –Austronesier (talk) 10:33, 6 October 2021 (UTC)
@Austronesier: True, I remember these fashionable books. They owe this to their character as general overviews, whereas Wiktionary’s mission is to document the individual languages in detail, as one does when dealing with a narrow selection of languages. I framed the field as Ethiopian studies (Äthiopistik). While this is an Orchideenfach (a small, niche discipline) and I do not know the people who study it nowadays, I have little doubt that the bulk of the field would be dismayed by a deviation from the particular transcription system which we currently apply automatically, and which the Wikipedia article on Geʽez script follows and presents without question or any glance at an alternative (nobody looks for an article like Romanization of Arabic for Ethiopian Semitic). Ethiopists would rather refrain from any change to it. Fay Freak (talk) 16:05, 6 October 2021 (UTC)
@Fay Freak: Good point. I can confirm from my very own experience that editors of such overview volumes set standards which contributors wouldn't normally follow in more specialized works: e.g. I was urged to change the name of a language to make it conform to the ISO standard (in that particular case a real abomination). Since you say that the Gəʿəz transliteration in that book was adjusted to an in-volume standard that is otherwise uncommon, I agree we shouldn't really follow it. –Austronesier (talk) 16:28, 6 October 2021 (UTC)
I've just picked Al-Jallad as an example: in the Routledge volume, he uses ʔ and ʕ in the Safaitic chapter; but in his Safaitic grammar (Brill), he naturally uses ʾ and ʿ. –Austronesier (talk) 16:46, 6 October 2021 (UTC)
@Austronesier, Fay Freak, Thadh, Metaknowledge OK, it looks like the majority is fine with using different signs depending on the language. But what about those languages that don't seem to have a standard at the moment, like Aramaic (its various varieties), Hebrew, and Arabic and its topolects? To be honest, despite much preferring ʔ and ʕ, I'm more than happy to unify everything to ʾ and ʿ. In the end, there's no real "tradition" that uses ʔ and ʕ; they are just the more "modern" style. To me it's really strange to see Standard Arabic using ʾ/ʿ and other Arabic topolects using ʔ/ʕ, for example. There's no reason why it should be like this. What shall we do? Sartma (talk) 15:16, 7 October 2021 (UTC)
ʔ and ʕ: the easier choice where standardization is less relevant. I made the exception only for Ethiosemitic, which is separated by a sea anyway; I think it will not vex you if we have ʾ and ʿ for Ethiosemitic and ʔ and ʕ elsewhere. Fay Freak (talk) 15:31, 7 October 2021 (UTC)
Arabic needs input from a great deal more people than will see and interact with this; we'd want a dedicated discussion at Wiktionary talk:About Arabic. As for Aramaic, it will never be completely unified, because some of the modern neo-Aramaic varieties have romanisation traditions that emerged independently from scholarly usage, and should be left as they are. For the long-extinct Aramaic varieties, we can do as we like, and though ʾ and ʿ are the closest we have to a standard for them, I would be happy to switch them over to ʔ and ʕ, although that could be putting the cart before the horse, in that most of the entries don't have romanisation at all and the scheme isn't completely settled anywhere. —Μετάknowledgediscuss/deeds 17:37, 7 October 2021 (UTC)

Request for new language family and proto-language codes: North Halmahera / Proto-North Halmahera

User:Alexlin01 and I (or rather, mostly Alexlin01, who has been active as an IP in the past) have started to add lemmas from languages of the North Halmahera family, together with etymologies from reconstructed proto-forms. There is an existing corpus of 180 proto-forms available, and we might carefully add more reconstructions based on regular sound correspondences.

The North Halmahera languages are part of the proposed West Papuan macrofamily which has the code [paa-wpa] in WT. While West Papuan is still tentative and only based on resemblance sets, North Halmahera is universally accepted, since it is as self-evident as e.g. the Slavic languages. Therefore, we request a code for North Halmahera and Proto-North Halmahera. North Halmahera would be under [paa-wpa] (West Papuan), and include the following languages:

  • Galela [gbi]
  • Gamkonora [gak]
  • Ibu [ibu]
  • Kao [kax]
  • Laba [lau]
  • Loloda [loa]
  • Modole [mqo]
  • Pagu [pgu]
  • Sahu [saj]
  • Tabaru [tby]
  • Ternate [tft]
  • Tidore [tvo]
  • Tobelo [tlb]
  • Tugutil [tuj]
  • Waioli [wli]
  • West Makian [mqs]

Currently, they are under [paa-wpa] (West Papuan) or the generic [paa] (Papuan). –Austronesier (talk) 07:35, 6 October 2021 (UTC)

Hi! Also, of these, Ibu is already extinct. Alexlin01 (talk) 14:34, 6 October 2021 (UTC)
@Austronesier Created paa-nha and paa-nha-pro. DTLHS (talk) 03:10, 8 October 2021 (UTC)
@DTLHS Great, many thanks! –Austronesier (talk) 08:43, 8 October 2021 (UTC)

Inconsistent treatment of Arabic words in Persianate languages

(Notifying AryamanA, Atitarev, Benwing2, Smettems, Kutchkutch, Bhagadatta, Msasag, Svartava2, Getsnoopy): @Allahverdi Verdizade

There is an inconsistency in the treatment of Arabic words in Persianate languages.

  • In South Asian languages, the proximal donor is given as Persian.
  • In Turkic languages (especially Turkish and Azeri), the proximal donor is given as Arabic.

For example, Hindi किताब (kitāb) is given as coming from Classical Persian کتاب (kitāb), while Azerbaijani kitab or Uzbek kitob is given as ("ultimately") coming from Arabic كِتَاب (kitāb) with no mention of Persian.

Could this be resolved one way or another? I suppose it's a bit iffier for Anatolian Turkish, given that the Ottomans had direct contact with Arabic-speaking subject populations, but for Azeri Turkish or the Central Asian languages it should be the same situation as with South Asian languages, i.e. these words entered the language by means of a Persianate literati class who used both Persian and Arabic, but whose primary language of writing was the former.

My understanding is that there is evidence of Persian mediation for both South Asian and Turkic languages, e.g. Hindi फ़ुर्सत (fursat) meaning "spare time" or Turkish macera meaning "adventure". --Tibidibi (talk) 13:32, 6 October 2021 (UTC)

Also ping @Vox Sciurorum, @Fay Freak. --Tibidibi (talk) 13:50, 6 October 2021 (UTC)
I mark Ottoman Turkish and Turkish terms as derived from Arabic unless I have evidence that one was borrowed from Persian. If the word has been in Turkic languages since before the 13th century or so, I may assume it was borrowed from Persian. Nineteenth-century borrowings I assume were directly from Arabic, if not Ottoman coinages based on Arabic grammar. If there are any phonological or temporal guidelines to use, let me know. Vox Sciurorum (talk) 13:53, 6 October 2021 (UTC)
@Vox Sciurorum I think there is a stronger justification for having Ottoman terms be derived directly from Arabic because Persian was neither the language of the Ottoman administration nor that of any significant part of the population. For the Turkic languages east of the Ottoman-Safavid border, and for all South Asian languages, the influence of Persian as a prestige language was much more direct. --Tibidibi (talk) 14:10, 6 October 2021 (UTC)
What Squirrels Voice said.
Also, I see zero value in clogging up the etymology of Arabic derivatives with an extra piece of information, which is hardly provable anyway, whether it came in through Persian or directly via bookish contexts. Allahverdi Verdizade (talk) 14:15, 6 October 2021 (UTC)
The Seljuk dynasty that invaded Anatolia after their victory in the Battle of Manzikert was a Persianate society. While Ottoman Turkish was not Persian, the language was replete with loanwords from Persian covering cultural and administrative terminology, while Arabic was the donor for many religious terms. Some of the Persian loanwords the Seljuks brought with them to Anatolia came from Arabic. It is IMO truly impossible to decide whether the proximate source of Ottoman Turkish فلسفه‎ was the (identically spelled) Persian term, or, directly, Arabic فلسفة‎. The choice not to mention Persian as a possible donor is then merely a choice for the sake of convenience, not a matter of principle. --Lambiam 17:25, 10 October 2021 (UTC)
The distribution of Persian has a cohesive epicentre, while Arabic has been scattered all around the world. Have you heard of Uzbeki Arabic? Samarqand clearly was a hotspot of Arabic communication; from there, Arabic-speaking tradesmen in low concentration reached Uyghuristan, in the vicinity of which Arabs learned words like خُتُو (ḵutū), on whose entry I included a quote where Samarqand occurs as a casual station of Arabic rulers. I don’t think one has to imagine communication mediated by Persian; contact was generally from the Arabic language to the Turkic language, and this view is the most parsimonious. Fitting this picture, Persian words tended to reach Mongolian only via Tibetan (!). For Anatolian Turkish it is only most prominent and most obvious, to a Westerner, that contact with Arabic was there, because Arabs were Ottoman subjects (but they were also subjects of Kipchaks and Turkmens before …). Fay Freak (talk) 15:38, 6 October 2021 (UTC)
@Fay Freak: Arab speakers in Khorasan are a small minority because the colonists there assimilated quickly. From The Cambridge History of Iran, Volume 4, page 602:
Alongside both the early dialects and dari, which had spread everywhere with a greater or lesser degree of local variation, Arabic had also taken root in Iran. It was of course the everyday language of the Arab immigrants: certain towns such as Dinavar, Zanjan, Nihavand, Kashan, Qum and Nishapur had a considerable Arab population and Arab tribes had also settled in Khurasan. However, these Arab elements were more or less rapidly assimilated: in the middle of the 2nd/8th century the majority of the Arabs in the army of Abu Muslim spoke dari.
In fact, the Islamic conquest led to the expansion of Persian and its replacement of local Eastern Iranian languages like Sogdian.
Since major urban centers such as Bukhara and Samarqand were clearly predominantly Persophone by the period when the region was becoming increasingly linguistically Turkic, I don't see any justification for claiming that most Arabic loans in e.g. Uzbek are directly from the small community of native Arabic speakers instead of reflecting Arabic's position as a prestige language upheld by a primarily Persophone literati elite.
Chagatai, the direct literary ancestor of Uzbek, was marked by extensive Persian influence (to the point that some texts have virtually no Turkic content words) and became a literary language explicitly on the model of Persian in Timurid and Shaybanid courts, both of which retained Persian as the chief bureaucratic language. I understand that Chagatai has little additional Arabic influence beyond what is already systemically found in Persian. Tibidibi (talk) 16:16, 6 October 2021 (UTC)
The point was that there had been a constant latent presence of Arabic, not only as traces in Persian, whether the communities were more or less native or had become acquainted with it through trade, war or education. Arabic was never eradicated, and the influx was continuously renewed. In India, by contrast, this latent presence was lacking: Arabic was really remote and reserved for the educated. Oddly, of course, Persian scholars wrote Arabic (for Samarqand I think of Najib ad-Din Samarqandi) while Indians wrote Persian; does this tell us something about the question of this thread? So in the former region, borrowings could more often be directly from Arabic, owing to some familiarity with it. Fay Freak (talk) 16:30, 6 October 2021 (UTC)
If you actually read about Central Asian Arabic, you'll see that its dialects bear signs of close ties to dialects in Arab countries, which allows us to reconstruct migration events. This is clearly inconsistent with a "constant latent presence" of actual speakers (as opposed to scholars and clerics, who could only influence the language on a literary or religious level). For Indian and Central Asian Turkic languages, there is no reason not to assume a Persian intermediary unless specific evidence is brought to bear for a given word; for Turkish and Azerbaijani, I don't think it's generally knowable. —Μετάknowledgediscuss/deeds 03:30, 8 October 2021 (UTC)
@Metaknowledge Why do you think it's unknowable for Azerbaijani? I'm not really sure what the major difference would be between Azerbaijani and Chagatai vis-a-vis their relationship to Arabic/Arabs and Persian/Persians. Tibidibi (talk) 14:09, 10 October 2021 (UTC)
Because the West Oghuz tribes have actually been geographically adjacent to Arabs since around 1000 AD. Allahverdi Verdizade (talk) 10:53, 13 October 2021 (UTC)

Romanization pages for Mandarin and Cantonese - possible update task for a bot?

Currently, the various romanization pages for Mandarin Pinyin and Cantonese Jyutping are in a poor state. I presume that, due to the quantity and ancillary nature of such entries, many lack up-to-date content covering common characters, and the presentation of the relevant characters is inconsistent. Some examples:

  • For 烹, the pinyin entry pēng shows characters such as 硷 and 軽, which are simplified or variant forms but the linked traditional forms do not show this pronunciation. In the case of 軽, this character is more commonly recognised as Japanese Shinjitai since the regularly observed Chinese forms are 輕 and 轻.
  • paang1 does not show 烹 at all
  • xiǎn shows in list items 5 崄 and 6 嶮 which are the simplified and traditional version of the same character, while lower down item 23 lists 猃, 獫 together.
  • Also in xiǎn, item 17 濁 is shown but the simplified form is not included.

This seems to be a good target for a bot to update the entries if it is able to take all the existing pinyin and Jyutping pronunciations for all characters and to update the entries systematically, while also standardising the presentation of simplified and variant character forms. A good example to reference is shí which has a good number of entries (however I'm not sure if it includes all) and most entries list the traditional and simplified forms together. This entry does however list item 2 as "実, 实, 實, 寔", which is a bizarre ¿alphabetical? order of Shinjitai, simplified, traditional and variant characters. As for item ordering, it might seem like it is ordered by radical and stroke - this might be something that needs consideration for standardisation of the romanisation entries.
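As a rough idea of what the core of such a bot might do once the data has been collected, here is a minimal Python sketch. Everything in it is hypothetical: the sample `READINGS` and `TRAD_TO_SIMP` tables stand in for data that would have to be scraped from the character entries, and the function names are illustrative, not part of any existing Wiktionary bot or module. It assumes a one-to-one traditional/simplified mapping, which does not hold in general (e.g. 干 is the simplified form of several traditional characters).

```python
from collections import defaultdict

# Hypothetical sample data; a real bot would scrape these from the
# pronunciation sections of the existing character entries.
READINGS = {
    "嶮": ["xiǎn"],
    "崄": ["xiǎn"],
    "獫": ["xiǎn"],
    "猃": ["xiǎn"],
    "烹": ["pēng"],
}
# Hypothetical traditional -> simplified pairs (assumed one-to-one here).
TRAD_TO_SIMP = {"嶮": "崄", "獫": "猃"}


def build_reading_index(readings, trad_to_simp):
    """Map each reading to a list of items, where an item is a traditional
    character optionally followed by its simplified form."""
    simp_to_trad = {simp: trad for trad, simp in trad_to_simp.items()}
    heads_by_reading = defaultdict(set)
    for char, prons in readings.items():
        # Fold simplified forms onto their traditional head so a pair is listed once.
        head = simp_to_trad.get(char, char)
        for pron in prons:
            heads_by_reading[pron].add(head)

    index = {}
    for pron, heads in heads_by_reading.items():
        # Code-point order roughly follows radical-and-stroke order within
        # the main CJK Unified Ideographs block.
        index[pron] = [
            (head, trad_to_simp[head]) if head in trad_to_simp else (head,)
            for head in sorted(heads)
        ]
    return index


def render_items(items):
    """Render items as comma-separated lines, e.g. '嶮, 崄'."""
    return [", ".join(item) for item in items]


if __name__ == "__main__":
    index = build_reading_index(READINGS, TRAD_TO_SIMP)
    for line in render_items(index["xiǎn"]):
        print(line)
```

Folding every simplified form onto its traditional head before grouping is what prevents the split listing seen at xiǎn (崄 and 嶮 as separate items, 猃 and 獫 together); sorting by code point is only a cheap stand-in for a proper radical-and-stroke sort.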

Would anybody be able to take on this task?

I can try to build such a bot, but I have not built bots before, and I believe it requires scraping the pronunciations off all the existing entries, which will be an arduous task in itself, even if done with automation.

Zywxn (talk) 17:14, 6 October 2021 (UTC)[reply]

User TheNicodene - revert war to hide unresolved abuse

The user is trying to obstruct my efforts to bring attention to, and address, the abuse they've perpetrated against me, by deleting and archiving the discussion at Talk:formaticus. They're trying to hide the abuse and break the existing links in other discussions. The issue is not resolved and cannot be archived until it is. I request that this user be blocked if they continue the edit war. Brutal Russian (talk) 05:31, 7 October 2021 (UTC)

I did not 'hide' the discussion; that is a flat-out lie which can be disproved by clicking the link. I placed the discussion in an archive and added a link at the top of the talk page; doing so with discussions over 75000 bytes, in order to free up space for new discussions, is standard Wiki practice. The discussion has not even been replied to for four months now. Nor did archiving it 'break links', which is another flat-out lie. Talk:formaticus functions exactly as it always did.
See here for a write-up of only some of the insults this user has thrown at me over several months, for which he has even been temporarily blocked. I have no idea why he is suddenly acting up again after a merciful three-month hiatus. The Nicodene (talk) 05:53, 7 October 2021 (UTC)

Macedonian: standard, non-standard, misspelling

@Chuck Entz, Erutuon, Metaknowledge Since I am now creating entries for non-lemma forms of verbs, I would like to discuss some issues relating to the treatment of non-standard and misspelled words. We scratched the surface with User:Erutuon in August, but there are quite a lot of problems to be addressed:

Currently, my entries are formatted as follows:

Assigned to: verbs, lemmas (I am omitting less relevant categories)
Assigned to: misspellings, non-lemmas
Assigned to: participles, non-lemmas
Assigned to: participles, misspellings, non-lemmas
  • очерупа - nonstandard word, non-lemma: "verb" in the headword line, {{lb|mk|nonstandard}} in the definition
Assigned to: verbs, non-standard terms, lemmas
Assigned to: participles, non-lemmas

The problems are as follows:

  • It is also possible to treat корегиран as a misspelling of коригиран, i.e. to link two non-lemma forms to each other, rather than defining each as an inflected form of a lemma. I have always tended to opt for the second solution, including with categories other than participles.
  • Putting "misspelling" in the headword line of a misspelled verb lemma prevents it from being assigned to "verbs", but putting "misspelling" in the headword line of a misspelled participle (non-lemma form) of a verb does not prevent it from being assigned to "participles", because the parameter "part" inside {{infl of}} seems to populate that category.
  • "misspelling" does not distinguish between misspelled lemmas and misspelled non-lemmas.
  • Non-lemma forms of non-standard words are not labelled in any way to indicate that they are non-standard, because if I write {{lb|mk|nonstandard}}, they will get categorized as non-standard terms, which is wrong (they are not terms but non-lemmas), whereas if I write {{lb|mk|nonstandard forms}}, that will technically be correct, except that this label is used elsewhere for non-standard forms of standard words (comparable to English "goed", a non-standard preterite of the standard "go").

Further complications:

Participles have their own inflection, e.g. "коригираниот", which is the definite form. I do not want this to link back to the verb коригира; it is more appropriate for it to be defined as {{infl of|mk|коригиран||def|m|s}}. It will then be assigned to participle forms, with the help of the headword line {{head|mk|participle forms}}. However, if the inflected participle is misspelled as "корегираниот", it would be defined as {{infl of|mk|корегиран||def|m|s}} and the headword line would be {{head|mk|misspelling}}. Consequently, there would be nothing to assign "корегираниот" to participle forms. This would be a second inconsistency, in addition to the aforementioned one ("misspelling" suppresses the category "verbs" but not the category "participles"). Martin123xyz (talk) 11:48, 7 October 2021 (UTC)

Ideal solution:

Redefine the category system to have the following:

  • lemmas
  • non-lemma forms
  • misspelled lemmas
  • misspelled non-lemma forms
  • non-lemma forms of misspelled lemmas
  • non-standard lemmas
  • non-standard non-lemma forms
  • non-lemma forms of non-standard lemmas

Each of these would contain subcategories for "noun", "verb", "adjective" instead of "lemma", e.g. "misspelled nouns", "misspelled noun forms", "forms of misspelled nouns", etc. There would be separate headers for each, e.g. {{head|mk|noun}}, {{head|mk|misspelled noun}}, {{head|mk|form of misspelled noun}} (with abbreviations for easier typing).
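The eight-way split above is essentially a cross product: a standard lemma/non-lemma pair, plus, for each marked status (misspelled, non-standard), three categories. The following Python sketch is purely illustrative and hypothetical (it is not any existing Wiktionary module); it generates the proposed names for one part of speech, following the examples in the proposal ("misspelled nouns", "misspelled noun forms", "forms of misspelled nouns"). The exact wording of the non-standard names and the naive pluralisation are assumptions.

```python
# Hypothetical sketch: names for the proposed categories are a small cross
# product of spelling status and lemma / non-lemma distinctions.
MARKED_STATUSES = ["misspelled", "non-standard"]


def proposed_categories(lang, pos):
    """Return the proposed category names for one part of speech, e.g. "noun".

    Pluralisation is naive (pos + "s"), which works for noun, verb,
    adjective and participle.
    """
    plural = pos + "s"
    names = [f"{lang} {plural}", f"{lang} {pos} forms"]  # lemmas, non-lemma forms
    for status in MARKED_STATUSES:
        names.append(f"{lang} {status} {plural}")           # e.g. misspelled lemmas
        names.append(f"{lang} {status} {pos} forms")        # e.g. misspelled non-lemma forms
        names.append(f"{lang} forms of {status} {plural}")  # e.g. non-lemma forms of misspelled lemmas
    return names


if __name__ == "__main__":
    for name in proposed_categories("Macedonian", "noun"):
        print(name)
```

Generating the names from two short lists makes it easy to see that adding another status (or another part of speech) multiplies the number of categories, which is the maintenance cost the proposal acknowledges.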

For dealing with non-lemma forms of non-lemma forms, like the declined forms of Macedonian participles, we would need the following:

  • participles < verb forms
  • misspelled participles < misspelled verb forms
  • participles of misspelled verbs < non-lemma forms of misspelled verbs
  • non-standard participles < nonstandard verb forms
  • participles of non-standard verbs < non-lemma forms of non-standard verbs
  • participle forms
  • forms of misspelled participles
  • forms of participles of misspelled verbs
  • forms of non-standard participles
  • forms of participles of non-standard verbs

This is in my opinion the maximal categorization that we arrive at when we take into account all the relevant factors that my creating Macedonian entries has brought to the fore so far. Any other system, including the current one, seems to me to be bound to blur at least one of the empirically established distinctions highlighted above.

I am assuming that no one will be happy to implement such a categorization system, but the overview I have provided above should still be helpful for keeping track of what exactly the current system obscures and coming up with improvements addressing individual problems only. Needless to say, the distinctions that I have presented will also apply to many other languages.

Pending improvements, I would like to ask whether the way I format the six types of entries listed at the start of this post is appropriate for the time being, or whether there is something I could, or even should, do better according to Wiktionary policies. Martin123xyz (talk) 11:48, 7 October 2021 (UTC)

In my opinion, a misspelt noun or verb is still a noun or verb, and should be categorised as such. Converting the header line of a lemma to the header line of a misspelling is Visigothism, even if committed by @Equinox, and in English it loses the mentions of inflections that one could otherwise find by searching. {{misspelling of}} provides the appropriate information and categorisation. --RichardW57 (talk) 02:52, 8 October 2021 (UTC)
When adding "misspelling" to the header line in addition to using {{misspelling of}}, I was complying with the instructions provided at Wiktionary:Misspellings. However, your suggestion resolves the two inconsistencies I referred to above. Martin123xyz (talk) 07:03, 8 October 2021 (UTC)
My thought on reading that is 'Quo warranto?' (by what warrant?). I don't know whether to amend Wiktionary:Misspellings, tag it as unadopted or simply request its deletion. Can anyone justify not treating misspelt English verbs as verbs? One problem is that a manual maintenance action needed for verbs will not happen simply because misspelt verbs are not listed as verbs. --RichardW57 (talk) 08:03, 8 October 2021 (UTC)
Requesting its deletion without providing new instructions would not be helpful. As long as there are some instructions, at least a certain degree of consistency between different users' contributions is ensured. And if you leave it as it is, more users will find it, assume that it is an official policy which enjoys the consensus of the community, and continue to adhere to it. Either way, the instructions for contributors regarding things like "misspellings" need to be significantly expanded: currently they are simplistic, in addition to being biased in favour of English entries. I am considering writing a user guide for Macedonian contributions, except that so many things are unregulated or poorly regulated on the English Wiktionary as a whole that I would need to make my own arbitrary decisions or keep asking here about every point. Martin123xyz (talk) 10:04, 8 October 2021 (UTC)
'Term' covers both lemmas and non-lemmas. --RichardW57 (talk) 02:52, 8 October 2021 (UTC)
Full information about a non-lemma should be given under the lemma; one would not wish to repeat the multiple meanings of a lemma for its inflected forms. Accordingly, it should suffice to record that something is the inflected form of a non-standard term by recording the non-standardness at the parent term itself. --RichardW57 (talk) 02:52, 8 October 2021 (UTC)
Thank you for the input. Martin123xyz (talk) 07:03, 8 October 2021 (UTC)

I have noticed a further problem: not only is "nonstandard form" ambiguous between "inflected form of a nonstandard lemma" and "non-standard form of a standard lemma", it can also be understood as "nonstandard equivalent/variant of a standard lemma" (on the analogy of "alternative form of"). I had used it in this sense at допринесува recently. Regrettably, {{nonstandard form of}} does not address this three-way ambiguity. Martin123xyz (talk) 14:00, 8 October 2021 (UTC)[reply]

I just created a page for витруелен (vitruelen), using {{head|mk|misspelling}} and {{misspelling of|mk|виртуелен}}, and the entry appears in Category:Macedonian non-lemma forms and Category:Macedonian misspellings, which is wrong, because the word is a misspelled lemma, not a non-lemma form. Maybe we need to use {{head|mk|misspelled lemma}} instead, and put those entries in Category:Macedonian misspelled lemmas? Gorec (talk) 14:47, 8 October 2021 (UTC)[reply]
The argument for using misspelling as a part of speech actually argues for splitting the lemma categories into misspelt and 'correctly' spelt lemmas. I'd rather add a parameter to {{mk-noun}} and {{en-verb}} etc. I'm waiting for an old hand to weigh in. --RichardW57 (talk) 16:48, 8 October 2021 (UTC)[reply]


As I've suggested before, we should establish an arbitration committee (much like the one Wikipedia has) to settle entrenched disputes among users. The finer details can be discussed later, but in general, is there any considerable support for this proposal? Imetsia (talk) 19:14, 8 October 2021 (UTC)[reply]

There is on my part! Of course we hope not to have any disputes at all, but as the previous year has shown, they are inevitable in a project of our size. Thadh (talk) 19:36, 8 October 2021 (UTC)[reply]
Just as seatbelts and airbags have led to more automobile accidents, creating an arbitration committee is guaranteed to lead to more intransigence. Participants in such disputes are all fairly confident that they are in the right and that their PoV will be the prevailing one, with only minor concessions to the other side. Also, there will be less avoidance of potentially controversial edits and other changes because one's point of view will be perceived as more likely to prevail. DCDuring (talk) 20:23, 8 October 2021 (UTC)[reply]
I think I should clarify: I don't know how WP's arbitration works, but my idea was similar to what Vox Sciurorum proposes below. I think we ought to have some system where unaffiliated admins can resolve ongoing disputes. Thadh (talk) 10:36, 11 October 2021 (UTC)[reply]
ArbCom over at Wikipedia has not been a roaring success. It is very important that we recognise that the way their judicial system works is not ideal, it is simply how things happened to play out. Their ArbCom has three distinct purposes: policy, block appeals, and conflict resolution. There is no reason that one body should decide on all three, nor is this necessarily a good thing. As it stands, Wiktionary is much more democratic than Wikipedia, and we handle more policy through votes. I think this should remain the case. So the question is then whether block disputes (not just appeals, which are usually spurious, but where admins are actually in disagreement) and conflict resolution could be handled better than they are now, and at what cost. I think we could do better, so this idea has some merit — but we would also create a venue for the bickering that already distracts from the actual work of editing, and this has been a major effect of Wikipedia's ArbCom. —Μετάknowledgediscuss/deeds 20:39, 8 October 2021 (UTC)[reply]
  • Support because this would prevent long endless disputes like the recent one ({{inh+}} & {{bor+}}). Svartava2 (talk) 06:08, 9 October 2021 (UTC)[reply]
I agree with Μετα that WP:ArbCom is not as functional as one might wish, and with DCD that the laudable intention of avoiding arbitrariness in arbitration has led to rule codification paving the road to hell, or rather to endless wikibickering. We should be careful what we wish for. A dispute over a deep disagreement can be held in an amicable way; what made recent disputes unpleasant were the sometimes implied, often straightforward accusations of bad faith cast at the other side. Perhaps an etiquette committee might do some good.  --Lambiam 16:57, 10 October 2021 (UTC)[reply]
I don't like the idea. I know I'm a bit of a handful but it's not "I don't want to be officially reprimanded" (I don't care if I'm officially reprimanded, that's fine), it's more, as Meta suggests above, I think that creating a special little judicial system-in-system does more to foster bullshit than it does to fix actual project issues. Equinox 17:22, 10 October 2021 (UTC)[reply]
It would be useful to have a way to resolve disputes where neither of two contradictory and strongly-held positions has supermajority support. I doubt a formal arbitration committee is the way. Maybe we can find a less formal way to have senior administrators cut the knot in cases like derivation wording without having every vote appealed to them. Vox Sciurorum (talk) 18:44, 10 October 2021 (UTC)[reply]
Say the proposal is instead to create "Wiktionary:Requests for Arbitration," where users can make their case, and well-established editors can vote in support of one disputant or another. I'd imagine this would be very similar to how we run RFD - no committees, formal procedures, rules of evidence, etc. And by the end of one month, we count the number of votes and act according to what the majority decides. Is this a "less formal way" that you'd support? (Really, this question goes to all users in this discussion who don't like the idea of forming an ArbCom). Imetsia (talk) 23:23, 13 October 2021 (UTC)[reply]
@Vox Sciurorum, Metaknowledge? —⁠This unsigned comment was added by Imetsia (talkcontribs).
The problem is that this doesn't differ much from a simple vote... I really do think we ought to restrict the solving of such disputes to the (uninvolved) administrators. Thadh (talk) 21:43, 15 October 2021 (UTC)[reply]
This solution introduces so many new problems that it more than counterbalances the ones it solves. I think that instead of throwing half-baked ideas at the wall and seeing what sticks, it's worth asking what you really want and how to achieve that. If what you want is to know whether you're allowed to use {{bor+}}, then I would say that you're going about it the wrong way — a Supreme Court shouldn't be making policy. —Μετάknowledgediscuss/deeds 22:14, 15 October 2021 (UTC)[reply]
The + templates situation would have been something an arbitration committee could have helped solve. However, it is a moot case at this point, and I wouldn't use a proposed ArbCom to continue to litigate it. For a more current issue, I'd point to the Brutal Russian versus TheNicodene complaints, even though I have no personal stake in that issue and am very unfamiliar with the fact pattern. Again, a board of well-established users voting in his favor/opposition is one possible avenue to put this issue to rest once and for all. Indeed, I think it is the best way to resolve the two above issues declaratively. Such conflict-resolution is squarely in the province of a judicial branch, whose sole purpose it is to interpret policy and settle disputes among litigants. But ultimately, I also understand the objections (though I still think the benefits outweigh the detriments), and I won't continue to pursue the creation of an arbitration committee in spite of myself. Imetsia (talk) 23:44, 15 October 2021 (UTC)[reply]
  • I share concerns that establishing a bureaucratic structure here with formal committees probably wouldn't help in the way proponents are hoping. I worry about the risk of "borrowing trouble", as a wiser fellow expressed to me a while back. ‑‑ Eiríkr Útlendi │Tala við mig 21:39, 13 October 2021 (UTC)[reply]
With the number of people actively in this community, an arbitration committee would feel like a sitcom or Alice in Wonderland trial, where there's an argument and someone puts on a wig and a fine bit of farce is had that satisfies nothing. The English Wikipedia ArbCom works in part because the Committee is not tangled up in all the issues that reach them; I can't see that happening here. Referring our issues to the English Wikipedia ArbCom might work.--Prosfilaes (talk) 23:40, 13 October 2021 (UTC)[reply]
I am deeply reticent to refer any EN Wiktionary concerns to the EN Wikipedia ArbCom. Our organizational cultures and norms are very different. We've had various issues arise because Wikipedia editors engage here, based on Wikipedia norms, requiring much cleanup and coordination. I can't imagine that issues referred to the WP ArbCom would be handled with any ease. ‑‑ Eiríkr Útlendi │Tala við mig 02:48, 14 October 2021 (UTC)[reply]
Like Prosfilaes, I don't think we have a big enough active editor base to have an Arbcom. I like the suggestion that if there's an intractable issue where neither position can get supermajority support, or it's unclear what the status quo is (since votes are structured as changes to the status quo) but we have to do something, we should have a majority vote. It isn't without issues, but...it's an idea. I don't know if Wikipedia's Arbcom would be keen to accept cases from us, since they have a workload as it is, and they (or we) also might often feel they lacked the relevant expertise to judge things like disputes over what template wordings are best for a dictionary. For intractable disputes over blocks, we could ask global sysops to weigh in. - -sche (discuss) 01:33, 14 October 2021 (UTC)[reply]
Global sysops are just as bad as outsourcing to Wikipedia. In my experience, they generally neither know nor care about Wiktionary, and would probably be annoyed at the very suggestion of foisting another local task on them. —Μετάknowledgediscuss/deeds 18:00, 14 October 2021 (UTC)[reply]


This page survived RFD, but many users pointed out the need for a cleanup. Modernization/expansion from experienced editors is welcome. (Discussion here, to be archived at Wiktionary talk:Etymology.) Ultimateria (talk) 00:02, 10 October 2021 (UTC)[reply]

Wording of RFD banner

I propose that we change the banner message generated by {{rfd}} as follows:

Current text:

This entry has been nominated for deletion
Please see that page for discussion and justifications. Feel free to edit this entry as normal, though do not remove the {{rfd}} until the debate has finished.

Proposed new text:

This entry has been nominated for deletion
Please see that page for discussion and justifications. While voting is in progress, please do not edit this entry in a way that may alter or make unclear the apparent intention of votes already cast. Do not remove the {{rfd}} template until the debate has finished.

What do you think? Mihia (talk) 21:02, 10 October 2021 (UTC)[reply]

I noticed that someone put a noun sense under the verb sense of push and shove, which seemed like a good idea but made the voting less clear. None Shall Revert (talk) 06:56, 11 October 2021 (UTC)[reply]
Also wiki things are not supposed to be "votes" None Shall Revert (talk) 06:58, 11 October 2021 (UTC)[reply]
It does happen from time to time. I have observed several cases where fundamental changes have been made to the whole basis of an entry while voting is in progress, and moreover people sometimes do not even bother to mention that they have done this at the RFD discussion. So an entry is listed at RFD, people vote "Delete" let's say, and then the entry is completely changed or rewritten, or redirected maybe, with no notice, leaving the status of the pre-existing votes totally unclear. I definitely do not agree that we should simply say "Feel free to edit this entry as normal" on the RFD banner -- it's just a question of exactly what we do say. Rather than my suggestion above, we could say "please mention any substantial changes at the RFD discussion", but this still leaves the problem of what should be done with pre-existing votes that may no longer be applicable. Mihia (talk) 08:12, 11 October 2021 (UTC)[reply]

Alternative suggestion (a bit more permissive):

This entry has been nominated for deletion
Please see that page for discussion and justifications. You may continue to edit this entry while the discussion proceeds, but please mention significant edits at the RFD discussion and ensure that the intention of votes already cast is not made unclear. Do not remove the {{rfd}} template until the debate has finished.

Mihia (talk) 08:22, 13 October 2021 (UTC)[reply]

I like the last one. Ultimateria (talk) 17:16, 13 October 2021 (UTC)[reply]
I like this wording better than the first proposal. - -sche (discuss) 01:35, 14 October 2021 (UTC)[reply]
Likewise, I support this last wording. Imetsia (talk) 17:06, 14 October 2021 (UTC)[reply]
OK, I have implemented the second suggestion. Mihia (talk) 17:07, 14 October 2021 (UTC)[reply]

Proposal for new parameter in linking templates: "alternative script"

I suggest a new parameter for linking templates which will display alternative (non-lemma) script forms within parentheses. This is already partly done for Korean and Vietnamese through language-specific linking templates.

But these language-specific templates are not ideal because they lack most key functions (e.g. part of speech, literal meaning, suppression of transliteration) and cannot be integrated with other templates such as {{alter}}, {{syn}}, {{bor}}, etc.

An "alternative script" parameter would be useful for various languages:

  • In the case of Korean, especially formal or academic language, there is a very large number of Chinese-derived homophones. An example is 연기 (yeongi), whose entry currently features nine not uncommon and completely unrelated words:
연기 (演技, yeongi, “acting”), 연기 (煙氣, yeongi, “smoke”), 연기 (延期, yeongi, “postponement”), 연기 (緣起, yeongi, “dependent origination”), 연기 (年記, yeongi, “date of composition recorded on an artwork”), 연기 (年期, yeongi, “certain number of years”), etc.
A fully integrated "alternative script" parameter would allow far easier disambiguation of these. To a lesser extent, this is also true of Vietnamese.
  • Many languages are written in multiple scripts. On Wiktionary, one script is usually chosen as the lemma script, with the result that forms in the other script are neglected. For instance, the majority of Azerbaijani speakers live in Iran and primarily use the Arabic script, which has also been the script for most of Azerbaijani history. But this fact is neglected because all Azerbaijani lemmas are in the Republic's Turkish-based Latin script. The integration of an "alternative script" parameter would allow for a more equitable coverage of such languages in etymology or descendant sections, in translation charts, etc. Example:
current: {{m|az|Azərbaycan}} → Azərbaycan; new: {{m|az|Azərbaycan|altscr=آذربایجان‎}} → Azərbaycan (آذربایجان‎)
current: {{m|ks|کٲشُر}} → کٲشُر‎ (kạ̄śur); new: {{m|ks|کٲشُر|altscr=कॉशुर}} → کٲشُر‎ (कॉशुर, kạ̄śur)

Thoughts?--Tibidibi (talk) 07:11, 11 October 2021 (UTC)[reply]

I've found a similar need in Pali, where there are multiple scripts in use, and I anticipate a similar need for Sanskrit. The solution for Pali is documented by a full set of examples for {{pi-link}}, which generalises {{link}}. One complication there is that some Pali writing systems are ambiguous and that the Roman script is one of the major writing systems, so we end up with transliterations and Roman script equivalent sometimes having to be different. Generally we want to link to the Roman script equivalent, but sometimes it is not easily available, e.g. in inflection tables, which commonly link to the entries in the tables. Sanskrit has a similar but different complication. The Bengali script writing system is ambiguous, and Devanagari is the 'lemma' script. (Don't like the term, as we treat the equivalents in the other scripts as alternative forms, thus also lemmas.) For Pali I've built specialised forms of some linking templates on the standard templates, such as {{pi-alternative form of}} on {{alternative form of}}. I've independently encoded {{pi-nr-inflection of}}, which I ought to convert to build on the standard template using common generalisation logic. --RichardW57 (talk) 12:11, 11 October 2021 (UTC)[reply]
Note that my scheme treats the form in the alternative script as the primary input. --RichardW57 (talk) 12:11, 11 October 2021 (UTC)[reply]
Korean is an unusual case, where the hidden parameter to the conversion is meaning rather than pronunciation. --RichardW57 (talk) 12:11, 11 October 2021 (UTC)[reply]
@Tibidibi: It's a yes for me. Maybe with the possibility of adding a description before the alternative script, like they do in Serbo-Croatian entries (for example: dom#Noun_28). Sartma (talk) 08:27, 12 October 2021 (UTC)[reply]

Splitting Hebrew roots?

There are a bunch of homonymous Hebrew roots that mean completely different things but just happen to look the same, and there doesn't seem to be a way to distinguish between them. חילוני, התחיל and חלל don't really share a root, right? --The cool numel (talk) 08:47, 12 October 2021 (UTC)[reply]

I don’t see how the root of חילוני (khiloni, secular) can be ח־ל־ל‎, while that of חילון (khilún, secularization) is ח־ל־ן‎‎. I guess this is a typo. If we had pages for these roots, we could document several unrelated meanings like we do for other homonymous terms, such as fluke.  --Lambiam 04:30, 13 October 2021 (UTC)[reply]
@Lambiam: I'm pretty sure the root of חילון‎ is ח־ל־ן‎‎, as it's derived from חילוני‎, which is in turn just the root ח־ל־ל‎ with the pattern קִטְלוֹנִי (like צבעוני). What I'm talking about is splitting such root categories by meaning. --The cool numel (talk) 09:57, 13 October 2021 (UTC)[reply]
So I take it then the root is the inflectional root, not the etymological root. Doesn’t that make splitting categories by meaning much less interesting? IMO such splitting would best be done by creating subcategories of homonymous roots according to their different core senses, but deciding what these core senses are and recategorizing terms with homonymous roots accordingly will mean a lot of work for a very small bunch of active Hebrew editors.  --Lambiam 11:44, 13 October 2021 (UTC)[reply]

Adding DRAE links to all Spanish lemmas

There are currently ~18,500 lemmas with links to DRAE. There are an additional ~27,000 Spanish lemmas that do not currently have a DRAE link but do have a corresponding DRAE entry.

I can run a bot to add a "Further reading" section with a link to {{R:DRAE}} to the entries missing DRAE links. Would this be desirable or just annoying clutter? JeffDoozan (talk) 17:02, 13 October 2021 (UTC)[reply]

If you can match the entries accurately I don't see why it would be a problem. I routinely add them manually. – Jberkel 17:10, 13 October 2021 (UTC)[reply]
Huh, I expected more pages to have an entry. I think it's helpful! As I expand Spanish entries I could use it to filter out a set of "core" Spanish words to work on. Ultimateria (talk) 17:14, 13 October 2021 (UTC)[reply]
Only if the bot checks that the target of the link is a real definition. Today I saw several French entries where people added {{R:TLFi}} but the web site has no definition. Vox Sciurorum (talk) 18:26, 13 October 2021 (UTC)[reply]
Yes, it does. JeffDoozan (talk) 18:39, 13 October 2021 (UTC)[reply]

The phrasebook is in dire need of rules.

(Not referring to the CFI; that's another topic.) For languages that are both gendered and have polite forms, the translation boxes in most phrasebook entries are a mess. It's completely random whether:

  • ...only the polite, only the familiar or both versions are present.
  • ...these polite/familiar forms are qualified as such, whether this qualification comes before or after the entry and whether this qualification is called polite/familiar or formal/informal.
  • ...plural phrases are present.
  • ...all these forms are consistently present both in their male as well as their female forms (if applicable) and how those forms are annotated.
  • ...what the order of all these forms is.

My suggestions:

  • Decide whether to call it polite/familiar or formal/informal and then apply this consistently. See the inconsistencies in are you allergic to any medications
  • Split the translation box into two distinct ones in most articles (where applicable), one for familiar, one for polite forms. Languages that don't have this feature could either be automatically completed using a bot that copies over entries between the boxes or alternatively they could be barred from one of the boxes (maybe by introducing a new {{trans-top}} that only accepts languages with politeness distinctions).
    • If the above point doesn't happen, at least define a consistent scheme. Should the qualifier come before or after? Should entries without qualifiers in languages with politeness distinctions be allowed? What should come first?
  • Disallow plural translations.
  • Decide whether gender should be expressed using the gender parameter of {{t}} or using {{qualifier}}, then apply this consistently. See the inconsistencies between e.g. are you religious and are you single.

--Fytcha (talk) 02:36, 14 October 2021 (UTC)[reply]

I agree with all of this. But it's worth noting that in many languages, politeness and formality are not the same thing. In Korean, you can be politely informal and non-politely formal. Tibidibi (talk) 04:40, 14 October 2021 (UTC)[reply]
In that case, as I don't think it is within the scope of a phrasebook to give impolite phrases (except perhaps for phrases that are explicitly/obviously impolite), I would suggest that we stick with formal and informal and avoid any distinctions between politeness and impoliteness. Andrew Sheedy (talk) 05:53, 14 October 2021 (UTC)[reply]
I also agree with all the above, with the caveat that some languages, like Korean, have both a formal/informal and polite/familiar distinction. As you say, we can choose the most relevant one (I would probably keep polite/familiar for Korean too, since formal/informal is a distinction more pertinent to more restricted scenarios, but I guess Korean editors will make the call on that. Just a note: non-polite means "familiar" and doesn't mean impolite.). Sartma (talk) 09:12, 14 October 2021 (UTC)[reply]

Major opportunity for us to step in for word of the year

Heads up that OED are slipping. It's our time to strike. —Justin (koavf)TCM 16:36, 14 October 2021 (UTC)[reply]

"...observing that 'worms are all over the place' and 'everybody loves a good worm.'" Well, I'm sold. Ultimateria (talk) 16:52, 14 October 2021 (UTC)[reply]
In a way it would be funnier with the computing sense of worm (something like a virus), since I can imagine somebody really out of touch thinking this was a "new" hi-tech word of the 21st century! Equinox 10:13, 15 October 2021 (UTC)[reply]

New SOP policy idea

I propose adding a new SOP test at WT:Idioms that survived RFD. It would have a caption like "Terms whose parts are substitutable, but with which only a few variations greatly predominate. For instance, the word "air" in air resistance can be switched out for "wind," "snow," "water," "fluid," and others; but "air resistance" is the only widely used and attested form." (A better writer could improve some of the wording). Accordingly, I would name the test WT:AIR RESISTANCE/WT:AIR, although there are probably other entries to which this logic has been applied in RFD discussions. (Talk:idle threat comes to mind). There are also the ongoing discussions about rumor has it and puré de batata.

As a community, this is a justification that has previously won the day, so it makes sense to codify it. In addition, all of our SOP policies are essentially advisory and open to great interpretation (there are no bright-line rules), and I don't think this test would depart from that tradition. Lastly, this policy would finally bring us one step closer to a more fleshed-out approach to handling set phrases and common collocations. Thoughts? Imetsia (talk) 17:36, 14 October 2021 (UTC)[reply]

Your idea sounds great, I like it. The reason why I'd advocate for the inclusion of articles such as air resistance isn't because they're so indecipherable (let's be honest, you really can guess what it means based on the parts) but because:
  • It is the canonical collocation to express this idea. There might be other SOPs that convey the same meaning but this one is the one that's actually used.
  • The article serves many purposes besides just explaining the idea, such as providing translations, coordinate terms, hyponyms etc.
Your proposal shifts the focus of SOP discussions a bit away from the question "Can its meaning be guessed based on the parts?" to "Is it the principal (i.e. most widespread) collocation to express this concept?", which is a change I welcome with open arms. Fytcha (talk) 18:06, 14 October 2021 (UTC)[reply]
Support. This seems like a good idea. We need some way of including collocations and fixed expressions, anyway. Andrew Sheedy (talk) 19:36, 14 October 2021 (UTC)[reply]
I now agree that we should have a firm basis for including entries for strong set phrases -- combinations that are explicable as SoP, but in practice overwhelmingly predominate over other possible ways of saying the same thing by word substitution of synonyms (however we can best define this idea). While we are looking at this policy area, I also believe that we should have a firm basis for including SoP phrases that are particularly hard to understand from the parts if one does not already know which of many possible meanings to combine together -- another argument that is often made at RFD. Mihia (talk) 21:36, 14 October 2021 (UTC)[reply]
On the second suggestion, I think we'd have to firmly pin down whether there is enough of a multitude of "possible meanings to combine together" for a term to not be SOP. This seems quite hard to establish clearly through policy. Talk:amico per convenienza comes to mind. On the first try, it passed RFD because of just this justification, though the vote was later overturned. (To me, the argument that it was SOP was a slam dunk, and it shocked me that so many users initially disagreed). So I do not disagree with the idea in principle, but we would have to adjust the dials just right to ensure we are neither over- nor under-inclusive. Is there really an administrable standard we can come up with to achieve just this result? Imetsia (talk) 22:01, 14 October 2021 (UTC)[reply]
I think both ideas are equally hard to precisely codify because there will always be an element of subjectivity. I think we just have to accept this, and establish the broad policy and let borderline or argued cases go to RFD. I think that examples of phrases that have passed RFD on the stated grounds, as we have done with other cases, are very helpful. FTR, a recent one that was undeleted on the second ground is track meet. Mihia (talk) 10:08, 15 October 2021 (UTC)[reply]
I oppose including common collocations because they are common collocations. We can use {{ux}} to illustrate the more common uses. Vox Sciurorum (talk) 23:30, 14 October 2021 (UTC)[reply]
I oppose the subjectivity of the idea. Although “idiomaticity” is close friends with commonness.
The real question should be technical utility, with cross-language perspectives (which most who want to have a say on a term don’t have, naturally since our language knowledges are limited by our origins in particular language communities). And it wasn’t even about the utility of the term alone in the case of air resistance, but people apparently wanted it as a model for other types of resistance (so we do not have to create them but look in this entry how to construct them, very remarkable). But you are unable to form a reasonable rule or guideline from this example. A particulari ad universale non valet consequentia (an inference from the particular to the universal is not valid). (Case law is bad and a meaningless Anglo-fetish.) Fay Freak (talk) 00:37, 15 October 2021 (UTC)[reply]
The guideline I've formulated seems quite reasonable to me. Why do you disagree? It's readily administrable and provides a good general principle that can be applied not mechanically, but by using sound judgment and discretion. Just like every other example on WT:Idioms that survived RFD, this is not a hard and fast rule, and it includes an element of subjectivity. Editors constantly disagree about the application of SOP policies; some are more permissive on the issue of term inclusion, and others are more conservative. This is not an exception to that rule. It fits in perfectly with every other advisory rule we've ever put forward about idiomaticity and SOP-ness. Imetsia (talk) 15:55, 15 October 2021 (UTC)[reply]
I don't support individual entries for mere common collocations. I think we can find a conceptual division, albeit slightly grey and subjective, between common collocation and strong set phrase. Mihia (talk) 10:12, 15 October 2021 (UTC)[reply]
The idea may have merit if we can formulate a solid objective criterion, but I cannot resist pointing out that air resistance is a poor example. The term denotes a physical force, expressible in the unit newton. In general, designers try to minimize air resistance. The term wind resistance as commonly used (pace M–W) is an entirely different species, the ability to stand up to wind damage,[1] a highly desirable property (except for the sets of disaster flicks such as Twister).  --Lambiam 10:18, 15 October 2021 (UTC) Addition: In English the first component of such a compound can be the subject or the object of the action. In French you can see the distinction in the preposition used: résistance de l'air versus résistance au vent.  --Lambiam 10:39, 15 October 2021 (UTC)[reply]
As solid and objective a criterion as possible, yes, but it will never be mechanically objective, such that anyone can apply a rule and will always come up with the same answer. If we had only mechanically objective CFI criteria then we would never need RFD discussions. Mihia (talk) 13:14, 15 October 2021 (UTC)[reply]
I agree with Mihia's comment right above. In addition, is air resistance really as poor an example as you argue? M-W, as you point out, has a definition much more in line with that of "air resistance." Even if you claim it's not the most used meaning, you must accept that it is a meaning. And what about the other substitutes like "snow," "water," and "fluid"? (I haven't checked these on my own, but maybe you can make a case for your position based on these). Imetsia (talk) 15:55, 15 October 2021 (UTC)[reply]
“Fluid resistance” is a more general term than “air resistance”. It is the resistance experienced by a body in motion, relative to a surrounding fluid. Usually the fluid is air, but when it is something else, the term “air resistance” is not appropriate. “Snow resistance”, “water resistance” and “wind resistance” generally refer to the ability to resist, or protect against, the intrusion or harmful effects of said phenomena or substances; having good wind resistance means the same as being windproof.  --Lambiam 10:44, 16 October 2021 (UTC)[reply]
@Lambiam: OK, I agree now that snow and water resistance do not fall under the same family of meanings as "air resistance." But I don't agree when it comes to wind resistance. "Wind resistance" definitely does have a similar meaning which is used quite commonly ([2], [3], [4] just for starters). According to the wiki article you linked, there's also "wave resistance," under the same family of meaning. So what would you think about the proposed policy if we switched "wind, snow, water, and fluid" with simply "wind, fluid, and wave?" Imetsia (talk) 18:29, 16 October 2021 (UTC)[reply]
  • I have a comment about the practical implementation of this idea, if it should go ahead. At WT:CFI it says "An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components [...] See Wiktionary:Idioms that survived RFD for other examples." We cannot therefore just plonk a "set phrase" test at Wiktionary:Idioms that survived RFD, as initially suggested, since quite likely the meaning of a set phrase can be easily derived from the meaning of its separate components. Mihia (talk) 17:27, 15 October 2021 (UTC)
In fact, the same could be said about some other tests at Wiktionary:Idioms that survived RFD, such as the "tennis player" test. It seems that this problem is a pre-existing slight muddle of the wording in these sections. Mihia (talk) 17:35, 15 October 2021 (UTC)
Support on my end. AG202 (talk) 21:47, 15 October 2021 (UTC)
Some dictionaries include a separate section of common collocations involving some term in their entry for that term. For examples, see the online Cambridge Dictionary and Collins. I think this would be a good alternative for us too.  --Lambiam 11:43, 16 October 2021 (UTC)

Voting to elect members to the Movement Charter drafting committee is now open (October 12 - 24)

Voting to elect members to the Movement Charter drafting committee is now open. In total, 70 Wikimedians are running for 7 seats in these elections.

Voting is open from October 12 to October 24, 2021.

We are piloting a voting advice application for this election. It helps show which candidates hold positions similar to the choices entered.

According to the setup process, the committee will initially consist of 15 members in total: 7 members elected in this process, 6 members selected by Wikimedia affiliates, and 2 members appointed by the Wikimedia Foundation. Up to 3 additional members may be appointed by the committee, and steps may be taken to replace members as needed.

More details and the voting link are on Meta.

Please feel free to let me know if you have any questions about this process.

Xeno (WMF) (talk) 01:47, 15 October 2021 (UTC) (Movement Strategy & Governance Team, Wikimedia Foundation)

Wiktionary:Votes/2021-10/Standardising wording for showing cognates

I recently created this vote, for consistency and standardisation. Looking for feedback, concerns, comments, etc. Svartava2 (talk) 16:49, 16 October 2021 (UTC)

The nuisance of a lot of edits in my watchlist may exceed the small benefit. Other than that, I understand the proposal to be replacing all instances of the five strings before {{cog}} with a single one of them, and leaving all other uses of {{cog}} alone. Thus typos like "cognate witth" and alternative wording like "from the same origin as ..." would be untouched. I suggest leaving "include", "with", and "compare" alone and replacing "to" and "of" with "with", which is not one of the listed options. Note that include implies additional unlisted cognates, so it is not correct for a bot to replace include with anything else. Vox Sciurorum (talk) 17:02, 16 October 2021 (UTC)