Wiktionary:Beer parlour/2021/December


American English vs. things about the US

There are quite a lot of entries right now in Category:American English which are not really part of the American dialect of English, but are actually just describing things that are located in or about the US. This is true for other similar categories such as Category:Canadian English. Yes, Americans would be the ones most likely to talk about AFDC (Aid to Families with Dependent Children), for example, as that was a US-based organization, but what's barring someone from New Zealand or the UK from mentioning it exactly? It's not like color, which is strictly a dialectal variant of colour. Any ideas on what to do about this issue? PseudoSkull (talk) 15:49, 1 December 2021 (UTC)

This is an inherent problem with using {{lb}} to add regional categories: as a context label, "US" should mean "when someone is talking about the US". Unfortunately, in entries it's used mostly to refer to English as spoken in the US. This conflict in the understanding of what labels are for is hard to resolve. Chuck Entz (talk) 16:09, 1 December 2021 (UTC)[reply]
@Chuck Entz Perhaps we could have two separate labels, one being "American English" for the dialect and the other being "US" for things about or in the US? I know it's hard to draw that line now though, especially with a backlog of so many years of user error to split into the correct categories. PseudoSkull (talk) 17:26, 1 December 2021 (UTC)
@PseudoSkull Doesn't this already exist? {{lb}} accepts the label "American English" and it already categorizes in Category:American English forms. In fact, I've created the "Brazilian Portuguese spelling" and "European Portuguese spelling" labels based on that idea and I've been trying to clean the previous Brazilian Portuguese and Portuguese Portuguese categories of words that are merely different spellings. It's hard work, but I think it's doable. - Sarilho1 (talk) 13:42, 11 December 2021 (UTC)
The solution may be to completely bifurcate these into two separate templates: one for variations/dialects and one for topics. Or to force {{lb}} to have something like 1=us,ca and 2=nautical,women, etc. :/ —Justin (koavf)TCM 04:13, 2 December 2021 (UTC)[reply]
My understanding has always been the opposite of Chuck's, i.e. "(US) a large vehicle" means "this is a word that people in the US use to describe a large vehicle". That's how some print dictionaries do it. Equinox 04:26, 2 December 2021 (UTC)[reply]
At one time it seemed clear to me that labels had to do only with things like dialects, registers, and grammatical peculiarities or limits on the way a word is used with the definition in question. Now the increasing use of topical labels has inevitably led us to the current confusion. I'd love to see an end to it, but my imagination is deficient. DCDuring (talk) 20:12, 3 December 2021 (UTC)[reply]
I can imagine the difference in purpose of the labels being reflected in the typography:
fanny (countable and uncountable, plural fannies)
  1. [Britain, Ireland, Australia, New Zealand, South Africa] (vulgar) The female genitalia.
  2. [North America] (informal) The buttocks.
Like we have a Glossary of linguistic terms to which terms like vulgar and informal are linked, we might have “Varieties of <LANGUAGE>” pages discussing regional varieties, to which regional-variety labels can be linked. Or we can link to Wikipedia; British English is a better target than the current target Great Britain.  --Lambiam 16:50, 4 December 2021 (UTC)[reply]
My understanding matches Equinox's, as far as what labels like "US" are 'for'. I agree we need some way of distinguishing different kinds of labels, as more people want to have/use topic labels. (We've had issues with categories like "Category talk:German English" being populated with the English abbreviations for the names of German institutions because people mis-use country names as topic categories.) Ruakh brought up years ago that some dictionaries use different colours of highlighting for different types of labels (which creates accessibility issues for colorblind users, as well as generally being visually distracting). Different kinds of brackets are an interesting idea, and I suppose they're not any more likely to create confusion with the use of brackets as part of a definition, since parentheses are also already used in definitions (for example, see the definition I just added to sneeze). - -sche (discuss) 22:49, 6 December 2021 (UTC)
Square brackets for topics? DCDuring (talk) 22:54, 6 December 2021 (UTC)[reply]
My understanding is also the same as Equinox's. If a topic can be added too, then this ought to distinguish the cases, as for example US politics (universal term for something that exists in US politics) versus US, politics (political term used only in the US), but I sometimes wonder whether this distinction may be a little too cute for readers. Mihia (talk) 11:04, 10 December 2021 (UTC)[reply]
True. Perhaps only certain kinds of topic label would make sense (like "airplanes"), or perhaps none do... in the case of a universal term for something that exists in US politics, I think what's supposed to happen is that the US-ness is just spelled out in the definition itself, like with Congress. Your comment makes me realize that beyond being too cute for readers, it would likely be more than noob editors could grasp, either, so we'd recurringly find "US" topic label being used where the regional-dialect label was meant or vice versa. Perhaps we should just continue to treat "'US' used as a topic label" as a perennial cleanup task... - -sche (discuss) 00:46, 11 December 2021 (UTC)[reply]
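The split discussed in this thread (regional-variety labels kept apart from topic labels, and shown with different brackets as in the fanny mock-up) can be sketched roughly as follows. This is only an illustration in JavaScript and does not describe how {{lb}} or its module actually works; the label sets and the names `splitLabels` and `renderLabels` are hypothetical.

```javascript
// Hypothetical label sets, for illustration only; real label data is far larger.
const REGIONAL = new Set(['US', 'Canada', 'UK', 'Australia']);
const TOPICAL = new Set(['politics', 'nautical', 'aviation']);

// Sort a flat list of labels into regional-variety labels and topic labels.
// Anything unrecognized is returned separately so it can be reviewed by hand.
function splitLabels(labels) {
  const result = { regional: [], topical: [], unknown: [] };
  for (const label of labels) {
    if (REGIONAL.has(label)) result.regional.push(label);
    else if (TOPICAL.has(label)) result.topical.push(label);
    else result.unknown.push(label);
  }
  return result;
}

// Render the two kinds differently, following the mock-up above:
// square brackets for regional varieties, parentheses for everything else.
function renderLabels(labels) {
  const { regional, topical, unknown } = splitLabels(labels);
  const parts = [];
  if (regional.length) parts.push('[' + regional.join(', ') + ']');
  const rest = topical.concat(unknown);
  if (rest.length) parts.push('(' + rest.join(', ') + ')');
  return parts.join(' ');
}
```

Under these assumptions, `renderLabels(['US', 'politics'])` gives `[US] (politics)`. The sketch does not solve the harder problem raised above: a label such as "US" is still ambiguous until an editor decides whether it marks the dialect or the subject matter.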

Deleter role

We should create a "deleter" role that grants certain non-admins the deletion power. This could be useful in

  1. closing RFV's and RFD's, which are currently backlogged;
  2. deleting bot mistakes and obvious misspellings; and
  3. fighting vandalism.

I'd imagine that we'd give out the role by cooptation (as we do at WT:WL for the autopatroller role). Thoughts? Imetsia (talk) 17:22, 3 December 2021 (UTC)[reply]

We should also create a "citer" role that does something positive for those who provide qualifying citations for entries in the RfV queue, which is ballooning recently.
Editing or completely deleting a single definition in a multi-sense entry with multiple L2s or even deleting all but the last L2 in a multi-language entry is already within the power of almost(?) any contributor.
Don't we have a role called "roll-backer"? DCDuring (talk) 19:01, 3 December 2021 (UTC)[reply]
The deleter role would provide most of the power that admins already have, if I am correct to assume that there are no limitations to what could be deleted. If so, there's no need for this role IMO, because if someone is ready to delete pages and revert vandalism, they're ready to be an administrator. In the realm of vandalism also, being able to delete pages made by the vandal and being unable to block them is a weird position to be in, to say the least of it. I'm inclined to oppose. PseudoSkull (talk) 19:11, 3 December 2021 (UTC)[reply]
  • I'm finding myself in agreement with @PseudoSkull here, and thus I oppose the suggestion. It's not clear to me how this proposed "deleter" role would be distinct from the existing "admin" role in any way that's actually useful. ‑‑ Eiríkr Útlendi │Tala við mig 20:03, 3 December 2021 (UTC)[reply]
@DCDuring, PseudoSkull, Eirikr: Addressing the crux of the argument (that we don’t need hairsplitting between admins and “deleter”s), consider that there have been at least two instances where users have wanted to give over the deletion power but withhold the block power. See, e.g., Tibidibi: “Other privileges can be given as necessary, but adminship means blocking rights, which IMO requires generally more professional conduct.” Or TheDaveRoss: “In my mind this [the blocking power] is the functions of admins which is most ‘powerful’, so anyone I would want having this ability I would be happy to have as an admin.” And I can see why: the blocking tools put one in a position of power over other users, so you don’t want to give it to someone whose integrity is in question. I’m not sure the same concern would apply to the deleting tool.
In addition, a deleter role would make the life of a contributor invested with it much easier. They wouldn’t have to bother an admin each time they need a page deleted. It would quicken the deletion of vandalism, bot mistakes, or entries that have failed RFV/D but have sat there for months for no discernable reason.
In sum, a page deleter role would be a useful addition to our toolbox. It would give admins the opportunity to delegate menial tasks such as the deletion of obvious nonsense/errors and to concentrate on more complex issues that really require admin attention: blocks, monitoring role in debates, adjudication in disputes, etc. And it would help people entrusted with the tools to do their job a bit more efficiently.
What do you guys think about what I have just outlined? Imetsia (talk) 00:17, 4 December 2021 (UTC)[reply]
Small update: there are now at least three instances, with @Fay Freak's comment at one of the recent admin votes: "I don’t like to vote for an unknown quantity, but it should be predictable what he would use the admin tools for... though he probably would have use for the deletion tool." Imetsia (talk) 00:47, 4 December 2021 (UTC)[reply]
Well, a deleter role makes more sense for a dictionary than at other Wikis where one writes massive articles. It would somewhat recoup the disadvantages from the fact that MediaWiki is not totally geared towards dictionaries. I can definitely be friends with it. Fay Freak (talk) 01:11, 4 December 2021 (UTC)[reply]
  • Support per Imetsia and because of its usefulness. This is just another right, a powerful one to be given carefully. However, being an admin is a whole different level. There are many users who aren't yet fit or willing for adminship but could do with this right — including myself. —Svārtava [tcur] 07:26, 4 December 2021 (UTC)
  • Support. If implemented, it could definitely help sort out issues like obvious misspellings much quicker; I've encountered these on multiple occasions and definitely expect to again, especially when it comes to Urdu lemmas. I also agree with Svartava2 that adminship may not be appropriate for some users, myself included, who would nonetheless benefit the community if they were granted the additional privileges.
    -Taimoor Ahmed(گل بات؟) 07:31, 5 December 2021 (UTC)
  • Support too - I've tagged thousands of things for speedy deletion in my lifetime, and would save lots of work by being able to delete them myself. Notusbutthem (talk) 12:28, 6 December 2021 (UTC)
    Silly Wonderfool, votes are for kids. PseudoSkull (talk) 17:03, 6 December 2021 (UTC)[reply]
    Unstruck: this isn't a vote and had little value anyways now the vote is created. —Svārtava [tcur] 15:54, 9 December 2021 (UTC)[reply]
  • Support. If given to the right people, this could be very useful. Vininn126 (talk) 14:06, 7 December 2021 (UTC)
A deleter role should also be a keeper role, depending on the individual circumstances. DonnanZ (talk) 00:26, 8 December 2021 (UTC)[reply]
@Donnanz: What does this mean? If you're thinking that a "deleter" role should also include an "undeleter" right, I agree. That's one of the rights included under my proposal. Imetsia (talk) 16:35, 8 December 2021 (UTC)[reply]
@Imetsia: No, that didn't occur to me, though I'm pleased you have that in mind. What I meant is that not everything that passes through RFD or RFV is deleted, and the closer quite often keeps the entry in question (in RFD) if there are enough keep votes, or if voting is fairly even, close the RFD and keep the entry as "no consensus". DonnanZ (talk) 16:58, 8 December 2021 (UTC)[reply]
That can be done by any user with no special rights. —Svārtava [tcur] 17:00, 8 December 2021 (UTC)[reply]
Oh ok, I think that right would be implied. Every editor can, at present, mark an entry as RFD-kept without having any special privileges. I don't intend on changing that. Imetsia (talk) 17:03, 8 December 2021 (UTC)[reply]
Interesting, I wasn't aware of that. Deleting an entry, and filing away the RFD is a different matter, of course. DonnanZ (talk) 19:17, 8 December 2021 (UTC)[reply]
@Imetsia: another wrinkle is that having the ability to undelete would also require the ability to view the deleted edits and their history.
Let's say for sake of illustration that someone in South Africa thinks they're just editing the memory in their smart phone, and posts their online banking account details and password in a new page they've created. Whenever we see something like that, we immediately delete it to protect the poster. That means when we're choosing someone with this role we're not just choosing someone who can be trusted to decide whether to delete files, but we're also choosing someone who can be trusted with privileged and sensitive personal information.
Yet another unasked question: what about revision deletions, a.k.a. hiding (and unhiding) of edits? Are they part of this? I wouldn't be surprised if they were, whether we want them to be or not. Chuck Entz (talk) 04:06, 9 December 2021 (UTC)[reply]
@Chuck Entz: No, that isn't being included in this. See the vote, and I've discussed this with Imetsia. What you point out has not been forgotten. Maybe if a user creates a new page with their banking info, that revision can be hidden to only admins? —Svārtava [tcur] 04:14, 9 December 2021 (UTC)[reply]
Or, alternatively, we can remove the undeletion right from the role of "deleter" if it proves unworkable. (That is, restrict "deleter" to include only the right to delete pages, and nothing about undeletions). And secondly, I wouldn't include hiding edits in the deleter role. Imetsia (talk) 15:44, 9 December 2021 (UTC)[reply]
@Imetsia: I disagree, undeleting pages would be equally important. Consider blocking right without the ability to unblock, or to make edits without the ability to revert. It's just fine to restrict this to just pages and not revisions. —Svārtava [tcur] 15:54, 9 December 2021 (UTC)[reply]
I'm just proposing that, if the undeletion right cannot be implemented as you proposed above (i.e., it's a technical impossibility), then it's better to remove it altogether from the role of deleter. Sure, both rights are important, but practical problems exist in instituting both of them in one role. Imetsia (talk) 16:02, 9 December 2021 (UTC)[reply]
@Imetsia: I don't think that this would be technically impossible, but in case it is, we'll just accept whatever could be done. —Svārtava [tcur] 16:34, 9 December 2021 (UTC)[reply]

Old Finnish dictionary now available online

The Nykysuomen sanakirja was quietly made available online back in February, on the Institute for the Languages of Finland (Kotus) website. That is significant because it has over 200,000 headwords, many of which have not yet been added to Wiktionary, which currently only has about 130,000 gloss entries. See Nykysuomen sanakirja on Wikipedia. heyzeuss 17:49, 3 December 2021 (UTC)

Language and proper noun

Hello, it seems that language names are proper nouns in English. At least, this is what I understand when I read English, Arabic and so on. But I also see Malayalam, Breton, etc. We are changing all language names in English from noun to proper noun on the French Wiktionary, but I would like to be sure about the English rule before making the modifications en masse. Pamputt (talk) 10:26, 4 December 2021 (UTC)

Glossonyms are proper nouns in English, according to Wikipedia, anyway. ~ heyzeuss 15:37, 4 December 2021 (UTC)[reply]
(ec) I think that glossonyms are traditionally considered proper nouns in English. The Wikipedia article Linguonym (a horrible hybrid) defines its subject as “a linguistic term that designates a proper name of an individual language, or a language family.” They behave like proper nouns in having a specific referent, and are normally used without article as a singular. Proper nouns share these properties with mass nouns:
  • English has remarkable properties.
  • Ethan has remarkable properties.
  • Ethanol has remarkable properties.
At least in English, language names are the same as the corresponding adjective, and can be “explained” as the nominalization of that adjective resulting from the contraction “the X language” → “X”. This is not the case in all languages. Language names in Latin are adjectives that cannot be nominalized; to refer to Latin you have to use either the adverb Latine (Latīnē) (which can grammatically function as the object of a verb), or say lingua Latīna. Language names in Turkish take the form of an adverb formed from a demonym by appending -ce or a variant thereof; they too can be grammatically used as proper nouns, also as the subject of a verb.  --Lambiam 16:06, 4 December 2021 (UTC)[reply]
English words for language names are often a source of errors in (mis)translations. The use of language names as nouns is wider in English than in many languages. For example, "to speak/understand English" is translated with an adverb in a great variety of languages - it is so in Slavic, Baltic languages, Albanian, etc.
"English books" (also adjective in English) is translated as adjectives, which are mostly lower case in European languages - e.g. French, Italian, Spanish, Swedish, etc.
The collocation "the English language" or "English" (proper noun, language name) itself is translated e.g. into Latvian/Lithuanian as "angļu valoda"/"anglų kalba" (language of the Englishmen).
In most languages, language names are common nouns. Nominalisation of adjectives is also common but may not be as standard/formal or the most common way to call a language. If they are proper nouns for whatever reason, they just follow the English way without any other good reason (e.g. Chinese or Japanese). --Anatoli T. (обсудить/вклад) 08:49, 5 December 2021 (UTC)[reply]
There've been a few discussions of this in the past. The short version is that they're generally/traditionally considered proper nouns, but there are also those who view them as common nouns, as well as those who (more consistently!) think "proper noun" should not exist at all. We have almost all language names entered as proper nouns, but a few are indeed currently listed as common nouns. (IMO, this should be fixed by changing those few ===Noun=== entries, but those who take the other viewpoint would prefer to do the inverse.) Similar questions have come up regarding whether religions are proper nouns. - -sche (discuss) 22:59, 6 December 2021 (UTC)[reply]
Yes, in English they are all proper nouns, if I wasn't clear but much less so in other languages, e.g. just a few European language names as they are used in their own languages français, italiano, español, Deutsch, svenska are all currently entered as common nouns. Interestingly, čeština is a proper noun here (lower case) but it's a "podstatné jméno" (common noun) in the Czech Wiktionary. --Anatoli T. (обсудить/вклад) 23:14, 6 December 2021 (UTC)[reply]
Previous discussions: Wiktionary:Beer parlour/2020/July § Are languages proper nouns or common nouns? which links previous discussions. Related issues: Wiktionary:Beer parlour/2019/September § Capitalization of proper nouns in languages using scripts without a lower/uppercase distinction Fay Freak (talk) 01:18, 7 December 2021 (UTC)[reply]
I think it is just the old collectivist paralogism to consider language names proper nouns, as languages aren’t uniform masses:
“Fay Freak's German is as well artfuller as correcter than that of most language critics.”
“Labeed’s Arabic contains too many obscure terms for you to benefit from reading him, as a beginner.”
Festivities occur differently for various strata of society and individuals and are hence also common nouns: “My New Year’s Eve was of a boring character, I ate bland food and edited Wiktionary.”
Even Sunday is entered as common noun as it is but a convention that now we have that day and not the other.
More tricky it is for the names of peoples themselves since they are the collectives.
But ultimately I think that either languages or people are categorized, phylogenetically, like animals which are obviously entered as common nouns. If I speak German this is just a beast of many of such kinds that are called German; as with animals it has genetic constraints and epigenetic and acquired characteristics. As with vague vernacular names, “German” includes more diverse forms than “Russian”. Fay Freak (talk) 01:37, 7 December 2021 (UTC)[reply]

Eastern Cham and Western Cham

In the section for translations, could Eastern Cham (cjm) and Western Cham (cja) be sorted under Cham, like Eastern Mari and Western Mari sorted under Mari? --Apisite (talk) 10:44, 5 December 2021 (UTC)[reply]

@Apisite: Please remind me or others, which module is responsible. I think it's the translation adder. --Anatoli T. (обсудить/вклад) 00:05, 7 December 2021 (UTC)[reply]
@Atitarev: Yes, I meant the translation adder, sir. --Apisite (talk) 10:06, 7 December 2021 (UTC)[reply]
@Apisite: Thanks. It's been a while since I used it last. Please send a link. (You can simply call me Anatoli.) --Anatoli T. (обсудить/вклад) 23:07, 7 December 2021 (UTC)[reply]
@Apisite: OK, I figured it out. Done in MediaWiki:Gadget-TranslationAdder-Data.js
		//Eastern Cham (cjm) and Western Cham (cja) to be nested under Cham (atitarev)
		cjm: 'Cham/Eastern',
		cja: 'Cham/Western',
Please clear the browser cache before trying it out. --Anatoli T. (обсудить/вклад) 00:48, 8 December 2021 (UTC)[reply]
@Atitarev: Thanks, Anatoli. Couldn't the code be as follows?
		//Eastern Cham (cjm) and Western Cham (cja) to be nested under Cham (atitarev)
		cjm: 'Cham/Eastern Cham',
		cja: 'Cham/Western Cham',
Also, couldn't the same thing be applied to the Sama and Sami languages? --Apisite (talk) 01:19, 8 December 2021 (UTC)
@Apisite: It's done. Please prepare a list or codes next time to make the task easier. --Anatoli T. (обсудить/вклад) 02:16, 8 December 2021 (UTC)[reply]
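For readers unfamiliar with the 'Parent/Child' values quoted above, here is a rough sketch of how such entries could be read when deciding where a translation line goes. It does not reproduce the TranslationAdder gadget's actual code: `nesting` and `nestingFor` are hypothetical names, the values mirror the form Apisite suggested, and the fallback for unlisted codes is an assumption.

```javascript
// Hypothetical excerpt mirroring the entries quoted above.
const nesting = {
  cjm: 'Cham/Eastern Cham',
  cja: 'Cham/Western Cham',
};

// Split a 'Parent/Child' value into the heading line and the nested line.
// Codes without an entry are treated as top-level languages (an assumption).
function nestingFor(code, fallbackName) {
  const value = nesting[code];
  if (value === undefined) return { parent: null, name: fallbackName };
  const slash = value.indexOf('/');
  if (slash === -1) return { parent: null, name: value };
  return { parent: value.slice(0, slash), name: value.slice(slash + 1) };
}
```

With this reading, `nestingFor('cjm')` places "Eastern Cham" under a "Cham" heading, while a code with no entry, such as `nestingFor('fr', 'French')`, stays at the top level.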


{{word}}

This template was RFD'd, with a majority for deletion. It was originally intended to be a non-language-specific version of {{PIE word}}, like {{root}} for {{PIE root}}. However, that idea didn't seem to be liked. One admin, Kutchkutch, used the template for generating a category for a Proto-Dravidian word (diff). I think if we do it for PIE, it would be a good idea, for consistency, to do this with other protolanguages for words that have no associated root. Thoughts? —Svārtava [tcur] 13:19, 5 December 2021 (UTC)

If no one objects, I'll be changing this template's functionality to be used only for protolanguages. Pinging users who participated in the RFD discussion: @Inqilābī, Vox Sciurorum, Sgconlaw, The cool numel, Victar, Surjection, Erutuon, Kutchkutch, Fenakhay. Svārtava [tcur] 05:21, 10 December 2021 (UTC)
I’m not sure, as I don’t work with protolanguages. In the previous discussion, my point was simply that we should seek consensus one way or another on whether we find it useful to have categories along the lines of “English terms derived from the Proto-Indo-European root *XYZ”. If the consensus is that such categories are useful to have, then templates like {{root}} and {{word}} are fine. If not, then we should delete them. — SGconlaw (talk) 10:14, 10 December 2021 (UTC)[reply]
@Sgconlaw: That's a good point. But see, in the RFD discussion for {{PIE word}}, the majority votes to keep it, so it can be taken as a consensus for such categories too (since the template's purpose is categorisation). —Svārtava [tcur] 10:52, 10 December 2021 (UTC)[reply]
@Svartava2: in that case, why limit {{root}} {{word}} to protolanguages? What is the disagreement here? — SGconlaw (talk) 11:10, 10 December 2021 (UTC)[reply]
@Sgconlaw: {{root}} will function freely, not limited to protolanguages; this proposal is about {{word}} which was initially created for all languages. {{word}} fails RFD, with the current votes. However, as in diff, an admin used it for Proto-Dravidian, and it does appear a good idea to have consistency between PIE and other protolanguages if not PIE and all languages (which was struck down at RFD). —Svārtava [tcur] 13:29, 10 December 2021 (UTC)[reply]
@Svartava2: sorry, I meant {{word}}. Right, I didn't know it is likely that the template is going to be deleted. — SGconlaw (talk) 13:52, 10 December 2021 (UTC)[reply]
@Sgconlaw: It has been re-functioned (so the RFD discussion carries little value as it no longer has the same function), and now if it is deleted, {{PIE word}}, which basically has the same function specific to PIE, will get deleted along with this. —Svārtava [tcur] 14:01, 10 December 2021 (UTC)[reply]
The template failed RFDO. Delete it. -- 20:01, 10 December 2021 (UTC)[reply]

Catalan and its supposed descent from Old Occitan

Wiktionary currently treats both Modern and Old Catalan as descendants of Old Occitan. That is reflected not only in numerous etymologies (cf. 1 2 3) but also explicitly stated in the 'ancestor list' on Category:Catalan language.

That is, however, a mistaken notion. From their earliest records, the two languages already had significant differences: Occitan, for instance, had a noun-case system; Catalan did not. Occitan retained the Latin diphthong /au̯/; Catalan generally did not. Occitan had /-as/ in feminine plurals and certain 2SG conjugations; Catalan instead had /-es/. Some further differences are discussed here.

Wiktionary's habit of deriving Catalan from Old Occitan has unfortunate results, such as the claim that Catalan peu 'foot' (< Latin pedem) derives from a variant of Old Occitan pe— a variant that did not exist, because the vocalization of Latin /-d-/ to /u̯/ is characteristic of Catalan rather than Occitan. In the latter language, vocalized Latin /d/ instead yields /i̯/; cf. Occitan creire < */ˈkredre/ < Latin crēdere, versus the Catalan cognate creure. The claim that Catalan preu (< Latin pretium) derives 'from a variant of Old Occitan pretz' also fails to convince; no vocalized form exists in Occitan, and Catalan had merged /dz/ (</-tj-/, etc.) into /ð/ (< /-d-/) already in pre-literary times. Various tenth- and eleventh-century examples of its subsequent vocalization to /u̯/ are discussed in the Manual of Catalan Linguistics, pages 452–3. One could go on with more examples, but that may be excessive for our purposes here.

Now, all of this is not to deny that the two languages are close: they are, and that is why they are grouped together under the (perhaps unfortunately-named) category of Occitano-Romance. That should not be taken to mean, however, that Catalan derives from Old Occitan any more than Spanish derives from Old Portuguese, or vice-versa. It would be more accurate, as well as consistent with modern scholarship, to derive Catalan from Old Catalan and to treat the latter as a contemporary to Old Occitan, rather than a subsequent development from it. Nicodene (talk) 05:19, 7 December 2021 (UTC)[reply]

As a native speaker, but not an expert in the etymology of Old Catalan, I have been waiting for comments on this. I am also surprised by the systematic treatment of Catalan as a descendant of Old Occitan. It seems to me that this is a view established by early German Romanists and currently outdated. There are too many features that cannot be explained via Occitan. Historically, the first Catalan texts alternate between Occitan in poetry and Catalan in prose. Ramon Llull, the first great writer in Old Catalan, translated his texts into Occitan himself. This is diglossia, not a substrate. --Vriullop (talk) 08:10, 15 December 2021 (UTC)
@Nicodene, Vriullop: Nice breakdown and citation. So what do we need to do? —Justin (koavf)TCM 16:19, 15 December 2021 (UTC)[reply]
If people agree on this, it could just be a matter of gradually changing Catalan etymologies as we run across them. (Also the descendants section of Latin terms, which I am working on.) Pinging @Linguoboy, @Ultimateria, @Word dewd544 @Narunnaia, and @Jberkel as users who have contributed extensively to Catalan entries. Nicodene (talk) 22:00, 15 December 2021 (UTC)[reply]
I recommend that if action items come from this that you post them to Wiktionary:About Catalan. Thanks. —Justin (koavf)TCM 22:22, 15 December 2021 (UTC)[reply]
Will do. Nicodene (talk) 07:29, 16 December 2021 (UTC)[reply]
I do actually agree with Nicodene that Catalan should not be treated as an actual descendant of Old Occitan/Provencal, and that the relationship is more nuanced. To be more precise, Old Occitan and Old Catalan probably had a close common ancestor (at least probably closer than other Romance languages), but accurately speaking, Old Catalan should not be seen as deriving from Old Occitan (as it was attested anyway- one problem is that some of the Old Occitan terms we have attested and used as a primary lemma here cannot have been the ancestors of Catalan forms as they are, and both are attested after the languages "split"). I had been adding them the way I had so far precisely because it was Wiktionary's policy, for better or for worse. I am for some kind of revision of this policy, but I'm not sure the best way to go about it. As far as the precise relationship between the two, it is probably more complex than we make it seem. They may have had some shared roots (perhaps with divergent, distinct features as far back as Vulgar Latin) but Catalan seems to have come under the influence of the Ibero-Romance family and Occitan the Gallo-Romance... they're both basically a continuum that act as a bridge between those families, as well as to Italian and Rhaeto-Romance more distantly. Word dewd544 (talk) 01:15, 17 December 2021 (UTC)[reply]
For the descendants sections, perhaps it would be convenient to follow the groupings shown here. Here is a quick mock-up of what that might look like (following a general East → West trajectory): https://en.wiktionary.org/wiki/mulier#Descendants Nicodene (talk) 01:22, 18 December 2021 (UTC)[reply]
In the example I would add Old Catalan muller, muyler, the two most common spellings. The second one is relevant as it shows the old pronunciation of some -ll- as /j/, as in current Balearic. Vriullop (talk) 10:49, 18 December 2021 (UTC)[reply]
Done. Regarding the etymologies, it seems that a total of 1099 will need to be changed. Most of these should fall under Catalan terms inherited from Latin, and some under Catalan terms borrowed from Occitan. I will aim to sort 100 entries per day, adding DCVB links where needed. Nicodene (talk) 10:01, 19 December 2021 (UTC)[reply]
Are there perhaps actually two Old Occitans? The common ancestor, of which later regional forms, from after a split, have been treated as representative, and one after a split of Old Occitan and Old Catalan? Like actually Proto-Occitan and then Old Occitan and Old Catalan? The ancestor of Galician likewise has confusing naming. Fay Freak (talk) 12:28, 20 December 2021 (UTC)
What you have described is, strictly speaking, 'Proto-Occitano-Romance', a phrase so obscure that this comment (as of the time of writing) contains approximately half of all occurrences on the Internet. That is because scholars instead work with Proto-Gallo-Romance as the relevant point of departure. Proving a subsequent Proto-Occitano-Romance stage would require establishing pre-literary innovations common to Catalan and Occitan, but absent from both Oïl and Francoprovençal, and that is a daunting task. The anar/alar isogloss might be relevant, if these verbs (which mean 'go') are not cognate.
A serious difficulty for this hypothesis is that Catalan, like eastern Ibero-Romance, palatalized Latin /nn/ and /ll/, while Occitan merely degeminated them. Compare the following outcomes:
Latin castellum, annum > Catalan castell, any; Occitan castèl, an.
That the Catalan castell and any did not pass through a stage with /-l/ and /-n/, only to have these sounds later palatalize in word-final position, is shown by the persistence of non-palatalized /-l/ and /-n/ in cases such as:
Latin pālum, manum > Catalan pal, man (> mà).
That Occitan castèl and an did not pass through a stage with /-ʎ/ and /-ɲ/, only to have these sounds depalatalize in word-final position (in pre-literary times), is shown by the persistence of palatalized /-ʎ/ and /-ɲ/ in cases such as:
Latin fīlium, iūnium > Occitan filh, junh.
(In modern Occitan, these may depalatalize in central dialects, but that was not the case in Old Occitan, hence the spelling.)
Accordingly, the latest forms common to Catalan castell/any on the one hand, and Occitan castèl/an on the other, would have been */kasˈtɛllV/ and */ˈannV/, with a geminate still in place, and so a final vowel as well. That /nn/ and /ll/ (as also /rr/) survived after the Western Romance degemination of intervocalic plosives is confirmed by, among other things, their sporadic persistence to this day in the Belsetán dialect of Aragonese.
The /-V/, however, of */kasˈtɛllV/ and */ˈannV/ is problematic for our 'Proto-Occitano-Romance' hypothesis. We know that the loss of final unstressed vowels dates firmly to the Proto-Gallo-Romance stage, since it is common not only to Occitan and Catalan, but also to Oïl and Francoprovençal. (Indeed, if one had to choose a single defining characteristic for Gallo-Romance, it would be this.) So we have here an impossibility: the Proto-Occitano-Romance forms predate the Proto-Gallo-Romance ones. I digress.
For Galician and Portuguese, the situation is different, in that the Romance of this territory entered its literary phase without anything comparable to the isoglosses that divided Old Occitan from Old Catalan. That is perhaps because the Reconquista already reached Catalonia in the late eighth century, far earlier than Portugal, especially any part of it south of the Douro river.
That said, I do not see why Wiktionary would name it 'Old Portuguese' rather than '(Old) Galician-Portuguese'. That is especially strange considering that the language originated in Galicia. Nicodene (talk) 02:38, 21 December 2021 (UTC)[reply]
Portugal has an army and a navy... Chuck Entz (talk) 03:23, 21 December 2021 (UTC)[reply]
Seriously, though, I believe the idea behind the way we dealt with Catalan and Occitan etymologies was to do something like we do with Old Indic: Classical Sanskrit is an artificial creation from only one part of the Old Indic dialect continuum, while Vedic Sanskrit is ancestral to other parts. The longstanding scholarly practice, however, has been to treat both as just part of "Sanskrit", and to use "Sanskrit" in etymologies as a stand-in for Old Indic in general. We made a conscious decision to do the same, because reading between the lines in older references to determine who was referring to which lect when they discussed "Sanskrit" was too much of a distraction.
Given that modern Catalan and Occitan are closer to each other than they are to either French or Spanish, treating them as coming from a single language is perfectly understandable. If you do that, Old Provençal is a convenient stand-in for whatever that common language might have been. I'm not saying it's right, but it probably worked well enough with the state of things at the time. Chuck Entz (talk) 04:19, 21 December 2021 (UTC)[reply]
I can understand it as a convenient shortcut- one that has created problems for us down the line. What advantages there are in it can be retained by grouping Catalan and Occitan together, in the descendants sections of Latin words, under a label such as 'Southern Gallo-Romance', without claiming that one derives from the other. Cross-referencing Catalan and Occitan etymology sections with 'compare {{cog|...}}' would also work. Nicodene (talk) 08:07, 21 December 2021 (UTC)[reply]

Codes for Two Languages

See here for more details. --Apisite (talk) 10:05, 7 December 2021 (UTC)[reply]

Misspellings and their Criteria for Inclusion

We should tackle the issue of misspellings. Obviously there is a line between common misspellings worthy of inclusion with the misspelling template and misspellings that are one-offs, scannos, etc. We should determine where that border is and formalize this a bit more. Vininn126 (talk) 14:08, 7 December 2021 (UTC)[reply]

Wiktionary:Criteria_for_inclusion#Spellings has some paragraphs on this already, but no hard rule. Equinox 16:30, 7 December 2021 (UTC)[reply]
Right, that's one of the problems. Previously, style guides have been a source of regulation on the issue, and perhaps that's a good place to start: take their recommended spellings. From there, we could set a certain threshold of frequency for a misspelling outside the selected style guides? Vininn126 (talk) 16:49, 7 December 2021 (UTC)[reply]
My idea was that we have an expectation that a certain number of reliable sources refer to the misspelling as a misspelling, as well as show evidence that the misspelling is actually attested as normal. Since for something to be a misspelling it has to be considered as such by some widespread consensus, the reliable sources given could provide evidence that this consensus exists. Showing something in use alone doesn't prove it's even misspelled. We recently had a debate on Discord about patroler as we weren't sure if it was an alternative form or a genuine misspelling. I'm not sure what to consider a "reliable source" or whatever; that's up for elaboration. Any thoughts? PseudoSkull (talk) 16:54, 7 December 2021 (UTC)[reply]
A search tip: if you find the same author using a word several times in the same text, and the outlier spelling only occurs once, that's presumably a misspelling. Equinox 17:06, 7 December 2021 (UTC)[reply]
To clarify, if something looks like a misspelling, are you proposing that authorities have to call it a misspelling before it can be deemed an important enough misspelling to include? Or proposing that authorities have to call it a misspelling before we can consider it a misspelling? (Or something else?) An obvious issue with the second idea is that people generate misspellings far faster than any authority can notice, let alone proscribe; whether someone is misspelling the name of a hard-to-spell lexeem because they don't know how to spell it, or typoign it by acicdent, or it's not clear which (lexeem could be either...), people can generate errors way faster than an authority can notice. So if we only label things misspellings once they're common enough that prescriptive authorities notice them, we might be in the odd position of treating less common misspellings as more valid than common-but-proscribed ones. - -sche (discuss) 03:42, 9 December 2021 (UTC)[reply]
I think we should refer to some specific style guides, and then set the bar for misspellings pretty high. Vininn126 (talk) 21:15, 9 December 2021 (UTC)[reply]
Yes, high bar especially when someone wishes to save an SoP phrase and seeks out three runtogether misspellings in order to invoke the ridiculous "coalmine" rule. Mihia (talk) 23:43, 10 December 2021 (UTC)[reply]
My idea would be to peg it at a particular percentage. That is, if a misspelling accounts for x% of the usage of both the misspelling + the correct spelling combined, then it is common enough to be included. I'm sure we could do this with a simple comparison of G-Books hits. Imetsia (talk) 16:47, 8 December 2021 (UTC)[reply]
May I just raise a general caution about Google hit counts. For example, you can type random crap into general Google search, even within quotation marks, and get the Large Random Number™. For example, when I search for "it red off" I get "About 573,000 results". Actual retrievable results run out at 37. Even Ngrams, which we rely on, has been shown to be hopeless at distinguishing between closed, open, and hyphenated forms, which is rather depressing, and diminishes my overall trust in it. I don't know whether / how much these issues would affect numbers for typical spelling errors, or whether in fact ratios of hit counts will be reliable enough. With Book Search, I don't see a hit count on page 1 (does anyone?). To determine counts I would have to page through result pages, which would be problematic with large numbers, and there is no guarantee anyway that Google will go on delivering results as long as more exist; I suspect that it may "give up" after a certain point. Mihia (talk) 21:08, 9 December 2021 (UTC)[reply]
For Google Books, the default is to have that Tools button on the right depressed, which causes various search-option dropdown menus to appear, such as Any view, Any document, and Any time. These menus appear in the same place in the UI that the hit count usually appears. If you click Tools once, the dropdown menus should be replaced with the hit count. ‑‑ Eiríkr Útlendi │Tala við mig 01:27, 10 December 2021 (UTC)[reply]
@Eirikr: Thanks, I never knew that. Seems a strange design. I can confirm that the Very Large Random Number™ does also exist in Book Search in certain circumstances, e.g. "About 52,900,000 results" for "it it it". Though I suppose that particular case is somewhat pathological, it just makes me nervous and distrust the numbers generally. Also, even in cases of less extreme counts, it is worrying that the number of retrievable results can be a fraction of the headline hit count, e.g. "yeyayo" gives "About 213 results", but only 18 actually retrievable. Do the remaining "about 195" actually exist? Another worrying thing that happens generally is that hits can come up in the list, but the searched-for item is not shown in the snippet, and when you click, you are thrown into some position where the searched-for item cannot be found, so who knows whether it exists anywhere. Mihia (talk) 09:51, 10 December 2021 (UTC)[reply]
Re: UI, ya, that was a strange design decision on Google's part. Can't say I'm a fan.
Re: hit counts, I've generally found that Google's results have become progressively less reliable and less useful, the more that Google tries to "help" by making its search features more "clever" -- guessing what the user wants, no longer respecting "exact-match strings in quotes", no longer respecting -negative_matches, nor -"negative exact-matches in quotes". They're turning what used to be a decent and straightforward search into something infected by the ghost of Microsoft's Clippy -- overly "helpful" and presumptive results that assume that the developer knows best, that the developer doesn't have to explain what they're doing, and that the user is a fucking moron.
(sorry, did I say that out loud?)
Anymore, I've learned that Google results must be taken with a rather large grain of salt. The size of that grain has been getting bigger and bigger over time, and if I'm not careful, it does have a deleterious effect on my blood pressure. (Not entirely joking, given the frustration of rubbish results, especially those where the desired string doesn't actually exist in the returned text.) ‑‑ Eiríkr Útlendi │Tala við mig 18:39, 10 December 2021 (UTC)[reply]
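
A note on the percentage idea raised above: the proposed test (a misspelling is includable if it accounts for at least x% of the combined usage of the misspelling and the correct spelling) could be sketched roughly as follows. This is only an illustration in Python. The function names and the 1% default threshold are assumptions made for the example and were not proposed by anyone in the thread. Given the cautions about headline hit counts, the counts passed in would presumably have to be results someone actually paged through and verified.

```python
def misspelling_share(misspelled_hits: int, correct_hits: int) -> float:
    """Fraction of combined usage that the misspelling accounts for."""
    total = misspelled_hits + correct_hits
    if total == 0:
        raise ValueError("no hits for either spelling")
    return misspelled_hits / total


def is_common_misspelling(misspelled_hits: int, correct_hits: int,
                          threshold: float = 0.01) -> bool:
    """True if the misspelling meets the threshold share.

    The default threshold of 1% is a placeholder, not an agreed value.
    """
    return misspelling_share(misspelled_hits, correct_hits) >= threshold


# Hypothetical counts: 5 verified hits for the misspelling, 95 for the
# standard spelling.
print(is_common_misspelling(5, 95))
```

Whatever value of x is chosen, a ratio like this is only as trustworthy as the two counts that go into it.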

Category:Superstitions ?

I think there should be a Category:Superstitions for things like touch wood, four leaf clover, black cat etc. None Shall Revert (talk) 19:53, 7 December 2021 (UTC)[reply]

One man's superstition is another man's religion. The issue is that the term "superstition" inevitably makes a statement about the epistemic value. Maybe naming it something along the lines of "supernaturalism" could work though. Fytcha (talk) 20:39, 7 December 2021 (UTC)[reply]
You also have a better term folk belief – why have you man not created its Wiktionary entry, it being a defined academic subject? Fay Freak (talk) 09:06, 8 December 2021 (UTC)[reply]
Right, superstitions and religious practices/beliefs are objectively indistinguishable. Mihia (talk) 21:18, 9 December 2021 (UTC)[reply]
The person who wrote this is the absolute last person who should be touching the conspiracy theories category. WordyAndNerdy (talk) 04:17, 13 December 2021 (UTC)[reply]
Right, I like to use words with definition. It’s mighty eccentric of course, amongst dictionary authors, to like definition. Fay Freak (talk) 04:39, 13 December 2021 (UTC)[reply]
  • Support While the labelling of certain beliefs as "superstitious" may arouse controversy, it is my belief that there's a certain core of beliefs (such as the examples given above) that nearly everyone would deem superstitious without objection. Hazarasp (parlement · werkis) 08:37, 10 December 2021 (UTC)[reply]
    @Hazarasp: So is that an oppose or a support for the category? Thadh (talk) 20:43, 10 December 2021 (UTC)[reply]
    My bad. Hazarasp (parlement · werkis) 22:30, 10 December 2021 (UTC)[reply]
  • Oppose too subjective. We already get people doing stuff like adding ordinary words like deplorable to “Category:en:Conservatism” and “Category:en:Donald Trump” (which I reverted). — SGconlaw (talk) 10:17, 10 December 2021 (UTC)[reply]
    Deplorable has come to be a general pejorative for a Trump-aligned Republican. It falls within the scope of the Conservatism category. Political categories encompass derogatory terms, e.g. feminazi under Category:en:Feminism, or the dozen-odd woke coinages under Category:en:Leftism. Making Category:en:Donald Trump into a subcategory of Category:en:Conservatism is probably easier than individually adding the 80+ Trump-related terms to the Conservatism category. Although I recognize there's a possibility for miscategorization of terms related to Trump's real estate ventures, television shows, etc. WordyAndNerdy (talk) 03:42, 13 December 2021 (UTC)[reply]
  • Support per Hazarasp. Imetsia (talk) 16:49, 13 December 2021 (UTC)[reply]
  • Oppose Too subjective. Too dependent on our current prejudices. DCDuring (talk) 02:03, 14 December 2021 (UTC)[reply]
  • Support. If a considerable number of users disagree about whether something should be categorised in it, don't categorise it. This applies to EVERY similar label like "rare", "formal", "literary", etc., and is not specific to this one. —Svārtava [tcur] 02:58, 18 December 2021 (UTC)[reply]
  • Support. The historical mislabeling of non-mainstream religious beliefs as "superstitions" has blurred what I would contend is a very real distinction: there are minor practices and beliefs regarding the supernatural that aren't really part of any religion. Someone who carries a rabbit's-foot isn't following some religious doctrine: they just think it brings good luck, and they have no idea why. I'm sure one could trace it back to a time when rabbits had a connection with a specific deity, just as we can trace English words back to Proto-Indo-European, but that says nothing about what's in the minds of modern people. We just need to be careful not to use the category for things that people think or do because of their religion. Chuck Entz (talk) 04:58, 21 December 2021 (UTC)[reply]
  • Oppose per Fytcha & Fay Freak. ·~ dictátor·mundꟾ 16:58, 29 December 2021 (UTC)[reply]
  • Weak support. While there are edge cases, superstitions are typically easily distinguishable from religion. Someone who thinks Jesus is the son of God and who also thinks that walking under a ladder is bad luck is unlikely to regard the first as superstition or the second as religion, and pejorative use of "superstition" (including, I would add to Chuck's comment, e.g. atheists being pejorative to even mainstream religion) aside, I think other people grasp the distinction too; frankly, the distinction between "mythology" and "religion" is a lot messier, when neopagan religions involve Greek, Norse, etc mythological figures, but for better or worse our categories already make that distinction. The weakness of my support stems from doubts over how many entries such a category would be useful for (should any object which is the subject of superstition go in, like ladder? So should something like sock go in Category:Armor because they can sometimes be made of mail? Ehh. So how many entries do we have that are definitionally about superstitions, like knock on wood?), and concern that if there's this much opposition they may be onto something, or at least might make a point of mis-filling the category. - -sche (discuss) 19:56, 29 December 2021 (UTC)[reply]

Borrowing or calque?

How should we classify words that could be analysed as either a borrowing or a calque? See for instance esimerkki: The Ingrian term is derived from Finnish, but since both components (esi- and merkki) exist in the target language, one can't possibly make a distinction between a borrowing and a calque. Ideas? Thadh (talk) 21:07, 9 December 2021 (UTC)[reply]

Personally I don't know if there's much of a problem with listing both if it is truly analyzable as such. It will pop up in many categories, but that's alright, if the situation calls for it. It will probably depend on a word-by-word basis, but this is something we often do on Polish pages, providing things such as "synchronically analyzable as" with the appropriate suffixes. And specifically referring to "calques that look like borrowings", I think calque MIGHT be better, given that it's a TYPE of borrowing, if you squint your eyes. Vininn126 (talk) 21:13, 9 December 2021 (UTC)[reply]
In a few cases we already use “borrowing or calque” (or “calque or borrowing”): elektronvolt, strēlnieks, Westfalish. —⁠This unsigned comment was added by Lambiam (talk • contribs) at 20:13, 10 December 2021 (UTC).[reply]
Yes, but whether they are categorised in "CAT:X terms borrowed from Y" or "CAT:X terms calqued from Y" is inconsistent, so I propose we make it consistent. Also, strēlnieks seems like something that could be (dis)proven to be either a borrowing or a calque, which is a separate case altogether. Thadh (talk) 20:40, 10 December 2021 (UTC)[reply]
See also the related discussion at Wiktionary:Etymology_scriptorium/2021/October#Romanian_asexualitate. Fytcha (talk) 20:55, 10 December 2021 (UTC)[reply]

Karia alphabet request

Discussion moved from Wiktionary:Requests for verification/Non-English#Karia alphabet request.

We would like to request the inclusion of the Karia alphabet used in Myaing Gyi Ngu, Karen State, which is not yet included in the wiki. The Karia alphabet is one of five Karen alphabets and is used by around 100,000 S'gaw Karen living in Myaing Gyi Ngu, Karen State. I learned this alphabet as a child, when it was not yet up to date; since 2019, Karen scholars in Myaing Gyi Ngu have been working to improve it.

  1. =/kàkʰo̰/
  2. =/kʰàkʰí/
  3. =/ɡàɡílà/
  4. =/ɡàɡí/
  5. =/ŋàθí/
  6. =/sa̰mɛ̀kʰa̰lí/
  7. =/sʰa̰sṵ/
  8. =/za̰mú/
  9. =/zàsúmḛ/
  10. =/ɲa̰wíɲɔ̀/
  11. =/papa̰lè/
  12. =/pʰa̰pʰè/
  13. =/ba̰pwè/
  14. =/ba̰kʰɔ̰mḛ/
  15. =/ma̰mɛ́/
  16. =/ja̰jà/
  17. =/ʃa̰kʰa̰lɔ́/
  18. =/ja̰ʃṵ/
  19. =/wa̰ʔɔ́nù/
  20. =/θàθwḭ/
  21. =/na̰nà/
  22. =/tʰa̰tʰṵ/
  23. =/dàdé/
  24. =/dàmó/
  25. =/dàpʰá/
  26. =/θa̰θà/
  27. =/ɓàbḛ/
  28. =/tàjɔ̀dé/
  29. =/mṵmɛ̀/
  30. =/la̰lá/
  31. =/ha̰húpʰá/
  32. =/rahùbḛ/
  33. =/la̰ka̰lì/
  34. =/ta̰tʰa̰tʰí/
  35. =/rakɔ̀bó/
  36. =/ʔa̰ʔḛ/
  37. =/na̰ná/
  38. =/ta̰taʔ/

I want this Karia alphabet included in the wiki, see this mnw: page for an example, thanks.--Music writer Dr.Intobesa of Japanese idol NMB48 and BNK48. (talk) 12:44, 10 December 2021 (UTC)[reply]

@咽頭べさ The page you posted this on is for nominating entries for deletion on the grounds that they have never been in use. This is where you should make requests such as this. Chuck Entz (talk) 23:01, 10 December 2021 (UTC)[reply]
After a little research, I'm not sure we can do this. The 38 characters in your list seem to all be in a Unicode Private Use Area (EF00-EF1F), which means that they have no official status in Unicode. They may display as Karia characters on your device, but they will probably display as something completely different on most others. For me, they display mostly as tofu, with a few displaying as random variations of various Latin-script letters.
To have entries, we would probably need to have an ISO script name and entry names consisting entirely of non-Private Use Area Unicode characters. There might be some way to work around this using image files and "Unsupported titles" entry names combined with support from our CSS files, but it would be rather tricky and unwieldy. Pinging @Erutuon, who would know more about the issues involved. Chuck Entz (talk) 23:37, 10 December 2021 (UTC)[reply]
@Chuck Entz, yes, the Karia alphabet is not yet official in Unicode due to its lack of modernity; there are many more unofficial alphabets in Burma than are in Unicode. To this day, we also encounter Unicode alphabet errors in the Mon language. Unicode may not have alphabet problems for other languages, but alphabet problems for the minority languages of Burma still exist today, and our experts are still researching and refining the Unicode alphabets for the minority languages of Burma on a daily basis. The Karia alphabet Unicode font was developed by Saw Tha Awa in 2021. We, as experts, are really looking forward to the official introduction of the Karia alphabet in Unicode.
*You can watch more than Karia alphabet reading video
The alphabet used in the S'gaw Karen language varies from region to region. Although they refer to the Karia alphabet as the original Karen script, we archaeologists do not find it in the ancient inscriptions. I have been doing archeology for nine years and to this day I have never seen the Karia alphabet in ancient inscriptions. As far as I know, the Karia alphabet was founded in 1963 by the Buddhist monk U Thuzana in Myaing Gyi Ngu. When I compare the Karia alphabet with the Carian alphabet, I see some similar letters. For whatever reason, I want their Karia alphabet included in the wiki so that the public can learn about it, thanks.--Music writer Dr.Intobesa of Japanese idol NMB48 and BNK48. (talk) 06:09, 11 December 2021 (UTC)[reply]
On a procedural/technical level, if a script isn't encoded in Unicode, we can't have entries in it or mentioning it in text form: we shouldn't use the Private Use Area characters, because these can't be expected to display correctly, and will even clash if another font encodes another unencoded script or anything else in the same spots. If there are examples of the script in use in what we would consider reliable and/or durably-archived sources, and people want to pursue adding images of "Karia script form: ..." to entries, that could be made to work (like is done for our one Ersu entry to show Ersu Shaba script), although the better approach might be to use those sources to get Unicode to add the script, and then come back here. - -sche (discuss) 04:37, 13 December 2021 (UTC)[reply]
@-sche, Can you accept that I am now using this kahkui. format, if you do not agree with this kahkui. form, you can delete it, but you need to explain to me why you should delete it, thanks.--Music writer Dr.Intobesa of Japanese idol NMB48 and BNK48. (talk) 20:26, 1 January 2022 (UTC)[reply]
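
For anyone who wants to check whether a proposed entry title or quoted string contains Private Use Area characters like the ones discussed above, a minimal Python sketch follows. It relies on the standard `unicodedata` module, where private-use code points have the general category "Co". The function name `private_use_chars` is made up for this example, and nothing here reflects how Wiktionary's own software handles such titles.

```python
import unicodedata


def private_use_chars(text: str) -> list[str]:
    """Return the code points in `text` that lie in a Private Use Area.

    Private-use characters have Unicode general category "Co", so this
    covers U+E000..U+F8FF as well as the two supplementary private-use
    planes.
    """
    return [f"U+{ord(ch):04X}"
            for ch in text
            if unicodedata.category(ch) == "Co"]


# U+EF00 is the start of the range mentioned in the discussion above.
print(private_use_chars("a\uef00b"))
print(private_use_chars("mulier"))
```

A non-empty result means the text depends on a particular font's private assignments and cannot be expected to display the same way on other devices.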

Dolphin (team member) etc

If I see a headline like "former Dolphin arrested" I know what it means but others might need help, so I'm thinking a definition like "member of a sports team such as the Miami Dolphins" would help. Is this accepted? A relevant RFD is for alouette. None Shall Revert (talk) 12:22, 12 December 2021 (UTC)[reply]

Lion has some entries. None Shall Revert (talk) 12:41, 12 December 2021 (UTC)[reply]
It is accepted. The whole scheme “someone who engages in” is not covered by particular provisions of WT:CFI. It is the same with Minecrafter and TikToker and even Georgian, which last the OED includes in spite of not including Georgia, while we have special provisions for place and brand names. You can’t really draw a line because otherwise you would end up banning Catholic, Tea Partier and Republican, just as German Armine is someone from Bielefeld’s major football club and then someone from several Burschenschaften and that brings us to political parties. Of course capitalization can’t have any influence on anything. Dictionaries seem more friendly towards discernible human activities than all the specific things that humans devise. Fay Freak (talk) 01:32, 13 December 2021 (UTC)[reply]
The definitions should be worded to include any team in any sport. It is silly to define Bronco to include only a member of, for example, the Denver Broncos when there are numerous school teams, such as the Broncos of Boise State University, Western Michigan University, Santa Clara University, or the schools of Bronxville, NY. An "especially" for the most common name would probably be warranted. BTW, there are Dolphins at Jacksonville University, College of Staten Island, Coastline Community College, LeMoyne College, etc. DCDuring (talk) 02:18, 13 December 2021 (UTC)[reply]
If it's attested it's fine AFAIK; Viking is another example, and Trojan and Cub, whereas Spartan, Bear, Raven and Colt are currently missing a sports-team sense. If there are a lot of sports teams with the name it's probably advisable to have a general sense as DCDuring says, with "especially..." or subsenses. (Now, what about e.g. Lidl or Kmart in a headline like "man robs two Lidls"/"two Kmarts"?) - -sche (discuss) 04:21, 13 December 2021 (UTC)[reply]
It’s a loophole. You cite Kmarter and then you can argue about including Kmart. I just see KFCer defined as a customer of Kentucky Fried Chicken. Fay Freak (talk) 04:46, 13 December 2021 (UTC)[reply]
We have WT:BRAND and WT:NSE that deal with brands and names of specific entities specifically, albeit unclearly. BTW, we have IBMer, IBM, but not w:International Business Machines. DCDuring (talk) 02:35, 14 December 2021 (UTC)[reply]

a symbol for syllable boundaries when they're different from hyphenation

The hyphenation, supplied in {{hyphenation}}, is oftentimes mistaken for syllabification at least as far as Hungarian is concerned, and with a good reason. There is a convention that single vowels are not left alone at the beginning or end of words (or their compound elements), e.g. alap (basis) and dió (walnut) are traditionally not subjected to hyphenation in Hungarian[1] even if they undoubtedly consist of two syllables each (a-lap, di-ó). (@Panda10)

I wonder if the above template could be upgraded so it can indicate syllables that are not to be separated for visual (not phonological) reasons. The most comprehensive orthography manual for Hungarian uses two distinct symbols: the hyphen for hyphenation and the middle dot for traditionally inseparable syllable boundaries. Could we possibly use something similar here?

On some occasions, a full stop (period) was inserted for this purpose as an ad hoc fix, but I'd prefer to have a more standard and uniform solution with general consensus. (The hyphen itself is not suitable since some words already have a hyphen. In fact, abbreviations have periods so this symbol is not the best either.) Adam78 (talk) 17:15, 13 December 2021 (UTC)[reply]

What about adding the syllable marks to the Hungarian IPA? So, [ˈɒ.lɒp] rather than [ˈɒlɒp]? Most languages do hyphenate the same way they syllabify, so I'm a bit hesitant supporting any site-wide changes... Thadh (talk) 17:28, 13 December 2021 (UTC)[reply]
In German it depends on whether you follow the 1996 reform rules or not. Only afterwards can außen be au·ßen.
Between 1996 (effective from 1998) and 2006 there can also be a single separated vowel letter, O·fen. Only between 1996 and 2006 do you have Bi·omüll.
Only since the 1996 reform are s and t hyphenated on a syllable boundary, as before it would look ugly in Fraktur with ſ, where you think of the ſt ligature as a unit. On the other hand, before the 1996 rules one separated the ⟨ck⟩ in Zucker against syllable boundaries into Zuk·ker.
But as I understand the {{hyphenation}} template is only to indicate hyphenation, not syllabification, so where’s the craic? Just don’t mistake it for syllabification? Fay Freak (talk) 18:29, 14 December 2021 (UTC)[reply]
Is it possible to make something for Hungarian like we have for Polish and Finnish? I mean pl-pronuncation. Vininn126 (talk) 17:43, 13 December 2021 (UTC)[reply]
@Adam78 If a systemwide change is not feasible, here are a couple of ideas for Hungarian-specific solutions:
  1. Simply add the middle dot to the current template where it is needed: {{hyphenation|hu|di‧ó}}, {{hyphenation|hu|a‧lap|sza|bály}}.
  2. Add two {{hyphenation}} templates with different labels. The first would be the current one with a Hyphenation label, the second would use the "caption" parameter to generate a Syllabification label: {{hyphenation|hu|di‧ó|caption=Syllabification}}
  3. Create a {{hu-hyphenation}} template that calls {{hyphenation}} and add the modification specific to Hungarian. There are Hungarian templates that apply this solution, such as {{hu-noun}} calling {{head}}. Panda10 (talk) 19:08, 14 December 2021 (UTC)[reply]

Thank you all!

@Fay Freak yes, the info on hyphenation is misleading; e.g. people may believe the two vowels create some diphthong together (such as in the name Zoé if no hyphen is inserted in the hyphenation), that's why we need to clarify the syllabification.

@Thadh, I wouldn't like to include something extra in the pronunciation as it may suggest one needs to pronounce something between those two syllables (like a glottal stop in German after prefixes, afaik).

@Panda10: The simple middle dot doesn't really make sense to me since the horizontal bar already has this result and it's not different from the current solution (but it's more difficult to type). Did you mean that the default middle dot should be replaced with a hyphen (as usual in Hungarian publications) and then the middle dot could be applied for the syllabic purpose? – The second is good in terms of appearance but it's a bit tedious to implement and it takes two lines where one would suffice. The third looks perfect, even if it may take a module to implement a replacement, e.g. from di.ó to di͜·ó or di⁐ó (what do you all think about these?), and this would also allow the template to display some note and/or include a link to a page where this special symbol is explained. So @Vininn126, indeed, this seems to be the workable way. Adam78 (talk) 20:41, 14 December 2021 (UTC)[reply]

@Adam78 You're right. The first option doesn't make sense, not sure what I was thinking. :) The character should not be something unusual such as in di⁐ó, or as in előadás. They look confusing to me. A regular period would be fine, since abbreviations are not hyphenated, it should not be a problem. The Finnish and Polish entries use the Syllabification label for all their entries, and I think this would not be a bad solution for us, just adding caption=Syllabification to all Hungarian hyphenation templates. But I understand that you prefer a new template and module. Panda10 (talk) 15:36, 16 December 2021 (UTC)[reply]

@Panda10 In fact, we cannot call it syllabification in general, as e.g. the syllabification of rendőr would be ren-dőr (despite its elements) and it's not what we want to display. I don't think a template can be avoided if we want to have a standard note and/or include a link about the symbol (as I wrote) and also in case we find something more suited later instead of the period. Adam78 (talk) 11:43, 17 December 2021 (UTC)[reply]

OK, let me know if there is anything I can do to help. Panda10 (talk) 19:13, 17 December 2021 (UTC)[reply]

Draft namespace

Is there any reason that the English Wiktionary doesn't have a Draft namespace like the English Wikipedia does? I think the same advantages and reasons for having one carry over. It could also be a place to store definitions for which we are awaiting attestations in durable media. 07:34, 15 December 2021 (UTC)[reply]

Also for building up citations. --None Shall Revert (talk) 12:38, 15 December 2021 (UTC)[reply]
Use the citations namespace. DTLHS (talk) 16:59, 15 December 2021 (UTC)[reply]
Sorta feels like the sandbox. Vininn126 (talk) 14:08, 15 December 2021 (UTC)[reply]
I don't see the need: an entry in a dictionary is a lot more brief than an encyclopedia article. —Justin (koavf)TCM 15:52, 15 December 2021 (UTC)[reply]
Some people create drafts of currently unattested terms as subpages of their user space: User:PseudoSkull/backwards_long_jump. I wouldn't trust in the permanence of that as an IP though. Fytcha (talk) 15:56, 15 December 2021 (UTC)[reply]
They should be using the citations namespace. DTLHS (talk) 16:59, 15 December 2021 (UTC)[reply]
I thought only citations go to the citations namespace, not complete articles that don't pass CFI yet. Do you have an example where a citations page is used in this manner? Fytcha (talk) 17:02, 15 December 2021 (UTC)[reply]
That's correct, citations and putative definitions but not complete articles. There would be no need to create a "draft" page for any other type of content. DTLHS (talk) 17:05, 15 December 2021 (UTC)[reply]
I disagree. There is value in PseudoSkull's numerous user space articles, not least because they can be found via search engines. Fytcha (talk) 17:10, 15 December 2021 (UTC)[reply]
I don't agree that draft articles should appear in external search engines at all. DTLHS (talk) 17:12, 15 December 2021 (UTC)[reply]
We also have {{under construction}} to provide some warning to users when extensive inline entry (or language section) revision is under way, though this is little used AFAICT. (But see Highland Park.) DCDuring (talk) 16:29, 15 December 2021 (UTC)[reply]
Let me revise my comments above. I think the citations namespace has a confused and unspecified purpose that partially overlaps with what you might call a "draft" namespace. It might be good to revisit what we think this namespace is for. DTLHS (talk) 17:19, 15 December 2021 (UTC)[reply]
I don't find the purposes of Citations namespace so much confused as multifarious. It holds citations that
  1. exceed those required for any specific definition
  2. are ambiguous
  3. do not have a specific definition
  4. attest to the existence, but not the meaning, of the headword
  5. attest to a usage feature, e.g. uncountability, passive use.
Those are just what come to mind. We could take the trouble to try to provisionally differentiate these with an eye to eventually codifying our approach. At present I don't see why we would want to exclude any of these from Citations space. DCDuring (talk) 18:42, 15 December 2021 (UTC)[reply]

Persian automated transliteration

Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

Following up on Module_talk:fa-translit#Review Calling Persian editors: (Notifying Ariamihr, Dijan, Mazsch, Qehath, ZxxZxxZ): and editors who might be interested. @Taimoorahmed11, Benwing2, Erutuon, Fenakhay, Fay Freak. Please ping more people I missed or consider updating the Persian group in Module:workgroup ping/data. --Anatoli T. (обсудить/вклад) 08:55, 15 December 2021 (UTC)[reply]

Also pinging @Tibidibi, Kaixinguo~enwiktionary who might be interested in this topic.

Update: rather sluggish response so far and no native speakers yet. Should we provide cases on how it could or should look on each option? I didn’t realise the complexity of multiple transliterations. Should we consider multiple headwords instead for modern Iranian and classical, @Benwing2? BTW, the Russian (and others) can use |or=. Isn’t that a solution? No need to cram everything with commas. --Anatoli T. (обсудить/вклад) 05:46, 17 December 2021 (UTC)[reply]
You could try soliciting editors from fa.wp or fa.wikt. —Justin (koavf)TCM 05:59, 17 December 2021 (UTC)[reply]

You can vote in all three options. If you disagree with the procedure or how the mini-vote is worded/structured, you can also comment separately, below this line.

Option 1

  1. Option 1: Persian headwords should have diacritics (allow multiple) and allow automated transliteration. Include an invisible "sukūn" (jazm) to have a more accurate transliteration and exclude false positives. Target modern Iranian transliterations (i.e. "e" and "o", not "i" and "u"). It's based on the status quo.


  1. Support. It seems to be the default and most common variety and I personally think it would be confusing to use the older transliteration (no longer in use and very different from modern standard Iranian Persian). We should provide alternative pronunciation in the pronunciation section. --Anatoli T. (обсудить/вклад) 09:29, 15 December 2021 (UTC)[reply]
  2. Support, if I'm allowed to vote, per Atitarev. None Shall Revert (talk) 12:35, 15 December 2021 (UTC)[reply]
  3. Support. Benwing2 (talk) 13:54, 15 December 2021 (UTC)[reply]
  4. Partial Support. I think only one transliteration standard should be included in the header, namely Iranian Persian, while a separate template should be made (possibly a replacement of Template:fa-regional) to represent the various dialects. Adding multiple transliterations in the header will make it look confusing, and another valid point is plurals and other cases.
    -Taimoor Ahmed(گل بات؟) 18:01, 20 December 2021 (UTC)[reply]


  1. Oppose, in favour of the third option. As a reader, I would like both the Iranian + Classical transliteration shown. —Svārtava [tcur] 06:48, 16 December 2021 (UTC)[reply]
  2. Oppose, since Persian encompasses multiple dialects and chronolects. Modern Standard Iranian Persian need not be overrepresented. ·~ dictátor·mundꟾ 14:54, 19 December 2021 (UTC)[reply]


Option 2

  1. Option 2: Persian headwords should have diacritics (allow multiple) and allow automated transliteration. Include an invisible "sukūn" (jazm) to have a more accurate transliteration and exclude false positives. Target classical transliterations (i.e. "i" and "u", not "e" and "o").




Option 3

  1. Option 3: Persian headwords should have diacritics (allow multiple) and allow automated transliteration. The headwords should display modern Iranian, classical and Dari transliteration (if different)


  1. Support. I think just Classical + Iran should suffice, with two different romanisations. Both can be automated if we do this right, and it's similar to the idea proposed for Hebrew (showing both Biblical and Israeli). —Μετάknowledgediscuss/deeds 00:50, 16 December 2021 (UTC)[reply]
    It should be fine, as long as we know the potential number of possible alternative transliterations. For the plurals, etc. the transliterations may need to be suppressed to avoid overcrowding. --Anatoli T. (обсудить/вклад) 03:23, 16 December 2021 (UTC)[reply]
  2. Support per Metaknowledge. —Svārtava [tcur] 06:48, 16 December 2021 (UTC)[reply]
    Partial Support. I think only one transliteration standard should be included in the header, namely Iranian Persian, while a separate template should be made (possibly a replacement of Template:fa-regional) to represent the various dialects. Adding multiple transliterations in the header will make it look confusing, and another valid point is plurals and other cases.
    -Taimoor Ahmed(گل بات؟) 19:07, 17 December 2021 (UTC)
    @Taimoorahmed11: It seems you support option #1 but you have voted in option #3. --Anatoli T. (обсудить/вклад) 22:19, 18 December 2021 (UTC)[reply]
    Sorry my bad
    -Taimoor Ahmed(گل بات؟) 17:59, 20 December 2021 (UTC)[reply]
  3. Support. ·~ dictátor·mundꟾ 14:18, 23 December 2021 (UTC)[reply]


  • I think we should display Persian headwords without diacritics. Note that writing Arabic fully vocalized is common but it's not the case with Persian (as well as most other languages that I know that use Arabic-based scripts). Most Persian dictionaries don't do that, and no Persian-Persian dictionary ever published does that. I think we should instead develop tools for editors to quickly add transliteration to Persian words in the editor mode. --Z 15:32, 22 December 2021 (UTC)[reply]


  1. Abstain. --Anatoli T. (обсудить/вклад) 09:33, 15 December 2021 (UTC)[reply]
  2. Abstain. This is a cool idea but I'm not sure how easily we can implement it given the current handling of transliteration. Probably better to show the Dari and standard Iranian pronunciations in the Pronunciation section, as you mention above. Benwing2 (talk) 13:54, 15 December 2021 (UTC)[reply]
    @Benwing2: What makes this any harder to implement than the option you supported? —Μετάknowledgediscuss/deeds 07:51, 16 December 2021 (UTC)[reply]
    @Metaknowledge The transliteration handling isn't set up to have multiple transliterations outputted. We could maybe hack it so the transliteration module itself outputs two or three transliterations, comma-separated with labeled HTML, but the code doesn't expect the module to do that so I expect various things will go wrong. Benwing2 (talk) 07:58, 16 December 2021 (UTC)[reply]
    @Benwing2: This is a very easily surmountable problem, and not one unique to Persian. As you can see at Wiktionary talk:About Hebrew, we intend to display two romanisations for Hebrew headwords. We should not shy away from a little more work if it's the right choice for the dictionary. —Μετάknowledgediscuss/deeds 08:08, 16 December 2021 (UTC)[reply]
    @Metaknowledge (a) it's not obvious to me it's the right choice, esp. to just display two transliterations side-by-side without glossing them; (b) "a little more work" is probably underestimating what needs to be done to get it right, based e.g. on the work I did suppressing hyphens in the display of Korean text but including them in transliteration. This took a lot of subtle work in the guts of places like Module:script utilities; given the complexity of the changes, I doubt very many people other than me and User:Erutuon could have gotten it working in a non-buggy fashion. My fear is that you try to implement multiple transliterations in a hacky fashion and it ends up having a bunch of edge cases that don't work right. Benwing2 (talk) 08:15, 16 December 2021 (UTC)[reply]
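To make the difference between the options concrete, here is a toy sketch of what "display each transliteration, if different" could mean for a fully vocalized headword. It is written in Python for illustration and is not the logic of Module:fa-translit; the tiny letter table, the function names and the scheme labels are all assumptions, and only the three short-vowel diacritics plus sukūn are handled.

```python
# Toy illustration only: two romanisation schemes that differ in how the
# short-vowel diacritics are rendered. All names and tables are hypothetical.

CONSONANTS = {"\u06AF": "g", "\u0644": "l"}  # گ and ل, just enough for the demo
SHORT_VOWELS = {
    "iranian": {"\u064E": "a", "\u0650": "e", "\u064F": "o"},
    "classical": {"\u064E": "a", "\u0650": "i", "\u064F": "u"},
}
SUKUN = "\u0652"  # marks "no vowel", so it produces no output


def transliterate(text, scheme):
    """Romanise a vocalized string under one scheme."""
    vowels = SHORT_VOWELS[scheme]
    out = []
    for ch in text:
        if ch == SUKUN:
            continue
        if ch in vowels:
            out.append(vowels[ch])
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        else:
            raise ValueError(f"character not covered by this toy table: {ch!r}")
    return "".join(out)


def headword_translits(text):
    """Return (scheme, romanisation) pairs, dropping repeats of an earlier result."""
    results = []
    for scheme in ("iranian", "classical"):
        romanised = transliterate(text, scheme)
        if all(romanised != earlier for _, earlier in results):
            results.append((scheme, romanised))
    return results


print(headword_translits("\u06AF\u064F\u0644"))  # گُل
```

Returning labelled pairs rather than one comma-separated string leaves the formatting (and any glossing of which scheme is which) to the caller, which is one way to sidestep the concern that downstream code expects a single transliteration string.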

Taboo language

I think we should create the label and appropriate categories for words used to avoid taboos. I was actually pretty surprised we don't have this yet, considering it's a pretty important feature of many languages. See for instance тшӧтшыд: I don't think there's any appropriate label we can use for such words yet. See also the appropriate Wikipedia article. Thadh (talk) 15:09, 16 December 2021 (UTC)[reply]

Support. If used with specific languages, this feels like a no-brainer. Vininn126 (talk) 15:14, 16 December 2021 (UTC)[reply]
I just discovered there's the label "avoidance", that links to the appropriate glossary term, so I guess we don't need the label after all. It would be nice to create a categorisation though. Thadh (talk) 15:59, 16 December 2021 (UTC)[reply]
Should we add "taboo" as an alias for "avoidance"? - -sche (discuss) 20:05, 19 December 2021 (UTC)[reply]
That would actually help, but I was primarily thinking about adding categorisation (something like CAT:Avoidance terms by language). Thadh (talk) 17:19, 20 December 2021 (UTC)[reply]


(@Sangjinhwa) Seems to mass-copy Wikidata / Wikipedia titles without proper care (e.g. missing genders). Should I mass-convert everything to {{t-check}}? Fytcha (talk) 22:10, 16 December 2021 (UTC)[reply]

Or remove the translations altogether. I've left them a message asking them to stop. Ultimateria (talk) 19:27, 17 December 2021 (UTC)[reply]

ISO 3166-2 codes

Preamble: ISO 3166 consists of three parts:

  1. ISO 3166-1: Official country codes (2 * 249) (all on Wiktionary)
  2. ISO 3166-2: Country subdivision codes (5123) (almost none on Wiktionary)
  3. ISO 3166-3: Not relevant here.

Proposal: Regulate whether, which and how ISO 3166-2 codes are to be documented.

Motivation: I've added all Swiss Canton two-letter abbreviations as German proper nouns because that's useful information to look up. The definitions are of the form "Abbreviation of ISO 3166-2:CH code of <canton>" (see e.g. FR) because this lets me link nicely to the corresponding Wikipedia page (click on the ISO 3166-2:CH). As Switzerland is a multilingual country, one could (should?) duplicate these entries to French, Italian, Romansh and Alemannic, at which point a single entry in the translingual section might be more sensible. However, listing the abbreviated ISO 3166-2 code for Swiss subdivisions while making no mention of other 3166-2 codes corresponding to the same two letters is misleading in my opinion (listing only some 3166-2 codes gives a false impression of completeness). It has to be pointed out that, to the best of my knowledge, ISO 3166-2 is only a formal documentation of subdivision abbreviations that had mostly been in use long before that (by postal services, the government etc.), and thus labeling the two-letter codes as (abbreviations of) ISO 3166-2 codes may obscure their true etymology. --Fytcha (talk) 23:23, 16 December 2021 (UTC)[reply]
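The completeness concern can be illustrated with a short sketch. A full ISO 3166-2 code consists of a country part and a subdivision part joined by a hyphen (e.g. CH-FR), so the same abbreviated form can belong to subdivisions of several countries. The Python below groups full codes by their abbreviated form; the function names are made up for this example, and "XX-FR" is a deliberately fictitious placeholder rather than a real code.

```python
from collections import defaultdict


def split_code(code):
    """Split a full code such as "CH-FR" into (country, subdivision)."""
    country, sep, subdivision = code.partition("-")
    if not sep or not country or not subdivision:
        raise ValueError(f"not of the form COUNTRY-SUBDIVISION: {code!r}")
    return country, subdivision


def group_by_abbreviation(codes):
    """Map each abbreviated form to every full code that shares it."""
    groups = defaultdict(list)
    for code in codes:
        _, subdivision = split_code(code)
        groups[subdivision].append(code)
    return dict(groups)


# "XX-FR" is a fictitious placeholder standing in for any non-Swiss code
# that happens to end in the same two letters.
print(group_by_abbreviation(["CH-FR", "XX-FR", "CH-GE"]))
```

A translingual entry built from such a grouping would list every full code behind an abbreviation, rather than only the Swiss one.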

Option 1

Allow the inclusion of all abbreviations of ISO 3166-2 codes in the translingual section. A sample of how this could look can be found here: User:Fytcha/FR


  1. Support, as every alternative is worse: having the same definition in 5 languages is worse, including only some of the subdivisions covered by 3166-2 is worse (misleading, in fact) and not referring to at least some kind of list (I really like the Wikipedia lists) is also worse. Also, my proposed article structure above turned out really neat in my opinion. --Fytcha (talk) 23:23, 16 December 2021 (UTC)[reply]
  2. Support. AG202 (talk) 12:34, 23 December 2021 (UTC)[reply]



Option 2

Allow the inclusion of abbreviations of ISO 3166-2 codes in the translingual section, but disallow adding them in bulk. They should only be added upon interest.




Option 3

Disallow the inclusion of abbreviations of ISO 3166-2 codes in the translingual section.



  1. Oppose. --Fytcha (talk) 23:23, 16 December 2021 (UTC)[reply]


Extend RFV deadline for CJK and Non-English

This is not a binding vote. I may start a formal vote once I've gotten more input.

Wiktionary:Requests_for_verification/Header currently reads: "Closing a request: After a discussion has sat for more than a month without being “cited”, or after a discussion has been “cited” for more than a week without challenge, the discussion may be closed. Closing a discussion normally consists of the following actions:".

It is very customary for requests to take longer than that and to be cited long after the deadline of one month, simply by virtue of there being far fewer editors of other languages (as compared to English), many of whom additionally possibly don't particularly enjoy quote-hunting. We should reflect the status quo in our formal policies. Moreover, this closes a potential loophole: an elaborate troll currently has the legal backing to go around and close loads of non-English RFVs, much to the displeasure of the respective language's editor community. I propose a lower bound of 2 months before deletion. --Fytcha (talk) 23:42, 16 December 2021 (UTC)[reply]
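For comparison, the closing rule quoted above and the proposed longer waiting period can be written down as a small check. This is a rough Python sketch under stated assumptions: "a month" is approximated as 30 days and "2 months" as 60, the "without challenge" condition is not modelled, and the function and parameter names are invented for the example.

```python
from datetime import date, timedelta


def may_close(listed_on, today, cited_on=None,
              uncited_wait=timedelta(days=30),
              cited_wait=timedelta(days=7)):
    """Return True if the waiting period for closing an RFV has elapsed.

    Uncited requests must have sat for more than uncited_wait;
    cited requests must have been cited for more than cited_wait.
    """
    if cited_on is not None:
        return today - cited_on > cited_wait
    return today - listed_on > uncited_wait


listed = date(2021, 12, 1)
today = date(2022, 1, 15)  # 45 days after listing

print(may_close(listed, today))                                   # current rule
print(may_close(listed, today, uncited_wait=timedelta(days=60)))  # proposed rule
```

Making the waiting period a parameter also accommodates variants in which the period depends on who is closing the request.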

@Fytcha We could simply decree that Wonderfool may not close RFV's. Benwing2 (talk) 04:28, 17 December 2021 (UTC)[reply]
I'd agree with that! Less boring work for Wonderfool :) Br00pVain (talk) 15:00, 22 December 2021 (UTC)[reply]
@Br00pVain: I haven't seen you frequently going around closing non-English RFVs. —Svārtava [tcur] 16:27, 22 December 2021 (UTC)[reply]
Yeah, I don't really speak non-English Br00pVain (talk) 22:17, 22 December 2021 (UTC)[reply]
Well, WF, you probably also speak Spanish and Wonderfoolese, which doesn't seem to be exactly English :P —Svārtava [tcur] 05:11, 23 December 2021 (UTC)[reply]
Either this or Benwing's idea sounds fine, this being more... formal, as it were. Vininn126 (talk) 21:13, 17 December 2021 (UTC)[reply]
Support. AG202 (talk) 03:32, 23 December 2021 (UTC)[reply]
  • This proposal may be very helpful for some languages and not so helpful for others. The need of the hour is for English RFV, where Wonderfool has been sending a ton of terms. But I send terms in my languages to RFV only after checking for attestation in Google Books, etc. and find myself impatient even having to wait a month for what is obviously uncitable. So here, I would not like a new 2 month rule. We could do something like this: if an editor of that language closes the RFV then one month is fine, and if a non-editor closes it, then 2 months. —Svārtava [tcur] 05:11, 23 December 2021 (UTC)[reply]
    @Svartava2: Good idea. Under your proposal, a delay of even 3 months (for non-editors) could seem reasonable. It would certainly yield more discretion to the specific language-editing subcommunity. Fytcha (talk) 12:04, 23 December 2021 (UTC)[reply]
  • I don't work on non-English entries (or only rarely), but generally speaking, for all languages, it seems to me that there should be a mechanism for flagging that a competent editor has made a reasonable effort to cite a term before it is deleted at RFV (including possibly the editor who listed it). An entry should not be deleted,* whatever the allowed time period, if no one has properly looked at it. Mihia (talk) 17:56, 24 December 2021 (UTC) * Unless speedily deleted as patent rubbish.[reply]
    • I agree, this is why editors sometimes answer RFVs with what they have come to know even if it resolves not—superficially unhelpful comments. Too bad that sometimes there is no effort reasonable. Like the supposed literal meaning of عَبَّرَ(ʿabbara) buried under its homonyms, I would dig up in hard-to-understand pre-Islamic poetry if I soothfast read such swinks. We can only mark then that we’d like to cite but are forced to believe the references or other circumstances because we lack the manhours and library expenses. In a case of Albanian borrowings in Serbo-Croatian I needed hours to but check whether reality of the terms is evinced, while none of these dialect terms I deem sham: you can just take it or leave it; often it is great enough and meeting practical need the most if there is credible chain of transmission, this is our sunna. We would need something like frequency warnings beside words down to “known from few old mentions” (in fact the absolute majority of the form and sense combinations of medieval Arabic dictionaries fails quality attestation), if it wasn’t to create too much work again as one would then debate how this frequency is constructed. Fay Freak (talk) 18:34, 24 December 2021 (UTC)[reply]


For karaoke I wanted to tell people that "sing karaoke" is a thing, I thought maybe there could be a 'collocations' section but it seems that's not allowed (vote: Wiktionary:Votes/2015-09/Adding a collocations or phrases namespace or section) so I put it in derived terms. You can also "do karaoke" or perform it, how should this kind of thing best be handled? General Vicinity (talk) 03:55, 18 December 2021 (UTC)[reply]

This is something I've thought about for a while. Other dictionaries, including other Wiktionary projects, do this as well. It would create tension but also solve some of the problems we have of "is this phrase SOP or not?", as some are collocations. One of my solutions to this has been to use uxi templates with the collocation in mind. I'm curious what other editors think. Vininn126 (talk) 10:18, 18 December 2021 (UTC)[reply]
Another example is party, you can throw one, hold one, have one. You could have three usexen, or put them all in one like "to throw/hold/have a party", put them in usage notes, find a citation for all of them, leave them out... General Vicinity (talk) 13:44, 18 December 2021 (UTC)[reply]
I am also of the opinion that something has to change in this regard. A laxer SOP policy (such as Imetsia's proposal) would be a step in the right direction as it would allow us to include more collocations as derived terms.
The main reason why I don't use Wiktionary for language-learning is exactly this: Collocations are indispensable linguistic information. For instance, I don't see anything in the article judgment that tells me which verb to use. Compare this to leo.org, which lists a plethora of really useful common verb collocations. Fytcha (talk) 13:45, 18 December 2021 (UTC)[reply]
What if we added a new header for collocations, and we could have a section with the given collocation and its translation? Vininn126 (talk) 19:00, 18 December 2021 (UTC)[reply]
It seems like there would be considerable administrative baggage in creating a new section header; it would be easier if collocations were accepted as "related terms" or "derived terms" or could be put under "usage notes". No-one has reverted my sing karaoke edit yet and I've put bank deposit, which was deleted, as a derived term at deposit to see how that flies. As for translations, if they are nonobvious a bluelinked entry can be created per WT:THUB and if they are obvious maybe they are unnecessary. General Vicinity (talk) 12:12, 19 December 2021 (UTC)[reply]
I could see usage notes. I think we need a way to not generate a bunch of new red links, as often collocations are not inclusion-worthy, which is kind of the whole thing here. Bank deposit is completely understandable from its parts, but should probably be listed somewhere. This is why I use something like the uxi template (compare amortyzacja). It's not the most elegant solution, but something LIKE that, again, perhaps under usage notes.
I can see your concern about adding a new header section, but it might be worth it, or at least worth considering. Vininn126 (talk) 12:31, 19 December 2021 (UTC)[reply]
We already have many entries with such related/derived terms, mostly as redlinks. As redlinks they offer a constant temptation to add an entry. Perhaps they could appear as unlinked terms. Because we have many different kinds of users, we might put all such unlinked collocations in a box (as the ones for derived and related terms etc) and have a setting or gadget that had the collocation box open or closed by default based on (registered) user preference. DCDuring (talk) 17:52, 19 December 2021 (UTC)[reply]
This could also be an interesting solution. Making a collapsible option is a good idea, as some words can have plenty of collocations that end up just making clutter. Perhaps there should also be a smaller option, for when a word only has a few. Vininn126 (talk) 18:10, 19 December 2021 (UTC)[reply]
The "show/hide" bar allows for an explanation of 'what lies beneath' (eg, "collocations: common phrases using PAGENAME"), so no heading is necessary if it appears under, say, derived terms. The show/hide bar doesn't take up too much more space than a single line. Perhaps hover text instead of a link.
We should also give some thought to what makes a good or bad collocation, so that we don't waste time on adding and deleting silly items. DCDuring (talk) 19:57, 19 December 2021 (UTC)[reply]
Coming up with some criteria would probably be a good idea, tho I'm not exactly sure what criteria we could use. Vininn126 (talk) 20:16, 19 December 2021 (UTC)[reply]
As you've noticed, SOP collocations currently get deleted as entries and pruned out of "related"/"derived terms" lists (and I support that, so I wouldn't support adding SOP collocations there). Currently, such things are "intended" to be given as usexes AFAIK, or in usage notes if there is more to say about them (different connotations between different collocations, etc). I've supported adding a namespace or section where they could exist and have their own translations. I'd still support adding a collocations section, so they don't swamp actually-idiomatic "derived terms" (or a namespace so each collocation could especially have its own header and nested translations section). The vote six years ago was pretty evenly divided but there seems to be more support(?) for collocations now. - -sche (discuss) 20:25, 19 December 2021 (UTC)[reply]
I think the problem was that inclusionists were worried that a collocations section would undermine the justification for the translation hub exemption: if the collocations could have translation tables, there would be no need to have a freestanding entry for an SOP collocation with non-SOP translations.
I would say that any proposal for a collocations section should address those concerns by making sure that the collocations sections could not be used to eliminate translation hubs. I'm not sure exactly how to do that, but that issue should be addressed.
Also to be considered would be the mechanics: I would prefer to have a separate namespace or at least a subpage to avoid overloading the main page. We currently have 19 entries in CAT:E due to out-of-memory errors in spite of our best efforts. Collocation sections would add more.
Aside from that, collocations would take up space on pages that are already extremely long in many cases, even if we were able to find a way to get all the etymology and pronunciation sections out of the way. Some common function words have dozens of senses, which would all require collocations.
That also raises the question of tying collocations to specific places in the entry. It's hard enough keeping the translation tables in sync with the definitions; this would be far more complicated, with each sense potentially having multiple collocations and some collocations fitting more than one sense. Chuck Entz (talk) 21:06, 19 December 2021 (UTC)[reply]
I think translation hubs could be moved to these new sections, if we decide to give collocations a translation section. A pain in the rear, to be sure, unfortunately. But it should at least be a relatively smooth process. Translation hubs always felt out of place to me, anyways. I dunno about other editors. Why wouldn't we want to migrate them? It would in theory be at no loss of inclusion and clean things up a bit, centralize them. As for translations - perhaps we'd need a module that would include a built-in translation table of sorts into each collocation. This may potentially clash with the idea that the entire section should be collapsible. Vininn126 (talk) 21:26, 19 December 2021 (UTC)[reply]
Chuck makes a good point/reminder that some people opposed collocations sections because they didn't want THUBs or other borderline-idiomatic things to be moved out of having their own main-namespace entries and into mere sections of other entries or another namespace, so it might be best to leave the tiny fraction of collocations which are THUBs alone for now and have the collocations sections only handle the vast majority of collocations that don't meet criteria for having their own entries, at least if we're including translations in the sections/namespace. (If the sections are just listing collocations like derived terms sections just list derived terms, then sure, I see no problem with including links to entry-having collocations in the lists.) Given page size and memory-usage concerns, Chuck is right that we should probably be looking at a namespace (rather than a new L4 section) if we wanna include translations tables. I don't think this would require any new modules, though: just have a page like "Collocations:foobar" with L2 language headers and L3 headers for each collocation, like "===take foobar===", "===make foobar===", and then put a translations table under each collocation/header...? - -sche (discuss) 01:16, 20 December 2021 (UTC)[reply]
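The page layout suggested above (L2 language headers, one L3 header per collocation, a translations table under each) can be sketched as generated wikitext. This Python is only an illustration: the function name is invented, the "Collocations:" namespace does not currently exist, and the use of a bare {{trans-top}} / {{trans-bottom}} pair directly under each header is an assumption about how such a page might be filled in.

```python
def collocation_page(collocations_by_language):
    """Build wikitext for a hypothetical "Collocations:" page.

    collocations_by_language maps a language name to a list of collocations.
    """
    lines = []
    for language, collocations in collocations_by_language.items():
        lines.append(f"=={language}==")  # L2 language header
        for collocation in collocations:
            lines.append("")
            lines.append(f"==={collocation}===")  # L3 header per collocation
            lines.append("{{trans-top}}")  # empty translations table
            lines.append("{{trans-bottom}}")
    return "\n".join(lines)


print(collocation_page({"English": ["take foobar", "make foobar"]}))
```

Because each collocation gets its own header, a translations table can be attached to it without any new module, which matches the point made in the comment above.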
I could see this working. Vininn126 (talk) 13:27, 23 December 2021 (UTC)[reply]
The "Derived terms" and "Related terms" sections are presently not used consistently throughout Wiktionary, and/or not used according to WT:ELE (though that documentation needs more examples of simple cases that are frequently encountered, e.g. of open, closed and hyphenated compounds; the main discriminating example for "related" is highly specialised, even pedantic). Also there are "Hyponyms", "Hypernyms" etc. that are patchily or inconsistently used in overlap with "Related terms" and "Derived terms". I am a little nervous about adding yet another section, "Collocations", which editors may not in practice consistently distinguish from all the others, creating even more randomness. If anything, I would like to see more consolidation of these sections, or at least in a way that prompts editors more clearly about which categories are available, and what should go where. Mihia (talk) 11:11, 23 December 2021 (UTC)[reply]
We'd have to use our heads as to what gets moved to a collocation and what doesn't. And while derived and related terms aren't used often, I don't see that as an argument for their non-existence. Not every page needs each and every L4, but some pages need certain ones. Vininn126 (talk) 13:28, 23 December 2021 (UTC)[reply]
On the contrary, I find that derived and related terms sections are used quite often (for English entries), but they may be used indiscriminately. It may be that editors at this forum have a clear idea of what is a collocation, what is a derived term, and what is a related term, but I am saying that many others will not, and they will probably not bother to read or understand WT:ELE either, or even be aware of it, if the present situation is anything to go by. Generally speaking, I would like to see more structured editing tools to enforce more consistency, so for example a tool to add a "see also", in the most general terms, that prompts the editor to choose a category, with brief and clear explanations at that point of what should go where. Mihia (talk) 13:59, 23 December 2021 (UTC)[reply]
I thought that it would be obvious that "derived terms" meant (etymologically) derived + terms in the language of the L2 section. Only slightly less obviously, "related terms" meant (etymologically) related + terms, ie, principally cognates in the same language. I also thought that both of these should, in principle, exclude terms that are SoP, so mere collocations would be excluded.
One serious problem with having a separate heading for collocations is that for polysemic terms the collocations usually apply only to specific definitions. A display something like that for synonyms ({{synonyms}}) might be better than the other ideas that have been advanced so far. DCDuring (talk) 15:55, 23 December 2021 (UTC)[reply]
So we could have the collocations directly under the definition? Vininn126 (talk) 16:00, 23 December 2021 (UTC)[reply]
I guess you could say the same about other sections such as "Derived terms" and "Related terms", but I suppose in those cases there is (or should be) a definition to link to, which ought to clarify which definition(s) are relevant, so the problem may be more pronounced with collocations, which (I assume) would not have linked definitions. Putting the collocations alongside the definitions would have the effect of separating them further from e.g. Derived/Related terms, so might increase the likelihood of duplications and misplacements. Also, how long could the lists become? If they need to be collapsed then that would be unsightly unless the collapsible section can be made "lighter" in look and feel than the present implementations, which I guess would be technically possible ... So ... dunno, but on the other point, unfortunately the use of the related/derived sections may not be as "obvious" to others as it is to you, and I have created a new thread below at Wiktionary:Beer_parlour/2021/December#Derived_terms_and_Related_terms. Mihia (talk) 19:27, 23 December 2021 (UTC)[reply]
I'm not imagining there to be very many. The kinds of collocations I'm thinking should be there would be VO for verbs (principally light verbs) and nouns, AdjN and AdvAdj for adjectives, etc. I would be inclined to keep prepositions and any characterization of complements in labels. Less common words might warrant some loosening of restrictions. My main concern is whether we can successfully enforce any kind of limits on collocations. We seem to have a lot of problems just with SoP terms. DCDuring (talk) 22:33, 23 December 2021 (UTC)[reply]
If we're talking about fairly short lists of collocations next to individual definitions, we may be more or less back to a point where they are like usexes. In some places we do effectively already have these, e.g. "a hot topic" under "hot" definition "Of great current interest; provoking current debate or controversy". If e.g. "hot topic" is, give or take an article, a collocation in our terms, perhaps we actually don't need to do anything new that we're not doing already? Mihia (talk) 22:52, 23 December 2021 (UTC)[reply]
Yes, they would be more like usexes. Many of our better usexes have addressed points of grammar. Collocations are often more about semantics and custom. I was coming to the same idea: at least initially we should just include collocations as usage examples, possibly using a template like {{synonyms}}, but without linking. After we have done that productively we can see whether that is satisfactory in general and whether some definitions need something more. DCDuring (talk) 23:22, 23 December 2021 (UTC)[reply]

Collocation criteria

We might as well separate this from the main discussion, which is pretty long.

I would say that a collocation should either

  1. demonstrate something about the semantic or syntactic properties of the term, or
  2. show common types of usage, so readers can identify usage they may encounter. Chuck Entz (talk) 20:22, 19 December 2021 (UTC)[reply]
I think we should take into consideration different parts of speech, such as what prepositions, verbs, adjectives etc might collocate with the given thing. Vininn126 (talk) 21:27, 19 December 2021 (UTC)[reply]
I have three basic requirements for a collocation:
  1. It should be a collocation (def. 3) [and not entry-worthy]
  2. It should be "common" (tbd)
  3. It should be short (eg, < 4 words, < 30 characters)
I think it should be a phrase. We already can place information about following prepositions/adverbs and clauses in a label on a definition line, though we don't seem to do so as much as would be warranted.
Reference to and study of the Collins Cobuild Dictionary and the Longman Dictionary of Contemporary English would be worthwhile; both are exemplary in presenting this kind of information within the limits of a printed page. DCDuring (talk) 23:00, 19 December 2021 (UTC)[reply]
Those are good criteria imo. I think determining how common something is is going to be difficult without a reliable way to extract information from corpora, to say nothing of languages without a solid body to work with, and I wonder if editors are going to have to end up relying on intuition more often than not, especially when working on collocation sections of non-English entries. Vininn126 (talk) 00:22, 20 December 2021 (UTC)[reply]
I am skeptical about FL collocations. Shouldn't they mostly be in each language's respective Wiktionary, where there should be a sufficient number of contributors with the appropriate intuition? But I defer to my betters in this regard.
With respect to "common", Google N-Grams works for 5 or fewer words. We need some kind of absolute minimum frequency for each of 2-, 3-, and 4-grams.
I don't think it is so easy with respect to deciding that something is in fact a "collocation" (def. 3). We have Appendix:Collocations of do, have, make, and take which would provide a lot of help for the nouns they collocate with. They might be a test-bed for the methodology for looking for 'objective' means of identifying true collocations. DCDuring (talk) 03:23, 20 December 2021 (UTC)[reply]
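As a rough illustration of how the "short" and "common" criteria above might be screened mechanically, here is a minimal Python sketch. It is only a sketch under stated assumptions: the function names and the threshold values are hypothetical placeholders, not agreed figures, and the frequency is assumed to be supplied by the caller (for example, looked up by hand in an n-gram viewer). It does not attempt the harder question of whether a phrase is a true collocation or whether it is entry-worthy.

```python
# Hypothetical minimum relative frequencies per n-gram size.
# These numbers are placeholders for illustration, not proposed policy.
MIN_FREQUENCY = {2: 1e-7, 3: 1e-8, 4: 1e-9}


def is_short(phrase, max_words=3, max_chars=29):
    """Check the 'short' criterion: fewer than 4 words, fewer than 30 characters."""
    words = phrase.split()
    # A collocation needs at least two words.
    return 2 <= len(words) <= max_words and len(phrase) <= max_chars


def is_common(phrase, frequency, thresholds=MIN_FREQUENCY):
    """Check the 'common' criterion against a per-length minimum frequency."""
    minimum = thresholds.get(len(phrase.split()))
    # Phrases whose length has no threshold defined are rejected.
    return minimum is not None and frequency >= minimum


def passes_screen(phrase, frequency):
    """Combine both mechanical checks; editorial judgement is still required."""
    return is_short(phrase) and is_common(phrase, frequency)
```

With these placeholder thresholds, `passes_screen("return a verdict", 2e-8)` would pass, while a single word or a five-word phrase would be rejected regardless of frequency.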
Would it make sense to exclude adjective-noun collocations and verb-adverb collocations, at least initially? Noun-verb (SV) and verb-noun (VO) collocations seem more important to me. DCDuring (talk) 03:28, 20 December 2021 (UTC)[reply]
Verb-object collocations probably are more important. However I think certain noun-adjective ones are important as well.
As to the foreign language collocations - look at for example zdecydowany. I would love to be able to change that from a definition or usage example to a collocation. Vininn126 (talk) 12:04, 20 December 2021 (UTC)[reply]
@Chuck Entz, Vininn126, DCDuring, -sche, Mihia: I'm thinking of starting a vote for this (in a couple of days/weeks when I'm less busy) because this is something I really feel strongly about: Not including collocations is a huge blow to Wiktionary in terms of usability in the context of language-learning. As a side effect, I could see how this would also lead to people adding far fewer SOPs as standalone articles. Would you be in favor of the following points?
  1. Add a "Collocations" header to WT:EL, one level higher than the corresponding part of speech header (so L4 if the POS is L3), between "Derived terms" and "Related terms" (alternatively between "Descendants" and "Translations").
  2. Collocations must not be entry-worthy (otherwise they belong to derived/related terms).
  3. Collocations may only be listed for lemmas and must include the lemma or a non-lemma form of the lemma.
  4. Collocations must be common (tough to flesh out in a way that can be applied mechanically but maybe this is not even necessary thanks to good judgement; maybe this can also be formulated semantically rather than statistically, i.e. the collocation should demonstrate something "surprising", e.g. that one usually says "return a verdict" as opposed to *“give back a verdict”).
  5. Collocations should not be entire sentences, jokes etc.
  6. Collocations must be grouped with {{s}} if necessary and listed with {{ux}}/{{uxi}} (or something equivalent that doesn't use Lua) (English translations of non-English collocations should be idiomatic (1) in English, not necessarily word-by-word). If there's too many, put a collapsible box there like {{derived terms}}.
I'm not sure whether we want translations of collocations; that would be a bit harder to organize tidily. Very open to suggestions. — Fytcha T | L | C 〉 10:58, 31 January 2022 (UTC)[reply]
Would we include foreign collocations? Starting a vote does not seem like a bad idea. It might get shot down but that at least moves the discussion forward. Vininn126 (talk) 11:30, 31 January 2022 (UTC)[reply]
I like the idea but there's a lot of overlap with {{usex}}. Maybe we could have a new type of usex which is specifically a collocation ({{usex|colloc=1}})? That would avoid extra headers, and the awkward use of {{sense}} to link them with senses (which gets very confusing for multi-sense entries). – Jberkel 12:18, 31 January 2022 (UTC)[reply]
I would support a parameter designating a template for collocations along the lines of {{usex|colloc=1}} as proposed by Jberkel. I hope someone takes it to vote. Allahverdi Verdizade (talk) 18:30, 6 February 2022 (UTC)[reply]
Can we do some kind of mock-up of the possibilities, including Jberkel's idea? DCDuring (talk) 14:54, 31 January 2022 (UTC)[reply]
@DCDuring, Jberkel: I've created User:Fytcha/assistance (feel free to edit). In my opinion, 1 is the neatest with the least amount of duplication, but it will get out of hand quickly for terms with many senses, each featuring many collocations. 2 is somewhat of a middle ground, offering all relevant collocations while not sacrificing readability (the choice between {{s}} / {{der-top}} depends on the number, while for terms with only one sense such as assistance we can also just use neither, as we do for nyms). 3 is by far the most comprehensive but, apart from Finnish translations, most of these translations boxes will probably stay empty for a very long time.
My personal preference is to not have translations of collocations but instead allowing collocations to be added to the foreign language terms themselves (there with translation). I prefer solution 2. — Fytcha T | L | C 〉 10:13, 4 February 2022 (UTC)[reply]
Agreed to having separate collocations. Vininn126 (talk) 11:38, 4 February 2022 (UTC)[reply]
Option 2) is probably best for English entries, as the existing table header structure with glosses can just be copied. Maybe the overlap of usexes and collocations isn't a big deal, especially when sticking to WT:USEX: "grammatically complete sentences, beginning with a capital letter and ending with a period, question mark, or exclamation point." When using a usex to list a collocation this is often not followed. Maybe these "incomplete" usexes can then just be moved. Regarding foreign collocations, I don't see why we shouldn't have them. We could start with English first, and see how that works out before adopting it for all languages. – Jberkel 23:54, 4 February 2022 (UTC)[reply]
  1. The way these look for a single-sense English entry is not much of a test. People seem to have more trouble with polysemic and, especially, highly polysemic English terms like the light verbs (eg, do, have, take, make), but also most common monosyllabic verbs, such as go, run, walk, talk, think, read.
  2. These kinds of verbs are also often used for what we often categorize as "phrasal verbs" (VERB + "particle"), but which can readily be construed as simply VERB + ADVERB. In fact, almost all "phrasal verbs" have or should have an {{&lit}} definition, which suggests the possibility that the VERB + ADVERB combination could be a common collocation.
  3. This leads me to have a problem with User:Fytcha's criterion 2 above ("Collocations must not be entry-worthy (otherwise they belong to derived/related terms).").
  4. It certainly seems a bad idea to apply it to function words (prepositions (at least the common ones), closed-class "adverbs", conjunctions, determiners), as User:Vininn126 implied.
  5. I wonder whether, at least initially, we should limit the application of the collocation idea to, say, nouns (at least in English) and then extend to the other "open" word classes, possibly excluding modal, auxiliary, and light verbs, which are also essentially function words.
  6. Just what normal-user problems are we trying to solve with the collocation idea? What evidence do we have that the problem exists? DCDuring (talk) 19:03, 5 February 2022 (UTC)[reply]
    Re 4, 5: I agree wholeheartedly. Nouns are the safest bet in my view too.
    Re 6: The problem of wanting to know which verbs to use with certain nouns for instance. As I pointed out in #Collocations, a dictionary that doesn't tell the user which verb to use in combination with "Hilfe ..." (leisten etc.) (compare leo.org) or "... judgment" (pass, deliver etc.) (compare leo.org) is unsuited for language learning. This is essential linguistic information that anybody who strives to sound natural wants to learn. — Fytcha T | L | C 〉 09:59, 6 February 2022 (UTC)[reply]
    The Leo example illustrates some problems:
    1. It is hard to match verbs against definitions
    2. Not all the English VERB + judgment combinations seem right to me (in American English).
    3. It takes up a lot of space for polysemic words. DCDuring (talk) 15:50, 6 February 2022 (UTC)[reply]
@Jberkel: Thanks for bringing this to my attention; I wasn't even aware that we have this policy regarding usexes. Regarding the last point, sorry if I didn't express myself clearly, I am in favor of having collocations in languages other than English of course, I just think that we shouldn't have translation boxes for collocations in English entries. To put it concretely: The collocation "Hilfe leisten" should be listed in Hilfe, not as a translation of a collocation in aid/help. — Fytcha T | L | C 〉 10:04, 6 February 2022 (UTC)[reply]

Nicolae Sfetcu, and other book authors who just copy Wikipedia

Today I happened to be citing words on two totally unrelated topics, and was surprised to see the author name Nicolae Sfetcu come up in Google Books searches for both of them. (Indeed, if you search Wiktionary, he has been quite widely cited by us.) Having done a little more research, I am coming to the conclusion that Nicolae Sfetcu's books are not written by him, but are mere dumps of articles taken from Wikipedia. This means we may have citogenesis problems.

As this problem gets worse over time (since SEO scum will never decrease in numbers) I wonder what we can do to avoid such issues. A blacklist of author names doesn't seem very helpful since they can always invent more names. Perhaps we need something more like the anti-plagiarism text checkers used by essay graders. Equinox 23:58, 19 December 2021 (UTC)[reply]

The blacklist could work to reduce the problem until the techno-solution is in place. DCDuring (talk) 00:02, 20 December 2021 (UTC)[reply]
Could we ban cites from any work that mentions Wikipedia and/or Telework in its front matter? DCDuring (talk) 00:13, 20 December 2021 (UTC)[reply]
Any ideas on how we could systematically search for other authors with the same issue? DTLHS (talk) 00:17, 20 December 2021 (UTC)[reply]
I wonder whether there are any quite like this one. See "about the author" here. DCDuring (talk) 01:16, 20 December 2021 (UTC)[reply]
A tedious brute-force idea would be: from a database dump, extract all quotations from English entries (possibly other selected languages, but surely we don't need to process most languages, nor other parts of the entry where we may legitimately duplicate Wikipedia's wordings, or which, like translations tables, would needlessly balloon the size of the final document), put them into one or more documents depending on how big a text the aforementioned plagiarism checkers can handle, and run them through (checking against Wikipedia articles). What detectors are available to us? There's "Earwig's copyvio detector" but does it work for comparing our quotations to Wikipedia, or comparing Wikipedia pages to books? (But we want to do so systematically and not one entry at a time, bah.) For example, Sfetcu is currently cited in BEV, Hi-NRG, mateine, recordist, mangadom, longcoat, Italo, billing, psybient, futurepop, predub, sexfight, Aepyornis, hold-up, maverick, guro, eggcrate, press, blam, and alcohol, but Earwig's tool doesn't seem to catch any issues with the citation of him(?). Otherwise, search Google Books for books that cite Wikipedia, especially if there are any formulaic phrases they use to do so, make a list, and make a script that searches our entries for anything on the list (again, tedious). - -sche (discuss) 02:01, 20 December 2021 (UTC)[reply]
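The brute-force comparison described above could be sketched roughly as follows. This is a minimal illustration only, assuming the quotations have already been extracted from a dump into a dictionary keyed by entry name and that the relevant Wikipedia text is available as a plain string; the function names, the 5-word shingle size and the 0.8 threshold are all hypothetical choices, and this is a simple word n-gram overlap, not how Earwig's tool or any existing detector works.

```python
import re


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in a text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(quotation, reference, n=5):
    """Fraction of the quotation's shingles that also occur in the reference."""
    quote_shingles = shingles(quotation, n)
    if not quote_shingles:
        # Quotation shorter than n words: nothing to compare.
        return 0.0
    return len(quote_shingles & shingles(reference, n)) / len(quote_shingles)


def flag_suspects(quotations, reference, threshold=0.8, n=5):
    """List the entries whose quotation overlaps heavily with the reference text."""
    return [
        entry
        for entry, quote in quotations.items()
        if overlap_ratio(quote, reference, n) >= threshold
    ]
```

A flagged entry would still need a human check, since a high overlap shows only that the wording matches, not which text copied which.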
Why don't we just replace these 20 citations with better ones? We could also use a filter to warn about use of cites with his name. The big problem is more fun, but seems like a potentially bad time sink. We could do something like collect all citations from 21st century works and run that text against current and older versions of WP. Thereafter we could run only new cites from 21st century sources against the same WP versions. This seems like a good project to propose when WMF (or whoever) asks for techno project proposals. (Is there already a tool or discussion of a tool at WP for this?) DCDuring (talk) 02:18, 20 December 2021 (UTC)[reply]
Another low-tech solution/workaround is to simply avoid using self-published/vanity-press sources whenever possible. Unfortunately Gbooks surfaces a lot of crap and makes no distinction of the source. – Jberkel 20:02, 20 December 2021 (UTC)[reply]
How could we objectively determine what was self-/vanity-published? Maybe if the author, not the publisher, has the copyright and the author isn't covered by a pedia article? DCDuring (talk) 20:47, 21 December 2021 (UTC)[reply]
You can often judge a (self-published) book by its cover, and I sometimes search for the publisher name to make sure. If it's not listed on Wikipedia, it's very likely self-published. Bigger vanity-press outfits have their own article. – Jberkel 21:06, 21 December 2021 (UTC)[reply]
Is there a database of vanity publisher names/imprints? Can we compile one? DCDuring (talk) 02:24, 22 December 2021 (UTC)[reply]
lol, the "don't feed the troll" rule finally fails. Making a list of these fuckers might not hurt. Equinox 04:03, 22 December 2021 (UTC)[reply]
I've noticed in recent GB searches that Google now displays warning boxes for these types of books: "majority of content taken from Wikipedia" or something to this effect. Jberkel 23:57, 4 February 2022 (UTC)[reply]

User:PulauKakatua19's recent religious/political edits

The majority of their recent work looked fine to me, but in the part of it that didn't, I believe there to be a discernible bias. An exposition:

Pinging @Equinox. Fytcha (talk) 14:03, 20 December 2021 (UTC)[reply]

Labelling everything right of Che Guevara as "fascism" certainly doesn't thrill me, but I'm probably a minority in that. Equinox 14:50, 20 December 2021 (UTC)[reply]
I agree that Category:en:Fascism is applied too liberally. Lügenpresse? feminazism? Fytcha (talk) 15:09, 20 December 2021 (UTC)[reply]
I did this because those terms are used by those followers of various ideologies, so I consider them to be relevant. Perhaps from now onwards I should be more specific, using relevant categories only if it is specific to their ideology/belief or used mostly by self-described ideologues. E.g. "sanghi" is specifically for Hindutvas, whereas "nationalist" is too broad to be for one specific group.
I personally consider the alt-right as fascism because their beliefs overlap (e.g. totalitarianism, racism, devotion to a dictator, etc.). And to a lot of left-wing people on the internet, anything that is far-right is considered "fascism" as an umbrella: and even far-leftists will label everything that is remotely right-wing as "fascism" simply for opposing them.
If not, "fascism" here should only be for self-described fascists, rather than encompassing everything that is far-right or extreme right, which will be part of separate categories. The same for the alt-right, which should be a separate category for being a separate set of ideologies from fascism proper.
Conservatism in general is not fascism to me because it's not as extreme. As for secularist terms placed under "atheism", I did so because it's usually atheists who support secularist policies.
I am sorry for the trouble caused. PulauKakatua19 (talk) 15:35, 20 December 2021 (UTC)[reply]
It will only be wrong, @PulauKakatua19, this all sounds irrecoverably confused, please stay away from categorizing ideologies; but great you are sorry.
But I see no point in the category Category:Fascism anyhow, that you might have seen in another way. Like almost all legitimate entries it contains are compounds with fascist or fascism and otherwise there was no ideological system with terminology peculiar to it, it was a syncretism of various right-wing and right-on views, held together by some regular commixions of pseudoscience. (Marxism, obviously, has much more of marked language.) And those views do not distinguish anyone, all citizens of any country are statists and conservatives by default but just few vocal and systematic about it, so no, there are just offensive terms but you won’t succeed in gathering everything vaguely rightist or vaguely extremely rightist. Fay Freak (talk) 15:41, 20 December 2021 (UTC)[reply]
I still believe that Category:Fascism and Category:Conservatism are still relevant, so people can understand which terms are more often used by supporters from those groups, or are part of their teachings, similar to how Category:Marxism is for various terms used in Marxist writings or often used by Marxists.
The groups I am referring to are self-described fascists (like Mussolini) and conservatives (like the Republican Party of America, UMNO, and other political groups that are more conservative). For their categories, a cleanup is necessary to sort which terms are more relevant and which ones are not used by the groups/relevant to their beliefs. PulauKakatua19 (talk) 15:58, 20 December 2021 (UTC)[reply]
Being “more often used” by a group is not enough grounds for a term to be categorized into that group. Also those supporters are very rare. You can’t just take over what kids on the internet meme about Mussolini and Hitler and a possible revival, maybe it’s not even fascism. Fay Freak (talk) 16:06, 20 December 2021 (UTC)[reply]
Self-described fascists are fine for such a category. Conservatives aren't. I'm not a fan of conservatism, but labeling conservatism as fascism is more name-calling than anything else. Chuck Entz (talk) 16:05, 20 December 2021 (UTC)[reply]
Re "alt-right as fascism": I don't want to discuss the degree of overlap, let me just add that I see little use in categorizing alt-right vocabulary almost indiscriminately under fascism when we already have alt-right as a subcategory of fascism, just like we don't doubly categorize phonetics items under linguistics.
Re "to a lot of left-wing people on the internet, anything that is far-right is considered "fascism" as an umbrella": I agree that fascism is thrown around a lot in informal political discourse but I strongly disagree that such crass partisan abuse of terminology should be the basis of our categorization. By the same token, we surely also don't want welfare or health care to be categorized under socialism and communism. Let's stick to reality.
Re "it's usually atheists who support secularist policies": There might be some degree of correlation but none that warrants categorizing secularism-related vocabulary under atheism. Fytcha (talk) 17:49, 20 December 2021 (UTC)[reply]

I also agree that much of the categorization exemplified here is odd and over-simplistic. Certain groups like the alt-right or Hindutva largely communicate within their bubbles, which to a certain extent goes along with a tell-tale lingo ("the limits of my language mean the limit of bubble"). Also Marxists have an age-old stock phrasebook. But that's quite a different thing from assigning vocabulary to Category:Fascism because a) fascists happen to use it, even though others use it too, or b) because in partisan simplistic thinking "fascism" is extended to mean anything that is not dogmatic left. –Austronesier (talk) 21:27, 21 December 2021 (UTC)[reply]

"I personally consider the alt-right as fascism". You're like 20 years old though and you've seen nothing. Don't use categories to propagate your opinion. Use them because they are useful for categorising things. Equinox 00:37, 22 December 2021 (UTC)[reply]

Deleter-role vote

It looks like the deleter-role vote will not pass. One main objection from the opposition is that the nomination/approval process is too lax. This argument was never raised prior to the start of the vote itself, and I therefore couldn't have known to accommodate it. I would support tightening up the qualification process, but no one ever mentioned it during the discussions preceding the vote. Given this unusual circumstance, we should at least probe the possibility of changing the vote in midstream to make this modification. This would be quite an unusual remedy (probably unprecedented), so I am asking here whether other users would accept this. Imetsia (talk) 19:46, 20 December 2021 (UTC)[reply]

I agree that this point should have been raised earlier, but enough opposers have already pointed out that this isn't any urgent necessity. I see no harm in letting this vote go on as it is and trying again after 3-4 months. There could be other objections that have not been raised yet, and it isn't even 10 days since the vote was started. IMO these votes are a way of knowing the community opinions, and not merely for implementing something. Let this vote have its value, so that the next time this proposal is brought forth, there is better information about the flaws in the original one. I don't think there's any need to rush. —Svārtava [tcur] 06:54, 21 December 2021 (UTC)[reply]
No. You won’t invent an alternative process between the whitelist and an admin-scale vote that fast either. Fay Freak (talk) 06:55, 21 December 2021 (UTC)[reply]

v sounds in Japanese

アール・ヌーヴォー (art nouveau) gives a̠ːɾɯ̟ᵝ nɯ̟ᵝːbo̞ː as the IPA but āru nūvō as the Romaji. This seems non-ideal. --General Vicinity (talk) 10:20, 22 December 2021 (UTC)[reply]

@General Vicinity: Counterintuitive but not wrong. Wikimedia uses Hepburn romaji which transliterates ヴ as v, while its standard Japanese pronunciation is /b/. -- Huhu9001 (talk) 07:05, 29 December 2021 (UTC)[reply]

Vanity press - durably archived?

I like using Google Books, but there is a lot of self-published material there. Should it count for attestation? It would mean more work if it didn't. --General Vicinity (talk) 10:27, 22 December 2021 (UTC)[reply]

Not sure about durability, but self-published material has all sorts of other issues (see Nicolae Sfetcu thread above). The Internet Archive is a good alternative to GBooks. No vanity press (or very little), full-text search across all scanned books, no tracking shenanigans. That said, vanity stuff can be useful for attesting neologisms or other weird usage that would normally get removed by a professional editor. – Jberkel 11:05, 22 December 2021 (UTC)[reply]

Can I make mainspace user warning templates myself?

Faster than Thunder (talk) 17:25, 22 December 2021 (UTC)[reply]

What are mainspace user warning templates? Templates to be included in mainspace to warn users? Are there any now?  --Lambiam 16:43, 24 December 2021 (UTC)[reply]
Please don't. —Μετάknowledgediscuss/deeds 17:46, 25 December 2021 (UTC)[reply]
Please do some work on the project before charging in with rule-setting and moderation stuff. Equinox 22:15, 25 December 2021 (UTC)[reply]

PoS quiz

Seeking more opinions about which PoS we should list these highlighted word usages under:

PoS quiz sentences
  1. I've forgotten where I was in this book.
  2. Stay where you are.
  3. Please sit where you like.
  4. You cannot be too careful where explosives are involved.
  5. He asked where I grew up.
  6. Where did you come from?
  7. This is the place where we first met.
  8. This is a photo of where I went on holiday.
  9. Go home to wherever you belong.
  10. Wherever you go, I'll find you.
  11. You can sit wherever you like.
  12. Our charity has limited funds, but we help wherever we can.
  13. She lives in Puddletown, wherever that is.
  14. The gauge indicated how hot the oven was.
  15. She showed him how to do it.
  16. How the stock market interprets events has real consequences.
  17. I remember how I solved this puzzle.
  18. People should be free to live how they want.
  19. She told me how her father was a doctor.
  20. She wanted to go; however, she decided against it.
  21. However we do this, it isn't going to work.
  22. Wear your hair however you want.

If you like, you may copy the whole of the above list and write your views next to the sentences, like this:

Example response
  1. I've forgotten where I was in this book. ADVERB
  2. Stay where you are. CONJUNCTION
  3. Please sit where you like. RELATIVE ADVERB
  4. You cannot be too careful where explosives are involved. CONJUNCTIVE ADVERB
.... etc.

(These are random answers just to show the suggested response format.)

Please comment on as many sentences in the list as you wish. If some uses seem indeterminate given the PoS that we have available to us, we could also consider the possibility of having one heading for multiple PoS, e.g. ===Conjunction, Adverb===. Please also comment on this if you have an opinion. Thank you. Mihia (talk) 22:05, 22 December 2021 (UTC)[reply]

Derived terms and Related terms

Different articles categorise multi-word and hyphenated compounds in different ways with respect to the "Derived terms" and "Related terms" sections (there are other issues too with the use of these sections, but for now I will just focus on this one). For example, "road map", "toll road" and many others similar are listed as "derived terms" of road, yet at street, the multi-word derivatives such as "street map" are largely (with one or two odd exceptions or duplicates) under "related terms". Which should it be? According to Wiktionary:Entry_layout#Derived_terms, "derived terms" are "morphological derivatives". Are terms such as "road map" and "street sign" "morphological derivatives" of "road" and "street" or not? And, if not, are they terms with "strong etymological connections" to the headword, as Wiktionary:Entry_layout#Related_terms are said to be? To me it seems a bit weird to say that e.g. "street map" has "strong etymological connections" to "street". And what about hyphenated terms? At time, for example, these are fairly consistently listed under "derived terms", along with closed compounds, while the open compounds are listed under "related terms". Is this actually how it is supposed to work? And, in fact, do we definitely need to split these derived/related terms into two sections, or can they all be lumped together into one section that incorporates "all words/terms 'containing' the headword"? One disadvantage, if it is seen to be important, might be that the "simple" morphological derivatives, such as "driver" from "drive" (the example given at WT:ELE) might become swamped, but this is already almost becoming the case at drive anyway, so.

Please comment, and I will clarify the wording at WT:ELE where necessary/possible. Mihia (talk) 19:13, 23 December 2021 (UTC)[reply]

I am of the opinion that Derived terms should include any affixed forms, compounds, backformations, etc. Related terms should be words coming from the same etymological source. So Road Map and Mapper would be related terms, but both would be found under derived terms of map. Vininn126 (talk) 19:51, 23 December 2021 (UTC)[reply]
Thanks, when you say "Road Map and Mapper would be related terms", related to what? To each other? Or in fact do you mean related to "road"? And do you mean "road map" and "mapper" or "road map" and "road mapper"? Sorry, I don't really understand what you mean. Guessing somewhat wildly, are you possibly saying that IYO "road map" should be a derived term of "map" because it is a type of map, but a related term of "road" because it is not a type of road? Mihia (talk) 20:06, 23 December 2021 (UTC)[reply]
They'd be related to each other through road. So they'd have each other under related terms. And road would also list roadmap as a derived term - roadmap was derived from it after all. Vininn126 (talk) 20:18, 23 December 2021 (UTC)[reply]
I asked you whether you meant "road map" and "mapper" or "road map" and "road mapper", but your answer refers to "they" without clarifying this point. If you mean that "road map" and "road mapper" are related to each other through "road", that seems to me to be a slightly odd way of looking at it. "road mapper" is a clear "morphological derivative" of "road map" IMO, so this is not a problem case.* The other way around, I'm not sure. Mihia (talk) 20:51, 23 December 2021 (UTC) * though, actually, does it matter that map in "road map" is the noun, while "mapper" comes from the verb? Is it still a "morphological derivative" if it is from a different PoS? Hm.[reply]
Oh, sorry. Roadmapper would be derived from road map, and related terms would be road, map, mapper, etc Vininn126 (talk) 22:12, 23 December 2021 (UTC)[reply]
The question of the derived and related terms (if any) of themselves-derived-or-related compounds such as "road map", "road mapper" or "roadmapper" is one that I guess should ideally be resolved and covered at WT:ELE. However, this is not the main thrust of my original post, which focuses on the more salient or widespread issue of the derived/related terms of "single" or "root" words such as "road" and "street". Mihia (talk) 22:31, 23 December 2021 (UTC)[reply]
My instinct is to put things like toll road and road map (and roadmap; I wouldn't split compounds just based on spelling) as "derived terms" of road... I'd use "related terms" the way it's used on conjunction, where conjunctive is a "related term", whereas conjunctional should be a derived term, I guess. As you say, compounds are liable to swamp things like conjunctive and conjunctional regardless of which section the compounds are put in. If we add a "collocations" section, we could consider offloading road map et al. to there, or maybe that's a bad idea (e.g. it would split spaced vs unspaced compounds). - -sche (discuss) 08:14, 24 December 2021 (UTC)[reply]
I agree with you that all compounds should be grouped together, presumably under "derived terms", whether they are single words or spaced or hyphenated. It seems silly that e.g. "roadmap" and "road map" or "road-map" would go in different places. The intention of "related terms", as far as I can gather from WT:ELE, is for it to include words that are seen as "different words altogether", but etymologically related, as at car, for instance, where the "related terms" are "carriage" and "chariot". (Even though "carriage" contains "car", it isn't actually formed in our minds as "car-iage".) The implication of this is that, for many short, common words, such as "road" and "time", the "derived terms" list would be long and the "related terms" list pretty short. It would also mean that articles such as "street" and "time" are presently incorrect in their use of the "related terms" section. The question of what to do with related/derived terms of words that are themselves in a sense derivatives, such as "conjunction", remains. "conjunctive" does not seem a "derivative" of "conjunction", and I agree it seems sensible to call it a related term. However, it doesn't seem a "different word altogether" in the sense of e.g. "car"/"chariot". Does it make sense to use the same section for both types? Mihia (talk) 11:50, 24 December 2021 (UTC)[reply]
In fact, is there a difference between "related terms" according to WT:ELE and "cognate terms"? Would the latter be a clearer heading for e.g. "car"/"chariot", leaving "derived/related terms" for everything else?? Mihia (talk) 18:05, 24 December 2021 (UTC)[reply]
The way we use "cognates", which is principally in etymology sections, includes words from other languages and, indeed, often seems to exclude words from the same language as those in which the etymology section occurs. "Coordinate terms" is purely a heading for a semantic grouping, parallel to the -nyms heading. DCDuring (talk) 19:43, 24 December 2021 (UTC)[reply]
@DCDuring: If, as I believe you suggested in the other thread, you have a clear idea (or clearer than me, anyway) of how these sections are supposed to be used, could you possibly comment on the other points? Do you agree that all compounds such as "road map", "roadmap", "road-map", "toll road" etc. should be listed as derived terms of their components, in this case "road", "map" and "toll"? Furthermore, do you agree that words sharing a root, e.g. "conjunctive" and "conjunction", should be related terms of one another? Mihia (talk) 14:14, 25 December 2021 (UTC)[reply]
Yes and yes. I have some uncertainty about how to treat back-formations. DCDuring (talk) 03:28, 26 December 2021 (UTC)[reply]
I feel like back-formations pretty clearly fall under "derived terms", as in edit is a derived term of editor. Vininn126 (talk) 04:18, 26 December 2021 (UTC)[reply]
Personally I find it hard to see how one could say that "edit" is "derived" from "editor". Mihia (talk) 10:24, 26 December 2021 (UTC)[reply]
How wouldn't it be? I took the word editor, changed it, and made a new word. I derived edit from editor, albeit through backformation, but it's still derived from it. Vininn126 (talk) 15:16, 26 December 2021 (UTC)[reply]
Sorry, you're correct, I lost sight of the fact that "edit" actually is derived from "editor" as a back-formation, or to some extent anyway, thinking instead of typical "-er"/"-or" words such as "driver". However, if "edit" is a derived word of "editor", one could pose the question of where "editor" should be listed at "edit". Related terms? Mihia (talk) 15:39, 26 December 2021 (UTC)[reply]
Yes. Vininn126 (talk) 16:07, 26 December 2021 (UTC)[reply]
Then (a) "road map" would be a derived term of "road", (b) "conjunctive" would be a related term of "conjunction" and (c) "chariot" would be a related term of "car". The problem I have with this is that the word relationships (b) and (c), in the same section, seem at least as different in nature from one another as do those in (a) and (b), in different sections. Do you see this as a problem? Mihia (talk) 10:19, 26 December 2021 (UTC)[reply]
  • I would also like to raise again the question of whether there really is any point or need in trying to make a distinction between derived and related terms. The random mess that we presently have across many articles suggests that editors are unable to distinguish the two, and I'm not sure how much clarifying the wording at WT:ELE will change this, given that people generally tend not to read documentation/instructions. Also, clearing up said mess manually would apparently be a mammoth task, only for our nice distinctions to decay again over time, whereas the sections could be auto-merged. I dunno, but it's worth considering, I think. Mihia (talk) 11:24, 26 December 2021 (UTC)[reply]

Testing adjective vs attributive noun POS

How can we test if a term is a noun being used attributively or a true adjective? I realized that garbage is sometimes used with adverbs like "pretty" or "totally", is that one way? General Vicinity (talk) 16:52, 24 December 2021 (UTC)[reply]

"My, what pretty garbage you have!"
"They surrendered totally to the moment."
CGEL (2002) recommend very and too for gradability testing, probably because these words are common, usually used as adverbs, often as degree adverbs, and not too, very polysemic, at least in current English and, therefore, generate fewer false positives. Very, in its older adjectival meaning "true", would generate many false positives. But many degree adverbs can work as tests, just requiring a bit more time and effort. DCDuring (talk) 17:02, 24 December 2021 (UTC)[reply]
This is a difficult, recurring question; see WT:English adjectives for tests (most of which are not foolproof), and Whatlinkshere for past discussions. Pretty could be suggestive, but not conclusive, since nouns can be made attributive with it ("she did XYZ" "that's a pretty/totally/very Anna move"). Inflection for degree with -er or -est would be a good sign. While the near-nonexistence of google books:"garbagest" and inconclusiveness of pretty (or very) doesn't mean it couldn't nonetheless be an adjective, Occam's razor would say to view it as a noun until something (more) conclusively shows that it's become an adjective. - -sche (discuss) 20:41, 24 December 2021 (UTC)[reply]
We've generally been quite generous in allowing words most commonly used as nouns to be also considered as adjectives, allowing them if just one of the tests is met. That is probably too lax, but I don't know a good means of tightening up the criteria. Requiring -est or -er forms seems too much, especially since "more garbage NP" and "most garbage NP" would be sufficient, though harder to search for. DCDuring (talk) 22:53, 24 December 2021 (UTC)[reply]
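As a rough aid for the gradability tests mentioned above, a minimal Python sketch could collect candidate quotations in which a degree word directly precedes the term. The function name, the word list and the sample sentences are illustrative assumptions, not an existing Wiktionary tool. Every hit would still need manual review, since "very" has an adjectival sense, "more" can be a quantifier ("more garbage bags"), and attributive nouns can follow these words too.

```python
import re

# Degree words mentioned in the discussion; this list is an assumption and can be extended.
DEGREE_WORDS = ("very", "too", "more", "most")


def gradability_candidates(term, sentences):
    """Return the sentences in which a degree word directly precedes `term`."""
    pattern = re.compile(
        r"\b(?:" + "|".join(DEGREE_WORDS) + r")\s+" + re.escape(term) + r"\b",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]


# Made-up sample sentences, for illustration only.
samples = [
    "That was a very garbage movie.",
    "Take out the garbage.",
    "This is the most garbage idea yet.",
]
hits = gradability_candidates("garbage", samples)
```

Here `hits` keeps only the first and third sample sentences; the second has no degree word before the term and is filtered out.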

Broken English

Some terms, like fucky fucky and jiggy jiggy, seem mostly to be used as, or in imitation of broken English. Is there a standard way to indicate this? General Vicinity (talk) 10:20, 26 December 2021 (UTC)[reply]

Maybe they could be categorized with a "pidgin" label. General Vicinity (talk) 10:44, 26 December 2021 (UTC)[reply]
A "pidgin" label/category sounds reasonable. (We have CAT:Non-native speakers' English (and a label, {{label|en|NNSE}}), which I guess technically could apply, but I tend to view that category as being for things like realtimely where the person thinks it's standard English and it's just not, whereas things like this are obviously broken English / pidgin.) Equinox brought up other examples recently at Wiktionary:Tea room/2021/December#maskee:_how_to_include_this?. - -sche (discuss) 08:20, 27 December 2021 (UTC)[reply]
@-sche: Thanks for mentioning maskee (I knew there'd been one but I couldn't remember the word). I've got another Chinese pidgin word on my list too: piecey, which I suppose can refer to a number of pieces/units (two piecey cloth?) but also has wider application. It's easily found in Google Books but whether we could call it English, and how to classify it (noun? measure word??), are unclear. Equinox 10:13, 27 December 2021 (UTC)[reply]
With Chinese "broken English", you always have to look out for indiscriminate adding of "-ee" to short words as part of the stereotype, as in the infamous "no tickee, no washee". "Piecey" could just be "piece". Chuck Entz (talk) 14:03, 27 December 2021 (UTC)[reply]
Guess so. I created the -ee sense here (I'm sure it will come back to haunt me). But it's attestable. "All words in all languages"? Never mind, deleting it from my list because there are better things to work on. Equinox 14:06, 27 December 2021 (UTC)[reply]
My initial reaction is like Equinox's: if it's attestable, does it matter if the suffix is widely applied? I'm not sure. I know we exclude most apostrophic possessives because it's obvious how to decompose them, but a recent discussion seemed to favour including apostropheless ones like kinges, in part because it's one word, and we already have all plurals like kings even though they're widely applicable. Seeing -ee also doesn't reliably indicate the pidgin suffix (someone can figure out to drop -'s and be right almost all of the time, but dropping -ee or -ey gives wrong results if attempted on words like dicey, employee). And while we did drop Latin words suffixed with -que because it's truly indiscriminately applicable, I think only certain categories of words take -ee: would someone say "Iee sellee youee goodsee ofee theee Eastee"? I know it could be added to the nouns and verbs ("belongey", "tawkee"), but probably not some of the other words. (OTOH, I concede that adding a bunch of pidgin does not seem like a particularly high-value task.) Ehh. - -sche (discuss) 00:49, 28 December 2021 (UTC)[reply]
General Vicinity added the label to jiggy jiggy and I edited Module:labels so it categorizes. If this proves to be a bad idea, please undo. - -sche (discuss) 00:49, 28 December 2021 (UTC)[reply]
Those terms aren’t imitation of broken English though, they are accurate imitation, caricature of typical English: as in rumpy-pumpy, jiggery-pokery, hanky-panky. This may be a US meme, but in Britain popular native musicians are also like my rammy goes stabby / man flogs ammy from pickney / when I spot the rale / I back my handy, wallahi / and dash like an athlete / make them eediots catch me etc., you dun know—just different standards. OP is just extra-racist or more exclusionist about his examples because of their reduplication. Fay Freak (talk) 16:00, 27 December 2021 (UTC)[reply]
To me, one sense of fucky fucky is definitely stereotypically imitative of broken English (e.g. stereotypically spoken by a Chinese woman: "You want fucky fucky?"). I'm not sure about jiggy jiggy. Mihia (talk) 22:46, 28 December 2021 (UTC)[reply]

Hot words, a grace period and CFI

Recently an RFV was filed for Mickey Mouse ring, a former "hot word", just over a year since its first citation was published. This is in line with the letter of the rules: as the user who started the RFV pointed out, the text at Category:Hot words older than a year says these words "must" be sent to RFV. But it is hardly a productive thing to do. You're going to have a hard time finding cites that "span more than a year" if the word emerged barely more than a year ago.

I checked WT:CFI to see if there was any guidance on hot words, but it's completely silent on the matter. The hot word scheme, largely inspired by the word olinguito which was obviously a worthy inclusion from the day it first appeared, has been running informally and uncontroversially since 2014. It is time to include the hot word scheme in CFI in a way that solves the above problem.

I'd like to run a vote to add hot words to CFI according to the current system, with the addition of a six-month grace period to give time for cites that span more than a year to be published and found:

Add to CFI at the bottom of the "Spanning at least a year" subsection:
To allow Wiktionary to cover new coinages and emerging vocabulary, "hot words" (words for which all identified durably archived sources have been published within the last 18 months) are exempt from the "spanning at least a year" criterion. For example, the attestation of a word which first appeared in a book published on 1 January {{#expr:{{CURRENTYEAR}}}} cannot be challenged under the "spanning at least a year" criterion before 1 July {{#expr:{{CURRENTYEAR}}+1}}, but this "hot word" is still subject to all other criteria for inclusion and may be challenged under those criteria at any time.

Six months is totally arbitrary; it could just as well be a year's grace period. Thoughts? This, that and the other (talk) 03:04, 27 December 2021 (UTC)[reply]

Support up to a year (or however long the max can be). AG202 (talk) 04:56, 27 December 2021 (UTC)[reply]
Support: Six months should be enough, but longer would be OK with me, too. DCDuring (talk) 13:46, 27 December 2021 (UTC)[reply]
Support: "span more than a year" doesn't mean "span 12 1/2 months"- attestations aren't always predictably spaced. Chuck Entz (talk) 16:11, 27 December 2021 (UTC)[reply]
Support. Fytcha (talk) 00:18, 30 December 2021 (UTC)[reply]
Support in principle, and would be happy with something even a bit longer, although I have no suggested ideal time. Cnilep (talk) 01:12, 30 December 2021 (UTC)[reply]
I made a vote with the grace period set at a year (so that hot words can remain "hot" for up to two years). This, that and the other (talk) 03:52, 30 December 2021 (UTC)[reply]
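The date arithmetic in the proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the grace period is a parameter (6 months as first proposed, 12 as in the vote), and "12 months plus the grace period after the earliest known citation" is one reading of the proposed wording, not settled policy.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date forward by whole calendar months, clamping the day if needed."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def earliest_span_challenge(first_cite, grace_months=6):
    """First day on which the 'spanning at least a year' criterion could be invoked."""
    return add_months(first_cite, 12 + grace_months)


def is_exempt(cite_dates, today, grace_months=6):
    """True while the earliest known citation is still inside the hot-word window."""
    return today < earliest_span_challenge(min(cite_dates), grace_months)


# The example from the proposed CFI text, with the year fixed at 2021 for illustration.
first = date(2021, 1, 1)
six_month_limit = earliest_span_challenge(first)        # 1 July 2022
twelve_month_limit = earliest_span_challenge(first, 12)  # 1 January 2023
```

With a six-month grace period, a word first cited on 1 January 2021 stays exempt through 30 June 2022 and becomes challengeable on 1 July 2022, matching the example in the proposed text; with a year's grace period the date moves to 1 January 2023.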

Global ban proposal for Musée Annam

There is an on-going discussion about a proposal that Musée Annam be globally banned from editing all Wikimedia projects. You are invited to participate at Requests for comment/Global ban for Musée Annam on Meta-Wiki. Thank you! NguoiDungKhongDinhDanh (talk) 14:22, 27 December 2021 (UTC)[reply]

Upcoming Call for Feedback about the Board of Trustees elections

You can find this message translated into additional languages on Meta-wiki.

The Board of Trustees is preparing a call for feedback about the upcoming Board Elections, from January 7 - February 10, 2022.

While details will be finalized the week before the call, we have confirmed at least two questions that will be asked during this call for feedback:

  • What is the best way to ensure fair representation of emerging communities among the Board?
  • What involvement should candidates have during the election?

While additional questions may be added, the Movement Strategy and Governance team wants to provide time for community members and affiliates to consider and prepare ideas on the confirmed questions before the call opens. We apologize for not having a complete list of questions at this time. The list of questions should only grow by one or two questions. The intention is to not overwhelm the community with requests, but provide notice and welcome feedback on these important questions.

Do you want to help organize local conversation during this Call?

Contact the Movement Strategy and Governance team on Meta, on Telegram, or via email at msg(_AT_)wikimedia.org.

Reach out if you have any questions or concerns. The Movement Strategy and Governance team will be minimally staffed until January 3. Please excuse any delayed response during this time. We also recognize some community members and affiliates are offline during the December holidays. We apologize if our message has reached you while you are on holiday.


Movement Strategy and Governance

Thank you. Xeno (WMF) (talk) 20:16, 27 December 2021 (UTC)[reply]


Mostly, CAT:en:Armor is personal armor (helmet, breastplate); also horse armor (criniere, but not yet crinet); currently it has ~130 entries and I have ~40 to add soon. However, it also contains a few terms for personal shields (rondache, target, but not pelta, heliman, rotella, etc) and vehicle shields (pavisade, but not the out-of-sync duplicate entries reactive armor/reactive armour, etc); if we systematically add shields, the category will get larger.

  1. Should we move shields to a subcategory? And/or should we move personal armor to a subcategory? But personal armors are the most salient members of the category, so leaving only less-salient stuff in the top-level category feels weird. How big should a topic category be before we consider subdividing it?
  2. Should things like jupon and surcoat (clothing worn over and as part of a full set of armor but which is not, itself, armor) be in the Armor category? One is and one isn't. What about baldric, which gets treated as armor in various modern fantasy works, to the extent of people having iron, steel or chainmail baldrics, but which is arguably more clothing?

- -sche (discuss) 22:22, 27 December 2021 (UTC)[reply]

Related terms like jupon definitely belong in the category, just as walkies belongs in Dogs. General Vicinity (talk) 10:49, 30 December 2021 (UTC)[reply]

Add translations from Turkish

It is now easier than before to add translations to Turkish Wiktionary. Most articles there (for Turkish words) now have a translation section, where you can enter the language code and the translation, press preview and add, just like here on English Wiktionary. For example, from the article tr:telgraf, you can add what telegraph is in some other language. You might want to know these Turkish words: ad - noun, eril - masculine, dişil - feminine, nötr - neutral, genel - common, çoğul - plural, köken - etymology/origin, çeviri - translation, and -ler/-lar is a plural ending. (Disclaimer: I don't speak Turkish and I'm not an admin. I just run a bot that fixes some technical problems.) --LA2 (talk) 23:29, 29 December 2021 (UTC)[reply]

Use of Wikidata linking template

{{wikidata}} is kind of funny: it's in a sense similar to {{Wikipedia}} or {{Commons}} but less used. Do we have strong feelings on including it in entries and encouraging users to go back and forth between the sister projects with this template? If so, should we only link it to the Lexeme namespace or to proper entries on the same topic at Wikidata? Happy to get others' perspectives. —Justin (koavf)TCM 03:39, 30 December 2021 (UTC)[reply]

This discussion was prompted by me removing {{wikidata}} from an entry after Justin had added it. I see encyclopaedias, dictionaries, and even galleries of images as being essentially intended for readers to consume, but databases are a different matter. It's possible that there is a good use for this template somewhere, but in most entries, I don't see what readers would gain from going to a database that isn't intended to be read. —Μετάknowledgediscuss/deeds 07:59, 30 December 2021 (UTC)[reply]
Sometimes the Wikidata code is put there as the {{senseid}} (see e.g. Jupiter). Also, check out what the article in question, Haratin, looks like on mobile (in the browser in vertical view). Fytcha (talk) 14:20, 30 December 2021 (UTC)[reply]
I understand the desire for 'completeness', to link every other project that has a page, but this doesn't seem useful (at best only very marginally so); IMO it's mostly just "noise" distracting from the actually-useful Wikipedia link and, on mobile, taking space away from the actual definition. (Some people don't even like Wikipedia links being boxes and prefer small See-also links, although those are too easily missed IMO, for something as important as there being an in-depth encyclopedia article on the topic. But maybe a small {{pedialite}}-esque See-also link would be more appropriate for links to Wikidata's database pages, if we are to have them at all...) - -sche (discuss) 06:56, 2 January 2022 (UTC)[reply]
For most readers (that is, outside our WMF bubble) Wikidata links are very likely confusing: we can't assume they know what Wikidata is. Of course there might be a piece of useful information on the Wikidata item page, but they have not been designed for "human consumption". It's an infodump full of arcane WD terminology (properties, items, Q numbers etc). – Jberkel 11:08, 31 January 2022 (UTC)[reply]
The Wikidata links could be useful to editors. For example, taxonomic name entries would benefit from the external databases IDs that are contained in Wikidata. Ordinary users would benefit from the links, but not if they have to go to Wikidata to get them. DCDuring (talk) 16:20, 31 January 2022 (UTC)[reply]
  • @Koavf: Please read this discussion, and let me know what you think. —Μετάknowledgediscuss/deeds 19:43, 31 January 2022 (UTC)[reply]
    • I think that DCDuring is also correct that editors could find a link useful. I'm a big fan of including all projects or no projects, but definitely biased toward interlinking and encouraging editors to edit across all the sister projects. As far as mobile views, I don't have a smartphone, so I don't know how much my desktop browser really represents what a tiny phone would look like but if it's a problem to display Wikidata, seems like we'd have a problem displaying all of the interwiki link templates. —Justin (koavf)TCM 21:53, 31 January 2022 (UTC)[reply]
      • The biggest value to me as editor would be if there were a template that, in effect, kept links to WMF projects and to non-WMF databases up to date. In principle Wikidata could do that. But I don't see why any normal user would want to go and deal with Wikidata's ridiculous user interface. I'd be willing to use WP once to access Wikidata's identifier for a taxonomic name if there were a template into which the ID could be inserted to yield something equal to or better than the current set of links in the best taxonomic name entries. As it is, it is easier for me to get most of the links I need from NCBI, WP, or Wikicommons, without cluttering up the UI with largely irrelevant links. I'd be willing to include one inline Wikidata link in each taxonomic name entry, more if there were homonyms. The Wikidata link would be most useful for the geographic databases, like the numerous Flora (Flora of China, Flora of Zimbabwe, etc). DCDuring (talk) 23:04, 31 January 2022 (UTC)[reply]

Identifying code-switching

How do we know if something like arigato should have an English entry or is simply code-switching? General Vicinity (talk) 11:53, 30 December 2021 (UTC)[reply]

Run it through RfV to see if it is used without distinctive orthography? DCDuring (talk) 23:56, 30 December 2021 (UTC)[reply]
Probably we should refer to standard dictionaries, such as the OED, to determine whether such outlandish terms are entry-worthy? In this age of globalization, the size of the English vocabulary is practically limitless… ·~ dictátor·mundꟾ 14:06, 31 December 2021 (UTC)[reply]
As the essay Code-switching states, there cannot be a hard and fast criterion for making the distinction. For most instances of actual uses, it should not be hard, though, to make the distinction. An example: “Arigato (Thank you in Japanese) was the only word I knew.”[2] If it is used in an English-language text to mean “thanks” without an explanation or reference to Japanese culture, just like merci is used without explanation or reference to French culture, it will appear to have taken up a position in the English lexicon. --Lambiam 00:06, 1 January 2022 (UTC)[reply]
In some ways, it reminds me of WT:FICTION, with "a foreign language" substituted for "a fictional universe". For many native English speakers- especially in the US in the pre-internet era- a place where people don't speak English might as well be another solar system... Chuck Entz (talk) 03:27, 1 January 2022 (UTC)[reply]
There's no clear-cut border between code-switching and borrowing. The think-tank/essay linked above summarizes the tests people came up with last time. As an example, ge is decently commonly used and even inflects (ges), but AFAICT it's always italicized, so it looks like code-switching. Whereas, kimono is often not italicized and has extended senses not present in Japanese, so it looks like a borrowing. - -sche (discuss) 07:19, 2 January 2022 (UTC)[reply]

Vietnamese 3rd person pronouns

In Vietnamese, most any word used for "you" (of which there are many) can be converted into "he" or "she" by adding ấy, for example anh + ấy => anh ấy or bạn => bạn ấy [3]. This could be handled by updating the words for "you" or by creating entries for the ấy forms. Thoughts? General Vicinity (talk) 14:46, 31 December 2021 (UTC)[reply]

Those are, technically, sum of parts, especially as ấy can be replaced with any other deictic (kia, đó, này…). I don't therefore see any reason for ấy forms (except for the most common ones, maybe, which seem to exist already). Anyway, most of those words can mean "he" or "she" without ấy as well. I'd argue they aren't pronouns but nouns; they just happen to translate to pronouns in other languages in most (not all) situations. MuDavid 栘𩿠 (talk) 07:52, 1 January 2022 (UTC)[reply]