Wiktionary:Beer parlour




Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!

Beer parlour archives
2002
December
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020


August 2020

Unsourced and poorly formatted Proto-Dardic entries by 2401:4900:448F:3291:0:3F:8DB4:7F01 (talk)

Although Proto-Dardic terms are now frequently shown in mainspace entries, I would be very hesitant to actually create Proto-Dardic entries, because they're wanting in terms of research and we still don't have an established methodology and conventions (such as deciding which inflected form goes into the title, what notation to use, etc.) for dealing with PD reconstructions. The entries created by this IP are neither sourced nor properly formatted. In my view, they should be deleted. -- Bhagadatta (talk) 11:42, 2 August 2020 (UTC)

@Victar, JohnC5, Bhagadatta I don't think there is any strong basis for Dardic as a language family rather than an areal zone, so the whole premise of Proto-Dardic is suspect IMO. I'm good with deleting these entries if you all are. —AryamanA (मुझसे बात करेंयोगदान) 17:58, 6 August 2020 (UTC)
I'm happy deleting them simply on the basis of them being garbagely put together. Dardic is a language family in my opinion though. See {{R:ine:HCHIEL|435}}. --{{victar|talk}} 18:12, 6 August 2020 (UTC)
I do not know these data well enough to know whether Proto-Dardic is real. But I generally follow the rule that, until a set of consistent guidelines/references for a protolanguage has been established and discussed (i.e. WT:About Proto-Dardic), the entries should not be created. —*i̯óh₁n̥C[5] 20:26, 6 August 2020 (UTC)
@AryamanA: I would support either keeping Proto-Dardic as a code or doing away with it. There seem to be good arguments for both sides, the way I see it. I'm opposed to creating entries like the ones that IP created; however, I wouldn't mind having tentatively reconstructed "Proto-Dardic" etymologies in the mainspace, if only to throw light on the possible earlier forms of those (Kalasha/Khowar/Torwali) words. -- Bhagadatta (talk) 14:27, 8 August 2020 (UTC)

Russian irregular verbs

Are there any Russian verbs with the -м, -шь, -ст conjugation besides дать and есть? If so, should there be a category for them? Dngweh2s (talk) 15:26, 2 August 2020 (UTC)

@Dngweh2s I don't think there are, besides the derivatives of those verbs. Not sure we need a category for only these two verbs + derivatives. We already have lots of verb categories and you can find these verbs easily by looking on the pages for дать and есть, which list all the derivatives. Benwing2 (talk) 23:39, 2 August 2020 (UTC)
@Benwing2, Dngweh2s: There's also создать (sozdatʹ), which I've just removed from the list of derived terms of дать (datʹ) as they're unrelated. PUC – 23:47, 2 August 2020 (UTC)

Set Yakut as an ancestor of Dolgan

  • "Долганский язык, сформировавшийся в процессе распространения якутского языка в макрорегионе взаимодействия различных этнических групп, является языком этноса, который в особых исторических условиях не ассимилировался в составе превалирующего в этом регионе этноса, язык которого принял, а стал функционировать изолированно, в удалении от основной массы носителей якутского языка." 2001 Артемьев, Николай Матвеевич. Долганский язык.
"The Dolgan language, formed during the spread of the Yakut language in a macro-region where various ethnic groups interacted, is the language of an ethnos that, under special historical conditions, was not assimilated into the ethnic group prevailing in this region, whose language it had adopted, but instead began to function in isolation, at a distance from the main body of Yakut speakers." 2001 Artemʹjev, Nikolaj Matvejevič. Dolganskij jazyk. Doctoral thesis.
  • "Некогда являлся наречием якутского языка, со временем, из-за достаточной обособленности в результате изолированности развития и внутренней перестройки под влиянием эвенкийского языка, стал самостоятельным языком"
"It was once a dialect of the Yakut language, but, over time, as a result of isolation and internal restructuring under the influence of the Evenk language, it became an independent language." Russian Wikipedia

Dīxī. Allahverdi Verdizade (talk) 15:18, 3 August 2020 (UTC)

@Allahverdi Verdizade: I trust in your research here because I don't know anything about this. Done. — Eru·tuon 02:58, 10 August 2020 (UTC)
@Allahverdi Verdizade, Erutuon: Wait, no. Dolgan and Yakut are living (albeit moribund) languages. They're derived from a single ancestor, but that ancestor isn't Yakut as it's spoken today, but instead some Proto-Yakut-Dolgan form. --{{victar|talk}} 03:45, 10 August 2020 (UTC)
Well, it's not unheard of for a living language to be derived from another living language (or earlier form of a living language), like Afrikaans from Dutch. Wikipedia says the Dolgans moved away from the Yakut-speaking region in the 18th century, which is about as long ago as the Dutch colonizing South Africa. Judging by Dutch (or English) I suppose the language at that time wouldn't have been different enough from the modern language for us to assign it a separate language code. — Eru·tuon 05:15, 10 August 2020 (UTC)
Or Malay and Indonesian, or any creole and the colonial language it derives from. And Yakut is kinda thriving, while Dolgan is severely threatened. Allahverdi Verdizade (talk) 07:30, 10 August 2020 (UTC)
Language vs. dialect is, as you know, the never-ending debate. Dolgan was already divergent before it became isolated from Yakut speakers, with its own declension paradigm, and it has since been heavily influenced by Tungusic. It's funny: they actually say that some northern Yakut dialects have a greater degree of intelligibility with Dolgan due to a shared Evenki word inventory. How do people feel about creating an Old Yakut language code, with lemmas from Nicolaas Witsen's lexicon, and making both Yakut and Dolgan descendants of it? --{{victar|talk}} 17:39, 10 August 2020 (UTC)
Actually, if you read Kara, G. (1972) Le glossaire yakoute de Witsen, they point out that the differences between Yakut and Dolgan still far pre-date Witsen in the 17th century. --{{victar|talk}} 22:45, 10 August 2020 (UTC)
That does not change the well-established fact that Dolgan is descended from Yakut. Proto-Yakuto-Dolgan is unfortunately not a term in use in either English or Russian, and I do not support creating novel terminology within Wiktionary.
> Dolgan was already divergent before it became isolated from Yakut speakers, with its own declension paradigm
Where can I read about this? To my knowledge there are very few works on Dolgan grammar, let alone in historical perspective. Allahverdi Verdizade (talk) 10:30, 11 August 2020 (UTC)
Did you read the above? It cites the ancestor of Dolgan and Yakut as Proto-Yakut (proto-yakoute), also pointing out that Dolgan predates the č/ǰ/s-merger that Yakut exhibits. --{{victar|talk}} 00:01, 12 August 2020 (UTC)
This is a good paper too, going over several of the differences between the two languages, with many archaisms in Dolgan that aren't present even in the earliest attestations of Yakut, e.g. Proto-Yakut *suoq > Yakut суох (suox), Dolgan һуок (huoq). --{{victar|talk}} 01:12, 12 August 2020 (UTC)
@victar: I read the above, but nowhere do I find support for the claim that "Dolgan was already divergent before it became isolated from Yakut speakers". Anyway, what do you propose? Create a Proto-Yakut code, or what? Allahverdi Verdizade (talk) 19:26, 17 August 2020 (UTC)
@Allahverdi Verdizade: See the chart on page 435. Yes, I believe a Proto-Yakut language code would be most appropriate. --{{victar|talk}} 20:42, 17 August 2020 (UTC)

Technical Wishes: FileExporter and FileImporter become default features on all Wikis

Max Klemm (WMDE) 09:13, 6 August 2020 (UTC)

Armenian numerals

Armenian has an old tradition of using letters as numbers ("Armenian numerals") in much the same way as Greek and Hebrew do, except that, like Roman numerals, they do not seem to be specially marked as numbers. There is evidence for the use of both upper and lower case. Is there any reason why:

1) Upper and lower case numeral uses should not both have entries in Wiktionary.
2) Passing an upper case Armenian letter to {{mul-numberchart}} should not cause both upper- and lower-case letters to be displayed, as for Roman and Greek numerals.

There is an issue that @Vahagn Petrosyan is deleting my changes without giving a reason, let alone a good one. This has happened with {{mul-numberchart}} and he seems to have got it into his head that թ and Թ cannot both be numerals. (He's now deleted the lower case numeral entry!) --RichardW57 (talk) 16:44, 6 August 2020 (UTC)

I did some hunting for lower-case Armenian numerals. I think I've found an example on p. 163 of "Medieval Armenian Manuscripts at the University of California, Los Angeles" by Avedis Krikor Sanjian. In the next-to-last line of Armenian-script text on that page, there is a date explained as (= A.D. 1824), and the 4 characters before it seem to be the lower-case Armenian number 1273. (I don't read the Armenian script, so I could have misunderstood what is going on.) --RichardW57 (talk) 18:35, 6 August 2020 (UTC)

Proposal to create category 'Bodies of water'?

Would it be a problem to add Category:Bodies of water as a subcategory of Category:Landforms and Category:Water?

Currently, Category:Landforms has subcategories Category:Waterfalls and Category:Volcanoes, which are narrower in meaning than bodies of water, while terms such as lake, sea, and ocean are simply listed under landforms. I think it is justifiable to categorize the latter on their own. First, they are characteristically distinct from relief landforms; additionally, not all accumulations of water are necessarily landforms: puddles, pools, marine/oceanic lakes (concentrations of water chemically distinct from the surrounding "normal" water), aquifers, theoretically cosmic accumulations of water, etc. arguably do not constitute landforms. Безименен (talk) 16:53, 6 August 2020 (UTC)

@Bezimenen: Yes! I have desired this category a lot of times already. Fay Freak (talk) 22:01, 8 August 2020 (UTC)

Template:ka-adj

Could we please remove the declension table from that headword template? დროული (drouli) looks really dumb. PUC – 09:32, 8 August 2020 (UTC)

In what way does it look dumb? Dixtosa (talk) 13:08, 8 August 2020 (UTC)
Because it shows the same declension table twice. PUC – 13:37, 8 August 2020 (UTC)
I wouldn't say it's dumb, but it does need fixing to separate it into two categories: Adjectival declension (when it's with a noun/pronoun) and Sole/Lone declension. (Pronouns need this too.) Solarkoid (talk) 14:31, 8 August 2020 (UTC)
I have removed the table from the entry not from the template as I think the table belongs there. Dixtosa (talk) 21:55, 8 August 2020 (UTC)
Not too fond of that option (I don't think we do that anywhere else), but at least there's no duplicate. Thank you. PUC – 08:11, 9 August 2020 (UTC)
No, we don't. For consistency with other languages, there should be separate headword-line templates and inflection-table templates; and the latter should be used in separate ===Inflection=== or ===Declension=== sections rather than floating on the right-hand side, which I bet makes the mobile display very hard to read. —Mahāgaja · talk 08:47, 9 August 2020 (UTC)
@Solarkoid, Dixtosa: It pops up above the headword line on mobile, which looks even worse. I think that there's no reason Georgian shouldn't stay consistent with all other languages in this respect. —Μετάknowledgediscuss/deeds 04:25, 11 August 2020 (UTC)
From my point of view it's understandable, I can say that much. To be honest, I didn't like it being next to the adjective to begin with, but yeah. I personally can't do anything about it because of my lack of knowledge, but as I said, while we're at it, sole/lone declension might as well be added, even though it is irregular to some degree. Solarkoid (talk) 17:18, 11 August 2020 (UTC)

License questions

I am developing an open-source, yet-to-be-published vocabulary trainer app, and I have some license questions regarding the use of Wiktionary data and/or code. I'm new here, so I hope this is the correct place to ask these questions. Please feel free to point me to the right place or to existing policies if I'm wrong here and these questions have already been answered elsewhere. Thanks!

  1. I currently use modified variants of Module:fi-verbs and Module:fi-nominals to generate the conjugation and declension forms of the respective words, so that the different forms can be learned. My current understanding is that these Lua scripts fall under the same license as the dictionary data on Wiktionary, namely CC BY-SA 3.0. Is this correct?
  2. Does the license of these Lua scripts impose restrictions on my C++ program, which executes the scripts, or is the program independent of it? My program should be fully usable without the scripts, and the scripts themselves would be published under a CC BY-SA 3.0 license, if it is possible to use different licenses for the C++ part and the Lua parts. (For the C++ part I'm considering a GPL 2.0 or GPL 3.0 license, but I have to investigate this further, as there might be license dependencies on other libraries / source code that I might use as well.)
  3. Wikidata has support for lexemes (see for example this Finnish word), and in principle Wikidata has technical advantages for my particular program, as it is friendlier to machine reading. I understand that Wiktionary has a different and broader scope which cannot be represented adequately by Wikidata (which has its own, different purpose); however, I wonder whether some particular information from Wiktionary can be put on Wikidata as well, and whether there are license issues in doing so. I realize that content on Wikidata is licensed under CC0 1.0, and in principle it is not possible to put content published under CC BY-SA 3.0 under CC0 1.0 unless you have created the content yourself and are therefore able to dual-license it. However, in my understanding a single word and its conjugated forms are not under copyright, as they are features of the language (excluding trademarks). Would it therefore be possible to insert conjugation tables from Wiktionary into Wikidata? Of course, all creative content such as descriptions and etymologies would be excluded from this due to the license issues.

Coleitra (talk) 11:03, 8 August 2020 (UTC)

1 & 2. The license applies to everything. We're not lawyers, we can't give you legal advice, and you're best off just reading the license carefully. 3. You can do whatever you like at Wikidata, as far as I'm concerned. Obviously, they won't want you committing copyright violation, so if you plan to edit there, check with them rather than us. —Μετάknowledgediscuss/deeds 04:22, 11 August 2020 (UTC)
Most inflection tables are probably only protected by database rights. (A good lawyer may know whether such rights are relevant. Some Wiktionary authors are working in the EU.) However, you will find that some tables are annotated, recording the applicability of the inflection and how well it is attested. I'm finding I'm having to put a lot of thought into the conjugation of Pali verbs - the aorist is particularly hard, but the existence of the middle of other tenses is often quite uncertain. We have a policy of not recording occasional misspellings, and I presume the same principle should apply to tables of inflections. I still haven't decided whether to record parāṇi as a neuter plural of Pali para. It's fairly clearly a Sanskritism, but that doesn't answer the question. Therefore, as we have to apply judgement, copyright probably does apply. --RichardW57 (talk) 17:26, 11 August 2020 (UTC)
Thanks for the answers, Μετάknowledge and RichardW57! I see it is not an easy topic and I should think further about it. I will look also how other projects handle it - it seems the combination of software and CC licenses is problematic up to CC BY-SA 4.0 where Creative Commons and FSF made sure it is compatible with GPL 3.0. However Creative Commons itself recommends not using their licenses for software. I would like to use some kind of copyleft license on my software and I think this is probably also required by the license of one of the libraries I use, so I have to check this very carefully. Coleitra (talk) 06:01, 12 August 2020 (UTC)

dialect vs. dialectical

An editor changed my use of {{lb|en|dialect}} to {{lb|en|dialectal}}. Is there a difference between the two? Vox Sciurorum (talk) 12:32, 8 August 2020 (UTC)

Either links automagically to dialectal in the Glossary, which gives two meanings: 1. Of or relating to a dialect. 2. Not linguistically standard. Some terms (such as hypercorrections and misconstructions) are not linguistically standard, where the deviation from the standard is not peculiar to a specific (group of) dialect(s) – but it would not be reasonable to call such terms “dialectal”. I do not know if this potential space between the definitions in the Glossary and in Main namespace is intentional, but if it is used, it is confusing. Also, I cannot think of examples that lie in that space for which the label “dialectal” makes sense, while the label “dialect” would not be appropriate.  --Lambiam 13:57, 8 August 2020 (UTC)
I wouldn't use dialectical, though. PUC – 14:46, 8 August 2020 (UTC)
No, dialectical is definitely wrong! Do {{lb|en|dialect}} and {{lb|en|dialectal}} also categorize in exactly the same way? If so, then there really is no difference between the two and there's no reason to change one to the other. —Mahāgaja · talk 08:50, 9 August 2020 (UTC)
It appears they do: bairnish, labelled {{lb|en|UK|_|dialectal|Northern England|Scotland}}, and bargoose, labelled {{lb|en|dated|UK|dialect|South England}}, are both placed in Category:English dialectal terms.  --Lambiam 16:10, 9 August 2020 (UTC)
I hear Manxists are big on dialectal materialism. — Mnemosientje (t · c) 18:47, 9 August 2020 (UTC)

Links to genius.com for song lyrics?

{{quote-song}} says, for the "url" parameter: "The URL or web address of a relevant external website, such as a website containing a score of the song. Add such a link only if the score is no longer copyrighted – do not link to a website that has content in breach of copyright." Is setting the URL to the relevant genius.com page, if the artist doesn't host their own lyrics (example), reasonable? Genius appears not to violate copyright by hosting lyrics, though I couldn't find any information about how they manage that. This seems more useful than linking to, say, the official music video, since that's harder to 'read'. Does anyone have strong thoughts on the matter? grendel|khan 20:20, 11 August 2020 (UTC)

Lyrics copyright seems like a huge mess. I don't see a problem with it. Just don't expect the links to work forever. DTLHS (talk) 20:24, 11 August 2020 (UTC)
  • a) Surely they violate copyright: their whole business model is based on doing what is illegal – although it might be that they license stuff, this will often not be viable, as most lyrics will not be represented by copyright collection societies from whom they could obtain licences, and of course they may have made the commercial decision to just violate the rights, as it is cheaper and the likelihood of suits for this matter is almost nonexistent (especially in the rap field they began with).
    b) However, links are still generally not copyright violations. (The few cases where the ECJ decided otherwise concerned special circumstances, for example thumbnails, or cases where the linked content would have been unfindable without the link, so that the trial courts could just assume the site hosting the content to be connected to the linking site, or consider the linking itself part of the publication infringing copyright. The rulings said it "may be" a copyright violation to link, but this cannot be simplified to "it will be".)
    c) Additionally, even if a link to genius.com is copyright infringement, the copyright holder still can't act upon it against the Wikimedia Foundation, as there isn't the necessary legitimate interest in legal action: it is more effective to sue Genius to take down the content instead of just the link, both being equally accessible as US-based companies. (Some procedural principle like that will hold in most countries. I can't speak for every system of law; one cannot rule out that, since a web resource is accessible anywhere, one can sue in any country under its laws if the prevailing opinion on private international law in that country says that they therefore apply there.) Fay Freak (talk) 21:21, 11 August 2020 (UTC)

Ah, they did get sued, but settled with the major labels in 2014. Yeah, I'll just use Genius links for most song lyrics. grendel|khan 22:48, 11 August 2020 (UTC)

@Grendelkhan: is Rap Genius the same as Genius? The NYT article only refers to Rap Genius. I would avoid linking to Genius unless we are very sure the settlement and licensing agreement with Rap Genius covers Genius as well. — SGconlaw (talk) 14:31, 17 August 2020 (UTC)
@Sgconlaw: They started with rap, thus Rap Genius, and renamed to Genius when coverage got broader. Anyway, you ignore points b) and c). Fay Freak (talk) 14:47, 17 August 2020 (UTC)
OK. — SGconlaw (talk) 14:53, 17 August 2020 (UTC)
Copyright aside, it doesn't feel "durable": it is a user-generated site rather than any kind of academic archive, and they may fill it with ads and change the URLs at any time. I would imagine the (user-submitted) lyrics are full of errors too, which makes it a dubious source to rely on. Equinox 15:04, 17 August 2020 (UTC)
Ah, I didn't realize Genius was essentially a wiki as well. That's a good reason not to link to it as a reliable source. — SGconlaw (talk) 15:31, 17 August 2020 (UTC)
The audio is durable, if published in the usual way. The transcriptions on the myriad of lyrics web sites are not durable. They are a convenient place to look and link. They may not match the actual song. I listen to the song before trusting a web site. Even the more official version of lyrics packaged with an album can differ from the vocal track. (But the sleeve of an LP or CD is durable, so citable independent of the audio even if different.) As for legality, some of these sites license lyrics. Vox Sciurorum (talk) 15:36, 17 August 2020 (UTC)
That they are more likely to license, that the layout is convenient, and also that it is a wiki with comments: that is why one wants to link it. Wikis support wikis, I guess? There is no rule against linking wikis, @Sgconlaw, though of course one does not quote Wikipedia. As Vox Sciurorum already implied, Genius is not itself the source; when I search for lyrics I usually listen to the song. OP just wanted to link Genius for the convenience of readers, rather than other lyrics sites, which are much shoddier and more likely to become dead links at some point. Plus, wiki ≠ wiki if the only thing editors there are supposed to do is transcribe correctly. Wiktionary is also more citable than Wikipedia because there aren't that many things that can fail and not as many interests to manipulate (as there are when writing biographies of living persons etc.), and I think Genius has a revision history (I haven't felt a need to use it yet), so there is no reasonably assumed problem of links being unstable; also, even in the far-off case that Genius.com gets sued and must close, we can just remove the URLs by bot, as opposed to a situation where people link random sites. Again, the comparative stability is why OP wants to link Genius. Their market position is merited. He had considered some real reasons why he wanted to link Genius.com and not random lyrics sites, hence the specific question. Fay Freak (talk) 17:27, 17 August 2020 (UTC)
I've rarely found a mistranscription on genius.com (compared to other lyrics sites). And the content is available on archive.org as well, so I don't see why we shouldn't link to it. – Jberkel 17:47, 17 August 2020 (UTC)

(Proto-Finnic) vowel harmony

First point: Should we or should we not add these in the descendants? Cf. *-k'as and *-t'oin.

Second point: Should we link the front suffix in Finnic entries (Finnish -kas and -käs, yet Karelian -toin and -töin)? Thadh (talk) 10:24, 12 August 2020 (UTC)

phrasebook parameter in head templates

I'd like to use the {{head|hu|phrasebook}} headline template for Hungarian phrasebook entries to separate them from actual phrases and to clean up the phrases category. When I used the phrasebook parameter in the past, MewBot changed it to phrase with this comment: "Fixed part of speech of {{head}}". See ma rossz idő van. Is this a policy and will a bot make this change again if we start using this parameter? Phrasebook appears to be a valid parameter to {{head}}. I understand that this parameter will not put the phrasebook entry into the lemma category and this is fine. Thanks. Panda10 (talk) 16:14, 12 August 2020 (UTC)

@Rua, do you happen to know if this change is expected to take place again? Adam78 (talk) 13:45, 13 August 2020 (UTC)

Spanish section: bot to convert simple syn/ant/hypo/hyper-nyms sections to templates

I'd like to get consensus to run a bot (User:AutoDooz) to convert the *nym sections in the Spanish language section to their corresponding nym tags.

Here's how a hypothetical entry for "grande" would be affected:

===Adjective===
{{es-adj}}

# [[big]]

====Synonyms====
* {{qualifier|obsolete}} {{l|es|voluminoso}}, {{l|es|enorme}}
* {{l|es|amplio}} {{q|for cloth, shoe, place }}

====Antonyms====
* [[chico]] (Mexico)
* {{l|es|pequeño}}

Would become

===Adjective===
{{es-adj}}

# [[big]]
#: {{syn|es|voluminoso|q1=obsolete|enorme|amplio|q3=for cloth, shoe, place}}
#: {{ant|es|chico|q1=Mexico|pequeño}}

Here's a diff of some of the changes this would create if run against the most recent XML dump; it's probably the easiest way to see how this will behave in corner cases: https://gist.github.com/doozan/a6fe2bed7d73bf2f864c93134e780b71/revisions

This is designed to be as non-destructive as possible. It will only make changes when it is 100% sure that a given part of speech has a single definition and it is 100% capable of parsing, understanding, and converting all of the data in the nym section underneath that part of speech. If there's anything in the definition section that it doesn't understand, it will make no changes. If there's anything in the nym section that it doesn't expect, it will do nothing. If it finds any information in the nym section that it can't directly translate to a nym tag, it will do nothing.
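The "do nothing unless fully understood" rule above can be illustrated with a short Python sketch. This is not the bot's actual code, which lives in the repository linked below. The function name nym_section_to_template, the regular expression, and the item grammar are all hypothetical simplifications. The sketch assumes that every item is a plain [[link]] or {{l|LANG|term}} carrying at most one {{q}}/{{qualifier}} or parenthesised qualifier. Any line it cannot fully parse makes it return None, meaning "make no change".

```python
import re

# Hypothetical item grammar: optional {{q|...}}/{{qualifier|...}} prefix, then
# [[term]] or {{l|LANG|term}}, then an optional {{q|...}} or "(...)" suffix.
ITEM = re.compile(
    r"^(?:\{\{(?:q|qualifier)\|(?P<pre>[^{}|]+)\}\}\s*)?"
    r"(?:\[\[(?P<link>[^\[\]|]+)\]\]"
    r"|\{\{l\|(?P<lang>[a-z-]+)\|(?P<term>[^{}|]+)\}\})"
    r"\s*(?:\{\{(?:q|qualifier)\|(?P<post>[^{}|]+)\}\}|\((?P<paren>[^()]+)\))?$"
)
OPEN, CLOSE = "{[(", "}])"


def split_top_level(text):
    """Split on commas that are not inside {{...}}, [[...]] or (...)."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in OPEN:
            depth += 1
        elif ch in CLOSE:
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def nym_section_to_template(nym, lang, lines):
    """Return e.g. '{{syn|es|a|q1=x|b}}', or None if anything is not understood."""
    params, index = [nym, lang], 0
    for line in lines:
        if not line.startswith("* "):
            return None  # unexpected structure: leave the entry alone
        for chunk in split_top_level(line[2:]):
            m = ITEM.match(chunk)
            if m is None:
                return None  # e.g. two {{l}} templates separated by a space
            if m.group("lang") not in (None, lang):
                return None  # link points at a different language
            quals = [q for q in (m.group("pre"), m.group("post"), m.group("paren")) if q]
            if len(quals) > 1:
                return None  # more than one qualifier on an item: ambiguous
            index += 1
            params.append((m.group("link") or m.group("term")).strip())
            if quals:  # a prefix qualifier attaches to the item it precedes
                params.append(f"q{index}={quals[0].strip()}")
    return "{{" + "|".join(params) + "}}" if index else None
```

Suppose the grande synonym lines are written with {{l|es|…}} links. The sketch then returns {{syn|es|voluminoso|q1=obsolete|enorme|amplio|q3=for cloth, shoe, place}}, which matches the target in the example. A line such as * {{l|es|conflicto}} {{l|es|bélico}} returns None, so it would be left for manual review.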

  1. Is this a desirable change? It seems like newer entries are moving towards the use of templates over sections, and I've been converting these manually when I encounter them, but if that's wrong, please let me know.
  2. Should this process hyp(er/o)nyms or just synonyms and antonyms? Anything else?
  3. Where should the templates go in the definition? Immediately after the "# " definition line, after all of the "##", "#*", and "#:" entries, or somewhere in the middle?
  4. In what order should the templates be inserted? Synonyms, then Antonyms, then Hypernyms, then Hyponyms?

I can run this on other languages besides Spanish, if desired.

Bot source code available for inspection at https://github.com/doozan/wikibot JeffDoozan (talk) 18:32, 12 August 2020 (UTC)

1) I'm very much in favor. 2) WT:EL also mentions Meronyms, Holonyms, and Troponyms, which are very rare, but I say include them all. 3) Not sure, I think this is disputed. 4) Yes, that's the order listed at EL. Ultimateria (talk) 18:44, 12 August 2020 (UTC)
An issue: in your gist of proposed changes, {{l|es|conflicto}} {{l|es|bélico}} was interpreted as two separate items. This is bad formatting (it should be {{l|es|[[conflicto]] [[bélico]]}}), but it seems to be pretty commonly used when the synonym is a phrase. So your script should not assume that {{l}} templates separated by a space are separate items. (See a full list of offending Synonyms sections.) However, they aren't all one item either: grid had a case where {{l}} was used by mistake instead of {{sense}} or {{q}}: {{l|pt|starting positions of racers}} {{l|pt|grid de largada}}. This should probably be {{sense|starting positions of racers}} {{l|pt|grid de largada}}. It would probably be best to skip these cases and check them over manually. — Eru·tuon 19:19, 12 August 2020 (UTC)
Thanks for the feedback. I've added support for the extra nyms and corrected the bug Eru mentioned. Anything that has synonyms that use more than one {{l}} or {{q}} tag will no longer be processed automatically. Here's the updated sample of revisions: https://gist.github.com/doozan/c915306db12ae735d5afc1891e561f30/revisions JeffDoozan (talk) 20:22, 12 August 2020 (UTC)

Image in the entry "swadeshi"

A 1930s poster with the caption “Concentrate on Charkha and Swadeshi”, depicting the independence activist Mahatma Gandhi using a traditional spinning wheel called a charkha to produce yarn while in prison.

I added the image shown on the right to the entry swadeshi (a policy of nationalist self-sufficiency in India, involving the revival and promotion of domestic production and (originally) the boycott of British products). I think it is a good illustration for the entry, as it is a historic poster promoting swadeshi and actually mentions the term in its caption. The entry is appearing as WOTD on 15 August 2020.

@Dan Polansky has removed the image from the entry on the basis that "it does not show policy (the referent of the word) but rather a person and has excessively long caption". I'm bringing the matter here to get more opinions. Thanks. — SGconlaw (talk) 08:44, 13 August 2020 (UTC)

Thank you for starting the discussion. My position is as indicated in the edit summary. This image is not lexicographical and does not show the referent. It does not bring any lexicographical fact nearer to the reader; it stands in contrast to images of animals, even familiar animals such as a house cat. An entry for a policy probably should have no image, since policies usually cannot be shown well. What I reject here is the idea of adding various loosely related illustrations to entries only so that each entry has at least one image, and then adding to each image overlong, lexicographically irrelevant captions linking to Wikipedia: the rendering in my browser shows the caption text to be twice as tall as the image itself. We have successfully regulated at least one class of SGconlaw images that I found inappropriate, in WT:ELE#Images, via Wiktionary:Votes/pl-2018-04/Image policy, although the image under discussion here is marginally relevant in a different way.
On a general note, burdening the reader's attention with irrelevant or marginally relevant visual items is unfortunate. It takes the reader's attention to find what they were looking for on the page; adding extraneous elements is not free. --Dan Polansky (talk) 08:52, 13 August 2020 (UTC)
As for "mentions the term in its caption": attesting quotations is all we need, if that is meant for attestation. Otherwise, "include any image using the term of the entry in caption" would be a very bad policy, leading to inclusion of swaths of irrelevant and marginally relevant material. --Dan Polansky (talk) 08:58, 13 August 2020 (UTC)
I agree on all points with Dan. Ultimateria (talk) 16:53, 13 August 2020 (UTC)
As do I. Pictures should be reserved for concrete objects (and perhaps adjectives), not abstract concepts. Andrew Sheedy (talk) 03:49, 14 August 2020 (UTC)

What do you think of the picture in 用愛發電? 恨国党非蠢即坏 (talk) 08:16, 15 August 2020 (UTC)

That's a quotation! --RichardW57 (talk) 11:36, 15 August 2020 (UTC)
@RichardW57: No it is not. The phrase in the picture is apparently using the literal meaning, which is different from its current meaning ("to work for free"). 恨国党非蠢即坏 (talk) 09:17, 22 August 2020 (UTC)

Terms which are lemma and non-lemma forms: how to categorize

I made Teamsters: this is both a standard plural of Teamster and a collective reference to the entity of the Teamsters union, so it is a lemma in that sense. Can anyone give me a precedent for this? Should categories like Category:English lemmas be manually added in this case? And for what it's worth, let's leave aside whether the second sense should be in the definition: my question would still stand in general. —Justin (koavf)TCM 01:40, 16 August 2020 (UTC)

I prefer the option of a sense at the main lemma labeled "in the plural" so all the definitions are on one page. In that case I wouldn't include the lemma category at the plural. One decent precedent I see is Cardinal/Cardinals, which rightfully lists entities as proper nouns. What I don't like is the redundancy and the fact that if you only look at one page, it's not obvious that the other contains definitions. Ultimateria (talk) 04:13, 18 August 2020 (UTC)

Picture Upload Policy

I invite editors experienced with pictures to examine the picture upload policy. The English-speaking countries have laws that allow pictures to be used for educational and not-for-profit purposes. However, Wikipedia frowns upon this because each country's law is just a little bit different. For example, the US allows not-for-profit users to use material, and the UK allows pictures to be used for educational purposes. Here's the UK law (item 7, bullet point #2):

https://copyrightservice.co.uk/copyright/p01_uk_copyright_law

Since Wikipedia is the internet's online encyclopedia, a cure-all for all picture uploading is at hand. Can a code be made, and allowed to be used liberally by picture uploaders? This will address Wikipedia's unfair "fair use" policy (it does not allow the "fair use" of pictures identified in the link above). —⁠This unsigned comment was added by Lord Milner (talkcontribs) at 17:35, 16 August 2020 (UTC).

There is no uniform policy for the various Wikimedia projects. Wikimedia Commons has the most restrictive one. Wikipedia and Wiktionary allow images that fall under fair use in the US. —⁠This unsigned comment was added by Lambiam (talkcontribs) at 19:08, 16 August 2020 (UTC).
I think it would be hard to justify a fair use policy at the Wiktionary. Wikipedia has encyclopedic articles about, for example, people and events, and it is easier to argue that a non-free image used under a fair use justification would aid in the understanding of such topics. This doesn’t really apply to Wiktionary entries which are simply definitions of terms. — SGconlaw (talk) 19:17, 16 August 2020 (UTC)
I'm having a hard time coming up with a fair use case for Wiktionary. If there was no free photo of Pluto? That's a hypothetical, and even then still a debatable one. There's just not a whole lot of demand for fair use here.--Prosfilaes (talk) 23:52, 16 August 2020 (UTC)
See Special:AllPages/File:, where the only really justified one to have locally under fair use is File:Far Side 1982-05-28 - Thagomizer.png and even then, I am not 100% convinced. —Justin (koavf)TCM 00:46, 17 August 2020 (UTC)
Since we're not actually using it, we don't have a good fair-use case. It's a fine example, but certainly one we could do without.--Prosfilaes (talk) 01:57, 17 August 2020 (UTC)
Since we're not actually using it? It is being used. —Justin (koavf)TCM 13:05, 17 August 2020 (UTC)
We should have a strong bias against local uploads. We should move these screenshots to Commons or straight up delete them. —⁠This unsigned comment was added by Koavf (talkcontribs) at 00:46, 17 August 2020 (UTC).
The thagomizer example is an interesting one, but then I note it is only used on a citations page. In most cases where a non-free image is highly desirable I think it should be possible to refer readers to a Wikipedia article where the image appears, as was done at thagomizer. I think "File:Far Side 1982-05-28 - Thagomizer.png" should be deleted from the Wiktionary. — SGconlaw (talk) 14:25, 17 August 2020 (UTC)
I don't think we can justify using the image from Far Side. The etymology section explains it well enough in words. Vox Sciurorum (talk) 15:31, 18 August 2020 (UTC)

"Common-gender" terminology issues

@Atitarev, AryamanA We have Category:Common nouns by language, which refers to what Wiktionary calls "common gender". There are multiple issues with this:

  1. "Common nouns" can also refer to the class of nouns that aren't proper nouns. To avoid this ambiguity, IMO we should rename Category:Common nouns by language to Category:Common-gender nouns by language, and similarly things like Category:Swedish common nouns to Category:Swedish common-gender nouns and Category:Ancient Greek common nouns to Category:Ancient Greek common-gender nouns.
  2. Because of the ambiguity of "common nouns", Dutch (which has a "common gender") puts its common-gender nouns in a one-off category Category:Dutch nouns with common gender. This should be renamed to Category:Dutch common-gender nouns. Meanwhile, Norwegian (Bokmål and Nynorsk) don't have any per-gender noun categories, which should be rectified.
  3. "Common gender" has two entirely different meanings, depending on language. In some languages, particularly Dutch and North Germanic, it refers to a separate gender category that historically derives from the merger of masculine and feminine genders, and is opposed to the neuter gender. In most other languages, however (e.g. Latin, Greek [Ancient or Modern], Russian, Hindi, etc.), it refers to a noun that can refer to either a male or female being, and takes the masculine or feminine gender according to the sense of the noun. Someone who probably didn't realize that this category existed went ahead and created a label 'masculine and feminine nouns', which is currently populated only by Category:Hindi masculine and feminine nouns. The description says it applies specifically to nouns referring to beings where the gender follows the sense (i.e. identical to the 2nd definition of "common gender noun"), but this isn't obvious from the name, and as a result it wrongly includes nouns like सिगरेट (sigareṭ, cigarette), which can be masculine or feminine but not referring to a being and not with any sense difference between the genders.

I would like to suggest one of two possibilities:

  1. Keep a single common-gender label that can have two different meanings depending on language. Call it 'common-gender nouns' and make it clear in its description that it can have either meaning depending on language. This won't cause ambiguity because languages with common gender (sense #1, i.e. merged masculine/feminine gender) can't have common-gender nouns (sense #2, i.e. either masculine or feminine depending on sense) and vice-versa. At least I *think* this is true; the only possible exception is Dutch, where some dialects have a two-way common/neuter system and others have a three-way M/F/N system.
  2. Split the two meanings into different labels. Sense #1 (merged masculine/feminine gender, as in Dutch and North Germanic) remains as 'common-gender nouns'; sense #2 becomes maybe 'masculine and feminine nouns by sense' or 'nouns that can be masculine or feminine by sense'.

Thoughts? Benwing2 (talk) 03:05, 17 August 2020 (UTC)

Is there a reason you haven't suggested we use the word epicene instead of common where appropriate? It can also be ambiguous, but at least there are two separate words. —Μετάknowledgediscuss/deeds 03:18, 17 August 2020 (UTC)
I think "epicene" could be yet another category for words referring to both males or females but having a fixed grammatical gender, per definition. An example would be до́ктор (dóktor, doctor, physician), which refers to both males and females but is used only as a grammatical masculine. Perhaps similar to Norwegian lege but I don't think we want any changes in Russian categorisations of genders.
OK to move Swedish, Danish and Dutch to "common-gender nouns". I think Norwegian (both NB and NN) is the one that still has remaining masculine/feminine genders, unlike the other North-Germanic languages and Dutch. There is some difference in gender classifications between, say Norwegian and Swedish, which should be carefully checked further.
Yes, the issue with Category:Hindi masculine and feminine nouns and similar needs a fix. --Anatoli T. (обсудить/вклад) 04:01, 17 August 2020 (UTC)
@Metaknowledge I was always under the impression that "epicene" meant as in Anatoli's example, i.e. a noun that can refer by sense to masculine or feminine beings but belongs to a fixed gender. In this sense it's the opposite of common gender, where the gender and sense agree. Benwing2 (talk) 04:13, 17 August 2020 (UTC)
I guess this is a little off-topic, but I have a small issue with the Dutch "common gender". In school, we were taught that every "common" gender noun is divided into either masculine or feminine (and thus referenced with the third-person pronoun as either hij or zij), and indeed the distinction common-neuter would clash with the canonical distinction masculine-feminine-neuter. If you look at Wiktionary:About Dutch it is clear that the common gender is used only as a placeholder, not a grammatical gender. Thus I am not sure this is a good solution for Dutch, as we don't have the category "?-gendered nouns per language". Furthermore, I think the whole Dutch category (or header policy) should be deleted altogether or moved to some other category. Thadh (talk) 09:28, 17 August 2020 (UTC)
The Dutch term collega is both common-gender in the sense of the grammatical genders m and f having merged in the standard lect as spoken in the Netherlands, and epicene. By defining “common gender” to only refer to the grammatical gender when applied to Dutch nouns, we use the ability to identify epicene Dutch nouns.  --Lambiam 18:27, 19 August 2020 (UTC)

Low German revisited

I'd like to propose some changes to Low German. Feel free to vote under any item. (Note: I'm not married to any of the language code names.)

Proposed tree:
  • Low Saxon: fam:nds
    • Old Saxon: osx
      • Middle Low German: gml
        • East Low Saxon: fam:nds-eas
          • East Low German: nds-gle
          • Plautdietsch: pdt (optional)
        • West Low Saxon: fam:nds-wes
          • Dutch Low Saxon: nds-nl (optional)
          • West Low German: nds-glw
Actionables
  1. Split language code [nds] into East Low German [nds-gle] (Märkisch, Mecklenburgisch-Vorpommersch, East Pomeranian, Low Prussian) and West Low German [nds-glw] (Eastphalian, North Low German, Westphalian)
  2. Move all entries under [nds] and [nds-de] to [nds-glw] and [nds-gle]
  3. Make [nds] a family code instead, renaming it to Low Saxon (I think that was always the intent, to eventually deprecate [nds] as a language code.)
Optionals
  1. Make Plautdietsch [pdt] an etym-only code for East Low German [nds-gle], moving all [pdt] entries to [nds-gle], labeling them with {{lb|nds-gle|dialectal|Plautdietsch}}
  2. Make Dutch Low Saxon an etym-only code for West Low German [nds-glw], moving all [nds-nl] entries to [nds-glw], formatting them with {{alternative spelling of|nds-glw}}, when needed
Previous discussions

@Korn, Rua, -sche, Stardsen, Mahagaja --{{victar|talk}} 21:31, 17 August 2020 (UTC)

I'm not in a position to talk about all of these proposals, but I do support splitting the Low German dialects according to linguistic boundaries rather than the political boundary of Netherlands vs. Germany. On the other hand, I oppose merging Plautdietsch with any variety of Low German that has remained in Germany. Our existing Plautdietsch entries don't seem to reflect this well, but w:Plautdietsch language#Influences and borrowings shows that Plautdietsch has a bunch of words borrowed from languages it's been in contact with (Russian, English, Spanish, etc.), most of which German Low German presumably doesn't have. That's got to be a severe impediment to mutual intelligibility. —Mahāgaja · talk 21:45, 17 August 2020 (UTC)
@Mahagaja Plautdietsch and the other East Low German lects share many of the same borrowings, ex. Koss (goat) from Polish koza, and Margell/Mejal (girl) from Old Prussian mērgā. Conversely, West Low German has quite a few borrowings from North Germanic. Regardless, the same argument could be made for Spanish of the Americas. --{{victar|talk}} 23:45, 17 August 2020 (UTC)
Apart from the fact that it is ridiculous to assume that different names for some exotic plants (watermelon and eggplant, as mentioned in the linked Wikipedia article, which aren’t at all staples in Germany) are a “severe impediment to mutual intelligibility” – speakers who came back to Germany had to learn names of various material objects that did not exist in the communist backwaters anyway, and quickly did so –, I question the mere distinctness of “Plautdietsch” beside the alleged High German “Volga German”. If there are “effects of the High German consonant shift” as exemplified on Wikipedia, it is all just an Ausgleichsdialekt. (As is also true, as I mentioned before, of Transylvanian Saxon, i.e. the German dialect of Romania, which cannot be mapped to any dialect in Germany.) Many of the features described for the alleged Mennonite Low German are also features of Mennonite High German. Allegedly it is “Low Prussian”, but Low Prussian is almost High Prussian. Suspiciously, one always compares both to Standard German, Proto-Germanic etc. but never “Plautdietsch” to Mennonite High German. This concept of “Plautdietsch” suffers heavily from selection bias, because some of the dialects will be closer to certain Low German dialects in Germany, some closer to other dialects, and some more mixed with Mennonite High German; but both “Plautdietsch” and “Volga German” are equally influenced by Russian, and more influenced by each other than by any other dialects of German. So it would be most appropriate to speak of Low Russian Mennonite German and High Russian Mennonite German, but it would be unclear how to fit them into the trees, as apparently one language came out by descent from multiple languages.
You aren’t able to distinguish this Plautdietsch and Volga German if you pick up various returnees in Germany – you only find various idiolects, some with more and some with fewer features pointing to certain Low or High German dialects, but then again all is mixed up –; only if you fall prey to selection bias by surveying only certain villages in Russia or Canada etc. can some distinctness be sieved out with luck, but there is none if you get and keep an integral picture. The Russian dialects of German are one language descending from all the German dialects from Palatinate German to Low Prussian, just as Slovio descends from all Slavic languages and will sometimes be closer to Russian and sometimes closer to Slovene, except that they are more chaotic because their states were not planned. And just as you can’t tell that an idiolect of Slovio descends from Russian or East Slavic when it is somewhat close to it, you very often can’t tell, of a German dialect speaker from Russia, whence his dialect descends. Yes, I have heard many of them speaking. The picture is hopelessly blurred. Fay Freak (talk) 10:13, 18 August 2020 (UTC)
Regarding the comment that you can't distinguish lects in returnees to Germany, whose lects have mixed and blurred and adapted towards standard German and/or the prevailing local lects (i.e. you can't distinguish who currently speaks which of those lects in a group of people who no longer really speak those lects?) : the obvious response is that modern mingling and convergence doesn't travel back in time and erase lects' historical (discrete) existence. Great Andamanese languages koineized and creolized as the populations of speakers diminished and were relocated and intermingled, but this doesn't retroactively change their distinctiveness in the past. - -sche (discuss) 06:14, 21 August 2020 (UTC)
I oppose absolutely everything about this, as my preferred solution would be to merge every Low German lect (including Middle Low German!) into a single language code with a normalised system and spelling, and temporal variants covered with tags and "alternative forms of". The splitting of Low German is what Germans call a 'Glasperlenspiel' (a pointlessly complicated effort that serves no other purpose than keeping those involved busy), hampering the actual usability of this dictionary by disjoining related information without benefit in return. The idea that somehow "East Low German" and "West Low German" exist in such a form that these respective regions differ more from each other than the lects within these regions differ from each other is not something I can confirm from my studies. Korn [kʰũːɘ̃n] (talk) 20:30, 30 August 2020 (UTC)
@Korn: The heart of what I'm suggesting is merging, but there is a pretty clear delineation between ELG and WLG, both in vocabulary and syntax. I can tell you, as someone coming from a family of Westmünsterländisch speakers, that Plautdietsch might as well be High German for as much as they understand Low German Mennonites. Normalizing the two into one would be nothing short of artificial shoehorning. For ELG, Plautdietsch would be the main entry space, being the one with the most speakers, and for WLG it would be Westphalian. I think on that basis, we have a pretty good stage for standardizing orthography. --{{victar|talk}} 16:54, 8 September 2020 (UTC)
So name them. Name these differences which differentiate all dialects west of some line from those east of that line while not also differentiating the dialects east/west of this line internally, can't wait to learn something new after all these years. Korn [kʰũæ̃n] (talk) 20:56, 8 September 2020 (UTC)
@Korn, I can list all the changes Plautdietsch underwent that Westphalian and other WLG dialects did not, if that's what you're asking, but they're nothing the Wikipedia page couldn't already tell you, i.e. palatalization (gistern vs. jistren) and diphthongization (ma(a)ken vs. moake(n)). But the most commonly cited division between East and West Low German is syntax: WLG generalized -(e)t to the 1st–3rd person plural of verbs and ELG generalized -en, and Plautdietsch preserves the present perfect. At the very least, we can agree that having separate codes for Plautdietsch [pdt] and Low Prussian [nds-de] is unfortunate. --{{victar|talk}} 03:04, 9 September 2020 (UTC)
You're comparing Plautdietsch and Westphalian. Of course they differ, they're different dialects. Plautdietsch and any other dialect differ, Westphalian and any other dialect differ. Most Westphalian dialects differ not a little from other Westphalian dialects. You're talking about some West Low German vs. East Low German and haven't even stated where you want to draw the line between them. But setting aside the irrelevance of your localised examples for the differentiation of West vs. East on a broader scale, even these don't hold up. Palatalisation exists in patches basically everywhere, including Westphalian, e.g. Lippish, where velar fricatives are palatalised unconditionally word-initially. (The strong palatalisation of stop consonants is a feature confined to Prussian and Pomeranian, but it's phonetic, not phonemic, so certainly not worth splitting codes over; nor would it be if it were phonemic.) The generalisation of the ending -en exists in multiple dialects level with or west of Westphalian, e.g. East Frisian. Diphthongisation of /ɔː/ to /ɔˑə/ exists in Southern Westphalian, e.g. around Dortmund. (The lack of a merger of /a/ and /ɔː/ is the defining feature of Westphalian and is not found in dialects either east or west of it.) Incidentally, I think I have heard Plautdietsch speakers use a monophthongised /moːcɘ/ in the past. I see a point in excluding the form of Plautdietsch which is 50% High German; otherwise I fully stand by what I said. Even if all your examples were neatly distributed along some east/west divide, they're minuscule and no basis to split codes on. If we worked like that, we'd have a separate code for every bloody village. Korn [kʰũæ̃n] (talk) 08:07, 9 September 2020 (UTC)

tlb template

I edited outsuave to remove the {{tlb|en|rare}} at the end of the sense line. I had two reasons: one was because I disputed that the term was rare (which is clearly subjective and I don't want to argue over that), and the other was that the term label is almost invisible at the end of a verb's headword line:

outsuave (third-person singular simple present outsuaves, present participle outsuaving, simple past and past participle outsuaved) (rare)

Are any of our readers actually going to spot that tiny label hiding there at the end of the lengthy, and generally completely ignorable, headword line? I'd be very much in favour of placing the label at the start of each sense, like we have traditionally done. This is especially so in an entry like outsuave with only one sense; there is a clear disadvantage to using the {{tlb}} template with no added benefit. This, that and the other (talk) 01:59, 18 August 2020 (UTC)

@This, that and the other: About the “rare” label: all of the suitable uses I found have been added as quotations, including the only use on Usenet.
The documentation for {{tlb}} states it is used directly after a headword line: “This template takes the same parameters as {{label}} ({{lb}}), but is used directly after a headword line, not in a definition.”
{{tlb}} also adds a different category, “Category:English rare terms”, whereas {{lb}} adds “Category:English terms with rare senses”. J3133 (talk) 02:23, 18 August 2020 (UTC)
My point is that I believe that it is unhelpful to our readers to use {{tlb}} in this way, even if this is how it is documented to be used. This, that and the other (talk) 02:43, 18 August 2020 (UTC)
I suppose that the point is to avoid repetition. If there is only one sense, there is no repetition to be avoided.  --Lambiam 17:50, 19 August 2020 (UTC)
Yeah, it's a messy situation. When I created the template (as Template:term-context, in the era of Template:context), I envisioned it as enabling a distinction between e.g. entirely archaic words/spellings vs. still-common terms with a single archaic sense, and as a way to avoid placing lengthy non-meaning-specific labels like "American spelling" or "British spelling" in front of every meaning of a word, especially a highly polysemous word with other labels, like realize. But it is easily missed if placed at the end of the headword line, especially for a word with only one or two senses. I don't know where else it could go, though, on entries where it's appropriate to use. - -sche (discuss) 08:01, 20 August 2020 (UTC)
I think that makes total sense, as some labels can be fairly lengthy and tedious if repeated in front of every sense. But if a label is short, {{term-label}} probably doesn't need to be used. — SGconlaw (talk) 08:57, 20 August 2020 (UTC)
@Sgconlaw: It is not only about the appearance. The category that is added is also different, as I mentioned: “{{tlb}} also adds a different category, “Category:English rare terms”, whereas {{lb}} adds “Category:English terms with rare senses”. J3133 (talk) 09:07, 20 August 2020 (UTC)
This is true, and theoretically we should make that distinction... (or we should discuss whether to give up on that as an unmaintainable project, and resign ourselves to the fact that technically a term with only rare senses (or only one sense, which is rare) is still a "term with rare senses"...) - -sche (discuss) 11:18, 20 August 2020 (UTC)
It's tempting to suggest using {{tlb}} on the sense line in cases like this:
# {{tlb|en|rare}} To douse with phlogiston.
But in the case of outsuave we get:
  1. (transitive) (rare) To exceed in suaveness.
which is hardly ideal. In my mind, the imperfect categorisation is a lesser issue than presenting the entry such that all information can be easily found by the reader, so I'd prefer to format it as
# {{lb|en|transitive|rare}} To douse with phlogiston.
until a better solution is available. This, that and the other (talk) 12:43, 20 August 2020 (UTC)
Frankly, I don't see any particular benefit to having both "Category:English rare terms" and "Category:English terms with rare senses". The distinction is too subtle. The latter category is enough. — SGconlaw (talk) 12:59, 20 August 2020 (UTC)
@This, that and the other: If using it that way, why not include “transitive” in the {{tlb}}? J3133 (talk) 13:49, 20 August 2020 (UTC)

disfix/disfixation: Template:disfix/Template:disfixation; Category:Disfixations by language

To @Rua, Benwing2, Atitarev, Fay Freak, Erutuon and whoever is interested: I'd like to create a template and categories to cover cases such as French entretien and Russian призы́в (prizýv), which were obtained by subtracting a morpheme (with or without ablaut).

I had created {{deverbative}}/{{deverbal}} to that end, but someone (I think it was @Mahagaja) observed that deverbatives/deverbals are simply words derived from a verb; it doesn't say anything about the morphological/derivational process at work (and indeed, the template was used at убо́рщик (ubórščik), for example).

Thoughts? PUC – 11:59, 18 August 2020 (UTC)

Not bad. It would leave us {{deverbal}} for nonconcatenative derivations, since it is not clear yet how to use {{transfix}}, as @RichardW57 complained not a month ago. But whither would you link? You cannot link to a negative string. Somewhere you would want pages describing these disfixes with their different functions (with different IDs by which they are categorized), though at least the template-use part is clear. (With {{transfix}}, which has a different linking issue, I do not even know what to supply to the template. An explanation with examples is at Wiktionary:Beer parlour/2017/January § Arabic consonant patterns. You would clearly have some sign for the nothing, with IDs if applicable, and probably one place for the negative, whereas with transfixes there are multiple ways to visualize them and multiple places to link to. Though perhaps finding this one place is even harder. Erutuon says there that it is “unnecessarily biased against nonconcatenative morphology to not have the patterns described in the main namespace”, which is most likely true, so I guess the disfix page would also need a mainspace page.) Fay Freak (talk) 16:12, 18 August 2020 (UTC)
No objection, seems fine but I have no strong opinion on this. --Anatoli T. (обсудить/вклад) 02:28, 19 August 2020 (UTC)
Hmm, I didn't remember that I said that so definitely. I wouldn't be opposed to putting consonant patterns in appendices now. Some readers could be confused about the placeholder consonants if the patterns are in mainspace. — Eru·tuon 18:26, 19 August 2020 (UTC)
Re "whither would you link": am I missing something, or what about just (for "призыв") "from призывать, by removal of -ать", with a parameter to suppress linking of the last part if it were not itself entry-worthy, so that foob might say "from foobar, by removal of -ar" with no link? Linking to a page that explained disfixing would also work. - -sche (discuss) 11:10, 20 August 2020 (UTC)
Is this truly different from Template:back-formation/Category:Back-formations by language? – Einstein2 (talk) 18:15, 19 August 2020 (UTC)
@Einstein2: I hadn't thought of that. But although there's indeed a great deal of overlap, not all back-formations are disfixations: see décontenancer, signaler, insignifier, where a whole paradigm was created out of (what was seen as) a non-lemma form. PUC – 11:18, 20 August 2020 (UTC)
I think those could (should) still be considered back-formations. I think there are English words back-formed from plurals (and labelled as back-formations). This requires more thought, though, as to whether there is a type of entry which is a product of disfixation but not back-formation, or whether we would simply prefer to use the word disfix for some cases for some other reason... - -sche (discuss) 11:27, 20 August 2020 (UTC)

*h₃migʰleh₂

@JohnC5, Rua, Victar, AryamanA Do y'all have any idea about the placement of the accent in this word? I don't know anything about Balto-Slavic accentuation, but it looks to me as if Slavic and Greek disagree with respect to the accent. The Greek and Slavic sources give the form without an accent (the Slavic dictionary actually gives *h₃migʰlh₂), but can a word just be unaccented like that? In my opinion, this word was best left at the original root entry, and a separate entry was not really required as there were only 3 descendants. Note also that in the original entry the descendants were sourced, but they were removed and moved to this entry, this time without any sources. It isn't a big deal, as the sources can be added right back, but I think if we cannot determine the accent, it'd be better to leave this information in the main entry. -- Bhagadatta (talk) 03:27, 19 August 2020 (UTC)

@Bhagadatta Even if you change something in the form, the Balto-Slavic stress will not change (these laws are not Balto-Slavic, but descendants). I would quote Beekes' reconstruction as a curiosity. Although this is my favorite Indo-European linguist. But I do not forbid you to fantasize. xD Gnosandes (talk) 17:58, 25 August 2020 (UTC)
@Gnosandes: Haha thanks for moving the page! -- Bhagadatta (talk) 00:58, 26 August 2020 (UTC)
@Bhagadatta Hehe, yep! Gnosandes (talk) 16:55, 26 August 2020 (UTC)

whose voices in audio... will they do more?

Is it possible to ask the people whose voices are on pages pronouncing, say, the infinitive of a Russian verb, to also record the conjugated forms? Or the declension of a noun? A participle? Etc.?

They have great voices and know the language, so I wouldn't do it, but I think the code infrastructure is present...

How to contact them and offer help?

I didn't find an earlier discussion of this but could be wrong... —⁠This unsigned comment was added by 72.174.54.201 (talk) at 00:06, 20 August 2020 (UTC).

You can click the menu button on an audio file to see the uploader ("author") and leave a message on their talk page to see if they're interested in recording more. Ultimateria (talk) 04:22, 20 August 2020 (UTC)

Categories - English agent nouns that existed previously?

While looking at the English agent nouns category page, I noticed that 'curator' wasn't included. After investigating how I might add it, I noticed that the normal formatting for it, 'agent noun of|en|curate', wouldn't fit, since it's not a new form in English, 'curator' having existed in Latin. But seeing as it is an English agent noun, I was wondering what the policy would be on including such words in this category?

VoxRationum (talk) 05:55, 20 August 2020 (UTC)

Since it's used in definition lines, not etymology sections, it doesn't matter IMO that the term wasn't coined in English. It functions in English as the agent noun of curate, and that's enough. Note, however, Wiktionary:Requests for deletion/Others#Template:agent noun of, where Rua argues that the template shouldn't be used at all and straightforward definitions should be used instead. But in the year and a half since that RFD was opened no one else has commented on it. —Mahāgaja · talk 07:19, 20 August 2020 (UTC)
Re whether to put the word in the category (sidestepping the question of whether to use the template): my initial reaction was that it would be fine. However, I notice both [[agent noun]] and w:Agent noun insist the noun must be derived from a verb, and we did just have some discussions about using other words, like eye dialect, only in their strict sense, so, IDK. - -sche (discuss) 11:13, 20 August 2020 (UTC)
OK. I admit at the time I wrote the above I didn't realize that the verb curate is a back-formation from the noun curator, nor that the verb is currently defined as "act as a curator for", which means reducing the definition of curator to {{agent noun of|en|curate}} would result in a circular definition. I still don't see anything wrong with manually adding {{cln|en|agent nouns}} to the entry, though. —Mahāgaja · talk 07:42, 22 August 2020 (UTC)

Wu Dialects

I've recently been adding Suzhounese vocabulary. One problem I've run into is that Wu readings automatically default to Shanghainese phonology and its tone sandhi system. Is there a way of incorporating Suzhounese readings, and those of other Wu dialects? —⁠This unsigned comment was added by Fluoromethyl (talk • contribs) at 15:54, 21 August 2020 (UTC).

@Fluoromethyl: The code at Module:wuu-pron is designed specially for Shanghainese. We would need new code designed for Suzhounese.
(Alternatively, we could include new code into Module:zh-pron that accepts plain IPA.) —Suzukaze-c (talk) 07:59, 22 August 2020 (UTC)
@Suzukaze-c: So can we do that? Input IPA for the reading? Fluoromethyl (talk) 09:40, 22 August 2020 (UTC)
Not yet qwq
But I for one won't get mad if you write (for example) * {{a|Suzhounese}} {{IPA|wuu|/a⁵ a⁵/|[a⁵⁻³ a⁵]}} instead. —Suzukaze-c (talk) 09:44, 22 August 2020 (UTC)

User:Geographyinitiative is vandalising 戰狼 and editwarring

The user is doing this by adding a link to the nonexistent page zh:戰战狼2 and switching the wording to absurd phrases like "Chinese character title (of a movie)". @Chuck Entz, Justinrleung, kc kennylau 恨国党非蠢即坏 (talk) 09:34, 22 August 2020 (UTC)

@恨国党非蠢即坏, Geographyinitiative: Edit warring is unacceptable (on both ends). You two should have talked it out way earlier. The edit summaries should also be used to communicate the problems with the edits. Reverting without explanation doesn't make it clear to the other side what is wrong with their edits. — justin(r)leung (t...) | c=› } 09:42, 22 August 2020 (UTC)
It is clear that a link helps the reader understand the concept further. But I can understand others may not agree. I don't plan to edit the page again. I'm not really familiar with the term and its usage. The way the page is now is just fine anyway. Sorry for causing trouble. I just wanted a link there. But it is not that important. Geographyinitiative (talk) 10:05, 22 August 2020 (UTC)
@Geographyinitiative: From what I can see, the reason for the revert was that you linked to the wrong place, which is probably even worse than not linking at all. Also, the wording of "Chinese character title" is quite stilted. I don't think anyone is against linking if it's linked properly. — justin(r)leung (t...) | c=› } 10:16, 22 August 2020 (UTC)
I don't plan to edit the 战狼 page anymore. I want to work with others to edit the dictionary, not get into internet fights. The edit I wanted is not important, and I don't want to interfere with other people's creative viewpoint on the dictionary. I tend to add too many links, and I view characters readable in Chinese languages, Japanese, etc. as characters, not just 'Chinese'. My mindset is Traditional characters as default. But I don't want to fight over that kind of minor stuff. I wash my hands of it, unless there is a penalty I need to undergo. Geographyinitiative (talk) 10:18, 22 August 2020 (UTC)
The problem is not that you added a link. It's that the link was broken (and in multiple ways). —Suzukaze-c (talk) 10:20, 22 August 2020 (UTC)

Night mode

I hope someone makes a night/dark mode for the wikis. The white screen hurts my eyes. --Octahedron80 (talk) 12:31, 22 August 2020 (UTC)

It's more the browser's job than every individual site's job to provide colour sets. Perhaps you could create (or find) a custom stylesheet, or accessibility settings. The old Opera browser was very good with applying custom colours to sites but they probably removed that feature when Opera got ruined and dumbed down. Equinox 13:43, 22 August 2020 (UTC)
Many popular sites, like Twitter, Facebook and YouTube, have started offering a night/dark mode to their members. Why don't we have it yet? --Octahedron80 (talk) 03:30, 23 August 2020 (UTC)
It might have something to do with them being giant corporations with dedicated employees hired to work on the UI, versus us being a not-for-profit website entirely run by volunteers. Just a hypothesis. —Μετάknowledgediscuss/deeds 03:34, 23 August 2020 (UTC)
And yet it moves: phab:T221809. --Vriullop (talk) 08:16, 23 August 2020 (UTC)
I once applied color-inversion CSS to the entire page, but it did not work on the left menu. --Octahedron80 (talk) 11:03, 27 August 2020 (UTC)

Linking verb vs unchanged adverb

Page 21 of Garner's fourth edition reads:

One must analyze the sentence rather than memorize a list of common linking verbs. Often unexpected candidates serve as linking verbs—e.g.:

• “The rule sweeps too broad.” (The writer intends not to describe a manner of sweeping, but to say that the rule is broad.)

• “Before the vote, the senator stood uncertain for several days.” (The word describes not the manner of standing, but the man himself.)

A similar issue arises with an object complement, in which the sequence is [subject + verb + object + complement]—e.g.:

• “Chop the onions fine.” (The sentence does not describe the manner of chopping, but the things chopped. The onions are to become fine [= reduced to small particles].)

• “Slice the meat thin.”

An elliptical form of this construction appears in the dentists’ much-beloved expression, Open wide (= open your mouth wide)

However, I find it contradictory that dictionaries include an adverb sense with the relevant meaning for fine and thin, as well as -ly adverbs in phrases such as thinly sliced ham or finely chopped herbs (oed.com/oed2/00251139 ; oed.com/oed2/00251175; oed.com/oed2/00084909; oed.com/oed2/00084930).

Collins lists both adverbs, thin and thinly, with the same meaning:

THIN (adverb): ​in a way that produces a thin piece or layer of something, I like my bread sliced thin.

The adverb tight comes with specific grammatical notes: see ahdictionary.com or oxfordlearnersdictionaries.com.

TIGHT oed.com/oed2/00252669 vs TIGHTLY oed.com/oed2/00252684 --Backinstadiums (talk) 15:03, 22 August 2020 (UTC)

Morphemes via Borrowing

If morphemes may be perceived by speakers of a language through pairs of words borrowed from another language, but we generally adopt the pattern of only presenting the etymology of an attested source word at the source word's entry, may we record such a borrowed word as containing the morpheme? Can the morpheme be given an entry? WT:CFI seems to say that mere morphemes don't qualify, but we nonetheless have plenty of prefixes and suffixes on Wiktionary. --RichardW57 (talk) 17:03, 23 August 2020 (UTC)

English is full of such pairs of borrowed words and indeed families of words derived from Latin or French, but the application I have in mind is the less productive Thai morpheme ำน, which appears in borrowed pairs such as เดิน (dəən, to walk) and ดำเนิน (dam-nəən, procession (verbal noun)), but has been applied to derive only a very few words from 'native' words, such as สำเนียง (sǎm-niiang, sound, phoneme) from เสียง (sǐiang, sound). If we are allowed to record ำน as a constituent of ดำเนิน (dam-nəən), how should we do it? My preferred method is to add a note such as 'synchronically analysable as เดิน (dəən, to walk) +‎ ำน' to the etymology, which automatically puts the word in the corresponding category. (@Octahedron80 would appear to demur, saying that Thai doesn't have the concept of an infix, but merely mimics Khmer. We consequently have a potential edit war over the etymology of สำเนียง (sǎm-niiang), which would be particularly bad if it interfered with the outstanding work of translating the information on the etymology to English.) --RichardW57 (talk) 17:03, 23 August 2020 (UTC)

Colloquial Pali

My views on this are coloured by the assumption that Wiktionary is intended to be useful (for languages other than Sanskrit), except where we consider it more important to be correct. In particular, we want someone who can read English and can split text into words to be able to look those words up without having to retype them. --RichardW57 (talk) 17:39, 23 August 2020 (UTC)

I have become aware of a minor variation in the abugidic Thai-script spelling of Pali. While higher-quality publications fight to use the sequence <U+0E34 THAI CHARACTER SARA I, U+0E4D THAI CHARACTER NIKHAHIT> for Roman-script "iṃ", most material (about 99%) on the web has given up the fight (which I think began with Windows XP enabled for complex scripts) and just uses <U+0E36 THAI CHARACTER SARA UE> instead. There are also dead-tree publications that use SARA UE, which happens to be currently leading 3:0 in quotations on Wiktionary. I think we should go with the former as the 'standard' form, but how should the other form be tagged? Is it appropriate to tag the forms using SARA UE as 'colloquial'? The overwhelming volume of Pali text on the web is long-established text; I'm not aware of any chatter in Pali. (There is a Pali Wikipedia, but that seems to be entirely in Devanagari.) --RichardW57 (talk) 17:39, 23 August 2020 (UTC)

SARA UE in Pali will transliterate to "iṃ" in the Roman script; there is no problem there. --RichardW57 (talk) 17:39, 23 August 2020 (UTC)
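The mapping between the two spellings is mechanical, which is what makes a variant-form template (or a bot pairing the forms) feasible. A minimal Python sketch, assuming the input is already known to be Pali in Thai script: in Thai itself SARA UE is a distinct vowel, so this rewrite must never be applied to Thai-language text. The function names are hypothetical. Note that Unicode normalization does not equate the two spellings, since U+0E36 has no decomposition mapping.

```python
# Hypothetical helpers for Pali written in Thai script (NOT for Thai-language text).
SARA_I = "\u0e34"    # THAI CHARACTER SARA I
SARA_UE = "\u0e36"   # THAI CHARACTER SARA UE
NIKHAHIT = "\u0e4d"  # THAI CHARACTER NIKHAHIT


def preferred_pali_spelling(word: str) -> str:
    """Rewrite the common SARA UE spelling as SARA I + NIKHAHIT (Roman "iṃ")."""
    return word.replace(SARA_UE, SARA_I + NIKHAHIT)


def is_sara_ue_variant(word: str) -> bool:
    """True if the word uses the SARA UE workaround spelling."""
    return SARA_UE in word


if __name__ == "__main__":
    variant = "\u0e01" + SARA_UE  # KO KAI + SARA UE
    standard = preferred_pali_spelling(variant)
    print([hex(ord(c)) for c in standard])  # ['0xe01', '0xe34', '0xe4d']
```

Both spellings should then transliterate to the same Roman form, so only the page pairing needs this rewrite, not the transliteration module.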

Richard, these long, hyper-specific posts of yours aren't getting any traction, so I'll give you one piece of advice and two recommendations: 1. This probably isn't BP material. 2. We have a glossary of terms we use in the dictionary; look up "colloquial" there and you'll see it doesn't match what you describe. 3. There's barely a Pali editor community. If I were you, I would make a Pali-specific template to handle this spelling variation (an analogue of {{yi-unpointed form of}}) to point the less common form to the more common one, but I think you already knew that you should do that, so none of this was necessary. —Μετάknowledgediscuss/deeds 15:26, 25 August 2020 (UTC)
@Metaknowledge: It isn't obvious where else I should have raised the issue. "Informal" is the only term there which might be better at capturing the stratified nature of the usage. However, your technical solution does kick the question into touch for grammatically simple words - the template can be used to centrally improve the wording if people don't like it. So it's a good solution. Thank you. Perhaps we simply impose more levels of indirection for grammatically complex words before one reaches the detailed glosses. Would a less inflammatory version of "pi-Thai-workaround for" be a sensibly named template? French and Romanian have both suffered from comparable spelling distortions because of subtly less-than-ideal computer support, but I've had no suggestions from that direction. --RichardW57 (talk) 18:40, 25 August 2020 (UTC)
The major use of the labelling will be footnotes for inflection tables. I have plans for the relevant footnote to be centralised so it too can be centrally reworded. --RichardW57 (talk) 18:40, 25 August 2020 (UTC)
I'd recommend the Information desk or a more informal venue like Discord, but I guess I was saying that you should be bold and go ahead with your ideas unless there is disagreement or conflict. Inherent in that is centralising your text, so that if someone does disagree, it will be easy to fix. —Μετάknowledgediscuss/deeds 18:49, 25 August 2020 (UTC)

Entries for Arabic inflection forms

Which Arabic inflection forms should have separate entries?

To start with, I think there should be separate entries for the indefinite singular masculine accusative of nouns, since most (but not all, a counterexample being ماء) have an extra alif at the end distinguishing them from the nominative and genitive forms. I am interested in creating a bot to perform this task of page creation. If there is consensus, I can give a more detailed plan. Kritixilithos (talk) 12:50, 24 August 2020 (UTC)

A bot would have to be very carefully safeguarded; @Benwing2 has experience running an Arabic inflection bot and the difficulties therein. If you make a bot, you can run some test edits and then create a vote. —Μετάknowledgediscuss/deeds 00:41, 27 August 2020 (UTC)
@Kritixilithos, Benwing2, Metaknowledge: Firstly, I don't think it's a good idea to create separate entries for terms with the same title but different diacritics, such as مَاءً (māʾan). Secondly, this word doesn't receive an alif in the accusative indefinite, since it ends in a hamza, so the declension is correct. It's unlike كِتَابًا (kitāban), which does receive an alif in the accusative indefinite of كِتَاب (kitāb). --Anatoli T. (обсудить/вклад) 01:16, 27 August 2020 (UTC)
Isn't that just agreeing with everything Kritixi already said? —Μετάknowledgediscuss/deeds 02:03, 27 August 2020 (UTC)
@Kritixilithos, Metaknowledge Beware that when creating non-lemma entries, there are a lot of special cases to handle. You have to handle e.g. creating an Arabic section when there is none, doing nothing when the entry is already present, potentially adding an entry in the same etymology section (we generally group etymology sections by pronunciation for Arabic), creating a new etymology section, etc. I have an existing script to do this for Arabic verbs, participles, verbal nouns, etc.; it runs to 2,324 lines of Python, not counting utility modules. You probably don't need all the complexity in this script but I estimate your script has to be at least 600-700 lines to handle all the cases properly. If you're not comfortable writing and debugging scripts of this length, I wouldn't consider this task. Benwing2 (talk) 04:38, 27 August 2020 (UTC)
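The cases listed above can be sketched as a small decision function. This is a minimal Python sketch over a deliberately simplified, hypothetical page model (a language name mapped to etymology sections, each identified by its pronunciation, following the grouping described for Arabic). It is not Benwing2's script: a real bot would also have to parse and regenerate wikitext, keep language sections in order, and handle many more edge cases.

```python
from dataclasses import dataclass, field


@dataclass
class EtymSection:
    pronunciation: str
    entries: list[str] = field(default_factory=list)  # e.g. "Noun: accusative of X"


@dataclass
class Page:
    # Hypothetical model: language name -> etymology sections in page order.
    languages: dict[str, list[EtymSection]] = field(default_factory=dict)


def add_nonlemma(page: Page, lang: str, pron: str, entry: str) -> str:
    """Add a non-lemma entry, returning which of the cases applied."""
    sections = page.languages.get(lang)
    if sections is None:
        # Case 1: no section for this language yet.
        page.languages[lang] = [EtymSection(pron, [entry])]
        return "created language section"
    if any(entry in sec.entries for sec in sections):
        # Case 2: the entry is already there, so do nothing.
        return "already present"
    for sec in sections:
        if sec.pronunciation == pron:
            # Case 3: reuse the etymology section with the same pronunciation.
            sec.entries.append(entry)
            return "added to existing etymology section"
    # Case 4: start a new etymology section.
    sections.append(EtymSection(pron, [entry]))
    return "created new etymology section"
```

Returning a label for each branch makes it easy to log what a bot run actually did, which helps when mistakes have to be cleaned up later.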
@Benwing2: I see, thanks for the advice. I note you handled noun plurals in your script, I might consider adapting it. Kritixilithos (talk) 07:41, 27 August 2020 (UTC)
@Kritixilithos Feel free. It sounds like you have enough programming experience to be able to take on a task like this. The warning is just to make it clear that this isn't a trivial task, basically to scare off newbie programmers who don't have the mindset to handle all the edge/corner cases properly. You also have to be able to clean up mistakes if they happen (which they probably will, eventually). For example, there was a bug in the handling of certain 2nd masculine plural subjunctives and jussives in Module:ar-verb (which I wrote), see [1]. Basically, I forgot a silent alif in one place. The code in this module was used to generate non-lemma entries for several thousand verbs, so I had to write another script to clean up the mess, moving the misspelled pages to the properly spelled pages if that was possible without messing something up, otherwise removing the misspelled form from the page. This removal wasn't trivial: you might have to remove one line, one entry, one whole etym section or the entire page, and when removing an etym section, if you're left with only one, you have to deindent the etym section, since pages with single etym sections are indented less than pages with multiple etym sections.
BTW the github page you link to above is way out of date; if you're interested I'll push my more recent code to that page. Benwing2 (talk) 08:38, 27 August 2020 (UTC)
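The deindenting step mentioned above can be sketched on raw wikitext. A minimal Python sketch, assuming the usual Wiktionary layout (a lone ===Etymology=== heading with part-of-speech headings at level 3, versus numbered ===Etymology N=== headings with their subsections one level deeper) and assuming `lines` holds a single language section. The function name is hypothetical.

```python
import re

# A wikitext heading such as "====Noun====" (same number of "=" on both sides).
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def deindent_single_etymology(lines: list[str]) -> list[str]:
    """If exactly one numbered "Etymology N" heading is left in this language
    section, rename it to "Etymology" and raise every deeper heading after it
    by one level. Otherwise return the lines unchanged."""
    etym_positions = []
    for i, line in enumerate(lines):
        m = HEADING.match(line)
        if m and re.fullmatch(r"Etymology \d+", m.group(2)):
            etym_positions.append(i)
    if len(etym_positions) != 1:
        return list(lines)

    start = etym_positions[0]
    level = len(HEADING.match(lines[start]).group(1))  # normally 3
    out = list(lines)
    out[start] = "=" * level + "Etymology" + "=" * level
    for i in range(start + 1, len(out)):
        m = HEADING.match(out[i])
        if m and len(m.group(1)) > level:
            marks = "=" * (len(m.group(1)) - 1)
            out[i] = f"{marks}{m.group(2)}{marks}"
    return out
```

Headings at or above the etymology level (such as a trailing ===References===) are left alone, since they were never nested under the numbered section.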
@Benwing2: Noted. Sure, I would be interested in your recent code. Even if I choose to write a bot anew, I can still refer to your program to know what things to handle. Kritixilithos (talk) 09:06, 27 August 2020 (UTC)
“we generally group etymology sections by pronunciation for Arabic” – we of course group by etymology. Having a new etymology section for each pronunciation, or having only pronunciation sections, is annoying (sometimes followed even by identical reference sections for each etymology, which is even more distracting). Page layout should not depend on the pronunciations; they are not what to structure the page around. It is probably easiest to put all inflections and non-lemma forms (including verbal nouns, which are categorized as lemmas) into one etymology section {{nonlemma}}, though even that is often superfluous if etymology 1 is only “from the root XYZ” and the non-lemma forms are also from that root. Fay Freak (talk) 10:45, 27 August 2020 (UTC)

"Translations" of surnames

An anon has added English (and in some cases French) sections to the entries for a number of Finnish surnames; see e.g. Tikkanen. The names added so far seem to belong to famous sportsmen, and it is clear that plenty of usage can be found in many languages, but does that make "Tikkanen" a French, English, German, Swahili etc. word? What do we think about this? --Hekaheka (talk) 15:05, 24 August 2020 (UTC)

  • Many of the more common English surnames have reasonable translation tables. See Smith as an example. However, I'm pretty sure that "Tikkanen" is not an English surname (only half a dozen entries on [2]; for comparison, the same source has nearly 4,000 entries for the Italian surname "Rossi"). SemperBlotto (talk) 15:10, 24 August 2020 (UTC)
It is the name of a US horse with an entry on Wikipedia, though. The problem is that some of these entries may be useful. One may want the translations for translating information about a sportsman to another language - perfectly legitimate. Some might miss the Thai forenames - how do you inflect Somchai in Russian? The information doesn't have to come through formal translation - one can just track the descendants of the surname in the original language. If Wikimedia's truly not going to run short of storage space, then we have two filtering criteria:
  • Notability - is there an entry on the English Wikipedia? Do we need to raise the bar beyond that? Do we use different languages' wikipedias?
  • Reliability - three quotations or whatever.
We have the same issues with all essentially multilingual entries: per-language pronunciation and per-language grammar (mostly inflections).
Transliteration also raises its ugly head and it may be useful to know what the standard transliteration is - even if we don't like supporting tattoo creation.
As for flooding the Roman script pages, my feeling is that the scheme of all languages on one page is only going to fail more and more as we extend our coverage of ordinary words. --RichardW57 (talk) 16:02, 24 August 2020 (UTC)
We could easily multiply up Rommel to be German, French and Flemish as well as US English. --RichardW57 (talk) 16:35, 24 August 2020 (UTC)
For an extreme example that already exists, take a look at CAT:Cebuano surnames, including such well-known Cebuano surnames as Reichelt, Sommerauer, Jonsson, Nolan, Evans, Perkins, and Jansen van Vuuren. —Mahāgaja · talk 19:09, 24 August 2020 (UTC)
Well, footballers' names do tend to get mentioned a lot, especially if they play for the national team, as Patrick Reichelt does. I could rattle off the names of half-a-dozen Thai pop stars with British surnames - but they're not an issue as we have rules against Thai names and as Thai words they should probably be in the Thai script. I must admit Nolan looks more like a forename than a surname. However, Cebuano on its own is not a flood - it's only one entry per page. For how a flood can develop, see Habsburg, and note the Hungarian inflections. These all have some way to go before they get as bad as Anna or Roma. Fortunately, capitalisation will mostly keep proper names to their own pages. Pity the word meter, with no protection from the SI system. --RichardW57 (talk) 20:29, 24 August 2020 (UTC)

There were five cases, all of them ice hockey players who have played in the NHL: Selänne/Selanne, Lumme, Tikkanen, Koivu and Kurri. I have done the following:

  • kept Tikkanen and Koivu as English (and of course Finnish) entries because according to the US Census a few hundred persons with each surname live in the US
    But do they live in English-speaking parts?
  • kept the English and French sections of Selanne, because it is used as alternative spelling of Selänne in those languages
  • deleted the English and French sections of Lumme, Kurri and Selänne because a) they don't seem to be used as surnames in either language and b) as such the entries provide little if any value
    What is the language of the PDF: http://spectrumgrp.com/wp-content/uploads/2017/04/Lumme-Dale-Captain.pdf ? It looks like English to me.
  • deleted the French sections of Tikkanen and Koivu for same reasons
  • kept transliterations and any alternative spellings as Descendants in the Finnish or English section

Hope you find my solution acceptable. --Hekaheka (talk) 11:59, 25 August 2020 (UTC)

I find it ultra vires, and it totally misses the point that for a name to be a surname in a language, it is not necessary for it ever to have been borne by someone who speaks that language. Putin is a surname in English, even if it has only ever been borne by people who live in Russia. --RichardW57 (talk) 13:58, 25 August 2020 (UTC)
There are no surnames in English. Names aren’t “used” the same way other words are used. I have explained this in WT:Beer parlour/2019/March § Attestations of native toponyms mentioned in Latin texts and Wiktionary:Votes/pl-2019-11/CFI policy for foreign given names and surnames and Wiktionary:Beer parlour/2019/October § Place and given names in other languages. We need to present names of people and settlements fundamentally differently, in dedicated translingual name sections, if anons continue to assign names to languages unbridled by any criteria other than the contourless “it is used”, and if we want to avoid countering this with nothing but experienced dictionary editors’ intuition of what goes too far. Fay Freak (talk) 15:00, 25 August 2020 (UTC)
I agree that names and toponyms need to be presented differently. Ultimateria (talk) 15:13, 25 August 2020 (UTC)
What is the problem with how things are developing at present? Is it, for example, that German nouns risk being buried in a flood of copies of a proper noun? Is it that a page can't support, say, 50 inflection tables? This seems different from the case of letters, where I suspect hordes of alphabet lists are part of the problem, but one page can probably handle the letters of an alphabet. --RichardW57 (talk) 20:36, 25 August 2020 (UTC)
Perhaps we should consider the following cases:
  • An inflected name used in Latvian. Perhaps the Afrikaans name Smuts would be a suitably challenging example, as many languages would use the same form, whereas most names are changed by being borrowed into Latvian.
  • A foreign surname used in Russian.
  • A foreign name as used in Chinese.
  • A foreign name in Hungarian.
While I like the idea of making these names that are naturally the same in dozens of languages formally 'translingual', how do we record the language specific aspects? They are pronunciation and inflection once they have stabilised, and sometimes other details, such as gender and number. In some languages, they may be instantaneously stable. Can we document inflection via a set of rules for each language? English inflection is fairly simple; other languages' is not. How do we handle pronunciation? One partial solution may be to exile localised aspects of a translingual name to an appendix for that name; that would declutter the table of contents for the original page. For the eternal city, would Rome and Roma both be translingual? --RichardW57 (talk) 20:36, 25 August 2020 (UTC)

There is/are a wide variety of patterns

Discussion moved to Wiktionary:Tea room/2020/August § There is/are a wide variety of patterns.

Pite Sami header transcriptions

I originally created a page-specific discussion on this topic, but after seeing more cases of this kind I have decided to open one here. I don't understand the point of such a difficult diacritical representation of the pronunciation. Isn't that why we have the header ===Pronunciation===? Personally, I think IPA is a far more effective way to show the pronunciation than changing the header from "båssjo" to "bå̄sˈsjo" (or something in that style). I have hidden these transcriptions (two so far) for the time being, but the Proto-Samic reconstruction pages seem to favour these spellings as well, and that made me doubt my approach. Thadh (talk) 13:46, 26 August 2020 (UTC)

Important: maintenance operation on September 1st

Trizek (WMF) (talk) 13:49, 26 August 2020 (UTC)

derived from removal of morphemes

Is it appropriate to change {{descendant}} to consider derived terms not just from the addition of morphemes but from the removal of morphemes too? Otherwise I'm not sure how to classify e.g. Armenian words derived from Russian words ending in -ный where that syllable is dropped in Armenian. It's a big enough difference that I don't think they're direct borrowings. Ultimateria (talk) 18:23, 26 August 2020 (UTC)

Category:Mandarin pinyin entries without Hanzi

This seems to be obsolete now. From what I can tell, all the entries in this category have hanzi listed. ---> Tooironic (talk) 23:25, 26 August 2020 (UTC)

@Tooironic: The category is added automatically if at least one hanzi spelling is red-linked, so it's valid and not obsolete. Do you have an example in mind? The other thing is that those spellings need to be valid.
TBH, I think working on multisyllabic pinyin entries, especially multiword entries like [[biànlì shāngdiàn]], is a waste of time. Terms can be found without them. I suggested suppressing pinyin links on multiword entries such as 便利商店 (biànlì shāngdiàn). --Anatoli T. (обсудить/вклад) 00:44, 27 August 2020 (UTC)
Ah, I see. Multisyllabic pinyin entries (yìyì, shìshí, etc.) should definitely be kept as they can be used to cross-reference our entries with those of other dictionaries. They are also helpful for users generally. As for "multiword" pinyin entries, do you mean pinyin entries with spaces in them? ---> Tooironic (talk) 01:11, 27 August 2020 (UTC)
@Tooironic: Yes, I mean pinyin entries with spaces in them, the simplest way to identify "multiword" entries. They don't really help to disambiguate anything, just filling red links. I think User:Justinrleung supports this but I don't remember where I asked him about it. --Anatoli T. (обсудить/вклад) 06:35, 29 August 2020 (UTC)
That makes sense. We would need to put it to a vote though wouldn't we? Plus we'd need a bot to deal with the mass deletion of the entries. ---> Tooironic (talk) 06:41, 29 August 2020 (UTC)
@Atitarev, Tooironic: I'm indifferent on this. — justin(r)leung (t...) | c=› } 06:46, 29 August 2020 (UTC)
@Justinrleung, Tooironic: @Justinrleung: It's understandable, but you're not editing in pinyin either. Maybe it will become important later, as was the case with pinyin capitalisations? I am personally annoyed when translations hyperlink pinyin romanisations, as if it's some kind of alternative script, like this: 便利商店 (biànlì shāngdiàn).
@Tooironic: Thanks. A minivote would do, IMO (pro/contra/indifferent). Technical solutions may be requested later, when there is a general agreement. Finding pinyin entries with spaces must be a simple task. --Anatoli T. (обсудить/вклад) 06:58, 29 August 2020 (UTC)
@Atitarev: I also do not like hyperlinking pinyin romanizations anywhere other than in {{zh-pron}}, but that's a separate issue. — justin(r)leung (t...) | c=› } 07:03, 29 August 2020 (UTC)
@Justinrleung: Would hyperlinking like this: [[biànlì]] [[shāngdiàn]] (separate words) be more appropriate in {{zh-pron}} in your opinion? --Anatoli T. (обсудить/вклад) 07:06, 29 August 2020 (UTC)
@Atitarev: I think it'd be safer to suppress links altogether for separate words, because there may be instances where the parsing is not right (e.g. a suffix attached to a multiword phrase, like 美術史學家). — justin(r)leung (t...) | c=› } 07:21, 29 August 2020 (UTC)
@Justinrleung: I agree, otherwise it may require custom hyperlinking, like [[měishù shǐxué]][[jiā]] or similar. --Anatoli T. (обсудить/вклад) 07:28, 29 August 2020 (UTC)
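If "multiword" is taken to mean "contains a space", as proposed in this thread, identifying the affected pinyin entries is straightforward. A minimal Python sketch, assuming the list of pinyin page titles has already been retrieved (for instance through the MediaWiki API's list=categorymembers query); the function names are hypothetical. As noted above, spaces only mark candidate entries; they are not a reliable guide to how the Hanzi spelling should be split for linking.

```python
def is_multiword_pinyin(title: str) -> bool:
    """A pinyin page title counts as "multiword" if it contains a space."""
    return " " in title.strip()


def multiword_titles(titles: list[str]) -> list[str]:
    """Keep only the titles that the proposal would affect."""
    return [t for t in titles if is_multiword_pinyin(t)]


if __name__ == "__main__":
    sample = ["yìyì", "shìshí", "biànlì shāngdiàn"]
    print(multiword_titles(sample))  # ['biànlì shāngdiàn']
```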

Translations of numbers in short and long scales

Currently, translations of the numbers that have short and long scale definitions (billion and higher) are on two entries—the entries for the number in the short scale and the long scale. To avoid duplication, the translations should be in one entry—either the entry for the number in the short scale or the long scale. J3133 (talk) 09:03, 27 August 2020 (UTC)

The words for the short scale are always used in the long scale too, so they will have to remain entries; what you are proposing, then, is prioritising the short scale over the long scale. Thadh (talk) 09:34, 27 August 2020 (UTC)
@Thadh: No, I am proposing not duplicating the same translations in two entries. J3133 (talk) 09:52, 27 August 2020 (UTC)
I'm sorry, I thought you meant keep the entries. My bad. I don't know if the long/short system is English-specific though. Thadh (talk) 10:02, 27 August 2020 (UTC)
Most languages don’t have similar ambiguities. If English billion is used in the sense of 10^9, translating it into French as billion, which can only mean 10^12, is wrong. This is not very different from translating English spring, in the sense of the season following winter, as ressort, which is a good reason for having separate translation tables for different senses of the term.  --Lambiam 14:45, 27 August 2020 (UTC)
@Lambiam: You do not understand; I am not opposing “having separate translation tables for different senses of the term”. I am opposing having to duplicate all of the same translations (for the same sense) on two entries. J3133 (talk) 14:58, 27 August 2020 (UTC)
Currently the translation table for milliard just says “10^9: see billion”. So do you want, likewise, billion to say “a million million; 1,000,000,000,000: see trillion”?  --Lambiam 15:51, 27 August 2020 (UTC)
@Lambiam: It already says that and is not my point. I suggest you reread my previous messages. J3133 (talk) 15:55, 27 August 2020 (UTC)
So could you describe in a positive way, preferably illustrated with a concrete example, what the change is you would like to see, instead of telling us what you don't want?  --Lambiam 16:00, 27 August 2020 (UTC)
@Lambiam: Could you describe what specifically it is that you do not understand? I do not understand what points you are trying to make, and they do not seem to be phrased in a positive way. J3133 (talk) 16:07, 27 August 2020 (UTC)
@Lambiam: J3133 refuses to explain, but the problem is that, for example, there are translations for the sense 10^12 at both billion (long scale) and trillion (short scale). The proposal is to have just one translation table for 10^12, at either the short-scale or the long-scale entry. — Eru·tuon 19:54, 27 August 2020 (UTC)
So should one of the two just give up its translations altogether (but J3133 denies that this is what they want), or should it refer the user to the table at the other lemma, like in the form “a million million; 1,000,000,000,000: see trillion” (but J3133 also denies that that is what they want)? I only see statements about what they do not want, leaving no room for a solution.  --Lambiam 20:07, 27 August 2020 (UTC)
The second one is precisely what was proposed (see short/long scale) Thadh (talk) 20:17, 27 August 2020 (UTC)
What I was actually speaking of is, for example, Russian: it has both the terms биллион (billion) and миллиард (milliard). While indeed, both of these could be put together into one translation hub, another approach would be giving these at billion and milliard respectively. Surprisingly, though, this doesn't yet happen, so indeed I agree with your proposal. Thadh (talk) 16:36, 27 August 2020 (UTC)
Depending on the sense in which the term billion is used, биллион (billion) may be an incorrect translation, as I argued above for translation to French.  --Lambiam 20:07, 27 August 2020 (UTC)

More/most unhappy/unlucky etc.[edit]

According to the Collins Cobuild English Usage, page 561

Three-syllable adjectives formed by adding 'un-' to the beginning of other adjectives, for example unhappy and unlucky, have comparatives and superlatives formed by adding '-er' and '-est' as well as ones formed by using more and most.

Should the analytic forms be automatically added to the entries for adjectives of that type? --Backinstadiums (talk) 10:07, 28 August 2020 (UTC)

Restrict the use of UCLA Phonetics Lab Archive transcriptions[edit]

The UCLA Phonetics Lab Archive hosts word lists with recordings and phonetic transcriptions for a variety of languages. [3] The recordings of native speakers are likely always of value for those interested, but transcriptions have to be accurate before they are useful.

There have in the past been instances where the UCLA Phonetics Lab Archive was used as a reference for adding transcriptions.

It is clearly the case that some of the transcriptions in the archive are inaccurate. The transcriptions for Bura do not indicate tone, although it is a tonal language. In the Swahili transcriptions, stress is shown with a high-tone diacritic, even though Swahili is not a tonal language. The Dutch transcriptions contain a high number of errors and notational inconsistencies, such as [tɛ] and [dɛ] for <te> and <de>, transcriptions with aspirated plosives and inconsistent use of [v] versus [ʋ].

So for some of the languages there are serious errors in the transcriptions. Even if the transcriptions for other languages are in a better state, the inconsistent quality makes the archive's transcriptions unreliable. So I think that at least some restriction on their use is in order.

I'd suggest three alternative options for restricting those transcriptions:

  1. Universally ban the use of UCLA Phonetics Lab Archive transcriptions.
  2. Generally ban the use of UCLA Phonetics Lab Archive transcriptions, but editor communities may allow their use on a per-language basis.
  3. Allow editor communities to ban the use of UCLA Phonetics Lab Archive transcriptions on a per-language basis.

Naturally, none of these options would prevent editors from transcribing the recorded words themselves.

←₰-→ Lingo Bingo Dingo (talk) 17:49, 28 August 2020 (UTC)

I support #2. As part of this, I also think that it should be removed as a reference from entries. —Μετάknowledgediscuss/deeds 19:49, 28 August 2020 (UTC)

Wiktionary:About Ainu[edit]

This page was deleted a couple of years ago due to lack of content, then created new in June by @Mkpoli. Nothing links to it except Category:Ainu language (automatic), User talk:BenjaminBarrett12 (referring to the deleted version) and Wiktionary talk:About Ainu (because of the archived deletion discussion). It links to nothing but Category:Ainu language and the templates it uses. Other developments:

  • Module:ain-translit was created in 2016 by @Suzukaze-c and edited by @Octahedron80 in 2017
  • Wiktionary:Ainu transliteration was created a few days ago by @Alves9. Nothing links to it, and it only links to Module:ain-translit (which Alves9 edited on the same day), and the entries for the kana it contains.
  • 2600:1:B16F:FBD6:614A:6793:6071:321D has been systematically (and ineptly) converting all the entries for the small kana used by Ainu from Translingual to Ainu. They corrected their mistakes in the one entry they were reverted on, but the rest still have a mix of Translingual and Ainu templates and categories.
  • @Eirikr and Alves9 have been debating (and talking past each other) about approaches to the language in RFV-N and the Tea Room
  • @Siljami posted a quite legitimate question at Category talk:Ainu adjectives about whether Ainu adjectives are really stative verbs. This question was discussed in 2013 and everyone seemed to show support for the stative-verb approach at the time.
  • Siljami mentioned that attempts to change Ainu adjectives to stative verbs have been reverted in the past [in April 2019 by Eirikr], while Wiktionary:About Ainu lists seven parts of speech, which don't include "adjective".

The impression I get is of several people working independently on the language and setting up parts of the infrastructure, but no community. If memory serves, User:BenjaminBarrett12 (who set up the original useless About page) was the main person working with the language in my early days as an admin, but he hasn't been around since 2015. Category:User ain contains only Category:User ain-1, which is empty. Both were created automatically, and nothing links to them except the automatic link from Category:Ainu language.

Is there any way we can get everybody on the same page and have Wiktionary:About Ainu functioning as a reflection of community consensus, rather than as something irrelevant to most editors in the language? Chuck Entz (talk) 00:10, 29 August 2020 (UTC)

I'm not sure if there was ever any debate surrounding this, but Kana is an awful system for representing Ainu. If we can all agree to mainly use Latin from now on many divides that have appeared because of it will vanish. If we can get that right, I'm positive the About page will look beautiful in no time (provided there aren't any stubborn reverters!) And forgive me if I am wrong, but aren't Japanese "adjectives" also, in fact, verbs? It's a convenience term, I believe. Alves9 (talk) 01:32, 29 August 2020 (UTC)
What script to use is a side issue. How is it an obstacle to arriving at a larger consensus? As for Japanese, that's a matter for the Japanese community, who have their own conventions about part of speech. Let's keep the focus on Ainu. Chuck Entz (talk) 02:00, 29 August 2020 (UTC)
A user has been making quite an issue over whether half-width ィ should be full イ in one place, whether ㇻ should be ラ in another, etc., issues that would not arise if Latin were the predominant system. Also, I am sorry to say, a great many Ainu entries are just plain incorrect because someone happened to enter a big kana instead of a small one, and the two represent very different sounds (in fact, that's probably true of most of the ones taken from Mr. Batchelor's material). I thought you would know, as it seems you have been following our discussion closely. Alves9 (talk) 02:11, 29 August 2020 (UTC)
That's one item on my list above. What about all the others? Chuck Entz (talk) 02:32, 29 August 2020 (UTC)
These are my considerations:
  • Automatic transliteration should probably be made functional as soon as possible if Kana entries are expected to be added on a large scale at any point (either that or abandon Kana completely).
  • All Ainu language materials that I have make the approximation of stative verbs to adjectives, the same way it is made in Japanese: Both Kindaichi and Chiri make a distinction between verbs and adjectives, although they both note that such a distinction is not required from a functional point of view. Kindaichi says that [Ainu] adjectives (...) are similar to the category of adjectives in Japanese. [Likewise], Chiri says [that] "The difference between verbs and adjectives in Ainu is very slight, semantically the former express acts, while the latter express properties, and functionally the latter have no imperative form -- that is all. There is no morphological difference whatsoever." (Refsing, 1986)
Batchelor does not even mention anything close to the concept of a stative verb (he does, however, curiously point out that some adjectives seem to be just verbs with some variety of prefix). And so it's clear the difference is very insignificant. In general, I think taking a more practical and less pragmatical approach to Ainu would be best, as it is still more or less a living language, and with a great abundance of material at that.
  • Ainu may not be standardised, but, based on sheer quantity of material and the fact that Ainu Times is monolithically Saru, it's clear the Saru (Biratori) dialect is the predominant one. It should take priority in most situations. Alves9 (talk) 12:08, 29 August 2020 (UTC)
Actually, I made automatic transliteration from Kana to Latin fully functional in Japanese Wiktionary (ja:テンプレート:ain-kana-conv) and migrated a version from there to here (it may need to be updated). I also have a local version of the reverse conversion, though I don't remember whether I uploaded it, and there are some other templates and modules I made. But I don't think there is enough of a community or enough rules here, especially since we currently use a mix of Kana and Latin, which is not pleasant to work with. So I decided to focus on the Ainu dictionary in Japanese Wiktionary. I (and maybe some others) clarified the rules, built the automated and manual template infrastructure, and added some words.
From what I see, almost all newer materials from 2000 to now, except some material intended for Basic Ainu for Japanese People, use Latin as the main script, which has a lot of advantages. In particular, I have never seen an academic paper or a dictionary without Latin transliterations. Interestingly, before I started contributing at Japanese Wiktionary, almost no lemmas there were in Kana; all of them were in Latin, some with manual transliteration. Here in English Wiktionary it is less clear: most existing lemmas use Kana, some with Latin transcription and some without. Anyway, I think we need to be consistent, whether we settle on Latin or Kana as the lemma. -- Mkpoli (talk) 06:20, 30 August 2020 (UTC)

Latin audio files[edit]

A lot of the audio files for classical Latin aren't accurate. They don't have nasalization or different vowel qualities for short vowels. The long vowels are also like three times as long with unnatural intonation. Dngweh2s (talk) 15:35, 29 August 2020 (UTC)

Classical Latin is a dead language and as such, it should not provide audio files. Few people are able to correctly pronounce Latin due to the Classical Latin phonology. Thus all inaccurate audio files for Classical Latin must be removed, as well as for any language. Ecclesiastical Latin audio files are authorized as it is still alive. Can you provide some examples? Malku H₂n̥rés (talk) 15:50, 29 August 2020 (UTC)
You're talking about the ones by @EncycloPetey, right? They do sound really unnatural, and have mistakes even within the context of how Americans normally pronounce Latin. Unfortunately, Wikimedia Commons is unlikely to agree to removing them unless EncycloPetey himself requests it. —Μετάknowledgediscuss/deeds 16:43, 29 August 2020 (UTC)
Perhaps a solution is to have a discussion with the Wikimedia Commoners about this, to see if they can be more flexible with the deletion of such incorrect files. --Java Beauty (talk) 16:45, 29 August 2020 (UTC)
I removed all of them from 'a' to 'agnus' Dngweh2s (talk) 17:04, 29 August 2020 (UTC)
@Dngweh2s: You're wasting your time. A bot will just re-add them automatically. The only enduring solution has to be done at Commons. —Μετάknowledgediscuss/deeds 17:33, 29 August 2020 (UTC)
Maybe @Dngweh2s can just comment them out, because bots aren’t programmed to parse comment syntax and hence assume the contents of comments as already present? At least the filters often don’t ignore comments. Fay Freak (talk) 19:24, 29 August 2020 (UTC)
If one knows how it is pronounced correctly, there is also someone who can pronounce them correctly (which “few people are able to correctly pronounce Latin” also says). So we should have Classical Latin audio files. Ecclesiastical Latin is unnatural and wrong, deriving from a time of exitiable ignorance and superstition where all went by guess and by gosh. New Latin words should also have Classical Latin pronunciations and audio files, since pronouncing all Latin like the Romans would have is the current standard – since nobody understands Latin how Anglo-Saxons pronounce it according to their wont, and few go to church, so “Ecclesiastical Latin” is unheard. And there hasn’t been a monolithic “Ecclesiastical Latin” as presented by Wiki editors either; it is mostly an excuse for accented, native-language based, ignorant Latin. It is left as but the façade for underperformers; the inner circles of Latin students have internalized historical phonology thus far as to skip the Dark Ages – whether the historical baggage or the pseudo-enlightened surroundings – and emit the clean tones of the Roman Republic. Fay Freak (talk) 19:24, 29 August 2020 (UTC)
Why would you assume that Classical Latin is the current standard? Current standard as decided by whom? I am sure Vatican City has its own standard (and being the only country with Latin as a main language, perhaps that isn't so non-standard?), while gymnasiums all over the place pronounce Latin how the pupils would be able to do it easiest. For example, although not Latin, but I know for a fact Latin doesn't differ in this, the Ancient Greek textbooks we received in school contained a table, one column signifying the "school pronunciation" (θ /t/, φ /f/) and another giving the "historical pronunciation" (θ /tʰ/, φ /pʰ/). Thadh (talk) 19:44, 29 August 2020 (UTC)
Why would you assume anything happening in schools to be standard? You mention yourself the reason why schools should be disregarded and disdained: “Gymnasiums all over the place pronounce Latin how the pupils would be able to do it easiest.” Schools are an abomination where humans are degraded and nivellated to keep with the vilest standards, they shouldn’t exist. I am thinking more of university usage, and better living-Latin communities, places people actually choose to attend to use Latin.
But even in gymnasiums I have yet to hear Ecclesiastical Latin, or pronouncing ⟨c⟩ and ⟨g⟩ as anything other than [k] and [ɡ], namely before ⟨e⟩ and ⟨i⟩, or the ending -tiō not as [t̪i.oː], etc. Although I noticed boomer buffers having only been exposed to the vulgar pronunciations – it has already been eradicated in Latin teacher trainings at universities and in gymnasiums (I have witnessed both at multiple places). Apparently this is the standard as decided by the Ministry of Education, as they have stipulated in the school curricula for Latin (page 24 for the lower grades and page 40 for the higher grades) that the candidates “are able to recite the meaning-bearing words and word blocks with correct pronunciation”. Don’t they demand correct pronunciation in this sense in Britain? I often notice they fall behind the standards of Europe in many respects. What does one even do in countries where there isn’t “traditional” Latin schooling, e.g. Russia or Japan? I guess they have no choice but to go with Classical Latin pronunciation, or to consciously violate the rules.
Science is the current standard, hence the historically correct pronunciation. School pronunciation is either that or substandard. Haven’t ever seen a school textbook using IPA by the way. What happens if a pupil decides to go with the historical pronunciation though? Why would it be less standard? It can’t be, hence his pronunciation represents the mostly violated standard. The majority is wrong. Οἱ πλεῖστοι κακοί. Fay Freak (talk) 22:29, 29 August 2020 (UTC)
I share your enthusiasm for the historical pronunciation, but I am afraid we don't yet live in a society where science dictates truth. Standardisation isn't bound to science, it's bound to usage, which is statistically defined. As for your thoughts about how gymnasiums operate, let me tell you: we pronounced <c> as /s/ before <e> and <i>, we pronounced <tio> more like /tsio/ than /tio/, and we had far too unclassical vowels; and if in my Greek class I had tried to pronounce the aspirated consonants instead of fricatives, no one, including my teachers, would have understood me. Also, your notion that IPA isn't used in schools is just outright wrong: we had special lessons in English teaching us IPA.
But let me return to the main topic: what is standard Latin? Most Latin speakers undoubtedly speak an Ecclesiastical variety, and as with any language, I would think the vast majority constitutes the rule. After all, we don't include the Latin pronunciation in Italian entries just because it is "historically more correct", do we? I mean, why would we? And we don't include the 18th-century pronunciation of modern Dutch, although it is undoubtedly the same language as now (we even write that in etymologies). So why would the case with Latin be different? Thadh (talk) 23:14, 29 August 2020 (UTC)
Strange strawmen: “we don't include the Latin pronunciation in Italian entries just because it is "historically more correct"” – because it is Italian and not Latin. I have not quite said “IPA isn't used in schools”. But that is true for whole countries; in NRW according to my experience nobody learns that in school, unless some teacher is particularly dedicated, nor have I seen it in any school textbook, and I have attended a lot of schools and school textbooks over the years. It is hard to find persons who know IPA. What you tell about how you pronounced, I tell you the opposite how we pronounced. None of that palatalization and if there had been one it would have been cacophonic and difficult to understand; and the German vowels happen to be like those in Classical Latin (“different vowel qualities for short vowels”).
Nor are standards bound to but usage. This is just a convenient view for Anglo-Saxons who lack regulatory authorities on languages. Sometimes usage is 90% wrong, why not. The same way a certain person on Wiktionary fails to understand the difference between misspellings and typos, which is according to what people have in their heads. When doing that “school pronunciation” people don’t even believe it is strictly correct, so it is on an even lower level than a misspelling or mispronunciation is. Statistically, most usage in the country you mention is deliberately, consciously incorrect, and should therefore be disregarded, the same way we have warned certain people against citing certain digitized academic chits on native American languages not intended for publication, the same way customary law does not arise if there hasn’t been, apart from habitual application (longa consuetudo), legal conviction that it is right (opinio iuris). Fay Freak (talk) 16:18, 30 August 2020 (UTC)
But how do you define the difference between Italian and Latin? All difference between historically related languages (which Italian and Latin undoubtedly are) is a matter of definition; If the need arose, Italian could be classified as a dialect of Latin. But historically, that need didn't arise, and indeed the opposite happened. So that is what I am trying to tell you, standardisation and relative difference are subjective matters, and I don't think - and I hope you agree - a dictionary should implement more subjective factors than needed; thus we shouldn't be setting standards, that ought to be done by subjective institutions (e.g. countries). Thadh (talk) 19:06, 30 August 2020 (UTC)

Template:circa and Template:circa2[edit]

I don't think we need two templates to write out the word "circa" or its abbreviation. And why is there no option to toggle the comma off in {{circa}}? It could be trivially done with an if-statement. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 17:06, 31 August 2020 (UTC)

It seems strange, but I think the reason for two separate templates was that {{circa}} is intended only for use for quotations on entry pages, which is why the year is in bold and it is followed by a comma, whereas {{circa2}} is for use in running text. — SGconlaw (talk) 17:26, 31 August 2020 (UTC)

Request new etymology-only languages codes[edit]

I have difficulty etymologizing Indonesian words from specific isolects that have no language code. These are the isolects:

  1. Basemah/Besemah isolect of Central Malay, possible code pse-bsm
  2. Classical Malay, possible code ms-cla
  3. Old Malay, possible code ms-old
  4. Betawi Ora, possible code bew-ora
  5. Betawi Kota, possible code bew-kot
  6. Betawi Udik, possible code bew-udi

Rex Aurorum (talk) 16:41, 2 September 2020 (UTC)

@Rex Aurorum I've added ms-old Old Malay and ms-cla Classical Malay, as I can see they are well documented as stages-that-exist. I also added the Betawis as I see they are dialects, and if you will find codes useful, I see no harm in granting them (if anyone does have objections, please pipe up). Besemah seems more scantly discussed, but I added it, too. If you search this site for "Old Malay" you will find some etymologies already mentioning it which you could adapt to use this code. :) - -sche (discuss) 03:50, 9 September 2020 (UTC)
@-sche Thanks for granting them. As for Old Malay, I'll change it ASAP. Rex Aurorum (talk) 05:49, 9 September 2020 (UTC)
@-sche I just re-read journals about Betawi lects. Here is the review:
  1. Betawi Kota (syn: Betawi Tengah, Betawi Tengahan), change: add synonyms
  2. Betawi Udik (syn: Betawi Pinggir, Betawi Pinggiran), change: add synonyms
    1. Betawi Ora is a variant of Betawi Udik; merge it into Betawi Udik? Or change it to the new possible code bew-udi-ora? Rex Aurorum (talk) 09:24, 16 September 2020 (UTC)

September 2020

Is it non-controversial to run bot-tasks to apply the conventions at WT:NORM?[edit]

As of the last XML dump, there are 88,299 entries that violate WT:NORM in ways that Special:AbuseFilter/103 detects. (Of these, 74.6% violate the "One blank line before all headings, including between two headings, except for before the first language heading" rule, and 44.7% violate rules besides that one. There's overlap, obviously.)

There are also probably many entries that violate WT:NORM in ways that Special:AbuseFilter/103 does not detect; I haven't checked.

Is it non-controversial to run bot-tasks that address violations of WT:NORM? Or do we need individual discussions for different violations and how to bot-address them?

Are there any best practices I should follow for such tasks, or pitfalls I should know about?

RuakhTALK
06:14, 1 September 2020 (UTC)
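The "one blank line before all headings" rule quoted above is simple enough to apply mechanically. A minimal sketch in Python, assuming a heading is any line wrapped in balanced = signs (such as ==English== or ===Noun===) and treating the first heading on the page as the first language heading; the function name is hypothetical, and this is not the logic of Special:AbuseFilter/103 or of any existing bot:

```python
import re

# Assumption: a heading is a line like "==English==" or "===Noun===",
# i.e. 2 to 6 "=" signs, some text, and the same run of "=" signs again.
HEADING = re.compile(r"^(={2,6})[^=].*\1\s*$")


def normalize_heading_spacing(wikitext: str) -> str:
    """Put exactly one blank line before every heading except the first one."""
    out = []  # normalized lines collected so far
    first_heading_done = False
    for line in wikitext.split("\n"):
        if HEADING.match(line):
            # Drop any blank lines already collected just before this heading.
            while out and out[-1].strip() == "":
                out.pop()
            # Assumption: only the first heading on the page is exempt.
            if first_heading_done:
                out.append("")
            first_heading_done = True
        out.append(line)
    return "\n".join(out)
```

For example, two blank lines before ===Anagrams=== collapse to one, and a missing blank line between ==English== and ===Noun=== is added. A real bot would also need to leave headings inside comments or nowiki blocks alone, which this sketch does not attempt.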

To answer your first question: non-controversial, go for it. —Justin (koavf)TCM 06:45, 1 September 2020 (UTC)
Agreed! We could really use more people like you volunteering for boring bot jobs! (There are some funky Chinese and Japanese entries using {{zh-see}} and {{ja-see}}, but I don't think they technically break any NORMs; they're just worth being aware of.) —Μετάknowledgediscuss/deeds 06:49, 1 September 2020 (UTC)
OK, sounds good; thanks! —RuakhTALK 21:50, 1 September 2020 (UTC)
@Ruakh: User_talk:Erutuon#ToilBot_"Normalizing"_Vandalism (@Erutuon) —Suzukaze-c (talk) 03:24, 3 September 2020 (UTC)
Thanks for the heads-up! It sounds like Erutuon's bot was specifically targeting recently-edited pages, hence that problem, and that he fixed it by changing it to instead target pages that received edits between one and thirty days ago. (Please correct me if I'm wrong.) If so, then my bot already wouldn't cause that problem, because I use the twice-monthly XML dumps to find the pages to edit, so there's a delay of much more than a day between when the NORM-violating entry was captured in the XML dump and when the bot retrieves and edits it. (Of course, it could still happen by random chance that it edits a page that was recently vandalized, but then, the same is true of Erutuon's updated bot as I understand it. And for that matter, the same is true of any other bot; my {{t}}/{{t+}} updater could similarly edit a recently-vandalized page. So I'm not too worried about this. But it shouldn't be too much work to change the bot to skip pages with recent last-edited timestamps, so, sure.) —RuakhTALK 06:15, 3 September 2020 (UTC)
For what it's worth, the current version of my bot script is here. It won't edit pages where the latest revision is more recent than one day ago. This feature is provided by the Recent Changes API (see rctoponly). That won't be useful for your bot, though, since it's pulling from the dump. — Eru·tuon 06:44, 3 September 2020 (UTC)
Thanks for the link. My bot takes a different approach, obviously; it just retrieves the page, and if it sees that it was edited less than 24 hours ago, it skips it without editing. —RuakhTALK 08:40, 7 September 2020 (UTC)
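The skip rule described above (leave a page alone if its latest revision is less than 24 hours old) comes down to a timestamp comparison. A minimal sketch in Python, assuming the caller already has the latest revision's timestamp as a timezone-aware UTC datetime; `should_skip` and `SKIP_WINDOW` are hypothetical names, not taken from either bot:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical constant: how recent an edit must be for the bot to skip the page.
SKIP_WINDOW = timedelta(hours=24)


def should_skip(last_edited: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the page's latest revision is newer than SKIP_WINDOW.

    Both datetimes are assumed to be timezone-aware (UTC).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_edited < SKIP_WINDOW
```

Passing `now` explicitly keeps the check deterministic for testing; in a live run it defaults to the current UTC time.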
  • The "WT-NORM" alert is a perpetual annoyance, all the more so because the message does not actually say what the problem is, so it is impossible for anyone with normal patience to fix it, when nothing visibly appears wrong. Any automated process to eliminate this useless irritation would be welcome. Mihia (talk) 22:03, 11 September 2020 (UTC)
Sorry ... I think I do now remember someone saying that "WT-NORM" was useful to identify crap random edits. If so then I stand corrected, but for me personally it is just a stupid irritation because it does not actually tell me what I have done wrong. Mihia (talk) 22:26, 11 September 2020 (UTC)
"WT-NORM" should be broken down to different tags detailing the actual problem. 恨国党非蠢即坏 (talk) 06:21, 15 September 2020 (UTC)
I agree, though according to a previous explanation, I also seem to remember that many WT:NORM "problems" are totally anal from the user perspective, of the kind that could be silently auto-corrected if for some reason they have a system importance. Mihia (talk) 22:04, 18 September 2020 (UTC)
I'd say they're pretty much all anal. Personally, as the creator of the filter, I've been in favor of leaving the filter but removing the tag, but for some reason haven't done it yet. That way would be totally invisible to most users but users who know how to could find edits that matched the filter. But perhaps the filter should be gotten rid of and we should only be looking at the dump to identify WT:NORM violations. — Eru·tuon 23:39, 18 September 2020 (UTC)
I would definitely support that -- that is, make "WT:NORM" invisible to ordinary users but accessible to editors who care. Mihia (talk) 10:39, 19 September 2020 (UTC)

Format for thesaurus pages[edit]

On Thesaurus pages, lists of synonyms are currently wrapped in {{ws beginlist}} and {{ws endlist}}, with items given with {{ws}}. {{ws}} links to the WS page for the argument, if such a page exists. However, this was clearly designed with a monolingual thesaurus in mind. On Thesaurus:da:nonsense, you can see that it links to a Polish page. I think it would be better to have a single template {{ws list}} similar to {{col3}} that takes a language code, and then as many terms as needed -- of course, the current format for auto-linking only works if Thesaurus entries are entered under a native synonym like Thesaurus:da:fuld or Thesaurus:god (with or without the language code). Knowing the language might also allow us to do some other things, although I can't currently think of any.
Additionally, most Thesaurus pages are not currently in a subcat of Category:Thesaurus entries by language. Most of these are English, but far from all. I added a lang parameter to {{ws header}} some time ago that categorizes. Would someone get a bot to do this?
@Dan Polansky I assume you probably have opinions about this.__Gamren (talk) 23:37, 2 September 2020 (UTC)

Using User:AutoSkull for automated surname edits[edit]

Having had a decent handful of experience with Python coding at this point, I just started messing with pywikibot, with which I am building a potential Wiktionary bot that automates edits to surname entries, and also would automate their creation. I've already been using it on my main account (see some of my recent contributions) for slower semiautomated edits, and just today I had the idea to move the testing and operations of this code to my new AutoSkull account. There are definitely still some tweaks and problems I'm working out, but in the state it's currently in, it could deal with most surname entries pretty well...but obviously most isn't quite good enough.

The tasks it will be able to perform when it is finished are currently listed on the bot account's user page. Basically, though, it will pull from lists of verified surnames and search them on Wiktionary to see if they have English entries here yet. If there is no entry, the bot will just create it. If there is an entry, the bot will decide what to do from there.

It's worth noting that among its many surname-related tasks my bot will be editing currently existing surname pages to make them a bit more complete. It will be adding plural forms to the template {{en-proper noun}} according to the consensus on how surnames should be inflected in English, with a few exceptions (see the last bullet point on User:AutoSkull#English surnames). Entries for plural inflections of surnames will also be added in large numbers. It will also add relevant Wikipedia disambiguation page links for all our surname entries when such a page exists.

I won't share my code yet as it's not in a finished state, but when it is I will. I will also at that time share a large series of edits made perhaps in AutoSkull's userspace subpages, that emulate various different wild circumstances the bot may encounter when unsupervised, to prove it won't just be wreaking havoc here. But even so, I wanted to go ahead and let the community know about the fact that I'm coding and testing with this, as I suppose that's a predecessor to a bot status vote, which I'll start in the near future. I'm really hoping with this project I can help get Wiktionary's coverage of surnames to be pretty lengthy. Let me know of any suggestions or comments. PseudoSkull (talk) 03:16, 3 September 2020 (UTC)
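The create-or-inspect decision described above can be sketched as follows. This is a hypothetical illustration, not AutoSkull's code (which has not been published). It assumes the caller passes the page's wikitext, or None if the page does not exist, and that an existing English entry is signalled by a level-2 ==English== header:

```python
import re
from typing import Optional

# Assumption: an English entry is marked by a level-2 "==English==" header.
ENGLISH_HEADER = re.compile(r"^==\s*English\s*==\s*$", re.MULTILINE)


def decide_action(page_text: Optional[str]) -> str:
    """Return what a surname bot might do for one candidate surname."""
    if page_text is None:
        # No page at all: create it from scratch.
        return "create page"
    if not ENGLISH_HEADER.search(page_text):
        # Page exists for other languages only: add an English section.
        return "add English section"
    # An English entry exists: hand off to further checks.
    return "review existing entry"
```

Splitting "no page" from "page without an English section" matters because the second case means editing a page that other languages' editors also maintain.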

@PseudoSkull This sounds fine to me. If you need specific help, let me know ... I've written over 400 scripts by now to do all sorts of things on Wiktionary. These all use pywikibot and (usually) mwparserfromhell, which has proven to be a great combination. For example, one of my most productivity-enhancing scripts has turned out to be a script I wrote called find_regex.py, which outputs a text file consisting of subsets of pages (either the entire page or one language section) matching a given regex, based off of a category, references to a given page, a fixed list of pages, or a Wiktionary dump. I can then edit the text file, either by hand or using a purpose-written script, and push the resulting changes back to Wiktionary using another script push_find_regex_changes.py. This makes it possible to quickly do all sorts of manual and semi-automated changes. Benwing2 (talk) 06:47, 13 September 2020 (UTC)

Old Korean lemmas with direct attestation are in the reconstruction namespace[edit]

The two egregious examples are the genitive and the topic-marking , both of which are omnipresent in the surviving Old Korean corpus. In the case of 叱, for example, the interpretive gugyeol data makes it undeniable that 叱 (or abbreviated forms) acts as a genitive:

  • 天人供 is used to gloss a Chinese phrase in the Avatamsaka Sutra that means "provisions of the heavenly ones"
  • 國土 is used to gloss a Chinese phrase in the Humane King Sutra meaning "territory of the Buddha's country"

And so forth. These forms are thus attested, there being universal scholarly consensus about their semantic value, and do not belong in the reconstruction namespace per WT:RECONS. What is reconstructed about them is their phonetic value, but this can be marked with an asterisk while the terms themselves (in the hanzi-based orthography) are moved to the normal entry namespace.--Karaeng Matoaya (talk) 08:28, 4 September 2020 (UTC)

Support, agree with all points. —Suzukaze-c (talk) 03:24, 5 September 2020 (UTC)
@Quadmix77, who created these entries. —Μετάknowledgediscuss/deeds 05:22, 5 September 2020 (UTC)
Support per above. -- 11:26, 5 September 2020 (UTC)
In the absence of further input, I'm making mainspace entries for 叱 and other attested OK grammatical particles.--Karaeng Matoaya (talk) 00:47, 7 September 2020 (UTC)
@Karaeng Matoaya: Please mark the duplicate entries with {{d}} and an explanation (or just a link to this discussion) once you've made the entries and fixed all incoming links. —Μετάknowledgediscuss/deeds 02:04, 7 September 2020 (UTC)
@Metaknowledge: Done.--Karaeng Matoaya (talk) 13:04, 7 September 2020 (UTC)

Draft proposal for pre-c. 1910 Korean forms (Old, Middle, Early Modern)

Hi everyone,

After some talks with @Suzukaze-c, I've drafted a brief sketch of how to deal with pre-contemporary Korean forms at User:Karaeng Matoaya/Draft.

This will probably be moved to Wiktionary:About Korean/Historical forms if people don't hate it too much. The main features include:

  • The use of the new periodization for Old Korean, in which texts up to c. 1300 are considered examples of OK. This is the growing consensus in South Korean academia and has a number of advantages compared to the traditional periodization still used in many Western sources, which wasn't really evidence-based in the first place.
  • Only forms attested in actual Old Korean texts are considered valid entries, which won't affect anything except 波珍, which should be deleted as a proper noun-based reconstruction. Also added some preliminary standards for disputed OK entries.
  • The use of the three-way periodization of Korean given by ISO 639-3: OKO for Old Korean, OKM for Middle Korean, and KOR for Early Modern and Modern Korean. This means that Korean forms attested between 1600 and 1900 share the KO language code together with contemporary forms, and are modified with obsoleteness templates instead. (Previously the very few EMK entries that existed seemed to be grouped together with Middle Korean forms, but this is problematic given academic consensus that MK ends in c. 1600; if we want to separate EMK from Contemporary Korean, the best way to do that is to create a new language code specifically for EMK.) Some examples of new EMK entries are at 뉴#Etymology 3 and ᄯᅡᆼ.

Thoughts?--Karaeng Matoaya (talk) 13:22, 7 September 2020 (UTC)

Looking over your draft, I have a few questions / comments.
  • In the Chinese wordlists section, you state, "references to these wordlists are strongly recommended in the Phonology sections of Old Korean entries, and in the Etymology sections of Middle and Modern Korean entries." I'm not quite clear on what you mean by this. Presumably this recommendation is only for those terms that have alternative forms that appear in the Chinese word lists?
  • In the Proper noun reconstructions section, you state, "references to such reconstructions are strongly recommended in the Phonology sections of Old Korean entries, and in the Etymology sections of Middle and Modern Korean entries." Similar to above.
Albeit from something of an outsider's perspective -- my Korean ability is quite basic -- your proposal looks good to me.
Really appreciating the deeper dive you're giving for Korean entries. Thank you. ‑‑ Eiríkr Útlendi │Tala við mig 18:57, 9 September 2020 (UTC)
@Eirikr Thanks for the comments, and also for the encouragement—they mean a lot. I've fixed both to "strongly recommended in the Phonology sections of otherwise attested Old Korean entries, and in the Etymology sections of likely Middle and Modern Korean reflexes" and also added three examples of how Chinese or proper noun data can be integrated within attested entries: 有叱 (*Is-), 無叱 (*EPs-), and 거칠다 (geochilda).--Karaeng Matoaya (talk) 12:30, 10 September 2020 (UTC)
The changes look good to me. Thank you again for taking this on! ‑‑ Eiríkr Útlendi │Tala við mig 18:25, 14 September 2020 (UTC)

"Pronunciation spelling" label[edit]

Is everyone happy that the usage of the "pronunciation spelling" label has by implication been determined by the outcome of the recent "eye dialect" vote? That vote established that the "eye dialect" label is to be applied only to words such as sed for said or lissen for listen that represent standard pronunciations but imply that the speaker generally uses a nonstandard dialect. It has been said that "eye dialect" is a subset of "pronunciation spelling", on which basis such words could in theory be labelled both "eye dialect" and "pronunciation spelling", but I imagine that this would be viewed as unnecessary.

This leaves words such as borrowin' for borrowing and fink for think, that represent non-standard pronunciations, as well as simplified phonetic spellings such as lite, as eligible for the "pronunciation spelling" label. Is it uncontentious that all these should be labelled "pronunciation spelling"? Are there any other types of words that are "pronunciation spelling" candidates? Mihia (talk) 08:29, 10 September 2020 (UTC)

I wouldn't use the "pronunciation spelling" label for borrowin’ and fink; I'd simply call those nonstandard forms. Things like lite, tonite, and donut, on the other hand, are definitely pronunciation spellings that are not (I think) eye dialect (at least not usually). —Mahāgaja · talk 11:44, 10 September 2020 (UTC)
I second that. I suspect that some common misspellings arose as pronunciation spellings, or, as in the case of artic and nitch, even as mispronunciation spellings. I’d apply the term only, though, to intentional nonstandard spellings that do not imply the use of nonstandard speech but merely aim to convey how kool and with it the author is. —⁠This unsigned comment was added by Lambiam (talkcontribs) at 13:55, 10 September 2020‎ (UTC).
In the case of artic, I wouldn't say that it's a mispronunciation spelling; rather, I'd say that /ˈɑɹktɪk/ is a spelling pronunciation, since 300 or so years ago artic was the normal spelling and /ˈɑɹtɪk/ was the normal pronunciation. —Mahāgaja · talk 15:52, 10 September 2020 (UTC)

Invitation to participate in the conversation

  • Yay, rules! We'd better start crafting templates to issue various degrees of admonishment, warning, and scolding before escalating to interaction bans and topic bans. We could use help from a graphic artist to produce good icons. Vox Sciurorum (talk) 18:13, 11 September 2020 (UTC)
I wanted to add "right-wingers are humans too" but I got banned instantly, lol. Equinox 22:24, 11 September 2020 (UTC)
You should have read the FAQ: "UCoC may not fit into all cultural contexts." Vox Sciurorum (talk) 22:47, 11 September 2020 (UTC)
... and the footnote at the bottom: "not actually universal"... On a more serious note (but still highly sarcastic), I'm loving the name "Trust and Safety Team" - it fills me with calm and respect, and can be made into a nice acronym too, which has been a must for any initiative since the Patriot Act. --Java Beauty (talk) 23:14, 13 September 2020 (UTC)
Careful there! One of the proposed rules is to ban sarcasm. (I'm not kidding, go look at the draft.) —Μετάknowledgediscuss/deeds 05:36, 14 September 2020 (UTC)
pics or it didn't happen
And it sure would be great if more right-wingers recognized others as human too :^) —Suzukaze-c (talk) 05:14, 14 September 2020 (UTC)

Archaic forms and spellings should not be lemmas

Most English archaic forms and spellings are lemmas. However, archaic forms are just like declined/conjugated/inflected forms, in that they don't add any information on meaning of the root word. --Numberguy6 (talk) 20:56, 12 September 2020 (UTC)

Archaic terms may have been the predominant form at times in the past. We are attempting to be a historical dictionary among other things. DCDuring (talk) 21:39, 12 September 2020 (UTC)
I disagree: whom is just as much a lemma now as it has ever been, and so is thee. —Justin (koavf)TCM 02:01, 13 September 2020 (UTC)
I also strongly disagree with the proposition that "Archaic forms and spellings should not be lemmas". Mihia (talk) 22:35, 13 September 2020 (UTC)

Phrase ellipsis, three regular dots or two ellipsis characters (six dots)?

Hi all,

First, sorry for cross-posting. I was advised that I'd be better served posting here. Here are my original questions:

Concern A: I came across how do you say...in English and I'm ... year(s) old. The former has been moved to how do you say …… in English. After reading the page history, there seemed to be a rational explanation as to why two ellipsis characters (six dots) were used. Given that Wiktionary:Phrasebook provides an example with three regular dots (three separate characters), I'm confused about what the naming convention should be. Please advise.

Concern B: Most people cannot type the ellipsis character (…) without copying and pasting from somewhere else. Doesn't this limit the usefulness of Wiktionary as a tool for looking up words? What if a phrase starts with the ellipsis characters and the user wanted to look that up? It would likely only be found with great difficulty.

-- Dentonius (talk) 00:11, 13 September 2020 (UTC)

I thought redirects worked and still work in Wiktionary as usual, don't they? In Wiktionary there are many languages, and in them lots of characters that are difficult for outsiders to type; however, redirects (and the {{also}} template) do an excellent job. I don't see why we should make an exception at this particular point when we don't do so elsewhere. The succession of three dots is just a clumsy substitute for an ellipsis character. There are several terms even in English that would be hard to type (e.g. 1,450 terms with æ or 1,213 terms with é) if it weren't for the convenient lookup and redirect features that we have here. Adam78 (talk) 01:06, 13 September 2020 (UTC)

Here is what I wrote at WT:GP:
This is maybe more of a beer parlo(u)r issue, and you might get more traction posting it there. However, I agree with you that six dots seems a bit strange. The explanation "and two of them to mark the width of an average word, separated by spaces as usual" by User:Adam78 makes a certain amount of sense but was clearly a unilateral decision. The issue with an ellipsis character vs. three dots seems less of an issue than you might think; at least for me, if I type "I'm ..." with three dots, it autocompletes to the variant with an ellipsis character. Same thing happens if you start typing "..."; it autocompletes to the ellipsis character entry. Even using a single ellipsis character isn't completely standard; for example, there's what does XX mean and Appendix:X is a beautiful language. In addition, all the entries under Appendix:Snowclones use X, Y, Z, N, etc. For snowclones maybe this makes sense as it makes possible things like Appendix:Snowclones/I'm here to X A and Y B, and I'm all out of A. I think at least all the non-snowclone entries should use a single ellipsis character.
Benwing2 (talk) 05:18, 13 September 2020 (UTC)
I disagree with ever using 6 dots to create space. Normally, when I'm just trying to create space I will use two m-dashes (——) or any number of underlines (___). But for what was being attempted on this site I don't know if any of that would be preferred. I would assume a single ellipsis would be sufficient. -Mike (talk) 22:34, 13 September 2020 (UTC)

A single ellipsis looks to me like a great compromise. I'm sorry for the one-sided change. Adam78 (talk) 15:45, 15 September 2020 (UTC)

Thanks, guys. I appreciate it. ;-) - Dentonius (talk) 17:13, 15 September 2020 (UTC)

Canadian English

Hello all, I raised a question at Category talk:Canadian English upon which I'd like to hear your input. -Montrealais (talk) 15:43, 13 September 2020 (UTC)

As far as the purpose of having a category is concerned, yes, I agree with what you say at that talk page. "Canadian English" should be for words used only (or primarily) in Canada, else what is the point. The actual name of the category could be open to discussion, though. Would a person expect a category called "Canadian English" to contain every word used in Canadian English? That is, including all North American or even "universal" English words too? I'm not sure. Mihia (talk) 22:24, 13 September 2020 (UTC)

As you imply, it doesn't make much more sense to put all North American words under "Canadian English" than it would to put universal English words under "Canadian English" on the grounds that they're used in Canada. I feel that if the category is to be useful, it should be for words that are, or at least mostly are, peculiar to Canada. There's a difference between a dictionary of English used in Canada (e.g. the Canadian Oxford Dictionary) and a list of Canadian words, which I believe most people would expect the category to be. - Montrealais (talk) 23:16, 13 September 2020 (UTC)

I think that one's perception may vary depending on whether the region in question is one's own or not. For example, as a BrE speaker, I would probably expect a list of "Canadian English words" to include words that are used only (or primarily) in Canada, whereas I might expect a list of "British English words" to include all words that are used in BrE. Opinions may vary. Despite this, we might adopt the convention that "X English" words include words used only (or primarily) in region X, and expect/require people to understand this. Otherwise the labelling may get clumsy. Mihia (talk) 00:33, 14 September 2020 (UTC)
@Mihia I definitely don't think having "British English words" consist of all the words used in British English (as opposed to the ones specific to this variety) would be workable. The category would be enormous and wouldn't be of much value, since over 99% of English words are common to all varieties. Benwing2 (talk) 03:05, 15 September 2020 (UTC)
No, I absolutely agree. I think perhaps I did not explain my point clearly enough. I was talking about what a person might expect a category named "British English words" to contain, if he or she did not already know how Wiktionary defined this. I was speculating that a person might think that it would contain all words used in British English, and therefore musing whether the category name should somehow indicate that it didn't (e.g. "Words specific to British English"). However, in conclusion I mentioned that this may be too clumsy. Mihia (talk) 09:25, 15 September 2020 (UTC)
@Mihia I see, makes sense. Benwing2 (talk) 03:31, 16 September 2020 (UTC)
I agree with the above, that I would expect this category to contain words that are specific to Canada, and I think the category should reflect that. Andrew Sheedy (talk) 22:51, 16 September 2020 (UTC)

We seem to have a consensus here, more or less - how should we proceed? -Montrealais (talk) 05:01, 18 September 2020 (UTC)

Lexicography films

Apparently there's only one film ever made about dictionaries, called The Professor and the Madman. Anyone seen it? --Java Beauty (talk) 21:12, 13 September 2020 (UTC)

(Malmoi is apparently another one. —Suzukaze-c (talk) 22:40, 13 September 2020 (UTC))
Oof, Korean film set in 1940, no thanks... --Java Beauty (talk) 23:18, 13 September 2020 (UTC)
I saw The Professor and the Madman. Watchable, though it takes various liberties with the facts in the name of drama. Out of curiosity I Googled and found these other ones:
SGconlaw (talk) 12:51, 14 September 2020 (UTC)

Vulgar Latin

Do we really need Classical & Ecclesiastical pronunciations for unattested Vulgar Latin terms? I think we need not. If our community unanimously decides in favour of showing only Vulgar Latin pronunciations, then we can begin using |classical= & |ecclesiastical= (the latter does not work tho' & I know not why) to hide those twain. Or better, some user might even want to adjust the Latin templates to this effect. inqilābī [ inqilāb zindabād ] 12:44, 14 September 2020 (UTC)

@Erutuon, what d'you think? inqilābī [ inqilāb zindabād ] 18:43, 15 September 2020 (UTC)

Wiktionary sitelinks dashboard: URL update

Hello all, and sorry for writing in English. Feel free to translate this message below.

The Wiktionary Cognate Dashboard presents interesting data about the extension powering your sitelinks. I just wanted to let you know that the URL of this tool changed: it is now accessible at https://wiktionary-analytics.wmcloud.org/Wiktionary_CognateDashboard/ . The former URLs, https://wmdeanalytics.wmflabs.org/Wiktionary_CognateDashboard/ and https://wdcm.wmflabs.org/Wiktionary_CognateDashboard/ , will be disabled on September 25th. Don't forget to update your documentation pages accordingly.

If you have questions about the tool or the URL switch, feel free to ping me. Cheers, Lea Lacroix (WMDE) 11:46, 14 September 2020 (UTC)

If anyone wants an Indian English translation, let me know 🙃 —AryamanA (मुझसे बात करेंयोगदान) 22:33, 14 September 2020 (UTC)
I'm... kind of curious to see what that would entail. —Μετάknowledgediscuss/deeds 22:42, 14 September 2020 (UTC)
Category:Indian_English? Weird but gotta be respected. Equinox 00:05, 16 September 2020 (UTC)
We could all have a go at translating into Scots... XD - -sche (discuss) 20:53, 17 September 2020 (UTC)

Propose making Template:en-noun pluralization algorithm smarter

@Equinox, DCDuring Pinging a couple of random people who I think work on English lemmas a lot. I propose to make the {{en-noun}} pluralization algorithm smarter. Not sure if this has been discussed before. Basically, I want to implement the following default rules (which are mostly already implemented in the pluralize() function in Module:string utilities):

  1. If the noun ends in -s, -x, -z, -sh or -ch, add '-es'.
  2. If the noun ends in consonant + y, and does not begin with a capital letter, change '-y' to '-ies'. Hence cherry -> cherries, but Kennedy -> Kennedys (begins with a capital letter; cf. Rolling Stones "who killed the Kennedys?"), boy -> boys (ends in vowel + y).
  3. Otherwise, add '-s'.
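The three default rules translate almost line for line into code. The sketch below is in Python for illustration only; the actual pluralize() in Module:string utilities is a Lua function and may differ in details. It applies the rules literally, so nouns such as stomach or quiz would still need an explicit plural.

```python
VOWELS = set("aeiou")


def default_plural(noun):
    """Default plural under the three proposed rules."""
    # Rule 1: -s, -x, -z, -sh, -ch take -es (box -> boxes, batch -> batches).
    if noun.endswith(("s", "x", "z", "sh", "ch")):
        return noun + "es"
    # Rule 2: consonant + y -> -ies, but only if the noun is not capitalized
    # (cherry -> cherries, Kennedy -> Kennedys, boy -> boys).
    if (len(noun) >= 2 and noun.endswith("y")
            and noun[-2].isalpha() and noun[-2].lower() not in VOWELS
            and not noun[0].isupper()):
        return noun[:-1] + "ies"
    # Rule 3: everything else adds -s.
    return noun + "s"


for word in ["cherry", "Kennedy", "boy", "box", "batch", "head", "film library"]:
    print(word, "->", default_plural(word))
```

Multi-word nouns work without special handling, because only the last characters are inspected (film library gives film libraries).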

The values s and es would force an '-s' or '-es' plural, as before. The special symbols -, ~, ! and ? work as before. A new symbol + means "produce the default plural"; this is used e.g. on the page accessibility in {{en-noun|-|+}}, which currently has to be written {{en-noun|-|accessibilities}} (the - in conjunction with a plural means "usually uncountable"; without a plural specified, it means simply "uncountable"). This would be implemented as follows:

  1. Implement the new behavior, but only if |new=1 is given in the template.
  2. Use a bot to find the places where arguments would change between the old and new behavior; change the arguments to the new behavior and add |new=1.
  3. As soon as all such places are changed, make the new behavior the default and remove the dead code supporting the old behavior.
  4. Go through and remove the |new=1 parameter.
  5. Remove the dead code supporting the |new= parameter.
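Step 2 is the only step that needs real logic. A minimal sketch, assuming the bot sees each page's positional {{en-noun}} arguments as a list of strings: wherever a page relied on the old default (always add -s) and the new rules would give something else, it pins the old form with s and opts the page in with |new=1, so the visible plural does not change. The helper names migrate and render are hypothetical. Note that a pinned old plural is not necessarily a correct one; such pages still deserve a human look.

```python
VOWELS = set("aeiou")


def old_default(noun):
    # Current behaviour as described in the thread: always append -s.
    return noun + "s"


def new_default(noun):
    # Proposed rules: -es after s/x/z/sh/ch, -ies after lowercase consonant + y, else -s.
    if noun.endswith(("s", "x", "z", "sh", "ch")):
        return noun + "es"
    if (len(noun) >= 2 and noun.endswith("y") and noun[-2].isalpha()
            and noun[-2].lower() not in VOWELS and not noun[0].isupper()):
        return noun[:-1] + "ies"
    return noun + "s"


def migrate(noun, args):
    """Return (new_args, changed) for one page's positional {{en-noun}} arguments."""
    args = list(args)
    if args == ["-"]:
        return args, False  # plain uncountable: no plural under either algorithm
    # A leading '~' or '-' is a countability marker, not a plural.
    plural_args = args[1:] if args[:1] in (["~"], ["-"]) else args
    if not plural_args and new_default(noun) != old_default(noun):
        return args + ["s"], True  # 's' forces the old -s plural
    return args, False


def render(args, new=False):
    """Rebuild the template call, adding |new=1 when the page opts in."""
    parts = ["en-noun", *args] + (["new=1"] if new else [])
    return "{{" + "|".join(parts) + "}}"


args, changed = migrate("nudibranch", [])
print(render(args, new=changed))  # {{en-noun|s|new=1}}
```

Pages that already give s, es or a spelled-out plural are left untouched, since those arguments mean the same thing under both algorithms.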

That way, there would be no disruption while making the change. The only possible issue is someone changing the plural of an existing noun or adding a new noun while step 2 is in progress. I may be able to work around this by checking esp. for new entries in the Category:English nouns category, as I think adding a new noun would be a lot more likely than changing an existing noun. Thoughts? Benwing2 (talk) 08:34, 15 September 2020 (UTC)

Sounds promising. I appreciate your concern about the transition process. I will try to think on possible problems etc. How long do you think step 2 would take? Could an input filter be used to prevent changes to the plurals where "new=1" was present? DCDuring (talk) 14:52, 15 September 2020 (UTC)
Currently the noun ally has |1=allies. Under the proposed smarter rules this could be omitted, but I see no step that would perform this simplification. Did I miss something? I am not sure what the old "dumb" rules are. What is an example of a case that would be flagged in Step 2?  --Lambiam
Oppose unless you can articulate what benefit this would have. I have not encountered mistakes in English plurals in any significant amount. DTLHS (talk) 22:26, 15 September 2020 (UTC)
“Smarter” means it takes the work of thinking about the code off the editor, and of typing it, since with the suggested changes one would have less to specify manually. That means creating English entries would be faster, and less error-prone in every respect because more human attention is left over. The only thing that could work against this is syntax changes that require getting used to, but there aren't “changes” in that sense, only things becoming unnecessary. Fay Freak (talk) 22:49, 15 September 2020 (UTC)
Hi. Interesting idea. Thanks for pinging my pimply arse. You know what, I mostly see templates as something that gets in my way, that I have to work around, and I know that's really sad and wrong, because a lot of templates do very useful things. See current discussion on my talk page about why I find it hard to use the proper citations template with lots and lots of parameters (year, author, etc.). I also very much appreciate the fact that you are proposing a new=1 parameter and a phase-in rather than sort of just throwing it in there and hoping it works (ahem...). Could you please tell me: (i) what is the basis of your proposed rules (did they come from a certain grammar book, or a corpus study, etc.?) -- not just your head, right?; (ii) I think I just didn't totally follow your explanation, but suppose we have got a "perfect exception" that works with old en-noun but not with yours, such as drivebys: is there any risk of breaking these while implementing your new proposal? Thanks. Equinox 00:14, 16 September 2020 (UTC)
@Equinox In response to your questions: (i) These are the rules I was taught as a kid. Can't cite a specific grammar book but I bet any standard English grammar contains these rules. (ii) Exceptional cases like "drivebys" and "nudibranchs" would just need to be specified as {{en-noun|s}} instead of the current {{en-noun}}. My bot will change them automatically. Benwing2 (talk) 03:11, 16 September 2020 (UTC)
OK. Then probably in favour of this. We always have the Preview screen, after all. (You might also enjoy seeing the hot mess of User:Equinox/code/FindMissingNounPlurals.) Equinox 03:19, 16 September 2020 (UTC)
@Equinox Your code isn't so bad :) ... and it's liberally commented, which is something near and dear to my heart. Benwing2 (talk) 02:41, 18 September 2020 (UTC)
@Lambiam, DTLHS User:Fay Freak articulated the reason well, in my view. What the current module does by default is to always add -s to the noun, regardless of the form of the noun. So, head -> heads, house -> houses, boy -> boys, but also batch -> batchs, cherry -> cherrys, box -> boxs, etc. What I'm proposing to do is make the default rule smarter, so that there are fewer exceptional cases, and so that the cases that do require the plural to be enumerated explicitly correspond with English speakers' intuitions of what are exceptional. For example, currently nudibranch is specified as just {{en-noun}}, relying on the default -s plural, hence nudibranchs. This happens to be correct for this noun because the final -ch is pronounced as /k/, but any native speaker will tell you this is an exception, and that the "normal" plural would be nudibranches. Someone who doesn't know this word and comes across it in Wiktionary might think the bare template call {{en-noun}} is a mistake by some other editor who forgot to specify the explicit plural (which is required for 99% of nouns ending in -ch), and try to "correct" it to nudibranches. When the template is changed as I propose, so its default rules accord with normal English plural rules, all the exceptional cases will be specifically indicated as such and this problem won't occur. Benwing2 (talk) 03:23, 16 September 2020 (UTC)
@Benwing2 — You did not answer my first question, about the bot applying possible simplifications, like for example for ally replacing {{en-noun|allies}} by {{en-noun}}.  --Lambiam 04:00, 16 September 2020 (UTC)
@Lambiam Apologies. Yes, the bot would apply all possible simplifications in step 2. That would include e.g.:
The way I would probably implement it is to first replace all arguments with + where possible (which means "use the default algorithm"), then eliminate + where possible. (Specifically, {{en-noun|+}} -> {{en-noun}}, and {{en-noun|~|+}} -> {{en-noun|~}}.) Benwing2 (talk) 05:26, 16 September 2020 (UTC)
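The two-pass simplification described here can be sketched the same way, again assuming a list of positional arguments and using hypothetical helper names. The sketch also resolves an argument list to its plurals, which gives a cheap safety check: a simplified call must produce exactly the plurals the original one did. Following the option recommended in the stats discussion, literal s and es are left alone and only fully spelled-out plurals become +. A + after - is kept, because {{en-noun|-}} on its own means uncountable rather than usually uncountable.

```python
VOWELS = set("aeiou")


def default_plural(noun):
    """Proposed default: -es after s/x/z/sh/ch, -ies after lowercase consonant + y, else -s."""
    if noun.endswith(("s", "x", "z", "sh", "ch")):
        return noun + "es"
    if (len(noun) >= 2 and noun.endswith("y") and noun[-2].isalpha()
            and noun[-2].lower() not in VOWELS and not noun[0].isupper()):
        return noun[:-1] + "ies"
    return noun + "s"


def split_countability(args):
    """Split off a leading '~' or '-' marker from the plural arguments."""
    if args[:1] in (["~"], ["-"]):
        return list(args[:1]), list(args[1:])
    return [], list(args)


def resolve(noun, args):
    """Plurals produced by an argument list: '+' = default, 's'/'es' = suffix, else literal."""
    marker, plural_args = split_countability(args)
    if marker == ["-"] and not plural_args:
        return []  # {{en-noun|-}}: uncountable, no plural
    if not plural_args:
        plural_args = ["+"]  # nothing given: use the default plural
    out = []
    for arg in plural_args:
        if arg == "+":
            out.append(default_plural(noun))
        elif arg in ("s", "es"):
            out.append(noun + arg)
        else:
            out.append(arg)
    return out


def simplify(noun, args):
    """Pass 1: spelled-out defaults become '+'. Pass 2: drop a redundant lone '+'."""
    marker, plural_args = split_countability(args)
    default = default_plural(noun)
    plural_args = ["+" if a == default else a for a in plural_args]
    if plural_args == ["+"] and marker != ["-"]:
        plural_args = []
    return marker + plural_args


for noun, args in [("ally", ["allies"]), ("cherry", ["~", "cherries"]),
                   ("accessibility", ["-", "accessibilities"])]:
    print(noun, args, "->", simplify(noun, args))
```

Running resolve on both the old and the simplified argument list for every page before saving would catch any case where the simplification changes the output.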
@DCDuring I'm not exactly sure how long step 2 would take, but I can write a script to find out. It usually takes 1-2 seconds to save a page, meaning the bot can do maybe 3000 pages an hour. I think an edit filter would work and be a pretty simple solution; it could even be made to allow changes that add |new=1, but not otherwise, and display a message indicating that this needs to be temporarily done. Benwing2 (talk) 03:30, 16 September 2020 (UTC)
So, if everything worked right and it ran 24 hours a day with you faithfully overseeing it, at least on standby, it would take at least 4 days. If it was overseen 40 hours a week and was only run when overseen, it would take at least 2 1/2 weeks. I don't know that there is anyone besides you who could properly oversee it and lead a prompt recovery from any unforeseen problem. Thus the time step 2 would take would be totally subject to your availability for oversight, at least on a standby basis. DCDuring (talk) 11:41, 16 September 2020 (UTC)
@DCDuring This isn't actually the case. I think you're basing your calculations on the total number of English nouns (about 348,000), but the time would be determined by only those that need to be changed. I think that would be maybe 10,000-30,000, or about 3-10 hours, but I don't know for sure. This could be sped up by about 5x by running multiple processes at once. I also realized that I can use the tracking mechanism (Template:tracking) to track any pages needing updating that get changed during the primary updating process, so there's no need for an edit filter. Benwing2 (talk) 00:52, 17 September 2020 (UTC)
Indeed. I was basing my estimate on all occurrences of {{en-noun}}. Can you identify all of the English nouns that need to be changed before beginning the changes? DCDuring (talk) 01:11, 17 September 2020 (UTC)
@DCDuring Yes, I can write a script to do this. Benwing2 (talk) 02:33, 17 September 2020 (UTC)
Isn't that necessary to reduce the time you would need to supervise the process? from my estimate to yours? Or am I missing something blindingly obvious? DCDuring (talk) 03:13, 17 September 2020 (UTC)

@DCDuring No, you're not missing anything, I do have to write the script eventually. I went ahead and wrote it; here are the stats:

Stat | Count
Total pages with {{en-noun}} | 340,887
Pages touched if all possible changes/simplifications are made and + is used everywhere | 32,739
Pages touched if all possible changes/simplifications are made and + is used everywhere except when replacing s | 23,752
Pages touched if all possible changes/simplifications are made and + is used everywhere except when replacing s or es | 22,024
Pages whose output will differ between the old and new algorithm if no changes are made | 352

There are three numbers I give above, depending on how the changes are made. I recommend one of the latter two (where I leave alone s, or maybe both s and es, instead of replacing them with + where it's appropriate to do so). Both solutions would take around 7-8 hours to implement in step 2. Note that there are only 352 pages that actually *have* to be changed with the new algorithm, because they will have different results. These are pages like nudibranch that rely on the default -s ending without specifying it explicitly, and where the new algorithm would add something else (e.g. -es in this case). This means it's unlikely there will be many, if any, pages of this sort that will be added in the time it takes to run step 2 above. However, I will set up tracking so that any changes made in those 7-8 hours get reviewed and fixed up as needed.
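The time estimates traded in this thread are simple throughput arithmetic. A quick check, assuming Benwing2's rough figure of about 3000 saves per hour for one bot process and, for the parallel case, that throughput scales linearly with the number of processes:

```python
PAGES_PER_HOUR = 3000  # rough figure from the thread (1-2 s per save)


def hours_needed(pages, processes=1, pages_per_hour=PAGES_PER_HOUR):
    """Wall-clock hours to save `pages` edits, assuming processes scale linearly."""
    return pages / (pages_per_hour * processes)


print(round(hours_needed(23752), 1))         # 7.9 hours
print(round(hours_needed(22024), 1))         # 7.3 hours
print(round(hours_needed(348000) / 24, 1))   # 4.8 days if every noun page were edited
```

The same formula gives about 10 hours for 30,000 pages, and with five parallel processes the 22,024-page option would drop to roughly 1.5 hours.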

Please note, the list of those 352 entries is here: User:Benwing2/convert-en-noun.warnings. Some of these are in fact wrong and need to have an explicit plural (e.g. windgrass, film library). I made a list of all those that may be wrong, here: User:Benwing2/convert-en-noun.warnings.likely-wrong. This list has 110 entries. Some are clearly wrong, some may or may not be wrong. Could you take a look and fix the ones that are wrong? Thanks! Benwing2 (talk) 01:16, 18 September 2020 (UTC)

Also, note that the above stats are derived from the Sep 1 dump, so there may be a few pages not included. Benwing2 (talk) 01:18, 18 September 2020 (UTC)
I just did 15, marked with {{done}}, with an explanation. Tedious. There are many common nouns that are derived from proper nouns which require looking at cites. There are a whole bunch of nouns ending in "oxy", which could probably be resolved most quickly by SB. I'll take another look when I can. DCDuring (talk) 03:47, 18 September 2020 (UTC)
@DCDuring Thanks. If you can just list what needs to be done for the others (if anything), I can make the changes fairly quickly. I flagged these because they need further investigation, e.g. all the -oxy ones seemed wrong to me but I don't know for sure. Calling User:SemperBlotto, who created many of them. Benwing2 (talk) 03:56, 18 September 2020 (UTC)
There are also some that should use {{en-proper noun}}, not {{en-noun}}, and others that are uncountable or, IIRC, plural-invariant. In some cases where there are both common- and proper-noun L2s, I wonder why we have both. DCDuring (talk) 11:52, 18 September 2020 (UTC)
@DCDuring Thanks. I implemented the necessary changes in Module:en-headword and I'm ready to proceed using the Sep 20 dump, when it comes out. Are you OK with this? Benwing2 (talk) 19:11, 19 September 2020 (UTC)
I fixed up some of the remaining cases in User:Benwing2/convert-en-noun.warnings.likely-wrong and removed the ones you or others had already done. There are about 55 left; almost all end in consonant + -y. Benwing2 (talk) 20:42, 19 September 2020 (UTC)
What I'm hoping will come out of this is a greater willingness to review the inflection line for English nouns so as to address various questions about plurals, including countability, the various departures from the basic rules, etc. Not having to type in so many of the plural forms might reduce the tedium and wear-and-tear on keyboards and thumbs of such reviews. DCDuring (talk) 00:26, 20 September 2020 (UTC)

meta:Small wiki audit/Malagasy Wiktionary

As many of you know, the Malagasy Wiktionary is the second-largest by article count as a result of its very low-quality bot-created entries. I have made a full report on Meta, and I'm hoping that the Wiktionary community can chime in on the talk page, and add pressure at Meta so this actually gets dealt with. —Μετάknowledgediscuss/deeds 02:17, 16 September 2020 (UTC)

Support, this bot is highly irritating. PUC – 20:15, 16 September 2020 (UTC)
Don't comment here, go comment at Meta! —Μετάknowledgediscuss/deeds 20:24, 16 September 2020 (UTC)

Northwestern Indo-Aryan

I'm trying to improve our organization of Indo-Aryan languages (it's very loose as it stands), and I have an issue that could use some discussion. The Indo-Aryan lects of the Northwestern zone (Sindhi sd, Punjabi pa, etc.) are currently classified as descendants of Sauraseni Prakrit psu. The (literary) language most closely associated with these lects is Paisaci Prakrit, which we now have a code inc-psc for.

Certainly, Sauraseni doesn't give us the appropriate intermediary forms between Sanskrit and these languages: e.g. Kholosi taɽgo (a Sindhic language) preserves the r in the consonant cluster in Sanskrit दीर्घ (dīrgha), but Sauraseni has lost it as 𑀤𑀺𑀕𑁆𑀖 (diggha). Similarly, the currently given etymology of Punjabi ਭਰਾ (bharā) does not make much sense, again due to the preservation of r. The conservativeness of Northwest IA is well known, e.g. Masica (1993) discusses it. Sauraseni, on the other hand, is distinctively a Central IA language that is obsessed with cluster assimilation.

However, Paisaci is very very scantily attested and so I'm uncertain whether it actually is a good candidate for intermediary language between Northwest IA and Sanskrit. Glottolog does include it in Northwest IA, but Glottolog also very stupidly puts Dardic languages in there. What's the best way to organize Northwest IA? Create a family for it and group it under Sauraseni? Or put it under Paisaci which would require removing all of the current etymologies involving Sauraseni? Pinging @Bhagadatta, Kutchkutch, Victar, not sure who else might be knowledgeable enough to help but any input is welcome. —AryamanA (मुझसे बात करेंयोगदान) 16:19, 16 September 2020 (UTC)

@AryamanA: I am not knowledgeable about Northwestern IA lects but I would support classifying them as descendants of Paisaci Prakrit. Paisaci Prakrit's meagre attestation should be no cause for not showing it as the intermediate between OIA and Northwest IA, because we should be more bothered about representing the IA family tree as accurately as current linguistic data allows. So we can obviously go ahead with cleaning up the etymologies and the descendants to be affected by this change. inqilābī [ inqilāb zindabād ] 19:40, 16 September 2020 (UTC)
@AryamanA: I don't mind having Paisaci as the ancestor of NWIA but there are a couple of issues. Paisaci Prakrit has sometimes de-voiced Old Indo-Aryan voiced stops like 𑀢𑁂𑀯𑀭 (tevara), for which Punjabi has ਦਿਉਰ (diura). There are more examples like 𑀧𑀸𑀴𑀓 (pāḷaka), 𑀓𑀓𑀦 (kakana) etc., for which I don't know the Punjabi equivalent.
Also, as for Kholosi taɽgo, the initial dental is de-voiced in a Paisaci-like manner but the r is also preserved, which does not seem to be something Paisaci would do: Skt. घर्म (gharmá) --> Paisaci Prakrit 𑀔𑀫𑁆𑀫 (khamma); the r has been lost.
How should we handle cases like this? -- Bhagadatta (talk) 02:08, 17 September 2020 (UTC)
@Bhagadatta: Hmm, so it seems we won't be finding any perfect Prakrit match to Northwest IA, as I had suspected previously. (If anything, Shahbazgarhi/Mansehra Ashokan Prakrit/Gandhari Prakrit seem to be closer.) I suppose we can make a Northwest IA family and put it under Sauraseni to maintain the status quo. We could have Paisaci as a separate branch with no descendants. —AryamanA (मुझसे बात करेंयोगदान) 02:24, 17 September 2020 (UTC)
@AryamanA: Well, I love the idea of Proto NWIA and Proto Central IA etc. But can we really continue calling Punjabi and Kholosi descendants of Sauraseni? As you pointed out, these languages preserve features and (remnants of) clusters that Sauraseni lost. But then again, there are a lot of features in Punjabi that appear to be from Sauraseni; one feature I can think of is geminated stops. Classifying IA languages is a really challenging task. -- Bhagadatta (talk) 03:16, 17 September 2020 (UTC)
@AryamanA, Bhagadatta, Inqilābī: It’s really unfortunate that the Indo-Aryan language family is not understood as well as it should be, and this particular issue is certainly one that needs to be discussed.
It is the overwhelming consensus in Indo-Aryan scholarship that Punjabi and Sindhi constitute a Northwest Zone distinct from the Central Zone. This modern Northwest Zone has similarities with the Ashokan Northwest Zone. However, there are several challenges in the definition of such a Northwest Zone.
First, the dividing line between the Northwest and Central Zones in modern South Asia is not well defined. Although the Western boundary with Iranian and the Northern boundary with Dardic are somewhat clear, the Eastern boundary is blurred due to contact with Hindi-Urdu and the Southern boundary is blurred due to Rajasthani and Gujarati.
Second, the academic understanding of the dialects of Punjabi and Sindhi is sparse.
For Punjabi, academic literature usually only refers to the Majhi lect or Standard Punjabi (MSP) of Amritsar and Lahore. This predominance of MSP in the academic literature distorts any general understanding of the Punjabi linguistic area as a whole. Other than the MSP lect, Saraiki is the next best understood Punjabi lect. Saraiki has the advantage of being both the variety most consistently divergent from MSP and the one with the best local claim to separate recognition. Since Saraiki is spoken in southern Punjab close to the border with Sindh, there are numerous similarities between Saraiki and Sindhi. For example, both have implosives that are otherwise absent in Indo-Aryan. Pahari-Pothwari is perhaps another important Punjabi lect due to its being the native lect of Rawalpindi, Islamabad and Mirpur.
Since most Sindhi speakers live in Pakistan and less than 1% of Indians speak Sindhi, understanding the Sindhi linguistic area from an Indian perspective is very challenging. Fortunately, the Kachchi lect is spoken natively in Kutch, which is accessible to Indians. Although Kachchi is a Sindhi lect, many Kutchi people choose not to identify as Sindhi, as can be seen in their choice to use the Gujarati script. What is needed for a general understanding of the Northwest Zone as a whole is enough data for one or two Module:zh-dial-syns for Punjabi and Sindhi. Appendix:Sindhi Swadesh lists is a step in that direction.
Third, as is the case for all of Indo-Aryan, the documentation of various earlier stages does not represent a logical successive relationship from one stage to the next. There is no information regarding the transition from MIA to Early NIA for the Northwest Zone. The earliest data for Northwest NIA are two short fragments of the Adi Granth termed ‘Old Punjabi’ that have been analysed by Christopher Shackle. Although the attestation is fragmentary, comparing Old Punjabi with Modern Punjabi and Sindhi helps with diachronic analysis.
Despite nearly a thousand years of Perso-Arabic influence, Punjabi and Sindhi still show many features of Prakrit to an extent greater than Marathi, Hindi and Bengali. Markandeya claimed that Vracada Apabhramsa was spoken in Sindh and is the ancestor of modern Sindhi (sindhudeśedbhavo vrācaḍopabhraṃśaḥ). Pischel and Grierson have both supported this claim by Markandeya. Very little is known about the Vracada itself, except nine peculiarities noted by Markandeya. Here are some of those features of Vracada:
  1. Retroflexion of MIA dental stops. For example, <> → /ʈ/ and <> → /ɖ/
  2. An epenthetic <> before <> in Vracada may be the source of the Sindhi and Saraiki implosive /ʄ/
  3. Sibilant merger: ṣ, s→ ś
  4. The च-series are pure palatals. For example, <> → /c/, <> → /ɟ/
Fourth, despite Bhagadatta’s valid analysis, multiple sources say Paisachi represents the Northwest Zone. Page 24 of {{R:inc:CGMIA}} says that regardless of whether the Northwest Zone is the home of Paisachi, it was also spoken in the Central Zone. Pages 30-31 say that there are at least four lects of Paisachi: Kaikeya, Saurasena, Pancala and Culika. The Paisachi lects in Pischel are perhaps Kaikeya and Culika with Culika being marked separately. I see no harm in using reconstructed Paisachi like reconstructed Ashokan Prakrit as a solution to this issue. Since Proto-Prakrit as a separate entity was rejected, attempting to create Proto-NW I-A as a separate entity is likely to be rejected on the same grounds. Like Ashokan Prakrit, the attested and reconstructed terms would represent different entities. See अक्खइ for a Paisachi quotation.
Fifth,
The area encompassed by Sauraseni Prakrit is too large. The area between Mathura and Karachi is at least twice the size of the areas encompassed by both Maharastri Prakrit and Magadhi Prakrit.
There is no Sindhi-speaking editing community (other than the inactive user User:Aursani) to obtain information from.
Old Punjabi pa-old is an etymology-only language.
What is the purpose of Category:Western Panjabi language pnb if it is intended to be merged with pa? Kutchkutch (talk) 13:15, 17 September 2020 (UTC)
@Kutchkutch: I am not an expert, but I would like to make a general remark that the biggest problem faced while classifying the IA family is the effect of dialect levelling and/or dialect mixing that happened historically, which can disrupt the otherwise regular nature of sound laws, and thus lead to common innovations in divergent lects. For example, Punjabi shares with Dardic the feature of losing voiced aspiration, and metathesis of the rhotic consonant. inqilābī [ inqilāb zindabād ] 17:10, 17 September 2020 (UTC)
@Kutchkutch: Very well articulated. I agree that Proto-Northwestern Indo-Aryan has little to no chance of being approved as a full-fledged language on Wiktionary, complete with lemmas in its name. The best solution seems to be to use reconstructed Paisaci for this purpose. -- Bhagadatta (talk) 01:34, 18 September 2020 (UTC)
@Inqilābī: The points that you have mentioned are worth considering. Anything I say about the Northwestern Zone is from an outsider's perspective (so if something is incorrect then please feel free to correct it). User:AryamanA and the other Punjabi editors are probably in a better position to make internal judgements about the Northwestern Zone. Despite having an outsider's perspective and numerous limitations, learning about the other Zones of Indo-Aryan is still a worthwhile pursuit (if User:DerekWinters were still around, I'm sure that he would agree). Since there is an international border and a variety of religions in the Northwest Zone, discussing it in detail might involve several sensitive issues such as politics or religion.
Perhaps the effects of ʻdialect levelling and/or dialect mixingʼ, Areal features, Sprachbund#Indian_subcontinent and Dialect_continuum#Indo-Aryan_languages are among the reasons why ʻthe documentation of various earlier stages does not represent a logical successive relationship from one stage to the nextʼ. These are some examples of the shortcomings of the Tree model, especially for the Indo-Aryan family. The Wave model tries to fix some of those shortcomings, but understanding and applying it appears to be a challenging task. Although Module:zh-dial-syn seems to be a possible approach to addressing the shortcomings of the Tree model, data is either hard to find or non-existent.
@Bhagadatta: This discussion about the Northwest Zone raises interesting parallels with the other Zones of Indo-Aryan. The work on Maharastri Prakrit, Old Marathi, Marathi and Konkani has certainly been advancing our understanding of the Southern Zone in the public eye. It would be nice to see the same kind of collaboration (if it doesn’t exist already) on the modern and historical languages of the other Zones among native speakers and learners.
Interestingly, User:AryamanA created codes for Proto-Central Indo-Aryan inc-cen-pro, Proto-Northern Indo-Aryan inc-nor-pro and Proto-Northwestern Indo-Aryan inc-nwe-pro without anything more than:
I had not correctly categorized some of the subfamilies, so the pages themselves (except for 1 Ahirani lemma) are okay. This reorganization will take a bit.
So I'm not sure whether that means we can start reconstructing these proto-languages (finding citations for such reconstructions would be difficult). Perhaps he'll tell us more about what is happening after the reorganisation is complete. According to Wiktionary:Families, non-genetic groups of languages can also be called a ʻfamilyʼ such as CAT:Prakrit languages and now CAT:Central Indo-Aryan languages and CAT:Eastern Indo-Aryan languages. The ancestor of Ahirani is now Sauraseni Prakrit instead of Maharastri Prakrit (perhaps the similarity of से with છે (che) was the reason for the change) with the result now visible on आऊत (औत (aut) is more common than अऊत (aūt) for Marathi but mr.wikt uses अऊत (aūt)). Khandeshi continues to have Maharastri Prakrit as its ancestor. Although {{R:ahr:RSS}} exists, not all the pages of that dictionary are available.
It says on Wikipedia:
Sanskrit refers to the whole range of mutually intelligible Old Indo-Aryan dialects spoken in North-western India at the time of the composition of the Vedas.
the original speakers of what became Sanskrit arrived in the Indian subcontinent from the north-west sometime during the early second millennium BCE
So perhaps that means that attested Vedic Sanskrit is OIA in the Northwest Zone, and *पुरिष (puriṣa), *दिन्न (dinna), Reconstruction:Sanskrit/झापयति and the terms in CAT:Sanskrit reconstructed terms could either be alternate forms of OIA in the Northwest Zone or OIA in the other Zones. Kutchkutch (talk) 09:22, 18 September 2020 (UTC)

@Kutchkutch: The part about Skt. being a catch-all term for all OIA dialects in North Western India was added by me when I was trying to re-word Woolner's statement. The original statement was something to the effect of, "if Sanskrit is taken to mean Vedic and all Old Indic dialects then the Prakrits are derived from Sanskrit". I think I ought to update it when I have more time on my hands, as OIA was also spoken in the east. The part about Sanskrit speakers arriving in the northwest of India was already there. As for Vedic being North Western OIA, it's beautifully illustrated here that Vedic too was a mixture of several dialects. The hymns and verses that would later be included in the Rigveda were composed near the confluence of the Sutlej and Beas rivers in the Punjab region, but the Rigveda was redacted in modern-day Haryana where the dialect was slightly different, so the original Rigvedic dialect was "filtered" through the phonetics of the Kurukshetran dialect, also called "Western Vedic", which was spoken where the redaction of the text took place. Finally, Panini's Classical Sanskrit comes from the Northwestern dialect of OIA spoken in Gandhara. -- Bhagadatta (talk) 10:43, 18 September 2020 (UTC)

@Bhagadatta: Thanks for the explanation and for sharing that link. I've read that before but wrongly assumed that it was an imagined story. When you have time, it would be very helpful if you could update Wikipedia with your understanding of this information. Interestingly, Chitrapur Math's Sanskrit lessons use a funny story about the difference between two Konkani dialects to explain what Panini did for Classical Sanskrit here. Apparently there is a बड्गी dialect in North Kanara and a तेङ्की dialect in South Kanara (this is reminiscent of the first discussion at Talk:हांव). The female author of the Sanskrit lessons, who speaks बड्गी, marries a तेङ्की speaker and encounters a few difficulties.
Good ol' Panini, God bless his soul, being extremely sensitive to people's feelings, so no group would feel left out, and wanting to see everybody live happily ever after together, decided to act Pappamma, and brought all of them together under the aegis of "Sanskrit." He toured all over Bharatvarsha, noted every word used and put it all down on paper. Then he classified the words. AND HOW!!! ( To his credit ...he studied all existing grammar works and in his own work, has very religiously and faithfully accounted other grammarians' thoughts on the subject under discussion.) Once Panini's work became known to the people, the Sanskrit Badgis and the Tenkis of the days gone by became familiar with each other's vocabulary and very soon a mixture of the two became a single , common medium of communication. Much like my kids speak today!
The connection between Dardic and Punjabi probably comes from the article Dardic languages. There also may be some Dardic influence on Konkani_language#Pre-history_and_early_development
Dardic may in turn also have left a discernible imprint on non-Dardic Indo-Aryan languages, such as Punjabi and allegedly even far beyond.
Konkani shows a good deal of Dardic ( Paisachi ) influence. Even Magadhi has got a good deal of "Dardic" influence. The other languages on which Paisachi exerted influence are Sindhi, Punjabi, Kashmiri and Nepali in the north.
The influence of Paisachi over Konkani can be proved in the findings of Dr. Taraporewala, who in his book Elements of Science of Languages (Calcutta University) ascertained that Konkani showed many Dardic features that are found in present-day Kashmiri. Thus, the archaic form of old Konkani is referred to as Paishachi by some linguists. This progenitor of Konkani (or Paishachi Apabhramsha) has preserved an older form of phonetic and grammatical development, showing a great variety of verbal forms found in Sanskrit and a large number of grammatical forms that are not found in Marathi.
The link that you provided also demonstrates that some work on the Northwest Zone has been done. Perhaps a summary is in order.
For Paisachi:
The most iconic feature of Paisachi is the apparent devoicing of intervocalic stops (Compare Sanskrit bhagavatī with Paiśācī phakkavatī). The grammatical rules at work in Paiśācī could simply be the reverse application of the voicing rules applied to produce the other Dramatic Prakrits. For von Hinüber (1981), however, the supposed devoicing in Paiśācī is actually a fiction of orthography. According to his theory, at some point in the development of Middle Indic, the character <g> no longer represents voiced velar stop /ɡ/ but rather voiced velar fricative /ɣ/ due to lenition. After this shift, the character <k> is repurposed to mark /ɡ/. For von Hinüber, the odd appearance of Paiśācī is due to the distorting lens of this orthographical shift.
For Sindhi:
The five major dialects of Sindhi are Vicholi, Lari, CAT:Lasi language, Thari and Kachchi. CAT:Memoni language, CAT:Kholosi language and CAT:Luwati language are often included as well. Thari is perhaps another name for Dhatki mki, in which case it may actually be a Rajasthani language. Some sources say that there is a Saraiki dialect of Sindhi and another Saraiki dialect of Punjabi.
Vicholi is the standard dialect in central Sindh. Lari is the dialect of southern Sindh. Lasi is spoken on the western frontier of Sindh and in Balochistan. Thari is the dialect of the Jaisalmer district of Rajasthan. There is a dialect map for Sindhi at [4]. Implosives are explained as the outcomes of geminated voiced stops in MIA (MIA /bb/ → /ɓ/). This is slightly different from the analysis of Vracada Apabhramsa above. The number of voiced implosives differs from dialect to dialect (similar to how the number of tones differs from dialect to dialect for Punjabi). All Sindhi dialects have at least one implosive, and curiously none have a dental. *[tr] → /ʈ/ in ٽي (three).
For Punjabi:
Fariduddin Ganjshakar (1173 - 1266) was one of the first Punjabi writers. Fariduddin Ganjshakar wrote in the Shahmukhi script. Perhaps the name Shahmukhi is only used to distinguish it from Gurmukhi. Fariduddin's literature was included in the Adi Granth. The excerpt interestingly claims that there was some contact between the writers of Old Punjabi and Old Marathi.
The dialects of Punjabi are divided into Western Punjabi and Eastern Punjabi. Here are some maps: 1 2 3
The Eastern dialects are Majhi, Doabi, Malwai and Puadhi. Doabi is spoken between the Beas and the Sutlej (perhaps the same region in which the Rigveda was composed). Malwai and Puadhi are spoken south of the Sutlej along the boundary of the Haryanvi language area. Arya Samaj's promotion of Hindi in the Punjab is often cited as the reason for the blur between Hindi and Eastern Punjabi. Lahnda is an exonym for Western Punjabi coined by William St. Clair Tisdall, and this term was also used by Grierson. Two major groups within Western Punjabi are Saraiki and Hindko. Hindko is spoken in discontinuous areas of Khyber Pakhtunkhwa and is in frequent contact with Pashto. Kutchkutch (talk) 10:18, 19 September 2020 (UTC)
@Kutchkutch, Bhagadatta, Inqilābī: Thank you all for your input!! I have been a bit busy, but I have read and tried to do my own research as well.
  • I want to clarify that Proto-Central IA, Proto-Northwestern IA, etc. were only a temporary measure for organization in our modules. I don't expect that we will be reconstructing (most of) these, and we should strive to replace them with attested languages if possible. Hence I brought up Paisaci as a possible substitute for Proto-NW IA. (Also, most sources I found treated Ahirani as a Central IA lect, hence the reclassification.)
  • The status of Paisaci seems to be difficult to ascertain. Upadhye (1939-40) in their review of Paisaci literature classifies it as an Old MIA language, on the order of Ashokan and Pali. This would seem to explain the rarity of it in written texts; probably, since no religious group promoted it (as Buddhists and Jains did with other MIA lects), there was little incentive for its recording, and so all we are left with are the statements of grammarians.
  • But it should also be noted that early Buddhist texts claim that Paisaci was used by one of the schools of the Vaibhāṣika Sthavira (sub-sect) of Buddhism, based in Kashmir. And going through Pischel again, I find that all of the evidence does point to Paisaci being a Northwestern IA language. The only argument put forth for it being anywhere else (namely the Vindhyas) is the presence of retroflex ḷ (suggested by Rudolf Hoernlé), but pretty much everyone who followed has found this to be insufficient evidence. It should also be noted that (some lects of) Punjabi have developed ḷ natively too, and I think some of the Dardic languages have as well.
  • Then the problem follows: what is a descendant of Paisaci? I strongly doubt that Konkani is involved, although there certainly hasn't been enough historical linguistic work on it; I would think a lot of Konkani's archaisms are rather due to re-Sanskritization, perhaps earlier than occurred for other IA languages. The Dardic languages are often tied to Paisaci, but Paisaci only preserves one sibilant while the Dardic languages generally have all three. Maybe we ought to keep it as only the ancestor of Sindhi and Punjabi? I also think a code for Vracada (as the ancestor of Sindhi etc., as Kutchkutch investigated) is necessary and uncontroversial. The devoicing really reminds me of Punjabi, which has done that to its voiced aspirated series and has resulted in tones, but if it's purely an orthographic difference then that is not the same process. Upadhye gives a very interesting idea about the Culika dialect as being a form spoken by Sogdians who came to India, suggesting the speakers of Paisaci migrated inland later on, which would result in dialect admixture with Sauraseni. But I think it's very difficult to draw satisfactory conclusions.
  • Of course, there are probably discussions to be had about Dardic's placement as well. Probably, Shahbazgarhi/Mansehra Ashokan Prakrit are a good proxy for the ancestor of Dardic, but it is confusing where Gandhari comes into play. We will not put it under Paisaci at this time.
  • Finally, I'm afraid the tree model is too simplistic for IA in general, as Inqilābī and Kutchkutch pointed out. On a purely lexical basis, as examined from the view of lexicostatistics, Kogan (2016) finds Punjabi to be closest to Hindi, grouping Sindhi closer to Gujarati, Rajasthani, Lahnda, etc. Our grouping is quite different. Hindi itself, as we know, is a highly mixed language that developed in Delhi from contact between many languages in a political centre, as reflected in its early forms such as Sadhukkadi, in which case even it is not a purely Central IA language and is probably highly influenced by other Central (Braj, Haryanvi), Northwest (Punjabi), Western (Rajasthani), Eastern (Bihari lects) and the Ardhamagadhi lects of IA.
  • Overall I am not opposed to the placement of Punjabi and Sindhi (and related languages) under Paisaci, and Sindhi and its relatives under Vracada Apabhramsa. Reconstructing Paisaci seems difficult, however, although I am curious whether late MIA can be reconstructed at all, so it may be an interesting experiment to undertake. However, I wonder if we should only rest on geographical groupings as the best we can have, since the Prakrits seem to be difficult to tie directly to NIA languages.
AryamanA (मुझसे बात करेंयोगदान) 21:50, 19 September 2020 (UTC)
@Bhagadatta, Kutchkutch, Inqilābī: Also check out [5] from Suniti Kumar Chatterji's work on Bengali. I think this is the best tree-based chart we'll be getting! —AryamanA (मुझसे बात करेंयोगदान) 02:38, 20 September 2020 (UTC)

Category:Hindi Tadbhava

@AryamanA, Itsmeyash31, Atitarev, Bhagadatta I am planning to remove this category unless someone comes up with a good reason to keep it. It has only 33 entries in it, is badly named, and duplicates Category:Hindi terms inherited from Sanskrit. Benwing2 (talk) 04:50, 17 September 2020 (UTC)

@Benwing2: Oh man, consensus to delete this category had been reached years ago, I can't believe it's still there... I thought it was deleted. -- Bhagadatta (talk) 05:16, 17 September 2020 (UTC)
@Benwing2: Since we do not have Category:Hindi Tatsama and Category:Hindi Ardhatatsama, how did this one linger so long? Obviously, delete it. inqilābī [ inqilāb zindabād ] 13:18, 17 September 2020 (UTC)

Turkish noun inflection

According to a comment in Requested entries (Turkish) (Special:Diff/60412898) the Turkish noun inflection template is incomplete. Some possessive forms can end in -in or -ini and only the first is generated. Could somebody who knows Turkish explain what needs to change? Vox Sciurorum (talk) 13:46, 17 September 2020 (UTC)

It is not a simple matter. Turkish nouns can have an optional number suffix (marking a plural), an optional possessive suffix, and an optional case suffix, in that order. For example, kitap-lar-ım-da = “book-plural-mine-in” = “in my books”. So the generic form is noun stem + number + possessive + case. Counting the absence of a marking as the presence of a null segment denoted, for the purpose of exposition, by ∅, some possibilities are:
  • kitab-∅-ım-da = “in my book”
  • kitap-lar-∅-da = “in (the) books”
  • kitap-lar-ım-∅ = “my books”
Including the null segment, there are two number suffixes, seven possessive suffixes, and six case suffixes, giving 2 × 7 × 6 = 84 combinations. Some forms are shared, but are analytically and semantically distinguishable. The inflection tables are in comparison rather simplified. They are essentially two tables: a main one for noun stem + number + ∅ + case, and an optional one for noun stem + number + possessive + ∅, reducing the number of combinations to 2 × 1 × 6 + 2 × 6 × 1 = 24. The requested entry bisikletini (which is the definite accusative of both bisikletin (“your (singular) bicycle”) and bisikleti (“his/her/its bicycle”), and therefore a shared form) contains both a possessive suffix and a case suffix, so it is not included in the 24 forms provided at bisiklet#Declension. All this is peanuts compared to Turkish verb conjugations, where you’ll have a real combinatorial explosion if you try to include all possible forms. I think this is more a grammatical issue than a lexical issue, but ultimately it is a policy issue; does the maxim “all words” really mean all 10,080 possible, completely regular and predictable inflections of some base form in some agglutinative language?  --Lambiam 11:25, 18 September 2020 (UTC)
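To make the count above concrete, here is a minimal Python sketch that only enumerates slot combinations. It assumes the inventory given in the post (two number values, seven possessive values, six case values, each counting ∅), and it assumes the six case slots are nominative (∅), accusative, dative, locative, ablative and genitive. The labels such as `2SG` and `ACC` are hypothetical placeholders, not real suffixes; no vowel harmony or consonant alternation is attempted.

```python
from itertools import product

# Abstract slot labels; "∅" marks an absent suffix. These are hypothetical
# placeholders, not Turkish surface forms: real suffixes vary with vowel
# harmony, consonant alternation and buffer consonants, ignored here.
NUMBER = ["∅", "PL"]
POSSESSIVE = ["∅", "1SG", "2SG", "3SG", "1PL", "2PL", "3PL"]
CASE = ["∅", "ACC", "DAT", "LOC", "ABL", "GEN"]


def all_slot_combinations():
    """Every stem + number + possessive + case combination, in that order."""
    return list(product(NUMBER, POSSESSIVE, CASE))


def table_combinations():
    """Combinations covered by the two simplified tables:
    number + ∅ + case, and number + possessive + ∅ (possessive not ∅)."""
    main = [(n, "∅", c) for n in NUMBER for c in CASE]
    possessive = [(n, p, "∅") for n in NUMBER for p in POSSESSIVE if p != "∅"]
    return main + possessive


full = all_slot_combinations()
shown = table_combinations()
shown_set = set(shown)
# Combinations needing both a possessive and a case suffix, e.g. bisikletini.
missing = [combo for combo in full if combo not in shown_set]

print(len(full), len(shown), len(missing))  # 84 24 60
```

Under these assumptions the two readings of bisikletini, (∅, 2SG, ACC) and (∅, 3SG, ACC), both land in the 60 uncovered combinations, i.e. those with a non-null possessive and a non-null case (2 × 6 × 5).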

The Azerbaijani inflection table has the same problem, but it has the complete inflection, so why can't we do the same with Turkish? Some Wiktionaries already have a complete one. If we add the inflection for bisikletini, we can just add it as the second person possessive and third person possessive in the accusative form. Compare these examples: Bisikletin nerde? = Where's your bicycle? (nominative, second person possessive). Bisikletini istiyorum = I want your bicycle (accusative, second person possessive). Lagrium (talk) 14:01, 18 September 2020 (UTC)

Kyiv

Perhaps Kyiv should be promoted from an “alternate spelling,” in the wake of the renaming of w:KyivMichael Z. 2020-09-17 16:49 z

The city hasn't been renamed AFAICT; as long as more English-language sources call it Kiev than Kyiv, I think {{alt sp}} is still right. Kyïv, Kyjiv, and Kyyiv may also be attestable, but none of them have an entry yet. —Mahāgaja · talk 17:27, 17 September 2020 (UTC)
Kyiv and Kiev are Romanizations of different names, in different languages, for the city. So they are, IMO, “alternative forms” rather than “alternate spellings”.  --Lambiam 11:40, 18 September 2020 (UTC)
The difference in their written form is only their spelling—not capitalization, not diacritics, or anything else (not that I care much about the label). Michael Z. 2020-09-19 16:37 z
Alternative spellings have to share an etymology. These two forms don't. Ultimateria (talk) 17:37, 19 September 2020 (UTC)
The same could be said about "castle" and "chateau". Would you say that Myanmar and Burma are alternative spellings? How about Beijing and Peking?
Interesting comments. (Who says alternative spellings have to share an etymology?) They sort of do share an etymology, because the English name was strongly influenced by both Russian and Ukrainian at a time when written Ukrainian was suppressed in the Russian empire, when Ukrainian was often referred to as “Russian” or “Little Russian,” especially by the literate class in the Russian empire, and when the old orthography could write it exactly the same in two languages: Киѣвъ (the letter yat was pronounced differently and led to different sounds and letters in modern Russian and Ukrainian). There’s a lot of historical Ukrainian influence on English that still carries that legacy.
The spoken name \key-ev\ is not two different words. It can be transcribed with either of two spellings, depending on whether you use a current style manual or still use your grade four textbook.
Peking and Beijing came about similarly, through transcription of Cantonese and Mandarin languages, respectively. Pronounced differently but written similarly, like Киѣвъ > Кіев/Київ. But English Kyiv/Kiev are spoken the same. Michael Z. 2020-09-19 19:08 z
That's a good point. {{alt form}} is better than {{alt sp}} for this. —Mahāgaja · talk 16:28, 18 September 2020 (UTC)
Indeed the city hasn’t been renamed in Ukrainian or in Russian, but it is in the process of being renamed (re-spelled) in English, much like Peking→Beijing. More current sources are now writing Kyiv, and according to current style manuals Kiev deserves the label “dated.” Wikipedia has changed its practice as a follower, not a leader, and this only after the delay of a six-month moratorium (gag order) and three months of debate on the associated talk page. Michael Z. 2020-09-19 16:43 z
The changes in style guides are motivated by politics. Kiev is still more common. --Vahag (talk) 18:10, 19 September 2020 (UTC)
Our dictionary shouldn’t share your political prejudices. It is based on usage, which we ascertain through references. And it shouldn’t give our readers writing advice that makes them look ignorant of current standards. Michael Z. 2020-09-19 19:09 z
Google Ngram is too coarse. The last data point is all of 2019 averaged out. There was a tipping point in usage during October–November. If you’re interested in politics, the very conservative BBC and New York Times use Kyiv, Breitbart uses Kiev. Michael Z. 2020-09-19 19:20 z

Appendix for strings in unidentified or uncertain languages?

Over in the User_talk:Karaeng_Matoaya#The_enigmatic_poem_of_Nukata_no_Ōkimi thread, a few of us were discussing a particular poem in the Man'yōshū anthology of Old Japanese poetry, completed around 759 C.E. Poem number 9 has frustrated readers for centuries, as the first two stanzas may be written in a different language entirely.

That discussion gave rise to a question about whether Wiktionary might have space for collecting snippets of text like this, where the underlying language might not be known. Clearly, a mainspace entry would be inappropriate. But what about an Appendix page?

Do we perhaps already have such an Appendix area set up? ‑‑ Eiríkr Útlendi │Tala við mig 19:05, 17 September 2020 (UTC)

Why is mainspace inappropriate? It's attested, and it's what we already do: see Category:Undetermined lemmas. —Μετάknowledgediscuss/deeds 19:14, 17 September 2020 (UTC)
The Buyla inscription has word breaks, so you know what to make entries for. But these verses (there are other examples, especially in ancient Chinese sources, as discussed there) are entire sentences (sometimes entire songs) of words in an unknown language where even academics can’t agree on where to split the word boundaries, and it doesn’t seem quite right to treat sentences or songs as mainspace entries.--Karaeng Matoaya (talk) 22:36, 17 September 2020 (UTC)
In that case, we could create entries for each character (as for the Phaistos Disc signs)... although I don't object to creating an appendix, which would, I presume, present the text and various scholarly ideas of where to break it up and what it might mean? - -sche (discuss) 18:43, 19 September 2020 (UTC)
@-sche: Unlike the case of the Phaistos Disc, where the characters are literally undeciphered, the main languages that brought about this discussion are all transcribed in conventional Chinese characters that we already have CJKV entries for, so it's only a matter of reconstructing the pronunciations and orthographic practices at the time and place of the transcription and comparing the resulting sequence to known languages from those parts of Eurasia. Scholars have gone quite far in deciphering many of these cases, but of course each interpretation attempt only makes sense as a whole; if a certain passage is determined to be Proto-Turkic from the beginning, you're going to get entirely different results than if you decided that it was Para-Mongolic. So having separate-character entries would not be particularly productive, while discussing the passage as a whole would. This is why I think an appendix entry like Appendix:Song of the Yue Boatman would work better.--Karaeng Matoaya (talk) 01:06, 20 September 2020 (UTC)

Quote adder redux[edit]

In a recent discussion about templatising citations it became clear that some editors don't use templates for citations because it's more difficult (parameters must be remembered, brackets matched etc.).

Other communities have already addressed this problem and created the Citoid extension. It extracts metadata (author, date, title, etc.) from a URL/ISBN/DOI and generates template code which can be directly inserted into the page. This has been suggested before, but not much has happened since. I'd like to get the ball rolling; if there's interest, we'd need to set up a vote to get the extension installed, and create mapping files to work with our templates. – Jberkel 13:58, 18 September 2020 (UTC)

Calling other users involved in that discussion (@Equinox, DTLHS, DCDuring) for their support. inqilābī [ inqilāb zindabād ] 18:11, 19 September 2020 (UTC)
It looks like it takes a fair amount of work to set it up. I'd like to see how a version configured for Wiktionary does on older cites using URLs from Google Books and no ISBN (i.e., pre-1970). How does it do on translated works? On edited works with many authors of individual works? And on quotations taken not from the speaker's or author's own work, but from some secondary source? What about digging out when the work was actually written rather than when the specific print edition was published (though that publication date might be of supplementary interest)? I expect that a lot of manual work will be required for these situations, and that the relative ease of Citoid will lead folks to skip doing the whole job. But the current setup does a pretty good job of that already. DCDuring (talk) 00:48, 20 September 2020 (UTC)
BTW, is anyone working on tools like those developed by ELEXIS? Does anyone have access to and opinions about these tools? Also, has anyone worked with Sketch Engine? DCDuring (talk) 01:19, 20 September 2020 (UTC)

CFI for misspellings[edit]

Hi, I've created a vote at Wiktionary:Votes/2020-09/CFI for misspellings to begin next week. I appreciate any feedback. Please let me know on the talk page of any relevant discussions I've missed. Ultimateria (talk) 19:30, 18 September 2020 (UTC)

I've renamed the vote to reflect its scope of rewriting the Spellings section of CFI: Wiktionary:Votes/2020-09/Misspellings and alternative spellings. Ultimateria (talk) 21:25, 18 September 2020 (UTC)

FWOTDs[edit]

For the past year, I have been selecting the Foreign Words of the Day (FWOTDs) here on Wiktionary. However, I'm at a point where I won't be able to do them any longer and would like to wash my hands of them, so I'm asking for someone to take them on. To set them, choose words from the nominations and add them to the subpages here using {{template:FWOTD}}.

If you have any questions about how to set them, look at previous FWOTDs (e.g. September) for guidance or ask me or User:Metaknowledge (who set them before me).

  • Even if you're not willing to take on the job of setting FWOTDs on a permanent basis, setting a few would be immensely helpful.
  • There's currently a shortage of FWOTD nominees. Nominating more here would also be great; remember to follow the guidelines on that page.
  • I'm open to the idea of potentially changing how Wiktionary does FWOTDs, as long as I'm not involved. As I said, I'd like to wash my hands of them.

Hazarasp (parlement · werkis) 05:39, 20 September 2020 (UTC)