Wiktionary:Beer parlour/2019/April


Including or excluding ethnic slurs under synonyms for ethnicity

Recently, User:Jimbo2020 removed the ethnic slurs/derogatory terms under the synonyms of Somali. On their user talk page, they argue that the precedent is to "not have ethnic slurs as synonyms unless they are historically significant". The examples for entries with synonyms including ethnic slurs were Chinese, German and African-American, while those that did not include them were Italian, Finn, and Oromo (the last two of which do not have any synonyms). I don't think there are any specific guidelines on this, so it would be a good idea to come up with at least something. — surjection?〉 18:42, 1 April 2019 (UTC)

I don't know what "historically significant" means. All words are "historically significant". DTLHS (talk) 18:44, 1 April 2019 (UTC)
Wiktionary is not censored. If they are or were at one time used as synonyms, they belong in the list of synonyms. —Rua (mew) 18:54, 1 April 2019 (UTC)
It's not about censoring. Kraut would be a "historically significant" entry under German. Listing a marginally used neologism like muzrat (which should be deleted btw) under Muslim would not be. Almost all ethnonym pages do not list any ethnic slurs unless the word has a storied history in the English language or is particularly relevant to English speakers. BTW, the American page currently has Ameritard listed as a hyponym. Does that look right to you? Jimbo2020 (talk) 18:57, 1 April 2019 (UTC)
Based on the definition as "stupid or ignorant American", I would say yes, since it describes a subset of Americans. — surjection?〉 18:58, 1 April 2019 (UTC)
Again, are they synonyms? Then we list them as synonyms. Have you ever looked at some of our Thesaurus pages? They're full of offensive terms. —Rua (mew) 18:59, 1 April 2019 (UTC)
Once again, this is about a general style precedent: check the pages for Arab, Pakistani and Italian. Where are the slurs listed as synonyms? Jimbo2020 (talk) 19:01, 1 April 2019 (UTC)
You've pointed out that those entries are missing some synonyms, so someone will hopefully get around to adding the missing ones. —Rua (mew) 19:04, 1 April 2019 (UTC)
This would seem to undermine our mission as a descriptive reference work. It would not undermine it if we RfVed allegedly offensive terms, though the RfV process itself would advertise them. DCDuring (talk) 19:09, 1 April 2019 (UTC)
I don't understand. How does including all synonyms go counter to our mission? The opposite seems true to me. —Rua (mew) 19:42, 1 April 2019 (UTC)
Sorry about the misleading indentation and the ambiguous deixis of this. I was referring to the deletion of content on apparent grounds of offensiveness. DCDuring (talk) 22:34, 1 April 2019 (UTC)
Ah, ok. Thanks for clarifying. —Rua (mew) 22:46, 1 April 2019 (UTC)
Mehhh. I do sympathize with the desire to not (in effect) "promote" obscure derogatory terms by putting them as synonyms of common terms (like the example of muzrat on Muslim, above), but precedent certainly seems to be that they would be included, with appropriate tags of course (e.g. "derogatory, rare"), along with any alternative spellings (including e.g. rare and obsolete ones which someone might also complain about the oddness of "promoting"). A possible compromise would be to put them in a collapsed box like related and derived terms are put in, or to offload them to a Thesaurus page and have the synonyms section direct people to it. But what I would regard as the usual approach, of just listing them as synonyms with appropriate tags, seems OK. - -sche (discuss) 00:05, 2 April 2019 (UTC)
Terms that meet our CFI should be included, also if they are offensive – until we decide to change the CFI. At the same time we should be careful to mark offensive terms as offensive. Under German (the noun) the synonym “Kraut” is labelled offensive. I think “skinnie” is at least as offensive – and not only to the person being derogated with the slur, but to anyone with sensibility.  --Lambiam 20:31, 2 April 2019 (UTC)
Note that the change did not remove the entries for skinnie/Skinnie, just their appearance as synonyms for Somali. It's not entirely unreasonable to take the position that a slur is not an exact synonym for the corresponding more neutral demonyms. But a slur does have a semantic relationship to the corresponding demonym. Is it an antonym, a coordinate term? Should it appear under 'See also'? For consistency should we make sure that mutt and mongrel do not appear as synonyms of mixed-breed/mixed breed (Lemmings have it.) because they are pejorative?
The likelihood that anyone with pejorative intent will come to Wiktionary to find some good ones is negligible. It is much more likely that someone will come here looking to object to our inclusion of the pejoratives. So this seems to be a matter of w:virtue signalling rather than something likely to have a bad effect outside of the potential controversy. It is a question of our ascription of virtue to descriptivism vs. the proscription against any purported encouragement or even license of the use of ethnic slurs. DCDuring (talk) 21:30, 2 April 2019 (UTC)
If the issue is that words like "muzrat" are absurdly rare (which may be true), it seems this is a problem with listing synonyms of anything, not just of ethnicities. Equinox 21:39, 2 April 2019 (UTC)

Fun game again

Hi all. As last year we had an excellent time playing a multilingual board game, I'd like to repeat this year. I set up Wiktionary:Random Competition 2019. We'll start sometime soon provided there's someone to play with me. --I learned some phrases (talk) 10:26, 2 April 2019 (UTC)

URL shortener for the Wikimedia projects will be available on April 11th

Hello all,

Having a service providing short links exclusively for the Wikimedia projects is a community request that came up regularly on Phabricator or in community discussions.

After joint work by developers from the Wikimedia Foundation and Wikimedia Germany, we are now able to provide such a feature. It will be enabled on April 11th on Meta.

What does the URL Shortener do?

The Wikimedia URL Shortener is a feature that allows you to create short URLs for any page on projects hosted by the Wikimedia Foundation, in order to reuse them elsewhere, for example on social networks or on wikis.

The feature can be accessed from Meta wiki on the special page m:Special:URLShortener (which will be enabled on April 11th). On this page, you will be able to enter any web address from a service hosted by the Wikimedia Foundation, generate a short URL, and copy it for reuse anywhere.

The format of the URL is w.wiki/ followed by a string of letters and numbers. You can already test an example: w.wiki/3 redirects to wikimedia.org.

What are the limitations and security measures?

In order to ensure the security of the links, and to avoid shortlinks pointing to external or dangerous websites, the URL shortener is restricted to services hosted by the Wikimedia Foundation. This includes, for example, all Wikimedia projects, Meta, Mediawiki, the Wikidata Query Service, and Phabricator (see the full list here).

In order to avoid abuse of the tool, there is a rate limit: logged-in users can create up to 50 links every 2 minutes, and the IPs are limited to 10 creations per 2 minutes.
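
As an illustration only, the rate limit described above could be modelled as a sliding window. This is a minimal sketch, not the URL Shortener's actual implementation: the class name, the `kind` labels and the choice of a sliding window are all assumptions; only the numbers (50 and 10 creations per 2 minutes) come from the announcement.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 120             # "every 2 minutes" in the announcement
LIMITS = {"user": 50, "ip": 10}  # per-window limits quoted in the announcement


class SlidingWindowLimiter:
    """Hypothetical limiter; not the actual URL Shortener implementation."""

    def __init__(self):
        # Maps (kind, identifier) -> timestamps of recent link creations.
        self._events = defaultdict(deque)

    def allow(self, kind, identifier, now):
        """Return True and record the creation if the caller is under its limit."""
        events = self._events[(kind, identifier)]
        # Drop timestamps that have fallen out of the 2-minute window.
        while events and now - events[0] >= WINDOW_SECONDS:
            events.popleft()
        if len(events) >= LIMITS[kind]:
            return False
        events.append(now)
        return True
```

With this sketch, an IP address making one request per second is accepted ten times and refused on the eleventh, then accepted again once the oldest request is two minutes old.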

Where will this feature be available?

In order to enforce the rate limit described above, the page Special:URLShortener will only be enabled on Meta. You can of course create links or redirects to this page from your home wiki.

The next step we’re working on is to integrate the feature directly in the interface of the Wikidata Query Service, where bit.ly is currently used to generate short links for the results of the queries. For now, you will have to copy and paste the link of your query in the Meta page.

Documentation and requests

Thanks a lot to all the developers and volunteers who helped move this feature forward and make it available today for everyone in the Wikimedia projects! Lea Lacroix (WMDE) (talk) 11:57, 3 April 2019 (UTC)

The relationships between lemmas and forms

Why is colours the plural of colour and not of color? The obvious answer to this question would be that the spellings are different. But I ask you to look a little deeper at this question. All our definitions, etymology and translations are currently on the page color, so that is clearly the lemma. Yet, if you look up colours, then you don't get sent to the lemma, but instead to colour, which doesn't actually have any information and just redirects you a second time. A lot of our entries have this idea that there is some kind of "main" term, a lemma of sorts, which has inflections. But as you saw here, the lemma isn't always the actual lemma (the page that defines the term). Instead, we've created a kind of intermediate tier that is not a lemma, yet it has inflections as if it were a lemma. The result is this double indirection.

Having to hunt for links just to get to the definitions of a term is really bad for users. Someone who looks up colours is not interested at all in colour, which has no useful information. They are looking for color, where the definitions, etymology, translations and everything else useful are. And it begs the question: why is colours not defined as an alternative form of colors? It's equally valid, after all. Moreover, forcing this kind of "sublemma" structure gets really confusing in cases where it doesn't work so neatly. A single form could belong to multiple possible sublemmas (alternative forms). better is the comparative of good, but it is equally the comparative of the alternative goode. In highly inflected languages, you can have quite complicated situations, where there are multiple possible lemma forms, yet all the other inflections are shared. Inflections can sometimes have their own inflections; participles are well-known examples. All this increases the mental burden on the editor who somehow has to figure out how to translate the situation into Wiktionary's conventions, and also on the user who has to jump through multiple hoops to get to the real lemma.

I would like to re-examine the relationship we have between lemmas and forms. There is really only one true lemma here, because only one of the entries has a definition. It's the relationships between the different forms that is throwing us off, because we introduce concepts like "alternative forms": lemmas that aren't lemmas. The way I would analyse the situation above is that there is one lemma (which itself has no inherent written representation) with multiple possible representations of both the singular and plural. color and colour are singular forms of this lemma, and colors and colours are plural forms of this lemma. Each of the forms is used by some subset of English speakers, but they all belong to one lemma, not two. We are hamstrung by the need to place definitions, etymology and translations on the page of one of those forms, and by convention that is the singular, so we picked one of the possible singular forms and placed everything there. But it would be beneficial if we could let go of the idea that the singular is therefore "special", that it has its own inflections and cannot be an inflection itself. There is really no need for alternative forms, and the complications they bring, if we can accept that color is simply the lemma of four forms: color, colour, colors and colours. —Rua (mew) 20:36, 3 April 2019 (UTC)

I don't like your way of doing it because it suggests to me that someone took the plural colours and decided to respell the plural specifically. I think the real solution here is to come up with a system that can show the full entry regardless of which spelling is visited, with appropriate modifications (I realise this won't be easy due to accuracy of citations etc.). Equinox 20:45, 3 April 2019 (UTC)
That's only because we have to choose one of the possible spellings/forms to place the definition at. If we didn't have to do that, if the lemma could be entirely detached from the way it's spelled, then that would no longer be a problem. They'd simply all be lemmas of entry 19515, or something like that. Unfortunately, as I said, we're hamstrung by having that requirement. However, I don't think that should be an excuse to convolute the relationships on purpose, by introducing multiple "fake" lemmas as intermediates when there is really only one. —Rua (mew) 20:50, 3 April 2019 (UTC)
Also note, I'm not directly proposing anything to be supported or opposed. Rather, I'd like people to challenge the assumptions we've always made on Wiktionary, and consider other options. Some of what I said is inspired by Wikidata's data model. Wikidata strictly separates lexemes from forms, where lexemes contain one or more forms, but always at least one. Forms have grammatical properties such as "singular" or "plural", they have a written representation, and they have a pronunciation, all of which lexemes do not have. The representation of the lexeme (the lemma in their terminology) is not strictly tied to how it's written. The lexeme for our color is titled colour/color for example. It seems that most of the problems I described above arise from tying lexemes too closely to one particular written form. If we could treat the lemma form as simply the place where everything is gathered, and not as a word, then things might be easier for us. —Rua (mew) 21:01, 3 April 2019 (UTC)
I can supply a slightly more extreme example: Medises, an inflected form of Medise, a (variant capitalization of medise, which is itself a) variant spelling of medize. I wouldn't want to define Medises as being an alt form of medizes and link to that non-lemma, but I think we could simply pipe the link to the lemma, i.e. define Medises as: Third-person singular simple present indicative form of [[medize|Medise]]. "Colours" could likewise be: plural of [[color|colour]]. The only downside is that that might be what Wikipedia calls an "Easter egg", a link that doesn't go where a reader would necessarily expect, if they expect it to go to the display form and not the place where the content is. However, that doesn't seem much different from how e.g. Mēdōrum goes to Medorum, not Mēdōrum, and since medize mentions Medise as an alternative spelling, a reader should not be confused for long. Would that be a simple solution? (Is this what you were already thinking of, or...?) - -sche (discuss) 23:28, 3 April 2019 (UTC)
How about something like this: at colours have "plural of colour (see [[color]])" giving "plural of colour (see color)". That way they see both forms, but they're linked to the lemma. Chuck Entz (talk) 03:10, 4 April 2019 (UTC)
What should be done for cases where the inflections belong to multiple alternative forms of the same lemma? Or the extreme case where the lemma form is the only form that differs between them? —Rua (mew) 11:39, 4 April 2019 (UTC)

@Rua: Sorry for chiming in. I think the problem is that the term “form” in “lemma form” and “alternative form” on Wiktionary is used to refer to two different concepts: spellings and inflected forms. In the post above, color and colour are different spellings of the same word form ˈkʌl.ə(ɹ), while ˈkʌl.ə(ɹ) (color, colour) and ˈkʌl.əz (colors, colours) are different forms of the same lexeme color. So there are really two levels of hierarchy, but current Wiktionary terminology flattens them into one. If I'm not mistaken, the linguistic definition of a word (word form, to be precise) is its sound shape plus its meaning, and the spelling is largely irrelevant. I touched on this point below, where I distinguished two kinds of categories, one dealing with word (form)s (e.g. CAT:Japanese proper nouns) and the other dealing with spellings (e.g. CAT:Japanese terms written with two Han script characters). --Dine2016 (talk) 05:37, 12 April 2019 (UTC)
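
The two-level hierarchy discussed in this thread (a lexeme owns one or more forms, and each form has one or more spellings) can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the `lexeme_id` value and the placeholder definition text are hypothetical, and the structure is loosely modelled on the Wikidata lexeme/form split mentioned above rather than on any existing Wiktionary code.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """One inflected form of a lexeme, together with all of its spellings."""
    features: tuple   # e.g. ("singular",) or ("plural",)
    spellings: list   # e.g. ["color", "colour"]


@dataclass
class Lexeme:
    """The lexeme owns the definitions; forms only point back to it."""
    lexeme_id: str    # hypothetical identifier, in the spirit of "entry 19515"
    definitions: list
    forms: list = field(default_factory=list)


def build_index(lexemes):
    """Map every spelling straight to (lexeme, form): one hop, no intermediate page."""
    index = {}
    for lexeme in lexemes:
        for form in lexeme.forms:
            for spelling in form.spellings:
                index.setdefault(spelling, []).append((lexeme, form))
    return index


# Example data for the case discussed in the thread (definition is a placeholder).
color = Lexeme(
    lexeme_id="L-color",
    definitions=["(placeholder definition)"],
    forms=[
        Form(("singular",), ["color", "colour"]),
        Form(("plural",), ["colors", "colours"]),
    ],
)
index = build_index([color])
```

Looking up `index["colours"]` in this sketch yields the lexeme and its plural form directly, which is the single-hop behaviour the thread is asking for. A spelling shared by several lexemes or forms simply gets several entries in its list.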

Proposed change to zh-der

zh-der currently automatically provides the Mandarin pinyin for entries that have Mandarin pinyin in zh-pron. But for those entries which don't have Mandarin pinyin in zh-pron, no romanization is given. I propose including the non-pinyin romanizations like the Yueyu Pinyin and Min Nan POJ. It does not have to be well thought out or well planned at this stage, it just needs to happen and then be refined over time. --Geographyinitiative (talk) 22:51, 4 April 2019 (UTC)

It would be very confusing to mix up different romanisations. Also, I think this topic is for Chinese editors only, so it can be discussed at Wiktionary talk:About Chinese rather than here. --Anatoli T. (обсудить/вклад) 23:14, 4 April 2019 (UTC)
moved to Wiktionary talk:About Chinese per suggestion --Geographyinitiative (talk) 23:30, 4 April 2019 (UTC)



Are there any IPA-to-speech projects here?

I see there are a few FOS engines out there. How would/could they be incorporated?

Thanks. Saintrain (talk) 18:37, 6 April 2019 (UTC)

No such projects here, and also no plans. We’d rather have no audio representation than an inaccurate one. Even in narrow transcription IPA cannot reflect all nuances of human speech.  --Lambiam 07:06, 7 April 2019 (UTC)

Vote on excluding typos and scannos is live

A heads up: the vote on a proposed change to CFI that would exclude typos and scannos is now open. (See also the thread above titled CFI-amendment: excluding typos and scans.)  --Lambiam 07:13, 7 April 2019 (UTC)

Read-only mode for up to 30 minutes on 11 April

10:56, 8 April 2019 (UTC)

Fortunately for us, English Wiktionary isn't on the list at phab:T220080. — Eru·tuon 10:59, 8 April 2019 (UTC)

Category:English coordinated pairs

I came across this category and I'm trying to figure out what a coordinated pair is. We don't have a coordinated pair entry, and the description at the top of the category is not very helpful either. Could someone write a better description so me and future mes know what it's for? Thank you! —Rua (mew) 18:36, 8 April 2019 (UTC)

The membership in the category is an ostensive definition of the category. The meaning is SoP. I'll review the membership to see if any mes have erroneously included any terms. DCDuring (talk) 19:05, 8 April 2019 (UTC)
But what's "coordinated" about the pairs then? I really don't get it. It seems to be a category for just any pair of words that happen to appear together in an entry name. —Rua (mew) 19:39, 8 April 2019 (UTC)
I said I'd take a look and I have.
The easy cases are terms linked by coordinating conjunctions, principally and, or and their word-like equivalents ' n ', &, et. In these, each term in the pair is at the same grammatical and (usually) semantic level as the other. slowly but surely seems similar. The harder cases are the pairs linked by commas or hyphens/dashes. In ding-dong, willy-nilly (and others) the elements may or may not have distinct lexical existence and are, in any event, in Category:English reduplications. I'd be inclined to remove these from the category and refer to the reduplication category on the Category:English coordinated pairs page, either as a "see also" or by making it a subcategory. In another day, another dollar, finders, keepers, finders, keepers; losers, weepers, first come, first served and others, there is no coordinating conjunction. The semantic link seems to be not coordination but implication. I'd be inclined to remove these only if there is another plausible short category name that would describe them. I haven't thought of such a name.
I'd like other opinions. DCDuring (talk) 19:43, 8 April 2019 (UTC)
There is Coordination (linguistics) in Wikipedia. I don't pay much attention to the categories, but it would be nice if it had a description or a link to a Wikipedia article which would describe it. -Mike (talk) 21:36, 8 April 2019 (UTC)
If the description is updated, consider also updating Category:English coordinated triples. - -sche (discuss) 22:02, 8 April 2019 (UTC)

Japanese entry layout revisited

Hi. I'd like to propose the following long-term changes to the Japanese entry layout, and would like to have some of them incorporated into WT:AJA

  • A new citation format of Japanese terms: 日本 (にほん, Nihon, にっぽん, Nippon) or やまと (大和, , Yamato).
    • Currently many Japanese words are cited with either {{m|ja|...}} or {{ja-r}}. The disadvantage of the former is that there is no way to show both kanji and kana, or support multiple readings. The disadvantages of the latter are that (1) it takes up too much vertical space, discouraging editors from adding more synonyms, derived terms, etc., and (2) the font size of the kanji is too big compared to normal citations of Japanese terms, causing disharmony, and the size of the kana is too small on some computers, as Eirikr reports. I would like to employ the new format to cite all Japanese words and reduce the use of ruby to examples, and I think the best way is to modify {{ja-r}} to use the new format by default. This way we don't need to create new templates or mass-update mainspace entries. Please see User talk:Suzukaze-c#CSS for more.
    • I would also like to propose the new syntax {{ja-r|KANJI:KANA}} in addition to {{ja-r|KANJI|KANA}}. Editors can still use the second format, but other templates relying on {{ja-r}} can take advantage of the former. The reason is as follows: For most languages, one parameter is enough to enter a word (e.g. {{m|en|English}}), and the format of templates is pretty predictable (e.g. {{compound|en|place|holder}}). For Japanese, however, two parameters are often needed (e.g. {{ja-r|日本語|^にほんご}}), leaving different ways to place these parameters (e.g. {{ja-compound|日本|^にほん||}} versus {{ja-vp|終える|終わる|おえる|おわる}}). If we build the new syntax KANJI:KANA, then templates relying on it will have more consistent and more predictable syntaxes (e.g. {{ja-compound|日本:^にほん|語:ご}} and {{ja-vp|終える:おえる|終わる:おわる}}), which are also more interchangeable with kanji/kana only versions (e.g. {{ja-vp|終える|終わる}}).
    • What about automatic fetching of the reading from the mainspace entry? For example, {{ja-r|日本料理}} should produce 日本料理 (にほんりょうり, Nihon ryōri) while {{ja-r|日本}} could produce 日本 [Term?] because there are many readings possible.
  • Eliminate sortkeys. Once the use of soft-redirection ({{ja-see}}) is established, there will be no need to categorize the kanji terms under kana. This is because {{ja-see}} copies categories from the lemma spelling to the non-lemma spellings, so all spellings of the term will appear in the same category. If we eliminate sortkeys, the kana part and the kanji part of a category will contain the same set of vocabulary, once in kana and once in kanji, so there is nothing to lose. More importantly, editors are liberated from the constant need to watch for categorizing templates (such as {{lb|ja|...}}) and add sortkeys.
  • Is there consensus on whether to lemmatize the wago vocabulary at kana spellings? I prefer to lemmatize terms at the most common spelling as a general rule, but make the core wago vocabulary an exception to it. First, wago terms have a greater degree of independence from and variety in combination with kanji. The most common kanji spelling does not necessarily carry the meaning intended where the term is used (e.g. 帰る返り点), but kana is acceptable everywhere. Second, the etymologies of non-transparent-compound wago terms are best illustrated by the kana form. In etymology sections, “くら (, kura) + (, wi)” looks better than “ (kura) + (wi)”. (By the way, when the focus is on the meaning, such as in synonym sections or entries from other languages, I think the kanji should still be put before kana.) On the other hand, I'm not sure about whether to do the same for transparent compounds like 繰り返す, which have less justification. This means that the border between “terms lemmatized at kana” and “terms lemmatized at the most common spelling (usually a kanji spelling)” can be very vague and arbitrary.
  • What about a custom reference template? {{ja-ref|DJR}} is much easier to type than <ref name="DJR">{{R:Daijirin}}</ref>. For common references, we can also make the template link to Wiktionary:About Japanese/references, rather than generating a <ref>, because ===References=== <references/> is also tedious to type :)
  • Simplify the interface of inflection templates. The current syntax is unnecessarily complex. I think only two formats are needed: {{ja-infl|type=1}} (for わらう) and {{ja-infl|つれて いく|type=iku}} (for 連れて行く; the space is merely for the purpose of romanization). Everything else, from slight irregularities (e.g. 行く, ある) to separating the stem and the ending (e.g. {{ja-go-u|わら}}) as well as detecting |sik= should be built into the module. This should make it easy for {{ja-see}} to copy inflection tables around. With the current templates, {{ja-see}} would need to recognize both Category:Japanese inflection-table templates and {{ja-conj-bungo}} as well as learn their quirks (such as remembering to add |sik= when copying from もうでく to 詣で来), which is too tiring and error-prone.

(To be continued.) (Notifying Eirikr, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Nardog, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!): --Dine2016 (talk) 10:04, 9 April 2019 (UTC)
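
To make the proposed KANJI:KANA syntax concrete, here is a minimal parsing sketch. It is written in Python purely for illustration (the real templates are backed by Lua modules), the function names are hypothetical, and it assumes a plain ASCII colon as the separator with a missing reading represented as `None`.

```python
def parse_term(arg):
    """Split one 'KANJI:KANA' argument; kana is None when no reading is given."""
    kanji, sep, kana = arg.partition(":")
    return (kanji, kana if sep else None)


def parse_args(args):
    """Parse positional arguments that each carry one term in KANJI:KANA form."""
    return [parse_term(a) for a in args]


# The two examples from the proposal: with readings, and kanji only.
with_readings = parse_args(["日本:^にほん", "語:ご"])
kanji_only = parse_args(["終える", "終わる"])
```

The point of the design is visible in the sketch: every positional argument maps to exactly one term, so a template can accept the kanji-only and the kanji-plus-kana variants without changing where its parameters sit.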

  • Generally in favor.
What is sik?
‑‑ Eiríkr Útlendi │Tala við mig 21:37, 9 April 2019 (UTC)
It is short for suffix_in_kanji, one of the parameters of {{ja-conj-bungo}}, used for example in the conjugation of .  --Lambiam 15:11, 10 April 2019 (UTC)
  • I have no real objections (though I don't think I understand all of the technical specifics). I do support the principle of using the most common form of a lemma rather than having a language-wide rule. In other words, treat e.g. 日本, 学校, する, and きれい each as 'main' entries, rather than having a general preference for kanji or kana. Cnilep (talk) 03:54, 10 April 2019 (UTC)
  • I think these are improvements that I expect to be uncontroversial. Some of these proposals are easy to implement, but I feel a plan is needed on how to roll out the more involved changes. As to lemmatization – apart from the fact that we need to strike a balance between what is most useful to the users and what is a reasonable effort to ask from the editors – which form to prefer is an issue for all languages offering alternatives that in the end needs to be addressed on a case-by-case basis, and if for any specific case two choices are more or less equally good (or bad), there is no point in losing sleep over which one to choose. It will be helpful to offer advice on such issues in Wiktionary:About Japanese.  --Lambiam 15:11, 10 April 2019 (UTC)
  • Support.
    • As for the new syntax, perhaps it can be implemented in the major link templates, so that we can use {{compound|ja|FOO:ほげ|...}} or {{compound|ko|방:房|...}}. (Or perhaps make it a new parameter in the style of {{{ts}}}, if there is larger objection to :.) Personally, I am worried about : taking on too much responsibility in the linking templates.
    • Sortkeys: no particular comment.
    • Wago: +1 for kana.
    • However, I don't really like {{zh-ref}}, TBH. (Well, I have =3+hr+ expand to ====== + References + <references/> in my IME; maybe that's why I'm not terribly bothered.)
    • inflection templates: Absolutely.
  • Suzukaze-c 19:58, 11 April 2019 (UTC)
    • Support everything but Oppose lemmatising wago terms on kana entries. Like before, we should lemmatise on the actual most frequent Japanese spelling, so  () (yomu) is the lemma, IMO, not よむ (yomu).
    • I don't understand what is going to happen with eliminating sort keys. Will  () (ほん) (Nihon) still be sorted by "に"? Also, how are terms with multiple readings going to be sorted?
    • Welcome to all new Japanese specific templates, they are overdue.
    • I think we also need to add categories for Sino-Japanese terms, similar to the Korean and Vietnamese but possibly split into smaller categories, considering the complexity of etymologies, reduce info in kyūjitai entries. Care should be taken when using Middle Chinese templates for sources but this should be encouraged. --Anatoli T. (обсудить/вклад) 01:22, 12 April 2019 (UTC)
  • Curious as to your opposition to lemmatizing yamato kotoba at the kana spelling? The kanji spellings are irrelevant to the etymologies of yamato kotoba, only being applied later when Chinese characters were borrowed, and lemmatizing at kanji spellings actively obscures cognacy and relationships.
Take the verb tsuku, for instance. By kanji, this could be spelled 付く・着く・就く・即く・憑く・突く・衝く・撞く・搗く・舂く・築く・吐く・漬く・浸く・尽く・歇く・竭く. Most of these 17 spellings are etymologically related, sometimes very closely indeed. Lemmatizing by kanji spelling hides this interrelationship and adds confusion, and necessitates a lot of data duplication across entries. ‑‑ Eiríkr Útlendi │Tala við mig 03:38, 12 April 2019 (UTC)
Agreed re: the failure of sortkeys. The current approach was based on the assumption that the back-end capability would eventually support multiple sortkeys for a given lemma string. We reported the MediaWiki shortcoming years ago, and received zero response from the devs -- 黙殺された. It's clear they don't give two shits, so we clearly need to change our approach if we want something workable. ‑‑ Eiríkr Útlendi │Tala við mig 04:40, 12 April 2019 (UTC)
@Dine2016, Eirikr: OK, agreed and Support on both points and sorry for doing this again to you. I completely forgot about the convincing つく-argument :) --Anatoli T. (обсудить/вклад) 05:24, 12 April 2019 (UTC)
@Eirikr, Atitarev: Honestly speaking, I'm not sure if making an exception for wago terms is really a good idea. One problem with kanji spellings is that the most common spelling does not necessarily cover all meanings of the term (while kana does). For example, the 帰る spelling of かえる does not cover the sense “to turn over”, so that the etymology of 裏返る has to be written as “ (ura, ) + 返る (kaeru, alternative spelling of 帰る in the sense ‘to turn over’)”. If we take a kana-centric approach to wago terms, then the etymology of うらがえる is simply “うら (ura, , …) + かえる (kaeru, 返る, 反る, ‘to turn over’)”. Another problem is that wago terms may appear as the reading/furigana to entirely irrelevant kanji, such as in personal names. However, such problems only concern a small percentage of the wago vocabulary, so I doubt whether it's really worthwhile to employ the kana spelling for all wago terms, especially transparent compounds such as 追い払う(注). I think an alternative approach is either to (1) just lemmatize at the most common kanji spelling, but still list the whole range of kanji with {{ja-spellings}}, and sense division with {{ja-def}}, or (2) break the word into different sense groups (e.g. かえる(帰る・還る) and かえる(返る・反る)), and lemmatize each of them as if they were different words, but use soft redirection for the etymology and pronunciation sections to avoid data duplication (cf. Daijirin's treatment of 帰る as 〔「かえる(返)」と同源〕). This way every word is lemmatized at the most common spelling, and everyone is happy. --Dine2016 (talk) 06:09, 12 April 2019 (UTC)
Um, maybe we can justify the wago exception on the basis that the JA WT is also making it. Or this argument: “if the ‘lemmatize at the most common spelling rule’ were applied for Chinese, then each Chinese word would be lemmatized/mentioned in Simplified Chinese or Traditional Chinese based on whether it is used more frequently in {Mainland China and Singapore} or {Taiwan, Hong Kong and Macau}, which would be too absurd.” --Dine2016 (talk) 06:50, 12 April 2019 (UTC)
Using simplified over traditional as the main entry is a legitimate request, which has been discussed but discarded for very important reasons, etymological and technical. BTW, your link is not working: ja:Wiktionary:項目名の付け方. --Anatoli T. (обсудить/вклад) 07:12, 12 April 2019 (UTC)
  • I don't work on Japanese entries but wanted to make a general remark about something which came up while working on parsing code to find wanted entries (replacement for Template:redlink_category): if we can avoid specialized templates like {{ja-compound}} it will really help to make these sorts of automated tasks much simpler. Otherwise we need to have additional logic to cover the language-specific linking templates. The general idea would be to push the responsibility into the core linking code (which could internally still delegate to other modules). This would keep the template "surface area" small. Another thing to avoid is nesting inside linking templates: I've seen some instances of {{bor|en|{{ja-r|....}}}} which is tricky to parse and produces invalid output. {{bor}} should be able to figure out what to do when used with Japanese entries. – Jberkel 00:16, 16 April 2019 (UTC)
    Support language-specific logic that is incorporated into the "main" templates. —Suzukaze-c 18:19, 18 April 2019 (UTC)
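A minimal sketch of why nested linking templates complicate automated parsing, as described above. The helper name split_template and the arguments given to {{ja-r}} are hypothetical, chosen only for illustration; real wikitext also has [[link|text]] pipes and triple-brace parameters, which this sketch ignores.

```python
def split_template(wikitext):
    """Split '{{name|a|b}}' into its top-level parts, respecting nested {{...}}.

    Simplified on purpose: does not handle [[link|text]] or {{{param}}}.
    """
    if not (wikitext.startswith("{{") and wikitext.endswith("}}")):
        raise ValueError("not a template call")
    inner = wikitext[2:-2]
    parts, current, depth = [], [], 0
    i = 0
    while i < len(inner):
        two = inner[i:i + 2]
        if two == "{{":            # entering a nested template
            depth += 1
            current.append(two)
            i += 2
        elif two == "}}":          # leaving a nested template
            depth -= 1
            current.append(two)
            i += 2
        elif inner[i] == "|" and depth == 0:
            parts.append("".join(current))   # a top-level argument ends here
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))
    return parts


# Hypothetical nested call; the {{ja-r}} arguments are made up for the example.
nested = "{{bor|en|{{ja-r|津波|つなみ}}}}"

naive = nested[2:-2].split("|")      # cuts the inner template into pieces
careful = split_template(nested)     # keeps the inner template as one argument
```

A plain split on "|" breaks the inner template apart, so every tool that reads such entries needs a depth-aware parser like the one above; with a single flat {{bor}} call the naive split would already be enough.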

Allographic variants

@Eirikr, Suzukaze-c, Lambiam, Atitarev While I was editing まま, it occurred to me that there are allographic variants among kanji forms which are fully exchangeable in writing without regard to reading. For example, and are essentially the same kanji whether read as まま or まんま, and 間々 and 間間 are essentially the same kanji form whether read as あいあい, あいだあいだ, ひまひま or まま. Therefore I would like to create a variant of {{ja-see}} with a recursion depth of two instead of one:

For pronunciation and definitions of – see the following entries.
まま【儘】 ⇒まま
[particle]as it is; remaining in a certain state; while; still
まんま【儘】 ⇒まんま
[particle](uncommon) Alternative form of まま (mama, as it is; remaining in a certain state; while still)
(This term, , is an alternative spelling of , which in turn is a kanji spelling of several terms.)

Under this approach, only "canonical kanji forms" will contain a list of readings (e.g. soft redirects to まま and まんま), while other kanji forms will simply redirect to the canonical kanji form (e.g. soft redirects to rather than duplicates its content) and have the template fix "double redirects".
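The two-level lookup described above can be sketched roughly as follows. The table SOFT_REDIRECTS and the function resolve are hypothetical and do not reflect how {{ja-see}} is actually implemented; treating 間々 as the canonical form of 間間 is likewise only an assumption for the example.

```python
# Hypothetical soft-redirect table: each title maps to the titles it points to.
SOFT_REDIRECTS = {
    # allographic variant -> canonical kanji form (assumed for illustration)
    "間間": ["間々"],
    # canonical kanji form -> kana lemmas (readings listed in the discussion)
    "間々": ["あいあい", "あいだあいだ", "ひまひま", "まま"],
}


def resolve(title, max_depth=2):
    """Follow soft redirects for at most max_depth hops and return the titles reached."""
    frontier = [title]
    for _ in range(max_depth):
        if not any(t in SOFT_REDIRECTS for t in frontier):
            break  # nothing left to follow
        next_frontier = []
        for t in frontier:
            # Titles without a redirect entry are kept as they are.
            next_frontier.extend(SOFT_REDIRECTS.get(t, [t]))
        frontier = next_frontier
    return frontier


one_hop = resolve("間間", max_depth=1)   # stops at the canonical kanji form
two_hops = resolve("間間")               # reaches the kana lemmas
```

With a depth of one, the variant only reaches the canonical kanji form; a depth of two is what lets it list the kana entries without duplicating the canonical form's content.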

For this we need to define what the "canonical kanji form" is. For example:

  • Should we allow extended shinjitai and lemmatize tōrō “lantern” at 灯篭, or should we stick to the official shinjitai list and lemmatize it at 灯籠? I think we need to have a standard if we want to build jitai conversion modules.
  • Should we allow the 踊り字 in canonical titles? I prefer to do so, because it's an essential part of modern orthography just as shinjitai and modern kana spelling are.

Also, how should we list the variants of a canonical kanji form, such as kyūjitai? It seems that there are two ways to present kyūjitai: either we limit ourselves to JIS X 0208/0213 to comply with Japanese computing, or we utilize Unicode as much as possible to adhere to the Kangxi dictionary printing forms. If the latter, we might want to list as the kyūjitai of , of , or even 𥳑 of , the last of which seems to lack font support.

(Well, sometimes the orthographic variants are not fully exchangeable. For example, needs to fetch a subset of content from , and so does けふきょう, which complicates matters.) --Dine2016 (talk) 15:47, 12 April 2019 (UTC)

As to “canonical kanji”, inasmuch as lemmatization is automated (it may be prudent to allow overrides), following the 2010 jōyō kanji list has the advantage of being a clear standard and avoiding potentially endless debates over which character is to be preferred on a case-by-case basis – but a disadvantage is that this may (at least in some cases) not be what most users would expect. But, as I wrote above, there is no ideal solution to lemmatization, and occasionally having to follow a soft redirect is (IMO) not a big deal. Whatever is decided, the decisions should be encoded in tables used by the software modules, so that future revisions of the list can easily be incorporated. As to the internal representation of other kanji, I am somewhat partial to Unicode as being the more portable approach across platforms and probably the future also of Japanese Industrial Standards. Disclaimer: I have no experience whatsoever editing Japanese entries, so my opinions should not be assigned as much weight as those of experienced editors.  --Lambiam 16:53, 12 April 2019 (UTC)
I compiled a list of 398 official shinjitai at Template talk:ja-spellings#kyūjitai, of which 67 kyūjitai were found to be encoded using CJK Compatibility Ideographs. Since modern computing systems now have better font support for Japanese glyphs, I would prefer to comply with Japanese computing for better searchability. We can still list older forms such as and 𥳑 which are not in JIS X 0208/0213 as "historical kanji" rather than "kyūjitai" and nonstandard simplified forms such as as "extended shinjitai" rather than "shinjitai". KevinUp (talk) 02:51, 13 April 2019 (UTC)
It seems reasonable to me. Perhaps we should enforce the use of "official" Japanese kanji as main spellings, including 籠, for the sake of consistency. And I prefer 々. —Suzukaze-c 22:04, 13 April 2019 (UTC)
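One practical point behind the searchability concern: characters in the CJK Compatibility Ideographs block generally have canonical decompositions to unified ideographs, so software that applies Unicode normalization replaces them silently. The following is a minimal sketch using Python's standard unicodedata module; the helper names are hypothetical, and U+F900 is used only as a well-known example of the mechanism, not as one of the 67 kyūjitai mentioned above.

```python
import unicodedata


def unify(text):
    """Apply NFC normalization, which replaces compatibility ideographs
    that have a canonical decomposition with their unified counterparts."""
    return unicodedata.normalize("NFC", text)


def folded_by_normalization(text):
    """Return the characters in text that NFC normalization would change."""
    return [ch for ch in text if unicodedata.normalize("NFC", ch) != ch]


compat = "\uf900"           # a CJK Compatibility Ideograph
unified = unify(compat)     # the unified ideograph it normalizes to
changed = folded_by_normalization("a" + compat)
```

A kyūjitai that exists only as a compatibility ideograph therefore cannot be kept distinct from its unified counterpart in normalized text, which matters for any table that lists such forms.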

Proposal to look to Wikisource for citations.

I think that perhaps we should establish a practice of making Wikisource the first place that we look for citations for words, particularly older words. There are now thousands of books transcribed there. Cheers! bd2412 T 01:26, 10 April 2019 (UTC)

Why? —Μετάknowledgediscuss/deeds 01:29, 10 April 2019 (UTC)
It's hard to tell how accurate a cite is with just a sentence of context sometimes, and even if that editor can see the full context in Google Books, other editors, depending on their location and sometimes dumb luck, may not be able to. Wikisource will show the whole context to all users.--Prosfilaes (talk) 04:08, 10 April 2019 (UTC)
Wikisource is hardly the only site with full texts. If this is about providing more links that could be discussed. I see no reason to favor Wikisource over other sites such as archive.org. DTLHS (talk) 04:39, 10 April 2019 (UTC)
Archive.org doesn't generally provide full transcribed text, and the scans on Archive.org can often be quite slow to flip through. Wikisource offers both transcribed text and usually a link to the original scan.
Besides which, Wiktionary is hardly the only site with definitions. Should Wikisource work with Wiktionary, or should we link to other dictionary sites?--Prosfilaes (talk) 05:48, 10 April 2019 (UTC)
Archive.org is a rich but messy resource, some works have dozens of scans in varying quality, taking up precious editor time. Wikisource is definitely preferable here. There have been a few (community wishlist) proposals around to build tools to automatically extract and format quotations for the use in Wiktionaries but as far as I know nothing has materialized. – Jberkel 07:20, 10 April 2019 (UTC)
For what it's worth, {{Q}} (Module:Quotations) links to Wikisource quite a bit, for instance when you add a reference to the Iliad or Odyssey in an Ancient Greek entry: {{Q|grc|Il.|1|477|form=inline}} → Homer, Iliad 1.477. — Eru·tuon 05:00, 10 April 2019 (UTC)
It might be particularly useful for all the requests for quotes from particular authors the templates for which some find annoying.
Any bias toward Wikisource is also a bias toward out-of-copyright sources and therefore old sources. I don't think we need that at all, even for terms that have been around for a while. DCDuring (talk) 12:21, 10 April 2019 (UTC)
A bias toward Wikisource over other similar collections of out-of-copyright sources doesn't change the overall issues. I'd like to have more quotes from the birth of our language. My problem is more about the dead period, from 1924 through ~1995 where we have the same problem basically anywhere we look. The works just aren't publicly available for copyright reasons anywhere.--Prosfilaes (talk) 03:28, 16 April 2019 (UTC)
  • I use Wikisource all the time for quotes. And there are the awesome lists on User:DTLHS/eswikisource. We should have User:DTLHS/enwikisource too, of course. I believe I asked D to make me one but the reply was something along the lines of that it was "full of crap" - yes, they were the exact words D used. --I learned some phrases (talk) 12:24, 11 April 2019 (UTC)

To expand on my original post:

  1. Wikisource is a sister project of ours, and as a Wiki any of us can edit there, meaning that we have some measure of control over what gets put there.
  2. Due to its joined status as a Wikimedia project, Wikisource is about as stable as Wiktionary. Other websites may disappear out from under our noses, but it is likely that Wikisource will exist as long as Wiktionary exists.
  3. To DCDuring's point, yes, Wikisource does have a lot of old sources but:
    1. We have a lot of old words, and there's nothing wrong with old citations if they define the word accurately.
    2. Wikisource actually does also have a lot of recent material, particularly public domain government documents including reports from various areas of specialization, and some case law; it can permissibly host much more of that.
    3. Didn't we just have this discussion last month about all these Webster's 1913 requests for quotes? Guess which Wikimedia project would be the one to host all the works from which those quotes could be found.
  4. Further to Jberkel's point, we could develop a tool to find and extract sentences containing sample words from Wikisource. It seems reasonable that somebody should be able to make a concordance of Wikisource, or of a particular subset of Wikisource texts.

Cheers! bd2412 T 22:11, 12 April 2019 (UTC)
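The concordance idea in point 4 could start from something as simple as the sketch below. It assumes the transcribed text is already available as a plain string; the function name concordance and the sample text are made up for illustration, and the sentence splitting is deliberately naive (abbreviations such as "Mr." would be split wrongly).

```python
import re


def concordance(text, word):
    """Return the sentences of text that contain word as a whole word."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


# Made-up sample text standing in for a transcribed work.
sample = (
    "The cows were milked at dawn. He wrote a letter. "
    "Milked twice daily, the herd throve."
)
hits = concordance(sample, "milked")
```

Fetching the text from Wikisource and formatting the result as a quotation are left out, since those steps depend on tools that, as noted above, have not materialized yet.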

Does Wikisource have Congressional committee testimony, especially Q&A? That's linguistically valuable and sometimes fun. Bureaucratic reports, not so much fun. DCDuring (talk) 02:05, 13 April 2019 (UTC)
That certainly falls within the remit of Wikisource, although I don't know how much of it there actually is at this time. bd2412 T 15:40, 13 April 2019 (UTC)

6 million entries

According to Equinox, Finnish konelypsy (automatic milking) is our six millionth entry, created by User:Surjection. —Μετάknowledgediscuss/deeds 14:34, 10 April 2019 (UTC)

That sounds like enough, job done. Time for us to find some new, worthy project; I wonder if Wikipedia still needs help generating lists of Pokemon... - TheDaveRoss 14:43, 10 April 2019 (UTC)
There are still one or two words in Wiktionary:Wanted entries so we shouldn't give up just yet. SemperBlotto (talk) 14:45, 10 April 2019 (UTC)
Onward to the six million and two!  --Lambiam 15:15, 10 April 2019 (UTC)
Did McDonald's stop at 6 million burgers? I think not. -Mike (talk) 20:40, 10 April 2019 (UTC)
Surjection and his milk again. *sigh* --I learned some phrases (talk) 12:21, 11 April 2019 (UTC)
  • Next question: who is the most prolific entry creator? DonnanZ (talk) 23:37, 18 April 2019 (UTC)
Equinox, followed by SemperBlotto. If you count machines then SemperBlottoBot, then WingerBot, then Equinox, then NadandoBot, then SemperBlotto. - TheDaveRoss 00:04, 19 April 2019 (UTC)
Oh, and if you only count euphemisms by sockpuppets, Wonderfool. - TheDaveRoss 00:05, 19 April 2019 (UTC)
Thanks, a predictable answer, I suppose, but I didn't think of pages created by bots. Do you base your figures on pages created by each editor? I do this for my own paltry figure. DonnanZ (talk) 08:29, 19 April 2019 (UTC)
This stats site gives a rundown of the top 76. If they had just one account over the years instead of 200, Wonderfool would be in 4th place, actually. --I learned some phrases (talk) 12:34, 19 April 2019 (UTC)
OK, so that doesn't have page creations. Where do you get your results, Dave? --I learned some phrases (talk) 12:36, 19 April 2019 (UTC)
It does give creates, that is the right set of columns. It is also no longer being updated with 2019 data and beyond, it has been replaced by Wikistats 2 which is garbage for things like user stats. X's tools is still current, but doesn't show lists of users. Not sure if there is a better view of users by contribution count available currently. - TheDaveRoss 12:44, 19 April 2019 (UTC)
Also WF is including bot edits in his count, but not in anyone else's, so [Citation needed]. - TheDaveRoss 12:46, 19 April 2019 (UTC)
It looks as though I rank 12th for edits, and 4th for creates (which is quite astonishing). If I look on my watchlist at "pages watched not counting talk pages" that gives me the current figure (56,708) as all pages created are automatically watched (and I don't watch any other pages). The 53,106 figure for creates in those stats is out of date of course, but seems to be accurate. DonnanZ (talk) 17:53, 19 April 2019 (UTC)

How should gerunds be handled?

In English, gerunds seem to be entirely ignored, I guess because they are always identical to the present participle. However, that doesn't apply for other languages. There are a few specific cases where this is relevant.

The first is Dutch. Dutch has a gerund, but it's identical in form to the infinitive, which is also the lemma form. We usually don't make form-of entries for forms that are the same as the lemma, so we have no entries for Dutch gerunds at all. It is mentioned in the inflection table, though, see roepen. As shown in the table, the gerund has neuter gender. Should every Dutch verb have a separate entry for the gerund?

The second case concerns German and West Frisian. In both of these languages, the gerund is also neuter, but it's not identical in form to the lemma. In German, there is a difference in capitalization, which also shows that gerunds are treated as nouns. In West Frisian, it's identical to the long infinitive, which is something the other languages don't have (but Old English had it). There seem to be a bunch of entries created for German gerunds already, in Category:German gerunds, and they are given a Noun header with its own inflection table. West Frisian barely has any entries for verb forms yet, so there is no precedent to go by.

The implication I take from the German treatment is that we should really be treating the English, Dutch and West Frisian gerund as nouns in their own right too. After all, why would we have entries for German gerunds but not for English, Dutch and West Frisian ones? In German, the gerund is unique in its orthographic representation, so it can't just "piggyback" on another verb form, and must have its own entry. But gerunds aren't just verb forms in other respects. They can have genders, like nouns, and even case forms depending on the language. They can also take both definite and indefinite articles, as well as possessive and other determiners, in English too. We already treat participles specially in many languages, giving them their own Participle header to show that they aren't just verb forms, but are more like adjectives. The same could be argued for gerunds, but we don't currently have Gerund headers anywhere. Should we? Or should we call them Noun? The fact that gerunds have genders and case forms tells me that we shouldn't just be labelling them as Verb. A sticky point is that Dutch gerunds can have a direct object before them (Wiktionary bewerken is leuk!) and English ones can have it after them (Editing Wiktionary is fun!), which is something specific to gerunds and not shared with regular nouns. That speaks in favour of a separate Gerund header. —Rua (mew) 12:58, 11 April 2019 (UTC)

Why should we draw any implications whatsoever for English PoS from Dutch, German, and West Frisian inflection? Uniformitarianism is not the official religion of Wiktionary. DCDuring (talk) 17:55, 11 April 2019 (UTC)
I’m of 2.718 minds about this. On the one hand, it seems eminently reasonable. These gerunds are syntactically nouns, and therefore a heading “Verb” is misplaced. On the other hand, giving separate entries for all gerunds whose form is indistinguishable from a verb form will mean a lot of extra work. (In some cases the gerund has become a noun with a slightly different sense, like eten meaning “food”, not the act of eating, and such nouns definitely need a separate entry; here we consider the true gerunds whose meaning follows directly from the meaning of the underlying verb.) In Turkish, next to the infinitive (Sigara içmek yasaktır – Smoking is forbidden), also the third-person present simple and future can assume the role of a noun (çıkmaza girmiş – He has entered a dead end, literally a “does-not-exit”; gelecek bilinmezdir – the future is unknowable, literally the “will-come”); moreover, they can also serve as adjectives. (Normally these are called participles by grammarians, not gerunds, but I see no argument why the same reasoning would not apply here.)  --Lambiam 19:07, 11 April 2019 (UTC)
Considering that editing has a noun entry, are you just arguing that the header should be changed to "Gerund"? Could it not just be handled in an etymology section or as text at the beginning of the sense definition? -Mike (talk) 20:19, 11 April 2019 (UTC)
There is a noun entry for this particular verb, but every verb has a gerund. I'm saying that we should be making this a regular thing. —Rua (mew) 20:49, 11 April 2019 (UTC)
Is "editing" really a gerund in "Editing Wiktionary is fun"? Equinox 20:21, 11 April 2019 (UTC)
What else can it be? It's not a participle, unless you somehow read it as meaning that Wiktionary is doing the editing. —Rua (mew) 20:49, 11 April 2019 (UTC)

Wikimedia Foundation Medium-Term Plan feedback request

Please help translate to your language

The Wikimedia Foundation has published a Medium-Term Plan proposal covering the next 3–5 years. We want your feedback! Please leave all comments and questions, in any language, on the talk page, by April 20. Thank you! Quiddity (WMF) (talk) 17:35, 12 April 2019 (UTC)

Classical compounds in Category:English words by prefix and Category:English words by suffix

These categories are a complete mess right now, because we categorise all elements of Greek and Latin origin as affixes. As a result, the actual proper affixes of English are all but unfindable among all the noise. I think the problem here is our treatment of Greek/Latin elements. The combinations that are created when putting them together are called classical compounds, which makes their nature as compounds rather than affixed words very clear. While they are used productively in English and other languages, they follow their own rules, very different from true affixes:

  • They can be attached to each other, with no apparent root word, like anthropo- + -centric. You can't do this with real affixes: be- + -ness cannot make *beness.
  • They have a strong tendency to occur together. Often they can only be attached to each other, not to any other random word.
  • They originate in their parent language from root words, not affixes. Thus, combinations of them are not affixed words, but rather compounds. This is reflected in the English term for them, too.
  • One and the same term might be a prefix or suffix, with a difference in form. But what's really going on is that the shape depends on the position within the compound, final vs nonfinal. In informal use, words are adapted to this pattern by adding an o at the end of a nonfinal element.

Because of this, I don't think it does to call these "prefix" or "suffix", they're really their own kind of thing. I think in the interest of making the two above categories usable again, we should split the elements of classical compounds into their own kind of derivational category. There should at the very least be a Category:English classical compounds. We could have further subcategories based on the elements used, but I'm not sure if that's really fitting, given that these are compounds and we already tried and failed to categorise compounds by their elements before. I'm not sure about all the details of the solution yet, but I hope it's clear to everyone that something is wrong here. —Rua (mew) 20:55, 14 April 2019 (UTC)

Let’s define an English prefix or suffix to be something that is affixed (with possible morphological adjustments) to the stem of English words so as to form new English words, whose meanings for a given pre-/suffix are more or less derivable from the meanings of the words it is affixed to. Then indeed many of the entries currently advertized as English pre-/suffixes are miscategorized. The distinction with components with a classical pedigree is not always clear-cut, though, as seen in neologisms like user-centric ([3][4][5]) and Britain-centric ([6][7][8]). I think in these words -centric is a (productive) suffix. As another example, -ize is on the one hand a French suffix (-iser) that lifted along with words like angliciser when they were anglicized – in these words it is not an English suffix but an anglicized French suffix; on the other hand, it is responsible for forming new words like dandyize, bowdlerize and mongrelize. While I agree with the drift of this gripe, I think “English classical compounds” is a misnomer. Whatever xeno- and -phobia are, they are not compounds, but components found in classical compounds (and sometimes used in making new compounds with a classy appearance). Perhaps Category:English classicistic components?  --Lambiam 17:27, 15 April 2019 (UTC)
I think you misunderstood a little. I'm not saying that the elements of the compounds should be called classical compounds, but rather the combinations formed from them. In other words, anthropocentric should not be categorised as Category:English words prefixed with anthropo-, nor as Category:English words suffixed with -centric, because it's neither. I do see your point about terms like user-centric, and in that case we might be able to consider them suffixes, but I'm not completely convinced if -centric is a suffix in that case either. And since it's not a classical compound, that's separate from the matter I'm describing here anyway. —Rua (mew) 17:34, 15 April 2019 (UTC)
Sorry, I indeed misunderstood. I agree we should remove anthropocentric from Category:English words prefixed with anthropo- and Category:English words suffixed with -centric; in fact, I just did by changing {{confix}} to {{compound}}. I am not convinced there is a need for a new category Category:English classical compounds. (If the need exists, we will presumably also want Category:French classical compounds and Category:German classical compounds; and what about Category:Ancient Greek compounds and Category:Latin compounds?)  --Lambiam 17:54, 15 April 2019 (UTC)
Distinguishing Greek and Latin won't really be practical, because some of these combine both, even if there are some purists out there that hate it. :) —Rua (mew) 10:59, 16 April 2019 (UTC)
Yes, the word television is an abomination that flies in the face of etymological decency. The horror! The horror! Perskyi pereat!  --Lambiam 21:42, 16 April 2019 (UTC)

Wiktionary:Random Competition 2019

Hello all, I decided it's time to kick-start the 2019 Wiktionary word game, which, for copyright reasons, is not like any other board game in the world. Ever. Any such resemblance is purely a fluke. User:Metaknowledge has won the last two years, let's try to knock them off the top. --I learned some phrases (talk) 00:27, 16 April 2019 (UTC)

Splitting Aramaic

It seems to me like we need to split the various stages of Aramaic into actual separate language codes, chief in my mind, Ancient Aramaic and Imperial Aramaic from Middle Aramaic, i.e. Jewish Babylonian Aramaic. I'm thinking [arc] should be reserved for the family code. @Fay Freak, Profes.I., Wikitiki89, -sche, Metaknowledge, thoughts? --{{victar|talk}} 03:57, 16 April 2019 (UTC)

No. I don’t know why Jewish Babylonian Aramaic would be Middle Aramaic, while Galilean Aramaic would not. And Biblical Aramaic is still not so distinct from Jewish Babylonian Aramaic. And Imperial Aramaic is not that far. And what would even be Babylonian? If some people wrote Aramaic in Spain I would not know if it is “Jewish Babylonian Aramaic”. And from which Aramaic would we derive all the Arabic, Armenian and whatever terms that are said to be from Aramaic? Working on the premise that the Aramaic form is the same or same enough, sometimes only a more modern form given (as for example when one gives the now leading German form when there have been lots of forms before but a language derives from earlier German, not clear exactly which form), we have customarily given “Babylonian” forms from which other language terms are derived. All much constructed, and useless distinctions, and not resembling the actual language situations. Other dictionaries do not distinguish either necessarily, though some restrict a dictionary to a certain “dialect”. The various terms are more distinctions of genres of texts, classifications of corpora, that is for literary studies, than useful for linguistics, or specifically lexicography. What would you gain except pain from splitting?
General rule: If the set of grammar is essentially the same, it is the same language. One should recognize that some languages move slower than others. So “Aramaic” spans two thousand years or more before deserving split language codes, and Arabic also has only one code over one and a half thousand years, and rightly so, though some dialects coexist with this Dachsprache, whereas over this time span French has four (Latin, Old French, Middle French, New French), but most other Romance languages only three (Latin, Old Spanish, Spanish), and even that being under the suspicion of being too much as the difference is not so great (“Old Italian” has hardly been used here).
Why would the situation for Aramaic be different from what is now seen in Arabic? They all wrote a Dachsprache even if dialectally the differences might have been greater amounting to “different languages” (which isn’t a clear concept either with the modern Arabic dialects). Only after being conquered by Arabs did the unity dissolve. What you want to do is like removing Arabic as a language and only treating it as a group because one sees some “stages” and unintelligibility between the “actually spoken” languages – but there is continuity too. The situation with Greek even seems to be similar, for is it any different from splitting Ancient Greek in “Attic”, “Aeolic”, “Ionic” etc. and “Koine Greek” and “Byzantine Greek” (“Middle Aramaic”)? If something is only from a certain period one can state it in labels, but splitting the alleged stages is de trop. Fay Freak (talk) 13:05, 16 April 2019 (UTC)
@Fay Freak: Sooo... Scots and English are separate languages but not Jewish Babylonian Aramaic and Imperial Aramaic? Look, I'm a huge advocate for merging dialects and do so on the regular, but these are two distinct languages, with their own pronunciations, morphologies, written in two different scripts, and separated by hundreds of years. The delineation seems pretty clear to me, far more than, say, Old French and Middle French. --{{victar|talk}} 15:23, 16 April 2019 (UTC)
But also, what is this entry? It lacks any and all labels. What are its sources? Is it ancient, and if so, are these vowel points true to the attested word, or are they hypothetical? --{{victar|talk}} 16:57, 16 April 2019 (UTC)
See, you force people to write things that they don’t know.
What kind of question is that even “is it ancient?”? People see words as Aramaic, they add them as Aramaic. Who would split all the Aramaic entries? You wouldn’t. Nobody is there who would. You are proposing a thing that is impossible to accomplish, going against existing desires: One could have separated already the lects by labels, but the desire to separate has not been there, and you won’t create it against editors who have hitherto been reluctant to separate.
And again, you ignore the principle of unity while claiming they have been “separated by hundreds of years”. Cicero is separated from us more than two thousand years, and yet Stephanus Berard writes the same language. There is back-coupling and cross-coupling and it is as important and sometimes more important than evolution. And yeah, there is no reason why there wouldn’t be Classical Nahuatl from 2019 if an author subscribes to the old rules. Years and scripts are not even an argument at all, and pronunciation only with caution. Sounds merging the distinction of which is not even expressed in script, like also begedkefet, is rather an argument against splitting for lexicographical purposes because the differences are not relevant on the token-level graphically. The fact that we distinguish “Imperial Aramaic script” and “Hebrew script” is delusive: It is two scripts but it is also the same script, much like Cyrillic and Latin Serbo-Croatian but diachronically, or even closer. Morphology: Dubious, I stressed the differences must be essential: The fact that some or many Classical Arabic constructions and derivation types are now not used does not mean Modern Standard Arabic is not the same language. MSA is a subset of Classical Arabic, JBA is a subset of Imperial Aramaic. Distinguish decline from split. Romance was bad Latin before it became modern languages. Fay Freak (talk) 22:15, 16 April 2019 (UTC)
@Fay Freak: How is it unreasonable to expect the user to know which form of Aramaic it is? That's like saying how can we expect people to add πατήρ under the correct header when it just reads Greek. It should be the contributor's responsibility to understand the material, especially when it comes to ancient texts. In virtually all of my Aramaic sources, it either specifies the form of Aramaic or cites the work that does. So going back to my example of פתגמא‎, without sources and a proper label, how do we know the original text was even in Hebrew, let alone had vowel points? It seems to me, specifying the form of Aramaic is essential to the entry's quality and the comparison to Serbo-Croatian is a comparison between unequals.
I'm trying to follow your comparison to Arabic and Latin. Yes, Imperial Aramaic was a standardized liturgical language, much like Classical Arabic and Middle Latin, but how does the Jewish Babylonian Aramaic of the Middle to Late Aramaic periods fit into that using your argument? Are you trying to say JBA is the liturgical successor to Imperial Aramaic, like Classical Arabic is to Modern Standard Arabic? --{{victar|talk}} 02:48, 17 April 2019 (UTC)
It isn’t some source or the provenance that should tell you whether it is in a language but the text itself. The comparison with Serbian, Bosnian, Montenegrin, Croatian lies here. If I see a text on the internet it often takes long to find out in which of these it is and I often do not know it at all at the end. (And this is not even since the internet but similarly with printed texts in Austria-Hungary.) Hence it is sane to treat all as Serbo-Croatian, because the difference is too minute. The fact that it is often treated separately is no indication that it shouldn’t be done otherwise. And the reason to restrict treatments of Aramaic to certain lects is similar to treating only a regional variant of Serbo-Croatian. A Serbo-Croatian historical dictionary is more work than a dictionary of standard Croatian of the 21st century. Similarly, there is no man who could compile a “Comprehensive Aramaic Lexicon” though there are many who wish they could. If people compile works restricted to periods and provenance it is because the pile of material is large and scattered. Hence we also have Latin dictionaries restricted to Latin of antiquity because a work including Medieval or even Modern Age Latin would be yuge (so Karl Ernst Georges could enter all Latin from when Latin lived into a dictionary, but not more, and it took his life). You also see that for Ancient Greek the limits have been pushed more to the present apparently with media access improving. The recent The Brill Dictionary of Ancient Greek covers all up to the 6th century CE, other Ancient Greek dictionaries stop at 200 or wherever, just because one is a homo oeconomicus and has to end somewhere or publish somewhere. The definition of a language is pliable dependent on what one wants to accomplish.
This is to show you that “what is a language” is an economic decision, and when one writes about “languages” one writes about the literary history of the grammars and dictionaries created: the picture will be slanted by this fact. (Encyclopedias, Wikipedia, often fall for this fallacy, because they can’t know all the material either.) Hence treatments of what might seem to be “languages” do not entail that there are indeed separate languages, or that they should be treated like that in a community dictionary. A lot of absurd distinctions have been entered this way into Wiktionary already, so we have “German Low German”, “Dutch Low German” (Dutch Low Saxon) and “Mennonite Low German” (Plautdietsch), which is obviously caused by researchers not accessing all three areas, though these lects form a unity. Hence editors who dealt with texts from the three languages somewhat extensively concluded that the separation was wrong (@Korn, as I remember). As a result the editors get disenchanted because of the arbitrary distinction and cease to treat the language on Wiktionary.
What you say “without sources and a proper label, how do we know the original text was” etc. is a general problem of Wiktionary and lexicography, but has little to do with language distinction. We would ideally have quotes to make all clear, which regions and periods used it and what the semantic range was or probably was, but splitting the language distinctions anew is no way to get editors to do it more than they already do; rather, I expect it will cause treatment of the language to die off. Fay Freak (talk) 19:01, 17 April 2019 (UTC)
Confirm, Low German would likely greatly benefit from eschewing categorisation based on non-linguistic tradition (orthography and political borders), but nobody wants to do the work of actually etching out a working solution for such a radical change or risk letting someone else do it alone. From this experience I warn that once a split is decided, some people will start implementing it, maybe botting it, potentially frustrating some editors. But if after five years everyone agrees it isn't an optimal state, it's likely that the decision will never again be undone, because that would require enough editors in Aramaic to band together, declare consensus, and then elbow-grease away five years of random edits to implement it, so you'll have a terrible mixed mess. Korn [kʰũːɘ̃n] (talk) 22:02, 17 April 2019 (UTC)
I'm wary of language splitting in general, but how hard would it really be to split it into Old and Middle or something equivalent?
Most of Wiktionary is written by people with a passionate interest in given languages, rather than passersby who enter a word or two, especially for ancient or obscure languages. I think that asking editors to ascertain the variety roughly enough is not placing too much of a burden on them. I certainly feel that using the actual script in which the form was attested should be obligatory.
That said, it comes down to convenience, could we hear some more people who work on Aramaic? Crom daba (talk) 23:17, 17 April 2019 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I think I need to circle back on my original complaint. To call Jewish Babylonian Aramaic the same language as Imperial Aramaic is completely unprecedented according to current scholarly conventions, which is why we have one ISO language code for Old Aramaic [oar] and another for Jewish Babylonian Aramaic [tmr]. Equally troubling is calling Classical Syriac [syc] lemmas "Alternative forms" to JBA, as seen in this entry, which by no stretch of the imagination should be considered one and the same language. Lumping JBA into Old Aramaic also creates the problematic situation where we find ourselves labeling inherited and borrowed descendants of Imperial Aramaic as derivatives of JBA in the descendants section, which is grossly inaccurate.

Here are three possible options:

Keep Old Aramaic and JBA merged
  1. Label all identified entries, ex. {{lb|arc|Imperial Aramaic}}
  2. Move all unidentified entries to Latin script entries

Split Old Aramaic and JBA apart
  1. Split Aramaic [arc] into a) Old Aramaic [oar] (also with appropriate labels, i.e. Ancient, Imperial, Biblical, etc.), and b) Jewish Babylonian Aramaic [tmr] (beside other Middle Aramaic lects, i.e. Mandaic [myz], Samaritan [sam], etc.)
  2. Move all current Aramaic entries in Hebrew script (which is all of them) to Jewish Babylonian Aramaic

Split into Old Aramaic and Judeo-Aramaic
  1. Split Aramaic [arc] into a) Old Aramaic [oar] (Ancient, Imperial), and b) Judeo-Aramaic [arc-jud] (Biblical, Jewish Babylonian, Jewish Palestinian)
  2. Move all current Aramaic entries in Hebrew script to Judeo-Aramaic

Incidentally, I found this very old (2012) discussion also advocating splitting Jewish Babylonian Aramaic away from Aramaic, but unfortunately nothing came of it. It should be noted that the 7th most recent Aramaic entry is from 2008/2010, so activity within Aramaic has been virtually dead for a long time. I wrote the transliteration module for Imperial Aramaic just yesterday. --{{victar|talk}} 05:16, 18 April 2019 (UTC)

In “this entry” the Syriac/CPA isn’t put as an “alternative form of JBA”, this is exactly what isn’t done because the entry is just “Aramaic”, and also the JBA form is rather the here alternative form אֲרִישָׁא‎, not what I made main form אֲרִיסָא‎. I didn’t make the JBA form “the” form. Nothing at all troubles me with the current language headings. Also as you can see on the CAL the form is way earlier, “Palestinian Targumic” and the like. On the other hand “Old Aramaic” allegedly gives way to “Middle Aramaic” by the 3rd century which is quite arbitrary. Haven’t dealt with the Imperial inscriptions much but at least Biblical Aramaic is not very different in grammar from so-called Jewish Babylonian Aramaic, from what I have seen in grammars and samples. The “options” show again how arbitrary the distinction is, apart from being variously complicated, given all the alleged dialects of back in the day that have a name (why would you even want to do the opposite from what CAL does? Their preference is also to lump and label, to avoid complicated structure at such a fundamental level). But if you can distinguish, you could already do it with labels, and you could auto-categorize Imperial-Aramaic script Aramaic as Imperial Aramaic.
Ironically, in “this very old (2012) discussion” @334a argued (nobody else argued on this topic) that Aramaic should be split “like Mandarin, Cantonese, Wu”, which is a split that has been reversed since because of having been experienced as tedious.
And again you ignore the principle of unity. You say “To call Jewish Babylonian Aramaic the same language as Imperial Aramaic is completely unprecedented according to current scholarly conventions, which is why we have one ISO language code for Old Aramaic [oar] and another for Jewish Babylonian Aramaic [tmr].” One can likewise say that they are all the same language, hence the language code [arc] and hence the name Aramaic – it shows that the Verkehrsauffassung prefers a unity, even more than with Serbo-Croatian. As I said: The definition of a language is pliable depending on what one wants to accomplish. The truth of what is a separate Aramaic language does not need to be pursued here howsoever. The fact of being entered in Imperial Aramaic script is alone a feature that allows every reasonable reader and editor separation. Adding “Imperial” in every L2 header of every such entry does not add any value, because nobody cares at that point, but splitting has the potential of disenchantment, as sufficiently outlined with Low German and what not. Fay Freak (talk) 16:52, 18 April 2019 (UTC)
@Fay Freak:
  1. Delineating Aramaic into Old and Middle is a well-founded principle within Aramaic scholarship (see Fitzmyer and Siegal). To call a time period delineation "arbitrary" is to call all such delineations so. A line in the sand needs to be drawn somewhere, and all Middle Aramaic languages have their own language codes on en.Wikt with the unjustified exception of JBA.
  2. No one is suggesting any radical segmentation of Aramaic -- we're just talking about splitting JBA away, as per common convention.
  3. [arc] is intended to exist as a family code, which is why we have codes for Old Aramaic [oar] and JBA [tmr]. Arguing that is like arguing "why do we even have [gem]?"
  4. Who has suggested adding "Imperial Aramaic" to any headers? Not I. I am calling for labels within Old Aramaic. I recommend you carefully read through my three proposals again.
--{{victar|talk}} 17:17, 18 April 2019 (UTC)
Does not make a difference. If you write “Jewish Babylonian” everywhere it does not make the entries any truer. The entries should be already true, and it is true if it stands that they are “Aramaic”, adding “Jewish Babylonian” in L2 does not add to it. Labelling “Jewish Babylonian” (with {{lb}} and {{tlb}}) is what one can already do – provided a form is really “only Jewish Babylonian Aramaic”. With “Aramaic” one is on the safe side. I dispute that a line in the sand must be drawn. Not delineating is also founded. What we must do only is to call entries by a name to give the reader the information they want about what it means and where it comes from, and “Aramaic” does the job no less than any split distinctions. Whether it belongs to a certain region or period is something readers expect in labels and not on the language distinction level. Hence Chinese is not even split because it is not necessary for giving the information. Fay Freak (talk) 17:29, 18 April 2019 (UTC)
@Fay Freak: And that viewpoint would best fit the first of the three options I put forth, but that should require first moving unidentified Aramaic entries to Latin script entries, because to render them in Hebrew is inappropriate. You may disagree with that statement, citing your incomparable Serbo-Croatian comparison, but I think you'll find that most editors disagree with you on that point, and think that historical lemmas should either be rendered in their original script, or in Latin. --{{victar|talk}} 17:41, 18 April 2019 (UTC)
Aramaic should never-ever be in Latin script. The so-called Hebrew script (which is really the Aramaic script at least no less than the Hebrew script) is the appropriate fall-back script, particularly when a script is not yet encoded (there have been a lot, and they flowed into each other that one often does not even know if it is “already an own script”). The scholarly convention has always been to give Aramaic in Hebrew script when it has been attested in a script that is not available for printing or easier to enter, for example Nabataean Aramaic has often been cited as Hebrew. Again you miss what is the actual cause of scholars doing a thing. They don’t use Latin script because it is somehow standard or conventional, but because they are bad at doing better. Like Nostraticists cite all in Latin script. So Iranists rub their hands as long they can excuse writing books about Middle Persian and making recensions of Middle Persian works in Latin script with its not being encoded or no keyboard layouts being available etc. and thereafter they revel in being able to continue being lazy by referring to an alleged convention consisting of practices that have however always been wrong, but they have missed to tell it just in case it becomes opportune to rest on the Latin script. But the philological science of a remote culture only starts when you leave the schemes of the Latin script: As long as something is in Latin script it is pseudo-science, an imitation, if it wasn’t inclined for the Latin script, a surrogate for science. Now I see how the cat jumps: This proposal is another plot of Rome to impose its script upon all peoples, to the detriment of science. Fay Freak (talk) 20:40, 18 April 2019 (UTC)
If I had a class of philologists at my disposal, I'd employ them to look up n random sentences from the corpus of Old and Middle Aramaic and see, word by word, how many words would result in effectively doubled data (sans the script) when entered together with their respective cognates (if they exist) into Old and Middle Aramaic.
Barring that, I don't think this discussion will be fruitful. Crom daba (talk) 20:57, 18 April 2019 (UTC)
(edit conflict) @Fay Freak: I would say the vast majority of Aramaic terms in sources not in their original attested script are rendered in Latin. The only exceptions you might find to that are in the context of Jewish Aramaic or Hebrew research. Just as we render Tocharian and Book Pahlavi in Latin, instead of, say, Chinese or Arabic, the default on en.Wikt is to use Latin script. Regardless, that would not be my preferred option of the three I present.
I'd really rather hear more from others, because evidently you and I cannot alone come to any agreement. --{{victar|talk}} 21:07, 18 April 2019 (UTC)
Just chiming in here. I think Aramaic should definitely be split. In particular I find Fay's comparison of Aramaic to Latin to be ill-conceived. While it is the case that Latin is still produced in academic and ecclesiastic contexts, it has no native speaker population. That class of native speakers did stick around in the form of Romance language speakers. While modern Aramaic may have a lot of learned borrowings from earlier stages, this is nowhere near saying that the spectrum of modern Aramaic lects is equivalent to the very much dead Latin.
Furthermore, if a user has trouble adding an Aramaic term because they don't know its chronology or script, this is as it should be. We aren't a dumping ground for random words. Even a comparison to Ancient Greek shows the highly articulated system for handling Ancient Greek dialectology (if "Ancient Greek" existed in any real sense until the Hellenistic period). Users should be expected to know about the words they are adding. —*i̯óh₁n̥C[5] 00:10, 19 April 2019 (UTC)
Script, I always stood on the position that it should be in the original script as far as possible. Nor did I say anything about modern Aramaic dialects. But chronology? I emphasized that it must become clear from the language itself that it is a different language. If a user does not see a difference then there probably isn’t any. Like one finds texts and does not see whether they are Bosnian or Serbian, or even if one knows the difference it does not look enough to necessitate a separation. The date and region is not the language. It’s not about “trouble” but about artificially seeking language distinctions when it is unneeded. The idea that after a certain point we have “Jewish Babylonian Aramaic” because “the line must be drawn somewhere” is, however tempting, still arbitrary, because only the purpose determines whether something is a language. It still has not been shown that only if languages are separate in that scholarly sense that is determined howsoever they should be separated in this lexicon. We treat Ukrainian and Russian and all Slavic languages separately because the standards and tables are different and the words in details too often, though it might appear that they are only one language with umpteen dialects. It does not matter what “is a language”, no! Who inculcated you that when something is considered a separate language by scholars it must be split off on Wiktionary? Baseless presumption. Whether something “is a language” or whether it is “correctly seen so by scholars” does not matter if we still can handle all with the same tables. And so we can handle Biblical Aramaic like Jewish Babylonian Aramaic in the same sections with the same tables. After all we have even Chinese in one heading. According to most purposes German and English are separate languages but a split of Jewish Babylonian Aramaic serves according to my expectation no purpose. Fay Freak (talk) 00:52, 19 April 2019 (UTC)
“This proposal is another plot of Rome to impose its script upon all peoples, to the detriment of science.” Really? Aramaic is normally written in a script of 22 letters; one can replace them with an arbitrary set of any other 22 symbols and preserve all the information. Transcription into computerized or typeset Aramaic throws away a bunch of information that may or may not be relevant, and that transcription may be more or less accurate. Once that's done, an exact transliteration doesn't change anything, yet makes it easier for people familiar with Latin script to understand the text and makes it possible to compare across languages that use different scripts. Demanding that all the people working in Gothic and Egyptian hieroglyphics use the exact form of the original has nothing to do with science, it's just exclusionary.--Prosfilaes (talk) 04:25, 19 April 2019 (UTC)
So many dissonances: On one hand one shall not be “exclusionary” and one not demand the exact form of the original, on the other hand one shall and it “is as it should be”.
What about acknowledging that the current exclusionarity is exactly right? No proposal for change here is advantageous. It’s all about adding qualifiers like “Old”, “Judeo”, “Jewish Babylonian” or Latin script where it does not belong. As they are, the entries are correct and miss only the details like most entries on Wiktionary (period, quotes). Pigeonholing further but basing all on one’s own “Latin” standard is just the classical American hybris and doublethink and, in this case, casual anti-Semitism. “Just disregard all cultural ties and do it like we do. The American way is best! Our way is the basis and father of all things!” Fay Freak (talk) 13:45, 19 April 2019 (UTC)
My stance is thus: Ideally, they should be split, but it's a lot of work and maybe not worth doing. A note regarding Biblical Aramaic: It is very different from Jewish Babylonian Aramaic. For one thing, BA is a Western dialect, while JBA is an Eastern dialect. There are numerous grammatical, phonological, and morphological differences. --WikiTiki89 19:11, 19 April 2019 (UTC)
Indeed. There are also numerous grammatical, phonological, and morphological differences between the comedies of Plautus and the sermons of Augustinus. So much one could write about the unlike syntactical constructions, the sound changes meanwhile, and the different endings used. And yet what matters lexicographically would need to justify different language headers. And yes, I also oppose the concept of a language “Old Latin”. Its names aren’t even correct. Fay Freak (talk) 19:47, 19 April 2019 (UTC)

Misspelling Alternative[edit]

One of the biggest objections that seems to be raised around removing misspellings, or banning them, is that the entry for a misspelling points users who search for it to a correctly spelled entry. One possible solution which would enable search to consistently find misspellings we deem important enough to include would be to put them on the correctly spelled entry, but not display them. The search function finds them easily enough, so searches for misspellings still present the correctly spelled entry to the searcher, but without the intermediate step of landing on an incorrectly spelled page.

I created a demonstration template {{misspelling}} which provides the simplest possible version of this idea, and applied it to the recently deleted page urothelical (urothelial). One simply adds {{misspelling|urothelical}} to the end of the language section (or entry?), and then when a user searches for the misspelling they see the correct page (in this case as the first entry suggested). Additionally the template labels the term as a misspelling, so it is somewhat clear what is going on. You can see what it looks like on these example search results. This could obviously be fancied up with language and categorization and all kinds of things if desirable.

Ideally the Mediawiki search would be smart enough that the user always had the correct entry suggested at the top of the search page, but with the multilingual nature of the project that is an extremely difficult goal. This sort of structure may actually strengthen the search's ability to suggest pages, or to become more advanced down the road. Thoughts? - TheDaveRoss 18:37, 17 April 2019 (UTC)
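The lookup behaviour described above can be illustrated with a minimal Python sketch. It is not how MediaWiki search or {{misspelling}} is implemented; the `build_index` and `search` helpers and the sample data are hypothetical, and the only point is that a misspelling stored (hidden) on the correct page resolves directly to that page.

```python
from collections import defaultdict

def build_index(entries):
    """Map each searchable string to the page titles that carry it.

    `entries` maps a page title to the misspellings stored, undisplayed,
    on that page (an assumed data shape, for illustration only).
    """
    index = defaultdict(set)
    for title, hidden_misspellings in entries.items():
        index[title.lower()].add(title)
        for wrong in hidden_misspellings:
            index[wrong.lower()].add(title)
    return index

def search(index, query):
    """Return the page titles matching a query, in sorted order."""
    return sorted(index.get(query.lower(), ()))

# Illustrative data: the misspelling lives on the correctly spelled page.
entries = {"urothelial": ["urothelical"]}
index = build_index(entries)
print(search(index, "urothelical"))
```

In this sketch a query for the misspelling and a query for the correct spelling land on the same page, with no intermediate misspelling entry in between.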

Cool. For this example, isn’t “urothelical” a misconstruction though? Fay Freak (talk) 19:07, 17 April 2019 (UTC)
Probably, I just copied the intent of the original page. No reason this same mechanism couldn't be applied to misconstructions and typos. - TheDaveRoss 19:09, 17 April 2019 (UTC)
If we define a misconstruction to be a misunderstanding or misinterpretation resulting from the use of the wrong meaning of a word that has multiple meanings, then this is not one. Although uro- has two meanings, this is not the result of interpreting it incorrectly as meaning “tail”. The issue is solely with the spelling thelical, which has zero meanings. Note that we also list the miscreation epithelical.  --Lambiam 18:13, 18 April 2019 (UTC)
Support. It's a pretty elegant idea that lets these words be findable without having an entry. On the other hand, how do you distinguish a misspelling from an alternative spelling? —Rua (mew) 15:52, 18 April 2019 (UTC)
Alternative forms are when there are reasons why an informed person uses or used the forms (conscious spellings), while misspellings are when such reasons are absent. If a misspelling is also a legit spelling then one would of course use the gloss templates we are used to, {{misspelling of}}. Fay Freak (talk) 17:05, 18 April 2019 (UTC)
If "urothelical" happened to be a word in another language this wouldn't work at all. DTLHS (talk) 18:15, 18 April 2019 (UTC)
Not at all is a bit strong, but if there is no acceptable way to make that situation work we can leave the status quo for entries which would otherwise exist, and use this method for entries which would not. Other solutions include listing common misspellings in the "also" section at the top, perhaps distinctly in a "did you mean" format. - TheDaveRoss 18:31, 18 April 2019 (UTC)
Does anyone actually make use of our misspellings as data for some purpose?
Even if there were, the proposal, with modifications and limitations as suggested above sounds good. DCDuring (talk) 18:46, 18 April 2019 (UTC)
At this point I'd just like to kill all "misspellings" until they become acceptable spellings (not sure exactly how we make that judgement!). Giving them first-class status encourages all the fungus of categories, alt forms, etc. to grow on them and legitimises them beyond what they deserve. But this idea might be an improvement, sure. Equinox 19:28, 18 April 2019 (UTC)

Proposal to unify the size and style of CJKV text[edit]

I have a proposal for a number of changes regarding CJKV text:

  1. Unify font size: 120%.
  2. Set line-height to be 1, to prevent CJKV text from affecting the line-height of Latn text.
  3. Re-enable bold font weight for Japanese.
  4. Do not enlarge CJKV bold text.
  5. Do not use bold font weight for all Vietnamese Hani text.
  6. (Other cleanup.)

Rough preview of before and after.

Secondary to CSS:

  1. (Use Kore for all Korean text, instead of using Hani for hanja and Kore / Hang for hangul.)
  2. (Repair certain Japanese furigana templates to fix certain oddities regarding font size.)

If there are no objections, I will ask for implementation.

Suzukaze-c 19:56, 18 April 2019 (UTC)

Semi-automatic correction of Cyrillic text with Latin characters[edit]

As editors who watch Recent Changes probably have noticed, I've been correcting Cyrillic text that contains Latin characters. I created a list of links in {{m}}, {{l}}, {{t}}, and {{t+}} for languages that only have Cyrillic script listed in their data table that includes the entries that will be processed. Russian, Ukrainian, Belarusian, Bulgarian, and Macedonian have already been processed. I'm using the list of similar-looking Latin and Cyrillic letters at w:User:Trey314159/homoglyphHunter.js, with some additions. An example edit can be seen here. I review each edit and don't change some of the links because they are clearly in the Latin script. — Eru·tuon 20:33, 18 April 2019 (UTC)
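For readers unfamiliar with this kind of cleanup, here is a rough Python sketch of the general idea. It is not the script actually used, and the mapping is a small illustrative subset rather than the homoglyphHunter list. As an assumption of the sketch, a word is changed only if it mixes both scripts and becomes fully Cyrillic after replacement; words that are wholly Latin are left alone, mirroring the manual review mentioned above.

```python
import re

# Illustrative subset of Latin letters that resemble Cyrillic letters.
# Code points are written out because the glyphs are indistinguishable.
LATIN_TO_CYRILLIC = str.maketrans({
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415",
    "H": "\u041d", "K": "\u041a", "M": "\u041c", "O": "\u041e",
    "P": "\u0420", "T": "\u0422", "X": "\u0425",
})

CYRILLIC = re.compile(r"[\u0400-\u04FF]")
LATIN = re.compile(r"[A-Za-z]")

def fix_word(word):
    """Replace Latin look-alikes in a word that mixes both scripts."""
    if not (CYRILLIC.search(word) and LATIN.search(word)):
        return word  # purely Latin or purely Cyrillic: leave untouched
    fixed = word.translate(LATIN_TO_CYRILLIC)
    # Keep the original if some Latin letter has no Cyrillic look-alike.
    return word if LATIN.search(fixed) else fixed

def fix_text(text):
    """Apply fix_word to every word in a string."""
    return re.sub(r"\w+", lambda m: fix_word(m.group()), text)

# A Cyrillic word whose last letter was typed as Latin "o".
mixed = "\u043c\u043e\u043b\u043e\u043ao"
print(fix_text(mixed) == "\u043c\u043e\u043b\u043e\u043a\u043e")
```

A real run would still need human review of each change, since a link may legitimately be in Latin script.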

Finished. There still remain other linking templates that might need cleanup. [Edit: Finished the most common etymology templates.] — Eru·tuon 21:28, 18 April 2019 (UTC)

@Erutuon Great work, thank you! You may want to do the same with Arabic (partial) homoglyphs with Arabic, Persian, Urdu, etc. If it's still required. --Anatoli T. (обсудить/вклад) 04:25, 19 April 2019 (UTC)
@Atitarev: Yes, I think I should do that. I'm familiar with Arabic, but not very familiar with Persian or Urdu. What characters should I be looking for in each language and replacing? In Persian, it looks like ك‎ (Arabic letter kaf) and ي‎ (Arabic letter yeh) would be incorrect, since Persian uses ک‎ (Arabic letter keheh) and ی‎ (Arabic letter Farsi yeh) instead. If that's right, I can look for the non-Persian character in Persian linking templates and replace it with the Persian one. (Here's the working list.) — Eru·tuon 05:01, 19 April 2019 (UTC)
(edit conflict) @Erutuon: ك‎ (Arabic letter kāf) and ي‎ (Arabic letter yāʾ), ى‎ (Arabic letter ʾalif maqṣūra), and ک‎ (Persian letter kâf) and ی‎ (Persian letter ye) are exactly the letters to look for. They are partial homoglyphs because they look identical only in certain positions; copypasta and wrong keyboards cause the common misspellings. Urdu uses the Persian ک‎ and ی‎. The Arabic ي‎ is also used in Pashto, but Pashto uses the Persian ک‎. These are the most common errors, which can be checked without any deeper knowledge of these languages and the spelling rules. Another thing to check is whether letters specific to one language are used in another, e.g. the Arabic ة‎ (tāʾ marbūṭa) is normally not used in other languages, or only in extremely rare cases, just as specific Persian letters can occasionally be used in standard Arabic or dialects. --Anatoli T. (обсудить/вклад) 06:02, 19 April 2019 (UTC)
Okay, thanks! I've added instances of alif maqsūra to the Persian list, and tāʾ marbūta as well, though the latter I will just let others deal with. — Eru·tuon 06:46, 19 April 2019 (UTC)
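The per-language checks discussed above could be sketched in Python roughly as follows. This is only an illustration: the table layout and function names are hypothetical, and treating alif maqṣūra as a stand-in for the Persian ye is an assumption of the sketch that would need review case by case.

```python
KAF, KEHEH = "\u0643", "\u06a9"          # Arabic kaf, Persian/Urdu kaf
YEH, FARSI_YEH = "\u064a", "\u06cc"      # Arabic yeh, Persian/Urdu ye
ALEF_MAKSURA = "\u0649"

# Characters considered suspect in each language, with a replacement.
# Pashto keeps the Arabic yeh but uses the Persian kaf.
SUSPECTS = {
    "fa": {KAF: KEHEH, YEH: FARSI_YEH, ALEF_MAKSURA: FARSI_YEH},
    "ur": {KAF: KEHEH, YEH: FARSI_YEH},
    "ps": {KAF: KEHEH},
}

def find_suspects(text, lang):
    """List (position, character) pairs that look wrong for `lang`."""
    table = SUSPECTS.get(lang, {})
    return [(i, ch) for i, ch in enumerate(text) if ch in table]

def normalize(text, lang):
    """Replace suspect characters with the expected ones for `lang`."""
    return text.translate(str.maketrans(SUSPECTS.get(lang, {})))

# A word typed with the Arabic kaf where Persian expects keheh.
word = "\u0643\u062a\u0627\u0628"
print(find_suspects(word, "fa"))
print(normalize(word, "fa") == "\u06a9\u062a\u0627\u0628")
```

Listing suspects separately from replacing them leaves room for the manual review step, for example for tāʾ marbūṭa, which is deliberately left out of the replacement tables here.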

Finding multiword terms when searching for one of the words?[edit]

I had an interesting situation just now, where a friend used the term Gish gallop. I had no idea what that meant. I tried looking up the unfamiliar word gish, but the definition there made no sense and didn't help me understand what was being said at all. Of course, I didn't realise that this was a multiword term, and doing what is most natural in the situation (looking up the one word I didn't know) gave me nothing. Eventually she explained it to me and then I realised that it's a combination of two words I needed to look up, which then led me to the right entry. But in itself, there was nothing to hint that this was an idiomatic combination and Wiktionary wasn't helpful in getting me where I needed to be. I'm guessing I'm not the only one to have this problem. Is there anything we can do to improve it? —Rua (mew) 23:53, 18 April 2019 (UTC)

Solution: you consider that the element "gish" might be capitalised, go to Gish, and see the link to the derived term. —Μετάknowledgediscuss/deeds 03:18, 19 April 2019 (UTC)
Because I frequently search for unlinked taxonomic names (including one-part names), I have the habit of searching for entries that merely 'contain' my search term. That kind of search yields Gish gallop as the third item on the search results. DCDuring (talk) 10:03, 19 April 2019 (UTC)

"What means X?"[edit]

Hi Wiktionary! I know a German guy and when he doesn't understand a word in English he asks "what means X?". I was trying to explain to him that you have to say "what does X mean?", because "what means X?" sounds like you are asking for a word whose definition is X (although, admittedly, people will probably understand it because the other interpretation is too weird). I don't speak any German and I found it totally impossible to explain to him what the difference is, and why "what means X?" is wrong. (I believe the grammatical term for English is "do-support", but this isn't a guy who will go reading a lot of grammar.) Could anyone help me explain this to him? A few short sentences of German that explain the difference would be absolutely fantastic. Equinox 03:12, 19 April 2019 (UTC)

I'm not great at saying it in German, but What means X? has what as subject (like the nominative case) and X as direct object (like the accusative case), and What does X mean? is the other way around, with X as subject and what as direct object. — Eru·tuon 04:14, 19 April 2019 (UTC)
I understand the issue grammatically, but I would like to explain it to a German who doesn't know or care about grammar, but thinks "what X means?" and "what does X mean?" are identical. Maybe it would be good to have those two sentences translated very literally into German. Equinox 04:48, 19 April 2019 (UTC)
Hmm, well, the German word for what (was) doesn't have distinct nominative and accusative forms, but if you replace X with you and translate, you get Was bedeutest du? ("What do you mean?") for What does X mean? but Was bedeutet dich? ("What means you?") for What means X?, which seems just as weird as the English. — Eru·tuon 05:10, 19 April 2019 (UTC)
Tell the German speaker that this is a present simple question, for which the auxiliary verb do is required. The German translation, of course, would be "was meint X?" so it would be easy to translate that as "what means X?" --I learned some phrases (talk) 09:50, 19 April 2019 (UTC)
In German the phrase is "was bedeutet X?" (not "was meint X?"), in English you need to add "do" to interrogative sentences, unless the question is about the subject, e.g. "what makes this sound?" - "was macht diesen Laut?" or "who is speaking?"/"who speaks?" - "wer spricht?". --Anatoli T. (обсудить/вклад) 11:56, 19 April 2019 (UTC)
It sounds like he won't care why. You could just tell him that when using means (or meant) in a question the known thing should always be first and the unknown is last. Hence, "This means what?" Now how to put that in German, I have no idea. -Mike (talk) 07:47, 20 April 2019 (UTC)
This web page in German gives a very concise but readable summary (by way of examples) of the main rules of English grammar, including the word order in questions. The example that matches your friend’s problem the most closely is, “What does she watch everyday?”. The page only states what the rules are, not why they are as they are.  --Lambiam 10:41, 20 April 2019 (UTC)

Eye dialect (again)[edit]

I know that there have been discussions previously about the use of the label "eye dialect" within Wiktionary, and, especially, whether it is correct to use the term to refer to "pronunciation spellings" that are intended to mimic a nonstandard pronunciation. Since the last time I looked, I think, the following additional definition has been added at eye dialect:

2. (more broadly) Nonstandard spelling which indicates nonstandard pronunciation.

Is everyone happy with this definition, and happy that it should be applied within Wiktionary, such that, just to give one random example, geddit should be labelled "eye dialect"? Mihia (talk) 03:35, 22 April 2019 (UTC)