Wiktionary:Beer parlour

Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!

June 2017

Last remaining private use area characters

The following pages have private use characters: , proposition, 슴새, bên, 배다, xǔxǔ, and 癩 (a redirect). Can we clean these up? DTLHS (talk) 05:06, 1 June 2017 (UTC)

Out of curiosity, did you check all namespaces, or just the mainspace? Just the other day, I changed a PUA character in a sortkey of Module:languages/data3/t! - -sche (discuss) 05:10, 1 June 2017 (UTC)
Just mainspace. DTLHS (talk) 05:15, 1 June 2017 (UTC)
@-sche Here's the entire site if you want to see it. DTLHS (talk) 05:36, 1 June 2017 (UTC)
Thanks! The uses in userspace and some other places can probably be left as-is, but e.g. the uses in Template:IPAsym look like ones we should check. In one case, I see that the character has since been added to Unicode. - -sche (discuss) 18:26, 1 June 2017 (UTC)
All done. Wyang (talk) 07:33, 1 June 2017 (UTC)

Conventions for Egyptian

@Furius, Hyarmendacil, Strabismus, CAmbrose I’ve been doing some work with Egyptian and have come across a number of problems that aren’t yet addressed or could do with changing in WT:About Egyptian, so I thought I’d ask about them to try to come to some sort of consensus before going ahead with implementing anything unilaterally. (Unfortunately I think all the Egyptologically inclined editors I pinged are inactive, but I figured it’s worth a try.) The proposals I would make are these:

  • Distinguish between s and z instead of merging them both into s like we (nominally) do now. The phonemes merged by Middle Egyptian, but they were still separate in Old Egyptian, and most authors still make the distinction, from Allen to the Wörterbuch. Even Faulkner’s dictionary of Middle Egyptian puts the variants with z in parentheses where appropriate. On Wiktionary, Egyptian covers Old Egyptian as well as Middle, so that separating z out makes more sense, and there are already a few entries that have z in defiance of the current policy.
  • Change 3 to Ꜣ in transliteration. This is the correct Unicode codepoint dedicated to Egyptological aleph, and font support has become good enough that I think most people can see it. (Is this actually the case?) The current 3 wreaks havoc with the linkify function in templates, which converts it to an unlinked superscript 3 whenever it comes at the end of a word.
  • Change ˤ to ꜥ in transliteration. Not so important, just a switch to the correct codepoint now that it’s probably supported for most people. (Again, can anyone confirm/deny?)
  • Standardize the hieroglyph Z4 (the two dual strokes) and the nisba adjective ending to be a variant of j rather than y. (Right now it is not defined either way.) Both conventions are common (Faulkner and Hoch call it y, while Allen, Loprieno, and the Thesaurus Linguae Aegyptiae call it j), but the convention in most of our entries was to call it j, presumably because most of our editors were reading Allen. I’m proposing that we standardize what already seemed most common.
  • Only use dots (.) to separate morphemes in the case of suffix pronouns etc., not with inflectional suffixes for singular, plural, etc. This is again the convention that already seems most common, presumably because it’s the one Allen follows. I’d be fine with the other common system, too, where equals signs (=) separate out the suffix pronouns and dots separate out all the other morphemes, but consensus seems to favor the one Allen uses.
  • Standardize the use of periods as dots, rather than using interpuncts (·). Again for this, I have no opinions either way, but this seems to be current consensus.

Thoughts? —Vorziblix (talk) 07:56, 1 June 2017 (UTC)

I don't remember when and why our current standards were established, but it is true that nobody working on them is active any longer. I think all your points seem very reasonable, although alternate transliterations should remain as soft redirects to the standardised lemma form using {{egy-alternative transliteration of}}. —Μετάknowledgediscuss/deeds 15:16, 1 June 2017 (UTC)
Most of these proposals seem reasonable; I do not know enough to comment on j vs y. I strongly support avoiding the equals sign =, which causes difficulties if provided in a template (link, etc), and possibly also if we were to devise a module that parses the content of our entries to generate something (as is done in some Thai entries, and recently proposed for Arabic), and possibly also for some re-users of our content, because of what it usually means in template syntax. - -sche (discuss) 18:37, 1 June 2017 (UTC)
@Vorziblix: I can confirm that the Ꜣ and ꜥ characters (as well as ˤ) all display on the machine I am currently using, which is a loaner that does not have obscure font support. —Justin (koavf)TCM 19:44, 1 June 2017 (UTC)
I don't edit Egyptian, but Ꜣ and ꜥ just show up as boxes for me... Andrew Sheedy (talk) 02:22, 2 June 2017 (UTC)

Thanks for all the feedback. Let’s consider the use of the equals sign definitively ruled out, and the soft redirect policy makes sense. Regarding lack of universal support for Ꜣ and ꜥ, I am inclined to think the template issues and Unicode compliance still make the changeover worthwhile, but will defer to consensus if the general opinion is otherwise. In any case I’ll wait a few more days to give more time for comments before making any changes/additions to WT:About Egyptian. —Vorziblix (talk) 04:20, 2 June 2017 (UTC)

The symbols also display correctly for me. If we switch away from ˤ, remember to update that entry, which currently mentions the use of ˤ. I wonder if it would help to switch egy from using Latn as its script, to using Latinx, which I think calls on more comprehensive fonts. - -sche (discuss) 05:20, 2 June 2017 (UTC)
That’s probably a good idea; the characters are in the LATIN-EXTENDED-D block, which is explicitly looked for by Latinx but not by Latn, if I’m interpreting that page correctly. —Vorziblix (talk) 20:26, 2 June 2017 (UTC)
@-sche Since I can’t edit Module:languages/data3/e without admin privileges, would you be willing to make the switch to Latinx? Thanks, —Vorziblix (talk) 20:24, 21 June 2017 (UTC)
Done. - -sche (discuss) 21:24, 21 June 2017 (UTC)
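The point about the Latin Extended-D block can be checked locally. The following is a minimal sketch, assuming only Python's standard `unicodedata` module and the published block range U+A720..U+A7FF; the helper name `describe` is hypothetical and is not part of any Wiktionary module. It reports the code point and Unicode name of the Egyptological alef and ain characters and of the older ˤ, and whether each falls inside Latin Extended-D.

```python
import unicodedata

# Latin Extended-D covers U+A720..U+A7FF in the Unicode block chart.
LATIN_EXTENDED_D = range(0xA720, 0xA800)


def describe(ch: str) -> tuple[str, str, bool]:
    """Return (code point, Unicode name, inside Latin Extended-D?) for one character."""
    cp = ord(ch)
    return (f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>"), cp in LATIN_EXTENDED_D)


# Capital and small Egyptological alef, small Egyptological ain,
# and the modifier letter previously used for ain.
for ch in "\uA722\uA723\uA725\u02E4":
    print(ch, *describe(ch))
```

A check like this only says which block a character belongs to; whether a reader's fonts actually cover that block still has to be tested on real machines, as in the replies above.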
I am very inactive, sadly, but all of these are reasonable. The first point (s/z) is the one that is most likely to cause issues, I think. What would the new rule mean for e.g. nts? So far as I'm aware (and I confess that I have never studied old Egyptian) the alternative reading only comes into existence once z and s had merged. Would we create a new article for ntz? Furius (talk) 15:57, 2 June 2017 (UTC)
The lemma form would definitely be at nts, since it derives from the fem. sing. 3p. suffix pronoun .s, which was s all through Old Egyptian; if ntz exists at all, it should just be an ‘alternative form of…’ entry, but it’s probably unneeded. In general words that first appear after OE and never had the s/z distinction would be lemmatized with ‘s’; words that had ‘z’ in OE would be at the ‘z’ variant, but with an ‘alternative form’ entry at the ‘s’ variant; and words that had ‘s’ in OE would straightforwardly be at the ‘s’ variant. That way anyone searching by ME form could always search with ‘s’, and anachronistic readings like ntz wouldn’t be necessary.
By the way, thanks for your work with verb conjugations; it helps a lot with making templates! —Vorziblix (talk) 20:26, 2 June 2017 (UTC)
As further potentially useful information, at the bottom of this page there’s a discussion on the principles used by Allen for transliteration; regarding s vs. z they line up pretty well with my suggestion above: ‘If the hieroglyphic writing of a particular instance of a word has z but the original spelling had s, we transliterate the consonant as s,… but vice versa, if a particular instance has s but the original spelling had z, we transliterate the consonant as s.… We transliterate as z only if both the original spelling and the particular instance have z.’
Some of the other discussions there might also be good to consider, particularly the question of whether to hyphenate compound words, where we don’t seem to have any consistent policy. —Vorziblix (talk) 01:41, 3 June 2017 (UTC)
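The lemmatization rule described above can be written down as a small decision function. This is only a sketch of the rule as stated in this thread, not an existing module: the function name `plan_entries`, its arguments, and the returned dictionary layout are all assumptions made for illustration, and the inputs are example transliterations supplied by hand.

```python
from typing import Optional


def plan_entries(merged_form: str, old_egyptian_form: Optional[str]) -> dict:
    """Decide where the lemma and any 'alternative form' entry would go.

    merged_form: the transliteration with s and z merged into s.
    old_egyptian_form: the Old Egyptian transliteration, or None if the
        word first appears after Old Egyptian.
    """
    # Words unattested in Old Egyptian, or attested there with s, stay at s.
    if old_egyptian_form is None or "z" not in old_egyptian_form:
        return {"lemma": merged_form, "alternative_forms": []}
    # Words with z in Old Egyptian are lemmatized at z, with the s spelling
    # kept as an alternative form so that a search with s still succeeds.
    return {"lemma": old_egyptian_form, "alternative_forms": [merged_form]}


print(plan_entries("nts", None))
print(plan_entries("s\uA723", "z\uA723"))
```

Under this sketch, nts stays at nts with no ntz entry, while a word with z in Old Egyptian gets its lemma at the z spelling plus one alternative-form entry at the s spelling.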

Barring further objections, I’ll start moving ahead with standardizing entries. —Vorziblix (talk) 05:49, 5 June 2017 (UTC)

Enable sitelinks on Wikidata for Wiktionary pages (outside main namespace)

Hello all,

Here is some important information about the evolution of Wiktionary sitelinks in the coming weeks.

Short version: From June 20th, we are going to store the interwiki links of all the namespaces (except main, user and talk) in Wikidata. This will not break your Wiktionary, but if you want to use all the features, you will have to remove your sitelinks from wikitext and connect your pages to Wikidata.

Long version available and translatable here.

If you have any questions or concerns, feel free to ping me.

Thanks, Lea Lacroix (WMDE) (talk) 08:23, 1 June 2017 (UTC)

@Lea Lacroix (WMDE) What do you mean by "all the namespaces"? There are many custom namespaces (Appendix, Concordance, Index, Rhymes...) Note the discussion at Wiktionary:Beer parlour/2017/May#Use "Cognate" to link between citation pages. --Vriullop (talk) 08:59, 1 June 2017 (UTC)
The namespaces you mention can be stored in Wikidata as well. The namespace Rhymes, for example, has an equivalent on German Wiktionary, so it will be possible to make links between the pages. About Citations: we're going to investigate to see if it is more relevant to make automatic links with Cognate, or centralized links in Wikidata. Lea Lacroix (WMDE) (talk) 09:08, 1 June 2017 (UTC)
For what it's worth, the Portuguese Wiktionary has a Rhymes namespace too, but it's virtually unused. It has only one page, which is obviously a stub: pt:Rimas:Inglês. But I suppose it wouldn't hurt to link it to our Rhymes:English through interwikis. --Daniel Carrero (talk) 13:36, 1 June 2017 (UTC)
Citations pages match their connected main-namespace pages. An exception is when citations are centralized on one page; one wiki might choose to centralize citations of both "have got someone's back" and "have someone's back" on Citations:have someone's back, where another wiki might centralize them on its equivalent of Citations:have got someone's back ... but one wiki might choose to lemmatize one of those phrases in the main namespace, too, where another wiki might lemmatize the other phrase ... and hard or soft redirects might or might not exist ... much like if a certain discussion happened to take place only on Talk:have got someone's back and not Talk:have someone's back. So IMO it makes as much sense to handle Citations pages via the Cognate extension, as it does to handle main-namespace and talk pages. Interwiki links between citations pages seem not very useful, anyway. - -sche (discuss) 05:48, 2 June 2017 (UTC)
I believe Cognate is the better option for Citations pages. Please see my reply at Wiktionary:Beer parlour/2017/May#Use "Cognate" to link between citation pages. --Daniel Carrero (talk) 11:11, 2 June 2017 (UTC)
What about categories? Most Wiktionaries have the same category for each language, so adding each of them individually to Wikidata is a huge waste of effort. There's no difference between Category:English nouns and Category:Dutch nouns other than the language, they should be treated as the same thing. —CodeCat 18:39, 1 June 2017 (UTC)
This phase is the same as the one done for Wikipedia and other projects, so any interwiki link existing on any page will be exported to Wikidata, except user pages, talk pages and the main space, which is handled by Cognate. That means Wiktionary: pages, categories, templates, modules, etc., including the custom namespaces I asked about. You do not have to add them individually; it is a mass export. So Category:English nouns will continue linking to nl:Categorie:Zelfstandig naamwoord in het Engels, etc., but it will be easier to maintain in a centralized place. Any interwiki link in one language project will appear in all wikts. Any page renamed or deleted will update interwikis immediately (as Cognate currently does). See d:Q7923975. I suppose that besides Wikipedia, Wikibooks, etc., Wiktionary should appear there with Category:English language, nl:Categorie:Woorden in het Engels and all wikt interwikis. Apart from interwiki links for Wiktionary, you can access equivalent links in other projects, if any. --Vriullop (talk) 06:53, 2 June 2017 (UTC)
About CodeCat's "There's no difference between Category:English nouns and Category:Dutch nouns other than the language ..." This only applies to a small minority of categories in the English Wiktionary. Yes, if we had a centralized database of languages, we could then generate a list of "Category:(language) nouns" in the English Wiktionary, but do many Wiktionaries use a predictable system for their nouns categories, and all other categories? Their category systems might unpredictably change at some point, and some existing and future Wiktionaries might not have figured standards for categories yet. Maybe Wikidata is not the most perfect conceivable system for listing our category interwikis using little storage space but at least it seems to be as flexible as needed. --Daniel Carrero (talk) 11:20, 2 June 2017 (UTC)
@Daniel Carrero: Individual Wiktionaries moving things is an argument for a central repository at Wikidata. If they get moved on the (e.g.) Dutch Wiktionary, then the links will be instantly and automatically updated at the (e.g.) Swahili Wiktionary. —Justin (koavf)TCM 15:53, 2 June 2017 (UTC)
It just occurred to me that we would probably want Category:English language directly connected to d:Q1860, and similar for the main category of other languages. Would this be done as part of this change? —CodeCat 19:55, 11 June 2017 (UTC)
d:Q7923975 is the concept "Category:English language" in multiple Wikimedia projects, including Wikiversity, Wikibooks and Wikisource. Its value "category's main topic" is d:Q1860. --Daniel Carrero (talk) 20:01, 11 June 2017 (UTC)
Hmm, I can't say I understand why there's this distinction. Wikipedia's page w:English language is really not about anything different than our Category:English language. —CodeCat 20:02, 11 June 2017 (UTC)
Probably just to store interwikis for the same category in multiple Wikipedia languages: w:en:Category:English language, w:nl:Categorie:Engels... You get the point. This, in addition to the "Category:English language" in Wikibooks, Wikisource etc. as mentioned above. --Daniel Carrero (talk) 20:06, 11 June 2017 (UTC)
But it has a much more prominent role on Wiktionary than on any of those other sites. It's our main page about English, just like w:English language is Wikipedia's. —CodeCat 20:11, 11 June 2017 (UTC)
See for example English Wikibooks. b:Subject:English language is connected to d:Q1860 and b:Category:Subject:English language to d:Q7923975. Accordingly, Wiktionary:About English should connect to Q1860 and Category:English language to Q7923975. --Vriullop (talk) 12:01, 12 June 2017 (UTC)
I agree with Vriullop. Our Category:English language is pretty awesome, with the description full of diverse English-related links, but its main purpose is still just being the main category about English. Wikidata controls interwikis, and the category interwikis should probably be between categories only. --Daniel Carrero (talk) 12:32, 12 June 2017 (UTC)
Is there a discussion page or something else about these "category" items, that might explain their purpose and thus help decide whether Category:English language should be in there? —CodeCat 20:13, 11 June 2017 (UTC)
@CodeCat: I don't know about any discussions, but as I'm sure you know, apparently anything that exists in separate Wikipedia pages also may have separate Wikidata items for the sake of storing interwikis. --Daniel Carrero (talk) 14:07, 14 June 2017 (UTC)
To be more exact, apparently any page in some Wikimedia projects can have its own Wikidata item to store interwikis. For example, d:Q30237873 is the recent Wikinews article "Theresa May's Conservative Party wins UK election but loses majority, leaving Brexit plan in question". --Daniel Carrero (talk) 06:58, 19 June 2017 (UTC)

Cognate & redirects

Hello again,

Several people mentioned the idea of Cognate linking to redirection pages. This is a complex issue that should be decided by consensus among the different language communities. To complement the discussions that have been running on Phabricator and gather more points of view, I created a discussion topic here. Feel free to add a comment. Thanks, Lea Lacroix (WMDE) (talk) 13:13, 1 June 2017 (UTC)

lexicographic approach to learning

One of the main uses of a lexicographic resource is educational learning, so it would enrich Wiktionary to create a discussion room for asking advanced users for advice about the learning process itself. The feedback would help uncover current weaknesses and so improve Wiktionary as a whole. --Backinstadiums (talk) 15:07, 1 June 2017 (UTC)

But it still leaves the learning process with an excessively narrow focus on people like us. We are already pretty good at taking ourselves as model users. We need more than a convenience sample of active, interested users of Wiktionary. Any thoughts on how to get that? DCDuring (talk) 16:45, 1 June 2017 (UTC)
@Backinstadiums, can you give an example of what kind of discussion you would expect to see in such a page? — Ungoliant (falai) 01:31, 2 June 2017 (UTC)

Personal advice from those who are already fluent, what they'd do differently knowing what they know now, mistakes to be avoided, grammatical aspects which the entry of a certain term doesn't clarify, etc. For example, strategies to learn Chinese characters and their respective pronunciation (homophones), or Arabic broken plurals (unpredictability). --Backinstadiums (talk) 06:02, 2 June 2017 (UTC)

I'm very much interested in having a permanent discussion around this. Are you interested in learning strategies in general, or in the role of Wiktionary? I think a more general learning discussion might be a bit off-topic here (there are already plenty of resources on the web); I'm more interested in how Wiktionary can be part of the learning process. I recently ran a workshop “How to learn a language by building a dictionary”. Making / editing entries forces you to think about the language in a structured and formal way, plus you can include your own material (photos you have taken, example sentences you came across, citations from books you read etc.). There was interest in it but the problem is (again and again) that Wiktionary is still unknown and that contributing is difficult. – Jberkel (talk) 07:32, 17 July 2017 (UTC)
I would think that it would make sense to have a project page in the Wiktionary namespace which interested users could put on their watchlists. Any issues that emerge that would benefit from a wider audience of Wiktionarians could be brought to WT:BP. DCDuring (talk) 12:25, 17 July 2017 (UTC)

Parameters of Template:quote vs Template:quote-book etc.

Currently we have {{quote}}, which is just for formatting the text of a quote much like {{usex}}. Then we have {{quote-book}} and its relatives, which also show the source info above the quote, and are a lot more elaborate and contain many parameters. {{quote}} is quite easy to use, since it works the exact same way as {{usex}} and people are very familiar with that. I think it would be desirable if the other set of templates could be modified so that they are more compatible with {{quote}}, so that the more elaborate templates are easier to use for people (like me) who are used to {{quote}} and {{usex}}. I'm thinking mainly of the parameters: first parameter is language, second is quote, third is translation. Would this be ok? —CodeCat 18:56, 1 June 2017 (UTC)

I'm OK with that. You might have to do a lot of cleanup though, since language codes aren't required in {{quote-book}} right now, even if there is a translation. DTLHS (talk) 18:58, 1 June 2017 (UTC)
Yes, and I believe it accepts language names rather than language codes too. This is a drawback of these quote templates currently: they don't tag the quoted text, which {{quote}} does. —CodeCat 19:01, 1 June 2017 (UTC)
Pinging @Smuconlaw as a person who has worked a lot on these templates. —CodeCat 20:37, 1 June 2017 (UTC)
I'm not very clear as to what is being suggested. All the {{quote-}} family of templates can be used with positional parameters for the basic parameters; for example, {{quote-book|[year]|[author]|[title]|[url]|[page]|[passage]}}. I don't understand what "tagging the quoted text" entails, nor why adding a language code as the first parameter is needed. The {{quote-}} (and {{cite-}}) templates have not hitherto had a language code parameter, so if such a parameter is added a bot would need to update all the uses of the template. — SMUconlaw (talk) 20:29, 2 June 2017 (UTC)
All templates on Wiktionary that display words in another language will wrap text in a bit of HTML that indicates the language of the text, by convention using either the first parameter to indicate the language, or the lang= parameter, depending on the template. However, this is missing for the quote- templates. They have a language parameter named language=, but it's optional, and it's provided with a language name rather than a language code. It doesn't follow the conventions of other templates. What I would like is if the first three parameters of the quote- templates could be the same as those of {{quote}}, with the others being named parameters: {{quote-book|[language code]|[passage]|[translation]|title=[title]|url=[url]|author=[author]|page=[page]|year=[year]}}. The current transliteration= parameter would be renamed to tr=, again to match other templates. —CodeCat 20:36, 2 June 2017 (UTC)
I guess I have no strong objection to changing the first three positional parameters so long as a bot can come along and carry out all the required changes to pages where the templates have been used. However, I'm afraid I have no experience with adding language tagging to templates. What is such tagging for, actually, and which text is tagged – the quotation? — SMUconlaw (talk) 21:11, 2 June 2017 (UTC)
Script tagging goes along with language tagging and allows the text to be displayed in appropriate fonts, which are specified in MediaWiki:Common.css. This is particularly important for non-Latin scripts, or Latin scripts containing unusual characters. Language tagging (I hear) tells screen readers which language the text is written in, allowing them to read it correctly. Script tagging is generally added at the same time as language tagging is added, by Module:script utilities. Linking templates like {{l}} and {{m}} add script and language tagging using this module, as well as {{usex}} and {{lang}}. — Eru·tuon 22:00, 2 June 2017 (UTC)
If you want to see it in action, go to Special:ExpandTemplates and type in {{quote|en|this is a quote}}. —CodeCat 22:04, 2 June 2017 (UTC)
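To make the proposed parameter change concrete, here is a rough sketch of the reordering a bot would have to perform, going from the positional order quoted above (year, author, title, url, page, passage) to the proposed order (language code, passage, translation, then named parameters). It is not an actual bot and does not parse wikitext: the function name `convert_quote_book` is hypothetical, the values are assumed to be already split out of the template call, and the language code has to be supplied by the caller because the old calls do not carry one.

```python
OLD_ORDER = ("year", "author", "title", "url", "page", "passage")
NEW_NAMED_ORDER = ("title", "url", "author", "page", "year")


def convert_quote_book(old_values, lang_code, translation=""):
    """Rebuild a quote-book call in the proposed parameter order.

    old_values: positional values in the old order; trailing ones may be absent.
    lang_code: language code to place first (not present in old calls).
    translation: optional translation for the third positional slot.
    """
    if len(old_values) > len(OLD_ORDER):
        raise ValueError("more positional values than the old order defines")
    fields = dict(zip(OLD_ORDER, old_values))
    parts = ["quote-book", lang_code, fields.get("passage", "")]
    if translation:
        parts.append(translation)
    # Empty values are dropped rather than emitted as "url=".
    parts += [f"{name}={fields[name]}" for name in NEW_NAMED_ORDER if fields.get(name)]
    return "{{" + "|".join(parts) + "}}"


print(convert_quote_book(
    ["1892", "A. Author", "Some Title", "", "12", "An example passage."], "en"))
```

One limitation worth keeping in mind: a passage containing an equals sign or a pipe would need escaping or an explicit numbered parameter, which this sketch does not attempt.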

A category for words like legged and learned

I’d like us to have a category for words with -ed pronounced a full -èd where -’d is expected (in other words, it wouldn’t include words like pitted and modded), but I can’t think of a good, accurate name. Any ideas? — Ungoliant (falai) 20:28, 1 June 2017 (UTC)

Words with unexpected syllabic -ed? — Eru·tuon 20:37, 1 June 2017 (UTC)
Are you talking about homographs with different standard pronunciations and distinct meanings, but not distinct etymologies? Some more: aged, cussed, dogged. Would you include words that just had a different (presumably standard) pronunciation of the -ed, like alleged? DCDuring (talk) 22:58, 1 June 2017 (UTC)
Why not something like Category:English heteronyms (-ed). One could see how to generalize the category name to other morphemes or morpheme groups, though the scheme would probably not work for all heteronyms in all languages. DCDuring (talk) 23:02, 1 June 2017 (UTC)
Hmm, heteronym apparently means a word with a different pronunciation and meaning. The category we're speaking of should only relate to pronunciation. — Eru·tuon 23:33, 1 June 2017 (UTC)
legged and learned have common etymologies and related meanings, but not identical meanings. alleged is, I think, the only one of those so far mentioned with the same meaning for both pronunciations. DCDuring (talk) 23:49, 1 June 2017 (UTC)
I assumed from the original post that the category would only relate to pronunciation, because @Ungoliant MMDCCLXIV didn't mention meaning at all. — Eru·tuon 00:31, 2 June 2017 (UTC)
Yes, I’m talking about a category for words that have -ed but don’t follow the suffix’s usual pronunciation rule (as Erutuon explains below), regardless of the relationship between its senses and regardless of whether the same word is pronounceable in a way that follows the rule. — Ungoliant (falai) 01:30, 2 June 2017 (UTC)
For more context, the rule is that the suffix -ed is pronounced /d/ or /t/ after most consonants, but /ɪd/ or /əd/ after /d/ or /t/. (The difference between /ɪd/ and /əd/ is dialectal; some dialects have one, some have the other.) So the category would be for words in -ed that don't follow this rule, that should have /d/ or /t/, but have /ɪd/ or /əd/ instead. — Eru·tuon 00:31, 2 June 2017 (UTC)
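The membership test for the proposed category follows directly from the rule just stated. The sketch below is illustrative only: the function name and its two arguments are assumptions, and the phonemes are supplied by hand rather than read from any entry. A word qualifies when its -ed is syllabic (/ɪd/ or /əd/) even though the stem does not end in /t/ or /d/.

```python
SYLLABIC_ED = {"ɪd", "əd"}      # dialectal variants of the syllabic suffix
TRIGGERS_SYLLABIC = {"t", "d"}  # stem-final phonemes after which it is regular


def has_unexpected_syllabic_ed(stem_final_phoneme: str, ed_pronunciation: str) -> bool:
    """True if -ed is pronounced as a full syllable where /d/ or /t/ is expected."""
    is_syllabic = ed_pronunciation in SYLLABIC_ED
    syllabic_is_regular = stem_final_phoneme in TRIGGERS_SYLLABIC
    return is_syllabic and not syllabic_is_regular


print(has_unexpected_syllabic_ed("ɡ", "ɪd"))  # legged, two-syllable reading
print(has_unexpected_syllabic_ed("t", "ɪd"))  # pitted
print(has_unexpected_syllabic_ed("n", "d"))   # learned as a past tense
```

The first call qualifies for the category, while the other two follow the regular rule and do not.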

Here's a comprehensive summary for -edness, -edly . --Backinstadiums (talk) 05:58, 2 June 2017 (UTC)

Category:English terms with unexpected syllabic -ed it is. — Ungoliant (falai) 20:57, 7 June 2017 (UTC)
I don't like the word "unexpected". Is there a better way to exclude -ted and -ded? --WikiTiki89 21:30, 7 June 2017 (UTC)
As a native speaker I certainly expect them or at least allow for their possibility, except for the ones based on misspellings (wretch) or with one of the pronunciations being for a rare word, usually a verb (eg, sacre).
Do I understand correctly that the motivation for the category is that someone who encountered in writing the word being used in a sense which commonly required the separate syllable for -ed would mispronounce it, having insufficient experience hearing it? If so, this seems a much better Appendix than a category. A usage note containing a link to such appendix would be more useful than a category. I think an appendix would allow for more flexibility (and length) in its title. DCDuring (talk) 00:34, 8 June 2017 (UTC)
I think the usefulness is simply that someone may be interested in seeing a list of words in this category, which is exactly what categories are for. --WikiTiki89 15:27, 8 June 2017 (UTC)
@Wikitiki89 feel free to rename it to anything you think will be accepted. — Ungoliant (falai) 15:23, 8 June 2017 (UTC)

template:univerbation and Category:Univerbations

I think we're missing this. For example, зачем (začem) is not a prefixation, it's really the preposition за (za) with its instrumental regimen чем (čem) (compare French pourquoi). --Barytonesis (talk) 12:05, 2 June 2017 (UTC)

Wouldn't that fall under {{compound}}? -- GianWiki (talk) 11:07, 16 June 2017 (UTC)
Sounds like it would be a subtype of compound: one in which the elements were formerly separate words in a phrase. — Eru·tuon 18:23, 16 June 2017 (UTC)
Sounds like a great idea. It would be useful in the Greek etymon of ephemeral, for instance. — Eru·tuon 18:23, 16 June 2017 (UTC)
@Erutuon: I went ahead and created Category:Univerbations by language as well as Category:Russian univerbations. I'd like to create {{univerbation}}, but my ineptness at computing prevents me from doing anything other than copy-pasting the code of other templates; yet I don't want to copy-paste {{back-formation}} or {{doublet}} or {{unknown}} and so on, because I understand they're all using obsolete parameters like "lang", and will eventually be updated. --Barytonesis (talk) 19:43, 27 June 2017 (UTC)
@Barytonesis: I was going to make {{univerbation}}, but I'm not quite sure what parameters it would need. For instance, how would you like to encode the example above, зачем (začem)? Something like {{univerbation|ru|за чем}}, or {{univerbation|ru|за|чем}}? — Eru·tuon 04:27, 29 June 2017 (UTC)
@Erutuon: Good question. Several thoughts (I will depart a little from your question, sorry about that):
  • Do we plan to eventually create categories for compounds like Category:English terms derived from "apple", as it was suggested here? If so, then I guess we could do the same for univerbations (and in that case, the code would have to be {{univerbation|ru|за|чем}}, wouldn't it?). But that idea strikes me as inappropriate in the case of univerbations (Category:Russian univerbated syntagms containing за doesn't make much sense); and makes me actually think that univerbations are rather a sister category to compounds than a subtype of them.
  • In any case, I think that no "+" sign should appear in the output: both markups should give "за чем" rather than "за + чем".
  • For these two reasons, right now I don't really feel strongly one way or the other.
  • As I mentioned in this vote, I'd rather have no automated text at all in etymology templates, and type "Univerbation of" in plain text. What is your opinion about that? --Barytonesis (talk) 10:17, 29 June 2017 (UTC)
If we did want to add categories for univerbations containing individual words, it would be possible to do that with either {{univerbation|ru|[[за]] [[чем]]}} or {{univerbation|ru|за|чем}}, but I think {{univerbation|ru|за|чем}} would make it easier. Also, {{univerbation|ru|[[за]] [[чем]]}} is more prone to human error: it would be easy for an editor to fail to link the words. I suppose {{univerbation|ru|за чем}} is another option; it could be linked in the same way that {{head|en|phrase}} adds links for each word automatically.
I agree about having no plus sign between the constituent words, since they form a natural phrase.
I was going to say that if we use {{univerbation|ru|за|чем}}, there might be the annoyance of having separate |altN= parameters for each word, to add accent marks to Russian words and macrons to Latin words (and so on). For instance, the univerbation ἐφήμερος (ephḗmeros) would have to be encoded as {{univerbation|grc|ἐπί|alt1=ἐπῐ́|ἡμέρα|alt2=ἡμέρᾱ}}. But actually, it could just be written as {{univerbation|grc|ἐπῐ́|ἡμέρᾱ}}, and the linking module would automatically generate the actual entry titles.
I'm sympathetic to not having manual text "univerbation", but it might be a good idea to link to a definition in Appendix:Glossary, since few people are likely to know the term, so I'm not sure. — Eru·tuon 18:45, 1 July 2017 (UTC)
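The automatic linking described above can be sketched in Python. This is only an illustration, not the actual linking module: the `STRIP_MARKS` table and the functions `entry_title` and `univerbation` are hypothetical names, and the set of marks stripped for each language is an assumption.

```python
import unicodedata

# Hypothetical table (an assumption for illustration): combining marks that
# appear in display forms but not in entry titles, per language code.
STRIP_MARKS = {
    "grc": {"\u0304", "\u0306"},  # macron, breve (vowel length marks)
    "la": {"\u0304", "\u0306"},   # macron, breve
    "ru": {"\u0301", "\u0300"},   # acute and grave stress marks
}


def entry_title(lang, display):
    """Derive the entry title from a display form by dropping the marks."""
    marks = STRIP_MARKS.get(lang, set())
    # Decompose so that each diacritic becomes a separate code point.
    decomposed = unicodedata.normalize("NFD", display)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    # Recompose so that retained diacritics (e.g. Russian й) are precomposed.
    return unicodedata.normalize("NFC", kept)


def univerbation(lang, *parts):
    """Link each constituent word and join them with spaces, not plus signs."""
    links = []
    for part in parts:
        title = entry_title(lang, part)
        links.append(f"[[{title}|{part}]]" if title != part else f"[[{part}]]")
    return "Univerbation of " + " ".join(links)
```

With this sketch, `univerbation("ru", "за", "чем")` links each word directly, while a form carrying a macron or breve is piped to its unmarked entry title, so no separate |altN= parameters are needed.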

Proposal: Clean up, rename and replace "en:" → "English" in all categories

(This obviously would need a vote to be implemented.)

In the past, some old categories like Category:es:Japanese derivations and Category:es:Derogatory were renamed to remove the language code. I think this was an improvement. (related votes: Derivations categories, Lexical categories)

Please check if the grammar is OK everywhere. Feel free to make any corrections, suggest any changes or ask any questions.

See also discussion: User talk:-sche#Properly splitting topic and set categories.

Place names (see also WT:Place names for naming conventions, edit that page if needed)

Proposal: ALWAYS add the country at the end when applicable.

English ... jargon

(to avoid questions like: is "Medicine" for medicine jargon or for terms relating to medicine?)

English names of ... (for proper nouns?)
English terms relating to ... (or "pertaining to", or "involving", etc.)

Reason for "relating to" -- the descriptions of most of these categories use "related to"

--Daniel Carrero (talk) 14:16, 2 June 2017 (UTC)

It seems somewhat ok, hesitantly. My biggest gripe at the moment is that it's not Category:English names of cities in Ontario, Canada. The addition of country names is a definite improvement. I'll abstain for the moment on the "relating to" categories. —CodeCat 14:43, 2 June 2017 (UTC)
Oops! I made a mistake and typed Category:English cities in Ontario, Canada without "names of". You made me realize that, and I fixed it in the list above. --Daniel Carrero (talk) 14:45, 2 June 2017 (UTC)
I don't know that I disagree as such but I think this will cause a lot more problems than it solves and it seems like a huge undertaking for very little benefit. —Justin (koavf)TCM 15:53, 2 June 2017 (UTC)
I see. What problems do you think it will cause? FWIW, see Wiktionary:Votes/2017-03/Request categories 2 for another large category renaming project that was voted and approved, and was successfully implemented. --Daniel Carrero (talk) 15:58, 2 June 2017 (UTC)
@Daniel Carrero: The wording of many of these categories. "English terms referring to [X]" or "English terms related to [X]"? "English words for [Y]" or "English language words for [Y]", etc. As it stands now, the scheme is very straightforward: code:Idea. —Justin (koavf)TCM 16:32, 2 June 2017 (UTC)
The current scheme is also ambiguous about whether a category is for words in a given set, or words related to a topic. This has been causing quite a few headaches lately. Is Category:Stars for names of individual stars, words for types of stars, or any words related to stars? What if we want categories for each of these types? Daniel's scheme at least avoids the ambiguity: Category:English names of stars, Category:English terms related to stars. I'm not sure where words for types of stars would go. —CodeCat 16:36, 2 June 2017 (UTC)
Shall we introduce a new type of category to the proposed list? I'm thinking of this:
There's a 2011 vote that concerned what to do with a few categories, including specifically Category:en:Stars, but the approved rule is not being followed right now. According to Wiktionary:Votes/2011-08/Categories of names 2, Category:en:Stars must contain only names of stars, and never other star-related terms like fixed star, quadruple star, quadruple star system, etc. The category description says the same thing. But the name Category:en:Stars does not help, it could easily contain any of those "forbidden" terms.
Another example of a currently confusing category name: In practice, Category:en:Internet contains two things: terms involving the internet (frame, FTP, e-mail...) and terms used on the internet (FWIW, IOW, IYKWIM...). The latter should actually be in Category:English internet slang, but once again the category name doesn't help -- "en:Internet" could mean either of the aforementioned possibilities. If this proposal passes, Category:en:Internet should be renamed to Category:English terms relating to the internet (I guess the "the" fits here, right?). Feel free to discuss different wordings, other than "relating to", but I believe we can always say just "English", never "English language" -- after all, we use Category:English nouns, never Category:English language nouns.
Aside from that, the place name categories need to be cleaned up one way or another, naturally. --Daniel Carrero (talk) 16:53, 2 June 2017 (UTC)
What's the supposed benefit of using language names rather than language codes? Names can be awkwardly long, and are not guaranteed to be unique. Equinox 19:59, 2 June 2017 (UTC)
They are guaranteed to be unique, per WT:LANG. And the benefit is that users can't be expected to learn language codes merely to use Wiktionary. It's an unnecessary barrier. —CodeCat 20:00, 2 June 2017 (UTC)
Yes, they are guaranteed to be as unique in category names as they are in L2 headers. Which means, there could be cases where people assume "Riang" means the Bangladesh language ria when it actually means the Burma language ril, but that shouldn't cause any more headaches with categories than it does with L2 headers. - -sche (discuss) 22:50, 2 June 2017 (UTC)
There are two components to this proposal: changing language code to language name, and changing the names of the categories (the part after the language code or name). Hypothetically the second could be done without the other: that is, there could be a chimeric category name, Category:en:Names of cities in Kyoto Prefecture, Honshu, Japan. I suppose it makes sense to do both at the same time, since (I imagine) thousands of categories will have to be renamed when either change is made. However, some editors might want to keep language codes but have the other part of the category name be changed. (The other option, using language names but not changing the rest of the category name, will not work: Category:en:Sex → Category:English sex is nonsensical.)
On the whole, the category names are clearer, but they will be more difficult to find when one is categorizing, because they are longer. Admittedly, the existing categories are difficult to find too, especially when creating one that doesn't exist for a particular language yet. I wonder if a tool could be created to make this easier. HotCat isn't quite what I'm thinking of. Perhaps something where you could type in a phrase, like "cities kyoto" and find Category:English names of cities in Kyoto Prefecture, Honshu, Japan, or the umbrella category thereof.
Functionally, the only purpose of using language codes rather than names is to distinguish topic or set categories from the categories that use language names (part-of-speech categories especially). If language names are used, the only distinction will be the category names after the language name. So, the two types of categories will be harder to tell apart. — Eru·tuon 21:48, 2 June 2017 (UTC)
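The phrase lookup imagined above ("cities kyoto") could be sketched roughly as follows. This is a minimal Python illustration, not an existing tool: `find_categories` is a hypothetical name, and the matching rule (every query word must occur somewhere in the category name, ignoring case) is an assumption.

```python
def find_categories(query, categories):
    """Return the category names that contain every word of the query."""
    words = query.casefold().split()
    return [
        name for name in categories
        if all(word in name.casefold() for word in words)
    ]


# Category names taken from this discussion.
CATEGORIES = [
    "Category:English names of cities in Kyoto Prefecture, Honshu, Japan",
    "Category:English names of stars",
    "Category:English terms relating to sex",
]

matches = find_categories("cities kyoto", CATEGORIES)
```

Unlike prefix auto-completion, this kind of matching does not require typing the long left-hand part of the name first.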
Category:English sex is nonsensical, but the colon doesn't have to be deleted, so Category:English:Sex. —CodeCat 22:00, 2 June 2017 (UTC)
I would indeed favour something like that over "sentence" categories like "English terms involving/about/relating to sex". The latter seem a bit wordy and would probably sound actively bad or strange in certain cases. BTW, regarding Daniel's proposal, I think "jargon" is a very poor choice: it is a loaded term, suggesting that this is needlessly complex language; we did in fact remove "jargon" glosses from a lot of entries at one stage. Equinox 22:21, 2 June 2017 (UTC)
I have similar concerns/dislike as Erutuon and Equinox towards the long names. "English names of cities in Kyoto Prefecture, Honshu, Japan" is horribly long, and probably too finely "granular" (but the latter is a separate issue). Long names are harder for users to write when adding or searching for categories.
Language codes have several benefits over language names, including being shorter and not needing to be moved when we rename a language (moving 100 categories is a major hassle which I suppose we can avoid now by mass-deleting the categories and letting bots create the new cats and WikiData sort the interwikis out). OTOH, I understand those who feel they are a barrier for less-adept users.
Can we avoid the unwieldy "sentence" names and make the topic category "English:Foobar", and the set/list category "English:List of foobars"? (Another idea, proposed on my talk page, is "Category:English:topic:Foobar" vs "Category:English:List:Foobar(s)".)
As others have said, we should avoid calling things "jargon". Maybe "terminology" would work. - -sche (discuss) 22:42, 2 June 2017 (UTC)
Re: "Functionally, the only purpose of using language codes rather than names is to distinguish topic or set categories from the categories that use language names (part-of-speech categories especially)." -- It's not been always like this in the past, like in the old categories mentioned in the op (Category:es:Japanese derivations and Category:es:Derogatory). Even if that rule is supposed to be followed now, we have Category:English female given names, Category:English surnames, etc., and if we rename any category to contain "names", "terms", "jargon" or "terminology" in the name, it technically becomes a "lexical" category and will need to start with "English..." as per this rule.
Are codes better than names always, or are names better than codes always? If we had no categories for derivations whatsoever, would we want to create Category:es:Japanese derivations? The proposed category names like Category:English terms related to sex are supposed to be straightforward -- the category contains what the title says, as in a normal English text.
The granularity of place name categories is optional. I proposed Category:English names of cities in Kyoto Prefecture, Honshu, Japan, but it could be Category:English names of cities in Japan, even though I prefer the longer name.
By the way, if we renamed all categories as proposed above we could remove "by language" from all categories. In the current naming scheme, we have this:
If this proposal passes, we could have this, without "by language" anywhere (unless we want to keep the "by language", of course, but that won't be a requirement like today):
Let's compare this: Category:English medicine terminology and Category:English medicine jargon. Is "terminology" really better? I wouldn't like the category to have common terms like disease, heal, doctor, check-up. So I feel that "jargon" is better for the purpose of avoiding these terms.
Additional proposal: we could have one module listing all categories with "names of", another module for "terms for", another for "terms relating to", and another for the list of place names. --Daniel Carrero (talk) 02:16, 3 June 2017 (UTC)

We seem to have many different ideas. Let's have a poll to get a rough tally of whether more people like names vs codes, and whether more people like shorter/condensed or longer/descriptive names. That way, we can see what kind of names we should focus on (like, if most people want descriptive names, then we can focus on deciding whether "pertaining to" or "involving" is better, but if most people want short names, then bikeshedding the format of hypothetical long names would be silly). - -sche (discuss) 22:42, 3 June 2017 (UTC)

Categories for terms which are limited to the "jargon" of medicine (if you are distinguishing them from "terms that pertain to the topic of medicine") are neither "topic" nor exactly "list" categories, IMO, but are in the same vein as categories for terms which are limited to British English. So, I don't think my poll covers them, because I think they should be addressed separately. - -sche (discuss) 22:42, 3 June 2017 (UTC)

There are way too many reasons not to like this, so I'll start with just one: all of these ideas move the interesting information farther and farther to the end of the name, as in "blahblahblahblahblahblahblah fish". In newspaper talk this is known as burying the lede. The way sorting works means we can't rearrange the order, so we need to be as concise as possible in the left-hand parts. Take a look at the massive logjam of categories at the bottom of most multilingual entries. Now imagine it being twice the size without a corresponding increase in the size of the rightmost nodes. Someone mentioned HotCat: the auto-complete feature is going to be pretty much useless if you have to type in pretty much the whole category name before it gets to the part that it can fill in. And all the stuff about descriptive English sentences being easier for people to understand was the main selling point of COBOL. Remember COBOL? I'm sure every programmer aspires to someday write code like the immortal "ADD A TO B GIVING C", and I'm sure any kid in grade school could debug three COBOL business applications before breakfast... right? >;p Chuck Entz (talk) 09:07, 4 June 2017 (UTC)
@Chuck Entz: Do you think you could at least agree with me that the current language-code categories need to be cleaned up one way or the other? I'm pretty sure we can't say that the current system is acceptable. One of Wikimedia's values is "We strive for excellence." Place name categories are probably the messiest of all.
I don't mean it in a sarcastic way, it's just a normal question: As you know, I prefer language names instead of codes and I gave my reasons in this discussion. But if language codes were the best option always, shouldn't all language-specific categories like Category:English nouns and Category:Italian terms derived from Ancient Greek be renamed to include language codes instead of names? When we use the auto-complete feature, I believe we have to type at least "Italian terms d" if we want to get Italian terms derived from other languages. Or we just navigate to Category:Italian terms derived from other languages (which is named using normal English text, like the other categories I'm proposing).
Based on this discussion and this vote, I think in 2011 I myself helped to introduce and cement our existing tradition that apparently "lexical" categories have language names and "topical" categories have language codes, but it was just because I wanted to clean up some categories and put forward the proposal of renaming, say, Category:es:Euphemisms to Category:Spanish euphemisms without necessarily having to change the whole system yet.
I don't think the logic of programming languages applies to our categories. COBOL's "ADD A TO B GIVING C" is not better than "A+B=C", but Category:English names of stars has some merits discussed elsewhere in this discussion, as opposed to Category:en:Stars. If we want a category for medicine slang (multileaf collimator? s/p? CINV?) which cannot contain everyday words like heal, disease and doctor, then maybe the word "jargon" (or slang, or terminology) somehow needs to be in the category name.
Just for the sake of discussion, if having short categories with the main idea first is important, I wonder if there's some merit to using names like Category:Medicine (English, jargon), Category:Stars (English, star names) or Category:Stars (English, related terms). I'm not saying I support having these names, I'm just discussing what their merits are. We could also talk about renaming Category:Italian terms derived from Ancient Greek to Category:grc→it and Category:Italian terms derived from other languages to Category:*→it, which is probably one of the shortest options available (not to mention these are shorthands that feel kind of related to programming language logic) where they could still be understood as derivation categories. Again, I'm not trying to be sarcastic. Obviously, I don't think the shortest names are always the better ones. I know "grc→it" is a bad category name (in my opinion, at least), but I'm fine with discussing multiple possibilities. --Daniel Carrero (talk) 10:37, 4 June 2017 (UTC)

Another category where this is an issue: Category:en:Forests. While the description and parent category suggest it's for names of forests, it also contains terms for types of forests, and terms related to forests. —CodeCat 22:10, 28 June 2017 (UTC)

Poll 1: Language names vs codes

On a new line, please indicate what you support:

  • (1) language names (like "French") to be used in naming topic categories (like categories for terms pertaining to the topic of religion);
  • (2) language names to be used in set/list categories (like lists of dog breeds); or
  • (3) language codes (like "fr") to be used in naming topic categories;
  • (4) language codes to be used in set categories.
Thus, if you want names to be used in all cases, you can indicate "1&2", or if you want codes in all cases, then indicate "3&4". But if you want a mix like "1&4", you can indicate that.
  1. Abstain for now; both codes and names have both benefits and drawbacks. - -sche (discuss) 22:42, 3 June 2017 (UTC)
  2. 1&2 —CodeCat 22:55, 3 June 2017 (UTC)
  3. Support 1&2.
    As I proposed above and then further commented in my reply to "Poll 2" below, I'd prefer category names that read like normal English, like Category:English terms relating to sex. In "real life", are many people even aware of different ISO codes for each language, not to mention our non-ISO made-up codes like "alv-pro", "nai-dly" and others found in Module:languages/datax?
    Currently, Category:Baseball contains 18 subcategories like "en:Baseball", "fr:Baseball", "ko:Baseball", etc. Sure, we all know what they mean, but navigating them requires learning how ISO handles codes and how we separately handle them. Language codes are gibberish if you don't know what they mean. — Sure, even "I like monkeys." is gibberish if you don't know its meaning, but you get the point. Using normal text with the actual language name should make the category contents immediately obvious to English speakers.
    Categories starting with our specific set of ISO and made-up language codes are an English Wiktionary signature, and therefore a barrier for reuse. Anyone is allowed to copy Wiktionary content - they can create mirrors, books, CDs with it. So, it's better to make the material as "generic" as possible. If a reader is navigating a new site called www.definitelynotwiktionary.com and finds a category name like Category:gmq-bot:Music, they will probably not understand what "gmq-bot" means. Even if we generously assume that the reader has the privilege of being knowledgeable about our language codes, then it's possible they will be compelled to think "oh so we're using that English Wiktionary system now, I'd better get my list of language codes that they use to start navigating categories like this". By contrast, if we renamed Category:gmq-bot:Music to Category:Westrobothnian terms relating to music, the new name should be sensible anywhere. --Daniel Carrero (talk) 05:07, 4 June 2017 (UTC)

Poll 2: Longer vs shorter names for set categories

On a new line, please indicate what basic type of name you support — for the part of the category that comes after the language name or code:

  • (1) Long descriptive names like "names of municipalities in São Paulo, Brazil" and "names of dog breeds" or "terms for dog breeds". (The precise format, like "names of" vs "terms for", can be worked out next if it's clear that people prefer long descriptive names to short names.)
  • (2) Short names that contain "list" or "set" to distinguish list vs topic categories: like "list:municipalities in São Paulo, Brazil" and "list:Dog breeds" or "set:Dog breeds".
  • (3) Very short names that don't distinguish list vs topic categories: like "municipalities in São Paulo, Brazil" and "Dog breeds".
  1. Prefer (2). Oppose (3) as too ambiguous. - -sche (discuss) 22:42, 3 June 2017 (UTC)
  2. I believe (1) is the best option, because names like Category:English terms relating to sex are to be read as normal English text, like other categories already are. Oppose (2) because the distinction between list, or set, or topic is arbitrary and not immediately obvious. And of course, oppose (3) as too ambiguous. See further comments below.
    If you get all terms relating to Christianity, this is equally a list, a set and the topic of Christianity-related terms. If we choose to pretend these words have unique meanings which clearly set them apart (which they don't), this would be an obvious kludge to avoid category overlap using as few characters as possible. Using "list", "set:", "topic:" at the start of categories is like using obscure abbreviations to save a few characters. (Some people oppose using template abbreviations like {{der}} instead of {{derived}}, saying that the latter is easier to read. What to do with templates is a separate discussion, but that is a good point. Aside from that, I believe we have a consensus not to use abbreviations like q.v., L., Gr., esp., cf., &c. in etymologies.) If we implemented these arbitrary category prefixes, I fear we would probably have to constantly lecture anons and new editors about what is the correct category prefix to use.
    What I'm saying here is consistent with what I proposed above. The proposed category names are supposed to be read as normal English text. Currently, we have to wonder if categories like Category:en:Stars contains names of stars, and/or types of stars, and/or terms relating to stars. If this proposal passes, we may have this:
    I'm not saying all the three "star" categories above will have to be created, but I'd like any existing category to conform to a naming system like this. The wording may change if people want, naturally.
    In the past, other proposals that made category names read like normal English text were voted and approved:
    --Daniel Carrero (talk) 04:25, 4 June 2017 (UTC)
  3. Prefer (1). Category names should describe exactly what they contain. DTLHS (talk) 05:11, 4 June 2017 (UTC)
  4. 1&2. Oppose 3 as too ambiguous as well. —CodeCat 15:31, 4 June 2017 (UTC)

Poll 3: Longer vs shorter names for topic categories

On a new line, please indicate what basic type of name you support — for the part of the category that comes after the language name or code:

  • (1) Long descriptive names like "terms pertaining to Christianity" or "terms relating to Christianity". (The precise format can be worked out next if it's clear people prefer long descriptive names to short names.)
  • (2) Short names that contain "topic" to distinguish list vs topic categories: like "topic:Christianity".
  • (3) Very short names that don't distinguish list vs topic categories: like "Christianity".
  1. Prefer (2). Oppose (1) as too long (cf Chuck's comments further up this thread), and oppose (3) as too ambiguous. - -sche (discuss) 22:42, 3 June 2017 (UTC)
  2. I believe (1) is the best option, because names like Category:English terms relating to sex are to be read as normal English text, like other categories already are. Oppose (2) because the distinction between list, or set, or topic is arbitrary and not immediately obvious. And of course, oppose (3) as too ambiguous. See further comments in my response to the "Poll 2" above. --Daniel Carrero (talk) 04:25, 4 June 2017 (UTC)
  3. (1) DTLHS (talk) 05:12, 4 June 2017 (UTC)
  4. 3: Category titles are supposed to be as short as possible. Purplebackpack89 15:30, 4 June 2017 (UTC)
  5. 1&2. Oppose 3 as too ambiguous as well. —CodeCat 15:31, 4 June 2017 (UTC)
  6. Either 1 or, preferably, 2, because they permit the addition of usage-context categories (eg, military slang for woman, member of the local population, etc.). DCDuring (talk) 19:51, 1 July 2017 (UTC)


From the results of the poll above, I gather that people want language names in the categories, but that there is less consensus on the format of the rest of the name. What happens next? —CodeCat 19:26, 1 July 2017 (UTC)


Sorry, I'm being lazy (or "delegating"; there's other stuff I want to focus on). This should probably be a prefix though: we have enviro-friendly, environazi, environut, Enviropig, envirospeak, envirotard. Most currently have blend etymologies. Equinox 00:57, 3 June 2017 (UTC)

Done. —Vorziblix (talk) 03:23, 4 June 2017 (UTC)

June Lexisession: concert

A concert.

This month's suggested collective task is to take care of concert. You already have a Wikisaurus:musical instrument and a Wikisaurus:musical composition but nothing about the show! Well, in June, there is the Fête de la Musique [World Music Day] and that's enough to plan a new Wikisaurus:concert, isn't it? Also, there are plenty of pictures on Commons to illustrate entries related to musical performances.

Show must go on!

By the way, Lexisession is a collaborative experiment without any guide or direction. You're free to participate as you like and to suggest next month's topic. If you do something this month, please report it here, to let people know you are involved in one way or another. I hope there will be some people interested in playing music :) Noé 09:21, 3 June 2017 (UTC)

Where to place {{wikipedia}} templates?

@Atitarev, Cinemantique, Wikitiki89, CodeCat In new entries, I've been putting {{wikipedia|lang=ru}} templates just under the ==Russian== headword, but some existing entries put in under the ===Noun=== or similar headword. See атеи́зм (atɛízm) for an example where I moved it up. Not sure if this is correct, comments? Also, is there a difference for English-language and foreign-language Wikipedia references? An example where I put both is пого́ст (pogóst); this Russian term has several meanings specific to Russian culture and has an English-language entry under pogost, which is helpful in explaining some of the meanings. Benwing2 (talk) 20:50, 3 June 2017 (UTC)

For English terms I used to place them under the ==English== header too. However, another editor tended to remove and replace them with:
===Further reading===
* {{pedia}}
so that's what I do now. — SMUconlaw (talk) 20:59, 3 June 2017 (UTC)
IMO {{wikipedia}} should go under the language header; I find it a bit messy if it's under the POS, maybe unless it only applies to e.g. the noun section of an entry that lists a verb first, in which case I understand putting it underneath the relevant headword line template. Alternatively using {{pedia}} as Smuconlaw describes is also fine. - -sche (discuss) 01:01, 4 June 2017 (UTC)
The situation with Translingual (taxonomic) entries is very different from that for other languages: the pedia, species, and commons links are on all fours with other external links (which can be numerous). Taxonomic entries often have images so the right hand side can become cluttered and push into other language sections. Thus it makes more sense to me to put the sister-project links under "References".
Something similar applies to entries for English vernacular names of organisms, which often have either images or many sister-project links.
Finally I have the ToC on the right hand side, which further pushes right-hand side content down into other language sections.
I don't know how applicable this is to the use of such links in other L2s. DCDuring (talk) 05:04, 4 June 2017 (UTC)
Thanks for all the comments. Benwing2 (talk) 23:41, 11 June 2017 (UTC)
I'm one of the editors who tends to convert {{wikipedia}} to {{pedia}} in Further Reading. The main reason is consistency: why should Wikipedia links get special treatment? The box is also quite big and takes up a lot of space on the screen, especially when multiple links are stacked. Even the {{wikipedia}} documentation page says: "Consider instead using the inline version of this template". – Jberkel (talk) 06:15, 15 June 2017 (UTC)
I prefer all sister-project-link templates be placed immediately under the language heading. - [The]DaveRoss 13:35, 15 June 2017 (UTC)
Hmm, there's conflicting documentation and practice. Should we have a vote on this? – Jberkel (talk) 21:53, 15 June 2017 (UTC)

What are / were / should be the rules for anagrams in English and other languages?

I would be willing to run a bot to update anagram sections (in which subset of languages?) if I know the rules. "Rules" meaning, which characters to ignore / normalize, minimum length, etc. DTLHS (talk) 01:26, 4 June 2017 (UTC)

  • I assume that the rules are the same for usage of the {{also}} template (that also needs bottifying). SemperBlotto (talk) 05:57, 4 June 2017 (UTC)
It would be great if we created a page which documented character normalization policies in a machine-readable format (Wiktionary:Character normalization?). I know these exist to some extent in user-space, but a unified version would be better. - [The]DaveRoss 12:45, 29 June 2017 (UTC)
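Since the actual rules are exactly what is being asked about, the following Python sketch only shows what one possible rule set could look like. Every choice here is an assumption made for illustration: case is ignored, diacritics are ignored, non-letters (spaces, hyphens, digits) are dropped, and the minimum length is a parameter. `anagram_key` and `anagram_sets` are hypothetical names.

```python
import unicodedata
from collections import defaultdict


def anagram_key(word):
    """Normalize a word to its sorted base letters (assumed rules)."""
    # Fold case, then decompose so diacritics become separate code points.
    decomposed = unicodedata.normalize("NFD", word.casefold())
    # Keep letters only; combining marks, spaces and punctuation are dropped.
    letters = [ch for ch in decomposed if ch.isalpha()]
    return "".join(sorted(letters))


def anagram_sets(words, min_length=2):
    """Group words that share a key; groups with one member are omitted."""
    groups = defaultdict(set)
    for word in words:
        key = anagram_key(word)
        if len(key) >= min_length:
            groups[key].add(word)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}
```

Keeping the normalization in a single function means that a per-language policy page, if one were written, would only have to change `anagram_key`.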

French Wiktionary monthly news - Actualités



I am dazzled to inform you that the 26th issue of Wiktionary Actualités just came out in English!

As usual, Actualités is in English but talks about French Wiktionary and lexicography in general.

This time: a focus on proposals for Wikimania 2017 related to Wiktionaries, a presentation of two dictionaries about the body and some words about the Guaraní language. There is also a stack of statistics, shorts and a game!

As usual, it is translated into English by non-native speakers, so it is not perfect, but can be improved by readers (wiki-spirit and all). Please note that we do not receive any money for this publication and we translate it because we are eager to read the same kind of publication about your project in the future and be inspired by your projects. Feel free to leave us comments! :) Noé 10:12, 4 June 2017 (UTC)

Fascinating! Hmm, what is en.Wikt doing that could be reported in such a publication?
Last time I recall us comparing the new words other dictionaries had added to our entries, we similarly already had most of them.
We've been working on templates/scripts that would enable an entry to specify its most recent etymon, and have the script find and display that term's etymon, and that etymon's etymon, to reduce content duplication and dissynchronization.
Efforts to create a module that can automatically transliterate vocalized Hebrew are continuing and may lead to a proposal to Unicode to encode separate codepoints for big and small shvas like big and small qamats.
We've been expanding our coverage of languages that (are not dialects of other languages and) do not have ISO codes (Module:languages/datax).
We've been expanding the number of languages we have referenced/verifiable entries in, using in some cases fr.Wikt's entries (which in many cases were based on en.Wikt's redlinked translations of water, ha).
- -sche (discuss) 02:19, 6 June 2017 (UTC)
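The etymon-chaining idea mentioned above (each entry records only its most recent etymon, and a script follows the links) can be illustrated with a small Python sketch. It is not the actual template or module code: `etymon_chain` is a hypothetical name, the mapping is toy data with placeholder terms, and stopping on loops is an added assumption.

```python
def etymon_chain(term, most_recent_etymon):
    """Follow 'most recent etymon' links from a term back as far as they go."""
    chain = []
    seen = {term}
    current = term
    while current in most_recent_etymon:
        current = most_recent_etymon[current]
        if current in seen:
            break  # guard against circular data
        seen.add(current)
        chain.append(current)
    return chain


# Toy data with placeholder names: each entry stores one etymon only.
ETYMA = {"word-3": "word-2", "word-2": "word-1"}

chain = etymon_chain("word-3", ETYMA)
```

Because each link is stored once, correcting the etymology of "word-2" automatically updates every chain that passes through it, which is the reduction in duplication described above.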
Sure, we can report hot topics of en.wikt! It would be easy if someone from the project wrote a summary, just as you did. We tried in some old issues, but the result was biased because discussions are split across several pages and we did not know the participants well enough to get the whole picture.
The comparison with other dictionaries seems to me very different for English than for French, because French dictionaries make the editorial choice to select only around 60,000 entries, whereas English dictionaries select an average of 100,000. The limit is purely arbitrary. So in French dictionaries you will not find words for technical practices such as leather work, for example. Each year they delete words to save space for the new ones, and I think this is definitely a plus for French Wiktionary, because we do not; for readers of "old" books, definitions will be available only in Wiktionary and no longer in dictionaries! So French Wiktionary can claim to have better coverage than published dictionaries. I think it is more difficult to make this case for English Wiktionary. Well, I hope you will prove me wrong.
Templates for etymologies are a big deal. In French Wiktionary, we prefer to write paragraphs of etymology with large compilations of sources, similar to Wikipedia writing. The policies Wiktionnaire:Étymologie and Wiktionary:Etymology show great differences. French Wiktionary promotes long etymologies that include folk etymologies and false ones, with sources. We do not want to have "from Latin to French" in an etymology, because it is plainly false considering the history of the language and the influence of the dialects. We want to trace the path of the forms and the meanings, mentioning regional uses when needed (for example: bataclan). Also, many sources say "obscure origin" when words come from Arabic or Gitan (Romani); we prefer to quote these official sources together with more recent analyses that present better data, to show that old official sources can be politically biased. Our etymologies have to be more neutral than the old ones. I do not judge one strategy better than the other; I think we need to do both. French Wiktionary needs to develop a template to display schematic trees with the basic history of words, but also to provide plenty of detail on the whole history and the controversial hypotheses.
Hebrew and Unicode: Great news! I hope you will publish a press release when it is done!
Expanding coverage for underdescribed languages is great! Is it a global effort or a contribution by a few people? In French Wiktionary, it is mainly done by two people, including Pamputt, for water translations :) Noé 08:44, 7 June 2017 (UTC)
  • Wiktionary:Milestones and Wiktionary:News for editors are generally used to announce new things. But they're generally not very interesting. I can't imagine many people going "Wow! How cool! Greek nouns have been reclassified from invariable to indeclinable!!!" -WF.

Wiktionary:Votes/2017-06/Allowing character boxes

Based on Wiktionary:Beer parlour/2017/May#'character info' box, I created Wiktionary:Votes/2017-06/Allowing character boxes. --Daniel Carrero (talk) 09:55, 5 June 2017 (UTC)

Language codes for East, South and West Slavic

I was wondering if it wouldn't be a good idea to create language codes for Proto East, South and West Slavic. They're well established and would make sense to reconstruct. --Victar (talk) 23:15, 5 June 2017 (UTC)

"Proto-East Slavic" is Old East Slavic, is it not? As for West and South, are you sure it's possible to reconstruct a single Proto-West Slavic and a single Proto-South Slavic? --WikiTiki89 23:18, 5 June 2017 (UTC)
I don't know myself what the differences would be between PES and OES, but if there is no distinction, then I don't think we should have a separate level in descendant trees. I do know that PWS has some pretty distinct features. Matasović argues that South Slavic is strictly a geographical grouping, not a genetic clade, so I'm not clear on that. @Benwing, Vahagn Petrosyan? --Victar (talk) 01:36, 6 June 2017 (UTC)
West Slavic is really not that distinct. Most of its features are retentions of things already present in Proto-Slavic. —CodeCat 17:23, 6 June 2017 (UTC)
Having distinct features does not mean it is possible to have a single consistent reconstruction. And OES is Proto-East Slavic. Don't forget that "Proto" doesn't mean reconstruction, but just that it is the ancestor of a language group. In this case OES is the ancestor of the East Slavic languages, so it is Proto-East Slavic. --WikiTiki89 17:33, 6 June 2017 (UTC)
I obviously understand what proto means, but to follow the point I was making: if you're arguing that PES and OES are identical, which I generally disagree with, then we shouldn't have a level in descendant trees for PES above OES, but should instead call that branch Old East Slavic. Otherwise that's like adding a Proto Norse above each entry of Old Norse in PGmc descendant trees. --Victar (talk) 18:03, 6 June 2017 (UTC)
Proto-Norse is a separate language from Old Norse, and happens to be marginally attested, even. At least Old Gutnish does not descend from any attested variety of Old Norse, and I recall hearing about similar archaisms in the Finland Swedish and Jutlandic Danish dialects too. --Tropylium (talk) 23:34, 6 July 2017 (UTC)
I already do that. I label the line as "East Slavic" but put the OES term on the same line. —CodeCat 18:06, 6 June 2017 (UTC)
That's great, but I haven't seen that to be the case in the entries I've come across, as per my example above. If people are in agreement with that though, I'm satisfied. --Victar (talk) 18:17, 6 June 2017 (UTC)
First of all, why would it be obvious to me what you personally know or don't know? If I want to make sure we are on the same page in terms of terminology, you shouldn't take it personally. Second of all, OES is the ancestor of all East Slavic languages; that makes it by definition Proto-East Slavic. It's not something you can disagree with unless you want to say that OES is not the ancestor of all East Slavic languages (which you could maybe make a case for regarding the Old Novgorod dialect or North Russian, but in that case we probably can't reconstruct a single Proto-East Slavic anyway). If the tree is wrong, it should be fixed; don't impose incorrect descriptions of reality onto reality itself. --WikiTiki89 18:14, 6 June 2017 (UTC)

Manichaean Middle Persian

Manichaean Middle Persian is currently designated as a separate language from Middle Persian, but this isn't the case, as it's simply one of several scripts used. Shouldn't we delete it? @Vahagn Petrosyan? --Victar (talk) 03:29, 6 June 2017 (UTC)

The difference is not just in the script. Manichaean Middle Persian has systematic dialectal differences from Zoroastrian (Book) Middle Persian. For example, to Zoroastrian nd corresponds Manichaean nn, as in Book bnd (band) : Manichaean bn (bann, bond, link); Old Persian rd gives Zoroastrian l but often r in Manichaean, e.g. sāl vs sār ‘year’; Iranian gives Zoroastrian ar but often ir in Manichaean, e.g. mard vs mird ‘man’.
Even if we decide to merge both under Middle Persian, we should keep Manichaean Middle Persian as an etymology-only language. @ZxxZxxZ, what do you think? --Vahag (talk) 07:13, 6 June 2017 (UTC)
I believe some of those differences are due to limitations and idiosyncrasies of the scripts themselves. Note that even though the transcription BMP s’lyn- (to provoke) contains an l, it is pronounced /sārēn-/, as per Cheung. Both alphabets also lacked a full set of vowels. Even so, I think these are minor dialectal changes. --Victar (talk) 08:00, 6 June 2017 (UTC)
As Vahag said the differences are beyond the script, not just Manichaean, but also Zoroastrian (Pazend, Avestan alphabet). I've even read there are differences between the Middle Persian written in Inscriptional Pahlavi and the Middle Persian in Book Pahlavi (which was mostly written in Islamic period and is called "late Middle Persian", as opposed to the "early Middle Persian" of the inscriptions), though I'm not aware of any instances beyond spelling differences (e.g. in the arameograms used). Regarding the s’lyn- instance, it's true, though the letter "l" is also used for l, anyway the instance provided by Vahag is a different case: we know ŠNT was pronounced as sāl in Middle Persian, but it is recorded with r in Manichaean alphabet. I think we should keep them separate. --Z 11:30, 6 June 2017 (UTC)
This family of languages has fluctuated between l and r since the days of IIr. I see more variation in dialects of English than in forms of Middle Persian -- certainly not enough to be called a separate language -- and all these "differences" seem highly predictable to me. @-sche, this strikes me as one of those "splittist" cases you mentioned. --Victar (talk) 14:12, 6 June 2017 (UTC)
On second thought I changed my mind a bit regarding this. --Z 20:37, 6 June 2017 (UTC)
I am open to being persuaded otherwise by you all, who seem to have greater knowledge of this subject than I do, but based on this discussion and from what Wikipedia says, it does sound like we are dealing with dialects. They should of course have separate etymology codes (like Cajun French vs standard French). If l/r variation also exists within one or especially both varieties and not just as a distinction between them, that would suggest it should not be held up as a reason to separate them; likewise, if the appearance or absence of any particular variation is due to the constraints of script! Wikipedia speaks of using the documents which were written in more expressive/conservative scripts to understand the documents written in the other script. (It makes me think of the ISO granting separate codes to hieroglyphic vs cuneiform Luwian.) - -sche (discuss) 04:56, 8 June 2017 (UTC)

Re-add two vote references in Wiktionary:Criteria for inclusion/Well documented languages

In this diff, @Metaknowledge reverted my edit to Wiktionary:Criteria for inclusion/Well documented languages.

I'd like to do the same edit again, where I added two vote references. See the history of the page for further comments from him and myself. --Daniel Carrero (talk) 04:00, 6 June 2017 (UTC)

Addendum: Wiktionary:Votes/2011-04/Sourced policies is the vote where it was accepted to link every piece of text in EL and CFI to their supporting votes through the wiki technique of references. There are too many so-called policies created unilaterally without verifiable consensus. WT:EL and WT:CFI themselves are partially voted and partially non-voted. If we don't link the votes, we can't easily verify the fact that Wiktionary:Criteria for inclusion/Well documented languages is thankfully almost 100% voted and approved, with a few unvoted changes concerning Arabic, Irish and Welsh. --Daniel Carrero (talk) 17:22, 6 June 2017 (UTC)

All words in all languages

In the very first paragraph of our main page we have "It aims to describe all words of all languages using definitions and descriptions in English.". This is manifestly false. We do not include all words (we omit some brand names for example) and we do not treat computer programming languages as languages. There are two main ways we could improve this situation. The first (my preferred option, as I'm sure you know) is to make the statement true - to include all words, in all languages. A second option is either to rewrite the statement as " ... most words of most languages ... " or to follow it with an asterisk that somehow points to a "terms and conditions apply" section. What does everyone else think? SemperBlotto (talk) 04:56, 7 June 2017 (UTC)

What is this platonic definition of "word" that you seek to make us follow? Just because you say brand names are words doesn't mean we have to agree with you. DTLHS (talk) 04:59, 7 June 2017 (UTC)
Of course you don't have to agree with me. But we define brand name as a form of name, and we define name as a type of word. SemperBlotto (talk) 05:04, 7 June 2017 (UTC)
Then you can see that it's totally impossible to "make the statement true" since we will never agree on what a word is. DTLHS (talk) 05:07, 7 June 2017 (UTC)
This of course comes back to what constitutes a "word". If I stub my toe and say, "Yowzawhoawhoawhoa!" then that will certainly communicate some meaning to someone else ("that hurt a lot") but it's not really a word. There are a lot of signed, spoken, and written things which convey meaning but I think that anyone using common sense would realize that no dictionary could ever include all of these phenomena. —Justin (koavf)TCM 05:24, 7 June 2017 (UTC)
"aspirational". --Catsidhe (verba, facta) 05:26, 7 June 2017 (UTC)
Of course we would include "Yowzawhoawhoawhoa" - if it makes it into three different books by three different authors (like "Windows" has done). SemperBlotto (talk) 05:32, 7 June 2017 (UTC)
Sure, and it never will. That doesn't make it more or less of a "word", it's just not a "word" for our purposes. Someone else could rightly call a lot of things "words" which we don't: everyone would have some caveats on what would constitute "every word in every language" and I don't think ours are so unreasonable as to expect them to need a disclaimer on the front page. —Justin (koavf)TCM 05:34, 7 June 2017 (UTC)
For the record, maybe some dictionaries have broader criteria for inclusion than us. http://jisho.org is an English-Japanese dictionary where in addition to "normal" stuff, you can search for some people names, brand names and movie titles, among other things. --Daniel Carrero (talk) 08:58, 7 June 2017 (UTC)
  • Indeed, this is manifestly false: we only include attested words. Even if we relaxed attestation criteria even more, we still have to admit that we do not have an omnicorpus of all utterances of all languages that ever existed on the planet, and therefore, we will necessarily fail to cover some words. This is not just a hypothetical concern; we are very certain that we omit some words for lack of evidence, even though we do not necessarily have to know which words.

    One remedy is to do nothing and read the sentence as a slogan that does not contain the necessary qualifications. Another remedy is to inject "approximately", "basically" somewhere in the slogan. The asterisk mentioned above is also an option. Adding "as long as there is enough evidence", which occurred to me, is not so good since that would address the attestation requirement but not the other requirements and exclusions. --Dan Polansky (talk) 10:08, 7 June 2017 (UTC)

  • We could change
    As an international dictionary, Wiktionary is intended to include “all words in all languages”.
    to
    As an international dictionary, Wiktionary is intended to include basically “all words in all languages”, subject to certain conditions.
    --Dan Polansky (talk) 10:12, 7 June 2017 (UTC)
    I like the proposed change. It's an honest assessment of what words we actually accept. Though technically we also accept phrases, symbols and other things. --Daniel Carrero (talk) 10:24, 7 June 2017 (UTC)
  • How about "a bunch of words in a bunch of languages"? -WF
  • Footnoted motto/slogan? An example:
Our motto, annotated

All[1] words[2] in[3] all[4] languages[5]

The ordinary-word meaning of this slogan is somewhat misleading. The following notes explain the qualifications:

[1] Not every word is included at all, let alone in a meaningful way. Obviously we haven't gotten around to all of them. Attestation requirements exclude many. Due to the narrowness of our contributor base many languages are unrepresented and many specialized contexts are unrepresented, even in English.
[2] "Word" can include letters, numbers, symbols, abbreviations, proverbs, idiomatic expressions, some non-idiomatic expressions, clitics, affixes.
[3] Some "words[2]" could fall between languages. A multi-word expression borrowed from a foreign language could be non-idiomatic in its original language and thereby not includable in that language. It may also only be found in italics or quotation marks in running text in other languages, indicating that authors and editors don't think it has entered the lexicon in that language.
[4] See Vote on Serbo-Croatian.
[5] Translingual is not a language. Many non-words are better characterized as things. Things that are not words are not part of languages.

This approach works in real life for more-or-less unchangeable statements of great importance, like those in the US Constitution. DCDuring (talk) 13:09, 7 June 2017 (UTC)

How about "Wiktionary: It's complicated." - [The]DaveRoss 13:31, 7 June 2017 (UTC)
It seems to me that it is worth expanding at Wiktionary:All words in all languages, not as a disclaimer nor a policy but as an introduction to a pillar of Wiktionary. Comments in this thread are worth annotating in a project/essay page, including "it's complicated" :-) --Vriullop (talk) 13:42, 7 June 2017 (UTC)
I created Wiktionary:Votes/pl-2017-06/CFI leading sentence. --Dan Polansky (talk) 16:13, 16 June 2017 (UTC)
I think the slogan is an acceptable aspirational slogan (good word, Catsidhe). Vriullop's idea of a page explaining it is interesting, but might overlap heavily with WT:CFI. It seems to be obvious to many people who comment on it that it is not to be taken to Amelia Bedelia levels of literalism. Even if we started including brand names, book titles, nonces, etc as some Wiktionaries do, we are unable to include all words in all languages, because some words were never recorded by anyone before they passed out of memory (e.g. in the Khazar language, Ciguayo, or Jassic). We are even apparently prevented by law from including all words in all languages because (in previous discussions in which some of our users who are lawyers have participated, it has been noted that) languages like Dothraki probably constitute significant parts of commercial franchises, and it would probably violate copyright if a third-party dictionary like us included all or a substantial number of the words in such a language. Even the most permissive inclusionism will hit hard limits.
Personally, I expect that as we become more complete, we will include codes for over 9000(!!!!1) languages. And we might broadly guesstimate that poorly-attested languages and highly-inflected languages may average out to half a million entries per language, so perhaps our slogan could say we aim "to describe four and a half billion words in nine thousand languages"? ;D - -sche (discuss) 17:44, 16 June 2017 (UTC)

Wiktionary:Votes/2017-06/borrowing, borrowed

Sorry, I'm not sure now is the best time to create this vote since I had created another one two days ago. But we have an ongoing discussion about what to do with {{bor}} in all entries, so I created it anyway. Please check Wiktionary:Votes/2017-06/borrowing, borrowed. There are a few discussions linked there. Feel free to edit the vote or suggest any changes.

Aside from that, I intend to create a new Wikidata vote once the current one ends on June 11. --Daniel Carrero (talk) 14:10, 7 June 2017 (UTC)

category for past forms of verbs used in turn as verbs on their own

A category of these forms may help learners very much, since if they are not acquainted with these forms, encountering them momentarily confuses understanding. For example, "slew" may mean "to veer", but it is also the simple past tense of "slay"; likewise, "lay" is a transitive verb as well as the simple past tense of "to lie" (when pertaining to position). I do not know how to overlap categories so that I can get the one I wish to create. --Backinstadiums (talk) 14:12, 7 June 2017 (UTC)

I think it would be useful to have a more general category for overlapping of verb forms. This category could also include set and read, which don't distinguish tense in writing. —CodeCat 14:17, 7 June 2017 (UTC)
I see them as two different cases, the so-called "irregular verbs" being more treated than the one I propose. --Backinstadiums (talk) 14:47, 7 June 2017 (UTC)
Could somebody please teach me how to proceed? --Backinstadiums (talk) 12:53, 8 June 2017 (UTC)

WT:ELE - How to alphabetize languages

ELE dictates that language sections should be in alphabetical order. Some languages have unusual characters in their English names; should they be alphabetized including those characters at face value, or without those characters? By way of an example, what is the correct order of "Ch'orti', Chachi, Cofán"? - [The]DaveRoss 14:16, 7 June 2017 (UTC)

I would say to just ignore the non-letters (from an English point of view). So the order would be Chachi, Ch'orti', Cofán. —CodeCat 14:17, 7 June 2017 (UTC)
I find it hard to believe we haven't had this discussion before somewhere. I support using code-point order sorting, i.e. not ignoring the non-letters, since it's the easiest to implement and it's the state all entries are in now. DTLHS (talk) 14:50, 7 June 2017 (UTC)
If we decide to do something about this issue, let's please update WT:EL#Languages. It just says that languages besides Translingual and English are "in alphabetical order". --Daniel Carrero (talk) 14:52, 7 June 2017 (UTC)
@DTLHS I figured the issue had been resolved and I was just not aware and couldn't find it readily. While I also like using the code-point ordering, I have found that not all entries are in that order (see: A).
@CodeCat If we go that way we will need to settle on what constitutes a non-letter (e.g. accents). Thankfully the set of allowable language names is limited so we can be comprehensive. - [The]DaveRoss 15:05, 7 June 2017 (UTC)
One other note, the translation sections should also follow the same policy, whatever it is. And presumably other sorted lists should have a policy. - [The]DaveRoss 15:47, 7 June 2017 (UTC)
I don't know about the examples you gave but I'd like to simply caution that those typographic characters actually are letters in some languages, e.g. ʻokina in Hawaiʻian. —Justin (koavf)TCM 16:03, 7 June 2017 (UTC)
Right, but our L2 headers, i.e. language names, are in English. We have ==French==, ==German==, and ==Spanish==, not ==Français==, ==Deutsch==, and ==Español==. For that reason, I'm in favor of ignoring things like apostrophes and diacritics when it comes to alphabetizing languages. —Aɴɢʀ (talk) 20:39, 7 June 2017 (UTC)
There is also a tendency to omit apostrophes in English when they don't seem to do anything (compare Mi'kmaq and Mikmaq), so the order might change depending on which spelling of a language we use, which is potentially confusing. Andrew Sheedy (talk) 20:49, 7 June 2017 (UTC)
The list of L2 headers we currently employ includes many non-English characters, perhaps that should not be the case but it is at the moment. I think we are relatively consistent in this regard within a single language, but I might be wrong on that. - [The]DaveRoss 21:04, 7 June 2017 (UTC)
I'm sympathetic to the argument that codepoint ordering is possibly the easiest to maintain (and is the one used by many entries now, due to bots sorting things), but it would not seem to be too difficult to define a more natural order, with Xârâcùù sorted before Xhosa, etc. It does appear as if other references ignore apostrophes, click letters, and diacritics when alphabetizing:
  • The International Encyclopedia of Linguistics lists in this order ’Akhoe, ǀAnda, Deti, ǁGana, Ganádi, ǀGwi, Hadza, Haiǀom, Hietshware, ǂHua (they print it as =/Hua), Juǀ’hoan [...] Nǀu, ǃOǃung, Sandawe, [...] ǀXam, ǁXegwi, Xiri, ǃXóõ.
  • Dalby's Dictionary of Languages sorts Larestani, Lārī, Lashi, Lāsī, Latgalian, [...] Mabwe-Lungu, Mača, Macao, [...] Māhārāshtri, Mahi.
  • The Ethnologue itself has Afrikaans, ||Ani, Birwa, [...] English, ||Gana, Gciriku, |Gwi, Hai||om, Herero, ‡Hua[sic], Ju|’hoansi, Kalanga [...] |Xam, ||Xegwi, ‡Ungkue.
  • Hodge's old Handbook of American Indians North of Mexico has Háami, Hāʼanaʟěnox, Haatse, Háatsü-háno, Habasopis, [...] Hailtsa, Haiʼ‘luntchi, Haiʼmāaxstō, Hai-ne-na-une, [...] Háiokalita, Haiowanni.
I agree that translations should be sorted in the same order.
Looking at WT:LOL, it appears that the list of characters besides A-Z used in language names and alt names (ignoring case) is
á, à, â, ä, ȁ, å, ã, ā : treat as a?
æ : treat as ae?
ɓ : treat as b?
ç, č : treat as c?
ḍ, ḏ : treat as d?
é, è, ê, ë : treat as e?
ɛ : also treat as e?
ğ : treat as g?
 : treat as h?
í, ì, î, ï, ĩ, ī, ɨ : treat as i?
ł : treat as l?
ñ : treat as n?
ŋ : treat as ng? At the moment, all languages with alt names using "ŋ" indeed use "ng" in their canonical names.
ó, ò, ô, ö, ȍ, õ, ō : treat as o?
ɔ, ɔ̃ : also treat as o?
š : treat as s?
 : treat as t?
ú, ù, ü, ũ, ŭ, ų : treat as u?
ŵ : treat as w?
ý : treat as y?
 : treat as z?
ə : what should happen to schwas?
() (as in "Kare (Africa)", "Yao (South America)") : parse as-is i.e. in codepoint order?
- (and , which was used in four alt names I just switched to use hyphens) : parse as-is?
. (as in "Mt. Iraya Agta") : parse as-is? or ignore i.e. discard?
', ʼ, ǀ, ǁ, ǃ, ǂ and ʻ and ˀ, ʔ (and the nonstandard ’, ‡) : ignore i.e. discard?
Note that many of these special characters only appear in alt names (which I included as a decent repository of what special characters might one day appear in canonical names of the same languages), and are already normalized as above in their canonical names which we'd be dealing with, anyway! (Perhaps someone else feels like making a list of only those special chars which appear in canonical names.)
- -sche (discuss) 15:42, 8 June 2017 (UTC)
Can we simplify your suggestion to: "Use the natural ordering after removing all combining characters and punctuation, and splitting ligatures."? I am worried about converting similar characters in other scripts to their Latin counterparts, since that way lies incredible complexity and subjectivity. - [The]DaveRoss 19:05, 9 June 2017 (UTC)
My post only spells out all of the diacritical letters that are in use so people can see which letters those are and see if they agree with the proposed normalization; I expect that an actual rule would be phrased in more general terms, yes. For example, much of it can be simplified to "ignore diacritics". Noe links to an article by the person in charge of Glottolog about naming languages, which agrees with replacing (and sorting) ɛ and ɔ as e and o. As for punctuation, do we want to remove it? Suppose we had a language called "Kala (Zimbabwe)", should it be sorted before or after "Kala Lagaw Ya"? - -sche (discuss) 19:26, 9 June 2017 (UTC)
I am certainly not the right person to make the calls about the best course here, I am merely hoping for a general rule if possible rather than a mapping system. - [The]DaveRoss 19:34, 9 June 2017 (UTC)
I think parenthesized qualifiers such as "(Zimbabwe)" should be entirely ignored unless it results in two languages having identical names, and only then should they be used to sort them. In other words, sort it as just "Kala" but if there happens to be another "Kala" then use the qualifiers to determine which goes first. —CodeCat 19:36, 9 June 2017 (UTC)
Wouldn't that be (effectively) just like sorting with the parenthesis left in place? Perhaps there are instances I am not considering. - [The]DaveRoss 19:48, 9 June 2017 (UTC)
Perhaps, but editors can't be expected to know Unicode code points, whereas they can be expected to know English alphabetical ordering. So even if any programmed implementation treats it as you say, a human-readable description of the process would have to describe it more as I did. —CodeCat 19:57, 9 June 2017 (UTC)
As far as I know, parentheses are only used when two languages do have the same name, and then, in almost all cases — the only exception that comes to mind is that we haven't yet added a qualifier to the million-strong language "Yao" just because the tiny, extinct language "Yao (South America)" exists — they are used on both languages. So I suppose we should leave the parentheses as-is and thus sort a hypothetical "Kala (Zimbabwe)" above "Kala Lagaw Ya". A rule that parentheses' contents "should be dropped unless X is true" where X is true 100% of the time would be needlessly confusing, IMO. - -sche (discuss) 20:00, 9 June 2017 (UTC)
Not really, because "Kala (Zimbabwe)" should be sorted before a hypothetical language called "Kalaza". —CodeCat 20:09, 9 June 2017 (UTC)
I think I may understand the source of the confusion. I'm assuming that a space doesn't count for sorting, since it's not an alphabetical character. So "Kala (Zimbabwe)" would be "Kalazimbabwe" for sorting purposes if we didn't take out the parenthetical part. Or, to take two real examples, I'm saying that "Tokelauan" would come before "Tok Pisin". —CodeCat 20:16, 9 June 2017 (UTC)
Aha, that's a place our assumptions differed; I assumed spaces would be counted. Poking around other reference works, I see that G. Cinque's Typological Studies: Word Order and Relative Clauses sorts "Tokelauan, Tok Pisin" (in the alphabetical index), while J. Lynch's Pacific Languages: An Introduction sorts "Tok Pisin, Tokelauan". I'm not sure which is better. But independent of whether or not spaces are counted, I would never sort "Kala (Zimbabwe)" as "Kalazimbabwe". And if we both want "Kala (Zimbabwe)" above "Kalaza", isn't the simplest way to obtain that to treat the parentheses as parentheses, which are sorted ahead of alphabetic characters? - -sche (discuss) 20:49, 9 June 2017 (UTC)
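The two assumptions compared above (spaces counted versus spaces ignored) can be illustrated with a minimal Python sketch. The function names are hypothetical, case is folded so that capital letters do not affect the comparison, and "Kala (Zimbabwe)" and "Kalaza" are the hypothetical language names used earlier in this thread, not real entries.

```python
def key_spaces_counted(name: str) -> str:
    # Compare lowercase code points; spaces and parentheses stay in the key.
    return name.lower()


def key_spaces_ignored(name: str) -> str:
    # Same comparison, but spaces are dropped before comparing.
    return name.lower().replace(" ", "")


tok = ["Tokelauan", "Tok Pisin"]
print(sorted(tok, key=key_spaces_counted))  # ['Tok Pisin', 'Tokelauan']
print(sorted(tok, key=key_spaces_ignored))  # ['Tokelauan', 'Tok Pisin']

# With spaces and parentheses left in place, both sort ahead of letters.
kala = ["Kalaza", "Kala Lagaw Ya", "Kala (Zimbabwe)"]
print(sorted(kala, key=key_spaces_counted))
# ['Kala (Zimbabwe)', 'Kala Lagaw Ya', 'Kalaza']
```

Under these assumptions, leaving spaces and parentheses in the key already places "Kala (Zimbabwe)" above both "Kala Lagaw Ya" and "Kalaza" without any special rule for qualifiers.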
Here's a list of only those characters that are currently used in canonical names, i.e. the ones we'd actually have to sort right now: [a-z], (space), - (hyphen), ' (apostrophe), . (dot), () (parentheses), á à â ä ã å ç é è ê ë í ì î ï ñ ó ò ô ö õ ú ù ü (diacritics, which could be handled by a rule "treat letters with diacritics the same as their base letters"), and ǀ ǁ ǃ ǂ (click consonants, which could be handled by a rule "treat click consonants as if they are not there"). It's possible that we should even remove some of those from the canonical names themselves, i.e. rename the languages. - -sche (discuss) 20:49, 9 June 2017 (UTC)
Should hyphens be dropped? For example, how should Yan-nhangu, Yangkam, Yanomámi be sorted? - -sche (discuss) 22:52, 9 June 2017 (UTC)
If we are doing this, I suggest that we create a module function that outputs a list of every language name in whatever internal order we decide on. Bots can read that page and order language sections and translations accordingly. DTLHS (talk) 22:57, 9 June 2017 (UTC)
My suggestion- four classes of characters:
  1. Basic English letters
  2. Basic English letters with diacritics
  3. Non-English letters with no English counterpart (most or all of them clicks and glottal stops)
  4. Punctuation
Perform the following transformations to produce the sort key, in the order given:
  1. Convert apostrophes to one of the other glottal-stop characters so they won't be treated as punctuation.
  2. Convert all punctuation to spaces and then convert multiple spaces to single spaces.
  3. Prefix all letters having diacritics with the corresponding basic English letter.
  4. Swap non-English/no-counterpart letters with the following letter so the following letter comes first. If there are multiple such letters, swap all of those letters next to each other as a group.
This has the advantage of having things sorted first by basic English letters, but having the order of the diacritics followed as well, and having the order of the "ignorable" non-English/no-counterpart letters followed, too. Since spaces are sorted before basic English letters, that also honors the principle that "nothing comes before something".
I'm not positive about the second transformation, since that will mean "Abc (def)" will be the same as "Abc def", but it's something to start with- tweaking is welcome. Chuck Entz (talk) 03:55, 10 June 2017 (UTC)
@Chuck Entz, can you elaborate on transformation number four? - [The]DaveRoss 17:52, 16 June 2017 (UTC)
Sure. The idea is that there should be a basic English letter before other characters for sorting purposes, with the others following it to distinguish between cases distinguishable only by those other characters. "Swapping" isn't really the best choice of words: what I mean is that the first basic English letter following one or more non-English-no-counterpart characters should be moved in front of them. Now that I've had a chance to think about it, maybe it would be better to ensure that the diacriticed letter goes with it, probably by moving the fourth transformation before the third, and clarifying that both basic English letters and diacriticed letters should be treated the same by this new third transformation. Thus 'ábçd would become aá'bcçd. Using these rules on -sche's examples below: Gadang→Gadang, Ga'dang→Gad'ang, and Madi→Madi, Ma'di→Mad'i. Chuck Entz (talk) 21:53, 16 June 2017 (UTC)
...? Why is having to sort Mad'i any better than having to sort Ma'di? It seems like it would be neater to say: ignore apostrophes (clicks, and diacritics) when sorting languages on the page, but if that causes two or more languages to have the same name, then sort those two or more amongst themselves with the apostrophes present (left where they are).
Should spaces also be subjected to such a process (resulting in "Tokelauan, Tok Pisin"), or left alone? I tend to think spaces should be left in per the "nothing comes before something" principle you mention (so, "Tok Pisin, Tokelauan"), at which point there's no reason to remove the parentheses (and indeed, removing them would only make things more complicated and difficult), since leaving them in ensures that a hypothetical "Foo (Bar)" comes before "Foo Bar", which seems appropriate because bare "Foo" should also come before "Foo (Bar)". - -sche (discuss) 04:09, 17 June 2017 (UTC)
You raise an important point, that some language names would be identical if diacritics and special characters were removed. These include gdk Gadang and gdg Ga'dang, and grg Madi and mhi Ma'di. What order should these be in: "Ga'dang, Gadang" or "Gadang, Ga'dang"? - -sche (discuss) 18:20, 16 June 2017 (UTC)
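The alternative suggested above (ignore apostrophes, clicks and diacritics when sorting, and fall back to the unmodified names only when two names collide) can be sketched as a two-part sort key. This is a minimal Python sketch, not an existing module: the function names and the set of ignored characters are assumptions, and spaces and parentheses are deliberately left in.

```python
import unicodedata

# Assumed set of characters to ignore: apostrophe variants and click letters.
IGNORED = set("'\u2019\u02bc\u01c0\u01c1\u01c2\u01c3")


def strip_ignorables(name):
    """Drop combining diacritics and the ignored characters; keep everything else."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in IGNORED
    )


def sort_key(name):
    # Primary key: the stripped, case-folded name.
    # Secondary key: the original name, used only to break ties.
    return (strip_ignorables(name).casefold(), name)


names = ["Tokelauan", "Ga'dang", "Tok Pisin", "Madi", "Gadang", "Ma'di"]
print(sorted(names, key=sort_key))
print(sorted(["Foo Bar", "Foo (Bar)", "Foo"], key=sort_key))
```

With this key "Tok Pisin" sorts before "Tokelauan" and "Foo" before "Foo (Bar)" before "Foo Bar", because the space and the parenthesis compare lower than any letter. The tie-break happens to put "Ga'dang" before "Gadang" and "Ma'di" before "Madi", but that is only a side effect of comparing the original strings by code point; the question of which order is preferable is left open in the discussion.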

Proposal: a page to centralise the patrolling effort

For some four years we’ve been unable to keep up with the rate of unpatrolled changes. This means that a lot of inadequate edits, and sometimes even vandalism, gets through. I’ve been trying to think of ways to improve the efficiency of our patrolling, because Special:RecentChanges is very hard to use: it lists all unpatrolled edits in all languages and all areas, but an individual patroller has the knowledge to patrol perhaps 10% of them. For example, right now most recent unpatrolled edits are changes to Hungarian entries and the addition of Galician, Ukrainian and Italian translations. I have enough knowledge to verify whether the Galician translations are good, and I could look up some published dictionaries to check the Italian and Ukrainian translations (but other users could do it faster and better), and I can’t possibly hope to check the content of the Hungarian entries (just the formatting).

In addition to the raw recent changes page, we could have a page with a list of users with unpatrolled edits, separated by language and topic. For example, if I am patrolling the recent changes and come across a user adding Japanese etymologies, I would add a new item to section ==Japanese==, subsection ===Etymology=== on this page, with a link to the user’s contribution page and perhaps an explanation as to why I think their contributions need special attention. Eventually a patroller who is more proficient in Japanese etymology will see this link and check the contributions. In order to keep the patrolling process invisible, as it already is, this page should have a mechanism to prevent users from being pinged (WT:VIP has such a mechanism, if I remember correctly). The advantages that such a page might bring include:

  • Encourage users who don’t have the time or patience to go through Special:RecentChanges to patrol.
  • Encourage patrollers to delegate edits to someone who feels more confident in the language and topic.
  • Provide a place where the correctness of someone’s edits can be discussed (like a pre-WT:RFC, without implying that their edits need to be cleaned up)
  • Make it easier to identify sockpuppets and patterns of odd behaviour.
  • Prevent unpatrolled edits from being lost in the Recent Changes limbo.

Ungoliant (falai) 16:38, 7 June 2017 (UTC)

This is a fantastic idea. I think that the problem of a piling up of unpatrolled edits has not been due to the difficulty of patrolling so much as to the fact that not that many people are patrolling. This kind of page could help make clear to admins who don't patrol as much how they can help out in the common effort. —Μετάknowledgediscuss/deeds 18:39, 7 June 2017 (UTC)
I think that this collation of edits by topic and language could be done mostly automatically if anyone wants to do it. DTLHS (talk) 18:41, 7 June 2017 (UTC)
I am not sure I understand the suggestion exactly, but it does seem like it could be useful. Is there any way you could mock something up which demonstrates what you are suggesting (even if in a limited way)? I think I would support this effort even if that wasn't possible; I am more curious about what the implementation might look like and be capable of. - [The]DaveRoss 13:27, 8 June 2017 (UTC)
@TheDaveRoss mockup. — Ungoliant (falai) 15:22, 8 June 2017 (UTC)
Thank you for that, it isn't exactly what I expected but it makes a lot of sense now. This seems like a great way to collaborate. - [The]DaveRoss 15:38, 8 June 2017 (UTC)
What did you have in mind, Dave? — Ungoliant (falai) 15:59, 8 June 2017 (UTC)
For some reason I pictured something which showed the edits to be patrolled, and I couldn't think of how that might work (without a lot of fancy coding). I get now that it is more of a WT:VIP analog. - [The]DaveRoss 17:21, 8 June 2017 (UTC)
@Ungoliant MMDCCLXIV: I patrol changes to existing Hungarian entries almost every day by checking my Watchlist. Since it doesn't show new entries, those are harder to find. It would be great to have a better method. We had a Recent Changes list by languages a long time ago, that worked well. I'm not sure what technical challenges it presents. --Panda10 (talk) 18:45, 8 June 2017 (UTC)
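As a rough illustration of the automatic collation mentioned above, the following Python sketch groups flagged users by language and topic and emits the section layout described in the proposal (==Language==, ===Topic===, one link to each user's contribution page). The function name, the input format and the example user names are all hypothetical; how the flags would actually be collected, and how pings would be suppressed, is not addressed here.

```python
from collections import defaultdict


def build_patrol_page(reports):
    """Turn (user, language, topic) reports into wikitext grouped by language and topic."""
    grouped = defaultdict(lambda: defaultdict(list))
    for user, language, topic in reports:
        users = grouped[language][topic]
        if user not in users:  # list each user once per subsection
            users.append(user)

    lines = []
    for language in sorted(grouped):
        lines.append(f"=={language}==")
        for topic in sorted(grouped[language]):
            lines.append(f"==={topic}===")
            for user in grouped[language][topic]:
                lines.append(f"* [[Special:Contributions/{user}]]")
    return "\n".join(lines)


# Hypothetical reports, for illustration only.
reports = [
    ("ExampleUserA", "Japanese", "Etymology"),
    ("ExampleUserB", "Galician", "Translations"),
    ("ExampleUserA", "Japanese", "Etymology"),
    ("ExampleUserC", "Japanese", "Pronunciation"),
]
page = build_patrol_page(reports)
print(page)
```

A patroller who knows Japanese etymology would then only need to watch the ===Etymology=== subsection under ==Japanese==.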

Appendix: Easily confused Chinese words

Hi, regarding the concept of Chinese anagrams, I think it would be of great help to create an appendix similar to Easily_confused_Chinese_characters but for "Easily confused Chinese words", which in theory could be easily created from a corpus of words, just by selecting those with the same number of the same characters yet in different positions. Furthermore, I'd like to know how the concept of anagram can be used for characters themselves, transposing radicals or even strokes. --Backinstadiums (talk) 13:14, 8 June 2017 (UTC)

Wikidata precautionary principle

Once the Wikidata vote ends, there's a chance we'll get Wikidata installed here.

If that happens, what do you think of restricting its use by implementing the rule below?

"Any and all edits using Wikidata shall be reverted on sight if they were not discussed or voted before."

Or we might demand all Wikidata uses to be voted, not just discussed. --Daniel Carrero (talk) 14:31, 8 June 2017 (UTC)

I agree that it is right to be cautious in any implementation of data from Wikidata, especially when that data will be presented directly to the user. I would suggest that, at least to begin with, any use of Wikidata data which is non-controversial (e.g. using the mapping between ISO codes and language names [in as much as they conform to our current use]) be discussed publicly and agreed upon, and any potentially controversial use (e.g. including a Wikidata identifier [Q12345] as a template parameter; presenting Wikidata data directly) be subject to a vote. - [The]DaveRoss 14:45, 8 June 2017 (UTC)
I'm not sure the language code→name mapping is 100% noncontroversial. I've been thinking it might be a good idea to create a separate vote with this proposal: "Moving all the data from Category:Language data modules, Category:Dialectal data modules, Module:families/data, Module:scripts/data and Category:Unicode data modules to Wikidata." --Daniel Carrero (talk) 15:01, 8 June 2017 (UTC)
Ehh. (Or maybe: LOL.) Treatment of languages (as dialects or macrolanguages, etc) is rather complex, and not a good candidate for quick Wikidata-fication. Treatment of ISO- as well as exceptionally- coded lects as separate or as dialects of one unit varies a lot between wikis; we merge even some lects that have separate wikis, like the Serbo-Croatian lects. Names also vary a lot not just between wikis of different languages (which obviously use their own native names for things), but also within different wikis that use the same language, if wikis have different priorities with respect to e.g. calling each language by its native name, or by the name that is most common in references on the language, vs calling it by a name that distinguishes it from other languages with the same name: hence we have "Austronesian Mor" and "Mbo (Congo)", where another wiki might prefer "Mor (Austronesian)" or "Mbo (Democratic Republic of the Congo)", or might even think the name should be plain "Mor" and damn the torpedoes.
Other things that pertain to languages also vary between wikis, e.g. some wikis might consider that a language that is documented by linguists in the Latin script, or in Cyrillic, but that has no natively-used script should not be said to have "Latn"/"Cyrl" as a script, whereas other wikis might feel otherwise. Since our templates need to know what scripts a language is written in so as to know whether it needs a transliteration, and since our CSS applies fonts based on that information as well, we wouldn't want other folks to pull the rug out from under us as to what script a language had.
Even family information is the source of both disagreement (is Tibeto-Burman different from Sino-Tibetan? is Finno-Ugric different from Uralic? etc) and different priorities (some wikis might want comprehensive family trees that listed every node; for us, they would be too finely granular and would ghettoize languages and derivations into tiny categories hidden at the ends of long trees).
To move only some language names / script infos / etc to Wikidata and handle others locally would be inefficient and unwise, IMO. And in order to move them all, we would need "infrastructure" to be added there that would handle e.g. "is called X on wiki Y", at which point, we'd just be doing the same thing we're doing well here, but doing it there for some reason, which seems inefficient and unwise.
IMO, treatment of languages is so central to what each Wiktionary does that, especially for a wiki with as many active linguistically-adept and technically-adept editors as en.Wikt, it makes sense to do it locally.
Peripheral things, things that are not core to our mission, are better candidates for moves: e.g., parsing that a certain city is in a certain county/province/etc in a certain country on a certain continent, or (re the recent discussion of eponyms) parsing that a certain person has a certain nationality. - -sche (discuss) 16:34, 8 June 2017 (UTC)
Here's an idea: if Wikidata allows it, each language could have an "English Wiktionary name" property. We would use "Austronesian Mor" even if for other purposes the same language is called by other names. --Daniel Carrero (talk) 16:54, 8 June 2017 (UTC)
There is a strong argument for bringing ourselves more into compliance with international standards where we can, but also I disagree that managing by exception would be overly complex. There are relatively few people who have any idea how/where to modify the existing language mapping structure; it is both obfuscated and has significant functionality issues (see water, man). As Daniel suggests there are also other paths for this (and virtually every) problem, e.g. creating new properties which would be controlled by this community. - [The]DaveRoss 17:01, 8 June 2017 (UTC)
I'd like to say something again about compliance with standards, which is important. I'm pretty sure all or most language codes listed at Module:languages/datax are not in compliance with ISO. I would support changing them all to be ISO-compliant, but some people may oppose doing that. (this was one of the points discussed at Wiktionary:Beer parlour/2017/February#Proposal: Implementing Wikidata access) --Daniel Carrero (talk) 17:10, 8 June 2017 (UTC)
  • The sentence in quotes is something that I would enthusiastically support. It would certainly assuage a lot of my (and others') personal fears with respect to Wikidata. —Μετάknowledgediscuss/deeds 15:19, 8 June 2017 (UTC)
I support requiring a vote for each distinguishable use case. DTLHS (talk) 16:47, 8 June 2017 (UTC)

Wiktionary:Votes/2017-05/Installing Wikidata passed. I created Wiktionary:Votes/pl-2017-06/Wikidata precautionary principle to implement the principle proposed here. In the vote, I rewrote the proposed text to be a bit longer and more policy-like in my opinion, but the idea is 100% the same as proposed here. Feel free to edit the vote or suggest any changes. --Daniel Carrero (talk) 00:30, 11 June 2017 (UTC)

1000 Middle Dutch entries!

With the appropriately-chosen fêeste, there are now 1000 entries for Middle Dutch. Some of them are alternative forms, but this is compensated by some entries that have more than one lemma. —CodeCat 20:52, 8 June 2017 (UTC)

Impressive! Our coverage of languages like this is an asset. - -sche (discuss) 19:16, 9 June 2017 (UTC)

Borrowed descendants

Is there a standard format for words whose descendants are all borrowed? Case in point, French hangar. Related, I wonder if we need a variant of {{see desc}} for examples like Frankish *haimgard that instead reads (see there for further borrowings). --Victar (talk) 00:53, 9 June 2017 (UTC)

What do you care for most? What are you concerned with? Take part in the strategy discussion

The more involved we are, the more ideas or wishes concerning the future of Wikipedia we have. We want to change some things, but other things we prefer not to be changed at all, and we can explain why for each of those things. At some point, we don’t think only about the recent changes or personal lists of to-dos, but also about, for example, groups of users, the software, institutional partners, money!, etc. When we discuss with other Wikimedians, we want them to have at least similar priorities that we have. Otherwise, we feel we wasted our time and efforts.

We need to find something that could be predictable, clear and certain to everybody. A uniting idea that would be closer to everyday reality than the Vision (every human can freely share in the sum of all knowledge).

But people contribute to Wikimedia in so many ways. The thing that should unite us should also fit various needs of editors and affiliates from many countries. What’s more, we can’t ignore other groups of people who care about or depend on us, like regular donors or “power readers” (people who read our content a lot and often).

That’s why we’re running the movement strategy discussions. Between 2019 and 2034, the main idea that results from these discussions, considered by Wikimedians as the most important one, will influence big and small decisions, e.g. in grant programs, or software development. For example: are we more educational, or more IT-like?

We want to take into account everybody’s voice. Really: each community is important. We don’t want you to be or even feel excluded.

Please, if you are interested in the Wikimedia strategy, follow these steps:

  • Have a look at this page. There are drafts of 5 potential candidates for the strategic priority. You can comment on the talk pages.
  • The last day for the discussion is June 12. Later, we’ll read all your comments, and shortly after that, there’ll be another round of discussions (see the timeline). I will give you more details before that happens.
  • If you have any questions, ask me. If you ask me here, mention me please.

Friendly disclaimer: this message wasn't written by a bot, a bureaucrat or a person who doesn't care about your project. I’m a Polish Wikipedian, and I hope my words are straightforward enough. SGrabarczuk (WMF) (talk) 11:01, 9 June 2017 (UTC)

Yikes, the use of marketing jargon here is horrible, the opposite of wikis' origins as common-sense technical tools. I think I agree with the Germans. Equinox 18:50, 9 June 2017 (UTC)
@Equinox: Excellent thanks for the link! --WikiTiki89 19:03, 9 June 2017 (UTC)
I wish they would keep us out of their propagandistic marketing agendas. --Victar (talk) 19:08, 9 June 2017 (UTC)
This isn't marketing. I'm not a marketer. And I'm not 'we'. Tell me you don't care about anything beyond English Wiktionary, fine, I'll understand (there are many users who 'just edit'), but don't imply I do things that I don't. For me, it's an utter lack of WT:AGF. In other words: are you interested in the movement strategy? do you have any questions? remember: questions to me personally, not to the entire WMF. And please, see my user page and read who I am. SGrabarczuk (WMF) (talk) 21:16, 10 June 2017 (UTC)
I apologize, SGrabarczuk (WMF), the principle of WT:AGF has never been very popular on en.wiktionary. Don't take it personally, though. Plenty of our first-time visitors get jumped on. Think of it as baptism by fire. —Stephen (Talk) 22:01, 10 June 2017 (UTC)
Stephen, I appreciate what you wrote, however, I'm not new at all. I got used to a daily MMA situation when my opponents get medieval on my arguments, and I can do the same with their words. That's a 'normal' wiki-way, but only when one doesn't like the opponent, or when one talks to someone from the other side (a newbie, a WMF staffer). But I'm not one of them, and there's no reason not to like me in advance. I, like 'Red' Redding, may get your community sth from the other side, provided I'm asked civilly. That propaganda, I wrote it myself. I tried to avoid corporate speech, added a friendly disclaimer, but yeah. Cheers. SGrabarczuk (WMF) (talk) 00:52, 11 June 2017 (UTC)
SGrabarczuk, one of our admins has indefinitely blocked w:Jimmy Wales himself. We are nothing if not ecumenical abusers. —Stephen (Talk) 20:58, 16 June 2017 (UTC)
No, nobody is attacking you for no reason. Stuff like "movement strategy ecosystems and actors" is opaque marketing-style jargon from the commercial world, not to be trusted on an open project. Equinox 21:05, 16 June 2017 (UTC)
@Equinox, SGrabarczuk (WMF) is Polish, and his English can be expected to be a bit off. It likely reflects the sort of English texts that he often reads. He indicates that his English level is en-3, so judgments such as "opaque marketing-style jargon from the commercial world, not to be trusted on an open project" are unfair, inappropriate, and very likely incorrect. —Stephen (Talk) 23:08, 16 June 2017 (UTC)
@Equinox, I bet both of us know well how to deal with the History tab. You can see that I didn't write that. You're welcome to react, for instance by telling (err, where did you find this?) insert_the_author's_nick_here not to use that sort of words. I'll agree! I know that most users assume "that guy has a (WMF) in his signature, so he's co-responsible for all that rubbish", but that's an incorrect, simplifying, and not exactly fair assumption. I'm not a boss, thus I'm responsible only for myself. SGrabarczuk (WMF) (talk) 22:05, 16 June 2017 (UTC)
@Stephen G. Brown Really, I always found en.Wikt to be very manageable and relaxed in terms of good faith. Make a round on German Wikipedia (and German editors here...). Quick to assume agendas and deceit and not as shy to say so. (Not excluding myself, I quietly accuse a lot of them of agenda pushing.) @SGrabarczuk (WMF) I agree that the headlines sound like suit buzzwords. The actual explanations are valid topics that might benefit the Wiki projects. I think editors, certainly here, have little interest in diverting time and attention to the peripherals of what they actually like doing. En.Wikt is, from what I see, populated not by people who wanted to be part of a commune and better the world but by people who use dictionaries a lot, do linguistics a lot or have a pet language they would like to propagate. Small wik-, big -tionary. Even the participation in these here talk pages is minuscule. This means for the 'other side' that things have to be made with the awareness/anticipation that they will be judged (and most likely discarded) after a cursory glance, so things like headlines should be very plain and on point, even if they don't have much ring as a motto. And to give you at least some form of feedback: The only things I ever see in these discussions which people would need other than 'more editors making more edits' are technical things. So maybe explicitly asking what's needed in that regard would get a better reply. Really just a guess, though, I don't deal in the Grease Pit. Korn [kʰũːɘ̃n] (talk) 22:45, 16 June 2017 (UTC)
@Korn, read the comments in this discussion by Equinox. Not what I would call manageable and relaxed in terms of good faith. —Stephen (Talk) 23:08, 16 June 2017 (UTC)
@Stephen G. Brown Might be personal bias as I'm partially affected, but I still find "Marketing jargon isn't to be trusted in an open project" less venomous than the things I find on the talk pages of Wikipedia, and I don't think that kind of attitude is common here towards editors editing. I could only tell you one person here who ever showed an actual lack of good faith [read: was instantaneously making blatant accusations] in dealing with such situations and it's a German. Even Wyang and CodeCat only think of each other as stubborn idiots, rather than people with ill intent. As an example of actual bad faith: My worst personal encounter with not-good-faith was moving an article called 'German dialects', dealing in equal parity and no little detail with Dutch, Low German and German dialects, to 'Continental West-Germanic dialects'. I was, IIRC, then accused of doing that for vandalism, by people who had never heard the term, and the fact that I had made a typo, as I do, and had to move the article twice, was taken as evidence that I was covering up my villainy. The article was then moved back and butchered from a perfectly good article about Continental West-Germanic into a disgrace about 'German'. Now that is real lack of good faith, the important kind, which affects the projects' contents. And I never saw that kind happening here. (Feel free to direct me to examples.) Equinox' distrust, while not a good sign of project spirit, doesn't actually make Wiktionary any worse to the user than it is, nor does it drive away able, good-willed editors. And I don't think he's actually thinking WMF is out to worsen the projects. I read it more as an accusation of being out of touch. Korn [kʰũːɘ̃n] (talk) 09:00, 17 June 2017 (UTC)
@Korn, yes, not everyone treats others that way. The number of abusive editors seems to have been decreasing over the years, very gradually. Nowadays, I think there are only a couple of them left. In the old days, half of our editors seemed to be looking for any reason to jump down someone's throat. The atmosphere was tense. It used to be common for some admins to block other admins over minor or weird reasons. A few years ago, a certain admin blocked a new editor for one year because he transcribed a Persian word using /sh/ instead of /š/. The situation is improving, but it is still nothing to brag about. —Stephen (Talk) 09:22, 17 June 2017 (UTC)
It's no insult. All the propaganda and stuff coming from "the Wikimedia foundation" uses a completely different tone and context than the ENTIRE rest of the Wiktionary site. I see it as inappropriate. Contributors here may be involved in the Wikipedia project or other WM projects as well, but are not necessarily and many aren't. Wikipedia and Wiktionary are two completely separate communities, other than the fact that the two often link to each other on a lot of pages, often share contributors, and that they're funded by the same foundation. In my opinion, besides possible funding issues, if Wiktionary ever broke off from its relations with WMF, there would be no differences. The people, the contributors, the tone, expectations of contributors, style of editing, etc., would all be the same. It is my opinion that the ads need to stop coming to places like the Beer Parlour because it's simply not a concern or priority of ours. PseudoSkull (talk) 10:43, 17 June 2017 (UTC)
Really, now? You're attacking the people who do the friendly work of making sure this project stays online because they offer you the choice to participate in the decisions of what they do with the funds they raised? Which you can just ignore and move on with your life? That's the type of thing you want to do? Korn [kʰũːɘ̃n] (talk) 12:28, 18 June 2017 (UTC)
As above, I'm not the only one holding the side you just stated. Also, I wouldn't say my statement is "attacking", but rather stating an opinion that the propaganda is somewhat inappropriate here. Maybe we should just make a page called Wiktionary:Place propaganda and advertisements here to keep it out of our site discussion space. Like I said, Wiktionary is a completely separate community with very little direct relations to Wikipedia, so the propaganda is not really a concern to us. That isn't an attack; that's just the stating that I'm annoyed by it like others above me, as it has always clearly stood out to me that the propaganda did not belong here. I never said anything, though, because I didn't want to possibly be that one egg who said something that pissed Wiktionary off. PseudoSkull (talk) 16:54, 18 June 2017 (UTC)
To be fair, I never said that the content of the propaganda was bad; I personally actually agree with most of what it said and think it's good that free knowledge is being highly encouraged by the foundation, etc., but the fact that there IS propaganda at all is the bad part, regardless of how much I agree or disagree with its statements. PseudoSkull (talk) 16:57, 18 June 2017 (UTC)

Maybe there's a story in the background that justifies your attitude, I don't know. I probably can't and shouldn't resolve that. I can only admit that I understand what you write. Now, look: in various places all over the world, groups of people (not necessarily users) gathered and talked about the future of our movement. Link #1, link #2. The crucial points documented in such pages will help to conduct the next discussion. Think and write what you want, but don't miss the point that you do have an opportunity and you can influence the strategy, if you take part.

By the way, the next cycle (3rd one) of our discussions will begin in July. Now, we're reading the cycle 2 feedback. When the conclusions are published, I'll write again. Do you have any questions concerning the strategy? SGrabarczuk (WMF) (talk) 15:30, 24 June 2017 (UTC)

To clarify: I really, really dislike marketing/advertising and feel that they have destroyed much of what was good about the early Internet (and I spend enough time deleting Viagra and vanity from en.wikt); so I am inclined to go on the attack when I think I spy this stuff creeping into spaces that are still largely neutral and informational, like (the greater part of) the Wikimedia projects. SGrabarczuk, of course I have nothing against you personally and I apologise if you feel you were being attacked as an individual. I know I am difficult to get along with! Equinox 15:45, 24 June 2017 (UTC)

English orthographic categories

Hi, taking into account the educational characteristic inherent in lexicography, I'd like to propose creating categories for intricate issues related to orthography. Thus, a category for words with doubled letters, even twice (lemmas: aggress, accommodate, etc.) would help learners review lists and make fewer mistakes. --Backinstadiums (talk) 16:34, 9 June 2017 (UTC)

We have a few already: Category:English terms by orthographic property. — Ungoliant (falai) 16:48, 9 June 2017 (UTC)

Wikidata proposal: Add "English Wiktionary name" as a statement about all languages

@Lea Lacroix (WMDE): Here in the English Wiktionary, we use a single name for each language everywhere.


The chosen language name is used in etymologies, translations, descendants, categories, appendices, policies and other places. If we decide to change a language name, it will have to change everywhere. We often use modules and templates to convert language code→name information, such as "mhz"→"Austronesian Mor". We have large data modules containing language code→name information. The "mhz"→"Austronesian Mor" information is one of the 621 languages currently stored at Module:languages/data3/m. See also Category:Language data modules for all the language data. WT:LANG is our language policy.

Three questions:

  1. Can we move all the language data modules to Wikidata? (we don't know yet if there's consensus here to do it — it may or may not be done, depending on consensus)
  2. Can we add a new statement "English Wiktionary name" where its value is exactly the name we use? This way, the language name will be exactly as the English Wiktionary decides, even if a different synonym is chosen for other projects and purposes.
  3. Can we protect the new statement "English Wiktionary name" so that people are generally disallowed to edit it even if the rest of the data item is free to be edited? Maybe only Wikidata admins would edit it.

This concern was raised by @-sche in this discussion above: #Wikidata precautionary principle. --Daniel Carrero (talk) 17:23, 9 June 2017 (UTC)

The canonical name of a language is the single most queried item across Wiktionary. This means that there are very strict requirements on speed. How fast can such names be queried from Wikidata? What if it's hundreds of them like on water? As for the name of the data item, something like en.wiktionary might be better, but I have very little knowledge of how Wikidata works in detail (could someone explain it to me on my talk page?). —CodeCat 17:35, 9 June 2017 (UTC)
@Daniel Carrero, Lea Lacroix (WMDE): If this foolish idea is to proceed, then as I noted in the previous discussion, you would also need to add "English Wiktionary script(s)" and "English Wiktionary family" fields, and corresponding "French Wiktionary script(s)", "English Wikipedia family", etc to handle cases where script and family information is disputed/controversial between wikis, e.g. where one wiki follows Ethnologue in declaring a language unwritten (Zxxx / Zyyy, because there is no natively-authored literature) but we regard it as being in Cyrl (if we have entries in Cyrl citing linguistic reference works in Cyrl, our templates and CSS rely on that being declared as a script of that language so they know to ask for transliteration, and to call on the right fonts for displaying it); and (respectively) where one wiki treats Uralic and Finno-Ugric as different (maybe a WP would), and another treats them as the same (cf Tibeto-Burman vs Sino-Tibetan, etc), or where one wiki wants every node in a family tree represented, while another wants pragmatic levels of categorization, not to ghettoize languages into a dozen nested levels of families. You would also need parameters for when one wiki considers several ISO or non-ISO coded lects to be distinct, but another wiki considers them one language (e.g., Serbian vs Croatian vs Serbo-Croatian, Rhine Franconian vs Central Franconian vs Kölsch, European French vs Cajun French, Bourbonnais-Berrichon vs Bourbonnais and separately Berrichon etc), as well as the basic capacity to handle lects which lack ISO codes. You would need, in effect, to duplicate everything we currently do here, but on another site, where the people who maintain language data on this site would need to have the full editing rights to keep it up to date there, or bother Wikidata editors about hundreds of things a month — see how many distinct codes I've changed in Module:languages's submodules recently — or, more likely, have things go unupdated.
And, as CodeCat notes, fetching the data from Wikidata would have to be as fast as fetching it from our local modules, unless you were going to make the existing problems large entries have (which seem to be due to the use of many complex auto-transliteration modules) worse. I continue to regard this fixation on outsourcing everything, despite the lack of benefits, as foolish; language data is so central to each wiki that it makes sense to do it locally, at least on a wiki with as many active technically- and linguistically-adept editors as en.Wikt. - -sche (discuss) 18:10, 9 June 2017 (UTC)
I'm curious as to whether using Wikidata would actually be faster than our local language modules. Is there any way to measure that? Currently, a template replacing "mhz"→"Austronesian Mor" would have to transclude the entirety of Module:languages/data3/m. With Wikidata, it would only have to access a single property (language name, or en.wikt language name) of d:Q2122792. --Daniel Carrero (talk) 20:03, 9 June 2017 (UTC)
Arbitrary access to Wikidata, that is, access to an item not connected with the current page, is an expensive function. See mw:Extension:Wikibase Client/Lua#mw.wikibase.getEntity. I think it is not an option for the water case. --Vriullop (talk) 20:51, 9 June 2017 (UTC)
That makes me wonder: can we have per-wiki Wikidatas? We don't want to outsource all this data, but the infrastructure of Wikidata could still be beneficial (whether in terms of speed remains to be seen) if it could be kept locally. —CodeCat 20:19, 9 June 2017 (UTC)
It's important that d:Q42365 (the Wikidata item for "Old English") already has identifier statements like "Quora topic ID" ("Old-English") and "Encyclopædia Britannica Online ID" ("topic/Old-English-language"), so using Wikidata to contain information about multiple sites seems the normal thing to do; it's not an earth-shattering new idea. So I think Wikidata might as well have "English Wiktionary name" (or "English Wiktionary ID") for all languages; it would simply fit the current system. --Daniel Carrero (talk) 20:26, 9 June 2017 (UTC)
Perhaps, but what about my idea of having some kind of Wikidata local to en.wiktionary alone? —CodeCat 20:41, 9 June 2017 (UTC)
OK, about your idea: Support. --Daniel Carrero (talk) 20:42, 9 June 2017 (UTC)
Sounds to me that what we need is actually WikiMetadata: a database of controlled vocabularies (and even ontologies) used by one Wiktionary to store and serve all linguistic descriptions, be it languages or parts of speech or scripts. No actual "data", only edited by recognized Wiktionarians, cached for fast Lua access, etc. — Dakdada 11:22, 14 June 2017 (UTC)
More and more I think the idea of a local Wikibase installation might be the better course. While I like the idea of sharing as much data as possible across Wiktionaries, I am afraid that the consensus to get there is not likely to arise any time soon. If we start with a local database we can get all of the benefits of a relational database to work with while eliminating the concerns of numerous individuals. - [The]DaveRoss 17:55, 16 June 2017 (UTC)


Thanks for bringing this interesting issue. I'll try to provide a general answer, feel free to ping me if I forgot something.

I'm not sure that an "English Wiktionary name" property would be accepted through the community process on Wikidata. Of course I can't answer on behalf of the editors. In order to get the language names from Wikidata, maybe it would be more efficient to first improve Wikidata (we are aware that a lot of items about languages are missing), fix the labels if necessary (in collaboration with the Wikidata editors), and finally improve your module so you can use data from Wikidata, but also, in the cases where Wikidata doesn't fit your naming rules, use your own local labels.

About having part of the data protected: I'm pretty sure this will be rejected by the community, since Wikidata, like the other Wikimedia projects, has free and direct editing among its basic rules. We don't allow any editor, or external organization, to have specific rights on the data, and we try to solve potential issues through discussion and source-based information. I'm sure we can solve this as well within both communities.

About having a local database: it is technically possible for you to install your own instance of Wikibase, the free software that powers Wikidata. However, our goal here is to share knowledge within the Wikimedia projects as much as possible, and to have less information split into silos; that's why we can't support this idea.

Now that I better understand your concerns about languages, I have the feeling that this is not the best topic to start experimenting with Wikidata and arbitrary access. These templates are used on almost all the pages of the main namespace. It would be wise to start with something with a smaller scale of change. Also, this topic appears quite controversial sometimes. I agree that we should find solutions together, but for a start, I would suggest something else.

I had a look at the citations namespace, for example this one, and noticed that you're generally including with the quote the name of the work, its author, and date of publication. This information could be very easily integrated from Wikidata. Instead of entering manually "1843 — Charles Dickens. A Christmas Carol", you could build a small module that needs only the ID of the work (Q62879) to display automatically the title, author, year of publication, and even more information. This seems a nice way to start experimenting with arbitrary access. What do you think? Lea Lacroix (WMDE) (talk) 13:57, 12 June 2017 (UTC)

@Lea Lacroix (WMDE): Thanks for your reply. I understand. I agree with the citation idea, I've been thinking that it's a great idea to use Wikidata to fetch that information (and of course help to build Wikidata by adding data about more books when needed). --Daniel Carrero (talk) 02:05, 14 June 2017 (UTC)
About the local database: the idea is that this would not be a database of shareable data, but of metadata. This is very different. Also those metadata are already "split into silos" (in Lua modules), and for good reasons: they are community specific. — Dakdada 11:27, 14 June 2017 (UTC)
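As a rough illustration of the citation idea suggested above (a module that needs only the ID of a work), here is a minimal Python sketch. It does not query Wikidata: the `Work` record, the `WORKS` table and the `format_citation` function are hypothetical stand-ins, and the only data used is the Dickens example quoted in the discussion. An actual module on the wiki would be written in Lua and would fetch these values from the item itself.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Hypothetical, simplified record of what a Wikidata item about a work could supply."""
    title: str
    author: str
    year: int


# Hypothetical local stand-in for a Wikidata lookup, keyed by item ID.
# The single entry is the example given in the discussion.
WORKS = {
    "Q62879": Work(title="A Christmas Carol", author="Charles Dickens", year=1843),
}


def format_citation(work_id: str) -> str:
    """Build the citation header line from nothing but the work's item ID."""
    work = WORKS.get(work_id)
    if work is None:
        # An unknown ID is reported rather than silently producing an empty header.
        raise KeyError(f"no data available for work {work_id}")
    return f"{work.year} — {work.author}. {work.title}"


print(format_citation("Q62879"))
```

The point of the sketch is only that the editor types one identifier and the title, author and year follow from the data, so a correction made once in the data would propagate to every citation that uses the same ID.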

Request to add a new language code "jya" for rGyalrong a.k.a. Jiarong

I'm travelling in rGyalrong speaking areas of Sichuan right now. Danba county and now Kangding, both in Garze (Ganzi).

rGyalrong people identify as Tibetan and are classified as Tibetan by the Chinese government. But their language is more closely related to the Qiang language than to Tibetan. (The Qiang don't identify as Tibetan and are classified separately by the Chinese government.)

There is an ISO code 'jya'

rGyalrong is apparently very important in reconstructions of Old Chinese as it's considered to be a very conservative member of the Sino-Tibetan family.

Note that it's considered a group of languages (or dialects?) but a proposal to split it into individual language codes was rejected in 2011. This can be handled with labels and in any case I believe none have written forms so we'd have to use either IPA or whatever conventional orthography is used by linguists. One main linguist is known for the study of all these languages so my guess is there's a unified conventional orthography.

hippietrail (talk) 04:14, 10 June 2017 (UTC)

-sche split the code "jya" last year into Situ (sometimes called Eastern rGyalrong) (sit-sit?), Japhug (sit-jap?), Tshobdun (Caodeng, Sidaba) (sit-tsh), Zbu (Rdzong'bur, Showu, Sidaba) (sit-zbu). DTLHS (talk) 04:24, 10 June 2017 (UTC)
If you can obtain more data on how different or similar the lects are, perhaps especially when written, that will be very helpful. The limited data I found on the lects suggested they were not mutually intelligible; even the Ethnologue page says "Dialects are likely three separate mutually unintelligible languages" with low similarity. Guillaume Jacques says "Rgyalrong comprises at least four mutually unintelligible languages: Japhug, Tshobdun, Zbu, and Situ." That's why I proposed the split DTLHS links to, and (with no feedback for over a month) implemented it. - -sche (discuss) 05:58, 10 June 2017 (UTC)
Thanks for the feedback! None of them are regularly written, though I read that Situ had an orthography created before 1950, I think. I haven't found any info on that yet so I guess it's something made up by linguists and/or missionaries. Situ is also by far the most spoken. It's also the one spoken in both Danba and Kangding, so it's the one I'm interested in.
Here is the best technical info I've found so far. I'm still reading through it: http://www.academia.edu/969613/Rgyalrong hippietrail (talk) 11:45, 10 June 2017 (UTC)
I've been gathering some comparisons of the languages' pronouns, conjugation patterns and other words at User:-sche/Gyalrong. The last decade or two of literature seems to be in agreement that the lects are mutually unintelligible. The only user I can think of who hasn't commented but might know about these languages is @Wyang. - -sche (discuss) 22:59, 11 June 2017 (UTC)
@-sche I'm not an expert in Rgyalrong either, and only have a physical copy of the 2002 Chinese-Rgyalrong dictionary (in the Situ dialect). The western expert on Rgyalrong is definitely Dr. Guillaume Jacques, who also used to have an account and was an admin on the Chinese Wikipedia: w:zh:User:向柏霖, so it may be a wise idea to contact him re: the organisation of the varieties of Rgyalrong on Wiktionary. Dr. Jacques wrote the 472-page 《嘉绒语研究》 (Jiarongyu yanjiu, “A study on the Rgyalrong language”), which divided Rgyalrong into the four mutually unintelligible dialects above. I also vaguely remember there was a Rgyalrong-Chinese-French dictionary for the Japhug dialect circulating online, which may be handy. Wyang (talk) 08:16, 12 June 2017 (UTC)

Flag semaphore

I created three flag semaphore entries. Let me know if they look OK and if they should be kept. I also asked SemperBlotto (User talk:SemperBlotto#Flag semaphore).

I tried to imitate the notation used in Category:American Sign Language lemmas.

(It may be of interest that Morse code entries were created in 2016, they also fit the spectrum of "things you can use in place of letters and numbers". The Morse discussion is here: Wiktionary:Beer parlour/2016/August#Proposal: Creating entries for Morse code characters.) --Daniel Carrero (talk) 12:34, 10 June 2017 (UTC)

@Daniel Carrero: Thanks for this. With Braille, Morse Code, and semaphore, I think that covers most unusual encodings for the Latin alphabet other than fingerspelling and shorthand. —Justin (koavf)TCM 16:56, 10 June 2017 (UTC)
@Daniel Carrero, Koavf: Tactile Sign Language: at least its alphabet should be added to Wiktionary. I don't know whether it should be a type of fingerspelling or rather a new language as such: http://www.deafblindinformation.org.au/wp-content/uploads/2016/01/db-tactile-alphabet.pdf
Thanks in advance. --Backinstadiums (talk) 08:31, 11 June 2017 (UTC)
@Backinstadiums: I knew that tactile signing was a thing but not that there was a way to encode it in print. It seems like this is a pictorial chart and not a way to record it that we can use. And tactile signing is not a language itself but another encoding. —Justin (koavf)TCM 15:31, 11 June 2017 (UTC)


Should this be a redirect?--2001:DA8:201:3512:CD84:BF8E:5FA5:70A7 16:54, 11 June 2017 (UTC)

Yes, looks good to me. (single codepoint) and II (two instances of "I") are the same thing, even though there's a distinction at some level which is meaningful to computers. For the same reason, I had redirected say, to !. --Daniel Carrero (talk) 17:00, 11 June 2017 (UTC)
Ok, I won't revert anymore. It's a kind of instinct when you see an IP user doing wholesale deletion of stuff, you think something's fishy. —CodeCat 17:02, 11 June 2017 (UTC)
IP person, if it's not too much trouble, please add {{R character variation}} in these kinds of redirects! --Daniel Carrero (talk) 17:06, 11 June 2017 (UTC)
Could you write documentation and categorise the template, @Daniel Carrero? —CodeCat 17:07, 11 June 2017 (UTC)
Alright, done! --Daniel Carrero (talk) 17:19, 11 June 2017 (UTC)

How can we use Wikidata's existing data pool?

It's very controversial to adopt Wikidata for anything new that's specific to Wiktionary, but is there any data currently on Wikidata that we can already make use of? Data about species comes to mind. @DCDuring, what do you think? —CodeCat 17:03, 11 June 2017 (UTC)

Author / biographical information is what comes to mind for me. DTLHS (talk) 17:04, 11 June 2017 (UTC)
Certain "is-a" categories, e.g. Rome is a city, labrador is a dog. Equinox 17:06, 11 June 2017 (UTC)
  • Support for types of place names. --Daniel Carrero (talk) 17:07, 11 June 2017 (UTC)
    • The place name thing is interesting. We might be able to modify {{place}} to make use of it. However, we'd presumably need some way to tell, within an entry, "use this Wikidata item". Would that mean that {{place}} would take a parameter to specify the Wikidata item code (Q...)? —CodeCat 17:10, 11 June 2017 (UTC)
      • I think we should think about if we can avoid using numeric codes directly in entries, if that's possible. DTLHS (talk) 17:13, 11 June 2017 (UTC)
        • If that's possible, then sure. But what is the alternative? —CodeCat 17:14, 11 June 2017 (UTC)
          • I suggest using a combination of numerical codes and hidden text comments that don't affect the entry. d:Q90 (Paris, France) and d:Q79917 (Paris, Arkansas) could work like in the table below. --Daniel Carrero (talk) 17:29, 11 June 2017 (UTC)
Code:
# {{Wikidata place|Q90|capital of France... this is just a comment I can say anything here}}
# {{Wikidata place|Q79917|city of Arkansas... this is the same}}
Result:
  1. A city in Île-de-France, France and the capital and most populous city of France.
  2. A city in Arkansas, USA.
Bleh. If we're going to include comments, can we not put them in actual wikitext comments instead of a pointless template parameter? —CodeCat 17:37, 11 June 2017 (UTC)
I think I would be happy enough without any comments at all, just using the 1st parameter for the number code and that's it.
But if we want comments for all place names, I was hoping to do this: if the comment parameter is empty, the entry could be categorized in Category:Place names without comments. --Daniel Carrero (talk) 17:40, 11 June 2017 (UTC)
I like the idea of using comments since it gives an assurance that the wikidata item is actually the intended target- otherwise someone could add an incorrect number and there would be no way to tell what they meant or if it was wrong. DTLHS (talk) 17:45, 11 June 2017 (UTC)
The data itself might help. The module could check (somehow) if the item is in fact a city, town, river or some other kind of geographical thing. If it's not, then it could throw an error. —CodeCat 17:48, 11 June 2017 (UTC)
Two checks: 1) check if the item is a type of geographical location (presumably needed for description and categorization purposes, and a template called {{Wikidata place}} should be able to return a module error otherwise); 2) compare the current entry title with the accepted titles in the Wikidata item. When "Q90" and "Q79917" are used in the entry Paris, the module should be able to check if "Paris" is an acceptable name for both items as per Wikidata. --Daniel Carrero (talk) 17:57, 11 June 2017 (UTC)
That second check might not work. The template is used for definitions in all languages, so it might also be used on Dutch Parijs. —CodeCat 18:01, 11 June 2017 (UTC)
Dutch Parijs is already available in the list of names for d:Q90 in all languages. This reminds me, {{Wikidata place}} should be able to know what is the current language section, so in that Dutch entry the proposed syntax should actually be {{Wikidata place|nl|Q90|capital of France}} (and "en" in English entries, of course). --Daniel Carrero (talk) 18:06, 11 June 2017 (UTC)
That works, but what if the name isn't listed? Should we require the name in language X to be present in Wikidata before we allow the use of the template for X? Adding a tracking category ("hey, this name isn't in Wikidata yet, someone go add it!") would be much more helpful than a straight error. —CodeCat 18:10, 11 June 2017 (UTC)
I support adding a tracking category as you described. Maybe it could be called Category:Dutch place names missing in Wikidata or something. --Daniel Carrero (talk) 18:14, 11 June 2017 (UTC)
A top-level category for stuff to be added to Wikidata in a particular language would also be good. Since you were involved in renaming all the request categories, I'll leave the naming to you. :) —CodeCat 18:17, 11 June 2017 (UTC)
Alright! Proposed category tree (which may be changed/discussed):
If we have template tracking categories as discussed here, I believe we don't actually need those template comments. Sure, it would be wrong to set up a place name definition with the number d:Q10943 because it means "cheese", but the template should be able to recognize that automatically. --Daniel Carrero (talk) 18:36, 11 June 2017 (UTC)
Hmm, wouldn't they rather be request categories? —CodeCat 19:15, 11 June 2017 (UTC)
I and other people in the request category vote seemed to support the following notion: a request category is when you manually request something, like {{rfe}}. In the Wikidata categories above, the entries would get automatically categorized whenever something looks wrong. --Daniel Carrero (talk) 19:18, 11 June 2017 (UTC)
Ok. But to call it an error is a bit extreme. It's just a missing translation, a common symptom of a project that is always a work in progress. —CodeCat 19:20, 11 June 2017 (UTC)
OK, second proposal:
--Daniel Carrero (talk) 19:36, 11 June 2017 (UTC)
I don't like this since it gets into the entire problem of representing lexicographic data in wikidata which they are really not set up to do. Place names have synonyms, obsolete forms, dialectal forms, etc. DTLHS (talk) 18:18, 11 June 2017 (UTC)
This is purely to supplement the current {{place}} template. This template creates definitions based on the parameters you give it and then also categorises appropriately. What would change in a Wikidata implementation is that these parameters would be fetched from Wikidata (e.g. "is a city", "capital of France") rather than being provided as parameters. This isn't lexicographical data in my understanding of the word. —CodeCat 18:22, 11 June 2017 (UTC)
Knowing that Dutch Parijs is a synonym of English Paris is lexicographic data. DTLHS (talk) 18:24, 11 June 2017 (UTC)
And that follows from the fact that both uses of the {{Wikidata place}} template, one on Paris and one on Parijs, would use the same Wikidata item code. This information is therefore not stored on Wikidata at all. Even if there were no Wikidata and {{Wikidata place}} were an empty template, the mere fact that they both had Q90 as a parameter would establish them as synonyms. —CodeCat 18:29, 11 June 2017 (UTC)
It's important that the "city of France" sense in Dutch Parijs will need to access the English translation somehow. Here's two ways to accomplish that: using Wikidata or using a parameter. See table. --Daniel Carrero (talk) 19:04, 11 June 2017 (UTC)
Code:
# {{Wikidata place|nl|Q90}} <!-- using Wikidata -->
# {{Wikidata place|nl|Q90|Paris}} <!-- using a template parameter -->
Result (the same for both):
  1. Paris (city in Île-de-France, France and the capital and most populous city of France)
The current iteration of {{place}} uses t1= for that purpose. I think we should keep using a parameter, again to minimise our dependence on Wikidata for lexicographical things. —CodeCat 19:07, 11 June 2017 (UTC)
That is an option, but it seems Wikidata is still an option too because it will get "morpheme" data items designed specifically for lexicographical data (Wikidata:Wiktionary).
Maybe the best course of action is just keep using the parameter as you said since it's reliable and it works. But we can change our minds later and delete that parameter from all entries if the morpheme thing works out. --Daniel Carrero (talk) 19:11, 11 June 2017 (UTC)
Yes, I'm aware of how hesitant many people here are about offloading lexicographical data to Wikidata. But if we limit ourselves to the existing data already out there, such as topography in the case of {{place}}, then I don't think it would be as much of an issue. —CodeCat 19:14, 11 June 2017 (UTC)
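The two checks and the tracking category discussed in this thread can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the proposed template: `Item`, `GEOGRAPHIC_KINDS`, `LANGUAGE_NAMES` and `check_place` are hypothetical names, the item data is a hand-written stand-in rather than anything read from Wikidata, and the category name is the one floated above, which was still under discussion.

```python
from dataclasses import dataclass, field

# Hypothetical list of item kinds accepted as geographical; the thread only
# mentions "city, town, river or some other kind of geographical thing".
GEOGRAPHIC_KINDS = {"city", "town", "river"}

# Hypothetical mapping from language code to the name used in category titles.
LANGUAGE_NAMES = {"en": "English", "nl": "Dutch"}


@dataclass
class Item:
    """Hypothetical simplified view of a Wikidata item."""
    kind: str
    labels: dict = field(default_factory=dict)  # language code -> set of accepted names


def check_place(item: Item, lang: str, page_title: str) -> list:
    """Return the tracking categories for one use of the template.

    Check 1: a non-geographical item is a hard error.
    Check 2: a page title missing from the item's names in that language
    only adds a tracking category, as suggested in the thread.
    """
    if item.kind not in GEOGRAPHIC_KINDS:
        raise ValueError(f"item is a {item.kind}, not a geographical location")
    categories = []
    if page_title not in item.labels.get(lang, set()):
        categories.append(
            f"Category:{LANGUAGE_NAMES[lang]} place names missing in Wikidata"
        )
    return categories


# Stand-in data for d:Q90, using only the names mentioned in the discussion.
paris = Item(kind="city", labels={"en": {"Paris"}, "nl": {"Parijs"}})
print(check_place(paris, "nl", "Parijs"))
print(check_place(paris, "nl", "Paris"))
```

Treating the first check as an error and the second as a category mirrors the distinction drawn above: a cheese item used as a place is a mistake, while a missing name is ordinary work in progress.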
In my role as articulate fish at a convention of ichthyologists, here's my initial view of how Wikidata might help with Translingual taxonomic entries.

How about a dynamic map for place names? See ca:Kenya. In fact it does not currently use Wikidata but OpenStreetMap with a Wikidata identifier. With Wikidata access it could fetch coordinates to add a point in the OSM map, for example for cities. --Vriullop (talk) 12:20, 12 June 2017 (UTC)

The map could be included in an infobox with relevant links to Wiktionary. From d:Q114: capital=Nairobi, demonym=Kenyan, languages=Swahili, English, currency=shilling, TLD=.ke, ISO code=KE. --Vriullop (talk) 13:12, 12 June 2017 (UTC)
Wikidata might be a desirable way for me to speed insertion of "References" to external sites. I have seen such links on Commons, en.WP, and Wikispecies. Wikidata seems to have already accumulated such information from multiple projects, though I haven't yet found which ones. If that were better than what I could find on eg, Rosa at NCBI, then it would marginally speed things up. If I could extract the links in a single step by, say, a substed template (or something more sophisticated), that would save a more meaningful amount of time. I am skeptical about the response time of accessing such links through Wikidata each time an entry is loaded.
Something similar might apply with respect to references at vernacular names, though such references are, IMO, not so useful for such entries.
You might think that the hierarchical taxon/clade structure would be a perfect use of Wikidata, but I believe that relying on such a structure is not helpful in definitions. It seems better to me to define species, genera, tribes, and sections with reference to the family, no matter what intermediate ranked taxa or clades may be used by one or more sites. Non-expert users don't seem to have much familiarity with the various super-, sub-, infra- ranks of phyla, classes, orders, families, and species, let alone tribes, sections, and divisions. DCDuring (talk) 19:49, 11 June 2017 (UTC)
See for example w:ca:Poecilia latipinna. At the foot of the page, a template fetches the identifiers from d:Q906572 and links to databases. --Vriullop (talk) 12:12, 12 June 2017 (UTC)
That's the idea, but see Alconeura for a not uncommon approach to handling missing pages at external databases: providing a link to a page for a higher-ranked taxon. I suppose that could be managed by using a separate template for each higher-ranked taxon selected for inclusion, but I'd want to be able to exclude from the higher-taxon list of links those databases that had a more specific link. In principle there could be many such separate templates for a given taxon, though very few taxa would need more than three. DCDuring (talk) 14:21, 12 June 2017 (UTC)
I'm not sure if I understand. d:Q10404513 has 3 identifiers linking to 3 databases. Are these links ok? Anyway, you must always provide the Wikidata page to fetch, either this one or the higher-ranked one. --Vriullop (talk) 14:44, 12 June 2017 (UTC)

Wikidata items as senseids

Currently, the template {{senseid}} is used to disambiguate specific senses and allow us to link to them from elsewhere. Senseids are just text, they can be anything at all as long as it's unique. This means it's possible to use the codes for Wikidata items as senseids too. For example, we could use {{senseid|en|Q90}} on the first sense of Paris. This wouldn't actually do anything, other than tell editors that this sense refers to the thing whose Wikidata code is Q90. {{senseid}} would not be modified at all for this purpose, it would not access Wikidata. But it does add information to Wiktionary entries, by establishing a conceptual link between senses and Wikidata items.

Such links could be used for future things that we currently haven't thought of. A possibility that comes to mind is Wikipedia links. If {{senseid}} detects that its parameter is a Wikidata item code (Q followed by numbers), then it could be modified in the future to query that item for its en.wikipedia article name. {{wikipedia}} or some similar interproject link could then automatically be displayed next to certain senses, whenever {{senseid}} is given a Wikidata item as a parameter.

For those of us hesitant to offload data onto Wikidata, please notice that this change would change nothing at all on Wikidata's end. The data added by this would be entirely on en.wiktionary, in the form of a wikitext template call, so we have full control. No data would be added to Wikidata, we'd merely be using what's already there. —CodeCat 18:47, 11 June 2017 (UTC)

I've added Wikidata senseids to Paris (English only) to demonstrate what I mean. Wikidata isn't actually enabled yet on Wiktionary, so these do literally nothing other than provide a senseid anchor on the page. —CodeCat 11:54, 12 June 2017 (UTC)
Looks good to me. Two comments: 1) doing this in large scale would probably require a vote, 2) I guess this idea should work 100% well for place names, but we can't use Wikidata items as senseids for all kinds of Wiktionary definitions. Wikidata probably doesn't have separate items for each sense of the verbs do, have, go, be... I guess the future Wikidata "morpheme" thing should work, but apparently the database would need to be built from scratch. --Daniel Carrero (talk) 12:03, 12 June 2017 (UTC)
Indeed, such senseids would probably be limited to nouns only, as Wikidata doesn't currently have items about verbal actions. The important thing to keep in mind is that the Wikidata items are about the referents of words, not the words themselves. The words would have their own items, if we get around to that. This does create some issues that I foresee. The colour green has Wikidata item d:Q3133. But in our entry green, we have not only a noun referring to this colour, but also an adjective. Since senseids must be unique within a single language section, we can't give both of them {{senseid|en|Q3133}}. So which one do we put it on? —CodeCat 12:10, 12 June 2017 (UTC)
Using Wikidata senseids would probably work well, not only for place names but also for a lot of proper nouns. For common nouns, adjectives, verbs and everything else, I don't have actual numbers, but at first sight I think it would fail more often than not. d:Q9465 is "ethics", so which sense of ethics and/or ethical would we use it on, if any? d:Q7242 is "beauty", and our entry beauty has multiple related senses too. --Daniel Carrero (talk) 12:23, 12 June 2017 (UTC)
We could just add disambiguators to the ids. They are still just strings, after all. So {{senseid|en|Q3133-noun}} for the noun green would work too. As long as the Wikidata id can still be parsed out, it should be fine. As for ethical, I think the issue here is that ethics is a field of study, which the adjective doesn't really have much to do with. If it did indeed refer to the same thing as ethics, I see no reason not to include the Wikidata id there too. —CodeCat 12:52, 12 June 2017 (UTC)
I'm fine with adding disambiguators to the ids, with hope that later in the future we can replace all that stuff by the Wikidata "morpheme" thing. --Daniel Carrero (talk) 12:56, 12 June 2017 (UTC)
I've added a tracking template, Special:WhatLinksHere/Template:tracking/senseid/Wikidata, to {{senseid}} whenever a Wikidata id is used as a senseid. This allows us to keep track of them for the time being. The process of adding all these senseids to entries will be a long one. I've thought of a possible way to speed it up, though, once Wikidata is enabled. Module:headword can be modified so that it checks if a Wikidata label exists with the same name as the current page. If so, it could track the page. This would give us a list of all pages that could probably have a Wikidata senseid added to them. —CodeCat 19:01, 13 June 2017 (UTC)
Nice work! We should use senseid / senselinks more often, but as you said they are quite tedious to add at the moment. A gadget could also be an option, with a Wikidata suggest-style dropdown. – Jberkel (talk) 16:52, 14 June 2017 (UTC)
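The requirement mentioned above, that the Wikidata id can still be parsed out of a senseid carrying a disambiguator, can be illustrated with a short Python sketch. The function name `parse_senseid` and the exact pattern are assumptions made for illustration; the thread only specifies "Q followed by numbers" and the hyphenated form seen in Q3133-noun.

```python
import re
from typing import Optional, Tuple

# "Q" followed by digits, optionally followed by "-" and a free-form disambiguator.
# The hyphen as separator follows the Q3133-noun example; the rest is an assumption.
_WIKIDATA_SENSEID = re.compile(r"^(Q[1-9][0-9]*)(?:-(.+))?$")


def parse_senseid(senseid: str) -> Optional[Tuple[str, Optional[str]]]:
    """Split a senseid into (Wikidata item code, disambiguator).

    Returns None when the senseid is ordinary free text rather than a
    Wikidata item code, so existing senseids are left alone.
    """
    match = _WIKIDATA_SENSEID.match(senseid)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_senseid("Q3133-noun"))
print(parse_senseid("Q90"))
print(parse_senseid("capital"))
```

Anchoring the pattern at both ends keeps ordinary text senseids that merely begin with a capital Q from being mistaken for item codes.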

Note that senses will have a new ID in Wikidata lexeme data model: mw:Extension:WikibaseLexeme/Data Model, with format L3746552-S4. --Vriullop (talk) 10:59, 17 June 2017 (UTC)

This is one of the uses that I feared, having wikitext littered with numerical identifiers. Ugly. --Dan Polansky (talk) 15:21, 17 June 2017 (UTC)

Deverbative or deverbal?

User:Barytonesis recently created a template {{deverbative}}. This is a good idea but I think it's wrongly named and should be {{deverbal}}. Both terms are synonyms but "deverbal" is much more common (as well as shorter and easier to type) — about 9x as many hits in Google, plus my spelling checker actually marks "deverbative" (but not "deverbal") as a mistake, plus Wikipedia has entries for Deverbal noun and Deverbal adjective but no entries for deverbative anything, not even redirects. The template puts these terms under the category "Foo deverbatives" (none of which have been created yet, and should not be created at all probably). I think instead they should go under "Foo deverbal nouns" or "Foo deverbal adjectives"; this requires an optional pos= parameter (which should default to "noun"). There are under 50 entries currently using this template and all appear to be nouns. I can easily use a bot to rename the template uses. Any objections? If not I will go ahead and make the changes. Benwing2 (talk) 23:37, 11 June 2017 (UTC)

I agree. It's also consistent with denominal (for which there should also be a {{denominal}} template). Additionally, it should point to Appendix:Glossary. --Victar (talk) 01:07, 12 June 2017 (UTC)
Fine by me. I'm actually happy that you take interest in it --Barytonesis (talk) 09:59, 12 June 2017 (UTC)
While you're running a bot, you should change the outdated |lang= to |1=. --Victar (talk) 10:13, 12 June 2017 (UTC)

Conditionally renominating User:Dan Polansky for admin per User talk:Dan Polansky#Renomination for admin?

I hereby nominate User:Dan Polansky for administrator on the English Wiktionary, with the one condition that Dan will be disallowed from using the block tool. For some context, quoting Dan himself:

"For context, my admin vote failed in August 2016.

Note that the block tool is one of power over other people, and should not be awarded to people who we cannot trust, no matter how good editors they are. The use of the block tool does not require consensus, and blocks are rarely challenged. Multiple of current admins are not qualified to use the block tool, in my view. The deletion tool can be abused to hide trails of conversation; it was used in this way by an English Wiktionary admin who meanwhile ceased editing."

I believe Dan would be able to delete pages in accordance to the rules and such, and seems to have a need for such. Still not sure about the block tool, but I'm throwing this condition in because it puts some previous opposers at ease a bit. Honestly, a lot of users may most of all fear that Dan may abuse the block tool. I think Dan is a great editor, though he may be rude sometimes and is notorious for such, so the admin tools of page deletion at the very least should be awarded to him. IMO, the admin tools are not supposed to be awarded to people because they're "nice" (and I'm not necessarily saying Dan is not nice), but because it would be useful for them to have to make even more constructive contributions, for instance, by fighting obvious vandalism, deleting pages in accordance to RFV and RFD, etc.

As I've never started a vote before, who wants to start this vote? PseudoSkull (talk) 18:22, 12 June 2017 (UTC)

I am somewhat uncomfortable with the idea that someone be given user rights that they are then prohibited from using. Either they should have them with the trust of the community to use them appropriately, or not have them. If there are, as Dan suggests, multiple users who should not have the blocking tools but should be able to delete (and protect?) pages, we could always create another user group with just the rights which are applicable. We could, for instance, give folks in the "template editor" group delete rights, or create a "deleters" group with just the rights to delete and restore. It is even possible to have delete rights without the ability to see deleted revisions, etc. I don't know if the complication is worth the extra effort, but to me it is preferable to the alternative. - [The]DaveRoss 19:04, 12 June 2017 (UTC)
I am also uncomfortable with that idea. I would probably support creating a new group "Deleters". --Daniel Carrero (talk) 02:08, 14 June 2017 (UTC)
Along with another group for "non-deleters". --Victar (talk) 02:33, 14 June 2017 (UTC)
Non-deleters would be anyone not in the "deleter" or "administrator" groups. - [The]DaveRoss 11:07, 14 June 2017 (UTC)

Local groups


Yesterday, I went to meet my local colleagues, contributors to Wikipedia, Wiktionary, Commons, Wikidata, OpenStreetMap or OpenFoodFacts. We have met several times this year to discuss contributing and to drink beer. Are there contributors here who do the same in their local groups? If so, do you also contribute to Wikipedia and chat about this project, or do you bring news from Wiktionary to Wikipedians? It's not a sociological inquiry, just curiosity. I have no idea how local groups work outside France. Noé 10:06, 13 June 2017 (UTC)

I think that the concentration of francophone Wikimedians in a relatively small, easily accessible area (France) is quite different from the anglophone community. I just recently met another Wiktionarian and would be happy to meet more, but it isn't likely to happen often. I also have plans to meet some Wikipedians later this year and maybe speak a bit about Wiktionary, although I honestly don't know what I'd tell them. —Μετάknowledgediscuss/deeds 17:34, 13 June 2017 (UTC)

Unprotect WT:NFE

I think it's silly that this is protected against non-admins. Surely non-admins also have important news to announce? —CodeCat 19:27, 13 June 2017 (UTC)

I agree. Maybe allow autoconfirmed only? --Daniel Carrero (talk) 19:29, 13 June 2017 (UTC)
Autoconfirmed sounds good to me. —JohnC5 19:31, 13 June 2017 (UTC)
Agreed and updated edit protection to autoconfirmed. - [The]DaveRoss 19:44, 13 June 2017 (UTC)
Thank you. I've been annoyed at the protection of that page for a while. — Eru·tuon 02:19, 14 June 2017 (UTC)

Foreign-language Wikisaurus entries

The very great majority of Wikisaurus entries are English. Some, such as Wikisaurus:కోతి are not. Is there some way we could add a "lang=" parameter to these? SemperBlotto (talk) 05:55, 14 June 2017 (UTC)

Where would you put it? {{ws header}}? —Μετάknowledgediscuss/deeds 18:31, 14 June 2017 (UTC)
Hmm, the language should probably be added to all the templates on Wikisaurus:కోతి that display Telugu text, so that it can be properly tagged. And the header should have a display title with the Telugu part appropriately script-tagged, as is done by Module:headword for entries in certain scripts. — Eru·tuon 18:44, 14 June 2017 (UTC)
imho, the English wiktionary isn't even the place for foreign Wikisaurus entries... --Barytonesis (talk) 19:39, 17 June 2017 (UTC)
Why not? Also, not giving language in pagetitles will inevitably lead to collisions, see for example WS:god, god.__Gamren (talk) 09:29, 21 June 2017 (UTC)

Canadian spelling: how to tag entries when both UK and US spellings are accepted

I see this has been discussed in the past but I'm not sure how to proceed. What I would like to do is tag gynecology and gynaecology as both being acceptable spellings in Canada (Royal societies tend to use the UK spelling, national and regional medical associations tend to use the US spelling, universities are mixed even in department names). I made a try at it using colour and color as examples:

{{tcx|Commonwealth spelling|Canada|lang=en}} and {{tcx|American|Canada|lang=en}} in the "Noun" headword
and {{qualifier|Commonwealth|Canada}} etc. in the "Alternative form" sections.

But with deprecated templates and such I'm not sure if I'm doing this correctly. Can anyone give me some pointers? Thanks! Facts707 (talk) 08:11, 14 June 2017 (UTC)

The label template for headwords is {{term-label}}, for definitions {{label}}. The template for Alternative forms sections is {{alter}}; see the template page for more information. — Eru·tuon 17:06, 14 June 2017 (UTC)

Why are we putting links to Wikipedia in reference sections?

Grant Parish and many others. Wikipedia is not a reference. DTLHS (talk) 21:32, 14 June 2017 (UTC)

This looks like something that could be fixed by bot in a number of entries. If a "References" section only contains instances of {{pedia}}, rename the section to "Further reading". --Daniel Carrero (talk) 21:35, 14 June 2017 (UTC)
I didn't ask if it could be fixed with a bot. DTLHS (talk) 21:36, 14 June 2017 (UTC)
I know. What you said sounded like a rhetorical question, though. It seems there's already consensus not to put links to Wikipedia in reference sections. --Daniel Carrero (talk) 21:42, 14 June 2017 (UTC)
I am not the first to put Wikipedia under refs, further reading or whatever. Will up the top do, or not at all? I still regard Wikipedia as a reference though, not as further reading. DonnanZ (talk) 22:10, 14 June 2017 (UTC)
So I need to have both "References", increasingly common in taxon entries, and "Further reading" headings when neither is accurate with respect to links to external databases, Commons, and Wikispecies? I specifically said that I thought "References" was an adequate heading for including Wikipedia as well as the others in a recent discussion in which there were numerous participants.
Over time the same type of content appeared under no less than four different headings. First, "See also", which was deemed inappropriate, with "External links" being mandated by vote. Now "References" and "Further reading" are available, but one of those is also to be forbidden? This seems like pointless makework in service of some silly, content-killing uniformity. DCDuring (talk) 23:13, 14 June 2017 (UTC)
The thing with using Wikipedia as a "reference" is that an editor could reference themselves, so it's better to use published, unchanging sources. Andrew Sheedy (talk) 23:18, 14 June 2017 (UTC)
Fine, whatever, I see that this is a pointless discussion. DTLHS (talk) 23:27, 14 June 2017 (UTC)
I understand the issue about mislabelling wikis as references and wish that there were a better title. We really should have a single vague all-inclusive name for these things. IMO "Sources" would do the job. It is wonderfully ambiguous (of what? for whom? of what authoritative status?). DCDuring (talk) 00:09, 15 June 2017 (UTC)
I'm not sure I agree with using "Sources" as a single vague all-inclusive name. I see the problem you have with the current set-up is that it does not seem to work well for links to databases and images. I agree with the images thing, at least, and would accept using some heading for databases even though I don't think is strictly necessary (just my opinion). But the current set-up seems to work well for basically everything else. --Daniel Carrero (talk) 00:38, 15 June 2017 (UTC)
We also have the problem of too many headers. We may be able to have every conceivable entry because we have "no space constraints", but we do have space constraints to the extent that we remain interested in retaining human individual users. We already suffer because we use headers to structure our entries and retain oversized fonts for them. Right now many entries should have both "References" and "Further reading", typically with just one or two lines for each. Hiding the content by default might be a solution, if one could hide all of the references or all of the further reading and databases. DCDuring (talk) 00:53, 15 June 2017 (UTC)
The source I was using (Index of U.S. counties on Wikipedia) is in my opinion a reference. There's an old saying "horses for courses", and that can apply to choosing a heading. DonnanZ (talk) 07:05, 15 June 2017 (UTC)
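
The bot fix Daniel Carrero describes above (rename a "References" section to "Further reading" when it contains nothing but {{pedia}}) could be sketched roughly as follows. This is only an illustration working on a plain wikitext string: the function name, the regular expressions, and the assumption that each {{pedia}} call sits on its own (optionally bulleted) line are all hypothetical, and a real bot would also have to handle edge cases such as an existing "Further reading" section.

```python
import re

# A "References" heading (level 3 to 5) plus its body, up to the next
# heading or the end of the page. The heading level is captured so the
# replacement heading can reuse it.
SECTION_RE = re.compile(
    r"^(?P<eq>={3,5})\s*References\s*(?P=eq)[ \t]*\n(?P<body>.*?)(?=^==|\Z)",
    re.MULTILINE | re.DOTALL,
)

# A body line consisting only of a {{pedia}} call, optionally bulleted
# and optionally with parameters (assumed layout, see lead-in).
PEDIA_LINE_RE = re.compile(r"^\*?\s*\{\{pedia(\|[^{}]*)?\}\}\s*$")


def rename_pedia_only_references(wikitext: str) -> str:
    """Rename References -> Further reading when the section holds only {{pedia}}."""

    def fix(match: re.Match) -> str:
        lines = [ln for ln in match.group("body").splitlines() if ln.strip()]
        if lines and all(PEDIA_LINE_RE.match(ln) for ln in lines):
            eq = match.group("eq")
            return f"{eq}Further reading{eq}\n" + match.group("body")
        # Empty sections and sections with real references are left alone.
        return match.group(0)

    return SECTION_RE.sub(fix, wikitext)
```

Sections that mix {{pedia}} with any other line are deliberately left untouched, since the disagreement above is precisely about whether those belong under "References".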

German vs Germans collectively

Continuation of d:Wikidata:Project chat#Germans (Q42884), only for the group or also individuals?

In the discussion linked above, I asked if the concept of Germans as a collective ethnic group is compatible with our entry German, which has one sense referring to a single individual of that group. We group singular and plural into one lemma, so in principle the same lemma can refer to one German or many, depending on which inflection you choose (and indeed, in some languages, there isn't even an inflectional difference). However, the idea of many individual Germans is still not quite the same as all Germans collectively: "the Germans" in our understanding can refer to this entire group, but grammatically it's just multiple Germans with nothing to indicate that it refers to all Germans as opposed to merely multiple Germans. Does this merit a separate sense, marked with {{lb|en|in the plural}}, referring to "all Germans, Germans as a group collectively"? If not, why would it not? It is conceivable that a language has distinct terms for multiple Germans contrasting with the collection of all Germans, but whether this occurs in practice I don't know. If there are examples of this, it would be evidence that the concepts are indeed separate. —CodeCat 19:32, 15 June 2017 (UTC)

There is a difference between the use of Germans with and without the definite article. Without the definite article you have sentences like "Germans drink a lot of beer", which is grammatically the same as "trees absorb a lot of sunlight". With the definite article you have "the Germans have elected a new chancellor", which is grammatically the same as "the trees have started changing color". In both of those cases I'm speaking about Germans as a whole and trees as a whole (which is not quite the same as "all Germans" or "all trees"). The case without the definite article can apply to just about any countable noun. The case with the definite article is more restricted (I'm not quite sure what the criteria are). So I really don't think we need a separate sense line for this. --WikiTiki89 20:11, 15 June 2017 (UTC)
As WikiTiki says, this is a general phenomenon, and probably does not merit a separate "in the plural" sense... but then, some entries where a homograph of the singular can be used as a proper noun to refer to the group collectively do indicate that, e.g. Abenaki! I don't know if that should be changed. Is there a difference (besides ethnicity) between "five Abenaki set out; later, the Abenaki reached their destination", "she settled among the Abenaki" and "the Abenaki considered issuing their own passports" vs "five Germans set out; later, the Germans reached their destination", "she settled among the Germans" and "the Germans considered issuing their own passports"? Hmm...! Compare also Chinese.
The fact that some language may have a distinction (and some constructed language, or even some computer 'language'/framework like is used on Wikidata, probably does have a distinction) does not mean that any of the English words, whether "Germans" or "trees", have separate senses. Some languages may distinguish living animals from dead ones, but English has just one sense of chum salmon, not two for "a living dog salmon" and "a dead dog salmon".
That some words in some languages do not match 1-to-1 to other words in other languages (while other words do) has been brought up before in the context of attempts to migrate Wiktionary sense information to Wikidata.
If Wikidata is ever to be comprehensive, it might well need an "instance of" entry for instances of Q42884.
- -sche (discuss) 20:56, 15 June 2017 (UTC)
I hear the "the Abenaki" in exactly the same way as the "the Germans". The tangible evidence is that "the Abenaki" still has plural agreement. --WikiTiki89 21:19, 15 June 2017 (UTC)

See also links

It seems that it is common practice to restrict links in the "See also" section to the same language as the entry. However this is not explicitly mentioned in WT:EL. Could it be added to better reflect reality/practice? – Jberkel (talk) 21:57, 15 June 2017 (UTC)

I have used it recently to link English words to Scots words, like Cumbernauld and Cummernaud, in lieu of a translations section. DonnanZ (talk) 10:07, 16 June 2017 (UTC)
Why not just create a "Translations" subsection in such cases? — SMUconlaw (talk) 13:59, 16 June 2017 (UTC)
Because nobody has opened one for other languages, and Scots should be regarded as a dialect rather than a language. To quote Oxford: "[mass noun] The form of English used in Scotland." DonnanZ (talk) 16:01, 16 June 2017 (UTC)
There's a difference between Scottish English and Scots, though. Also, they treat Middle English as English, too, I think. (Do they treat Yola as English?) - -sche (discuss) 16:06, 16 June 2017 (UTC)
I don't think links to foreign-language entries should be entirely banned; they are sometimes useful, for example if there is no English word for something, but two or a few languages have a word for it that could be interlinked. (There are those who prefer to create SOP translations targets in such cases, but that gets harder to justify the fewer and more obscure the languages with words for the thing are.) Some of the information could perhaps be shoehorned into etymology sections (even for unrelated words, one could say "compare/contrast how the X language term for the same thing, Y, is formed"), but that seems suboptimal. - -sche (discuss) 16:06, 16 June 2017 (UTC)

Categories English_N-syllable_words

This may not be the most important issue in the World, but currently open compounds seem to be listed in the categories English_N-syllable_words, see for example this link[1]. I'm not convinced that open compounds such as extravehicular activity or venire facias de novo should belong to these categories. At least in sensu stricto they are not words but dictionary entries. --Hekaheka (talk) 08:48, 16 June 2017 (UTC)

We call phrases words, e.g. "word of the day". Equinox 11:07, 16 June 2017 (UTC)
We can't change. Think of the implications for our slogan. DCDuring (talk) 11:17, 16 June 2017 (UTC)
We may or may not be able to change. I'm pointing out that this category is not the only instance of the issue. Equinox 11:30, 16 June 2017 (UTC)
Personally, I'd have no objection if we changed words to entries or terms in all such categories. In fact, I've suggested this before at some other forum. — SMUconlaw (talk) 13:58, 16 June 2017 (UTC)
I see no point in a category that lists N-syllable phrases. I could understand words. But then, on the other hand, I could just ignore the part that I find useless. --Hekaheka (talk) 14:21, 16 June 2017 (UTC)

Consistent headings of numeral versus number

has numeral. has numeral. has numeral. has numeral. 사#Numeral has numeral. 오#Numeral has numeral. 육#Numeral has numeral. 칠#Numeral has numeral.

has number, not numeral. 팔#Number has number. 구#Number has number.

구#Numeral has numeral, not number

Is it possible to provide a consistent heading, so that when I'm copying the hanja for each sino-Korean number, I can just change the hangul in the URL and get directly to the section I want? Alternatively, making both #Number and #Numeral link to the same place would work.

I hope I haven't stepped into an awful WikiWar with decades of fighting. AGrimm (talk) 01:29, 17 June 2017 (UTC)

These are both valid headers. WT:ELE suggests that Numeral is a part of speech while Number is a symbol. I don't know quite what that is supposed to mean and I am not familiar with Korean. Anyone else? Equinox 04:20, 17 June 2017 (UTC)

Inflectional suffixes: -bam, -bas, etc.

Do we want that? And right now, Category:Latin suffixes is a mess, because it gathers inflectional and derivational suffixes indiscriminately. --Barytonesis (talk) 17:20, 17 June 2017 (UTC)

I see no reason not to add inflectional suffixes like that. As for the category you mention, it occurs to me we should have a category Category:Latin derivational suffixes to match Category:Latin inflectional suffixes. — Eru·tuon 17:43, 17 June 2017 (UTC)
I've created the category derivational suffixes, and moved noun-forming suffixes, verb-forming suffixes, adjective-forming suffixes, and adverb-forming suffixes to be inside that category, rather than in the main category suffixes. This can be undone if editors disagree. — Eru·tuon 17:49, 17 June 2017 (UTC)
@Erutuon: Thanks. The categories you mentioned still appear in the main one, though. --Barytonesis (talk) 19:33, 17 June 2017 (UTC)
@Barytonesis: Yeah, it'll eventually update. Once a module change has been made, the software takes a while to regenerate the pages with the new code, and to change the members of categories. — Eru·tuon 19:35, 17 June 2017 (UTC)

2/3 majority

At some point, I'd like to mention explicitly in WT:Voting policy that a vote passes if it reaches a 2/3 majority. Apparently that's the true consensus and everybody knows it (although it's challenged and discussed sometimes), but it doesn't seem to be written as policy yet. --Daniel Carrero (talk) 03:54, 18 June 2017 (UTC)

Strange that it isn't mentioned. Are there any types of votes in which a bare majority would be enough? — Eru·tuon 04:13, 18 June 2017 (UTC)
I don't think all vote types should have the same criteria. For instance, a 2/3s majority is not sufficient for CheckUser (due to overriding policy) and I don't think it is sufficient for other user rights. For policies etc. I think that it is a good measure, for user rights I think it is low. Likewise, for removal of user rights I think 2/3s is high. Here is a previous discussion on the matter (thanks Dan). I fully support documenting what we consider a pass or fail (perhaps with a third range for no-consensus). - [The]DaveRoss 12:36, 18 June 2017 (UTC)
We could add "See meta:CheckUser policy#Appointing local Checkusers for the policy about appointing new checkusers." as part of the text in WT:Voting policy. --Daniel Carrero (talk) 19:40, 18 June 2017 (UTC)
I'm fine with having 2/3 majority for both addition and removal of user rights like most kinds of votes. But I think I can see the case for doing something different. Are there any specific suggestions? Maybe 3/4 majority for addition of rights and bare majority for removal of rights? --Daniel Carrero (talk) 02:12, 19 June 2017 (UTC)
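
For concreteness, the thresholds floated in this thread can be written as a small check. This is a sketch only: the function name is made up, abstentions are assumed not to count, and "reaches a 2/3 majority" is read as "at least 2/3 of support plus oppose", none of which is settled policy.

```python
from fractions import Fraction


def vote_passes(support: int, oppose: int,
                threshold: Fraction = Fraction(2, 3),
                strict: bool = False) -> bool:
    """Return True if the share of support votes meets the threshold.

    Abstentions are ignored (assumption). With strict=True the share must
    exceed the threshold, which is what a "bare majority" (more than 1/2)
    would need so that a tie does not pass.
    """
    total = support + oppose
    if total == 0:
        return False  # no votes cast: treat as not passed
    share = Fraction(support, total)  # exact arithmetic, no float rounding
    return share > threshold if strict else share >= threshold
```

Under these assumptions a 10–5 vote passes at 2/3 but fails at the 3/4 suggested for granting rights, while a 6–5 vote passes only under a bare-majority rule (`threshold=Fraction(1, 2), strict=True`).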

Latin demonym capitalization

Some Latin demonyms (like Germānus, Carthāginiensis or Celta) are capitalized, while others (like romānus or graecus) are not.
What is the correct way to handle these demonyms? All capitalized? None capitalized? One of the two, with the other as an alternative spelling form?
The dictionary I use capitalizes them all, but I'm not sure how these things are done around here.
I would really appreciate any information about this. –– GianWiki (talk) 22:57, 18 June 2017 (UTC)

I'd say we should capitalize them all. We shouldn't have Latin entries for romānus and graecus unless they also have non-demonym meanings. —Aɴɢʀ (talk) 13:48, 20 June 2017 (UTC)
@Angr I wonder, what is your motivation for this? Even Italian, Spanish or Romanian demonyms are lower case. If I am not mistaken, you wanted some Sanskrit terms capitalised as well? --Anatoli T. (обсудить/вклад) 01:50, 21 June 2017 (UTC)
@Atitarev: My motivation is that that's the way I'm used to seeing it done in modern editions of Latin texts. I've always seen the first sentence of De Bello Gallico written "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur." As for Sanskrit, I don't recall ever endorsing capitalized terms at all; we write Sanskrit in Devanagari anyway. —Aɴɢʀ (talk) 10:51, 21 June 2017 (UTC)
I've noticed that my dictionary capitalizes parts of speech derived from demonyms as well (e.g. Graecānicus, Graecē, Graeculus, from Graecus). Also, I'll try summoning @JohnC5 and @Metaknowledge, as active users on Latin, to get more insight on the matter. – GianWiki (talk) 01:22, 21 June 2017 (UTC)
@GianWiki: There's been some disagreement on this topic in the past, particularly in reference to days of the week and months. The Romans only had capitals, so it is an editorial choice on our part (as with the alternations of u ~ v, i ~ j, and inclusion of macra and diaereses). I favor capitalization though I concede we maybe should leave soft redirects from the lowercase entries. —JohnC5 01:50, 21 June 2017 (UTC)
@JohnC5: Capitalization with soft redirects from the lowercase entries (unless they have non-demonym meanings: germānus (pertaining to brothers or sisters) vs. Germānus (Germanic)) sounds like the best option to me – GianWiki (talk) 13:43, 22 June 2017 (UTC)
@GianWiki: Take it away, maestro. —JohnC5 14:34, 22 June 2017 (UTC)
I wonder, do Latin dictionaries written in languages that do not capitalize adjectives, like German, French, and Spanish, follow the rules of the language and not capitalize Latin adjectives? — Eru·tuon 01:28, 21 June 2017 (UTC)
Capitalisation of demonyms is a borrowing from modern English. It has infected even transliterations of languages, which do not have the distinction between lower and upper case letters. Oppose. --Anatoli T. (обсудить/вклад) 01:30, 21 June 2017 (UTC)
@Erutuon: du Cange (published in France) gives entries in all caps, but capitalizes Anglicos in the example sentence. —Aɴɢʀ (talk) 13:55, 22 June 2017 (UTC)
@Erutuon: My la-es dictionary from school days capitalizes all demonyms. The Gaffiot la-fr also does, with different entries for Germānus-germānus. --Vriullop (talk) 08:06, 26 June 2017 (UTC)
I have not capitalised any Middle Dutch words because they weren't in the original manuscript. I think the same practice should be applied to Latin and other old languages. If we go as far as to include Gothic in its original script, then we should also write lowercase Old English. —CodeCat 14:04, 22 June 2017 (UTC)
I agree, although the average reader is most likely to come across Latin in a modernized version, meaning that they are most likely to see and look up the capitalized form. Andrew Sheedy (talk) 23:59, 25 June 2017 (UTC)
It wasn't the case when I had Latin classes in the former USSR because Russian has different capitalisation rules. I wouldn't be surprised if Latin citations in an Italian or Spanish text followed the capitalisation of these languages.--Anatoli T. (обсудить/вклад) 08:21, 26 June 2017 (UTC)
The Latin works I've come across in citing the various Latin country names that were RFVed have capitalized demonyms and place names, so I support doing that just like we normalize consonantal u to v. - -sche (discuss) 23:10, 25 June 2017 (UTC)
Were these citations in English or German works? It's OK to normalize consonantal u to v but the capitalisation normalisation is going too far, IMO. Latin is apparently closer to the modern Romance languages than to English, and the majority, including Italian, don't capitalise demonyms. It's not normalisation but anglicisation. --Anatoli T. (обсудить/вклад) 03:15, 26 June 2017 (UTC)
@Atitarev: As Vriullop and I commented above, demonyms are capitalized in Latin also in works published in France and Spain. It's not just conforming to English- and German-speakers' expectations. —Aɴɢʀ (talk) 10:22, 26 June 2017 (UTC)

Proper nouns in derived terms lists

I'm wondering if proper nouns really belong in derived terms lists, for example. @Donnanz --Victar (talk) 12:56, 20 June 2017 (UTC)

I think it's fine, unless you think there are too many possible derived terms and it would make the page too big. DTLHS (talk) 13:50, 20 June 2017 (UTC)
Right, that's my concern, with words like park, wood, bury, etc. --Victar (talk) 14:03, 20 June 2017 (UTC)
If there's an overabundance of place names in derived terms, they can be placed in a separate section. I have already done that somewhere. DonnanZ (talk) 14:31, 20 June 2017 (UTC)
One day, I split disease#Derived terms into the current three collapsible boxes. (obviously it's a long list of diseases, not of proper nouns; this is just an idea about how to handle long lists of derived terms) --Daniel Carrero (talk) 14:36, 20 June 2017 (UTC)
That works, but it seems pretty crazy to be manually adding all those instead of sorting them into a category, like Category:English terms containing disease, using a bot.--Victar (talk) 15:10, 20 June 2017 (UTC)
I support creating categories like Category:English terms derived from "apple" (see apple#Derived terms), Category:English terms derived from "disease", etc. --Daniel Carrero (talk) 15:17, 20 June 2017 (UTC)
I feel like there is a difference between Category:English terms derived from "apple" and Category:English terms compounded with "apple". --Victar (talk) 15:24, 20 June 2017 (UTC)
Is "compounded" a subset of "derived"? Maybe it would be a good idea to keep whichever "apple" category is the most inclusive one. --Daniel Carrero (talk) 15:28, 20 June 2017 (UTC)
I suppose. I'm trying to put to words the distinction between parklet and Park County. --Victar (talk) 15:39, 20 June 2017 (UTC)
Why do we need a category for something that will rarely be used and is most likely to be used from the entry itself, eg for apple. The following special search would provide what is needed https://en.wiktionary.org/w/index.php?search=intitle%3A"apple"&title=Special:Search DCDuring (talk) 15:57, 20 June 2017 (UTC)
I'm thinking more about the page subheaders. --Victar (talk) 16:58, 20 June 2017 (UTC)
If we are talking about open compounds, a template containing the above code would generate a list on demand. Compounds spelled solid would require something different. Note that we already have 244 entries that include the word apple with the headword spelled open, including hyphenated forms. DCDuring (talk) 02:31, 21 June 2017 (UTC)
@Daniel Carrero: Is there precedent for using quotation marks in category names? --Victar (talk) 21:25, 20 June 2017 (UTC)
No, I think there's no precedent for using quotation marks in category names. We don't have to use them. But I think in some cases it might be clearer that we are talking about the word itself. For example: Category:English terms derived from "food" seems clearer than Category:English terms derived from food. --Daniel Carrero (talk) 23:37, 20 June 2017 (UTC)
Actually, it looks like we already generate similar categories, Category:Terms derived from the PIE word *ǵónu. --Victar (talk) 00:35, 21 June 2017 (UTC)
@Daniel Carrero: Given that, what about moving forward with Category:English terms derived from the word apple? --Victar (talk) 01:16, 21 June 2017 (UTC)
I think this one that I had said before is better: Category:English terms derived from "food". It's shorter than the alternative, even though both names are fairly long. "the word" might not even always be true if we decide to have a few compound term categories like Category:English terms derived from "pick up" (pick up#Derivations). --Daniel Carrero (talk) 02:25, 21 June 2017 (UTC)
I agree that they should be listed in theory; if we make an exception, it is only for the sake of page size and navigation. Equinox 14:41, 20 June 2017 (UTC)
I think proper nouns should be included in Derived terms lists, but perhaps the list could be split somehow into "compounds derived from x" and something else. — Eru·tuon 00:31, 21 June 2017 (UTC)
Personally, I think I'd prefer a solution similar to this:
====Derived terms====
[[:Category:Terms compounded with the word park|Terms compounded with the word park]]
* {{l|en|parkade}}
* {{l|en|parklet}}
--Victar (talk) 02:53, 21 June 2017 (UTC)
If the desire is to exclude or to separate placenames or proper nouns, let's just spell that out in the category titles or 'headers' of the separate tables: "terms derived from 'town'" vs "placenames derived from 'town'". To distinguish "compounds" from "derived terms" is to do something else entirely: "Sandtown" (and lots of other "-town"s) is as much a compound with "town" as "townhouse", so both belong in the same category if the categories are based on compounding vs other derivation. - -sche (discuss) 04:22, 21 June 2017 (UTC)
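
The on-demand search link DCDuring posts above for "apple" can be generated for any term. The sketch below builds the same Special:Search URL with Python's standard library; the helper name is hypothetical, and it only reproduces the `intitle:"…"` query shown in the thread, which (as noted there) finds open and hyphenated compounds rather than solid ones.

```python
from urllib.parse import urlencode

# Base script path taken from the link quoted in the discussion.
BASE_URL = "https://en.wiktionary.org/w/index.php"


def intitle_search_url(term: str) -> str:
    """Build a Special:Search URL for page titles containing `term` as a word."""
    params = {
        "search": f'intitle:"{term}"',  # same query as the link in the thread
        "title": "Special:Search",
    }
    # urlencode percent-encodes the colon and quotation marks for us.
    return f"{BASE_URL}?{urlencode(params)}"
```

A template wrapping such a link, as suggested, would give each entry its list of open compounds without anyone maintaining it by hand.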

Nahuatl vs Classical Nahuatl?

So what exactly is the/our difference? Presumably, Nahuatl comprises modern lects, but apart from that?
Or for my own narrow purposes: are words like coyote, chocolate, avocado, tomato, guacamole and their equivalents in many seemingly predominantly European languages from Nahuatl or Classical Nahuatl? Does the perceived inconsistency (see Category:Terms derived from Nahuatl, Category:Terms derived from Classical Nahuatl) reflect actual difference, or simply begrebsforvirring?__Gamren (talk) 20:11, 20 June 2017 (UTC)

Oh, I forgot to link to [2], Wiktionary:Beer_parlour/2008/September for prior discussions.__Gamren (talk) 20:21, 20 June 2017 (UTC)
As the discussion you link to suggests, "nah" seems to be a relict of Wiktionary's early history when we often copied the ISO's codes for both macrolanguages and their constituent varieties, both unintentionally (as a result of importing codes en masse) and apparently sometimes intentionally as a result of a lax attitude towards such inconsistencies. But it obviously doesn't make sense to have both the macrolanguage nah and the subvarieties nci, nch, nhn, etc. There is considerable debate over how mutually intelligible the varieties are, with some sources saying there is little to no mutual intelligibility and speakers often cannot understand another variety at all, and other sources saying they are "largely mutual intelligible". - -sche (discuss) 02:42, 21 June 2017 (UTC)
An example of the pervasive confusion is ca̱la̱', which treats Mecayapan as nah, while other entries treat it as a separate lect with its own header. - -sche (discuss) 16:40, 27 June 2017 (UTC)

Unicode 10.0.0 released

Unicode 10.0.0 was released on 20 June 2017. You can now make entries for the new characters (if you can). [3] --Octahedron80 (talk) 05:27, 21 June 2017 (UTC)

🤟 (which is supposed to be the "I love you" hand sign). —Justin (koavf)TCM 06:46, 21 June 2017 (UTC)

There is also the list of variation sequences (which is not yet supported in most fonts). This includes CJK compatibility ideograph mappings. --Octahedron80 (talk) 07:15, 21 June 2017 (UTC)

Thanks for the notice. I see that characters were added to, e.g., Zanabazar Square. Do we need to update the range of characters covered by "Zanb" as listed in Module:scripts/data? - -sche (discuss) 14:25, 21 June 2017 (UTC)
I already pre-updated the module for new scripts. --Octahedron80 (talk) 10:16, 22 June 2017 (UTC)

About attesting symbols

I'd like to discuss symbols like these, which often have entries:

I don't know if we want them here.

Let's assume for a second that these symbols are unwanted. In that case, we would probably be able to delete all or most of the entries listed in these appendices. I think we wouldn't even need RFD/RFV to do it. They are unlikely to be found in running text by default. Except the emoticons, but they are mostly found on the internet (which we mostly don't accept for attestation purposes, as you know) as opposed to books. (That said, I added two quotations from a book that uses the domino tile "🁊" in running text.)

On the other hand, let's assume for a second that these symbols are wanted. I would be fine with attesting symbols as symbols -- that is, in drawings, pictures, comics and other contexts, as opposed to only in running text.

  • Attest map symbols by finding them in durably-archived maps.
  • Attest technical symbols by finding them in durably-archived technical books or manuals, etc. (I'm thinking = pause, = power symbol, 🔗 = hyperlink, 🔇 = mute, = refueling needed...)
  • Attest the heart (♥) as a symbol of love by finding it in this context in a durably-archived drawing or something.
  • I got a quotation for 💡 meaning "idea" from a textbook, but it could easily be found in comics.
  • Attest computer symbols (like 💾 for "save") by finding them in durably-archived computer screenshots, or even consider their actual use in durably-archived software. (That said, I got a quotation for ⇆ meaning "Tab symbol" in running text in a 1989 video game -- "⇆Tab to Move Down to Notes Button". In 📋, I mentioned that Windows 3.1 uses it in lieu of a quotation, but it could equally be from a Windows 3.1 screenshot in a book.)
  • Accept uses of symbols in durably-archived movies, comics and video games. (I got a few citations for 🛇 meaning "prohibition symbol" from cartoons.)

Let me know if there are any ideas/comments. --Daniel Carrero (talk) 07:18, 21 June 2017 (UTC)

"Attesting symbols as symbols" e.g. in comics would make us a picture book, not a dictionary. Equinox 14:32, 21 June 2017 (UTC)
The last part is true, this is not the job of a dictionary [sense 1: "A reference work with a list of words ..."].
Maybe sense 1 is the only one that matters to us -- we would be a normal dictionary and list words only, not symbols. This is one idea that makes sense, and is the first one I started to discuss in my message above.
Maybe this could be the job of a dictionary of another sense [sense 2: "By extension, any work that has a list of material organized alphabetically..." although the "alphabetically" part probably can't apply to all languages and contexts.]
Google Books has some books called "Dictionary of Symbols" which could be called "not dictionaries" too.
Consider the entry . Do you think we should delete the senses "hearts (on playing cards)", "love", "(video games) a hit point", "(video games) healing" and (Japanese) "An emoticon indicating a smooth and pleasant voice."?
If we kept them, that would not make us a "picture book" as defined in the linked entry. It would simply make us a dictionary that lists meanings for "♥". Which is unusual, I know. --Daniel Carrero (talk) 18:26, 21 June 2017 (UTC)
The "love" sense is important, because it appears as a verb in English syntax ("I ♥ cookies"). The others I am not sure about. The playing-card sense only really says "pictures of hearts are used on playing cards"; so what? A picture of the Grim Reaper/Death appears on a tarot card but that's also a picture. Likewise, the video-game sense isn't a sense of a word: it's just symbology: the heart as an image is generally used to represent life or health. Equinox 19:10, 21 June 2017 (UTC)
As I'm sure you know, the entry has two separate senses for "love" in different languages: the English verb as in "I ♥ cookies", and a general Translingual "love" that is not necessarily a verb. Do you think we should delete the Translingual one? Likewise, has four Translingual senses: "death", "poison", "pirates" and "toxic". Do you think we should delete them all?
I wish to defend one specific sense we mentioned: I think we should keep ♥ "hearts (on playing cards)", because poker books regularly have sentences like this, using the playing card symbols in running text: "In that situation, you should raise with AA♠." --Daniel Carrero (talk) 19:29, 21 June 2017 (UTC)
Did you forget about the idiomaticity policy? --Octahedron80 (talk) 03:17, 22 June 2017 (UTC)
I also think it's a very, very bad idea for us to take the approach that "anything in Unicode is automatically a dictionary-worthy symbol". Unicode is full of all kinds of rubbish these days. Equinox 19:11, 21 June 2017 (UTC)
Could you mention maybe one or two examples of Unicode rubbish? --Daniel Carrero (talk) 19:29, 21 June 2017 (UTC)
🗑 - [The]DaveRoss 20:00, 21 June 2017 (UTC)
Compatibility relics. Skin colour modifiers for smileys. I know there's lots more crap but I don't want to spend my time hunting it down. Equinox 20:04, 21 June 2017 (UTC)
Some votes (listed at Wiktionary:Character variations) supported redirecting a few compatibility characters to "actual" entries when possible. For example, redirects to km (technically this entry was not part of the votes, FWIW).
Skin color modifiers are control characters. They don't have any shape by themselves. I support not having any entries for skin color modifiers. --Daniel Carrero (talk) 20:17, 21 June 2017 (UTC)

Sitelinks are enabled on Wikidata for Wiktionary pages (outside main namespace)


Short version: Since yesterday, we are able to store the interwiki links of all the Wiktionaries namespaces (except main, citations, user and talk) in Wikidata. This will not break your Wiktionary, but if you want to use all the features, you will have to remove your sitelinks from wikitext and connect your pages to Wikidata.

Important: even if it is technically possible, you should not link Wiktionary main namespace pages from Wikidata. The interwiki links for them are already provided by Cognate.

Long version available and translatable here.

If you encounter any problem or find a bug, feel free to ping me.

Thanks, Lea Lacroix (WMDE) (talk) 08:24, 21 June 2017 (UTC)

I tried it on Category:English language and it works well. d:Q7923975. Linked pages must correspond to the same subject, as on Wikipedia etc. Otherwise, a new Q item must be created, for example for parts of speech. However, some Wiktionaries do not allow interwiki links to be cleaned up (removed). This must be discussed locally. --Octahedron80 (talk) 09:38, 21 June 2017 (UTC)
This can also include IW from closed wikis but it must be input manually. --Octahedron80 (talk) 01:48, 22 June 2017 (UTC)
@Lea Lacroix (WMDE) May I ask why d:Q1860 was not used for this instead? —CodeCat 12:33, 21 June 2017 (UTC)
Hello, d:Q1860 is used to describe the concept of English language in general. It has, for example, statements like the number of speakers. This is not exactly what we want to describe with d:Q7923975, dedicated to the categories on Wikimedia projects, that's why we use a different item. Lea Lacroix (WMDE) (talk) 13:01, 21 June 2017 (UTC)
My short answer: it's not the same type. :) --Octahedron80 (talk) 15:02, 21 June 2017 (UTC)

@Lea Lacroix (WMDE) There is some problem with d:Q4167836. Wiktionary does not appear and I can not add any page like Wiktionary:Categorization. --Vriullop (talk) 15:12, 21 June 2017 (UTC)

Another problem: Category:English language links to the Dutch Wiktionary category nl:Categorie:Woorden in het Engels ("Words in English"). This category corresponds to our category Category:English lemmas instead, which contains all words in English like the Dutch category. The Dutch Wiktionary apparently has no equivalent to our Category:English language. The Afrikaans Wiktionary has the same problem. —CodeCat 16:38, 21 June 2017 (UTC)

Actually, nl:Categorie:Woorden in het Engels doesn't correspond to any of our categories. It seems to be a combination of Category:English lemmas and CAT:English non-lemma forms, as it includes things like abashes and abashing that we consider non-lemma forms. — Eru·tuon 16:55, 21 June 2017 (UTC)
I've also noticed this, other non-English wikis often have similar systems. Trying to interlanguage-link between Wiktionaries is complicated and at this point probably pointless due to the lack of a uniform system for the categorization of words across all Wiktionaries. — Kleio (t · c) 16:59, 21 June 2017 (UTC)
So how do we fix this exactly? Can we just remove the Dutch category from Wikidata manually? What about all the other languages? —CodeCat 17:08, 21 June 2017 (UTC)
Oddly, it:Categoria:Parole in inglese contains no words, so it's a better match for our category even though it's named "words in English". ie:Categorie:Parol in anglés by contrast contains nothing but words, no categories. co:Categoria:Parolle in lingua inglese meanwhile contains words and categories, like the Dutch one, but appears restricted to lemmas. —CodeCat 17:13, 21 June 2017 (UTC)
Yes, nl:Categorie:Woorden in het Engels and Category:English lemmas are different in nature (in what they cover) and so they should probably not be linked; there should apparently be separate Wikidata items for "category of all words in English" and "category of all lemmas in English". In general, interwiki links between categories can be handled (not on the technical level of Cognate vs Wikidata, but on the level of "do interwiki links exist?") just like links between main-namespace pages. If nl.Wikt has an entry foobar and we don't, they can't interwiki-link that entry to our wiki, and if we have an entry foo and they don't, we can't link our entry to them, but when both en.Wikt and nl.Wikt have entries for bar, they can be linked. Likewise, there's no page on our wiki for nl:Categorie:Woorden in het Engels to be linked to (unless we regard it as a fit with "Category:English language", which, actually, it seems to be, if it's their highest-level category for English), but other categories can still be linked. - -sche (discuss) 17:19, 21 June 2017 (UTC)
Part of the confusion seems to be the conflation of several different roles for the categories:
  1. Top-level category for a language.
  2. Contains lemmas.
  3. Contains non-lemmas.
  4. Contains categories for parts of speech.
The Dutch category fulfills all four roles, while we have three separate categories for roles 1-3 while role 4 is combined with role 2. The Italian category fulfills role 1, so it matches our role 1 category, but it also fulfills role 4 which ours does not. The Interlingue category fulfills only role 2, but not 4 like ours does. The Corsican category fulfills roles 1, 2 and 4. —CodeCat 17:41, 21 June 2017 (UTC)
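The role breakdown above can be written down as a small lookup, purely as an illustration. This is a minimal sketch: the role sets are copied from the comment above, the `en:`/`nl:`/`it:`/`ie:`/`co:` prefixes are just labels, and the helper names `exact_matches` and `overlapping` are hypothetical. Nothing here queries Wikidata or any wiki.

```python
# Roles as numbered in the list above.
TOP_LEVEL, LEMMAS, NON_LEMMAS, POS_CATEGORIES = 1, 2, 3, 4

# Role sets taken from the discussion; the keys are labels only.
CATEGORY_ROLES = {
    "en:Category:English language": {TOP_LEVEL},
    "en:Category:English lemmas": {LEMMAS, POS_CATEGORIES},
    "en:Category:English non-lemma forms": {NON_LEMMAS},
    "nl:Categorie:Woorden in het Engels": {TOP_LEVEL, LEMMAS, NON_LEMMAS, POS_CATEGORIES},
    "it:Categoria:Parole in inglese": {TOP_LEVEL, POS_CATEGORIES},
    "ie:Categorie:Parol in anglés": {LEMMAS},
    "co:Categoria:Parolle in lingua inglese": {TOP_LEVEL, LEMMAS, POS_CATEGORIES},
}


def exact_matches(foreign, local_prefix="en:"):
    """Local categories whose role set is identical to the foreign one."""
    wanted = CATEGORY_ROLES[foreign]
    return sorted(
        name for name, roles in CATEGORY_ROLES.items()
        if name.startswith(local_prefix) and roles == wanted
    )


def overlapping(foreign, local_prefix="en:"):
    """Local categories that share at least one role with the foreign one."""
    wanted = CATEGORY_ROLES[foreign]
    return sorted(
        name for name, roles in CATEGORY_ROLES.items()
        if name.startswith(local_prefix) and roles & wanted
    )
```

Under these role sets, none of the four foreign categories has an exact counterpart here, while the Dutch one overlaps with all three of ours, which is the mismatch being described.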
I can confirm that nl:Categorie:Woorden in het Engels is the top-level category for English, that's why it's linked with Category:English language. The highest-level category for a language is always linked with the highest-level language category in other Wiktionaries, even if the subcategories/pages in them differ. The problem of not exactly matching category interwikis was always there; now, the only difference is that the interwikis will be hosted at Wikidata, instead of locally. Pinging User:Malafaya, who has been maintaining category interwikis with his bot (MalafayaBot) and solving category interwiki conflicts for years, in case he wants to share his experience / advice. -- Curious (talk) 22:07, 21 June 2017 (UTC)
(IMO the fundamental problem is that each Wiktionary has its own structure... —suzukaze (tc) 08:02, 22 June 2017 (UTC))

Can Wikidata store interwikis for Unsupported titles/Colon? --Daniel Carrero (talk) 02:37, 22 June 2017 (UTC)

@Vriullop Some items related to Wikimedia projects are protected, because they are used a lot and attract vandalism. For these, you can add a message on the talk page, with the links you want to add, and an admin will take care of it.
@Daniel Carrero Thanks for noticing; indeed, this is not currently supported by either Cognate or Wikidata links. We're going to investigate the best way to provide automatic links here.
Thanks for your feedback, Lea Lacroix (WMDE) (talk) 07:46, 22 June 2017 (UTC)

@Lea Lacroix (WMDE) Most projects have the Main Page on main namespace, not the case of en.wikt, so pages in d:Q5296 may be linked both by Wikidata and Cognate. It seems that Cognate takes precedence. Page gn:Ape is linked to Ape, not to Wiktionary:Main Page. In the reverse, both English pages are linked to gn:Ape. --Vriullop (talk) 09:04, 24 June 2017 (UTC)

A possible workaround: pages defined at MediaWiki:Mainpage in each project should be ignored by Cognate. --Vriullop (talk) 09:21, 24 June 2017 (UTC)

Semantic loans versus calques

@CodeCat proposed on my talk page that semantic loan categories, such as Category:Russian semantic loans from English, be merged with calques, such as Category:Russian terms calqued from English.

Theoretically the difference is that semantic loans are single morphemes, like Russian мышь (myšʹ, mouse), while calques, such as Latin accusativus, have more than one morpheme, but this is probably not consistently enforced. Recently I was recategorizing Greek γύρισμα (gýrisma) as a calque, as it consists of two morphemes.

I think it would be much simpler to merge the two categories. I don't see any practical value to the distinction.

I'm going to ping @Dixtosa and his alter ego @Giorgi Eufshi, because he created the template {{semantic loan}}. — Eru·tuon 03:54, 22 June 2017 (UTC)

No, the difference is that in the case of a semantic loan the term already exists in the borrowing language and already has some meaning (hence the name 'semantic'). And I think this difference is huge. Yes, I agree it is very similar to calques and might as well be a special case, but having calques and semantic loans in different places lets us see which languages enriched which language in terms of new words. --Dixtosa (talk) 04:40, 22 June 2017 (UTC)
Ahh, thanks for the explanation. I guess I had misunderstood. That is a more fundamental difference: whether a new word was created or not. — Eru·tuon 04:46, 22 June 2017 (UTC)

Category names in which "words" should be replaced with "terms"

While editing -oecious and adding the suffix -ous to the etymology, I found that the category name Category:English words suffixed with -ous is incorrect, as a suffix is not a word. It should be Category:English terms suffixed with -ous. Similarly, Category:English words by suffix should be Category:English terms by suffix.

(An alternative would be to add a |pos=suffix parameter, which would put -oecious in Category:Suffixes suffixed with -ous instead. That may or may not be simpler. The category name does sound a little funny.)

Moving the entries would simply require changing some code in Module:compound. There would be a lot of deletion, moving, and creation of categories. Moving, because some of the categories (see, for instance, Category:English words suffixed with -ious) have text besides the category boilerplate template. Some of that could be done with bots.

Another category in the entry -oecious suffering from the same problem is the syllable count category Category:English 2-syllable words. That should be English 2-syllable terms, as has been discussed before. — Eru·tuon 06:29, 22 June 2017 (UTC)
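For anyone planning the bot part, the renaming proposed above amounts to a simple string mapping. The following is a minimal sketch, assuming only the three category families mentioned in this post; the function name `proposed_name` and the patterns are hypothetical, and this is not the code in Module:compound.

```python
import re

# Hypothetical patterns covering only the category families named above.
_RENAMES = [
    (re.compile(r"^(Category:.+?) words suffixed with (.+)$"), r"\1 terms suffixed with \2"),
    (re.compile(r"^(Category:.+?) words by suffix$"), r"\1 terms by suffix"),
    (re.compile(r"^(Category:.+?) (\d+)-syllable words$"), r"\1 \2-syllable terms"),
]


def proposed_name(category):
    """Return the proposed 'terms' name, or the input unchanged if no pattern applies."""
    for pattern, replacement in _RENAMES:
        new_name, count = pattern.subn(replacement, category)
        if count:
            return new_name
    return category
```

A bot could use such a mapping to pair each old category with its target before moving any text that sits beside the boilerplate template; categories that match no pattern are left alone.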

I support term everywhere because it has wider meaning than word; it could be suitable for long proper nouns (like countries), proverbs and phrasebook sentences. At the moment, I just use current namings for modules. --Octahedron80 (talk) 10:22, 22 June 2017 (UTC)
I don't think terms made of multiple words should be categorised by number of syllables. —CodeCat 13:09, 22 June 2017 (UTC)
Well, two things: there should probably be a subcategory for words by syllable count. And what about bound morphemes (prefixes, suffixes, whatever else)? Should they also not be categorized? — Eru·tuon 19:50, 22 June 2017 (UTC)
It seems to me the best solution is to have a category for "terms by syllable count" and within it subcategories for "words by syllable count" and "bound morphemes by syllable count". That way everyone's preferences are accommodated. — Eru·tuon

Category:Anatomy vs. Category:Body

I think these categories are currently not well delineated. I suppose the first one is intended for terms which are only used in a medical context, and not by laymen? --Barytonesis (talk) 14:08, 22 June 2017 (UTC)

Also, would it be possible to make Category:Teeth a subcat of Category:Face? --Barytonesis (talk) 14:28, 22 June 2017 (UTC)
Yes; in theory at least, CAT:Anatomy is for technical terms and CAT:Body is for everyday words pertaining to the body, though in practice the distinction is hardly observed. —Aɴɢʀ (talk) 19:50, 22 June 2017 (UTC)
We really need a set category for stuff in the body, but I don't know what it could be named. —CodeCat 19:18, 24 June 2017 (UTC)

Search results from Wiktionary now part of Wikipedia's search system

Just to let you know, as announced via mailing list service, English Wikipedia is now receiving search results of this project, Wiktionary, intended to direct Wikipedia users to this project. Currently, an option to suppress the search results of this project from the English Wikipedia search system is proposed at Village pump's "proposal" subpage, where I invite you to comment. --George Ho (talk) 19:16, 22 June 2017 (UTC)

What about in the opposite direction? DTLHS (talk) 19:23, 22 June 2017 (UTC)
I'd love such a feature here, especially if it enabled me to select which projects to include. DCDuring (talk) 19:37, 22 June 2017 (UTC)
It would facilitate article creation, whenever it's necessary to research the meanings of terms. Andrew Sheedy (talk) 20:30, 22 June 2017 (UTC)
No, Metaknowledge, not Phabricator yet. One of the WMF staff says that they're planning on spreading the update across other projects, but that would take some time. You can ping that guy if you want. --George Ho (talk) 17:56, 27 June 2017 (UTC)
Thanks. @CKoerner (WMF), we're very interested in getting that feature here. Will it be rolled out to the Wiktionaries or will we have to request it? —Μετάknowledgediscuss/deeds 18:11, 27 June 2017 (UTC)
@Metaknowledge, the team is looking at that very question. We'd like to include more Wikimedia projects. Your comments and feedback are welcome. :) CKoerner (WMF) (talk) 18:36, 27 June 2017 (UTC)

Remaining pages with interwikis

User:DTLHS/cleanup/pages with interwikis. Pages such as uexo should be looked at carefully. DTLHS (talk) 21:30, 23 June 2017 (UTC)

A curious case: https://en.wiktionary.org/wiki/main%20Page?redirect=no. It is a protected redirect not linked by Cognate. The target page Wiktionary:Main Page is linked by Wikidata. --Vriullop (talk) 08:43, 24 June 2017 (UTC)

Taxonomic entry translation resources

I am beginning to accumulate external databases that have vernacular names of taxa for multiple languages other than English and reference templates for them. Before we actually add translations in our usual fashion, I hope we can include translation sections such as the ones in [[Ipomoea]] and [[Ipomoea batatas]].

I am aware of sources for fish (Fishbase; see Salvelinus fontinalis (brook trout)) and for plants (see Ipomoea at the Multilingual Multiscript Plant Name Database (University of Melbourne)). I'd be interested in similar sources for birds, mammals, reptiles, crustaceans, insects, etc. DCDuring (talk) 19:18, 24 June 2017 (UTC)

I rarely trust these. They have the same problems that lists of phobias and lists of collective terms do, compounded by the fact that most taxa don't have true vernacular names in the well-known languages- just ones that scientists make up or misapply for completeness. Not to mention that vernacular names often don't match taxa in their distribution: no one in real life uses yam for a boniato, or vice versa, even though they're both Ipomoea batatas. You really have to know the languages and the plants to sort them out with any certainty. Chuck Entz (talk) 04:06, 25 June 2017 (UTC)
At least some people seem to use the names the scientists or bureaucrats prescribe. By the same token, one could wonder about our translations or entries in some languages where the entries are made by non-native speakers. There are lots of redlinks, which makes me worry about their reality as well. At least we wouldn't be endorsing the terms if they are on an external site. DCDuring (talk) 05:00, 25 June 2017 (UTC)
I'd be fine with linking to lists like that, as long as we made sure no one started copying them over to Wiktionary itself. Andrew Sheedy (talk) 05:55, 25 June 2017 (UTC)
I'd hope we wouldn't risk copyvio. I'd hope that the external lists might serve as some kind of check on contributions by our less fastidious contributors. But we can't control what folks will do; we may just have to clean up after them as best we can. DCDuring (talk) 07:09, 25 June 2017 (UTC)
@DCDuring: Natures' Window has a beautifully curated list of common reptile names in English, and I think there's German and possibly Danish on there somewhere too, but the site's navigation is confusing. It does however suffer from the lexicographical problem articulated above of including or inventing common names that aren't so common in the real world, but it's a very nicely put together list all the same. More general sites include: eol.org, which collects common names in all languages; iucnredlist.org, which includes English, Spanish, and French fields; and catalogueoflife.org (many languages). Pengo (talk) 15:10, 30 June 2017 (UTC)
@Pengo: Thanks for the reminder about these. The hard-to-attest vernacular names are mostly folk names. Others are fish-mongers' names for fish, grocers' for produce, etc. Soon after some entity like the US Dept of Agriculture gives something an 'official' English name there is usage, even if only among a narrow group of speakers. Any group of organisms with a large fan base (eg, large mammals, spiders, marsupials, butterflies) also may have an organization that provides names for the fans.
The databases with vernacular names for taxa seem to show that there are no accepted names for many species in most languages. The number of languages that have such names seems to depend on how well-traveled the species is (alive or dead), how visible it is to the naked eye, and how well it photographs. DCDuring (talk) 17:50, 30 June 2017 (UTC)
There's also the official Australian Standard fish names for 4000 species names that are now commonly used in Australia :) But yes, in general many species appear to have no common name in any language unless the original scientific description decides to include one. Fun fact: Many languages don't even name what we may consider basic colours like "purple" or "orange" until they have commercial significance, such as in dyes and textiles. Pengo (talk) 02:42, 1 July 2017 (UTC)

Allowing "Gallery" section

At some point, I'd like to create a vote to formally allow the "Gallery" section in entries.

Entries with "Gallery" sections as of today:

--Daniel Carrero (talk) 07:17, 26 June 2017 (UTC)

Please include placement choices/restrictions etc. DCDuring (talk) 12:27, 26 June 2017 (UTC)
OK. Proposed specifications for "Gallery" section:
Placement choices:
  • type one - subsection of a given POS section, one gallery per POS section (L3 section has L4 "Gallery" subsection)
  • type two - single L3 section below all POS sections in the given language section
Heading order (type one):
  • after "Usage notes", "Inflection", "Declension", "Conjugation", "Mutation", "Quotations", "Alternative forms", "Xnyms", "Coordinate terms", "Derived terms", "Related terms", "Descendants", "Translations"
  • before "See also", "References", "Further reading"
Heading order (type two):
  • same as type one, except most of these other headings are not supposed to exist as L3 sections -- in other words, still place "Gallery" before "See also", "References", "Further reading"
See table below. --Daniel Carrero (talk) 12:42, 26 June 2017 (UTC)


[Table: example entry layouts for "Gallery" placement, type one and type two]

Obviously, there should be some threshold to use; an entry with only one or two (or three, IMO) images doesn't need to use the gallery format. - -sche (discuss) 12:31, 26 June 2017 (UTC)
Sure. Support having some threshold to use. Maybe 7 or more images. --Daniel Carrero (talk) 12:42, 26 June 2017 (UTC)
Support. A good idea that I could have used before now. DonnanZ (talk) 23:26, 26 June 2017 (UTC)
Support. I would say five is a good threshold. Seven per Daniel seems too high. Andrew Sheedy (talk) 01:43, 27 June 2017 (UTC)
Good practices: an image per sense, if relevant, only one per sense, and a gloss in the caption referring to the sense. More than a threshold, I think the point is the gallery's purpose: explanatory or potentially decorative. Is there any norm on the use of images in general? --Vriullop (talk) 07:40, 27 June 2017 (UTC)
WT:EL currently does not say anything about images, but what you said sounds about right to me. --Daniel Carrero (talk) 08:08, 27 June 2017 (UTC)
Some pages as listed above have the "Gallery" section with just one image. Do you think that's a problem? I get that it uses up vertical space, but maybe some people prefer doing it that way. I'm not sure there's always a clear advantage to using right-floating images instead of a gallery. Maybe we should allow galleries with just 1 image after all. --Daniel Carrero (talk) 21:33, 28 June 2017 (UTC)
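The heading order proposed above for type one could be checked mechanically. Below is a minimal sketch, assuming the heading lists exactly as given in the proposal. "Xnyms" is expanded to "Synonyms" and "Antonyms" only as an assumed example, and the function name `gallery_placement_ok` is hypothetical. It does not parse wikitext; it only looks at a list of heading names in page order.

```python
# Headings that the proposal places before "Gallery" (type one).
# "Synonyms" and "Antonyms" stand in for "Xnyms"; that expansion is an assumption.
BEFORE_GALLERY = {
    "Usage notes", "Inflection", "Declension", "Conjugation", "Mutation",
    "Quotations", "Alternative forms", "Synonyms", "Antonyms",
    "Coordinate terms", "Derived terms", "Related terms", "Descendants",
    "Translations",
}

# Headings that the proposal places after "Gallery".
AFTER_GALLERY = {"See also", "References", "Further reading"}


def gallery_placement_ok(headings):
    """Check a list of heading names (in page order) against the proposed order.

    Headings not mentioned in the proposal are ignored.
    """
    if "Gallery" not in headings:
        return True
    position = headings.index("Gallery")
    earlier, later = headings[:position], headings[position + 1:]
    misplaced_before = any(h in AFTER_GALLERY for h in earlier)
    misplaced_after = any(h in BEFORE_GALLERY for h in later)
    return not misplaced_before and not misplaced_after
```

For example, `gallery_placement_ok(["Translations", "Gallery", "See also"])` is accepted, while a "Gallery" placed above "Translations" is not. A minimum image count could be added in the same way once a threshold is agreed.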

IPA categories have some strange text

e.g. Category:User_IPA-4 says "These users can read the International Phonetic Alphabet at a near-native level", but IPA isn't a language that anybody speaks natively. Equinox 22:57, 26 June 2017 (UTC)

Or, in the other direction, almost everyone's native language (if spoken rather than signed or only written in text or braille) is made up of IPA sounds... ;) Maybe it should say "near-perfect" or "near-expert"? - -sche (discuss) 00:59, 27 June 2017 (UTC)
The IPA-4 template itself has the text »This user has a comprehensive understanding of the International Phonetic Alphabet«, so perhaps the category could just be changed to say the same? — Vorziblix (talk · contribs) 17:58, 30 June 2017 (UTC)


Did anyone notice when we hit 5,250,000? I was away when it happened so can't update Wiktionary:Milestones. SemperBlotto (talk) 12:55, 27 June 2017 (UTC)

@SemperBlotto: it's so huge now that we could content ourselves with updating it every 500.000 entries. --Barytonesis (talk) 13:16, 27 June 2017 (UTC)
Yes, we'll do that. SemperBlotto (talk) 15:06, 27 June 2017 (UTC)
I still want to hit the 6 millionth article. --Recónditos (talk) 17:33, 30 June 2017 (UTC)

NORM vote -- align equal signs

Based on User talk:TheDaveRoss#Removal of spaces from templates, I created Wiktionary:Votes/pl-2017-06/NORM: allow multiple spaces to align equal signs in templates. --Daniel Carrero (talk) 17:30, 27 June 2017 (UTC)

I would be suspicious that the Mediawiki template parser / our Lua modules do not treat this identically with the nonspaced version with respect to stripping or not stripping spaces. DTLHS (talk) 17:32, 27 June 2017 (UTC)
See m:Help:Newlines and spaces#Trimming on expansion. Spaces are stripped from named parameters but not from unnamed parameters. The same behaviour applies to the invoke function. --Vriullop (talk) 18:25, 27 June 2017 (UTC)
Can you add the amended text for the existing first rule of the template section to the vote; "No leading or trailing whitespace in templates (name, parameter name and value), links, categories and so on." The new rule would contradict the old rule, so we should alter the old rule as well instead of leaving a contradiction. - [The]DaveRoss 18:09, 27 June 2017 (UTC)
OK. I did two things:
  1. I edited that part without a vote: now it reads "No leading or trailing whitespace in templates (name, parameter name and value)." (the "links, categories" part was out of place and is already said elsewhere, THIS is the "Templates" section)
  2. I edited the vote to suggest a new edit that should remove the contradiction if the vote passes.
Let me know if there's anything else to be done. --Daniel Carrero (talk) 18:31, 27 June 2017 (UTC)
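To make the whitespace point concrete, here is a minimal sketch of what aligning equal signs does to named parameters and why trimming would make the aligned and unaligned forms equivalent. It only imitates the trimming of named parameters described above; it does not call MediaWiki or any module, and the function names `align_equals` and `parse_named` as well as the sample parameter names are hypothetical.

```python
def align_equals(params):
    """Format (name, value) pairs one per line, padding names so the '=' signs line up."""
    width = max((len(name) for name, _ in params), default=0)
    return ["|" + name.ljust(width) + " = " + value for name, value in params]


def parse_named(line):
    """Imitate trimming of a named parameter: strip whitespace around name and value."""
    name, _, value = line.lstrip("|").partition("=")
    return name.strip(), value.strip()
```

With `params = [("tr", "x"), ("head", "foo bar")]`, `align_equals(params)` pads `tr` to the width of `head`, and applying `parse_named` to each resulting line gives back the original pairs. Unnamed parameters are a different matter, since their spaces are kept.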

Further reading and external links

I may be missing something here, but many "Further reading" or "External links" entries (in the main namespace) are suggestions based on a contributor's intuition (with good or bad intentions) of what (s)he would like to read. Why should a dictionary like Wiktionary suggest what is good to read about a lemma? Links in such sections to other WMF projects are always welcome, but what is the purpose of links to external databases (private or public)? Do we promote them? Does the Wiktionarist who added the link believe it would be better for readers not to come to Wiktionary but to go directly to another database (especially a private one)? External links used as sources serve to prove that what is written here is not fake and not the writer's imagination. That is one of our goals; suggesting further reading on external sites is not. As an example, Weiterbildung points to a commercial site that even has a Warenkorb ready for buyers! Or ψαλίδι points to a very good site for researchers, but that suggestion belongs elsewhere (for instance, as advice to contributors to Greek lemmas) and not in every Greek lemma (or, much more suspiciously, in selected ones). I must also note that making suggestions without sources that clearly state that these are good suggestions (not merely good books) for that specific dictionary lemma is at the very least original research.

P.S. I wrote "suspicious" not because I distrust the specific contributor, but because placing any such link only in selected lemmas may make it look suspicious. --Xoristzatziki (talk) 18:03, 27 June 2017 (UTC)
@Xoristzatziki: I assume you're referring to my adding {{R:DSMG}} to some Greek entries. Two remarks: 1) I'm not particularly happy with the header "Further reading" and preferred when it was called "References", but per Wiktionary:Votes/2017-03/"External sources", "External links", "Further information" or "Further reading", this is the new policy; 2) I'm not purposefully "selecting" entries: when I see that a Greek entry needs an etymology, for example, I take advantage of the opportunity to add the external link. --Barytonesis (talk) 19:53, 27 June 2017 (UTC)
@Barytonesis: This is not a personal reference. This is a discussion about the usage of "Further reading" or "external links" by all users. These were just examples. The problem is not with your contribution; it is with the upcoming usage of such links. And, I repeat, it is not personal. It is about how we (all) must use certain things. The 2017-03 and 2016-12 votes were misleading (in my opinion) and were mostly about the naming convention of the section and not about the usefulness of the content. I am sorry I missed these votes, but my time was so limited, and I was not concerned about the names of these sections. On the other hand, Wiktionary depends on its contributors, and if the majority prefer links to further reading (which make no sense in a dictionary's entries) then mea culpa. --Xoristzatziki (talk) 20:25, 27 June 2017 (UTC)
You're the first to make this an issue, as far as I can tell. —CodeCat 20:31, 27 June 2017 (UTC)
That some, but not all, entries of a given type have some reference is just a product of the uneven, unsystematic nature of entry development. (Compare the first print OED, for which the first letters of the alphabet were covered about 44 years before the last. Even now some entries there seem not to have gotten much attention in recent years.) Wiktionary may be imagined to be somehow definitive, but that is far from the case. Contributor (volunteer!!!) attention is the scarcest resource here. DCDuring (talk) 21:12, 27 June 2017 (UTC)
@Xoristzatziki: no offense taken. Regardless of the name "Further reading" (vs. "References"), do you actually oppose adding {{R:DSMG}} on all lemmas? If not, I will continue adding it whenever I edit a Greek entry (I'd be happy to devise a bot to do it, but I don't have the coding knowledge required). --Barytonesis (talk) 23:59, 27 June 2017 (UTC)
@Barytonesis Wiktionary is not an arbiter of what is good English, Greek etc. If an external link is to be a reference for supporting an edit it is ok. Which means that if your edit added something that may lead to {{rfv}} (or already done one), or your edit already have lead to something like an edit war, then add it as a reference to support it (but only if you think it will clearly support it). Otherwise there is no reason to add it nowhere (ok not nowhere, you can include it in a help page for newcomers (or "oldcomers":-) in Greek lemmas). I must insist (here also) that the existence of a lemma in any dictionary is not a proof that word is used. Neither that a word has that specific translation because a)some authors tend to include misleading edits for copy checking, b)some dictionaries have newer editions correcting mistakes. A word included in an old dictionary, or a new one, for time periods where no written material exists for that language can be "supported" by a reference. But only as reference in a specific point of the lemma. Not as a reference for further reading, or as a generic external link. I also created a {{R:Babiniotis 2002}} (a printed material I possess) but only for supporting etymology in articles that a different etymology exists (or existed) in other wiktionaries. And I am aware of the well known fact that this particular dictionary had many corrections in his etymologies (and entries) since 2000 (which means that if someone has a newer edition it would be good to check, and correct if necessary, my editions). So, concluding, no reason to add the {{R:DSMG}} (or any other such external link) in any lemma unless it is for supporting something specific. --Xoristzatziki (talk) 06:34, 29 June 2017 (UTC)
I must add another reason: c) some dictionaries (such as the dictionary of Babiniotis, or ALL ISO and ΕΛΟΤ publications) include entries on what has to be correct English or Greek, which is outside the scope of our wiktionary. --Xoristzatziki (talk) 06:40, 29 June 2017 (UTC)
@Xoristzatziki: "no reason to add the {{R:DSMG}} (or any other such external link) in any lemma unless it is for supporting something specific": I disagree with that. The whole idea of Further reading is that it does not support anything specific; instead, it guides the reader to another site relevant to the word documented in Wiktionary. I for one think that monolingual online dictionaries containing definitions such as Duden are in general excellent material to add to Further reading. Our readers do not necessarily know where else to look; we tell them. Furthermore, we do not tell the reader what to do with the external link; if the reader does not trust our verification processes and wants to check, that's up to them. In my mind, ideally, every lemma should eventually link to at least one online monolingual definition dictionary if possible, but I don't like when there are too many external links in a single lemma. --Dan Polansky (talk) 17:02, 30 June 2017 (UTC)
In general I agree with Dan on this. Including links to other dictionaries (even when they are not being used to support any particular piece of information, like an etymology or pronunciation) is acceptable and even helpful, although not obligatory. Unless there is an issue with the DSMG, such as if it were of low quality, I think it can continue to be included. If there are too many external links in a particular lemma (which I agree is undesirable), or if it is a general problem for particular languages (say, there are a lot of dictionaries of English available, and we don't necessarily want to link to all 50+), we can discuss pruning some of the less-relevant/helpful ones or moving them into a collapsible box. I don't think that a dictionary being prescriptivist inherently means we shouldn't link to it. - -sche (discuss) 18:26, 30 June 2017 (UTC)
I have long been using {{R:OneLook}} in English L2s. They include quite a few English dictionaries and many more specialized glossaries etc. It would also be possible to hide all or some external links. But since external links - whatever heading they appear under - are near the bottom of their L2 they interfere with user visual access to definitions much less than overlong Pronunciation and Etymology sections. DCDuring (talk) 21:51, 30 June 2017 (UTC)
"Our readers do not necessarily know where else to look; we tell them."??? No!!! Our job is to gather non-garbage information here. For them. By checking and rechecking and again checking. Adding any "further reading" link just to inform the readers where to find more (reliable or not) definitions is like saying "We at Wiktionary acknowledge that link is a good one and please go there to find more definitions which we may never add here if they are garbage". Further readings are good for wikiversity and wikibooks. But in wikipedia, wiktionary, wikisources and wikiquote the only reason an external link may exist is for supporting something written there (and specified, not to support anything written in the article, ex. to support etymology, or a specific meaning etc.). A link is to inform our readers that we are not writing garbage here. I will take the risk to go a little further and ask: Do you have at least three independent sources that will support your addition of that specific link as "further reading"? Really, do you have at least one source which states that dictionaries (or encyclopedias) should provide further readings? I am not against such links in a talk page in order to inform the contributors where to find material to extend the article. ,Anyway I say all these because I fear that my contributions are only for supporting these specific "further reading" sites and not for gathering the information that may (or may not) provide (free or by charging) for our readers. And I cannot support my opinion more than what I already did. --Xoristzatziki (talk) 01:49, 1 July 2017 (UTC)
We are here to render a service under a free-as-in-freedom license. Adding useful further reading renders more service. I see it as pretty obvious that a link to a monolingual dictionary pointing the reader to a page specific to the word is a useful service. You seem to be opposed to the idea of further reading in general, and only support the idea of references, links serving to back something up. The meta-sourcing problem you hint at (which sources tell you such and such is a good source) applies to sources backing something up as well, and if real, would be more urgent for them; for further reading, it is not so urgent. I don't think that for monolingual dictionaries there exists any serious meta-sourcing problem, that is, we do not have to fear that e.g. {{R:DSMG}} (Dictionary of Standard Modern Greek, 1998) is a very bad dictionary we do not want to link to. Furthermore, if we wished to abandon further reading altogether, we would have to cease linking to Wikipedia: we cannot use Wikipedia to back anything up.
The heading Further reading was introduced recently as a replacement for External links via Wiktionary:Votes/2016-12/"References" and "External sources" and Wiktionary:Votes/2017-03/"External sources", "External links", "Further information" or "Further reading", to reinforce the role these links are to play to the reader, that is, not to back anything up.
As a contributor of Greek content to the English Wiktionary, you do not need to fear the competition of further reading sources, I think. The English Wiktionary integrates information on multiple languages in a beautiful way; it may not be best of breed, but it sports integration, has place for attesting quotations which these sources often do not have, and has many words these sources miss. Furthermore, a Greek monolingual dictionary does not give you English translations and abbreviated English definitions of the Greek terms. I see {{R:DSMG}} complementing but not replacing information in the Greek entries in the English Wiktionary. As a contributor of Greek content to the English Wiktionary, you are contributing to what a Danish journalist recently described as a miracle of a dictionary, in Et mirakel af en ordbog, www.b.dk, 22 April 2017. Our problem is incompleteness, not extraneous further reading links. --Dan Polansky (talk) 06:28, 1 July 2017 (UTC)
Just a clarification: Weiterbildung, which I first mentioned, is not a Greek word at all. So even if I was feared...(!!!) it was not about a single language, but about at least one more. --Xoristzatziki (talk) 06:56, 25 July 2017 (UTC)

Category:Evolutionary theory

I've added "evolutionary theory" as a label, and have been adding it to some entries, which are then automatically categorised in Category:en:Evolutionary theory; however, I'm not entirely convinced of what I'm doing.

Adding the label to Lamarckism, for example, might make it look like it's endorsed by current research, when this is obviously not the case; and linking to evolutionary theory, defined as "A theory of evolution, especially the scientific theory of evolution through natural selection" will be confusing for the reader, since it's directly at odds with Lamarckism. What about orthogenesis, transmutationism, etc.?

But the thing is, these terms are still used in current evolutionary discourse to speak of obsolete theories. I think I'm getting confused between words and concepts here. @Chuck Entz, Metaknowledge? --Barytonesis (talk) 11:31, 28 June 2017 (UTC)

What's wrong with just pointing out that these theories are disproved in the entry? Korn [kʰũːɘ̃n] (talk) 11:44, 28 June 2017 (UTC)
Would the historical gloss be useful? This indicates a still-current term for an obsolete thing, like velocipede. Equinox 11:54, 28 June 2017 (UTC)
Don't forget that Wikimedia categories are a navigational aid for finding entries that have something in common, not a statement about the meaning of things. Lamarckism is something that one talks about in the context of evolutionary theory, so readers interested in evolutionary theory would be interested in Lamarckism as well: no one would write about the history of evolutionary theory without mentioning Lamarckism, Lysenkoism, and other obsolete theories. Also, "especially the scientific theory of evolution through natural selection" just means that natural selection is the main evolutionary theory, not that it's the only one. Chuck Entz (talk) 13:50, 28 June 2017 (UTC)
All right. I'll continue using it, and add the label "historical" in the relevant cases. --Barytonesis (talk) 15:36, 28 June 2017 (UTC)

Enabling Wikidata arbitrary access

Hello all,

The Wikidata team has followed with attention the different discussions (February, March, June) about enabling the display of Wikidata information in English Wiktionary, and the result of this vote.

I want to let you know that we are now ready to enable Wikidata arbitrary access on English Wiktionary. All the necessary steps have been done (deploying automatic links for the main namespace and Wikidata centralized links for other namespaces). Even if some issues still have to be fixed with these two features, we have all the means to allow you to experiment with Wikidata data. It is now up to you to decide if and when you want to enable it. We don't want to push this feature without your explicit consent, which is why we would now like to discuss a possible date of activation with you.

Important note: Wikidata is not ready to store information about words yet. This part of the plan is currently in development and the team is working actively on it, to provide in the next months the possibility to describe Lexemes, Forms and Senses in Wikidata. In the meantime, Wikidata only stores information about concepts, and this is how it should be used on Wiktionary as well.

As we mentioned in a former discussion, including Wikidata data in the Citations namespace in order to automatically provide the work, author, date of publication, etc. for the quote seems to be a simple and useful way to start experimenting with Wikidata.

What do you think? Do you have other ideas on how to use Wikidata data? When would you like to enable the arbitrary access feature? Do you have any other questions about this feature? Lea Lacroix (WMDE) (talk) 14:56, 28 June 2017 (UTC)

Maybe the Wikidata activation date could be July 17 or later, because that's when Wiktionary:Votes/pl-2017-06/Wikidata precautionary principle is scheduled to end. --Daniel Carrero (talk) 15:06, 28 June 2017 (UTC)
@Lea Lacroix (WMDE): Wiktionary:Votes/pl-2017-06/Wikidata precautionary principle ended today. I published the result in this new policy: Wiktionary:Wikidata policy.
That very important vote you mentioned (Wiktionary:Votes/2017-05/Installing Wikidata) ended on June 11. It's been a little more than 1 month since the use of Wikidata was formally approved. So, I believe it should be OK to activate Wikidata as soon as possible. --Daniel Carrero (talk) 09:26, 18 July 2017 (UTC)
Thanks for mentioning me @Daniel Carrero! Do you already have a page to discuss Wikidata and the different use cases, and to group all the decisions and votes that will be made regarding uses of Wikidata data? I think it would be useful to have everything in one place. Lea Lacroix (WMDE) (talk) 09:36, 18 July 2017 (UTC)
@Lea Lacroix (WMDE): OK, here it is: Wiktionary:Wikidata. --Daniel Carrero (talk) 21:24, 19 July 2017 (UTC)

Category:Archaic terms by language and Category:Terms with archaic senses by language

I thought the first type of categories was discouraged in favour of the second? --Barytonesis (talk) 22:01, 28 June 2017 (UTC)

IIRC, at first only the first kind of category existed. When the second was added, it was for words that have some archaic senses as well as some non-archaic (still-current) senses, with the different categories being added by {{term-label}} and {{label}}, respectively. The distinction has generally not been maintained. Technically, "terms with archaic senses" encompasses "archaic terms" (i.e. terms with only archaic senses and no non-archaic senses), even if that is not how a normal reader might understand the name, so it is plausible that it might be best to give up on making a distinction (which has been discussed before from time to time), and merge the categories. - -sche (discuss) 17:47, 30 June 2017 (UTC)
Support merging. --Daniel Carrero (talk) 17:56, 30 June 2017 (UTC)

Glossary in the sidebar

FYI: One of these days, I added a link "Glossary" in the sidebar. (This was accomplished by editing MediaWiki:Sidebar.) --Daniel Carrero (talk) 11:02, 29 June 2017 (UTC)

Category:en:Tools and Category:en:Machines

This seems a bit redundant. --Barytonesis (talk) 22:21, 29 June 2017 (UTC)

How so? —CodeCat 23:39, 29 June 2017 (UTC)
Perhaps they should be combined because we can't maintain the categories.
Relatedly, why is bandage in Category:en:Medical equipment and scalpel in Category:en:Surgery? Is this done automagically? It is done by hard categorization. DCDuring (talk) 00:17, 30 June 2017 (UTC)
I fail to see a clear-cut distinction between the two. How are Category:en:Simple machines not tools? --Barytonesis (talk) 00:22, 30 June 2017 (UTC)
Because that is not how normal humans use the words. There is more art than set theory in coming up with category names. DCDuring (talk) 00:47, 30 June 2017 (UTC)
It's not redundant at all. Not all tools are machines (e.g. cutlery), and although probably all machines are tools in the broadest sense, not all machines are necessarily thought of as tools (like clocks and vending machines). —Aɴɢʀ (talk) 10:25, 30 June 2017 (UTC)
In what sense is a ship or a locomotive a tool? And consider the following: narrative functions as a powerful and basic tool for thinking
@Daniel Carrero We clearly need to specify in the category header what sense of the category name we would like categorizers to use. DCDuring (talk) 11:27, 30 June 2017 (UTC)
I don't think the category header helps a lot. Some people certainly didn't obey what the header of Category:en:Stars says, judging by the contents of the category. --Daniel Carrero (talk) 11:40, 30 June 2017 (UTC)
@DCDuring: Our definition of tool defines it as mechanical, which I would say is wrong. Wikipedia's entry defines it as "any physical item that can be used to achieve a goal, especially if the item is not consumed in the process". That would include a Paleolithic hunter-gatherer's use of a stone as a tool, and would include ships and locomotives as well. Our definition would definitely exclude the stone, and might exclude the ship and locomotive as well. —Aɴɢʀ (talk) 12:17, 30 June 2017 (UTC)
@Daniel Carrero: So the remedy for a poorly functioning, possibly useless, set of categories is to ignore the problem and not try the most basic identifiable steps to attempt a repair?
I assume that most ordinary contributors or label mavens just put something in any nearly-right category so that it can be eventually put in the "right" category. Wouldn't we want to enable the process to proceed in the direction of a well-defined improvement, rather than stagnation or thrashing? I wouldn't even know how to clean up membership in a category without knowing what, if any, considered intent there was behind it. How many category hierarchies are there that a given sense of a word is permitted? Does scalpel belong in "Medicine" and/or "Surgery" and/or "Veterinary medicine" and/or "Tools" and/or "Medical equipment"?
@Angr: So, what should we assume is the definition intended by those who have established and enshrined the current topical (and/or usage context?) category system? How do we operationalize these categories? By what criterion should we decide to simplify the system so that it can be implemented adequately as is and gradually refined? Should categories be hidden by default and allowed to be displayed to ordinary users when they have met some kind of rudimentary standard? DCDuring (talk) 17:24, 30 June 2017 (UTC)
I would suggest renaming the categories somehow. It seems people pay more attention to category names than to their descriptions. --Daniel Carrero (talk) 17:33, 30 June 2017 (UTC)
I don't doubt that, but the process shouldn't end there. If it does, then our categories will not have great utility and, moreover, will put our dilettantism on display for anyone who wanders into the category thicket. DCDuring (talk) 21:34, 30 June 2017 (UTC)
I hope I did not give the impression that I oppose editing the category headers. I'm just not confident that it would be very helpful in comparison with renaming categories as needed. But I support doing what you said: "We clearly need to specify in the category header what sense of the category name we would like categorizers to use." --Daniel Carrero (talk) 23:10, 30 June 2017 (UTC)

Secret Wiktionary meetup

It's 1 p.m. on Sunday 16 July, in Pint Shop. Look for the guy wearing all black with very short hair and an obvious attitude problem. If nobody appears within an hour then I will run away and start buying books at the numerous excellent book shops of Oxford, which is why I'm really gonna be there. But say hi if you'd like. Equinox 10:14, 30 June 2017 (UTC)

Sounds fun! Let us know who went, if you don't mind. (I'll be here on this side of the ocean.) @Celui qui crée ébauches de football anglais, I think you should go. --Daniel Carrero (talk) 16:15, 30 June 2017 (UTC)
@Daniel., you pinged my old account (which only idiots do). Strangely, however, that might actually work as a meet-up. If I go, it will be incognito, however. --Recónditos (talk) 17:29, 30 June 2017 (UTC)
I pinged the bottom name from User:Aryamanarora/Wonderfool. It's not even permanently blocked. I figured you would see this post anyway. --Daniel Carrero (talk) 17:37, 30 June 2017 (UTC)
This is one of the disadvantages of being Canadian! We're so spread out over here, meeting up with people we meet online isn't an option for those of us who can't afford to travel everywhere... Andrew Sheedy (talk) 16:17, 30 June 2017 (UTC)
Damn! And Norwegian Airlines has such good fares now. DCDuring (talk) 17:30, 30 June 2017 (UTC)
Hmm, it's only two train journeys away. DonnanZ (talk) 17:49, 30 June 2017 (UTC)
@Equinox: I found Oxford and the Pint Shop but I couldn't find Equinox. The India pale ale is excellent. DonnanZ (talk) 16:25, 2 July 2017 (UTC)
Oh, I just realised it was the wrong bloody Sunday. Oh well. DonnanZ (talk) 16:28, 2 July 2017 (UTC)
Haha. See you in a week and a half? I am pretty sure nobody will turn up honestly. But I can spare an hour. Equinox 21:59, 3 July 2017 (UTC)
@Equinox: I'll try again. See you tomorrow hopefully, why else would I go to Oxford? It's a bit like central London, too many tourists. DonnanZ (talk) 17:02, 15 July 2017 (UTC)
It would be cool to meet a Wiktionary user in real life, except I don't live in the UK and have no way of getting there, especially not anytime soon. PseudoSkull (talk) 22:10, 15 July 2017 (UTC)

How did it go? --Daniel Carrero (talk) 22:56, 16 July 2017 (UTC)

It was fine. Just me and Donnanz though! We chatted and had a few beers. And yes there were too many tourists... Equinox 16:25, 17 July 2017 (UTC)
I very nearly didn't get there because of problems with the trains (weekend engineering works). But it was well worth the effort. DonnanZ (talk) 16:38, 17 July 2017 (UTC)

Vote: Placement of well documented languages

FYI, I created Wiktionary:Votes/pl-2017-06/Placement of well documented languages.

Let us postpone the start of the vote as much as discussion requires, if at all. --Dan Polansky (talk) 15:49, 30 June 2017 (UTC)

What discussion? —CodeCat 16:05, 30 June 2017 (UTC)
Any discussion. ("if at all" implies 0 or more discussions) But I guess we don't need to say "Let us postpone the start of the vote as much as discussion requires, if at all." in the first place, because a close paraphrase of it is now part of Wiktionary:Voting policy and thus implied in all votes. --Daniel Carrero (talk) 16:12, 30 June 2017 (UTC)
I think the idea of moving that policy page was mentioned in the recent votes about Latin, but I guess we don't need to create a whole discussion with the idea "moving the policy" before creating the vote. Unless we want to discuss it now. --Daniel Carrero (talk) 16:13, 30 June 2017 (UTC)
@Dan Polansky: Am I mistaken, or nowhere do I see "Well documented languages on the Internet" defined positively? It seems to only be defined by contrast with LDL: "WDL are the languages that aren't LDL, which are thus defined [...]"? (got your ping on the Latin vote, will respond later) --Barytonesis (talk) 17:41, 30 June 2017 (UTC)
@Barytonesis: I don't know whether there is such a definition. If the vote passes, it will not matter since the term will no longer be used in the policy; the policy will just say which languages require 3 quotations, that is, which, not what kind of. --Dan Polansky (talk) 17:46, 30 June 2017 (UTC)

July 2017

Change "proscribed" to "considered incorrect"

Reading through the discussion at Wiktionary:Tea room#.22alot.22_is_NOT_correct_English. finally gave me the resolve to propose something that has bothered me for a long time.

Let's change the "proscribed" label to "considered incorrect".

  • Proscribed is not a common word. I consider myself quite well-read, and I had never come across this word before encountering it on Wiktionary. An Ngram shows more usage of the word than I would have expected, but when you look at Google Books results since 1970, you see that its use is confined to academic and technical works, such as journal articles, textbooks and legislation. It appears in very few works targeted at a general audience, which is surely the audience we are targeting here at Wiktionary.
  • "Considered incorrect" would be better than "proscribed" in that it does not give the suggestion that we are the ones prescriptively proscribing the word - we are simply noting that many, if not all, sources consider the term incorrect.
  • It looks extremely similar to a word with essentially the opposite meaning. I couldn't think of many pairs of differently-spelt English words that look any more similar when written down in lowercase (other than M/RN pairs like "bum" and "burn" perhaps).
  • This information that this label conveys is especially important (I'd even say essential) for English learners and non-native speakers, but because it is conveyed using a word that they most probably do not know, it is going to be lost on them.

Sure, the definition is only a click away at the glossary. But why should we make people learn an extra word to be able to use our dictionary properly? It's silly. Let's do away with it.

I'm inclined to propose a vote along the lines of "changing the display of {{label|en|proscribed}} to (considered incorrect)". This, that and the other (talk) 06:15, 1 July 2017 (UTC)

I am persuaded by your reasoning here. I would support the wording considered incorrect. I wonder what @Dan Polansky would think of this proposal, given that he would prefer us not to use the proscribed label. — Eru·tuon 18:34, 1 July 2017 (UTC)
I seem to prefer often deemed incorrect; "considered" is okay but "deemed" is shorter. The addition of "often" reinforces the idea that the deeming is not done by Wiktionary editors. See also Wiktionary talk:Votes/2016-10/Removing label proscribed from entries#Other label name. --Dan Polansky (talk) 19:02, 1 July 2017 (UTC)
I’d rather we kept a distinction between “considered incorrect by language authorities” (= proscribed) and “considered incorrect by speakers in general” (= nonstandard). — Ungoliant (falai) 18:47, 1 July 2017 (UTC)
I agree the distinction should be kept, though I'm sympathetic to the point that users may not know the word and hence the info may be lost on them. It's hard to find a label that keeps the distinction and is concise and able to be put into all entries that use "proscribed". "Considered incorrect by authorities" or "...by some authorities" is wrong if only one (but e.g., the official or dominant/influential and notable) language authority proscribes the term, e.g. the Académie française, the Duden, maybe the OED, and "some" is wrong if most or all authorities proscribe the term.
For similar reasons, "often" should not be included in the text that is automatically displayed: not all terms are "often" considered incorrect by authorities: some may be considered incorrect by all authorities (this seems especially likely if a language has one or more central authorities), others may only be proscribed by some authorities while other authorities approve of them, in which case we use "sometimes proscribed", which would become "sometimes often considered..." or "sometimes often deemed...". And probably the most frequent occurrence is that one or more authorities proscribe a term and others don't mention it, which makes it debatable whether it is "often" considered incorrect.
(Ultimately, using "proscribed" and linking to the glossary like we do may be the best option, despite its drawbacks.)
An idea based on the name of the category which "proscribed" currently categorizes into is "authoritatively disputed" or "authoritatively deemed incorrect", but I don't like the sound of either of those; "authoritatively" seems liable to be misunderstood.
- -sche (discuss) 18:17, 2 July 2017 (UTC)
I would support changing "proscribed" to "considered incorrect", but I also agree with Ungoliant that it's useful to distinguish between whether something is only "officially" considered incorrect, or whether most speakers would think it a mistake. Ultimately, I think the ideal is to put that information in a usage note, which allows for further elaboration. One can't learn the subtleties of a word's usage from a label. Andrew Sheedy (talk) 19:47, 3 July 2017 (UTC)
I strongly support the use of the more common words, despite the greater length. DCDuring (talk) 02:39, 4 July 2017 (UTC)
"Official" incorrectness vs. "popular" incorrectness is not in fact a binary distinction: words can even be incorrect in some registers while being preferred in others. Don't we have the ====Usage notes==== section for this kind of detail? --Tropylium (talk) 15:17, 23 July 2017 (UTC)

@DCDuring, Andrew Sheedy, -sche, Ungoliant MMDCCLXIV, Dan Polansky, Erutuon I've created a vote at Wiktionary:Votes/2017-07/Changing the wording of the "proscribed" label. The discussion at the talk page may interest you. This, that and the other (talk) 10:16, 9 July 2017 (UTC)

Vote -- Requests for documentation

Based on Wiktionary:Tea room/2017/June#"the Variety -er", I created Wiktionary:Votes/2017-06/Requests for documentation. --Daniel Carrero (talk) 10:48, 1 July 2017 (UTC)

July Lexisession: flight

Is it a spin?

Monthly suggested collective task is to collect words about flight. In the category of Wikisaurus about travel and movement, there is nothing about motion in the air, and it is the same in French Wiktionary, so it seems like a good topic for this month - it could soar!

Yay! let's do a barrel roll!

By the way, Lexisession is a collaborative experiment without any guide or direction. You're free to participate as you like and to suggest next month's topic. If you do something this month, please let us know here or in Meta, to let people know that English Wiktionarians are doing something on this topic. I hope there will be some people interested in reaching the altitudes! Noé 11:13, 1 July 2017 (UTC)

I spruced up the Spanish entries volar and volador a little bit. That's my good deed of the month done, then. Also added an Asturian entry - vuelu. --Recónditos (talk) 11:13, 8 July 2017 (UTC)
Great! Thank you! ¡Muchas gracias! I updated the Meta page to display a shorter version of the past editions. Contributions are not mentioned if people did not ping me or write a note on the beer parlour, so feel free to let me know or to enhance the Meta page. LexiSession is getting a year old soon and it's time to look back and make some improvements in the formula. Noé 15:59, 3 July 2017 (UTC)
Cleaning up उड़ना (uṛnā). —Aryaman (मुझसे बात करो) 16:43, 4 July 2017 (UTC)

Category:Coinages by language (tentative name)

I'd like to have a category for words which are known to have been coined by a specific person (example: evolutionarily stable strategy). There is Category:Neologisms by language, but I don't think all neologisms have necessarily a well-defined author. --Barytonesis (talk) 13:46, 2 July 2017 (UTC)

Every word is a neologism and a coinage, so I think neither category should exist. —CodeCat 14:52, 2 July 2017 (UTC)
Only a few words have a clear author, so a coinage category may be justified. — Dakdada 15:57, 3 July 2017 (UTC)
So "coinages by known individuals"? (Or named individuals; or groups; or...) Equinox 16:09, 5 July 2017 (UTC)

Proposal: automatically link all links without a section to the English section

There have been a lot of efforts in recent times to make sure that terms in non-English are wrapped in a template that tags them as such and adjusts the link target appropriately. Thus, I think it makes sense if all links, by default, link to English. This should make it easier for definition writers, because they can link words in a definition without worrying about where that link goes. The template {{def}} was created to alleviate this issue, and people have been adding {{l|en}} to definitions as well which is even worse. Moreover, a global solution would affect links in etymologies and in other places too.

This proposal of course only affects links to entries in the main namespace. It's also explicitly meant to be applied only in places where English text is expected, so it wouldn't be used in lists such as Derived Terms. Those would still use {{l|en}} to tag them, as before. —CodeCat 17:21, 2 July 2017 (UTC)

What are you proposing? Some javascript to automatically make links point to #English? DTLHS (talk) 17:31, 2 July 2017 (UTC)
I think so. Unless there's another way. —CodeCat 17:47, 2 July 2017 (UTC)
I'm unsure. How expensive is JS that only tags links in certain sections (for example, you seem to suggest not applying it to "Derived terms")? What is the actual benefit, given that English is already the top section (where a user lands) on almost all pages where an English section is present? How does that benefit compare to the drawback that many bare wikilinks that are not to English terms will be mislabelled? (For example, users sometimes use simple wikilinks to link to German or Russian words if they're long enough that the users think it's unlikely there'll ever be any other language section on that page.) - -sche (discuss) 18:31, 2 July 2017 (UTC)
People are currently using {{l|en}} in definitions, so that suggests that those people find a need for such section links. TabbedLanguages links to the last-used language section whenever a link has no section, which ends up always going to the wrong section when a link is in a definition, etymology, or anywhere else that has running English text. Perhaps only the behaviour of TabbedLanguages should be changed. —CodeCat 18:48, 2 July 2017 (UTC)
Support. — Ungoliant (falai) 18:34, 2 July 2017 (UTC)
Might be OK, but not using expensive JS. DCDuring (talk) 19:05, 2 July 2017 (UTC)
Tentative support I can imagine that there may be some mul use cases but I agree that they are probably going to be English definitions. If JavaScript seems like too much of a headache, just have a bot do it--that way it works for users with scripts disabled. —Justin (koavf)TCM 19:32, 2 July 2017 (UTC)
This seems like a solution without a problem to me; English is already where links go, since they land on the top of the page and English is the first language section. In the cases where Translingual precedes English that is likely the desired solution anyway. - [The]DaveRoss 12:06, 3 July 2017 (UTC)
Again, if that's the case, why do people use {{l|en}} in definitions? —CodeCat 12:28, 3 July 2017 (UTC)
You will have to ask them, but this proposal does not prevent anyone from using {{l|en}} incorrectly. - [The]DaveRoss 12:40, 3 July 2017 (UTC)
True, but I figured if they thought it was necessary, then I'd rather solve it in this way than by having {{l|en}} in definitions. Do you think we should disallow putting {{l|en}} in definitions? —CodeCat 13:23, 3 July 2017 (UTC)
@TheDaveRoss: There are a number of reasons why I (and others) often (not always) use {{l}} rather than bare links in definitions, most of which are already mentioned elsewhere in this thread: (1) if the English word is spelled the same as the foreign word being glossed (e.g. French correct), then a bare link won't provide a link at all, but will merely write the word in bold; (2) sometimes Translingual, not English, is the top entry on the page; (3) in Tabbed Browsing, following a link without an explicit language marking takes you to the same language you were just looking at if it's there, rather than the top entry (e.g. if you're at French corriger and click on a bare link to [[correct]], you will be taken to correct#French, not correct#English). —Aɴɢʀ (talk) 21:57, 3 July 2017 (UTC)
That is fair, I am making no judgment about whether or not it is acceptable to use {{l|en}} in definition lines. If that is a problem then I think there are other possible solutions that don't involve creating a pervasive new scripted process. It is also possible to achieve the same result in the limited cases where it is necessary using standard wiki-markup, e.g. [[correct#English|correct]]. This can even be enforced by bots since it is a very regular situation. As far as the Tabbed Browsing issue, I don't use the feature so I can't speak much about that, but it seems like a bug in Tabbed Browsing which we should not fix by changing the default behavior of the site. - [The]DaveRoss 12:27, 5 July 2017 (UTC)
@CodeCat I use {{l|en}} in FL definitions for words that share a page with the English translation. For example, the French entry for correct includes a link to the English section for the word so that the reader does not have to scroll up, past the Dutch section and the second half of the English section, in order to see the word. This is especially useful for obscure words that have a more full definition in the English section, and/or are several languages down the page. Is that what you're talking about? Andrew Sheedy (talk) 19:56, 3 July 2017 (UTC)
Hmm, this debate will probably never end :). I think it would be preferable to use the same (explicit, unambiguous & extensible) mechanism to link to other entries, regardless of the target language. “English is at the top of the page” means relying on an implementation detail of the current wiki presentation. Fixing it on the client-side with JavaScript isn't exactly a good solution. But those [[square bracket]]s are just too popular... – Jberkel (talk) 22:35, 3 July 2017 (UTC)
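As an illustration of the bot-enforceable markup mentioned in this thread ([[correct#English|correct]]), here is a minimal sketch. The function name addEnglishAnchor is hypothetical and does not correspond to any existing bot or gadget. It rewrites only bare links that have no pipe, no section anchor and no namespace prefix, and it has the drawback raised above: a bare link that was really meant for a non-English term would be mislabelled.

```javascript
// Hypothetical helper, not an existing bot: point bare wikilinks at the
// English section using standard wiki-markup.
function addEnglishAnchor(wikitext) {
  // Match [[target]] only when the target contains no pipe, no "#" anchor and
  // no namespace colon, so [[correct#French|correct]] and [[Category:...]]
  // are left untouched.
  return wikitext.replace(/\[\[([^\[\]|#:]+)\]\]/g, "[[$1#English|$1]]");
}

// Example call on a definition line.
console.log(addEnglishAnchor("to [[correct]] an error"));
```

A real bot would also need to restrict itself to definition lines and similar running English text, which this sketch does not attempt.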

Deleting template def

FYI, consistent with Wiktionary:Votes/2016-07/Placing English definitions in def template or similar, I proposed to delete {{def}} at WT:RFDO#Template:def. --Dan Polansky (talk) 20:48, 2 July 2017 (UTC)

Changing auto-generated categories at bottom of page

Hey all, I've been searching the Help pages and haven't found an answer to this. How do I edit the categories at the bottom of a page when they are apparently generated automatically? In particular, overstudious is listed as a 4-syllable word when it actually has 5 syllables. How do I correct this? Thanks for any help. BirdHopper (talk) 21:47, 3 July 2017 (UTC)

@BirdHopper: These are made by templates. In this case, it is {{IPA}}. "Oh-ver" is two and "stood-yuz" is two more, so it generates Category:English 4-syllable words. You may be thinking that it's "oh-ver-stood-ee-yuz" which is five. Since words can be pronounced different ways, it can be in both Category:English 4-syllable words and Category:English 5-syllable words but I don't know that this template has the option to add it to two categories at once presently. —Justin (koavf)TCM 21:51, 3 July 2017 (UTC)
Very interesting! I suppose I can understand how it could be pronounced with 4 syllables. From a GenAm standpoint, the 4-syllable variant is rare, which is probably why I didn't even consider there could be an alternative. I'll just let the issue go, then. Thanks for the insight! :) BirdHopper (talk) 22:06, 3 July 2017 (UTC)
@Koavf: Oh, the IPA template does it! Okay. I just added some syllable breaks (dots) to overstudious and it picked it up as having 5 syllables rather than 4. I don't mean to impose my limited experience of the world on everyone else, but the transcriptions, as written, do have 5 syllables. Now I'm curious what would happen if someone entered a 4-syllable version. I think I'll leave it as-is for now, but it's cool to know more about how that system works! :D BirdHopper (talk) 22:31, 3 July 2017 (UTC)
@BirdHopper: I don't have an example off-hand but I know that some entries have multiple instances of {{IPA}} and are in multiple categories because of it. If you put in both, it will be in both--again, even the same word can be pronounced differently and so will have different IPA transcriptions. Rather than replace the one, maybe have both? Actually, it's probably just the one that's correct. Saying it out loud seems wrong. —Justin (koavf)TCM 23:08, 3 July 2017 (UTC)
More detail: syllable counts are done by Module:syllables. It has a list of English diphthongs, and /iə/ is on that list, because New Zealand has /iə/ as a diphthong in words like here. So, to make the syllable-counting function understand that it's not a diphthong, you have to add syllable breaks. (This would be simpler if {{IPA}} were told what accent the transcription represented, and used that to determine which list of diphthongs to use.) I went through a lot of entries with /iə/ using AutoWikiBrowser and added syllable breaks a while back; I guess I missed this word. (Oh, I see the pronunciation was added recently.) — Eru·tuon 23:40, 4 July 2017 (UTC)
@Erutuon: Thanks for the extra information. Very interesting. I agree that an accent specification would be useful in these cases. For now, I'll have to be a little more aware of diphthongs in other accents and add syllable breaks if necessary. BirdHopper (talk) 16:46, 5 July 2017 (UTC)
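The effect of syllable breaks described in this thread can be sketched as follows. This is not the code of Module:syllables (which is a Lua module); the vowel and diphthong lists below are simplified assumptions chosen for the illustration, and the transcriptions in the example are illustrative rather than copied from the entry. A listed diphthong such as /iə/ counts as one nucleus unless a dot separates its two vowels.

```javascript
// Simplified, assumed inventories for illustration only; the real module
// has its own (longer) lists.
const VOWELS = new Set(Array.from("aeiouæɑɒɔəɛɜɪʊʌ"));
const DIPHTHONGS = new Set(["aɪ", "aʊ", "eɪ", "oʊ", "ɔɪ", "iə"]);

function countSyllables(ipa) {
  const chars = Array.from(ipa);
  let count = 0;
  let i = 0;
  while (i < chars.length) {
    if (!VOWELS.has(chars[i])) {
      // Consonants, stress marks, length marks and "." are skipped.
      i += 1;
      continue;
    }
    // A vowel starts a new nucleus; if it forms a listed diphthong with the
    // next character, both characters belong to the same nucleus.
    const pair = chars[i] + (chars[i + 1] || "");
    count += 1;
    i += DIPHTHONGS.has(pair) ? 2 : 1;
  }
  return count;
}

// Without a break, "iə" is read as a diphthong; with "i.ə" it is two nuclei.
console.log(countSyllables("/ˌoʊvəɹˈstjuːdiəs/"));
console.log(countSyllables("/ˌoʊ.vəɹˈstjuː.di.əs/"));
```

Under these assumptions the first call counts four syllables and the second counts five, which mirrors the change in category after the dots were added.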

"...th most common surname in the United States in 2010" (Xin)[edit]

Why do we want this information? Wyang (talk) 22:23, 4 July 2017 (UTC)

That seems wildly specific and virtually impossible to maintain. It's also probably not something that someone is looking for when looking at this word/phrase/term/entry. Unlike--e.g.--Nguyen, which is notably wildly popular in Viet Nam and is worth mentioning for context. —Justin (koavf)TCM 00:03, 5 July 2017 (UTC)
It seemed to me that if we were going to include surnames we ought to try to include some information about those surnames, such as how common they were and in what demographics. If someone has a decent dataset for demographic information outside of the United States I think they should feel free to add that as well, I do not have that information. As far as maintaining it, the US Government publishes the data in a machine-readable format every ten years with the census, it is fairly trivial to update it. - [The]DaveRoss 12:15, 5 July 2017 (UTC)
I think this is the same situation as the similarly problematic template {{en-rank}}: this is not dictionary material. As the entry itself demonstrates, it is actually composed of multiple etymologies, and it would be much more useful if the surname template could be modified to say "A surname of Chinese origin". Statistics showing how many people in the United States bear these surnames (and what ethnicities they are) are inconsequential in a dictionary. Wyang (talk) 12:47, 5 July 2017 (UTC)
What is or is not dictionary material is obviously subjective, and if the consensus is that frequency and demographic information about names is not worth including then I will, of course, defer to that consensus. The other conversations I have had about including these things have been positive.
Re "of Chinese origin," that information should, hopefully, be represented in the Etymology section, but it might not hurt to have it echoed concisely in the "definition" line. - [The]DaveRoss 13:08, 5 July 2017 (UTC)
Comparing it to en-rank supports this; what words are core language words and what words aren't is certainly dictionary material. I'm not sure how I feel about this; it's specialized dictionary material, which tends to move it to the edge of what we cover, but at the same time tends to say it's not clearly over the edge.--Prosfilaes (talk) 06:02, 8 July 2017 (UTC)

Alternative forms & quotations

Should a quotation of an alternative form/spelling be placed on the main lemma page, or on the form page? (e.g. should quotations of the term huomo – obsolete spelling of uomo – be placed on the former or the latter's page?) – GianWiki (talk) 00:18, 5 July 2017 (UTC)

I tend to decide this on a case-by-case basis. In this case, we're dealing with an obsolete spelling of an extremely common word, so I would add citations to huomo, because what's being attested is the specific spelling with an h, not the existence of the word uomo itself. But for rare words that are attested in multiple spellings, I'd put the citations all together in a single entry, so the reader can see that the word definitely exists but is spelled in a variety of ways. —Aɴɢʀ (talk) 10:27, 5 July 2017 (UTC)
Isn't this part of the reason we have a citations namespace? bd2412 T 13:14, 5 July 2017 (UTC)
I agree. They should be placed there. —CodeCat 13:43, 5 July 2017 (UTC)
If a spelling is RFVed (for example, if someone disputes that huomo exists as an alternative spelling of uomo), citations of it must be put in its entry—or less often on a citations page to which it then links—to prove it meets CFI. If a spelling is rare, some people do this pre-emptively. Everything else tends to be subjective / less agreed upon.
Some people add the earliest uses of English words to the lemma entries, even if the citations use other spellings—sometimes even if the citations are other languages, like Middle English (I see this even with Chaucerian examples that aren't the earliest) or Old English (few editors do this; it seems nonstandard/removable). Some people might put famous uses of words in any spelling on the lemma entries, too. Sometimes citations of one {{standard spelling of}} something are put on the entry for the standard spelling that has had content centralized on it.
But in general I would put citations on the Citations: page for the spelling they use (linked to and from the lemma's citations page via {{also}}) or on the lemma's citations page. - -sche (discuss) 15:31, 5 July 2017 (UTC)

Join the strategy discussion. How do our communities and content stay relevant in a changing world?


I'm a Polish Wikipedian currently working for WMF. My task is to ensure that various online communities are aware of the movement-wide strategy discussion, and to facilitate and summarize your talk. Now, I’d like to invite you to Cycle 3 of the discussion.

Between March and May, members of many communities shared their opinions on what they want the Wikimedia movement to build or achieve. (The report written after Cycle 1 is here, and a similar report after Cycle 2 will be available soon.) At the same time, designated people carried out research outside of our movement. They:

  • talked with more than 150 experts and partners from technology, knowledge, education, media, entrepreneurs, and other sectors,
  • researched potential readers and experts in places where Wikimedia projects are not well known or used,
  • researched by age group in places where Wikimedia projects are well known and used.

Now, the research conclusions are published, and Cycle 3 has begun. Our task is to discuss the identified challenges and think how we want to change or align to changes happening around us. Each week, a new challenge will be posted. The discussions will take place until the end of July. The first challenge is: How do our communities and content stay relevant in a changing world?

All of you are invited! If you want to ask a question, please ping me. You might also take a look at our FAQ (recently changed and updated).

Thanks! SGrabarczuk (WMF) (talk) 14:50, 5 July 2017 (UTC)

Well documented languages and Tagalog

Can someone please add Tagalog again to Wiktionary:Criteria for inclusion/Well documented languages? It was removed without a proper process. From reading Wiktionary:Criteria for inclusion/Well documented languages, the minimum would be a discussion in Beer parlour, whereas the removal was indicated to be driven by a RFV discussion as indicated in diff.

I do realize some think this is too formal. But as Wiktionary:Votes/pl-2017-05/Modern Latin as a WDL 2 shows, what some think to be consensus often turns out to be something else when a proper discussion or vote is created. --Dan Polansky (talk) 15:49, 5 July 2017 (UTC)

Etymology before Pronunciation

Hello again! As I've been adding audio, I've noticed a few pages where the Etymology section is placed after the Pronunciation section, as in gadgetry. This goes against Wiktionary:Entry_layout#List_of_headings. I know that entry layout is flexible, but I personally prefer consistency so I'm tempted to "fix" these issues. Can I assume that Wiktionary:Entry_layout is up-to-date and reflects current consensus regarding layout? I recall someone's user page (I don't remember who) that mentioned that the Entry Layout page needs updating. I'm always hesitant to start making edits when a set of guidelines might not be current.

There are other cases where I've seen Etymology after Pronunciation, as in chess. Here, it makes sense to have Pronunciation first as it is common to both etymologies. Just saying that, because I know there are always exceptions to the rules. However, even in this case, there is a guideline at Wiktionary:Entry_layout#Etymology where, again, pronunciation comes after/below etymology. In the case of chess, one would have to duplicate the pronunciation section.

I know these are just guidelines, and nothing is black and white. I'm just looking for some other opinions, or maybe a pointer to discussion about layout that I'm not aware of yet, before I start hacking away. Thanks! —This unsigned comment was added by BirdHopper (talk • contribs).

Yes you can change the layout. I wouldn't go out of your way to fix thousands of pages by hand since this can be fixed automatically if anyone cares to do so. DTLHS (talk) 18:00, 5 July 2017 (UTC)
Okay. And thanks BTW for adding a signature for me. That's the second time I've done that in as many days. Oops. BirdHopper (talk) 18:25, 5 July 2017 (UTC)
I always put pronunciation before etymology. That way it's consistent if there is one word with multiple etymologies. —CodeCat 18:57, 5 July 2017 (UTC)
@CodeCat: But one word can have the same etymology and two different pronunciations--e.g. perfect (purr-fict and pur-fekt). —Justin (koavf)TCM 20:15, 5 July 2017 (UTC)
There's two etymology sections on that page. Also, the "tense" noun is missing an etymology. —CodeCat 20:44, 5 July 2017 (UTC)
Just because there are two sections, doesn't mean there should be. BigDom 06:58, 11 July 2017 (UTC)
There should be as many sections as there are etymologies, of course. —CodeCat 12:08, 13 July 2017 (UTC)

Inline referencing definitions in English entries

I think that, in general, we should not be inline referencing English definitions of English words. Not using references has largely been our practice. We use attesting quotations, not references; for English words, references carry no weight as per WT:ATTEST.

I have removed an inline reference in abbate but was reverted. What do you think? --Dan Polansky (talk) 12:35, 6 July 2017 (UTC)

I agree that we do not (with the exception of what a few newcomers have done) and should not add <ref>s to definitions, at least not as references for the definitions. The definitions need to be based on how the terms are used, as indicated by citations, as you say. I agree with your edit to abbate (although ideally inline refs like that should be moved to "Further reading"). I have sometimes seen users add references to {{defdate}}s; that might be OK. I have also seen references added to context labels like "proscribed" and "offensive", but in those cases I think it is better to leave the label bare (unreferenced) and add the references to a usage note. - -sche (discuss) 21:05, 7 July 2017 (UTC)
I don't like references to {{defdate}}, but AFAIK there is no consensus for removing them, or else I'd remove them as well. This was a reference to the definition itself. I think a further reading item pointing to the offline Shorter Oxford is pretty useless for our readers, and I would prefer not to have it there, but let it be for now. The presentation of the reference is from a horror dream: "“abbate” in Lesley Brown, editor-in-chief; William R. Trumble and Angus Stevenson, editors, The Shorter Oxford English Dictionary on Historical Principles, 5th edition, Oxford; New York, N.Y.: Oxford University Press, 2002, ISBN 978-0-19-860457-0, page 3." It's a winner in a competition about how to make a reference specification as long as possible while providing close to nothing of value to the reader. --Dan Polansky (talk) 21:11, 7 July 2017 (UTC)

TabbedLanguages edit: default to English for unmarked links

TabbedLanguages currently sends you to the last-visited section, whenever you click a link that doesn't include a language section. I propose that this be changed so that it sends you to English by default, or if there is no English, to Translingual, and if there's no Translingual either, then to the last-visited section. Thanks to the efforts of various editors to add {{l}} and such to unmarked non-English links, and Daniel's work to fix all instances of {{term}} missing a language, most links to non-English terms are appropriately tagged. Thus, by far the most unmarked links in any non-English section are for English words; sending the user to English is only very rarely wrong, and when it is, it's always a result of a non-English term that has not yet been appropriately tagged. —CodeCat 14:12, 6 July 2017 (UTC)

Makes sense. --Dan Polansky (talk) 14:19, 6 July 2017 (UTC)
I agree that this should be fixed, my only comment is that perhaps Translingual should be the priority. Not a big deal since there aren't that many pages with both. - [The]DaveRoss 14:21, 6 July 2017 (UTC)
On a page such as hotel, it would be very undesirable for the link to go to Translingual by default. —CodeCat 15:32, 6 July 2017 (UTC)
I agree it would be better for plain-linked [[hotel]] to take you to hotel#English, not whatever language you were last reading, nor hotel#Translingual, nor the top of the page (which will take a non-logged-in user to the table of contents only). Doing this would obviate the need for the unpopular {{def}}. —Aɴɢʀ (talk) 17:11, 6 July 2017 (UTC)
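The fallback order proposed in this thread can be sketched as a small pure function. This is only an illustration of the proposal, not the TabbedLanguages source; the function name pickLanguageSection and its parameters are hypothetical. A section named explicitly in the link wins; otherwise English, then Translingual, then the last-visited section.

```javascript
// Hypothetical sketch of the proposed fallback order; not TabbedLanguages code.
// available:     language section names present on the target page
// lastVisited:   the section the reader was last looking at (may be null)
// linkedSection: the section named in the link, e.g. "French" for
//                [[correct#French]] (null for an unmarked link)
function pickLanguageSection(available, lastVisited, linkedSection) {
  if (linkedSection && available.includes(linkedSection)) {
    return linkedSection;
  }
  for (const candidate of ["English", "Translingual", lastVisited]) {
    if (candidate && available.includes(candidate)) {
      return candidate;
    }
  }
  // Nothing matched: fall back to the first section on the page, if any.
  return available.length > 0 ? available[0] : null;
}

// An unmarked link followed from a French entry to a page with several sections.
console.log(pickLanguageSection(["Translingual", "English", "French"], "French", null));
```

Putting English ahead of Translingual follows the point made above about pages such as hotel; swapping the first two candidates would give the alternative priority that was also suggested.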

Enabling Page Previews

CKoerner (WMF) 15:02, 6 July 2017 (UTC)

Which language section would it default to? Could it be changed via preferences? —Aryaman (मुझसे बात करो) 16:40, 6 July 2017 (UTC)
It would make sense if it defaulted to the language section given in the link, or to English (or the first section when there is no English) for plainlinks. I don't think it would be that useful for it to be configurable, because most non-English links have the language section specified (and the ones that don't, should). --WikiTiki89 17:28, 6 July 2017 (UTC)
Is this similar to the "Lupin" a.k.a. "Navigation" pop-up gadget? I have often wanted a version of that gadget that would fetch enough of the page that it would consistently fetch at least the first definition of the first (or specified) language section. - -sche (discuss) 21:12, 7 July 2017 (UTC)
For entries with {{wikipedia}} etc. I often see no substantive content at all from the popups we now have. I don't know whether an image is what we really need, rather than more - lots of - definitions. This could be very good. As with many improvements, configurability (suppressing graphics in my case) would be nice, but not at a high performance/server-load cost. DCDuring (talk) 19:43, 8 July 2017 (UTC)
(PoS header and definition lines seem essential; etymology header would be nice to indicate how much content there might be beyond what the page previews might be showing. Others might prefer other headers or content.) DCDuring (talk) 19:48, 8 July 2017 (UTC)

Wiktionary:Votes/pl-2017-07/Vote references in policies

Based on the discussions linked in the vote, I created Wiktionary:Votes/pl-2017-07/Vote references in policies. --Daniel Carrero (talk) 18:15, 7 July 2017 (UTC)

motî, iOS dictionary app released

This is a follow-up post to Looking for beta testers for new Wiktionary iOS app from November last year. There wasn't a great deal of interest in testing so unfortunately I didn't get much feedback. However the app is now publicly available in the App Store. It works offline, it's free, and it doesn't have ads (and never will). Right now there are only 10 languages but I plan to add more in later versions. The idea is to continuously update it based on recent dumps. – Jberkel (talk) 09:12, 8 July 2017 (UTC)

This looks really nice! I'd love to use it, but I have Android. —Aryaman (मुझसे बात करो) 13:30, 13 July 2017 (UTC)
I'd love to work on an Android version but will focus on iOS first. I also considered doing a simple HTML5/mobile web version, but the offline storage limits are still too low for the (quite heavy) dictionary data. – Jberkel (talk) 07:42, 14 July 2017 (UTC)
Good to hear news of your project! I have Android on my phone too, but I think a better mobile app than the one made for Wiktionary now is good news! Noé 12:12, 23 July 2017 (UTC)

What is the purpose of {{catlangname}} and {{topics}}?

What does {{catlangname|ru|calques}} do that [[Category:Russian calques]] doesn't? Similarly what does {{topics|ru|Electricity}} do that [[Category:ru:Electricity]] doesn't? I gather there are shortcuts {{cln}} and {{C}}, but there's also the category shortcut [[CAT:...]]. Benwing2 (talk) 22:06, 8 July 2017 (UTC)

Stops HotCat from being usable. Also saves a bit of typing if there are many categories? - -sche (discuss) 02:21, 9 July 2017 (UTC)
Sort keys. —CodeCat 11:24, 9 July 2017 (UTC)
@-sche "Stops HotCat from being usable" - how is that a good thing? A genuine question because I find HotCat very useful. BigDom 05:57, 11 July 2017 (UTC)
I believe -sche was being sarcastic. That sounded like a criticism against the templates. Like this: "We should stop using these templates, they stop HotCat from being usable". --Daniel Carrero (talk) 06:00, 11 July 2017 (UTC)
Or "Please can a JavaScript wizard make HotCat work properly"... — Eru·tuon 06:06, 11 July 2017 (UTC)
Yes! --Daniel Carrero (talk) 06:08, 11 July 2017 (UTC)
I would like to get rid of {{topics}}, but the only way we can do that is if categories automatically sort entries the right way. This is a feature we've been waiting on for years. Right now, {{topics}} is essential for reconstruction pages, which otherwise all get sorted under R for Reconstruction. It's also necessary for mainspace languages since we have custom sort keys. —CodeCat 12:06, 13 July 2017 (UTC)
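To make the sort-key point concrete, here is a rough sketch of the difference between a plain category link and one that carries an explicit sort key. It is not the implementation of {{topics}} or {{catlangname}}; the function name categoryLink and the sort-key rule (take the part of the title after the last slash and drop a leading asterisk) are assumptions made for this illustration, and real language-specific sort keys are more involved. The page title in the example is illustrative.

```javascript
// Hypothetical illustration only; not how {{topics}} or {{catlangname}} work
// internally.
function categoryLink(categoryName, pageTitle) {
  // Assumed rule for this sketch: sort by the part after the last "/" and
  // drop a leading "*", so a reconstruction page is not filed under the
  // prefix of its full title.
  const lastSegment = pageTitle.slice(pageTitle.lastIndexOf("/") + 1);
  const sortKey = lastSegment.replace(/^\*/, "");
  // Standard wiki-markup for a category with a sort key: [[Category:Name|key]]
  return "[[Category:" + categoryName + "|" + sortKey + "]]";
}

// A reconstruction-style title gets a sort key taken from its final segment.
console.log(categoryLink("ru:Electricity", "Reconstruction:Proto-Slavic/*tokъ"));
```

A bare [[Category:ru:Electricity]] carries no such key, which is the gap the templates fill according to the replies above.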

Compound and fiction etymology categories

Is there a particular reason that all the top etymology categories for types of compounds are top-level etymology categories? That is, the following categories:

I would propose that these should be placed under Category:Compound words by language. The by-language categories already work this way.

We similarly seem to have Terms derived from [work] categories such as Category:Terms derived from Harry Potter by language as top-level etymological categories rather than, as expected, children of Category:Terms derived from fiction by language. --Tropylium (talk) 13:28, 9 July 2017 (UTC)

Agree. That's what I suggested here. --Barytonesis (talk) 13:32, 9 July 2017 (UTC)
It would be better not to put subtypes of compounds under Category:Compound words by language, but rather under another category. The x by language is supposed to contain only language-specific categories for x. See, for instance, how Category:Lemmas by language does not contain Category:Nouns by language, Category:Verbs by language, Category:Adjectives by language; instead these are placed in Category:Lemmas subcategories by language. So I've gone and moved the primary subtypes of compounds by language to Category:Subtypes of compounds by language, as @Barytonesis proposed on the talk page mentioned above. — Eru·tuon 23:48, 9 July 2017 (UTC)
While a category like Category:Subtypes of compounds by language is necessary, it could be renamed; perhaps Category:Types of compounds by language would be better, or something else? — Eru·tuon 23:55, 9 July 2017 (UTC)

Sound changes: categories and etymologies

I'm thinking there should be categories for the sound changes that English words particularly have undergone. For instance, terms that have undergone yod-coalescence (nature, nation, tune, idjit, whatcha) or yod-dropping (lute, new, figure, beautiful: obviously both of these vary by dialect), terms affected by the horse–hoarse, Mary–marry–merry, cot–caught, or weak vowel mergers. The same could be done for sound changes in other languages.

English is in an odd situation because these sound changes are usually not reflected in spelling, and hence they are not visible in etymologies.

Categories could be easily added by {{accent}} ({{a}}) in pronunciation sections, because they already sometimes mention sound changes (see, for instance, before § Pronunciation). But I think etymology sections should also contain information about changes in pronunciation that aren't reflected in spelling. — Eru·tuon 23:38, 9 July 2017 (UTC)

Synchronic regional differences in pronunciation have no place in etymologies, in my opinion: they're part of the history of the lect, not of the term itself. That's not to say that a regional form which is spelled differently than other forms due to one of the sound changes in question shouldn't mention it- but the etymology for Mary should say nothing about the Mary-marry-merry merger. If you think about it, any random sequence of letters of the right shape to be interpreted as containing such phonemes will reflect the changes when individuals read them aloud (e.g. *morpliger will reflect differences between rhotic and nonrhotic lects), so it's not about the history of the term. The place for such things is in the pronunciation section, as part of illustrating the regional variation for the term. Chuck Entz (talk) 02:51, 11 July 2017 (UTC)
I agree, I don't see how the kind of information you mention could sensibly be incorporated into etymology. It would make no sense to mention in the etymology of Mary that it has undergone the Mary-marry-merry merger, for example, since (1) that statement is false for some number of speakers, and (2) it's not etymological information. - -sche (discuss) 02:59, 11 July 2017 (UTC)
Listing examples of sound changes sounds like something best suited for an appendix. English could definitely use one, perhaps also various other languages with "minor" unpredictable spelling rules. See, for example, Appendix:Hungarian words with ly. --Tropylium (talk) 15:23, 23 July 2017 (UTC)

Arabic script CSS font stack proposal

I ran into some problems with the current Arabic font stack, which seems to be rather nonsensical at the moment. I don't know what the logic is behind it. Here's a proposal I came up with for what to do about it: User:Radixcc/ArabicFontStackTest. It's a little hard to figure out what's going on with some fonts because I can't find a @font-face directive anywhere for including fonts currently in the CSS. — Radixcc 📞 16:44, 10 July 2017 (UTC)

Which fonts to use for Arabic has been discussed before; one of the issues that complicates things is that not all fonts display sequences of diacritics (vowels + shadda, shadda + vowels) well. But it has been several years since the last discussion, maybe fonts have improved. Perhaps it would be illustrative to check how Wikitiki's "Arabic font test" examples display in the fonts you propose, and the fonts that were previously rejected (to see if those have improved). Pinging some users who participated in the long discussion I linked to: @Wikitiki89, Atitarev, Mzajac. - -sche (discuss) 17:30, 10 July 2017 (UTC)
Ok I added the diacritic tests to my page. Now that I get to comparing them it seems like the problem with Droid Arabic Naskh is that it appears a bit larger than the others. The whole Arabic font situation sure is a headache. — Radixcc 📞 02:37, 11 July 2017 (UTC)

Focus search box by default on most pages

I am sure this has come up previously, although I didn't find any recent discussions on the Beer Parlour. An OTRS email suggested that it would be very useful for the user if the search box had initial focus on most pages. The German Wiktionary has had this feature implemented for years, and I copied it here for those who would like to test it out. Just add importScript('User:TheDaveRoss/searchFocus.js') to your common.js page to see what it is like. It does not focus the search box on a few pages where that would obviously be undesirable, such as edit pages and log-in pages.
Main question: should we implement this for all users by default? It is not without downside, but it is pretty handy most of the time. - [The]DaveRoss 19:26, 10 July 2017 (UTC)

My comment in the previous discussion was that (AFAIK) focusing a control always forces the page to scroll to a point where the control is visible. This can be very annoying, e.g. if you are visiting water#Occitan. A text box having focus might also block other shortcut keys that would normally scroll, such as PgUp/PgDn. Equinox 19:30, 10 July 2017 (UTC)
The scrolling to focused control problem is an important one, and I can see a few possible ways around that but it is a problem with the current implementation. One would be to only focus if the page URL does not include an anchor, which is crude but effective. There are probably cleverer solutions as well, but I would rely on people who have done webdev and understand how focus affects things in various browsers.
The page up and down, as well as the arrow keys, are still functional for scrolling with this enabled. - [The]DaveRoss 19:40, 10 July 2017 (UTC)
This SO Answer may be applicable, as long as it doesn't do the whole scroll up and down dance. - [The]DaveRoss 19:46, 10 July 2017 (UTC)
We could make the search box scroll with the page. DTLHS (talk) 19:44, 10 July 2017 (UTC)
Users could also be advised of the focus-search-box key (Alt+Shift+F in Chrome, might possibly vary by browser, and presumably not available on mobile). Equinox 20:01, 10 July 2017 (UTC)

Count me in as opposed to the idea--we don't need to control the users' browsers or behaviors any more than we already do. Scripts which unexpectedly take away control or focus are a real nuisance to me. —Justin (koavf)TCM 20:45, 10 July 2017 (UTC)

One could argue that this isn't unexpected, the current behavior is what is unexpected, but I agree that poor implementations of focus change are a real nuisance. - [The]DaveRoss 12:26, 11 July 2017 (UTC)

Actually, that "feature" (and iirc also one other stupid script) is the reason why I (as a German) am using enwikt instead of dewikt. So, pretty please don’t do it here. --Nenntmichruhigip (talk) 19:07, 26 July 2017 (UTC)
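The "only focus if the page URL does not include an anchor" idea from this thread can be sketched as below. This is not the content of User:TheDaveRoss/searchFocus.js; the function name shouldFocusSearchBox, the list of excluded pages, and the element id searchInput are assumptions made for the illustration. The preventScroll option of focus() is a standard way to avoid the jump to the focused control, though browser support should be checked before relying on it.

```javascript
// Hypothetical sketch; not the actual searchFocus.js gadget.
function shouldFocusSearchBox(href) {
  const url = new URL(href);
  // A link such as .../wiki/water#Occitan targets a section; stealing focus
  // there would pull the reader back to the top of the page.
  if (url.hash !== "") {
    return false;
  }
  // Assumed exclusions: edit forms and the log-in page.
  const action = url.searchParams.get("action");
  if (action === "edit" || action === "submit") {
    return false;
  }
  const title = url.searchParams.get("title") || decodeURIComponent(url.pathname);
  if (/Special:UserLogin/i.test(title)) {
    return false;
  }
  return true;
}

// Browser-only part, skipped when run outside a page (e.g. in Node).
if (typeof document !== "undefined") {
  // "searchInput" is assumed to be the id of the search field.
  const box = document.getElementById("searchInput");
  if (box && shouldFocusSearchBox(window.location.href)) {
    // preventScroll asks the browser not to scroll the box into view.
    box.focus({ preventScroll: true });
  }
}
```

Keeping the decision in a pure function makes the exclusion rules easy to test and adjust separately from the DOM call.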

temp:head or language-specific template?

I never seem to know which markup is preferable to use: {{head|fr|suffix}} or {{fr-suffix}} (I'm only using French as an example)? I just did this, is this all right? --Barytonesis (talk) 22:30, 10 July 2017 (UTC)

Language-specific templates only really make sense if they add something significant, which is not the case in your edit. So I think your edit was right. —CodeCat 23:00, 10 July 2017 (UTC)

Not bolding the initials of abbreviations, acronyms and initialisms

At some point, I'd like to create a vote to incorporate this rule in WT:EL:

"Abbreviations, acronyms and initialisms can't use bold letters like this: armoured combat vehicle in ACV. The correct form would be simply armoured combat vehicle."

I don't have actual numbers, but I believe this proposal likely reflects an unwritten rule already in practice. Most affected entries don't use the bold letters anyway, but sometimes I find a few entries that do.

If this passes, it would be kind of consistent with this 2010 vote: Wiktionary:Votes/pl-2010-03/Bolding letters in initialisms (based on Wiktionary:Beer parlour/2010/March#Bolding letters in initialisms). All participants voted "Oppose bolding", but I believe this simply means that no rule was effected. Apparently, EL was not edited in any way based on that vote. --Daniel Carrero (talk) 07:06, 11 July 2017 (UTC)

Though I think the bolding has been overused, there are occasions when the bolding makes clear how abbreviations of multi-word expressions ("MWE"s) are constructed from the components, where it is not immediately apparent. Similarly for some blends. IOW, we may have a preference, but a vote seems inappropriately rigid. A more complex proposal that attempts to address the exception I've identified is likely to be harder to understand, have surprising unanticipated consequences, and make for more rigidity. Less legalism, more use of dump-processing to support reviews of possible problematic overuse, misuse, etc. seems more wiki-like. If this flexibility is not to one's taste, perhaps WikiData is a better project. DCDuring (talk) 12:14, 11 July 2017 (UTC)
In the entry for ABQ bolding selected letters would be very handy. In fact the display forced by {{abbreviation of}} makes the desirable use of bolding impossible to implement while getting the benefits of {{abbreviation of}}. Note also that in this case the abbreviation is not even one of an MWE. DCDuring (talk) 12:26, 11 July 2017 (UTC)
@DCDuring: Maybe the wording could be something like: "bolding of initials is generally discouraged". Assuming we want some entries to have it, but not all or most. --Daniel Carrero (talk) 14:12, 12 July 2017 (UTC)
Why not just start off WT:ELE with the imprecation: "Try to do a good job of formatting." Another, probably more productive approach would be to eliminate all the bad formatting in existing entries so there are fewer bad examples for contributors to follow. This would probably be more productive than working on yet another vote that doesn't leave us in a better place than we are now. DCDuring (talk) 15:01, 12 July 2017 (UTC)
We'll eliminate all the bad formatting from entries as soon as someone writes an algorithm that can tell us what bad formatting is. Or as soon as we rewrite our entries so that they are actually parseable by a computer without inventing strong AI first. DTLHS (talk) 04:40, 13 July 2017 (UTC)
We don't have to do anything hard. We can identify entries that use the several templates used for abbreviations and also contain an emboldened capital letter followed without a space by one or more lowercase letters. Most of these will be for initialisms or acronyms for which, IMO, there is not sufficient justification for bold. The other cases may need manual review. As we have more or less standardized on the templates involved, we should thus easily identify many of the cases. Once this has been done, a dump could be analyzed for all remaining instances of such or similar uses of bold for parts of words, for manual review. Obviously if there is no consensus on the simpler cases, we can't proceed. DCDuring (talk) 05:49, 13 July 2017 (UTC)
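As a rough illustration of the check described above, here is a minimal sketch that flags wikitext for manual review. The function names are hypothetical, and apart from {{abbreviation of}} the template names in the list are assumptions rather than a confirmed inventory. The pattern only handles ASCII letters and plain triple-apostrophe bold, so it would miss accented initials and bold-italic markup.

```typescript
// Template names are assumptions; only {{abbreviation of}} is named in the thread.
const ABBREVIATION_TEMPLATES: string[] = ["abbreviation of", "initialism of", "acronym of"];

// Matches a bolded capital immediately followed by lowercase letters,
// e.g. '''A'''rmoured (ASCII letters only).
const BOLDED_INITIAL = /'''[A-Z]'''[a-z]+/;

// True if the wikitext calls any of the assumed abbreviation templates.
function usesAbbreviationTemplate(wikitext: string): boolean {
  return ABBREVIATION_TEMPLATES.some(function (name) {
    return wikitext.indexOf("{{" + name + "|") !== -1;
  });
}

// True if an entry both uses such a template and contains a bolded initial.
function needsBoldingReview(wikitext: string): boolean {
  return usesAbbreviationTemplate(wikitext) && BOLDED_INITIAL.test(wikitext);
}
```

Run over a dump, this would only produce a candidate list; cases such as ABQ, where bolding selected letters is arguably helpful, would still need a human decision.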
I remember this was discussed before and most people disliked the bolding. Unfortunately no idea when/where. Equinox 17:04, 12 July 2017 (UTC)
Other than the 2010 discussion I linked in my first message above? (I noticed you started that discussion.) --Daniel Carrero (talk) 17:10, 12 July 2017 (UTC)

Language sections

I very much like the presentation of languages in articles on www.mediawiki.org (example page). Would it be an enhancement to have it in all Wiktionaries? I mean, in every page, instead of language sections, to have a table from which the user selects the language to see. That way, a user who comes to a very heavy page with lots of language sections will not have the content obscured by other languages. I see they only have a <language> tag which probably does most of the work. Maybe in such an extension languages would not have to be in accordance with ISO codes, so that non-standard languages could be added (if this is desirable). As far as I understand, this works by creating a subpage for a language and just displaying it depending on user preferences (which we may or may not use in Wiktionaries). The user has the freedom to choose any language to see. --Xoristzatziki (talk) 14:15, 11 July 2017 (UTC)

There is a similar feature available here, it is called Tabbed Languages. You can enable it in Preferences > Gadgets if you would like to see it in action. - [The]DaveRoss 14:23, 11 July 2017 (UTC)
Can someone remind me why Tabbed Languages isn't enabled for all logged-out users? Entries with several language sections are basically a cluttered mess without it. This, that and the other (talk) 21:14, 12 July 2017 (UTC)
It was actually voted on and passed, but then nobody did anything about it. It should be done now. —CodeCat 12:03, 13 July 2017 (UTC)

Strategy discussion, cycle 3. Let's discuss a new challenge

Hi! It's the second week of our Cycle 3 discussion, and there's a new challenge: How could we capture the sum of all knowledge when much of it cannot be verified in traditional ways? You can suggest solutions here. You can also read a summary of discussions that took place in the past week. SGrabarczuk (WMF) (talk) 15:36, 11 July 2017 (UTC)

We already don't require "reliable sources", so I think we're ahead of the curve on that one. —CodeCat 12:10, 13 July 2017 (UTC)

Creating redirects to xx-IPA templates

Is it an accepted practice to create a redirect to an xx-IPA template (e.g. {{hu-ipa}} redirects to {{hu-IPA}})? I am copying a conversation from User:Liggliluff's talk page:

Hi, what is the purpose of the redirect to {{hu-IPA}}? --Panda10 (talk) 16:43, 13 July 2017 (UTC)

Because {{fa-ipa}} and {{ko-ipa}} exist, and if people are used to these, it'll be easier for them to find the others, and it's quicker and easier not to have to shift case.
But then, the other templates don't have lowercase redirects: {{ar-ipa}}, {{ca-ipa}}, {{cs-ipa}}, {{eo-ipa}}, {{et-ipa}}, {{fi-ipa}}, ...
And then you've got: {{grc-IPA}}/{{grc-ipa}}
I believe the standard naming convention is xx-IPA. But I will bring up the subject at Beer Parlour.

--Panda10 (talk) 18:43, 14 July 2017 (UTC)

Yes, the standard convention is xx-IPA. A few years back I moved all templates with deviating names to xx-IPA as long as they were luacized templates that automatically generated pronunciation information. Most redirects are there because of the page moves. There's nothing particularly wrong with having redirects from xx-ipa names, but there's no particular reason for them either. Just use xx-IPA. —Aɴɢʀ (talk) 08:17, 15 July 2017 (UTC)
Thanks. --Panda10 (talk) 13:58, 15 July 2017 (UTC)

CFI and Poorly-Attested Varieties of Well-Documented Languages

The whole distinction between LDLs and WDLs was intended to protect entries for lects with limited corpora of written texts, and yet there are large numbers of dialectal terms even in English that are hard or impossible to verify under the current rules.

For one thing, people have always tended to write only in the standard lect and only speak in the other lects (or at least not write anything that gets durably archived). Add to that the lack of standard spelling, which means that any single variation is less likely to be attested often enough, and you have the equivalent of many LDLs embedded within WDLs.

There's also the matter of historical variation in depth of attestation: modern technology has made it easier to produce, distribute, capture and preserve language, and attitudes about various sublects, not to mention tolerance in general for lects other than the standard ones, have changed over time.

Is there any way we could modify CFI to take this into account? Perhaps we could specify in the WDL list which sub-lects are well-documented, and exempt all the others from the WDL requirements. Either that, or add general parameters for which types of sub-lects should be exempted or not exempted. Chuck Entz (talk) 20:46, 14 July 2017 (UTC)

I definitely agree. It's much harder to attest uncommon variants of a language, and even very informal levels of language can be difficult to attest, for the same reasons. Andrew Sheedy (talk) 03:56, 15 July 2017 (UTC)
I agree as well. We should protect dialectal terms and old hapax legomena (e.g. لسپردرک (laspardarak)). --Vahag (talk) 08:19, 15 July 2017 (UTC)

Entries by (talk)

This user has been creating lots of entries for generic molecular formulae. I seem to remember that we don't accept these. Or do we? SemperBlotto (talk) 13:05, 16 July 2017 (UTC)

  • Most of his other entries are of rather poor quality - I have half a mind to block him. SemperBlotto (talk) 13:25, 16 July 2017 (UTC)
Now the user is adding phobias, some of which seem to be barely attested, and others of which just get a lot of mentions. But the user is probably adding this content in good faith, maybe not knowing that "mentions" don't meet CFI, so I think advising them on their talk page is better than blocking. - -sche (discuss) 18:02, 16 July 2017 (UTC)
Block them as a BrunoMed sock. At least I think that's who it is. Look for assembly-line-style use of the same verbiage whether it fits the entry or not. I'm not 100% sure, because I don't remember the geolocation details of their previous socks- except that they all geolocated to Croatia, as this one does. Chuck Entz (talk) 01:26, 18 July 2017 (UTC)


Based on the discussions linked in the vote, I created Wiktionary:Votes/pl-2017-07/Gallery. --Daniel Carrero (talk) 11:38, 17 July 2017 (UTC)

"Obsolete" forms that were never really used

The discussion that started here and the contributions of this anon give rise to a problem that we need to solve – what do we do with word forms that were never naturalised and are they to be considered "obsolete"?

Context: Romanian underwent a change of writing systems in the late 18th century. Transylvanian scholars adapted the Latin alphabet to the Romanian language, using orthographic rules from Italian. The Cyrillic alphabet remained in gradually decreasing use until 1860, when Romanian writing was first officially regulated. At the time, countless mixed alphabets were introduced, and some were even used simultaneously. If you were to read texts from this period, it wouldn't come as a surprise if one word was written using several alphabets; it all depended on who you were reading.

Adding to an already difficult linguistic period, there was also a tendency in the works of several scholars to re-Latinise the language ad absurdum (for instance Dicționarul limbii române, 1871-1876, by August Treboniu Laurian and Ion C. Massim).

E.g. the word băiețel ("little boy") was written as baiatellu in the aforementioned dictionary.

The spelling is completely subjective to the ideas and beliefs of the scholars who wrote it. It does not in any way, shape or form describe how the word was actually pronounced or written by the public.

Therefore, my question is: should word forms such as baiatellu be included under Alternative Forms as obsolete even if they were never actually used? The form does indeed have citations and would technically fulfil minimum requirements for inclusion, but it somehow feels wrong to add it considering that it was used only in a tight-knit circle of scholars and authors of the time. I have similar hesitations when it comes to forms without diacritics (e.g. -țiune vs. -tiune), because it would cause disarray amongst Romanian entries and possibly other languages too.

Any input is highly appreciated (@Redboywild, @Word dewd544). --Robbie SWE (talk) 09:57, 18 July 2017 (UTC)

Maybe a "hypercorrect" gloss? We've had a few such English entries where e.g. an æ spelling has been added that doesn't stand up to scrutiny. Equinox 12:20, 18 July 2017 (UTC)
I would tag them as both "obsolete" and "rare". If there's a standard explanation that you would use in a number of entries, you could also create a template. Chuck Entz (talk) 13:40, 18 July 2017 (UTC)
I'm sorry to have to be a massive pain, but the implications of said solutions are daunting. It would give every Romanian entry several alternative forms, most of which would be artificial and/or unknown to a majority of Romanians. These forms, especially the made-up Latin forms from the late 19th century, would erroneously suggest that Romanian was morphologically closer to the other Romance languages than it actually was. We would basically be accepting counterfactual efforts from some scholars to revamp the historical evolution of the Romanian language. I would personally not go anywhere near forms that popped up during this period – only veritable alternative forms and attested archaic/obsolete forms such as nație for națiune, pâne for pâine (DEX is pretty good at mentioning these alternative and archaic/obsolete forms together with veritable sources from prose and poetry), etc., deserve to be mentioned, IMHO. I think the problem is exacerbated by the lack of written sources in Romanian dating from before the 16th century. It makes it hard to create a historical timeline for the Romanian language – where we have Middle English and Old English, in Romanian we have nada. --Robbie SWE (talk) 16:13, 18 July 2017 (UTC)
It's up to you to decide what to work on. If a word appears in print it can be included, provided it is properly tagged, even if it was used by an author promoting historical revisionism. DTLHS (talk) 16:41, 18 July 2017 (UTC)
If there are three independent citations of forms like you mention (i.e. they don't just appear in the work of one author), then someone who wants to spend the time adding them can do so. I would suggest adding a few sentences at WT:ARO that describe the issue ("in the 1800s circles of scholars proposed many spellings for such-and-such reasons that never caught on and are now obsolete...") — perhaps WT:ARO#Moldovian_and_Cyrillic_Romanian (where the allowance of Old and New Cyrillic forms is explicit in Wiktionary:Votes/2011-10/Unified Romanian) can be generalized into a section on spellings. Then, one could make a qualifier template that links to that explanatory section, to put after those spellings when they're listed in alternative forms, and one could also make a "form of" template to use in the entries for the spellings themselves. Maybe the wording could be "obsolete respelling of X proposed in the 1800s"? - -sche (discuss) 17:31, 18 July 2017 (UTC)
I'm personally of the mind that we shouldn't really bother too much with these. It's just going to add unnecessary confusion for those who aren't very familiar with the language and its historical evolution, or otherwise just take a lot of effort to explain the (rather obscure) context of these forms of the words. At any rate, I won't really be involved with this, as I still have other things to work on. Word dewd544 (talk) 17:58, 18 July 2017 (UTC)
It's definitely within our ambit. If we're worried about adding unnecessary confusions, we should figure out how to record the information in a way that isn't confusing. This is hardly a problem limited to Romanian; few natural languages had the spelling standard that persists today (if the community has, indeed, decided on a spelling standard) upon the birth of writing in that language. Glancing at Shakespeare's First Folio, it seems we have much of the old 17th century spelling, but not linked from the modern spelling in any way.--Prosfilaes (talk) 21:33, 18 July 2017 (UTC)

Ok then. Humour me for a minute – let's play a game of what-if.

What if I were a scholar, specialised in linguistics with a strong proclivity for English. Let's say I hate foreign influence on the English language – Anglo-Norman, Latin and other Romance languages have ruined this Anglo-Saxon gem and nothing would tickle my fancy more than to cleanse the language from these aberrations. In a Tolkienesque manner I author a book proclaiming my agenda and later, a voluminous dictionary where I completely refurbish the English language – vocabulary, grammar and morphology, you name it, have all been Anglo-Saxified. Several fellow colleagues agree with me, write books about my work, cite me frequently and some even continue my purification crusade. Others criticise my work and call me a nutcase (and rightly so, if you ask me), nonetheless plenty of quotes, but not so many headlines (you know, the media is too busy covering Trump's latest tweet or something like that). Flash forward 150 years, someone finds my work and thinks "Wow, English sure has changed – I've never seen these words and archaic spellings before. I think I'm going to add them to Wiktionary as obsolete forms". These contributions are easily cited because finding citations is a piece of cake, so they pass WT:CFI. Mind you, no one else – authors, newspapers, mass media, or that Instagram celebrity who is famous for doing nothing – has ever used the words in my dictionary.

Back to the present. If the consensus is that the postulation above is feasible and something we should accept, then I think I'm going to need Prozac and a call to my shrink cause the world has gone mad I tell you! --Robbie SWE (talk) 18:48, 19 July 2017 (UTC)

All words in all languages. As long as they pass CFI, tag them as obsolete, rare (and maybe even make a special label or template with an informative link that says following the abandoned orthography of Dr. So-and-so). I'd only link to them from a lemma in an autocollapsed box with qualifiers. —Μετάknowledgediscuss/deeds 18:56, 19 July 2017 (UTC)
How would they be easily cited if no one used them? Three independent cites is not a trivial hurdle. There have been quite a few English spelling reformations, but few of them can offer even one real printing in the spelling; we count uses, not mentions. The Deseret alphabet, supported by the local government, could arguably not reach that level for anything.
There's a lot of marginal language use. People are welcome to not spend their time on anything they don't feel is worth it. But if someone wants to record this history of Romanian, it's entirely in our ambit.--Prosfilaes (talk) 21:42, 19 July 2017 (UTC)
"What-if"? See w:Linguistic purism in English. We've had problems with otherwise very good contributors- even admins- trying to push this. We document everything that's been actually used (as opposed to mentioned), but we explain what it is, and we don't allow uncommon forms as translations or in definitions, nor do we usually link to them from the main forms. That way someone who runs into it somewhere (e.g. Google Books) knows what it is, but we aren't promoting it. Chuck Entz (talk) 14:13, 20 July 2017 (UTC)
The point of my somewhat overdramatic "what-if" story is to exemplify what I believe to be an absurd stance – the situation for Romanian is that we have word forms coined by scholars, mentioned within their like-minded networks but not actually used by anyone else. I don't think it is in our best interest to add these forms as alternative forms in main namespaces (like the anon did), because it suggests that they were common at the time. I'm not going to work with these forms, but I dread that someone else will find them extremely interesting and add them to existing Romanian entries. --Robbie SWE (talk) 17:54, 20 July 2017 (UTC)
If they're not actually used, then they're not relevant. You say "coined by scholars, mentioned within their like-minded networks but not actually used by anyone else" which avoids the important question of whether they were actually used by anyone. If they were, then we should have entries on them. I believe we should add alternative forms on them, that we should link all alternative forms that are citable, with appropriate notes, but that's not the important thing.--Prosfilaes (talk) 19:26, 20 July 2017 (UTC)
Romanian forms with -tiune and (silent) ending u were used - and not just mentioned - and are attestable as per WT:CFI. If there are doubts, please use WT:RFVN. Those old spellings are similar to, for example, old English spellings, which are included in the English Wiktionary too. (It's not necessarily hypercorrect, and even English spellings with æ or œ are not necessarily hypercorrect.)
If -tiune and (silent) ending u were rare, then it could also be because Romanian was rarely written or rarely written in Latin characters in the 19th century, and not just because -țiune and u-less spellings were the common forms. Anyway, as others pointed out there could be more informative labels than just obsolete.
Latinising spellings, such as -tione instead of -tiune/-țiune in the dictionary by Laurianu and Massimu, probably aren't attestable. If they unexpectedly are attestable, then the label simply could/should be more informative than just obsolete, for example it could be [[Wiktionary:About Romanian#Spelling|Latinising spelling]]; obsolete, rare/uncommon. Also dates could be added in the label like 19th century Latinising spelling, or inventors could be mentioned if there are any and if they are famous like Latinising spelling following Laurianu and Massimu. (If the inventors are not famous, then it's not really helpful or useful to mention them in a label.) - 18:58, 23 July 2017 (UTC)
I think there's a general consensus that there's not a problem creating entries for these things. The real argument seems to be about linking them to entries using standard spellings, which is not an issue resolvable by WT:RFVN.--Prosfilaes (talk) 17:43, 24 July 2017 (UTC)

Adding Demotic

Can we add Demotic (the stage of Egyptian between Late Egyptian and Coptic, not the Greek vernacular) as a language (perhaps egx-dem)? It’s cropping up in a lot of Coptic etymologies (e.g. ϣⲉⲣⲓ (šeri), ϩⲁⲓ (hai), ϩⲟⲟⲩⲧ (hoout)) and some others (e.g. lily) with no clear way to link to it. The script and transliteration are different from (hieroglyphic/hieratic) Egyptian, as is the grammar and a good part of the lexicon, so that splitting it off wouldn’t result in significant duplication of content. Traditional lexicography keeps the two separated (cf. the Wörterbuch der ägyptischen Sprache vs. the Demotisches Glossar) with good reason. — Vorziblix (talk · contribs) 16:30, 19 July 2017 (UTC)

Sounds reasonable to me. —Aɴɢʀ (talk) 21:53, 19 July 2017 (UTC)
@Angr Since I don’t have the requisite admin rights to edit the module, could you (or any other admin) add the following to Module:languages/datax:
m["egx-dem"] = {
        canonicalName = "Demotic",
        otherNames = {"Demotic Egyptian", "Enchorial"},
        scripts = {"Latinx", "Egyd"},
        family = "egx",
        ancestors = {"egy"},
        wikipedia_article = "Demotic (Egyptian)",
}
and add the line
   ancestors = {"egx-dem"},
to Coptic in Module:languages/data3/c? (The tabs might need to be fixed if not copied from source.) Thanks. — Vorziblix (talk · contribs) 00:36, 20 July 2017 (UTC)
Added. DTLHS (talk) 00:47, 20 July 2017 (UTC)
Either way is fine, but looking over results from e.g. Google Books or web search, unqualified ‘Demotic’ in English almost always means Egyptian Demotic (or simply the adjective) and Egyptian Demotic is almost always called simply ‘Demotic’, whereas Greek Demotic is generally specified as such. Context also makes it rather unlikely that the two would be confused, especially given that we don’t have Demotic Greek as a language or dialect separate from (Modern) Greek. However, if consensus favors changing the name, it should be fairly easy to do. — Vorziblix (talk · contribs) 05:15, 20 July 2017 (UTC)
Yeah, it's common enough to call it just "Demotic" (like also the script Egyd).
But on the subject of naming conflicts, we have both Category:Egyptian languages and Category:Egyptian language, i.e. a family and a language have the same name. Wiktionary:Families advises that this should be avoided, but does it actually cause any problems other than in etymologies where "from Egyptian" (compare "from Germanic") and "from Egyptian" would be indistinguishable? If that's the only issue, it seems like it can be worked around without renaming anything. - -sche (discuss) 09:28, 21 July 2017 (UTC)

Vocalisation of laryngeals, again

Can anyone please help me deal with Victar in Reconstruction:Proto-Indo-European/h₂reh₁- and Reconstruction:Proto-Indo-European/Hreh₁dʰ-? The two given reconstructions make no sense. A sonorant in a zero-grade root becomes syllabic, this is standard PIE. So this means that the laryngeals next to it certainly don't become syllabic. Syllabic sonorants in Germanic develop an epenthetic -u- in front of them, which is what would be expected in such a form. The fact that something else is found implies that the reconstruction is wrong. How do I explain this? I'm tired of being forced into an edit war in order to keep dubious information out of Wiktionary. Clearly there is no consensus to include it, so why should it be included anyway? —CodeCat 21:20, 19 July 2017 (UTC)

It's definitely unexpected for HRHC- to make the second laryngeal syllabic rather than the R, but maybe someone's discovered a new sound law by which (word-initial?) HRHC- surfaces as post-laryngeal RəC- > RaC- in Germanic rather than the normally expected R̥̄C- > uRC-. Are there any PGmc words that do start with uRC- < HRHC-? All the uRC- words I can find in CAT:Proto-Germanic lemmas (*umbi, *und, *under, *unhtaz, *unseraz, *urbą) seem to come from *(H)R̥C-, not *(H)RHC-. —Aɴɢʀ (talk) 21:49, 19 July 2017 (UTC)
uRC from RHC: *kundaz, *kurną, *gulþą, *hurną, *hulliz, *þunnuz, *spurą. Kroonen notes for *bladą < *bʰl̥h₃tóm that the -a- must be secondary since it can't reflect an inherited form of the root in any grade. There may be other cases of such "impossible" grades with laryngeal-final roots in Germanic. —CodeCat 19:53, 20 July 2017 (UTC)
None of those are word-initial, though. —Aɴɢʀ (talk) 21:19, 20 July 2017 (UTC)
These are sourced from Kroonen, so in the absence of alternative analyses, I'm not sure what else we are supposed to do here. Perhaps we might add a question mark, and explain the actual issue in the PGmc entries themselves (once they have been created). --Tropylium (talk) 15:35, 23 July 2017 (UTC)
We're not required to go with Kroonen. This is one of the reasons why I opposed blindly following sources in the past. Sometimes they really do lead you somewhere nonsensical. My own interpretation of the situation is that Kroonen is probably essentially correct, but that the derivation is post-PIE. They would have occurred at a time when laryngeals were no longer consonants, but the laryngeal nature of certain roots was not yet entirely lost. A derivation like *bladą is only possible if speakers somehow "knew" that a (or a predecessor) was the vowel to be used in the zero grade of such laryngeal roots, which in turn must have arisen by analogy with CHC-shape zero grades where a is the regular development. However, it is important to note that there is no a in the past plural of strong verbs anywhere, even in verbs of laryngeal roots. Instead, classes 6 and 7, where most laryngeal roots are, have no zero grade altogether. —CodeCat 15:44, 23 July 2017 (UTC)
Most of what we list in PIE root entries under derivations are preforms (projections into PIE) and not proto-forms (comparative reconstructions) anyway. A single reflex in a single branch, say Celtic *sutus < "*séwH-tus" < *sewH- typically does not warrant reconstructing *séwHtus for PIE itself. This is after all why we (and also other reference works) normally list such descendants under just the PIE root, not under any actual PIE term.
So the wider question is: do these pre-forms have to adhere to canonical PIE grammar, or is it acceptable to give pre-forms that clearly were not PIE and indicate later formation? (Note that, from an Indo-Hittite viewpoint, this would also have to include quite a bit of morphology.) We do want to link later formations from PIE entries somehow, and the current approach seems like a workable compromise. Maybe we can add a disclaimer to WT:AINE about forms in derivatives-of-roots lists. --Tropylium (talk) 13:03, 26 July 2017 (UTC)
Normally, the actual reconstructable proto-forms are red/bluelinked and can have their own page, while the projections are unlinked. It becomes difficult when there is actually no possible PIE preform, like in the case of *bladą. We should mention in this case that it's not a PIE preform. —CodeCat 13:37, 26 July 2017 (UTC)

Wiktionary meetup 2, United States

I'll be in Sandusky, Ohio a lot this summer. I can set up a meeting there with someone who would be able to go to northern Ohio. We'll meet for ice cream or lunch or something. I don't care what we do. I just want to meet a Wiktionarian!

(I could also go to Cleveland, Columbus, Toledo, or any other city within that general area.) Reply or post on my talk page and we'll exchange contact info or whatever if necessary, and figure out where it is in Ohio you want to meet. PseudoSkull (talk) 00:21, 20 July 2017 (UTC)

Pinging Ruakh.​—msh210 (talk) 00:25, 23 July 2017 (UTC)
Thanks for thinking of me, but I live in the Seattle area now. —RuakhTALK 04:46, 23 July 2017 (UTC)

WT meetup, Spain

If anyone finds themselves near Barcelona this summer too, send me a message. --Recónditos (talk) 15:36, 20 July 2017 (UTC)

Damn I wish I could meet you! PseudoSkull (talk) 15:40, 20 July 2017 (UTC)
Haha, I have always wondered whether Wonderfool's Spanish location was a lie or not. Also whether he actually married an heiress. I FORGET NOTHING. Equinox 00:33, 23 July 2017 (UTC)
An heiress? Lol, no. --Recónditos (talk) 12:24, 23 July 2017 (UTC)

Strategy discussion, cycle 3. Challenge 4

Hi! The movement strategy discussion is still underway, and there are four challenges that you may discuss:

  1. How do our communities and content stay relevant in a changing world?
  2. How could we capture the sum of all knowledge when much of it cannot be verified in traditional ways?
  3. As Wikimedia looks toward 2030, how can we counteract the increasing levels of misinformation?
  4. and the newest one: How does Wikimedia continue to be as useful as possible to the world as the creation, presentation, and distribution of knowledge change?

The fifth and last challenge will be released on July 25.

If you want to know what other communities think about the challenges, there's the latest weekly summary (July 10 to 16), and there's the previous one (July 1 to 9).

If you have any questions, you may ask here (please, remember to ping me). The FAQ might be helpful as well.

Bot request: create entries for Japanese verb and adjective forms

Please someone use a bot to create the entries for all Japanese verb and adjective forms, if that's OK.

I've been trying to learn Japanese and I think maybe these entries would be helpful.

Unless people consider these entries unwanted for some reason. --Daniel Carrero (talk) 16:08, 20 July 2017 (UTC)

Not knowledgeable in Japanese, but I believe all inflections of all words in all languages should be added to Wiktionary regardless. So, technically, they should be welcome. PseudoSkull (talk) 19:45, 20 July 2017 (UTC)
I think that we have to decide what forms we want first. Also, there are conflicting views.
This article describes a set of conjugation rules widely used in order to teach Japanese as a foreign language. However, Japanese linguists have been proposing various grammatical theories for over a hundred years and there is still no consensus about the conjugations. Japanese people learn the more traditional "school grammar" in their schools, which explains the same grammatical phenomena in a different way with different terminology (see the corresponding Japanese article). (w:Japanese verb conjugation)
Because the Japanese language is written without spaces, different grammar systems tend to have different notions of what constitutes a word. The 学校文法 (gakkō bunpō, school grammar) system tends to cut sentences into smaller pieces to help understand the development of the language. It is used in Japanese schools and dictionaries, but is not designed for a foreign audience who have no experience with the language. A new grammar called 日本語教育文法 (nihongo kyōiku bunpō, Japanese-language education grammar) has been devised since the 1960s. It simplifies the “school grammar” system a lot and is widely used in learning materials for non-native speakers. The difference is that the former provides “stems” used to form words and the latter provides prefabricated forms to be used in sentences. (Appendix:Japanese verbs)
In Japan, adjectives and verbs alike all have 未然形, 連用形, 終止形, 連体形, 仮定形, and 命令形 (see our current entries), while for foreigners, verbs have dictionary form, a-form, i-form, u-form, e-form, o-form, and te-form, while adjectives are something else.
(Actually, I've been thinking about this. See [4].) —suzukaze (tc) 01:54, 21 July 2017 (UTC)
(@Eirikr, Atitarev, Dine2016, TAKASUGI Shinji, Fumiko Take, Wyang) —suzukaze (tc) 02:36, 21 July 2017 (UTC)
We should avoid adding bound forms because there is no consensus between traditional grammarians and modern linguists. The following forms may have their own pages:
Negative 書かない 食べない 来ない しない
Volitional 書こう 食べよう 来よう しよう
Polite 書きます 食べます 来ます します
Past 書いた 食べた 来た した
-te 書いて 食べて 来て して
Condition 書けば 食べれば 来れば すれば
Imperative 書け 食べろ 来い しろ
TAKASUGI Shinji (talk) 03:41, 21 July 2017 (UTC)
@Shinji, some questions for you:
What do you mean by "bound forms"? By some interpretations, all of the above except the Imperative are "bound forms".
  • If you mean the causative and passive, I think omitting these does a disservice to that portion of our user base who might be beginner-level studiers of Japanese. The rules for the passive, for instance, and whether to add れる (-reru) to the 未然形 (mizenkei, irrealis or incomplete form) or られる (-rareru) instead, are easy enough once you know them. But for anyone who doesn't know the rules, this kind of information is typically best presented in conjugation tables. And someone running across a verb form like 食べさせられました (tabesaseraremashita, was made to eat something, polite causative-passive past tense) may well not know that the lemma form is 食べる (taberu, to eat).
I'm not necessarily advocating that we create entries for forms like 食べさせられました, but I do think we need to ensure that a user searching for "食べさせられました" can somehow find their way to the lemma entry at 食べる, and have access (via tables, or links to other pages here or on Wikipedia) to the information needed to make sense of the longer conjugated forms. Perhaps having full entry pages is the best way to do this. Perhaps instead we just need to include the conjugated forms somewhere within the lemma pages. Or perhaps there's an altogether different approach. My concern is ensuring that users are able to find what they need.
Also, do you have any opposition to the inclusion of other polite conjugations, such as the past form -ました (-mashita), or the volitional form -ましょう (-mashō)? Various materials targeting English-speaking learners include the polite forms not as a single row, but as a whole column, showing each of the conjugations. ‑‑ Eiríkr Útlendi │Tala við mig 18:17, 21 July 2017 (UTC)
I meant non-final forms such as mizenkei (ex. 書か-, 食べ-). I should have said stem or radical probably. — TAKASUGI Shinji (talk) 23:55, 21 July 2017 (UTC)
Why not improve on the search function instead? Forms of verbs and adjectives are never ending for agglutinative languages. If the search string contains kana and is Japanese-looking, compare it with a repertoire list of all existing Japanese lemmas and their autogenerated forms. Output terms which are most similar to the search string, with their definition, sorted by increasing Levenshtein distance between the search string and term. That would suggest 食べさせる (tabesaseru)―the causative form of 食べる (taberu, to eat)―as the closest match for 食べさせられました (tabesaseraremashita). I think a similar approach is used by the online Korean dictionary Daum. Wyang (talk) 22:27, 21 July 2017 (UTC)
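The ranking idea in the post above can be sketched in a few lines. This is only an illustration in Python, not Wiktionary's search code: the function names and the tiny FORMS list are hypothetical stand-ins for a real repertoire of lemmas and autogenerated forms, and ties are broken alphabetically as an arbitrary choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(query: str, forms: list[str], limit: int = 3) -> list[str]:
    """Return the forms closest to the query, by increasing edit distance."""
    return sorted(forms, key=lambda f: (levenshtein(query, f), f))[:limit]


# Hypothetical repertoire; a real one would be generated from the lemma entries.
FORMS = ["食べる", "食べさせる", "食べます", "書く", "書かせる"]

print(suggest("食べさせられました", FORMS, limit=1))
```

With this particular list the causative 食べさせる comes out first, at distance 5. The result depends entirely on which forms are in the repertoire, so a fuller list could rank a different form ahead of it.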
I think it's a great idea to add Japanese inflected forms - verbs and adjectives. They already exist in CAT:Japanese verb forms and CAT:Japanese adjective forms. There's a lot of work though but it can be done with a bot. Care should be taken if a form coincides with another word, as in hiragana spelling of  () (koi) - こい (koi). I support the same for Korean verbs and adjectives and other languages. Recently Persian verb forms were added. The work on search function can be done in parallel. I also think that the forms in the inflection tables should be wikified (linked) as in the majority of inflection tables for other languages. --Anatoli T. (обсудить/вклад) 02:37, 22 July 2017 (UTC)
  • WF would quite like to do it. I remember once, about 10 years ago, WF wrote a bot to add inflected forms of Ancient Greek verbs. Knowing nothing about the language, and with only a smattering of botting experience behind him, he was promptly blocked. --Recónditos (talk) 12:22, 23 July 2017 (UTC)

Sorting Vietnamese

I just noticed that we don't have automatic category sorting for Vietnamese, which has an extremely diacritic-rich writing system. Should we? How does it work? Are the tone diacritics ignored for sorting purposes, so that à ả ã á ạ are all sorted as a? What about the non-tone diacritics? Are ă/â ê ô/ơ ư sorted together with a e o u, or are they sorted separately? And what about đ? Is it equivalent to d for sorting purposes, or are they separate? —Aɴɢʀ (talk) 10:56, 21 July 2017 (UTC)

For pinging purposes: our currently active editors who claim some knowledge of Vietnamese are @Wyang, Atitarev, Fumiko Take, HappyMidnight, Monni95, MuDavid, Mxn, PhanAnh123. —Aɴɢʀ (talk) 11:01, 21 July 2017 (UTC)
Thanks for the ping but I can only confirm the order "a, á, à, ả, ã, ạ" provided by Stephen below. In case it's not obvious, a, ă and â are separate letters (in this order), also the correct order for similar letters is: d, đ; e, ê; o, ô, ơ; u, ư. Digraphs (gi, kh, ng, nh, ph, th, tr) are not separate letters. --Anatoli T. (обсудить/вклад) 04:14, 22 July 2017 (UTC)
This was discussed previously at User talk:Fumiko Take#Sort. Vietnamese dictionaries sometimes have different practices of sorting the diacritics and tones, but I think the method proposed in the linked discussion is a good one to use. That will require that Module:links and Module:languages/data2 provide customisation for sorting so that the sort key can be generated externally by a sorting function (Module:vi-sort of sorts). Wyang (talk) 11:06, 21 July 2017 (UTC)
Based on the thread you linked to, I think at the very least we should edit Module:languages/data2 to strip the tonal diacritics. I can do that right now if there are no objections. Categories already ignore capitalization for sorting purposes for all languages. Anything beyond that would go beyond my editing abilities, but at least I can take the first step. —Aɴɢʀ (talk) 11:42, 21 July 2017 (UTC)
@Wyang: I've modified Module:languages so that the sort_key value in a language's data table can be the name of a module that contains a sortkey-generating function. The function (currently) must be named makeSortKey and it is automatically supplied the arguments text, langCode, scCode, the same arguments that are supplied to transliteration modules. That should allow you to create a Vietnamese sortkey-generating module. — Eru·tuon 19:11, 21 July 2017 (UTC)
My attempt (without knowing of Erutuon's edits):
	sort_key = {
		from = {
			'à', 'ả', 'ã', 'á', 'ạ',
			'ằ', 'ẳ', 'ẵ', 'ắ', 'ặ',
			'ầ', 'ẩ', 'ẫ', 'ấ', 'ậ',
			'è', 'ẻ', 'ẽ', 'é', 'ẹ',
			'ề', 'ể', 'ễ', 'ế', 'ệ',
			'ì', 'ỉ', 'ĩ', 'í', 'ị',
			'ò', 'ỏ', 'õ', 'ó', 'ọ',
			'ồ', 'ổ', 'ỗ', 'ố', 'ộ',
			'ờ', 'ở', 'ỡ', 'ớ', 'ợ',
			'ù', 'ủ', 'ũ', 'ú', 'ụ',
			'ừ', 'ử', 'ữ', 'ứ', 'ự',
			'ỳ', 'ỷ', 'ỹ', 'ý', 'ỵ',
			'ă', 'â', 'ê', 'ô', 'ơ', 'ư',
			'([1-5])([^%s]+)', -- move tone number to end of syllable
			'([a-z₁₂₃]+)([^a-z₁₂₃1-5]+)', -- add tone 0 to syllables that are not followed by a number
			'([a-z₁₂₃]+)$', -- add tone 0 to syllables that are followed by the end of the string
		},
		to   = {
			' ',
			'a1', 'a2', 'a3', 'a4', 'a5',
			'ă1', 'ă2', 'ă3', 'ă4', 'ă5',
			'â1', 'â2', 'â3', 'â4', 'â5',
			'e1', 'e2', 'e3', 'e4', 'e5',
			'ê1', 'ê2', 'ê3', 'ê4', 'ê5',
			'i1', 'i2', 'i3', 'i4', 'i5',
			'o1', 'o2', 'o3', 'o4', 'o5',
			'ô1', 'ô2', 'ô3', 'ô4', 'ô5',
			'ơ1', 'ơ2', 'ơ3', 'ơ4', 'ơ5',
			'u1', 'u2', 'u3', 'u4', 'u5',
			'ư1', 'ư2', 'ư3', 'ư4', 'ư5',
			'y1', 'y2', 'y3', 'y4', 'y5',
			'a₁', 'a₂', 'e₂', 'o₂', 'o₃', 'u₃',
			'%1' .. '0' .. '%2',
			'%1' .. '0',
		},
	},
It can transform the string Tuyên ngôn toàn thế giới về nhân quyền của Liên Hợp Quốc ; công bằng ; Đại ; Ác-si-mét into tuye₂n0 ngo₂n0 toan1 the₂4 gio₃i4 ve₂1 nha₂n0 quye₂n1 cua2 lie₂n0 ho₃p5 quo₂c4 ; co₂ng0 ba₁ng1 ; d₁ai5 ; ac4 si0 met4.
It's a shame that functions are not allowed in the data (Lua error in Module:languages at line 348: data for mw.loadData contains unsupported data type 'function'); using a function as the third parameter for gsub may have made dealing with diacritics easier.
edit: forgot about 'y' —19:49, 21 July 2017 (UTC)
suzukaze (tc) 19:45, 21 July 2017 (UTC)
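For comparison, here is a rough sketch of what a function-valued replacement buys you. It is written in Python, where re.sub accepts a callable much as Lua's gsub accepts a function as its third argument, and it is not the module code. The TONED table is a deliberately tiny, hypothetical subset, and the tone numbers simply follow the table above (grave 1, hook 2, tilde 3, acute 4, dot below 5).

```python
import re
import unicodedata

# Hypothetical subset of the toned vowels: character -> (plain letter, tone number).
TONED = {
    "à": ("a", 1), "ả": ("a", 2), "ã": ("a", 3), "á": ("a", 4), "ạ": ("a", 5),
    "é": ("e", 4), "ủ": ("u", 2),
}


def syllable_key(match: re.Match) -> str:
    """Strip the tone from one syllable and append it as a trailing digit."""
    tone = 0
    plain = []
    for ch in match.group(0):
        base, t = TONED.get(ch, (ch, 0))
        tone = tone or t  # keep the first tone found; 0 means no tone mark
        plain.append(base)
    return "".join(plain) + str(tone)


def sort_key(text: str) -> str:
    """Apply syllable_key to every run of letters in the lowercased text."""
    text = unicodedata.normalize("NFC", text.lower())
    return re.sub(r"[^\W\d_]+", syllable_key, text)


print(sort_key("toàn"), sort_key("của"), sort_key("Ác-si-mét"))
```

One callable replaces the three trailing patterns of the table, because the tone is moved and the default 0 is added in the same pass. Unlike the output quoted above, this sketch leaves hyphens in place.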
@Suzukaze-c, Wyang: I'll see if I can convert that long series of replacements into a function in Module:vi-sortkey, unless either of you is working on a function now. It might be more efficient to first decompose, then handle the diacritics. — Eru·tuon 20:22, 21 July 2017 (UTC)
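The "decompose first" idea might look roughly like the following. This is a Python sketch using the standard unicodedata module, not the contents of Module:vi-sortkey; the function name split_tone is made up for illustration. It relies on the fact that canonical decomposition (NFD) turns each toned vowel into a base letter plus combining marks, so the five tone marks can be pulled out while the breve, circumflex and horn stay attached.

```python
import unicodedata

# Combining marks that encode tone; anything else (breve, circumflex, horn) is kept.
TONE_MARKS = {
    "\u0300": "huyền",  # combining grave accent
    "\u0301": "sắc",    # combining acute accent
    "\u0303": "ngã",    # combining tilde
    "\u0309": "hỏi",    # combining hook above
    "\u0323": "nặng",   # combining dot below
}


def split_tone(syllable: str) -> tuple[str, str]:
    """Return (syllable without its tone mark, tone name); 'ngang' if unmarked."""
    tone = "ngang"
    kept = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    # Recompose so that e + circumflex becomes the single letter ê again.
    return unicodedata.normalize("NFC", "".join(kept)), tone


print(split_tone("thế"), split_tone("hợp"), split_tone("ngôn"))
```

With this approach the sixty-odd literal replacements collapse into a five-entry table, and đ passes through untouched because it has no canonical decomposition.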
I've added a subscript 0 for unmodified vowel letters (that is, a plain vowel letter with or without a tonal diacritic), to make sure that the modified ô and ơ sort directly after plain o. Otherwise, I wonder if modified vowel letters would sort in unacceptable positions. (Hypothetical example: ngôn (ngo₂n0) should sort directly after ngon (ngon0), but perhaps it would sort after ngoy because ₂ would sort after y. So I think ngon should have the sortkey ngo₀n0.) But I don't know how sortkeys work when non-alphabetic characters are involved, and I could be wrong. Does anyone know if the subscript 0 is needed? — Eru·tuon 21:45, 21 July 2017 (UTC)
Great work on Module:vi-sortkey, thanks. I'm not working on this at the moment, so please feel free to make any changes. Not sure about the sorting algorithm in Lua either, but a good method of testing whether the entries are properly sorted would be to check whether the {{der3|lang=vi}} output using a large number of Vietnamese words is correct. Wyang (talk) 22:20, 21 July 2017 (UTC)
I think I might have been wrong about needing subscript 0, but I'm getting confused now. If someone could look at the documentation page of the module and figure out if the function is working, I would appreciate it. You can process a list of words using the showSorting function on the documentation page of the module. — Eru·tuon 22:58, 21 July 2017 (UTC)
Okay, yeah, my reasoning above was wrong. ngôn should sort after both ngon and ngoy, as ô is a different letter from o. So I think the sort order is fine now. But if someone could confirm, that would be great. — Eru·tuon 23:15, 21 July 2017 (UTC)
Not sure if this will be helpful or not. There is confusion in Western software for Vietnamese in regard to sort order. In Microsoft Word 2010, the order is given as a, à, ả, ã, á, ạ. In Microsoft Excel 2010, it is: a, á, à, ã, , . These are incorrect. The MS Word 2010 order comes from the physical order of the Vietnamese tones on a Vietnamese keyboard. The order of the keys is not the sort order.
The alphabet, in correct order, is: a ă â b c d đ e ê g h i k l m n o ô ơ p q r s t u ư v x y (the twelve vowels being: a, ă, â, e, ê, i, o, ô, ơ, u, ư, y). The six tones are: a, á, à, ả, ã, ạ, in this order. Therefore, the vowel a, including its associated forms ă and â, take up eighteen places in the sort order:
a, á, à, ả, ã, ạ
ă, ắ, ằ, ẳ, ẵ, ặ
â, ấ, ầ, ẩ, ẫ, ậ
Altogether, the 12 vowels plus 6 tones take up 72 places in the sort order. —Stephen (Talk) 01:18, 22 July 2017 (UTC)
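The ordering described above can be tried out with a small sketch. It is written in Python and is not the code of Module:vi-sortkey; the names are invented for illustration. It assumes the comparison is made character by character, with each letter ranked by its alphabet position and, within a letter, by tone. Letters outside the Vietnamese alphabet, and spaces, are simply placed after y, which is an arbitrary choice.

```python
import unicodedata

ALPHABET = "a ă â b c d đ e ê g h i k l m n o ô ơ p q r s t u ư v x y".split()
VOWELS = "a ă â e ê i o ô ơ u ư y".split()
# Tone order as given above: (ngang), sắc, huyền, hỏi, ngã, nặng.
TONE_ORDER = ["\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]

LETTER_RANK = {unicodedata.normalize("NFC", c): i for i, c in enumerate(ALPHABET)}
TONE_RANK = {mark: i for i, mark in enumerate(TONE_ORDER, start=1)}


def char_rank(ch: str) -> tuple[int, int]:
    """Rank one precomposed character as (alphabet position, tone position)."""
    tone = 0
    base = []
    for c in unicodedata.normalize("NFD", ch):
        if c in TONE_RANK:
            tone = TONE_RANK[c]
        else:
            base.append(c)
    letter = unicodedata.normalize("NFC", "".join(base))
    return (LETTER_RANK.get(letter, len(ALPHABET)), tone)


def sort_key(word: str) -> tuple:
    """Sort key for a single word: one (letter, tone) pair per character."""
    return tuple(char_rank(ch) for ch in unicodedata.normalize("NFC", word.lower()))


print(sorted(["ạ", "ã", "ả", "à", "á", "a"], key=sort_key))
print(sorted(["ngôn", "ngoy", "ngon", "đa", "da"], key=sort_key))
print(len(VOWELS) * (len(TONE_ORDER) + 1))  # 12 vowels x 6 tones
```

Switching to the other tone order only means reordering TONE_ORDER. A key that compares the whole syllable first and the tone last, as in the replacement-table attempt earlier in the thread, would order some words differently from this per-character version.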
@Stephen G. Brown: I've added the order of tonal diacritics that you describe to Module:vi-sortkey. @Fumiko Take gave the "Microsoft Word 2010" order in the talk page discussion linked above, saying that it was used by the Institute of Linguistics of Vietnam and Vietnam National University Publishing House. I can't verify either claim, but the order can be changed easily if necessary. — Eru·tuon 01:58, 22 July 2017 (UTC)
Interesting, I didn't know it was used for Word. Anyway, I consulted those huge dictionaries published by those institutions, but there don't seem to be any online copies or previews, so I guess you'll just have to take my word for it. ばかFumikotalk 06:02, 22 July 2017 (UTC)
I'm not sure if there's such a thing as a "correct" order. Normally, whenever I recite the tones, it's "ngang, sắc, huyền, hỏi, ngã, nặng", which is how I learned them at grade school. But the dictionaries seem to use either the Tang-poetry-inspired order, or the one that parallels the four tones of Middle Chinese (ngang and huyền - level; hỏi and ngã - rising; sắc and nặng - departing/checked). ばかFumikotalk 06:09, 22 July 2017 (UTC)
MS Word is a word processing program, which is what would be needed to compile, edit, and print big Vietnamese dictionaries. It's likely that the Institute of Linguistics of Vietnam and the Vietnam National University Publishing House used MS Word to produce those dictionaries. Twenty-five years ago, they would have had to sort all of the entries by hand, which is a huge job. Usually they had to write each entry on a card, which they then stored in long card-file boxes designed for the purpose. They moved the cards around by hand to achieve sorting, and then they would type the information from the cards. Today, they can use computerized sorting, which is accurate and almost instantaneous. Those institutions and publishers probably accepted the MS Word word order. To do otherwise would have been difficult and expensive. So what does this mean? Maybe Vietnam is accepting this new word order as an official one. You are our expert for Vietnamese, Fumiko, so the decision is yours to make. If the Institute of Linguistics prevails on MS to use a different sort order in the next edition of MS Word, then it will be easy for us to change our word order as well. So whatever you decide is okay with me. —Stephen (Talk) 12:02, 22 July 2017 (UTC)
I'm not aware of any respectable source that uses the "ngang, sắc, huyền, hỏi, ngã, nặng" order (most dictionaries I've seen that do are from inferior publishers who can't even decide whether to use "từ điển" or "tự điển", so it's safe to just disregard them altogether), so I guess you'll just have to go with the "ngang, huyền, hỏi, ngã, sắc, nặng" order. ばかFumikotalk 07:04, 23 July 2017 (UTC)

I've added the sortkey module to the data table for Vietnamese. It currently uses the order given by @Stephen G. Brown, rather than the order of the Institute of Linguistics, but it can be switched easily if @Fumiko Take wants to go with the other order. — Eru·tuon 04:56, 23 July 2017 (UTC)


http://dotsies.org/ Huh. —Justin (koavf)TCM 06:45, 22 July 2017 (UTC)

It totally misses on the fact that the human brain is good at recognising shapes. —CodeCat 10:03, 22 July 2017 (UTC)
Sort of fun. Did the creator lose interest? No tweets since 2013. Equinox 10:08, 22 July 2017 (UTC)
 --Daniel Carrero (talk) 10:30, 22 July 2017 (UTC)

"Proverb" ain't a part of speech

Just a thought: remember how we started to get rid of the Initialism and Abbreviation headers because they aren't actually parts of speech (e.g. BBC functions as a proper noun)? - though we still have lots of relics like TLA. Shouldn't we also get rid of Idiom and Proverb on the same grounds? Obviously it's good to know when something is a proverb (and we could use the normal categories for this, maybe a {{lb|en|proverb}}), but Proverb definitely isn't a PoS. And I never really knew what Idiom was good for. We would still have Phrase as a wastebasket taxon for anything that doesn't fit into another PoS. I'm not too bothered either way, but it feels consistent and logical, especially if we're moving towards some semantic (WikiData?) model where a PoS header needs to represent an actual PoS. What is your opinion? Equinox 08:59, 23 July 2017 (UTC)

Can we not call them Sentence? —CodeCat 10:44, 23 July 2017 (UTC)
They are not guaranteed to be sentences though the entry is likely to be the core of a sentence. DCDuring (talk) 11:55, 23 July 2017 (UTC)
I think that "Proverb" is a more-or-less perfect name to describe a proverb. --Recónditos (talk) 12:18, 23 July 2017 (UTC)
Yeeeees, the question is more whether we should put it in a gloss. "Football" is a good gloss for a lot of your sports journalism trash but you don't put that between the double equals signs. Equinox 12:31, 23 July 2017 (UTC)
I don't see any advantages in putting Phrase instead of Proverb, are there any proverbs which aren't phrases? Crom daba (talk) 13:50, 23 July 2017 (UTC)
The common core of a proverbial expression that takes many forms could be a non-constituent, ie, not a phrase. I don't remember whether we have made entries of that kind. DCDuring (talk) 16:20, 23 July 2017 (UTC)
  • I'm fine with "Proverb", although I'd support getting rid of "Idiom" (which is usually supplanted by {{lb|xx|idiomatic}}). —Μετάknowledgediscuss/deeds 16:22, 23 July 2017 (UTC)
Support highly. Please remember that section headings for POSes are for POSes and not for anything else. Should we also consider "formality" a POS, for instance? PseudoSkull (talk) 02:39, 24 July 2017 (UTC)
The argument in favour of keeping proverb as a part of speech is that they are typically used in a more isolated fashion than other phrases. They can always(?) stand alone, whereas other phrases are woven into a sentence no differently than any other word. I think "Phrase" should go, however, and I can't recall ever seeing "Idiom", but that seems out of place to me. "Phrase" and "Idiom" don't inform the reader how a term/expression is used. "Proverb" on the other hand, does. Andrew Sheedy (talk) 04:33, 24 July 2017 (UTC)
  • A note: Idiom is already explicitly forbidden by WT:ELE, so we don’t need any changes to start getting rid of it. — Vorziblix (talk · contribs) 09:09, 24 July 2017 (UTC)
Proverb, letter, suffix, prefix, symbol, definitions... the easiest solution is not to get rid of “proverb” as a part of speech, but to stop calling our definition-section headings “part of speech” in the first place. — Ungoliant (falai) 15:31, 24 July 2017 (UTC)
I'd support saying "Definitions section" instead of "POS section" in the future. EL could be edited to arrange that if people want. Some entries (Chinese I believe) even have "Definitions" as a POS header. --Daniel Carrero (talk) 15:34, 24 July 2017 (UTC)
I think that would be a good idea. Chinese does use a definitions section sometimes because (I think) many words have ambiguous functions, e.g. with many nouns easily being used as adjectives or adverbs. —Aryaman (मुझसे बात करो) 19:20, 26 July 2017 (UTC)

I would say that a proverb is a kind of set phrase, as hypernym. At least so I would say in Spanish, French and Catalan terminology with its equivalents frase hecha, phrase faite, frase feta. --Vriullop (talk) 07:57, 27 July 2017 (UTC)

Japanese pitch accent requests by Special:Contributions/

Is it fair to bulk-request Japanese pitch accents, as in インボイス (inboisu)? They are not so readily available in dictionaries. Hardly present in online dictionaries and occasionally available in printed dictionaries and paid apps. --Anatoli T. (обсудить/вклад) 09:38, 23 July 2017 (UTC)

Yes, we should definitely include them. If they are not readily available, that's even more reason for us to provide them. —CodeCat 10:43, 23 July 2017 (UTC)
I've added the ones I could find in Daijirin. Wyang (talk) 11:01, 23 July 2017 (UTC)
@Wyang Thanks. For me, accessing Daijirin has become cumbersome. BTW, shouldn't [ìńbóꜜìsù] show "m" for consistency? --Anatoli T. (обсудить/вклад) 11:42, 23 July 2017 (UTC)
That thing (whatever it is called) is based on the romanisation (inboisu). Wyang (talk) 11:44, 23 July 2017 (UTC)
I've been using Weblio辞書 for Daijirin accents, but unfortunately it doesn't show which vowels are devoiced. --Dine2016 (talk) 16:01, 23 July 2017 (UTC)
@Dine2016 Thanks. I forgot about this resource. It's the only one online, I think. I purchased Daijirin for 17 AUS$ but my Android is now malfunctioning and I have switched to an iPhone. Unfortunately, there is no licence transfer and I am not sure if I'll use this current phone for long. It's a problem with purchased apps. --Anatoli T. (обсудить/вклад) 07:19, 24 July 2017 (UTC)
Any Australian IP that does mass "theme" edits with difficult languages like that is probably an Awesomemeos sock, though I can't be certain enough to start playing whack-a-mole with them. It looks like they've figured out how to keep from always geolocating to the same place (though in this case they probably actually are in Sydney), but their approach to editing is pretty distinctive. Chuck Entz (talk) 03:30, 27 July 2017 (UTC)

Request for adminship

The main motivation is to be able to edit JavaScript pages (i.e. gadgets, MediaWiki:common.js, others' JavaScript pages, etc.). Unfortunately, template editor does not allow me to edit JS pages.

What I am going to do:

  • General cleanup of javascript infrastructure.
    • Mainly that includes moving stuff from one place to another.
  • Extract gadgets from MediaWiki:Gadget-legacy.js and also make disabling legacy gadgets in preferences not result in a catastrophe.
  • Modernize gadgets (that is, use jQuery, clean up code, drop deprecated code, etc.)
  • Rewrite LangMetadata (currently defined in MediaWiki:Gadget-TranslationAdder.js) to use modules rather than hardcoded data.
  • Eliminate the use of langrev subtemplates in MediaWiki:Gadget-TranslationAdder.js and possibly add a better autocomplete.
  • Eliminate JsMwApi in favor of MediaWiki's own Api.

Let's make Wiktionary great again!

Any objections?--Dixtosa (talk) 12:01, 23 July 2017 (UTC)

We definitely need someone who is willing and able to tackle these issues. There are a few more open issues with translation tables as well:
  • The conversion of the translation adder code to not rely on a fixed table structure, but instead use the translations-cell CSS class which was added to {{trans-top}} some time ago.
  • Deprecation and removal of {{trans-mid}} in favour of CSS-based balancing, which also includes the removal of all balancing-related features from the translation adder. This relies on the previous step.
  • Migrating translation tables to use vsSwitcher, which doesn't need a surrounding div.
  • Redoing the "favourite languages" feature of translation tables, so that favourite languages are shown as a reduced translation table in the table's collapsed state, rather than in the header of the table. This relies on the previous change, since the older NavFrame system does not allow for content to be shown in the collapsed state, whereas vsSwitcher does.
CodeCat 15:53, 23 July 2017 (UTC)
Wiktionary:Votes/sy-2017-07/User:Dixtosa for admin DTLHS (talk) 15:58, 23 July 2017 (UTC)
I'd be particularly in favor of any JS improvements that resolved the intermittent, chronic problem with loss of the show/hide controls and sometimes other functionality implemented in JS. DCDuring (talk) 16:24, 23 July 2017 (UTC)
Sounds great. I can get on board with "MWGA". I second @DCDuring's comment. — Eru·tuon 21:31, 23 July 2017 (UTC)
I am not sure exactly how bot work relates to adminship. But Dixtosa is a name that I trust. So sure. Equinox 22:03, 23 July 2017 (UTC)

Wiktionary:Votes/2017-07/Rename categories

Based on the discussions linked in the vote, I created Wiktionary:Votes/2017-07/Rename categories. This is a large project, so this vote will start in two weeks and then it will end in two months. --Daniel Carrero (talk) 13:35, 24 July 2017 (UTC)

Limiting user vote creations

Is there a limit to how many votes a user can create in a given time? If not, I think there should be. --Victar (talk) 19:17, 25 July 2017 (UTC)

What limit would you like, exactly? --Daniel Carrero (talk) 19:27, 25 July 2017 (UTC)
I think we just need to have every vote approved by at least five editors or so (in the BP or maybe elsewhere, depending on the topic) before it can be created. --WikiTiki89 20:25, 25 July 2017 (UTC)
How would brigading be dealt with? -- 20:41, 25 July 2017 (UTC)
What exactly do you mean by brigading? --WikiTiki89 20:48, 25 July 2017 (UTC)
How to ensure a neutral assessment of the eligibility of a vote? Should votes be made about votes? -- 23:15, 25 July 2017 (UTC)
To clarify, if five editors (or however many we decide) want to have a vote and a hundred editors don't, we would still have the vote, because those five editors approved it. --WikiTiki89 14:51, 26 July 2017 (UTC)
I agree, I think each vote should get pre-approval. I think users should also only be able to create max 2-3 votes per month. --Victar (talk) 21:13, 25 July 2017 (UTC)
I think if each vote is pre-approved, we won't need any rate limit. --WikiTiki89 23:12, 25 July 2017 (UTC)
You say that, until someone puts in 10 vote proposals. --Victar (talk) 23:17, 25 July 2017 (UTC)
If five other editors approve each one, what's the issue? Anyway, I don't think we need formal vote proposals. If everything is done right, the issue should already have an ongoing discussion before it is decided that there needs to be a vote. --WikiTiki89 14:51, 26 July 2017 (UTC)
How would the pre-approval process work? Wiktionary:Votes/pl-2015-09/Coauthoring policy votes suggests this: "The proposed requirement that all policy votes have at least one coauthor, that is, a distinct individual who at the very least makes one edit to the descriptive section of the voting page before it starts, even if just to list themselves as a contributor." That vote was created in 2015 and never started. As of today, the vote does not meet its own requirements to start: it doesn't have two contributors yet. --Daniel Carrero (talk) 12:08, 26 July 2017 (UTC)
By five (or however many we decide) editors mentioning in the discussion of the issue that there should be a vote. --WikiTiki89 14:51, 26 July 2017 (UTC)

At the moment nine different votes are running, created by five different users. Do we need a votes "watchdog"? I think there should be a limit on how long a vote runs for, some run for two months. DonnanZ (talk) 06:18, 26 July 2017 (UTC)

I created this two-month vote: Wiktionary:Votes/2017-07/Rename categories. It has not started yet. I think it was a good idea because it's a large proposal. It gives more time for people to read, think and discuss about it. @Dan Polansky sometimes creates two-month votes too. In my opinion this is not an issue, but I can change the "Rename categories" vote to one month if people prefer. --Daniel Carrero (talk) 09:11, 26 July 2017 (UTC)

Too much time is wasted on these trivial votes. Wyang (talk) 09:18, 26 July 2017 (UTC)

@Wyang What votes would you say are trivial? --Daniel Carrero (talk) 09:36, 26 July 2017 (UTC)
Sorry, I mistyped the ping, here it is again: @Wyang. --Daniel Carrero (talk) 09:37, 26 July 2017 (UTC)
Most of the votes running now. The whole idea of creating a vote after every discussion is just wrong. It is continuing to encourage uninformed self-assurance, over critical analysis of the issues. There have been many examples of counterproductive decisions made in the past as a consequence of relying on collective ignorance; superficially having a decision made by such majoritarian democracy looks good, but it could be really damaging in the long run. An example is the decade-long merge-split-merge vacillation of Chinese. Making it worse is the verbosity of many of the votes, such as Wiktionary:Votes/pl-2017-07/Gallery and Wiktionary:Votes/2017-07/Rename categories. I certainly would not want to read 14,835 bytes for a vote, and should not have been given the chance to in the first place. So much work on entries and developing new gadgets and functionalities could have been finished if the time reading the votes is diverted. Wyang (talk) 10:02, 26 July 2017 (UTC)
You mentioned collective ignorance concerning the merge-split-merge of Chinese, so what about having a rule like this: "Only people knowledgeable in [language] (as evidenced by the number Y of edits in [language]) are permitted to vote in issues concerning [language]." What do you think about that?
Apart from the Chinese issue, how are the "Rename categories" and "Gallery" votes "trivial"? They are votes for major changes. I don't claim them to be perfect, they could have problems to solve, but they can't be "trivial". Are there any better ways to try to implement these projects without votes? I've been trying to work under this limit: 1 vote per week. Some people seem to prefer it that way, although the idea of having that formal rule itself failed that vote.
Votes are often much smaller and easier to read than discussions. True, Wiktionary:Votes/2017-07/Rename categories is 14,835 bytes -- but it was based off Wiktionary:Beer parlour/2017/June#Proposal: Clean up, rename and replace "en:" → "English" in all categories which is 31,924 bytes and could still grow. It's OK if you don't want to read it, but please don't vote oppose on "TL;DR" grounds (although you still have that right). It's great to be able to discuss things on the BP, but for major proposals, votes have this advantage: it should be easier to judge the merits of a specific proposal in the vote (often detailing how exactly a policy would be edited) rather than doing something out of a discussion with multiple proposals, where people do not always give clear support, oppose, etc. for each idea, and can change their minds in the middle of the discussion. --Daniel Carrero (talk) 10:31, 26 July 2017 (UTC)
Maybe, just maybe, the matters aren't worth "resolution". When this very point is mentioned in the discussion, it is often ignored by the vote advocate. TL;DR is usually a somewhat polite way of saying: "Not worth my time". DCDuring (talk) 11:27, 26 July 2017 (UTC)
Do you have any examples of votes that aren't worth "resolution", and/or votes where that point was mentioned in the discussion and was ignored by the vote advocate? --Daniel Carrero (talk) 11:42, 26 July 2017 (UTC)

For what it's worth, in Wiktionary:Votes/pl-2016-11/Voting limits the "Proposal 2" failed. It was about implementing this regulation to limit vote creation: "The same person cannot create more than one vote in the span of 7 days. (For example, if someone creates a vote on December 9, then they must wait until at least December 16 before creating another one.)" --Daniel Carrero (talk) 09:27, 26 July 2017 (UTC)

We may need another vote on that.
It seems to me that votes are poor substitutes for longer-running consensus decisions. They seem to involve forcing resolution of disagreements for the sake of doing so or for the sake of enabling some kind of often premature standardization. With the passage of time some of these matters resolve themselves, others can be resolved more easily as contributors gain more knowledge. Votes force a discussion to take place whether or not participants have had actual experience with the "problem" being addressed. The proposals themselves are often quite amateurish, making the discussion mostly a matter of correcting gross errors and little time to discuss a mature proposal. Most discussion should take place before the vote is initiated. If no one cares enough to participate in the discussion perhaps the matter isn't of sufficient importance or is a "solution" to a non-problem. DCDuring (talk) 11:19, 26 July 2017 (UTC)
If I'm not mistaken, you seem to be talking about the votes concerning "External links" and "Further reading". It could be other votes too. I do think the "commons" links don't fit in "Further reading" and that's still a problem. The use of "Further reading" was an improvement otherwise, in my opinion. Let me know if you are talking about other votes. --Daniel Carrero (talk) 11:42, 26 July 2017 (UTC)
You know, you could create votes with vote-limiting proposals. --Daniel Carrero (talk) 11:46, 26 July 2017 (UTC)
Even though you said "We may need another vote on that.", I don't think there's anyone creating more than 1 vote every 7 days so the vote-limiting rule that failed the vote is being de facto followed even if it's not a formal rule. Of course, you could be thinking about different vote-creation limitations that we could discuss. --Daniel Carrero (talk) 11:59, 26 July 2017 (UTC)
Nobody would bother with this if there weren't a basic consensus that we now had too many votes. The simplest way to avoid a vote on this would be to recognize that consensus. DCDuring (talk) 12:51, 26 July 2017 (UTC)
True, we seem to have consensus on that. What happens now? --Daniel Carrero (talk) 13:02, 26 July 2017 (UTC)
We have a vote on whether or not we have too many votes. —Aɴɢʀ (talk) 14:53, 26 July 2017 (UTC)
Naturally not now, but at some point later I could create a vote on whether or not we have too many votes.
Suppose we want to create a vote to implement @Wikitiki89's proposal: "have every vote approved by at least five editors or so". Do we need approval from five editors or so to create that vote itself? --Daniel Carrero (talk) 15:15, 26 July 2017 (UTC)
Even if we don't need it, would it hurt to wait until we have it? --WikiTiki89 15:19, 26 July 2017 (UTC)
All we need is to enforce existing rules. We already require prior discussion before a vote. I certainly oppose "votes out of the blue" the way Dan has often created them. —CodeCat 15:24, 26 July 2017 (UTC)
@Wikitiki89 Of course not. We could also ask: "Are there five people willing to approve the idea of creating a vote for the proposal of requiring all future votes to be approved by five people first?"
@CodeCat By "Dan", are you referring to me? --Daniel Carrero (talk) 15:27, 26 July 2017 (UTC)
No, the actual Dan. —CodeCat 15:28, 26 July 2017 (UTC)
Sorry, my mistake. In the future, I'd like to create a vote with the proposal: "require prior discussion before a vote". This would serve as a confirmation vote. I dispute the notion that we do have this rule, but if it passes, this will become a formal written rule. --Daniel Carrero (talk) 15:32, 26 July 2017 (UTC)
Are you actually asking, or are you pointing out that we could ask? I am in favor of this rule, however I think it's too early to create a vote. Let's discuss it more. --WikiTiki89 15:40, 26 July 2017 (UTC)
I'm just pointing out that we could ask. I agree that it's too early to create a vote. I agree with this too: let's discuss it more. --Daniel Carrero (talk) 15:46, 26 July 2017 (UTC)
What should happen now is that, say, a vote or two is removed from the list and proposers show some basic self-restraint, so we don't waste time making a rule that shouldn't be required. DCDuring (talk) 18:32, 26 July 2017 (UTC)
@DCDuring: One vote per week, per person, at most looks good to you? What are the one or two votes that you would like to remove from the list? --Daniel Carrero (talk) 18:37, 26 July 2017 (UTC)
This isn't the first time someone has objected to the constant stream of votes. The solution isn't to pin you to a maximum, it's for you to have some responsibility and create fewer needless votes. (And please don't now drag this into yet another 20 paragraphs of criminal-lawyer-Daniel asking "prove to me which ones are needless". Everyone knows.) Equinox 18:40, 26 July 2017 (UTC)
Which of the current votes are needless? This is a reasonable question, it took fewer than 20 paragraphs to ask. --Daniel Carrero (talk) 18:46, 26 July 2017 (UTC)
You are making AGF very difficult for me. As to my preferences, I'd prefer that several of the proposals that you have proposed and continue to favor, that seem like they will or might win, but which I oppose, be withdrawn. I am not sure whether I would also like it if you simply noticed that you are losing credibility with every argumentative response and acted to preserve whatever credibility remains and even restore it or that you continued on your current path, which might lead to none of your proposals passing and a change of climate on this page. DCDuring (talk) 19:19, 26 July 2017 (UTC)
Geez, I was just asking. I know I write argumentative responses sometimes, I don't think that's necessarily a bad thing. But don't you think you write argumentative responses too? When I see your name in the recent changes, responding to a discussion or vote where I participate, I always think before I read your words "here we go, it's time to read some more criticism against what I did again".
Most of the votes I created have passed; some haven't, and I try to learn from them. It's true that if I lost credibility and none of my proposals passed, this would be a strong incentive for me to stop or avoid creating votes.
Of all the current votes, you voted in 7. You supported the vote for Dixtosa to become an admin. You voted "oppose" in all the other 6 vote pages, half of which were created by me (one per request, which I also opposed eventually). I don't think we can just withdraw the votes that I created and you voted oppose. You mentioned "that you have proposed", so do you have anything against me personally? What would it take for you to support a vote? --Daniel Carrero (talk) 19:56, 26 July 2017 (UTC)

Elu Prakrit

This needs a code, preferably inc-elu. Alternative names are "Helu Prakrit", "Helu", and "Elu", and maybe "Old Sinhalese". Descendants include si. It is an Indo-Aryan language (inc). —Aryaman (मुझसे बात करो) 23:08, 25 July 2017 (UTC)

@Aryamanarora elu-prk exists. Madhav P. (talk) 00:30, 26 July 2017 (UTC)
@माधवपंडित: Oh, thanks! —Aryaman (मुझसे बात करो) 01:47, 26 July 2017 (UTC)
To make sure a language doesn't already exist, you can use the search box in Module:languages. — Eru·tuon 02:51, 26 July 2017 (UTC)
@Erutuon On second thoughts I don't think the issue ends here. You cannot use the {{inh|si|elu-prk}} tag even though Sinhalese is its descendant. Also the hyperlink Helu links to the wiki article of some Chinese king. Helu doesn't even have its own category page. Madhav P. (talk) 07:57, 26 July 2017 (UTC)
@माधवपंडित: Aha... Helu is in Module:etymology languages, so it does not have a dedicated category page (except it could have the category Terms derived from Helu). It is currently considered a subvariety of Sanskrit. I can change that if it is wrong. I fixed the Wikipedia link. Is Helu the ancestor of any language besides Sinhalese? — Eru·tuon 08:17, 26 July 2017 (UTC)
Okay, Helu, from the Wikipedia article, looks distinct enough that it can't be considered a variety of Sanskrit. I promoted it to a full-fledged language and added it as the ancestor of Sinhalese. — Eru·tuon 08:26, 26 July 2017 (UTC)
@Erutuon: Thanks a lot! I think only Sinhalese descends from Helu. Helu is Middle Indo-Aryan while Sanskrit is Old Indo-Aryan. Madhav P. (talk) 08:30, 26 July 2017 (UTC)
@माधवपंडित: You're welcome. Hm, I need some more items for the data file: scripts and ancestor (if there is a nearer ancestor than Proto-Indo-Aryan). — Eru·tuon 08:33, 26 July 2017 (UTC)
@Erutuon: An immediate ancestor would be one of the closely related Old Indo-Aryan dialects very close to Sanskrit but of course it'd be undocumented. Can't say about the script... @Aryamanarora what do you think? Madhav P. (talk) 12:41, 26 July 2017 (UTC)
@माधवपंडित: It's most likely Brah, Brahmi script. Is dv Dhivehi a descendant? Wiki says it is a descendant of Maharastri Prakrit but then goes on to say sometimes it's considered a dialect of Sinhalese. —Aryaman (मुझसे बात करो) 13:42, 26 July 2017 (UTC)
@Aryamanarora: Wiki places Helu in association to, if not under Maharastri. I think these two prakrits are more closely related to each other than they are to other prakrits. Madhav P. (talk) 13:45, 26 July 2017 (UTC)
@माधवपंडित: They do seem to be, both of them drop almost all medial consonants, but imo Elu has a completely different phonetic system from the "standard" Maharastri Prakrit. But perhaps the vernacular Maharastri sounded more like Elu than we know. —Aryaman (मुझसे बात करो) 13:58, 26 July 2017 (UTC)
Wikipedia says that Dhivehi descends from Maharashtri Prakrit or Helu in different places on the page, but there are no sources for either claim. (I've added Brahmi script to Helu.) — Eru·tuon 17:48, 26 July 2017 (UTC)
elu-prk isn't a properly formatted code, it should be renamed. —CodeCat 15:25, 26 July 2017 (UTC)
@CodeCat: Would you have an alternative? It would be easy to change the code now, as it's hardly used. — Eru·tuon 17:48, 26 July 2017 (UTC)
Aryaman's original proposal. —CodeCat 17:55, 26 July 2017 (UTC)
Any objections from others? — Eru·tuon 18:00, 26 July 2017 (UTC)
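
For readers following the data-file side of this thread, here is a minimal Python sketch of why {{inh|si|elu-prk}} could only work once Helu was recorded as an ancestor of Sinhalese. The dictionary layout, the field names and the function `is_ancestor` are hypothetical illustrations and do not reproduce the actual structure of Module:languages; the code elu-prk is the one in use at the time of this discussion, with a rename to inc-elu still under consideration.

```python
# Hypothetical, simplified language data; not the real Module:languages layout.
LANGUAGES = {
    "elu-prk": {
        "name": "Helu",
        "other_names": ["Elu", "Helu Prakrit"],
        "family": "inc",        # Indo-Aryan
        "scripts": ["Brah"],    # Brahmi, as added in this thread
        "ancestors": [],        # no nearer ancestor recorded yet
    },
    "si": {
        "name": "Sinhalese",
        "family": "inc",
        "ancestors": ["elu-prk"],
    },
}


def is_ancestor(lang, candidate, data=LANGUAGES):
    """Return True if `candidate` appears anywhere in the ancestor chain of `lang`."""
    seen = set()
    stack = list(data.get(lang, {}).get("ancestors", []))
    while stack:
        code = stack.pop()
        if code == candidate:
            return True
        if code in seen:
            continue  # guard against accidental cycles in the data
        seen.add(code)
        stack.extend(data.get(code, {}).get("ancestors", []))
    return False
```

An inheritance template would presumably accept a source language only when a check like `is_ancestor("si", "elu-prk")` succeeds, which is why listing Helu merely as an etymology-only variety of Sanskrit was not enough.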

Biblical Hebrew hapax legomena

The ongoing discussion about making Latin a WDL has made me wonder whether we allow Biblical Hebrew hapax legomena (and dis legomena), considering that:

  • CFI no longer considers usage in a well-known work to be sufficient,
  • we treat Biblical Hebrew and Modern Hebrew as the same language,
  • we consider Hebrew a WDL.

In principle, those three facts mean that we would exclude Biblical hapaxes and disses, except for those (like גבינה, זכוכית, and לילית) that have gone on to become regular words of modern Hebrew. How do we want to handle this situation? Shall we:

  1. ban Hebrew words used only once or twice in the entire Hebrew corpus;
  2. divide Hebrew into Modern Hebrew (he, presumably including Medieval Hebrew) and Biblical Hebrew (hbo, presumably including Mishnaic Hebrew), making the former a WDL and the latter an extinct language;
  3. consider all of Hebrew an LDL;
  4. ignore the issue and decide on hapaxes on a case-by-case basis?

Solution 2 is what we've done for Greek, which is divided into grc and el, and solution 4 is apparently what we've mostly done for Latin and what we're currently arguing over. For that reason I'd prefer NOT to apply solution 4 to Hebrew. My preferred solution is 2, but others may disagree. (Personally I think 2 is actually the only logical solution to the Latin Question as well, but this thread isn't for talking about Latin.) —Aɴɢʀ (talk) 12:45, 26 July 2017 (UTC)

This goes back to our old repealed policy of allowing a word used once in a well-known work. The reason we repealed it is that if nobody ever used or talked about that word again, then it probably doesn't need to be included. Thus there are no real hapax legomena in Biblical Hebrew when you include non-Biblical Hebrew, because each of them has been discussed and used later, specifically because of its unusualness in the Bible. --WikiTiki89 14:54, 26 July 2017 (UTC)
Indeed, I suspect you are manufacturing a problem. To follow up on Wikitiki's point, can anyone find even a single Biblical Hebrew entry that would fail RFV under our current rules? —Μετάknowledgediscuss/deeds 15:12, 26 July 2017 (UTC)
There do seem to be true Biblical Hebrew hapaxes [5], but we don't have entries for them yet, either because our coverage of Hebrew skews heavily toward Modern Hebrew, or because people know they wouldn't pass RFV. The words in question may be discussed (i.e. mentioned) later, but are they used later? I know some of them are (I mentioned some above), but all of them? —Aɴɢʀ (talk) 15:43, 26 July 2017 (UTC)
I think you misunderstood me. If you consider the corpus of Biblical Hebrew alone, then of course there are true hapax legomena. But when you consider Hebrew as a whole, including later Hebrew, most, if not all, of these Biblical hapax legomena will be discussed and used again later. --WikiTiki89 15:48, 26 July 2017 (UTC)
No, I understood. My question is, are all of them used (not merely discussed) again? What about the two entries other than פלדה (which is a modern Hebrew word too) in Category:Hebrew hapax legomena? Are they used (not mentioned) at least three times across all stages of Hebrew? —Aɴɢʀ (talk) 15:57, 26 July 2017 (UTC)
Out of those three words, זדה is not actually Biblical Hebrew, but from the Siloam inscription, so it is a different situation that we might need to discuss. The other two are used at least in Modern Hebrew. --WikiTiki89 16:05, 26 July 2017 (UTC)
Then take my "Biblical Hebrew" to mean "all Hebrew from before the 4th century CE" or whatever cutoff point is customary for the line between Mishnaic and Medieval Hebrew. Maybe we can call it "Classical Hebrew". The point remains: if Hebrew is all one language, and that one language is a WDL, and זדה is not used (as opposed to mentioned) at least three times by three different authors, then our current rules do not allow its inclusion. —Aɴɢʀ (talk) 16:16, 26 July 2017 (UTC)
Well, it's silly to put Biblical and Mishnaic Hebrew together on one side and Medieval and Modern Hebrew on the other side. Mishnaic Hebrew is a lot more similar to Medieval Hebrew than to Biblical Hebrew. If anything, the line would be drawn between Biblical and Mishnaic. But regardless, if you mean to talk about examples like זדה, then let's talk about those. The contradiction is between these two points: (a) In the context of Hebrew as a whole, it is not likely that someone would encounter this word and want to know what it means, and so it does not need to be included. (b) If "Epigraphic Hebrew" were to be considered its own language, then this word would be included, as similar words are in ancient languages with even smaller corpora, so it doesn't make sense to exclude it just because it happens to be part of a larger language. I think we need to resolve this contradiction more generally, rather than specifically for Hebrew, as it applies to many other languages, notably the recently-much-discussed Latin issue (although the details of that case are a bit different). --WikiTiki89 18:11, 26 July 2017 (UTC)
The reason I brought up Hebrew specifically is that it is the only other language I can think of besides Latin where we consider the ancient form and the modern form to be one and the same language. Other cases where the ancient form and the modern form of a language are similar enough that it's conceivable to consider them a single language (Greek, Armenian, Icelandic/Norse) have two codes, one for the ancient form and one for the modern. Although on reflection, I guess we have just one code for all stages of Arabic and Chinese as well. At any rate, what this comes down to is the absurd situation we're currently in where a large number of users are saying "Post-1500 Latin is to be treated like either a WDL or a conlang; pre-1500 Latin is to be treated as an extinct language; but they're both the same language", and I wanted to see how we handle parallel situations. It does look like זדה currently does not meet CFI, but I bet if someone were to nominate it for deletion on those grounds, most people would vote to keep it, because generally we do keep words found only in inscriptions of ancient languages. —Aɴɢʀ (talk) 18:28, 26 July 2017 (UTC)
I don't know why you're only considering "ancient" and "modern". English is also a good example: Early Modern English had a lot of forms that we don't include, that we probably would include if it had been its own language. And there are many other languages with this sort of situation. --WikiTiki89 18:43, 26 July 2017 (UTC)
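
The use-versus-mention test argued over in this thread can be written down as a small check. The following Python sketch is only a simplified reading of the rule as phrased above ("used, as opposed to mentioned, at least three times by three different authors"); the `Citation` type and the function `meets_three_use_rule` are hypothetical, and the actual CFI text has further conditions that are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    author: str
    is_use: bool  # True for a genuine use, False for a mere mention or discussion


def meets_three_use_rule(citations, minimum=3):
    """Count distinct authors who actually use the word; mentions are ignored."""
    using_authors = {c.author for c in citations if c.is_use}
    return len(using_authors) >= minimum
```

Under this reading, a word attested once in an inscription and afterwards only discussed by later writers would fail, since the later citations would all have `is_use=False`.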

Mansi varieties

We have been getting a decent influx of Mansi lemmas recently, thanks to @Martinus Poeta Juvenis. This might be a good point to consider if we should treat Mansi as one language or as several.

The Mansi varieties are very different from each other: there are almost no cases where a standard Northern Mansi word has the same shape as its Southern Mansi cognate, and sometimes they are very different indeed (e.g. 'gristle' is Southern /nʲeːrkɤː/, Northern /ńaːriɣ/). In many cases, reconstructions of Proto-Mansi are also available in literature (in this case *ńī̮rɣɜ or *ńē̮rɣɜ). The only written variety is Northern, and its spelling system mostly cannot be extended for other varieties (e.g. there are no signs for /ɤː/, /æ/ or /ɒ/). Inflection differs too: compared to Northern, Southern Mansi has no dual, but has the accusative and comitative cases. A few scholars by now consider "Mansi" to be a language family with up to four individual languages (Northern, Southern, Western, Eastern).

I would suggest:

  • reserve the code mns for Northern Mansi, which is the only living variety;
  • create new codes at least for Proto-Mansi (mns-pro?), Southern Mansi (ugr-sms?) and Central Mansi (ugr-cms?).

I'm not sure if separate Western and Eastern codes are needed at this point: they're a dialect continuum, and we may need a more general Wiktionary discussion at some point about what we want to do with linguistic field data covering dozens of closely related unwritten varieties. Treating everything as a separate language seems ineffective.

pinging also: @Panda10, @Neitrāls vārds, @Mulder1982 and just in case, @Alcenter. --Tropylium (talk) 13:34, 26 July 2017 (UTC)

Are there any attempts at latinisation for those non-literate Mansi varieties? For example I use transcription schemes given in "The Mongolic languages" for normalizing various phonetic spellings of East Yugur, Baonan, Daur, Mogholi and Khamnigan. I've also contemplated making an ad-hoc one for Sary-Yugur, but maybe it would be going too far. Crom daba (talk) 17:33, 26 July 2017 (UTC)
Most dialects have reasonably standardized linguistic transcription schemes, but they're per individual dialect, not dialect group. E.g. the verb 'to stay': Southern koľt-, Eastern: Lower Konda χoľt-, Middle Konda kʷoľt-, Upper Konda kʷuľt-, Western: kuľt-, Northern: χuľt- (= literary хульт-); or the noun 'mold': Southern ka͔šək, Eastern: Lower Konda xāšγə, Middle Konda kē̮səγ, Western: Pelymka kašša, Vagilsk kē̮šša, Northern: xāssi (= literary ха̄сси). It would seem like overkill to add separate entries for all variants. --Tropylium (talk) 19:05, 26 July 2017 (UTC)

References section only for <references/>?

I was under the impression that under recent policy changes, the "References" section should only be used for <references/>, i.e. to show inline references that are present elsewhere in the entry. However, User:Gamren has pointed out that our policy doesn't actually say so. So what is going on? —CodeCat 10:45, 27 July 2017 (UTC)

But under the prevailing regime, we have no policies that haven't been voted on. In each case, what has been voted on is the wording of a specific proposal. DCDuring (talk) 12:49, 27 July 2017 (UTC)
We allow "References" sections with simple bullet points instead of <references/>, as per Wiktionary:Votes/2016-12/"References" and "External sources". The vote did propose to require always using <references/> in "References" sections, but @This, that and the other and @Tropylium opposed the idea of introducing that specific limitation. --Daniel Carrero (talk) 13:22, 27 July 2017 (UTC)
I see. I'm not sure if I understand the difference between the sections then. What would I use to refer to another dictionary which contains an entry for the term? —CodeCat 13:35, 27 July 2017 (UTC)
In the vote I linked above, please see the comments of Tropylium, TTO and @I'm so meta even this acronym (and maybe others). I'm not saying I personally agree or disagree with them, but by voting that way they helped to shape the regulations as they are now. --Daniel Carrero (talk) 13:47, 27 July 2017 (UTC)
Sorry, I did not answer your last question properly. When you want to refer to another dictionary which contains an entry for the term, please use "Further reading". --Daniel Carrero (talk) 13:48, 27 July 2017 (UTC)
Even if that dictionary is used to "prove the validity of what is being stated", and in which readers may "verify the information available"? Writing "Further reading" instead of "References" is not much more work, it just seems counter-intuitive.__Gamren (talk) 16:38, 27 July 2017 (UTC)
Obviously not in the carefully considered opinion of those who supported the proposal, which they have carefully studied and for which they had their own clinical experience and good evidence. DCDuring (talk) 16:47, 27 July 2017 (UTC)
But, as was discussed before, we use quotations to attest words. If we wrote "References" just to link to the same word in external dictionaries, this would make it sound like we know that the word exists because it's in those dictionaries.
We can use the "References" to "prove the validity of what is being stated" and "verify the information available" when we are making statements in etymologies and usage notes, for example. --Daniel Carrero (talk) 16:50, 27 July 2017 (UTC)
Okay. So "references" may support everything except 1) that the term exists, 2) that it is of the specified POS, and 3) that it means what we say it means? Then, e.g. diff, diff are erroneous? Most Danish entries, at least, are like this (DDO seems to be very frequently linked-to here). Perhaps a bot can be taught to recognize the string ===References===\n* {{R:DDO}} and equivalents?
For Greenlandic affixes, I have rarely been using dictionaries (since both DAKA and its ancestor Oqaatsit are crap for those purposes) but mostly Bjørnum's and Nielsen's grammars (on e.g. -lior, -isag, -suaq), both of which have lists of affixes, as references to both meaning and morphological behaviour (see both Usage notes and the headword line). Is this also wrong?
As an afterthought, what if there is a word in an LDL that has no quotations, but is found in approved dictionaries? Are these latter then still not to be called references?__Gamren (talk) 20:24, 27 July 2017 (UTC)
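
The bot idea raised in this comment could be prototyped along these lines. This is a minimal Python sketch, assuming the section is written as a level-3 ===References=== header followed by exactly one bulleted {{R:DDO}} call; the function name `has_bare_ddo_reference` is hypothetical, and real entries may use other header levels, spacing or templates that this pattern does not cover.

```python
import re

# A level-3 "References" header followed by a single bullet containing
# {{R:DDO}} (optionally with parameters), and no further bullet after it.
BARE_DDO = re.compile(
    r"^===[ \t]*References[ \t]*===[ \t]*\n"           # the section header line
    r"\*[ \t]*\{\{R:DDO(?:\|[^{}\n]*)?\}\}[ \t]*$"     # the one bulleted template call
    r"(?!\n\*)",                                       # next line must not be a bullet
    re.MULTILINE,
)


def has_bare_ddo_reference(wikitext):
    """Return True if the wikitext has a References section holding only {{R:DDO}}."""
    return BARE_DDO.search(wikitext) is not None
```

A bot would presumably only list such pages for review first, since whether the header should become "Further reading" is exactly what is being debated here.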
To answer that question, I'd like to use @Angr's words from this discussion (except I don't speak Ancient Greek so I'll just trust him on the examples): "And ideally (but admittedly totally unrealistically), we should be writing our own definitions "from the bottom up", i.e. on the basis of citations, rather than taking them from other dictionaries. For example, we should be saying that μῆνις (mênis) means "wrath" not because LSJ tells us that's what it means, but because we observe that that's what it means in "Μῆνιν ἄειδε, θεά, Πηληιάδεω Ἀχιλῆος οὐλομένην"."
Yes, I believe diff, diff are erroneous. Please use "Further reading" even when a word in an LDL has no quotations but is found in approved dictionaries.
I don't edit in LDLs, I'm just trusting the judgement of people who participated in discussions and votes and who edit in LDLs. The consensus and rules can change if needed. --Daniel Carrero (talk) 20:43, 27 July 2017 (UTC)
This discussion scares me. No comment by a single user in a vote discussion can be taken as policy. It is only the proposal voted on that is approved. If the text of policies has been altered based on those comments, the change should be null and void. If a vote can't be run properly, it should not be run at all. How many alterations of our policy pages have been purportedly made as result of a vote, but actually with reference to a mere comment? DCDuring (talk) 23:17, 27 July 2017 (UTC)
@DCDuring: To clarify: in Wiktionary:Votes/2016-12/"References" and "External sources", most people in the vote supported the whole proposal, fewer people opposed the whole proposal, and some people opposed specifically the rule about requiring <references/>. The final vote count allowed for almost the whole proposal to be implemented, except that the <references/> requirement was not implemented at all. What's wrong with that? It's not that a comment by a single user is taken as policy, it's quite the opposite: a few oppose votes were enough to not implement a rule. --Daniel Carrero (talk) 23:36, 27 July 2017 (UTC)

Accessible editing buttons

--Whatamidoing (WMF) (talk) 16:56, 27 July 2017 (UTC)

Like +1. Wyang (talk) 21:17, 27 July 2017 (UTC)
I really hate these new giant buttons everywhere. They take up too much screen space and don't integrate with the browser as well. What's wrong with using default browser buttons? If there are accessibility issues, let the browsers take care of it by having the option to change what default buttons look like. --WikiTiki89 21:24, 27 July 2017 (UTC)
I agree with Wikitiki. However, I wish not for them to be abolished completely but for there to be an option to personally disable them. —suzukaze (tc) 23:33, 27 July 2017 (UTC)

Arabic form I with middle فتحة

Arabic form I verbs with فتحة in the middle consonant of the past الْمَاضِي may change it for any vowel in the middle consonant in the non-past (imperfect) indicative الْمُضَارِع. Therefore, it'd be very helpful to organize them in groups depending on which vowel or vowels they have, and add those categories to Category:Arabic_verbs. --Backinstadiums (talk) 14:09, 27 July 2017 (UTC)

I supposed this would be a fairly straightforward task, since entries are filled in using 'templates'. Am I wrong? --Backinstadiums (talk) 18:39, 27 July 2017 (UTC)

I'd imagine it could be done by Module:ar-verb, which serves {{ar-verb}}. What should the umbrella category be named, and the subcategories? Perhaps "Arabic form-I past verbs by middle vowel", "Arabic form-I past verbs with the middle vowel x"? Any suggestions as to the name of the category, @Atitarev, Benwing2, others? Actually, the umbrella category should be under Arabic form-I verbs, because it applies only to form I. — Eru·tuon 18:48, 27 July 2017 (UTC)
Okay, "past vowel" is used in some of the verb categories already. So I would propose Arabic form-I verbs by past vowel, or perhaps Arabic verbs by past vowel (since only form I has variation in the past vowel), and Arabic form-I verbs with past vowel a or Arabic verbs with past vowel a. —This unsigned comment was added by Erutuon (talk • contribs) at 13:57, 27 July 2017 (UTC).
Categorizing by past vowel alone is insufficient, I think we should have individual categories for each past-and-non-past vowel combination (and just to point out for anyone who is unaware, this only applies to form-I verbs). --WikiTiki89 19:23, 27 July 2017 (UTC)
Sounds good, as long as you don't mean in exclusion to categories for individual past and non-past vowels. I think there should be both individual past and non-past vowel categories, and categories for combinations. For instance, كَتَبَ، يَكْتُبُ (kataba, yaktubu) could be placed in categories for "past vowel a", "non-past vowel u" and "past vowel a and non-past vowel u". There could be umbrella categories for both individual and combination vowel categories, and a master category for "Arabic verbs by vowel" or something. — Eru·tuon 19:32, 27 July 2017 (UTC)
I don't think we need the categories for individual past and non-past vowels. The past and non-past vowel pairs are interrelated and shouldn't be separated. For example, most active verbs have a-u (a being the past vowel and u the non-past), while most active verbs with gutturals as one of the last two root consonants have a-a. Some active verbs have a-i. Stative verbs usually have i-a or u-a. All other combinations are rare (for strong verbs at least). And of course these are general rules that have many exceptions; a-u verbs can be stative, i-a verbs can be active, etc. So really taking either one separately doesn't tell you much about the verb. --WikiTiki89 21:22, 27 July 2017 (UTC)
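
To make the two positions in this thread concrete, here is a minimal Python sketch of how category names might be derived from a form-I verb's past and non-past vowels. The category wording follows the names floated above, but the exact strings, the function `form_i_vowel_categories` and its `include_individual` switch are hypothetical; any real implementation would live in Module:ar-verb and would not necessarily look like this.

```python
VOWELS = ("a", "i", "u")


def form_i_vowel_categories(past, nonpast, include_individual=True):
    """Return category names for a form-I verb, given its past and non-past vowels.

    The combination category is always returned; the per-vowel categories are
    optional, reflecting the disagreement over whether they are useful.
    """
    if past not in VOWELS or nonpast not in VOWELS:
        raise ValueError("vowels must each be one of 'a', 'i', 'u'")
    categories = [
        f"Arabic form-I verbs with past vowel {past} and non-past vowel {nonpast}"
    ]
    if include_individual:
        categories.append(f"Arabic form-I verbs with past vowel {past}")
        categories.append(f"Arabic form-I verbs with non-past vowel {nonpast}")
    return categories
```

For كَتَبَ، يَكْتُبُ (kataba, yaktubu), calling `form_i_vowel_categories("a", "u")` gives the a-u combination category plus the two individual ones, while passing `include_individual=False` gives only the combination category.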