March 2017

Language code prefixes on reference templates, yes or no?

Right now, some reference templates have a language code included in the name, like {{R:zu:ZED}}, while others don't, e.g. {{R:De Vaan 2008}}. This is rather inconsistent so I think we should settle on one way or another. Should we remove the language code from reference templates that have it, or add it to reference templates that lack it? —CodeCat 19:12, 1 March 2017 (UTC)

So, the obvious problem is that not all references are for one language only, and so you'll often have multiple aliases (e.g. {{R:ine:Kloekhorst2008}}, {{R:hit:Kloekhorst}}). I'm fine with this, but I'd like also to standardize the naming so that {{R:ine:Kloekhorst2008}} becomes {{R:ine:Kloekhorst 2008}}. From a housekeeping standpoint, having the prefixes is very useful, but then again, it will cause me undue anguish going from {{R:L&S}} to {{R:la:L&S}}. —JohnC5 19:41, 1 March 2017 (UTC)
Yeah, I think both options have downsides. If you remove the code, you may end up with more frequent name clashes. If you include it, you have a problem with references that cover many languages. I don't think having aliases for the latter is the way to go though. —CodeCat 20:12, 1 March 2017 (UTC)
I certainly agree that the current situation is very frustrating to keep track of and maintain. —JohnC5 20:29, 1 March 2017 (UTC)
Anyone up for taking count of how many name collisions we would currently have, if not for language prefixes? I suspect there would not be very many at all that involve templates with the R:Lastname Year notation. More idiosyncratic abbreviations are probably a different case. --Tropylium (talk) 20:48, 1 March 2017 (UTC)
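The collision count asked for above could be sketched roughly as follows. This is only an illustration: the rule for recognising a prefix (two or three lowercase letters between "R:" and the rest of the name) is an assumption, the function names are hypothetical, and the sample list reuses names mentioned in this thread rather than an actual dump of the template namespace.

```python
from collections import defaultdict

def strip_language_prefix(name):
    """Turn 'R:xx:Name' into 'R:Name'; leave other names unchanged."""
    parts = name.split(":", 2)
    # Assumption: a language prefix is 2-3 lowercase letters in second position.
    if (len(parts) == 3 and parts[0] == "R"
            and parts[1].islower() and 2 <= len(parts[1]) <= 3):
        return "R:" + parts[2]
    return name

def find_collisions(names):
    """Group template names that become identical once the prefix is dropped."""
    groups = defaultdict(list)
    for name in names:
        groups[strip_language_prefix(name)].append(name)
    return {bare: sorted(full) for bare, full in groups.items() if len(full) > 1}

# Illustrative names taken from the discussion, not a real template list.
sample = [
    "R:zu:ZED",
    "R:De Vaan 2008",
    "R:ine:Kloekhorst2008",
    "R:hit:Kloekhorst",
    "R:la:L&S",
    "R:L&S",
]
collisions = find_collisions(sample)
```

Run against a full list of template titles, the size of the returned dictionary would give the number of clashes; longer codes such as hyphenated family codes would need a looser prefix rule.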
I think the best option is to not have the prefix unless we need it to disambiguate, but this is such a minor problem I'd rather not expend too much in the way of time and resources on it. Chuck Entz (talk) 03:01, 2 March 2017 (UTC)
Keep the prefix. It helps to find a template from the search bar quickly. When you forget the template name, you can enter Template:R:xxx: (+sometimes the first letter) and pick from the suggestions. I do it all the time. --Vahag (talk) 05:07, 2 March 2017 (UTC)
A very good point. The names of reference templates can be quite hard to remember, and I've made use of this trick myself. —CodeCat 14:06, 2 March 2017 (UTC)
On top of this, some templates include years, some don't; some include the author's name, while some include an abbreviation of the title; etc. For example, the five reference templates I use most commonly in Proto-Slavic entries are
Three different formats here. (On top of which, the authors' names in the templates don't match the resulting names in the generated text: Vasmer vs. Fasmer, Chernykh vs. Černyx. For ESSJa it's even worse: the generated text says Trubačev, which links to a Wikipedia article with the spelling Trubachyov, and meanwhile I've been using Trubachev in the main text.) Benwing2 (talk) 17:09, 11 March 2017 (UTC)
FWIW, I think "Fasmer" and "Trubačev" are wrong. These come from directly transliterating the corresponding Russian text, but that's not how names work. We normally write Rachmaninoff and Tchaikovsky, not the less idiosyncratic Rakhmaninov, Chaykovskiy or the transliterated Raxmaninov, Čajkovskij. Benwing2 (talk) 17:09, 11 March 2017 (UTC)
In bibliography it is customary to use strict transliterations of the names and titles so they can be searched in a database and found. Note the results for Trubačev vs Trubachyov and Trubachev. --Vahag (talk) 17:56, 11 March 2017 (UTC)
How should names work? There are slightly more products on Amazon that use Rachmaninov than Rachmaninoff (21K to 20K), and all but one of the products on the first page of results for Rachmaninov use Rachmaninov as part of English titles. I'd say, contrary to your claims, that culturally, we don't have a consensus on the spelling of Rachmaninoff's name. And that's an easy case; Rachmaninoff was a citizen of the US, and given that his New York tombstone has Sergei Rachmaninoff written on it, that's almost certainly what his English-language official documentation has written on it. It's possible that we're the first, or at least one of the first, to transcribe into English the name of the author of a dictionary of some obscure Siberian language. We don't have any option but to pick some transliteration, be it precise or diacritic-filled or diacritic-free or natural.--Prosfilaes (talk) 23:27, 12 March 2017 (UTC)

@Benwing2, please undo your edits like this. I demonstrated above that bibliographic references use scientific transliteration. Or look at the references listed at the end of {{R:Derksen 2008}}. He has Trubačev, O.N., e.g. on page 580. --Vahag (talk) 07:43, 20 March 2017 (UTC)

OK, if you insist, I will undo all of them this evening (no time right now). But note that WorldCat and Derksen both consistently say "Max Vasmer" not "Maks Fasmer". Do you still think we need "Fasmer" despite this? Benwing2 (talk) 15:04, 20 March 2017 (UTC)
@Vahagn Petrosyan Benwing2 (talk) 01:36, 21 March 2017 (UTC)
@Benwing2: We can normalize "Maks Fasmer" to "Max Vasmer", I agree. --Vahag (talk) 07:59, 21 March 2017 (UTC)

If we use Wikidata for references, we won't need separate reference templates anymore, will we? If I'm not mistaken, we could delete all the reference templates and use something like {{ref|Q43361}}, where "Q43361" is the code for the reference intended. --Daniel Carrero (talk) 15:08, 20 March 2017 (UTC)

I don't know if that's a good idea. I would prefer keeping the reference templates as they are but using Wikidata on the back end. DTLHS (talk) 15:10, 20 March 2017 (UTC)
Something about which I've been unclear: with WikiData, will parameter passing be possible when transcluding? So would I be able to do {{ref|Q43361|page=12|head=foobar}}? I'm not crazy about this idea regardless, but just curious. —JohnC5 15:16, 20 March 2017 (UTC)
I assume we'll be able to do anything on this page. DTLHS (talk) 15:30, 20 March 2017 (UTC)
From what I see in Wikidata templates and modules that already exist in Wikipedia, apparently yes, we can certainly do that. We should be able to get title, author, year, etc. from Wikidata and use template parameters for more information if we want. JohnC5, you mentioned a parameter "page=12", which seems much needed: we may want to mention which page the information is on for each entry, on a case-by-case basis. --Daniel Carrero (talk) 23:00, 20 March 2017 (UTC)
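The parameter-passing idea discussed above can be sketched like this: shared bibliographic fields come from a central store, and per-entry details such as the page are supplied at the point of use. Everything here is hypothetical: the store is a plain dictionary standing in for Wikidata, "Q_EXAMPLE" and its fields are placeholders, and render_reference is not an existing template or module.

```python
# Stand-in for a central data store; the ID and fields are placeholders.
STORE = {
    "Q_EXAMPLE": {
        "author": "A. Author",
        "year": "2008",
        "title": "An Example Dictionary",
    },
}

def render_reference(item_id, page=None, head=None):
    """Combine stored fields with optional per-entry parameters."""
    record = STORE[item_id]
    text = f"{record['author']} ({record['year']}), {record['title']}"
    if head is not None:
        # The headword being cited is specific to the calling entry.
        text = f'"{head}" in ' + text
    if page is not None:
        text += f", page {page}"
    return text

example = render_reference("Q_EXAMPLE", page=12, head="foobar")
```

The call mirrors the {{ref|Q43361|page=12|head=foobar}} shape proposed above: the identifier selects the work, and the named parameters add entry-specific detail.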
I was just thinking this the other day when I was adding params to a few. I personally really like the prefixes for the sorting benefit previously mentioned. Sure, there are some sources that contain multiple languages, but for the most part they're all usually under a single umbrella language. I'd also like to see dates become a required suffix, like R:bsl:Derksen:1996, R:bsl:Derksen:2008, R:bsl:Derksen:2015. --Victar (talk) 04:51, 27 March 2017 (UTC)
Redirects/aliases, as some have noted, solve some of these issues. Any widely-used or widely-usable reference like {{R:L&S}}, {{R:Duden}} or {{R:MWO}} should have a redirect from a short name (if it does not simply have a short name). In the other direction, any templates that are located at prefixless names could have redirects from the prefixed names to facilitate finding them via the search function mentioned above. For bilingual dictionaries etc, redirects are obvious (e.g. both {{R:en:Some English-Spanish dictionary}} and {{R:es:Some English-Spanish dictionary}} could produce the same reference, one redirecting to the other.) This would also allow us to standardize the "true names" of many references into a format like Victar suggests, while retaining as redirects the shorter names they currently have which people are used to. - -sche (discuss) 22:10, 1 April 2017 (UTC)

"External sources" - follow-up vote

Wiktionary:Votes/2016-12/"References" and "External sources" passed, except for point 4 (which would have always required the use of the tags <ref></ref> and <references/>).

Also, I created Wiktionary:Votes/2017-03/"External sources", "External links", "Further information" or "Further reading" as a follow-up to this vote to double-check if we want to use "External sources" as the section name. As I said, if the follow-up vote fails, I believe "External sources" wins by default, because it already won in the first vote. --Daniel Carrero (talk) 11:07, 2 March 2017 (UTC)

Eighth LexiSession: French words in other languages

This month's suggested collective task is to look for French words in other languages. You are invited to help us describe the spread of French in your own language or others you know, and to improve the way Wiktionaries describe semantic changes and uses.

This is a collaborative experiment without any guide or direction. You're free to participate as you like and to suggest next month's topic. If you do something in response to this nice call, please report it here to let people know you are involved in one way or another. I hope there will be some people interested this month. Noé 16:02, 2 March 2017 (UTC)

See Category:Terms derived from French for some examples in various languages. —Stephen (Talk) 16:22, 2 March 2017 (UTC)

What should past participle form templates link to?

There are several templates like {{feminine plural past participle of}}. I have noticed that in Spanish entries they are directed to the infinitive of the word, like in abastadas. However, most French entries, e. g. abadés, link to the past participle form (although there seem to be exceptions like acceptées, which links to the infinitive form too). What is the preferred variant? Could/should it be unified? I think a bot might manage such unification. --Jan Kameníček (talk) 20:54, 2 March 2017 (UTC)

The general practice is that if an inflection has its own inflection, then each form-of entry links to the most immediate term it is a form of. That means that if a participle has feminine and plural forms, those link to the participle lemma form, while the participle lemma then itself links to the main verb lemma. See e.g. gedaan/gedane, captus/captae. —CodeCat 21:12, 2 March 2017 (UTC)
Thanks for the explanation. I was asking because the actual practice was confusing to me. I suggest unifying it by bot if it is technically possible. --Jan Kameníček (talk) 23:41, 2 March 2017 (UTC)
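The "most immediate term" rule described above, and the check a unifying bot would need, might look roughly like this. The mapping is hand-written illustrative data using forms mentioned in this thread, not something read from the entries, and both function names are hypothetical.

```python
# Each form maps to the term it is *immediately* a form of (illustrative data).
IMMEDIATE_PARENT = {
    "captae": "captus",      # feminine plural -> participle
    "captus": "capio",       # participle -> verb lemma
    "gedane": "gedaan",
    "gedaan": "doen",
    "abastadas": "abastado",
    "abastado": "abastar",
}

def chain_to_lemma(form):
    """Follow immediate-parent links from a form up to the main lemma."""
    chain = [form]
    while chain[-1] in IMMEDIATE_PARENT:
        chain.append(IMMEDIATE_PARENT[chain[-1]])
    return chain

def skips_a_level(form, linked_target):
    """True if an entry links past its immediate parent, e.g. straight to the verb."""
    chain = chain_to_lemma(form)
    return len(chain) > 2 and linked_target in chain[2:]
```

Under this sketch, an entry for abastadas that links to abastar would be flagged, while one that links to abastado would not.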

Transwiki requests from en.wikipedia

Apologies if this is not the right place to bring this, but I was wondering if anyone has looked at the requests for transwiki from en.wikipedia lately? There are articles that have been tagged in the transfer category for over 2 years. There's supposedly a bot but clearly it's defunct. Could someone take a look and maybe see if any are suitable, or perhaps de-tag any that are not suitable, with a note? I would do them myself, but I don't have Import privileges here, and I'm not super familiar with your standards of what's worth keeping. Premeditated Chaos (talk) 04:18, 3 March 2017 (UTC)

Honestly, we don't particularly want transwikis from Wikipedia. There would be no point to importing them, because the formatting requirements and criteria for inclusion are so radically different. I can go through and add ones that we lack, but then how would I mark the Wikipedia entry to show that it can be deleted? —Μετάknowledgediscuss/deeds 04:25, 3 March 2017 (UTC)
@Metaknowledge: Just remove w:en:Template:Move to Wiktionary. —Justin (koavf)TCM 04:55, 3 March 2017 (UTC)
@Koavf: But I thought they wanted to get rid of the entries on Wikipedia, so should I mark them for deletion? Some are pretty thoroughly non-notable and crappy. —Μετάknowledgediscuss/deeds 05:02, 3 March 2017 (UTC)
@Metaknowledge: w:Laz language and w:Marshallese language aren't going to be deleted--they are fully-formed articles. The ones that should be deleted can have a deletion notice put at the top of them. If you need help figuring out how to do that, I can assist you. If nothing else, you can just put {{delete|Transwikied to Wiktionary}} at the top. —Justin (koavf)TCM 05:05, 3 March 2017 (UTC)
@Koavf, yeah, I don't really know the ways of Wikipedia. I'll do that, then. —Μετάknowledgediscuss/deeds 05:10, 3 March 2017 (UTC)

Savage honesty, @Metaknowledge:, I like it. If you want, I'll keep an eye on your contribs at enwiki and tag the rejected ones for deletion myself, just put an edit summary like "transwiki declined by wiktionary - (unsuitable/already present/whatever else)". I can point to this discussion to confirm you guys don't want them transwiki'd, if need be. Although if you are going to incorporate any of them over here without a transwiki, I wonder about licensing compliance...? Will you copy the en.wikipedia history to the talk page? (PS thanks for the swift responses everyone). Premeditated Chaos (talk) 06:37, 3 March 2017 (UTC)

I just dealt with a handful of them. There are a bunch that I'm not qualified to handle, like the Sanskrit, but I could always put them on the appropriate request pages if you guys want to get rid of the Wikipedia pages (not sure to what degree that's a priority). —Μετάknowledgediscuss/deeds 06:47, 3 March 2017 (UTC)
It's not really. I just have a weird obsession with clearing out old backlogs. Premeditated Chaos (talk) 09:46, 3 March 2017 (UTC)
It's not too weird. I have some categories that I intend to try to keep empty, but which capture a few items from time to time, giving me recurring satisfaction when I empty them. I find they give me more satisfaction than making a 5% dent on a ten-thousand item list of missing entries. You can also make some ad hoc lists of entries with correctable deficiencies by appropriate use of Cirrus search, especially including "insource". DCDuring TALK 12:56, 3 March 2017 (UTC)
Ahh, you get me :) Although I do have a certain 18,000-entry category on en.wiki that I have my sights on. A girl can dream. Premeditated Chaos (talk) 21:35, 3 March 2017 (UTC)
I think Wikipedia:List_of_wasei-eigo should be transwikied and placed in Category:Etymological appendices. It's been nominated for deletion. Siuenti (talk) 17:56, 10 March 2017 (UTC)

Alternative romanizations for Japanese

Our current practice is to have ō rather than ou. However, romanizations with ou do occur in the wild, and people may want to look them up. Case in point is the Twitter channel @kitunegazou which has a name not only transcribing 画像 as gazou rather than gazō or gazo, but also transcribes tsu as tu instead. I think an allowance should be made for such variations so that people can look them up and find out what they mean, per our mission statement. —CodeCat 17:37, 3 March 2017 (UTC)

So far I only edit Latin-script languages, but I think alternate romanizations should be treated like alternate spellings. If they're attestable, I don't think there's any good reason not to include them. Andrew Sheedy (talk) 07:51, 4 March 2017 (UTC)
  • This is an easily bottable job, and IMO a good idea, but it needs consensus from Japanese editors. @Haplology, Eirikr, Wyang, Atitarev, TAKASUGI Shinji, Suzukaze-c —Μετάknowledgediscuss/deeds 19:31, 4 March 2017 (UTC)
    (it's called wāpuro rōmaji BTW.) I for one am interested in seeing such entries here; they would be convenient. However, how would they co-exist with Hepburn entries? —suzukaze (tc) 20:59, 4 March 2017 (UTC)
    I oppose having entries for yet another transliteration, also non-standard. Redirects like ou/ō, hu/fu, tu/tsu are fine by me. Transliterations are not words in a language. --Anatoli T. (обсудить/вклад) 00:58, 5 March 2017 (UTC)
    I want to discuss your "non-standard" statement. I assume it arises from my "wāpuro rōmaji" claim. However,
    1. The Nihon-shiki and Kunrei-shiki standards also use つ = tu, ふ = hu, etc.
    2. The ou/ō that you mention, on the other hand, can be considered non-standard since the Nihon-shiki and Kunrei-shiki standards use おう = ô (while Hepburn uses ō).
    suzukaze (tc) 02:49, 5 March 2017 (UTC)
  • My several-cents-worth:
  1. JA romanizations are only ever soft redirects to lemma forms.
  2. Wāpuro rōmaji is not a standard that's taught in any educational materials that I'm aware of. It derives from the Latin-based spellings used by IMEs to convert into Japanese. For instance, しゃ (sha) could be entered into an IME as sha, sya, or sixya. I don't think there's a good case to be made for this as a romanization scheme to include here for headwords.
  3. Both of the romanization schemes put out by the Japanese government, Kunrei-shiki and the older Nihon-shiki, are based on a strict view of Japanese phonotactics. This is reasonable enough in a Japanese-as-Japanese context. However, this makes both of these schemes less suitable to non-Japanese speakers. The English Wiktionary is ostensibly targeted at English readers, who would be confused by ta being read as /ta/, but ti being read as /t͡ʃi/ and tyo being read as /t͡ʃo/.
  4. Both Kunrei-shiki and Nihon-shiki are also defective in that they cannot express the full range of Japanese phonemes. The phoneme つ is romanized in these schemes as tu, and pronounced as /t͡su/. The phoneme ち is romanized as ti, and pronounced as /t͡ʃi/. There is no way in either of these schemes to romanize the differentiation between /t͡su/ and /tu/, or /t͡ʃi/ and /ti/ -- even though the latter in these do now exist in the Japanese language, albeit only in loanwords.
This is basically a recapping of past discussions, which arrived at our current practice of using a modified version of the Hepburn romanization scheme for our romanized-JA entries.
Inasmuch as romanized JA entries are only ever redirects, I do not care greatly whether or not we have any entries created under such alternative schemes. However, I do feel strongly that we should keep our current modified-Hepburn romanizations for the romanized spellings shown in kanji and kana entries. I would be open to having the templates include a link to a page explaining romanization, if people feel strongly about that. ‑‑ Eiríkr Útlendi │Tala við mig 16:59, 6 March 2017 (UTC)
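As a rough illustration of the "bottable" part of this thread, alternative spellings for lookup or redirects could be derived from the existing modified-Hepburn romanizations. The sketch below is an assumption-laden simplification: the replacement table is hypothetical and incomplete, it mixes Nihon-shiki/Kunrei-style consonants with the wāpuro-style "ou" seen in the @kitunegazou example, and it always expands ō to "ou" even though some words would need "oo", which cannot be decided from the romanization alone.

```python
import unicodedata

# Ordered so that longer or more specific patterns are tried first.
REPLACEMENTS = [
    ("tsu", "tu"),
    ("shi", "si"),
    ("chi", "ti"),
    ("sh", "sy"),   # sha -> sya
    ("ch", "ty"),   # cho -> tyo
    ("ji", "zi"),
    ("j", "zy"),    # ja -> zya
    ("fu", "hu"),
    ("ō", "ou"),    # assumption: could also be "oo" for some words
    ("ū", "uu"),
]

def hepburn_to_variant(romaji):
    """Derive one alternative spelling from a modified-Hepburn romanization."""
    # Normalise so that a macron vowel is a single precomposed character.
    out = unicodedata.normalize("NFC", romaji)
    for old, new in REPLACEMENTS:
        out = out.replace(old, new)
    return out

variant = hepburn_to_variant("kitsune") + hepburn_to_variant("gazō")
```

Because such spellings are generated mechanically, they would suit soft redirects or search support better than full entries, which matches the preference several editors express in this thread.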

(possible layout? suzukaze (tc) 23:26, 6 March 2017 (UTC))

IMO reverse-transcription/transliteration gadgets and romanisation search support are the key, instead of more romanization entries. Wyang (talk) 10:29, 9 March 2017 (UTC)

Improving pronunciation assistance

A question at "Wiktionary:Feedback" prompted me to check what support we provide to assist users to figure out what our IPA transcriptions mean. Unfortunately, I found that the {{IPA}} template links the word "IPA" to "Wiktionary:International Phonetic Alphabet" and "(key)" to "w:English phonology", neither of which is easy for someone unfamiliar with IPA to understand.

I suggest that "IPA" should simply link to International Phonetic Alphabet, and "(key)" should link to a much simpler key with examples such as "w:Help:IPA for English". The latter can link to "Wiktionary:International Phonetic Alphabet" for readers who want more detailed information, but that really shouldn't be the first page that readers encounter. Could someone familiar with IPA work on this? — SMUconlaw (talk) 18:44, 4 March 2017 (UTC)

(key) links to Appendix:English pronunciation; it only takes you to Wikipedia's article on a language's phonology if the pronunciation appendix doesn't exist. —Aɴɢʀ (talk) 22:47, 4 March 2017 (UTC)
In that case something is wrong, because each time I click on "(key)" it sends me to "w:English phonology". Does anyone else experience this problem? — SMUconlaw (talk) 06:13, 5 March 2017 (UTC)
@Erutuon has been editing the backend of that recently and probably made some sort of a mistake along the way. —Μετάknowledgediscuss/deeds 06:15, 5 March 2017 (UTC)
@Μετάknowledge: Thanks for the ping. I think I hadn't added this page to my watchlist. I'll take a look at the module and figure out what's going wrong. The "key" link should point to Appendix:English pronunciation. — Eru·tuon 06:31, 5 March 2017 (UTC)
Ah yes, it was an error I introduced into Module:IPA/data. Fixed. — Eru·tuon 06:35, 5 March 2017 (UTC)
Thank you! — SMUconlaw (talk) 06:43, 5 March 2017 (UTC)

Quotations in non-lemma entries

Can non-lemma entries have quotations (see e. g. the English entry en passants) or should these be added only to lemma entries? Recently I came across the opinion that they should be added only to lemmas. I personally see some advantages in permitting them in non-lemma entries too.

  1. The quotations can show that the non-lemma form is verified. This happened with the above-mentioned en passants. In Czech, more than one form is sometimes possible; for example, the locative of Černovír can be either Černovíru or Černovíře. Because the latter seems more typical of the eastern part of the Czech Republic (Moravia), some people from Bohemia might feel it is incorrect, and so the quotation can be useful.
  2. The quotations also serve as an example of how the particular form can be used in a sentence.

Some lemmas can have loads of inflection forms (Czech nouns up to 14, verbs more than 30) and so I do not think it would be a solution to list the quotations of all non-lemma forms in the lemma entry. --Jan Kameníček (talk) 08:28, 5 March 2017 (UTC)

@Jan.Kamenicek: I definitely agree and I'm surprised that anyone disagreed--can you tell me what that person's reasoning was or maybe use {{Ping}} to encourage him to participate? I see a lot of value in providing citations of a form of a word (e.g. where there are two plurals and the usage shifts--from fish as a plural to fishes for instance) or verb conjugations or really many examples. Even shifts in capitalization and spelling would be useful for many inflected forms that wouldn't necessarily change the lemma/base form. —Justin (koavf)TCM 10:05, 5 March 2017 (UTC)
I asked because @CodeCat: removed a quotation from the above mentioned entry Černovíře and added it to the main lemma entry Černovír, where I believe it is less useful. --Jan Kameníček (talk) 10:23, 5 March 2017 (UTC)
I think in general, quotations should be at the main entry, but there can be exceptions, such as when a given nonlemma form is rare, dialectal or obsolete. For example, the quote at sense 1 of childer really does belong there, not at child. —Aɴɢʀ (talk) 10:33, 5 March 2017 (UTC)
I think it is better for rules to be regular than to have too many exceptions.
Above I also pointed out that some Czech verbs can have more than 30 non-lemma forms. --Jan Kameníček (talk) 10:48, 5 March 2017 (UTC)
As far as I know, there is no established consensus that quotations should only rarely be in non-lemma entries. --Dan Polansky (talk) 11:27, 5 March 2017 (UTC)
I can't comment about Czech, but it seems to me the main argument for having quotations of inflected forms at the main lemma is to ensure that it is evident at one location the time period over which a particular lemma appears in the language. Note the quotations at vardapet, which reflect the many alternative forms of the word. (Also, when I first started contributing here, an editor – I don't recall who – advised me that quotations for plural forms should be placed at the main lemma.) Having said that, I wouldn't object to quotations being repeated both at the non-lemma and lemma forms. What we should avoid is quotations separated according to inflected form. — SMUconlaw (talk) 15:28, 5 March 2017 (UTC)
I agree. As I understand it, the main entry of the lemma can be accompanied with quotations no matter which form of the word is used, while e. g. inflected forms can be accompanied with quotations containing the particular form. --Jan Kameníček (talk) 16:59, 5 March 2017 (UTC)
Agreed. The lemma can be thought of as representing the whole paradigm of a word, making it appropriate to include non-lemma quotations in the main entry, but it is still useful to have quotations and other information on inflection pages. Andrew Sheedy (talk) 21:47, 5 March 2017 (UTC)

I think I've added quotations for non-lemma forms before. I certainly have for dialectal forms that do not have definition lines: see ἕως (héōs). I did a search for entries with "inflection of" and {{Q|grc}}, and came up with a few inflected-form-of entries that had quotations: φίλε (phíle), Ἕλλησι (Héllēsi), τείχους (teíkhous), ἔειπε (éeipe). (The last I created, actually.) I don't see a reason to forbid it, even when the forms are standard (as is the case for φίλε (phíle), Ἕλλησι (Héllēsi), and τείχους (teíkhous)). — Eru·tuon 22:53, 5 March 2017 (UTC)

But why would you decide to put them on the non-lemma? If it's just for attestation purposes, the citation page seems more appropriate. Quotations serve as usage examples, but nobody is going to need usage examples for particular inflections of one word. —CodeCat 23:35, 5 March 2017 (UTC)
I have to think more about this, but I suppose the purpose would be so that one can find the citations corresponding to a particular nonlemma form (and so that one can indicate that a form actually exists, if someone doubts it). But if there were a way to link a nonlemma page to the quotation that attests it, then it would be fine to move the quotation to the citation page. — Eru·tuon 00:41, 6 March 2017 (UTC)
To expand, if we place the citations for ἔειπε (éeipe) in the page Citations:εἶπον, then there should probably be a way to link from ἔειπε (éeipe) to the particular citations in Citations:εἶπον that show that form. — Eru·tuon 00:46, 6 March 2017 (UTC)
I'm talking about the citation page of the non-lemma form in particular. If we place citations there, I don't think quotations would have much use. —CodeCat 00:57, 6 March 2017 (UTC)
I am afraid I cannot see the advantage of creating a separate citation page for each non-lemma form, but I see some disadvantages. It seems more practical to me to follow the current practice of adding the quotations directly to the non-lemma entry. It is practical for the author of the entry but also for the readers. If I (as a reader) visited a lemma entry and decided to have a quick look at an inflection entry, after which I would like to go quickly back, I would not want to go further to another page with quotations. Most readers are also used to the fact that if an entry has just a couple of quotations, they usually find them listed directly under the explanation of the meaning. There is no reason to do it in a different way if there are just one or two quotations. --Jan Kameníček (talk) 01:32, 6 March 2017 (UTC)
But what is the point of putting a quotation usage example in a place where people are never going to look for usage examples? Quotations aren't for attestation, that's what citation pages are for. Quotations are citations that are entered into the entry to serve as usage examples. Attestation should be strictly kept out of entries. —CodeCat 19:38, 6 March 2017 (UTC)
No, you are mistaken, see Wiktionary:Quotations. Everything you wrote is mentioned there only as a possibility which may (not "should") be done in that way. It can be practical if the number of quotations is high, but very impractical for a few of them. If I, as a reader, wanted to check just quickly whether the word really exists, I would definitely appreciate having a couple of quotations directly at the entry, and only if I were more curious would I go to see more of them on a separate page.
As for examples: If I, as a reader, decide to visit a non-lemma page, I appreciate it if an example of its usage is provided too. It is much better than having examples of all possible non-lemma forms confusingly together on the lemma page. A few of them can be fine, but not too many. As I have written above, many Czech verbs can have more than 30 forms. --Jan Kameníček (talk) 22:06, 6 March 2017 (UTC)
I'm not mistaken. These are my views on what is proper for Wiktionary. Citation pages should be used for attestation, quotations should be considered a special kind of usage example, and should always be also present on the citation page. An editor should be able to freely delete a quotation and replace it with a better quotation or simple usage example, secure in knowing that it's still preserved on the citation page. If it's not currently like this, I am stating now that I believe it should be like this.
And why would you need usage examples of a verb form? Surely that's unnecessary if you know how that form is used in other verbs? I don't need a usage example for painted because I know how the past tense works in English. Also, if you're going to present usage examples for individual inflections, where do you write usage examples for the lemma form, without it getting lost among the nonspecific usage examples? For example, on the page for captus, how would we separate examples and quotations for captus as a word with all its inflections, from those specifically for captus the nominative singular form? —CodeCat 23:19, 6 March 2017 (UTC)
OK, I did not want to offend you and I have no problem accepting it as your opinion. It seemed to me that you had not known that part of the policy because you had not put it as an opinion but as a fact. So now it is clear to me. Despite this I do not think it is a good idea to revert the work of somebody who does what the policy allows because you do not agree that the policy allows it.
As for examples and "painted": not everybody knows every language as well as you know English, somebody may find it useful.
You are right about the captus example, but applying your approach could make it even worse. Possible solution: if the other inflections were mentioned separately, similarly to how it is done in Londinium, then each of the inflections could have its own examples. It is just a quick idea inspired by the Latin entry; a complication would be that the word captus has more meanings, while Londinium has just one.
Nevertheless, "examples" are not the main point of my argument and I would not insist on it just because of them. The main point is the attestation part and that having small numbers of quotations directly in the entry is more practical than creating separate citation pages for them. In my opinion, separate pages should be created mainly for higher numbers of quotations. --Jan Kameníček (talk) 00:46, 7 March 2017 (UTC)
I reverted it because the common practice is to have very minimal non-lemma entries. There's no policy for or against putting quotations and usage examples on non-lemmas, so it's up to editor judgement, and I saw my edit as an improvement. —CodeCat 00:50, 7 March 2017 (UTC)
I generally believe that lemma forms should go on the base lemma. For one thing, I will happily cite an Esperanto noun with cites ending in a combination of -o, -oj, -on, and -ojn, since the conjugation is entirely clear. Likewise for an English word with an entirely normal plural; there is no reason cites for jat and jats shouldn't be combined. We can keep that general rule and have reasonable exceptions.--Prosfilaes (talk) 23:45, 12 March 2017 (UTC)

Wikisaurus questions

  1. Shouldn't pages be categorized by language? E.g. German thesaurus entries only.
  2. Should the English pages link to their translations? E.g. wikisaurus:drunkard to wikisaurus:juoppo.
  3. Is there a template like {{suffixsee}} but for linking to Wikisaurus in an entry? If not, there should be for formatting consistency.

I think in general it's not very user-friendly, and I hope we can fix that. Also, if WF or whoever is interested, wikisaurus:borracho/wikisaurus:borrachera would be a fun one. Ultimateria (talk) 22:37, 5 March 2017 (UTC)

@Ultimateria: You are definitely correct that this namespace is underdeveloped. An obvious example: Wikisaurus:happy exists but Wikisaurus:glad does not. If someone is looking up synonyms for "glad" then it would be nice to have a reciprocal link to "happy". (These could be maintained by a bot.) You may think, "Well, we should only have one entry for the most basic or common version of a word" but we have both Wikisaurus:anger and Wikisaurus:rage. I agree that separating them by language is also obviously necessary because the related terms for "pie" in English are going to be desserts and in Spanish they will be feet. —Justin (koavf)TCM 02:54, 6 March 2017 (UTC)
@Ultimateria, Koavf: A possible alternative to linking translations is to merge them. So, wikisaurus:juoppo could be merged into wikisaurus:drunkard, under the existing "Finnish" and "English" headings, just like mainspace entries. Wikisaurus is arranged by concept rather than by the word itself, so there should be no problem in juoppo linking to wikisaurus:drunkard#Finnish (and, if so, the part of speech and sense could be moved above the headings, or merged into {{ws header}}.) Also, instead of a template like {{suffixsee}}, something based on {{seeCites}} seems more appropriate. I've attempted to implement this as {{seeSynonyms}}, and it seems to work (see thief), but I'm not entirely sure about the conventions and policies of this project when it comes to templates. - AdamBMorgan (talk) 18:04, 29 March 2017 (UTC)

Mirandese and Old Portuguese

I've noticed there are a few Mirandese entries listed as descendants of Old Portuguese, and these make sense since they differ from others in the Astur-Leonese family that Mirandese genetically belongs to (Mirandese was influenced by Portuguese over its history, naturally). But these should be listed as borrowings, shouldn't they? Since Mirandese, in its inherited core, is actually a descendant of Old Leonese, more akin to Leonese and Asturian than Portuguese or Galician. Word dewd544 (talk) 04:42, 6 March 2017 (UTC)

In a similar vein, I saw some Asturian words listed as descendants of Old Spanish on the Latin word's page, as opposed to Old Leonese (like llenu on plenus)... Does anyone know if these are exceptions, in which Asturian took its word from Old Spanish, or if it's more of an orthographic matter, with Asturian mirroring Castilian-type spelling? There are variants of these Asturian words, in standard form with an initial -ll-, that instead start with -ts-, indicating heavier palatalization. Word dewd544 (talk) 19:01, 6 March 2017 (UTC)
Yeah, they should be listed as borrowings. Despite the borrowings and areal features that connect Mirandese to Portuguese, it remains undoubtedly an Astur-Leonese language. — Ungoliant (falai) 19:10, 6 March 2017 (UTC)

Enabling Wikidata arbitrary access


After reading this discussion, your arguments and needs, we (=Wikidata development team) are considering allowing arbitrary access to Wikidata data on English Wiktionary :)

This step was anyway planned in the Wikidata for Wiktionary project, but we will try to shorten the process so you can soon display Wikidata data in Wiktionary, test it against your uses and processes, and we can observe together how it works.

Before enabling arbitrary access, we still have several steps to achieve (deploy Cognate & Interwikisorting, deploy sitelinks for Wiktionary meta pages). I will give you more information as soon as possible. In the meantime, you can follow this task on Phabricator.

Useful reminder: Wikidata only stores concepts and not words, yet. We're working on implementing new data types dedicated to lexemes, lemmas and forms. Before this happens, we should not try to describe words in Wikidata. If you're interested in this topic, check the project page or ask me :)

If you have any other ideas, use cases, remarks about using Wikidata data in Wiktionary, please let me know! Thanks for your interest, Lea Lacroix (WMDE) (talk) 10:32, 6 March 2017 (UTC)

@Lea Lacroix (WMDE): Thanks. This promises to be a very helpful addition. —Justin (koavf)TCM 10:37, 6 March 2017 (UTC)
Thank you. Here's an additional suggestion to be implemented eventually: Wikidata could be used to store Wiktionary interwikis like it does for Wikipedia, linking dog, pt:dog, es:dog, ja:dog, etc. Maybe that can't be done right now because we are implementing Wikidata access only on the English Wiktionary at the moment, and also because of the rule "Wikidata only stores concepts and not words, yet". Still, I think that's a good idea for the future. --Daniel Carrero (talk) 11:18, 6 March 2017 (UTC)
Indeed, this is the first part of our project. The interwiki linking will not be done using Wikidata but Cognate extension, that we will deploy soon on all Wiktionaries. Lea Lacroix (WMDE) (talk) 11:50, 6 March 2017 (UTC)
OK, thank you. --Daniel Carrero (talk) 03:37, 7 March 2017 (UTC)

What is "arbitrary access"? — SMUconlaw (talk) 07:44, 7 March 2017 (UTC)

This will allow you to display data from Wikidata directly in any page of Wiktionary. With a short code like {{#label|from=Q81561}} or {{#statements:P964|from=Q81561}} the item label or any value from Wikidata will be displayed. Lea Lacroix (WMDE) (talk) 13:13, 7 March 2017 (UTC)
@Lea Lacroix (WMDE): At some point in the future — after new data types dedicated to lexemes, lemmas and forms are implemented, naturally — we'll probably use Wikidata to contain some information in all languages, for example the word "dog" in all languages (cachorro, perro, 犬, etc.)
I suppose we'll use language codes for that? es = perro, pt = cachorro, etc. But there are quite a few languages that don't have a language code, including for example the Old Swedish language. Apparently there's no ISO code for that specific language. The English Wiktionary uses "gmq-osw", while I believe the Swedish Wiktionary uses "gmq-fsv" (see the wikitext of sv:siælver). Do you know yet if Wikidata should be able to store Old Swedish using "gmq-osw" or "gmq-fsv", or maybe another code? Maybe Wikidata will store information using language names instead?
As was discussed here in 2013, most if not all of our exceptional language codes are not ISO-compliant. Neither "gmq-osw" nor "gmq-fsv" is: "gmq" means North Germanic (which is correct), but "osw" and "fsv" are currently unassigned, and could be assigned to two other languages by ISO in the future.
I'd probably support this idea (which others may oppose or support too): using new, ISO-compliant, codes for all the languages that don't have an ISO code, especially by taking advantage of the range "qaa–qtz", the codes 'reserved for local use'.
For Old Swedish, maybe we could store data in Wikidata using the code "gmq-qos" if possible, or just "qos". I realize it would take work to change codes, but it's going to take work anyway to move content to Wikidata from scratch, so we might as well do it using a single set of language codes that anyone can use (that is, all the Wiktionaries, and also other projects outside of Wiktionary). --Daniel Carrero (talk) 06:03, 11 March 2017 (UTC)
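The reserved-range idea in the comment above can be sketched in a few lines of Python. This is only an illustration, assuming the "qaa–qtz" local-use range mentioned in the discussion; the function names are hypothetical and nothing here reflects how Wiktionary modules or Wikidata actually validate codes.

```python
import re

# Three-letter codes from "qaa" to "qtz": first letter q, second a-t, third a-z.
_LOCAL_USE = re.compile(r"q[a-t][a-z]")


def is_local_use_code(code: str) -> bool:
    """Return True if `code` lies in the local-use range qaa to qtz."""
    return bool(_LOCAL_USE.fullmatch(code))


def tail_is_local_use(code: str) -> bool:
    """Check only the last hyphen-separated part, e.g. "qos" in "gmq-qos"."""
    return is_local_use_code(code.rsplit("-", 1)[-1])


if __name__ == "__main__":
    for candidate in ("gmq-osw", "gmq-fsv", "gmq-qos", "qos"):
        print(candidate, tail_is_local_use(candidate))
```

Under this check, the proposed "gmq-qos" and "qos" pass, while the current "gmq-osw" and "gmq-fsv" do not, because "osw" and "fsv" fall outside the reserved range.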
For the record, I observe this with fear that actual lexicographical data will be stored at Wikidata and removed from Wiktionary, which to me would be an unfortunate development. --Dan Polansky (talk) 08:59, 11 March 2017 (UTC)
My opinion about some possible uses of Wikidata:
  • I'm not happy that Wikidata is CC0 (public domain) as opposed to the CC-BY-SA and GFDL licenses that Wiktionary uses.
  • That said, I would probably support moving the glosses of many foreign language (i.e., not English) lemmas to Wikidata, because say, all translations of the basic fruit sense of apple may have the same gloss, and the gloss might be stored on Wikidata. If you edit the gloss for "apple", it's going to be reflected in all languages. The same gloss may be added in etymologies, too. I would like if all etymologies that use "-cide" had the same gloss when applicable.
  • I would probably oppose moving all English lemmas to Wikidata, because a sense in English is usually longer (so it's more content to protect using CC-BY-SA instead of CC0) and it's likely to be in only one page anyway. If the idea above is implemented, English senses could have the same senseid as the glosses in Wikidata, but the actual definitions in the English section would still be in Wiktionary.
  • I would probably support getting the data for all place name definitions from Wikidata... For example, Wikidata could have names of cities, capitals, etc. and Wiktionary would use it to automatically build the senses and categories.
  • Also, I would be OK with moving the contents of tables like {{table:chess pieces/fo}} to Wikidata.
--Daniel Carrero (talk) 09:23, 11 March 2017 (UTC)
@Daniel Carrero Thanks for your input.
To answer about language codes: for the next step, enabling arbitrary access for Wiktionary, you'll be able to display any item from Wikidata, so if a language, dialect, etc. exists as an item, you can display its content. If it doesn't exist, you can create it on Wikidata without needing an ISO code.
For the future, when lexemes and lemmas will be stored in Wikidata, you will also be able to use any specific language. We will need a code to represent it in the RDF model, but this code doesn't have to be the ISO code. Using a new code system is possible, this is a topic we should discuss in the future with the Wikidata community. Lea Lacroix (WMDE) (talk) 11:27, 13 March 2017 (UTC)
If non-ISO codes are added to Wikidata, would there be a way to indicate whether a code is ISO or not, or where it originates from; for instance, to indicate that the code originated from Wiktionary? (I don't know anything about how Wikidata works.) — Eru·tuon 20:02, 13 March 2017 (UTC)
I think I can try to guess this answer. See wikidata:Q1860, which is the page about English. It already contains some data about the language, such as writing system, and different ISO codes. Probably our templates/modules could just automatically look if each language has an empty value for both ISO 639-1 and ISO 639-3, or alternatively we may discuss about adding the code "mis" ("uncoded language") to all the languages that are, in fact, uncoded. I'm particularly not a fan of adding where the code comes from, but I guess a new property ("language code origin" or something) would be able to do the job. --Daniel Carrero (talk) 20:20, 13 March 2017 (UTC)
  • For the record, the fact that Lea Lacroix says things like "[f]or the future, when lexemes and lemmas will be stored in Wikidata [...]" makes me very concerned. There is no consensus at English Wiktionary, or in the Wiktionaries in general, about how much lexicographical data should ever be on Wikidata. I often feel that people at Wikidata are driving this without concern for what is actually in our best interests, and that these kinds of conversations need to start at the most active Wiktionaries, not on stepwise plans drafted at Wikidata. —Μετάknowledgediscuss/deeds 22:02, 13 March 2017 (UTC)
  • @Metaknowledge: But anyone can use our data and Wikidata is a perfectly appropriate platform for lexicographical data. Integration with Wiktionary is one of the last hurdles of Phase 1--there have been several years to discuss this and a kind of live experiment with structured data at Omega, so I don't know what else to say. Is there a particular problem you have? —Justin (koavf)TCM 04:20, 14 March 2017 (UTC)
  • It's not "perfectly appropriate", and anyone who mucks around in the technical side at en.wikt realises that. If you want a starter, take a look at some of the issues Daniel Carrero brought up in this discussion (which include some potential solutions, because he's more technically proficient than I am). —Μετάknowledgediscuss/deeds 05:20, 14 March 2017 (UTC)
  • I echo Μετάknowledge's concerns -- Wikidata is not the appropriate place for any concept-based data store. Even the supposedly simple example given above of dog is extremely more complicated than discussion has so far touched upon -- What of the unattractive female sense? What of the coward sense? What of the morally reprehensible person sense? What of the any of various mechanical devices for holding, gripping, or fastening something, particularly with a tooth-like projection sense? What of any of the verb senses? We can clearly say that "Wiktionary [LANG A] has a page for term XXX, and so does Wiktionary [LANG B]". But we cannot say that "term XXX in [LANG A] is equivalent to term XXX in [LANG B] for all senses".
Even glosses can be much more complicated than single words. Frankly, in almost all cases, I feel that our single-word glosses are deficient -- different languages use terms differently. Take (sakura), for instance -- the entry could just say "cherry", but that would do our readers a disservice, in that the Japanese term carries many more different meanings and usages than just cherry. Etc., etc.
Languages are messy. Cross-linguistic comparisons and definitions are necessarily messy, squared. Wikidata assumes one-to-one relations, and is thus entirely ill-suited to the cross-linguistic use-case of any multilingual lexicography, be it Wiktionary or some other dictionary project. ‑‑ Eiríkr Útlendi │Tala við mig 17:32, 21 March 2017 (UTC)
I wouldn't mind using Wikidata for short glosses that can be repeated in many languages. For example, maybe the main sense of the word oxygen can have the same gloss in many languages, and maybe the same can be said for each element in the periodic table, each chess piece, and each day of the week. I think it's important that you asked about a lot of different senses of dog, and we do need to discuss about them. But in my opinion, we can simply not use Wikidata for these senses. The specific project I had in mind is this (sorry for repeating): "Use Wikidata to store short glosses used repeatedly in many languages. For other words, senses and glosses, don't use Wikidata." Feel free to discuss about that. Once again, I'd be OK with using Wikidata to contain information about place names in all languages, including automatic senses and categorization. --Daniel Carrero (talk) 17:47, 21 March 2017 (UTC)
@Eirikr: Your argument makes no sense: if that were true, then Wiktionary would be impossible itself... These pages are literally stored in a database anyway. There is no reason why a definitional database can't exist in principle any more than a multi-lingual dictionary can. —Justin (koavf)TCM 19:54, 21 March 2017 (UTC)
To be fair, this statement that Eirikr said above is true: "we cannot say that 'term XXX in [LANG A] is equivalent to term XXX in [LANG B] for all senses'". Still, nobody said otherwise in this discussion. I'm curious as to how the Wikidata sense database is going to work. In theory, it's possible for a dictionary database to work like this: the word dog has senses 41183, 98482, 61381, 98398, 43577, 87676, etc. It should be possible to build a database for all senses and words, and yes, move everything from Wiktionary to Wikidata. I'd oppose doing all that work, but it sounds doable in principle. I support only doing the specific sense moves I said above, and oppose the rest. --Daniel Carrero (talk) 20:05, 21 March 2017 (UTC)
There is a difference between using a database, and using specifically Wikidata. Wikidata was designed mostly with Wikipedia in mind, where it makes a lot of sense. Regarding interwikis, our interwiki links are relatively simple, we don't need something as complicated as Wikidata. We would only need a simple database that knows which pages exist in which language's Wiktionary. Then a bot wouldn't need to update the actual pages. Regarding senses, I would say something more extreme than Eirikr: We cannot say that any given sense of any given term in any given language is exactly equivalent to any given sense of any given term in any other given language. For example, we cannot even say that (what is currently) sense 1 of English dog is equivalent to (what is currently) sense 2 of Portuguese cachorro. --WikiTiki89 22:07, 21 March 2017 (UTC)
Currently, Portuguese cão is defined as "dog" and Spanish perro is defined as "dog" too. These definitions are the same. I wonder if it's OK to add the same gloss in these words, like this: "dog (a domestic canine mammal, Canis lupus familiaris)". This just means that the same gloss would be used for the English word in separate language sections that refer to the English word. Maybe the words cão, perro and dog are not the same at some level, but if the same gloss can be used for the English word when it appears in different language sections, then that's enough, at least for me. I think maybe Wikidata could store that gloss. Feel free to disagree if you want. Maybe that idea won't even work for the English word dog when it is referenced in all other language sections, but the other words I mentioned above (chemical elements, chess pieces, days of the week) may present the opportunity to use the same English gloss across multiple language sections. If nothing else, I still think that Wikidata could be used to generate definitions and categories for place names, which already follow a repeated pattern as it is. Contrary to your "extreme" statement, I believe Nova Iorque and Nueva York are probably "the same word", at least to the point that they probably could be defined using the same sense texts. --Daniel Carrero (talk) 23:21, 21 March 2017 (UTC)
That's also a problem. I've always been an advocate of abandoning the distinction between "definitions" in English and "translations" in other languages. They should all be thorough and independent definitions. The given sense of the English word "dog" is not going to have the same boundaries and connotations as the given senses of Portuguese "cão" and Spanish "perro". For example, if I modify the English definition of "dog", my modifications might not apply to other languages that link to that sense. This might work better for chemical elements and such because chemistry has become an international field and much of the terminology has been standardized, but there is only a limited set of words that that applies to and it's still not a strong guarantee that they are exactly the same. --WikiTiki89 15:46, 22 March 2017 (UTC)
  • @Koavf: Wiktionary is indeed stored in a database, but decidedly not in any data-normalized fashion: all our data is stored under the graphical representation of the lemma form, which lends itself well to the page-to-page interwiki linkage numerous posters have talked about -- but this is not what other Wikidata proponents appear to be advocating. Using Wikidata for the entirety of Wiktionary would require refactoring all our data so that each individual sense datum is stored as its own data object. This would be a monumental task, and also not terribly useful. Even just using Wikidata for gloss information is fraught with complication -- if I intend to gloss the Japanese term ぶす (busu) as dog, which sense of dog do I mean? (Since ぶす does not yet exist, I'll explain that this means unattractive female.) Using Wikidata as a multilingual back end where gloss XXX in language A must correlate to some other gloss in Language B requires that we know this at the editing stage. Leaving glosses as text data entered by individual editors might be inelegant in terms of data reuse, but it also provides the flexibility and transparency that we need.
My underlying sense here is that we have more than enough work ahead of us in simply creating, expanding, and maintaining entries. I do not see sufficient value to justify the enormous effort involved in designing a data structure that could apply to multilingual lexicography, and the work needed to suss out the inevitable pitfalls and then design around them -- let alone the work required after that to use such a data structure effectively. ‑‑ Eiríkr Útlendi │Tala við mig 00:16, 22 March 2017 (UTC)
@Eirikr: There would be no more effort than has already been applied to Wiktionary so far. We have definitions, quotations, translations, etc. and they are already in templates. You just have a bot store them in a database so that "dog", sense 1 points to "perro" and "dog", sense 2 points to "ぶす". Just like we have at the moment. Have you ever encountered OmegaWiki? What do you make of it? —Justin (koavf)TCM 02:30, 22 March 2017 (UTC)
But that is dangerous, see my comments above. --WikiTiki89 15:46, 22 March 2017 (UTC)
@Wikitiki89: "For example, if I modify the English definition of "dog", my modifications might not apply to other languages that link to that sense." But that's more of a reason to have structured data--then we can see when a particular field was changed and how. Either way, this isn't a problem for Wikidata per se, it's a problem of Wiktionary in general (a pretty fundamental one). I guess you could argue that Wikidata integration would just entrench the problem further but providing translations of terms is one of the most basic functions of Wiktionary as a multi-lingual dictionary. I'm not sure how your alternative would work. —Justin (koavf)TCM 17:10, 22 March 2017 (UTC)
You can see when it was changed now too. The problem is now you have to go and unlink all the senses in other languages to which the change does not apply. An alternative that I think would work is if we allowed many-to-many relationships between senses and abandoned the idea of a centralized gloss. Every foreign language term would still need to be independently defined, but now we would have automatically generated translation tables, that can exist even without an English term. I don't know if Wikidata has this capability. --WikiTiki89 17:44, 22 March 2017 (UTC)
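The many-to-many alternative described in the last comment can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the `SenseLinks` class, the `(language, term, sense number)` keys and the example links are all hypothetical, and it says nothing about whether Wikidata supports such a model.

```python
from collections import defaultdict


class SenseLinks:
    """Symmetric many-to-many links between senses, with no central gloss.

    A sense is identified by a hypothetical (language code, term, sense number)
    tuple. Each term keeps its own independent definition elsewhere; this
    structure only records which senses editors consider translations.
    """

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, a, b):
        self._links[a].add(b)
        self._links[b].add(a)

    def unlink(self, a, b):
        # Removing one link leaves every other link untouched.
        self._links[a].discard(b)
        self._links[b].discard(a)

    def translations(self, sense):
        """Build a translation table {language: [terms]} for one sense."""
        table = defaultdict(list)
        for lang, term, _number in sorted(self._links.get(sense, ())):
            table[lang].append(term)
        return dict(table)


if __name__ == "__main__":
    links = SenseLinks()
    links.link(("en", "dog", 1), ("pt", "cão", 1))
    links.link(("en", "dog", 1), ("es", "perro", 1))
    links.link(("en", "dog", 1), ("pt", "cachorro", 2))
    links.link(("pt", "cão", 1), ("es", "perro", 1))
    # A table for a Portuguese sense, generated without starting from English.
    print(links.translations(("pt", "cão", 1)))
```

Because links are stored pairwise, a change to one definition only requires reviewing that sense's own links rather than a shared gloss that every language depends on.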

Enabling access to Wiktionary:AutoWikiBrowser

Please may I have access to the AutoWikiBrowser as I have 900+ edits and I am also an autopatroller on Wiktionary. Thank you. Pkbwcgs (talk) 19:31, 6 March 2017 (UTC)

What do people think? Normally I am inclined to grant these requests but Pkbwcgs has been trying to get admin privileges since a month after joining Wiktionary so I'm a bit leery. Benwing2 (talk) 03:27, 22 March 2017 (UTC)
He has no need for any of these tools, and they should not be granted. —Μετάknowledgediscuss/deeds 03:30, 22 March 2017 (UTC)

change typeface and font of a certain script

Hi, I'd like to know how to permanently set the typeface (mainly size and font) of the Arabic script, as well as of the Chinese characters. Thanks in advance. —This unsigned comment was added by Backinstadiums (talk • contribs).

You can change it by adding code at Special:MyPage/common.css. Example code:
.Arab {
	/* Arabic script */
	font-size: 180%;
	font-family: "Arabic Typesetting";
}

.Hani {
	/* Chinese characters (Han script) */
	font-size: 200%;
	font-family: "SimSun", "Microsoft Yahei";
}
Hani is the script code for the Chinese script and Arab is for Arabic script. You can change the font-size and font-family to however you see fit. —suzukaze (tc) 10:02, 9 March 2017 (UTC)
@Suzukaze-c: Hi again, I've just noticed that it does not apply in, for example, category pages, so I'd like to know if it is possible to implement such codes for any term appearing throughout the whole Wiktionary, and even Wikipedia. Thanks in advance. --Backinstadiums (talk) 13:32, 5 April 2017 (UTC)
You can also change it globally by changing your browser's font settings. The methods to accomplish this vary by browser, obviously, but I believe that all major browsers allow you to set your own fonts. - TheDaveRoss 13:51, 5 April 2017 (UTC)
@TheDaveRoss: I asked in Chrome official forums, and was told about Advanced Font Settings, yet it is completely useless. --Backinstadiums (talk) 15:01, 5 April 2017 (UTC)
I haven't used the extension myself, bummer that it isn't effective. There are other extensions which allow you to use your own custom CSS globally, I have also not used those. - TheDaveRoss 15:14, 5 April 2017 (UTC)
The reason that Category:Arabic nouns of place does not have class="Arab" and lang="ar" applied to each link is that it doesn't have a catfix in the category template ({{ar-noun deriv catboiler}}) to change the styling of each link. I've added that, so the Arabic noun derivation categories should start looking better. — Eru·tuon 21:31, 5 April 2017 (UTC)

Overview #2 of updates on Wikimedia movement strategy process

Note: Apologies for cross-posting and sending in English. This message is available for translation on Meta-Wiki.

As we mentioned last month, the Wikimedia movement is beginning a movement-wide strategy discussion, a process which will run throughout 2017. This movement strategy discussion will focus on the future of our movement: where we want to go together, and what we want to achieve.

Regular updates are being sent to the Wikimedia-l mailing list, and posted on Meta-Wiki. Each month, we are sending overviews of these updates to this page as well. Sign up to receive future announcements and monthly highlights of strategy updates on your user talk page.

Here is an overview of the updates that have been sent since our message last month:

More information about the movement strategy is available on the Meta-Wiki 2017 Wikimedia movement strategy portal.

Verbatim copying of open-ended copyright notices

   There are "over 175" instances of "Merriam-Webster, 1996–" having been copied from respective M-W Web pages -- where it presumably should be construed as meaning "we at M-W intend to keep our copyright enforceable as long as feasible, even tho our lawyers say it's too much trouble to update this notice every January to verify that our intention has not lapsed" -- to this site, where it becomes IMO ambiguous between implying

  • (the reasonable pablum) "Wikt finds it plausible that M-W has so far succeeded in protecting its copyright on any changes it has made to their cited Web page since '96"


  • (the more relevant but inherently false) "Wikt checks the corresponding M-W page annually to verify that the page either has undergone no change from its '96 content, or else (1) it now bears a newer copyright date and (2) we have verified that as of that new date it continues to support our invocation of that page (tho we don't bother to reflect that new date here)."

   I'm about to begin the process of replacing those 176 citations with calls to a template of my devising, and i'll periodically post, as further comments below in this section, how my progress is going.
--Jerzyt 02:13, 10 March 2017 (UTC)

@Jerzy: I honestly am having trouble understanding what you are requesting/complaining about. Those are all transclusions of {{R:Merriam-Webster Online}}. What is your concern? —JohnC5 05:34, 10 March 2017 (UTC)
   @JohnC5 Thanks for taking an interest. I understand clearly the different construction "Merriam-Webster, 1996". Similarly, "Merriam-Webster, 1996, 1997, 1999" would mean to me "We have copyright not only for the content of the 1996 edition, but also for the revisions that issued in '97 and '99." In my understanding, "Merriam-Webster, 1996–" could plausibly mean that in some years -- say '97 and '99 -- they updated pages of, or added pages to, their site, and those changes (and perhaps the site as a whole) are protected by copyright in the respective years (and extend the date by which at least those new portions are protected, respectively, by 1 and 3 years beyond the original copyright's expiration).
So "1996–", when it appears on the work itself presumably may mean "We anticipate updating this in the future, but we haven't yet had occasion to do so." But unless i misunderstand the meaning of the terminal hyphen (when it appears on the copyright owner's site), it is at best confusing and almost surely misleading when it is copied (as we did) onto another work whose copyright ownership is different: we are not relying on later editions/versions of their site (which we can neither anticipate as to time or content -- nor can we afford to maintain reliable surveillance on their future editions), so it is unworkable for us to try to keep our citation up-to-date reflecting their subsequent changes. We are merely asserting, accurately, that we relied upon what is (apparently; say so if you construe "1996–" differently) their unmodified '96 edition, which scholarly practice requires we be able to cite even if they later, in a new edition, omit/replace the matter we cited. (That's so, whether they are correcting an error, or reflecting later developments that they feel make the old version of inadequate interest to their target readership/market.)
   It is thus that i await some further or contrary insight of yours as to what "1996–" means on their site, or how it can be anything but a stumbling block when copied onto Wikt, in one citation, or in scores of them.
--Jerzyt 06:51, 10 March 2017 (UTC)
Ah, I see what you are talking about, but I'm pretty sure that no one on here cares at all about this matter—no offense intended. As long as we aren't copying the content of MW verbatim but instead are analyzing and synthesizing the content, the minutiae of the cited year don't really matter. Most importantly, this citation is used on a tiny number of pages comparatively. You are thinking a great deal about a reference template that almost nobody uses anyway. —JohnC5 07:04, 10 March 2017 (UTC)
As the editor who updated the {{R:Merriam-Webster Online}} template, my point of view is that information like "1996–" in imprint statements such as those appearing in library catalogues, etc., refers to the date of publication of the source and makes no assertion about copyright. It simply records that the Merriam-Webster Online website was launched in 1996, and so content on the website was published between 1996 and the present day. If it was intended to be the date when the copyright was registered, the practice is to add "©" before the date. Editors are free to use |accessdate= to indicate the date when the website was consulted, and |source= if they wish to indicate the exact source relied on by the website. Perhaps you can explain how the template of your devising is going to be different. — SMUconlaw (talk) 12:55, 10 March 2017 (UTC)

Search results for multiple words

The top two search results for "hardly ever" should be hardly and ever, assuming it doesn't have an entry. Siuenti (talk) 17:44, 10 March 2017 (UTC)

I think this is something that should be brought up at https://phabricator.wikimedia.org/ for the https://phabricator.wikimedia.org/project/profile/209/ and https://phabricator.wikimedia.org/project/profile/1849/ teams, not here. —suzukaze (tc) 05:46, 11 March 2017 (UTC)
I'd support such a change in search behavior. Ever seems to be a stop word in our search, evidenced by it not appearing at all in the search results, even when the words are reversed. DCDuring TALK 19:09, 11 March 2017 (UTC)
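The requested behaviour can be sketched in Python as follows. This is a toy illustration only: `search` and the `entries` set are hypothetical, and it does not describe how the site's search backend ranks results or handles stop words.

```python
def search(query: str, entries: set) -> list:
    """Return the exact entry if it exists, else entries for each word.

    Words are returned in query order, and no word is dropped as a stop word.
    """
    if query in entries:
        return [query]
    return [word for word in query.split() if word in entries]


if __name__ == "__main__":
    entries = {"hardly", "ever", "hard"}
    print(search("hardly ever", entries))
```

With no entry for the full phrase, the individual words become the top results in the order they were typed.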

Taking bibliographic information from Wikidata for use in quotation templates

Since we're a dictionary and not a bibliographic database, I think it would be good to start thinking about how to store quotation information on Wikidata rather than here. Benefits would include more consistency between entries, less tedious research, and more comprehensive references. The downside would be the wikicode would turn into something like {{#label|from=Q81561}}, which might be hard to edit- maybe there's a way to make it more descriptive. Above there was a statement that we would soon have "arbitrary access" so I believe we'll be able to do this on our own. DTLHS (talk) 18:42, 10 March 2017 (UTC)

@DTLHS: This is definitely something that Wikidata can and should do. It's used for references in statements and would be essentially the same. —Justin (koavf)TCM 05:43, 11 March 2017 (UTC)
I don't like this. I don't like creating substantial dependence of Wiktionary on Wikidata. If such a dependence arises, disputes will have to be resolved on a foreign project, Wikidata, and this will give increased power to those editors who will master the intricacies of understanding the relations between the two projects and where to edit what. --Dan Polansky (talk) 09:03, 11 March 2017 (UTC)
Still, is there any problem with the specific project of using Wikidata to hold quotation information, like title, year, author, etc. of a certain book? These are just objective facts about the book. What disputes could we have about this? --Daniel Carrero (talk) 10:00, 11 March 2017 (UTC)
I support using Wikidata to store quotation information, too. (I guess I was the first one to propose this? ;) I proposed it last month!)
Wikidata already stores information about some books: for example, wikidata:Q43361 is about Harry Potter and the Philosopher's Stone. Apparently Wikipedia already uses book information in Wikidata for its own purposes. A new template called {{auto quote|Q43361}} might serve to quote that specific book. We could also use a parameter "page": {{auto quote|Q43361|page=78}}.
As said above, I like that it would probably fill the publisher, year (maybe month, day if we want), etc. automatically.
One possible issue is that the same book may have different editions with different ISBNs... Also, the same quotation might be found in different pages in different editions.
About "might be hard to edit- maybe there's a way to make it more descriptive"... Maybe we could add an "edit" link to the Wikidata page automatically at the end of every quotation... Like this: "(year) (title) (author) (etc) [edit]" --Daniel Carrero (talk) 10:00, 11 March 2017 (UTC)
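For illustration only, the {{auto quote}} idea discussed above might work roughly as in the following Python sketch. Everything in it is an assumption rather than an existing template or API: the function name `auto_quote`, the field names, the output layout, and the small `SAMPLE_ITEMS` table that stands in for data a real implementation would read from Wikidata.

```python
# Hypothetical stand-in for bibliographic data that would live on Wikidata.
# Only the item ID Q43361 comes from the discussion; the field names are assumed.
SAMPLE_ITEMS = {
    "Q43361": {
        "title": "Harry Potter and the Philosopher's Stone",
        "author": "J. K. Rowling",
        "year": 1997,
    },
}


def auto_quote(item_id, page=None, items=SAMPLE_ITEMS):
    """Build a quotation header from an item ID, with an optional page number.

    The layout (year, author, title, page, then an edit link) is only one
    possible reading of the "(year) (title) (author) (etc) [edit]" suggestion.
    """
    record = items[item_id]
    parts = [str(record["year"]), record["author"], record["title"]]
    if page is not None:
        parts.append(f"page {page}")
    # Link back to the item so that editors can correct the stored data.
    edit_link = f"https://www.wikidata.org/wiki/{item_id}"
    return ", ".join(parts) + f" [edit: {edit_link}]"


print(auto_quote("Q43361", page=78))
```

The point of the sketch is that the entry would store only the item ID and the page, while title, author and year are filled in from one shared record. It does not address the edition problem raised above, where the same book has several ISBNs and different page numbering.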

mineral water

Why is "Terms with manual transliterations different from the automated ones/hi" coming up in the categories? ---> Tooironic (talk) 01:28, 12 March 2017 (UTC)

IMO it's pretty self-explanatory. —suzukaze (tc) 01:30, 12 March 2017 (UTC)
I'm seeing similar weirdness at public transport. How is this useful? ---> Tooironic (talk) 01:31, 12 March 2017 (UTC)
It's indicating that the Hindi translation of "public transport" has a manual transcription that is different from the one generated by Module:hi-translit. As the category Terms with manual transliterations different from the automated ones/hi has not yet been created, the category shows up on the bottom of the page. Once {{auto cat}} has been added to the category page, the category will be hidden. — Eru·tuon 01:36, 12 March 2017 (UTC)
There. Now the category is hidden. You too can hide these categories by adding {{auto cat}} to their pages. — Eru·tuon 01:39, 12 March 2017 (UTC)
Thanks! ---> Tooironic (talk) 02:11, 12 March 2017 (UTC)

Request category vote -- future 2nd vote

At some point after some of the current votes end, I'd like to create a 2nd version of the vote Wiktionary:Votes/2016-07/Request categories, which ended with "no consensus" (8-6-3) in October 2016. The vote was about renaming the request categories like this: from "Category:English entries needing quotation" to "Category:Requests for quotations in English".

I'd like to use the category names that @Wikitiki89 proposed in the vote above. The full list is also copied in this October 2016 discussion, where I also added names for the umbrella categories: Wiktionary:Beer parlour/2016/October#"Request categories" vote -- no consensus.

Apart from the 8 supporters, if some people opposed the vote above or abstained only because of the specific category names, I'd like to think there's a chance for the 2nd vote to pass, because of the new category names. I still think that the names currently used in our categories are problematic, including all the names with the word "needing" and all the ungrammatical names. I'd like to introduce this as a consistent category system, too.

Now, I'll ping everyone who participated in the previous vote or its talk page: @Erutuon, Bcent1234, I'm so meta even this acronym, -sche, Enoshd, Dan Polansky, This, that and the other, Equinox, Xbony2. --Daniel Carrero (talk) 12:06, 12 March 2017 (UTC)

I created the 2nd vote. Here it is: Wiktionary:Votes/2017-03/Request categories 2. --Daniel Carrero (talk) 10:15, 16 March 2017 (UTC)

User: adding unattested Spanish words

I admit, my Spanish is not what it used to be, but I started to suspect that some of the contributions from anon Special:Contributions/ were fishy. Calling out to all the Spanish speakers around here to keep an eye on this anon and please feel free to revert any changes I might have done in error. --Robbie SWE (talk) 19:27, 13 March 2017 (UTC)

Make structured data for Wiktionary more useful and usable by learning more about common tasks Wiktionary editors perform

Hello, my name is Jan Dittrich and I create the concepts for the user interface for structured data for Wiktionary. To do this better, I would like to learn more about common tasks Wiktionary editors perform: How do you add a new word in Wiktionary? How do you watch and improve data?

Knowing more about this, I can consider these tasks better in future design decisions.

In my experience, learning about this is often best done via demonstrating these tasks – simply by doing them and letting someone else watch. This can be easily done remotely via Google Hangouts or https://meet.jit.si/

If you would like to help me and the team to improve structured data for Wiktionary, and would let me (virtually) look over your shoulder for about 30 to 60 minutes, please write to jan.dittrich(a)wikimedia.de.

--Jan Dittrich (WMDE) (talk) 08:27, 15 March 2017 (UTC)

This is a good idea, and really thoughtful of you. Thanks, Jan. —Μετάknowledgediscuss/deeds 16:39, 15 March 2017 (UTC)

etytree, a graphical and multilingual etymology dictionary based on Wiktionary: feedback and endorsement

Hi all!

I have now completed the first phase of the project and I’m asking for a renewal and for your feedback! Please add your comment at the end of the Renewal page.

A link to the demo is demo, while a link to the first release is tools.wmflabs.org/etytree.

Looking forward to your comments on the grant page! Epantaleo (talk) 14:23, 16 March 2017 (UTC)

This is really cool! I wish I could click on the nodes to see the page on Wiktionary though. —Aryamanarora (मुझसे बात करो) 16:01, 2 April 2017 (UTC)

Symbols categorised as lemmas

Right now, Module:headword treats letters and other symbols as lemmas, and probably rightly so. However, the category Category:English symbols is not actually a subcategory of Category:English lemmas, but is in its own category tree. Should this be changed? —CodeCat 15:10, 16 March 2017 (UTC)

Eh, I guess, for consistency. - -sche (discuss) 22:10, 18 March 2017 (UTC)

The Ndebele languages

@Metaknowledge The situation with the languages called "Ndebele" is a bit of a mess. From what I can gather from this list, there are actually three separate Ndebele languages: Ndebele (Northern - South Africa), Ndebele (Northern - Zimbabwe) and Ndebele (Southern). The issue is that there seem to be two different languages called Northern Ndebele, one a Zunda language (with the class 7/8 prefixes isi-/izi- like Zulu) and one a Tekela language (with si-/ti- like Swazi). Wikipedia only has Northern and Southern Ndebele articles, with no mention of a third. I'm having a lot of trouble figuring out which is which, whether there really are three or just two, and which of them is in the Tekela group. —CodeCat 16:57, 17 March 2017 (UTC)

As far as I can tell, this is how it stands:
Ndebele (Northern - Zimbabwe) has ISO code [nd] and is called "Northern Ndebele" here. It is a Zunda language.
Ndebele (Southern) has ISO code [nr] and is called "Southern Ndebele" here. It is a Tekela language.
Ndebele (Northern - South Africa) aka Sumayela Ndebele lacks an ISO code and thus has no coverage here. It is a Tekela language.
The question then is whether we should change this setup. Though the names we use are not necessarily ideal, it appears that grouping Ndebele (Northern - South Africa) into [nr] is fine, as the lects are extremely similar, but this may be more a result of lack of study than anything else. TBL notes that "Sumayela Ndebele is not recognized as an official language, or for education." Ideally, we would want to find a study to be sure that the dialect continuum can be safely cut this way. —Μετάknowledgediscuss/deeds 04:54, 18 March 2017 (UTC)
I wonder if there's something wrong with the information on that site then. The Zimbabwean variant is labelled "Northern Ndebele (Sindebele / isiNdebele)", but at the same time it says it's an offshoot of Zulu (which Wikipedia also says), and it's odd for a Zulu descendant to have no augment. At the same time, Southern Ndebele is given as "isiNdebele" on its page, and Wikipedia also gives this as the name of the language, which has an augment and might be a bit weird for a Tekela language. Wikipedia's page on Southern Ndebele gives examples which have z rather than the t of the Tekela languages, e.g. uLwezi for "November" contrasting with Swazi Lweti. —CodeCat 13:16, 18 March 2017 (UTC)

Vote: Reference templates and OCLC

FYI, I created Wiktionary:Votes/2017-03/Reference templates and OCLC.

Let us postpone the vote as much as discussion requires. --Dan Polansky (talk) 07:47, 18 March 2017 (UTC)

The scope of this vote is not sufficiently broad. @I'm so meta even this acronym, Smuconlaw, I, and others for a while now have been fighting Dan to add legitimate citation information to templates only to have it removed because Dan feels that it is "ugly" or "cluttered". An example of such an argument may be found at {{R:Gaffiot}} among others. Now, I'll admit to being overly aggrieved, but this vote should actually be whether citations should be full academic citations like Smuconlaw and I like or simpler stubs as Dan prefers. There are a variety of examples on both sides, but I'd honestly say that our citations should be maximalistic. I don't particularly intend to go around adding OCLC and ISBN codes to everything, but I do not want people to be allowed to remove citational information because of appeals to style or tradition. —JohnC5 08:04, 18 March 2017 (UTC)
I have intentionally limited the vote subject to OCLC only. The vote does not resolve the full disagreement between me and the ornamentalists. It only deals with one item. I have even intentionally omitted the subject of ISBN.
As for "full academic citation", please show us an online academic article that uses ISBNs or OCLCs in its references section. --Dan Polansky (talk) 08:12, 18 March 2017 (UTC)
I would prefer not to be termed an ornamentalist but a maximalist, thanks. Again, I personally don't care about ISBN or OCLC, but all those other useful things you have removed like the location, the translation of the title, the full names of the author, editors, translators, volumes, editions, and most of all, a standardized formatting across all templates. —JohnC5 08:21, 18 March 2017 (UTC)
I prefer "ornamentalists" in keeping with the core of my argument. As for "full academic citation", please show us an online academic article that uses ISBNs or OCLCs in its references section. I mean, really. If you keep making claims about standard citation format, please show us an example of references in real academic publishing showing ISBNs or OCLCs. Maybe there are some, I don't know. --Dan Polansky (talk) 08:27, 18 March 2017 (UTC)
Once again, I'm not arguing on behalf of OCLCs or ISBNs because they are not part of academic citation. You can ban them for all I care. —JohnC5 08:31, 18 March 2017 (UTC)
I created the vote about OCLC. For a discussion on other details, I checked Britannica online article on tiger[1]. There, the section External links is very plain, even too much so, perhaps. And there, section "Additional Reading" uses a very minimalistic manner of referencing, to wit, "K. Ullas Karanth, The Way of the Tiger: Natural History and Conservation of the Endangered Big Cat (2001)," There I see the sort of identification that makes sense to me: author, title, year, and there we go. Not sure whether it is "academic" enough, but looks good to me. I don't oppose going a little further beyond the Britannica example, but not in the ornamentalist manner in which an identification of a reference often spans multiple lines. --Dan Polansky (talk) 08:39, 18 March 2017 (UTC)
Anyway, can you please show us one real academic article online whose manner of reference identification approaches the way you want to do it? --Dan Polansky (talk) 08:41, 18 March 2017 (UTC)
One of the lovely things about wrapping these types of things in a template is that you can both get what you want. Include some markup in the template that allows you to show or hide the components that you want with CSS and everyone wins. - TheDaveRoss 12:42, 18 March 2017 (UTC)
That does not really solve the problem since then you have to figure out the default. And it is the default that the overwhelming majority of users is ever going to see. I argue that numerical identifiers are inessential for a great majority of users, and that the minority of users should be accommodated by having these identifiers in an appendix that is one click away from the template. --Dan Polansky (talk) 13:02, 18 March 2017 (UTC)

Applying a consistent format to citation, reference and quotation templates

On a related issue, I'd like to see if there is consensus for aligning the formatting of reference templates (those starting with {{R: ...}}) with citation and quotation templates (those starting with {{cite ...}} and {{quote ...}}). Of course these families of templates have some differences as they serve different purposes, but I am of the view that the Dictionary would look more uniform if we tried to eliminate unnecessary differences. I would propose that reference templates be formatted as follows:

“[entry]” in [editor's name], editor, [Title], [Place of publication]: [Publisher], [year], retrieved [date of retrieval], page [page number], column [column number].
For example: “example” in Webster's Revised Unabridged Dictionary, Springfield, Mass.: G. & C. Merriam, 1913.

SMUconlaw (talk) 15:51, 18 March 2017 (UTC)

I oppose aligning external link template formatting with quotations templates used for attestation. In particular, I oppose starting the formatting of such a template with the year in boldface.
I oppose "retrieved [date of retrieval]"; it is useless unless the template was used to source the information.
I oppose the quotation marks around the entry name as long as that location has a hyperlink, which it usually has.
Beware that the majority of templates starting in R: are in fact used for external links, not references. --Dan Polansky (talk) 15:59, 18 March 2017 (UTC)
If the [] brackets above indicate that information is optional, it has to be said that page number and column number have to be optional. --Dan Polansky (talk) 16:00, 18 March 2017 (UTC)
No, my mistake: [] brackets are used to indicate that something is a field, a filler to be replaced with a specific value. Oh well. --Dan Polansky (talk) 16:07, 18 March 2017 (UTC)
I am not proposing that for references the year be placed in bold at the start. That makes more sense for quotations as it enables a reader to see at a glance the time period over which an entry is used. I agree that for references it makes more sense for the entry to be put at the start. Yes, the brackets in the example above indicate fields to be replaced by a particular value. I added an example with the values filled in for clarity. Yes, the page and column numbers are optional; it can be useful to add them when the external website linked to provides a PDF of the source. — SMUconlaw (talk) 16:17, 18 March 2017 (UTC)
I think I misunderstood what you mean by "aligning". You seem to say, let's remove some difference between attesting quotation templates and external link templates, but not all the differences. --Dan Polansky (talk) 16:28, 18 March 2017 (UTC)

For references, I propose to follow the formatting laid out in this style guide, section 16. --Vahag (talk) 07:53, 20 March 2017 (UTC)

@Smuconlaw, Vahagn Petrosyan: I've actually come around to your opinion that, for logical consistency, the cited entry should come at the end, rather than at the beginning. Let's do away with the "date of retrieval" field, however; it's worse than useless. As for page and column, I prefer the traditional convention for writing those numbers, separated by a slash (e.g. 38/2 = page 38, column 2). Finally, we must consider foliation, complete with recto and verso. — I.S.M.E.T.A. 23:34, 15 April 2017 (UTC)
I think the entry should come first. First, you say what you're citing, then you say where it's found. —CodeCat 23:44, 15 April 2017 (UTC)
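To make the field order proposed above concrete, here is a minimal Python sketch of the entry-first layout from SMUconlaw's proposal, with the editor, retrieval date, page and column treated as optional. The function and parameter names are assumptions for illustration, not existing template parameters, and `parse_page_column` only illustrates the slash notation mentioned in the discussion (38/2 = page 38, column 2).

```python
def format_reference(entry, title, place, publisher, year,
                     editor=None, retrieved=None, page=None, column=None):
    """Format a reference in the proposed order, skipping fields that are absent."""
    parts = []
    if editor:
        parts.append(f"{editor}, editor")
    parts.append(title)
    parts.append(f"{place}: {publisher}")
    parts.append(str(year))
    if retrieved:
        parts.append(f"retrieved {retrieved}")
    if page is not None:
        parts.append(f"page {page}")
    if column is not None:
        parts.append(f"column {column}")
    return f"“{entry}” in " + ", ".join(parts) + "."


def parse_page_column(locator):
    """Split a 'page/column' locator such as '38/2' into two integers."""
    page, _, column = locator.partition("/")
    return int(page), (int(column) if column else None)


print(format_reference(
    "example",
    "Webster's Revised Unabridged Dictionary",
    "Springfield, Mass.",
    "G. & C. Merriam",
    1913,
))
```

With the values from the example in the proposal, this reproduces the sample line given there. Putting the entry at the end instead, as suggested later in the thread, would only require moving the entry to the end of the returned string.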

Templates cmn-3 and zu-0

At the moment, my Babel box has cmn-3 with an English sentence (along the lines of "This user has an advanced knowledge of Mandarin Chinese"), whereas all other templates give sentences in the languages they refer to. How come it shows up as "该用户能以熟练的普通话/国语进行交流。" on User:Atitarev's page? How do I get it to show that way on my page too? Another more subtle error is that zu-0 (I think that is the name, I assumed Zulu has code zu) renders as "Lomsebenzisi akanalo noluncane ulwazi lwesiNgisi (okanye kunzima kakhulu ukusiqondisisa)". Um, what? lwesiNgisi? So a long Zulu phrase telling people that "This user doesn't know English (or understands it with great difficulty)"? Surely ought to be "lwesiZulu", "Zulu", right? Somebody correct these please.

MGorrone (talk) 14:24, 18 March 2017 (UTC)

I'm struggling to interpret the Zulu sentence in other ways as well. I'm not sure if it's just my lack of knowledge or that it's actually badly written. —CodeCat 14:49, 18 March 2017 (UTC)
Effectively my translation is a guess. I subsequently tried working it out word-for-word with isizulu.net, and "noluncane" and "okanye" seem to be non-existent, whereas the rest seems like an awfully wordy and formal sentence. Let me try to give the idea: "This-user does-not-have-it [noluncane] he-knows-it of-English ([okanye] it-is-heavy indeed to-make-it-explain)". "Noluncane" might be some strange prefix + "oluncane" = "a little", so the first part would be "This user does not have a little knowledge of Zulu" (negative? Why?). Could be "na-", with/have, but that makes things redundant. As for "okanye", no idea. Maybe "Lomsebenzisi akalwazi isiZulu (noma uluqonda kanzima kakhulu)" could be better? MGorrone (talk) 16:12, 18 March 2017 (UTC)
The strange prefix is na- in all likelihood, so it might mean "does not have even a little", modifying a class 11 noun, presumably ulwazi. But normally modifiers follow the head in Zulu, so I'd expect ulwazi oluncane (a small knowledge). So this may be a point where my knowledge is lacking. Perhaps this is some kind of fronting for emphasis. Okanye is also a mystery, though it must be somehow linked to kanye and -nye. -qondisisa is not a double causative but rather from -qonda (understand) with the intensive suffix -isisa. —CodeCat 16:32, 18 March 2017 (UTC)
If it mentions English and doesn't mention Zulu, something's wrong. We need to get a fluent speaker to provide a correction, and they can verify or fix the grammar at the same time. Chuck Entz (talk) 20:06, 18 March 2017 (UTC)
Looking further, I see that User:MGorrone uses the MW Babel extension (#Babel:it|), but User:Atitarev uses our {{Babel}} template. The former gets its texts from an exterior source, while ours gets them from our own User templates if they exist. Aside from the texts, there may be differences regarding language codes, as well. Chuck Entz (talk) 20:32, 18 March 2017 (UTC)
The correction the user gave is correct, as far as the language name is concerned. It should be "lwesiZulu", which is the class 11 possessive of isiZulu (listed in the inflection table). —CodeCat 20:39, 18 March 2017 (UTC)
I tried {{Babel|it-5|en-4|fr-3|es-3|de-1|cmn-3|ja-1|la-3|grc-2|nap-2|lmo-1|eml-3|ro-0|id-0|zu-0|ru-0|cs-0|sk-0|sh-0|nl-0|da-0|sv-0|ar-0|Grek-4|Cyrl-2|IPA-4|Arab-1|Hebr-1|template-1|Hira-2|Kana-2|Jpan-1}} and, as you can see on the side, it-5, nap-2, lmo-1, zu-0, sk-0, sv-0 are not recognized. cmn-3, however, now has text in Chinese. Why are those languages not recognized? MGorrone (talk) 13:51, 19 March 2017 (UTC)
Without looking into it, I would guess that means we have no templates for those specific combinations at Wiktionary for {{Babel}} to use. If you have the text for the template to display, it should be a simple matter to create them.
There are tradeoffs between our approach and WM's: their system has more coverage and is more consistent between wikis, but if there's an error or a language code is incompatible with our treatment of language codes here at Wiktionary, you have to deal with a non-WM website that also provides the service to other entities that don't necessarily have the same interests that we do. Chuck Entz (talk) 15:17, 19 March 2017 (UTC)

We should use the #Babel extension and delete individual templates. —Justin (koavf)TCM 15:30, 19 March 2017 (UTC)

The Babel extension doesn't support all of Wiktionary's language codes. —CodeCat 15:47, 19 March 2017 (UTC)
@CodeCat: Then either delete non-ISO codes or use templates only for those (but not, e.g. Zulu). —Justin (koavf)TCM 15:54, 19 March 2017 (UTC)
Feel free to propose that, then. —CodeCat 13:19, 21 March 2017 (UTC)
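The behaviour described above, where {{Babel}} takes its text from local user templates when they exist and otherwise leaves an argument unrecognized, can be pictured with the following Python sketch. It is an assumption-laden illustration, not the template's actual code: the function names and the `local_templates` table are hypothetical, and the only sample text used is the cmn-3 sentence quoted earlier in this thread.

```python
def parse_babel_arg(arg):
    """Split an argument such as 'cmn-3' into (code, level) at the last hyphen."""
    code, sep, level = arg.rpartition("-")
    if not sep or not code or not level:
        raise ValueError(f"not a code-level argument: {arg!r}")
    return code, level


def resolve_babel(args, local_templates):
    """Return (recognized, unrecognized) for a list of Babel arguments.

    An argument is recognized only if a local template exists for that exact
    code-level combination; everything else is reported as unrecognized.
    """
    recognized, unrecognized = {}, []
    for arg in args:
        parse_babel_arg(arg)  # validates the shape of the argument
        if arg in local_templates:
            recognized[arg] = local_templates[arg]
        else:
            unrecognized.append(arg)
    return recognized, unrecognized


# Hypothetical table of local user templates; only cmn-3 is filled in here.
local_templates = {"cmn-3": "该用户能以熟练的普通话/国语进行交流。"}
print(resolve_babel(["cmn-3", "zu-0", "sk-0"], local_templates))
```

Under this reading, creating a missing user template for a given code-level combination is what would move an argument from the unrecognized list to the recognized one.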

Should homophones lists include non-lemma forms?

I saw that in the page manger#French, inflected forms of manger are listed as homophones of the lemma. This is not the case, e.g., for trouver, or many other words. Does somebody find this information useful, or do you know people who might find it useful? Shall we include this information as much as possible for French verbs? — Automatik (talk) 23:59, 18 March 2017 (UTC)

Yes, it's useful. It's also easy to automate. —Μετάknowledgediscuss/deeds 00:00, 19 March 2017 (UTC)
This is unrelated, but it is inconsistent how mangeai is listed as a homophone of manger, but the former is transcribed as having open e and the latter as having close e. IPA transcriptions of French should probably show the neutralization of the open–close contrast in open syllables: that is, both mangeai and manger should be transcribed as /mɑ̃ʒe/. It seems traditionalist to show a contrast that does not exist. But perhaps there are dialects in which this neutralization does not occur. — Eru·tuon 00:08, 19 March 2017 (UTC)
AFAIK the neutralization of open and close e does not occur in absolutely word-final position in French. Otherwise mangerai would merge with mangerais, mangeai would merge with mangeais. Also AFAIK mangeai has close e, not open e, and the pronunciation is simply wrong here. But I may be mistaken. Benwing2 (talk) 20:26, 25 March 2017 (UTC)
I could be mistaken too. I thought I didn't hear /ɛ/ word-finally, but maybe I was mishearing because my English /ɛ/ is opener. — Eru·tuon 20:35, 25 March 2017 (UTC)

Vote: CFI and place names cleanup

FYI, I created Wiktionary:Votes/pl-2017-03/CFI and place names cleanup.

Let us postpone the vote as much as discussion requires. --Dan Polansky (talk) 13:24, 19 March 2017 (UTC)

Threats and ethnic slurs from a user

User:Awesomemeeos threatened to send me to Mordor, calling me "katsap", and made unclear threats implying I'm not safe. When advised, he just said it wasn't me he meant (katsap) and that he is going to take a break. What's the policy? I'm seeking a block. Typing this on the phone, will post links if required. --Anatoli T. (обсудить/вклад) 21:24, 19 March 2017 (UTC)

@Atitarev You may also want to explain this reaction of yours to Awesomemeeos improving edits: [2] --Jan Kameníček (talk) 21:42, 19 March 2017 (UTC)
Awesomemeeos was already blocked for an hour last week to "cool off". He may need a longer cooling-off period this time, e.g. a day. He's said before he has "mental issues" and his "mind was messed up". Perhaps he needs some time off to sort his thoughts. @Jan.Kamenicek: Atitarev's objections are not to Awesomemeeos's edits (which are unobjectionable) but to his edit summaries, which seem to contain veiled threats. —Aɴɢʀ (talk) 21:45, 19 March 2017 (UTC)
I see, I have not noticed, thanks. --Jan Kameníček (talk) 21:50, 19 March 2017 (UTC)
When I blocked them for an hour last week, they sent me a rather bizarre email (not threats to me, but disturbing). My impression is that they have no clue about how to deal with human beings who disagree with them, and resort to over-the-top extreme statements without thinking about what they're really saying. Of course, I'm on another continent and you're within driving distance, so you don't have the luxury of dispassionate analysis of their actions. The best I can suggest is contacting the Wikimedia Foundation for advice, though I'm not sure of the correct procedure. Chuck Entz (talk) 22:50, 19 March 2017 (UTC)
  • To be fair, when Awesomemeeos said "and atitarev thinks that he's safe. WELL he's not!" (in these two—now hidden—edit summaries: diff and diff), from the context of what he was doing before and after this (see also this—now hidden—edit summary: diff "and Chuck's next!"), I think he meant that Atitarev is not safe from pages relating to him being edited. It doesn't seem that he meant any threat to Atitarev's physical safety. I still think these are inappropriate edit summaries, which is why I have hidden them. In an edit in one of his sandboxes (diff), Awesomemeeos wrote the Ukrainian IPA transcription of "ви Мо́рдор каца́па" (which our current version of {{uk-IPA}} transcribes as [ʋe ˈmɔrdɔr kɐˈt͡sɑpɐ]), by which he may have meant "to Mordor with the katsap", which is a bit more troubling, but I would still assume that he was fooling around and perhaps expressing his frustration with Atitarev, which is inappropriate again, but this still shouldn't be taken as a serious threat. I don't oppose the block. --WikiTiki89 00:43, 20 March 2017 (UTC)
  • Awesomemeeos's edits are not unobjectionable. He boldly edits in languages he does not speak and makes too many mistakes. I would block him indefinitely for disruptive edits. --Vahag (talk) 07:35, 20 March 2017 (UTC)
  • Despite ethnic slurs, I do not get the impression that he is genuinely malicious, and I feel that it would be a waste of human potential to ban committed language enthusiasts no matter how randumb their ways are. On the other hand, I'm not the one who has to clean up after him, and blocking him for a few days will certainly do no harm. Crom daba (talk) 21:41, 20 March 2017 (UTC)
    • The problem is that they don't have a real grasp of things like interacting with human beings or the ethics of working on reference material. They may try very hard to do things right, but any time they encounter a new situation they fall back on their old, bad approaches and go astray- in often bizarre fashion. For instance, telling someone who's given you a short block "to cool off" that you "have mental issues" isn't exactly a winning strategy (I won't tell you what they said via email, but it was worse). It wouldn't surprise me if they thought the ethnic slur was just harmless banter. They remind me a lot of Uther Pendrogn in some ways, but without the defiance or aggressive/abusive behavior (maybe RazorFlame is closer, but you probably aren't familiar with them).
    • At any rate, the question is whether their positive contributions outweigh the time and effort necessary to keep them from damaging the project- especially since their penchant for editing in difficult languages makes for disproportionate impact on the specialist editors we need most for other things. Chuck Entz (talk) 04:23, 21 March 2017 (UTC)
      • Wikipedias sometimes apply topic bans towards editors who are good contributors generally but do harm when editing some specific topics. I do not know Awesomemeeos much so I cannot say if this is the same case, I am just presenting the possibility of banning a user from editing e. g. anything besides the languages s/he can safely speak. --Jan Kameníček (talk) 15:28, 21 March 2017 (UTC)
        • As far as I can tell, Awesomemeeos's only "safe" contributions are when he templatizes untemplatized text without adding any content to it (even in these cases, the "danger" is that he will also add an improper gloss or something like that). I haven't seen him actually add any substantial material to entries, although I haven't been stalking all his edits or anything. If I'm wrong about this, I apologize, but that is my impression. --WikiTiki89 15:35, 21 March 2017 (UTC)
          • Awesomemeeos seems to be back to having an unhealthy (in my opinion) interest in my name - first name and surname. Earlier (before the block) he also edited katsap, Russki, Entz, Petrosyan, the last two being the names of people he is angry with. There are no threats in the edit summaries this time but this reaction bothers me. @Jan.Kamenicek As was mentioned above, my reaction was to the user's edit summary, which is a concern - "不,Atitarjóv!我比你好!" - "No, Atitarjóv! I'm better than you!" when editing my first name's entry. --Anatoli T. (обсудить/вклад) 00:44, 23 March 2017 (UTC)

"Alternative reconstructions" or "Reconstruction" header?

Some reconstruction-namespace entries have an "Alternative reconstructions" section. I've been putting alternative reconstructions under "Alternative forms" but it occurs to me this might be wrong. Is "Alternative reconstructions" a legitimate section header? Also, I've seen some sections titled "Reconstruction" near the top, which contain descriptive text about alternative reconstructions. Is this legitimate? If both are legitimate, which is preferred?


Benwing2 (talk) 03:19, 22 March 2017 (UTC)

I think that "Alternative reconstructions" has more of a referential feel- that is, here are some other ways that scholars have reconstructed this thing. And there's no definitive list of "legitimate" headers. DTLHS (talk) 03:28, 22 March 2017 (UTC)

Belarusian Taraškievica

To whoever might be interested in what's happening with Belarusian.

You might know that there are two Belarusian Wikipedias - one with the language code "be" and one with "be-x-old". The latter promotes the traditional Belarusian orthography or Taraškievica. In fact, it's not just spellings, e.g. сёння (sjónnja) vs "сёньня" (pronounced the same way): Taraškievica also uses older words, which differ significantly from the current standard or official orthography, as well as slightly different styles and even grammar. By my latest observation, some liberal or opposition sites are switching to Taraškievica, so the difference will continue to exist and the statement that Taraškievica is only used informally or by enthusiasts is no longer true. One significant example is Радыё Свабода (the Belarusian version of Radio Liberty) to which I am subscribed.--Anatoli T. (обсудить/вклад) 06:49, 22 March 2017 (UTC)

So what do you think we need to do about it? --WikiTiki89 15:37, 22 March 2017 (UTC)
I don't know yet but labelling may change, depending where things go in terms of frequency and availability of citations. --Anatoli T. (обсудить/вклад) 19:55, 22 March 2017 (UTC)


Could someone fix the Hindi translation here? There's something wrong with it. ---> Tooironic (talk) 11:46, 22 March 2017 (UTC)

Fixed. --Jan Kameníček (talk) 11:51, 22 March 2017 (UTC)

{{also}} for character entries

@Haplology, Eirikr, Suzukaze-c Should {{also}} used at the top of character entries include only alternate forms of the character? I've recently noticed pages like and which list all kinds of completely unrelated characters (perhaps only because they are slightly similar in form), and I have been removing any which are not alternate forms. Should I continue to do this as I come across these? For example, had

See also: , , , , , , , , , , , , , , and 𠨰

and I removed all but 𠨰 which is the only one which could be considered an alternate form. Likewise had

See also: , , , , , , and

and I removed , , and and left , and which are the only true alternate forms. Typos corrected. Sorry for re-pinging you all (talk) 18:39, 22 March 2017 (UTC)

{{also}} should be used for visually similar entries, not alternative forms (these should be placed in their own heading).
I, at least, find these helpful since I often use an OCR program for reading Chinese texts (usually one word glosses from dictionaries) and sometimes it can mistake one character for the other. Crom daba (talk) 20:20, 22 March 2017 (UTC)
@Crom daba Thanks for mentioning how you have found these helpful, but Wiktionary is not intended to be an OCR program assistant. I'd like to hear other views on this especially from the Japanese editors I have pinged, but of course I'm interested in what Chinese editors have to say about this too. 馬太阿房 (talk) 06:02, 23 March 2017 (UTC)
I also think putting visually similar characters in {{also}} is OK. —suzukaze (tc) 06:10, 23 March 2017 (UTC)
@Suzukaze-c Thanks for your input on this. I just now noticed that the {{also}} template documentation page did say something about how visually similar terms may be included even if unrelated, but then I checked cama, the page that was referenced for the example
See also: čama and сама
and found that it was reverted by a bot and no longer shows the merely visually similar term.
Personally, I wouldn't mind seeing these under their own heading such as "Visually Similar" or something like that. 馬太阿房 (talk) 06:26, 23 March 2017 (UTC)
I would say that the edit to was exactly the opposite of what should have happened. As others said, alternative forms go in their own ===Alternative forms=== section inside the relevant language section. {{also}} is for visually similar / confusable things, which may not even be in the same language: for example, aca links to acá although the two have no languages or meanings in common. Of course, for alphabetic languages (or just Latin-script ones?) it is relatively simple to decide what counts as "visually similar" (basically, anything that differs only by diacritics, case, some punctuation, or use of another script like a vs а), and the process of adding {{also}} has even finally been automated (yay!). For Chinese characters it might be a bit more subjective; I don't know. But most of the things listed at except the last one which is still there now seem similar enough to list. (I don't think it would make sense to have a headered "Visually similar" section, because it would necessarily not belong in any specific language section...) - -sche (discuss) 06:37, 23 March 2017 (UTC)
Based upon what appears to be popular consensus, even though I haven't heard from a couple of people I pinged on this discussion, I have reverted these entries back to include the visually similar forms in {{also}}. Thanks to all of you for sharing your thoughts on this. 馬太阿房 (talk) 15:49, 23 March 2017 (UTC)
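
For alphabetic scripts, the "visually similar" test described above (titles differing only by diacritics, case, some punctuation, or a lookalike letter from another script) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the lookalike table is a tiny hand-picked sample rather than a real confusables list, and it is not the code of the bot that maintains {{also}}. It does not attempt to handle Chinese characters.

```python
import unicodedata

# Illustrative sample of Cyrillic letters that resemble Latin ones.
# This is an assumption for the sketch, not a complete confusables table.
LOOKALIKES = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0441": "c",  # Cyrillic es
    "\u043c": "m",  # Cyrillic em
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
})

def similarity_key(title):
    """Reduce a page title to a key that ignores diacritics, case and punctuation."""
    decomposed = unicodedata.normalize("NFD", title)
    # Drop combining marks (categories M*) and punctuation (categories P*).
    kept = [ch for ch in decomposed
            if not unicodedata.category(ch).startswith(("M", "P"))]
    return "".join(kept).casefold().translate(LOOKALIKES)

def also_candidates(title, all_titles):
    """Return the other titles that share the same similarity key."""
    key = similarity_key(title)
    return sorted(t for t in all_titles
                  if t != title and similarity_key(t) == key)
```

With this key, "aca" and "acá" fall together, as do "cama", "čama" and the Cyrillic "сама".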

Where to record usage of the form "the X"

Should a sense of the form "the X" be covered at [[X]] with a label saying {{lb|en|with 'the'}} or {{lb|en|with definite article}}, or should it be split to [[the X]]? Does it vary, and if so by what criteria? The norm/trend seems to be to put the information at [[X]] with a redirect from [[the X]]: hence, "the weed" (tobacco) is at "weed", "the deep" (sea) is at "deep", and likewise for "the world", "the alliance", "the dead", "the underground", etc. But "the bomb" is its own entry, likewise "the man". See Talk:the pits and Talk:the regions for some old discussion and other examples. The most important thing IMO is that there's a pointer at whichever form we don't lemmatize towards whichever form we do lemmatize, especially a {{only used in}} sense line on [[X]] if we put anything at [[the X]], because most people probably know what [[the]] means and will only look up [[X]]. - -sche (discuss) 16:19, 23 March 2017 (UTC)

Personally, I add the in the head, e.g. {{en-proper noun|head=the Eiffel Tower}}. I don't usually include it in the entry title. Equinox 16:21, 23 March 2017 (UTC)
Also see the sex, which I believe prompted this discussion. I would argue that the man, the bomb and the sex all have meaning beyond just the + man etc., whereas the alliance is just the definite of alliance. These are two cases, and a third is when the definite article is obligatory and hence does not change the meaning, as in the Eiffel Tower, the Dead Sea, the Grim Reaper and Det Døde Hav, which was deleted. I am not sure what to do in the latter case.__Gamren (talk) 18:41, 23 March 2017 (UTC)
English proper nouns can be used attributively (White House press conference), in compounds (White House-style), and with other definite determiners (Obama's White House, this White House). It seems a bit misleading to include the, even just in the inflection line, without having a long usage note explaining the exceptions. The "exceptions" are really just part of normal usage of proper nouns, ie, part of grammar. DCDuring (talk) 18:59, 23 March 2017 (UTC)
Users won't know about the "the" unless they see it somewhere, since your examples also work for non-"the" nouns: "Greenpeace press conference", "Greenpeace-style"... Equinox 19:05, 23 March 2017 (UTC)
I agree the "the" should be somewhere. IMO it makes the most sense in the context label when a term has multiple senses (like "the weed", "the man", etc). Perhaps we could even create a label just for that (currently 100+ entries spell it out manually) which would link to an appendix that explained the circumstances under which the article was deleted. When a term has only one sense, and especially in the case of proper nouns, it might make more sense to put it in the head or page-title. - -sche (discuss) 19:48, 23 March 2017 (UTC)

Separate entries for Persian word forms

Is there a reason that Persian word forms don't have their own entries? And, relatedly: that verb forms in conjugation tables aren't clickable? Most verbs are fairly regular, and the adding of these conjugated forms could probably be easily automated. 10:01, 24 March 2017 (UTC)

CAT:Persian non-lemma forms and CAT:Persian verb forms do exist and aren't empty, so there are some such entries. If there aren't more, it's just because no one has bothered adding them yet. The inflection tables would have to be edited to make the terms linkable. —Aɴɢʀ (talk) 10:50, 24 March 2017 (UTC)
Thanks for your reply. I'm working on a bot to create them automatically (User:HannesPBot) and I'm waiting to have it approved. ✎ HannesP · talk 09:43, 30 March 2017 (UTC)
@HannesP What is your level of knowledge of Persian? Are you able to evaluate the correctness of the forms you would be creating? DTLHS (talk) 15:55, 30 March 2017 (UTC)
@DTLHS Yes, I consider my knowledge of Persian sufficient, and I have a rather thorough grammar (including verbs, their stems and conjugations) for reference. ✎ HannesP · talk 22:54, 2 April 2017 (UTC)

Old Russian vs. Old East Slavic; Old Slovak, Old Ukrainian, Old Belarusian, etc.

A few issues related to old Slavic languages:

  1. Should we have a separate language code for Old Russian vs. Old East Slavic? Most sources don't distinguish them, but they use "Old Russian" in place of "Old East Slavic". The problem here is that we give Old East Slavic (10-12th centuries, maybe 13th century?) as an ancestor of all East Slavic languages, whereas Old Russian (13th century and later?) is only the ancestor of Russian. "Old Russian" seems to go up through the 17th century in most sources. We could use zle-oru for Old Russian.
  2. Should we have separate language codes for Old Slovak, Old Ukrainian, Old Belarusian? We already have Old Czech and Old Polish. Old Belarusian in particular uses quite different spelling from modern Belarusian. We could use zlw-osk for Old Slovak, zle-ouk for Old Ukrainian and zle-obe for Old Belarusian. Benwing2 (talk) 17:17, 24 March 2017 (UTC)
At some point, Old East Slavic developed into two literary standards, which we might call Old Russian and Old Ruthenian. Whether that makes them separate languages, I'm not sure. Don't forget that the modern spoken languages (as always) are descended from the dialect continuum of the spoken language and not from either of the two literary standards (although the literary standards did of course have some influence on the modern written languages, which in turn had some influence on the modern spoken languages). Furthermore, as you say, the term "Old Russian" often refers to the entire period of OES up to modern Russian. --WikiTiki89 17:44, 24 March 2017 (UTC)
It has also become a very sensitive topic for Ukrainians and Belarusians. Some view that even the name "Russian" was usurped by Muscovy, then modern Russia. Nevertheless, "Old East Slavic" is called давньору́ська мо́ва (davnʹorúsʹka móva) (the Old Russian/Rusian language) in Ukrainian and similarly in other Slavic languages. Although, the term for "Old Ukrainian" also exists. Note that in Ukrainian ру́ський (rúsʹkyj, of Rus) is different from росі́йський (rosíjsʹkyj, Russian). --Anatoli T. (обсудить/вклад) 04:27, 25 March 2017 (UTC)
OK. What should I do with Old Belarusian and Old Ukrainian terms? For Old Russian starting in the 13th century, I've taken to using language code orv (Old East Slavic) but listing the language as "Old Russian" instead of "Old East Slavic", and giving only Russian as a descendant instead of also including Ukrainian and Belarusian. Should I do the same for Old Belarusian/Old Ukrainian, or use modern language codes uk, be, or create new language codes zle-ouk, zle-obe? Benwing2 (talk) 20:13, 25 March 2017 (UTC)
I think we need to see some cases first. Old East Slavic or Old Russian (meaning "Rusian", of Rus) is definitely the parent of Russian, Ukrainian, Belarusian and Rusyn. The terms that differ between Russian on one side and Ukrainian/Belarusian on the other are normally Old Church Slavonic influences on Russian, Polish influences on Ukrainian/Belarusian or loanwords from other languages influencing any of the three. Is there sufficient material on written Old Belarusian and Old Ukrainian (Ruthenian), anyway? --Anatoli T. (обсудить/вклад) 09:09, 27 March 2017 (UTC)
Supposedly, a lot of state documents of the Grand Duchy of Lithuania were written in Old Russian/Ruthenian/Belarusian and there's also Arab script Ruthenian used by the Tatars, so I'm sure there's some interesting material out there, someone who's into OES should find some of those texts and see if they warrant a split.
@Useigor seems to be the most active OES editor atm, so perhaps his input could be valuable. Crom daba (talk) 11:32, 27 March 2017 (UTC)
@Crom daba I have fixed your WP link. The Belarusian mentioned is pretty much modern Belarusian or almost. The text snippets I see there are heavily influenced by Polish, whose influence is more common with the opposition who try to distance themselves from Russia. There were many scripts floating around, including various Latin alphabets for Belarusian. The current official spelling and word choices have never been completely adopted by everyone. See also WT:BP#Belarusian_Taraškievica. --Anatoli T. (обсудить/вклад) 11:50, 27 March 2017 (UTC)
Thank you. So maybe lump post 13th-century (or at least 16th-century) Belarusian Ruthenian together with modern Belarusian? Crom daba (talk) 11:55, 27 March 2017 (UTC)
Well, that's around the time of the split. At least (modern) Russian is said to have formed by then (by the 15th century?). It's not the same Russian used today, of course, or even in Pushkin's time. Perhaps the same goes for Belarusian. The text from your link can be identified as nothing but Belarusian ("be" language code). --Anatoli T. (обсудить/вклад) 12:26, 27 March 2017 (UTC)
I have analyzed 3 sources as much as I could, so:
  1. Vasmer mentions:
    • 15+ "ср.-болг." in бдын, болван, бор, годовабль, дегна, делва, дрозд, золовка, колбаса, орябь, пахирь, плоска, под, подпега, -цепить, ...
    • 1+1 "ст.-польск." in усторобиться (also in Trubachev comments: поджарый)
    • 0+2 "ст.-укр.": -- (but in Trubachev comments: винтовка, цинга)
  2. ЭССЯ mentions:
    • "ст.-чеш." in every volume
    • "ст.-укр." in every volume
    • "ст.-польск." in every volume
    • "ст.-блр." almost in every volume
    • "ст.-слвц." almost in every volume (frequently after 26)
    • "ст.-русск." / "ст.-рус." in ~3/5 of all volumes (frequently in 14 and after 23)
    • "ст.-сербохорв." in ~3/4 of all volumes
    • "ст.-серб." / "ст.-сербск." in ~1/4 of all volumes
    • "ст.-болг." in ~1/6 of all volumes
    • "др.-болг." rare (many of them in parentheses beside "ст.-слав.")
    • few "ст.-хорв."
    • few "ст.-словен.", but frequent from volume 27 onward
    • few "др.-чеш."
    • 1 "ст.-серболуж."
    • 1 "др.-польск."
  3. Derksen (from list of abbreviations): Middle Bulgarian, Old Belorussian, Old Czech, Old Polish, Old Russian (in sense Old East Slavic), Old Slovak
Thus we probably need codes for Middle Bulgarian (bgm?), Old Slovak, Old Belorussian, Old Ukrainian, Old Russian, Old Serbo-Croatian (zls-sh?), Old Slovene (zls-sl?). —Игорь Тълкачь (talk) 21:20, 1 April 2017 (UTC)
Perhaps some of these could be etymology-only variants of OCS. Crom daba (talk) 11:21, 2 April 2017 (UTC)
@Atitarev There are lots of instances of Old Belarusian, Old Russian, Old Ukrainian in the Reconstruction name space, e.g. for Old Belarusian see Reconstruction:Proto-Slavic/načęti, Reconstruction:Proto-Slavic/mьněti, Reconstruction:Proto-Slavic/měniti, etc. The spelling of Old Belarusian looks like Old East Slavic and not like modern Belarusian. I actually added a hack into Module:be-translit to account for и (no longer in modern Belarusian), on the assumption that Old Belarusian would use the be code; but it probably makes more sense to make Old Russian, Old Ukrainian and Old Belarusian be etymology-only variants of Old East Slavic. (Note meanwhile that Wiktionary actually defines Old Belarusian as "Old East Slavic".) Meanwhile we should probably create a new language code zlw-osk for Old Slovak, since we already have separate codes for Old Polish and Old Czech. Anyone object if I make these changes? Benwing2 (talk) 16:07, 2 April 2017 (UTC)
Thanks, I'm OK with examples provided and have no objections to new language codes. --Anatoli T. (обсудить/вклад) 00:06, 3 April 2017 (UTC)
@Benwing2: I think it was a mistake to create these codes based on the descendants given on the reconstruction pages, at least without further investigation. For each form given, it needs to actually be verified what period it is from (if it indeed exists) and only then would we have a clearer picture of which language codes need to be created. --WikiTiki89 20:36, 3 April 2017 (UTC)
The need for these codes is real. Old East Slavic, for example, only went through the 13th century per Wikipedia Ruthenian language or at the latest the 15th century per Old East Slavic (although Ukrainian had already split off by the 13th or 14th century). The sources are pretty consistent in saying "Old Ukrainian" and "Old Belarusian" (and often give dates). Benwing2 (talk) 20:52, 3 April 2017 (UTC)
@Benwing2: Regardless of how you split up the language, the real problem is identifying which words belong to which period of which dialect. I wouldn't trust another dictionary's labeling of "Old Ukrainian", "Old Belarusian", etc. unless that dictionary itself precisely defines what they mean. If the sources give dates, that's a great start (we should add them to the entries so we can keep track better). It would be even better to find actual quotations. --WikiTiki89 21:12, 3 April 2017 (UTC)
I agree. I can imagine it would be quite hard to find sufficient quotations of Old Belarusian or Old Ukrainian, which are different from modern languages and different from Old Russian. --Anatoli T. (обсудить/вклад) 23:50, 3 April 2017 (UTC)

reference on Serbo-Croatian grammar (in particular including tones and accent classes)?

This is a repeat of a request I posted in January. I am looking for a reference on Serbo-Croatian grammar similar to Zaliznyak's Russian grammar [Grammatičeskij Slovar' Russkovo Jazyka], specifically one that would show all the various accent classes. Ideally it would be written in English because I don't speak Serbo-Croatian, but I can probably puzzle out the Serbo-Croatian language using Google Translate. Ideally it would also be a morphological dictionary indicating the accent class of each word, but the following reference might be sufficient: [3] Ideally it would also be available online. Thanks! Benwing2 (talk) 19:06, 24 March 2017 (UTC)

I am growing more certain that such a thing does not exist.
Supposedly the alfanum project has compiled this data for speech recognition purposes, the data here being accentuations of case forms, not an actual classification. However they haven't published this dictionary and I don't know if they will.
Unlike Russian stress, SC accent is a more delicate thing and I'm not sure if accentuation of inflected forms was ever really strictly prescribed by the linguistic authorities. These sorts of details are mostly found in dialectological descriptions of particular varieties.
Crom daba (talk) 00:02, 25 March 2017 (UTC)
@Crom daba Thanks. I'm almost certain that a reference guide for Serbo-Croatian accents exists, since there does exist a standardized language, as found in dictionaries. Benwing2 (talk) 16:09, 2 April 2017 (UTC)
@Benwing2: You might be interested in Daničić's work, he's one of the key founders of "standard Serbo-Croatian".
In practice, the standard is the pronunciation of the capital cities, Zagreb and Belgrade; there wasn't much need to codify accents beyond that. After all, they aren't written outside of works that specifically deal with them.
There's also the fact that throughout socialist Yugoslavia the Serbo-Croatian project had to walk the tightrope between unitarism and nationalism, and there was no space for monumental works like those done in the 19th century when an entirely new literary language was built from scratch on the basis of a spoken language.
Now of course, 'language experts' are free to play at creating artificial languages to our frustration, so such a dictionary could perhaps be created in near future, except of course, it won't say "Serbo-Croatian" on the cover. Crom daba (talk) 21:21, 3 April 2017 (UTC)


The commonly used template, {{l}}, creates a link with language tagging around it (<span lang="language_code"></span>), while the template {{ll}} only creates a link with no language tagging.

I have been using the latter template for linking English terms in running text, as it seems inelegant to add language tagging in the middle of English text.

I've also used it once in a headword, to allow the headword to link to a senseid. Using {{l}} in the headword results in redundant HTML. For instance, this is the HTML source code in the Noun section of the entry عَبْد (ʿabd) when the template {{l}} is used inside the headword template:

<b class="Arab" lang="ar" xml:lang="ar">
	<span class="Arab" lang="ar" xml:lang="ar">
		<a href="/wiki/%D8%A3%D9%85%D8%A9#Arabic-slave" title="">أَمَة</a>
	</span>
</b>

(Image caption: Arabic words are enlarged on Wiktionary if they are enclosed by two HTML tags that both have the class "Arab" and the lang attribute "ar".)

The <b> and <span> tags both have the class Arab and the lang attribute ar. This is redundant; the <span> tag should be omitted, by using the template {{ll}} rather than {{l}}. (It also causes the Arabic word أَمَة to be enlarged as shown on the right, at least in my browser, due to the CSS. Not quite sure why this is and if it can be easily fixed.)
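
To make the redundancy concrete, here is a minimal sketch of how one might detect it mechanically, using only Python's standard html.parser module. The class and function names are hypothetical and this is not part of any Wiktionary module; it only flags an element whose class and lang attributes repeat those of an enclosing element. One limitation of the sketch: void elements written without a closing slash are treated as open until their parent closes.

```python
from html.parser import HTMLParser

class RedundantTagFinder(HTMLParser):
    """Collects tags whose (class, lang) pair repeats that of an enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, (class, lang)) for each currently open element
        self.redundant = []  # names of tags that repeat an ancestor's tagging

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        tagging = (attr_map.get("class"), attr_map.get("lang"))
        # Elements without any script/language tagging are never redundant.
        if tagging != (None, None) and any(t == tagging for _, t in self.stack):
            self.redundant.append(tag)
        self.stack.append((tag, tagging))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

def find_redundant_tagging(html):
    """Return the names of tags that duplicate an ancestor's class and lang."""
    finder = RedundantTagFinder()
    finder.feed(html)
    finder.close()
    return finder.redundant
```

Fed the headword HTML quoted above, the sketch reports the inner span as redundant; with the span removed it reports nothing.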

@CodeCat has been replacing {{ll}} with {{l}} (or removing it altogether, if it is in a headword; examples: 1, 2, 3, 4). She posted on my talk page explaining why: she says it is too complicated to add one more linking template, {{ll}}, alongside the more commonly used templates {{m}} and {{l}}.

I do not agree. I recognize that {{ll}} is not widely used right now, but the HTML code of Wiktionary would be improved if {{ll}} were used more widely in place of {{l}}. We would have less redundancy in language tagging.

I gather that @Wikitiki89 created the template {{ll}}. There has been discussion about why the template is needed on his talk page in 2014, as well as on CodeCat's request for the template to be deleted.

I'm posting here, because the use of {{ll}} is a question of policy: should we widely deploy this template, and formulate clear criteria for where it is to be used, or not? I would argue that it should be used in several cases:

  1. In the inflection form parameters of headword templates, when one wishes to link to a particular sense id for the inflected form.
  2. In English text, where language tagging is not needed.
  3. In running text of another language, when the whole text is language-tagged.

Its use would improve the cleanness of Wiktionary's HTML code. Language tagging should only be added when text in one language is inside of text in another language. Otherwise, there should be a link but no language tagging.

CodeCat doesn't find this reasoning (if it qualifies as reasoning) persuasive, so I guess I'm curious what other editors think. If there is consensus against using {{ll}} because it's too complex to have three basic linking templates, then I will by all means revert my changes adding it. Otherwise, I would like to keep the instances of the template that I have added, and deploy it more widely. — Eru·tuon 03:14, 25 March 2017 (UTC)

Concerning case 1: what is the advantage of using {{ll|id=foo}} instead of {{l|id=foo}}? — Ungoliant (falai) 03:21, 25 March 2017 (UTC)
The same as in case 3, more or less. When languages have CSS that increases the font size, embedding one set of language tags inside another causes the font size to increase twice. I would argue that this is an issue to be fixed with better CSS though, and not with an additional template. Furthermore, it doesn't account for the case in which language 1 appears inside text of language 2; the tagging is necessary here and thus {{l}} is unavoidable, and so is the double size increase. Only CSS can provide a proper fix. —CodeCat 03:30, 25 March 2017 (UTC)
In addition to the CSS-related rationale, look at the example given above, from the entry on عَبْد (ʿabd). Both {{l|id=foo}} and the headword template (in this case, {{ar-noun}}) add script and language attributes. Thus, using {{l}} inside a headword template duplicates these attributes in the HTML unnecessarily. This duplication can be avoided by using {{ll}} instead. — Eru·tuon 03:31, 25 March 2017 (UTC)
I see. That does make sense. — Ungoliant (falai) 03:35, 25 March 2017 (UTC)
Can you expand on the "cleanness of HTML" rationale? What is the benefit of having clean HTML, and what does it mean for HTML to be clean? DTLHS (talk) 03:28, 25 March 2017 (UTC)
By clean HTML I mean adding a particular HTML attribute only to one tag around a given word. I can't give a practical reason why it is beneficial (for instance, why messy HTML would cause problems), except that I prefer to avoid redundancy. — Eru·tuon 03:33, 25 March 2017 (UTC)
To say it another way, clean HTML should have only as many tags as needed to serve a purpose. The code above has a <span></span> tag whose only purpose is to contain the script and language attributes class="Arab" lang="ar". These attributes are already applied by the <b></b> tag, so the <span></span> tag is not needed. To me, it is simply axiomatic that needless things should be removed. That would be done in a Lua module; why should it not be done in entries? — Eru·tuon 05:38, 25 March 2017 (UTC)
What about applying it inside |head= in headword lines of compounds, would that be a valid use @CodeCat? Crom daba (talk) 13:48, 25 March 2017 (UTC)
  • As a possibly naive question, why use this at all? I tried figuring out the use case based on examples, and I could not find any instances where {{ll|TERM}} was preferable to simply using [[TERM]]. In headword templates and {{ux}}, regular square-bracket links are already rejiggered to point to the correct language when not otherwise specifying a different target. I am puzzled why anyone would use templates within the headword template, though I recognize that my use cases for Japanese likely just don't intersect with such needs. ‑‑ Eiríkr Útlendi │Tala við mig 18:12, 27 March 2017 (UTC)
    • Not a naive question: the reason for using {{ll}} in the headword template is to provide a sense id for one of the forms, or for a link within a term consisting of more than one word. Otherwise, you don't need to manually link the forms in the headword, because they're automatically linked.
    • In the example above, from عَبْد (ʿabd, slave), the headword links to the entry for أمة. The sense id is necessary because there are two entirely different words in that entry (since Arabic entry names do not contain vowel diacritics) – أُمَّة (ʾumma, nation) and أَمَة (ʾama, female slave) – of which the latter is the intended target of the link in the headword template. — Eru·tuon 19:22, 27 March 2017 (UTC)
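
The "font size increases twice" effect mentioned earlier in this thread follows from how relative font sizes work in CSS: a percentage or em font-size is computed against the parent element's size, so the same rule applied at two nesting levels multiplies. A small arithmetic sketch, assuming a purely hypothetical scale factor of 1.25 and a 16px base (the values actually used in Wiktionary's stylesheet are not stated in this discussion):

```python
def effective_font_size(base_px, scale, depth):
    """Size of text nested `depth` levels deep in elements that each
    apply the same relative font-size `scale` (e.g. 1.25 for 125%)."""
    return base_px * scale ** depth

# Hypothetical numbers, for illustration only.
single = effective_font_size(16, 1.25, 1)  # one tagged element
double = effective_font_size(16, 1.25, 2)  # a tagged element inside another
```

Under these assumed numbers a single tagged element gives 20px and a doubly tagged one gives 25px, which is why removing the inner tag (via {{ll}}) or making the CSS rule insensitive to nesting both address the enlargement.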

Vote: Desysopping for inactivity

FYI, I created Wiktionary:Votes/pl-2017-03/Desysopping for inactivity.

Let us postpone the vote as much as discussion requires, if at all. --Dan Polansky (talk) 14:10, 25 March 2017 (UTC)

When did params 4/5 and gloss= get deprecated?

Somewhere recently someone made a decision that param 4 (param 5 in some templates) and param gloss= should be deprecated in favor of param t=. When was this decision made? I don't think I agree with it. Thanks. Benwing2 (talk) 20:23, 25 March 2017 (UTC)

This should be the last public discussion we had on this: WT:Beer_parlour/2016/October#Deprecating_glosses_as_the_fourth_positional_parameter_of_.7B.7Bm.7D.7D_and_.7B.7Bl.7D.7D.
Personally, I prefer |t=, but I don't see any value in deprecating alternatives. Crom daba (talk) 21:23, 25 March 2017 (UTC)
Thanks. So it looks like there was no vote, and not even any real consensus established. I think we should un-deprecate those other parameters. If we are to deprecate them, then (a) there should be a vote, (b) if the vote passes, a bot should be run to convert existing uses to use t= instead. Benwing2 (talk) 21:40, 25 March 2017 (UTC)
I think it should be voted on as it may go against the keyboarding habits of so many contributors and few participated in the discussions. DCDuring (talk) 23:52, 25 March 2017 (UTC)
I favor the use of |t=, but I agree with everyone else that there should be a vote, for the additional reason that we need to resolve the frustrating situation where certain editors (such as @Angr) are replacing numbered parameters with |t= and then being reverted. — Eru·tuon 00:01, 26 March 2017 (UTC)
Oppose I oppose the deprecation of |4=. It is widely in use and, on average, cuts down on a character. @Angr and others should suspend their conversions until it's properly voted upon. --Victar (talk) 04:09, 27 March 2017 (UTC)
It would seem sensible to vote on this - having more than one parameter name for the same argument is never good as it confuses new editors. (I have always used "gloss=" but would happily change to t= if I'd been told!) Saltmarsh. 05:04, 27 March 2017 (UTC)
FYI, I support the deprecation of |gloss=. For that, there were actual discussions. --Victar (talk) 17:54, 28 March 2017 (UTC)
Deprecated doesn't mean you can't use it, only that it is preferable to use |t= now. I don't think this needs a real vote unless we are actually talking about bot replacement and removing support for the old parameters. I think a BP poll should be enough. So far the only people objecting seem to be Benwing and Victar. --WikiTiki89 13:33, 27 March 2017 (UTC)
I Oppose deprecating without a proper vote. Crom daba (talk) 18:22, 27 March 2017 (UTC)
@Crom daba: Just to be clear, are you referring to the word "deprecating" or to the idea of preferring one parameter over another? I'd be happy to change the phrasing to something like "|t= is the preferred parameter", and I really don't see why doing that would require a formal vote. --WikiTiki89 19:30, 27 March 2017 (UTC)
I simply believe that stating preferences like that holds more weight than it appears, and that having it resolved in a proper vote will make it more acceptable to editors who disprefer it. Crom daba (talk) 21:04, 27 March 2017 (UTC)
I think having a proper vote will give it more weight than it should have. --WikiTiki89 21:11, 27 March 2017 (UTC)
@Wikitiki89: Tell that to users who are actively converting |4= to |t=. And I quote: "'Deprecated' means 'don't use it anymore'." --Victar (talk) 17:54, 28 March 2017 (UTC)
  • Deprecation isn't just about preference but also (more importantly) intent of future removal of a deprecated thing (see our definition) which does call for a vote. --Giorgi Eufshi (talk) 05:52, 28 March 2017 (UTC)
@Wikitiki89 Absolutely everyone else who has weighed in has expressed a desire to have a vote on this. We really do need a vote. Benwing2 (talk) 07:02, 28 March 2017 (UTC)
Fine let's vote on it then. --WikiTiki89 18:04, 29 March 2017 (UTC)
I've created the vote: Wiktionary:Votes/2017-03/Deprecating 4=, 5= and gloss= parameters in favor of t=. --WikiTiki89 18:32, 29 March 2017 (UTC)
The vote should mention when and how the params should be deleted/replaced or else the best it is going to achieve is a small-ass note in the template documentation page. --Dixtosa (talk) 19:03, 29 March 2017 (UTC)
The small-ass note is all I really want, which is why I didn't think we needed a vote. Normally features are deprecated long before it is known when they will actually be removed, so I think it's fine not to mention it. --WikiTiki89 19:08, 29 March 2017 (UTC)
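
For anyone weighing the bot-conversion option raised above, the rewrite itself is small. The following is a rough sketch only: the function name is hypothetical, it is not an existing bot, and it assumes a single {{m}} or {{l}} call with no nested templates or piped links and no "=" inside positional values. It moves a fourth positional parameter or a gloss= parameter to t= and leaves anything it does not understand untouched.

```python
def prefer_t_param(template):
    """Rewrite one {{m|...}} or {{l|...}} call so that its gloss uses t=."""
    if not (template.startswith("{{") and template.endswith("}}")):
        return template
    inner = template[2:-2]
    if "{{" in inner or "[[" in inner:
        return template  # nested markup is out of scope for this sketch
    name, *params = inner.split("|")
    if name.strip() not in ("m", "l"):
        return template

    positional = [p for p in params if "=" not in p]
    named = [p for p in params if "=" in p]
    keys = [p.split("=", 1)[0].strip() for p in named]
    if "t" in keys or len(positional) > 4:
        return template  # already converted, or not a shape we recognise

    # The fourth positional parameter, if present and non-empty, is the gloss.
    gloss = (positional.pop() or None) if len(positional) == 4 else None

    kept = []
    for p in named:
        key, value = p.split("=", 1)
        if key.strip() in ("gloss", "4"):
            if gloss is not None:
                return template  # two competing glosses: leave for a human
            gloss = value
        else:
            kept.append(p)
    if gloss is None:
        return template

    # Trailing empty positional slots are no longer needed once t= is named.
    while positional and positional[-1] == "":
        positional.pop()
    return "{{" + "|".join([name] + positional + kept + ["t=" + gloss]) + "}}"
```

For example, {{m|la|foo||bar}} becomes {{m|la|foo|t=bar}} and {{l|en|dog|gloss=animal}} becomes {{l|en|dog|t=animal}}, while calls that already use t= are returned unchanged.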

Blocking Wikidata vandals

If we use Wikidata, and someone vandalizes the data, obviously we won't be able to block them without having special rights in that project.

I don't suppose any of us regular Wiktionary editors is a Wikidata admin too? It would be nice if we got some people with admin rights in both projects eventually.

Apparently we'll need to start using this page to request blocks when needed: wikidata:Wikidata:Administrators' noticeboard. --Daniel Carrero (talk) 12:23, 26 March 2017 (UTC)

One of the dangers of using Wikidata is that if a Wikidata page is vandalized, it will not be visible in Wiktionary Recent changes, and so the vandalism may easily stay unnoticed. Jan Kameníček (talk) 13:48, 26 March 2017 (UTC)
The lag time between vandals being noticed and vandals being blocked is incredibly long, by our standards. We have vandals that could do a great deal of damage if let loose for an entire hour with no admins around. I had not considered this before, but it really raises a risk that we will not be able to look after our own content effectively. What about questionable edits — Wikidata admins aren't going to have the judgement to know what bad lexicography by well-meaning noobs looks like when they try to reword a basic, like those Daniel suggested putting on Wikidata. —Μετάknowledgediscuss/deeds 16:51, 26 March 2017 (UTC)
@Jan.Kamenicek: You are mistaken about Wikidata changes not showing up in Recent Changes or on watchlists. —Justin (koavf)TCM 17:41, 27 March 2017 (UTC)
??? --Jan Kameníček (talk) 18:14, 27 March 2017 (UTC)
@Jan.Kamenicek: I'm not sure what you are asking. You said that Wikidata changes do not appear on Special:RecentChanges. That is not true--they do show up there, and also on watchlists. If you need more help, I will be happy to assist you but I'm not clear on what is confusing. —Justin (koavf)TCM 18:22, 27 March 2017 (UTC)
Chiming in, out of curiosity -- Assuming that Wikidata is included in a Wiktionary page, and that Wikidata is later changed -- do Wiktionary users see any indication on Wiktionary's Special:RecentChanges and watchlists? Or are these changes only visible on Wikidata's wikidata:Special:RecentChanges and watchlists? ‑‑ Eiríkr Útlendi │Tala við mig 18:29, 27 March 2017 (UTC)
@Eirikr: Sorry if I'm somehow being unclear but that is exactly what I am saying, yes--Wikidata changes show up there. They are also published through the syndication feeds as well. Unless you primarily follow pages via email updates, there is basically no chance that you will miss changes to Wikidata. I think I understand your fears but while they are legitimate they are in this instance entirely misguided. —Justin (koavf)TCM 18:33, 27 March 2017 (UTC)
(Also, I have no idea what a "syndication feed" is in this context.) ‑‑ Eiríkr Útlendi │Tala við mig 18:46, 27 March 2017 (UTC)
@Eirikr: Changes made to an entry here which is connected to Wikidata will show up at Special:RecentChanges and on watchlists here, unless you choose to not see them. Syndication feeds are generated for every page in all our wikis. You may be familiar with w:en:RSS? Is this clear enough or is something still confusing? I think at this point, it is simply easiest to show rather than tell if you are having a hard time understanding me. —Justin (koavf)TCM 19:03, 27 March 2017 (UTC)
So, if you change some data on Wikidata which is connected to Wiktionary, it will appear in our recent changes? If that's correct, I was not aware of it yet, but it sounds like good news to me.
Does it work like that on Wikipedia already? I mean, if you change some data on Wikidata which is connected to Wikipedia, will it appear on Wikipedia's recent changes?
For our purposes, if we had to rely just on Wikidata's recent changes (wikidata:Special:RecentChanges), it would be problematic because we would certainly see tons of data changes that are unrelated to Wiktionary. --Daniel Carrero (talk) 19:34, 27 March 2017 (UTC)
@Daniel Carrero: That is exactly correct, yes. And this applies not just to Wikipedia but all other WMF projects as they are all connected now (except the Wiktionaries, Incubator, Old Wikisource, and some of the backend wikis, like outreach:). —Justin (koavf)TCM 19:37, 27 March 2017 (UTC)
If somebody changes the item Albert Einstein on Wikidata, watchers of the only corresponding Wikipedia article Albert Einstein will see it in their watchlist. But the above proposal seems different to me. There is a proposal that Wikidata could be used to store glosses. E. g. the gloss: a domestic canine mammal, Canis lupus familiaris would be stored in some Wikidata item and then used for several different Wiktionary pages: en: dog, cs: pes, de: Hund, pl: pies, ru: соба́ка... Does it mean that watchers of all these English Wiktionary pages would have it in their watchlist if the Wikidata gloss was changed? --Jan Kameníček (talk) 20:00, 27 March 2017 (UTC)
@Jan.Kamenicek If someone modifies d:Q937, then someone who is watching w:en:Albert Einstein and someone watching w:es:Albert Einstein will both see the change in his watchlist. The biggest hurdle is that Wiktionary has two things which more-or-less approximate an interwiki link: both actual interwiki links (like between en:pie and) and a translation (such as between en:foot and). How that will work, I don't know. —Justin (koavf)TCM 21:02, 27 March 2017 (UTC)
@Koavf I was not talking about different language projects but about different language entries within English Wiktionary. I know that d:Q937 is interconnected with w:en:Albert Einstein and w:es:Albert Einstein. The crucial thing for spotting vandalism on a page I watch is whether d:(page with a gloss about a dog) will be interconnected not only with English Wiktionary en:dog and Spanish dictionary es:dog#Inglés, but also with English Wiktionary pies#Polish, pes#Czech, Hund#German, perro#Spanish, пес#Ukrainian and many others. I think this is not possible at the moment. --Jan Kameníček (talk) 07:20, 28 March 2017 (UTC)
@Jan.Kamenicek: This is something good to bring up at d:Wikidata talk:Wiktionary. —Justin (koavf)TCM 16:31, 28 March 2017 (UTC)
There might be anti-vandal benefits to Wikidata, too, by the way. I don't know how the internals work, but on Wikipedia they have some process to detect hard-to-spot data vandalism, like changing a single digit in a chemical formula info-box. Equinox 19:30, 26 March 2017 (UTC)
They have bots with really impressive heuristics to do that sort of work. Presumably they could be deployed on Wikidata as well. There is also quite a bit more work involved in modifying Wikidata data to vandalize Wiktionary, so the number of vandals would be smaller, but the vandals would probably be more savvy. - TheDaveRoss 23:47, 26 March 2017 (UTC)

Euphemistic forms vs euphemisms

Should Category:Euphemistic forms by language (things like what the fudge and f***er) be merged into Category:Euphemisms by language (things like answer the call of nature), or should the two stay separate? A merger was proposed in February of 2014 and attracted no attention, so I'm bringing it up here. They do seem hard to keep separate; I see that a into g is classified as a "euphemism", for example, while p.o.'ed is a "euphemistic form", although they seem to be the same kind of thing. - -sche (discuss) 05:59, 27 March 2017 (UTC)

I think the distinction could be somewhat useful, though I don't outright oppose a merger. I would say a into g is just miscategorized. Andrew Sheedy (talk) 04:51, 28 March 2017 (UTC)

electric toothbrush

Could someone please fix this error that comes up on the page: Lua error in package.lua at line 80: module 'Module:tracking' not found. Thanks. ---> Tooironic (talk) 02:43, 28 March 2017 (UTC)

I did a null edit on the entry. It seems to be OK now. --Daniel Carrero (talk) 02:45, 28 March 2017 (UTC)
It was my fault. @-sche alerted me to it on my talk page, and I fixed it. But unfortunately, pages take a while to update once a module error is fixed. — Eru·tuon 03:19, 28 March 2017 (UTC)

{{descendant}} template

If this has been discussed, forgive me, but I would love a {{cog}} equivalent for descent trees, maybe {{desc}}. It seems such a waste of time to me to have to always manually type it out, i.e. Dutch: {{l|nl|vrede}}. It's also far more prone to typos and other errors. --Victar (talk) 18:14, 28 March 2017 (UTC)

@Victar: I've been thinking about making a general linking template that displays the language name before the link in the way that {{cog}} does, but haven't figured out a good name for it. I can fairly easily create the descendant template that you request. — Eru·tuon 20:39, 28 March 2017 (UTC)
All right, the template has been created. — Eru·tuon 21:03, 28 March 2017 (UTC)
We could add a "bor" parameter to make it easier to type the arrow character (which I can never remember). DTLHS (talk) 21:08, 28 March 2017 (UTC)
Done. — Eru·tuon 21:36, 28 March 2017 (UTC)
@Erutuon: Thanks a ton! And good thinking, @DTLHS. I guess if anyone changes their mind about its validity as a template, they can run a bot to convert them back. --Victar (talk) 21:44, 28 March 2017 (UTC)
@Erutuon: Does this template accept etymology-only languages? I frequently use specific dialects in a situation where I would use this. —JohnC5 01:23, 29 March 2017 (UTC)
@JohnC5: Not yet, but I can make it do that. — Eru·tuon 02:58, 29 March 2017 (UTC)
How about the ability to link multiple terms? See transluceo. Ultimateria (talk) 09:41, 29 March 2017 (UTC)
We don't need that any more than we need it with {{cog}}, {{inh}} or {{der}}. The normal practice is to use {{l}}/{{m}} for subsequent links. —CodeCat 14:18, 29 March 2017 (UTC)
I agree with CodeCat. --Victar (talk) 15:57, 29 March 2017 (UTC)
  • So you're saying we need it and soon, because the normal practice is unnecessary typing work, right? Korn [kʰũːɘ̃n] (talk) 08:47, 31 March 2017 (UTC)
It could be done by using the same parameter names that are used in {{affix}} for the components of a word, but I think it's probably unnecessary. — Eru·tuon 20:42, 29 March 2017 (UTC)
@Erutuon, I forgot to mention that we need to filter out the "Proto-" prefix for reconstructed languages in descent trees. --Victar (talk) 22:13, 31 March 2017 (UTC)
@Victar: I don't understand. Could you give an example? — Eru·tuon 22:26, 31 March 2017 (UTC)
@Erutuon: If you have a look at Reconstruction:Proto-Indo-European/ph₂tḗr, you'll see that Celtic:, Germanic:, and Indo-Iranian: all have the "Proto-" prefix dropped. Why that standard was set in place, I don't know (maybe CodeCat does), but it's what I need to adhere to. --Victar (talk) 23:59, 31 March 2017 (UTC)
It's left over from when we didn't have proto-languages. We'd use the families to group the descendants. When proto-languages were added, it was natural to place the forms after those family names. —CodeCat 00:01, 1 April 2017 (UTC)
Thanks for the background, CodeCat. I think best to just keep to that standard, and if people want to change it in the future, it's an easy change of code. --Victar (talk) 00:07, 1 April 2017 (UTC)
@Victar: Ahh! Now I see what you mean. Done! — Eru·tuon 00:35, 1 April 2017 (UTC)
@Erutuon: Huh, in the tests I just ran using it, it's leaving a residual "o-". --Victar (talk) 01:30, 1 April 2017 (UTC)
Fixed. You forgot to quote the dash. ;-) --Victar (talk) 01:58, 1 April 2017 (UTC)
@Erutuon, CodeCat, what should be done when there are 2+ scripts (e.g. Serbo-Croatian)? Should I use {{etyl}}? —Игорь Тълкачь (talk) 21:41, 1 April 2017 (UTC)
@Useigor: I could be wrong, but I would just specify one script in {{desc}} and another script in {{l}}: {{desc|sh|ре̑ч|sc=Cyrl}}, {{l|sh|rȇč|sc=Latn}}
@Erutuon: Ok, I think I will be using only one script if the words are the same (e.g. {{desc|sh|ре̑ч}}, {{l|sh|рије̑ч}}, {{l|sh|ри̑ч}}). —Игорь Тълкачь (talk) 08:45, 2 April 2017 (UTC)
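
The "Proto-" filtering discussed in this thread can be sketched as follows. This is only an illustration in Python, not the code of the actual template or module (which is not shown here); the function name `strip_proto_prefix` is hypothetical. It compares literal strings rather than using a pattern, which sidesteps the kind of escaping slip mentioned above (in Lua patterns an unquoted `-` is a repetition operator, not a literal dash).

```python
def strip_proto_prefix(language_name: str) -> str:
    """Drop a leading "Proto-" so that "Proto-Celtic" is displayed as "Celtic".

    Hypothetical helper for illustration; names without the prefix are
    returned unchanged.
    """
    prefix = "Proto-"
    if language_name.startswith(prefix):
        # Slice off the whole prefix, dash included, so no "o-" is left behind.
        return language_name[len(prefix):]
    return language_name


# Example names as they would appear in a descent tree.
shown = [strip_proto_prefix(name) for name in ("Proto-Celtic", "Proto-Indo-Iranian", "Dutch")]
```

Here `shown` holds the names with the prefix removed where present, and "Dutch" is left as it is.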

Should alternate plurals link to each other?

If a word has alternate plurals, such as POTUSSES and POTI, should the plurals link to each other as synonyms, or just back to the main entry? Siuenti (talk) 04:40, 31 March 2017 (UTC)

I generally just link back to the main entry, mostly because I'm too lazy to list the alternative plurals. This is particularly relevant in languages like Welsh, where many nouns have as many as three or four different attested plural forms. —Aɴɢʀ (talk) 15:03, 31 March 2017 (UTC)
I do the same. The primary purpose of non-lemma entries is to get the user to the lemma page. It is on that page that they can see the full range of inflections. —CodeCat 15:30, 31 March 2017 (UTC)
I agree, and have been specifically trimming down the moderate number of entries along the lines of CACTI: plural of cactus; alternative form of cactuses. Equinox 15:38, 31 March 2017 (UTC)

April 2017

"removed Category:en:Grasses; added Category:en:Hordeeae tribe plants"

Is this really a helpful change to anyone? Ƿidsiþ 08:39, 1 April 2017 (UTC)

An issue I've raised before is that when things in category A get split instead into subcategories AA, AB and AC, there is no longer a way to get a list of everything in category A: you have to go through all the subcategories. (That might be a UI criticism rather than a criticism of how granularly we actually classify our entries.) Equinox 08:42, 1 April 2017 (UTC)
We have 89 species, 60 genus, and 9 tribe Translingual entries for members of the family Poaceae, which includes all the grasses. All of those entries, ie, Translingual entries for grasses, are accessible using CirrusSearch without the use of a category.
According to the Angiosperm Phylogeny Group, there are 12 extant subfamilies, 707 genera, and 11,337 species. I don't know about the numbers of obsolete names and names of subspecies and cultivars, etc, but sometimes vernacular names are associated with taxa of such low rank.
I suppose we could use subfamilies instead of tribes for subcategories of grasses, but I'm not sure about the stability of the membership in those subfamilies. DCDuring (talk) 17:37, 7 April 2017 (UTC)
First of all, I responded to the concern expressed above by changing the category from "Hordeeae tribe plants" to "Hordeeae tribe grasses", because replacing a category with "grasses" in the name with one that says "plants" makes the name less informative for the vast majority of users who don't know or care what "Hordeeae" refers to. My reason for creating the category was to make Category:en:Grasses more manageable in size. The in-between categories I've been creating aren't intended for most languages, but are helpful in a language like English that has thousands of terms for grasses. In general, I don't create organism categories unless A) they're going to have at least a couple dozen members that would otherwise be too many for the parent category, or B) they're an obvious natural grouping that people are going to want to know about. Most of the categories of the B) type have already been created, so I've been concentrating on the A) ones.
As for whether to do subfamilies instead of tribes: many of the subfamilies are too big, and they're even more meaningless to non-botanists than tribes are. Chuck Entz (talk) 01:20, 8 April 2017 (UTC)
Sidenote: at the presentation level, Equinox's concern can be addressed with DynamicPageList, which was activated on en.WT a decade ago to do pretty much exactly this: show the members of a category tree (there are more modern tools as well). - Amgine/ t·e 18:11, 26 April 2017 (UTC)
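
The idea of listing everything in a category tree can be sketched like this. It is a minimal illustration in Python, not DynamicPageList syntax; the function name and the tiny sample data (category names and entries) are assumptions made up for the example.

```python
def collect_entries(category, subcats, entries, seen=None):
    """Return every entry in `category` and in all of its subcategories.

    `subcats` maps a category to its direct subcategories and `entries`
    maps a category to its direct members (both hypothetical structures).
    """
    if seen is None:
        seen = set()
    if category in seen:
        # Guard against categories that (directly or indirectly) contain themselves.
        return set()
    seen.add(category)
    found = set(entries.get(category, ()))
    for child in subcats.get(category, ()):
        found |= collect_entries(child, subcats, entries, seen)
    return found


# Illustrative data only; real category contents differ.
subcats = {"Grasses": ["Hordeeae tribe grasses"]}
entries = {"Grasses": ["bamboo"], "Hordeeae tribe grasses": ["barley", "wheat"]}
all_grasses = sorted(collect_entries("Grasses", subcats, entries))
```

With this sample data, `all_grasses` contains the direct member of the parent category together with the members of its subcategory, which is the flat list Equinox asks for.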

Centralizing labels

Right now there are dialectal data modules that contain labels used by {{alter}} to label alternative forms. Confusingly, they don't just contain dialect names (for instance, Attic or att for the Attic dialect of Ancient Greek), but also spelling systems (for instance, Oxford for Oxford spelling), morphemic variations (with movable nu), sound changes (apocopic), and perhaps other things.

Dialect labels are also found in Module:labels/data/subvarieties, and are used by {{label}} and {{term-label}}, and by {{alternative form of}}.

The labels used by {{alter}} often correspond to each other. For instance, see ἕως (héōs), a form-of entry, where the Attic label from Module:labels/data/subvarieties is used in the definition line through {{alt form of}}, and the main entry ἠώς (ēṓs), where the Attic label from Module:grc:Dialects is used in the Alternative forms section through {{alter}}.

There are some labels that are only found in Module:grc:Dialects: for instance, apocopic and with movable nu.

The latter would be difficult to use in {{alternative form of}}: grammatically, it has to be placed after the lemma: alternative form of lemma with movable nu. That could be specified in the data module, I suppose.

It would also never be used in {{label}}. That too could be specified in the data file.

This is sort of a rambling post. The point is, I think the so-called dialectal data modules (that don't just contain dialect labels) should probably be moved into Module:labels/data and its submodules, though I am not sure precisely how every detail would work out. This would be easier for the actual dialect labels that are duplicated in Module:labels/data/subvarieties, probably harder for the other types of labels in the dialectal data modules.

I think others have made the same point before, or raised questions on this general theme; @CodeCat, @Angr? — Eru·tuon 21:46, 1 April 2017 (UTC)

I appreciate the problem, but I don't have any ideas on solving it. —Aɴɢʀ (talk) 12:23, 3 April 2017 (UTC)

Oh, and I forgot about the accent labels in Module:a/data, which are used by {{a}} in Pronunciation sections. Those would also be candidates for centralization. — Eru·tuon 02:39, 6 April 2017 (UTC)

And the labels used in {{qualifier}} contain much of the same content as those used in {{label}}, except they do not add categories. That too could be centralized. — Eru·tuon 03:06, 6 April 2017 (UTC)

Yeah, I gotta admit, I have no clue when to use {{label}} over {{qualifier}}. --Victar (talk) 02:14, 19 April 2017 (UTC)
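
One way to picture the centralized data suggested in this thread is a single table in which each label records where it may be used and where it is placed. The sketch below is in Python purely for illustration (the real data modules are not shown here), and every field name (`aliases`, `after_lemma`, `allowed_in_label`) as well as the output wording is a hypothetical choice, not an existing convention.

```python
# Hypothetical unified label data: one record per label, shared by all templates.
LABELS = {
    "Attic": {"aliases": ["att"], "after_lemma": False, "allowed_in_label": True},
    "with movable nu": {"aliases": [], "after_lemma": True, "allowed_in_label": False},
}


def resolve(name):
    """Map a label or one of its aliases to the canonical label name."""
    for label, data in LABELS.items():
        if name == label or name in data["aliases"]:
            return label
    raise KeyError(name)


def alt_form_text(lemma, name):
    """Build form-of text, placing the label before or after the lemma as the data says."""
    label = resolve(name)
    if LABELS[label]["after_lemma"]:
        return f"alternative form of {lemma} {label}"
    return f"({label}) alternative form of {lemma}"
```

The point of the design is that placement ("after the lemma") and availability ("never used in {{label}}") live in the data, so each template only has to read the flags it cares about.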


Why can't I find million if I search for 1000000? Siuenti (talk) 16:40, 2 April 2017 (UTC)

There are infinitely many numbers and we have voted not to include most of them. Equinox 16:41, 2 April 2017 (UTC)
So 1000000 doesn't lead to million for the same reason 97432 doesn't lead to ninety seven thousand, four hundred and thirty two? Siuenti (talk) 16:45, 2 April 2017 (UTC)
one million is the first result when you search for 1000000 (if you bypass the annoying automatic redirect to ១០០០០០០). DTLHS (talk) 16:51, 2 April 2017 (UTC)
The trick is to use the "search" button, not the "go" one. The clue is in the name. SemperBlotto (talk) 16:55, 2 April 2017 (UTC)
I don't seem to have a search button. I do have a little box that says "search wiktionary" Siuenti (talk) 17:11, 2 April 2017 (UTC)
You don't have this? [4] Equinox 17:15, 2 April 2017 (UTC)
Here's what mine looks like [5]. I'm sure there's some preference that controls this. DTLHS (talk) 17:20, 2 April 2017 (UTC)
Yeah mine looks like that if I log out. Can you please make a separate thread about the search button? Siuenti (talk) 18:40, 2 April 2017 (UTC)
Did you guys change your default skin from Vector to Monobook? —suzukaze (tc) 23:31, 3 April 2017 (UTC)
I don't think that matters if I'm logged out. Siuenti (talk) 23:47, 3 April 2017 (UTC)
Vector is the default, including for logged-out users. —suzukaze (tc) 20:57, 4 April 2017 (UTC)

"Declension" and "Conjugation" on the list of valid headers at WT:EL

If you look at the list, it says "Declension" and "Conjugation" right below the POS header, where the headword line should be. This is confusing at best (since there are also sections with these names) and misleading at worst (since there is more often no inflection in the headword line at all). So I think this should be changed to just say "Headword line", so that it matches the name of the section further down that describes the use of the headword line.

Another possible improvement, I think, would be to link the various elements in the list to the sections within the page that describe them. So the aforementioned headword line would become a link to the "Headword line" section. —CodeCat 23:26, 3 April 2017 (UTC)

I agree. I don't think that this is a substantial change that needs a consensus, so I'll go ahead and do it. --WikiTiki89 17:01, 4 April 2017 (UTC)
Thank you. The second point remains, though. —CodeCat 17:19, 4 April 2017 (UTC)

Admin v Sysop

We have two current votes - 1) x for de-admin, and 2) y for desysop. Am I mistaken in assuming these are one and the same thing? — Saltmarsh. 19:07, 4 April 2017 (UTC)

They are the same. Sysop is not a term in official use around here but some people have picked it up elsewhere. Equinox 19:09, 4 April 2017 (UTC)
My question was really rhetorical. Perhaps we should "de-sysop" our literature - mixing the terms will confuse - for example Wiktionary:Administrators uses both of them — Saltmarsh. 20:06, 4 April 2017 (UTC)
My bad. I started the votes, and just love using synonyms and grandiloquent words. I find it so...good. --G23r0f0i (talk) 20:10, 4 April 2017 (UTC)
You sound like someone who should be a sysop around here. - TheDaveRoss 20:14, 4 April 2017 (UTC)
Not yet. I need a bit more experience, but thanks for the support. --G23r0f0i (talk) 20:18, 4 April 2017 (UTC)
We should de-admin everything too. The real term is "administrator" (if you stress it on the first syllable it sounds more intimidating). After all, you don't call the Terminator a "termin". --WikiTiki89 20:33, 4 April 2017 (UTC)
I have a pet allig. --G23r0f0i (talk) 20:49, 4 April 2017 (UTC)
Actually, sysop is used by the popups that I have enabled: when I put my cursor over a link to a user page, for instance the one in @Wikitiki89's signature, it says sysop. The protection text for modules (from MediaWiki:Protectedpagewarning) also says sysop. — Eru·tuon 20:44, 4 April 2017 (UTC)
Mediawiki itself uses the terms interchangeably, the usergroup is called sysop in the database, but has been "translated" to administrator on Wikimedia projects. Some messages refer to one and some the other. - TheDaveRoss 12:13, 5 April 2017 (UTC)

Osco-Umbrian/Sabellic languages

Should this be added as a family node? It'd be especially helpful with loanwords into Latin like rūfus, lupus, and bōs, which are currently categorized as deriving from both Oscan and Umbrian, if at all. KarikaSlayer (talk) 19:38, 4 April 2017 (UTC)

Is it actually a genetic family, though? Or just a group of related languages? —CodeCat 12:31, 5 April 2017 (UTC)
It's listed as a node in Glottolog and {{R:De Vaan 2008}} (which also references Proto-Sabellic, implying common origin), at least. KarikaSlayer (talk) 17:15, 5 April 2017 (UTC)
Some of the main distinctions between Sabellic and Latino-Faliscan are:
  • Intervocalic *b, *d, *g, and *gʷ all become *f
  • *kʷ becomes *p
  • Gen.sg. endings *-ojso and *-os were replaced by *-ejs
  • 3pl. ending -nd becomes *-ns
There are probably a bunch more, but those come from a bit of scrounging. —JohnC5 18:12, 5 April 2017 (UTC)
Strictly, I'd say intervocalic *ɸ, *θ, *x, and *xʷ all become *f in Sabellic, while in Latino-Faliscan they become *b, *d, *g, and *gʷ. But the result is the same. —Aɴɢʀ (talk) 22:37, 5 April 2017 (UTC)
So, how would we go about adding this then? I guess we'd have to decide on a name (Osco-Umbrian? Sabellic? Sabellian?) and a language code (itc-sab would be the most obvious choice). KarikaSlayer (talk) 19:32, 17 April 2017 (UTC)
I'm not really in favour of creating very small families. They make our category structure more messy. —CodeCat 19:37, 17 April 2017 (UTC)
"Derived from [family]" is probably something we should be doing less of, in general. It seems to have two separate uses: a word being derived from another family's proto-language (and added before we had many proto-languages around); or, a word being derived from one of a group of languages, but we don't know which one. With Latin we probably have cases of the second kind, which does not even require the languages in question being from a single sub-family. So filing them under both Oscan-derived and Umbrian-derived is maybe the best option so far. (Maybe we eventually want to add something like "Latin words with competing etymologies" to help with machine-readability.) --Tropylium (talk) 01:37, 29 April 2017 (UTC)


An important announcement...the first ever batch of ten-year-old entries have finally matured. Top of the list is unperceptive, which has been brewing 10 whole years without being corrected. It was made by some prat called Keene (talkcontribs). I wonder what came of him... --G23r0f0i (talk) 20:17, 4 April 2017 (UTC)

Why is it the first-ever batch? There have been ten-year-old entries for at least three years now. --WikiTiki89 20:36, 4 April 2017 (UTC)
Nope, I've been monitoring OldPages for many years now. This is first time they've matured. And to show how sad I am, I have been playing a game to try to get all WF pages in the top 20 oldest pages. --G23r0f0i (talk) 20:47, 4 April 2017 (UTC)
It's true that, according to that report, "the first ever batch of ten-year-old entries have finally matured".
But, FWIW, the report is simply wrong. For instance, the entry dictionary was created on 12 December 2002. --Daniel Carrero (talk) 21:01, 4 April 2017 (UTC)
By "oldest pages" it means pages that have not been edited for the longest time. DTLHS (talk) 21:05, 4 April 2017 (UTC)
OK then. --Daniel Carrero (talk) 21:05, 4 April 2017 (UTC)
Still though, it's quite possible entries have "matured" before and then were subsequently edited. --WikiTiki89 21:16, 4 April 2017 (UTC)
I remember Keene (talkcontribs). Most of the time, he did pretty good work. He had some excellent bots that he used. —Stephen (Talk) 19:11, 5 April 2017 (UTC)
Stephen, I wish I knew if you have a really dry sense of humour or not, because sometimes I just can't tell. —Μετάknowledgediscuss/deeds 22:27, 5 April 2017 (UTC)
My sense of humor can be dry and hard to detect, so I try to avoid humor. I'm serious about WF, I can't help liking him. I admit he used to be irritating, but I've gotten used to his antics. —Stephen (Talk) 10:09, 7 April 2017 (UTC)
I would rather say that if not even a bot has had to update a single bit of formatting in all these years, these entries are more immature than mature (no IPA, no etym, no usex, ...). Julien Daux (talk) 09:43, 8 April 2017 (UTC)
In that list, I found some entries (like in this diff) lacking a space between the headword line and the definitions. Like this:

# blah blah blah blah
So, apparently our bots haven't been fixing that small formatting "mistake" in the last 10 years. --Daniel Carrero (talk) 15:14, 8 April 2017 (UTC)
Yes, in that entry, the last "blah" is definitely redundant. --G23r0f0i (talk) 17:06, 8 April 2017 (UTC)
But SRSLY though, JD makes a good point. OldPages is generally a good place to start if you want to look for stubs and update some formatting. --G23r0f0i (talk) 17:06, 8 April 2017 (UTC)

to ñandú

If I was reading some Spanish which doesn't have diacritics, I might end up at the page nandu because I didn't know what it meant. What would be my next step? (what I need to find is ñandú but I don't know that yet) Siuenti (talk) 09:17, 5 April 2017 (UTC)

If you're reading Spanish with missing diacritics, your next step is to throw it away and get some Spanish that's spelled correctly. Failing that, I suppose you could go through all the pages listed under "See also:" at the top of the page nandu until you find one with a Spanish entry. —Aɴɢʀ (talk) 11:36, 5 April 2017 (UTC)
Oh come on. Don't deny reality. --WikiTiki89 15:38, 5 April 2017 (UTC)
That seems a bit inconvenient when there could be a #Spanish section that could take me straight there. Siuenti (talk) 21:30, 5 April 2017 (UTC)
This is one of the reasons why we have the see-alsos at the top of the page. --WikiTiki89 21:34, 5 April 2017 (UTC)
I've seen form-of entries for diacriticless forms in other languages; perhaps such entries can be created for Spanish words too? — Eru·tuon 21:35, 5 April 2017 (UTC)
Example: etre, which is marked as a misspelling. — Eru·tuon 21:37, 5 April 2017 (UTC)
I think it would be a little much to create these for all attested diacriticless spellings. --WikiTiki89 21:38, 5 April 2017 (UTC)
Given that French has more diacritics than Spanish and it already appears to have lots of diacriticless entries, it would make more sense to say it is too much to have such entries. — Eru·tuon 21:44, 5 April 2017 (UTC)
Actually, there aren't all that many entries in Category:French misspellings and certainly not one diacriticless spelling for every diacriticful French spelling, so I'm not sure what the criteria for including them were and how to determine whether similar Spanish entries should be added. But I would have thought any attested misspellings can have entries; isn't attestation the criterion for inclusion? — Eru·tuon 22:06, 5 April 2017 (UTC)
Misspellings are treated differently by the criteria for inclusion. They are only allowed if they are exceptionally common. — Ungoliant (falai) 22:11, 5 April 2017 (UTC)
There's a three-way distinction to be made here (I'll talk specifically about diacritics, but this can be applied to any spelling variations in general):
  • Misspellings: the author omits the diacritic(s) accidentally (whether this is simply a mistake or due to lack of knowledge of the "correct" spelling, in an otherwise diacriticized text). These need to be exceptionally common to merit inclusion.
  • Alternative spellings: the author intentionally omits the diacritic(s) (in an otherwise diacriticized text). These follow the ordinary inclusion criteria.
  • The entire text lacks diacritics (which I think is the case Siuenti is referring to). These I don't think should be included, but we don't really have a policy.
--WikiTiki89 16:13, 6 April 2017 (UTC)
What about a search for "nandu spanish" instead of plain "nandu"? —suzukaze (tc) 00:08, 6 April 2017 (UTC)
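
The lookup problem in this thread (getting from nandu to ñandú) comes down to comparing titles with their diacritics removed. A minimal sketch, assuming Python's standard `unicodedata` module and hypothetical helper names; it is not how the site's search or the "See also" lists are actually implemented:

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks: decompose each character, then drop the marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def candidates(typed: str, titles):
    """Return the titles that match `typed` once diacritics are ignored."""
    key = strip_diacritics(typed)
    return [title for title in titles if strip_diacritics(title) == key]


# A small illustrative title list, not real site data.
matches = candidates("nandu", ["nandu", "ñandú", "nando"])
```

Here `matches` keeps both the plain title and the Spanish spelling, so a reader of diacriticless text could be pointed at either.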

Should IPs be allowed to create new entries?

A good deal of IP vandalism involves creating new pages, which take much more time to delete than it takes to rollback a vandalistic edit when patrolling. Good IP edits are usually small changes to existing entries or adding translations; creating new entries requires a greater amount of experience. Those IP editors that do spend enough time to learn that can take the time to register an account anyway, and most vandals won't bother. Wikipedia has already had this system in place for a while, which has greatly reduced their patrolling effort (I should note that we have quite a backlog now). Is there support for restricting IP editors from creating new pages? —Μετάknowledgediscuss/deeds 18:22, 5 April 2017 (UTC)

I would oppose that restriction. (I hate having to sign up for things, for many reasons, and think it goes against the openness of a wiki project. I can remember when Web forums were things you could just post on, without having to give away your e-mail address etc.) However, I would support making more use of abuse filters, e.g. blocking new entries that lack certain basic elements, and automatically encouraging the user to create a valid one: trolls won't bother, while those with something to contribute probably will. Equinox 18:24, 5 April 2017 (UTC)
(edit conflict) Yes, they should. I was an IP before I became a registered user in 2013, and I believe other registered users started contributing in the same manner before they took the plunge. DonnanZ (talk) 18:37, 5 April 2017 (UTC)
I agree in principle; however, I'm always cautious when it comes to restricting participation. Believe you me, I'm the first to celebrate less vandalism but I'm inclined to agree with Equinox that optimising abuse filters might be the way to go before having to consider, for lack of a better word, more "drastic" measures. --Robbie SWE (talk) 18:32, 5 April 2017 (UTC)
  • An Abuse Filter that prohibits IPs from adding new entries lacking a valid L2 header would be a good improvement. Can that be written so we can test it for false positives? @DTLHS, maybe? —Μετάknowledgediscuss/deeds 18:53, 5 April 2017 (UTC)
    • @Metaknowledge Invalid L2 headers will be picked up by people doing dump analysis (I know me and Ungoliant MMDCCLXIV at least check this). There's also no way to tell if a L2 header is valid in the abuse filter, just that it's present. DTLHS (talk) 00:57, 6 April 2017 (UTC)
      • However, I have created abuse filter 64 (disabled for now) and will see how many hits it gets. DTLHS (talk) 00:59, 6 April 2017 (UTC)
  • Another observation I'd make is that omitting headers isn't the real problem we want to solve. A blanket ban might further discourage users who have real content to contribute but don't know how to format it. The really "bad" kind of vandalism is the people who just want to write PENIS everywhere. Equinox 16:15, 6 April 2017 (UTC)
  • I think we can handle cleanup or deletion of new entries of anons. There are multiple ways of identifying them. Sometimes such contributions open our little world to entire areas of language that we have neglected, eg, technical jargon specific to trades and industry. DCDuring (talk) 11:00, 7 April 2017 (UTC)
  • Any restriction on anon users will increase the ratio of bad registered users to good registered users--Giorgi Eufshi (talk) 11:33, 7 April 2017 (UTC)
I'd like to extend that so that you need to be logged in to make edits to reconstructed entries. Often they're simply reverted because we have no one to address a discussion to. --Victar (talk) 23:52, 18 April 2017 (UTC)
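
The header check proposed in this thread can be sketched as below. It is written in Python for illustration only and is not the rule used in abuse filter 64 (whose conditions are not shown here); the names `L2_HEADER` and `has_l2_header` are hypothetical. As DTLHS notes, a check like this can only tell that a level-2 header is present, not that it names a valid language.

```python
import re

# A level-2 header is a line of the form ==Something== (exactly two equals signs each side).
L2_HEADER = re.compile(r"^==[^=\n]+==[ \t]*$", re.MULTILINE)


def has_l2_header(wikitext: str) -> bool:
    """Return True if the new page text contains at least one level-2 header line."""
    return L2_HEADER.search(wikitext) is not None


# A plausible new entry versus a page with no language section at all.
good_page = "==English==\n\n===Noun===\n{{en-noun}}\n\n# A definition."
bad_page = "PENIS"
```

A filter built on this would let `good_page` through and flag `bad_page`, while a page containing only `===Noun===` would also be flagged because it has no level-2 header.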

We are missing lots of words

As an example, the numbers of words in the following categories are:-

Or we could just be missing lots of simple etymology sections. SemperBlotto (talk) 15:51, 6 April 2017 (UTC)

I'm kind of on it! Equinox 15:55, 6 April 2017 (UTC)
Are you trying to tell us Wiktionary is not finished yet? --WikiTiki89 16:13, 6 April 2017 (UTC)
Our French coverage is pretty good overall, so I suspect the low numbers have more to do with missing etymologies than anything. Often, the spellings are the same as the English word, or similar, and people tend not to put etyms for FL's when that is the case. Andrew Sheedy (talk) 02:14, 7 April 2017 (UTC)


Wikibase was briefly mentioned in the Beer parlour in February. There is now under development something, a Mediawiki extension, called WikibaseLexeme (mw:Extension:WikibaseLexeme). I'm not sure where this fits in the great plan, but it looks as if it could reorganize and centralize some of Wiktionary's work, in a similar fashion as Wikidata took over and centralized Wikipedia's interwiki links.

Should we care or not? Does it concern us? Or when will it concern us? I don't know.

Anyhow, this extension has a data model (mw:Extension:WikibaseLexeme/Data_Model), that says words have forms (bird-birds, hard-harder-hardest) that are listed, that is, they are explicitly enumerated for each word without any intermediate template or inflection pattern. For me, a speaker of Swedish and German, languages rich in forms that follow a few patterns, this sounds rather stupid, so I had to ask for clarity. Here is the discussion: mw:Topic:Tneli5zzrq5jb5bo, in case anybody is interested. --LA2 (talk) 20:59, 6 April 2017 (UTC)

It does sound stupid. Evidently they are trying to render us obsolete without even understanding how lexicography works. We should be concerned, but it seems that these people don't really have any desire to work with us, let alone listen to us. —Μετάknowledgediscuss/deeds 21:34, 6 April 2017 (UTC)
Storing forms raw makes sense to me. If the Wikidata side tries to become so complex so early, it could end up a huge mess. Languages with complex inflection patterns should have them handled on local Wiktionaries, not on Wikidata, at least until WikibaseLexeme becomes far more sophisticated. --Yair rand (talk) 21:56, 6 April 2017 (UTC)
Seems interesting but woefully incomplete... —Aryamanarora (मुझसे बात करो) 13:45, 7 April 2017 (UTC)
Seems to follow an excessively Rationalist approach. Wouldn't something more evolutionary be better? DCDuring (talk) 17:10, 7 April 2017 (UTC)

Start of the 2017 Wikimedia Foundation Board of Trustees elections

In addition to the Board elections, we will also soon be holding elections for the following roles:

  • Funds Dissemination Committee (FDC)

What to Do With New Entries for Numerals > 100

I've created an abuse filter that seems to be good at spotting new entries by non-autopatrolled accounts and IPs where the page name consists of nothing but digits and can be parsed as a number greater than 100. Right now I have it just tagging entries, because I'm not sure exactly what we want it to do when it finds such an entry.

I'm thinking we don't want to disallow, because there are no doubt many potential entries for numerals with meaning other than their numeric value (e.g. 411, 666 and 360). I think the best option would be to tag and warn, with a message explaining our policy/practice. Does anyone (perhaps @BD2412?) want to write one so I can add it to the abuse filter? Chuck Entz (talk) 03:53, 7 April 2017 (UTC)

  • I'd be glad to, but I'm turning in for the night now, and won't have time to focus on it until Monday. bd2412 T 04:21, 7 April 2017 (UTC)
  • I think you should think carefully about whether there should be entries in cases where the number is a synonym for an NSOP entry, such as 1000. Siuenti (talk) 08:37, 7 April 2017 (UTC)
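The condition described at the top of this thread (a page name made up only of digits that parses as a number greater than 100) can be sketched in Python as follows. This is not the actual abuse filter rule, which is written in the filter's own rule language; the function name and the restriction to ASCII digits are assumptions made for illustration.

```python
def is_large_numeral_title(title: str, threshold: int = 100) -> bool:
    """Return True if the title is all ASCII digits and its value exceeds threshold."""
    # Assumption: only ASCII digits count. str.isdigit() alone would also
    # accept characters such as superscript digits, which int() cannot parse.
    if not title or not (title.isascii() and title.isdigit()):
        return False
    return int(title) > threshold
```

A check like this only flags candidates; titles such as 411 or 666 would still match, which is why tagging and warning rather than disallowing is suggested above.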

Norwegian languages

I guess this has been asked umpteen times before, but anyway: if a word is the same (e.g. "nemnd") in Norwegian Bokmål and Norwegian Nynorsk, does one make two entries, one for each language, or only one for "Norwegian"? --Hekaheka (talk) 13:23, 7 April 2017 (UTC)

  • Two entries. DonnanZ (talk) 13:30, 7 April 2017 (UTC)
Thank you. What are Norwegian (no) entries for? --Hekaheka (talk) 13:34, 7 April 2017 (UTC)
They're older entries which haven't been separated yet, which is something I will get back to. They're well and truly in the minority now. DonnanZ (talk) 13:38, 7 April 2017 (UTC)
In some cases they may stay where they are, especially proper nouns, surnames and the like which apply to both languages. DonnanZ (talk) 13:42, 7 April 2017 (UTC)

Proposal: Create entries for all unattestable Unicode symbols but without "real" definitions



  1. Create entries for all Unicode codepoints that are assigned to something, attestable or not: characters, symbols, emojis, diacritical marks, "box drawing" stuff, etc. (except control characters, I guess)
    • Example: ¤ (Talk:¤) failed RFV and doesn't exist. We could recreate it anyway.
  2. For unattestable symbols, instead of a "real" sense, the sense should be a comment like This symbol is not attested.
  3. Keep symbol redirects when multiple codepoints are the "same" symbol in some way.
    • Example 1: We can keep redirecting , and to ! (i.e., redirecting fancy exclamation points to the normal exclamation point).
    • Example 2: is the reverse empty symbol, which failed RFV (Talk:⦰), but today I redirected it to , the normal empty symbol.
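As a rough illustration of point 1, the set of "assigned, non-control" code points can be approximated with Python's standard `unicodedata` module. This is only a sketch: the proposal mentions excluding control characters, while the exclusion of surrogates and private-use code points here is an added assumption, and the result depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# General categories excluded in this sketch:
# Cn = unassigned, Cc = control, Cs = surrogate, Co = private use.
# Only the control-character exclusion comes from the proposal itself.
EXCLUDED_CATEGORIES = {"Cn", "Cc", "Cs", "Co"}


def is_candidate(char: str) -> bool:
    """Return True if the code point is assigned and not in an excluded category."""
    return unicodedata.category(char) not in EXCLUDED_CATEGORIES


def candidates(start: int, end: int):
    """Yield candidate characters in the inclusive code point range."""
    for code_point in range(start, end + 1):
        char = chr(code_point)
        if is_candidate(char):
            yield char
```

Such a listing says nothing about attestation; it only enumerates which code points would be in scope.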


  • This idea of not using "real" senses was inspired by {{translation only}}, used in entries like older sister and three days ago.
  • We can use the "Description" section to say what is the shape of the symbol, instead of using senses for that.


  • About the "not real" senses:
    1. If a symbol is not used enough by humans, maybe we can't say it "means" anything in a descriptive dictionary. The symbol ¤ is the "currency sign" according to Unicode (a prescriptive authority) but it does not seem to be used in three durably-archived sources, so in a sense it does not "mean" anything.
  • About the inclusion of all symbols:
    1. I don't have any actual numbers, but I wonder if many people are interested in searching for random symbols and emoji. At least, there are some websites dedicated to doing these searches, and Wiktionary would join them. We do have complete Unicode appendices, but they are "buried" in the appendix namespace. They may be hard to find unless someone already knows where to look. Having entries for all symbols would make them noticeably easier to find.
    2. Entries can have content that can't fit in the Unicode appendices, such as non-durable citations and references from the internet, at most two citations from durable-archived sources (which is not enough for attestation), cross-links/tables/"see also", multiple images, and categories.
    3. This should stop the current practice of creating unattestable symbol entries where the sense is the shape of the symbol. Currently, 🌵 is defined as "cactus", but this does not seem citable. This exact sense would be attested by three quotations like: "I was in the desert and I saw a 🌵!" Otherwise, we can look for other senses of 🌵, or it would probably just fail RFV.

Entry examples:

(Disclaimer: I don't know if all the descriptions are good. The descriptions can be changed in the future.)

Exhibit 1: cactus (🌵)
{{character info/new}}

A [[cactus]].

# ''This symbol is not attested.''

Exhibit 2: currency sign (¤), failed RFV: Talk:¤
{{character info/new}}

A small [[circle]] in the middle of a small [[X]] shape. (?)

# ''This symbol is not attested.''

Exhibit 3: a ribbon arrow (), failed RFV: Talk:⮴
{{character info/new}}

A ribbon-shaped arrow folded in 90 degrees, coming from the right side and pointing upwards.

# ''This symbol is not attested.''

Exhibit 4: fuel pump ()
{{character info/new}}

A [[fuel pump]].

# a [[gas station]]

Exhibit 5: skull and crossbones ()
{{character info/new}}

A [[skull]] and [[crossbone]]s.

# [[death]]
# [[poison]]
# [[piracy]]

Exhibit 6: light bulb (💡)
{{character info/new}}

A [[light bulb]].

# an [[idea]]

Exhibit 7: heart exclamation mark ()
{{R character variation}}

--Daniel Carrero (talk) 17:10, 7 April 2017 (UTC)

If we were mass creating these entries, how would we determine if they were attested or not? False negatives in other words. DTLHS (talk) 17:27, 7 April 2017 (UTC)
I would be okay with just allowing their manual creation as opposed to necessarily mass creating these entries. Maybe there are some groups of entries that can be safely mass-created, like Appendix:Unicode/Box Drawing and Appendix:Unicode/Block Elements. --Daniel Carrero (talk) 17:46, 7 April 2017 (UTC)
I certainly don't agree with creating pages whose definitional content is just "this isn't attested". And remember that there's some real junk in Unicode for compatibility reasons etc. Equinox 17:42, 7 April 2017 (UTC)
The NISOPs like older sister are defined as this, which looks very similar to "Entry not attested": "Used other than as an idiom: see older,‎ sister. (This entry is here for translation purposes only.)"
Do you have some examples of Unicode compatibility junk you think we shouldn't have entries for? --Daniel Carrero (talk) 17:49, 7 April 2017 (UTC)
The difference is that "older sister" is attested. --WikiTiki89 18:14, 7 April 2017 (UTC)
Yes, but that's beside the point as stated by Equinox above. He mentioned the definitional content, but both "older sister" and the proposed "¤" have basically the same definitional content. --Daniel Carrero (talk) 18:18, 7 April 2017 (UTC)
Still different. The definition in your proposal is basically saying "I do not exist". With "older sister", it's just saying "nothing needs to be said because the definition is obvious". --WikiTiki89 19:05, 7 April 2017 (UTC)
I oppose entries for unattested symbols. --WikiTiki89 18:09, 7 April 2017 (UTC)
I also oppose. - TheDaveRoss 18:33, 7 April 2017 (UTC)
What about just hard redirecting all unattestable symbols to the relevant page in the Unicode Appendix? Andrew Sheedy (talk) 00:05, 8 April 2017 (UTC)
Personally, I don't like that idea very much. If I search for 🍨 (ice cream symbol, which may be unattestable), I would like to see info about it, not about a whole list of food entries, which might be confusing. It's not even clear why the reader was redirected in the first place -- they might not know what Unicode is. More importantly: if the person does not have the right fonts to see the symbol, it would probably reduce significantly their ability to find the right symbol in the list. --Daniel Carrero (talk) 21:23, 16 April 2017 (UTC)
The main reason for not doing this is that it really provides no useful information at all. The user clicks on the link, to be told that they wasted their time. There's already far too much filler on the web as it is: it's easy to compile and present, and people like to think they're being comprehensive. I, for one, find it annoying to look something up and find nothing but the obvious and self-evident. I don't want to be told that there are no elephants native to my zip code, that a green leaf is a leaf that's green in color, or that a small circle inside of an x shape can be described as a small circle inside of an x shape. If the user's reaction is going to be "well, duh...", we're better off not having the entry. Chuck Entz (talk) 00:34, 8 April 2017 (UTC)
As I said below, it's alright if people don't want to do it as proposed above, and your arguments make sense to me. Still, I'd like at least to add {{no entry}} in all the unattestable entries for symbols. It shows that we don't have them because they don't actually exist, as opposed to just having an "incomplete" coverage of symbols. We might want to delete the hundreds of unattestable symbols that we already have, but people will probably keep creating them. --Daniel Carrero (talk) 02:15, 10 April 2017 (UTC)
About this: "a small circle inside of an x shape can be described as a small circle inside of an x shape". Not all people have fonts for all characters, and some characters like 🏦 have major rendering variations. The symbol ¤ can also have usage notes like "Unicode describes it as the 'currency sign' but it has not found widespread use." Plus, the entry for ¤ can have information about the same character in different encodings, and the etymology and derived terms when applicable. --Daniel Carrero (talk) 01:23, 11 April 2017 (UTC)
Does "Unicode characters" include weird hanzi Unicode encoded because they have bizarre standards for inclusion? —suzukaze (tc) 01:23, 8 April 2017 (UTC)
Yes, the proposal above would include those, too. FWIW, this hanzi is already a "no entry": 𪜁. I would support adding the "no entry" in all the weird non-existing hanzi. It shows that we don't have them because they don't actually exist, as opposed to just having an "incomplete" coverage of hanzi. But, if people don't want the "no entry" in all weird hanzi, then I'd suggest deleting that one, too. --Daniel Carrero (talk) 23:55, 9 April 2017 (UTC)
@Daniel Carrero: These ghost kanji may be disanalogous; see Category talk:Ghost kanji. — I.S.M.E.T.A. 23:35, 13 April 2017 (UTC)
It's OK if people don't want to create the entries for all the symbols without "real" definitions as I had proposed above. @Chuck Entz's reasoning is one that makes sense to me, too.
Aside from that idea, what about the symbols that we already have but don't seem to be actually used in three durably-archived sources? I assume it would be OK to just delete most of them outright? Maybe opening RFV discussions for all these symbols is not feasible, because there are too many.
I mentioned the cactus (🌵) above, which may not be attestable and could be deleted. There are also 283 entries (+1 appendix) in Category:Miscellaneous Symbols and Pictographs block, and 169 entries (+1 appendix) in Category:Miscellaneous Symbols block. (A few of these symbols already have citations and don't need to be deleted, I believe.) --Daniel Carrero (talk) 23:49, 9 April 2017 (UTC)

FWIW, I support Daniel Carrero’s proposal. At the very least, an entry’s Description section, transclusion of {{character info/new}}, and any image that may be included in a given entry serves to elucidate incomprehensible mojibake. — I.S.M.E.T.A. 23:43, 13 April 2017 (UTC)

Concerning my proposal above, so far we have:
  • 2 support "votes", counting myself
  • 3 oppose "votes", I believe
  • I'm not counting @Andrew Sheedy and @suzukaze-c in this support/oppose list
Naturally, I'd prefer doing as I proposed before (having all these symbol entries without "real" definitions, or with {{no entry}}).
But let's assume that proposal fails. (Implementing my idea would require a 2/3 majority in an actual vote, it goes without saying.) I guess the natural course of action would be doing the opposite, and deleting all the unattestable symbols. Some people here in this discussion seem to support deleting all those entries. We seem to have at least a few hundred entries for symbols likely to be unattestable: (these are all bluelinks at the moment) 🍨, 🍩, 🍪, 🍫, 🍬, 🍭, 🍱, 🍶, 🍼, 🎀, 🎂, 🎃, 🎄, 🎅, 🎆, 🎑, 🎒, etc.
I won't create a few hundred RFVs because it would flood WT:RFV (well, maybe I could start with a few RFVs, just not all the symbols at once!), but in theory we could create those RFVs or just delete the symbols outright. This would be consistent with our normal practices.
I dislike having all these unattestable symbol entries with the Unicode name used as the definition, and I know I'm not the only person with that opinion. Sure, 🍱 means "bento box" but it's just because Unicode says so, not necessarily because there's a symbol used like that in real life. It also apparently sets an uncomfortable precedent -- it may look like we could also create (these are all redlinks at the moment) 🍲, 🍰, 🍯, 🍮, 🍧, 🍥, 🍤, 🍣, 🍢, 🍡, etc. using the same kind of definitions.
Deleting or attesting all these symbols would fix that problem, or creating all the symbols without "real" senses. But I know it's not up to me. --Daniel Carrero (talk) 21:16, 16 April 2017 (UTC)

Separate proposal

I do support adding entries for symbols even though not attested, but make actual definitions for them instead of just saying "not attested". The rationale is that, though symbols such as phone emojis are usually not used in durably archived sources, they are EXTREMELY commonly used in casual texting conversation. 😊 For example, this emoji is more common than I can even say, yet we don't even have an entry for it. On a side note, some genius in this world should come up with a way to durably archive some public chat room text so Wiktionary can have more attested words. PseudoSkull (talk) 21:50, 16 April 2017 (UTC)

I created WT:RFV#😊 for that symbol, let's see if we can find some citations for it.
Thanks for supporting the idea of adding entries for unattested symbols. But let's talk a bit more about your separate proposal. Let's assume this entry is unattestable: 🍨 (it's the ice cream symbol). How would you define it? Sure, we could just say "ice cream" in the sense, but if it's not used like that in real life, then we are not being descriptive anymore, we are being prescriptive. Besides, we already use the "Description" section to say what the symbol looks like.
Compare the symbol . It's the "fuel pump" in Unicode, but it's not a symbol meaning fuel pump. (we didn't find any citations where that symbol means "fuel pump") It looks like a fuel pump, but it means "gas station". If you see that symbol somewhere, it's probably indicating a gas station. --Daniel Carrero (talk) 22:08, 16 April 2017 (UTC)
@ User:Daniel Carrero It would be described based on how it's really used, i.e. with a non-gloss definition. Used to express the need to use or state of using a fuel pump. Used to express the desire for or presence of ice cream. PseudoSkull (talk) 22:42, 16 April 2017 (UTC)
For some support, the Japanese and Swedish Wiktionaries actually have entries for these symbols. PseudoSkull (talk) 22:44, 16 April 2017 (UTC)
Unfortunately, I'm not a big fan of these definitions that you wrote. "Used to express the desire for or presence of ice cream." seems to be exactly a prescriptive definition. How can you tell that symbol means exactly that, apart from the Unicode codepoint name? Are people using the ice cream symbol in three independent, durably-archived sources? Besides all that, the first word in that definition is "Used", which is false if people are not actually using the symbol.
I'm not greatly aware of other Wiktionaries' practices and policies, but I've seen a few symbols in other Wiktionaries before. It's interesting that the Japanese and Swedish Wiktionaries have entries for these symbols, but why did they create these entries in the first place? Do they have some policy concerning symbols? We have a number of these entries too, so maybe they just copied us? I don't know their reasons, and I don't know if these Wiktionaries are striving to be descriptive rather than prescriptive. But if they created these symbols just because they exist in Unicode, and used the Unicode names as normal definitions, then those definitions are prescriptive. --Daniel Carrero (talk) 23:14, 16 April 2017 (UTC)
@Daniel Carrero, PseudoSkull: Would it be OK to write descriptive definitions for these emoji based upon non–durably-archived sources? — I.S.M.E.T.A. 20:24, 17 April 2017 (UTC)
I would support implementing new rules to allow getting citations from some (not all) places on the internet, for the purpose of attesting words and symbols alike.
For details, read this long discussion: Wiktionary:Beer parlour/2016/October#Why we don't need durable citations (especially this subsection of the same discussion: Wiktionary:Beer parlour/2016/October#Proposed CFI change).
This is not how CFI currently works. But one thing I consider important if we do this is the ability to verify and disallow these citations when they become inactive -- that is, according to this proposal, if you get three citations for a word or symbol from websites or from the Web Archive, and then these pages are deleted, then we will need to get three new citations or delete the sense. This is to ensure we can verify how the symbol is actually used. --Daniel Carrero (talk) 23:48, 17 April 2017 (UTC)
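The rule proposed in the comment above (a sense stays verified only while at least three of its web citations are still online) can be sketched like this. It is a hypothetical illustration, not an existing Wiktionary tool: the `Citation` class, the `still_online` flag and the function name are all assumptions, and how a page's availability would actually be checked is left open.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Citation:
    url: str
    still_online: bool  # assumed to be updated by some periodic check


REQUIRED_CITATIONS = 3  # the three-citation threshold mentioned in the discussion


def sense_is_verified(citations: list[Citation]) -> bool:
    """Return True while enough citations remain online to support the sense."""
    live = [c for c in citations if c.still_online]
    return len(live) >= REQUIRED_CITATIONS
```

Under this rule a sense can lose its verified status purely because an external page disappears, which is the dependency on external sites that later comments in this thread raise as a concern.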
@Daniel Carrero: Does the Web Archive not preserve things indefinitely? — I.S.M.E.T.A. 02:06, 18 April 2017 (UTC)
A few past discussions about using Web Archive to get citations:
Also, this vote (it ended with 7 support and 7 oppose, traditionally known as no consensus):
Apparently, owners of the domains can choose to delete stuff from the Web Archive, (not to mention that domain owners often change) and that's why it's not reliable as a durably-archived source. --Daniel Carrero (talk) 02:25, 18 April 2017 (UTC)
@Daniel Carrero: Bugger. How about storing our own screenshots, then? I was thinking something like what Wikisource does to source some of its texts (for example, the scan at s:Page:Keats, poems published in 1820 (Robertson, 1909).djvu/141, used to source the beginning of Keats’ “Ode on a Grecian Urn”). — I.S.M.E.T.A. 23:46, 21 April 2017 (UTC)
What you said right now is one of the multiple ideas that were discussed in Wiktionary:Beer parlour/2016/October#Why we don't need durable citations.
Maybe that could work. If other people want to do it, I might want to support it too.
But one serious flaw of that idea is that these screenshots can be easily fabricated, so they don't serve as extremely reliable proof that the word was once used on the internet, especially if the original webpage gets deleted.
I support getting citations from the internet for words and symbols one way or another. If we are going to do it, the idea I like the most is still the one that I proposed in the discussion "Why we don't need durable citations" (accepting citations from the internet as long as they still exist on the internet). --Daniel Carrero (talk) 00:40, 22 April 2017 (UTC)
(Image: Fabricated screenshot.png)
@Daniel Carrero: I am uneasy about the prospect of an increasing proportion of our content being perpetually contingent upon external sites' longevity. In what way can screenshots be easily fabricated? More easily than the pages of a PDF of an old book? — I.S.M.E.T.A. 12:20, 25 April 2017 (UTC)
Check your last message in this image. --Daniel Carrero (talk) 12:33, 25 April 2017 (UTC)
@Daniel Carrero: You added an extra space between that final full stop and the hair-space+em-dash of my signature, so that fabricated screenshot lacks strict verisimilitude! Still, I take your point. :-(  — I.S.M.E.T.A. 12:45, 25 April 2017 (UTC)
LOL. For the record, this is how I fabricated the screenshot: I clicked "Inspect element" on my Firefox and edited the HTML source of the page. Then I took the screenshot. --Daniel Carrero (talk) 12:50, 25 April 2017 (UTC)
@Daniel Carrero: I've seen that done with Twitter DMs, so it shouldn't've surprised me to see it done here. — I.S.M.E.T.A. 12:52, 25 April 2017 (UTC)

Rajasekhar1961 has asked me about the format of as a Telugu abbreviation. We used to use Abbreviation as a header for these things, but now that header is not allowed. As I understand it, the header has to be Noun, Adjective, Verb, and so on, and then {{abbreviation of}} is used in the definition line to link to ఉత్పలమాల and ఉత్తరము. Then it becomes more complicated. What does Rajasekhar1961 write in ఉత్పలమాల and ఉత్తరము to define the abbreviation , and how does he link it back to ? Or does he link it to at all? WT:EL is difficult enough for me to understand, and I think it would be much harder for Rajasekhar1961, whose English is en-2. —Stephen (Talk) 07:37, 8 April 2017 (UTC)

Since both ఉత్పలమాల and ఉత్తరము are nouns, I would give its POS as noun. I would list under Synonyms at each of the full entries, just as ave. is listed as a synonym of avenue and St. is listed as a synonym of Saint. —Aɴɢʀ (talk) 07:44, 8 April 2017 (UTC)

Checkbox for translation editor gadget

In the preferences under gadgets, there is one labelled "Disable the buttons that allow editing of translation tables". It's not clear, but I'm guessing that checking this will disable the translation editor. If that is the case, then I'd suggest changing it so that the editor is enabled when the box is checked, which is more intuitive. Of course it has to be checked by default then, rather than unchecked. Also, the label could be clearer too, what "buttons" is it referring to? It could just say "Enable the translation editor".

I just noticed that the button for the rhymes editor also has the sense inverted: the editor is enabled when the box is unchecked. This also ought to be reversed. —CodeCat 18:56, 8 April 2017 (UTC)
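The inversion being discussed can be shown with a small Python sketch. The preference keys are hypothetical, and the "current" behaviour reflects the guess made in the comment above rather than the gadget's actual code.

```python
def editor_enabled_current(prefs: dict) -> bool:
    """Behaviour as guessed above: the editor runs unless the 'disable' box is checked."""
    # Hypothetical key; unchecked by default, so the editor is on by default.
    return not prefs.get("disable_translation_editor", False)


def editor_enabled_proposed(prefs: dict) -> bool:
    """Proposed behaviour: the editor runs when the 'enable' box is checked."""
    # Hypothetical key; checked by default, so the editor is still on by default.
    return prefs.get("enable_translation_editor", True)
```

Both versions give the same default (editor on for users who never touched the setting); the proposal only changes which state of the checkbox means "on".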

"Alternative forms" of given names and surnames

I find it EXTREMELY disturbing that we list entire names of people as "alternative forms" of other names. First of all, these are people's NAMES we're talking about here, things that they have lived being called all their lives. Imagine if you looked up your own name or surname here, just to see it listed as "(just an) alternative form of ______ (and it has no other importance than that it is just an alternative)". I feel this could be extremely offensive to some people.

Second of all, they're not actually "alternative forms". If someone's name is Jasmine and you accidentally spell their name like Jazmine on a formal document of some sort, that person will point it out to you and tell you to fix it, because that is not how you spell her name. It's not like if Jasmine goes to a different country with a different dialect, her name magically changes to "Jazmine". It's also not like if in different situations her name is spelled "Jazmine" rather than "Jasmine". No. Her name is ALWAYS Jasmine, unless she literally changes it in court or tells other people "Oh I don't like the original spelling of Jasmine, so just spell my name Jazmine please. That's my nickname." umm.... But the latter would never happen...

I propose to make rules here against listing names as alternative forms of other names. Same goes for surnames. They really aren't alternative forms. PseudoSkull (talk) 19:09, 9 April 2017 (UTC)

Additionally, I think that rather than listing similar names in "Alternative forms", we should put them in "Related terms" or "See also" instead. PseudoSkull (talk) 19:10, 9 April 2017 (UTC)
We're a dictionary; we're interested in the lexicographic properties of words, including names. It's not our job to avoid offending people. My own first name is a relatively rare alternative spelling of a fairly common first name, and if I look my first name up in a dictionary, that's exactly the information I expect to see. It doesn't offend me at all. —Aɴɢʀ (talk) 20:51, 9 April 2017 (UTC)
Try telling that to the Japanese who can't come to terms with shinjitai changing the shape of the kanji in their last names. suzukaze (tc) 20:57, 9 April 2017 (UTC)
I don’t think it is that serious of an issue, but there is room for improvement; different forms/spellings of a name have different semantic implications from alternative forms/spellings of regular words (e.g. “I’ll analyse it” and “I’ll analyze it” mean the same thing, but “Jon will do it” implies a different person than “John will do it”).
My proposal is this: instead of treating variants of a name as completely different forms, we add parameters like variantof= and variantq= to {{given name}} such that {{given name|male|variantof=John|variantq=spelling}} displays its current text in templatised form. This would formalize the already common practice of using “variant of” instead of “alternative form/spelling”. — Ungoliant (falai) 21:11, 9 April 2017 (UTC)
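A rough Python sketch of how such parameters might be turned into display text is given below. The real template would be written in wikitext or Lua, and the exact output wording here is an assumption; only the parameter names `variantof` and `variantq` come from the proposal.

```python
def render_given_name(gender, variantof=None, variantq=None):
    """Build a definition line for a given name, optionally as a variant of another."""
    text = f"A {gender} given name"
    if variantof:
        # variantq qualifies the kind of variant, e.g. "spelling".
        qualifier = f"{variantq} " if variantq else ""
        text += f", {qualifier}variant of {variantof}"
    return text + "."
```

For example, `render_given_name("male", variantof="John", variantq="spelling")` produces "A male given name, spelling variant of John." under these assumptions.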
I'm not sure if this will work. It's not really possible to point out which is a variant of another; John could equally be a variant of Jon. Our current definition of Jon doesn't make much sense to me, because it defines a single name twice, as if it's two different names when it's the same word, given as a name to people, in both cases. Really, etymology is what should give this kind of information. —CodeCat 22:48, 9 April 2017 (UTC)
Presumably the most common variant (or the first created when you can’t point to a single most common variant) acts as the hub, like we do with all other groups of alternative forms. — Ungoliant (falai) 22:53, 9 April 2017 (UTC)
True, but I agree with the OP that speaking of alternative forms with names is weird. They're not interchangeable, as has been pointed out. What criteria are there for treating them as alternatives of each other anyway? What makes them in any sense "the same"? —CodeCat 22:55, 9 April 2017 (UTC)
Becki is "the same as" Becky because it was directly derived from it by applying some trendy/tacky/whatever spelling rule. The names were not devised independently. How is this different from sulphur/sulfur? Equinox 23:02, 9 April 2017 (UTC)
Then the relationship is etymological. Is there any synchronic relationship? —CodeCat 23:33, 9 April 2017 (UTC)
I strongly oppose changing things because someone might be offended. Just document facts. Equinox 22:27, 9 April 2017 (UTC)
Oppose. People can be potentially offended by just about anything, and our goal is not to offend as few people as possible, it's to document language. It might be useful for someone to know that a certain spelling of a name isn't the traditional/most common/standard one. Andrew Sheedy (talk) 22:46, 9 April 2017 (UTC)
I deleted Support and Oppose because people have introduced new ideas of how to better deal with etymologically similar names. I think the discussion should continue before starting a real vote. PseudoSkull (talk) 23:11, 9 April 2017 (UTC)
As CodeCat and others have said, these are not "alternative forms", as 'Becki' refers to a different person than 'Becky'. One being derived from another is ===Etymology===. "Jon" and "John" is a particularly informative example for someone to have brought up, above, since neither one is even etymologically derived from the other! - -sche (discuss) 00:56, 10 April 2017 (UTC)
Hmm. My ex Rebecca originally used Becky, then changed to Becki to be unique in a class with several "Beckies", then switched back to Becky to get a job and not sound like a stripper. Equinox 01:02, 10 April 2017 (UTC)
Sometimes Becki and Becky refer to the same person. Sometimes Becky and Becky refer to different people. If we were to split them up by who they refer to, we'd have an entry for every single person on earth. So I agree with Andrew Sheedy and Equinox. But I'll also mention that usually Jon and John are not alternative forms of each other, but rather Jon is usually an alternative form of Jonathan. --WikiTiki89 13:45, 10 April 2017 (UTC)
"Sometimes Becki and Becky refer to the same person." Not really. Equinox's example is just some one person's personal action, which I don't even see as a common action. As for diminutives, that's also personal preference (but still in the entry it should be mentioned somewhere that that name is a diminutive of another name), but they do tend to refer to the same person as the full form. For instance, I insist that people do not call me "Maddy" as people will often jump to do, but instead to call me by my real name, Madison. And also, if I did want to be called Maddy, and someone spelled it "Maddie" when referring to me, I'd correct them. And I certainly wouldn't keep changing it from "Maddy" to "Maddie" and vice versa. Also, need I mention that there is a similar name "Mattison" which would refer to a completely different person? PseudoSkull (talk) 17:14, 10 April 2017 (UTC)
Yes, and? My point was that it doesn't matter if it refers to the same person or not. Alternative form doesn't mean interchangeable. Maddie is an alternative form of Maddy, both of which are diminutives of Madison. That doesn't mean that anyone named Madison can be called Maddy and Maddie, but that it is common for people named Madison to be called Maddy or Maddie. --WikiTiki89 17:23, 10 April 2017 (UTC)
I also knew someone who was called Daniel by some people and Dan by others (without any explicit change from one to the other; they were used simultaneously). Equinox 21:46, 13 April 2017 (UTC)
That's actually very common. --WikiTiki89 21:59, 13 April 2017 (UTC)
But that's a nickname. We're talking about things like Jazmine being an "alternative form of Jasmine" when it's not. You either call Jasmine Jasmine or you call Jazmine Jazmine. They're two different people in all but VERY few cases of personal preference. They're not alternative forms. Rather, it should say in the etymology that it is a variant rather than in the definition. The definition should say "This is a female given name" rather than "This is an alternative form." PseudoSkull (talk) 21:42, 16 April 2017 (UTC)
It's true that calling a person named Traci "Tracy" wouldn't be an example of an alternative form, but that isn't the only way usage varies for names. If the Smiths name their daughter "Traci" and the Joneses name their daughter "Tracy", they're using different forms of the same name. The fact that "Traci" will always be "Traci" and "Tracy" will always be "Tracy" doesn't change that. Getting rid of "alternative forms" means we miss out on obvious relationships between names like Geoffrey/Jeffrey/Jefferey, or Gillian/Jillian/Jill. Besides, your logic would require that we get rid of translations for names, too, since William isn't Guillermo, and Jacques isn't Jack or Jacob. Chuck Entz (talk) 02:14, 17 April 2017 (UTC)
What is the practical alternative to listing some names as variants of others? — I.S.M.E.T.A. 23:47, 13 April 2017 (UTC)
@I'm so meta even this acronym What Chuck Entz said directly above, in addition to the following anecdotal evidence, which indicates that people don't see different variants of a name as different names: I was at an event a few weeks ago where there were two guys named Andrés, one of whom was called Andrew by his brother. A couple people remarked how there were three Andrews, even though we didn't technically have identical names. I've also heard exchanges like "What's your name?" "Kathryn" "Do you spell that the normal way, or some weird way?" If "Kathryn" and "Catherine" were truly different names, then one wouldn't hear exchanges like that, which indicate that most people think of them as different versions of the same name. It's also worth noting that if Jesse meets Jessie, they might say "Hey, we have the same name" even though they spell it differently. Andrew Sheedy (talk) 01:18, 18 April 2017 (UTC)
@Andrew Sheedy: Yes, I agree with you and Chuck. I was just wondering whether it's in any way feasible not to treat some names as variants of others. — I.S.M.E.T.A. 02:05, 18 April 2017 (UTC)
Whoops, my brain must have substituted some word in the place of "alternative"... Evidently I need to read more carefully. Andrew Sheedy (talk) 02:14, 18 April 2017 (UTC)


Two years later (see older BP post: Wiktionary:Beer_parlour/2015/March#Wordset) they have now officially closed shop, due to “lack of interest”. A bit sad, but the good news is that there's now a data dump available on github, licensed under CC BY-SA 4.0. I'm especially interested in their example sentences, which they tried to make gender neutral (something I think we should also adopt). Are there any objections against (semi-)automatically importing some of this data? – Jberkel (talk) 22:59, 9 April 2017 (UTC)

I'm actually somewhat unsure of the role of usexes here in general, particularly for English, where actual citations are pretty much always better. —Μετάknowledgediscuss/deeds 23:13, 9 April 2017 (UTC)
I'd say they serve two different purposes. Citations are useful as a reference but often a bit long-winded, embellished or written in dated/archaic language or spelling (esp. when citing older works). Usexes in contrast are simple, to the point, written in contemporary language and therefore easy to understand (think of non-native readers). Citations often don't make sense unless a lot of additional context is added. A usex stands on its own, or should be constructed in such a way. It can be scanned quickly and also helps to clarify the sense. – Jberkel (talk) 22:07, 10 April 2017 (UTC)
That would be my assessment of them as well. The French Wiktionary uses many of each to illustrate each sense, and as a non-native speaker, I tend to find the usexes more useful since they are constructed for the specific purpose of illustrating the definition. We would be well off to add far more of each, IMO. Andrew Sheedy (talk) 03:09, 12 April 2017 (UTC)
Even if English Wiktionary already has enough examples, other Wiktionaries do not have them for their English entries. At French Wiktionary, at least, we would be glad to have some of those! Noé 11:34, 12 April 2017 (UTC)
IMO we often need more usexes rather than fewer, especially in L3 sections with more than one definition, and most especially for grammaticized words. I am not at all confident that usexes from a dictionary that did not have exactly our set of definitions would provide useful usexes without a great deal of manual effort to match the usexes with our definitions. Of course, we can have the same problem with citations for such L3 sections. DCDuring (talk) 12:13, 12 April 2017 (UTC)

Removing inactive editors from Category:User coders, Category:User languages, and Category:User scripts

The bottom-level categories within Category:User coders, Category:User languages, and Category:User scripts are populated in no small part by the user pages of very many users who are now inactive or who have never been active. The first sentence of Wiktionary:Babel reads: “User language templates aid multilingual communication by making it easier to contact someone who speaks a certain language.” Since it would be pointless to contact someone who, it might be assumed, would never read a message of contact, having those inactive users in those user-proficiency categories undermines those categories’ purpose. Accordingly, I propose that a bot be run, tasked with adding a |nocat=1 or |inactive=yes parameter to the {{Babel}} transclusion of every user who has not edited this project within the preceding year (past 365–366 days) of a given bot run. This parameter may function either to remove a user from his user-proficiency categories or to move that user to different categories (marked “inactive” in some way).
Does that seem desirable to everyone? Is the task automatable, as I’d hoped? Does anyone have a better idea? Pinging KIeio and Awesomemeeos because of their contribution to the relevant discussion at User talk:KIeio#Babel. — I.S.M.E.T.A. 23:58, 9 April 2017 (UTC)
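As a rough illustration of the bot task proposed above, here is a minimal Python sketch. It assumes the `|nocat=1` variant of the parameter, a two-year threshold (the thread discusses one to two years), and that the last-edit timestamp has already been fetched by other means. The names `is_inactive` and `update_babel` are hypothetical, and whether {{Babel}} would honour `nocat` depends on the template change under discussion. The sketch only handles a simple {{Babel}} call with no nested templates.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed threshold; the discussion considers one to two years.
INACTIVITY_THRESHOLD = timedelta(days=2 * 365)

# A simple {{Babel|...}} transclusion with no nested templates inside it.
BABEL = re.compile(r"\{\{\s*[Bb]abel\s*((?:\|[^{}|]*)*)\}\}")


def is_inactive(last_edit, now, threshold=INACTIVITY_THRESHOLD):
    """Return True if the user's last edit is older than the threshold."""
    return now - last_edit > threshold


def update_babel(wikitext, inactive):
    """Add |nocat=1 for inactive users; remove it for active ones."""
    def rewrite(match):
        # Drop any existing nocat parameter, keep everything else in order.
        params = [
            p for p in match.group(1).split("|")[1:]
            if p.split("=", 1)[0].strip() != "nocat"
        ]
        if inactive:
            params.append("nocat=1")
        return "{{Babel" + "".join("|" + p for p in params) + "}}"

    # Only the first transclusion on the page is touched.
    return BABEL.sub(rewrite, wikitext, count=1)


if __name__ == "__main__":
    now = datetime(2017, 4, 16, tzinfo=timezone.utc)
    last_edit = datetime(2014, 6, 1, tzinfo=timezone.utc)
    print(update_babel("{{Babel|en|fr-2}}", is_inactive(last_edit, now)))
```

Because the same function also strips the parameter when `inactive` is false, a later bot run could restore a user who has become active again, which is the behaviour requested further down the thread.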

@I'm so meta even this acronym I think that's a good idea! I like what bots can do. Let's do this! — AWESOME meeos * ([nʲɪ‿bʲɪ.spɐˈko.ɪtʲ]) 00:07, 10 April 2017 (UTC)
I think one year is too short a period. Otherwise yes. Equinox 00:10, 10 April 2017 (UTC)
As in the earlier discussion, I support the idea, probably something like 1-2 years inactivity as a threshold. — Kleio (t · c) 22:46, 10 April 2017 (UTC)
support Crom daba (talk) 23:36, 10 April 2017 (UTC)
This is a good idea. In some respects, 1 year does seem a bit short, but then again, when I go through the categories I usually ping only people who've been active in the last month or two! So I guess I am OK with a limit of 1 year, especially if the same bot that checks if users have been inactive and need to be removed from the category (nocat=1-ified) will also check if users have become active and need to be restored to the category. 2 years (mentioned above) would also be acceptable to me; it would leave a lot of inactive users but still winnow the categories somewhat. - -sche (discuss) 23:39, 10 April 2017 (UTC)
@Awesomemeeos, Crom daba, Equinox, KIeio, -sche: OK; how about two years? Re “check[ing] if users have become active and need to be restored to the category”, yes, I would propose that that be part of the bot’s duties (although, couldn’t a newly “reactivated” user just remove the |nocat= or |inactive= parameter from the {{Babel}} transclusion?). — I.S.M.E.T.A. 01:43, 14 April 2017 (UTC)
@Awesomemeeos, Crom daba, Equinox, KIeio, -sche: Shall I write the vote? — I.S.M.E.T.A. 23:58, 15 April 2017 (UTC)
Yes, of course — AWESOME meeos * ([nʲɪ‿bʲɪ.spɐˈko.ɪtʲ]) 00:04, 16 April 2017 (UTC)
Yeah. I think two years is reasonable. Equinox 00:44, 16 April 2017 (UTC)
@Awesomemeeos, Crom daba, Equinox, KIeio, -sche: I've created the vote; see Wiktionary:Votes/pl-2017-04/Removing inactive editors from user-proficiency categories. — I.S.M.E.T.A. 01:14, 16 April 2017 (UTC)
(By the way, I don't think it's a better idea, but if there are any technical barriers to the parameter, then a different idea might just be to have the bot comment out the template.) Equinox 01:24, 16 April 2017 (UTC)
@Equinox: Yes, that would work; however, I went for the option that was minimally disruptive whilst still salvaging the value of the user-proficiency categories. — I.S.M.E.T.A. 01:29, 16 April 2017 (UTC)

Category:Braj language

Shouldn't this be merged into Hindi? —Aryamanarora (मुझसे बात करो) 01:00, 10 April 2017 (UTC)

Yes, I think it probably should. For future reference, this kind of thing usually goes at WT:RFM. @-sche —Μετάknowledgediscuss/deeds 02:34, 10 April 2017 (UTC)


Am a bit concerned about Special:Contributions/Romanophile, with a definite agenda to take every man phrase and create a woman one. It's like when PaM started taking every church phrase and creating a mosque version (many were deleted). I don't feel great about politics (even if they are nice "progressive" politics) dictating what entries we have. In particular I feel we need to gloss unusual things like "oh my goddess!" and "woman the lifeboats!" as non-standard, simply to help anyone trying to learn English, rather than creating them without comment. Am I being reasonable? Equinox 02:16, 10 April 2017 (UTC)

We really need to have a written-down policy for usage examples, so that when we revert edits to usage examples it won’t look like we’re targeting their politics. — Ungoliant (falai) 02:18, 10 April 2017 (UTC)
See also [6], which isn't necessarily a bad edit but did raise my eyebrows. Certainly we don't have to use women/pretty, men/athletic in a usex, and it's better to avoid those stereotypes, but changing "men and women" to "women and men" is breaking idiom. Equinox 02:45, 10 April 2017 (UTC)
Actually, "women and men" is fine, I believe. It is citable. --Daniel Carrero (talk) 03:42, 10 April 2017 (UTC)
It's citable but it's not the traditional ordering of the two words. —suzukaze (tc) 03:44, 10 April 2017 (UTC)
To me, this looks like a reason to use "women and men" more — because it's not been used often enough, in comparison with the opposite order of words. To shake things up, and have some variation. --Daniel Carrero (talk) 06:27, 10 April 2017 (UTC)
Our goal is for our usexes to be as natural as possible, not to write what we think people should say, or with the intent of "shaking things up." Andrew Sheedy (talk) 06:49, 10 April 2017 (UTC)
In this case, "women and men" fits because it says right after "but more frequently by women". However, in most cases "men and women" is more natural. See here and here for a percentagewise comparison. If you want things to change, go change them in the real world and eventually the dictionaries will reflect that. But it's not the job of a dictionary to promote any sort of social change, nor to oppose it. This is what being descriptive rather than prescriptive is all about. --WikiTiki89 14:22, 10 April 2017 (UTC)
One could argue (and many do) that there is no way to abstain from that debate, unless one refrains from usage examples which have any gender specificity. If you choose to write "men and women" you are reinforcing historical norms, if you choose "women and men" you are subverting them. Even the androgynous "people" can be considered "progressive." Regarding the question of Romanophile's changes, I am ambivalent to the ones I have read, if they wish to change the ordering of genders in a usex that is no skin off my nose. If there are changes like "firemen" to "firehumans" then I would take issue with that. - TheDaveRoss 14:49, 10 April 2017 (UTC)
@TheDaveRoss: I’d say that the entries that I added were slightly inaccurate at worst. In which case, they could be redefined in a way to make them hyponyms. If the definitions seem inadequate it’s probably because I felt overconfident from the results that I skimmed on Books, not because I wanted to enforce a ‘political agenda’ as the OP’s laughworthy and overly worried conclusion suggests. — (((Romanophile))) (contributions) 16:54, 10 April 2017 (UTC)
Look at the numbers I linked to in my previous post. "Men and women" is historically 99% and since the late 1960s has dropped to about 85%. We don't have to "reinforce historical norms", we just have to follow the current norms. And 85% is still a strong norm, especially if you assume that some of those 15% are context-specific, which shows that the hard rule that "men and women" is the only correct order has disappeared, but it is still the unmarked order. --WikiTiki89 14:55, 10 April 2017 (UTC)
Well, an 85% norm would also suggest that we should use exclusively heterosexual couples in example sentences, and probably avoid any interracial couples. If 20% is a strong norm as well then we shouldn't refer to African-Americans with college degrees. That is neither here nor there, since "strength" of the norm has nothing to do with my point. - TheDaveRoss 16:26, 10 April 2017 (UTC)
I would add an image of a gay couple at couple. It already has quotes, but it lacks an image at the moment. --Daniel Carrero (talk) 21:12, 10 April 2017 (UTC)
Do we also get to change ladies and gentlemen to gentlemen and ladies in order to shake things up? No, it’s only changing things in a way that looks like (to a Tumblrite maybe) it “benefits women” that is politically correct (but if anyone ever finds a woman who actually gains anything from us changing “men and women” to “women and men”, let me know). — Ungoliant (falai) 15:23, 10 April 2017 (UTC)
I don't see any suggestions that we should enforce changes to a "progressive" style, the question at the top was about whether it was allowable to use a less-common construction in usage examples. - TheDaveRoss 16:26, 10 April 2017 (UTC)
ladies and gentlemen is an idiom; men and women is not, otherwise we would have that entry. (Should we have that entry?) Either order ("men and women" or "women and men") is fine and natural. --Daniel Carrero (talk) 21:11, 10 April 2017 (UTC)
Both ladies and gentlemen and men and women obey Behaghel's law of increasing terms: the shorter term (the one with fewer syllables) comes before the longer term. This is why we say salt and pepper, not *pepper and salt, and why English speakers say bow and arrow but German speakers say Pfeil und Bogen (lit. "arrow and bow"). —Aɴɢʀ (talk) 11:44, 11 April 2017 (UTC)
I had never seen those before, but I like that the second "law" is effectively "bury the lede." - TheDaveRoss 11:52, 11 April 2017 (UTC)
  • My two cents: Changing a shorter phrase like she is pretty (adjective) to a longer phrase like she is the best athlete (multi‐word predicate) seems petty and counterproductive to me. Usage examples (and quotations) should be as short and simple as possible. (As a personal rule, I never truncate a full sentence, though.) Add only what is required to exemplify exactly what you want to show and do it in a way that does not distract from the item meant to be exemplified by it. To me that means they should always be as descriptive and close to natural language as possible, and that means avoiding crassly infrequent forms unless the example is meant to show exactly this infrequent form of speech. If users can recognise that your choice of words was a choice, i.e. is a conscious decision against a habit or is at the very least a specifically trained habit of yours, that's your indicator that it's probably not befitting a plain neutral dictionary example. If I can recognise that the author had an agenda, Sachliteratur (educational non‐fiction) loses my trust. P.S.: I'm no native speaker, but are these terms even idiomatic? My Google results for womaned are Urban Dictionary, us, and discussions of whether or not it's acceptable in Scrabble. Korn [kʰũːɘ̃n] (talk) 22:44, 11 April 2017 (UTC)
  • Regarding the topic this has strayed into: as someone once put it, if a man finds it jarring to open a book and see (for example) "Any citizen who wishes to change her registration should ..." (=gender-neutral she) — if it seems like an "agenda" to him — he should try to imagine how women feel opening every book that says "Any citizen who wishes to change his registration should ..." ("gender-neutral" he). Etc, etc. But to return to the initial topic... creating entries for attested "woman" and other terms seems fine, though we'll obviously have to figure out what context labels apply — some need no label ("neo-fascistic"), some are "rare", "nonstandard", etc. - -sche (discuss) 02:45, 12 April 2017 (UTC)
Eh, I think it has more to do with what's most common. Any women or girls I've spoken to about this sort of thing (admittedly there aren't very many) find it odd when they hear someone default to "she" when the gender of someone is unknown. I know "he" sticks out to me slightly (unless it's an older text, since I expect to see it there), since it's not commonly used by my generation, and I'm more used to a gender-neutral "they". Andrew Sheedy (talk) 03:03, 12 April 2017 (UTC)
Literally everything offends somebody. I don't like the default "he" and will try to gender-neutralise things where I can. But I was taught that two wrongs don't make a right, and wouldn't swap one bias for the opposite bias. I know that's terribly unfashionable now. Equinox 03:13, 12 April 2017 (UTC)
I agree. Otherwise we're going to need a men's rights movement in a hundred years or so. ;) More seriously, though: as concerns Wiktionary, I think we should try not to offend within reason, but not be fanatics about it. Let's not offend people's intelligence by including obvious attempts to be all-inclusive, when it doesn't reflect the language as it is used. It isn't the job of a dictionary to promote progress.... (Besides, some potentially offensive usexes have made my day, so that makes them worth keeping, right?) Andrew Sheedy (talk) 04:38, 12 April 2017 (UTC)
@Andrew Sheedy: “Suicide is inimical to the health of the participant.” — That is lovely. A smile was cracked. — I.S.M.E.T.A. 00:04, 14 April 2017 (UTC)
Yeah, I don't see anything wrong with that usex except possibly the last word — does English normally describe a single person who kills themself as a "participant" in suicide? - -sche (discuss) 17:50, 15 April 2017 (UTC)
@-sche: You're right — I'd much prefer practitioner. ;-)  — I.S.M.E.T.A. 19:36, 15 April 2017 (UTC)

It seems the only truly feasible way to resolve such conflicts is to replace the offending usage example with a real quotation. — I.S.M.E.T.A. 00:06, 14 April 2017 (UTC)

  • The news just now is on how the frequent use of career/math/science words with male names but art/family words with feminine names, and other tendencies of our (=humans') texts to accept and reproduce traditional biases, is teaching artificial intelligence to be biased, causing mistranslations of e.g. "o bir doktor". Obviously, Wiktionary's mission is not to save the world from bad AI (we do have a user with a username that looks suspiciously like an abbreviation of John Connor, but I won't blow his cover), but this helps quantify The Dave Ross's comment of 14:49, 10 April 2017 (UTC) about how reproducing traditional biases is not an automagically bias-free position. But I do think we can — and usually do — find a balance, like many above have said. :) - -sche (discuss) 17:57, 15 April 2017 (UTC)
@-sche: But o bir doktor (he is a doctor) is not a mistranslation, in the same way that o bir hemşire (she is a nurse) is not a mistranslation. Grammatically, one can substitute he, she, or it in either case without error. However, there is nothing objectionable or “problematic” (that weasel word…) about Google Translate’s default translations. And those “biases” are wholly justified, both statistically and etymologically: etymologically because of derivation from words undeniably of the corresponding grammatical and/or natural gender [Latin doctor m, Old French norrice f (wet nurse), Persian همشیره (sister)], statistically because most doctors are men and most nurses are women (no matter how “problematic” a fact that may be). It seems that Ian Johnston and writers like him are too ready to infer damnable prejudice. It is far better to maintain unconscious biases, potentially thereby preserving some subtle, unrecognised facet of a language, than it is to extirpate the lot in favour of linguistically misguided, politically motivated counter-biases. — I.S.M.E.T.A. 21:11, 15 April 2017 (UTC)
It's a misleadingly overspecific translation especially if the doctor in question is, for example, established as a woman in a previous sentence. The usual way of indicating a pronoun is not gender-specific, at least in the languages I work with, is "he/she" or "she/he" (though even this fails to capture cases where a pronoun could equally also apply to non-binary people — and some of the languages I refer to are spoken by peoples who traditionally recognize more than two genders, like the Ojibwe two-spirits who gave English that term). Who is proposing "extirpating" any swath of things except as a straw man? The usage examples I see discussed here include things like "she is pretty" → "she is the best athlete", where the second phrase is just as fluent as the first, but provides greater variety than the hackneyed "she is pretty". - -sche (discuss) 21:31, 15 April 2017 (UTC)
May partly be an artifact of Google's overly clever "statistical" translations that don't always attempt to understand the context/flow of a text, if they can just compare it with billions of similar snippets. Equinox 21:40, 15 April 2017 (UTC)
@-sche: Sure, there's no problem with something like an isolated instance of substituting she is pretty with she is the best athlete, but I'd like to pre-empt an implicit imprimatur to go about sanitising all our “problematic” example sentences. I mean, great, Ojibwe has non-binary pronouns (or however it expresses that), but English standardly only has he and she — should we maintain that idiomatic binary, or roll out usage like the pronouns ze, zim, and zir (either when translating Ojibwe usexes or just generally)? Like Equinox wrote, “literally everything offends somebody”; I'm sure steak and juicy commonly collocate — is that fact objectionable? — I.S.M.E.T.A. 22:53, 15 April 2017 (UTC)


I just realised that I mistakenly classified capitalist as a ‘synonym’ of Keynesian. Is there a reason why we have template:synonyms but not template:hypernyms? Because if not, I’ll just make the template myself. — (((Romanophile))) (contributions) 02:53, 10 April 2017 (UTC)

No, go ahead. DTLHS (talk) 03:44, 10 April 2017 (UTC)
Don't forget the documentation page. The template is currently uncategorised. —CodeCat 14:27, 10 April 2017 (UTC)

customizable lists of terms

I think the easiest and most far-reaching implementation would be a user-friendly interface page where different "linguistic" parameters can be selected to create customized categories, instead of the pre-arranged ones that we have now. --Backinstadiums (talk) 07:13, 11 April 2017 (UTC)

You don't have to use the pre-arranged categories. Just create some new category like Category:English (something) words. But if it's a good category that works for many languages, it should be implemented into the "pre-arranged" system eventually. --Daniel Carrero (talk) 07:28, 11 April 2017 (UTC)
@Daniel Carrero: No, I am afraid that static approach is no longer useful. I meant for users to select certain terms according to the searchable info. appearing in their respective entries. For example, listing "Arabic adjectives with a certain pattern which form their plural with the pattern فعْلة". Currently, such a dynamic approach, narrowing down or intersecting linguistic features, is not available, yet a modern online dictionary might very well offer it. --Backinstadiums (talk) 07:58, 11 April 2017 (UTC)
Oh, I understand. You mentioned "intersecting linguistic features". I would definitely support having some way to intersect categories. I'm pretty sure there was an external website where you could search and intersect Wiktionary categories.
That aside, I don't speak Arabic and I don't understand anything about "the pattern فعْلة". But I would seriously support having a category like Category:English nouns with -es plurals, so maybe other people could consider creating something like Category:Arabic adjectives with فعْلة plurals too, if that makes sense. --Daniel Carrero (talk) 18:22, 11 April 2017 (UTC)
@Daniel Carrero, Backinstadiums You can intersect categories by writing in the search bar, for instance, incategory:"English 2-syllable words" incategory:"English adjectives". It doesn't show like a regular category, but it does definitely help. Julien Daux (talk) 19:45, 11 April 2017 (UTC)
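The search-bar trick described above can be generated for any list of categories. The following is a small Python sketch that builds the `incategory:` query string and a matching search URL; the helper names and the choice of `https://en.wiktionary.org/w/index.php` as the base URL are assumptions made for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode


def intersect_query(categories):
    """Build a search string that requires membership in every category."""
    return " ".join(f'incategory:"{name}"' for name in categories)


def search_url(categories, base="https://en.wiktionary.org/w/index.php"):
    """Return a search URL for the intersection (nothing is fetched here)."""
    return base + "?" + urlencode({"search": intersect_query(categories)})


if __name__ == "__main__":
    wanted = ["English 2-syllable words", "English adjectives"]
    print(intersect_query(wanted))
    print(search_url(wanted))
```

Adding a third category name to the list narrows the results further, which is close to the "tick the features you want" page suggested in this thread, only without the interface.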

@Daniel Carrero, Julien Daux Categories should be replaced by adding such info. in the template of a term entry, enabling any user to choose the specific features of the list of terms they want to arrange. Instead of those lines of code, it would enrich the dictionary to create a new page where all the different available options (linguistic/lexicographic features actually) can be 'ticked' before hitting the 'search' button. Should this be a formal proposal? How to proceed then? --Backinstadiums (talk) 07:45, 12 April 2017 (UTC)

French Wiktionary monthly news - Actualités


I am glad to inform you that the 24th issue of Wiktionary Actualités just came out! As usual, it is a short page about the project and lexicography in general. This time: a focus on a dictionary about toponyms and another focus on the jargon in use in European administrations. I am sure it is poorly translated into English, and after six months of regular translations I still doubt whether translating it is useful. So, you are welcome to read it but also to improve it! Let us know if it inspires any new ideas for your project. We are celebrating two years of regular publication, and we are quite proud of it, but at the same time, we are not sure of the popularity or quality of our collective writing. We have received very little feedback. The popularity of Wiktionary is changing, and there are more conferences about the project this year than ever, like a big event in a museum in Paris in February or a one-hour discussion about lexicography and Wiktionary this month in Lyon. Another one is planned for this Thursday, also in Lyon. In this picture, we do not know whether Actualités is playing a role or not. So, in preparation for next month's edition, an anniversary edition, please let us know your opinions and ideas for the future! Thank you. Noé 12:25, 11 April 2017 (UTC)

Persian conjugation tables

The currently used conjugation tables for Persian words generally do a great job listing the various forms and their transliterations. However, there are some major issues that need to be addressed. I will list them as separate points below:

  1. A large number of forms are in fact compound forms. Since these are regularly formed using auxiliary verbs, I don't think that they should be included in the table. It is as though Spanish verb tables included forms such as ha comido, había comido, está comiendo and va a comer. To my knowledge, it is not common practice on Wiktionary to include such forms in any language.
  2. The table includes an alleged "aorist" form, but there is no such form in modern Persian. It might have been described as such by some author, but it is definitely not standard.
  3. There is some redundancy in the naming of the forms "past (imperfect)" and "present (imperfect)". In my opinion, they should be termed simply as "past" and "present".
  4. I find it a little bit strange to have conjugation tables for colloquial forms, but this seems to be done in other contexts too, such as examples (e.g. گفتن#Verb).
  5. Maybe negated forms should be included, since the prefix used is not the same for all tenses.

Thoughts? Unless anybody opposes, I think the conjugation tables should be reworked to more accurately and concisely represent modern Persian verbs. HannesP (talk) 14:03, 11 April 2017 (UTC)

I haven't started learning Persian in earnest yet, so I can't address most of these issues myself. I do agree that across languages, we should avoid giving too much space to compound forms, but I think the presentation of separate colloquial tables is very valuable and should be kept. @Dick Laurent, ZxxZxxZ, Irfan, Vahagn Petrosyan —Μετάknowledgediscuss/deeds 16:55, 11 April 2017 (UTC)
Fixing typo above, and because none of these people seem likely to respond but Kolmiel is active, @Irman, Kolmiel —Μετάknowledgediscuss/deeds 23:52, 11 April 2017 (UTC)
While I don't know much at all about Persian, the points you make seem to be sensible, so I support. —CodeCat 23:54, 11 April 2017 (UTC)
My opinions:
1. It is common practice to include compound forms in several languages, but I agree that it is not a good thing. So, support.
Later additional comment: I would count the perfect among the non-compound tenses, though. It's written in one word.
2. The aorist form is definitely standard. First of all, our Persian entries cover all of modern Persian, which starts around the 8th century AD. But even in contemporary Persian this form is used, chiefly in poetic language. I don't know if the name "aorist" is standard. I suppose there could be a better name; particularly since the Greek aorist is chiefly a past form, while the Persian form is present.
3. I agree with calling the present "present", but not with "past". Persian has at least four past forms: "imperfect", durative, perfect, pluperfect. I think the "imperfect" could be called "preterite".
4. Colloquial conjugation tables are useful. The vernacular has different endings and different rules with stems that end in a vowel, etc. Once we get rid of the compound forms, it will also become less messy.
5. I'm not sure if negated forms would be useful. They deserve to be mentioned more than compound forms; agreed. But the prefix is generally na-. In Iran nami- now commonly becomes nemi-, but this is not part of classical Persian nor of Dari. And the thing is that adding these would make the table messy again when what we want is to make it clearer. Kolmiel (talk) 12:15, 12 April 2017 (UTC)
Later additional comment: It's true that the negative imperative is often ma- in older language, and with vowel-initial stems na- becomes nay-... So, maybe it does make sense to include negative forms. However, my preferred solution would be to give just one form (e.g. 1st p. sg.) as an example for each tense. The rest is mere repetition. Kolmiel (talk) 15:05, 12 April 2017 (UTC)
1. Neutral
2. Kolmiel is right
3. No comment
4. Those colloquial forms are very valuable; you can hardly find any source that mentions them, and this colloquial Persian (which is markedly different from standard Persian) is increasingly used, sometimes even in published written works (unfortunately).
5. I think it may be useful. In the passive, for example, the prefix is added before شدن šodan: گفته نشده است gofte našode ast. In classical Persian also it is not always added before می mi-: می نگوید (ha)mê nagôyad. And there is also م_ ma- used for certain forms. I think we should add these and make the table less messy by adding hide and show options.
5+1. An archaic form is missing: it is created by the suffix ی , I don't know what it is called.
--Z 13:07, 12 April 2017 (UTC)
Thanks for your input, Kolmiel and Z!
  • Concerning negative forms, I think you both present valid points. The negative is manifested in enough ways for it to make sense being included (ne-/na-, nay-, be- -> na- in subjunctive, classic forms) but it is true that it would clutter the tables. Just showing it for one person/number sounds like a fair solution, but I can't recall seeing it in other languages on wiktionary.
  • As for the perfect, I still think it qualifies as a compound form, since it is regularly formed with the participle and the enclitic copula. In that sense, it's pretty similar to the perfect in e.g. Italian. Furthermore, it's still written as two words in literary Iranian Persian (شده است etc).
    • Clarification: the last point specifically refers to third person singular.HannesP (talk) 19:13, 13 April 2017 (UTC)
  • How common is this "aorist" form in practice? And what's the correct term? My 400-page grammar doesn't even mention it, but that's not necessarily indicative of its actual use. Could you please provide examples of where, and in what situations it's used?
  • As Z brought up, there are further forms that aren't included in the current tables. The -i indicative is one, but there are others, such as the hami- indicative and the be- perfective. Where should we draw the line? Why include "aorist" but not "hami- indicative"? Even though Modern Persian spans a long period, I think the conjugation tables should mainly reflect contemporary use. Compare with English entries, which don't include forms such as "[thou] bringest", even though such forms have certainly been used in Modern English.
  • Regarding whether forms such as رفتم should be called "past" or "preterite", I don't really have a strong opinion. In my opinion, the fact that there are several past forms isn't a convincing argument against "past". In English, forms such as "went" are commonly called "past form" but that doesn't mean English doesn't have perfect or pluperfect. However, for clarity "preterite" is fine by me, but I don't know how established it is in the literature. My Swedish grammar uses "preteritum" to describe this form in Persian, though.
    • After giving it some more consideration, I do think that preterite is a better term than past. HannesP (talk) 21:43, 13 April 2017 (UTC)
--HannesP (talk) 19:12, 13 April 2017 (UTC)
@HannesP: You may find it profitable to survey the way in which Latin verb forms are presented in Latin entries’ conjugation tables. — I.S.M.E.T.A. 01:49, 14 April 2017 (UTC)

Read-only mode for 20 to 30 minutes on 19 April and 3 May

MediaWiki message delivery (talk) 17:33, 11 April 2017 (UTC)

Moving the translations of water

Water is in CAT:E again because it hits the limit on available memory for Lua. The unusually comprehensive translation table plays a role in that: the page makes it as far as Singpho, ~2,130 out of ~2,750 translations in, before collapsing. Perhaps, in the same way that we put undisplayably long words into special pages, the undisplayable translations of the first definition of water could be moved to a subpage or appendix (the translation-adder might have to be updated to work in the appendix namespace). It would be linked-to using the regular {{trans-see}} template. Obviously, in the distant long term when every page is as complete as water, we'll need to update our modules and templates to be more efficient, or ask the developers to let us use more memory, but for now this page is a special case — the page with the next-most translations has only a third as many, and the page after that has only a sixth as many. - -sche (discuss) 21:14, 11 April 2017 (UTC)

I'd ask why adding more links causes an out-of-memory error in the first place. Surely creating one link doesn't need less memory than creating a thousand? After each link, the used memory should be discarded, so it can be reused for the next. It seems like Scribunto has faulty memory management to me. —CodeCat 21:17, 11 April 2017 (UTC)
It is not running out of physical memory; it is hitting the 50 MB limit which is set for page construction (Lua memory usage: 50.21 MB/50 MB). We ought to remove all of the calls to {{t-simple}} and create flat wiki-markup links. Even the "simple" version makes calls to modules which have to do page loads and lookups; it is all terribly inefficient without a database. It would also be a good idea to move all translations to a sub-page and transclude them, so that clicking "edit" on the article would be less sluggish. - TheDaveRoss 21:33, 11 April 2017 (UTC)
There is no way that the final page is 50 MB, so something along the line is not releasing memory when it should. Expanding one template doesn't need 50 MB, so if the memory is freed afterwards, you can do the same 1000 more times. It's obviously possible to do this with 50 MB, so if it's not, then the software's memory management is faulty. —CodeCat 21:40, 11 April 2017 (UTC)
I am not sure how the memory works, but I doubt any memory that is used is later freed up; if that were true, the limit could never be reached, given how small our module pages are in bytes. None are anywhere near 50 megabytes. — Eru·tuon 21:54, 11 April 2017 (UTC)
Before I posted here, I considered the possibility of replacing the t-simples with bare wiki markup, but I realized it would have some insidious side effects, like if a code is later removed from the language modules (due to being subsumed as a dialect of something else), there will no longer be an error if someone fails to update [[water]] — and if they search for and replace all uses of the code, the search won't turn up water. - -sche (discuss) 21:41, 11 April 2017 (UTC)
I agree that there are a lot of upsides to having the translations templatized, however when we hit the technical limits of the platform we might have to make accommodations. - TheDaveRoss 21:46, 11 April 2017 (UTC)
We would also be losing the script tagging. DTLHS (talk) 21:47, 11 April 2017 (UTC)
Or the platform needs to be fixed. I suggest filing this as a phabricator issue. —CodeCat 21:48, 11 April 2017 (UTC)
I would sooner look to fixing the languages module than Mediawiki. - TheDaveRoss 21:52, 11 April 2017 (UTC)
Any suggestions? —CodeCat 21:55, 11 April 2017 (UTC)
Migrating all of the data to WikiData, however that is not possible quite yet. If we aren't going that route, creating an extension which does the language lookups from the MW database would also speed things up tremendously and reduce the amount of memory needed by the page creation. - TheDaveRoss 21:58, 11 April 2017 (UTC)
I think you still don't understand the problem. This link doesn't need 50 MB to be created, and it works just fine: test. Why does memory usage go up when I include more of such links? Why are 999 previous transclusions affecting how the 1000th one is transcluded? Transclusions should be completely independent of each other and not share any memory. Once one transclusion has been processed, the memory should be done with and available for the next one so that the next has just as much memory available as the previous. If this is not happening, and pages are gradually running out of memory with each successive transclusion, then there's a memory leak. —CodeCat 22:03, 11 April 2017 (UTC)
That assumes that there is no parallel processing going on, and that GC is run after each object. - TheDaveRoss 22:12, 11 April 2017 (UTC)
If each template expansion uses a shared pool of memory and runs in parallel nondeterministically, then memory usage itself is nondeterministic. It means that some pages will sometimes run out of memory and sometimes won't, entirely at random, by the very nature of the system. That seems like a fairly big flaw. —CodeCat 22:15, 11 April 2017 (UTC)
I have an idea. Module:scripts uses the lang:getScripts() function from Module:languages. This function appends a table of script objects to the language object. Thus, I assume, for example, the script object for "Latn" is repeated inside the language object of every language that uses the Latin alphabet. Not sure how much memory that would use, but it seems quite wasteful. — Eru·tuon 22:05, 11 April 2017 (UTC)
It shouldn't be very wasteful unless you are handling lots of language objects at once. But each call to {{t}} only needs one language object, and as I noted above, the memory of each template call should be freed afterwards. So the amount of memory needed for each individual template is not at all high. It's only if the memory is not properly being freed and reused that it eventually gets used up, which I suspect is the case. —CodeCat 22:09, 11 April 2017 (UTC)
Can we construct a module that proves it is the scribunto implementation that is wasting memory and not something that we can control? DTLHS (talk) 22:10, 11 April 2017 (UTC)
It's hard to do that since nothing else but Lua uses Lua memory. However, if transcluding the same template call 1000 times uses more memory than just once, then something is off. —CodeCat 22:13, 11 April 2017 (UTC)
Again, I think your assumption of how Lua memory works is faulty. It seems that memory is not freed up after a template is evaluated; rather it accumulates as the software goes through each Lua-based template from the top of the page to the bottom. — Eru·tuon 22:16, 11 April 2017 (UTC)
I'm saying how it should work. If it doesn't work that way, then that's a flaw in the software that should be corrected. There's absolutely no need for a call to {{t}} to use memory from the previous call, since individual transclusions and module invocations are entirely separate. mw.loadData is the only way in which multiple modules invocations can share data, and that's tightly controlled and read-only. —CodeCat 22:23, 11 April 2017 (UTC)
A compromise might be to move the "obscure" languages (based on number of entries, or speakers, or something) and keep the common ones: it's annoying to make users do an extra click to translate water into something mainstream like French or Chinese. Equinox 21:18, 11 April 2017 (UTC)
I posted about this in the Grease pit. I'm puzzled why the module error suddenly cropped up. Perhaps the recent addition of two cognates to the Etymology section pushed it over the top. I wonder if it could be avoided by a simpler method than moving translations (though there are so many translations that I think that wouldn't be a bad idea, even without a Lua memory error). Perhaps by handling language objects differently in the etymology and translations modules. — Eru·tuon 21:26, 11 April 2017 (UTC)
Would it be possible to move translations to alphabetical subpages and then have a menu to expand them (by first letter of the language name) as needed? DTLHS (talk) 21:34, 11 April 2017 (UTC)
It would require Javascript to function unless we were willing to load them all every time. I am not sure if we have a stance on whether Javascript should ever be required for all content to be displayed. - TheDaveRoss 21:39, 11 April 2017 (UTC)
Might recent module renovations be a cause? Maybe we should be looking at the efficiency of the code. —suzukaze (tc) 21:56, 11 April 2017 (UTC)

@CodeCat: Test of Lua memory: this edit (1.22 MB) versus this edit (2.98 MB). The first consists of 1 case of {{l|en|word}}, the second of 24 cases of the same. So, it's not quite 24 times as much memory. — Eru·tuon 22:24, 11 April 2017 (UTC)

Yeah, that basically shouldn't be happening. Lua invocations should be entirely independent of each other, that's how Scribunto was designed. I wonder what happens when you do this with a Lua module that has a function that just returns an empty string. —CodeCat 22:33, 11 April 2017 (UTC)
Well, perhaps you are right that it shouldn't work this way or perhaps not. I don't know. Perhaps the memory does not actually accumulate, but rather the memory-measurer increments by the amount of memory used by a given template, whether or not the memory is dumped before the next template is evaluated. I suppose one would have to post on Phabricator to find out. I wouldn't know what exactly to ask. — Eru·tuon 23:30, 11 April 2017 (UTC)
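
A back-of-the-envelope reading of the two measurements above, as a sketch only. It assumes, without confirmation from the thread, that Lua memory grows linearly: a fixed baseline plus a constant cost per extra template call. The constant and function names are made up for illustration.

```python
# Figures reported in the thread, in MB of Lua memory.
ONE_CALL_MB = 1.22            # page with 1 call to {{l|en|word}}
TWENTY_FOUR_CALLS_MB = 2.98   # page with 24 calls to the same template
LIMIT_MB = 50.0               # Lua memory limit quoted in the thread

# Assumption: usage = baseline + per_call * number_of_calls.
per_call_mb = (TWENTY_FOUR_CALLS_MB - ONE_CALL_MB) / (24 - 1)
baseline_mb = ONE_CALL_MB - per_call_mb


def estimated_usage_mb(calls: int) -> float:
    """Estimated Lua memory for a page with `calls` identical template calls."""
    return baseline_mb + per_call_mb * calls


# Largest number of such calls that would still fit under the limit,
# if (and only if) the linear assumption held all the way up.
calls_until_limit = int((LIMIT_MB - baseline_mb) // per_call_mb)
```

Under that assumption each extra {{l}} call costs well under a tenth of a megabyte, most of the 1.22 MB is fixed overhead, and usage still climbs with every call. Since {{t}} is not {{l}}, the extrapolated count says nothing precise about water.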
  • Unqualified commentary, I know zilch about tech: This could be an incentive to centralise all translations on WikiData and then find a way to provide them to every Wiktionary at once, providing our Wiki's already‐put‐in work to all Wiktionaries with less manpower. Korn [kʰũːɘ̃n] (talk) 22:30, 11 April 2017 (UTC)
Right now, I think it pretty clear we should move the translations of water to a subpage. It may not be what we would want to do in optimal circumstances, and perhaps Wikimedia improvements could let us move it back, but perfect is the enemy of good.--Prosfilaes (talk) 23:13, 11 April 2017 (UTC)
Is that a better solution than simplifying all of the translations which are presented on the page so that they don't make module calls and thus don't use nearly as many server resources? There are lots of possible solutions, many of which make no substantive change to the user experience. - TheDaveRoss 13:30, 12 April 2017 (UTC)
Removing module calls might require removing transliteration, gender, and script classes. If so, I don't like the idea. — Eru·tuon 15:35, 12 April 2017 (UTC)
It would only remove those things being done on the fly, the exact same presentation is possible without using templates/modules. - TheDaveRoss 16:05, 12 April 2017 (UTC)
I don't understand what you mean by "removing these things being done on the fly". Could you clarify? — Eru·tuon 16:39, 12 April 2017 (UTC)
Sure. When we use {{t}} (and others), the template makes several module calls and requires a bunch of Lua overhead. As we have discovered, this overhead exceeds the limits put in place by Wikimedia's tech staff. If we do not make module calls, but instead replace the module calls with static results (e.g. instead of calling the languages module getByCode(es) to get "Spanish" we write "Spanish"), then we reduce the overhead significantly. On most pages, where there are relatively few module calls, it is not an issue; however, water results in many thousands of module invocations and thus a lot of overhead. The downside is that future changes require a "manual" update of water, but until a better solution comes along it would not require that the user experience of water change, only the wiki-markup. - TheDaveRoss 16:59, 12 April 2017 (UTC)
For example, here is a simple change we could make: {{t-simple/test}} adds an optional lang= parameter, and when a language name is included in the call it skips the module call entirely. - TheDaveRoss 17:06, 12 April 2017 (UTC)
I changed the parameter to |langname=, because |lang= usually means language code. --WikiTiki89 17:23, 12 April 2017 (UTC)
This works, see User:Wikitiki89/water. --WikiTiki89 17:46, 12 April 2017 (UTC)
But the language tags are missing from most of the words. —CodeCat 17:50, 12 April 2017 (UTC)
That's a bug in the template. It can be fixed. --WikiTiki89 18:32, 12 April 2017 (UTC)
Done. --WikiTiki89 18:37, 12 April 2017 (UTC)
@-sche, this is obviously better than using a subpage. Do you have any opposition to using {{t-simple/test}}? —Μετάknowledgediscuss/deeds 00:05, 15 April 2017 (UTC)
If that template really uses less memory, then I'm all for it. (I suppose we can find a better name for it at some point.) However, we'll have to be careful about which translations we use it on, and not use it on any that get font support (like Navajo) or automatic transliteration (which I suppose is a subset of languages that get font support, since I think all our non-Latin-script languages have fonts specified for them?), or even that have genders, because it seems to disable all those functions. (It would seem possible to add the ability to specify a gender without invoking any modules; just accept whatever gender is input and trust human editors to review things on the one or two pages this should be used on.)
I'll set about adding it. - -sche (discuss) 19:01, 15 April 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I think we can just update {{t-simple}} since it is explicitly made for water. It also did not previously do any of the other things that {{t}} did, so the places where it is already in use would see no change to their displayed text. There are comparable workarounds for all of the other components of {{t}}, though, if we want to implement them in {{t-simple}}. - TheDaveRoss 19:18, 15 April 2017 (UTC)

Good point. I've just replaced the content of {{t-simple}} with the code from {{t-simple/test}}, and updated water/translations to use it as much as possible, with langname= specified. However, I haven't moved the translations back onto water; see my comment at the bottom of this thread that has the same timestamp as this one. - -sche (discuss) 02:26, 17 April 2017 (UTC)
When I moved the translations back into the main entry, it broke again; see the Grease Pit. - -sche (discuss) 09:58, 25 April 2017 (UTC)
  • Who says splitting a page can reduce memory usage? misunderstood.--Giorgi Eufshi (talk) 07:54, 12 April 2017 (UTC)
  • I suspect that memory usage could be reduced by creating lighter-weight versions of Module:languages, Module:scripts, and Module:links, which are used by {{t}}. For instance, Module:links stores a separate script object every time it makes a link. All the script code does is confirm that the script code in the language's data file is valid. This could perhaps be done more simply by a table containing { scriptCode1 = true, scriptCode2 = true, ... }. A single script code-validating table would probably use less memory than one script object for every single link in a translations table. Similarly, a comprehensive table of canonical names indexed by language name might be smaller than one language object for every link on a page. I could be wrong, but I suspect this is true, as the amount of memory used by {{#invoke:languages/templates|getByCanonicalName|English}} (which invokes Module:languages/canonical names, which creates a list of all languages indexed by canonical name) is only 10.01 megabytes. — Eru·tuon 17:51, 12 April 2017 (UTC)
    • Such a list could be generated on the fly and then loaded with mw.loadData, so that we wouldn't have to maintain it. All that would be needed in Module:scripts, then, is a function like isValidCode that verifies the existence of the code using this module. However, we would need to ask where this check should be done. When functions expect script objects, the mere fact that you need to provide such an object already forces you to go through getByCode to retrieve it, and a check is part of that process. But if functions take script codes, then this layer of validation is lost. —CodeCat 18:10, 12 April 2017 (UTC)
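
The single validation table suggested above can be sketched as follows. This is Python rather than Lua, the four script codes are only a sample, and `make_link` with its output format is a hypothetical caller; `is_valid_code` mirrors the isValidCode name floated in the thread.

```python
# One shared, read-only table of valid script codes (sample values only),
# playing the role of { scriptCode1 = true, scriptCode2 = true, ... }.
VALID_SCRIPT_CODES = frozenset({"Latn", "Cyrl", "Arab", "Grek"})


def is_valid_code(code: str) -> bool:
    """Check a script code against the shared table, creating no objects."""
    return code in VALID_SCRIPT_CODES


def make_link(term: str, script_code: str) -> str:
    """Hypothetical link builder that takes a plain code, not a script object.

    Because no script object has to be fetched first, the caller must run
    the validity check explicitly, or it is silently skipped.
    """
    if not is_valid_code(script_code):
        raise ValueError(f"invalid script code: {script_code}")
    return f'<span class="{script_code}">[[{term}]]</span>'
```

The explicit check inside `make_link` is the point of the last comment above: once functions accept bare codes, validation only happens where someone remembers to call it.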

Use # for senseids in raw-link parameters to templates

Right now, there is no way to indicate the senseid a link should point to, other than by wrapping a template around the link and giving it the id= parameter. The template {{ll}} was created in part to work around this problem. I think there is a better and more intuitive way to do it. Links already naturally support the # fragment notation to link to a specific section on the page. However, we never use this on Wiktionary because such links are ambiguous and prone to breaking: they always point to the first section with that name on the page, no matter what language it's in. Reordering sections or even adding a new language to the page can break links. Senseids, on the other hand, are stable and don't get messed up like this. So how about we co-opt the # syntax to indicate the senseid for links when they are given to templates?

An example (contrived) use in such a situation might be {{head|en|...|head=[[give#grant]] [[up#skyward]]}}. The {{head}} template, and any other template using Module:links, would automatically convert these links to give#English-grant and up#English-skyward, which is the format for link targets that {{senseid}} generates.

If we are going to do this, then we first have to find any existing uses of fragments within links and fix them up appropriately. —CodeCat 18:33, 12 April 2017 (UTC)

I would agree that it would be better to have a better syntax, but how would the template distinguish between language names and sense-ids? You wouldn't want it, for example, to make {{en-noun|head=[[tête#French|tête]]-[[à#French|à]]-[[tête#French|tête]]}} link to tête#English-French and à#English-French. Also, just a minor correction, {{ll}} was not created to work around this problem, but rather it was adapted later to work around this problem. --WikiTiki89 18:47, 12 April 2017 (UTC)
This is why I said we need to fix such cases first, before implementing this. —CodeCat 18:51, 12 April 2017 (UTC)
You partly misunderstood the question then. What would you replace my example with so that it still links to the French section? --WikiTiki89 18:54, 12 April 2017 (UTC)
I would not have it link to the French section, because it's an English term and the individual parts mean nothing in English. The term was borrowed as a whole from French, so the etymology should deal with that. I would write {{en-noun|head=tête-à-tête}}. —CodeCat 18:58, 12 April 2017 (UTC)
I oppose breaking this functionality. --WikiTiki89 19:03, 12 April 2017 (UTC)
I don't know what functionality there even is that would be broken. The example you gave isn't even functionality but a misuse, so I wouldn't count that. —CodeCat 19:07, 12 April 2017 (UTC)
The functionality that allows language sections to be explicitly specified in the links. It was a feature we intentionally added back when this style of linking was first developed, and it should be kept. --WikiTiki89 19:15, 12 April 2017 (UTC)
I agree entirely with Wikitiki89 here. CodeCat, your idea's a good one; why won't you use your considerable ability to preserve this desirable functionality? — I.S.M.E.T.A. 23:05, 13 April 2017 (UTC)
I agree with CodeCat. I don't think an English head should ever be linking to a French entry. — Eru·tuon 19:14, 12 April 2017 (UTC)
Don't forget that this affects nearly all templates, not just headword templates. --WikiTiki89 19:15, 12 April 2017 (UTC)
Also, it's not a good idea to omit features in templates in order to enforce "good practices", so even if we agreed that this shouldn't be done, that doesn't mean the template shouldn't support it. --WikiTiki89 19:18, 12 April 2017 (UTC)
Of course templates should enforce practices. {{l}} doesn't allow you to omit the language code, nor does it allow you to use |lang= instead. Note that we are still cleaning up the mess from {{term}}'s optional language parameter. —CodeCat 19:38, 12 April 2017 (UTC)
I'm not sure when the language-anchors would be needed, but it is probably best not to transform an anchor marked by # into a senseid. It is likely to cause confusion for editors who have not read whatever documentation pages would describe this feature. Perhaps a different collocation of characters could be chosen to mark the senseid, something not otherwise allowed in page titles. (One of the symbols listed in Appendix:Unsupported titles, I guess.) — Eru·tuon 00:12, 13 April 2017 (UTC)
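
To make the disagreement above concrete, here is a minimal sketch of the proposed rewrite, in Python rather than in Module:links. The regular expression and the `rewrite_senseid_links` function are hypothetical, and the language name is passed in directly instead of being derived from a language code.

```python
import re

# Matches [[page#fragment]] and [[page#fragment|label]]; links without "#"
# are left alone.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)#([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def rewrite_senseid_links(wikitext: str, language_name: str) -> str:
    """Treat every #fragment as a senseid and prefix it with the language."""
    def repl(match):
        page, fragment = match.group(1), match.group(2)
        label = match.group(3) or page
        return f"[[{page}#{language_name}-{fragment}|{label}]]"

    return LINK_RE.sub(repl, wikitext)
```

With "English" as the language, `[[give#grant]]` becomes a link to give#English-grant, as proposed. The same rule turns `[[tête#French|tête]]` into a link to tête#English-French, which is the case objected to above: the rewrite alone cannot tell a language section from a senseid.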

Cognate & automatic interlanguage links

Hello all,

From April 24th, a new interlanguage link system will be deployed on all Wiktionaries. This extension, Cognate, automatically links the pages with the same title between the Wiktionaries. This means they no longer have to be added in the pages of the main namespace.

This new feature has been developed by Wikimedia Deutschland as the first step of the project Wikidata for Wiktionary, but does not rely on Wikidata.

To allow the feature to operate, all the former interlanguage links have to be removed from the wikitext. You can do this by using a bot, as it was done on Wikipedia in the past. If you leave them in they will overwrite the automatic links.

During the development we had a lot of discussions with Wiktionary editors to understand their needs, but it's possible that some automatic links don't work as you would expect. If you find some bugs or have suggestions for improvements, feel free to add a sub-task on Phabricator or add a message on this talk page.

None of the announcements linked from above refers to the name Cognate, nor does it seem to link to a place where the name is mentioned. This seems to be the first opportunity at which the English Wiktionary can become aware of the name, an obvious misnomer. In fact, it would have sufficed if anyone who was choosing the name had checked what the word cognate means. Misnomers like this are bad even if they do not affect most Wiktionary users. I add my voice to those who kindly ask you to reconsider and to change the extension name to something that is not a misnomer. --Dan Polansky (talk) 14:31, 16 April 2017 (UTC)
Just for the record, it's virtually impossible to rename an extension. We still have an extension called SyntaxHighlight_GeSHi, despite the fact that it hasn't actually used the GeShi framework for years. And as Lea observes, extension names are purely internal, appearing on Special:Version only. This, that and the other (talk) 04:19, 22 April 2017 (UTC)
It is certainly possible to create a new extension with the same content as another one; the question is whether that is worth the hassle. --Dan Polansky (talk) 09:08, 22 April 2017 (UTC)
@Dan Polansky: This seems to be the Phabricator project page about (Not) Cognate: https://phabricator.wikimedia.org/project/profile/2320/
And this seems to be a related patch: https://gerrit.wikimedia.org/r/#/c/332912/
I'm not an expert in extension-building, but it seems there are discussions, members, watchers, subprojects, milestones and probably other stuff. The patch page displays a list of authors and their contributions in the history. Maybe creating a new extension with the same content is not an option because the new extension would lose all that. It seems we'll have to live with that name. --Daniel Carrero (talk) 03:10, 28 April 2017 (UTC)
Can the "Title Normalization" feature be turned off? It seems to be a mistake. —RuakhTALK 18:27, 16 April 2017 (UTC)
Hello @Ruakh, the title normalization feature has been requested by the community, in order to link together pages that have similar names but differ only in certain characters, such as ellipsis vs 3 dots. We can adapt this or create new rules if necessary. Can you explain to me some use cases where this should be modified? Thanks Lea Lacroix (WMDE) (talk) 10:44, 18 April 2017 (UTC)
With all due respect to Amgine, I don't think (s)he constitutes "the community". There's no technical reason that we couldn't already treat certain variants as equivalent; rather, we have chosen not to. —RuakhTALK 14:41, 18 April 2017 (UTC)
In self-defense, I was attempting to point out that different variants are different, and should not be normalized. Neither should they generate multiple IW. My exemplar was that combined ellipses are different than three stops, and should not interwiki the same (that is, only three dots should generate an IW to three dots.) (I believe EncycloPetey was also discussing combining unicode at the time.) - Amgine/ t·e 05:09, 26 April 2017 (UTC)
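
The two positions above can be compared with a small sketch. It is not Cognate's implementation: the rule table holds only the one example named in the thread (ellipsis character versus three dots), and the function names are hypothetical.

```python
# Hypothetical rule table: only the example mentioned in the discussion.
NORMALIZATION_RULES = {"\u2026": "..."}  # single ellipsis character -> three dots


def normalize_title(title: str) -> str:
    """Apply each replacement rule to a page title."""
    for source, target in NORMALIZATION_RULES.items():
        title = title.replace(source, target)
    return title


def titles_linked(a: str, b: str, normalize: bool = True) -> bool:
    """Would two page titles be treated as the same entry across wikis?"""
    if normalize:
        return normalize_title(a) == normalize_title(b)
    return a == b
```

With normalization on, a title ending in the ellipsis character and one ending in three dots are linked to each other; with it off, only exactly identical titles are, which is the behaviour argued for in the last two comments.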

Hi, Colgate Cognate will be deployed at 12:00 UTC tomorrow, should I start removing all interlanguage links with the exact same title with my bot? Regards. --Thibaut120094 (talk) 20:23, 23 April 2017 (UTC)

Supposedly, the presence of manually-spelled-out interwiki links will override Colgate. Once the extension is deployed, I would suggest removing all interwikis with the exact same title from a few pages to check that the extension kicks in and works as expected, and if it does, then it would be a good idea to remove the interwikis by bot. Some might argue it would be good form to hold a quick vote (..and might block the bot otherwise...), although that seems to me like an excess of bureaucracy to accept something that makes it easier for us to obtain the same result (of pages having interwiki links). - -sche (discuss) 21:12, 23 April 2017 (UTC)
I can't wait to see how the new toothpaste works. --Octahedron80 (talk) 01:54, 24 April 2017 (UTC)
@Thibaut120094 For information, there is no need to rush to run your bot just after the deployment. When the extension is deployed, nothing changes if the manual links are not removed. I would advise taking some time to try it first, removing links on some pages to check whether the extension behaves correctly, and then doing the full automatic removal. Lea Lacroix (WMDE) (talk) 10:14, 24 April 2017 (UTC)
  • Does it work immediately? I added the German word dichterisch without an interwiki link and it doesn't show that it is present on the German Wiktionary. SemperBlotto (talk) 10:16, 24 April 2017 (UTC)
@SemperBlotto The deployment is in progress, and expected to be finished at 12:00 UTC. Then, yes, the new words you create should have the automatic links. Lea Lacroix (WMDE) (talk) 11:49, 24 April 2017 (UTC)
Edit: actually, it works now between en and de :) The rest of the Wiktionaries will be populated in alphabetic order in the next minutes. Reminder: if any problem occurs, feel free to create tickets or ping me. Lea Lacroix (WMDE) (talk) 11:58, 24 April 2017 (UTC)
I removed the interwiki links on the German lemma. After that you can see the links to the Wiktionaries 'cs', 'en' and 'eo'. The links 'hu', 'io' and 'zh' are outstanding, because they haven't removed the manual links from the entries. --Alexander Gamauf (talk) 12:02, 24 April 2017 (UTC)
This is probably some caching issue. Now there are all languages A-L; a few minutes ago there were only languages A-I. JAn Dudík (talk) 12:31, 24 April 2017 (UTC)
As a test, I tried making Estonian the only interwiki link at grape juice, but all three interwiki links are still there. --WikiTiki89 16:50, 24 April 2017 (UTC)
As a test, if you add a Japanese interwiki at grape juice, then the entry will show all the three existing interwikis plus Japanese (which does not have that entry yet). So, rather than overriding the whole list, you can just add more interwikis manually. --Daniel Carrero (talk) 17:02, 24 April 2017 (UTC)
Which is a problem, because we can't hide incorrect interwikis (if there happen to be any). --WikiTiki89 17:20, 24 April 2017 (UTC)
Another test: if you remove all interwikis from grape juice and add only [[et:example]] in the entry, it will show three interwikis: the two correct "grape juice" and the Estonian "example". So, you can override an interwiki in a specific language, but apparently you can't just delete an incorrect interwiki. Yes, that may be a problem at some point, I guess. --Daniel Carrero (talk) 18:10, 24 April 2017 (UTC)
Thanks for noticing. Let me know if you encounter a case of wrong automatic link that you would like to change/hide. Lea Lacroix (WMDE) (talk) 09:17, 25 April 2017 (UTC)
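
The behaviour reported in the grape juice tests above can be modelled as a simple merge. This is a sketch of what was observed, not of how the extension is written; `displayed_interwikis` is a hypothetical name, and "xx" and "yy" are placeholder codes for the two wikis whose identities the thread does not give.

```python
def displayed_interwikis(automatic: dict, manual: dict) -> dict:
    """Merge automatic same-title links with manual [[code:title]] links.

    A manual link replaces the automatic one for the same language code or
    adds a new code; nothing in this model can remove an automatic link.
    """
    merged = dict(automatic)
    merged.update(manual)
    return merged


# Automatic links for an entry that exists on three other wikis.
automatic_links = {"xx": "grape juice", "yy": "grape juice", "et": "grape juice"}
```

Passing `{"et": "example"}` as the manual links overrides only the Estonian target, and passing `{"ja": "grape juice"}` yields four links. An empty manual set still shows all three, which is why an incorrect automatic link cannot simply be hidden.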

Have you notified all owners of bots that are adding IW links? I still see some bots editing here whose owners are Meta users. --Octahedron80 (talk) 03:31, 25 April 2017 (UTC)

Am I right in thinking that our bots don't always put the interlanguage links in alphabetical order by language code? I vaguely recall something about the ordering being slightly different. Is this the case, and if so, does the output of Cognate conform to these standards? —JohnC5 17:39, 25 April 2017 (UTC)

@JohnC5: You are correct: they are not alphabetized by the ISO code and Cognate also does not alphabetize by ISO code. They are alphabetized by a different standard. —Justin (koavf)TCM 17:41, 25 April 2017 (UTC)
The standard is explained at WT:EL#Interwiki links. Incidentally, that policy needs to be updated to account for (Not) Cognate. It still says that interwikis are maintained by bot. --Daniel Carrero (talk) 17:42, 25 April 2017 (UTC)
I'm guessing it uses the same ordering that we have been using. --WikiTiki89 17:48, 25 April 2017 (UTC)
By the way, the order we were using is defined here. --WikiTiki89 17:54, 25 April 2017 (UTC)
They will deal with iw sorting later, as answered in the extension's talk page.--Octahedron80 (talk) 04:08, 26 April 2017 (UTC)
I have removed that section of EL to reflect the deployment of Congregate. —Μετάknowledgediscuss/deeds 17:49, 25 April 2017 (UTC)
@Koavf: I'm surprised you need to be told this, but do not edit other people's content in a discussion. It's fine if you don't get the joke and misinterpret it as an error, but editing it — and then even edit-warring over it! — is never something you should be doing. —Μετάknowledgediscuss/deeds 19:56, 25 April 2017 (UTC)
@Metaknowledge Suffice it to say, in this instance, you intended the misspelling and I missed that. It's generally better to not edit others' comments, sure but there are also times that it's entirely appropriate--refactoring, fixing links, important misspellings, etc. —Justin (koavf)TCM 20:00, 25 April 2017 (UTC)
When in doubt, don't. Particularly "refactoring" worries me; don't refactor my messages and change their context or break sentences apart that flow into each other. The cost of a misspelling is usually much less than the cost of changing a correct spelling (or one consciously chosen by its author).--Prosfilaes (talk) 23:39, 25 April 2017 (UTC)

New proto-languages

Should the new proto-languages like Proto-Eurasiatic, Proto-Nostratic and Proto-Borean be added to Wiktionary? Although they are controversial and not fully endorsed, there is strong evidence that they might have existed. Besides, there are a lot of books and reliable websites with hundreds of reconstructed words from these proto-languages, along with their etymologies. Whether or not they existed, Wiktionary already has constructed languages with their own lemmas. So until they are endorsed, they might be treated as constructed languages.

Here are some sources with reconstructed proto-words:

-- 11:16, 13 April 2017 (UTC)

No, they shouldn't. Good luck convincing enough editors that there's strong evidence for them. Lingo Bingo Dingo (talk) 11:41, 13 April 2017 (UTC)
This isn't an attempt to convince other editors that there's strong evidence for them. This is not even a proposal. And good luck to the Dutch people to become smart. -- 12:44, 13 April 2017 (UTC)
Please do not be offensive to other editors. —JohnC5 16:30, 13 April 2017 (UTC)
We've discussed previously whether to let etymologies mention proposed things like Indo-Uralic and Nostratic. My view, for which there was some support, is that as long as we convey how controversial / undemonstrated a proposal is, it's fine to mention in the etymology appendices like Reconstruction:Proto-Indo-European/h₁nómn̥. I guess we could get by without codes for them, but having codes seems like it would standardize the formatting, and have benefits for categorization. We do (as noted) have codes for various constructed and reconstructed languages that are restricted to appendices, as well as etymology-only substrates, and we have a code for Proto-Altaic already. - -sche (discuss) 21:20, 13 April 2017 (UTC)
I stand by an earlier stance of mine: as soon as you can find a "Proto-Nostratic" or similar root that is accepted by three or more people with competing reconstruction schemes (that is, they roughly agree both on what the reconstruction is and what the descendants are), including it would not be a problem. But, alas… almost all sources on these continue to assume different incompatible proto-forms from one another. There is just about nothing that could be added.
The basic problem, in other words, is that we do not add entries for languages, we add entries for individual reconstructions. If the "reconstructions" are so unstable that no two sources agree on anything, there will be no point in creating an entry. (And, for what it's worth, I suspect this should be a rule of thumb also when dealing with established proto-languages.) --Tropylium (talk) 20:44, 22 April 2017 (UTC)

Two proposals: removing and/or hiding quotations sections

Previous discussions:

See this link. It is an entry that I edited. I removed a "Quotations" section and moved the two existing quotations to the respective senses:

Two proposals:

  1. Allow for all "Quotations" sections to be removed manually (not by bot!) in all entries, by moving the quotations to their respective senses. If the sense is unclear, the quotation can be moved to the citations page.
    • Rationale: Quotations serve the purpose of illustrating the senses, so they are better placed below each sense. The "Quotations" section also uses up some space in the entry, as opposed to the quotations hidden below each sense.
    • Note: The vote (which was mentioned above) proposed moving all the "Quotations" sections by bot to the citations page, but the vote failed. Moving the quotations to the senses is better, but a bot can't do that. (with the current technology, I guess! ;) )
  2. Automatically hiding all existing "Quotations" sections by adding templates like {{quote-top}} and {{quote-bottom}} in all these entries by bot.
    • Note: This was originally @Donnanz's idea in the Grease Pit discussion (also linked above). If we want to move all the quotations as explained above, then we can hide them until the work is done; or, if we don't want to move the quotations to the senses, they can just remain hidden.

I think the proposal #1 is important. If #1 passes, in the long run #2 won't matter because we won't have any "Quotations" sections in the first place. But I see some merit in the proposal #2 too as I said above. Maybe other people have more to say. Feel free to agree or disagree or whatever. I think we could create a single vote with the two separate proposals, if people want. --Daniel Carrero (talk) 04:02, 14 April 2017 (UTC)

@Daniel Carrero: I wholeheartedly support your first proposal (which is what you did with Lawrence). I'm undecided w.r.t. your second proposal. — I.S.M.E.T.A. 18:26, 14 April 2017 (UTC)
I support #1. The quotations that are moved need to be formatted in the normal "under-a-sense" way, i.e. collapsible. Equinox 18:31, 14 April 2017 (UTC)
I support 1, but it's going to be a lot of work. DTLHS (talk) 04:08, 15 April 2017 (UTC)
So far, we have 4 supports for the proposal #1 (counting myself) and no one opposed it yet. Looks like we are starting to have consensus for that idea, if we didn't already have that consensus before (which can be judged by reading the previous discussions). If this keeps up, I think we can simply do as proposed -- I mean, I guess we won't need a vote to allow doing that. (that's my opinion, at least)
This is the list of entries to be edited: User:Daniel Carrero/Quotations sections. --Daniel Carrero (talk) 00:18, 16 April 2017 (UTC)
I've been moving quotes to under specific senses for a while now, so I have no objection. Andrew Sheedy (talk) 06:48, 16 April 2017 (UTC)
I proposed using {{quote-top}} and {{quote-bottom}} as a quick and easy temporary solution, but if total removal of these Quotations headers is the preferred option that's OK by me. Everyone seems to realise that's harder work. DonnanZ (talk) 10:35, 16 April 2017 (UTC)


I just created this category as a parent of Category:en:Cookware and bakeware and Category:en:Cutlery. According to w:Cookware and bakeware, this term is restricted to containers for preparing food in, but a lot of the entries in Category:en:Cookware and bakeware are really general kitchenware. I would appreciate any help in recategorising these. There may also be some entries in Category:en:Tools that can be moved. —CodeCat 20:26, 14 April 2017 (UTC)

User:x/Books/ and Category:Books

Where are these coming from? —suzukaze (tc) 19:06, 15 April 2017 (UTC)

Go to the lefthand sidebar, and near the bottom you'll see "Create a book". If we could hide or disable it, I think that'd be great. —Μετάknowledgediscuss/deeds 19:17, 15 April 2017 (UTC)

Wiktionary:Votes/pl-2017-04/Removing inactive editors from user-proficiency categories

I have just created this vote, which is set to run from the 23rd of April to the 22nd of May, this year. For details about it, see the vote page and/or the #Removing inactive editors from Category:User coders, Category:User languages, and Category:User scripts section, above. — I.S.M.E.T.A. 01:16, 16 April 2017 (UTC)

Eponyms of surnames

I keep coming across entries of eponyms of surnames with no definition for the surname itself, with the etymology saying "Named after ____ {surname}." or something. Here's a damn good example: Nemeth. "Named after Abraham Nemeth." Well whoever put that, don't you think the surname itself merits a definition? So how do you have 2 etymologies when the second etymology just comes from the first etymology? How do we deal with that here? For now, in the entry, I put:

"1. A surname. 2. (Named after Abraham Nemeth) Bla bla bla about some braille thing he made or whatnot."

Is there consensus to instead do this?

Etymology 1

Wherever Nemeth comes from

Etymology 2

Named after Abraham Nemeth.

Or what if we had a thing where we had an etymology within an etymology, for instance:

Etymology 1

Wherever Nemeth...

Etymology 1.1

Named after Abraham Nemeth.

Or instead, we could label it in the etymology.

Etymology

Wherever Nemeth comes from.

(for the eponym) Named after Abraham Nemeth.

Proper noun

1. A surname

2. Bla bla bla braille bla bla bla something else

Get the idea? This is very confusing. What do we do with etymologies within etymologies anyway? We especially need to worry about this for surnames and their eponyms as there are a ton of those I'm coming across. PseudoSkull (talk) 04:02, 16 April 2017 (UTC)

I'm sorry, technically this is not an eponym because it's still a proper noun in both definitions, but you get what I mean. PseudoSkull (talk) 04:03, 16 April 2017 (UTC)
If thing-X is named after surname-X, it might as well be under the same etymology, a lot like figurative senses of existing words. Being a separate sense doesn't make you a separate etymology: we already have sense lines for that. You get a separate ety if your derivation is separate. Equinox 05:22, 16 April 2017 (UTC)
@PseudoSkull: I second Equinox’s reasoning here. The eponymy can (and should) be explained in the etymology, more or less as in your last example of how to deal with Nemeth. — I.S.M.E.T.A. 20:31, 17 April 2017 (UTC)
I agree, with the reminder that the "named after Abraham Nemeth" bit goes in the etymology (or potentially as part of the definition: "A system of braille developed by Abraham Nemeth...") rather than as a context label: [7]. - -sche (discuss) 20:59, 23 April 2017 (UTC)

Flerd (Middle English)

Would someone fluent in Middle English provide a better translation of the quotation at flerd? Thanks. — SMUconlaw (talk) 18:36, 16 April 2017 (UTC)

@Smuconlaw, this sort of query belongs in the Tea room. Anyway, the translation was ungrammatical and missing a large chunk; I have fixed that, but I would still like somebody else to look it over and see if they can't improve it. @Leasnam, perhaps? —Μετάknowledgediscuss/deeds 06:33, 17 April 2017 (UTC)
Oops! Thanks. — SMUconlaw (talk) 06:43, 17 April 2017 (UTC)

Category:Russian words suffixed with -∅

What should we do with the -∅ part? It's not conventional. @Atitarev, Wikitiki89, Chuck Entz? — AWESOME meeos * ([nʲɪ‿bʲɪ.spɐˈko.ɪtʲ]) 01:54, 17 April 2017 (UTC)

No idea, I haven't seen any precedent. This was introduced without any prior discussion. Besides, transliterations of -∅ should be suppressed with "-". --Anatoli T. (обсудить/вклад) 02:48, 17 April 2017 (UTC)
For Dutch, we have Category:Dutch words suffixed with -en (denominative). However, the actual suffix -en is just the ending of the infinitive, which is the lemma form of verbs in Dutch. The stem of the word does not change, and this is illustrated by verb forms with no ending, such as the first-person singular present. adem (breath, noun) and adem (I breathe, verb) are the same. Because the derivation happens to be with a lemma form with a nonzero inflectional ending -en, things fit into our system neatly. However, derivations can also work in reverse in Dutch, by converting a verb stem to a noun without changing it. Thus, although it did not actually happen so historically, it's quite possible for adem to derive from ademen instead. Nouns have no inflectional ending in their lemma form, so the lemma is the stem. How would such a derivation be denoted, if not with a null suffix? Again, keep in mind that both this null derivation and -en are the same morphologically, neither of them changes the stem, the difference in form is only because of the choice of lemma forms. —CodeCat 20:49, 17 April 2017 (UTC)
Dutch -en and Russian -∅ are both inflectional suffixes, not derivational suffixes. Thus, they should not be mentioned in derivations. In other words, we should say things like "ademen is from adem" and "adem is from ademen" rather than "ademen is from adem + -en" and "adem is from ademen with -en removed". --WikiTiki89 16:39, 18 April 2017 (UTC)

Proposal: Remove "The essentials" from EL



  • That section to be deleted reads like a complete guide about "Language", "Part of speech" and "References", but it is too short and sometimes misleading, as said below. WT:EL already has three separate, more comprehensive and up-to-date sections for these items: WT:EL#Language, WT:EL#Part of speech and WT:EL#References.
  • The three separate sections mentioned above were completely created/revised by vote through the last year and a half: Wiktionary:Votes/pl-2015-12/Language, Wiktionary:Votes/pl-2015-12/Part of speech and Wiktionary:Votes/2016-12/"References" and "External sources". (Only "Ideophone" was recently added in the POS list without a vote.)
  • This statement is false, because for definitions, we use the attestation process, not references: "While we may be lax in demanding references for words that are easily found in most paper dictionaries, references for more obscure words are essential."
  • This statement is false, because we don't add references directly in the senses (apart from sometimes adding footnote links in the senses): "References may be added in a separate header of adequately chosen level or added directly to specific senses."

I already created a vote for this proposal in December 2015, but the vote never started:

Procedural notes:

  • I've been planning to propose starting that vote after revising the three actual, separate EL sections covered above. First, I created the three separate votes linked above and revised the three separate sections. Now I feel it's a good time to remove "The essentials" entirely. Feel free to agree/disagree/discuss.

Also, here's one past discussion about this, between Wonderfool and myself:

--Daniel Carrero (talk) 06:12, 17 April 2017 (UTC)

I added this vote in the list. It's going to start in 7 days. --Daniel Carrero (talk) 10:41, 22 April 2017 (UTC)

Please vote

Some votes are going to end pretty soon, and have 5 participants at most. Please vote before they end. It goes without saying that abstaining is fine too. Thanks in advance. --Daniel Carrero (talk) 02:05, 19 April 2017 (UTC)

Where does one file a complaint against an Admin?

Hi all, I've been clicking around but I can't find where to signal abuse of admin powers or unwiktionarianlike behaviour by an admin. Can someone point me in the right direction? Thanks. Great floors (talk) 08:35, 19 April 2017 (UTC)

Just post it here. Equinox 08:41, 19 April 2017 (UTC)
As Equinox has said, this is a good place, or you can bring it up directly with the person on their talk page (if appropriate). If you want to maintain anonymity you can email the OTRS service (info-en(a)wiktionary.org or see "Contact us" page on the left) and one of the volunteers there will look into the matter for/with you. - TheDaveRoss 11:02, 19 April 2017 (UTC)
Ach, don't encourage people to send even more spam our way! This user was reverted for adding a noun definition to an adjective, has been rather defensive about it, and wants to waste everyone else's time. I'd recommend not engaging any further. —Μετάknowledgediscuss/deeds 16:24, 19 April 2017 (UTC)
Even if this particular person is not in the right, this conversation might be the one which helps some future user find the right place to seek help. - TheDaveRoss 17:57, 19 April 2017 (UTC)


Without any discussion whatever, G23r0f0i has taken it upon themself to remove ===Statistics=== sections and {{rank}} templates from English entries. I don't have any particular interest in rankings either way, but I do think widespread removal of a feature we've had for a long time needs to be discussed before it goes any further. —Aɴɢʀ (talk) 20:37, 19 April 2017 (UTC)

Seemingly, this user has a history of removing stuff like that. See their talk page. PseudoSkull (talk) 20:39, 19 April 2017 (UTC)
Sorry about that, I was removing too many in the same spurt. I should've carried on gradually removing them among other edits, hiding the deletion to avoid detection. It's just a bugbear I have, especially with blatant crap like having the rank template on the page def. And having on the page 4? Really? Useless!--G23r0f0i (talk) 20:43, 19 April 2017 (UTC)
I agree that there should be some discussion, I also agree that they should be removed. The source of the material was not well sanitized (as can be seen by the rank of "Gutenberg") which makes it not only obsolete but also misleading. We should remove the rank data until and unless we can create something which is meaningful. - TheDaveRoss 20:44, 19 April 2017 (UTC)
I don't know about you, but I use the word Gutenberg nearly as often as the. --WikiTiki89 21:02, 19 April 2017 (UTC)
Of course, just not in mixed company. - TheDaveRoss 21:32, 19 April 2017 (UTC)
@G23r0f0i: If you think that Template:rank should be deleted, then propose that. But don't just remove it haphazardly: that defeats the purpose of a deletion discussion and it also makes it impossible for other users to see how a template is actually being used. —Justin (koavf)TCM 22:49, 21 April 2017 (UTC)
It has been proposed, a long time ago. So far a supermajority of people contributing to the RFD support deletion, but it's never been closed. —Μετάknowledgediscuss/deeds 23:05, 23 April 2017 (UTC)
Maybe the user was being facetious, but hiding removal of the template would be worse than performing edits that are more clearly a removal of the template. Fortunately, the user is now blocked, so this proposed action will not happen. — Eru·tuon 22:02, 24 April 2017 (UTC)
If we do decide to include rankings, we should at least automate them in Module:Rankings or some other centralized place because word frequencies change over time. —Aryamanarora (मुझसे बात करो) 22:31, 21 April 2017 (UTC)
@Aryamanarora: I agree. It would be so much easier to generate them with a module than to manually write them out as is currently done. — Eru·tuon 21:52, 24 April 2017 (UTC)
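
As a rough illustration of the idea in the two comments above, ranks could be derived from a single frequency table instead of being written out by hand in each entry. The sketch below is in Python for readability (on-wiki this would presumably live in a Lua module); the function name and the sample counts are hypothetical and not taken from any existing module or corpus.

```python
from typing import Dict


def rank_words(frequencies: Dict[str, int]) -> Dict[str, int]:
    """Return a 1-based rank for each word, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _count) in enumerate(ordered, start=1)}


# Hypothetical counts, for illustration only.
sample_counts = {"the": 100, "of": 60, "and": 50}
ranks = rank_words(sample_counts)
```

With this approach, updating the frequency table in one place would update every displayed rank, which is the maintenance benefit being argued for.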

User:CodeCat for admin again.

She seems to need admin tools still, and heck, I don't see why not. PseudoSkull (talk) 00:57, 22 April 2017 (UTC)

I'm in favor as long as (1) User:Wyang is also re-sysopped, and (2) the two of them agree not to wheel-war with each other anymore. —Aɴɢʀ (talk) 10:17, 22 April 2017 (UTC)
No offence, but I do not think CodeCat has the right temperament to be an admin; case-in-point, the example above. She's very "delete first, f**k you and your questions later". --Victar (talk) 16:06, 22 April 2017 (UTC)
I like her shake-things-up attitude, I think her contributions are largely for the better, and her not having the sysop tools is a waste of time. I agree with Angr's caveats, however. --Barytonesis (talk) 18:07, 22 April 2017 (UTC)
@Barytonesis: Do you have a previous account? Your account looks to be only a few months old. --Victar (talk) 18:15, 22 April 2017 (UTC)
Yes. Why? --Barytonesis (talk) 18:17, 22 April 2017 (UTC)
Well, obviously the quality of your opinion is only worth your experience on Wiktionary. --Victar (talk) 18:19, 22 April 2017 (UTC)
As far as I'm concerned, both CodeCat and Wyang are still admins until there's a vote to change that. The reason they're not allowed to act as admins is because 1) reinstating just one would be taking sides and 2) I don't have a commitment from both that neither will resume their side of the wheel war. In other words, Angr's statement is basically what I've said before and will continue to say until either the community decides otherwise or the dispute is resolved. Chuck Entz (talk) 19:29, 22 April 2017 (UTC)
It's important to the project that both of them have their admin tools, and we should restore them at once. If problems crop up again in the future, it is a simple matter to remove the admin bits. There is no risk in this, so we should restore their powers now. —Stephen (Talk) 14:07, 23 April 2017 (UTC)
What will happen if I restore Module:links to its correct state and Wyang undoes it again, like before? Which version has consensus? This needs to be sorted. —CodeCat 14:40, 23 April 2017 (UTC)
Wyang has still been breaking modules though, and having to fix them when someone complains. DonnanZ (talk) 15:34, 23 April 2017 (UTC)
You could start a vote to decide whether phonetic_extraction should or shouldn't be present in the links code. You will have to re-explain your views as to what's going on and why it should go; I don't recall what the argument was any more. I do think it would be helpful if you agree not to remove phonetic_extraction from Module:links, and Wyang agrees not to add it back in if someone else removes it. This is pretty much what Chuck is asking for. Benwing2 (talk) 16:11, 23 April 2017 (UTC)
There you have it, Stephen. You say 'there's no risk in it' but CodeCat's first question is basically an announcement of seamlessly continuing the edit war given the chance. Which isn't surprising, since the matter was never settled, nor did any of them concede. They were just stopped from acting. Those knowledgeable [read this as: Other people than CodeCat and Wyang] must decide to handle those Asian languages one way and not the other, full stop, before either can be reällowed to use the admin tools or we'll have an eternally repeating history. Korn [kʰũːɘ̃n] (talk) 10:59, 25 April 2017 (UTC)
When I said no risk in it, I meant that any files they edit can be returned to a previous state if we don't like the edits, and it only takes a few seconds to do it. It's not as though they could actually do any damage. I don't see CodeCat's question as a threat ... I think it's a good question. It's been over eight months since the wheelwars and I have forgotten what the dispute was. Maybe someone could explain the disagreement and the community could make a decision. —Stephen (Talk) 22:47, 27 April 2017 (UTC)
Two thoughts: 1. Yes, anything they change can be returned to a previous state. Which is what they did. Constantly. We call that an edit war. 2. Frankly, as much as I agree that CodeCat is useful as an admin and stellar as an editor, basically everything a user could aspire to be here, the hardcore stubbornness and willingness to do the very thing for which to stop she was invested with the tools isn't really an advertisement of the diplomatic skills and insight I as a user would hope to find in those in positions of judgemental power. If we solve the issue at hand, we've probably not solved the actual issue. Sorry, CodeCat. Korn [kʰũːɘ̃n] (talk) 10:50, 28 April 2017 (UTC)


This has been discussed briefly at Wiktionary:Tea room#Category:en:Scouting, but something must seriously be done. There are so many new scouting terms I and other users have added, and I believe it deserves a category. What I propose we do is make it so that when the label "scouting" is used in Template:lb, it automatically adds it to Category:en:Scouting. Examples of recent entries include merit badge (and I still can't believe this wasn't already here as it's such a well-known scouting concept), merit badge university, Philmont, Philmonter, Eagle, Venturer Scout, Totin' Chit, MBU, PTC, minibear, merit badging, and many more and there are still a lot more to cover. PseudoSkull (talk) 00:57, 22 April 2017 (UTC)

I don't know how/where to do it, or I would. Equinox 16:27, 22 April 2017 (UTC)
First, you need to decide which parent category it would best be placed under. Then, go to that category, click the "Edit category data" button, and in the module code, insert a definition for scouting (the list is alphabetical). —CodeCat 17:49, 22 April 2017 (UTC)
@PseudoSkull, Equinox, CodeCat: Unless I'm mistaken, I believe this and this have instituted what is desired. — I.S.M.E.T.A. 11:26, 25 April 2017 (UTC)
If it works, then I'd say so. But why the capital letter? —CodeCat 13:16, 25 April 2017 (UTC)
@CodeCat: I took my lead from the word's capitalisation in the label for the definition of merit badge. — I.S.M.E.T.A. 13:31, 25 April 2017 (UTC)
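
For readers unfamiliar with how a label can feed a category, here is a minimal sketch of the behaviour requested above: a context label is looked up in a table, and any match yields a topical category for the entry's language. This is Python for illustration only; the real logic sits in Lua modules whose structure is not shown in this discussion, so the table name, the function name and the data layout are all assumptions.

```python
from typing import Dict, List

# Hypothetical label-to-topic table; only the "scouting" row comes from the discussion.
LABEL_TOPICS: Dict[str, str] = {
    "scouting": "Scouting",
}


def categories_for_labels(lang_code: str, labels: List[str]) -> List[str]:
    """Return the topical categories implied by the given context labels."""
    return [
        f"Category:{lang_code}:{LABEL_TOPICS[label]}"
        for label in labels
        if label in LABEL_TOPICS
    ]


# An English sense labelled "scouting" and "informal" would get one category.
result = categories_for_labels("en", ["scouting", "informal"])
```

Labels that have no entry in the table, such as "informal" here, simply add no topical category.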

Category:English clippings & Category:English short forms

What's the difference? "short form" is given as a synonym of "clipping"... --Barytonesis (talk) 23:55, 22 April 2017 (UTC)

@Barytonesis: Short form sounds more professional than clipping to me. — I.S.M.E.T.A. 11:30, 25 April 2017 (UTC)
@I'm so meta even this acronym: Clipping is the linguistic technical term, though. —Aɴɢʀ (talk) 11:36, 25 April 2017 (UTC)
@Angr: It sounds to me like clipping is the process which creates short forms. — I.S.M.E.T.A. 12:26, 25 April 2017 (UTC)
The term refers both to the process (as a mass noun) and to the form so created (as a count noun). See the first sentence of the second paragraph of the Wikipedia article, for example. —Aɴɢʀ (talk) 12:41, 25 April 2017 (UTC)
@Angr: *grumbles* Fine, I suppose. I can get with that programme. Clipping it is. — I.S.M.E.T.A. 12:50, 25 April 2017 (UTC)
@I'm so meta even this acronym: Although I disagree with you that "short form sounds more professional" and would argue precisely the opposite, I don't really care one way or the other. My concern is over why we have two categories for the same thing. --Barytonesis (talk) 12:41, 26 April 2017 (UTC)
"Short form" is a generic name for anything that's a shorter form. Not all short forms are clippings. Not all clippings are short forms. In some languages, like Russian, "short form" has a specific meaning in the morphology of adjectives (and it's not a clipping, but rather the lack of a suffix in the first place). We should probably get rid of the short forms category altogether, because it doesn't really mean anything useful. --WikiTiki89 14:12, 26 April 2017 (UTC)

IPA character replacements by NadandoBot

This bot seems to be responsible for replacing r with ɹ in every case. Is there any genuine reason for this, or is it a whim of the bot operator? One example is here. DonnanZ (talk) 08:20, 23 April 2017 (UTC)

As long as it's only replacing r with ɹ in English-language sections it's okay. Our convention is to use /ɹ/ for English; see Appendix:English pronunciation. —Aɴɢʀ (talk) 13:39, 23 April 2017 (UTC)
I can't see anything in that reference to indicate why. Other references like Wikipedia, Oxford and Cambridge never use ɹ, so why is Wiktionary the odd one out? DonnanZ (talk) 13:56, 23 April 2017 (UTC)
@Donnanz: Primarily because of Wiktionary:Votes/2008-01/IPA for English r. —Aɴɢʀ (talk) 14:49, 23 April 2017 (UTC)
Ah, I wasn't a registered user then, and a lot of the supporters of that vote aren't around any more. Maybe that vote should be rerun. There's one easy solution: don't enter any IPA with r in it, that way nothing will be altered. DonnanZ (talk) 15:00, 23 April 2017 (UTC)
In fact you should probably never edit any pages at all since your contributions can be changed by anyone at any time (maybe there's a name for this type of website?). DTLHS (talk) 15:46, 23 April 2017 (UTC)
I'm not bothered by that, and my edits are rarely reverted. But one can't be forced to do something in the knowledge that it will be undone by a bot, and of course you run the bot in question. DonnanZ (talk) 16:18, 23 April 2017 (UTC)
Nobody's forcing you to do anything, but all Wiktionarians share in a mutual agreement to follow consensus as determined by votes. If you don't like the vote, feel free to start a new one to repeal it. —Μετάknowledgediscuss/deeds 17:23, 23 April 2017 (UTC)
Almost all Wiktionarians, MK...--WF April 2017 (talk) 22:18, 23 April 2017 (UTC)
Maybe, maybe not. I'm not exactly getting any support for this. DonnanZ (talk) 10:33, 24 April 2017 (UTC)
For what it's worth, I'd be for using the approximant symbol /ɹ/ if it came up for vote again. The use of the generic trill symbol /r/ by most English dictionaries annoys me. — Eru·tuon 22:45, 24 April 2017 (UTC)
Is there any difference in pronunciation? DonnanZ (talk) 08:17, 25 April 2017 (UTC)
Strictly speaking, /r/ stands for a trill, like the Italian r or the Spanish rr, while /ɹ/ stands for the usual English r. However, most English-language dictionaries and phonology reference works use /r/, because it's easier on both typesetters and readers and because the two sounds do not contrast in English. (Some varieties of English, particularly some varieties of Scottish English, do use the trill.) If we were a monolingual English dictionary, I'd be in favor of /r/, but because we're a multilingual dictionary I think it's preferable for us to use /ɹ/ for English and /r/ for languages that actually have a trill. —Aɴɢʀ (talk) 11:09, 25 April 2017 (UTC)
@Donnanz: You can listen to that difference in pronunciation in these recordings for [ra ara] and [ɹa aɹa]. I agree with Erutuon and Aɴɢʀ here. — I.S.M.E.T.A. 11:40, 25 April 2017 (UTC)
Thankyou, but unfortunately I can't hear the audio on either recording for some reason. DonnanZ (talk) 13:26, 25 April 2017 (UTC)
@Donnanz: OK. Try this YouTube video instead. — I.S.M.E.T.A. 13:34, 25 April 2017 (UTC)
No problem with that, thankyou. DonnanZ (talk) 13:50, 25 April 2017 (UTC)
@Donnanz: You’re welcome. Can you hear the difference between them? And, if so, does it change your opinion on this at all? — I.S.M.E.T.A. 22:23, 25 April 2017 (UTC)
I must admit it has. I didn't connect with the meaning of "trill" before. I thought it might be something to do with rolling of Rs, which incidentally I do slightly, being a Southlander by birth. DonnanZ (talk) 22:53, 25 April 2017 (UTC)
@Donnanz: I see. So the phone might really be [r] in your idiolect. Perhaps it's Southland's populace's Scottish heritage. — I.S.M.E.T.A. 23:01, 25 April 2017 (UTC)
@Donnanz: Trill is what I'd mean when saying "rolled r", but I'm not sure what a Southland rolled r is, articulatorily speaking; I'm not sure if I've ever heard it. Would you by any chance be able to find a recording of it? — Eru·tuon 23:07, 25 April 2017 (UTC)
It's more of a drawn-out r, not a trill, some of my relatives were very strong with rolled Rs. Yes, both Otago and Southland have a strong Scottish heritage, but there were Irish and English settlers too, my great-grandfather was born in Stoke-on-Trent. DonnanZ (talk) 23:18, 25 April 2017 (UTC)
It's amazing to me that you're arguing for one over the other without knowing what the difference is! Ƿidsiþ 11:49, 25 April 2017 (UTC)
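
The restriction mentioned earlier in this thread (replace r with ɹ only in English-language sections) can be sketched roughly as follows. This is not NadandoBot's actual code. It assumes, for illustration, that language sections are level-2 headings, that pronunciations sit in {{IPA|...}} templates, and that only parameters starting with "/" or "[" are transcriptions; the function names are hypothetical.

```python
import re

# Level-2 headings such as "==English==" (deeper headings like "===Pronunciation===" do not match).
LANG_HEADING = re.compile(r"^==([^=]+)==[ \t]*$", re.MULTILINE)
# A simple, non-nested {{IPA|...}} template.
IPA_TEMPLATE = re.compile(r"\{\{IPA\|[^{}]*\}\}")


def _fix_template(match):
    """Replace r with ɹ in transcription parameters only, leaving other parameters alone."""
    params = match.group(0)[2:-2].split("|")
    fixed = [p.replace("r", "ɹ") if p.startswith(("/", "[")) else p for p in params]
    return "{{" + "|".join(fixed) + "}}"


def fix_english_r(wikitext):
    """Apply the r -> ɹ replacement inside the English section and nowhere else."""
    # re.split with one capture group yields [preamble, lang1, body1, lang2, body2, ...].
    parts = LANG_HEADING.split(wikitext)
    out = [parts[0]]
    for i in range(1, len(parts), 2):
        lang, body = parts[i], parts[i + 1]
        if lang.strip() == "English":
            body = IPA_TEMPLATE.sub(_fix_template, body)
        out.append(f"=={lang}=={body}")
    return "".join(out)


sample = (
    "==English==\n===Pronunciation===\n* {{IPA|en|/rɛd/}}\n\n"
    "==Italian==\n===Pronunciation===\n* {{IPA|it|/ˈrosso/}}\n"
)
fixed_sample = fix_english_r(sample)
```

In the sample, the English transcription is changed while the Italian one, which does have a trill, is left untouched. That is the distinction the multilingual-dictionary argument above relies on.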

Category:Russian words suffixed with -∅ and the "null suffix"

@Atitarev, Cinemantique, Wikitiki89, Wanjuscha, KoreanQuoter What should we do with this category and the "null suffix" that it implies? There are many nouns in Russian that appear to be formed by taking the root of a verb and converting it directly into a masculine noun, without any apparent suffix (or alternatively, with a null suffix). In Proto-Slavic, these words would have normally had the suffix -ъ, but this no longer corresponds to anything in Russian. Cf. взгляд (vzgljad, glance), apparently derived from the root of взгляну́ть (vzgljanútʹ, to glance (at)), where -ну́ть (-nútʹ) is a verbal suffix that in most cases suppresses a preceding consonant (in this case, d), and the root взгляд- (vzgljad-) with the d is found in other derivatives such as взгля́дывать (vzgljádyvatʹ). In this case we could probably indicate that it comes from a Proto-Slavic word *vъzględъ, but there are also productive formations that don't go back to Proto-Slavic, e.g. водово́д (vodovód, aqueduct), clearly a calque on the Latin word and transparently composed of водо- (vodo-) (combining form of вода́ (vodá, water)) and -вод (-vod) (action noun derived from the verb води́ть (vodítʹ, to lead)). User:D1gggg has added many etymologies to such words; these are somewhat OK and somewhat broken and I'm trying to fix them up but I'm not quite sure how to proceed. Benwing2 (talk) 17:31, 23 April 2017 (UTC)

Oops, I see this was brought up only a week ago. But there was no resolution then. My main question is what's the proper way of formatting such etymologies. Benwing2 (talk) 17:33, 23 April 2017 (UTC)

Null suffix is:

  1. Almost not related to etymology at all
  2. Has little to nothing to do with Proto-Slavic, but was invented by some linguist (AFAIK)
  3. Taught in regular schools in grades 1-3
  4. Used by professional linguists: "стирай (нулевой)" by Krylova Maria Nikolaevna, PhD in Philological Science, Assistant Professor of the Professional Pedagogy and Foreign Languages Department
  5. Has been in use for decades and (maybe) for centuries.

Anyone who claims non-existence of null suffixation is illiterate according to the standard of the language. d1g (talk) 17:42, 23 April 2017 (UTC)

Null suffixes aren't controversial among linguists. The question is how best to handle them in our etymologies (which include word formation). Benwing2 (talk) 18:33, 23 April 2017 (UTC)
My suggestion is to use common sense and call things by their names: "word by etymology", "words by morpheme/allomorphs" (or lemmas or something else). d1g (talk) 19:08, 23 April 2017 (UTC)
Null suffix is the right concept but there is nothing useful in explicitly showing it. The lack of anything after the stem is enough.--Anatoli T. (обсудить/вклад) 22:21, 23 April 2017 (UTC)
But how do you write this in an etymology using {{affix}}? —CodeCat 19:57, 24 April 2017 (UTC)
Why do you have to use {{affix}}? --WikiTiki89 20:06, 24 April 2017 (UTC)
How else? —CodeCat 20:26, 24 April 2017 (UTC)
"From вода́ (vodá, water) and the root of води́ть (vodítʹ, to lead)", for example. —Aɴɢʀ (talk) 20:30, 24 April 2017 (UTC)
That etymology doesn't really tell me much. The etymology should look like that of Dutch ademen, with the base word and the affix used to derive it, both in their lemma form. If the affix has no orthographic representation in its lemma form, -∅ should do just fine to indicate this. —CodeCat 20:34, 24 April 2017 (UTC)
I disagree. We shouldn't give inflection suffixes such as Dutch -en in derivations. --WikiTiki89 20:37, 24 April 2017 (UTC)
It's not an inflectional suffix, it's a derivational suffix. They have separate sections on the page for -en. Words derived with -en follow a particular inflectional class, which clearly not all verbs with the infinitive ending -en follow. This is analogous to e.g. Latin , which derives only first conjugation verbs. —CodeCat 20:43, 24 April 2017 (UTC)
More accurately, you are deriving a weak verb stem from an adjective or noun stem (without any alterations), and the -en happens to be the inflectional suffix of the lemma form of this new weak verb. --WikiTiki89 20:57, 24 April 2017 (UTC)
Yes, just as Latin derives a verb stem whose lemma form has the ending and that inflects as a first conjugation verb. In Dutch, verbs of all classes have the infinitive ending -en, but this is a historical accident caused by the merging of unstressed vowels. Just as historical accident caused former *-āō to become , the same ending as third conjugation verbs. —CodeCat 21:02, 24 April 2017 (UTC)
My point is the verb stem is created without any affixation, and so there is no derivational suffix. Then is added as an inflectional suffix to form the first person singular present, but that shouldn't be part of the etymology. --WikiTiki89 21:07, 24 April 2017 (UTC)
We indicate the combination of null suffix + lemma ending as -en. Likewise, the combination of null suffix + null lemma ending should be -∅. —CodeCat 21:20, 24 April 2017 (UTC)
That's exactly what I'm saying we shouldn't do. Are we clear now that our disagreement is real and not a miscommunication? --WikiTiki89 21:24, 24 April 2017 (UTC)
Our practice has always been to show the morphological derivation of a word. A null suffix is also part of that derivation. Consider a hypothetical case in which a Latin second declension noun is derived from a first declension noun. We'd denote this with + -us. Now, if in another case, a lemma without an inflectional ending is derived from a lemma with one, then you'd denote this in the same way, by showing the suffix + ending as usual. If both are null, then you need another way to indicate it; -∅ seems like a good way to do so. —CodeCat 21:30, 24 April 2017 (UTC)
Actually "our practice" has always been inconsistent about this. In the hypothetical case that a Latin second declension noun is derived from a first declension noun, it would be more useful to say exactly that (something like "from the first-declension noun X, converted to the second declension") than to say "X + -us". In the other hypothetical case where a lemma without an inflectional ending is derived from a lemma with one, the same applies, the inflectional endings should be ignored in the derivation. --WikiTiki89 21:39, 24 April 2017 (UTC)
As a linguist, I don't doubt the existence of null morphemes for a moment, but as a lexicographer, I do doubt the usefulness of showing them in etymology sections of a dictionary. I don't think our readers will benefit from being told that the plural deer is formed from the singular as deer + ∅ or that the past participle run is formed from the infinitive as run + ∅. —Aɴɢʀ (talk) 20:33, 24 April 2017 (UTC)
We don't indicate etymologies for inflections. —CodeCat 20:34, 24 April 2017 (UTC)
Fair enough. I don't think our readers will benefit from being told that the verb dog is formed from the noun as dog + ∅ or that the noun break is formed from the verb as break + ∅. —Aɴɢʀ (talk) 21:00, 24 April 2017 (UTC)
I don't know. I'm not sure how it should be indicated in etymologies, or how the categories should be structured or named, but it is unfortunate that terms that are derived from another part of speech without the addition of a morpheme are currently not categorized in any way. It's another derivational process, and it should be recognized for the sake of completeness. — Eru·tuon 00:08, 25 April 2017 (UTC)
I think I agree with CodeCat and Erutuon that we should maybe show the null morpheme -∅ in derivations. It's definitely a derivational process and IMO the categories are useful. Benwing2 (talk) 01:10, 25 April 2017 (UTC)
I feel more or less as Angr does. Ƿidsiþ 05:03, 29 April 2017 (UTC)
In взлёт etymology, I see взлете́ть (vzletétʹ) + -∅. That does not look like a true null suffix to me. In fact, the stem of взлете́ть (vzletétʹ) had to be modified to produce взлёт. Similarly, in Czech, nákup is probably derived from nakoupit but I would be surprised to see nákup etymology specified as nakoupit + -∅. The derivational process involved does not look like suffixation. Not all derivational processes consist in extension of something; blending, yielding e.g. smog, is an example. For Czech, I will oppose the use of -∅ for such a purpose until I find some convincing arguments. For Russian, if at least a slight majority of Russian contributors opt to oppose the use of "+ -∅", I will add my voice to them.
Can anyone recommend some good reading on "null suffix", written in English? I am not impressed by "Philological Science" (an oxymoron?) and credentialism. --Dan Polansky (talk) 10:11, 29 April 2017 (UTC)

Template:not a morph

@Atitarev, Cinemantique, Wikitiki89, Wanjuscha, KoreanQuoter This was created by User:D1gggg. It originally just said "not a morph" and now says "unlisted as a morpheme" (i.e. in a major reference book of Russian grammar that D1gggg likes to cite), which is more accurate, but I think this entire template is unhelpful and unnecessary. Opinions? Benwing2 (talk) 19:00, 23 April 2017 (UTC)

Orphan and delete.--Anatoli T. (обсудить/вклад) 22:12, 23 April 2017 (UTC)

Oldest quotation in Wiktionary, aka elephant poop

While doing my once-every-ten-year common-misspelling Wiktionary purge, I came across a quote from around 2 millenniums BCE at 𒄠𒋛. It brought me to the obvious question...do we have quotes that date back further than that? In a way, I kinda hope not, coz it would be cool if our record-breaking quote is all about elephant poop. --WF April 2017 (talk) 21:49, 23 April 2017 (UTC)

Wow, that's hilarious. I agree, it is a great quote to have as the earliest one on Wiktionary. — Eru·tuon 22:08, 23 April 2017 (UTC)
Methinks a certain person is back. DonnanZ (talk) 22:35, 23 April 2017 (UTC)
You don't say? Here I was thinking his username referred to Wallis and Futuna. —Μετάknowledgediscuss/deeds 23:04, 23 April 2017 (UTC)
LOL, he even welcomed himself on his talk page... Andrew Sheedy (talk) 23:40, 23 April 2017 (UTC)
  • Someone could easily find an older one for Sumerian, but not me. —Μετάknowledgediscuss/deeds 23:04, 23 April 2017 (UTC)
    Good point. I was thinking, isn't Sumerian older than Akkadian? — Eru·tuon 23:57, 23 April 2017 (UTC)
    • @Erutuon: Writing-wise, yes. The Akkadians adopted Sumerian characters. That is the oldest writing system and the ancestor of all current writing systems other than the Chinese and possibly some deliberate scripts from recent times (that is, for conlangs). —Justin (koavf)TCM 21:35, 24 April 2017 (UTC)
I just added the following quote, I doubt anyone will get anything older. - TheDaveRoss 12:26, 24 April 2017 (UTC)
  • 13.82b BCE - The Universe
Possibly , as in "Russian words suffixed with -∅"... Equinox 14:16, 24 April 2017 (UTC)
The 2nd millennium BC is of course a very wide range. The Rig Veda and Hittite inscriptions are also from that millennium. —Aɴɢʀ (talk) 14:36, 24 April 2017 (UTC)
13.82b BCE is before the Latin alphabet was adapted to write English, so I doubt that was the original script. --WikiTiki89 14:49, 24 April 2017 (UTC)

Hmm, I'd bet some of our Old Persian/Avestan/Sanskrit stuff approaches that. —Aryamanarora (मुझसे बात करो) 01:08, 25 April 2017 (UTC)

Having looked up some stuff, the Avestan Gathas are from 1200 BC, the Old Persian Darius inscriptions are from 550 BC, and the Vedas are 1700-400 BC... so I think there's some competition. —Aryamanarora (मुझसे बात करो) 01:08, 25 April 2017 (UTC)
Our Ugaritic quotations are c. 1400-1200 BCE. I'm sure we could find Akkadian, Sumerian, or Egyptian quotations from the 3rd millennium BCE. And maybe even an Egyptian quotation from the 4th millennium. --WikiTiki89 14:24, 25 April 2017 (UTC)

Dialect labels

I've thought further about how to clean up our system of several different templates that create labels ({{label}}, {{term-label}}, {{accent}}, {{qualifier}}, {{alter}}). Besides the details of the way these templates display, the relevant difference is that {{label}} and {{term-label}} are used in definitions and headwords, and they frequently add categories, while the others do not. But all of these labels templates often refer to language varieties.

So, I propose that we have some centralized module for language varieties, and that we allow all these templates to refer to it. Currently, we have Module:labels/data/subvarieties, and all the dialect modules, like Module:en:Dialects (which are only used to label alternative terms).

Perhaps what would be easiest is to allow {{qualifier}} to use the labels in Module:labels/data/subvarieties, while, of course, not adding categories. Then, if you put the name of a language subvariety in {{qualifier}}, it can be automatically linked to a Wikipedia article.

This would, however, require that we add language codes to instances of {{qualifier}}. That way, the module can check that, for instance, the "Australian English" label is being used in an English entry, the language to which that language variety belongs. If no language code is provided in the template, then the language variety labels module will not be used.

Categories should not be added because {{qualifier}} is used in lists of terms related to the current entry somehow (for instance, Derived terms and Synonyms). If a term has a synonym in another variety of the language, it does not mean that the term's entry should be categorized under that variety.

I am not sure if I have the energy to put this idea into practice right now, but there it is. — Eru·tuon 02:16, 25 April 2017 (UTC)
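The lookup proposed above could be sketched roughly as follows. This is an illustration in Python only (the real data lives in Lua modules such as Module:labels/data/subvarieties); the table contents, the function name `format_qualifier`, and the choice to fall back to plain text when the language does not match are all assumptions made for the sketch, not existing behaviour.

```python
# Hypothetical stand-in for the variety data; entries are illustrative only.
# Each variety records the language code it belongs to and a Wikipedia title.
SUBVARIETIES = {
    "Australian English": {"lang": "en", "wikipedia": "Australian English"},
    "US": {"lang": "en", "wikipedia": "American English"},
}


def format_qualifier(text, lang=None):
    """Render qualifier text, linking it only if it names a variety of `lang`.

    No categories are ever produced: the qualifier describes some other term
    (a synonym, a derived term), not the entry it appears in.
    """
    entry = SUBVARIETIES.get(text)
    # Without a language code, or with a mismatched one, stay free-form.
    if lang is None or entry is None or entry["lang"] != lang:
        return f"({text})"
    return f"([[w:{entry['wikipedia']}|{text}]])"
```

Under these assumptions, a qualifier without a language code behaves as it does today, so existing uses would not need to change.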

  • {{label}} has been generally replaced by {{lb}}, and performs a different function to {{qualifier}} or {{q}}. DonnanZ (talk) 08:22, 25 April 2017 (UTC)
    • Right, {{lb}} is the abbreviated form of {{label}}. I am aware that {{label}} is used in different parts of entries. I have outlined the purposes of these templates in User:Erutuon/labels. — Eru·tuon 09:19, 25 April 2017 (UTC)
    • The area where the purposes of {{label}} and {{qualifier}} intersect is in giving a dialect, variety, or sociolect. {{label}} gives a dialect in the context of definitions, {{qualifier}} elsewhere. See, for instance, Wikisaurus:bathroom, which tags synonyms of the word using {{qualifier}}. And see pissed, which uses {{label}} to indicate which dialects the two senses are found in. Both templates should draw from the same label data, since they are identical apart from what situation they are found in. {{qualifier}} need not be free-form; it would be better to standardize it like {{label}}. — Eru·tuon 09:39, 25 April 2017 (UTC)
    • To give an example, the entry for pissed might contain the definition {{lb|en|US}} [[angry]], while the entry for angry might contain, in its Synonyms section, {{l|en|pissed}} {{qual|US}}. In this way, the two templates are related to each other and can contain similar content. — Eru·tuon 09:49, 25 April 2017 (UTC)
  • It would be good for general data consistency, but not sure if it adds a lot of benefits given the effort. I'm still hoping that at some point we can avoid passing redundant lang parameters everywhere, and maybe defer this decision until then. I'm cleaning up the mess from the {{qualifier}} template conversion; there does seem to be some confusion around the usage of the two templates. How about leaving {{qualifier}} freeform and adding a specialised {{dialect}} template? – Jberkel (talk) 09:16, 25 April 2017 (UTC)
    • I concur that {{dialect}} would be a better place for this. I like having {{qualifier}} be general purpose and with no-frills. —JohnC5 03:06, 27 April 2017 (UTC)
      • I sort of like the idea of having a dedicated template, but adding another qualifier template would also be a headache for editors. It would require us to go through thousands of entries replacing {{qualifier}} with {{dialect}} whenever it mentions language subvarieties. It would enforce a strict distinction: you can never use {{qualifier}} when mentioning a dialect name. And then you would have to choose between two templates specifically dedicated to language subvarieties: {{accent}}, which is used in pronunciation sections, and {{dialect}}, which is used everywhere else except in headwords and definitions. It seems to me much easier to just loop {{qualifier}} into the existing language subvariety labels module. Editors don't have to change their ways, but their chosen qualifiers will now be modified in the same way that content supplied to {{label}} is modified. (I suppose the only potential change is to require {{qualifier}} to be supplied with a language code, but that may not be necessary.) — Eru·tuon 03:42, 27 April 2017 (UTC)
  • I agree with making a data module specifically for varieties. Presumably, it would replace most of what is currently in Module:etymology languages. —CodeCat 13:21, 25 April 2017 (UTC)

Using Wikidata to store alphabets

This is a possible use case for Wikidata. @Lea Lacroix (WMDE), could you please check if Wikidata is able to do this?

This is the English alphabet:

  • Aa, Bb, Cc, Dd, Ee, Ff, Gg, Hh, Ii, Jj, Kk, Ll, Mm, Nn, Oo, Pp, Qq, Rr, Ss, Tt, Uu, Vv, Ww, Xx, Yy, Zz

I don't speak Turkish, but apparently this is the Turkish alphabet:

  • Aa, Bb, Cc, Çç, Dd, Ee, Ff, Gg, Ğğ, Hh, Iı, İi, Jj, Kk, Ll, Mm, Nn, Oo, Öö, Pp, Rr, Ss, Şş, Tt, Uu, Üü, Vv, Yy, Zz

Will we be able to store alphabets in Wikidata and query Wikidata for questions like this?

  • Is "H" in the English alphabet? (answer: yes)
  • What is the position of "H" in the English alphabet? (answer: 8th letter)
  • What is the 9th letter of the Turkish alphabet? (answer: capital Ğ, small ğ)
  • What are all the alphabets that use the letter "A"? (answer: a really large list of alphabets)
  • Can we store letter names too? I'd like Wikidata to answer this: What is the name of the letter "B" in English? (answer: "bee")

Thanks in advance. --Daniel Carrero (talk) 11:08, 25 April 2017 (UTC)
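To make the questions above concrete, here is a small sketch of the kind of ordered data that would answer them. It is plain Python over the two alphabets listed above, not a Wikidata query; the structure and all function names are hypothetical, and the only letter name stored is the one given in the question.

```python
import unicodedata

# The two alphabets exactly as listed above, as "upper+lower" pairs.
RAW_ALPHABETS = {
    "English": "Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz",
    "Turkish": "Aa Bb Cc Çç Dd Ee Ff Gg Ğğ Hh Iı İi Jj Kk Ll Mm Nn Oo Öö Pp Rr Ss Şş Tt Uu Üü Vv Yy Zz",
}


def _parse(raw):
    """Split the listing into ordered (capital, small) tuples."""
    pairs = []
    for chunk in raw.split():
        # Normalize so each accented letter is a single code point.
        chunk = unicodedata.normalize("NFC", chunk)
        pairs.append((chunk[0], chunk[1]))
    return pairs


ALPHABETS = {name: _parse(raw) for name, raw in RAW_ALPHABETS.items()}
LETTER_NAMES = {("English", "B"): "bee"}  # only the example from the question


def position(alphabet, letter):
    """1-based position of a letter (either case), or None if absent."""
    for i, pair in enumerate(ALPHABETS[alphabet], start=1):
        if letter in pair:
            return i
    return None


def contains(alphabet, letter):
    return position(alphabet, letter) is not None


def letter_at(alphabet, n):
    """The (capital, small) pair at 1-based position n."""
    return ALPHABETS[alphabet][n - 1]


def alphabets_using(letter):
    return [name for name in ALPHABETS if contains(name, letter)]
```

The point of the sketch is that order and membership have to be stored per alphabet rather than per letter: the same letter sits at different positions, or is missing entirely, depending on the alphabet.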

Note that there are pages for characters: d:Q9992, so at least part of the data should already be there. — Dakdada 12:36, 25 April 2017 (UTC)
Also for alphabets, with the order: d:Q754673. — Dakdada 12:42, 25 April 2017 (UTC)
Hello @Daniel Carrero and thanks for these interesting questions. As @Darkdadaah mentioned, items already exist for alphabet letters (maybe not all, and not perfectly described).
To describe the alphabet letters, as concepts, items and their statements work well. For example A (Q9659) has a statement "part of: English alphabet". This could probably be improved by adding a qualifier to indicate that it's the first letter of this alphabet. Then one could query the position of the letters and answer some of your questions.
"a" will also exist as a lexeme, several lexemes in that case, to describe the use of this letter as a word, in English, French, and other languages that use this word. In the lexeme "a (English)" one will then describe the lexical category, several forms and senses, that will answer other questions.
I tried a few queries and realized that a lot of letters are still missing or need to be better described in Wikidata: the list of letters that are part of the English alphabet is almost empty, while the list of letters that are part of the Latin script has far more results. It seems like work to do together with the Wiktionary and Wikidata communities :)
I hope this answers your questions. Lea Lacroix (WMDE) (talk) 10:15, 26 April 2017 (UTC)
Thank you for your response. :) I agree that it seems like a work to do together with Wiktionary and Wikidata community. --Daniel Carrero (talk) 21:27, 27 April 2017 (UTC)
@Lea Lacroix (WMDE), you will find all sorts of alphabetic information in Category:Script appendices. Some examples are: Appendix:Latin script/alphabets, Appendix:Cyrillic script, and Appendix:Arabic script. —Stephen (Talk) 22:31, 27 April 2017 (UTC)

Splitting WT:RFV

RFV is consistently so long as to be unwieldy to edit, just by dint of how long it takes to load. A good solution would be to split it into a page for English terms and a page for non-English terms (though I am not sure what those pages should be called). I was evidently not the first person to suggest this idea, but I'm bringing it here to encourage more discussion of it. —Μετάknowledgediscuss/deeds 07:26, 26 April 2017 (UTC)

Support - TheDaveRoss 11:58, 26 April 2017 (UTC)
Support (I have also indicated my support at "Wiktionary talk:Requests for verification"). — SMUconlaw (talk) 12:14, 26 April 2017 (UTC)
Support - I don't usually have issues with loading, but certainly if it helps out Leasnam (talk) 18:19, 26 April 2017 (UTC)
I wonder what the ratio of English to non-English would be. The problem is that some RFVs sit there for a long time, probably because no one knows how to confirm or reject them. DonnanZ (talk) 12:45, 26 April 2017 (UTC)
Another idea would be to highlight some of the oldest RFVs on the "Recent changes" page in the same way as wanted entries are. DonnanZ (talk) 13:10, 26 April 2017 (UTC)
I think I'd support this. It doesn't feel too hacky/arbitrary to split them, either, since English has special status on en.wikt. Equinox 13:53, 26 April 2017 (UTC)
Very much support this, that page is the length of a book at this point. — Kleio (t · c) 13:55, 26 April 2017 (UTC)
A few things I want to point out: If we split the pages, we would have to be stricter about enforcing language codes in the {{rfv}} and {{rfv-sense}} templates. Otherwise the automatically generated links from those templates (most importantly, the "+" link) will not work. Secondly, much of the slowness of page loading has to do with too much JavaScript. If we can reduce the amount of JavaScript that runs when you load a page, then problem solved. Other than that, I have nothing against splitting the page. --WikiTiki89 14:11, 26 April 2017 (UTC)
It would help if we could do something about abuses like WT:Requests for verification#Compounds with quis. This two-part section, by itself, is 3% of RFV. Its sheer volume is compounded by the fact that it's going to hang around longer because no one is going to read it. Perhaps we could banish it to a subpage with a TLDR message in its place? Chuck Entz (talk) 14:25, 26 April 2017 (UTC)
The fact that it's only 3% of RFV shows that it's not the real issue. I agree it's annoying though. Since it's Latin perhaps User:Metaknowledge is willing to take care of it. --WikiTiki89 14:29, 26 April 2017 (UTC)
Why not just close most of the unresolved RFV discussions as "failed"? The intro text says specifically: "After a discussion has sat for more than a month without being 'cited', or after a discussion has been 'cited' for more than a week without challenge, the discussion may be closed." The RFV discussions will be moved to the entries' talk pages, which will be accessible in case someone in the future wants to try attesting them. --Daniel Carrero (talk) 14:46, 26 April 2017 (UTC)
Because that's prioritising cleaning out RFV over making Wiktionary better. Every RFV'ed term deserves a serious attempt at being cited by someone who is competent with the language in question, and if that takes more than a month (because of people who speak more obscure languages being slow to respond to pings, etc), then so be it. If we close it as failed, nobody is likely to return to check if the term is okay for a very long time. (This is the same thing as terribly unformatted new entries by anons; we should fix them up or ping people who can rather than delete them outright because they broke all our layout rules.) —Μετάknowledgediscuss/deeds 18:12, 26 April 2017 (UTC)
  • @TheDaveRoss, Smuconlaw, Equinox, KIeio, Chuck Entz: How does WT:Requests for verification/English and WT:Requests for verification/Non-English sound? Who is willing to update {{rfv}} and {{rfv-sense}} and force them to take language codes? —Μετάknowledgediscuss/deeds 18:16, 26 April 2017 (UTC)
    • Keep the main page for non-English. Just like we have WT:Requests for deletion and WT:Requests for deletion/Other. —CodeCat 19:00, 26 April 2017 (UTC)
      • I would prefer having the main page serve as a directory pointing to the other two. Otherwise, the main page should be for English, while all other languages get a separate page. --WikiTiki89 19:08, 26 April 2017 (UTC)
        @Wikitiki89: Is there a way that we could ensure that everyone who watchlisted the old page can also automatically watchlist the new ones if it is split directory-style, by moving it? —Μετάknowledgediscuss/deeds 07:10, 27 April 2017 (UTC)
        @Metaknowledge: Yes, all you have to do is before you create the new pages, move the current RFV page to the two new pages' locations and then move it back. --WikiTiki89 11:45, 27 April 2017 (UTC)
        Should the non-English page be subdivided with headings for different languages, like ==Japanese==, ==Latin==, ==Thai==, and so on? As for updating {{rfv}} and {{rfv-sense}}, perhaps someone very experienced with templates like @Erutuon or @JohnC5 can assist. — SMUconlaw (talk) 15:28, 27 April 2017 (UTC)
        I think the non-English page should continue to function as RFV does now. It would be annoying to have to add a header for each individual language that has only one RFV'd term, even if it might be useful when a whole batch of terms is RFV'd from one language. --WikiTiki89 15:36, 27 April 2017 (UTC)
I like having everything on one page because it forces us to deal with the buildup of requests when the page gets too big. If we split the page I would hope it would not result in less attention to the non-English section. DTLHS (talk) 18:20, 26 April 2017 (UTC)
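The routing that {{rfv}} and {{rfv-sense}} would need after a split can be sketched as below. This is a Python illustration only (the templates themselves are wikitext); the two page names are the ones floated above, while the function name and the error behaviour are assumptions.

```python
# Page names as proposed in this thread ("WT:" is the shortcut for "Wiktionary:").
RFV_ENGLISH = "Wiktionary:Requests for verification/English"
RFV_NON_ENGLISH = "Wiktionary:Requests for verification/Non-English"


def rfv_discussion_link(term, lang_code):
    """Return a wikilink to the discussion section for `term`.

    A missing language code is treated as an error here, because without it
    the template cannot tell which of the two pages to link to.
    """
    if not lang_code:
        raise ValueError("a language code is required once RFV is split")
    page = RFV_ENGLISH if lang_code == "en" else RFV_NON_ENGLISH
    return f"[[{page}#{term}]]"
```

This is why stricter enforcement of language codes comes up as a precondition: with a single page the code is optional for linking, but with two pages it decides the link target.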
  • Support Quicker loading and saving would make the review/discussion part of the verification process easier to participate in and might speed resolution.
Also, strict time limits, eg, 3/4/6 months, for English RfVs are much easier to justify than the same limits for languages that have fewer contributors. DCDuring (talk) 18:46, 26 April 2017 (UTC)
@WikiTiki89: JavaScript shouldn't be the problem or not be the only problem - even when disabling it, the page takes some time to load.
@Chuck Entz: No, it's not abuse and the text shows that it's a justified RFV. Of course, maybe one could correct entries without having a RFV, but then guys like you could roll it back for no reason. Or one could put it shorter like just having the line "For the feminine quaequam and the plural", but then guys like you could complain that no reason is given or that the person maybe didn't search enough and ignore it. Your "because no one is going to read it" might be correct and the length might be a problem for some, but the length isn't the only problem and it's the lesser problem. Another problem is the language. WT:Requests for verification#quintus from 11th October 2016 is very short but about Latin too, and there was not a single reply. WT:RFV#emodulo and WT:RFV#camminus even are from May 2016 - so almost one year old - and yet there were only two replies. And as for emodulo, even though some cites were provided, it lacks an analysis or translation to see whether or not the cites attest a certain meaning.
@Daniel Carrero/Μετάknowledge: Technically Daniel Carrero is correct. Those old RFVs would be RFV failed (and are sometimes even marked as RFV failed although they aren't archived), even though the terms could exist. But a problem with Daniel Carrero's suggestion would be that it's not easy to find those old somewhat unresolved discussions. A third way (besides having it RFV failed and placing it somewhere where nobody can find it, and having it at RFV for too long) could be to have it as RFV failed but to collect those discussions elsewhere where one can find them. There could be a subpage like WT:RFV/input needed linked from WT:RFV and then maybe someone sees those old entries and can give some further input.
@DCDuring: Formally by the rules - obviously not factually by practice - for English (and all other languages) it's one month without a reply or having cites. A criterion to have it like 1 month for some and e.g. 3 months for other languages could be WT:WDL/WT:LDL. But even 3 months for a LDL might often not be enough, while 1 year might be too long even for a LDL.
@CodeCat: It should be better to have two subpages like Appendix talk:List of protologisms#non-English. WT:RFV should contain the rules and links to the subpages and the subpages should contain the RFVs.
- 19:24, 26 April 2017 (UTC)
I don't know if your Latin requests are appropriate for RFV. I think you could be more productive if you created an account and just started making changes. DTLHS (talk) 19:30, 26 April 2017 (UTC)
I'm not saying you shouldn't use rfv for these things, it's just that everything you post is literally ten times the size that even verbose people like me use to say the same thing. Giving us chapter and verse is bad enough, but you also seem to be throwing in several editions of the entire Bible, the kitchen sink and the chassis of a '57 Buick just to be on the safe side. It's worse than posting in Greek, because at least some of us can read Greek... Chuck Entz (talk) 04:29, 27 April 2017 (UTC)
  • Unlike some I find WT:RFV loads in an instant. But wading through the list is a different matter, and it needs to be shortened somehow. Splitting it can be tried, and if it doesn't work it can be reversed. DonnanZ (talk) 20:50, 26 April 2017 (UTC)
Support. Like Donnanz, I don't have a big problem with it loading, but it is a pain to scroll through, and I'm much less likely to look at older discussions if I can't participate in half of them. I share DTLHS's concern though. Andrew Sheedy (talk) 01:58, 27 April 2017 (UTC)
  • Oppose The splitting will do very little and will make it harder to add items since the adding person will have to pay attention to whether the item is English or non-English. --Dan Polansky (talk) 09:56, 29 April 2017 (UTC)


Are tweets acceptable sources of citations? I can't recall seeing any before, but they would be useful sources for primarily oral languages like Scots or Swiss German. Any ideas for how a tweet should be cited, wording-wise? Ƿidsiþ 14:07, 27 April 2017 (UTC)

Isn't the Library of Congress archiving them now? Makes the durability issues less relevant. - TheDaveRoss 14:16, 27 April 2017 (UTC)
Is the Library of Congress archiving all Tweets? Otherwise we need a way to tell which ones are archived, and whether they will continue to be archived if they are deleted from Twitter. --WikiTiki89 14:53, 27 April 2017 (UTC)
This is just a recollection I have of news from awhile back, not sure of any details. If they are archiving it would be especially great if we could link back to the archive rather than Twitter itself. - TheDaveRoss 15:03, 27 April 2017 (UTC)
Found this article. It seems we can't rely on it yet, but may be able to in the future. --WikiTiki89 15:23, 27 April 2017 (UTC)
How frustrating. It's an amazing source of data, but I can't see a solution to the problem that the original tweets might just be deleted. Ƿidsiþ 11:48, 28 April 2017 (UTC)
Can tweets be archived at the Internet Archive? — SMUconlaw (talk) 13:51, 28 April 2017 (UTC)
archive.org is not a reliable permanent archive, because a website can choose to have itself removed from it. --WikiTiki89 14:30, 28 April 2017 (UTC)

The strategy discussion. The Cycle 2 will start on May 5

Base (WMF) and SGrabarczuk (WMF) (talk) 16:07, 27 April 2017 (UTC)

erroneous entries apparently generated semiautomatically

Apparently User_talk:SemperBlotto/2015#aufbringen created some or many erroneous German entries, see also einbringen, using some kind of automated or semiautomatic process since his user page says de-1. We should probably check them all and strongly discourage similar activities. Incorrect entries are much more harmful and cause a very much bigger waste of time than missing entries or missing definitions. --Espoo (talk) 21:04, 28 April 2017 (UTC)

  • Both of those entries have entries in the German Wiktionary that are over ten years old. What's the problem? SemperBlotto (talk) 05:50, 29 April 2017 (UTC)

Words formed by respelling letters, e.g. deejay

Some words are formed by respelling letters: deejay, emcee, okay, Seabee. Is there a technical name for this? Equinox 19:22, 29 April 2017 (UTC)

Wikidata interwikis for categories, templates, project pages...

@Lea Lacroix (WMDE), could you please check this use case for Wikidata?

I noticed that Wikipedia has the ability to use Wikidata to store interwikis for a number of namespaces, including categories, templates and the (site name) namespaces.

It seems Wikibooks, Wikinews, Wikiquote, Wikisource, Wikiversity and Wikivoyage can do the same thing, too.

I suppose Wiktionary will be able to use Wikidata for interwikis in some namespaces too?

Here are some pages that could use it:

Thanks in advance. --Daniel Carrero (talk) 23:36, 29 April 2017 (UTC)

This would only be practical if it handles all categories of a certain type the same way. If Category:French nouns and Category:English nouns are stored independently, implying that there will be one "nouns" data item for every possible language, then there's not much point. Also, using equivalences for templates won't work very well. Not all Wiktionaries have a concept of a headword line, for example. So, for example, de.wiktionary has nothing like our {{de-noun}}. Contrarily, many Wiktionaries have templates for generating headers, like de.wiktionary's {{Worttrennung}}, which have no English equivalent. —CodeCat 23:48, 29 April 2017 (UTC)
What about Category:English language, which has 143 interwikis? It would be nice if we centralized them all in a single place, to avoid having to use bots to update all the interwikis in all Wiktionaries every time something changes -- as in, if another Wiktionary creates a category for "English language" or if some category gets renamed. Obviously, I'm not saying the idea above would work for all categories, templates and pages, but that's because some of these pages won't even have interwikis in the first place. Point taken, {{Worttrennung}} does not have an English equivalent, which means it won't have English interwikis, stored in Wikidata or not. For the pages that have interwikis, Wikidata would help a lot.
Ideally, for categories/templates/pages repeated in many or all Wiktionaries, we should have a system to predict patterns like "Category:(language name) nouns" and "Category:(language name) language", but this probably can't work because maybe not all Wiktionaries have categories/templates/pages with consistent names. Even the English Wiktionary has inconsistently named templates like Template:fa-interjection and Template:la-interj. --Daniel Carrero (talk) 02:30, 30 April 2017 (UTC)
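The pattern idea in the last two comments can be illustrated with a short sketch: recognise a page title as one instance of a general type, so that one data item could cover every language. This is a Python illustration under the assumption that titles are consistent; the pattern table and the `classify` function are hypothetical, and inconsistently named pages simply fail to match.

```python
import re

# Hypothetical patterns for two category types mentioned in this thread.
PATTERNS = {
    "nouns": re.compile(r"^Category:(?P<language>.+) nouns$"),
    "language": re.compile(r"^Category:(?P<language>.+) language$"),
}


def classify(title):
    """Return (kind, language name) if the title fits a known pattern."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(title)
        if match:
            return kind, match.group("language")
    return None  # no pattern applies, e.g. an inconsistently named page
```

Titles that return None are exactly the cases raised above, where a predictable scheme breaks down and interwikis would have to be stored item by item.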