January 2017

Reconstructed Latin Words

So what I'm proposing is to rename this section, which deals with reconstructed Vulgar Latin verbs, nouns, etc., to Proto-Romance and to apply the proper phonology for that stage, which has also been reconstructed in the same way, i.e. by applying the comparative method to the modern Romance languages. The result reflects the last common ancestor of all of them, which by definition is Proto-Romance. Vulgar Latin is a much broader term that can cover centuries before the period in question; the About page for Vulgar Latin even feels the need to specify that 'the form of Vulgar Latin that is considered in Wiktionary's entries is primarily the latest common ancestor of the Romance languages.' So I think we're essentially in agreement on this point.

One benefit of the Proto-Romance label is that we can have a better idea of the phonology to use.

Let's take a look at some of the issues concerning the current phonology:

1) Mulgo = [molgo]

The first [o] reflects only the Italo-Western Romance pronunciation, while Sardinian and Eastern Romance are both given with [u].

That is because they all came from Proto-Romance [ʊ].

2) Festizo = [ˈfes.tʲe.zo]

The source cited for this page indicates that the < z > represents [ʤ]. That section then refers to a previous one, which says that this sound evolved along the lines of [dj] > [j], and sometimes further to [ʤ], depending on context. For Proto-Romance I'm fairly certain that we're still at the [dj] stage. In either case, [z] is wrong. Also, the bot has automatically palatalized the t to tʲ, which can't be right, as all the derivatives show that it was a simple [t].

3) Desidium = [deˈse.ðe.õ]

No source I could find attests the consonant [ð] in the period in question (the 2nd to 4th centuries AD). According to this phonological timeline for French, the intervocalic lenitions (which is what we're seeing here with desi[d]ium > desi[ð]ium) happened during and after the 'Proto-Gallo-Ibero-Romance' period, so they essentially affected the areas of France and Spain. This would also explain why the consonant doesn't appear in Italian, which keeps, e.g., the [t] in vi[t]a, which became [ð] in Spanish/Portuguese and early Old French (and disappeared in later French).

I can't find a source saying that nasal vowels existed at all in Proto-Romance, so that final õ seems suspect to me. For whatever reason, the bot has failed to turn the second (prevocalic) < i > into [j], as it usually does. Also, since the mergers of [ɪ] into [e] and [ʊ] into [o] had not yet occurred in Proto-Romance, the overall pronunciation here would be along the lines of [deˈsɪdjʊ].

In sum, the section would more accurately be called Proto-Romance, and its current phonology is sketchy and could stand to be replaced with the one linguists have come up with for that period. —This unsigned comment was added by Excelsius (talk • contribs).

Category:Pages with ISBN errors

This page is being populated by the new {{ISBN}} template. Entries in this category have ISBNs that do not pass the checksum test or are possibly incorrectly formatted and cannot be recognized as valid. They will need to be researched carefully and replaced with the correct numbers. DTLHS (talk) 7:50 pm, Today (UTC−8)
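For anyone researching entries in this category, a minimal sketch of the two checksum tests may help. It follows the standard ISBN-10 rule (weights 10 down to 1, weighted sum divisible by 11, a final X standing for 10) and the ISBN-13 rule (alternating weights 1 and 3, weighted sum divisible by 10). It is written in Python for illustration only and is not the code behind {{ISBN}} or Module:check isxn; the function name isbn_checksum_ok is hypothetical, and the sketch only strips hyphens and spaces rather than checking where they are placed.

```python
DIGITS = "0123456789"


def isbn_checksum_ok(raw: str) -> bool:
    """Return True if raw is an ISBN-10 or ISBN-13 whose check digit is valid."""
    # Ignore hyphens and spaces; their placement is not validated in this sketch.
    s = raw.replace("-", "").replace(" ", "").upper()

    if len(s) == 10:
        # ISBN-10: nine digits followed by a digit or 'X' (which stands for 10).
        if not all(c in DIGITS for c in s[:9]) or s[9] not in DIGITS + "X":
            return False
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
        total = sum(w * v for w, v in zip(range(10, 0, -1), values))
        return total % 11 == 0

    if len(s) == 13:
        # ISBN-13: thirteen digits with alternating weights 1, 3, 1, 3, ...
        if not all(c in DIGITS for c in s):
            return False
        total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s))
        # The weighted sum must be divisible by 10.
        return total % 10 == 0

    # Any other length cannot be a valid ISBN.
    return False


print(isbn_checksum_ok("978-0-306-40615-7"))
print(isbn_checksum_ok("978-0-306-40615-8"))
```

For example, 978-0-306-40615-7 passes the test, while changing only its last digit to 8 makes it fail, which is the kind of typo this category is meant to catch.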

Module:check isxn: this module is crosslinked at d:Q21856723 (but, of course, en.wikt doesn't have interwiki links through d: yet). It provides functionality for other standardized numbers such as ISSNs, but I didn't copy over any templates for that. —Justin (koavf)TCM 05:00, 4 January 2017 (UTC)

Stable 20 % growth in active editors?

From stats:wiktionary/EN/PlotEditorsZZ.png it looks like around the end of 2015 there was a 20 % jump in active editors, which has been sustained until now. Most of the growth seems to come from the English Wiktionary: stats:wiktionary/EN/PlotEditorsEN.png. If we look at the detailed statistics (stats:wiktionary/EN/TablesWikipediansEditsGt5.htm), we see that en.wikt had around 300 monthly active editors for years, but since April 2016 it has been around 390. October 2016 was the first month ever with over 400 active editors (429); while deletions can reduce this number upon recalculation from updated dumps, I think it's likely to stay above 400.

Do you agree this is real growth and not just a statistical glitch? Do you have any idea what's going on? --Nemo 08:13, 4 January 2017 (UTC)

Because we are cool. Maybe also because Wiktionary is starting to become useful for reading and not only fun to edit. People find answers here and can just add small hints, rather than creating a whole new page on their first edit. That makes it easier to become a regular editor. Another plausible explanation is people coming from Wikipedia because of fights there or because of Wikidata improvements. Finally, it may come from the multiple efforts people are making to talk about Wiktionary all over the world, like giving a talk at Wikimania or creating content about the project on YouTube. In short: Wiktionary is becoming trendy :) Noé (talk) 16:25, 4 January 2017 (UTC)
Maybe we could find some of these supposed new editors and ask them which of those reasons, if any, they have for being here. DTLHS (talk) 16:29, 4 January 2017 (UTC)
I started editing frequently around the end of 2015, so I suppose I'm part of this growth, but my case was mainly due to a revival of my personal interest in languages and etymology. Turning to Wiktionary came naturally due to its etymologies and the breadth of its coverage of old languages (especially Latin). — Kleio (t · c) 15:49, 7 January 2017 (UTC)
When did the translation editor JavaScript gadget get added? That seems to be something that lots and lots of people use between 5 and 100 times. - TheDaveRoss 20:59, 5 January 2017 (UTC)
I think User:Conrad.Irwin started working on that (User:Conrad.Irwin/editor.js) on 9 April 2009‎. —Stephen (Talk) 11:29, 6 January 2017 (UTC)
Wiktionary, particularly en.WT, has a developing reputation amongst academicians. Language students and academics, and hobbyists of same, are likely our largest contributing pool as our content is most related to their field(s) of interest. Chicken, egg. (This is, imo, a failing of the WT project.) - Amgine/ t·e 17:09, 8 January 2017 (UTC)
@Amgine: I'm confused--how would attracting specialists be a bad thing? Do you think this project is too difficult for newcomers or lay editors? —Justin (koavf)TCM 17:57, 8 January 2017 (UTC)
Wikimedians work on what interests them. A group of specialists and academics are interested in, and build, rather different things than a lay person who is looking for a dictionary. - Amgine/ t·e 21:17, 8 January 2017 (UTC)
@Amgine: I agree that this project is the most intimidating. I encourage totally new users to check out q:, since it is so easy to edit and has so few rules and templates. This would be the last project I'd point them to, because of its complexity. —Justin (koavf)TCM 21:51, 8 January 2017 (UTC)
So all the growth is thanks to my Wiktionary special on the Wikimedia research newsletter? ;-) I would like to believe so, but I'm not sure there are so many academics contributing. If there are, they should be pointed to the conclusion of the GLAWI paper, which I quote again: «Wiktionary serves its purpose well by having little constraints and maximising participation, while standardization can be performed downstream».
Nowadays, I suspect that any academic who finds Wiktionary too simple will spend time on d:Wikidata:Wiktionary or custom SemanticMediaWiki wikis, rather than try making Wiktionary gradually more similar to what they want. --Nemo 16:09, 11 January 2017 (UTC)
I agree with DTLHS, let's list the new active users and write to a sample of them. :) If someone wants to work on this, I can help with database queries. Nemo 16:09, 11 January 2017 (UTC)
@Nemo_Bbis: are you thinking of OmegaWiki in particular or do you know of other examples? —Justin (koavf)TCM 16:24, 11 January 2017 (UTC)
I'm not sure when this started happening, but on my iOS device, whenever I use the "look up" feature (which was changed from Define with the release of iOS 10), the Wiktionary entry appears with other dictionaries. Maybe the Wiktionary widget was added sometime in August, since there was an increase in page views starting in August 2016 (here). More page views would eventually draw more editors. Also, Google now gives Wiktionary entries considerable weight in its search algorithm; they appear on the first page for many definition searches and often among the first three results for Latin searches (in my experience). I'm not sure how to tell when Google implemented that change. The problem with ascertaining these things is that there are many independent variables, and we can only measure one dependent variable: the progress of the data in Wikistats. Icebob99 (talk) 05:36, 29 January 2017 (UTC)

Sixth LexiSession: car

Monthly trend topic is car. You are invited to participate in the common goal of discovering what can be gathered around the word car! There is already a Wikisaurus on vehicle and a Wikisaurus on automobile, but there is still a lot to describe on pimping cars, like spoiler, hubcap and vinyl roof. Plus, illustrations are welcome for every part of a car, and we may also imagine figures to illustrate the internal structure of vehicles!

This collaborative experiment is still running without any guide or direction. You're free to participate as you like and to suggest next month's topic. If you do something, please report it here, to let people know you are involved in one way or another. I hope some people will be interested in this one :) Noé (talk) 16:40, 4 January 2017 (UTC)

Redirect CJK & Kangxi radicals to common CJK characters

As in our discussion around October 2016 about redirecting halfwidth and fullwidth characters to the common ones, CJK and Kangxi radicals are in the same situation: they look the same as their common CJK characters. I suggest redirecting them too. For example, U+2F00 KANGXI RADICAL ONE should be included in U+4E00 CJK UNIFIED IDEOGRAPH-4E00 (with its character info added too), and so on. The only exception would be a radical that has no matching common character (I think I see some); it can keep its own page. I am starting to do this at Thai Wiktionary. [1] What do you think? --Octahedron80 (talk) 02:50, 7 January 2017 (UTC)

PS. I am also thinking about redirecting the CJK compatibility characters, but I am not sure whether the wiki software will allow that. --Octahedron80 (talk) 03:19, 7 January 2017 (UTC)

I'm pinging @I'm so meta even this acronym in case he's interested, because this is about character boxes and redirects.
Support both types of redirect wherever applicable. The wiki software apparently already redirects automatically all the codepoints in "CJK Compatibility Ideographs" and "CJK Compatibility Ideographs Supplement", but I'd support adding the second character box in all these entries.
Apparently, things like the / are just basically two codepoints for the same character. Unrelatedly, I naturally support keeping separate entries for simplified/traditional Chinese, which is a different thing.
I generally support redirecting any separate codepoints that are basically variations of the same single character. For example, I redirected ❣ ("HEAVY HEART EXCLAMATION MARK ORNAMENT") to ! because when you write an exclamation mark with a large heart style, it's still an exclamation mark. As I said in the last vote, if we define "Ｄ" as "Fullwidth form of D", this would not make a lot of sense if Wiktionary were printed or if the reader does not care about separate codepoints, because it's just a typographical variant. We might as well define "D" as "Comic Sans MS form of D".
See Category:Character variation redirects. I believe that currently, all the targets of these redirects have multiple boxes to account for the character variations. If the boxes are not enough (they are geeky and may require a bit of knowledge of Unicode to read properly), we may want to eventually add usage notes en masse to all these entries, explaining the differences of codepoints. --Daniel Carrero (talk) 07:41, 7 January 2017 (UTC)
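The automatic redirect of "CJK Compatibility Ideographs" mentioned above fits how Unicode normalization works, assuming MediaWiki normalizes titles to NFC (which, to my knowledge, it does). Most code points in that block have canonical one-to-one decompositions, so NFC alone replaces them with the unified ideograph, while fullwidth forms only have compatibility decompositions and are folded only by NFKC. The sketch below uses Python's standard unicodedata module; the helper name show is hypothetical. It also shows that a few code points in the compatibility block, such as U+FA0E, have no decomposition and survive NFC unchanged, because they are unified ideographs in their own right.

```python
import unicodedata


def show(label: str, ch: str, form: str) -> str:
    """Describe what the given normalization form does to a single character."""
    out = unicodedata.normalize(form, ch)
    return f"{label}: {unicodedata.name(ch)} -> {unicodedata.name(out)} ({form})"


# Canonical singleton decomposition: NFC folds U+F900 into U+8C48.
print(show("compat ideograph", "\uF900", "NFC"))

# U+FA0E sits in the compatibility block but has no decomposition.
print(show("no decomposition", "\uFA0E", "NFC"))

# Fullwidth forms carry only a <wide> compatibility mapping:
# NFC keeps U+FF24 as is, NFKC turns it into plain "D".
print(show("fullwidth D", "\uFF24", "NFC"))
print(show("fullwidth D", "\uFF24", "NFKC"))
```

Because NFC is applied to everything anyway, the compatibility ideographs never get pages of their own, whereas the fullwidth and radical characters do and need an explicit redirect decision.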
I oppose this. The "Kangxi Radicals" and "CJK Radicals Supplement" Unicode blocks specifically represent ⾞ as a symbol used in classification of Chinese characters rather than 車 as a morpheme meaning "cart". —suzukaze (tc) 14:48, 7 January 2017 (UTC)
What is the difference? The radicals are only used as dictionary indices, for convenience. And we are the dictionary makers here. ;-) --Octahedron80 (talk) 14:53, 7 January 2017 (UTC)
"Kangxi Radical Cart" is a "translingual symbol" in the purest sense. Usage of "Kangxi Radical Cart" in text semantically differs from usage of "CJK Unified Ideograph 8ECA", unlike usage of halfwidth vs. fullwidth characters. —suzukaze (tc) 14:59, 7 January 2017 (UTC)
In addition, not all of them have "CJK Unified Ideograph" counterparts, like "CJK Radical repeat". —suzukaze (tc) 15:04, 7 January 2017 (UTC)
[edit conflict] See also chapter 18 of the Unicode Standard, page 20. suzukaze (tc) 15:09, 7 January 2017 (UTC)
It's just the Unicode name, used to identify it individually; it is not very special. If you dig a bit, you will find that they are already mapped to the common characters as compatibility equivalents. So it is not so wrong to use the common 'car' as the Kangxi 'car', and vice versa. Yes, some of them have no common character, as I said above; we still keep them. --Octahedron80 (talk) 15:07, 7 January 2017 (UTC)
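The compatibility mapping referred to here can be checked with Python's standard unicodedata module: characters in the Kangxi Radicals block carry compatibility decompositions to unified ideographs, so NFKC folds U+2F9E KANGXI RADICAL CART into U+8ECA 車 while NFC leaves it alone. Most of the CJK Radicals Supplement block, including U+2E80 CJK RADICAL REPEAT, has no decomposition at all. This is only a sketch; the helper radical_mapping is hypothetical, and a compatibility mapping by itself does not mean Unicode treats the two characters as semantically equivalent.

```python
import unicodedata


def radical_mapping(ch: str) -> tuple[str, bool, str]:
    """Return (Unicode name, whether NFC leaves ch unchanged, name of its NFKC form)."""
    nfc_unchanged = unicodedata.normalize("NFC", ch) == ch
    nfkc_name = unicodedata.name(unicodedata.normalize("NFKC", ch))
    return unicodedata.name(ch), nfc_unchanged, nfkc_name


# Kangxi Radicals: compatibility decompositions only, so NFC keeps them
# but NFKC replaces them with the corresponding unified ideograph.
for radical in ("\u2F00", "\u2F09", "\u2F9E"):
    print(radical_mapping(radical))

# CJK RADICAL REPEAT (U+2E80) has no decomposition, so even NFKC keeps it.
print(radical_mapping("\u2E80"))
```

This matches both sides of the thread: the 214 Kangxi radicals each have a common counterpart, while some supplementary radicals have none and would keep their own pages.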
@Octahedron80: Regardless of the merits, if any, of redirecting these entries (which I support doing, as I said above), please wait some time (maybe a couple of weeks) before creating more redirects. I see that you redirected a few pages already, but we might want to discuss this first. Before redirecting the full/halfwidth characters, I waited 3 months, by my count: I created a one-month vote that started two months after I had created a discussion with the initial proposal. --Daniel Carrero (talk) 15:19, 7 January 2017 (UTC)
I also have a lot of work on the same topic at thwikt, so I must take a break from enwikt for now. --Octahedron80 (talk) 15:27, 7 January 2017 (UTC)
Are there any paper dictionaries that define 車 as "cart" but which keep ⾞ separately defined as something like "This is a symbol used in classification of Chinese characters! Don't confuse it with 車, which is absolutely different!"? Conversely, are there any sources (even written in Chinese) that assign both uses to the same character, by saying something like: "you can use the cart symbol (車) as a way to classify Chinese characters"? --Daniel Carrero (talk) 15:24, 7 January 2017 (UTC)
  1. Anyone who would explicitly distinguish them in a traditional dictionary must be mad. But we're not a traditional dictionary.
  2. In common use, the "CJK Unified Ideograph" character is regularly used to represent the concept of the Kangxi radical (such as at 1 or 2) and most people probably don't even know of the existence of the "Kangxi Radical" Unicode block.
  3. Many dictionaries include a "Kangxi radical" definition for the character. For example, [2] (a dictionary by the Taiwanese Ministry of Education) defines the character as
    1. an alternative form of (rén)
    2. one of the 214 radicals.
  4. Accordingly, in September I added similar definitions to the "Translingual" section of "CJK Unified Ideographs" entries here while creating entries for "Kangxi Radicals" characters.
  5. See the Unicode Standard chapter I linked to above.
suzukaze (tc) 15:35, 7 January 2017 (UTC)
The halfwidth and fullwidth characters (and other outdated symbols) exist in Unicode for the same reason: backward compatibility. Why can't these radicals be handled the same way? :-) --Octahedron80 (talk) 15:41, 7 January 2017 (UTC)
These Unicode symbols have a difference explicitly defined by Unicode itself. —suzukaze (tc) 15:49, 7 January 2017 (UTC)
(Don't blame me.) I read your message many times. Points 1 to 3 sound to me like support: after redirecting, we could include all available definitions of characters that look the same on one page, even the radical and compatibility ones. Since this is not a traditional dictionary, no one would go mad. After merging, we would have one more character info box that still contains the radical's information, such as the character itself and its categories, as on the old pages. So nothing would be missed. --Octahedron80 (talk) 16:31, 7 January 2017 (UTC)
@Suzukaze-c: Naturally, feel free to disagree with me, but personally I still support redirecting the entries as Octahedron80 suggested. You did say that it's madness for traditional dictionaries to separate 車/車; apart from the "traditional dictionary" label, you did not state any reason to justify the madness (I'm not sure if my edit to $, which you linked, implies something), but I for one believe that it really is a bad idea for print dictionaries to do so, because just by looking at the shape of 車/車 it's not possible to tell the difference between them. Still, if print dictionaries can't have these as separate entries for that reason, then it means that if Wiktionary were to be printed in the future (which it might be, either as an official Wikimedia project or as a legally CC-licensed derivative by any person), then it's going to be a "traditional" dictionary too in some sense, and it would be a bad idea for Wiktionary to keep 車/車 separated, too.
You linked to your edits on the entries and , the "main" one having a sense linking to the "radical" one. On principle, I applaud the idea of building a system to organize the characters, including your effort to create Template:mul-kangxi radical-def to link the entries. But I think that merging them is better: if we were going to have a whole sense in the "main" entry for the radical, then clicking on the separate "radical" entry did not seem able to provide very much else for the reader besides having an additional link to Index:Chinese radical/玄. We can just add the index link in the "radical" sense of the "main" entry and be done with it; this should be able to save one click from the reader's time.
I have the impression that, no matter whether the reader tries to access or at first (as in, they might have copied any of those from a separate website), the information on the whole "main" entry is probably going to be of interest; the "radical" page basically just had a sense linking back to the "main" page. The "main" entry contains the etymology, images and links to external databases. The "radical" entry did not have these things and was therefore incomplete; if it had these things, they would be a repetition of what can be found in the "main" entry, and therefore reading it might be considered a waste of time.
As you said, in the dictionary you linked to, (the "main" entry) is defined as "one of the 214 radicals"; that dictionary does not use (the "radical" entry) for that definition. That does not invalidate the existence of the "radical" codepoint, but it does serve as additional evidence for the hypothesis that 儿 and ⼉ are the same character; possibly, there are applications that sort the "radical" in a different way (I didn't check) and/or have other hidden properties of these characters; the "radical" sense in the "main" entry should probably state "use the codepoint for the radical when a semantic distinction is needed". This usage instruction came from page 21 of the Unicode policy you linked. It says: "The characters in the CJK and KangXi Radicals blocks are compatibility characters. Except in cases where it is necessary to make a semantic distinction between a Chinese character in its role as a radical and the same Chinese character in its role as an ideograph, the characters from the Unified Ideographs blocks should be used instead of the compatibility radicals." --Daniel Carrero (talk) 18:40, 7 January 2017 (UTC)
Thanks for the ping, Daniel Carrero. Primâ facie, I would support this, but suzukaze-c's opposition and this paragraph from page 688 of the Unicode Standard: “Characters in the CJK and KangXi Radicals blocks should never be used as ideographs. They have different properties and meanings. U+2F00 kangxi radical one is not equivalent to U+4E00 cjk unified ideograph-4e00, for example. The former is to be treated as a symbol, the latter as a word or part of a word.” make me hesitant. How would this proposal affect KangXi radicals that are visually distinguishable from their equivalent CJK unified ideographs? — I.S.M.E.T.A. 23:03, 7 January 2017 (UTC)
re. ISMETA: There are no "Kangxi Radicals" characters that are visually distinct from the "Unified Ideograph" equivalent (unless the font designer is weird or you get picky), but there are multiple characters in "CJK Radicals Supplement" that may map to a single "Unified Ideograph" character () and some that may map to multiple or no Unified Ideographs.
re. Daniel: Think of it this way: the character 儿 (Unified Ideograph) encompasses usage as both a radical and a morpheme used in language and running text while the character ⼉ (Kangxi Radical) is solely for usage as a radical. —suzukaze (tc) 02:49, 8 January 2017 (UTC)
@suzukaze-c: OK, so is your point that, because there exist both multiple radicals in surjective correspondence with one ideograph (e.g. , , ) and one radical in injective correspondence with multiple ideographs (e.g. , 𣱱), there is no way to do this consistently? — I.S.M.E.T.A. 15:06, 15 February 2017 (UTC)

So I made an Evenki transliteration module

It's based on the pre-1938 Latin alphabet with the following changes:

  1. Cyrillic orthography doesn't differentiate between ə and e (э and е) after n, ņ, t, d, ʒ and j, as the characters are used to indicate palatality.
    • I've kept ə for э and e for е in all such cases, since even if we found a way to automate this, as has been done for Russian, there are too few resources to practically recover these distinctions.
  2. I've changed the transliteration of ш from s (which is also what с maps to) to ş (which is also what щ maps to). In the literary dialect this sound occurs only in Russian loanwords; however, some dialects have it in native words.

Does the community support this transliteration? Can I put it in operation? Crom daba (talk) 03:13, 7 January 2017 (UTC)
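To make the two changes concrete, here is a minimal Python sketch of a letter-by-letter transliteration covering only the five Cyrillic letters discussed above (э, е, с, ш, щ). It is an illustration under stated assumptions, not the module itself (Wiktionary modules are written in Lua, and the real one covers the whole alphabet); the table PARTIAL_TABLE and the function translit_partial are hypothetical, and every other character passes through unchanged. It shows the consequences of the proposal: ш and с no longer collide, ш and щ now do, and э/е are mapped one-to-one without trying to recover the ə/e distinction after palatal consonants.

```python
# Hypothetical partial table: only the letters discussed in the proposal.
PARTIAL_TABLE = {
    "э": "ə",  # kept as ə everywhere; palatal contexts are not disambiguated
    "е": "e",  # kept as e everywhere
    "с": "s",
    "ш": "ş",  # changed from s (which с already used) to ş
    "щ": "ş",  # ш and щ now share ş
}


def translit_partial(text: str) -> str:
    """Transliterate the letters in PARTIAL_TABLE, preserving capitalization."""
    out = []
    for ch in text:
        latin = PARTIAL_TABLE.get(ch.lower())
        if latin is None:
            out.append(ch)  # letters outside this sketch are left unchanged
        elif ch.isupper():
            out.append(latin.upper())
        else:
            out.append(latin)
    return "".join(out)


print(translit_partial("ше"))
print(translit_partial("Щэ"))
```

For example, translit_partial("ше") gives "şe", and capital letters such as Щ come out as Ş.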

I have no comments on Evenki specifically, but I'd like to request that whatever transliteration conventions we create, they be 1) documented, and 2) linked from the appropriate places, such as Wiktionary:About Evenki (which doesn't even seem to exist yet). A decent further addition might be 3) motivation for the particulars of the transliteration, if that's not too much of a bother. --Tropylium (talk) 16:11, 12 January 2017 (UTC)
Done, thanks for not letting me off the hook without the documentation. Crom daba (talk) 04:01, 15 January 2017 (UTC)

Steps towards a policy on ... place names

It would help to have an official policy on place names. The current CFI talks around the issue but does not provide clear guidance: "Wiktionary articles are about words, not about people or places. Many places, and some people, are known by single word names that qualify for inclusion as given names or family names. The Wiktionary articles are about the words. Articles about the specific places and people belong in Wikipedia."

I think this is unhelpful to editors, particularly less experienced members of the community. Editors risk investing a lot of time in articles that later get deleted, even without a formal policy change. That is off-putting. I can see that in the past there has been some difficulty in achieving consensus, but I would like to try again.

I would suggest that we start with a very simple policy to increase the chances of reaching agreement. My proposed policy statement is:

"The following place names meet the criteria for inclusion:

  • The names of continents.
  • The names of seas and oceans.
  • The names of nation states.
  • The names of primary administrative divisions (states, provinces, counties etc).
  • The capital cities of nation states.
  • The capitals of primary administrative divisions.

The community has not reached a consensus on place names/geographic features other than those listed above."

John Cross (talk) 19:10, 7 January 2017 (UTC)

These rules would exclude major Dutch cities like Leiden, Eindhoven and Breda, Belgian cities like Leuven and Charleroi, and German cities like Cologne and Nürnberg. I think that would be absurd. —CodeCat 19:55, 7 January 2017 (UTC)
My intention was that the policy was neutral on Cologne for example. John Cross (talk) 21:56, 7 January 2017 (UTC)
IMHO it'd be good to just have a blanket acceptance of all placenames. Toponymy and other forms of onomastics should fall within Wiktionary's scope, I think. — Kleio (t · c) 22:40, 7 January 2017 (UTC)
I'm inclined to agree with Kleio on this. I also dislike the use of the term nation state in this proposed policy statement; the nation state is a historically rather recent phenomenon (the 1648 Peace of Westphalia instituted it, according to the traditional account), so how does this policy affect Latin toponyms attested in Classical to Renaissance sources? — I.S.M.E.T.A. 23:11, 7 January 2017 (UTC)
Agreed. We could have Springfield 1.) a city in Illinois, 2.) a city in Oregon, etc. for common place names, and The Hague would have info on the one municipality that has that name. Place names can have etymologies and will certainly be attested. —Justin (koavf)TCM 23:12, 7 January 2017 (UTC)
Are you saying we should have a separate definition (and translations) for each city in the US named Springfield? DTLHS (talk) 23:14, 7 January 2017 (UTC)
@DTLHS: If they all have the same etymology (which these would), then I think it's fair to say "A common city name found in Illinois, Ohio, ..." —Justin (koavf)TCM 23:24, 7 January 2017 (UTC)
I would agree that for common city names, there should generally be a single entry, except where one particular usage is more important. Our current entry for Springfield is a good example. Capital cities should be noted individually because the capital is often used synonymously with the government of the entire jurisdiction. bd2412 T 00:05, 8 January 2017 (UTC)
We have Wiktionary:Place names as a list (under construction) of different types of places of each country for further analysis. Also, I agree with Kleio on this: "IMHO it'd be good to just have a blanket acceptance of all placenames." --Daniel Carrero (talk) 00:14, 8 January 2017 (UTC)
I too would agree to blanket acceptance of all place names. DonnanZ (talk) 00:29, 8 January 2017 (UTC)
What is a place name? Would we accept names of streets and buildings? DTLHS (talk) 00:42, 8 January 2017 (UTC)
This is a rule of thumb that can be discussed/changed by other people: I think I'd be fine with having all place names from continents and countries to cities, towns, neighborhoods, villages, etc.
I guess we probably don't want names of buildings and streets here; although we do have Empire State Building and Harley Street. I wonder if we are going to keep only a few streets and buildings that are notable. --Daniel Carrero (talk) 01:43, 8 January 2017 (UTC)

Place names - 2nd attempt

I was really pleased with all the feedback I got on my first attempt; here is attempt 2.

"The following place names meet the criteria for inclusion:

  • The names of continents.
  • The names of seas and oceans.
  • The names of countries.
  • The names of areas or regions containing multiple countries (e.g. Middle East, Eurozone).
  • The names of primary administrative divisions (states, provinces, counties etc).
  • The names of conurbations, cities, towns, villages and hamlets.
  • The names of natural geographic features (e.g. deserts, mountains, rivers) which are notable (per Wikipedia:Notability).

The Community has not yet reached a consensus as to whether or not place names/geographic features other than those listed above should be included in Wiktionary."

I have avoided mentioning streets, tunnels, buildings etc.

John Cross (talk) 08:24, 8 January 2017 (UTC) (amended John Cross (talk) 08:36, 8 January 2017 (UTC))

I think that if the name is a single word (e.g. Haymarket) then we should include it. If it is two or more words (e.g. Downing Street) then we should include it only if we can provide three or more usages of it in the usual sort of sources. SemperBlotto (talk) 08:57, 8 January 2017 (UTC)
Probably not an issue, but it feels a bit odd for us to rely on a WP policy (notability) that is somewhat outside our control. Equinox 10:21, 8 January 2017 (UTC)
What about place names that are linguistically interesting, but not otherwise "notable"? DCDuring TALK 17:00, 8 January 2017 (UTC)
<nods> I think the notability question should be relevant to Wiktionary's sphere of interest. However, it does become a local problem to determine what is notable. E.g. I have several books on feature names and their origins of the west coast of North America from the late 19th and early 20th century, and most every named rock has a story of a shipwreck, confrontation, or other historic event in any of several dozen languages. - Amgine/ t·e 17:19, 8 January 2017 (UTC)
I think place names should be allowed, provided they meet CFI. I imagine that that would exclude almost all but the most notable street names, which would not be mentioned in any published works that are citeable (i.e. excluding reference books, maps, etc.). The number of such places would doubtless be enormous, but I see no harm in such entries, if people are willing to add them. Etymologies of even the most insignificant places can be interesting to those who live near them, and it would be kind of neat to eventually be able to search for any town in the world and find out how it got its name, etc. I have no strong feelings on including street names or non-notable towns and villages, but I support adopting John Cross's criteria. Andrew Sheedy (talk) 21:51, 8 January 2017 (UTC)
Pretty much every city and town in the United States has a "Main Street", and I would hazard a guess that pretty much all of them that have local newspapers could meet CFI (news stories always name the street where something happened). Are we ready to have an English entry with more proper noun senses than water has translations? And definitions that mostly say "a street in <NAME OF A CITY OR TOWN>", at that. As for the laziness of users limiting quantity: we seem to attract more than our share of obsessive types who will literally add every permutation theoretically possible of everything unless someone stops them. Without an extremely clear and robust consensus about what is not permissible, we could find ourselves either with dozens of entries that look like the index of a large map book, or with boatloads of rfds- or both. Chuck Entz (talk) 04:27, 10 January 2017 (UTC)
I support this as well, and feel similar to Andrew Sheedy on this. Etymology is lexically interesting, whether it's a common noun or a place name. So is pronunciation, which we should certainly also include. I would like John Cross to clarify whether subdivisions of cities are also included. This is important because the status of settlements can change: what was once a separate village can later become a suburb of a larger city. Since official status can change at any time, this might imply that includability of a term is temporary, and what was once includable might lose that status once its real-life status changes too. I think that's undesirable, so we should make our rules independent of the official status of any particular place.
As for rivers in particular, we could adopt a different definition from notability, one based on how many places it flows through, or perhaps its length or water volume. These are more objective criteria. —CodeCat 22:57, 8 January 2017 (UTC)
Well, I am far more interested in the water-related place names than the land-based ones. As a nautical buff I would suggest there are many reasons for a waterway to be notable for itself, even if it no longer exists. E.g. portions of the Zuiderzee such as this bight northwest of Amsterdam which no longer exists but certainly once had a name, the North Pacific Gyre which is a solely current-defined feature, and the Nahwitti Bar (which forms the western end of Goletas Channel between Vancouver and Hope Islands on the north-eastern coast of the former, named after the 'Nakwaxda'xw tribe whose native language is 'Nak̕wala Kwak'wala, part of the Northern Wakashan group.) - Amgine/ t·e 00:45, 9 January 2017 (UTC)

I could probably support this, as long as the notability requirement were removed from point seven; instead of “… which are notable (per Wikipedia:Notability)”, we could have something like “The names of significant natural geographic features (e.g. deserts, mountains, rivers)”, with “significant” allowing for fine-tuning by casuistry. — I.S.M.E.T.A. 02:03, 10 January 2017 (UTC)

Place names - 3rd attempt[edit]

I was really pleased with all the feedback I got from previous attempts - here is attempt 3.

"The following place names meet the criteria for inclusion:

  • The names of continents.
  • The names of seas and oceans.
  • The names of countries.
  • The names of areas or regions containing multiple countries (e.g. Middle East, Eurozone).
  • The names of primary administrative divisions (states, provinces, counties etc).
  • The names of conurbations, cities, towns, villages and hamlets.
  • Districts of towns and cities (e.g. Fulham).
  • The names of inhabited islands and archipelagos.
  • The names of other significant natural geographic features (such as large deserts and major rivers).

The Community has not yet reached a consensus as to whether or not the names of places and geographic features other than those listed above should be included in Wiktionary. There is currently no definition of "significant natural geographic features", but by way of an example, the twenty largest lakes in the world by surface area would each qualify. It is hoped that the Community will develop criteria over time to provide greater clarity and address matters not currently covered (for example the names of streets, buildings, tunnels). This policy is not intended to remove or reduce the requirement to find citations to support entries."

Please let me know what you think and please also help me to get this passed as a policy. Thank you.

John Cross (talk) 05:16, 12 January 2017 (UTC)

Support. Andrew Sheedy (talk) 05:36, 12 January 2017 (UTC)
I support everything except the requirement that islands be inhabited. — I.S.M.E.T.A. 08:21, 12 January 2017 (UTC)
Support as well. I suppose that strictly speaking you can also add some copy about how other locations are assumed to not fit the criteria unless otherwise notable or somesuch. Maybe provide a higher threshold for attestation for (e.g.) streets than continents. —Justin (koavf)TCM 08:26, 12 January 2017 (UTC)
  • Support. I would imagine that an uninhabited island could still qualify as a significant natural geographic feature. I'm not very worried about missing out on insignificant uninhabited islands. bd2412 T 23:58, 12 January 2017 (UTC)

Moved the draft text to an actual policy vote: https://en.wiktionary.org/wiki/Wiktionary:Votes/pl-2017-01/Policy_on_place_names

John Cross (talk) 05:12, 13 January 2017 (UTC)

The vote is now open: Wiktionary:Votes/pl-2017-01/Policy on place names John Cross (talk) 09:00, 21 January 2017 (UTC)

Vote: Trimming CFI for Wiktionary is not an encyclopedia 2[edit]

FYI, I created Wiktionary:Votes/pl-2017-01/Trimming CFI for Wiktionary is not an encyclopedia 2 to try again to remove misleading sentences from CFI. The first attempt was at Wiktionary:Votes/pl-2015-02/Trimming CFI for Wiktionary is not an encyclopedia.

@John Cross: The vote may help address some of the concerns you have raised recently. Note that the English Wiktionary inclusion policy about place names is actually at WT:NSE, and in practice leads to a fairly indiscriminate inclusion of a broad variety of place names. A vote that indicates the English Wiktionary stance on place names well is this: Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2. --Dan Polansky (talk) 08:46, 8 January 2017 (UTC)

@Dan Polansky thank you. I have voted. John Cross (talk) 20:27, 17 January 2017 (UTC)

Wiktionary:Entry layout[edit]

POS order: I don't see anything specifying the order in which parts of speech should be placed in an entry. It seems to go something like Nouns→Verbs→Adjectives→...others? But I don't see a particular guideline. For instance, book has this order, but the meanings of a bound collection of pages and codices are a lot more common than when the police enter a perp into the system (the verb form). At perfect, the adjective is before the adverb, but this is also true of alphabetical order. Is there a preferred order for these entries? If so, it seems wise to state it explicitly. —Justin (koavf)TCM 22:06, 8 January 2017 (UTC)

They're ordered similar to senses: from common to rare. If they are ordered differently in an entry, it should be fixed. —CodeCat 22:43, 8 January 2017 (UTC)
I always thought they were alphabetized (adjective, noun, verb). DTLHS (talk) 22:49, 8 January 2017 (UTC)
Not usually. But I'd tend to put (say) the noun above the verb if the noun clearly came first — even though I prefer senses to be ordered by commonest current usage first. Equinox 22:50, 8 January 2017 (UTC)
Like Equinox, I list POS sections in order of historical development (ditto for etymologies, in the case of homonyms), but I also do that for senses. AFAIK, there does not exist a consensus here vis-à-vis listing by historical development vs. frequency in common usage vs. alphabetical order vs. whatever other scheme; accordingly, "fixing" the order of POS sections and/or senses, as CodeCat suggests, is likely to prove controversial. — I.S.M.E.T.A. 01:58, 10 January 2017 (UTC)

Actualités: Monthly news of French Wiktionary in English[edit]

Hi all,

We continue to translate our monthly publication into English, for people interested in what is going on in the French Wiktionary. This month, we relate a big project we have on Occitan, a language spoken in southern France, and give some wordplay in French. We're eager to receive your feedback and joke suggestions!

Every month, we provide a bunch of metrics and try to write a couple of paragraphs about our beloved project. This edition is the 21st issue, so it is now allowed to drink alcohol in the US, something quite unusual for a periodical newspaper! Noé (talk) 00:01, 9 January 2017 (UTC)

You forgot the link! [3] Equinox 00:09, 9 January 2017 (UTC)
Do you have any insights about running large projects involving multiple editors? On this site it seems like people prefer to work alone and we don't really do anything like focus on a single language for a month. DTLHS (talk) 00:39, 9 January 2017 (UTC)
The Occitan effort was part of a larger trans-project effort to continue developing content in one of the several primary regional languages of France. A parallel does not seem to exist in English, although one might if there were an effort to build content in Scots within en.WP and across the English Wikimedia projects. - Amgine/ t·e 03:44, 9 January 2017 (UTC)
Cornish, anyone? Equinox 15:54, 10 January 2017 (UTC)

Special:Contributions/Cherusk - Russian pronunciation edits[edit]

The edits of this user are currently under my scrutiny. Apparently they first tried their luck, unsuccessfully, on the Russian Wiktionary. A regular knowledgeable user thinks it's a troll. Cherusk uses various non-standard or dialectal pronunciations and records their own audio files. In some cases they are definitely wrong pronunciations. E.g. diff. Not sure if they are good-faith edits. Considering a block if Cherusk doesn't stop. I don't have time to check each audio file but I can judge by the IPA edits. --Anatoli T. (обсудить/вклад) 03:21, 9 January 2017 (UTC)

Move character box images to Lua modules[edit]

See this diff.

I'd like to move all the images from the character boxes in each entry to Lua modules like Module:Unicode data/images/000. The charbox template ({{character info/new}}) is already prepared to find images in the modules.

Images to be moved: Category:Character boxes with images.


I think this would be a good idea, in case we want to use a database of images for each character somewhere else other than charboxes. In the future, I'm thinking of showing all the character images automatically in the Unicode tables like Appendix:Unicode/Latin Extended-A, if it's OK with everyone.

(ping: @I'm so meta even this acronym)

--Daniel Carrero (talk) 07:07, 10 January 2017 (UTC)

@Daniel Carrero: This change seems entirely positive to me. Just to make sure I understand you properly, do the numbers in the PAGENAME for Module:Unicode data/images/000 (taking your example) denote the first three numbers common to the Unicode codepoints in the range for which that page lists data? That is, does that “000” denote “U+000##” (where the first zero denotes the Basic Multilingual Plane), meaning that Module:Unicode data/images/000 covers everything from U+00000 NULL to U+000FF LATIN SMALL LETTER Y WITH DIAERESIS (ÿ) — in this case the entirety of the C0 Controls and Basic Latin and C1 Controls and Latin-1 Supplement Unicode blocks? — I.S.M.E.T.A. 10:42, 10 January 2017 (UTC)
@I'm so meta even this acronym. Good question. The answer is: I'm planning to imitate what is already done for character names (unless there is any suggestion of a different naming system). We have modules like Module:Unicode data/names/000, Module:Unicode data/names/001, etc. The full list of modules is here: Module:Unicode data. That is, I'd like the first image module (the one you asked about: Module:Unicode data/images/000) to contain all codepoints from U+00000 NULL to U+00FDA TIBETAN MARK TRAILING MCHAN RTAGS, which is the last assigned codepoint before the true upper limit: U+00FFF (unassigned). --Daniel Carrero (talk) 11:33, 10 January 2017 (UTC)
@Daniel Carrero: Oh, my mistake: Plane 16 (consisting of the Supplemental Private Use Area-B block and two non-characters) comprises codepoints U+100000–U+10FFFF, so it makes sense for “000” to denote “U+000###”. I agree with that naming system. — I.S.M.E.T.A. 14:36, 10 January 2017 (UTC)
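The naming scheme agreed on above can be sketched as a small function. This is a Python illustration only (the real modules are Lua), and `unicode_data_module` is a hypothetical name; it assumes each page covers a run of 0x1000 codepoints, named by the codepoint with its last three hex digits dropped and written as at least three uppercase hex digits.

```python
def unicode_data_module(codepoint: int, kind: str = "images") -> str:
    """Return the (assumed) module page holding data for `codepoint`.

    Hypothetical helper: each page covers 0x1000 codepoints, so the page
    suffix is the codepoint shifted right by 12 bits (three hex digits).
    """
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError(f"not a Unicode codepoint: {codepoint:#x}")
    # Drop the low 12 bits and zero-pad the rest to three uppercase hex digits.
    return f"Module:Unicode data/{kind}/{codepoint >> 12:03X}"


print(unicode_data_module(0x0FDA))           # Module:Unicode data/images/000
print(unicode_data_module(0x1000, "names"))  # Module:Unicode data/names/001
print(unicode_data_module(0x10FFFF))         # Module:Unicode data/images/10F
```

Plane 16 (U+100000 to U+10FFFF) maps to suffixes 100 through 10F under this assumption, which is why three digits are needed and why “000” has to mean U+000### rather than U+000##.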

Now, I made the images appear in the Unicode appendices. See Appendix:Unicode/Basic Latin, Appendix:Unicode/Greek and Coptic, Appendix:Unicode/Yijing Hexagram Symbols, Appendix:Unicode/Arrows... --Daniel Carrero (talk) 18:41, 12 January 2017 (UTC)

I noticed that @Octahedron80 copied the charbox template to the Thai Wiktionary (template link), which is great. Other Wiktionaries may want to copy the image modules, too. I created Category:character info/new with image as a temporary category with all the entries that are still using an "image=" parameter. I plan to delete the category once it's empty. There are still about 800 entries to go. --Daniel Carrero (talk) 10:34, 15 January 2017 (UTC)

Done. I moved all charbox images from entries to modules. I removed the "image=" parameter from {{character info/new}}, too. --Daniel Carrero (talk) 20:01, 9 February 2017 (UTC)

Well done, Dan. That's some great work you've done there. :-)  — I.S.M.E.T.A. 14:18, 15 February 2017 (UTC)
Thank you. :) --Daniel Carrero (talk) 14:21, 15 February 2017 (UTC)

Chakavian, Kajkavian, Torlakian?[edit]

I would like to add entries for Chakavian words to Proto-Slavic entries but there is no language code that I can see for this language. Realistically speaking I would say this is its own Slavic language rather than a dialect of Serbo-Croatian, although it could be argued the other way. If we add Chakavian, we should also add Kajkavian and maybe Torlakian as well. Comments? Benwing2 (talk) 01:53, 12 January 2017 (UTC)

@Ivan Štambuk and others have been content to add it as Serbo-Croatian with a dialect label. See Category:Chakavian Serbo-Croatian. Dialectal words can always be added as descendants where appropriate. —Μετάknowledgediscuss/deeds 01:02, 17 January 2017 (UTC)

What accentual notation should be used for Proto-Slavic?[edit]

@CodeCat I've been working on adding Proto-Slavic etymologies to Russian words and editing some of the Proto-Slavic reconstructed terms. Derksen 2008 uses the following notation:

  • à = short rising (old acute on an originally long vowel, neoacute on an originally short vowel)
  • á = long rising (neoacute)
  • ȁ = short falling (circumflex in some syllables, or original short accent? circumflex if the vowel was originally long, short accent otherwise), e.g. *dȅsętь "ten", *dȅsnъ "right", *dȍma "at home"
  • ȃ = long falling (circumflex in other syllables, or original short accent? circumflex if the vowel was originally long, short accent otherwise), e.g. *dȃrъ "gift", *dȏmъ "house"

The alternative notation is:

  • ȁ (maybe?) = original short accent
  • á = old acute
  • ã = neoacute
  • ȃ = circumflex

It could be argued that the latter notation is simpler; on the other hand, it may not be accurate: if you believe Derksen's view (maybe the Leiden view?), there was never a time with three contrasting long-vowel accents, and the old acute was shortened prior to the shift that produced neoacutes.

Which should we use? Benwing2 (talk) 16:05, 12 January 2017 (UTC)

I'd recommend using a notation that does not explicitly encode a particular tone contour or length, as these aspects often differed in various Slavic dialects. Instead, the tones should be indicated by their identity: the acute is the acute no matter if it's long or short. —CodeCat 16:10, 12 January 2017 (UTC)
OK. I looked at WT:About Proto-Slavic and it does have a section on prosodic notation. However:
  1. It's pretty complicated, with six different accent types (seven if we count the macron for unstressed length). Furthermore, some of them (e.g. a̍, the word-final accent) don't display very well (e.g. in edit windows, at least on my Mac, where I just see boxes).
  2. This system isn't actually being used all that much. I took a look at various nouns in Category:Proto-Slavic nominals with accent paradigm b and many of them are using word-final à and ò, as in Derksen's and Wikipedia's system (see below).
Do we want to simplify this system? Possibly we can use the system used in the noun declensions given in Wikipedia's Proto-Slavic, which seems pretty logical to me.
Also, I am thinking of modifying the module that generates Proto-Slavic declensions to include accent marks. Sound OK to you? If we do this, it's another argument for using the Wikipedia system, since it would match up the declensions directly with that system. Benwing2 (talk) 19:27, 12 January 2017 (UTC)
The stuff on WT:ASLA was mostly put there by Ivan Štambuk without there being any agreement on it. Is the Wikipedia system the same as the alternative notation you gave above? —CodeCat 19:55, 12 January 2017 (UTC)
The stuff on Wikipedia is very similar to the Derksen system above but it uses ã for the neoacute instead of á or à (on both short and long vowels). The only place where the acute accent (á) is used is on final syllables, where both acute and grave occur. This is evidently indicating a length distinction (e.g. nominative singular nogà vs genitive singular nogý) but I don't know if this is standard or not. Note in particular that the genitive singular of class B žena is notated ženỳ whereas the genitive singular of class C noga is notated nogý. Benwing2 (talk) 00:27, 13 January 2017 (UTC)
What is the reason for this difference, I wonder. —CodeCat 00:30, 13 January 2017 (UTC)
I thought somewhat about this, and this is what I concluded:
  1. The length of Late Common Slavic final accented syllables is apparently reflected in the neo-circumflex in Slovenian (but nowhere else).
  2. The length of nogý but shortness of ženỳ happens because there was at one point a shortening of unstressed final syllables (this appears to be mentioned at the end of section 7 in [4]), and at the time, nogý was stressed but ženỳ was not. The final stress in class B occurred subsequent to this as a result of Dybo's law.
  3. The reason both nogà and ženà are short is a bit subtle, but apparently it's because the vowel was acute. At the time before Dybo's law, the forms were something like žȅna and nogá, with the final vowel acute. First, post-tonic acutes were lost (hence in žȅna) leaving a short vowel (per 7.13 in [5]), which remained short when the final syllable gained stress through Dybo's law. Later on, the acute was lost generally, producing a short rising vowel (9.2 in [6]).
I'm sure not everyone subscribes to this theory, though. Benwing2 (talk) 01:26, 13 January 2017 (UTC)
Why was the final vowel of *nogý not shortened the same way the vowel of *nogá was, when both had an acute? —CodeCat 02:10, 13 January 2017 (UTC)

Out of curiosity, once you decide on a system, will someone implement the paradigms in the inflection templates (if that is possible)? —JohnC5 03:57, 13 January 2017 (UTC)

Yes, I was thinking of doing exactly that. Benwing2 (talk) 05:54, 13 January 2017 (UTC)
Great! I wish I could contribute more to expedite this discussion, but I am woefully uninformed in this matter. —JohnC5 06:06, 13 January 2017 (UTC)
So I'm tentatively going with the Wikipedia notation. I corrected the comments above about it, and I'll repeat it:

Non-word-final vowels:

  • à = short rising = old acute (only on an originally long vowel)
  • ȁ = short falling (circumflex in some syllables, or original short accent; circumflex if the vowel was originally long, short accent otherwise), e.g. *dȅsętь "ten", *dȅsnъ "right", *dȍma "at home"
  • ȃ = long falling (circumflex in other syllables, or original short accent; circumflex if the vowel was originally long, short accent otherwise), e.g. *dȃrъ "gift", *dȏmъ "house"
  • ã = neoacute


Word-final vowels:

  • à = short rising
  • á = long rising

Benwing2 (talk) 17:10, 13 January 2017 (UTC)
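The tentative notation can be read mechanically, for instance when a declension module adds accent marks. The Python sketch below is illustrative only (the real module is Lua); `accent_type` and its `word_final` flag are hypothetical names, and the labels simply restate the lists above. It uses Unicode NFD decomposition to separate each vowel from its combining mark.

```python
import unicodedata

# Combining marks used by the tentative notation.
GRAVE, ACUTE, TILDE = "\u0300", "\u0301", "\u0303"
DOUBLE_GRAVE, INVERTED_BREVE = "\u030F", "\u0311"

NON_FINAL = {
    GRAVE: "short rising (old acute)",
    DOUBLE_GRAVE: "short falling",
    INVERTED_BREVE: "long falling",
    TILDE: "neoacute",
}
FINAL = {
    GRAVE: "short rising",
    ACUTE: "long rising",
}


def accent_type(vowel: str, word_final: bool = False):
    """Return the accent label for one vowel, or None if it carries
    no mark used in that position."""
    table = FINAL if word_final else NON_FINAL
    # NFD splits e.g. 'ȁ' (U+0201) into 'a' + U+030F COMBINING DOUBLE GRAVE ACCENT.
    for ch in unicodedata.normalize("NFD", vowel):
        if ch in table:
            return table[ch]
    return None


print(accent_type("ȅ"))                   # short falling (as in *dȅsętь)
print(accent_type("ȃ"))                   # long falling (as in *dȃrъ)
print(accent_type("ý", word_final=True))  # long rising (as in nogý)
print(accent_type("ỳ", word_final=True))  # short rising (as in ženỳ)
```

Keeping two tables mirrors the notation as listed: the acute (á) appears only word-finally, while the tilde, double grave and inverted breve appear only non-finally.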

See Module talk:sla-noun. I implemented a first pass at Proto-Slavic accents (only for hard masculine o-stems). Benwing2 (talk) 06:31, 15 January 2017 (UTC)

Old French[edit]

How are we meant to distinguish between Early Old French, Old French and Late Old French? Take these:

Latin: dirēctus
Vulgar Latin: drēctus
Early Old French: dreit /dreit/
Old French: droit /droit/
Late Old French: droit /drwe/
Middle French: droit /drwe/
French: droit /dʁwa/
There's only one Wiktionary page and that's for the Old French form, not the early or late ones. ÞunoresWrǣþþe (talk) 10:20, 14 January 2017 (UTC)

As far as I know, Early Old French and Late Old French are both subsumed under just Old French. KarikaSlayer (talk) 15:44, 14 January 2017 (UTC)
Perhaps labels could be used to distinguish between them, just like Ancient Greek has labels for Attic, Koine and Byzantine, for example. — Kleio (t · c) 18:30, 14 January 2017 (UTC)
That'd be great; they really are rather different, mainly since they cover almost a millennium (like Greek)! ÞunoresWrǣþþe (talk) 18:38, 14 January 2017 (UTC)
You could do pronunciation data like it was done for Ancient Greek (here's an example) and for spelling variants you might do something like:
# {{form of|early spelling of|droit|gloss=[[right]]|lang=fro}}
While keeping all the pronunciation data at the main entry. Crom daba (talk) 02:12, 15 January 2017 (UTC)
Is there no alternative to {{form of}} here? That template should really only be used for one-off things, any kind of form-of message that could appear in many entries should have its own template. —CodeCat 02:14, 15 January 2017 (UTC)
Creating the equivalent template is left as an exercise for the reader. Crom daba (talk) 03:04, 15 January 2017 (UTC)
FWIW, I'd consider the pronunciation /drwe/ as Middle French, not Late Old French. Also, Old French droit doesn't derive from Latin dirēctus, it derives from dirēctum. The reflex of dirēctus is OF droiz.
For dreit vs. droit, you might consider a label {{lb|early}} (or maybe {{lb|Early Old French}}), since it's more a dialectal than a spelling difference. Benwing2 (talk) 04:03, 15 January 2017 (UTC)
It doesn't matter what you consider it, Late Old French droit was pronounced /drwe/. Middle French had the same pronunciation, but all the relevant sound changes had already happened by 1350. And it doesn't really matter what it derives from, as droiz ended up as a homophone anyway. Not to mention both are just inflections, not separate words. ÞunoresWrǣþþe (talk) 18:56, 17 January 2017 (UTC)
It's not a dialectal difference, one is a descendant of the other. Hwaet isn't a dialectal version of what (the same timespan separates both hwaet and what and dreit and droit). ÞunoresWrǣþþe (talk) 19:00, 17 January 2017 (UTC)

About "Glyph origin"[edit]

Some Chinese and Translingual entries have a "Glyph origin" section. Random example: . A search for "glyph origin" (link) currently returns 6,071 results.

If people want to use that section, at least I'd like it to be mentioned in WT:EL. --Daniel Carrero (talk) 05:36, 15 January 2017 (UTC)

I support adding it. —Μετάknowledgediscuss/deeds 00:59, 17 January 2017 (UTC)
Does this not conceptually overlap with the Description header that e.g. explains why the biohazard symbol looks like it does? Equinox 14:32, 17 January 2017 (UTC)
Actually, I think that "Glyph origin" overlaps with "Etymology", doesn't it? See the entry 水#Chinese (only the Chinese section has a "Glyph origin" subsection at the moment). It contains a table "Historical forms of the character 水". Should it be moved to "Etymology" in all entries? --Daniel Carrero (talk) 14:40, 17 January 2017 (UTC)
Glyph Origin and Etymology are different. See ⿱成龍 for an example. —suzukaze (tc) 22:26, 17 January 2017 (UTC)

I created Wiktionary:Votes/2017-02/Glyph origin. --Daniel Carrero (talk) 18:46, 6 February 2017 (UTC)

Morphological dictionary for Lithuanian or Serbo-Croatian?[edit]

Does anyone know of a good morphological dictionary for Lithuanian or Serbo-Croatian, comparable to Zaliznyak's book on Russian? (Grammaticheskii Slovar’ Russkogo Iazyka, aka Russian Grammar Dictionary) It should include in particular the accent patterns of words. I imagine such a thing might be written in Lithuanian or Serbo-Croatian, which isn't ideal as I don't speak either language, but I can potentially puzzle it out esp. with the help of a speaker. (I puzzled out Zaliznyak's book not really knowing Russian either, with the help of Google Translate.) Benwing2 (talk) 00:05, 17 January 2017 (UTC)

The best monolingual dictionary for Lithuanian is, by far, this one. My favourite grammar of Standard Lithuanian is Mathiassen, but there are others that are good as well. Send me an email if you need further resources. —Μετάknowledgediscuss/deeds 00:59, 17 January 2017 (UTC)
For Lithuanian you can also try these six dictionaries. --Vahag (talk) 06:19, 17 January 2017 (UTC)

Appendix for Russian patronymics[edit]

I think we need a special page for Russian patronymics. I didn't get around to fixing notes in -ович, -евич, -ыч, etc. The rules and exceptions might need to go into a separate page, rather than being described on each patronymic suffix entry. @Erutuon, thanks for your attempt, though.

Calling on whoever might be interested in collating the rules @Cinemantique, Benwing2, Wikitiki89, Erutuon, KoreanQuoter, Wanjuscha, Stephen G. Brown, CodeCat, Vahagn Petrosyan. Things to consider are formal patronymics (used in documents), colloquial forms and abbreviations, variants, irregular pronunciations (feminine -чна is consistently pronounced irregularly as -шна, as in Ильинична), stress patterns, declensions and categories, use of foreign names in the former USSR space and overseas, occasionally for expats living in Russia, naturalised migrants, usage. @Benwing2, does Zaliznyak cover this topic? @Cinemantique, it would be beneficial if the Russian Wiktionary covered this as well. --Anatoli T. (обсудить/вклад) 06:52, 17 January 2017 (UTC)

I agree, but I don't have time to work on it right now. --WikiTiki89 15:39, 17 January 2017 (UTC)
I don't think Zaliznyak covers this topic. I agree it would be great to have such an appendix. Benwing2 (talk) 08:47, 19 January 2017 (UTC)

Old Norse[edit]

How should we handle the evolution of "Old Norse"? For example, following the word vreka from start to finish:
Proto-Germanic: /wrekɑnɑ̃/
Common West Scandinavian: /wrekɑ/
First Grammarian’s Icelandic: /wrɛkɑ/
Classical Old Icelandic: /vrɛkɑ/
Icelandic: /rɛka/

Common West Scandinavian, First Grammarian’s Icelandic and Classical Old Icelandic all count as Old Norse yet have very different pronunciations. How should this be represented in the entry for vreka? ÞunoresWrǣþþe (talk) 21:08, 17 January 2017 (UTC)

@ÞunoresWrǣþþe: Are you making an Old Norse pronunciation module? Also, what are your sources for these pronunciations? —JohnC5 21:25, 17 January 2017 (UTC)
No, I'm asking how we should differentiate the different pronunciations the word had in Old Norse. My sources are Old Icelandic: Its Structures and Development by Hreinn Benediktsson, The First Grammatical Treatise: The earliest Germanic Phonology by Einar Haugen, The Nordic Languages, An International Handbook on the History of the North Germanic Languages by Oskar Bandle and primary sources. ÞunoresWrǣþþe (talk) 22:51, 17 January 2017 (UTC)
You can use {{a}} to specify a specific accent. DTLHS (talk) 22:52, 17 January 2017 (UTC)
It's not an accent, they are separate, descendant languages. "Old Norse" is a blanket term that frankly shouldn't really be used given how many sound changes occurred in it. ÞunoresWrǣþþe (talk) 23:06, 17 January 2017 (UTC)
This problem exists for all ancient languages. Ancient Greek, for example, covers 2000+ years. They've handled the changing pronunciations quite well, I think, for that language. Benwing2 (talk) 23:13, 17 January 2017 (UTC)
Some can understandably be exempted, like Old English. The only sound changes in the seven-century span that I can think of are syncope towards the end, and unstressed e's becoming schwas and eventually being dropped. For something like wrekɑ > vrɛkɑ, some sort of note of the different pronunciations seems like it would be useful. ÞunoresWrǣþþe (talk) 23:21, 17 January 2017 (UTC)
Several points: phonological change is not the only thing that determines whether two lects get considered separate languages. Indeed, many dialects of modern English are as far apart in pronunciation as these historical periods of Old Norse, but they are still English. There is also disagreement about how sequences such as ⟨hv⟩, ⟨hr⟩, ⟨hl⟩, ⟨hn⟩, and ⟨-r⟩ would have been pronounced at different times. —JohnC5 23:29, 17 January 2017 (UTC)
Are you actually implying that PGM /e/ was [e] and lowered to [ɛ]? Korn [kʰũːɘ̃n] (talk) 23:58, 17 January 2017 (UTC)
Oh lol, I didn't even notice that! —JohnC5 00:25, 18 January 2017 (UTC)
@ÞunoresWrǣþþe: Why not just add all three of those Old Norse pronunciations (Common West Scandinavian, First Grammarian’s Icelandic, and Classical Old Icelandic) to the Old Norse entry in question? — I.S.M.E.T.A. 14:42, 15 February 2017 (UTC)

Policy proposal for LDLs.[edit]

In the light of some recent events, and since everyone expects me to do it for some reason, I'd like to toss something into the ring concerning uncodified languages which lack extensive written coverage yet have a great variety of written forms, and thus present us with a problem when it comes to inclusion. We could make at least two dozen entries for some words; we know some words exist in some languages, but they are never used in a CFI-compliant way, only recorded in IPA in professional works. The idea I want to put forward is to organise entries for such languages according to their last standardised form, even if it is only standardised by the scientific community. For example, the regional Nordic varieties would be standardised around Old Norse normalisation. To give an example from a field I actually understand: Normalised Middle Low German has fîf (5); modern forms are fif, fief, fiv, fiev, fiw, fiew, fiwe, five, etc. As long as we can reasonably decipher the origin of a word, we would make a hub entry based on the normalised form (fîf after Middle LG), and everything else would link to it as an alternative form. Do you think this is feasible? Do you see issues? Should we have it in main entries or in an appendix? What are your thoughts? Korn [kʰũːɘ̃n] (talk) 11:03, 18 January 2017 (UTC)

@Korn: This is an excellent question and one that I don't think is really answered outright in Wiktionary:Criteria for inclusion. We are to include terms that one would "run across" but for dead languages, no one would "run across" anything except in print. Similarly, for a living language which is generally not documented (or at least, generally not documented with print, a la American Sign Language), one could definitely "run across" all manner of words which simply aren't encoded in written language. So there are verifiable variants of words which someone would never encounter unless he were reading 13th-century Scottish land transactions and there are words which are used on a daily basis by communities which have used a language for thousands of years. It's not really clear to me what exactly Wiktionary is attempting to include. If it's all terms one would run across then documenting oral/sign-only languages is virtually impossible. If it's all written encodings of terms that one can verify in print or digital media, then we will inevitably have huge numbers of variant spellings of the same term which will be very difficult to harmonize. If we want to do both, then we have both problems. As someone who edits here regularly but doesn't really get into the nuts and bolts of the site, this has confused me for years. —Justin (koavf)TCM 15:34, 18 January 2017 (UTC)
Yes, we have many alternative spellings, and for extinct or minority languages, it is common practice to choose a single orthographic standard to lemmatise. Also, we do cover sign languages. —Μετάknowledgediscuss/deeds 15:48, 18 January 2017 (UTC)
@Metaknowledge: Either I was unclear or your response was a little hasty: I realize that we cover sign languages. The point that I was making is that there are very common terms in many languages (most of them) which are difficult if not impossible to include here on a practical and technical level. There are speakers of languages which have no script at all, so that language certainly does exist, and if you were in that community, you would run across myriad terms which we cannot include here since there is no way to write that language. (Or to put it alternately, there are many ways one could write it but none of those has been adopted by that community.) The same is true of sign languages which are only documented in audio/visual media, whose communities do not use the methods for written notation that a few specialists have devised. They could; they simply choose not to. Prioritizing one form is fine, but it does leave the possibility that the content at "color" and "colour" will drift and change over time in ways that we wouldn't want: these two spellings refer to the same word and concept. (We definitely would want different quotations/attestations but we would not want different definitions or translations. The etymologies would be slightly different, of course.) And of course, prioritizing one form over another is compounded by languages which use or have used different scripts: the Serbian version of Bosnian/Croatian/Montenegrin/Serbian uses Cyrillic and Latin alphabets. Kurdish variations use Arabic-based and Latin-based scripts, plus they formerly used Cyrillic-based scripts. Lithuanian was briefly written with Cyrillic characters, so there are Cyrillic versions of "name" or "house" but not of "computer". So the two difficulties I was trying to explain above are that 1.) there are many terms which we cannot include in principle as this is a written dictionary, but those terms presumably could or should be included, and 2.) there are many variant forms (spellings or transliterations) which we have to maintain to keep from forking into separate definitions. These two problems are pretty fundamental to the project, but it's not clear to me what the solutions are. (I recall discussion on the latter at one of the discussion forums from last year.) There may not be any one silver-bullet solution (I suspect there's not), but as someone who has used this site on a daily basis for years, I don't know what the community has decided about these two problems. —Justin (koavf)TCM 16:05, 18 January 2017 (UTC)
I agree that forking should be minimised; we make exceptions for issues like American vs Commonwealth or Serbian vs Croatian, but generally we choose one standard or script to lemmatise at. I doubt they've been created, but any Cyrillic Lithuanian that meets CFI should be added as an alternative spelling with an appropriate label, just like we have Afrikaans entries in Arabic script, etc. —Μετάknowledgediscuss/deeds 16:10, 18 January 2017 (UTC)
Why are there exceptions? Korn [kʰũːɘ̃n] (talk) 16:48, 18 January 2017 (UTC)
For political reasons. Crom daba (talk) 17:35, 18 January 2017 (UTC)
organise entries of such languages according to their last standardised form (…) For example the regional nordic varieties would become standardised around Old Norse normalisation
Is this particular example somewhat off-center, in that modern Nordic regional varieties generally aren't very close to Old Norse? Do I read you right that you'd want to eliminate separate Elfdalian or Jamtish entries, and replace them with pronunciation footnotes in Old Norse entries? This sounds like a poor idea, and also one that does not particularly generalize: most of the time unwritten language varieties simply have no earlier written standard to fall back on, and in many cases where there is one, it would take us impracticably far back (e.g. with modern Aramaic, modern West Iranian, some modern Indo-Aryan, or even some modern Romance varieties, the closest written direct ancestors are going to be all the way back in antiquity). But maybe I'm misunderstanding.
In any case, I agree with the main observation: many LDLs are often going to be essentially un-CFI, if not unverifiable from uses altogether. I wonder if a first step should be to divide LDLs into a couple subcategories. The following seem reasonably distinguishable:
  1. living languages with an established literary tradition but poor online representation (e.g. Mon, Javanese)
  2. living languages with at most a nascent, unstandardized literary tradition (e.g. Elfdalian, Karelian, Tuvan, Tulu, Navajo)
  3. extinct/liturgical languages with a reasonably-sized corpus (e.g. Ancient Greek, Akkadian, Coptic, Old Tupi)
  4. extinct languages with only fragmentary records known (e.g. Oscan, Phrygian, Crimean Gothic, Old East Japanese)
  5. extinct unwritten languages documented only in linguistic works (e.g. Mator, Sireniki, Yurok, Mbabaram)
I expect #2 will be the largest category by number of languages (and, therefore, potential entries), and the default LDL policies should probably be fine-tuned for them. Non-standard dialects could probably also follow the same guidelines. The majority of current LDL entries are however probably from #1 and #3, which may require their own separate CFI standards. Lastly, I am not convinced that #4 and especially #5 should be covered in mainspace at all. --Tropylium (talk) 16:25, 20 January 2017 (UTC)
Sorry, to clarify: the proposal specifically was that our default approach be that we devise our own normalisations based on the normalisation/standard of the closest ancestor, even if a divergent but non-exhaustive (doesn't cover enough spoken language) and/or chaotic (differs somewhat strongly from author to author) written form exists. Korn [kʰũːɘ̃n] (talk) 22:00, 20 January 2017 (UTC)
I'm most interested in Middle English and Scots, with respect to this issue. Both have decent sized corpuses, but huge variability in spelling, including in the case of Middle English many works published in spellings that have been normalized towards modern English. Having only normalized entries with Middle English isn't going to help the reader much.--Prosfilaes (talk) 21:26, 22 January 2017 (UTC)
Nobody said 'only'. They become the entry; everything else links back to it. That avoids having fifteen entries, or multiple discussions about which form to consider the entry and which an alternative form. Korn [kʰũːɘ̃n] (talk) 22:05, 22 January 2017 (UTC)

@I'm so meta even this acronym: I guess this qualifies as 'gets going'. Korn [kʰũːɘ̃n] (talk) 11:39, 23 January 2017 (UTC)

@Korn: Thanks for the ping. I shall mull over this a bit, and then contribute. — I.S.M.E.T.A. 12:58, 15 February 2017 (UTC)

Getting rid of the idiomatic tag

In my opinion, it doesn't carry any useful information whatsoever, and in many (if not all) instances it could best be replaced by the figuratively tag; in fact, that's what I've been doing, and I'm sure nobody would break a sweat if I kept doing that silently. What do you think? --Barytonesis (talk) 19:57, 18 January 2017 (UTC)

Support, though it can't always be replaced with "figuratively". Figuratively requires a contrasting literal meaning, but we don't include literal meanings because they are SoP. Therefore, every term consisting of multiple words that is includable is implicitly not literal. I am not convinced that the "figuratively" label is useful for single-word terms either. For me, there are no literal and figurative senses of a word, there are just senses. —CodeCat 20:12, 18 January 2017 (UTC)
Sometimes it seems to have been used where colloquial would be more appropriate, e.g. bucket down. Equinox 20:14, 18 January 2017 (UTC)
Ofttimes, colloquial is used where informal would seem better. I've thought that colloquial indicated a set of situations, not a register. DCDuring TALK 20:43, 18 January 2017 (UTC)
Might be nice to create a style guide that connects each of the many labels to how we, collectively, use them and expect them to be used. - TheDaveRoss 21:38, 18 January 2017 (UTC)
I support removing it in general, but I think it should be done carefully in a way that ensures that if some tag is needed, then a good replacement is found. --WikiTiki89 21:56, 18 January 2017 (UTC)
What about cases in which a word can have a certain meaning in a (limited) number of set phrases but only in them? I'd use idiomatically in such cases, would this also be removed? Crom daba (talk) 00:46, 19 January 2017 (UTC)
I would label that "in idioms". --WikiTiki89 15:39, 19 January 2017 (UTC)
I think there's at least some usefulness to the idiomatic tag. For one thing, we have the {{&lit}} template that opposes it. For another, for phrasal verbs like get down or throw up, some of the definitions given follow logically from the sum of their parts (even if it's not completely obvious a priori that the phrases can be used in that fashion) and others certainly don't. For example, get down meanings #1, 3, 4, 6, 8 and 9 are at least partially non-idiomatic, whereas #5 and 7 are certainly idiomatic. Similarly, throw up meanings #1, 4 and 5 are more or less non-idiomatic, whereas #2 (to "vomit") is almost prototypically idiomatic (and I would dispute that it's colloquial; I think it's actually quite neutral in register). Benwing2 (talk) 03:35, 19 January 2017 (UTC)
Usually it can and should just be removed. - -sche (discuss) 11:02, 20 January 2017 (UTC)

Possibly useful site for Early Modern English attestation

See here. This site is part of Zooniverse, which hosts some people-powered research that is a little tangential to what we do here (somewhat similar to v:), and this project attempts to transcribe all manner of works from the time of Shakespeare to provide context for his plays. It could also be useful for those of us looking for spellings or older instances of words. —Justin (koavf)TCM 21:19, 18 January 2017 (UTC)

@Koavf: Lovely! Thanks for posting this link. — I.S.M.E.T.A. 13:22, 15 February 2017 (UTC)

Arabic consonant patterns

Most languages have suffixes or prefixes, which are easy to write entries on, but Arabic has consonant templates. Perhaps we could add a new POS type under the category of morpheme (alongside prefix, suffix, circumfix, root, and so on): perhaps Pattern. Then, using the root traditionally used to describe morphological forms, ف ع ل (f-ʿ-l) (and its variations), entries could be created for patterns such as فَعَلَ (faʿala), فَعِلَ (faʿila), فَعِيل (faʿīl), فَاعِل (fāʿil), فَعَائِل (faʿāʾil), فَعَائِيل (faʿāʾīl), فَعَى (faʿā), فَعَا (faʿā), فَاعَ (fāʿa), فَعْلَقَ (faʿlaqa), and their various meanings and alternative forms could be described, in much the same way that the various meanings and variations of a suffix like -y are described. As with prefixes and suffixes, there would be a main category, Category:Arabic patterns, and subcategories like Category:Arabic noun-forming patterns, Category:Arabic verb-forming patterns, Category:Arabic diminutive patterns, and so on.

The alternative, I suppose, would be an appendix that lists these patterns and defines them. But I think it would be unnecessarily biased against nonconcatenative morphology to not have the patterns described in the main namespace in the same way as concatenative morphemes such as prefix and suffix are.

I wonder, has this idea been proposed before? @CodeCat, Atitarev, Wikitiki89, Mahmudmasri, Benwing, ZxxZxxZ, any thoughts? — Eru·tuon 23:17, 18 January 2017 (UTC)

These are in Hebrew too (binyan), and maybe other Semitic languages. Equinox 23:20, 18 January 2017 (UTC)
See w:Transfix. —CodeCat 23:47, 18 January 2017 (UTC)
So I guess the question is, shall we use transfix or pattern for this morphological concept? Pattern is a translation of وَزْن (wazn), I think, so it would be more traditional. — Eru·tuon 03:43, 19 January 2017 (UTC)
I'd rather use transfix, as it's the linguistic term, and also more distinctive than "pattern". It also fits better with other kinds of -fixes. —CodeCat 14:11, 19 January 2017 (UTC)
@CodeCat: In all the linguistic literature I've read about Semitic languages, I have never encountered the word "transfix". They usually either use the word "pattern" or a borrowing from one of the languages, such as "wazn" or "mishqal". Sometimes other words like "mold" are used. @Erutuon: "Pattern" is not a translation of وَزْن (wazn), but simply an English description of the phenomenon. A translation of وَزْن (wazn) would have been "weight" (the Hebrew מִשְׁקָל (mishkál) is probably a translation of وَزْن (wazn)). --WikiTiki89 15:53, 19 January 2017 (UTC)
Thanks for tagging me. I would rather go for pattern, but it appears that transfix had already been chosen. --Mahmudmasri (talk) 19:06, 19 January 2017 (UTC)
Category:Arabic patterns is a really bad name. It's far too ambiguous a term, "pattern" can mean many things. We should choose a name that is clear. —CodeCat 19:09, 19 January 2017 (UTC)
It's not ambiguous if it's clearly explained on the page. You're the only one who supports "transfix" and you don't even work with these languages. We already have Category:Hebrew terms by pattern, if you want to see how it could look. --WikiTiki89 19:15, 19 January 2017 (UTC)
We are already (to some extent) doing this for Hebrew. See the root and pattern links (and categories) at the entry מלכה. I've been considering doing this for Arabic as well. --WikiTiki89 23:49, 18 January 2017 (UTC)
Lexical content shouldn't be in appendixes, I think. Appendixes can be used to clarify a topic or give an overview of forms, but the actual forms should still have real entries. —CodeCat 23:52, 18 January 2017 (UTC)
But the patterns are often only quasi-lexical. Sometimes they have well-defined meanings, other times they are just happenstance. --WikiTiki89 00:07, 19 January 2017 (UTC)
Perhaps we could establish official criteria to distinguish happenstance from lexical patterns. Perhaps we could classify them as lexical if an identifiable meaning for the pattern can be discerned in at least three terms using the pattern. This criterion would easily apply to the patterns I mentioned above. But the pattern فِعَال (fiʿāl) of كِتَاب (kitāb) might not, so it might have to go in an appendix. (On the other hand, another use of the pattern, for an adjectival broken plural, as in جِمَال (jimāl), might qualify as lexical.) — Eru·tuon 03:43, 19 January 2017 (UTC)
@Erutuon: I think it would be confusing to have some patterns in one place and others in another place. It would be better to have them either all in the main namespace or all in the appendix. Another reason I don't like the idea of having them in the main namespace is because they contain placeholder consonants that are not really part of the morpheme. That gets especially confusing if it coincides with an actual word. --WikiTiki89 15:53, 19 January 2017 (UTC)
I support the idea, but this is already happening (Erutuon must be aware of this, judging by the edits): Category:Arabic_roots. Ideally, all Arabic terms that have {{ar-root}} in their Etymology sections should be categorised by their root letters for ease of locating them. E.g. تَكْفِير (takfīr) is formed from the root letters {{ar-root|ك|ف|ر}} (k-f-r) and should be findable under k-f-r, even though it starts with "t" (ت) alphabetically. --Anatoli T. (обсудить/вклад) 00:03, 19 January 2017 (UTC)
In Arabic, we are doing it for roots, but not for patterns. --WikiTiki89 00:07, 19 January 2017 (UTC)
Ah, I see, patterns. Yes, I support this as well. Loanwords, if they are not formed by using a pattern, should somehow be excluded. I think it's better to use romanised names for categories, e.g. Category:Arabic fāʿil pattern, which may belong to other pattern groups. --Anatoli T. (обсудить/вклад) 00:11, 19 January 2017 (UTC)
It should be mentioned that for verbs it's possible to categorize any verb with a conjugation template (which is essentially all of them) by root already -- {{ar-conj}} automatically computes the root of a verb as part of its operation, and AFAIK there are no loanword verbs in Arabic. Benwing2 (talk) 03:22, 19 January 2017 (UTC)
برج says it was borrowed, or should the noun and verb go in different etymology sections? (also has a descendant from the same term from which it was supposedly borrowed). DTLHS (talk) 03:33, 19 January 2017 (UTC)
The verb appears unrelated to the noun and IMO should be in a different etymology section. OTOH, now that I think of it, there are clearly verbs that are indirectly borrowed, e.g. تَفَلْسَفَ (tafalsafa, to philosophize), although in this case it's formed from a borrowed noun and it could still be argued that it was formed by extracting a root f-l-s-f and applying a verbal pattern to it, and the root thereby acquired a meaning "philosophy". Hebrew similarly has לסבסד (lesabsed, to subsidize), where the same thing could be said, I think. Benwing2 (talk) 03:40, 19 January 2017 (UTC)
@Benwing2: Yes, I think the best way of looking at it is that the root was extracted from the noun and now exists on its own. The decision for Hebrew was that categorization by root is synchronic, so it doesn't matter if the word came from the root or the root came from the word or whatever else. --WikiTiki89 15:53, 19 January 2017 (UTC)
@Erutuon: Am I correct in interpreting a term's pattern as its discontinuous series of vowels which dovetails with its consonantal root? — I.S.M.E.T.A. 13:15, 15 February 2017 (UTC)
@I'm so meta even this acronym: Close. A pattern can also contain additional consonants that are not part of the root. --WikiTiki89 16:48, 15 February 2017 (UTC)

I've now created {{transfix}}, which works like the other morphology templates. It has a corresponding category template {{transfixcat}}. —CodeCat 15:00, 19 January 2017 (UTC)

Per-sense genders

We currently have mechanisms to indicate the genders of nouns in the headword, and we have the ability to indicate multiple genders. However, there are occasionally multi-gender nouns that have the same etymology and thus are one word (i.e. one "Noun" header and headword, one inflection table) but for which the genders are different for different senses. We have no method of showing this, currently. Context labels don't seem appropriate, because they set a context in which a certain sense applies, while this is the opposite: rather than "when feminine, the noun means X", this is "when the noun means X, it is feminine", cause and effect are the other way around. So how should we deal with it? —CodeCat 15:18, 19 January 2017 (UTC)

We had this conversation last year; see this discussion. The upshot is that multiple noun headers works fine in most cases. —Μετάknowledgediscuss/deeds 15:42, 19 January 2017 (UTC)
I don't find that solution satisfactory at all, so I'd like to ask for a better one. —CodeCat 16:24, 19 January 2017 (UTC)
@CodeCat: I agree that that is not a satisfactory solution. I tend to use context labels, which is the method you initially criticised, because I know of no better solution/presentation. — I.S.M.E.T.A. 13:27, 15 February 2017 (UTC)
Another one we've used is to mark both genders in the headword line and then add {{lb|fr|masculine}} and {{lb|fr|feminine}} to the appropriate senses. --WikiTiki89 16:28, 19 January 2017 (UTC)
How about two noun sections? I don't have an example, but if die Leiter and der Leiter had the same etymology, there could be two Noun sections (===Noun=== or ====Noun====) and two headword lines. Also, in German it would be a mess to use one headword line for both genders, like {{head|de|noun|g=m|g2=f|....}}, as the word declines differently depending on the gender (der Leiter, gen. des Leiters, pl. die Leiter; die Leiter, gen. der Leiter, pl. die Leitern). - 15:47, 22 January 2017 (UTC)

User keeps adding nonstandard Dutch diminutives

The IP user that speaks in gibberish, User:, keeps adding nonstandard dialectal diminutives to entries. I've asked them to stop, but just received more gibberish in response and they've continued to do it. Can someone make the point more clear, please? —CodeCat 13:21, 22 January 2017 (UTC)

It's good that they add some Belgian Dutch coverage, of which we don't have a lot on Wiktionary, and for that reason I've been working a bit with this anon to improve the entries they created and I don't want to be too harsh about some mistakes they make while editing. But rules are still rules, and their writing style (which is due to a strain injury which makes it difficult for them to type, so they use a shorthand which can look like gibberish) is just a tad too abbreviated to be easily understood, which can be annoying. — Kleio (t · c) 15:34, 22 January 2017 (UTC)
What's nonstandard and dialectal about snottebelleke, mantelzorgerke?
-ke, -ken, -tje mention a Dutch suffix -ke without saying anything like nonstandard or dialectal. Let's assume it is dialectal. Is it used in regular Dutch, or just in dialects, or in dialects of other languages like Dutch Low German? Compare for example with German -li. In usual German it's uncommon; in dialectal German (Alemannic German, not regular German) it is used, and in Swiss German (regional German but still regular German) it's sometimes used as well. Even Duden has an entry, for example for Hörnli (www.duden.de/rechtschreibung/Hoernli). If Hörnli had the meaning little horn in (Swiss) German, it would be a regional diminutive of Horn, but would still be regular German and not just dialectal. The problem then would be Wiktionary's habit of adding diminutives. Diminutives and gendered forms usually are derived terms: Hörnlein is Horn + suffix -lein (and umlaut), Lehrerin is Lehrer + suffix -in. But when adding some derived terms like diminutives in the header, I can't see any reason to exclude regional, e.g. Swiss German, diminutives. Also, why are diminutives added in headers but not adjectives like friend (... adjective friendly), abstract nouns like green (.... abstract greenness) or collectives like Brüder (... collective Gebrüder)? It would make more sense to mention derived and related terms (diminutives, adjectives, collectives) only as derived and related terms respectively and not in headers.
As for Horn and Hörnli: Duden only has the figurative meaning croissant (compare with Hörnchen) for Hörnli, and I haven't seen Hörnli meaning little horn in (Swiss) German. If it doesn't have the literal meaning little horn in (Swiss) German, then it doesn't fit as diminutive. But if it does have that meaning, it would be a diminutive of Horn.
As for snottebelleke, mantelzorgerke: when I searched Google Books, there weren't any results for them, so they could fail WT:RFV. Then they don't belong in any entry. But if they are attested, I can't see any reason for not adding them as diminutives.
- 15:47, 22 January 2017 (UTC)
Well Brabantian is mainly spoken, so one would need to go recording people (next Google project?) 16:50, 22 January 2017 (UTC)
Songs and films also count as attestations, but unpublished recordings and Youtube clips don't. Crom daba (talk) 02:23, 23 January 2017 (UTC)
De_Koninck_(bier), see bolleke. Of course, most Brabantian words have a diminutive, but it's rare to see them in print (Youtube would be fabulous actually for spoken lects!) As for unpublished recordings, I see the problem (really something for Google if you ask me with respect to documenting spoken lects). 09:04, 23 January 2017 (UTC)
ps.byATESTNalsoQUITEAFEW SN.DIMz'dstrugl as1.DICentryznocount(USAGduz!2.thoweNATIVSPEAKRSalno'owthey=formdd,noal=usdmuch(letalonin ritin>shalwestartTAKEOUT CCsDIMtilshealATESTSm??(owmanyANGELSonpinzpoint gen,sai)INSTEDofADINCONTNT(evnSNpart=dirtpoorfulofomisnz(butlilwondrgivnabuv..:((~~

I've changed two of Sven's posts to regular spelling. Lingo Bingo Dingo (talk) 13:00, 23 January 2017 (UTC)
For words that get used in some kind of standard language, be it in Belgium or the Netherlands, I'd prefer putting the standard diminutive ahead, even if it's hard to attest. But I don't see much point in giving preference to a fictional standardised form if the word isn't used in standard lects to begin with. Lingo Bingo Dingo (talk) 13:00, 23 January 2017 (UTC)

atestatnVANDALIZDinfo:https://www.google.be/search?client=firefox-b-ab&q=dikkenekske&nfpr=1&sa=X&ved=0ahUKEwjUy-GQttjRAhWJOBoKHWvAAdEQvgUIGigB;nowCC"standed"rplacmnt:https://www.google.be/search?client=firefox-b-ab&q=dikkenekske&nfpr=1&sa=X&ved=0ahUKEwjUy-GQttjRAhWJOBoKHWvAAdEQvgUIGigB#q=dikkenekje&nfpr=1&start=10<ZILCH(naturaly,ashewasMESINinBRABentry(ivsenpplBLOKD4les..(ndaRUBISHstilstands:(https://en.wiktionary.org/wiki/dikkenek 62.235.174.135 20:22, 23 January 2017 (UTC)

a.SN-ke formsWAYmorIRegular,fe.baasje><bazeke<aditionalNED2v'm! 13:48, 25 January 2017 (UTC)

morATESTATN(vers van de pers!:13, 2013 - Urbanus roept prins Laurent op het 'matteke' · Te weinig leraren in Antwerpse scholen door griepepidemie · Zeker zes doden bij schietpartij in ... Hele gemeente in shock na tragisch ongeluk Renske (4) - Vlaams ...<thezFORMSnedACOMODATEDinWT,somuch=clear(OWecanDEBAT~,sur(a.~ie,tjie etc 81.11.219.200 14:49, 1 February 2017 (UTC)

Strengthening connections between entries

We have so many stubs, isolated pages, and one-way links that I'm wondering what we can do to strengthen the connections between our entries. So many of our Spanish entries are just a link to the plural and the English word. One idea is generating lists of one-way links within a language (related terms, etc.), from translation tables to entries, or (especially) between etymology sections (descendants sections are usually underpopulated or nonexistent), but I think we should discuss other ideas. For example, I can't find it, but isn't there a kind of subpage we can link to for related terms in the way we sometimes do for descendants? Is that worth pursuing for this purpose? What other ideas do y'all have? Ultimateria (talk) 12:41, 23 January 2017 (UTC)

My entry-generation program tries to find related terms to add to new entries. It works like this: suppose entries X, Y and Z have a related terms section with a link to W; when the program generates the entry for W, it adds a related terms section with links to X, Y and Z.
But in practice very few new entries ended up having automatic related terms, because existing related terms are usually links to entries that already exist. I could change it to find new related terms that could be added to existing entries, if there’s enough interest. — Ungoliant (falai) 13:52, 23 January 2017 (UTC)
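As a minimal sketch of the reverse lookup described above (in Python, assuming each entry's related terms have already been parsed into a dictionary; `invert_related` and `missing_backlinks` are hypothetical names, not the actual entry-generation program), the same inversion can also list links that existing entries fail to return:

```python
from collections import defaultdict


def invert_related(related: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each linked term W to the entries whose related-terms section links to W."""
    inverse: dict[str, set[str]] = defaultdict(set)
    for entry, targets in related.items():
        for target in targets:
            if target != entry:  # ignore self-links
                inverse[target].add(entry)
    return dict(inverse)


def missing_backlinks(related: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each existing entry, list entries that link to it but are not linked back."""
    result: dict[str, set[str]] = {}
    for entry, linkers in invert_related(related).items():
        if entry in related:  # only suggest additions to entries that already exist
            missing = linkers - related[entry]
            if missing:
                result[entry] = missing
    return result
```

For a new entry W linked from X, Y and Z, `invert_related` yields {X, Y, Z}; `missing_backlinks` covers the existing-entry variant. Any suggested link would still need a human check before being added.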
@Ultimateria: You probably know this but just in case you didn't or others didn't, Special:DeadendPages and Special:WantedPages are reports which have some relevance to what you're proposing. (biste has almost 5,000 incoming links!) —Justin (koavf)TCM 18:14, 23 January 2017 (UTC)
@Ungoliant: That's roughly what I was thinking; often a Romance verb will have 2-5 related entries, none of which link back to it. We could do something there. Especially with English basic lemmas which could link to dozens more pages, say, entries that link to them in the headword.
@Koavf: I have used those pages at times, but the ones I'm even more concerned about are Special:OrphanedPages, and I think it would be helpful to also create lists of pages by language that are orphaned by other lemmas, ie, that aren't just linked to from their own inflections. Ultimateria (talk) 11:14, 24 January 2017 (UTC)
One idea would be to populate related links from different Wiktionary editions, so for Spanish entries take data from es.wikt. Doing this completely automatically is tricky, but it could perhaps be implemented as a tool which runs on demand and under supervision. – Jberkel (talk) 18:40, 23 January 2017 (UTC)
  • This sounds like our entries need building out, in general -- any entry or POS subsection that just provides a single one-word gloss strikes me as quite deficient. How is the term used? What shades of meaning does it include? Are there slang senses? Etc., etc. ‑‑ Eiríkr Útlendi │Tala við mig 18:46, 23 January 2017 (UTC)
I'm skeptical of any automatic solution. One thing that helps is to enable User:Dixtosa/nearby.js since related terms will often be nearby alphabetically. DTLHS (talk) 18:49, 23 January 2017 (UTC)
I think using other wikts is an excellent idea, and I'm also skeptical of anything automated. As it is, I just expand Spanish entries as I find them, but to have a list of words that I could add to them would help me greatly at finding ones to expand and knowing which are its related terms. Ultimateria (talk) 11:14, 24 January 2017 (UTC)
Does anyone do a dump run that set-subtracts our entries from the items enclosed in {{l}}, sorted by the language parameter? That would serve to identify misspellings as well as exploit work already done that partially addresses the missing entry concern. Initially one could exclude multi-word terms.
Does anyone have a list of all of the tokens used in definitions (which are supposed to be in English) and in citations (which could be sorted by language of the citation)?
I suspect that there are other similarly simple approaches. I don't have any intuition as to how resource-intensive these are, but they could be tested on subsets, such as words beginning with "q", "z", etc. DCDuring TALK 13:36, 24 January 2017 (UTC)
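A rough Python sketch of the {{l}} set subtraction asked about above, assuming page wikitext has already been loaded from a dump into a dict keyed by title (`missing_link_targets` is a hypothetical name). It reads only the first two positional parameters of {{l}}, skips multi-word terms as suggested, and ignores entry-name normalisation (e.g. Latin macrons being dropped from page titles), which a real run would have to apply before comparing:

```python
import re
from collections import defaultdict

# Captures LANG and TERM from {{l|LANG|TERM...}}.
# Assumption: the term contains no nested templates or wikilinks.
L_TEMPLATE = re.compile(r"\{\{l\|([^|}]+)\|([^|}]+)")


def missing_link_targets(pages: dict[str, str], existing: set[str]) -> dict[str, list[str]]:
    """Group {{l}} targets with no page of their own by language code."""
    by_lang: dict[str, set[str]] = defaultdict(set)
    for text in pages.values():
        for lang, term in L_TEMPLATE.findall(text):
            term = term.strip()
            if " " in term:  # first pass: skip multi-word terms
                continue
            by_lang[lang.strip()].add(term)
    result: dict[str, list[str]] = {}
    for lang in sorted(by_lang):
        missing = by_lang[lang] - existing  # set subtraction against page titles
        if missing:
            result[lang] = sorted(missing)
    return result
```

This only checks whether a page exists, not whether it has a section for the right language; that second check is what the orange-link gadget mentioned further down does.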
One-way links were identified systematically by Visviva's "Linkeration" system, which seems to have addressed many subtleties, judging by the subpages of User:Visviva/Linkeration. I don't know whether he would respond to an e-mail. DCDuring TALK 13:48, 24 January 2017 (UTC)

AWB access

User request I'd like to be able to use AutoWikiBrowser here. As you can see from my most recent contributions, I've been filling in lists and sets in English and Spanish, and having access will make it much easier to make the categories. —Justin (koavf)TCM 19:29, 23 January 2017 (UTC)

Done. DTLHS (talk) 22:54, 23 January 2017 (UTC)

Merger into Scandoromani

Discussion moved to WT:RFM#Merger_into_Scandoromani.

avoid homograph hyperlink

Hi, I suppose it's an automatic process, so perhaps this is not the place to post. I'll provide an example:

In the English entry for 'pilgrim', the etymology section provides the following info: 'Middle English (early 13th century) pilegrim', yet pilegrim’s link directs you to an entry of its Norwegian homograph without any reference to Middle English.

I'd like to avoid it, so that I do not waste time following a link which will not provide any relevant info.

Thank you in advance. --Backinstadiums (talk) 09:00, 26 January 2017 (UTC)

The problem, of course, is that the page pilegrim doesn't have a Middle English entry yet. That happens all the time. There's no way to prevent links to pages that lack the corresponding language section, but if you go to your Preferences, under the Gadgets tab, you can select the box for "OrangeLinks: colour links orange if the target language is missing on an existing page". Then links to pages lacking the corresponding language section will appear in orange rather than blue, thus warning you ahead of time that there is, for example, no Middle English section at pilegrim. The link is still there, but the color keeps you from getting your hopes up. —Aɴɢʀ (talk) 09:38, 26 January 2017 (UTC)
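To illustrate the distinction that gadget draws, here is a rough Python sketch, assuming the target page's wikitext is already available and that each language section starts with a level-2 header such as ==Middle English==. The function names are hypothetical, and the actual gadget is written in JavaScript, so its internals may differ:

```python
import re


def has_language_section(wikitext: str, language: str) -> bool:
    """True if the wikitext contains a level-2 header for the given language."""
    # Anchored to whole lines, so level-3 headers like ===Noun=== do not match.
    pattern = rf"^==\s*{re.escape(language)}\s*==\s*$"
    return re.search(pattern, wikitext, flags=re.MULTILINE) is not None


def link_colour(page_exists: bool, wikitext: str, language: str) -> str:
    """Red for a missing page, orange for a page lacking the language, blue otherwise."""
    if not page_exists:
        return "red"
    return "blue" if has_language_section(wikitext, language) else "orange"
```

Note that the check is per language, not per form: a page that has a Middle English section but a different Middle English word would still come out blue.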
@Angr: Hi, this does not seem to work with Arabic script. As you can check, on its root page, 'Form I: رَسِلَ (rasila)' links to a page on which only رُسُل (rusul) appears, since rasila and rusul share the same unvowelled base form, namely رسل. I'd like these links to be colored as well if they lead to a page on which the specific form hasn't been added yet. Lastly, I'd like to change the orange to green, if possible, so that I can notice them better. PS: please mention me when replying so that I am notified. Thanks in advance. --Backinstadiums (talk) 18:59, 26 January 2017 (UTC)
@Backinstadiums: It isn't Arabic script that's the problem. The orange-link function only looks to see if the language section is there, not if the exact form you're interested in is there. As for using green rather than orange, I imagine there's some way to change that on your own CSS page, but I couldn't begin to tell you how. —Aɴɢʀ (talk) 19:06, 26 January 2017 (UTC)
@Angr: Regarding the Arabic issue, should I post a new thread? Can you come up with a solution for it? Regarding the colors, do I need a chunk of code? It's important for me, as I have a color-vision dysfunction, so it's difficult for me to distinguish orange from red, the latter usually being used for non-existent pages. —This unsigned comment was added by Backinstadiums (talkcontribs) at 13:17, January 26, 2017‎ (UTC).
@Angr, Backinstadiums: I did "inspect" in Chrome, and I think the CSS class for the orange links is .partlynew. You can use that to assign a color that's more visible for you. — Eru·tuon 20:17, 26 January 2017 (UTC)
@Erutuon: First of all, thanks for replying. Could you please post a step-by-step guide on how to proceed? I know nothing about code or CSS. —This unsigned comment was added by Backinstadiums (talkcontribs) at 14:43, January 26, 2017‎ (UTC).
.partlynew { color: #hexcode or colorname; }
Okay, so go to your common.css page. Add a style rule like the one shown above. .partlynew selects HTML tags that have class="partlynew" in them. The braces enclose style properties; the property color is the font color. You'll have to go to a page like the Web colors article on Wikipedia to select a color. That's the most basic answer. There are also ways to make visited links colored differently, but I don't have experience with that. (Also, just to note, I signed your posts for you.) — Eru·tuon 21:06, 26 January 2017 (UTC)
@Erutuon: Hi again, I don't know why it doesn't work. Have you tested it yourself? Could you take a look? https://en.wiktionary.org/wiki/User:Backinstadiums/common.css --Backinstadiums (talk) 21:33, 26 January 2017 (UTC)
@Backinstadiums: I added your code to my CSS page, and checked a link. It's getting overwritten by the default CSS. Add a space and !important after the hex code, before the final ;. That will make your color override the default. — Eru·tuon 21:42, 26 January 2017 (UTC)
@Erutuon: Thank you so much, it does work now. Regarding the Arabic script, where could I get a solution? --Backinstadiums (talk) 22:17, 26 January 2017 (UTC)

@Backinstadiums: For a link to show whether a particular diacriticked form of an Arabic word is present in a page, someone would have to write a JavaScript function to look for the diacriticked form on the page and then add a CSS class to the link. Not sure if that's possible; I'm not very familiar with JavaScript. Lua (a module) might be able to do it too, though. I noticed there's a way to get the raw text of a page. But Lua would require a lot of processing power. — Eru·tuon 22:23, 26 January 2017 (UTC)
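A rough sketch of the check described above, written in Python rather than the JavaScript or Lua an on-wiki version would need. It assumes the raw text of the target page is already available; `strip_harakat` and `link_status` are hypothetical names, and the set of marks stripped (tanwin, the three short vowels, shadda, sukun and dagger alif) is an assumption about what counts as "diacritics" here.

```python
import re

# Arabic words: runs of characters from the basic Arabic block (U+0600-U+06FF),
# which also contains the vowel marks.
ARABIC_WORD = re.compile("[\u0600-\u06FF]+")
# Assumed set of marks to ignore: tanwin (U+064B-U+064D), fatha/damma/kasra
# (U+064E-U+0650), shadda (U+0651), sukun (U+0652), dagger alif (U+0670).
HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_harakat(text: str) -> str:
    """Remove vowel marks, leaving the unvocalized skeleton."""
    return HARAKAT.sub("", text)


def link_status(page_text: str, form: str) -> str:
    """Classify a vocalized form against the text of the page it links to.

    'exact'          - the same vocalized word occurs on the page
    'homograph-only' - only a word with the same unvocalized spelling occurs
    'missing'        - neither occurs
    """
    words = ARABIC_WORD.findall(page_text)
    if form in words:
        return "exact"
    bare = strip_harakat(form)
    if any(strip_harakat(word) == bare for word in words):
        return "homograph-only"
    return "missing"
```

With the example above, a page containing only رُسُل would return "homograph-only" for رَسِلَ, and a script could then add a CSS class to every link whose status is not "exact".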

1000 Zulu entries :)[edit]

I managed to get Zulu up to 1000 entries. I just wanted to announce that. —CodeCat 18:37, 26 January 2017 (UTC)

That's very good news, and much needed! Looking at Special:Statistics, African languages are pretty underrepresented here -- the ratio of 244 lemmas on Wiktionary versus 35 million native speakers for Amharic is a good example of this. Hopefully eventually we might get some native speakers to help out. — Kleio (t · c) 19:00, 26 January 2017 (UTC)
Yay! It's always good to hear about good progress being made! I'm eagerly looking forward to the day when our coverage for every language is at least that good (it'll take a while though...we're still about 7,000,000 entries short... :P). Andrew Sheedy (talk) 19:03, 26 January 2017 (UTC)
  • If anyone wants to help out with our coverage of African languages, I am accumulating a great deal of notes on paper but have yet to transfer them to Wiktionary! What I need is assistance in making more modules and templates, so even if you haven't studied these languages, you can help with the effort if you have technical knowledge. —Μετάknowledgediscuss/deeds 06:12, 27 January 2017 (UTC)
    • I made several modules for Zulu, but they're not quite finished yet, in particular when it comes to tones. —CodeCat 14:04, 27 January 2017 (UTC)
Hi @Andrew Sheedy, out of curiosity, is the 7,000,000 number based in fact or just a divine vision of how many entries are left? That sort of thing might be good to include in Wikistats—someone gave a presentation in the WMF January Metrics meeting about overhauling Wikistats, so maybe strike while the iron is hot. Icebob99 (talk) 05:40, 29 January 2017 (UTC)
It's just a quick estimate of how many entries we'd need to add to reach a minimum of 1000 entries in every language. Since there are about 7500 known languages (including dead ones) and of those, about 120 have more than a thousand entries on Wiktionary, plus another 100 or so are well on their way to 1000, we have basically no entries for 7300 or so languages (so really, I underestimated, and we're about 7,300,000 entries short of having a minimum of 1,000 entries per language). So no, I don't have any special knowledge, nor did I put much effort into that estimate... Andrew Sheedy (talk) 06:27, 29 January 2017 (UTC)
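The estimate worked through with the approximate figures given in the reply above (it is an order-of-magnitude figure, since it ignores whatever entries those languages already have and the remaining shortfall of the languages "well on their way"):

```latex
\begin{aligned}
7500 - 120 - 100 &= 7280 \approx 7300 \text{ languages with basically no entries}\\
7300 \times 1000 &= 7\,300\,000 \text{ entries short}
\end{aligned}
```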
Well done, CodeCat! Excellent work! — I.S.M.E.T.A. 14:09, 15 February 2017 (UTC)

Display of Template:alter[edit]

with transliteration: φιλέω (philéō)
without transliteration: honor

I've gone and changed Module:alternative forms so that the dialect labels are enclosed in parentheses if the language doesn't have a transliteration module, but separated from the links by an n-dash if there is transliteration. (I created a new function Language:hasTranslit() in Module:languages to make this possible.) Some sort of a change to the display has been requested by @Mihia (October 2016) and @ObsequiousNewt (December 2015). — Eru·tuon 23:11, 26 January 2017 (UTC)

I would have added brackets in both cases, but it looks better (in my opinion) not having two parentheses in a row:

Hence, n-dash seemed the better option. Let me know your thoughts. — Eru·tuon 23:14, 26 January 2017 (UTC)
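The display rule described in this thread can be sketched as follows. `format_alt_form` is a hypothetical plain-text stand-in, not the Lua in Module:alternative forms: whether a transliteration exists is simply passed in (rather than checked the way Language:hasTranslit() does), and the links and markup the real module emits are left out.

```python
EN_DASH = "\u2013"


def format_alt_form(term, labels, translit=None):
    """Render an alternative form with its dialect labels (hypothetical helper)."""
    label_text = ", ".join(labels)
    if translit:
        # The transliteration already adds "(...)", so separate the labels
        # with an en dash to avoid two parentheses in a row.
        shown = f"{term} ({translit})"
        return f"{shown} {EN_DASH} {label_text}" if labels else shown
    # No transliteration: enclose the labels in parentheses.
    return f"{term} ({label_text})" if labels else term
```

For example, `format_alt_form("honor", ["US"])` gives "honor (US)", while a transliterated term with a label gets the en-dash form.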

Conceivably, if there is disagreement about the display, we could add CSS classes that would allow people to remove the default display, and add their own preferred text before and after. — Eru·tuon 23:16, 26 January 2017 (UTC)

In the past, I proposed removing the brackets from the transliteration instead. —CodeCat 23:17, 26 January 2017 (UTC)
I like these changes. —JohnC5 05:46, 27 January 2017 (UTC)
@Erutuon: I'm not super-thrilled by the inconsistency of display, but this is better than what we had before. How about doing away with parentheses for the dialect tags altogether, and separating them from the actual forms by em dashes in all cases? — I.S.M.E.T.A. 13:36, 15 February 2017 (UTC)


There is a GitHub project to produce recordings for instant upload to Commons. At the end of recording, the speakers are to categorize their own snippets under commons:Category:Pronunciation.

The questions this raises for me include:

  • Are these useful linguistic pronunciation categories?
  • There is a distinct lack of documentation on how to categorize; is it possible to create a flow which usefully guides laypeople through categorizing their particular pronunciation?
  • Is this valuable for Wiktionary?

- Amgine/ t·e 20:07, 27 January 2017 (UTC)

eliminate accents and parens from Proto-Balto-Slavic headwords[edit]

If you look in Category:Proto-Balto-Slavic lemmas, you see that some of the lemmas have accents in them, and some don't. In reality, all of them would have been accented on a given syllable, it's just that in many cases we don't know which syllable. Rather than the current random hodge-podge, I think we should consistently not include accents in the headwords. This would make it easier to locate the words, would add consistency and would be unlikely to cause very many (if any) homograph clashes. Benwing2 (talk) 19:27, 28 January 2017 (UTC)

Also, cases like Reconstruction:Proto-Balto-Slavic/s(w)esō and Reconstruction:Proto-Balto-Slavic/tusk(t)jas should instead be split into two entries, one with the parenthesized letter and one without it, and linked using {{alternative form of}}. Benwing2 (talk) 19:29, 28 January 2017 (UTC)
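Splitting such forms is mechanical. Here is a minimal Python sketch, assuming each pair of parentheses marks one optional segment (no nesting); `expand_optional` is a hypothetical helper, and deciding which variant becomes the lemma is still an editorial choice.

```python
import itertools
import re

OPTIONAL = re.compile(r"\(([^()]*)\)")


def expand_optional(form: str) -> list[str]:
    """Return every variant of a form with and without each parenthesized part."""
    # re.split with one capturing group alternates fixed and optional parts:
    # "s(w)esō" -> ["s", "w", "esō"]
    parts = OPTIONAL.split(form)
    fixed, optional = parts[0::2], parts[1::2]
    variants = []
    for choice in itertools.product([False, True], repeat=len(optional)):
        out = [fixed[0]]
        for include, opt, rest in zip(choice, optional, fixed[1:]):
            out.append(opt if include else "")
            out.append(rest)
        variants.append("".join(out))
    return variants
```

For *s(w)esō this yields *sesō and *swesō; for *tusk(t)jas it yields *tuskjas and *tusktjas.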
@Benwing2: Please explain why “[t]his would make it easier to locate the words”. How do other online sources list reconstructed forms? I agree with your second idea, viz. that *s(w)esō, *tusk(t)jas, and their ilk should be split into *sesō and *swesō, *tuskjas and *tusktjas, etc. with one of each pair being lemmatised and the other thereof linking to the lemma with {{alternative form of}}. — I.S.M.E.T.A. 13:55, 15 February 2017 (UTC)
@I'm so meta even this acronym: Other online sources are listed alphabetically, so the issue of accents doesn't apply. The thing is that different sources often disagree about where the accents go, or even whether the position of the accent can be reconstructed at all -- then what do we do? As for making it easier to locate the words: if someone is trying to autocomplete a word, (1) it's often hard to type accents, and (2) they have to know to check both the version with the accent and the one without it, esp. if the accent occurs near the beginning of a word. Furthermore, someone entering the form as a link in another page is less likely to get the lemma right. Benwing2 (talk) 02:56, 16 February 2017 (UTC)
I see no reason to do this. There's the same "hodgepodge" for PIE and other languages with accents as well. We should provide what information we know. —CodeCat 20:23, 28 January 2017 (UTC)
Yes and I think this should be done for PIE as well. As for "other languages", this issue doesn't apply to modern languages. Either they do or don't include the accents but in both cases it's standardized and doesn't depend on someone's random theory of where the accents go. Benwing2 (talk) 20:59, 28 January 2017 (UTC)
I'm not opposed to removing accents from PBS and PIE page names as long as they can remain in headword lines. We can treat them the same as macrons in Latin, complete with diacritic stripping in linking templates. —Aɴɢʀ (talk) 21:08, 28 January 2017 (UTC)
Like Angr I would be fine with accents being removed from PIE entry names if they are displayed in headword lines. In fact, I like the idea, because it would make entry names more consistent. — Eru·tuon 21:20, 28 January 2017 (UTC)
Yes, this is exactly what I am proposing -- remove the accents from the page names but keep them in headword lines, and do diacritic stripping in links. Benwing2 (talk) 01:47, 29 January 2017 (UTC)
I oppose this. —CodeCat 18:17, 31 January 2017 (UTC)
Is CodeCat the only person who opposes? If so I may put this to a vote. Benwing2 (talk) 01:37, 1 February 2017 (UTC)
If there is a possibility that many words can be reconstructed only without the accents, then I strongly support, otherwise I'm neutral. --WikiTiki89 15:14, 1 February 2017 (UTC)
I also oppose the removal of the accents from PIE. —JohnC5 17:41, 1 February 2017 (UTC)
@JohnC5 I'm not proposing removing accents from links or headwords, only from the lemma (i.e. from the page name). Benwing2 (talk) 02:57, 16 February 2017 (UTC)
@Wikitiki89 There are lots of words where the accent isn't clear, and different authors often differ radically in how they reconstruct the position of the accent. Benwing2 (talk) 02:59, 16 February 2017 (UTC)
@Benwing2: I'm aware of what you are proposing. I oppose the removal of accents from the pagenames in PIE. —JohnC5 04:26, 16 February 2017 (UTC)
@JohnC5: Do you have an opinion for PBS? --WikiTiki89 15:09, 16 February 2017 (UTC)
@Benwing2: I like the accents but am fine either way in PBS. —JohnC5 15:33, 16 February 2017 (UTC)
@Benwing2: There's a big difference between an alternative reconstruction with a different accent, and an unknown accent. Alternative reconstruction means we have evidence pointing to different scenarios, while an unknown accent would mean we don't have evidence about the accent. It's the unknown case that's problematic if accents are required for pagenames. Alternative reconstructions can be resolved with alternative form entries. I still see no downside to removing accents from pagenames in either PIE or PBS, so if there is a need to do so, then I support removing them. --WikiTiki89 15:14, 16 February 2017 (UTC)
A good thing then that accents are not required for pagenames. They're included if known. —CodeCat 16:38, 16 February 2017 (UTC)
The problem is that "known" is not a binary variable. Kortlandt may claim an accent is known but someone else like Matasovic may completely disagree. AFAIK the state of Proto-Balto-Slavic accentation is somewhat unsettled (even more so for Proto-Slavic). Benwing2 (talk) 01:56, 17 February 2017 (UTC)
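The proposal by Angr and Benwing2 (accents dropped from page names, kept in headwords, stripped in links) could work roughly like this Python sketch. The choice of marks (grave, acute, circumflex, tilde) and the restriction to plain vowels are assumptions: macrons survive, and so does an acute on a consonant such as PIE *ḱ. Accented syllabic resonants are not handled, and an on-wiki version would be Lua in the language data modules, not this hypothetical `page_name`.

```python
import unicodedata

# Assumed accent marks to drop: grave, acute, circumflex, tilde.
ACCENTS = {"\u0300", "\u0301", "\u0302", "\u0303"}
VOWELS = set("aeiouyAEIOUY")


def page_name(headword: str) -> str:
    """Strip accent marks from vowels, keeping macrons and consonant diacritics."""
    decomposed = unicodedata.normalize("NFD", headword)
    out = []
    base = ""  # the last base (non-combining) character seen
    for ch in decomposed:
        if unicodedata.combining(ch):
            if ch in ACCENTS and base in VOWELS:
                continue  # drop an accent sitting on a vowel
        else:
            base = ch
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

An ō with an acute on it becomes plain ō, while ḱ keeps its acute; the same function would be applied to link targets so that accented links still reach the unaccented page.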

Arabic patterns[edit]

Hi, it would immensely improve the Arabic dictionary to add categories for the patterns used in terms that lack them. For example, for the noun رَئِيس (pl. رُؤَسَاء) the following information shows up: 'From the root ر ء س (r-ʾ-s) of رَأْس (raʾs, “head”)'. Marking the pattern from which it is formed, namely فَعِيل (pl. فُعَلَاء), especially for broken plurals, would help those beginning to learn the language to recognize patterns, just as happens with the verb conjugations that are automatically added to each entry. Thanks in advance. --Backinstadiums (talk) 16:38, 31 January 2017 (UTC)

We have a template {{transfix}} for this already. It just needs to be added to entries. —CodeCat 18:18, 31 January 2017 (UTC)
@CodeCat Hi, I am a simple user, so I have no idea about it. Can it be implemented automatically for every single term, including verbal nouns? Where should I request it?
No, sadly. You have to add it manually to every entry. —CodeCat 19:22, 31 January 2017 (UTC)
@CodeCat and could it be created with some lines of code? Here's some academic work on it: http://www.attiaspace.com/
What is the pattern of رَأْس ‎actually? Is it -a--? —CodeCat 19:35, 31 January 2017 (UTC)
The pattern is "faʿl", where the root consonants are represented by "f-ʿ-l". The patterns for the two plural forms of رَأْس (raʾs) are 1. "fuʿūl" رُؤُوس (ruʾūs) and 2. "ʾafʿul" أَرْؤُس (ʾarʾus). --Anatoli T. (обсудить/вклад) 21:35, 31 January 2017 (UTC)
Why are the root consonants included in the pattern? That seems counterintuitive to me. —CodeCat 21:40, 31 January 2017 (UTC)
(Before E/C) It's doable. The verb patterns are already automated - User:Benwing2's work. See also Wiktionary:Beer_parlour/2017/January#Arabic_consonant_patterns. One needs to separate loanwords, which don't have patterns and may contain any number of consonants and can be confused with patterns.
(After E/C). This is how Arabic (and other semitic) patterns are described by grammarians - for declensions, conjugations, forming verbal nouns, plural forms, etc. Why is it counterintuitive? --Anatoli T. (обсудить/вклад) 21:47, 31 January 2017 (UTC)
@CodeCat If you look at Appendix:Arabic verbs, it's all based on root patterns where consonants ف ع ل (f-ʿ-l) are used for each verb form, perfective and imperfective. --Anatoli T. (обсудить/вклад) 21:52, 31 January 2017 (UTC)
@Atitarev, Benwing2: Do you think a function detecting the pattern of a word would need the root to be input into the headword template, or could it automatically detect both root and pattern? — Eru·tuon 21:57, 31 January 2017 (UTC)
My guess is that just the (e.g. noun) headword with plural forms would provide the plural forming patterns but Benwing2 would know better. With verbs, the root letters are not supplied as a separate parameter. --Anatoli T. (обсудить/вклад) 22:02, 31 January 2017 (UTC)
@Erutuon In some cases it's possible to auto-detect both root and pattern from the noun given the vocalized form, but there are a whole lot of cases where it isn't easy and may be impossible, esp. when there are weak root letters involved. For example, there's no real way to autodetect the radicals of a word like صِلَة(ṣila, link), from و ص ل(w ṣ l), or بَاب(bāb, door) from ب و ب(b w b). It would be another story if the root were specified. Potentially something could be written to autodetect the easy cases and throw an error if it can't autodetect, requiring that the root be specified. Even for verbs, the verb form (I, II, III, IV, etc.) must be specified, else many verbs are ambiguous. Benwing2 (talk) 04:05, 1 February 2017 (UTC)
@Benwing2 I suspected there might be a problem with assimilated, weak, defective, geminated, hamzated and quadrilateral roots making the detections unreliable. Loanwords would further complicate it. If the root letters are supplied in the etymology, they could be used to determine the pattern, no? --Anatoli T. (обсудить/вклад) 04:14, 1 February 2017 (UTC)
@Atitarev I think so. There are a lot of special cases with nouns so it would be complicated, but probably doable. Benwing2 (talk) 04:17, 1 February 2017 (UTC)
@CodeCat One of the reasons for using root letters ف ع ل(f ʿ l) is that it helps indicate roots where a letter occurs more than once. For example, the فَعَّال(faʿʿāl) pattern has a doubled consonant in the middle of it, and its plural has the pattern فَعَاعِيل(faʿāʿīl), where a vowel breaks up the two parts of the doubled consonant. If it were simply indicated as -a-ā-ī- it wouldn't be obvious that the 2nd and 3rd consonants are always the same in this pattern. Benwing2 (talk) 04:21, 1 February 2017 (UTC)
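The f-ʿ-l notation can be read as a template: substitute the first, second and third radicals for f, ʿ and l, and a doubled ʿ becomes a doubled middle consonant. Here is a minimal Python sketch over Wiktionary-style romanization, assuming a triliteral root; `apply_pattern` is a hypothetical helper, and Arabic script, quadriliteral roots and weak-root changes are not handled.

```python
AYN = "\u02bf"  # ʿ, the placeholder for the second radical


def apply_pattern(pattern: str, root: tuple[str, str, str]) -> str:
    """Fill a romanized f-ʿ-l pattern with three root consonants."""
    # str.translate substitutes all three placeholders in one pass, so a
    # radical that happens to be "f" or "l" is never substituted twice.
    table = str.maketrans({"f": root[0], AYN: root[1], "l": root[2]})
    return pattern.translate(table)
```

With the root r-ʾ-s, "faʿl", "fuʿūl" and "ʾafʿul" give raʾs, ruʾūs and ʾarʾus; with k-t-b, "faʿʿāl" gives kattāb.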
@Atitarev: Root letters in the etymology? I was thinking that root and pattern detection would happen in the headword template. — Eru·tuon 04:25, 1 February 2017 (UTC)
Detecting both root and pattern from the headword alone turns out to be too hard or impossible. A combination of the root letters from the etymology's {{ar-root}} with the rest coming from the headword would do the trick. You should follow up on the implementation with Benwing2, though. --Anatoli T. (обсудить/вклад) 04:38, 1 February 2017 (UTC)
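Going the other way, pattern detection given a known root (for example copied from {{ar-root}}), might look like this Python sketch. It works on romanization, assumes a triliteral root, and matches radicals naively from left to right. It therefore mislabels a prefix consonant identical to the first radical (root m-l-k in mamlaka, for instance), and it raises an error when a radical is absent from the word, which covers the weak-root case Benwing2 mentions (ṣila from w-ṣ-l). `detect_pattern` is a hypothetical name.

```python
AYN = "\u02bf"  # ʿ
PLACEHOLDERS = ("f", AYN, "l")


def detect_pattern(word: str, root: tuple[str, str, str]) -> str:
    """Replace the root consonants in a romanized word with f, ʿ, l."""
    out = []
    i = 0  # index of the next radical still to be found
    for ch in word:
        if i < 3 and ch == root[i]:
            out.append(PLACEHOLDERS[i])
            i += 1
        elif i > 0 and out[-1] == PLACEHOLDERS[i - 1] and ch == root[i - 1]:
            # Doubled (geminated) radical, e.g. the "tt" of kattāb.
            out.append(PLACEHOLDERS[i - 1])
        else:
            out.append(ch)  # vowel or affix letter, kept as-is
    if i < 3:
        raise ValueError(f"could not find all radicals of {root} in {word!r}")
    return "".join(out)
```

An error here would be the signal to require a manually specified pattern rather than guess.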

@Benwing2, @Atitarev, @CodeCat, @Erutuon Please take a look, there's a lot of work already done, https://sourceforge.net/projects/arabicpatterns/, so the most common 'assimilated, weak, defective, geminated, hamzated and quadrilateral roots' are already taken into account in opensource academic works. What do you say? --Backinstadiums (talk) 23:35, 1 February 2017 (UTC)

@Backinstadiums When you use {{ping}}, you have to sign your post, otherwise it won't work. Your link is not very useful, especially for the technical solution to be made. These types of roots are already covered by verb conjugation modules. As discussed, it's doable if both the root consonants and headwords are used but it's up to User:Benwing2 or someone else knowledgeable in both Lua and some Arabic to implement it. --Anatoli T. (обсудить/вклад) 23:25, 1 February 2017 (UTC)
@Atitarev: There's a way for one template to get parameters from another template? How would that work? I think I'm going to try to write a function at Module:User:Erutuon/sandbox. — Eru·tuon 02:33, 2 February 2017 (UTC)
I'm afraid you have to ask at WT:GP. I am not writing modules any more. --Anatoli T. (обсудить/вклад) 02:44, 2 February 2017 (UTC)
Ah, okay. I'll write a function first and then ask that question. — Eru·tuon 02:48, 2 February 2017 (UTC)
One template can't easily get the params of another. You'd have to copy the params, perhaps by bot. Benwing2 (talk) 03:40, 2 February 2017 (UTC)
@Erutuon Hi, I just want to ask whether you were able to write that 'function', as well as say that I am willing to help with the work as long as I have the skills required. Regards. --Backinstadiums (talk) 10:20, 15 February 2017 (UTC)
@Backinstadiums: I created the beginnings of a function in Module:User:Erutuon/sandbox. It divides a word into consonants and vowels. I'm still thinking about how to create the pattern recognition part of the function. — Eru·tuon 18:05, 15 February 2017 (UTC)
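For comparison, the consonant/vowel division can be sketched in a few lines of Python. This is not the code in Module:User:Erutuon/sandbox; it assumes a romanized input with one character per sound, where a, i, u and the long ā, ī, ū are the only vowels. `split_cv` and `cv_skeleton` are hypothetical names.

```python
VOWELS = frozenset("aiu\u0101\u012b\u016b")  # a i u ā ī ū


def split_cv(translit: str) -> list[tuple[str, str]]:
    """Label each character of a romanized word as consonant (C) or vowel (V)."""
    return [(ch, "V" if ch in VOWELS else "C") for ch in translit]


def cv_skeleton(translit: str) -> str:
    """Collapse the labels into a skeleton string such as 'CVCC'."""
    return "".join(kind for _, kind in split_cv(translit))
```

raʾs gives "CVCC" and kattāb gives "CVCCVC"; the skeleton on its own cannot tell a doubled radical from two different ones, which is why the f-ʿ-l notation discussed above carries more information.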

February 2017

Czech verbs ending in -ti or -t?[edit]

@Dan Polansky All my etymological Slavic sources (Derksen, ESSJa, Vasmer) consistently write Czech infinitives ending in -ti, but in Wiktionary they are entered ending in -t instead. What's going on here? Should we at least create soft redirects for the forms in -ti? Benwing2 (talk) 04:58, 1 February 2017 (UTC)

The -ti endings must be archaic. Most modern Czech verbs end in -t, AFAIK - brát, psát, číst, nést, být, dělat. --Anatoli T. (обсудить/вклад) 05:36, 1 February 2017 (UTC)
Quoting one web source: A Czech infinitive usually ends in -t (infinitives in older literary texts may end in -ti). --Anatoli T. (обсудить/вклад) 05:38, 1 February 2017 (UTC)
Dan has answered this for me before. Older dictionaries lemmatize with -ti, newer dictionaries lemmatize with -t. As far as I know, both forms have always existed in parallel (and are not etymologically the same form and sometimes the difference is non-trivial, e.g. péct vs. péci). I think both forms should be given in conjugation tables, with the -ti form being treated like any other non-lemma form. We should probably also give them separate names. --WikiTiki89 15:22, 1 February 2017 (UTC)
@Benwing2: -ti is archaic. -ti forms are used neither in modern speech nor in modern writing, as you can verify in Google Books for yourself. Nonetheless, even some 20th century dictionaries like {{R:PSJC}} and {{R:SSJC}} lemmatize on -ti. Interestingly enough, SSJC uses -ti as lemma but -t in its definitions, e.g. for "plavati". In my Wiktionary editing practice, I am ignoring the -ti forms altogether. --Dan Polansky (talk) 13:21, 4 February 2017 (UTC)
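If the older forms were shown in conjugation tables as WikiTiki89 suggests, a table module would need to derive them. A rough Python heuristic that covers only the two correspondences seen in this thread (-t to -ti, and -ct to -ci as in péct/péci) might look like the following. `archaic_infinitive` is hypothetical, and any verb that doesn't follow these patterns would have to be listed by hand.

```python
def archaic_infinitive(lemma: str) -> str:
    """Guess the older Czech infinitive from a modern -t lemma (heuristic only)."""
    if lemma.endswith("ct"):
        return lemma[:-2] + "ci"  # péct -> péci
    if lemma.endswith("t"):
        return lemma + "i"        # dělat -> dělati
    raise ValueError(f"not a -t infinitive: {lemma!r}")
```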

Russian-Rusyn offline dictionary (PDF)[edit]

@Wikitiki89, Benwing2, CodeCat I've got a copy of the Russian-Rusyn dictionary by Igor Kercha (Игорь Керча): 65,000 terms in two scanned PDF files, over 30 MB altogether. It seems good, but I'd check translations against other sources occasionally. It's cumbersome to use (the files are not searchable), but I can share it. Please suggest how to send it to you if you're interested. --Anatoli T. (обсудить/вклад) 11:39, 1 February 2017 (UTC)

empty declension sections (?)[edit]

Hi, some terms show a question mark in some cells of their declension tables, e.g. https://en.wiktionary.org/wiki/أغلب#declension. I'd like to know what exactly it means and how those cells could be filled. Thanks in advance. --Backinstadiums (talk) 19:33, 1 February 2017 (UTC)

Most elatives aren't declined any more. The ?'s indicate that the forms probably aren't attested. Benwing2 (talk) 03:46, 2 February 2017 (UTC)
@Benwing2 Not even in the whole Arabic Wikipedia? There are some probabilistic approaches that give really good results using such corpora. Moreover, since no copyright issues can arise from merely recording that a form exists, any good dictionary, for example the Oxford Arabic Dictionary, or a lexicon could serve as a source. Furthermore, OCR'd resources might be used as well. --Backinstadiums (talk) 19:08, 4 February 2017 (UTC)
@Backinstadiums I looked through several dictionaries to find elative forms. Benwing2 (talk) 19:23, 4 February 2017 (UTC)

Suggestions without reply in entry discussion sections[edit]

Hi everybody, I've been adding some remarks and suggestions on the discussion tabs of several entries. I'd very much appreciate it if someone could check them out and discuss them. Thank you so much in advance. --Backinstadiums (talk) 08:38, 3 February 2017 (UTC)

Seventh LexiSession: fever[edit]

This month's suggested topic is fever. You are invited to participate in the common goal of discovering what can be gathered around the word fever! Contributions might include describing subtypes of fever, creating a Wikisaurus page, or adding sounds or pictures.

This is a collaborative experiment without any guide or direction. You're free to participate as you like and to suggest next month's topic. If you do something, please report it here, to let people know you are involved in one way or another. I hope some people will be interested in this one. Noé (talk) 10:50, 3 February 2017 (UTC)

Standardizing homophone qualifiers[edit]

I've noticed we don't really have a standard for indicating in which accents a word is a homophone. What I see most commonly is a homophones line after the IPA, which is assumed to apply to all accents (unless it is indented under one) unless it carries a qualifier template saying "in some accents". There must be a way to do this consistently, though, such as directly in the homophones template ("ex1|ex2|q1=some|q2=caught-cot" or similar?). As it stands, if there are three homophones and only the last two are "in some accents", there is no clear solution. It's also tricky with automated transcriptions, such as in ca-IPA, where Central Catalan has many, many homophones compared to Valencian. I think it is worthwhile to specify where two words are homophones, and it is clearest to do so under the line in question rather than after all of them. Ultimateria (talk) 22:18, 4 February 2017 (UTC)
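One possible shape for per-homophone qualifiers, sketched in Python: positional arguments are the homophones, and a hypothetical `qN=` argument attaches a qualifier to the Nth one. Neither the parameter name nor this parsing reflects how the existing homophones template actually works; it only shows that the three-homophone case above has a clean representation.

```python
from typing import Optional


def parse_homophones(arg_string: str) -> list[tuple[str, Optional[str]]]:
    """Split 'ex1|ex2|q2=...' into (homophone, qualifier) pairs."""
    positional = []
    qualifiers = {}
    for arg in arg_string.split("|"):
        key, sep, value = arg.partition("=")
        if sep and key.startswith("q") and key[1:].isdigit():
            qualifiers[int(key[1:])] = value  # qN= belongs to the Nth homophone
        else:
            positional.append(arg)
    return [(word, qualifiers.get(n)) for n, word in enumerate(positional, start=1)]


def render(items: list[tuple[str, Optional[str]]]) -> str:
    """Show each qualifier right after the homophone it applies to."""
    return ", ".join(f"{word} ({q})" if q else word for word, q in items)
```

With three homophones where only the last two hold in some accents, `render(parse_homophones("ex1|ex2|ex3|q2=some accents|q3=some accents"))` keeps each qualifier next to its own word.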


This contributor has been warned about, and blocked for, questionable entries several times. First they were creating entries of protologisms, and finally unblocked when they agreed not to do that any more. Since July, they've been mass-creating entries from lists, all with exactly the same content even when it doesn't make sense. In September, it was country-name prefixes, all with exactly the same etymology except for the source country-name and the entry name- an etymology that was correct for only a handful of the entries. Today they created entries for about 154 given names (there may be a few in there without the English section), each with an English and a Serbo-Croatian section, without any evidence that most of these Serbo-Croatian names have ever been used in English. Many of them, such as "Ognjemir", would be unpronounceable to the vast majority of English speakers. They also created 48 Latin proper-noun entries based on Serbo-Croatian names. After seeing how much work it took to clean up their September run, I simply mass-deleted the whole batch from today (and also reverted an edit where they changed the declension of a Latin noun ending in -a from first to second declension).

At the same time I blocked them for two weeks. Given that they continued to create bogus entries using both their own account and IPs long after being warned and blocked (they blanked their talk page, so most of the warnings aren't visible), and that they were blocked and warned again in September for mass-creating entries, but came back and did it again, I would like to extend that block much longer, to make it indefinite, but I would prefer to have the consensus of the community before doing so.

They've been wasting a lot of other people's time and effort with their actions over the past couple of years. The specific types of bad edits have changed, but the pattern of reckless disregard for accuracy has been quite consistent. Even if they stop their current practices, I see no reason to expect that they won't come up with something equally irresponsible later on.

What does everyone else think? Chuck Entz (talk) 03:51, 5 February 2017 (UTC)

When someone consistently adds bad stuff I think it becomes reasonable to demand a discussion and block them if they refuse to engage/reply. So far it looks as though all he has ever done with his talk page is to blank it. Equinox 08:58, 5 February 2017 (UTC)
Don't think I'm vandalizing again. I blanked my talk page because I have the right to do so. I clear every WhatsApp message when I read it, and I did the same with my talk page. I didn't realize that my mass creation could be a problem, because there are thousands of entries much worse than mine (and I think mine are good).
Today they created entries for about 154 given names (there may be a few in there without the English section), each with an English and a Serbo-Croatian section, without any evidence that most of these Serbo-Croatian names have ever been used in English. Many of them, such as "Ognjemir", would be unpronounceable to the vast majority of English speakers.
This is the proof that the name is used in English. English is a universal language, so every Serbo-Croatian name without diacritics is also an English name and belongs to the English language. That's why I love English. You can't just delete all my entries without asking what I'm doing. If you had told me, I would have made everything right. And without people like me, Wiktionary would have no more than a hundred entries. And there are much more unpronounceable words than "Ognjemir", like Hradec Králové, hryvnia, hsianghualite, ZWNBSP, Þrymskviða, Łukasiewiczian, ǂKxʼauǁʼein, Świętochłowice, Ṛgvedic... Are those pronounceable to the vast majority of English speakers???
The problem is with Latin entries. Please recreate those entries because they have nothing to do with Serbo-Croatian names in English, they are completely fine and correct:
I think he is now Bjelun (talkcontribs). SemperBlotto (talk) 15:20, 7 February 2017 (UTC)
You don't think correct. --Bjelun (talk) 15:22, 7 February 2017 (UTC)

Reconstructed Latin terms are not linked[edit]

Hi, the etymology of meddle states "from Late Latin misculare", yet misculare is a red link and is not marked as a reconstructed term, even though there is an entry for it. I think I have seen reconstructed terms marked with an asterisk. Thanks in advance. --Backinstadiums (talk) 19:35, 5 February 2017 (UTC)

Mhm, so you add the asterisk when you see something like this. I've fixed it now. — Kleio (t · c) 19:44, 5 February 2017 (UTC)
@Backinstadiums: Okay, I'm going to give you some basic information on linking. You can fix this stuff yourself. The way to link to the Reconstruction namespace is to place an asterisk before the term in the linking template (or etymology template in this case): thus, {{m|la|*misculo}} or {{der|en|VL.|*misculō}}. The asterisk and language code tell Module:links to add the text Reconstruction:Latin/ to the wikilink. Also, you can link to entries using [[meddle]] or [[Reconstruction:Latin/misculare|*misculare]] (or with the linking template {{m}}) rather than copying out the whole URL. — Eru·tuon 06:05, 6 February 2017 (UTC)
Also, etymologies in English-language dictionaries tend to use the infinitive, but we use the first-person present indicative for our lemmas, so it's a good idea to change to that form when you see it in our etymologies. Otherwise users click on the link, only to get a "form of" entry referring them to the lemma. Chuck Entz (talk) 13:14, 6 February 2017 (UTC)
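The asterisk rule Erutuon describes can be illustrated with a small Python sketch. It is not Module:links: `LANGUAGE_NAMES` is a hypothetical one-entry stand-in for the language data, and the only other processing modeled is macron stripping for the page name (the Latin behavior Angr refers to elsewhere on this page).

```python
import unicodedata

LANGUAGE_NAMES = {"la": "Latin"}  # hypothetical subset of the language data


def strip_macrons(text: str) -> str:
    """Drop combining macrons (U+0304), so misculō -> misculo."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace("\u0304", ""))


def link_target(lang_code: str, term: str) -> str:
    """Page name for a term; a leading * points into the Reconstruction namespace."""
    if term.startswith("*"):
        return f"Reconstruction:{LANGUAGE_NAMES[lang_code]}/{strip_macrons(term[1:])}"
    return strip_macrons(term)


def wikilink(lang_code: str, term: str) -> str:
    """Build [[target|display]], or plain [[term]] when the two are identical."""
    target = link_target(lang_code, term)
    return f"[[{term}]]" if target == term else f"[[{target}|{term}]]"
```

So `wikilink("la", "*misculō")` produces the same link as the manual [[Reconstruction:Latin/misculo|*misculō]].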

Glyph origin vote[edit]

I created Wiktionary:Votes/2017-02/Glyph origin.

It was based on the discussion Wiktionary:Beer parlour/2017/January#About "Glyph origin". --Daniel Carrero (talk) 18:47, 6 February 2017 (UTC)

@Daniel Carrero Thank you so much. I do support it. --Backinstadiums (talk) 14:11, 7 February 2017 (UTC)

"only present in non-English entries"[edit]

WT:EL#Headings after the definitions contains this line:

  • Inflection, or Conjugation for verbs, or Declension for nouns and adjectives, or Mutation, only present in non-English entries

I'd like to remove the part "only present in non-English entries", and I guess we don't need a vote for that.

Rationale for the edit: the entry be#English has a conjugation section, and I believe that's OK. --Daniel Carrero (talk) 14:24, 7 February 2017 (UTC)

Mutation is entirely separate from Inflection; both can co-occur in entries. So it should be changed so that Mutation doesn't appear to be a variant of Inflection. —CodeCat 14:27, 7 February 2017 (UTC)
Maybe we can just list them like this:
  • Inflection
  • Declension
  • Conjugation
  • Mutation
This would be consistent with Synonyms, Antonyms, Hyponyms, etc. that are all in separate lines. --Daniel Carrero (talk) 14:49, 7 February 2017 (UTC)
But Inflection, Declension and Conjugation are mutually exclusive, whereas Synonyms and Antonyms are not. —CodeCat 16:17, 7 February 2017 (UTC)
OK. Then it probably should be:
  • Inflection, Declension or Conjugation
  • Mutation
--Daniel Carrero (talk) 16:19, 7 February 2017 (UTC)
Done --Daniel Carrero (talk) 21:50, 16 February 2017 (UTC)

Vulgar Latin hypothetical verb forms and Ibero-Romance[edit]

I was wondering if there should be a distinction noted for the descendants of Vulgar Latin verbs in Ibero-Romance (not including Catalan). Specifically, these languages merge the Latin second and third conjugation types (though the infinitives derive from the second type, with the long e in -ēre). For example, coser, saber, vender, torcer, etc. I understand there's no need to make a distinction for a main Latin entry (of an existing word), but if we're making the Vulgar Latin entries more detailed for those who wish to study or learn about it more, including differentiating various types of conjugation for the types of Romance, should there be a distinction made for these languages (for example in *cōsō or *torcō)? Or should it just be assumed that Portuguese, Spanish, Asturian, Galician, etc. always have the stress on the final syllable of infinitives, even if the Vulgar Latin form they're listed under has stress on a different vowel (which would apply to the rest of Romance)? Does anyone know whether this was a later phenomenon in that region, or whether they presumably used a separate form of Vulgar Latin in this regard? Since the first records of these languages come much later, it's hard to reach a conclusion on this. In other words, would Spanish coser have come from a *cosēre, as a variant or later evolution of *cōsere, or did it simply change the stress of all its verbal infinitives later, by analogy? Word dewd544 (talk) 22:39, 7 February 2017 (UTC)

Petitioning to change Turkic Khalaj to Khalaj[edit]

Anyone who's heard of Khalaj knows it is a Turkic language; Iranic Khalaj appears to be a ghost language, and it would be a good idea to stop perpetuating confusion about this. Crom daba (talk) 02:23, 8 February 2017 (UTC)

@Crom daba: I support the change. My materials always call this language simply "Khalaj". --Vahag (talk) 08:57, 12 February 2017 (UTC)

finding an unknown word

I would like to be able to find all possible words with particular first and last letters by typing in at least the first letter of a word, then a certain number of asterisks (or ?) representing the intervening letters, and then the last known letter. Is this something I can do in Wiktionary? Or perhaps someone knows another site that can help me do this. Clduncan (talk) 16:57, 8 February 2017 (UTC)

This would be extremely helpful, but I'm pretty sure you can't do it on Wiktionary. Andrew Sheedy (talk) 17:24, 8 February 2017 (UTC)
Try the simple way using '*' for a wildcard. You may want to restrict yourself to English lemmas using 'incategory:"English_lemmas"'. It works for me. DCDuring TALK 18:39, 8 February 2017 (UTC)
Well, look at that. I could have sworn I'd tried it before without success. Andrew Sheedy (talk) 18:45, 8 February 2017 (UTC)
I hadn't realized it until I went to the Mediawiki documentation (CirrusSearch) today. I thought it would take some regex search, which I didn't know how to limit to entry titles. DCDuring TALK 19:54, 8 February 2017 (UTC)
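DCDuring's tip covers searching on Wiktionary itself. For anyone filtering a downloaded word list offline instead, the matching logic is short. A minimal Python sketch, assuming `?` stands for exactly one letter and `*` for any run of letters (the usual glob convention); `match_words` and `pattern_to_regex` are hypothetical helpers, not part of CirrusSearch or any Wiktionary tool:

```python
import re
from typing import Iterable, List


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob-style pattern into a compiled regular expression.

    '?' matches exactly one character, '*' matches any run (including none);
    every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def match_words(pattern: str, words: Iterable[str]) -> List[str]:
    """Return the words that match the whole pattern, in input order."""
    regex = pattern_to_regex(pattern)
    return [w for w in words if regex.fullmatch(w)]


words = ["cause", "crane", "cave", "chance", "c"]
print(match_words("c???e", words))  # → ['cause', 'crane'] (five letters)
print(match_words("c*e", words))    # → ['cause', 'crane', 'cave', 'chance']
```

Using `fullmatch` rather than `search` is what anchors the pattern at both ends, so the first and last letters really are the first and last letters.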


Since the Saxons didn't name themselves after the seax, where does the word *sahsô come from? ÞunoresWrǣþþe (talk) 19:42, 8 February 2017 (UTC)

Display order of Template:cite-book

Right now, it displays the author(s), then the year, then the entry, then the title of the book. This doesn't really make sense to me. A more sensible order would be to have the entry first, then the title, then author(s), then year. Can this be changed? —CodeCat 19:02, 9 February 2017 (UTC)

IMO, the year needs to come first. This helps with ordering the citations in chronological order. --Daniel Carrero (talk) 19:05, 9 February 2017 (UTC)
Yes, and Wiktionary:Quotations#How to format quotations calls for putting the year first. —Aɴɢʀ (talk) 10:37, 10 February 2017 (UTC)
These are citations though, not quotations. —CodeCat 13:32, 10 February 2017 (UTC)
Aren't citations just quotations found in the "Citations:" namespace? Is there any reason to format them differently at all? (except you add a "#" when the quotation is directly below a sense in an entry) --Daniel Carrero (talk) 14:07, 10 February 2017 (UTC)
See the documentation of {{cite-book}}. —CodeCat 14:39, 10 February 2017 (UTC)


Huh? Why is this not in IPA? —CodeCat 14:01, 10 February 2017 (UTC)

-త- Wyang (talk) 10:50, 13 February 2017 (UTC)
So can it be moved please? —CodeCat 17:09, 13 February 2017 (UTC)
Sorry, I don't know Telugu. This only looks like a concerned face to me. You may wish to tag the creator, Dr. Rajasekhar, for his opinion. Wyang (talk) 09:41, 14 February 2017 (UTC)

Actualités: Monthly news of French Wiktionary in English

Hi all, here comes the 22nd issue of Actualités, the periodical news about Wiktionary, translated into English specially for you! Well, it's the January edition, with a small delay (only 11 days!).

This is a huge issue. Massive. You're gonna love it. It's wiktionarily tremendous. There are three amazing articles: a focus on thesauri in French Wiktionary, a description of a published dictionary of insults (more than 9,300 insults in French!) and an article about the names for snow in Inuit. Plus: a Wiktionarian coded a small game based on the Wiktionary database, and it was improved collaboratively. As usual, we also mention some briefs and provide metrics.

This time it was quite hard to translate, so I am sorry, but it is barely better than a draft and you may have to guess the meaning of some parts, or help finalize the translation if you can. It is still made without any funding for anyone, crafted by Wiktionarian hands. I hope it will be inspiring for those who read it, and I'll be delighted to answer any questions or listen to comments on our work! Cheers, Noé (talk) 20:16, 11 February 2017 (UTC)

I wish I had time to help with the translations this time, but I haven't even found time to read it yet! Andrew Sheedy (talk) 03:54, 12 February 2017 (UTC)

Hunnic Lemmas

There were previously no Hunnic Lemmas on the site. I've added 18 of them, with 2 of those being reconstructions. ÞunoresWrǣþþe (talk) 12:47, 12 February 2017 (UTC)

They should all be reconstructions IMO since we don't have any primary sources. Not to mention that Pritsak's etymologies are speculative and that alternative theories exist. Crom daba (talk) 14:12, 12 February 2017 (UTC)
The Greek forms count as attested ones. ÞunoresWrǣþþe (talk) 15:58, 12 February 2017 (UTC)
We usually count those as belonging to the language they're attested in, compare Alanic *Asparuk. Crom daba (talk) 16:34, 12 February 2017 (UTC)
This is different. ÞunoresWrǣþþe (talk) 16:40, 12 February 2017 (UTC)
Yeah, to my eye, these should all be reconstructed, as there is no direct attestation of the words in the Hunnic language. We have sometimes been inconsistent (Gaulish names in Latin are sometimes put in Gaulish), but in the absence of a proper written Hunnic corpus, all lemmata should be reconstructed, especially those words that are not directly attested (bárs, munǯuq, qará, etc.). —JohnC5 18:01, 12 February 2017 (UTC)
@ÞunoresWrǣþþe: You moved all of those words to Reconstruction, but they need to be in Reconstruction:Hunnic/. Currently, they are not legal entries. —JohnC5 19:10, 13 February 2017 (UTC)
Would like to note that quite a few scholars (e.g. D.H. Green) still consider Attila to likely be a Gothic name (or a Gothicized version of a Hunnic name), his Hunnic name may very well have been different. — Kleio (t · c) 17:31, 12 February 2017 (UTC)
Those historians are a minority. ÞunoresWrǣþþe (talk) 18:52, 12 February 2017 (UTC)
What makes you say so? — Kleio (t · c) 19:31, 12 February 2017 (UTC)
Because it's a fact. ÞunoresWrǣþþe (talk) 21:08, 12 February 2017 (UTC)
Then back it up. Korn [kʰũːɘ̃n] (talk) 21:32, 12 February 2017 (UTC)
No thanks. I'm done wasting my time in pointless arguments. ÞunoresWrǣþþe (talk) 14:51, 13 February 2017 (UTC)
Love how you live in some pathetic little bubble where basic facts have to be sourced. There's no "source" to the fact a minority of historians believe it's Gothic and a majority believe it's Turkic-Altaic. Want a list of every publication that's ever leaned in favour of one or the other? Good luck. It's impossible. And since you with your superior opinion are clearly correct... Please, source the opposite claim: that there are enough historians that believe Attila is a Gothic name to even warrant the inclusion of that fringe theory in a dictionary? By all means, try, but I expect you only "contributed" to this discussion to be contrarian. Oh, don't forget to source every word in your response, otherwise we can't be sure you're speaking English or that your words mean what they do. ÞunoresWrǣþþe (talk) 15:04, 13 February 2017 (UTC)

Sorry to break it to you, but some of the sources you have provided do not meet Wiktionary's standards (e.g. Reconstruction:Hunnic/strava, Reconstruction:Hunnic/medos). In addition to this, you haven't followed appropriate naming regulations either (e.g. Reconstruction:Ōybárs, Reconstruction:ōy, Reconstruction:čérkün, Reconstruction:tōn). I believe that the concerns raised by other users are well-founded: the subject at hand is not well documented and therefore prone to scrutiny, and rightfully so. I also advise you not to refer to words in languages you don't master as you did here. And last but not least, will you do us the courtesy of abstaining from personal attacks; they won't do you any favours around here. --Robbie SWE (talk) 20:43, 13 February 2017 (UTC)

@ÞunoresWrǣþþe: Also, where do the guidelines for orthography in Wiktionary:About Hunnic come from? Why are some of the vowels starred (, *oː, *aː) and others are not, especially when they are all reconstructed? Also, why is there a mix of IPA representations (*oː) and orthographic ones ()? —JohnC5 22:18, 13 February 2017 (UTC)
In that case, Robbie, never edit a Hunnic page again. You don't master the language. You clearly have some racial bias here, I can see from your user page you're Romanian. You're the last person I'd trust on this, given your source contradicts you, and was posted in a foreign language with no translation to confuse or mislead people. ÞunoresWrǣþþe (talk) 23:18, 13 February 2017 (UTC)
Cucura > Koúkouron > Cucură or Cucura > Cucură. I must say it's very "hard" to decide. Reminds me of my fringe theory where the word boat doesn't come from Old English bāt, but instead bāt > qóng > boat, came to English through Mandarin. ÞunoresWrǣþþe (talk) 00:12, 14 February 2017 (UTC)
Wow, do we have another Uther on our hands? —CodeCat 23:51, 13 February 2017 (UTC)
@CodeCat: Interesting question, let's look at the evidence. This user created the page Wiktionary:About Hunnic by copying WT:ACEL-BRY and still doesn't understand how the {{shortcut}} template works. The user has edited primarily Celtic and Anglo-Saxon pages. The user does not understand how namespaces work. The user has taken on a language for which (s)he clearly lacks a comprehensive or consistent understanding of the literature. The user has a name associated with the British Isles' mythology (Old English for Thunder's Wrath). And the user is quite rude, resorting to incomprehensible personal assaults when challenged for evidence. All in all, I'd give this one a strong 8/10 on the “Uther-o-meter”™. —JohnC5 00:14, 14 February 2017 (UTC)
I am not that person, though you may of course believe what you want. I will say, however, that you are a 10/10 on the failed abortion-o-meter. A grade-A cunt, not those made in China cunts but a bon and bred American cunt. ÞunoresWrǣþþe (talk) 00:30, 14 February 2017 (UTC)
And pssst, Thor's Wrath, you bescittende cunte, and fukkende horninsunu. Chupama y chingate, cago en la leche de th madre. ÞunoresWrǣþþe (talk) 00:32, 14 February 2017 (UTC)
Oh sorry, that is my bad as it is completely ambiguous. I'm gonna' upgrade you to a 9/10 on the Uther-o-meter. Does this mean we block now? @Chuck Entz? —JohnC5 01:23, 14 February 2017 (UTC)
After that, I support a block. —CodeCat 01:33, 14 February 2017 (UTC)
I don't know what the official policy is for declaring someone a sockpuppet. Do I need a CheckUser (@TheDaveRoss)? Or can we just block Uther? Regardless, I'll block the account for a few days for offensive behavior. Anyone wanna' nuke some Hunnic entries? —JohnC5 01:50, 14 February 2017 (UTC)
I think what has happened so far is bad enough, without them being confirmed as a sockpuppet. So block away I say. —CodeCat 01:51, 14 February 2017 (UTC)
I always have trouble judging these things. I gave them a week block. If someone wants to upgrade that, feel free. —JohnC5 01:54, 14 February 2017 (UTC)
  • @JohnC5, CodeCat: I have upgraded the block; regardless of whether or not he is Uther, this is grossly inappropriate. I do think that a CU investigation would be handy all the same. The real question, which I am linguistically ignorant of, is whether we should nuke all these Hunnic lemmas, reconstructed and otherwise. —Μετάknowledgediscuss/deeds 03:21, 14 February 2017 (UTC)
    • @Metaknowledge: Thanks for the upgrade. As to the Hunnic lemmata, I possess not the knowledge to evaluate whether the entries are salvageable nor know-how to fix them if they be so. Regardless, they cannot stay as they are. —JohnC5 03:30, 14 February 2017 (UTC)
      • I've moved Reconstruction:Hunnic/adám to an acceptable name and tidied it up in accordance with the one cited source. I have neither the time nor the inclination to do the same for all the remaining Hunnic entries, but this one at least has a scholarly background. —Aɴɢʀ (talk) 09:05, 14 February 2017 (UTC)

Wow, nothing like a good ol' cup of bigotry in the morning. Not trying to defend myself here since I don't believe that anyone buys what you're selling, I just want to point out that I did not make any substantial changes to Hunnic reconstructions because I'm admittedly not well versed in reconstructed languages. However, unlike you, I speak Romanian and I know Romanian history and orthographic norms, so I'm not going to abstain from correcting obvious and elementary errors. I'm somewhat taken aback by your claim that I'm racially biased; it says much more about you than it does about me. Not that it matters, since you don't know me, you apparently have no respect for anyone else around here, and you do remind me an awful lot of Uther, who was impossible to discuss with; but for the record, I'm a Swede. Mind you, it doesn't make me more or less objective, but it exposes once again your flawed logic and antagonistic behaviour. I would've blocked you indefinitely if it were up to me, but thankfully it's not. I have respect for my peers, hence I trust their judgements regarding the duration of your block and your future participation in this project. --Robbie SWE (talk) 10:34, 14 February 2017 (UTC)

We should probably delete all his Hunnic reconstructions, if only because they do not list descendants. --Vahag (talk) 07:20, 17 February 2017 (UTC)

@Vahagn Petrosyan: Would you do the honors? —JohnC5 07:27, 17 February 2017 (UTC)
Done. All that was salvageable has been moved to Reconstruction:Proto-Mongolic/köküür --Vahag (talk) 09:52, 17 February 2017 (UTC)

De-Recognition of Wikimedia Hong Kong

I think this is a big mistake. ÞunoresWrǣþþe (talk) 18:06, 13 February 2017 (UTC)
@ÞunoresWrǣþþe: Have you tried contacting anyone at WM-HK? —Justin (koavf)TCM 19:57, 13 February 2017 (UTC)

Proposal: Implementing Wikidata access

I suggest implementing Wikidata access on Wiktionary. I believe this requires a vote, and then, if it passes, we'll have to file a request on Phabricator.

Apparently we can't access Wikidata using wikitext like {{#property:P569}} or Module:wikidata right now.

I'd like to use Wikidata to contain all character information (including character names and image links) that is currently on subpages of Module:Unicode data, for use by other Wiktionaries as discussed here: Wiktionary:Grease pit/2017/February#Porting and debugging of module.

There are probably other things we can do with Wikidata.

I wonder if we need all 19 extensions from the "Wikibase" and "DataValues" groups that Wikipedia is currently using. Some of these extensions, like a geographical parser and a JavaScript API, seem optional, but nice to have anyway.

See these pages for the lists of extensions:

--Daniel Carrero (talk) 18:59, 13 February 2017 (UTC)

  • Support. I absolutely believe that we need to integrate with Wikidata and the rest of the WMF projects. How that will look exactly and the extent to which Wiktionary will become more like an OmegaWiki project that is primarily a database are important discussions to have, but there's no doubt that there are substantial advantages to opening up to Wikidata. —Justin (koavf)TCM 20:00, 13 February 2017 (UTC)
  • Support. Strongly support. If we want to raise the quality of all language Wiktionaries, we need ways to cooperate instead of duplicating loads of data on every site. –dMoberg 21:42, 13 February 2017 (UTC)
  • Support. Absolutely. - TheDaveRoss 21:58, 13 February 2017 (UTC)
  • Support (naturally) and also comment: If we are going to use Wikidata to contain any information in all languages (for example: the periodic table in all languages or something), then I'd also suggest eventually changing all our exceptional language/family/dialect codes to be ISO-compliant (which can be discussed and/or done later; it does not have to be right now). Reason: it appears that any information we put there will be accessible by other people rather than only by Wiktionary, so it might make sense to use a universal standard. For example, maybe it would be nice to change our "gem-pro" to "gem-x-pro", because I believe that the former basically means "Germanic-Provençal" instead of "Proto-Germanic". (Here's a previous discussion about this: Wiktionary:Beer parlour/2013/February#Changing "exceptional" language codes to comply with the HTML specification) --Daniel Carrero (talk) 06:06, 15 February 2017 (UTC)
    Basically, this is a proposal that would involve more typing for zero benefit. It would be one thing if the ISO assigned a meaningful code for Proto-Germanic (they won't), but if we are to make our own codes, they should be consistent with the ISO codes and not too much trouble — and both of those things are currently true. —Μετάknowledgediscuss/deeds 07:11, 15 February 2017 (UTC)
    @Metaknowledge: Re "they should be consistent with the ISO codes": I think I know what you mean, because our codes don't conflict with actual ISO codes. Still, we are not using strictly ISO-compliant codes. Probably the use of made-up language codes is an issue to be discussed at Wikidata too. We might want to ask them: "Are you cool with keeping some information using langcodes that Wiktionary made up like gem-pro meaning Proto-Germanic and roa-opt meaning Old Portuguese?" I guess they have a right to want ISO-compliant codes but I didn't check that yet.
    If in the future we decide to change all our codes to be ISO-compliant for whatever reason, for "gem-pro", instead of "gem-x-pro" we might use "qge-pro" and keep within the established 7-character limit. (we may even shorten it somehow) Admittedly, it would take work to change all the codes. --Daniel Carrero (talk) 15:23, 15 February 2017 (UTC)
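To make the "gem-pro" point concrete: in the BCP 47 tag grammar (RFC 5646), a three-letter subtag right after the primary language subtag is read as an extended-language (extlang) subtag, so a parser splits gem-pro into language gem plus extlang pro (and pro is also the ISO 639 code for Old Provençal), while gem-x-pro puts pro into the private-use section. The sketch below checks only well-formedness against a simplified subset of that grammar (no variants, extensions, or multiple extlangs). It does not consult the IANA registry, so gem-pro passes here even though, as far as I know, pro is not registered as an extlang. `parse_tag` is a hypothetical helper.

```python
import re
from typing import Dict

# Simplified subset of the BCP 47 (RFC 5646) langtag grammar: language,
# one optional extended-language subtag, script, region, and a private-use
# section. Variants, extensions, multiple extlangs and grandfathered tags
# are deliberately not handled.
_TAG = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:-(?P<extlang>[a-z]{3}))?"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?:-x-(?P<private>[a-z0-9]{1,8}(?:-[a-z0-9]{1,8})*))?",
    re.IGNORECASE,
)


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a tag into named subtags.

    Raises ValueError if the tag is not well-formed under the simplified
    grammar above. Registry validity is not checked.
    """
    m = _TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not well-formed (simplified grammar): {tag!r}")
    return {k: v for k, v in m.groupdict().items() if v is not None}


print(parse_tag("gem-pro"))    # → {'language': 'gem', 'extlang': 'pro'}
print(parse_tag("gem-x-pro"))  # → {'language': 'gem', 'private': 'pro'}
```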

There is an ongoing project in Wikidata to include Wiktionary data, and I invite you to participate with your proposal. In my opinion, it is mainly developed by people looking at the benefits for Wikidata and not so much by people looking at how it can be useful for Wiktionary. I tried to make them clarify some aspects of the development, but it is still quite hard for me to understand how it will not create a fork. Anyway, I am very happy to see your discussion about Wikidata. Noé (talk) 09:11, 15 February 2017 (UTC)

  • Support - also volunteering to help with the integration. @Noé to clarify, this proposal is a bit different and complementary to the one you linked. We need a 2-way exchange of data; this proposal deals with the integration of already existing data (such as language codes) back into Wiktionary. – Jberkel (talk) 09:28, 15 February 2017 (UTC)
    I am still not sure I understand which data you want to use from Wikidata. Modules were mentioned, but that means hosting them in Wikidata, and I don't know if it is a good repository for that kind of file. I will mention this discussion on the French Wiktionary to invite more comments on your proposal, and see whether we might file a common ticket on Phabricator. Noé (talk) 09:48, 15 February 2017 (UTC)
    The obvious first modules to migrate are the ones which are faux databases already, those supporting language codes/names/families and those supporting categorization spring to mind. This is data which cannot currently be well supported (those modules are just a flat-file database which is extremely limiting). Beyond that, the sky is really the limit concerning what could be migrated, the limiting factor will probably be UI development more than anything. - TheDaveRoss 12:26, 15 February 2017 (UTC)
    I suppose you are talking about e.g. Module:languages/data2? The only thing I'm worried about is performance: those lists are basically loaded once per page, which allows a page full of translations to be rendered quite quickly. I'm not sure it will be as fast if we have to query the database for every language (unless we get all the languages at once with a single query). — Dakdada 14:11, 15 February 2017 (UTC)
    Also, those faux databases are the subject of much contention. Working out Serbian vs Serbo-Croatian and all the other headaches between different Wiktionaries would make the databases a lot more complex, unless you expect the individual Wiktionaries to choose to lose their autonomy. —Μετάknowledgediscuss/deeds 17:48, 15 February 2017 (UTC)
    About moving language codes/names/families to Wikidata... Would we use Wikidata to expand "pt" into "Portuguese", "ojp" into "Old Japanese", etc.? We might have a problem if someone wants to change language names in a way that we don't approve, like if someone decides that "ang" should be "Anglo-Saxon" instead of "Old English". Someone might decide that "en" looks better as "Modern English" rather than "English" for some purpose outside of Wiktionary, which could then affect Wiktionary.
    That said, Wikidata could have a complete list of languages, including the fact that "sr" means "Serbian", but we could have an internal module blocking the use of "Serbian" in some places like translation tables. --Daniel Carrero (talk) 09:46, 16 February 2017 (UTC)
    The reason this could be a problem is simple: the language information used on Wiktionaries is not data but metadata, and this metadata needs to be controlled by the respective Wiktionaries (as in controlled vocabulary). In other words, Wikidata is not the right tool for that, and it should be reserved for actual content. — Dakdada 09:55, 16 February 2017 (UTC)
    The majority of data could probably come directly from Wikidata. For the remaining edge cases (the mentioned custom language codes, Proto-Germanic, contentious denominations like Serbian etc) we should try to see if there's a way to get this data into Wikidata, if that's not possible we could have “overrides” on the Wiktionary side(s) which would take precedence. – Jberkel (talk) 10:46, 16 February 2017 (UTC)
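Jberkel's "overrides" idea, together with Dakdada's point about loading everything in one go and Daniel's idea of blocking some codes in translation tables, can be sketched as one bulk load merged with a local table that always takes precedence. This is an illustration under assumptions: the shared names below are made-up sample data, not real Wikidata content, and `build_language_table` and `translation_language_name` are hypothetical names.

```python
from types import MappingProxyType
from typing import AbstractSet, Mapping


def build_language_table(
    shared_names: Mapping[str, str],
    local_overrides: Mapping[str, str],
) -> Mapping[str, str]:
    """Merge bulk-loaded shared names with this Wiktionary's overrides.

    Intended to be called once per page (one bulk load instead of one query
    per language). Local entries always win over shared ones.
    """
    merged = dict(shared_names)
    merged.update(local_overrides)
    return MappingProxyType(merged)  # read-only view of the merged table


def translation_language_name(
    table: Mapping[str, str], code: str, blocked: AbstractSet[str]
) -> str:
    """Name to show in a translation table; refuse codes this project blocks."""
    if code in blocked:
        raise ValueError(f"{code!r} is not allowed in translation tables")
    return table[code]


# Hypothetical sample data: the shared store calls "ang" Anglo-Saxon,
# and the local override restores Wiktionary's name.
shared = {"en": "English", "ang": "Anglo-Saxon", "sr": "Serbian"}
overrides = {"ang": "Old English", "gem-pro": "Proto-Germanic"}
table = build_language_table(shared, overrides)
print(table["ang"], "|", table["gem-pro"], "|", table["en"])
# → Old English | Proto-Germanic | English
```

Keeping the override table local means a rename upstream changes only codes Wiktionary has not pinned, which is the compromise between shared data and the controlled vocabulary Dakdada describes.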

Another comment: apparently Wikidata already contains information (author, publication date, etc.) about some books. The page wikidata:Q43361 is about Harry Potter and the Philosopher's Stone. Maybe we can use it for quotation purposes. We could type something like {{auto quote|Q43361}} and have all the information filled automatically when possible. (the template does not have to be called "auto quote", it was just an example). --Daniel Carrero (talk) 13:13, 16 February 2017 (UTC)

A new Labs Tool to visually explore etymological relationships extracted from the English Wiktionary

Hi all! I have developed a tool to visualize etymologies. Please check it out at tools.wmflabs.org/etytree. My work is funded by an IEG grant. Please leave your feedback here. It will help improve it.

(Screenshot: the etymology graph for the word coffee.)

As a first release, it is impressive how well the automatic extraction of data works (with some bugs, of course...). This is because Etymology sections are written using well-defined standards. I would like to get some feedback about some difficulties I have encountered while extracting data and some ideas I have about new templates. I wrote some notes here. Please add your comment there if you have any. Some additional notes follow:

  1. I could not use trees as in the nicer demo because there are loops between words that cannot be fit in trees (in trees branches don't merge). Loops are conflicting etymologies. Many are due to simple inconsistencies that users can easily fix, others are real conflicting etymologies and should be represented using multiple trees. I will work on this in a future release.
  2. Etymology sections rarely link to words with their sense/POS; they generally only link to the lemma. This is a problem for homographs, because they generally have different etymological trees, which get mixed up in the current implementation. See for example the discussion on the Etymology Scriptorium. It would be nice to have more precise links in etymology sections that point to the correct word.
  3. For now, I am not plotting all derived words, to declutter the visualization a bit.
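Point 1 comes down to cycle detection in a directed graph. A minimal Python sketch, assuming edges point from a term to the term it is said to derive from, and that nodes are keyed by (language code, term, etymology number), which is also one way to keep the homographs of point 2 apart. `find_cycle` is a hypothetical helper and the sample edges are illustrative, with one deliberately inconsistent link. It uses recursive depth-first search, so very deep chains could hit Python's recursion limit.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical node identity: (language code, term, etymology number).
# Keying on the etymology number, not just the spelling, keeps homographs
# with unrelated histories from being merged into one node.
Node = Tuple[str, str, int]


def find_cycle(graph: Dict[Node, List[Node]]) -> Optional[List[Node]]:
    """Return one cycle (first node repeated at the end) or None.

    Depth-first search: reaching a node that is still on the current
    path means the path loops back on itself.
    """
    on_path: Set[Node] = set()
    finished: Set[Node] = set()
    path: List[Node] = []

    def dfs(node: Node) -> Optional[List[Node]]:
        on_path.add(node)
        path.append(node)
        for source in graph.get(node, []):
            if source in on_path:
                return path[path.index(source):] + [source]
            if source not in finished:
                cycle = dfs(source)
                if cycle:
                    return cycle
        path.pop()
        on_path.discard(node)
        finished.add(node)
        return None

    for start in list(graph):
        if start not in finished:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None


coffee = ("en", "coffee", 1)
koffie = ("nl", "koffie", 1)
kahve = ("tr", "kahve", 1)
print(find_cycle({coffee: [koffie], koffie: [kahve], kahve: []}))  # → None
# An inconsistent back-link turns the chain into a loop:
print(find_cycle({coffee: [koffie], koffie: [kahve], kahve: [coffee]}))
```

A graph with no cycle can be drawn as a tree (or forest, where branches merge); any cycle returned is exactly the kind of conflicting etymology that needs fixing or splitting into separate trees.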

Looking forward to your comments! Epantaleo (talk) 18:22, 15 February 2017 (UTC)

The use of also

It seems PapiDimmi and I disagree on how to use also. Please advise on this reversion: [8]. Relevant policy should be here: Wiktionary:Entry_layout#Before_the_first_language_section. Equinox 19:31, 15 February 2017 (UTC)

I wouldn't say "identical letters" but rather "similar letters", as for example oe, œ, ö and ø are different letters but can have an {{also|}}, like övrig and øvrig. However, after reading WT:EL#Before the first language section, I wouldn't think that chauffeur and choffer belong on Chauffer. chaufer, chaffer, caûffer and maybe many more entries are also "similar" to Chauffer. And then there's the matter of interpreting "similar". Similar in spelling, in pronunciation? As Wiktionary covers many languages, fish and Fisch would be similar too, both by spelling and pronunciation. And wouldn't fis, visch, ish, gish, tish and wish be similar to fish too? With all those "similar" entries the also would become too long and the "similar" would be too vague, too open for interpretation. -Slœtel (talk) 10:32, 16 February 2017 (UTC)
I think you were right in removing those links. {{also}} is for typographical similarity. For language-dependent similarity, there is the See also section of entries. — Ungoliant (falai) 11:44, 16 February 2017 (UTC)
That template is supposed to show words that are identical except for capitalization; also for words that are identical except for diacritics. Originally there were two reasons why {{also}} was needed. First, if you typed, for example, mike, and then later typed Mike, you would go to mike and would not be able to access Mike. I think this problem has since been fixed. Second, since most people with an English keyboard have trouble typing diacritics, we also put forms with diacritics (cafe, café). {{also}} is definitely not for variant spellings, synonyms, translations, or the like. However, in the case of other alphabets and scripts, it might be useful to include words spelled with a letter that many English-speakers confuse with another letter (B, ß). —Stephen (Talk) 12:14, 16 February 2017 (UTC)
I agree with Equinox, Ungoliant, and Stephen that {{also}} is for words that differ only in capitalization (mike/Mike), diacritics (sake/saké), punctuation (wont/won't), spacing (everyday/every day), and the like. I'm undecided on visually similar characters from different scripts, though (e.g. to/το; hug/հաց). —Aɴɢʀ (talk) 12:39, 16 February 2017 (UTC)
Agreed with all the above. Another instance in which {{also}} might be useful is for confusion between the long s and lowercase f (e.g. filly could have a link to silly, if the latter is attested as ſilly). Andrew Sheedy (talk) 12:44, 16 February 2017 (UTC)
In these cases it might be useful to actually use the long s in the template, so readers can recognize it. Andrew Sheedy (talk) 12:46, 16 February 2017 (UTC)
I have long thought that the basic use of {{also}} above the first L2 was most useful to help someone get to an entry with different diacritics or capitalization in a different language. That is the reason for its placement above the first L2. IOW, having terms that are merely alternative forms in the same language would be not just redundant but wrong IMO. Whether this would be something worth the effort of correction is a separate matter. DCDuring TALK 13:56, 16 February 2017 (UTC)
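The consensus above limits {{also}} to titles that differ only in capitalization, diacritics, punctuation or spacing, which is mechanical enough to check automatically, e.g. for a bot suggesting {{also}} links. A minimal Python sketch; `also_key` and `are_also_candidates` are hypothetical names. It strips combining marks after Unicode NFD decomposition, so letters with no decomposition such as ø and œ are left as they are (övrig/øvrig would need an extra hand-made mapping), and cross-script look-alikes like to/το are not covered, since that case was left undecided.

```python
import unicodedata


def also_key(title: str) -> str:
    """Reduce a page title to a comparison key for {{also}} candidates.

    Drops combining diacritics (after NFD decomposition), case-folds, and
    removes everything that is not a letter or digit (spaces, apostrophes,
    hyphens, other punctuation).
    """
    decomposed = unicodedata.normalize("NFD", title)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    folded = without_marks.casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def are_also_candidates(a: str, b: str) -> bool:
    """True if two distinct titles differ only in case, diacritics,
    punctuation or spacing."""
    return a != b and also_key(a) == also_key(b)


pairs = [("mike", "Mike"), ("sake", "saké"), ("wont", "won't"),
         ("everyday", "every day"), ("fish", "Fisch"), ("övrig", "øvrig")]
for a, b in pairs:
    print(a, b, are_also_candidates(a, b))
```

The first four pairs come out True; fish/Fisch and övrig/øvrig come out False, matching the view that {{also}} is not for variant spellings in other languages.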

Titles of pronunciation modules

I recently created the Czech pronunciation module Module:cs-pronunciation. @CodeCat said on the talk page that it should be moved to Module:cs-IPA, and I asked why, because -IPA is currently not the most commonly used form for the module names in Category:Pronunciation modules. (23 modules use -pron, 11 use -IPA, 6 use -pronunciation, and 4 use -pronunc.) @Atitarev said that it was decided in a discussion that -IPA should be used, but did not link to the discussion. I couldn't find the discussion by searching. Does anyone know where that discussion was? — Eru·tuon 20:21, 17 February 2017 (UTC)

FWIW, I would prefer them all under -IPA as well. —JohnC5 20:30, 17 February 2017 (UTC)
Aside from what name all modules should have, if the template's name is {{cs-IPA}}, it only makes sense to use the same name for the module. —CodeCat 20:32, 17 February 2017 (UTC)
I don't remember where the discussion was but I also don't see why the module can't be renamed. --Anatoli T. (обсудить/вклад) 22:48, 17 February 2017 (UTC)
Not all pronunciation modules are pure IPA modules. Wyang (talk) 02:32, 18 February 2017 (UTC)
I think they should be standardized under '-pron' because that's the most common case currently, it's easier to type, and for Wyang's reason. Benwing2 (talk) 02:34, 18 February 2017 (UTC)
@Wyang: Which pronunciation modules are not pure IPA? I have not encountered that yet. — Eru·tuon 20:09, 18 February 2017 (UTC)
@Erutuon: Module:my-pron, for example, also generates a phonetic respelling and romanizations in addition to IPA. —Aɴɢʀ (talk) 20:26, 18 February 2017 (UTC)

I guess I personally prefer -pronunciation, because it does not require editors to know what IPA stands for, and because pron is an ambiguous abbreviation that can mean either pronoun or pronunciation. — Eru·tuon 20:09, 18 February 2017 (UTC)

I recall a discussion in which it was decided that templates that automatically produce pronunciation info should be named "xx-IPA"; I don't think there was ever a discussion about what the corresponding modules should be called. Since ordinary editors very rarely have to type the names of modules (as they frequently do the names of templates), succinctness is less important for module names than for template names. I agree that "xx-pron" should be avoided as templates named "xx-pron" are usually headword-line templates for pronouns, and it would be confusing for "pron" to stand for something else in modules. Other than that, I don't really mind what the module is called as long as the template is called "xx-IPA" as expected, since the module name only has to be written once, namely in the template that invokes the module. —Aɴɢʀ (talk) 20:24, 18 February 2017 (UTC)

Tsonga Dictionary

Via OTRS (OTRS ticket # 2016092010022731) someone has released a digitized Tsonga-English/English-Tsonga dictionary under CC-BY-SA. I currently have the Tsonga-English portion as a 438 page Word document, and I would like to figure out what the best thing to do with the information is. I am happy to do some work in terms of reformatting and uploading the content, however I do not speak Tsonga nor am I familiar with Bantu or any South-African languages. I gather that the language is a bit politically complicated as well. We have very little coverage of this language as it stands.

Are there any contributors who would feel comfortable helping to review the dictionary and help with getting the data ported into Wiktionary? One possible starting point is that I can reformat and upload all of the content to a collection of subpages, which could then be moved into the main namespace by contributors who feel comfortable verifying. I could also provide the Word document to interested people who could advise on the content and whether it is trustworthy enough to just upload in its entirety. - TheDaveRoss 21:00, 17 February 2017 (UTC)

How many words is it? What kind of information is present- definitions, pronunciations, inflections? DTLHS (talk) 21:01, 17 February 2017 (UTC)
I am not sure how many defined words or definitions there are exactly, there are 220,564 words in the document so I would guess that the number of defined terms is on the order of 20,000. (edit: there are 27,451 line breaks, so nearly that many terms.)
Here is a sample of the information provided:
  • abuxeni, (salutation) see avuxeni.
  • abvana 5, see bvana, maabvana.
  • accelerando 9, accelerando (music).
  • adajiyo 9, adagio (music).
  • adirese 9, (Eng.) address of a letter, postal address.
  • -adiresela, address a letter.
  • afirikati 9, (Eng.) affricate.
  • Afrika 1, the continent of Africa; -- -Dzonga, South Africa; -- wa Dzonga, Southern Africa.
  • agenda 9, (Eng.) agenda.
  • Agoste, (Eng.) August.
  • ahanti, ahati, interj. to express uncertainty, dubiousness, indecision, perhaps, weak refusal: May be! I don't know I am sure.
  • ahe, ahee, interj. as an answer to a greeting, also to express thanks, approval, assent; interj. to children and boys among themselves.
  • ahee, ahehe, interj. used as a warning against danger: mind, look out, beware; here you are, there it is.
  • -ahlama, be wide open; open one's mouth, gape.
  • -ahlamela, stay with open mouth. (Idiom) -- munhu, to pick a quarrel with a person.
  • -ahlamisa, cause mouth to open, make gape; hold open, as a sack. (Idiom) -- nomu, to gasp, to speak; -- tinhlaya, to feel drowsy.
  • -ahlamula, yawn, open the mouth.
  • -ahluka, detach, separate itself from.
  • -ahlukana, become divided, separated, as husband and wife by divorce.
  • -ahlukanya, detach, separate.
  • -ahlula, judge, adjudicate, try a case.
  • ajenda 9, (Eng.) agenda.
  • -aka, 1 build, construct, edify, elevate, erect, as a hut; -- tinghalava, shipbuilding. 2 dwell, inhabit. 3 be ingrained, e.g. mukhuva wu ta aka, the habit is ingrained. (Idiom) -- vuxaka, to foster friendship; -- mhaka, to prepare a court case; -- tiko, to develop a country; -- muti, to stay peacefully together, as husband and wife.
  • -akana, help each other to attain a high status; mould each other's character.
  • -akademiki, (Eng., adj.) academic.
The portion of the document that I have does not indicate what all of the annotation means, so there may be more information there (such as the 1s and 9s) than I can interpret. - TheDaveRoss 21:10, 17 February 2017 (UTC)
I'd guess the numbers are noun classes. Equinox 21:13, 17 February 2017 (UTC)
I'd recommend looking at Zulu and Swahili entries. Zulu and Swahili are the best developed Bantu languages on Wiktionary. In Zulu, verbs and adjectives are placed at the form without the hyphen but the headword is displayed with the hyphen, as are links. See bona for an example. As for the noun classes, comparing it to the list at w:Tsonga language, it appears that your list does not actually give forms that include the noun prefixes. Perhaps there are different allomorphs of the prefixes, so that for example class 9 nouns have no prefix if the noun begins with a-, but I don't know. I may look into it more. Feel free to ask me for any help as I'm already experienced in handling Zulu. —CodeCat 21:23, 17 February 2017 (UTC)
I've created a basic entry at aka, using {{ts-verb}} which I just created. You can use this as a basic template at least for the verb entries. We probably want a reference template for this dictionary, but I don't know what it's called, who made it or where it's located. —CodeCat 21:31, 17 February 2017 (UTC)
  • @TheDaveRoss: It will have to be done partially by hand, and that will require a lot of effort. Right now, the key is to make this publicly available by posting it on Wikisource. —Μετάknowledgediscuss/deeds 21:32, 17 February 2017 (UTC)
    I'll try and get it up on Wikisource once I have the remainder (especially since there may be more meta information available in the other part). - TheDaveRoss 21:50, 17 February 2017 (UTC)
    • @TheDaveRoss: For what it's worth, AWB would probably help in creating them as well. Ping me if you'd like me to assist. —Justin (koavf)TCM 03:06, 18 February 2017 (UTC)
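
If the entries are reformatted by script before upload, the sample lines above look regular enough for a first automated pass. Below is a minimal Python sketch, assuming each line consists of a headword, an optional number (treated here as a noun class, following the guess above; the dictionary itself does not confirm this), a comma, and the gloss. Hyphenated verb and adjective stems get a page title without the hyphen, following the Zulu convention described above. The names `parse_line` and `Entry` are hypothetical, and lines listing several variant headwords (e.g. ahanti, ahati) would still need manual cleanup.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed line layout: optional leading hyphen (verb/adjective stem),
# headword, optional number (guessed to be a noun class), comma, gloss.
ENTRY_RE = re.compile(
    r"^(?P<hyphen>-?)(?P<head>[^\s,]+)(?:\s+(?P<cls>\d+))?,\s*(?P<gloss>.*)$"
)


@dataclass
class Entry:
    page_title: str            # stem without the hyphen, used as the page name
    display: str               # headword as printed, hyphen kept for display
    noun_class: Optional[int]  # None when no number precedes the first comma
    gloss: str                 # everything after the first comma, minus final "."


def parse_line(line: str) -> Optional[Entry]:
    """Split one dictionary line into fields, or return None if it doesn't fit."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    cls = m.group("cls")
    return Entry(
        page_title=m.group("head"),
        display=m.group("hyphen") + m.group("head"),
        noun_class=int(cls) if cls else None,
        gloss=m.group("gloss").rstrip("."),
    )


if __name__ == "__main__":
    for sample in [
        "adirese 9, (Eng.) address of a letter, postal address.",
        "-adiresela, address a letter.",
        "-aka, 1 build, construct, edify, elevate, erect, as a hut.",
    ]:
        print(parse_line(sample))
```

In the -aka line, the 1 after the comma is a sense number and stays in the gloss; only a number before the first comma is read as a class.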


Is rollback available to users who are not administrators? Pkbwcgs (talk) 19:03, 18 February 2017 (UTC)

@Pkbwcgs: It is possible to create a rollbackers group but that doesn't exist here. If you want, w:WP:TWINKLE has a rollback feature. —Justin (koavf)TCM 19:13, 18 February 2017 (UTC)
@Koavf: I would like to use Twinkle but it is not available on Wiktionary. I tried to create the gadget on my own but it didn't work. See User:Pkbwcgs/Twinkle.js and User:Pkbwcgs/Twinkle warn.js. Pkbwcgs (talk) 19:17, 18 February 2017 (UTC)
@Pkbwcgs: Looks like we both learned something today (me twice!). At Special:ListGroupRights, you can see that we do in fact have a rollbacker group. —Justin (koavf)TCM 03:03, 19 February 2017 (UTC)
Not having ever had the ability to use all the sysop buttons, I've often wondered: what is the difference between using rollback and simply undoing an edit? Andrew Sheedy (talk) 03:13, 19 February 2017 (UTC)
Rollback is instantaneous. No opportunity to give an edit summary, just undo the edit as fast as possible. —suzukaze (tc) 03:14, 19 February 2017 (UTC)
Ah, OK, thanks. Andrew Sheedy (talk) 03:17, 19 February 2017 (UTC)
Rollback also marks the reverted edit as patrolled, so those of us who go through Recent changes with patrolled edits filtered out won't see them anymore. That's the main reason to be careful about who gets rollback: we lose the ability to correct mistaken reverts or to look for larger problems/patterns that the rollbacker might have missed in fixing the obvious immediate problem. Chuck Entz (talk) 04:43, 19 February 2017 (UTC)
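
To make the effect Chuck describes concrete, here is a minimal Python sketch that builds a MediaWiki API query for recent changes with patrolled edits filtered out, using the `rcshow=!patrolled` filter of `list=recentchanges`. The function name is hypothetical and the sketch only builds the URL; it fetches nothing. The API is expected to refuse the patrol filters for accounts without patrol rights.

```python
from urllib.parse import urlencode

# Assumed endpoint: English Wiktionary's api.php; other MediaWiki wikis
# expose the same module at their own api.php path.
API = "https://en.wiktionary.org/w/api.php"


def unpatrolled_changes_url(limit: int = 50) -> str:
    """Build a recentchanges query that hides edits already marked patrolled.

    Because rollback marks the reverted edit as patrolled, such an edit
    will not appear in the results of this query.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcshow": "!patrolled",                # only edits not yet patrolled
        "rcprop": "title|user|timestamp|ids",  # fields returned per change
        "rclimit": str(limit),
        "format": "json",
    }
    return API + "?" + urlencode(params)


if __name__ == "__main__":
    print(unpatrolled_changes_url(10))
```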

Placement of synonyms[edit]

In diff, synonyms were moved out of their Synonyms section and placed under the definitions. I don't like this and it does not match our long-standing practice. Can someone please point me to a relevant discussion? User:CodeCat may like to comment. --Dan Polansky (talk) 19:31, 18 February 2017 (UTC)

I think putting synonyms and antonyms under their respective senses is a good idea, especially for words with lots of meanings, like head. Keeping them in separate Synonyms and Antonyms sections makes them harder to find, makes it harder to find which sense they belong to, and makes it easier for them to be assigned to senses that are not actually listed. Separate sections are fine for words that have only one or two meanings, but for words like head with a very large number of senses and subsenses and subsubsenses, I think the new arrangement is an improvement. —Aɴɢʀ (talk) 19:57, 18 February 2017 (UTC)
I have noticed these edits too. I think they are structurally an improvement, but they do suck up vertical space in a way that is awkward for somebody who just wants to read the list of definitions. I suppose I would like a way to hide *nyms until explicitly choosing to view them. Equinox 20:00, 18 February 2017 (UTC)
Especially in long entries, it is important to be able to skim definitions, which these changes make harder to do. Incidentally, the head entry is now a horrible mess of inept subsensing; there is e.g. "Mental or emotional aptitude or skill" as a subsense of "The part of the body of an animal or human which contains the brain, mouth and main sense organs". --Dan Polansky (talk) 20:02, 18 February 2017 (UTC)
I'm not fond of subsensing either, but that's an entirely separate issue that has nothing whatever to do with the placement of synonyms. —Aɴɢʀ (talk) 20:27, 18 February 2017 (UTC)
What happened to the idea that was proposed last time we discussed this, to put them in a collapsible form, identical in structure to quotations? I oppose placing them under definitions, unless they display in that format, in which case I fully support having them under definitions. Maybe we should put it to a vote so it gets more attention? Most people seemed to be in favour of putting them directly under definition lines, IIRC. Andrew Sheedy (talk) 03:16, 19 February 2017 (UTC)
FWIW Simple English Wiktionary has collapsible functionality: simple:language. —suzukaze (tc) 03:59, 19 February 2017 (UTC)
Interesting. That's exactly what I'd like to see here. Andrew Sheedy (talk) 04:01, 19 February 2017 (UTC)
I'm very fond of the format of the Simple English Wiktionary as well. Wyang (talk) 05:12, 19 February 2017 (UTC)

onelook.com problems[edit]

While onelook.com is a useful site, obvious errors never seem to be fixed, and the people operating it seem entirely unresponsive to any reports submitted through their "Contact us" page. There has for a long time been a glaring problem with Wiktionary links whereby most words incorrectly link to a capitalised form that usually does not exist. Does anyone have any inside track for contacting them about this? Or perhaps someone else will have more luck with their "Contact us" than I ever have. Mihia (talk) 21:49, 20 February 2017 (UTC)

I think the problem should be apparent from the lack of advertising. I haven't been able to get them to correct glaring problems like dead links to dictionaries (not just to entries in dictionaries). We could produce our own equivalent, I suppose. We might even be able to include some sources that they don't/can't. DCDuring TALK 00:45, 21 February 2017 (UTC)
I don't understand what you're saying about capitalised links. A lot of useless spam-scum sites either steal material from Wiktionary, or link to us, in an attempt to get credibility, hits, or whatever those sick fucks want. I think they are just best ignored. Equinox 00:46, 21 February 2017 (UTC)
I think calling onelook spam scum is unwarranted. DTLHS (talk) 01:56, 21 February 2017 (UTC)
Quite disgraceful. Mihia (talk) 03:06, 21 February 2017 (UTC)
I initially reached out to OneLook back in 2006 to suggest they include Wiktionary, and Connel and I worked on generating the list of English terms which they originally used to link back to Wiktionary. Those lists are long since defunct, and I am not sure what they are using now. I still have the email addresses of the creator of the site and of the guy who was maintaining it; I can try to reach out to them. A lot happens in 10 years, so they may not even be involved any longer. Re motives, at least at that point they were all about making data more open, so I wouldn't put them in the same class as those who mirror content to try to generate ad revenue. - TheDaveRoss 13:49, 21 February 2017 (UTC)
I would expect that it would be better maintained if they did generate more ad revenue. I use the site often to see and show what the lemmings are doing, often adding it as an External link to our entry. DCDuring TALK 15:49, 21 February 2017 (UTC)
While OneLook indeed curiously links to Rubber from its rubber[9], it is generally a great service that I use when I want to compare the content of multiple dictionaries, which I do quite often. OneLook's Alexa rank[10] (~ 24,000) is worse than Wiktionary's[11] (626), so they don't really appear to steal readers from Wiktionary. I am glad the service exists. --Dan Polansky (talk) 11:28, 25 February 2017 (UTC)

Category:Antonomasias by language[edit]

I see we have Category:Genericized trademarks by language, which are basically a special case of antonomasia. Could we have the more general category, to treat cases like mentor and stentor, which come from Μέντωρ (Méntōr) and Στέντωρ (Sténtōr) respectively? --Barytonesis (talk) 08:39, 21 February 2017 (UTC)

That seems like a very unknown jargony term. I'd rather stick with something that people do understand. —CodeCat 14:23, 22 February 2017 (UTC)

New game[edit]

Hey. Anyone fancy another game of Wiktionary Scrabble? The last one was pretty epic, with a pretty table from Kenny. Granted, there was shambolic organisation, spelling, rulekeeping and general gamesmastership from WF. But it's been a while since we played a game, and Equinox is getting bored. I started a page at Wiktionary:Random Competition 2017. --Quadcont (talk) 18:45, 21 February 2017 (UTC)

I've been hoping for a game! Now I just need to figure out which language I can start learning that exploits these rules most effectively... —Μετάknowledgediscuss/deeds 00:54, 22 February 2017 (UTC)
I don't think there's been a game since I started editing here! Hopefully my weekend isn't too busy for me to play. (Oh good, this isn't scheduled to start till the end of the month) Andrew Sheedy (talk) 03:35, 22 February 2017 (UTC)
We start the game on Feb 24th, with the seven letters EWERSOA. --Quadcont (talk) 18:26, 22 February 2017 (UTC)
Does the first word have to go through the center square? DTLHS (talk) 18:31, 22 February 2017 (UTC)
Yes, it does. --Quadcont (talk) 18:43, 22 February 2017 (UTC)
  • I hope Scrabble doesn't sue us. --Quadcont (talk) 18:44, 22 February 2017 (UTC)
    I hope they do, because then they might merge Wiktionary with the Scrabble dictionary and it wouldn't be missing so many common words. - TheDaveRoss 19:07, 22 February 2017 (UTC)
  • We need more competitors! People who like this kind of thing: @Andrew Sheedy, DTLHS, Pingku, Wikitiki89, msh210, SemperBlotto, please join in! —Μετάknowledgediscuss/deeds 19:34, 25 February 2017 (UTC)
    • Already in (and thinking about next Christmas). SemperBlotto (talk) 20:00, 25 February 2017 (UTC)
      • ("sasowere" seems to be a Japanesy-type word (transliteration) but I can't quite get my head around it.) SemperBlotto (talk) 20:02, 25 February 2017 (UTC)
        You won't get anywhere with that, as /we/ is a disallowed syllable in modern Japanese. —Μετάknowledgediscuss/deeds 23:58, 25 February 2017 (UTC)
    I'll join in! I was away, so I couldn't join in till today. Andrew Sheedy (talk) 17:22, 26 February 2017 (UTC)

Another UtherPendrogn clone?[edit]

Llacheu (talk · contribs · global account info · deleted contribs · nuke · edit filter log · page moves · block · block log · active blocks). Can someone check? —CodeCat 19:25, 26 February 2017 (UTC)

THE ONLY VANDAL IS YOU! STOP! Llacheu (talk) 19:36, 26 February 2017 (UTC)

Thank you for the confirmation... :) As long as you act like Uther, you can't fool anyone. You only lasted as long as you did with the last sock because you managed to behave- up until you finally blew it. I pretty much knew it was you from the beginning, but decided to let you have a go at turning over a new leaf- which you almost did. I, for one, appreciate your trying. Chuck Entz (talk) 22:57, 26 February 2017 (UTC)
Late to the party, but confirmed. - TheDaveRoss 15:49, 27 February 2017 (UTC)

Syntax of Template:label/example[edit]

@Erutuon You left a change message saying "odd syntax, but makes the code much shorter". I don't think making for shorter Lua code is necessarily a good reason for using an odd syntax. I'm not opposed to using en: to specify a language code instead of a separate parameter, because it may be easier to read, but having the examples separated by a semicolon instead of as separate params seems a bit dubious. In reality I don't think it would actually make the Lua code longer to iterate over separate arguments instead of splitting on semicolon. Benwing2 (talk) 19:57, 26 February 2017 (UTC)

@Benwing2: I used the odd syntax because I am not sure how else to make the template work when there are multiple examples in a row. How would the module determine where one example ends and another begins? It's not shorter Lua code; rather, shorter template code. — Eru·tuon 20:03, 26 February 2017 (UTC)
@Erutuon Can you explain further? It seems to me you could do various things:
{{label/example|en:foobar, _, bazbip;
        en:foobar, _, bazbip, slang;
        en:foobar, or, bazbip;
        en:foobar, and, bazbip;
        en:foobar, and, bazbip, or, Australia;
        en:Australia, or, foobar}}

{{label/example|en:foobar, _, bazbip|
        en:foobar, _, bazbip, slang|
        en:foobar, or, bazbip|
        en:foobar, and, bazbip|
        en:foobar, and, bazbip, or, Australia|
        en:Australia, or, foobar}}



Using a blank argument to separate sets of template arguments is fairly standard. In this case you'd probably have to strip whitespace off of the arguments, since this isn't done by default for unnamed args. Benwing2 (talk) 20:14, 26 February 2017 (UTC)
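
The two argument layouts discussed here can be compared with a minimal Python sketch. It is not the actual module (which is in Lua), and both function names are hypothetical: one splits a single string on semicolons and commas, as in the first example above; the other walks the positional arguments and starts a new set at each blank argument, stripping whitespace first because unnamed template arguments keep it.

```python
from typing import List


def split_semicolon_syntax(text: str) -> List[List[str]]:
    """One string: examples separated by ';', arguments by ','."""
    groups = []
    for chunk in text.split(";"):
        args = [a.strip() for a in chunk.split(",")]
        if any(args):  # skip the empty chunk left by a trailing ';'
            groups.append(args)
    return groups


def split_on_blank_args(args: List[str]) -> List[List[str]]:
    """Positional args, with a blank arg ending each set of arguments.

    Unnamed template args are not whitespace-stripped, so strip here.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for raw in args:
        arg = raw.strip()
        if arg == "":
            if current:
                groups.append(current)
            current = []
        else:
            current.append(arg)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    print(split_semicolon_syntax("en:foobar, _, bazbip;\n en:Australia, or, foobar"))
    print(split_on_blank_args(["en", "foobar", "or", "bazbip", " ", "en", "Australia"]))
```

Both produce the same list-of-lists shape, so whichever syntax is chosen, the rest of the module would not need to change.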

Well, okay, I guess there are a number of ways to do it. Actually, I came up with the method of indicating the parameters with punctuation at some point, and simply never changed my ways. I would change my method if there is a more generalizable method, one that can be made into a general template examples module. — Eru·tuon 20:40, 26 February 2017 (UTC)
Question: Why not keep the old template-based syntax? Adding an entire new function and worrying about its syntax seems like overkill for a static demonstration of the template. —suzukaze (tc) 00:36, 27 February 2017 (UTC)
Because I was adding a few more examples and I don't like typing out {{temp|label}} and then {{label}} with the same parameters, and adding the arrow between. It seems simpler and more reliable to have a module do it (though I admit the function takes quite a while to write). — Eru·tuon 01:10, 27 February 2017 (UTC)
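
The point about not typing the same parameters twice can be illustrated with a minimal Python sketch (hypothetical function name; not the actual Lua module): the arguments are written once, and both the displayed {{temp}} code and the live call are generated from them, joined by an arrow. The sketch only builds wikitext; in a real Scribunto module the second half would have to be expanded, e.g. with `frame:expandTemplate`, because template calls in text returned by a module are not expanded.

```python
from typing import List


def demo_line(template: str, args: List[str]) -> str:
    """Return '{{temp|NAME|ARGS}} → {{NAME|ARGS}}' for one set of args.

    Generating both halves from one argument list keeps the displayed
    code and the actual call from drifting apart.
    """
    joined = "|".join([template] + args)
    return "{{temp|" + joined + "}} → {{" + joined + "}}"


if __name__ == "__main__":
    print(demo_line("label", ["en", "foobar", "_", "bazbip"]))
```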

User:Cl adding incorrect Dutch pronunciations[edit]

This user has been butting heads with me and User:Lingo Bingo Dingo over their edits to Dutch pronunciations. They've introduced nonphonemic features into phonemic transcriptions, as in diff, diff, diff, diff, diff. When this was pointed out, they removed the phonemic transcription altogether and replaced it with a phonetic one without specifying which varieties of Dutch it applies to: diff, diff, diff, diff (these pronunciations are not universal). They also removed one of the two possible pronunciations in diff. When reverted, they started an edit war. Can this please be sorted? —CodeCat 19:13, 27 February 2017 (UTC)

I refer to reliable sources when making changes, yours are none. Cl (talk) 19:15, 27 February 2017 (UTC)
A dictionary is not a valid source, I've already pointed this out to you. —CodeCat 19:16, 27 February 2017 (UTC)
A dictionary can't but be a reliable source. Whatever you say isn't (unless supported by a reliable source). Cl (talk) 19:23, 27 February 2017 (UTC)
What is "reliable"? And no, a dictionary is not a valid source for Wiktionary. Have you read WT:WFW yet? —CodeCat 19:24, 27 February 2017 (UTC)
Where exactly is it stipulated that dictionaries aren't a valid source for Wiktionary? This text, e.g., implies they are: [[12]]. It'd be highly illogical if they weren't accepted as reliable sources. Cl (talk) 19:30, 27 February 2017 (UTC)
Those are references, not sources. There's a big difference. Also, it's common sense: a dictionary can't use another dictionary as a source. And again, what is a "reliable" source, and what makes your dictionary one? —CodeCat 19:33, 27 February 2017 (UTC)
Wiktionary isn't a dictionary by itself, but a secondary source. And of course existing dictionaries should be checked as reliable sources, should some question arise. Why can't a dictionary be a reliable source (of a phonetic transcription, in this case) if Wiktionary itself demonstrates that they are? Cl (talk) 19:45, 27 February 2017 (UTC)
Wiktionary is a dictionary and says so in its logo. Professionally published dictionaries are not the work of gods and we do not trust them unquestioningly. Equinox 19:51, 27 February 2017 (UTC)
You should have the official Wikipedia reliable source policy changed, if you adhere to a different point of view. I don't think you're going to succeed, though, because that'd result in chaos. Cl (talk) 19:54, 27 February 2017 (UTC)
Wiktionary does not follow Wikipedia policies, so their definition of "reliable source" is completely irrelevant to us and always has been. —CodeCat 19:56, 27 February 2017 (UTC)
One still needs some kind of verification, and yours has been null so far. Cl (talk) 19:58, 27 February 2017 (UTC)
Well then, what's the procedure for verifying IPA? —CodeCat 20:03, 27 February 2017 (UTC)
Pronunciation dictionaries and works on the Dutch phonology, in my opinion. Should you have a better idea, I'm ready to listen. Cl (talk) 20:07, 27 February 2017 (UTC)
It was a question asked to everyone else too. I genuinely don't know, so some other experienced editors should clarify hopefully. —CodeCat 20:10, 27 February 2017 (UTC)

When it comes to pronunciation info, clearly our usual criteria of use in a permanently archived source by multiple independent authors spanning the course of more than a year cannot apply. When I add pronunciation info for English, German, and Burmese, I always rely on information from other dictionaries; when I add it for Irish, I rely on published phonetic descriptions of specific dialects combined with my own knowledge of the phonology of the language. For Welsh and Lower Sorbian, I rely on the spelling-to-pronunciation rules described in dictionaries as well as phonological descriptions of the language. And even for English and German, I still often come into conflict with other users who have different opinions on how best to transcribe these languages, even if we don't actually disagree on how a given word is pronounced. In general, phonemic transcriptions are more important here than narrow phonetic ones, but there's no reason not to include both. Is there any objection to {{IPA|lang=nl|/kroːˈaː.(t)si.jə/|[kroːˈwaː.(t)si.jə]}}? —Aɴɢʀ (talk) 20:40, 27 February 2017 (UTC)