Wiktionary:Beer parlour/2017/July

Change "proscribed" to "considered incorrect"

Reading through the discussion at Wiktionary:Tea room#.22alot.22_is_NOT_correct_English. finally gave me the resolve to propose something that has bothered me for a long time.

Let's change the "proscribed" label to "considered incorrect".

  • Proscribed is not a common word. I consider myself quite well-read, and I had never come across this word before encountering it on Wiktionary. An Ngram shows more usage of the word than I would have expected, but when you look at Google Books results since 1970, you see that its use is confined to academic and technical works, such as journal articles, textbooks and legislation. It appears in very few works targeted at a general audience, which is surely the audience we are targeting here at Wiktionary.
  • "Considered incorrect" would be better than "proscribed" in that it does not give the suggestion that we are the ones prescriptively proscribing the word - we are simply noting that many, if not all, sources consider the term incorrect.
  • It looks extremely similar to a word with essentially the opposite meaning. I couldn't think of many pairs of differently-spelt English words that look any more similar when written down in lowercase (other than M/RN pairs like "bum" and "burn" perhaps).
  • The information that this label conveys is especially important (I'd even say essential) for English learners and non-native speakers, but because it is conveyed using a word that they most probably do not know, it is going to be lost on them.

Sure, the definition is only a click away at the glossary. But why should we make people learn an extra word to be able to use our dictionary properly? It's silly. Let's do away with it.

I'm inclined to propose a vote along the lines of "changing the display of {{label|en|proscribed}} to (considered incorrect)". This, that and the other (talk) 06:15, 1 July 2017 (UTC)

I am persuaded by your reasoning here. I would support the wording considered incorrect. I wonder what @Dan Polansky would think of this proposal, given that he would prefer us not to use the proscribed label. — Eru·tuon 18:34, 1 July 2017 (UTC)
I seem to prefer often deemed incorrect; "considered" is okay but "deemed" is shorter. The addition of "often" reinforces the idea that the deeming is not done by Wiktionary editors. See also Wiktionary talk:Votes/2016-10/Removing label proscribed from entries#Other label name. --Dan Polansky (talk) 19:02, 1 July 2017 (UTC)
I’d rather we kept a distinction between “considered incorrect by language authorities” (= proscribed) and “considered incorrect by speakers in general” (= nonstandard). — Ungoliant (falai) 18:47, 1 July 2017 (UTC)
I agree the distinction should be kept, though I'm sympathetic to the point that users may not know the word and hence the info may be lost on them. It's hard to find a label that keeps the distinction and is concise and able to be put into all entries that use "proscribed". "Considered incorrect by authorities" or "...by some authorities" is wrong if only one (but e.g., the official or dominant/influential and notable) language authority proscribes the term, e.g. the Académie française, the Duden, maybe the OED, and "some" is wrong if most or all authorities proscribe the term.
For similar reasons, "often" should not be included in the text that is automatically displayed: not all terms are "often" considered incorrect by authorities: some may be considered incorrect by all authorities (this seems especially likely if a language has one or more central authorities), others may only be proscribed by some authorities while other authorities approve of them, in which case we use "sometimes proscribed", which would become "sometimes often considered..." or "sometimes often deemed...". And probably the most frequent occurrence is that one or more authorities proscribe a term and others don't mention it, which makes it debatable whether it is "often" considered incorrect.
(Ultimately, using "proscribed" and linking to the glossary like we do may be the best option, despite its drawbacks.)
An idea based on the name of the category which "proscribed" currently categorizes into is "authoritatively disputed" or "authoritatively deemed incorrect", but I don't like the sound of either of those; "authoritatively" seems liable to be misunderstood.
- -sche (discuss) 18:17, 2 July 2017 (UTC)
I would support changing "proscribed" to "considered incorrect", but I also agree with Ungoliant that it's useful to distinguish between whether something is only "officially" considered incorrect, or whether most speakers would think it a mistake. Ultimately, I think the ideal is to put that information in a usage note, which allows for further elaboration. One can't learn the subtleties of a word's usage from a label. Andrew Sheedy (talk) 19:47, 3 July 2017 (UTC)
I strongly support the use of the more common words, despite the greater length. DCDuring (talk) 02:39, 4 July 2017 (UTC)
"Official" incorrectness vs. "popular" incorrectness is not in fact a binary distinction: words can even be incorrect in some registers while being preferred in others. Don't we have the ====Usage notes==== section for this kind of detail? --Tropylium (talk) 15:17, 23 July 2017 (UTC)

@DCDuring, Andrew Sheedy, -sche, Ungoliant MMDCCLXIV, Dan Polansky, Erutuon I've created a vote at Wiktionary:Votes/2017-07/Changing the wording of the "proscribed" label. The discussion at the talk page may interest you. This, that and the other (talk) 10:16, 9 July 2017 (UTC)

Vote -- Requests for documentation

Based on Wiktionary:Tea room/2017/June#"the Variety -er", I created Wiktionary:Votes/2017-06/Requests for documentation. --Daniel Carrero (talk) 10:48, 1 July 2017 (UTC)

July Lexisession: flight

Is it a spin?

This month's suggested collective task is to collect words about flight. In the category of Wikisaurus about travel and movement, there is nothing about motion in the air, and it is the same in French Wiktionary, so it seems like a good topic for this month - it could soar!

Yay! let's do a barrel roll!

By the way, Lexisession is a collaborative experiment without any guide or direction. You're free to participate as you like and to suggest next month's topic. If you do something this month, please let us know here or on Meta, so that people know that English Wiktionarians are doing something on this topic. I hope some people will be interested in reaching the altitudes! Noé 11:13, 1 July 2017 (UTC)

I spruced up the Spanish entries volar and volador a little bit. That's my good deed of the month done, then. Also added an Asturian entry - vuelu. --Recónditos (talk) 11:13, 8 July 2017 (UTC)
Great! Thank you! ¡Muchas gracias! I updated the Meta page to display a shorter version of the past editions. Contributions are not mentioned there if people did not ping me or write a note on the beer parlour, so feel free to let me know or to enhance the Meta page. LexiSession will soon be a year old, and it's time to look back and make some improvements to the formula. Noé 15:59, 3 July 2017 (UTC)
Cleaning up उड़ना (uṛnā). —Aryaman (मुझसे बात करो) 16:43, 4 July 2017 (UTC)

Category:Coinages by language (tentative name)

I'd like to have a category for words which are known to have been coined by a specific person (example: evolutionarily stable strategy). There is Category:Neologisms by language, but I don't think all neologisms have necessarily a well-defined author. --Barytonesis (talk) 13:46, 2 July 2017 (UTC)

Every word is a neologism and a coinage, so I think neither category should exist. —CodeCat 14:52, 2 July 2017 (UTC)
Only a few words have a clear author, so a coinage category may be justified. — Dakdada 15:57, 3 July 2017 (UTC)
So "coinages by known individuals"? (Or named individuals; or groups; or...) Equinox 16:09, 5 July 2017 (UTC)

Proposal: automatically link all links without a section to the English section

There have been a lot of efforts in recent times to make sure that non-English terms are wrapped in a template that tags them as such and adjusts the link target appropriately. Thus, I think it makes sense if all links, by default, link to English. This should make it easier for definition writers, because they can link words in a definition without worrying about where that link goes. The template {{def}} was created to alleviate this issue, and people have been adding {{l|en}} to definitions as well, which is even worse. Moreover, a global solution would affect links in etymologies and in other places too.

This proposal of course only affects links to entries in the main namespace. It's also explicitly meant to be applied only in places where English text is expected, so it wouldn't be used in lists such as Derived Terms. Those would still use {{l|en}} to tag them, as before. —CodeCat 17:21, 2 July 2017 (UTC)

What are you proposing? Some javascript to automatically make links point to #English? DTLHS (talk) 17:31, 2 July 2017 (UTC)
I think so. Unless there's another way. —CodeCat 17:47, 2 July 2017 (UTC)
I'm unsure. How expensive is JS that only tags links in certain sections (for example, you seem to suggest not applying it to "Derived terms")? What is the actual benefit, given that English is already the top section (where a user lands) on almost all pages where an English section is present? How does that benefit compare to the drawback that many bare wikilinks that are not to English terms will be mislabelled? (For example, users sometimes use simple wikilinks to link to German or Russian words if they're long enough that the users think it's unlikely there'll ever be any other language section on that page.) - -sche (discuss) 18:31, 2 July 2017 (UTC)
People are currently using {{l|en}} in definitions, so that suggests that those people find a need for such section links. TabbedLanguages links to the last-used language section whenever a link has no section, which ends up always going to the wrong section when a link is in a definition, etymology, or anywhere else that has running English text. Perhaps only the behaviour of TabbedLanguages should be changed. —CodeCat 18:48, 2 July 2017 (UTC)
Support. — Ungoliant (falai) 18:34, 2 July 2017 (UTC)
Might be OK, but not using expensive JS. DCDuring (talk) 19:05, 2 July 2017 (UTC)
Tentative support I can imagine that there may be some mul use cases but I agree that they are probably going to be English definitions. If JavaScript seems like too much of a headache, just have a bot do it--that way it works for users with scripts disabled. —Justin (koavf)TCM 19:32, 2 July 2017 (UTC)
This seems like a solution without a problem to me; English is already where links go, since they land on the top of the page and English is the first language section. In the cases where Translingual precedes English that is likely the desired solution anyway. - [The]DaveRoss 12:06, 3 July 2017 (UTC)
Again, if that's the case, why do people use {{l|en}} in definitions? —CodeCat 12:28, 3 July 2017 (UTC)
You will have to ask them, but this proposal does not prevent anyone from using {{l|en}} incorrectly. - [The]DaveRoss 12:40, 3 July 2017 (UTC)
True, but I figured if they thought it was necessary, then I'd rather solve it in this way than by having {{l|en}} in definitions. Do you think we should disallow putting {{l|en}} in definitions? —CodeCat 13:23, 3 July 2017 (UTC)
@TheDaveRoss: There are a number of reasons why I (and others) often (not always) use {{l}} rather than bare links in definitions, most of which are already mentioned elsewhere in this thread: (1) if the English word is spelled the same as the foreign word being glossed (e.g. French correct), then a bare link won't provide a link at all, but will merely write the word in bold; (2) sometimes Translingual, not English, is the top entry on the page; (3) in Tabbed Browsing, following a link without an explicit language marking takes you to the same language you were just looking at if it's there, rather than the top entry (e.g. if you're at French corriger and click on a bare link to [[correct]], you will be taken to correct#French, not correct#English). —Aɴɢʀ (talk) 21:57, 3 July 2017 (UTC)
That is fair, I am making no judgment about whether or not it is acceptable to use {{l|en}} in definition lines. If that is a problem then I think there are other possible solutions that don't involve creating a pervasive new scripted process. It is also possible to achieve the same result in the limited cases where it is necessary using standard wiki-markup, e.g. [[correct#English|correct]]. This can even be enforced by bots since it is a very regular situation. As far as the Tabbed Browsing issue, I don't use the feature so I can't speak much about that, but it seems like a bug in Tabbed Browsing which we should not fix by changing the default behavior of the site. - [The]DaveRoss 12:27, 5 July 2017 (UTC)
@CodeCat I use {{l|en}} in FL definitions for words that share a page with the English translation. For example, the French entry for correct includes a link to the English section for the word so that the reader does not have to scroll up, past the Dutch section and the second half of the English section, in order to see the word. This is especially useful for obscure words that have a more full definition in the English section, and/or are several languages down the page. Is that what you're talking about? Andrew Sheedy (talk) 19:56, 3 July 2017 (UTC)
Hmm, this debate will probably never end :). I think it would be preferable to use the same (explicit, unambiguous & extensible) mechanism to link to other entries, regardless of the target language. “English is at the top of the page” means relying on an implementation detail of the current wiki presentation. Fixing it on the client-side with Javascript isn't exactly a good solution. But those [[square bracket]]s are just too popular... – Jberkel (talk) 22:35, 3 July 2017 (UTC)
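
For illustration, the bot-side alternative mentioned in this thread (rewriting a bare [[correct]] into [[correct#English|correct]]) could be sketched roughly as follows. This is only a minimal sketch in Python, not an existing bot: the function name add_english_section is hypothetical, and it assumes that any link whose target already contains "#" or ":" should be left untouched. Deciding which lines are definition lines is left out entirely.

```python
import re

# Matches [[target]] or [[target|label]]. Targets containing '#' or ':' are
# skipped on purpose: they already name a section or point outside the main
# namespace (assumption made for this sketch).
BARE_LINK = re.compile(r"\[\[([^\[\]|#:]+)(?:\|([^\[\]|]+))?\]\]")


def add_english_section(wikitext):
    """Rewrite bare main-namespace links so they target the English section."""
    def repl(match):
        target = match.group(1)
        label = match.group(2) or target  # keep an existing label if present
        return f"[[{target}#English|{label}]]"
    return BARE_LINK.sub(repl, wikitext)


if __name__ == "__main__":
    print(add_english_section("# to [[correct]] something"))
    print(add_english_section("# see [[correct#French|correct]]"))
```

Links that already carry a section, such as [[correct#French|correct]], pass through unchanged, so running the rewrite twice gives the same result as running it once.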

Deleting template def

FYI, consistent with Wiktionary:Votes/2016-07/Placing English definitions in def template or similar, I proposed to delete {{def}} at WT:RFDO#Template:def. --Dan Polansky (talk) 20:48, 2 July 2017 (UTC)

Changing auto-generated categories at bottom of page

Hey all, I've been searching the Help pages and haven't found an answer to this. How do I edit the categories at the bottom of a page when they are apparently generated automatically? In particular, overstudious is listed as a 4-syllable word when it actually has 5 syllables. How do I correct this? Thanks for any help. BirdHopper (talk) 21:47, 3 July 2017 (UTC)

@BirdHopper: These are made by templates. In this case, it is {{IPA}}. "Oh-ver" is two and "stood-yuz" is two more, so it generates Category:English 4-syllable words. You may be thinking that it's "oh-ver-stood-ee-yuz" which is five. Since words can be pronounced different ways, it can be in both Category:English 4-syllable words and Category:English 5-syllable words but I don't know that this template has the option to add it to two categories at once presently. —Justin (koavf)TCM 21:51, 3 July 2017 (UTC)
Very interesting! I suppose I can understand how it could be pronounced with 4 syllables. From a GenAm standpoint, the 4-syllable variant is rare, which is probably why I didn't even consider there could be an alternative. I'll just let the issue go, then. Thanks for the insight! :) BirdHopper (talk) 22:06, 3 July 2017 (UTC)
@Koavf: Oh, the IPA template does it! Okay. I just added some syllable breaks (dots) to overstudious and it picked it up as having 5 syllables rather than 4. I don't mean to impose my limited experience of the world on everyone else, but the transcriptions, as written, do have 5 syllables. Now I'm curious what would happen if someone entered a 4-syllable version. I think I'll leave it as-is for now, but it's cool to know more about how that system works! :D BirdHopper (talk) 22:31, 3 July 2017 (UTC)
@BirdHopper: I don't have an example off-hand but I know that some entries have multiple instances of {{IPA}} and are in multiple categories because of it. If you put in both, it will be in both--again, even the same word can be pronounced differently and so will have different IPA transcriptions. Rather than replace the one, maybe have both? Actually, it's probably just the one that's correct. Saying it out loud seems wrong. —Justin (koavf)TCM 23:08, 3 July 2017 (UTC)
More detail: syllable counts are done by Module:syllables. It has a list of English diphthongs, and /iə/ is on that list, because New Zealand has /iə/ as a diphthong in words like here. So, to make the syllable-counting function understand that it's not a diphthong, you have to add syllable breaks. (This would be simpler if {{IPA}} were told what accent the transcription represented, and used that to determine which list of diphthongs to use.) I went through a lot of entries with /iə/ using AutoWikiBrowser and added syllable breaks a while back; I guess I missed this word. (Oh, I see the pronunciation was added recently.) — Eru·tuon 23:40, 4 July 2017 (UTC)
@Erutuon: Thanks for the extra information. Very interesting. I agree that an accent specification would be useful in these cases. For now, I'll have to be a little more aware of diphthongs in other accents and add syllable breaks if necessary. BirdHopper (talk) 16:46, 5 July 2017 (UTC)
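
The diphthong behaviour described above can be illustrated with a rough sketch. This is not the code of Module:syllables (which is a Lua module); it is a simplified Python approximation in which the vowel set, the diphthong list, and the transcriptions are all assumptions chosen for the example. It counts each vowel as one syllable nucleus, treats a listed diphthong as a single nucleus, and lets a syllable break (a dot) split the two vowels.

```python
# Simplified, assumed inventories; the real module's lists may differ.
VOWELS = set("aeiouæɑɒɔəɛɜɪʊʌɚɝ")
DIPHTHONGS = {"aɪ", "aʊ", "eɪ", "oʊ", "əʊ", "ɔɪ", "iə", "ɪə", "eə", "ʊə"}


def count_syllables(ipa):
    """Count vowel nuclei, treating listed diphthongs as one nucleus."""
    count = 0
    i = 0
    while i < len(ipa):
        if ipa[i] in VOWELS:
            count += 1
            # Two adjacent vowels that form a listed diphthong share a nucleus,
            # so skip the second one. A '.' between them prevents the match.
            if ipa[i:i + 2] in DIPHTHONGS:
                i += 2
                continue
        i += 1
    return count


if __name__ == "__main__":
    # Illustrative transcriptions only, not copied from the entry.
    print(count_syllables("ˌoʊvɚˈstudiəs"))   # /iə/ read as a diphthong
    print(count_syllables("ˌoʊvɚˈstudi.əs"))  # break forces two nuclei
```

With these assumed lists, the first transcription counts as 4 syllables and the second as 5, which matches the change in category seen after adding the dots.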

"...th most common surname in the United States in 2010" (Xin)

Why do we want this information? Wyang (talk) 22:23, 4 July 2017 (UTC)

That seems wildly specific and virtually impossible to maintain. It's also probably not something that someone is looking for when looking at this word/phrase/term/entry. Unlike--e.g.--Nguyen, which is notably wildly popular in Viet Nam and is worth mentioning for context. —Justin (koavf)TCM 00:03, 5 July 2017 (UTC)
It seemed to me that if we were going to include surnames we ought to try to include some information about those surnames, such as how common they were and in what demographics. If someone has a decent dataset for demographic information outside of the United States I think they should feel free to add that as well, I do not have that information. As far as maintaining it, the US Government publishes the data in a machine-readable format every ten years with the census, it is fairly trivial to update it. - [The]DaveRoss 12:15, 5 July 2017 (UTC)
I think this is the same situation as the similarly problematic template {{en-rank}}: this is not dictionary material. As the entry itself demonstrates, it is actually composed of multiple etymologies, and it would be much more useful if the surname template could be modified to say A surname of Chinese origin. Statistics showing how many people in the United States bear these surnames (and what ethnicities they are) are inconsequential in a dictionary. Wyang (talk) 12:47, 5 July 2017 (UTC)
What is or is not dictionary material is obviously subjective, and if the consensus is that frequency and demographic information about names is not worth including then I will, of course, defer to that consensus. The other conversations I have had about including these things have been positive.
Re "of Chinese origin," that information should, hopefully, be represented in the Etymology section, but it might not hurt to have it echoed concisely in the "definition" line. - [The]DaveRoss 13:08, 5 July 2017 (UTC)
Comparing it to en-rank supports this; what words are core language words and what words aren't is certainly dictionary material. I'm not sure how I feel about this; it's specialized dictionary material, which tends to move it to the edge of what we cover, but at the same time tends to say it's not clearly over the edge.--Prosfilaes (talk) 06:02, 8 July 2017 (UTC)

Alternative forms & quotations

Should a quotation of an alternative form/spelling be placed on the main lemma page, or on the form page? (e.g. should quotations of the term huomo – obsolete spelling of uomo – be placed on the former or the latter's page?) – GianWiki (talk) 00:18, 5 July 2017 (UTC)

I tend to decide this on a case-by-case basis. In this case, we're dealing with an obsolete spelling of an extremely common word, so I would add citations to huomo, because what's being attested is the specific spelling with an h, not the existence of the word uomo itself. But for rare words that are attested in multiple spellings, I'd put the citations all together in a single entry, so the reader can see that the word definitely exists but is spelled in a variety of ways. —Aɴɢʀ (talk) 10:27, 5 July 2017 (UTC)
Isn't this part of the reason we have a citations namespace? bd2412 T 13:14, 5 July 2017 (UTC)
I agree. They should be placed there. —CodeCat 13:43, 5 July 2017 (UTC)
If a spelling is RFVed (for example, if someone disputes that huomo exists as an alternative spelling of uomo), citations of it must be put in its entry—or less often on a citations page to which it then links—to prove it meets CFI. If a spelling is rare, some people do this pre-emptively. Everything else tends to be subjective / less agreed upon.
Some people add the earliest uses of English words to the lemma entries, even if the citations use other spellings—sometimes even if the citations are other languages, like Middle English (I see this even with Chaucerian examples that aren't the earliest) or Old English (few editors do this; it seems nonstandard/removable). Some people might put famous uses of words in any spelling on the lemma entries, too. Sometimes citations of one {{standard spelling of}} something are put on the entry for the standard spelling that has had content centralized on it.
But in general I would put citations on the Citations: page for the spelling they use (linked to and from the lemma's citations page via {{also}}) or on the lemma's citations page. - -sche (discuss) 15:31, 5 July 2017 (UTC)

Join the strategy discussion. How do our communities and content stay relevant in a changing world?

Hi!

I'm a Polish Wikipedian currently working for WMF. My task is to ensure that various online communities are aware of the movement-wide strategy discussion, and to facilitate and summarize your talk. Now, I’d like to invite you to Cycle 3 of the discussion.

Between March and May, members of many communities shared their opinions on what they want the Wikimedia movement to build or achieve. (The report written after Cycle 1 is here, and a similar report after Cycle 2 will be available soon.) At the same time, designated people did research outside of our movement. They:

  • talked with more than 150 experts and partners from technology, knowledge, education, media, entrepreneurs, and other sectors,
  • researched potential readers and experts in places where Wikimedia projects are not well known or used,
  • researched by age group in places where Wikimedia projects are well known and used.

Now, the research conclusions are published, and Cycle 3 has begun. Our task is to discuss the identified challenges and think how we want to change or align to changes happening around us. Each week, a new challenge will be posted. The discussions will take place until the end of July. The first challenge is: How do our communities and content stay relevant in a changing world?

All of you are invited! If you want to ask a question, ping me please. You might also take a look at our FAQ (recently changed and updated).

Thanks! SGrabarczuk (WMF) (talk) 14:50, 5 July 2017 (UTC)

Well documented languages and Tagalog

Can someone please add Tagalog again to Wiktionary:Criteria for inclusion/Well documented languages? It was removed without a proper process. From reading Wiktionary:Criteria for inclusion/Well documented languages, the minimum would be a discussion in Beer parlour, whereas the removal was indicated to be driven by a RFV discussion as indicated in diff.

I do realize some think this is too formal. But as Wiktionary:Votes/pl-2017-05/Modern Latin as a WDL 2 shows, what some think to be consensus often turns out to be something else when a proper discussion or vote is created. --Dan Polansky (talk) 15:49, 5 July 2017 (UTC)

Etymology before Pronunciation

Hello again! As I've been adding audio, I've noticed a few pages where the Etymology section is placed after the Pronunciation section, as in gadgetry. This goes against Wiktionary:Entry_layout#List_of_headings. I know that entry layout is flexible, but I personally prefer consistency so I'm tempted to "fix" these issues. Can I assume that Wiktionary:Entry_layout is up-to-date and reflects current consensus regarding layout? I recall someone's user page (I don't remember who) that mentioned that the Entry Layout page needs updating. I'm always hesitant to start making edits when a set of guidelines might not be current.

There are other cases where I've seen Etymology after Pronunciation, as in chess. Here, it makes sense to have Pronunciation first as it is common to both etymologies. Just saying that, because I know there are always exceptions to the rules. However, even in this case, there is a guideline at Wiktionary:Entry_layout#Etymology where, again, pronunciation comes after/below etymology. In the case of chess, one would have to duplicate the pronunciation section.

I know these are just guidelines, and nothing is black and white. I'm just looking for some other opinions, or maybe a pointer to discussion about layout that I'm not aware of yet, before I start hacking away. Thanks! —This unsigned comment was added by BirdHopper (talk • contribs).

Yes you can change the layout. I wouldn't go out of your way to fix thousands of pages by hand since this can be fixed automatically if anyone cares to do so. DTLHS (talk) 18:00, 5 July 2017 (UTC)
Okay. And thanks BTW for adding a signature for me. That's the second time I've done that in as many days. Oops. BirdHopper (talk) 18:25, 5 July 2017 (UTC)
I always put pronunciation before etymology. That way it's consistent if there is one word with multiple etymologies. —CodeCat 18:57, 5 July 2017 (UTC)
@CodeCat: But one word can have the same etymology and two different pronunciations--e.g. perfect (purr-fict and pur-fekt). —Justin (koavf)TCM 20:15, 5 July 2017 (UTC)
There's two etymology sections on that page. Also, the "tense" noun is missing an etymology. —CodeCat 20:44, 5 July 2017 (UTC)
Just because there are two sections, doesn't mean there should be. BigDom 06:58, 11 July 2017 (UTC)
There should be as many sections as there are etymologies, of course. —CodeCat 12:08, 13 July 2017 (UTC)

Inline referencing definitions in English entries

I think that, in general, we should not be inline referencing English definitions of English words. Not using references has largely been our practice. We use attesting quotations, not references; for English words, references carry no weight as per WT:ATTEST.

I have removed an inline reference in abbate but was reverted. What do you think? --Dan Polansky (talk) 12:35, 6 July 2017 (UTC)

I agree that we do not (with the exception of what a few newcomers have done) and should not add <ref>s to definitions, at least not as references for the definitions. The definitions need to be based on how the terms are used, as indicated by citations, as you say. I agree with your edit to abbate (although ideally inline refs like that should be moved to "Further reading"). I have sometimes seen users add references to {{defdate}}s; that might be OK. I have also seen references added to context labels like "proscribed" and "offensive", but in those cases I think it is better to leave the label bare (unreferenced) and add the references to a usage note. - -sche (discuss) 21:05, 7 July 2017 (UTC)
I don't like references to {{defdate}}, but AFAIK there is no consensus for removing them, or else I'd remove them as well. This was a reference to the definition itself. I think a further reading item pointing to offline The Shorter Oxford is pretty useless for our readers, and I would prefer not to have it there, but let it be now. The presentation of the reference is from a horror dream: "“abbate” in Lesley Brown, editor-in-chief; William R. Trumble and Angus Stevenson, editors, The Shorter Oxford English Dictionary on Historical Principles, 5th edition, Oxford; New York, N.Y.: Oxford University Press, 2002, ISBN 978-0-19-860457-0, page 3." It's a winner in a competition about how to make a reference specification as long as possible while providing close to nothing of value to the reader. --Dan Polansky (talk) 21:11, 7 July 2017 (UTC)

TabbedLanguages edit: default to English for unmarked links

TabbedLanguages currently sends you to the last-visited section, whenever you click a link that doesn't include a language section. I propose that this be changed so that it sends you to English by default, or if there is no English, to Translingual, and if there's no Translingual either, then to the last-visited section. Thanks to the efforts of various editors to add {{l}} and such to unmarked non-English links, and Daniel's work to fix all instances of {{term}} missing a language, most links to non-English terms are appropriately tagged. Thus, by far the most unmarked links in any non-English section are for English words; sending the user to English is only very rarely wrong, and when it is, it's always a result of a non-English term that has not yet been appropriately tagged. —CodeCat 14:12, 6 July 2017 (UTC)

Makes sense. --Dan Polansky (talk) 14:19, 6 July 2017 (UTC)
I agree that this should be fixed, my only comment is that perhaps Translingual should be the priority. Not a big deal since there aren't that many pages with both. - [The]DaveRoss 14:21, 6 July 2017 (UTC)
On a page such as hotel, it would be very undesirable for the link to go to Translingual by default. —CodeCat 15:32, 6 July 2017 (UTC)
I agree it would be better for plain-linked [[hotel]] to take you to hotel#English, not whatever language you were last reading, nor hotel#Translingual, nor the top of the page (which will take a non-logged-in user to the table of contents only). Doing this would obviate the need for the unpopular {{def}}. —Aɴɢʀ (talk) 17:11, 6 July 2017 (UTC)
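
The fallback order proposed at the top of this thread (English first, then Translingual, then the last-visited section) can be written down as a small sketch. TabbedLanguages itself is a JavaScript gadget; the Python below is only an illustration of the proposed order, the function name pick_section is hypothetical, and the final fallback to the first section on the page is an assumption that the proposal does not spell out.

```python
from typing import List, Optional


def pick_section(sections: List[str], last_visited: Optional[str]) -> Optional[str]:
    """Choose the language section for a link that names no section."""
    # Proposed order: English first, then Translingual.
    for preferred in ("English", "Translingual"):
        if preferred in sections:
            return preferred
    # Otherwise keep the current behaviour: the last-visited language.
    if last_visited in sections:
        return last_visited
    # Assumption for this sketch: fall back to the first section, if any.
    return sections[0] if sections else None


if __name__ == "__main__":
    # A page with Translingual, English and French sections, reached from French.
    print(pick_section(["Translingual", "English", "French"], "French"))
```

Under this order, a page with both Translingual and English sections, such as hotel, would open at English even when the reader was last viewing a French entry.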

Enabling Page Previews

CKoerner (WMF) 15:02, 6 July 2017 (UTC)

Which language section would it default to? Could it be changed via preferences? —Aryaman (मुझसे बात करो) 16:40, 6 July 2017 (UTC)
It would make sense if it defaulted to the language section given in the link, or to English (or the first section when there is no English) for plainlinks. I don't think it would be that useful for it to be configurable, because most non-English links have the language section specified (and the ones that don't, should). --WikiTiki89 17:28, 6 July 2017 (UTC)
Is this similar to the "Lupin" a.k.a. "Navigation" pop-up gadget? I have often wanted a version of that gadget that would fetch enough of the page that it would consistently fetch at least the first definition of the first (or specified) language section. - -sche (discuss) 21:12, 7 July 2017 (UTC)
For entries with {{wikipedia}} etc. I often see no substantive content at all from the popups we now have. I don't know whether an image is what we really need, rather than more - lots of - definitions. This could be very good. As with many improvements, configurability (suppressing graphics in my case) would be nice, but not at a high performance/server-load cost. DCDuring (talk) 19:43, 8 July 2017 (UTC)
(PoS header and definition lines seem essential; etymology header would be nice to indicate how much content there might be beyond what the page previews might be showing. Others might prefer other headers or content.) DCDuring (talk) 19:48, 8 July 2017 (UTC)

Wiktionary:Votes/pl-2017-07/Vote references in policies

Based on the discussions linked in the vote, I created Wiktionary:Votes/pl-2017-07/Vote references in policies. --Daniel Carrero (talk) 18:15, 7 July 2017 (UTC)

motî, iOS dictionary app released

This is a follow-up post to Looking for beta testers for new Wiktionary iOS app from November last year. There wasn't a great deal of interest in testing so unfortunately I didn't get much feedback. However the app is now publicly available in the App Store. It works offline, it's free, and it doesn't have ads (and never will). Right now there are only 10 languages but I plan to add more in later versions. The idea is to continuously update it based on recent dumps. – Jberkel (talk) 09:12, 8 July 2017 (UTC)

This looks really nice! I'd love to use it, but I have Android. —Aryaman (मुझसे बात करो) 13:30, 13 July 2017 (UTC)
I'd love to work on an Android version but will focus on iOS first. I also considered doing a simple HTML5/mobile web version, but the offline storage limits are still too low for the (quite heavy) dictionary data. – Jberkel (talk) 07:42, 14 July 2017 (UTC)
Good to hear news of your project! I have Android on my phone too, but I think a better mobile app than the one made for Wiktionary now is good news! Noé 12:12, 23 July 2017 (UTC)

What is the purpose of {{catlangname}} and {{topics}}?

What does {{catlangname|ru|calques}} do that [[Category:Russian calques]] doesn't? Similarly what does {{topics|ru|Electricity}} do that [[Category:ru:Electricity]] doesn't? I gather there are shortcuts {{cln}} and {{C}}, but there's also the category shortcut [[CAT:...]]. Benwing2 (talk) 22:06, 8 July 2017 (UTC)

Stops HotCat from being usable. Also saves a bit of typing if there are many categories? - -sche (discuss) 02:21, 9 July 2017 (UTC)
Sort keys. —CodeCat 11:24, 9 July 2017 (UTC)
@-sche "Stops HotCat from being usable" - how is that a good thing? A genuine question because I find HotCat very useful. BigDom 05:57, 11 July 2017 (UTC)
I believe -sche was being sarcastic. That sounded like a criticism against the templates. Like this: "We should stop using these templates, they stop HotCat from being usable". --Daniel Carrero (talk) 06:00, 11 July 2017 (UTC)
Or "Please can a JavaScript wizard make HotCat work properly"... — Eru·tuon 06:06, 11 July 2017 (UTC)
Yes! --Daniel Carrero (talk) 06:08, 11 July 2017 (UTC)
I would like to get rid of {{topics}}, but the only way we can do that is if categories automatically sort entries the right way. This is a feature we've been waiting on for years. Right now, {{topics}} is essential for reconstruction pages, which otherwise all get sorted under R for Reconstruction. It's also necessary for mainspace languages since we have custom sort keys. —CodeCat 12:06, 13 July 2017 (UTC)
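To make the sort-key point concrete, here is a rough sketch of what a category template does beyond a bare category link: it emits the same link plus a sort key derived from the page title. The function names and the diacritic-stripping rule are assumptions for illustration only; the real templates use language-specific sort keys maintained on the wiki.

```javascript
// Matches the "Reconstruction:Language/" prefix of a reconstruction page title.
const RECONSTRUCTION_PREFIX = /^Reconstruction:[^\/]+\//;

// Derive a sort key from a page title (illustrative rules only).
function sortKey(pageTitle) {
  // Drop the prefix so the entry sorts by the term, not under "R".
  const term = pageTitle.replace(RECONSTRUCTION_PREFIX, '');
  // Stand-in for a language-specific sort key: strip combining marks, lowercase.
  return term.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

// Build a category link that carries the sort key after the pipe.
function categoryLink(categoryName, pageTitle) {
  return '[[Category:' + categoryName + '|' + sortKey(pageTitle) + ']]';
}
```

With these assumptions, `sortKey('Reconstruction:Proto-Germanic/watōr')` gives `wator`, whereas a bare category link would sort the page by its full title.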

Compound and fiction etymology categories

Is there a particular reason that all the top etymology categories for types of compounds are top-level etymology categories? That is, the following categories:

I would propose that these should be placed under Category:Compound words by language. The by-language categories already work this way.

We similarly seem to have Terms derived from [work] categories such as Category:Terms derived from Harry Potter by language as top-level etymological categories rather than, as expected, children of Category:Terms derived from fiction by language. --Tropylium (talk) 13:28, 9 July 2017 (UTC)

Agree. That's what I suggested here. --Barytonesis (talk) 13:32, 9 July 2017 (UTC)
It would be better not to put subtypes of compounds under Category:Compound words by language, but rather under another category. The x by language is supposed to contain only language-specific categories for x. See, for instance, how Category:Lemmas by language does not contain Category:Nouns by language, Category:Verbs by language, Category:Adjectives by language; instead these are placed in Category:Lemmas subcategories by language. So I've gone and moved the primary subtypes of compounds by language to Category:Subtypes of compounds by language, as @Barytonesis proposed on the talk page mentioned above. — Eru·tuon 23:48, 9 July 2017 (UTC)
While a category like Category:Subtypes of compounds by language is necessary, it could be renamed; perhaps Category:Types of compounds by language would be better, or something else? — Eru·tuon 23:55, 9 July 2017 (UTC)

Sound changes: categories and etymologies

I'm thinking there should be categories for the sound changes that particular English words have undergone. For instance, terms that have undergone yod-coalescence (nature, nation, tune, idjit, whatcha) or yod-dropping (lute, new, figure, beautiful: obviously both of these vary by dialect), terms affected by the horse-hoarse, Mary-marry-merry, cot-caught, or weak vowel mergers. The same could be done for sound changes in other languages.

English is in an odd situation because these sound changes are usually not reflected in spelling, and hence they are not visible in etymologies.

Categories could be easily added by {{accent}} ({{a}}) in pronunciation sections, because they already sometimes mention sound changes (see, for instance, before § Pronunciation). But I think etymology sections should also contain information about changes in pronunciation that aren't reflected in spelling. — Eru·tuon 23:38, 9 July 2017 (UTC)

Synchronic regional differences in pronunciation have no place in etymologies, in my opinion: they're part of the history of the lect, not of the term itself. That's not to say that a regional form which is spelled differently than other forms due to one of the sound changes in question shouldn't mention it- but the etymology for Mary should say nothing about the Mary-marry-merry merger. If you think about it, any random sequence of letters of the right shape to be interpreted as containing such phonemes will reflect the changes when individuals read them aloud (e.g. *morpliger will reflect differences between rhotic and nonrhotic lects), so it's not about the history of the term. The place for such things is in the pronunciation section, as part of illustrating the regional variation for the term. Chuck Entz (talk) 02:51, 11 July 2017 (UTC)
I agree, I don't see how the kind of information you mention could sensibly be incorporated into etymology. It would make no sense to mention in the etymology of Mary that it has undergone the Mary-marry-merry merger, for example, since (1) that statement is false for some number of speakers, and (2) it's not etymological information. - -sche (discuss) 02:59, 11 July 2017 (UTC)
Listing examples of sound changes sounds like something best suited for an appendix. English could definitely use one, perhaps also various other languages with "minor" unpredictable spelling rules. See, for example, Appendix:Hungarian words with ly. --Tropylium (talk) 15:23, 23 July 2017 (UTC)

Arabic script CSS font stack proposal

I ran into some problems with the current Arabic font stack, which seems rather nonsensical at the moment. I don't know what the logic is behind it. Here's a proposal I came up with for what to do about it: User:Radixcc/ArabicFontStackTest. It's a little hard to figure out what's going on with some fonts because I can't find a @font-face directive anywhere for including fonts currently in the CSS. — Radixcc 📞 16:44, 10 July 2017 (UTC)

Which fonts to use for Arabic has been discussed before; one of the issues that complicates things is that not all fonts display sequences of diacritics (vowels + shadda, shadda + vowels) well. But it has been several years since the last discussion, maybe fonts have improved. Perhaps it would be illustrative to check how Wikitiki's "Arabic font test" examples display in the fonts you propose, and the fonts that were previously rejected (to see if those have improved). Pinging some users who participated in the long discussion I linked to: @Wikitiki89, Atitarev, Mzajac. - -sche (discuss) 17:30, 10 July 2017 (UTC)
Ok I added the diacritic tests to my page. Now that I get to comparing them it seems like the problem with Droid Arabic Naskh is that it appears a bit larger than the others. The whole Arabic font situation sure is a headache. — Radixcc 📞 02:37, 11 July 2017 (UTC)

Focus search box by default on most pages

I am sure this has come up previously, although I didn't find any recent discussions on the Beer Parlour. An OTRS email suggested that it would be very useful for the user if the search box had initial focus on most pages. The German Wiktionary has had this feature implemented for years, and I copied it here for those who would like to test it out. Just add importScript('User:TheDaveRoss/searchFocus.js') to your common.js page to see what it is like. It does not focus the search box on a few pages where that would obviously be undesirable, such as edit pages and log-in pages.
Main question: should we implement this for all users by default? It is not without downside, but it is pretty handy most of the time. - [The]DaveRoss 19:26, 10 July 2017 (UTC)

My comment in the previous discussion was that (AFAIK) focusing a control always forces the page to scroll to a point where the control is visible. This can be very annoying, e.g. if you are visiting water#Occitan. A text box having focus might also block other shortcut keys that would normally scroll, such as PgUp/PgDn. Equinox 19:30, 10 July 2017 (UTC)
The scrolling to focused control problem is an important one, and I can see a few possible ways around that but it is a problem with the current implementation. One would be to only focus if the page URL does not include an anchor, which is crude but effective. There are probably cleverer solutions as well, but I would rely on people who have done webdev and understand how focus affects things in various browsers.
The page up and down, as well as the arrow keys, are still functional for scrolling with this enabled. - [The]DaveRoss 19:40, 10 July 2017 (UTC)
This SO Answer may be applicable, as long as it doesn't do the whole scroll up and down dance. - [The]DaveRoss 19:46, 10 July 2017 (UTC)
We could make the search box scroll with the page. DTLHS (talk) 19:44, 10 July 2017 (UTC)
Users could also be advised of the focus-search-box key (Alt+Shift+F in Chrome, might possibly vary by browser, and presumably not available on mobile). Equinox 20:01, 10 July 2017 (UTC)
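The "crude but effective" workaround mentioned above (skip focusing when the URL carries an anchor, and on pages such as edit forms) might look roughly like this. It is a sketch under assumptions and not the contents of User:TheDaveRoss/searchFocus.js: the list of skipped actions and the `searchInput` element id are guesses, and log-in pages would need a separate check that is not shown.

```javascript
// Actions on which taking focus would get in the way (assumed list).
const SKIPPED_ACTIONS = ['edit', 'submit', 'history'];

// Decide whether to focus the search box for a given URL hash and page action.
function shouldFocusSearch(hash, action) {
  // A non-empty fragment means the reader asked for a specific section.
  if (hash && hash !== '#') {
    return false;
  }
  return !SKIPPED_ACTIONS.includes(action);
}

// Browser-only part; skipped when no DOM or MediaWiki globals are available.
if (typeof document !== 'undefined' && typeof mw !== 'undefined') {
  const input = document.getElementById('searchInput'); // assumed element id
  if (input && shouldFocusSearch(location.hash, mw.config.get('wgAction'))) {
    // preventScroll asks the browser not to scroll the focused element into view.
    input.focus({ preventScroll: true });
  }
}
```

Keeping the decision in a pure function makes the anchor rule easy to test; whether `preventScroll` is honoured depends on the browser, so the anchor check remains the main safeguard.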

Count me in as opposed to the idea--we don't need to control the users' browsers or behaviors any more than we already do. Scripts which unexpectedly take away control or focus are a real nuisance to me. —Justin (koavf)TCM 20:45, 10 July 2017 (UTC)

One could argue that this isn't unexpected, the current behavior is what is unexpected, but I agree that poor implementations of focus change are a real nuisance. - [The]DaveRoss 12:26, 11 July 2017 (UTC)

Actually, that "feature" (and iirc also one other stupid script) is the reason why I (as a German) am using enwikt instead of dewikt. So, pretty please don’t do it here. --Nenntmichruhigip (talk) 19:07, 26 July 2017 (UTC)

temp:head or language-specific template?

I never seem to know which markup is preferable to use: {{head|fr|suffix}} or {{fr-suffix}} (I'm only using French as an example)? I just did this, is this all right? --Barytonesis (talk) 22:30, 10 July 2017 (UTC)

Language-specific templates only really make sense if they add something significant, which is not the case in your edit. So I think your edit was right. —CodeCat 23:00, 10 July 2017 (UTC)

Not bolding the initials of abbreviations, acronyms and initialisms

At some point, I'd like to create a vote to incorporate this rule in WT:EL:

"Abbreviations, acronyms and initialisms can't use bold letters like this: armoured combat vehicle in ACV. The correct form would be simply armoured combat vehicle."

I don't have actual numbers, but I believe this proposal likely reflects an unwritten rule already in practice. Most affected entries don't use the bold letters anyway, but sometimes I find a few entries that do.

If this passes, it would be kind of consistent with this 2010 vote: Wiktionary:Votes/pl-2010-03/Bolding letters in initialisms (based on Wiktionary:Beer parlour/2010/March#Bolding letters in initialisms). All participants voted "Oppose bolding", but I believe this simply means that no rule was effected. Apparently, EL was not edited in any way based on that vote. --Daniel Carrero (talk) 07:06, 11 July 2017 (UTC)

Though I think the bolding has been overused, there are occasions when the bolding makes clear how abbreviations of multi-word expressions ("MWE"s) are constructed from the components, where it is not immediately apparent. Similarly for some blends. IOW, we may have a preference, but a vote seems inappropriately rigid. A more complex proposal that attempts to address the exception I've identified is likely to be harder to understand, have surprising unanticipated consequences, and make for more rigidity. Less legalism, more use of dump-processing to support reviews of possible problematic overuse, misuse, etc seems more wiki-like. If this flexibility is not to one's taste, perhaps WikiData is a better project. DCDuring (talk) 12:14, 11 July 2017 (UTC)
In the entry for ABQ bolding selected letters would be very handy. In fact the display forced by {{abbreviation of}} makes the desirable use of bolding impossible to implement while getting the benefits of {{abbreviation of}}. Note also that in this case the abbreviation is not even one of an MWE. DCDuring (talk) 12:26, 11 July 2017 (UTC)
@DCDuring: Maybe the wording could be something like: "bolding of initials is generally discouraged". Assuming we want some entries to have it, but not all or most. --Daniel Carrero (talk) 14:12, 12 July 2017 (UTC)
Why not just start off WT:ELE with the imprecation: "Try to do a good job of formatting." Another, probably more productive approach would be to eliminate all the bad formatting in existing entries so there are fewer bad examples for contributors to follow. This would probably be more productive than working on yet another vote that doesn't leave us in a better place than we are now. DCDuring (talk) 15:01, 12 July 2017 (UTC)
We'll eliminate all the bad formatting from entries as soon as someone writes an algorithm that can tell us what bad formatting is. Or as soon as we rewrite our entries so that they are actually parseable by a computer without inventing strong AI first. DTLHS (talk) 04:40, 13 July 2017 (UTC)
We don't have to do anything hard. We can identify entries that use the several templates used for abbreviations and also contain an emboldened capital letter followed without a space by one or more lowercase letters. Most of these will be for initialisms or acronyms for which, IMO, there is not sufficient justification for bold. The other cases may need manual review. As we have more or less standardized on the templates involved, we should thus easily identify many of the cases. Once this has been done, a dump could be analyzed for all remaining instances of such or similar uses of bold for parts of words, for manual review. Obviously if there is no consensus on the simpler cases, we can't proceed. DCDuring (talk) 05:49, 13 July 2017 (UTC)
I remember this was discussed before and most people disliked the bolding. Unfortunately no idea when/where. Equinox 17:04, 12 July 2017 (UTC)
Other than the 2010 discussion I linked in my first message above? (I noticed you started that discussion.) --Daniel Carrero (talk) 17:10, 12 July 2017 (UTC)
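The dump check described in this thread (entries that use an abbreviation template and also contain a bolded capital letter directly followed by lowercase letters) could be approximated with two regular expressions over an entry's wikitext. The template names below are assumptions chosen for illustration, and `needsReview` is a hypothetical name; the real list of templates would need to be confirmed before running anything over a dump.

```javascript
// Template names assumed for illustration; the real list would need checking.
const ABBREVIATION_TEMPLATE = /\{\{\s*(?:abbreviation|initialism|acronym) of\s*\|/;

// A bolded capital ('''X''') immediately followed by one or more lowercase letters.
const BOLD_INITIAL = /'''[A-Z]'''[a-z]+/;

// Flag an entry for review when both patterns occur in its wikitext.
function needsReview(wikitext) {
  return ABBREVIATION_TEMPLATE.test(wikitext) && BOLD_INITIAL.test(wikitext);
}
```

This only finds the simple cases; bolded lowercase letters, or bolding inside words that do not start with the bold letter, would need the broader manual pass mentioned above.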

Language sections

I like very much the presentation of languages in articles in www.mediawiki.org (example page). Would it be an enhancement to have it in all Wiktionaries? I mean, in every page, instead of language sections, to have a table from which the user selects the language to see. That way, a user who comes to a very heavy page with lots of language sections will not be obstructed by other languages. I see they only have a <language> tag which probably does most of the work. Maybe in such an extension languages do not have to be in accordance with ISO codes, in order to have the ability to add non-standard languages (if this is desirable). As I understand it, this works by creating a subpage for a language and just displaying it depending on user preferences (which we may or may not use in Wiktionaries). The user has the freedom to choose any language to see. --Xoristzatziki (talk) 14:15, 11 July 2017 (UTC)

There is a similar feature available here, it is called Tabbed Languages. You can enable it in Preferences > Gadgets if you would like to see it in action. - [The]DaveRoss 14:23, 11 July 2017 (UTC)
Can someone remind me why Tabbed Languages isn't enabled for all logged-out users? Entries with several language sections are basically a cluttered mess without it. This, that and the other (talk) 21:14, 12 July 2017 (UTC)
It was actually voted on and passed, but then nobody did anything about it. It should be done now. —CodeCat 12:03, 13 July 2017 (UTC)

Strategy discussion, cycle 3. Let's discuss about a new challenge

Hi! It's the second week of our Cycle 3 discussion, and there's a new challenge: How could we capture the sum of all knowledge when much of it cannot be verified in traditional ways? You can suggest solutions here. You can also read a summary of discussions that took place in the past week. SGrabarczuk (WMF) (talk) 15:36, 11 July 2017 (UTC)

We already don't require "reliable sources", so I think we're ahead of the curve on that one. —CodeCat 12:10, 13 July 2017 (UTC)

Creating redirects to xx-IPA templates

Is it an accepted practice to create a redirect to an xx-IPA template (e.g. {{hu-ipa}} redirects to {{hu-IPA}})? I am copying a conversation from User:Liggliluff's talk page:

Hi, what is the purpose of the redirect to {{hu-IPA}}? --Panda10 (talk) 16:43, 13 July 2017 (UTC)

Because {{fa-ipa}} and {{ko-ipa}} exist, and if people are used to these, it'll be easier for them to find the others, and it's quicker and easier not to have to shift case.
But then, the other templates don't have lowercase redirects: {{ar-ipa}}, {{ca-ipa}}, {{cs-ipa}}, {{eo-ipa}}, {{et-ipa}}, {{fi-ipa}}, ...
And then you've got: {{grc-IPA}}/{{grc-ipa}}
I believe the standard naming convention is xx-IPA. But I will bring up the subject at Beer Parlour.

--Panda10 (talk) 18:43, 14 July 2017 (UTC)

Yes, the standard convention is xx-IPA. A few years back I moved all templates with deviating names to xx-IPA as long as they were luacized templates that automatically generated pronunciation information. Most redirects are there because of the page moves. There's nothing particularly wrong with having redirects from xx-ipa names, but there's no particular reason for them either. Just use xx-IPA. —Aɴɢʀ (talk) 08:17, 15 July 2017 (UTC)
Thanks. --Panda10 (talk) 13:58, 15 July 2017 (UTC)
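For anyone cleaning up template calls by script, the xx-IPA convention described above amounts to a one-line rename. The helper below is a hypothetical sketch (the function name is made up, and it assumes any name ending in "-ipa" in any letter case is meant to be an IPA template):

```javascript
// Rewrite a template name such as "hu-ipa" to the conventional "hu-IPA".
// Names that do not end in "-ipa" (in any letter case) are returned unchanged.
function canonicalIpaTemplate(name) {
  const match = /^(.+)-ipa$/i.exec(name);
  return match ? match[1] + '-IPA' : name;
}
```

For example, `canonicalIpaTemplate('hu-ipa')` returns `hu-IPA`, while `grc-IPA` and unrelated names such as `head` pass through untouched.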

CFI and Poorly-Attested Varieties of Well-Documented Languages

The whole distinction between LDLs and WDLs was intended to protect entries for lects with limited corpora of written texts, and yet there are large numbers of dialectal terms even in English that are hard or impossible to verify under the current rules.

For one thing, people have always tended to write only in the standard lect and only speak in the other lects (or at least not write anything that gets durably archived). Add to that the lack of standard spelling, which means that any single variation is less likely to be attested often enough, and you have the equivalent of many LDLs embedded within WDLs.

There's also the matter of historical variation in depth of attestation: modern technology has made it easier to produce, distribute, capture and preserve language, and attitudes about various sublects, not to mention tolerance in general for lects other than the standard ones, have changed over time.

Is there any way we could modify CFI to take this into account? Perhaps we could specify in the WDL list which sub-lects are well-documented, and exempt all the others from the WDL requirements. Either that, or add general parameters for which types of sub-lects should be exempted or not exempted. Chuck Entz (talk) 20:46, 14 July 2017 (UTC)

I definitely agree. It's much harder to attest uncommon variants of a language, and even very informal levels of language can be difficult to attest, for the same reasons. Andrew Sheedy (talk) 03:56, 15 July 2017 (UTC)
I agree as well. We should protect dialectal terms and old hapax legomena (e.g. لسپردرک (laspardarak)). --Vahag (talk) 08:19, 15 July 2017 (UTC)

Entries by 78.3.158.12 (talk)

This user has been creating lots of entries for generic molecular formulae. I seem to remember that we don't accept these. Or do we? SemperBlotto (talk) 13:05, 16 July 2017 (UTC)

  • Most of his other entries are of rather poor quality - I have half a mind to block him. SemperBlotto (talk) 13:25, 16 July 2017 (UTC)
Now the user is adding phobias, some of which seem to be barely attested, and others of which just get a lot of mentions. But the user is probably adding this content in good faith, maybe not knowing that "mentions" don't meet CFI, so I think advising them on their talk page is better than blocking. - -sche (discuss) 18:02, 16 July 2017 (UTC)
Block them as a BrunoMed sock. At least I think that's who it is. Look for assembly-line-style use of the same verbiage whether it fits the entry or not. I'm not 100% sure, because I don't remember the geolocation details of their previous socks- except that they all geolocated to Croatia, as this one does. Chuck Entz (talk) 01:26, 18 July 2017 (UTC)

Wiktionary:Votes/pl-2017-07/Gallery

Based on the discussions linked in the vote, I created Wiktionary:Votes/pl-2017-07/Gallery. --Daniel Carrero (talk) 11:38, 17 July 2017 (UTC)

"Obsolete" forms that were never really used

The discussion that started here and the contributions of this anon give rise to a problem that we need to solve – what do we do with word forms that were never naturalised and are they to be considered "obsolete"?

Context: Romanian underwent a change of writing systems in the late 18th century. Transylvanian scholars adapted the Latin alphabet to the Romanian language, using orthographic rules from Italian. The Cyrillic alphabet remained in gradually decreasing use until 1860, when Romanian writing was first officially regulated. At the time, countless mixed alphabets were introduced, some were even used simultaneously. If you were to read texts from this period, it wouldn't come as a surprise if one word was written using several alphabets; it all depended on who you were reading.

Adding to an already difficult linguistic period, there was also a tendency in the works of several scholars to re-Latinise the language ad absurdum (for instance Dicționarul limbii române, 1871-1876, by August Treboniu Laurian and Ion C. Massim).

E.g. the word băiețel ("little boy") was written as baiatellu in the aforementioned dictionary.

The spelling is completely subjective to the ideas and beliefs of the scholars who wrote it. It does not in any way, shape or form describe how the word was actually pronounced or written by the public.

Therefore, my question is should word forms such as baiatellu be included under Alternative Forms as obsolete even if they were never actually used? The form has indeed citations and would technically fulfil minimum requirements for inclusion, but it somehow feels wrong to add it considering that it was used only in a tight-knit circle of scholars and authors of the time. I have similar hesitations when it comes to forms without diacritics (e.g. -țiune vs. -tiune), because it would cause disarray amongst Romanian entries and possibly other languages too.

Any input is highly appreciated (@Redboywild, @Word dewd544). --Robbie SWE (talk) 09:57, 18 July 2017 (UTC)

Maybe a "hypercorrect" gloss? We've had a few such English entries where e.g. an æ spelling has been added that doesn't stand up to scrutiny. Equinox 12:20, 18 July 2017 (UTC)
I would tag them as both "obsolete" and "rare". If there's a standard explanation that you would use in a number of entries, you could also create a template. Chuck Entz (talk) 13:40, 18 July 2017 (UTC)
I'm sorry to have to be a massive pain, but the implications of said solutions are daunting. It would give every Romanian entry several alternative forms, most of which would be artificial and/or unknown to a majority of Romanians. These forms, especially the made-up Latin forms from the late 19th century, would erroneously suggest that Romanian was morphologically closer to the other Romance languages than it actually was. We would basically be accepting counterfactual efforts from some scholars to revamp the historical evolution of the Romanian language. I would personally not go anywhere near forms that popped up during this period – only veritable alternative forms and attested archaic/obsolete forms such as nație for națiune, pâne for pâine (DEX is pretty good at mentioning these alternative and archaic/obsolete forms together with veritable sources from prose and poetry), etc., deserve to be mentioned, IMHO. I think the problem is exacerbated by the lack of written sources in Romanian dating from before the 16th century. It makes it hard to create a historical timeline for the Romanian language – where we have Middle English and Old English, in Romanian we have nada. --Robbie SWE (talk) 16:13, 18 July 2017 (UTC)
It's up to you to decide what to work on. If a word appears in print it can be included, provided it is properly tagged, even if it was used by an author promoting historical revisionism. DTLHS (talk) 16:41, 18 July 2017 (UTC)
If there are three independent citations of forms like you mention (i.e. they don't just appear in the work of one author), then someone who wants to spend the time adding them can do so. I would suggest adding a few sentences at WT:ARO that describe the issue ("in the 1800s circles of scholars proposed many spellings for such-and-such reasons that never caught on and are now obsolete...") — perhaps WT:ARO#Moldovian_and_Cyrillic_Romanian (where the allowance of Old and New Cyrillic forms is explicit in Wiktionary:Votes/2011-10/Unified Romanian) can be generalized into a section on spellings. Then, one could make a qualifier template that links to that explanatory section, to put after those spellings when they're listed in alternative forms, and one could also make a "form of" template to use in the entries for the spellings themselves. Maybe the wording could be "obsolete respelling of X proposed in the 1800s"? - -sche (discuss) 17:31, 18 July 2017 (UTC)
I'm personally of the mind that we shouldn't really bother too much with these. It's just going to add unnecessary confusion for those who aren't very familiar with the language and its historical evolution, or otherwise just take a lot of effort to explain the (rather obscure) context of these forms of the words. At any rate, I won't really be involved with this, as I still have other things to work on. Word dewd544 (talk) 17:58, 18 July 2017 (UTC)
It's definitely within our ambit. If we're worried about adding unnecessary confusions, we should figure out how to record the information in a way that isn't confusing. This is hardly a problem limited to Romanian; few natural languages had the spelling standard that persists today (if the community has, indeed, decided on a spelling standard) upon the birth of writing in that language. Glancing at Shakespeare's First Folio, it seems we have much of the old 17th century spelling, but not linked from the modern spelling in any way.--Prosfilaes (talk) 21:33, 18 July 2017 (UTC)

Ok then. Humour me for a minute – let's play a game of what-if.

What if I were a scholar, specialised in linguistics with a strong proclivity for English. Let's say I hate foreign influence on the English language – Anglo-Norman, Latin and other Romance languages have ruined this Anglo-Saxon gem and nothing would tickle my fancy more than to cleanse the language from these aberrations. In a Tolkienesque manner I author a book proclaiming my agenda and later, a voluminous dictionary where I completely refurbish the English language – vocabulary, grammar and morphology, you name it, have all been Anglo-Saxified. Several fellow colleagues agree with me, write books about my work, cite me frequently and some even continue my purification crusade. Others criticise my work and call me a nutcase (and rightly so, if you ask me), nonetheless plenty of quotes, but not so many headlines (you know, the media is too busy covering Trump's latest tweet or something like that). Flash forward 150 years, someone finds my work and thinks "Wow, English sure has changed – I've never seen these words and archaic spellings before. I think I'm going to add them to Wiktionary as obsolete forms". These contributions are easily cited because finding citations is a piece of cake, so they pass WT:CFI. Mind you, no one else – authors, newspapers, mass media, or that Instagram celebrity who is famous for doing nothing – has ever used the words in my dictionary.

Back to the present. If the consensus is that the postulation above is feasible and something we should accept, then I think I'm going to need Prozac and a call to my shrink cause the world has gone mad I tell you! --Robbie SWE (talk) 18:48, 19 July 2017 (UTC)

All words in all languages. As long as they pass CFI, tag them as obsolete, rare (and maybe even make a special label or template with an informative link that says following the abandoned orthography of Dr. So-and-so). I'd only link to them from a lemma in an autocollapsed box with qualifiers. —Μετάknowledgediscuss/deeds 18:56, 19 July 2017 (UTC)
How would they be easily cited if no one used them? Three independent cites is not a trivial hurdle. There have been quite a few English spelling reformations, but few of them can offer even one real printing in the spelling; we count uses, not mentions. The Deseret alphabet, supported by the local government, could arguably not reach that level for anything.
There's a lot of marginal language use. People are welcome to not spend their time on anything they don't feel is worth it. But if someone wants to record this history of Romanian, it's entirely in our ambit.--Prosfilaes (talk) 21:42, 19 July 2017 (UTC)
"What-if"? See w:Linguistic purism in English. We've had problems with otherwise very good contributors- even admins- trying to push this. We document everything that's been actually used (as opposed to mentioned), but we explain what it is, and we don't allow uncommon forms as translations or in definitions, nor do we usually link to them from the main forms. That way someone who runs into it somewhere (e.g. Google Books) knows what it is, but we aren't promoting it. Chuck Entz (talk) 14:13, 20 July 2017 (UTC)
The point of my somewhat overdramatic "what-if" story is to exemplify what I believe to be an absurd stance – the situation for Romanian is that we have word forms coined by scholars, mentioned within their like-minded networks but not actually used by anyone else. I don't think it is in our best interest to add these forms as alternative forms in main namespaces (like the anon did), because it suggests that they were common at the time. I'm not going to work with these forms, but I dread that someone else will find them extremely interesting and add them to existing Romanian entries. --Robbie SWE (talk) 17:54, 20 July 2017 (UTC)
If they're not actually used, then they're not relevant. You say "coined by scholars, mentioned within their like-minded networks but not actually used by anyone else" which avoids the important question of whether they were actually used by anyone. If they were, then we should have entries on them. I believe we should add alternative forms on them, that we should link all alternative forms that are citable, with appropriate notes, but that's not the important thing.--Prosfilaes (talk) 19:26, 20 July 2017 (UTC)
Romanian forms with -tiune and (silent) ending u were used - and not just mentioned - and are attestable as per WT:CFI. If there are doubts, please use WT:RFVN. Those old spellings are similar to, for example, old English spellings, which are included in the English Wiktionary too. (It's not necessarily hypercorrect, and even English spellings with æ or œ are not necessarily hypercorrect.)
If -tiune and (silent) ending u were rare, then it could also be because Romanian was rarely written or rarely written in Latin characters in the 19th century, and not just because -țiune and u-less spellings were the common forms. Anyway, as others pointed out there could be more informative labels than just obsolete.
Latinising spellings, as -tione instead of -tiune/-țiune in the dictionary by Laurianu and Massimu, probably aren't attestable. If they unexpectedly are attestable, then the label simply could/should be more informative than just obsolete, for example it could be [[Wiktionary:About Romanian#Spelling|Latinising spelling]]; obsolete, rare/uncommon. Also dates could be added in the label like 19th century Latinising spelling, or inventors could be mentioned if there are any and if they are famous like Latinising spelling following Laurianu and Massimu. (If the inventors are not famous, then it's not really helpful or useful to mention them in a label.) -80.133.118.3 18:58, 23 July 2017 (UTC)
I think there's a general consensus that there's not a problem creating entries for these things. The real argument seems to be about linking them to entries using standard spellings, which is not an issue resolvable by WT:RFVN.--Prosfilaes (talk) 17:43, 24 July 2017 (UTC)

Adding Demotic

Can we add Demotic (the stage of Egyptian between Late Egyptian and Coptic, not the Greek vernacular) as a language (perhaps egx-dem)? It’s cropping up in a lot of Coptic etymologies (e.g. ϣⲉⲣⲓ (šeri), ϩⲁⲓ (hai), ϩⲟⲟⲩⲧ (hoout)) and some others (e.g. lily) with no clear way to link to it. The script and transliteration are different from (hieroglyphic/hieratic) Egyptian, as is the grammar and a good part of the lexicon, so that splitting it off wouldn’t result in significant duplication of content. Traditional lexicography keeps the two separated (cf. the Wörterbuch der ägyptischen Sprache vs. the Demotisches Glossar) with good reason. — Vorziblix (talk · contribs) 16:30, 19 July 2017 (UTC)

Sounds reasonable to me. —Aɴɢʀ (talk) 21:53, 19 July 2017 (UTC)
@Angr Since I don’t have the requisite admin rights to edit the module, could you (or any other admin) add the following to Module:languages/datax:
m["egx-dem"] = {
        canonicalName = "Demotic",
        otherNames = {"Demotic Egyptian", "Enchorial"},
        scripts = {"Latinx", "Egyd"},
        family = "egx",
        ancestors = {"egy"},
        wikipedia_article = "Demotic (Egyptian)",
}
and add the line
   ancestors = {"egx-dem"},
to Coptic in Module:languages/data3/c? (The tabs might need to be fixed if not copied from source.) Thanks. — Vorziblix (talk · contribs) 00:36, 20 July 2017 (UTC)
Added. DTLHS (talk) 00:47, 20 July 2017 (UTC)
Either way is fine, but looking over results from e.g. Google Books or web search, unqualified ‘Demotic’ in English almost always means Egyptian Demotic (or simply the adjective) and Egyptian Demotic is almost always called simply ‘Demotic’, whereas Greek Demotic is generally specified as such. Context also makes it rather unlikely that the two would be confused, especially given that we don’t have Demotic Greek as a language or dialect separate from (Modern) Greek. However, if consensus favors changing the name, it should be fairly easy to do. — Vorziblix (talk · contribs) 05:15, 20 July 2017 (UTC)
Yeah, it's common enough to call it just "Demotic" (like also the script Egyd).
But on the subject of naming conflicts, we have both Category:Egyptian languages and Category:Egyptian language, i.e. a family and a language have the same name. Wiktionary:Families advises that this should be avoided, but does it actually cause any problems other than in etymologies where "from Egyptian" (compare "from Germanic") and "from Egyptian" would be indistinguishable? If that's the only issue, it seems like it can be worked around without renaming anything. - -sche (discuss) 09:28, 21 July 2017 (UTC)

Vocalisation of laryngeals, again

Can anyone please help me deal with Victar in Reconstruction:Proto-Indo-European/h₂reh₁- and Reconstruction:Proto-Indo-European/Hreh₁dʰ-? The two given reconstructions make no sense. A sonorant in a zero-grade root becomes syllabic, this is standard PIE. So this means that the laryngeals next to it certainly don't become syllabic. Syllabic sonorants in Germanic develop an epenthetic -u- in front of them, which is what would be expected in such a form. The fact that something else is found implies that the reconstruction is wrong. How do I explain this? I'm tired of being forced into an edit war in order to keep dubious information out of Wiktionary. Clearly there is no consensus to include it, so why should it be included anyway? —CodeCat 21:20, 19 July 2017 (UTC)

It's definitely unexpected for HRHC- to make the second laryngeal syllabic rather than the R, but maybe someone's discovered a new sound law by which (word-initial?) HRHC- surfaces as post-laryngeal RəC- > RaC- in Germanic rather than the normally expected R̥̄C- > uRC-. Are there any PGmc words that do start with uRC- < HRHC-? All the uRC- words I can find in CAT:Proto-Germanic lemmas (*umbi, *und, *under, *unhtaz, *unseraz, *urbą) seem to come from *(H)R̥C-, not *(H)RHC-. —Aɴɢʀ (talk) 21:49, 19 July 2017 (UTC)
uRC from RHC: *kundaz, *kurną, *gulþą, *hurną, *hulliz, *þunnuz, *spurą. Kroonen notes for *bladą < *bʰl̥h₃tóm that the -a- must be secondary since it can't reflect an inherited form of the root in any grade. There may be other cases of such "impossible" grades with laryngeal-final roots in Germanic. —CodeCat 19:53, 20 July 2017 (UTC)
None of those are word-initial, though. —Aɴɢʀ (talk) 21:19, 20 July 2017 (UTC)
These are sourced from Kroonen, so in the absence of alternative analyses, I'm not sure what else we are supposed to do here. Perhaps we might add a question mark, and explain the actual issue in the PGmc entries themselves (once they have been created). --Tropylium (talk) 15:35, 23 July 2017 (UTC)
We're not required to go with Kroonen. This is one of the reasons why I opposed blindly following sources in the past. Sometimes they really do lead you somewhere nonsensical. My own interpretation of the situation is that Kroonen is probably essentially correct, but that the derivation is post-PIE. They would have occurred at a time when laryngeals were no longer consonants, but the laryngeal nature of certain roots was not yet entirely lost. A derivation like *bladą is only possible if speakers somehow "knew" that a (or a predecessor) was the vowel to be used in the zero grade of such laryngeal roots, which in turn must have arisen by analogy with CHC-shape zero grades where a is the regular development. However, it is important to note that there is no a in the past plural of strong verbs anywhere, even in verbs of laryngeal roots. Instead, classes 6 and 7, where most laryngeal roots are, have no zero grade altogether. —CodeCat 15:44, 23 July 2017 (UTC)
Most of what we list in PIE root entries under derivations are preforms (projections into PIE) and not proto-forms (comparative reconstructions) anyway. A single reflex in a single branch, say Celtic *sutus < "*séwH-tus" < *sewH- typically does not warrant reconstructing *séwHtus for PIE itself. This is after all why we (and also other reference works) normally list such descendants under just the PIE root, not under any actual PIE term.
So the wider question is: do these pre-forms have to adhere to canonical PIE grammar, or is it acceptable to give pre-forms that clearly were not PIE and indicate later formation? (Note that, from an Indo-Hittite viewpoint, this would also have to include quite a bit of morphology.) We do want to link later formations from PIE entries somehow, and the current approach seems like a workable compromise. Maybe we can add a disclaimer to WT:AINE about forms in derivatives-of-roots lists. --Tropylium (talk) 13:03, 26 July 2017 (UTC)
Normally, the actual reconstructable proto-forms are red/bluelinked and can have their own page, while the projections are unlinked. It becomes difficult when there is actually no possible PIE preform, like in the case of *bladą. We should mention in this case that it's not a PIE preform. —CodeCat 13:37, 26 July 2017 (UTC)

Wiktionary meetup 2, United States

I'll be in Sandusky, Ohio a lot this summer. I can set up a meeting there with someone who would be able to go to northern Ohio. We'll meet for ice cream or lunch or something. I don't care what we do. I just want to meet a Wiktionarian!

(I could also go to Cleveland, Columbus, Toledo, or any other city within that general area.) Reply or post on my talk page and we'll exchange contact info or whatever if necessary, and figure out where it is in Ohio you want to meet. PseudoSkull (talk) 00:21, 20 July 2017 (UTC)

Pinging Ruakh. —msh210 (talk) 00:25, 23 July 2017 (UTC)
Thanks for thinking of me, but I live in the Seattle area now. —RuakhTALK 04:46, 23 July 2017 (UTC)

WT meetup, Spain

If anyone finds themselves near Barcelona this summer too, send me a message. --Recónditos (talk) 15:36, 20 July 2017 (UTC)

Damn I wish I could meet you! PseudoSkull (talk) 15:40, 20 July 2017 (UTC)
Haha, I have always wondered whether Wonderfool's Spanish location was a lie or not. Also whether he actually married an heiress. I FORGET NOTHING. Equinox 00:33, 23 July 2017 (UTC)
An heiress? Lol, no. --Recónditos (talk) 12:24, 23 July 2017 (UTC)

Strategy discussion, cycle 3. Challenge 4

Hi! The movement strategy discussion is still underway, and there are four challenges that you may discuss:

  1. How do our communities and content stay relevant in a changing world?
  2. How could we capture the sum of all knowledge when much of it cannot be verified in traditional ways?
  3. As Wikimedia looks toward 2030, how can we counteract the increasing levels of misinformation?
  4. and the newest one: How does Wikimedia continue to be as useful as possible to the world as the creation, presentation, and distribution of knowledge change?

The fifth and last challenge will be released on July 25.

If you want to know what other communities think about the challenges, there's the latest weekly summary (July 10 to 16), and there's the previous one (July 1 to 9).

If you have any questions, you may ask here (please, remember to ping me). The FAQ might be helpful as well.

Bot request: create entries for Japanese verb and adjective forms

Please someone use a bot to create the entries for all Japanese verb and adjective forms, if that's OK.

I've been trying to learn Japanese and I think maybe these entries would be helpful.

Unless people consider these entries unwanted for some reason. --Daniel Carrero (talk) 16:08, 20 July 2017 (UTC)

Not knowledgeable in Japanese, but I believe all inflections of all words in all languages should be added to Wiktionary regardless. So, technically, they should be welcome. PseudoSkull (talk) 19:45, 20 July 2017 (UTC)
I think that we have to decide what forms we want first. Also, there are conflicting views.
This article describes a set of conjugation rules widely used in order to teach Japanese as a foreign language. However, Japanese linguists have been proposing various grammatical theories for over a hundred years and there is still no consensus about the conjugations. Japanese people learn the more traditional "school grammar" in their schools, which explains the same grammatical phenomena in a different way with different terminology (see the corresponding Japanese article). (w:Japanese verb conjugation)
Because the Japanese language is written without space, different grammar systems tend to have different notions on what constitutes a word. The 学校文法 (gakkō bunpō, school grammar) system tends to cut sentences into smaller pieces to help understand the development of the language. It is used in Japanese schools and dictionaries, but is not designed for a foreign audience who have no experience with the language. A new grammar called 日本語教育文法 (nihongo kyōiku bunpō, Japanese-language education grammar) has been devised since 1960s. It simplifies the “school grammar” system a lot and is widely used in learning materials for non-native speakers. The difference is that the former provide “stems” used to form words and the latter provide prefabricated forms to be used in sentences. (Appendix:Japanese verbs)
In Japan, adjectives and verbs alike all have 未然形, 連用形, 終止形, 連体形, 仮定形, and 命令形 (see our current entries), while for foreigners, verbs have dictionary form, a-form, i-form, u-form, e-form, o-form, and te-form, while adjectives are something else.
(Actually, I've been thinking about this. See [1].) —suzukaze (tc) 01:54, 21 July 2017 (UTC)
(@Eirikr, Atitarev, Dine2016, TAKASUGI Shinji, Fumiko Take, Wyang —suzukaze (tc) 02:36, 21 July 2017 (UTC))
We should avoid adding bound forms because there is no consensus between traditional grammarians and modern linguists. The following forms may have their own pages:
Negative 書かない 食べない 来ない しない
Volitional 書こう 食べよう 来よう しよう
Polite 書きます 食べます 来ます します
Past 書いた 食べた 来た した
-te 書いて 食べて 来て して
Condition 書けば 食べれば 来れば すれば
Imperative 書け 食べろ 来い しろ
TAKASUGI Shinji (talk) 03:41, 21 July 2017 (UTC)
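For anyone planning the bot, here is a rough Python sketch of how the seven forms in the list above could be generated for regular verbs. The function name `conjugate`, the `GODAN` table and the class labels "godan"/"ichidan" are made up for this illustration and are not taken from any existing module. It deliberately ignores the irregular 来る and する and exceptions such as 行く, so a real bot would need more than this.

```python
# Hypothetical sketch: generate the seven listed forms for regular verbs.
# Godan endings mapped to (a-row, i-row, e-row, o-row, te-form, ta-form).
GODAN = {
    "う": ("わ", "い", "え", "お", "って", "った"),
    "く": ("か", "き", "け", "こ", "いて", "いた"),
    "ぐ": ("が", "ぎ", "げ", "ご", "いで", "いだ"),
    "す": ("さ", "し", "せ", "そ", "して", "した"),
    "つ": ("た", "ち", "て", "と", "って", "った"),
    "ぬ": ("な", "に", "ね", "の", "んで", "んだ"),
    "ぶ": ("ば", "び", "べ", "ぼ", "んで", "んだ"),
    "む": ("ま", "み", "め", "も", "んで", "んだ"),
    "る": ("ら", "り", "れ", "ろ", "って", "った"),
}


def conjugate(lemma, verb_class):
    """Return a dict of form name -> inflected form for a regular verb."""
    stem, ending = lemma[:-1], lemma[-1]
    if verb_class == "ichidan":
        if ending != "る":
            raise ValueError("ichidan verbs must end in る")
        return {
            "negative": stem + "ない",
            "volitional": stem + "よう",
            "polite": stem + "ます",
            "past": stem + "た",
            "te": stem + "て",
            "conditional": stem + "れば",
            "imperative": stem + "ろ",
        }
    if verb_class == "godan":
        a, i, e, o, te, ta = GODAN[ending]
        return {
            "negative": stem + a + "ない",
            "volitional": stem + o + "う",
            "polite": stem + i + "ます",
            "past": stem + ta,
            "te": stem + te,
            "conditional": stem + e + "ば",
            "imperative": stem + e,
        }
    raise ValueError("only 'godan' and 'ichidan' are handled in this sketch")


for name, form in conjugate("書く", "godan").items():
    print(name, form)
```

The verb class has to be supplied by the caller because it cannot be read off the spelling reliably (many verbs ending in る are godan), which is one reason the bot would need per-entry data rather than pure string rules.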
@Shinji, some questions for you:
What do you mean by "bound forms"? By some interpretations, all of the above except the Imperative are "bound forms".
  • If you mean the causative and passive, I think omitting these does a disservice to that portion of our user base who might be beginner-level studiers of Japanese. The rules for the passive, for instance, and whether to add れる (-reru) to the 未然形 (mizenkei, irrealis or incomplete form) or られる (-rareru) instead, are easy enough once you know them. But for anyone who doesn't know the rules, this kind of information is typically best presented in conjugation tables. And someone running across a verb form like 食べさせられました (tabesaseraremashita, was made to eat something, polite causative-passive past tense) may well not know that the lemma form is 食べる (taberu, to eat).
I'm not necessarily advocating that we create entries for forms like 食べさせられました, but I do think we need to ensure that a user searching for "食べさせられました" can somehow find their way to the lemma entry at 食べる, and have access (via tables, or links to other pages here or on Wikipedia) to the information needed to make sense of the longer conjugated forms. Perhaps having full entry pages is the best way to do this. Perhaps instead we just need to include the conjugated forms somewhere within the lemma pages. Or perhaps there's an altogether different approach. My concern is ensuring that users are able to find what they need.
Also, do you have any opposition to the inclusion of other polite conjugations, such as the past form -ました (-mashita), or the volitional form -ましょう (-mashō)? Various materials targeting English-speaking learners include the polite forms not as a single row, but as a whole column, showing each of the conjugations. ‑‑ Eiríkr Útlendi │Tala við mig 18:17, 21 July 2017 (UTC)
I meant non-final forms such as mizenkei (ex. 書か-, 食べ-). I should have said stem or radical probably. — TAKASUGI Shinji (talk) 23:55, 21 July 2017 (UTC)
Why not improve on the search function instead? Forms of verbs and adjectives are never ending for agglutinative languages. If the search string contains kana and is Japanese-looking, compare it with a repertoire list of all existing Japanese lemmas and their autogenerated forms. Output terms which are most similar to the search string, with their definition, sorted by increasing Levenshtein distance between the search string and term. That would suggest 食べさせる (tabesaseru)―the causative form of 食べる (taberu, to eat)―as the closest match for 食べさせられました (tabesaseraremashita). I think a similar approach is used by the online Korean dictionary Daum. Wyang (talk) 22:27, 21 July 2017 (UTC)
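The ranking step suggested above could look roughly like the following Python sketch. The names `levenshtein` and `suggest` and the tiny candidate list are invented for the illustration; a real implementation would compare against the full list of lemmas and their autogenerated forms, and nothing here reflects how the site search or Daum actually works.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character of a
                current[j - 1] + 1,      # insert a character of b
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def suggest(query, candidates, limit=3):
    """Return the candidates closest to the query, nearest first."""
    ranked = sorted(candidates, key=lambda term: levenshtein(query, term))
    return ranked[:limit]


# Hypothetical candidate list standing in for the real repertoire of forms.
CANDIDATES = ["食べる", "食べさせる", "飲ませる", "書く"]
print(suggest("食べさせられました", CANDIDATES))
```

With this candidate list the causative 食べさせる comes out first, ahead of the lemma 食べる, which matches the behaviour described above.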
I think it's a great idea to add Japanese inflected forms - verbs and adjectives. They already exist in CAT:Japanese verb forms and CAT:Japanese adjective forms. There's a lot of work though but it can be done with a bot. Care should be taken if a form coincides with another word, as in hiragana spelling of  () (koi) - こい (koi). I support the same for Korean verbs and adjectives and other languages. Recently Persian verb forms were added. The work on search function can be done in parallel. I also think that the forms in the inflection tables should be wikified (linked) as in the majority of inflection tables for other languages. --Anatoli T. (обсудить/вклад) 02:37, 22 July 2017 (UTC)
  • WF would quite like to do it. I remember once, about 10 years ago, WF wrote a bot to add inflected forms of Ancient Greek verbs. Knowing nothing about the language, and with only a smattering of botting experience behind him, he was promptly blocked. --Recónditos (talk) 12:22, 23 July 2017 (UTC)

Sorting Vietnamese

I just noticed that we don't have automatic category sorting for Vietnamese, which has an extremely diacritic-rich writing system. Should we? How does it work? Are the tone diacritics ignored for sorting purposes, so that à ả ã á ạ are all sorted as a? What about the non-tone diacritics? Are ă/â ê ô/ơ ư sorted together with a e o u, or are they sorted separately? And what about đ? Is it equivalent to d for sorting purposes, or are they separate? —Aɴɢʀ (talk) 10:56, 21 July 2017 (UTC)

For pinging purposes: our currently active editors who claim some knowledge of Vietnamese are @Wyang, Atitarev, Fumiko Take, HappyMidnight, Monni95, MuDavid, Mxn, PhanAnh123. —Aɴɢʀ (talk) 11:01, 21 July 2017 (UTC)
Thanks for the ping but I can only confirm the order "a, á, à, ả, ã, ạ" provided by Stephen below. In case it's not obvious, a, ă and â are separate letters (in this order), also the correct order for similar letters is: d, đ; e, ê; o, ô, ơ; u, ư. Digraphs (gi, kh, ng, nh, ph, th, tr) are not separate letters. --Anatoli T. (обсудить/вклад) 04:14, 22 July 2017 (UTC)
This was discussed previously at User talk:Fumiko Take#Sort. Vietnamese dictionaries sometimes have different practices of sorting the diacritics and tones, but I think the method proposed in the linked discussion is a good one to use. That will require that Module:links and Module:languages/data2 provide customisation for sorting so that the sort key can be generated externally by a sorting function (Module:vi-sort of sorts). Wyang (talk) 11:06, 21 July 2017 (UTC)
Based on the thread you linked to, I think at the very least we should edit Module:languages/data2 to strip the tonal diacritics. I can do that right now if there are no objections. Categories already ignore capitalization for sorting purposes for all languages. Anything beyond that would go beyond my editing abilities, but at least I can take the first step. —Aɴɢʀ (talk) 11:42, 21 July 2017 (UTC)
@Wyang: I've modified Module:languages so that the sort_key value in a language's data table can be the name of a module that contains a sortkey-generating function. The function (currently) must be named makeSortKey and it is automatically supplied the arguments text, langCode, scCode, the same arguments that are supplied to transliteration modules. That should allow you to create a Vietnamese sortkey-generating module. — Eru·tuon 19:11, 21 July 2017 (UTC)
My attempt (without knowing of Erutuon's edits):
	sort_key = {
		from = {
			'%-',
			'à', 'ả', 'ã', 'á', 'ạ',
			'ằ', 'ẳ', 'ẵ', 'ắ', 'ặ',
			'ầ', 'ẩ', 'ẫ', 'ấ', 'ậ',
			'è', 'ẻ', 'ẽ', 'é', 'ẹ',
			'ề', 'ể', 'ễ', 'ế', 'ệ',
			'ì', 'ỉ', 'ĩ', 'í', 'ị',
			'ò', 'ỏ', 'õ', 'ó', 'ọ',
			'ồ', 'ổ', 'ỗ', 'ố', 'ộ',
			'ờ', 'ở', 'ỡ', 'ớ', 'ợ',
			'ù', 'ủ', 'ũ', 'ú', 'ụ',
			'ừ', 'ử', 'ữ', 'ứ', 'ự',
			'ỳ', 'ỷ', 'ỹ', 'ý', 'ỵ',
			'ă', 'â', 'ê', 'ô', 'ơ', 'ư',
			'đ',
			'([1-5])([^%s]+)', -- move tone number to end of syllable
			'([a-z₁₂₃]+)([^a-z₁₂₃1-5]+)', -- add tone 0 to syllables that are not followed by a number
			'([a-z₁₂₃]+)$', -- add tone 0 to syllables that are followed by the end of the string
		},
		to   = {
			' ',
			'a1', 'a2', 'a3', 'a4', 'a5',
			'ă1', 'ă2', 'ă3', 'ă4', 'ă5',
			'â1', 'â2', 'â3', 'â4', 'â5',
			'e1', 'e2', 'e3', 'e4', 'e5',
			'ê1', 'ê2', 'ê3', 'ê4', 'ê5',
			'i1', 'i2', 'i3', 'i4', 'i5',
			'o1', 'o2', 'o3', 'o4', 'o5',
			'ô1', 'ô2', 'ô3', 'ô4', 'ô5',
			'ơ1', 'ơ2', 'ơ3', 'ơ4', 'ơ5',
			'u1', 'u2', 'u3', 'u4', 'u5',
			'ư1', 'ư2', 'ư3', 'ư4', 'ư5',
			'y1', 'y2', 'y3', 'y4', 'y5',
			'a₁', 'a₂', 'e₂', 'o₂', 'o₃', 'u₃',
			'd₁',
			'%2%1',
			'%1' .. '0' .. '%2',
			'%1' .. '0',
		}
	},
It can transform the string Tuyên ngôn toàn thế giới về nhân quyền của Liên Hợp Quốc ; công bằng ; Đại ; Ác-si-mét into tuye₂n0 ngo₂n0 toan1 the₂4 gio₃i4 ve₂1 nha₂n0 quye₂n1 cua2 lie₂n0 ho₃p5 quo₂c4 ; co₂ng0 ba₁ng1 ; d₁ai5 ; ac4 si0 met4.
It's a shame that mw.loadData rejects functions (Lua error in Module:languages at line 348: data for mw.loadData contains unsupported data type 'function'); using a function as the third parameter for gsub might have made dealing with diacritics easier.
edit: forgot about 'y' —19:49, 21 July 2017 (UTC)
suzukaze (tc) 19:45, 21 July 2017 (UTC)
@Suzukaze-c, Wyang: I'll see if I can convert that long series of replacements into a function in Module:vi-sortkey, unless either of you is working on a function now. It might be more efficient to first decompose, then handle the diacritics. — Eru·tuon 20:22, 21 July 2017 (UTC)
I've added a subscript 0 for unmodified vowel letters (that is, a plain vowel letter with or without a tonal diacritic), to make sure that the modified ô and ơ sort directly after plain o. Otherwise, I wonder if modified vowel letters would sort in unacceptable positions. (Hypothetical example: ngôn ngo₂n0 should sort directly after ngon ngon0, but perhaps it would sort after ngoy because ₂ would sort after y. So I think ngon should have the sortkey ngo₀n0.) But I don't know how sortkeys work, when non-alphabetic characters are involved, and I could be wrong. Does anyone know if the subscript 0 is needed? — Eru·tuon 21:45, 21 July 2017 (UTC)
Great work on Module:vi-sortkey, thanks. I'm not working on this at the moment, so please feel free to make any changes. Not sure about the sorting algorithm in Lua either, but a good method of testing whether the entries are properly sorted would be to check whether the {{der3|lang=vi}} output using a large number of Vietnamese words is correct. Wyang (talk) 22:20, 21 July 2017 (UTC)
I think I might have been wrong about needing subscript 0, but I'm getting confused now. If someone could look at the documentation page of the module and figure out if the function is working, I would appreciate it. You can process a list of words using the showSorting function on the documentation page of the module. — Eru·tuon 22:58, 21 July 2017 (UTC)
Okay, yeah, my reasoning above was wrong. ngôn should sort after both ngon and ngoy, as ô is a different letter from o. So I think the sort order is fine now. But if someone could confirm, that would be great. — Eru·tuon 23:15, 21 July 2017 (UTC)
Not sure if this will be helpful or not. There is confusion in Western software for Vietnamese in regard to sort order. In Microsoft Word 2010, the order is given as a, à, ả, ã, á, ạ. In Microsoft Excel 2010, it is: a, á, à, ã, , . These are incorrect. The MS Word 2010 order comes from the physical order of the Vietnamese tones on a Vietnamese keyboard. The order of the keys is not the sort order.
The alphabet, in correct order, is: a ă â b c d đ e ê g h i k l m n o ô ơ p q r s t u ư v x y (the twelve vowels being: a, ă, â, e, ê, i, o, ô, ơ, u, ư, y). The six tones are: a, á, à, ả, ã, ạ, in this order. Therefore, the vowel a, including its associated forms ă and â, take up eighteen places in the sort order:
a, á, à, ả, ã, ạ
ă, ắ, ằ, ẳ, ẵ, ặ
â, ấ, ầ, ẩ, ẫ, ậ
Altogether, the 12 vowels plus 6 tones take up 72 places in the sort order. —Stephen (Talk) 01:18, 22 July 2017 (UTC)
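The ordering described above can be sketched in Python as follows. This is only an illustration of the idea (decompose, pull the tone mark off, rank the remaining letter), not the code in Module:vi-sortkey; the names `make_sort_key` and `syllable_key`, and the choice to sort non-Vietnamese letters such as f or w last, are assumptions made for the example.

```python
import unicodedata

# Alphabet in the order given above; digraphs are not separate letters.
ALPHABET = "aăâbcdđeêghiklmnoôơpqrstuưvxy"
LETTER_RANK = {letter: rank for rank, letter in enumerate(ALPHABET)}

# Combining tone marks in the order sắc, huyền, hỏi, ngã, nặng.
# Rank 0 is reserved for ngang (no tone mark).
TONE_MARKS = ["\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]
TONE_RANK = {mark: rank for rank, mark in enumerate(TONE_MARKS, start=1)}


def syllable_key(syllable):
    """Return (letter ranks, tone rank) for one syllable."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = 0
    rest = []
    for ch in decomposed:
        if ch in TONE_RANK:
            tone = TONE_RANK[ch]
        else:
            rest.append(ch)
    # Recompose so that ă, â, ê, ô, ơ, ư become single letters again.
    letters = unicodedata.normalize("NFC", "".join(rest))
    # Assumption: letters outside the alphabet sort after everything else.
    ranks = tuple(LETTER_RANK.get(ch, len(ALPHABET)) for ch in letters)
    return (ranks, tone)


def make_sort_key(text):
    """Return a tuple of syllable keys; hyphens are treated as spaces."""
    return tuple(syllable_key(s) for s in text.replace("-", " ").split())


words = ["mạ", "mã", "ma", "mả", "mà", "má", "đa", "dy", "ngôn", "ngoy", "ngon"]
print(sorted(words, key=make_sort_key))
```

Switching to the other tone order discussed in this thread would only mean reordering `TONE_MARKS`; the letter ranking is unaffected.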
@Stephen G. Brown: I've added the order of tonal diacritics that you describe to Module:vi-sortkey. @Fumiko Take gave the "Microsoft Word 2010" order in the talk page discussion linked above, saying that it was used by the Institute of Linguistics of Vietnam and Vietnam National University Publishing House. I can't verify either claim, but the order can be changed easily if necessary. — Eru·tuon 01:58, 22 July 2017 (UTC)
Interesting, I didn't know it was used for Word. Anyway, I consulted those huge dictionaries published by those institutions, but there don't seem to be any online copies or previews, so I guess you'll just have to take my word for it. ばかFumikotalk 06:02, 22 July 2017 (UTC)
I'm not sure if there's such a thing as a "correct" order. Normally, whenever I recite the tones, it's "ngang, sắc, huyền, hỏi, ngã, nặng", which is how I learned them at grade school. But the dictionaries seem to use either the Tang-poetry-inspired order, or that which parallels the four tones of Middle Chinese (ngang and huyền - level; hỏi and ngã - rising; sắc and nặng - departing/checked). ばかFumikotalk 06:09, 22 July 2017 (UTC)
MS Word is a word processing program, which is what would be needed to compile, edit, and print big Vietnamese dictionaries. It's likely that the Institute of Linguistics of Vietnam and the Vietnam National University Publishing House used MS Word to produce those dictionaries. Twenty-five years ago, they would have had to sort all of the entries by hand, which is a huge job. Usually they had to write each entry on a card, which they then stored in long card-file boxes designed for the purpose. They moved the cards around by hand to achieve sorting, and then they would type the information from the cards. Today, they can use computerized sorting, which is accurate and almost instantaneous. Those institutions and publishers probably accepted the MS Word word order. To do otherwise would have been difficult and expensive. So what does this mean? Maybe Vietnam is accepting this new word order as an official one. You are our expert for Vietnamese, Fumiko, so the decision is yours to make. If the Institute of Linguistics prevails on MS to use a different sort order in the next edition of MS Word, then it will be easy for us to change our word order as well. So whatever you decide is okay with me. —Stephen (Talk) 12:02, 22 July 2017 (UTC)
I'm not aware of any respectable source that uses the "ngang, sắc, huyền, hỏi, ngã, nặng" order (most dictionaries I've seen that do are from inferior publishers who can't even decide whether to use "từ điển" or "tự điển", so it's safe to just disregard them altogether), so I guess you'll just have to go with the "ngang, huyền, hỏi, ngã, sắc, nặng" order. ばかFumikotalk 07:04, 23 July 2017 (UTC)

I've added the sortkey module to the data table for Vietnamese. It currently uses the order given by @Stephen G. Brown, rather than the order of the Institute of Linguistics, but it can be switched easily if @Fumiko Take wants to go with the other order. — Eru·tuon 04:56, 23 July 2017 (UTC)

Dotsies

http://dotsies.org/ Huh. —Justin (koavf)TCM 06:45, 22 July 2017 (UTC)

It totally misses on the fact that the human brain is good at recognising shapes. —CodeCat 10:03, 22 July 2017 (UTC)
Sort of fun. Did the creator lose interest? No tweets since 2013. Equinox 10:08, 22 July 2017 (UTC)
 --Daniel Carrero (talk) 10:30, 22 July 2017 (UTC)

"Proverb" ain't a part of speech

Just a thought: remember how we started to get rid of the Initialism and Abbreviation headers because they aren't actually parts of speech (e.g. BBC functions as a proper noun)? - though we still have lots of relics like TLA. Shouldn't we also get rid of Idiom and Proverb on the same grounds? Obviously it's good to know when something is a proverb (and we could use the normal categories for this, maybe a {{lb|en|proverb}}), but Proverb definitely isn't a PoS. And I never really knew what Idiom was good for. We would still have Phrase as a wastebasket taxon for anything that doesn't fit into another PoS. I'm not too bothered either way, but it feels consistent and logical, especially if we're moving towards some semantic (WikiData?) model where a PoS header needs to represent an actual PoS. What is your opinion? Equinox 08:59, 23 July 2017 (UTC)

Can we not call them Sentence? —CodeCat 10:44, 23 July 2017 (UTC)
They are not guaranteed to be sentences though the entry is likely to be the core of a sentence. DCDuring (talk) 11:55, 23 July 2017 (UTC)
I think that "Proverb" is a more-or-less perfect name to describe a proverb. --Recónditos (talk) 12:18, 23 July 2017 (UTC)
Yeeeees, the question is more whether we should put it in a gloss. "Football" is a good gloss for a lot of your sports journalism trash but you don't put that between the double equals signs. Equinox 12:31, 23 July 2017 (UTC)
I don't see any advantages in putting Phrase instead of Proverb, are there any proverbs which aren't phrases? Crom daba (talk) 13:50, 23 July 2017 (UTC)
The common core of a proverbial expression that takes many forms could be a non-constituent, ie, not a phrase. I don't remember whether we have made entries of that kind. DCDuring (talk) 16:20, 23 July 2017 (UTC)
  • I'm fine with "Proverb", although I'd support getting rid of "Idiom" (which is usually supplanted by {{lb|xx|idiomatic}}). —Μετάknowledgediscuss/deeds 16:22, 23 July 2017 (UTC)
Support highly. Please remember that section headings for POSes are for POSes and not for anything else. Should we also consider "formality" a POS, for instance? PseudoSkull (talk) 02:39, 24 July 2017 (UTC)
The argument in favour of keeping proverb as a part of speech is that they are typically used in a more isolated fashion than other phrases. They can always(?) stand alone, whereas other phrases are woven into a sentence no differently than any other word. I think "Phrase" should go, however, and I can't recall ever seeing "Idiom", but that seems out of place to me. "Phrase" and "Idiom" don't inform the reader how a term/expression is used. "Proverb" on the other hand, does. Andrew Sheedy (talk) 04:33, 24 July 2017 (UTC)
  • A note: Idiom is already explicitly forbidden by WT:ELE, so we don’t need any changes to start getting rid of it. — Vorziblix (talk · contribs) 09:09, 24 July 2017 (UTC)
Proverb, letter, suffix, prefix, symbol, definitions... the easiest solution is not to get rid of “proverb” as a part of speech, but to stop calling our definition-section headings “part of speech” in the first place. — Ungoliant (falai) 15:31, 24 July 2017 (UTC)
I'd support saying "Definitions section" instead of "POS section" in the future. EL could be edited to arrange that if people want. Some entries (Chinese I believe) even have "Definitions" as a POS header. --Daniel Carrero (talk) 15:34, 24 July 2017 (UTC)
I think that would be a good idea. Chinese does use a definitions section sometimes because (I think) many words have ambiguous functions, e.g. with many nouns easily being used as adjectives or adverbs. —Aryaman (मुझसे बात करो) 19:20, 26 July 2017 (UTC)

I would say that a proverb is a kind of set phrase, as hypernym. At least so I would say in Spanish, French and Catalan terminology with its equivalents frase hecha, phrase faite, frase feta. --Vriullop (talk) 07:57, 27 July 2017 (UTC)

Japanese pitch accent requests by Special:Contributions/120.18.168.107

Is it fair to bulk-request Japanese pitch accents, as in インボイス (inboisu)? They are not so readily available in dictionaries. Hardly present in online dictionaries and occasionally available in printed dictionaries and paid apps. --Anatoli T. (обсудить/вклад) 09:38, 23 July 2017 (UTC)

Yes, we should definitely include them. If they are not readily available, that's even more reason for us to provide them. —CodeCat 10:43, 23 July 2017 (UTC)
I've added the ones I could find in Daijirin. Wyang (talk) 11:01, 23 July 2017 (UTC)
@Wyang Thanks. For me, accessing Daijirin has become cumbersome. BTW, shouldn't [ìńbóꜜìsù] show "m" for consistency? --Anatoli T. (обсудить/вклад) 11:42, 23 July 2017 (UTC)
That thing (whatever it is called) is based on the romanisation (inboisu). Wyang (talk) 11:44, 23 July 2017 (UTC)
I've been using Weblio辞書 for Daijirin accents, but unfortunately it doesn't show which vowels are devoiced. --Dine2016 (talk) 16:01, 23 July 2017 (UTC)
@Dine2016 Thanks. I forgot about this resource. It's the only one online, I think. I purchased Daijirin for 17 AUS$ but my Android is now malfunctioning and I have switched to an iPhone. Unfortunately, there is no licence transfer and I am not sure if I will use this current phone for long. It's a problem with purchased apps. --Anatoli T. (обсудить/вклад) 07:19, 24 July 2017 (UTC)
Any Australian IP that does mass "theme" edits with difficult languages like that is probably an Awesomemeos sock, though I can't be certain enough to start playing whack-a-mole with them. It looks like they've figured out how to keep from always geolocating to the same place (though in this case they probably actually are in Sydney), but their approach to editing is pretty distinctive. Chuck Entz (talk) 03:30, 27 July 2017 (UTC)

Request for adminship

The main motivation is to be able to edit JavaScript pages (i.e. gadgets, MediaWiki:common.js, others' JavaScript pages, etc.). Unfortunately, template editor does not allow me to edit JS pages.

What I am going to do:

  • General cleanup of javascript infrastructure.
    • Mainly that includes moving stuff from one place to another.
  • Extract gadgets from MediaWiki:Gadget-legacy.js and also make disabling legacy gadgets in preferences not result in a catastrophe.
  • Modernize gadgets (that is, use jQuery, clean up code, drop deprecated code, etc.)
  • Rewrite LangMetadata (currently defined in MediaWiki:Gadget-TranslationAdder.js) to use modules rather than hardcoded data.
  • Eliminate the use of langrev subtemplates in MediaWiki:Gadget-TranslationAdder.js and possibly add a better autocomplete.
  • Eliminate JsMwApi in favor of mediawiki's own Api.

Let's make Wiktionary great again!

Any objections?--Dixtosa (talk) 12:01, 23 July 2017 (UTC)

We definitely need someone who is willing and able to tackle these issues. There are a few more open issues with translation tables as well:
  • The conversion of the translation adder code to not rely on a fixed table structure, but instead use the translations-cell CSS class which was added to {{trans-top}} some time ago.
  • Deprecation and removal of {{trans-mid}} in favour of CSS-based balancing, which also includes the removal of all balancing-related features from the translation adder. This relies on the previous step.
  • Migrating translation tables to use vsSwitcher, which doesn't need a surrounding div.
  • Redoing the "favourite languages" feature of translation tables, so that favourite languages are shown as a reduced translation table in the table's collapsed state, rather than in the header of the table. This relies on the previous change, since the older NavFrame system does not allow for content to be shown in the collapsed state, whereas vsSwitcher does.
CodeCat 15:53, 23 July 2017 (UTC)
Wiktionary:Votes/sy-2017-07/User:Dixtosa for admin DTLHS (talk) 15:58, 23 July 2017 (UTC)
I'd be particularly in favor of any JS improvements that resolved the intermittent, chronic problem with loss of the show/hide controls and sometimes other functionality implemented in JS. DCDuring (talk) 16:24, 23 July 2017 (UTC)
Sounds great. I can get on board with "MWGA". I second @DCDuring's comment. — Eru·tuon 21:31, 23 July 2017 (UTC)
I am not sure exactly how bot work relates to adminship. But Dixtosa is a name that I trust. So sure. Equinox 22:03, 23 July 2017 (UTC)

Wiktionary:Votes/2017-07/Rename categories

Based on the discussions linked in the vote, I created Wiktionary:Votes/2017-07/Rename categories. This is a large project, so this vote will start in two weeks and then it will end in two months. --Daniel Carrero (talk) 13:35, 24 July 2017 (UTC)

Limiting user vote creations

Is there a limit to how many votes a user can create in a given time? If not, I think there should be. --Victar (talk) 19:17, 25 July 2017 (UTC)

What limit would you like, exactly? --Daniel Carrero (talk) 19:27, 25 July 2017 (UTC)
I think we just need to have every vote approved by at least five editors or so (in the BP or maybe elsewhere, depending on the topic) before it can be created. --WikiTiki89 20:25, 25 July 2017 (UTC)
How would brigading be dealt with? --176.23.1.95 20:41, 25 July 2017 (UTC)
What exactly do you mean by brigading? --WikiTiki89 20:48, 25 July 2017 (UTC)
How to ensure a neutral assessment of the eligibility of a vote? Should votes be made about votes? --176.23.1.95 23:15, 25 July 2017 (UTC)
To clarify, if five editors (or however many we decide) want to have a vote and a hundred editors don't, we would still have the vote, because those five editors approved it. --WikiTiki89 14:51, 26 July 2017 (UTC)
I agree, I think each vote should get pre-approval. I think users should also only be able to create max 2-3 votes per month. --Victar (talk) 21:13, 25 July 2017 (UTC)
I think if each vote is pre-approved, we won't need any rate limit. --WikiTiki89 23:12, 25 July 2017 (UTC)
You say that, until someone puts in 10 vote proposals. --Victar (talk) 23:17, 25 July 2017 (UTC)
If five other editors approve each one, what's the issue? Anyway, I don't think we need formal vote proposals. If everything is done right, the issue should already have an ongoing discussion before it is decided that there needs to be a vote. --WikiTiki89 14:51, 26 July 2017 (UTC)
How would the pre-approval process work? Wiktionary:Votes/pl-2015-09/Coauthoring policy votes suggests this: "The proposed requirement that all policy votes have at least one coauthor, that is, a distinct individual who at the very least makes one edit to the descriptive section of the voting page before it starts, even if just to list themselves as a contributor." That vote was created in 2015 and never started. As of today, the vote does not meet its own requirements to start: it doesn't have two contributors yet. --Daniel Carrero (talk) 12:08, 26 July 2017 (UTC)
By five (or however many we decide) editors mentioning in the discussion of the issue that there should be a vote. --WikiTiki89 14:51, 26 July 2017 (UTC)

At the moment nine different votes are running, created by five different users. Do we need a votes "watchdog"? I think there should be a limit on how long a vote runs for, some run for two months. DonnanZ (talk) 06:18, 26 July 2017 (UTC)

I created this two-month vote: Wiktionary:Votes/2017-07/Rename categories. It has not started yet. I think it was a good idea because it's a large proposal. It gives more time for people to read, think and discuss about it. @Dan Polansky sometimes creates two-month votes too. In my opinion this is not an issue, but I can change the "Rename categories" vote to one month if people prefer. --Daniel Carrero (talk) 09:11, 26 July 2017 (UTC)

Too much time is wasted on these trivial votes. Wyang (talk) 09:18, 26 July 2017 (UTC)

@Wyang What votes would you say are trivial? --Daniel Carrero (talk) 09:36, 26 July 2017 (UTC)
Sorry, I mistyped the ping, here it is again: @Wyang. --Daniel Carrero (talk) 09:37, 26 July 2017 (UTC)
Most of the votes running now. The whole idea of creating a vote after every discussion is just wrong. It continues to encourage uninformed self-assurance over critical analysis of the issues. There have been many examples of counterproductive decisions made in the past as a consequence of relying on collective ignorance; superficially having a decision made by such majoritarian democracy looks good, but it could be really damaging in the long run. An example is the decade-long merge-split-merge vacillation of Chinese. Making it worse is the verbosity of many of the votes, such as Wiktionary:Votes/pl-2017-07/Gallery and Wiktionary:Votes/2017-07/Rename categories. I certainly would not want to read 14,835 bytes for a vote, and should not have been given the chance to in the first place. So much work on entries and developing new gadgets and functionalities could have been finished if the time spent reading the votes had been diverted to it. Wyang (talk) 10:02, 26 July 2017 (UTC)
You mentioned collective ignorance concerning the merge-split-merge of Chinese, so what about having a rule like this: "Only people knowledgeable in [language] (as evidenced by the number Y of edits in [language]) are permitted to vote in issues concerning [language]." What do you think about that?
Apart from the Chinese issue, how are the "Rename categories" and "Gallery" votes "trivial"? They are votes for major changes. I don't claim them to be perfect, they could have problems to solve, but they can't be "trivial". Are there any better ways to try to implement these projects without votes? I've been trying to work under this limit: 1 vote per week. Some people seem to prefer it that way, although the idea of having that formal rule itself failed that vote.
Votes are often much smaller and easier to read than discussions. True, Wiktionary:Votes/2017-07/Rename categories is 14,835 bytes -- but it was based off Wiktionary:Beer parlour/2017/June#Proposal: Clean up, rename and replace "en:" → "English" in all categories which is 31,924 bytes and could still grow. It's OK if you don't want to read it, but please don't vote oppose on "TL;DR" grounds (although you still have that right). It's great to be able to discuss things on the BP, but for major proposals, votes have this advantage: it should be easier to judge the merits of a specific proposal in the vote (often detailing how exactly a policy would be edited) rather than doing something out of a discussion with multiple proposals, where people do not always give clear support, oppose, etc. for each idea, and can change their minds in the middle of the discussion. --Daniel Carrero (talk) 10:31, 26 July 2017 (UTC)
Maybe, just maybe, the matters aren't worth "resolution". When this very point is mentioned in the discussion, it is often ignored by the vote advocate. TL;DR is usually a somewhat polite way of saying: "Not worth my time". DCDuring (talk) 11:27, 26 July 2017 (UTC)
Do you have any examples of votes that aren't worth "resolution", and/or votes where that point was mentioned in the discussion and was ignored by the vote advocate? --Daniel Carrero (talk) 11:42, 26 July 2017 (UTC)

For what it's worth, in Wiktionary:Votes/pl-2016-11/Voting limits the "Proposal 2" failed. It was about implementing this regulation to limit vote creation: "The same person cannot create more than one vote in the span of 7 days. (For example, if someone creates a vote on December 9, then they must wait until at least December 16 before creating another one.)" --Daniel Carrero (talk) 09:27, 26 July 2017 (UTC)
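
As a rough illustration of the arithmetic in the failed "Proposal 2" quoted above, here is a minimal Python sketch. The function names and the constant are made up for this example and are not part of any Wiktionary tool; the sketch only assumes the rule as quoted (one vote per person per 7 days, counted from the creation date).

```python
from datetime import date, timedelta

# Hypothetical constant: the 7-day span quoted from "Proposal 2".
MIN_DAYS_BETWEEN_VOTES = 7


def earliest_next_vote(last_created: date) -> date:
    """Return the first date on which the same person could create another vote."""
    return last_created + timedelta(days=MIN_DAYS_BETWEEN_VOTES)


def may_create_vote(last_created: date, today: date) -> bool:
    """Check whether enough days have passed since the person's previous vote."""
    return today >= earliest_next_vote(last_created)
```

With the example from the proposal, a vote created on December 9 gives December 16 as the earliest date for the next one.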

We may need another vote on that.
It seems to me that votes are poor substitutes for longer-running consensus decisions. They seem to involve forcing resolution of disagreements for the sake of doing so or for the sake of enabling some kind of often premature standardization. With the passage of time some of these matters resolve themselves, others can be resolved more easily as contributors gain more knowledge. Votes force a discussion to take place whether or not participants have had actual experience with the "problem" being addressed. The proposals themselves are often quite amateurish, making the discussion mostly a matter of correcting gross errors and little time to discuss a mature proposal. Most discussion should take place before the vote is initiated. If no one cares enough to participate in the discussion perhaps the matter isn't of sufficient importance or is a "solution" to a non-problem. DCDuring (talk) 11:19, 26 July 2017 (UTC)
If I'm not mistaken, you seem to be talking about the votes concerning "External links" and "Further reading". It could be other votes too. I do think the "commons" links don't fit the "Further reading" section and it's still a problem. The use of "Further reading" was an improvement otherwise, in my opinion. Let me know if you are talking about other votes. --Daniel Carrero (talk) 11:42, 26 July 2017 (UTC)
You know, you could create votes with vote-limiting proposals. --Daniel Carrero (talk) 11:46, 26 July 2017 (UTC)
Even though you said "We may need another vote on that.", I don't think there's anyone creating more than 1 vote every 7 days so the vote-limiting rule that failed the vote is being de facto followed even if it's not a formal rule. Of course, you could be thinking about different vote-creation limitations that we could discuss. --Daniel Carrero (talk) 11:59, 26 July 2017 (UTC)
Nobody would bother with this if there weren't a basic consensus that we now had too many votes. The simplest way to avoid a vote on this would be to recognize that consensus. DCDuring (talk) 12:51, 26 July 2017 (UTC)
True, we seem to have consensus on that. What happens now? --Daniel Carrero (talk) 13:02, 26 July 2017 (UTC)
We have a vote on whether or not we have too many votes. —Aɴɢʀ (talk) 14:53, 26 July 2017 (UTC)
Naturally not now, but at some point later I could create a vote on whether or not we have too many votes.
Suppose we want to create a vote to implement @Wikitiki89's proposal: "have every vote approved by at least five editors or so". Do we need approval from five editors or so to create that vote itself? --Daniel Carrero (talk) 15:15, 26 July 2017 (UTC)
Even if we don't need it, would it hurt to wait until we have it? --WikiTiki89 15:19, 26 July 2017 (UTC)
All we need is to enforce existing rules. We already require prior discussion before a vote. I certainly oppose "votes out of the blue" the way Dan has often created them. —CodeCat 15:24, 26 July 2017 (UTC)
@Wikitiki89 Of course not. We could also ask: "Are there five people willing to approve the idea of creating a vote for the proposal of requiring all future votes to be approved by five people first?"
@CodeCat By "Dan", are you referring to me? --Daniel Carrero (talk) 15:27, 26 July 2017 (UTC)
No, the actual Dan. —CodeCat 15:28, 26 July 2017 (UTC)
Sorry, my mistake. In the future, I'd like to create a vote with the proposal: "require prior discussion before a vote". This would serve as a confirmation vote. I dispute the notion that we do have this rule, but if it passes, this will become a formal written rule. --Daniel Carrero (talk) 15:32, 26 July 2017 (UTC)
Are you actually asking, or are you pointing out that we could ask? I am in favor of this rule, however I think it's too early to create a vote. Let's discuss it more. --WikiTiki89 15:40, 26 July 2017 (UTC)
I'm just pointing out that we could ask. I agree that it's too early to create a vote. I agree with this too: let's discuss it more. --Daniel Carrero (talk) 15:46, 26 July 2017 (UTC)
What should happen now is that, say, a vote or two is removed from the list and proposers show some basic self-restraint, so we don't waste time making a rule that shouldn't be required. DCDuring (talk) 18:32, 26 July 2017 (UTC)
@DCDuring: One vote per week, per person, at most looks good to you? What are the one or two votes that you would like to remove from the list? --Daniel Carrero (talk) 18:37, 26 July 2017 (UTC)
This isn't the first time someone has objected to the constant stream of votes. The solution isn't to pin you to a maximum, it's for you to have some responsibility and create fewer needless votes. (And please don't now drag this into yet another 20 paragraphs of criminal-lawyer-Daniel asking "prove to me which ones are needless". Everyone knows.) Equinox 18:40, 26 July 2017 (UTC)
Which of the current votes are needless? This is a reasonable question, it took fewer than 20 paragraphs to ask. --Daniel Carrero (talk) 18:46, 26 July 2017 (UTC)
You are making AGF very difficult for me. As to my preferences, I'd prefer that several of the proposals that you have proposed and continue to favor, that seem like they will or might win, but which I oppose, be withdrawn. I am not sure whether I would also like it if you simply noticed that you are losing credibility with every argumentative response and acted to preserve whatever credibility remains and even restore it or that you continued on your current path, which might lead to none of your proposals passing and a change of climate on this page. DCDuring (talk) 19:19, 26 July 2017 (UTC)
Geez, I was just asking. I know I write argumentative responses sometimes, I don't think that's necessarily a bad thing. But don't you think you write argumentative responses too? When I see your name in the recent changes, responding to a discussion or vote where I participate, I always think before I read your words "here we go, it's time to read some more criticism against what I did again".
Most of the votes I created have passed, some didn't and I try to learn from them. It's true that if I lost credibility and none of my proposals passed, this would be a strong incentive for me to stop or avoid creating votes.
Of all the current votes, you voted in 7. You supported the vote for Dixtosa to become an admin. You voted "oppose" in all the other 6 vote pages, half of which were created by me (one per request, which I also opposed eventually). I don't think we can just withdraw the votes that I created and you voted oppose. You mentioned "that you have proposed", so do you have anything against me personally? What would it take for you to support a vote? --Daniel Carrero (talk) 19:56, 26 July 2017 (UTC)

Elu Prakrit

This needs a code, preferably inc-elu. Alternative names are "Helu Prakrit", "Helu", and "Elu", and maybe "Old Sinhalese". Descendants include si. It is an Indo-Aryan language (inc). —Aryaman (मुझसे बात करो) 23:08, 25 July 2017 (UTC)

@Aryamanarora elu-prk exists. Madhav P. (talk) 00:30, 26 July 2017 (UTC)
@माधवपंडित: Oh, thanks! —Aryaman (मुझसे बात करो) 01:47, 26 July 2017 (UTC)
To make sure a language doesn't already exist, you can use the search box in Module:languages. — Eru·tuon 02:51, 26 July 2017 (UTC)
@Erutuon On second thoughts I don't think the issue ends here. You cannot use the {{inh|si|elu-prk}} tag even though Sinhalese is its descendant. Also the hyperlink Helu links to the wiki article of some Chinese king. Helu doesn't even have its own category page. Madhav P. (talk) 07:57, 26 July 2017 (UTC)
@माधवपंडित: Aha... Helu is in Module:etymology languages, so it does not have a dedicated category page (except it could have the category Terms derived from Helu). It is currently considered a subvariety of Sanskrit. I can change that if it is wrong. I fixed the Wikipedia link. Is Helu the ancestor of any language besides Sinhalese? — Eru·tuon 08:17, 26 July 2017 (UTC)
Okay, Helu, from the Wikipedia article, looks distinct enough that it can't be considered a variety of Sanskrit. I promoted it to a full-fledged language and added it as the ancestor of Sinhalese. — Eru·tuon 08:26, 26 July 2017 (UTC)
@Erutuon: Thanks a lot! I think only Sinhalese descends from Helu. Helu is Middle Indo-Aryan while Sanskrit is Old Indo-Aryan. Madhav P. (talk) 08:30, 26 July 2017 (UTC)
@माधवपंडित: You're welcome. Hm, I need some more items for the data file: scripts and ancestor (if there is a nearer ancestor than Proto-Indo-Aryan). — Eru·tuon 08:33, 26 July 2017 (UTC)
@Erutuon: An immediate ancestor would be one of the closely related Old Indo-Aryan dialects very close to Sanskrit but of course it'd be undocumented. Can't say about the script... @Aryamanarora what do you think? Madhav P. (talk) 12:41, 26 July 2017 (UTC)
@माधवपंडित: It's most likely Brah, Brahmi script. Is dv Dhivehi a descendant? Wiki says it is a descendant of Maharastri Prakrit but then goes on to say sometimes it's considered a dialect of Sinhalese. —Aryaman (मुझसे बात करो) 13:42, 26 July 2017 (UTC)
@Aryamanarora: Wiki places Helu in association to, if not under Maharastri. I think these two prakrits are more closely related to each other than they are to other prakrits. Madhav P. (talk) 13:45, 26 July 2017 (UTC)
@माधवपंडित: They do seem to be, both of them drop almost all medial consonants, but imo Elu has a completely different phonetic system from the "standard" Maharastri Prakrit. But perhaps the vernacular Maharastri sounded more like Elu than we know. —Aryaman (मुझसे बात करो) 13:58, 26 July 2017 (UTC)
Wikipedia says that Dhivehi descends from Maharashtri Prakrit or Helu in different places on the page, but there are no sources for either claim. (I've added Brahmi script to Helu.) — Eru·tuon 17:48, 26 July 2017 (UTC)
elu-prk isn't a properly formatted code, it should be renamed. —CodeCat 15:25, 26 July 2017 (UTC)
@CodeCat: Would you have an alternative? It would be easy to change the code now, as it's hardly used. — Eru·tuon 17:48, 26 July 2017 (UTC)
Aryaman's original proposal. —CodeCat 17:55, 26 July 2017 (UTC)
Any objections from others? — Eru·tuon 18:00, 26 July 2017 (UTC)

Biblical Hebrew hapax legomena

The ongoing discussion about making Latin a WDL has made me wonder whether we allow Biblical Hebrew hapax legomena (and dis legomena), considering that:

  • CFI no longer considers usage in a well-known work to be sufficient,
  • we treat Biblical Hebrew and Modern Hebrew as the same language,
  • we consider Hebrew a WDL.

In principle, those three facts mean that we would exclude Biblical hapaxes and disses, except for those (like גבינה, זכוכית, and לילית) that have gone on to become regular words of modern Hebrew. How do we want to handle this situation? Shall we:

  1. ban Hebrew words used only once or twice in the entire Hebrew corpus;
  2. divide Hebrew into Modern Hebrew (he, presumably including Medieval Hebrew) and Biblical Hebrew (hbo, presumably including Mishnaic Hebrew), making the former a WDL and the latter an extinct language;
  3. consider all of Hebrew an LDL;
  4. ignore the issue and decide on hapaxes on a case-by-case basis?

Solution 2 is what we've done for Greek, which is divided into grc and el, and solution 4 is apparently what we've mostly done for Latin and what we're currently arguing over. For that reason I'd prefer NOT to apply solution 4 to Hebrew. My preferred solution is 2, but others may disagree. (Personally I think 2 is actually the only logical solution to the Latin Question as well, but this thread isn't for talking about Latin.) —Aɴɢʀ (talk) 12:45, 26 July 2017 (UTC)

This goes back to our old repealed policy of allowing a word used once in a well-known work. The reason we repealed it is that if nobody ever used or talked about that word again, then it probably doesn't need to be included. Thus there are no real hapax legomena in Biblical Hebrew when you include non-Biblical Hebrew, because each of them has been discussed and used later, specifically because of its unusualness in the Bible. --WikiTiki89 14:54, 26 July 2017 (UTC)
Indeed, I suspect you are manufacturing a problem. To follow up on Wikitiki's point, can anyone find even a single Biblical Hebrew entry that would fail RFV under our current rules? —Μετάknowledgediscuss/deeds 15:12, 26 July 2017 (UTC)
There do seem to be true Biblical Hebrew hapaxes [2], but we don't have entries for them yet, either because our coverage of Hebrew skews heavily toward Modern Hebrew, or because people know they wouldn't pass RFV. The words in question may be discussed (i.e. mentioned) later, but are they used later? I know some of them are (I mentioned some above), but all of them? —Aɴɢʀ (talk) 15:43, 26 July 2017 (UTC)
I think you misunderstood me. If you consider the corpus of Biblical Hebrew alone, then of course there are true hapax legomena. But when you consider Hebrew as a whole, including later Hebrew, most, if not all, of these Biblical hapax legomena will be discussed and used again later. --WikiTiki89 15:48, 26 July 2017 (UTC)
No, I understood. My question is, are all of them used (not merely discussed) again? What about the two entries other than פלדה (which is a modern Hebrew word too) in Category:Hebrew hapax legomena? Are they used (not mentioned) at least three times across all stages of Hebrew? —Aɴɢʀ (talk) 15:57, 26 July 2017 (UTC)
Out of those three words, זדה is not actually Biblical Hebrew, but from the Siloam inscription, so it is a different situation that we might need to discuss. The other two are used at least in Modern Hebrew. --WikiTiki89 16:05, 26 July 2017 (UTC)
Then take my "Biblical Hebrew" to mean "all Hebrew from before the 4th century CE" or whatever cutoff point is customary for the line between Mishnaic and Medieval Hebrew. Maybe we can call it "Classical Hebrew". The point remains: if Hebrew is all one language, and that one language is a WDL, and זדה is not used (as opposed to mentioned) at least three times by three different authors, then our current rules do not allow its inclusion. —Aɴɢʀ (talk) 16:16, 26 July 2017 (UTC)
Well, it's silly to put Biblical and Mishnaic Hebrew together on one side and Medieval and Modern Hebrew on the other side. Mishnaic Hebrew is a lot more similar to Medieval Hebrew than to Biblical Hebrew. If anything, the line would be drawn between Biblical and Mishnaic. But regardless, if you mean to talk about examples like זדה, then let's talk about those. The contradiction is between these two points: (a) In the context of Hebrew as a whole, it is not likely that someone would encounter this word and want to know what it means, and so it does not need to be included. (b) If "Epigraphic Hebrew" were to be considered its own language, then this word would be included, as similar words are in ancient languages with even smaller corpora, so it doesn't make sense to exclude it just because it happens to be part of a larger language. I think we need to resolve this contradiction more generally, rather than specifically for Hebrew, as it applies to many other languages, notably the recently-much-discussed Latin issue (although the details of that case are a bit different). --WikiTiki89 18:11, 26 July 2017 (UTC)
The reason I brought up Hebrew specifically is that it is the only other language I can think of besides Latin where we consider the ancient form and the modern form to be one and the same language. Other cases where the ancient form and the modern form of a language are similar enough that it's conceivable to consider them a single language (Greek, Armenian, Icelandic/Norse) have two codes, one for the ancient form and one for the modern. Although on reflection, I guess we have just one code for all stages of Arabic and Chinese as well. At any rate, what this comes down to is the absurd situation we're currently in where a large number of users are saying "Post-1500 Latin is to be treated like either a WDL or a conlang; pre-1500 Latin is to be treated as an extinct language; but they're both the same language", and I wanted to see how we handle parallel situations. It does look like זדה currently does not meet CFI, but I bet if someone were to nominate it for deletion on those grounds, most people would vote to keep it, because generally we do keep words found only in inscriptions of ancient languages. —Aɴɢʀ (talk) 18:28, 26 July 2017 (UTC)
I don't know why you're only considering "ancient" and "modern". English is also a good example: Early Modern English had a lot of forms that we don't include, that we probably would include if it had been its own language. And there are many other languages with this sort of situation. --WikiTiki89 18:43, 26 July 2017 (UTC)
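
The attestation test this thread keeps returning to (a word must be used, not merely mentioned, at least three times by three different authors) could be sketched roughly as follows. This is only an illustration of the rule as stated in the thread, not the full CFI wording; the Citation type, its fields, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One attestation of a word (hypothetical structure for illustration)."""
    author: str
    is_use: bool  # True for an actual use, False for a mere mention


def meets_three_author_rule(citations, required=3):
    """Return True if at least `required` distinct authors actually use the word."""
    # Mentions (discussions of the word as a word) do not count towards the total.
    authors_using_word = {c.author for c in citations if c.is_use}
    return len(authors_using_word) >= required
```

Under this sketch, a word with one inscriptional use and any number of later scholarly mentions would fail, which is the situation described for זדה.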

Mansi varieties

We have been getting a decent influx of Mansi lemmas recently, thanks to @Martinus Poeta Juvenis. This might be a good point to consider if we should treat Mansi as one language or as several.

The Mansi varieties are very different from each other: there are almost no cases where a standard Northern Mansi word has the same shape as its Southern Mansi cognate, and sometimes they are very different indeed (e.g. 'gristle' is Southern /nʲeːrkɤː/, Northern /ńaːriɣ/). In many cases, reconstructions of Proto-Mansi are also available in literature (in this case *ńī̮rɣɜ or *ńē̮rɣɜ). The only written variety is Northern, and its spelling system mostly cannot be extended for other varieties (e.g. there are no signs for /ɤː/, /æ/ or /ɒ/). Inflection differs too: compared to Northern, Southern Mansi has no dual, but has the accusative and comitative cases. A few scholars by now consider "Mansi" to be a language family with up to four individual languages (Northern, Southern, Western, Eastern).

I would suggest:

  • reserve the code mns for Northern Mansi, which is the only living variety;
  • create new codes at least for Proto-Mansi (mns-pro?), Southern Mansi (ugr-sms?) and Central Mansi (ugr-cms?).

I'm not sure if separate Western and Eastern codes are needed at this point: they're a dialect continuum, and we may need a more general Wiktionary discussion at some point about what we want to do with linguistic field data covering dozens of closely related unwritten varieties. Treating everything as a separate language seems ineffective.

pinging also: @Panda10, @Neitrāls vārds, @Mulder1982 and just in case, @Alcenter. --Tropylium (talk) 13:34, 26 July 2017 (UTC)

Are there any attempts at latinisation for those non-literate Mansi varieties? For example I use transcription schemes given in "The Mongolic languages" for normalizing various phonetic spellings of East Yugur, Baonan, Daur, Mogholi and Khamnigan. I've also contemplated making an ad-hoc one for Sary-Yugur, but maybe it would be going too far. Crom daba (talk) 17:33, 26 July 2017 (UTC)
Most dialects have reasonably standardized linguistic transcription schemes, but they're per individual dialect, not dialect group. E.g. the verb 'to stay': Southern koľt-, Eastern: Lower Konda χoľt-, Middle Konda kʷoľt-, Upper Konda kʷuľt-, Western: kuľt-, Northern: χuľt- (= literary хульт-); or the noun 'mold': Southern ka͔šək, Eastern: Lower Konda xāšγə, Middle Konda kē̮səγ, Western: Pelymka kašša, Vagilsk kē̮šša, Northern: xāssi (= literary ха̄сси). It would seem like overkill to add separate entries for all variants. --Tropylium (talk) 19:05, 26 July 2017 (UTC)

References section only for <references/>?

I was under the impression that under recent policy changes, the "References" section should only be used for <references/>, i.e. to show inline references that are present elsewhere in the entry. However, User:Gamren has pointed out that our policy doesn't actually say so. So what is going on? —CodeCat 10:45, 27 July 2017 (UTC)

But under the prevailing regime, we have no policies that haven't been voted on. In each case, what has been voted on is the wording of a specific proposal. DCDuring (talk) 12:49, 27 July 2017 (UTC)
We allow "References" sections with simple bullet points instead of <references/>, as per Wiktionary:Votes/2016-12/"References" and "External sources". The vote did propose to require always using <references/> in "References" sections, but @This, that and the other and @Tropylium opposed the idea of introducing that specific limitation. --Daniel Carrero (talk) 13:22, 27 July 2017 (UTC)
I see. I'm not sure if I understand the difference between the sections then. What would I use to refer to another dictionary which contains an entry for the term? —CodeCat 13:35, 27 July 2017 (UTC)
In the vote I linked above, please see the comments of Tropylium, TTO and @I'm so meta even this acronym (and maybe others). I'm not saying I personally agree or disagree with them, but by voting that way they helped to shape the regulations as they are now. --Daniel Carrero (talk) 13:47, 27 July 2017 (UTC)
Sorry, I did not answer your last question properly. When you want to refer to another dictionary which contains an entry for the term, please use "Further reading". --Daniel Carrero (talk) 13:48, 27 July 2017 (UTC)