Wiktionary:Beer parlour/2016/October

Initialisms etc

Our current policy on initialisms fails to give guidance (or have I missed it?) on the appropriate format. Entries with the "Initialism" header are labelled as "Entries with non-standard headers". I have been adding, and occasionally editing, entries as "nouns" or the appropriate POS.
Do we have a policy? — Saltmarsh^{συζήτηση-talk} 10:11, 1 October 2016 (UTC)

The absence of explicit guidance leaves us with only the other PoS headers. One thing that the Acronym and Initialism headers did was provide pronunciation guidance (e.g., for WHO: W-H-O or who). DCDuring TALK 11:12, 1 October 2016 (UTC)

syllable marks in English pronunciation

I think they don't belong. It's especially problematic in cases like daughter written (US) /ˈdɔ.tɚ/, which would tend to imply that /t/ isn't flapped, which is false. Benwing2 (talk) 21:07, 2 October 2016 (UTC)

Or at least, they should be used only when they clearly convey something useful, as in nitrate vs. coatrack, where the tr in the middle of the two words is pronounced quite differently in one vs. the other due to the morpheme boundary in the latter. Benwing2 (talk) 21:11, 2 October 2016 (UTC)

The syllable marks are useful in my experience for non-native students of English. Some education systems and languages encourage counting and recognition of syllables. English is so arbitrary that explicit syllable marks help the newcomer.

By the way, there is more discussion of syllables at https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2016/September#Stress_marks_and_syllable_marks Bcent1234 (talk) 13:36, 3 October 2016 (UTC)

Pluralization of Acronyms and Initialisms

For an entry like DINK, it is not clear to me whether we want to create an entry for the plural form, which I would usually write as DINKs. This reflects both the plural nature and the fact that the word is based on an acronym/initialism. I don't know if there is any received practice on whether 1) to create the entry at all, 2) to name the entry with the mixed case, and 3) to provide cross-linking from this new entry to the plural word of similar pronunciation (dinks). Could someone give me some guidance? Thanks! Bcent1234 (talk) 13:42, 3 October 2016 (UTC)

@Bcent1234: I created DINKs, because it is a plural word (abbreviation) citable in Google Books (https://books.google.com/). It's like PC -> PCs. If "DINKs" were not citable, we would not be able to create the entry. --Daniel Carrero (talk) 19:58, 3 October 2016 (UTC)

why long marks in Canadian English?

Why does Appendix:English pronunciation indicate long marks for Canadian but not American English? This makes no sense. Canadian English is largely the same as American English and doesn't have any clear distinction between long and short vowels, any more than American English does. Using separate symbols means we can't write (Canada, US) or (US, Canada) or (cot–caught merger, Canada) or similar, even though the two dialects are phonemically identical in most words. Benwing2 (talk) 21:39, 2 October 2016 (UTC)

Apparently @QuartierLatin1968 (interesting pattern of contributions) added this detail when copying the table from w:International_Phonetic_Alphabet_chart_for_English_dialects. Crom daba (talk) 03:12, 3 October 2016 (UTC)

Sorry if including length marks led to difficulties! Yes, I don't contribute much to Wiktionary; I'm more on other projects. Cheers, QuartierLatin1968 (talk) 04:10, 4 October 2016 (UTC)

I am a fan of including length marks in American English for the phonemes /ɑː/, /iː/, /uː/, and /ɔː/. --Wiki Tiki 89 18:29, 5 October 2016 (UTC)

Why? There's no noticeable length on any of these phonemes, any more than any others. Benwing2 (talk) 20:18, 5 October 2016 (UTC)

I don't think that's entirely true. It may be partially true, in that length is not phonemic, but I for example pronounce beet slightly longer than bit and bead a lot longer than bid. Anyway, the way I see it is that /iː/ is just a symbol for a phoneme and the length mark is just part of the symbol, and so choosing /iː/ over /i/ simply makes it more consistent with our UK pronunciations. And now that I mentioned the UK pronunciations, the length situation in the UK is actually not that different from the situation in the US (especially when you take internal variation into account for both countries) except for the fact that there are instances where length is the only distinguishing feature in the UK (like in dared /dɛːd/ vs dead /dɛd/ for at least some speakers). --Wiki Tiki 89 20:35, 5 October 2016 (UTC)

It seems extra cruft to me. Also, in UK English, the lax phonemes /æ ɛ ɪ ɒ ʌ ʊ/ are quite short, shorter than the US phonemes and noticeably shorter than the tense phonemes, whereas e.g. /ɑː/ and /ɔː/ are noticeably long. In the US, however, there's no obvious length difference at all between e.g. bat, bot, bought, or between bad, bod, bawd, so writing /æ/ but /ɑː/ is misleading. We will have to distinguish UK and US English much of the time anyway, so I don't see why it helps that much to distort US notation to accommodate UK notation. Benwing2 (talk) 20:54, 5 October 2016 (UTC)

It's not a distortion. Just because the length difference doesn't hold for all vowels in all environments doesn't mean we should do away with it entirely. --Wiki Tiki 89 21:01, 5 October 2016 (UTC)

Minimal Difference Pairs

Is there an existing template or practice for linking words that lexically differ by one letter, such as bill and bull? Similarly, is there an existing template or practice for linking words whose pronunciations differ by only one sound (at least in some dialects), such as tinned and tend? These are important in teaching a language, so the student knows the importance of pronunciation and knows which other words they may be misinterpreted as saying. I know we have homophones to warn if two words sound alike. Do we have a standard way to describe these other relationships? Bcent1234 (talk) 13:53, 3 October 2016 (UTC)

Spanish: I had actually just started User:Koavf/Appendix:Spanish terms distinguished by similar letters a few days ago. If others think it is a good idea to have these, I would agree. —Justin (koavf)❤T☮C☺M☯ 14:08, 3 October 2016 (UTC)

I was thinking these words would be linked in the word entries, not on a separate page, much like homophones are. By the way, would the perro / pero pair qualify in Spanish? Bcent1234 (talk) 14:14, 3 October 2016 (UTC)

Exactly. This is what I had in mind: differences like <l>/<ll> or <r>/<rr> or <n>/<ñ>. But this brings up the question of inclusion criteria for "minimal difference". Is it just one letter or phoneme? Bill/bull and also lose/loose? What about el/él? I assume that we want to restrict these pairs to a given language so that we aren't linking trivial things across languages like the Anglicized facade to façade, since that will be in the etymology or alternative spellings anyway if they have an actual relationship. —Justin (koavf)❤T☮C☺M☯ 14:27, 3 October 2016 (UTC)

For my students, I want words that are minimally different in writing or pronunciation which yield a totally different meaning, as they cause problems for the student because it is a distinction the non-native speaker hasn't learned is significant. I don't know whether there is a difference between the Anglicized facade and façade, as I don't really speak French, so I can't contrast the English word's meaning with the French word's meaning. Bcent1234 (talk) 15:43, 3 October 2016 (UTC)

Hungarian: I collect them in Appendix:Hungarian pronunciation pairs. A while ago I did link these entries to each other in the Pronunciation section, but other editors did not think it was a good solution. Maybe this time we can come up with a better way. --Panda10 (talk) 17:02, 3 October 2016 (UTC)

Since these are largely subjective (in terms of which groups to create, as well as sometimes what belongs in a group) I think that appendices are a much better choice than main entry space. Such relationships could quickly overwhelm entries. - TheDaveRoss 17:24, 3 October 2016 (UTC)

I personally don't see these words as subjective, as they are created by a mechanical process from an existing word: simply identify a phoneme or a letter, and replace it with another phoneme or letter such that the end result is also a word.

This mechanical process is no different from the one used to create an anagram of a word. There is a part which is difficult to automate, as an anagram may be a word for some folks and not for others, just as a minimal change to a word may be a recognizable word for some folks and not for others. But this isn't really much different from homophones, which may sound alike in one dialect and not in another. Bcent1234 (talk) 22:12, 3 October 2016 (UTC)
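To illustrate the mechanical process described above for spelling-based pairs, here is a minimal Python sketch. It assumes a plain list of headwords as input (the function name `one_letter_minimal_pairs` is hypothetical, not an existing Wiktionary tool) and only finds same-length words that differ by a single substituted letter; pairs involving an added or dropped letter, such as perro/pero or lose/loose, would need an extra deletion step.

```python
from collections import defaultdict
from itertools import combinations


def one_letter_minimal_pairs(words):
    """Return sorted pairs of distinct words that differ by exactly one
    substituted letter (same length, one position different)."""
    # Bucket each word under every "word with position i removed" key.
    # Two distinct words share a key only if they match everywhere except i.
    buckets = defaultdict(set)
    for word in set(words):
        for i in range(len(word)):
            buckets[(i, word[:i], word[i + 1:])].add(word)

    pairs = set()
    for group in buckets.values():
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return sorted(pairs)


print(one_letter_minimal_pairs(["bill", "bull", "ball", "lose", "loose"]))
# [('ball', 'bill'), ('ball', 'bull'), ('bill', 'bull')]
```

Bucketing by "word minus one position" avoids comparing every word against every other word, which matters if the input is a full headword list rather than a handful of examples.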

I meant that it is subjective where you draw the line, you can draw it at a single character which is similar, you can draw it at two characters, etc. - TheDaveRoss 15:55, 16 November 2016 (UTC)

Personally, I can understand making an appendix if there were only a few minimal change words, but I see there as being a LOT of them. I can understand not having them in a single page as the page would grow very large very quickly. I guess you could use the browser "search page" capability to find a word and where it appears in a chain/groups. It just seems more inefficient as a single word may appear in multiple chains/groups. Bcent1234 (talk) 22:12, 3 October 2016 (UTC)

I don't see a need for this. --Wiki Tiki 89 18:30, 5 October 2016 (UTC)

I support adding a 'Minimal pairs' header to our list of standard headers somewhere in the -nyms section. Let editors who want to invest their time do that. I'd have had use of that kind of thing many times in the past. I'm thinking of spelling only for a start. Korn [kʰũːɘ̃n] (talk) 20:27, 5 October 2016 (UTC)

I agree that spelling is the natural basis for the process/method that would create them. Could someone explain the idea and ramifications of doing this in an appendix? As you can see from my comment earlier, I thought making an appendix would involve all the minimal pairs for every word being mentioned in a single document. We currently distribute the cost of the anagrams links and the homophones links in each word. It may not change the cost, but it would make it only visible when you are in a minimal pair grouping. Bcent1234 (talk) 21:50, 18 October 2016 (UTC)

We already have users complaining that they can't find the definitions (especially in English). We also have many complaining about how long it takes to download larger pages, ie, those entries with short headwords that appear in multiple languages that are highly likely to appear in minimal difference pairs. Hiding some of the material that only a linguistics major would love addresses issue one, but not issue two. DCDuring TALK 22:30, 18 October 2016 (UTC)

I am focused on the number of syllables in a word, right now, but understand the argument of bloating individual entries and making the definition hard to find. I will try the Appendix:Minimal_Pairs idea and report back to everyone what I see as the results. Should I continue to report here? or should I report to the current month when I get more information? Bcent1234 (talk) 15:15, 16 November 2016 (UTC)

We could link to the Appendix from an L4-header somewhere below the definitions. Appendices are too obscured from the mainspace in general. Korn [kʰũːɘ̃n] (talk) 15:50, 16 November 2016 (UTC)

Third LexiSession: police

Dear all,

Apologies for writing in non-native English; please fix any mistakes you may encounter in these lines!

The Tremendous Wiktionary User Group, a nice and open gathering of Wiktionarians, is happy to introduce the third chapter of our collective experiment baldly named LexiSession.

So, what is a LexiSession? The idea is to coordinate contributors from different languages to focus on a shared topic, to enhance all projects at the same time! The first LexiSession was about cat, the second about roads and ways. For this third LexiSession, we offer a month - until the end of October - to deal with the police! There is a substantial amount of slang and police codes, including abbreviations of services, and documenting them can help people better understand this domain.

English Wiktionary already has a Wikisaurus:police, but there is still plenty of work to do. If you're up for this LexiSession, please indicate your contributions here! You can also have a look at what other Wiktionarians are doing, on the LexiSession Meta page. We will discuss the processes and results in Meta, so feel free to have a look and suggest topics for the next LexiSessions.

Thank you for your attention, and I hope you will be interested in this new way of contributing. I'll get back to you later this month for an update! Noé (talk) 14:19, 3 October 2016 (UTC)

Update: a Thesaurus of police in French has been created. Info about this LexiSession has been spread to a wide range of other Wiktionaries, but you are still welcome to translate this message for our non-English-speaking mates on other projects, if you can :) Noé (talk) 07:24, 7 October 2016 (UTC)

Time for feedback! It is very hard to know whether any changes have been influenced by LexiSessions. To be honest, it is slightly depressing to invest effort in animating several communities every month without feedback. I am convinced by this idea and I am not begging for contributions, just for comments about this dynamic. It may be improved by another calibration, another schedule, maybe by linking the theme with the monthly Photo challenge or thematic months. I don't know. Maybe by organizing a simultaneous edit-a-thon on Wiktionaries worldwide, but that may be very hard considering the few Wiktionary-specific edit-a-thons held in the past. Well, I stay tuned for comments on LexiSession! Noé (talk) 09:59, 2 November 2016 (UTC)

"OK guys this month do this" won't get us interested.--Giorgi Eufshi (talk) 10:36, 2 November 2016 (UTC)

I feel that the OP could be shorter and less promotional in tone. Along the lines of: "Some people from other Wikimedia projects are currently interested in focusing on the theme 'police'. I see that Wiktionary has Category:en:Law enforcement and Wikisaurus:police linking to a variety of entries. Do you think there's any specific work to be done in these entries on short notice?" -- And @Noé, you could offer yourself to help with any Wiktionary entries, for example by recording your voice as pronunciation examples of words in the theme you want to work for. --Daniel Carrero (talk) 10:46, 2 November 2016 (UTC)

It's very hard for me to strike the right tone: enthusiastic but not promotional, informative but not directive. I don't have skills in community management and my knowledge of English is mainly focused on academic literature, so my writing in this language is not very vivid. Plus, I am not very familiar with English Wiktionary. I may try something different for the next ones... or let someone else deal with your community, if there are volunteers. If not, I may also conclude that LexiSession is not adequate for the way people like to contribute to Wiktionary. Perhaps we don't need to communicate between communities, and should stay divided. I am just afraid of Wikidata's plans to merge our communities without involving us much. I think horizontal co-contributions are better. Maybe through an in-residence operation? I'll be glad to host a Wiktionarian for a week or so, if it helps spark new ideas. Something else?

@Daniel Carrero, I am creating entries and thesauri for each theme and I like to add pictures.

I may record pronunciations if I can find a good mic and a quiet place. There are more than fifty shades of contribution.

Noé (talk) 12:13, 2 November 2016 (UTC)

Contrary to Giorgi Eufshi and Daniel Carrero, I like this idea and only wish I had the patience to contribute more to it. I must have missed the announcement of October's topic, but I did improve a few Hebrew entries during September's streets topic. --Wiki Tiki 89 13:12, 2 November 2016 (UTC)

The idea of LexiSessions doesn't fit the way I contribute particularly well. That said, it did inspire me to add the word for "police" in a couple languages. —Μετάknowledge^{discuss/deeds} 18:02, 2 November 2016 (UTC)

<nods> This is not quite how I tend to contribute, either, but I really appreciate Noé's (and Tremendous Wiktionary Group's) work to improve communication within Wiktionary. TWG was recently recognized as an official Wikimedia users group, and I hope to hear a lot from them during their first official year. Like other regular notifications I rarely take time to respond to TWG's lexisessions, but I do read about them and hope they are successful, and would miss them if they stopped happening. - Amgine/^t·e 19:02, 2 November 2016 (UTC)

I would think the obvious thing to do is to make sure that we have identified the most relevant categories, in this case, Category:Law enforcement (See Category:en:Law enforcement.), determine whether additional (sub?)categorization would be useful and not in conflict with other goals, and make sure all such categories are appropriately populated. In this case there are only 178 entries and one category (for Prison) in Category:en:Law enforcement. In contrast nearly a thousand English noun entries use the word police, mostly in the headword or a definition, but sometimes in less relevant roles such as citations. Sorting this out usefully would require actual engagement with members of TWG to determine what they thought about categories and membership, definitions, and entries. DCDuring TALK 19:36, 2 November 2016 (UTC)

That said, I don't know how one could count on someone being motivated to do the work. DCDuring TALK 19:46, 2 November 2016 (UTC)

I also think these are a good idea, and some of our entries on common and basic words do need updating and revising, but I tend to focus on my own (very numerous) to-do lists. Equinox ◑ 19:39, 2 November 2016 (UTC)

Pronunciation and Etymology

I have made it a practice to put the Pronunciation section under the Etymology section, but in a word like luma I think it is more readable first, as the pronunciation is common to both etymologies. Is there a consensus on this? Bcent1234 (talk) 16:38, 3 October 2016 (UTC)

Personally, I believe it makes sense to place etymology before pronunciation (except in Japanese) and in my experience English entries often use that format. (I don't have actual numbers.) I don't think there's a consensus on this. WT:EL#List of headings has "Etymology" before "Pronunciation", but this has not been voted on yet. --Daniel Carrero (talk) 17:06, 3 October 2016 (UTC)

I put pronunciation before etymology, because that way, the order is the same whether there is one etymology section or several. It's more consistent that way. —CodeCat 17:10, 3 October 2016 (UTC)

By the way, note that the word is "etymology" not "entymology" (possibly you are getting it mixed up with entomology). Mihia (talk) 17:58, 3 October 2016 (UTC)

Thanks. I fixed my typos. I'm aware of the two words and the difference in meaning (history & bugs) but struggle sometimes. Bcent1234 (talk) 18:10, 3 October 2016 (UTC)

I put Etymology first, then Pronunciation, because the pronunciation may change due to the Etymology. For instance, English record (noun) and record (verb) each have distinct Etymology sections with nested Pronunciation sections. However, in the event that multiple Etymologies have identical Pronunciations, I will often place the Pronunciation outside (i.e. first) in order to conserve space on the page. Leasnam (talk) 18:14, 3 October 2016 (UTC)

I find that it's much more common across languages for the same pronunciation to apply across all etymologies, than for the pronunciations to differ. So this is the case that we should base it on. Which is why I put pronunciation before etymology. The first section after etymology should always be the POS section, except when there are etymology-specific pronunciations. The same logic applies to alternative forms as well, but since we now have a vote to allow putting them under the POS section, that point is moot. —CodeCat 18:20, 3 October 2016 (UTC)

There was a long discussion about this a few months back. The consensus was that the editors of each language should decide which order is the most reasonable for their languages - and I think, but this is a part I'm not at all sure of, that within one language, the order should be fixed. If you want to search for it, I believe it was in the Beer Parlour and involved Latin, Russian and Japanese examples, the Latin one being the entry auraria, if I'm not mistaken. Korn [kʰũːɘ̃n] (talk) 19:56, 3 October 2016 (UTC)

@CodeCat I have complained to you about this before. If existing entries do things a particular way, you need to follow that way even if you think a different way is more logical. In particular, if a given language tends to put etymology before pronunciation, you need to follow that. Benwing2 (talk) 20:12, 3 October 2016 (UTC)

Indeed. A lack of consistency makes Wiktionary harder to use, and also makes us look very disorganized. Now, on that note, CodeCat also tends to split nouns and verbs with a common origin into two etymologies on the basis of one having come from the other and thus having a marginally different origin. I don't strongly disagree with this, but it leads to a lot of inconsistency. Am I justified in merging these etymology sections, or is there a sufficient lack of consensus on this that CodeCat's position is equally valid? Andrew Sheedy (talk) 02:39, 4 October 2016 (UTC)

There already is consensus that consistency should take a second place to reasonable order, though. For me, the information that a word only secondarily derives from a homophone is relevant information and keeping it under a separate etymology seems cleaner to me. Korn [kʰũːɘ̃n] (talk) 07:37, 4 October 2016 (UTC)

I'm not sure if consistency should take a second place to reasonable order, (I mean, judging entries case-by-case, right?) but it does sound like something other people may be likely to support. Either way, I don't think there's evidence of actual consensus for that yet. Please let me know if this was discussed before. --Daniel Carrero (talk) 08:02, 4 October 2016 (UTC)

@Daniel Carrero, Bcent1234 Wiktionary:Beer_parlour/2016/January#About:_Pronunciation_1.2C_Pronunciation_2.2C_Pronunciation_3 - You actually initiated that discussion and created a failed vote from it: Wiktionary:Votes/2016-02/Multiple_pronunciation_sections#Decision. Korn [kʰũːɘ̃n] (talk) 09:16, 4 October 2016 (UTC)

Yes, but the discussion was basically about numbered pronunciation sections, like "Pronunciation 1". I don't think it is a great indicator of what we do concerning entries which have non-numbered etymologies and pronunciations. --Daniel Carrero (talk) 09:23, 4 October 2016 (UTC)

A good part of the discussion was about the relationship and order of the Pronunciation and Etymology headers; it's the only conversation relevant to the question here that I know of. Korn [kʰũːɘ̃n] (talk) 10:45, 4 October 2016 (UTC)

Fair enough. --Daniel Carrero (talk) 10:54, 4 October 2016 (UTC)

I didn't realize this was an issue. I thought our longstanding practice was to put ===Pronunciation=== after ===Etymology===, unless there are multiple etymologies and all have the same pronunciation, in which case ===Pronunciation=== comes before ===Etymology 1===. —Aɴɢʀ (talk) 21:04, 3 October 2016 (UTC)

I believe the long discussion mentioned above (the one with auraria as an example entry) was Wiktionary:Beer parlour/2016/January#About: Pronunciation 1, Pronunciation 2, Pronunciation 3, which was followed by Wiktionary:Votes/2016-02/Multiple pronunciation sections. The discussion was basically about the existence of numbered pronunciation sections: "Pronunciation 1", etc. The order between non-numbered etymology and pronunciation was at best a secondary issue in that discussion. --Daniel Carrero (talk) 07:56, 4 October 2016 (UTC)

Suggestion: Rule in EL about not linking back misspellings

I suggest eventually adding this rule to WT:EL (section WT:EL#Alternative forms). Apparently, this is a rule we already follow, so I figure it shouldn't hurt to formalize it:

"Misspellings link to the correct spellings, but correct spellings do not link back to misspellings. Don't link marshmallow (correct spelling) to marshmellow (misspelling)."

--Daniel Carrero (talk) 19:31, 3 October 2016 (UTC)

I heartily agree with this policy. Students don't need to know how many ways there are to misspell a word. They can come up with those on their own. Linking back to the correct spelling is useful, as it changes the page to the proper page. Is this more than just a #REDIRECT, or is there a need to have a true page for the misspelled word? Bcent1234 (talk) 19:48, 3 October 2016 (UTC)

But theoretically one can misspell an already misspelled word. There are at least two levels of deliberate misspelling: pr0n is a deliberate misspelling of pron, which in turn is one of porn.--Giorgi Eufshi (talk) 07:53, 4 October 2016 (UTC)

I would oppose transforming all misspellings into redirects. To be fair, even if we decided to do that, pr0n is a deliberate misspelling and thus a word in its own right, and I believe words like that could be "spared" and kept as normal entries. Still, there are probably a few entries for misspellings which can't be redirects because they are spelled the same as other normal words. The Portuguese entry trás is both a normal preposition and a misspelling of a verb form. --Daniel Carrero (talk) 04:06, 5 October 2016 (UTC)

This rule has nothing to do with entry layout and does not belong in WT:EL. I don't think it really needs to be codified at all, it hasn't been a problem. --Wiki Tiki 89 18:32, 5 October 2016 (UTC)

I agree that WT:EL is not the right place for this. But I tend to feel that the misspelling thing has grown beyond what it should be, and we now have a lot of misspelling entries that don't really deserve to exist. Equinox ◑ 19:48, 5 October 2016 (UTC)

This isn't even about them existing, but about linking to them. I don't think we have an epidemic of that and I don't think anyone would object to removing those links. --Wiki Tiki 89 19:49, 5 October 2016 (UTC)

I'm under the impression that "entry layout" would encompass "what we should put and not put in an entry". For example: Should we add translation tables, and where? Don't add translation tables inside Finnish entries! -- Because WT:EL says we shouldn't! --Daniel Carrero (talk) 22:00, 5 October 2016 (UTC)

I see your point. However, I still don't think we need to have an explicit policy about this. --Wiki Tiki 89 22:05, 5 October 2016 (UTC)

I'm afraid the current WT:EL#Alternative forms might imply that we should list misspellings in the alternative forms section, because it is listed equally with other entry variations. The current text was voted approved at Wiktionary:Votes/pl-2015-10/Entry name section as part of the "Entry name" section, but I moved it to the current section without a vote some time ago, because it seemed to make more sense. If we clarified that misspellings should not link back to entries, I believe it would be an improvement. --Daniel Carrero (talk) 21:58, 5 October 2016 (UTC)

Unless anyone objects to this change, that is no longer a problem. --Wiki Tiki 89 22:05, 5 October 2016 (UTC)

I support your change, it looks good. I agree that my concern above is no longer a problem.

I'd still like to create a vote eventually, introducing that rule about not linking back misspellings, just because it's something we already do. This does look like a good "layout" rule. --Daniel Carrero (talk) 22:09, 5 October 2016 (UTC)

Why can't we focus on the issues that matter first? --Wiki Tiki 89 22:16, 5 October 2016 (UTC)

I agree that this rule is not very important -- still, many of my EL votes are about formalizing unwritten rules that we already follow. I created most of the 2016 votes (WT:VTIME) and a chunk of the votes in previous years. If we wait for all big problems to be solved we won't ever get to tackle small issues. I could even say: "I'm satisfied that we have reviewed all the current pronunciation text and our list of POS sections, and that EL even finally mentions that prefixes and suffixes usually have a hyphen in the entry title. I'm so happy that I will look for some small stuff to solve now." At the moment, I'd say that Wiktionary:Votes/2016-07/Request categories and Wiktionary:Votes/pl-2016-09/Placement of "Alternative forms" 2 (weaker proposal) are examples of really important current votes, and some other votes that I created may be less important compared to them. I prefer to have a small number of major votes at a time, because they require more thought and discussion and are harder to pass. Adding a simple "don't link back misspellings" rule should be a no-brainer, in my opinion. --Daniel Carrero (talk) 22:45, 5 October 2016 (UTC)

I agree with having common misspellings link or redirect to the correct spelling. However, it concerns me that having misspellings as headwords means that these get picked up in word lists and presented as if correct. For example, if you type bizzare into onelook.com, it lists the Wiktionary entry with no indication that it is misspelled. Someone looking for confirmation of spelling might take that as such and not look further. Mihia (talk) 00:17, 7 October 2016 (UTC)

You're right. I agree that this is a valid concern. Redirecting all misspellings when possible would fix that problem. When an entry is both a misspelling and an actual word, we could use some soft redirect like "You may be looking for marshmallow." or something. --Daniel Carrero (talk) 00:29, 7 October 2016 (UTC)

I guess there are some entries that have both misspelling and correct-spelling senses in one entry, and those unavoidably must be linked. Found one: apart --Octahedron80 (talk) 00:39, 7 October 2016 (UTC)

CFI and idiomaticity clarification

Based on User talk:Renard Migrant#CFI and idiomaticity clarification, I created Wiktionary:Votes/pl-2016-10/CFI and idiomaticity clarification. --Daniel Carrero (talk) 04:02, 5 October 2016 (UTC)

Creative Commons 4.0

Hello! I'm writing from the Wikimedia Foundation to invite you to give your feedback on a proposed move from CC BY-SA 3.0 to a CC BY-SA 4.0 license across all Wikimedia projects. The consultation will run from October 5 to November 8, and we hope to receive a wide range of viewpoints and opinions. Please, if you are interested, take part in the discussion on Meta-Wiki.

Apologies that this message is only in English. This message can be read and translated in more languages here. Joe Sutherland (talk) 01:34, 6 October 2016 (UTC)

About the smallest discussions

Based on this request by @Korn, I created Wiktionary:Smallest discussions and added a new "smallest discussions" box in the watchlist. Feel free to discuss/revert/etc.

Note: There's a minor bug that could be annoying. The displayed entries are only properly formatted if the headings have spaces between the title and the equal signs. So, == example == works but ==example== would not work. All listed entries have spaces in the headings as I described, so this bug could go unnoticed for a while. I don't know how to fix it. --Daniel Carrero (talk) 10:29, 6 October 2016 (UTC)
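For the heading bug described above, one way to accept both `== example ==` and `==example==` is to make the spaces optional in the match and strip the title afterwards. The actual module is not shown here, so this is only a hypothetical Python sketch (the name `heading_title` is made up) of the matching logic, limited to level-2 headings:

```python
import re

# Level-2 heading: "==", then a title that neither starts nor ends with "=",
# then "==". Spaces around the title are optional: they are captured and stripped.
HEADING_RE = re.compile(r"^==(?!=)(.+?)(?<!=)==\s*$")


def heading_title(line):
    """Return the title of a level-2 heading line, or None if it isn't one."""
    match = HEADING_RE.match(line)
    return match.group(1).strip() if match else None


for line in ["== example ==", "==example==", "=== sub ===", "plain text"]:
    print(repr(line), "->", heading_title(line))
```

The lookahead and lookbehind keep deeper headings such as `=== sub ===` from being read as level-2 headings with a stray `=` in the title.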

Can we make the watchlist box collapsible? BTW, I plan on going through the module and reworking it, including fixing the section header bug. --Wiki Tiki 89 13:58, 6 October 2016 (UTC)[reply]

Personally, I slightly prefer the un-collapsed watchlist box, but it's fine if other people want it collapsed. Thank you for fixing the module and reworking it. Apparently you successfully fixed the bug that I mentioned above. --Daniel Carrero (talk) 19:44, 6 October 2016 (UTC)[reply]

About WT:SD

WT:SD was an old, barely used redirect to Category:Candidates for speedy deletion. I created CAT:SD for it. I'd like to avoid having redirects from WT: to Category: when possible. I edited all pages that were using this shortcut and pointed it to Wiktionary:Smallest discussions. I suppose it's OK? --Daniel Carrero (talk) 19:44, 6 October 2016 (UTC)[reply]

I'll repeat what I said on your talk page: Don't forget that these shortcuts are not only for links but also for the search bar. So the number of pages that use it is only half the picture. --Wiki Tiki 89 21:37, 6 October 2016 (UTC)[reply]

Point taken. I don't suppose I should revert what I did? CAT:SD really is better than WT:SD. It's unlikely that many people used "WT:SD" in the search bar to mean Category:Candidates for speedy deletion, otherwise chances are they would use it more in actual discussions. At least, in the last months, barely anybody even accessed that specific redirect page. (see access counter) --Daniel Carrero (talk) 21:55, 6 October 2016 (UTC)[reply]

Why is it "better"? No one ever needs to discuss Category:Candidates for speedy deletion. It's just a page that admins check once in a while. I have restored the deleted WT:CSD, and re-added both shortcuts to the category page. We can re-check later to see whether CAT:SD is actually more popular than the old ones. --Wiki Tiki 89 22:01, 6 October 2016 (UTC)[reply]

The thing with the shortcuts and the search bar is another thing that nobody tells you in an easy-to-find manner. Let me just mention user:Korn/draft again. Korn [kʰũːɘ̃n] (talk) 23:12, 6 October 2016 (UTC)[reply]

I used WT:SD (and WT:FSB btw) --Derrib9 (talk) 08:14, 7 October 2016 (UTC)[reply]

September News of French Wiktionary

Hi all,

French Wiktionary is publishing a monthly page of fresh news about the project, named Actualités. In August, we started to translate our editions into English, to give more visibility to what is going on with French Wiktionary. So after August Actualités, here is September Actualités. Translations have been made by Pamputt and me, probably with some mistranslations. So, be gentle on the language; it is still a wiki and it is collaboratively improvable. We are very interested in any comments you may share about this publication, and in knowing what might be of interest to you for the next edition! Noé (talk) 07:37, 7 October 2016 (UTC)[reply]

I just wanted to highlight the article on Synonyms which also serves as a metric for development/maintenance of the Thésaurus. Useful pseudo-objective concept. - Amgine/^t·e 15:02, 7 October 2016 (UTC)[reply]

Dutch nouns with gender-based meanings

This topic came up while editing zegel (cf Talk:zegel). Genders are a tricky topic in Dutch: some nouns have different meanings depending on the gender used, in this case het zegel (“seal”) and de zegel (“stamp”). Right now both senses are in one entry, with the gender mentioned in the definition line. Is there a better way to achieve this? Maybe split them into two completely separate entries? – Jberkel (talk) 09:49, 7 October 2016 (UTC)[reply]

You can have two separate noun headings to take into account different genders, yes. Renard Migrant (talk) 17:12, 7 October 2016 (UTC)[reply]

That's what we normally do. See Swahili spika, for example. —Μετάknowledge^{discuss/deeds} 17:51, 7 October 2016 (UTC)[reply]

But those have different etymologies. They have different noun classes, which is equivalent to having different declensions in, say, Latin. The choice of noun class is an inherent part of the borrowing process; why else would there be two outcomes? In fact, it's equally possible that only one of them was actually borrowed, and the second was derived from the first. The etymology doesn't currently clarify this. —CodeCa t 19:52, 7 October 2016 (UTC)[reply]

Well, they do come from different senses of speaker. It's unparsimonious to suggest that one was derived from the other, considering the senses involved. —Μετάknowledge^{discuss/deeds} 21:04, 9 October 2016 (UTC)[reply]

Standardizing Template:calque

I happened to be using this template to note a few Latin grammatical terms (for instance, optativus) that are calques of Greek terms. The template is rather annoying to use: it doesn't take the "from" language in the second parameter, the term in the third parameter, the link text in the fourth parameter, and the translation in the fifth parameter, as {{der}}, {{inh}}, and {{bor}} do, but instead uses |etyl lang=, |etyl term=, and |etyl t=. I think it should be standardized to use the same parameters as the other etymology templates. Would anyone disagree? I see it uses Module:etymology/templates, so I am not sure how to make the changes myself — Eru·tuon 18:50, 7 October 2016 (UTC)[reply]

I agree, but let's discuss what the interface should be before making any changes. --Wiki Tiki 89 18:53, 7 October 2016 (UTC)[reply]

@Wikitiki89: Well, if this is what you mean, how about this, for the example I gave above: {{calque|la|grc|εὐκτική||related to wishing}} > Calque of Ancient Greek εὐκτική (euktikḗ)? — Eru·tuon 19:06, 7 October 2016 (UTC)[reply]

The current setup is {{calque|fr|etyl lang=en|etyl term=light year}}. I'd say the new setup should be {{calque|fr|en|light year}}, but that won't work right away because the older setup, {{calque|année|lumière|etyl lang=en|etyl term=light year|lang=fr}} is still accommodated. Perhaps we need a new template with the new setup during the period of transition; {{cal}} is currently a redirect to {{calque}} but it's used on only a handful of pages. Maybe we could separate {{cal}} for now, make it follow the new setup, correct the 13 pages it's used on, and then gradually migrate uses of {{calque}} with the old setup to {{cal}} with the new. Then, once no pages are using the old setup anymore, we can move {{cal}} back to {{calque}} (deleting the old template and leaving a redirect) in order to achieve our status quo of having a short-name template redirect to a long-name template. —Aɴɢʀ (talk) 19:08, 7 October 2016 (UTC)[reply]

@Erutuon: Don't forget that the current {{calque}} template supports a lot more features, such as showing the component parts in the calquing language as well. The new template would either have to handle that or we would have to figure out how to reformat that outside of the template. --Wiki Tiki 89 19:11, 7 October 2016 (UTC)[reply]

@Wikitiki89 Hmm. If that functionality should be supported, then perhaps the syntax would be {{calque|la|grc|εὐκτική||related to wishing|εὔχομαι|-τικός}}, but I don't quite understand how it works in {{calque}}... — Eru·tuon 19:18, 7 October 2016 (UTC)[reply]

That feature is already deprecated in the current version of the template. It's better to use the variety of morphology templates we have such as {{affix}} and {{compound}}. —CodeCa t 19:20, 7 October 2016 (UTC)[reply]

We don't need to create a new template, we can make the existing one support both old and new usage in the same way that {{bor}}, {{prefix}} and {{suffix}} do. In each of those templates, the presence of lang= is tested for, and if it's absent, then the new parameter format is used, otherwise it falls back to the old one. {{calque}} already does this too, but only with lang=, not with the other parameters. However, etyl lang= is set as a required parameter by the module, so we have the guarantee that all existing uses have that parameter. This means that we can use its presence to switch between old and new behaviour in the same way. If etyl lang= is present, then use the old parameters, otherwise use the new ones. —CodeCa t 19:26, 7 October 2016 (UTC)[reply]
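The switch described above can be sketched as follows. This is Python rather than the Lua of Module:etymology/templates, the function name and the returned keys are hypothetical, and positional parameters are represented as integer keys 1, 2, 3 and so on. The new layout follows the one proposed earlier in this thread (language, source language, term, alt text, gloss).

```python
def normalize_calque(args):
    """Map either {{calque}} parameter layout onto one hypothetical shape.

    `args` holds the template parameters: named ones under their names,
    positional ones under integer keys 1, 2, 3, ...
    """
    if "etyl lang" in args:
        # Old layout, e.g. {{calque|fr|etyl lang=en|etyl term=light year}}.
        # In the oldest layout the target language is in lang= and the
        # positional parameters are component parts (ignored in this sketch).
        lang = args["lang"] if "lang" in args else args.get(1)
        return {
            "lang": lang,
            "source": args["etyl lang"],
            "term": args.get("etyl term"),
            "alt": None,
            "gloss": args.get("etyl t"),
        }
    # New layout, as with {{der}}, {{inh}} and {{bor}}:
    # 1 = language, 2 = source language, 3 = term, 4 = alt text, 5 = gloss.
    return {
        "lang": args.get(1),
        "source": args.get(2),
        "term": args.get(3),
        "alt": args.get(4) or None,  # an empty 4th parameter means "no alt text"
        "gloss": args.get(5) or None,
    }
```

Because etyl lang= is required in the old layout, its absence is a dependable signal that a call uses the new one, so both layouts can coexist in one template during a migration.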

Yes, I see now that it doesn't do anything too fancy anyway. I completely agree then. And I agree with CodeCat that we don't need a new template. --Wiki Tiki 89 19:30, 7 October 2016 (UTC)[reply]

I've implemented my proposal now. All existing entries should still work, but see optativus, which now uses the new parameter format. That said, I'd like it if @Erutuon added a Latin etymology to the entry as well, to show which elements the word was constructed from when calqueing. —CodeCa t 19:47, 7 October 2016 (UTC)[reply]

@CodeCat Done. — Eru·tuon 19:56, 7 October 2016 (UTC)[reply]

Deprecating glosses as the fourth positional parameter of `{{m}}` and `{{l}}`

I have just added the parameter t= as an alias for gloss= for templates {{m}} and {{l}}. We have already been using t= for this purpose in templates such as {{der}} and {{cog}}. I think having this shorter alias should enable us to transition away from using the fourth positional parameter for this. Thus, instead of {{m|fr|école||school}}, we will have {{m|fr|école|t=school}}. The main advantage of this is that we will no longer have to deal with the confusing empty parameter in between, which often causes errors. And especially, it enables us to more logically arrange the parameters when a transliteration is involved, for example {{m|he|בַּיִת|tr=báyit|t=house}} is much more logical than any of the current possibilities of {{m|he|בַּיִת|tr=báyit||house}}, {{m|he|בַּיִת||tr=báyit|house}}, or {{m|he|בַּיִת||house|tr=báyit}} (all of which actually do occur). What does everyone think? --Wiki Tiki 89 19:55, 7 October 2016 (UTC)[reply]
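To see why the empty parameter is error-prone, recall that positional parameters are numbered only among the unnamed parts of a call. The Python sketch below uses hypothetical helper names (parse_call, resolve_gloss) and a deliberately simplified parser; it is not the Lua code behind {{m}} and {{l}}, and the "gloss given at most once" check is an assumed design choice. It shows that all three orderings quoted above put the gloss in parameter 4, while t= names it directly.

```python
def parse_call(text):
    """Split a simple template call into (name, args).

    Hypothetical helper: it ignores nested templates and links, and treats
    any part containing "=" as a named parameter.
    """
    name, *parts = text.strip("{}").split("|")
    args, position = {}, 0
    for part in parts:
        if "=" in part:
            key, value = part.split("=", 1)
            args[key.strip()] = value
        else:
            position += 1  # named parameters do not advance the count
            args[position] = part
    return name, args


def resolve_gloss(args):
    """Return the gloss from t=, gloss= or the 4th positional parameter.

    Assumed rule for this sketch: at most one of them may be non-empty.
    """
    given = {key: args[key] for key in ("t", "gloss", 4) if args.get(key)}
    if len(given) > 1:
        raise ValueError("gloss supplied more than once")
    return next(iter(given.values()), None)


# The three positional variants and the proposed t= form all yield "house".
for call in (
    "{{m|he|בַּיִת|tr=báyit||house}}",
    "{{m|he|בַּיִת||tr=báyit|house}}",
    "{{m|he|בַּיִת||house|tr=báyit}}",
    "{{m|he|בַּיִת|tr=báyit|t=house}}",
):
    print(resolve_gloss(parse_call(call)[1]))
```

Raising an error rather than silently picking one value is only one option; a template could also prefer t= and track the conflict in a maintenance category.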

I know this section looks scary because of all the template code, but it affects everyone and needs input. --Wiki Tiki 89 18:08, 10 October 2016 (UTC)[reply]

I support deprecation of the parameters "gloss" and even the fourth parameter in favor of a short parameter, like "t". --Z 18:19, 10 October 2016 (UTC)[reply]

I'm not yet convinced this is necessary. Benwing2 (talk) 18:30, 10 October 2016 (UTC)[reply]

I support that.--Dixtosa (talk) 18:31, 10 October 2016 (UTC)[reply]

By the way, this was discussed before back when we were first converting our link templates to lua (I think). However, I cannot seem to find that discussion (maybe someone else remembers where/when that was?). What I do remember, is that people did support it, but we didn't go through with it because I guess no one thought of using t= for it. --Wiki Tiki 89 19:09, 10 October 2016 (UTC)[reply]

Duplication of definitions for spelling and other minor variants

It seems obvious to me that definitions should not be duplicated across entries for spelling variants and other minor variations; for example, pedestrianise and pedestrianize, or Down syndrome and Down's syndrome. However, sometimes in the past (probably quite a while ago) when I have tried to merge definitions for such entries, the merge has been reverted with an explanation that the Wiktionary convention is to keep the definitions separate. I do notice, though, that certain entries such as labour and labor, which previously had duplicate definitions, now have the definitions all in one place. May I assume that common sense has now prevailed, and that it is OK to merge all definitions to one of the variants? Mihia (talk) 20:20, 7 October 2016 (UTC)[reply]

I hope so. —CodeCa t 20:24, 7 October 2016 (UTC)[reply]

@CodeCat What would be the best way to accomplish this? Transclusion? A template? A bot which monitors duplicate definitions? I have to admit, I have wondered this in the past myself... —Justin (koavf)❤T☮C☺M☯ 20:37, 7 October 2016 (UTC)[reply]

colo(u)r has been particularly contentious since the early days; see Talk:colour, Talk:color. Equinox ◑ 20:39, 7 October 2016 (UTC)[reply]

I always use US spellings even though I'm British just to have some consistency. I'd happily lemmatize just color and not colour if for no other reason than consistency. Renard Migrant (talk) 23:17, 7 October 2016 (UTC)[reply]

I agree (actually, I use ours sometimes and theirs sometimes, depending on the topic or usage region) but I think our various resident Anglo-Saxonists would be horrified. Relatedly, I believe WP has a policy of using (or at least not subsequently changing) UK spelling for UK topics, and so on; such a rule could in theory be applied to some kinds of dictionary entry. Equinox ◑ 23:21, 7 October 2016 (UTC)[reply]

Should they be called Anglo-Saxophones? DCDuring TALK 23:31, 7 October 2016 (UTC)[reply]

Ideally there should be a way of creating an entry called, for example, "colour or color", so that there is no preference except for the order. "color" and "colour" should then both point to that. Usage of the spellings can be explained somewhere in the single article. Failing that, I believe that the person creating the entry should choose where to place the definitions, where the topic is not obviously nation-specific. I do not support making all headwords American spellings by policy since that will give the impression that Wiktionary is an American English dictionary. Mihia (talk) 00:10, 8 October 2016 (UTC)[reply]

Unfortunately, though, that doesn't work with Wiktionary's multilingual aspect. There are entries for other languages at both color and colour. Andrew Sheedy (talk) 00:37, 8 October 2016 (UTC)[reply]

I agree that is an obstacle, but maybe there is some way around it. In any case, the present situation with colour and color, where all the content -- definitions, translations, and the rest of it -- is duplicated, is clearly ridiculous, in my opinion. Mihia (talk) 01:11, 8 October 2016 (UTC)[reply]

While this may sound crazy, what if we made a list of all words with a pondian spelling difference, and divided it evenly so that an equal number of American and British spellings would host the main entry? How we split them would probably be fairly arbitrary, but it would ensure that American and British spellings get equal treatment.

On a side note, perhaps we should change the wording of definitions of British/American spellings when they link to the other form of the word (rather than duplicating content). For example, the definition line of honour would be changed to: (British, Canadian and Irish, Australian, NZ, and South African) See honor for definitions. It would thus no longer be implied that one was just an alternate (and perhaps inferior) form of the other. Andrew Sheedy (talk) 01:22, 8 October 2016 (UTC)[reply]

Heh, then every time we added one single new word with a spelling difference, we'd have the same argument about what to do with it. Equinox ◑ 14:01, 8 October 2016 (UTC)[reply]

Haha, true, though I can't imagine there are many words with these sorts of spelling differences that we have yet to add (for English, anyway). Andrew Sheedy (talk) 17:10, 8 October 2016 (UTC)[reply]

@Equinox: In case you are wondering, the relevant page on Wikipedia is w:WP:ENGVAR. —Justin (koavf)❤T☮C☺M☯ 13:56, 8 October 2016 (UTC)[reply]

I oppose any merge of definitions from a higher-frequency variant to a lower-frequency variant. That is where contention arises, since some people wanted the variant that is oldest in Wiktionary to be the main one, which can turn out to be the lower-frequency one. --Dan Polansky (talk) 16:06, 8 October 2016 (UTC)[reply]

My preference is that whoever adds the first form get to choose the primary form. That would probably produce about 50% left- and right-pondian main entries. SemperBlotto (talk) 16:12, 8 October 2016 (UTC)[reply]

Are misspellings lemmas?

I was under the impression that they were not, and that misspellings should use {{head|xxx|misspelling}}. "Misspellings" is also listed under nonlemmas in Module:headword. We define lemma as "The canonical form of an inflected word", so is a misspelling canonical? DTLHS (talk) 01:11, 8 October 2016 (UTC)[reply]

A misspelling may be inflected; it can have a plural, verb conjugations, and so on. That makes them fit that definition, for the same reason alternative forms in general are considered lemmas. Furthermore, for some languages we show the inflections of words in the headword line, which necessitates the use of a headword-line template that places the entry in the lemmas category. —CodeCa t 01:15, 8 October 2016 (UTC)[reply]

It should be removed from the list in Module:headword if so ("misspelling" isn't a part of speech anyway). DTLHS (talk) 01:19, 8 October 2016 (UTC)[reply]

It's only in the list because many entries already used it when the list was made, and I didn't want to flood Category:head tracking/unrecognized pos. —CodeCa t 01:20, 8 October 2016 (UTC)[reply]

It may be possible to clean up with a bot. --Octahedron80 (talk) 01:30, 8 October 2016 (UTC)[reply]

I like the idea discussed above of redirecting all misspellings to the main entries when possible, so that sites that use Wiktionary do not treat misspellings such as "marshmellow" and whatnot as correctly spelled entries. --Daniel Carrero (talk) 01:54, 8 October 2016 (UTC)[reply]

A possible disadvantage of this (if I am correctly understanding "redirecting") is that people may not notice that they have been redirected, and, for hard-to-spot misspellings, may not become aware that what they originally typed was misspelled. Mihia (talk) 01:57, 8 October 2016 (UTC)[reply]

What about something like {{no entry}}? DTLHS (talk) 01:58, 8 October 2016 (UTC)[reply]

That sounds like a good idea to me. --Daniel Carrero (talk) 02:01, 8 October 2016 (UTC)[reply]

That doesn't address the concern that they should be lemmas, possibly with inflected forms listed. DTLHS (talk) 02:02, 8 October 2016 (UTC)[reply]

As a random example, we have the misspelling aqcuire but we don't have entries for aqcuired, aqcuiring and aqcuires... Should we? I don't see a lot of value in linking "acquire" to its misspelled conjugations, but that may be just me. Maybe we could create entries for all the misspelled conjugations and just link them to the correctly spelled conjugations, which sounds like a great way to handle entries for misspelled conjugations (but I'm not saying that we should have them in the first place...) It's my opinion, at least. If possible, I would want Category:English lemmas without any misspellings, because it is our "index" in a way, and having blatantly wrong entries there seems harmful, to some extent. We can't expect everyone to check all entries individually when navigating, to make sure that they are not defined as "misspelling of" something. --Daniel Carrero (talk) 02:28, 8 October 2016 (UTC)[reply]

Yes. Misspellings are, and must be treated as, second-class citizens, or we will descend into farce. Equinox ◑ 09:30, 8 October 2016 (UTC)[reply]

Misspellings are lemmas in that they are not inflected forms. The word "canonical" in the definition of lemma is misleading; it is "canonical" only in that it is the form chosen to be in the dictionary whereas the other forms are absent from a traditional dictionary. Traditional dictionaries focus on words as lexemes, not words as inflected word forms, and for that purpose, they favor one form type, which they declare to be the "lemma". Misspellings are second-class citizens in that they are declared as misspellings, and do not contain their own definitions proper. Soft redirects such as that in aqcuire, with definition line # {{misspelling of|acquire|lang=en}}, have been the usual practice and make sense to me. --Dan Polansky (talk) 16:16, 8 October 2016 (UTC)[reply]

Non-lemmas for misspellings

e.g. we have a misspelling entry "yooman" for "human", or "digg" for "dig"; we should not allow inflections like "yoomans" or "diggs". Isn't that a voted policy? I can't find it. I thought we had to use (head|en|misspelling), and not allow inflected misspelling entries. But CodeCat just reverted me here: [1]. If what I describe isn't policy, and I'm mistaken, then it probably should be. Equinox ◑ 09:29, 8 October 2016 (UTC)[reply]

Just noticed the thread above appears to deal with the same topic... Equinox ◑ 09:30, 8 October 2016 (UTC)[reply]

I can assure you that disallowing entries for misspelled inflections is not a voted policy, at least not yet. If you visit WT:VTIME, click "Show other boxes" and do a Ctrl+F for "spell", you are going to find some voted policies about misspellings, but this is not one of them. (but maybe you already did what I said)

Should we disallow all misspelled inflections? Surely there are some common misspelled inflections that we would want to keep? I just dislike the idea recently proposed of having to create separate entries for inflections and conjugations of every "lemma" misspelling. Just because we have aqcuire, it does not mean that we should automatically create aqcuires, aqcuired and aqcuiring. --Daniel Carrero (talk) 12:31, 8 October 2016 (UTC)[reply]

Indeed, #Are misspellings lemmas? above is for a similar topic. --Dan Polansky (talk) 16:24, 8 October 2016 (UTC)[reply]

Extend Description vote?

I'm repeating what I said in the vote talk page. Wiktionary:Votes/2016-08/Description is going to end in 2 days, and currently has only 8 participants (5-3-0). Maybe we should extend it by 1 month?

Currently, the vote would fail. If 1 more person supported it, it would pass. Either way, this small turnout is not a great indicator of consensus. --Daniel Carrero (talk) 11:52, 8 October 2016 (UTC)[reply]
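The arithmetic behind "one more support would make it pass" can be checked with a tiny sketch. It assumes the usual two-thirds threshold (the same one applied to the 6-3-1 count later on this page), computed over supports and opposes only, with abstentions left out; the function name is hypothetical.

```python
def vote_passes(support: int, oppose: int) -> bool:
    """Assumed rule: support must be at least two thirds of support + oppose.

    Abstentions are left out. Comparing 3 * support with 2 * total uses only
    integers, so an exact 2/3 result (e.g. 6-3) is not lost to rounding.
    """
    return 3 * support >= 2 * (support + oppose)


print(vote_passes(5, 3))  # False: 5 of 8 is below two thirds
print(vote_passes(6, 3))  # True: 6 of 9 is exactly two thirds
```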

Perhaps the low turnout is an indication that folks aren't that interested. Or perhaps the mistake is to have votes during Summer in the northern hemisphere. DCDuring TALK 12:03, 8 October 2016 (UTC)[reply]

That does not really answer my question, but I remember that last time in August 2016, you opposed extending a vote. Maybe it's understandable if people are uninterested in this vote, because it basically only affects some Translingual symbols. In my experience from previous years, I did not notice any big change in turnout depending on the season of the year, but in any case it's not summer yet. --Daniel Carrero (talk) 12:14, 8 October 2016 (UTC)[reply]

I have noticed a lower participation during Summer. Also early Fall sometimes seems to lead to lower participation by the educators among us.

Votes about trivial matters will not get much participation. DCDuring TALK 13:45, 8 October 2016 (UTC)[reply]

Thanks for the information. Don't you agree that this proposal required a vote? I don't think that we can introduce a new heading without a vote. --Daniel Carrero (talk) 13:50, 8 October 2016 (UTC)[reply]

I support extension of the vote since the participation has been not so great so far and the result is close to passing. An extension opens the proposal to greater scrutiny, and the threat of result picking or "fishing" for results is greatly overrated, IMHO. --Dan Polansky (talk) 16:27, 8 October 2016 (UTC)[reply]

Derived terms vote

Based on Wiktionary:Beer parlour/2016/September#{{bor}} and {{inh}} should also categorize into "Foo terms derived from Bar", I created Wiktionary:Votes/2016-10/Populating "derived from" categories with borrowed and inherited terms. --Daniel Carrero (talk) 12:00, 8 October 2016 (UTC)[reply]

Wouldn't there be something to be learned from low participation in other votes? DCDuring TALK 12:04, 8 October 2016 (UTC)[reply]

What's your point? --Daniel Carrero (talk) 12:06, 8 October 2016 (UTC)[reply]

Votes on trivial matters lead to low participation in all votes. Too many votes on trivial matters is likely to lead to less willingness to take the trouble to make an informed decision on subsequent votes. DCDuring TALK 13:49, 8 October 2016 (UTC)[reply]

How many of the current votes in the watchlist box are on trivial matters? --Daniel Carrero (talk) 13:59, 8 October 2016 (UTC)[reply]

Not every BP discussion needs to be turned into a formal vote. We can do a poll in the BP or just assess the consensus from the discussion itself. --Wiki Tiki 89 15:52, 10 October 2016 (UTC)[reply]

@Daniel Carrero: Let's make another rule of thumb: Only create votes for things if the desire for a vote has already been expressed in the discussion by at least a few editors. --Wiki Tiki 89 16:02, 10 October 2016 (UTC)[reply]

Why is that needed, even as a rule of thumb? --Daniel Carrero (talk) 16:15, 10 October 2016 (UTC)[reply]

Because if no one's asking for a vote, then no one wants a vote. You'd think that would be common sense, but you don't seem to get that. So I'm stating it explicitly for you as a guideline. Don't forget that the only reason you went on a vote-creating spree for our policy pages is because we already had discussions where these votes were requested (we were trying to approve a huge draft bit by bit). But you seem to have improperly taken that momentum to other issues. --Wiki Tiki 89 16:18, 10 October 2016 (UTC)[reply]

What you said did cross my mind before you explained it further, but I disagree with you and I may find it a bit difficult to state my case if you just give your rule of thumb like you are obviously right. Maybe I could have just said: "no, thanks"?

I created a lot of votes to edit WT:EL because EL was garbage 1 year ago and votes are required to edit the policy. I just want Wiktionary:Entry layout to reflect reality. I don't create votes suggesting new policies as much as I create votes attempting to formalize what we already do, which barely needs a discussion, in my opinion. --Daniel Carrero (talk) 17:57, 10 October 2016 (UTC)[reply]

What I meant, is that it was decided that we wanted to overhaul the policy pages and that we needed to create a lot of votes for that. And then you created the votes. Now, you're creating votes for things that we never decided we needed votes for. "I don't create votes suggesting new policies as much as I create votes attempting to formalize what we already do", really? What is this vote that you just created in this discussion? --Wiki Tiki 89 18:05, 10 October 2016 (UTC)[reply]

I'll reply to your last question, but let me get to an important point first: I'm fine with withdrawing the current "Derived terms" vote and just implementing that categorization change if you want. I just re-checked the discussion to see whether there's a consensus... By my count, there are 6 supports, 3 opposes and 1 abstention. I am counting myself as a support. If that discussion were a formal vote, and thus requiring a 2/3 majority, it would barely pass (and could fail if more people voted). My plan was to proceed with the vote to make sure there's a consensus here, but I don't care about it anymore.

We were discussing general guidelines about creating votes. You quoted my statement: "I don't create votes suggesting new policies as much as I create votes attempting to formalize what we already do". In 2016, I created 12 votes for new policies or practices and 25 votes attempting to formalize current practices (and 1 admin vote). I'm not counting unstarted, unfinished, and withdrawn votes. Even if the current "Derived terms" vote is about a new practice, my statement that you quoted was truthful. --Daniel Carrero (talk) 18:42, 10 October 2016 (UTC)[reply]

Like I said, most of your votes about formalizing current practices were sort-of pre-approved, which is fine. The problem is you've gone beyond the pre-approved area, where our "normal" practice is that we create a vote when we decide that we want to create a vote. One editor should not just go and spontaneously create a vote. By the way, I'm not trying to be mean or anything, I'm just trying to help you stay in the good favor of the community, because very many of us are annoyed at all the votes. --Wiki Tiki 89 19:04, 10 October 2016 (UTC)[reply]

Ok, that's nice of you, thank you. Are you annoyed at all the votes? Would you like me to withdraw the current "derived terms" vote and maybe edit the module to implement the discussed categorization change? --Daniel Carrero (talk) 19:10, 10 October 2016 (UTC)[reply]

I think the vote can be withdrawn. Whether or not to implement the change should continue to be discussed in the original BP discussion. --Wiki Tiki 89 19:15, 10 October 2016 (UTC)[reply]

I withdrew the vote. --Daniel Carrero (talk) 19:20, 10 October 2016 (UTC)[reply]

"Famous bearers" section on names?

Would anyone be interested in this? So for the name "Abraham", you might have "Abraham Lincoln", "Abraham Woodhull" and "Abraham Van Helsing" (fictional) as famous bearers. UtherPendrogn (talk) 16:43, 8 October 2016 (UTC)[reply]

This seems like it might have more relevance on Wikipedia (which already has lists of this sort), as who had what names has no lexical significance. Andrew Sheedy (talk) 16:50, 8 October 2016 (UTC)[reply]

Some people don't have wikipedia entries though, or they have the wrong name (Alaric and not the correct Gothic Alareiks). UtherPendrogn (talk) 16:58, 8 October 2016 (UTC)[reply]

True, but the thing is that we already mention famous people in name entries. Alaric, for instance, is already defined as a king of the Visigoths, and there's nothing stopping anyone from defining Alareiks as the same (or at least mentioning his name in the definition line) in Gothic, provided it is attestable in that language. Andrew Sheedy (talk) 17:02, 8 October 2016 (UTC)[reply]

I don't think the reconstructed *𐌰𐌻𐌰𐍂𐌴𐌹𐌺𐍃 (*alareiks), which is not attested in Gothic, is necessarily more 'correct' for English speakers than the accepted Latinization, Alaric. Anyway, I concur with Andrew Sheedy -- this seems like it would belong on Wikipedia, not mainspace Wiktionary. — Kleio (t · c) 19:49, 8 October 2016 (UTC)[reply]

The person not having a WP entry is an argument for creating a WP entry, not for adding anything here. Equinox ◑ 19:54, 8 October 2016 (UTC)[reply]

I think it’s important to mention the famous bearer in some cases. If I walk up to a random English-speaker on the street and say “Socrates was cool”, the person is expected to know that it refers to the Greek philosopher, not to any of many people called Socrates. — Ungoliant ^(falai) 13:01, 18 October 2016 (UTC)[reply]

A similar kind of 'name resolution' takes place if you are talking about your friend Bob ("you must mean the Bob we both know; or the one of the two Bobs we both know whom we saw most recently; or Bob Holness who is currently on the TV we are watching"). Somebody being "the famous Socrates" feels like just another criterion. Naturally we can link to Wikipedia for the name but it's not a separate dictionary sense. Equinox ◑ 23:03, 31 October 2016 (UTC)[reply]

Thinking of systematically adding missing pronunciations to English words

I'm thinking of running a bot to clean up English word pronunciations and add missing pronunciations based on the CMU Pronouncing Dictionary. Some issues that I'd appreciate comment on:

The dictionary says it's for "North American English". It does make the cot-caught distinction (thankfully) but doesn't make the Mary-merry-marry distinction. I think it's OK to tag it as "US" or "General American". Agreed?
I'm thinking of having a bot make certain substitutions in accordance with how our standards dictate what to do (Appendix:English pronunciation). Example is /r/ -> /ɹ/. Another possibility is removing long marks from pronunciations specifically tagged as US or General American.
I think syllable divisions should normally not be shown in English words because there's so often an ambiguity as to how to divide syllables. What do people think of having a bot remove them? Should I leave them alone?
On the other hand, the dictionary indicates primary and secondary stress directly on vowels, when we probably need to indicate them on syllables. What should the rules be as to where to put the stress mark? This entails deciding how to divide syllables. It seems clear that VCV should be divided VˈCV, but it gets trickier with VCCV. One possibility is to divide most VCCV as VCˈCV but divide VˈClV and VˈCɹV as long as the Cl and Cɹ are possible syllable onsets (which excludes e.g. /dl/, /tl/, /ʃl/, /sɹ/, /nl/, /mɹ/ etc.). For VCCCV and VCCCCV a decision will probably have to be made based on what are possible syllable onsets, but what about cases like VCsCV, e.g. /pɑɹsli/ and /ɛkstɹə/? My instinct is to divide them /pɑɹ.sli/ and /ɛk.stɹə/, i.e. put the /s/ with the following syllable as long as it's a possible onset (which excludes /sɹ/ in particular).
The dictionary doesn't distinguish /ǝ/ and /ʌ/. I'm thinking we should use /ǝ/ in unstressed syllables, and /ʌ/ in syllables with primary or secondary stress. Reasonable?
The dictionary doesn't distinguish /ɚ/, /ɝ/ and /ǝɹ/. One possibility is to use /ɝ/ in syllables with primary or secondary stress, /ǝɹ/ before vowels in syllables without stress, /ɚ/ not before vowels in syllables without stress. On the other hand IMO the distinction between /ɚ/ and /ɝ/ is largely spurious in GA; maybe we should just use /ɚ/ consistently. I do think we should distinguish /ǝɹ/ from /ɚ/; at least, aberration /ˌæbɚˈeɪʃən/ looks strange to me.
/l̩ m̩ n̩/ or /əl əm ən/? The CMU dictionary writes /əl əm ən/ but I could convert them if necessary. Appendix:English pronunciation isn't clear about this.
/ɪ/ vs. /ə/ in unstressed syllables: The CMU dictionary does make this distinction but I can't tell if it's consistent or random (e.g. they write Abigail /ˈæbəˌgeɪl/ but Abilene /ˈæbɪˌlin/). They do write consistent /ə-/ for unstressed a- and consistent /ɪ-/ for unstressed e-. For words like recorded they give two pronunciations: /ɹəˈkɔɹdəd/ and /ɹɪˈkɔɹdɪd/. In my speech there's no obvious distinction between these two sounds in the vast majority of cases (excepting certain cases like Rosa's vs. roses which are clearly different), and I kind of doubt that this distinction is salient in General American (witness e.g. the vast confusion between affect and effect). My instinct however is just to keep whatever they have.

Benwing2 (talk) 21:02, 8 October 2016 (UTC)[reply]

Systematically adding pronunciations would be nice, but cleaning up existing ones is fraught with risks. No single source can adequately account for all of the variation, so you might end up regularizing away legitimate variants that would be better glossed as such rather than eliminated. Also, it may not be enough to know that a pronunciation is wrong without knowing what it's an error for. As for using general rules: some would probably work, but there's too much that depends on things like morpheme boundaries for me to feel safe in general about running the modification part on autopilot. English is such a dynamic, multifaceted phenomenon that talking about "correcting" things makes me nervous. Chuck Entz (talk) 21:52, 8 October 2016 (UTC)[reply]

I agree that the bot should only add missing pronunciations, not attempt to change existing ones. As to your questions: (1) I'd tag it "General American"; "US" is ambiguous and should be avoided (not all US accents are GenAm). (2) Yes, have the bot follow our conventions for representing GenAm. (3) I've long been opposed to indicating syllable boundaries in English, but I think I'm in the minority here. (4) Maximize the onsets of stressed syllables when indicating stress placement. (5) I agree; /ʌ/ in primarily and secondarily stressed syllables and /ə/ elsewhere. (6) I support /ɝ/ in primarily and secondarily stressed syllables, /ɚ/ in unstressed syllables, and /əɹ/ in unstressed syllables before a vowel. All three variants are illustrated in murderer: /ˈmɝdəɹɚ/. (7) I'd follow Kenyon and Knott here: /l̩/ after all consonants; /n̩/ after alveolar consonants, otherwise /ən/; /əm/ everywhere. Thus /ˈkækl̩/, /ˈbʌtn̩/, /ˈtʃɪkən/, /ˈɹɪðəm/. (8) I'd let the bot just follow CMUPD here; individual entries can be cleaned up later as necessary. —Aɴɢʀ (talk) 22:16, 8 October 2016 (UTC)[reply]
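A minimal sketch of rules (5) and (6) as a bot might apply them, assuming it reads CMUdict's ARPAbet strings, in which every vowel carries a stress digit (0 unstressed, 1 primary, 2 secondary). The names `arpabet_to_ipa` and `plain` are hypothetical and this is not the bot's actual code; stress marks, syllable division and rules (7) and (8) are left out. The output is a list of (symbol, stress) pairs so that a later pass can place stress marks.

```python
# Stress-independent ARPAbet vowels (CMUdict notation) mapped to GA symbols.
VOWELS = {
    "AA": "ɑ", "AE": "æ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ", "EH": "ɛ",
    "EY": "eɪ", "IH": "ɪ", "IY": "i", "OW": "oʊ", "OY": "ɔɪ", "UH": "ʊ",
    "UW": "u",
}
CONSONANTS = {
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ", "HH": "h",
    "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n", "NG": "ŋ", "P": "p",
    "R": "ɹ", "S": "s", "SH": "ʃ", "T": "t", "TH": "θ", "V": "v", "W": "w",
    "Y": "j", "Z": "z", "ZH": "ʒ",
}


def arpabet_to_ipa(pron):
    """Return (symbol, stress) pairs; stress is None for consonants."""
    tokens = pron.split()
    out = []
    for i, tok in enumerate(tokens):
        base = tok.rstrip("012")
        if base == tok:  # no stress digit, so this is a consonant
            out.append((CONSONANTS[base], None))
            continue
        stress = int(tok[len(base):])
        next_is_vowel = i + 1 < len(tokens) and tokens[i + 1][-1].isdigit()
        if base == "AH":  # rule (5): /ʌ/ when stressed, /ə/ elsewhere
            out.append(("ʌ" if stress else "ə", stress))
        elif base == "ER":  # rule (6)
            if stress:
                out.append(("ɝ", stress))
            elif next_is_vowel:
                # /əɹ/ before a vowel; the /ɹ/ becomes the next syllable's onset
                out.append(("ə", 0))
                out.append(("ɹ", None))
            else:
                out.append(("ɚ", 0))
        else:
            out.append((VOWELS[base], stress))
    return out


def plain(pairs):
    """Concatenate the symbols, ignoring stress (marks are placed later)."""
    return "".join(sym for sym, _ in pairs)
```

On an ARPAbet string like `M ER1 D ER0 ER0` this yields mɝdəɹɚ, i.e. the murderer transcription above minus its stress mark, and `AE2 B ER0 EY1 SH AH0 N` comes out as æbəɹeɪʃən rather than the /ˌæbɚˈeɪʃən/ form objected to earlier.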

Are /ɚ/ and /ʌ/ actually used? Korn [kʰũːɘ̃n] (talk) 22:19, 8 October 2016 (UTC)[reply]

Yes, of course they are. —Aɴɢʀ (talk) 22:28, 8 October 2016 (UTC)[reply]

It's not that of course, really. People keep using /u/ for the US, but it's shifted so thoroughly to /ʉ/ that [u] is basically used as a marker for non-native accents in media. ps.: Oh, I made a typo above. I meant [ɝ]. pps.: I'm retracting my example with [u], I remembered some people using it. Korn [kʰũːɘ̃n] (talk) 23:08, 8 October 2016 (UTC)[reply]

What's extremely rare outside of Wiktionary is using /ɹ/ rather than /r/ for the English r-phoneme. Probably most reference works that render American English in IPA use /ɜr/ for the nurse vowel and /ər/ for the letter vowel, but /ɝ ɚ/ do have some usage as well (e.g. Kenyon & Knott, PEAS). —Aɴɢʀ (talk) 10:22, 9 October 2016 (UTC)[reply]

@Angr In general I think your suggestions are fine. When you say "maximal onset" do you simply mean that anything that can be an initial cluster should get grouped in the following rather than preceding syllable? Benwing2 (talk) 16:53, 9 October 2016 (UTC)[reply]

Yes; anything that can be a word-initial onset cluster can be a stressed syllable onset cluster. —Aɴɢʀ (talk) 16:56, 9 October 2016 (UTC)[reply]
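A minimal sketch of stress placement with maximal onsets, taking (symbol, stress) pairs with stress None for consonants. `CLUSTERS` is a hypothetical, deliberately incomplete list of permissible English onsets that leaves out /dl/, /tl/, /ʃl/, /sɹ/, /nl/, /mɹ/ as suggested above; a real bot would need a vetted list. A purely phonotactic division also cannot see morpheme boundaries (nitrate vs. coatrack), which is one reason the sketch omits syllable dots unless asked for them.

```python
# Hypothetical, incomplete list of permissible multi-consonant onsets.
# Deliberately excludes /dl/, /tl/, /ʃl/, /sɹ/, /nl/, /mɹ/.
CLUSTERS = {
    ("p", "l"), ("p", "ɹ"), ("b", "l"), ("b", "ɹ"), ("t", "ɹ"), ("d", "ɹ"),
    ("k", "l"), ("k", "ɹ"), ("ɡ", "l"), ("ɡ", "ɹ"), ("f", "l"), ("f", "ɹ"),
    ("θ", "ɹ"), ("k", "w"), ("t", "w"), ("s", "l"), ("s", "m"), ("s", "n"),
    ("s", "p"), ("s", "t"), ("s", "k"), ("s", "w"),
    ("s", "p", "l"), ("s", "p", "ɹ"), ("s", "t", "ɹ"), ("s", "k", "ɹ"),
}
MARKS = {0: "", 1: "ˈ", 2: "ˌ"}


def is_onset(cluster):
    if len(cluster) == 0:
        return True
    if len(cluster) == 1:
        return cluster[0] != "ŋ"  # any single consonant except /ŋ/
    return cluster in CLUSTERS


def syllabify(phones):
    """Split (symbol, stress) pairs into syllables using maximal onsets."""
    vowels = [i for i, (_, s) in enumerate(phones) if s is not None]
    starts = [0]
    for a, b in zip(vowels, vowels[1:]):
        k = a + 1
        # Move right until phones[k:b] is a legal onset; the first legal
        # cluster found is the longest one. The empty cluster always passes.
        while not is_onset(tuple(sym for sym, _ in phones[k:b])):
            k += 1
        starts.append(k)
    ends = starts[1:] + [len(phones)]
    return [phones[s:e] for s, e in zip(starts, ends)]


def transcribe(phones, show_dots=False):
    """Join syllables, putting each stress mark before its syllable's onset."""
    out = []
    for n, syl in enumerate(syllabify(phones)):
        stress = next((s for _, s in syl if s is not None), 0)
        mark = MARKS[stress]
        sep = "." if show_dots and n > 0 and not mark else ""
        out.append(sep + mark + "".join(sym for sym, _ in syl))
    return "".join(out)
```

With `show_dots=True`, parsley and extra come out as ˈpɑɹ.sli and ˈɛk.stɹə, matching the divisions proposed above; without dots, about comes out as əˈbaʊt, with the mark before the /b/ onset.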

Regarding 8: The lack of distinction between unstressed /ɪ/ and /ə/ is known as the weak vowel merger. According to the Wikipedia article, it's very common in General American (and I think it's probably common in Canadian English as well). Unfortunately, many supposedly US transcriptions on Wikipedia show the distinction... or maybe it's not that unfortunate, since some American accents do have the distinction (for instance, Southern American English, according to the article). Old-fashioned RP had the distinction, but I am not sure if modern RP really does; at the very least, the unstressed /ɪ/ is more centralized than in old-fashioned RP. You can hear the old-fashioned vowels in some TV shows. I think one of the recent Miss Marples had a rather good old-fashioned RP accent. — Eru·tuon 22:23, 10 October 2016 (UTC)[reply]

BTW I think the Wikipedia article is wrong in that many GA speakers with the merger still distinguish Rosa's from roses. Benwing2 (talk) 00:12, 11 October 2016 (UTC)[reply]

That might be true. I don't think I distinguish the two vowels in most cases, but feel like Rosa's and roses are different, though only slightly so... then again, perhaps it's only self-deception, like my feeling that my cot and caught are different. — Eru·tuon 02:38, 11 October 2016 (UTC)[reply]

@Angr I'm implementing your Kenyon and Knott rules for ən -> n̩ after alveolar and not before a vowel. Which of [tdszʃʒθðlnɹ] do they count as alveolar? All of them are coronals. Does this specifically refer to [tdszln] or do [sz] count as dental? (The only clear motivation I can see for making this alveolar/non-alveolar distinction is in /tn̩/, where the /t/ is glottalized instead of flapped; but this applies only to /t/, and applies also when a vowel follows the /n/.) Benwing2 (talk) 22:35, 15 October 2016 (UTC)[reply]

@Angr Also, by analogy with the above, I am assuming the /ɝ/ should become /ɜɹ/ before a vowel, e.g. furry /ˈfɜɹi/, even though /ɜ/ doesn't otherwise occur. Benwing2 (talk) 22:42, 15 October 2016 (UTC)[reply]

Furthermore, what about /t͡ʃ/ vs. /tʃ/, /d͡ʒ/ vs. /dʒ/? Appendix:English pronunciation says the tie bars should be used, but none of the three example words chat, teach, nature actually use them (of the three example words joy, agile, age, the second and third use tie bars but the first doesn't). BTW I think this is something that a bot could reasonably clean up; similarly /r/ vs. /ɹ/. Benwing2 (talk) 22:53, 15 October 2016 (UTC)[reply]

Also, /ʍ/ vs. /hw/? CMU uses /hw/ but this could easily be changed. Benwing2 (talk) 22:56, 15 October 2016 (UTC)[reply]
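The mechanical clean-ups raised here (/r/ -> /ɹ/, tie bars on affricates, /ʍ/ vs. /hw/, dropping length marks for GA) amount to string substitutions; a minimal sketch follows. Each one is a switch because these were open questions, and the function name `normalize_ga` is hypothetical. Note that a blind `tʃ` -> `t͡ʃ` would wrongly tie a /t/ + /ʃ/ sequence across a morpheme boundary (nutshell), so in practice a bot should probably flag such entries rather than rewrite them silently.

```python
TIE = "\u0361"  # combining double inverted breve, the IPA tie bar


def normalize_ga(ipa, rhotic=True, drop_length=True, hw=True, tie_bars=True):
    """Apply mechanical substitutions to a General American IPA string."""
    if rhotic:
        ipa = ipa.replace("r", "ɹ")  # /r/ -> /ɹ/ per Appendix:English pronunciation
    if drop_length:
        ipa = ipa.replace("ː", "")  # proposal: no length marks in US/GA entries
    if hw:
        ipa = ipa.replace("ʍ", "hw")  # CMUdict writes /hw/
    if tie_bars:
        # A tied affricate is t + TIE + ʃ, so "tʃ" no longer matches it:
        # running the function twice gives the same result.
        ipa = ipa.replace("tʃ", "t" + TIE + "ʃ").replace("dʒ", "d" + TIE + "ʒ")
    return ipa
```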

How about sewer /ˈsuɚ/ vs. /ˈsuəɹ/, and similarly for seer, rapier, steward and other such words with vowel + /ɚ/? Current pronunciations are inconsistent. Benwing2 (talk) 23:30, 15 October 2016 (UTC)[reply]

@Angr Here is a sample of words with their current IPA pronunciations per the code I've written: User:Benwing2/cmudict-sample If you could look over these words (there are 250 of them) and let me know if you see any issues, I'd be grateful. Keep in mind there are some inevitable issues stemming from the algorithm used to determine syllable boundaries, and other issues that reflect potential errors in the source corpus. Benwing2 (talk) 23:37, 15 October 2016 (UTC)[reply]

(1) K&K don't use /l̩/ or /n̩/ after /ɹ/. I'd forgotten about that until I saw /ˈtʃæɹl̩/ on your test page. But they write /ˈbæɹəl/ and /ˈbæɹən/ with shwas, and that feels right to my intuitions as well. They use /n̩/ after /t d s z/ but /ən/ after /ʃ ʒ tʃ dʒ θ ð l n ɹ/, and I largely agree (though I'm not averse to Ethan /ˈiθn̩/ and heathen /ˈhiðn̩/ either). (2) I'm inclined to write /fɝi/ for furry since /ɝ/ is always stressed and doesn't lose its r-coloring before a vowel. That's different from ephemeral /iˈfɛməɹəl/ where I feel like the first shwa really isn't r-colored. (3) I have no opinion on the use of tie bars on affricates. I don't usually use them myself for English, but I usually do for other languages; I can't justify why. I certainly don't object to them. (4) I generally prefer /hw/, not least because it makes it easier to combine merging and contrasting accents by writing /(h)w/. (5) I'd use /ɚ/ after a vowel. —Aɴɢʀ (talk) 07:23, 16 October 2016 (UTC)[reply]
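Kenyon and Knott's distribution as summarised here can be sketched as a pass over a phoneme list, assuming /ə/ only ever occurs unstressed (stressed schwa being written /ʌ/) and that affricates arrive as single items such as "tʃ". Following the earlier question about not applying the rule before a vowel, the sketch leaves /ən/ and /əl/ alone when a vowel follows; extending that condition to /l̩/ is an assumption here. The function name `syllabify_sonorants` is hypothetical.

```python
SYL = "\u0329"  # combining vertical line below, the IPA syllabicity mark
VOWELS = {"i", "ɪ", "eɪ", "ɛ", "æ", "ɑ", "ɔ", "oʊ", "ʊ", "u", "ʌ", "ə",
          "aɪ", "aʊ", "ɔɪ", "ɝ", "ɚ"}
N_AFTER = {"t", "d", "s", "z"}  # K&K: /n̩/ only after these


def syllabify_sonorants(phones):
    """Turn unstressed /ən/ and /əl/ into /n̩/ and /l̩/ per K&K.

    /n̩/ only after /t d s z/; /l̩/ after any consonant except /ɹ/;
    /əm/ is never changed; nothing changes when a vowel follows.
    """
    out = []
    i = 0
    while i < len(phones):
        p = phones[i]
        if p == "ə" and out and i + 1 < len(phones):
            prev, nxt = out[-1], phones[i + 1]
            followed_by_vowel = i + 2 < len(phones) and phones[i + 2] in VOWELS
            if not followed_by_vowel and prev not in VOWELS:
                if nxt == "n" and prev in N_AFTER:
                    out.append("n" + SYL)
                    i += 2
                    continue
                if nxt == "l" and prev != "ɹ":
                    out.append("l" + SYL)
                    i += 2
                    continue
        out.append(p)
        i += 1
    return out
```

This reproduces /ˈkækl̩/, /ˈbʌtn̩/, /ˈtʃɪkən/, /ˈɹɪðəm/ and /ˈbæɹəl/ from the posts above; optional forms like /ˈiθn̩/ would still need a human.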

Possible future vote about deleting all programming language symbols

When there are fewer votes in the list, I'm thinking of creating a vote with the proposal "deleting all programming language symbols", to see if there's any consensus for the idea... I was hoping to be able to create new entries for APL symbols and so on, but a few RFDs are being created to delete some symbols. Even if they pass, this leaves us with some symbols kept (for now) and others deleted. I'm curious to see if there are people who would support nuking all the symbols. If not, are we sure where to draw the line? The argument "these symbols are not used in any natural language" could also be used against math symbols and chemical formulae, if someone wishes to do so. --Daniel Carrero (talk) 22:29, 8 October 2016 (UTC)[reply]

Mathematical and chemical symbols are used to communicate concepts to human beings. Computer languages are used to control the actions of computers. Big difference. Chuck Entz (talk) 22:51, 8 October 2016 (UTC)[reply]

Discussions show whether there is consensus. Don't create a vote just to test things. We have too many votes going on already, as has been pointed out before. Equinox ◑ 22:53, 8 October 2016 (UTC)[reply]

Is the consensus deleting all programming language symbols? --Daniel Carrero (talk) 22:58, 8 October 2016 (UTC)[reply]

I had conversations in JavaScript with a Russian friend when explaining something in English was just too cumbersome. I'm pretty much on the fence on this topic. Can I hear some arguments for why we should limit ourselves to communication with other humans (and pets (not rocks))? Korn [kʰũːɘ̃n] (talk) 23:03, 8 October 2016 (UTC)[reply]

@Daniel Carrero: You're missing Equinox's point. For the most part, you shouldn't be starting votes that have a low chance of passing just to prove that they lack consensus. Instead, votes should put the official seal on consensus that has been hammered out in discussion but has a large impact or is not 100% clear, and therefore needs to go to a vote. This would help reduce both the number of active votes and the ill will toward them a great deal. —Μετάknowledge^{discuss/deeds} 05:17, 9 October 2016 (UTC)[reply]

Sure, Metaknowledge. I'm not even saying I have to create a new vote about it... We can discuss it in a friendly way, to determine if a vote is necessary and see what decisions we can make just in this conversation. Specifically, about the votes that I've been creating... Is there ill will toward them? Do people agree I've been doing something wrong? How many of the current votes could be avoided and should not have been created? --Daniel Carrero (talk) 12:14, 9 October 2016 (UTC)[reply]

I contest the idea that programming languages are exclusively a means of human-to-computer communication. Maybe machine code is, but higher-level programming languages are designed and written for mutual understandability among coders, not just for ease of use. Crom daba (talk) 03:56, 9 October 2016 (UTC)[reply]

Maybe not exclusively, but that's their primary function. In the same vein, color-coding of electrical wires is used to make electricians' work mutually understandable, but I wouldn't expect a dictionary to tell me which color is positive and which color is ground. Chuck Entz (talk) 04:25, 9 October 2016 (UTC)[reply]

I suspect that source code is read by humans more often than it is read by computer - certainly for compiled languages. I would welcome all programming language entries in this wiki (my early COBOL entries were mostly deleted). SemperBlotto (talk) 06:54, 9 October 2016 (UTC)[reply]

Anything that isn't used in human language shouldn't be in the main namespace. http:// isn't a word in any language, for example. Renard Migrant (talk) 12:39, 9 October 2016 (UTC)[reply]

Allow me to disagree with you: I'm under the impression that "http://" is a SOP of http and ://. I created the latter some time ago. --Daniel Carrero (talk) 12:43, 9 October 2016 (UTC)[reply]

Thanks for the example. I'd delete :// unless it could be shown it was used in human language somehow. Renard Migrant (talk) 12:49, 9 October 2016 (UTC)[reply]

HTTP is an English initialism used in sentences. http within a URI is not English as such because of its context. A similar case: sealed is an English word, but the keyword sealed on a Java class is not English as such because of its context. Equinox ◑ 12:49, 9 October 2016 (UTC)[reply]

We define "@" as an e-mail delimiter. In my opinion, it seems right to keep both "@" and "://" for the same reasons. Does anyone disagree? --Daniel Carrero (talk) 12:52, 9 October 2016 (UTC)[reply]

One could equally argue that it seems right to delete them both for the same reason! Describing something as a delimiter is hardly defining it, anyway: "this is a delimiter" does not give us semantic information as a dictionary should. Equinox ◑ 12:55, 9 October 2016 (UTC)[reply]

If we formally decide to delete the e-mail sense from @, I assume it's very likely that anons will think we are missing a sense by mistake and will want to re-add it. My point is that the e-mail delimiter is a big part of what the "@" is. You said now: "'this is a delimiter' does not give us semantic information", but I'm not sure what to make of it. This is not just any delimiter; it separates the ~~e-mail~~ username from the domain, which is a meaningful explanation. Concerning human language unrelated to computers, we have entries for punctuation marks: , . ! ( ) etc. -- and the space, the plainest delimiter of all: ] [. --Daniel Carrero (talk) 13:14, 9 October 2016 (UTC)[reply]

I take your point. Consider the @ in john.doe@examplemail.com: you could argue the @ isn't part of any language in this example, but it also seems silly to delete it. It's a bit like 2+2=4: + and = aren't necessarily part of any language here. You could argue the meaning they convey is not linguistic, and yet it seems ludicrous to delete them. Renard Migrant (talk) 21:03, 10 October 2016 (UTC)[reply]

Well, "@" as a general symbol of the Internet has entered the popular consciousness in a way that :// certainly hasn't. Equinox ◑ 17:23, 11 October 2016 (UTC)[reply]

As said in the RFD discussion, :// is actually a SOP of : (delimiter used after ftp, http, smtp...) and //, which already has a networking sense. --Daniel Carrero (talk) 17:30, 11 October 2016 (UTC)[reply]

Then pretend I said "in a way that : and // certainly haven't". Equinox ◑ 17:32, 11 October 2016 (UTC)[reply]

Sure. --Daniel Carrero (talk) 18:04, 11 October 2016 (UTC)[reply]

Looking for German speakers to add test cases to Module:de-IPA/testcases

I'm thinking of creating a module to generate German pronunciation from spelling, sometimes with the help of pronunciation respellings (e.g. Phonem probably has to be respelled something like Phoném to indicate the unexpected stress). I don't actually know whether this is reasonably possible for German but I assume it probably is in most cases. (Note that there's already a module of sorts in Module:de-IPA that purports to do this, but it's really horrible.)

I'm looking for knowledgeable German speakers to add test cases to Module:de-IPA/testcases. This is module code but don't be alarmed; adding test cases is very easy, just follow the examples.

The only special symbols I've created so far are:

acute accent (e.g. á é í ó ú ä́ ö́ ǘ) for unexpected primary stress; expected stress should be on the first syllable except for certain prefixes like ge-, be-, ver-, etc.
grave accent (e.g. à è ì ò ù ä̀ ö̀ ǜ) for secondary stress, unless it's somehow predictable
slash to separate compounds joined together (e.g. Buch/stabe)
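A minimal sketch of how the three respelling symbols could be read, written in Python purely for illustration (Wiktionary modules are written in Lua, and this is not Module:de-IPA's code). Decomposing to NFD separates an acute or grave from the umlaut on ä/ö/ü, so ä́ is recognised as a stressed ä. The function `parse_respelling` and its output format are hypothetical, and the default first-syllable and prefix (ge-, be-, ver-) stress rules are not implemented.

```python
import unicodedata

ACUTE, GRAVE = "\u0301", "\u0300"


def parse_respelling(respelling):
    """Split a respelling at "/" and record which letters carry marked stress.

    Returns one dict per compound part: the accent-free text, plus the
    indices of the letters marked with acute (primary) or grave (secondary).
    """
    parts = []
    for part in respelling.split("/"):
        letters, primary, secondary = [], [], []
        for ch in unicodedata.normalize("NFD", part):
            if ch == ACUTE and letters:
                primary.append(len(letters) - 1)
            elif ch == GRAVE and letters:
                secondary.append(len(letters) - 1)
            elif unicodedata.combining(ch) and letters:
                letters[-1] += ch  # keep the umlaut with its base letter
            else:
                letters.append(ch)
        text = unicodedata.normalize("NFC", "".join(letters))
        parts.append({"text": text, "primary": primary, "secondary": secondary})
    return parts
```

For example, `parse_respelling("Phoném")` marks the fifth letter (index 4) as carrying primary stress and returns the plain text Phonem, while `Buch/stabe` yields two unmarked parts that the stress rules would then handle separately.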

Note that there's already a module that does this quite well for French (Module:fr-pron), despite the vagaries of French spelling; see the test cases in Module:fr-pron/testcases. There's similarly a Russian pronunciation module Module:ru-pron, with test cases Module:ru-pron/testcases. Both of these I rewrote entirely based on early versions (respectively by User:Kc kennylau and User:Wyang). Benwing2 (talk) 23:09, 8 October 2016 (UTC)[reply]

This really should be preceded by a thorough discussion of our German pronunciation practice, which I consider in dire need of an overhaul; raising it is usually met with roaring silence. Korn [kʰũːɘ̃n] (talk) 00:09, 9 October 2016 (UTC)[reply]

Fine with me. I didn't realize that there was a problem. What do you think we should use instead? Benwing2 (talk) 02:37, 9 October 2016 (UTC)[reply]

Not a native speaker but I disagree with some test cases:

The final "r" is shown as completely silent. It's light, and I think /ɐ̯/ should be used for the sound; Uhr is a correct test case.

Qualität has no secondary stress.

Reichstag probably needs a respelling to make it obvious that -tag is long in this compound word. It worked for Auswahl, though. --Anatoli T. ^{(обсудить}/^вклад) 05:16, 9 October 2016 (UTC)[reply]

Here are the things we need to discuss and find consensus on. I cannot answer most questions because they're southern issues.

Which pronunciations do we put down? We need at least three national standards. (Germany has at least two standards, though.) Maybe more? And in which form? What do we define as standard? For a start, my definition is: "Standard are those features which are not avoided by speakers in the most formal setting." This includes for example ⟨Chemie⟩ /ʃeːmi/, ⟨wichtig⟩ /vɪxtɪk/. And I do not think the artificial language of newsreaders is a good representation of the varying different standards. That language mostly excludes /ʃeːmi, vɪxtɪk/.
Where I'm from (north) and live (Berlin area), there is no /ə/ phoneme; unstressed vowels get deleted or are phonetically fully identical with /ɪ/ as [ɪ~ɘ], or even [ɪː] with a northern accent (which is not considered standard). The merger itself is absolutely standard; I wouldn't be surprised if a good deal of northerners would normally parse [ə] as /-ər/. Is this a negligible phænomenon of a specific (big) region or do we need to address it? Is there actually a different situation elsewhere?
Unstressed vowels can surface as [ɛ] and [a] in Austria commonly, /ə/ is [e] in some parts of the south. Will Austrian dialects be represented with ⟨ə, ɐ⟩, ⟨ɛ, a⟩, both or a mixture, what about [e]? Korn [kʰũːɘ̃n] (talk) 08:02, 9 October 2016 (UTC)[reply]
Switzerland is rhotic, Austria and Bavaria are facultatively rhotic, some parts of Germany are traditionally rhotic. Where and how do we represent what?
Even German German is traditionally rhotic in some places where it is no longer general. /stark/ is [ʃtaʁk], [ʃtark] or [staːk]. Which to include? What about [ʃtaχk]?
Austro-Bavarian does not palatalise fricatives after liquids, so that /furxt/ is [fʊrxt] instead of e.g. [fʊɪçt], which is the local standard where I live. How standard is it? Do we represent it?
Same question for /xs/ which is [ks] in Germany and [xs] in Austro-Bavarian.
Also /x/ in general. Switzerland and at least parts of Austria only have one phone [x~χ], the north has three [ç, x, χ], Berlin only knows [ç, χ]. What to include?
We need to discuss the shown vowel qualities of both lax high vowels and /a/, which might be tense high vowels and [ä~ɑ] respectively in the south. (Wikipedia says that [ɑ] is the realisation for Austrian standard, it should be the same for Switzerland.) /a/ is [a~ä] in Germany, which to pick?
Further, the northern third of Germans has phonemic backness: /a, ar, aː/ are [a, aː, ɑː]. Ignore or include? ⟨Maß⟩ and ⟨Mars⟩ are an identifiable minimal pair here, but, as I'm told, not in other regions.
Is /aɐ̯/ actually used? Do we favour /aɐ̯/ or /aː/?
The last consensus was not to show aspiration. I think this is a mistake.
I would consider /pf-/ [pf-] and the presence of an /ɛː/ in most words to be nonstandard for here, i.e. if someone native to my area used it, I would consider it so foreign as to be wrong. /(p)f/ is easily represented. Do we double e.g. /ʃpɛːt/, /ʃpeːt/?
What to use for /v/ which is not truly [v] in many regions?
What to do about fortis/lenis? As far as I'm aware, in the south, lenis consonants are unaspirated voiceless [t, p, k] with tensed but unvibrating vocal cords, while fortis consonants are longer and more forcefully pronounced. I'm not a friend of /d̥, p̊, g̊/ for dogmatic reasons, but this is the least we can do.
Consonant length instead of ambisyllabic consonants in the south in general. Having ⟨Katze⟩ as /kat.tse/ [kˑɑt̚tsV] is perfectly normal for Austria and Switzerland. If its absence is markedly foreign, we should include it. Cf. audio file at Mitte.
Fortis-lenis levelling is said to be absent in Austria/Switzerland. (I only know it for Switzerland.)
/r/ is [ʕ~ʁ~ʀ~ɾ~r~ɹ], which to use where? I'm strictly opposed to using [ʁ] in broad transcription.
Beyond automatic additions, which regional non-standard forms are worth adding/allowing? Northern standard realises /v/ as [w] in several instances, e.g. /ʃvyːl/ is invariably [ʃʷɥ͓yːl], /tsvo/ is [tswo]. Is there any point in prohibiting any pronunciation at all?
Do we include a superdialectal broad form or one for each? I.e. /bitːər/ + [pitːər] (Austria), [bɪtʰɐ] (Germany) or /bitːər/ [pitːər] (Austria) + /bɪtər/ [bɪtʰɐ] (Germany)? Korn [kʰũːɘ̃n] (talk) 09:34, 9 October 2016 (UTC)[reply]

That's what I can think of off the top of my head. If we start adding stuff automatically, we might as well do the whole shebang and do it right. Korn [kʰũːɘ̃n] (talk) 08:02, 9 October 2016 (UTC)[reply]

ps.: A lot of the motivation behind this, aspiration for example, is that I assume we want to represent a native pronunciation that people can emulate, and mixing and matching simply makes you sound like a foreigner. It's one thing to speak German with Bavarian features if you're a Bavarian, but if you're a Slovene who comes to Berlin and tries to adopt the local language while erroneously using non-regional features, people will not interpret it as Bavarian but as Slovene. It's like mixing Geordie and Texas. Sure, it's English, but if a dictionary didn't tell me the difference, I'd consider it shoddy work. Korn [kʰũːɘ̃n] (talk) 08:12, 9 October 2016 (UTC)[reply]

Lots of what you're trying to represent is dialectal features. Fundamentally, though, we for the most part avoid doing this because it's a huge can of worms, as you've demonstrated. For British English, for example, we pick the most standard form (RP) and represent it, and don't even try to represent all the numerous dialects. Similarly for German, that would mean choosing the standard as usually found in most dictionaries, which is similar AFAIK to what we already have. As for things like aspiration, I agree we shouldn't show it because we're trying to represent a broad phonemic representation rather than phonetic detail that is more likely to be dialect-specific. We don't represent aspiration in English, for example. In general we should follow the lead of other dictionaries. If you want to show some dialectal renderings in addition to the standard, that is fine but we don't need to force ourselves to do that by default. Similarly, for example, we show "General American" pronunciation for American English even though that differs significantly from e.g. Southern or New York English. For French what we actually show is something like a 100-year-old Parisian standard that doesn't very accurately represent anyone's speech any more but is what is conventionally found in dictionaries. Benwing2 (talk) 15:53, 9 October 2016 (UTC)[reply]

Small thing first: We absolutely and commonly show aspiration for English. cat, water, take, and I don't think aspiration was ever contested for English. We just don't have many pages with narrow transcriptions.

And yes, this is about dialects because there is no non-dialectal German. The standard is not a thing; there are standards, and these need to be represented, and we need to discuss how. To give the broadest æquivalent, we're normally only showing the German version of GenAm (Germany). We need to discuss whether we show it the right way and we need to discuss how we want to represent the æquivalent to RP (Austria) and GenAus (Switzerland). And if we don't show them, we're simply not that good a dictionary. The standard 'found in most dictionaries' is some form of representation of the standard of non-southern Germany, simply because that is the largest area by far. It has little relevance to people not living in that area, and people in Austria often don't have an opinion of it either - not that this is relevant to us.

While lack of aspiration is non-standard in most of Germany, presence of aspiration might be non-standard in Austria and Switzerland. Those things are features speakers using either standard would want to get right in order not to sound less capable. I repeat that a mixture of standards shows a lack in proficiency. It is for that reason that I think the features demarking the standards need to be represented, if we want to produce something useful. And this is where my questions aim: 1. We need to stop being so stupidly German-centric. 2. We need to find out which features make an actual difference between the different standards. 3. We need to find out how to represent them. Korn [kʰũːɘ̃n] (talk) 17:03, 9 October 2016 (UTC)[reply]

Very well then, I am fine with "only showing the German version of GenAm". That's my exact point. We simply don't have the resources to consistently represent a large spectrum of dialectal variants. That's why you're getting the "silence". Are you volunteering to personally do all the work to get this done? If not, who's gonna do it? Remember that "the perfect is the enemy of the good". Benwing2 (talk) 20:08, 9 October 2016 (UTC)[reply]

I feel that we should stay broadly consistent with the Wikipedia page on Standard German phonology. As far as deviations from this (ideal) standard go, perhaps we could include pronunciations specific for certain cities that would implicitly represent the accent of a wider region. Crom daba (talk) 20:51, 9 October 2016 (UTC)[reply]

So you would be fine with not including RP for English? I'm not sure that's the right spirit. And we don't need to represent a 'large spectrum of dialectal variants'. There are four German-speaking countries, amongst which there are 3-5 different standards. I think we won't break a leg by adding 2 broad ones with 2-3 narrow versions each. And adding German phonology is one of the things I wanted to do on Wiktionary, but every time I try to make the entries a little less misleadingly generalising, @Kolmiel pops up, reverts my edits and barks at me to get consensus first. Which I can't get because people can't be arsed to have a discussion. Catch-22, thanks very much. As for the Wikipedia article, according to Kolmiel, about half of its information is currently verboten. Also, who's going to do the work is perfectly irrelevant for making rules on what's to be included and what not. Certainly nobody's gonna do anything if it keeps getting reverted, innit. Korn [kʰũːɘ̃n] (talk) 22:58, 9 October 2016 (UTC)[reply]

I think our pronunciation practice is fine. It keeps things simple and intelligible to at least a mentionable minority of users. What is very wrong is to say that the "northern German standard" has no relevance to southern Germany, Austria, Switzerland. This is nonsense. Bavarian and even Austrian radio news are now commonly read in a distinctly northern accent, with -ig pronounced -ich and everything. Swiss television has even hired what must be German readers to speak over their reports. Apart from that, I guess Korn will have his way anyway and everything will be messed up. Do what you want. I don't care that much. Kolmiel (talk) 23:08, 9 October 2016 (UTC)[reply]

It's not right to argue about dialectal differences before the basic stuff is implemented. Regional standards can be implemented later in the same module via phonetic respellings, something like "|phon=wichtik|reg=at" (just an example), if you want to use the Austrian pronunciation of wichtig. Get your priorities right, people, and help Ben fix the module! --Anatoli T. ^{(обсудить}/^вклад) 06:55, 10 October 2016 (UTC)[reply]

Regional standards cannot be implemented later, since, which is my point, this is implementing a regional standard. It's simply the biggest and the one exerting most influence because of its prevalence. Starting with one first is nothing I am opposed to. What I'm opposed to is not labeling it and having it come to pass without prior thorough discussion of its ambiguities. And, as in any discussion, excluding information - not by not entering it but by prohibiting its entering - without a proper case made for it. Korn [kʰũːɘ̃n] (talk) 12:17, 10 October 2016 (UTC)[reply]

Korn, I don't object to labeling the German standard as "German standard" or whatever. But you seem determined to gum up the process to the point that nothing gets done. How about, as Anatoli suggests, we try to implement this German standard rather than just arguing? It's easy to change the module at any point to e.g. use r instead of ʁ, and changing the test cases isn't hard either. We've done it plenty of times in Russian, for example. Benwing2 (talk) 13:50, 10 October 2016 (UTC)[reply]

@Korn: Has a system similar to enPR/AHD been developed for German? If there is one, or if you create one, we could have {{de-IPA}} go from the lemma to dePR. Then for any dialect whose transcription you'd like to add, we can have a function to create that dialect from the dePR representation. I think that before anything can be done in terms of broad dialectal coverage, someone needs to make some big tables of correspondences, since your lists of changes are a bit overwhelming and difficult to understand. —John C5 14:39, 10 October 2016 (UTC)[reply]
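One way the lemma -> dePR -> dialect pipeline could be organised is sketched below. Everything in it is hypothetical: no dePR exists yet, the segment names are invented, and the two tables merely reproduce Korn's bitter example from earlier in this discussion ([bɪtʰɐ] for Germany, [pitːər] for Austria) rather than a vetted correspondence table.

```python
# Hypothetical abstract ("dePR") segments -> regional realisations.
# Values only reproduce the "bitter" example given earlier in this discussion.
REALISATIONS = {
    "DE": {"B": "b", "I": "ɪ", "T": "tʰ", "ER": "ɐ"},
    "AT": {"B": "p", "I": "i", "T": "tː", "ER": "ər"},
}


def realise(segments, region):
    """Render abstract segments as a narrow transcription for one region.

    Segments without an entry in the region's table pass through unchanged.
    """
    table = REALISATIONS[region]
    return "[" + "".join(table.get(seg, seg) for seg in segments) + "]"


BITTER = ["B", "I", "T", "ER"]  # hypothetical dePR form of ⟨bitter⟩
```

The design point is that adding, say, Swiss Standard German would mean adding one more table; the lemma-to-dePR step would stay untouched.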

The creation of exactly that kind of thing for German is what I wanted to initiate! I wanted people to discuss which variants to include and which values to put for the variables. I was assuming that Wiktionary had someone who'd know more about southern German than I do (which isn't that much) and that this would be done in no time. Korn [kʰũːɘ̃n] (talk) 17:46, 10 October 2016 (UTC)[reply]

Evidently you will have to be the one to do it, if you want it done; no one else appears to have the experience or interest. However, I really don't think this is necessary to get done before helping me create the test cases I've requested above. Benwing2 (talk) 18:28, 10 October 2016 (UTC)[reply]

You miss the point. I would have done this a long time ago, but people kept removing my edits. The point is not so much that I want people to do anything; the point is more that I want some consensus that these things can be done. ps.: And of course it is relevant to decide what the expected results of test cases should be before test cases are added. I have no idea what you want your module to put out and I'm surprised, for example, that the module expects ⟨Quatsch⟩ to be [kvatʃ] instead of [kfatʃ]. Korn [kʰũːɘ̃n] (talk) 19:19, 10 October 2016 (UTC)[reply]

I don't know why people are removing your edits; they would seem fine to me. As for Quatsch, I simply copied the pronunciation found on that page; it should be fixed there if it's wrong. Benwing2 (talk) 19:31, 10 October 2016 (UTC)[reply]

And here the circle closes, because this is exactly the kind of discussion I wanted to have. This isn't wrong; this is one of two options ([kfatʃ]/[kʋatʃ]). And we must decide whether the expected result of the test case should be one, the other, or both. We cannot not decide. Do you see my point now? We can't add any test cases without deciding one way or another what we want tested. Korn [kʰũːɘ̃n] (talk) 21:14, 10 October 2016 (UTC)[reply]

@Korn You think you're getting your point across, but you're not. Instead of helping, you seem to be sabotaging, albeit involuntarily. You can use test cases for STANDARD German (Northern, Germany-centric, Bühnendeutsch, whatever). Any deviations or variants could use phonetic respellings and regional labels. --Anatoli T. ^{(обсудить}/^вклад) 21:35, 10 October 2016 (UTC)[reply]

I am obviously not getting my point across because not a single person has yet understood what I'm saying. Bear with me one more time, I'll try to make it absolutely clear. To repeat: 1. There is more than one standard. There is not the standard. 2. I am fine with starting out with one of the standards, and have it be the northern one. 3. EVEN THEN the northern standard sometimes has multiple valid realisations. MULTIPLE. Not 'one and some deviations', multiple. equally. valid. standard. realisations. of. equal. standardness. To give another English analogy: If I were to say: 'Received Pronunciation realises ⟨ol⟩ in one of two ways, [əʊl] and [ɔʊl], which one do we use?' - And all I'm hearing instead of a reply is: 'Nobody cares about this, just put the RECEIVED PRONUNCIATION.' If you don't want to pick one form, what do you want to do? Put a picture of a dancing elephant instead of letters? Korn [kʰũːɘ̃n] (talk) 21:50, 10 October 2016 (UTC)[reply]

Just choose ONE standard and choose a label for it. You do realise that showing multiple realisations may be difficult, right? Consider this simple example of two standard pronunciations of the same word:

ähneln (IPA follows)
ähneln (phonetic respelling: ehneln) (IPA follows)

Benwing will come up with some tricks, I'm sure.--Anatoli T. ^{(обсудить}/^вклад) 22:01, 10 October 2016 (UTC)[reply]

Do you notice how 'just pick one' is the first actual reaction to my point that I've gotten AT ALL? How this is actually some form of guideline or method that allows us to enter test cases with any form of consensus? How this is a jolly lot more than refusing the very question? Korn [kʰũːɘ̃n] (talk) 22:07, 10 October 2016 (UTC)[reply]

It seems like you're downplaying the pre-eminence of the Northern standard (known elsewhere as Standard German), it's the "artificial language of newsreaders" that dictionaries, learning materials and radio broadcasts use. Dialects may be used in the most formal of settings, but as evidenced by your 20 points, they are not standardised. Crom daba (talk) 22:13, 10 October 2016 (UTC)[reply]

Newsreaders use [ʃtaʁk], [ʃtark] and [ʃtaːk] with equal standing, depending on nothing but personal inclination. So are newsreaders speaking a non-standardised dialect now? Should we do away with the whole concept of German pronunciation because it isn't regulated to the last iota and replace it with dancing elephants? Or should we just be productive for a fraction of a second, pick one (or give the editor a guideline for picking one), and get on with the module? This discussion does not seem capable of that at all. This issue is not nearly big enough to merit this discussion in the slightest; it has been ludicrously blown out of proportion. I would have just gone and entered something by now, but, and this is the critical point, there's a history of things being removed if I cannot point to some form of consensus. That consensus apparently just cannot be achieved, because people for some reason keep thinking I'm discussing this in order to enter medieval Cimbrian as standard German. Korn [kʰũːɘ̃n] (talk) 22:29, 10 October 2016 (UTC)[reply]

Do you think this situation is unique to German? Standard German phonology has been described many times and is used in many places. Dictionary publishers won't get stuck on variations. If I were like you, we would still be arguing about whether the Russian часы́ (časý) should also include the pronunciation [t͡ɕɐˈsɨ]. Many native speakers, including educated ones and even newsreaders, would pronounce it that way, especially those from the South or from Ukraine. The standard and most common Russian pronunciation of the word is [t͡ɕɪˈsɨ]. A suggestion: just use [ʃtaʁk] for stark, but be consistent about it! BTW, as an active German editor, and thus one of the few dictionary makers here, you even have the chance to choose your favourite accent. We'll have to trust your judgement. Someone may dispute your decision (which is also OK!), but we need to have something happening. --Anatoli T. ^{(обсудить}/^вклад) 23:49, 10 October 2016 (UTC)[reply]

Korn, we all do want to get on with the module. I'm still not sure which changes of yours are getting reverted. I think the number of actual issues involved in rendering standard Northern German will be small, and when in doubt, err on the side of being more conservative. Hence use [ʃtaʁk], not [ʃtaːk], and maybe [kv-] instead of [kf-]. But overall these are minor issues, not the major issues that are involved in trying to handle multiple dialects. In many cases I think people will defer to your judgment. Similarly, I've largely deferred to Anatoli's judgment because he's a native Russian speaker and has very good linguistic intuitions. Occasionally we have had discussions over how much detail to render or how to handle things like palatalized affricates, but these have not been grave. Benwing2 (talk) 00:05, 11 October 2016 (UTC)[reply]

Yes. Yes. These are minor issues. Which is exactly why it's so frustrating that it takes a page-long conversation before somebody says something productive like 'just pick one' or 'when in doubt, use the more conservative one'. This is exactly the kind of thing I asked about. Right after I posted my 20 points, you (plural) could have said: 'I get that there's pluricentricity, but foreigners abroad are taught German German, so I think it's best if we start with the biggest variety of that. Put something you feel is adequate for now, and if there's variation in there, just use the more conservative one.' We would have been done with this two days ago. Anatoli, with all due respect, which I do have for you, all of you: the reason that dictionary publishers don't get stuck on variations is probably that they just decide on one instead of making a hubbub about the question being raised. And in all other publications, pages don't get torn out by random passers-by who disagree with that choice. I think we're all somewhat annoyed by this kerfuffle; I hope this isn't dampening anyone's good spirits. Korn [kʰũːɘ̃n] (talk) 07:49, 11 October 2016 (UTC)[reply]

Which constructed languages belong in mainspace?

The status quo: CFI allows for Esperanto, Ido, Interlingua, Interlingue (Occidental), Lojban, Novial, and Volapük to be in mainspace; all other constructed languages must have their entries in appendices.

The problem: I suspect some of them have so little material that nearly all our entries would fail RFV. (In fact, Klingon is not allowed in mainspace, but its durably archived corpus probably rivals that of Lojban, for example.)

Proposed solution: Constructed languages with small corpora shouldn't be in mainspace. Esperanto without a doubt, and probably Ido and Volapük as well should stay. The others should be moved to Appendix space like Toki Pona and Quenya. I would like to get some feedback before I start a vote on this. —Μετάknowledge^{discuss/deeds} 21:12, 9 October 2016 (UTC)[reply]

I agree that Esperanto, Ido, and Volapük should probably stay. For the other languages, I share the concern that the corpora may be too small. I'm not very familiar with Interlingua, Interlingue, or Novial, but for Lojban, very common words like .i and la appear to be citeable from Usenet, but I can't seem to cite an ordinary word like muvdu. —Mr. Granger (talk • contribs) 15:47, 10 October 2016 (UTC)[reply]

It wouldn't bother me to have only a dozen or so entries for one language, if that's all that can be cited. The problem is the creation of entries based on dictionaries or other criteria when the words don't meet CFI. It's a bit like users who enter a load of -phobia entries; I see these as about the same. Renard Migrant (talk) 15:58, 10 October 2016 (UTC)[reply]

So should we have a policy that words from these languages can be deleted on sight if no citations are provided? DTLHS (talk) 16:04, 10 October 2016 (UTC)[reply]

My own view is that we shouldn't have languages in appendices at all. Either they should be in mainspace or nowhere at all. —CodeCat 17:08, 10 October 2016 (UTC)[reply]

Why? —Μετάknowledge^{discuss/deeds} 19:55, 10 October 2016 (UTC)[reply]

@Ungoliant MMDCCLXIV, in case you're interested in contributing. —Μετάknowledge^{discuss/deeds} 23:12, 16 October 2016 (UTC)[reply]

We have a dozen or more natural languages from which only 1-3 words meet CFI, so there is no inherent problem in having languages which have only a few entries after the razor of CFI has been used to shave them, as Renard says. But I agree with Metaknowledge that it makes little sense to allow some sparsely-attested artificial languages in the mainspace but not others. - -sche (discuss) 04:25, 20 October 2016 (UTC)[reply]

Comment, because Klingon was mentioned: in previous discussions, it has been noted that Wiktionary cannot include many words in Klingon, Dothraki, and some other languages (Quenya?) without running into legal/copyright problems. - -sche (discuss) 04:25, 20 October 2016 (UTC)[reply]

Why we don't need durable citations

It's generally asserted that durable citations are required to pass RfV. I think this requirement is needlessly bureaucratic and should be abolished. While it might be preferable to have durable citations, I don't think it's imperative. Here's why:

In today's world, a greater percentage of content is on not-necessarily-durable websites rather than in print media or on durable websites.
Compare this to books: we don't link to every book we have, and we not infrequently cite rare books of which Joe Avg is unlikely ever to obtain a copy.
Some things don't stay up forever, but many things do. Instead of assuming that something will eventually disappear, we could assume that it won't (FWIW, Wikipedia makes the latter assumption; it doesn't require durable references)
If the quote that uses the word in one or more sentences is already on Wiktionary, do we necessarily need a link to anything anyway? Purple backpack89 05:14, 10 October 2016 (UTC)[reply]

We should expand what is considered as "durable", but not reject it. We still need quotes that we can check up on later, and won't simply disappear (as many of Wikipedia's links do). I can think of times when someone misread a source and only later, by looking at the same source, could we determine what it actually said for the purpose of citing it. —Μετάknowledge^{discuss/deeds} 05:28, 10 October 2016 (UTC)[reply]

We can't just accept any words from random websites or else there's some danger that we will get flooded with things that would otherwise go on AP:LOP. You said elsewhere: "[Acela Republican is] used a lot on the Internet, and, since it's a word that postdates the decline of print media, that oughta be good enough." but by your own count, there are 744 Google hits for that term, which I consider a small number. Only 744 hits seems indicative of a minor trend in using that term. Do you have any specific criteria in mind? I wonder if all English words that are used (not just mentioned) on more than, say, 10,000 websites are likely to appear in books anyway. --Daniel Carrero (talk) 05:34, 10 October 2016 (UTC)[reply]

744 may seem like a small number compared to 10,000, but it's a big number compared to 3, which is the floor for number of durable citations we need. If something appeared in 20 or 30 non-durable websites, I think we ought to consider having it. As for your concern about protologisms, those tend to be weeded out more by the at-least-a-year rule than by the durability rule. Purple backpack89 05:38, 10 October 2016 (UTC)[reply]

Sorry, I understand your point but I don't find it very convincing. You may wish to create a vote with the proposal "include words that have 700+ uses on the internet" and see if many people support it. But in my opinion 1 durable citation is worth at least 10,000 non-durable citations... Actually, you may ignore my arbitrary math, I just meant that non-durable citations are really worthless (again, IMO) except maybe in very large numbers. --Daniel Carrero (talk) 07:20, 10 October 2016 (UTC)[reply]

10,000 to 1? Really? Why do you think non-durable citations are so worthless? Is it reliability? There are plenty of websites that most would consider "reliable" that aren't durable. By contrast, some of the durable things we have aren't necessarily all that reliable. Purple backpack89 17:53, 10 October 2016 (UTC)[reply]

Because if something is written in 3 books (especially 3 books easily accessible through Google Books), chances are the attested word is "set in stone" -- we won't have to worry about it and we won't have any trouble verifying it and reading the page for further context that may not be available in the quotation. If something is attested from a site that might disappear tomorrow, chances are we are going to have a headache when checking the quotation and attesting the term. But feel free to convince me otherwise if you want. --Daniel Carrero (talk) 18:05, 10 October 2016 (UTC)[reply]

But if something is common enough as to have 700+ uses on the internet, surely citing would be as easy as choosing another new source from the 700+ were a previous source to disappear? —suzukaze (t・c) 20:28, 10 October 2016 (UTC)[reply]

Sure, but that introduces an endless maintenance need that maybe can be avoided for most words. If a word is used so often on the internet that Wiktionary arguably needs to include it, what are the chances that the word already exists, too, in books and other durably archived sources? Do you know any specific words that Wiktionary might include by citing websites that can't be included by citing durably archived sources? --Daniel Carrero (talk) 20:34, 10 October 2016 (UTC)[reply]

Minority languages with obscure publications, conlangs, internet slang no one in their right mind would publish, etc. This is the phrase "to speak Teochew" in the Teochew dialect of Min Nan Chinese. It only has four pages of Google results. —suzukaze (t・c) 20:37, 10 October 2016 (UTC)[reply]

These are good examples, thanks. They are about the phrasebook, though. I believe there's no doubt that these sentences can be composed, so attestation is less important for them than for most single words. We could have a new policy like "allowing entries for translations of all accepted English phrasebook entries, in all languages, even if said translations have 0 Google hits." --Daniel Carrero (talk) 20:48, 10 October 2016 (UTC)[reply]

BTW, the "hot word" policy is relevant in a lot of cases. Equinox ◑ 15:48, 10 October 2016 (UTC)[reply]

I think there are loads of revisions we could potentially make here. For one, in actual fact WT:CFI doesn't say anything about copying citations up into entries. It's just assumed that that's what we do. But as someone once put it, nothing is specified, so nothing happens. CFI doesn't say you have to copy up the citations, so you don't have to. Of course, best practice is to copy them up, because then everyone can see them. But with hundreds of items tagged with rfv at a time, I'm certainly happy for anything that's definitely citable to go without the citations being copied up (cited, yes; citations copied up, no). Renard Migrant (talk) 15:56, 10 October 2016 (UTC)[reply]

Nobody's really ever wanted to define durably, and I think we should. As someone said, what about taking screenshots of websites and adding them to Commons? Commons is likely to last as long as this wiki is, why shouldn't that count? Renard Migrant (talk) 16:05, 10 October 2016 (UTC)[reply]

Commons itself might last, but the image theoretically might be deleted. But I do think that's actually a good idea. The more realistic issues are that images can be tampered with, and screenshots especially (in fact Google Chrome allows you to actually change the content of any web page right in your browser, maybe other browsers have this feature as well). And if we start including anything from the internet, we'd have to come up with more rules to limit typos and misspellings and other nonsense. And the three different authors rule would be difficult to follow if the authors are anonymous. --Wiki Tiki 89 16:16, 10 October 2016 (UTC)[reply]

Maybe we can upload screenshots to enwikt, but not to Commons. I believe our current quotations and possible future screenshots are fair use, which Commons does not accept. If the website has a CC license or is in public domain, Commons might accept it. --Daniel Carrero (talk) 17:38, 10 October 2016 (UTC)[reply]

I see those options as essentially equivalent. --Wiki Tiki 89 18:03, 10 October 2016 (UTC)[reply]

Why even go that far? If we have the quote of it being used in a sentence and we put that on Wiktionary, shouldn't that be good enough? Purple backpack89 17:53, 10 October 2016 (UTC)[reply]

Copying errors. --Wiki Tiki 89 18:03, 10 October 2016 (UTC)[reply]

Because it's not durably archived. Someone might want to check the original for any of several reasons, and it may not be there. You take a screenshot and there's something to check. Renard Migrant (talk) 19:43, 10 October 2016 (UTC)[reply]

IMO, copying errors are too narrow a reason to dictate our entire attestation policy. Purple backpack89 20:17, 10 October 2016 (UTC)[reply]

What about having multiple editors verify the text of a quote before accepting it? Even if it changes we have the affirmation of multiple people of what the text used to say. —suzukaze (t・c) 20:29, 10 October 2016 (UTC)[reply]

That does not sound too good. Just imagine if all our current quotations had to be checked by other people. It's hard enough to "finish Wiktionary" and ideally attest all senses as it is. If there are any copying errors, they are going to be found eventually, if the website does not vanish first. --Daniel Carrero (talk) 20:51, 10 October 2016 (UTC)[reply]

I suspect the reason why we have this policy is to avoid drowning in Internet ephemera, it's not a perfect filter, but removing it would necessitate creating more policies regarding exactly what kinds of words we want and how to define them, which would increase rules lawyering and decrease the quality of the dictionary. Crom daba (talk) 21:14, 10 October 2016 (UTC)[reply]

It stops people from coining words on their own social media accounts and then citing those accounts as sources. Renard Migrant (talk) 11:06, 11 October 2016 (UTC)[reply]

People are already able to do that through Usenet, aren't they? --Daniel Carrero (talk) 11:33, 11 October 2016 (UTC)[reply]

I think we're kidding ourselves into thinking that durable citations keep only bad words out and only good words in. Purple backpack89 18:27, 11 October 2016 (UTC)[reply]

Does anyone actually think that, though? Renard Migrant (talk) 20:56, 11 October 2016 (UTC)[reply]

Words needing citations from the internet

After some thought, I decided to support getting citations from the internet. English already has a social-media-like place called Usenet from which we are allowed to get citations of internet slang and a number of random neologisms. But there are not a lot of Portuguese-speaking folks on Usenet, so our coverage of modern Portuguese terms may not be as good. Maybe the same goes for other languages too; I don't know.

Here are some Portuguese words that seem to be common enough on the internet but I was unable to find 3 citations for them on Google Books.

cospobre -- a very poorly done cosplay
democracídio -- "democracide", the act of killing the democracy
flopar -- to fail
nerfar -- to nerf (video game sense)
Olindar -- to spend time in Olinda, Pernambuco
qnd -- abbreviation of "quando" (when)
shippar -- to ship (fictional character relationship sense)
SQN -- abbreviation of "só que não" ("only not"), added at the end of a sentence
trisal -- polyamorous relationship consisting of three people
upar -- to upload

Also:

exactly 1 trillion and a half emoticons can be attested from websites, if nobody minds

--Daniel Carrero (talk) 17:54, 11 October 2016 (UTC)[reply]

Usenet is considered durably archived because multiple sites, financially stable, with a long history, actually have the archives. It is not a precedent for proprietary social networking sites, which may disappear or become inaccessible when the owner does or when the owner is in a tight financial bind. It would be better to look for what would get the multiple institutions with Usenet archives to add other classes of text data. DCDuring TALK 18:05, 11 October 2016 (UTC)[reply]

Apparently, the Library of Congress is/was preserving an archive of Twitter: [2]. The OED occasionally cites tweets. Equinox ◑ 18:07, 11 October 2016 (UTC)[reply]
The US Library of Congress ("LoC") was to be the recipient of a donation by Twitter of a few years of public Twitter postings. BYU also thought they would have that corpus among their offerings. BYU no longer mentions Twitter and I haven't found any discussion on the LoC site of the Twitter data. The discussion seems to have bogged down. They did not seem too eager to support unlimited access. DCDuring TALK 00:18, 12 October 2016 (UTC)[reply]

Then so should we.

Another thought, from a probability perspective: if a word is used 500 times on the internet, what are the odds that 498 or more of those citations will disappear from the Internet in 10 years' time? Purple backpack89 18:27, 11 October 2016 (UTC)[reply]

Maybe I'm short-sighted, but if a word appears 700 times on the internet, do we have any method to sieve out automatic copy-pasting, to make sure it's not just used 2x350 times? Korn [kʰũːɘ̃n] (talk) 19:17, 11 October 2016 (UTC)[reply]

This discussion was rooted in the belief that, to satisfy attestation, somebody would still have to find three different quotes spanning a year and all that... Purple backpack89 20:00, 11 October 2016 (UTC)[reply]

Yes, and upload a screenshot (a small cropped version, I hope) of the website to enwikt to ensure that we have the source content if the website vanishes. Should we have an additional rule of only attesting entries from websites if they have a lot of Google results in the first place? Just by creating an entry, we fill the Wiktionary mirrors with the same entry, thereby increasing the number of Google results to some extent. --Daniel Carrero (talk) 20:12, 11 October 2016 (UTC)[reply]

What does 'exactly 1 trillion and a half' mean? 1,000,000,000,000.5 or 1,500,000,000,000? And really exactly? Renard Migrant (talk) 20:56, 11 October 2016 (UTC)[reply]

Please don't take it literally. What I meant was: "An arbitrarily large number of emoticons can probably be attested if we accept citations from random websites." But, I thought it was clear that 'exactly 1 trillion and a half' means 1,500,000,000,000: that is "1 trillion and a half trillion". --Daniel Carrero (talk) 02:12, 12 October 2016 (UTC)[reply]

The screenshot idea is problematic. Screenshots are easy to manipulate, and you'd need a complicated scheme to find them when you want to ('cause you can't put them in the entry). It'd take a lot of time, so people probably won't do it, and you can't use screen readers (except maybe if we use PDFs), etc. Instead I suggest using w:Wayback Machine (see also w:Help:Using the Wayback Machine). The archiving can be done with a click and some JavaScript, and it's even bottable. We can limit sites that disallow archiving. Another way to limit problematic sites is to restrict citations to something like the top 100,000 (maybe too many) most popular sites in a certain country, as measured by w:Alexa Internet rank or something else. I agree that checking search engine hit counts is a good idea. —Enosh (talk) 08:50, 12 October 2016 (UTC)[reply]

See also w:Wikipedia:Link rot; this problem feels pretty well covered. —Enosh (talk) 08:59, 12 October 2016 (UTC)[reply]

Re "we can limit sites that disallow archiving": this changes over time. The archives of some of my own past domains have been hidden on Archive.org when a cybersquatter has taken over and applied a more restrictive robots.txt. (The same is true of Google's Usenet archive, where you can apply to have your old posts hidden; but I think we have assumed Usenet to be archived by more people than just Google.) Equinox ◑ 13:21, 12 October 2016 (UTC)[reply]

w:Wikipedia:Link rot basically suggests: "Don't delete a citation just because the link is now broken! Search for copies of the citation instead! You can even use internet archives!" Is this something we can implement on Wiktionary? Basically, we would not need screenshots, we would only trust that all current citations are correct, but if a link is broken, we can fix it ourselves or we can open an RFV to verify an existing citation that is a broken link. We may want to use a separate request page for that, like WT:Requests for citation check or something. We could have the rule that if a citation can't be double-checked at a later date, it is invalidated. (like when a new robots.txt disables old, archived pages)

As a related subject, should we be able to accept citations from movies, video games and music for attestation purposes? I would be happy to accept citations from Brazilian movies because I'm under the impression that sometimes the characters use dialectal/regional speech that may be difficult to find in books. When something is written in a video game or movie (as opposed to said out loud), we could upload a screenshot, I suppose. The fact that a screenshot is easy to manipulate is not a huge issue, is it? Text citations are the easiest thing to manipulate, and we can't disallow text citations based on that. We will just have to be able to keep double-checking citations when we want. If a certain video game disappears forever from all computer systems (which sounds unlikely to me), we may even choose to invalidate citations and delete screenshots taken from it. --Daniel Carrero (talk) 13:54, 12 October 2016 (UTC)[reply]

We do have a handful of citations from games (e.g. deathmatch), films (hubba hubba), and songs (bootylicious). In some cases these are the simplest way to find a word (e.g. hip-hop slang) or the earliest practical citation. I don't think it's much harder to get hold of these media than a book. They seem pretty "durable". Equinox ◑ 13:57, 12 October 2016 (UTC)[reply]

~~Maybe WT:CFI should say it explicitly? "We accept citations from books (visit Google Books!), Usenet, video games, music, movies..."~~ --Daniel Carrero (talk) 14:02, 12 October 2016 (UTC) Never mind, it does, except for video games and songs. --Daniel Carrero (talk) 14:05, 12 October 2016 (UTC)[reply]

@Equinox: In fact, it's often much easier: you can just search for the song and stream it in seconds. I think that adding some kind of real-world usage from mass media is superior to hypothetical but illustrative examples (e.g.) —Justin (koavf)❤T☮C☺M☯ 14:02, 12 October 2016 (UTC)[reply]

There is, of course, the issue that songs mostly don't come with lyrics (unless perhaps on the album sleeve), so it's hard to prove that a particular word and spelling is what the song contains. But that probably needs a separate discussion: this indentation is getting crazy. Equinox ◑ 16:23, 12 October 2016 (UTC)[reply]

I agree with Daniel's above suggestion to allow citations from anywhere, trust current citations as correct, and fix them if challenged and/or as needed. Purple backpack89 15:37, 12 October 2016 (UTC)[reply]

Proposed CFI change

If we want to accept citations from the internet for attestation purposes, I believe the right place to edit would be WT:CFI#Attestation, as described below.

Currently, that section contains these items (I'm not copying the vote references):

clearly widespread use, or
use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year (different requirements apply for certain languages).

We could add a new item and move a portion of text below the list, this way:

clearly widespread use, or
use in permanently recorded media, or
use on the internet, in sources that remain publicly accessible over time; if a link is broken, try searching for the same content in archives and other sources, otherwise the current citation is invalid.

For attestation purposes, all citations must convey meaning, in at least three independent instances spanning at least a year (different requirements apply for certain languages).

Maybe we can use WT:RFV to request review of existing citations, but that page is already too large and unwieldy. It's more than twice the size of WT:RFD, WT:RFM, WT:RFDO or WT:RFC. For this reason, I think it's better to create a new page for checking existing citations. Maybe it should be called... I don't know. Maybe WT:Requests for citation review (WT:RFCR) or something else. --Daniel Carrero (talk) 00:33, 13 October 2016 (UTC)[reply]

I support expanding CFI to include Internet citations in some form or other. This would make it ten times easier to cite slang terms, especially ones like savage that have common meanings that would dominate a Google Books search (note that that example is missing a slang sense because I haven't been able to easily find citations). Andrew Sheedy (talk) 01:42, 13 October 2016 (UTC)[reply]

I support in principle, but we need to spend a lot of time and effort to construct a viable policy. The above is totally insufficient and I oppose it. --Wiki Tiki 89 15:21, 13 October 2016 (UTC)[reply]

Do you see any specific problems with the currently proposed text? --Daniel Carrero (talk) 15:38, 13 October 2016 (UTC)[reply]

It's not the text, it's that we need to figure out a system, and the system described in your proposed text is insufficient to weed out garbage and does not specify any limits on how frequently or infrequently internet citations need to be re-checked. It contains no protections against sites that are frequently edited. The problem is that this is actually a hugely significant change and so it requires a great deal of careful thought and discussion before we arrive at a workable system. --Wiki Tiki 89 15:52, 13 October 2016 (UTC)[reply]

Sure. I hope this is a good start. I think it would be a bad idea to have any time limits to re-check entries, people can do it whenever they want and use the proposed WT:Requests for citation review when necessary. Additionally, maybe bots can crawl all the links and search for 404 and other errors, and tag entries automatically. Also, if we consistently fill the accessdate= parameter of web quotations, then years later we may search for the earliest accessed entries to see if they are still OK. If a certain hosting service is disabled, like the old Geocities, we should be able to find all citations that use it, to see if we can fix them or should invalidate them.

Should we have protections against sites that are frequently edited? If we use wikis for citations, we may link to the edit histories. If there are no edit histories, we may search the internet archives, or invalidate that citation if all else fails. We should probably ban citations from Wiktionary discussions themselves, to prevent circular logic. If we allowed citations from our discussions, then only 2 citations from external sources would ever be needed, because we could easily fabricate the third, which I believe would be unfortunate.

I think we should only accept citations when it's clear who the author is, not only to credit them, but because I'm afraid those Tumblr articles with "Source: Tumblr" and nothing more are deeply unprofessional. This would probably weed out many random clickbait articles like "20 Signs You Spend Too Much Time Building Dictionaries". In fact, we probably should ban all clickbait and advertisements altogether, because I don't want the word "delicious" to have a citation reading "You should eat a delicious Big Mac™." This would be bad in citations from magazines and newspapers, too, unless perhaps they are from ages ago and the word is used in a meaningful way. --Daniel Carrero (talk) 16:31, 13 October 2016 (UTC)

What's to prevent people requesting re-checks of the citations every other day? That's one reason we need limits. As for editable pages, with or without histories, there will frequently be errors (not just typos, but misuses of words, and other such things) that are later corrected, so the fact that we are able to point to some particular revision does not make that the right revision to point to. In print media, there is a lot more proofreading going on, which is why we don't have to worry about it so much, but on the internet it becomes a problem. And even just having the ability to edit pages after posting makes it less likely that authors will proofread themselves before posting. --Wiki Tiki 89 17:15, 13 October 2016 (UTC)

But RFV, RFD, RFDO, RFM and RFC don't have a rule like "don't create new requests every day!", and we don't seem to have that problem. I just double-checked the introductions of those request pages. Wiktionary:Voting policy does contain "No topic should have a new vote more than once a day (24 hr period)."

WT:RFV includes: "Those who would seek attestation after the term or sense is nominated will appreciate your doing at least a cursory check for such attestation before nominating it"... Based on it, we may require people to check the internet archives by themselves before making a new request for citation-checking.

I take your point: "the fact that we are able to point to some particular revision does not make that the right revision to point to". Wikis or not, basically all pages on the internet are editable by their authors. Maybe we should have higher quality standards and ignore one-off uses of words... If a typo like "esaclator" appears even in a book, we won't create a new entry esaclator because of it. We are able to tell the difference between mistakes and actual words, right? Or maybe not? --Daniel Carrero (talk) 17:46, 13 October 2016 (UTC)

RFV doesn't need a rule at the moment because once something is attested it cannot become unattested. RFD/RFDO, however, does have a rule that you can't nominate something that has already passed RFD (without a substantially new reason or there having been a change of practice or policy since the last RFD). At RFC, you wouldn't nominate an entry that has just been cleaned up because it is already clean. With this new internet stuff, however, something can be attested one day and unattested the next, but it would be too time-consuming to re-check the same word every day. As for editable pages, if someone spells selfie as selfy, we might use it to attest selfy as an alternative spelling, but then the author comes back the next day and corrects the spelling to selfie because it was really just a mistake. How are we to know? I would say we need some sort of standards, such as only using content that is professionally proofread or something like that. --Wiki Tiki 89 18:19, 13 October 2016 (UTC)

I'm not sure I agree with attesting entries only from content that is professionally proofread. I assume we would want to use sentences with random orthography and abbreviations like h8 = hate to build up our database of internet slang and text messaging slang. That's one kind of thing we are already able to find in Usenet. Usenet is not professionally proofread.

For non-internet slang cases, yes, we may want to implement a number of quality standards. We might have a specific rule saying that a nonstandard spelling that was later fixed becomes invalid as a citation.

I'm still not sure that we need a time limit. I could even suggest having an explicit time limit of 30 days or something if people really want it, even though I don't see the need.

But let's see how it would work out with an example: Suppose we visit http://www.snopes.com/luck/chain.asp (Snopes is professionally proofread, I believe) and get one citation from that page. In the first large paragraph, the first sentence is: "The practice of circulating letters to other parties beyond their original recipients has existed for centuries, so pinpointing the exact origin of chain letters is problematic." We could attest pinpoint using that sentence.

Suppose that tomorrow I visit the page again, and witness a very unfortunate turn of events: the owners of Snopes decided to close the website forever. I believe I would have to search for archives to see if we can keep the citation. If I don't find any suitable archives, I may create a citation review request.

If Snopes is not closed and the citation still exists on the original page, or if I already found a usable archive, there's no need to request a citation review. In what circumstances would someone be able to create a new request over and over for the same entry every day? --Daniel Carrero (talk) 19:03, 13 October 2016 (UTC)

The way it's worded right now, absolutely not. In principle, should we include more things that are common on the Internet but uncitable under our current CFI? Yes, we should. I would favor replacing 'clearly in widespread use', which I suspect is getting at the same thing, since other things that are clearly in widespread use would pass the three-citation rule anyway. Let's rewrite that line. Renard Migrant (talk) 19:12, 13 October 2016 (UTC)

Usenet is not modifiable, so it's less of an issue. I never said "time limit", I just said "limit", which can be any sort of limiting factor. Professional proofreading is an example of something we can use to limit garbage; that doesn't mean it's the only thing we can do. Another thing discussed was increasing the citation requirements to 700, but that is probably beyond our research capacity. Right now we're in the brainstorming phase, so please don't make any concrete proposals any time soon. --Wiki Tiki 89 19:29, 13 October 2016 (UTC)

It's fine, I said "time limit" because I understood it as some specific limit to avoid new requests every day. Sure, I would probably support the general idea of having some limits. In WT:RFV#Acela Republican (to be archived at Talk:Acela Republican), Purplebackpack89 said "I got 744 [Google Web hits]". I think the idea proposed was that a word with 700+ hits deserves to be included somehow, but even if we accepted it, it wouldn't be necessary to add the 700 citations here.

By "please don't make any concrete proposals", you mean I shouldn't create a vote, right? Because, in my opinion, a vote right now would be pointless, but I believe we could still discuss what the edited CFI text could be, even if only partially. I don't have time to do it right now, though. --Daniel Carrero (talk) 19:43, 13 October 2016 (UTC)

No, I mean we don't need to be looking at specific CFI text yet. Not until we have a thorough understanding of how the new system is going to work. How do you verify that the 700+ hits are of the correct sense of the word? And that they actually are uses of the word? You need to check them all. --Wiki Tiki 89 19:50, 13 October 2016 (UTC)

I don't see a huge difference between saying: 1) "Let's have a rule saying that professionally proofread text is required." (no CFI text proposed) and 2) "Let's add in CFI: Professionally proofread text is required." (CFI text proposed).

I take your point that Usenet is not editable.

About the 700+ hits thing, you'll have to ask PB89. --Daniel Carrero (talk) 19:57, 13 October 2016 (UTC)

The difference is that it's an idea. It's not a whole proposal, it's only a part of the bigger picture. And it's also not necessarily a good idea, it's just an idea. It needs to be discussed first. Also, proposing the actual CFI change draws too much attention to the language, which distracts from the content itself. --Wiki Tiki 89 20:02, 13 October 2016 (UTC)

Ok, I understand. --Daniel Carrero (talk) 20:05, 13 October 2016 (UTC)

Requiring six citations from the internet

Would it help to require six citations instead of the usual three when citing from the Internet? In other words, we'd count them as only having half the value of a citation from a published work (and thus you could use 1 quote from a book and 4 from the Internet to meet the requirements). That might decrease the possibility of typos and spelling errors being used in citations. Andrew Sheedy (talk) 01:23, 14 October 2016 (UTC)

Maybe, that seems worth discussing. Some of the limits discussed above still should apply, IMO. If someone writes "selfy" and later fixes it, (in a wiki or otherwise), as mentioned above, I don't think it would serve as a good citation for selfy, even if we require 6 citations.

I think CFI or a separate page should comprehensively list which sources are known to be durably archived, like this:

Books
Usenet
Video games (which games? all of them?)
Songs
Google Scholar (I guess?)
etc.

--Daniel Carrero (talk) 14:03, 14 October 2016 (UTC)

A quote from the man who created the Dead Media Project:

2001, Bruce Sterling, Digital Decay, retrieved February 7, 2012:
Originally delivered as the keynote address for Preserving the Immaterial: A Conference on Variable Media at the Solomon R. Guggenheim Museum on March 30, 2001
Bits have no archival medium. We haven't invented one yet. If you print something on acid-free paper with stable ink, and you put it in a dry dark closet, you can read it in two hundred years. We have no way to archive bits that we know will be readable in even fifty years. Tape demagnetizes. CDs delaminate. Networks go down.

-- DCDuring TALK 15:52, 16 October 2016 (UTC)

Are most websites reasonably expected to fade away in a few decades? (Some defunct hosting providers like Geocities and hpg.com.br come to mind, because naturally when they were shut down, the websites they hosted went down too.) Archive.org looks reliable enough... There's always the threat that changes in a website's robots.txt will delete the archives of that specific website within Archive.org, but we could simply reject any citations if they can't be found in the archives anymore.

I don't know what will happen in 200 years, but if by any chance the copyright laws remain the same, apparently all that remains of today's internet will be in the public domain, and future internet archivers should have more freedom to keep it if they want. (correct me if I'm wrong) --Daniel Carrero (talk) 20:51, 18 October 2016 (UTC)

Suggested rules

As discussed above, these are some of the suggested rules. I hope I didn't forget anything important:

Requiring 6 citations from the internet, instead of 3. (I'd probably oppose that as unnecessary. But it's okay if people want it.)
A given citation needs to be publicly available in order to count. It can either link to the original page or to live archives. If no archives can be found anymore, the citation does not count anymore.
We should probably have a separate page like WT:Requests for citation check if a given citation can't be found either on the original page or the internet archives anymore, to request people to keep searching before considering a citation invalid.
If we are citing a text that was later edited to remove the cited word, then our citation is invalid because the author was probably fixing a mistake. This includes revisions of text in wikis.
Only allowing citations if the author is known, either by the real name or by nickname. This is intended to weed out random memes and articles with unknown authorship. This should also weed out those fake quotations like "Wiktionary is awesome --Albert Einstein", in which the quoted author never actually said it. For wikis, we can probably use "contributors" and link to the history or something.
New idea: Disallow any clickbait websites that require you to click "next >>" 14 times in order to view a full article. Reason: For a bit of quality, please. Cracked.com is fine, in my opinion. (Sorry, that's subjective and probably needs discussion. We may want to disallow Cracked.com if this rule is implemented.)
Disallow any citations from ads. We don't want the entry delicious with a citation like "Eat a delicious Big Mac."

Rationale:

With the popularity of the internet, there are terms that can't be found just in books and other durably-archived sources, so allowing citations from the internet would allow our coverage to be more complete. This probably includes some internet slang, text messaging slang and emoticons. I listed some Portuguese words above.

--Daniel Carrero (talk) 23:16, 20 October 2016 (UTC)

How many votes?


Please let me know what a good maximum would be for the number of votes created by the same person at any one time.

Of the 12 "planned, running and active votes", I created 8. Only 9 of those are actually active (because they have already started and have not yet ended), and I created 6 of those... which is a little more than 1 vote per week.

"A little more than 1 vote per week" has been my actual rule of thumb for some time. --Daniel Carrero (talk) 19:34, 10 October 2016 (UTC)

I think it's also important to make sure votes are well-conceived. I would say that a lot of failed votes is a sign that something is wrong. A single failed vote isn't necessarily a problem but too many failed votes is just a waste of time on everyone's part. Benwing2 (talk) 19:38, 10 October 2016 (UTC)

Of my votes created in 2016, 18 fully passed, 6 at least partially passed and 14 failed. --Daniel Carrero (talk) 19:55, 10 October 2016 (UTC)

Of clean, well-crafted votes that don't require midstream revision or additional provisos not strictly speaking part of the proposal, I'm sure that we could do one a week. If the votes were open for five weeks, we could be reasonably sure that the realistic potential population of voters would have a chance to consider the proposals, even if they only came by once a month. Some care should be taken to avoid having too many votes at times when contributors may not be available, eg, beginning of semester, summer vacation. DCDuring TALK 00:07, 12 October 2016 (UTC)
Sure! I've been trying to learn from my mistakes, for example nowadays when I want to edit WT:EL, I prefer to propose a change to small portions of text at a time, because a vote trying to review whole large sections is usually very difficult to pass.

Unfortunately, even after multiple people supported the creation of a vote, sometimes people bring up new problems after the vote has started. Usually, these votes would just fail (though most of my 2016 votes have passed), which may result in better votes in the future for the same things. I think it's OK when people edit ongoing votes to fix minor grammar mistakes, though.

It seems that in practice, most BP discussions that are not about syllable categories are unlikely to get new answers after a week or so. By the second week, sometimes I feel the urge to think: "Well, consensus or not, this discussion is as good as it will ever be. It's over." I've been waiting some more time just in case, but when nobody comes, I secretly think: "I knew it!" --Daniel Carrero (talk) 15:37, 13 October 2016 (UTC)
In general, others are not as interested in a proposal that you make as you are, usually because the proposal solves no problem that they have. Also, many problems that they do have may not get addressed until they create some technical problem, at which point they are addressed (not necessarily solved, sometimes replaced by worse problems or inconveniences) without a vote.

I don't think it's OK that "minor" grammatical mistakes are fixed on the fly. They should not occur. Each such "fix" may change the proposal substantively in unintended ways. Presumably many who read the proposal once would have to reread it to determine whether the change was in fact "minor". The result will be that few indeed will read the proposal until the proposal text becomes stable. That is why we have BP discussions. (Other fora are not acceptable substitutes IMO.) Having votes that are uninteresting, eg, trivial, and imperfectly drafted will lead to simplified voting heuristics, such as "Always vote no" or more selective versions. DCDuring TALK 21:22, 13 October 2016 (UTC)

Most of my votes are to edit policies rather than solve technical problems. Maybe you could count proposals such as "install a new extension", "implement a new namespace shortcut" or "create a new user rights group" as technical problems, which clearly require votes. Most tech problems don't require votes.

I think it's OK if a policy-edit vote is uninteresting. When I create BP discussions and suggest creating votes, I believe I have never heard this specific complaint: "Don't create it, it's uninteresting!" But I understand that it may be a reason for a certain vote to have a noticeably low turnout.

If someone attempts to edit a vote that already started with the purpose of fixing a grammar mistake, I suggest you revert them if you want, but if multiple people prefer the revised version of that specific vote anyway, please cater to them. Grammar mistakes happen. If it's so serious, we can withdraw the vote and try again. --Daniel Carrero (talk) 19:19, 14 October 2016 (UTC)
IMO generally, policy should be written when there is disagreement regarding de facto practice. It should be pruned regularly to reduce creep. Clarifying policy often leads to quibbles, and discussions about doing rather than doing, and (ime) discussions about discussions about doing. I would say votes are the last resort when something has stopped people from doing. So, uhm, 1 every year or two, if you must. - Amgine/t·e 05:18, 1 November 2016 (UTC)
I'd like to keep discussing the number of votes until we reach an agreement. I do heartily agree with this: "[Policies] should be pruned regularly to reduce creep." But sorry, I disagree with this: "policy should be written when there is disagreement regarding de facto practice". I believe policy should also be written to inform new users who are not aware of consensus.

I believe 1 or 2 votes per year is really too low; why bother voting then? I've been focusing on suggesting updates to WT:EL: see this diff for changes in the last year. They require votes, otherwise anyone could edit the policy with abandon. Voted and approved policies are often solid policies. --Daniel Carrero (talk) 10:18, 1 November 2016 (UTC)
When a new contributor is not able to discover how to do something, and therefore asks someone how to do it, and questions on doing that something happen often enough to disrupt the workflow, then maybe it needs to be documented. This is not the same as writing policy. - Amgine/t·e 16:49, 1 November 2016 (UTC)

Adverb, prepositional phrase, adjective, ...?

Are things like at ease and à gauche best categorized as adjectives, adverbs or prepositional phrases? There are tons of things in CAT:English prepositional phrases and a few in CAT:French prepositional phrases; many more putative prepositional phrases are found in CAT:French adverbs, for example. Where is the boundary to be drawn? Benwing2 (talk) 19:36, 10 October 2016 (UTC)[reply]

I don't know a thing about the traditions of French grammar or lexicography.

Putting prepositional phrases into the word classes adverb and adjective is not traditional, but neither is treating prepositional phrase the way we do. Most dictionaries have them as run-in entries where they do not require their own word class. Grammarians would say that prepositional phrases can be used as adverbs or adjectives, but would not put them in the corresponding word classes.

In principle, every English prepositional phrase is just that. Recategorization would just be a matter of replacing the categorizing inflection-line templates. Merging the Adverb and Adjective PoS headers would require rewording definitions one at a time. It's not the kind of thing that most of our contributors are capable of or interested in. One would probably get disagreement as to the desirability of even the category change. DCDuring TALK 09:28, 11 October 2016 (UTC)

I like prepositional phrase as it covers both. Renard Migrant (talk) 11:04, 11 October 2016 (UTC)

I would keep in mind that things that look like they're used as adjectives may not necessarily be adjectives. This is stronger in French with its postposed adjectives than in English with preposed adjectives. Compare, for example, "this house here" to "cette maison ici". "ici" looks exactly like an adjective in this position, but is it? Care should be taken when judging prepositional phrases to be adjectives this way. —CodeCat 17:49, 12 October 2016 (UTC)

Formatting of cognates at Reconstruction:Proto-Celtic/kumbā

A year ago, there was Wiktionary:Beer parlour/2015/September#Formatting proposal: always put cognates in a separate paragraph, which has majority support. I've been implementing this in entries ever since, but now User:Victar has started reverting me. I pointed him to the prior discussion, but he dismissed it, claiming that as he created the page he's entitled to choose how he wants to format it. This isn't true, of course; there's no ownership of pages, anyone can edit anything, and decisions are made by consensus. Since this matter has no consensus and there are only two parties involved, there are two ways out: edit war into eternity or form a wider consensus. I'm choosing the latter option. So I'm asking now: how should the cognates be formatted, in their own paragraph or not? —CodeCat 17:37, 12 October 2016 (UTC)

Your proposal was simply that, a proposal. It was not ratified as a formatting guideline, and as such, it shouldn't be enforced with the same blind vigor. If we're simply talking about a matter of personal preference, which it is, then I have the right to have my own, and yes, I think especially as the creator of the entry. I strongly believe that putting cognates on separate lines unnecessarily pushes down the rest of the content. --Victar (talk) 17:51, 12 October 2016 (UTC)

Indeed, in that discussion it seems that a large portion of people expressed the opinion that having cognates in a separate paragraph should be an option, especially when the Etymology section is big, but not a requirement. If I had contributed to the discussion, that's probably what I would have said, too. —Aɴɢʀ (talk) 18:48, 12 October 2016 (UTC)

Edit protect tchýně

Can someone please protect this page? There's a bunch of Czechs who seem to think it's OK to ignore Wiktionary's descriptivist approach and are repeatedly inserting all kinds of POV appeals to authority. There's also some warring on the talk page because one of them posted in Czech, which is inappropriate for a discussion on the English Wiktionary. —CodeCat 19:12, 12 October 2016 (UTC)

I don't have the time to check that entry right now, but I see some kind of edit war ongoing. Please someone review that. I added "autoconfirmed" protection. Is it OK or does it need admin-level protection? Also, someone pretty please give CodeCat's admin rights back, thanks. --Daniel Carrero (talk) 19:21, 12 October 2016 (UTC)

It needs admin protection, they just did another edit. —CodeCat 19:25, 12 October 2016 (UTC)

Weird. That person who edited the entry now is not even autoconfirmed. --Daniel Carrero (talk) 19:28, 12 October 2016 (UTC)

@Daniel Carrero Well can you do it please? They're still at it. —CodeCat 22:47, 12 October 2016 (UTC)

Done --Daniel Carrero (talk) 22:50, 12 October 2016 (UTC)

Thank you! —CodeCat 22:53, 12 October 2016 (UTC)

@Daniel Carrero or anyone else: can you also protect dceřinná společnost and vyjímka? It's the same issue, they've just started messing with other pages instead. —CodeCat 23:24, 12 October 2016 (UTC)

For future reference: note in particular the edit history of these three entries. They consist almost entirely, from the moment Dan Polansky created them, of some editor changing it to "misspelling" and/or adding "incorrect" POV notes, and then various more experienced editors putting it back. —CodeCat 23:27, 12 October 2016 (UTC)

Done. Again, I'm just trusting you on these ones, because I don't speak Czech. If others want to review my page protections, I welcome them. --Daniel Carrero (talk) 23:30, 12 October 2016 (UTC)

I don't speak Czech either, but their edits and arguments seem like just an appeal to authority to me, ignoring Wiktionary's descriptive nature. I trust Dan's judgement more than any of theirs, since he's a native speaker and a knowledgeable Wiktionary editor. Another Czech editor has now provided further statistics to show that tchýně is widely used. —CodeCat 23:33, 12 October 2016 (UTC)

I understand. Yes, I see your point and I think you're right. I see that now you asked Dan Polansky to weigh in on Talk:tchýně, which is good. --Daniel Carrero (talk) 23:35, 12 October 2016 (UTC)

Well, it would have been polite if you (plural) had:

pinged us in this talk, so we could react here
pointed us to relevant descriptive pages explaining what is considered "descriptive"
not considered and (indirectly) called us less experienced, given that some of us have been in the wikiverse longer than you
started a discussion instead of reverting blindly without even stating your reasons; the reverting would not have been necessary if a clear explanation had been provided
followed the proper process for reaching consensus

This was very impolite behavior toward people who are de facto newcomers here; shame on you, this is not how users should be treated.

You have a chance to remedy at least the second and fourth points now, though...

@CodeCat: I don't think it is OK to ignore any rule I know about. So what you wrote in the second sentence of your opening post in this section is very unfair, because none of you bothered to point us to such a rule (as I've mentioned above). Not to mention that none of you followed the consensus-making process, so, as we say in Czech, "sweep in front of your own doorstep first".

Anyway, this whole situation is obviously one big misunderstanding, mostly because of a lack of communication (let alone proper communication) from the local folks toward us. I obviously cannot speak on behalf of the other Czechs involved, but I'm pretty sure that, like me, they would prefer a discussion to a tug-of-war.

So it would be constructive if you (plural) first explained why you keep putting "alternative spelling" instead of "misspelling" in those entries. (And no, simple word-count statistics are not a reason.) To begin with, you should at least clearly define these two terms, so we can move forward.

PS: Please also mind the "lost-in-translation" factor, which may play a big role here.

— Danny B. 01:43, 13 October 2016 (UTC)

Actually simple word count is a reason. Descriptive linguistics is all about describing language as it is used, not as people think it should be used. --Prosfilaes (talk) 12:33, 13 October 2016 (UTC)

Maybe we don't understand what is meant by "alternative spelling". For me, an alternative is a situation where I can use either encyclopaedia or encyclopedia in written text. But that is not the case here: when I use tchýně in Czech text, it will be considered a mistake. If I use it at school, I get a worse mark.

We provided sources from a university, from linguistics-oriented blogs, from newspapers, etc., but because this error is very common, does that count for more than the sources? OK, if i aply the same for inglish, i shuld writ as i hear and it wil be only altenative speling. JAn Dudík (talk) 18:19, 13 October 2016 (UTC)

That's what the usage note is for. Also, I added the tag "proscribed", perhaps no one will object to that? --Wiki Tiki 89 18:21, 13 October 2016 (UTC)

On the talk page, I suggested using {{nonstandard spelling of}}, but no one's responded to that suggestion yet. —Aɴɢʀ (talk) 18:42, 13 October 2016 (UTC)

I object to both proscribed and nonstandard; for more, see Talk:tchýně. google:"tchýně" shows how incredibly widespread this form is. The usage note in tchýně covers the matter, upholding descriptivist standards while at the same time accurately reporting the absence from Pravidla ("Rules") for those who find that fact relevant. --Dan Polansky (talk) 13:05, 15 October 2016 (UTC)

If you only count the Czech speakers involved, it is largely me against multiple others. This is probably because Czechs are brought up in a prescriptivist language culture. Many Czechs seem to think that if a spelling is absent from a regulatory list of approved forms, then it is "incorrect". That is one reason why the idea that consensus should only be made by the native speakers of a particular language leads to poor results. Instead, my position is that consensus is to be sought among all eligible English Wiktionary editors, whether they know any Czech or not, since even English, Dutch and Portuguese editors with no knowledge of Czech can distinguish prescriptivism from descriptivism. --Dan Polansky (talk) 13:15, 15 October 2016 (UTC)

@Dan Polansky: What do you think "proscribed" means? It means that many Czechs think it's wrong. --Wiki Tiki 89 15:46, 17 October 2016 (UTC)

Wiktionary:Glossary#P does not tell me what "proscribed" means. As far as I am concerned, the "proscribed" label could be removed from Wiktionary; it is suggestive of prescriptivism. Is there any English dictionary that uses the label? --Dan Polansky (talk) 17:30, 17 October 2016 (UTC)

What about going to Wikipedia and labeling the W:Homosexuality article with a "proscribed" box, meaning "many Americans think it is wrong"? Makes sense? --Dan Polansky (talk) 17:32, 17 October 2016 (UTC)

We are describing the fact that prescriptivism exists in Czech. --Wiki Tiki 89 18:05, 17 October 2016 (UTC)

Agreed? Homosexuality should get a nice red box "proscribed" in Wikipedia to indicate that anti-homosexualism exists in the U.S.? --Dan Polansky (talk) 18:13, 17 October 2016 (UTC)

That's not Wikipedia's style. Anti-homosexualism is discussed in the entry (or should be, I haven't checked). --Wiki Tiki 89 18:18, 17 October 2016 (UTC)

I don't see why it should be our style. In tchýně, I have used the usage note to indicate the term is not on the approved word list; the information is there. That is analogous to Wikipedia having no prominent box "proscribed".

Let me quote Ruakh from Template talk:proscribed: 'The problem with labeling a sense as "proscribed" is that, as Metaknowledge implies, it makes it sound like we are proscribing it. I prefer to write "sometimes proscribed" or "often proscribed", which I think makes it a bit clearer that we're talking about other people's proscriptions. And probably "condemned" or "criticized" would be better than "proscribed". —RuakhTALK 19:42, 13 September 2012 (UTC)'

I agree with Ruakh: it sounds like we are proscribing it. And we the dictionary are not proscribing anything. Given the current circumstances, I support deprecation of label "proscribed". --Dan Polansky (talk) 18:21, 17 October 2016 (UTC)[reply]

I agree with Ruakh as well, and if you had asked me to change it to "often proscribed", I would have done so. But I don't think it should be removed. Tags like this are already our style, it's no different from a "(dated)" tag. Why doesn't Wikipedia need to put a big red box at the top of w:Floppy disk saying "Dated"? --Wiki Tiki 89 18:40, 17 October 2016 (UTC)[reply]

It is different from dated in that there is nothing prescriptivist about dated or archaic. Put differently, the label dated is not an imperative in disguise. The whole disagreement is not about provision of information since, again, I stated in the usage note that the spelling is absent from the mighty uberlist, but rather about the prominence and tone of the labeling. In any case, "often proscribed" would do a bit to alleviate my anti-prescriptivist concern.

Furthermore, since German zumindestens is actually being proscribed by language teachers, should it be marked as "often proscribed"? Should all vulgar terms also be labeled "often proscribed", since they indeed are often proscribed? And since ain't is often proscribed, shall it be so labeled? You have to clarify how far you intend to spread the badge of shame that is "proscribed". --Dan Polansky (talk) 18:52, 17 October 2016 (UTC)[reply]

I think labels such as "colloquial", "vulgar", and "slang" already imply that it is proscribed. There is no reason not to have both the tag and the usage note. For some dated terms as well, we tag them as dated and also include a usage note specifying when it was used. --Wiki Tiki 89 18:58, 17 October 2016 (UTC)[reply]

Assuming the above, which I don't, but anyway: then why not use the definition line "informal form of" or "colloquial form of" and be done with it? --Dan Polansky (talk) 19:05, 17 October 2016 (UTC)[reply]

Do those actually apply to this word? --Wiki Tiki 89 19:35, 17 October 2016 (UTC)[reply]

They seem to: tchýně is how people very often pronounce the word, whereas tchyně is on the uberlist of the continental regulators. "tchýně" is how people very often write the word when not subjected to the rigor of zealous copyeditors; this is probably so because Czech has a rather phonetic spelling and "tchýně" matches the pronunciation. The other Czechs have to be "on guard" when writing lest they commit an "error".

Now, I cannot prove that people very often pronounce the word as "tchýně". I can only demonstrate that conspicuously many "tchýně" make it into printed works in Google books and to the world wide web, compared to "tchyně". --Dan Polansky (talk) 19:48, 17 October 2016 (UTC)[reply]

But anyway, "often proscribed" is so much better than "proscribed". Until we get "proscribed" removed entirely from the dictionary and become fully descriptivist again, it is okay, I guess. --Dan Polansky (talk) 20:00, 17 October 2016 (UTC)[reply]

I don't think the word "colloquial" can apply to alternative spellings. And to me, an "informal" spelling is something like thru. --Wiki Tiki 89 20:07, 17 October 2016 (UTC)[reply]

Elsewhere, I proposed to use informal and discontinue colloquial, so I am certainly okay with informal. "thru" seems much more informal to me than "tchýně", but I am no native English speaker. Interestingly, M-W's thru[4] does not seem to say "informal" or "non-standard". --Dan Polansky (talk) 20:15, 17 October 2016 (UTC)[reply]

I clicked on the link to this article given in the entry you linked to, which has some interesting information. Based on that, I assume that they don't mark it as informal because thru was technically attested before through. However, at the end of the article it says: "All that said, thru is still considered an informal variant of through, despite its history and the AP's limited approval." --Wiki Tiki 89 20:37, 17 October 2016 (UTC)[reply]

Very interesting link; thank you. I wonder why they do not mark "thru" informal straight away. Be that as it may, they do not mark "thru" non-standard or "often proscribed", right? Does any dictionary do that for thru? And then, what do schoolteachers think of thru?

I would love to follow the model of the modern Anglo-American lexicography and mark "tchýně" as "informal form of" or the like. --Dan Polansky (talk) 20:45, 17 October 2016 (UTC)[reply]

But I'm not sure that thru has an equivalent usage pattern to tchýně. Most people who write thru recognize that it "should" be spelled through but choose to ignore that fact for whatever reason (space limitations, laziness, stylistic considerations, etc.). I would think that most people who use tchýně do so either without knowing that tchyně is the more widely accepted spelling, or simply by mistake without thinking about it, or perhaps because they are insistent on spelling words phonetically. Am I correct about that? --Wiki Tiki 89 20:54, 17 October 2016 (UTC)[reply]

I don't see any meaningful distinction above. Again, "thru" does not seem any less informal than "tchýně"; indeed, the article you linked above mentioned that, when drive-thru was proposed to be placed on signs, at "an editor's conference in 2014, there was an audible gasp in the room when this was mentioned [...]: the decline of English in action!". If thru signifies the decline of English to many and still can be labeled informal rather than often proscribed, I don't see any evidence to suggest that tchýně should be "often proscribed". It is better than "proscribed", but I disagree with it, find it prescriptivist, and hope it will be gone. If I could edit tchýně, I would remove "often proscribed", thereby returning the entry to the status quo ante. I am thinking about trying my luck and getting rid of "proscribed" altogether from Wiktionary, but I do not think my chances are good: descriptivism is very fragile, and I actually found it surprising that it was upheld so well until now. --Dan Polansky (talk) 21:13, 17 October 2016 (UTC)[reply]

Are there any entries at all that need "proscribed" as a label before the definition? I agree with Dan Polansky's reasoning that this label is an imperative in disguise, and it indicates that we, the English Wiktionary, are proscribing a word or a sense.

For example, in my experience as a native speaker of Portuguese from São Paulo, Brazil, basically everyone writes mozzarella as mussarela (there are pizzerias everywhere selling pizza de mussarela, and markets sell slices of queijo mussarela by the kilogram). But one source online states that both Aurélio and Houaiss (two important dictionaries) use muçarela. This is because of a prescriptivist rule stating that words borrowed from other languages and then adapted to our orthography always use "ç", not "ss", in the middle of a word. I wouldn't want to add "proscribed" at the beginning of mussarela, because it would seem like we are proscribing it. --Daniel Carrero (talk) 12:47, 18 October 2016 (UTC)[reply]

More ise/ize

canonize/canonise is a typical example of -ise/-ize definitions being assigned to one entry and not duplicated over both. Given that one entry has to be chosen, by whatever method, to host the definitions, the treatment should otherwise be as symmetrical as possible. Presently this is not the case. While canonise is labelled "UK" or "British", no label is anywhere attached to canonize, giving the impression that canonise is a deviation from the norm. What is the best way to redress this -- given, as I say, that it is accepted that only one of the entries, in this case canonize, can host the definitions? Mihia (talk) 20:10, 12 October 2016 (UTC)[reply]

In actual practice, spellings like canonize are hardly used in British English, except by non-natives and American companies such as GM and Ford I suspect. DonnanZ (talk) 16:08, 13 October 2016 (UTC)[reply]

I agree, and I am aware of that, but the question is how best to indicate this in the entry for, say, canonize. In practical layout terms, where does the "US" label go? Mihia (talk) 17:43, 13 October 2016 (UTC)[reply]

Some British publishers do use the -ize spellings, notably the Oxford University Press, which is why British spelling using the -ize variants is known as Oxford spelling. It shouldn't be labeled simply as "US" (the way center or color should be) but should be labeled {{lb|en|US|Oxford}} or the like. —Aɴɢʀ (talk) 17:49, 13 October 2016 (UTC)[reply]

OK, but where should the label go? Mihia (talk) 19:19, 13 October 2016 (UTC)[reply]

I think having usage notes on every -ise/-ize verb would get pretty tedious. Can this be managed through context labels alone? Renard Migrant (talk) 19:28, 13 October 2016 (UTC)[reply]

I think so. I'd say the actual definition should be at only one entry, and the other entry should be marked as an alternative spelling, with the appropriate context labels. The problem is which entry to make the primary one. In the past, some have suggested using Google Books Ngrams to see which is more common across English as a whole (i.e. without specifying en-US or en-GB); in this case, that would be canonize. But I don't know whether there's a consensus to use that method. —Aɴɢʀ (talk) 19:48, 13 October 2016 (UTC)[reply]

Which entry to make the primary one may be another problem, but it is not one that I am concerned about here. All I am concerned about in this thread is practically how to label the "primary" entry (the one hosting the definitions), to show that it is US, UK, or whatever. Mihia (talk) 20:57, 13 October 2016 (UTC)[reply]

-ize spellings can be labeled {{lb|en|US|Oxford}} and -ise spellings can be labeled {{lb|en|non-Oxford}}. The form called the alternative spelling can use a from= parameter instead of {{lb}}, like this. —Aɴɢʀ (talk) 21:23, 13 October 2016 (UTC)[reply]

I see ... you have just put the label against every definition at canonize. I do not personally feel that this is a very satisfactory solution (in fact, I had more or less discounted it, as I suppose I should have made clear). I suppose it may be tolerable with just three definitions, but if you look at an entry like color/colour (let's assume it is agreed to merge the definitions, and not get into that debate specifically here), it would be tremendously tedious to have to repeat the national labels against every definition, along with the various other labels on top. I was looking for a neater solution. Mihia (talk) 22:12, 13 October 2016 (UTC)[reply]

Just noticed (don't know why I didn't notice before) that "colour" and "color" actually put the national labels just once, next to the headword. Perhaps that is the way to go ... Mihia (talk) 22:16, 13 October 2016 (UTC)[reply]

That's {{term-label}}. —CodeCa t 22:18, 13 October 2016 (UTC)[reply]

OK, thanks, I changed it to use "term-label". I think that is better, unless anyone has any other suggestions about how to handle this ... Mihia (talk) 22:30, 13 October 2016 (UTC)[reply]

I wasn't aware of {{term-label}}. That is a good idea. —Aɴɢʀ (talk) 19:51, 14 October 2016 (UTC)[reply]

Proposal: Redirect many single-character entries

Sometimes, when I propose a new thing, it's something that I've been thinking for years. This is one of those times.

Proposal #1: Redirect many separate single-character codepoints that basically mean the same thing. I believe it would be useful to create an exact, comprehensive list of the group of characters affected, if possible. I mean hard redirects, the ones that use #REDIRECT.

Proposal #2: For every redirected character, add {{character info/new}} in the main entry with the codepoints of all redirected characters. (or maybe another template if there are too many redirects for a single entry)

Here's a partial list. This assumes that all characters (including the emojis below) are attestable. These are all the examples I could remember for now. Please add more if you remember any, discuss if you want, etc. (there are no Han characters in this list; apparently we prefer soft redirects, at least for the traditional/simplified variants)

redirect (when possible) all specialized Roman numeral characters
- Ⅲ → III
redirect all fullwidth and halfwidth letters and symbols (I started a discussion recently about it here)
- Ｘ → X
- ＄ → $
- ￤ → ¦
- ﾃ → テ
- ﾢ → ㄲ
redirect all subscript and superscript characters
- ₖ → k
- ⁶ → 6
redirect all small caps characters (except when they have a separate meaning in IPA or something, like ᴀ)
- ꜱ → S
redirect all combining characters when possible (already voted and approved here in 2011)
- (combining acute accent) → ´
redirect single-character digraphs (already voted and approved here in 2011)
- Ǆ → DŽ
redirect single-characters that stand for multiple punctuation marks
- ‼ (double exclamation) → !!
redirect some specific characters for units of measurement
- ℃ → °C (or maybe soft redirect into both ° and C)
- µ ("micro-" sign) → μ (small mu) (it seems the "micro-" sign is used a lot in our entries, though)
- K (Kelvin) → K (the software already redirects this one automatically, it appears)
some random spaces like "EM SPACE" and "THREE-PER-EM SPACE" appear to be impossible to use in page titles, but they at least should probably have character boxes in Unsupported titles/Space, in my opinion
redirect Arabic presentation forms, I guess (I don't even speak Arabic, ignore if blatantly wrong)
- ﺏ, ـب, ـبـ, بـ (EDIT: I don't know how to write this, but ب should be the main entry, I believe)
concerning w:Hangul Compatibility Jamo and w:Hangul Jamo (Unicode block), redirect one to the other, I suppose
- ᄊ, ᆻ → ㅆ (we are already redirecting from normal to compatibility entries? shouldn't it be the other way around?)
it may be just me, but I'm not too happy with the small katakana word boxes -- I would redirect them and add {{character info/new}} to the full word entries
- ㌒ → キュリー
redirect all pieces and components of single characters
- ⎛, ⎞, ⎜, ⎟, ⎝, ⎠ (parentheses pieces) → ( ) (although these could redirect to either ( or ) I guess -- the ( ) feels more like the "main entry" to me)
redirect all "ornament" versions of other characters
- ❨, ❩ → ( ) (see comment in the item above)
redirect all vertical writing versions of other characters
- ⏜, ⏝ (vertical parentheses) → ( ) (see comment two items above)
redirect all fancy typography
- ❣ (heart-shaped exclamation mark) → !
redirect all emojis that basically are the same thing
- ⏳ (hourglass with flowing sand) → ⌛ (hourglass)
specifically, redirect emojis that mean the same thing but differ in color
- 💜, 💛, 💚 (fancy colored hearts) → ♥
specifically, redirect emojis that are basically the same expression of emotion
- 😭 (LOUDLY crying) → 😢 (crying)
- 😙 (kissing face with smiling eyes) → 😗 (kissing face)
- see the huge list from Unicode (link) for yourself, let me know if there are any problems with this one
redirect characters and emojis that mean the same thing but are arbitrarily rotated or inverted with no additional meaning
- ⦰ (reversed empty set) → Ø (unless the reversed one has any actual, separate meaning)
- ⌐ (reversed not sign) → ¬ (unless the reversed one has any actual, separate meaning)
- ⛉ → ☖ (white shogi piece)

Also, I think we should SOFT redirect these, because they are basically SOP and consist of multiple entries:

number + period
- ⒈ → 1 + .
- ⒉ → 2 + .
number + comma
- 🄂 → 1 + ,
number + parentheses
- ⑴ → 1 + ( )
letter + circle (...which sounds controversial, because the circle is just a typography thing; well, some like Ⓣ do have a separate, attestable meaning)
- Ⓑ → B + ○ (alternative idea: redirect Ⓑ → B as a single character)
number + circle
- ① → 1 + ○ (alternative idea: redirect ① → 1 as a single character)
random fractions... my point is: in my opinion, we don't need entries for random fractions
- ½ → 1, /, 2

--Daniel Carrero (talk) 20:48, 12 October 2016 (UTC)[reply]
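Many characters in the list above are ones that Unicode itself marks as "compatibility" variants of other text, which NFKC normalization folds away. The Python sketch below uses the standard unicodedata module to check which candidates Unicode already equates with something else; treating NFKC as a redirect heuristic is an assumption for illustration, not a Wiktionary rule. Two results differ from the proposal: ½ folds to 1, FRACTION SLASH (U+2044), 2 rather than to an ordinary "/", and the compatibility jamo ㅆ folds to the conjoining jamo ᄊ, not the other way around. Emoji, small capitals such as ꜱ, and rotated symbols have no decomposition, so they would need a hand-maintained list.

```python
import unicodedata
from typing import Optional


def nfkc_target(ch: str) -> Optional[str]:
    """Return the NFKC form of ch if it differs from ch, otherwise None.

    A different NFKC form means Unicode lists the character as a
    compatibility variant of other text; None means Unicode records no
    such equivalence (emoji, small caps, rotated symbols, ...).
    """
    folded = unicodedata.normalize("NFKC", ch)
    return folded if folded != ch else None


if __name__ == "__main__":
    # Examples taken from the list above; escapes make look-alikes explicit.
    samples = [
        "\u2162",      # ROMAN NUMERAL THREE
        "\uff38",      # FULLWIDTH LATIN CAPITAL LETTER X
        "\uff83",      # HALFWIDTH KATAKANA LETTER TE
        "\u2096",      # LATIN SUBSCRIPT SMALL LETTER K
        "\u01c4",      # LATIN CAPITAL LETTER DZ WITH CARON
        "\u203c",      # DOUBLE EXCLAMATION MARK
        "\u2103",      # DEGREE CELSIUS
        "\u00b5",      # MICRO SIGN
        "\u212a",      # KELVIN SIGN
        "\ufe8f",      # ARABIC LETTER BEH ISOLATED FORM
        "\u3146",      # HANGUL LETTER SSANGSIOS (compatibility jamo)
        "\u00bd",      # VULGAR FRACTION ONE HALF
        "\ua731",      # LATIN LETTER SMALL CAPITAL S
        "\U0001f49c",  # PURPLE HEART
    ]
    for ch in samples:
        target = nfkc_target(ch)
        shown = " ".join(f"U+{ord(c):04X}" for c in target) if target else "(none)"
        print(f"U+{ord(ch):04X} {ch} -> {shown}")
```

A None result does not mean a redirect is wrong, only that Unicode gives no help in deciding it.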

Support these sorts of redirects for anything where there is no distinction made when writing by hand as opposed to using Unicode (meaning that I'm hesitant to support #10, though I don't oppose it either). I'm not sure I support #19, at least for common emojis, due to the fact that some have slightly different connotations even if they are very similar. It would be very helpful, BTW, to include the list of all variations of an emoji in all entries for them, just as that page you link to does, since the way they are perceived can change depending on what version is used. Andrew Sheedy (talk) 01:58, 13 October 2016 (UTC)[reply]

Apparently, the separate "start", "end", etc. Arabic letter varieties only exist for compatibility purposes, for this reason I'm under the impression that these redirects would be fine. (correct me if I'm wrong)

If people want to keep a separate entry for each emoji, that could probably work, too. But there are just too many for the same things, in my opinion. --Daniel Carrero (talk) 18:32, 13 October 2016 (UTC)[reply]

I oppose the systematic creation of redirects. --Wiki Tiki 89 15:23, 13 October 2016 (UTC)[reply]

I'm thinking of creating a vote for a new page called WT:Single-character redirects and starting with only the ones about Latin script letters and punctuation, then adding the others later if other people agree. We can link CFI to the new page. The specific redirecting rules that were already voted and approved in Wiktionary:Votes/2011-06/Redirecting combining characters and Wiktionary:Votes/2011-07/Redirecting single-character digraphs can be kept in the new page. --Daniel Carrero (talk) 18:32, 13 October 2016 (UTC)[reply]

One thing I wish to add: If we find 3 citations with "!!", are we citing the codepoint "DOUBLE EXCLAMATION MARK" ("‼") or two exclamation marks together ("!!")? I agree with Andrew Sheedy's remark about "these sorts of redirects for anything where there is no distinction made when writing by hand as opposed to using Unicode". When the distinction only exists in the text encoding as opposed to being an actual linguistic distinction, I believe it's a good idea to create redirects.

In my opinion, having a separate entry for Ａ (fullwidth A) is like having another entry for italic A, other for boldface A, other for sans-serif A, etc.

In my opinion, having a separate entry for 🄐 does not make a lot of sense either, because it's basically a SOP of A and ( ). 🄐 is an entry whose meaning can be perfectly understood from the sum of its parts, and it appears to currently exist only because Unicode has a codepoint for it. --Daniel Carrero (talk) 19:03, 14 October 2016 (UTC)[reply]

‼ should be treated like ligatures such as ﬁ. In fact, I don't even think we should have entries for them at all. --Wiki Tiki 89 21:03, 14 October 2016 (UTC)[reply]

If my proposal of creating certain redirects as described above passes, then the single-character ‼ can be redirected to !!, but if the latter should not exist, then the redirect could point to !. Per WT:CFI#Repetions, we are able to have a lot of entries with repeated letters like suuure, so for consistency I believe it makes sense to create a few entries for repeated punctuation marks like !! and !!!. --Daniel Carrero (talk) 21:51, 14 October 2016 (UTC)[reply]

Minor comment: if you'd like to save smallcaps that mean something specific in phonetic transcription (#4), that logic immediately axes the redirecting of superscript letters (#3), which are regularly used in phonetic transcription for "overshort" sounds, such as the release of an affricate or a diphthong. (I'm also not sure how many smallcaps that aren't used in phonetic transcription there are, but it's not many.) --Tropylium (talk) 01:47, 16 October 2016 (UTC) [reply]

Thank you for the comment. Yes, I support keeping separate entries for all superscript letters that have a separate meaning. Apparently, some superscript numbers like ¹ and ² already have separate meanings and thus merit separate entries under that logic.

But I would suggest converting the entry ³ into a redirect. The 1st definition is "superscript three", which is not a semantic, meaningful definition; it just describes the typography of the glyph. The 2nd definition is "cubed", whose meaning is taken from 3 + Appendix:Superscript. Any superscript number (or letter) can act as an exponent, so "cubed" is not a special meaning of "3". In my opinion, having an entry ³ meaning cubed is like having an entry ×3 meaning "times 3". --Daniel Carrero (talk) 03:22, 16 October 2016 (UTC)[reply]

To suggest implementing the point #2 of this proposal, I created Wiktionary:Votes/2016-10/Redirect fullwidth and halfwidth characters. --Daniel Carrero (talk) 13:39, 21 October 2016 (UTC)[reply]

Terms attributable to a particular source

Does anyone object to adding this into the category hierarchy? Perhaps someone can think of a better wording? I mean it to house "___ terms coined by _____" categories (so far only Latvian has such categories, but I'm thinking of making a {{coined by}} template to automate such categorization) together with "___ terms coined in the Simpsons/the Economist/Usenet" or whatever else we feel the need to categorize. Thoughts? Crom daba (talk) 13:41, 14 October 2016 (UTC)[reply]

Excellent idea! That said, I would limit the categories for individual coiners to a predefined list of people and add the rest to a catch-all category, otherwise we will end up with a lot of terms coined by _____ categories with only one or two terms. — Ungoliant ^(falai) 12:07, 18 October 2016 (UTC)[reply]

Correct use of templates

This edit has the desired visible effect. Please advise whether it is the correct use of templates, or whether the result ought to be accomplished in some other way. Mihia (talk) 21:05, 14 October 2016 (UTC)[reply]

I think {{term-label}} is meant to be used after the headword. For alternative forms, we actually have a dedicated template {{alter}}. --Wiki Tiki 89 21:09, 14 October 2016 (UTC)[reply]

Thanks, {{alter}} does not specially understand the parameter "non-Oxford" or do the links or the brackets. I can almost replicate the "term-label" results like this, but I'm not sure if this may be more of an abuse of templates than just using "term-label". Also, the brackets come out in italics which they strictly shouldn't. Mihia (talk) 10:45, 15 October 2016 (UTC)[reply]

We could create Module:en:Dialects and create abbreviations for {{alter}} to recognize, the way Module:hy:Dialects and Module:grc:Dialects do for those languages. Otherwise, I'd just use {{q|non-Oxford British English}}; that way the parentheses aren't italicized but the contents are. —Aɴɢʀ (talk) 11:02, 15 October 2016 (UTC)[reply]

OK, thanks, I plan to use canonise/canonize as a model for definition-merged -ise/-ize entries, in terms of layout and labelling, so if there's anything you or anyone else finds unsatisfactory about the way those entries are currently set out, please say. Mihia (talk) 12:46, 15 October 2016 (UTC)[reply]

@Angr: I was recently surprised by the fact that there isn't a module for English dialects yet! I think I'll start it. — Eru·tuon 02:46, 16 October 2016 (UTC)[reply]

@Erutuon: That's fine, but wouldn't it be better to have {{alter}} call on the same dialect list that {{lb}} and {{alternative form of|from=}} already call on, i.e. Module:labels/data/regional? It seems redundant to have two lists. —Aɴɢʀ (talk) 07:27, 16 October 2016 (UTC)[reply]

@Angr: The same could be said about Module:grc:Dialects: I have added some Greek dialect labels from there to Module:labels/data/regional. I agree that it's redundant, but I'm not sure how to solve the problem. — Eru·tuon 18:09, 16 October 2016 (UTC)[reply]

@Erutuon: Couldn't {{alter}} be rewritten to use Module:labels/data/regional instead of, or in addition to, Module:XXX:Dialects? —Aɴɢʀ (talk) 09:16, 18 October 2016 (UTC)[reply]

@Mihia: I added Oxford to Module:en:Dialects, but wasn't sure what Wikipedia article to link to for non-Oxford. If you'd like to add that label, go ahead. — Eru·tuon 03:02, 16 October 2016 (UTC)[reply]

@Erutuon:. Thanks. I believe that "non-Oxford" should link to https://en.wikipedia.org/wiki/Oxford_spelling. I don't understand in practice how what you have done affects the entries canonise or canonize. I would be grateful if you could explain exactly what needs to change in those articles to take advantage of it, or make the edit yourself if you prefer. Mihia (talk) 20:39, 16 October 2016 (UTC)[reply]

@Mihia: The labels in Module:en:Dialects can be used in the template {{alter}}, by adding a blank parameter after the word, and then adding the label in the next parameter (or multiple labels, one per parameter). I did that at canonize. — Eru·tuon 20:49, 16 October 2016 (UTC)[reply]
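The convention described above (terms first, then one blank parameter, then one label per parameter) can be sketched as a small parser. This is a hypothetical Python illustration of the calling convention only, not the actual Lua code behind {{alter}}; it assumes the first positional parameter is the language code and ignores named parameters and nested templates. The sample call is illustrative, not copied from the canonize entry.

```python
from typing import List, Tuple


def parse_alter(wikitext: str) -> Tuple[str, List[str], List[str]]:
    """Split a call like {{alter|LANG|term...||label...}} into
    (language code, terms, labels). The first empty positional
    parameter separates the terms from the labels."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    name, *params = [p.strip() for p in text[2:-2].split("|")]
    if name != "alter" or not params:
        raise ValueError("expected {{alter|LANG|...}}")
    lang, rest = params[0], params[1:]
    if "" in rest:
        cut = rest.index("")
        return lang, rest[:cut], rest[cut + 1:]
    # No blank separator: everything after the language code is a term.
    return lang, rest, []


print(parse_alter("{{alter|en|canonise||UK|non-Oxford}}"))
```

Because the blank parameter is the only separator, a term list with no labels needs no trailing "||".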

@Erutuon: I see, thanks. Would it be possible to change the text to read Non-Oxford British English, to be consistent with other templates, and also to put this label in brackets? In other words, for the appearance of the whole thing to be like the version here? Mihia (talk) 21:01, 16 October 2016 (UTC)[reply]

@Mihia: Yes. Go ahead and edit the display part of the data['Non-Oxford'] label in the en:Dialects module. That will change the link text for the label. — Eru·tuon 21:04, 16 October 2016 (UTC)[reply]

Ooops, I'm not sure about the brackets part. That falls into the realm of the {{alter}} template and Alternative forms module. I think brackets are not used because for Ancient Greek and other languages that are transliterated, this would result in two consecutive parentheses – for instance, ὄνῠμᾰ (ónuma) (Aeolic, Doric) – which is ugly. There may be a solution, but it would be more complicated... — Eru·tuon 21:09, 16 October 2016 (UTC)[reply]

As Erutuon has pointed out, there is another long-standing request for brackets at Template talk:alter. Anyone fancy doing this? It is beyond my skill level. Mihia (talk) 19:33, 18 October 2016 (UTC)[reply]

@Mihia: I finally got familiar enough with Lua to fix the problem. See Template talk:alter (I pinged you there too). — Eru·tuon 23:03, 26 January 2017 (UTC)[reply]

@Erutuon: Thank you! Mihia (talk) 23:15, 26 January 2017 (UTC)[reply]

Italics in Project-Link Templates

Some time ago, User:DCDuring suggested that an |i= parameter be added to these templates so links to taxonomic names could be italicized according to standard practice for such names. That was never done, but he started adding it to the wikitext in entries in anticipation of someone getting around to doing it eventually. This wasn't a problem, though, since templates ignore undefined parameters. Then User:CodeCat decided to convert these templates to use a Lua module. Even that would be fine, since Lua adds useful capabilities that can be exploited later. She also incorporated Module:parameters, which lets you specify which parameters can be used in a template. That's where things went off the rails. Aside from maybe half a dozen unrelated errors, all of the 462 entries currently in Category:Pages with module errors are due to the previously-ignored |i= parameter.

CodeCat had said that the luafied versions would work exactly like the un-luafied versions to start with. I don't know about you, but 450+ module errors seems like a big difference to me. It's not that Module:parameters is inherently evil, but in this case I would suggest it's been misused. Sure, I found a small number of typos that needed to be fixed, but that could have been done without breaking over 450 entries.

As I see it, we have four main options, in order of ease of implementation:

Do nothing. Not recommended, because this has replaced links to Wikipedia, Wikispecies and Commons with alarming red error messages in close to 460 entries, and it's hard to spot real errors among the three pages of these errors.
Change the templates to have Module:parameters ignore the |i= parameter
Implement the |i= parameter. That's what I would recommend, because the codes governing taxonomic names explicitly say that at least genera and species should be italicized, and the system overhead is negligible.
Remove the |i= parameter from all 450+ entries. Why? They were added in good faith and could serve a useful purpose.
Pork. Sorry, I just wanted to see if anyone was paying attention.

What would everyone prefer? Chuck Entz (talk) 01:14, 15 October 2016 (UTC)[reply]
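The difference between the old and new template behavior can be sketched in a few lines: undeclared parameters used to be ignored, while a declared-parameter check turns them into errors. The Python below is a hypothetical model of that idea, not the Lua interface of Module:parameters; the keys "1" and "i" mirror the positional link target and the |i= parameter. Option 2 corresponds to passing "i" as ignored, and option 3 to allowing "i" and acting on it.

```python
from typing import Dict, Iterable


class UnknownParameterError(ValueError):
    """Raised when a template call passes a parameter nobody declared."""


def check_args(args: Dict[str, str], allowed: Iterable[str],
               ignored: Iterable[str] = ()) -> Dict[str, str]:
    allowed_set, ignored_set = set(allowed), set(ignored)
    known = allowed_set | ignored_set
    unknown = sorted(k for k in args if k not in known)
    if unknown:
        raise UnknownParameterError("unknown parameter(s): " + ", ".join(unknown))
    # Ignored parameters are dropped silently (option 2 in the list above).
    return {k: v for k, v in args.items() if k in allowed_set}


call = {"1": "Argentina (plant)", "i": "1"}
print(check_args(call, allowed={"1"}, ignored={"i"}))
```

A strict check is still useful for catching typos, which is why ignoring one known parameter is less drastic than dropping the check entirely.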

Option 3 sounds like the best to me too. If there's some kind of obstacle to option 3, then option 2 seems like an acceptable temporary fix. —Mr. Granger (talk • contribs) 01:25, 15 October 2016 (UTC)[reply]

I for one am willing to go for the pork option, if pork is a misspelling of fork. That would mean substituting a non-Lua template for {{pedia}}, {{specieslite}}, and {{comcatlite}} that supported italics in those entries that needed it, genus and subgeneric entries. At some future date we could add the ability to de-italicize portions of the taxa that should not be italicized like "subsp.", "subg.", "var.", etc. I am fairly sure that CodeCat will never work on that feature, having bigger fish to fry. I don't like my fish fried anyway; it's bad for my heart. DCDuring TALK 02:02, 15 October 2016 (UTC)[reply]

I did option 2 (it's a trivial line of code), I tried doing option 3 but I think it better that someone who knows what he/she's doing handle it properly. Crom daba (talk) 02:50, 15 October 2016 (UTC)[reply]

@Crom daba: I looked at the code for Module:wikipedia, and then at Module:links, and finally at Module:script utilities. Those are the modules that it uses to create links to Wikipedia. Apparently italics and bold are not supported (see § tag_text), so there is no way to make the links to Wikipedia articles on species names be italicized without removing boldface. That's quite irritating... — Eru·tuon 02:05, 16 October 2016 (UTC)[reply]

Those modules are of "our" own creation. The lack of functionality is a self-inflicted wound. DCDuring TALK 15:29, 16 October 2016 (UTC)[reply]

Thanks to @CodeCat:, the change has been made. A look at [[Argentina#External links]] will show another class of interproject links that make a fork of the templates worth considering. A WP entry like w:Argentina (plant) or a Commons link like commons:Category:Argentina (Rosaceae) has mixed character formatting: they should display as Argentina (plant) and Argentina (Rosaceae) respectively, with only Argentina in italics. In general the taxonomic authorities prescribe that only genus and subgeneric names and epithets should appear in italics in text.

They also prescribe that such items should appear in ordinary typeface when embedded in italic running text, so templates that force italics or regular text force the appearance of such taxa to deviate from the prescription. We might not care about prescriptions, but these prescriptions are usually followed in scholarly works and often in popular science and nature books. DCDuring TALK 18:11, 18 October 2016 (UTC)[reply]

I created a function to apply correct italics to species, subspecies, and variety names on Wikipedia (in w:Module:eFloras, which is used in a reference template). A similar thing could be created here, though the parameter that triggers it would have to have a name besides |i= (|genus=, |subspecies=, etc. or something else?). It could detect the abbreviations subsp., ssp., var., and f., and words in parentheses, all of which should not be italicized, and then apply italics to everything else. Either that, or we manually enter link text with correct italics in every case. Thus, it would take Cupressus arizonica var. glabra and italicize everything except var., and the previously mentioned Argentina (plant) would display correctly, with only Argentina in italics. The parameter |i= could still be used in cases where the whole title should be italicized. — Eru·tuon 19:55, 18 October 2016 (UTC)[reply]
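The rule described above (keep rank abbreviations and parenthesized words in roman type, italicize everything else) can be sketched as follows. This is a hypothetical Python illustration, not the code in w:Module:eFloras; it emits wikitext '' italic markers, keeps the roman tokens in a plain set so new ones can be added as data, and does not handle nested parentheses. The parenthesis rule fits botanical disambiguators like (plant) but would need an exception for zoological subgenus names in parentheses, which are conventionally italicized.

```python
import re

# Tokens kept in roman type, per the description above; extend as needed.
ROMAN_TOKENS = {"subsp.", "ssp.", "var.", "f."}


def italicize_taxon(name: str) -> str:
    """Wrap italic runs of a taxon name in wikitext '' markers."""
    # Tokens are either a parenthesized group or a run of non-spaces.
    tokens = re.findall(r"\([^)]*\)|\S+", name)
    out, run = [], []

    def flush() -> None:
        # Close the current italic run, if any.
        if run:
            out.append("''" + " ".join(run) + "''")
            run.clear()

    for tok in tokens:
        if tok in ROMAN_TOKENS or tok.startswith("("):
            flush()
            out.append(tok)
        else:
            run.append(tok)
    flush()
    return " ".join(out)


print(italicize_taxon("Cupressus arizonica var. glabra"))
print(italicize_taxon("Argentina (plant)"))
```

Grouping consecutive italic words into one run keeps the markup readable, e.g. one pair of '' around "Cupressus arizonica" rather than two.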

Cool. That's exactly the kind of thing I was hoping for. Some questions remain in my mind:

Is it worthwhile to provide for a piped alternative formatting for whatever cases the logic you implemented on WP doesn't cover? Have you found any exceptions at WP?
We might discover other taxon elements that should not be italicized, eg, "morph.", perhaps "×". Would this be updateable by altering data in a Module?
Should this be implemented here within the project-link system, or with separate templates and/or modules for the big-box and inline versions of the templates for pedia, species, and commons?

None of these are likely insurmountable and all but the last may be ignorable. DCDuring TALK 21:04, 18 October 2016 (UTC)[reply]

@DCDuring: I haven't encountered any exceptions in the logic that I used in w:Module:eFloras, and it's unlikely there will be any, because the floras and lists that the eFloras template creates references to are all plants, and there is a fair amount of regularity in botanical names. For instance, all plant families end in -aceae, so the module searches for that and makes sure it's not italicized. The logic has to be different here on Wiktionary, because more than just plant names are involved, and the automatic italicization would have to be explicitly turned on in links to species, subspecies, variety, etc. pages and not turned on for links to family pages, which are never italicized.

It would be easy to add elements to, or remove them from, the module if we discover any more that should not be italicized. There should just be a list of testcases somewhere (with one example each of genus, subspecies, form, etc.) that we can look at to check that the code is doing what it's supposed to.

I'm not familiar with the structure of these interwiki link templates and modules, but I think I'll start work on a module that does the simple task of this automatic italicization, which can then be used in whatever interwiki link modules require it. — Eru·tuon 23:14, 18 October 2016 (UTC)[reply]

I greatly appreciate your undertaking this. DCDuring TALK 23:22, 18 October 2016 (UTC)[reply]

Module:italics is now complete and seems to work. It has an array of things that shouldn't be italicized, and it's pretty easy to add another one if you think of any. The documentation page has a set of testcases that show how the module handles some of the un-italicized elements that we talked about. If there are no problems, someone can add this function to the interwiki link modules; not sure what the parameter that turns it on should be called, though. — Eru·tuon 01:19, 19 October 2016 (UTC)[reply]

My choice: "taxi=". DCDuring TALK 11:09, 19 October 2016 (UTC)[reply]

That might be fine, though I wonder: would there be any examples of page names with parentheses that should be italicized but are not taxonomical? I was thinking maybe the titles of plays, but I couldn't find any that are linked to. — Eru·tuon 15:53, 19 October 2016 (UTC)[reply]

Examples: plays like Antigone (Sophocles play), which are named after characters and therefore have a disambiguator. Or Richard II (play), which is named after a real person. These currently don't have Wiktionary entries, or they are not mentioned in definitions (see Antigone; there's no entry on Richard II) yet, but perhaps they will be in the future, and then they would have to be italicized in the same way as genus or species names. — Eru·tuon 16:57, 19 October 2016 (UTC)[reply]

For a broader reference for the parameter name how about "seli", for "selective italics"? DCDuring TALK 00:22, 20 October 2016 (UTC)[reply]
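The selective-italics approach discussed in this thread (a list of elements that stay upright, italics for everything else) can be sketched in Python. This is not the Lua code of Module:italics; the function name selective_italics and the set NON_ITALIC_WORDS are hypothetical, and the output uses wikitext '' markup for italics. Parenthesized text, which may contain spaces, is treated as a single upright token.

```python
import re

# Hypothetical list of elements to leave in roman type. Extending it
# (e.g. with "morph." or "×") is a one-line data change.
NON_ITALIC_WORDS = {"subsp.", "ssp.", "var.", "f."}

# A token is either a parenthesized group (possibly containing spaces)
# or a run of non-space characters.
TOKEN_RE = re.compile(r"\([^)]*\)|\S+")


def selective_italics(title: str) -> str:
    """Wrap the italic runs of `title` in wikitext '' markup, leaving
    rank abbreviations and parenthesized disambiguators upright."""
    pieces = []       # finished output fragments
    italic_run = []   # consecutive words waiting to be italicized together

    def flush() -> None:
        # Emit any pending italic words as one ''...'' span.
        if italic_run:
            pieces.append("''" + " ".join(italic_run) + "''")
            italic_run.clear()

    for token in TOKEN_RE.findall(title):
        if token in NON_ITALIC_WORDS or token.startswith("("):
            flush()
            pieces.append(token)
        else:
            italic_run.append(token)
    flush()
    return " ".join(pieces)
```

For example, selective_italics("Cupressus arizonica var. glabra") gives ''Cupressus arizonica'' var. ''glabra'', and selective_italics("Argentina (plant)") gives ''Argentina'' (plant). Because parentheses are handled generically, a non-taxonomic title such as Antigone (Sophocles play) comes out the same way, which suggests a broader parameter name than one tied to taxa.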

Use of Google hit counts

Frequently I see people quoting Google hit counts -- the numbers that come up at the top of the first page of results -- as if they were an exact measure of how many times a word or phrase is used on "the Internet". Unless and until someone can give a clear explanation of how these numbers are generated, they need to be treated with the greatest caution or scepticism. A common behaviour is that a largeish number (say 10,000) comes up, but a much smaller number (say 200) appear to actually exist in the sense that a web page containing that exact word or phrase can be retrieved. There is a variable cutoff point at which Google's "page through the results" runs out, even for search terms that obviously will genuinely have very large numbers of results, but the way this works is entirely opaque. It is not hard to find egregious examples. For example, a search for "a it of" (in quotes) supposedly yields 2,510,000 results, of which 98 are actually retrievable. A search for "goodgrief" supposedly yields 5,690,000 results, of which virtually every one of the retrievable set is actually "good grief" and not "goodgrief". Mihia (talk) 03:22, 16 October 2016 (UTC)[reply]

Indeed, what I usually do is click several times "to the right", on results page 10, then 15, etc., to see whether a much smaller number appears. For google:"goodgrief", clicking to the right ultimately lands me on Page 16, where it says "Page 16 of about 146 results". I have to admit that we have to be careful with these numbers.

Where possible, it is better to use Google Ngram Viewer to get frequency numbers. This works not only for English but also for Spanish, German, Russian, Italian, French and Hebrew.

Czech is not in Google Ngram Viewer, so either we have to make do with the Google numbers, or use an academic corpus of Czech. --Dan Polansky (talk) 05:29, 16 October 2016 (UTC)[reply]

Number of hits also doesn't tell you anything about meaning. Something with 500 hits may have 490 hits as a username and 9 as a brand name. Renard Migrant (talk) 12:27, 18 October 2016 (UTC)[reply]

I agree with the above sentiment that result counts convey nothing about meaning, so they are only valuable to the extent that common sense is applied when considering them. Some notes about the result count: the number is an estimate unless the rc=1 parameter is used, and even then it is only exact up to one million results. Also, the reason results beyond page 16 or so do not display is that Google only returns 1000 results for any search; the fact that they aren't displayed does not mean they do not exist. Where even fewer than 1000 are shown despite a large count, this is likely at least partly due to the condensing process that results go through after ranking, which combines and removes some "duplicate" results. - TheDaveRoss 13:10, 18 October 2016 (UTC)[reply]

Nonstandard spellings

FYI, I have started Wiktionary:Requests_for_deletion/Others#Template:nonstandard_spelling_of. I think the template should only exist if we have a reasonably clear idea of what we mean by "nonstandard"; I don't have such a clear idea. --Dan Polansky (talk) 06:35, 16 October 2016 (UTC)[reply]

How to categorise zero-derivations of English verbs from nouns, etc.?

A zero derivation is one that does not lead to any change in the word: the part of speech simply switches without any change in form. English uses zero derivation a lot, so it's especially important there. However, we don't currently categorise or otherwise mark such derivations. In fact, the majority of such cases lack an etymology altogether. I'm wondering how we can best handle zero derivations more explicitly. When it comes to suffixes, we categorise by suffix, so the nature of the derivation is generally clear from the nature of the suffix (and if there are multiple homographic suffixes, we can disambiguate with id=). It's clear that -ify creates verbs, for example, while -age makes nouns. For zero derivations this is less clear, so we should probably add a part-of-speech qualifier to the category name. So embrace would get a second etymology which would categorise into something like Category:English noun zero derivations. We'd presumably create a template for the occasion too. —CodeCat 17:26, 17 October 2016 (UTC)[reply]

Late response, but: It needn't be listed under a separate etymology. That's certainly one way to do it; the other is to stick under the etymon's etymology section with "And the noun sense derived therefrom circa 1720, first attested referring more specifically to" vel sim.—msh210℠ (talk) 15:30, 27 November 2016 (UTC)[reply]

This is definitely worth doing, IMO. As Msh210 says there need not be a separate etymology section. Further I think such separate etymology sections for zero derivations should be eliminated and replaced by a zero-derivation template, as CodeCat suggests, with a parameter for date of first attestation of the derived term. If such a parameter is missing, it seems to me to be worth categorizing the entry appropriately to draw attention to possible uncertainty in the order of derivation. It will be a lot of work to determine which PoS comes first. We should start with those for which the attestable zero-derivation occurred in EME or later. DCDuring TALK 16:25, 27 November 2016 (UTC)[reply]
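A minimal Python sketch of the categorization logic proposed in this thread, for illustration only (on Wiktionary it would live in a template or Lua module). Only the "English noun zero derivations" pattern comes from the discussion; the function name, the pos and first_attested parameters, and the tracking category for undated entries are hypothetical. The thread also leaves open whether the qualifier names the source or the derived part of speech, so the sketch just takes it as given.

```python
from typing import List, Optional


def zero_derivation_categories(lang: str, pos: str,
                               first_attested: Optional[str] = None) -> List[str]:
    """Return the categories a hypothetical zero-derivation template would add.

    lang           -- language name, e.g. "English"
    pos            -- part-of-speech qualifier for the category name, e.g. "noun"
    first_attested -- date of first attestation of the derived term, if known
    """
    categories = [f"{lang} {pos} zero derivations"]
    if not first_attested:
        # Hypothetical tracking category flagging entries whose order of
        # derivation is not yet backed by an attestation date.
        categories.append(f"{lang} zero derivations lacking attestation date")
    return categories
```

With no date supplied, zero_derivation_categories("English", "noun") returns both the main category and the tracking category; supplying a date such as "1720" drops the tracking category, so the tracking category shrinks as dates are researched.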

Deleting user talk pages

I propose prohibiting [admins from] deleting user talk pages, especially, their own one unless [the user proves] it is very necessary or a link to the archive is clearly shown on their talk page.

Talk pages are the best and fastest way to study a user, to see what they are up to or have been up to, what their expertises are, how communicative they are, what usergroups they belong to etc. Also, talk pages usually do not contain garbage but rather discussions that may be useful to others to read in order to not ask the same question. --Dixtosa (talk) 17:35, 19 October 2016 (UTC)[reply]

Support: Non-admins should be allowed to see the history of all users' talk pages, except for graffiti and outing edits. Purple backpack89 19:34, 19 October 2016 (UTC)[reply]
Support --Daniel Carrero (talk) 19:36, 19 October 2016 (UTC)[reply]
I would even go so far as to say that the pages should not be emptied either. All the discussions should be either on the page or archived on a subpage. This would ensure that all the content is searchable. --Wiki Tiki 89 19:40, 19 October 2016 (UTC)[reply]
Support. I've expressed support for this before, at Wiktionary:Information desk/Archive 2014/January-June#Geequinox. Archiving talk pages seems unobjectionable, and I'm even okay with emptying them, but I think deleting them should be avoided. —Mr. Granger (talk • contribs) 19:46, 19 October 2016 (UTC)[reply]

There is currently a page deletion reason "userspace page deleted on user's request". Would the use of this be restricted? Would users have to provide evidence that the page "should" be deleted at some kind of forum? Equinox ◑ 19:47, 19 October 2016 (UTC)[reply]

This deletion reason is for userspace sandboxes, notes, etc. Deleting them is fine, in my opinion. It does not apply to talk pages. --Daniel Carrero (talk) 19:49, 19 October 2016 (UTC)[reply]

There is a difference between a userspace page and a user-talk-space page. --Wiki Tiki 89 19:49, 19 October 2016 (UTC)[reply]

I think I have deleted user talk pages when marked with this reason by users. I think others have too. So needs to be clarified. Equinox ◑ 19:50, 19 October 2016 (UTC)[reply]

Ok, I should have said that there should be such a distinction. --Wiki Tiki 89 19:51, 19 October 2016 (UTC)[reply]

Oppose Admins need to have the authority to delete whatever they think needs deleting. SemperBlotto (talk) 19:53, 19 October 2016 (UTC)[reply]
Except the main page. --Daniel Carrero (talk) 19:54, 19 October 2016 (UTC)[reply]

Oppose. A contributor’s user[talk?]space is his castle, and as long as it’s not harming the project or other users, I don’t see a problem with allowing him to bulldoze and rebuild one of the castle’s towers. Concerning some of your points:

Talk pages can be used to study users: which is why users who don’t want to be probed should be given the option of getting rid of their talk page.
Talk pages contain useful content: if a discussion is important for Wiktionary, it should take place or be archived in a public page, not in someone’s talk page.

— Ungoliant ^(falai) 20:09, 19 October 2016 (UTC)[reply]

I've been archiving my talk page for years, under the impression that it's what everybody does, and that people would want to read it. If it turns out we can delete our talk pages, I'm thinking of maybe deleting mine. --Daniel Carrero (talk) 20:26, 19 October 2016 (UTC)[reply]

@Ungoliant MMDCCLXIV I think it's generally held around here, including by you, that users must consent to allowing their actions to be "probed" as a condition of participating. Deletion isn't "bulldozing and rebuilding the castle's towers"; that would be just blanking the page. Deleting the page and its history is more akin to bulldozing the castle's towers while demanding that all records that the towers ever existed be burned. Purple backpack89 03:23, 20 October 2016 (UTC)[reply]

Actions in public pages must be kept for probing. — Ungoliant ^(falai) 11:04, 20 October 2016 (UTC)[reply]

Talk pages are public pages. When they aren't deleted, anyone can read them. Purple backpack89 14:01, 20 October 2016 (UTC)[reply]

By public, I mean they are not inherently connected to an individual user. — Ungoliant ^(falai) 15:11, 20 October 2016 (UTC)[reply]

A lot of discussions start on someone's talk page, and only very few of them reach the "public pages".

Overall, I think it is far more important for Wiktionary that each user can get some idea about any other user and can access useful information than that users have the right to have their own talk page deleted for some lame reason. --Giorgi Eufshi (talk) 16:27, 20 October 2016 (UTC)[reply]

Support. I find that talk pages are helpful for figuring out who to talk to about what. Andrew Sheedy (talk) 01:50, 20 October 2016 (UTC)[reply]
Support: Let us preserve discussions at least in page histories. And I find it especially troublesome when admins delete talk pages of banned users. Let transparency reign supreme. --Dan Polansky (talk) 13:23, 22 October 2016 (UTC)[reply]

Question

If this proposal passes, what to do when a user deletes their own talk page? Would another person restore it and perhaps archive it for them? --Daniel Carrero (talk) 14:51, 22 October 2016 (UTC)[reply]

restore and admin-protect it. --Dixtosa (talk) 15:07, 22 October 2016 (UTC)[reply]

And how do non-admins post to an admin-protected talk page? Chuck Entz (talk) 00:46, 23 October 2016 (UTC)[reply]

I have seen this option in protection summaries "move=only admins". I thought there would be "delete=only stewards". Dixtosa (talk) 05:49, 23 October 2016 (UTC)[reply]

The system gives us only two types of protection: against editing and against moving. We have a choice of what level of user we can protect against, but those are the only two actions. I don't know about stewards, but bureaucrats have no special powers beyond the ability to add or remove privileges- the ability to block, protect and delete comes from being an admin, not a bureaucrat. I'm sure stewards have all of the above on any wiki they go to and globally, but I don't know if they can set protections against admins- I think they would have to remove a given admin's ability to delete anything rather than being able to protect a given item from deletion by all admins as a class. As I understand it, it took action by the developers to institute admin-proof protections when there was a dispute at de-WP a while back. Chuck Entz (talk) 08:10, 23 October 2016 (UTC)[reply]

By delete, do you mean delete, or do you mean erase the contents of? I don't think there's a big problem in the latter case, as long as the page history is still accessible. Andrew Sheedy (talk) 00:36, 23 October 2016 (UTC)[reply]

I don't see the problem. How is this different from any other case of an admin abusing his deletion power to delete something out of process? --Wiki Tiki 89 13:39, 24 October 2016 (UTC)[reply]

Durability of CFI for Google groups

Hi everyone,

I have an opinion but I am not sure it is valid enough so I wanted some input from fellow Wiktionarians to check my reasoning. This is my problem: as far as I know to attest a word it suffices to use a citation from a Usenet group - the rationale is that Google archives it "durably". And I was thinking if that is so, then this also applies to words that appear in Google Groups but not necessarily in the Usenet. I suppose Google archives Google Groups also "durably" so the word is also durably archived and then it should satisfy that part of our strict Criteria for Inclusion. Does anyone have an opposing opinion? My intention is to include a word here and propose it as a FWOTD candidate but I'm not sure it passes the "durability" part of CFI. Hope to hear from you soon, cheers all, --biblbroks_{дискашн} 19:41, 19 October 2016 (UTC)[reply]

I'm not sure that it's specifically Google's archives that make Usenet durably archived, but rather because Usenet is archived independently by various organizations, and Google just happens to provide a convenient searchable interface. This is much the same as Google Books, the books aren't durably archived by Google, but by libraries; Google just gives us convenient searchable access. --Wiki Tiki 89 19:45, 19 October 2016 (UTC)[reply]

Then wouldn't the wording "[...] this naturally favors media such as Usenet groups, which are durably archived by Google.[..]" (at Wiktionary:ATTEST) be somewhat incomplete? IMO it should read something like "... which is durably archived by Google and various other organizations." Otherwise it gives the impression that Google's archives are the sole contributor to the "durability" part of the criterion. I am not sure if this is of much importance, but I remember some discussions a few years ago about this "durably"/"permanent recorded media" business. Or was it something like that? OTOH, I much more vividly remember a dispute at Requests for verification with a contributor whose argument was that Google searches weren't a proper basis for a word's attestation. At that time I was busier with winning the "RFV contest" over one particular phrase than with deciphering the true meaning of CFI even for myself, let alone for that editor. If I had understood durability better, it might have helped. The editor eventually received a perma-ban. --biblbroks_{дискашн} 20:32, 19 October 2016 (UTC)[reply]

The best search term for Wiktionary discussions about this is "durably archived".

We value Google for its convenient online access. The print corpora they have (Books, News, Scholar) are durably archived because the recorded documents are of print media which are physically archived somewhere, not necessarily convenient of access, eg, Otago Daily Times is probably in multiple NZ libraries, but few, possibly none, elsewhere. An analogous situation exists with respect to Usenet: multiple archived copies, but only Google offers convenient access. DCDuring TALK 00:52, 20 October 2016 (UTC)[reply]

@biblbroks: You are right: the CFI text is misleading. We are still in the process of cleaning up CFI. The whole durably-archived business is something of a gray area. --Dan Polansky (talk) 13:17, 22 October 2016 (UTC)[reply]

Speaking of deleting user pages...

Should the deletion reason "Userspace page deleted on user's request" be changed to "Userspace page deleted on owner's request" (or similar)? It just struck me, seeing I'm-so-meta delete DTLHS's tracking page that Wonderfool had completed, that it might sound as though any user could request deletion of any other user's page. Equinox ◑ 21:25, 20 October 2016 (UTC)[reply]

Probably. --Wiki Tiki 89 21:30, 20 October 2016 (UTC)[reply]

I changed it now. I don't think we could actually use "deleted on user's request" as a reason to let one person delete another user's page, but having even excess clarity shouldn't hurt. --Daniel Carrero (talk) 21:52, 20 October 2016 (UTC)[reply]

Thanks. (If they are the page owner, why don't they have the right to delete it on demand?) Equinox ◑ 00:35, 22 October 2016 (UTC)[reply]

Do you mean that maybe a non-admin should have a "Delete" button for their own userpages? There's a Wikipedia policy saying that they wouldn't want to implement that, because a person in bad faith could move pages to their userspace and then delete them. I think there were more reasons, too, that I don't remember right now. --Daniel Carrero (talk) 01:22, 22 October 2016 (UTC)[reply]

I was really talking about your anti-deletion comment in "Deleting user talk pages" above. But only as a speculative aside. Equinox ◑ 01:47, 22 October 2016 (UTC)[reply]

IMO, people should be completely free to delete their user pages that are not talk pages. At one point, I tried to delete my own talk page on that wiki called explain xkcd, but someone reverted it and said that it is not allowed. I'd prefer if you and other people never deleted their user talk pages, but I'd feel weird reverting or archiving others' talk pages if they want them deleted, and it's certainly not a blockable offense, so even if we implemented the rule that nobody can delete their own talk pages, I'm not sure if we would just ignore you or other people if they don't want to comply. --Daniel Carrero (talk) 02:14, 22 October 2016 (UTC)[reply]

Editing the introduction of WT:EL - Pronunciation

Mainly to address some complaints in Wiktionary:Votes/pl-2016-07/Pronunciation 2, I'd like to edit the first sentence of WT:EL#Pronunciation.

This is a minor edit, and I'm thinking this won't need a vote. @Dan Polansky, it seems you are usually the first person to defend having votes to edit policies, not counting myself. Do you think that this needs a vote?

Current text:

~~Ideally, every entry should have a pronunciation section, with the phonetic transcription and an audio file. Note that pronunciations may vary widely between dialects.~~

The region or accent ({{a|GA}}, {{a|RP}}, {{a|Australia}}, et al.) is first if there is regional variation, followed by the name of the transcription system, then a colon, then the transcription. It is preferable to use an established transcription system, such as enPR or IPA (see Wiktionary:Pronunciation key for an outline of these two systems). Phonemic transcriptions are normally placed between diagonal strokes (/ /), and phonetic transcriptions between square brackets ([ ]).

Proposed text:

The pronunciation section includes the IPA transcription, audio pronunciations, rhymes, hyphenations and homophones.

The region or accent ({{a|GA}}, {{a|RP}}, {{a|Australia}}, et al.) is first if there is regional variation, followed by the name of the transcription system, then a colon, then the transcription. It is preferable to use an established transcription system, such as enPR or IPA (see Wiktionary:Pronunciation key for an outline of these two systems). Phonemic transcriptions are normally placed between diagonal strokes (/ /), and phonetic transcriptions between square brackets ([ ]). Note that pronunciations may vary widely between dialects.

Changes:

Removing "Ideally, every entry should have a pronunciation section, with the phonetic transcription and an audio file.".
Adding "The pronunciation section includes the IPA transcription, audio pronunciations, rhymes, hyphenations and homophones."
Moving "Note that pronunciations may vary widely between dialects." to the end of the first bullet point.

Rationale:

Arguably, the "every entry should have ..." part is useless clutter. We don't want that statement in every section ("ideally, every entry should have an etymology section" and such). (as pointed out in the vote)
The "every entry should have ..." part is false. There are some languages that shouldn't have a pronunciation section (e.g. sign languages, which have a Production section instead, and many ancient languages whose pronunciation is unknown). (as pointed out in the vote, too)
Arguably, the part "Note that pronunciations may vary widely between dialects." fits better in the text explaining the IPA transcriptions than in the introduction.

--Daniel Carrero (talk) 21:56, 20 October 2016 (UTC)[reply]

I oppose explicit mention of IPA. Also, I don't see the point of saying "Note that pronunciations may vary widely between dialects." Why not just drop it? --Wiki Tiki 89 22:00, 20 October 2016 (UTC)[reply]

Sure, we can change "IPA transcription" to just "transcription". I agree with your second point, too: we can just drop the "Note that pronunciations may vary widely between dialects." I always like when we're able to remove any statements from WT:EL that are comments rather than regulations. --Daniel Carrero (talk) 22:08, 20 October 2016 (UTC)[reply]

@Daniel Carrero: The meta-principle states that "Any substantial or contested changes require a VOTE". The proposed change is not very substantial and is so far uncontested. While I prefer to always have a vote, a vote does not seem required by the meta-principle in this case. But if you want to use this BP discussion as a basis for changing WT:ELE, you should wait a couple of days before you change WT:ELE to allow other people to provide input. --Dan Polansky (talk) 14:57, 21 October 2016 (UTC)[reply]

OK, sounds good to me. If nobody objects, I'll do the change without a vote, then, after waiting some time. --Daniel Carrero (talk) 01:39, 22 October 2016 (UTC)[reply]

Taking Wikitiki89's first message into consideration (which I support), the exact proposed text is going to be this:

The pronunciation section includes the transcriptions, audio pronunciations, rhymes, hyphenations and homophones.

The region or accent ({{a|GA}}, {{a|RP}}, {{a|Australia}}, et al.) is first if there is regional variation, followed by the name of the transcription system, then a colon, then the transcription. It is preferable to use an established transcription system, such as enPR or IPA (see Wiktionary:Pronunciation key for an outline of these two systems). Phonemic transcriptions are normally placed between diagonal strokes (/ /), and phonetic transcriptions between square brackets ([ ]).

--Daniel Carrero (talk) 01:59, 24 October 2016 (UTC)[reply]

I edited the policy as suggested. Feel free to discuss/revert/etc. --Daniel Carrero (talk) 07:54, 31 October 2016 (UTC)[reply]
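The ordering rule in the WT:EL text quoted above (accent label first when there is regional variation, then the name of the transcription system, a colon, and the transcription, with / / for phonemic and [ ] for phonetic transcriptions) can be illustrated with a small Python sketch. The function and its parameters are hypothetical; entries normally build these lines with templates such as {{a}}, so this only demonstrates the order of the pieces.

```python
from typing import Optional


def pronunciation_line(system: str, transcription: str,
                       phonemic: bool = True,
                       accent: Optional[str] = None) -> str:
    """Assemble a pronunciation line in the order WT:EL describes.

    Hypothetical helper: accent label (only when there is regional
    variation), then the transcription system, a colon, and the
    transcription wrapped in / / (phonemic) or [ ] (phonetic).
    """
    open_, close = ("/", "/") if phonemic else ("[", "]")
    line = f"{system}: {open_}{transcription}{close}"
    if accent:
        # Prefix the accent using the {{a|...}} template syntax cited in WT:EL.
        line = "{{a|" + accent + "}} " + line
    return line
```

For example, pronunciation_line("IPA", "ˈdɔtɚ", accent="GA") yields {{a|GA}} IPA: /ˈdɔtɚ/, while the phonetic variant pronunciation_line("IPA", "ˈdɔɾɚ", phonemic=False, accent="GA") yields {{a|GA}} IPA: [ˈdɔɾɚ], where square brackets allow the flapped t to be shown.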

ASCII vs. Unicode apostrophes in French entries

User:Angr edited d'où to use Unicode apostrophes instead of plain ASCII apostrophes in various places. What's the desired behavior here? Our entries are named using ASCII apostrophes so I think we should stick with ASCII apostrophes. Benwing2 (talk) 23:30, 20 October 2016 (UTC)[reply]

I don't see any harm in it. We use all kinds of things like macrons and accents in headwords that we don't include in the entry name- this is just an extension of that. It may even be beneficial in cases where the search engine doesn't recognize the plain character and the fancy character as the same thing: it means that spellings with both the plain and fancy character are in the entry for the search engine to find. Chuck Entz (talk) 02:26, 21 October 2016 (UTC)[reply]

It would be ideal if Unicode apostrophes replaced ASCII ones across all French entries for consistency. I don't much like the idea of some entries having one type and some having the other. Andrew Sheedy (talk) 02:37, 21 October 2016 (UTC)[reply]

I always change them back to straight apostrophes, as these are the ones that appear on the keyboard. Microsoft Word sometimes autocorrects them to curly ones, but that's it. Renard Migrant (talk) 12:49, 21 October 2016 (UTC)[reply]

I've been under the impression for years that our usual practice is to use typewriter apostrophes in entry titles and curly apostrophes in headword line displays, not just for French but for all languages that use apostrophes. I understand the rationale behind using typewriter apostrophes in entry titles, but curly apostrophes look prettier, so I prefer using them for display whenever possible. What should definitely always be the case, though, is for there to be a hard redirect from the curly version to the typewriter version, e.g. d’où → d'où, because French Wiktionary always uses curly apostrophes, and the only way to make sure our entries link correctly to theirs is for en:d'où to link to fr:d'où, which hard-redirects to fr:d’où, which links to en:d’où, which hard-redirects back to en:d'où. —Aɴɢʀ (talk) 13:19, 21 October 2016 (UTC)[reply]

’ in entries seems to be incredibly rare, making me think there is no such unofficial policy. Renard Migrant (talk) 17:28, 22 October 2016 (UTC)[reply]

Personally, I would be quite pleased if a bot went through and replaced all ASCII apostrophes with Unicode ones (aside from actual page names). I agree with Angr's reasoning. Andrew Sheedy (talk) 00:33, 23 October 2016 (UTC)[reply]

Equally, my preference follows Renard's: the ASCII apostrophe is available to editors without issue or complication, unlike (in most cases) the Unicode apostrophe. (That is, this is make-work which will result in ongoing make-work to correct future editors.) - Amgine/^t·e 05:07, 1 November 2016 (UTC)[reply]

I'm thinking that it would be much easier, and less work (I mean, less ongoing work), if we did not use a bot, but rather had Module:links and Module:headword automatically do replacements: for link text, replacing plain apostrophes with the single right quotation mark (curly apostrophe), and for URLs, replacing curly apostrophes and other symbols with the plain apostrophe. We have also been talking about this at Wiktionary talk:About Ancient Greek. We used to have to add Latin with macrons and breves as alternative link text (for instance, {{m|la|Roma|Rōma}}), but now the links module automatically replaces macroned and breved letters with their plain equivalents when creating links. Why not have modules also do apostrophe replacement? It would be so much easier. — Eru·tuon 21:19, 29 October 2016 (UTC)[reply]

Please don't, the fact it benefits no-one should be reason enough to not do it. Renard Migrant (talk) 14:07, 30 October 2016 (UTC)[reply]
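The module-based replacement Erutuon proposed earlier in this thread can be sketched in Python to show the two directions involved: curly apostrophes in displayed link text, straight apostrophes (and no macrons or breves) in the page title that is linked to. This is not the actual Module:links code; the names display_text and entry_name are hypothetical, and real modules apply such rules per language, whereas this sketch applies them to everything.

```python
import unicodedata

STRAIGHT = "'"      # ASCII apostrophe, used in page titles
CURLY = "\u2019"    # RIGHT SINGLE QUOTATION MARK, the typographic apostrophe

# Combining marks removed when building a link target (macron, breve),
# mirroring the Latin example Rōma -> Roma.
STRIPPED_MARKS = {"\u0304", "\u0306"}


def display_text(text: str) -> str:
    """Link text shown to readers: straight apostrophes become curly."""
    return text.replace(STRAIGHT, CURLY)


def entry_name(text: str) -> str:
    """Page title to link to: curly apostrophes become straight,
    and macrons and breves are dropped."""
    text = text.replace(CURLY, STRAIGHT)
    # Decompose so that, e.g., ō becomes o + U+0304, drop the unwanted
    # marks, then recompose so other accents (ù, é) stay intact.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in STRIPPED_MARKS)
    return unicodedata.normalize("NFC", kept)
```

With this split, display_text("d'où") shows d’où while entry_name("d’où") still links to d'où, and entry_name("Rōma") links to Roma. One design caveat: wikitext uses '' for italics, so the straight-to-curly replacement should only ever run on plain link text, never on raw wikitext.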

Vote: Removing label proscribed from entries

FYI, I created Wiktionary:Votes/2016-10/Removing label proscribed from entries. Let us postpone the vote as much as discussion requires, if at all. --Dan Polansky (talk) 19:43, 21 October 2016 (UTC)[reply]

template:it-conj-ire / morire

Morrebbero does not seem to be showing up in the conjugation as an alternative to morirebbero. (3rd person plural conditional.)— Pingku^dimmi 03:29, 22 October 2016 (UTC)[reply]

Removing: "No topic should have a new vote more than once a day (24 hr period)."

I'd like to make Wiktionary:Voting policy a full-fledged policy at some point, instead of just a think tank. One thing I'd like to remove is this:

"No topic should have a new vote more than once a day (24 hr period)."

Rationale:

What does this rule even mean? If a person creates a vote about Greek romanization today, can't I create another vote about Gothic romanization on the same day, because it's the same "topic"? Does anyone want that? If not, let's just remove that rule.

--Daniel Carrero (talk) 13:07, 22 October 2016 (UTC)[reply]

You're right about the need for change. The period for no additional votes on the same topic (broadly construed) should be 30 days. DCDuring TALK 14:06, 22 October 2016 (UTC)[reply]

Or, to simplify these determinations, we could limit the number of proposals a single individual may make to, say, one a month or one a week. As we already have a policy against sock puppets, that way, at the very least, someone with many proposals would have to find stooges to introduce additional votes. DCDuring TALK 14:10, 22 October 2016 (UTC)[reply]

I'm currently the only one above a certain number of votes created per month. Personally, I don't think I would like to find stooges to create votes for me, thank you -- sometimes I do the opposite, creating votes for proposals that were introduced by other people. I wish more people created votes for things that need to be voted. Anyway, what you talked about could be seen as variations of the rule that I proposed to remove, but that would be a long shot. It's easier just to remove it and then discuss and introduce whatever other proposals concerning how many votes may be created, which is something that we discussed recently. --Daniel Carrero (talk) 14:21, 22 October 2016 (UTC)[reply]

Yes, I certainly see the advantage to you of getting inertia on your side. DCDuring TALK 14:31, 22 October 2016 (UTC)[reply]

What is the advantage to me of getting inertia on my side? --Daniel Carrero (talk) 14:45, 22 October 2016 (UTC)[reply]

I agree that it should be removed. But I also think that when a single person creates too many votes in a week, that is not so good and should be avoided. --Dan Polansky (talk) 14:19, 22 October 2016 (UTC)[reply]

Strong discouragement has proven insufficient to avoid the creation of too many votes by an individual. DCDuring TALK 14:31, 22 October 2016 (UTC)[reply]

By contrast, I saw Daniel Carrero respond to disagreement by reducing the number of created votes a lot. My rule of thumb is that there should never be more than 10 votes running and listed, and that is where we have been lately. In any case, "at most one new vote a day" is too lenient if applied to a single person since that would be 30 votes a month. The wording should go since it does not do anything useful anyway. --Dan Polansky (talk) 14:55, 22 October 2016 (UTC)[reply]

Procedural note: I would like to remove "No topic should have a new vote more than once a day (24 hr period)." from Wiktionary:Voting policy without a vote. Rationale: This is a think tank policy and as such, I believe there's some leniency in editing it with abandon. By contrast, the whole section "Voting eligibility" was voted and approved at Wiktionary:Votes/2010-04/Voting policy and I would probably oppose changing any regulations in that section without a vote. --Daniel Carrero (talk) 15:03, 22 October 2016 (UTC)[reply]

I removed "No topic should have a new vote more than once a day (24 hr period)." as suggested above. Feel free to revert/discuss/etc. --Daniel Carrero (talk) 07:55, 31 October 2016 (UTC)[reply]

Can we revert this please, or (sigh) have a vote on it? I think it's a good rule and 24 hours should be the very minimum. Equinox ◑ 14:10, 31 October 2016 (UTC)[reply]

I reverted it per your request; the 24-hour rule is back in the policy. If we are going to vote on removing it anyway, shouldn't we define a higher and thus more useful limit? Like 1 vote per week? --Daniel Carrero (talk) 14:20, 31 October 2016 (UTC)[reply]

Votes created on the whims of a single editor

Instead of imposing any kind of limits, why don't we simply have a rule that there needs to be a discussion in which at least a few editors express their desire to have a vote on the topic? That way no vote can be created merely on the whims of a single editor. --Wiki Tiki 89 13:41, 24 October 2016 (UTC)[reply]

At the moment, we have this related but more lenient rule in Wiktionary:Voting policy: "Votes should not usually be called for on Wiktionary:Votes. They should be the result of prior discussion located elsewhere (the Beer Parlour)."

As you may remember, I oppose this: "why don't we simply have a rule that there needs to be a discussion in which at least a few editors express their desire to have a vote on the topic".

Personally, I have this political position: The right to create new votes is a fundamental right, and anyone must be free to create new votes if they want, with or without prior discussion. (except maybe this: people who aren't eligible to vote probably should not be able to create votes -- but we may want to cross that bridge when we come to it) In practice, probably most voted proposals require discussions as you can't just propose a new thing without everyone being on the same page, but formally requiring discussions for all votes would, in some cases, be a bureaucratic hindrance, or maybe a more serious problem.

If a new vote is most certainly going to pass, no new discussion needs to be created to check if a vote is needed. Chances are, the proposal was already discussed before at some point, and the old discussions may be linked in the vote.
@Dan Polansky created this vote about the "def" template a few months ago. From what I understood, he did it because if the vote did not exist, people would probably be adding {{def}} to new entries without the vote. Sometimes, passing votes are needed to show consensus for a new thing and failed votes are needed to show lack of consensus for another thing. We don't need to create discussions to prove that there is lack of consensus for a new thing; rather, the proposers of the new thing need to show consensus on their side.
Votes to remove clutter from policies (like this) or formalize what we already do (like this) don't need, or barely need, a discussion to check if the vote can be created.

If a new discussion is created proposing a new vote but nobody or few people bother to answer, that is good enough for me and we can create the vote, rather than insisting that people reply. If a vote is created without discussion but should have been discussed, it is probably going to fail anyway, with comments about what exactly is wrong with it, which probably takes about the same effort as replying and pointing out problems in a pre-vote discussion. Sometimes, even when a vote was discussed a lot before its creation, people point out new problems while the vote is ongoing, so pre-vote discussions are not a guarantee of creating perfect votes.

In all cases, even when no new discussion is required to check if a vote can be created, a new BP discussion may be created alongside the new vote, to inform people that the vote exists.

As long as we are fundamentally able to create votes if we want to, I'm fine with having restrictions like this: a maximum number of votes created per person during a certain period of time. --Daniel Carrero (talk) 17:06, 24 October 2016 (UTC)[reply]

Why is creating votes without prior discussion a fundamental right? If you're the only one who wants to vote on something, how do you expect it to even pass? We have the right to not have our time wasted by bad votes. What's so hard about starting a discussion and asking about whether we need a vote? --Wiki Tiki 89 17:28, 24 October 2016 (UTC)[reply]

I said: "probably most voted proposals require discussions". Do you believe that 100% of votes created require pre-vote discussions? Where's the disagreement? What are the bad votes you are talking about? --Daniel Carrero (talk) 17:40, 24 October 2016 (UTC)[reply]

Suppose we say that the right to start a BP discussion on an issue is a fundamental right. If there is no support for something in a BP discussion, there isn't much point in going to the trouble of starting a vote. bd2412 T 17:43, 24 October 2016 (UTC)[reply]

Yes, I agree with you, but on most votes, not all votes. In my message from 17:06, 24 October 2016 (UTC), I listed some cases that in my opinion, should be exceptions. Do you think that 100% of votes created require pre-vote discussions? --Daniel Carrero (talk) 20:20, 24 October 2016 (UTC)[reply]

Exceptions would be procedural votes like bot and adminship approvals. The ones you listed above should not be exceptions in my opinion. --Wiki Tiki 89 20:32, 24 October 2016 (UTC)[reply]

I agree that Wikitiki's are good exceptions and that Daniel's are not. DCDuring TALK 20:57, 24 October 2016 (UTC)[reply]

That's OK. Should we create a vote with a proposal like "Votes may only be created as a result of a discussion in which at least a few people supported creating a vote, except for nominations of bots, administrators, and bureaucrats.", to be added at Wiktionary:Voting policy? --Daniel Carrero (talk) 23:15, 24 October 2016 (UTC)[reply]

"At least a few" needs to be more specific. DCDuring TALK 23:27, 24 October 2016 (UTC)[reply]

Maybe 3 people? @DCDuring, would you like to create a vote with the proposal: "Votes may only be created as a result of a discussion in which at least 3 people supported creating a vote, except for nominations of bots, administrators, and bureaucrats."? Or maybe another number of people?

I may create the vote if people want, even though I'd vote oppose. I'm okay with either requiring 3 people to start this vote (for consistency?) or just creating it at once. Please, do whatever you want. The idea is not mine, I'm just offering to help with implementing others' ideas, which is something I like to do often. But I would also oppose implementing this requirement somehow without a vote, because it's a serious limitation on the ability to create further votes. Not to mention that it sounds like a bad idea to me, because it's needlessly bureaucratic and I don't see what problem it fixes, but I guess I can live in a system that I disagree with if it will make others happy. --Daniel Carrero (talk) 04:52, 25 October 2016 (UTC)[reply]

Votes are usually attended by more people over a longer period of time than BP discussions, which always show the same dozen names and do not usually survive the end of a month. Votes are a way to create consensus for things which people just don't care about, to force a hand. Making such a method, which might be the last escape from an utter lack of input, contingent on the presence of input seems like introducing a bug into our system. Korn [kʰũːɘ̃n] (talk) 13:07, 25 October 2016 (UTC)[reply]

I agree with Korn. --Daniel Carrero (talk) 17:32, 25 October 2016 (UTC)[reply]

@Daniel Carrero re: "would you like to create a vote with the proposal...?" No. I'd like to see if there are others who agree and think it's worth a vote. I think that, by itself, it is not worth a separate vote. If we added some other "common-sense" reforms to our voting process, like quora, there might be something worth having folks stop adding and improving content and instead evaluate the proposal and its elements, considering how the elements work together. I'd also like to save my proposal-of-the-week for something better that might come along. DCDuring TALK 17:50, 25 October 2016 (UTC)[reply]

@DCDuring: you mentioned quora. Do you mean, requiring a minimum number of participants in an ongoing vote, in order to successfully close the vote? --Daniel Carrero (talk) 18:11, 25 October 2016 (UTC)[reply]

Old Provençal or Old Occitan?

OK, I already brought this up but it bears repeating in light of the dubious category CAT:Catalan terms derived from Old Provençal. The terms in this category are largely inherited terms and express the completely wrong notion that Catalan derives from Old Provençal. The intent was clearly to derive Catalan from Old Occitan, but even then I think this is wrong. This brings up two issues:

Can we please rename Old Provençal to Old Occitan?
What's the "old" language that Catalan derives from? We don't seem to have "Proto-Gallo-Romance".

Benwing2 (talk) 17:04, 22 October 2016 (UTC)[reply]

We treat Old Provençal as a synonymous name for Old Occitan, and Provençal as a synonymous name of Occitan. And the two related categories are... Old Provençal and Occitan. Yeah. There was a decision on this many years ago, I dunno, 2010, and Old Provençal won out. There is a Category:Old Catalan language and I've seen references to Gallo-Romance rather than Proto-Gallo-Romance. Like, some say that the Oaths of Strasbourg are written in Gallo-Romance rather than Old French (but that's another debate). Renard Migrant (talk) 17:25, 22 October 2016 (UTC)[reply]

As a user I would find it highly preferable if the ancestor of X, and X alone, was Old X and not Old Y. Korn [kʰũːɘ̃n] (talk) 17:45, 22 October 2016 (UTC)[reply]

How is that supposed to work when Old X is the ancestor of multiple modern languages? Old Irish is the ancestor of Irish, Scottish Gaelic, and Manx. Old English is the ancestor of English and Scots. Old Norse is the ancestor of some 10 languages, not one of which is called "Norse". —Aɴɢʀ (talk) 08:51, 23 October 2016 (UTC)[reply]

But Provençal is no-way, no-how the same as Occitan. Provençal is a dialect of Occitan, as are Languedocien, Auvernhat, Gascon, etc. As for Old Provençal vs. Old Occitan, Wikipedia and AFAIK all modern scholarly sources use "Old Occitan" for the basic reason that the language is ancestral to all of the modern Occitan varieties (except maybe Gascon), and is not specifically an old version of Provençal. IMO we need to change the terminology. Anyone else agree? Benwing2 (talk) 18:35, 22 October 2016 (UTC)[reply]

I agree. —CodeCat 01:10, 23 October 2016 (UTC)[reply]

I support renaming pro to Old Occitan. —Aɴɢʀ (talk) 08:51, 23 October 2016 (UTC)[reply]

Oh yes, consensus can change and I support it for broadly the same reasons (though I'm not massively clued up on Occitan vs. Provençal). Renard Migrant (talk) 11:27, 23 October 2016 (UTC)[reply]

I too agree. Leasnam (talk) 17:15, 25 October 2016 (UTC)[reply]

Attention @-sche... —Μετάknowledge^{discuss/deeds} 00:01, 25 October 2016 (UTC)[reply]

Old Provençal was the historic name, and was still slightly more common as of 2008 (and in general Provençal was several times more common than Occitan). Wikipedia also says Provençal was the older name, but it lemmatizes Occitan and Old Occitan, saying "in the English-speaking world, the term Provençal has historically also been used to refer to all of Occitan, but is now mainly understood to refer to the variety spoken in Provence." Perusing Google Books, it does seem like "Old Occitan" is more common in the most recent books (2010-2016). To add clarity as to the scope of pro and add consistency between the names of oc and pro, I'd support renaming it to "Old Occitan", though my feelings on the matter are not strong. Keep the old name as an alt name, obviously. - -sche (discuss) 21:02, 25 October 2016 (UTC)[reply]

Yesterday, I was pinged to Wiktionary:Beer parlour/2014/August § Old Provençal or Old Occitan?, a 2014 discussion which supported a rename; I was going to start a new BP thread to be sure there was still support 3.5 years later (since the rename will directly affect hundreds of entries), but I see from this thread (only 1.25 years old) with mostly-different participants that there is.

With regard to commonness: Ngrams suggest Old Provencal was somewhat more common even as late as 2008 (and significantly more common before then), although it was trending down and Old Occitan trending up. When I search Google Books for works published in the ten years since Ngrams' cutoff, I find 20 using "Old (Provençal|Provencal)" (and 23 irrelevant ones using "old Provençal" as in "old Provençal saying", etc), and 18 using "Old Occitan" (including one that explicitly supports a rename, "also called Old Provencal and today known as the Old Occitan language"), neck and neck. Wikipedia calls it "Old Occitan", but Glottolog calls it "Old Provençal". The only English reference work on it that I found published in the last twenty years calls it "Old Occitan".

But several other concerns, raised in the 2014 discussion and in this one, break the tie in favour of "Old Occitan".

I'll start renaming it. - -sche (discuss) 14:13, 20 January 2018 (UTC)[reply]

@-sche: There are a few module errors (for instance, in tres) because Module:descendants tree can't find an Old Occitan section. Is there a method to change all the headers, Descendants entries, and translations to the new name? — Eru·tuon 06:12, 22 January 2018 (UTC)[reply]

I've been changing them with AWB, working from a list of all occurrences of the string "Old Provençal" in the last database dump. There are about 400 left, which I should finish today. :) (A bot could also have done it, and if there had been more than a few thousand entries I would have requested that, but there would have been a few false positives where "Old Provençal" was correctly mentioned as e.g. the thing that "OProv" is an abbreviation of.) - -sche (discuss) 15:35, 22 January 2018 (UTC)[reply]

AWB access

I have used AWB on English Wikipedia, and there are a couple of tasks here on Wiktionary that I really don't want to do by hand: adding syllable breaks in the sequence /iə/ when it's not a diphthong and correcting some {{R:Smyth}} references. — Eru·tuon 01:34, 23 October 2016 (UTC)[reply]

Done. Benwing2 (talk) 01:46, 23 October 2016 (UTC)[reply]

I used search, but I still don't know what AWB is. Since @Erutuon: thinks it would help with syllable breaks, and hence counting syllables, I'd like to know what it is. Where can I look? Bcent1234 (talk) 13:42, 25 October 2016 (UTC)[reply]

@Bcent1234: AWB stands for AutoWikiBrowser, and it's described at Wiktionary:AutoWikiBrowser. Actually, it can't exactly help with counting syllables. Module:syllables can count syllables, when someone adds it to Module:IPA. I was using AWB to add syllable breaks to /iə/, which is listed as an English diphthong in Module:syllables, because it is a diphthong in New Zealand English, but is a disyllabic sequence in most other dialects (and in most of the existing IPA transcriptions on Wiktionary). Anyway, this was a complicated explanation. If a syllable break isn't added, words containing the disyllabic sequence /iə/ would be counted as having at least one less syllable than they actually have. In short, AWB can't exactly help with counting syllables, but it can help to modify IPA transcriptions so that the syllable-counting module will work correctly. — Eru·tuon 16:59, 25 October 2016 (UTC)[reply]
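To make the counting problem concrete, here is a minimal Python sketch. It is not Module:syllables (which is a Lua module); the vowel and diphthong inventories and the function count_syllables are hypothetical simplifications, assuming one syllable per vowel nucleus and that a listed diphthong such as /iə/ counts as a single nucleus unless a "." syllable break separates its two parts.

```python
# Hypothetical, simplified inventories: NOT the actual lists used by Module:syllables.
VOWELS = set("aeiouæɑɒɔəɛɜɪʊʌɚɝ")
# "iə" is listed as one nucleus, mirroring the New Zealand English treatment described above.
DIPHTHONGS = ("eɪ", "aɪ", "ɔɪ", "aʊ", "oʊ", "əʊ", "iə")


def count_syllables(ipa: str) -> int:
    """Count vowel nuclei; a listed diphthong counts once unless split by '.'."""
    # Drop the enclosing slashes and stress marks; keep '.' as an explicit syllable break.
    s = ipa.strip("/").replace("ˈ", "").replace("ˌ", "")
    count = 0
    i = 0
    while i < len(s):
        if s[i:i + 2] in DIPHTHONGS:
            count += 1
            i += 2  # both vowels of the diphthong form one nucleus
        elif s[i] in VOWELS:
            count += 1
            i += 1
        else:
            i += 1  # consonants, length marks and '.' add no nucleus
    return count
```

With this sketch, /aɪˈdiə/ is counted as 2 syllables, while /aɪˈdi.ə/, with the break added, is counted as 3, which is the kind of undercount the AWB edits were meant to prevent.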

Comment about the "Request categories" vote

Wiktionary:Votes/2016-07/Request categories is going to end in 4 days. Current results: 8 supports, 4 opposes, 3 abstentions. Total: 15 people voting.

I believe that we should simply close the vote at 23:59, 27 October 2016 (UTC) as scheduled, instead of postponing the vote, because it was already created as a 2-month vote and the current turnout is pretty good. With 15 people voting, it's unlikely that a lot more people would vote even if we postponed it. That said, ongoing votes that are very close to 66.6% support are noticeably unpredictable. Currently, the vote would pass, but 1 new "oppose" could cause it to fail. --Daniel Carrero (talk) 02:16, 24 October 2016 (UTC)[reply]
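The arithmetic behind "1 new oppose could cause it to fail" can be checked with a small Python sketch. It assumes, as the 8-6-3 (57.14%) result reported when this vote closed implies, that the support percentage is supports / (supports + opposes) with abstentions ignored, and it takes the 66.6% figure from the comment above as the pass threshold; the names PASS_THRESHOLD, support_ratio and passes are hypothetical.

```python
# Assumption: the "66.6%" figure from the discussion is used as the pass threshold.
PASS_THRESHOLD = 0.666


def support_ratio(supports: int, opposes: int) -> float:
    """Share of support among votes cast for or against; abstentions are ignored."""
    return supports / (supports + opposes)


def passes(supports: int, opposes: int) -> bool:
    """True if the support share reaches the assumed threshold."""
    return support_ratio(supports, opposes) >= PASS_THRESHOLD
```

Under these assumptions, 8 supports and 4 opposes give 8/12 ≈ 66.7% and pass, while one more oppose gives 8/13 ≈ 61.5% and fails.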

Joconde or La Joconde? Seine or La Seine? Cap or Le Cap?

La Joconde is the French name of the Mona Lisa. Formerly this name sat under Joconde, but the headword displayed la Joconde (lowercase, although the French Wikipedia article capitalizes La in La Joconde even in the middle of a sentence). So far I've adopted three different solutions for similar instances:

I hard-redirected Joconde to La Joconde.
I left Seine as-is.
I changed Cap to use {{only in|Le Cap|lang=fr}} (a hard redirect wasn't possible because there was also an English defn of this term).

What's the correct way of handling these cases? I don't like the current solution of having the headword disagree with the page name. Benwing2 (talk) 04:49, 24 October 2016 (UTC)[reply]

The way you don't like is the way the Irish editors have agreed to list country names in Irish, which almost always have the definite article. For example, the entry name for the Irish word for "France" is Frainc, but the headword line says An Fhrainc. —Aɴɢʀ (talk) 13:03, 24 October 2016 (UTC)[reply]

Le Touquet (for which we have no entry) is actually the name of the place Le Touquet, so « aller à Paris » but « aller au Touquet » (for non-French speakers, see au). Not sure about Le Cap as I hadn't heard of it. A headword that disagrees with the page name is not always wrong, and it's used in English entries as well. As for La Joconde, I have no idea and I'd have to research it; instinctively the article isn't part of its name, but research may show that it is. Renard Migrant (talk) 21:13, 24 October 2016 (UTC)[reply]

We have The Hague and there are probably more similar names. (I've added "Le Touquet" by the way.) SemperBlotto (talk) 01:56, 25 October 2016 (UTC)[reply]

p.s. A Google ngrams search of "Mona Lisa,the Mona Lisa,The Mona Lisa" shows it to be used with the definite article about half the time. SemperBlotto (talk) 01:59, 25 October 2016 (UTC)[reply]

(e/c) In English we are equally inconsistent:

We have The Hague, where Hague has a "see also The Hague".
But we have the Gambia under Gambia, with headword "Gambia" and The Gambia a hard redirect (whereas Wikipedia has the country under w:The Gambia).
Yet we have the Netherlands under Netherlands with headword "the Netherlands" (not "Netherlands").
Finally, for river names, we have e.g. Rio Grande with headword "Rio Grande" and no mention anywhere of the fact that it is normally "the Rio Grande"; similarly for Thames.

Arguably the different treatment of rivers stems from the fact that most rivers in English are preceded by "the" whereas cities, states and countries usually aren't. The English example suggests we ought to have the Seine in French under Seine with headword "Seine" and similarly for other French rivers, and maybe the same for countries, since rivers and countries in French normally take le/la/les. Benwing2 (talk) 02:25, 25 October 2016 (UTC)[reply]

The article is not an inseparable part of these terms though. You could say "The second Rio Grande estuary". Here, the article modifies "estuary", and there's also an adjective in between. —CodeCat 17:09, 25 October 2016 (UTC)[reply]

This test isn't probative. You can say "the second Hague tribunal" even though we generally agree that "The Hague" is the name of the city. Benwing2 (talk) 20:17, 25 October 2016 (UTC)[reply]

Then The Hague is different, and is actually an inseparable unit, unlike Rio Grande. —CodeCat 20:28, 25 October 2016 (UTC)[reply]

I don't understand what you're saying. Are they different simply because someone says they're different? They both behave syntactically the same. Benwing2 (talk) 20:45, 25 October 2016 (UTC)[reply]

But they're not the same, and this is one instance where they aren't. You might say "The Rio Grande conference" but not "The Hague conference", you actually say "The The Hague conference". The article is an inseparable part of The Hague, it's not actually syntactically an article. —CodeCat 20:47, 25 October 2016 (UTC)[reply]

You don't say "The The Hague conference". That would be rather strange. Do a Google search on "a Hague" and "the second Hague" and "the only Hague" and you'll see what I mean. Benwing2 (talk) 22:17, 25 October 2016 (UTC)[reply]

"the the hague", with quotes, gets almost 10000 hits on Google. —CodeCat 22:22, 25 October 2016 (UTC)[reply]

And "the the rio grande" gets 258,000. What does that prove? Benwing2 (talk) 23:12, 25 October 2016 (UTC)[reply]

Proposal: don't show brackets around transliterations

Currently, transliterations are shown with brackets around them. I propose to remove the brackets, which gives a cleaner look with less visual clutter. —CodeCat 22:39, 24 October 2016 (UTC)[reply]

They are? Where? абелево кольцо: where are the brackets? DTLHS (talk) 02:06, 25 October 2016 (UTC)[reply]

They are written between parentheses, though. The entry bracket asserts that parentheses count as brackets. --Daniel Carrero (talk) 02:11, 25 October 2016 (UTC)[reply]

current display (clunky):

Cognate with ... Ancient Greek γένεσις (génesis) (English genesis)

better:

Cognate with ... Ancient Greek γένεσις, génesis (English genesis)

no brackets in headword:

γένεσις • génesis f (genitive γενέσεως); third declension

I think in some cases it would be very nice to have the option to display it without brackets or parentheses. That would be helpful when you want to put a non-Latin-script term in a parenthesis, or put another parenthesis directly after a transliterated term. For an example of the latter, here's something I'm editing right now, the etymology section of gēns. Looks clunky having two parentheses right after each other. Much better, I think, if the transliteration is separated from the original by a comma.

But I think in most cases it's fine to have brackets: for instance, if the Greek term in the example is just in a list with items separated by commas, and doesn't have another word parenthesized after it. — Eru·tuon 05:14, 25 October 2016 (UTC)[reply]

Also, I would support removing parentheses from around the transliteration in headwords. I think it looks fine if the original script is separated from the transliteration by a bullet. — Eru·tuon 05:23, 25 October 2016 (UTC)[reply]

To me the proposal is both less clear and harder to read. I'm strictly against it. Korn [kʰũːɘ̃n] (talk) 12:58, 25 October 2016 (UTC)[reply]

I prefer keeping the parens, they make it clear that the "main" word is written in its native script and that the transliteration is secondary information. --Daniel Carrero (talk) 14:16, 25 October 2016 (UTC)[reply]

I also prefer keeping the parens, except perhaps in the headword line. "γένεσις • génesis f" doesn't look that bad to me, but in running text the parens really need to be there. —Aɴɢʀ (talk) 14:54, 25 October 2016 (UTC)[reply]

Agreed with Angr. Benwing2 (talk) 16:13, 25 October 2016 (UTC)[reply]

There are some awkward cases (such as mentions that are themselves inside brackets), but usually this seems to be on the beneficial side. --Tropylium (talk) 00:11, 31 October 2016 (UTC)[reply]

Suggestion: Edit the abbreviation policy

May I create a vote to edit what we have to say about definitions of abbreviations in WT:EL#Abbreviations, a subsection of WT:EL#Definitions? I intend to do it eventually if people are OK with it, I'm not in a hurry.

Current text:

The “definitions” of entries that are abbreviations should be the expanded forms of the abbreviations. Where there is more than one expansion of the abbreviation, ideally these should be listed alphabetically to prevent the expanded forms being duplicated. The case used in the expanded form should be the usual one — do not capitalise words in the expanded form of an abbreviation that is made up of capital letters unless that is how the expanded form is usually written.

Where the expanded forms are entries that appear (or should appear) in Wiktionary, wikify them. Expanded forms that are encyclopedic entries should also be wikified and linked to the appropriate Wikipedia entry. When the expanded form does not merit an entry of its own, either in Wiktionary or Wikipedia material, wikify its component words and give a gloss (italicised, in parentheses) after the expansion explaining what the term means (see SNAFU for an example).

See PC for an example entry.

Proposed text:

For abbreviations, acronyms, and initialisms (Examples: PC, SNAFU), the definitions usually use templates linking to their expanded forms. For example, one of the senses in the entry PC may be "Initialism of personal computer." Do not capitalise words in the expanded form unless that is how the expanded form is usually written. (in the previous example, don't write "Personal Computer") Where the expanded form is an entry that exists (or should exist) in Wiktionary, link to it. Otherwise, if an appropriate Wikipedia article exists, you may link to it. When the expanded form does not merit either a Wiktionary entry or a Wikipedia article, link it to its component words. You may expand the definition with a gloss if appropriate.

Rationale and changes:

Concerning the 1st sentence of the original text:
- Replacing "abbreviations" by "abbreviations, acronyms, and initialisms", a more complete list.
- Removing the quotation marks around "definitions". They are actual definitions.
- Mentioning that these entries usually use templates.
- Removing "should be the expanded forms of the abbreviations". In a few entries, like LOLWUT, the sense may be a non-gloss definition and the abbreviation may be in the etymology. In most entries, the definition is "abbreviation of X Y Z".
- It may be unnecessary, but I'm adding an explanation of what an "expanded form" is.
Concerning the 2nd sentence of the original text:
- Removing it completely. I don't think we should usually bother listing the senses alphabetically, or should we?
Concerning the 3rd sentence of the original text:
- Full rewrite, with an example added of what is meant by incorrectly capitalizing words.
Other changes:
- Rewriting the difference between linking to Wiktionary entries and Wikipedia articles.
- Making it clearer that Wiktionary has entries and Wikipedia has articles.
- Removing the explanation that glosses are "italicised, in parentheses", because the template is going to deal with formatting, and people may see other styles if they edit their personal CSS pages.
- Mentioning the abbreviation examples (PC and SNAFU), together in the same line. The original text had them in separate lines.
- Minor edits, including word replacements like "appear [...] in Wiktionary" -> "exists [...] in Wiktionary" and "wikify" -> "link".
- Rewriting clutter to make the text shorter without removing any rules, unless otherwise stated.

Would you change anything? Please let me know. --Daniel Carrero (talk) 04:41, 25 October 2016 (UTC)[reply]

1 week passed, and there was no response on this topic. Maybe that is understandable, because the "rationale and notes" is quite large, but on the whole this proposal is just a bunch of minor changes and rewrites that are not supposed to change the status quo. I'll keep waiting to see if there's any response here in the future. If no one suggests any changes to this proposal, may I create a vote later? --Daniel Carrero (talk) 09:50, 1 November 2016 (UTC)[reply]

If no one replies here, I'd like to create a vote to edit that text in WT:EL as suggested anyway. Eventually. --Daniel Carrero (talk) 05:00, 13 November 2016 (UTC)[reply]

I created Wiktionary:Votes/pl-2016-11/Abbreviations as suggested here. --Daniel Carrero (talk) 19:50, 24 November 2016 (UTC)[reply]

Why don't `{{m}}` and `{{l}}` support etymology-only languages?

Most of the errors in CAT:E are because Lombardic was made an etymology-only language, and the etymology or descendants sections use {{m}} or {{l}}. Why don't these support etymology-only languages? {{inh}}, {{der}}, {{bor}} and {{cog}} all do. Benwing2 (talk) 21:17, 26 October 2016 (UTC)[reply]

There are no entries for an etymology-only language, so how could you link to them? DTLHS (talk) 21:18, 26 October 2016 (UTC)[reply]

Because in theory it would be no different from using the parent language code. With {{cog}} et al. there is a difference, in that the name displayed is that of the etymology-only language rather than of the parent language. I think I suggested to User:CodeCat before that for consistency {{m}} and {{l}} should accept etymology-only languages, but CodeCat disagreed. --Wiki Tiki 89 21:27, 26 October 2016 (UTC)[reply]

Example. Renard Migrant (talk) 22:58, 26 October 2016 (UTC)[reply]

That's what you're "supposed" to do. --Wiki Tiki 89 23:03, 26 October 2016 (UTC)[reply]

Perhaps in such situations it makes sense for it to work, but I'm not sure. I like the idea that codes are unique, and that you can't use multiple codes interchangeably for the same thing. —CodeCat 00:02, 27 October 2016 (UTC)[reply]

Personally, I prefer to keep {{m}} and {{l}} without support for etymology-only languages; if we allowed these templates to do that, it's likely that we would see some uses that don't make a lot of sense, like {{m|LL.|whatever}} for links to Latin sections that are not actually about Late Latin. The links would work, but the "LL." code would lose some of its meaning, being effectively equal to "la" except in templates that actually care about the distinction. When you type {{der|en|LL.}}, the latter code is to show the text "Late Latin" and link to the correct language section; whereas {{m}} and {{l}} don't show the language name, they just link to the language section.

One way to look at it is if we had long explanatory parameters for each purpose:

Derivation from Late Latin to English: {{der|en|LL.}} = {{der|show language name=LL.|source language code=la|target language code=en}} ("source language code" is not actually needed because its correct value can be automatically inferred from "show language name")
Simple link to a Latin section: {{m|la}} = {{m|source language code=la}} (there is no need for the parameter "show language name", because no language name like "Latin" or dialect name like "Late Latin" is shown; incidentally, there's no need for "target language code" either)
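A minimal sketch of the distinction above, written in Python rather than the Lua that Wiktionary's modules actually use. The tables `ETYMOLOGY_ONLY` and `LANGUAGES` and the functions `der` and `mention` are hypothetical stand-ins, not the real Module:languages API. An etymology-only code such as "LL." resolves to its parent code "la" for the link target; only a template that displays a language name (like {{der}}) makes use of the etymology-only name, while an {{m}}-like template has nothing to display and so can simply insist on the full code.

```python
from __future__ import annotations

# Hypothetical data: etymology-only code -> (display name, parent language code).
ETYMOLOGY_ONLY = {
    "LL.": ("Late Latin", "la"),
}

# Hypothetical data: full language code -> language name (also the section name).
LANGUAGES = {
    "la": "Latin",
    "en": "English",
}


def resolve(code: str) -> tuple[str, str]:
    """Return (display name, full language code) for any code."""
    if code in ETYMOLOGY_ONLY:
        return ETYMOLOGY_ONLY[code]
    return LANGUAGES[code], code


def link_target(term: str, full_code: str) -> str:
    # Only full languages have sections, so links always use the full code.
    return f"{term}#{LANGUAGES[full_code]}"


def der(target: str, source: str, term: str) -> tuple[str, str]:
    """Like {{der}}: shows the (possibly etymology-only) name, links to the parent's section."""
    name, full_code = resolve(source)
    text = f"{name} [[{link_target(term, full_code)}]]"
    category = f"Category:{LANGUAGES[target]} terms derived from {name}"
    return text, category


def mention(code: str, term: str) -> str:
    """Like {{m}}: shows no language name, so only full codes are accepted."""
    if code in ETYMOLOGY_ONLY:
        parent = ETYMOLOGY_ONLY[code][1]
        raise ValueError(f"{code!r} is etymology-only; use {parent!r}")
    return f"[[{link_target(term, code)}]]"
```

Here `der("en", "LL.", "whatever")` displays "Late Latin" but links to the Latin section, while `mention("LL.", "whatever")` is rejected in favour of `mention("la", "whatever")`, which produces the same link. That is the design choice being defended: "LL." keeps its meaning only where a name is actually shown.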

--Daniel Carrero (talk) 09:59, 31 October 2016 (UTC)[reply]

"Request categories" vote -- no consensus

Wiktionary:Votes/2016-07/Request categories ended as no consensus: 8-6-3 (57.14%-42.86%). The vote proposed renaming all the request categories (with 18 categories specifically listed), in all languages, under a single naming system. For example, one of the proposed names was Category:Requests for etymologies of English terms.

Five opposers stated that they would prefer category names with the "English" on the front. Personally, I believe that there's no reason to do that. For people who share that opinion: Is there any possible good category name starting with "English", including any name that was not proposed yet? If not, could you please reconsider and support names beginning with "Requests"? Some names mentioned in the vote were:

Category:English entries with requests for XYZ (but, as @Wikitiki89 said: That doesn't work for most of the categories. For example, "Requests for Sanskrit terms" are most often not even in Sanskrit entries, but in etymology sections of other languages' entries.)
Category:Sanskrit terms requested (This one does not look very good to me either; it seems we are "forcing" the language name to appear first. Does anyone like this one?)

Wikitiki89 suggested using another list of categories starting with "Requests", which has some supporters. I'd also support using Wikitiki89's names if it means finally cleaning up the request category names and using a consistent naming system. I'll copy below the exact names that Wikitiki89 proposed (correct me if I made any mistake), and I'll add the names for umbrella categories. Wikitiki89 did not propose any different names for the quotation and usage example categories, so I'll add my own. What do you think?

Proposed name	Proposed umbrella category	Current name
Category:Requests for etymologies in English entries	Category:Requests for etymologies by language	Category:English entries needing etymology
Category:Requests for expansion of etymologies in English entries	Category:Requests for expansion of etymologies by language	Category:English entries with incomplete etymology
Category:Requests for attention in etymologies in Latin entries	Category:Requests for review of etymologies by language	Category:Latin etymologies needing attention
--
Category:Requests for pronunciation in English entries	Category:Requests for pronunciation by language	Category:English entries needing pronunciation
Category:Requests for audio pronunciation in English entries	Category:Requests for audio pronunciation by language	Category:English entries needing audio pronunciation
--
Category:Requests for example sentences in English	Category:Requests for example sentences by language	Category:English requests for example sentences
Category:Requests for quotations in English	Category:Requests for quotations by language	Category:English entries needing quotation
Category:Requests for dates of English quotations	Category:Requests for quotation dates by language	Category:Requests for date (no language-specific category)
--
Category:Requests for translations into Sanskrit	Category:Requests for translations by language	Category:Translation requests (Sanskrit)
Category:Requests for review of Sanskrit translations	Category:Requests for review of translations by language	Category:Translations to be checked (Sanskrit)
--
Category:Requests for English terms	Category:Requests for terms by language	Category:English term requests
Category:Requests for native script of Sanskrit	Category:Requests for native script by language	Category:Sanskrit terms needing native script
Category:Requests for transliterations of Sanskrit	Category:Requests for transliterations by language	Category:Sanskrit terms needing transliteration
--
Category:Requests for definitions in English entries	Category:Requests for definitions by language	Category:English entries needing definition
Category:Requests for inflections in English entries	Category:Requests for inflections by language	Category:English entries needing inflection
--
Category:Requests for attention in English entries	Category:Requests for attention by language	Category:English terms needing attention
--
Category:Requests for images in English entries	Category:Requests for images by language	Category:English entries needing images
Category:Requests for references for English terms	Category:Requests for references by language	Category:English entries needing reference
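The point of the table is that every name follows one pattern: it starts with "Requests", the language is slotted into a fixed position, and every umbrella is named "Requests for ... by language". A minimal Python sketch of that pattern, covering only five rows of the table; the dictionary keys and the `category_names` function are hypothetical and illustrative only, not part of any existing Wiktionary tool.

```python
from __future__ import annotations

# Name patterns copied from five rows of the proposal table; "{lang}" is the language slot.
# The short keys are hypothetical labels, not Wiktionary identifiers.
PROPOSED = {
    "etymology": ("Requests for etymologies in {lang} entries",
                  "Requests for etymologies by language"),
    "pronunciation": ("Requests for pronunciation in {lang} entries",
                      "Requests for pronunciation by language"),
    "translations": ("Requests for translations into {lang}",
                     "Requests for translations by language"),
    "terms": ("Requests for {lang} terms",
              "Requests for terms by language"),
    "references": ("Requests for references for {lang} terms",
                   "Requests for references by language"),
}


def category_names(kind: str, lang: str) -> tuple[str, str]:
    """Return (language-specific category, umbrella category) for one request type."""
    specific, umbrella = PROPOSED[kind]
    return "Category:" + specific.format(lang=lang), "Category:" + umbrella
```

For example, `category_names("translations", "Sanskrit")` gives the pair Category:Requests for translations into Sanskrit and Category:Requests for translations by language. Only the preposition before the language varies (in, into, for); the "Requests" prefix and the umbrella shape stay fixed.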

--Daniel Carrero (talk) 01:35, 29 October 2016 (UTC)[reply]

Related terms heading in the edittools

Shouldn't Related terms and See also have three and four equals signs respectively around them? "Relatedness" is a property of the word and can't be specific to one of its lexical categories. The case for See also is more debatable, I guess. --Dixtosa (talk) 13:01, 29 October 2016 (UTC)[reply]

They should both be level 4 headers, as they both pertain to the current word and not to all words of the same spelling. —CodeCat 15:40, 30 October 2016 (UTC)[reply]

Every other link in edittools is adjusted to words having one etymology. --Dixtosa (talk) 15:44, 30 October 2016 (UTC)[reply]
