Wiktionary:Beer parlour/2014/July


Category for all lemmas again

Previous discussion: Wiktionary:Beer parlour/2014/January#A category for all words or lemmas in a language

The previous discussion seemed to have general support, so I would like to make this change, but there are a few details I'd like to ask about first. We can either have just a category for all lemmas, but nothing else changes, or we could split off all "form" categories into their separate tree and have another category for non-lemmas (which may not be all that useful in the end?). A third option would be to have a category for lemmas alongside a category for all terms in a language regardless of lemma status. However, this last option could also be achieved by mentally merging the lemma and non-lemma categories, so this does not have much added value over the second option. —CodeCat 11:34, 1 July 2014 (UTC)

I prefer the idea of having a per-language category with all words (not just lemmata/headwords), rather like the way the Official Scrabble Words is presented. When I've used Index:English in the past — of course, it's years out of date now — I've wished it had all words and word forms. Equinox 13:25, 6 July 2014 (UTC)
I think we should have both: one category for all words, and another category for all lemmata. Fr.Wikt and De.Wikt already have categories for all words in each language. Both categories would have many uses. A category of all words would be useful for scrabble players, and for finding entries in the event that we needed to (a) make some change to all words in a certain language, or (b) examine all words in a certain language to see which of them met a certain criterion (e.g. used an acute accent, if we decided that they were all actually supposed to use a macron). (On De.Wikt I used to use the "all words" categories to look for words I didn't recognize, check Google Books and other dictionaries for them, and 'RFV' them if necessary.) A category of all lemmata would be useful for finding words to alliterate, and would also probably be more useful for any other practical purpose, for highly inflected languages where inflected forms would otherwise swamp the lemmata. Both categories would allow Wiktionary to be used like a paper dictionary, where all words can be seen in alphabetical order regardless of POS. - -sche (discuss) 16:22, 6 July 2014 (UTC)
Would a category for all lemmas, and another for all non-lemmas also be ok? That way, you could still look through all words, by searching through both categories. —CodeCat 16:34, 6 July 2014 (UTC)
No, I think there are advantages to having a category that already contains all words, vs having to merge two categories oneself. And I don't actually see a benefit to having a category for all non-lemmata at all, besides that it might provide a more up-to-date count of "form[-of] definitions" than WT:STATS does.
It's also worth noting that a category for all words will be simpler on a philosophical level, and presumably also on a technical level, to implement than a category for all lemmata, because for the "lemmata only" category we will have to wrestle with questions like: are Template:alternative spelling ofs lemmata? Are Template:standard spelling ofs lemmata? What scalable way is there to know which category to use for entries that only contain {{head|foo}} with no POS set? What scalable way is there to know which category to use for entries like messages (q.v.)? Etc, etc. Whereas, anything with {{head|en}} can go into the "all words" category. - -sche (discuss) 16:58, 6 July 2014 (UTC)
It's more that if we have a category for lemmas and all words, then every lemma in every language will have two more categories added to it. If we split them, it will only be one. As for the question of what is a lemma, I think it's relatively simple: if it would probably be listed as a lemma in a paper dictionary, we would do the same. My intention was to create separate category trees for lemmas and non-lemmas, Category:English lemmas and Category:English non-lemma forms. The former would contain Category:English nouns, Category:English verbs etc, while the latter would have Category:English plurals, Category:English verb forms and so on. I would consider an alternative spelling a lemma, because it is the lemma form of a word, and would presumably be found in a paper dictionary with a "see (other lemma)" notice. —CodeCat 17:31, 6 July 2014 (UTC)
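The two-tree split described above can be sketched roughly as follows. This is only an illustration: the category names are the ones mentioned in the thread, while the function `parent_category` and the two lookup sets are hypothetical and do not correspond to any existing module or template.

```python
# Hypothetical sketch of the proposed split into two category trees.
# The part-of-speech names are taken from the discussion; the sets are
# deliberately incomplete and would need to be extended per language.
LEMMA_POS = {"nouns", "verbs"}
NON_LEMMA_POS = {"plurals", "verb forms"}


def parent_category(language: str, pos_category: str) -> str:
    """Return the proposed parent category for a part-of-speech category."""
    if pos_category in NON_LEMMA_POS:
        return f"Category:{language} non-lemma forms"
    if pos_category in LEMMA_POS:
        return f"Category:{language} lemmas"
    # Undecided cases (e.g. entries with no POS set) are left for humans.
    raise ValueError(f"no tree assigned yet for {pos_category!r}")


print(parent_category("English", "nouns"))
print(parent_category("English", "verb forms"))
```

Under this scheme each entry gains one extra category rather than two, and an "all words" listing would be obtained by reading both trees together.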

FWIW, De.Wikt and Fr.Wikt both use their equivalents of Category:English language as their "all words in English" categories. We could either follow that model, or come up with a separate category, like Category:English words. - -sche (discuss) 16:58, 6 July 2014 (UTC)

A plea for more scrupulous patrolling

Rather sloppy content has been slipping through RC patrol recently. I have found some through second-hand monitoring pages like Special:UncategorizedPages and Special:Shortpages. Apparently User:SemperBlotto has been inactive lately, which means that someone else has to do what he has been doing. I urge all sysops and patrollers to visit Special:RecentChanges more often.

On request, I can grant rollback and patroller rights to trusted regulars. Keφr 06:56, 2 July 2014 (UTC)

Rollback and patroller rights AFAIK have in the past been done at WT:WL, requiring two admins' input, not one. —msh210 (talk) 05:55, 6 July 2014 (UTC)
Given the lack of interest, this question is kind of academic anyway, but: Wiktionary:Beer_parlour/2013/October#Purplebackpack89 Rollback request. And for some (if not most) users listed at Special:ListUsers/rollbacker the rollback or patroller right has been granted without any process at all (just because Stephen sees someone undo a lot of edits). Of course, for me an autopatrolled flag (which is granted at WT:WL with input from two admins) is a prerequisite here. And given that I am announcing this in public, and it can be undone in case someone disagrees with my judgement, I think it should not pose a problem. Keφr 06:49, 6 July 2014 (UTC)
Sounds good to me. —msh210 (talk) 07:40, 6 July 2014 (UTC)
@Kephir: Please make me a patroller. I don't promise anything, but becoming the patroller will create the temptation for me to actually patrol. Let the patroller flag be removed from me as soon as anyone disagrees. --Dan Polansky (talk) 09:00, 6 July 2014 (UTC)
Granted. Keφr 09:09, 6 July 2014 (UTC)
Wait, Dan doesn't have the mop? If he doesn't, he should. Purplebackpack89 15:02, 6 July 2014 (UTC)
He shouldn’t be an administrator if he still can’t deal with editors peacefully. I don’t think that he’s merited patroller rights either. --Æ&Œ (talk) 18:59, 6 July 2014 (UTC)
To be clear, I am hardly a big fan of Dan's, but given his, shall I say, very critical attitude to other people's editing, I doubt he is going to abuse the "mark as patrolled" button too much. About the rollback button, I am less sure. Keφr 20:08, 6 July 2014 (UTC)
AEOE, if dealing with editors peacefully is a criterion for adminship, there are some admins who should have their mops taken away. Purplebackpack89 22:49, 6 July 2014 (UTC)
Too bad SemperBlotto drove away all the new users that could have picked up the slack :P Kaldari (talk) 08:38, 13 July 2014 (UTC)

Converting WT:Information desk to monthly pages

Moved from Wiktionary:Grease pit/2014/July#Converting WT:Information desk to monthly pages

Can we do this now? The last time this was proposed there was some contention that new users might be confused by the monthly pages system and post things to the wrong page. However, I cannot recall a single such incident, so this seems to be a non-issue. Shall we switch WT:ID to the monthly page system as well? The benefits are quite obvious.

Keφr 21:14, 1 July 2014 (UTC)

There are plenty of examples of people (and in some cases the "+ (add section)" button itself—see [1]) getting confused and mistakenly posting to the main page rather than the monthly subpages, e.g. [2] and [3]. (There are also examples of people posting to the wrong monthly subpage.) However, I no longer feel that this is much of a problem. - -sche (discuss) 22:44, 1 July 2014 (UTC)
Is it even worth asking the question? The page doesn't get very big and shows no signs of growth AFAICT. DCDuring TALK 23:31, 1 July 2014 (UTC)
Yes it does. --WikiTiki89 23:35, 1 July 2014 (UTC)
I think all these were submitted while MediaWiki:Common.js was broken. So assuming it will not break too often, we are rather safe. Keφr 05:13, 2 July 2014 (UTC)
This is a WT:BP question now, since we know we are technically capable of it. --WikiTiki89 22:49, 1 July 2014 (UTC)
Yes, I see. The method used for BP would work for ID, but not for request pages without further complications. DCDuring TALK 00:28, 2 July 2014 (UTC)

It looks like -sche is for it, DCDuring has been convinced(?), and Wikitiki89 seems kind of supportive. One more supporter, and if no one objects, I will go with it. Keφr 13:32, 5 July 2014 (UTC)

  • I oppose this. The benefit that I can see is no need to archive the page anymore, but the page is low-profile enough that archiving is not really a problem. The subpaging seems less intuitive than having a single page. --Dan Polansky (talk) 15:52, 5 July 2014 (UTC)
    • I think one of the reasons it is "low-profile" is that nobody wants to visit it, because it is so annoyingly large. (See WT89's diff above.) Keφr 16:01, 5 July 2014 (UTC)
      • I don't think so; the information desk is a rather unimportant page, especially compared to Beer parlour, so it gets low traffic; nothing to do with the size. As an aside, you said "I knew I could count on you." in the edit summary. If you want to say such things, be enough of a man and put them in the discussion, or, better yet, drop that juvenile behavior. --Dan Polansky (talk) 16:37, 5 July 2014 (UTC)
  • How about automating the current archiving method by archivebot.py (docs)? There will need to be slight (and probably good, I'd say) changes, though; the month headings need to go, and archiving will be done section by section, not all sections in a period at once. For an example, see ArchiverBot working on [4]. I can volunteer to run it, if there is interest. Whym (talk) 10:53, 6 July 2014 (UTC)
    • Not workable in my opinion. We have rather few bots, and for all I know, there is no one who can afford to run a bot full-time. And even if, they would probably prefer it to handle mainspace tasks. Also, I never liked Wikipedia-style archives. With monthly pages, you know that if you started a thread in one place, it stays there, unless expressly moved. Keφr 13:34, 6 July 2014 (UTC)
      • Just to clarify, I am an operator of the archive bot for two other wikis. It costs almost nothing to me to add one wiki. Whym (talk) 16:05, 6 July 2014 (UTC)
        • And who will replace you if you stop running it? (Which is another problem here. High rotation and little staff.) Keφr 16:43, 6 July 2014 (UTC)
          • (Just responding to "if you stop running it" for the record, not objecting to the other concerns Kephir and -sche noted) My bot uses Tools Lab. [5] Co-maintainers are welcomed. It could also be useful for archiving user talk pages. Whym (talk) 03:09, 7 July 2014 (UTC)

{{look}} Asking User:Æ&Œ, User:Equinox, User:-sche, User:Angr, User:Stephen G. Brown for further input. (Anyone else is also welcome.) Keφr 16:43, 6 July 2014 (UTC)

  • I have no strong opinion on the issue one way or the other. —Aɴɢʀ (talk) 19:13, 6 July 2014 (UTC)
  • I also have no strong feeling about monthly subpages. In the past, I opposed converting the Information Desk to subpages, out of concern for teh noobs, but as evidenced by my comment above, I no longer feel that people posting to the main page rather than the subpages is much of a problem, given how easy it is to move threads. The suggestion that a bot could archive threads on an individual basis is interesting, but the number of pages on which that might conceivably be useful is small (BP, GP, ID, ?TR?), and I think the benefit Kephir notes (of knowing that if you started a discussion on the July subpage, that's where it's staying) outweighs the small potential benefits of per-thread archiving. - -sche (discuss) 19:29, 6 July 2014 (UTC)
  • I don’t have a strong feeling about it. It gets very little traffic, so I don’t think it matters either way. —Stephen (Talk) 03:29, 7 July 2014 (UTC)
    • In the last archived batch, ID had 16 threads per month on average. Which seems rather typical, and is not that small in my opinion. The Etymology Scriptorium often has fewer topics.
    Anyway, what we have here seems to be three "welllll, sure, if you want to" (WT89, -sche, DCD), one oppose (DP), and two strong lacks of opinions (Angr, Stephen). I am going to convert it now. Revert me if you give a shit. Keφr 09:38, 9 July 2014 (UTC)
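For reference, the monthly-subpage scheme discussed in this thread amounts to deriving the page title from the date a thread is started. A minimal sketch, assuming the `Wiktionary:<room>/<year>/<month>` pattern already used by this page's own title; the function name `monthly_subpage` is hypothetical and this is not the code the site actually runs:

```python
from datetime import datetime, timezone

# English month names are spelled out so the result does not depend on
# the locale of the machine running the sketch.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def monthly_subpage(room: str, started_at: datetime) -> str:
    """Return the monthly subpage a new thread would be filed under."""
    return f"Wiktionary:{room}/{started_at.year}/{MONTHS[started_at.month - 1]}"


started = datetime(2014, 7, 9, 9, 38, tzinfo=timezone.utc)
print(monthly_subpage("Information desk", started))
```

Because the title depends only on the start date, a thread stays on the same page for its whole life, which is the property Kephir contrasts with per-thread bot archiving.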

New Word of the Day feed

Featured Feeds for Word of the Day are now available: rss, atom. If you have a suggestion to better format the feed, I'd like to help implementing. Otherwise, enjoy. :) Whym (talk) 15:30, 2 July 2014 (UTC)

I just set up a FWOTD feed, when should I expect it to appear? Also, it would be nice if the feed item contained the actual word for its title. I already know how to set that up, but it requires running a bot over WOTD/FWOTD pages, which I am too lazy to do right now (basically, the same way we solved the problem with context templates). Otherwise, wooooooo! Keφr 16:00, 2 July 2014 (UTC)
Feed names need to be added on the server side; see gerrit:136316. Should FWOTD be added for all Wiktionaries or only for English Wiktionary? Whym (talk) 11:19, 3 July 2014 (UTC)
I have no knowledge of other projects having a FWOTD. Keφr 11:21, 3 July 2014 (UTC)
Ok, I have made the request in bugzilla:67563. Whym (talk) 11:02, 6 July 2014 (UTC)
And it has been resolved: [6][7] Whym (talk) 09:32, 11 July 2014 (UTC)

Recent "Tbot" entries

I've been finding a few entries here and there that are tagged with the {{tbot entry}} template that date to 2013 and 2014. They had redlinked categories, and I created a few of those categories using the {{tbotcatboiler}} template before I realized that these were for new entries.

Not that I have anything against the type of entries Tbot used to create, but if we're going to be doing this sort of thing again, we should change the documentation so we're not listing someone who's no longer here as the contact, and talking about how things are different now that it's 2007. Chuck Entz (talk) 07:44, 6 July 2014 (UTC)

If these entries are not by Tbot… where do they come from? Keφr 13:36, 6 July 2014 (UTC)
See this. One user making a few. I'd ask User:Liuscomaes. --Type56op9 (talk) 00:37, 9 July 2014 (UTC)

Template:abbreviation-old

Hi,

Is the {{deprecated}} banner still valid on that template? The template seems to be widely used and no replacement is proposed. — Automatik (talk) 15:48, 7 July 2014 (UTC)

The replacement is to use a real part-of-speech header like "noun" or "verb". —CodeCat 16:06, 7 July 2014 (UTC)
What would be the replacement for IANAL? Keφr 16:08, 7 July 2014 (UTC)
Did you even look at the entry? :) —CodeCat 16:09, 7 July 2014 (UTC)
Oh, me stupid. Previous time I checked, the header was "Acronym". But truth is, even "Phrase" does not seem very fitting. Keφr 16:13, 7 July 2014 (UTC)
Well, in any case, the replacement is whatever header you would use for the fully spelled out form. So if "I am not a lawyer" is a phrase, then so is this. If not, then this needs to be changed, but I don't know what into. —CodeCat 16:15, 7 July 2014 (UTC)
And for the categorisation? {{en-noun|-}} doesn't seem to be correct for Mbps, and neither does {{en-noun}}, because there is no inflection for this word. — Automatik (talk) 17:12, 7 July 2014 (UTC)
{{en-plural noun}}? (Which I still think to be a stretch.) Keφr 17:15, 7 July 2014 (UTC)
I don't think so, because we can say 1 Mbps. — Automatik (talk) 17:19, 7 July 2014 (UTC)
I guess it's safe to say that it stands for both "megabit per second" and "megabits per second". --WikiTiki89 17:33, 7 July 2014 (UTC)
{{en-noun|Mbps}}? Keφr 17:35, 7 July 2014 (UTC)
Thank you, I used it. — Automatik (talk) 14:43, 8 July 2014 (UTC)
Realistically I don't think this template will ever be orphaned because it always needs human intervention. That is, a bot can't tell if it's a 'noun', a 'verb', (etc.) so a human editor is always needed. Meanwhile the template is still being used in new entries. But in principle 'noun', 'verb' (etc.) offer more information to the user, while things like 'acronym' should be in the etymology, as 'acronym' explains how the word was formed in the first place. Renard Migrant (talk) 10:13, 17 July 2014 (UTC)
We could make an abuse filter for it. —CodeCat 11:08, 17 July 2014 (UTC)

"Definitions" header in Chinese entries

Apparently, people have been adding this header to Chinese entries instead of part-of-speech headers. But I recall that there was no support for this in the previous discussion. Why is this being done anyway? These entries should be fixed. —CodeCat 11:58, 8 July 2014 (UTC)

Can we just have a real vote on it? Otherwise people are just going to keep going back and forth. DTLHS (talk) 21:53, 8 July 2014 (UTC)
Because the validation of a language-specific header does not require consensus by vote (Wiktionary:Entry layout explained/POS headers#Other headers in use). It only needs the agreement between editors who regularly deal with such entries. The "definitions" header is no different from the "Han character" header in use in the hundreds of thousands of Chinese character entries (e.g. ). Wyang (talk) 03:23, 9 July 2014 (UTC)
Inventing a new part of speech header for languages where it's appropriate (I've done this too, I added the "Relative" POS for Xhosa and Zulu) is not a problem. It's a very different story when you're introducing a new header to remove part-of-speech information altogether. That is my objection here. —CodeCat 11:33, 11 July 2014 (UTC)
Regardless of bureaucracy, what exactly is the reason for replacing POS headers with ===Definitions===? --WikiTiki89 13:54, 11 July 2014 (UTC)
I'm with CodeCat and DTLHS here. Why not just split the meanings by part of speech like we do for literally every other language? If there's a case to be made for not doing this, set it out in a vote where we can all see it. Renard Migrant (talk) 10:43, 15 July 2014 (UTC)
  • Re: what is the reason, there's the simple fact that 1) Chinese doesn't inflect at all, so there's no useful information provided by the POS header other than the POS itself, which can easily enough be included inline; and 2) many Chinese terms have basically the same meanings applied in different POS ways. Take , for example. We've got 13 senses listed under 5 different POS headers. The headers really only serve to break up the page in ways that are unintuitive for Chinese. ‑‑ Eiríkr Útlendi │ Tala við mig 18:52, 15 July 2014 (UTC)
    Thank you for explaining your reasoning. Here's what I think: The POS headers are useful because they make it easier to find the definition you are looking for. Most of the time when you are looking up a word, you already have a good sense of its POS because of how it was used in a sentence, and so you can use the headers to narrow down your search for the definition. It would be very redundant to list "(noun)" before every noun sense, etc. BUT I think it may be a good idea to remove the requirement for "inflection lines" after each POS header, since they serve no purpose other than to duplicate the same information over and over. --WikiTiki89 18:59, 15 July 2014 (UTC)
Some of the reasons were also mentioned here: Template_talk:zh-pron#Why_does_this_categorise_in_part-of-speech_categories.3F. The choice of PoS is often arbitrary, based on the translation into English; dictionaries either mix up PoS or ignore it. By any system, the listed PoS's do not sufficiently represent the actual usage. --Anatoli T. (обсудить/вклад) 05:28, 17 July 2014 (UTC)
I oppose "Definitions" header in Chinese entries, now as before. I already posted this, albeit to what is now ranked by someone as "off-topic", below. --Dan Polansky (talk) 18:05, 23 July 2014 (UTC)

Abbreviated Authorities in Webster

I have recently discovered the Abbreviated Authorities in Webster Table, and, noticing that a few of the early entries have been linked to Wikipedia, I have been adding a few such links myself. It's interesting, though there are occasionally mismatches of dates (should the Wikipedia date be moved in?). But it's a bit inconvenient for navigation. I feel that the table should be divided by initial letter. If this seems to be generally agreed upon, is it something I would need to do myself or is it something that should be done by a coding whizz? —ReidAA (talk) 08:08, 9 July 2014 (UTC)

Excellent! That table could be quite useful in resolving some of the {{rfquotek}} entries.
I started manually splitting the table by initial letter. It is not hard. It just requires copying the wikitable formatting surrounding the "W" or "Y" headers and inserting it in the appropriate place in the undivided table.
What might be a great help would be adding links to Wikisource, Google Books, or Project Gutenberg versions of some of the specific works. As an example I did so for Hawking and Hunting. To make sure that the work is useful we should extract from the XML dump a list of how often each authority is used within {{rfquotek}}. DCDuring TALK 10:12, 9 July 2014 (UTC)
The table is now initialised. I've done a bit more wiki-referencing some of the authors. --Catsidhe (verba, facta) 12:23, 9 July 2014 (UTC)
Thanks. A dump run would help us see which authorities were actually in use, so, for now, we may as well just pursue what is interesting. DCDuring TALK 14:01, 9 July 2014 (UTC)
As we would want to use this to source citations, the best forms of a work to link to would be those that allowed search at once of the entire range of the authority in question. Wikisource often breaks the work into chapters, which is unsatisfactory for search, though arguably good for linking. It is not so handy to have to download the work to search it. DCDuring TALK 14:22, 9 July 2014 (UTC)
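The dump run suggested above could look something like the sketch below. It is a rough illustration only: it assumes that the authority is passed as the first positional parameter of {{rfquotek}} (e.g. `{{rfquotek|Dryden}}`), and it takes page texts as plain strings, leaving out the step of reading them from the XML dump. The name `count_authorities` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: the authority name is the first positional parameter,
# as in {{rfquotek|Dryden}}. Anything after a further "|" is ignored.
RFQUOTEK = re.compile(r"\{\{rfquotek\|([^|}]+)")


def count_authorities(page_texts: Iterable[str]) -> Counter:
    """Count how often each authority appears in {{rfquotek}} calls."""
    counts: Counter = Counter()
    for text in page_texts:
        for name in RFQUOTEK.findall(text):
            counts[name.strip()] += 1
    return counts


# Made-up wikitext fragments, purely for demonstration.
sample_pages = [
    "# A sense. {{rfquotek|Dryden}}\n# Another sense. {{rfquotek|Shakespeare}}",
    "# A third sense. {{rfquotek|Dryden}}",
]
for authority, uses in count_authorities(sample_pages).most_common():
    print(authority, uses)
```

Sorting by frequency would show which authorities are worth linking first.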

Proper nouns

I just came across Bible and Qur'an, which are labelled proper nouns. But at the same time, these have plurals and can take an indefinite article. I just read through w:Proper noun, which suggests that real proper nouns (or proper names) can't take indefinite articles nor have plurals. If they do, then they're not proper nouns, but refer to a class of things rather than a unique entity. The article uses "Toyota" as an example that can be either: the company itself as a proper noun, or a car made by the company as a common noun. In this sense "Bible" is a common noun because it's a book that many copies can exist of. It doesn't act grammatically the same as other book or story titles, whether old or modern. Compare for example "Odyssey", which takes a definite article like "Bible", but doesn't normally have an indefinite article: a Bible versus a copy of the Odyssey, not *an Odyssey. So I wonder what kind of criteria we should apply to proper nouns on Wiktionary, and whether we shouldn't consider relabelling some. —CodeCat 18:33, 11 July 2014 (UTC)

Are given names not proper nouns? They can be pluralised: "All the Jameses in the room raised their heads.", and they do not seem to have a distinct meaning in the plural. Keφr 18:44, 11 July 2014 (UTC)
(e/c) In particular, we currently label personal names as proper nouns, while simultaneously admitting (in many though not yet all entries) that they pluralize. Ditto country names (Germany : Germanies, Germanys, America : Americas, France : Frances). - -sche (discuss) 18:46, 11 July 2014 (UTC)
These words can be both common and proper nouns. Compare the following sentences:
  1. The Bible says to honor one's parents.
  2. Jack read the Bible.
  3. Jack put the Bible he had just bought under his pillow.
In the first sentence, "the Bible" is indisputably a proper noun, while in the third, it is indisputably a common noun; in the second, however, it can be interpreted either way. --WikiTiki89 19:01, 11 July 2014 (UTC)
In cases such as Bible and Qur'an, I think we should include both POS sections, which is what Bible already does. — Ungoliant (falai) 18:54, 11 July 2014 (UTC)
Yes, in the case of books, I think including both sections (Proper noun, and Noun) is best. In the case of personal names, on the other hand, I think including two sections would be unjustifiable; as Kephir notes, the singulars and plurals have the same sense (differing only in number): "one Richard" means one person named Richard, "two Richards" means two people named Richard. Whether that means it would be better to relabel all personal names plain nouns, or live with pluralized proper nouns, I don't know. - -sche (discuss) 19:07, 11 July 2014 (UTC)
"One Richard" is a common noun. "Richard" by itself is a proper noun. However, I think it would be overkill to create common noun sections for every name. --WikiTiki89 19:14, 11 July 2014 (UTC)
We could also just call them all nouns, couldn't we? We could keep the category if needed, but just use the normal "Noun" header. —CodeCat 19:34, 11 July 2014 (UTC)
(@Wikitiki) I don't necessarily disagree that "Richard" can be a proper noun, but I note that whatever parts of speech "Richard" can have, "Richards" can also have. The very reason that given names' definition-lines are italicized is that they are in most uses non-gloss; "and then Richard arrived" means "and then a person named Richard arrived", not *"and then a male given name arrived". An exception would be a hypothetical use like *"not long after the first scribe began to spell the adjective which had been hart as hard, the change spread to instances of the word in compounds, and with that, Richard had arrived", where "Richard" really would be a proper noun meaning "a male given name" — but NB Richards could (equally hypothetically) be used the very same way, e.g. *"and when 'd'-final words began to pluralize with '-s' rather than '-es', Richards arose". - -sche (discuss) 19:36, 11 July 2014 (UTC)
Mentioning a word is an entirely different story. I was not referring to that at all. I also disagree that the plural exists as a proper noun (except in cases where a group of people who are all named "Richard" are collectively named "Richards"; e.g. Richards are coming for dinner, where "Richards" refers to a specific group of people). --WikiTiki89 19:48, 11 July 2014 (UTC)
FWIW, here's how de.Wikt handles it: common names are common nouns, e.g. de:Angela's POS is "noun - first name" and de:Fritz has one POS "noun - first name", another "noun - last name", and a third, labelled "noun", which covers in one section the slang uses that our entry on Fritz split into a "noun" and a "proper noun" section. When a name is defined as referring to only one specific person, e.g. de:Archimedes, it is labelled "noun - proper noun" (but contrast de:Platon). - -sche (discuss) 19:36, 11 July 2014 (UTC)
[e/c] A basic distinction is between proper names (of specific entities, eg, "The White House", "Mack the Knife", "Germany", "The Federal Republic of Germany", "Deutschland", my late dog "Hayek" [short for his full name "Friedrich Augustus von Hayek"]) and proper nouns. CGEL (Huddleston and Pullum) hold that "Proper nouns, by contrast, are word-level units belonging to the category noun. Clinton and Zealand are proper nouns, but New Zealand is not." and "Proper nouns are nouns which are specialised to the function of heading proper names. There may be homonymy between a proper noun and a common noun, often resulting from historical reanalysis in one or other direction." Their examples are sandwich and Sandwich and rosemary and Rosemary.
Our L3 header "Proper noun" is applied both to terms that serve as names of specific entities and to "nouns which are specialised to the function of heading proper names". Even a term such as White House, which is often considered the name of a specific entity, ie, a proper name, can be shown to be attestably made into a plural. Are the uses of White House to be taken as nicknames for the specific entities Roosevelt White House or the Franklin Delano Roosevelt White House?
Whether in a given case we have under the L3 header "proper noun" a proper name or a proper noun (in the CGEL sense), there is no reason not to show plurals, if attestable. Showing a word like Bible as both a common noun and a proper noun seems fine as the common noun meanings are not entirely predictable from any of the meanings of the proper noun and are attestable, but both common and proper noun meanings are likely to be pluralizable, some attestably so. DCDuring TALK 20:09, 11 July 2014 (UTC)

Reading all of the discussion here, I wonder if things would benefit from an approach like the German Wiktionary, with all of them treated as nouns. Our header structure is different, so it would not fit in exactly the same way. So how about relegating proper name-ness to the actual definition line? For given names, we already have a template to do the job, and for others, the definition already implies properness in most cases. So there's nothing that the header "Proper noun" really adds beyond what the definition already tells the user. It would also allow us to list plurals without problems, while labelling the real proper names as uncountable, and we could also merge Noun and Proper noun sections together in entries when the distinction is not so clear anyway (like in Bible). Furthermore, we need to distinguish nouns that are used without the definite article (such as names) from those that are used with it. There is nothing in the current Bible entry that indicates this to the user. —CodeCat 20:59, 11 July 2014 (UTC)

Uncountability, as we use it, is not the same as not having a plural, though many use {{en-noun}} as if that were true, either through lack of understanding of uncountability, not reading the {{en-noun}} documentation, or being defeated by it. The problem would seem to be that we use "uncountable" in reference to mass nouns, specific entities, and nouns whose plural form is the same as the singular form alike. If we could find attestation for expressions like "too much/little White House" (which we probably can), that would show White House to be uncountable in the sense of mass noun.
Nothing in a template should per se prevent us from making a decision to show plurals for things that appear under the proper noun header. We would just have to revise {{en-proper noun}} and search for instances where the plural shown by the template ("tail") did not conform to usage ("dog").
OTOH, none of the OneLook dictionaries call (the) White House a proper noun. (Most call it a noun; some seem to dispense with PoS labels.) We could either take that as an indication that we have bitten off more than we can chew or that we are making an un-lemming-like advance over other dictionaries.
Use with the is usually grammatical information (eg, no the in attributive use; the used to emphasize that a named entity was the famous one bearing the name), but may also be sense-level information (examples to follow).
It seems to me that we are still some distance away from having a sufficient shared appreciation of the issues involved in altering the thousands of English proper noun L3 headers, let alone those in other languages. DCDuring TALK 22:10, 11 July 2014 (UTC)
  • I dissent from Wikitiki and CodeCat on this. It is possible for a proper noun to have both singular and plural forms. You can have one James or a lot of Jameses, one Henderson or a lot of Hendersons. I also don't understand where CodeCat is coming from with her Wikipedia argument: I read the article last night, and I came out of it thinking the opposite. Purplebackpack89 17:45, 23 July 2014 (UTC)
    @Purplebackpack89: See the section w:Proper noun#Capitalized common nouns derived from proper nouns. --WikiTiki89 17:51, 23 July 2014 (UTC)
    Jameses is the plural form of a common noun. It's very easy to see this just by back-forming the countable singular. A James is not the same thing as James, and there is certainly a big difference between saying you don't look like James and you don't look like a James. Furthermore, the statement James is a James is true, which illustrates that a single specific person called James is a member of the class Jameses (people who have the name James). Compare this to a car is a vehicle which has the same semantic structure. —CodeCat 18:35, 23 July 2014 (UTC)
    The problem, though, is that "Jameses" can be definite or indefinite. "a James" (indefinite) might be common, but "the James" is proper. Purplebackpack89 18:41, 23 July 2014 (UTC)
    "The James" is still a common noun, unless it is turned into a name/nickname. For example: Here "The James" is a common noun: "The James I met yesterday was taller than the James I met the day before." But here "The James" is a proper noun: "There are five people named 'James' at school, but only one of them—the biggest and baddest one—we call 'The James'; Everyone is afraid of The James." --WikiTiki89 18:49, 23 July 2014 (UTC)

Example sentences in ELE, linking of words and delinking of transliterations

What's the history of the rule behind Wiktionary:ELE#Example_sentences? Who said we can't link individual words? Now that transliteration is (unintentionally) wikified in {{usex}} (see Wiktionary:Grease_pit/2014/May#Transliteration_linked_to_individual_parts_in_usexes_when_hyperlinked), my request to delink it has been brushed aside on the grounds that we shouldn't link words anyway. Can we change this rule ("not contain wikilinks") for words used in usage examples? Do we really need a vote for that? Can somebody help delink usex transliterations, as in this revision or this revision? --Anatoli (обсудить/вклад) 00:28, 14 July 2014 (UTC)

Someone needs to edit Module:usex to delink transliterations- removing links by hand is a waste of time. DTLHS (talk) 00:39, 14 July 2014 (UTC)
Yes, I agree (thanks for agreeing to fix!) but the rule itself doesn't reflect the reality and I think it's not helpful. A lot of Russian usexes are linked (not my edits but I don't see it as a problem; in fact, it may be quite useful for learners to link to lemmas or some difficult words) and most Chinese usexes are linked, which is very useful for languages with no straightforward word boundaries. Anyway, editors should be free to choose whether they want to wikify individual words in usexes. --Anatoli (обсудить/вклад) 00:58, 14 July 2014 (UTC)

Gender templates for French inflected forms

{{fr-adj-form}} has been edited so that it no longer accepts gender. I understand that adjectives do not inherently have their own gender but agree in gender with what they are describing. I also understand that the 'definition' already says 'feminine singular of' or 'masculine plural of'... but I still think we should encourage having the gender in the head word wherever possible.

My proposal is to enable gender in {{fr-adj-form}} and to add back the gender to French adjective forms wherever possible. This is very doable by bot, for example \{\{fr\-adj\-form\}\}\n\n# \{\{feminine of\|([\ -9\;-\\\^-z\}-ퟻ]+)(\||\}) is a regex that finds all the uses of {{fr-adj-form}} with no gender, followed by {{feminine of}} on the following line (with a single blank line in between). Renard Migrant (talk) 16:49, 14 July 2014 (UTC)
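As a rough illustration of the bot search described above, here is a minimal Python sketch. It assumes a simplified pattern rather than the exact character class quoted in the post: a bare {{fr-adj-form}}, one blank line, then a definition line beginning with {{feminine of|. The function name and the sample wikitext are hypothetical.

```python
import re

# Simplified stand-in for the regex quoted above (an assumption, not the
# original character class): a bare {{fr-adj-form}}, one blank line, then a
# definition line that starts with {{feminine of|LEMMA.
PATTERN = re.compile(
    r"\{\{fr-adj-form\}\}\n\n# \{\{feminine of\|([^|{}\n]+)(?:\||\}\})"
)


def genderless_feminine_forms(wikitext: str) -> list[str]:
    """Return the lemmas named by {{feminine of}} after a gender-less headword."""
    return [m.group(1) for m in PATTERN.finditer(wikitext)]


# Made-up sample entry text, for illustration only.
sample = (
    "==French==\n\n===Adjective===\n{{fr-adj-form}}\n\n"
    "# {{feminine of|grand|lang=fr}}\n"
)
print(genderless_feminine_forms(sample))
```

A headword line that already carries any parameter is not matched, so only the gender-less uses are reported.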

Why should the gender information be in two places, both on the headword line and in the definition? - -sche (discuss) 18:28, 14 July 2014 (UTC)
I oppose this for the reason -sche gave, and the reasons you yourself gave too. —CodeCat 18:29, 14 July 2014 (UTC)
I find it quicker to understand with the gender in the head word. I say quicker, probably by a few tenths of a second. Renard Migrant (talk) 21:57, 14 July 2014 (UTC)

How about going the other way then? Actively removing the gender from the headword template? That's even easier to do! Renard Migrant (talk) 11:03, 15 July 2014 (UTC)

Software update: <ref> without <references/> no longer shows an error or categorizes

As was announced on Wikipedia but oddly not over here yet, "With the deployment of 1.24wmf12 on July 10, missing reference markup will no longer show an error; the reference list will show below the content [...] without adding a category, so there's no way to find and fix the affected pages." See this WP thread (permalink) and this WP thread (permalink) for discussion, and diff for an example of the phenomenon. Note that our abuse filter still (correctly) discourages adding <ref> without <references/>. - -sche (discuss) 17:18, 14 July 2014 (UTC)

If a page has multiple language sections, and un-<references/>ed ref tags are added to one of the upper language sections, the references appear in the last language section. This has the potential to be especially confusing for people who use Tabbed Languages. - -sche (discuss) 18:26, 14 July 2014 (UTC)

Uncountable?

In a discussion above, DCDuring noted something that (I think) implied we're not using the term "uncountable" the way we should. But I'm not quite sure what this means, as to me uncountable just means having no plural. Is this not what it means, and what does it mean in that case? I came across a few categories named "singulare tantum", is that the term we should be using instead of "uncountable"? —CodeCat 18:34, 14 July 2014 (UTC)

Uncountable does not mean having no plural, it means that quantities of the noun are not measured in discrete amounts. Theoretically, a noun could be countable, but not have a plural if, for example, there is only one in existence and no one ever speaks of any others. For an uncountable noun, it is impossible to say that "there is only one in existence". Proper nouns can be countable but not have a plural: there is only one William Shakespeare (barring metaphorical usage, or others who happen to have the same name), but William Shakespeare is most certainly countable. --WikiTiki89 18:57, 14 July 2014 (UTC)
What I understand, then, is that uncountable words have no plural for semantic reasons (it makes no sense to speak of a plurality) while the remainder have none only because it is simply rarely used or not at all. —CodeCat 19:04, 14 July 2014 (UTC)
Well I think that proper nouns such as William Shakespeare also don't have a plural for semantic reasons, but it's a different semantic reason. --WikiTiki89 19:23, 14 July 2014 (UTC)
"Paint" is uncountable when you talk about "some paint" but countable when you talk about "three different red paints". If something has no plural but is singular, I tend to use the "plural not attested": {{en-noun|!}}. Equinox 19:30, 14 July 2014 (UTC)
Yes, but the "paint"s in your two examples are different senses. --WikiTiki89 19:34, 14 July 2014 (UTC)
I think CodeCat is thinking about the inflection line for common nouns and {{en-noun}}, ie, not definition-level countability/uncountability distinctions.
The prevailing pattern of usage of "uncountability" by English native contributors at Wiktionary coincides with the mass noun concept. However, many uses of various early incarnations of {{en-noun}} used features of the template intended to mark uncountability (mass noun) to suppress the display of plurals, for whatever reason the contributor felt justified that suppression, eg, user didn't-know-how/couldn't-be-bothered to get plural ending in "es" or a truly irregular form to display, user didn't think noun had or should have a plural form, plural form was not attested. If you combine that with the changes to {{en-noun}} wrought by contributors with an imperfect understanding of the concept, you can understand why we have not made much progress in rectifying this. I hope we can come up with some scheme so that our inflection-line displays can be made correct without thousands of hours of tedium and are not too misleading in the interim. I doubt that bots can be relied on however, except perhaps for narrowly circumscribed cases.
At the sense level we use "labels" or "contexts" to distinguish. There is nothing that prevents understanding usually if someone uses a countable noun uncountably or an uncountable noun (mass noun) countably, but we have invested a great deal of effort in attempting to distinguish countable from uncountable senses, which effort is worth preserving. The task of marking each English noun sense as countable or uncountable (or both) is quite incomplete.
At the inflection-line level, we do not usually get data to support our claims that a given common noun is always or never countable or that countability or uncountability is the prevailing usage, relying mostly on native-speaker intuition, as most other dictionaries do not expend resources on this matter. DCDuring TALK 20:59, 14 July 2014 (UTC)
Would it be ok then if we adopt the practice of showing "no plural", "singular only" or the like in the headword line, and leave countable/uncountable information to the individual senses? That way the headword line is agnostic about countability, which makes sense if this can be different for different senses anyway. It would also mean changing the categorisation of many nouns, emptying out "uncountable nouns" categories in most cases and substituting it with something else. Possibilities might be Category:English singular-only nouns or Category:English nouns with no plural. We may want to revise the use of "plurale tantum" as well. —CodeCat 21:09, 14 July 2014 (UTC)
CodeCat said "as to me uncountable just means having no plural". Oh come on I find it hard to believe you're not better educated than that. There's such a thing as countable singular use, e.g. "I have a grain" is a countable singular use of grain. "I have some grain" is uncountable use of grain. Some countable nouns will be attested in the singular but not the plural. Renard Migrant (talk) 22:04, 14 July 2014 (UTC)
WT:AGF says we should take CodeCat's word for it. DCDuring TALK 22:15, 14 July 2014 (UTC)
No, it only says that we should assume CodeCat's intentions were in good faith. --WikiTiki89 15:00, 15 July 2014 (UTC)
@CodeCat: If eliminating inflection-line information would make things simpler for you, who am I to stand in your way? Why don't we eliminate the display of regular plurals (ending in "s", "es", and "ies") too? Oh, wait, users might value the information.
The logic of our entry display is that inflection-line information is assumed to carry over to definition lines unless there is something contrary indicated on the definition line. Thus exceptional plurals are sometimes displayed at definition lines, sometimes only at definition lines. It is a major change to depart from that formulation for one attribute of one PoS in one language, especially where the language is the wiki's host language.
So, before we start changing modules and templates of wide use, I would like to understand an implementation plan that preserved the correct information that was now in the inflection lines and transferred it to the definition lines for each type of headword-line, whether implemented using {{en-noun}} or {{head}} directly or by other means. A dump-processing run that took a census of the options used in {{en-noun}} would be useful for that. We must have at least a dated one to support the major changes you previously made to {{en-noun}}.
It would be nice if the changes were carried out with more care and knowledge than the changes made to {{en-noun}}. DCDuring TALK 22:15, 14 July 2014 (UTC)
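The census of {{en-noun}} options mentioned above could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: the page texts are already available as plain strings (reading the dump itself is not shown), arguments containing nested templates are skipped, and the function name and sample pages are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches {{en-noun}} or {{en-noun|...}} as long as the arguments contain
# no nested templates (a simplifying assumption of this sketch).
EN_NOUN = re.compile(r"\{\{en-noun((?:\|[^{}]*)?)\}\}")


def en_noun_census(pages: Iterable[str]) -> Counter:
    """Count how often each distinct argument string is passed to {{en-noun}}."""
    counts: Counter = Counter()
    for text in pages:
        for match in EN_NOUN.finditer(text):
            args = match.group(1)[1:]  # drop the leading "|", if any
            counts[args or "(no arguments)"] += 1
    return counts


# Made-up page texts, for illustration only.
pages = [
    "===Noun===\n{{en-noun}}\n",
    "===Noun===\n{{en-noun|!}}\n",
    "===Noun===\n{{en-noun|es}}\n\n===Noun===\n{{en-noun|!}}\n",
]
for args, n in en_noun_census(pages).most_common():
    print(n, args)
```

The resulting table shows which argument combinations are common enough to need a migration rule and which are rare enough to review by hand.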

Context Label: Reflexive

I have recently edited the module code for the context labels (https://en.wiktionary.org/wiki/Module:labels/data) so as to have it automatically send entries marked with the label "reflexive" into a category named "-LANGUAGE NAME- reflexive verbs". I did this in an attempt to have the Macedonian reflexive verbs compiled into a list, since I didn't see any other way to do this other than add "[[Category:Macedonian reflexive verbs]]" under each entry, which didn't seem like an ideal solution - I wanted something automatic, just like the automatic system that works for intransitive and transitive verbs. I also thought that if I merely wrote "[[Category:Macedonian reflexive verbs]]", it may end up erased in the future, whereas some automatic mechanism would be operable on a longer term. However, things have gone awry.

Apparently, the context label "reflexive" has been used for various entries in various languages to mark reflexive pronouns as well. It has also been used to denote reflexive senses of verbs which are not truly reflexive and thus don't belong in a reflexive verb list. Now, I suppose these things need mending, so I have come here to announce what has happened in hope that someone will be able to restore things the way they were before the change (and possibly advise me as to how to solve the problem I had with the Macedonian reflexive verbs, i.e. how to have them automatically go to a list of reflexive verbs). Martin123xyz (talk) 14:59, 15 July 2014 (UTC)

As far as I understand, you could have a label 'reflexive verb' that displays reflexive but categorizes in reflexive verbs. I don't know about other languages (much) but in French, almost all transitive verbs can be used reflexively, and almost no verbs are always reflexive, so you could talk about reflexive usage but not reflexive verbs (because they're not inherently reflexive, just they can be used that way). Renard Migrant (talk) 15:02, 15 July 2014 (UTC)
I noted before you made this change that calling verbs where one or more senses are used reflexively "reflexive verbs" is silly. Just look at Category:English reflexive verbs now. Almost none of them are actually reflexive, they just happen to have a sense that is used reflexively. The same applies to Category:English transitive verbs and Category:English intransitive verbs as well, which also had categorisation added recently for some reason. And Category:English countable nouns and Category:English uncountable nouns are a similar problem, which prompted me to start the discussion above. —CodeCat 15:03, 15 July 2014 (UTC)
That's the argument for categories with names like Category:English nouns with countable senses. Even then, it would seem even better to just not categorize at all. Renard Migrant (talk) 15:19, 15 July 2014 (UTC)
Probably, yes. Most of the time these labels are only used when it's not clear from the definition, or to contrast with other definitions. So paradoxically, the nouns labelled "countable" are primarily those which are also labelled "uncountable". —CodeCat 15:21, 15 July 2014 (UTC)
I know that many think it pointless to have a reflexive verb category, but in Macedonian there are some verbs that are always reflexive, i.e. whose reflexive form is inherent. For example, "се кае" means "to regret", but "кае" doesn't mean anything. Also, there are many cases where a reflexive form of a verb is unrelated to the basic one when it comes to meaning. Thus, "дере" means "to skin" whereas "се дере" means "to scream". I think that these verbs deserve a separate category. Finally, many of the reflexive verbs in Macedonian have one-word equivalents in English - in those cases, English doesn't convey reflexivity explicitly. For example, Macedonian "се движи" and "движи" both correspond to English "move", but they have different meanings - the former means to be in motion whereas the latter means to cause something to be in motion. I think that in these cases too, the reflexive verb deserves its own category.
It's not as though I created separate entries for all reflexive forms in Macedonian and then declared them unique verbs. For example, I haven't created an entry "се допира" beside "допирa", because I don't feel that there is anything special about the reflexive form - it is marked explicitly in English too. Namely, the difference is that between "to touch oneself" and "to touch". This is because "се допира" is a true reflexive verb, whereas the point is that I am not really focusing on the true reflexive verbs. I am more interested in a separate category for the autocausative, anticausative and inherent ones. The true reflexive, reciprocal, and universal passive verbs are predictable and as you pointed out, derivable from any transitive verb. I really don't know why all of these are under the umbrella term "reflexive verbs"...
Anyway, I have a potential solution. I would create a new context label in the module code, called "mkreflexive", which would send entries to "Macedonian reflexive verbs", and I would mark Macedonian reflexive verbs with it. Meanwhile, I would set the display to simply "reflexive", which is what users actually need to see. Then, only I (and possibly someone who chooses to continue my work in the future) would use this label, and there would be no categories for reflexive verbs for other languages and no problems with reflexive pronouns or pseudo-reflexive verbs. However, could the problem I have caused already be fixed, i.e. could all the unnecessary (and defective) categories be undone? Martin123xyz (talk) 15:24, 15 July 2014 (UTC)
{{fr-verb}} covers this by allowing type=reflexive. There are some, s'agir differs in usage from agir for example. Renard Migrant (talk) 15:28, 15 July 2014 (UTC)
I don't agree with creating such a label. A better solution would be to let the inflection table add the category. —CodeCat 15:31, 15 July 2014 (UTC)
How would I let the inflection table add the category? I use the same inflection table for reflexive verbs, except that I use the parameter "ref" to have it add the reflexive marker "се" where appropriate. 77.29.125.14 16:08, 15 July 2014 (UTC)
You (or someone) can edit the template and have the "ref" parameter trigger a category. --WikiTiki89 16:48, 15 July 2014 (UTC)
Could you tell me how to have the parameter trigger a category? I have no idea where to code that, as I've never even defined the "ref" parameter anywhere. I just automatically used it in an if-statement and it worked. Martin123xyz (talk) 16:55, 15 July 2014 (UTC)
Exactly the same way. You use it in an if-statement, and have the true-clause add a category: {{#if:{{{ref|}}}|[[Category:WHATEVER]]|}}. You can add that anywhere really, but the end is the best place I think. --WikiTiki89 17:06, 15 July 2014 (UTC)
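For readers unfamiliar with the parser function, the logic of that if-statement can be modelled outside the wiki. The following is a minimal Python sketch, not template code: it assumes that {{{ref|}}} yields an empty string when the parameter is absent and that #if treats a whitespace-only value as empty. The function name is hypothetical; the category name is the one discussed above.

```python
def category_for(params: dict[str, str],
                 category: str = "Macedonian reflexive verbs") -> str:
    """Model of {{#if:{{{ref|}}}|[[Category:...]]|}} for one template call."""
    ref = params.get("ref", "")  # {{{ref|}}}: empty default when "ref" is absent
    if ref.strip():              # #if: true only for a non-empty, non-blank value
        return f"[[Category:{category}]]"
    return ""                    # empty else-branch: nothing is added


print(category_for({"ref": "1"}))
print(repr(category_for({})))
```

Only calls that actually set the parameter emit the category link, so entries using the same inflection table without it are left uncategorized.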
How very simple - thank you. I didn't think it could just work like that. I'll see to it soon enough. Martin123xyz (talk) 17:08, 15 July 2014 (UTC)
I prefer refl in general to avoid confusion with reference. Renard Migrant (talk) 20:58, 17 July 2014 (UTC)

Script or language: let us reduce ambiguity and prevent confusion!

On pages such as https://en.wiktionary.org/wiki/Appendix:Proto-Slavic/-ica and in many translations lists, spellings of one word in one language are given in multiple scripts. These scripts are indicated by names that sometimes coincide with language names, such as with "Latin" and "Hebrew". That easily creates ambiguity and, with it, confusion, at least with me. I have changed such references several times by adding the word "script" where a script is meant, but such contributions have also been reverted. I do insist that in tables where language (groups) may be branched into several languages and where languages may be branched into several scripts, it is difficult for the eye to make out whether the final branch concerns a language or a script.

I propose adding the word "script" to all occurrences of script names near language names. I do not know how to do it, but there seem to be scripts (of a different kind, this time) that can help us do this in a rather automated way.

Please help a language enthusiast, and his colleagues!

(I am trying to find my way, and just found out that this is divided into month parts. This is the second place where I added my plea, because I put the first one in a month part somewhere in 2013.) Redav (talk) 20:27, 17 July 2014 (UTC)

Support. Before I started using targeted translations, I used to come across some really strange Latin translations, only to find out it was Latin-script Ladino. It was my fault for skimming through the translations too quickly, but it won’t hurt to add script to the lines. — Ungoliant (falai) 20:35, 17 July 2014 (UTC)
(e/c) I don't see where the confusion can come from, since Latin is not a sub-language of Serbo-Croatian. In some places, we do use the word "Roman" instead, but this does not solve the problem in the general case, since some multi-scriptal languages use scripts like Hebrew and Arabic, which are also languages. --WikiTiki89 20:38, 17 July 2014 (UTC)
"Roman" is a misnomer anyway. The script is called Latin; "roman" is one variety of the Latin script, the other being called "italic". —Aɴɢʀ (talk) 20:44, 17 July 2014 (UTC)
By that logic, romanizations would be de-italicizations. "Roman" has both meanings. --WikiTiki89 20:50, 17 July 2014 (UTC)
Would they even be de-italicizations, if they were presented (as they sometimes are) in italics? Would they not then be italicizations? Why, this puts a whole new spin on the debate over whether or not to italicize Cyrillic! lol
As WikiTiki says, "Roman" has both meanings. - -sche (discuss) 20:55, 17 July 2014 (UTC)
For some languages, such as Cree, script names are (in my experience) not provided at all. One sees simply Cree: ᒪᐢᑲᐧ / maskwa. Providing script names, especially with "script" spelled out, would be quite unwieldy:
- -sche (discuss) 20:55, 17 July 2014 (UTC)

For Beer parlour people who work more in discussions than on translations or in the main namespace, it IS confusing to have "Latin" and "Hebrew" mean both the script and the language (the same goes for the script tags Roman and Cyrillic). If you used User:Conrad.Irwin/editor.js quite a lot, you'd notice that the name conflict is quite frequent: a translation into Hebrew, Aramaic, Serbo-Croatian or Latin appears where it isn't expected, whether because of this tool, a bot or a human error. I'm not suggesting any specific solution but just letting you know that I have also experienced these problems firsthand and I am also very interested in the resolution. --Anatoli T. (обсудить/вклад) 00:49, 18 July 2014 (UTC)

You're right that it causes bugs in some of our tools, but I don't think it's confusing to people (at least when everything is formatted correctly). --WikiTiki89 13:13, 18 July 2014 (UTC)

User:JackBot

This bot, which belongs to User:JackPotte, has been active on Wiktionary in the past, but in December I noticed that it has no bot flag, so I followed our current procedure as I understand it, and blocked the account. Since a protest of the block has now been posted on the talk page, I thought it would be a good idea to expedite things by bringing it up here. I also want to know if I should have dealt with the matter differently, and if I should handle bot accounts differently in the future.

I should mention that, although most of the edits have been interwikis, a run was performed in March of 2013 that created a large number of entries for Geological era names, at least some of which (if memory serves) ended up in rfv. I don't have any objection to those entries as a whole, and they may very well be a one-time exception to the bot's normal interwiki tasks, but I thought they were worth mentioning, just to be complete. Chuck Entz (talk) 21:58, 19 July 2014 (UTC)

Hello, just to clarify: my March edit summaries pointed to their BP permission. JackPotte (talk) 22:48, 19 July 2014 (UTC)
I think JackBot failed either two or three bot votes. Renard Migrant (talk) 21:20, 23 July 2014 (UTC)
Precisely and objectively I had already proposed here two bot jobs which had been judged unnecessary by a minority:
  1. Wiktionary:Votes/bt-2009-12/User:JackBot
  2. Wiktionary:Votes/bt-2010-11/User:JackBot2
But they could also have been useful here, as they are on 21 other wikis, as you can see, and they are not linked to the test for which I was afterwards indefinitely blocked without a message (which the current recommendations do not endorse, as I've already demonstrated in the dedicated template).
Moreover, I used to publish my scripts on the bot subpages and GitHub, if you want to form your own view of the whole context apart from that. JackPotte (talk) 19:30, 24 July 2014 (UTC)

Russian pronunciation - standard, alternative, regional, dated or simply individual

User:Wikitiki89 has been persistently adding alternative Russian pronunciations, which I consider not only non-standard but individual and rare, possibly limited to immigrants. He has been very persistent in his edits, so any reversals just result in edit wars. I have no problem with having alternative non-standard forms, but Russian is much more phonetic than he claims it to be, so if you pronounce a word irregularly, you can spell it that way. There are notable (well-documented) exceptions, which also follow certain rules or patterns, but there is some limit to irregularities. I tried to compromise by creating alternative non-standard forms but he insists on adding these irregular pronunciations on the regular entries. In particular he claims that these words are alternatively pronounced:

  1. капюшо́н (kapjušón) as капишо́н (kapišón)
  2. двою́родный (dvojúrodnyj) as двою́рный (dvojúrnyj)
  3. во́доросль (vódoroslʹ) as во́дросль (vódroslʹ)
  4. не́который (nékotoryj) as не́кторый (néktoryj)
  5. сейча́с (sejčás) as щас (ščas) (I'm OK with this one but still the casual pronunciation should belong to the alternative forms, since it exists)

Another claim was that бюрокра́тия (bjurokrátija) can be also pronounced as бирокра́тия (birokrátija), which I find quite ridiculous, and he's using кверх нога́ми (kverx nogámi) as the first translation for upside down. "кверх нога́ми" sounds very rustic and illiterate to me (even if this form can be found on the web); вверх нога́ми (vverx nogámi) is the common and standard form. These alternative forms do exist but they are not as common and these pronunciations are neither standard nor common. In any case, the alternative pronunciations, IMO, belong to alternative forms. I am creating a request on gramota.ru, since I don't know how to handle this situation. The English Wiktionary doesn't have enough native Russian speakers, so I'm not sure this argument can be resolved. On the Russian Wiktionary such edits would be ultimately reverted. I don't claim to be the ultimate source for the Russian language but some Russian edits of Wikitiki89 surprise me. Sorry, I don't mean to insult him or something. My goal is accuracy. --Anatoli T. (обсудить/вклад) 00:43, 21 July 2014 (UTC)

Let it be known that (1) Anatoli's Russian and my Russian are from different regions, (2) I was raised in a highly educated environment and could not possibly have picked up any "illiterate" Russian, and (3) I have been willing to discuss each of the above cases individually with Anatoli and don't see a need for BP discussion. --WikiTiki89 01:44, 21 July 2014 (UTC)
1) My family's accents are a mixture of south Russian, Ukrainian and Siberian accents. Thanks to education, exposure and self-discipline as far as the language goes, I speak standard and common Russian, not southern or Ukrainian Russian; I have travelled a lot in Russia, read many books and watched a lot of movies, videos, etc. My Russian is not regional at all and I can tell when Russian is regional or non-standard. And since I lived in Russia till I was 30, speak Russian with my family and friends, and communicate with Russians in Russia, I have been exposed to various accents. I'm sure I can judge what is right and what is wrong in Russian to a high degree but, as I said, I by no means consider myself an ultimate source. Having said this, I humbly consider my Russian significantly better than his. 2) It is quite commendable for a long-time emigrant who left Russia at a young age to preserve the language, but there are still small problems, which show in the edits, and I don't think we should allow misleading info. 3) The discussions so far have not been very fruitful and edit-warring has happened on a number of entries. As an interim solution, I suggest sourcing irregular pronunciations with something other than plain Google searches. As I said, I don't oppose any non-standard form entries, which I have also created. --Anatoli T. (обсудить/вклад) 02:05, 21 July 2014 (UTC)
And how should I do that? With links to YouTube videos? --WikiTiki89 02:17, 21 July 2014 (UTC)
Not sure yet. Maybe Youtube, if the pronunciation is clear and the speakers are native speakers. --Anatoli T. (обсудить/вклад) 02:23, 21 July 2014 (UTC)

I support moving non-standard pronunciations to the entries with non-standard spellings. --Vahag (talk) 08:57, 21 July 2014 (UTC)

This would resolve some disagreements, since Russian pronunciation is much more regular and most irregularities are documented. It doesn't matter that much if a spelling is used more often than pronunciation or the other way around. We just need to create those non-standard forms. The irregular spelling is quite often used to render the irregular pronunciation, and the existence of irregular spellings can usually be easily confirmed. --Anatoli T. (обсудить/вклад) 09:11, 21 July 2014 (UTC)
On the other hand, most people who use the colloquial pronunciations, use only the formal spellings. --WikiTiki89 11:55, 21 July 2014 (UTC)
Yes but even if a person reads "what's up" as "wassup", it doesn't mean that "what's up" should have the same pronunciation. It's better to separate regular and irregular pronunciations and spellings, especially when a form is definitely a different (older, colloquial, regional) form of another one, like капюшо́н (kapjušón) and капишо́н (kapišón). --Anatoli T. (обсудить/вклад) 22:52, 21 July 2014 (UTC)
"What's up" and "wassup" is a bad example, because it is colloquial either way and so people will write it exactly as they say it. This is more like environment being pronounced like enviorment (which is citeable in google books:"enviorment"). Most people who say enviorment, still write environment, which is why it makes sense to have the pronunciation right there. --WikiTiki89 23:32, 21 July 2014 (UTC)
Well, it depends on the case and if you can make this type of judgement. If you consider "wassup" a bad example, then "капишон" is worse. It's a dated form, not an alternative pronunciation, since "пю" is never read as "пи" as you insisted. --Anatoli T. (обсудить/вклад) 23:51, 21 July 2014 (UTC)
I said that "wassup" is a bad example, because "what's up" is also colloquial. "Капюшон" is not colloquial, so that reason does not apply. Compare it to my example of environment. --WikiTiki89 23:58, 21 July 2014 (UTC)
I think you misunderstand. "wassup" (colloquial) should have its own pronunciation, so should "капишон" (dated) and "двоюрный" (irregular) but the regular forms shouldn't include them. "водросль", "некторый" may be considered similar to the "enviorment" case. --Anatoli T. (обсудить/вклад) 00:14, 22 July 2014 (UTC)
I did not misunderstand you. You misunderstood me. All I am saying is that "wassup" is a bad analogy because "what's up" itself is also colloquial, while "enviorment" is a much closer analogy. Would you say that the pronunciation /ɪnˈvaɪɚmɪnt/ doesn't belong at environment? --WikiTiki89 00:25, 22 July 2014 (UTC)
It does, I have already said so, so does "сечас" belong to "сейчас", "пожалуста" to "пожалуйста", also "водросль", "некторый", even if pronunciations are less common. --Anatoli T. (обсудить/вклад) 00:33, 22 July 2014 (UTC)
Am I missing something or are you agreeing with me now? --WikiTiki89 00:39, 22 July 2014 (UTC)
What I'm saying is, one needs to judge whether a pronunciation is indeed alternative or it should belong to a different spelling. /ɪnˈvaɪɚmɪnt/ and /ɪnˈvaɪɚnmɪnt/ can belong to "environment" entry. Same with some Russian words I mentioned above, e.g. сейчас as /sʲɪˈt͡ɕæs/ (=сечас). However, "капюшон" and "капишон" should definitely have separate pronunciations, like "wassup" and "what's up". --Anatoli T. (обсудить/вклад) 00:48, 22 July 2014 (UTC)
Ok, so you agree with me for "водросль" and "некторый", but not for "капишон". I'm willing to concede "капишон" for now until I get some more data on it. I have already found a few YouTube examples of the pronunciations "водросли" and "проволка" used with the spellings "водоросли" and "проволока". --WikiTiki89 01:10, 22 July 2014 (UTC)
Yes, "водросль" and "некторый" are OK, even if I don't think they are common, I found that people were surprised like me with these accents but these can be considered alternative pronunciations with a drop of vowel, which does happen. So, I'm conceding on these. I put "двоюрный" into the same bucket as "капишон", although they differ in etymology. Note, even if you find pronunciation "капишон", it still belongs to this different spelling. Just making sure you agree on the distinction. --Anatoli T. (обсудить/вклад) 01:19, 22 July 2014 (UTC)
It depends. For example, if I find a video titled "мой классный капюшон" where it is clearly pronounced "капишон", then that is (one piece of) evidence that the pronunciation does belong at "капюшон". I also think there is some confirmation bias going on. When I listen to someone say "капюшон", I hear "капишон"; and I'm sure that if you listen to someone say "капишон", you will hear "капюшон". These vowels are very close and in a short unstressed syllable, they are hard to distinguish. --WikiTiki89 01:27, 22 July 2014 (UTC)
I know what you mean. There are ways: as I suggested, one can use a tool such as Audacity, which lets you listen at a very slow speed (the speed is adjustable). The audio should be available as an MP3 or OGG file, for example. Yes, your example with "мой классный капюшон" would work. As an example, I used Audacity to determine Chinese tones and prove my point that Chinese tones are pronounced even in quick speech. "Hard" is not impossible with technology. --Anatoli T. (обсудить/вклад) 01:42, 22 July 2014 (UTC)
As a longtime student and user of the Russian language, I consider the Russian entries on English Wiktionary to be intended for native English speakers who are interested in a Russian word or who are studying Russian. As such, I see no value in putting these anomalous pronunciations here, and I think American students of Russian will take away the wrong thing from them. Such pronunciations belong in the Russian Wiktionary for the enjoyment of a native Russian audience. This reminds me of w:Charles Robert Jenkins, an American defector to North Korea. Jenkins got a job teaching English at a North Korean university, since the North Koreans wanted to learn English well enough to pass as South Korean. However, Jenkins was from North Carolina and spoke with a strong southern accent. Once the Koreans learned his English pronunciation was very odd, he was fired from his job. When people study a foreign language, they usually want to learn the best standard pronunciation. —Stephen (Talk) 03:31, 22 July 2014 (UTC)
I have nothing against properly indicating which pronunciations are standard and which are colloquial, but there is no reason to suppress information. --WikiTiki89 04:00, 22 July 2014 (UTC)
We’re not suppressing information, it’s a matter of putting the information where it belongs. This information belongs on the Russian Wiktionary. There are three major accent areas in spoken Russian ... if we wanted to see nonstandard pronunciations here, it would be far more preferable to show the pronunciations of the other two major accents, northern (with оканье among other features) and southern (with аканье/яканье among other features). But even this is really not useful to indicate on every page, and would be likely to cause confusion and damage. The northern and southern Russian accents should be described and explained with sufficient examples on Appendix pages. But the idiosyncratic pronunciations you are adding are not so useful or interesting and I would not include them on English Wiktionary at all. —Stephen (Talk) 05:55, 22 July 2014 (UTC)
I will not comment on the specifics of this discussion, as I have very little familiarity with Russian, but I support the inclusion of regional, nonstandard and colloquial pronunciations in the English dictionary. They should be tagged as such, of course. — Ungoliant (falai) 21:48, 23 July 2014 (UTC)
It's about specifics of various pronunciations. It's not so much about whether we include regional, nonstandard and colloquial pronunciations but whether they are frequent enough for inclusion (not individual or used only by limited overseas communities) and belong to the same spelling as the standard pronunciation. Yes, labelling is important and we do include variants. Major variations - northern "okanye" and southern "h" for "g" could be considered as well, if they are needed. --Anatoli T. (обсудить/вклад) 22:59, 23 July 2014 (UTC)

User:Wikitiki89 and proper nouns

User:Wikitiki89 has been going around changing things from proper noun to common noun. In particular, he has been doing it with political factions such as Libertarian and Democrat, plus the California separatist group known as the Osos. I believe that he is in error, and I have reverted him pending discussion here. Purplebackpack89 17:34, 23 July 2014 (UTC)

Democrat, Libertarian and Oso are proper nouns
  1. Purplebackpack89 17:34, 23 July 2014 (UTC)
Democrat, Libertarian and Oso are common nouns
  1. Sure, Democrat, Libertarian and Oso are common nouns. Just like many of the items at Category:English words suffixed with -ian. Just like Frenchman, Popperian, or Clintonite. --Dan Polansky (talk) 17:58, 23 July 2014 (UTC)
Discussion
Please take a look at our POS for Englishman, American, Frenchman, and many more, none of which I have ever edited. --WikiTiki89 17:37, 23 July 2014 (UTC)
But you have mass-edited a number of pages in the last hour or so after I mentioned Democrat was tagged as a proper noun, and have edit-warred with me to keep them common nouns. You should stop changing pages until this discussion is over or you've linked me to another beer parlor discussion that supports your POV. Purplebackpack89 17:43, 23 July 2014 (UTC)
I do not need your permission to make changes that we have had a consensus on for a long time. --WikiTiki89 17:45, 23 July 2014 (UTC)
If you claim such consensus has existed for a long time, the least you can do is provide a link to that discussion (and the discussion from earlier this month is a) still going, and b) not at consensus at the moment). And if the last discussion with consensus was indeed a long time ago, then it may not hold now and it is perfectly acceptable to revisit it. Particularly if the discussion was about some subset of nouns that are different from this subset. Purplebackpack89 17:50, 23 July 2014 (UTC)
Take your pick. --WikiTiki89 17:55, 23 July 2014 (UTC)
Wikitiki89 is correct; these are common nouns, like Briton and Nazi. - -sche (discuss) 18:01, 23 July 2014 (UTC)
I agree, and so do the professionally edited dictionaries in which I just checked Frenchman (Chambers, Merriam-Webster, OED). The test of "properness" of a noun is not, of course, just whether it has a capital letter! Equinox 18:56, 23 July 2014 (UTC)
So a user has been going round making correct edits. Why are we discussing this? Are there so few correct edits nowadays we need to have threads to discuss them in? Renard Migrant (talk) 21:19, 23 July 2014 (UTC)
Democrat isn't the faction (as you put it) Democratic Party is the faction, perhaps that's what's causing the confusion here. Renard Migrant (talk) 21:26, 23 July 2014 (UTC)

Can we get 'particularly useful translation target' into CFI?

At Wiktionary:Requests for deletion#emergency physician (later: Talk:emergency physician), two users want to keep the entry outside of CFI as a translation target. I worry about translation targets as a bit of a slippery slope issue. Do we want an entry in English for everything that can be expressed as a single word in one other language? No. Because then we'd end up with he had had in his possession a bunchberry plant (I'm not kidding, see xłp̓x̣ʷłtłpłłs). Is there any way to regulate this? There's a further issue: translation is necessarily subjective, so what one person might translate with a two-word noun, I might translate with a slightly different two-word noun. It's tricky.

As a completely separate issue, I've noticed that entries de facto don't need to meet CFI. They just need to not get nominated for deletion or get nominated and pass with a consensus even if they don't meet CFI. I suppose that's why serious efforts to amend CFI into something usable have failed. It's easier to just keep on ignoring it. Renard Migrant (talk) 21:16, 23 July 2014 (UTC)

You're setting up a strawman. No one has ever been proposing to have he had had in his possession a bunchberry plant only because there is a single entry like xłp̓x̣ʷłtłpłłs. If we were after a formal strict set of criteria for translation targets, we would take care to handle these sorts of languages. --Dan Polansky (talk) 21:36, 23 July 2014 (UTC)
I'm not setting up a strawman. I'm saying we would need criteria and you seem to be agreeing. Renard Migrant (talk) 21:45, 23 July 2014 (UTC)
I agree; this practice should be codified. My first suggestion is that it should be used for lexemes, not individual forms, with distinct meaning (i.e., let’s not add I will do because farei, haré etc. exist nor translations of the “sentence-words” of polysynthetic languages). — Ungoliant (falai) 21:40, 23 July 2014 (UTC)
Thirded. We should codify "hot words" too. The problem always is that there are so many issues and so few people willing to tackle them. And often people get distracted before we reach anything conclusive. Keφr 22:10, 23 July 2014 (UTC)
  • Support: Purplebackpack89 22:12, 23 July 2014 (UTC)
  • Support as well. I would also support clarifying CFI in general to make it less opaque and more friendly to people not familiar with Wiktionary. Ideally, it should be written in such a way that someone who has spent only a day using Wiktionary (as a reader, not a contributor) should be able to understand enough of it to not do anything really bad. —CodeCat 22:16, 23 July 2014 (UTC)
  • Support as well. Also, I think we need to include Wiktionary:Lemming_principle#Lemming_test. What about back-translations from English (for lexemes only, as per Ungoliant's comment above)? Terms such as па́лец ноги́ (pálec nogí) and 足の指 (ashi no yubi), etc. have passed RFD, both non-idiomatic translations of toe, literally "finger of the foot". Such terms do penetrate various dictionaries, since "toe" exists in English, what's the word for it in language X? --Anatoli T. (обсудить/вклад) 22:59, 23 July 2014 (UTC)
    • To note: the closing comments at Talk:палец ноги and Talk:足の指 are "Kept: no consensus to delete either entry" and "Kept: no consensus to delete" respectively. Keeping due to "no consensus" is a rather weak outcome in my opinion, and I always had the impression that these "no consensus" entries are more open to renomination than those with clear consensus to keep. This is hardly "passing RFD". Keφr 23:17, 23 July 2014 (UTC)
I know. "No consensus" is not a strong case for closing RFD. Still, they are kept for now. With proper formatting (a soft redirect?) and labelling, they may be a bit more palatable. They are not idiomatic by definition and if they are only there to point users to how an English term is translated, there may be some room for them here. --Anatoli T. (обсудить/вклад) 06:59, 24 July 2014 (UTC)

Looking to get AWB privileges

Hello. I'm relatively new to Wiktionary but I've been active on Wikipedia for a long time. I've been working on Old French verbs and there are a bunch of changes I'd like to make that are too painful to do without an automated regex tool like AWB -- basically, to change the templates used for conjugating a number of verbs. Could someone add me to the list of registered AWB users? Thanks.

Benwing (talk) 05:32, 24 July 2014 (UTC)

I have added you to Wiktionary:AutoWikiBrowser/CheckPage#Approved_users. —Stephen (Talk) 06:59, 24 July 2014 (UTC)
Awesome, thank you. Benwing (talk) 08:50, 24 July 2014 (UTC)

Are phrases lemmas?

Entries marked as {{head|xx|phrase}} are currently listed in the main lemmas category. Are they really lemmas? I think they should be in a Phrases subcategory under the main lemmas category. --Panda10 (talk) 12:33, 24 July 2014 (UTC)

They are lemmas because they are not a form of another lemma. —CodeCat 12:40, 24 July 2014 (UTC)
I agree, though it is sometimes hard to identify the lemma properly, as such multi-word entries, especially with verbs, are essentially defective, or at least have a dramatically different distribution of use across inflected forms. DCDuring TALK 13:51, 24 July 2014 (UTC)
The problem is really that we're not using the term "phrase" properly on Wiktionary. In many cases, it seems that "sentence" is the more appropriate term. See w:Sentence (linguistics). —CodeCat 14:05, 24 July 2014 (UTC)
I'm not sure about the "They are lemmas because they are not a form of another lemma" argument. Phrasebook entries such as I don't understand or this morning clutter up the lemma category and will contribute to an inaccurate count of lemmas. --Panda10 (talk) 13:05, 25 July 2014 (UTC)

Including sum-of-parts terms

One of the reasons against including sum-of-parts terms is that they are counter-productive in defining the term by picking and choosing some of the senses of the component parts, thus under-emphasizing the other senses. Listing all possible combinations would cause too much duplication of information, which is bad for a number of reasons; for example, adding or modifying a sense of the component parts would require also adding or modifying one or more senses of the whole term as well. When we include sum-of-parts terms, we often try to make them sound more idiomatic by making the definition more specific than it needs to be.

On the other hand, there are many reasons to include some sum-of-parts terms:

  • They are defined in other dictionaries and/or people are likely to look them up: random number
  • They have useful translations into other languages: last year
  • They happen to be spelled as one word or have alternative spellings as one word: coal mine, unhelpful (un- + helpful)
  • They have unusual etymologies, pronunciations, or other useful information: (can't think of any at the moment, but I know they exist)
  • They are non-obvious in the encoding direction, even if they are obvious in the decoding direction: and so on and so forth

We have provisions, some of which are controversial, for keeping some of the types of words listed above, but not for all. We also have endless RFD debates about keeping words "outside of CFI".

I think a compromise is needed and I propose allowing the inclusion of some sum-of-parts terms that we decide would be useful to include, but without real definitions, similar to what we already do for translation targets. This can apply to terms included through WT:COALMINE, as well as simple cases of prefixes and suffixes, where a full definition has very little benefit over linking to the component parts. Here are some examples I created: User:Wikitiki89/coal mine, User:Wikitiki89/unhelpful, User:Wikitiki89/and so on and so forth.

--WikiTiki89 15:35, 24 July 2014 (UTC)

The problem I see with your example for "coal mine" is that it requires prior knowledge of the term to understand how to interpret the parts of the term. There is nothing in your entry that specifies that it's the sense "excavation" that is meant, rather than "explosive device". This is exactly why we need a full definition for it and other similar entries. If a term were truly SOP, then it could validly be interpreted and used as any possible combination of its parts' meanings. But the reality is very different, such terms usually have much more restricted uses. —CodeCat 15:48, 24 July 2014 (UTC)
Another more general issue is that we seem to treat "idiomatic" and "SOP" as antonyms where they often are not. and so on and so forth is definitely idiomatic, even if it may also be interpretable as a sum of parts. Idiomatic phrases often translate into idioms in other languages, but we are sorely lacking translations for such terms thanks to our overly strict focus on deleting SOP terms. —CodeCat 15:50, 24 July 2014 (UTC)
But that's the thing about SOP, a "coal mine" could be an explosive device made of coal (also, as I said, we will only do this "where a full definition has very little benefit over linking to the component parts"). As to your second point, that is why I did not use the word "idiomatic" here. --WikiTiki89 15:54, 24 July 2014 (UTC)
I think we need to consider whether a term is a term of art in a specified field. For example, genuine issue of material fact is SoP to one who knows which senses of each term are intended, but is also a set phrase used in the law, and one that can not be substituted for other phrases. I think that if a general dictionary has a phrase, we should have it, and if a specialized dictionary (legal, medical, engineering, slang, etc.) has a term, then we should have it with the appropriate context label. Context labels go a long way towards eliminating the problem of "picking and choosing some of the senses of the component parts" because they indicate that when this phrase is used in this field it only refers to the specified senses of the words included. bd2412 T 15:59, 24 July 2014 (UTC)
That's the idea here: we will have the phrase, but we will link it to the component parts. Note that we can consider "material fact" to be one part rather than two, and possibly likewise for "genuine issue" if it is in fact a set phrase outside of this term. --WikiTiki89 16:03, 24 July 2014 (UTC)
For that example, I'm not aware of "genuine issue" being used outside the complete phrase. Our definition of genuine actually doesn't really capture the meaning used here (an actual controversy between the parties, rather than the facade of a controversy designed to test the law). It is sense 11 of issue. However, I generally think that a veneer definition requiring readers to look at two or three different entries to figure out the complete meaning of a term would be a needless inconvenience. bd2412 T 16:28, 24 July 2014 (UTC)
It's less of an inconvenience than the inconvenience of finding incomplete information presented as if it were complete. --WikiTiki89 16:32, 24 July 2014 (UTC)
That is where I think a context tag helps. If you are talking to a geologist or a civil engineer or a utility company about a coal mine, then there is only one relevant meaning, and the information presented is complete within that context. We could, for all of SoP definitions that are set phrases within a particular context, have an &lit sense, so that we can inform readers that when used other than in the sense of industry or geology, "coal mine" can mean any combination of coal and mine. bd2412 T 17:04, 24 July 2014 (UTC)
There is only one relevant definition of the word "mine" when talking to a geologist or civil engineer; this has nothing to do with the preceding word "coal". --WikiTiki89 17:22, 24 July 2014 (UTC)
Coal mine is a bad example for this point, since it only exists due to coalmine. If "coalmine" didn't exist, I would agree to deleting "coal mine" as readily as "copper mine" or "uranium mine". However, this principle is directly applicable to random number, which in the context of mathematics will never mean a "slapdash and seemingly directionless performance of a dance routine within a larger show". bd2412 T 13:24, 25 July 2014 (UTC)
That's one of my points. Since we are only including coal mine because of coalmine, it does not need a real definition, so we can just link to its parts. --WikiTiki89 13:29, 25 July 2014 (UTC)
Are we still going to have a complete definition at coalmine? I wouldn't object to coalmine being an "alternative spelling of" template and coal mine being bare links, but I don't think coalmine can be used to describe a military mine that runs on coal, so something would be getting lost in the sequence there. bd2412 T 15:16, 25 July 2014 (UTC)
I think there's a bit of a slippery slope here. Your test page decomposes "unhelpful" into [[un-]] + [[helpful]], but there's nothing stopping it from being decomposed into [[un-]] + [[help]] + [[-ful]]. Is [[electricity]] then SOP too, as [[electric]] + [[-ity]]? Is [[nothing]] just [[no]] + [[thing]]? Are full definitions only for monomorphemic words? —Aɴɢʀ (talk) 16:32, 24 July 2014 (UTC)
Most multimorphemic words are not simply SOP of their morphemes. Out of your examples, only [[nothing]] can actually be defined as just [[no]] + [[thing]], but then it is for us to decide whether it is beneficial to do so in each specific case. --WikiTiki89 16:37, 24 July 2014 (UTC)
I disagree that "nothing" is the only one that is SOP of its morphemes, but either way, I think it would create far too much work for us to decide on a case-by-case basis which polymorphemic words are SOP of their morphemes and which aren't. It's hard enough for us to decide that for multi-word expressions as it is. —Aɴɢʀ (talk) 17:20, 24 July 2014 (UTC)
There isn't much deciding to do. If the definition at the term is clearly equal to the component definitions, then you can replace it with a reference to each component. If someone later decides that that definition is inadequate, he could replace that with an adequate definition. No huge RFD discussions are even required. --WikiTiki89 17:26, 24 July 2014 (UTC)
@Wikitiki89: It can be used with that meaning, yes, but Wiktionary concerns itself only with attestable meanings. So the question we should be asking is: is it used with that meaning? Does coal mine ever mean "explosive device made of coal"? I would be very surprised if it did, precisely because its main sense "excavation for mining coal" is so much more common and using it in any other sense would cause confusion. So in reality, "coal mine" is much more restricted in meaning than its parts allow, which makes it idiomatic and hence includable per CFI. —CodeCat 17:17, 24 July 2014 (UTC)
It's probably possible to attest that meaning. --WikiTiki89 17:22, 24 July 2014 (UTC)
"Probably" isn't good enough for an RFV, though. If our current entry was like your proposal, I could validly RFV all senses that arise from the possible combinations of meanings of its parts. And many of them would likely fail, which would then mean we would have to put in a more specific, limiting definition. —CodeCat 10:17, 25 July 2014 (UTC)
If you want to find citations, I will. Anyway, something like "see coal, mine" (however we choose to format it) does not imply that all combinations exist, so it is not necessary to narrow it down. --WikiTiki89 10:57, 25 July 2014 (UTC)

My feeling about a set phrase is a bit like the US Supreme Court judge's feeling about hard-core pornography: I can't define it, but I know it when I see it. Some are little more than common collocations, but when they are common enough, especially within a particular field or arena, then to me they start to ‘feel’ like single concepts and not two concepts stuck together. This is unscientific but I'm just trying to explain my process. The CFI tests are good ways to check if something is a set phrase, but sometimes a term can fail all of them and still demand coverage (at least to my mind). DCDuring's ‘lemming test’ is, I think, valuable because it gives us a rationale without having to explain exactly why something should be kept. The weird thing is that when I first joined Wiktionary, I was a firm deletionist. I thought that entries like fried egg and Egyptian pyramid were a waste of time. But over the years I have slowly done almost a complete 180. My feelings in general now are that if there is a significant minority of people who see value in an entry, then we lose nothing by keeping it. Ƿidsiþ 16:58, 24 July 2014 (UTC)

The whole point of my proposal here is to allow us to keep these set phrases, without duplicating their wide range of definitions from the component parts. --WikiTiki89 17:02, 24 July 2014 (UTC)
I don't object to it on principle in some cases, though not necessarily routinely. There is also the issue that if a multi-word term has more than one meaning, we would presumably want to split the two senses so as to show quotation evidence for each one, and then you would have to write some kind of meaningful definition. Ƿidsiþ 17:14, 24 July 2014 (UTC)

Misspellings

Misspellings are recognised as lemmas by {{head}}, but that doesn't quite seem right. They have their own parts of speech of course, so they should probably use the normal POS categories and templates like {{en-noun}}. But I imagine some might object to this because they are supposedly not "proper". Recently I created rediculous, which is the spelling I normally use, and which is quite easily CFI-attestable. But I opted to call it an alternative spelling, because it didn't seem right to label a spelling I use normally a "mistake". So I have been wondering whether labelling things as "misspellings" does not go against the descriptivist philosophy of Wiktionary. What we really mean is that these spellings are commonly proscribed, but they are probably not considered misspellings by the people that use them. So what do other editors think of this situation? Should we categorise them simply as misspellings, or should we give the proper POS? And should we continue to label them as "misspellings" or change the wording to something more descriptive?

As a side note, the template {{misspelling of}} originally said common misspelling, but I removed this because it looked silly for entries like animalike. —CodeCat 10:25, 25 July 2014 (UTC)

Rediculous is certainly a misspelling. I think the criteria for that should have something to do with whether most people who use it would admit that it is a misspelling if shown the correct spelling. --WikiTiki89 11:02, 25 July 2014 (UTC)
Well that doesn't include me, because I think the spelling "rediculous" makes more sense. It better reflects how it's pronounced, and that's probably what all the other people think too. —CodeCat 11:04, 25 July 2014 (UTC)
I realize that it does not include you, but I do think that it includes most people. I also think that the main influence of this spelling is not the pronunciation, but the abundance of word initial re- compared to the relative rarity of ri-. --WikiTiki89 11:12, 25 July 2014 (UTC)
That's bizarre: "littel" (little) would make more sense for pronunciation, but everybody knows that's not how English spelling works. Which other words do you respell for this reason? Equinox 12:07, 25 July 2014 (UTC)
The difference is that it was not a conscious effort to change the spelling based on some reasoning. I just wasn't acutely aware of how other people spelled it, and I spelled it the way I figured it would make the most sense. It's only after I found out how people write it that I figured, my way is fine too. —CodeCat 12:11, 25 July 2014 (UTC)
As to the question of whether something we agree is a misspelling should count as a lemma. I would think the answer to that is simply NO.
Perhaps we need to also review items in our English alternative spellings categories to root out miscategorized entries. We do not serve users well by misleadingly characterizing common misspellings as alternates. After all we are supposed to only have common misspellings. AFAICT rediculous is not even a "common" misspelling. It occurs 3 times in BNC/COCA combined vs nearly 8,700 occurrences of ridiculous. Results are similar in Google Books and Google N-gram. DCDuring TALK 11:17, 25 July 2014 (UTC)
Why only common misspellings? Why not just any that are attestable per CFI? And why should misspellings not be lemmas? They have plurals and other inflections like any other lemma might have. —CodeCat 11:34, 25 July 2014 (UTC)
It has been our practice to do so because the number of attestable misspellings of common words probably exceeds by far the number uf axepted spelins. DCDuring TALK 12:02, 25 July 2014 (UTC)
I do hope that reasoning distinguishes between accidental mistakes, deliberate respellings, and deliberate and consistent spelling variants that are intended as normal use. We should definitely have the latter no matter how common, per descriptivism. For the former two, I think a criterion for commonness is ok. —CodeCat 12:07, 25 July 2014 (UTC)
No it does not and should not. We are documenting the set of conventions called language. DCDuring TALK 12:12, 25 July 2014 (UTC)
That would make sense if everyone followed the same conventions, but clearly they don't. If labelling something a misspelling is a matter of one group disagreeing with another group about the spelling, then why can we not label things like color as misspellings? My point is just that: Wiktionary cannot and should not decide what is a misspelling, and clearly misspelling-ness is not strictly defined as there are varying opinions about it. So what I ask for is clear criteria, which are verifiable, that can be used to decide when the label "misspelling" should be used. If Wiktionary is descriptive (which it is), then a label like "misspelling" should describe some objective verifiable reality, not subjective opinion. —CodeCat 12:17, 25 July 2014 (UTC)
I completely agree with CodeCat here. There is a difference between a misspelling most likely caused by the writer's clumsy typing alone (e.g. typign), a misspelling caused by the writer most likely not knowing how to spell the word (e.g. independance), and a misspelling most likely caused by the writer's intentional choice to use a variant in order to achieve a literary effect like showing snarkiness or dialect (e.g. rediculous, "gawn to the sto'"). The only typo we should include is teh, because its commonness has turned it into a word intentionally used in jest. The second kind we should include if they are common enough that a reader would want them defined, so we can inform the reader in our definition that this is not the correct spelling. The third kind we should include if they are attested, because their specialized use makes them subtly different words in terms of the definition itself. bd2412 T 12:52, 25 July 2014 (UTC)
Yes but CodeCat isn't saying that exactly, he's saying he (or she, not sure) continues to use ‘rediculous’ because he thinks it's ‘more logical’ and therefore it shouldn't be called a misspelling. Ƿidsiþ 13:00, 25 July 2014 (UTC)
I mean to agree with CodeCat's comment immediately preceding my response. But his earlier point is also valid. Isn't that why we have thru and tho? bd2412 T 13:19, 25 July 2014 (UTC)
My objection concerning "rediculous" specifically is that it didn't seem like a misspelling to me, just an uncommon alternative spelling. The "misspelling" part lies only in the proscription against it. This is why I consider "misspelling of" to be equivalent to "(proscribed) alternative/rare spelling of". Whether something is a misspelling is subjective, but widespread proscription against a certain spelling is objective and can be verified at least in theory. Proscription can wane as forms become more accepted, and people will no longer consider them wrong. So I think we should replace "misspelling of" with something else that makes that more clear. Something like "proscribed spelling of" - this fits with how "(dated)" + "alternative spelling of" gives "dated spelling of" and similarly for other usage labels. —CodeCat 13:41, 25 July 2014 (UTC)
I would consider something an "alternative spelling" if a significant number of people believe that it is the correct spelling, even if others proscribe it. --WikiTiki89 13:43, 25 July 2014 (UTC)
Then what about {{rare spelling of}}? —CodeCat 13:48, 25 July 2014 (UTC)
I would consider that an equivalent of {{cx|rare}} {{alternative spelling of}}. If a spelling is considered by almost everyone to be a misspelling, then we should label it as such. --WikiTiki89 13:53, 25 July 2014 (UTC)
Do you think there is a difference between {{context|proscribed}} {{alternative spelling of}} and {{misspelling of}}? —CodeCat 14:18, 25 July 2014 (UTC)
Yes, something can be proscribed by some people and accepted as correct by others. --WikiTiki89 14:24, 25 July 2014 (UTC)
Does that mean that to you, a misspelling is accepted by nobody? —CodeCat 15:01, 25 July 2014 (UTC)
By no significant group at least. Note that I'm saying what the intrinsic criteria are, even if it may be impossible for us to determine whether this is the case or not. --WikiTiki89 15:17, 25 July 2014 (UTC)
I just noticed an entry that uses "misspelling" as the second parameter of {{head}}, i.e. uses "misspelling" as if it were a part of speech: [[aqui]]. I do not recall noticing this before. My initial reaction is that such entries should declare their actual part-of-speech, which in [[aqui]]'s case is "adverb". But I can also see how that would "pollute" the part-of-speech (and "lemma") categories with non-words (to whatever extent we use "misspelling" to describe things that are actually misspellings/mistakes, as opposed to intentional alternative spellings), and so I can see an argument for continuing to not put them into the POS categories.
Regarding the wording of the template: I think the idea behind including the word "common" was that it would emphasize and enforce our exclusion of rare misspellings. In practice, however, rare misspellings were included using the template anyway, so removing the word was probably good.
Similar to BD, I distinguish three categories of nonstandard spellings: (1) typos or typo-like misspellings, which are distinguished by (among other things) not being used consistently throughout a work, and which are not includable, (2) misspellings, or mistaken spellings, and (3) intentional deviations from standard spelling, which we handle through templates like {{alternative spelling of}} and {{eye dialect of}}. (Re "teh": in my opinion, teh is includable because it has come to be used intentionally, and so it does not constitute an exception to the exclusion of typos.) - -sche (discuss) 02:51, 26 July 2014 (UTC)

Use of babel templates from other wikis

I was going to create Category:User eml, but saw that it's based on a language code we don't recognize (it was split into egl and rgn). That led me to wonder why we had Category:User eml-3 and Category:User eml-N. It turns out that there are a couple of user pages that have {{#babel:it| which means they're using the Italian Wiki's babel system, which apparently recognizes some language codes we don't, and that this prompts User:Babel AutoCreate to re-create categories that we had deleted.

Is this ok, and, if not, what should we do about it? Chuck Entz (talk) 19:19, 25 July 2014 (UTC)

The script was actually blocked twice for creating categories like this, once by someone who thought it was a bot and once by someone who seemed to think it was a live user. As I noted when I unblocked it, the solution that's most obvious to me is to salt the categories we don't want by protecting them such that only admins can re-create them. Alternatively, we could allow people to specify fluency even in things we don't consider languages, and specially categorise the categories, e.g. we could allow Category:User eml and put it in Category:User egl and Category:User rgn. - -sche (discuss) 20:15, 25 July 2014 (UTC)
We also ended up with Category:User simple, Category:Romany language, Category:Traditional Chinese language, Category:British English language and Category:Simplified Chinese language thanks to this script. The categories don't exist, but they do have entries in them. —CodeCat 20:46, 25 July 2014 (UTC)
All of those except the first one were due to mistaken hard-coded categories, which it was simple to fix (e.g. 'Romani' was misspelt, I corrected it). We could continue to delete and "salt" those categories even if we decided to allow categories for retired language codes. - -sche (discuss) 02:58, 26 July 2014 (UTC)

Proposed compromise votes on romanizations

Since the various recent votes on romanizations have failed to achieve a consensus, I have drafted two compromise votes incorporating some ideas that had some traction in the various discussions. These are Wiktionary:Votes/pl-2014-07/Allowing well-attested romanizations of Sanskrit and Wiktionary:Votes/pl-2014-07/Redirecting attested romanizations. Cheers! bd2412 T 02:50, 27 July 2014 (UTC)