Wiktionary:Beer parlour


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!


November 2014

Inappropriate capitalization of nouns

I've been working on clearing up some missing entries and I've noticed that many of the entries that are redlinked are, in fact, present, but under a capital letter. For instance admiraless is redlinked, but Admiraless is not. This is the case for a good number of nouns and probably ought to be corrected. The list of capital letter English nouns should be culled of all non-proper nouns. —Yellowhen (talk) 18:34, 1 November 2014 (UTC)

Look at the citations listed at Admiraless. It really is spelled with a capital letter. — Ungoliant (falai) 18:38, 1 November 2014 (UTC)
"Admiraless" seems to be a truly exceptional case. Equinox 18:45, 1 November 2014 (UTC)
The citations page does give one example of lower-case admiraless. Examples of lower-case usage are hard to come by since it's almost always as a title or part of a title. However, culling the list of capital letter English nouns of all non-proper nouns would be a bad idea since there are plenty of common nouns that are always capitalized in English, first and foremost demonyms like Englishman and Spaniard. —Aɴɢʀ (talk) 19:00, 1 November 2014 (UTC)
All those capitalized admiralesses are so because they are honorifics, or proper nouns (usually in an archaic writing style), or in one case in an article title. This is absolutely clear where several of them accompany the term admiral, likewise with initial cap.
There are now three l.c. citations (although one is a reference to the word). This entry should be moved. See Wiktionary:Requests for moves, mergers and splits#Admiraless. Michael Z. 2014-11-24 02:27 z

Any way to force WT:CFI to be applied?

I'm getting a bit annoyed that entries that don't meet WT:CFI keep passing a vote at WT:RFD. Is there any way to force WT:CFI to be applied? Renard Migrant (talk) 16:50, 3 November 2014 (UTC)

I think one thing we need to stress is that RFD/RFV discussions are supposed to be about whether the entry meets CFI and not whether we personally want to keep the entry. This is easier to apply when there needs to be a unanimous decision because people are compelled to convince others of their reasons, as I have experienced when serving in a criminal jury. However I'm not sure how it should work when there is a vote and thus less of an obligation to think critically, since I have not experienced a civil jury. (Pardon me if my jury analogy does not hold outside the US.) --WikiTiki89 17:08, 3 November 2014 (UTC)
Whether an entry meets WT:CFI is often a matter of subjective interpretation, not objective fact. While it's true that sometimes people say "keep in spite of the fact that it doesn't meet WT:CFI", much more frequently it's a matter of one side saying "this entry does meet WT:CFI" and the other side saying "no it doesn't". When I vote "keep" at RFD for a term some people consider to be SOP, it's because I disagree with them that it's SOP, not because I think that its SOPness should be ignored. —Aɴɢʀ (talk) 18:17, 3 November 2014 (UTC)
A valid concern. It could be addressed by creating more objective criteria of what makes a term idiomatic, like we did with WT:COALMINE. Keφr 19:33, 3 November 2014 (UTC)
  • Oppose: If people want to keep an entry, it should be kept. This proposal would essentially give deletionists a supervote, and it would damn near make CFI a criteria for speedy deletion. Purplebackpack89 18:52, 3 November 2014 (UTC)
    • It already is a de facto criterium for speedy deletion. —CodeCat 19:01, 3 November 2014 (UTC)
      • And it shouldn't be. CFI is too subjective for that. Speedy deletion should be for junk or vandalism entries. Most entries with RfD votes aren't junk or vandalism entries. Renard wants any entry that doesn't pass CFI to be automatically deleted. The problem is the only way CFI is determined is for somebody to say "this passes CFI" or "this fails CFI", with certain permutations such as "this passes SOP" or "this fails SOP". Therefore, if Renard got his way on this proposal, if any ONE editor said "this fails CFI", the article would have to be deleted. That's regardless of whether or not he's in the minority, or whether or not somebody gives a good reason as to why it doesn't. That seems patently ridiculous to me. Purplebackpack89 19:06, 3 November 2014 (UTC)
        • That's not what I want. The problem is that entries that nobody thinks meet CFI get kept because they win the vote. Right now the vote is everything and policy is nothing. It wouldn't be a 'supervote' for deletions any more than it would be for inclusionists. Interpretation of CFI would matter if we applied it at all. We don't. Like I said, 100% voting, 0% policy. Renard Migrant (talk) 12:30, 6 November 2014 (UTC)
    • You misspelled "support". Keφr 19:38, 3 November 2014 (UTC)
      • @Kephir:, I didn't though. I don't like the ramifications of the enacting of what Renard wants, so I opposed the proposal. Since this is a discussion about interpreting or changing policy, I am fully entitled to oppose it for any reasons I see fit. Purplebackpack89 21:14, 3 November 2014 (UTC)
  • Could someone list a few examples of what this would affect, perhaps even recent RFD listings? It sounds like we have two camps:
    1. Delete anything that doesn't meet CFI.
      • Frankly, that sounds reasonable to me, and seems to have been our MO for quite some time.
      • Angr brings up concerns about subjectivity and how CFI is applied. These strike me as reasonable concerns, and also as the underlying issue in many of the CFI disputes I have witnessed over the past few years. An effort to further clarify CFI could be warranted.
    2. Include anything just because someone wants it included.
      • I must admit that this sounds terrible. I understand that WT is intended to be descriptive and not prescriptive, but part of that descriptivism necessitates some evidence that a given term is actually in use in the language. Including an entry for fleemkaboddinal just because I happen to like the way it rolls off the tongue doesn't strike me as a sound basis for building a dictionary (pun not intended).
      @Purplebackpack89: could you clarify your statement? Are you arguing that any entry should be kept whenever any single editor wants to keep it? Or is your argument intended to be narrower in scope -- do you instead intend to state a position more specifically about entries deemed to be SOP, or some other more limited area? ‑‑ Eiríkr Útlendi │ Tala við mig 19:26, 3 November 2014 (UTC)
      No, @Eirikr:, I'm arguing we should keep any entry that at least 50% of RfD participants want kept. Also, the problem with your first camp is "who determines it"? Purplebackpack89 19:32, 3 November 2014 (UTC)
      • When you say "who determines it", what is the it? CFI itself? SOP-ness? Some other aspect? ‑‑ Eiríkr Útlendi │ Tala við mig 20:14, 3 November 2014 (UTC)
      The 50% of participants is a bad idea because not everyone will be around to participate in every RFD. 86.136.110.109 20:16, 3 November 2014 (UTC)
      When I said 50% of participants, I meant 50% of the people who participated in a given RfD... Purplebackpack89 21:14, 3 November 2014 (UTC)
      So did I! (Sorry, not logged in.) Having rules like CFI allows the consensus opinion (as established by policy votes) to be applied without every user having to repeat their opinions on every vote. Equinox 22:07, 3 November 2014 (UTC)
      So you'd be OK with an article being deleted after four users vote keep and Renard votes "Delete. Fails CFI"? Deleting something even if 70-80% of people in that particular discussion said keep? That seems to be what Renard wants. I think that would be a bad idea. Purplebackpack89 22:14, 3 November 2014 (UTC)
      I would be okay with deletion for things that fail CFI. Renard has nothing to do with that. Equinox 22:37, 3 November 2014 (UTC)
      Even in the circumstance I outlined? Purplebackpack89 22:46, 3 November 2014 (UTC)
      Renard does not have a supervote, so your outlined circumstance is not really relevant. Only the failing of CFI is relevant. Renard might point out that something fails CFI but he does not decide whether it fails. Equinox 19:43, 4 November 2014 (UTC)
      I honestly can't see how you can divorce Renard's premise from supervoting. Renard is upset that votes are being closed based on consensus. If you don't close on consensus, the closer is giving more weight to some opinions (perhaps his own) than others. The people whose opinions get undue weight hold supervotes. BTW, where is Renard anyway? He started this thread and hasn't been heard from since yesterday. Purplebackpack89 20:04, 4 November 2014 (UTC)
      It shouldn't take much thought to note that there are many possible procedures to help enforce CFI. All votes could be required to present a reasoned argument or consent to someone else's reasoned argument and be invalidated if the argument were shown to be wrong. This would only require some more explicit rules for inclusion and exclusion. People who vote without reasons could be allowed only a fixed number of votes per month (proportional to their contributions?) before being disenfranchised or blocked or whatever. We could have votes to disenfranchise contributors for a time or indefinitely. The franchise to delete could be limited based on some explicit criteria, to those with a degree in linguistics, employment in language teaching or professional lexicography, veteran status. Or those could be deemed to disqualify. We could simply formalize the lemming criterion. I'm sure you could think of others. DCDuring TALK 21:15, 4 November 2014 (UTC)
      I don't really want to think of any others, because I honestly believe that the whole premise isn't really a problem, and certainly not one worth solving. And each of the counter-solutions you propose are solutions I cannot stomach. People should be entitled to participate in as many discussions as they see fit without any penalty whatsoever or the need to present bona fides. That's how a Wiki project works. Your counter-solutions are pretty clearly designed to prevent people from participating, which I find wrong and in violation of the "anyone can edit" ethos of Wiki projects. It's even worse because the proposal seems to be singling out people who Renard/you disagree with. Purplebackpack89 21:26, 4 November 2014 (UTC)
      • You note, “People should be entitled to participate in as many discussions as they see fit without any penalty whatsoever or the need to present bona fides. That's how a Wiki project works.” I must disagree -- that's how some Wiki projects work. In my entire time participating in Wiktionary, that is not how Wiktionary works.
      Anyone is welcome to edit. Anyone is welcome to participate in discussions. But when it comes to the outcome of discussions, bona fides of some sort are very much part of how the community consensus comes together. Bona fides could be something as simple as being a community member (i.e. editor) in good standing. In fact, that's probably the most important bona fide here.
      But suggesting that anyone and everyone can and should have equal weight in the outcome of any discussion is in error, and is decidedly *not* how Wiktionary operates. Moreover, I cannot support any move to make Wiktionary operate that way. ‑‑ Eiríkr Útlendi │ Tala við mig 21:53, 4 November 2014 (UTC)
      @Eirikr:, "anyone can edit" is a pillar of all Wiki projects. I dislike the term "good standing" because who the hell determines "good standing". I'm sorry, but the things that have been written here smack of disenfranchisement of Wiktionary editors, including potentially myself and Dan for at least the next few weeks. And the problem is that the people who are pushing this proposal happen to fall on the deletionist side of things. If this was being pushed equally by keepist and deletionist editors, I wouldn't have any problem with it. If it was being pushed primarily by people who didn't participate in RfDs, I wouldn't have a problem with it. This is a proposal started by a deletionist who's upset that articles aren't being deleted, trying to make an end run around the consensus of RfD discussions to get more articles deleted. Purplebackpack89 22:02, 4 November 2014 (UTC)
      I, for one, would be happy with any reasonable explicit inclusionist criteria that reduced the total amount of blather on this page. DCDuring TALK 22:34, 4 November 2014 (UTC)
      @Eirikr:, "it" is whether or not it is CFI Purplebackpack89 22:48, 3 November 2014 (UTC)
  • I formerly thought that we just needed to make explicit more criteria for inclusion and exclusion by having them voted on so as to reduce the scope for debate. The apparent lack of any desire to adhere to any such criteria as well as the miserable experience of most votes makes me think that this is no solution. No one seems to feel the need to make principled arguments (whether or not based on CFI), let alone develop explicit criteria. Even something as simple as criteria for differentiating an adjective from a noun used attributively was never made a policy. Actually applying it to all the cases where it should be applied would probably generate a firestorm of opposition whining. DCDuring TALK 19:45, 3 November 2014 (UTC)
  • My point of view as a frequent participant in discussions, and as a frequent closer of discussions, is that RfD is a very heavily trafficked page, and every editor has ample opportunity to participate in every discussion on the page. Therefore, if there are twenty editors participating in discussions on the page, and only five of them weigh in on a given point, then fifteen don't have a strong or certain enough opinion to bother expressing it in the discussion. The obvious cases generally come out with overwhelming support for the obvious position. In other words, if four editors say "keep", and one editor says "delete" on the grounds that the word doesn't (in their view) meet the CFI, the fact that fifteen other editors participating in other CFI discussions above and below didn't think it important to agree with the proposed deletion speaks volumes about whether the presence of the word is seen as a serious breach of our standards. bd2412 T 01:37, 4 November 2014 (UTC)
    I completely disagree. People tend to ignore RFD discussions about words outside of their field of interest. If many editors ignore a particular RFD discussion, it may not be because they have no strong opinion on whether it should be kept, but because they have no interest in that word at all. --WikiTiki89 02:39, 4 November 2014 (UTC)
    • I agree with Wikitiki89. I generally only weigh in on RFD discussions pertaining to Japanese terms. I may put in my 2p on the stray English term, but as far as monitoring the RFD page as a whole, I often skim through for Japanese and move on if nothing presents itself. —This unsigned comment was added by Eirikr (talk • contribs) at 08:09, 4 November 2014 (UTC).
    • I am somewhere in between those two. For some terms I do not care, and for others I simply do not feel competent to judge. But I also have other reasons: one of the things that really discourage me from participating in RFfoos is the sheer volume of these pages: having to wait sometimes half a minute every time I post anything on those pages is just too frustrating. It would help if there were fewer discussions, shorter discussions (both in terms of duration and amount of text) and the discussions were promptly archived and removed. Right now we only have a solution for the last problem. Keφr 12:32, 4 November 2014 (UTC)
      One thing we can do to reduce page size is to split WT:RFD into two separate pages, one for English entries and one for foreign entries. --WikiTiki89 16:08, 4 November 2014 (UTC)
      Yes! And the same for RfV too, please. I've actually been holding off nominating a few entries 'cuz they're foreign and might make it hard for most people around here to find the relevant, i. e. English terms. -- Liliana 00:18, 7 November 2014 (UTC)
      Please don't create a separate page for RFD or RFV for non-English terms. Let us keep processes simple. The non-English RFV nominations are a fraction anyway. Let those who post nominations to RFD or RFV in larger volumes also help 1) close and 2) archive the nominations. --Dan Polansky (talk) 09:54, 30 November 2014 (UTC)
    • I formerly cared more. I often disagree with keep decisions, but have come to be resigned to the fact that many contributors find multiword terms easier to wrap their heads around than single-word terms. Adding silly entries is less destructive than definitions omitted because of insufficient breadth of participation from contributors with special domain knowledge, excessive reliance on MW 1913, poor organization of definitions for highly polysemous words, and incorporation of polysyllabic, rare, obsolete, and archaic words in definitions and glosses when better words are available. DCDuring TALK 13:33, 4 November 2014 (UTC)
      In my experience, the words that tend to draw the sharpest divisions of policy interpretation are common, everyday things, like fat as a pig, have an affair, and devalueing. I grant that when it comes to truly esoteric stuff like arfer dda, there are likely fewer editors who feel qualified to offer an opinion. bd2412 T 14:01, 4 November 2014 (UTC)
    • I'd say I participate about as much as Kephir does. I certainly don't weigh in on every RfD; I seldom if ever weigh in on foreign words and there are plenty of English words I take a pass on as well. I don't think the solution is fewer discussions; I think we have the right number of discussions and I would oppose the suggestion that discussions be replaced with speedy deletions. I say the problem is discussions (particularly those trending keep) are dragged on for months and months and months, even if consensus is clear. All RfD discussions should be closed within a month, and if there's no consensus, that means there's not enough support for deleting them. It's also flummoxed me that we don't break up RfDs by month the way we break up this page. Purplebackpack89 14:14, 4 November 2014 (UTC)

Add "via" parameters to Template quote-news

Can someone please add "via" parameters to Template:quote-news, as used at en.wikipedia in w:Template:Cite news?

This way, we can specify what database archive may be used to verify the material, for example: NewsBank, LexisNexis, Westlaw, InfoTrac, etc.

Thank you,

-- Cirt (talk) 20:17, 3 November 2014 (UTC)

Aren't some of those sources behind paywalls?
You can include the url now. DCDuring TALK 22:30, 4 November 2014 (UTC)

Converting RfD to monthly subpages

Previous discussions:

User:BD2412 has expressed a wish for splitting WT:RFD into monthly subpages. It seems like a good idea, but probably requires change to {{rfd}}, {{rfd-sense}}, and {{rfd-redundant}} [others?]. I don't think it will need a vote, but it certainly needs an opportunity for discussion to make sure. DCDuring TALK 18:09, 4 November 2014 (UTC)

I wouldn't object. When we've done this to other pages, it seems that any newly created month page gets automatically added to my watchlist, if I am already watching the parent page. I would want that to happen here too. Equinox 19:45, 4 November 2014 (UTC)
No such thing. Keφr 19:55, 4 November 2014 (UTC)
While I might dislike it, the current single-page set-up has one advantage: it makes sure no discussion slips through unresolved (even though it sometimes takes ridiculously long to close some of them). Slipping through is what often happens to Tea Room discussions now. Does anyone remember why succumb was tagged with {{rft}} and whether the issue was resolved? I do not. Sure, I can check backlinks in the appropriate namespace and find out, but it is quite tedious. Keφr 19:53, 4 November 2014 (UTC)
@Kephir: How does the use of subpages lead to items falling between the cracks? Is that a big contributor beyond the other contributors to requests being neglected?
Requests of all kinds fall between the cracks for several reasons. We have items that are tagged, but not added to the appropriate page for rfc, rfd, rft, and rfv. The absence of any time limits or dramatic consequences of rft and rfc mean that such items are not closed, let alone archived (at least for rft) to the appropriate page. Tags are not removed from many of the above requests and also rfi, rfp, and rfe. DCDuring TALK 20:59, 4 November 2014 (UTC)
Because the main pages of discussion rooms using the monthly subpages system display only discussions from last three months, and there is no good way to view all unresolved discussions in order to assess and properly close them. RFI, RFP and RFE are irrelevant — no debate is usually started for those, because none is needed; those requests are considered resolved simply when someone fulfils them. Keφr 21:36, 4 November 2014 (UTC)
Can the discussion room main page structure be changed to include more time periods/subpages? How about three months per page? And more for the oldest? What about templates to categorize items as "open", "closed", "look"? It is possible to have tables that present the newest or oldest X members of such categories. DCDuring TALK 22:20, 4 November 2014 (UTC)
  • Oppose, would just lead to nominations getting forgotten. DTLHS (talk) 19:39, 5 November 2014 (UTC)
  • Oppose. The risk of rubbish being kept because a discussion disappears from the page before it is closed is not worth it IMO. — Ungoliant (falai) 01:07, 6 November 2014 (UTC)
  • Support, of course. We can always have a single transcluded page where editors can go if they feel like having a long wait while the page loads, and keep the discussions on shorter pages for those who prefer faster loading. Better yet, we could go to the system used by Wikipedia and Wikiquote, where each discussion is a subpage transcluded into the page for the month. bd2412 T 14:13, 7 November 2014 (UTC)
  • We've been doing this on the French Wiktionary for years, but it's different because those pages don't get archived to talk pages, and ours do. Probably oppose for that reason. Renard Migrant (talk) 16:25, 11 November 2014 (UTC)
  • I oppose splitting RFD to monthly pages. RFD can get shorter if editors who make most nominations also help close old nominations. Closing old nominations includes providing a boldfaced disposition, deleting the nominated page if appropriate, and striking out the heading. Splitting the page to months will not make it any less stale, and will remove the long-page-displeasure incentive to start closing old discussions or start posting fewer new ones. --Dan Polansky (talk) 23:07, 28 November 2014 (UTC)

SPAM or Spam?

The entry on Wikipedia is titled "Spam (food)," not "SPAM (food)," so which one should be the main (as opposed to the alternative form of) listing? Right now, it is "SPAM." WikiWinters (talk) 20:48, 4 November 2014 (UTC)

Hormel Foods calls it The SPAM® family of products, SPAM® brand, SPAM Classic, Great American SPAM® Championship, SPAM® Musubi, SPAM® Tocino, The SPAM Museum, and #SPAMCAN. Apparently the all-caps style is part of the logo. —Stephen (Talk) 23:04, 5 November 2014 (UTC)
It looks like the company is trying to protect their brand name from w:Trademark erosion by using a spelling that's less likely to show up in non-brand usage. If you think about it, we shouldn't be interested in how the company decides the brand name should be spelled, but in how the term is spelled when used for non-brand senses. I think the main entry should be at spam (as it is now), and I have my doubts as to whether we should even have an entry for SPAM. Chuck Entz (talk) 02:32, 6 November 2014 (UTC)
  • As editors of a descriptive dictionary, should we not include an entry for SPAM if the term is indeed used with that capitalization?  :) ‑‑ Eiríkr Útlendi │ Tala við mig 06:26, 6 November 2014 (UTC)
@Chuck Entz: The main entry currently is SPAM, not spam. Do you suggest changing this? WikiWinters (talk) 23:56, 6 November 2014 (UTC)
For future reference WT:TR is the discussion room for individual entries where there are no policy issues. Google Ngram Viewer gives a slight edge to spam even before the Internet meaning appears so I'd go with that. Renard Migrant (talk) 16:34, 7 November 2014 (UTC)
Done (Got it. Also, I corrected the entries.) WikiWinters (talk) 20:02, 7 November 2014 (UTC)

Eliminating Template:trans-mid, etc.

I recently created a pair of very simple templates {{col-top}} and {{col-bottom}} that create auto-balancing columns of text (for an example, see WT:Wanted entries). If we integrate these templates into pairs like {{trans-top}}/{{trans-bottom}}, {{rel-top}}/{{rel-bottom}}, etc. we will no longer need to manually balance their columns with {{trans-mid}}, {{rel-mid}}, etc. Assuming we test this for browser compatibility, is this something we would want to do? --WikiTiki89 17:11, 5 November 2014 (UTC)
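
Editorial note: the balancing described above comes down to a division of the item count by the column count. The following is a minimal Python sketch of that arithmetic only, not of the templates themselves; the function name `balance_columns` is hypothetical, and a real browser's column balancing may distribute leftover items differently.

```python
from math import ceil


def balance_columns(items, n_columns=2):
    """Split items into n_columns lists, filling each column top to bottom.

    Hypothetical helper for illustration: uses plain ceiling division, so
    earlier columns are filled first and the last one takes the remainder.
    """
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    # Rows needed per column so that everything fits in n_columns columns.
    per_column = ceil(len(items) / n_columns) if items else 0
    return [items[i * per_column:(i + 1) * per_column] for i in range(n_columns)]


if __name__ == "__main__":
    translations = ["Czech", "Dutch", "French", "German", "Polish"]
    for column in balance_columns(translations, 2):
        print(column)
```

With five entries and two columns this puts three entries in the first column and two in the second, which is the split an editor would otherwise have to mark by hand with a mid template.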

  • It seems like a worthwhile goal. Can you tell anything about resource consumption before testing? Assuming it is not a resource hog and passes on all major browsers, it would seem that it could be initially deployed by having the existing templates call it. Is that right? DCDuring TALK 18:15, 5 November 2014 (UTC)
    It's CSS-based, so it's all on the client side, so no effect on server load. Even on the client side, I would think all the browser needs to do is a simple division of the number of lines by the number of columns, which should be completely insignificant. The only potential issue (as with all CSS features) is browser compliance. As far as deploying it, yes, we just need to have the existing templates call it and have the mid-templates do nothing. --WikiTiki89 19:22, 5 November 2014 (UTC)
    Which browsers will have trouble with it? DTLHS (talk) 19:27, 5 November 2014 (UTC)
    I'm not expecting that any will, but we still have to test it. Maybe some outdated browsers or versions of browsers will not support it. Just to be clear, it works perfectly in the latest Chrome, Firefox, and IE. --WikiTiki89 19:29, 5 November 2014 (UTC)
    Are there sites for testing using older browser versions? Does MW have copies or a testing suite or insight? DCDuring TALK 20:16, 5 November 2014 (UTC)
    CSS columns are not supported by Internet Explorer 9 and lower and Opera 11 and lower. --Yair rand (talk) 22:17, 5 November 2014 (UTC)
    Sigh. Are higher versions part of automatic updates for IE 9 and Opera 11? What share of users have the old versions? I suppose it's too much to expect that it fails gracefully. DCDuring TALK 23:07, 5 November 2014 (UTC)
    Re browser share: The most recent Wikimedia Traffic Analysis Report shows the following usage share: IE9 - 2.25%, IE8 - 2.20%, IE7 - 0.89%, IE6 - 1.53%, IE5.5 - 0.22%, Opera<11 - 0.3%. Browsers that don't support CSS columns will display the content all in one column. --Yair rand (talk) 00:56, 6 November 2014 (UTC)
    Thanks. That doesn't seem fatal. Also, isn't it possible to change template/CSS/etc behavior based on the browser? That would at least dramatically diminish the importance of balancing the tables as 90% of users would see the table as balanced even if the various "mid" templates were misplaced. Isn't such balancing done by a bot? (Autoformat did it.) DCDuring TALK 01:07, 6 November 2014 (UTC)
  • (After e/c) This sounds brilliant. It always puzzled me that we had no auto-balancing, given how simple the math is. ‑‑ Eiríkr Útlendi │ Tala við mig 20:17, 5 November 2014 (UTC)
  • I would recommend to use a column width as parameter (with a set default, like 20em) instead of a number of columns, so that the number of columns would adapt to the screen width. Dakdada (talk) 10:12, 6 November 2014 (UTC)
    @Darkdadaah: Would that work with more browsers? DCDuring TALK 13:52, 6 November 2014 (UTC)
    No more, no less: same support (see here and here). Dakdada (talk) 15:00, 6 November 2014 (UTC)
    But that would be a drastic layout change for our translation tables. I'm not against it, but we would probably need to vote on it. --WikiTiki89 15:50, 6 November 2014 (UTC)
  • One thing that occurs to me. In some cases, JA editors (and probably others) have been using the {{mid}} family of templates in semantic ways -- in my case, specifically by splitting up derived term tables to have derived terms starting with the headword on one side, and derived terms ending with the headword on the other side. (This is common and useful for Japanese entries.)
Is this proposal intended to entirely scrap the {{mid}} family of templates? Or is this proposal more limited in scope, and targets only some of the {{mid}} templates? ‑‑ Eiríkr Útlendi │ Tala við mig 18:28, 6 November 2014 (UTC)
For that kind of situation I would suggest using something other than {{mid}} to delineate the split. That way it's clear that the split is not just there for balancing purposes. —CodeCat 18:45, 6 November 2014 (UTC)
What would you suggest? A sample usage is here on the 刀 entry. I added column headers here to try to clarify the table organization. In either layout, though, I have no idea what to use to split the columns other than the various {{mid}} templates. ‑‑ Eiríkr Útlendi │ Tala við mig 19:02, 6 November 2014 (UTC)
I don't know. We probably don't have templates specifically for this kind of thing, but it would be a good idea. I am a proponent of using templates in a way that signifies intent/meaning, rather than just using whatever template "looks right". —CodeCat 19:08, 6 November 2014 (UTC)
One thing to think about is whether that is actually better than just having two separate tables as in this edit. --WikiTiki89 20:06, 6 November 2014 (UTC)
  • Having just the one collapsible div seems like less clutter and better usability. Perhaps some other template tweaking would do the trick? We could create something like {{der-col-top}} etc, or even just {{der-head|header text}}, which would fit between {{der-top}} and {{der-bottom}}. ‑‑ Eiríkr Útlendi │ Tala við mig 20:12, 6 November 2014 (UTC)
    But having two separate collapsible tables makes it easier for readers to expand only what they want to see. It's not really a lot of clutter to have two tables. --WikiTiki89 20:27, 6 November 2014 (UTC)
So what's the verdict? Can we do this? --WikiTiki89 20:53, 14 November 2014 (UTC)
I'd think it needed a vote, because it is a bit rough on those with older browsers. DCDuring TALK 04:02, 15 November 2014 (UTC)
In what way is it rough? Manual column breaks are rough on all browsers, for all readers with very narrow or wide viewports (which is a large proportion these days). Michael Z. 2014-11-28 15:50 z

Multiple etymologies=mess?[edit]

Is it necessary to use the whole etymological chain of a word? For example see the entry for the French word "démocratie", which derives from the Latin "democratia". The origin of the latter is Greek, but should this be presented in the etymology of the French word or only for the Latin one? And why is this exhaustive etymological analysis through the Proto-Indo-European roots presented, which applies only to the Greek word? In the categorization, the French word is presented as deriving from all these languages, Latin, Greek and Proto-Indo-European, while it's only a loanword from Latin. Actually, the Latin comes from the Greek word, and the Greek comes from the PIE. In this way (which isn't used for most of the words in the wiktionary), all loanwords come from a very first proto-something language, but the point is to present the language from which a word derived, e.g. the French word derived from Latin and that's all. Anybody who wants to see the origin of the Latin word should go to its entry and so on.--Ymaea (talk) 16:48, 6 November 2014 (UTC)

Does it really make sense to only go back one step? If you put borrowed from Latin, then you click on the Latin it says borrowed from Ancient Greek. You click on the Ancient Greek it says from PIE. That's a lot of clicking. If you get a chain of seven languages in an etymology you're going to have to click 7 times to get all the etyma. Renard Migrant (talk) 17:11, 6 November 2014 (UTC)
Another issue is that sometimes the intermediate etyma don’t exist. For example, there are 1809 words listed at Category:Portuguese terms derived from Old Portuguese but we only have 460 Old Portuguese entries. And sometimes the etymon immediately preceding the word is not the most important; people who want to know the origin of French words are more likely to want to know the Latin or even Old French etymon than the Middle French one. — Ungoliant (falai) 17:21, 6 November 2014 (UTC)
Without a proper etymology backend this is all just pissing in the wind. DTLHS (talk) 18:17, 6 November 2014 (UTC)

The problem here is mainly the automatic categorization according to the etymologies used. The French word "démocratie" is listed in three categories: a) "French terms derived from PIE", b) "French terms derived from Ancient Greek" c)"French terms derived from Medieval Latin". My objections:

  1. What is this word? It cannot be PIE, Greek and Latin simultaneously.
  2. Especially the first category (PIE) is totally weird, as it indicates a straight connection between the French and the PIE words. But originally only the Greek word was formed from PIE.
  3. The mess becomes more chaotic when we want to describe the origin of the French "démocratie". We should say that it has a Greek origin, it's a Greek influence, which was passed in French through Latin. This "through" doesn't indicate the etymology, but the route of the word. So, categories which indicate that this word derives from PIE and from Greek and from Latin, are obviously wrong.

To sum up, imagine a category "French terms derived from Ancient Greek through Medieval Latin". It's much more accurate and totally different from this coexistence of the three categories above.--Ymaea (talk) 18:53, 6 November 2014 (UTC)

Being in three categories does not imply that the etymon existed in three different languages. French démocratie is derived from both Medieval Latin and Ancient Greek, though I would say it is not derived from PIE since the compound was coined in Ancient Greek and didn't exist yet in PIE. Even the Greek word is not derived from PIE; it was coined within Greek from two words that were themselves independently derived from PIE. Categories like "French terms derived from Ancient Greek through Medieval Latin" sound like a good idea in principle, but in practice I think they would quickly become unmanageable. —Aɴɢʀ (talk) 21:31, 6 November 2014 (UTC)
"It cannot be PIE, Greek and Latin simultaneously." No and we're not claiming it is. It's a bit like saying a word can't be a verb and a noun simultaneously. Not simultaneously no, but separately, yes! Renard Migrant (talk) 22:55, 6 November 2014 (UTC)
"So, categories which indicate that this word derives from PIE and from Greek and from Latin, are obviously wrong."
"So, genealogies which indicate that I am descended from my father, my great-grandfather, and my grandfather, are obviously wrong."
--Catsidhe (verba, facta) 22:59, 6 November 2014 (UTC)
  • I'm not really seeing the need for a determination on this. a) I'm generally OK with long etymologies, and b) how long an etymology should be should be dictated by common sense. Purplebackpack89 22:17, 6 November 2014 (UTC)

Regarding this genealogy case, yes you are descended from these three persons, but it would be weird to put you in the category of each one without giving these vertical kinship ties. So, when the Latin and the French word are both categorized as deriving from Greek, one could assume that we are talking about two separate formations with a common ancestor, the Greek one. When a dictionary says "100 French words derive from Latin and 100 more from Greek", is this word double-counted? I don't think so. But in wiktionary yes, it's double-counted, you can see it in both categories. My point is very clear when we compare the present situation with a category like the one I proposed, "French terms derived from Ancient Greek through Medieval Latin". On the other hand even this solution would sound weird in other cases and I have some in mind. Indeed it would be very complicated and possibly we couldn't handle a situation like this. But, I just wanted to point out that there is a strong lack of clarity with the categories in the way they are constructed. Thank you all!--Ymaea (talk) 01:47, 7 November 2014 (UTC)

"... derived from ..." ≠ "... directly derived from...". I certainly belong in a category of "people descended from {my grandfather}", and in "people descended from {my great-grandfather}", but not in "children of {my grandfather}". Similarly, démocratie is derived from δημοκρατία, but is not directly derived from it. --Catsidhe (verba, facta) 02:08, 7 November 2014 (UTC)
Category:French terms derived from Ancient Greek through Medieval Latin would fail the utility test, i.e. not useful to anyone. It has no advantages whatsoever; it's not more useful and it's not more accurate. And there isn't a lack of clarity, quite the opposite. We include all relevant truthful etymological categories. So if something's derived from Latin, we include that. If something's derived from Ancient Greek, we include that. We don't pick one or the other. With something like mock you end up with Category:English terms derived from Proto-Indo-European via Proto-Germanic, Old Saxon, Middle Low German, Middle Dutch, Middle French and Middle English. If that's your idea of clarity, I'll take obscurity thanks! Renard Migrant (talk) 16:28, 7 November 2014 (UTC)
Yes, with Wanderwörter the chains can get quite complicated. It wouldn't be difficult to list e.g. a dozen words in Inari Sami whose etymology is roughly "from Finnish < from Swedish < from Low German < from French < from Latin < from Greek < from Persian". To keep this in hand, the useful stages to indicate would seem to be
  1. Direct loan origin.
  2. Ultimate loan origin.
Both indicate an action: the loaning by French (or by Inari Sami, etc), and the word's derivation in Greek (or Persian, etc). Anything else is not that necessary.
Note that by "derivation" I do not only mean the morphological composition, though. Sometimes a specific semantic or phonological mutation may have occurred in a specific language (say, between Greek and Persian), and this is also relevant info for the etymology of a word.
On the other hand, indicating the reconstructed proto-roots from which a word was derived in some other language entirely is largely superfluous IMO; while for inherited words, though, these clearly ought to be mentioned. Pretty much everything in English comes in some way from PIE (sometimes thru quite a few detours) — the purpose of a category like "English terms derived from PIE" would be mainly for indicating what exactly has been inherited from that far back.
I suppose this gets more difficult with French vs. Latin, or Hindi vs. Sanskrit, where one might want to distinguish inherited vs. learned vocab. But maybe things like "French terms derived from Medieval Latin" versus "French terms derived from Vulgar Latin" suffice for the job?
Worth mentioning here as well BTW: we currently have a somewhat inconsistent system where e.g. some Germanic words' etymology is discussed right under the modern words (as is proper), while others' is discussed under the corresponding Old Norse or Old English or Old High German words. --Tropylium (talk) 01:08, 9 November 2014 (UTC)
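
To make the options in this thread concrete, here is a minimal Python sketch of three categorization schemes: every language in the chain (the current behaviour as described above), the direct source only, and the direct plus ultimate source suggested in the last comment. The function name and the scheme labels are hypothetical and used only for illustration; the category wording follows the "X terms derived from Y" pattern quoted in the discussion.

```python
def derivation_categories(lang, chain, scheme="all"):
    """Return category names for a term in `lang`.

    `chain` lists the source languages from the immediate source
    to the ultimate one. `scheme` is a hypothetical switch:
      "all"             - one category per language in the chain
      "direct"          - only the immediate source
      "direct+ultimate" - the immediate and the ultimate source
    """
    if scheme == "all":
        sources = list(chain)
    elif scheme == "direct":
        sources = list(chain[:1])
    elif scheme == "direct+ultimate":
        # Avoid listing the same language twice for a one-step chain.
        sources = list(chain[:1]) + (list(chain[-1:]) if len(chain) > 1 else [])
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return [f"{lang} terms derived from {src}" for src in sources]


# The two chains mentioned in the thread.
democratie = ["Medieval Latin", "Ancient Greek"]
inari_sami = ["Finnish", "Swedish", "Low German", "French",
              "Latin", "Greek", "Persian"]

print(derivation_categories("French", democratie, "all"))
print(derivation_categories("Inari Sami", inari_sami, "direct+ultimate"))
```

Under the "all" scheme the Inari Sami example lands in seven categories; under "direct+ultimate" it lands in two, which is the trade-off being argued over.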

Sercquiais[edit]

User:Ready Steady Yeti found out the hard way that we don't include this in our data modules as a language. We do have Guernésiais and Jèrriais, which are lects spoken nearby, and likewise often considered to be dialects of Norman. According to w:Sercquiais, the island was settled in the 16th century by speakers of Jèrriais, and has archaic features that have been lost in that language, along with considerable Guernésiais influence. Should we create a language code for this, or treat it as a dialect of Jèrriais/Norman? Chuck Entz (talk) 00:12, 9 November 2014 (UTC)

Why do we have separate codes for Guernésiais and Jèrriais to begin with? I'd say all three should be dialects of Norman. —Aɴɢʀ (talk) 13:51, 9 November 2014 (UTC)
I, too, am not convinced Jèrriais and Guernésiais are languages distinct from Norman. — Ungoliant (falai) 15:42, 9 November 2014 (UTC)

Indication of different pronunciations of English words of shared etymologies[edit]

Many English multi-syllable words that are used in different functions, especially as a noun and a verb, have the stress on different syllables according to which part of speech they occur as. Examples include increase, reject, excerpt, defect. The meanings and etymologies of such a word are usually related and the pronunciations are all grouped under the Pronunciation header. Many pages use various templates, some of which are clearly wrong, to indicate the part of speech a pronunciation pertains to. WT:ELE doesn't prescribe any template, instead hinting that parts of speech should be separated under multiple Pronunciation headers. Sound files often lack part of speech information. They may be indented under the relevant IPA description, but that requires a knowledgeable linguist, and to my knowledge isn't suggested on any policy page. The template {{qualifier}} seems to be the closest to what is intended. Is there a preferred way? Kumiponi (talk) 18:55, 11 November 2014 (UTC)

I'd use {{sense}} myself, I don't know about other editors. —Aɴɢʀ (talk) 19:12, 11 November 2014 (UTC)
That template's documentation doesn't allow such usage. Kumiponi (talk) 19:29, 11 November 2014 (UTC)
We've discussed this before (does anyone remember when/where?). Some people say that if a word has different pronunciations, then the two pronunciations actually have different etymologies. --WikiTiki89 22:07, 11 November 2014 (UTC)
At the very least, they're distinct words, so maybe something like this? Etymology at the top, in level 3, followed by Pronunciation 1 at level 4, then the words with that pronunciation, then Pronunciation 2 at level 4, etc. Of course this should only be done if we're really sure the words have the exact same etymology and came into existence at the same time from the same origin. —CodeCat 23:12, 11 November 2014 (UTC)
But how do you explain the different pronunciations? It should be part of the etymology. --WikiTiki89 23:20, 11 November 2014 (UTC)
You could say that the etymologies are different, but that sets up an inconsistency with the way we divide terms without pronunciation differences: we currently show the verb perfect as derived from the adjective perfect, but we don't show the verb mouse as derived from the noun, nor do we show the computer mouse as derived from the rodent mouse. Chuck Entz (talk) 00:03, 12 November 2014 (UTC)

Module:template utilities[edit]

Could someone please show me where the renaming of Module:template utilities to Module:ugly hacks was discussed at WT:RFM and where the deletion of Module:template utilities was discussed at WT:RFDO? As far as I can tell, the first was only mentioned in passing while discussing the fate of other templates, and the second wasn't discussed at all.

I can understand why User:Kephir might want to discourage people from using the module, but I can't understand why he didn't discuss it in the appropriate places first. At the very least it would have given people a chance to point out any potential problems, and to get people thinking about alternatives (this reminds me of a line in w:Dr. Strangelove about keeping deterrents secret, but I can't remember it offhand).

I'm sure he was very diligent in orphaning template utilities from everything that links to it. He wasn't so diligent, however, in checking Category:Templates that must be substituted. This is analogous to demolishing a bridge without closing the roads on either end. I've updated my own templates to use the other module, but there may be others. Chuck Entz (talk) 01:09, 12 November 2014 (UTC)

Yup. DCDuring TALK 02:33, 12 November 2014 (UTC)
How incredibly self-serving (or what a sign of youth?) it is to be able to say "I won't bother writing documentation for this module because you shouldn't use it: you should use Lua instead (and to hell with you if you don't and I won't answer your questions on my talk page if I don't feel like it)". DCDuring TALK 02:41, 12 November 2014 (UTC)

AWB rights or task request[edit]

I'm looking to be granted rights to use AWB on this project. I noticed that Category:English simple past forms was supposed to be empty and I was going to correct the pages to point to Category:English verb simple past forms as prescribed. If granting AWB rights is not advisable because I'm an infrequent editor on this project, I'd ask that someone else please perform this task. Thanks, Ost316 (talk) 18:26, 13 November 2014 (UTC)

On it. bd2412 T 19:51, 13 November 2014 (UTC)
Done. Cheers! bd2412 T 20:01, 13 November 2014 (UTC)
Not that I object, but was this perhaps done unilaterally by CodeCat with no prior discussion? I don't think verb is really needed as other parts of speech don't have simple past forms. We don't have Category:English verb past participles for the same reason: 'verb' is implied by 'past participle'. Renard Migrant (talk) 13:30, 15 November 2014 (UTC)
That may be so, but we had two categories with duplicated intent, one containing about 75 entries and the other containing about 20,000. I have no problem recategorizing the 20,000, but it is definitely an easier task to recat the 75 with AWB. If we go the other way, it will be a bot task. bd2412 T 17:43, 15 November 2014 (UTC)
  • Let us fill Category:English simple past forms, and discontinue Category:English verb simple past forms. --Dan Polansky (talk) 18:37, 5 December 2014 (UTC)
    • If that is what consensus favors, I am up to the task (or a bot could do it in a day). bd2412 T 15:38, 6 December 2014 (UTC)
      • I don't agree with it. —CodeCat 17:51, 6 December 2014 (UTC)
        • @CodeCat: You could take the trouble to give reasons, instead of forcing us to beg you for them. Your opinion is one that I would recommend be ignored if you fail to participate usefully in discussions. DCDuring TALK 19:32, 6 December 2014 (UTC)
        • Well ok, you can ignore me and Dan. I'm just giving a counterweight to Dan because BD2412 suggested that might be the consensus. —CodeCat 19:33, 6 December 2014 (UTC)
          • I did not intend to suggest that there was a consensus for anything. I am volunteering to carry out the task if there is a consensus. bd2412 T 23:24, 6 December 2014 (UTC)
          • My main argument is Renard's: "I don't think verb is really needed as other parts of speech don't have simple past forms." My secondary argument is that status quo ante should prevail unless a performer of mass undiscussed changes can explain their reasoning and demonstrate consensus for their change. --Dan Polansky (talk) 12:23, 7 December 2014 (UTC)
  • I agree with Renard about the apparent redundancy of "verb" in the category name. Is there any language where a word class other than 'verb' has a past tense? DCDuring TALK 00:45, 7 December 2014 (UTC)
There are many languages where adjectives act like verbs, but one could quibble whether they're verbs or adjectives in such cases. There are similar issues with participles. There are also a number of agglutinative languages which have verb affixes on nouns. Given that the latter group are mostly not Indo-European, and tenses are rare outside of the Indo-European languages, I'm not sure whether there are any that have tense as opposed to aspect.
At any rate, I suspect that the main reason for "verb past tense forms" would be to maintain a uniform naming scheme between parts of speech and between languages. Given that many assumptions we make about what kinds of things are limited to which part of speech are wrong somewhere in the world (Modern Hebrew verbs, for instance, can have gender as well as person and number), it would make things easier in general to be explicit about such assumptions. I'm still trying to figure out where I come down on this particular case, though. Chuck Entz (talk) 01:32, 7 December 2014 (UTC)
WT:RFM I guess. Renard Migrant (talk) 13:20, 7 December 2014 (UTC)

Template:Bibleref[edit]

I came across a recent edit which replaced a use of the non-existent Template:Bibleref with a direct wikilink to the relevant Biblical book, which is certainly of no use. There used to be a template Bibleverse on WP, which used "mediatools"; when that went belly up, it failed for a while, and now it has been replaced with a same-named template that works again. It is absolutely magnificent: the template creates an external link to any of a dozen off-site Bibles.

I assume such a template existed, and was deleted when mediatools disappeared. Either way, a Template:Bibleref that works like the WP template would be quite useful. That's certainly much better than "fixing" the Templates to something hard-coded. Choor monster (talk) 15:36, 14 November 2014 (UTC)

What does the WP template do? Does it link to one or multiple translations? Does it 'find' the citation? DCDuring TALK 16:15, 14 November 2014 (UTC)
On a partially-related issue, I find quotes that just say 'Bible' and not which edition a bit irritating. Modern English Bibles span at least 500 years, so which edition it is really does matter. Renard Migrant (talk) 13:24, 15 November 2014 (UTC)
  • Good grief, it worked on Friday—linking to the asked-for particular Bible translation—and now it doesn't work at all, trying to link to tools.wmflabs.org and coming up empty. Well, it half-way works, creating a properly formatted Biblical reference. As an example, in Shuah, there are numerous instances of the template, with the parameter HE added, that originally and last Friday when I checked, provided a link to this stable line-by-line English/Hebrew Bible. This was particularly useful for this article, since there are four distinct Hebrew spellings that KJV transliterated into "Shuah". In contrast, the template did not provide links for any on-line LXX, so the relevant links had to be hand-coded, which is, of course, a nuisance. And in theory, this is less robust, but in practice, it has turned out to be more robust. Choor monster (talk) 16:39, 17 November 2014 (UTC)

Proposal to allow breves for Latin words in certain exceptional cases[edit]

Some Latin words have one or more vowels with variable length (for example, agrimensor/agrīmensor, Galilaea/Galīlaea, Lūcipor/Lūcipōr, Moȳsēs/Mōȳsēs, patruus/pātruus, Pharisaeus/Pharīsaeus, -por/-pōr, Publipor/Pūblipor/Pūblīpōr, redux/rēdux, succisīvus/succīsīvus, etc.). Allowing only macra, and no breves, accurate presentation requires something like this. To me, that seems like a crazy amount of duplication to account for variation in the length of one vowel; far better, in my opinion, would be presentation like this. Right now, however, such presentation is problematic, because the links generated point to page titles with ĭ in them, but this is easily fixed by automatically stripping macra–breves from Latin links in the same way that standalone macra are currently stripped from Latin links; to achieve this, the following change would need to be made to Module:languages/data2:

This sort of double-diacriticking is standard practice, and can be seen in Lewis & Short's Latin Dictionary and Félix Gaffiot’s Dictionnaire Illustré Latin-Français. L&S and Gaffiot both use standalone breves to mark short vowels anyway, but the Oxford Latin Dictionary, which only uses macra and never breves to mark fixed-length vowels, also uses macron–breve double-diacriticking to mark vowels with variable length, as in the case of “sibī̆¹, sibe” (p. 1,753/1 in the 1st ed.), its headword for sibi, the dative of the reflexive pronoun .

Does allowing the use of breves on Latin words in these exceptional cases seem like a good idea to everyone? — I.S.M.E.T.A. 18:15, 14 November 2014 (UTC)

Regardless of whether we should use them, the module should definitely strip them. I will make the change (and your particular suggested edit will not work). --WikiTiki89 18:21, 14 November 2014 (UTC)
@Wikitiki89: Thanks for that. And I'm sorry that I was wrong with my suggested edit. Could you explain what the u(0x0304), u(0x0306), u(0x0308) in the text you added does, please? — I.S.M.E.T.A. 21:05, 14 November 2014 (UTC)
@I'm so meta even this acronym: First of all, the reason your suggestion would not have worked is that the combining breve is treated as a separate character, thus "Pharī̆saeus" would have become "Phariasaeus" rather than the intended "Pharisaeus". u(0x0304), u(0x0306), and u(0x0308) are respectively the combining macron, combining breve, and the combining diaereses, which are replaced by nothing. The u(...) function converts a number representing a Unicode codepoint into the character itself (if you look at the top of the module, you will notice that u is just a shortcut for mw.ustring.char), the 0x indicates that the following number is in hexadecimal notation, and the number is the Unicode codepoint of each respective character. --WikiTiki89 21:27, 14 November 2014 (UTC)
@Wikitiki89: Hugely illuminating. Thank you very much. :-)  — I.S.M.E.T.A. 21:55, 14 November 2014 (UTC)
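
For readers who want to try the idea outside the module, here is a minimal sketch of the stripping described above. It is written in Python rather than the module's Lua, and the names `STRIPPED_MARKS` and `latin_entry_name` are hypothetical. It also assumes the text is decomposed first so that a precomposed letter such as ī splits into a base letter plus a combining macron; whether the real module does this in the same way is not stated in the thread.

```python
import unicodedata

# The three combining marks named in the explanation above.
# chr(0x0304) plays the role of the module's u(0x0304) shortcut.
STRIPPED_MARKS = {
    chr(0x0304),  # combining macron
    chr(0x0306),  # combining breve
    chr(0x0308),  # combining diaeresis
}


def latin_entry_name(display_text):
    """Turn a Latin display form into a page title by dropping length marks."""
    # Decompose so that precomposed letters expose their combining marks.
    decomposed = unicodedata.normalize("NFD", display_text)
    # "Replace by nothing": keep every character that is not a stripped mark.
    stripped = "".join(ch for ch in decomposed if ch not in STRIPPED_MARKS)
    # Recompose anything that was decomposed but not stripped.
    return unicodedata.normalize("NFC", stripped)


# "Pharī̆saeus" written with escapes: ī (U+012B) followed by a combining breve.
print(latin_entry_name("Phar\u012b\u0306saeus"))
```

Because each combining mark is removed as its own character, the macron and the breve on the same vowel both disappear and only the bare letter is left.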

Template usage policy (generic vs. FL)[edit]

Do we have a written template usage policy? I thought FL-specific templates were allowed whenever there was a need, and the use of generic templates was encouraged when it was possible, especially when no special FL functionality was needed. But the opposite is happening in two cases and it's confusing.

  1. {{hu-proper noun}} is linked to {{head|hu|proper noun}}. It does not have any extra functionality. I was planning to manually move every entry that is using {{hu-proper noun}} to {{head|hu|proper noun}}, and eventually delete the template. So I try to change {{hu-proper noun}} to {{head|hu|proper noun}} whenever I see it, but there are other non-Hungarian editors who do the exact opposite: they change {{head|hu|proper noun}} back to {{hu-proper noun}}.
  2. {{hu-suffix}} was developed to provide functionality specific to Hungarian entries. The template was changed to point to {{suffix}} which does not provide the same functionality. Requests for adding back the unique original functionality are treated with resistance.

What is the policy that should be followed? --Panda10 (talk) 17:42, 15 November 2014 (UTC)

Now that many templates simply call on {{head}}, we should only use the ones that have additional functions or have a realistic prospect to have added functions in the future. For example {{fr-noun}} has a few extra functions, such as the automatic plural and the gender. {{fr-adv}} has no added functions (when compared to {{head|fr|adverb}}) and no realistic prospect of them, which is why I've nominated it for deletion. Renard Migrant (talk) 18:30, 15 November 2014 (UTC)

Changing the alternative display form parameters, again[edit]

Previous discussions: Wiktionary:Beer parlour/2014/January#Parameter to use for alternative display of links, Thread:User talk:CodeCat/Why the rush?

Now that we have Lua to automatically strip diacritics, we don't need the third parameter of {{l}}, {{m}} and similar templates nearly as often as before. Some people have brought this up before and suggested that we could rename this parameter to alt= and "shift" the gloss parameter (the fourth) downwards to take its place. But it's far from clear whether more entries use the fourth parameter than use the third. So before we make this change, I would like to add some tracking to these templates so that we can more easily judge which of the two parameters gets used more, and make a decision based on that. Is this ok? —CodeCat 20:56, 16 November 2014 (UTC)

I still think we should add |text= instead. Keφr 21:51, 16 November 2014 (UTC)
…by which I meant that instead of having {{m|en|A|alt=B}} we would write {{m|[[A|B]]}}. And instead of {{m|en||X}}, {{m|en|text=X}} (I preferred {{m|en|=X}} initially, though). Keφr 22:14, 16 November 2014 (UTC)
Yes it's a good idea, and I prefer |alt= because it's more established here. Renard Migrant (talk) 21:54, 16 November 2014 (UTC)
  • I support doing this, and I support naming the parameter |alt=. —Aɴɢʀ (talk) 22:06, 16 November 2014 (UTC)
    • I actually wanted to do this before deciding on whether to rename it. To see if it's needed. Keep in mind that there are many instances where {{m}} is used with only the alt display and no linked term. These cases turn up often in etymologies where you might want to show an intermediate reconstructed form without linking to it. Renaming the parameter would make such cases longer to type. —CodeCat 22:16, 16 November 2014 (UTC)
      Maybe we could create two more templates, such as {{l*|en|foo}} and {{m*|en|foo}} that would not automatically link the parameters (we don't have to actually go with my scheme-influenced asterisk usage)? --WikiTiki89 00:56, 17 November 2014 (UTC)
      Your suggested {{l*}} seems almost the same as {{lang}}. —CodeCat 01:13, 17 November 2014 (UTC)
      Except that {{lang}} doesn't support transliterations or language linking. --WikiTiki89 02:10, 17 November 2014 (UTC)
      It should probably support the latter. And maybe the former too. —CodeCat 02:29, 17 November 2014 (UTC)
      But regardless, we'd need a mention version of it as well. --WikiTiki89 03:31, 17 November 2014 (UTC)
      {{l*}} makes me rather think of reconstructed/unattested terms than Scheme. Keφr 12:18, 18 November 2014 (UTC)
      Then maybe you haven't used Scheme enough. The asterisk is used similarly to the way the prime symbol is used in mathematics. Compare functions such as let*, list*/cons*, and map*, and see this question. --WikiTiki89 16:55, 18 November 2014 (UTC)
      Quite probably so. Though why not {{l'}} if you just meant to use a prime? Or hell, even {{l′}}? (That is U+2032 PRIME.) Keφr 18:05, 18 November 2014 (UTC)
      The apostrophe is a special character in LISP-derived languages; the asterisk is not. The Unicode prime symbol would only work in some implementations and would be inconvenient to input anyway. As for why I didn't use an apostrophe here, I didn't actually connect the Scheme asterisk with the mathematical prime symbol until I was writing that response. --WikiTiki89 20:14, 18 November 2014 (UTC)
I don’t think the tracking is necessary. Even if it shows that there are more uses of {{m}} with an alternative display than with a gloss (which would not correspond to my personal experience), all that proves is that we need to start adding more glosses. But if that’s what it takes for people to support the parameter change, I see no harm in it. — Ungoliant (falai) 05:41, 17 November 2014 (UTC)
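
As a rough illustration of the tracking idea, the following Python sketch counts how often the third (alternative display) and fourth (gloss) positional parameters are filled in {{l}} and {{m}} calls within a piece of wikitext. It is only a sketch under stated assumptions: the function name `count_param_usage` is hypothetical, real tracking would presumably be done inside the templates or modules themselves, and the simple pattern used here does not handle nested templates or wikilinks containing pipes.

```python
import re
from collections import Counter

# Matches {{l|...}} and {{m|...}} with no nested braces (a simplification).
TEMPLATE_RE = re.compile(r"\{\{(?:l|m)\|([^{}]*)\}\}")


def count_param_usage(wikitext):
    """Count non-empty third (alt) and fourth (gloss) positional parameters."""
    counts = Counter()
    for match in TEMPLATE_RE.finditer(wikitext):
        # Named parameters such as tr=... contain "=" and are skipped;
        # a positional value containing "=" would be misread (simplification).
        positional = [p for p in match.group(1).split("|") if "=" not in p]
        if len(positional) >= 3 and positional[2].strip():
            counts["alt"] += 1
        if len(positional) >= 4 and positional[3].strip():
            counts["gloss"] += 1
    return counts


sample = "{{m|la|democratia}} {{m|grc|demos||people}} {{l|en|A|B}} {{m|en||X}}"
usage = count_param_usage(sample)
print(usage["alt"], usage["gloss"])
```

In the sample, {{l|en|A|B}} and {{m|en||X}} each fill the third parameter, while only {{m|grc|demos||people}} fills the fourth.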

Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep[edit]

Let's start applying our own rules! Otherwise I will be nominating Wiktionary:Criteria for inclusion for deletion as de facto it isn't being used anyway. Let's go one way or the other; apply our own rules or get rid of them altogether. Renard Migrant (talk) 18:49, 17 November 2014 (UTC)

Oh, you know I'm voting oppose! Purplebackpack89 20:32, 17 November 2014 (UTC)
I don't know how useful this vote is. It just means we'd be squabbling over "interpretations" of CFI. It has to come down to common sense in the end. To paraphrase Renard under another name, if the lunatics take over the asylum (single-issue propagandists, fringe kooks, etc.) we are not going to defeat them with rule-mongering. Equinox 22:32, 17 November 2014 (UTC)
Oh, it's not useful at all. There is an enormous crisis of implementation if this vote passes. Which it shouldn't, because entries should be kept or deleted because of consensus. I'm also worried about the motivations of this vote: it and the discussion above grew out of Renard's complaint about not enough articles getting deleted. I'd feel much more comfortable if this was coming from a neutral third-party rather than an ardent deletionist. Purplebackpack89 22:52, 17 November 2014 (UTC)
CFI is consensus in itself. It consists of the rules that all of Wiktionary has agreed to work with. If the rules don't suffice that doesn't mean we should just override them when we feel it's necessary. It means we need better rules. I'm also rather surprised that you, as a proponent of applying Wikipedia practices here, do not agree with the suggestion that we apply the common Wikipedia practice of using policies as rationales. Personally I think this is a problem here and if policies were enforced more strictly then they'd actually mean something. In particular I would love to see Wikipedia-style deletion messages used here, in which the name and a link for the relevant policy stating rationale for deletion is always included. —CodeCat 23:09, 17 November 2014 (UTC)
Since you brought up Wikipedia, my experience is that admins there don't tend to close an AfD against consensus, even if consensus is in one direction and policy is another. The exception to that is AfDs that have a lot of IPs or new editors, who are usually discounted. Purplebackpack89 23:32, 17 November 2014 (UTC)
That may be true, but I don't know what you're interpreting as "consensus" in this case. I've seen administrators close discussions with option 1 even though the majority of votes was for option 2. The rationale given for this was that the people who wanted option 1 gave better rationales, in particular rationales that had more merit with respect to Wikipedia policy. That consensus depends on the quality of arguments over numerical superiority is a Wikipedia policy in itself, as can be read at w:WP:Consensus. And I would definitely favour a similar approach on Wiktionary. The only problem is that with our much smaller number of users and administrators, it's harder to find someone who is not involved and who can therefore be trusted to view the given arguments impartially. So we have the problem of "no consensus over the consensus"... —CodeCat 00:03, 18 November 2014 (UTC)
I am interpreting "consensus" as the preponderance of opinions on the matter. I honestly don't think it occurs as often as you do. The decisions you cite where that's true almost always fall in the 60-40 range. Admins almost never close a discussion one way when more than 65% of the votes are the other. Purplebackpack89 00:05, 18 November 2014 (UTC)
It's not common, no. But the fact that it can and does happen does mean something. The fact that Wikipedia even has an official policy saying it must be done that way is even more telling. While Wiktionary is certainly not Wikipedia, there are some things we can learn from and I think this is one case. —CodeCat 00:08, 18 November 2014 (UTC)
Purple, that's absurd: claiming that Renard operates on how many entries get deleted. He only wants to delete the ones that are actual SoPs. I know you don't understand SoP, as you have often made evident. Equinox 00:23, 18 November 2014 (UTC)
I stand 100% by that. Prior to the beer parlour discussion above, Renard had expressed dismay that a number of entries were closed as keep despite his believing them to fail CFI. He mentions this dismay in the beer parlour discussion above. Purplebackpack89 00:30, 18 November 2014 (UTC)
If anyone is operating purely "by the numbers" and not by thought or logic, it is you, Purple, who want to decide everything with a transient "keep" or "delete" vote, rather than deciding why, and formulating rules based on the reasoning. Equinox 00:24, 18 November 2014 (UTC)
I object vociferously to what you've just said. I give reasons for every vote I make. And I'm nowhere near the most voting person here. I also understand what SOP is; I believe it to be a ridiculously restrictive policy that should be eliminated to allow us to have worthy entries that many other dictionaries have. It's not that I have no reasons, it's mostly that you and Renard don't like my reasons. Purplebackpack89 00:28, 18 November 2014 (UTC)
There won't be any point in making any entries (of more than one word) at all if this is passed. Migrant's policy would ruin the dictionary and it would be in danger of stagnation, IMO. There are more receptive dictionaries around on the Internet. Donnanz (talk) 23:29, 17 November 2014 (UTC)
Yep, what Donnanz said. Purplebackpack89 23:32, 17 November 2014 (UTC)
I'm not given to the expression of strong opinions around here, but, frankly, this proposal seems like an absolutely terrible idea to me. It isn't always clear-cut whether a term meets CFI or not. Sometimes the determination of CFI compliance depends upon making a subjective/qualitative judgment (SOPness) rather than simply ensuring that certain objective criteria are being met (three non-mention citations spanning a year). The RfD process exists to resolve such cases. We discuss the matter and reach a consensus. If we're not going to respect the outcome of these discussions — if we're going to allow admins to become judge, jury, and executioner, deleting entries at their sole discretion based on their own personal interpretations of CFI - then the RfD process will become nothing more than a meaningless song and dance. Discouraging discussion and disrespecting consensus in such a manner would be contrary to the collaborative spirit of this project. -Cloudcuckoolander (talk) 00:01, 18 November 2014 (UTC)
On Wikipedia, as I noted above, AfD (article for deletion) discussions are not the same kind of back-and-forth that is often seen here. Instead, each person gives arguments and then a third party will judge those arguments based on policy and decide which view has more merit. The fact that a third party is involved means that powers are somewhat separated. Perhaps we should do something similar here, by requiring that whoever closes an RFD discussion must not have taken part in it. —CodeCat 00:06, 18 November 2014 (UTC)
We absolutely should do that! Purplebackpack89 00:09, 18 November 2014 (UTC)
This looks like something that should be obviously desirable and nice in theory, but I do not think the CFI as it is written right now is a good policy to enforce to the letter. For starters, I would like WT:CFI amended to accommodate "hot word"/"hot sense", {{translation only}} and phrasebook entries first. But even if we do that, there will still be too much room for subjectivity in interpreting CFI, which at best means that RFDs will keep being votes — or worse: as one person on TOW put it, "AfD is not a vote" means "AfD is a vote that administrators are allowed to count any way they like.". Keφr 08:22, 18 November 2014 (UTC)

This is a joke proposal, right? It’s honestly difficult for me to tell. --Romanophile (talk) 16:03, 18 November 2014 (UTC)

Your comment is a joke, right? You think applying rules we already have is a joke? Renard Migrant (talk) 20:08, 19 November 2014 (UTC)
Maybe the joke is that we need to vote to approve rules that have been already formally enacted. Keφr 20:22, 19 November 2014 (UTC)
No, I believe Renard is solid in his convictions. Solid enough that the vote starts on Monday. Purplebackpack89 16:42, 18 November 2014 (UTC)
Purplebackpack89 is wrong about what he says about me, for the reasons he gives. It's nothing to do with the number of entries being deleted, but which ones. Also, Purplebackpack89, if you agree with what CodeCat says about deletion debates not being about counting votes but about the overall arguments made, why would you oppose this vote? It's what I'm proposing, after all. Renard Migrant (talk) 18:16, 18 November 2014 (UTC)
"Which ones". The ones I've seen you complain about the most vociferously are those where you voted delete, others voted keep, the discussion was closed as keep, and you claim the entry should be deleted on CFI grounds. What you're essentially proposing is to shut out a line of argument, which coincidentally tends to be one you don't agree with. Purplebackpack89 18:46, 18 November 2014 (UTC)
But if Renard's argument is rooted in CFI whereas nobody else's are, why should we ignore CFI? I support this vote because it means people will be forced to come up with better arguments - specifically, arguments that follow consensus-established Wiktionary policy - if they want their views to be taken into account. I think that's a good thing. —CodeCat 19:54, 18 November 2014 (UTC)
More than that, any problems that are in CFI, currently we've no reason to fix them because editors are free to ignore CFI as much as they choose. Forcing CFI to be applied will put it under much greater scrutiny and therefore it will get amended. What's the reason to improve it right now? Renard Migrant (talk) 22:05, 18 November 2014 (UTC)
Because participating in RfD is easy and changing CFI is hard. And CFI can never cover everything. Sometimes, you just have to use the Potter Stewart test. Purplebackpack89 22:08, 18 November 2014 (UTC)

Wiktionary:Votes/pl-2014-11/Require third-party closures of RfD and RfV discussions

Starting one week from today, there is going to be a vote on whether RfD and RfV discussions must be closed by uninvolved editors. Purplebackpack89 00:37, 18 November 2014 (UTC)

We generally do this anyway, no need to vote on it. --WikiTiki89 00:39, 18 November 2014 (UTC)
There's no policy that says we have to, though. Purplebackpack89 00:41, 18 November 2014 (UTC)
We don't need a vote to enforce something that we already do. --WikiTiki89 00:42, 18 November 2014 (UTC)
We need a policy to make sure we keep doing it. —CodeCat 00:52, 18 November 2014 (UTC)
It is not that clear that we should. It is naïve to think that requiring "uninvolved editors" to close discussions will eliminate biased closures — maybe even the opposite. Also, this requirement is especially superfluous on RfVs, where the existence of citations is usually not a matter of any interpretation. Keφr 08:30, 18 November 2014 (UTC)
I have removed RfVs from the vote, though I still harbor reservations that there's nothing stopping the same editor from starting and closing an RfV. Purplebackpack89 16:49, 18 November 2014 (UTC)
I think as it's worded, no, because it excludes anyone who voted. If it excluded just the original nominator and the entry creator, I could go with that. Excluding anyone who votes (though I note, not anyone who comments) could rule out practically everyone for really well discussed entries. Also why would someone who hasn't voted necessarily be 'unbiased'? Also, you could abstain from voting in order to be able to close, which means if you had a bias you'd be free to impose it onto the entry, because you hadn't voted. Renard Migrant (talk) 18:20, 18 November 2014 (UTC)
This. Keφr 18:33, 18 November 2014 (UTC)
I have closed dozens of deletion discussions that I have participated in - mostly because it seems (to me at least) that discussions tend to languish on the page long after they have run their course. If I hadn't closed those discussions, someone else would have had to pick up the ball, which wasn't happening. If consensus differs from my views, then I go with the consensus. The closed discussion remains on the page for a week following the closure, so if there are objections to the closure they can be raised. bd2412 T 17:07, 19 November 2014 (UTC)

Harassment by User:Kephir

Another admin, Kephir, is harassing me. He removed comments I made on another user's talk page, here and here. When I asked him not to do that, he deleted the message on my talk page, claiming it was vandalism here (commenting on another person's talk page is clearly not vandalism nor graffiti, as he labeled one of my edits). There are many other instances of harassment of me by this editor in months past, including a number of unwarranted personal attacks on talk pages and in edit summaries (this is a good example). Could someone PLEASE get him to stop? Purplebackpack89 22:56, 18 November 2014 (UTC)

If the abuse is long-term I recommend starting User:Purplebackpack89/Kephir. Subsequently write all the instances you can think of where he has misused his tools so that everything is in one place. That way you could remember every negative interaction that has occurred between you and if others agree with you then they may chime in where Kephir could possibly eventually get his admin tools revoked. Zigguzoo (talk) 23:05, 18 November 2014 (UTC)
Kephir has been an admin for 10 months only. Maybe he's experiencing powertrips of some sort but I get the feeling that if he continues on his current projection of misleading edit summaries and misuse of the tools he may face desysopping soon. Zigguzoo (talk) 01:12, 19 November 2014 (UTC)
Oh, he oughta. The deletion of talk page threads and hiding of edits he did was just seeing what he could get away with with the tools he's proving he shouldn't have. While it's acceptable to clear one's talk page, user warnings should not be tagged as vandalism or graffiti. What compelled him to remove good-faith edits from your talk page, I do not know. Purplebackpack89 03:30, 19 November 2014 (UTC)
I think he's making legitimate criticisms, and I also think you're happy to make legitimate criticisms yourself, but when someone does it to you, you accuse them of harassment. Why do you think it's ok for you to act that way, but when other people do it to you, it's awful and horrendous? Renard Migrant (talk) 18:02, 24 November 2014 (UTC)
@Renard Migrant:, because I don't delete the criticisms with the summary "vandalism", nor do I remove other people's comments on talk pages other than my own. It's not so much what he says to me, it's the misleading edit summaries. Purplebackpack89 17:31, 28 November 2014 (UTC)


And he's doing it again

When I expressed concern about his deletion of two templates, User:Kephir deleted my comment with the summary "Incomprehensible, meaningless or empty: please use the Sandbox". The sandbox is not the appropriate place for user issues. When I told him that, he just deleted that with the edit summary "No usable content given", which is a CSD for articles, not user comments. Can somebody please tell him to stop the misleading edit summaries? Purplebackpack89 19:04, 28 November 2014 (UTC)

Proposal to start WT:Courthouse

I propose we start WT:Courthouse, a page to discuss user conduct, blocks, reverts, etc. without disrupting the WT:BP. --WikiTiki89 16:09, 19 November 2014 (UTC)

Definitely support. But I wonder if it wouldn't be clearer if we just named the page after its function. —CodeCat 16:11, 19 November 2014 (UTC)
Support the page. Not a big fan of the name, though. Purplebackpack89 16:13, 19 November 2014 (UTC)
Is "courthouse" not indicative of its function? --WikiTiki89 16:17, 19 November 2014 (UTC)
Not in the same way that WT:Vandalism in progress is. —CodeCat 16:29, 19 November 2014 (UTC)
Courthouse may actually be a little too indicative. I'd just call it Wiktionary:User conduct Purplebackpack89 16:31, 19 November 2014 (UTC)
Yes that's a good idea, although that might be mistaken for a page describing how users should behave. —CodeCat 16:35, 19 November 2014 (UTC)
WT:Vandalism in progress is a different story, since it's for emergencies. I chose the name WT:Courthouse by analogy to pages such as WT:Beer parlour, WT:Tea room, etc., except that it is actually more indicative of what the page is about. I'd put it at the same level of self-explanatory-ness as WT:Information desk. --WikiTiki89 16:47, 19 November 2014 (UTC)
I realise that. I'm not really that fussed about the names, I'm just wondering if the fancy names like "Beer parlour" and "Grease pit" aren't too confusing to new people who don't already know what they are. —CodeCat 16:50, 19 November 2014 (UTC)
Of course they're confusing, but only for the minute before they read the description on the page. But "Information desk" is not confusing, and I don't think "Courthouse" would be either. --WikiTiki89 17:09, 19 November 2014 (UTC)
What about Wiktionary:Dispute resolution? —CodeCat 17:19, 19 November 2014 (UTC)
That is currently a redirect to Help:Dispute resolution, which says, among other things, that if you have a dispute with a user, come here. That page will have to be reworded when the user conduct page is created. FWIW, I think the user conduct page would be more expansive than just dispute resolution. Purplebackpack89 17:37, 19 November 2014 (UTC)
"Dispute resolution" is something that can be done on userpages. This new page would be for when the dispute resolution fails. --WikiTiki89 18:17, 19 November 2014 (UTC)
...like above, when another editor arbitrarily decides any attempt to communicate with him is vandalism and immediately deletes it. Purplebackpack89 18:26, 19 November 2014 (UTC)
How about Court of last resort? —Stephen (Talk) 19:40, 19 November 2014 (UTC)
As a five-time administrator, I put my name forward for being the judge in the courthouse. --Type56op9 (talk) 14:36, 20 November 2014 (UTC)
In my experience the accusations of harassment and evidence gathering and posse forming which go along with it often result in more abuse than the instigating incident. I don't think adding a venue for tong wars will do anybody any favors. I would go so far as to say that disputes between users are better settled off of the site. - TheDaveRoss 23:09, 19 November 2014 (UTC)
I would have agreed, except that some users are always going to complain and giving them a place for it will prevent the Beer parlour from being disrupted by the complaints. It's like creating designated smoking areas versus banning smoking altogether. --WikiTiki89 23:16, 19 November 2014 (UTC)
I support the creation of Wiktionary:Courthouse, and with that name (and, consequently, with the shortcut WT:CH), if only to get all the recent whining off my watchlist via WT:BP and WT:RFD. — I.S.M.E.T.A. 21:52, 20 November 2014 (UTC)
I suspect this will become some sort of sensationalistic, exhibitionistic bullshit like Judge Judy. Let's do it. Equinox 02:58, 21 November 2014 (UTC)
I support this idea. It will be a place for just average block discussions, rather than for just emergency vandal reports. WT:VANDAL has in the past also been used for regular block discussions, which gives another advantage to WT:Courthouse. Rædi Stædi Yæti {-skriv til mig-} 03:36, 21 November 2014 (UTC)
  • How about calling it WT:Theater since it will be for nothing but drama? —Aɴɢʀ (talk) 21:18, 24 November 2014 (UTC)
    That could be a redirect. --WikiTiki89 15:26, 25 November 2014 (UTC)
  • I oppose this proliferation of discussion forums. User conduct is hardly ever discussed in public forums; user talk pages are usually enough for that. We already have too many forums. --Dan Polansky (talk) 01:23, 29 November 2014 (UTC)
    • Dan, what do you do if somebody refuses to engage? Or if you're at an impasse that can only be solved by a third set of eyes? Purplebackpack89 02:02, 29 November 2014 (UTC)

Wiktionary:Criteria_for_inclusion#Inflections

Hi,

According to Wiktionary:Criteria for inclusion#Inflections, are entries like keeps one's options open to be deleted? (cf. keep one's options open).

Regards, — Automatik (talk) 20:23, 20 November 2014 (UTC)

Links to Appendix:Glossary in Template:inflection of

I've added a feature to this template (in its module Module:form of) that automatically links to the glossary definition of a given grammar tag, if it exists. For example:

Not all of the recognised grammar tags have glossary entries yet. I hope this is useful in any case. This feature would make it more desirable to use this template instead of {{form of}}, at least as long as it's possible. I will probably also add this feature to the "shortcut" templates like {{plural of}}, {{accusative of}} and so on. —CodeCat 18:12, 21 November 2014 (UTC)

Oddity in Template:context

(Not sure if this is BP or GP material...)

I was just adding context labels to two Chinese entries (traditional spelling 庫納 and simplified 库纳) to clarify that the meanings here are about a currency. I added {{context|lang=zh|currency}}. Confusingly, and incorrectly, this adds (numismatics) to the visible page, although it does add the page to Category:zh:Currency as expected.

I've left the context labels in place on those two entries. Could someone more familiar with the context infrastructure see about changing this behavior? Numismatics is specific to coinage, whereas currency is about more than just coins. TIA, ‑‑ Eiríkr Útlendi │ Tala við mig 19:30, 21 November 2014 (UTC)

This is all handled in Module:labels/data. You can remove the following lines, if you think that is the right thing to do:
aliases["currency"] = "numismatics"
deprecated["currency"] = true
--WikiTiki89 19:50, 21 November 2014 (UTC)
Some unelected person made the unsanctioned decision to deprecate the currency topical tag. Is it clear to anyone what the logic of these labels is? Is this deprecated because some believe topic categories are not a good idea or because this one is not to their taste? Is it explained anywhere accessible? It certainly doesn't fit the documentation of {{context}}.
In the absence of any particular sanction for such, I guess you can do whatever makes sense to you. Even if something ends up broken, it wouldn't be all bad: it might create some pressure to make some actual community decisions about this kind of thing. DCDuring TALK 19:55, 21 November 2014 (UTC)

Actually, the label only adds to the "Money" category. This is because the "numismatics" label is defined twice:

labels["numismatics"] = {
	display = "numismatics",
	topical_categories = {"Currency"} }
-- ...
labels["numismatics"] = {
	topical_categories = {"Money"} }

The following entries use the "currency" label as of now:

Use that list as you please. Keφr 21:15, 21 November 2014 (UTC)
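
As an illustration of the two points above (the alias from "currency" to "numismatics", and the second definition replacing the first), here is a minimal sketch. It is a Python analogue of the quoted Lua tables, not the module's actual code: only the table contents come from the discussion, while the resolve() helper and the fallback to the label name when no display field exists are assumptions made for the example.

```python
# Python analogue of the Lua tables quoted above from Module:labels/data.
# Only the table contents come from the discussion; resolve() is hypothetical.
aliases = {"currency": "numismatics"}

labels = {}
labels["numismatics"] = {
    "display": "numismatics",
    "topical_categories": ["Currency"],
}
# The second assignment replaces the whole entry; nothing is merged.
labels["numismatics"] = {
    "topical_categories": ["Money"],
}


def resolve(label):
    """Follow an alias, then return (displayed text, categories) for a label."""
    canonical = aliases.get(label, label)
    data = labels.get(canonical, {})
    # Assumption: with no "display" field, the canonical label name is shown.
    display = data.get("display", canonical)
    return display, data.get("topical_categories", [])


display, categories = resolve("currency")
print(display, categories)
```

Under these assumptions the "Currency" category from the first definition is lost entirely, which would be consistent with the label only adding entries to "Money".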

Note that the {{context}} and {{label}} tags aren't supposed to be about clarifying which meaning of a word is being referenced. That's what {{gloss}} is for. The context tags are for labeling technical terms within a certain field. Numismatics is a field that has technical terms, but currency isn't. —Aɴɢʀ (talk) 21:19, 21 November 2014 (UTC)
We have many other labels like that, including "cardinal", "ordinal", "personal" and so on. We should probably get rid of those. —CodeCat 22:50, 21 November 2014 (UTC)
Those seem less clear cut than the more purely topical labels. Like other dictionaries we use beginning-of-the-line labels to convey grammatical information that qualifies and clarifies the definition. The labels mentioned above sometimes convey grammatical information. For example, in English, ordinals normally fit only in certain slots relative to determiners and adjectives in NPs. DCDuring TALK 15:56, 22 November 2014 (UTC)
But does that mean they should be a context label? "Ordinal" is not a context, it's a description of the semantic function of the word. —CodeCat 16:00, 22 November 2014 (UTC)
So are the labels about transitivity and countability. The convention among dictionaries is generally placement at the beginning of the line. (We rejected the other convention of having a separate header for transitive and intransitive.) Dictionaries that have information on complements (eg, head of following PP), semantic restrictions of what is modified, and orthography also place that at the beginning of the definition line. DCDuring TALK 16:27, 22 November 2014 (UTC)
Transitivity is contextual though. Verbs can have different meanings depending on the presence of an object. —CodeCat 18:38, 22 November 2014 (UTC)
Why should we care about what you or I think is or is not 'contextual' or 'grammatical'? We are discussing a user interface. The sole question of importance is what and where would users expect certain classes of information. I was simply arguing against another case of jumping to - and acting on - premature conclusions.
I like the idea of putting topical information after the definition, if we have it at all. Are you saying that this is information that belongs after the definition or on the inflection line or in a usage note or that it doesn't belong in the entry at all? DCDuring TALK 19:34, 22 November 2014 (UTC)
"cardinal" and "ordinal" don't need to be displayed because that should already be obvious from the definition and the part of speech. "personal" could just be a gloss after the definition, but only if it's needed to qualify it (for example who alone could be both a personal and relative pronoun). —CodeCat 20:37, 22 November 2014 (UTC)
I might agree with this conclusion, in most applications, but the evidence of how you reach it scares me. "Obvious" to whom?
Is it obvious from the PoS header "Adjective" or from the PoS header "Numeral"? Both of them are applied to English ordinal numbers. BTW, do we still have runs of entries with unusual or no-longer-conforming headers? DCDuring TALK 23:18, 22 November 2014 (UTC)
@CodeCat: At the very least next and last are ordinals that depart from the numeral pattern. Presumably, next to last, ultimate, penultimate, antepenultimate are also ordinals. My imagination and memory both fail to provide me with others, but I suspect there are more, some probably used only in special contexts. I think all of these have common characteristics in terms of order in noun phrases: '[det] [ord] [card] [adj]s [N]' (much less often '[det] [card] [ord] [adj]s [N]'), not *'[det] [adj]s [card] [ord] [N]', nor *'[det] [card] [adj]s [ord] [adj]s [N]', etc. Ie, 'the last six red cars' (or ?'the six last red cars'). I think this kind of characteristic suggests that we need to keep the 'ordinal' label. I have the strong suspicion that it is only by ignoring some of the fine-grained features of syntax that we can feel free to ignore this kind of essentially lexical information. I wonder whether we shouldn't actually apply an RfD process to any deletions of data from the modules. DCDuring TALK 01:25, 25 November 2014 (UTC)
I don't see why the meanings of those terms can't be expressed without putting {{lb|en|ordinal}} in front. —CodeCat 01:50, 25 November 2014 (UTC)
It is a question of SYNTAX, not semantics. That is exactly the sort of thing that we put in front-of-definition labels. By labeling them properly we can help folks use them properly without having to have a common usage note in every English entry that has an ordinal sense. Wiping out content that you don't seem to understand does not engender trust in your decisions. DCDuring TALK 04:46, 25 November 2014 (UTC)
What Angr said (at 21:19, 21 November 2014 UTC). "Currency" is a gloss, not a context. - -sche (discuss) 02:01, 25 November 2014 (UTC)

Wiktionary:Votes/sy-2014-11/User:ObsequiousNewt for admin

Hello all. I'm just posting this here to notify everyone that the vote to confer administratorship on User:ObsequiousNewt began twenty-five minutes ago; the vote will end at 24:00, 10 December 2014 (UTC). — I.S.M.E.T.A. 00:31, 25 November 2014 (UTC)

Proposed new page: Wiktionary:Errors and omissions

We often get feedback about things that are incorrect or missing in our entries. This is very useful, but having it on the feedback page puts it out of view of some of the editors who might fix it. Furthermore, the feedback is only for anonymous users, registered editors can't use it (the link simply doesn't show up). So I believe it would be beneficial to have a single page, somewhat like a discussion page but without any real "discussion", where IPs and logged-in users alike can report mistakes in entries. This would primarily be used by people who don't have the linguistic (might not be fluent enough to be sure) or technical (might not understand our code) knowledge to fix it themselves. —CodeCat 22:41, 25 November 2014 (UTC)

Maybe... though to me this feels slightly redundant to individual entries' talk pages (though I know they can be overlooked) and the Tea Room. Equinox 22:52, 25 November 2014 (UTC)
My intent was to create a page that is focused more on reporting errors and less on discussing them, which the Tea Room is more about. It's definitely not obvious to me (or to many anonymous users, I imagine) that the TR should be used to report mistakes. —CodeCat 22:59, 25 November 2014 (UTC)
But it could be made obvious, e.g. by having a link on every mainspace entry, visible to both registered and unregistered users, saying "Find an error or omission? Please report it to the Tea Room." —Aɴɢʀ (talk) 23:02, 25 November 2014 (UTC)
I like the idea. Putting the items in TR could overwhelm the place.
Could we do something to really encourage timid or uncertain users to report errors and omissions? For example, imagine a tab or something on the entry page with an invitation to report an error, with the report either on the talk with a link thereto on the proposed page or directly on the proposed page. DCDuring TALK 01:05, 26 November 2014 (UTC)
I definitely like the idea of the "report" button, I was thinking of something like that as well. It would streamline the process for the user and make it more likely that they will report things, which helps us in turn. —CodeCat 01:10, 26 November 2014 (UTC)
I agree, let's use the tea room and talk pages rather than an entirely new page. WT:FEED not ideal but still better than nothing. Remember a lot of these corrections will turn out to be wrong or unprovable. Renard Migrant (talk) 18:20, 26 November 2014 (UTC)
I agree with the RM about the likely low yield (less than 50%, possibly much lower) of usable 'reports'. That seems to me to be a reason not to put such items on WT:TR. I do think we need a single page that has all the instances of such reports. Whether it would be populated with the 'reports' and links to the entry or with links to the section of the entry talk pages that contained the 'report' is worth discussion. I suppose leading users to the talk page has some advantages, especially as the issue may have been discussed before.
Is there any benefit to a limited test or is the best test full implementation with attentiveness to user response to the new invitation to 'report'? DCDuring TALK 15:32, 29 November 2014 (UTC)

Inflecting alternative forms

As an Ancient Greek editor, I have come across many words that have dialectal or alternative forms. For verbs, this can mean that different tenses have different alternative forms. Policy only states that the alternative lemma should not have a full entry, and what I have found is not clear as to which headers should be included. Therefore I ask: should inflection tables of alternative forms go only on the lemma entry, only on the alternative form entry, or on both?

I'd hold that they should at least appear on the lemma entry, if only for the reason that, in the case of GRC verbs, most alternative forms will be for non-present tenses. In fact, I would favor including nothing in an alternative form's entry but the Alternative forms, Pronunciation, and POS headers; the lemma should hold all of the verb's information, while alternative forms should be only a soft redirect. The argument against this, with regards to inflection at least, would be that the inflection of the alternative form is relevant to the alternative form and not the lemma- e.g. that the inflection of δᾶμος (dâmos) is relevant only to the alternative lemma δᾶμος (dâmos) and not the standard form δῆμος (dêmos). ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 16:04, 28 November 2014 (UTC)

Inflections can be added to alternative forms. — Ungoliant (falai) 16:11, 28 November 2014 (UTC)
@Ungoliant MMDCCLXIV: Do you mean that this is the practice, or that you think it should be? And if so, should they be added to the lemma entry as well? ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 16:43, 28 November 2014 (UTC)
It’s the common practice, at least for the languages that I read and edit (and also what I think it should be).
I don’t recall ever seeing the inflection of an alternative form displayed in the lemma. If the peculiarities of Ancient Greek mean it is better to list all the inflections in one place, you should convene the Ancient Greek contributors to see what they think, and write something in WT:AGRC if they agree it requires special treatment. — Ungoliant (falai) 16:54, 28 November 2014 (UTC)
I always include full inflection in alternative forms. —CodeCat 16:55, 28 November 2014 (UTC)

Reordering the years on Requested entries pages

On Wiktionary Talk:Requested entries (English)#Reordering the years was a brief discussion of the benefits of having the years appear in descending order for each letter. The principal benefit would be the likely elimination of the need to move items to the current year from the first year listed when a requester accidentally places it in the wrong year. Users who click on the letter will find a list in which to place their request and will from time to time fail to notice the year section heading. Even users who do find the right year heading need extra pagedowns or scrolls to do so. Other than the work to do it and the modest change for habituated users of such pages I can't think of significant reasons not to do this. (The option of having separate sections for each year with alphabetical ordering thereunder would make it harder to find requests in different years that were duplicates or near duplicates.) DCDuring TALK 15:12, 29 November 2014 (UTC)

@DCDuring: Seems sensible. Go for it. — I.S.M.E.T.A. 22:57, 29 November 2014 (UTC)
Ordering by year before letter might make even more sense, so that people can focus on the oldest ones first. —CodeCat 23:13, 29 November 2014 (UTC)
@CodeCat: I assume you mean ascending order by year. Did you think the whole alphabet of requests for 2011 should appear before, say, those for 2015?
I am not concerned so much about what experienced contributors do, though I would not want to inconvenience them, as much as I am concerned about reducing needless user error, especially, but not limited to, newbies. Our veteran contributors can make decisions about the kind of requests they would like to fill (or delete) by whatever criteria they have, probably little influenced by how the page is presented. DCDuring TALK 00:02, 30 November 2014 (UTC)
@DCDuring: As a frequent editor of the page, I think CodeCat's idea is the most efficient. Year should be given priority over letter, as even the most experienced users would be much more likely to refocus their efforts on the older entries simply because they are further up the page, and they would thereby be completed more quickly. If they are being completed more quickly, the inexperienced users would be less likely to even stumble upon the entries from the older years as they would already be completed. I support the replacement of the year and letter sections with each other. WikiWinters (talk) 22:39, 30 November 2014 (UTC)
I am not entirely certain what you mean by priority. Are you saying that all 2011 items should precede all 2012 items etc, down to the present? I contrast your view with that of another frequent user of the page, Equinox, who seems to think that almost anything more than a year old is not worthwhile. I would have thought that new additions are most worthy of consideration. Consideration leads to entry creation, removal from the list, or kicking the can down the road. After a few contributors have given an item such consideration, presumably the item is too specialized for any current contributor. Why would we want to force folks who have already looked at the item to keep on looking at it or to expend keystrokes to avoid looking at it? DCDuring TALK 22:59, 30 November 2014 (UTC)
@DCDuring: I mean that, rather than having sections for each letter and sorting each letter's entries by year, I propose that there be sections for each year (only the years that contain entries currently) and that within the year sections there be smaller sections for each letter, so it would be the opposite of what it is now. Many of the older entries easily meet the CFI requirements, and, while standard protocol is for the entries to be open to discussion and for the community to leave entries as they are, their theoretical placement above all of the newer entries would make it easier for all users, regardless of level of experience, to notice them. It's obviously expected that people not immediately delete entries because they don't think they meet CFI, but with this new system, experienced users would not have to scroll down as much and would notice the entries that do not meet CFI more easily, and new users would be more likely to attempt to create the first entries they see, which would be the ones from the oldest year, at the top. Newer entries tend to meet CFI more easily, simply because the older entries generally stay only after users see that they are older and are therefore probably "there for a reason," while the newer ones are generally treated more at face value and less by seniority, so as to be dealt with accordingly at a more rapid and efficient rate. WikiWinters (talk) 23:32, 30 November 2014 (UTC)
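A minimal sketch of the regrouping proposed above, assuming the page can be represented as a nested mapping of letter to year to requested terms. The function name `regroup_by_year` and the sample terms are hypothetical; nothing here reflects how the request pages are actually stored or edited.

```python
def regroup_by_year(by_letter):
    """Turn {letter: {year: [terms]}} into {year: {letter: [terms]}}.

    Years come out oldest first, and letters are alphabetical within
    each year, which is the ordering suggested in the post above.
    """
    by_year = {}
    for letter, years in by_letter.items():
        for year, terms in years.items():
            # Copy the list so the input structure is left untouched.
            by_year.setdefault(year, {})[letter] = list(terms)
    return {year: dict(sorted(by_year[year].items())) for year in sorted(by_year)}


# Hypothetical page content, keyed by letter section and then by year.
page = {"B": {2014: ["bar"], 2011: ["baz"]}, "A": {2014: ["foo"]}}
reordered = regroup_by_year(page)
```

Only years that actually contain requests appear in the result, matching the "only the years that contain entries currently" condition.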
SemperBlotto used to wipe the 12-month-old requests every year (around Christmas, in fact; I like to imagine he thrust them into a cheerful Dickensian fire). I felt this was a good idea, since important words would keep coming up again and again, while dross (like much of the WT:REE page) would be lost. But someone objected to this, which is why the page is now full of crap nobody will create, but nobody dares to delete. Equinox 02:59, 30 November 2014 (UTC)
Pushing the old year sections to the bottom of each letter section is somewhat in that direction, but I was mostly trying to reduce keystrokes without a lot of controversy. DCDuring TALK 03:39, 30 November 2014 (UTC)
@Equinox, SemperBlotto: Would SB now delete everything from 2011 and 2012? So, most of the time, except December, we would have two years of requests? DCDuring TALK 22:59, 30 November 2014 (UTC)
@Equinox: How about moving all requests from the calendar year before last to subpages (e.g., WT:RE:en/2012 and previous)? — I.S.M.E.T.A. 09:33, 30 November 2014 (UTC)
A possibility that would mean that duplicates would be harder to discover. I suppose one could argue that it is the more recent addition that better indicates degree of interest. DCDuring TALK 13:16, 30 November 2014 (UTC)
@DCDuring: My thinking was that such consignment to subpages would have the decluttering effect of SemperBlotto's Dickensian fire, but without the actual loss / deletion that some have objected to. The presence of duplicates is pretty unimportant; they can be cleared out as blue links every few months. — I.S.M.E.T.A. 14:58, 30 November 2014 (UTC)
I sometimes sign requests <small>~~~~</small>. It shows who made the request and when. Maybe get rid of the years all together as it isn't when the request was made that matters. Renard Migrant (talk) 16:40, 1 December 2014 (UTC)
Signatures make the page larger and might force us to a subpage structure.
In the absence of signatures, the year conveys something: that none of those who fill requests have been able or willing to fill the specific request over a time period, nor have they felt free to delete it. It is obvious to me that fresh requests are more actionable (whether the action be fulfillment, deletion, or comment) than stale ones.
Perhaps we need some kind of workflow-related organization of requests: automatic linking to component terms (as in {{head}}?) and a timestamp would be the best initial presentation, possibly together with determination whether the item was a duplicate of a previous request. This would be a better base for subsequent decisions, such as fulfillment or deletion. Items not rapidly fulfilled or deleted could be manually labelled with "issue" tags like "attestable?", "SoP?", "spelling?", "language?", or tagged as topically specialized. Thus subsequent reviewers of the items would not be starting over and could pick items based on the issues or topical specialty involved. DCDuring TALK 17:17, 1 December 2014 (UTC)
If we absolutely must keep every addition to the page, then I think we need a somewhat "intelligent" script or database that can track all additions, and boost the popularity of a word when a second (third, fourth...) person requests it, etc.; and this will at least give us a numerical justification for deleting very old and unpopular requests. But it seems like massive overkill. I still prefer the idea of wiping the page every couple of years. Equinox 06:08, 17 December 2014 (UTC)
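A rough sketch of the kind of tracking script described above, assuming each request could be logged as a (term, requester, date) record. The function names, the two-year age cutoff and the two-requester threshold are all illustrative assumptions, not an existing tool or an agreed policy.

```python
from datetime import date


def tally_requests(log):
    """Collect, per term, who requested it and when it was first and last requested."""
    stats = {}
    for term, requester, when in log:
        entry = stats.setdefault(
            term, {"requesters": set(), "first": when, "last": when}
        )
        entry["requesters"].add(requester)
        entry["first"] = min(entry["first"], when)
        entry["last"] = max(entry["last"], when)
    return stats


def stale_unpopular(stats, today, max_age_days=730, min_requesters=2):
    """List terms not re-requested within max_age_days and asked for by few people."""
    return sorted(
        term
        for term, entry in stats.items()
        if (today - entry["last"]).days > max_age_days
        and len(entry["requesters"]) < min_requesters
    )


# Hypothetical request log: "foo" was asked for twice, "bar" only once, long ago.
log = [
    ("foo", "UserA", date(2011, 1, 1)),
    ("foo", "UserB", date(2014, 6, 1)),
    ("bar", "UserA", date(2011, 3, 1)),
]
stats = tally_requests(log)
candidates = stale_unpopular(stats, today=date(2014, 12, 17))
```

The number of distinct requesters serves as the "popularity" figure, so a repeat request from the same person does not inflate it.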

Html button

Hello everyone! I had an idea for a new button: an HTML button. It can completely allow HTML and gives people the ability to make entries better.

Supersonic414-On Wikia 00:09, 30 November 2014 (UTC)

And what would it be used for exactly? —CodeCat 00:38, 30 November 2014 (UTC)
The only button that makes things better is the secret red one that nukes everybody on the planet. Allowing free HTML on here would just encourage stupid exploits like JavaScript. Equinox 03:10, 30 November 2014 (UTC)

Categorizing misspellings

Please comment: I went to categorize Grauniad under Category:English misspellings, but the category's introduction says that it is for unintentional misspellings. Do we have or should we have some scheme for navigating deliberate misspellings? cf. eye spellings like da and tha for "the". —Justin (koavf)TCM 20:08, 30 November 2014 (UTC)

We have {{eye dialect|...|lang=xx}}. But for deliberate misspellings, well, if you deliberately misspell something, you spell it a way that is not in the dictionary; so I don't see why we should expect to include these until/unless they become mainstream spellings. See the recent discussion on rediculous (which CodeCat spells that way on purpose). Equinox 20:18, 30 November 2014 (UTC)
Not all deliberate misspellings are eye dialect, though; Grauniad isn't. Category:English misspellings does already contain some terms labeled "deliberate misspellings", though, e.g. enuf. In fact, in that entry I see that {{deliberate misspelling of}} automatically categorizes into the Misspellings category. —Aɴɢʀ (talk) 21:04, 30 November 2014 (UTC)
What about "fictional" misspellings? I'm thinking of Jane Austen Persuasion, end of chapter VI: ‘... mentioning him in strong, though not perfectly well-spelt praise, as "a fine dashing felow, only two perticular about the school-master," ...’ (bolding added) [1]. Do these get automatic entries? They are fictional eye-dialect, but of course, deliberate by Austen. Choor monster (talk) 21:12, 30 November 2014 (UTC)
Perticular would probably be attestable as eye-dialect. I hear that pronunciation sometimes. DCDuring TALK 23:06, 30 November 2014 (UTC)
It's already an entry perticular, where it's marked as "obsolete". Choor monster (talk) 23:25, 30 November 2014 (UTC)
Change the category description. Renard Migrant (talk) 16:47, 1 December 2014 (UTC)
To what? The OED makes it clear that it's obsolete. I gave the Austen cite as a prominent example of a deliberate fictional misspelling, not eye-dialect, a distinct kind of misspelling from something like "rediculous" or "nucular". Usually these are one-offs, but because Austen is Austen, maybe it rates an entry. Similar, but not rating an entry would be from the end of Miss MacIntosh, My Darling: "She would hang a sign in the restaurant window. Owt to luntsch. Bee bak in a whale. For she could not spell either." Choor monster (talk) 17:21, 1 December 2014 (UTC)
No, change the category description, the written statement at the top of the CATEGORY. Renard Migrant (talk) 23:12, 2 December 2014 (UTC)

Eye spellings: Don't go too off the rails about eye spellings: this isn't one. I only mentioned them as another type of misspelling which is not accidental. —Justin (koavf)TCM 10:52, 2 December 2014 (UTC)

Koavf's change to the description is pretty good. But what do you think of making the wording even more compact, like:
  • "Common accidental (or sometimes deliberate) misspellings of {{{langname}}} terms."
 ? - -sche (discuss) 18:26, 9 December 2014 (UTC)

December 2014

Is 'label' the new 'context'?

As in [2] and [3]. If so, what has 'label' got that 'context' hasn't? Is it just a younger model? I can't keep track of the never-ending cycle of template changes. Kaixinguo (talk) 13:00, 1 December 2014 (UTC)

The language code goes in the first positional parameter (instead of named |lang=), which some find convenient. Also, there are some definition labels which are not strictly contexts, so the name is a bit more accurate. Keφr 14:10, 1 December 2014 (UTC)
Thank you. I need to look into when I should be using 'label', then. Kaixinguo (talk) 10:33, 2 December 2014 (UTC)
@Kaixinguo: The edits of Embryomystic are not supported by consensus. The use of {{label}} is not standard; it currently sees a tiny minority use. See also Wiktionary:Votes/2014-08/Templates context and label and also the talk page Wiktionary talk:Votes/2014-08/Templates context and label. --Dan Polansky (talk) 18:32, 5 December 2014 (UTC)
I'll be honest and say that I prefer {{label}} over {{context}}, but the main reason for those edits was to add language. embryomystic (talk) 23:40, 6 December 2014 (UTC)

Allow checking translations with the translation editor

I think it might be nice if the translation editor could somehow allow translations that need checking to be marked as "checked". That way people won't need to edit the entry anymore, which speeds things up. —CodeCat 19:14, 1 December 2014 (UTC)

New changes to Chinese entries

Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

(Notifying Kc kennylau, Atitarev, Tooironic, Jamesjiao, Bumm13, Meihouwang):

Dear Chinese-language editors,

As the presence of Chinese entries grows rapidly on Wiktionary, I would like to propose some further changes to the format of Chinese entries. These changes aim to reduce the workload of Chinese-language editors (more specifically, avoid data reduplication and synchronisation hassles) and further neatify the code of Chinese entries. The changes include:

  1. Introduction of "lemma forms" for Chinese entries. This follows from sporadic suggestions and discussions raised before, such as at Talk:個. Copying my post there, the arguments for the introduction of lemma forms are:
... introducing the idea of lemma forms, so that information is all kept centralised and the trad-simp entries do not have to be synchronised. Currently the supposedly established practice of synchronisation is poorly maintained, from what I observe from my bot's sweeping edits. Conceivably the lemma forms should be traditional, since trad-to-simp conversion can be performed reasonably reliably. It's not because I discriminate against simp; I grew up with those characters too. That way we could just enable automatic trad-to-simp conversion in zh-usex, and all the information on the page (with the minor exception of the title) would be di-scripted.
In detail, this lemma form proposal would entail:
  • Centralising all information (etymology, pronunciation, definitions, "see-also" terms, compounds) at the traditional form, which is considered the lemma form of a Chinese word. If multiple traditional forms exist, the most common form is chosen as the lemma form.
  • All Chinese text under the Chinese header, including example sentences, related terms, synonyms/antonyms would include both scripts. Example sentences in both scripts can be generated automatically by the template {{zh-usex}}.
  • Other non-lemma forms will be converted to a soft-redirect - see User:Wyang/历史 and User:Wyang/语. The format of these redirects is negotiable but the principle is they should contain as little information as possible.
  2. Adopting a new neater version of the Hanzi box - {{zh-forms}} (backend is Module:zh-forms). Now that many complex editing tasks could be performed automatically with the Lua language, there is no need for partial manual coding of the Hanzi box as is currently implemented. Instead of the code
{{zh-hanzi-box|[[电脑]]|[[電]][[腦]]}}
at 電腦, one can use
{{zh-forms|s=电脑}}.
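To illustrate the kind of manual coding that could be automated, here is a small sketch that rebuilds the older {{zh-hanzi-box}} call from just the page title and the simplified form. The function `hanzi_box_wikitext` is hypothetical and written in Python for illustration only; it is not the code of Module:zh-forms, which is a Lua module and is not reproduced here.

```python
def hanzi_box_wikitext(trad_title, simp):
    """Build the old-style manual box call from a traditional title and its simplified form.

    The simplified form is linked as a whole word and the traditional form
    character by character, mirroring the manual example quoted above.
    """
    simp_link = "[[" + simp + "]]"
    trad_links = "".join("[[" + ch + "]]" for ch in trad_title)
    return "{{zh-hanzi-box|" + simp_link + "|" + trad_links + "}}"


# The page title supplies the traditional form; only s= has to be typed by hand.
box = hanzi_box_wikitext("電腦", "电脑")
```

Since everything except the simplified form can be derived from the page title, a single s= parameter is enough input for this kind of box.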


Please let me know what you guys think about these changes.

Thank you!

Wyang (talk) 11:08, 2 December 2014 (UTC)

(Not an editor of Chinese) I always thought it was interesting but perhaps inconsistent that content has been removed from so-called British entries and focussed on the American spelling, the reason given being that it would be too much work to maintain both entries, whilst having a far greater number of simplified and traditional Chinese entries to maintain. Kaixinguo (talk) 11:24, 2 December 2014 (UTC)
Interesting, but not true. There is no rule or practice to consolidate only in favor of American entries: it can go either way. Chuck Entz (talk) 13:13, 2 December 2014 (UTC)
The idea of centralization sounds logical. I have a couple of questions:
  1. Will the centralized information on the lemma page be too crowded? Since both formats will be displayed.
  2. Will the Hanzi-box still display the characters in the order of simplified-traditional? Would it make more sense to switch the order to trad-simp now that the lemma page will be the traditional?
  3. The soft redirect will not categorize the simplified entries. Is that ok?
  4. Instead of the soft redirect, what about a simple page similar to alternative spelling that we use for other languages?
  5. Will the lemma page visibly contain both formats at the same time or can users set preferences to see only one of them? --Panda10 (talk) 14:31, 2 December 2014 (UTC)
Ping didn't work for me, repeating here @Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang:.
Support in principle. My preference is simplified for lemmas but if the rest decides traditional, I won't object. (I will give my reasons for preferring simplified over traditional later.)
Any dictionary, print or online, uses just one form for articles, providing the other form for reference. Yes, keeping the info centralised is the main issue. --Anatoli T. (обсудить/вклад) 21:06, 2 December 2014 (UTC)

  • I also support this proposal in theory. It would save myself and other Chinese editors a lot of time dealing with the synchronising work. Using the traditional form as the lemma makes sense, since mapping from trad->simp can be easily automatised, while simp->trad would require the intervention of human editors. Are we at the stage now where we can see a model of this proposed change? ---> Tooironic (talk)
    • Another argument for traditional is that it's more widely represented within the time span of the language we call "Chinese". Simplified is only about... half a century old? —CodeCat 23:02, 2 December 2014 (UTC)
Pro-simplified arguments:
  1. Much more common today. Used in (mainland) China (also Singapore, Malaysia, Indonesia) as the obviously biggest user of the Chinese language (in any form). Not sure if there are statistics about it but you can guess that the explosion of Chinese on the Internet is due to China proper.
  2. Some will disagree but I doubt China will go back to traditional characters. Traditional characters are used only in Taiwan and Hong Kong. The status of both is less than a fully independent state. Taiwan government is working hard on preservation of traditional characters for a reason. Many Hong Kong citizens are fluent in English, so foreigners don't need to know Chinese (Cantonese or Mandarin) to get by.
  3. Overseas Chinese almost completely switched to simplified. Universities mostly use it in education. Also preferred by learners for practical reasons.
  4. Ancient books are all converted to simplified spellings, including works in Classical Chinese. How often do you need to read ancient books?
  5. Japanese shinjitai is also relatively new but the battle shinjitai vs kyūjitai (pre-reform spellings) is almost over. The simplification process wasn't agreed on with communities outside China, so the resistance is still big and anti-simplification propaganda affects some people. It shouldn't be politicised and there's no need to link simplified Chinese with communism. It has happened, compare with 1918 reform of the Russian spelling. Although there are flaws, there are obvious benefits.
  6. Simplified Chinese jiantizi is standardized much better than traditional Chinese fantizi. It's fantizi that still has more variants, obscure characters and IME (input methods) are better suited for simplified Chinese.
  7. Although traditional Chinese is the original form and links better to Sino-Xenic derivations, it's not always true. Japanese has its own simplification - shinjitai, which matches 30% (I guess) with jiantizi and has its own, Japanese specific forms. Korean and Vietnamese have their variants to some extent and for these languages, Chinese characters are no longer the writing system they use, especially Vietnamese.
  8. Most Chinese contributors are from mainland China, the majority of learners, IMHO, focus on simplified Chinese. I wonder how they feel when they find they have to click on the soft redirect links to get the info. --Anatoli T. (обсудить/вклад) 23:42, 2 December 2014 (UTC)
I wish to stress that I won't object to traditional over simplified, if everybody wants so, but the choice should be carefully weighed out. Languages and scripts are like currencies: it's not what we like but what's more common and practical. --Anatoli T. (обсудить/вклад) 01:11, 3 December 2014 (UTC)
Question: Is there a one-to-many correspondence between simplified and traditional characters, or is there a many-to-many correspondence? In other words, are there any traditional characters that correspond to more than one simplified character? --WikiTiki89 01:24, 3 December 2014 (UTC)
Usually it's a one-to-many correspondence between simplified and traditional characters (in this order). So a conversion from simplified to traditional would be harder but if traditional characters are used as a source for usage examples, etc, then it would still work. It's quite rare but a simplified version should be manually fed into templates, when there are variants. Automatic conversion tools work better (but not 100%) from traditional to simplified. --Anatoli T. (обсудить/вклад) 01:38, 3 December 2014 (UTC)
So you're saying there are traditional characters that have more than one simplified character, but very few of them? --WikiTiki89 01:43, 3 December 2014 (UTC)
Apparently, there are 19 (?) such traditional characters: , , , , , , , , , , , , , , , , , , . Most common/notable is probably . Wenlin editor always asks how you wish to convert the character to simplified. Don't fully trust Wiktionary on this, single-character entries need a lot of attention. --Anatoli T. (обсудить/вклад) 01:53, 3 December 2014 (UTC)
The funniest character is , which, when incorrectly translated causes mistranslations, like "dry food" becomes "fuck food". is both simplified and traditional for this sense. --Anatoli T. (обсудить/вклад) 01:59, 3 December 2014 (UTC)
That explains some of those Chinese mistranslation memes. Anyway, that means that if we have an automatic conversion from traditional to simplified (whatever it is used for), then it should require a manual conversion whenever those characters are present, right? So I have an idea that would let us have the lemmas at the simplified character entries and still make use of automatic conversion: At simplified character entries, we can group the definitions by traditional equivalents that are indicated in the headwords and include HTML anchors. Then the traditional character entries will link to the correct anchor at the simplified character entries, bringing the reader directly to the definitions he was looking for. --WikiTiki89 02:17, 3 December 2014 (UTC)
With automatic conversion it always requires the knowledge/intervention of an editor, whether you create a simplified or traditional character term, character forms and pinyin. See 台湾 (Taiwan) or its traditional equivalents 臺灣 and 台灣. Or others like 什么. The simplified entry could contain a usage example, where traditional characters are used with parameters when a different conversion is required:
什麼 [MSC, trad.]
什么 [MSC, simp.]
Zhè shì shénme? [Pinyin]
What is this?
--Anatoli T. (обсудить/вклад) 02:30, 3 December 2014 (UTC)
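The idea discussed above (convert traditional to simplified automatically, but require a manual choice where a traditional character has more than one simplified outcome) can be sketched as follows. The two tables are tiny illustrative samples, not complete data, and the function `to_simplified` with its `overrides` parameter is hypothetical; this is not how {{zh-usex}} is implemented.

```python
# Illustrative one-to-one mappings (traditional -> simplified); far from complete.
TRAD_TO_SIMP = {"電": "电", "腦": "脑", "歷": "历", "灣": "湾"}

# Sample traditional characters treated here as having more than one
# simplified outcome, so they are never converted without an editor's choice.
AMBIGUOUS = {"乾": ("干", "乾"), "著": ("着", "著")}


def to_simplified(text, overrides=None):
    """Convert text character by character.

    overrides maps a character position to the simplified form an editor chose.
    Returns the converted text plus a list of (position, character) pairs
    that still need a manual decision.
    """
    overrides = overrides or {}
    converted = []
    needs_review = []
    for index, ch in enumerate(text):
        if ch in AMBIGUOUS:
            if index in overrides:
                converted.append(overrides[index])
            else:
                # Leave the character as-is and flag it for an editor.
                converted.append(ch)
                needs_review.append((index, ch))
        else:
            # Characters missing from the table are assumed unchanged.
            converted.append(TRAD_TO_SIMP.get(ch, ch))
    return "".join(converted), needs_review
```

For example, `to_simplified("電腦")` converts cleanly, while `to_simplified("乾")` returns the character untouched together with a review entry until an editor supplies something like `overrides={0: "干"}`.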

Thank you for the replies. Answering Panda10's questions:

  1. Centralised information on the lemma page will not look crowded - the examples and derived terms will be in collapsed mode. Please see for an example.
  2. Good point. I have changed the order.
  3. Categorisation will be added.
  4. Personally I think the current format for alternative spellings will be excessive for non-lemma Chinese forms. Pronunciation, definitions, see-also terms will be 100% the same.
  5. Currently both are displayed, but envisageably some sort of gadget could be developed for this purpose, using what the Chinese Wikipedia and Wiktionary do.

@Tooironic: The new lemma forms would look like (with all Chinese text in both scripts), and the non-lemma forms would look like User:Wyang/历史 and User:Wyang/语 (or other formats if people prefer).

I entirely agree with CodeCat's reason that Traditional Chinese's long history is a factor not to be overlooked here. It is the reason that Hanyu Da Cidian, the most inclusive Chinese dictionary produced by PRC and in history, uses traditional forms as headwords and most of its citations. The main advantage would be the automation of the trad-simp conversions. Graphical etymologies and descendants would also be heaps easier when done at the traditional forms - see for example 風#Etymology and 學#Etymology.

The choice of script for the title is IMO not a crucial choice, since the title would be the only Chinese text that is not di-scripted on the lemma pages. (The title may even be di-scripted with gadgets.) Soft redirects shouldn't be too much of a problem - since information is not lost. Wyang (talk) 04:33, 3 December 2014 (UTC)

@Wyang: Let it be traditional then. Do we need a vote? If you set it up, I'll support it. I'm not sure if {{ping}} works, maybe we need to poll some editors manually. --Anatoli T. (обсудить/вклад) 21:24, 3 December 2014 (UTC)
This is Chinese so you need tone marks: {{pīng}}. —CodeCat 13:54, 4 December 2014 (UTC)
Is that or ? Chuck Entz (talk) 14:15, 4 December 2014 (UTC)
We can bombard those unresponsive with 乒乒乓乓. Wyang (talk) 03:13, 5 December 2014 (UTC)
OK, thanks Anatoli. Let's wait for a day or two, and we can start the vote then. Wyang (talk) 13:40, 4 December 2014 (UTC)
We have to work out details of the format (a template) and categorising (more detailed) of jiantizi, perhaps also about interwikis (Wiktionary:Grease_pit/2014/December#Interwiki_bots), since Chinese Wiktionary mainly uses jiantizi. It's a big job; hopefully it can be automated and simplified characters are not disadvantaged in usage examples, synonyms, etc. --Anatoli T. (обсудить/вклад) 21:47, 4 December 2014 (UTC)
OK, any suggestions for the template and the categorisation? Wyang (talk) 03:13, 5 December 2014 (UTC)
I think categorisation should possibly mirror traditional entries, if that's reasonable and possible (without having to edit the entry manually - done and forgotten). A simplified entry should have a definition line, and (IMHO) a short usage note (standard in PRC, Singapore, Malaysia, etc.). I'll give it a thought. --Anatoli T. (обсудить/вклад) 05:20, 5 December 2014 (UTC)
I added a note and categorisation - Please take a look at User:Wyang/历史 and User:Wyang/热爱. Wyang (talk) 07:20, 5 December 2014 (UTC)
I still find the dark gray template at the simplified entry too different from the current standards. Would it be too complicated to create a template with the below layout? How about adding the Hanzi box? Another topic: As the language develops over time, is there a chance that the simplified character will get a new meaning that the traditional character will not have?
==Chinese==

===Noun===
'''历史'''

# ''simplified form of'' '''[[歷史]]'''

====Usage notes====
* '''[[Simplified Chinese]]''' is mainly used in Mainland China and Singapore.
* '''[[Traditional Chinese]]''' is mainly used in Hong Kong, Macau and Taiwan.

--Panda10 (talk) 19:00, 5 December 2014 (UTC)

My preference would be to have minimal information on the non-lemma pages, i.e. link only. Definitions and other details (including parts of speech and hanzi box) are covered at the main entry, for clarity and ease of maintenance (e.g. 保险). Answer to the second question: No, they are strictly two versions of the same thing. Wyang (talk) 02:25, 6 December 2014 (UTC)

Support. I support this proposal. I was worried about categorisation but I read above that it will be handled. For the conflict between simplified and traditional, is it possible to have a template on the non-lemma page which will parse the lemma page, extract the parts corresponding to the non-lemma word, convert it to the non-lemma script and display it? This way, the lemma and non-lemma page will display the same information. Meihouwang (talk) 15:00, 6 December 2014 (UTC)

(I'm not a Chinese-speaker, but) I support this proposal. Compare how Swiss spellings like Strasse are soft-redirected. - -sche (discuss) 18:39, 9 December 2014 (UTC)

We can use an existing L3 header, which is legal! - ===Hanzi=== and probably not just for simplified but ALL Chinese entries, Wyang, you may get away with the ===Definitions=== idea after all but using ===Hanzi=== instead. Chinese may not need PoS headers:

==Chinese==

===Hanzi===
# ''simplified form of'' '''[[歷史]]'''

...notes follow

An entry where simplified is also traditional for some senses could use a normal format. --Anatoli T. (обсудить/вклад) 00:42, 7 December 2014 (UTC)

  • ===Hanzi=== was only for single-character entries, no? Much like the ===Kanji=== header used in Japanese single-character entries? ‑‑ Eiríkr Útlendi │ Tala við mig 01:02, 7 December 2014 (UTC)
Yes, that was the original intention but hanzi (and kanji) are invariable nouns. --Anatoli T. (обсудить/вклад) 01:14, 7 December 2014 (UTC)
  • Sorry, I don't understand your comment. I think you mean that the singular and plural forms are identical, which is fine, but also irrelevant to my intended point.
To expand upon my earlier question, if ===Hanzi=== has been used primarily in single-character entries as a header indicating information about that specific character that does not belong under any of the other headers (such as character composition, Unicode chart links, historical development), then the sample use above (in a multi-character entry, and not as a header indicating info about these characters, but rather as a generic non-POS header) strikes me as misleading and potentially confusing. ‑‑ Eiríkr Útlendi │ Tala við mig 07:11, 7 December 2014 (UTC)
The information (non-lexical) you're referring to is stored under the ==Translingual== L2 header, not under the ===Hanzi=== L3 header; everything lexical related to single or multiple character terms is the same - it's lexical, and transliterations and pronunciations in the new format are under ===Pronunciation===. Unlike Japanese (also Korean, Vietnamese) - hanzi is the only writing system for standard Chinese, so whether it's a single character (even a component) or a long word, they can all be handled under one header. Rather than using ===Definitions=== (there is no agreement on this heading and an administrator may potentially remove it), I suggest using ===Hanzi=== (which is legal) or we need to promote ===Definitions=== and make it legal. --Anatoli T. (обсудить/вклад) 09:34, 7 December 2014 (UTC)
I'm neutral on the choice of header, if any. My feeling is that this general format gives too little emphasis on the actual link, in that it does not specifically inform readers of where to find definitions and other content instead. I'm quite sure that people might start complaining about "no definition, no pronunciation" in Wiktionary:Feedback, since the redirect appears no different from a normal link and is not conspicuous enough. Wyang (talk) 20:23, 7 December 2014 (UTC)
You can add the information after the 'simplified form of 歷史' - in the same line - as you planned in the original gray box. --Panda10 (talk) 21:05, 7 December 2014 (UTC)
First things first. We need descriptive templates for links to traditional and usage notes templates, which can mention that all the info is in the traditional form entry. It's better to split them into two. Simplified characters may be also traditional for some senses or alternative forms, like or . "Hanzi" header may be used for both traditional and simplified entries. --Anatoli T. (обсудить/вклад) 23:05, 7 December 2014 (UTC)
How about this? Wyang (talk) 07:56, 8 December 2014 (UTC)
Thanks, it looks good BUT it may not pass the requirements on WT:ELE#Definitions and someone may complain and we'll have to redo as it was with Wiktionary:Votes/pl-2013-03/Japanese Romaji romanization - format and content and Wiktionary:Votes/pl-2013-03/Romanization and definition line. Maybe the links should have an L3 header (not generated by a template), e.g. ===Hanzi===? The problem with ===Definitions=== header is that it has not been approved yet, so a change may not be supported because a new header is introduced, not because of the change itself. Not sure about colours either, maybe no colours should be present in the redirect, just text. I'm just thinking of things that can possibly cause problems. --Anatoli T. (обсудить/вклад) 11:42, 8 December 2014 (UTC)
Since I am not an editor of Chinese, my apologies for my questions. But when I click on the traditional link in the redirect box, where is it supposed to jump to? To the translingual entry? Or to the Mandarin entry? Currently, it just goes to the top of the page.
Do you need the collapse function for the 2-line simplified/traditional usage note? Could they be centralized? For example: placed into their own box on the right, only once for the entire entry. Or no box, just simple text in small font, right under the Chinese L2 header, before any of the L3 headers start. It's a central note, valid for all ety's.
I agree with Anatoli that the colors may not be necessary in the redirect, just the text, although I understand that the color would highlight the fact that this is a redirect, not just a usual ety. For the text itself, there could be several variations, depending on which part should be first and which second. Just reducing the number of double quotes might help to simplify the look. Two possible examples:

Etymology 1

See ('dry') for pronunciation and definitions of . ( is the simplified form of .)

OR

Etymology 1

is the simplified form of . See ('dry') for pronunciation and definitions.

--Panda10 (talk) 18:06, 8 December 2014 (UTC)
Thank you both for the suggestions. My understanding is that Wiktionary:ELE serves as a format guide for normal entries (The first sentence on the page says "While the information below may represent some kind of “standard” form, it is not a set of rigid rules."). Wiktionary lacks policy or even precedents for soft-redirects, for situations where a multi-scripted language consistently redirects forms written in one script to the forms written in the other script. This is of a completely different nature to the "non-lemma forms" commonly mentioned before, which were mostly stylistic (naivety), compounding (sockpuppet), historical (anæmia) and erroneous (accomodation) variants, with very few being actual regional variants (compare the non-redirection of colour/color). As a consequence, there is no properly designed layout for this new category of soft-redirects, and the existing layout fails to give due emphasis to the link to information, misleading readers to think that the form is an uncommon (and unimportant) alternative variant. In the case of , the inconspicuousness of the redirections does not create the impression that two thirds of the definitions of the character should actually be found at the respective lemma forms.
Answering Panda10: The link to traditional links to the Chinese section, if it exists. The merger of Chinese variants is ongoing. The box will only show the notes if the variant type is "simplified". It could also be "ancient", "obsolete" or "variant", in which case the notes would not be appropriate. Characters like 干 are very rare - most will have only one note displayed, thus having notes underneath the link might be more explanatory. Wyang (talk) 00:09, 9 December 2014 (UTC)
The closest thing we have for a single-language full-entry soft redirect is {{no entry}}. —CodeCat 19:06, 9 December 2014 (UTC)
I would say {{pinyin reading of}} or {{ja-romaji}} are close equivalents to soft redirects. I personally don't have strong objections to Wyang's suggested format but I'm almost sure there will be strong opposition to the new format once we start changing entries. The votes and discussions on the Japanese romaji entry format are a good example of that, despite the seeming triviality of romanisation entries. Specifically, the definition line (starting with #, not generated by a template) and a PoS header (including "Definitions" or "Hanzi") should be sorted first - they need to get some kind of legitimacy. The topic is mostly ignored now by people who will surely raise their voice later. I don't want to create obstacles, I fully support the change (despite my preference for simplified) but I'm worried about the time and effort that could be spent. --Anatoli T. (обсудить/вклад) 21:35, 9 December 2014 (UTC)
If KassadBot (or similar) is restarted, it will flag these entries as incorrectly formatted. --Anatoli T. (обсудить/вклад) 21:53, 9 December 2014 (UTC)
{{pinyin reading of}} and {{ja-romaji}} are not for native scripts, whereas these redirects will be for native scripted forms and therefore need to be more eye-catching. IMO the first step is to make sure the introduction of lemma forms is agreed upon, and further changes to formats can be discussed once the first is established. A vote shouldn't be necessary if there is overwhelming support from related editors - and it seems from the discussion above that we can presume that step 1 is accepted by most, if not all. Wyang (talk) 09:02, 11 December 2014 (UTC)
I know they are not for native scripts but the lack of "#" on a definition line on romaji entries caused quite a stir (even if it was generated by a template). The new format examples introduce a Definitions header as well (when simp./trad. is shared in some cases), for which we still have opposition. I've made some 字 entries with this header, anyway. I think you can proceed. --Anatoli T. (обсудить/вклад) 00:47, 12 December 2014 (UTC)
  • For the record, I disagree with removing definitions from simplified entries. Let those who want to concentrate on traditional entries do so; they should not be forced to create simplified entries at the same time. --Dan Polansky (talk) 12:13, 14 December 2014 (UTC)
The problem is not about having or not wanting to create simplified entries but about having them badly out of sync. Long-timers have been putting a lot of effort into keeping them in sync but newcomers often fail to do so, especially with entries with derived terms, see also's, etc. It's admittedly hard to synchronise large entries. Please note that the changes User:Wyang made will allow showing both forms in usage examples, etc. Editors only need to provide the traditional form. For the record, no single published or online dictionary has identical contents for both traditional and simplified Chinese, it's always one or the other. The other form is also provided. Our objective is to have both forms for each usage example, synonym, etc. so that there is no information loss and users who have difficulties with traditional Chinese can use the simplified form right next to it (or below in multiline usexes). The only disadvantage (at this moment) to simplified Chinese users is having to click through the link, but even sophisticated electronic dictionaries are not able to show both forms in usexes; users have to set options as in Pleco or Wenlin. --Anatoli T. (обсудить/вклад) 10:48, 19 December 2014 (UTC)

FYI: Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese —This unsigned comment was added by Dan Polansky (talk • contribs).

The "soft redirect" is a very bad idea! 173.89.236.187 04:10, 26 December 2014 (UTC)
@Wyang, Kc kennylau, Tooironic, Jamesjiao, Bumm13, Meihouwang: The vote is on. --Anatoli T. (обсудить/вклад) 05:24, 26 December 2014 (UTC)
  • The following is my opinion as an active Chinese editor. According to what Atitarev tells me, we do not currently have a way to store both traditional and simplified Chinese entry content in a central database which can then be displayed at the corresponding trad and simp entries. Given this reality, I would support converting all simplified entries to hard-redirects to their traditional counterparts with the exception of 1) entries which have Japanese/Korean/Vietnamese readings, and 2) simplified entries which have more than one traditional conversion. In the case of these two scenarios, we can still provide the soft redirect. This may involve a bit of stuffing around, but I think it is the best compromise we can reach given the circumstances. I can tell you from personal experience that the hours of work I've spent on duplicating simp/trad entries and synchronising all the content (etymology, example sentences, synonyms, derived terms, etc.) are something I am very keen to say goodbye to. If we can get rid of this time-wasting activity altogether, we can boost the efficiency of our Chinese editors, and in turn largely increase our coverage of the Chinese language on the English Wiktionary. ---> Tooironic (talk) 06:05, 27 December 2014 (UTC)
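The rule proposed in the comment above can be written down as a small decision function. This is only an illustrative sketch: the function name and both parameters are hypothetical, nothing like it exists in any Wiktionary module, and the list of traditional forms given for 干 is an assumption made for the example.

```python
from typing import Sequence


def redirect_kind(traditional_forms: Sequence[str],
                  has_other_language_readings: bool) -> str:
    """Pick the redirect type for a simplified Chinese entry.

    Hypothetical helper mirroring the proposal: hard-redirect by default,
    soft-redirect when the page also carries Japanese/Korean/Vietnamese
    readings or when the simplified form maps to several traditional forms.
    """
    if has_other_language_readings or len(traditional_forms) > 1:
        return "soft"
    return "hard"


# One traditional counterpart and no other-language readings: hard redirect.
print(redirect_kind(["歷史"], has_other_language_readings=False))

# Several traditional counterparts (assumed list): soft redirect.
print(redirect_kind(["乾", "幹", "干"], has_other_language_readings=True))
```

Either exception alone is enough to keep the soft redirect, which is why the two conditions are joined with "or".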
The conversion to having simplified entries re-direct to traditional seems already to have begun. Should it not wait until the vote is over? Kaixinguo (talk) 02:49, 10 January 2015 (UTC)

Demoting kyūjitai to stubs/soft-redirects

Somewhat similar to the discussion just above and Wiktionary:Tea_room/2014/December#社会 and 社會, I suggest making some changes to Japanese entries which are kyūjitai and not in current use (some kind of exception can be made for kyūjitai which are still in use, perhaps). Even the format of 社會 is too much, IMHO. It shouldn't contain translations, romanisation, etc., just a one-line link to the lemma - 社会. I have no exact format at the moment, I just wish to mention it and get opinions. Calling @Eirikr, TAKASUGI Shinji, Haplology, Whym: (please add anyone I missed). --Anatoli T. (обсудить/вклад) 03:50, 3 December 2014 (UTC)

I agree. We use {{archaic spelling of}} for English. Why not for Japanese? — TAKASUGI Shinji (talk) 23:46, 3 December 2014 (UTC)
I agree that a one-line link would be sufficient for them. As for the wording, I would prefer saying something like "archaic" (as in the template Takasugi-san suggests) or "rarely used", than "not in current use". At least some words in kyūjitai such as 藝術 (art), (cherry tree) appear to me more like archaic than obsolete (in the sense that most people, if not all, can understand). This is perhaps because I keep seeing some institutions and people using those forms as part of their names. Whym (talk) 00:19, 4 December 2014 (UTC)
I also agree, and I think {{archaic spelling of}} sounds like a great idea. Many (most?) of these spellings aren't strictly obsolete, as Whym notes, and do get used intentionally from time to time. ‑‑ Eiríkr Útlendi │ Tala við mig 00:50, 4 December 2014 (UTC)
{{archaic spelling of}} is not the best solution, IMO, as there are archaic spellings, which are not kyūjitai. Kyūjitai merits a separate template, with categorisations. I also support removing all additional infos, definitions, examples to avoid duplications. --Anatoli T. (обсудить/вклад) 00:52, 12 December 2014 (UTC)

PoS filtering at OneLook

Here is a description of the new capability added to OneLook. It already had wildcard searches which allowed searches for words ending in "full" (which we don't as "full" is not a suffix and {{compound}} does not categorize). DCDuring TALK 17:02, 2 December 2014 (UTC)

Using templates to synchronize US-UK spelling

FYI: Wiktionary:Grease pit/2014/December#Revisiting the issue of English UK/US spellings and entry synchroni(s.7Cz)ation. --Dan Polansky (talk) 22:07, 5 December 2014 (UTC)

Esperanto participles - markup in headword lines

FYI, an editor currently mass replaces "{{head|eo|participle}}" with the likes of "{{eo-part|alĝustig|ite}}", as in diff. I don't know the benefit of such a replacement; if you are an Esperanto editor (User:Mr. Granger?), you might want to have a look and see whether the change seems good to you. --Dan Polansky (talk) 11:03, 6 December 2014 (UTC)

{{eo-part|alĝustig|ite}} puts the term into Category:Esperanto adverbial participles, while {{head|eo|participle}} puts it into Category:Esperanto participles, so it's more specific. {{head|eo|adverbial participle}} would have the same effect, but by using {{eo-part}}, editors don't have to remember which kind of participle is associated with which suffix, as the template does the work for them. —Aɴɢʀ (talk) 11:14, 6 December 2014 (UTC)
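To illustrate the point that the template can work out the participle type from the suffix, here is a minimal sketch. It is not the actual implementation of {{eo-part}}; the function name and the lookup tables are hypothetical, and it assumes the second template argument is one of the six participle affixes followed by -a, -e or -o.

```python
# Hypothetical sketch; not the code behind {{eo-part}}.
PARTICIPLE_AFFIXES = ("ant", "int", "ont", "at", "it", "ot")

# The final vowel decides which category the entry lands in.
ENDING_TO_CATEGORY = {
    "a": "Esperanto adjectival participles",
    "e": "Esperanto adverbial participles",
    "o": "Esperanto nominal participles",
}


def participle_category(suffix: str) -> str:
    """Return the category name for a participle suffix such as 'ite'."""
    affix, ending = suffix[:-1], suffix[-1:]
    if affix not in PARTICIPLE_AFFIXES or ending not in ENDING_TO_CATEGORY:
        raise ValueError(f"not a recognised participle suffix: {suffix!r}")
    return "Category:" + ENDING_TO_CATEGORY[ending]


print(participle_category("ite"))
```

The sketch handles only the bare citation endings; plural or accusative forms would need extra handling that is left out here.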
Is the goal to have Category:Esperanto participles empty, having all participles classified in one of Category:Esperanto adjectival participles‎, Category:Esperanto adverbial participles‎ and Category:Esperanto nominal participles‎? --Dan Polansky (talk) 11:43, 6 December 2014 (UTC)
That sounds like a reasonable goal to me—every participle is either adjectival, adverbial, or nominal, so it's certainly possible to move all of them to the subcategories, and maybe that would be useful in some way. There's no need for User:Embryomystic to do it by hand, though—it could easily be done by bot. —Mr. Granger (talk • contribs) 14:12, 6 December 2014 (UTC)
Fair enough. Send in the bots. embryomystic (talk) 23:33, 6 December 2014 (UTC)

Mass or indiscriminate adding of RFE - requests for etymology

I noticed a user is mass adding RFE tags to Estonian entries; not using a bot but in considerable volumes anyway. My opposition to these request tags is probably known; I still oppose this practice. If there are other people who, like me, think that the tags are pointless, especially when added indiscriminately, maybe we could do something to prevent the continued addition of these tags. For reference: Category:Estonian entries needing etymology, Recent changes in that category. --Dan Polansky (talk) 15:10, 7 December 2014 (UTC)

Are you suggesting that these entries do not need etymology? —CodeCat 15:43, 7 December 2014 (UTC)
All entries ought to have etymology ultimately, but the absence of an ety section already indicates that it is missing. RFE should be reserved for words that a user has a particularly keen interest in; otherwise we might just as well auto-add it to every entry, which is unhelpful for readers. Equinox 16:15, 7 December 2014 (UTC)
My experience is that a notice makes people more likely to add information. It has helped with inflections for example. Furthermore, the category is a to-do list, as it shows all entries that still lack an etymology. Keen interest is irrelevant; for every entry where a user adds a request template to indicate an interest, there are ten more where the user has simply left, disappointed, without adding a notice. —CodeCat 16:21, 7 December 2014 (UTC)
We also (judging from the feedback page) have users who leave disappointed because they literally can't find the definition among all the tables of contents and large ety and pron sections! Equinox 16:32, 7 December 2014 (UTC)
Tabbed Languages, people. Keφr 16:34, 7 December 2014 (UTC)
{{rfelite}} is less intrusive than {{rfe}}. DCDuring TALK 17:01, 7 December 2014 (UTC)
rfelite still requires an etymology section heading for what is not content, just a request. --Dan Polansky (talk) 17:30, 7 December 2014 (UTC)
But we're talking about a category that would contain millions of entries- entries which require individual attention by human beings with knowledge on how to do etymologies. It would never be cleared in your lifetime or mine- what kind of a motivator is that? Chuck Entz (talk) 22:43, 7 December 2014 (UTC)
The category name is misleading. Recently, the category name was Category:Requests for etymology (Estonian). The renaming happened via Wiktionary:Requests for moves, mergers and splits#Category:English definitions needed to Category:English entries needing definition discussion in August 2014 with very little participation; the only boldfaced support there was by Wikitiki. Now as before, I think Wiktionary:Requests for moves, mergers and splits should either be discontinued or limited to pages in the main namespace, since it is a positively harmful process.
I have seen no evidence that these RFE tags make people more likely to add etymologies, and I don't believe that to be the case. --Dan Polansky (talk) 17:03, 7 December 2014 (UTC)
FYI: Wiktionary:Votes/2014-12/Adding RFEs to all lemma entries where etymology is missing. --Dan Polansky (talk) 17:07, 7 December 2014 (UTC)
I think you may be right after all. If there are votes for every little issue you have, people will start to ignore those, too. I know I will. —CodeCat 17:14, 7 December 2014 (UTC)
It's not a little issue. It's a deviation from a previous practice. Up to now, we did not try to use RFE (which is still named a "request") to cover all missing etymologies; even right now, it is not our practice. Fact is, you are not very good at creating votes that result in support for your changes. Off the top of my head I don't remember any, but there probably is at least one such vote. One of your recent proposals has a vote which does not show consensus for your change: Wiktionary:Votes/2014-08/Migrating from Template:term to Template:m. --Dan Polansky (talk) 17:30, 7 December 2014 (UTC)
In my experience, a limited number of requests (of any type) is necessary and most editors and some users do this but when they are too many, it's demotivating (an exception is, obviously when there are known editors who do this on a regular basis or there is a previous agreement). I dislike when people add translation requests to any entry they edit, especially when it is a request of a non-trivial term and into a language we have few contributors for. --Anatoli T. (обсудить/вклад) 03:47, 8 December 2014 (UTC)

Template:attributive of under Adjective PoS

A significant number of the 675 uses of {{attributive of}} appear under the Adjective PoS header. The overwhelming majority of these are not adjectives, as the use of the template suggests and as tests for adjectivity would likely show.

  1. Do we want to keep these inane entries and add more to preempt the creation of shoddy, inane Adjective PoS sections by well-meaning contributors?
  2. Do we want to clean them out by RfV to test the validity of their adjectivity?
  3. Do we want to allow contributors to delete all of them that are not attested and do not have any other definition that might warrant inclusion?

Neither option 1 nor option 3 is in accord with CFI, but that may be more a suggestion than a policy anyway. DCDuring TALK 23:07, 8 December 2014 (UTC)

There can be no mass deletion. We can rfv them or rfd them all separately. Or change the header and templates to noun. That's the only one that can be done en masse. Renard Migrant (talk) 23:22, 8 December 2014 (UTC)
There usually already is a Noun PoS. So you think it would be OK to move the offending sense line to the existing Noun header?
Are you at all concerned about the likely re-creation of an Adjective PoS section after the changing-to or merging-into the Noun PoS header? DCDuring TALK 23:46, 8 December 2014 (UTC)

Translations of non-lemma forms (see newest)

Do we allow translations for English non-lemma forms? See newest. --Panda10 (talk) 15:16, 9 December 2014 (UTC)

  • Why would we not? bd2412 T 16:15, 9 December 2014 (UTC).
Would a superlative form be considered an "inflected form"? According to Wiktionary:Entry_layout_explained#Translations: "English inflected forms will not have translations. For example, paints will not, as it is the plural and third-person singular of paint. In such entries as have additional meanings, these additional meanings should have translations. For example, the noun building should have translations, but the present participle of build will not." --Panda10 (talk) 16:42, 9 December 2014 (UTC)
Yes, a superlative is considered an inflected form. —Aɴɢʀ (talk) 16:47, 9 December 2014 (UTC)
Not in all languages. Latin comparatives and superlatives are considered lemmas in Wiktionary. And in many other languages such as Slovene and Finnish, the comparative and superlative are what you might call a "half-lemma": they have a non-lemma definition, but they also have their own inflection table like a lemma. Participles are treated similarly in many languages. —CodeCat 19:40, 9 December 2014 (UTC)
Latin novissimus is in Category:Latin non-lemma forms, not in Category:Latin lemmas. —Aɴɢʀ (talk) 20:52, 9 December 2014 (UTC)
But compare Category:Latin superlative adjectives. —CodeCat 21:13, 9 December 2014 (UTC)
  • Because we presumably list the translation at the lemma form, and inflected forms of the foreign word at the foreign-language entry. If you want to know how to say "newest" in Hungarian or Polish, you go to [[new#Translations]], find the Hungarian or Polish word, go to that entry, and see what the superlative of it is. I'd be opposed to listing translations of nonlemma forms, because of the sheer quantity of translations that could theoretically be added, especially for verb forms. I don't relish the idea of seeing an entire translation table of third-person singular forms at [[walks]] and two entire translation tables (one for the past tense and one for the past participle) at [[walked]]; especially not for languages where those forms are not distinct from that language's lemma form in the first place, meaning the entries for those languages would be redundant to the entries at [[walk]]. We'd never be able to keep them coordinated with the translations tables at the lemma form, which is why we already use {{trans-see}} for near-perfect synonyms, alternative spellings, and the like. —Aɴɢʀ (talk) 16:47, 9 December 2014 (UTC)
    What Angr said. DCDuring TALK 16:53, 9 December 2014 (UTC)
    It seems to me that it would make it much easier for the reader if they could look up the translation by going to the exact word for which it is a translation. This would be doubly useful for words for which the lemma might have a dozen different meanings, but a particular inflection only occurs for one of those meanings. bd2412 T 17:26, 9 December 2014 (UTC)
    Let me introduce a radical consideration: resources, ie, contributors. Is this what we would like to either do ourselves, offer to new contributors as a task, or attempt to automate? DCDuring TALK 19:28, 9 December 2014 (UTC)
    There are eleven translation tables at [[new]], some with dozens of languages in them. Are we to repeat all of those translations at [[newest]], but using the superlative form, even for languages where the superlative is formed fully regularly, even periphrastically (e.g. le plus nouveau with each word linked separately)? And when someone comes along to [[new]] and adds a new language, say Marathi, to the translations, who's going to go to [[newest]] and add the superlative there? And many languages have multiple past tenses, while English only has one, not to mention multiple persons and numbers; should the French translation for [[walked]] list all of the following: marchais, marchait, marchions, marchiez, marchaient, marchai, marchas, marcha, marchâmes, marchâtes, marchèrent, ai marché, as marché, a marché, avons marché, avez marché, ont marché? And then the same thing for all of the polysemous verbs that have more than one French translation; shall we list all 17 forms for each verb that can be used to translate the English word? Lower Sorbian has at least four verbs that mean "go", two past tenses, three persons, and three numbers; should the translation table for [[went]] really list all 72 forms? I don't think that's going to make things any easier for the reader than simply going to [[go]], finding the Lower Sorbian lemmas, and then finding the appropriate inflected form on the Lower Sorbian lemma page. —Aɴɢʀ (talk) 20:52, 9 December 2014 (UTC)
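The figure of 72 in the comment above is just the product of the four dimensions it lists. A quick sketch of the arithmetic follows; the verb and tense labels are placeholders, not actual Lower Sorbian forms.

```python
from itertools import product

# Placeholder labels only; the comment gives the counts, not the forms.
verbs = ["verb 1", "verb 2", "verb 3", "verb 4"]  # four verbs meaning "go"
tenses = ["past tense 1", "past tense 2"]         # two past tenses
persons = ["1st", "2nd", "3rd"]                   # three persons
numbers = ["singular", "dual", "plural"]          # three numbers

# Every combination would need its own cell in a translation table.
combinations = list(product(verbs, tenses, persons, numbers))
print(len(combinations))  # 4 * 2 * 3 * 3 = 72
```

The same multiplication applies to each polysemous English verb, so the table size grows with every additional verb that can translate it.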
    No one is proposing such a thing any more than they are proposing that "newest" should have eleven senses reflecting the senses at "new" (some of which seem redundant to me, like sense 12 merely being a special case of sense 3). As for conjugations, we can find other ways to deal with those. I am merely proposing that we should give the reader the shortest path to finding what they want. If contributors don't want to add such information, then it won't get added, but that doesn't mean it should be prohibited. We might as well say that etymologies or pronunciations of non-English terms should be prohibited because including them is too daunting a task for contributors to engage in. bd2412 T 21:17, 9 December 2014 (UTC)
    But indirectly that is what's being proposed, because if we remove the prohibition on inflected forms having translations from WT:ELE, there's nothing to stop someone from adding 72 Lower Sorbian forms in a translation table at [[went]]. And that would not help anyone, not even someone trying to figure out how to say "I went to Cottbus" in Lower Sorbian. Basically, I think it's an illusion that listing the translations for inflected forms will help the reader. It seems at first blush like it will, but in actual practice it won't. —Aɴɢʀ (talk) 21:44, 9 December 2014 (UTC)
    Can we draw a distinction between inflected verbs and inflected adjectives? Are we going to find 72 Lower Sorbian forms of "newest"? bd2412 T 22:22, 9 December 2014 (UTC)
    We can, but the original question was about English non-lemma forms in general, not English adjective forms specifically. There will only be 15 distinct Lower Sorbian forms of "newest". —Aɴɢʀ (talk) 22:57, 9 December 2014 (UTC)
    A tricky topic. The superlative form [[newest]] can have a lemma form as well in the Slavic languages - masculine, singular, nominative case, e.g. in Russian it's нове́йший (novéjšij) or са́мый но́вый (sámyj nóvyj); other genders, plural and cases should not be in the translation, if they are added. If I were to translate the past tense form [[went]] into Russian, then I would use masculine singular - шёл impf (šol), пошёл pf (pošól) - concrete of идти́ (idtí), ходи́л impf (xodíl), походи́л pf (poxodíl) - abstract of ходи́ть (xodítʹ) (verbs of movement can have concrete and abstract versions in Slavic languages). Plus, there are equivalent verbs for going by vehicle (perfective, imperfective, concrete, abstract), so there could be at least eight translations into Russian. See go#Translations, e.g. translations into Russian. --Anatoli T. (обсудить/вклад) 23:36, 9 December 2014 (UTC)
To answer the question, no, it was formally disallowed by a vote. Renard Migrant (talk) 15:11, 10 December 2014 (UTC)
Did the vote explicitly define whether comparatives and superlatives are considered inflected forms? I would say they are, but there doesn't seem to be unanimity on that issue. —Aɴɢʀ (talk) 21:00, 10 December 2014 (UTC)
The Wiktionary:Votes/pl-2011-02/Disallowing translations for English inflected forms did not mention comparatives and superlatives. --Panda10 (talk) 14:57, 11 December 2014 (UTC)
In that case, what we need to decide is whether comparatives and superlatives are considered inflected forms in the sense of that vote or not. —Aɴɢʀ (talk) 15:09, 11 December 2014 (UTC)
I wrote that vote, and they are. Renard Migrant (talk) 18:22, 11 December 2014 (UTC)
Who wrote the vote is immaterial. Voters only voted on what it says in the vote, not on the unspoken intentions of the creator of the vote. --Dan Polansky (talk) 12:37, 14 December 2014 (UTC)
For the purposes of translations (and I think most others), I think they should be. Apart from "logistic" issues (i.e. keeping the lists in synch) and redundancy (you can already translate the ungraded form and look up the graded form in the target language's entry), different languages may handle gradation differently (e.g. elative form in Arabic, or several "workaround" constructs for Japanese, which has no proper adjectives to begin with), so often there will not be a good match in the target language. Keφr 18:24, 11 December 2014 (UTC)
If people want to amend the vote to explicitly cover comparative and superlative forms, fine. Basically the intention was Category:English non-lemma forms (which didn't exist yet). Renard Migrant (talk) 13:49, 14 December 2014 (UTC)
You didn't write that. User:Mglovesfun did. You can't speak for him.
In any event that is how I understood it to apply in English. I was somewhat aware of the complications in Latin with inflected forms of participles and gerunds, but, perhaps unrealistically, did not expect the vote to be applied mechanistically, even in English, let alone across all languages. AFAICT we have never been very good at drafting substantive proposals that anticipated even obvious matters such as this. (Not that we've been much better on procedural matters.) DCDuring TALK 14:33, 14 December 2014 (UTC)

@bd2412: the reason is simple: in most cases, it's meaningless, because inflection rules only depend on the language and on the context in the sentence. An example: if the feminine form of an adjective is used in Italian because the noun is feminine, it must be translated to the masculine form of the corresponding adjective in French if the feminine noun is translated to a masculine noun. This is why the definition should be Feminine form of, not a translation (and, anyway, the translation in English would be the same for all forms of the Italian adjective). For the translation section, this is the same issue. Sometimes, grammatical rules are similar enough (e.g. existence of a plural form meaning the same in two languages), but this is a special case. Lmaltier (talk) 08:56, 20 December 2014 (UTC)

I don't think there's reasonable doubt over what the vote proposes to cover. You could consider it a loophole; a way to get around the rules like Google avoiding paying corporation tax. But don't pretend it's good faith, it isn't. Renard Migrant (talk) 22:07, 31 December 2014 (UTC)

Oxford Dictionaries word of the year 2014

[4]: They chose vape (we had it in 2012) and runners-up bae (we had it in 2014), budtender (2012), contactless (2008), indyref (we don't have it), normcore (2014), slacktivism (2006). Equinox 23:55, 10 December 2014 (UTC)

  • Excellent, but we shouldn't dislocate our shoulder patting ourselves on the back. DCDuring TALK 00:45, 11 December 2014 (UTC)
  • indyref added December 2014 - not sure if it will have a lasting usage. SemperBlotto (talk) 08:44, 11 December 2014 (UTC)
    indyref is a word? I thought it was a hashtag. Renard Migrant (talk) 18:23, 11 December 2014 (UTC)
    • A Google News search turns up some reputable media outlets that appear to be using it as a word, though mostly as shorthand in headlines. bd2412 T 18:42, 11 December 2014 (UTC)

Request for Permissions

I would like to request to be able to delete pages and move pages without a redirect, please. Anglom (talk) 18:38, 11 December 2014 (UTC)

  • That would make you an administrator, as those are the only editors able to do so. Based on your length and duration of activity here, I would support you in an adminship bid if you make one. bd2412 T 18:44, 11 December 2014 (UTC)
  • @Anglom: Who are you? I do not recall you from the dramaboards. Keφr 18:53, 11 December 2014 (UTC)
@T Ah, I didn't realize. Thank you. An adminship is probably more responsibility than I'm willing to take on right now, but how might I go about that in the future? By requesting here?
@Keφr I'm sorry, I don't much make it over to discussion pages. Anglom (talk) 19:16, 11 December 2014 (UTC)
@Kephir: I know Anglom from his work on Germanic languages. He's a good and conscientious editor who largely stays away from drama. If this were Wikipedia, his almost complete avoidance of the project namespace would be problematic, but here at Wiktionary I don't think it is. @Anglom:, being an admin doesn't actually give you more responsibilities unless you want them. You're not obligated to go vandal hunting, or block people, or protect pages, or anything like that. I'd support you for adminship too. —Aɴɢʀ (talk) 21:09, 11 December 2014 (UTC)
I offer conditional support. Anglom needs to put up a Babel box, be emailable, and, most importantly, provide etymology and gender for Alle. DCDuring TALK 21:55, 11 December 2014 (UTC)
The gender is hard to find, I assumed it would be a neuter third declension i-stem, but going by the International Code of Zoological Nomenclature "30.2.3. If no gender was specified, the name takes the gender indicated by its combination with one or more adjectival species-group names of the originally included nominal species", I would have to say feminine based on Alca, yes? Anglom (talk) 00:05, 12 December 2014 (UTC)
Thanks for humoring me and for your diligence. I was hoping that there was an answer from the apparent language of origin. DCDuring TALK 01:12, 12 December 2014 (UTC)
Hold on, DCDuring. You already created the vote, but I have not yet seen Anglom say that he wants to be an admin. --WikiTiki89 03:17, 12 December 2014 (UTC)
He said he would accept the nomination. DCDuring TALK 03:19, 12 December 2014 (UTC)
Oh, I didn't notice that you asked him on his talk page. --WikiTiki89 03:21, 12 December 2014 (UTC)

Deletion of rfv-passed and the like

FYI, there is a proposal to delete {{rfv-passed}}, {{rfd-passed}} and the like and to replace it with a longer markup. It is here: Wiktionary:Requests_for_deletion/Others#Archive_templates. I oppose the proposal. If it ain't broke, don't fix it, and don't make the markup longer. --Dan Polansky (talk) 10:40, 14 December 2014 (UTC)

Converting classic talk pages to Flow

I oppose converting classic talk pages to Flow. This is a Beer parlour subject, IMHO. (Was raised at Wiktionary:Grease pit/2014/December#The process of converting classic talk pages to Flow.) --Dan Polansky (talk) 10:46, 14 December 2014 (UTC)

Have not seen Flow, but I oppose anyone doing it overnight without a long beta test and discussion. LiquidThreads was horrible. Equinox 11:59, 14 December 2014 (UTC)
Both of which are happening, on mw:Talk:Sandbox and mw:Talk:Flow (and on w:Wikipedia talk:Flow/Developer test page and w:Wikipedia talk:Flow because most Wikipedians cannot be bothered to visit other wikis). Yes, the page width has been mentioned, and yes, apparently it is here to stay. Keφr 12:14, 14 December 2014 (UTC)
@Kephir: Check out w:User:TheDJ/flowidth (a new userscript that allows drag-to-change width), which I'm currently urging the dev team to incorporate (or at least borrow ideas from) in the extension itself. Some sort of toggle or changer is/was planned for many months, but this has helped push it forward. :-) Quiddity (WMF) (talk) 02:57, 18 December 2014 (UTC)
I support conversion as it's much much much better than what we use now. —CodeCat 14:41, 14 December 2014 (UTC)
How? DCDuring TALK 14:50, 14 December 2014 (UTC)
Looks good, BUT how would it work for archives like deletion debates? Or would we be able (and willing?) to bypass flow for archive templates? Renard Migrant (talk) 16:36, 14 December 2014 (UTC)
Are you kidding me? The looks are the worst part of it. Oversized fonts, gratuitous animations, too much wasted screen space (between lines, padding, empty space to the right of the screen), poor visualisation of discussion structure (posts not clearly separated, who replies to whom only indicated by indentation). Also no pagination, missing basic functionality like deleting threads, and links like w:Topic:S22olnmzgd49twr0 (never mind finding this link in the first place was quite inconvenient). I am no fan of talk pages — they are crappy, monthly subpages here are marginally better, LiquidThreads has a few warts, but Flow is just horrible.
As for archiving, I think the original plan was to render archives obsolete. I think it was planned to make it possible for a single discussion to be visible from two talk pages at once. I have not seen anyone actually working on this, though. Instead WMF seems to concentrate on generating hype to make itself appeal to "OMGJAVASCRIPT" types, as most of its other software projects do. I mean, just look at Media Viewer, or the migration to Phabricator (the latter is not actually bad, but the improvement over Bugzilla is marginal). Keφr 17:28, 14 December 2014 (UTC)
(I hate Media Viewer too. It looks like a spammy "subscribe to our newsletter" popup, and hides all the useful metadata behind further clicks.) Just played in the Flow sandbox. Personally I could stand the UI but the performance is ridiculously poor, taking 20-30 sec to respond to a button press, with no visual cue that it's doing anything at all. I didn't think my computer was that old. Equinox 17:32, 14 December 2014 (UTC)
The looks can be changed, with enough sensible feedback (just like anything onwiki), and enough patience (there are only 3 devs working on Flow at the moment, and a voluminous list/backlog of requested features & changes). I (with my volunteer hat) want more density, too, and I'm pushing phab:M17 to help solve these subjective disagreements in the longterm.
(Tangentially: The benefits of phabricator over bugzilla include: A) it uses SUL (so no need for another account, and no more exposed email addresses), B) it replaces: Bugzilla/Trello/Mingle/RT/Gitblit, and eventually Gerrit, so it will be a lot easier for everyone (every community, and every wmf team) to collaborate on, and track, the various projects/extensions/code.)
Deleting threads/posts is available (to admins), and everyone else currently has a "hide" feature, which is equivalent to reverting but without being quite so opaque. There's room for improvement here, which will come with time and feedback and usage.
The performance does need to be improved, particularly for those of us with older machines. Examining that is constant, but attacking it vigorously is on the agenda.
The Topic URLs definitely need to be improved, and that's on the (long) to-do list. Quiddity (WMF) (talk) 02:57, 18 December 2014 (UTC)
  • I just half-skimmed, half-read the content at mw:Flow, and I found myself thinking that this is geeks seeking a new! improved! technical solution to something that 1) is at least partly a social problem (“New users on English Wikipedia have become less and less likely to participate in on-wiki discussions, in spite of a growing and mostly automated body of messages directed at them” -- sounds like capital-D Drama might drive some people away, and being increasingly nagged by automated stuff is just off-putting), and that 2) already has technical solutions for a number of the other issues they brought up, in the form of LiquidThreads (mooting the whole second paragraph of the Background section).
I do see in the Why not use LiquidThreads? section that they discuss some of the latter, but given Equinox's comments above, I'm unconvinced that we at EN WT have any need for Flow. This looks like the broader organization forcing something on all Wiki communities for the sake of ... I dunno, some ideal of consistency? The purported features of Flow that LiquidThreads does not offer don't look like anything we need, or would have much use for (globally unique identifiers, cross-wiki threads).
There are some other "features" of Flow that concern me. Flow will not directly support custom signatures. WTF? LiquidThreads does that just fine. Now we're going to be forced to use a different discussion mechanism on all talk pages, and we won't be able to use our sigs. The technical reasons they give all sound like “we're technically incompetent and it's too hard for us to figure out, so we're just going to throw out this one feature that all of our users have had for ages.” If they're too incompetent to figure out signatures, exactly how are we to trust that they can implement this entire system at all well?
Looking at the sample link Keφr provided, I am further horrified at this mess. Indentation stops after three levels, at which point, I completely lose any ability to tell which post is in reply to what. The mw:Flow page says that indentation relies on “arcane wikicode knowledge”. Granted, a user needs to know a little bit to use wikicode effectively. I posit that this is much more preferable over a patently broken and hard-to-use automated layout system. The justification for this visual crippling is that one quarter of WP viewership is on mobile devices. This reasoning is seriously flawed:
  • How much of that mobile viewership has any interest in Talk pages?
  • How much of that mobile viewership is using devices like iPads or Galaxys, which actually have pretty big screens?
  • How on earth is it acceptable, or even a good idea, to make design decisions for the PC based on what mobile phones are capable of? Microsoft spent billions on this boondoggle of an idea, and the early reviews of Windows 10 suggest that even this corporate behemoth has learned the error of its ways and is changing course (moving away from Metro, reinstating the Start menu, etc).
Oppose the current implementation of Flow. This is exceedingly poorly implemented, and I want no part of it until it is substantially improved and reworked. ‑‑ Eiríkr Útlendi │ Tala við mig 19:06, 14 December 2014 (UTC)
On other UI matters MW has imposed its will, despite a near revolt at de.wiki. I don't know what the final outcome was. As with many elites nowadays they probably believe that they have superior insight into the true needs of the people, which justifies ignoring everything that does not contribute to implementation of the plan. I'm fairly sure that we will get messages from MW staff tantamount to "You have to implement it in order to know what it is." DCDuring TALK 21:57, 14 December 2014 (UTC)
@Eirikr: Indent: The indentation limit is going to change - they've been postponing it for a long time, but everyone agrees that the current 3-indents limit is not good - it was meant to be an evolving experiment, and has remained static for too long. They're currently waiting for Design to provide a greater improvement than simply increasing the limit.
LQT: The LQT extension is no longer maintained and many of the highly active users (e.g. translatewiki) want to transition away, and there's a script for converting LQT to Flow (because for all LQT's problems, it is a logically structured system, at least). That part is fairly easy (comparatively speaking). It's how to deal with existing talkpages, that Gryllida was originally requesting comment on. Gryllida was not suggesting converting Wiktionary any time in the near future. Everyone agrees that Flow isn't nearly ready for that.
Signatures: Our username-attribution will need some way of being altered/adapted, to enable the editors who want an alternate name to display, e.g. the many editors who include a Greek/Latin name in addition to their local/primary script name.
However, the current system — of allowing anyone to A) give their name multiple different colors, in bold/superscript/comicsans/etc (which is bad for equality, and bad for accessibility), and B) to add numerous links (can often be confusing, or soapboxy), and C) to add templates and circumvent size-limits "but please don't", and D) to obfuscate our primary username (the one which appears in History/Logs/etc) thereby complicating everything — could benefit from a few changes. Quiddity (WMF) (talk) 02:57, 18 December 2014 (UTC)
Oppose. If it ain't broke don't fix it (especially if it's anything like liquid threads). SemperBlotto (talk) 07:59, 15 December 2014 (UTC)

Straying off-topic a bit: I realise that LiquidThreads has a few quirks, but I think the amount of hate it gets from some users is quite disproportionate. Could those people illuminate me on what makes LQT so horrible to them? Keφr 14:33, 15 December 2014 (UTC)

Why does a bit from some old movies come to mind? This could get ugly... ;) Chuck Entz (talk) 14:59, 15 December 2014 (UTC)
Don't get me started. I have stopped watching the user pages of those who use LQT because of the annoying messages and the lack of inclusion in my regular watchlist. As a result I am effectively prevented from usefully communicating with users who use LQT. I assume it was done this way intentionally to force rapid adoption, in imitation of Facebook. DCDuring TALK 15:29, 15 December 2014 (UTC)
  • The watchlist issues DCD mentioned;
  • It messes up patrolling (not that this affects many people...)
  • It’s hard to read the history of an entire discussion.
  • It’s ugly.
  • The normal system is much more flexible.
Ungoliant (falai) 16:09, 15 December 2014 (UTC)
Oppose per Eiríkr. - -sche (discuss) 02:28, 17 December 2014 (UTC)
Hi, I've replied in a few places above. TL;DR: Flow is not in a final state by a long shot, and your feedback (throughout this thread) is appreciated. Gryllida was not suggesting converting Wiktionary any time in the near future. Everyone agrees that Flow isn't nearly ready for that. The dev team is concentrating on implementing the feature-set requests of the communities that are already actively/happily testing Flow, and they can only do so much at once. Personally, I'll be back to ask about and investigate more Wiktionary workflows (in all languages), in later months. Eventualism, and "slow and steady", are my constant mantras; they're (part of) the only reason any of this wonderful/aggravating/overwhelming/hopeful wikiverse works (imho, with apologies for using generalizations ;-) ). Hope that (all) helps. Quiddity (WMF) (talk) 02:57, 18 December 2014 (UTC)
I for one believe that it is either impossible or very difficult to implement something as useful and flexible as the current system of talk pages, which are just wiki pages in a different namespace. I for one am not interested in providing feedback to Flow so that Flow can be improved. I just dread the day on which Flow is going to be forced down our throats the way Media Viewer was forced on German Wikipedia. --Dan Polansky (talk) 08:31, 20 December 2014 (UTC)

Hyphenation linked to a language-specific appendix

Similarly to the IPA key, can we link the Hyphenation label to a language-specific appendix (such as Appendix:Hungarian hyphenation) where the rules of hyphenation would be described? The {{hyphenation}} already has the lang parameter to make this feasible. If the appendix does not exist, then no linking would take effect. --Panda10 (talk) 20:56, 16 December 2014 (UTC)
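The fallback described above (link the label only when the appendix exists) could be sketched roughly as follows. This is an illustration in Python rather than actual template or module code; the function name, the `LANGUAGE_NAMES` table, and the `existing_pages` set are all hypothetical stand-ins, and a real implementation would have to use whatever page-existence check the wiki software provides.

```python
# Hypothetical mapping from the template's lang code to a language name.
LANGUAGE_NAMES = {"hu": "Hungarian", "fi": "Finnish"}


def hyphenation_label(lang_code, existing_pages):
    """Return wikitext for the "Hyphenation" label.

    The label links to "Appendix:<Language> hyphenation" only if that
    page is in existing_pages (a stand-in for a real existence check);
    otherwise the plain, unlinked label is returned.
    """
    language = LANGUAGE_NAMES.get(lang_code)
    if language is None:
        return "Hyphenation"
    appendix = f"Appendix:{language} hyphenation"
    if appendix in existing_pages:
        return f"[[{appendix}|Hyphenation]]"
    return "Hyphenation"


if __name__ == "__main__":
    pages = {"Appendix:Hungarian hyphenation"}
    print(hyphenation_label("hu", pages))  # linked label
    print(hyphenation_label("fi", pages))  # plain label, no appendix yet
```

With this shape, creating a new appendix page would be enough to turn the link on for that language, without touching any entries.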

problems and errors in Latin diphthong info

Discussion moved to Wiktionary talk:About Latin.

Google 1gram.

I asked Jimbo to put in a word with Larry Page and Sergey Brin to see if we couldn't get a comprehensive list of words appearing in Google Books. Another editor suggested we look at "Google 1-grams", http://storage.googleapis.com/books/ngrams/books/datasetsv2.html :

"File format: Each of the files below is compressed tab-separated data. In Version 2 each line has the following format:
ngram TAB year TAB match_count TAB volume_count NEWLINE
As an example, here are the 3,000,000th and 3,000,001st lines from the a file of the English 1-grams (googlebooks-eng-all-1gram-20120701-a.gz):
circumvallate 1978 335 91
circumvallate 1979 261 91
The first line tells us that in 1978, the word "circumvallate" (which means "surround with a rampart or other fortification", in case you were wondering) occurred 335 times overall, in 91 distinct books of our sample."

Is this something we can use as a source of words? bd2412 T 02:08, 17 December 2014 (UTC)

Yes, last year I did, User:DTLHS/googlebookscorpus/A. I think I still have the code somewhere, I could do more letters if that's not enough. DTLHS (talk) 00:41, 18 December 2014 (UTC)
Awesome. I notice some spurious things (e.g. "andthen", and "ation" which I assume is the suffix -ation widowed by a line break), but the list still seems quite useful. Question: if a book has, say, "Anna", is that in the 1-gram list as-is, or is everything downcased (to "anna")? - -sche (discuss) 05:04, 18 December 2014 (UTC)
There are very many typos though, e.g. (in the first line) asterwards for afterwards, attomeys for attorneys. SemperBlotto (talk) 15:06, 18 December 2014 (UTC)
I am interested in getting a complete list of all words appearing in all Google Books. Can we get that from these lists compiled by Google? bd2412 T 17:53, 18 December 2014 (UTC)
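For what it's worth, here is a minimal sketch of how the per-year rows in the quoted format (ngram TAB year TAB match_count TAB volume_count) could be collapsed into a plain word list. The function names, the `min_volumes` threshold, and the `fold_case` option are assumptions made for illustration; they are not part of Google's dataset or of any existing script mentioned in this thread. A volume threshold might filter out some of the OCR errors mentioned above, but it would not catch all of them.

```python
import gzip
from collections import defaultdict


def parse_1gram_lines(lines, min_volumes=1, fold_case=False):
    """Aggregate Version 2 1-gram rows into per-word totals.

    Each row is: ngram TAB year TAB match_count TAB volume_count.
    Returns {word: (total_matches, max_volumes_in_any_year)}, keeping
    only words whose peak yearly volume count is at least min_volumes.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip rows that do not match the documented format
        ngram, _year, match_count, volume_count = parts
        word = ngram.lower() if fold_case else ngram
        entry = totals[word]
        entry[0] += int(match_count)
        entry[1] = max(entry[1], int(volume_count))
    return {w: (t[0], t[1]) for w, t in totals.items() if t[1] >= min_volumes}


def read_1gram_file(path, **kwargs):
    """Read one gzip-compressed 1-gram file, e.g. the 'a' file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_1gram_lines(handle, **kwargs)


if __name__ == "__main__":
    # The two sample rows quoted above.
    sample = [
        "circumvallate\t1978\t335\t91\n",
        "circumvallate\t1979\t261\t91\n",
    ]
    print(parse_1gram_lines(sample))  # {'circumvallate': (596, 91)}
```

The keys of the returned dictionary are the candidate words. Running this over each letter file in turn and comparing the keys against existing entry titles would give a list of possibly missing entries, still subject to manual review.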

Category:Numerals by language and Category:Numbers by language

Some of us talked about the two similar categories three years ago in Wiktionary:Beer parlour/2011/June#Numbers and numerals. Now that they are creating confusing interwiki links, how about unifying them? We should delete Category:Numbers by language and its subcategories because they are newer than Category:Numerals by language and its subcategories. — TAKASUGI Shinji (talk) 03:51, 17 December 2014 (UTC)

Numbers are not a part of speech, while numerals are. We should keep them separate. —CodeCat 14:17, 17 December 2014 (UTC)
I know we came up with a distinction, but I can't remember what it is. Renard Migrant (talk) 00:25, 18 December 2014 (UTC)
One point is that all the words in Category:English ordinal numbers are adjectives. That means they can't also be numerals, because numeral is already a part of speech. —CodeCat 00:32, 18 December 2014 (UTC)
If we keep them separate, Category:English numbers should contain Category:English numerals as a subcategory. Currently the two are not aware of each other. — TAKASUGI Shinji (talk) 10:35, 25 December 2014 (UTC)
Why does it need to? All members of Category:English numerals are already in Category:English cardinal numbers. —CodeCat 10:37, 25 December 2014 (UTC)
I am talking about categories, not entries. I just want to decide which should be the top category for interwiki links. If we separate them and Category:English numbers is the top category, then we need to fix all the interwiki links. Currently Category:English numerals works as an interwiki target from other language versions. — TAKASUGI Shinji (talk) 00:43, 30 December 2014 (UTC)
Category:English numerals is for the part of speech "numeral". Category:English numbers is for any word with numeric semantics, whether its part of speech is "numeral" or not. —CodeCat 01:47, 30 December 2014 (UTC)
Category:English ordinal numbers, being a topical category, should be named Category:en:Ordinal numbers (deleted on 29 July 2014). Wiktionary:Votes/pl-2010-01/Number categories needs to be abolished; we now have parallel categorization for number words: one by part of speech, one by semantics. --Dan Polansky (talk) 10:48, 25 December 2014 (UTC)
Ordinal numbers are not a topic. —CodeCat 11:05, 25 December 2014 (UTC)
They are, just like mammals (Category:en:Mammals). --Dan Polansky (talk) 11:10, 25 December 2014 (UTC)
Not quite. The terms in Category:en:Mammals refer to mammals as objects. But the words "one" or "seventh" don't refer to numbers as objects, or at least not in normal usage. If they did, they would all be nouns, like mammals are. —CodeCat 01:50, 30 December 2014 (UTC)

Categorization of Polish participles

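There are currently three categories for Polish participles:

  • Category:Polish past participles
  • Category:Polish present active participles
  • Category:Polish present passive participles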

These categories do not correspond at all to the categories of participles recognized in Polish grammar, which are:

  • adjectival participles:
    • active adjectival participles (imiesłów przymiotnikowy czynny), such as czytający
    • passive adjectival participles (imiesłów przymiotnikowy bierny), such as przeczytany
  • adverbial participles:
    • contemporary adverbial participles (imiesłów przysłówkowy współczesny), such as czytając
    • anterior adverbial participles (imiesłów przysłówkowy uprzedni), such as przeczytawszy

Currently, it seems that passive adjectival participles of perfective verbs are under "Polish past participles", active adjectival participles are under "Polish present active participles", and passive adjectival participles of imperfective verbs are under "Polish present passive participles". Anterior adverbial participles are nowhere to be found. This is clearly suboptimal and also misleading, since the entries under "Polish past participles" can also be used to form present tense sentences, such as Ten samochód jest już sprzedany which means This car has already been sold.

I can migrate the entries to the correct categories (there are fewer than 300 of them). I added the correct categories to Module:category tree/poscatboiler/data/non-lemma forms, but the new POS names are unrecognized by Module:headword. Can anyone with admin access add them? --Tweenk (talk) 22:41, 18 December 2014 (UTC)

Never mind, I created a new headword template Template:pl-participle that obviates the need for modifications to Template:head. --Tweenk (talk) 01:28, 22 December 2014 (UTC)

German composed forms

Is there any reason we present German composed forms? As far as I'm aware, the only "irregularity" in German composed forms is whether a past participle takes "haben" or "sein", and that's already covered in both the heading line and the conjugation table. It seems at best sort of redundant, and at worst a little misleading (while there's nothing incorrect about the table at sein, for example, some of the composed forms aren't very idiomatic). On the other hand, German Wiktionary includes all this information, so it's entirely possible that I'm missing something. Smurrayinchester (talk) 14:48, 19 December 2014 (UTC)

I think this is very usual in conjugation tables (in all languages). Some conjugated forms are easy, some are less easy, but conjugation tables should be as complete as possible, this is useful to people wishing to use the verb and with little knowledge of conjugation rules. Lmaltier (talk) 17:59, 20 December 2014 (UTC)
I don't think it's necessary to show the full conjugation of all the auxiliary verbs. After all, the conjugation for those verbs can easily be looked up itself. For Slovene, I adopted the practice of showing only one form of the auxiliary. It keeps the tables much shorter while still giving an idea of the general formula. See kupovati for an example. For Latin, we don't list composed forms either, we show how to form them. Like on canto. —CodeCat 18:22, 20 December 2014 (UTC)
With such a reasoning, you could as well remove all regular forms and keep conjugation tables of individual verbs only for irregular verbs. But, once again, providing complete tables is useful to some readers. Lmaltier (talk) 18:34, 22 December 2014 (UTC)

User:JAnDbot for bot status

FYI, there is a new vote Wiktionary:Votes/bt-2014-12/User:JAnDbot for bot status. Previous failed vote: Wiktionary:Votes/bt-2012-06/User:JAnDbot_for_bot_status. --Dan Polansky (talk) 08:15, 20 December 2014 (UTC)

Free 'RSC Gold' accounts

I am pleased to announce, as Wikimedian in Residence at the Royal Society of Chemistry, the donation of 100 "RSC Gold" accounts, for use by editors wishing to use RSC journal content to expand articles/ items on chemistry-related topics. Please visit en:Wikipedia:RSC Gold for details, to check your eligibility, and to request an account. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:34, 20 December 2014 (UTC)

The idea of an open and free dictionary using content that is behind such elitist closed doors doesn't sit well with me. —CodeCat 14:20, 20 December 2014 (UTC)
There's probably not much useful for a dictionary in those journals anyway; I'm sure the words in those articles are found in plenty of more easily accessible places. Great news for Wikipedia, though! —Aɴɢʀ (talk) 14:28, 20 December 2014 (UTC)

Kurdish Wiktionary Article Count Abuse

Kurdish Wiktionary has hundreds of articles for the same words, like Abdullah, a proper name. The dictionary has 429.314 words, which is nonsensical because the language has barely 10.000 words. Is this an article counter abuse? --Kafkasmurat (talk) 10:20, 23 December 2014 (UTC)

How is that abuse? — Ungoliant (falai) 11:00, 23 December 2014 (UTC)
And I don't understand what you mean: you state that the dictionary has 429314 words and that it has barely 10000 words (with a link to another dictionary!). What do you mean? Lmaltier (talk) 11:33, 23 December 2014 (UTC)
First of all, the orthography of written Kurdish is very complex, due to not having a single authoritative standard. I'm sure all of those spellings are in use. Second, you can't tell how many words a language has by looking at an online glosbe dictionary: those tend to be very sparse and incomplete- especially for a lesser-known language such as Kurdish. Third, even if you could, Wiktionaries shouldn't be limited to one language, so I'm sure they have entries for English, Arabic, Turkish, etc. There's no reason for them not to have as many entries as any other Wiktionary (we're approaching 4 million). Besides, each Wiktionary is independent, so there's not much point in complaining to us. Chuck Entz (talk) 13:36, 23 December 2014 (UTC)
I'm confused, what do you want us to actually do? Just agree with you so you feel better about yourself? Renard Migrant (talk) 15:19, 23 December 2014 (UTC)

Spanish bot

Hi. I got blocked a couple of times for running a bot while pretending it wasn't a bot, under the usernames User:Type56op8 and User:Type56op9. I'd like to continue with it, but without getting blocked again. The bot is really amazing, and makes past participle forms for Spanish verbs. Check out the recent contributions here and here and here, which were all created without a bot flag on the user, and tell me what you think. --SuperWonderbot (talk) 14:39, 23 December 2014 (UTC)

If you didn't even follow the rules before you got a bot flag, how do we know we can trust you to follow them after you do have one? —CodeCat 19:20, 23 December 2014 (UTC)
I have an idea. How about 'no'. Renard Migrant (talk) 19:24, 23 December 2014 (UTC)
I'll give the bot code to someone else if they want it. I've just got thousands of ready-made Spanish entries, and just wanna help out WT. --SuperWonderbot (talk) 23:36, 23 December 2014 (UTC)
Maybe a flood flag? --SuperWonderbot (talk) 23:41, 23 December 2014 (UTC)
Anyway, I very rarely follow the rules. You all know me, I do loads of great work for a while, then others clean up after me. Then later I clean up after others, and the whole site improves as a result. --SuperWonderbot (talk) 23:52, 23 December 2014 (UTC)
I'd have to disagree with the whole "the site improves as a result" part. --Neskaya sprecan? 16:59, 8 January 2015 (UTC)
You mean the bot that added loads of Spanish entries with ast as the language code? Yeah, amazing bot. — Ungoliant (falai) 23:43, 23 December 2014 (UTC)
Yeah, I fixed them (you helped too, thanks by the way)! --SuperWonderbot (talk) 23:46, 23 December 2014 (UTC)

User:Mglovesfun for immediate desysopping

Self-nomination, does not require a vote in my opinion. Seems like a no-brainer. Renard Migrant (talk) 15:03, 24 December 2014 (UTC)

  • Does "Mglovesfun" mean "M gloves fun" or "Mg loves fun"? bd2412 T 15:39, 24 December 2014 (UTC)
His initials are MG. Equinox 15:44, 24 December 2014 (UTC)
I suspected that, but I thought that his username would then be MGlovesfun. Welcome to the wild world of the Internet, I suppose. bd2412 T 16:09, 24 December 2014 (UTC)
Uh it's not a vote. Equinox 09:59, 25 December 2014 (UTC)
Done. —Stephen (Talk) 22:52, 27 December 2014 (UTC)

Edit warring on a user's signature


Wiktionary-logo-portal.svg
Wiktionary-logo-en.svg
Wiktionary-logo wpstyle-en with transparency.png

Our logo has been the focus of widespread complaint for a long time, with one person on WT:Feedback noting that it seems like somebody brings it up every week, and I know many editors here dislike it. It's visually unappealing and doesn't abide by our IPA standards on transcribing English r while simultaneously giving only a British pronunciation, and looks like a strange attempt to imitate the layout of a physical dictionary, which is everything we are not. Its only advantage, in my opinion, is that of consistency.

The other two logos shown at right are the tiles logo, which has the advantage of being the logo that the Wiktionaries all chose in a communal vote and therefore being the most widely adopted logo, and the book logo, which I (and I believe some other editors) find the most aesthetically and symbolically appealing, and which is also used by more Wiktionaries. If there are other logos that people like, or new ones that someone is willing to create, I would be happy to see them as well. Note that it is currently possible to change how you see the logo in your Preferences if you're logged in, and we will keep that no matter how the vote goes.

My two main questions are: preliminarily, what do people think about these options and having a vote on them? And secondly, would anyone be willing to draft the vote with me? I'm hoping that we can finally have a logo that we can be proud of rather than one that attracts regular complaint and criticism. —Μετάknowledgediscuss/deeds 22:35, 27 December 2014 (UTC)

I prefer the book logo. — Ungoliant (falai) 22:39, 27 December 2014 (UTC)
I would not mind a vote. Personally, I like the "tiles" logo. The "book" logo looks quite dull and non-indicative of what this project is about: pretty much any kind of knowledge can be put into a book (if you forget about the limitations of the medium). A book is practically meaningless as a symbol. Keφr 22:45, 27 December 2014 (UTC)
Please see Wiktionary:Votes/2010-02/Accepting the results of the Wiktionary logo vote which failed. -- Liliana 22:47, 27 December 2014 (UTC)
Actually, it had no consensus. But more importantly, that was more than four years ago, and it seems to me quite possible that the community will be able to agree this time. —Μετάknowledgediscuss/deeds 23:00, 27 December 2014 (UTC)
As a logo, I dislike the book. It is not logo-like. I like the tiles. The biggest (or only) complaint I recall about the tiles was the Japanese kana that looks like a happy face. The specific symbols on the tiles are not immutable, they can be changed. Most wikis that went with the tiles did change one or more of them. Consider ko:위키낱말사전:대문 and tr:Ana_Sayfa. —Stephen (Talk) 23:06, 27 December 2014 (UTC)
I like the tiles but I wish someone would re-draw them into something more modern looking. The colors and gradients don't look good. DTLHS (talk) 23:11, 27 December 2014 (UTC)
I still prefer the book logo. The tiles look too... cheap? Unprofessional? Playful? It also doesn't really evoke a dictionary as a repository of knowledge on words. Instead it seems to focus on language variety. The book is literally a dictionary (or is supposed to be) so it seems much more fitting. —CodeCat 23:17, 27 December 2014 (UTC)
How is "playful" a bad thing? Keφr 23:27, 27 December 2014 (UTC)
I agree that the tiles appear more evocative of a child's game (and they contain letters/symbols, not words, which is our main focus); the book seems generic. I would go for the tiles if the characters could be replaced with short words like be and 考虑 and عجب. bd2412 T 00:57, 28 December 2014 (UTC)
Another note: a user once commented that there was a page layout error with the current logo, because he didn't realise that the portions of previous and subsequent "entries" (shown as bits of grey text) were meant to be just that: incomplete portions. Equinox 02:41, 28 December 2014 (UTC)
I'm not an artist, so I can't give you a picture, but I would like to suggest a computer screen with "Wiktionary" on it, surrounded by a book, a scroll, a clay tablet, and selected other representations of iconic things writing has appeared on throughout history and throughout the world (maybe a hand making sign language, too). Everything should be minimalist, and we don't need to show details of the writing (aside from the word "Wiktionary"). Chuck Entz (talk) 03:00, 28 December 2014 (UTC)
I prefer the Scrabble tile logo, but if we do use a different logo, I'd like something that uses the Wikimedia colors of red, green, and blue. Some previous proposals are at m:Red, green, and blue#Proposed Wiktionary logo. —Aɴɢʀ (talk) 16:38, 28 December 2014 (UTC)
Logo Dzień Destubizacji Orem version.svg
Out of those (and not looking just under "Wiktionary"), I like this one. Cheers! bd2412 T 17:04, 28 December 2014 (UTC)
I do not think so. Awfully generic, just like the "book" one. Keφr 21:58, 28 December 2014 (UTC)
The ninth is Goatse, and the second looks like someone pooping. — Ungoliant (falai) 22:01, 28 December 2014 (UTC)
I'm rather partial to the fourth one; it looks like a birds-eye view of someone reading a dictionary. —Aɴɢʀ (talk) 22:20, 28 December 2014 (UTC)
You know, if we went with Goatse, it would bring some much-needed attention to the site. However, it might be the wrong kind of attention. bd2412 T 16:34, 29 December 2014 (UTC)
The tile logo doesn't evoke our mission (lexicography) at all. The book does, although I'm not convinced that switching to it would be better than updating our current logo, either to use modern RP and GenAm pronunciations, or to drop the pronunciation and perhaps replace it with a terse etymology. I recognize the point made above that any sort of information can be put into a book, so I understand how some people might not think the book evoked lexicography, but I don't see how anyone thinks the book doesn't evoke lexicography but the tiles do. Of the previously-proposed logos Angr links to, only meta:File:Wiktionary firehazard07 v1.gif and meta:File:Wikt bookdictionary logo.svg clearly suggest lexicography to me. meta:File:Wiktionary Dynamic Dictionary Logo 2.svg is two people arguing about the meaning of a story by talking past each other about different highlighted paragraphs, and something seems off about meta:File:Wiktionary logo Stephane8888 01f.svg and meta:File:Wikty no text up.png. meta:File:Wikt jlc 10-4 plus 6a.svg would make a great logo for a WikiBacteria or WikiViruses spin-off of Wikispecies and the puzzle piece logo would be good for a WikiPuzzles site, but neither suits a dictionary, IMO. - -sche (discuss) 20:11, 29 December 2014 (UTC)
Actually, reading Angr's comment, I can see how meta:File:Wiktionary logo Diego UFCG.png and meta:File:Wiktionary logo Stephane8888 07.svg could work, although I dislike how bold the colours are — I think they would draw attention to themselves and away from the content of our pages (black text and small, unobtrusive blue links on white or grey backgrounds). - -sche (discuss) 20:19, 29 December 2014 (UTC)
Actually, tiles evoke some word games, such as Jarnac, and thus evoke words. Its different scripts evoke the variety of languages. Thus, this logo evokes words in all languages, which is a perfect summary of our mission. Note that the Wikipedia logo also evokes a game, not an encyclopedia, and also evokes all languages. Thus, the tiles logo is very consistent with the spirit and the Wikipedia logo, and is visually more appealing (in my opinion, because it's subjective). Lmaltier (talk) 20:51, 31 December 2014 (UTC)
I'd say the puzzle pieces of the Wikipedia logo allude to "creating a complete picture". And the logo implies that by putting together the pieces, one is creating the world with all its knowledge. Wiktionary is not that different, but we'd want the puzzles to come together to form languages or words rather than encyclopedic knowledge of the world. So maybe a design with a book/dictionary made up of puzzle pieces? —CodeCat 21:27, 31 December 2014 (UTC)
We're a dictionary, but not a book. I must say I don't understand the reluctance of some people to the tiles logo. After adopting it, even contributors opposing it get used to it, and this is not an issue any longer on fr.wikt. Lmaltier (talk) 22:19, 31 December 2014 (UTC)
I don't think being a dictionary is the reason to go for a book logo. Books are also tied to language and words, to literature. Of course, words are also spoken, which is why some of the proposed logos have people speaking instead. But using a book seems to fit better with our practice of requiring written attestations? —CodeCat 14:45, 1 January 2015 (UTC)
A logo serves not only to symbolise what the project is about, but also to set it apart from other similar things. Many more things are connected to literature. Every Wikimedia project could use a book for its logo (maybe except Wikinews and some meta-projects). A book is so generic that it becomes meaningless. Keφr 15:29, 1 January 2015 (UTC)

I think we should have a logo vote, and that the winner should have at least (or more than) 50% of the votes (run-off voting if necessary). If one wants to play it really safe, one could even have a vote of confidence for the process itself: has the logo candidate selection been fair? Are the voting rules fair? Etc. If the current logo never has been voted for, I think we should definitely strive for a successful vote (which of course includes the possibility that the current logo, modified or not, wins). --Njardarlogar (talk) 12:11, 1 January 2015 (UTC)

Some language codes missing

Per Wiktionary:Guide_to_adding_and_removing_languages, I'd like to request that oui and khk be added to the list of languages. Currently "ug" is being used in place of "oui" in several etymologies (note that Old Uyghur is not an older form of Uyghur, but is in fact a different language with the same name), and "mn" is being used instead of "khk". —Firespeaker (talk) 07:54, 29 December 2014 (UTC)

I agree that we need to add oui, but I'm not yet convinced that we need to distinguish mn and khk. —Aɴɢʀ (talk) 08:29, 29 December 2014 (UTC)
Mongolian (mn) is a macrolanguage. Many examples give "Mongolian" examples, but are in fact Khalkha examples in the standard orthography of Mongolia. Many other varieties of Mongolian are not represented by this orthography (Chakhar, Ordos, etc.). If I were to consider a single variety "Mongolian", it would be written Classical Mongolian (still standard orthography for most varieties of Inner Mongolia, and recognised as an alternative in Mongolia), which is not the language of the forms being provided. —Firespeaker (talk) 04:19, 30 December 2014 (UTC)
oui is already there. Here. — Ungoliant (falai) 14:18, 29 December 2014 (UTC)
Thanks! (uhh, your link doesn't work, but I got it;) —Firespeaker (talk) 04:19, 30 December 2014 (UTC)

Category:Entries missing English vernacular names of taxa

I am rather puzzled by the name of this category, let alone its purpose, and would like to suppress it when it appears in entries which include the vernacular name, such as dvergbjerk. In this case there is no entry for Betula nana, only the Wikipedia link. How can I suppress it? Donnanz (talk) 13:34, 29 December 2014 (UTC)

I think {{vern}} is supposed to be used with the vernacular name, e.g. {{vern|dwarf birch}}. The taxonomic name is supposed to use {{taxlink}}, e.g. {{taxlink|Betula nana|species|noshow=1}}. (I have no idea what |noshow=1 does, but someone always comes along and adds it if I forget it, so I assume it needs to be there.) —Aɴɢʀ (talk) 13:48, 29 December 2014 (UTC)
Everything is as Angr says. I add the noshow=1 after I have looked at the entry and thanked the person who used {{taxlink}} (but, in this case, me). As we have an entry for dwarf birch, it doesn't need {{vern}} and would show up in a category of entries in which that template is redundant. DCDuring TALK 15:25, 29 December 2014 (UTC)
The missing-taxonomic-name and missing-vernacular-name categories are tracked so that the most commonly "needed" ones are added relatively quickly, at least when I'm in the mood to do so. DCDuring TALK 15:27, 29 December 2014 (UTC)
So does that mean I shouldn't be including noshow=1 when I use the template? Also, since Category:Entries missing English vernacular names of taxa is a cleanup category, shouldn't it be hidden? —Aɴɢʀ (talk) 15:34, 29 December 2014 (UTC)
Ah, thanks, that works fine! The problem was that I copied what was already entered at dvergbjørk (not by me), so I have gone back and changed that entry accordingly. Donnanz (talk) 15:57, 29 December 2014 (UTC)
@Angr: As you already have my undying gratitude for so many things, sending further thanks would be coals to Newcastle. If you are uncertain about something, you could leave it out and I would look at it within a day, often less. I like to look at new English entries that use taxonomic names anyway, because some of them merit some additional external links and images, sometimes including a map of the range of the taxon members to support requests for native-language translations. DCDuring TALK 19:07, 29 December 2014 (UTC)
I would like it if users added English vernacular names. If, as a result, some pages are categorized as having redundant vernacular-name templates, I look at the new entries and add what I think is missing. Therefore I like it being a visible category, especially as there is no redlink due to the pedia link rendering it blue. OTOH, I would certainly defer to any consensus to the contrary. DCDuring TALK 19:15, 29 December 2014 (UTC)
To me, visible categories are for all readers, including those who never edit, while hidden categories are for "behind the scenes" work that only editors do. —Aɴɢʀ (talk) 19:20, 29 December 2014 (UTC)
The category membership is in lieu of a redlink, which is an invitation to add an entry. I suppose we could hide the category and add an explicit invitation to add the missing vernacular name, something more conspicuous like what {{t-needed}} adds. DCDuring TALK 20:16, 29 December 2014 (UTC)
Or restore the redlink, and add a small superscript link to the Wikipedia article. DCDuring TALK 20:18, 29 December 2014 (UTC)

January 2015

Wiktionary:Translation requests

I think, with 38½ sections per month on average and little interest in archiving, this page is a good candidate for conversion to the monthly pages system. If we still want to keep it, anyway... Keφr 22:33, 1 January 2015 (UTC)

Or it could be purged more often. — Ungoliant (falai) 22:36, 1 January 2015 (UTC)
Yes, but who wants to do it? Keφr 22:48, 1 January 2015 (UTC)
Never mind. I thought it had the same “archiving” system as the feedback page. — Ungoliant (falai) 23:25, 1 January 2015 (UTC)
I prefer the purging method, but the auto-archiving method is easier! Renard Migrant (talk) 13:42, 2 January 2015 (UTC)

Maybe have some new words of the day this year

Does anyone fancy updating the word of the day a bit more? I noticed we had a lot of repeats last year and already this year. I'd be grateful to anyone who put any work into it. Renard Migrant (talk) 13:08, 2 January 2015 (UTC)

I’ve briefly considered it before, but I’ve worried that, if I were to accidentally feature a word that was already Word‐of‐the‐Day elsewhere, it could spell serious trouble for the project. --Romanophile (talk) 13:11, 2 January 2015 (UTC)
To blow out of proportion, verb: To overreact to or overstate; to treat too seriously or be overly concerned with. Keφr 13:27, 2 January 2015 (UTC)
Quite an understatement. I do not think we had any new word of the day at all this year. And I think you can guess at least one reason why if you look up how these are set up compared to FWOTDs. Keφr 13:27, 2 January 2015 (UTC)
From above "if I were to accidentally feature a word that was already Word‐of‐the‐Day elsewhere, it could spell serious trouble for the project." Seriously? Is this a joke? Renard Migrant (talk) 13:28, 2 January 2015 (UTC)
Couldn’t somebody sue the Wikimedia foundation if it looked like we copied their Word‐of‐the‐Day? See also: ‘Avoid other WOTDs - We want to feature words that haven't been WOTDs for other dictionaries, partly to highlight unique terms that make Wiktionary so special, partly to avoid complaints (by preventing the possibility entirely) that WOTDs were "stolen" from other dictionaries.’ --Romanophile (talk) 13:37, 2 January 2015 (UTC)
I'm not an intellectual property lawyer, but no, I don't think so! If we copied the definitions or copied large strings of words of the day from other websites, then yes. But nobody can copyright the use of a word on its own. @BD2412: would be the person to ask (he actually is an intellectual property lawyer) but I can't imagine I'm wrong on this. Renard Migrant (talk) 13:41, 2 January 2015 (UTC)
You're not wrong on this. Cheers! bd2412 T 14:29, 2 January 2015 (UTC)
Certainly one reason we haven't any new WOTDs yet this year is that this year is only two days old. —Aɴɢʀ (talk) 14:02, 2 January 2015 (UTC)
While not ideal, perhaps it would be better to cycle through our (non-offensive) entries one by one, regardless of what the words are, than to repeat a few hand-picked ones forever. Equinox 14:08, 2 January 2015 (UTC)
How about extrabellum? Cheers again! bd2412 T 14:37, 2 January 2015 (UTC)
I don't like the idea of cycling through all our entries regardless of quality. At WT:FWOTD there is the requirement that a candidate word must have pronunciation information and at least one citation; I think that English WOTDs ought to be held to at least that standard if not a higher one. —Aɴɢʀ (talk) 15:17, 2 January 2015 (UTC)
I was just going to propose something similar. Keφr 15:30, 2 January 2015 (UTC)
How hard can it be to find words that meet the criteria? Or, better yet, to take words that are close to that and bring them up to par? bd2412 T 15:39, 2 January 2015 (UTC)
That depends on the word, but usually relatively easy, methinks. But I guess it has to be blessed as policy somehow. Not necessarily a full vote. Keφr 15:52, 2 January 2015 (UTC)
The problem is not the lack of words to feature (the nominations page has enough interesting words to last for months), it’s the lack of someone to update the templates. — Ungoliant (falai) 17:20, 2 January 2015 (UTC)

Deprecating "Acronym", "Initialism" and "Abbreviation" headers

Previous discussions: Template talk:abbreviation-old#RFD discussion, Wiktionary:Beer parlour/2014/July#Template:abbreviation-old

Some time ago it was proposed that these "part-of-speech" headers be deprecated (in favour of real part-of-speech headers like noun, adjective, interjection, etc.). I support the idea, but to be honest, I cannot recall a wider discussion about it. Meanwhile people are still adding new "initialism" entries, and there is nothing to point them to about it. If we formally deprecate those headers, we should remove them from WT:NEC, deprecate and track {{head|...|abbreviation/acronym/initialism}}, and probably set up an edit filter. Do we do it? Keφr 18:10, 2 January 2015 (UTC)

I get the impression, of the people who've expressed an opinion, that everyone's broadly in favour of it (i.e. deprecating), but implementing it would be horrendously difficult. Because they all need to be sorted by hand, it can't be automated. Users are still adding them in good faith and it's better to have them under an initialism header than not at all. Renard Migrant (talk) 18:13, 2 January 2015 (UTC)
Like I said, we could set up an edit filter which would catch that and show an explanatory message. Keφr 18:16, 2 January 2015 (UTC)
Sorry I missed that bit. Support. Renard Migrant (talk) 19:11, 2 January 2015 (UTC)
Support. — Ungoliant (falai) 19:05, 2 January 2015 (UTC)
I support it in principle, but I've found that there are some cases where it's not easy to define it with any other part of speech. I can't think of any example right now but I know there are some. —CodeCat 19:19, 2 January 2015 (UTC)
@CodeCat: I don't know if this is what you meant, but there are various kinds of abbreviations of phrases for which Phrase doesn't seem like the right header, eg, YOLO. DCDuring TALK 19:41, 12 January 2015 (UTC)
Yes, that would be a good example. —CodeCat 19:53, 12 January 2015 (UTC)
YOLO seems somewhat like an interjection to me. And the more I think about IANAL, the more it seems like an interjection too. (Compare QED.) Keφr 20:56, 12 January 2015 (UTC)
@Kephir: Just because we have numerous erroneous uses of Interjection as PoS header doesn't mean that we should err yet again. Consider this definition of interjection from MW Online:
"an ejaculatory utterance usually lacking grammatical connection: as
a : a word or phrase used in exclamation (as Heavens! Dear me!)
b : a cry or inarticulate utterance (as Alas! ouch! phooey! ugh!) expressing an emotion"
Their definition of ejaculatory refers to ejaculation "something ejaculated; especially a short sudden emotional utterance"
Their other definitions are even less applicable.
YOLO, QED, and IANAL represent full sentences without marked emotional content. DCDuring TALK 21:06, 12 January 2015 (UTC)
What is the term for a phrase that has a finite verb as its head? —CodeCat 21:07, 12 January 2015 (UTC)
@CodeCat: Do you mean an absolute? That also strikes me as too technical for a PoS header, even if it were in WT:ELE (and for any definiens, except possibly some more technical style/grammar/linguistics terms). DCDuring TALK 22:39, 12 January 2015 (UTC)
No, that's not it. It would have to include the terms that the abbreviations above stand for. All I can think of is "sentence", but they're not necessarily always used as stand-alone sentences (they can be, though). —CodeCat 22:53, 12 January 2015 (UTC)
Wikipedia says at w:Finite verb: "A finite verb is a form of a verb that has a subject (expressed or implied) and can function as the root of an independent clause; an independent clause can, in turn, stand alone as a complete sentence." So I think "independent clause" would be most fitting as the part of speech for these and similar terms, whether abbreviated or not. It may not be the easiest to understand for readers, but any alternative would be ambiguous, so we don't have much choice. —CodeCat 22:58, 12 January 2015 (UTC)
Whatever grammatical term we use is likely to be a problem because the term will be technical or it will clash with the basic notion that something spelled without spaces is a word (not a clause or phrase), or both. We have entries for MWEs that have the PoS header "Phrase" that are technically not phrases, but I don't think that the technical requirement that a "real" phrase be a constituent bothers very many ordinary users.
For now I am willing to plow ahead on the entries that have clear-cut PoS headers to replace these and wait for lightning to strike in the form of some conceptual breakthrough. Or for English speakers and their dictionaries to decide that interjections don't have to be emotional outbursts or that phrases don't have to have spaces between the component words. DCDuring TALK 23:19, 12 January 2015 (UTC)
I definitely support it in principle, but it is somewhat tedious to implement.
It would be handy to have a template to automatically provide the pronunciation for initialisms so that the pronunciation information implicit in the initialism header would not be lost due to the additional effort to provide a proper IPA pronunciation. DCDuring TALK 20:14, 2 January 2015 (UTC)
I did make one some time ago, but it was before Lua so it was a bit cumbersome to use, and never saw much use. I don't remember if I deleted it or not. —CodeCat 20:31, 2 January 2015 (UTC)
Tidiness strikes again? Even a poor template could be used to test the concept and discover missing features that might be useful. DCDuring TALK 20:52, 2 January 2015 (UTC)
Would that be {{IPA letters}}? DCDuring TALK 21:02, 2 January 2015 (UTC)
{{IPA letters|I|B|M|lang=en}} produces: IPA(key): /aɪ biː ɛm/ DCDuring TALK 21:05, 2 January 2015 (UTC)
It would be handy if it worked directly on {{PAGENAME}}, perhaps with {{PAGENAME}} subst:'d. DCDuring TALK 21:23, 2 January 2015 (UTC)
You would have to use Lua for that. Keφr 21:26, 2 January 2015 (UTC)
I hope you mean you in the sense of "one". DCDuring TALK 23:06, 2 January 2015 (UTC)
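A minimal sketch of the idea discussed above, deriving the pronunciation of an initialism directly from the page name. It is written in Python rather than Lua purely for illustration, and it is not the code behind {{IPA letters}}. The function name and the letter-name table are assumptions; only the values for I, B and M are confirmed by the example above, and letters whose names vary by dialect (O, R, W, Z) are deliberately left out.

```python
# Hypothetical table of English letter names in IPA.
# Only I, B and M are confirmed by the {{IPA letters}} example above;
# O, R, W and Z are omitted because their names differ between dialects.
LETTER_NAMES = {
    "A": "eɪ", "B": "biː", "C": "siː", "D": "diː", "E": "iː", "F": "ɛf",
    "G": "dʒiː", "H": "eɪtʃ", "I": "aɪ", "J": "dʒeɪ", "K": "keɪ", "L": "ɛl",
    "M": "ɛm", "N": "ɛn", "P": "piː", "Q": "kjuː", "S": "ɛs", "T": "tiː",
    "U": "juː", "V": "viː", "X": "ɛks", "Y": "waɪ",
}


def ipa_for_initialism(page_name: str) -> str:
    """Spell out an initialism letter by letter, ignoring dots and spaces."""
    letters = [ch for ch in page_name.upper() if ch.isalpha()]
    missing = [ch for ch in letters if ch not in LETTER_NAMES]
    if missing:
        # Refuse to guess rather than emit a wrong pronunciation.
        raise ValueError(f"no letter name on file for: {', '.join(missing)}")
    return "/" + " ".join(LETTER_NAMES[ch] for ch in letters) + "/"


print(ipa_for_initialism("IBM"))
```

Raising an error for unknown letters, instead of skipping them, keeps the helper from silently producing a partial transcription that an editor would still have to check by hand.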

I support it. Some pages already use the right header (e.g. UFO). The example of fr.wikt shows that it's possible to do without these headers. Lmaltier (talk) 17:08, 3 January 2015 (UTC)

Filter created: Special:AbuseFilter/42. Keφr 19:17, 12 January 2015 (UTC)

Thanks. Once we clean up the instances of {{en-acronym}} and {{acronym-old}} and any associated Acronym L3 headers, which usually need Pronunciation sections too, we will have {{initialism-old}} and {{en-initialism}} to work on, adding {{IPA letters}} under Pronunciation. DCDuring TALK 22:50, 12 January 2015 (UTC)
Note the filter has not been enabled yet. Also, NEC has been modified not to offer the headers discussed here. Keφr 18:23, 14 January 2015 (UTC)
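For illustration only, the kind of check such a filter could make might look like the sketch below. This is Python, not AbuseFilter syntax, and it is not the content of Special:AbuseFilter/42; the function name and the exact regular expression are assumptions.

```python
import re

# Header names under discussion in this section.
DEPRECATED = ("Abbreviation", "Acronym", "Initialism")

# Match a wikitext header line such as "===Initialism===" at any level,
# requiring the same number of "=" signs on both sides.
HEADER_RE = re.compile(
    r"^(={2,})[ \t]*(" + "|".join(DEPRECATED) + r")[ \t]*\1[ \t]*$",
    re.MULTILINE,
)


def deprecated_headers(wikitext: str) -> list:
    """Return the deprecated header names found in the wikitext, in order."""
    return [match.group(2) for match in HEADER_RE.finditer(wikitext)]


sample = "==English==\n\n===Initialism===\n# International Business Machines\n"
print(deprecated_headers(sample))
```

A check like this only detects the header; deciding which real part of speech should replace it still has to be done by hand, as noted above.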

Can we insert proto‐words into translation tables?

If I desired to add Proto‐Germanic *kūz to a translation table for cow, could I get away with it? --Romanophile (talk) 01:26, 3 January 2015 (UTC)

No, as far as I know, the common agreement is that only attested terms can be in translation tables. —CodeCat 01:48, 3 January 2015 (UTC)
I for one do not want to see protolanguages in translation tables. —Aɴɢʀ (talk) 08:58, 3 January 2015 (UTC)
How about no links to unattested forms in translation tables? Renard Migrant (talk) 15:09, 3 January 2015 (UTC)
I wouldn’t mind reconstructed terms in translation sections. I bet there are more people who want to know what’s water in PIE or PG than people who want to know what it is in Minica Huitoto, Northern Emberá or Lijili. — Ungoliant (falai) 16:19, 3 January 2015 (UTC)
The information is of interest, but it doesn't belong in a translation table with attested words. Maybe in a "See also" section? Chuck Entz (talk) 17:28, 3 January 2015 (UTC)
I can't imagine anyone wanting a translation that isn't attested. That is, a translation which according to evidence, may never have been used by anyone, ever. Renard Migrant (talk) 19:52, 3 January 2015 (UTC)
I would not be opposed to things like Appendix:English–Proto-Indo-European glossary, Appendix:English–Proto-Semitic glossary, Appendix:English–Proto-Algonquian glossary and the like, though, just like proto-language words are in Appendix space rather than mainspace. —Aɴɢʀ (talk) 21:07, 3 January 2015 (UTC)
As a translation- no. The point is that someone might want to know about the history of a particular concept within a given language family. Right now, that requires either fishing through translations in descendant languages looking for references in the etymologies, or browsing through the appendix to find the entry. That's why I suggested putting it in "See also"- there's no implication beyond there being some connection that might be of interest. Of course, it could easily be overdone, but that just calls for restraint, not a categorical exclusion. Chuck Entz (talk) 21:18, 3 January 2015 (UTC)

Make Wikisaurus a separate wiki

Hopefully this doesn't come across as blasphemous, but it seems to me that Wikisaurus should be made into a separate Wikimedia wiki project instead of existing as a sub-namespace within Wiktionary. The way this project is currently, it is quite problematic to search for thesaurus entries on words, because they're mixed in with the dictionary entries, and "Wikisaurus:" works like a kludgy title prefix on every single thesaurus entry. Also, it's awkward within a Wikimedia project that every single linked thesaurus word on another thesaurus article entry must be wrapped in a "ws" template. It seems that ideally, it would be desirable for plain word links within Wikisaurus to link within Wikisaurus, and if we wanted any links between Wiktionary and Wikisaurus then that should use proper inter-wiki links. Also, the Wiktionary articles already have static synonym lists in them; wouldn't it be desirable to have this be dynamically generated cross-wiki from Wikisaurus? Also, wouldn't it be desirable to have an automatic reference link for every word entry between these two projects, similar to how we have template-based automatic links between Wikipedia and Wiktionary? It seems very substandard to keep Wikisaurus as simply a static sub-namespace, and I feel like it needs to be developed into something much more tailored to working like a thesaurus site. —This unsigned comment was added by Wykypydya (talk • contribs) at 11:23, 3 January 2015 (UTC).

Burn the witch!
No, seriously. It would complicate things for the sake of… what exactly? Most links in Wikisaurus do, in fact, point to regular entries. Keφr 12:01, 3 January 2015 (UTC)

No, I think that this belongs to the Wiktionary project. Here, words can be found through the search box, through categories and through thesaurus pages; all three methods should be kept, as each is very useful depending on what the user needs. A thesaurus page can provide thousands of words (not only synonyms); its contents cannot be moved to normal pages. But I would rename Wikisaurus to Thesaurus, which would be less kludgy... Something should also be done about phrasebook entries: they should be grouped into topical pages, just as in all paper phrasebooks, or they could be moved to another project (Wikiversity?) Lmaltier (talk) 17:22, 3 January 2015 (UTC)

Renaming rhyme pages vote

Could you please post your abstains (or other votes, as applicable) to Wiktionary:Votes/2014-09/Renaming rhyme pages so that the vote shows explicitly editor indifference, or maybe even gets a clearer outcome? Thank you. --Dan Polansky (talk) 11:50, 4 January 2015 (UTC)

Category:Terms by their individual characters by language automation in Lua

Happy new year to all. I've just adapted my "rare letters" identifier Lua module here to deploy this automatic categorization like on the French Wiktionary. This example works, but we'd rather invoke this module from the existing templates, like Template:en-noun. JackPotte (talk) 16:14, 4 January 2015 (UTC)

To clarify what was said at Module talk:languages, the idea is to include a list of the letters for a language in our data modules. That way, any entry containing a character not appearing in there can automatically be added to a category. The question, though, is which letters should be considered unusual in a given language. Some languages don't natively use letters like q or x, but still have them in many loanwords. —CodeCat 16:30, 4 January 2015 (UTC)
Also an implementation detail: While it seems sensible at first glance to list the letters only in one case form, case conversion is actually language-specific. A good example is Turkish: it has separate dotted İ i and dotless I ı. Using the "standard" case conversion would give incorrect results for Turkish. —CodeCat 16:33, 4 January 2015 (UTC)
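The Turkish problem can be shown with a small sketch. It is in Python for illustration (the actual module would be Lua), and the function name and the "tr" special case are assumptions, not existing module code. Escapes are used for the four characters involved so that they cannot be confused with one another.

```python
def lower_for_language(text: str, lang: str) -> str:
    """Lowercase text, special-casing the Turkish dotted and dotless i."""
    if lang == "tr":
        # U+0130 (capital dotted I) lowercases to plain "i";
        # plain "I" lowercases to U+0131 (dotless i).
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


word = "I\u015eIK"  # Turkish for "light", written in capitals
print(lower_for_language(word, "tr"))  # dotless i, as Turkish expects
print(lower_for_language(word, "en"))  # generic conversion gives dotted i
```

Handling the two letters before the generic conversion means every other character still goes through the standard lowercase mapping, so only the language-specific exceptions need to be listed.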
For example I've let "é" into this letter list for French (Module:Rare letters/data, which can be filled with upper or lower letters) in spite of the detection of nearly 150,000 entries. JackPotte (talk) 16:36, 4 January 2015 (UTC)
I don't think that's the right approach, though. Unicode contains thousands of characters. We want to list the characters that are not rare, so that any that are not in the list are categorised. —CodeCat 16:38, 4 January 2015 (UTC)
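A rough sketch of this allowlist approach, in Python for illustration only: each language lists its ordinary characters, and anything else in a page title triggers a category. The data table, the ignored punctuation and the function names are all hypothetical; the category naming follows the "terms spelled with" pattern mentioned elsewhere in this thread. The generic lowercase conversion used here would need language-specific handling, as noted above for Turkish.

```python
# Hypothetical per-language data: the characters considered ordinary.
ORDINARY_LETTERS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
}

# Characters that should never trigger a category (assumed list).
IGNORED = set(" -'")


def unusual_characters(title: str, lang: str) -> list:
    """Return the sorted characters in title that are not ordinary for lang."""
    allowed = ORDINARY_LETTERS[lang]
    return sorted(
        {ch for ch in title.lower() if ch not in allowed and ch not in IGNORED}
    )


def categories_for(title: str, lang: str, lang_name: str) -> list:
    """Build one category name per unusual character."""
    return [
        f"Category:{lang_name} terms spelled with {ch.upper()}"
        for ch in unusual_characters(title, lang)
    ]


print(categories_for("fa\u00e7ade", "en", "English"))
```

Because the table lists what is normal rather than what is rare, any character from the rest of Unicode is caught without having to be enumerated in advance.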
As a second step, we could gather in one category the letters that are unknown to the script. JackPotte (talk) 19:25, 4 January 2015 (UTC)
Right now, basically all the French letters (and English; the list for both is identical) with diacritics and all the digraphs are listed in Module:Rare letters/data as "rare". That's not just isolated silliness: it's created redlinks to Category:French terms spelled with À and Category:French terms spelled with Ç from its use in just one entry. If this template were implemented in all French entries with the current data, a substantial part of all the French entries would be in one or more of these categories, and many of the categories would be so huge as to be pretty much useless. Please don't add this to any more entries without a consensus as to which characters should be included in the "rare" lists. Chuck Entz (talk) 04:11, 6 January 2015 (UTC)
Actually, we are now all able to measure objectively and precisely how common these presumed rare letters are. For example, "é" appears in 341 342 of the 1 366 145 French articles, so 25%! Consequently, we can now remove it from our categories if the agreed criterion becomes 1%, in light of these overall figures. JackPotte (talk) 19:35, 6 January 2015 (UTC)
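The arithmetic behind that figure, and the proposed cut-off, can be sketched as follows. The function names are hypothetical, and the 1% threshold is only the value floated above, not an agreed criterion.

```python
def share(entries_with_letter: int, total_entries: int) -> float:
    """Fraction of entries whose title contains the letter."""
    return entries_with_letter / total_entries


def is_rare(entries_with_letter: int, total_entries: int,
            threshold: float = 0.01) -> bool:
    """A letter counts as rare if it appears in fewer than threshold of entries."""
    return share(entries_with_letter, total_entries) < threshold


# Counts quoted above for "é" among French entries.
print(f"{share(341342, 1366145):.1%}")
print(is_rare(341342, 1366145))
```

With these counts the share is just under a quarter, far above a 1% cut-off, so "é" would not be treated as rare under that rule.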
I think we need to discuss why we want such categories: remember that categories aren't for classification, but for navigation. The question you have to ask is "why would someone want a list of entries with a given letter in it?". I would say that relative frequency doesn't make such a list of interest: even the rarest letters in the English alphabet such as j, q and x are found in lots of ordinary words; it's only unusual contexts like word-initial x or q not followed by a vowel that are of interest. French letters in the normal French alphabet aren't worthy of categorization: who wants to look through a huge list of French entries with ç in them? It's only letters that aren't part of normal French, and that make a word unusual by their presence, that should be categorized. Chuck Entz (talk) 15:03, 8 January 2015 (UTC)

Lemma dilemma

My issues are more involved than simply a lemma, but the title sounded so cool.

I've got some experience on Wikipedia, but I'm working on my first Wiktionary entry. It is the Indonesian word pengairan which is not yet complete. A number of things came to my mind as I attempted to make this word part of Wiktionary.

  1. What is a lemma? According to Wiktionary:Lemmas, "When a word has multiple distinct forms, the lemma is the main entry at which the definitions, etymology, inflections and such are placed." Also, "For nouns, the lemma is normally the form that is used as the singular subject of an intransitive verb." Pengairan meets these criteria except for the etymology, since that information belongs with the root air, the prefix peng- and the suffix -an used to form the word. However, if we look at the second Wiktionary definition of lemma, it says, "The canonical form of an inflected word; ie the form usually found in dictionaries." Here is where the conflict arises. Pengairan will certainly be found in Indonesian dictionaries. However, it will be listed under a for air, not under p. This is true of all Indonesian words formed by a base and one or more prefixes and/or suffixes. By way of example, the official dictionary of the language printed by the Indonesian government has a listing for air. It gives two definitions. These are followed by six well-known proverbs in which the word is used with each proverb explained. Next, there are 171 idiomatic phrases that begin with the word air. Finally, there are six words derived from air listed: mengairi, pengairan, berair, perairan, berpengairan and keairan. None of these words can be found in the dictionary under m, p, b or k. Indonesian dictionaries treat words like this in a manner similar to the treatment English dictionaries afford inflections. In that regard, these words aren't lemmas, since the entire concept of a lemma is different. Were a dictionary of the Indonesian language to be written (for printing on paper) using the methodology employed in writing English dictionaries, these words would certainly have separate entries. For that reason, I've placed pengairan into the category for Indonesian lemmas. However, I would like to know what others think. 
Should the lemma category for non-English words be for things that are lemmas as determined in the way they are in English, or should the foreign language's concept of a lemma be used?
  2. Other than pronouns, Indonesian nouns are pluralized by doubling them. So, the plural of pengairan is pengairan-pengairan. Alternatively, the word para can be placed in front of the noun to indicate plurality as para pengairan. I created an entry for pengairan-pengairan. However, should we consider something else? Every countable Indonesian noun other than pronouns would have an entry that simply doubles the word. This seems to be a large number of entries that may be better handled another way. It might make sense to create an entry for pengairan-pengairan and have it redirect to pengairan. On the pengairan entry, the plural could be shown as an inflection but not linked. Folks should be mindful that plural forms are only used in Indonesian when they are needed to disambiguate. For instance, the word for dog is anjing, and the word for cat is kucing. People say, "She has two anjing and three kucing," not "two anjing-anjing and three kucing-kucing," since the context makes it obvious that the word is plural. This means the plural inflection has less importance in Indonesian than it does in English, and doesn't appear often.
  3. Aside from plurality, other inflections apply to many Indonesian words. Nouns do not have possessive forms. Instead, they just follow the nouns they own. For example, "anak" means child, and "anjing anak" means the child's dog. However, Indonesian nouns do have possessed forms if they are owned by singular pronouns. These are the same for every Indonesian noun that can be owned. So, "anjingku" means my dog, "anjingmu" means your (singular) dog, and "anjingnya" usually means his, her or its dog. The -nya suffix can also act as a definite article. So, "anjingnya" can also mean the dog. Either way, it adds specificity to the base word. There are no irregular forms of these. They apply to every Indonesian noun. Should separate entries be created for pengairanku, pengairanmu and pengairannya, since each of these is a separate word?
  4. Another inflection issue in Indonesian is the suffix -kah. If pengairan were to appear as the first word of a question, it could be pengairankah. This alerts the reader or listener that a question is coming without changing the meaning of the word. It is only used where the writer or speaker thinks it is necessary. Often, this would be at the start of a lengthy question. Usually, voice inflection alone is sufficient to make a sentence an obvious question to a listener. Question marks are used to end written questions as in English. So, -kah is omitted far more often than it is used. Should a separate entry be created for pengairankah, since it is a separate word? A vast number of Indonesian words can be used to start questions meaning each of these would have an entry for the -kah inflected form.
  5. Finally, there is also an inflected form with a -lah suffix. This is most commonly used to soften a message. For example "duduk" means to sit. It can stand alone as a sentence in the imperative mood. However, by itself, "Duduk," can sound like a command given to a child. English speakers inviting someone into their homes might say, "Please, sit," as they welcome their guests. Indonesians can say "Silakan duduk," which means the same thing or simply "Duduklah," which isn't the same as saying please but is also not a stern command; it is polite. The suffix -lah can also be used to emphasize a word and will usually appear as the first word of the sentence. For example, "Pengairanlah bisa memecahkan masalah kami," is the same as saying, "IRRIGATION can solve our problem." The suffix -lah serves to highlight pengairan. This usage is fairly common and can apply to words across different parts of speech. Does pengairanlah merit its own entry in Wiktionary?

Of course, these questions are not really all about pengairan. They are about how these peculiarities in the Indonesian language should be handled in Wiktionary. Perhaps similar issues have arisen with other languages (or maybe these issues with Indonesian have been discussed before). I'd be delighted to read through such prior discussions if someone could point me in the right direction. Thank you for your patience, if you made it all the way here, and thank you for any thoughtful reply you provide. Taxman1913 (talk) 20:26, 6 January 2015 (UTC)

  1. Why is pengairan not considered a lemma by other dictionaries? Being morphologically predictable is one thing, but morphology alone is not enough to make a word. For example, in English, paintify is morphologically and semantically transparent (the noun paint + -ify), but that doesn't mean that people actually use it. Considering pengairan a non-lemma form of air would only make sense if any noun has a peng- -an form.
  2. I would say that the Indonesian plural shouldn't be included, as it is fully predictable, has no exceptions, and is written in a form that allows its parts to be recognised easily. This is similar to, for example, the English negative form of a verb, which just consists of a form of do paired with not and the infinitive. There's no objection to including it in an entry without a link, but that would only be useful for absolute beginners. I know no Indonesian but I still know that doubled nouns are plural.
  3. I think the possessive forms of nouns should be included, as unlike with plurals, it's not obvious what parts pengairanku is made up of at first glance. Likewise for the definite form. Irregularity doesn't really matter, compare for example Esperanto where all words are regular, but we still include inflections for it. I think we also have possessive entries for Hungarian, Finnish and Turkish? It's not clear where to draw the line, though.
  4. If -kah can be attached to any word, then it behaves similar to the Latin suffixes -que, -ve and -ne, the English -'s, or the Finnish -kin and -kaan. We don't include entries for words which have these suffixes. I'm not sure why not, exactly, but it could just be a purely practical consideration: if they can be attached to any word, then you'd have to double, triple or quadruple the number of entries (lemma and non-lemma) for that language, without a very clear benefit. See also the deletion discussion for satisne.
  5. This seems more or less the same as above; if it can be attached to any word, it's probably better not to have entries for it.
CodeCat 20:53, 6 January 2015 (UTC)
Note that some suffixes are systematic in English, too (e.g. -like can be attached to any noun, according to my Pocket Oxford Dictionary), but this is not a reason to omit these words provided that they are attested. In my opinion, requiring attestations is sufficient, and this requirement would drastically reduce the number of potential entries. Lmaltier (talk) 21:26, 6 January 2015 (UTC)
The point about Indonesian lemmas is that given the second definition of lemma right here on Wiktionary, pengairan doesn't look like one, because you will not find it under "p" in an Indonesian dictionary. It will be under "a" for air. The Indonesian government's official dictionary defines the word lema (Indonesian for lemma) as (1) Kata atau frasa masukan di kamus di luar definisi atau penjelasan lain yang diberikan di entri; (2) butir masukan; entri. I'll translate that as (1) The word or phrase of a dictionary entry outside the definition or other explanation given in the entry; (2) An entry. Masukan (as used here) and entri are synonyms. The definition of masukan is not helpful as it is frequently used outside linguistics contexts. Entri is defined as (Linguistics) (1) Kata atau frasa di kamus beserta penjelasan maknanya dengan tambahan penjelasan berupa kelas kata, lafal, etimologi, contoh pemakaian, dan sebagainya; (2) lema. I'll translate that as (1) A word or phrase in a dictionary along with the explanation including additions to the explanation in the form of part of speech, pronunciation, etymology, usage examples, etc.; (2) lemma. So the first definition of lema could lead to a conclusion that air is a lema while its derived words are not, since they are presented in the explanation and not outside it. However, with lema and entri being used to define one another in secondary definitions, we are left wondering whether lema could also include the entirety of a dictionary entry as entri clearly does. Whether pengairan is deemed a lemma by Indonesian linguists has no bearing on whether it merits its own entry in Indonesian dictionaries. It is a derived term and listed under its base word which makes it look like something other than a lemma according to the definition of a lemma on Wiktionary. Taxman1913 (talk) 21:56, 6 January 2015 (UTC)
CodeCat, your point about pengairan being a lemma unless all Indonesian nouns have peng-*-an forms is an excellent one. The prefix peng- is a form of the prefix pe- which can also take the form of pem- or pen- depending upon the first letter of the base word. Both nouns and verbs can have one or both of these affixes added to them. For example, main means to play, pemain is player, mainan is toy, and pemainan is game. While these affixes are common and can be used with both nouns and verbs, Indonesian people aren't free to just make up all kinds of words by using them. According to the government's dictionary, pengair and airan are not words. Similarly, a word I used above, lema, has no derived forms resulting in pelema, lemaan or pelemaan. So, it is clear that words formed in this manner have unique characteristics that lend much credence to the argument that they are truly lemmas. Taxman1913 (talk) 22:19, 6 January 2015 (UTC)
CodeCat, your reasoning about Indonesian plurals makes sense. Unless someone presents a compelling reason why an entry for each of them is needed, I'm going to delete the one I created. Taxman1913 (talk) 22:22, 6 January 2015 (UTC)
I think attestation of the specific possessed inflections of Indonesian nouns as suggested by Lmaltier would drastically reduce the number of entries. How often will someone use the phrase "my irrigation"? While the construction of pengairanku is obvious to me, I can see CodeCat's point that most users of English Wiktionary not familiar with Indonesian word construction will disagree. So, if attested, the word has a place here. Taxman1913 (talk) 22:32, 6 January 2015 (UTC)
Someone may not normally say "my irrigation", but Wiktionary entries don't always bother to indicate which parts of a word's inflections don't exist, especially if there are many of them. For example, it's clearly possible for one form of a Latin verb, out of the dozens it has, to happen to not ever be used (at least that we can tell). But it wouldn't make much sense to require excluding this in each case individually; it would become a nightmare to manage it. So for the sake of convenience, we sometimes list possible forms, even if they may not actually be used very much, or at all. This is not a hard rule of course, because in some cases we do exclude certain parts of the inflection, like comparatives for adjectives that are not comparable. For Indonesian nouns, I suppose we could include an approach that's somewhat similar, with "ownable" being like "comparable": there would be ownable and unownable nouns.
But we have to be careful, as someone might not normally use a form, but they still exist in case anyone wants to. For example, rain and its translations in many languages are often considered to be "impersonal" verbs, having only a third-person singular form (or equivalent), as there is not normally any specific thing that rains. But in figurative meanings or poetry, you might indeed find we rained. So we shouldn't be too quick to say that an Indonesian noun is unownable, because even though it's not "logical", there may be cases where someone has used those forms anyway. —CodeCat 22:43, 6 January 2015 (UTC)
I wouldn't consider rain an impersonal verb unless you promise never to rain on my parade. Taxman1913 (talk) 22:51, 6 January 2015 (UTC)
CodeCat, I agree with you about -kah and -lah forms. They are even more common than the Latin -que and the English -'s, since those are for nouns. These two can be attached to just about anything including prepositions and adverbs. But your mention of -que and -'s makes me think once again about -ku, -mu and -nya. Perhaps these are overkill as well. Even though pengairanku might not look obvious, are good quality entries for both pengairan and -ku sufficient? Even if attested, perhaps they're unnecessary. I could probably find an example of dog's somewhere. But we don't need a definition for it on Wiktionary. Taxman1913 (talk) 22:46, 6 January 2015 (UTC)
I'm going to work on creating an Indonesian noun template and circulate it for commentary. Taxman1913 (talk) 22:53, 6 January 2015 (UTC)
Actually, Latin -que can be attached to anything as well. It's often found attached to conjunctions for example, or indeed any other word that appears first in a clause.
I understand your point about the possessives, and there isn't really a clear line we can draw to say "this is ok to include" and "this is not ok", or at least not without just making up an arbitrary reason. The best I can come up with is this: Possessives only modify the meaning of the noun they attach to, so they are "word-level" suffixes. But -kah and -lah are "clause-level" suffixes; they change the sense of whole sentences at a time. At least, that's how I understand it; please correct me if I'm wrong. So with that distinction, we could decide on a rule that words with clause-level suffixes/enclitics don't get their own entry.
We can also think of practical reasons. For possessives, we know that only nouns have them, and only some nouns at that (going by the "my irrigation" example). But the other suffixes can be attached to anything, and I'm guessing that they can even attach on top of other suffixes like possessives, and perhaps even on top of each other (is -lahkah possible, or something similar?). The number of possible combinations quickly becomes unmanageable. So we could decide to limit it to possessives, just to avoid creating a huge mess of possible combinations. —CodeCat 22:57, 6 January 2015 (UTC)
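To put a rough number on "unmanageable", here is a minimal Python sketch that enumerates the predictable forms of a single noun. The suffix inventories come from this discussion; the slot model (at most one possessive suffix followed by at most one clause-level suffix, on either the plain or the doubled noun) is an assumption made for illustration, not a description of Indonesian grammar, and `predictable_forms` is a hypothetical helper name.

```python
from itertools import product

# Suffix inventories taken from the discussion above. The slot model (one
# optional possessive, then one optional clause-level suffix) is an assumption.
POSSESSIVE = ["", "ku", "mu", "nya"]  # -nya also serves as the definite marker
CLAUSE = ["", "kah", "lah"]           # clause-level enclitics


def predictable_forms(noun: str) -> list[str]:
    """Return every regular form of `noun` under the assumed slot model."""
    forms = []
    for base in (noun, f"{noun}-{noun}"):  # plain noun and doubled plural
        for poss, clause in product(POSSESSIVE, CLAUSE):
            forms.append(base + poss + clause)
    return forms


forms = predictable_forms("pengairan")
print(len(forms))  # 2 bases x 4 possessive options x 3 clause options = 24
```

Even this conservative model yields 24 spellings per noun, the lemma included, and allowing stacked clause-level suffixes such as a hypothetical -lahkah would raise the count further.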
The Latin I studied for four years in high school is obviously rusty after 30 years. I only recalled the usage of -que as a conjunction for nouns until consulting the definition. Hopefully, Fr. Tighe and Mr. Scott can find it in their hearts to forgive me.
Your word-level/clause-level theory makes sense, and it points me more firmly into the direction of not including an entry for Indonesian possessed forms. You are correct that -kah and -lah change the sentence without changing the meaning of the word to which they attach. The change is subtle; I would say they only change the tone of the sentence, not its basic meaning. I've spent about 85% of the past four years living in Indonesia and have not come across -kah and -lah used in combination, but it is easy to imagine someone wanting to do that. The logical form would be -kahlah as in "Pengairankahlah belum digunakan di sini?" Which translates as "Hasn't IRRIGATION been used here yet?" This accomplishes sending out the question alert with the opening word of the sentence while also highlighting that word. So, -kahlah would be another possibility that could be affixed to nearly all Indonesian words including those to which -ku, -mu or -nya is already attached.
Turning back to the Indonesian possessed forms, the suffixes seem to me to also change the meaning of the sentence rather than the word. The mechanics are closer to the Latin -que than to the English -'s in this regard. Consider:
  • dog's = belonging to the dog; this word is no longer purely a noun.
  • puerque = and the boy; the meaning of puer is unchanged, and -que ends up replacing what could be written as a separate word in the sentence as in et puer.
  • anjingku = my dog; the meaning of anjing is unchanged, and -ku ends up replacing what could be written as a separate word in the sentence as in anjing aku.
  • anjingmu = your (singular) dog; the meaning of anjing is unchanged, and -mu ends up replacing what could be written as a separate word in the sentence as in anjing kamu.
  • anjingnya = his dog; the meaning of anjing is unchanged, and -nya ends up replacing what could be written as a separate word in the sentence as in anjing dia.
Illustrating it in this way leads me to conclude that the Indonesian possessed forms are mechanically identical to Latin words constructed with -que and do not need separate dictionary entries.
This is also true when -nya is used as a definite article.
  • anjingnya = the dog as opposed to dog; the meaning of anjing is unchanged, and -nya has its own meaning that just happens to be attached to the word.
The definite article is only used in Indonesian for disambiguation. Since the construction is the same as the third person singular possessed inflection, it often is impossible to tell what was in the speaker's mind when it was used as in "Gadis takut karena anjingnya." This could be "The girl is afraid of the dog," or "The girl is afraid of his dog," when translating to English. Only context can provide the correct answer. It really doesn't matter much. We can understand that there is a specific dog that makes the girl afraid. If we said, "Gadis takut karena anjing," we would understand that the girl is afraid of all dogs. No matter what, anjing remains dog and doesn't become something else with -nya affixed to it.
I'm in favor of no separate entries for Indonesian words with -kah, -lah, -ku, -mu or -nya suffixes. I also feel that separate entries are not needed for Indonesian plurals that are simply doubled. Hopefully, I can create a template that will present this in a clear way. I'm not sure there are nouns that cannot be owned. For instance, we might think arithmetic cannot be owned until someone says, "Your arithmetic is flawed." Nevertheless, I'll build an option into the template to make nouns unownable. Taxman1913 (talk) 07:46, 7 January 2015 (UTC)
But we do include possessives for Hungarian and Turkish nouns. Compare hal or hal for example. I don't think the Indonesian possessives work any different from that. So at least there is precedent for including them. I disagree with you that they work more like Latin -que, because like I said the possessives only modify a single word. "my" clearly belongs to "dog" and can't somehow be applied to the whole sentence. Your example of "puerque" shows that -que can also apply to a single word, but I think that's a wrong way of seeing it. -que joins two words, or two clauses, and therefore applies to more than just a single noun. Furthermore, you can imagine saying "a big dog and a small cat"... which word does -que attach to then? I would guess that it attaches to "small", which clearly shows that it's not about the noun, but about the whole phrase, and that it always attaches to the first word in the phrase. How would this work for Indonesian possessives? —CodeCat 15:12, 7 January 2015 (UTC)
Since Latin word order is free, -que attaches to whichever word is first: canis magnus felisque parvus or magnus canis parvusque felis, which just proves your point. —Aɴɢʀ (talk) 15:52, 7 January 2015 (UTC)
Here are a few Indonesian phrases for us to kick around:
  • a big dog and a small cat = seekor anjing besar dan seekor kucing kecil
  • my big dog and my small cat = anjing besar dan kucing kecilku
  • your big dog and her small cat = anjing besarmu dan kucing kecilnya
  • a big dog and my small cat = seekor anjing besar dan kucing kecilku
  • his big dog and a small cat = anjing besarnya dan seekor kucing kecil
We can see from the above that when an adjective modifies the possessed noun, the possessive suffix usually attaches to the adjective which ordinarily follows the noun. Some adjectives (usually those indicating quantity) normally appear in front of the nouns they modify. Here are more illustrative phrases:
  • many dogs = banyak anjing
  • my many dogs = banyak anjingku
Here the possessive suffix attaches to the noun, because it is the final part of the phrase that is owned. This is a consistent characteristic. Similarly, we consistently apply this in English in the opposite direction. We say, "my many dogs," not "many my dogs" unless we are in a particularly poetic mood.
Indonesian adjectives modifying two nouns joined by a conjunction typically refer to both nouns. This leads to more illustrations:
  • blue pants and blue shirt = celana dan kemeja biru
  • my blue pants and my blue shirt = celana dan kemeja biruku
Here we see -ku attached to an adjective that simultaneously modifies two nouns and indicates that the entire phrase is possessed.
I look forward to reading thoughts on these. Taxman1913 (talk) 18:53, 7 January 2015 (UTC)
These examples have convinced me that the Indonesian possessive suffixes are clitics and operate on a phrase-level rather than on a word-level. The crucial part, to me, is that -ku attaches to the last word, irrespective of what kind of a word it is. The suffix is evidently parsed as including the entire preceding noun phrase. So I don't think we should have entries for possessives. Is it the same for the definite suffix -nya? —CodeCat 19:01, 7 January 2015 (UTC)
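The phrase-level behaviour described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this thread (the suffix lands on the last word of the possessed phrase, whatever its part of speech); `attach_possessive` is a hypothetical helper, and it deliberately ignores the ambiguity cases raised elsewhere in the discussion.

```python
def attach_possessive(phrase: str, suffix: str) -> str:
    """Attach a possessive clitic (ku, mu, nya) to the last word of a phrase.

    Models only the rule stated in the discussion: the clitic follows the
    whole noun phrase, so it ends up on whichever word comes last.
    """
    words = phrase.split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    words[-1] += suffix
    return " ".join(words)


# Phrases taken from the examples earlier in the thread.
print(attach_possessive("anjing besar", "mu"))
print(attach_possessive("banyak anjing", "ku"))
print(attach_possessive("celana dan kemeja biru", "ku"))
```

Because the function never needs to know which word is the noun, it behaves like a clitic on the phrase and not like an inflection of the noun, which is the point made above.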
I'm curious though, how you would write something like "a dog and a small cat" or "dogs and my cats"? You'd have to clarify in this case that the adjective or possessive applies only to the last noun phrase. —CodeCat 19:04, 7 January 2015 (UTC)
The definite article suffix -nya works the same way as the third person singular possessed -nya. It wraps around an entire phrase where applicable.
These three suffixes show up in other places as well. They act in a manner similar to contractions at the end of some transitive verbs and prepositions. Here are some illustrations:
  • bagi = for
  • bagimu = for you, effectively a contraction of bagi kamu
  • melihat = to see
  • melihatku = to see me, effectively a contraction of melihat aku
This also affects a vast number of words. Any thoughts? Taxman1913 (talk) 19:35, 7 January 2015 (UTC)
My first thoughts on your questions were
  • seekor kucing kecil dan seekor anjing
  • kucing-kucingku dan anjing-anjing
Reversing the word order solves the problem. That's what I would do to be sure I was understood. My wife, a native speaker, says that if the context is known, an adjective or a possessive suffix could apply to only the second of two nouns linked with a conjunction. Conversely, she suggests that redundant use of adjectives and/or possessive suffixes may be warranted where the listener might incorrectly assume they only apply to the second of two nouns linked by a conjunction. She offers the following:
  • seekor anjing dan seekor kucing kecil
    • The presence of seekor as an indefinite article is sufficient to tip off the reader/listener that kecil only modifies kucing. Articles are frequently omitted in Indonesian in places where they would certainly be used in English. Indonesian indefinite articles are cumbersome. They vary depending upon the type of noun. The most common indefinite article is sebuah. Others include sebiji, sebatang and sebutir. Seekor is used for animals. Ekor means tail. They are all multisyllabic and force you to take a pause. In my wife's opinion, this is enough to separate the two linked nouns.
For "dogs and my cats," my wife agrees that reversing the word order is the best way to ensure you're understood. You could just go with anjing-anjing dan kucing-kucingku if the context makes it obvious that only the cats are yours. I should point out that if the listener already knows there is more than one dog and more than one cat, the nouns wouldn't be doubled. Taxman1913 (talk) 20:10, 7 January 2015 (UTC)
The same ambiguity issue can arise in English with the adjective at the front. "I went on vacation and spent the week drinking hot tea and beer." Alternatively, "I went on vacation and spent the week drinking cold beer and tea." Both of these are ambiguous. We can make the sentence more clear by using an adjective to modify each direct object. The same can be done in Indonesian to produce more clarity. Taxman1913 (talk) 20:44, 7 January 2015 (UTC)
My two cents:
With agglutinative languages, there are many components, which are attached to words. It's not really necessary to create all forms, IMHO, even if they are spelled without a space. Japanese and Korean have a lot of forms, which are attached, e.g. 커피하고 머핀 (keopihago meopin) "coffee and muffin" where 하고 (hago) is attached to 커피 (keopi). Indonesian doesn't seem to have a lot of such forms and possessive forms are quite predictable. Arabic also has enclitic possessive suffixes - my, your, his, etc. It's part of the language grammar. E.g. بَيْتِي (baytī) means "my house" = بَيْت (bayt) + enclitic ـِي (ī) but I don't think we need such entries. Arabic doesn't need definite entries like الْبَيْت (al-bayt) "the house", even if the definite article is attached to the beginning of a word.
There is no consistency in how some languages are treated. Scandinavian languages include definite forms of nouns but Bulgarian/Macedonian don't. Albanian definite nouns change their forms, e.g. feminine indefinite -ë changes to -a, so it would probably make sense to include them but redirects or definite forms in the header would suffice.
The case with "pengairan" is trickier. It reminds me of Arabic roots. Best Arabic dictionaries store information by the root consonants and having all words separately sometimes makes dictionaries too large but it is sometimes hard to determine the root consonants, e.g. فَتَاة (fatāh) is derived from ف ت ي (f-t-y) or ف ت و (f-t-w). So, of course, it's better to have separate entries.
Wiktionary doesn't boast good coverage of Indonesian. So it's better to focus on lemmas, not predictable forms, IMO.--Anatoli T. (обсудить/вклад) 23:04, 7 January 2015 (UTC)
Two cents? That was worth at least a dollar, Anatoli. Thank you for such a thoughtful and thorough analysis. It seems there are no objections to excluding -ku, -mu, -nya, -kah and -lah forms of Indonesian nouns from having their own entries. That would apply to adjectives as well, since the analysis of the issue would be identical. Further, plurals of Indonesian nouns do not need entries. Would it be useful or advisable to have an infobox for Indonesian nouns and adjectives alerting readers to the possibility that the word may appear with one or more of the aforementioned suffixes attached? If so, is the best place for this with the headword or in the area where declensions would go? Taxman1913 (talk) 19:47, 8 January 2015 (UTC)
I tried to make pengairan-pengairan completely blank assuming that would inspire a bot (or a human) to delete it. The word cannot be attested and, therefore, doesn't meet the criteria for inclusion. The system thought I was vandalizing and wouldn't let me make it completely blank. It insisted I leave a language and part of speech on the page which I did. It suggested I contact an administrator if I think my edit was constructive. One would think something so sophisticated to protect against vandalism would provide a link to let an administrator know what you tried to do. But that isn't the case. Taxman1913 (talk) 20:00, 8 January 2015 (UTC)
Your edit showed up in Special:AbuseLog. Though I think only I look at it semi-regularly (mostly watching out for spam filter false positives). If you had used {{delete}} or {{rfd}}, the filter would not have been triggered. Whether the request would have been granted is another matter of course. Keφr 20:04, 8 January 2015 (UTC)
Thank you, Keφr. Taxman1913 (talk) 20:36, 8 January 2015 (UTC)

Rename tr= parameter to xlit= or similar

There are quite a few entries where tr= has been interpreted as meaning "translation". This is not very surprising. To prevent this kind of misunderstanding and confusion, I'm proposing to rename this parameter to xlit=, or something else that is less liable to be confused. —CodeCat 18:31, 7 January 2015 (UTC)

On the French version, we use "R=" like "Romanization". JackPotte (talk) 19:50, 7 January 2015 (UTC)
It should be easy to detect when someone is using it for translation just from the characters, no? (or with a language that should never be transliterated). DTLHS (talk) 19:54, 7 January 2015 (UTC)
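A crude version of such a check could look like the Python sketch below. It is a heuristic under stated assumptions, not an existing Wiktionary tool: it treats a tr= value as suspect when the term itself is already in Latin script (so no romanization is needed) or when the tr= value is not in Latin script. The function names are hypothetical, and the check cannot tell an English gloss from a romanization when the term is in a non-Latin script.

```python
import unicodedata


def is_latin_script(text: str) -> bool:
    """True if `text` has letters and all of them are Latin-script letters."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def tr_looks_suspect(term: str, tr: str) -> bool:
    """Flag a tr= value that is unlikely to be a romanization of `term`.

    Assumption: a term already in Latin script needs no romanization, and a
    romanization should itself be written in Latin script.
    """
    return is_latin_script(term) or not is_latin_script(tr)


print(tr_looks_suspect("anjing", "dog"))   # Latin-script term: flagged
print(tr_looks_suspect("بَيْت", "bayt"))     # plausible romanization: not flagged
```

Romanizations that use modifier letters outside the Latin blocks would be flagged by this sketch, so any real check would need a per-language allowance.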
PS: Romanization is a hypernym of transcription and transliteration. JackPotte (talk) 19:58, 7 January 2015 (UTC)
|r= looks good. It's not misleading, also it's shorter. "Romanization" is also more accurate to describe what we pass to the parameter. --Z 22:36, 7 January 2015 (UTC)
  • Don't rename. Rare confusion is not a sufficient reason for this kind of deviation from the previous practice. --Dan Polansky (talk) 22:39, 7 January 2015 (UTC)
    • And established practice (that is, the inertia of established editors), is not an argument against improvement. —CodeCat 22:50, 7 January 2015 (UTC)
      • I also don't think a change is necessary. It seems like a solution in search of a problem. —Aɴɢʀ (talk) 22:52, 7 January 2015 (UTC)
      • Established practice is an argument against low value added change, whether considered an improvement or not. Changes have costs, and these have to be weighed against minor or even hypothetical benefits. I ask the reader to check various historical revisions of our mainspace pages and see how changes made by CodeCat had drastically reduced legibility of these revisions, where the legibility is reduced by missing templates and by various module errors produced by templates. It is a pitiable state of affairs to look at. --Dan Polansky (talk) 22:55, 7 January 2015 (UTC)
        • Writing a dictionary in plain text was a terrible idea from the start and it still is, which is why we need all the templates to compensate, and why our templates and other technical infrastructure need so many revisions. I can't help it if Wiktionary was badly designed from the start. That's no reason not to try to fix it up some. I will continue to try to improve Wiktionary to the best of my ability, and provide a counterbalance to those who do otherwise, as long as I am active on this project. —CodeCat 22:59, 7 January 2015 (UTC)
          • I could not disagree more. I love wikitext, I love Wiktionary design and I dread the day on which the likes of you are going to remove the wikitext from the editor, and lock it down behind a WYSIWYG interface. --Dan Polansky (talk) 23:06, 7 January 2015 (UTC)
            • Sadly, that day won't come. My hope is to make Wiktionary semantically parseable, ideally something that a script could convert to another format like XML. In other words, to make Wiktionary's code about content only, and not about presentation. —CodeCat 23:10, 7 January 2015 (UTC)
              • "Established practice (that is, the inertia of established editors), is not an argument against improvement". It should be in template-space, CodeCat. Moving, renaming, and adding or removing parameters confuses the shit out of editors. You and Kephir take it far too lightly. Purplebackpack89 23:13, 7 January 2015 (UTC)
            • Love makes you blind, apparently. Wikitext has quite obvious limitations and drawbacks when it comes to creating a dictionary. If it were not for the comprehensive template infrastructure, maintaining category lists would be burdensome, and categories would often fall out of sync with the rest of entry content (just look at pages "tagged but not listed" on RFX). If it were not for standardised headers, distinguishing between definitions, etymologies and synonym lists would be much harder, if not impossible. Typing wikitext manually is error-prone; if it were not for sanity checks in templates and modules, the errors might linger in entries indefinitely. Remembering how every single template should be used is tiresome; even I cannot be bothered to do it, and I would be happy if I could just assume every template works the same. There are tons of boilerplate in each entry — entries being free-form wikitext pretty much forces us to have it; the alternative is complete chaos. Our translation lists would most probably be much smaller if we did not have WT:EDIT. If it were not for edit filters, the flood of spam and vandalism might be unmanageable. And each of these solutions I just mentioned is a brittle workaround, and entries still fall between the cracks anyway. Splitting entries into per-language pages gets proposed every once in a while, and is probably the most desirable feature of all, but if we were actually to do it, it would be a nightmare from both technical and copyright standpoints. Wikitext talk pages are obnoxious for those who know how to use them and confusing for those who do not. Free-form wikitext is an awful foundation on which to build a dictionary. Now waiting for the inevitable liberum veto thought-terminating cliché of "I disagree, just because". Keφr 10:13, 10 January 2015 (UTC)
Unfortunately, anything you propose tends to attract a disproportionate amount of vehemence and bitterness from certain quarters, which is getting very old. Still, on the merits, I'm very leery of changing something that's second nature to anyone who's edited much in non-Latin scripts just because of a small percentage that are confused (there are probably more who mistakenly use the language code "dk" for Danish, but no one is arguing we change that). I think the potential to alienate large numbers of our hardest-to-find type of editors by forcing them to relearn a basic part of their editing routine outweighs solving such a minor problem. Chuck Entz (talk) 03:37, 8 January 2015 (UTC)
Oppose. It’s not that common, in my experience. — Ungoliant (falai) 17:20, 8 January 2015 (UTC)
More pathological need to change things from CodeCat. Could you change other parts of your life and leave us alone? If the need is so strong you must change all parts of your life at all time, have you considered talking to a doctor about it? No this isn't sarcasm. Renard Migrant (talk) 17:28, 8 January 2015 (UTC)
Why don't you just go fuck yourself? That's a change I could agree with. —CodeCat 18:02, 8 January 2015 (UTC)
Could you answer the questions? Renard Migrant (talk) 22:28, 8 January 2015 (UTC)
  • Oppose: Because of what I said about moving stuff confusing people. Purplebackpack89 17:38, 8 January 2015 (UTC)
  • Count me as opposing. I am not convinced that the benefits outweigh the costs. Keφr 20:05, 8 January 2015 (UTC)
  • Oppose. Though the recommendation CodeCat proposes would have been a good idea long ago, there is no evidence that this is a significant problem. In fact we have a completely unsubstantiated claim that it is sometimes a problem without a single instance. If (1) it isn't worth collecting the evidence that it is a problem and (2) most contributors to this discussion don't think it is a significant problem, then we should not waste more time on this. DCDuring TALK 22:30, 8 January 2015 (UTC)

Project Wiktionary Meets Matica Srpska

We would like to inform you about the project we are starting. The project's aim is to increase support for the Open Knowledge / Free Content movement by establishing a long-term strategic partnership with the venerable cultural institution Matica srpska, and to increase the quality, accuracy and volume of Wiktionaries through digitization of two dictionaries, while developing a potential model for future cooperation across Wiktionaries through targeted mobilization of the communities.

There are two activities within the project that we need your support for. First, we are preparing a list of lexicographical terms (it would contain approximately 100 terms) that needs to be translated into as many languages as possible, in order to ensure further work on the project and to create the foundations for other lexicographical projects in the future. For this task, we would use a separate application, but all of the terms would be inserted into Wiktionaries, as well (making approximately 10,000 new entries per Wiktionary, counting that the terminology would be translated into 100 languages). That would also serve as the preparation for translating the Serbian Ornithological dictionary, which is the second activity.

The Serbian ornithological dictionary encompasses all local names of birds living on the Serbian speaking territory. All names are specified under the appropriate Latin name in accordance with the contemporary classification system. This creates the opportunity to translate it easily to various languages, since the basic list of terms is in Latin and it is fairly small (370 species of birds).

We would like to try to motivate as many Wiktionary communities as possible to participate in translating these two dictionaries, especially since the benefit for each particular Wiktionary would be great. For example, if we succeed in motivating people from 100 Wiktionaries to participate, the number of primary entries added to these 100 Wiktionaries would be 3,700,000 (37,000 per Wiktionary). If we succeed in motivating just 20 Wiktionaries, the number of basic entries added to these 20 Wiktionaries would be 148,000 (7,400 per Wiktionary). Of course, these entries would be incorporated into the respective Wiktionaries according to the interest and the rules of each community.
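
The figures above can be checked with a short sketch. It assumes, as the numbers imply, that N participating Wiktionaries means each one receives translations into N languages; the function and constant names are illustrative only and not part of the project.

```python
# Illustrative check of the entry counts quoted above.
# Assumption: N participating Wiktionaries means N target languages each.
BIRD_SPECIES = 370         # species in the ornithological dictionary
LEXICOGRAPHY_TERMS = 100   # approximate size of the terminology list


def entries_per_wiktionary(terms: int, languages: int) -> int:
    """Entries one Wiktionary gains: one entry per term per language."""
    return terms * languages


def total_entries(terms: int, wiktionaries: int) -> int:
    """Entries gained across all participating Wiktionaries."""
    return entries_per_wiktionary(terms, wiktionaries) * wiktionaries


if __name__ == "__main__":
    for n in (100, 20):
        print(n, entries_per_wiktionary(BIRD_SPECIES, n), total_entries(BIRD_SPECIES, n))
```

Under the same assumption, the 100-term terminology list translated into 100 languages gives the 10,000 entries per Wiktionary mentioned earlier.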

With this project, we are opening cooperation with the venerable Serbian cultural institution Matica srpska and we believe that this partnership will have a major impact on future cooperation between Wikimedia organizations and similar institutions in Slavic countries. If this cooperation could be relevant to any other partnership you are trying to establish in your country or globally, we would be more than willing to share our knowledge and contacts.

Besides support in translation, we are open to Wikimedia volunteers participating more substantially and thus building knowledge inside the community on how to deal with this kind of data. For example, if you are willing to join the core team and help us communicate with the Wiktionary communities of the languages you speak, please contact Milica (milica.gudovic@yahoo.com) or Milos (millosh@gmail.com) via email. The same goes if you are willing to contribute by coding in Python and/or PHP.

Please join us on the project's discussion page or send us an e-mail. If you are willing to participate but are not sure of your knowledge of English, please check the list of languages the organizational team speaks; there is a chance we can communicate in your native language.

We are very excited about this project and hope that you will be part of it as well!

Looking forward to hearing from you!

--Godzzzilica (talk) 15:55, 8 January 2015 (UTC)

How does our merger of Serbian, Croatian, Bosnian and Montenegrin into Serbo-Croatian affect the project? How does Matica Srpska feel about that? — Ungoliant (falai) 17:23, 8 January 2015 (UTC)
Serbian lexicography usually treats all those languages as one, noting where something is a Croatian variant (for example, attorney is "advokat" in Serbian but "odvjetnik" in Croatian; a major monolingual dictionary of the Serbian language would have an entry "odvjetnik", marked as "hrv. for advokat"). This topic is much more problematic in Croatia than in Serbia. --Millosh (talk) 20:32, 8 January 2015 (UTC)
That said, the best idea is not to insist on their formal position. If you don't insist, it would be treated as an editorial choice of the English Wiktionary, not under the jurisdiction of Matica srpska. If you insist, they would likely give some kind of negative comment. --Millosh (talk) 20:32, 8 January 2015 (UTC)
Linguistically speaking, some things could be fixed. For example, одвјетник (Cyrillic spelling of odvjetnik) is a kind of neologism. It's a strictly Croatian word, and contemporary Croatian is written strictly in the Latin alphabet; it could be transcribed for linguistic and similar purposes, but in the Cyrillic alphabet it's translated as адвокат. --Millosh (talk) 20:32, 8 January 2015 (UTC)

Oh, I've now read everything relating to the decision made a few years ago. The community didn't make a formal decision on that issue. So, brace yourselves! This could become a significant issue on English Wiktionary (again) after this project gets public attention. Note that this is likely an exclusive practice of the English (and Serbo-Croatian) Wiktionary. --Millosh (talk) 21:45, 8 January 2015 (UTC)

Topical context labels

diff. Judging by that edit, User:Jamesjiao probably feels that context labels are meant to show limited usage or to disambiguate. Since this definition doesn't need disambiguating and is in general use, he removed the label. But the fact that the label was there in the first place means that someone else thought differently. That other person presumably thought that labels could also be used merely to give extra information about what general semantic field or topic a term pertains to, even if it's not a term specific to it.

These two approaches have been at odds with each other for a while now, but I don't know if there has been a discussion on it recently-ish. If there has been, it was either inconclusive or not widely advertised so I missed it. So my question is, should context labels be used to indicate topics or other information, when they give no further information on the definition itself that is not already apparent from that definition? That is, should labels be always restrictive/defining or can they also be non-restrictive? —CodeCat 00:02, 9 January 2015 (UTC)

I feel that in some cases people use labels because they are too lazy to add the categories manually. — Ungoliant (falai) 00:33, 9 January 2015 (UTC)
Well, it's true that some other dictionaries do add topical labels. So it's probably inspired by that. That doesn't mean I understand why those other dictionaries do it, though. —CodeCat 00:39, 9 January 2015 (UTC)
I see the logic of both kinds of labels, but the restrictive usage context seems to be an essential function for a reference that purports to be about words. We apply neither type of label consistently, even in a single PoS section, eg, car#Noun. I don't think that we have even well characterized what we mean by a usage context. Is it a group of people with shared vocabulary, eg, soldiers? Is it a situation, eg, military service? Is it a kind of technical expertise, eg, modern firearms design or use? DCDuring TALK 01:27, 9 January 2015 (UTC)
The less said about the arbitrariness of topical categories the better. DCDuring TALK 01:27, 9 January 2015 (UTC)
I personally feel that geology as a science is quite a specialised field that has its own set of jargons, which this term shouldn't be a member of, imho. One reason I removed this label is because I noticed that the term is already found in this category Category:nl:Landforms, which better suits non-jargon terms like this one. JamesjiaoTC 01:34, 9 January 2015 (UTC)
One trouble with categories is that they are not visibly connected to a particular sense of a polysemic word. DCDuring TALK 02:32, 9 January 2015 (UTC)
The discussed edit (diff) complies with Wiktionary:Votes/pl-2009-03/Context labels in ELE v2. It also matches our dog entry, which does not use "zoology" label before "A mammal, Canis lupus familiaris, that has been domesticated for thousands of years, of highly variable appearance due to human breeding" definition. --Dan Polansky (talk) 18:13, 9 January 2015 (UTC)

Word-mining at Wikipedia

A development at Wikipedia may be of interest to Wiktionary editors. Over at w:Wikipedia:Typo Team/moss a new project is finding all words occurring within the English-language Wikipedia that are not mentioned in any Wikipedia article title, nor in any Wiktionary entry or Wikispecies entry. One of the aims is to identify and fix errors in Wikipedia articles; another is to identify and fix missing entries at Wiktionary and Wikispecies.

You are invited to review the current list and perhaps be inspired to create missing Wiktionary entries. -- John of Reading (talk) 14:22, 9 January 2015 (UTC)
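
The selection rule described above amounts to a set difference, which can be sketched as follows. This is not the Typo Team's actual code: the function name, the letters-only tokenization, and the case-insensitive comparison are all assumptions made for illustration.

```python
import re

# Assumption: a "word" is a run of letters; the real project may tokenize differently.
WORD = re.compile(r"[^\W\d_]+")


def candidate_missing_words(article_text, wikipedia_titles,
                            wiktionary_entries, wikispecies_entries):
    """Words in the article text that appear in no title or entry list.

    Comparison is case-insensitive here, which is an assumption.
    """
    words = {w.lower() for w in WORD.findall(article_text)}
    title_words = set()
    for title in wikipedia_titles:
        title_words.update(w.lower() for w in WORD.findall(title))
    known = title_words
    known |= {e.lower() for e in wiktionary_entries}
    known |= {e.lower() for e in wikispecies_entries}
    return words - known
```

The result mixes genuine typos with words that simply lack an entry, so each candidate still needs human review.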

  • I've added a few. SemperBlotto (talk) 14:59, 9 January 2015 (UTC)
  • The list is useful, but does it distinguish the different languages on Wiktionary? A word may have an entry on Wiktionary, but not in English. —CodeCat 15:08, 9 January 2015 (UTC)

RFV - period for closing nominations

Wiktionary:Requests for verification/Header currently says:

Closing a request: After a discussion has sat for more than a month without being "cited", or after a discussion has been "cited" for more than a week without challenge, the discussion may be closed. ...

The same page says that a discussion can be archived a week after closure.

This double one-week period seems excessive. I propose this change:

Closing a request: After a discussion has sat for more than a month without being "cited", or after a discussion has been "cited" [struck: for more than a week without challenge], the discussion may be closed. ...

Thus, once an entry is cited, the discussion should be able to be closed immediately, after which there will still be one week for objections before the discussion is archived. Thoughts? --Dan Polansky (talk) 13:48, 10 January 2015 (UTC)

  • If the sole reason for change is "This double one-week period seems excessive", why would we bother? In particular it doesn't seem excessive to me. DCDuring TALK 14:18, 10 January 2015 (UTC)
    • As per the header, when the entry is cited, one first needs to wait one week before closing the nomination, and then one week before archiving the nomination. You rarely close or archive RFV nominations, which may help explain that this does not seem excessive to you. To me, waiting one week after an entry has been cited before removing the nomination from RFV page (via archiving) seems sufficient, especially when the closing person is different from the person who added the quotations. --Dan Polansky (talk) 14:28, 10 January 2015 (UTC)
Since struck headers can be unstruck, there's no reason not to close immediately if the citations are apparently watertight. It would be bad faith if there are fewer than three certain citations. However if a mistake is made by striking a header for a term that isn't certainly cited, we can just unstrike it. In fact, we do. Renard Migrant (talk) 15:53, 10 January 2015 (UTC)

acronym, initialism, abbreviation template use

Looking at Category:English acronyms I note many entries that seem unpronounceable as words or, at least, not obviously pronounceable. I hope we have agreement that for purposes of our definitions that, for English L2 sections at least:

  1. an acronym is an abbreviation formed from the initial elements (or just initials?) of the component words of the term it represents pronounced as a word (uses {{acronym of}})
  2. an initialism is an abbreviation formed from the initials (or initial elements?) of the component words of the term it represents pronounced as the component letters of the initialism (uses {{initialism of}})
  3. an abbreviation is anything not certainly or probably either an acronym or initialism (uses {{abbreviation of}}).

If something is pronounced either as an initialism or as an acronym in the senses above we should have a pronunciation section and not use {{acronym of}} or {{initialism of}}. We should probably require a pronunciation section for everything claiming to be an acronym.

Is the above a reasonable summary of existing preferences? It certainly is not carried out. I would suggest that, if there is agreement, we should perform some cleanup, especially on the entries in Category:English acronyms. DCDuring TALK 14:12, 10 January 2015 (UTC)

I was doing a course a few months ago where the tutor referred to the 'acronym' DLA which of course, isn't an acronym. It's even used erroneously to mean initialism, so yes, correct it as much as possible. Renard Migrant (talk) 22:53, 11 January 2015 (UTC)
I'd like more people to agree that this is the correct approach so that I don't duplicate the mistake being perpetrated with respect to {{eye dialect of}}. (See WT:TR#gub'mint.) DCDuring TALK 23:15, 11 January 2015 (UTC)
I just looked at an extensive discussion of use of the term initialism on Wikipedia. It was a debate between advocates of "precision" versus advocates of accessibility to a general population of readers. DCDuring TALK 16:28, 12 January 2015 (UTC)
I suppose some of the good news is that {{acronym-old}} remains on only some 200 entries. I don't know how many L2 sections have Acronym as a PoS header. I miss Ullmann's runs that produced lists of non-ELE L3, L4, L5 headers. DCDuring TALK 16:53, 12 January 2015 (UTC)
KassadBot used to keep records of headers. Renard Migrant (talk) 18:00, 12 January 2015 (UTC)
I can see the category populated by {{rfc-header}}, but not something comparable to User:Robert_Ullmann/L3#Invalid_L3_headers. He seems to have been way more diligent than anyone since. DCDuring TALK 19:18, 12 January 2015 (UTC)
Someone — I think it was User:DTLHS — made a WT:TODO list last year of all headers used on Wiktionary, which they and CodeCat and I were using to find and fix misspelt headers like "===Etymologoy===", but which also caught correctly-spelt but nonstandard headers. Perhaps they could generate a new list for you. - -sche (discuss) 23:41, 12 January 2015 (UTC)
You'd be amazed if you fix say, 5 per day, how quickly a list of 200 goes down to zero. I seem to think I fixed 700 Swedish entries with a declension table not under a header doing 10 per day. Renard Migrant (talk) 16:32, 13 January 2015 (UTC)
I've added all the entries I can find with the offending headers to Category:Entries with non-standard headers. Renard Migrant (talk) 00:24, 16 January 2015 (UTC)

Template:zh-see

User:Wyang and User:Mar vin kaiser seem to be implementing Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese which not only doesn't close for another two weeks, it's on course to fail by a clear margin. What to do, nothing? Blocks? Mass deletions? Renard Migrant (talk) 23:17, 11 January 2015 (UTC)

What are you talking about? The discussion has reached consensus among Chinese-language editors. As I said before on my talk page, there is no point for a vote when all Chinese-language editors support the proposal. The vote is a means for a bunch of utter standers-by to dictate what chores others should do. Wyang (talk) 23:22, 11 January 2015 (UTC)
The objective of the project is a dictionary to be used by readers. What is the proportion of Chinese-language editors among readers? It's negligible. What editors should do should be dictated by what users need, like, use. Lmaltier (talk) 22:04, 14 January 2015 (UTC)
What Wyang said - we have reached an agreement. The vote doesn't have any rationale because it wasn't created by Chinese editors. There are legitimate concerns, though. Eventually and ideally, the simplified entries should display the same information as the traditional but the contents should be stored in one place. Having entries, which are out of sync, is a big problem. (Correctly formatted) traditional entries use both traditional and simplified - in synonyms, usage examples, etc, so there is no discrimination. --Anatoli T. (обсудить/вклад) 23:28, 11 January 2015 (UTC)
In any case, how does deleting simplified entries with {{zh-see}} help the project!? --Anatoli T. (обсудить/вклад) 23:30, 11 January 2015 (UTC)
Right you are. Thanks for bringing this to my attention. Renard Migrant (talk) 23:38, 11 January 2015 (UTC)
Then, we should nominate Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese for deletion perhaps? Renard Migrant (talk) 18:04, 12 January 2015 (UTC)
I am confused. What are you trying to say? Keφr 18:05, 12 January 2015 (UTC)
That perhaps we should nominate Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese for deletion. Renard Migrant (talk) 18:13, 12 January 2015 (UTC)
…because? Keφr 18:40, 12 January 2015 (UTC)
A RFDO on the vote is unlikely to pass the 2/3 threshold, and seems pointless. The vote is fine. The voters have plenty of room for posting their rationale directly in the vote, or linking from the vote to locations where they posted their reasoning; they can also post "as per <person>". On another point, "consensus among Chinese-language editors" is not enough, IMHO, in part since the Chinese entries also serve readers, and since multiple of the editors who helped create Chinese entries are no longer around to oppose. On a related note, in Wiktionary:Votes/pl-2014-12/Making simplified Chinese soft-redirect to traditional Chinese, among the supporters I currently see two editors with anything like significant contribution to Chinese entries: Anatoli T., and Wyang. --Dan Polansky (talk) 18:26, 12 January 2015 (UTC)
If the vote fails, which seems likely, then we maintain the status quo, which is to use {{zh-see}} anyway. Renard Migrant (talk) 18:55, 12 January 2015 (UTC)
Are you trying to solve a problem? If so, what is the problem statement and what are the tentative solutions that you offer? --Dan Polansky (talk) 19:06, 12 January 2015 (UTC)
No. Renard Migrant (talk) 19:11, 12 January 2015 (UTC)
There is no point in keeping the vote open now that it is being ignored; this is a perfectly reasonable observation. Kaixinguo (talk) 19:14, 12 January 2015 (UTC)
The vote is not being ignored: people keep posting to it. Via the vote, we are getting increasingly better evidence and information about the actual scope of support and opposition; we would not have that without the vote. Furthermore, your defeatist stance is dangerous, since it encourages certain types of editors to think like this: "go ahead, ignore any vote or opposition, and the opposers will just give up". For those who support, posting "support per <person>" costs close to nothing, likewise for those who oppose, so I see little point in trying to make prophecies about the outcome. A better course of action is to figure out the right stance, and make it known in the vote. --Dan Polansky (talk) 19:39, 12 January 2015 (UTC)

Merging Finno-Volgaic, Finno-Samic, Finno-Permic and Finno-Ugric into Uralic

The internal subgroup of the Uralic family is not usually settled upon, and many competing ideas still exist, some with greater or lesser support. The first two families in the header (or three, if Wikipedia is to be believed) are still in dispute. Finno-Ugric has much more support, but as a reconstructed language it's virtually identical to Uralic. In fact, even Finno-Permic does not differ significantly from Uralic as a whole at least phonologically. The tradition among many Uralic linguists is to label reconstructions based on the branches it's attested in. So for example, an etymology would label a reconstruction Finno-Ugric if it's not found in the Samoyedic languages.

At Wiktionary talk:About Proto-Uralic, User:Tropylium (who I definitely trust on Uralic linguistics) suggested that we should eliminate the various subgroups and their associated proto-languages as separate languages on Wiktionary, and merge them into Uralic. Treating each language or branch separately is not practical for Wiktionary, where we would end up with a long string of identical words in etymologies, like the example Tropylium gives for Finnish kala. It gets even worse if we could potentially create three or more separate entries for the various proto-languages in the chain, all of which would have identical forms for almost any word. It's much more practical, as in the current Finnish entry, to jump straight from Finnic to Uralic.

So the proposal is to treat Proto-Finno-Ugric, and perhaps also Proto-Finno-Permic, as simply dialects of Proto-Uralic (which they were in reality, in all likelihood). Entries would be moved and etymologies adjusted accordingly, so that the categories for terms derived from these languages would end up either empty, or subsumed under Uralic just like we do with Category:Terms derived from Anglo-Norman or Category:Terms derived from Late Latin. References to Finno-Volgaic and Finno-Samic would be removed altogether.

I just want to note that this is explicitly not a statement that these are not valid subgroups or languages. This is just a matter of practicality, just like when we choose to merge any other group of languages. —CodeCat 22:14, 13 January 2015 (UTC)

I support merging Finno-Volgaic and Finno-Ugric into Uralic. Ever since Tropylium pointed out the issue at Wiktionary talk:Families#Removing.2Fadding_families (in 2012), Finno-Ugric has been on a sticky note in the back of my mind as one of the places where our classification of linguistic families needed to be updated; kudos to you for pushing for action on the matter. Those unfamiliar with this language family can take a look at the first two sentences of w:Finno-Ugric languages and the first three short paragraphs of w:Finno-Volgaic languages, which sum up how out-of-date the linguistics behind those groupings is. As for Finno-Samic (and for that matter Finno-Volgaic): we never gave it a code in the first place, did we? So there's nothing to merge. - -sche (discuss) 04:22, 14 January 2015 (UTC)
I am obviously enough in support of this procedure. Though as noted at Wiktionary talk:About Proto-Uralic, it might be a decent idea to not completely merge "Finno-Ugric" and "Finno-Permic", but to relabel them dialects of Proto-Uralic instead.
User:Liliana-60 also mentioned back at Wiktionary talk:Families that aside from etymological appendix work, families are useful for users to find languages. If a user wants to find information about the other Finno-Ugric languages, can we leave redirects so that they will end up at the right address? --Tropylium (talk) 22:34, 14 January 2015 (UTC)

Merge direction

A subtopic: in what direction should a merger be done? The term "Finno-Ugric" has a long history of being used not only for Uralic minus Samoyedic, but also less formally for the family as a whole. It is also much more commonly used than "Uralic": the term "Finno-Ugric languages" gets some 800k Ghits, "Uralic languages" only 80k. So it might be worth considering to abolish the label "Uralic" instead. This would however not allow a dialect status for Finno-Ugric. --Tropylium (talk) 22:34, 14 January 2015 (UTC)

If we use Uralic, then there is no ambiguity over what we mean. With Finno-Ugric that doesn't apply. So we should use Uralic I think. —CodeCat 01:11, 15 January 2015 (UTC)
Yeah, use Uralic and list "Finno-Ugric" as an alternate name. (As an aside, maybe we should revive the idea of splitting the names= field into canonical and alternate name fields.) - -sche (discuss) 03:34, 16 January 2015 (UTC)

If there are no objections or comments, I'll start merging soon. —CodeCat 22:38, 26 January 2015 (UTC)

I've now deleted the "fiu-pro" code from the main language modules, and moved it to the etymology language module. This means that it's no longer valid to link to a Proto-Finno-Ugric entry, but you can still specify it as a source language in an etymology (like "Late Latin" and similar). There will probably be a lot of module errors from entries that currently still link to Proto-Finno-Ugric terms, I'll be running a bot regularly to change these to link to Proto-Uralic. If you find one and can't stand to leave it for the bot, you can fix it yourself by replacing {{m|fiu-pro|...}} with {{m|urj-pro|...}}. —CodeCat 15:29, 28 January 2015 (UTC)
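
The manual fix described above is a plain substitution of the language code, which could be sketched like this. It is not the bot's actual code; the function name is hypothetical, and only the {{m|fiu-pro|...}} form named above is handled.

```python
import re

# Matches the opening of {{m|fiu-pro|...}}; other templates are left alone.
# Assumption: the language code is always the first parameter, as in the example above.
FIU_PRO_LINK = re.compile(r"\{\{m\|fiu-pro\|")


def retarget_links(wikitext: str) -> str:
    """Point Proto-Finno-Ugric mention links at Proto-Uralic instead."""
    return FIU_PRO_LINK.sub("{{m|urj-pro|", wikitext)
```

Only the code is swapped, so the linked form itself stays as written and may still need checking against the Proto-Uralic entry name.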

Shall we also merge Category:Terms derived from Finno-Ugric languages and Category:Terms derived from Finno-Permic languages (and related target-lang-specific ones) with Category:Terms derived from Uralic languages? --Tropylium (talk) 23:18, 28 January 2015 (UTC)
I kept the categories separate for now because I wasn't sure if we still wanted to allow these languages to be mentioned in etymologies, even if we treat them as Proto-Uralic dialects. —CodeCat 23:36, 28 January 2015 (UTC)
Aren't "etymology language" exceptions encoded separately from the language family tree (which still seems to feature fiu)? --Tropylium (talk) 05:44, 31 January 2015 (UTC)
Yes, but I think you're confusing two things. There are both the language fiu-pro and the family fiu. The language has been merged completely so it's now a dialect of urj-pro, but the family still exists. —CodeCat 14:19, 31 January 2015 (UTC)
That's exactly what I was saying: shouldn't we also merge the families? Or do you think this would take further discussion yet? --Tropylium (talk) 17:17, 31 January 2015 (UTC)
I have been, but it takes longer for the database to catch up because all the etymology categories need to be updated too. —CodeCat 18:38, 31 January 2015 (UTC)

Admin userboxes

I know that userboxes are usually not permitted, but why hasn't anyone made an admin userbox yet for Wiktionary? I think it'd be useful to be able to look at someone's userpage and know straight off the bat, without having to go through all the trouble of looking through the large list, whether someone is an administrator, crat, rollbacker, sysop, or whatever. (With all due respect, I refer to all 3, admins, crats, and sysops, as admins, because it's too complicated to differentiate.) If this isn't a good idea, shouldn't we at least have a category for admin users that shows up immediately at the bottom of every admin's user page? Has there been a discussion on this yet, or a resolution? NativeCat drop by and say Hi! 00:14, 9 January 2015 (UTC)

I think it’s a good idea. — Ungoliant (falai) 00:16, 9 January 2015 (UTC)
I might not particularly mind them, but I worry it may encourage what on TOW is called hat collecting. Admin userboxes can be put on pages fraudulently (i.e. without user being an admin), though this is usually not a problem — the userbox may contain a link to a page which verifies that an account indeed has admin rights. On the flip side, however, (as I expect, at least) no admin will be forced to put those userboxes on their pages, so it may fail to achieve the result you have in mind. Keφr 10:39, 9 January 2015 (UTC)

I will be completely unavailable for the next week.

Please have the dictionary finished by the time I get back. This includes all the foreign languages. Cheers! bd2412 T 05:53, 14 January 2015 (UTC)

Haha, sorry, your wish cannot be fulfilled. NativeCat drop by and say Hi! 13:26, 14 January 2015 (UTC)
@NativeCat: Well, not with that kind of attitude. —Justin (koavf)TCM 17:00, 14 January 2015 (UTC)
Haha, a dictionary can never be completely completed. Who disagrees? NativeCat drop by and say Hi! 21:27, 14 January 2015 (UTC)
(That's the joke.) --Catsidhe (verba, facta) 21:40, 14 January 2015 (UTC)
Jokes aside, targets for languages could and should be set for each language. How about 20,000 lemma entries for most frequent words for a language? Inflection (if necessary for a particular language), pronunciation and etymology being desired but not mandatory. If this can be achieved, then a language could be considered to have a good coverage in our dictionary. How many foreign languages would fit these arbitrary criteria at Wiktionary? Twenty to thirty? --Anatoli T. (обсудить/вклад) 21:41, 14 January 2015 (UTC)
Of the languages I search, I find only our English and Latin content good enough to be my primary source. Spanish and German are close. Italian has a lot of entries but most of them are so vague, it is easier to try to make sense of monolingual dictionaries. I’m sure Finnish is on the same level as English and Latin because there are a lot of entries and all the ones I run across are good. — Ungoliant (falai) 22:12, 14 January 2015 (UTC)
Don't you wish Danish coverage could be as good as the languages Ungoliant listed? NativeCat drop by and say Hi! 23:29, 14 January 2015 (UTC)
I would do Danish, but Norwegian is taking up all of my available time (in between burying our dead cat today). Donnanz (talk) 23:44, 14 January 2015 (UTC)
Apart from the above, I would say the coverage of Russian, Chinese, Japanese, French, Portuguese, Serbo-Croatian and Dutch is also quite good, including definitions, and probably Polish and Hungarian too. The number of Korean lemmas is close to 20,000. --Anatoli T. (обсудить/вклад) 00:26, 15 January 2015 (UTC)
  • @Ungoliant MMDCCLXIV: Latin isn't as good as you think. Our entries are great, although they tend not to be as good as other online resources, but Latin is lacking in so many translation tables that one can't use en.wikt for English to Latin. That's a long-term project for me to attack when I've more time (which may never happen). —Μετάknowledgediscuss/deeds 23:35, 14 January 2015 (UTC)
    English often disappoints me. We have good breadth of coverage but quality of definitions for many common words is poor or, at best, uneven. DCDuring TALK 23:39, 14 January 2015 (UTC)
    My main contact with Latin is through etymology, and since I began editing there have been only a handful of cases where a Latin etymon is entryless. But yeah, the English-to-Latin translation coverage is horrible. If you’re interested, I can generate one of these for Latin. — Ungoliant (falai) 23:46, 14 January 2015 (UTC)
I don't think any Wiktionary has a good enough coverage of Danish as it should. I'm currently working a lot at the Danish Wiktionary, and even it doesn't have nearly as many Danish words (or {insert language here} words) as it could, which is a little disappointing. Really this problem goes on for all Wiktionaries, and English Wiktionary is not the only Wiktionary that has this problem. However, I feel we'll never be complete. That's the point of a wiki, it is never complete, there is always new content to add/edit. NativeCat drop by and say Hi! 00:33, 15 January 2015 (UTC)
"Never complete" is a strong statement. What we need, is the dictionary to be useful and used. If a Wiktionary covers a large number of words by frequency in a good manner, then all fancy, archaic, interesting otherwise words can be added later. Many editors focus on entries they like, not the words that are really necessary by dictionary standards. Russian entries, for example (and a couple of other Slavic languages), have inflections for most words, which are missing in most published dictionaries, which is a clear advantage, since inflections are not straightforward and can't be easily construed from grammar references. Chinese, apart from Mandarin, now has thousands of transliterations/pronunciations for other topolects, which is also hard to get (separate topolects, including Mandarin may still not be able to compete with other dictionaries but our entries have not only pinyin but also IPA and zhuyin, other dictionaries don't have all three). Electronic dictionaries available mostly provide only Mandarin and Cantonese, with Cantonese having less coverage. There's, of course, room for a lot of improvements. Finnish is probably the best online dictionary available. --Anatoli T. (обсудить/вклад) 00:50, 15 January 2015 (UTC)

Spanish adjective forms

Over the past two weeks, Wonderfool has created no less than thirteen sockpuppets (one of which has been used four times), acting as bots to create missing Spanish participle and adjective forms. As he shows no sign of giving up, and the blocks only serve to slow him down, it seems to me that we should try to take some sort of alternative action. I don't really want to get involved in edit warring, or, frankly, any discussion outside of my field of expertise, so I'm merely going to make a couple suggestions as to solving or mitigating this problem:

  1. Create an abuse filter. I'd do this, but (a) I don't know if there's policy involved, (b) I don't know if I'm allowed, and (c) I don't know how to write one (I could figure it out, but someone would still have to check over it.)
  2. Request and create a bot to add these entries, thereby removing the need for Wonderfool to do so. I don't especially want to take that much time to do this, but if this is deemed a good solution and nobody else will, I can. ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 19:19, 14 January 2015 (UTC)
Special:AbuseFilter/22 already exists for that purpose. Remember not to make the filter too broad. Keφr 19:28, 14 January 2015 (UTC)
Ah! I'd missed it, thanks. I'd assumed such a filter would be closer to the end of the list, though... has he done this before? ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 20:09, 14 January 2015 (UTC)
Okay, I've made some modifications. Please do check over the filter if you can. ObsequiousNewt (ἔβαζα|ἐτλέλεσα) 15:19, 15 January 2015 (UTC)
  • Hi. It would be super if someone ran a bot to create these forms, which should free me up some time for more useful stuff around here. I did find a handful of errors in this category and this one, which I've corrected, so ideally the bot should be run by someone with Spanish knowledge. User:Adrian F4/Bot code has some forms which can be used by a bot account, which all have been checked. Regards - WF. --Adrian F4 (talk) 21:12, 15 January 2015 (UTC)
    Until then, I'll take charge of adding any forms -WF. --Adrian F4 (talk) 21:14, 15 January 2015 (UTC)
    BTW, the filter you made doesn't really work. I can still create the pages. Perhaps you could make it stricter, or just forget about it. -WF --Adrian F4 (talk) 21:18, 15 January 2015 (UTC)
    Another suggestion is giving me a flood flag, like DCDuring kindly did, and I can add loads of good entries without bothering the RC patrol. Up to you, really. -WF --Adrian F4 (talk) 21:19, 15 January 2015 (UTC)
    Yes, but it's a terrible suggestion. Renard Migrant (talk) 22:58, 15 January 2015 (UTC)

New code for Norman

Today I discovered that there is a new ISO 639-3 code, valid for only a few days, nrf, for Jèrriais roa-jer plus Guernésiais roa-grn. Would it be straightforward and uncontroversial for us to get a bot to merge those two ad hoc codes into the new, official one? The bot should also make liberal use of {{label}}s so that we can still categorize J and G separately, much as we categorize the B, C, M, and S lects of Serbo-Croatian.

Also, a few months ago, Chuck Entz brought up here the issue of Sercquiais, the langue d'oïl variety spoken on Sark in the Channel Islands. Ungoliant and I supported the idea of merging all the Channel Islands varieties into Norman roa-nor, but no one else commented and the discussion fizzled out.

So I'm raising the issue again, but this time for the new code nrf. Can we decide to apply nrf to all Norman varieties, rather than just for Jèrriais+Guernésiais, as ISO has it? This would have not only the obvious advantage of collecting several dialects of what really is a single language into a single code, but would also allow us to use "Norman" as the canonical name for nrf instead of having to come up with something like "Jèrriais–Guernésiais" or "Channel Islands Norman" or something.

Is this something that needs to be voted on?

Pinging NativeCat, Chuck Entz, Ungoliant, and Embryomystic. —Aɴɢʀ (talk) 22:08, 15 January 2015 (UTC)

Support merging Jèrriais, Guernésiais and Norman, but it probably should have a vote. — Ungoliant (falai) 22:39, 15 January 2015 (UTC)
I'll admit to being a little attached to the idea of the different Norman varieties as separate things, but considering that they already have a shared Wikipedia (under nrm), it certainly does make sense. And between Jèrriais and Guernésiais, especially, there is a lot of duplication that could be trimmed down. I've not been able to find as much info on Continental Norman or Sercquiais, but I'm fairly certain there's duplication with the former, perhaps not so much the latter (with its quite distinct orthography). embryomystic (talk) 22:43, 15 January 2015 (UTC)
If we merge them, then we need to change the mapping to nrm in Module:wikimedia languages. —CodeCat 22:49, 15 January 2015 (UTC)
Better yet, Wikimedia needs to move its nrm projects to nrf. —Aɴɢʀ (talk) 23:05, 15 January 2015 (UTC)
I support this. It makes it less confusing. If it's just a mere dialect, not even considered a language in itself, we should not have it as a header, IMO. We should do like we do with Serbo-Croatian and add words pertaining only to a specific dialect using Template:context. NativeCat drop by and say Hi! 23:08, 15 January 2015 (UTC)
Support merging the varieties of Norman into nrf. The fact that there's one unified Norman Wikipedia is telling. - -sche (discuss) 08:40, 17 January 2015 (UTC)
As someone with Norman ancestry, I support merging the varieties of Norman; they are dialects and not separate languages. The dialects often have spelling differences. But we don't consider American English a separate language just because it uses color rather than colour. Taxman1913 (talk) 17:10, 21 January 2015 (UTC)
  • So can we consider this to have consensus, or do I have to start a vote before it will actually happen? —Aɴɢʀ (talk) 22:11, 24 January 2015 (UTC)
    Since there's only support, no opposition, and since the dialects in question were never recognized by even the generous/lax folks at the ISO as languages, anyway (we had to give them exceptional codes), I say you could just start merging them. FWIW, my standard is to set up votes only for cases that discussion (either on-wiki or in the real world) has shown to be contentious either linguistically or politically (like Moldovan→Romanian). And this doesn't appear to be contentious. Let me know if you need help merging them. - -sche (discuss) 22:57, 24 January 2015 (UTC)
    As a first step, I've added nrf as "Norman" (with all the dialects as alt names). - -sche (discuss) 23:11, 24 January 2015 (UTC)
    Short of going through all of Category:Guernésiais language, Category:Jèrriais language, and Category:Norman language manually, I don't know what to do to merge them. I'm not a bot operator, nor would I even know how to begin programming a bot. —Aɴɢʀ (talk) 12:49, 25 January 2015 (UTC)
I'm not sure a bot would be a good idea, since we have multiple language sections that need to be merged, with the potential for differences in the content between them. I'm sure a great deal of the content is going to substantially overlap, but where there are differences, we need to think about how to represent them. We also need to preserve the dialectal information (have we added "Guernésiais" and "Jèrriais" to Module:labels/data so we can use context labels?). Chuck Entz (talk) 16:11, 25 January 2015 (UTC)
Module:labels/data already includes "Guernsey" and "Jersey", so we can use those. It will generate the names "Guernsey Norman" and "Jersey Norman", which are probably easier to understand than Guernésiais and Jèrriais anyway. Without a bot, it will take for freakin' ever, since Jèrriais alone has over 9000 entries. —Aɴɢʀ (talk) 18:41, 25 January 2015 (UTC)
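For the simple case discussed above, where a page has only one of the two old sections, the mechanical part of the merge could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name convert_section, the assumption that headwords use {{head|CODE|...}}, and the assumption that each definition line should receive a {{label|nrf|Jersey}} or {{label|nrf|Guernsey}} prefix are all hypothetical choices, not an agreed bot design. It deliberately does not attempt the harder case Chuck Entz raises, where a page has several sections whose content must be reconciled by hand.

```python
import re

# Assumed mapping from the old ad hoc codes to the dialect labels
# that Module:labels/data is said to provide ("Jersey", "Guernsey").
DIALECT_LABELS = {"roa-jer": "Jersey", "roa-grn": "Guernsey"}
# Assumed L2 header names for the old codes.
OLD_HEADERS = {"roa-jer": "Jèrriais", "roa-grn": "Guernésiais"}


def convert_section(section: str, old_code: str, new_code: str = "nrf") -> str:
    """Convert one single-dialect language section to the merged code.

    Hypothetical helper: it renames the L2 header, retargets {{head}}
    templates, and prefixes each definition line with a dialect label.
    """
    label = DIALECT_LABELS[old_code]
    header = OLD_HEADERS[old_code]

    # Rename the level-2 header only (not ===...=== subheaders).
    section = re.sub(
        r"^==[ \t]*" + re.escape(header) + r"[ \t]*==[ \t]*$",
        "==Norman==",
        section,
        flags=re.M,
    )
    # Retarget headword templates to the new code.
    section = section.replace("{{head|" + old_code + "|", "{{head|" + new_code + "|")
    # Tag definition lines ("# ...") but not examples ("#: ") or quotes ("#* ").
    section = re.sub(
        r"^# ",
        "# {{label|" + new_code + "|" + label + "}} ",
        section,
        flags=re.M,
    )
    return section
```

A page that already has a Norman section, or both a Jèrriais and a Guernésiais section, would still need a human (or a much more careful tool) to merge the content.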

Proposal for bot edits to French

Two proposals:

  1. Remove gender templates from adjective forms which have gender indication in the definition. For example, {{masculine plural of|word|lang=fr}}.
  2. Change m-p and f-p to m and f inside {{head|fr|plural}}. So {{head|fr|plural|g=m-p}} becomes {{head|fr|plural|g=m}}. Rationale: 'plural of' is already in the definition.

Objections? Renard Migrant (talk) 18:17, 16 January 2015 (UTC)
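As a rough illustration of the second proposal, the substitution could be expressed as a single regular expression. This is a minimal sketch, not the bot's actual code: the function name simplify_plural_gender is hypothetical, and it assumes the gender is always given as a g= parameter somewhere after {{head|fr|plural| with no nested templates in between.

```python
import re

# Matches the g= parameter of {{head|fr|plural|...}} when its value is
# m-p or f-p, and captures everything needed to rebuild it without "-p".
HEAD_PLURAL_GENDER = re.compile(
    r"(\{\{head\|fr\|plural\|[^{}]*?\bg=)([mf])-p(?=[|}])"
)


def simplify_plural_gender(wikitext: str) -> str:
    """Turn g=m-p / g=f-p into g=m / g=f inside {{head|fr|plural|...}} only."""
    return HEAD_PLURAL_GENDER.sub(r"\1\2", wikitext)


print(simplify_plural_gender("{{head|fr|plural|g=m-p}}"))
```

Headword lines with any other part-of-speech argument are left untouched, so a lemma that is inherently plural would keep its m-p or f-p.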

I don't object. However, the proposal would be easier to assess if you gave us two example diffs or if you gave links to two entries that would be affected, so that we can check the current situation. Not as a means of objection but rather from curiosity: is this a fun activity for you? Or do you think there will be tangible benefits for the user of the dictionary, more tangible than your adding new entries? --Dan Polansky (talk) 09:40, 17 January 2015 (UTC)
Regroupements for a noun example. Renard Migrant (talk) 17:43, 17 January 2015 (UTC)
Blanches adjective example. Renard Migrant (talk) 17:47, 17 January 2015 (UTC)
Support. I would also support removing the gender from non-lemma form headwords altogether, as it reduces duplication and makes maintenance easier. Of course, genders aren't modified that often, but they still can be, and nobody is going to remember to change all the form-of entries as well, especially if there are many of them. —CodeCat 18:01, 17 January 2015 (UTC)
I would strongly oppose that for inflected forms that have inherent gender, like maisons (French), which is inherently feminine. I think it's misleading to remove the gender, because it might look like French plurals don't have gender; furthermore, it's user-unfriendly because the user has to click on the singular to get the gender. I feel like you're proposing to replace a good system with an inferior system, so I oppose it. Renard Migrant (talk) 19:12, 17 January 2015 (UTC)

Codes the ISO deleted or added in 2014

In 2014, the ISO deleted some codes and added others. Here are the changes they made and my thoughts on whether we should follow suit. If you have any comments of your own, please comment.

- -sche (discuss) 08:59, 17 January 2015 (UTC)

There are some I wanted to request the deletion of, but I can't find a 'contact us' page. Renard Migrant (talk) 17:39, 17 January 2015 (UTC)
According to the SIL page on submitting change requests, you can fill out this form and e-mail it to the address listed here. The form will become part of the public record. - -sche (discuss) 22:04, 17 January 2015 (UTC)
Data point: the entire known corpus of Yurats amounts to < 150 isolated words. I feel that extinct languages that have only been attested at this level of detail would, in general, be better recorded as single pages in the Appendix namespace than as entries in the Main namespace. Do we have any general policy on the CFIness of extinct and poorly recorded languages? --Tropylium (talk) 01:52, 18 January 2015 (UTC)
The most relevant point would be WT:LDL. But we do include quite a few languages that are very poorly attested, such as Mycenaean Greek, Proto-Norse, Crimean Gothic and Oscan. —CodeCat 01:56, 18 January 2015 (UTC)
In general, the criteria for inclusion actually make it easier to have extinct and poorly-documented languages in the main namespace than to have well-attested living languages there, by requiring more citations of the latter, heh. That's because (we wanted to make it easier to include extinct and poorly-documented languages, and) this site is geared towards covering attested natural languages in the main namespace. I think that makes some sense, too — have all such languages in one namespace. (We include a couple languages of which only a single word is attested.) But for languages with few words, we could certainly also have an Appendix: or Index: for it.
In this particular case, however, because it's not at all clear that Yurats is actually a distinct language, I recommended we not give it a code, but we could make an appendix of it. - -sche (discuss) 03:01, 18 January 2015 (UTC)
Update: I've deleted all of the codes the ISO deleted except aue, and I've added all of the codes they added except cbq, gku and the code noted above as being excluded (rts); aue, cbq and gku I simply haven't gotten around to dealing with yet (aue and gku since they're slightly messier than I anticipated, and cbq because I'd like to figure out which of its names is most common, but none of them seem to be attested at all). - -sche (discuss) 22:12, 30 January 2015 (UTC)

Literary Kajkavian Serbo-Croatian

A few days ago, the ISO added a code for literary Kajkavian Serbo-Croatian: kjv, defined as denoting specifically the 16th to 19th century literary language and not modern Kajkavian dialects. What should we do? We could decline to follow suit, and continue to handle Kajkavian as sh; or we could add kjv to Module:languages and start having ==Kajkavian== entries; or we could add it to Module:etymology language so it could be cited in etymologies but wouldn't get its own L2. Or we could repurpose the code to refer to modern Kajkavian (the same way we repurposed ltc from "Late Middle Chinese" to "Middle Chinese") and then do one of the last two things. Or...
Pinging Serbo-Croatian speakers User:Ivan Štambuk, User:Biblbroks, User:Dijan.
- -sche (discuss) 09:41, 17 January 2015 (UTC)

There is no difference between "Kajkavian literary language" and modern Kajkavian dialects (spoken and written), apart from orthographic and typographic conventions used. By the 16th century the development of all SC dialects was pretty much over (there are few changes in case desinence usages, but that's it). In any case, this should be discussed if and when the number of Kajkavian entries and their categorization and periodization becomes an issue, and not before. --Ivan Štambuk (talk) 10:03, 17 January 2015 (UTC)
The goal of ISO 639 is apparently not one language, one code, but just to have codes for whatever is useful, even if a language gets represented twice, three times or whatever. Renard Migrant (talk) 19:15, 17 January 2015 (UTC)
I read your comment as "don't add kjv (at this time)". That's fine by me. (As for categorization, we do have Category:Kajkavian Serbo-Croatian.) To ask a more specific question, is Kajkavian mentioned often enough in etymologies that it would be useful to have an etymology code for it, the way we have frc for "Cajun French" (Category:English terms derived from Cajun French)? - -sche (discuss) 21:54, 17 January 2015 (UTC)
For my point, very good example. For ISO 639, the fact that Cajun French has a code does not imply that it's not the same as French. It has a code because such a code could prove useful. Renard Migrant (talk) 21:57, 17 January 2015 (UTC)
Kajkavian, Chakavian and Shtokavian are the three main varieties making up Serbo-Croatian, where Shtokavian is the one used as the base for writing. Linguistics topics refer to the other two varieties often enough, but I don't know if there are any specific phonological isoglosses that would separate them from the larger whole. In fact, the distinction is based on a vocabulary isogloss, the word for "what": kaj, ča, što. —CodeCat 22:02, 17 January 2015 (UTC)
@-sche: Your powers of observation serve you well - the point was indeed that we shouldn't put the cart before the horse and make lengthy discussions and arrangements for activity that might as well never materialize. Regarding the Kajkavian borrowings into the literary form - we're dealing with perhaps at best a couple of hundred terms in the entire language, mostly in regional usage and I would be hard-pressed to find attestations for them outside Kajkavian works. Kajkavianisms and other words from subliterary dialects were mercilessly weeded out from the standard language (along with LWs from Turkish etc.) by Vukovians during the standardization in the 19th century. (It's a Balkans thing - keep your language as pure as your society through occasional cullings of the unfit). If anything, there is a need for deriving the other way around - literary SC borrowings into Kajkavian (which would be kind of nonsensical to name - Kajkavian Serbo-Croatian terms derived from Serbo-Croatian). Most of the kaj words were added by User:Fejstkajkafski who is a dialectologist (I think), but they don't seem to edit anymore. When it becomes a necessity we can create codes for Kajkavian as a whole and/or its subdialects (which are many, they differ a lot, are not always mutually intelligible and are today for the most part written in scholarly transcription by linguists who study it since native speakers always utilize the standard language in writing). --Ivan Štambuk (talk) 22:27, 17 January 2015 (UTC)
This reminds me of the language ISO is pleased to call "Hiberno-Scottish Gaelic", a term they apparently made up, which has been given the code ghc. Basically, it's a cover term for Early Modern Irish and Early Modern Scottish Gaelic and is hardly more different from the modern varieties (abstracting away from the Irish spelling reform) than Early Modern English is from modern English. We do not recognize ghc as a separate language, but simply treat 13th- to 17th-century Irish as ga and 13th- to 18th-century Scottish Gaelic as gd. I'm inclined to believe Ivan when he says the differences between "Literary Kajkavian" and modern dialectal Kajkavian are more orthographic than linguistic, and to oppose recognition of the kjv code, at least for the time being. —Aɴɢʀ (talk) 10:33, 19 January 2015 (UTC)
OK, I have updated WT:LANGTREAT to note that we treat kjv under sh’s code and header. - -sche (discuss) 20:44, 21 January 2015 (UTC)

Redundant Austronesian entities

I took a look at our Austronesian categorization system, and there seem to be some messy things going on. Some initial observations:

--Tropylium (talk) 02:38, 18 January 2015 (UTC)

If you're surprised that Proto-Eastern Polynesian exists and has a code, just know that until recently it existed twice and had two codes. (See this RFM.)
Anyway, pinging User:Amir Hamzah 2008 and User:Metaknowledge, who are the main editors of our Polynesian proto-language entries.
- -sche (discuss) 03:19, 18 January 2015 (UTC)
I agree with you on deleting Western Malayo-Polynesian and merging Central Malayo-Polynesian to Central-Eastern. - -sche (discuss) 03:19, 18 January 2015 (UTC)
The non-Polynesian mess is due to the easy availability of Blust's Austronesian Comparative Dictionary online. I refer to it a lot because it has a great deal of useful data, but you have to be aware of the author's biases, and you also have to be aware that he regularizes the orthography of the reflexes to make it easier to compare between languages. That source doesn't provide proto-forms for Polynesian, but they're available at the Combined Hawaiian Dictionary and Pollex Online. There are actually enough differences between PPN and PEP for the latter to be worth maintaining, even if we don't have much of anything in it as yet. I haven't really looked at the differences between Tongic and Nuclear Polynesian enough (or at least recently enough to remember anything), so I can't say much about it. If anything, I would add Proto-Central Eastern Polynesian, rather than taking away PEP. Chuck Entz (talk) 05:59, 18 January 2015 (UTC)
"There are actually enough differences between PPN and PEP for the latter to be worth maintaining". Yes, I'm sure there are differences — but we don't usually create separate proto-stages just to highlight the phonetic evolution of a word (for example, early PIE *h₂érh₃trom 'plough' is considered sufficient, and we do not create a "Late PIE" entry *árə₃trom, or a "Mid PIE" entry *h₂árh₃ətrom). If you wanted to be systematic about this type of a thing, it leads to tons of unmaintainable duplication, where approximately every Proto-Pol. entry has a corresponding Proto-EP entry — and then easily a dozen other nodes downward towards Proto-Austronesian. Yet it seems to me that the differences are not quite enough to make the two proto-stages more than two dialects of the same language, so we ought to be able to cover this type of situation well enough by noting things like "evolution s > h is Eastern Polynesian", or labeling Eastern lexical innovations as dialectal Proto-Pol.
Core point being that etymological appendices do not exist for the purpose of documenting various reconstructible proto-language stages; they exist for demonstrating the relationships of attested languages. When chronology is known in great detail, this seems to necessarily lead to having to treat closely successive stages on single pages. --Tropylium (talk) 20:17, 18 January 2015 (UTC)
These are not different stages of PIE but different notation systems - underlying (aka (morpho)phonological or etymological) vs. surface (aka phonetic, i.e. the one you get with comparative method). The established notation is however inconsistent, though some recent works tend to use slashes and square brackets to distinguish the two. E.g. our entry *bʰréh₂tēr is technically nonsensical, it should be either */bʰréh₂ters/ or *[bʰráh₂tēr] (or *[brā́ter] if one doesn't subscribe to the theory that the Balto-Slavic acute is of laryngeal origin). No idea about Polynesian stages though, but if the separate clades are generally accepted then intermediary steps are OK to have, especially if there are restrictions in lexicon to various subbranches (e.g. West-Germanic only word should be reconstructed as Proto-West-Germanic and not Proto-Germanic). --Ivan Štambuk (talk) 00:21, 25 January 2015 (UTC)
What? Of course those are different stages. Laryngeal theory is precisely about (among other similar developments) the idea that (some cases of) Late PIE *[a-] historically developed from an earlier *[Ha-], which historically developed from earlier *[h₂a-], which historically developed from Early PIE *[h₂e-]. One can claim that the sound changes continued to hold an existence as phonological processes (i.e. that *[Ha-] would have still been phonologically */h₂e-/), but don't let this distract you. All phonological processes have a diachronic origin.
…Anyway, I'm switching to your West Germanic example, as something less prone to getting the discussion stuck in details. Of course we should reconstruct any exclusively West Germanic words as West Germanic and not as Proto-Germanic proper. The question is how should we notate this. In principle, we could create a separate Proto-West Germanic appendix. But I continue to think that this is a poor approach, since an entire separate appendix for an only marginally different proto-language stage entails massive duplication of work. We do not create pages like "Appendix:Proto-West-Germanic/balluz" to go with Appendix:Proto-Germanic/balluz, even though a word's existence in West Germanic languages and Proto-Germanic implies its existence in Proto-West Germanic. What we can do instead, what we already do, and what I am proposing should be done in similar cases as well (like here, Proto-Polynesian versus Proto-East Polynesian etc.), is that we can set up Proto-West Germanic as a dialect of Proto-Germanic, and file words only attested in West Germanic under Category:Regional Proto-Germanic.
Note again also the absence of entries just to mark phonetic development! People working on this area have been content to have just e.g. Appendix:Proto-Germanic/ēnu, not creating duplicate entries like "Appendix:Proto-Germanic/ānu" to demonstrate the Northwest Germanic development *ē > *ā. It's even entirely possible to be explicit about a thing like this regardless, the easiest way being to include a line such as
* Northwest Germanic: *ānu
in the list of descendants, and then indent all NW Germanic entries to depth-2. --Tropylium (talk) 21:52, 25 January 2015 (UTC)
Nope, they're synchronic really: */bʰréh₂ters/ → *[bʰráh₂tēr], not */bʰréh₂ters/ > *[bʰráh₂tēr]. It's the same thing just in two different notations. *a is just an allophone of *e colored by an adjacent *h₂. We just use the former notation to distinguish such *a from the "real" *a that is sometimes postulated. Nobody knows the ancestral form in Early and Mid PIE because there is nothing to compare them against, and theories based on internal reconstruction are too speculative because the evidence is very scarce.
It's phonemically */bʰréh₂tēr/. Resolving -ēr into -ers is not phonemic, because there could in theory be another word for which the underlying phonemes are also */bʰréh₂tēr/, but which can't be morphologically resolved into -ers. The proper analysis is that the phonemic distinction between -ēr and -ers is neutralised. —CodeCat 23:45, 25 January 2015 (UTC)
If the distinction is neutralized it in principle doesn't matter which form you use, except in this case *ers# → *ēr# by Szemerényi, so we know that the "real" underlying phonological form for this specific word is */bʰréh₂ters/. It doesn't matter if there are other reconstructions with the same form at the surface level. Their existence or absence does not invalidate the underlying form of other words. --Ivan Štambuk (talk) 00:27, 26 January 2015 (UTC)
The differences between PWGmc and PGmc seem substantial though (shouldn't it be PWGmc *ballu really?), and when we have protolangs separated by centuries we can't really speak of dialects anymore. I suspect that there are many such "small" protolangs that should better be fitted into some larger grouping, but we don't really have space constraints and if there is a body of scholarship supporting them (not just the reconstructions, but the entire protolanguage, with inflections etc.) I see no point in forbidding them. They're the problem of those who add them. Not everything has to be perfectly structured, it's all a perpetual work in progress here. --Ivan Štambuk (talk) 23:15, 25 January 2015 (UTC)
Space constraints are a red herring. My concerns are internal consistency, and editor attention constraints on what can be reliably kept up to date. Or, more specifically: as our treatment of things like Northwest Germanic demonstrates, "having an intermediate proto-stage" and "having a separate appendix / separate language status for an intermediate proto-stage" are two different things. No one wants to stop recognizing Proto-Eastern Polynesian; I am only proposing covering it in the same appendix entries as corresponding Proto-Polynesian forms.
(This might also be the first time I hear "it is a work in progress" used as an argument against improving organization.)
Also I guess a more general discussion on preferable ways to organize etymology appendices might be worthwhile. I think I'm going to start that below. --Tropylium (talk) 00:11, 26 January 2015 (UTC)
Actually, Proto-West-Germanic never existed in all likelihood, given the current thoughts in the field. West Germanic doesn't appear to be a clade. While it certainly has some innovations that are shared among the group, it seems that West Germanic was still in a continuum with North Germanic when those changes occurred. This is similar to the West and South Slavic language groups, for which a single proto-language stage is not reconstructable either. —CodeCat 22:07, 25 January 2015 (UTC)
The Ringe & Taylor 2014 book has two chapters on it. Situation with Slavic division is different - we know for certain that they are geographical groupings and as speech communities they never existed because the earlier form of the language is already attested in terms of Old Church Slavonic, which already demonstrates dialectal variation. --Ivan Štambuk (talk) 23:15, 25 January 2015 (UTC)
  • I'm certainly responsible for some of the mess; I haven't time to respond in full, but suffice it to say that entries in PNP or PEP are not that we may show every stage in the phonetic development of a proto-word but rather that we may have entries for words that by editorial laziness or simple lack of cognates we cannot place in PPn in good faith and yet are demonstrably valid up to a certain level in the Polynesian tree. —Μετάknowledgediscuss/deeds 21:17, 22 January 2015 (UTC)
  • Returning to the subject at hand... Western Malayo-Polynesian and Central Malayo-Polynesian are negatively-defined areal rather than valid genetic groups, so proto-languages for them seem nonsensical. Is there any logical opposition to deleting the first one and merging the second to Central-Eastern? - -sche (discuss) 00:27, 26 January 2015 (UTC)
    • No argument from me. They're just the kind of artifacts you would expect from Blust's dependence on lexicostatistical cladistic analysis, so it's not surprising that he would provide lots of them, but that doesn't make them valid. Chuck Entz (talk) 01:45, 26 January 2015 (UTC)

Header levels

To the best of my knowledge, there is no 'official' list of header levels; WT:ELE mentions the headers, but not the levels. A particular one to start with: should Descendants be L3 when the descendants are not from a particular part of speech? I think the most official list we have is User:AutoFormat/Headers. Renard Migrant (talk) 11:55, 18 January 2015 (UTC)

What about DTLHS's list? —CodeCat 20:14, 20 January 2015 (UTC)
What makes something "official"? Shall we have a vote? My list also doesn't take into account the relative positions of headers. DTLHS (talk) 20:20, 20 January 2015 (UTC)
I'm not advocating officialness just asking the question. Renard Migrant (talk) 20:41, 20 January 2015 (UTC)

Proto-Arawak

Hi, could someone do me a favor and create a language module for Proto-Arawak?

m["arw-pro"] = {
	names = {"Proto-Arawak", "Proto-Arawakan", "Proto-Maipuran"},
	type = "regular",
	scripts = {"Latn"},
	family = "awd",
}

Thanks for your help! --Victar (talk) 17:49, 20 January 2015 (UTC)

I've created it, but as awd-pro, since the usual naming scheme for proto-languages is to add "-pro" to the family code. I also used "Proto-Arawakan" as the canonical name because "Arawakan" was the name of the family, but given that Proto-Arawakan and Proto-Arawak have been about as common over the last few decades, and Arawak has always been more common than Arawakan, feel free to start a WT:RFM or ask me to if you think the family and protolanguage should be renamed. - -sche (discuss) 17:46, 21 January 2015 (UTC)
Thanks a ton -sche! --Victar (talk) 19:05, 21 January 2015 (UTC)

Taíno vs Taino

Hi, the proper way to spell Taíno in English is Taíno, not Taino (sans i-acute). Can we change its canonical name to reflect this? --Victar (talk) 18:34, 20 January 2015 (UTC)

It's not one person's decision what the 'proper' way to spell a word is. Sure, we could change it if we collectively wanted to, but do we? Renard Migrant (talk) 20:42, 20 January 2015 (UTC)
Appendix:Taíno/kasike may need moving and correcting. Renard Migrant (talk) 20:46, 20 January 2015 (UTC)
By proper, I mean the spelling used in all the vast majority of academic papers and on Wikipedia. So if an admin could change it before I start adding more entries, I would appreciate it. --Victar (talk) 20:53, 20 January 2015 (UTC)
All academic papers? All of them, in the history of the English language, that's what you're saying isn't it? Renard Migrant (talk) 22:21, 20 January 2015 (UTC)
Yes, all in the history of man. ;-) But seriously, if it's the form used in academic papers and on Wikipedia, it should be the form used on Wiktionary. Do you have the capability to change it? --Victar (talk) 22:27, 20 January 2015 (UTC)
Wrong! I found one that uses Taino. Hard lines. Renard Migrant (talk) 22:33, 20 January 2015 (UTC)
OK, sure, but I can tell you that it is not the common spelling in academic papers, and not the spelling used on Wikipedia. --Victar (talk) 22:38, 20 January 2015 (UTC)
Check google ngrams. DTLHS (talk) 22:43, 20 January 2015 (UTC)
I don't know if that's the best method, since OCR software frequently misses /í/ and people often casually omit diacritical marks. Taino is also a Japanese surname. But if anything, the spelling should match the Wikipedia articles, w:Taíno, w:Taíno language. --Victar (talk) 22:58, 20 January 2015 (UTC)
Why? We are not Wikipedia’s vassals. — Ungoliant (falai) 23:04, 20 January 2015 (UTC)
That's true, but consistency is a good thing and if Wikipedia thought Taíno the preferred spelling, that says something in and of itself. --Victar (talk) 23:13, 20 January 2015 (UTC)
I'll propose the renaming on Wikipedia then. Renard Migrant (talk) 12:30, 21 January 2015 (UTC)
... and the Ethnologue and Wiktionary thought Taino the preferred spelling. If we do change the name, it should be because the Wiktionary community found it better to use the new name, not because we have to kneel before Wikipedia. — Ungoliant (falai) 23:25, 20 January 2015 (UTC)
I don't know about Ethnologue and whether they make full use of diacritical marks, but Wiktionary had no Taíno entries until I just added three, so that's why it's being brought up now. Also, are we not the Wiktionary community having a discussion on this right at this very moment? --Victar (talk) 23:40, 20 January 2015 (UTC)
  • While I grant that we intend (generally) to be a descriptive dictionary in how we create and edit our term entries (and as such, we should probably have entries for both spellings), in my subjective experience, we also seem to hew somewhat to academic norms when it comes to the terminology used for categories and other infrastructure, such as language names and codes. Avoiding ambiguity and all that. ‑‑ Eiríkr Útlendi │ Tala við mig 23:16, 20 January 2015 (UTC)
For reference, Languages of the Pre-Columbian Antilles is probably the main published work on Taíno terminology and will be often cited for entries. It too uses the spelling Taíno. --Victar (talk) 23:25, 20 January 2015 (UTC)
As is documented in wt:Languages, "Whenever possible, common English names of languages are used, and diacritics are avoided." Judging by ngrams and by the comments about academic literature, this may be a case similar to that of Maori (Māori). I'd stick with the diacritic-less name since it's common (well-attested) and easier to type. - -sche (discuss) 22:00, 22 January 2015 (UTC)
Well, like I mentioned above, ngrams isn't a good tool in this case so you have to really look at the current published academic literature to make an assessment. My impression is that Taíno is the far preferred spelling in papers on the topic. --Victar (talk) 23:20, 22 January 2015 (UTC)
I see your point though with the example of Māori being the more "correct" spelling, but for the sake of ease in typing, Maori is preferable. I think the major difference though is that Taíno is a reconstructed language, with a finite number of terms taken from academic papers, whereas Māori is a living language with tens of thousands of terms. --Victar (talk) 02:46, 23 January 2015 (UTC)

JAnDbot

Discussion moved from WT:RFDO#JAnDbot.

Please unblock user:JAnDbot. I am working on maintaining interwiki links on (language) categories and I need to use it on enwikt too (for now I must use my own account). Please note that the bot has been blocked for more than 6 years and the reason is now obsolete. JAn Dudík (talk) 11:00, 21 January 2015 (UTC)

It's an illegal bot because it hasn't passed a bot vote. That's the reason. Renard Migrant (talk) 12:35, 21 January 2015 (UTC)
The bot did not pass a vote for the bot flag, but there is no reason to leave it blocked for many years. I don't want to work on interwiki on articles, only on categories. And I want to use my bot's account because of SUL (otherwise I could make a new account, but that is problematic). There are many categories which have no interwiki even though they exist in some other languages. JAn Dudík (talk) 14:05, 21 January 2015 (UTC)
No, the requirement for a vote has nothing to do with the bot flag, but has everything to do with whom we trust enough to allow the capability to perform high-volume, unsupervised edits. It doesn't matter what account you use- if you operate a bot without permission, you're subject to being blocked. Chuck Entz (talk) 14:28, 21 January 2015 (UTC)
The user is now operating his own account as a bot... - -sche (discuss) 22:24, 26 January 2015 (UTC)
I've blocked the user for a day, and think the bot account should stay blocked because of this. —CodeCat 22:27, 26 January 2015 (UTC)

Please, can you tell me how I can work on category interwiki? When I want to use the bot, you say "no, it should stay blocked". When I use my own account, you say "he is using his account as a bot, block him". English Wiktionary is the biggest one, so even if I edit here only in must-edit cases (incorrect interwiki, no interwiki, no other language with this interwiki, missing 5 or more links), there are many edits. There are 170 other Wiktionaries where I work, and there are no problems with these edits. But English Wiktionary is so self-centred and selfish that people here say "Problems with interwiki on other Wiktionaries are not our problem. We don't want to have correct interwiki here." And some interwiki conflict? WTF? JAn Dudík (talk) 07:33, 28 January 2015 (UTC)

I would suggest starting a new bot vote and saying: These are the kind of interwiki links I want to add. Hopefully there will be more participation in the bot vote, and people will be able to look at your recent contributions and see if they're correct or not. I'm sorry we're such sticklers about bot policy... - -sche (discuss) 19:59, 28 January 2015 (UTC)

Voting to always supersede policy

I think we should seriously consider a vote that says voting always supersedes 'policy'. By this, individual pages (no matter what the namespace) and groups of pages will not be subject to any written policies if there is a vote on that individual entry. The prime example is Wiktionary:Votes/2014-11/Entries which do not meet CFI to be deleted even if there is a consensus to keep, voting comes before policy for deletion matters. The only reason I can think of not to vote on this issue is that it's equivalent to a vote on whether water's wet. It's already what we do. Renard Migrant (talk) 12:43, 21 January 2015 (UTC)

Consensuality and attestation

I have just and rather belatedly realised that the work I had last done yesterday evening had been undone on two grounds: lack of consensus and multiple quoting.

If consensus is not required for undoing but is required for doing plain work, and if it is not a general principle of Wikipedia, though it is elsewhere in dictionaries, that quotation is primarily for attestation while examples are primarily examples, then I'm giving Wikipedia away, not without very great chagrin.—ReidAA (talk) 21:31, 24 January 2015 (UTC)

Mari templates

Our templates for the Mari language seem to be inconsistent with ISO 639-3 and even with each other.

chm
The External links section in the entry Mari links this to Mari the macrolanguage, which is consistent with ISO, but {{etyl|chm}} links to the Wikipedia article on Eastern Mari.
mhr
The External links section in the entry Mari links this to Eastern Mari, which is consistent with ISO although the preferred name for the language seems to be Meadow Mari, but {{etyl|mhr}} returns an error message.
mrj
The External links section in the entry Mari links this to Western Mari, which is consistent with ISO although the preferred name for the language seems to be Hill Mari, and {{etyl|mrj}} links to the Wikipedia article on the Mari language.

Could somebody familiar with the workings of our language templates fix this? I will write Wiktionary entries for Meadow Mari and Hill Mari. --Hekaheka (talk) 11:09, 25 January 2015 (UTC)

We had a discussion in WT:RFM a while ago. @-sche: Could you help dig it up, please?
I don't find anything on Mari in WT:RFM, not in the active discussion nor in the archives. --Hekaheka (talk) 12:59, 25 January 2015 (UTC)
I'm sure it was in RFM. --Anatoli T. (обсудить/вклад) 13:58, 25 January 2015 (UTC)
"chm" is the code for Mari and Eastern Mari, which have been merged to Eastern Mari, since Eastern Mari (Meadow Mari) is the standard language of Mari people. "mrj" is reserved for Western Mari (Hill Mari), which is considered a dialect and "mhr" is not used @Wiktionary. If "Mari" is used on its own, Eastern Mari is implied, there is no other (macro) Mari language. We decided to only use terms "Eastern Mari" (="Mari") and "Western Mari" here, not "Meadow Mari", "Hill Mari" or just "Mari". --Anatoli T. (обсудить/вклад) 11:33, 25 January 2015 (UTC)
"chm" is the code for Mari and Eastern Mari, which have been merged to Eastern Mari -- I can see that we have done it. I just pointed out that we have it differently from ISO 639-3. Has this merger been done somewhere else outside the Wiktionary space or is this our own invention? Anyway, I maintain that I'm right in saying that our current practice is confusing. Perhaps a clarifying usage note under the entry "Mari" would do the trick. My Russian is elementary, but I understand from ru-Wikipedia that Hill Mari (Горномарийский язык) has some sort of official status in Mari El. Anatoli - can you elaborate a little on this point? --Hekaheka (talk) 12:59, 25 January 2015 (UTC)
Eastern Mari, Russian and Western Mari are the three official languages in Mari El, but Eastern Mari has 10 times more speakers than Western, is more widely used and is often just called Mari (in whatever language). Despite the separate code, there's no separate Mari language. Let me know if you want to know more. --Anatoli T. (обсудить/вклад) 13:58, 25 January 2015 (UTC)
There have been two discussions of Mari, wt:Beer_parlour/2013/September#Merging_Mari_and_Buryat_varieties and wt:Language treatment/Discussions#Merging_Buryat_dialects.3B_also.2C_merging_Mari_dialects, which led to the status recorded on wt:LANGTREAT: "Both the macrolanguage and its subdivision mrj are treated as languages, but the macrolanguage code is used in place of the code (chm) which the ISO gave the standard variety of the language." I thought the use of the macrolanguage's code (chm) for Eastern Mari (properly mhr) was because most sources used chm for Eastern Mari text (à la what Anatoli is saying about how it "is often just called Mari"); if that's not the case, we could always retire chm and add mhr. But note that using a macrolanguage's code for the standard variety is something we've done before, e.g. we use lv for standard Latvian even though technically lv includes Latgalian, and standard Latvian alone would be lvs; likewise we use et and not ekk for Estonian. (And the decision to recognize either the dialects (under whatever codes) or the macrolanguage, but not both, is, well, to avoid redundancy.) - -sche (discuss) 17:15, 25 January 2015 (UTC)
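The code treatment described in this thread can be sketched as a small lookup table. This is only an illustration of the convention as stated above (chm for Eastern Mari, mrj for Western Mari, mhr unused, lv and et preferred over lvs and ekk); the table and function names are hypothetical and are not Wiktionary's actual language data or modules.

```python
# Hypothetical table: ISO 639-3 code -> code used on Wiktionary,
# following only what is stated in the discussion above.
WIKTIONARY_CODE_FOR_ISO = {
    "chm": "chm",  # ISO macrolanguage code, used here for Eastern Mari
    "mhr": "chm",  # ISO code for Eastern (Meadow) Mari, not used here
    "mrj": "mrj",  # Western (Hill) Mari
    "lvs": "lv",   # standard Latvian is filed under lv
    "ekk": "et",   # Estonian is filed under et
}

# Hypothetical canonical names for the codes above.
CANONICAL_NAME = {
    "chm": "Eastern Mari",
    "mrj": "Western Mari",
    "lv": "Latvian",
    "et": "Estonian",
}


def resolve(iso_code):
    """Return (wiktionary_code, canonical_name) for an ISO 639-3 code.

    Codes missing from the table are passed through unchanged, with
    None as the name, since this sketch knows nothing about them.
    """
    code = WIKTIONARY_CODE_FOR_ISO.get(iso_code, iso_code)
    return code, CANONICAL_NAME.get(code)


print(resolve("mhr"))
print(resolve("mrj"))
```

Under this reading, a template call using mhr would need to be rewritten to chm rather than given a separate language entry, which matches the error Hekaheka reported for {{etyl|mhr}}.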

All right, I think I now understand the way of thinking followed here. In the end the exact definitions of languages are often a matter of convention, so let's follow the one that has been reached. I have written a usage note to the entry for Mari and also otherwise edited it. As the terms "Hill Mari" and "Meadow Mari" are widely used, I think we should somehow explain how they relate to the whole. I made an attempt to that end, too. Please check what you think of the entry as it is now. --Hekaheka (talk) 19:06, 25 January 2015 (UTC)

Surely Wiktionary:About Eastern Mari and Wiktionary:About Western Mari (both currently nonexistent!) should be the primary places to record policy on the representation of Mari on WT? Although I observe that about pages are neglected for a lot of smaller languages. --Tropylium (talk) 22:03, 25 January 2015 (UTC)
Feel free to set up such pages. FWIW, though, WT:LANGTREAT is the central place to record mergers of dialects, splits of languages, and information like that Western Mari is mrj while Eastern Mari is chm. The about pages often concern themselves with orthography, transliteration, entry formatting, etc. - -sche (discuss) 22:06, 25 January 2015 (UTC)

General principles of protolang appendices

It seems that WT:PROTO and WT:ETYM do not actually go much into this topic at all. A new section in the former, or a separate think tank at Wiktionary:Proto-languages might be in order.

So, some guidelines I would like to have stated explicitly — or, in case it turns out other editors disagree, discussed explicitly:

  1. Appendix pages are not dictionary entries, and are hence not automatically subject to WT:ELE.
    • This means e.g. that we are not obliged to create different entries for different, but related proto-words. In principle an etymological proto-root appendix may be only a single stem, and the different morphological variants indicated by the different descendants discussed in prose instead.
    • Obviously many etymology appendices regardless are effectively about a single reconstructed word. I agree with the current statement at WT:PROTO that in this case,

      (…) the layout of the entries generally conforms to WT:ELE, although some compromises may be made for the sake of usability.

  2. Documenting a proto-language is not the main motivation for having etymology appendices in the first place: they instead exist primarily to highlight the etymological relationships of attested words.
    • All proto-language roots should have at least one existing Wiktionary entry listed as a descendant.
    • Synonymous proto-forms differing only in e.g. declension class should preferably be discussed on a single page, not split across several "appendix entries".
    • Likewise, even if a subclade with a slightly different proto-language can be established, it should be by default treated together with its parent root. If no parent root of a particular item is known, it should be filed under "Regional Proto-Foo".
      • Which does not mean a blanket ban on establishing closely-spaced protolang levels, of course, if reasons for creating some were to regardless exist.
  3. Roots in intermediate proto-languages which have a well-known parent — such as Proto-Germanic, Proto-Indo-Iranian, Proto-Finnic, Proto-Oceanic — should be provided with either an inherited or loan etymology, marked as words of unknown origin, or tagged with a request for etymology.
    • Not all bottom-level proto-languages have been well-reconstructed, though. E.g. no consensus exists on what Proto-Afro-Asiatic looked like, and so entries in e.g. Proto-Berber or Proto-Semitic would probably not benefit from mass-tagging with requests for etymology just because a reference to cognates elsewhere in AA has not been found.
    • We currently also have a single entry in Category:Proto-Indo-European terms with unknown etymologies for some reason.
    • Similar to WT:WDL, establishing a list of well-reconstructed proto-languages might be useful to have for reference. Not for enforcing new standards on the languages on it — but as a warning sign to anyone setting out to create appendix pages dealing with proto-languages that don't have any "standard" reconstruction.

--Tropylium (talk) 01:50, 26 January 2015 (UTC)

As for Category:Proto-Indo-European terms with unknown etymologies, I just reverted the edit responsible. The etymology said "Origin unknown", and someone couldn't resist the temptation to replace that with {{unk.}}, which is kind of silly for a top-level proto-language. Chuck Entz (talk) 02:23, 26 January 2015 (UTC)

Is documenting all Unicode characters within the scope of Wiktionary?

We have a lot of information about a wide variety of Unicode characters. In many cases, there is a Translingual section just to give a "definition" for some obscure symbol. I have my doubts about whether this falls within the scope of Wiktionary. We're a database on languages, words and their meanings, but arbitrary symbols aren't necessarily used in any of those. So I think we should reconsider this, and set some specific criteria on what symbols to include. Either that or we should explicitly state that Wiktionary is a Unicode database. —CodeCat 20:15, 26 January 2015 (UTC)

I do not think any special criteria are necessary for symbols: attestation in natural-language texts should suffice. I would go further, actually, and establish a principle of describing characters rather than code points. For one, I do not think that fullwidth forms of Latin letters deserve separate entries. Keφr 20:34, 26 January 2015 (UTC)
At least we managed to reach agreement to use regular Latin letters rather than fullwidth letters in the names of entries like CD and CD机.
I could get behind redirecting the individual fullwidth letters to their regular-width counterparts.
- -sche (discuss) 21:37, 26 January 2015 (UTC)
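The idea of redirecting fullwidth letters to their regular-width counterparts can be sketched as a title-folding function. This is a minimal sketch, not an existing Wiktionary tool: the function name is hypothetical, and it relies only on the fact that the fullwidth ASCII variants U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from U+0021 to U+007E. All other characters, including Han characters, are left untouched.

```python
# Fullwidth ASCII variants occupy U+FF01..U+FF5E and mirror
# U+0021..U+007E at a constant offset.
FULLWIDTH_START = 0xFF01
FULLWIDTH_END = 0xFF5E
OFFSET = 0xFEE0


def fold_fullwidth(title):
    """Map fullwidth ASCII variants in a page title to regular ASCII.

    Hypothetical helper for illustration; characters outside the
    fullwidth ASCII range are returned unchanged.
    """
    folded = []
    for ch in title:
        cp = ord(ch)
        if FULLWIDTH_START <= cp <= FULLWIDTH_END:
            folded.append(chr(cp - OFFSET))
        else:
            folded.append(ch)
    return "".join(folded)


# Fullwidth "CD" followed by the Han character from the entry name above.
print(fold_fullwidth("\uff23\uff24\u673a"))
```

A broader alternative would be Unicode NFKC normalization, but that also folds ligatures, superscripts and other compatibility characters, which goes further than the fullwidth case discussed here.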
I'm inclined to be more restrictive than that. Symbols should only be allowed if they can be used as entities representing words, not as entities representing themselves. That is, they are used as representing something other than the symbol. That would mean "" or "" would not be allowed for English, because I imagine the only occasion where you could find them in running text is as mentions, in the sense that the symbol stands for itself as an entity. It's not used as a word to stand for something else. —CodeCat 22:09, 26 January 2015 (UTC)
Our usual attestation criterion requires "conveying meaning", so I took this as implied. Keφr 22:39, 26 January 2015 (UTC)
Yes, but the outcome of a recent vote is that consensus can make CFI mean whatever we want it to mean. —CodeCat 22:42, 26 January 2015 (UTC)
Some people may find translingual sections useful but I don't. If character information is to be kept - stroke orders, input methods, references, such as links to Unihan database, they are better kept outside Chinese sections, so this is a pro-argument.
Chinese (single) characters often convey basic meanings, without any connection to a part of speech. That's why translingual sections for Chinese characters lacked PoS. The actual PoS is determined by the usage in a phrase but there could be multiple interpretations and no consistency. There is no inherent PoS in a Chinese word. Published Chinese dictionaries sometimes use PoS info, sometimes don't and they have a lot of mismatches. We should allow ===Definitions=== header, at least for single characters. Definitions from Translingual should be moved to appropriate language sections - Chinese, Japanese, Korean, Vietnamese where appropriate. --Anatoli T. (обсудить/вклад) 23:11, 26 January 2015 (UTC)
This discussion is more about characters like U+1F6A6 VERTICAL TRAFFIC LIGHT than 漢字. Keφr 23:28, 26 January 2015 (UTC)
Well, Chinese characters were mentioned and the presence of a definition line, that's why I commented. The definitions in the Translingual sections may be useful until the moment they are moved (checking required) into language sections. --Anatoli T. (обсудить/вклад) 23:43, 26 January 2015 (UTC)
This is a completely different issue. This is about whether arbitrary Unicode characters warrant entries. 漢字 were only used as an example of characters that may be mentioned, instead of used for their meaning. Keφr 10:08, 27 January 2015 (UTC)
Yes. Chinese characters convey meaning in the languages where they are used, and don't merely stand for themselves. So they can be included for those languages. —CodeCat 23:34, 26 January 2015 (UTC)

Let's keep the meaningless Unicode to appendices, please. We have them for whoever's looking, but we need not have entries for them. bd2412 T 00:39, 27 January 2015 (UTC)

I propose to make entries of characters whose only "definition" is their Unicode character name speedy-deletable. Does this warrant a full vote? Keφr 10:08, 27 January 2015 (UTC)

Yes, because I'd oppose. I think entries of characters whose only definition is their Unicode character name should be hard redirects to whatever Appendix lists them, rather than being redlinks. —Aɴɢʀ (talk) 13:03, 27 January 2015 (UTC)
In the spirit of compromise, I propose to speedily redirect entries of characters whose only "definition" is their Unicode character to whatever Appendix lists them. bd2412 T 13:52, 27 January 2015 (UTC)
Yes, I think that's the best solution. If we delete, someone may be tempted to recreate the entries. Chuck Entz (talk) 15:10, 27 January 2015 (UTC)
On the other hand, redirecting might discourage creation of entries with legitimate definitions. For one, [[]] currently has no attested definition, but it does not mean that it never will. [[]] currently does not exist, but I have seen the symbol used as a logical AND operator over arbitrary sets: I have a really hard time finding good attestation for it, though. Redlinks make it explicit that there is no entry for a given character; redirecting gives the impression that further content is not needed, even superfluous — even though a determined editor can create an entry over the redirect. Secondly, readers may take the Unicode character name as a definition — at which it often does poorly, as I noted in the RFD to be archived at Talk:⦰. And lastly, I just dislike cross-namespace redirections. Keφr 21:53, 29 January 2015 (UTC)
You, Aɴɢʀ and bd2412 specified "characters whose only definition is their Unicode character name", though? This seems to primarily imply "decorative" Unicode blocks like "miscellaneous symbols", "box drawings", "arrows". If there is reason to suspect that there is a more specific definition in existence — as is the case for all orthographic and mathematical characters, for starters — no redirect should be created, IMO. --Tropylium (talk) 22:14, 29 January 2015 (UTC)
Nothing in this proposal, as written, implies it. There are plenty of mathematical and other technical characters with exactly this kind of non-definition (say, [[]] or [[]]), but this is not yet an indication that this is the only possible one. Keφr 13:33, 30 January 2015 (UTC)
Sorry, poor L2 use of "implies" here, I suppose. If an alleged mathematical symbol fails to be attested in any use, I agree that this proposal would allow redirecting that into the appendix namespace as well. --Tropylium (talk) 04:38, 31 January 2015 (UTC)
Support DCDuring TALK 15:52, 27 January 2015 (UTC)
Support Equinox 19:29, 28 January 2015 (UTC)
Support (but see above). --Tropylium (talk) 22:14, 29 January 2015 (UTC)
  • Documenting all Unicode characters could be within scope of Wiktionary if we so decide, and I tend to support this for the sake of user convenience. Having the information in the appendix while having a link to the appendix from the mainspace seems to do the information service for the user, although I am not sure what the benefit is over having this information in the mainspace. As I have pointed out, we include letter entries as letters despite the fact that the letters have no meaning as letters; the objection was that letters at least form meaning-carrying larger objects, which I accept as an interesting one.

    As for "consensus can make CFI mean whatever we want it to mean": no, that is not the outcome of the recent vote. The CFI means what it means and nothing else, and the vote did not suggest otherwise. The opposers in the vote did not suggest that we should be lying about what the CFI says, merely that we should not consider CFI 100% binding. --Dan Polansky (talk) 19:45, 30 January 2015 (UTC)

Goodbye (with regrets and thanks)

As an editor who has made over ten thousand individual edits over the last year or two, I was dismayed when another editor began undoing all my edits of the past few days without any warning, much less with running his reasons past me.

When this had just started I put an entry into the beer parlour justifying my edits in general terms. This has received no comments from the community.

I am aghast at this sort of behaviour by another editor, and dismayed that it doesn't seem to worry the community. My dismay is because I saw a future for Wikipedia in which the online environment, having enormous data capacity and the ability to link within and beyond Wikipedia, had the potential to eventually greatly outdo my favourite reference book, The Oxford English Dictionary.

However, this intracommunity brutality has quashed my expectations and has made me realise that I have become addicted to trying to improve the Wiktionary. The only way to cure an addiction is to give it up completely. So, and sadly, this is goodbye with thanks to the brute.—ReidAA (talk) 00:38, 27 January 2015 (UTC)

The reason it has no comments from the community is likely that this user is well known for this kind of harassment and point-making, so it's kind of the same old for most of us. Furthermore, stepping up against it would just cause a repeat of the kind of drama that the user is known for causing. So I think people kept quiet to avoid trouble. I know that was my reason for not responding, in any case. —CodeCat 01:07, 27 January 2015 (UTC)
I seem to recall that ReidAA was previously cautioned about using the same fairly lengthy passage over and over again, when the quote at issue appeared to be pressing a political viewpoint (even if the intention was not to press that viewpoint). It is quite frankly very rarely the case that the same quote will be the best showcase for each word used in it. I tend to think that the reasons for the contested reversions were pretty clear. bd2412 T 01:15, 27 January 2015 (UTC)
I don't see what the point of all the reversions was. It looks to me as if someone was just ticked off at him for reasons beyond understanding. Instead of having mediocre-to-average citations for uncited senses we have none. How is that an improvement? DCDuring TALK 01:54, 27 January 2015 (UTC)
It seems to me that the wrong party left. DCDuring TALK 01:56, 27 January 2015 (UTC)
Re: "the wrong party left". Agree here. --Anatoli T. (обсудить/вклад) 04:27, 27 January 2015 (UTC)
I, too, agree. And Reid isn't the first highly-productive user Dan has driven away from the project; remember Speednat? Unfortunately, the community seems to be very reluctant to ban people, even when they are clearly hurting the project. - -sche (discuss) 19:16, 27 January 2015 (UTC)
Let the reader please check User talk:Speednat. The editor seemed to be copying definitions word-for-word from a copyrighted source; shortly after I asked him about that (User_talk:Speednat/2012#Webster.27s_Third_definitions), he stopped adding definitions and went to enter attesting quotations which I suspected were from a copyrighted dictionary as well, which led to User_talk:Speednat#Source_of_quotations; shortly after that, they left. To see that kind of editing from that editor as "highly-productive" is an error. The reader may have a look at User talk:Speednat, and check whether the complaints that I have raised on that page were justified. --Dan Polansky (talk) 20:14, 27 January 2015 (UTC)
I may have jumped the gun with my assessment, but that was the experience I recall. bd2412 T 03:52, 27 January 2015 (UTC)
I draw your attention to User talk:ReidAA items 11 et seq. I think one person single-handedly managed to drive a promising, detail-oriented contributor out of the project. I wish I had been able to pay attention to this over the last 2-3 weeks. DCDuring TALK 04:22, 27 January 2015 (UTC)
While most of the back-and-forth there does involve a single editor, there are other conversations where this editor seems to have driven others to exasperation. Of course, we probably all do that from time to time. bd2412 T 15:11, 27 January 2015 (UTC)
I think the project is worse to have lost him as a contributor, especially as he was beefing up our English language content, especially the more common words, which could well use the attention. Many of his typographic concerns are legitimate and have been neglected and could have been addressed by attention from our technical contributors together with his willingness to attend to the details. DCDuring TALK 15:47, 27 January 2015 (UTC)
I've been asked to draw attention to the recent contributions of Reid and of Dan, where the latter are just more reverts of the former. If we can come to consensus about which version of the entries in question (put away, etc) is best, perhaps we can entice Reid to return. - -sche (discuss) 19:16, 27 January 2015 (UTC)
I request that, above all, user ReidAA is prevented from (a) removing spaces from after #, and (b) performing non-consensual switching of context and label templates. If, by contrast, user ReidAA is not prevented from switching context to lb, then I do not see how I can be prevented from switching lb to context, given the best evidence about consensus or its lack available at Wiktionary:Votes/2014-08/Templates context and label. As for (c) having a single quotation used more than 10 times, I oppose that and I want to see community consensus for this before this continues; this could even be a copyright violation, since the fair use rationale gets weaker with this: we need quotations for attestation, but we do not need to reuse a single quotation e.g. 20 times. As for (d) put away, I emphasize that the grouped senses were not grouped by hyponymy but by being from baseball. User ReidAA has been making these kinds of willy-nilly groupings as he saw fit without any regard to lexicographical propriety, which I surmise is inferior and should not be continued. I ask the reader to check how OneLook dictionaries do things, and check whether any of the dictionaries is making sense groupings in ReidAA's arbitrary vein. --Dan Polansky (talk) 19:33, 27 January 2015 (UTC)
Later: There would be no copyright violation with even high-multiplicity repetition of quotations from out-of-copyright works. But I find such highly repeated use highly inferior nonetheless, worse than nothing. I surmise that even attesting quotations should be good examples of use, which highly repeated use makes unlikely. --Dan Polansky (talk) 19:41, 27 January 2015 (UTC)
Edits that don't change the appearance of the page, such as whether there's a space after # and whether {{context}} or {{lb}} is used, should never be edit-warred over, or worried about in any way. If someone switches between one way and the other, let them. It has no bearing on the dictionary at all. As for copyvio, what does BD2412 say? It seems to me to be less of a copyvio to use the same quote over and over than to use different sentences from the same work to exemplify different words, because that way we're using less of the total work. And copyrighted or not, I see no problem at all in using the same sentence to illustrate multiple words. I've done it myself to illustrate words of Irish. —Aɴɢʀ (talk) 19:46, 27 January 2015 (UTC)
Re: "If someone switches between one way and the other, let them." I vehemently disagree. I especially disagree that editors should be allowed to remove spaces from after #, since it massively hinders usability of the wikitext from my standpoint. Furthermore, this non-consensual switching generally leads to back-and-forth even without the appearance of an edit war: one editor switches to no space after #, and, say, after a month or more, another editor switches back to their preferred form. We've seen this with "<" vs. "from" in etymologies. This non-consensual back-and-forth is unprofessional and counterproductive; it suggests immaturity on the part of the editor pushing their preferred style. It is one of the reasons why Wikipedia has rules about U.K. vs. U.S. spelling; they do not say "if someone switches U.K. spelling to U.S. spelling, let them". Moreover, I point the uninvolved reader to User talk:ReidAA to see for how long a time I have shown restraint as regards label vs. context, and that I talked about this multiple times to the user to no avail. I will emphasize one more time: non-consensual switching leads to back-and-forth and should be avoided. --Dan Polansky (talk) 20:00, 27 January 2015 (UTC)
The difference between US and UK spelling is visible on the displayed page, as is the difference between "<" and "from". The difference between presence and absence of a space after # is not. And there is no way the absence of a space after # "massively hinders usability of the wikitext"; it's a purely aesthetic preference on your part. "Non-consensual switching" only leads to back-and-forth if a second editor actually goes to the trouble of switching an edit back to how it was before, which in the cases under discussion is superfluous and frankly silly, since edits of this kind have no effect on the actual content. Of the diffs provided in this thread, I have not yet seen one worth getting upset about, much less one worth reverting. —Aɴɢʀ (talk) 20:12, 27 January 2015 (UTC)
You are entitled to your indifference about wikitext, and I am entitled to my care about wikitext and its legibility. Obviously, user ReidAA cares as much as I do, albeit in the other direction; if they did not care, they could have left the spaces alone. You cannot allow ReidAA to care, and disallow the very same type of care (albeit in the other direction) to me; that is unfair and unjustifiable. --Dan Polansky (talk) 20:18, 27 January 2015 (UTC)
If he were the one reverting your edits my reaction would be the same. —Aɴɢʀ (talk) 20:25, 27 January 2015 (UTC)
I think you're just using consensuality as a pretext for forcing your own preferences. Essentially, what you're implying is that any difference from the norm whatsoever needs to be discussed at length, and people are not allowed to edit as they wish. This, in turn, means that if anyone makes a change you don't like, you can just revert it and claim they have to ask everyone nicely for permission first. But the truth is, the only permission anyone ever needs is yours, as you seem to have declared yourself the arbiter of right and wrong on Wiktionary. And if anyone speaks up about it, the result is a stream of wikilawyering, tu quoqueing and other forms of communal shaming on your part. And I don't see any restraint at all: you basically just messaged a user and told them they should be blocked for not doing things the way you want. That's not just completely unproductive, that falls squarely in the "intimidating behaviour/harassment" ban reason. I would have blocked you for it already if I could be sure your sympathisers would not just unblock you again. —CodeCat 20:18, 27 January 2015 (UTC)
If anyone should feel entirely free to start removing spaces from after # regardless of previous overwhelming practice, why should not another person be entirely free to be adding spaces after # as they see fit? This is plain as day. You seem to want to disallow me to be adding spaces after # as I see fit, and to be converting lb to context as I see fit, while allowing the conversion in the other direction to another person; how much more unfair can this get? --Dan Polansky (talk) 20:28, 27 January 2015 (UTC)
You can't compare those two things. The space after # is more or less universal across Wiktionary, and so is support for continuing to keep it that way. However, you well know that the issue for context and label is much less clearly established and no consensus exists. So if you want to be fair, you shouldn't just be reverting this one user for that, but also the many other regular contributors that use {{cx}}, {{lb}} or {{label}}. The fact that you don't revert all editors, but do revert this one user, seems to me like you're just picking an easy target to bully, who you know won't have much support in the community or will to stand up against you. And now they left. You're a real hero, Dan, congrats. —CodeCat 20:32, 27 January 2015 (UTC)
I have absolutely no wish to perform any non-consensual switching in any new entry created by an editor with any of the {{label}}, {{context}}, {{lb}}, and {{cx}}, and I see nothing unfair about it. I see it as perfectly legitimate to use any of the four templates in newly created entries.
As for "The space after # is more or less universal", so was the use of the likes of {{slang}} without {{context}} before an illegitimate run of your bot. --Dan Polansky (talk) 20:45, 27 January 2015 (UTC)

With respect to the copyright question raised above, there is little to no impact on a fair use rationale from using the same quote multiple times. However, there is no need for us to expose ourselves unnecessarily to any risk of copyright infringement. I would import the practice that Wikipedia uses with images: if it is in the public domain, use it however you want; if it is not, then use it only if there is no comparable public domain alternative. Of course, new words (or words coined since 1923, in any case) will primarily be found in works that are under copyright, as will examples showing that old words remain in current use. I admit, when I add cites to an entry, I always try to find both the oldest use I can find, and the most recent. This also speaks to the quality of sentences as showcases for the words defined. We have a much better fair use argument for reproducing a sentence that does an excellent job of showcasing the word (where that word is the "star" of the sentence) than we do for a lengthy passage with the word incidentally buried in it, particularly if it can easily be shown that better example sentences are available. bd2412 T 20:32, 27 January 2015 (UTC)

Are function words ever the star of a passage (at least to anyone besides a grammarian)? —This unsigned comment was added by DCDuring (talk • contribs) at 21:31, 27 January 2015 (UTC).
Probably only in very rare and odd circumstances, but I would guess those words would be the easiest to find in public domain sources. Don't forget that the public domain covers not only pre-1923 stuff, but all U.S. government-produced documents (including the limitless world of federal court opinions), and anything that any private author has deliberately released into the public domain. bd2412 T 21:37, 27 January 2015 (UTC)

Um, guys --

The very next thread here, though by another user, mimics the lead-in to this thread so closely that I surmise it is probably the same editor as ReidAA. Notably, this new editor also signs off as WF, making me wonder if we aren't barking up the wrong tree by censuring anyone for ReidAA's seeming departure... ‑‑ Eiríkr Útlendi │ Tala við mig 21:28, 27 January 2015 (UTC)
It could just as easily be WF mocking ReidAA; or another person entirely mocking them both. bd2412 T 21:35, 27 January 2015 (UTC)
Yes, Wonderfool is mimicking Reid's post for comedic purposes; he previously mimicked Semper's post in which he announced he would stop patrolling Recent changes. - -sche (discuss) 21:40, 27 January 2015 (UTC)
  • Does anyone here have access to tools for seeing the IPs from which ReidAA and Regret and Reward have connected? Otherwise we're left with conjecture. ‑‑ Eiríkr Útlendi │ Tala við mig 00:20, 28 January 2015 (UTC)
    D'oh. DCDuring TALK 00:34, 28 January 2015 (UTC)
  • Does it matter? FWIW I think it's most likely to be WF making fun of this thread and that ReidAA has nothing to do with it, but really it just doesn't matter either way. —Aɴɢʀ (talk) 00:38, 28 January 2015 (UTC)
  • If the two are the same, then this current thread about why ReidAA left is basically entirely mooted. ‑‑ Eiríkr Útlendi │ Tala við mig 00:46, 28 January 2015 (UTC)
  • Only a checkuser can do that, and they wouldn't do it just for a fishing expedition like this. As for whether the two are the same: the only thing they share is their manner of using quotes. In editing style, level of competence and basic approach, they're quite different. Chuck Entz (talk) 03:08, 28 January 2015 (UTC)

Hello (with relief and disapproval)

As an editor who has made over one hundred thousand individual edits over the last ten years or so, I was amused when another editor began deleting all my edits of the past few days without any warning, much less running his reasons past me.

When this had just started I was creating lots of Spanish plurals, by the way.

I am not surprised by this sort of behaviour by another editor, and am relieved that it doesn't seem to worry the community. My amusement is because I saw Wiktionary as a thing in the past, having enormous data capacity and the ability to link within and beyond Wikipedia.

I too am addicted to improving Wiktionary, and figure the only way to take my addiction to the next level is by staying put. So hello to all users old and new, it will be a pleasure working side by side with you all. WF --Regret and reward (talk) 16:07, 27 January 2015 (UTC)

Abuse filters for TV show content?

I don't know why — perhaps it's a rogue bot — but we very often get people creating TV episode lists on Wiktionary. This has been going on for months. I suggest new abuse filters that prevent entries with a title starting "List of", or entries that contain {{Infobox television. Equinox 00:26, 29 January 2015 (UTC)
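The two conditions suggested above can be sketched as a simple predicate. This is only an illustration in Python, not actual AbuseFilter rule syntax; the function name `looks_like_tv_list` is hypothetical, and matching the template name case-insensitively is an assumption that goes beyond what was proposed.

```python
def looks_like_tv_list(title: str, wikitext: str) -> bool:
    """Return True if a new page resembles a misplaced TV episode list.

    Hypothetical helper: it only mirrors the two conditions suggested in
    this thread and is not how any existing filter is implemented.
    """
    # Condition 1: the page title starts with "List of".
    title_hit = title.startswith("List of")
    # Condition 2: the page text contains "{{Infobox television".
    # Lower-casing both sides is an assumption made for this sketch.
    body_hit = "{{infobox television" in wikitext.lower()
    return title_hit or body_hit
```

A real filter would presumably also need to look at the namespace and the editor's history, since a legitimate page could match either condition.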

I would say this is a relatively rare occurrence. If this were a bot, we would get that much more regularly. I think these are just idiots who cannot tell us from Wikipedia and click buttons at random. I rewrote Special:AbuseFilter/12 to catch this and some more. Keφr 13:06, 29 January 2015 (UTC)

User:Type56op9

I have blocked this user (User:Type56op9) with no direct evidence. It is up to you guys to judge if this block is correct. --kc_kennylau (talk) 14:46, 29 January 2015 (UTC)

  • I have always assumed he was WF. SemperBlotto (talk) 14:48, 29 January 2015 (UTC)
    • @SemperBlotto: He has admitted it in his talk page, case closed. By the way, what does WF mean? --kc_kennylau (talk) 14:51, 29 January 2015 (UTC)
      • Oh, I talked too early... --kc_kennylau (talk) 14:54, 29 January 2015 (UTC)
        • I can’t help but like WF. He was always very sharp, but also immature for his age. Most people continue to mature in judgment, understanding, and outlook through the age of 30, and I assume WF is slowly growing up as well. I would not support him for admin status, but I think he’s an asset and his occasional bouts of wigging out didn’t result in any damage that was hard to find and fix. —Stephen (Talk) 21:45, 29 January 2015 (UTC)
          • I've come around to the position that indeffing WF was wrong and that he should be unblocked. Purplebackpack89 22:59, 29 January 2015 (UTC)
User:Type56op9 is unquestionably Wonderfool (aka WF), though only Wonderfool knows who Wonderfool himself is. As Stephen said, he's mostly an experienced, hard-working asset to Wiktionary, but is prone to irresponsibility and occasional bad behavior. He also tends to focus more than he should on quantity rather than quality. Many years ago, he did a bunch of massively disruptive stuff and got permanently blocked for it. For some years after that, he would constantly reappear under new accounts until someone (usually SemperBlotto) figured out it was him and blocked those accounts, too (he's gone through literally hundreds). Finally he got tired of doing that all the time, and proposed a compromise, which resulted in an agreement: as long as he stuck to one account at a time and behaved himself, he wouldn't get blocked. I would liken this to parole or work release: he's free to do what he wants, but we watch him carefully, and any admin is free to block him at any time for even minor misbehavior. This has worked for the most part, but he does still end up getting blocked from time to time, and he should never be taken at his word, and never be completely trusted. As Stephen said, he's gotten more responsible over the years and acts up less. The main problem lately: given his track record with bots (we still occasionally find bad entries from years ago, especially from his Polish period), it's hard to trust him with one, and his fixation on high edit-volume makes him want to do things with bots and accelerated entry-creation all the time. Chuck Entz (talk) 02:26, 30 January 2015 (UTC)
User:Chuck Entz's pretty smart, you know. The best new user in the last few years, IMHO, thanks to his diplomacy, good humour, sharp observations and conscientious editing. He's hit the nail on the head with me, too. As for the Polish, yeah, I admit punching way above my weight in that one - I'd been learning Polish, but certainly not enough for quality entries here. --Type56op9 (talk) 07:49, 30 January 2015 (UTC)
BTW, it's nice to be talked about every now and then, too. Thanks! Perhaps we all should talk about each other more (nicely, if we can) and this place will be more harmonious. Of course, the opposite could certainly be true too... --Type56op9 (talk) 07:52, 30 January 2015 (UTC)
The only thing worse than being talked about is not being talked about. —Aɴɢʀ (talk) 09:38, 30 January 2015 (UTC)
Yes, but keeping your head below the parapet for a while, not causing controversy, and not being talked about can also be nice. Even though I'm not admin and have no desire to be, I would say no bots for this user. Donnanz (talk) 10:09, 30 January 2015 (UTC)

Cosmetics vs. makeup

Category:Fashion contains both Category:Cosmetics and Category:Makeup. What, if any, is the difference between them? —Aɴɢʀ (talk) 17:40, 30 January 2015 (UTC)

  • There's a very fine line of distinction, I think. Can they be merged under one category - e.g. "Cosmetics and make-up"? Donnanz (talk) 17:57, 30 January 2015 (UTC)
There aren't many entries in either category. It wouldn't take long to merge them. Donnanz (talk) 18:01, 30 January 2015 (UTC)

Distinguishing loans vs. inheritance

Etymologies on Wiktionary currently use the expression "(derives) from" in no fewer than three mutually exclusive ways.

  1. A word has been inherited from an ancestor language, and its older form has been recorded/reconstructed as (*)foo.
  2. A word is a loan from another language, where it appears as föö.
  3. A word is a derivative of another word in the same language, and can be analyzed as fo+o.

The third is generally categorized separately (Category:Words by suffix by language, etc.), but the first two are currently completely mixed together under Category:Terms derived from other languages.

There seems to be widespread consensus that a word's etymology should be traced as far as possible along any one of these directions: e.g. name needs to be not only in Category:English terms derived from Middle English, but also Category:English terms derived from Old English, Category:English terms derived from Proto-Germanic, and ultimately Category:English terms derived from Proto-Indo-European. Similarly, alkali needs to be at least in Category:English terms derived from French, and Category:English terms derived from Arabic. (Whether we need to note other intermediate languages of transmission is less clear, but this is not the current point.)

Things get messy once we start mixing these together. Obviously we will not categorize terms according to their morphological structure in their source language, whether it is an ancestor or a loangiver. But plenty of words like bazaar are currently filed under "English derivations from PIE", despite deriving via a long Wanderwort chain and not via inheritance. (This particular word is a pretty bad case, in that there isn't even a PIE cognate to the word itself, only to the roots from which it was assembled in Old Persian.)

To get some order around here, I believe introducing some finer-grained variants of {{etyl}} would help, perhaps by some flags. This would allow us to distinguish the various ways in which a word can derive from a different language. I could imagine eventually distinguishing all sorts of things, e.g.

  • {{etyl|la|fr|via=desc}}: French words inherited from Latin
  • {{etyl|la|fr|via=loan}}: French words loaned from (e.g. Modern) Latin
  • {{etyl|la|fr|via=loan-desc}}: French words loaned from a descendant of a Latin word in another Romance language
  • {{etyl|la|fr|via=desc-loan}}: French words inherited from Old French but loaned in there from (e.g. Medieval) Latin
  • {{etyl|la|fr|via=und}}: French words deriving in some undetermined fashion from Latin
  • {{etyl|la|fr|via=deriv-desc}}: French words derived from another French word, which has been inherited from Latin
  • {{etyl|la|fr|via=deriv-loan}}: French words derived from another French word, which has been loaned from Latin
  • {{etyl|la|fr|via=loan-deriv}}: French words loaned from a Latin word, which is a derivative of another Latin word
  • {{etyl|la|fr|via=desc-deriv}}: French words inherited from a Latin word, which is a derivative of another Latin word
    • These last four could be useful if the intermediate derivative is archaic or unattested.

But just a loaned/inherited/neither distinction should be a good start. --Tropylium (talk) 20:22, 31 January 2015 (UTC)
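As a rough illustration of the coarse loaned/inherited/neither distinction proposed above, here is a minimal Python sketch of how a `via=` value might select a category. It is not how {{etyl}} is implemented; the function `category_for`, the `Route` enum, and the "inherited from" and "borrowed from" category names are all hypothetical. Only the generic "terms derived from" wording follows the existing categories mentioned in this thread.

```python
from enum import Enum


class Route(Enum):
    """Coarse ways in which a term can come from another language."""
    INHERITED = "inherited"
    BORROWED = "borrowed"
    UNDETERMINED = "undetermined"


# Hypothetical mapping from the proposed `via=` values to a coarse route.
# Compound values such as "loan-desc" are deliberately left out here and
# fall back to UNDETERMINED below.
VIA_TO_ROUTE = {
    "desc": Route.INHERITED,
    "loan": Route.BORROWED,
    "und": Route.UNDETERMINED,
}


def category_for(source: str, target: str, via: str = "und") -> str:
    """Return an illustrative category name for a term in `target`
    that derives from `source` by the route named in `via`."""
    route = VIA_TO_ROUTE.get(via, Route.UNDETERMINED)
    if route is Route.INHERITED:
        return f"{target} terms inherited from {source}"
    if route is Route.BORROWED:
        return f"{target} terms borrowed from {source}"
    # Anything else keeps the existing generic wording.
    return f"{target} terms derived from {source}"
```

Under this sketch, `category_for("Latin", "French", "desc")` would give "French terms inherited from Latin", while a Wanderwort such as bazaar, tagged with a compound or unknown value, would stay in the generic "English terms derived from ..." category instead of being counted as inherited.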

We already have {{borrowing}}, so I propose we create another template to match it, for inherited terms. —CodeCat 20:50, 31 January 2015 (UTC)
Strong support. This would be a huge improvement on the meaningfulness of our etymological categories. — Ungoliant (falai) 20:58, 31 January 2015 (UTC)