This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

August 2009

Multiple forms in translation sections

With the creation of a new Index:French by Conrad.Irwin's awesome script, I'm noticing an odd little side effect that calls for a discussion:

Should the feminine of nouns and adjectives be given in translation section? Should it be linked?

The presence of a linked entry creates (IMHO) undesirable non-lemma entries in the Index. I've dealt with this in several ways, and considered others. Here's a summary (example for abject, simplest to most elaborate):

abject (fr) m
abject (fr) m, e
abject (fr) m, e f
abject (fr) m, abjecte f
abject (fr) m (abjecte f)

As far as adjectives go, something like "abject, e ^(fr)" would be an ideal solution (if only because adjectives do not have inherent genders in French, unlike nouns), but a similar problem arises with variable nouns. In either case I'm not too keen on giving a full feminine entry, since it's just a call for trouble as later editors are likely to link them. Circeus 02:18, 2 August 2009 (UTC)

I don't think putting the feminine form at all is necessary. The same would apply to Spanish and Italian, then (and a score of other languages). I think that information belongs on abject#French, not the translation. I do appreciate the point you're making, though. Mglovesfun (talk) 09:20, 2 August 2009 (UTC)[reply]

Swedish would have three forms (for some irregulars, four) there, and I have been recommended only to use the lemma... But I have never considered giving the gender of this lemma (it's always common, if we are talking adjectives). Should I have done so? \Mike 10:32, 2 August 2009 (UTC)[reply]

In the case of adjectives, I agree with Mglovesfun and \Mike, and feel very strongly that it should be just abject (fr); I see no need to list the feminine singular, masculine plural, or feminine plural, and no need to indicate that this is the masculine singular form. (And I certainly see no need for half-measures, listing the feminine singular but not either plural, and indicating that the form abject is masculine but not that it's singular.) It's a given that French adjectives have four forms, of which we list only the lemma form (masculine singular), just as with French nouns (two forms; singular) and French verbs (forty-eight synthetic forms; infinitive). A reader looking for inflection information will click the link.
In the case of nouns, I do think we should include both in full, because I don't think they're two forms of one word, but rather, two very closely related words. (Arguments can certainly be made either way, but I prefer to err on the side of including both, since otherwise it's not obvious to a reader that both even exist.)
—Ruakh_TALK 03:33, 4 August 2009 (UTC)[reply]

If the "form" is included (and is often useful, as long as one doesn't go overboard with f, f pl, m, m pl ... ;-) it should be linked:

abject (fr) m, abjecte (fr) f

If this confuses the script, the script needs a bit of fixing I think. (Yes, it is awesome ;-) In any case, using "e" or something like "-e" should never be done. It is one of the irritations of paper dictionaries, where one is never quite sure exactly which letters, if any, are replaced. Robert Ullmann 14:04, 5 August 2009 (UTC)[reply]

Template to go with Appendix:French spelling reforms of 1990

I've adapted this from the French (re-reading for errors very welcome). A template to allow quicker creation of these 'alternative spellings' seems a plus to me (fr:Modèle:ortho1990 in French). As far as I can see, either a variant of {{alternative spelling of}} with a usage note, or my preference, something that goes directly under ====Usage notes==== would be better. Any ideas what name would be good? Preferably starting with fr-. Mglovesfun (talk) 09:17, 2 August 2009 (UTC)

Perhaps relevant is the German template {{de-note obsolete spelling}}. It's used in usage notes to show that the spelling was made obsolete by a German spelling reform. —Rod (A. Smith) 17:36, 2 August 2009 (UTC)[reply]

Thanks for that link. The spelling reform of 1990 didn't actually make the old spellings obsolete; it tried to, but it didn't gain traction, and the eventual result was that new spellings were deemed also-correct. Even French governmental publications generally use the old ones; and as you can imagine, the new ones really got nowhere in Francophonia-at-large. I'd argue that for most of the reforms, it's more important to label the post-1990 spelling as "supposedly O.K., but don't try it at home" than to label the pre-1990 spelling in any way. But Stephen's comment suggests that something similar is the case for the last German spelling reform, so perhaps French and German editors can share thoughts on a good way to present this information. —Ruakh_TALK 03:41, 4 August 2009 (UTC)[reply]

Yeah, I used gout in a university article, and got told it was wrong, at which point I said it wasn't, and since we didn't have a dictionary to hand (and well, there are other students!) we dropped it. Mglovesfun (talk) 16:00, 7 August 2009 (UTC)

Language maps

One might know the experiment of putting translations on a world map to arrange them geographically. While I think this is not a good idea (imagine that for water), we have the Languages by country categories which are currently a bit bland and could benefit greatly from maps of languages. It's mainly useful if you don't exactly know the name of a language (or if it's ambiguous), but you know where it's approximately spoken, in which case the map would help a lot, but there are probably other uses for it. By extension, the Languages by genetic classification categories could also get these maps, though it might be a bit more problematic with these. Ethnologue, in their 16th edition, also introduced language maps, and I believe Wiktionary's category system would be much more useful if we did the same. -- Prince Kassad 19:49, 2 August 2009 (UTC)[reply]

AWB

Can I be approved for AWB? For now I want to change 'Suffix' to 'Affix' in a family of articles I created until discussion comes to consensus at Tea Room, then change to whichever term we agree on. kwami 02:37, 3 August 2009 (UTC)[reply]

I've added you to the list of approved users. --Ivan Štambuk 02:46, 3 August 2009 (UTC)

Thanks! kwami 06:47, 3 August 2009 (UTC)

suffix or interfix?

I raised this question in the Tea Room, but perhaps it belongs here, since it has wider policy implications.

There are "suffixes" such as English -tum- and Esperanto -aĉ- which cannot occur word finally. Question is, can we label as a suffix s.t. which has the form "-X-", or is our "suffix" label restricted to "-X"? Neither intrafix nor infix would seem to apply to these cases. (I'm not sure what -tum- is, since it always attaches to other affixes, forming words which contain no root or stem, but -aĉ- is universally described as a suffix in Esperanto, when that term is used at all.)

Anyway, the discussion is over there. kwami 07:16, 3 August 2009 (UTC)

Attesting color names

This continues a discussion begun at WT:RFV#outer space. DCDuring TALK 15:18, 4 August 2009 (UTC)

It seems to me that we have very basic questions about color words (nouns and adjective, mostly):

1. To what extent do we accept the nomenclature of standard-setting bodies?
2. What is it that we should be attesting to?
3. What are the limits on what we can achieve that are imposed by current and near-future computer technology?

Analogies come from taxonomy and chemical nomenclature. In those cases there are standards for current usage, which specific institutional systems have been in operation for less than a century, I think. There are previous naming practices which have some carryover usage, sometimes differing significantly. Matching vernacular names to standard names might be a valuable service to normal users. The analogies mostly seem relevant for question 1.

It seems possible that what we should do initially is identify non-standard terms in our Category:Colors and run those through RfV. That only requires that we identify one or more standard-setting bodies that seem to address nomenclature in a way consistent with wiktionary.

There is obviously a special role played by the standards for color representation on a computer screen. At one level we just have to accept it. At other levels color on a computer screen is a match to real-world color only through our interpretative meat-mechanisms (and prosthetics). DCDuring TALK 15:46, 4 August 2009 (UTC)[reply]

Definitions in technical subject areas may be prescriptive, and may be more precisely defined than general-use definitions. So, e.g., while plum may be defined generally as a deep blue-purple colour, another technical sense may carry the subject label {{web design}} and be defined as “#DDA0DD”. Because of our descriptive methodology, I would prefer to see some actual usage attested for each of these, rather than just mass-producing entries from technical glossaries, even if the definition is based on a prescriptive source—if web designers don't actually write or speak about the technical plum colour, then we oughtn't define it. This has a broad application, for example an arm is different in medicine, a tint in visual arts, or DNA in genetics. I believe clay has different meanings in soil science, hydrology, and ceramic arts. —Michael Z. 2009-08-04 17:08 z

When used as a definition, "#DDA0DD" may have that restricted context, but, when used to generate a representation of what people might expect, it has potentially greater use. Mass-reproduction of starter entries with a highly restricted context sounds like a great idea to me. We could proceed from there, adding broader contexts where we had confidence and facing selective rfv challenges. After all, it's not clear that it is so much more important that we attest color entries as opposed to multi-word entries or prepositions or proverbs or engage in any of the hundreds of other classes of tasks that await us at every turn. DCDuring TALK 19:33, 4 August 2009 (UTC)[reply]

The colour swatches can be misleading. Indian red is not a particular colour, but an open series of chemical pigments, both natural earths and artificial chemicals. Its defining qualities are of interest to printers, artists, and house painters: richness and intensity of hue, mechanical coverage, and chemical permanence of the colour; and to chemists: the inclusion of ferric oxide and no other common ingredient. It is not a particular red colour, but a wide range of rusty red and purplish hues. Although crayons, coloured pencils, and an HTML hex triplet have been named after it, they are not Indian red. Representing Indian red with a swatch of #CD5C5C is like throwing up a picture of Walt Disney's Goofy to illustrate dog. The number only represents the (non-standard) HTML colour, and nothing else. It fails to give the reader the correct impression, which is easily done with a half-dozen words.

I'm also not crazy about dumping a technical glossary into the dictionary, although we already have something like that in Appendix:Colors. What is needed now is attestation or some corpus research. Without that, do we know if ecru is really used as an adjective, or tow-colored as a noun, or if they are actually used by anyone at all? —Michael Z. 2009-08-05 03:59 z

Template:Xyzy

Mr. Ullmann seems to be under the impression that any changes to his template have to go through lots of red tape and his personal approval before being implemented. This wouldn't be much of an issue if I were trying to add support for some language spoken by 500 people, but Urdu? I'll point out that Hindi is supported. So why not Urdu? Basically the same language with a different script. Why should this be an issue that requires a vote? <edit> We also actually have at least one Urdu editor who I would expect this to be useful for. Do we have anyone who does major work in Belarusian, Macedonian, Ukrainian? Or Tamil or Telugu? — [ R·I·C ] opiaterein — 13:40, 5 August 2009 (UTC)[reply]

It would be very useful. (could you possibly can your abusive attitude? someday?) The issue is (as explained on the talk page) that changes must be made very carefully, as each addition is a tremendous amount of overhead, and deletions are/will be very, very painful (finding every default use by 'bot, and adding sc=). So it is just about going slowly and looking at them. Make the suggestions, and at some point we will add a few carefully. This template is used more times than any other (411 thousand pages, but more times on each page than the two before it, see Special:MostLinkedTemplates). Changes are not trivial. Robert Ullmann 13:54, 5 August 2009 (UTC)

I will probably continue to be abusive to you if you keep stunting my contributions with what I perceive to be ridiculous bullshit and meaningless, extremist sensationalism. (This "proposal", and "vote" are, in themselves a crime against humanity)

If each addition is a "tremendous amount of overhead", why are there so many to begin with? — [ R·I·C ] opiaterein — 14:05, 5 August 2009 (UTC)[reply]

None for any of those listed except Macedonian which has 1-2 semi-regular editors but is usually inactive 10 out of 12 months a year. All Indian languages except Hindustani are way underrepresented considering the number of their speakers. Hindi and Urdu are usually added in pairs (since they're basically the same language in 2 scripts) by Dijan so it makes sense to allow either both or neither of them.

Now reading upon the technical limitations of {{Xyzy}}, it appears to be only of limited assistance to the editors. If the defaulted scripts get introduced/dumped according to the # of Wiktionary entries the languages they reflect have in a particular moment of time, it could possibly prove to do more damage than being useful considering that it can apparently support only a limited number of default scripts (how many?), and there is no way for us to know how many entries Wiktionary will have in e.g. 10 years in what language, so sooner-or-later when we hit the limit there'll be no way to keep in sync the top languages which should have their script defaulted and those actually supported by {{Xyzy}}, since once they're added to {{Xyzy}} they cannot be removed without doing damage that will not be easy to fix. IMHO it's simply the best to add sc= manually (it could be also added by a bot in most of the scenarios), than to rely on such a mutable template. --Ivan Štambuk 14:01, 5 August 2009 (UTC)

Quite so. Note that the defaulting of script is only part of what the template does; the more important bit is generating language tags when there is no other script template, the tags being tasked to script templates. The defaulting is only useful to a small number of languages, where it is very useful. (Japanese, Armenian ...). It isn't really based on the number of entries (which as you observe correctly, would only be a statistical starting point). And the point is that it must not be "such a mutable template" to work effectively. (;-) Robert Ullmann 14:12, 5 August 2009 (UTC)[reply]

threshold for voting

As far as I know, everyone who had created an account here before the vote started, is eligible to vote. This has led recently to appalling manipulations (in the vote of unifying Serbo-Croatian) and influx of unknown novices with less than 10 edits for Wiktionary. It is insulting to see how the votes of (not one or two, but dozens of) unknown editors stir discord and impede important policies and how those novices' votes are influencing the decision with the same weight as Stephen's, Ruakh's, Prince Kassad's, Ivan's and so on. Therefore I suggest adopting a policy prohibiting users with less than x contributions from partaking of votes. In French Wikipedia the threshold is 50 contributions in the main space and at least 7 days of contributing before the vote started. In Bulgarian Wikipedia it is 400 contributions in the main space and 40 days before the vote started. In German Wikipedia, if I remember aright, it was 300 (or 200...) contributions in the main space. So, before starting a vote I would like to know what kind of threshold most of you would indorse? 150, 100 contributions in the main space, Appendices or Citations? How many days of contributing prior to the vote ought to be required (and not just of creating the account, which became trivial after the SUL had been introduced)? The uſer hight Bogorm converſation 09:47, 6 August 2009 (UTC)

I thought we had already implemented such a threshold for voting. I agree that we should require at least the 50 contributions with 7 days anticipation that the French have, but I would not be opposed to a requirement of 300 contributions with 30 days in anticipation of a vote. —Stephen 10:00, 6 August 2009 (UTC)[reply]

Contributing to Wiktionary is much easier and faster than to Wikipedia, so I'd be rather in favor of the figure of some 200-300 edits in the main namespace, and 7 days before the vote was started. --Ivan Štambuk 10:38, 6 August 2009 (UTC)[reply]

500 contribs and 30 days. --Vahagn Petrosyan 11:32, 6 August 2009 (UTC)

10,000, one year, verified En-N status and identity, and Mensa membership. DCDuring TALK 22:50, 6 August 2009 (UTC)

Your cynicism is not appreciated. 500 edits is a figure that can be achieved in a few days by any decent contributor. --Ivan Štambuk 22:58, 6 August 2009 (UTC)

Sarcastic is perhaps the word you were looking for. How could you know that I was cynical? With equal basis, I could say that I find the proposals of a self-proclaimed elite to be the height of free-loading cynicism, attempting to appropriate a valuable resource for their own purposes. Also, how do you know that there isn't someone who appreciates what I said. Did you take a straw poll or have you determined that you are the spokesman for a silent majority? DCDuring TALK 23:16, 6 August 2009 (UTC)[reply]

No, I meant "cynical". --Ivan Štambuk 23:23, 6 August 2009 (UTC)

Perhaps you could explain yourself. DCDuring TALK 18:22, 10 August 2009 (UTC)

I don't mind if a new or casual contributor votes, I just don't want a non-contributor voting, or someone contributing in order to vote. Casual and new contributors are as entitled to their opinions as I am to mine (and while some are very stubborn, most tend to take a "go with the flow" approach until they have their sea legs). But they have to be here because they want to contribute, not because they want to vote, or else we risk becoming a battleground for vote-canvassers with their own agendas. (And given recent events, the word "risk" may be an understatement.) —Ruakh_TALK 21:26, 6 August 2009 (UTC)[reply]

I don't want regular contributors recruiting people to manipulate the outcome of a vote. — [ R·I·C ] opiaterein — 18:34, 7 August 2009 (UTC)[reply]

This is absolutely the wrong way to approach this problem. I'm not entirely sure I can explain myself, but I'll try. Firstly, I can't see the use in an edit-count based privilege system - restricting to whitelisted users would make a lot more sense - but again is an outrageously blunt measurement of ability to make sensible decisions (and to actually use the whitelist as qualification for voting would bring suspicion on that process too, so let's forget I said it). Secondly voting is a mainly useless way of making decisions anyway, which is exactly why Wikipedia have their !vote page (which of course no-one anywhere sticks to because votes are by far the easiest way of doing things). Votes have two places that I can see, 1) We have come to a common conclusion, let us ratify it and document it formally; 2) We have decided to take action, but are ambivalent to which of A,B,C we actually do. When a vote no longer falls into those two categories it becomes a pointless waste of time - all the discussion time wasted trying to "inform the misled" could be much better spent working on either the dictionary, or a counter-proposal. A lecturer of mine once pointed out (somewhat more eloquently) that, while economists trade to bring mutual benefits, politicians fight to try and be the one who wins. The recent S-C vote is a prime example of where politics gets in the way of mutual benefit, simply because it's easier to fight than to compromise. Conrad.Irwin 21:29, 6 August 2009 (UTC)[reply]

What’s to prevent someone from dragging in a bunch of ringers from the Wikipedias and whitelisting them en masse? A contributions quota and probationary period may not be the perfect way to insure that only those with a real interest in Wiktionary will cast a vote, but it’s probably the most practical way we will find. Let’s make it 500 contributions and a 30-day waiting period. Also whitelisting. —Stephen 23:33, 6 August 2009 (UTC)[reply]

There shouldn't even be a vote if it's contentious enough for people to care that much - is I think my main point anyway. Conrad.Irwin 00:14, 7 August 2009 (UTC)

Hm, I was just coming to the BP to start a topic similar to this... I suppose 30 days and 500 contributions is reasonable. How about a no previous bans? Or no bans within the past year or two? Kinda like the no votes for convicted felons :) — [ R·I·C ] opiaterein — 17:45, 7 August 2009 (UTC)[reply]

This would prevent anyone who previously took a wikibreak and new admins experimenting with the block function from voting. -- Prince Kassad 17:48, 7 August 2009 (UTC)

"Experimenting with the block function"? I have to say, Kassad, as much of a p.o.s. I think Ullmann is, I'm really disappointed in you. — [ R·I·C ] opiaterein — 18:32, 7 August 2009 (UTC)[reply]

We can make that a reasonable exception to the rules. The important thing is to filter out the malicious voters with zero interest in both improving Wiktionary and actually contributing here. These seem to thrive recently and we must stop such votes. --Ivan Štambuk 17:50, 7 August 2009 (UTC)[reply]

What's the percent of voters in the oppose section who created accounts in 2008 and made no edits until the vote? Bet it'll be high. — [ R·I·C ] opiaterein — 18:32, 7 August 2009 (UTC)[reply]

Presumably the way to implement this would be to vote on it? And, assuming rationality, everyone who does not fulfil these criteria will vote against such a proposal. Such restrictions are not fair, and impose a larger bias on the votes than already exists. Stopping legitimate newbies from voting is vindictive, stopping deliberately subversive votes is not necessary >90% of the time. Additionally, the amount of bureaucracy needed to ensure that everyone voting fulfils any criteria is an absolute waste of resources. The only correct solution to votes that people are willing to cheat at is to accept that the outcome of the vote is undecided and return to the drawing board - making correct decisions can not be down to the amount of voting force you can muster, it must be down to proper discussion. Conrad.Irwin

Re: "And, assuming rationality, everyone who does not fulfil these criteria will vote against such a proposal": That's not even close to true, unless you plan to traipse through the 'pedias canvassing for "oppose" votes. But it's true that, as an open wiki, we probably can't even enforce the kinds of controversial decisions that these restrictions might seem useful for. —Ruakh_TALK 19:48, 7 August 2009 (UTC)[reply]

I didn't say there would be any, and the chances are there would be very few, but it is a good example of a vote that would be very skewed by implementing such arbitrary criteria. Conrad.Irwin 22:18, 7 August 2009 (UTC)

Re: "stopping deliberately subversive votes is not necessary >90% of the time" - The ongoing SC vote is exactly a situation where well-defined criteria for vote-acceptance are necessary, and where relatively significant amount of votes (both supportive and opposing, more of the latter group I'd say :D) appear to come from users expressing their political opinions and not voting on the proposed WT:ASH policy per se. It simply doesn't make sense to equally treat all votes in relatively "controversial" votes such as this one, as the "good faith" principle would always be abused in such cases. --Ivan Štambuk 20:09, 7 August 2009 (UTC)[reply]

Ok, so Ullmann's team is better at cheating than Ivan's - this is exactly why decisions should never be made by vote; votes are there to acknowledge that a compromise has been agreed on. I don't think anyone in particular is to blame in this case, but the grown-up thing to do is just abandon the current destructive snow-ball and start afresh. Even if you make it harder for newbies to contribute, you don't remove the significant effect that rhetoric can have; you don't even remove the probability that Wiktionarians who know nothing about this particular issue will vote anyway. Conrad.Irwin 22:18, 7 August 2009 (UTC)[reply]

votes are there to acknowledge that a compromise has been agreed on - there was a consensus amongst all the contributors for 3 months, nobody was bothered when I announced it in March. As far as I can see, of the regulars having any proficiency in Slavic languages, only 2 of them are voting for oppose. The apparent "lack of consensus" was introduced by Ullmann by political FUD. These canvassed opposing votes by nationalist bigots who imagine that "linguistics does not determine what language is" are worthless, and it's just a matter of formally making them so. --Ivan Štambuk 22:46, 7 August 2009 (UTC)

I agree that this vote is not ideal, but I still notice that you completely ignore any possibility that the people opposing might be right; as I have no knowledge on this issue, only your opinion, I am very wary. It is clear that objections should have been raised before the vote started, but hey, not everything will work all the time. The mature thing to do is just let the vote run its course, the conclusion of a vote does not prevent it from being run again, and there would be much less wastage of time if there was less discussion on the vote page. In fact, now I think about it, preventing any discussion in the Support/Oppose section of all votes might well be a good thing. Conrad.Irwin 07:52, 8 August 2009 (UTC)[reply]

I don't see on what exactly the people voting for oppose may be right. We have Croatian nationalist bigots that plainly lie that Serbian and Croatian are as distinct as Romance languages. We have others that imagine that I'm a "Yugo-chauvinist" ! Several of them openly say "this is not matter of linguistics, but politics". Not a single one of them has actually opened a WT:ASH talkpage and contributed to the discussion of the possible deficiencies of the proposal. This pretty much proves everything I wrote, on the "differences" being more imaginary than real. The only argument I've seen from the opposing clique worth discussing is that they're against "forbidding languages", which I personally consider absurd as we are not forbidding anything, only treating it commonly at a single ==BCSM== header, in a 100% NPOV way. I also agree that prolonging the vote wasn't a good thing to do. We'll perhaps have to reiterate it later, but with voted voting-acceptance threshold (for this particular vote, the date would still be July 1st). --Ivan Štambuk 12:12, 8 August 2009 (UTC)[reply]

Maybe we should be more specific, then. Maybe for votes that aren't controversial, no major guidelines should be in place, but for votes that need to be repeated (as this one probably will), we should be more strict on who we allow to vote. For instance, this current vote isn't going to affect 90% of the people who voted. They'll just go back to their respective projects and forget it ever happened. Why should their opinions matter as much as those of the ones who actually contribute in the area? — [ R·I·C ] opiaterein — 20:21, 7 August 2009 (UTC)[reply]

We can always "freeze" the vote, and close (i.e. interpret the results) later, when we agree on the details of the vote acceptance rules. It's pointless to force all the people to waste their time again and again to repeat the position which they already explicitly expressed by a vote before, and stood by it for 5 weeks. Or e.g. allow everyone to change their vote (during some reasonable period), if they feel like doing so. But simply repeating the whole procedure again...it makes me shudder at the very thought. --Ivan Štambuk 20:37, 7 August 2009 (UTC)[reply]

Modifying the conditions under which a vote is being run, while it is running, is a ludicrous proposal. You simply cannot ask for a community decision and then ignore the outcome because you don't like it. Sure, it's not the terms under which a vote should be running, but it is too late now. Conrad.Irwin 22:19, 7 August 2009 (UTC)[reply]

But there are no conditions now! (with the advent of SUL), and with obvious canvassing it's imperative that we introduce some. You still didn't explain why it would be a "bad precedent", or "ludicrous"? It's absurd to have the vote close up with different end-results in a timeframe of several weeks. We all agree that the votes of these nationalists bear little value, it's just a matter of formally acknowledging it. --Ivan Štambuk 22:39, 7 August 2009 (UTC)[reply]

It undermines any point of voting - I don't understand why you can't see that. What is the point in having a vote if the outcome is decided by someone based not on the number of votes, but simply on the number of votes that they choose to acknowledge, on criteria that they choose to impose. It's exactly as if we had a vote for a new prime-minister and labour noticed that the polls implied most 18-25 year-olds voted for the lib-dems so decided that anyone under 25 is not mature enough to vote, and then discounted all of those votes. You are certainly of the opinion that "the votes of these nationalists bear little value"; they presumably are not. How do I, who knows nothing about either political viewpoint, know which is "more right", I've seen references to academic work from both sides. I'm quite happy to not consider the political/linguistic consequences and implications of this, clearly some people aren't, maybe they don't understand the issue, but maybe it is me who doesn't understand. Conrad.Irwin 07:52, 8 August 2009 (UTC)[reply]

The present, unrestricted, situation is akin to requiring no citizenship qualification for voting in an election; i.e., allowing tourists and other foreign visitors to vote: maybe no big deal, unless the country has a small electorate that is liable to be swamped or the vote is particularly close or controversial. Do you see the analogy? Allowing this isn’t democracy, it’s heterarchy. † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 14:08, 15 August 2009 (UTC)[reply]

Your analogy is really far-fetched. The only reason why those folks are eligible for voting now is because we haven't set any threshold at all. That omission on our part could be trivially fixed later. As I said, if we reiterate the vote after a few more weeks, but with vote-acceptance rules set of e.g. 300 edits before July 1st, you'll get pretty much exactly the same end-result. It's not a big deal if we do it, it's just easier to apply the vote-acceptance rules retroactively to a frozen vote than to wait for 1 month again. 1 more month of needless stress upon the community, and I'm pretty sure that everyone is fed up with this mess and simply wants it resolved ASAP.

As for the "uninvolved party doesn't know whom to trust": Just wait a few more days until I collect more e-mails from professionals who actually wrote books on SC (dictionaries, grammars, cutting-edge research). You can either trust them, or "academicians" who are interested in "proving" that these are separate languages by long political and historical tirades... Every single dictionary of "Croatian" is also 95% valid dictionary of Serbian and Bosnian, and this has been so for the last 100 years and will not change in the next 100 years. This is an undeniable fact, and discussing anything else is a waste of time. For a lexicographer, there is only one way to go. --Ivan Štambuk 12:25, 8 August 2009 (UTC)

If we leave the vote alone, we'll save ourselves a lot of stress, and it'll come up as no consensus. So you can just go back to doing what you've been doing. If the other HBS contributors agree, just do it. If Ullmann wants to have a hissy fit about it, let him. He doesn't contribute to HBS so his opinion on it means shit, IMO :D. If our only HBS contributors think it better to not be racist and divide things up meaninglessly... let them. Who knows, it might help keep out the racists. Or when they get here they'll bitch themselves to death. — [ R·I·C ] opiaterein — 02:53, 8 August 2009 (UTC)[reply]

I just went to vote in the Wikimedia board elections and noted that they have a similar threshold for voting. There they require that a voter not be blocked, not be a bot, and have made at least 600 edits before 01 June 2009, and have made at least 50 edits between 01 January and 01 July 2009. —Stephen 18:14, 8 August 2009 (UTC)[reply]
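
A minimal sketch of the threshold described above, in Python. The Editor record and its field names are hypothetical, and reading "between 01 January and 01 July 2009" as start-inclusive and end-exclusive is an assumption; the actual election rules may count edits differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Editor:
    """Hypothetical record of an account; not a real MediaWiki structure."""
    is_blocked: bool
    is_bot: bool
    edit_dates: List[date]  # one entry per edit


def is_eligible(editor: Editor) -> bool:
    """Apply the four conditions quoted from the board election rules."""
    if editor.is_blocked or editor.is_bot:
        return False
    # At least 600 edits made before 01 June 2009.
    edits_before_cutoff = sum(1 for d in editor.edit_dates if d < date(2009, 6, 1))
    # At least 50 edits between 01 January and 01 July 2009
    # (assumed here: start inclusive, end exclusive).
    recent_edits = sum(
        1 for d in editor.edit_dates if date(2009, 1, 1) <= d < date(2009, 7, 1)
    )
    return edits_before_cutoff >= 600 and recent_edits >= 50
```

Under this reading, an account registered only for the vote fails the first count regardless of how many edits it makes afterwards, which is the effect a threshold is meant to have.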

It makes sense, and as has been pointed out before, 600 isn't even that difficult to get to on Wiktionary, especially with our new assisted editing tools.

I noticed just a minute ago that in our current "voting guidelines" it says "Anyone can vote, especially regulars from other language Wiktionaries" which not only contradicts the previous 'rule', but I think it gives a lot of room for the kind of meatpuppeting we're seeing on the current BCS vote. I don't vote at the French Wiktionary... it wouldn't be right. I may edit there once in a while, but their votes don't affect me. Why should users from hr.wikipedia be able to dictate how we do things here if immediately after the vote, they're going back to their main projects, never to be seen by us (at least here) again? — [ R·I·C ] opiaterein — 18:41, 9 August 2009 (UTC)[reply]

About the threshold, I would like to point out that it's very easy to get any number of contributions on a Wikipedia: you just have to look for common misspellings or typographic problems and to correct them. It's very easy, trust me (from time to time, I enter televison in the Wikipedia search box, and I correct a number of pages). I would say that it's easier to reach any given threshold on Wikipedia than on Wiktionary (because there are fewer misspellings here). Therefore, I would adopt the same kind of rules and threshold as Wikipedias. The most important rule is that voters should explain the reason for their vote: any vote without giving a reason is useless when you try to conclude.

But the most important point in the talk page is There shouldn't even be a vote if it's contentious enough for people to care that much (see above). This is related to the NPOV principle. Lmaltier 08:30, 15 August 2009 (UTC)[reply]

That's contrary to Stephen's perception that the facility of editing Wiktionary is higher. By adding translations one is able to make dozens of edits per day, whereas apart from spelling corrections this is not the case in Wikipedia. So you insist on explaining the vote? Well, I could incorporate that rule too. Do I need to decrease the amount of votes for that or you suggest just adding it to the proposed threshold? The uſer hight Bogorm converſation 10:53, 15 August 2009 (UTC)[reply]

It's very easy to contribute here by adding translations, I agree, or by adding new pages (it's easier to find new pages to be added than for Wikipedia). Nonetheless, I think that it's still much easier to find something to correct on Wikipedia. I see no reason to require a higher threshold here. The condition about explaining the vote is something different, but somewhat related: I think that anybody, even somebody with very few contributions (or no contribution at all), may bring helpful arguments, and that's what matters. Lmaltier 12:06, 15 August 2009 (UTC)[reply]

I can't really be bothered reading all that above, but couldn't you semi-protect votes (or at least, controversial ones) which has the same effect, right? Mglovesfun (talk) 13:18, 15 August 2009 (UTC)[reply]

No, semiprotection would allow anyone who registers a username to vote, which would apply to almost all of the RU’s Yugoslavian thugs who came, registered a username, and then tried to shove their uninformed but political-extremist views down our throats. —Stephen 15:51, 15 August 2009 (UTC)[reply]

Ouch!

I'm assuming that the decision by Apple to censor Wiktionary from the iPhone mentioned in this article is a result of assumption rather than research, or could they have a point? Would anyone be interested in trying to communicate with Apple to see if we can improve our standards to match theirs? Conrad.Irwin 21:31, 6 August 2009 (UTC)[reply]

I got from a WMF list that the writer of the app has to rate us "17+" for Apple to approve the app. I guess someone could do a "G" version, but is a wide-open wiki likely to be able to guarantee a "G" rating? DCDuring TALK 22:41, 6 August 2009 (UTC)[reply]

Read Gruber's original article[1] and follow-up[2] for the whole story. Apple is twice removed from us, but it appears that for whatever reasons, several developers have a need to be able to filter our database. Ultimately they have to be responsible for their results, but perhaps we can assist this kind of thing with a high standard of consistent labelling: coarse, vulgar, offensive, etc, or perhaps in more detail: sexual slang, vulgar insult, etc. —Michael Z. 2009-08-07 04:06 z

Yes, I agree [with Mzajac]. It's not "improv[ing] our standards" to remove words that meet CFI, but if someone wants to bowdlerize, we can help with that. That sort of sense-label is useful even for human readers. That said, much of our content will not be useful to a determined bowdlerizer; if we tag well, they can filter out sense lines and the stuff under sense lines; but they won't be able to use our onyms, translations, etc. —Ruakh_TALK 14:25, 7 August 2009 (UTC)[reply]

I think that the reusers can take care of themselves. They have already bowdlerized the content.

I also understand that some libraries and schools block WMF sites. And some of our direct users or rather the parents thereof have complained about the same kind of content. I don't think we have been willing to consider the implications of those complaints. Should only registered users have access to such content so that we can serve educational needs more broadly? Should we have a bowdlerized version for parents and institutions who are attempting to retard children's use of such vocabulary or respond to religious, moral, social or political norms? Or we could just leave the bowdlerizing process to institutions that have values more in line with "censorship" than WMF?

Bowdlerizing seems to me like a bad fit with the base of contributors we have, though I think we once had someone who might have had an interest. If someone would like to do it, then we could try to make that easier, but I wonder whether:

we would agree on how to do it and
we would be willing to enforce any sanctions against someone who undermined any rules we were able to enact. DCDuring TALK 15:54, 7 August 2009 (UTC)[reply]

one's vs. someone's in English verb phrase headwords

I just wanted to check my understanding of a simple point. To bust one's chops (bust one's own chops) is not the same as bust someone's chops (bust someone else's chops). The use of one's relative to an object of a simple verb, phrasal verb, or preposition phrase implies that the subject of the verb is the "one". The use of "someone's" implies that someone else is involved. "One" has crept into more than a few headwords where it does not belong, it seems to me. I haven't checked all of the OneLook dictionaries as to their practice in this regard, but RHU/Dictionary.com uses one's and someone's exactly as I would have expected.

If this is so, how is it that there are so many entries which use "one's" where "someone's" seems more appropriate? Is there a regional (UK/US) difference? If so, I dread the implications. DCDuring TALK 01:59, 7 August 2009 (UTC)[reply]

I believe some progressive dictionaries use plainer language, like bust your chops. COD and NOAD both use one's for the reflexive and someone's for the transitive. OED uses one's and the or (a person's), for example, tickle the fancy, but hate (a person's) guts [boldface and italics sic]. —Michael Z. 2009-08-07 03:45 z

Thanks for the UK and Canadian/North American confirmation. I wouldn't object to less affected language, but the trashing of the distinctions for many English idioms really distresses me. Sometimes the two are conflated with redirects. DCDuring TALK 14:42, 7 August 2009 (UTC)[reply]

I don't think I confirmed any difference (and a sample of 1 wouldn't confirm anything)—COD is the Concise Oxford Dictionary. But I found some more: NODE (UK), Random House (US), AHD (US), and CanOD (Can.) also use one's and a person's the same way. —Michael Z. 2009-08-08 22:54 z

Do we need separate entries for expressions that have both reflexive and transitive uses of verb idioms? For example: cool one's jets and cool someone's jets? To me the "someone's" version is more general, more appropriate for a lemma. DCDuring TALK 11:56, 8 August 2009 (UTC)[reply]

I think cool one's jets is only reflexive: you cool your jets, I cool my jets. I can't cool your jets.

Anything transitive can probably be used reflexively, even if it's unusual: “I busted his chops,” “I swung wildly and accidently busted my own chops.” there may be cases where the reflexive meaning is different, though I can't think of one at the moment. —Michael Z. 2009-08-08 22:54 z

Yes. I thought you confirmed no difference, though I wasn't clear about that. I should have mentioned that 2/20 of the uses in COCA of "cool someone's jets" were not reflexive. (Linguistic creativity at work?) Perhaps given the relative infrequency it could be left to a usage note, but that makes it even harder to find or notice.

Do you think "a person's" would be more acceptable than "someone's"?

In reviewing our verb-phrase headwords containing "one's", I see a majority that "permit" transitive, non-reflexive usage -- not that one couldn't find some exceptions. Very many have objects that are considered exclusively under one's own personal control or experience (eg, temper, tongue, time). But perhaps one could bide one's master's or employer's or client's or principal's time.

Contributors have vastly preferred using "one's" to "someone's" or "somebody's". In many cases, such as prepositional phrases, there is no direct harm. But they also use "one's", even when the reflexive use is not common or even "impossible". If we were starting fresh perhaps "someone's" could be mandated in all headwords, except those with reflexive use (ie certain verb phrases). This would have the effect of sensitizing users to the difference in meaning between the two terms in the cases where it matters: verb phrases with objects. But it doesn't seem realistic to change any non verb-phrase headwords with this rationale.

So the open questions, just for verb phrases, are:

Given that transitive includes reflexive, logically it should never be a problem to substitute "someone's" for "one's". But it seems to be. At how high a level of relative frequency of non-reflexive usage should the lemma be worded with "someone's" instead of "one's"? 1%? 2%? 5%? 10%? 20%? 40%?
Does it ever pay to have separate reflexive and non-reflexive/transitive entries?

I am looking forward to hearing more thoughts on this. DCDuring TALK 01:02, 9 August 2009 (UTC)[reply]

I'm not sure that there are more than a dozen more verb-phrase entries that use "one's" where "someone's" would be better. There must be more problems in the bodies of the entries, but that is a second-order problem. We also have many uses of "somebody's", but that is only a matter of style consistency. DCDuring TALK 01:45, 9 August 2009 (UTC)[reply]

I only found one in COCA: “maybe the prospect of a 15-hour flight has cooled your jets,” but it does seem to show that this is transitive. (Bonus points for someone who demonstrates that the transitive is a new sense extended from the reflexive.)

Well, the reflexive (only-reflexive) is not transitive, so it should have “one's.” We don't have the capability or knowledge to do frequency studies right now, so I wouldn't set a quantitative threshold—we have to rely on citations and editors' judgment.

Separate entries? Depends on the case. I think there is a lot of subtle judgment to be shown in these. For example, the stereotypical phrase is “not on my watch‚” but there's nothing wrong with “I won't mess up on your watch”—so I think on one's watch is technically transitive, but typically reflexive.

But a look at a couple of pages of search results for one's and someone's tells me that most editors have intuitively done this right. The only mistake I found is under one's thumb—it seems to me that you can be under anyone's thumb except your own. Is this an example of a transitive, non-reflexive case? —Michael Z. 2009-08-09 02:03 z

Yes. Once I was a fish who just swam, reasonably well. Then I thought I could describe how I swam. Then I found my descriptions weren't very accurate. Now I find that I can't swim while thinking about it. I've even lost faith that others really know how to swim. It might just be time for a wikibreak. DCDuring TALK 02:27, 9 August 2009 (UTC)[reply]

CFI for English versus other languages

Am I the only one that feels like there are a load of deletion requests for English words, but when someone (okay, thinking of me) makes a nomination in another language that is equally unidiomatic, often much worse, it gets kept? On fr: as well we have some English stuff that I'd like to get rid of, but I can't get it through a vote. swing away was one, surely swing (make a swinging movement) + away covers this nicely? It's that old chestnut where the translation of an idiomatic (or even single word) term in English is unidiomatic in the target language. Consider the Spanish conmigo which means with me, I'd be very unsurprised if that got created in Spanish. Mglovesfun (talk) 16:05, 7 August 2009 (UTC)[reply]

Could fr.wikt count as evidence for our exclusion of an English term? How different are their standards for inclusion? Perhaps we could provide SoP determination services for them on a request basis. We might get a few entries or improved/additional component-word senses. DCDuring TALK 16:45, 7 August 2009 (UTC)[reply]

What are some examples of French entries that you wanted to delete? I suspect that the apparent different attitude is based on the fact that English is our native language, so we don’t need to bother ourselves with unidiomatic SoP terms in English, while French is a foreign language that most of us do not know, and entries that the French Wiktionary might reject, such as aux, are useful and needed by us because for us they are not so crystal clear. So it will be easier to understand your argument if you bring some examples. —Stephen 18:23, 8 August 2009 (UTC)[reply]

Numbers

I created 1992 which was tagged as RFD almost straight away ([3]). 1992 has its own entry in the online Oxford English Dictionary. I won't duplicate that specific debate here but what is the policy about entries made (only) from the digits 0 - 9.

My first thought was that some such as 101, 911 and 999 are useful but that I would question 5555.

Having thought about it more, there is a case for allowing all digits 0 to 9 just to illustrate the number system; there is also a case for including numbers like 11 and 50 which have their own word ("eleven", "fifty") which cannot be broken down into smaller words. You could go further and say 1 to 2009 are acceptable as they refer to calendar years. I think that there will always be special cases that should be included such as 101, 911, 1471, 1992 whether or not they fall in the 'basic range'.

I would also support including entries starting with 0 where appropriate e.g. 007 (see James Bond).

I'd say every entry acceptable as a written out word is also appropriate as a number (i. e. 0-20, 30, 40 and so on), plus numbers with special meanings such as 666 or 1337. -- Prince Kassad 19:26, 9 August 2009 (UTC)[reply]

This last would exclude few years. It is also not part of our sole applicable policy and so requires a vote to have lasting impact. It would be better to first determine how existing WT:CFI applies.

There is nothing about a year number that would mean it has to be excluded automatically by CFI. But it could be argued that, after the basic elements of a numerical system are defined (0-9, "-"), all other numbers, qua numbers, are defined by the "morphology" or "grammar" of numbers and are SoP in their most basic meaning and not worthy of lexical treatment, just as an infinite number of constructable phrases are excluded.

The interesting question to me is: What should be valid attestation of the meaning of "1992" or similar? IMO, our standard should probably be that the year number must be used in a way that brings forth the referent events at the year number's first use in a given document (I would argue that a teaser paragraph or title should not count, precisely because such use depends on the reader not knowing the importance of the year in question: eg 1421: The Year China Discovered America, Gavin Menzies.) Would the attestation be specific to a language? If 1066 were attested only in Middle English, would 1066 only have a Middle English L2 header? Similarly 1968 in Czech, etc.? DCDuring TALK 19:58, 9 August 2009 (UTC)[reply]

I don't really "get" what the 1992 entry is meant to be, what does 1992 mean? I'm not against numerical entries either, I added fr:360 (and 540 and 720) to fr.wikt as they (to me) are undeniably English words. Mglovesfun (talk) 20:27, 9 August 2009 (UTC)[reply]

Okay I get it now it's been cleaned up - so how would comparable terms like 7/7 and 9/11 do? I think these are also synonyms of proper nouns. Mglovesfun (talk) 20:36, 9 August 2009 (UTC)[reply]

"1992" came to refer to the harmonization of legislation affecting trade within the EU and the signing of the Maastricht treaty. In the business press worldwide, "1992" was sometimes used as a reference to the process. A journalist might have asked a president of a multinational: "What are you doing to prepare for 1992?" not meaning the year, but the harmonization, and do so without any need for explanation. "Since 1992" can still be found in discussions of recent economic and political history, referring not to the year itself. There is a clearer case for inclusion than for almost any other year number because of the use in advance and the fact that some of the anticipated events did not actually happen in 1992, but are still apparently referred to as "1992". HTH, DCDuring TALK 20:41, 9 August 2009 (UTC)[reply]

Template:Spanish possessive adjective

Hello, as yo#Spanish uses Template:Spanish pronoun to navigate quickly, I suggest extending this kind of template to the Spanish determiners. Moreover, on fr: we already have the French ones, ready to be imported. JackPotte 15:20, 10 August 2009 (UTC)[reply]

So I'm going to proceed as on fr:. JackPotte 21:21, 6 November 2009 (UTC)[reply]

Names

Take a look at User talk:Alasdair, I agree these need sorting out, the appendices and categories. I appreciate Alasdair's massive input, but it does need to follow WT:ELE. For example, do you write Category:oc:Names or Category:Occitan names? The first one looks okay to me, by comparison with Category:oc:Place names. I have a few more things to add, but I can't find the page names yet. Mglovesfun (talk) 11:40, 11 August 2009 (UTC)[reply]

Okay, this is pretty horrible, somewhat unfinished and abandoned, possibly irrelevant too.

Appendix:Names male-A, weird title, something like Appendix:Male given names/A seems more appropriate.

I don't want to "chase" Alasdair away from the Wiktionary, but this is not a blog or a personal website, there are rules and guidelines. Mglovesfun (talk) 11:49, 11 August 2009 (UTC)[reply]

Obviously we cannot have both Category:Occitan names and Category:oc:Names. Given names and surnames are parts of speech, so I would go for "Occitan names". Place names are topics (London is defined as a city, not as a place name), and Category:Place names is an erratic title. It should be Category:Places (see Dan Polansky's user page, under Surnames), but nobody has the energy to change it. User Daniel. has recently created Category:Spanish names, to be used in his Template:namecatboiler.

But do we need a top level name category at all? A "name" can mean too many things, every proper noun can be called a name, rose is a name of a flower.

As for Alasdair, Appendix:Names and Appendix:Surnames are quite useful, although they do contain many mistakes. It's good to have all given names and surnames in alphabetical order. I wish we could persuade Alasdair to stick to them. Every time Alasdair steps outside them he creates a mess - not only in format, but I'd say about five percent of his information is erratic or mere guesswork.--Makaokalani 12:13, 11 August 2009 (UTC).[reply]

I always wondered how Appendix:Names and Appendix:Surnames are supposed to be used? What are they for? I could clean up Appendix:Armenian given names if I knew its function. --Vahagn Petrosyan 13:18, 11 August 2009 (UTC)[reply]

I can see two reasons for name appendices. One is a preliminary list to see which names we are missing - a good example: Appendix:Hungarian male given names. Or something that cannot be explained in an ordinary entry or through categories, like frequency - a good example: Appendix:Chinese surnames. I don't understand why Alasdair copies names from Danish or Norwegian categories and calls them Appendix:Danish/Norwegian given names. But we also get strange appendices transwikied from the Wikipedia.

Appendix:Names and Appendix:Surnames can work as preliminary lists, if you use discretion. Alasdair has been adding secret explanations for them for three years, nobody understands why. For example, Kettu is supposed to be a Finnish male name, from the surname Kettunen (!!). But what's the harm? Very few people are likely to push the edit button. Maybe Kettu really is a name in some language. What I'm worried about is that he'll put it in Appendix:Finnish given names, or make an actual entry.

Maybe a bot could make all these remarks visible, in small text for example. But then somebody would have to clean them up. I'm certainly not volunteering.

While we are on the subject of Armenian names, what do you think, Vahagn, if I created "Category:Armenian male/female given names in Roman script"? Or should they be grouped separately by every language using Latin alphabet, is there too much variation? It seems wrong to define transliterations as "English", I think they should be "Translingual", even if they are not used in every language of the world. But whatever the language statement, a proper category is missing. --Makaokalani 09:36, 12 August 2009 (UTC)[reply]

You're raising a very tough question. I'm sure you remember this discussion which did not end in a real consensus. As I see it, we need three types of categories:

1) Armenian names in Armenian script, e.g. Աժդահակ (Aždahak) (such categories already exist at Category:Armenian male given names and Category:Armenian female given names)

2) Non-Armenian names in Armenian script, e.g. Օբամա (Ōbama) (no such category yet)

3) Armenian names in Latin/Cyrillic/etc. script, e.g. Vahagn (no such category yet)

3a) Armenian names in English, transliterated according to English pronunciation rules, e.g. Azhdahak

3b) Armenian names in French, transliterated according to French pronunciation rules, e.g. Ajdahak

3c) Armenian names in Russian, transliterated according to Russian pronunciation rules, e.g. Аждахак, etc. --Vahagn Petrosyan 14:01, 12 August 2009 (UTC)[reply]

That's the most logical comment I've heard about this problem so far. And the list would go on: Armenian names in Thai, Thai names in Armenian... Here and here is some more discussion. Real names (like English Natasha, pronounced as in English) should be separated from the way a foreign name appears in that language (like Kirill, pronounced as in Russian). Surnames are easier, immigrant families usually keep them but change the pronunciation in a few generations. "Category:English surnames from Armenian" is fine. But if Vahagn is an English proper noun, how can it be entered in a category beginning with "Armenian..."? Should transliterations be grouped by script or by language? Hundreds of languages use Roman script and there usually isn't so much variation. We have very few names of this kind yet, maybe it's too early to worry. I'm just nervous that somebody will create categories like "English given names from Armenian/Armenian given names from Thai" etc. --Makaokalani 12:22, 13 August 2009 (UTC)[reply]

The categories can be called Category:English renderings of Armenian given names if you want an English proper noun to be in a category starting with "English...". Re the grouping: I think it definitely should be done by language, not scripts. For example, Armenians in France render their names in Latin differently from the ones in the US, Uruguay or Turkey. Besides, different sections in different languages would have different inflections. If you are worried about the same spellings in hundreds of languages being entered into Wiktionary, I don't see that happening. I'm sure just an English entry for Latin spelling or a Russian for Cyrillic one will suffice in practice. --Vahagn Petrosyan 23:16, 14 August 2009 (UTC)[reply]

I'm skeptical about much of that (anyone surprised?)

Names (a.k.a. proper names, proper nouns) are a special kind of word.[4] They are translingual in a way. Whichever language you speak of me in, I still have the same name.

Romanizations and other transliterations are typically not “English renderings” or whatever. Some romanization schemes are language-specific, but others are not, and even the ones that are are usually used in different languages. People who romanize their names tend to use one form in any language. I remained Michael when I travelled in several non-anglophone places, and Russians have one official passport romanization that they must use throughout the world.

Language is more clearly an etymological attribute of a name than a synchronic one. Михайло is a traditional Ukrainian name, but it may also be used in Russian or Bulgarian. Michael is an English version of it, but used in many languages. Mykhajlo is a romanization of it, also used in many languages, including English.

By the way, Place names and Surnames are lexicographical categories classifying words. Places and people are encyclopedic categories classifying things. People, places, and things are already well categorized in Wikipedia—let's stop trying to duplicate their work, because we will forever do it worse—and anyway we only have entries for words in the dictionary, not entries for things. —Michael Z. 2009-08-19 03:35 z

Aside: Names are never "male" or "female", as they have no biological sex, no reproductive organs, and neither mate with each other nor reproduce. Names have grammatical gender, which is properly expressed as "masculine" or "feminine". If we are going to talk about corrections in page names, etc., then this should be addressed. --EncycloPetey 05:31, 16 August 2009 (UTC)[reply]

I disagree. English names do not have grammatical gender, because English does not have grammatical gender. They do, however, have social gender, in that, say, "Michael" is used for men, "Michelle" for women. Similar things can be said about languages like Hungarian/Magyar and Finnish and Persian/Farsi, which all lack a masculine-feminine distinction linguistically (even more so than English — they don't have separate pronouns for men as for women), but nonetheless have names that tend to belong to one sex or the other. Nowadays "masculine" and "feminine" are usually preferred to "male" and "female" when social gender rather than biological sex is at issue, but in a dictionary, we need to be extra careful not to give the impression that grammatical gender is in play. —Ruakh_TALK 01:39, 21 August 2009 (UTC)[reply]

(off topic, but...) English does have grammatical gender, but not in agreement between head and dependent or in morphology (a few rare nouns aside). Grammatical gender in English is evident only in pronoun selection. With personal pronouns, we have masculine (e.g., he), feminine (she), and neuter (it), and with relative pronouns, we have personal (who) and nonpersonal (which). Because of gender, you can say the baby is in its crib, but you can't say *baby Mia is in its crib. --Brett 23:30, 21 August 2009 (UTC)[reply]

I've seen some "stolen gender": a blond man, a blonde woman, a naïf man, a naïve woman. Vanishingly rare, though. Equinox ◑ 23:37, 21 August 2009 (UTC)[reply]

Google searches with non-alphanumeric characters

A common problem, I bet this has come up before, but Google tends to ignore (or just deal badly with) special characters like é, è, ë when doing searches, which makes it harder to verify stuff. Is there any way to get round this, or is there another search engine that deals with them better? Mglovesfun (talk) 12:50, 14 August 2009 (UTC)[reply]

Put it between quotation marks, and then only the exact form will be matched. E.g. try "Bronte" vs. "Brontë" Qorilla 14:13, 14 August 2009 (UTC)[reply]

I'm already doing that, for example réglement vs. règlement. Either it can't tell the difference, or it assumes I want all of these (also reglement, Reglement, REGLEMENT, etc.) which I don't. Mglovesfun (talk) 14:17, 14 August 2009 (UTC)[reply]

You must have a different Google than me. It works perfectly fine here. -- Prince Kassad 14:28, 14 August 2009 (UTC)[reply]

Try a plus sign before the word: +réglement. --Vahagn Petrosyan 16:23, 14 August 2009 (UTC)[reply]

When I follow google:"réglement" and google:"règlement", I get two different result sets. --Dan Polansky 16:53, 14 August 2009 (UTC)[reply]

Years ago Google used to ignore most diacritics in the USA, but not elsewhere (e.g., in Canada, where French is an official language). Perhaps it still works that way. —Michael Z. 2009-08-16 03:18 z
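
As an illustration of why réglement, règlement and reglement can end up in the same result set, here is a minimal Python sketch of diacritic folding, one common way a search index can treat accented and unaccented spellings as equal. It is not a description of how Google actually works; the function names are made up for this example.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_matches(query, candidates):
    """Keep only candidates identical to the query, accents and case included."""
    target = unicodedata.normalize("NFC", query)
    return [c for c in candidates if unicodedata.normalize("NFC", c) == target]


def folded_matches(query, candidates):
    """Keep candidates that equal the query once accents and case are ignored."""
    key = fold_diacritics(query).casefold()
    return [c for c in candidates if fold_diacritics(c).casefold() == key]


words = ["réglement", "règlement", "reglement", "Reglement", "REGLEMENT", "règle"]
print(exact_matches("réglement", words))
print(folded_matches("réglement", words))
```

With folding, the query matches every spelling variant in the list except règle; with exact matching, only the form typed is returned, which is the behaviour wanted when verifying a specific spelling.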

Transliteration in Template:l

I will add support for tr= to Template:l so that people don't add the transliteration in brackets after the word, (and would have the bonus that if we get Extension:Transliterator installed it can automatically add them). This is working on the assumption that we want transliterations beside links, which I personally think are good, but no doubt there's a whole 'nother argument to be had about that. The only hitch is that it would use the same space as the current gloss= parameter (in brackets after the word). At the moment, the gloss= parameter is used by American Sign Language to indicate the English spelling of the word, and by some foreign language definitions to point to the correct English definition. (see وصل and 1@TipFinger-PalmBack-1@CenterChesthigh-FingerUp 1@BaseThumb-PalmBack-1@CenterChesthigh-FingerUp). As neither language need transliteration this shouldn't be a problem, but it is an issue that may need resolving at some point. Conrad.Irwin 22:17, 14 August 2009 (UTC)[reply]

Template:term handles that issue well, I think.—msh210℠ 02:56, 17 August 2009 (UTC)[reply]

{l} was supposed to be a simple template used for the listings in appendices, ====X terms==== and such, where there would be no need for transliteration and glosses, essentially simply a shorthand for typing the full language name and the linked term twice. Now it appears that the transliterations are needed almost everywhere in case of obscure scripts, and esp. in case of obscure fonts supporting obscure scripts, and people like to add glosses in ====Related terms==== and similar, and with this new functionality this template would simply become a clone of {{term}}. Perhaps it should simply redirect to {term}? --Ivan Štambuk 03:46, 17 August 2009 (UTC)[reply]

But {term} italicizes.—msh210℠ 21:20, 17 August 2009 (UTC)[reply]
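For readers following the tr=/gloss= question in this thread, the layout conflict can be illustrated with a rough sketch. This is plain Python, not template code, and it is not how Template:l or Template:term is actually implemented; the function name render_link and its output format are hypothetical, chosen only to show one way a transliteration and a gloss might share the same bracketed slot after the linked word.

```python
def render_link(term, tr=None, gloss=None):
    """Hypothetical rendering of a linked term.

    Assumption for illustration only: both the transliteration (tr) and
    the gloss are placed in a single pair of brackets after the link,
    transliteration first, gloss in double quotes.
    """
    rendered = f"[[{term}]]"
    extras = []
    if tr:
        extras.append(tr)
    if gloss:
        extras.append(f'"{gloss}"')
    if extras:
        # Both optional pieces share the one bracketed slot after the word.
        rendered += " (" + ", ".join(extras) + ")"
    return rendered


print(render_link("слово", tr="slovo"))
print(render_link("слово", tr="slovo", gloss="word"))
print(render_link("water"))
```

Under this assumed layout the two parameters coexist by being joined inside the same brackets, which is only one possible resolution of the hitch described above.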

Wiktionary logo 2009 refresh voting

Hello Wiktionarians! The Wiktionary logo 2009 refresh has been going on for a while now, and as the logo submissions have been made, the voting will start soon. Please visit meta:Wiktionary/logo/refresh#Voting for a discussion about the voting, and participate when the voting actually starts at meta:Wiktionary/logo/refresh/voting. One of the reasons why the old logo vote failed to reach consensus was that too few people from Wiktionary joined the vote, so please consider helping the project get a universal high-quality logo (even if you prefer the current logo, you're allowed to vote for it). Thanks! Wyvernoid 11:56, 15 August 2009 (UTC)[reply]

Linking headwords

WT:ELE#The entry core doesn't directly address treatment of the headword, except to say “For uninflected words it is enough to repeat the entry word in boldface.”

Many editors link parts of the headword, although this is not mentioned in the guidelines. Some infer that this serves as an etymology, to the extent that I recently had words with an experienced editor who was removing information from etymology sections in part or in whole, because he felt that headword links already communicated the information.

Problems with linked headwords:

Links in the headword are not etymological, except by coincidence.
- Example: farmer's sausage doesn't come from farmer's + sausage but is a borrowing from something like bauerwurst (“farmer's sausage”)—others: art for art's sake, ball lightning, flea market.
- In thousands of entries, compounds are linked but not etymologically—for example, we see links like Spanish Water Dog (now improved), but the etymology is Spanish + water dog.
The links hide subtleties: in the above example, the words Water and Dog, or Water Dog link to water and dog or water dog, hiding the case distinction from the reader. It's bad practice and confusing for the reader to have link targets differ from link text (especially when the difference is as subtle as capitalization, and most especially when the capitalization is significant, as in Wiktionary).
The links hide everything: you can only see where they start and stop one word at a time, while you mouse over the term and watch carefully for the underlines appearing. In trompe l'oeil#Italian the link boundary falls within a word.
Coloured links turn the most important part of the entry into a multicoloured collage of blue link, purple visited link, and black unlinked text. See St. Elmo's fire (in my browser: purple St., blue Elmo, black 's, purple fire), trompe l'oeil (two variations in the entry), etc.
If the links actually did represent etymology, there's no way for the reader to know this, and having a standard “Etymology” section discourages the reader from guessing that this may be the case.

The headword links water down the visual impact of the headword, give uninterpretable and inconsistent information, and link terms for unknown reasons, which should either be explicitly mentioned and linked in another part of the entry, or are linked for no reason.

I'd like to propose we add a line to ELE saying not to link the headword. Alternately, we must explain to editors and readers how headwords should be linked, and exactly what the links represent. —Michael Z. 2009-08-16 04:18 z

I don't agree, it certainly (well, not often) doesn't do any harm. In some cases an etymology is needed as well, if it's a calque or a derivation of something. I don't think saying spring clean necessarily suggests that this is the etymology, just that you can click on these links for further information. Sometimes autogenerated titles need to be corrected, like Statue of Liberty needs lower case (statue of liberty) because the two nouns are only capitalized as part of proper nouns. Mglovesfun (talk) 10:45, 18 August 2009 (UTC)[reply]

At the very least we should state clearly in ELE that such links are not an etymology, and in all cases an etymology is needed as well. Is the reader supposed to guess whether these links are etymological or not in each entry? What prevents readers and editors from assuming that these are always etymological?

But what exactly is the point of these links? What do you mean “further information?” Such unfocussed and inconsistently-used elements water down the functionality of entries. Pointless design elements do real harm (let's add a dozen other things which “do no harm (well, not often)” and see what a great experience Wiktionary becomes).

No great design ever included elements just because they “do no harm.” Leaving these links in brings down Wiktionary. —Michael Z. 2009-08-18 17:41 z

Wiktionary:Picture Dictionary

A new user started off this page with no discussion whatsoever. Looks like it has some merit to me. Mglovesfun (talk) 11:28, 17 August 2009 (UTC)[reply]

Yes. I think that these pictures showing the names of the different parts of a washing machine, etc., should be integrated into Wikisaurus. A thesaurus is more useful when illustrated, and should include words related to the thing (the main entry focusing more on words related to the word, such as derived words, etc.). Lmaltier 16:58, 18 August 2009 (UTC)[reply]

See eighteen-wheeler for a thought-starter. DCDuring TALK 16:26, 22 August 2009 (UTC)[reply]

Drop encyclopedic categories

Let's get rid of encyclopedic categories, and put some energy into cross-linking with Wikipedia.

Of course we'd keep lexicographical categories like Category:Surnames, Category:Place names, Category:Exonyms, and technical vocabulary categories like Category:Geography and Category:Onomastics.

But let's get rid of encyclopedic categories for things, because our entries represent words and names, not things. Wikipedia has articles about things, and categorizes things, and if we spend 1,000 editor-years working on our redundant copy of their categories, they will have spent 50,000 improving the original category tree.

Things like Category:Countries should be renamed Category:Names of countries, or deleted. We should make an effort to add sister-project boxes to all of the corresponding categories in both Wiktionary and Wikipedia, to make it easy and clear for the reader to jump to the appropriate category in the appropriate project.

Who's with me? —Michael Z. 2009-08-19 04:08 z

I'm with you on the fact that our entries represent words (names should be included only when they are words, in my opinion), not things.

I'm with you on the fact that we should not have encyclopedic categories such as Sparidae (or all category names understandable only by specialists). But categories such as Fish (i.e. categories using very common names) are useful. If you don't remember the name of some fish in Japanese, and you cannot enter Japanese characters easily, a category such as fish in Japanese is very useful.

I'm not with you with the renaming of categories: while you are right that a Countries category groups country names, this is always implicit here, so why do you want to make category names longer, more complex? The only difficult case for category naming is for separating e.g. names of towns (e.g. London) and words carrying the sense of town (e.g. megalopolis) or, generally speaking, separating "types" from "-nyms" (common nouns from proper nouns). Such cases should be discussed. Lmaltier 06:03, 19 August 2009 (UTC)[reply]

Hm—fish in Japanese is a good counter-example. One resolution would be for the Wiktionary link-box in Wikipedia to let you choose the language, but let's not get into a project like that now. I'll give this some thought.

Regarding naming: we'd like it to always be understood, but I see both newbies from Wikipedia and veteran Wikilexicographers having real trouble understanding the distinction. As a result, our categories are a mish-mash of technical subject vocabularies, thematically grouped concepts, and grammatical categories of words. Why on earth do we have separate Category:Names and Category:proper nouns? Why is it impossible to rename Category:US to Category:American English so it matches every other kind of Category:Regional English? The categories' names have to clearly define their nature, to help keep the hierarchies separate. For starters, let's give all of our 'nym categories explicit 'nym names. —Michael Z. 2009-08-19 07:09 z

I disagree with dropping topic categories, but maybe I do not quite understand the classification under your proposal. What is the hyponymy structure of the terms for categories that you are using? To clarify, the top part of the category taxonomy as I understand it:

category
- hyponym
  - term category
    - example
      - Category:Place names
      - Category:Surnames
  - topic category
    - example
      - Category:Physics
      - Category:Countries

From what you have written and I have understood, you use the following terms to classify categories (but am I correct?):

category
- hyponym
  - lexicographical category
    - example
      - Category:Surnames
      - Category:Exonyms
  - encyclopedic category
    - example
      - Category:Countries
      - Category:Rivers
      - Category:Trees (?)
      - Category:Vehicles (?)
      - Category:Boats (?)
      - Category:Sound (?)
      - Category:Movement (?)
      - Category:Communication (?)
      - Category:Language (?)
  - technical vocabulary category
    - example
      - Category:Geography - is the category allowed to include "river", given "river" is not a technical term?
      - Category:Onomastics
      - Category:Mathematics
      - Category:Physics
      - Category:Philosophy

Until you document your classification scheme with a broader list of examples, I have a hard time understanding the impact of your proposal.

Admittedly, I am inclined to oppose your proposal regardless.

--Dan Polansky 07:23, 19 August 2009 (UTC)[reply]

That looks about right. Geography illustrates the problem: it is clearly a specialized subject field and not a classification of referents, but our category naming is so unclear and inconsistent that anything goes: the category is full of entries and subcategories that relate to “geography” in three or four ways. The meaning of categories is further watered down because the same categories are applied with restricted-usage labels and [[Category:xxx]] tags, so technical vocabulary is lumped together with thematic categorization (furthermore, labels are widely misused, and unjustified labels like {{bird}} promote misunderstanding).

We really need to do something to improve this embarrassing state. —Michael Z. 2009-08-19 08:57 z

I agree that categories for referents are for WP, not us. Fish in Japanese can be found (in theory) by looking up the word fish in Japanese (if there is one; I'm sure some categories of things in some languages have no name for the type but do for the individuals) and looking at its listed hyponyms or Wikisaurus entry. I think topical categories should go, though categories indicating fields' jargon should not.—msh210℠ 20:14, 20 August 2009 (UTC)[reply]

"The following is a list of Estonian words related to geography." I think that's pretty clear. I love the topical categories. Love. — [ R·I·C ] opiaterein — 22:44, 20 August 2009 (UTC)[reply]

I love the topical categories as well. They allow me to find missing entries on a theme, and I've seen quite a number of other editors make use of them for that. It's not possible to determine accurately whether non-editors are making as much use of them, since such activity doesn't show up the way that new entries and edits do. Consider Category:Hair, which led to the creation of many entries on names of hairstyles, facial hair patterns etc., and not only in English. The categories have a use also in finding a word that you know is in a particular field or related to an idea, but you can't figure out how to express it. Additionally, I learned words related to hair that I never knew, all because contributors categorized articles that I had missed in my ignorance. There, I have presented three good reasons to keep these categories. I haven't seen an actual reason presented for getting rid of them. Some problems in our current categories have been pointed out, but that's reason to fix them, not to eliminate them. --EncycloPetey 02:53, 25 August 2009 (UTC)[reply]

Then how about separating technical vocabulary categories from encyclopedic/thematic categories? Terms marked with regional and usage context labels are sorted into categories representing dialects and specific usages. We also have a rich set of restricted-usage labels which represent a very specific set of lexical information (applied by Topical context labels), but the terms so marked are mixed in with plain category tags applied to thousands of other entries. Perhaps we can use a set of prefixes for separate hierarchies? —Michael Z. 2009-08-25 06:29 z

Clarification of WT:CFI#Names of specific entities

This comes from Wiktionary:Requests for deletion#Uncle Scrooge, and many other discussions. I'd like to amend the guideline as follows. Please suggest improvements, and I'll start a vote shortly. —Michael Z. 2009-08-19 04:47 z

A name should be included if it is used attributively, with a widely understood meaning, independent of its referent. For example: New York is included because “New York” is used attributively in phrases like “New York delicatessen”, to ~~describe~~ refer to a particular sort of delicatessen. A person or place name that is not used attributively (and that is not a word that otherwise should be included) should not be included. Lower Hampton, Sears Tower, and George Walker Bush thus should not be included. Similarly, whilst Jefferson (an attested family name word with an etymology that Wiktionary can discuss) and Jeffersonian (an adjective) should be included, Thomas Jefferson (which isn’t used attributively) should not.

Started a vote at Wiktionary:Votes/pl-2009-08/Clarify names of specific entities. —Michael Z. 2009-08-27 04:53 z

Since there's no discussion, I'll start the vote now. —Michael Z. 2009-08-30 17:23 z

Rename Category:US Category:American English

This was discussed to death before, and I relented because one editor had a strong objection. In retrospect, I should have taken it to a vote. So here's my justification one more time.

All categories in Category:Regional English are named for the dialects they represent. The very important dialect spoken in the USA is called American English by linguists and lexicographers. Nobody calls it US English, United States English, or “US”. Keeping the wrong name for this very important category is confusing for readers and editors, and just plain embarrassing.

Throw in your two bits, and I'll start a vote. —Michael Z. 2009-08-19 09:05 z

I support. I’d expect to find things like cowboy or cola in Category:US and not what is there currently. --Vahagn Petrosyan 10:59, 19 August 2009 (UTC)[reply]

The word English is clearly redundant in all of the English-language context tags. I'd favor eliminating it for all of the sense-level contexts in the interests of increasing the space available on a single screen for more useful non-redundant content.

I suppose the other extravagant wastes of visual space make any one waste of space seem like a trivial matter, especially compared to professional embarrassment.

I think we can take comfort from the likelihood that we would only be driving away users by making our entries more and more technically correct and complete and less and less useful to folks looking for definitions. We clearly already have more than enough contributors. If we could just reduce the demands on contributors' time from patrolling, feedback, answering silly questions, correcting entries that don't fit our unwritten framework, the extra time available should enable us to make great strides toward our eventual goal of […] (What was that goal again?) DCDuring TALK 12:25, 19 August 2009 (UTC)[reply]

The labels don't include the word English—the text is like US, Canadian, British, Irish, Newfoundland, or Hartlepool. They are usable for different languages, e.g., {{Canada|lang=fr}} puts an entry into Category:Canadian French. Only a few have language-specific text, like African American Vernacular and Western Armenian.

But the strings of labels do get long, and I wouldn't mind abbreviating them, as other dictionaries do. We won't run out of paper, but it's awkward to have labels longer than definitions, and we could be terse and expressive with US, Cdn, Brit, Ir, Nfld.

This proposal is just to make the category name reflect its contents. By the way, the corresponding encyclopedic category is Category:United States of America. —Michael Z. 2009-08-19 17:59 z

When re-implementing {{context}}, the intent was/is to have {{US}} and {{UK}} be regional context labels, just like others, defaulting to English. The labels are thus US and UK (as now), and the cats Category:US English and Category:UK English, consistently with all other region/language combinations.

As to "American English": it may be the usual name, but is an illiteracy. (Do you know how effing amusing it is for most of the world to listen to the US English usage of the word "American"? People in the US really do think the US is the only country in the Americas. E.g. PBS Newshour: "Venezuela is America's third-largest source of imported oil.", which sort of silly nonsense we see regularly. In the US, of course, it sounds just fine: Venezuela is some alien, not-really-existing place. It is especially hilarious when reading the US press referring to "Mexican immigration to America". ;-) In any case, we want to keep the tag "US", and it is best for the cat name to match the tag. Robert Ullmann 07:16, 20 August 2009 (UTC)[reply]

I agree with Mzajac, we should use the usual names (but full names, not abbreviations), even when we disapprove of them. But is American English supposed to cover both Canada and the US? (It's not clear to me.) If it covers both, why not create 3 categories (American, US and Canada)? If the category is reserved for words used only in the US, and not used in Canada, US English might be less ambiguous.

I would not add cowboy and cola to this category: these words might originate from the US, but they are used everywhere. Lmaltier 21:20, 20 August 2009 (UTC)[reply]

<rant> "American English" is not an illiteracy. It is a standard term used by many professionals in linguistics. Just because many (generally Central/South Americans) get riled up when people from the USA use "American" in a different sense than others, does not make that usage "wrong". Remember, we are primarily descriptive here. Let's not have proposals to adopt neologisms such as "USan". </rant> --Bequw → ¢ • τ 02:03, 22 August 2009 (UTC)[reply]

Yes, thank you, Beq. The two main branches of English are British English (formerly called “English” in the Empire) and American English, and they existed before the Thirteen Colonies got uppity about tea and split off from British North America. It may not be PC to say so, but although Canadian English constitutes the third English orthography, it is a (distinct) variety of American English (in the last few decades, North American English has been coined in recognition that a Canadian English dialect exists). For precision, Wiktionary usage restricts American English to mean the USA, and North American English (in Category:North American English) to encompass that with Canadian and Newfoundland English (and considers Newfoundland English to be Canadian, even though we are a historical dictionary—Nfld. was separate until 1949).

“US English” and “UK English” are not used in linguistics and lexicography except by mistake. —Michael Z. 2009-08-23 23:19 z

(You may find the attributive use of US in combination—for example, US English speakers, meaning “English-speakers of the US.” This is different.) —Michael Z. 2009-08-26 00:57 z

Started a vote for further discussion at Wiktionary:Votes/2009-08/Rename Category:US Category:American English. —Michael Z. 2009-08-26 00:57 z

No discussion, so I'm starting the vote. —Michael Z. 2009-08-30 17:23 z

political parties

There are loads of political parties that I think should be deleted, but since they haven't been deleted yet, maybe they meet CFI and I'm missing something. Here are the ones I've discovered so far: Republican Party, Liberal Democrats, Labour Party, Democratic Party. Green Party seems to have merit as it seems like every country has one (should this be a common noun maybe?) but for the others, I just don't see it. Any chance of having a fairly complete list of these to see which ones we want to nominate, if any? Mglovesfun (talk) 09:56, 20 August 2009 (UTC)[reply]

what'boutdeletor phenotype? [o-did imentionithoroughlyHATE thoseINCREDIBLYSUBLIMINAL REMARKS Of quite afew Engl-nativs here-isthattheirPREROGATIV!?!butpl cmylilrant below.--史凡>voice-MSN/skypeme!RSI>typin=hard! 19:54, 20 August 2009 (UTC)[reply]

Wiktionary:Spelling variants in entry names

Input unbelievably welcome; last edits seem to be more than a year ago. Mglovesfun (talk) 19:32, 20 August 2009 (UTC)[reply]

wotifeelgoeswrong

[movd frm rfd]

i'dgo4wt tobe abroaddict>a.bit of grammar[ala Swan,whichsome entrys ractualy~dict.styl],gazeteer/geo,bitencycl.,phrasebooki/SHORTish entrysREFERING2wp,wm-books,etc>userFRIENDLY,klik-efficient[here:guidance2find wotevastuf:)[thoputinboundaryshard,irealiz
wp entrys<therisETYL2that,when1.used,changin'use praps-thatdef.sthWE'dbedoin~placenames[who rtheynamd after,lit.quotesundundund--史凡>voice-MSN/skypeme!RSI>typin=hard! 04:18, 20 August 2009 (UTC)[reply]

"Wikipedia is an encyclopedia, and they deal with topics in their articles. We are a dictionary, and deal with words in our entries. The principles of organizing an encyclopedia do not apply here because our goals are quite different." -

hate/likeit:there isOVERLAP--likalthose discusions here'bout saytheDEF OFA WORD[lexicografik1]-itookmejust2weeks dealin intesivly w/apliedlinguistics waybak i/oz2c thatMOSTofthose holy/bigwordHOTOPICS/TECHN.TERMSrpoorly defind>wotsthepoint inalthefiting??encycl/wp do alilbitof linguistiks[ipa,etyl],weneed2HELP'EMw/that[styloid-ipa?spica-etyl?let alone spica splint--have funsearchin i/wp..]>INCLUDING WP entrys [w/justLILdef-flesh,that indeed4wp],doinOURJOB w/etyl,ipa etc andsoHELPourusers.[imtrulyfedupw/althese mostlynarowsens def getinpalmdofasTHEdef[ex.:WOT IS A DICTIONARY,answerREALYNOTASTRAIGHT4WARDasu regulars'dlik2makebeliv,ncomin downw/big[policy usay?perdef>{punintended ;)}ALWAYS IN FLUX]stiks isntv.RESPECTFULeither],aweaknes esp.ofalthoseSOFTsciences as sociology,psychology etc imo[lookatsuch wp-entrys,howlers!!],nlet alonethe impresion itmaks uponanewby]
nmostofthose"dict.constraints"had2do w/SPACElimits["so we'lmakesomARBITRARYCRITERIAup"]-why esp.here onaproject ofsuchunprecedenteddimension/breadthppl rso"closed"2wotburgeonin'technologys cando4them-itleavesmebafled,butrealy..:(
nthisimhoPERVERS/DESTRUCTIVfocus on"shalwe deletethisentry,yea?!{hyper-tone intended.}"[mywatchp.isnowINUNDATEDbythem--isCREATINstuf realysoborin??]-rwe here2BUILDUPor2smashea others efortsunderthepretensofGARDIN'THEGRAIL--itsaWORKINPROGRES,4krist'sake
thenwehavthe bl/whitepushing[mywayornon],thecomnsensmidlwayislarglyuntrodn.
asstephensays-me2istartfeelinliknotmakin'entrys/substantialchangesenymore-wot'dbethepoint-nSMdesire2cthenextrfdstupiditythatlmanage2puldfromthewal??
iobviesly'vninputprob-wotsudestructivpplexcuse?ichardlyanysignofwillines2sitdown, givthingsathoughtn consequentlyformulate'em outinaclearncomprensivconstructivway-nifuvcomitmnt2get2urloftygoals,thatswotitltake,hardsciensinstedof kaffeeklatsch[findthataharshstatmnt?sitdownntake alongdeeplookatwoturdoin-idothatregularly,nfindthathelpfl.
wannadeletethiscozoftypin?go-on,nmakeafurthercharadeofurselfs!![iwenton2breakmyscaphoid,just4goodmeasure.]
thesametrustedcontributorskeepflipflopin,sayinthis,sayinthat--'gain:HOWSTHISGONNAWORK??
instedofaquiki rv,2hard2takthetim2weighthemeritnwhakitinbettershape?
theonliconsistentlinice'nthoughtflpersnhereinmyexp.isvisviva[welkombak!]--speculation:mitethe isuesrisdcontribute2sucha users absens??
thisisamegarant,sure--butwhohasbeenbrewinitfromthefirstfoot iputinsidehere???'vathought.[ivgotmorecontributionsnowalredy,mainlyonactual dict-p.,thansomsysops,w/outmuchguidans butWITHunclearguidelinz a3yo'dmostlydoABETTERJOB'ritin' jee..]
furthermore,ithoroughlyHATE thoseINCREDIBLYSUBLIMINAL REMARKS Of quite afew E-nativs here-isthattheirPREROGATIV!?!
wot1gets here:RUDEnHARSHNES,wot1NOTgets:TOLERANS,EVENHANDEDnOPENMINDEDNES
reRUnwritin2WMFboard:ivbeenontheverge3TIMS,ofdointhesame[DISCRIMINATION ISUES]1s'gain:aintnobubl here,paradise4pplkeenonDESTROYIN ES OTHERS TOYS[abitofAPRECIATION4 1s eforts?soNAIV huh--plSPENDTIMEmakin'GOOGFAITHEDITS,treat'emassuch!![ex:"recently we'd aneditor.."AINTREEdebacl,anotherNICEWELCOM!![dearth ofeditors,howkum eh?
urCMNref.manJUSTCARZ'BOUT HISPRIVAT TOYS-butCOMPLAINShethe only1adin'entrys[8000WOW-imSOimpresd,maksitUSEFL,realy.],havin'made itin2aTYPIN'SLOG-but askin4anINPUTMASK,MAKIN'HISPOINTBOUT TR-ISUES-ohno,hezurINVISIBLEXPERT,pulindastringfromhisrelmofshadz.
Posta q boutengl[letalochin.]>75%NOanswer-2busyDELETIN'STUFey,funnyglovz4most,ni'dntcalthisPATHETIC?!?
butdunspendtimeREADIN'THIS,iunderstandnowhere urtimispent.
usysopSILLY,IMATURnNOTVKNOLEDGBLppl[nothatimup4that,alredycozofmyhands]-dunufeelalotoftheprobshere ROFUROWNMAKING??[orurself-proclaimd'inclusionist'1whospecializesi/SPEEDYDELETIOONS4FORMAT-ISUES [nocontradiction?helpful?],not'avin'a clueboutmostforeignlanguagesn'even'avin'LIMITATIONSI/ENGLISH["wotsthesubjectofabiographynamd"-no ref-booksI/YOUROWNCOUNTRY??orDELETEDthepertainin'entry alredy?thegruelin'incOMPETENShere,je-sus!]-icanTRY2be niceANDthinkamapretydecentchap,butaSPADISASPADIsAS-P-A-D-E,sory4thatfact.
thathisranthasurvivd>12h w/out rv,isonliCOZICONTRIBUTED-wotimentiondheretho isclearFROMSTARTERShere-sowhySILENCE[By rv ofcours]newbys whotaketheirtimeTELIN'USO??howDAREupplcomplainboutbeinSHORTHANDEDwhenaludo=CHUCKIN'newpplOUT??[theSELFRIGHTEOUSNES'doffendGODiftheris1:(
whywasthentry"hypothenar"NOTthere-cozENCYCLOPEDICword???althoseARCHAIConceptspplike2bash eaotherw/here,makesmeflee2CCEDICT['dntwe importheirwordsbytheway soourchin.sectionbekumzFUNCTIONAL??NDIDIcreatesth2day-NO,isuccumbd2theNEGATIVFORCES HERE!!!!!!!!!!!!!!!!!!!!!
if som1 taksmycontrbutions'nmaks'em beter,imthe'apiestman i/theworld;rude/plain rm/del justCHEESEMEOFF,MEGA,asonlyLAZYINCONSIDERATEppl'ddoso.[go'n'count rv done byme vs. ACTUALIMPROVMENTS,exemplaryindeed--ori'lASKthepersonconcernd--stilthinkin'uguyzhavANYhighmoralground???
'dufeelaMENTALITYnATITUDCHANG'dbe considerd??

>igotridofmyeg, indisgust--史凡>voice-MSN/skypeme!RSI>typin=hard! 15:09, 20 August 2009 (UTC)[reply]

What the fuck is all this gibberish!!!!!

Please do an audio recording of your message and upload it to commons. Then, people might reply. -- Prince Kassad 01:55, 22 August 2009 (UTC)[reply]

isugestedAUDIOwaybak[butwasrebuted]-how2dosuch pl?--史凡>voice-MSN/skypeme!RSI>typin=hard! 02:32, 22 August 2009 (UTC)[reply]

"Please make a proposal to amend WT:CFI so that we can apply our resources to more entries. I know that we have already made all of our existing entries as good as we know how to. We need most especially to add entries that other dictionaries omit. It is particularly important that we make sure that language learners never have to work through the meaning of a phrase using entries for the constituent words. Better we should lexicalize everything. Let a billion collocations bloom."

Please make a proposal WHENI CANINPUT+constr.MILIEU/MIDSTto amend WT:CFI so that we can apply our resources to more entries. I know that we have already made all of our existing entries as good as we know how to.UR2BUSY'DELETIN'4THAT2HAPEN We need most especially to add entries that other dictionaries omit.INDEED-MYSTREETNAME:IWANT ETYL,OBSCURSPORTSTERM-IWANT PLAINENGLIS EXPL ETC. It is particularly important that we make sure that language learners never have to work through [DICT=GOLDSTANDED,NEEDS ENTRYS]the meaning of a phrase U[THEABOVPOSTER]HAVNO DEEPLEARN/TEACHING OF2NDLANGUAGE EXPERIENS,N'HENCE LAKPERSPECTIV ,AOTH BOUTHE 'CONSTANTGUESIN'N'WORKIN'OUTREQUIRD INTHATTPROCES.using entries for the constituent words.LIKE GOIN'THRU THE28SENSESOF'OFF' JUST COS SB POSTEDAN INCOMPEHENSIBLTECHN.DEF OFA CRICKETERM-NOTX[MINUTS=OK,HRS NOT4GETTIN ANEWCOLOCUTION,SOI STILDONTNO]. Better we should lexicalize everything.YES!! Let a billion collocations=NOTORIOUSTUMBLIN'BLOK4LEARNERS bloom. MYCAPS---史凡>voice-MSN/skypeme!RSI>typin=hard! 03:09, 22 August 2009 (UTC)[reply]

itsnotPOLICYppl dislike,butur0-TOLERANS~POLICESTATE[butCOMNSENS/PSYCHOLOGYSKIL1realy'lNOTfindhere.
uppl rv4TECHNICALITYS,nthansosurprisednew-editorsDONTLIKEIT-DOUPPL LIVONTHE MOONORWOT???--史凡>voice-MSN/skypeme!RSI>typin=hard! 10:49, 22 August 2009 (UTC)[reply]

American Sign Language

For the past half-year, I've been trying to develop a writing system for American Sign Language based on the Roman alphabet. I've made some progress -- enough, at least in my estimation, to warrant adding some of my results as entries in Wiktionary. However, my results are still limited in extent, only preliminary, and as yet without sanction from the ASL community. Nonetheless, despite these limitations, I really do believe this method I've developed could potentially be a foundation for a writing system that could benefit ASL research and deaf culture. And with a lot of input, know-how, and initiative from the Wiktionary and ASL communities, I think this project could be a success. So, because I'm new to Wiktionary and not familiar with its policies or capacity, I've started this new discussion topic to determine whether members of Wiktionary might believe my data and initial results are appropriate to the mission of this wiki site and, if so, how an ASL section with words represented in this manner might best be implemented. However, although I hope my proposal gets a lot of sympathetic feedback, there are at least three potentially complicating factors that should be considered first: (1) Other writing or transcription systems have been attempted in the past, although only one of them, Valerie Sutton's Sign Writing, in my inexpert opinion has any following among present ASL signers -- and even its popularity seems minimal. (Anyone interested in learning more about Sign Writing should see its entry in Wikipedia.) Now, as you might expect, I think the method I've tried to start is, on balance, more helpful than Sign Writing -- mainly because Sign Writing's mode is quasi-pictorial and thus incompatible with most people's communication software. Even so, I would like to sincerely advocate for its adjunct inclusion in any ASL section in Wiktionary, if technically feasible, because I think its strengths and weaknesses complement those of my proposed method.
(2) ASL communication is intimately tied to the English language, so much so perhaps that an ASL section in Wiktionary strictly separate from English might not be optimal. And (3) just as letters in any writing system are associated with certain sounds native to a language, letters in the system I'm proposing are linked to phonemic features native to ASL, and so, if possible, a page in Wiktionary dedicated just to these phonemic correspondences might be helpful. And, well, I hope after reading all the above, people could reply, give their opinion, and offer any constructive advice they might have -- I will be very appreciative; thanks. 66.213.98.17 20:56, 20 August 2009 (UTC)[reply]

Sorry to inform you that you've duplicated fairly recent efforts (especially by User:Rodasmith though also by User:Positivesigner and others) to develop a way to include sign-language entries into Wiktionary. See WT:ASL and Index:American Sign Language.—msh210℠ 20:59, 20 August 2009 (UTC)[reply]

Thank you for your response. I wasn't aware that an ASL section was already on Wiktionary, although really I shouldn't be surprised. Still (and if this is the appropriate forum to be asking), what do others think about a community-developed writing system based on the Roman alphabet? I know Sign Writing has strong proponents, but I've also heard some criticism as well. Would perhaps ASL signers in general prefer a writing system more like the standard Western European type? decimus 21:13, 20 August 2009 (UTC)[reply]

Our system (see the links above) uses the Latin alphabet already. If you wish to change that system, I recommend you make your recommendation at the more specialized page Wiktionary talk:About sign languages, since I suspect that most people watching the Beer parlour don't care how we do it. But note that we have a good number of entries and translations in the existing system already, and a lot of work has gone into developing it, so if your recommendation is rejected, that doesn't reflect badly on the recommendation necessarily (though I haven't seen it yet): perhaps it's merely not that much better as to warrant changing everything around that's already in place.—msh210℠ 22:56, 20 August 2009 (UTC)[reply]

Misbehaving audio file....

I am using Firefox 3.5.2 on Windows XP SP2. I tried to play the audio file on the page lion. When I click on it, it takes me to a new page where Firefox opens the audio file and plays "lion", but when I replay the file from there it says "li-lion". There is some error here.

I downloaded the Ogg file to my computer and played it; it played fine, so there is no problem with the file itself. I haven't tested any other audio file, and I don't know where the problem is. If you experience the same problem, please join me in reporting it. — This comment was unsigned.

This is almost certainly an issue of browser configuration. When an entry includes an audio file, it just offers a standard hyperlink to that audio file, just like a link to any other page or resource. If you can't play it properly, or it opens new windows etc., it's probably your setup. Equinox ◑ 23:41, 21 August 2009 (UTC)[reply]

Inclusion of SOPs for translations — proposal

I'd like to propose that we allow English SOPs that are found in major English-to-X dictionaries for multiple values of X. My reasons are severalfold:

Stephen G. Brown (talk • contribs), who seems to be an experienced translator, seems to think that many SOPs are useful for translators.
One presumes that the compilers of these dictionaries must also think such entries are useful.
Users frequently add such entries. I think these users are nearly always misguided (sorry, users!), thinking that something is an idiom when in fact they've simply failed to look up the component words — indeed, we sometimes get comments at RFD that basically take the form "Keep, because I've failed to look up the component words, so apparently we need all possible SOPs that ever use the senses I'm not familiar with" — but it's probably more welcoming to convert such entries into translation-hangers than to delete them. (DCDuring (talk • contribs) has often observed that we can use such entries to improve the entries for the component parts, and that's true, but I don't think users get warm fuzzies when we redlinkify their entries, even if their contribution does end up helping out in this way.)
Most major dictionaries don't take our approach of splitting out idioms into their own entries; rather, most of them treat these idioms in the entry for their most important component word. This has problems of its own — "most important component word" is often subjective — but it renders irrelevant the often-blurry distinction between SOPs and idioms: no matter which one it is, you can look it up at the most important component word. Our approach means that readers have to try multiple entries to determine if we consider it SOP or idiomatic; and worse yet, if we consider it SOP, then we usually do very little to help them find the relevant senses of the component words. (There are exceptions — someone looking up "have a cow", for example, would have relatively little difficulty finding the right sense of cow — but I wonder if, now that I point it out, someone will "fix" it to have the normal, unhelpful presentation. And even [[cow]] isn't as good as what you'll see in many other dictionaries, where the salient phrase would appear in bold and/or italics at the start of the sense line, in the style of a sub-headword.)

I therefore suggest that we give minimal definitions:

The brother of one's mother; see maternal, uncle.

and dispense with etymologies, usage notes, related terms, etc. (unless there's a specific reason to have them — in which case it's probably not actually SOP), and encourage the addition of translations.
Questions? Comments? Concerns? Death threats? —Ruakh_TALK 00:33, 22 August 2009 (UTC)[reply]

Are there to be any criteria for including/excluding these? At least one translation at time of creation? Within a month? Within a month after challenge? Does the translation have to be verifiable? How?

One thing I have often wondered about is the appropriateness of the wording of translations, especially the use of awkward English, obsolete words, or mixed-register phrasing. Headwords with these ills will be more visible to search engines in this scenario. Will we have some tags to mark these? We will probably have many more critics of translations than we will have folks to correct the translations, but the tags might be useful to prevent translators from wasting time on awkward English expressions. DCDuring TALK 01:43, 22 August 2009 (UTC)[reply]

YES!=1.step2moreUSERFRIENDLINES--史凡>voice-MSN/skypeme!RSI>typin=hard! 02:14, 22 August 2009 (UTC)[reply]

Could we perhaps use this as the long-sought inclusion criterion for Phrasebook? I would replace "major dictionaries" with "at least [3] dictionaries published in print," but otherwise it seems like as good a basis for this problematic area as any. If we are going to have these entries, it would be much better to integrate them with our existing translations-only entries under a single, clear criterion. -- Visviva 05:49, 22 August 2009 (UTC)[reply]

wotstheOBSESIONw/PRINTEDICT.s-theySUKanyway!--史凡>voice-MSN/skypeme!RSI>typin=hard! 10:15, 22 August 2009 (UTC)[reply]

I like the idea. However, I would not express it this way. An English-French dictionary might mention in its bird entry: little bird: oisillon. This is not a reason for accepting little bird as a separate page, this information should be given in the translation section of the bird page. Anyway, nobody would consult a little bird page (or am I wrong?). But, if it can be considered as a verb or a set phrase, such as thrust back, vector graphics or ranked society, it makes sense to include it, even when its meaning can be deduced from the sum of its parts.

These cases are simple cases. There are more difficult cases. My English-French dictionary mentions to give an account of and provides a translation. Should give an account be considered as a set phrase and created as a page here? Maybe, but I'm not sure. Is it a set phrase?

In other words, I think that all set phrases should be accepted, even when SOP. A good reason is that, although they can be understood when heard, their existence cannot be guessed by people not knowing them, who might use slightly different words, and be misunderstood. But when something is not a word nor a set phrase, it should not be included (and, anyway, the page would not be consulted) and the translations given in the page(s) for its components. Lmaltier 06:21, 22 August 2009 (UTC)[reply]

I've always been against adding SoP terms in any language because they can be translated by a single word in another language (usually English for us). We've had some real atrocities on fr.wikt like high school student because that's collégien (or lycéen) in French, which is of course one word. Mglovesfun (talk) 10:19, 22 August 2009 (UTC)[reply]

I do think set phrases get a rough deal here. I tend to think if something is really common it should get an entry, unless it's unbelievably sum of parts. Be able to, to me anyway, is so commonly used, and since be and able have a lot of meanings, I'd support it. Mglovesfun (talk) 10:24, 22 August 2009 (UTC)[reply]

(unindent) I welcome the initiative, but do not know the impact of allowing translation targets. Quite possibly, the likely impact could be discouraging, meaning we would let too much in. The likely impact—the likely added new terms—should better be documented.

I have created Wiktionary talk:CFI#Translation target as a home location of the topic, from which I have linked to this discussion in Beer Parlour. --Dan Polansky 17:17, 22 August 2009 (UTC)[reply]

I'd be in favour of a good lexically-based proposal for accepting “set phrases,” common expressions, or whatever. I'm against making up new English terms and creating entries for them, just to support foreign-language entries—this is just multiplying the work required for a single entry. Headwords get entries, glosses and definitions go into the entries. —Michael Z. 2009-08-23 18:16 z

Re: "I'm against making up new English terms and creating entries for them, just to support foreign-language entries": Yeah, of course. No one is suggesting that. —Ruakh_TALK 19:32, 23 August 2009 (UTC)[reply]

I am sorry for the confusion that I have created by creating the section "Translation target" at the talk page of WT:CFI. Let us continue the discussion here, if you don't mind, or otherwise correct me if I'm wrong.

I am reposting the terms that possibly lie within the impact of the proposal, although people disagree on whether they do.

Examples of possible translation targets:

high school student – French: collégien or lycéen; but: "highschooler"
indoor football – Dutch: zaalvoetbal; is this actually a non-SoP name of a sport?
problem solving – German: Problemlösen; but: "problemsolving"; but-but: "problemsolving" is much less common than "problem solving"
small boat – Czech: loďka, lodička; diminutives in general; but: "boatlet"; but-but: "boatlet" is rare.
two-wheeled – Finnish: kaksipyöräinen

Candidate criteria for inclusion of translation targets, even if sketchy:

(C1) The translation target has to be included in at least three printed translation dictionaries.

--Dan Polansky 07:45, 24 August 2009 (UTC)[reply]

I think it could work. In fact I would go further and say that terms like "high-school student" are properly idiomatic, in the true sense of the word, in that they are the most natural way of expressing the concept in English. So if a term is idiomatic then I would always consider it worth including, even if it is also sum-of-parts. Ƿidsiþ 22:45, 24 August 2009 (UTC)[reply]

I disagree. Allowing high school student opens the door for middle school student, elementary school student, art school student, community college student, etc. All of these are the most natural way to express the concept, and all are [attributive noun] + [noun]. Likewise, small boat should not get an English entry. It is in no way idiomatic. There are other words in English used for "small boat", depending on the context or type of boat. Indoor football is called arena football in the US, and that (at least) might merit an entry, since it is not clear from the combination that the arena must be indoor. However, it is used only for indoor American football, not for indoor association football (soccer). --EncycloPetey 02:39, 25 August 2009 (UTC)[reply]

thisis justMOREfrom som1whoNEVERGOTANYWHERE IN FOREIGNLANGUAGSKILLS-butelmeTRAD-userHOSTILE-DICT-PETER,where amigonafind fi the chin.tr-l 4'internet acces',tr-literations like4'gwBush',spREDOVERTHEPARTSUTHINK?!?o-leme gues,icango'nCHEK WP INTERWIKIS,thatsolvzit ay--reread a user's w/PERTINENTEXPERIENS like sgb 's coments,letitSINK,rUMINATEit,NTHEN COMBAKw/aMORSENSICAL ELABORATIONifupl,blAMATEURtalk here!

ps mostengl-speakers rNONATIVS[bilions of'em!!],theyneed2be aHI PRIORITY4en.-theDE FACTO WORLDLANGUAGE-wt,nNOTjust som oldfashiondGRAMARFREAKSwhothink theygothe a&o boutMAKIN'A DICT justcos theypourd a lotoverOLDSTUFYBOOKS--ivhad anABSOLUT OVERDOSE ofBAAAD DICT.S 'n wt 'ljoin'em/thoseranks OVERMYDEADBODYonly,n throwtherest ofurOLDRESTRICTIVGUARDatme,i'l[grudginly admitently ] deal w/it[no1 everbeen abl2say i'dlak pluckines.]--史凡>voice-MSN/skypeme!RSI>typin=hard! 06:42, 25 August 2009 (UTC)[reply]

Echoing Widsidth, there is one thing that bothers me about Wiktionary's use of the term "idiomatic". I've grown to read "idiomatic" here as "not sum of parts", but my pre-Wiktionary understanding of the term "idiomatic" was different. My original understanding was that a phrase or an expression is idiomatic if it sounds fully natural in the given language, but its naturalness cannot be derived purely from the knowledge of the meaning of the parts, and, equally importantly, the non-naturalness of a non-idiomatic phrase cannot be derived purely from the knowledge of the meanings of the parts. So what cannot be derived from the meaning of the part is the naturalness of the sum, while the meaning of the sum may still be perfectly clear from the meaning of the parts. To disambiguate, I store the concepts under two terms in my mind: "Wiktionary:idiomatic" and "Dan:idiomatic".

Keeping an additional concept on a given overloaded term "idiomatic" is not a big deal. But there is the template {{idiomatic}} that may be not wholly consistent with the "nonSoP" reading of "idiomatic". Per Wiktionary:idiomatic (=nonSoP), each multi-word term is idiomatic. And yet, not every multi-word term is marked using {{idiomatic}}. The term "black hole" is a multi-word term and a nonSoP, but it is not Dan:idiomatic; the non-SoP-ness of "black hole" has nothing to do with Dan:idiomacity. What is Dan:idiomatic is "to make sense", instead of the German "Sinn ergeben" or Czech "dávat smysl".

I am writing this to share a confusion that I have had about the term "idiomatic" for some time. I am not proposing with this that WT:CFI's "idiomatic" should be redefined to mean Dan:idiomatic. --Dan Polansky 10:48, 25 August 2009 (UTC)[reply]

Halfwidth and Fullwidth Forms

Some graphemes are represented in Unicode as both halfwidth and fullwidth characters. We're not super consistent in our treatment of these distinctions. Does anyone have a preference of using a hard-redirect, soft-redirect, or not including them at all?

Halfwidth and Fullwidth Forms Unicode Block - Unicode.org chart (PDF)

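Row: character (column, 0–F)

U+FF0x: ＇ (7), ／ (F)
U+FF1x: ７ (7), ？ (F)
U+FF2x: Ｇ (7), Ｏ (F)
U+FF3x: Ｗ (7), ＿ (F)
U+FF4x: ｇ (7), ｏ (F)
U+FF5x: ｗ (7), ｟ (F)
U+FF6x: ｧ (7), ｯ (F)
U+FF7x: ｷ (7), ｿ (F)
U+FF8x: ﾇ (7), ﾏ (F)
U+FF9x: ﾗ (7), ﾟ (F)
U+FFAx: (ﾠ) (0), ﾧ (7), ﾯ (F)
U+FFBx: ﾷ (7)
U+FFCx: ￇ (7), ￏ (F)
U+FFDx: ￗ (7)
U+FFEx: (none shown)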

Note: U+FF65–FFDC encodes halfwidth forms. U+FFE0–FFEE includes fullwidth and halfwidth symbols. --Bequw → ¢ • τ 00:53, 22 August 2009 (UTC)[reply]
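For anyone who wants to check which code points in this block count as fullwidth and which as halfwidth, here is a minimal Python sketch. It assumes the standard-library `unicodedata` module, whose `east_asian_width` function returns `F` for fullwidth and `H` for halfwidth characters; the helper name `width_class` is made up for illustration and is not part of any Wiktionary tooling.

```python
import unicodedata


def width_class(ch: str) -> str:
    """Classify one character as 'fullwidth', 'halfwidth', or 'other'."""
    code = unicodedata.east_asian_width(ch)
    if code == "F":
        return "fullwidth"
    if code == "H":
        return "halfwidth"
    return "other"


# A few sample code points from the block, plus an ordinary ASCII digit.
for ch in "\uff17\uff27\uff77\uff97\uffe0\uffe8" + "7":
    print(f"U+{ord(ch):04X} {ch} {width_class(ch)}")
```

The classification comes from the Unicode data bundled with the Python interpreter, so it reflects whichever Unicode version that build ships.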

From The Unicode Standard 5.0, pages 434-435, they were only added "[t]o achieve round-trip conversion compatibility with [...] mixed-width encoding systems[...]". We have no need to convert between legacy encodings. They are an artifact of encoding systems, and we have no business using them. Just redirect them. There is no need to create entries with words written in these characters either, so perhaps even better, something should be done at the base Wiktionary level to automatically convert these to their standard versions. Bendono 01:20, 22 August 2009 (UTC)[reply]
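To illustrate what "automatically convert these to their standard versions" could look like, here is a hedged Python sketch using Unicode compatibility normalization (NFKC) from the standard-library `unicodedata` module. This is only one possible approach, not a description of what the wiki software does; the function name `fold_width` is hypothetical. Note that NFKC also folds other compatibility characters (ligatures, superscripts and so on), so it is broader than a pure width conversion.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Fold fullwidth/halfwidth compatibility forms to their ordinary counterparts.

    Assumption: NFKC is an acceptable stand-in for a width-only conversion.
    """
    return unicodedata.normalize("NFKC", text)


print(fold_width("\uff12\uff10\uff10\uff19"))  # fullwidth digits for 2009
print(fold_width("\uff23\uff24"))              # fullwidth Latin letters C, D
print(fold_width("\uff77"))                    # halfwidth katakana KI
```

A title normalizer built this way would map a page name typed in fullwidth characters onto the existing entry instead of needing a separate redirect page.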

ｗ is the only one that can even be considered keeping. All the others should just be hard redirected. -- Prince Kassad 01:52, 22 August 2009 (UTC)[reply]

Anything other than a soft redirect seems to violate the principle of least astonishment IMO. If I enter this into the searchbox or URL line, it's probably because I ran across it somewhere and want to know what the heck it is; redirecting doesn't really give me the information I'm looking for. This is actually one area where Wiktionary can (currently) provide better information than Google and most other search engines. So if we do redirect, the redirect should go to the "Appendix:Variations of Foo" page rather than the page for the half-width form. But given the community's unwillingness to delete or redirect other random Unicode codepoints (thinking of the Hangul syllabic blocks here), I see no reason why we would exclude these. -- Visviva 05:38, 22 August 2009 (UTC)[reply]

Prince Kassad, they are not non-standard characters, they are just foreign characters. Japanese standard input produces full-width ０, １, ２, ３, ４, ５, ６, ７, ８, ９ instead of "standard" 0123456789; some other symbols from the above table are also used in standard Japanese, e.g. ＊ and ＆ (counterparts of * and &). ２ is a redirect page, which should be removed IMHO. They occupy more space, look wider, and harmonize with the rest of the characters. Chinese also uses them, but not as consistently as Japanese. We have entries for Arabic numerals, so we might have Japanese ones as well: ٠ zero, ١ one, ٢ two, ٣ three, ٤ four, ٥ five, ٦ six, ٧ seven, ٨ eight, ٩ nine. Anatoli 10:21, 22 August 2009 (UTC)[reply]

I have created Japanese entries for full-width numerals. I could use some more help with usage in other languages. The full-width Roman characters have a similar usage in Japanese, so CD is normally spelled ＣＤ (full-width) or CD (half-width) in Japanese, but I have to find out whether it's standard. The half-width Roman characters seem to be more prevalent in Japanese, but some Japanese word-processors and dictionaries use full-width only. Anatoli 10:50, 22 August 2009 (UTC)[reply]

The difference being that half/fullwidth is strictly a computer issue. I don't go around and write halfwidth and fullwidth letters on a piece of paper. The same reason is why we hard redirect entries with typographical apostrophe to entries with the normal ASCII apostrophe. -- Prince Kassad 11:03, 22 August 2009 (UTC)[reply]

Whilst we have hard redirects from entries with the typographical apostrophë (such as from ha’p’orth to ha'p'orth), we have distinct entries for the individual characters themselves; e.g., ', ’, ‘, &c. Accordingly, we ought to hard-redirect something like ６９ to 69, but we should have full explanatory entries (especially for technical data) for the individual characters themselves. † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 11:22, 22 August 2009 (UTC)[reply]

We don't write here on a piece of paper, and I see the use for this info. The codes, look and especially usage differ, and it matters. If in Japan they use ２００９ or 二〇〇九, not 2009, users may want to know it. On a piece of paper, Japanese would write numbers with larger spaces, especially noticeable in the vertical script. Some Japanese words with Roman letters might need some fixes, like JR -> ＪＲ (Japan Railway) or CD -> ＣＤ. That's the way they appear in a Japanese text online, although the half-width forms are also used. Anatoli 11:32, 22 August 2009 (UTC)[reply]

Note also that this will be a precedent for creating 1,000 entries for italic, bold, bold italic, fraktur/blackletter, double-struck, monospace and sans-serif Latin letters in Unicode (why they were encoded in the first place is beyond me). -- Prince Kassad 15:04, 22 August 2009 (UTC)[reply]

True, but that doesn't seem problematic as long as we do them all, with some measure of consistency as to format. The total number of assigned codepoints in Unicode 5.1 -- about 240k -- is much less than the number of attested lemmata in a typical written language. -- Visviva 03:01, 23 August 2009 (UTC)[reply]

I think there's an even stronger reason to have real entries for the w:Mathematical alphanumeric symbols (that Prince Kassad mentions) than for both full/half-width forms. While many fonts do display full/half-width forms differently, they are not necessarily different (generally one's format determines the choice between 0 and ０ not some semantic difference). Unicode specifically said the difference between the styles in mathematical symbols was "fundamentally semantic rather than stylistic", so I would want entries for these symbols for sure. --Bequw → ¢ • τ 05:35, 23 August 2009 (UTC)[reply]

I agree that the stylistic variants aren’t nearly as important as the semantic variants, but nevertheless, when I look up a character on here, I want to know all the technical information pertaining to it, who uses it and for what reasons, why a variant exists, and so on. As Visviva said above: “This is actually one area where Wiktionary can (currently) provide better information than Google and most other search engines.” † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 11:35, 23 August 2009 (UTC)[reply]

I agree with Doremítzwr, and for his reasons: include the characters, but hard-redirect words they comprise. ✡ ﴾^(u):msh ^(t):210 ﴿ 19:47, 23 August 2009 (UTC)[reply]

I believe the non-(Katakana/Hangul) symbols can be used in all the w:CJK languages, so I imagine we'd want separate L2 languages (up to 3) listed on each of those pages (like the Latin numerals are now) rather than one "Translingual" header (like most of the punctuation marks are now). I also started a usage note template. --Bequw → ¢ • τ 00:22, 24 August 2009 (UTC)[reply]

I strongly question this. "Translingual" does not mean omnilingual. I think that the best approach is the one we have long taken with Han characters (which have a similar distribution of usage): a ==Translingual== section for those aspects that are language-independent (such as technical information), and ==(Language)== sections for any aspects that are more or less language-specific. Otherwise a complete entry for full-width characters would include redundant sections for not only Japanese, Korean, and Mandarin, but also Cantonese, Min Nan, Shanghainese and so forth. For most punctuation marks, I think Translingual alone is probably sufficient (unless there are specific, documented peculiarities of usage in a specific language). If we can get by without, say, a German entry for ~, I think we can get by without a Korean entry for ～. -- Visviva 05:16, 26 August 2009 (UTC)[reply]

I disagree. Translingual is indeed too broad. "East Asian" is already a limitation, especially in the English Wiktionary; out of the East Asian languages, it's only Chinese and Japanese that count, not Korean. No need to mislead users with characters which have no relevance to English and European languages. Listing all Chinese dialects to show the punctuation is again absurd, as there is (basically) only one written standard form. A ==Chinese== umbrella would do a better job, but since there was so much pressure to abandon the term "Chinese" for "Mandarin", ==Mandarin== and ==Japanese== will do.

These full-length characters are no longer used in Korea; Japanese use them more consistently than Chinese. The usage in Chinese is fading for numbers and Roman characters, but commas, colons, question marks and other punctuation are used (examples of characters in today's newspapers: China: 文章摘录如下：, Taiwan: 歐鴻鍊請辭？). Today's date in the Japanese newspaper (horizontal style) looks like ２００９年８月２６日, where characters are aligned better than 2009年8月26日 (the idea is that any character occupies the same square space, including punctuation symbols). In my opinion, Mandarin and Japanese flags are needed. No need for Korean and Chinese dialects. We don't have a special category for Cantonese punctuation; it is shared with Mandarin. Anatoli

Certainly there are some technical and visual differences between “２００９年８月２６日” and “2009年8月26日”, but to me, a Japanese native speaker, they are identical as characters, in the layer of lexical recognition. I mean, the difference I feel is quite similar to that between “August 26, 2009” and bold “August 26, 2009”, though the technical difference resides in an upper layer, not in Unicode code points but in Wiki text, in the latter case. I would call both pairs lexically the same, and don't feel a need to employ a different 2nd-level header for full-width ２ than that of half-width 2, as in the case of 2 and bold 2. While the fact that those full-width characters appear mainly in Chinese or Japanese text is important, I believe it can be sufficiently described as a part of the definition or a usage note.

Anyway, of course, I agree that providing explanation for each of those possibly-exotic characters will be a big help to our users. --Tohru 16:35, 26 August 2009 (UTC)[reply]

Thanks, Tohru, but what do you suggest? A simple redirect won't give any information, that means there must be an entry. Luckily, it's not such a large unmanageable subset. Unlike bold or italic, these characters' usage is specific to Japan and Chinese speaking countries. Anatoli 00:52, 27 August 2009 (UTC)[reply]

I've already given my opinion above. While I recognize that fullwidth forms are in use, similar to Tohru, I do not recognize them as lexically distinct from their halfwidth forms. Accepting them now will potentially open up many future problems. For instance, they are not necessarily limited to Japanese or Chinese. Ｉ　ｈａｖｅ　ｒｅｃｅｉｖｅｄ　ｃｏｍｐｌｅｔｅ　ｅ－ｍａｉｌ　ｍｅｓｓａｇｅｓ　ｎｕｍｅｒｏｕｓ　ｔｉｍｅｓ　ｗｒｉｔｔｅｎ　ｅｉｔｈｅｒ　ｅｎｔｉｒｅｌｙ　ｏｒ　ｐａｒｔｉａｌｌｙ　ｉｎ　ｆｕｌｌｗｉｄｔｈ　Ｅｎｇｌｉｓｈ　ｃｈａｒａｃｔｅｒｓ．　Ｔｈｅ　ｒｅａｓｏｎ　ｆｏｒ　ｔｈｉｓ　ｉｓ　ｄｕｅ　ｔｏ　ｔｈｅ　Ｊａｐａｎｅｓｅ　ＩＭＥ．　Ｊｕｓｔ　ｌｉｋｅ　ｗｉｔｈ　ｄａｔｅｓ，　ｙｏｕ　ｃａｎ　ｔｙｐｅ　”Ｅｎｇｌｉｓｈ”　ｓｕｃｈ　ａｓ　ｔｈｉｓ　ｗｉｔｈｏｕｔ　ｎｅｅｄｉｎｇ　ｔｏ　ｓｗｉｔｃｈ　ＩＭＥｓ．　Ｉ　ａｍ　ｓｕｒｅ　ｔｈａｔ　ｔｈｉｓ　ｉｓ　ａｎｎｏｙｉｎｇ　ｓｏｍｅ，　ｓｏ　Ｉ　ｗｉｌｌ stop. Just because you can Google such text and verify that it exists does not necessarily mean that we should start adding entries for them.

On a similar note, just now I wanted to type U+FA5B. This is a glyph variant of 者 with an extra dot in the center. Wiktionary automatically converted (without even a redirect) this to U+8005. And rightly so, as U+F900-U+FAFF are compatibility ideographs again encoded for round-trip conversion. This is the same situation. Technical information is nice, but do not lose sight of the fact that we are compiling a dictionary of words first. Bendono 08:11, 27 August 2009 (UTC)[reply]
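The silent conversion described here is consistent with Unicode canonical normalization (NFC), under which a compatibility ideograph is replaced by its unified counterpart. Whether the wiki software does exactly this is an assumption on my part; the Python sketch below only shows what the standard-library `unicodedata` module does, and the helper name `to_unified` is hypothetical.

```python
import unicodedata


def to_unified(ch: str) -> str:
    """Apply canonical normalization (NFC) to a single character.

    Assumption: this mirrors the kind of folding the wiki applies to titles.
    """
    return unicodedata.normalize("NFC", ch)


compat = "\ufa5b"  # CJK COMPATIBILITY IDEOGRAPH-FA5B
unified = to_unified(compat)
print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")
```

The difference from the fullwidth case is that fullwidth forms survive NFC and only fold under the compatibility forms (NFKC/NFKD), which may be why they are not converted automatically.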

Questionable sense of a word (with no citations to support it)

Hi. I occasionally run into a sense of a word in a Wiktionary entry that seems questionable; sometimes they "seem" dead wrong or "as if" they might be the work of a vandal or bored teenager. If there is no citation, and no example of usage, I tentatively confirm in my mind that this might be an issue worth noting, for the sake of the quality of Wiktionary. My question is: How can/should novice editors bring that to the attention of one of the more serious and competent wordsmiths on Wiktionary?

Is there any sort of a template with which one should tag that particular sense of the word? I could not find any on Wiktionary:Index_to_templates that seemed appropriate to the purpose.

Should we just note it on the discussion page and move on, hoping that some serious wordsmith will one day read the discussion page and catch the item? (This is what I did on the sense I noted this morning on tenant; my comment is here: Talk:tenant. But I don't want to sideline my main question with this specific example.)

So how should we bring such an issue to the attention of one of the more serious and competent wordsmiths on Wiktionary?N2e 16:29, 22 August 2009 (UTC)[reply]

I use the {{rfv-sense}} template. SemperBlotto 16:33, 22 August 2009 (UTC)[reply]

- Thanks SemperBlotto. I did not find that template when I looked. My bad. And now I see that DCDuring has already marked that sense I was concerned about on tenant. So all is very well. N2e 19:25, 22 August 2009 (UTC)[reply]

aspired h vs silent h

As this absence of sound has no phonetic symbol, we decided last month on fr.wikt to use a Template:h. Here we're currently also forced to replace the inappropriate "ʔ" by aspired h, as in haricot#French. But I really think that adopting a {{h}} would be more practical. JackPotte 23:48, 22 August 2009 (UTC)[reply]

This issue is being discussed on Wiktionary talk:About French. I agree with you that ʔ is inappropriate and I created a {{asph}}. This template is still in its infancy, so feel free to make any changes. (I like the way fr.wikt does it!) No {{muteh}} exists yet. —Internoob (Talk|Cont.) 18:14, 23 August 2009 (UTC)[reply]

Authorization to run bot

Hi. I would like permission to run my bot:

My user name: Malafaya

My bot's user name: User:MalafayaBot

Software: Pywikipediabot

Task: It will exchange interwiki links among categories only with other Wiktionaries and update them accordingly

Due to the very low update rate required, I believe a bot flag is not absolutely necessary.

Thanks, Malafaya 00:58, 23 August 2009 (UTC)[reply]

Uh, isn't that exactly what User:VolkovBot does? I don't see the necessity for another interwiki bot. -- Prince Kassad 01:20, 23 August 2009 (UTC)[reply]

Yes, I believe VolkovBot does that, and yet there are still lots of categories with no interwiki links or with outdated ones (because that's lots of work anyway). Allow me to explain why I'm asking for permission here: I already run my bot in smaller Wiktionaries. More often than not, my bot ends up not updating anything even if there were supposedly things to update. Imagine for example a category "Numbers in greek", existing in Portuguese Wiktionary and here, and nowhere else (at least, linked anywhere). English Wiktionary is not aware of the same category in PT, but PT is aware of the EN category. After I run my bot, things will be kept exactly the same, because the bot is not allowed to update here. So, even if VolkovBot runs here, it still won't find the Portuguese category because there's nothing linking to it (only the other way). This happens very often and it's the main reason why I'm applying for bot use here. Malafaya 01:31, 23 August 2009 (UTC)[reply]
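The one-way link problem described above can be sketched as a toy model in Python. This is not how Pywikipediabot is implemented; the `links` dictionary and the `discover` function are hypothetical and only model "which wikis can a bot reach by following interwiki links from its starting wiki".

```python
from collections import deque


def discover(links, start):
    """Follow outgoing interwiki links from `start`; return every wiki reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        wiki = queue.popleft()
        for target in links.get(wiki, set()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical state: the pt category links to en, but en has no link back.
links = {"en": set(), "pt": {"en"}}

print(sorted(discover(links, "en")))  # what a bot starting on en can find
print(sorted(discover(links, "pt")))  # what a bot starting on pt can find
```

In this model a bot that starts on en never reaches pt, while a bot that starts on pt finds both wikis but still cannot add the missing link unless it is allowed to edit en.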

Due to the lack of interest and the seemingly intrinsic counter-interwiki-bot spirit present here, I hereby retire my request. The only comment I got after 2.5 days was analogous to "we already have someone working on Italian words so why would we want you to do it too?" (and I thank Prince Kassad for taking the time to post his comment, even if it's not what I was hoping for). My perspective is that, wikiwise, the more the better, even for bots. With the exception of interwiki linking in Wiktionary main namespace, which is ruled by the same orthography rather than translation of concepts to the wiki language, pages need at least an interwiki link somewhere else (i.e., a link from category German to category Deutsch in de.wikt). This means that the wiki you process on makes a difference, as some interwiki-isolated pages may have information that links to other wikis. VolkovBot is just one and, even if it runs against all Wiktionaries, it will take a long time before a cycle is completed, and even then, it's not certain that it will catch all the relations between categories of different wikis. Two bots would be better and quicker than just one (despite VolkovBot running all around, take a look at my bot's contributions at pt.wikt and ca.wikt, for instance).

Enough said, I have applied for bot flag at the French Wiktionary and it should go well, so I'll be updating there. VolkovBot will eventually get the new interwikis it needs from there, so the long term result is approximately as if I was updating here directly.

Thank you for your attention, Malafaya 11:22, 25 August 2009 (UTC)[reply]

VolkovBot doesn't seem very active, I'd strongly support this, having seen it in action. Mglovesfun (talk) 14:55, 25 August 2009 (UTC)[reply]

VolkovBot seems to be more active in the main namespace lately. Nevertheless, its work is a very important contribution and it's the only globally established Wiktionary category interwiki linking bot. MalafayaBot should discover new data (interwikis) for VolkovBot, and VolkovBot should spread that data globally. It's a good symbiosis :). Malafaya 22:52, 25 August 2009 (UTC)[reply]

You need to pass a vote to get the bot flag. You should start one. --Ivan Štambuk 16:08, 25 August 2009 (UTC)[reply]

Maybe that's it. Maybe I should have started a vote immediately. I followed the directions at WT:BOT which mention gathering a consensus here at BP, but I guess no one takes that step too seriously (except myself :) ). Thanks, Malafaya 16:12, 25 August 2009 (UTC)[reply]

I started here as I'm not sure how well you know the English Wiktionary. Mglovesfun (talk) 16:32, 25 August 2009 (UTC)[reply]

Thanks, Mglovesfun. Yes, you're right. I don't know it that well as far as community decisions are concerned. As I mentioned above, I followed the procedure described in WT:BOT, which directed me first to conduct an opinion poll here at WT:BP, with "meager" replies. Again, thank you for clearing up what I thought was lack of interest by the community. I'll be following the vote page. Over and out on this topic here :). Malafaya 22:46, 25 August 2009 (UTC)[reply]

rfd RENAME

in2VERIFICATION [ONLY!]which ofcours hasitsplace;ifsthFAILSthat,it'l inth endgetTAKENOUT[orLABELDasuch imo],i/part orentirely,thatsDUEPROCES,we doNOTneed2stres"delx5"likenow,itjustmakus lookbadlikeNAROWMINDED OGRS,eagerto eatawayNEWBYS EFORTS[butrealy,whostilfeels likeCREATINGentrys i/the PRESENTATMOSPHERE?!?{nwe rcomplete alredy,sure,c hypothenar's history.}--史凡>voice-MSN/skypeme!RSI>typin=hard! 04:13, 23 August 2009 (UTC)[reply]

Assuming you were attempting to say "RFD should be merged into RFV because RFD sounds negative and the process is the same" (which is incidentally much easier to type and read than what you put), RFD serves a different purpose from RFV: while RFV is for words that might not be words, RFD is for things that, even if they would pass an RFV, we might not want to include anyway. While they could be combined, it's quite nice to have the two separate, as a stale RFV gets deleted, while a stale RFD is kept. Conrad.Irwin 22:36, 25 August 2009 (UTC)[reply]

Also, the file size of such a page would break the MediaWiki limit for page size. -- Prince Kassad 22:42, 25 August 2009 (UTC)[reply]

Phrasal sentence adverbs

I am not sure why we would want all attestable phrasal sentence adverbs. The idiom rationale that we are now applying seems to effectively provide a lower standard for inclusion of such phrases than for any others. It appears that we could include to be honest and various simple derivatives of the form "to be X honest", where X ranges over a subset of adverbs that includes [perfectly, brutally, very, totally, completely, more, really, and absolutely]. Similarly "In all X", where X ranges over a list of nouns like "fairness", "honesty", "frankness", "innocence", etc.

Do we want all of them as entries?
Should the ones with adverbs be redirects to the forms without adverbs?
Do we want some of them only to appear in appendices with titles of the form Appendix:English sentence adverbs of the form "in all N"?

At the next level:

Is this controversial?
Does it need research?
Do we want to include this among the considerations to be addressed in "technical amendments" to WT:CFI? DCDuring TALK 20:00, 23 August 2009 (UTC)[reply]

These look a bit like snowclones, but offhand I can't recall how we decided to handle those. The arguments in that discussion might be relevant and helpful, if someone can find them. --EncycloPetey 02:28, 25 August 2009 (UTC)[reply]

Wiktionary:Beer_parlour_archive/2008/April#Gaps in entry titles. seems to be our most recent full discussion of the general X-formula/snowclone issue. DCDuring TALK 11:23, 25 August 2009 (UTC)[reply]

IMO: 1. No. These are an open set; our goal as a project is wildly ambitious, but still finite.

2. Yes (if they are common enough that someone might plausibly search for them).

3. I don't think we should delude ourselves that such appendices are anything but a black hole at present.

1'. Ha ha.

2'. Seems cut and dried to me. Research might be merited in a specific case, if there were e.g. a question of whether "to be brutally honest" is anything but sum of parts.

3'. Seems like a good idea. Pity it's such an effing pain to update policies around here.

One advantage of these is that there is a natural home for the main entry, at the adverb-less version. Obviously this does not work for many snowclone cases. -- Visviva 13:19, 28 August 2009 (UTC)[reply]

curation

Looking up the meaning of a word in wiktionary : "curation" :

"Curation : The act of curating". Clicking on to to "curating".
"Curating : Present participle of curate". Clicking on to "curate".
"Curate : 1. (transitive) to act as a curator". Clicking on to "curator".
"Curator : A person who manages, administers or organizes a collection , historically at a museum, library, archive or zoo. The function is now quite detached from institutions and many curators, some of the most famous, tend to be independant , organizing exhibition all over the world, with different partners, as public as private." Jackpot! Only took me 3 more clicks!

157.193.203.65 08:05, 24 August 2009 (UTC)[reply]

Then you can easily improve it into "Curation : The act of managing, administering or organizing a collection , historically at a museum, library, archive or zoo." :-) --Kipmaster 12:06, 24 August 2009 (UTC)[reply]

Me

For your information: since I have not been very active lately, I have been desysoped at my request [5]. --Kipmaster 12:08, 24 August 2009 (UTC)[reply]

<grins> Can someone please unlock my userpage...? :-) --Kipmaster 12:11, 24 August 2009 (UTC)[reply]

Done :) --Dijan 12:14, 24 August 2009 (UTC)[reply]

At least you're around enough that we know you're alive. —Neskaya kanetsv? 08:51, 5 September 2009 (UTC)[reply]

Appendix:List of legal Latin terms

I created this appendix (by which I mean I copied it from Wikipedia) and made a few edits. I thought it might be useful to users but also to contributors.

A lot of the links are red and I might want to add some of these terms.

I was wondering what the policy is for Latin phrases that get used a lot in (otherwise) English texts.

Can I use the heading "English"?

John Cross 21:55, 24 August 2009 (UTC)[reply]

If you're going to copy stuff from Wikipedia, or another Wikimedia project, then AFAICT (IANAL) you have to preserve the edit history, per the GFDL (or whatever license is being used), which requires attribution. This can be done by using special:import (for those who can see that page) or by asking someone else to do so or by copying the edit history to the talkpage of the page you make. (I think there's a script somewhere that produces a page's edit history wikified. Try w:WP:JS perhaps?)—msh210℠ 22:31, 24 August 2009 (UTC)[reply]

The newly created list can probably be deleted, as it is mostly redundant to Appendix:Legal terms, unless you want to specifically select Latin terms used in English to the exclusion of natively English terms.

Appendix:Legal terms is linked to from the law entry, from the section "See also". --Dan Polansky 10:20, 25 August 2009 (UTC)[reply]

I think there is some benefit to having a list of Latin legal terms used regularly in English texts and in courts in English-speaking countries. I can't use special:import; perhaps someone with more admin rights could help me. I think it would be tough to argue the word list was copyright of anyone other than the Wikimedia Foundation, so I don't really see a major issue here. John Cross 18:25, 31 August 2009 (UTC)[reply]

That depends. Do you mean "terms from Latin that are used as legal terms in English courts" or do you mean "Latin legal terms that have since been borrowed into English"? There is a big difference there. Some "Latin" terms were in fact used in courts of law where Latin was the language of the court, but many of those expressions are just everyday Latin phrases or collections of words that had no special meaning in Latin. Such terms have only taken on specific legal meanings within the corpus of English law (and its derivatives in other countries). For these terms, the language is English: even though each is composed of Latin words, it did not have a special legal meaning in Latin. If you are indexing those terms, then the title of your appendix is misleading, since it implies they are terms from Roman law and not from English law.

As for the copying, the copyright isn't the issue. MW documents are required by the licensing to display the contribution and edit history. If you copy the contents without the edit history, you are claiming to be the author of the content, which is unethical as well. User:Goldenrowley is probably our most experienced admin when it comes to importing from WP. --EncycloPetey 04:14, 3 September 2009 (UTC)[reply]

Redundant articles?

Is there a policy saying that we should always use articles in our definitions? It is customary to use articles in dictionaries, but to me it seems redundant. For example, the definition of "a table" would be "an item of furniture with a flat top surface raised above the ground, usually on one or more legs" and a definition of "the table" would be "the item of furniture with a flat top surface raised above the ground, usually on one or more legs", so should the definition of "table" be simply "item of furniture with a flat top surface raised above the ground, usually on one or more legs"? Is there any good argument why we should include the articles? What do we all think about making it a policy to use no articles unless they are necessary? Gregcaletta 02:22, 25 August 2009 (UTC)[reply]

There isn't a fixed policy I'm aware of, but house style is to include the indefinite article when defining a common noun and (often) including the definite in defining a proper noun. --EncycloPetey 02:26, 25 August 2009 (UTC)[reply]

It just reads better with the indefinite article IMO. A definition -- for a noun, at least -- is basically an answer to the question "What is a(n) _____?" If someone asks what a table is, it is far more natural to answer "An item of furniture (...)" than just "Item of furniture (...)".

It is easy to imagine uses for Wiktionary data that would require that definitions be perfectly substitutable for the definiendum -- e.g. some sort of AI/NLP application. But in those cases, it is trivial to remove the "a"s and "to"s. Our own style should be human-oriented, I think, and should follow lexicographic precedent unless there is reason to do otherwise. -- Visviva 10:42, 25 August 2009 (UTC)[reply]
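As a rough illustration of the "trivial to remove" point above, here is a minimal Python sketch of how a downstream tool might strip a leading article or the particle "to" from a definition. The function name and the list of words to strip are assumptions made for this example; nothing like this is part of Wiktionary itself.

```python
import re

# Leading words a consumer of the data might drop. This list is an
# assumption for illustration: the articles "a", "an", "the" and the
# verbal particle "to".
_LEADING = re.compile(r"^(?:an?|the|to)\s+", re.IGNORECASE)


def strip_leading_particle(definition: str) -> str:
    """Remove one leading article or "to" from a definition, if present."""
    return _LEADING.sub("", definition.strip(), count=1)


if __name__ == "__main__":
    print(strip_leading_particle("An item of furniture with a flat top surface"))
    print(strip_leading_particle("to act as a curator"))
```

Only the first word is touched, so articles inside the definition ("with a flat top surface") are left alone, and words that merely begin with "a" or "to" (such as "another") are not affected.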

The same applies to the use of the particle "to" in our definitions of verbs. In both glosses and definitions "to", "a", and "the" often serve to disambiguate between a verb and noun. The other means of doing so may not be present in users' working memories as they read the gloss or definition. DCDuring TALK 11:04, 25 August 2009 (UTC)[reply]

I agree. They also help to distinguish between a countable noun and an uncountable one (though they're not foolproof in that regard). —Ruakh_TALK 00:24, 26 August 2009 (UTC)[reply]

Fair enough. Someone could add this to policy, if it hasn't been already, but it's probably not necessary as articles seem to be used pretty consistently anyway. Gregcaletta 04:17, 28 August 2009 (UTC)[reply]

It is not so consistent in glosses, such as appear in many pages in {{trans}}, {{term}}, and {{sense}}, where the same considerations apply, except with more force because there are often fewer clues as to how to read a word in a gloss. It might be worth a proposal and vote. See Wiktionary talk:Entry layout explained#Including articles and particles in definitions and glosses. DCDuring TALK 14:36, 28 August 2009 (UTC)[reply]

Stereotypical sample sentences

I just changed the sample sentence of indecisive from "Girls are very indecisive. They spend ages choosing a dress." to something a little less generalizing. No matter what one's opinions about gender roles are, or indeed about any kind of stereotype, these sample sentences are wholly unnecessary and should in the name of neutrality be avoided when possible. Here's another example[6] of how one can, if not actually provide outright counter-culture images, then at least uphold a semblance of reasonable balance, i.e. that men are not always active and while women passive, especially when it comes to issues of romance or sexuality.

I would assume that gender-specific statements are probably among the most common when writing sample sentences, but I wouldn't be surprised if this might occasionally occur when it comes to ethnicity and other categories as well. Do we have any guidelines on this? Has this been discussed before?

Peter ^Isotalo 09:23, 25 August 2009 (UTC)[reply]

Help:Example sentences contains the guidelines and the policy (transcluded); I have done this to the page to reflect what you said, feel free to improve upon it further - I've noticed that the guidelines are all written in slightly different styles if you're looking to improve the whole section (though the policy cannot be modified without a vote). Example sentences are there to demonstrate how words are used, and I feel that providing a natural-sounding example is more important than providing a "politically correct" example - though there is little need to deliberately be controversial when writing them. At least one user has been blocked permanently for persistently adding unacceptably explicit sentences, so there is some control over them, but at the same time it would be a mistake to overregulate them - they are useful both for providing context and also for providing tiny nuances that won't fit into a definition. In the particular instance of indecisive, I think your example is less natural than the original, but I'm not sure I can pinpoint why. While in the change to come on I see no particular gender issues, I would be very careful in deliberately going against stereotypes in example sentences; we are seeking examples that show the language as it is commonly used (whether that be correct or not). _{Conrad.Irwin 00:04, 26 August 2009 (UTC)} Conrad.Irwin 00:01, 26 August 2009 (UTC)[reply]

Also, I think it can always benefit an entry to replace a contrived “sample” showing how we think a word is used with an actual citation quoted from a book or other source. Even some of the better print dictionaries have included made-up examples which don't match real usage at all. —Michael Z. 2009-08-26 01:18 z

The diff you mention changes the entry [[come on]] from having two sentences with a man's coming on to a woman to having one of those and one with a woman's coming on to a man. For lexical purposes (viz, to show users that the word come on can be used when referring to either gender), that's reasonable. But for the purposes you mention (to show "that men are not always active and while women passive, especially when it comes to issues of romance or sexuality"), that's, if you'll excuse me, ridiculous. We should provide sentences that provide usage information of the word, and that's it. We're a dictionary.—msh210℠ 18:24, 26 August 2009 (UTC)[reply]

I agree that reflecting proper usage is the first priority of a dictionary, but it doesn't mean that any other aims are "ridiculous". What's the point of that kind of characterization anyway? To try to prove that dictionaries aren't intended for real world usage? That our readers couldn't possibly care and would never notice? That we as wiktionarians are never prejudiced? I never suggested that we write specific guidelines that prescribe that exactly 50% of the personal pronouns have to be female or that any sentence even remotely touching the issue of gender roles has to be 100% politically correct. It's a suggestion that we could at least hint that when choosing between two equally relevant example sentences, there are generally only benefits in choosing one that doesn't potentially offend, and that this applies not just to racial and ethnic slurs but to stereotyping in general.

The change that Conrad made seems like it could solve the worst of the problems, though.

Peter ^Isotalo 07:02, 28 August 2009 (UTC)[reply]

Wiktionary:About Old French

Does anyone mind if I start an article for this? I've been down to Leeds Uni (where I study) and read the introductions to a couple of French-Old French dictionaries. I'm also trying to coordinate it with the French Wiktionary at the same time. Mglovesfun (talk) 16:06, 26 August 2009 (UTC)[reply]

Go for it. I had the impetus once to add some basic words from an Old French grammar I picked up, but quickly discovered the spelling variations problems and lost all hope. --EncycloPetey 04:08, 3 September 2009 (UTC)[reply]

It would be useful. Our etymologies of Middle English and English words often refer to Norman French (xno), Old French (fro), and Middle French (frm). I have also seen Old North French (no separate ISO code, I think) in etymologies. Clarifying each of these would be useful. Old French could be the first and mention the others. DCDuring TALK 10:04, 3 September 2009 (UTC)[reply]

Agreed. I would love to see this. Old French is a very important language for the English Wiktionary, probably only surpassed by its importance to the French Wiktionary. -Atelaes λάλει ἐμοί 11:32, 3 September 2009 (UTC)[reply]

Greek derivations.

Previous discussion: Wiktionary:Beer parlour archive/2007/November#Greek_derivations.

What do we want to do about Modern Greek derivations? el means Modern Greek and grc means Ancient Greek, but due to a confluence of various well-meaning past actions, {{etyl|el}} has come to be used in many entries that actually require {{etyl|grc}}. Ideally, I think {{etyl|el}} should present the text "Modern Greek", link to w:Modern Greek, and categorize in Category:Modern Greek derivations; but that seems inappropriate until we fix the existing entries.

So, I propose the following multi-step plan:

Create {{etyl:el-GR}} for Modern Greek, that does what I describe above. (GR is the country code for Greece.)
Go through Category:Greek derivations, and its other-language counterparts, and edit all entries to use either {{etyl|grc}} or {{etyl|el-GR}}.
Move {{etyl:el-GR}} to {{etyl:el}}, or redirect {{etyl:el}} to {{etyl:el-GR}}, or something.
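The second step of the plan above could be supported by a small script; the following Python sketch is one possible shape for it. It only handles the two spellings mentioned in this discussion, {{etyl|el}} and {{etyl|el|xx}}, and the function names are hypothetical. The choice between grc and el-GR still has to be made by a knowledgeable editor for each entry; the code only applies that choice.

```python
import re

# Matches {{etyl|el}} and {{etyl|el|xx}}; the optional group captures the
# second argument (for example "|fr"). {{etyl|el-GR}} and {{etyl|grc}} do
# not match, so entries that are already sorted are left alone.
ETYL_EL = re.compile(r"\{\{etyl\|el(\|[^{}|]*)?\}\}")


def needs_review(wikitext: str) -> bool:
    """True if the entry still contains an unsorted {{etyl|el...}}."""
    return ETYL_EL.search(wikitext) is not None


def retag_etyl(wikitext: str, new_code: str) -> str:
    """Rewrite every {{etyl|el...}} to use new_code, e.g. "grc" or "el-GR"."""
    def swap(match: re.Match) -> str:
        return "{{etyl|" + new_code + (match.group(1) or "") + "}}"
    return ETYL_EL.sub(swap, wikitext)


if __name__ == "__main__":
    entry = "From {{etyl|el|fr}} example, from {{etyl|el}} example."
    print(needs_review(entry))
    print(retag_etyl(entry, "grc"))
```

Because already-retagged templates no longer match, the script can be run repeatedly over the category while the manual sorting is in progress.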

While we're at it, we may also want to create templates for other forms of Greek, such as Byzantine/Medieval Greek.

Thoughts?

—Ruakh_TALK 19:42, 26 August 2009 (UTC)[reply]

This is a known problem. I have been very slowly cleaning up Category:Greek derivations, but there are too many entries for me alone. The situation is exacerbated by the fact that a huge number of etymologies lack Greek script, another portion of them lack polytonic diacritic marks, and yet another large group (of verbs) is wrongly lemmatized to the infinitive and not the first person singular.

We need more helping hands.

PS. Byzantine/Medieval Greek does not have an ISO code. As far as I know we treat everything with polytonic diacritics as "Ancient Greek", the rest as "Greek" (except for the recently created Category:Cappadocian Greek language). --Vahagn Petrosyan 20:14, 26 August 2009 (UTC)[reply]
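The rule of thumb in the PS (polytonic diacritics mean "Ancient Greek", everything else "Greek") can be approximated mechanically. The Python sketch below is a rough heuristic under stated assumptions, not an established tool: it treats any character that remains in the Unicode Greek Extended block (U+1F00 to U+1FFF) after NFC normalization as polytonic, and the function names are made up for this example. It cannot help with etymologies that are missing Greek script or diacritics altogether.

```python
import unicodedata


def looks_polytonic(text: str) -> bool:
    """Heuristic: True if text contains polytonic Greek characters.

    After NFC normalization, breathings, circumflexes, graves and iota
    subscripts live in the Greek Extended block (U+1F00..U+1FFF), while a
    plain acute accent is normalized to the monotonic tonos letters in the
    basic Greek block.
    """
    composed = unicodedata.normalize("NFC", text)
    return any(0x1F00 <= ord(ch) <= 0x1FFF for ch in composed)


def guess_code(text: str) -> str:
    """Guess "grc" for polytonic text and "el" otherwise (a guess only)."""
    return "grc" if looks_polytonic(text) else "el"


if __name__ == "__main__":
    print(guess_code("ἄνθρωπος"))  # word with a smooth breathing
    print(guess_code("λόγος"))     # word with only an acute accent
```

A word whose only diacritic is an acute accent will always be guessed as "el", even if it is Ancient Greek, so the result is at best a hint for a human editor.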

Right. And I'm not suggesting that we start giving Greek words under ==Modern Greek== L2 headers. It's just that it was decided a long time ago that Category:Greek derivations would be split, but then later changes undid that. So I'm re-suggesting that we split it, and suggesting a way to do it. It'll take a lot of a hard work on the part of knowledgeable editors, and I'm not expecting it to happen overnight; indeed, the fact that it won't happen overnight is the reason that we need a way. —Ruakh_TALK 21:28, 26 August 2009 (UTC)[reply]

Similar approaches have been previously proposed (e.g. {{MGr.}}) and ultimately rejected, I think in large part because they confuse an already confused issue. If we wanted to take an automated stab at cleaning this up, the best bet would really be to have a bot auto-replace all instances of {{el}} with {{grc}}, as there are so few instances of Greek derivations, especially compared to Ancient Greek derivations. Such an approach would allow for easy monitoring of additions to Category:Greek derivations, and reproach of editors adding {{etyl|el}}. If we follow your route, we're going to have a whole lot of entries claiming to have come from modern Greek, when they really didn't, whereas now, they're at least ambiguous (at least to a user who doesn't know our language naming policies). As for Byzantine Greek, that really needs a unified approach to dialect forms. As it currently stands, Byzantine Greek is a dialect of Ancient Greek (which includes everything up to 1453). I've created a number of Ancient Greek dialect templates, which are currently only used in {{grc-alt}}, but I think could easily be incorporated into {{etyl}}, if we could agree on it. However, all things considered, I think that this issue should be put on the back-burner, as I'm unwilling to work on it at the moment (I'm working on something else with Ancient Greek at the moment, which, when finished, will I think be worth the wait), and as far as I've seen, I'm the only editor interested in consistently working with the language. -Atelaes λάλει ἐμοί 21:52, 26 August 2009 (UTC)[reply]

Soon Middle Greek will get its own code gkm which we might utilize. --Ivan Štambuk 22:03, 26 August 2009 (UTC)[reply]

Re: "If we follow your route, we're going to have a whole lot of entries claiming to have come from modern Greek, when they really didn't": I don't get it. My entire goal was to avoid that: entries that currently use {{etyl|el}} would only claim to have come from modern Greek if they were manually edited to use {{etyl|el-GR}}. What am I missing? :-/

Re: Byzantine Greek: Oh, that's good, then. I was imagining that Category:Greek derivations must include some Byzantine derivations we'd need to create a home for; but if it's considered O.K. to use {{etyl|grc}} for them, then that works great. :-)

—Ruakh_TALK 22:38, 26 August 2009 (UTC)[reply]

When these derivation categories are finally sorted (so that all the derivations are properly sorted, be they from the Ancient or the Modern language), I think that {{etyl|el}} should display Modern Greek (and not just Greek) — the fact that the language of derivation will explicitly state “Modern” will probably cause editors to pause and correct their misuse of ISO codes. † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 01:35, 27 August 2009 (UTC)[reply]

I may have misunderstood, but I thought you proposed redirecting {{etyl:el}} to {{etyl:el-GR}}. Wouldn't that cause {{etyl|el}} to say Modern Greek? -Atelaes λάλει ἐμοί 22:46, 26 August 2009 (UTC)[reply]

Oh, I did, but only as a last step, after all the entries that currently have {{etyl|el}} are sorted out. Maybe I just haven't looked enough, but I haven't seen any examples of human editors using {{etyl|el}} inappropriately; all I've seen are cases where an editor used {{Gr.}} (which IMHO was legitimately ambiguous, even though it "officially" meant the same as the ==Greek== L2 header, i.e. Modern Greek), where AutoFormat (talk • contribs) autoconverted that to {{etyl|el}}. But if it does happen that editors use {{etyl|el}} inappropriately, then I could go either way on that part of the proposal: we'd want to clean up any entry that showed up in Special:WhatLinksHere/Template:etyl:el, but while it was there, would it be better if it assumed Modern Greek, or not? Either way, we're talking about a far-off date — there are 841 entries using either {{etyl|el}} or {{etyl|el|xx}}, and it'll be a while before they're all sorted out — so we can probably cross that bridge when we come to it. :-) —Ruakh_TALK 01:56, 27 August 2009 (UTC)[reply]

wrong translation of section name Beer Parlour...

Sorry to inform you, but the translation of the section named Beer Parlour from English to Portuguese is wrong. The correct translation for beer parlour from English to Portuguese is 'Cervejaria', not 'esplanada'. I'm Portuguese and a native speaker of Portuguese. I hope this will help... — This comment was unsigned.

That's not a translation, but a link to a page with a similar purpose on pt.wiktionary. So the name can be very different, since every community chooses its own. --Nemo bis 08:31, 28 August 2009 (UTC)[reply]

Wiktionary:Citations

See Wiktionary_talk:Citations#Why_split_the_citations.3F. I can't understand this policy. --Nemo bis 08:28, 28 August 2009 (UTC)[reply]

Thanks for checking the documentation and bringing the inconsistencies here. I have explained it as best I can, but, as you suggest, the guideline (not policy, I think) needs to be updated to fully reflect our current best practices. DCDuring TALK 14:53, 28 August 2009 (UTC)[reply]

Appendix:Spanish names for María

What's this, and what does it do? Mglovesfun (talk) 10:04, 28 August 2009 (UTC)[reply]

Changing the title to Spanish names referring to Mary would be clearer. But some of the columns are not Spanish at all... Lmaltier 19:10, 28 August 2009 (UTC)[reply]

Wiktionary:Votes/pl-2009-08/Add en: to English topical categories

Pretty much does what it says on the tin. Mglovesfun (talk) 15:03, 28 August 2009 (UTC)[reply]

Curious how I was just thinking of this problem, as my bot is at the moment under a vote. Currently, there's ambiguity between "Category:X (All languages)" and "Category:X (English only)" on this Wiktionary which is a problem for example for interwiki linking. Malafaya 15:09, 28 August 2009 (UTC)[reply]

+en: for English where every other language has its own ISO code? I’m for that. † ﴾^(u):Raifʻhār ^(t):Doremítzwr ﴿ 15:11, 28 August 2009 (UTC)[reply]

Yes, I think that's it. I.e., "Category:Countries" would be split into "Category:Countries" (same but non-language-specific, with all "Category:xx:Countries" inside) and "Category:en:Countries" just with English words. I'm also for :). Malafaya 15:25, 28 August 2009 (UTC)[reply]

So am I to understand that, on English Wiktionary, in a list of categories, users should be made to learn by trial and error that they need to go through the list of language categories to find English, provided that they aren't put off Wiktionary altogether? Is everyone just writing off the notion that we might want en.wikt to serve users who are primarily looking for/expecting English? Why does convenience in writing templates and bots override the needs of actual end users? Is en.wikt just our playpen? DCDuring TALK 16:30, 28 August 2009 (UTC)[reply]

OK, I understand your point of view. In that case, shouldn't categories in other languages move out of the English category? Something like "Category:Countries (all languages)". It's not logical that, say, category "Countries in Ukrainian" is sitting side by side with country names pages in English. Malafaya 16:59, 28 August 2009 (UTC)[reply]

P.S. You say en.wikt is supposed to "serve primarily users in English". Serve primarily in English, such as definitions, etc., yes, but serve primarily English words, maybe not. This is a multilingual project in what concerns words, terms, expressions, etc., so it doesn't bother me too much if categories for English concepts are marked as being English, no defaults assumed. Malafaya 17:06, 28 August 2009 (UTC)[reply]

English meaning being strongly determined by word order, what I wrote above, "to serve users who are primarily looking....", is not equivalent to "to serve primarily users who are looking...."

I simply want to make sure that any users who come to English Wiktionary with the naive expectation that it is well suited to helping them find out in English about English words are not excessively disappointed. After all, as benighted as they might be, they might have some quaint or colorful regional dialectal expression to contribute. I am not in a position to predict whether the slogan "all words in all languages" will turn out to have been a quixotic goal, but it is clear that we are having trouble reconciling the needs of the various user groups. As a result, we often seem to settle on just meeting our own. DCDuring TALK 18:30, 28 August 2009 (UTC)[reply]

For example Category:Vulgarities interwikied to fr:Catégorie:Mots vulgaires en anglais and VolkovBot understandably then linked it to all the macro-categories, not the "English" ones (pt:Categoria:Obscenidade for example). Mglovesfun (talk) 17:13, 28 August 2009 (UTC)[reply]

I share DCDuring's concern. But it's common to all languages, not only English. Countries in English, Countries in Ukrainian... would be better category names. Lmaltier 17:18, 28 August 2009 (UTC)[reply]

I believe that the native language of each Wiktionary deserves special treatment on that Wiktionary to facilitate use for that language by people who are seeking a reference work in that language. That includes monolingual users. I would argue that only categories that have no prospect of ever being visible to users should have the language codes as part of the name. DCDuring TALK 18:30, 28 August 2009 (UTC)[reply]

(from the left) In fairness, what does that mean DCD? Mglovesfun (talk) 15:09, 29 August 2009 (UTC)[reply]

I'm not sure what "that" is intended to refer to. I will assume you mean the lead sentence and hope that I cover your actual intended question.

It's a statement of principle, like "all words in all languages", but not catchy enough to be a good slogan. I think all users of something called "Wiktionary, the free dictionary" have a right to expect things like the following (which we have at present for the most part, except for some user pages, stray definitions, and recent changes):

All running text on all pages in English other than usage examples/citations for non-English terms.
All terms used in headers, context tags, category names, attributes in English.
English language section first (Translingual debatable)
English at the top of all multi-lingual category and other listings.
Omitted lang= parameter defaults to English where required.

More controversially, perhaps no non-Roman and accented characters in transcriptions.

The purpose of all of this is to make sure that en.wikt can compete against mono-lingual English dictionaries reasonably effectively, also thereby attracting potential contributors who can help broaden and keep up to date our English content for the benefit of all those interested in English as she is spoke. The very inclusion of non-English material is a liability in serving monolingual English users. Of course, having non-English material is also a major advantage in serving EN users who are seeking information about words from other languages, in etymologies and in recently borrowed terms. It enables us to have a productive multi-lingual community of contributors that provides a stimulating environment for people interested in words and language. DCDuring TALK 15:56, 29 August 2009 (UTC)[reply]

O.K., but that's moot. The fact is that we do have lots of non-English stuff in (e.g.) Category:Fruits, and a proposal like this one would make it easier for an English-only reader to ignore non-English content. You write above, "So am I to understand that, on English Wiktionary, in a list of categories, users should be made to learn by trial and error that they need to go through the list of language categories to find English […] ?", and I simply don't understand why that should be. If they're navigating from English entries, they'll automatically find themselves in the English topic-category hierarchy. If, for whatever reason, they do reach Category:Fruits when what they want is Category:en:Fruits, there's no reason we have to make them "go through the list of language categories"; if we separate {English} from {everything}, then it will be easy for {everything} to link prominently to {English}. The only thing I don't like about this proposal is that our readers may not understand the meaning of the en: prefix; but then, we already have that problem. Just because someone is learning about, or interested in, a foreign language, doesn't mean they're instantly and magically a tech-savvy person familiar with the ISO language codes. (Worse yet, they probably will be familiar with the ISO country codes used as TLDs, which look similar, but which don't correlate very well.) I'd prefer names like Category:Fruits (English), Category:Fruits (French), etc. —Ruakh_TALK 18:52, 29 August 2009 (UTC)[reply]

I agree that the ISO codes are not user-friendly unless one is saving time by typing in a code instead of a name and has the code available in one's memory.

Some of what I would like might be accomplished by the sort=* parameter or something similar that pushes English to or near the top of listings. The alternative is to allow all default contexts and all lang=en context tags to appear in the top category.

If context tags without a lang= parameter default to either Category:Fruits or to Category:en:Fruits, we will have a cleanup problem. It is not terribly difficult to clean non-English terms out of the top category (no lang=) by eyeballing the terms and inserting the correct lang parameter, usually based on information in the language section. If one can use the Language header itself, that makes the process even more certain. There is a labor tradeoff between having to insert "en" in almost all English word context tags and having to pick non-English terms out of the category used for English terms. DCDuring TALK 20:21, 29 August 2009 (UTC)[reply]

Sorry, I think what I wrote was ambiguous. When I said, "we do have lots of non-English stuff in (e.g.) Category:Fruits", I wasn't referring to miscategorized entries; I was referring to the fact that Category:Fruits is the parent category for (e.g.) Category:fr:Fruits. Take a look at http://en.wiktionary.org/wiki/Category:Fruits, and tell me if that Web-page, as it stands, meets your "English-first" expectations. The screenful of language-specific subcategories pushes the English entries far down on the page; and furthermore, it causes the English entries to be split across two pages (you have to click the "next 200" link), even though there are fewer than 200 of them, so the software would happily show them all on one page if it weren't for the subcategories.

That said, I should point out that there are other ways to address this problem. The current proposal is to split Category:Fruits into a generic Category:Fruits and an English-specific Category:en:Fruits; but an alternative approach might be to split it into an English-specific Category:Fruits and a generic Category:xx:Fruits or something. (That's a bad name, but you see what I'm getting at.) Would that suit your sensibilities better?

BTW, I think sorting under * looks tacky. Personally, I'd prefer something like [[Category:Fruits| en:Fruits]] (using a space instead of an asterisk), which puts it first, before any of the character-headed groupings. But that's a minor thing. :-) It's just too bad that it's not possible to list it both first (for prominence) and in its properly-sorted place (for someone who knows what they're looking for); but then, we can always link to e.g. Category:en:Fruits in the actual template-provided text of Category:Fruits, and then not override the sort order.

—Ruakh_TALK 17:11, 30 August 2009 (UTC)[reply]

I'm with DCD on this. There's no reason to add an extra level of hierarchy for the basic finding of English words in the category tree. Currently, words of our main language fall under a simple hierarchy of topics, and they lead the reader an extra step to the related categories in every other language.

This proposal would remove words from the topic categories, which would be reserved for subcategories only. I think the result would be conceptually more difficult, forcing the reader to make the leap to figure out how the two different types of category pages form a single branch-and-leaf structure.

We should rethink category naming as Ruakh suggests. Our restricted-use labels already categorize tens of thousands of special-vocabulary words. But we completely destroy this lexical categorization by mixing in thousands of thematically categorized words to create a lame Wikipedia-wannabe category tree instead. Using category prefixes may be a way to fix this. —Michael Z. 2009-08-30 17:18 z

I think that Ruakh's view is, as is thankfully often the case, virtually identical to mine, superior where it differs, and much more clearly expressed. That said, I in no way want to imply that encyclopedic categories are necessarily useful. Language, grammar, context, and maintenance categories seem much more clearly in line with being a dictionary. Our other topical categories seem capricious. Our semantic relations framework, supplemented by Appendices and see also links provides a system superior to categories in many regards. DCDuring TALK 19:02, 30 August 2009 (UTC)[reply]

Possibly, the solution does not have to be adding "en:" to all English categories, but my opinion is surely that the main category should be split in two. I was one of the people who had trouble finding English categories for countries (I believe it was "Countries" I was looking for at the time). I had noticed the categorization was following the convention "Category:xx:Countries". When that didn't work for English, I went to "Category:Countries" and at first glance I still could not find the countries in English. Had I scrolled the page to the end, I would have found them... but only after a few head bumps did I find them there. Malafaya 11:47, 31 August 2009 (UTC)[reply]

There is an unrelated problem in play—the poor MediaWiki interface which gives the reader no hint that the page contains a list of entries to be found by scrolling down past the list of subcategories. Even worse, it gives no hint that more subcategories are to be found by clicking through "Next 200" pages listed. We can't really address this here. —Michael Z. 2009-08-31 12:29 z

Wiktionary:Anagrams

I'd like to propose some changes to this. Not only does this totally ignore foreign languages, why not allow diacritics? In Scrabble diacritics are always ignored, because there are no tiles that have them! Only a few languages use diacritics for Scrabble - French and Romanian don't, nor does Italian, and Spanish only has ñ, nothing else. Mglovesfun (talk) 15:13, 29 August 2009 (UTC)[reply]

I'm not responding to your specific points here, but since (accents aside) the definition of an anagram is so algorithmically measurable, I'd like to see this automated. I realise it's a lot of work for something of no use to most readers, though. In the same way that the alphabetical index is periodically generated, perhaps someone could look into a process that determines which (newly added or deleted) words are anagrams of others...? Equinox ◑ 15:39, 29 August 2009 (UTC)[reply]

In French, [7]. Certainly adding anagrams by hand is pretty futile, yeah. On fr.wikt we consider that eéêèëEÉ (etc.) are all e and iîïiI (etc.) are all just i, in terms of anagrams that is. So pâté is a perfectly acceptable anagram of tape. Mglovesfun (talk) 16:06, 29 August 2009 (UTC)[reply]
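As a rough illustration of the convention described above (accented letters counting as their base letter for anagram purposes), here is a minimal Python sketch. The function name `anagram_key` is hypothetical, and the approach (Unicode decomposition, then dropping everything that is not a letter) is an assumption for illustration, not the method fr.wikt or any bot actually uses. It would also fold Spanish ñ into n, which the discussion suggests may be wrong for some languages, and it does not handle multi-glyph letters.

```python
import unicodedata


def anagram_key(word):
    """Return the sorted, lowercased base letters of `word`.

    Diacritics, spaces, and punctuation are ignored, so words that use
    the same base letters share the same key.
    """
    # NFD splits "é" into "e" plus a combining accent; the combining
    # accent is not alphabetic, so the isalpha() filter drops it.
    decomposed = unicodedata.normalize("NFD", word)
    letters = [c.lower() for c in decomposed if c.isalpha()]
    return "".join(sorted(letters))


# Words with equal keys are anagram candidates under this convention.
same_key = anagram_key("pâté") == anagram_key("tape")
```

Under these assumptions, two words are anagram candidates when their keys are equal and their spellings differ; whether identical-letter, identical-order pairs count is a separate policy question.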

It would make sense not to add anagrams directly to entries, but instead either to have a category for each one, Category:English anagrams of aenv, or (I prefer) a template that is included on each entry, {{anagrams:en/adeht}}. It makes sense, in this case, for the title to be included in the template, as that way the edit link will point to the correct place. I have created that template and added it to the two anagrams death and hated. While creating these templates and adding them to entries with a bot is very doable, it is not totally trivial and we need to work out (probably on a per-language basis) what to do with diacritics, punctuation (and clicks), how to deal with Mapudungun (and any other language that has two separate writing systems using the same set of letters), what to do with multi-glyph letters (does the Hungarian cs just count as c + s?), and whether the phrases created must be dictionary entries (i.e. should "the da" and "Ed hat" also appear on {{anagrams:en/adeht}}?). Given the large number of entries that may have anagrams, it seems to me that we should amend the WT:ELE example of a vertical list for them and have a horizontal list instead (I doubt there are many words for which this list is huge). I might start having a go at doing this with User:Conrad.Bot, given that I already have the word lists, though I will wait for further comments and a VOTE before editing in earnest. Conrad.Irwin 16:06, 30 August 2009 (UTC)[reply]

About diacritics and multi-glyph letters for anagrams, no general rule can be defined: in French, diacritics are traditionally ignored for anagram purposes, but the tradition might be different in other languages.

About phrases, I would include everything included in the Wiktionary, e.g. bien sûr but not Ed hat. I would make an exception for famous anagrams (e.g. un veto corse la finira for Révolution française, but without linking this anagram sentence, of course). Lmaltier 16:19, 30 August 2009 (UTC)[reply]

Other things to not include are misspelling entries, and (presumably) entries for Abbreviations, Acronyms and Initialisms? Conrad.Irwin 17:14, 30 August 2009 (UTC)[reply]

I agree for misspellings, of course. But why not abbreviations, acronyms...? If the reader is not interested, he can skip them... Lmaltier 17:19, 30 August 2009 (UTC)[reply]

No reason really, I just don't count them as "real words" :). The other issue is for alternatives like cafe and café where it is not clear from the entries pages which spelling is preferred. It would seem strange to list them as "anagrams" of each other, but then maybe we'd want to do that for linking purposes. For pages like co-operate and cooperate, the bot can detect the {{alternative spelling of}} template and not include the alternatives when they use the same letters in the same order. Conrad.Irwin 17:17, 31 August 2009 (UTC)[reply]

Or links s.v. =Alternative spellings/forms=? In any event, I like the idea of automating this.—msh210℠ 18:21, 31 August 2009 (UTC)[reply]

Anagrams imply a different order. café is not an anagram of cafe. Lmaltier 18:26, 31 August 2009 (UTC)[reply]

Ok, how do {{anagrams:en/adeht}} (death, Death, hated), {{anagrams:en/acef}} (cafe, café, face) and {{anagrams:en/eft}} (eft, EFT, FET) look? Conrad.Irwin 01:06, 4 September 2009 (UTC)[reply]

At fr.wikt, we use a single line when several words are the "same" anagram. An example in fr:écran:

Lmaltier 05:27, 5 September 2009 (UTC)[reply]

I've now done that, the only issue remaining with my implementation (for English anyway) is that it lists theatres of war and theaters of war as anagrams (it excludes theater of war and theatre of war as they are clearly marked as alternatives). I presume this isn't too much of a problem, and the system has the ability to be manually overwritten (using the templates {{include anagram}} and {{exclude anagram}} in the created templates). Any other suggestions? Conrad.Irwin 23:10, 10 September 2009 (UTC)[reply]

Why consider that theatre and theater are not anagrams (the letters are the same, in a different order)? Is the rule traditional for English anagrams? Lmaltier 21:25, 12 September 2009 (UTC)[reply]

They are anagrams, even though they are different spellings of the same word. I think we should certainly have them listed as such. Equinox ◑ 21:44, 12 September 2009 (UTC)[reply]

Since this is, hopefully, going to be added to the majority of English entries, I'd prefer switching to a horizontal layout. We could use parentheses to group words differing only by diacritics. So Lmaltier's example would be "* ancre (ancré), caner, carne (carné), cerna, crâne (crâné), ... ". The word left before the parentheses could be the one that would come first in an alphabetical sort. This would be the form w/o diacritics, if one exists. Is there much support for this? It would eventually take a vote to change the WT:ELE. --Bequw → ¢ • τ 19:03, 12 September 2009 (UTC)[reply]
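The horizontal layout proposed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `strip_diacritics` and `format_anagrams` are hypothetical, "alphabetical sort" is approximated by plain code-point ordering of precomposed (NFC) strings, and nothing here reflects how {{anagrams}} is actually implemented.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word):
    """Remove combining marks so 'ancré' and 'ancre' compare equal."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def format_anagrams(words):
    """Render a one-line list, grouping words that differ only by diacritics.

    Within each group the word that sorts first is shown as the head and
    the remaining variants follow in parentheses.
    """
    groups = defaultdict(list)
    for word in words:
        groups[strip_diacritics(word)].append(word)

    parts = []
    for base in sorted(groups):
        head, *rest = sorted(groups[base])
        parts.append(f"{head} ({', '.join(rest)})" if rest else head)
    return ", ".join(parts)


line = format_anagrams(
    ["carné", "ancré", "crâné", "caner", "ancre", "cerna", "crâne", "carne"]
)
```

With the sample words from the discussion, `line` comes out in the same shape as the quoted example; a real implementation would need a language-aware collation rather than code-point order.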

Ok, I'll just include them then, there are around 75000 templates I could create, so yes they'd be on most entries. The format could be changed to the one you describe just by editing {{anagrams}}. Conrad.Irwin 22:08, 12 September 2009 (UTC)[reply]

Please indicate your opinion at Wiktionary:Votes#User:Conrad.Bot_to_do_anagrams. Conrad.Irwin 22:25, 13 September 2009 (UTC)[reply]

This has now started. If you notice problems, please let me know ASAP. It will take it about a week to do English. Conrad.Irwin 14:50, 26 September 2009 (UTC)[reply]

Database of English words

I am currently in the last phase of development of my mathematical software AlgoSim II [8] (software quite similar to Wolfram's Mathematica in many respects). One feature I would like to add to the application is the ability to search an English dictionary for words, definitions, synonyms, etc. Of course, Wiktionary is the best choice when it comes to the source of the data. What I really would like is a plain-text UTF-8 file with all English words at en.wiktionary.org, in a simple-to-parse format, e.g.

word1(SOME PRIVATE-USE CHARACTER)def1

word2(SOME PRIVATE-USE CHARACTER)def2

⋮

wordN(SOME PRIVATE-USE CHARACTER)defN

where $N\in \mathbb {Z} ^{+}$ is the number of such words in the dictionary. Is there such a file already available? If not, how can I create one? --Andreas Rejbrand 12:55, 30 August 2009 (UTC)[reply]
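To make the requested file layout concrete, here is a small Python sketch of writing and reading it. The choice of U+E000 (the first code point of the Unicode Private Use Area) as the separator and the function names `dump_entries` and `load_entries` are assumptions for illustration; the request above only says "some private-use character".

```python
# Assumed separator: U+E000, a private-use code point that should not
# occur in ordinary dictionary text.
SEP = "\uE000"


def dump_entries(entries):
    """Serialise (word, definition) pairs, one pair per line."""
    return "".join(f"{word}{SEP}{definition}\n" for word, definition in entries)


def load_entries(text):
    """Parse the format back into (word, definition) tuples.

    Lines without the separator are skipped rather than treated as errors.
    """
    pairs = []
    for line in text.split("\n"):
        if SEP in line:
            word, definition = line.split(SEP, 1)
            pairs.append((word, definition))
    return pairs


sample = [("cat", "A domesticated feline."), ("dog", "A domesticated canine.")]
round_trip = load_entries(dump_entries(sample))
```

A word with several senses would simply appear on several lines in this layout; whether that is desirable is left open, as in the original request.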

If you only want words and definitions --as in your example -- it's fairly easy to create such a file from the database dump. Some cleanup is likely to be required, though; how much will depend in part on how you want to handle inflected forms, alternative spellings, and so forth.

On the other hand, if you want synonyms and other relations, it gets a lot hairier -- for example, you have to decide whether to associate synonyms with words or with senses, and if with senses, how to handle cases where glosses are unavailable or inadequate. -- Visviva 14:23, 30 August 2009 (UTC)[reply]

Also bear in mind that the contents of the dictionary are constantly changing (more words added, and the occasional nonsense word deleted) so you can't just do this dump once and rely on it forever. Equinox ◑ 14:28, 30 August 2009 (UTC)[reply]

Thank you for your comments. I think the structure displayed above would be enough (that is, no synonyms). My best idea right now is to create a command line utility that reads the Wiktionary *.xml file, and creates the desired plain-text file. But the task becomes nontrivial unless all articles have the same, valid structure, which is rather unlikely. I guess I can simply ignore the incorrectly-formatted articles, though... --Andreas Rejbrand 16:54, 30 August 2009 (UTC)[reply]

The format is good enough for getting the definition line (well, in well over 99.99% of entries). You simply need to find every line between ==English== and the next ---- that starts with a "#" but does not start with "#:" or "#*". Translating that into a useful definition is harder, many entries have "definitions" such as '# {{plural of|fudge}}' or (even worse) '# {{misspelling of|collaborate}}' which you will need to work out how you want to deal with. There are also context tags (at the start of a definition line) which may or may not have the same label as the wikitext. (i.e. {{Scotland}} renders as (Scotland)). If you come up with something good to do the definition-line -> definition, then please let us know! Conrad.Irwin 17:40, 30 August 2009 (UTC)[reply]
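The rule described above (take lines between ==English== and the next ---- that start with "#" but not "#:" or "#*") might look roughly like this in Python. The function name is hypothetical, and two details are assumptions added here rather than part of the description: the scan also stops at the next level-2 language header in case the ---- divider is missing, and sub-sense markers such as "##" are treated like "#". Templates are left untouched, since turning them into readable text is the hard part noted above.

```python
import re

# Matches level-2 headers such as "==English==" but not "===Noun===".
LEVEL2_HEADER = re.compile(r"==\s*([^=].*?)\s*==")


def english_definition_lines(wikitext):
    """Yield raw definition lines from the English section of an entry."""
    in_english = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        header = LEVEL2_HEADER.fullmatch(stripped)
        if header:
            if in_english:
                break  # assumption: another language section has started
            in_english = header.group(1) == "English"
            continue
        if not in_english:
            continue
        if stripped == "----":
            break  # end of the English section
        marker = re.match(r"#+", stripped)
        if marker is None:
            continue  # not a numbered-list line
        rest = stripped[marker.end():]
        if rest.startswith((":", "*")):
            continue  # example sentence or quotation, not a definition
        # Works whether or not a space follows the "#".
        yield rest.strip()


sample = "\n".join([
    "==English==",
    "===Noun===",
    "# A sweet confection.",
    "#: I ate some fudge.",
    "#* 1999, a quotation",
    "#{{plural of|fudge}}",
    "----",
    "==French==",
    "# Not English.",
])
definitions = list(english_definition_lines(sample))
```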

You can save yourself a lot of grief by using the xmlreader module from Pywikipedia. Instead of messing around with saxutils or whatever, you can just download the dump and --without even unzipping it -- write:

import xmlreader  # xmlreader.py from Pywikipedia

dump = xmlreader.XmlDump("wikt.bz2")  # reads the compressed dump directly
for entry in dump.parse():
    pass  # ...do something with entry.text and entry.title...

Conrad is right (of course) about templates and the like, but there are quick-and-dirty ways of getting messy but usable output. I'm currently running (approximately) the code in User:Visviva/sloppy.py, will post the result off-wiki once it's done. Output looks decent so far; there are some catastrophes like noun phrase (now fixed), where someone had helpfully numbered the examples, but not too many of such. I'm thinking this would be a useful thing to run on a regular basis for each language. -- Visviva 18:49, 30 August 2009 (UTC)[reply]

Thanks for the clue about dumps. Lmaltier 21:14, 30 August 2009 (UTC)[reply]

You're welcome. Never would have known about xmlreader myself, if Robert hadn't shown the way. -- Visviva 08:13, 31 August 2009 (UTC)[reply]

Here is a rough cut, after some very minimal cleanup (has some chopped lines due to sloppy coding on the first run). If you slice out the lines enclosed entirely in parentheses (form-ofs), and anything that has "(" after the beginning of the sense line -- and also anything with an invalid part of speech -- you'll still have a very large number of mostly-usable definitions. I would also remove all the proper nouns (or at least any containing templates/parentheses), but YMMV.

Somewhat off topic, it seems like emulating Special:Expandtemplates, using a DB dump, should not be an impossible task in Python. Anybody know of an already-written function that does this? I know there's mwlib, but it seems to require that you install every library on the planet. -- Visviva 08:13, 31 August 2009 (UTC)[reply]

Thank you very much! This is great! I will scrutinize the data when I get more time. --Andreas Rejbrand 10:44, 31 August 2009 (UTC)[reply]

Besides problems with {{tags}}, I found this line:

GTA Initialism rand Theft Auto

Apparently, a "G" is missing. (By the way: is GTA, as in the game, really appropriate for a dictionary?) --Andreas Rejbrand 12:46, 31 August 2009 (UTC)[reply]

Hilbert Proper Noun (surname, from given names, dot=) derived from a (etyl, enm) given name of (etyl, gem) origin, _hild_ + _berht_.

--Andreas Rejbrand 12:54, 31 August 2009 (UTC)[reply]

Yes, I made a foolish assumption when first running the code, so the first letter of any definition that did not have a space after the "#" was zapped. I will fix this when I run it again (which I think I will do in any case, since this seems like a useful thing for us to have) ... but as it took about 15 hours to run on my little machine, it might be a few days before I can swing an update. Feel free to take the code and run with it, if you need something in a hurry.

The templates that have recently been introduced for surname and given name entries are parameter-heavy, so my little cheat of replacing the "{{}}" with "()" doesn't work at all. This is why -- at least in this iteration -- it would be necessary to slice out the proper nouns, or at least anything that contains "(surname" or "(given name". Similar problems apply to some other, less-common templates. On the plus side, I have just put together a Python function that will render {{surname}} accurately (though it will still fail badly on more complex templates).

As for GTA, I can only say that our inclusion policy for initialisms has been a bit ... odd. We wouldn't accept the actual game as an entry. -- Visviva 15:07, 31 August 2009 (UTC)[reply]

Here is a more satisfactory version, I think: [9] (about 8.5 megs zipped). Still got a ways to go, but it looks pretty usable overall, if you don't mind the usual wiki palimpsest effect. Known issues include the fact that the special properties of some context templates are not accounted for, so one gets "(usually, informal)" rather than "(usually informal)". This can be fixed easily, but I've run out of time again.

Significant further improvements on my end will depend on my figuring out how to get Python to render our more omnipresent meta-templates, particularly {{form of}}. Any help would be appreciated (or maybe I should just scotch the idea and hard-code them, as I did with {{wlink}} and {{isValidPageName}}).

NB: there is a PHP script that can be used to create static HTML dumps of a wiki (that one has loaded onto a local server), from which extracting fully-formatted definitions would be trivial. The copy on svn.wikimedia.org seems to have gone AWOL, however.

Anyway, this is a very satisfactory way of looking over a slice of our content, IMO. Could be useful for QC, if anyone is inclined. -- Visviva 15:29, 3 September 2009 (UTC)[reply]

Now updated from the latest dump. Fun fact: when pasted into MS Word in a suitably dictionaric 8-point font and 2-column layout, this takes up 4,648 pages. -- Visviva 04:15, 4 September 2009 (UTC)[reply]

Have you made any progress in the last few weeks? Perhaps the best thing would be if there was a command-line utility that accepts a database dump and creates the required file. Then anyone could use his/her own CPU time to parse the data, and update the dictionary whenever he/she wants. Or, even better, there could be an integrated CGI application in Wiktionary, that creates the file perhaps once a month. --Andreas Rejbrand 08:01, 25 September 2009 (UTC)[reply]

Some progress has been made; see current file (based on the 9/17 dump). I have been using this to create User:Visviva/Page of the day; it is reasonably clean (IMO), but still not exactly perfect. I am thinking that a really satisfactory approach is going to have to involve working from a static HTML dump. I hope to have such a dump ready by next week (requires a bit of setup, since I can't really use a computer that I'm using for anything else). Of course it may turn out to be more problematic than I imagine...

At any rate, I daresay it still isn't anything a professional programmer would want to be seen in public with, but User:Visviva/transclusion.py is now running to completion in a couple of hours on my laptop, so time isn't a big issue anymore. "transclusion.py <name of dumpfile>" should do the trick, should you care to take it for a spin. It does require Pywikipedia (specifically xmlreader.py) -- I suppose it wouldn't be too hard to make it self-sufficient, but it didn't seem worth the hassle at this juncture. -- Visviva 08:58, 25 September 2009 (UTC)[reply]

Have a look at dict.zip @ privat.rejbrand.se. Unpack the compressed archive and run the executable. --Andreas Rejbrand 00:36, 7 October 2009 (UTC)[reply]

shut the fuck up [rfd'nd

aLEARNER'dbe abl2CURSORthis>dropdown w/tr-l/def etc[acc.2hisPREFS]>pl GETRID ofthe sop-foibl [we rNOTpaper-basd n'dHELP USERS!--史凡>voice-MSN/skypeme!RSI>typin=hard! 17:13, 30 August 2009 (UTC)[reply]

Start a vote to eliminate the rule you don't like. Complaining here will achieve little. Equinox ◑ 17:20, 30 August 2009 (UTC)[reply]

Split communal discussion pages by month

I propose that this page, and WT:TR, WT:GP use subpages to stop the pages becoming unbelievably long. The process is also very simple. For September, you just add {{/September 2009}}. The advantages are dividing the page up more, plus to shorten the page, you just remove the link and it disappears - it's already archived without anyone doing anything!

There's a major drawback with doing this with deletion and verification pages, because the whole thing gets archived at once, including ones that haven't actually been closed. Mglovesfun (talk)

Support. -- Visviva 08:22, 31 August 2009 (UTC)[reply]
Support for BP, GP, TR, but not for RFV, etc. per nom. -Atelaes λάλει ἐμοί 08:59, 31 August 2009 (UTC)[reply]
I'd prefer to do this on a per-topic basis as we do for WT:VOTE, per month is arbitrary and has the disadvantage that if you get a link to WT:BP#title you don't know where to start looking (if it's been archived). For RFV and RFD I'd like an approach similar to WT:ES, though I appreciate the wiki-markup is ugly - it makes indexed archiving so easy. Conrad.Irwin 09:07, 31 August 2009 (UTC)[reply]
Agreed, but is there a way that we can make this remarkably simpler for an editor? If not, the point is moot? -Atelaes λάλει ἐμοί 10:09, 31 August 2009 (UTC)[reply]
Yes, it would take either some javascript (i.e. make the (+) links on {{rfv}} etc. do the transclusion onto the main page, and load up the talk page with the boilerplate already present, and do something similar with the new section link on WT:BP) or use a bot that detects the addition of a normal section to any of these pages and sub-pages it automagically. Conrad.Irwin 17:13, 31 August 2009 (UTC)[reply]

Could there be some other approach to this? Could a bot automatically archive a TR, BP, GP, ID, or Feedback topic once there has been no discussion for 30 days (or some other period)? RfV and RfD just seem to need a different approach or at least an adjusted one. Should a discussion inactive for a period of time be moved to an "active archive" with the topic heading remaining on the main page with a link to the active archive? A brave soul might try to restart the discussion on the main page by attempting a brief summary of the previous discussion and adding any new insight thereto. The headings might bear the date of the last archived comment.

I think Connel used to run one of these, it shouldn't be too hard if someone wants to implement it; given that everyone leaves the latest timestamp on their edit. Conrad.Irwin 17:13, 31 August 2009 (UTC)[reply]
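For what it's worth, the "latest timestamp" idea can be sketched in Python along these lines. This is an assumption-laden illustration, not the bot Connel ran: the function names are hypothetical, the regular expression assumes signatures of the form "17:13, 31 August 2009 (UTC)" as seen on this page, and a section with no recognisable timestamp is conservatively treated as not archivable.

```python
import re
from datetime import datetime, timedelta

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Assumed signature timestamp format: "17:13, 31 August 2009 (UTC)".
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}), (\d{1,2}) ([A-Z][a-z]+) (\d{4}) \(UTC\)")


def latest_timestamp(section_text):
    """Return the most recent signature time in a section, or None."""
    stamps = [
        datetime(int(year), MONTHS[month], int(day), int(hour), int(minute))
        for hour, minute, day, month, year in TIMESTAMP.findall(section_text)
        if month in MONTHS
    ]
    return max(stamps, default=None)


def is_inactive(section_text, now, days=30):
    """True if the newest signed comment is more than `days` days old."""
    latest = latest_timestamp(section_text)
    return latest is not None and now - latest > timedelta(days=days)


section = (
    "First comment. Conrad.Irwin 17:13, 31 August 2009 (UTC)\n"
    "A reply. DCDuring TALK 18:37, 31 August 2009 (UTC)\n"
)
```

A bot built on this would still need the monitoring and recovery work asked about in the discussion; the sketch only covers deciding which sections are stale.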

What would be involved in said implementation? Is it a question of finding what CM ran or of developing the bot? Of monitoring the bot, stopping it and calling for help or of constantly recovering from and redesigning in light of major problems? DCDuring TALK 18:37, 31 August 2009 (UTC)[reply]

Wiktionary:Beer parlour/2009/August