Wiktionary:Beer parlour

From Wiktionary, the free dictionary
Archived revision by Ceyockey (talk | contribs) as of 23:28, 21 April 2007.

Policies in development

Full list

  1. Wiktionary:Policies and guidelines
  2. Wiktionary:Assume good faith
  3. Wiktionary:Civility
  4. Wiktionary:No personal attacks
  5. Wiktionary:About Japanese-English bilingual
  6. Wiktionary:Neutral point of view
  7. Wiktionary:Obsolete and archaic terms
  8. Wiktionary:Entry layout explained/POS headers
  9. Wiktionary:Redirections
  10. Wiktionary:Spelling variants in entry names
  11. Wiktionary:Translations & /Wikification
  12. Wiktionary:Transliteration
  13. Wiktionary:Usage notes
  14. Wiktionary:Bots
This discussion page is automatically archived by Werdnabot. Any sections inactive for 28 days are automatically archived to Wiktionary:Beer parlour archive/2024/June.

Connel's essay in response to Dmh's statements on deletion

OK, now you've ruffled my feathers. Since I intend to start writing the book I've outlined, I need practice at being verbose, so I'll practice my prose with a chapter here. For good measure, I cleared out/archived 185 KB of text, so I'll have room for all these ones and zeros here.

What is Wiktionary?

That actually is a very good question. No one really knows the answer, though. The answer seems to be a dictionary that represents the minds of the collective contributors at any given point in time. If only vandals are here, then yes, it will quickly become another Urbandictionary. To be fair, Urbandictionary today bears little resemblance to Urbandictionary a year ago. But I'm sure you can understand that if left unchecked, Wiktionary would rapidly decline.

But what keeps real contributors here? Is it the notion that Wiktionary will grow into the dictionary each of them wants to tell their grandkids they helped write? Does it purport to be some massive force for social change?

This is the left arm of Wikipedia. An encyclopedia simply cannot, and should not, cover the needs that a dictionary does. But a universal reference needs a dictionary component. Most people looking something up can't grok the idea of an encyclopedia containing dictionary definitions; their brains automatically want to see that stuff lumped into a dictionary.

Back to the social change stuff for a moment... Wikipedia, in and of itself, is becoming a force of some sort. More and more people turn to it first when they want to understand something. So what is changing? The role of publishers? I can't imagine they are particularly pleased with that prospect.

So, off on a slight tangent: have you checked how much it costs to access the OED online these days? $382.75. In North America, buying an annual subscription through your local library is only $195.00. You don't get a copy of it or anything; you just get the ability to search their references and read individual items. Granted, that is a lot less than it was a year ago, but still... that is not chump change. Limited (i.e. incomplete) editions are available free of charge through most libraries.

Oddly, m-w.com still doesn't charge for general access, but does charge $29.95 for annual access to the unabridged version. Nor does dictionary.com (though e-reference can be downloaded for $34.95, AHD for $26.00, Cult lit. for $29.95, etc.). Cambridge is free, or £21.00 to buy; Bartleby is free to access, or $60.00 for hardcover. Instead, both bombard you with advertisements, many of which make it past the various filters made to combat such nonsense. But how long can even that last?

So, um, wait a second. Why are we here again? I think it is the realization, shared by all contributors, that such free access is extraordinarily precarious now. Any day, "they" (you know, them - "Them" - the big meanies) may decide it is time to start charging for access. All of the serious dictionary publishers have already made their attempt at going online. From their perspective, there is no more "adaptation" to new environments that they can do.

You and I, we know better. Free content is not just free, it is liberating.

Do I want to see a replacement for other dictionaries? No. I do want Wiktionary to be equivalent (or better). I get the feeling that most contributors here feel the same way.

Now, does being a "multilingual dictionary" make us better? Absolutely not. Lookups are astronomically harder; glosses are much more susceptible to being split in deference to other languages; translations clog up the works to no end. Technical aspects, like simply rendering the alphabet, are no longer simple. But who am I to say? The decision was made by this community long before I ever heard of wikiAnything. And although I only listed the glaring defects, there are also benefits.

Being a multilingual dictionary gives tremendous insight into etymology and cognates. Having everything as a single search means the information you are looking for, about an obscure Greek term, is right at your fingertips. The lack of language separation at the software level has had direct benefits on the English side as well. It has forced us into listing all forms of a word, which really is a good thing.

Such a technical marvel has been inconceivable for as long as dictionaries have existed. Look at the other online dictionaries... they still don't get it. Instead of listing the entire word as spelled, they list the suffixes for a given headword. They could spell them out, but they are so set in their ways that they refuse to.

But back on topic: our immediate international reach has allowed other miracles, like the list of French Wiktionnaire's English terms that aren't yet in the English Wiktionary. The German and Dutch word distinctions have forced us to explain many entries in ways we, as native speakers, would never have imagined. And words like uncle, which have only a single meaning, are suddenly clarified beyond imagination. (Yes, that is both good and bad.)

So why is being a multilingual dictionary so bad? It doesn't meet the expectations of our readers. As calcified as the dictionary publishers seem to be (to me), the readership is an order of magnitude more guilty. So our software here has to accommodate both the lay reader and the hard-core linguist. (The recent PIE vote would be a good example of that.)

Of all things, I think our readers are of the most concern. No one wants to open a dictionary and see goatse. No one wants to look up a term and find an obscure, deranged S&M re-definition of a normal word. No one wants to be redirected to a made-up "phobia" that describes a fear of the word they are looking up. No one wants to find out that a Pokémon character used the thing they are looking up in episode 827.

And no one wants to be told the wrong way to spell a word (especially on a close-match lookup). People do want to know that they are using the right word, spelled the right way. Some authors want to know that the obscure word they are using is acceptable. Authors, in particular, know perfectly well when and how they can go outside the bounds of strict, formal, correct usage.

Are we currently building a usable dictionary?

Now, let's look for a moment at the contributors. Who do we have? Not many people are desperately interested in the grunt work of composing accurate and consistent definitions. Instead, we tend to get a lot of people stopping in with a strong desire for world recognition.

Turning en.wiktionary.org into an intellectual pissing contest (scenarii) is not exactly productive.

Other contributors wish to provide the terms relevant to their "vertical segment" in an increasingly popular dictionary. Others are here just because they are baffled that we don't have entries for their favorite terms. Many others seem to unconsciously think this is, or should be, a slang dictionary.

None of those groups is interested in the day-to-day grunt work of building a usable dictionary. Occasionally an individual in one of those groups is, but by and large, those stereotypes are nearly the opposite of what en.wiktionary.org needs.

So, back to what is Wiktionary. Or rather, what should it be.

I, for one, am embarrassed by the enormous number of words we have that do not appear in any other general-purpose dictionary. I, for one, am embarrassed by the enormous number of words that appear here that are universally thrown out as spelling errors elsewhere. I am astonished that, for the most part (except for a tiny handful that I've marked), those errors have all the appearance of "valid" words. Such short-sightedness makes a task such as building a spellchecker from Wiktionary nigh impossible.

Should we throw up our hands, as Dmh suggests, and allow a free-for-all? Or should we get serious about building a real, usable dictionary that can be looked at as an historic achievement? We have an opportunity here to provide the world with a usable, copyleft dictionary.

Now, before I start on chapter two ("How to get there via a multi-level Wiktionary"), I'd like to take a quick straw poll. --Connel MacKenzie 00:42, 19 January 2007 (UTC)

Direction

  • Wiktionary should be a usable, "real" dictionary with nonsense and slang kept out of the main namespace.
  1. --Connel MacKenzie 00:42, 19 January 2007 (UTC)
  2. --Versageek 01:17, 19 January 2007 (UTC)
  3. --Cynewulf 02:18, 19 January 2007 (UTC)
  4. --DAVilla 04:56, 19 January 2007 (UTC) Except that I'd challenge your notion of a real dictionary, since even respectable dictionaries have slang; and as long as "nonsense" is defined objectively.
    I was intentionally vague, so that I'd have something to say for chapter two. :-) But then, that will just be a rewrite of the "multi-level Wiktionary" thing, which I've gone on about elsewhere before. --Connel MacKenzie 05:55, 19 January 2007 (UTC)
  5. --Jonathan Webley 07:26, 19 January 2007 (UTC). Delete the nonsense, but keep the slang.
    Perhaps you should move your vote down then. I am suggesting slang be eradicated from namespace zero, but remain searchable as full entries, e.g. "Slang:bitchin". --Connel MacKenzie 17:20, 19 January 2007 (UTC)
    As you can see from my delete history, I'm not in the free-for-all camp. To be honest, I'll need to see the slang namespace in action before I can be certain whether I agree with it or not. Jonathan Webley 11:44, 20 January 2007 (UTC)
  6. --Enginear 15:06, 19 January 2007 (UTC)
  7. --Jeffqyzt 16:58, 19 January 2007 (UTC) Agreeing with the bold text, assuming that the non-bold is merely commentary stating Connel's POV. Otherwise, the two options are equally disagreeable.
    Perhaps you should move your vote down then. Yes, it is an attempt to clarify my POV, for my little straw poll here. --17:20, 19 January 2007 (UTC)
  8. --Cerealkiller13 21:01, 19 January 2007 (UTC) (Let me be quite clear that I think this is the better option of the two, but I am not advocating it as a Wiktionary policy.)
  9. —Stephen 23:20, 19 January 2007 (UTC)
  10. I am a dictionary editor and this is my manifesto? - [The]DaveRoss 17:21, 23 February 2007 (UTC)
  • Wiktionary should be a free-for-all with no [delete] or [move] buttons for anyone, whatsoever.
Please do not add other choices. Neither goal is likely; I'd simply like to know what the general desire actually is.
OK, then I can't answer yes to either of the above. I like a lot of the points you make above (though I'm still more with Hippietrail on making better use of available technology). I also completely believe you offer the choice in good faith, but it's still a false dichotomy. Wiktionary should be a usable, "real" dictionary. But real dictionaries include slang, and "nonsense" is like obscenity — you know it when you see it. Which is why ...
To be usable, Wiktionary needs consistent rules and they need to be consistently applied. It can't be a free-for-all. Back in the day we'd argue over whether a particular made-up word, and I mean something that the contributor said they'd made up out of whole cloth, should be in the main namespace or not. Eventually we decided on LOP, but then we had to explain why someone's baby ended up there instead of in the main space. It was on their web page after all.
It was about that time that CFI got larger — at least one person said too large — and a hell of a lot less vague. Now when someone introduces a made-up word, we can say "Nope, sorry, fails independence and attestation. LOP." Game over. Done. Next customer, please. This is progress.
I share your concern, to some extent, about filtering out garbage. I'm not concerned about (IMHO) silliness like scenarii, ingenuitive and the "I don't like it" sense of illiteracy. I'm not greatly concerned about vulgarity, profanity and internet-flavor-of-the-month. As a word geek, I'm much less bothered than most where a citation comes from, as long as it's durably archived and it's clear the speaker is using the term in question in earnest.
I am, however, concerned about two things that I believe you're also concerned about:
  • Preserving the information that (however silly the reasons) some terms will give people the impression you're stupid. Similarly, some (for generally more valid reasons) will offend. Also, some are only used formally and some are never used formally and some are in between. We need to say this, but without taking a POV as to whether any of this is justified.
  • Noting which spellings are commonly accepted and where, which ones are used infrequently but not considered outright wrong, and, within limits, which ones are commonly used but commonly considered wrong. We want to do this without letting in a full entry for every conceivable spelling of every conceivable word.
I think the solution to all but the very last item is, "Include, but mark." We can have (and have had) a lot of fun arguing over just how to mark things, but they need to be marked. The entry for scenarii needs to say that it's only used in particular technical contexts and that scenarios is overwhelmingly common elsewhere. The entry for ingenuitive might say that the very similar ingenious is much more common. The entry for your favorite obscenity should be marked as such and given as prudent a definition as will convey the meaning concisely.
All of these should be codified as general rules to be applied in such cases, not just done ad-hoc. They should be codified for the same reason that we codified handling of made-up words. It will make our life simpler and Wiktionary better.
The last item is harder. Plenty of bogus spellings will pass the current CFI, which is more aimed at filtering out made-up words. Actually, I think I have a proposal, but I'll give it separately. Even if that doesn't pan out, we need some sort of well-defined rule.
If I've been expressing this well at all, it should be clear that I'm in no way advocating a free-for-all. If you re-read most of my complaints over time, you may find that they suddenly make more sense if viewed as "this is not following any consistent rule" and not as "we need to let in anything, from anyone, anywhere, any time" or "I'm just rattling cages" (I'm very seldom just rattling cages :-).
You can't just fiat "no nonsense shall appear in the main namespace of Wiktionary." You have to give clear, objective criteria. Otherwise you do get a free-for-all. -dmh 04:00, 19 January 2007 (UTC)
Having taken the trouble to look it up now, I'm no longer bothered by ingenuitive or of the opinion that ingenious should be offered as an alternative. They're two different words. I'm not sure what a better example would be above, so please pretend that ingenuitive is just an odd variant on ingenious. -dmh 05:00, 19 January 2007 (UTC)
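As an illustration of why "include, but mark" need not conflict with the spellchecker concern raised at the top of this thread, here is a minimal Python sketch. It assumes a hypothetical data model in which every entry carries a set of usage labels; the `Entry` class and the label names "misspelling" and "scanno" are invented for this example and are not an existing Wiktionary schema. A spellchecker word list can then skip entries carrying rejecting labels instead of requiring them to be deleted.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical entry record: a headword plus its usage labels."""
    headword: str
    labels: set = field(default_factory=set)


# Labels that keep a headword out of a spellchecker's accepted words (assumed names).
REJECT_LABELS = {"misspelling", "scanno"}


def spellcheck_wordlist(entries):
    """Return, sorted, the headwords a spellchecker should accept."""
    return sorted(e.headword for e in entries if not (e.labels & REJECT_LABELS))


entries = [
    Entry("scenarios"),
    Entry("scenarii", {"technical"}),  # included, but marked
    Entry("niany", {"scanno"}),        # searchable, yet never accepted as correct
]
print(spellcheck_wordlist(entries))  # ['scenarii', 'scenarios']
```

Marked-but-valid entries such as scenarii stay in the list; only the labels chosen as rejecting remove a word, so the rule is stated once rather than applied ad hoc.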

A dictionary which excludes slang is neither "useable" nor "real". As for "nonsense", well, everyone agrees that that should be kept out, but the point is that everyone has a different idea of what constitutes nonsense. Widsith 17:12, 20 January 2007 (UTC)

I agree. That is why I said "kept out of the main namespace" (ambiguously). To be less ambiguous, I am suggesting a "Slang:" namespace (fully searchable, but marked as slang by entry title). --Connel MacKenzie 22:24, 20 January 2007 (UTC)

We can still describe slang words in Wiktionary without any of the usual crap found on Urban Dictionary. I'm inclined to agree with Widsith on this, but no free-for-all though. --Williamsayers79 18:13, 20 January 2007 (UTC)

Yes, in the longer proposal, I'd shunt such entries to "Vulgar:" or "Obscene:" namespaces. Since this is all so hypothetical, no real discussion of what the exact namespaces will be has even started. But with positive feedback, I think I will propose a bunch. --Connel MacKenzie 22:24, 20 January 2007 (UTC)
I'm fine with "include, but mark" as a general approach. Namespaces may or may not work as a practical means of marking. I'm much less interested in the mechanism or even the categories than the rules for categorizing. Once again, rules. I don't think anyone has seriously proposed a free-for-all. -dmh 02:02, 21 January 2007 (UTC)
I regularly comment that there should be some area or areas (probably not in the main namespace) where misspellings, misprints, typos and scannos can be placed, subject to something similar to our present CFI, so they can be searched for. This apparently serious request for a definition shows exactly why. I have yet to hear a good argument why such entries are less useful than "correct" entries, although I accept that we need to find a way to discourage mirroring before we add them. Personally, I think an entry for niany would be more useful than, say, metropoleis. The latter should, IMHO, normally only be used with an audience who understand Greek inflections, since a more commonly understood plural is readily available. Few people would therefore need to look it up in a dictionary; indeed, perhaps none yet have. However, many with limited knowledge of English might be expected to look up the scanno niany, or indeed the scanno bum, as repeated in one of the cites. --Enginear 20:38, 7 February 2007 (UTC)

(Let me clarify that nothing in this paragraph is intended meanly.) OMG thought police OMG! Your absolutist logic makes no sense to me. No reasonable person wants either option at all. Then your points. "No one wants to see goatse": no one will if you watch for vandalism. However, a wiki is a wiki, and you can't change that. "No one wants to see an S&M term": why not? I find your "deranged" descriptor frightening, to be honest, because S&M is a notable subculture, and there's no reason its definitions should not be included. Obviously, a completely bogus phobia should be deleted, and if readers are looking up a real word, it shouldn't redirect to a completely bogus phobia. I think that a Pokémon's name is not a definition and therefore belongs on Wikipedia, which would of course get a transwiki link. I don't think including misspellings is a bad thing either. One of the most confusing parts of your argument is the suggestion that including peripheral information somehow keeps readers from finding what they're looking for. I'm not a regular contributor, but I think Wiktionary needs no general policy revamp, and I'm very concerned that you think so. 66.87.91.36 09:55, 9 March 2007 (UTC) (Also Atropos.)

We can have our cake, and eat the bits we like too

Connel loves black-and-white discussions, doesn't he just! But the world is not black and white; it is more complex. My view is that

  • the Wiktionary database should contain everything possible
  • the reader, on registering, sets their preferences of what they want to see.

You set which languages you want to see. (Maybe you just want an English dictionary, or maybe an English/Hindustani dictionary, or "All Languages" is your interest. In future, maybe a Hindustani/Japanese dictionary.) Maybe you are offended by Vulgar slang, so check the right exclusion box. But then you might be reading a text with a word you suspect is a slang word, and you want to know its meaning. So check the right box to allow all slang. Do you want to see misspellings or not? Maybe you really don't want to see the etymology. I certainly don't, most of the time. Check whether you want to see Obsolete terms or not. Check whether you want to see Protologisms or not. And then change your preferences if your usage changes.

By having this kind of universal-database, many-views approach, we could conceivably keep everyone happy. It is, to me at least, an obvious compromise between universality and personal usability. Certainly, seriously considering this has to beat playing Connel's simplistic black-or-white debate. --Richardb 10:30, 19 March 2007 (UTC)
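Richardb's preference idea can be pictured as a filter applied when an entry is displayed, leaving the stored data untouched. The Python below is only a sketch under stated assumptions: the `Sense` and `Preferences` classes, the language names and the label strings are hypothetical, and nothing here reflects how MediaWiki actually stores or renders entries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sense:
    """Hypothetical sense record: language, definition text and usage labels."""
    language: str
    definition: str
    labels: frozenset = frozenset()


@dataclass
class Preferences:
    """Hypothetical reader settings; languages=None means "All Languages"."""
    languages: Optional[set] = None
    excluded_labels: set = field(default_factory=set)


def visible_senses(senses, prefs):
    """Keep only the senses this reader has chosen to see."""
    return [
        s for s in senses
        if (prefs.languages is None or s.language in prefs.languages)
        and not (s.labels & prefs.excluded_labels)
    ]


pie = [
    Sense("English", "A baked dish of pastry with a filling."),
    Sense("English", "A magpie.", frozenset({"archaic"})),
    Sense("Spanish", "foot"),
]
english_only = Preferences(languages={"English"}, excluded_labels={"archaic"})
print([s.definition for s in visible_senses(pie, english_only)])
# ['A baked dish of pastry with a filling.']
```

Changing a preference only changes the filter, not the stored senses, which is the "everything in the database, many views" compromise described above.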

Actually, I think I devoted a couple thousand words above describing just how much of the gray areas are gray. We should retain all vandalism, people's phone numbers, "JOSH IS GAY" entries? My, that is a new one, even for you. We don't have the technical ability to accomplish what you propose, anyhow. WM lookups are WM lookups...so if the garbage has an entry, it will always count as a direct hit. So the "misspelling vandalism" ends up being very effective. Great. --Connel MacKenzie 23:34, 24 March 2007 (UTC)

Structure of Wiktionary

One of the things I don't understand about en.wiktionary.org - which according to the main page is an English language dictionary - is why the entries define the word for multiple languages. For example, if I look up "hut" then as well as the English word I get translations for the Czech, Dutch and Old High German words "hut". In an English language dictionary it would be useful and interesting to have comparisons to related words in other languages (of the "cf. Dutch hut" type), but surely each language pair should have its own dictionary (in these cases Dutch-to-English, Czech-to-English, and so on)? What is the point of listing potentially completely unrelated words in the same article just because they accidentally happen to have the same spelling? To cope with the occasional event that I might have a word and not know which language it is in, it would be much more sensible to have a "master lookup" feature across all dictionaries. Thus, I type in "hut" and it tells me the word is in the English dictionary, plus the Czech-to-English dictionary, etc. with links. The way it's organised at the moment seems bizarre to me. Perhaps I am missing some fundamental point. Matt 20:52, 16 February 2007 (UTC).
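The "master lookup" Matt describes amounts to an inverted index from headword to the dictionaries that contain it. A minimal Python sketch, assuming each hypothetical per-language dictionary is simply a named set of headwords (the dictionary names and contents are illustrative only):

```python
from collections import defaultdict


def build_master_index(dictionaries):
    """Map each headword to the sorted names of the dictionaries containing it."""
    index = defaultdict(set)
    for name, headwords in dictionaries.items():
        for word in headwords:
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


dictionaries = {
    "English": {"hut", "foot"},
    "Dutch-English": {"hut", "voet"},
    "Czech-English": {"hut"},
    "Old High German-English": {"hut"},
}
index = build_master_index(dictionaries)
print(index["hut"])
# ['Czech-English', 'Dutch-English', 'English', 'Old High German-English']
```

For a headword found in no dictionary, `index.get(word, [])` returns an empty list rather than raising `KeyError`.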

The point you may be missing is that this is a multilingual dictionary. We're striving to contain every word in every language (we still have a very long way to go on that). What distinguishes this as the English Wiktionary is that all of the definitions and explanations are in English. The beauty of this system is that if I, as an English speaker, want to know what the Dutch word "hut" means, I can look it up here and find out. If I look up hut in the Dutch Wiktionary, everything (the definitions, usage notes, etymology, etc.) is in Dutch, and I'll be completely lost. I hope that answers your question. Feel free to ask for further clarification if it didn't. Atelaes 21:23, 16 February 2007 (UTC)
I understand what you are saying, but if I, as an English speaker, wanted to know what the Dutch word "hut" means I would never consider looking it up in the same place as I looked up the definitions of English words. I would be looking somewhere for a Dutch-English dictionary. Perhaps this is just because every other dictionary that I've ever seen works like that. I've never come across anything with the structure of Wiktionary, which is probably why to me it seems so bizarre! Matt 22:00, 16 February 2007 (UTC).
Wiktionary is a Dutch-English dictionary. And a Czech-English dictionary. And an everything-else-English dictionary as well, and also a regular old English dictionary. We're just not constrained by space like those paper dictionaries you're used to, or constrained by lack of imagination like those online dictionaries that only mimic the paper dictionaries. bd2412 T 22:04, 16 February 2007 (UTC)
I remain unconvinced, but I appreciate that others take a different view. Matt 00:51, 17 February 2007 (UTC).
Matt, you are not alone. Hippietrail is putting finishing touches on a "Multi-lingual Wiktionary" extension to the MediaWiki software. I completely agree that lookups (and therefore, also edits) should be restricted to languages a user prefers. A simple note that the word's definition exists in other languages should be more than sufficient. Also of note: the Latin Wiktionary, I believe, used "Wikipedia-style disambiguation" to separate the languages, instead of level two headings. It's the same, only different. --Connel MacKenzie 01:07, 17 February 2007 (UTC)
A further argument could be made against (by default) having translation sections and entries for foreign words. One or the other would lead to much greater consistency. --Connel MacKenzie 01:11, 17 February 2007 (UTC)
How would that help someone who either a) wants to know how to say "foot" in Spanish, and has no idea, or b) comes across the word "pie" in a Spanish essay and wants to know what it means? Would you suggest they look up "pie" in Spanish Wiktionary and hope to find the English translation? bd2412 T 01:24, 17 February 2007 (UTC)
What, what, what? No, click on "Show all languages" or "Show Spanish entries" before pressing [search] for "pie". For a multilingual Wiktionary, many things currently assumed here about the [Go] button would not be (should not be) valid. The way we've done it here so far is neither scalable nor flexible. --Connel MacKenzie 05:53, 19 February 2007 (UTC)
I will be bold and say that nobody (including me) is quite sure exactly how Wiktionary should be organized. From what I've seen, we really have three types of contributors at Wiktionary:
  1. people who create definitions for words
  2. people who obsess over the format of individual entries and/or the organization of Wiktionary as a whole
  3. people who write software to speed up mundane editing tasks
Those are the three big ones. Most contributors tend to specialize in one of the three. I personally believe that we will not be able to know how Wiktionary should be organized until we have created entries for a lot more words.
A-cai 12:23, 17 February 2007 (UTC)
Having gone through phases of each of the three stereotypes listed above, I'm not sure what you're trying to say. I will say that at this point, I do not wish to sit by idly while Wiktionary is turned into an unparsable (programmatically unusable) mess. --Connel MacKenzie 05:53, 19 February 2007 (UTC)
I'm not sure what I'm trying to say either :) The only thing that truly seems to unite the long-term contributors to Wiktionary is the belief that Wiktionary could become something truly ground-breaking. Having now worked on Wiktionary for over a year, I'm no longer under the illusion that it will happen any time soon. The biggest problem that I see is that we have a lot of people worrying about the form of Wiktionary, but not enough who worry about the content. I believe that the reason for this, based on the bilingual people that I have talked to, is that the process for creating entries is still too cumbersome. This is something that we old-timers tend to forget. Wiktionary needs to become much more user-friendly if we are ever going to have a chance of attracting a large number of language enthusiasts (many of whom are not computer savvy).

A-cai 07:50, 19 February 2007 (UTC)

I fully agree with Connel that it would be very nice to make Wiktionary a customizable experience. Most users really only want to see a simple definition of an English word, but I'll be damned if I let Wiktionary be limited to that. The OED online has some hints of what I would like Wiktionary to one day become. It has buttons at the top which allow the user to determine which portions of the entry they want to see. If you want to see the etymology, you click the "etymology" button at the top. Same goes for pronunciation, quotations, etc. I think it would be nice if there were something similar to WT:PREFS on the homepage, where people could set up the default views. Here they could determine whether they want to see etymologies, whether they want to see translations (and if so, from which language(s)), etc. This would certainly require the imposition of a lot of rigid formatting rules, but I think it would make Wiktionary appeal to a broader audience, especially as our articles continue to (hopefully) grow. It certainly is rather intimidating to go to an entry looking for the definition, only to find five pages of text that you have to sort through. But at the same time, I don't want to give up an iota of those five pages. Atelaes 07:36, 19 February 2007 (UTC)
I too like the OED buttons available on the front end accessed from [1]. Unfortunately, it seems that most US libraries have gone for the Oxford Reference front end [2] which does not use those buttons, so people like Connel can't easily see what we're talking about. However, your description seems pretty clear. The only things I can think to add to the description are that on OED the preferences last only for the session, which is a pity, and that the etc in your "same goes for" represents date chart.
Obviously, another useful button for us would be translations; and we need, as Connel has just hinted somewhere, a means of choosing what languages we want to see entries for. It would be an advantage to have the buttons visible on every page (as OED does) rather than having them on the home page, since I find I sometimes alter them a few times during a session, depending what I'm looking for.
I do think it makes sense to have a relatively high proportion of editors caring about setting the style while the number of entries is still in the 100k's rather than the 10M's. I agree that, once we have a format that seems scalable to the "all words in all languages" goal, we should make the entering of words easier. However, I think it may actually be good that the present imperfect system throttles the number of edits and leads to a preponderance of nerds at this stage, when the structure clearly needs attention.
And to anyone who hasn't noticed Hippietrail's latest (at WT:BP#Wiktionary structure awareness extension prototype live for testing): do look (though I haven't yet worked out myself how to vary it from the default). --Enginear 16:33, 19 February 2007 (UTC)

I agree with some of the sentiments expressed above. As a newcomer, the more Wiktionary pages I look at, the more of a mess it seems to be. Someone needs to sit down and properly design the structure, and then create an interface that *enforces* that structure, so that individual editors can't just go off and do their own quirky things. (I should emphasise that my comments are not in any way intended as a criticism of the people who have obviously put in a lot of effort to get Wiktionary to where it is. It's just the way the thing's grown I guess.) Matt 14:51, 20 February 2007 (UTC).

Some of the perception of messiness comes from having a large number of articles which are fine, but could be more complete, combined with articles that are fairly complete. E.g. if every article consistently had pronunciation, it would look less "messy".
But it is very fortunate that we started out with the 'pedia s/w, and that no one "sat down and properly design[ed] the structure"! Most of what we have done in the last two years would have been difficult or nearly impossible if we had had s/w that enforced the structure. For example, if the several people working on Greek/Ancient Greek right now had to make code changes and get them committed to the running s/w base, instead of playing with a few templates and formatting pages as they like, that work would almost certainly not be happening. They would just be forcing the information into the pre-conceived format, with inferior results. Sure, I could write s/w today that would look really good, but I couldn't have done it 6 months or two years ago; we didn't know enough. And note that the previous sentence will still be true 6 months or 2 years from now ...
The WiktionaryZ/Omega project is trying to write such software, but it "freezes" some level of understanding (and when they did the first version, they didn't even know that Japanese could be written in 4 different scripts, and that entries were not 1-1). Even if they do it over, they just freeze at another point.
We are still at a fairly early point, still learning enormous amounts about what a dictionary can be when freed of the constraints of paper. (Why have lots of lang-x to English dictionaries, when one can have an Any to English dictionary? I dreamed of compiling one of these in the 1970's, and figured out it would run to many dozens of volumes, and so would not be terribly useful ...) Right now we have about 300K entries; in less than a year we will have a million+, and in two years probably 5-10 million as we move toward comprehensive coverage of 40-50 major languages. Anyone think they can predict what is going to be needed in the structure? All we can do is work on it and learn. Robert Ullmann 12:39, 23 February 2007 (UTC)
(Interposed comment; sorry if this disrupts the flow... not quite sure where to put it). The reason why the "any-to-English" format does not work as currently implemented in Wiktionary is that 99% of (English language) users, 99% of the time, either want an English definition of an English word or want a translation of an English word into a known, specified language, or want a translation of a word from a known, specified language into English. Mixing everything together on the same page just makes it more difficult for people to find what they are looking for, while adding no value. The one unusual circumstance where someone wants a translation of a word, and they don't know what language it is in, should be handled by some sort of "global lookup" feature. Matt 14:36, 25 February 2007 (UTC).
As I noted below, the number 1 hit on the English wiktionary (after the drunk-college-student obscenities ;-) is Category:Japanese language. I have a suspicion that contrary to your assertion, the vast majority of our users are in fact looking for English definitions of words in other not-necessarily-known languages. And if you are looking up something written in Han characters, which language/English dict are you going to look in? No value? The translingual/common and related languages (e.g. Mandarin/Min Nan) add a lot. A "global lookup" as you say? That's just what we provide. If it offends you (;-) that you get additional information: we are working on a filter, see below. Robert Ullmann 14:50, 25 February 2007 (UTC)
I do find it very hard to believe that most users are looking for an English definition of a word in an unknown language. If this is true then the community of Wiktionary users must be a very atypical bunch compared to your average dictionary user, I would say. Matt 15:13, 25 February 2007 (UTC).
I'm not suggesting that once designed the structure can never change; that would be daft. I am suggesting that a structure be devised and enforced to cope with the content that exists now, that can then be extended/revised as people have new ideas and want to do new things. What I'm talking about is tidying up the sort of mess that we have, for example, at note (just to pick at random one of countless examples), which has a list of definitions of the noun, an out-of-sync list of translations which if extended to "all" languages would be about 100 pages long, followed by the definition of the verb, followed by more translations etc. This is the sort of very unfriendly "ad hoc" layout that could be avoided if, for example, a sensible structure for handling translations were designed. Matt 23:08, 24 February 2007 (UTC).
I agree that what headings can be used should be restricted, but only (if and only if) that list of restricted headings can easily be extended by sysops. For example, I'd love to see ===Usage note=== never be allowed (indicating instead that only ===Usage notes=== is a valid heading.) There are now four different "flavors" of related cleanup lists...mine are at User:Connel MacKenzie/todo, todo2, 3, 4, 5 etc. No one has been eager to attack {{rfc-trans}} recently...it does seem to be a growing problem.
The "Preload" templates have gone a long way towards helping newbies enter new English words. Many of the preload templates still need expanded "-intro" fillers, like template:new_en_noun_intro. And other languages...ahhh. Big time.
The biggest roadblock to making it "easier" to edit is that Wiktionary serves a lot more pages to readers than contributors. Right now, it is still fairly easy for a newcomer to make a minor correction to an existing entry. And although cluttered, I think the entries are somewhat comprehensible for newcomers to read. But certainly starting a new entry is daunting, for newcomers. Unfortunately, I don't see many ways to simplify that. --Connel MacKenzie 08:23, 25 February 2007 (UTC)
The minor variations to headers aren't so difficult; we just need to periodically run something to fix them. (E.g. (^={3,6})\s*[Uu]sage\s*[Nn]otes?\s*={3,6} (to) \1Usage notes\1 or such ;-) I have something that would fix all of them, but right now it "fixes" a bit too much ... as to the readers: 200+ hits a day on MILF and choad? No wonder we have to protect those pages. It is interesting that Category:Japanese language is in the top 100.
Some kind of entry method for new users/new entries (much better than the preload templates) would be a fine idea. Robert Ullmann 12:09, 25 February 2007 (UTC)
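The header cleanup described above can be sketched as one regular-expression substitution. The Python below is a minimal illustration, not the script mentioned in the comment: the function name `normalize_usage_notes` is hypothetical, and it assumes each header sits on its own line with only spaces or tabs around the text, so a match can never run across a line break. As in the pattern quoted in the comment, the opening run of `=` signs is captured and reused on both sides, which also repairs unbalanced headers.

```python
import re

# Hypothetical pattern: a level 3-6 header whose text is some variant of
# "Usage note(s)". Only spaces/tabs are allowed around the text (an assumption),
# so the match stays on one line. Group 1 captures the opening "=" run.
USAGE_NOTES_HEADER = re.compile(
    r"^(={3,6})[ \t]*[Uu]sage[ \t]*[Nn]otes?[ \t]*={3,6}[ \t]*$",
    re.MULTILINE,
)


def normalize_usage_notes(wikitext: str) -> str:
    """Rewrite every 'Usage note(s)' header variant to the canonical form.

    The captured opening run of '=' is reused on both sides, so a header
    like '==== usage Note ===' becomes '====Usage notes===='.
    """
    return USAGE_NOTES_HEADER.sub(r"\1Usage notes\1", wikitext)


if __name__ == "__main__":
    sample = "===Noun===\n# A thing.\n\n==== usage Note ===\nSome note.\n"
    print(normalize_usage_notes(sample))
```

Limiting the pattern to a single header name keeps it from touching anything else, which is one way to avoid a cleanup rule that "fixes" too much; other header variants would each need their own pattern.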
Some kind of "model page" might also be useful: a page that is fully populated, with all sections present, translations into all languages present, etc. Has anyone done this? Initially it might be good for someone very familiar with Wiktionary to do this and invite comments so that a consensus view on how all the elements should be laid out is arrived at (for example, how to avoid breaking up English definitions with acres of translations). Then the page could sit as a useful reference for newcomers like me. Matt 15:13, 25 February 2007 (UTC).
A single model page would never satisfy this need; there are too many possible variations. There is no single word in the English language that can function as every part of speech and every subcategory of every part of speech. Examples would be needed for each part of speech, and also for handling words that serve in multiple categories. There are also relatively few words in English that have directly precise translations in all languages, never mind the fact that we don't have editors who speak all the various languages of the world. For example, there are more than 1600 languages spoken in India alone, and only a handful of those languages have any entries on Wiktionary.
That said, there are a small number of pages that show a high proportion of basic layout information. I started a project (which I work on only occasionally) to accumulate some pages to serve as models. One such page is listen, and I am working to make Central Europe, transparent, and round into model pages as well. You can see the starting outline of my efforts at User:EncycloPetey/Model pages. --EncycloPetey 18:54, 25 February 2007 (UTC)
It could just as well be a made-up word with made-up definitions and translations. The purpose is to illustrate the structure and layout, not the actual content. Matt 21:06, 26 February 2007 (UTC).
WT:ELE originally was exactly that - the made up word "Hrunk" formatted a la Wiktionary. It has evolved a little bit, over the past couple years. --Connel MacKenzie 00:45, 25 March 2007 (UTC)
IIRC, it was moved to its current name in late 2004, and gained initial acceptance in early 2005. Not sure how long after that, that WikiMedia added log entries for moves, deletions and protections. --Connel MacKenzie 19:03, 17 April 2007 (UTC)

Plurals and translations

I was recently browsing around and came across the entry for geese, the plural of goose.  I noticed there were three translations for that entry.  I know I've seen it elsewhere, but haven't noted it.  My first inclination was to delete the translations, but wondered if there is a policy for that.

I personally think entries marked as English plurals should not have translations sections. The non-English plurals should be in their own pages. — V-ball 12:47, 28 February 2007 (UTC)

I think that is the policy. I certainly delete translation sections on plurals when I see them. Widsith 12:53, 28 February 2007 (UTC)
That is insane. For completely irregular plurals you'd delete useful content? No, that is not policy. The general case (which is wrong) is for regular inflections, to not require translations. --Connel MacKenzie 07:10, 1 March 2007 (UTC)
I am in total agreement with Connel on this - there is no sound basis for deleting translation sections from plurals. Consider the end user who wants to know how to say, for example, friends in French. They might well go first to friends, and seeing no translation section there, may give up, or may go to friend (which will have ami and amie, but not amis or amies). Here's another place where our user may give up, or if they are intrepid they may go on to look up ami and find what they sought (after first suffering two unnecessary disappointments). bd2412 T 07:20, 1 March 2007 (UTC)
How about etymologies? Atelaes 07:14, 1 March 2007 (UTC)
I have no problem with etymologies (or pronunciations, citations, 'pedia links, etc). An entry for a plural is an entry that happens to be for a plural; we may define plurals by reference to the singular, but that doesn't mean they have to be stripped barren but for that information. bd2412 T 07:22, 1 March 2007 (UTC)
Please try not to call me or my actions insane. I was under the impression that we had discussed this before and concluded that translation sections were only attached to singular forms? As for a sound basis, I find it hard to sympathise with BD's hypothetical user, since I can't believe anyone wanting to know the French for friends would not look up friend. That is the way all other dictionaries work. Anyway, I'll do whatever the community decides, but as I say I thought we'd been over all this in the past. Widsith 09:51, 1 March 2007 (UTC)
To take some odd examples, what would you do with news, data, or peoples, where I expect the translations are often quite different from the "singulars"? More generally, I agree with the others that it is normally inappropriate to delete any content which might be useful. --Enginear 20:33, 1 March 2007 (UTC)
I too had always understood that we made a fundamental distinction for certain information between the lemma form and non-lemmata. In particular, that when the only "definition" of an English word is "form of foob", the translations will be given on the lemma page foob. Otherwise, we open ourselves up to incredibly bad maintenance headaches. I for one don't want to try to correlate and verify all the translations of English present participles, to ensure that the Latin translation is the present active participle. I don't want to have to be sure that the correct gerund form is given under the English gerund form, even though the part of speech will not match between the entries. And which verb forms should we give then in the translation of English verb lemma, if we're going to open it up like this? The first person singular present active indicative? The present active infinitive? The passive preterite infinitive? Latin verbs have six infinitive forms (unless they're defective). Translations are not one-to-one. I say any non-lemma entry should point to the lemma for translations. --EncycloPetey 03:51, 2 March 2007 (UTC)
I agree with this proposal. Perhaps there should be exceptions made for words like news, data, and peoples, and all other forms where a plural is the only form, or has a separate meaning. But, in general, if a word is simply plural form of foob (which, humorously enough, is a Hmong verb), it should simply refer back to the lemma, where all the pertinent information will be held. I think most users will be intelligent enough to figure this out, especially if we are consistent in this, and the non-lemma is simply a soft redirect with nothing else. Atelaes 04:10, 2 March 2007 (UTC)
I'm not seeing a "why" - paper dictionaries limit the information they provide in accordance with their corresponding limits on available space. We have no such barrier. bd2412 T 04:34, 2 March 2007 (UTC)
We may not have limits on space, but we most certainly have limits on manpower. Having all the information in both places adds little in terms of the user's experience; it is a simple matter to follow the redirect to the lemma form. However, it adds a great deal of workload in maintaining the entries, as well as figuring out which form to use (this becomes more relevant with verbs, as EncycloPetey mentioned earlier). And, if we decide not to maintain the entries, we are then presenting a low quality product, which no one wants to do. Atelaes 04:57, 2 March 2007 (UTC)
This is an all-volunteer project. Our manpower is whoever is interested in doing whatever they are interested in doing - so long as additional information is merely permissible, but not mandatory, I see no manpower problem. As for maintaining the entries, do you mean policing edits? I don't think having additional information in legitimate entries will increase the number of vandals. It really doesn't even give them additional targets, as we already have plurals as entries. bd2412 T 05:50, 2 March 2007 (UTC)
I'm not talking about vandalism, no. What I'm talking about is when some anon adds the Turkish translation of foob, but doesn't do it on foobs. Then we're offering a sub-par version of foobs, which is lacking in the Turkish translation. On the other hand, if we offer nothing but "plural of foob", then we're giving them the same high-quality product, as they're, in essence, forced to go to foob and see the Turkish translation. You can certainly say, "Well I'll just add the Turkish translation to foobs," but will you? I won't. I don't have time for tedious stuff like that. And while our manpower is theoretically limitless, in reality, it does have a very distinct limit. We have, what, maybe a few dozen solid contributors? I think it unwise to add work for ourselves which adds little to the overall project. Atelaes 06:18, 2 March 2007 (UTC)
Okay, but if some anon does go and add the Turkish translation to "foobs" do you think one of us solid contributors should then be tasked with taking the time to delete this information from the entry? bd2412 T 13:44, 2 March 2007 (UTC)
I suppose so, yes. Atelaes 13:58, 2 March 2007 (UTC)
I have to agree with BD2412 in this case. For example, the word marines refers to the marine corps, but marine does not refer to the marine corps! Depending on context, it can mean a member of the marine corps, or it can mean a variety of other things that the plural marines does not mean (you can't have a plural adjective can you?). This could be true of other languages as well. To use Atelaes' example, foob does not necessarily equal foobs in every way. Take a look at the following (We'll call our language Fooblese for the sake of argument):
English
foob
===Translations===
  • Fooblese: foobe
English
foobs
===Translations===
  • Fooblese: foober
This is especially true for a language like English, with its wacky plurals such as cactus/cacti, city/cities etc. I also disagree with deleting a valid translation just because it was placed under the plural form and not the singular. Even if Atelaes is correct about the plural/singular thing, the translation should be moved to the singular form, not just deleted out of hand!

A-cai 14:41, 2 March 2007 (UTC)

If foobs is not simply the plural of foob, then obviously it should have translations for those senses which are not simply plurals for senses of foob; I don't think anyone's claiming otherwise. Also, the existence of "wacky plurals" is not an argument one way or the other: the pages for cactus, city, etc. state what the plurals are. If you want to add the plural of a non-English noun, the place to do so is at the entry on the singular, under an "inflection" heading. —RuakhTALK 16:01, 2 March 2007 (UTC)
I wholeheartedly agree with the fact that a translation put in the plural should be moved to the singular form (if not present), instead of unceremoniously deleted. I apologize for not being more clear on that. As for the marines, as I mentioned earlier, there will certainly be exceptions. Anytime a word cannot be defined simply as "plural of foob", then all bets are off as far as what I'm talking about. As for cacti, certainly it's a goofy plural, but we'll have a succinct explanation of it: "plural of cactus". Perhaps it might also be wise to have a usage note (follows Latin declension) to explain it, but otherwise, I still think it should simply be a soft redirect. Atelaes 16:05, 2 March 2007 (UTC)
I don't know if "wacky" plurals like cacti are anything special. To me, it seems weird to have translations sections on plurals. For example, the page cactus will have a translation section, and in that one can see (after they make an unnecessary extra click to show the translations (my pet peeve since I can't seem to get my preferences to work)) how to say cactus in various languages. Most likely, you will see the Russian word кактус. If I really want to know what the plural of кактус is in Russian, I will click on it because its entry should have a paradigm showing the plural, and I would not expect the entry for cacti to have the Russian nominative plural, кактусы, listed. The plurals of foreign words should be listed the same ways English words are, meaning кактусы is mentioned on the кактус page as a plural, and кактусы has its own entry saying, "Nominative plural of кактус." — V-ball 16:20, 2 March 2007 (UTC)
Since I am not usually very interested in translations, I suppose my view -- about 100 lines up -- should not be given undue weight. But the general issue of how to deal with "incomplete" entries for inflections, that is, skeleton entries or entries which are less complete than what we normally call a full entry, is of wider interest. Perhaps the standard "definition" of an inflection should be along the lines of Plural of foob, where further information can be found. --Enginear 18:35, 2 March 2007 (UTC)
Take a look at friend and friends. Please tell me that we are not going to put the translation for the TV show under the singular entry! Also, note the difference between translations in Mandarin and Min Nan. Which information should be left in, and what should go into the entries for the Chinese words?

A-cai 18:42, 2 March 2007 (UTC)

Translations of the TV show belong on the capitalized Friends page and are a separate issue altogether. Details of how 朋友 is inflected in different Chinese languages belong on the 朋友 page. Widsith 18:47, 2 March 2007 (UTC)
Aha! But did you notice what happens when you click on Friends? It redirects you to friends! I'm not sure any more what Wiktionary policy is for that, although what you say makes sense.

A-cai 19:05, 2 March 2007 (UTC)

Policy is that the proper noun should be at Friends [though I can't remember if Proper noun is still used, or whether we now call them all Nouns] and the noun at friends. I've now removed the redirect and split the entry. --Enginear 20:25, 2 March 2007 (UTC)

Whoa, cool down guys. There is a very good reason why we do not give translations (or synonyms, etc) for inflected forms. It is that words often have multiple meanings and the translations usually do not apply to all of them.

Taking "friend" as an example, that currently has seven meanings. There are, correspondingly, seven translation tables. Examining these shows that translations differ with sense. For example, French has "ami(e)" for the first sense, "petit ami(e)", "copain"/"copine" for second, and so on. "Friends" however gives "amis", suggesting that this is a suitable translation for all senses of the word. This is utterly false.

A user can easily find the translation they require (and, more importantly, the correct translation) by following the link to the uninflected form, and then clicking on the link in the translation for the sense they require. A well-formatted entry will include the plural (and any other inflected forms) there.

In the case of plurals that have special meanings, such as "marines" or then, of course, translations can be given for these. Otherwise, entries for English plurals and other inflections of English words must not include translations.

Now, I understand Connel's point about removing useful information. The thing is, this information is in the wrong place and is unhelpful or misleading as it stands. The appropriate action to take is to move these translations to pages for the foreign-language singular forms (and plural forms, if required, especially if these are irregular) and then to delete them from "friends". Anyone willing to help me with this?

By the way, Friends should be deleted. It is encyclopedic. — Paul G 11:33, 3 March 2007 (UTC)

I'm for collecting information at one place in cases where it isn't controversial. There are a number of things besides translations that need not appear on "stub" pages, that is, pages where the only definitions are those that refer to other pages. These types of entries include alternative spellings and inflections, but not synonyms such as Allen wrench and Allen key.

There is no rule of thumb, I think, so much as an outcome of process. It is never acceptable to delete correct information that does not violate an accepted standard. Especially when in question, such as with word histories, a deletion should be noted, of course. Over deletion, it is much preferable to consolidate information such as synonyms and etymology (e.g. a full etymology becomes root plus inflection). By consolidation I do not mean "move" so much as "merge", although the original example of translations for plurals is minor and acceptable as per Paul G. Consolidating differing information should be allowed even if it leaves a stub page, provided the information does not contradict (as with color/colour, program/programme) and there is no controversy over the "correct" form, i.e. no clear principal spelling of e.g. irregular plurals. On the latter point, changing a principal page into a stub page without consolidation as a clean-up measure is not allowed. Expanding a stub page into a full page is already permissible if the contributor has good reason to believe that it should be a principal page. DAVilla 20:32, 7 March 2007 (UTC)

Widsith, I apologize for calling your idea insane. To clarify what I meant, the removal of beneficial information is much worse than standardizing the Translations section layout (in this manner.) Please note that BD2412's hypothetical user (in his example somewhere above) usually is not going to be a human being; rather, it will be that human being's software performing the lookup.
The notion that all software out there properly knows how to truncate a word form to a lemma (I assert) is insane - most software can't even tell (accurately) what language a given word is in. Looking at the Wikipedia pages on Corpora linguistics, I'm stunned that my trivial frequency analysis of 1.6 billion words from Project Gutenberg wildly overshadows the ANC.
My expectation is that there will be an order of magnitude more software components written over the years. Some will get better, but all new ones are very likely to start from the same starting point. If Wiktionary provides information directly for all forms of a word, the programmatic mistakes are not only eliminated (before they happen) but subtle mistakes are avoided entirely. This comes about by human contributors here verifying the word forms individually, and noting exceptions accordingly.
My point of view (admittedly, my own) is that first hits to Wiktionary pages should contain as much information as possible. Every web-based extension of Wiktionary I've seen so far has tremendous difficulty linking back to anything other than the "direct hit." As those components become more elaborate, the navigation to what you call "the correct" lemma form will become more difficult, if not impossible. (E.g. try browsing Wiktionary on your cell phone - GOOD LUCK!)
With all that said, from my perspective, it is "insane" to remove translations from plural entries, especially as a matter of procedure. (Again, I think I'm using "insane" as an intensifier, not as an insult...that may be why you were offended by my wording initially?)
--Connel MacKenzie 14:14, 26 March 2007 (UTC)
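The point about software that tries to reduce a word form to its lemma can be illustrated with a deliberately naive suffix stripper. This is a hypothetical sketch, not any real lookup tool: `naive_lemma` knows only two rules ("-ies" becomes "-y", and a final "-s" is dropped), and it shows how such rules go wrong on words raised in this thread (geese, cacti, news).

```python
def naive_lemma(word: str) -> str:
    """Guess a singular form from suffix rules alone (hypothetical; no exception list)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"  # cities -> city
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]  # friends -> friend, but also news -> new
    return word  # geese, cacti: left unchanged, so the lemma page is never reached


if __name__ == "__main__":
    for form in ["cities", "friends", "geese", "cacti", "news"]:
        print(form, "->", naive_lemma(form))
```

The first two forms come out right; geese and cacti are never reduced, and news is mapped to an unrelated word. A page that answers the lookup for the form itself does not depend on such guesses.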
I think that if a plural entry definition were to say "Plural of foob, where further information can be found." then many of the objections to adding incomplete sets of translations, etc, would be countered. --Enginear 19:58, 26 March 2007 (UTC)

Trademark names

We need a policy for trademarks. If we have one, I can't find it (and it should be at Wiktionary:Trademarks). I think that widely known trademark names should be included if they can be used metaphorically (e.g. Cadillac), descriptively (Mark wears a Rolex and drives a Lexus, whereas Joe wears a Timex and drives a Honda), if the mark is approaching genericism (Kleenex, Xerox), or if the mark is a specific use of word that would otherwise be in the dictionary anyway (Bounty for paper towels, Crest for toothpaste, Janus for mutual funds). I'm putting together a listing of the most widely known brand names at User:BD2412/brand names.

Also, I've noticed that from time to time folks need to look up trademark registrations here, so I'm going to provide some quick tips on how to do this.

1. Go to the United States Patent and Trademark Office trademark main page.
2. Near the top of the right-hand column, click [Search].
3. I recommend the Free Form Search. Type in the word you're looking for followed by [comb] and you'll get a combination of searches for the word alone, with punctuation, or as part of a phrase. Also, it often helps to add "and live[ld]" to a search, as this will limit it to live marks and filter out marks no longer registered.

Cheers! bd2412 T 23:18, 2 March 2007 (UTC)

I agree with most of this, but disagree with your view that the Bounty and Crest and Janus trademarks should be included just because bounty and crest are words and Janus is a dictionary-worthy proper noun. (Maybe they should be included anyway, but if we develop criteria for inclusion of trademarks, I think they should apply regardless of whether the trademark represents its own entry, or simply an additional sense in an existing entry.) —RuakhTALK 03:40, 3 March 2007 (UTC)
I think that trademarks that incorporate a common word for an uncommon purpose (e.g. Apple) merit a one-line entry because the word is already in the dictionary, and the trademark definition is a legitimate alternative definition of a word for which we are trying to give complete information. That said, I think such instances should be limited to trademarks that can be demonstrated by reference to a source such as a trade journal to be very widely known and very strong. bd2412 T 19:25, 3 March 2007 (UTC)
I strongly disagree with this (sorry bd2412, I just seem to keep picking fights with you, nothing personal :-)). Certainly we should have entries for bandaid (or is it band-aid?) and xerox, because they're used in an idiomatic sense, not necessarily related to the brand itself, perhaps rolex as well. But, my opinion is that they should not merit entries until they can be put in non-capitals, as xerox and bandaid can. Otherwise this opens us up to including every brand name in existence, which is not dictionary material. Unless someone can show exactly where else the line should be drawn, I say we draw it here. Atelaes 04:50, 4 March 2007 (UTC)
I think there are resources that would make it fairly easy to draw sensible lines. I've been tossing some ideas back and forth in my mind and would say, for example, that we can easily agree to exclude company names that are just collections of surnames (e.g. Morgan Stanley Dean Whitter, Bristol-Myers Squibb, and Ethan Allen). I do, however, think that we should make every effort to list all brand names for medications (Tylenol, Dexatrim, Motrin, Prozac) because I can see a particular utility to such listings, in part because the drug makers tend to come up with fanciful words, and in part because most such drugs can be described by reference to their key ingredient (i.e. acetaminophen, ibuprofen). I'd also be rather inclined to include fanciful car names (Integra, Montero, Prius). There are hardly so many that it would cause a fuss. With respect to other corporate or brand names, I'd set a higher criteria than the CFI to show that the brand name is used in some descriptive or attributive sense, but that should be easy for truly mega-brands such as Coke and Pepsi, McDonalds, Microsoft, etc. bd2412 T 03:30, 6 March 2007 (UTC)
That might be useful, but the utility argument is a very well documented logical fallacy. We aren't aiming to be useful, we're aiming to be a dictionary (which is useful, of course, but not just). Usefulness includes TV listings, atlases, currency converters, whatever you can think of; there are lots of useful things. I fail to see why we would give any class of words immunity from CFI, and I especially fail to see why, if we did, we would want them to be medications and car models. Those aren't within the purview of a dictionary, but are more appropriate at an encyclopedia. Trademarked names or brand names still need to pass attestation with independent use (and not just mention). I would suggest a vote to make the point clear, but I'm already satisfied with CFI's wording: "To be included, the use of a trademark or company name other than its use as a trademark (i.e., a use as a common word) has to be attested." Dmcdevit 06:49, 25 March 2007 (UTC)

Use of ® and ™ in entries

Greetings! As a professional intellectual property attorney, I can assure you that there is no requirement whatsoever that we should use the ® and ™ symbols adjacent to the headword of names that are trademarks (registered or not). First, we're an educational organization making a purely nominative use of the terms (i.e. we're not selling hamburgers, so we don't even have to acknowledge that McDonald's is a trademark). Second, even so, we do indicate in the entry and often in a usage note that the word is a trademark or is a registered mark. Third, the ® or ™ symbol is not a part of the actual word. Finally, trademark registrations are neither eternal nor certain. Registrations lapse, get cancelled, or become abandoned all the time (I have personally seen some very big companies errantly allow the lapse of registrations for some very famous trademarks).

Frequently multiple parties claim ownership of a particular mark and spend years litigating who has the right to use the mark, or whether both parties can use the mark for different products (e.g. Ritz crackers and Ritz hotels); parties may have rights to a mark in limited geographic areas; and parties often claim to own generic or descriptive marks that can not actually be "owned" by anyone. In short, even information on the best known marks can become obsolete, and there are few people here with the technical background to determine the status of a mark, particularly one that is contested.

In short, we should get rid of those symbols. Cheers! bd2412 T 06:44, 9 March 2007 (UTC)

Any idea why all other dictionaries seem to use the marks, then? I don't understand what is bad about including the mark on a term that has had technical problems with renewals. OTOH, retaining the marks alerts our readers that they probably should use the symbol as well. To me, it seems that removing the marks would be inconsistent and unhelpful. --Connel MacKenzie 10:47, 11 March 2007 (UTC)
It seems the cleanest way of marking trademarks and brand names to me. The symbols are universal and concise. --EncycloPetey 03:46, 12 March 2007 (UTC)Reply
To Connel, I'm looking at the Webster's Collegiate Dictionary, Tenth Edition (which is the one I have handy at the moment) and it has a listing for Xerox without the symbol, but with trademark at the beginning of the definition line. I do not believe we need to 'alert our readers' in the manner that you suggest, as there is no reason for anyone other than the owner of the mark to actually use such a symbol. A Google book search for Coca Cola, Absolut, Tylenol, shows that such symbols are absent not only in works of fiction, but even in non-fiction works examining these specific industries.
To EncycloPetey, I think the cleanest way of marking trademarks and brand names is the same way we mark medical terms, slang, vulgarities, sports terms, etc., with a notation in the definition line. This is particularly evident where we have an entries such as Cadillac, Hartford, Lincoln, Mercedes, Nike, Quaker, and Saturn each of which is a famous trademark, but each of which has additional meanings for which a capitalized entry is necessary (place names, given names, surname, mythological figures, etc.). bd2412 T 16:46, 12 March 2007 (UTC)Reply

I agree with BD; I have always found it a little weird that we include the symbols, when they virtually never appear in actual usage. We give the impression that such symbols have to be used with the word, which is not the case. Widsith 17:11, 12 March 2007 (UTC)

  • Sorry guys, but these symbols, while concise, are certainly not universal. In Spanish neither is used; instead MR (marca registrada) takes the place of both.
  • As for the symbols being used in the headword section but not in actual use, this is just silly. We also put m or f in the headword next to nouns, but these are also never seen in actual use. — Hippietrail 14:34, 13 March 2007 (UTC)
I agree with bd2412; having a tag on the definition and not in the headword seems like a good solution. Regarding Hippietrail's comment: so, if we did decide to continue using TM in headwords, do we restrict this to ==English== parts of speech, and use MR in the ==Spanish== headers? -- Beobach972 15:44, 14 March 2007 (UTC)
Hippietrail, m and f indicate how the word must be used in speech and writing. ® and ™ do no such thing; rather, they indicate how one party, the owner of the trademark, would like you to use the word, not any way in which you are required to use the mark. And what about the many words (including some listed above) that have one sense that is a trademark and others that are generic? Also, how does Spanish account for unregistered marks? A word or phrase used as a trademark in a Spanish-speaking country but not registered gets no recognition and virtually no protection whatsoever. bd2412 T 16:17, 14 March 2007 (UTC)
bd2412, your original argument made no mention of "how a word must be used" but merely "the symbol is not a part of the actual word". Since you are content to change your argument as we go, would you not concede that some writers do care whether a term might be a trademark or not? It could be a matter of style or policy in the departments of certain companies. I think it is never an error to include too much information on any word. Those who do not care about certain parts of the information can ignore them. If we erase information, however, people who do care have no way to recover it. Now, the idea of putting the warning that a term may be a trademark into the sense sounds like one worth exploring.
Sadly, I am not aware of how the various Spanish-speaking countries account for unregistered trademarks. My experience is mostly in Mexico, where I am only a visitor. — Hippietrail 17:01, 14 March 2007 (UTC)
Well, let me point out that the way we are using the symbols adds to the appearance that they are, in fact, a part of the word. We are putting them in boldface right next to the word, with no space between. Our m and f and pl and so forth are in italics and set apart by a space. Now, if we were to do that with the ® and ™ symbols, it would be more appropriate, but in my opinion it would look horrible. It also seems to me that the qualifications we use in the headword line are fairly stable. We expect a masculine noun to still be a masculine noun in a hundred years. However, a word that is vulgar today may lose that connotation in some years; a term that is slang now may become so widely used as to be deemed formal; and a trademark may become generic, or may simply cease to be used.
If we're going to indicate trademark status, we should do it on the definition line for the trademark definition. After all, what do we do with Ace, a common nickname for fighter pilots, but also a famous trademark for two unrelated companies (selling hardware and bandages, respectively)? What do we do with Dove? In fact, if you go to the website of the United States Patent and Trademark Office (http://www.uspto.gov) you'll find that thousands of words in the English language are marks, including the names of almost any figure from mythology or ancient history, most city, county, state, and other place names, and most given names and surnames. Should we put a trademark symbol next to all of them? There are nine current registrations for "Dan"; should our entry read Dan®? Should we note marks that are registered trademarks for some purposes and unregistered trademarks for other purposes with both symbols, Dan®™? What if the owner of one of the 40 or so registered "Scott" marks should decide to sue us for using the ® with some marks but failing to use the ® with Scott? Or Venus? Or Rio? Or Smith? Or Taurus? You must know that if we include marks at all, we cannot do so selectively.
I have no need to change my argument, but I'll surely raise additional arguments, all of which indicate that we have no business playing with trademark symbols, especially when our use is bound to be inconsistent unless someone is willing to check every word in the dictionary against the constantly changing USPTO database every few months. bd2412 T 07:35, 15 March 2007 (UTC)
It's worse than that. Many, probably most, of us are not based in the US. There seems to be general agreement above that there is no legal requirement to use the symbols in the dictionary, so the location of the servers is irrelevant. It is where we use the words that counts. Many marks are registered in one or a few countries only. It would be necessary to check all countries' patent offices, and perhaps note in the entries where the marks were registered and where not. That is not the job of a dictionary. I agree with you that we should not use the symbols, but where known, we should gloss the entry to say it is used as a trade mark. --Enginear 12:29, 15 March 2007 (UTC)
Whatever we decide to do, one thing I think we really ought to have is a disclaimer, as some paper dictionaries do, that the inclusion or absence of a trademark symbol does not affect the trademark status of the word so indicated or not indicated (in other words, if you take our word for it but we've got it wrong and you get into trouble because of it, don't sue us). — Paul G 10:21, 18 March 2007 (UTC)
  • Now that I'm back in Mexico again I've kept an eye out and I have actually seen ® (but not ™) used here. I don't know if it's considered the same as or different from MR. Also, in at least one of my Spanish-English dictionaries, ® is used in both the English and Spanish sections for some words. Neither ™ nor MR is used in either section. — Hippietrail 18:03, 19 March 2007 (UTC)

I CALL FOR A VOTE (can someone put that together?) bd2412 T 05:05, 22 March 2007 (UTC)

Ok, I have started a vote at Wiktionary:Votes/pl-2007-02/Trademark designations. Cheers! bd2412 T 03:39, 23 March 2007 (UTC)

A proposed Vote concerning Placenames

A recently started vote concerns the criteria for inclusion for placenames. See Wiktionary:Votes/pl-2007-02/Placenames. This was not discussed beforehand and has degenerated somewhat. I propose that it be abandoned and replaced with a simpler vote with fewer and less specific options, as follows.

  1. The criteria for inclusion for placenames should be exactly the same as for all other words - broadly attested, used rather than mentioned.
  2. One or more additional criteria should be applied to placenames - details to be discussed later if this option is carried.

Please make your thoughts known here and provide a better first vote if you can think of one. SemperBlotto 09:31, 13 March 2007 (UTC)

I agree that the vote should be abandoned; it's quite a mess. A (rather lengthy, I fear) discussion is needed before a vote of this nature could be restarted. I think that the current criteria are good, but I don't believe it's a simple matter of picking one or the other. Past that I have nothing useful to offer, sorry. Atelaes 10:07, 13 March 2007 (UTC)
I agree that the vote should be abandoned; it seems we didn't plan it out very well. -- Beobach972 20:56, 13 March 2007 (UTC)
It's alright, you're right that we need a vote; we just need to clearly plot out all the options we'll have on it. -- Beobach972 15:36, 14 March 2007 (UTC)

The criterion I like to apply to proper names (and with little difficulty to all words really) is whether it can be understood out of context. For place names this is a little more lenient than I had previously been judging them. In Athens, Georgia the word "Athens" means the city in Georgia, so if this can be verified with three independent citations (e.g. newspapers or what have you) that use "Athens" out of context, as per the title of the page, instead of "Athens, Georgia", then the place Athens, Georgia is one sense of Athens that is understood regionally. Outside of that, Athens could only be assumed to be the city in Greece or taken just as a general, unspecified place name. DAVilla 18:54, 23 March 2007 (UTC)

I don't agree with (or maybe don't understand) that logic. First, if we assume Athens, Georgia did not generate 3 cites for Athens, surely the term Athens, Georgia is understood out of context and can be attested as such. For that matter, what place can't be understood out of context regionally, considering placenames belong to regions? The problem is that being understood out of context is not a criterion that establishes appropriateness for the dictionary. If I said the single word "blast", even a cytologist is not likely, in any region, to assume first that I mean an immature cell. Conversely, the cytologist in the US would understand Millard Fillmore, or even Mallard Fillmore, out of context. In general, I wouldn't say that all proper nouns should be removed (I just added Sahabi, for instance): English, Taoism, and Enlightenment have definable meanings with or without context. But Skagway, Transamerica Pyramid, and Lü Buwei can only be described by pointing at the physical objects they reference. They don't belong, and neither, I think, does Athens, Georgia, unless it's an attributive or generic term. Dmcdevit 23:03, 26 March 2007 (UTC)
I don't think we should have an entry on Athens, Georgia, but we should most definitely have entries on Athens and Georgia! bd2412 T 23:09, 26 March 2007 (UTC)

Easter Competition 2007

This is an announcement to open the Easter Competition 2007. As with previous contests, the prize consists mainly of a warm woolly feeling inside, but the primary object is playful competition among Wiktionarians. --EncycloPetey 02:34, 15 March 2007 (UTC)

Results are now posted. --EncycloPetey 17:16, 10 April 2007 (UTC)

Interpretation of CFI

Someone else (Dmh) said earlier in this forum that the CFI are too vague. I tend to agree.

A user created an entry for Friends, defining it as the US TV show. I nominated it for deletion, saying it was encyclopedic (that is, that "Friends" belongs in an encyclopedia). (Note that the entry currently does not contain this definition.) This provoked a heated discussion that comes down, in my understanding of the issue, to the interpretation of what the CFI say about inclusion of names, namely "A name should be included if it is used attributively, with a widely understood meaning".

It is being argued that, as "Friends" can be used attributively (as in "Jennifer Aniston, the former Friends star"), it should be in. Taking CFI literally, this means "Friends" is allowed in. My feeling, based on my experience of what does and does not get into Wiktionary, is that this is not the intention of that part of the CFI.

If the CFI are to be interpreted literally on names, then TV shows, movies, place names (no matter how tiny the places they refer to) and many other names are allowed in provided they have an attributive use ("Friends star David Schwimmer", "Gladiator star Russell Crowe", "Nowhereville resident Joe Bloggs").

Well, yes. If there are three such uses spanning a year in permanently recorded media. bd2412 T 04:30, 16 April 2007 (UTC)

However, if the CFI are not to be interpreted that way, and I don't think they are, then they need to be tightened up to state more precisely what can be included and what should not be. — Paul G 07:13, 15 March 2007 (UTC)

I think that might be a good idea. CFI for proper nouns seems to be a hot topic lately. Atelaes 08:19, 15 March 2007 (UTC)
I agree that the issue at hand is whether or not to include proper nouns at all. I also agree that WT:CFI is rather vague in this area. My vote would be to include proper nouns (place names, people, companies, TV shows, movies etc) as long as the definitions are kept short, and the proper noun is widely used in spoken or written English. Some contributors have expressed reservations about this idea, because they fear that we might not know where to draw the line (i.e. include Rocky, but not Rocky II?). My take is that Wiktionary is in its infancy, and it is probably too early to be overly conservative about what to leave out. We are breaking new ground with Wiktionary, and this is definitely an area where we have a potential to surpass our traditional expectations of a dictionary.
For example, did you know that there are three different Chinese translations for the 2000 movie Gladiator (PRC: 角斗士; Hong Kong: 帝國驕雄; Taiwan: 神鬼戰士)? These three translations are, of course, not to be confused with the Chinese translations of the 1992 movie Gladiator (PRC: 终极斗士; Taiwan: 神鬼拳王)! How are these terms pronounced, what are their etymologies, which one (if any) is the literal colloquial term for gladiator in Chinese? Wikipedia does a poor job of answering such questions, but Wiktionary is ideally suited to answer them all. -- A-cai 09:11, 15 March 2007 (UTC)
I agree that we should take a liberal course towards including proper nouns, with some caveats.
  1. I think that we should include place names with abandon - from the country level down to the town, borough, or hamlet. In order to prevent ourselves from going crazy with 50 towns named Springfield, I propose a rule that if a place name is used for more than 5 places, then it gets a single line indicating that it is a commonly used place name (unless one of those places is a world city like Paris, or a capital like, well, Springfield (Illinois, that is)).
  2. I think we should include a line for any brand name, movie name, TV show title, band name, or song title for which we would otherwise have an existing Wiktionary entry (Friends, Sneakers, Pledge, Nirvana, Joy, etc.)
  3. I mentioned somewhere above that I think we should include the brand names of medications and cars (remember, we're talking about one-line entries here).
  4. With respect to people, I think we have to act on the presumption that any combination of a first and last name (Joe Smith, Marcia Clark, Reginald Denny) is simply non-idiomatic unless it means something other than merely an identification of a human being (e.g. Shirley Temple, Benedict Arnold). bd2412 T 11:10, 15 March 2007 (UTC)
That sounds generally O.K. to me, except for criterion #2 (originally written as #3). I don't think "is a capitalized version of a common noun" should be a criterion for including a proper noun, any more than "is a protologistic use of an existing word" should be a criterion for a normal word. —RuakhTALK 16:12, 15 March 2007 (UTC) and 07:20, 16 March 2007 (UTC)
Good work - this seems reasonable. I particularly like number 2, which would cover the current "Friends" debate; I have already suggested in the discussion on RFD that we should cross-refer to Wikipedia in this case, and I think we should do this for all words in category number 2 above.
Ruakh, did you mean "#2" when you said "#3"? As I understand it, the idea here is to acknowledge the existence of a capitalised form of a common noun, but not to give it any treatment here unless it falls into any of the other categories that we will have articles for. So, for example, bath should have a "See also" at the top (indicating another entry in Wiktionary) that links to Bath, the place in England; cavalier and mini would do the same (brands of cars); but friends and nirvana would include links to the Wikipedia articles on the TV show and band name respectively under the "See also" section towards the bottom of the article. We would do this not because these proper nouns deserve special treatment, but rather because users might expect to read about them here, and would be directed to Wikipedia; and because it will also go some way towards preventing contributors from adding definitions for the proper noun to the entry for the common noun. Users searching for proper nouns that aren't capitalised forms of common nouns in Wiktionary will get the "Perhaps there is an article on X in Wikipedia" link and can then find what they want in Wikipedia.
Articles on particular people, such as "Shirley Temple", are clearly encyclopedia material and would get no coverage in Wiktionary at all, that is, not even a cross-reference or link. Users searching for "Shirley Temple" will get the page that says "Perhaps there is an article on Shirley Temple in Wikipedia". Vanity articles created by users about themselves will of course continue to be deleted.
I think this is pretty much what we do already, even if it is not explicitly stated in CFI. What is therefore needed is a form of words for this that we can put into CFI to make it much more clear exactly what is to be included and what is not. — Paul G 07:00, 16 March 2007 (UTC)
Paul, I don't think you meant to use Shirley Temple as the example; it is the counter-example: it is the name of a drink, which is why we have the wikt entry. A given person who is not a drink, or synonymous with traitor, would just be in Wikipedia. This all seems pretty good to me, but I don't think that we should have "Gladiator" as a film title because it is named variously in other languages; that should be in the Wikipedia articles (English and others) on the film; in particular there should be interwiki links. If WP is weak in this area, well, go improve it! (We don't have the invented term "sorcerer's stone" because J. K. Rowling thought—probably correctly—that the title had to be dumbed down for the American audience who wouldn't know what a philosopher's stone was ;-) If the Chinese names for Gladiator qualify in their own right, sure they should be included as ordinary words, with the same see-also reference(s) to w:zh etc. Robert Ullmann 09:53, 16 March 2007 (UTC)
Ah, my mistake. Yes, names of cocktails are certainly to be included, and many of these are named after people (famous or otherwise); and yes, we would have no entry for "Gladiator". People wanting translations of the name of the film should look at w:Gladiator and then follow the link for the language they are after, or request it if it is not there. The Wiktionary article "gladiator" would have an interwiki link in the "See also" that just says "Gladiator in Wikipedia" or something like that, without specifying that this is a film. (Who are we to say what article Wikipedia has or will have in the future at "w:Gladiator"? Currently, it's a page for the common noun, with a link at the top to the disambiguation page which contains a whole load of links, including not one, but three films with that title. So there you go — a Wiktionary article on "the" film Gladiator would be incorrect.) You make a good point about the treatment of languages that don't have capitalisation, such as Chinese.
Once there is agreement, I'll draft some text summarising what we are discussing here and add it to CFI. Or should it be treated as draft policy first? — Paul G 10:36, 16 March 2007 (UTC)

[Back to margin.] This discussion was only started yesterday, so I suggest that either it should be left a few days for other views to be added, or it should be written up as a draft policy.

However, subject to fine tuning, I think these are excellent criteria. The page that says "Perhaps there is an article on Marcia Clark in Wikipedia" needs to be made a bit more friendly (because it doesn't actually say that at present), and IMHO, our "See also" heading, when placed at the top for cases like this, should read "For additional senses see". Pop stars, et al, who use a single name, should be treated the same as single word film titles.

Incidentally, translations of place names can be covered by checking WP, in a similar fashion to film names, but we still need them here for their etymological value.

So, to consider one example: say someone wants to find out what the name Sigourney Weaver means, and for some reason comes to wikt to find out. We seem to be aiming for the following:

  • Searcher finds no entry on wikt but is given page saying "Perhaps there is an article on Sigourney Weaver in Wikipedia"
  • Searcher finds there is such an article; in the article finds that Weaver is her family name, and that she chose Sigourney as her stage name to match a character in The Great Gatsby. (The article should perhaps say that this was etymologically a singularly appropriate choice, but it doesn't.)
  • Searcher therefore decides to check the individual words; Weaver's disambiguation page should direct the searcher to wikt for the meaning and etymology of the surname Weaver, and we should have an entry saying that, as a surname, it most commonly means descended from someone who made a living from weaving. Neither of these is in place yet, but since they are fairly obvious, that is not too important.
  • Searcher, having got the message that wikt deals with surnames, decides to try looking up Sigourney. At the moment there is nothing there but let's imagine...there is an entry saying
    1. Surname adopted in US by certain Huguenot families previously called Sigournay, named after their town of origin, Igournay, France
    2. Town in US, named after Lydia Huntley Sigourney
      Those seem uncontentious, and investigation on WP could find that the Huguenots were forced out of France by religious persecution, and brought their skills as weavers with them. (Neither WP nor wikt have entries on Igournay at present.)
      But should we also have:
    3. Stage forename chosen by Susan Alexandra Weaver, after a character Mrs. Sigourney Howard in The Great Gatsby. Mrs. Sigourney Howard was herself named after Father Sigourney, a tutor of F. Scott Fitzgerald.
      Or should we just have
    4. For additional usage see w:Sigourney Weaver.
      and leave the searcher to search w:The Great Gatsby to find out (actually, it's not yet mentioned there, but then nor is it yet in wikt).
      or should we not mention it at all? My preference is for the second option. --Enginear 14:09, 16 March 2007 (UTC)
You make some good points. Yes, we should certainly carry on discussing this for a while until we have clarified what changes we are going to make. For now, I'll put a note in CFI that the policy is under review, pointing to this discussion.
Regarding given names and surnames, we already have a policy on this - given names go in; surnames go in if they are etymologically interesting. So "Sigourney" should probably go in, as, even if Ms Weaver was the first to adopt it as a first name, no doubt there are lots of baby girls who have been named after her. "Weaver" would also go in because of the etymological interest, along with Archer, Smith, Taylor and other surnames derived from occupations.
I prefer the "See also" option for the info on Sigourney Weaver. I like "For additional usage", by the way. I would prefer the link to look like this: [[w:Sigourney Weaver|the Wikipedia article on Sigourney Weaver]], which makes it clearer what the user will get on following that link.
I'm intrigued... what was singularly appropriate about the choice of "Sigourney"? Perhaps you could update the Wikipedia article if this is of interest. — Paul G 12:29, 17 March 2007 (UTC)
Not really my bag to add it, but see [3]. --Enginear 16:23, 17 March 2007 (UTC)
I don't see how a surname can fail to be etymologically interesting. Virtually all surnames have some basis of derivation, whether by place, occupation, patronymic, even assignment as a form of derision. But isn't this discussion mostly about place names, brand names, and titles of media? bd2412 T 15:10, 22 March 2007 (UTC)

Categories as Wantedpages but not Wantedcategories

I suppose there's a good reason why many of the top Wantedpages are categories like Category:xx:Slang or Category:en:Slang, and yet none of these categories seem to be in the list of Wantedcategories? A quick search didn't turn up any relevant discussion about this. - dcljr 17:11, 18 March 2007 (UTC)

The reason seems to be that those categories aren't actually included — {{#ifexist:…}}s are used in the relevant templates to prevent non-existent categories from being included — so the pages aren't actually added to those categories. IMHO they shouldn't show up at Special:Wantedpages, either, but this seems to be part of a more general problem; for example, when you edit a page that includes a template in a non-active part of an {{#if:…}}, the list of templates-being-used at the bottom of the page does list that template. —RuakhTALK 18:28, 18 March 2007 (UTC)
The problem with a lot of them is an interaction between a MW bug and the nav template. Connel has fixed this one.
The problems with the ones for xx: and en:, such as xx:computing, are from the context templates, which pass {{{lang}}} as a parameter even when lang is not defined; I pointed out to DAVilla that the calls on context/label should use lang={{{lang|}}}, but she insisted this was caught lower down. As you can see, it isn't, and other code uses xx and en and it gets picked up and used as a reference. One oddity about template syntax is that the variable namespace scoping is not at all what you think it is in some cases. May also be related to the same MW bug. (These templates are way more complicated than need be, not at all sure why.) Robert Ullmann 18:36, 18 March 2007 (UTC)
What a mess. This was working, at least for a little while. 10 cents to the person that can find the MW change that caused it! --Connel MacKenzie 16:19, 23 March 2007 (UTC)
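The difference Robert describes between forwarding lang={{{lang}}} and lang={{{lang|}}} can be sketched with a small Python model. This is a simplified, hypothetical illustration, not MediaWiki's actual parser: it models only the rule that an undefined parameter written without a default is left in place as the literal text {{{lang}}}, while {{{lang|}}} falls back to an empty string. The helper names expand_params and if_nonempty are invented for this example.

```python
import re

# Hypothetical, simplified model of MediaWiki triple-brace parameters.
# NOT the real parser: nesting, whitespace trimming and numbered
# parameters are ignored. It models only:
#   {{{name}}}          -> the argument if given, else the literal "{{{name}}}"
#   {{{name|default}}}  -> the argument if given, else `default` (may be "")
PARAM = re.compile(r"\{\{\{([^|{}]+)(?:\|([^{}]*))?\}\}\}")


def expand_params(text: str, args: dict) -> str:
    """Substitute caller arguments into a template body."""
    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in args:
            return args[name]
        if default is not None:
            return default  # "{{{lang|}}}" gives "" here
        return match.group(0)  # undefined and no default: left as literal text
    return PARAM.sub(substitute, text)


def if_nonempty(value: str, then: str, otherwise: str) -> str:
    """Model of an {{#if:}} test: any non-whitespace text counts as set."""
    return then if value.strip() else otherwise


# An entry calls the outer (context) template without lang=.
caller_args = {}

# The outer template forwards lang= to an inner template, two ways.
forwarded_bad = expand_params("{{{lang}}}", caller_args)
forwarded_good = expand_params("{{{lang|}}}", caller_args)

# The inner template decides whether a language code was supplied.
print(repr(forwarded_bad), if_nonempty(forwarded_bad, "lang given", "no lang"))
print(repr(forwarded_good), if_nonempty(forwarded_good, "lang given", "no lang"))
```

With no lang= from the caller, the first forward yields the literal string {{{lang}}}, which a non-empty test treats as set; the second yields an empty string, so the default branch is taken. That is the sense in which the lang={{{lang|}}} form stops a bogus language value from reaching category names further down.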

Context labelling of inflected forms of Regionalisms

Hello all, I had a small conversation with EncycloPetey about de-tagging inflected forms of Regionalisms, specifically the Geordie ones in category:Geordie. I felt that the category had become unnecessarily cluttered with plurals and verb forms, so I set about de-tagging the ones whose infinitive/non-inflected forms are marked already, with an exception for inflected forms of non-dialect words that are specific only to that region.

Anyway, we thought it might be polite to ask others first. But I do feel that tagging ALL inflected forms adds clutter.--Williamsayers79 00:01, 19 March 2007 (UTC)

Why not have a Category:Geordie plurals, and so on, like with English plurals? —RuakhTALK 00:34, 19 March 2007 (UTC)
It's a possibility, but I'm not sure everyone will like it, since Geordie is a dialect (albeit a very substantial one) of English and does not follow the standard language + POS naming convention. I remember my very first entry radgie nearly got killed by Connel because I mistakenly had Geordie as the language header :-> --Williamsayers79 00:57, 19 March 2007 (UTC)
I agree with your view that only the singular form should be tagged, which comes from my general preference of having lemma pages as the sole information-containing entries, with inflected forms simply as soft redirects (with certain exceptions, of course). However, I have received flak for this view before, so take it for what you will. Ruakh's suggestion would make an excellent compromise if people are adamant on having the inflected forms categorized (although I must admit I think it somewhat pointless myself). In any case, I strongly feel that putting lemma forms and non-lemma forms in the same category is highly effective at making those categories useless. Atelaes 00:53, 19 March 2007 (UTC)
Yes, I think you're right on that.--Williamsayers79 00:57, 19 March 2007 (UTC)
  • I'm not sure what you mean by "tagging" in this context. I strongly believe there should be a label on each sense which is regional. On the subject of putting articles in Categories I have no strong opinion other than agreeing that flooding categories with inflected forms is a bad idea. If the problem is that some template is being used that adds a regional label and a category, I'd say don't use the template on inflected forms but still use a regional label. — Hippietrail 17:50, 19 March 2007 (UTC)

Against demolishing the present Policy structure

One CM is trying, through the RFD method, to blow away the present policy structure which allows the gradual development of Policies. He is basically proposing that all discussion again return to the rowdy environment of the Beer Parlour. My experience was that the Beer Parlour was generally a never ending discussion that never reached any conclusion, that never developed a single policy in Wiktionary. I totally oppose his idea.

What I also have to question is the basically sneaky way that is being used to try to achieve this. When the change which would be brought about by these policy method deletions is advocating that all policy discussion take place in the Beer Parlour, isn't it rather odd that there is actually no mention in the Beer Parlour of this planned complete change to the very idea of policy development. In fact there seems to be no (easily found) coherent explanation of what CM is really proposing to put in place of that which he wants to demolish.

I see no merit whatsoever in what CM is proposing. It is purely destructive. It would leave no real way of developing policies. In the past, no policies were developed until this policy development structure was put in place and a concerted effort was made to develop policies beyond the discussion stage outside of the highly volatile forum of the Beer Parlour. In future, CM would have us believe that somehow we could make a leap from Beer Parlour discussion (more like a shouting match half the time) to a fully fledged, approved policy, without any intervening stages. My view is that the change is a recipe for killing off any future policy development. It is not a positive move at all.

I have to observe that it seems the proposal is quite naive about the whole need for policies, and the ways policies are developed in real world organisations. The genius of an idea may come from a Beer Parlour discussion. But to make it all the way to an Official Policy it needs to go through various development stages of serious consideration. It involves draft proposals, policy focus groups, white papers, proposals, and only at the end, a vote on the Official Policy. The Beer Parlour is not the place, nor the right tool, for this sort of back room work to take place to fully develop and refine a policy.--Richardb 10:58, 19 March 2007 (UTC)

What you refer to as the "present Policy structure" is entirely obsolete, it was not being used. All he is doing is marking the remnants for deletion.
The actual present "Policy structure" is to tag draft policy documents with the {{policy}} template with the draft= parameter to describe the status as it is drafted and discussed here and on its talk page, then conduct a WT:VOTE and remove the draft=. It then takes a vote to make changes. That is the whole Torah, the rest is commentary. Robert Ullmann 11:24, 19 March 2007 (UTC)
Far from it. He has not made real changes. See Wiktionary:Policies and guidelines, which still has all the steps described, and is thus, according to CM's own tagging, now Policy. :-) Show me some evidence that CM ever took this Policy change to a vote. Show me any serious discussion of the change. In fact, since you say "The actual present "Policy structure" is to tag draft policy documents with the {{policy}} template", how about you show me where that policy is? The whole point is that the discussion of any change, by either the past policy or the CM idea, should be on the talk page of Wiktionary:Policies and guidelines. But it is not. This has all the appearance of a totally unilateral move by CM, very illogical and very incomplete. Very destructive. Very much not properly discussed and voted on. Just a typical CM jackboot approach. He is a techo through and through. He should stick to techo stuff.
OK, we could use the {{policy}} template with the parameter. But, to me, this is a typical unnecessary complication much beloved of techos, hated by the ordinary user. And totally unnecessary.
You say "What you refer to as the "present Policy structure" is entirely obsolete, it was not being used." What you mean is that things were not moving in it. Perhaps some of the policy ideas captured were in fact stable, and could have been promoted. But CM has always been one of the last to do any real work on Policy. He much prefers to shout louder than anyone else and do things unilaterally. To be destructive of other peoples' work. (comment continues below)
Wait just a cotton-pickin' second! Are you talking about me or someone else? Your other comments may have had some basis, but what is this all about? Unilateral changes? Obsolete/ignored/superseded policies were stable? What on earth? No, those proposed policies were in DIRECT conflict with existing practices, and represented (in each case) one person's POV of how things should be. --Connel MacKenzie 15:31, 19 March 2007 (UTC)
Funny, the proposed policies stood for a year or so, and the fact they are stable generally means acceptance. During the process I constantly publicised what I was doing, and, contrary to your statement, quite a few other people did contribute to the ideas. Richardb.
(comment continued from above) I stand by the current policy of Wiktionary:Policies and guidelines. And it says we have the various development steps. That policy should not be changed without discussion and a vote. So it still stands. So that is the clearly stated current policy, not CM's half-baked, half-implemented, half-forgotten proposal.--Richardb 11:45, 19 March 2007 (UTC)
Your abuse of Connel is completely, totally out of line! It constitutes a personal attack. Stop right now. (comment continues below)
I contributed heaps to Wiktionary for two years or more, till I got totally p'd off by CM's too destructive approach. I'll desist from calling a spade a spade when he stops trampling all over other people's rights.--Richardb 12:15, 19 March 2007 (UTC)
While some of that is true, that doesn't automatically give you the right to assume bad faith. Richardb and I have at times worked exactly towards the same goal, other times directly opposite (while both trying to achieve the same end result.) While I understand Richard's bitterness now, anyone who wasn't here for all of the fireworks probably does not understand the ins and outs of the situation. I'd like to request that no one fight for me, per se. Richard has some genuine complaints, along with some very major misconceptions about what has transpired, and those circumstances. --Connel MacKenzie 15:21, 19 March 2007 (UTC)
Thanks for the conciliatory note Connel. Now I've got your attention I'll try to be more polite. Richardb.
(comment continued from above) For everyone else's information, this has been discussed; a lot of it was on the IRC channel; Connel is in no way whatsoever acting unilaterally or improperly. There is a lot of cruft to clean up. Robert Ullmann 12:02, 19 March 2007 (UTC)
The IRC channel has absolutely no standing in deciding policies. You make no attempt to point to any evidence in the log, or the talk page, or anywhere, that this was ever discussed in writing. I can only assume that is because there is no written log of the discussion. So how does that fit into any sort of policy? And are you going to point to the "policy" you purported to quote? Or does the policy I pointed to have more standing? You are only demonstrating your own complete ignorance of the written policies of Wiktionary, and your unfounded faith in CM's good faith. Point to some real evidence, or just back off with your useless platitudes.--Richardb 12:15, 19 March 2007 (UTC) (In no mood to be polite with people who put politeness above actually following the rules. Connel is clearly way outside the written rules. If you can find any rules he has followed in this area, please point them out to me. Otherwise just - shut up! The most cruft to clean up is the useless waffle about being polite.)--Richardb 12:15, 19 March 2007 (UTC)
By the way, I checked the Beer Parlour Archives for January Wiktionary:Beer_parlour_archive/2007/January#63275614825 and found not a scrap of discussion about this issue, yet the RFD's were put up around Jan 27th. Did find a bit of discussion in the Sept06 Archives, but nothing to actually back up what Connel did.--Richardb 06:57, 20 March 2007 (UTC)
Is that a public expression of intent to wheel-war? That sounds like fun... --Connel MacKenzie 15:35, 19 March 2007 (UTC)
Wheel wars are something that exclusionists indulge in, not Inclusionists such as me. I haven't touched a single one of your entries in regard to this debate (But couldn't resist RFDing your apparently useless, unexplained Category on "Pages with a shortcut"). But I do ask you to rethink. Inclusionists such as me just try to bury you in verbosity :-) See the note "A way forward" below.--Richardb 06:18, 20 March 2007 (UTC)

Try reading w:Wikipedia:Off-wiki policy discussion for the Wikipedia view on the standing of IRC discussions when it comes to deciding policy. To quote their highlights :-

  • "Consensus" in the Wikipedia context means consensus amongst comments posted on Wikipedia. Off-site discussions do not contribute to "consensus".
  • IRC can also be used for the purpose of consensus-building. Quite simply, Serious policy discussion should be common on IRC. When good ideas or proposals result from such a discussion, participants should publicly post a summary of the idea on Wikipedia.

So where is this summary of CM's idea posted on Wikipedia?--Richardb 13:00, 19 March 2007 (UTC)

At present, I don't recall even which section of w:WP:AN it is archived under. I'll try to dig up links this evening. --Connel MacKenzie 15:24, 19 March 2007 (UTC)
I'm confused here, why would Connel post comments about Wiktionary policy on Wikipedia? --Versageek 21:17, 19 March 2007 (UTC)
May I ask for some clarification on specifically which of Connel's actions are being called into question here? Perhaps a link to the relevant diff, or at least the page which he deleted, or whatever it is. I'm totally lost, and would very much appreciate being brought up to speed. Thanks. Atelaes 20:44, 19 March 2007 (UTC)
start reading here it's this entry, and several that follow. --Versageek 21:04, 19 March 2007 (UTC)
The very point I'm trying to make. Connel has used the RFD process to change Policy. There is no one place (that I can find) that puts forward a proposal for change. The talk pages of the affected policy pages do not include any discussion for the change. Richardb

A way forward ?

Being optimistic, is this an indication of Connel being willing to actually work on developing policies? If so, I'm more than willing to spend a bit of time working with him, and anyone else. But, we have to do it the right way. Changes have to be proposed and publicised and slowly a consensus built. All completely transparent and in writing in Wiktionary, in the talk pages of the pages affected. No using RFD to push changes through. First signs of goodwill I'd like to see would be:-

  • Connel to withdraw the RFD for/from each of these pages.
  • Connel to put up a written proposal somewhere (probably on the talk page of Wiktionary:Policies and guidelines) as to what he proposes. If there was a considerable IRC chat about this, can we have a summary?

I would also suggest that we possibly try to align somewhat with the more mature Wikipedia. See w:Wikipedia:Policies_and_guidelines. They seem to have "Policies", "Guidelines", "Proposals", "Essays". Not so different from "Official Policy", "Semi Official Policy", "Draft Policy", "Policy Think Tank", but perhaps not so apparently rigid and "bombastic". But nevertheless a recognition that it takes stages to develop a policy.

Hope we can work together on this Connel, even though I can barely spare the time.--Richardb 05:29, 20 March 2007 (UTC)

Wikipedia needs many levels of policy development because of the number of people involved. While I agree that we need more than one level, we should be cautious of going for an over-complex and over-rigid framework more suitable for a large organisation. The result of excessive complexity is that the system falls into disrepute and is ignored...which has indeed happened here. --Enginear 12:31, 20 March 2007 (UTC)
A few notes on this. First, I agree that simply nominating these things for deletion may not have been the best approach. However, it should be noted that, at the very least, he did not delete them outright. RFD still allows the opportunity for discussion and debate (as the very fact that we are now having this debate shows). Second, I think it should be noted that many of these pages were in a state of being sidetracked and ignored when Connel nominated them. Third, I would really like to have a page that has a listing of all the policy pages (as I noticed one of the RFD'd pages was). Wiktionary has such a ridiculous amount of policy (and yet rightly so), and I find it hard to keep track of it all myself. And, while I am not a veteran like some folks here, I'm not exactly a newbie either. Yet I still find myself unaware of certain policies. However, most of the pages that were nominated for deletion do need a great deal of work if they are to remain useful, as they certainly do not reflect current practice (which is really what Wiktionary policy is, in reality). (comment continues below)
If they don't reflect current Wiktionary practice, there are at least two ways to go.
  • Update the policy to reflect the current practice.
  • Modify practice to follow policy more closely, thus slowly edging away from the current poor practice.
Those who generally agree with the benefits of having written policies will tend to the latter, with some of the former. Those who generally disagree with having policies will tend to the former to some extent, but actually are more likely to just ignore any policies anyway. Which they are free to do. (Indeed that in itself is a Wikipedia policy). But no need for them to try to knock down policies which are useful to newbies, and to those who do want to try to build and use them. --Richardb 06:18, 20 March 2007 (UTC)
(comment continues from above) At least some of them merit (in my opinion) such work. Finally, while I agree that it is sometimes beneficial to have policy discussions in a location other than the BP (as, for example, I found the discussions on the talk page of the About Greek to be much more focused than many BP discussions), as it largely limits the people involved to those who are interested and somewhat knowledgeable in the topics at hand. However, each and every single discussion of this type absolutely must have a note on the BP publicly announcing that the discussion is happening and where. Atelaes 05:55, 20 March 2007 (UTC)
Absolutely. The BP should always have a noticeboard of what policies are being discussed. And guess what. That is what is there right at the start of BP. But, even so, it's worth standing up in the Beer Parlour every so often and shouting "Anyone interested in seriously discussing ..... should go to ..... for the serious debate". Which, I guess, is what I've done. Was it just a bit of a tactic to also throw a couple of swings at "my mate" CM in the process, to get some extra attention ?--Richardb 06:18, 20 March 2007 (UTC)
I get enough of that from rolling back vandals, thank you very much. As Versageek was baffled earlier, let me try and assemble some of the relevant events. There was a Wikipedia blowup about Wiktionary, with a couple Wikipedia admins visiting Wiktionary and immediately running afoul, based on their assumption that this is Wikipedia.
The policies that I tagged for RFD were specifically called out as being what led those contributors astray. Each was undeniably obsolete. Each one was also long abandoned.
During this time, much of the confusion was resolved on IRC. The fallout of that, after the RFDs, was my rearrangement of what the existing policies are, for visiting Wikipedians' sake. In a nutshell: WT:CFI and WT:ELE are the absolute pillars of Wiktionary. Discussions are (for better or for worse) held in the central WT:BP area. WT:VOTE is used to implement/validate new policies and practices. What I abandoned was devoting a couple of hours per week of my time to keeping them up to date...once the three Wikipedians in question got comfortable, there was no urgent incentive to pursuing the policy maze that Richardb had originally set up (for further simplification/dismantling.) NOTE: Richardb spent a lot of time and effort single-handedly trying to implement a policy structure that he thought was appropriate...however, with the lower traffic of en.wiktionary.org, the system was enormously too complex, and overkill for the situation by several orders of magnitude. Remnants of his proposed policy structure (which everyone ignores) shouldn't be left around for Wikipedians to trip over...that was the impetus for the initial RFDs!
So, where do we go from here, indeed? --Connel MacKenzie 16:16, 23 March 2007 (UTC)

Revert first, look later

After a string of incompetencies by mister Connel Mackenzie I see everyone talks about recently (see my IP's Block Log for edification), today I see another idi... um, fellow, reverting my changes and then his after probably actually SEEING what he has reverted. Now, I know that these people are busy, but to revert a change based on the comment, or worse, on Connel's side, for just editing a word that he knew as mostly vandalised seems to me incompetence, not to mention a violation of a certain statute, if I'm not mistaken, of this site's, that mentions "good will" assumed of one's modifications. That may apply to people that actually read the modification, but what do you call those incompetent idi... um, folks that just revert 'cause they don't like the edit summary or the word edited? 86.107.8.10 15:56, 19 March 2007 (UTC)

If you think any reversion of edits on Wiktionary is a violation of any statute, then you are indeed mistaken. Cheers! bd2412 T 15:59, 19 March 2007 (UTC)
Well, if you tolerate bans for no reason from admins or modifications to the detriment of an article and take no action against that or see no problem with it, I must congratulate you on a job well done promoting vandalism. —This unsigned comment was added by 86.107.8.10 (talk • contribs) 16:12, 19 March 2007 (UTC).
I didn't say that. All I said was, there's no statute violated. But take it however you wish. Cheers! bd2412 T 16:28, 19 March 2007 (UTC)
I'm not sure what you're saying. If you mean that no reversions violate policies, then I respectfully disagree. If you mean that not all reversions violate policies, then you're correct, but I'm not sure what your point is; anon wasn't claiming otherwise. Rather, he was saying it violates the "assume good faith" policy (w:WP:AGF; I don't know if Wiktionary has a similar counterpart) to revert an edit without looking at it. —RuakhTALK 19:55, 19 March 2007 (UTC)
No, I'm just nitpicking. We have policies. We do not have statutes. Cheers! bd2412 T 22:15, 19 March 2007 (UTC)
It would help if you referred to a specific reversion or edit, rather than making vague grumbling noises. We can't fix a generic problem without addressing the specifics first. --EncycloPetey 16:16, 19 March 2007 (UTC)
Anon is referring to http://en.wiktionary.org/w/index.php?title=Special:Log&type=block&page=User:86.107.8.10 and http://en.wiktionary.org/w/index.php?title=pizda&diff=next&oldid=2149808. Frankly, I agree with him/her: Connel MacKenzie (talk • contribs) seems to have acted indefensibly in this case. Anon made a series of contributions pertaining to Romanian, all seemingly reasonable and correct (I don't speak Romanian, but that's how they seem), culminating in a contribution to pizda adding the Romanian sense of that word (which, unsurprisingly, is the same as the sense of that word in the various nearby Slavic languages). Connel MacKenzie responded by reverting the edit and blocking the user, writing "don't mess with constant vandalism targets please". This is indefensible; the edit is quite reasonable-seeming, and he seems to have made no effort to determine whether it was accurate. If he feels that anonymous editors shouldn't edit this page, he should semi-protect it rather than block any who try. —RuakhTALK 18:55, 19 March 2007 (UTC)
I looked into this case a while back (the anon had posted a friendly little note on Connel's talk page). The extra history behind this is that someone had been adding a Romanian section, when Dijan had noted (a number of times) that a Romanian section did not belong here, but rather at pizdă. A number of different anons had attempted to incorrectly implement a Romanian section at this entry, all being reverted. Perhaps Connel acted a bit hastily in this block, but it is not as though there was no reasoning behind it. This page definitely did have a history of anons doing things that shouldn't be done to the page. To the original author of this thread, as EncycloPetey notes, you would do well to cite specific grievances, and to ask for specific remedies, instead of making incoherent claims and poorly veiled attacks. I agree that Connel did in fact make a mistake, but I have yet to find an admin who hasn't. However, a quick search of your own contribution history finds you making the understandable mistake of adding a false link [4], so I think it should be admitted that no one is perfect. If you feel that some action is necessary in response to Connel's mistake, then please propose one. Otherwise, I suggest you get on with your life. Atelaes 20:29, 19 March 2007 (UTC)
I'm confused: there's currently a Romanian section at pizda, and it's been there for almost three weeks with no comment. What's changed? —RuakhTALK 21:02, 19 March 2007 (UTC)
pizdă and pizda are different words. Cynewulf 21:09, 19 March 2007 (UTC)
No, they're not … —RuakhTALK 22:07, 19 March 2007 (UTC)
Well, the pizda entry says that one is articulated and one is unarticulated. However, I don't quite understand what that means. Anyone care to curb my raging ignorance? Atelaes 06:02, 20 March 2007 (UTC)
Romanian, like Bulgarian, marks definiteness of nouns in the ending. For this particular word, pizdă is the citation form and means cunt. The spelling pizda means "the cunt" (nominative and accusative); pizde is genitive/dative, and pizdei is the definite genitive/dative (of/to the cunt). —Stephen 23:01, 20 March 2007 (UTC)
As I've stated, I don't speak Romanian; but I understand it to mean that pizda means the poontang while pizdă means simply poontang (see w:Romanian grammar#Articles). —RuakhTALK 16:58, 20 March 2007 (UTC)

On a related note, perhaps Wiktionary should have a policy specifying when users can make automated reverts. Such a policy might specify that automated reverts are to be made only in cases of clear vandalism. I know the arbitration committee on Wikipedia criticized an editor for not explaining his reversions once. Without an explanation, many editors assume the worst as to why a reversion was made. Some editors are embarrassed by them because it makes it look as if their edits were so bad that an explanation is unnecessary. Users who write detailed summaries of their edits may feel like they're being ignored. Others may feel like their newcomer status is being highlighted by the use of such powerful tools.--Νικα 22:41, 20 March 2007 (UTC)

I have to apologize, but let's bring this back down to reality :) An anonymous contributor edited a vulgar word in a language that most of us do not speak or understand. Why should we trust the edits? The anonymous editor made no attempt to establish a track record of credibility (this is especially important when editing slang words). When I make an edit, people generally trust that it's correct because I have demonstrated, over time, that I am knowledgeable in the languages that I work on. Anybody can look at my edit history and decide that for themselves. This is key, especially when the word is poorly documented. If nobody can verify that you know the language, your only other option is to include where you got the information from (not in the history comment, but in the actual article under the references section). Remember, we're all anonymous here. Nobody should believe what anybody does unless it can be verified in some way. -- A-cai 02:37, 21 March 2007 (UTC)
Shouldn't editors assume good faith, though? And if someone (a sysop, editor — anyone) isn't knowledgeable about a subject, shouldn't they simply leave the entry alone? Or, if they find it suspicious, they can RFV it, rather than simply revert edits. But I could be misunderstanding your comments. Could you elaborate on exactly what types of practices you support vis-a-vis newer editors?--Νικα 03:29, 21 March 2007 (UTC)
Frankly I think we should do what Wikipedia does and lock down our most frequently vandalized words. After all, the meaning of fuck or fag is not likely to change radically anytime soon, making it far more stable as a dictionary entry than George W. Bush or Hillary Clinton are likely to be as encyclopedia articles. I suggest that for particularly vandal-prone words, we make the entry as complete as possible and then lock it down, with an invisible note at the top of the page to tell would-be editors to take their suggestions to the talk page. Cheers! bd2412 T 03:42, 21 March 2007 (UTC)
Well, before we lock something down, we should make sure it has translations in all (order of magnitude 10000) languages, all derived terms set, all "see also"s added, all synonyms and all alt spellings. -- unsigned
They have pretty thorough coverage of major languages as is. With respect to synonyms, we have WikiSaurus. Anything else, take it to the talk page. bd2412 T 05:20, 21 March 2007 (UTC)
In most cases, the correct action would be to submit suspicious definitions to WT:RFV. However, vulgar and contentious words tend to invite vandalism, which is why sysadmins tend to be quick to revert suspicious edits. Is this the correct course of action? I guess that depends on your perspective. I'm simply saying that if you make such edits, you should be able to demonstrate in some way that the edit is legitimate. Perhaps bd2412's suggestion is correct. Maybe Wiktionary should only allow registered editors to directly edit contentious words. This would not be out of keeping with what's going on over at Wikipedia. -- A-cai 04:05, 21 March 2007 (UTC)
Guys, you're sidetracking part of the issue. My problem isn't that these guys take one look at the definition and if they see something that they don't understand, they revert it. That would be... let's say, a little arrogant, but somewhat understandable. My problem is that, by all appearances, THEY DON'T LOOK AT WHAT THEY'RE REVERTING. Simple as that. They just hit the revert button, hell if I know where they have it as to not see what they're reverting, and then look at what they did... if they actually do, that is. It wasn't Connel's actions that made me take stand, one bad weed I can more or less understand, but it was yet another blistering show of ignorance: http://en.wiktionary.org/w/index.php?title=Australia&action=history which makes me think this is a regular habit for people with some experience around here. It didn't end in a ban like the last time, but it sure got me annoyed once more.
As for discussions about track record, achieving credibility... come on! Do you think every (or any, for that matter) anonymous contributor knows about the ground rules you set up for them? For me in particular, I just edit where I see appropriate to change/add/whatever something; I don't care about what words I edit or how many. paused
inserted response I understand your point of view as a casual contributor with respect to assuming good faith. However, you must understand that my comments about credibility have nothing to do with "ground rules." I'm simply stating reality: life is not fair! A sysadmin is not likely to give the benefit of the doubt to an anonymous contributor who makes a questionable edit (i.e. an edit that cannot be independently verified by a sysadmin). This is because too many people (registered or not) make bogus edits to words. One glaring example of this is editing a word for a language that you do not speak!!! You can complain about the sysadmins all you want, and in some cases, you may be justified. But life is a two way street; don't do things that will get you reverted, and you won't be reverted. I've added thousands of words to wiktionary over the last year. I have only had my edits called into question on rare occasions. In every case, I resolved the matter, not by cursing at the person who did it, but by citing evidence for the validity of my edit. end of inserted response -- A-cai 03:07, 24 March 2007 (UTC)
resume
bd2412, nice of you to try to close the issue by nitpicking (taking advantage of the fact that I said statutes instead of policies... that was quite fair-play of you, I must admit).
inserted response Perhaps, you are unaware of the fact that bd2412 is a lawyer :-) end of inserted response -- A-cai 03:07, 24 March 2007 (UTC)
resume
And for the record, as I saw mentions of my gender, I am a male :P Signing off is here default getaway 86.107.8.10 with a friendly warning: mister Ullmann was strike two, if there is ever another strike of stupidity from someone that considers himself/herself superior enough to revert stuff just 'cause they don't like the editing message, as much as I appreciate this project's ambitions, I shall feel forced to use the "big guns" (dear old Proxy Switcher works like a charm ;) ) to thank them in a civilised (NOT) order. Toodles! 86.107.8.10 14:44, 21 March 2007 (UTC)
This is a dictionary, so I feel I have the right to be particular about the meanings of words. A "statute" implies that violation thereof is unlawful. A policy is more like a guideline to be interpreted in accordance with the dictates of the situation. bd2412 T 20:33, 23 March 2007 (UTC)
Wow. All this from a troll/vandal 86.107.8.10 (talk • contribs), who has been reentering an item that has previously failed RFV/RFD, who not only resorts immediately to personal attacks, but gets support for those attacks? Why wasn't this section immediately rolled back? WTF is going on here? --Connel MacKenzie 15:45, 23 March 2007 (UTC)
I don't think he's a troll/vandal, and you've provided no evidence that he is. All of his contributions seem well meant. The only person he seems to be attacking is you, which I think is understandable, seeing as you had blocked him for no reason and have never looked back. (That doesn't make it acceptable, mind, but eminently understandable.) —RuakhTALK 16:43, 23 March 2007 (UTC)
User:Ruakh that is bullshit, and you know it. Someone who not only expressed intent to use open proxies, but is already intimately familiar with them is not a vandal? It is a good reason to review his edits in detail, but certainly no reason to feed the troll, nor to hide in fear from threats of vandalism. If he wants to post goatse on my user talk page now, we can certainly use the exercise of blocking new/residual open proxies. As to the pizda entry, go take a look at the history. Frankly, I trust Dijan's research more than Stephen's knee-jerk assumption of good faith, but I don't have a handy method of checking either, at the moment. Did the vandal resubmit with three citations? Is it attested? Come off it. It is run-of-the-mill vandalism. --Connel MacKenzie 17:36, 23 March 2007 (UTC)
Sorry for saying so, but you're the one bullshitting; I guess you find that easier than recognizing your error and apologizing for it? I'm also familiar with open proxies; I've never used one, but to be honest, if an administrator blocked me for no reason, I might decide to use one; does that make me a vandal, too? (Granted, I probably wouldn't try to circumvent an unjust block, as I'd more likely just say "fuck this" and give up editing entirely — but I can't say for sure one way or the other. I guess it depends whether I felt the problem was Wiktionary in general, or a single power-mad administrator.) It seems quite obvious to me that pizda is a legitimate Romanian word, whose definition should be something like {{form of|articulated (definite) nominative|pizdă}}. A Google search for google:site:ro "pizda" will show you instantly that pizda is much more common on Romanian Web sites than pizdă is (whether because people don't bother typing the breve, or because the definite nominative form is more common than the indefinite nominative, or what). Also, your attack of Stephen strikes me as just this side of crazy; if anyone here has made a "knee-jerk assumption", it's you. You blocked a user for a good-faith (albeit slightly misguided) edit, and now you're acting like his angry response justifies your having blocked him. —RuakhTALK 19:34, 23 March 2007 (UTC)
O.K., having said that, I see that now that you've re-blocked him, he's started to genuinely and blatantly vandalize under a different IP (163.28.176.4). So, congratulations; you've been outdone in the bad-guy department. *is done defending the anonymous-editor-turned-vandal* —RuakhTALK 19:51, 23 March 2007 (UTC)
Again (as if you didn't already) see comments below. He always was a vandal, from the very start. I suggest you redact your "outdone" comment. --Connel MacKenzie 20:27, 23 March 2007 (UTC)
Okay, here's a better example, untainted by all the vulgarities and slang nonsense. This reversion violates Assume Good Faith in my opinion. Deletions like this should be commented at the very least, and probably noted on the talk page as well. DAVilla 19:20, 23 March 2007 (UTC)
You're suggesting that isn't nonsense? DAVilla, that edit is nonsense - you have now (a half a month later) dredged up a student's (now assistant professor or something) web-page as "evidence" that all astronomers make the same mistake as this one former student? If astronomers use the jargon term metallicity, with a similar meaning, then that entry might merit an entry here - but such a blatantly bogus redefinition of metal? Without references? What is going on around here? --Connel MacKenzie 01:00, 24 March 2007 (UTC)
Since this vandal likes to point to it: User talk:Connel MacKenzie/archive-2007-3#Thanks for the ban. Note clearly that any "good" contribs this guy has ever made are far outweighed by his initial, and constant, vandalism, interspersed throughout. This is not some guy who is "slightly misguided"; rather, he is an insipid troll. Frankly, the more obvious vandalism he's doing now is much easier to deal with than the subtle mistakes he was intent on inserting into Wiktionary. --Connel MacKenzie 19:48, 23 March 2007 (UTC)
Sorry for butting in here as a relative newcomer, but may I suggest that Connel MacKenzie might do best to take a small step back here and admit to a slight glitch. Glitches happen! I see good faith in general (although, admittedly, I cannot back up this faith with "evidence"). It is clear that Connel cares a lot for the Wiktionary project. --Keene 01:17, 24 March 2007 (UTC)
Nope. As evidenced, it was no glitch to block the vandal. --Connel MacKenzie 01:55, 24 March 2007 (UTC)
You seem to call users vandals all the time, Connel. --Keene 02:16, 24 March 2007 (UTC)
I never tried to cite that definition, as it was never posted on RFV. What I was doing was taking 5 seconds to give a reason for my reversion. If anyone does even a half-ass job of looking, they'll see it's actually quite common in astronomy. The point is that there's really too much information out there for any single person to claim to know what kinds of entries are bogus or not. That's why you have to assume good faith. DAVilla 17:27, 24 March 2007 (UTC)
I have to partially disagree with DAVilla. It's not about what any one person knows or good faith etc. It's about the integrity of the information in Wiktionary. "Take my word for it" is not a viable solution for any of the wikis. I will grant that there are plenty of words, often not found in mainstream dictionaries, that should be included in Wiktionary. However, someone must have seen the word somewhere, or it shouldn't be on our site. Here is my approach to this (mind you, I'm not stating Wiktionary policy, just my own opinion). For example, if I create an entry for a basic Chinese word such as 杯子 = cup, anyone can verify the definition from any number of on-line resources[5]. Technically, I should provide proof of the entry's validity, but I didn't in this case (laziness), because it's so easy to verify. Now take a look at 缓冲器. This word is poorly documented in other dictionaries. As a result, I was questioned about it by a contributor (see: User_talk:A-cai#bumper). My solution was not to say, "Trust me, I speak the language and you don't!" Why should that person have to take my word for it? He doesn't even know me. I say, good for him! My solution was to find proof, and then include that information in the article. -- A-cai 02:53, 25 March 2007 (UTC)
That's not what this discussion is about. I don't think DAVilla is arguing for assuming that someone is right or wrong, but for assuming good faith. No one should ever assume that someone is right or wrong. If you have a strong suspicion that something is wrong, you might request verification. If you strongly believe that something is wrong, you might undo an edit. But under no circumstances (in my opinion) should you assume that someone is wrong based solely on the age of their account or on whether an addition is sourced. Perhaps it has to do with my world view, but I believe that humans, in general, act in good faith. I think that they are good at heart. Look at the recent changes for this wiki and 99.9% of the changes you will see are made in good faith and are factually valid. Most of the additions are also unsourced. Assuming that these edits are by their unsourced nature incorrect would be a logical fallacy and also tragic. We saw with Essjay on Wikipedia that having an old screen name on the internet means nothing. The best way to ascertain accuracy is to discuss the content and not the editor.
Also, to get back to my other point: under no circumstances should anyone be reluctant to explain why they have done something. If you have carefully examined an edit and you are reverting in good faith, then you should have no trouble explaining yourself. In fact, you should be eager to tell everyone. On the other hand, if you do not have a valid reason for making an edit, then you will be reluctant to explain yourself.--Νικα 05:44, 25 March 2007 (UTC)
Unless you're talking strictly in the abstract, you seem to be misunderstanding something here. No one asked the anon to justify his edit; rather, an admin reverted it and blocked him for having made it — and didn't even leave a note at his talk-page explaining why. It's fine to be a bit cautious while assuming good faith — we have to be, especially at oft-vandalized entries — but the admin made a strong assumption of bad faith without any support for that assumption so far as I can discern (though he maintains that there is support for it). —RuakhTALK 05:13, 25 March 2007 (UTC)
Let me clarify my position: if good faith is so important, then the anon must also assume good faith on the part of the sysadmin. The proof that you have cited of Connel's unfairness is not a slam-dunk case in my opinion. It appears to me that Connel was making a good-faith attempt to stop what he believed to be a vandal. Remember, the anon still has not demonstrated a proficiency in Romanian, nor has he offered any proof from a credible source of his definition (an example sentence might be nice; for example, see 上穷碧落下黄泉). Had one of those things happened, I might have been more likely to side with the anon. What he has done instead, and you can read it above, is threaten to make edits via some kind of voodoo proxy in the future, so that he can't be blocked as easily (rather than making an attempt to support his edits with evidence). With respect to the original potty-mouth word that started this whole thing, until a fluent Romanian speaker comes along and sets us all straight, I'm not sure what else we can do at this point. -- A-cai 09:08, 25 March 2007 (UTC)
Someone is lucky that the anon was not civil after what, aside from the history of the page, could look like an unjustified slam. Had the anon not been the same contributor, had he acted civilly and been able to credibly source the definition, there would have been no justification for blocking without communication, regardless of the history of the page, and I might have recommended disciplinary action against the admin for violation of AGF. That the anon did not act civilly means the history of the page backed the admin. So I think the point is that Dijan knew what he was talking about, and that Connel is either just lucky or really knows what he's doing, which I hope doesn't mean that being suspicious means singling people out and driving them to acts of vandalism like this. DAVilla 11:45, 1 April 2007 (UTC)
"Unjustified slam"? I suppose it could look like that, if you are blind, perhaps. --Connel MacKenzie 04:09, 2 April 2007 (UTC)
Yes, that's almost exactly what I said. If you're blind to the history of the page, then it could look like (not equal to "is") an unjustified slam. The history of the page makes the story turn into one of an ambitious contributor not listening to reason. The incivility of the anon makes the story into one of an ambitious contributor not listening to reason. If the history were debunked with proof of the word, and if the anon had acted civilly, then the story would be completely different. If the history were debunked with proof of the word, and if the anon had acted civilly, then the block would have been inexcusable. But that is not the case. You blocked the right guy this time. Is that because you're lucky or you really know what you're doing? For a successful stockbroker, they say it's impossible to tell the difference between luck and skill. And so it might be here. And so why bring up the question? To point out its irrelevance. You blocked the right guy this time, and everything else we can say, and pretty much everything I said, is nothing more than speculation on the difference between luck and skill. DAVilla 19:04, 2 April 2007 (UTC)

Template:neologism

Do we have guidelines anywhere for what it takes for a word to get this tag? It seems to me like a last-ditch plan for prescriptivists to defame words which thwart rfv/rfd :-) Several of the tagged words were not neologisms, so I removed the tag from them; many others would probably be better off deleted. The very nature of this tag just seems contradictory: either something passes RfV/RfD, or it does not :-) Though, I don't want it to sound like I have anything against DaVilla (who created the template); in fact I think DaVilla is a fantastic contributor and we can all learn much from their contributions :-) Anyway, if we are going to use this template, we could make it link to a page describing the specific, objective criteria used to make the classification. That is more in line with Wikimedia philosophy in general, and would no doubt stir joy and happiness in the hearts of all our readers!!!! :D -Signed, Language Lover

Word. I think we're much better served by appropriate use of {{context}}. —RuakhTALK 16:46, 20 March 2007 (UTC)
To clarify: the template's talk-page does say how it's to be used ("Use this template […] on pages that have passed the RFV process or are otherwise well sourced, but which do not appear in any of the six major English dictionaries […]"), but I'm not sure it's actually being used that way, and the current wording is grossly misleading. —RuakhTALK 16:53, 20 March 2007 (UTC)
I fail to see why inclusion in the "six major English dictionaries" is relevant. For one thing, Wiktionary is itself a major dictionary :-) For another thing, such a philosophy reeks of copycatting; I mean, if we're just mirroring those dictionaries, how are we better than dictionary.com? For yet another thing, the classification of dictionaries as "major" or "non-major" is mostly arbitrary (the arbitrariness is of course obscured by lots of appealing to authority and such); why is OED "major" and Urban Dictionary not? Does the fact that UD's contributors don't all have college degrees mean that the words they speak aren't words? The way I see it, the "six major dictionaries" can look to us for inspiration/confirmation, not the other way around (and if it's not like that now, it ought to be our goal anyway) :) Especially with all the wonderful work all of you guys like Ruakh do :-) Language Lover 17:19, 20 March 2007 (UTC)
Although I have commented that the neologism template in its current state is useful, Language Lover's comments are so perfect, I can't help but second them. What Language Lover said is the essence of "wiki is not paper" IMHO. If a word exists somewhere out there, it should be here and others should be able to find definitions and usage notes on it here. — V-ball 17:24, 20 March 2007 (UTC)
A few comments. First, in my opinion, there is a glaring distinction between the OED and the Urban Dictionary. Yes, one of them is that the editors of the OED have degrees, and, in general, the editors of the UD do not. But more importantly, the OED is consistent, extremely well researched, and more representative of the language as a whole. UD includes definitions used by small communities, or sometimes solely by individuals, whereas the OED's definitions generally represent the semantic understanding of millions. That being said, there is, nonetheless, some merit in your comments, Language Lover. There definitely is a point where we should strive to carve our own niche in the dictionary world, and not simply try to imitate the OED. However, at the same time, we have to deal with the ambiguous line between descriptivism and prescriptivism. I think that most of the editors here are quite in favour of going with the descriptivist school, meaning that we are striving to describe language as it is actually being used, not trying to tell people how they "should" use their language. However, many of our readers do not share that frame of mind. Many of them look to a dictionary to find the "correct" spelling of a word, or the "correct" context in which to use a certain word (I must admit that I do from time to time). If we do not make some distinction between correct and incorrect (in certain situations), then we are misleading our readers. Whether that is our fault or theirs is irrelevant. That being said, Ruakh makes an excellent point that the context tags might often be more appropriate and more useful for many places where the neologism tag is currently in use. Finally, thank you very much, Language Lover, for properly signing your comment. Atelaes 19:51, 20 March 2007 (UTC)

Good discussion from everyone :-) It is an unignorable fact that one purpose of a dictionary is to help elementary school students check whether words should be put in their book reports. The prescriptivist response is to single out "bad" words and "condemn" them. The progressive response is to take the stance that we are the ones at the forefront: it's not a matter of the words being bad, but of the English teachers being out of touch. I think for the most part everyone agrees that tags like {{slang}} are an excellent compromise :) I doubt many readers will say to themselves, "I don't mind putting slang in this report, but I can't put neologisms in it!" :D

For the sake of being constructive, here's a possible guideline for neologism status. I'm just putting this up with little thought; hopefully others will expand on it.

  • To be considered a neologism, three out of the following four conditions are required:
    • The word (or sense) is known to have been coined within the past five years, by a single individual, for the sole purpose of coining it. (See santorum)
    • The word is not a straightforward construction by agglutination (ruling out common-sense words like windshieldlike, podiumward, etc., which could easily be "accidentally" "coined" by an author without even realizing it; this also rules out tsunameter)
    • In a 3/4+ majority of the word's citations, the author talks about the word itself, or defines it as soon as it is brought up; as opposed to the author naturally, seamlessly slipping it in among other words (this rules out things like lolicon)
    • The word is not an eponym, Latin/Greek construction, or other such construction, defined in peer-reviewed academic literature, government literature, etc.
In addition, the word cannot have more than twenty-five independent citations in preserved media (as described in CFI).

This is just a rough proposal; hopefully it'll inspire some terrific discussion :-D Language Lover 00:47, 21 March 2007 (UTC)
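The rough proposal above is essentially a decision rule, so it can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not an adopted Wiktionary guideline or tool: the names `TermEvidence`, `is_neologism`, and `mostly_self_referential` are hypothetical, the yes/no conditions are assumed to be judged by a human editor and recorded as booleans or counts, and "3/4+ majority" is read as "at least three quarters of the citations".

```python
from dataclasses import dataclass


def mostly_self_referential(cites_about_word: int, total_cites: int) -> bool:
    """Condition 3 (assumed reading of "3/4+"): at least three quarters of
    the citations talk about the word itself or define it on first use."""
    return total_cites > 0 and 4 * cites_about_word >= 3 * total_cites


@dataclass
class TermEvidence:
    """Hypothetical summary of the evidence gathered for one word or sense.
    Field names are illustrative only; no Wiktionary tool defines them."""
    coined_recently_by_one_person: bool  # condition 1: coined within ~5 years by one individual
    not_simple_agglutination: bool       # condition 2: not windshieldlike-style compounding
    cites_about_word: int                # citations that discuss or define the word
    total_cites: int                     # independent citations in preserved media (per CFI)
    not_formal_coinage: bool             # condition 4: not an eponym/Latin/Greek academic coinage


def is_neologism(term: TermEvidence, max_cites: int = 25) -> bool:
    """At least three of the four conditions must hold, and the word may
    have no more than twenty-five independent citations."""
    conditions = [
        term.coined_recently_by_one_person,
        term.not_simple_agglutination,
        mostly_self_referential(term.cites_about_word, term.total_cites),
        term.not_formal_coinage,
    ]
    return sum(conditions) >= 3 and term.total_cites <= max_cites
```

For instance, `is_neologism(TermEvidence(True, True, 9, 12, False))` returns `True` (three conditions hold and 12 citations is under the cap), while a word with 30 citations would not qualify however many conditions it met.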

Defining foreign-language verb forms

This is something that has been on my mind for a while. It seems to me that our current convention for defining non-English verb forms is not to give the translation, as with all other (as far as I know) non-English entries (and as recommended at WT:ELE#Variations for languages other than English), but to give a definition. What I mean is that just as, for instance, the entry for hola is "#hello, hi", which tells me what that word means in English, I would expect a verb form that I've looked up, like comido, abro, getroffen, or karju, to tell me "eaten," "(I) open," "found," or "(you) shout," with the appropriate glosses to designate which senses are meant. Instead, the definitions given there, and commonly at all non-English verb forms, are in the form of "The past participle form of the verb comer." "The first-person singular of abrir in the present indicative." "past participle of treffen," and "Second-person singular imperative of karjua." These definitions are confusing and not very helpful for the reader. The meanings of first-, second-, and third-person might be common knowledge, but it's not readily obvious to the reader what we mean by conditional, subjunctive, imperfect, past participle, etc., or how they are translated into English verb forms. Imagine going to a dictionary to look up fuesen and finding "familiar second-person plural imperfect subjunctive form of ir".

I can't make out any good reasons for them except for the ease of automation. I think we should make it clear that while bots might mass-generate easily created definitions like these, the ideal definition should give a proper English translation for the reader's reference, and should include a gloss if it is necessary for the reader's comprehension, as with other non-English entries. I'm thinking about a change like this one, which I think improves the meaning considerably: [6]. If I go fill in yéndose, should I write "leaving" as I've done for animado, or "present participle of irse"? Any thoughts on this? Dmcdevit 04:26, 21 March 2007 (UTC)

This is an excellent point you raise, one I've pondered a bit myself. One thing to keep in mind is that "translations" are really only approximations; there is no one-to-one correspondence between most languages and English. I think there's a balancing act going on. One type of reader uses the dictionary directly to translate text. For them, your suggestion would simplify things. The other type of reader uses the dictionary as a companion for learning a language. For them, the suggestion might make things seem overly complicated. See Ruakh's excellent examples below. Hmm, it is a subtle and interesting thing which you have brought up!!!! :-D Language Lover 04:40, 21 March 2007 (UTC)
While you make a good point, I think that the technical information should not be removed. It is highly useful to many people. In addition, I think the "soft redirect" nature of these entries needs to remain. People need to know that they are not seeing the whole picture. I propose a format similar to that which is in place at λύῃς. Atelaes 04:55, 21 March 2007 (UTC)
I don't mind including that information, certainly, but I'm still wary of putting it where the translation goes, when it isn't one. Could we move information like "Present active subjunctive 2nd singular form of λύω" to a "Usage notes" or "Etymology" (or "Verb form"?) section, or do something with it to differentiate? Dmcdevit 05:12, 21 March 2007 (UTC)
Oooh, I like λύῃς :-) You did a fantastic job with that word, Atelaes! :-) The only thing I'll add to your comment is that this "best of both cakes" approach should be optional, to allow the quicker, more mechanical old method as well. One other thing: I wonder how one would directly translate the te-form of Japanese verbs. Is that possible? It seems to me it is impossible. Language Lover 05:00, 21 March 2007 (UTC)
I think we should define the form in terms of the lemma, as is currently common practice here, for a few reasons:
  1. If the lemma has a number of different senses, then it makes sense for all information to be contained on the lemma page, rather than listing all the different senses for each inflected form (a maintenance nightmare).
  2. Forms correlate poorly across languages. When you define yéndose as "leaving", you ignore the fact that Spanish gerundios behave quite differently from English gerunds and present participles; yéndose often means "in leaving"/"while leaving"/"by leaving" rather than simply "leaving", and conversely, leaving often means "yéndome"/"yéndote"/etc. or "irme"/"irte"/etc. By using a standard template for gerundios, we leave for ourselves the possibility of having and linking to a useful Appendix:Spanish gerundios or whatnot, rather than giving a largely unhelpful "translation". (Note: I use the term gerundio because there doesn't seem to be a good English word for this form. Are we really referring to them in entries as "present participles"? That's grossly misleading, because Spanish gerundios are quite different from present participles in languages that have true present participles.)
  3. I think you underestimate people. I think most people looking up a Spanish word in an English dictionary have some basic familiarity with the terminology; and if any don't, it's at least helpful to know that yéndose is a form of irse, even if they won't get what form without actually knowing a bit of Spanish.
—RuakhTALK 05:03, 21 March 2007 (UTC)
Yes, I was being too brief with yéndose, but your good point about poor correlation is true of all translations between languages, including gerundio, it seems, not just verb forms. That's why we suggest glosses to convey the proper sense. In any case, I may have been unclear when I said that the current definitions are not obvious. I don't actually think they are definitions. Even in English, it would be like defining cats as "The plural form of cat". But that's more akin to a part of speech, not the real meaning of cats, which would be "more than one cat". It's not that people won't get it, but that it just doesn't convey meaning, except by being a degree removed from the actual usage of the word. Ideally, I would think it is best to put the translated meaning in the definition space, and then include the technical terminology specifying the precise tense/person/etc. in a section of its own. Dmcdevit 05:27, 21 March 2007 (UTC)
Your English comparison is a good one; but I think the solution is still to stick to an explanation of what the word is (plural form of cat, adverbial participle of irse, etc.), but to use italics so it's clear it's not actually a definition. By the way, I don't just do this for inflected forms; a while back, I rewrote English adjective sense #6 of gay in a way that gave no definition, only italicized explanation. In that case, it's because there doesn't seem to be an actual definition, whereas in the case of inflected forms, it's because I think an explanation of the word is much clearer than an attempt at translation, but it's the same result either way. —RuakhTALK 06:11, 21 March 2007 (UTC)
In your example edit, inspired is also the simple past tense of inspire -- what is the simple past tense of animar? (Assuming it has one; the French simple past isn't so common.) Inspired is also an adjective! What would you propose for 言われた, passive past tense of 言う say? "was said" and nothing more? How about 言いました, past polite form? How would it differ from 言った plain past? Cynewulf 05:07, 21 March 2007 (UTC)
I have been thinking about the formatting of λύῃς and similar words for some time now. This word is an excellent example of what Ruakh is saying. The subjunctive sense does not always mean "might" (although it certainly sometimes does), but has a whole array of nuances. Thus, the translation given is not entirely accurate, or at least not comprehensive. I think this illustrates an important conflict which goes on throughout all considerations on Wiktionary. Do we make it user-friendly or comprehensive? Often we can do both, and so it is not an issue. However, sometimes we cannot. For example, a word may have a subtlety in meaning which is not adequately covered in less than a decent-sized paragraph. But most users simply want a quick and dirty definition and are not concerned with nuances of meaning. In my opinion, we should always strive for both, if at all possible. In this situation, I feel that providing both serves the casual user who simply wants a quick definition of yéndose or λύῃς and then wants to get the hell out of here, as well as the linguist who wants to know what those words "really" mean. It seems that the two do not detract from each other. In addition, including both takes care of Cynewulf's excellent critiques. Atelaes 05:19, 21 March 2007 (UTC)
I guess this would have been a better way to do it? It's important to me, though, that the content of the usage notes is not an actual translation of the word and belongs in a separate section (as long as there is a true translation, that is), and that we should encourage adding true translations and glosses to words that have only tense terminology. Dmcdevit 05:40, 21 March 2007 (UTC)
I think that's a reasonable way to go. However, as we fill out these pages more (which I see as a good thing), we should come up with a way to show that these are "stub" pages, in essence, and that there is (hopefully) a whole lot more info waiting at the lemma page. Any thoughts on that? Atelaes 05:48, 21 March 2007 (UTC)
I would have the bot move its additions to the usage notes section, as in animado, and in their place leave a note, such as a "{translation needed}" sign (a template with a category?) after the # in the definition, with a link to some explanatory page. Perhaps not the most aesthetically pleasing, but it seems like the most correct option. Most regular verbs in English with clear translations will be easy to add without needing glosses. Dmcdevit 05:56, 21 March 2007 (UTC)
I have to say I disagree with that. Perhaps it could put it on the definition line and then also put it in a cat. as you say. However, there are a lot of inflections (do YOU want to go through the 90,000 Spanish inflected forms?), and I think we should simply admit that that's a project which shall be waiting for some time. Atelaes 06:04, 21 March 2007 (UTC)
Well, I do, just not personally. :-) My thinking was that either format without clear translations is less than ideal, but moving the current content to a usage notes section (a bot could do that, I'm assuming) at least clarifies the entry. It's a work in progress either way. It's not a big deal though. I would like to at least update WT:ELE (or wherever it should go) with the preferred format, because it appears to me (notice that the two non-Spanish entries, German and Finnish, in my original post were not created by bots) that real, live editors are now seeing the inadequate bot-processed creations as the conventional format. Dmcdevit 06:15, 21 March 2007 (UTC)
I think "Usage notes" is a little less than optimal. "Usage notes" is supposed to be a place for pointing out quirks and such. We could instead make a new section header called "Grammar". Or we could put the grammar data in the line where the word itself appears in bold. For example:
Esperanto
Adjective
kreota Future participle of krei (plural kreotaj, accusative singular kreotan, accusative plural kreotajn)
  1. which will be created
I guess that could cause trouble with words which have tons of conjugations on that line. But maybe such words should be dealt with like Japanese, with a separate conjugation table below? Incidentally, I think someone already pointed out the maintenance problem. If we do this, then any time we significantly change an unconjugated word, we'll have to make appropriate changes to all its forms... yikes :-) Language Lover 14:29, 21 March 2007 (UTC)

(Coming back to the margin.) I'd like to reiterate the point already made here (and which I made a long time ago, in another long-since-archived discussion) that giving translations of inflected forms, rather than grammatical information and a cross-reference to the uninflected form, is a bad idea. One reason for this is that the English translations may have many senses. The French word poser can be translated as "to set", but the English verb has dozens of meanings (take a look at it in the OED). So if I edit the page for posé (the past participle of poser) and just give the translation as "set", then — leaving aside the fact that the past participle is identical to the infinitive in English and so "set" is ambiguous (it needs a gloss) — it is unclear which of the senses of "to set" I am referring to.

Of course, you could (and should) include a gloss, and this would be one solution. However, suppose the word being translated has many translations, and someone adds, edits or deletes one for the uninflected form. Then, in theory, they would also need to update all of the pages for the inflections. If they make a mistake, that means a lot of pages to roll back, especially for languages like French, in which verbs conjugate to give very many different forms. If they didn't do the updates (which is very likely) then the pages end up giving different information, or, in the worst case, contradicting each other.

If there is just a cross-reference, none of this extra donkey work is needed, and users can still find all the information they need. Note that we already do this with English inflections: if the noun "foo" has the meanings "foo: 1. an X. 2. a Y. 3. a Z.", we don't give three meanings at the entry for its plural: "foos: 1. Xs. 2. Ys. 3. Zs"; we just say "plural of foo".

The grammatical information is useful for those who understand it, and those who don't can find out what it means by looking it up in Wiktionary or elsewhere. — Paul G 15:43, 21 March 2007 (UTC)

The issue raised by Ruakh about differences in usage in different languages (such as "yéndose"/"irse" for "leaving") can be covered by giving usage notes and examples in the entry for the inflected form. — Paul G 15:45, 21 March 2007 (UTC)
What I'm not understanding here with the concerns about ambiguities is that I don't see why the inflected form posé is any more ambiguous in translation than the infinitive poser. However the word is translated at the infinitive, it should simply have the same translation in the inflected form, except the English should be inflected to the proper tense as well. Ambiguities are a concern for all non-English words in translation; how does pointing back to the infinitive (which is then translated) with a tense specification change that problem? What makes these verb form problems different from normal translation issues with ambiguous English equivalents, which we just have to deal with and try to clarify as best we can with glosses or notes or whatever else the situation requires?
(If it's mostly about the extra work (editing the uninflected form requires changes to all its children), well, that's true, but it doesn't strike me as a very compelling argument. Endless work is the nature of the project.) Dmcdevit 17:30, 21 March 2007 (UTC)
Oh, now I full on disagree with you. The point of having all the information at the lemma is so it doesn't have to be repeated. We can do a rather thorough job at the lemma, add twenty different English translations, an etymology, whatever we need to try and get it adequately covered. That in itself is difficult, but doable. Having all that at all its inflected forms is not doable. Not at all. This is the beauty of having non-lemmata as soft redirects. Once we state which part of speech and what their lemma is, we're done. We can work on the lemma for years, trying to get just the right translation, and it's no problem. However, if we include all the same info on all forms, inflected languages become a nightmare. I absolutely refuse to change all 200 or so forms of φιλέω every time someone adds a slightly better translation. And someone will; it's a pretty simplistic translation right now that I'm sure does not cover everything. Having full entries at inflected forms is simply not practical at all. Atelaes 18:32, 21 March 2007 (UTC)
There was recently a similar discussion re Translations of inflected forms of English words at WT:BP##Plurals_and_translations. To go back to your earlier example, I should like to see something like:
λύῃς
  1. (present active subjunctive 2nd-person singular of λύω) often You might loosen (see λύω for further information).
(but preferably less verbose). While I should like to see as much info as practicable at the inflected entries (e.g. I feel cites using that form should be included there rather than at the lemma entry, and this in itself will help clarify meanings), it will not be possible, for the foreseeable future, to give all the detailed info on meanings and (for English words) translations that are at the lemma entry. However, this need not prohibit giving the most common meanings and translations of inflected forms, provided it is clear to the reader where they can find further info if they want more than a quick and dirty answer. --Enginear 19:43, 21 March 2007 (UTC)
Re: "[…] differences in usage in different languages (such as 'yéndose'/'irse' for 'leaving') […] can be covered by giving usage notes and examples in the entry for the inflected form": I strongly disagree. The solution is for irse (the lemma entry) to explain everything that's specific to the verb irse, and for Appendix:Spanish gerundios or the like to explain everything that's specific to Spanish gerundios (though this might actually be better as part of an Appendix:Spanish conjugation or something, rather than as its own appendix). It seems crazy to re-explain the function of gerundios at the entry for every single gerundio. —RuakhTALK 19:34, 22 March 2007 (UTC)

I've thought of another problem: non-analogous lemmata. In Hebrew, for example, the verb "Template:he-link" means "to go", but the actual verb form "Template:he-link" is the third-person masculine singular past tense (suffix conjugation); the infinitive is Template:he-link (well, or Template:he-link), or Template:he-link — different linguists apply the term slightly differently to Hebrew — the take-home point being that no one uses any of these infinitives as the lemma. The current system handles this well: "Template:he-link" is translated as "to go", per universal tradition, and "Template:he-link" is explained as the infinitive of "Template:he-link", and so on. How would your system handle this? —RuakhTALK 21:21, 24 March 2007 (UTC)

A similar problem occurs in Latin. Verbs in Latin have five infinitives: present active infinitive, future active infinitive (with three sub-forms), present passive infinitive, and so on. I can't even begin to imagine trying to translate correctly the sense of each infinitive form on every one of the verb form pages. I'd much rather say "present active infinitive of verb X" and put the grammatical explanation into an Appendix. --EncycloPetey 18:52, 25 March 2007 (UTC)
I think definitions should only be given in the main article. If we give a definition for 10-50 inflections (some languages have a lot) and someone finds that the definition could be explained better, then he has to change the definition in all 50 articles. If not, we will get tons of more or less thought-through definitions for all the inflections, all saying different things. It will be impossible to find the right discussion page among all of these. Adding inflections by bot will be 100 times more work. And as a user, you will not be sure where to look for the best-quality information. Focusing on one main article will make the quality much better than spreading it out over 50 different articles. Inflected forms should just be a way for the user to find his way to the main article, the article with all the information, including special information about certain inflections as well as perhaps a grammatical table of the different forms. Creating "definitions" by bot that only state the grammatical form is the best way to keep it clean and simple, and adopting this standard will considerably speed up the work of including these forms and presenting them in a standardized and easily understandable way. As many of you have already pointed out, inflections in one language often lack direct equivalents in other languages, so giving the grammatical info is also a way to give exact and correct information efficiently. Creating grammatical tables in the main article will then be a better way to serve the user, since he can see each inflection in its context, related to the other inflections. ~ Dodde 01:07, 27 March 2007 (UTC)

Format of abbreviations

I've added a section to WS:ELE on how to format abbreviations. In particular, I mention that expanded forms should [struck: "not"; obvious error corrected --Enginear 20:01, 21 March 2007 (UTC)] be in their usual forms and not capitalised just because the corresponding abbreviation is made up of capital letters (eg, the expansion of AI should be given as "artificial intelligence", not "Artificial Intelligence") and that expanded forms should link to Wiktionary or Wikipedia articles, as appropriate.

It looks sound to me, but please make any necessary revisions. — Paul G 13:12, 21 March 2007 (UTC)

I mention SNAFU there; it needs a gloss, as someone has already pointed out in RFC for that word. — Paul G 13:14, 21 March 2007 (UTC)
Thanks for doing this. It looks good overall, but I think I disagree on one point. If the expanded form doesn't have and doesn't warrant a Wiktionary entry, then I think the components should be wikified as links within Wiktionary. Whether or not the expanded form warrants a Wiktionary entry, the {{wikipedia}} template should be used to link to relevant Wikipedia articles, which can include any Wikipedia articles on the abbreviation itself (e.g. w:SNAFU) as well as any Wikipedia articles on the expansions (e.g. w:Recreational vehicle). —RuakhTALK 13:44, 21 March 2007 (UTC)
I see what you mean. The reason for linking to a Wikipedia article rather than the Wiktionary articles for the component words is that the user is likely to want to know what the whole expanded form means rather than its component words. But if we can do both, then great. Could you perhaps give an example to illustrate how this would work, and then we can update WS:ELE accordingly if people agree with your idea? — Paul G 15:21, 21 March 2007 (UTC)
The disadvantage of the {{wikipedia}} approach is that it doesn't make clear, for those abbreviations with several meanings, which senses can be found there. I am tempted to specifically write (see Wikipedia article) by the appropriate senses. However, that would lead to an error if the 'pedia article were modified. --Enginear 20:01, 21 March 2007 (UTC)
The problem you mention with {{wikipedia}} is not specific to abbreviations; it's a problem with any noun that has multiple distinct senses, and we need to formulate a general solution rather than a hackaround. (Actually, I think this is much less of a problem for the typical abbreviation, since Wikipedia tends to name articles after the expanded form, so the link text will make clear which sense is being referred to.) One option is to give that template a way to specify which sense is intended. —RuakhTALK 19:29, 22 March 2007 (UTC)
The {{wikipedia}} template allows the entry of the direct disambiguated link. --Connel MacKenzie 15:57, 23 March 2007 (UTC)
Don't forget that it's possible to use a directed {{pedialite}} template in-line or at the end of the entry under "See also". The specific article name may be entered as a parameter. --EncycloPetey 18:38, 25 March 2007 (UTC)
I honestly think the {{wikipedia}} template needs to be completely rethought. Intended to appear once on a page, it cannot do more than link to the primary definition of a word. But in many cases, not just abbreviations, there are a good number of relevant Wikipedia articles that need linking to. Proper names such as Disney are an example, but so is any word that has a more specific technical sense, or several common meanings, such as trunk. DAVilla 18:25, 27 March 2007 (UTC)

Our deletion logs are being harvested

It appears that any deletion with a deletion summary that contains "content was: 'text here'" gets harvested for the following site: http://www.in-vacua.com/interdiction.html

Now would be a good time for all admins to sign up for the "Replace text in deletion log comment." option of WT:PREFS, so we don't accidentally expose personal info posted by vandals in the deletion log. fwiw, --Versageek 07:09, 22 March 2007 (UTC)

Note: This has been de-Connel-ized to the Wiktionary namespace now. Please be bold rewording it. --Connel MacKenzie 21:54, 24 March 2007 (UTC)

X form headers

The question of not using X form headers (Verb form, Adjective form, Noun form) was never quite formally resolved; WT:POS says at one point it is under discussion, but the tables say that X form is deprecated.

Any objection to just treating this as settled and routinely correcting X form to X? (Which a number of people have been doing for a long time? ;-) Robert Ullmann 14:00, 23 March 2007 (UTC)

(Oh, the reason I ask is that AutoFormat is finding these with some frequency; should it be fixing them?) Robert Ullmann 14:09, 23 March 2007 (UTC)

I think that would be non-contentious for ==English== entries. But I recall some respected contributors claiming that X form was almost essential for some highly inflected languages. --Enginear 14:43, 23 March 2007 (UTC)
I don't recall that being the conclusion at all. As I recall, it was that making the "form-of" distinction is even worse for foreign languages than for English. This is supposed to be targeted to English readers after all. --Connel MacKenzie 15:55, 23 March 2007 (UTC)
On the other hand, bot edits are supposed to focus on non-contentious edits, so this is probably outside the purview of AutoFormatBot. I don't recall seeing a proposal for it, by the way. Looks good so far, but should have more community input. --Connel MacKenzie 21:59, 24 March 2007 (UTC)
Indeed, it is out of scope (User:AutoFormat#Principles ;-) if it is not long resolved. Probably should be voted on and the resolution added to WT:POS. If you look at the control table (User:AutoFormat/Headers) "Verb form" is listed as POS, and non-standard. That means "Verb Form" will get changed to "Verb form", but not to "Verb". And the section will be treated as a POS section. Robert Ullmann 22:29, 24 March 2007 (UTC)
To be clear, I recall (and agree with) Connel's viewpoint on this, but I do not recall a clear consensus re highly-inflected languages, even though there was for English (there may have been a consensus, but I don't recall it). (But this is irrelevant if the change is only to regularise the capitalisation.) --Enginear 15:04, 25 March 2007 (UTC)

Wiktionary:Things to do, Category:Wiktionary

Wiktionary:Things to do and the sysop pages in Category:Wiktionary need some attention. They are out of date. Thanks --Keene 01:01, 24 March 2007 (UTC)

Homophones as a L4 header

I formally propose that we modify WT:ELE to recommend Homophones as a Level-4 header under Pronunciation, just as we have L4 headers for Synonyms and Antonyms following the definitions. Homophones are important enough to warrant their own header, particularly as they may confuse English learners. Such words should not simply be listed in-line within the Pronunciation section, since they are separate words, and not aspects of the entry under which they appear. Unless there is mass opposition, I'll start a VOTE on the matter in the next week or two. --EncycloPetey 19:14, 24 March 2007 (UTC)

I think you mean level four. (?) Pronunciation is L3 (unless under Etymology n). Not a bad idea. I've seen a number of them. Robert Ullmann 19:44, 24 March 2007 (UTC)
Yes, you're absolutely right. I've modified the text above (and section header) accordingly. --EncycloPetey 21:08, 24 March 2007 (UTC)
I think that's a good idea. Before it goes to a vote, though, we should probably have some discussion on how to make clear that homophones depend on dialect and speaker (Mary/marry/merry, witch/which, etc.). BTW, would this be used at words in all languages (or at least, all languages with non-phonemic writing systems), or only at words in English? —RuakhTALK 21:39, 24 March 2007 (UTC)
I'm not sure how much would be needed. Each entry should have its own pronunciation(s) marked by region. Are you thinking about cases in which the homophones exist only in a limited range of dialect? I could see that as an important issue, and would like to hear suggestions. I seem to recall having seen some odd examples marked before, but can't recall which words they were.
Yes, this would be used in all languages, BUT each would be specific to the language section in which it appears. There would not be any reason to link a German word as a homophone in an English section, just as we wouldn't cross-link Related terms between languages. --EncycloPetey 22:36, 24 March 2007 (UTC)
Before it goes to a vote, I'd rather see someone creatively come up with a new/better scheme for the L3 Pronunciation sections. The "look" of the Pronunciation sections currently is awful. Adding subsections to that would only make it worse. --Connel MacKenzie 22:02, 24 March 2007 (UTC)
Something should be detailing the format of the Pronunciation section. Perhaps at Wiktionary:Pronunciation? (or is there another page already?) Then referred to from ELE. As Connel says, it needs some style ;-) Robert Ullmann 22:07, 24 March 2007 (UTC)
It is my ultimate intent to have a fully-fleshed out style guide for the Pronunciation section at Wiktionary:Pronunciation along with a thorough summary at WT:ELE, but there are many, many issues to be resolved in the Pronunciation section and I am trying to attack them in small steps. Otherwise, we would have too many discussions going on simultaneously and none of them would be fully resolved. I started with the AHD --> enPR proposal, and am now tackling the issue of Homophones. I have a laundry list of other concerns too :) My thought is that the homophone issue makes a nice self-contained sub-issue that could be then formatted independently of the rest of the pronunciation section. We could deal with formatting the rest of the Pronunciation section next. --EncycloPetey 22:36, 24 March 2007 (UTC)
Well then, my vote is for having a bulleted, indented "Homophones" tag within the pronunciation section. I don't want to have to rewrite Dvortybot to account for the intervening section. If you are only going to make it uglier and less consistent, I don't see the point at all. --Connel MacKenzie 22:45, 24 March 2007 (UTC)
Who said anything about ugly or inconsistent? I'm suggesting we adopt a standard, and (if it assuages your concerns) this is the only subsection I can see as being worthwhile within the pronunciation section. Everything else should be part of a bulleted list (unless we come up with a better idea).
Part of the problem I have with a bulleted tag for Homophones is that such things don't show up in the Table of Contents (yes, I use them and like them). Having the section separate also eliminates the need to decide where to put the homophones. With a subsection header, it comes at the end of the pronunciation section every time. With your proposed bulleted item, it could show up anywhere in a list of items that may or may not all be included (regional pronunciations, various audio files, rhymes, and hyphenation, at least). This is part of what is making the Pronunciation section look "ugly" right now -- we have a mish-mash of items that all look different but are all set up in a list as if they had parallel format. --EncycloPetey 22:57, 24 March 2007 (UTC)
It is hard to see how to deal with regional homophones without using bullets, each region having homophones shown after its pronunciations. But you're probably much more in touch with the ideas than I am. --Enginear 15:14, 25 March 2007 (UTC)
Having now read EP's explanation below, I understand better, and think that his examples 1 & 3 are the best, for complex and simple cases respectively. --Enginear 19:42, 26 March 2007 (UTC)
I've found some entries where that would rapidly become a mess. Within a region, there may be more than one pronunciation of a given word. Each specific pronunciation has homophones both in and out of the region, which vary with which of the regional pronunciations is compared. One recent headache is sere. There's a UK (Commonwealth?) pronunciation of /ˈsɪə/, which is a homophone of UK sear and one pronunciation of UK seer. In the US, there are two major pronunciations of sere: /siːr/ and /sɪr/, with the former having a southern US variant of /siːɚ/. Only this Southern variant is a homophone of US seer, but it is homophonic with seer as generally pronounced in the US. The second US pronunciation is also regional, and depending on region is homophonic with either sear or sir, but not seer. The first US pronunciation is homophonic with sear, but not as pronounced in the Southern US.
Frankly, I can't envision any means of communicating even a fraction of that information cleanly if the homophones are interpolated between the various regional pronunciations, and we've only considered the US and the UK so far. I think it would be much better to list the homophones (and the rhymes?) in a Homophones section that is structured first by a bulleted list of IPA pronunciations. Each IPA pronunciation would begin a line of homophones, each identifying in parentheses the region (or dialect) for which it is a homophone.
Example:
====Homophones====
*{{IPA|/ˈsɪə/}}: [[sear]] (UK), [[seer]] (UK)
*{{IPA|/siːr/}}: [[sear]] (US)
*{{IPA|/ˈsiːɚ/}}: [[sear]], [[seer]] (US)
Of course this is just one possibility. I could imagine the structure of the Homophones section paralleling the main Pronunciation section by organizing along regional lines, just as the Synonyms and Translations sections parallel the list structure of the definitions:
Example:
====Homophones====
*{{italbrac|UK}}: [[sear]], [[seer]]
*{{italbrac|GenAm}}: [[sear]]
*{{italbrac|Southern US}}: [[sear]], [[seer]]
Or we could start off less ambitiously and just use:
====Homophones====
*[[sear]], [[seer]]
...and although that would eliminate all the dialectal information, there are some words for which that simpler form would be sufficient. Please keep in mind that this is not the best example for the potential difficulties, but it happens to be one fresh in my mind and therefore easier to find and discuss.
My feelings are rather strong on this issue because the homophones are words with distinct entries, rather than elaborations of the entry in which they appear. Just as we separate the synonyms, antonyms, and related terms into their own subsections rather than interpolating them among the definitions, so I would like to see the homophones separated into their own subsection rather than interpolated among the pronunciations. Particularly since there may be more than one pronunciation in a given region listed on the same line, and not all of them may share the same set of homophones. This wouldn't happen with our definitions, where each definition gets a separate line, but in the Pronunciation section it is a possibility and happens not infrequently. --EncycloPetey 18:27, 25 March 2007 (UTC)

I think my preference is for something like this:
====Homophones====
Note: homophones vary by dialect and speaker. Each of the following words is a homophone of ''sere'' for at least some speakers:
*[[sear]]
*[[seer]]
only preferably less wordy. To see exactly who treats those words as homophones, readers will need to look at the various pronunciation sections, but this both lists possible homophones (useful for language learners) and makes clear that they may not be homophones for everybody. —RuakhTALK 23:40, 25 March 2007 (UTC)

I agree with EncycloPetey that homophones need their own heading, just like synonyms, antonyms, etc. It's not always easy to state clearly the regions in which a certain pronunciation is used, so I think that information in parentheses should be optional. The good information about the homophone's pronunciation should be in the entry for the homophone anyway, not in the page where the homophone is listed. So I reject the idea given by example 2, where homophones are divided by region. I do think it's great to divide the list by pronunciation, as in example 1. It should also be possible to add homophones even without adding the IPA pronunciation, and sometimes the entries aren't complex enough to have many different pronunciations. Therefore example 3, with a plain list of the homophones, should also be acceptable, in my opinion. ~ Dodde 02:07, 26 March 2007 (UTC)

See also

While we are talking about headers: this is one of the most common headers on the wikt, and not mentioned in WT:ELE. It gets used at L4 under a POS, typically after Synonyms, Translations, etc, but before (recognized) headers External links and References. Convention seems to be that See also is for references inside the wikt and WM projects that are not Syn/Ant/Related/Derived terms. (It also shows up at L3 when it shouldn't, and sometimes when it maybe should, and also shows up at L2, which it clearly shouldn't!)

I'd think it ought to be listed in WT:ELE in that place in the sequence, as references to other words/indexes/related bits that don't fit in the preceding headers, but aren't external links, which follow. (Is all that clear as mud?) Robert Ullmann 19:44, 24 March 2007 (UTC)

I can see uses both as an L3 and L4 header. When phonemics has a See also listing phonetics, that usage could certainly fall under the POS as a level-4 header. However, when that Afar entry links to the *Afar edition of Wiktionary, that usage of See also should be at level-3. I can't justify including such interwiki links in a subcategory of the Noun part of speech. --EncycloPetey 21:13, 24 March 2007 (UTC)
That seems just about right. Something has to also say that the L3 use of "See also" has to be at the end of the language section, not intermingled with POS sections. Robert Ullmann 22:03, 24 March 2007 (UTC)
If anything, WT:ELE should specify it as an L3 heading. The L4 headings are inappropriate, and should be "disambiguated" at the L3 level instead. --Connel MacKenzie 22:05, 24 March 2007 (UTC)
I'd be fine with putting them all at level 3, or with using a combination of L3 and L4. --EncycloPetey 22:38, 24 March 2007 (UTC)
I would be wary that "See also" sections are just an invitation for random trivia and spam to accumulate. Anything that we actually want readers to also see can fit under an existing heading, and if not, a new heading could be considered. The phonetics in phonemics above is a related term (or derived term), and "See also" shouldn't be used. It might not be worth going through all the instances of "See also" and changing them, but I don't see any reason to codify that type of header into policy. Dmcdevit 03:20, 25 March 2007 (UTC)
No, not everything can be coded under an existing header, which is why the regular sysops use "See also" so often. For example, Semper uses it for taxonomic entries to link to subtaxa (e.g. to link Oleaceae to Fraxinus). It's how I chose to link to . These are just two cases where none of the existing headers are really appropriate, and there are many more similar situations besides. Although "See also" isn't officially sanctioned in the ELE, it's used all over the place and has been for a long time. I rarely see unwanted detritus accumulating there, though it does happen from time to time. I don't see that as a new problem, though, since we have the same rate of additions of duplicate definitions. --EncycloPetey 03:50, 25 March 2007 (UTC)
I can't think of any cases where a "see also" section is necessary — lists of subtaxa fit more comfortably at Wikipedia or Wikispecies (though if there are just a few top-level subtaxa, or a few particularly important ones, then those should be mentioned in the definition line), and the astrological symbols are conveniently grouped into the interestingly named Category:Astronomical symbols, and phonemics can link to phonetics either in its definition line (in a "contrasted with" phrase, like at white-collar) or in the usage notes (in a "not to be confused with" note, like at affect#Verb), or both — but seeing as "see also" sections aren't going away anytime soon, it would be nice for WT:ELE to mention them and give guidelines on how to use them (where they should go relative to other sections, what kinds of links they should contain, how to format each link, how to order the links, whether and how to group the links, etc.). —RuakhTALK 05:35, 25 March 2007 (UTC)
To be clear, what I mean is that if there is a case where none of the existing headers work, I would much rather that the editor add a descriptive one than a "See also". So, I'd rather see someone using "Subtaxa" (or whatever) than "See also". Dmcdevit 17:12, 25 March 2007 (UTC)
The flip side of that is that it would proliferate the number of various headers, which we definitely don't want to happen. A limited set of headers at L3 and deeper means that (1) we can more easily search for and remedy spelling problems, and (2) we can have a short list for new users to learn and grow comfortable with. Too many extra headers makes it harder to control the structure of the data as we've been trying to do. The See also remains a much more flexible option, particularly when the user is directed to one of the Appendices. --EncycloPetey 18:04, 25 March 2007 (UTC)

The above discussion (appropriately) ignores the other use of "See also" at the beginning of an article to link to alternate Capitalised/non-capitalised spellings, and sometimes spellings with diacritics. Such usage needs separate consideration. --Enginear 19:47, 26 March 2007 (UTC)

absolutely

What would be a good term or phrase to define a situation or just a word that becomes used excessively, to the point where it begins to annoy people? Something other than redundancy. Basically comes to mind as an example. From what I see on TV, law enforcement and military personnel are the main offenders. It becomes a mindless usage used in every other sentence. It ends up clouding conversation and not complementing the talker. Another example could be the word absolutely. Where is the spelling checker on this thing? —This unsigned comment was added by Gord 6789 (talk • contribs) 04:03, 25 March 2007 (UTC).

I'd say a cliché, catchword, or buzzword, depending on the details; but you might want to take your question to Wiktionary:Tea Room, which is more suited to that kind of question. —RuakhTALK 05:16, 25 March 2007 (UTC)
Yes, cliché is the appropriate term here. You can also use "hackneyed phrase".
Wiktionary has no spellchecker. You can always spellcheck content in a text editor or word processor and then copy and paste it here. — Paul G 09:51, 25 March 2007 (UTC)

Use of anchors in {{t}}

The template {{t}}, used for linking translations to other wiktionaries, is great, but it doesn't allow, as far as I can see, for the use of anchors. I've just been working on "vine", of which one sense is translated as "vite" in Italian. As this is also a word in French and probably several other languages too, I wanted to link the translation to the Italian section, thus, [[vite#Italian|vite]], but this won't work if the translation is given using the {{t}}.

I see that this was discussed when the template was created. Was it ever resolved? Couldn't the template be parsed to recognise an optional template following one containing a hash? — Paul G 10:04, 25 March 2007 (UTC)

It would be so very, very useful if our language code templates didn't contain wikilinks; then we could translate any code to the canonical language name in another template, and this case would be trivial: {t} could just always generate the anchors (#{{{{{1}}}}}). And it is easy to link the result of a template call anyway, so someone wanting (say) Scottish Gaelic linked could just use [[{{subst:gd}}]]. But it is impossible to unlink the result of a template call. But there is resistance to just unlinking all the code templates, even though it would be incredibly useful. Robert Ullmann 14:52, 25 March 2007 (UTC)
What about another series of templates for "unwikified" language names? {{n-en}}, {{n-es}}, {{n-gd}} etc. You are trying to use existing templates for something they were not intended for. (Actually, doing that might be a bit crushing to the WMF servers - checking multiple cascading templates on each translation, in 21,000+ entries?) --Connel MacKenzie 15:06, 26 March 2007 (UTC)

Move to WT:GP??? --Connel MacKenzie 15:06, 26 March 2007 (UTC)

@Paul G: right now, the template adds the link automatically, but due to this, it only works with languages in the WT:TOP40. It is this technicality that is discussed above. So go ahead and use {{t}}. You can see in the preview that it gives the correct link. H. (talk) 16:46, 26 March 2007 (UTC)

I go with Connel and suggest we use two sets of templates containing the language name: one containing the language names with wikilinks (or whatever the rules for the TOP40 and such are right now), and one for use with the {{t}} template containing the language name without anything else whatsoever. I am not sure why Connel suggests the naming convention "n-". I think "t-" would be more suitable since it will be used with the t-template in translation lists. ~ Dodde 00:25, 27 March 2007 (UTC)
Thanks. I picked "n-" thinking "name" but really, any prefix will do. What it really should be is a list of Wiktionaries that exist. So, at the template level, if the language code doesn't have a language name, the template would know not to link to the non-existent foreign language Wiktionary. The "top 40" list is great, for what it does, but this really is a separate problem/function. --Connel MacKenzie 16:12, 28 March 2007 (UTC)

Medical Eponyms

I believe the preferred form for medical eponyms in the AMA style book is to omit the possessive 's. However, there is some debate on this - http://www.medtrad.org/panacea/IndiceGeneral/n5_dirckx.pdf

Wikipedia reports:

In 1975, the US National Institutes of Health held a conference where the naming of diseases and conditions was discussed. This was reported in The Lancet (1975;i:513) where the conclusion was that "The possessive use of an eponym should be discontinued, since the author neither had nor owned the disorder." Medical journals, dictionaries and style guides remain divided on this issue. - http://en.wikipedia.org/wiki/List_of_eponymous_diseases#Punctuation

Should our von Willebrand's disease be Von Willebrand disease as in Wikipedia? (For now, let's ignore Wikipedia's unfortunate use of the capital "V"!)

Ben 12:42, 25 March 2007 (UTC)

Therefore, the attested form you are suggesting we should move? That's not quite right. We should have entries for both forms with Usage notes describing the AMA's/The Lancet's prescription. Funny that they would use that logic - the person's attribution "owns" the disorder. Seems like a pretty weak excuse for trying to change lots of common disease names (which are used primarily by newspapers, not medical journals.)
If/when each new form is attested, we can add each "disorder" entry here. (The Lancet, itself, certainly counts as a "reviewed journal" - that is quite likely the publication for which that clause was added to WT:CFI.) --Connel MacKenzie 15:00, 26 March 2007 (UTC)

The underlying reason for omitting the apostrophe, I suspect, was to simplify spelling, especially when the name ends in "s." The Lancet is a very fine journal, but The Annals of Internal Medicine omits the possessive (Neil A. Goldenberg, Linda Jacobson, and Marilyn J. Manco-Johnson. Brief Communication: Duration of Platelet Dysfunction after a 7-Day Course of Ibuprofen. Ann Intern Med, Apr 2005; 142: 506 - 509. "......concern given the high prevalence of von Willebrand disease (1 in 100 individuals)...."). So does the Journal of the American Medical Association (at least since 1982 or so). The New England Journal of Medicine uses the possessive for the disease, but omits it for "von Willebrand factor."

At any rate, how should we proceed? Would it be necessary to find a published example of each form and set up a new page for each? Abels test and Abels' test? Osler's nodes and Osler nodes? Or, do we just put a usage note on one form that indicates the other is sometimes used? Ben 12:05, 27 March 2007 (UTC)

For a word/phrase to pass WT:CFI it should normally be possible to find three durably archived cites (or one in a refereed academic journal). However, if a word is categorised as a "misspelling" (or perhaps "misuse" though this is more contentious) a higher bar is set (not AFAIK defined). So is von Willebrand's disease a misspelling or misuse? I don't know, but I suggest that there has been a change of scientific fashion which is broader than medical usage.
Previously, those who discovered (or improved knowledge of) scientific entities were often linked to their discovery, as in Halley's comet or Weil's disease. But nowadays this is considered a flawed description -- Edmund Halley did not own "his" comet, and Adolf Weil did not suffer from "his" disease, as is perhaps implied by the descriptions.
So now, scientists refer to Comet Hale-Bopp (and indeed Comet Halley) and von Willebrand disease. For older phrases, I suggest that both are valid, perhaps with a note on the 's version that the usage is now deprecated within the scientific community. For newer discoveries, perhaps they should be treated the same, or perhaps the 's version is a misuse. Of course, if fewer than three (or one refereed) cites exist for a spelling, then the issue does not arise, as it cannot even meet normal CFI.
To answer the specific question, I believe there should preferably be separate entries for each spelling which meets CFI, with cites for each. However, this doesn't mean that it is essential for a contributor to add more than one entry or add any cites at all. It is better to add a single entry for a term believed to meet CFI than add none at all; it is even better to add a "soft redirect" entry for "the other" spelling; having one or both entries cited is better still (some say this is best, while others of us would prefer two "full" cited entries). This is a wiki. Once a basic entry is in place, others can build on it (and usually will if its appropriateness is later challenged). --Enginear 15:45, 27 March 2007 (UTC)

I think we should have, as Enginear suggested, separate pages for each spelling and include relevant context labels, etymology or usage notes as appropriate.--Williamsayers79 13:31, 28 March 2007 (UTC)

Enginear, thank you for restating what I said more clearly. --Connel MacKenzie 16:14, 28 March 2007 (UTC)

OK, I like this solution, and I think I understand it, too, except: what is a "soft redirect"? Thanks --01:37, 29 March 2007 (UTC)

A "soft redirect" is what would be considered a "stub" entry on Wikipedia: the minimal level-two language heading, the minimal level-three part-of-speech heading, and a "#" definition line using one of the form-of templates, such as {{alternative spelling of}}. --Connel MacKenzie 18:17, 29 March 2007 (UTC)
Basically, a "soft redirect" is an annotated link: hametz is an example of a simple soft redirect to a full cited entry at chametz, cat-flaps is a cited soft redirect, while cat-flap and cat flap are full entries each noting the existence of the alternative spelling. --Enginear 18:26, 29 March 2007 (UTC)

Language sort order

In WT:ELE, we specify that languages (after English) are to be sorted in alphabetical order by the English name in L2 sections, likewise in Translations sections. Strictly, that means that Classical Nahuatl should sort under C, and Old English under O, etc.

I think it would be better if we sorted on the base name in these cases (which is what people often do anyway), so that Old English sorts as "English, Old" while remaining "Old English" in the header. In this case it would also group with English on the page. "English, Middle" and "English, Old" conveniently sort into reverse chronological order following English.

Prefixes treated this way would be Old, Middle, Middle High, Ancient, Classical etc. Or we could treat any language name that ends with a recognized language name as something to be "inverted" for sorting?

Or do we just stick to strict alphabeticity? (is that a word?)

See wine and vino Robert Ullmann 16:30, 25 March 2007 (UTC)

I'm not sure how I feel about that, but it adds another layer of complexity to the translations section. We already group some languages rather strangely, varying by editor. Should the various forms of Chinese (which are not called "Chinese") be grouped together (look at the entry for birthday)? Should all eight or more flavors of Sami (look at Monday)? In short, your question is part of a larger sorting issue for languages in the Translations section. For instance, would you want to group Scottish Gaelic with Irish (Gaelic) because the end of their name is "Gaelic", or separate them because we have arbitrarily decided to call Irish Gaelic simply "Irish"? Do we group Tosk Albanian together with Gheg Albanian because they both contain the word "Albanian", or do we separate them because they're not mutually intelligible anyway? And if we decide on a case-by-case basis, just how long a list of little sorting rules would be too long?
I absolutely do not agree with placing language families together in the translations section. We should be consistent, applying simple rules. Even doing it for Chinese opens a can of worms. Are we then to do the same for other language families? No, each language or dialect that is identified should be alphabetized independently. DAVilla 20:44, 26 March 2007 (UTC)
Your preliminary list merely scratches the surface of possible prefixing words; consider Western Apache versus Plains Apache, Moroccan Arabic versus Egyptian Arabic, Upper Sorbian versus Lower Sorbian, Inari Sami versus Lule Sami, and note that Tok Pisin is etymologically a compound as well (though I doubt the average user would guess that).
That said, I think it would be good to alphabetize while ignoring words like "Old", "Middle", and "Ancient", primarily because these describe a specific period of development in a language. I think it would be good to group the various forms of Arabic, and possibly the various forms of Chinese. However, this is a very tricky issue with many angles and I don't think I've gotten them all sorted out in my own head yet. --EncycloPetey 17:56, 25 March 2007 (UTC)
Sort of what I was thinking: the "age" qualifiers should be a secondary key (not ignored entirely). I think this will make a lot of intuitive sense to people, as well as being fairly simple to code where needed (if the name starts with a word in the set, move that word to the end, then sort). I don't want to get into groupings; it is endless, and they overlap in various ways. This is (one of the things) that the alphabetical order was intended to avoid. Robert Ullmann 15:09, 26 March 2007 (UTC)
I don't think this would be very transparent to contributors, so it really makes sense to keep the rules very simple. If you really believe this sort order is desirable, then you'd have to be willing to allow for the naming convention to be "English, Old" etc. But in reality this is of minimal benefit. Old English could just as easily be called Anglo-Saxon, and there are other languages where the "old" language is only known by an entirely different name. DAVilla 20:44, 26 March 2007 (UTC)

Does the ISO language definition code sort in an intelligible order? Ben

Not really. Some of the codes are similar to the English names, but that isn't the objective of the coding. For example, Mandarin, German, French, Dutch, Cantonese are in alphabetical order ... (cmn, de, fr, nl, yue ;-). There are wikts that use the code templates all the time, and sort on them (which produces a consistent, but often apparently random, order). Robert Ullmann 15:09, 26 March 2007 (UTC)
But I haven't seen anyone try to implement this within a section of an entry. How would this apply to the various translations sections, and would such a format make it difficult for visiting translators to add or check translations? --EncycloPetey 21:59, 30 March 2007 (UTC)
Have you given Hippietrail that suggestion on http://wiktionarydev.leuksman.com/ yet? He may already have something up his sleeve... --Connel MacKenzie 16:00, 31 March 2007 (UTC)
Good idea. I've thought about this but not when I've been editing my todo list on WiktionaryDev. I'll add it now. — Hippietrail 17:19, 31 March 2007 (UTC)
I'm sorry that I don't understand what the extension does yet. Hippietrail, if you could automatically pass a {{{languagecode}}} and {{{languagename}}} parameter to every template included within any section, it would be useful to the utter extreme. DAVilla 12:36, 3 April 2007 (UTC)
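Robert Ullmann's suggestion in this thread (treat the period qualifier as a secondary sort key: if a name starts with a word in the set, move it to the end, then sort) can be sketched in Python as below. This is a minimal illustration only, not how any Wiktionary tool actually sorts; the qualifier set is a hypothetical, deliberately incomplete list built from the prefixes named in the proposal (Old, Middle, Middle High, Ancient, Classical).

```python
# Hypothetical qualifier set, taken from the prefixes named in the proposal; not exhaustive.
PERIOD_QUALIFIERS = ("Old", "Middle", "Middle High", "Ancient", "Classical")


def sort_key(language_name: str) -> tuple[str, str]:
    """Return (base name, qualifier), so "Old English" sorts as "English, Old"."""
    # Try longer qualifiers first, so "Middle High German" is not read as "Middle" + "High German".
    for qualifier in sorted(PERIOD_QUALIFIERS, key=len, reverse=True):
        prefix = qualifier + " "
        if language_name.startswith(prefix):
            return (language_name[len(prefix):], qualifier)
    # No qualifier: the empty string makes the bare name sort ahead of its qualified forms.
    return (language_name, "")


def sort_languages(names: list[str]) -> list[str]:
    """Sort language names by base name first, then by period qualifier."""
    return sorted(names, key=sort_key)


if __name__ == "__main__":
    names = [
        "Old English", "French", "Classical Nahuatl", "English",
        "Middle English", "Nahuatl", "German", "Middle High German",
    ]
    for name in sort_languages(names):
        print(name)
```

With the sample list this prints English, Middle English, Old English, French, German, Middle High German, Nahuatl, Classical Nahuatl. Qualifiers are compared alphabetically among themselves, which happens to give the reverse chronological order for Middle/Old English but would not do so in general. A name such as Old High German would be keyed as "High German, Old", one of the edge cases a fixed prefix list cannot anticipate.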

Constructed languages

Hey, I know we've brushed over this topic before, but I really think it would be best to finally come to some sort of decision on the matter. Do we include constructed languages, or more specifically which ones? It seems rather clear that we do include Esperanto; I don't think there is much debate about that. But what about Quenya? It's an Elvish language constructed by J. R. R. Tolkien for his Lord of the Rings series. We currently have an anon cleaning up the section, and well, I guess I'd feel sort of shitty if a few months down the road we decide to squash all their hard work. We should either put a stop to it right now, or decide to allow this language. I must admit I don't have any strong convictions about it one way or the other. If any dictionary is ever going to include such things, we are certainly the perfect format for such a venture, not being limited by paper. However, this admittedly opens the doors to all sorts of nonsense. If I was forced to make a decision right now, I would say allow Quenya, but disallow certain other languages, such as Brithenig, simply because I like one language more than the other. But it seems that perhaps Wiktionary ought to have some more rigid standards than that. Any thoughts, anyone? Atelaes 23:12, 26 March 2007 (UTC)

I would prefer to put lexicons for minor constructed languages in the Appendix namespace in a single page rather than in the mainspace, but I'm not sure of a good metric for differentiating major constructed languages (Esperanto, Interlingua, Ido, Lojban, etc.) from minor ones (Quenya, Klingon, etc.). Words, and by extension languages, whose use is restricted to a single literary work like Quenya would seem to fail CFI in my opinion, but that doesn't settle the matter completely, considering other languages like Toki Pona. Dmcdevit 23:54, 26 March 2007 (UTC)
Does Quenya have an ISO language code? Wasn't that part of the standard (or at least rule of thumb) we were using? RJFJR 16:04, 27 March 2007 (UTC)
Yes, qya. But the relevant section, WT:CFI#Constructed languages, says that uncoded languages are not acceptable, while coded constructed languages may or may not be, and it gives a specific list. The current list (and policy) seems pretty good. I would think that if someone wants to change the status of any given (coded) language, it just goes to a vote. At present Quenya does not meet CFI; it is explicitly listed as not approved (all of the constructed languages coded in 639 are explicitly listed as in or out).
So the question that presents is: do we want to change CFI to permit Quenya? Robert Ullmann 16:15, 27 March 2007 (UTC)
I think we should stick with what CFI states until given good reason to do otherwise. It's just that I've never heard anyone interpret that particular CFI paragraph so simply. Last time I brought up this issue, it was a whole lot of "ummmmm"s and "I don't know"s. Well, that certainly answers the question to my satisfaction. All that remains to be said is this: if anyone disagrees with this, speak now. If I don't hear a community uproar in about a week, I'm going to start going through that list and cleaning out all the Quenya, Brithenig, etc. However, the question also remains of what to do with all these entries. My instinct is to go with Dmcdevit's excellent suggestion of putting them all in their own indices. Atelaes 16:31, 27 March 2007 (UTC)
What, precisely, do you mean by "cleaning out" that list? You'd need a separate vote on each one, would you not? --Connel MacKenzie 16:17, 28 March 2007 (UTC)
By cleaning out the list I simply mean moving all the mainspace entries which do not meet current CFI (because they are part of a non-CFI language) to appendices. It does not mean that I'll be changing the list. I don't think that requires a vote. If you think it does, please say so. Also, does anyone know of a good example appendix which I can model the Quenya appendix after? Another question: should I leave a redirect (to the appropriate appendix) in place of the article, or just delete it entirely (after all the info has been moved)? Atelaes 05:38, 29 March 2007 (UTC)
Whew. Thanks for the clarification; I'm glad I merely misinterpreted it the first time. --Connel MacKenzie 18:15, 29 March 2007 (UTC)
I'm confused by that list. It says Interlingue is accepted, while Occidental is not; but according to our and Wikipedia's articles on them (Interlingue, Occidental, w:Occidental language), they're the same language, Occidental being an older name and Interlingue a newer one. Am I missing something? —RuakhTALK 17:34, 27 March 2007 (UTC)
That's correct. It appears Occidental isn't used at all, and is not very notable except as Interlingue's predecessor. Dmcdevit 07:14, 29 March 2007 (UTC)
My thoughts are that if these oddities do not meet the current CFI then they should be removed from the main namespace. There is no harm in having them in an index or appendix, as with the proto-languages.--Williamsayers79 13:06, 28 March 2007 (UTC)
BTW, Quenya is stretching it a bit anyway, but Brithenig really takes the biscuit!--Williamsayers79 13:06, 28 March 2007 (UTC)

The problem with this, though, is that while I can say why Quenya is forbidden, as someone who isn't familiar with these languages, I can't tell why Novial, for instance, is included. I can't even find any indication it's noticeably more well-known or used than the others. Dmcdevit 07:14, 29 March 2007 (UTC)

Novial has some active speakers/writers/users. See, for example w:nov:Chefi pagine ;-) Quenya is just a vocabulary in a literary work (albeit a very notable one). Robert Ullmann 18:21, 29 March 2007 (UTC)
I suspected as much, though I was hoping for a more quantitative measure to differentiate between the non-literary conlangs. Dmcdevit 22:06, 31 March 2007 (UTC)

This isn't a keep/kill vote really, but I actually benefited from our Quenya entries just yesterday, when I read this, leading me to look up tengwar. If not for Wiktionary, I probably never would have figured it out. In that sense, it is nice to have entries for Quenya words, and I'm tempted to say, "what harm can it cause?" On the other hand, obviously Brithenig words shouldn't be put in unless they see a massive increase in usage. I think Quenya really does fall right smack in a gray area, and it really is a tough decision. I think it would be good if, assuming we move the Quenya stuff to appendices, when people search for a word which we don't have, search results might include the Quenya appendix (or other language appendices) if applicable. Although our appendices are awesome, I don't imagine many of our casual readers have discovered the many joys of appendices yet :-) Language Lover 23:36, 29 March 2007 (UTC)

One thing though, which I just realized, is that all our Quenya entries are morphologically transliterated into the Roman alphabet! If Tolkien's Elves really did exist, Wiktionary would be next to useless to them, since we wouldn't have the words in their rune forms, even if those runes could somehow be typed into the search box!  :) Language Lover 23:39, 29 March 2007 (UTC)
As it turns out, there actually is a Unicode range reserved for Tengwar. How many people can see this:  ? Somehow I can. But, I admit that many people probably can't. In any case, I think it might be nice to have both Latin and Tengwar scripts. Atelaes 07:00, 30 March 2007 (UTC)
"Reserved" is not the word I'd use. The ConScript Unicode Registry attempts to coordinate the use of the Private Use Area for artificial scripts, and recommends the use of a certain part of the Private Use Area for Tengwar; but so far, according to w:Tengwar, only one font supports it, and given the nature of the Private Use Area, this can never become standard. —RuakhTALK 21:11, 30 March 2007 (UTC)
Moving constructed languages that are used in one or more major works, but do not meet CFI as living languages, to appendices seems like a brilliant idea to me. -- Beobach972 21:32, 31 March 2007 (UTC)

Amending WT:CFI

I do think we should codify this better in WT:CFI, though. If we agree that constructed languages whose primary use is restricted to a literary work (or series of works) and its fans do not meet WT:CFI and are not appropriate in the main namespace, but may be allowed as lexicons in the Appendix: namespace, shall we put that to a vote? Currently, CFI seems to imply that there is no agreement either way.

I can't think of a good metric for other ISO 639-3 languages. It has to do with how widely a language is used, but a measure of that would be nice, if anyone can think of one. Dmcdevit 22:06, 31 March 2007 (UTC)

Specifically, WT:CFI#Constructed_languages implies that there is no agreement; I would like to change the section to add a fourth bullet stating "There is consensus that languages whose origin and use are restricted to one or more related literary works and their fans do not merit inclusion as entries or translations in the main namespace. They may merit lexicons in the Appendix namespace." Dmcdevit 00:51, 4 April 2007 (UTC)

From the current list of languages in WT:CFI#Languages to include, which would be moved there? --Connel MacKenzie 04:45, 4 April 2007 (UTC)

Quenya, Sindarin, Klingon, and Orcish I think are the applicable ones mentioned in other categories. Dmcdevit 05:20, 4 April 2007 (UTC)
Sounds worthy of a vote, to me. Thank you. --Connel MacKenzie 05:26, 4 April 2007 (UTC)
I created the subpage: Wiktionary:Votes/pl-2007-04/Fictional languages. Any last suggestions about the wording, or should I make it live? Dmcdevit 06:01, 4 April 2007 (UTC)

Archiving of WT:RFV

I notice that for the last year Wiktionary:Requests_for_verification/archive, linked from the header on WT:RFV, no longer has a list of words which have failed RFV (and the Jan & Feb 07 archives don't exist at all). Is the list of failed words available somewhere else? If not, how are we meant to check if a word has failed in the past? Previously a search for it on the /archive page was sufficient. I vaguely remember this being discussed, but can't remember the resolution. --Enginear 11:57, 28 March 2007 (UTC)

Someone had volunteered to manually maintain that list. I've taken stabs at automating it, but have not had time to devote to completing my preliminary efforts on that task. --Connel MacKenzie 16:06, 28 March 2007 (UTC)
I have a suggestion for how to maintain an archive of RFV-failed terms without maintaining the terms themselves anywhere on wiktionary where search engines could find them and archive them and cloud future verification requests (which seems to be a community concern and reason against keeping them on Wikt): remove them from the RFV page, then put a link to the diff in the archive. What do you think? -- Beobach972 18:38, 28 March 2007 (UTC)
Thanks. That is an excellent suggestion, actually. I forget why it fell into disfavor in the past. As we move towards an automated solution, I think that merits another look - as it is a superior method. At any rate, the problem is implementing the solution - the day to day drudgery of someone actually doing it. --Connel MacKenzie 18:13, 29 March 2007 (UTC)

WordWeb 5 Freeware Dictionary

Anyone have experience with this yet? They seem to be using Wiktionary (Yay!) so it might be worth checking into, in detail, at some point. (I stumbled across it, here.) --Connel MacKenzie 18:22, 29 March 2007 (UTC)

Hmmm, they don't seem to be complying with the GFDL very well though. --Connel MacKenzie 18:29, 29 March 2007 (UTC)

RFVing of words with generous b.g.c. hits

Kind contributors, I wonder if anyone here would agree with the following proposal. I propose that if a word has at least 10 immediate, independent citations right at the front of books.google.com upon a simple search, then the burden of proof should be upon the person who wants the word deleted, not upon those who want the word kept. With the current system, a person could go RFV cat, dog, the, and pencil, and the burden of proof would be on those of us who like those words, to spend some of our time writing citations for these obviously good words. For example, someone recently RFV'd usurpress, even though providing cites for this word is just a tedious task of entering it in b.g.c., choosing some cites from there and typing them in here. We could better use our time than that, unless the person doing the tagging offers an actual reason why the word should be deleted :-) The oversight of words is an important part of the dictionary, but in cases where a 1-minute search will immediately make it clear a word passes CFI (without even resorting to controversial cites like usenet), I think such words' presence is definitely a good part of our dictionary :-) Language Lover 19:06, 29 March 2007 (UTC)

That misrepresents the current practice quite wildly. If something is "clearly in widespread use" there is no reason to issue an "rfvfailed" for it. But in theory, all entries should have references, so an RFV isn't the sinister thing you are making it out to be. Re: "oversight": Wikipedia has a special meaning for this term; here, I mean the common meaning of the word, not the Wikipedia special meaning. --Connel MacKenzie 19:14, 29 March 2007 (UTC)
Wow, thanks Mr. MacKenzie, this is an interesting aspect I didn't realize.  :-) You're definitely not considered one of the leading Wiktionary contributors without good reason!! :D See below (my response to Enginear) for more... Language Lover 20:28, 29 March 2007 (UTC)
I agree with Connel on this one. It does not seem to be the case that people actually are RFVing common words, and then just forcing someone else to work on them. From what I can tell, people are generally only RFVing rather obscure words, which are the ones which most desperately need cites. Ultimately, I think it must simply be admitted that part of the drudgery of entering obscure words is fighting for their existence. If people start RFVing dog, cat, and the, then perhaps the policy might need some revisiting. However, for now, I think it works well as it stands. Atelaes 19:26, 29 March 2007 (UTC)
And also, this is a wiki. We don't all need to do everything ourselves. If someone refers a word they don't recognise, without checking adequately, then as you say, it is quick for someone else to correct them (as I've just done for WT:RFV#lose one's rag). OK, it then takes time to add the cites to the page, but I've yet to see a case where a link to plenty of clearly appropriate cites had been added to the RFV page and yet the entry still failed. Either someone copies the cites into the entry, or someone maintaining RFV uses their discretion to strike it from the RFV page, or to leave it in place for another month until one of us has time. --Enginear 19:51, 29 March 2007 (UTC)
Wow, thanks for all this great discussion :-) Alright, it looks like people generally are cool with the current system, at least as long as no one starts indeed RFVing cat and dog.  :-) Changing the subject, what would you guys think of a new {{rfcites}} tag for words which one does not want deleted, but simply for which one feels some in-article citations would make our readers happier?  :-) Mr. MacKenzie brought up the great point that sometimes an article would improve with more cites, but is not one we want to delete outright. A tag similar to rfphoto would be entirely appropriate :-D Thanks Connel, you are a great innovator!!! :-D Language Lover 20:27, 29 March 2007 (UTC)
I think that an {{rfcites}} is an excellent idea for words which are obviously in use, but could use some cites simply as an effort at improvement. However, one thing which would need to be considered is some general guidelines as to what sort of entries could specifically use cites. Because, ultimately, all entries which don't have cites could be improved by them, but I imagine that {{rfcites}} would be used on entries that, for some reason, would especially benefit from them. Atelaes 20:50, 29 March 2007 (UTC)
Also of concern with such a template is Eclecticology's idea of separating {{rfv}} and {{rfv-sense}} onto separate pages. That same distinction (for {{rfvcites}}) seems advisable, so this sort of confusion is (perhaps?) less likely to arise. Maybe. I don't feel like creating a separate scheme for them though, nor am I particularly inclined to monitor yet-another-maintenance-page. --Connel MacKenzie 15:54, 31 March 2007 (UTC)
While I have struck words for being clearly in widespread use (and I probably should have done so for attender, but what the heck), I could not do so for usurpress as much as it is evident to me that it belongs. DAVilla 20:10, 29 March 2007 (UTC)

Wiktionary:Contact us and OTRS

You might notice our brand new "Contact us" link in the sidebar. It goes to the new Wiktionary:Contact us page. This is a feature of all Wikipedias, and it leads to various help pages as well as a link to the Foundation email address (OTRS). We decided to start answering Wiktionary-related emails at OTRS, and that's one of the primary reasons for the new sidebar link. I think we'll see what the volume is in the next few days and then determine what to do about volunteers, if anything. However, currently that page is mostly a copy of Wikipedia's page, with the content substituted for Wiktionary equivalents by me. Please spruce it up, make it better, and more Wiktionary-like. Dmcdevit 21:00, 29 March 2007 (UTC)

Dictionary.com

A few months back, default http://www.dictionary.com/ lookups started displaying translations.

Dictionary.com has always been the number one source for inadvertent copyright violations on en.wiktionary.org, particularly from visiting Wikipedians unfamiliar with our rules and our particular copyright concerns.

To me, there seems to be a direct correspondence between the change at dictionary.com and the increase in contributors entering questionable translations. It had not occurred to me that d.com was the source of the translations, particularly for sockpuppets of people who did not speak those languages.

Could all sysops, when checking translations, please remember to check dictionary.com, to see if any patterns evolve? To me, edits like this are particularly disconcerting. Do we automatically block indef for stuff like this?

Thanks in advance, --Connel MacKenzie 15:33, 30 March 2007 (UTC)

I think block indef would be a large overreaction. However, you have a good point that this is something that should be watched for and stopped. Atelaes 18:08, 30 March 2007 (UTC)
Block them for a while and let them explain themselves; if they don't, block them indefinitely. That edit you picked out highlights the kind of arrogance some of these people display; a good blocking always brings them down a peg or two.--Williamsayers79 18:27, 30 March 2007 (UTC)
Well deal with it as you like, but please don't block the user who's editing jungle. I'm working with them. Atelaes 18:30, 30 March 2007 (UTC)
I agree. I don't see what's so disconcerting about that edit, Connel. Essentially, the only changes are (1) changing the outdated language name "Hindustani" to the more usual "Hindi", and (2) changing the POV so that it leaves open whether the word came into English through Urdu or Hindi (instead of definitely from one or the other). I don't see either of these changes as potentially a copyvio. --EncycloPetey 21:44, 30 March 2007 (UTC)
I wouldn't say that the term Hindustani is outdated. The term encompasses Hindi and Urdu, and it especially refers to the colloquial versions (mutually intelligible) of both standards (which have become so distant from each other, due to political reasons, that they're almost mutually unintelligible). --Dijan 20:33, 1 April 2007 (UTC)

Math words with many equivalent definitions

In math, there are some terms for which there are many definitions, such that the definitions are all actually the same, but that fact is not at all obvious. For example, computability theory is famous for having tons of definitions of computable functions, which seem utterly different, but turn out to be identical (but proving that takes a lot of work). I wonder how we should define such words. How about if we gave a broad handwavey definition, together with a link to a subpage which lists the most common formal definitions? What do you guys think? :) I'd like to make a page for semisimple, and I'd also like to add computability theory senses to recursive and computable.  :-) Thanks, y'all!!! :D Language Lover 06:21, 31 March 2007 (UTC)

Hopefully, in such cases there will be a comprehensive article on Wikipedia that can be linked. Then a general definition and a link to Wikipedia should suffice, since Wikipedia allows for a lengthier discussion. --EncycloPetey 08:00, 31 March 2007 (UTC)
With the current software limitations, I am pretty strongly opposed to "subpage-for-everything" concepts. The "Citations" tab was enough of a nightmare, but at least it has direct lexical relevance to dictionary-making. --Connel MacKenzie 15:56, 31 March 2007 (UTC)
It's not only in maths that precise definitions are required: some of the terms used in linguistics, for example, seem equally complex to me, and certainly in physics/engineering, work is defined in several ways which all end up with the same result (though I accept this is a lot simpler than the words you are talking of). It's just that the math ones stand out more in a dictionary. I think there's a general consensus that "technical" definitions should be included, provided that they are not too distracting to the majority of users, who only want the everyday meaning.
In the long run, once someone works out how to do it, I like the idea of collapsible sub-sections for precise definitions, and again for cites <onto soapbox>(to me, the idea of Citations sections/sub-pages reliant on glosses is wrong: citations help to draw out the exact meanings of definitions, so they should be adjacent to them) <off soapbox>. But meanwhile, here's "one I prepared earlier". I did it a few months ago, and I don't really like it. But I've offered it for criticism before, and so far no one's improved the layout.
Obviously, it would look better, to most, if the Template:italbrac sections were collapsed, and only opened up to those who wanted them... Now I've remembered it, I'll add some cites soon. --Enginear 18:14, 31 March 2007 (UTC)
I'm sorry, but most of that information strikes me as encyclopedic rather than as definitional. For one thing, what's with the list of common heat sources? Is this to imply that the use of the term "boiler" is dependent on what heat source is used? If a new heat source were discovered today and people started using it to build boiler-like devices and started referring to these devices as "boilers", would you consider that to be an extension of the existing sense? —RuakhTALK 21:58, 31 March 2007 (UTC)
I largely agree, which is why I said "I don't really like" the article. I did it 10 months ago, when I was far less aware of what should/should not be here. I am fairly sure that the short list of prohibited heat sources (hot water and steam) in the Template:italbrac sections is definitional and never likely to change. Some of the rest should probably be in Usage notes (since it makes clear circumstances where the word should/should not be used, at least in the building industry), and some should be junked. Improving it is still on my "to do" list, but I won't be offended if you rewrite before I get to it (which will be a few days at least). --Enginear 20:00, 1 April 2007 (UTC)

Much ado about Graphemes

Since I see that the last edit to the Beer Parlour was on the "Connel is an asshole" topic (and I know we've all heard enough of that), I thought I'd try and start a discussion on something a bit more constructive: the formatting of letter entries. Seeing as we currently have about five letters in RFC, and, in my opinion, most of our letter entries are kind of messy and unstandardized, I thought a BP discussion on the topic was in order. In my opinion, the first topic which needs to be covered (and has been much discussed on the RFC page, without any conclusion) is what header a letter goes under. The precedent appears to be translingual, and perhaps it should stay that way. However, Stephen has made the excellent point that, while most Latin letters are used in a slew of languages, most other letters are somewhat more restricted. Take the letter β for example. As far as I can tell, it is used only in Greek. Now, bear in mind that I'm talking specifically about β as a unicode character. Coptic also uses beta, but it is a different beta: . So, in that respect, it's not terribly translingual. How about the character 𒊕 (don't worry, I can't see it either), used in Sumerian and Akkadian? By the way, there is an insightful discussion about this topic centering around that particular character at Wiktionary:Requests for cleanup#𒊕. Another question which arises is what information should be included at the entry for a letter. Should it have a pronunciation for every language which uses it? Should these pronunciations all be within a single L2 header, or should each language receive its own L2 header with a pronunciation section within it? If each language gets its own header, what's the part of speech? And where would we include IPA in all this mess? Does a letter get an etymology? It certainly descends from something (in the case of β, it comes from the Phoenician letter 𐤁).
I imagine there is a whole slew of other issues that could be raised, but I figure that's enough to start. To facilitate this discussion, I've created the entry β/test, which is an identical copy of the β page. I figure we can use this page as a testing ground for different ideas without worrying about presenting the users with some half-baked crap. I picked a Greek letter because it's a bit less complicated than a Roman letter (fewer languages), and thus seemed rather more appropriate for testing. It even comes with its own special template: {{greek letter-temp}}, which is only used on this page. Hamaryns has asked that this template be cleaned up anyway, so I figure people can fiddle around with it a bit without screwing up all the Greek letters in the process. So, any takers? Atelaes 12:38, 2 April 2007 (UTC)

I’m not having any luck with the encoding [[&#55304;&#56981;]]. I wonder if you mean 𒊕 Wiktionary:Requests for cleanup#𒊕. The original works for me, as does the last, but not &#55304;&#56981;; it looks like <|=| H. (talk) 09:51, 4 April 2007 (UTC)
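The failing form writes the character as two UTF-16 surrogate halves, each as its own numeric reference; HTML does not recombine these (a reference to a lone surrogate is treated as an error), so the usual fix is a single reference to the real code point. A minimal Python sketch of that arithmetic, assuming the two numbers above are a high/low surrogate pair (the helper names are hypothetical):

```python
def combine_surrogates(high, low):
    """Return the code point encoded by a UTF-16 high/low surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    # Each half carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def to_ncr(code_point):
    """Decimal HTML numeric character reference for one code point."""
    return f"&#{code_point};"


cp = combine_surrogates(55304, 56981)  # the two references from the broken link
print(f"U+{cp:04X}", to_ncr(cp))
```

Under that assumption the pair denotes U+12295, so the single reference `&#74389;` (or the character itself) should work where the split form does not.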
Yes, besides that fact that some scripts such as the Roman alphabet are used for many languages, other scripts are used only for one or two languages (e.g., Lao, Burmese, Thai, Korean, Oriya, Khmer, Tamil, Malayalam, Cherokee, and so on). Also, there are some letters in common scripts such as Cyrillic that are used for only a single language (e.g., Cyrillic is used by a number of languages, but the Cyrillic letter Template:RUchar is limited to Chuvash). And while some scripts are used by multiple languages, there are some languages that use multiple scripts (e.g., Serbian).
In my opinion, the way I set up the Cyrillic and Arabic scripts takes everything into account and works well. See for example ж and Template:ARchar. The principal header is the name of the script (==Cyrillic alphabet==, ==Arabic alphabet==) and the next level heading is ===Letter=== (in the case of alphabets), ===Syllable=== (for syllabaries), ===Logogram=== (such as Sumerian).
As you see in ж and Template:ARchar, the "definition" lines indicate the position in the alphabet and the pronunciation in the different languages that use that letter in that script, in alphabetic order.
After the script section that describes the letter, if the letter in question is also a word or abbreviation of some languages, these sections follow with second-level headers (==Russian==, ==Urdu==, etc.).
I have always thought that the word translingual was very odd for this purpose. A few scripts such as Roman are used by many languages (translingual), but most scripts are not. And there are some letters in the Roman and Cyrillic alphabets that are restricted to a single language. To me, symbols such as !@#$%&*()-+=/., are translingual, because they are used not only by virtually every language that uses the Roman alphabet, but also by languages that use several other alphabets, including Cyrillic and Greek (although the meaning of specific symbols varies from language to language even if the typographic symbol does not).
We have had short discussions about this several times over the last couple of years, but nothing has ever been decided. I did the Cyrillic and Arabic alphabets and could do the same for many other scripts, but I don’t want to do it when this is still all up in the air the way it is. —Stephen 18:25, 3 April 2007 (UTC)
I think you set those up very nicely, but am unhappy with the L2 headers X alphabet. That’s why I would propose to just put them one level lower, with, indeed, Translingual as the L2 header. As I suggested in the page about the cuneiform above, we probably want to think of a better word instead of translingual. Maybe ==Symbol==, with L3 ===Cyrillic letter===, ===Roman letter===, ===Cuneiform logogram===, ===Diacritic===, ===Ligature===, ===Reading mark=== (for !;:.$, but there is probably a better English word for that), ===Mathematical sign=== (for +-%#∃∄∃∈∉...), ===IPA symbol===, ...
For the other use of Translingual, for things that are more than symbols such as µg, ff, mW etc., it may be kept, or we could think of a second alternative. H. (talk) 09:22, 4 April 2007 (UTC)
Thanks for bringing this up. As some might have noticed, I’ve been spending a lot of time on the first few letters of the Greek Alphabet. β sort of represents what I think it should look like now. But if you browse through the recent history, you see that I’ve come a long way to getting to that form. I would recommend looking at that (and perhaps also at α, γ, ...) before commenting here. H. (talk) 13:08, 2 April 2007 (UTC)
Wait, I thought Wiktionary treated Modern Greek and Ancient Greek as separate languages? β is used in both.
At any rate, I've been thinking about this for Hebrew letters, and what I was thinking was:
  • letters are translingual; even if the letter is only used in one language's writing system, actual graphic references to the letter are the same no matter what language the referring text is in.
    • they don't have a pronunciation per se, though we could have e.g. an Appendix:Greek alphabet that gives that kind of information (though insofar as this borders on a discussion of the phonological history of Greek, it might be more appropriate on Wikipedia).
    • uppercase and lowercase letters are separate, for a few reasons, most notably that case mappings depend on the language (e.g., in Turkish "i" and "I" are different letters, with "i" having a dotted uppercase counterpart and "I" having a dotless lowercase counterpart), and that the Greek lowercase letters should have their uppercase counterparts as their etymologies.
  • names of letters are language-specific; for example, "beta" is an English word that refers to both β and Β.
  • the definition of a letter in an ordered alphabet should link to its predecessor and successor. (The reason I say this should go in the definition is that the same letter may appear in multiple alphabets — especially common with Latin-based and Cyrillic-based alphabets — in which case the letter's predecessor and successor may vary. For example, n should have a separate definition for the Spanish alphabet, giving ñ as its successor.)
  • if a letter is a specific form of an abstract letter (like β is of beta, and a Japanese katakana character is of a kana character), then it should link to the other forms.
  • So, for example, I think β should be something like "==Translingual== ===Letter=== β # Lowercase beta, the second letter of the Greek alphabet (uppercase form Β), coming between α and γ." (Plus the other definitions at that page, obviously.)
Is that reasonable at all?
RuakhTALK 16:47, 2 April 2007 (UTC)
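Two of the points above are easy to illustrate: a letter's predecessor and successor depend on which alphabet it is listed in, and case mapping depends on the language. A minimal Python sketch; the alphabet lists are simplified assumptions (a 27-letter Spanish alphabet, and a Greek ordering with digamma ϝ before ζ), and `neighbours`, `turkish_upper` and `turkish_lower` are hypothetical helpers, not existing templates:

```python
# Hypothetical sketch: neighbours of a letter are looked up per alphabet,
# because the same letter can sit in different places in different alphabets.
# These lists are simplified assumptions for illustration only.
ALPHABETS = {
    "English": "abcdefghijklmnopqrstuvwxyz",
    "Spanish": "abcdefghijklmn\u00f1opqrstuvwxyz",   # assumed: ñ after n
    "Modern Greek": "αβγδεζηθικλμνξοπρστυφχψω",
    "Ancient Greek": "αβγδε\u03ddζηθικλμνξοπρστυφχψω",  # assumed: digamma before ζ
}


def neighbours(letter, alphabet):
    """Return (predecessor, successor) of letter in the named alphabet.

    None marks the start or end of the alphabet; str.index raises
    ValueError if the letter is not in that alphabet at all.
    """
    letters = ALPHABETS[alphabet]
    i = letters.index(letter)
    before = letters[i - 1] if i > 0 else None
    after = letters[i + 1] if i + 1 < len(letters) else None
    return before, after


def turkish_upper(s):
    """Uppercase with Turkish rules: i -> İ (U+0130), ı -> I."""
    return s.replace("i", "\u0130").upper()


def turkish_lower(s):
    """Lowercase with Turkish rules: I -> ı (U+0131), İ -> i."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


print(neighbours("n", "English"), neighbours("n", "Spanish"))
print(neighbours("ζ", "Modern Greek"), neighbours("ζ", "Ancient Greek"))
print("istanbul".upper(), turkish_upper("istanbul"))
```

Python's default `str.upper()` follows the language-neutral Unicode mapping (i to I), which is why a Turkish-specific function is needed at all.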
Excellent conclusions. I especially like the distinctions you make between the symbols and their relationship to the alphabets, in case and in order.
Does any translingual word have a pronunciation? How many different expanded forms are there for №? Apparently translated as знак номера in Russian! How many ways are there to say TAXI? In Nigeria, /dag'zi/... DAVilla 19:20, 2 April 2007 (UTC)
Actually, I think this is a case where translingual is misapplied. As far as I know, languages that use the Roman alphabet do not use the symbol №. It is very familiar and perfectly readable, but it is quite unusual to actually use it in a text. Cyrillic is a better description of №, because languages that use Cyrillic do not have the letter N readily available (at least in pre-Unicode days), and therefore that symbol is specifically provided on Cyrillic keyboards (the uppercase of the number 3 or sometimes 4). Japanese uses various things for this, including , etc., and as far as I know, the symbol №, although it appears to be a Roman symbol, is Cyrillic only. —Stephen 18:57, 3 April 2007 (UTC)
Once again: if we simply think of another word which does not have the connotation that it is to be used in more than one language, this could be solved. I am all for one header which applies to everything which is not a word in a language and thus does not fit under a ==language== header. H. (talk) 09:22, 4 April 2007 (UTC)
Indeed. Grumble, grumble, now I’ll have to redo a lot of my Greek letter work. And indeed you’re right that the lower case forms were derived from the upper case ones; I should have thought of that. Hm, more input is still welcome. H. (talk) 10:14, 3 April 2007 (UTC)
I don't think your template is useless. Just put it under a language-specific header. The example above would be placed in Greek. There's no reason not to define it in both the Translingual header and in the languages from which the letter actually derives. But realize that there could be more than one instance of the template on a page, and work towards making a more concise format when shared by many languages, e.g. on n. DAVilla 12:16, 3 April 2007 (UTC)
No. I don’t like that at all. The template is some sort of extra thingie; it would be ugly if there were more than one on one page. But it can be extended to allow for more than one previous / next letter, by using named params or something. I’ll give it a try, since the problem just arose for ζ (sixth in modern Greek, seventh in Ancient Greek).
I think n is a bad example, it really needs some cleaning up (which I’d be happy to do, once this discussion has settled, and I finished the Greek alphabet, and perhaps some other ones :-) ) H. (talk) 09:22, 4 April 2007 (UTC)
I had a go at {{greek letter-temp}}, to accommodate more than one previous letter, and put it into use in β/test. Have a look. H. (talk) 09:51, 4 April 2007 (UTC)

And by the way, do we also have to make distinctions between different forms of letters that are conflated in English? The two lower-case a's have different meanings in IPA. This is handled by unicode, as are, strangely, a number of other very similar-looking characters, but what about cases that are not? The number 7 can have a stroke through it in some parts of the world, two strokes in others. In Taiwan the left bar of the 5 extends upwards vertically. (I have even had my handwriting "corrected" by a local.) Print, in block letters, the words "island", "glands", and "sliding" and then compare them. You might be surprised! What about symbols that don't have a unicode equivalent, such as the happy face, many of the more obscure and antiquated astrological symbols, and some of the symbols used in print by various magazines, journals, etc. to indicate the end of an article? DAVilla 19:41, 2 April 2007 (UTC)

I think every unicode symbol deserves its own page. No redirects at all. The fact that different languages use different orders in the alphabet makes templates like {{greek letter}} difficult to use. That’s a pity though, since they are nice. Anybody have an idea how to combine the two? One such table per language using the letter is absurd, but something similar would be nice. H. (talk) 10:14, 3 April 2007 (UTC)
What if they're exactly the same symbol, just with different uses? Why bow down to Unicode? DAVilla 12:20, 3 April 2007 (UTC)
It's not a question of "bowing down" to anyone or anything. What else can anyone possibly use to look up terms here? Since the distinction by spelling has already been made, it seems only reasonable to extend that same distinction by spelling (of headword) to individual symbols. --Connel MacKenzie 03:36, 4 April 2007 (UTC)
Good point. Unicode gives a short description of each symbol; maybe for starters it is possible to import that with a script? And even non-Unicode symbols are welcome, but do they still exist? H. (talk) 09:22, 4 April 2007 (UTC)
Does the happy face have a Unicode character? Might seem silly, but remember we're talking about a noncommercial symbol that's instantly recognizable internationally and used in writing today. DAVilla 16:16, 4 April 2007 (UTC)
Indeed, it has two: ☺ (U+263A, WHITE SMILING FACE, = have a nice day!) and ☻ (U+263B, BLACK SMILING FACE). —RuakhTALK 16:31, 4 April 2007 (UTC)
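Python's standard `unicodedata` module exposes the character names from the Unicode Character Database, so the kind of import suggested above could start from something like this minimal sketch (the `describe` helper and its output format are assumptions; uploading anything to the wiki is not shown):

```python
import unicodedata


def describe(ch):
    """Return 'U+XXXX NAME' for one character, e.g. as a stub for a new entry."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"


for ch in "\u263a\u263b":
    print(describe(ch))
```

The informal alias "= have a nice day!" comes from Unicode's separate NamesList annotations and is not available through `unicodedata`.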
Wow! Unicode is so complete that counter-examples are clearly difficult to come by. Really compelling ones, that is. The handicap sign and boy/girl stick figures just aren't used in running text. Not that I'm aware of, anyways. For the more contemporary ones, I'm sure I've seen a little symbol for a TV here and there, as a fancy bullet or what have you. Nah, maybe just an icon. What are we down to, the ancient Chinese only coded in BIG-5? DAVilla 00:06, 5 April 2007 (UTC)
I just read up about Chinese in Unicode (due to the decomposing suggestion below): there are 70000+ Chinese ideographs in Unicode, so you’ll have to search far to find some which aren’t, but indeed, they do exist (there are some examples in the document referenced in the below discussion). And I’m pretty sure there are some obscure mathematical symbols which aren’t, yet. But eventually they all will be, I suppose. Hell, even the most abstruse cuneiform symbols are in there. H. (talk) 10:38, 5 April 2007 (UTC)
By indices, similar to Chinese characters. The symbol (that is, one of the symbols) for Pluto uses a combination of P and L. Going from the planet to the symbol is easy. In the other direction, if you found it online and wanted to look it up you could copy and paste it. But if you saw it in a book and you didn't know what it meant, there would be no other way of telling the computer "look at this and tell me what it is" than to decompose it.
I have no objection to making a separate page for each unicode character. It's not certain that it's the ideal solution but it's certainly the clearest one. I would just hope that some of them are very closely linked, even tighter than a simple "see also" at the top. DAVilla 16:09, 4 April 2007 (UTC)
We might make exceptions for symbols that are only present for backwards compatibility purposes, though, such as CJK Compatibility Supplement: U+2F800–U+2FA1D. H. (talk) 10:38, 5 April 2007 (UTC)
You don't have any choice about making an exception in this case ;-) the WM software (correctly) maps to the standard character, so you can't make an entry at the compatibility code-point. FYI: User:Robert Ullmann/Han is a complete map of the CJKV/Han characters we have. Robert Ullmann 11:53, 5 April 2007 (UTC)
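The mapping described above matches Unicode canonical (NFC) normalization; assuming that is what MediaWiki applies to page titles, a minimal Python sketch shows why compatibility ideographs cannot get their own entries, while Kangxi radicals such as ⼥ (U+2F25) survive NFC and are only folded by the stronger NFKC form (`canonical_title` is a hypothetical helper):

```python
import unicodedata


def canonical_title(title):
    """Approximate title normalization, assuming it is plain Unicode NFC."""
    return unicodedata.normalize("NFC", title)


# CJK compatibility ideographs have singleton canonical decompositions,
# so NFC replaces them with the unified ideograph.
print(canonical_title("\uF900") == "\u8C48")       # U+F900 -> U+8C48
print(canonical_title("\U0002F800") == "\u4E3D")   # U+2F800 -> U+4E3D

# Kangxi radicals only have compatibility decompositions: NFC keeps them,
# NFKC folds them into the ordinary ideograph.
print(canonical_title("\u2F25") == "\u2F25")                # ⼥ unchanged
print(unicodedata.normalize("NFKC", "\u2F25") == "\u5973")  # ⼥ -> 女
```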
I've altered β/test to conform to my vision of what the proper formatting should look like, which can be seen here. I suggest that others might consider doing the same, as it's much easier to see the stuff in practice than in theory. I've put my name at the top as an L1 header, in case others put their own versions, just so it'll be easier to keep track of whose version is whose. A few notes: First of all, it should be remembered that Ancient Greek did not actually use this letter, which is kind of interesting. We use minuscules in our Ancient Greek words because that is the general standard in other Ancient Greek works. I think it best to simply get the Wiktionary Ancient Greek section up to the standards of other lexicons before trying to outdo them. It is something that will certainly come up in the future, but it is not really germane to this particular conversation, so I'll drop it for now. I've dropped a lot of the stuff which should really be on the majuscule version's page. All of the information which I feel is specific to the character (outside of the context of any specific language) I've put under the translingual header. Everything which depends on the context of a specific language, I've put under the headers of the languages. As for the template, I think that, with a bit more tweaking, it could be general enough to be used for most languages, and would be best used in the language sections on the letter entries. Atelaes 21:54, 4 April 2007 (UTC)
Good idea, I put my version in its own section below it: [7]. I borrowed some of your ideas, and interspersed mine with small comments, where suggestions are welcome. Most importantly, I use the template only once, with the accommodations I made to it to have multiple previous/next letters for different languages. I am not enough of a historian to decide on some points, which I put in the comments. H. (talk) 10:38, 5 April 2007 (UTC)
That is an excellent idea (I was thinking people would each just have a version, but your idea is much better). The facts are, ultimately, unimportant at this point, only the format. Atelaes 15:32, 5 April 2007 (UTC)

Some input please

It seems that only Atelaes and I are interested in this any more. What do others think of my suggestion to use ==Symbol== instead of ==Translingual==? Who else wants to experiment with β/test? Stephen, you at least should have a go. I want this settled, so I can continue with the Greek alphabet. H. (talk) 15:25, 6 April 2007 (UTC)

I can accept ==Translingual== for symbols that are used by numerous languages and even in different scripts (Roman, Greek, Cyrillic, etc.), such as !@#$%*()[]/:;,.?, but it strikes me as silly if the symbol in question is only used by one language and in only one script, such as க (a Tamil "ka"). There is nothing "translingual" about it. So, Symbol would be a better choice, although still a problem in some cases, since the alphabets used by some languages include digraphs, trigraphs and tetragraphs (e.g., Dutch IJ, ij). If a tetragraph can be considered a "symbol", then it wouldn’t be too bad.
However, if we use Symbol, then some "symbols" will be letters of alphabets, some symbols will be punctuation, some symbols will be numerals, and some symbols will be symbols (e.g., @#$%*)). That means that there would be cases where the L2 heading was ==Symbol== and the L3 heading was also ===Symbol===.
Besides ===Letter===, ===Symbol===, and ===Punctuation===, there will also be ===Logogram=== (e.g., Sumerian, where a glyph has both syllabic and semantic value), and ===Syllable=== (e.g., the syllabaries of Amharic, Oriya, Gujarati, Bengali, Thai, Khmer, Lao, and so on). Also, there are some true alphabets that only write "letters" that have been composed into complete syllables (e.g., Korean, Phags-pa).
So I still hold that the names of the scripts (Roman alphabet, Cyrillic alphabet, Greek alphabet, Cuneiform script, and so on) are the best choice for L2 headers, keeping the type of symbol (punctuation, symbol, letter, syllable, logogram) for L3 headers. But if it comes down to "translingual" vs. "symbol", I much prefer "symbol". —Stephen 05:10, 15 April 2007 (UTC)
There is a serious problem with using things other than languages at L2: there are hundreds of bots and programs that read the en.wikt, to add entries to other wikts, to extract various kinds of info, etc. Level 3/4/5 headers (if valid) are in a smallish set, 50 or so; a program can have a table of what it is interested in, and treat others as unknown/errors. But at level 2, the program cannot reasonably have a "complete" table of the languages (7000+ coded now), so the only way it can parse the heading is to recognize "Translingual" as not a language, and treat all of the others as language names. And that is what they do. If there is another open-ended set of headers at L2, with no syntactical indicator that they are not a language, the parsing is irretrievably broken. And we don't have any syntactical indicator. (If we were using XML or something, we'd use L2-lang and L2-thing or whatever.)
More abstractly, to maintain the ability to abstract the semantic meaning from the entry syntax, L2 must always be a language name.
The other point is that "Translingual" is exactly the right header for the Cyrillic and Arabic alphabets; each is used in dozens of languages. (And the letters aren't "symbols".) Things like the Tamil "ka" can just be under Tamil (as all of the Hiragana entries are under Japanese.) Robert Ullmann 12:00, 15 April 2007 (UTC)
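A minimal Python sketch of the parsing rule described above, assuming L2 headers are written as `==Name==` on a line of their own; the regex and the `l2_sections` helper are hypothetical, not code from any actual bot:

```python
import re

# An L2 header is "==Name==" alone on a line; the lookarounds reject
# "===Name===" and deeper headers.
L2_HEADER = re.compile(r"^==(?!=)[ \t]*(.+?)[ \t]*(?<!=)==[ \t]*$", re.MULTILINE)

# Assumption: "Translingual" is the only non-language L2 header.
NON_LANGUAGE_L2 = {"Translingual"}


def l2_sections(wikitext):
    """Return (header, is_language) for each L2 header in an entry.

    There is no syntactic marker for "not a language", so every header
    other than the known exceptions is assumed to be a language name.
    """
    return [(m.group(1), m.group(1) not in NON_LANGUAGE_L2)
            for m in L2_HEADER.finditer(wikitext)]


entry = "==Translingual==\n===Letter===\nβ\n\n==Greek==\n===Letter===\nβ\n"
print(l2_sections(entry))
print(l2_sections("==Cyrillic alphabet==\n===Letter===\n"))
```

With this rule, a header such as `==Cyrillic alphabet==` comes back as `("Cyrillic alphabet", True)`: it is silently treated as a language, which is the breakage described above.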

English to Arabic wordlist relicensed to GFDL

Arabeyes.org is proud to announce that its GPL English to Arabic wordlist was relicensed to GFDL to meet Wiktionary's needs. The source PO files can be found here. It already has a web interface named Qamoose. It can be a valuable addition to Wiktionary. --Chahibi 01:23, 3 April 2007 (UTC)

I'm quite limited on Wiki-time right now, myself. Please (everyone?) see Help:Bots / WT:BOTS etc. (The help page is obviously my first draft - please be bold rewriting it.) I think if 20-30 of our current admins take an hour to install the bot framework, we'd have a respectable pool of bot operators to draw from (and much greater understanding of the advantages and limitations, all around.) --Connel MacKenzie 04:54, 3 April 2007 (UTC)

Words that are the same in other than English language.

What to do with words that are the same word as in English, in some language other than English, and with largely the same definitions? I.e. most of the time words that come from Latin or Greek, such as epsilon: in Dutch it means about the same as in English (of course), except for the computer science meaning. The question is: what to put in the Dutch definition line:

# [[epsilon#English]] (letter, mathematics, phonetics) 

i.e. a short gloss (but not so nice, and can get long if a lot of definitions coincide) or

# The name for the fifth letter of the [[Greek alphabet]].
# {{context|phonetics|lang=nl}} The [[IPA]] symbol that represents the [[w:open-mid front unrounded vowel|]].
# {{context|mathematics|lang=nl}} An [[arbitrarily]] small [[quantity]].

i.e. a repetition of the English definitions? H. (talk) 10:35, 3 April 2007 (UTC)

Other languages use a translation, not a definition, where possible, which means that the first option is better. However, there should still be a separate definition for each foreign sense of the word. That might mean making three definitions which all translate to the same English word, with three different glosses. DAVilla 12:08, 3 April 2007 (UTC)

{{trans-top}} and AutoFormat

At Connel's request, I added code to AutoFormat to convert top/mid/bottom only within Translations sections to trans-top/etc.

If you add {{rfc-auto}} to an entry when editing it will find the entry, even if not run for a while.

The gloss is correctly folded into the template if it is ;... or '''...'''; a few variant cases won't work (see name); these show up in Category:Translation table header lacks gloss. This is only done in the Translations section; top isn't supposed to be used elsewhere, but often is. Robert Ullmann 11:06, 3 April 2007 (UTC)

The bot probably shouldn't touch anything under {rfc-trans} or {checktrans} either, or if it does then it should treat those cases specially, with the "gloss" being 'Translations to be checked' or similar. DAVilla 12:05, 3 April 2007 (UTC)
If the "Translations to be checked" header is there it won't. (You might be surprised at how often it changes "Translations to be categori{sz}ed" to the correct header ;-) Stopping at either of those two templates is a good idea; will do; it will just leave the rest alone. Robert Ullmann 12:11, 3 April 2007 (UTC)
Thank you. Shall I change all "{{top}}"s to "{{rfc-auto}}{{top}}"s?  :-) --Connel MacKenzie 03:39, 4 April 2007 (UTC)
Please don't. I've cleaned out a number of the table-header-lacking-glosses entries in that category and found the work to be tedious and mundane. In a few cases I actually had to write a gloss, or used one the bot missed, but on most pages it was unclear and all of those translations had to be ttbc'd, and adding ttbc tags is a repetitive chore. On the other hand, in a few cases like summer I was able to do some research to discover when the second sense was added, and wound up being able to write a gloss after all, one that applied to translations in several dozen languages. I think we should strive for that kind of solution, not overburdening the translation work any more than it is, and I feel that there's a lot of clutter that we really don't need to be digging up until there's a more automated solution. In other words, marking those where a gloss does not exist does not solve any problems. It floods the more interesting work with trivial tasks that really only pass the buck onto the translators. I don't have an immediate solution, although I think that some day about half of the checktrans traffic could be eliminated by a history-aware bot. Maybe someone else could clear out part of the category and get a feeling of what sort of things need to be done. DAVilla 15:45, 4 April 2007 (UTC)
Please note my "smiley"! --Connel MacKenzie 20:19, 5 April 2007 (UTC)
By the way, you'll find that the fewer the number of definitions, the easier it is to salvage the table. But the majority were ttbc'd as I said. DAVilla 15:48, 4 April 2007 (UTC)
If we were to do this, there is a much easier way (add the cat to {top}!); but we shouldn't do that yet. I've changed the code for now to not convert the templates where it can't find the gloss. (So as to avoid flooding that cat for now.) If you wanted to tag entries that have ''' or ; at the start of one line and {{top}} on the next, that might be useful. Then we can see where we are. I wonder how many instances of top outside of translations sections we still have? Robert Ullmann 12:06, 5 April 2007 (UTC)
Answers to my own questions: top is used about 24 thousand times, in just over 15 thousand entries; about 12 thousand do not have glosses. It is used about 700 times outside of translations/ttbc, where it shouldn't be used; mostly in derived and related terms. Robert Ullmann 15:57, 10 April 2007 (UTC)
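The conversion rule described in this thread can be sketched roughly as follows. This is a hypothetical Python approximation, not AutoFormat's actual code: it assumes it is given only the lines of one Translations section, takes the gloss from a `;gloss` or `'''gloss'''` line directly above `{{top}}`, leaves tables without such a line alone, and stops converting once it meets `{{rfc-trans}}` or `{{checktrans}}`.

```python
import re

# A gloss line directly above {{top}}: either ";gloss" or "'''gloss'''".
GLOSS_LINE = re.compile(r"^(?:;\s*(.+?)|'''(.+?)''')\s*$")
# Once one of these appears, the rest of the section is left alone.
STOP_TEMPLATES = ("{{rfc-trans", "{{checktrans")


def convert_translations(lines):
    """Convert glossed {{top}}/{{mid}}/{{bottom}} tables to trans-* templates.

    `lines` is assumed to be the lines of one Translations section only.
    Tables with no gloss line above {{top}} are not converted.
    """
    out = []
    converting = False  # inside a table whose {{top}} was converted
    stopped = False     # a stop template has been seen
    for line in lines:
        s = line.strip()
        if s.startswith(STOP_TEMPLATES):
            stopped = True
        if s == "{{top}}" and not stopped and out:
            m = GLOSS_LINE.match(out[-1].strip())
            if m:
                gloss = m.group(1) or m.group(2)
                out[-1] = "{{trans-top|" + gloss + "}}"  # gloss line is folded in
                converting = True
                continue
        if converting and s == "{{mid}}":
            out.append("{{trans-mid}}")
        elif converting and s == "{{bottom}}":
            out.append("{{trans-bottom}}")
            converting = False
        else:
            out.append(line)
    return out


section = [";warm season", "{{top}}", "* Dutch: [[zomer]]", "{{mid}}",
           "* German: [[Sommer]]", "{{bottom}}"]
print("\n".join(convert_translations(section)))
```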

Components of Chinese characters

I'm not sure if this has been discussed before (the discussion archives are a bit difficult to search), but the Chinese character entries are missing a decomposition into components, as described in wikipedia:Radical (Chinese character), subsection "Character decomposition".

The decompositions could be given as Unicode ideographic description sequences (see [8], figure 11-8) and if necessary also in some other format. It would also be useful to have indices based on them, as most dictionary programs have a way of doing component search and Wiktionary should too. Multicomponent search and other such complicated things could be left to external software which could just get the indices from Wiktionary. The ultimate wiktionary project could also provide the extended search functionality if/when it materializes.

Of course there are many characters that are hard to produce good decompositions for, but most are easy, and there's no need to fret over the details. Simple graphical decompositions provide good enough indices for searching. Actual radicals and etymologies etc. are also a separate matter. If there's some kind of decision on this then one could start adding the decompositions right away, just like stroke order diagrams are being added incrementally. -- 130.233.24.129 11:32, 3 April 2007 (UTC)

Thank you for your suggestion. A character decomposition section may indeed prove useful to someone wishing to know more about a particular character. I would anticipate that the most challenging aspect of such an undertaking would be the sheer amount of time and effort involved in inputting such information. Unless a non-copyrighted database containing this information is already in existence, we would have to type this information by hand, one character at a time, into Wiktionary. My hope is that some day, we will have enough Chinese speakers to tackle such tasks in a short amount of time. For the time being, there are only a handful of contributors that work on Chinese entries. Of these, I'm the only one fluent in Chinese that regularly contributes Chinese words (Mandarin and Min Nan). My main activities to date have been focused on two areas:
  1. creating entries for useful Chinese words and phrases that are not found in other Chinese-English dictionaries
  2. creating entries for words found in the Appendix:HSK list of Mandarin words
I also recently finished the Appendix:Amoy Min Nan Swadesh list, and completely revised the Appendix:Mandarin Swadesh list that originally came from Wikipedia. If you are interested in working on character decompositions yourself, there are several of us here who could offer formatting suggestions, proofreading etc. If this sounds like something you would like to work on, I would suggest that you create an account for yourself. Once you have done that, you should read WT:ELE and WT:AC. -- A-cai 12:18, 3 April 2007 (UTC)
It is something I'm interested in, but I don't tend to contribute much on wikis. I would contribute decompositions now and then if there was an accepted format for them. I don't know any Chinese though, only Japanese.
There's no public domain database of decompositions that I'm aware of, but there is a GPL one at [9]. GPL is unfortunately incompatible with GFDL, even though both are GNU licenses. You can do searches on the aforementioned database at [10]. E.g. if you enter 糸車口 it gives you a list containing 轡, and with 肉退 a list containing 腿 (because the 月 is 肉月 you have to enter 肉; I think it would be more useful to allow 月退 too as that's what it looks like graphically). It allows both the actual radicals and their meanings, e.g. both ⺅中 and 人中 give you 仲.
Anyway, as there's an existing (free, even if incompatible with GFDL) implementation, it's both possible and useful. I don't think there's need to do this in a short amount of time - it's not like this information will become obsolete any time soon. It will eventually be complete even if done little by little. If there were a few examples and maybe a category of "Character decomposition needed" like there is "Cantonese definitions needed" etc, a casual visitor like me might add a few when they see they're needed. I've added some entries from time to time for Japanese words and would do that for character decompositions if there were an accepted format for it. -- 130.233.24.129 12:59, 3 April 2007 (UTC)
How about you just go ahead, create one or two entries as you see fit, post them here, and then others can comment on it and make suggestions. Someone has to be the first... H. (talk) 10:09, 4 April 2007 (UTC)
Robert, I'm thinking that this is something that should be in your Template:Han char template under the translingual section. Do you think it would be a problem to add a variable to the template? If we use 字 as our model character, then the character decomposition would look like: 宀子. We would put this information under a variable called comp or something. For example:
{{Han char|rad=子|rn=39|as=03|sn=6|four=3040<sub>7</sub>|canj=十弓木 (JND)|comp=宀子}}
would produce:
字 (radical 39 子+03, 6 strokes, cangjie input 十弓木 (JND), four-corner 30407, composition 宀子)
That should do the trick I think. -- A-cai 23:05, 3 April 2007 (UTC)Reply
I'd also like to see, for example, 字 listed on both 宀 and 子, or on the proper indices. The radical is more important of course, but this dictionary is not limited by paper constraints. DAVilla 15:54, 4 April 2007 (UTC)Reply
I think it would be nicer to use IDS descriptions instead of a plain list of components. E.g. 字 would be "⿱宀子", 轡 "⿱⿲糸車糸口" and 疑 "⿰⿱匕矢⿱龴疋". This way the layout and the count of each component are also present. IDS is originally meant to describe characters missing from Unicode to the reader, so having such descriptions would also be useful if the user's font is lacking some rare characters that are in Wiktionary. Having a list of these would also facilitate advanced searching in external software (such as browser plugins or free dictionary software). Simple indices should of course ignore this extra information, as that would get too complicated. Simple component lists (i.e. "宀子", "糸車口" and "匕矢龴疋" for the above) are not bad either, but I think the extra information with IDS is useful, too. -- 130.233.24.129 17:46, 4 April 2007 (UTC)Reply
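To make the IDS suggestion concrete, here is a minimal sketch of how a hypothetical helper could derive the simple component list from an IDS string, and check that a description is complete. It assumes only the original twelve Ideographic Description Characters (U+2FF0 to U+2FFB), where ⿲ and ⿳ take three operands and the rest take two; the function names are illustrative, not an existing Wiktionary tool.

```python
# Assumption: only the original Ideographic Description Characters
# U+2FF0..U+2FFB are handled (later Unicode versions added more).
IDC_RANGE = range(0x2FF0, 0x2FFC)
TERNARY = {"\u2FF2", "\u2FF3"}  # ⿲ left-middle-right, ⿳ above-middle-below


def components(ids: str) -> str:
    """Drop the layout operators and keep each component once, in order."""
    seen = []
    for ch in ids:
        if ord(ch) in IDC_RANGE:
            continue  # layout operator, not a component
        if ch not in seen:
            seen.append(ch)
    return "".join(seen)


def is_well_formed(ids: str) -> bool:
    """True if the IDS string is exactly one complete description."""
    need = 1  # number of operands still expected
    for ch in ids:
        if need == 0:
            return False  # extra characters after a complete description
        need -= 1
        if ord(ch) in IDC_RANGE:
            need += 3 if ch in TERNARY else 2
    return need == 0
```

With the examples above, `components("⿱⿲糸車糸口")` gives the simple list "糸車口", while the IDS string itself keeps the layout and the fact that 糸 appears twice.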
This is probably obvious, but... The component list should be restricted to characters that have entries in Wiktionary, and each component should link there. Index:Chinese_radical lists the radicals. There are some compatibility characters in Unicode that look the same but don't have Wiktionary entries, e.g. ⼥ (U+2F25) vs. 女 (U+5973). As a result some differences will have to be ignored, e.g. instead of using the compatibility characters ⻌⻍ one would always use 辶. In the same vein characters like would be decomposed as and . Using instead of is better because that's what it looks like; similarly is better as and than and . -- Coffee2theorems 13:31, 7 April 2007 (UTC)Reply

Getting backing before making drastic changes.

I've been reprimanded recently for making changes to some of Wikt's pages. Sorry for that. I'm still quite new tho'. To make my point, where does one go to get support for changes here? One example is my recent creation Template:Keene-un. This is a template I use to save time, and it isn't a 'bot, so is it ok to use? Do I have to get backing to use it? Also I edited WT:ELE recently, making only minor changes to improve the flow of the page, but got blocked for it. Is Wikt so stringent as to worry about things like this? --Keene 23:14, 4 April 2007 (UTC)Reply

I don't know what the policy is for having personal templates in the common space. I guess there should be some recommendation about it since it's easier to type {{subst:#invoke:template link|show}} or the like than [[:]]. These kinds of templates are useful, and could be developed into a Go-failed button. Do make sure you do substitute it though, including the 5 pages listed under "What links here". DAVilla 00:14, 5 April 2007 (UTC)Reply
I have done this {{xhan}}; of course I can just delete it myself when I'm done. I don't think it is a problem if you make sure the name doesn't conflict with various reserved spaces (2 and 3 letter templates, and things starting with 2 or 3 and -). "keene-un" seems reasonable. Make sure it says in noinclude tags that it is yours, and can be deleted if left around, and do tag it with {{delete}} when done.
As to the WT:ELE edit, you did more than "improve the flow"; you deleted important text explaining that interwiki links should not be entered manually. (IMHO, the section could be reduced to just that sentence; it is the only thing most users need to know: don't add or modify iwikis!) Robert Ullmann 12:16, 5 April 2007 (UTC)Reply
WTF? Why aren't you just using the preload templates? Is there a bug in one of them? --Connel MacKenzie 21:33, 5 April 2007 (UTC)Reply
But why was he blocked for this? The edits don't appear all that radical. Granted, he deleted the last sentence, which was perhaps a mistake. But, it does not appear to be a malicious act on his part. As for the preload templates, maybe as a new contributor, he did not know about them. Am I missing something? -- A-cai 05:52, 6 April 2007 (UTC)Reply
The 3-day block seems a little harsh. Aside from the last paragraph, the edits did not change the substance. But that issue is completely unrelated to this. DAVilla 21:38, 7 April 2007 (UTC)Reply
Hehe... what are preload templates? *Language Lover deftly dodges all the thrown tomatoes and eggs* Language Lover 14:01, 6 April 2007 (UTC)Reply
Is there no process by which contributors can go about making new tools? This template clearly had a more specific purpose than any of the preload templates provided. DAVilla 21:38, 7 April 2007 (UTC)Reply

Thesaurus resource

http://lists.wikimedia.org/pipermail/wikitech-l/2007-April/030762.html

--Connel MacKenzie 20:17, 5 April 2007 (UTC)Reply

Before I start the pagefromfile.py to populate Wikisaurus with some real entries, does anyone have comments on this? --Connel MacKenzie 05:57, 7 April 2007 (UTC)Reply
As thesaurus entries are generally interesting, I plan on not requesting the bot flag for these, to increase visibility, and throttle them to one entry per 20 minutes so people can fiddle with them. --05:59, 7 April 2007 (UTC)
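The throttling described above amounts to pausing between saves. The sketch below shows one way to express it; `save` is a hypothetical callback standing in for whatever actually writes each page (it is not pywikipedia's API, and pagefromfile.py's own options are not assumed), and `sleep` is injectable so the pacing can be checked without waiting.

```python
import time


def upload_throttled(entries, save, interval=20 * 60, sleep=time.sleep):
    """Save (title, text) pairs one at a time, pausing `interval` seconds
    (default: 20 minutes) between saves. `save` is a hypothetical callback."""
    saved = 0
    for title, text in entries:
        if saved:
            sleep(interval)  # wait before every save except the first
        save(title, text)
        saved += 1
    return saved
```

Sleeping before each save rather than after means the last entry does not leave the script idling for an extra 20 minutes.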
I thought the argument against using a bot for the Thesaurus was that entries were too complex and required close scrutiny of the precision of a given term for a given definition. I would be interested to see what pagefromfile pages looked like, but I have to imagine that very few meaningful pages would emerge from them. - [The]DaveRoss 01:39, 11 April 2007 (UTC)Reply
Wow I wish I had your mad skills at programming, Connel :-) A programming master like you is a great boon to the wiktionary. Let's turn Wikisaurus into a Wikisaurus REX!! :-) Language Lover 02:16, 11 April 2007 (UTC)Reply

Time to whittle

Original by dcljr

The entry for time has become our longest regular definition page, at over 40K, thanks to hundreds of "Derived terms" added by User:Paul G in February. I wanted to bring people's attention to this because it seems to me that many of the added terms are unnecessary, being either technical terms that probably don't warrant their own entry here (such as acquisition feeding time or clot retraction time), terms that are [arguably] easily understood by considering their constituent words (such as at what time or closing time), or alternate forms of other derived terms (such as about time too, when about time is already listed). (Note: I've notified Paul G about this comment, in case he wants to respond.) - dcljr 22:06, 5 April 2007 (UTC)Reply

I would be quite happy to keep them all (Wiktionary is not paper). They are nicely hidden, and we might even get around to defining some of them one day. I am a bit miffed that he has beaten my list of defined terms at poly- (definitions in progress). SemperBlotto 22:23, 5 April 2007 (UTC)Reply
What I don't like about this is that it obscures the more critical words like timely in this huge list. I have previously suggested another section called Compound terms which would take phrases and compound words, those formed by simple concatenation of words with or without spaces, and leave Derived terms for the remaining words, those being words formed as blends and in particular with affixes. However, I'd like to hear what User:AutoFormat has to say about this since he or she likes to revert my edits and is clearly more knowledgeable on what would be best for Wiktionary with regard to this matter. DAVilla 07:38, 6 April 2007 (UTC)Reply
Sounds like a WT:VOTE is needed for "===Compound terms===" then? --Connel MacKenzie 06:07, 7 April 2007 (UTC)Reply
Does anyone have a better suggestion for what to name them or how to define the differences? A good test case might be vineyard. Should I bundle into the proposal that their priority placement is much lower than Related terms, even lower than Translations? If it's a 3-level header then it isn't dependent on part of speech. Is it dependent on etymology? DAVilla 21:29, 7 April 2007 (UTC)Reply
Compounds like at what time are completely transparent to fluent English speakers, but if you've ever studied other languages, you know that these are actually very idiomatic. The prepositions are mostly arbitrary. For someone learning English as a 2nd language, such constructs are not transparent at all. Now as for the bigger issue... I seem to be in the minority for being in favor of making lots of "/" subpages. If I were a supreme arbiter, I'd make a list of the most "important" derived terms, and below that, have a link to a subpage with the complete list of derived terms. :-) What does everyone think of that idea? Language Lover 13:56, 6 April 2007 (UTC)Reply
Subpages are NOT supported for this stuff, by the WM software. Don't use subpages for anything other than "Citations" (which has only rudimentary SW support.) --Connel MacKenzie 06:02, 7 April 2007 (UTC)Reply
Long pages are not bad, in and of themselves. --Connel MacKenzie 06:02, 7 April 2007 (UTC)Reply
Derived and compound terms should be dependent on etymology, yes; rush hour is certainly unrelated to the Old English rysc. -- Beobach972 19:41, 9 April 2007 (UTC)Reply
Well, yes, but currently, as derived terms, they depend on more than the etymology. Being level-four headers they would depend on the POS. This is deliberate and supported by Paul G. But I'm not the only one who has had difficulty in classifying them. At the same time, for those that are classified correctly, do we want to toss that differentiation out? I need to look at time again... DAVilla 20:38, 9 April 2007 (UTC)Reply
Yes, this can be confusing, e.g. timer is derived from the verb, not the noun. But at least it's clear where that one comes from, and that's a bad example because it really should be a derived term anyway. Paul G had brought up two examples with seal, I think, that even he wasn't sure of, but those cases are rare.
There are also some terms that include "time" but are not derived from it, such as counter-time. So I'm not entirely certain that Compound terms, even at level four, is appropriate as a header unless we were to clarify that they are also derived terms, or if we can accept that they may not be. I do think being able to extract timely from that list would help a lot. DAVilla 23:52, 9 April 2007 (UTC)Reply
While long pages aren't necessarily bad, they usually are anyway. Even though we aren't paper and we technically have the capacity for gigantic pages, they aren't generally easy to navigate or particularly useful beyond a certain size. 40k of non-prose text is HUGE, and I think that if anyone were to do a study on the readability of pages like time et al. it would be right down there with technical documents for lay persons...bad. We want to balance the inclusion of as much relevant information as we can stuff in there with cleanliness and readability. If we have everything anyone could ever want to know about a given term on a page, that is wonderful, but if no one is actually able to sift through the stuff that they could care less about to get to what they actually need, then what good have we done? I agree that that list should be cut down; we don't need every collocation and phrase ever written that includes "time" to be listed there, probably only idiomatic and other "interesting" terms. - [The]DaveRoss 20:50, 9 April 2007 (UTC)Reply
The problem is that they are all idiomatic, or they shouldn't be listed. DAVilla 21:03, 9 April 2007 (UTC)Reply
"Achilles tendon reflex time", "French Revolutionary Time", "QuickTime", "Hawaii-Aleutian time"... there is plenty in this list that doesn't belong: time zone names, random non-idiomatic phrases containing the word time. They are certainly not _all_ idiomatic. There are plenty there which should be on the page, but I guess what we are getting down to is that it is time for stricter criteria for "derived terms", "related terms" etc. sections, especially on the exceedingly large pages. - [The]DaveRoss 22:05, 9 April 2007 (UTC)Reply
Hmm... part of the problem is that it's impossible to tell from the list what deserves an entry and what should be removed. "Achilles tendon reflex time" = Achilles tendon + reflex time as far as I can tell, but the expression "a stitch in time saves nine" was removed! Plus it's difficult enough keeping the list alphabetized. Someone decided to list old as time itself under "A" with as old as time itself. What is this, a topical list??
I'm moving the red links to Wiktionary:Requested articles:English/time so that if anyone wants to argue their inclusion they can simply create the page. DAVilla 23:52, 9 April 2007 (UTC)Reply
Sounds like a good cleanup for this page, but I think a general discussion is called for regarding treatment of these sections. It is obvious that some delineation needs to be made, but where to draw the line? - [The]DaveRoss 00:01, 10 April 2007 (UTC)Reply
Long pages aren't bad, you say? I just spent over half an hour, probably more than an hour actually, going through the derived terms at time. All I was doing was correctly alphabetizing the list (per below), removing extra words like "the" and trailing <!- comments -> (per below) many of which I intended to move in creating the actual page later, and standardizing other comments like <!- a stitch [in time] saves nine ->. I pushed the wrong button at some point and the browser paged back, which 50% of the time means I lose all of my work. I lost all of my work. So if you want the list to be managed, have fun managing it yourself. I've already rolled back my move to WT:RA, and it's not my fucking problem any more. DAVilla 15:36, 20 April 2007 (UTC)Reply

Policy proposal

This policy is narrowly intended for pages with a great number of derived terms. However, it hashes out some specifics with regard to the Derived terms section in general, and may have implications on other such sections.

  1. The section is to be listed alphabetically. That means closely related words with different spellings—such as old-time and old times, or tact time and takt time—must be listed separately.
    Rationale: An ordering that is alphabetical does not necessarily coincide with one that is topical, even weakly so. Consider Taiwan time and old-timer, which would separate the above examples. Of the two incompatible orderings, only the first can be clearly defined in formal specification. It also has the advantage of being manageable by bot.
    Point of contention: It may be permissible to list on the same line terms that use the same letters but have different spacing or hyphenation, or use ligatures like æ=ae and ï=i which are conflated. However, note that these are not always synonymous, e.g. some time and sometime. Likewise summer time and mean time, as spaced, are systems of measuring time, while summertime and meantime are not.
  2. Only blue links are to be shown, with the exception of closely related words such as alternative spellings (which would be shown in the see-also at the top of a page and/or as alternative spellings in the language section) or inflected forms (where there are additional definitions, as would be shown in the see-also at top of a language section).
    Rationale: Red links are fine for giving an indication of what needs to be done, but an overwhelming number of red links is impossible to manage. To avoid removal, red links need comments if they do not appear idiomatic, such as short time and to time, or legal/medical terms. The term just-in-time is an example of one removed from time (by Paul G no less) perhaps because, lacking a comment, it did not appear idiomatic. On the other hand, these partial definitions, information that really belongs on the pages themselves, are not commonly removed after the page has been established. Furthermore there is no process for determining if the comments are correct, or if certain words are in fact idiomatic, other than the RFV process for entries themselves.
    While a sea of unverifiable red links does injustice to the page, and in my opinion more closely resembles requests for articles than a useful compilation, at the same time we cannot push requests off to another space when a closely related term exists in the Wiktionary. Doing so would be asking for a good number of pages that could be soft redirects, or very brief at the least, to be recreated from ground zero. This ties up the time of knowledgeable contributors in wikifying the page, finding the existing alternatives perhaps much later, and then having to coalesce the information. At the same time, not allowing these red links to remain on the page might suggest that there is one principal spelling and no alternatives to a term. While that may certainly be the case for many spellings, for spacing and hyphenation in particular there is a good variety even among the major English dictionaries.
  3. Terms that are listed among the derived terms of a derived term (especially one that is a string prefix; see below) should then be omitted from the page. For instance, space-time and time series are derived terms of time which themselves have a number of derived terms. Otherwise any blue link is acceptable.
    Rationale: Either this system or a more complete one is feasible, but this one is more elegant since anyone looking for e.g. time series analysis, time series data, time series database server, time series model, or time series prediction (assuming those are all idiomatic) would be just as inclined to follow a link to time series. At the same time, terms that are not derived terms of the derived term time series in this example, such as time series animation, should not and would not be excluded from the listing at time. Another such example is space-time trade-off.
    Point of contention: Since words that are not string prefixes of the derived term, such as anti de Sitter spacetime, are alphabetized differently, could they also be included as a derived term of e.g. de Sitter, Sitter, space, and time? Presumably not of de?
    Point of contention: Are blue links unquestionable if they are redirects to other pages? One example is take time to smell the roses, which redirects to stop and smell the roses. In that case it is not possible to link to the primary title as a derived term. There are other cases where both could be listed, e.g. have a whale of a time and its redirect whale of a time. Should they both be?
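Because points 1 and the first point of contention are mechanical, a bot could apply them. A minimal sketch, assuming a hypothetical normalisation that lowercases, expands æ to ae, strips diacritics (so ï becomes i), and ignores spaces and hyphens: terms sharing a key go on one line, everything else stays separate. Grouping only reflects shared letters, not synonymy (some time vs. sometime), and real Wiktionary collation may differ.

```python
import unicodedata


def letters_key(term: str) -> str:
    """Hypothetical key: lowercase, æ -> ae, strip diacritics, drop spaces/hyphens."""
    t = term.lower().replace("æ", "ae")
    t = unicodedata.normalize("NFD", t)
    t = "".join(c for c in t if not unicodedata.combining(c))  # ï -> i
    return "".join(c for c in t if c not in " -")


def group_same_letters(terms):
    """Sort alphabetically by key and put terms with identical keys on one line."""
    groups = {}
    for term in sorted(terms, key=lambda s: (letters_key(s), s)):
        groups.setdefault(letters_key(term), []).append(term)
    return list(groups.values())  # dicts keep insertion (sorted) order
```

For example, `group_same_letters(["sometime", "some time", "old times", "old-time"])` keeps old-time and old times on separate lines, as point 1 requires, while some time and sometime share a line.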

DAVilla 05:38, 10 April 2007 (UTC)Reply

Stupid question: when you say "may not", do you mean "might not", or "must not"? ("these partial definitions […] may not be removed") Ordinarily they're distinguishable from context, but that sentence is kind of confusing me. —RuakhTALK 05:45, 10 April 2007 (UTC)Reply
Might. Not dumb, thanks for pointing it out. DAVilla 13:25, 10 April 2007 (UTC)Reply
I disagree strongly with the removal of red links. They are our friend, and tell us what terms are still to be defined. (Some of us actually define words here.) SemperBlotto 07:35, 10 April 2007 (UTC)Reply
I want to agree with you, but if we are to find another solution then please acknowledge that not all red links imply the term is needed. Some of them should simply never be defined. The more questionable include "former + times", "Old + Father Time", "time-and-motion + expert", and "waste of time", and then there are the musical meters (now there's a can of worms). Longer phrases like "at the present time" and "this is no time for" might be better as shorter ones like present time and be no time for. And you can't know whether all of them, possibly shortest remaining time and worst-case execution time for instance, are idiomatic until you look them up. I wouldn't have known man time was tosh™ until I saw the definition "a man's bowel movement". What would you say if I added rotation time as a derived term? Considering you've deleted the page before, I would hope that means you would be willing to remove it from the list. You've also deleted preposition of time, stoppage time, and even time limit, presumably for content one would hope? I suppose "at" is too succinct a definition of the first.
The hedges that grow on time are possibly some of the most laboriously trimmed. Do note that I added a number myself, of already existing entries, but I also seem to be the only one using clippers (May, Sept 2006). If you want to keep all of those links, please propose a system for keeping track of what is or is not worthy of inclusion. DAVilla 13:25, 10 April 2007 (UTC)Reply
I also disagree with the removal of red links. A cleanup of the section in which those appear is the proper way to go, laborious though it may be. Red links show us what needs to be done, but at the same time, if you see a red link that you think should not be defined, and the page has no comment regarding some idiomatic meaning, removing it is probably less time-consuming than actually creating the page and defining the term. H. (talk) 15:19, 10 April 2007 (UTC)Reply
I agree that we shouldn't be basing any sorts of content decisions on the "redness" of a link; whether or not we currently define a term doesn't hold water when deciding its relevance in this case. That still leaves us with the decision of how to choose what does and does not merit listing in a given headword's "derivatives" section, and there is no obvious set of rules for that.
I like the idea of second-tier derivations being pushed onto the first-tier derivation's pages (space-time continuum on space-time but not on space or time). How we should organize them... well, I suppose that comes down to what we think they are actually used for. I am not exactly sure what the purpose of these sections is, but the purpose should define the form. - [The]DaveRoss 20:19, 10 April 2007 (UTC)Reply
2B. Alternate proposal to #2. Derived terms are not to be <!-commented-> with definitions, context, or any other information specific to a term. Any red link can be removed by any contributor to the requested articles page indefinitely if he or she has any reasonable (if uninformed) doubt of the term's idiomatic status. If any of the terms are recent additions by non-regulars, the edit should be so commented, e.g. "indefinite removal to RA per DT policy".
Conduct: This provision shall not be abused. Contributors are advised to perform a simple search of any terms that appear to be jargon before deciding on them. Deletions can be rolled back if the contributor is not familiar with the RFD process or does not make a good-faith effort to abide by existing standards, as would likely be indicated by a removal of red links en masse. However, deletions cannot be rolled back simply because the contributor was wrong. Subjective opinion is allowed, and individual removals are not to be questioned. If the term has idiomatic status, the page can simply be created before a term in the list is reinstated.
At the same time, other contributors are not required to check the history before adding derived terms. While they are instructed not to reinstate terms they feel were removed incorrectly until that page exists, they are not liable for accidentally reinstating derived terms that have been previously removed, for instance one added by another contributor formerly and included in a long list of new additions.
Summary: En masse additions are okay. En masse deletions in the general case are not. Individual deletions are okay, and should not be reviewed unless the contributor intends to turn the links blue. Essentially this gives all contributors veto power on any term. However, this is a weak power since any link can be reinstated by simply creating the page.
Rationale: This proposed policy allows for a large number of red links and at the same time avoids vilifying the targeting of red links by those who are willing to tidy a page, to remove links that could never be blue. More importantly it avoids the need for commenting Derived terms. Comments are not visible to the outside world and are a waste of our time. DAVilla 10:32, 11 April 2007 (UTC)Reply
While I like the spirit of this option, I question the functionality. One of the more annoying things about editing lists of red and blue links is that while you are editing you can't tell what is what. If we have large lists of variously commented terms in these sections they will quickly become difficult to edit and control. Is there some way we can prevent that from happening? - [The]DaveRoss 20:40, 11 April 2007 (UTC)Reply
I don't understand. Why do you think the terms would be "variously commented"? DAVilla 23:33, 11 April 2007 (UTC)Reply

Inclusion of derived terms

Wow, I'm surprised that my contributions to time have provoked so much comment. I'd like to add some of my own.

"Time" is, apparently, the commonest (clean) four-letter word in the English language, according to a question on The Weakest Link (they gave a source - I don't remember what it was, though). A large number of the uses of this word are, no doubt, in idiomatic phrases, and so, necessarily, the list is long.

The derived terms I have been adding to "time" and elsewhere are compiled from various print and online dictionaries (onelook.com is very useful in this regard, given that it allows for the use of regular expressions in searches). Many (or most) of the terms that I find I am unfamiliar with. Some are obscure or dubious. I prefer to err on the side of inclusion, figuring that if the terms linked to are not idiomatic or do not exist, they will be removed, but if I leave them out and they are worthy of inclusion, no one else might ever enter them. That is not to say I have entered everything I could find - there is plenty that was, to my mind, not idiomatic or too obscure that I therefore left out.

The derived terms for "time" took a very long time to compile and verify, needless to say, but they are there to be edited, so by all means whittle away any terms that fail CFI. However, note that many of these terms are in the OED with citations, or in reputable online sources. Terms that appear to be unidiomatic might in fact be idiomatic. I suggest checking the OED, other print dictionaries and onelook to confirm one way or the other before entries are deleted from the list. (Inclusion in any of these sources doesn't necessarily mean that a term passes Wiktionary's CFI, of course.)

All the terms for time zones that I found (mainly in Wikipedia) have been included. It's debatable whether these should be in. Some print dictionaries give "Greenwich Mean Time", so why not the others? The list of these is finite and fairly short. Again, delete if these don't pass CFI, but my thinking is that they do (all or most have Wikipedia entries).

Technical (including medical) terms certainly do belong in Wiktionary if they pass CFI. In fact, they are more likely to do so, as they often appear in print in journals and other scientific publications.

I have tended to list terms B derived from terms A derived from "time" under B rather than under A itself. For example, "a stitch in time saves nine" comes under "in time", I believe, with a comment to that effect. I think the "derived terms of derived terms" system is cleaner, but this might make it harder for users to find terms or make them think that terms have been overlooked. (Incidentally, this is why "just-in-time" has been removed from the derived terms: you'll find it under just in time, which is the phrase from which "just-in-time" is derived.) If there are inconsistencies (such as "Achilles tendon reflex time"), then please fix these.

In short, the list of terms derived from "time" is not set in stone. None of us are infallible experts on everything, so please edit anything that I have not got right, and if I might be so bold as to ask, possibly be grateful that I researched and entered these hundreds of terms? — Paul G 09:23, 11 April 2007 (UTC)Reply

On the whole it's a good list, yes. I have no doubt that most of the terms, nearly all in fact, should be included. I will revert my change shortly. DAVilla 10:37, 11 April 2007 (UTC)Reply

Wiktionary:About Ancient Greek

I realize this has been a while coming, but I feel that I've finally gotten this page to a point where it's ready to be accepted as official Wiktionary policy. Will everyone who has any interest in the state of Ancient Greek on Wiktionary please take a look. I've recently made a few minor changes to the page, in preparation for this. In particular, the Pronunciation & Romanization section has been updated. Unless something major comes up, it is my intention to start a vote in a week or so to make it official policy. Please, if anyone has any problems with the page (or is considering having problems), please bring them up now, before the vote. Thanks very much. Atelaes 04:12, 6 April 2007 (UTC)Reply

I think the policies/guidelines there are great, but much of the page seems intended to inform the reader about Ancient Greek (especially the "Diacritics & Accentuation" section); I think that that information is fascinating and should be kept somewhere, but probably not at Wiktionary:About Ancient Greek. (Maybe it could be put at an Appendix:Ancient Greek or the like?) To a lesser extent, I don't think that Wiktionary:About Ancient Greek should duplicate as much of WT:ELE as it currently does; I really think Wiktionary:About Ancient Greek should simply tell people-who-understand-Ancient-Greek-and-have-read-WT:ELE the Wiktionary policies that are specific to Ancient Greek: that is, the specific things they'll need to know in order to contribute to entries on Ancient Greek words.
That said, I do have one minor policy/guideline quibble; I think primary-source attestations should go in unordered lists after the senses they correspond to, or in "Quotations" sections, or in /Citations subpages, like at entries for words in other languages. (My personal preference is for unordered lists in each sense, but WT:ELE says that there's no consensus yet.) I don't see what benefit there is in giving these in the "References" section.
RuakhTALK 05:09, 6 April 2007 (UTC)Reply
Concerning the excessive information in the diacritics & accentuation section, I tend to agree. However, I was ordered to write that section (at gunpoint, I might add). Perhaps it should be trimmed down somewhat. As for the primary sources in the references section, I feel that to be somewhat of a shortcut, for the time-being. Writing citations for the Ancient Greek entries is incredibly time-consuming, and I don't think it will happen much in the immediate future, although ultimately they should all get some. For an example of what all goes into them, take a look at θεῖον. I really don't like the convention (that a few people have tried) of simply scattering the sources throughout the definitions, as I think it's rather unhelpful and makes the entry look messy. Putting these sources in the references section provides a quick and easy (and temporary) way to reference the words. Atelaes 05:38, 6 April 2007 (UTC)Reply
What's the difference between a gloss and a translation? Am I to understand that the gloss is in an original somewhere? If so it isn't cited as to which version it comes from, and it needs to be to give credit. If you like your translation better then why have the gloss at all? By the way, does the translation belong in italics or not? DAVilla 07:46, 6 April 2007 (UTC)Reply
The difference between the gloss and the translation is that the gloss retains more of the original language, at the expense of English. It doesn't come out terribly well in these two passages, admittedly. A gloss is not an authoritative version, by any means. Rather, it is an attempt at as simplistic a translation as possible, which follows the word order, grammatical structures, etc. of the original. The translation is meant to feel like real English, but this often requires a bit more freedom with the language of the original. Its main benefit is to allow people who actually have some handle on the language to see an intermediate step between the original and the translation. Atelaes 07:55, 6 April 2007 (UTC)
By the way, I'm not sure how to reconcile "The normal standard for modern languages is three independent attestations. However, Ancient Greek, as a dead language, requires only one attestation." with WT:CFI. I didn't think language considerations pages could override CFI? Or is the thinking here simply that all surviving Ancient Greek manuscripts can be considered "well-known works"? —RuakhTALK 07:44, 6 April 2007 (UTC)
Yeah, I was expecting to get more flak on that when I first proposed it, but no one said anything. It's certainly open to debate, but I think that one citation should be the norm for all dead languages because they're not subject to the same flux that living languages are. And, yes, I would say that all Ancient Greek works would count as well-known works, at least within a certain context. Atelaes 07:55, 6 April 2007 (UTC)
I think it is OK for an "About Language" page to differ from both the ELE and the CFI. However, those differences should be clearly spelled out, and each About Language page must be voted in as policy. There are enough oddities and special cases in various languages that we can never hope to have a concise ELE or CFI document if we try to incorporate them all into those two primary documents. --EncycloPetey 22:27, 6 April 2007 (UTC)

I thought I'd explain my motivations for the most recent changes to θεῖον. First, I really hate ELchar. I really have no idea why it does this, but on my browser it puts all the characters into this weird loopy font that just looks ridiculous. Polytonic does not do this for me. My hope is that polytonic is allowing people to see just as many characters as ELchar is. Any feedback on this? Are people seeing more or fewer characters with the template switch? I see them all completely, regardless of font templates. Second, I changed the indentation, because I think it rather important that the words in the three lines (most especially the original and the gloss) are in line with each other as much as possible. Responses? Atelaes 08:19, 8 April 2007 (UTC)

Either template looks fine on my screen. In fact ELchar is a little straighter and the present one more curvy, but not "loopy" or anything. But it needs to be one or the other, or I can't read it... rather, it doesn't show; I can't read it regardless.
I really have to say, Ruakh, that I don't like the new look. "Original" and "translation" are just unnecessary, and the word "gloss" is confusing. The only reason I knew it wasn't an annotation in the original text is the source itself. You know, the Bible is rather ancient and all. But in a modern work that's what "gloss" would mean to me. As to indentation, will there ever be a need to preserve a translation that was in the original work? I would think placing them at the same indentation should be preserved for that. Or maybe it would be enough to put our own words in italics. Or maybe we really ought to do both. I don't know if this has ever been discussed. DAVilla 00:26, 12 April 2007 (UTC)
That's O.K., I don't like it that much either. My preferred versions are the first two I did (http://en.wiktionary.org/w/index.php?title=%CE%B8%CE%B5%E1%BF%96%CE%BF%CE%BD&oldid=2284749 and http://en.wiktionary.org/w/index.php?title=%CE%B8%CE%B5%E1%BF%96%CE%BF%CE%BD&oldid=2290015 — they differ only in indentation levels, with one putting the translation on par with the original, the other indenting it less than the original and more than the gloss); the last one was just an attempt to line them up nicely, as Atelaes prefers. (Seeing as he actually understands Ancient Greek, I think it makes sense to trust his instincts.)
It's actually pretty standard to use the term gloss to refer to a pseudo-translation that maps each word in the original text to a word or phrase in the target language, sometimes with annotations like "-DATIVE" and whatnot. If you can suggest an alternative word (or short phrase) to use, though, I'd be O.K. with that.
You know, rather than give a separate gloss, we could do something like this:
γένος οὖν ὑπάρχοντες τοῦ θεοῦ οὐκ ὀφείλομεν νομίζειν […]
(Most of those are probably wrong, but you get the idea.)
It would probably be a lot of effort, though. :-/
RuakhTALK 01:44, 12 April 2007 (UTC)
That is an interesting idea, but would indeed be a lot of work to implement on a regular basis. Also, I don't know how many users would get the idea, unless we had a little box saying, "scroll over text to see gloss" or something. I've fixed it, by the way. And yes, DAVilla, this is virgin territory which I've never seen a discussion on, nor have I seen anything of the sort anywhere else on Wiktionary. We just might want to start a new discussion just on this, as we might benefit from the opinions and technical expertise of others. Unless I'm sorely mistaken, this will be setting a precedent for all other languages, as I have to imagine that we (eventually) want to have citations for our foreign language entries as well as our English entries. Atelaes 04:14, 12 April 2007 (UTC)

Placenames redux

Since the discussion earlier on this page petered out without a resolution, I want to bring this up again. We have a considerable number of placenames that seem to contravene WT:CFI#Names of actual people, places, and things, which says "A name should be included if it is used attributively, with a widely understood meaning." and "A name should be included if it has become a generic term." In essence, a placename still needs to meet the same attestation standards as any other term, since this is still a dictionary, not a placename database or encyclopedia. We should only include placenames with some significance towards our goal of defining words, not collecting geographical data. However, I seem to be able to find a large number of placenames that cannot, in my opinion, be attested as generic or attributive. Consider Alagoas, Maceió, Abilene, Afula, Aeolian Islands, Lipari Islands, Ahmedabad, Aegadian Islands, Adyghe Autonomous Oblast, Adigoppula, (yes, these are just grabbed at random from the first page of the proper noun category) Bursledon, Titchfield, Tula, Thousand Oaks, etc. As you can imagine, there are a lot more. Not just the 50 in and 100 in , but the many hundreds more in . Clearly I can't just go on a deletion spree, can I? (It would take forever!) The main problem is that even if some of these have generic or attributive senses, and some, though not most, do, almost all of them are "defined" in the form of "A town in Oaxaca, Mexico." (Juchitán) That, to me, is an encyclopedia article (if a stubbish one), not a definition of the word. So, what to do about these? I don't think they are adding much to the dictionary; this isn't Wikipedia. Frankly, I'd like to see most of them gone: all the ones that cannot be attested according to the standards currently in WT:CFI. Is there an efficient way to do this that doesn't involve hundreds of RFD listings, or, ideally, people violently disagreeing? Dmcdevit 09:03, 6 April 2007 (UTC)

I, for one, am fully supportive of such a deletion spree, although I imagine others might disagree. I think the CFI paragraph you quoted is quite clear on this, and should be followed. I suppose somewhat major placenames (at the deleting admin's discretion) should be placed under RFV, so that people are allowed the chance to save such words, if they care to try. But Juchitán should just go, as far as I'm concerned. Atelaes 09:08, 6 April 2007 (UTC)
On the other hand, what place name entries can do here that they can't do at Wikipedia is provide translations. If I want to know, or inform others, what the Aeolian Islands are called in Yiddish, or Turkish, or Swahili, the place for that is Wiktionary. Wikipedia is willing to provide the name in the local language (in this case Italian), and the interlanguage links work for those languages that have an article, but not all other languages do have an article. Wikipedia does have lists of translations of place names, to be sure, but most of them have been nominated for deletion at one time or another on the grounds that the information there is more appropriate for Wiktionary than for Wikipedia. Angr 09:31, 6 April 2007 (UTC)
That's Wikipedia's problem though, not ours. Which is to say, that is fallacious logic: just because we can do something that another project doesn't does not make it dictionary-appropriate. Wikipedia doesn't give translations (transliterations) of all of its Latin-script people either, which is another tens of thousands of entries we could add (or phonebook entries or restaurant reviews, for that matter). But if Juchitán doesn't belong in a dictionary, neither does Juchitán in Hawaiian, if there were such a word. You might get away with sticking a compendium of placename translations in an appendix, but I still don't think they belong as articles. Dmcdevit 22:19, 6 April 2007 (UTC)
Juchitán gets 675 Google Books hits (http://books.google.com/books?hl=en&q=Juchit%C3%A1n&btnG=Google+Search&ie=UTF-8&oe=UTF-8&um=1&sa=N&tab=wp). I'll bet you a thousand dollars that at least three of those are uses that I would consider "attributive" (but don't hold me to it right this moment; I'm going to be incommunicado on vacation until mid-next week). Cheers! bd2412 T 06:15, 7 April 2007 (UTC)
I was about to put it on RFV and say "prove it," but I'll just wait then. :) Part of the problem is that "attributive", as noted in a discussion somewhere earlier on this page, is a bit ambiguous. I think it's clear it's intended to mean having a specific meaning that describes something other than simply being in or from the place. So what's needed is three cites of Juchitán being used in the same sense, as in (totally made up) "a Juchitán pizza" or "a Juchitán sandwich" meaning "a pizza with fish" or "a sandwich with fish". "A Juchitán pizza" just meaning a pizza made in Juchitán is not the spirit of the criterion, since any placename can be used to modify in that way. The problem with the current entry is that even if there is such an attributive use, it is certainly not the one defined, which gives encyclopedic data about the city's location. Dmcdevit 06:42, 7 April 2007 (UTC)
Saying "That's Wikipedia's problem though, not ours" shows precisely what Wiktionary's problem is. Wiktionary and Wikipedia are complementary sister projects, not two completely unrelated websites. Dictionaries, not encyclopedias, provide translations of individual words. Angr 11:06, 7 April 2007 (UTC)
Wikipedia does not exist in a vacuum, but Wikipedians tend to behave as if it does. They have no one but themselves to blame for their reputation. Perhaps if Wikipedians were inclined to cooperate, they'd find sister projects more willing to help where they can.
All that aside, it is a problem for Wikipedia, and not for Wiktionary. It is an encyclopedic concern; demographic statistics are the useful criteria, and those should not be in a dictionary. Including demographics has certainly met fierce resistance in the past. --Connel MacKenzie 20:15, 7 April 2007 (UTC)
In my experience, it is Wiktionarians who behave as if Wiktionary exists in a vacuum, free from all established norms of lexicography. Translations for place names are not demographic statistics, and they are not encyclopedic. Where they belong is in a dictionary. Angr 08:45, 8 April 2007 (UTC)
The problem is not translations of place names in general, since many are validly included, but translations of place names that should not be included according to CFI. I don't see how translations of words deemed inappropriate for a dictionary are appropriate for a dictionary. Dmcdevit 22:35, 9 April 2007 (UTC)
The fact that "any placename can be used to modify in that way" does not detract from placenames being words; it enhances it. Consider it this way: a reader of a passage describing a Juchitán pizza may not be able to tell from the context whether this is a pizza from a particular place, or just a particular kind of pizza. The CFI exists, I think, not just to tell us that certain words are the kind that fit in a dictionary, but to tell us that words need to have a certain level of use before we should worry about potential readers coming across them and having the need to look them up in a dictionary. bd2412 T 23:23, 9 April 2007 (UTC)
No. Do not go on a deletion spree. Many of us think that ALL placenames deserve an entry here. By all means have a vote though, but make it a simpler vote than the one that I started. I would suggest something like "should we allow entries for all placenames, or should there be some other criteria (which criteria would be the subject of a second vote)". SemperBlotto 11:08, 6 April 2007 (UTC)
I would instead suggest separate votes on each criterion proposed, just not lumped together into one vote. That would give each enough time for debate, without as much crossover. --Connel MacKenzie 05:56, 7 April 2007 (UTC)
But vote on what? I don't disagree with CFI. Does anyone have a proposal to change it? And what does "should we allow entries for all placenames, or should there be some other criteria" even mean? We do allow “all words in all languages”, as long as they meet our guidelines for proof of being true words in common use. Similarly, we allow all placenames already, as long as they meet our guidelines for proof of being true words in common use. If you are suggesting that we include placenames that don't need such proof, I don't think that is tenable. All words need to pass attestation; why give carte blanche to placenames, of all categories of words? I just named the floor under my bed Pirate's Alley. It doesn't need to be attested? We need to think of some solution for this proliferation of encyclopedic entries with no attempt at giving attestable definitions. Dmcdevit 22:19, 6 April 2007 (UTC)
WT:CFI#Names of actual people, places, and things says "A name should be included if it is used attributively, with a widely understood meaning" and "A name should be included if it has become a generic term", but nowhere does it say only under those conditions. You want place names to be attested? That's easy: "This is the smallest and western-most of the inhabited Aeolian Islands and lies about 67 miles from Milazzo" is an attestation for the words Aeolian Islands and Milazzo. We just have to find two more such attestations, and both have passed the CFI. Angr 11:06, 7 April 2007 (UTC)
Yes, CFI is quite clear on this, but nobody follows it. I couldn't even get the entry for Narita International Airport deleted. We should probably hold a vote on this to change CFI. This sort of inconsistency is silly. Cynewulf 15:17, 6 April 2007 (UTC)
The CFI should not state that this is the policy if it is not followed. Someone needs to be bold enough to alter it to say that attribution is only one suggested criterion, and that more definitive criteria are under debate. DAVilla 16:09, 7 April 2007 (UTC)
As I write each time this is mentioned, many placenames are among the oldest and most interesting words we use (see what I have written before for examples). They can also be confusing to people who do not know a language well -- if I cannot understand a sentence in German containing the noun Köln, say Heute, mehr als dreißig Jahre danach, sind viele Bauten von europäischem Rang in Köln immer noch nicht wiederhergestellt, why should my life be made more difficult because Köln is banned from the dictionary? Well, OK, I knew that Köln translated as Cologne, which is where you turn right to get to Austria, but substitute Leverkusen, Hundhausen, or another small town in the area, and I would have been stumped. I can't imagine why placenames should be subject to a tougher CFI than other words. Also, the attribution rule seems particularly odd. It seems to suggest that Cologne would get in because of Eau de Cologne but Köln would not because Kölnerwasser is merely a compound word. --Enginear 18:00, 7 April 2007 (UTC)
Yes, but how could Wiktionary help you in the above-mentioned circumstance? Say you looked up Leverkusen and discovered that it was a town in Germany. Well then... what? From the context, it is apparent that Leverkusen must be a placename; I am not certain how confirming that it was would do you much good. If you wanted to know where, specifically, it was, you should have consulted a map anyway, and not a dictionary. (What's that Wikimapia site, anyway?) -- Beobach972 00:12, 8 April 2007 (UTC)
Besides, Wikipedia would have an entry on Leverkusen, with the German and English names. Why should Wiktionary have an entry for it, so you could look up the Russian translation? Well, if you're using Russian-language directions to try to navigate around Germany, you have (and I say this all in humour) some nerve coming to the English-language Wiktionary and expecting it to help you! :-D -- Beobach972 00:12, 8 April 2007 (UTC)
I think Kölnerwasser would get in because it is not water from Köln; Köln is used figuratively there, so I think Köln would get in, too. -- Beobach972 00:14, 8 April 2007 (UTC)
I am, as I have surely stated before, in favour of deleting non-attributive placenames — or modifying our CFI to allow them (the Appendix idea is good, but in action it could become unwieldy and large). As it stands now, they should be deleted. -- Beobach972 00:12, 8 April 2007 (UTC)
CFI does not exclude them. It says place names used attributively should be included; it does not say place names not used attributively should be deleted. Angr 08:48, 8 April 2007 (UTC)
It says very clearly that they should not be included. What do you think that means?
I have said before that I do not agree with CFI on this point, but I'm not going to try to change it through re-interpretation alone. DAVilla 20:03, 9 April 2007 (UTC)

Amoy

What are you doing? This isn't going to work. We define languages down to the ISO 639 code level for a reason, and treat variations/dialects within them within the section. We need the Min Nan section for everything that isn't Amoy anyway. (Chaozhou might reasonably be nan-ch.) We can't do this for the same reason we can't have "British" as a language header. Please stop. Was this discussed anywhere? Robert Ullmann 12:41, 6 April 2007 (UTC)

This was not discussed anywhere because I didn't think anyone cared. However, I did post an explanation of my reasoning at Category talk:Min Nan. In brief, it is not comparable to putting British instead of English. As I explain in my post, the prestige dialect of Min Nan is widely considered to be Amoy (Xiamen dialect). Therefore, I originally thought it no problem to label entries as Min Nan. However, this becomes problematic if we ever want to create separate entries for other Min Nan dialects, which is already happening with the case of Teochew. The most likely scenario would be Teochew, since there are a large number of Teochew speakers living in Western countries. Teochew is part of the Min Nan language family as well, but is only 50.4% mutually intelligible with Amoy.[11] Since Amoy is a well-established name for the language/dialect spoken in Quanzhou, Xiamen, Zhangzhou, Taiwan (known there as Taiwanese), and Southeast Asia (known there as Hokkien), it seems like the best choice. The language code can still remain as nan. If we ever need to create a separate language code for Teochew, we can do something like nan-CN-44 (per ISO 3166-2). I will give you some time to digest this. I realize it's all of a sudden. I honestly didn't expect anybody to know or care since I'm the only one that has ever created entries in Amoy Min Nan on Wiktionary (with the exception of maybe one or two words). I look forward to your response. -- A-cai 12:53, 6 April 2007 (UTC)
Indeed, way too sudden, and other people do care. Recall that we had some serious discussion on BP about the use of Mandarin v Mandarin Chinese, and whether Chinese should be subdivided at all. Renaming Min Nan (the ISO standard name) to Amoy without even mentioning it on BP is not good (note that we discussed renaming "Scots Gaelic" to "Scottish Gaelic" for a while). The headers should almost certainly be Min Nan and Teochew. (And note that we are not using 3166-based code variants; they are deprecated. We, and ISO 639, code the languages, not the countries.) We have exactly one standard language header for each code. Finally, note that using the name of the "prestige dialect" might be considered serious POV. This has to be discussed on BP. Robert Ullmann 13:09, 6 April 2007 (UTC)
I have posted the above from my talk page in deference to Robert's wishes. If anyone else has opinions on the subject, please let us hear from you. -- A-cai 13:15, 6 April 2007 (UTC)
  • We have exactly one standard language header for each code
This is the crux of the problem. There are instances in which two languages/dialects are not mutually intelligible, but are assigned the same language code. This is such a case. Essentially, I'm looking for a solution. -- A-cai 13:22, 6 April 2007 (UTC)
Some background reading:
Enjoy. -- A-cai 13:29, 6 April 2007 (UTC)
Your first link doesn't really support you, as it says there are only two Min languages, North and South. (I take it the "Nan" in "Min-Nan" is the same as in "Nanjing", i.e. "South"? So would the other one be "Min-Bei"?) —RuakhTALK 14:01, 6 April 2007 (UTC)

Way over my head here since I've never studied Chinese. I read somewhere, though, that while the dialects are mutually unintelligible, the writing systems are mutually intelligible. Is that true, or is it a load of crap? If it were true, then except for specific idioms and such, we could just make "Chinese" entries but include lists of pronunciations in different dialects... You Chinese speakers totally rock, someday I will join you!!! :D Language Lover 13:48, 6 April 2007 (UTC)

Thanks for your response. Not true, this is a myth. Most Chinese are in fact bilingual, meaning they usually speak Mandarin and one other dialect. Since Mandarin is well established as the official language in most Chinese speaking countries, Mandarin has become the de facto written lingua franca. However, if one were to write one of the other Chinese languages/dialects in Chinese characters, it would generally be incomprehensible to a Mandarin speaker. For an illustration of my point, take a look at the right hand column in Appendix:Sino-Tibetan Swadesh lists, note the variety. -- A-cai 13:57, 6 April 2007 (UTC)

I don't know enough to comment on the specifics of Chinese, but on a general point I think that equating our language headers with ISO 639 isn't necessarily ideal. Its use of separate codes for Middle and modern English encourages us to separate them, which I've argued above is not helpful; whereas languages like Jèrriais and Guernésiais, which are certainly distinct languages, are lumped together (with much else) under ‘Romance: Other’. In this case, then, A-cai seems pretty persuasive to me. Widsith 14:04, 6 April 2007 (UTC)

In response to Ruakh, Min is actually a family of language families (read further down in the first article; I agree it's misleading) which includes Southern Min, Eastern Min, Central Min and Northern Min (based on their geographical location with respect to the Min river in Fujian). Southern Min contains four distinct strains: Amoy, Teochew, Qiongwen and Zhejiang Min Nan. None of these are mutually intelligible. Similarly, Eastern Min also has several mutually unintelligible dialects (Fuzhou dialect being the prestige dialect in that case). Amoy is known as Taiwanese in Taiwan, and Hokkien in Southeast Asia. Obviously, Taiwanese is inappropriate as a language header, because it leaves out the speakers not from Taiwan. Hokkien means Fujian. Like Min Nan, it is popularly identified with Amoy. However, since other languages/dialects are also spoken in Fujian besides Amoy, it doesn't seem appropriate either. I think Amoy is appropriate because it refers to the place of origin of this form of speech (similar to how English is a reference to England). -- A-cai 14:11, 6 April 2007 (UTC)
I also don’t know enough to reasonably contribute here. Let’s hope that in future versions, the ‘unintelligible dialects’ will be recognised as languages and get their own code. In the meantime, we have to think of something, since I think A-cai’s arguments hold ground (is that English?). Silly enough, the reverse occurs too, although that is much less of a problem: Vlaams is recognised as a separate language from Dutch, but I certainly do understand most of it if I make an effort, and almost no one from that region is going to treat his language differently from Dutch. They have a vls:Wikipedia, though.
I think we should trust the judgement of the most knowledgeable persons here. H. (talk) 16:26, 6 April 2007 (UTC)
No, Vlaams is not recognized as a separate language, it merely has an ISO code. It is very important to realize that the SIL does give codes to major dialects as well as to languages and "super languages". The existence of an ISO code should not be taken necessarily to mean that it represents a distinct language. Interestingly, the article about West-Vlaams (on the vls Wikipedia) defines West-Vlaams as "a dialect group in Dutch" ("de meest zuudwestelyke dialectgroep van et Nederlands"). So the Wikipedia in that "language" doesn't even define itself as a language. --EncycloPetey 22:24, 6 April 2007 (UTC)
I think part of the problem is that ISO 639 is fairly detailed with respect to Western languages, but falls down on the job with respect to lesser known languages, especially Chinese dialects. I think the nature of the problem is not fully understood by the average person in Asia either. This is partially a result of the promotion of Standard Mandarin as the official language. Most people here will not be able to read the following link:
However, I would like to post it so that it is part of the record. It is a discussion about what to call the language that I'm proposing we call Amoy (based on a history of usage that actually predates the use of the term Min Nan). Various people from Amoy speaking areas (Singapore, Taiwan, Malaysia, PRC) have posted their opinions in both Mandarin (some in simplified script and some in traditional script) and Amoy (some in Chinese characters, some in Romanized script). I wish I had time to sit down and translate the whole thing for you, but it would take way too long. In short, some of the posters did feel that the term Min Nan is too broad to be useful. Min Nan is an academic term that describes a group of languages/dialects spoken by people who originally came from Southern Fujian. In that sense, it is a legitimate label. However, it is not useful as a label to describe a single language that is mutually comprehensible to all of its speakers. To put it in terms that a western audience will understand, saying that Amoy and Teochew are the same language by virtue of the fact that they both belong to Min Nan would be akin to saying that Spanish and French are the same language by virtue of the fact that they both belong to Romance. I should mention also that Chinese dialects do all have one thing in common; they don't generally distinguish between plural and singular. In other words, the Chinese word for Min Nan may be interpreted as either Min Nan language or Min Nan languages. The way you translate it would depend on the context. If you're talking about Min Nan in the context of one specific dialect such as Amoy, then it would be Min Nan language. If you are talking about all of the varieties of Min Nan, then it would be Min Nan languages.
In summary, does anyone have objections if I continue with my work? Robert, are you satisfied with the discussion? Do you still have any concerns? -- A-cai 23:26, 6 April 2007 (UTC)
Part of the reason we follow ISO-639 is so that we have someplace to defer these ridiculous "splitter" debates to. It is not within our remit to make these decisions. No ISO 639? No heading. (You'll note that I lost the "Chinese" vs. "Mandarin/Min Nan/..." debate, based only on the argument that ISO 639 gives them codes - even while those language names are not recognized by the broad majority of English speakers. If this were a reasonable Wiktionary, we'd call them all "Chinese", precisely as they are called in English.) --Connel MacKenzie 05:48, 7 April 2007 (UTC)
Connel, I can tell that you're against the idea. But what I don't know is whether your response is a knee-jerk reaction or whether you've taken the time to actually read all of the info I posted above. My responses to all of your concerns are already up there. I'm sorry that Sinitic languages are not cooperating with ISO 639 standards. You make ISO 639 sound like a well established standard that has been around for years. In fact, ISO 639 was first published in 2002, and has continually undergone revisions since then (the most recent being February 2007 with the publishing of ISO 639-3:2007). Do we really want to put all of our eggs in that basket at this point? I'm trying to dispel myths and avoid confusion. Sometimes, a square peg just won't fit into a round hole. I realize that remarks are often taken the wrong way in BP, but I feel like I'm being ordered to comply with some arbitrary regulation! I've practically single-handedly built up our inventory of Amoy Min Nan words from scratch. Frankly, I think that entitles me to more of an opinion than the rest of you about how to format the entries. Is that wrong of me? -- A-cai 06:55, 7 April 2007 (UTC)
Ok, maybe I was a little too forceful in my last post. Let me try a different approach. Take a look at the translation section for the word child. I have reformatted the Chinese section in a way that I think makes sense, based on my experience with this. Let me make it clear. We are not talking about synonyms within one language called Chinese. As I stated before, both Teochew and Amoy belong to the Min Nan group. However, the Amoy word and the Teochew word are not interchangeable in some unified language called Min Nan. This is an inconvenient fact, despite what the ISO language codes imply. The ethnologue page for nan specifically states that Amoy and Teochew are not mutually intelligible.[12] So here is the question I pose: what exactly should we do about this situation? -- A-cai 08:35, 7 April 2007 (UTC)
Sorry for being so blunt in my last post. I was (seemingly) knee-jerking in response to Widsith's knee-jerk. Brooklyn-ese is (for the most part) mutually unintelligible with Texanglish. I hope you weren't suggesting that the same phenomenon doesn't exist even within America, let alone when considering US/UK issues. Both bum & fanny are cutesy baby-talk words in America, yet are apparently quite vulgar in UK English. That is, here, you can say "Sit your fanny down" to a three year old, and everyone will smile; if you say "Sit your ass down" to that same three year old, someone would instantly call Child Protective Services.
There is nothing knee-jerk about my thoughts on this issue. Watch your tone. Widsith 16:33, 7 April 2007 (UTC)
Perhaps you and I interpret knee-jerk differently? You said above "I don't enough about the specifics..." then went on to reiterate your stance from a previous conversation that was essentially turned down. Or, was your comment about tone a reference to the example words I picked, because they are vulgar in your dialect? I didn't mean that as a slight - it was a simple statement of fact. Your threat, on the other hand, seems rather pointed. --Connel MacKenzie 03:47, 8 April 2007 (UTC)Reply
I see the dialect issues you raise, as equal or lesser, to the US/UK debate, which has concluded (many times now) with the language heading ==English==. --Connel MacKenzie 14:03, 7 April 2007 (UTC)Reply
It may be necessary to fall back on some other authority for our sanity, but these language-splitting debates are not ridiculous by any means. They may be political, and people might say the same thing about politics, but ridiculous is being drafted into war, deported to another country, or imprisoned for your beliefs. Issues of opinion cannot be discounted as such. They can weigh very heavily.
Many, I think most, who know anything about Chinese would think the opposite of what you said: it is a Wiktionary that classified all Chinese languages as "Chinese" that would be unreasonable. The existence of ISO codes may have been why you gave in on the distinction, but it is not the only reason you lost. And Widsith's response was not a knee-jerk reaction. If you want to insist that the criteria be objective, that we not make distinctions for ourselves, that's fine. However, it is not only permissible but appropriate, and in fact necessary, to gauge how well the criteria meet our needs. DAVilla 16:44, 7 April 2007 (UTC)
I maintain that it is ridiculous for us (Wiktionary) to be taking on the role of mediating what "is" or "is not" a language, particularly when ISO 639 does exist, and does have methods for amending it directly. --Connel MacKenzie 03:51, 8 April 2007 (UTC)

Min Nan (language)

  • Xiamen (dialect)
    • Amoy (subdialect)
    • Fujian
      • Fukien
      • Hokkian
      • Taiwanese
  • Leizhou
    • Lei Hua
    • Li Hua
  • Chao-Shan
    • Choushan
    • Chaozhou
    • Teochew
  • Hainan
    • Hainanese
    • Qiongwen Hua
    • Wenchang
  • Longdu
  • Zhenan Min

See why we want to resolve things at the level of ISO 639 coding? If we use "Amoy", we need at least 17 more names and codes, and we still will have nothing for Min Nan itself. (And this is just one language; there are 12 others, and we end up with several thousand if we code sub-dialects.) We should keep Min Nan (code nan), which is primarily Amoy, but will have the other dialects noted in pronunciations, etc. The exception is Teochew (Chao-Shan), which is not mutually intelligible to any useful extent, and needs an extension code (nan-tch or whatever; in the Min Nan WP they are discussing defining an extension code). Note that this coding applies to all of WM: it is used in the domain names and prefixes. The only thing we should be doing is that: deciding on a nan-xx extension for Teochew. Robert Ullmann 14:17, 7 April 2007 (UTC)

Connel, US/UK English being under the same L2 header makes sense because:
  • West Germanic ⊃ Anglo-Frisian ⊃ Anglic ⊃ English (mutually intelligible: US/UK/Australian etc.)
In parallel, we have
  • Chinese ⊃ Min ⊃ Min Nan ⊃ Amoy (mutually intelligible: Quanzhou, Xiamen, Zhangzhou, Taiwanese)
However, if you were to say:
  • West Germanic ⊃ Anglo-Frisian ⊃ Anglic ⊃ Scots
therefore, Scots and English should have the same L2 header called ==Anglic==, this would be analogous to saying:
  • Chinese ⊃ Min ⊃ Min Nan ⊃ Teochew
therefore, Teochew and Amoy should have the same L2 header called ==Min Nan==. Obscuring this issue is the fact that many people think of Amoy when they think of Min Nan, just as many people think of Standard Mandarin when you say Chinese. -- A-cai 15:00, 7 April 2007 (UTC)

Robert, are you suggesting that we keep nan for Amoy and call it Min Nan, but give Teochew nan-whatever and call it Teochew? BTW, I agree we need more codes for Chinese languages. They have been shortchanged; there's no way around it. -- A-cai 15:00, 7 April 2007 (UTC)

Also, your list implies that we give separate codes for Taiwanese and Xiamen etc. This is not what I'm saying. I'm only talking about having separate codes for groups of mutually intelligible languages/dialects. So the number wouldn't be 17, but it would be more than just one, which is simply inadequate. -- A-cai 15:05, 7 April 2007 (UTC)

If I'm reading you correctly, the translation section for child would look like:

Is this what you're proposing? Doesn't that look funny, since Teochew is also Min Nan? -- A-cai 15:09, 7 April 2007 (UTC)

I understand Robert to mean
which is very similar. Although I agree with both of you on the utility of distinguishing these, I would suggest that the Teochew entries just be labeled contextually under Min-nan until such time as the ISO codes are updated. DAVilla 16:46, 7 April 2007 (UTC)
I'm proposing that we use nan=Min Nan for Amoy (which is what most people mean, and this is the common name) and mutually intelligible variants, and nan-tch=Teochew for Teochew, both as L2 headers and as languages in the translation tables, where "nan-tch" is whatever code the WM projects overall adopt. We can't wait for SIL/ISO; and WM already has extension codes where needed: fiu-vro is Template:fiu-vro. Robert Ullmann 12:38, 8 April 2007 (UTC)

Amoy: prestige dialect policy vs. inclusive dialect policy

As I see it, we need a decision about Wiktionary policy. Here are our three choices (if anyone has another choice, I'm open to suggestions):
Model 1 (in cases where only one ISO 639 code exists for more than one mutually incomprehensible dialect, we will label the prestige dialect according to its localized name, and label non-prestige dialects by their colloquial name, and add an extension to the code)
  • cdo = Fuzhou dialect -> ==Fuzhou==
  • cdo-extension (TBD) = Fuqing -> ==Fuqing==
  • nan = Amoy dialect -> ==Amoy==
  • nan-extension (TBD) = Teochew -> ==Teochew==
  • nan-extension (TBD) = Qiongwen (Hainanese) -> ==Qiongwen==
  • wuu = Shanghai dialect -> ==Shanghainese==
  • wuu-extension (TBD) = Southern Wu -> ==Southern Wu==
Translation section would be (child):
model 1a
model 1b
Model 2 (in cases where only one ISO 639 code exists for more than one mutually incomprehensible dialect, we will label the prestige dialect according to the ISO code, and label non-prestige dialects by their colloquial name, and add an extension to the code)
  • cdo = Fuzhou dialect -> ==Min Dong==
  • cdo-extension (TBD) = Fuqing -> ==Fuqing==
  • nan = Amoy dialect -> ==Min Nan==
  • nan-extension (TBD) = Teochew -> ==Teochew==
  • nan-extension (TBD) = Qiongwen (Hainanese) -> ==Qiongwen==
  • wuu = Shanghai dialect -> ==Wu==
  • wuu-extension (TBD) = Southern Wu -> ==Southern Wu==
Translation section would be (child):
Model 3 (only the prestige dialects would be given a separate L2 header, hope for new ISO 639 codes in the future)
  • cdo = Fuzhou dialect -> ==Min Dong== (Fuqing may be included in the pronunciation section, but only Fuzhou gets example sentences)
  • nan = Amoy dialect -> ==Min Nan== (Teochew and Qiongwen may be included in the pronunciation section, but only Amoy gets example sentences)
  • wuu = Shanghai dialect -> ==Wu== (Southern Wu may be included in the pronunciation section, but only Shanghainese gets example sentences)
Translation section would be (child):
So here's the crux of the matter: which of the above three models should we go with? Model 1 (A-cai), model 2 (Robert), or model 3 (Connel)? -- A-cai 01:27, 8 April 2007 (UTC)
I'd change "hope for updated ISO..." to "push for updated ISO..." in the above. But that "push" needs to happen there, not here. --Connel MacKenzie 03:55, 8 April 2007 (UTC)
Please avoid the POV term "prestige" in this conversation (if possible). --Connel MacKenzie 03:56, 8 April 2007 (UTC)
What term would you prefer? -- A-cai 04:10, 8 April 2007 (UTC)
"Widespread"? "Widely recognized"? I really don't know, but prestige has serious negative connotations in this context. --Connel MacKenzie 15:39, 10 April 2007 (UTC)
Take a look at 朋友; I've added example sentences for Amoy, Teochew and Mandarin. I don't know Teochew, so I had to rely on this site. I think the Teochew sentence is correct (enough for this discussion, anyway). I hate to do it this way, but if we go with Model 3, I don't know how else we could reasonably do it and still do justice to the languages in question. Opinions? -- A-cai 07:26, 8 April 2007 (UTC)
That example is good; now change Teochew to an L2 header ...
SIL/ISO are working on more codes (4 and 5 letters ;-), but that is a long process. For WM to have a Teochew Wikipedia now (something for which there is interest), we need a code like nan-tch. I don't know the current state of the discussion there (reading Min Nan / Amoy in POJ is a bit beyond me). We need to be able to add a limited number of extension codes like this.
As to "pushing" ISO, the precise way to do it is to see what we need to code, and what definitions we want and use, and then feed that into their process. That's how it works. (Not ringing them up: "Hello SC2? We need more codes ..." ;-) Robert Ullmann 12:57, 8 April 2007 (UTC)
This being a multilingual issue, it would be good to have the support of other Wiktionaries before proposing such changes in the outside world. Not that we want to be taking official positions on this sort of thing, but if we did, we couldn't say it was Wiktionary's position, only the English-language Wiktionary's, or it would be misleading. DAVilla 18:42, 8 April 2007 (UTC)
Before we go to other projects, it would be nice if we had more of a consensus here on the English Wiktionary about what to do. I say this because I think a lot of the other Wiktionaries look to the English Wiktionary as the model (rightly or wrongly). After we achieve a consensus (hopefully), where would we go to ask other Wiktionaries? Would that be on some page at Wikimedia? I know they have a lot of pages there that are sort of gathering places for multilingual issues like these. It sounds like Robert believes that model 2 is the best approach, whereas Connel was leaning toward model 3. I think model 1 is the best from a linguistic precision point of view. However, I recognize the technical standards arguments, and agree that model 2 might be the best we can do, given that the ISO standards may not be updated for quite some time. I think model 3 is like trying to force a square peg into a round hole. Obviously, we don't run into this a lot yet. We actually don't have that many Teochew words. The ones we do have are mostly in translation sections. On the other hand, our policy deficiencies in this area just might be part of the reason for this. Do we have anyone who could act as a tie-breaker? -- A-cai 23:25, 8 April 2007 (UTC)
See, normally, I'd refer you to our resident expert on such matters: some guy with the username "A-cai". --Connel MacKenzie 15:39, 10 April 2007 (UTC)
I guess the deafening silence that followed the last few posts has given me my answer. Since Robert and Connel feel strongly about leaving the L2 header for Amoy as Min Nan, and nobody else has offered any passionate counterarguments, I will not push for Amoy to be an L2 header (at least for now). What to do about Teochew is another matter, probably best tabled for the time being (until we actually get a regular contributor of Teochew words). By the way, some of you may have noticed that I only recently created the w:Amoy (linguistics) article on Wikipedia. The article was actually featured in the Did you know? section on Wikipedia's main page on 10 Apr 2007. Not bad for a language/dialect (whatever) that doesn't even have its own separate ISO code ;o) -- A-cai 10:27, 10 April 2007 (UTC)
The deafening silence reiterates my point that we are not situated to act as an authority on this matter. --Connel MacKenzie 15:39, 10 April 2007 (UTC)
Let me break the deafening silence after my long weekend: I lean toward option 1, but could live with option 2. Option 3 just twists linguistic reality too much. Just because those languages are little known doesn't mean they don't deserve a header. The same goes for Jèrriais or Tagalog, languages I had never heard of either, but about which there is no discussion.
And indeed we are not an authority, but there are presumably very few people in the world knowledgeable about this topic at all, so let’s do what seems most reasonable: listen to those who know at least something about it.
However, I am unsure about the romanizations: you link them all; do we want that? Are they used as words? Even those with the numbers? In child, the Teochew translations have to be cleaned up: either put wikilinks around them, or parentheses, etc. Please edit Wiktionary:About Chinese and report here. H. (talk) 09:16, 11 April 2007 (UTC)
Based on the discussions so far, I have revised WT:AC in the following section: Wiktionary:About_Chinese#Min_Nan. If anyone feels the wording needs revision, or we need more added, please let me know. I think one success story of Wiktionary, so far, is that by nature of the fact that we are a multilingual dictionary, various languages/dialects get thrown together that might not otherwise have had to live in the same space, and this tends to put us face to face with the question of what exactly is a language? We thought we knew until we started playing with Wiktionary :-P
BTW, I do want to provide a more complete response to the US/UK English argument, because that has come up several times in this and other similar discussions. Has anyone noticed that we do not have a separate Swadesh list for US/UK/Australian etc English? That's because these variants of English are so closely aligned phonologically and lexically, that nobody seems to feel a pressing need for separate lists. It's why an American doesn't generally need subtitles when watching a Hugh Grant movie. US/UK/Australian etc English is what we mean when we talk about mutually intelligible languages. Now, take a look at the Appendix:Sino-Tibetan Swadesh lists. You can't even get past the third word in the list without running into significant differences among the Chinese dialects (word seven for Amoy and Teochew)! This is because you are now looking at languages/dialects which are not mutually intelligible (in other words, there is no such thing as one big happy language known as Chinese). There, now I have that off my chest. -- A-cai 12:50, 11 April 2007 (UTC)

The logo in the upper left hand corner

Hi, we often see people posting things here which belong in Wikipedia. I've given some thought to why this might be, and one thing I realized is that our logo in the upper left corner says "a multilingual free encyclopedia". Of course, that's because it's supposed to look like a snapshot of a page out of a paper dictionary, which would list Wiktionary right after Wikipedia. Personally, I love the logo and whoever made it kicks all kinds of ass :-) I wonder, though, if the way it is might contribute to the confusion some of our readers seem to suffer. What do you all think? Language Lover 22:13, 6 April 2007 (UTC)

See WT:FAQ. :-) Brion was astounded that the logo he threw together in a couple of minutes (if that long) had lasted two years. When it was clear that it was still superior to the logo-vote proposals, he was shocked. Back then, the entry for Wikipedia was just before Wiktionary. --Connel MacKenzie 05:39, 7 April 2007 (UTC)
It's an interesting thought, but somehow I doubt that's really the reason, or even a contributing factor. I think it's just that (1) many or most people lack a clear sense of the difference between a dictionary entry and an encyclopedia article, and (2) many or most people who stumble upon one of the two projects don't fully appreciate that both exist and are sister projects and that a given fact is generally not appropriate for both. (Indeed, given people's propensity to add useless facts to Wikipedia articles, I think there might be a more general principle that people are happy to contribute regardless of the usefulness of their contribution. Surprisingly, this actually seems to work out pretty well.) —RuakhTALK 05:48, 7 April 2007 (UTC)

IPA to X-SAMPA

Maybe you'll be interested to know that there is a template on fr: that automatically converts IPA to X-SAMPA via JavaScript. We can choose to switch to one of them (or both) with a few lines on the CSS page.

Thanks to that, we got rid of the IPA/X-SAMPA distinction in the pronunciation sections, and we also use it in the inflection templates (see chat, chanter for examples).

Do you think such a template could be used here on en:? - Dakdada 17:12, 7 April 2007 (UTC)

Wouldn't it be easier on us to convert X-SAMPA into IPA? I can't read most of the IPA characters in the edit box, and a few on the page, even with the fancy font stuff. DAVilla 18:04, 7 April 2007 (UTC)
It can be adapted like that, yes. - Dakdada 16:22, 10 April 2007 (UTC)
I'll try to look at it later; if it converts it and then displays both, I'm fine with it. If both are displayed, I can't imagine what objections might arise. --Connel MacKenzie 19:59, 7 April 2007 (UTC)
Yes, it can display both, like « /ʃɑ̃.te/, /SA~.te/ » (or something else). The only thing is that it is done by a script, not by the software like the {{UC:}} stuff. - Dakdada 16:22, 10 April 2007 (UTC)
I really like this idea, especially because I often encounter SAMPA transcriptions that indicate a different pronunciation from the IPA on the same page. --Wytukaze 17:58, 12 April 2007 (UTC)
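For illustration, the character-by-character conversion described above can be sketched in a few lines. This is a minimal Python sketch, not the fr: template's actual JavaScript: the table is a small, hand-picked subset of the IPA/X-SAMPA correspondences, and passing unmapped characters (such as plain t or e) through unchanged is an assumption that only holds for symbols written the same way in both notations. `ipa_to_xsampa` and `show_both` are hypothetical names.

```python
# Illustrative subset of IPA -> X-SAMPA correspondences (NOT the fr: template's data).
IPA_TO_XSAMPA = {
    "\u0283": "S",   # ʃ voiceless postalveolar fricative
    "\u0292": "Z",   # ʒ voiced postalveolar fricative
    "\u0251": "A",   # ɑ open back unrounded vowel
    "\u0254": "O",   # ɔ open-mid back rounded vowel
    "\u025b": "E",   # ɛ open-mid front unrounded vowel
    "\u0259": "@",   # ə schwa
    "\u014b": "N",   # ŋ velar nasal
    "\u0272": "J",   # ɲ palatal nasal
    "\u0281": "R",   # ʁ voiced uvular fricative
    "\u00f8": "2",   # ø close-mid front rounded vowel
    "\u0153": "9",   # œ open-mid front rounded vowel
    "\u0265": "H",   # ɥ labial-palatal approximant
    "\u02c8": '"',   # ˈ primary stress
    "\u02cc": "%",   # ˌ secondary stress
    "\u02d0": ":",   # ː length mark
    "\u0303": "~",   # combining tilde (nasalization)
}


def ipa_to_xsampa(ipa: str) -> str:
    """Map each character through the table; unmapped characters pass through as-is."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


def show_both(ipa: str) -> str:
    """Format both notations side by side, in the style « /IPA/, /X-SAMPA/ »."""
    return f"/{ipa}/, /{ipa_to_xsampa(ipa)}/"


if __name__ == "__main__":
    # chanter: ʃ + ɑ + combining tilde + ".te"
    print(show_both("\u0283\u0251\u0303.te"))  # prints: /ʃɑ̃.te/, /SA~.te/
```

Going the other way (X-SAMPA to IPA, as DAVilla suggests) is not, in general, a simple inversion of such a table, because some X-SAMPA symbols are multi-character sequences (for example `r\` or the diacritic `_h`) that would need longest-match tokenizing.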

Language or dialect

Last year I proposed changing "language" to "language or dialect" in the ELE if there were no objections, but even I forgot about it. I would like to know if there are any objections now. As I see it, we use "language" to mean anything (sans Translingual) that is acceptable as a two-level header, which could be a language or a dialect. In fact, the distinction between language and dialect is not linguistically precise.

I believe this change is very closely coupled to a change in the way we list certain languages under Translations. If Mandarin (or Mandarin Chinese) and Min-nan are languages by our definition, then they should be alphabetized under M. I know this is going to generate some controversy, and I anticipate having to bring the latter to a vote. Related issues include what to classify as a language (Amoy, Serbian), what to name the languages, and how to alphabetize. Aside from attempts to force "Chinese Cantonese" as a name (even "Chinese Mandarin" has been turned down in the past), it is possible to keep those topics independent of the question I'm raising. DAVilla 17:16, 7 April 2007 (UTC)

What we have now is a nightmare to parse; you wish to make it an order of magnitude worse? I think I could oppose that measure. We don't yet have Hippietrail's extension loaded here (go test it on http://wiktionarydev.leuksman.com/); it groups languages together by language groups, arbitrary groupings, and possibly arbitrary user-preference groupings (I don't know whether he has that part working yet).
Without underlying software that can unify the different language names entered, I will strongly oppose "opening the floodgates." Even then, we'd need some way of describing just what a dialect is. Brooklynese? Connelese? --Connel MacKenzie 19:56, 7 April 2007 (UTC)
What are you talking about? I'm not proposing a change to what we consider to be valid 2-level headers, and as far as I know Brooklynese is not one of them. All I'm proposing is that we acknowledge that there is no distinction between "language" and "dialect", and that by "language" in our terminology we sometimes mean what most people would consider a dialect.
And I don't intend to make anything worse to parse. In fact it would be easier to parse if
* Chinese
*: Cantonese
*: Mandarin
* Japanese
became
* Cantonese
* Japanese
* Mandarin
If you misread the above, that's all I'm proposing. DAVilla 20:37, 7 April 2007 (UTC)
I cannot see introducing the misleading term "dialect" into the debate as being helpful. --Connel MacKenzie 03:30, 8 April 2007 (UTC)
We keep talking ourselves in circles whenever we raise the issue of language or dialect. A dialect is a language, and a language can be a dialect. Let me demonstrate my meaning by using an analogy. Let's substitute the word language with fruit and the word dialect with apple. ISO 639 codes can either represent a fruit or an apple. Of course, there are many varieties (accents) of apples (Washington apple, crab apple etc.), but we don't worry about that for the purposes of an L2 header. An orange is also a fruit, but it is clearly not an apple, so it gets a separate L2 header. But what about a pear? There are some pears that look remarkably similar to apples. In trying to define which things are fruit and which things are apples, we run into the problem that an apple is a fruit, and a fruit can be an apple. In the Amoy post, I'm essentially arguing that Amoy (apple) and Teochew (pear) are two types of fruit. Connel's counter is that they are the same fruit because they both have the code nan (which contains several types of fruit, each type of fruit having several varieties). I'm trying to separate things out at the fruit level, but am not always aided in this effort by ISO 639 codes, because two or more types of fruit are sometimes covered under the same code. This is because sometimes we argue about whether an apple is a fruit or just an apple. -- A-cai 00:19, 8 April 2007 (UTC)
Okay, so it has to be handled a little more sensitively than I thought. My intent was simply to say that these two-level headers that we call "languages", whatever they might actually be, are all in the same basket, so to speak. DAVilla 03:26, 8 April 2007 (UTC)
Yes, leave the word "dialect" out of it (and out of ELE). (There are a number of linguists who eschew the term, preferring "group", "language" and "variant", precisely because the term gets misused.)
That said, by all means let's get rid of the nested construction, and just put the languages in alphabetical order. Definitely easier, and Hippietrail-like things can present language groupings however the user prefers. Robert Ullmann 12:14, 8 April 2007 (UTC)
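As an illustration of the reordering DAVilla and Robert describe, the following Python sketch flattens the nested translation list and alphabetizes it. It assumes the simplified wikitext used in DAVilla's example (top-level lines start with `* `, nested lines start with `*: `, and each line holds only a language name); real translation lines also carry the translations themselves, so this sketches the reordering rule, not a working bot. `flatten_translations` is a hypothetical name.

```python
def flatten_translations(lines: list[str]) -> list[str]:
    """Promote nested '*: ' entries to the top level and drop their grouping header.

    Assumes each line holds only a language name, as in the simplified example.
    """
    flat: list[str] = []
    for i, line in enumerate(lines):
        if line.startswith("*: "):
            # Nested entry (e.g. Cantonese under Chinese): promote it.
            flat.append(line[3:].strip())
        elif line.startswith("* "):
            next_line = lines[i + 1] if i + 1 < len(lines) else ""
            # A top-level line followed by nested entries is only a grouping
            # header (e.g. Chinese) and is dropped; otherwise keep it.
            if not next_line.startswith("*: "):
                flat.append(line[2:].strip())
    # Alphabetize case-insensitively and re-emit as top-level list items.
    return ["* " + name for name in sorted(flat, key=str.casefold)]


if __name__ == "__main__":
    nested = ["* Chinese", "*: Cantonese", "*: Mandarin", "* Japanese"]
    for item in flatten_translations(nested):
        print(item)
```

Run on the nested example, this yields Cantonese, Japanese and Mandarin as three top-level items, matching the flat list proposed above; any grouping under Chinese would then be a presentation concern of the kind Hippietrail's extension is described as handling.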

...should be discouraged. There is an example on Wiktionary:Quotations which could not be any better:

http://books.google.com/books?vid=ISBN0451527046&id=3f2ne_bk-xoC&pg=PA131&lpg=PA131&ots=A_mzPeA12T&dq=gully&sig=IT-VC9zDPaUuVACY1mDpoUXIVQY

When I try to read page 131 of Treasure Island, I'm asked to log in (even though logging in is not a requirement for this book, as it is for some). We don't ask users or even contributors to register with us. Why would we ask them to register with Google?

Furthermore, the link includes information in a &sig field that tracks who copied the link, and if that field is removed, the link no longer points to the correct page. Until I took it out, the link even contained all of the search criteria used to locate the quotation. I would remove such links on much simpler grounds, namely that they point to a dynamic CGI page rather than a static one, when even static URLs are highly susceptible to breaking.

Anyway, the whole point is that the book is durably archived, not the website, so we should be using ISBNs. The URL is essential only when it is part of the record. For instance, when I quote websites (which isn't often, as they almost never meet CFI), I always print the domain name, e.g. secretstrom.blogspot.com [13]

DAVilla 17:58, 7 April 2007 (UTC)

It's worth pointing out that the extra data contained in the link does have a positive side: it often causes the words to be highlighted in the text, which is a very nice feature, especially if the page is large with small print. If we just linked directly to a page, there'd be no way for Google to know what words to highlight, and everyone would have to painstakingly read roughly the whole page to find the word. Language Lover 21:19, 7 April 2007 (UTC)
I agree. Besides, if someone wants to link to Treasure Island, they can find the full text at our sister project Wikisource. --EncycloPetey 18:30, 7 April 2007 (UTC)
The precedent was set by the proponent of the current RFV system (Muke). He obviously would never have been given that inch if he hadn't included exact pointers so that people can check.
Until we have a more reasonable WT:CFI, and a working WT:RFV, the only reason one might wish to remove the direct pointers is obfuscation. Perhaps a week after a successful RFV, that might be reasonable. But during the verification phase, it is just a waste of everyone's time to camouflage the links. --Connel MacKenzie 19:41, 7 April 2007 (UTC)
By the way, constantly referring to our CFI in such a derogatory tone could be considered mild propaganda. Personally, I like our CFI, but that's as irrelevant as your disliking it. The fact is there are two sides to a dictionary: the readers who want to make their vocabulary sound smarter, and the readers who want to figure out what a word that living humans use means. Our CFI should be a compromise between both, and I think that, combined with good context tags to accommodate the former, it does a good job at being that compromise :-) The fact is, no matter how dubious or "unwashed masses" the citations are, that doesn't mean they aren't words (1), nor that no one will ever want to know what they mean. Language Lover 21:19, 7 April 2007 (UTC)
Tee-hee, he called Connel's propaganda "mild" ><. - [The]DaveRoss 21:37, 9 April 2007 (UTC)
Also of note: the Usenet archives exist only on that website now. That (as ridiculous as the notion is) is precisely what is considered to be "durably archived" (well, sort of) for the purposes of our broken WT:CFI. So not including a very exact link for those would mean the link is not truly "durably archived" after all. Therefore, all Usenet "citations" should once again be removed. Is that what you are asking for? --Connel MacKenzie 19:44, 7 April 2007 (UTC)
You seem to be conflating <it is durably archived> with <we provide a link to a durable archive of it>, but the two strike me as quite different. Also, I understand DAVilla's comment as referring to citations given in entries, not to discussion at WT:RFV. (?) —RuakhTALK 20:16, 7 April 2007 (UTC)
My edits to Wiktionary:Quotations, reverted at Enginear's request, made it clear that links were acceptable during the RFV process, but I did not mention it above. When passing an RFV, which isn't a chore I take up regularly, I always check the Google book links before removing them from the page. I don't wish to over-proceduralize that aspect of the process, so I simply suggested that they be put in the talk spaces. But I didn't use a chisel to write that. DAVilla 20:59, 7 April 2007 (UTC)
I didn't go into detail on Usenet. I can accept linking those discussions, but I have suggestions for the links. When performing a search, the URL is the same garbage as with Google books:
http://groups.google.com/group/alt.fan.james-bond/browse_thread/thread/d5e99dc86b552d18/915eac996e0b21a9?lnk=st&q=manwhore
However, messages have individual IDs that are part of the Usenet structure, such as 915eac996e0b21a9 for the link above. On Google Groups, this is retained in a static URL that could be linked as alt.fan.james-bond [14]
DAVilla 20:59, 7 April 2007 (UTC)
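To illustrate DAVilla's point about message IDs, the IDs can be pulled out of a search-result link such as the one above. This minimal Python sketch assumes the path layout seen in that example (`/group/<name>/browse_thread/thread/<thread-id>/<message-id>`) and ignores the query string (`lnk=`, `q=`), which only carries search state; it does not build the static URL, whose exact form is not given here. `groups_ids` is a hypothetical name.

```python
from urllib.parse import urlsplit


def groups_ids(url: str) -> tuple[str, str]:
    """Return (thread_id, message_id) from a Google Groups search-result URL.

    Assumes the layout /group/<name>/browse_thread/thread/<thread_id>/<message_id>.
    Raises ValueError if there is no 'thread' segment, IndexError if IDs are missing.
    """
    # urlsplit separates the path from the query string, which is discarded.
    parts = urlsplit(url).path.strip("/").split("/")
    i = parts.index("thread")  # the exact segment 'thread', not 'browse_thread'
    return parts[i + 1], parts[i + 2]


if __name__ == "__main__":
    link = ("http://groups.google.com/group/alt.fan.james-bond/browse_thread/"
            "thread/d5e99dc86b552d18/915eac996e0b21a9?lnk=st&q=manwhore")
    thread_id, message_id = groups_ids(link)
    print(thread_id, message_id)
```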
To DAVilla: I agree wholeheartedly. I don't suppose you could amend WT:CITE to indicate how exactly ISBNs should be formatted? —RuakhTALK 20:16, 7 April 2007 (UTC)
I agree that it is the book which is durably archived, but the paper copy is often not the most accessible source, particularly when it is rare and old; even the British Library normally gives access to facsimile scans of old books, rather than to the books themselves. We give a quote which is perhaps one sentence long, while a page or two would be needed to give reasonable context. The main purpose of a quotation is to show exactly how the word was used. It is important that people who want to find this out are able to inspect the full context. The ability to facilitate this is one advantage we have over paper dictionaries and I believe we should therefore give links wherever possible. --Enginear 20:56, 7 April 2007 (UTC)
Google Books links should be encouraged. Removing them is almost vandalism. They allow users to check context without physically going down to their local library and likely having to order the books from somewhere else, which requires a significantly more inconvenient registration process than signing up for Google Books. Kappa 13:17, 11 April 2007 (UTC)

RFP: Format of examples and quotations

About

This is a request for proposals on the format of examples and quotations between definition lines. Only those options which are seconded will be included in an approval vote, to be held no sooner than April 21. Proposals by new and anonymous contributors must also be sponsored by a regular (200 edits, two weeks prior). Contributors can make several proposals, but may be asked to limit the number they sponsor if there would still be five or more options overall. As always, discussion is welcome.

Given the objections in the preceding conversation, I see this "request for proposals" as completely inappropriate. --Connel MacKenzie 03:58, 8 April 2007 (UTC)
I meant for this to be completely tangential to that discussion. Since only the information provided should show in any proposal, links were not meant to be part of the discussion. I didn't think anyone would consider a link to be a requirement in adding a correctly formatted citation, since there are books that do not appear anywhere online. However, there is certainly enough information to obtain a link, so for clarity it is worth declaring that, regardless of one's opinion on the propriety of external links, they should be excluded in every case, and that doing so does not reflect in any way on one's convictions with regard thereto. DAVilla 12:26, 8 April 2007 (UTC)
I assume that you would not ban an "important" cite, say the earliest cite so far entered, because some of the info is not known to the editor (e.g. the publisher of some early documents is often not clear). Similarly, as you say below, an important cite should not be banned just because it is not easily verifiable, unless there is reason to believe it may be fake. Certainly, it should not be banned just because it cannot be verified online. However, if there is a choice between two otherwise similar cites, the one available online should be chosen as being the most convenient for those readers who want to research the context further. --Enginear 17:48, 10 April 2007 (UTC)
This request has no bearing on other citations, except to lead by example in the final selection. I have asked that the specific information listed below be included in any proposal, no less and no more, only to have consistency across the different options. The way this has been done in the past has concerned these abstract placeholders that you're alluding to. I'm a little fed up with that because it doesn't give me any valuable feedback on the sorts of quotations I find in the real world. I could have come up with some really wacky stuff, but these in comparison are pretty tame and still don't fit the mold very well. So here we have three real-world quotations (and one real-world example, don't forget) and a different approach to the same problem. DAVilla 21:57, 10 April 2007 (UTC)

The proposals are by example. They must show all of the following information, regardless of correctness or verifiability, unless it is deemed irrelevant; and, unless more is tied into the proposal as a requirement, they should show no more than the following:

    • Example:
      Our grandson owns a radio, but he’d like a transistor.
    • Quotation:
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • Original work:
      Author: Ruth Bondy
    • Quoted work:
      Date: 1968
      Title: MISSION SURVIVAL
      Subtitle: The people of Israel’s story in their own words: from the threat of annihilation to miraculous Victory
      Translator: I. I. Taslitt
      ISBN: 0491008392
      Page: 25
      Publisher: Sabra Books
      Location: New York
    • Quotation:
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • Original work:
      Date: 1973
      Author: Iris Murdoch
      Title: The Black Prince
      Publisher: Viking Press
    • Quoted work:
      Date: 2003
      ISBN: 0142180114
      Page: 407
      Publisher: Penguin Classics
    • Quotation:
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”
    • Original work:
      Date: July 25, 2004
      Interviewer: Liane Hansen
      Interviewee: Lon Simmons
      Title: Baseball Announcer Simmons Enters Hall
      Production: Weekend Edition Sunday
      Producer: National Public Radio

Previously (Oct - Dec 06)

Current practice, so far as I can tell, is shown immediately below. Please feel free to edit it, with comment, if you feel you know better. There are many things I am uncertain about. DAVilla 01:35, 8 April 2007 (UTC)

  1. A transistor radio.
    Our grandson owns a radio, but he’d like a transistor.
    • 1968, Ruth Bondy, I. I. Taslitt (tr.), Mission Survival: The People of Israel’s Story in Their Own Words, From the Threat of Annihilation to Miraculous Victory, Sabra Books (New York), ISBN 0491008392, p. 25—
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973, Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), ISBN 0142180114, p. 407—
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004, Lon Simmons, Liane Hansen (interviewer), “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, National Public Radio, July 25—
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”
edited --Enginear 19:55, 8 April 2007 (UTC)
added pedia links, missing comma, changed some to colons as was the standard before, edit: abbr. page, (date)->date DAVilla 22:04, 8 April 2007 (UTC)
Didn't we use to use dashes before the quote? Anyway, it doesn't matter. This one isn't up for consideration. DAVilla 22:07, 8 April 2007 (UTC)
This is getting a bit anal, but...changed colons after date to commas...colons predated this period, being the standard from Dec 05 - Oct 06...Dashes were used pre-Dec 05. Thanks for the other corrections. :-) --Enginear 17:58, 10 April 2007 (UTC)

The sources can be found on Google books here and here (registration required) and on NPR.org here. Edit: The last page contains an audio link but does not have the word in print, as transcripts are not available without charge. DAVilla 20:30, 7 April 2007 (UTC)

Actually no registration is required for this one: don't be misled by b.g.c. asking you to log in; usually closing the login form is sufficient (though there are a few where login is required). --Enginear 20:07, 8 April 2007 (UTC)

Currently

(as interpreted by --Enginear)

Please feel free to edit, with comment, if you feel that this does not reflect policy or the practices agreed upon. DAVilla 21:34, 8 April 2007 (UTC)

  1. A transistor radio.
    Our grandson owns a radio, but he’d like a transistor.
    • 1968, I. I. Taslitt et al. (translators), Ruth Bondy, Mission Survival: The People of Israel’s Story in Their Own Words, From the Threat of Annihilation to Miraculous Victory, Sabra Books (New York), ISBN 0491008392, page 25,
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973, Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), ISBN 0142180114, page 407,
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004, Lon Simmons, Liane Hansen (interviewer), “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, National Public Radio (July 25) audio,
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

I believe this is the best format. (Update: I'm changing my vote to one with a minor tweak about six sections below. --Enginear 17:13, 14 April 2007 (UTC)) The date and author (or translator) are the most important features, so are put first. The page no. is put after the publisher and date, since it relates to a particular edition, not to the text in general. I assume that the quotes are chosen to illustrate different features. (The second book has been linked to its Wikipedia entry, but if it were available on Wikisource or Gutenberg, that would take precedence.) However, IMO, the first cite is imperfect because only a snippet is available on the net. Similarly, audio (the last one) is technically substandard because the spelling cannot be checked. Personally, I would try to find better ones, although for what I assume is now an archaism, and rare compared with other uses of the word, I might not succeed. --Enginear 19:55, 8 April 2007 (UTC)

I have removed your addition of Ohad Zemorah because, apart from Google books, every source I can find claims that he is an editor. If he were an author I could understand, but anyway the RFP specifically says that no additional information is to be added unless it is required. In fact, even if he were an author, being listed second it might still be permissible to exclude him, especially since the words quoted are primarily those of the translator. I am not certain that "et al." is necessary but I will leave it be.
It is my understanding from Wiktionary:Quotations that tr. is to be used in place of "translator". If that is incorrect then great! I do not believe we need to abbreviate anything here. However, you should update the policy page if you do not want to revert your change above. I cannot send this into a vote claiming it is the current standard if it is not in fact.
I have also edited the last quote since "editor", which I introduced, is incorrect. And the definition line is not the subject of this vote, so it should be uniform across all proposals. DAVilla 21:18, 8 April 2007 (UTC)
Does the third source need a comma at the end? DAVilla 21:38, 8 April 2007 (UTC)
Yes. Added. I interpreted the note re tr. abbreviation as meaning that it was permitted rather than required. I think there should be some limited personalisation allowed in such matters. I have modified Wiktionary:Quotations in line with this, and we'll see if it is reverted. I'm not aware of anyone here who really likes abbreviations, so I think it's uncontentious even when the subject is under discussion. ducks just in case (I also added the use of et al. which, as you politely hinted, was not previously there.) --Enginear 18:14, 10 April 2007 (UTC)

Proposal by DAVilla

  1. A transistor radio.
    • Our grandson owns a radio, but he’d like a transistor.
    • 1968, I. I. Taslitt et al. (translators), Ruth Bondy, Mission Survival: The People of Israel’s Story in Their Own Words, From the Threat of Annihilation to Miraculous Victory, Sabra Books (New York), ISBN 0491008392, page 25
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973, Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), ISBN 0142180114, page 407
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 July 25, Lon Simmons, Liane Hansen (interviewer), “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, National Public Radio [15]audio
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

As above, but use a bullet for the example, put the date all together, and leave off the trailing comma. DAVilla 18:27, 14 April 2007 (UTC)

Second proposal by DAVilla

  1. A transistor radio.
    • Our grandson owns a radio, but he’d like a transistor.
    • 1968, I. I. Taslitt et al. (translators), Mission Survival, The people of Israel’s story in their own words: from the threat of annihilation to miraculous victory, Ruth Bondy (author), Sabra Books, New York, ISBN 0491008392, page 25
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973, Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), ISBN 0142180114, page 407
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 July 25, Lon Simmons, “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, Liane Hansen (interviewer), National Public Radio
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

As per first proposal, but being more careful with the subtitle, experimenting with the links, leaving off "audio" (which honestly I've never seen before), and using a little less italics. Also moved the people who did not use these exact words (the author, who wrote in another language, and the interviewer) into the information about the source. DAVilla 18:33, 14 April 2007 (UTC)

Third proposal by DAVilla

  1. A transistor radio.
    • Our grandson owns a radio, but he’d like a transistor.
    • 1968, I. I. Taslitt et al. (translators), Mission Survival, The people of Israel’s story in their own words: from the threat of annihilation to miraculous victory, page 25, Ruth Bondy (author), Sabra Books, New York, ISBN 0491008392 °
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973, Iris Murdoch, The Black Prince, page 407, Viking Press, Penguin Classics (2003), ISBN 0142180114 °
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 July 25, Lon Simmons, “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, Liane Hansen (interviewer), National Public Radio °
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

Still under experimentation. Does anyone have a better suggestion for a link symbol? Please try it out above. DAVilla 08:59, 11 April 2007 (UTC)

Proposal by BD2412

My proposal is this. No matter what configuration we end up using for quotes, we should make a template with parameters for the parts that we use (year, author name, title of work, text of quote, whatever else), so we don't have to go changing thousands of quotes around if we decide to change up the config in the future. Cheers! bd2412 T 03:51, 11 April 2007 (UTC)

Word. Indeed, we can have a few different templates, one for each of the major common kinds of sources (normal-books, Usenet messages, periodicals, compilations, anything else?) —RuakhTALK 04:18, 11 April 2007 (UTC)

No. Absolutely not. This will not work. There are simply too many variable pieces of information that would (or might) have to be considered. Even for just books, one has to consider: edited books, books of collected stories, books translated into English from another language, revisions of books, books with multiple authors, books with separate section titles...and leave in the options for linking important persons to Wikipedia and appropriate works to Wikisource. This doesn't even consider plays, journal articles, DVD subtitles, scriptural texts, speeches, television programs, poetry, and the myriad other forms we regularly cite for quotations. There is too much to bind in a single template, and too many possible templates for me to want to have to figure out which (if any) existing template will do what I need it to. --EncycloPetey 04:23, 11 April 2007 (UTC)
I disagree. Most of those cases don't need to be handled separately. To wit (sticking only with the cases you mentioned — feel free to bring up others):
  • edited books ← all books are edited. The editor is only relevant if it's a compilation of some sort, in which case the individual work has an author and the whole collection has an editor. I already listed compilations as warranting their own template.
  • books of collected stories ← this is a kind of compilation, and I don't see that it needs separate treatment.
  • books translated into English from another language ← all we need are optional "translator" and "sourcelang" parameters in the various templates.
  • revisions of books ← edition information is standard. I don't see how that's a complication.
  • books with multiple authors ← no one will object if the "author" parameter lists multiple authors.
  • books with separate section titles ← this is only relevant if the different sections are written by different people, in which case this is a compilation. We don't currently note section titles anywhere, do we?
  • links to other Wikimedia projects ← nothing is preventing this. No one will object if the "author" or "title" parameter contains a link to Wikipedia or Wikisource.
  • plays ← generally we'll cite print copies, no? And there's no reason we can't support "act" and "scene" and even "line" parameters in addition to "page" parameters.
  • journal articles ← journals are periodicals. I mentioned those.
  • DVD subtitles ← sorry, I don't know how these work. If worst comes to worst, things that aren't worth templatizing can always be given a Category:Templateless citations or something so we can keep this sort of thing organized, at least.
  • Scriptural texts ← Good point; these probably warrant their own template. Don't worry, such a template would get plenty of use in entries on Ancient Greek and Hebrew words.
  • speeches ← in what form is it durably archived? In a compilation? We'll have templates.
  • television programs ← we might be able to treat these the same way as periodicals, I'm not sure; I guess it depends on the details.
  • poetry ← in what form is it published? As a book? In a compilation? In a magazine? We'll have templates for each of those.
(I can't guarantee this will work perfectly, but I don't understand how you can say "absolutely not" at this point.)
RuakhTALK 05:44, 11 April 2007 (UTC)
Well, at the very least how about a template for the bread-and-butter typical Google Books result, a quote from a book with an author, a title, a year, and a page number! bd2412 T 21:14, 15 April 2007 (UTC)
The general case is very complex. I wouldn't mind a simple case if you personally think it would be useful to at least yourself, provided it's substituted, or at least substitutable e.g. by AutoFormat. The reason is that templates can turn off new editors who just need to make a minor change, e.g. if Google Books says author but it's an editor as per this example. (Heck, even I haven't found my way around the POS templates.) That sort of flexibility isn't so crucial with a context label, say, and such edits are not so common with a POS heading, but they are probably the majority when it comes to quotations. With subst: the format couldn't be changed instantly if needed, and anyways monkey-see monkey-do is going to win out in the end, so a substituted template is very different from your original proposal. DAVilla 06:25, 21 April 2007 (UTC)

Since this proposal does not list the example sentence and three quotations provided, it could not be included as a candidate in the upcoming vote. It doesn't appear intended for that anyway. If this proposal would be valid "no matter what configuration we end up using for quotes" then it is independent of that formatting issue. BD2412, if you agree then I would like to voice my own opinions on the above. DAVilla 08:57, 11 April 2007 (UTC)

I have no objection. Cheers! bd2412 T 21:14, 15 April 2007 (UTC)
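BD2412's idea of a parameterized quotation template, together with the optional parameters Ruakh describes (such as a translator), can be sketched outside of wikitext. The Python function below is a hypothetical illustration only, not an existing Wiktionary template: the name `format_book_cite` and all of its parameter names are assumptions. It renders a book citation in roughly the comma-separated order of the "Proposal by DAVilla" layout, so that changing the house style would mean editing one function rather than thousands of entries.

```python
from typing import Optional


def format_book_cite(
    year: int,
    author: str,
    title: str,
    publisher: str,
    isbn: str,
    page: int,
    translator: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Render a book citation line from named parts.

    Hypothetical sketch: the function and parameter names are illustrative
    assumptions, not the interface of any real Wiktionary template.
    """
    parts = [str(year)]
    # Optional translator, placed before the author as in the proposals above.
    if translator:
        parts.append(f"{translator} (translators)")
    parts.append(author)
    parts.append(title)
    # Optional place of publication, shown in parentheses after the publisher.
    parts.append(f"{publisher} ({location})" if location else publisher)
    parts.append(f"ISBN {isbn}")
    parts.append(f"page {page}")
    # Order comes from the appends above and the separator from this join,
    # so restyling every citation means editing only this function.
    return ", ".join(parts)


if __name__ == "__main__":
    print(format_book_cite(1973, "Iris Murdoch", "The Black Prince",
                           "Viking Press, Penguin Classics (2003)",
                           "0142180114", 407))
```

Run with the Murdoch example, it prints `1973, Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), ISBN 0142180114, page 407`, the same layout as the second citation in the first DAVilla proposal. As EncycloPetey points out, real sources (plays, DVD subtitles, scriptural texts) would need many more optional fields than this sketch covers.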

Proposal by Ruakh

My proposal is mostly like it is now, but with periods to separate things more clearly:

  1. A transistor radio.
    Our grandson owns a radio, but he’d like a transistor.
    • 1968. I. I. Taslitt et al. (translators), Ruth Bondy. Mission Survival: The People of Israel’s Story in Their Own Words, From the Threat of Annihilation to Miraculous Victory. Sabra Books (New York), 1968; ISBN 0491008392. Page 25.
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973. Iris Murdoch. The Black Prince. Viking Press, Penguin Classics, 2003; ISBN 0142180114. Page 407.
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 July 25. Lon Simmons, Liane Hansen (interviewer). “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday. National Public Radio. audio link
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

Fourth proposal by DAVilla

  1. A transistor radio.
    • Our grandson owns a radio, but he’d like a transistor.
    • 1968: I. I. Taslitt et al. (translators), Ruth Bondy (author). Mission Survival, The people of Israel’s story in their own words: from the threat of annihilation to miraculous victory. Sabra Books, New York, ISBN 0491008392, page 25
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973: Iris Murdoch. The Black Prince. Viking Press. Penguin Classics (2003), ISBN 0142180114, page 407
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 July 25: Lon Simmons, Liane Hansen (interviewer). “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, National Public Radio [16]
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

As above but with a bulleted example and a bunch of tweaks. Periods might help make the distinction between different versions clearer, so I've added one between Viking Press and Penguin Classics, but I've removed the one before NPR since Weekend Edition is wholly a part of it. Still not sure how to handle the audio. Anything you object to? DAVilla 19:00, 14 April 2007 (UTC)

Proposal by Enginear

  1. A transistor radio.
    Our grandson owns a radio, but he’d like a transistor.
    • 1968 Template:italbrac, I. I. Taslitt et al. (translators), Ruth Bondy, Mission Survival: The People of Israel’s Story in Their Own Words, From the Threat of Annihilation to Miraculous Victory, Sabra Books (New York), ISBN 0491008392, page 25,
      The transistor is the center of life. It has priority over almost every other activity. At each fresh news program we gather round and listen tensely.
    • 1973 Template:italbrac, Iris Murdoch, The Black Prince, Viking Press, Penguin Classics (2003), ISBN 0142180114, page 407,
      We listened to some Mozart on Bradley’s transistor. Later he said, ‘I wish I’d written Treasure Island.’
    • 2004 Template:italbrac, Lon Simmons, Liane Hansen (interviewer), “Baseball Announcer Simmons Enters Hall”, Weekend Edition Sunday, National Public Radio (July 25) audio,
      A lot of fans come up to me and say, “Well, we—when we were kids, we used to take a transistor to bed with us and fall asleep listening to you.”

This is identical to my interpretation of the current style, except for the addition of the country of first publication of the cite (in the current language), immediately after the date. The date, country and author (or translator) are the most important features, so are put first. The country is important for charting the development of usage of the word with the defined meaning. It is usually simple to determine, but not always. For example, the first edition of the second book (in 1973) was by Chatto & Windus, which was at the time a British company, though now owned by a US corporation. I have trialled this version at a few "real" words, e.g. mardi gras (French), half and half and ned.

The page no. is put after the publisher and date, since it relates to a particular edition, not to the text in general. The second book has been linked to its Wikipedia entry, but if it were available on Wikisource or Gutenberg, that would take precedence.

I assume that the quotes in the current example were chosen to illustrate different features. However, IMO, the first cite is imperfect because only a snippet is available on the net. Similarly, audio (the last one) is technically substandard because the spelling cannot be checked. Personally, I would try to find better ones, although for what I assume is now an archaism, and rare compared with other uses of the word, I might not succeed. --Enginear 17:13, 14 April 2007 (UTC)

Umm... you do realize that the {{UK}} and {{US}} tags mark words or definitions that are restricted to a particular region? They are therefore inappropriate for marking quotations. Also keep in mind that we're discussing how to format quotations that are interspersed with definitions. These will tend to be short lists, so identifying the country of publication is not especially relevant or useful. Such identification only becomes significant for longer lists, which are typically transferred to a /Citations page. I'm not sure how or where I would want to see the country (or region) information, but I'm sure it would be only an additional burden for short lists of quotes placed among the definitions. I'm also dubious about giving the nation of publication such significant standing. Suppose the earliest publication of some of Gandhi's writings is in the UK. That's a politically charged issue I'd rather not open. Suppose the earliest publication of T. S. Eliot (born American) is in the UK. Eliot spent his first 25 years of life in America, and his poetry will reflect that. How would marking quotations from his poems as "UK" tell us anything other than the place of publication? I really don't see the publication information as so important that it must be placed up front next to the date. Let's stick with more traditional layouts like all other publishers have chosen to follow. --EncycloPetey 17:59, 14 April 2007 (UTC)
{{label}} is in the works and will mirror the context labels without using categories. Although unlikely, it would be possible to redefine certain templates such as {{US}} and {{UK}} to default as labels rather than context labels, therefore requiring {{context|US}} and {{context|UK}} on definition lines. More likely, {{label|US}} or {{label|UK}} could be written in above, or {{italbrac}} could be used more directly, or simply the literal desired styling. DAVilla 19:11, 14 April 2007 (UTC)
To answer EP:
  • No, I didn't know the usage of {{UK}} etc., nor is the fact stated either on the relevant pages or their (blank) Talk pages, unless one is expected to know that they must be used in accordance with the Categories that are noted on their pages. I understood that categories usually noted that the head word is sometimes used in accordance with the category, rather than required to be used in that way; see e.g. et, which is only colloquial in one of its several languages. But I've now altered the examples above to use {{italbrac}}.
  • I disagree with you re the usefulness of regional information for interspersed cites, partly because <soapbox> I dislike the use of /Citations pages and would rather that, in the future, longer lists of interspersed cites could be collapsed, as is done for Translations. It seems wrong that long lists of cites, which demonstrate usage in detail, should be on a separate page (divided only by glosses) where the full details of the definitions are not available.</soapbox> There are many cases where a particular meaning of a word starts in one area and later spreads to others which have previously used the word in a conflicting way. Billion and trillion spring to mind. Sometimes a particular usage of a word dies out in one area while remaining in another. This information is important.
  • I use the same facts you mention to reach the opposite conclusion. The words one uses, and the meanings one intends for them, depend markedly both on one's idiolect and one's intended audience. One may use jargon or obscure language to people who will understand it. If multilingual, one will choose a language to suit. We generally avoid on this site suggesting that we table anything. I messed up a few weeks ago by forgetting we were international, and referring to a gas fueled camping stove, as if I was dealing only with a UK audience. Therefore both author and intended audience are important in understanding why a word is used in a particular meaning. The nearest we can easily get to the intended audience is usually to know the publisher. I believe that the country of publication, which is often not obvious from the publisher's name, and which is also an indication of the likely nationality of a book's editor, is worthy of separate mention.
  • TS Eliot is indeed a person worthy of discussion in this context. His earliest work was first published in US, and much of his later work first published in UK. In 1920, he published two books of poetry, one in US & one in UK (where he had by then lived for five years). The contents differed. I suggest that he and/or his publishers were aware of differing audiences. I suggest it is therefore reasonable to tag words only in the first as US and only in the second with UK. The 75% published in both books was presumably considered OK for both audiences, so presumably the "more important" cite would be chosen, for example if it marked a time when a word common in US was first used in UK, the UK cite would be given, and vice versa. It's the work, not the author, which is being tagged.
  • I don't see that it is particularly "politically charged" to note where an author has a particular work first published. In the case of Gandhi, it could have been in Britain during his English education or later visits, during his time in British-ruled South Africa, where he first rose to prominence as an activist (or should I say passivist), or in British-ruled India (as I think it actually was, in 1917). It does not change his nature, nor his success at improving all three countries. --Enginear 16:17, 15 April 2007 (UTC)

Proposal by

nonstandard & illiterate

Hi, would anyone object if I reclassify words classified as "illiterate" as "nonstandard"? The two classifications appear to be identical, save that "nonstandard" is kinder and less controversial. Language Lover 23:55, 7 April 2007 (UTC)

Personally I like the category. It makes me feel better about myself to think that Arthur Conan Doyle is not "undefeatable", that I am "unexceeded" by illiterates like Clark Ashton Smith, "unexcelled" by L. Frank Baum, Benjamin Harrison, or Grover Cleveland. DAVilla 01:01, 8 April 2007 (UTC)
I'm astounded to learn that prominent and prolific authors and speakers are never given license to deviate from accepted norms. Is that what you are suggesting? --Connel MacKenzie 16:35, 10 April 2007 (UTC)
I'd object. To me, those two terms are not synonymous. Y'all is "non-standard", in that it's not part of any dialect of Standard English; but irregardless is "illiterate", in that no one who uses it can really be considered an English speaker. (O.K., that's an exaggeration; but you know what I mean.) Nonetheless, I really don't like the term "illiterate", because it makes it sound like people who use the term are illiterate, which is needlessly provocative; I'd prefer a name like Category:Malapropism or something, which describes the term instead of its users. —RuakhTALK 05:33, 8 April 2007 (UTC)
On further reflection, it seems we have (a) certain editor(s) who is/are a bit trigger-happy in using that tag. Absent any objective way of distinguishing non-standard forms from "illiterate" ones, I think we'll have to make do and let the "non-standard" tag do double duty. —RuakhTALK 20:55, 11 April 2007 (UTC)
Jesus christ. We have an "illiterate" category? And us two is in it?!? There is no hope for Wiktionary. --Ptcamn 13:40, 8 April 2007 (UTC)
"us two" is good enough for Charles Dickens and Sir Arthur Conan Doyle. Well known illiterates. Robert Ullmann 18:25, 8 April 2007 (UTC)
And I've just noticed Template:smarter. I quit. --Ptcamn 13:58, 8 April 2007 (UTC)
Yeah, someone needs to be informed that "us two" is perfectly acceptable as an object. But I guess it's just easier to classify it as a "misspelling". Want to look smarter? Try using "misconstruction" instead. ;-) Personally I would delete on the grounds that it's not idiomatic. "Us" is commonly used, incorrectly of course, as a subject whenever other words, whether they be "two" or what have you, obscure that fact.
Don't give up only because of Template:smarter, which is relatively new and not widely used. I only ran across it from this thread. It has four inclusions at the moment, only one of them grossly incorrect. On the other hand, 75% isn't a very good score, is it? DAVilla 14:10, 8 April 2007 (UTC)
I would argue that "us" as a subject is not incorrect in any objective sense. There are certainly people who regard it as incorrect, and we should note that, but there's a very big difference between being frowned upon and being somehow intrinsically wrong.
Template:smarter isn't that bad, but it's the straw that broke the camel's back. There seem to be too many people here who are more interested in telling people how to speak than informing them about words. (And if you think what people need to be informed about is how to use words correctly, ever heard of covert prestige?) --Ptcamn 14:37, 8 April 2007 (UTC)
I created Template:smarter as an attempt to replace the illiterate tag and still please Connel MacKenzie. It took some prodding, but I managed to get him to articulate why he dislikes undefeatable (see "Undefeatable :)" at his talk page), and the reason is that he thinks "invincible" sounds smarter. So, I thought, we can convey that in a less derogatory manner. But he insisted on keeping the illiterate tag nevertheless. If it were up to me, we wouldn't have put any prescriptivism at undefeatable, but after the soap opera that went down on Connel's talk page over it, I'm not passionate enough about it to press the point :P Language Lover 15:10, 8 April 2007 (UTC)
(Linked the user page above. DAVilla 18:24, 8 April 2007 (UTC))
Wow. Having read his talk page, it seems like Connel's pot has cracked. The OED isn't a real dictionary now? Crazy.
I'm removing both Template:smarter ("invincible" is not exactly synonymous anyway) and the illiterate tag from undefeatable. If anyone doesn't like it, cite a source. Verifiability, not truth, and all that. --Ptcamn 15:38, 8 April 2007 (UTC)
I'm in over my head in all this, having never studied the subtle differences between UK English and US English (and other dialects), but one thing Connel brought up near the bottom of that discussion was the possibility that undefeatable might be {{US}} or {{UK}} (I'm not sure which he meant since I don't know where he's from; he just said "across the Atlantic"). Never having rigorously studied English, I defer all judgement on the word to those who have, and I look forward as always to learning awesome new things from everyone :-) Connel's intentions are good, regardless of the specifics of individual words, and since I've done some crazy things in the past myself, I'd really hesitate to call anyone crazy :) Language Lover 17:18, 8 April 2007 (UTC)
The OED is a major dictionary by any measure, and we all know the entry is there. Keffy claims that Webster's Third and Random House Unabridged also list it, and I take him at his word. Though I can't access it, M-W online does seem to have an entry. So the removal of "illiterate" on that page was entirely justified in my opinion. This for a word that an admin had considered deleting on sight, and thankfully didn't. DAVilla 18:24, 8 April 2007 (UTC)
I have an observation here: I grew up in the US, with quite a bit of exposure to British television (The Avengers to Monty Python to The Saint etc.) and literature (everything). I now live in Kenya, in the opposite circumstance; everyday English is Commonwealth English, and I only hear pure GenAM on television (Boston Legal ;-). I'd use "undefeatable" in UK/Commonwealth English, and it sounds right, but at the same time it sounds wrong in US English (should be "invincible" or some other.) I don't see the distinction as very strong though; but then perhaps I wouldn't. Robert Ullmann 18:51, 8 April 2007 (UTC)
It's not a very common word in UK English either. That doesn't mean it's wrong; it's just fairly rare, on either side of the Atlantic. Widsith 21:48, 8 April 2007 (UTC)
Both terms sound fine to me, though I wouldn't expect to hear them under the same circumstances. An invading army of super-villains might be described as invincible (just before the heroes show up and defeat them), while a sports team with a long string of spectacular successes in a season might be termed undefeatable (or undefeated). --EncycloPetey 23:28, 9 April 2007 (UTC)
In what region is that acceptable? "Undefeated" is very commonly used, but saying a team is "undefeatable" is wrong. --Connel MacKenzie 16:39, 10 April 2007 (UTC)
Who says it's wrong? Have you seen this in a style guide somewhere? Because as long as it's in dictionaries it doesn't seem very wrong to me. Do you have any references to back it up? Widsith 13:39, 11 April 2007 (UTC)
That is the rub; it is absent in all normal "abridged" English dictionaries. That's what started all this, in the first place. --Connel MacKenzie 03:29, 12 April 2007 (UTC)
But...abridged dictionaries by definition do not include every word, only the more common ones. No one is saying undefeatable is common, just that it's not wrong. Widsith 14:22, 12 April 2007 (UTC)

Wiktionary:Alternative spellings

Language Lover put together some text on this long-debated topic, which I moved in order to begin a policy page on the matter. It's about time we set some of our ideas down as policy since the issue is raised quite regularly in various fora. The name for the new page follows the style of Wiktionary:Pronunciation and Wiktionary:Etymology in using the standard header form from the ELE. --EncycloPetey 23:32, 9 April 2007 (UTC)

I think it should be changed to "Alturnitave spelengs". Ha, joking. bd2412 T 23:38, 9 April 2007 (UTC)
I'd rather that the more common spelling always be the main entry, except in cases of regional spellings. (I'd actually be O.K. with always having the more common spelling be the main entry, even in cases of regional spellings, but I know that people from certain regions would object to that, so whatever.) —RuakhTALK 05:00, 10 April 2007 (UTC)
Good start, Language Lover. I think this really could develop into some kind of official status. I have a few nit-picks (which I hope don't detract from the fact that overall, I like the initiative you have taken) here:
  1. The comment about "saving disk space" seems simply wrong. The only rationale I've heard for the "alt" designation is to make them easier to keep in sync. That should be reworded before people start screaming "wiki is not paper."
  2. When both terms are obscure, the "alt" entry should have a gloss (on the same line). There should perhaps be some note that an "alt" designation can be supplemented with a full definition at any time, and that doing so is generally encouraged.
  3. "One hundred years" is the limit we've used for "obsolete," not two hundred (but even that has received lots of objection from people who want it to be 50 years or 25 years).
  4. For "leet", instead of saying "in some rare cases", I think it should say "for the five that have entered the general lexicon," so people understand that no new leet inventions are welcome.
  5. Has attestation been an issue for regional spellings? Rather than emphasize WT:CFI, it might be reworded to emphasize the different regional tags we are supposed to use (since that seems to be the more ambiguous distinction, from time to time.) As a formatting note, because the regional tags also add categories, there should be mention of when to use the tag, and when to simply list it in italics.
Nice start! --Connel MacKenzie 15:02, 10 April 2007 (UTC)
Great observations; I took the liberty of adding some of them to the entry. Thanks for correcting me about the obsolete designation. By the way, the idea for this entry comes more from HippieTrail than myself. I agree about the tags, since those should help people who are designing third party software that uses Wiktionary. Thanks for sharing your senior wisdom here, Connel! Language Lover 01:58, 11 April 2007 (UTC)

Incidentally, I think there should be a fourth category, variant or alternative form or something, for when a word has multiple spellings that are pronounced differently, but that are still clearly the same word. One region-based example of this is AmE negotiation (with two [ʃ]s) -slash- BrE negociation (with one [s] and one [ʃ]); one non-region-based example is pescatarian (with a [sk]) -slash- pescetarian (with a [s]). —RuakhTALK 16:14, 11 April 2007 (UTC)

"Listed as a spelling error by..."

Given the enormous bias in the "illiterate" section above, I've simply started a new section.

Currently, Wiktionary has no clean way of identifying words that all/most/some spellcheckers list as errors. Since I am under direct attack for applying what I think are reasonable tags to such things, I'd like to take a step back, and ask this community what the best approach might be.

Some of en.wiktionary's entries have "Dictionary notes" sections. While that one section has not had any reasonable criticism offered against it and has had considerable support, the occurrences of that section often are covertly removed, inexplicably.

But even then, the lack of a dictionary listing is the sort of thing that can change from revision to revision. Yet at the same time, a "spellchecker" is not a "dictionary" and so can't/shouldn't be listed the same way, anyhow. On the other hand, perhaps regular contributors here would like to see both sections, for all terms where a spelling is an obvious mistake?

I'd like to see a section in Wiktionary entries that lists which spellcheck programs list a term as a misspelling. Since finding the localization may be difficult for some, it would make sense to be very explicit about what the format should look like, eventually. A secondary concern, of course, is which spellcheckers are deemed "widespread" enough.

Of important note:

Please let me be absolutely crystal clear that I am not proposing any WT:CFI change: only the very most common misspellings meet WT:CFI currently. I am only talking about entries that are clearly garbage, are overwhelmingly considered errors, yet easily pass our WT:CFI (which I grumble about so often).

So, would adding a ===Listed as misspelling=== section work? If so, what format would you prefer? --15:21, 10 April 2007 (UTC) This comment was typo'd as three tildes instead of four. Sorry...here are four tildes: --Connel MacKenzie 15:41, 10 April 2007 (UTC)

We want to avoid propagating new section headers when possible. Considering that, I can see two possibilities. Either we can include the information in Usage notes, or we can use a general header ===Spellings===, which would be more flexible and allow for other sorts of spelling information to be included in the section, such as Alternative spellings and Hyphenation. It would also provide a better place to include alternative script information, such as for Serbian entries where there is a standard spelling in both Roman letters and Cyrillic letters. --EncycloPetey 15:33, 10 April 2007 (UTC)
Adding new stuff in ===Spellings===? For misspellings? I could see listing these as a sub-section of ===References===, if that's what you mean? --Connel MacKenzie 15:43, 10 April 2007 (UTC)
I'm thinking in terms of a much bigger picture than just the one issue, but I see your point. My concept could accommodate a list of spellings "listed as misspellings" with references keyed to the References section. --EncycloPetey 15:48, 10 April 2007 (UTC)
But that is addressed by the =alt spellings= header elsewhere, which speaks not at all to the entry itself.

The thing is, certain pinheads think every random sequence of characters put together merits inclusion. Our WT:CFI generally supports such idiocy, despite the fact that a usable dictionary will clearly tell someone when they are misspelling, misusing or mis-constructing a term. What else is a dictionary used for, after all? Obscure linguistic research? That's what makes the OED not a true dictionary; it isn't useful for anything but the most bizarre "descriptivist" questions.

Many terms entered here are pure garbage. But garbage that passes WT:CFI. How should the garbage be labelled? --Connel MacKenzie 16:11, 10 April 2007 (UTC)

Please don't insult contributors with pejoratives like pinhead, since we are all judged by how we speak. Anyhow, I think you're way off in saying this. Can you cite even one example of a regular (non-anon) editor who "thinks every random sequence of characters put together merits inclusion"? I'm probably pretty radically far leftist when it comes to inclusion, but I still carefully research every word I submit or defend. Most are way less liberal than I. The overwhelming majority of our editors do a fantastic job of researching words :-) Language Lover 18:47, 10 April 2007 (UTC)
There are a few different kinds of garbage, and we need to take care to distinguish them. If something is genuinely a (common) misspelling — a (common) non-standard spelling of a word that does have a standard spelling — then the appropriate way to handle it is to define it as "Common misspelling of foo." If it's actually a completely non-standard word that meets CFI, and an editor encountering it would be best advised to replace it with a completely different word (or phrase), then the appropriate way to handle it is to define it normally, but to mark it with {{context|non-standard}} (or {{context|obsolete}} or whatnot, if it's an issue of a formerly standard word), and provide a usage note that explains the situation.
Of course, that's if everyone agrees it's a non-standard spelling or non-standard word. If it's arguably a common misspelling, then we should define it as "Common misspelling or alternate spelling of foo", and if it's arguably a non-standard word, then we should tag its definition with {{context|possibly non-standard}} and provide a usage note that explains the situation.
—RuakhTALK 16:46, 10 April 2007 (UTC)
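Ruakh's labelling procedure amounts to a small decision table: is it a misspelling or a non-standard word, and is that classification agreed or only arguable? The following is a minimal Python sketch of it, assuming definitions are written as single wikitext lines. The function name `definition_line` and its parameters are hypothetical, and the link and template syntax in the output only illustrates what Ruakh describes; it is not a settled Wiktionary format.

```python
def definition_line(kind: str, text: str, disputed: bool = False) -> str:
    """Hypothetical helper (illustration only): build a definition line.

    kind     -- "misspelling": text is the standard spelling being deviated from
                "nonstandard": text is the ordinary definition of the word
    disputed -- True when the classification is only arguable
    """
    if kind == "misspelling":
        if disputed:
            return "# Common misspelling or alternate spelling of [[" + text + "]]."
        return "# Common misspelling of [[" + text + "]]."
    if kind == "nonstandard":
        # Defined normally, but tagged; a usage note should accompany it.
        context = "possibly non-standard" if disputed else "non-standard"
        return "# {{context|" + context + "}} " + text
    raise ValueError("unknown kind: " + kind)


print(definition_line("misspelling", "definitely"))
print(definition_line("nonstandard", "Impossible to defeat.", disputed=True))
```

The usage note Ruakh mentions is not generated here; in this sketch it would still be written by hand next to the tagged line. Connel's alternative, listing which resources flag the term, is not modelled.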
That is very "descriptivistic" and useless to say. If 9 out of 10 spellcheckers list something as a misspelling, to then call it "possibly non-standard" is inaccurate. That's why, rather than someone taking my word that something is "illiterate", we should simply list which resources do list it as wrong. --Connel MacKenzie 16:57, 10 April 2007 (UTC)
You realize that your argument contradicts your conclusion? If you think that consensus can never be reached on any word, then no words can be labeled non-standard; instead, all such words should be labeled possibly non-standard with usage notes explaining who says so. —RuakhTALK 18:28, 10 April 2007 (UTC)

This is some good discussion :-) I just wanted to wonder aloud, why are we fussing about spellcheckers? A spellchecker is never intended to be a flawless arbiter of English; rather, it is intended to catch typos. Anyone who uses one very frequently knows it's not at all uncommon to get false positives (positive meaning "misspelling") from them, especially anyone who writes anything remotely technical or context-specific. What's more, the English language is not a fixed, crystallized thing, else anyone could just pick up the original Beowulf and start reading. Spellcheckers can usually be expected to be a few years behind the actual language as used and spoken by people with a pulse. We are not the Borg, nor is this the 10th Edition Newspeak Dictionary! :-D By the way, don't read this as an attack against you, Connel; I totally love all the sweet work you do, so please don't accuse me of attacking you! :) Language Lover 18:47, 10 April 2007 (UTC)

Do spellcheckers even list misspellings? It was my understanding that they only list the correct spellings, with misspellings determined by absence from that list. --Ptcamn 18:40, 10 April 2007 (UTC)

Some of them do, yes. In MS Word, a number of common misspellings are programmed in so that spelling is checked while you type. This can be frustrating when the word segreant is unhelpfully "fixed" to become sergeant while typing. --EncycloPetey 18:43, 10 April 2007 (UTC)
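The two mechanisms Ptcamn and EncycloPetey describe can be sketched as a word list, where anything absent is flagged, plus a separate table of known misspellings that are replaced as you type. This is a toy illustration in Python: the word list and the autocorrect table are made up for the example and say nothing about what MS Word or any other real spellchecker actually contains. Storing the list as a hash set keeps each lookup roughly constant time.

```python
# Toy data, invented for this illustration only; not any real product's lists.
WORD_LIST = {"sergeant", "invincible", "undefeated", "team", "the"}
AUTOCORRECT = {"segreant": "sergeant"}  # "known misspelling" -> replacement


def is_flagged(word: str) -> bool:
    """A word is flagged simply because it is absent from the word list."""
    return word.lower() not in WORD_LIST


def autocorrect(word: str) -> str:
    """Silently replace anything found in the autocorrect table."""
    return AUTOCORRECT.get(word.lower(), word)


for w in ["invincible", "undefeatable", "segreant"]:
    print(w, is_flagged(w), autocorrect(w))
```

Here undefeatable is flagged only because the toy list happens to omit it, and segreant is "fixed" even though EncycloPetey meant it: absence from a list, or presence in a replacement table, is a property of the list rather than of the word.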
Regarding segreant: why would you expect such a rare term not to be listed as a probable typo? --Connel MacKenzie 13:48, 11 April 2007 (UTC)
Absolutely it should be considered a probable typo. The problem is that the *expletive* program (this is Microsoft we're talking about here) decides it knows what you really meant, and happily changes it for you. Without your consent. Without even informing you, in fact. And these are the default settings. And this is the standard editing application, on the standard operating system, and none of that is going to change any time soon. DAVilla 23:55, 11 April 2007 (UTC)
I do not use that program regularly, but am familiar with many deficiencies of that "auto-correct" feature. If you use it frequently, you have my sympathy. (Last time I checked, it still "auto-corrected" both "Connel" and "MacKenzie" to incorrect spellings.) --Connel MacKenzie 03:36, 12 April 2007 (UTC)
Yes, spellcheckers function by "stoplists"; otherwise all checks would be on the order of magnitude of N**length instead of N. From what I've looked at, most use several types of stoplists. Vlad, I know you'd like to think that English has no rules at all, but the fact remains that there are rules for word formation and spelling. Being purely descriptive, ignoring those rules, is a massive disservice to anyone that wants to actually use content created here.
That is why, almost every day, I repeat that WT:CFI is broken. With very few exceptions, all the words nominated on WT:RFV have had problems in one spellchecker or another.
So, this week, I've decided to take my "fussing about dictionaries" and rephrase the complaint in terms of "fussing about spellcheckers" to try to reach certain pinheads that call me crazy. Pure "descriptivism" is "crazy", not pure "prescriptivism." At least the limits of prescribing spellings are essentially known. On the other hand, there is no limit to the stupidity one can find when searching usenet archives in the name of descriptivism. <insert "infinite" Einstein quote here>
Yes, I understand that it is "easier" to be lazy, and take only a "descriptive" approach. But, sorry, the English language does have rules. And again (as I always seem to have to repeat in different ways) a resource with a low signal to noise ratio cannot be useful.
The new stats have been encouraging. I honestly thought the signal to noise ratio had already fallen below 1.00. Evidently, such cries of imminent demise are still early.
--Connel MacKenzie 13:48, 11 April 2007 (UTC)
Language rules are created based on the language people speak. If, overnight, 99% of English speakers suddenly decided to spell night as "nite", the rules would change to accommodate this. The "rules" are really more of a model, like Newtonian physics is a model of the true universe. Descriptivism isn't the lazy route at all! If we wanted a pure prescriptivist route, we could just stop all further development right now and say, "these are the English words, anything else is wrong". Incidentally, your objections to "undefeatable" go more against the rules than for the rules: very simply, there are rules which say you can add "able" to a transitive verb to get an adjective, and add "un" to any adjective where it is phonologically acceptable and doesn't have a particularly unusual etymology. You are in essence saying undefeatable is an exception to those rules. Well, maybe you're right (I don't know myself), but I think very few editors so far have been convinced. (comment continues below)
"Stop right now?" Um, how do you reach that conclusion? We didn't start out with complete coverage of the English language, nor have we covered more than a fraction of what most dictionaries have for "standard English." No, descriptivism is the cheap way out; to look up the prescriptive rules for correct use in each case is much harder. To properly cross reference all misuses with accurate indication of why they are wrong is much harder. --Connel MacKenzie 03:36, 12 April 2007 (UTC)
Impossible, in fact: there are no reasons; it's all convention. Widsith 14:20, 12 April 2007 (UTC)
(comment continued from above) The purpose of language rules is to simplify things so that people don't have to rigorously research every word the first time they use it. But we are in the business of doing such research, so to us the rules are just a guideline (which usually applies, but not without exceptions). Let us embrace the ever-changing nature of language and rejoice in the beauty thereof :-) If we don't, it'll just change anyway and we'll be left yelling at those damn kids to get off our lawn! ;) Language Lover 16:17, 11 April 2007 (UTC)

Spellcheckers are relatively small dictionaries. If they were the basis for what we should include, we wouldn't have bothered creating Wiktionary in the first place; we'd just use an actual spellchecker. I have no problem labelling misspellings as such (though personally I'd just leave them out altogether), but I can't help feeling that Connel's idea of a misspelling is any word he's never heard of that gets a red wiggly line under it in MS Word. Sadly — not sadly! — there are many many perfectly valid words which fall under those criteria and which are neither garbage nor added by pinheads hell-bent on corrupting the rules of our language. If authors took out every word not recognised by their word processors, then Ulysses would read like The DaVinci Code. Widsith 14:02, 11 April 2007 (UTC)

Who says they are words I've never heard of?
Who says there is a GFDL spellchecker? There certainly wasn't when I discovered Wiktionary.
Please note clearly the "pinhead" comment was about the previous insult.
I've seen one person claim that one British source dictates that "-able" can be added willy-nilly. I do not believe that to be true in the general case, in en-US, nor in the specific exception case of "undefeatable".
Just how many years separate "Ulysses" and "The DaVinci Code"?
--Connel MacKenzie 16:24, 11 April 2007 (UTC)

How has Wikimedia Changed your Life?

This message is being crossposted around village pumps and mailing lists - apologies if you receive it more than once!
Have any of the Wikimedia projects had an effect on you in real life, or do you know of someone, or some group of people, who use our projects in real life? If so, we want to hear from you at m:Success Stories - How has Wikimedia Changed your Life?. The hope is that this page can become somewhere to which we can point members of the press so that they can immediately get an idea of the usefulness of our projects. Please, take a look, and add your stories! Martinp23 16:02, 10 April 2007 (UTC)

By editing Wiktionary, I've developed super powers and ninja skills. I can now turn invisible, fly, and turn back time by shoving the Earth the opposite way around its axis. Everyone who wants super powers of their own should come define some words for us!!! :D Language Lover 18:25, 10 April 2007 (UTC)
Haha - I knew there'd be at least one ninja :D Martinp23 18:31, 10 April 2007 (UTC)
Now I feel ripped off... - [The]DaveRoss 02:00, 11 April 2007 (UTC)

Proposed order for "see also" template

Some of the terms which use this template have a large number of terms stuck in there, enough so that I would like to propose the following guidelines on how to order terms in the template. I propose the following scheme:


The order of terms in the template should be:

  1. The basic uncapitalized and unadorned script.
  2. A variation that merely capitalizes the first letter of the term (e.g. Bush as a surname; Atheist as atheist in German - many German nouns being identical to English but capitalized).
  3. A variation that capitalizes all letters of the term (e.g. MOD).
  4. Variations that capitalize a letter other than the first letter (e.g. pH), or more than one but not all letters of the term.
    Note that this order of precedence in capitalization should be the same for similar variations further down the list; so pan- comes immediately before Pan-, and so forth.
  5. Variations containing punctuation:
    1. A prefix followed by a hyphen (e.g. man-).
    2. A suffix preceded by a hyphen (e.g. -man).
    3. Followed by a period (e.g. co., Sun.).
    4. Preceded by a period (e.g. .co).
    5. With a period between letters (e.g. t.ex.).
    6. Followed by an apostrophe (e.g. ca').
    7. Preceded by an apostrophe (e.g. 'kay).
    8. An apostrophe between letters (e.g. c'est).
    9. All other terms incorporating punctuation, order to be determined as cases arise.
  6. Variations containing a diacritic over a single letter, with the earliest diacritized vowel listed first.
    1. I prefer to order diacritics with Chinese tonal order first (e.g. mān, mán, mǎn, màn) followed by umlauts, breves, tildes, carets, rings, cedillas, etc., in no particular order (though I think one should be set).
  7. Variations containing diacritics over two vowels (e.g. mêlée, résumé).
  8. Variations containing diacritics over three vowels (can't think of any).
  9. Variations containing a diacritic over (or under) a single consonant (e.g. ça, ĉu).
  10. Variations containing a diacritic over (or under) a consonant and a vowel (e.g. çà, çã).
  11. Variations containing multiple diacritics over a single letter (e.g. , ).

Once an order is settled on, it should be possible to instruct a bot on that order and have it fix all see also templates accordingly. Cheers! bd2412 T 23:31, 10 April 2007 (UTC)

Looks largely OK to me, but this isn't really a Grease Pit issue. There is not a technical problem to be addressed. --EncycloPetey 23:37, 10 April 2007 (UTC)
Hmmm... Beer Parlor? bd2412 T 23:46, 10 April 2007 (UTC)
One change I'd suggest to simplify this is to streamline the bit about diacriticals at the end. Order these first by number of diacriticals (1, 2, etc.), then by appearance order (diacriticals on first letter, second letter, etc). It makes the process needlessly complex to worry about whether the diacritical is associated with a vowel or a consonant. I would also expand the idea of a diacritical to include letter variants such as Polish slashed-L (Ł, ł). When there is a tie, run the order from top to bottom (over, through, under) the character in question. --EncycloPetey 23:53, 10 April 2007 (UTC)
So ĉu would come before, say, cù because the diacritic is on the first letter? But both would come before çà, which has multiple diacritics? bd2412 T 00:40, 11 April 2007 (UTC)
Yes. Though I'm not sure any of this will help the list much on pages like a. --EncycloPetey 00:44, 11 April 2007 (UTC)
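EncycloPetey's simplified ordering (count the diacritics first, then look at which letter carries the first one) can be expressed as a sort key. The sketch below is a minimal Python version, assuming that decomposing each term with Unicode NFD is an acceptable way to find diacritics; `see_also_key` is a hypothetical name. Letter variants such as Polish ł have no decomposition, so this sketch does not count them, and it leaves out the capitalization and punctuation tiers from bd2412's list as well as the over/through/under tie-break.

```python
import unicodedata


def see_also_key(term: str) -> tuple:
    """Hypothetical sort key: (number of diacritics, index of first marked letter, term)."""
    marks = 0
    first_marked = None  # index of the base letter carrying the first diacritic
    base_index = -1
    for ch in unicodedata.normalize("NFD", term):
        if unicodedata.combining(ch):  # a combining mark, i.e. a diacritic
            marks += 1
            if first_marked is None:
                first_marked = base_index
        else:
            base_index += 1
    position = first_marked if first_marked is not None else -1
    # Final tie-break on the term itself (plain code point order).
    return (marks, position, term)


terms = ["\u00e7\u00e0", "c\u00f9", "\u0109u", "cu"]  # çà, cù, ĉu, cu
print(sorted(terms, key=see_also_key))
```

This reproduces the case bd2412 and EncycloPetey agreed on: ĉu before cù, and both before çà.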
Hence Appendix:Variations of 'a' (like a hammer to break the glass in case of emergency, we can trot out an appendix for terms with a truly monumental number of forms; this includes all of the vowels, so take single-letter terms out of your thinking on this). Ok, I like it. bd2412 T 01:11, 11 April 2007 (UTC)
Although I do think Wiktionary needs to establish a precise order for alphabetization, which varies from language to language, I don't think this is such a crucial issue for see also sections at the top because of the appendix option for variations. (Incidentally there are several pages that currently need such appendices.) But I do think some general guidelines should be laid out, which would establish a partial order of the strings, to use the mathematically precise term. That is to say, as far as diacritics, letter variants (including ligatures), and that sort of thing go, let contributors use their best judgements. Bots would be able to insert words wherever the owner might best try to fit them, and if people want to rearrange them then the bot would not override that. However, the bot would be able to ensure something like:
Incidentally I'm not sure if I agree even on the order of capitals and punctuation. What would be best would be to find a lot of examples, even short ones like compound words with optional spacing and hyphenation, determine which ones we agree on, and derive a partial order from that. DAVilla 08:25, 11 April 2007 (UTC)
I like the unrestricted format we have currently; I try to organize the entries in this navigation tool by order of frequency. The recent barrage of overloading the {{see}} template has been pretty silly. The purpose is to help people find the entry they actually are looking for, not to list irrelevant obscurities in an incoherent fashion. --Connel MacKenzie 03:43, 12 April 2007 (UTC)
Interesting point, but what if someone is actually looking for an obscure term (particularly one using diacritics that can't readily be typed into the search window)? If I'm looking for ĉu or or çã, I'd be inclined to type in cu or nu or ca, and would expect to either find them there or find a link to them. As for the order, I think we need some kind of normalization so that (eventually) the templates can be completely bot generated/updated. A bot should be smart enough to pick any given word (man, for example) and find every other word composed of the letter sequence "m" (cap or lowercase, with or without diacritics), "a" (same) and "n", with or without punctuation at any point, and to stick that word in the right place on the "see also" template on all other pages having that sequence. That's my thought and I'm sticking to it. Also, I think the "see also" template should handle up to twelve, and we should thereafter go to an appendix. Cheers! bd2412 T 04:20, 12 April 2007 (UTC)
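bd2412's bot idea amounts to reducing every title to a skeleton (lower case, diacritics stripped, punctuation dropped) and grouping the titles that share a skeleton. A minimal Python sketch follows, assuming NFD decomposition plus `str.casefold` is good enough for the languages involved. `skeleton` and `see_also_groups` are hypothetical names, and this is not a description of how any existing extension works.

```python
import unicodedata
from collections import defaultdict


def skeleton(term: str) -> str:
    """Hypothetical helper: lower-case, strip diacritics, drop non-alphanumerics."""
    decomposed = unicodedata.normalize("NFD", term.casefold())
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch.isalnum()
    )


def see_also_groups(titles):
    """Hypothetical helper: map each shared skeleton to the titles reducing to it."""
    groups = defaultdict(list)
    for title in titles:
        groups[skeleton(title)].append(title)
    # Only skeletons shared by two or more titles need a "see also" line.
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


titles = ["man", "Man", "MAN", "man-", "-man", "m\u00e1n", "men"]
print(see_also_groups(titles))
```

Each group could then be ordered with whatever key is agreed on and written into the template on every page in the group; groups larger than twelve could be diverted to an appendix, as bd2412 suggests. Letters like ł, which have no decomposition, would not be folded into l by this sketch.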
On http://wiktionarydev.leuksman.com/, running Hippietrail's extension, they ARE fully automated. We need that code here. --Connel MacKenzie 06:07, 12 April 2007 (UTC)
I propose to simply have them sorted by a bot, that uses some or other built-in sorting algorithm. Easy, simple. H. (talk) 11:53, 12 April 2007 (UTC)
  • I used to put a lot of thought and work into sorting these. My ordering was much like the proposed one but I also sorted by type of diacritic as per which are most common in English words to which are most exotic to English speakers unaccustomed to other languages: á, à, ä, â, cedilla, macron, others.
  • But when I was developing the DidYouMean extension to automate the entire "disambiguation see also" process, I realized this sorting would be too complex and slow to impose on MediaWiki, and I just settled for the natural Unicode order because that at least meant it would be fast and consistent. — Hippietrail 19:01, 13 April 2007 (UTC)
    Unicode order is fairly similar to this "more familiar" first order: the diacritics (specific combinations) common to Western European languages are coded from 00A0 to 00FE (Latin-1), and then other characters less and less familiar. So this isn't bad at all (except that it may look fairly random in some cases ;-) Robert Ullmann 19:09, 13 April 2007 (UTC)
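Plain Unicode order, Hippietrail's fallback, is what Python gives by default: strings compare by code point. The sketch below shows it, with one added assumption: terms are normalized to NFC first, so that a precomposed á and an a followed by a combining acute sort the same way. The accented letters of Latin-1 sit at U+00C0 to U+00FF, ahead of Latin Extended-A (from U+0100), so á sorts before ā, in line with Robert Ullmann's point. `unicode_order` is a hypothetical name.

```python
import unicodedata


def unicode_order(terms):
    """Hypothetical helper: sort by code point after NFC normalization."""
    return sorted(unicodedata.normalize("NFC", t) for t in terms)


# "ma\u0301n" is a + combining acute; NFC turns it into precomposed "m\u00e1n".
print(unicode_order(["m\u0101n", "ma\u0301n", "man", "Man", "m\u00e0n"]))
# -> ['Man', 'man', 'màn', 'mán', 'mān']
```

Capitals sort before lower case here (M is U+004D, m is U+006D), which differs from bd2412's proposed order; that is part of the trade-off between a fast, consistent order and a hand-tuned one.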
    Yet further proof that Wiktionary is Unicode's bitch.
    What about the double-letter stuff? Shouldn't that at least be put at the tail? DAVilla 18:13, 14 April 2007 (UTC)
    You mean two letters with diacritics and/or letters with two diacritics? I suppose the rare single letter with two diacritics has its own unique unicode combo, and should be ordered accordingly if that's what we use. As for two letters with diacritics, I would put them at the end and use the first diacriticized letter as the basis for sorting against any others. Cheers! bd2412 T 05:42, 17 April 2007 (UTC)
    No, I'm not talking about æae or the like. I mean r vs. rr in pero and perro, o vs. oo in good and god, etc. These should go at the end of the list regardless of how the rest are ordered. DAVilla 06:06, 21 April 2007 (UTC)
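DAVilla's rule, that variants differing only by a doubled letter (pero and perro, god and good) go at the end regardless of the rest of the ordering, could be applied as a final pass. A minimal Python sketch, assuming a variant "differs only by doubling" when collapsing runs of the same letter makes it match the headword, ignoring case. The function names are hypothetical, and plain `sorted` stands in for whatever main ordering is adopted.

```python
import re


def collapse_doubles(term: str) -> str:
    """Hypothetical helper: reduce every run of a repeated character to one."""
    return re.sub(r"(.)\1+", r"\1", term)


def is_doubled_letter_variant(term: str, headword: str) -> bool:
    """True if term and headword differ (ignoring case) only in letter doubling."""
    t, h = term.casefold(), headword.casefold()
    return t != h and collapse_doubles(t) == collapse_doubles(h)


def order_with_doubles_last(headword: str, terms):
    """Sort the ordinary variants, then append the doubled-letter ones."""
    ordinary = [t for t in terms if not is_doubled_letter_variant(t, headword)]
    doubled = [t for t in terms if is_doubled_letter_variant(t, headword)]
    return sorted(ordinary) + sorted(doubled)


print(order_with_doubles_last("god", ["good", "God", "GOD", "Good"]))
# -> ['GOD', 'God', 'Good', 'good']
```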

Outcome of rfd and rfv

Is there any way of marking terms as good after they have been nominated for deletion or verification and then shown to be acceptable? Clearly, adding citations is one way, but maybe we need a note on a page (or against an individual definition) saying "This term has been confirmed as being good" or something like that. I ask because apparently asdf has been nominated for deletion for a third time. — Paul G 15:00, 11 April 2007 (UTC)

It's supposed to go on the talk page (see Category:Verification templates). It's just a pain to have to do, and automation has fallen through the gaps (see {{process}}). DAVilla 23:37, 11 April 2007 (UTC)
Extended discussion (more feedback really is needed): Wiktionary:Grease pit archive/2007/April#Better, more, faster archiving. --Connel MacKenzie 06:05, 12 April 2007 (UTC)

Wiktionary:About Persian

There is a new page at Wiktionary:About Persian about issues related to entries in Persian (Farsi), where any comments or ideas are welcome :-D Pistachio 00:07, 13 April 2007 (UTC)

Italian compound words

Recently someone made a perfectly valid request for fargli in Wiktionary:Requested articles:Italian. This word is a compound of the verb fare and the pronoun gli. There must be more than a million such compounds, and adding them would be a nightmare. However, you come across them all the time, and newcomers to the language don't always recognise them for what they are. If we were to add them, it would be the job of a bot (and I am considering applying for bot-status in order to add several thousand Italian verb forms).

  1. Do such words merit inclusion (and meet our CFI)?
  2. Would "Verb" be a reasonable part of speech?
  3. Would an explanation e.g. "Compound of the verb (...) and the pronoun (...)" be acceptable, as there isn't really an easy translation?
  4. What Category should they go in (we have "Italian verb forms")?

Thoughts please. SemperBlotto 14:36, 14 April 2007 (UTC)

Seeing as, by my understanding, we're already committed to supporting polypersonal agreement in languages like Georgian, we might as well support the limited polypersonal-agreement-like compounds in Italian, Spanish, and so on, especially in those languages where it can affect spelling (fare + gli → fargli and not *faregli, haciendo + lo → haciéndolo and not *haciendolo, etc.). —RuakhTALK 15:17, 14 April 2007 (UTC)
P.S. Yes to questions 1–3. As to the category, I think Category:Italian verb-pronoun compounds would probably be clearest. —RuakhTALK 15:46, 14 April 2007 (UTC)
  1. Yes
  2. Yes
  3. Yes
  4. "Italian verb forms" seems as good as anything. Widsith 15:31, 14 April 2007 (UTC)

OK, I have made a first attempt at fargli, but won't add any more for the moment (unless requested). SemperBlotto 17:07, 14 April 2007 (UTC)

Yes, yes yes etc. But please use {{form of}} or create a derivative from it, such that the correct css is applied. And maybe re-read WT:ELE: Capitals please, #* before example lines, italicise examples but not their translations, do not bolden the translation etc. See my last change to the page. But please make the template yourself (as an ‘advanced’ speaker, you are more aware of special needs that might occur); I am willing to help if you don’t know how. H. (talk) 21:38, 15 April 2007 (UTC)
See Wiktionary:Votes/2006-12/form-of_style. H. (talk) 21:45, 15 April 2007 (UTC)

WT:STATS#Detail

I've corrected some errors in WT:STATS on this pass; I had previously been counting "#:" and "#*" lines as definition lines. I refined the "form of" and "slang" detection to also notice those respective stopwords even if not formatted properly. So "English slang" jumped from 6,000 to almost 18,000. Enjoy the new numbers! --Connel MacKenzie 16:30, 14 April 2007 (UTC)
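A rough sketch of the counting rule described above, assuming each entry's wikitext is available as a string: lines starting with "#" count as definitions, "#:" (usage example) and "#*" (quotation) lines do not, and the "form of" and "slang" stopwords are matched anywhere on a definition line, whether templated or not. The function name and patterns are hypothetical; this is not the actual WT:STATS script.

```python
import re

# Hypothetical stopword patterns, matched whether or not they are templated.
FORM_OF = re.compile(r"form of", re.IGNORECASE)
SLANG = re.compile(r"\bslang\b", re.IGNORECASE)


def definition_stats(wikitext: str) -> dict:
    """Count definition lines in one entry's wikitext.

    A definition line starts with '#'; '#:' (examples) and '#*'
    (quotations) are sub-items and are skipped.
    """
    stats = {"definitions": 0, "form_of": 0, "slang": 0}
    for line in wikitext.splitlines():
        if not line.startswith("#") or line.startswith(("#:", "#*")):
            continue
        stats["definitions"] += 1
        # Catches {{form of|...}} as well as a plain-text "form of".
        if FORM_OF.search(line):
            stats["form_of"] += 1
        # Catches {{slang}} as well as a plain-text "(slang)".
        if SLANG.search(line):
            stats["slang"] += 1
    return stats
```

Matching the stopword as plain text rather than only as a template is what lets badly formatted entries be counted, at the cost of occasional false positives.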

Drama Sucks (Wikidrama part four)

Preamble

Lots of Wiktionary history reads like it would make a good soap opera.

The first year or two, no one noticed it; a dozen or two contributors tried to sketch out the basic coverage of language features. There was a bureaucrat, and then a couple sysops, who just deleted crap as it rolled in. The only bot activity in that time was NanshuBot, which made quite a mess, leaving all sysops here with an astronomically negative opinion of all bot activities.

The next year or two showed a ramping up of sysops (essentially, all the regular contributors.) Concerns about format of entries were raised, as some milestones of basic English language coverage (compared to other basic dictionaries) were met. Some of what is now formalized in CFI & RFV started to take shape. Some things were discussed reasonably and adequately. Other things (for a variety of reasons) ended up on strange tangents, such as enforcing the "etymology" hierarchy we still have today. Bot activities were violently and vehemently attacked and discouraged.

As the number of regular contributors increased, the number of sysops generally did not (minor jumps here and there, only occasionally.) As (critically necessary) bot activities increased, the old school resisted more strenuously; by the time Category:English nouns came to a head, the original bureaucrat found himself in an intractable mess, unable to save face. Since that time, he has taken a background role, contributing infrequently or not at all, yet remaining active on the wiktionary-l mailing list, biding time out of the spotlight.

During last year, an enormous effort also was made to codify many accepted practices. Some were done faithfully, others inaccurately. Attempts at making anything an official policy failed, by and large. But a core set of principles emerged. WT:ELE and WT:CFI seem to be the undisputed pillars of Wiktionary.

But I post today, to talk about bots.

Bots

When I got to Wiktionary, I was surprised and relieved by the lack of automation. Here were people creating a resource that would be free (as in speech, and as in beer) to the world to use ever-after. Copyright violations were dealt with sternly and very quickly. Nonsense was ripped out faster than you could blink your eye. And what remained was a core group of solid contributors, all miraculously working in concert towards a common goal.

But early attempts I made at automation were astoundingly, uniformly, universally and belligerently resisted. I was baffled. (To this day, Webster's 1913 remains outside of Wiktionary.)

As I cooperated and interacted with the core group, they understood quite clearly that I was interested in the same end result: a respectable, usable free dictionary. As time went on, my off-the-beaten-path attempts at automation were gradually accepted. Better still, I made significant progress getting actual approval of certain bot tasks. But it was very painfully slow progress; parsing the original dumps (and much later, the XML dumps) was always a nightmare. It was clear then (much clearer now) that without some uniformity, the problems would only get worse. As a result, I contributed to the current inflexibility we have, sometimes in a very large way.

The shift I felt was that I became more interested in encouraging others to automate, rather than trying to simply automate more myself. This was more for garnering support when facing off against the "old school" than anything else. But in some ways, the old school was right: each task should have separate approval. Last time I checked WT:BOTS, it still did (much to my chagrin.)

Today, we have numerous, fantastic bot operators, contributing in fantastic ways. But I find myself at a certain stopping point with regard to one minor technical matter. To simply bot-war/wheel-war it into non-existence would be (ahem) trivial. But that clearly wouldn't be productive, for me, or for the excellent bot operator in question.

More history

Two things Wiktionary has always done wrong are:

  1. Over emphasizing etymology, and
  2. Over emphasizing "part of speech" headings.

In traditional dictionaries, and in colloquial expectations, dictionaries are about definitions.

The part-of-speech of a definition is fluid, or irrelevant. To native speakers, the part-of-speech is irrelevant most of the time. To translators and ESL learners, it is of obvious importance.

The etymology, likewise, is perhaps the only interesting part of an entry (since the definition, in most cases, is obvious; petty squabbles usually only arise about particular wordings.) But with etymologies, the importance to ESL learners and translators is completely misleading. The English words that have multiple etymologies always have blending back-and-forth over time, from one etymology to the other.

Structure

Because of our current inflexible strictures, we hold the "etymology" heading level to be holy, likewise the part-of-speech headings. This forces truly stupid things to occur at lower levels. It also misrepresents what naturally occurs in the English language; etymologies themselves blend and are borrowed as "second meaning" uses become more common than primary senses.

Likewise, synonyms, antonyms, related terms and derived terms blend from one meaning to another.

With the most recent automation activities, this is being further misrepresented, by forcing L3 headings to L4 whenever a particular bot thinks it is appropriate. Not only does this undermine intentional L4 to L3 movements, it obnoxiously enforces a technicality that never had widespread support in the first place (having been added to CFI only during an edit war, in direct conflict with the widespread practice and the rest of CFI, unnoticed. The rest of CFI that conflicted was conveniently removed later.)

On a more technical note, the bot currently being run is not "well-behaved" in that it re-corrupts human corrections, when it is pointed out that it has done something wrong. (With regard to WT:BOTS, the bot/bot-operator is never supposed to make controversial formatting changes at all; the disregard for that point is what has me in a tizzy.)

It is my opinion that this is not only wrong, but entirely misleading, and should be stopped. Furthermore, the bot in question should have activities subdivided, with separate approval phases, now that it is (long!) past the 100 test-entries threshold. Crossing the "t"s and dotting the "i"s of the formal bot-approval phase should provide sufficient feedback to make the bot operate smoothly, instead of covertly.

End of four-part observation and complaint.

--Connel MacKenzie 05:17, 17 April 2007 (UTC)

I am stunned that we haven't imported the 1913. Let's do that. bd2412 T 05:43, 17 April 2007 (UTC)
Yes, the most annoying aspect of this is that it is particularly time-consuming and counter productive. There are lots of better things we could be doing. --Connel MacKenzie 15:37, 17 April 2007 (UTC)
I wonder why there is a need for such secrecy. The bot in question is User:AutoFormat, which is run by User:Robert Ullmann (at least, I believe that's who Connel is talking about here). Personally, I believe that a dictionary has a very pressing need for uniformity and strict formatting policies, and I believe AutoFormat is a boon to our goals. However, I agree that before it goes any further, we should codify some of our practices, so that they are the result of the community's vision, and not of a single contributor (it should be noted that most of the rules were not simply invented by Robert, but were taken off of policy pages). I believe that this should take two forms. First of all, the bot as a whole should get a vote, in the normal fashion, on the general grounds of its overall purpose (i.e. making various formatting changes in an automated fashion). Then, a draft should be made of all the formatting rules that AutoFormat follows (sorry to do this to you Robert). The Wiktionary community should be able to discuss each of them separately. I suppose a page should be set up somewhere (somewhere other than the Beer Parlour). I am unsure of how exactly this should be done, but in some way the community should discuss and come to decisions on each of these rules. Some of the rules will be consented to almost immediately (such as switching "Derivated terms" to "Derived terms"). Others will involve more arguing, I imagine. I'm hesitant to require a vote for every single formatting rule that AutoFormat is allowed to do, because that's just ridiculously lengthy. Perhaps, to start out with, we could do this. We have the bot vote, and Robert puts up his list of AutoFormat rules. Any rule which is not questioned by two users with over 100 edits gets to go unmolested, and everything that does get questioned by two such users is temporarily put on hold, until we can discuss it further.
This would allow AutoFormat to get back up and running quickly, and still allow us to discuss formatting issues which require discussion. Any thoughts on this? Atelaes 06:42, 17 April 2007 (UTC)

While I agree with some of your reservations about Wiktionary policies (in particular, I agree that separating by etymology is a bad idea), I don't think we can be critical of AutoFormat (or any other bot) for enforcing long-codified policies. Indeed, overall it's better to have the uniformity gained by enforcing such, firstly because if it turns out we dislike its changes, then we know to fix the policy, and secondly because if we ever decide to change the policy, it's easier for bots to update the structure of a consistently-formatted Wiktionary than that of a hodge-podge. If AutoFormat undoes something a human editor does, that most likely means either that the human editor made a mistake (in which case AutoFormat is doing the right thing) or that the human editor is intentionally violating policy (in which case AutoFormat is doing the right thing). Nonetheless, if people think it's essential that it be possible for humans to violate policies, then I suppose we could create a {{nobots}} that would add a page to a Category:No bots, and require that all bots skip content between the {{nobots}} and the end of whatever section it's in (or the entire page if it's not in a section). —RuakhTALK 11:25, 17 April 2007 (UTC)

Those are excellent observations Ruakh. But the interpretation of well established policy is what is in question here. AF is taking a stricter interpretation, than WT:ELE actually says. I think the {{nobots}} idea is too tenuous to try to back-support for all existing bots. Sometimes it might work; other times it would likely be misused (accidentally or intentionally.) --Connel MacKenzie 15:45, 17 April 2007 (UTC)
(Atelaes: don't be sorry ;-) There is a list of what it does, it actually has documentation: User:AutoFormat.
While AutoFormat is written to run autonomously, it runs by itself in its own window on my laptop. But I watch it when it runs, and I check every edit, looking at the diff and the result unless I can tell for sure from the edit summary that it is okay. In this way it is more like someone running AWB. I haven't suggested getting a bot flag for it, because it turns out to be useful to have the edits in RC, where more people will look at them. (Clearly, if it is handed a large task, like fixing all of the {top} calls, it would need to be flagged.) Robert Ullmann 12:09, 17 April 2007 (UTC)
(As to "re-corrupting", look at the last straw, which was the example on the talk page: AF corrected the structure, Connel reverted it, Hippietrail immediately re-corrected it. The structure in question was clearly wrong; AF got it almost right, but needed a bit more refinement and re-run, and it is now correct.) Robert Ullmann 12:37, 17 April 2007 (UTC)
I'm going to allow myself a snarky comment, just because I like and respect Connel: If I was running this automation under my own account, perhaps using an edit summary of "===Hdr===, fmt", no one would have ever noticed. Robert Ullmann 12:48, 17 April 2007 (UTC)
Weren't you here back then? Yes, every semi-automatic edit was critiqued, lambasted, slowcooked over a fire and served with toast. (That was what all the "history" above was all about.)
My objection isn't to the fantastic work done with AutoFormat, so far. It is against the notion that something that hasn't been codified can be enforced by bot. The L4 stuff has never had solid consensus one way or the other; Ncik's flamewars and editwarring with me was about that specific topic.
On the other hand, the etymology "splitter" mentality has had consensus, even though it is quite certainly wrong (notably, support from me, myself.)
--Connel MacKenzie 15:03, 17 April 2007 (UTC)
re: Snarky comment: Actually, that is why my talk page is archived; it was overwhelmed by certain complaints and flamewars as the result of such edits, that first year or two. --Connel MacKenzie 15:35, 17 April 2007 (UTC)
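Ruakh's {{nobots}} proposal above amounts to a simple masking rule for bots. The sketch below is hypothetical (it assumes no existing template or bot convention): every line from {{nobots}} to the end of the enclosing section is marked off-limits, where the section ends at the next heading with the same number of "=" signs or fewer, and the rest of the page is protected if the tag appears before any heading.

```python
import re

# A wikitext heading such as ===Noun=== or ====Synonyms====.
HEADING = re.compile(r"^(=+)[^=].*?\1\s*$")
NOBOTS = "{{nobots}}"  # hypothetical template name from the proposal


def bot_editable_mask(wikitext: str) -> list:
    """Return one flag per line: True if a bot may edit that line."""
    mask = []
    protected = False
    protect_level = None  # level of the protected section; None = rest of page
    current_level = None  # level of the most recent heading seen
    for line in wikitext.splitlines():
        m = HEADING.match(line)
        if m:
            level = len(m.group(1))
            # A heading with the same number of '=' or fewer closes the section;
            # deeper subsection headings stay inside it.
            if protected and protect_level is not None and level <= protect_level:
                protected = False
                protect_level = None
            current_level = level
        if NOBOTS in line and not protected:
            protected = True
            protect_level = current_level
        mask.append(not protected)
    return mask
```

Connel's objection still applies to any such scheme: it only works if every bot operator implements the same rule.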

Is ELE policy?

AutoFormat stopped due to threats from Connel. User Talk:AutoFormat#This is wrong

Apparently we need a vote to either confirm that WT:ELE means what it says, and is in fact en.wikt style and policy, or to remove the {policy} tag and to treat it as (what?) I didn't realize there was any such basic doubt about our basic format. Or is there? Is Connel just off the wall? Robert Ullmann 11:15, 17 April 2007 (UTC)

(Quote from above, Connel:) With the most recent automation activities, this is being further misrepresented, by forcing L3 headings to L4 whenever a particular bot thinks it is appropriate. Not only does this undermine intentional L4 to L3 movements, it obnoxiously enforces a technicality that never had widespread support in the first place (having been added to CFI only during an edit war, in direct conflict with the widespread practice and the rest of CFI, unnoticed. The rest of CFI that conflicted was conveniently removed later.)

(I presume he means ELE) The fact is, this is the way it was resolved; and presently is working and used by the majority of entries and users (I've run stats). If you want to change it, then the correct action is to re-open ELE for consideration, and call for a WT:VOTE on your proposed change. It is not to criticize me or the bot for following current policy, pretending it is "controversial" because you are still upset that you lost that edit war, and would prefer to think that your way is still/ever was the convention. See?

By all means propose changing the current, established policy. Don't criticize someone for following it. Robert Ullmann 12:27, 17 April 2007 (UTC)

  1. Who says I "lost" that edit war?
  2. I still interpret what WT:ELE actually says quite differently than you.
  3. The convention is (and certainly was) to use L3 for those headings. If AF has skewed the results to your favored interpretation now, that doesn't mean it is correct!
--Connel MacKenzie 15:09, 17 April 2007 (UTC)
WT:ELE shows ====Synonyms==== in the example, and before Translations, which must be L4 (right?) The description of the format for the synonyms section indicates that it is part of the POS section. There is no indication anywhere that L3 is permitted (unlike Derived terms, which specifies the exceptional case in which it has to be used at L3 even though that is not the normal case.)
The stats are 4380 occurrences at L3, 11525 at L4. Taken from an XML before any AF changes. Robert Ullmann 12:27, 18 April 2007 (UTC)
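Counts like the L3/L4 figures above can be gathered by tallying heading levels page by page. A minimal sketch, assuming `pages` is any iterable of entry wikitext strings (for example, texts already extracted from an XML dump; reading the dump itself is not shown, and the function name is hypothetical):

```python
import re
from collections import Counter

# A Synonyms or Antonyms heading on its own line; group 1 is the '=' run.
NYM_HEADING = re.compile(
    r"^(=+)[ \t]*(Synonyms|Antonyms)[ \t]*\1[ \t]*$", re.MULTILINE
)


def nym_heading_levels(pages) -> Counter:
    """Tally (heading name, level) pairs, e.g. ('Synonyms', 3) for ===Synonyms===."""
    levels = Counter()
    for wikitext in pages:
        for m in NYM_HEADING.finditer(wikitext):
            levels[(m.group(2), len(m.group(1)))] += 1
    return levels
```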

"Off the course!"

A long time ago I used to ski at a little area in N.E., mostly at night after work. One night, there was a slalom course set up along the side of one slope.

I decided to try running it; it was set up fairly easily, and it turned out I could ski it fairly well. As I was doing so, I heard a yell from someone on the chairlift, inside the trees to my left: "OFF THE COURSE!"

When I got to the bottom, someone else screamed in my face that I wasn't supposed to be using the course. I watched and listened over the next few hours, and more than a few times someone would try it, or just go around a gate or two, and there would be screams from the lift: "OFF THE COURSE!"

Now, were there a sign at the top, they would seriously reduce their stress level, and not look like idiots ... but this never occurred to them.

Just remembered that for some reason. Robert Ullmann 12:27, 17 April 2007 (UTC)

Right. The applicable part of WT:ELE is the part about "flexibility." --Connel MacKenzie 15:15, 17 April 2007 (UTC)
But how can "flexibility" make it an error to follow what is claimed to be formal policy? And more personal, how will I know that there are no further conventions "floating around" never written down, and which I involuntarily break by following ELE? Is there any point in even having a policy which for the last 2.5 years has said one thing [synonyms, antonyms, quotations etc has been marked by a H4 heading there since late August 2003], which is considered incorrect, but which is more or less the only piece of information that never has been changed (at least not for very long)? I.e., not even by the party who consider it to be wrong. \Mike 12:03, 18 April 2007 (UTC)

more

If I understand Connel correctly, WT:ELE has certain things as policy that were not voted on, and he feels strongly that this AutoFormat bot is enforcing those things in WT:ELE which he disagrees with. Perhaps a debate about the specific offending WT:ELE items is in order. However, I agree with the above post which says that the development of AutoFormat bot should proceed with respect to the non-controversial edits.
Robert, how much of a pain would it be for you to turn off the offending features, but still run the other edits?
Connel, would such a compromise be satisfactory until the other issues are resolved? Or are you completely against the AutoFormat bot in its entirety? -- A-cai 12:46, 17 April 2007 (UTC)
Note that he's the one that tagged the current WT:ELE with {{policy}}: must not be modified without a WT:VOTE, and he set up the vote process ;-) I'd like to know if anyone else thinks there is a controversy.
It isn't hard at all: all of the header formatting is controlled by User:AutoFormat/Headers. Robert Ullmann 12:55, 17 April 2007 (UTC)
My complaint, yes, is about the L4 headings. OTOH, AF would do very well (and see many more improvements) if it went through its formal approval process! --Connel MacKenzie 15:19, 17 April 2007 (UTC)
And yes, I would like to see AF turned back on without the L4 error. Yes, I would like to see it get a bot-approval, with or without the bot flag as a result. --Connel MacKenzie 15:40, 17 April 2007 (UTC)
I agree with Ruakh & A-cai. The sooner we get to a bot-recognisable standard format the better. But if there are specific aspects where there is not clear consensus AND where changes made by AutoFormat would be difficult for it or another bot to undo, then we should hold off "regularising" those aspects until we have consensus. There are already plenty such issues which AutoFormat is set to leave alone. A few more would not hurt too much. At present, I am content with ELE as interpreted by AutoFormat, but I accept I have not thought hard about other possibilities which might be better. --Enginear 19:34, 17 April 2007 (UTC)
I've changed the code to tag some entries with {{rfc-level}} as an experiment. It will not change any header level. The entries tagged are not the same set as the entries it would have fixed; it will tag entries it can't fix as well, and not tag some simple cases it could correct. (Just using about 2 lines of code right now, to try it out.) Gives us something to look at, and we can always feed the cat to the bot later ;-) Robert Ullmann 13:05, 18 April 2007 (UTC)
Not only are synonyms dependent on the POS, they're dependent (or should be) on the definition number! This argument doesn't make any sense at all. DAVilla 14:24, 17 April 2007 (UTC)
For all words' synonyms, in all circumstances? Of the synonym headings we have, how many are "disambiguated" like translation sections? 1%? 0.1%? 0.2%? That is approximately how often such subdivisions are appropriate. Normally, subdividing them is inappropriate; the synonyms apply to figurative uses just as much as to literal uses. --Connel MacKenzie 15:14, 17 April 2007 (UTC)
Subdividing the synonyms is almost always appropriate, because there are very few English words that are strict synonyms across all definitions. There's little impetus for two words with totally identical meanings to continue to exist in any language. Most often, one will drop out of use (as Charles Darwin hypothesized, and other evolutionary theorists and historical linguists since have repeatedly shown). Synonyms and Antonyms should always be L4. --EncycloPetey 23:52, 17 April 2007 (UTC)
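Robert's experiment in this thread (tagging entries rather than changing their header levels) might look roughly like the sketch below. The trigger rule (an L3 Synonyms or Antonyms heading) and placing {{rfc-level}} at the top of the page are assumptions for illustration; this is not AutoFormat's actual code.

```python
import re

# An L3 Synonyms/Antonyms heading; L4 (====Synonyms====) does not match.
L3_NYM = re.compile(r"^===[ \t]*(?:Synonyms|Antonyms)[ \t]*===[ \t]*$", re.MULTILINE)
TAG = "{{rfc-level}}"


def tag_instead_of_fix(wikitext: str) -> str:
    """Prepend {{rfc-level}} when an L3 Synonyms/Antonyms heading is found.

    Heading levels are never changed; tagged entries are left for human review.
    Already-tagged pages are returned unchanged, so re-runs are harmless.
    """
    if TAG in wikitext or not L3_NYM.search(wikitext):
        return wikitext
    return TAG + "\n" + wikitext
```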

WT:RFV

Needs some serious cleanup and archiving. Any volunteers? - [The]DaveRoss 02:58, 19 April 2007 (UTC)

If there were a clearly outlined procedure for the process, I would have begun doing some of this long ago, but there isn't. And I can never remember what formatting / templates /etc. are supposed to be used (and can't remember where to go to look for good examples of passed or failed entries either). --EncycloPetey 11:34, 19 April 2007 (UTC)
I've been cleaning out the list, removing rfvfailed entries and archiving the oldest rfvpassed entries, but it's a lot to do. -- Beobach972 20:24, 20 April 2007 (UTC)

Somebody pulled a Wonderfool/Dangherous style stunt on Wikipedia

w:User:Robdurbar (who has since been desysopped) deleted the main page on Wikipedia, blocked several bureaucrats, and deleted several important pages like w:Cheese and w:History. This reminded me of the Dangherous stunt. Could we coordinate with Wikipedia to help them learn from the Wonderfool and Dangherous stunts and to halt this kind of madness much more quickly, and possibly coordinate with its CheckUsers to determine if Robdurbar is a Wonderfool sock? Jesse Viviano 04:11, 20 April 2007 (UTC)

Please see w:Wikipedia:Administrators' noticeboard/Incidents#Robdurbar for the details on this incident. Jesse Viviano 04:13, 20 April 2007 (UTC)

Sock? I find that highly unlikely. They're two completely separate users who got fed up and went off the edge. Picaroon 04:18, 20 April 2007 (UTC)
How do you know that? --EncycloPetey 15:18, 20 April 2007 (UTC)
Think about it. I don't know much about Dangherous (talkcontribs), but User:Robdurbar is a longtime (almost two years) Wikipedia contributor who seems to have burned out, and then returned to mess around with the sysop tools. Wonderfool/Thewayforward/Dangherous stopped editing Wikipedia under those names back in 2006, but Robdurbar was active as late as February 2007, before posting this goodbye message in early March and returning yesterday for his stunts. If he was Wonderfool, would he not have "gone rogue" earlier? Why stick around and keep on making productive contributions for another four months? Picaroon 20:01, 20 April 2007 (UTC)
Concur. Wonderfool even did it better, with the right timing. And if it were his second time, he wouldn't have wondered aloud how long he could continue doing what he was up to. DAVilla 05:56, 21 April 2007 (UTC)
Well, Wonderfool certainly had his idiosyncrasies...so who knows there. Any Steward can run CheckUsers across projects, or if a Wikipedia CU were to join #wikimedia-checkuser on freenode we could hash it out in there. - [The]DaveRoss 20:47, 20 April 2007 (UTC)
We do have someone (as of a few days now) who is a native CheckUser on both projects. Robdurbar was checked early on to see if the account was compromised. I don't have Wonderfool's IPs on hand though. If another Wiktionary CheckUser can confer with me in private, I can compare. Dmcdevit·t 21:45, 20 April 2007 (UTC)

Zip Code lists on Wikipedia facing deletion - suggested outcome

There is a large group of list-articles on Wikipedia that have been nominated for deletion: see w:Wikipedia:Articles for deletion/Lists of ZIP Codes in the United States by state. I am of the opinion that Category:Appendices is an appropriate home for this almanac content. I am not asking for people to weigh into the Articles-for-discussion debate on Wikipedia, but to consider the appropriateness of the inclusion (via transwiki) of this information in Wiktionary; it is my contention that the listing is a de facto thesaurus. --Ceyockey 01:24, 21 April 2007 (UTC)

Whereas this content is durably archived all over the place, and it is freely available in its most up-to-date format online anyway, I personally think that the [delete] button is the best home for these lists as far as Wikimedia is concerned. - [The]DaveRoss 01:52, 21 April 2007 (UTC)
Have to agree with TheDaveRoss. Cheers! bd2412 T 03:30, 21 April 2007 (UTC)
I saw a lot of good reasons for keeping the Oregon page, and I saw a lot of bad reasons for deleting it. My favorite is, paraphrasing, "We've voted to delete all the other zip code pages, so regardless of the merits of the Oregon page, it has to go too." That's very clearly the wrong mentality for batch deletions. Personally, I would not have given so much credit to the votes arguing that this information is already available elsewhere, either. Even if that were a valid reason, anyone who wrote that they did any investigation, apart from simply Googling USPS for the URL, concluded that it was not in as much detail. I'm not saying that I would have voted to keep the Oregon page according to encyclopedia standards, and anyways I didn't see it because it's already been zapped, but it looked like it deserved some attention at least. I don't know why quick-to-judgement comments like the above won the day.
On the other hand, it doesn't sound like Wiktionary material, certainly not as an appendix. I would argue that the proper names don't deserve individual pages since they don't have any linguistic value, and likewise for the numbers. Probably the only exception is 90210. No, we do not discriminate against numbers here, but apart from a few hundred counting numbers they do have to be more than just that. The basic question here is if someone might run across the term and wonder what it means. We don't even include the full names of historical people on those grounds alone, only if the name, not the person, has some significance, linguistic rather than historical significance. Anyone who ran across a zip code would know immediately from context what it was and where to look it up, and that is not the function of either a dictionary or a thesaurus. Only to a postal worker, a legitimate but not sufficient exception, would a zip code be a "name", but I doubt dictionary or thesaurus would be a good name for their reference books either.
Even if you've received a lot of negativity, it does sound like your project has some great value. I would suggest that you not give up your search for a suitable wiki. There is no Wikimedia map wiki that I'm aware of, but I've heard of a yellowpages wiki. As I'm not sure who would be interested in the historical data, I sincerely hope they don't turn a deaf ear either. Don't let something so useful fall through the cracks. DAVilla 05:36, 21 April 2007 (UTC)

Thanks all for your input. The lists have indeed been deleted with the unfortunate citation of a vote count (there is much discussion on Wikipedia about "Articles for Discussion is not about voting", but acting on a count is effective nonetheless). I'm not personally on a mission to find a home for this information, but I do think that it has a home somewhere. It is useful to have thoughtful input from you on the scope of the Appendix section of Wiktionary. Regards, --Ceyockey 23:28, 21 April 2007 (UTC)

Quality expectations of other Wiktionaries

Hoi,
I have blogged about a request to stop including the Russian and Vietnamese Wiktionary in the interwiki process that I run as a public service. The reason to exclude the Russian Wiktionary is that it consists mainly of empty shells. The reason for the Vietnamese exclusion is that the Russian declension and conjugation tables on it are wrong. Both have been asked by the Polish Wiktionary to delete the offending material and this has not happened.

I need a discussion about this because when the Polish ban my bot, the whole process will stop.

Thanks, GerardM 09:31, 21 April 2007 (UTC)

Seems to me they have a legitimate complaint, although I would think they ought not to really do anything about it; it is the vi and ru.wikts' problem. But given that they insist, is it possible to filter in the 'bot so that ru and vi aren't added to the pl.wikt, without stopping anything else? (Well, I know it is possible; is it something you are willing to do?) The other alternative is just to not write to the pl.wikt until they change their minds, but still read it and write everywhere else. Does the person on the vi.wikt know that much of the Russian declension information is available here? Robert Ullmann 12:07, 21 April 2007 (UTC)
My bot does all wiktionaries. As a consequence no languages are configured at all. So much of the information is on the English Wiktionary. Now how do you get it out? GerardM 12:30, 21 April 2007 (UTC)
I wonder if reducing the amount of traffic those "lower quality" (I have no idea, I haven't spent any time on either) Wiktionaries receive is the best way to solve this. Perhaps they aren't great now, but Russian and Vietnamese speakers will encounter them from a higher traffic Wiktionary via the interwiki links, and decide to help them out some. Hiding them might not be the best option, I think. - [The]DaveRoss 13:15, 21 April 2007 (UTC)
The complaint about the Russian Wiktionary is a legitimate concern. I can't recall the last time I followed a link there and found any content in an article. It's all content-free formatting; all the section headers are in place, but no definitions, pronunciations, translations, or other information. I have usually found information on the Vietnamese Wiktionary, even if I can't read it. --EncycloPetey 22:18, 21 April 2007 (UTC)
What about checking the page for {{stub}} before linking to it? Would that be any easier? If it's a stub page, it doesn't exist, for all intents and purposes. I didn't see an example of the Vietnamese errors, but their tables probably use a template too, don't they? Could the same trick work for that? DAVilla 17:31, 21 April 2007 (UTC)
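DAVilla's suggestion could be sketched as a pre-check in an interwiki bot. This assumes the target page's wikitext has already been fetched (None if the page does not exist) and that empty pages carry a template literally named {{stub}}; other Wiktionaries may use differently named templates, so the pattern is a placeholder, and the function name is hypothetical.

```python
import re
from typing import Optional

# {{stub}} or {{Stub|...}}; MediaWiki treats the first letter case-insensitively.
STUB = re.compile(r"\{\{\s*stub\s*(\|[^}]*)?\}\}", re.IGNORECASE)


def should_link(target_wikitext: Optional[str]) -> bool:
    """Return True if an interwiki link to the target page is worth adding."""
    if target_wikitext is None:  # target page does not exist
        return False
    # A stub page is treated as non-existent for linking purposes.
    return STUB.search(target_wikitext) is None
```

The same skip-if-template-present idea would extend to the Vietnamese tables only if those tables are produced by a recognisable template, which is unconfirmed here.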