Wiktionary:Beer parlour/2006/February

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

IPA: <r>

I've got a question about how to represent the sound for the English <r>. In most existing pronunciations here, it is a /r/, which is actually wrong, since it represents an alveolar thrill (rolled r), like in Spanish, Russian and Dutch. However, most English-only dictionaries use this /r/ since it is the only <r> they need (and for ease of typesetting). Yet, we are not English-only, and therefore we should use the correct one. The question is: which is the correct one? Is it the alveolar approximant /ɹ/ , which the Wikipedia examples claim, or is it the retroflex approximant /ɻ/ , which our examples claim? Or should it be a mix of both? Also: Wikipedia says the latter is the American <r>. Or should we not bother at all? — Vildricianus 17:48, 1 February 2006 (UTC)[reply]

Alveolar thrill rather than trill gives the issue more sex than it deserves. :-)
For me this is a good reason to avoid getting involved in pronunciation issues. The way that an "r" is pronounced is one of the key distinguishing characteristics used in identifying dialects, especially in the final position. Can we realistically do more than have some kind of generic representation for this letter? Eclecticology 17:59, 1 February 2006 (UTC)[reply]

It was me who decided to go with /r/ after reading a thick linguistics textbook on pronunciation. That book introduced me to the concept of "romanicness", whereby, when choosing a set of letters for a pronunciation scheme, those symbols most resembling the usual roman letters are generally used first, with more exotic ones used only where more than one sound would use similar letters. This is why most English dictionaries just use /r/. In my experience most bilingual dictionaries also use /r/ for English no matter what symbol they use for the other language. They do not try to combine into one set but rather have a section explaining what sound each symbol means in each language. After all, it's very rare that two languages share an exactly identical phoneme. — Hippietrail 02:19, 2 February 2006 (UTC)

Agree with Hippietrail. What goes between phoneme slashes // should be a broad transcription given in a convention meant to be readable. As long as there is nearby an explanation that /r/ is phonetically [ɻ] in GenAm (and that the value may vary by dialect or position in the word, etc.), there shouldn't be a problem. —Muke Tever 18:38, 2 February 2006 (UTC)[reply]

OK, fair enough. Easier, too. — Vildricianus 19:47, 2 February 2006 (UTC)

Well the thing I don't like about just using /r/ is that it doesn't distinguish between different languages on the same page very well. The reason most print dictionaries can use it is because they don't have any Spanish or Italian words in there. If you think of the word rap for example, the first phoneme is very different in English from how it is in Spanish. I like it when Wiktionary shows how a word that is spelt the same can be pronounced very differently in different languages. I agree transcription should be broadly phonemic rather than phonetic, but I don't think using the turned-Rs is too technical. As for which one to use, well we already often distinguish between UK and US pronunciation, so I suppose you would use a different one for each. Widsith 11:33, 3 February 2006 (UTC)[reply]
God, I was looking all over for this discussion. Found it at last! I've already written my comments here. Davilla 07:23, 10 February 2006 (UTC)
Um, to put it bluntly, /r/ is hardly the only sound for which this is the case, and I don't see any convincing reason why it should get special treatment. For example in primo four phonemes out of five are different between (American) English and Spanish/Italian, but the phonemic presentation should still be /ˈprimo/ for both (IPA following SAMPA), even though the English /p/ is [pʰ], the /r/ is [ɻ] (while Spanish's is [ɾ]), the /i/ is something like [iɪ̯], and the /o/ is [οʊ̯]. Being able to turn /phonemes/ into accurate [phonetic information] is part of having a natural accent, and thus should be described, but elsewhere, as it is relevant to the phonology of the language, not the pronunciation of the word per se. —Muke Tever 20:26, 3 February 2006 (UTC)[reply]
Yes, I absolutely take your point. I guess we just have different ideas about where to draw the line. You see the turned-R as phonetic, whereas I suppose I have always thought of it as being rather broadly phonemic. Anyway I will happily adopt the consensus view on this. To play devil's advocate though, are we then going the whole hog and using /r/ for French too? If not why not!? Widsith 10:45, 4 February 2006 (UTC)[reply]
My guess, not having written the standards, is because French /ʁ/'s unvoiced allophone [χ] (which is not, I understand, considered to be rhotic) suggests it patterns more like a fricative than a rhotic sound, even though the canonical sound is rhotic. A plain /r/ would probably be expected in a pan-dialectal transcription, if such a thing were to be done. —Muke Tever 17:59, 4 February 2006 (UTC)[reply]
Interesting. Perhaps you're right. Widsith 12:54, 5 February 2006 (UTC)
There is a simple solution - context. The Spanish "r" is, actually, similar to the English English (Received Pronunciation) "r", and so it is appropriate to use /r/ for both. Few British users use the sound actually represented by /r/ these days (it sounds "posh" and old-fashioned), using a sound made with the tip of the tongue further back instead (at least, that's what I do), but dictionaries continue to use this symbol in IPA transcriptions of RP.
The "American" "r" sounds different from the RP "r" or the Estuary English "r", and there is a different phonetic transcription in IPA for these sounds, but using /r/ for the American "r" sound causes no confusion provided the pronunciation is marked as being US. Likewise with Spanish, Italian and the rest.
However, the French "r" is a completely different phoneme. All of the "r" sounds I have just mentioned are pronounced by articulating the tip of the tongue somewhere near the front of the mouth. The French "r" is articulated with the uvula, that is, at the back of the mouth. It is therefore necessary to use a different symbol, because the sound is entirely different.
Another reason that /r/ can be used for a range of sounds is that the dialects and languages I mention have only one "r" sound, as Hippietrail says, so there is no need to use a separate phonetic transcription. Context is sufficient. Occam's razor applies.
I support a general usage of /r/ for pretty much all languages that use the letter in their own orthographies, simply because it's easier to display properly. This would have to be accompanied by links to separate pronunciation guide pages with narrow transcriptions, though. Right now this is a problem, since {{IPA}} forces everyone to go to a very complicated and impractical table that tries to fit all languages into one table and one page without any room for additional comments. There need to be separate links for separate languages. I've already established one for Swedish and I recommend using Wiktionary:Pronunciation to list the separate pronunciation guides.
Peter Isotalo 14:20, 18 February 2006 (UTC)
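
The split argued for in this thread, one broad /phonemic/ symbol per sound with language-specific [phonetic] values documented elsewhere, can be sketched in a few lines of Python. This is only an illustration: the table name, the language codes and the function are hypothetical, the only realizations filled in are ones quoted in the comments above, and a real pronunciation guide would also need positional rules (English /p/ is not aspirated everywhere, for instance).

```python
# Hypothetical per-language realization tables. Only values quoted in the
# discussion are listed; any phoneme without an entry passes through unchanged.
REALIZATIONS = {
    "en-US": {"p": "pʰ", "r": "ɻ"},  # simplification: ignores position in the word
    "es": {"r": "ɾ"},
}


def narrow(phonemes, lang):
    """Turn a broad transcription (a sequence of phoneme symbols) into a
    narrow one by looking each symbol up in the table for the given language."""
    table = REALIZATIONS.get(lang, {})
    return "[" + "".join(table.get(p, p) for p in phonemes) + "]"


broad = ["p", "r", "i", "m", "o"]  # the same /primo/ for both languages
print("en-US", narrow(broad, "en-US"))
print("es   ", narrow(broad, "es"))
```

The entry itself would carry only the broad form; the table stands in for the per-language pronunciation guide that the broad form links to.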

technical: br tags

Wiktionary currently uses <br> tags. Could this please be changed to <br />, so that the source is XHTML? Quarl 2006-02-02 08:08Z

This is a very common markup, perhaps the most widely used one on this site. Even people who are not at all familiar with HTML use it. People can use it if they want, but any campaign to change everybody's habits will be futile. As long as the old way is two keystrokes shorter, and keeps working, there is no incentive for change. Eclecticology 09:34, 2 February 2006 (UTC)[reply]


As MediaWiki runs the output through Tidy to check for HTML mistakes, it's solved automatically. Check the source your browser gets. It's always <br />

Platonides 17:54, 2 February 2006 (UTC)
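
For anyone curious what that kind of clean-up amounts to, here is a minimal Python sketch that rewrites every variant of the tag to the XHTML form. It is not how MediaWiki or Tidy actually does it (Tidy parses the whole document); the regular expression and the function name are assumptions made for illustration only.

```python
import re

# Matches <br>, <BR>, <br/>, <br /> and tags with attributes such as
# <br clear="all">. The \b keeps it from touching other tags like <break>.
BR_TAG = re.compile(r"<br\b([^<>]*?)\s*/?>", re.IGNORECASE)


def normalize_br(html):
    """Rewrite every line-break tag in `html` to the self-closing XHTML form."""
    def fix(match):
        attrs = match.group(1).strip()
        # Keep any attributes, drop the old closing style, add " />".
        return f"<br {attrs} />" if attrs else "<br />"
    return BR_TAG.sub(fix, html)


print(normalize_br("one<br>two<BR/>three<br />four"))
```

Editors can therefore keep typing the short form; the conversion happens on output.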

RobotGMWikt cramming my watchlist with its edits

Is there a way to prevent edits made by bots from appearing on my watchlist? Ncik 01:06, 3 February 2006 (UTC)

Just above the list is a "hide bots" link. theDaveRoss
Dave, I don't see anything like that...
My watchlist
From Wiktionary
(for user "Connel MacKenzie")
Jump to: navigation, search

   * 16,489 pages watched not counting talk pages
   * Show and edit complete watchlist 

Below are the last 54 changes in the last 12 hours.
Show last 1 | 2 | 6 | 12 hours 1 | 3 | 7 days all
Hide my edits.
3 February 2006
After 2,000 edits or so, the Watchlist feature becomes less and less useful. --Connel MacKenzie T C 08:24, 3 February 2006 (UTC)
It is not possible to prevent edits from a user from appearing on a watchlist. In the case of RobotGMwikt, there have been thousands of edits over the last days. The point is bots can do all kinds of things including genuine data updates like new content or adding to content. Having words on your watchlist only makes sense when these words have a special significance. I for instance am particularly interested in names of languages and terminology to do with lexicology and terminology; those are the ones on my watch list. GerardM 11:42, 3 February 2006 (UTC)
It would be nice to be able to hide bot edits but not minor edits. It would also be nice, when not hiding either, to see the last non-minor change if it exists rather than just the last change. For instance, if Greg comments "complete re-write, deletion of third section" and five minutes later "m sp", it's the minor spelling correction that shows up on your watch list! Davilla 16:17, 15 February 2006 (UTC)
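
The two wishes in the last comment (hide bot edits independently of minor edits, and show the last non-minor change where one exists) can be sketched as follows. This is a toy model in Python, not MediaWiki's watchlist code: the `Edit` record, its fields and the `watchlist` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    page: str
    user: str
    comment: str
    minute: int          # hypothetical timestamp: minutes since some start
    minor: bool = False
    bot: bool = False


def watchlist(edits, hide_bots=True, prefer_major=True):
    """Return one edit per page, keyed by page name.

    Bot edits are skipped when hide_bots is set. With prefer_major, a later
    minor edit never displaces an earlier non-minor one, so the row shown is
    the last non-minor change if the page has one.
    """
    rows = {}
    for e in sorted(edits, key=lambda e: e.minute):
        if hide_bots and e.bot:
            continue
        current = rows.get(e.page)
        keep_current = (
            current is not None and prefer_major and e.minor and not current.minor
        )
        if not keep_current:
            rows[e.page] = e
    return rows


edits = [
    Edit("quaint", "Greg", "complete re-write, deletion of third section", 0),
    Edit("quaint", "Greg", "sp", 5, minor=True),
    Edit("rap", "RobotGMwikt", "interwiki", 6, bot=True),
]
for page, e in watchlist(edits).items():
    print(page, "-", e.user, "-", e.comment)
```

With both options on, the re-write stays visible for "quaint" and the bot-only page drops out; switching `prefer_major` off gives the behaviour complained about above.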

Wikipedia:Wap Wireless

This website is very innovative and resourceful in so many ways that lacking a computer when seeking information creates a rather tangible problem. Is there a WAP version, even a read-only one, for those of us with WAP on our cells?

Wish I had better news for you. Especially since this is going to be very big in a couple of years. I guess it'll come about when enough people realize it's necessary. Davilla 16:19, 15 February 2006 (UTC)

Images - Vote!

I propose a vote:

Background

Someone has recently been misrepresenting previous discussions that related to templates that would appear on every page. The incorrect implication is that for harmless icons on maintenance templates, the same performance penalty might exist. This has been followed by the same person removing the images in question from the respective templates, without asking for comments. I doubt the performance hit of having one additional icon on a total of about 90 pages is comparable to the performance hit of having between 6 and 20 images on 120,000 pages (that can't be cached due to the URL masking used by MediaWiki).


For

  1. --Connel MacKenzie T C 08:17, 3 February 2006 (UTC)
  2. I think images would give the site a more professional quality. It would give the site more face validity. Psy guy 18:27, 4 February 2006 (UTC)
  3. Davilla 23:24, 5 February 2006 (UTC)
  4. Tawker 03:06, 21 February 2006 (UTC)

Against

  1. Primetime 19:51, 14 February 2006 (UTC)

Comments section

I don't see any point to this vote. Why do we even need a policy about this? Personally, I can live with or without those icons. I'm not going to delete or add them. Opposing the performance argument is pointless when it hasn't been made in the first place. The changes appear to have been made without any explanation at all. I'll leave him a note asking for at least an explanation. Eclecticology 20:33, 3 February 2006 (UTC)

We need a vote on every little issue because you dismiss out-of-hand anything I say, but support anything Ncik does. The point of the vote is to discover what the entire rest of the community feels about the issue. Just because Ncik seems to be stalking my edits still (now traversing templates and categories I've touched at some point) is no reason to assume he is making the changes in bad faith; that is why I presented the technical discussion as part of the vote. I've also found that Ncik did make the performance argument here. That in turn refers to Wiktionary:Beer parlour archive/October-December 05#Neat icons where I strongly opposed the notion of including icons for parts of speech on every page. We did have an actual vote on that related topic, then.
Are you afraid of a vote? Is there some technical reason contributors should not be allowed to give input on the topic? --Connel MacKenzie T C 00:39, 4 February 2006 (UTC)
I have looked at both references. This reinforces my view about votes since there are two issues involved, and it simply would not do to have a vote on one imply something about the other. I would not have seen the point on Dangherous' user talk page because I don't generally review the two way conversations that any user has with any other user, but clearly Ncik was talking there about the subject of this complaint. The item in the Beer parlour archives was about having little icons in the headings. For that to be effective would require using templates in places where they are not currently being used. This is very different from adding images to already existing templates.
Votes polarize discussion, and that makes them unsuitable for finding solutions. Your persistent suggestions that Ncik is stalking your articles, or that I am taking sides because of the personalities involved, have no basis in fact whatsoever. Eclecticology 03:02, 5 February 2006 (UTC)
Votes do not always polarize discussion among reasonable people. They often resolve an issue and let us move on to better, more important things. --Richardb 05:02, 26 February 2006 (UTC)
In this case, I could suggest you are taking sides, because the issue is not about adding images, it is about Ncik removing images that have been in place for quite a while now (in conjunction with the ones added only in the last month or two.) As to stalking, what sort of statistics would make it clear to you; what analysis would you accept? --Connel MacKenzie T C 04:30, 5 February 2006 (UTC)[reply]
I did leave a note on Ncik's talk page as I said I would. He hasn't replied, but apparently that's because he hasn't been on line since I put the note there. There were only four pages affected by this round of erasures, this page and the 3 maintenance templates. None of those four images were put up by you, so no conclusion of stalking could be drawn from that action. Stalking, in a context such as this site, would involve a wilful and organized campaign to change your edits and yours alone for no other reason than that they were made by you. It must be viewed in the context of all his edits over the same period of time. Since the accusation is one of wilful malice the burden of proof is clearly on the person making the accusations. In the absence of support for your positions, I would thank you to stop making personal attacks. Eclecticology 07:42, 5 February 2006 (UTC)
I'm not making a personal attack; I am asking for qualification from you about what evidence I should assemble. The simplest evidence is to refer you to Special:Contributions/Nick, but that might amount to a fair amount of reading...you have declined to do that research in the past. --Connel MacKenzie T C 05:26, 6 February 2006 (UTC)[reply]
My version of w:Lynx (web browser) works fine even with those icons, but I really don't see a point in having them. These icons don't add any information to the pages but increase their size. The templates in question stand out sufficiently in their current appearance (having a coloured bar across the page is quite prominent, I think); similarly for the image on WT:BP I removed. If people want the icons they should simply reinstate them. If I had major issues with them, I would let you know. But we shouldn't have a vote or major policy discussion ahead of every change. Ncik 10:46, 7 February 2006 (UTC)
While I agree that we should not be having votes on every little point, it is also important to state that differences of opinion on an issue need to be discussed. The purpose of discussions is to resolve these things. Ncik removed the icons from four articles once; Connel objected. So far so good. Neither engaged in a long cycle of delete and restore on these pages. From Ncik's last comment above it is clear that he does not intend to pursue his point of view, so where's the argument? Where's the basis for a major policy discussion? If someone wants to put those four icons back, it does not appear that there will be any argument.
The argument that these icons would significantly affect server performance is not at all convincing, and as I said before we should not confuse this discussion with the one about having little icons in every heading as is done in the French Wiktionary. Eclecticology 18:25, 7 February 2006 (UTC)[reply]
Thank you for distinguishing between the previous vote and this minor policy vote. Holding an actual vote provides validity to the decision, while providing sysops and other regular contributors with a tool or reference point to rein in wayward maverick newcomers. There is an unrelated issue over at quaint right now that would really benefit from Wiktionary having rules set down somewhere. Not set in stone, but perhaps some soggy clay. To follow that analogy, right now, our guidelines are drawn by foot in the sand on the beach; every minor wave that comes along wipes away tons. Tens or hundreds of hours of rational discussion are regularly obliterated here. Instructing newcomers on what has been discussed and/or decided previously has users such as User:Eclecticology making subtle personal attacks on me (cf. consensus), leaving little room to explain. To the newcomers, all of Wiktionary ends up looking absurd. Again: an actual vote mitigates most of that.
I worded the vote "background" to try to make the distinction clear, with the "Fr: Wiktionnaire icons" vote comparison being made clearly up front. Is there something you'd like to add there that isn't already said?
As to Ncik's implication that Lynx is his browser of choice, I would like to point out several things. 1) It's "lynx." 2) He has been frantically promoting various major and minor redesign efforts the entire time he has been here. He has participated in discussions about testing the appearance of Wiktionary pages on as many different browsers as possible, when making formatting changes. 3) lynx is my browser of choice, but used less and less often now, as I have to see how things are rendered for others.
As to Ncik's assertion that he will not revert, I'll have to take that at face value (despite history being a strong indication that that would not have been the case.) If this vote didn't exist, or didn't have the smattering of approval it has garnered so far, I can only guess how far along the next edit war would be now. For now, I'll make the changes he suggested while the vote continues. --Connel MacKenzie T C 21:46, 7 February 2006 (UTC)[reply]
Actually, if you glance at a page and see a big trash icon, you know it's been requested for deletion without actually having to read the box. This is helpful for people checking through pages quickly or for nonnative English speakers who perhaps would take an extra 30 seconds to parse exactly what that box is all about. A picture is worth a thousand words. And really how much bigger does it make the page if you cache this commonly used icon? And let's not forget the function of sprucing up a page's appearance. That is a valid function too. Millie 11:40, 7 February 2006 (UTC)

Do you want cum dumpster?

This is at Wikipedia and I'm not sure where to ask here as I'm not as familiar with Wiktionary. It has a ref. I did notice there was an article here on it, but apparently it disappeared, which is fine with me. Did the article get deleted here? I just want it gone from Wikipedia, as it has no real content besides being a slang term (and a disgusting one at that IMHO). User:DanielCD at Wikipedia. --156.101.1.5 15:01, 3 February 2006 (UTC)

Note: I couldn't make an account here as the pic that needs verifying didn't show up. I don't know on whose end the problem is. DanielCD

Our deletion log shows this entry being deleted four times now (31 January 2006, 29 January 2006, 14 January 2006 and 10 January 2006.) It looks like it may be a new vandal's hobby term. It has an entry in UrbanDictionary, but probably won't get one here.
I've added a note to MediaWiki:Captcha-createaccount to remind people to turn cookies on - this is the second complaint that the image did not appear, in I think about 10 new accounts created. --Connel MacKenzie T C 23:59, 3 February 2006 (UTC)[reply]
Actually, it looks more like about 50 accounts since that last complaint on Talk:Main Page. Maybe this isn't needed? --Connel MacKenzie T C 01:06, 4 February 2006 (UTC)[reply]
DanielCD mentions that the term is slang and disgusting, but that is not grounds for omitting a term. All we need to know is whether the term meets the criteria for inclusion. There is no bowdlerisation in Wiktionary. — Paul G 11:30, 7 February 2006 (UTC)[reply]

How to get the special characters bar on the Edit form?

Hi All,

While adding some interwiki links, I noticed this great "special characters" bar below the "Save page", "Show Preview", "Show changes" buttons on the Edit form. Does anyone know how to get it for Russian version of Wiktionary? Thanks a lot! ru:User:Xbo

User:Hippietrail has done some amazing work with that. MediaWiki:Edittools, (which used to be just a part of MediaWiki:Copyrightwarning,) MediaWiki:Monobook.js, and MediaWiki:Monobook.css all factor into making that work. Perhaps you could try his talk page. --Connel MacKenzie T C 02:51, 8 February 2006 (UTC)[reply]

Time to rethink being a non-encyclopedic dictionary?

Seeing the first few reactions to my RFD nomination for the encyclopedic entry Martial, what do the rest of us think about our decision so far not to be an encyclopedia dictionary? Is it time to vote? I guess I don't care too much but I see a lot of arguments in the future about "who to include". Anyway I do firmly believe we should either have them or not have them, and not just have a few. — Hippietrail 23:45, 6 February 2006 (UTC)[reply]

IMHO every person, place, artifact, event etc. that is worthy of an entry in Wikipedia should have at least a one-line description here with a link to the relevant article in Wikipedia, for several reasons. Among them:
  • They need translations in many cases. Sometimes the spelling is different, sometimes the name is entirely different.
  • In the future Wiktionary might be used to annotate text, and encyclopedic terms need annotations too.
  • In the future people wanting to know what something means might start searching here and not on Wikipedia.
  • It is easier to allow it than to chase people that disagree. It is an orthogonal issue since it doesn't really affect the other entries.
--Patrik Stridvall 08:59, 7 February 2006 (UTC)
At first glance I support adding encyclopaedic terms, mainly for the purpose of etymology, pronunciation, translations, synonyms, etc, rather than for their definitions, which should be concise or, if that is not possible (I guess this is the case for many scientific terms, whose precise definition often requires a lot of background knowledge and terminology), be rough explanations with a remark that a full definition can be found on some linked Wikipedia page. Ncik 11:03, 7 February 2006 (UTC)[reply]
I'm in two minds about this. I have been guilty (if that is the right word) in the past of contributing what counts as encyclopedic material, namely place names, index of fictional characters and the latter has been largely frowned upon. I'll give my views for and against below without coming down on either side.
Points in favour:
  • As Patrik says, providing translations is a good thing.
  • Wikipedia features translations in the form of corresponding articles in other languages, but most articles are missing most languages.
  • Users might well start looking in Wiktionary rather than Wikipedia (although the latter is probably better known).
  • Users finding a brief entry here could then be cross-referred to Wikipedia (as happens with a lot of our entries already) to find out more information.
  • Wiktionary is not paper, so there is no risk of Wiktionary being "overwhelmed" with these entries or of there being "not enough space".
Points against:
  • What counts? How well-known must a figure be to merit an entry?
Why not leave that to Wikipedia? If it is in Wikipedia it is OK; if not, well, that depends on why, I guess... --Patrik Stridvall 13:03, 7 February 2006 (UTC)
It's not about fame or fortune, it's about use in language.
  • People: Many times a single name like Aristotle, Muhammad, Bolivar, Einstein, Stalin, or Truman can be brought into discussion without any annotation toward who is being discussed. There are other Muhammads and Trumans, and in the right context the general term would not apply, but out of context it's a specific person. Douglass is the opposite case. Although an understanding of the person discussed might be clear within a very narrow subject, such as abolition, out of context it's not, and so specification such as "Frederick Douglass" is required. Frederick Douglass is quite suitable for Wikipedia as a notable figure, but he's most commonly referred to by his full name. You'll note that this requirement has already weeded out figures who may not be well known in other parts of the world. And it agrees with Wikipedia, where "Muhammad" points to the page about the Islamic leader whereas "Douglas" and "Douglass" require disambiguation. (The "Truman" page is also disambiguated, but the U.S. president is bolded as the first name. "Bolivar" even demands a brief discussion of Simon Bolivar before the disambiguation begins.)
  • Place names: Go halfway around the world and see if there's a translation.[1] Austin (in Texas): YES. Waterloo (in Iowa): NO.
I contend that names can be linguistic as token monikers. Davilla 02:44, 10 February 2006 (UTC)[reply]
  • How do we prevent these entries from becoming encyclopedia entries (with length and content similar to Wikipedia's entries)? Do we impose a word or line limit?
The criterion should be that somebody vaguely knowledgeable should be able to say "Ah, now I remember," while the entry is not completely incomprehensible to anybody else. --Patrik Stridvall 13:03, 7 February 2006 (UTC)
  • We risk unnecessarily duplicating the work done at Wikipedia.
Duplicating what? We are talking a single sentence in almost all cases. --Patrik Stridvall 13:03, 7 February 2006 (UTC)[reply]
  • Wikipedia often contains translations (in the form of corresponding articles in non-English Wikipedias).
Yes, but not a complete translation. An entry can have multiple names and only one of them is translated. --Patrik Stridvall 13:03, 7 February 2006 (UTC)[reply]
  • If a person doesn't find the article here, the search results should (but currently do not) refer the user to Wikipedia, in the way that searches on Wikipedia that give no results suggest a search in Wiktionary.
To what Wikipedia? We are multilingual. They are not. Somebody searching for the Japanese word for Jesus is not likely to find it in the English Wikipedia. --Patrik Stridvall 13:03, 7 February 2006 (UTC)
  • A user is more likely to start their search in Wikipedia than here, as that is the better known resource.
At present, yes. However, Wiktionary is likely sooner or later to have MANY more entries than Wikipedia. --Patrik Stridvall 13:03, 7 February 2006 (UTC)
Paul G 11:26, 7 February 2006 (UTC)
I'm in favour of short encyclopedic entries with a link to -pedia. It is common in paper dictionaries (yes, I know that's not a valid argument here), and it does no harm. But, as with all other entries, each one should be considered on its own merit and trimmed or expanded as needed. SemperBlotto 11:35, 7 February 2006 (UTC)[reply]
I think entries like Gulf of Mexico and Virgil are needed, primarily for translations, which are absent in 'pedia (and which should remain so, since 'pedia is not a dictionary). However, I have no idea where to draw the limit. An option is, of course, not to include such things and make 'pedia have translations in their articles. — Vildricianus 13:39, 7 February 2006 (UTC)[reply]
I still think that we should approach this matter very conservatively. Wiktionary is not an encyclopedia. From Vildricianus' comments I would be inclined to keep "Virgil" but dump "Gulf of Mexico". "Virgil" has affected the language; "Gulf of Mexico" has not. There should be a linguistic rationale for including these words.
Perhaps not exclusively linguistic, wiktionary should help people with translation too, in which case I want to know what people of various nationalities call places, people etc. Some lines will be easy to draw, Virgil may be different in some languages, whereas Cher will probably remain the same ;) TheDaveRoss 02:52, 8 February 2006 (UTC)[reply]
A dictionary is for people wondering what some word or phrase means. It is not at all obvious in many cases whether an unknown word or phrase is encyclopedic or not. --Patrik Stridvall 19:57, 7 February 2006 (UTC)
The providing-translation argument is a weak one. We don't know where "Ultimate Wiktionary" or "WiktionaryZ" is heading. As long as it wants to be everything to everybody it won't get anywhere, but one task I would easily concede to them is these endless translation lists. Let's just wait and see where they are going with this, and maintain some focus on what we can do best.
Perhaps, but we are wasting time right now by deleting encyclopedic terms as well as possibly making new contributors so upset that they decide not to join us. Not to mention all the unnecessary discussion with people who feel that their entry is justified under some interpretation of "no encyclopedic content". --19:57, 7 February 2006 (UTC)
When you do not know where WiktionaryZ is heading, it is mainly because of what I perceive as a lack of interest. You have said time and again that you are not interested until there is something to see. WiktionaryZ does want to have content like "George W. Bush" because it is not obvious how this name is written in other languages, it is not obvious what it sounds like. I salute you for maintaining focus on the English Wiktionary, it is the sensible thing to do while at the same time you have to realise that we are working towards being "fully functional", this means that WiktionaryZ intends to include all the information that exists in the Wiktionaries. The Wiktionaries are everything to everybody. Technically for WiktionaryZ there is no difference in including a "George W. Bush" and a word like "dictionary", "glossary" or "thesaurus". GerardM 11:33, 9 February 2006 (UTC)[reply]
Paul made a good point when he said "If a person doesn't find the article here, the search results should (but currently do not) refer the user to Wikipedia, in the way that searches on Wikipedia that give no results suggest a search in Wiktionary." The solution to that should be to fix that one page so that people can go to Wikipedia to look for an entry; not to create an entry for every missing name. There's nothing wrong with those links going to the English Wikipedia. Yes we are multilingual, but a person is looking here because he wants an English explanation, and going to the Japanese Wikipedia to look for "Jesus" just won't be helpful.
Of course they want an English explanation. That is why they are here. However, somebody has seen a Japanese word. Doesn't know it means "Jesus". Searches for it. Doesn't find it here. Is redirected to the English Wikipedia that doesn't find it either... What should they do now? Give up? --Patrik Stridvall 19:57, 7 February 2006 (UTC)[reply]
Jesus is not the best example because it is considered dictionary material, both as the given name and as the religious figure, as would be any translations. Wiktionary does share some entries with Wikipedia, although they are many times more abbreviated. Davilla 01:35, 10 February 2006 (UTC)[reply]
I'm not interested in becoming bigger or having more articles than Wikipedia. I'm far more concerned with quality than quantity. There's also one very important advantage to being smaller. The bigger project will also be the bigger vandalbait. Eclecticology 19:33, 7 February 2006 (UTC)[reply]
When or if we have more entries than Wikipedia we will presumably have many more administrators. --Patrik Stridvall 19:57, 7 February 2006 (UTC)[reply]
More admins means a bigger chance of some of them going rogue and starting internal conflicts. That does more damage to the community (and most likely the project as a whole) than any number of vandals can ever achieve.
I must also warn you about trusting the wikipedias to make a judgement as to what merits inclusion. By now there's close to nothing that you can't write an article about, especially if it nets at least a few hundred hits on Google and happens to be popular among (internet) nerds. Obscure video game characters, individual starship classes in Gundam, webcomics that only have a few thousand readers and anime fan slang are considered both "notable" and "encyclopedic", which hints that both concepts are starting to get dangerously watered down. Wikipedia's criteria for inclusion are so heavily biased towards the needs and passions of young, white, straight, computer-savvy men that if they were applied equally to all other social and ethnic groups just in the West, it would look more like a slightly copyedited backup of the internet than an encyclopedia.
Peter Isotalo 20:36, 15 February 2006 (UTC)[reply]
A dictionary is for words and expressions that people use in their communication. If somebody mentions an obscure video game character, the information that it is an obscure video game character is sometimes useful and can help understanding. The criterion should be that if it helps the understanding of what has been said, it should be included here. --Patrik Stridvall 22:24, 15 February 2006 (UTC)[reply]
The names of obscure video game characters are rarely useful outside of these particular fantasy worlds. Most of them will be deservedly forgotten when the next new video game fad is launched. We really have to look at this in terms of just who our long term audience will be. Without realistic standards we end up inundated with a flood of endless trivia. Eclecticology 00:22, 16 February 2006 (UTC)[reply]
Forgotten, perhaps. However, texts that have been written remain. Furthermore, they usually have names that don't overlap with more important entries, so they don't really clutter things that much. Most people wouldn't even see them. Still, the real problem IMHO is that it is hard, if not impossible, to define an objective standard. Saying max one sentence and a link to Wikipedia is one thing; even the most important historical persons usually don't need more. But forbidding them will just lead to endless discussions because of the inherent subjectivity of any standard. --Patrik Stridvall 09:19, 16 February 2006 (UTC)[reply]
Sonic the Hedgehog yes; ラーク, a minor character from an old Japanese video game, no. Gerard Foley 00:30, 16 February 2006 (UTC)[reply]

traditional names of animals

Is there a word (maybe country name) for the traditional names of animals? e.g. Brock the badger, Reynard the fox etc. SemperBlotto 15:21, 7 February 2006 (UTC)[reply]

Er... well brock is just an ordinary word for badger. Reynard is a literary reference (Reynard the fox)... OED calls reynard "a quasi-proper name given to the fox; also occas. used as an ordinary noun. As a proper name written either with or without capital." I don't know if 'quasi-proper name' is the kind of thing sought. Tom for a cat, to take another example, is given without special comment although the etymology says it is also a literary allusion, deriving from the name of the cat in the 1760 work "The Life and Adventures of a Cat". So: dunno. —Muke Tever 19:28, 7 February 2006 (UTC)[reply]

cross indexing Asian languages

Original Post: After entering in about 100 new words and phrases, I became frustrated by the inefficiency of the input process. I had to go to several different pages just to properly index a single word. I will give an example of such a word, and show how my new method will help out.

The word is "university": 大學, 大学

etc ...

See: Category:大 for a concrete example. Observe how it now spiders out to:

  • Chinese language->Chinese Min Nan POJ index
  • Chinese language->Chinese Pinyin index
  • Chinese hanzi->CJKV radicals->Japanese kanji->Japanese language
  • Japanese language->hiragana index

and so forth and so on ...

Let's say you want to start a Cantonese index. Simply go to Category:大, make the appropriate entry (Category:yue:daai->Category:yue:d->Category:Cantonese Yale index->Category:Chinese language, for example). Once you do that, any words that have the Category:大 placed in them will automatically be added to the correct index. For the programmers out there, think pointers in C, or the Collections classes in Java. I hope all of this makes sense. I think it will once I put a little more plumbing in. A-cai 13:39, 8 February 2006 (UTC)[reply]
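
For readers who find the pointer analogy easier to follow in code, here is a rough sketch of the idea, using Python rather than C. It assumes only the category chain given in the Cantonese example above; the PARENTS dictionary and the ancestors function are hypothetical names made up for illustration, and this is not how MediaWiki stores categories internally. It shows that tagging an entry with one character category makes it reachable from every index above that category.

```python
from collections import deque

# Parent links copied from the Cantonese example above. The dictionary is an
# illustrative stand-in for the category pages, not a MediaWiki data structure.
PARENTS = {
    "Category:大": ["Category:yue:daai"],
    "Category:yue:daai": ["Category:yue:d"],
    "Category:yue:d": ["Category:Cantonese Yale index"],
    "Category:Cantonese Yale index": ["Category:Chinese language"],
}


def ancestors(category, parents=PARENTS):
    """Return every category reachable by following parent links, breadth first."""
    seen = []
    queue = deque(parents.get(category, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against category loops
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


# An entry tagged only with Category:大 can be found from each of these indexes.
print(ancestors("Category:大"))
```

Adding another index (Pinyin, POJ, hiragana) would only mean adding another parent to the character category, without touching the individual entries.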

Note: I linked the category above so I can see what A-cai is talking about. --Connel MacKenzie T C 16:13, 8 February 2006 (UTC)[reply]
Note: I also corrected the category trail above. --Connel MacKenzie T C 18:11, 8 February 2006 (UTC)[reply]
Question: Your goal is to use categories to generate language indexes? I don't know enough about the languages your example uses to comment much. But we do not use Wikipedia-style disambiguation here...anything with a parenthesis in the pagename/category name is probably "wrong." For category names, we use the ISO language codes within topic areas. --Connel MacKenzie T C 18:11, 8 February 2006 (UTC)[reply]
Question: I'm not sure I understand the Cantonese index example. Wouldn't every word in every language which uses that Chinese character then show up in the index? How do you make a Cantonese index that way? Millie 00:33, 9 February 2006 (UTC)[reply]
Question: I'm not sure that I understand it all either. That makes me hesitant to be too critical. I will restrict myself to a few observations.
  1. To whatever extent that categories need to be applied for Cantonese and Min-Nan words we should use the ISO 639-3 codes, which are "yue" and "nan" respectively.
  2. Pinyin is a standardized system for romanization, not for pronunciation. I think that it's important to keep this in mind to avoid getting completely confused by the chaotic maze of transliterations that have developed over the years for Chinese.
  3. Can we avoid using the special font for representation of Min-Nan? We may need to make special accommodation for a vertical line above a vowel, but it seems that the other letters with diacritics are all available.
  4. The references to "C" and "Java" will only help some people since we would hope that what we have here will also be understood by non-programmers.

I'll probably have more to say about this as the thread develops. Eclecticology 08:30, 9 February 2006 (UTC)[reply]

Answer: In response to the concern about parens, I have changed the categories to more closely reflect wiktionary standards. In response to the question about how to do a Cantonese index, please follow the links in my original example and I think all will become clear.

A-cai 10:24, 9 February 2006 (UTC)[reply]

Answer: I should also comment about my reasoning for using POJ to phonetically spell Min Nan. I am not fond of the special characters myself when it comes to inputting new words. As you point out, some of the letters require Unicode fonts such as Lucida Sans Unicode. I decided to use POJ because it is the most widely used among native speakers of the Min Nan dialect. In fact, the Min Nan version of Wikipedia uses POJ. For the Min Nan speakers out there, please see the Min Nan wikipedia help page for guidance on how to input POJ.

A-cai 11:00, 9 February 2006 (UTC)[reply]

Comment: I want to maintain an open mind about the usefulness of your proposed categorization, and how it relates to the entries in the "Index:" [pseudo]namespace. Indexes and categories are not identical concepts. I also do not want to prejudice the status of Cantonese and Min-Nan as dialects or languages. We still need to proceed with some caution. To the extent that categories are appropriate for Cantonese and Min-Nan, please use the format Category:yue:Name and Category:nan:Name respectively. The "zh-" should not be included.

I can see what you are trying to do with attestations, but better than Google links would be actual representative referenced usages of the expression from Chinese writing, with English translations added.

I think too that the language shown for the character entries should be "Chinese" and not "Mandarin". The characters are used (mostly) by all Chinese language versions, not just Mandarin. Using "Mandarin" there would strike me as appropriate if that character exists only in Mandarin. Similarly "pinyin" should be shown as a general standard romanization for Chinese, rather than as a pronunciation for Mandarin. I fully realize that pinyin is based on Mandarin, but without a common reference point any kind of order to these issues would be extremely difficult. Eclecticology 21:49, 9 February 2006 (UTC)[reply]

Response

  1. I am in the process of changing the categories from zh-min-nan and zh-yue to nan and yue. It also saves me some typing!
    Comment: Less typing can always be a convincing argument around here. :-) Ec
  2. With respect to the comments about Mandarin, pronunciation, and example sentences vs. Google hits: ideally, we should have both. For example, see: 三令五申. The Google hits give me information that one example sentence cannot, such as how common a term is in one region vs. the next (look near the bottom of wikipedia:Taiwanese Mandarin for an example of what I'm talking about). I love example sentences, and I have been putting them in where I can, but it is a lot faster for me to put the Google hits in there, which is better than nothing (it takes me five seconds to put in the Google hits, whereas I have to sit and think about an appropriate example sentence, then break it apart, etc.).

A-cai 01:57, 10 February 2006 (UTC)[reply]

Comment: 三令五申 looks fine. I realize that adding in example sentences from referenced usage (especially from Chinese literature) with translations is a LOT of work. At the same time it's the kind of feature that will make Wiktionary a superior work. The Google hits will understandably have to do in the short run. There should be no need to break apart the example sentences and link every character there. In these quotations it should be enough to put the term being studied in bold face so that it stands out. Having an English translation for the quoted sentence is much more useful. Eclecticology 08:35, 10 February 2006 (UTC)[reply]
Response: I should add a comment about why I am linking every word in the example sentences. I think that there is huge potential with wiktionary that is really exciting. I think we are just scratching the surface of what could be done. I did the following translation to illustrate what I think could eventually be done: Preface to the Poems Composed at the Orchid Pavilion. Note how each word in the original Chinese links back to a wiktionary entry!

A-cai 16:18, 10 February 2006 (UTC)[reply]

Response: I have updated the original post so that the example now links to Category:yue:daai as opposed to zh-yue:daai.

A-cai 04:44, 10 February 2006 (UTC)[reply]


Answer: In response to "Wouldn't every word in every language which uses that Chinese character then show up in the index?": yes, Millie, you are correct. That is what we want. Keep in mind that many of the compound words in Japanese, Korean, Vietnamese etc. originally came from Chinese. I do not think it would be too huge of a list. Some of the Chinese characters would have a huge list, to which we could later add a CategoryTOC for easy navigation. For example, currently the largest Chinese on-line dictionary is 國語辭典 (Guoyu Cidian). In the case of a common character such as , it lists 2032 words and phrases that use the character in any position of the word or phrase. In the case of , the number is 2662. My preliminary educated guess is that there will be relatively few compound words that are unique to Japanese, Korean or Vietnamese that would fall outside of that. I really do think that each character will have a manageable number when we eventually cross-categorize everything. Are there any linguistics/statistics majors out there who can either support or refute my theory?

A-cai 05:06, 10 February 2006 (UTC)[reply]

Warning!!! This next part is intended for the programmers who are (hopefully) reading this.

  • I am treating each CJKV character like a C struct that can then be linked to other categories (if one were designing an SQL database, you might achieve the same thing by creating a table for each CJKV character, then linking the CJKV tables back to higher level tables such as one for a Chinese Pinyin index). So far, I have been doing things by hand so that we can come up with a design model that everybody is happy with. Once that happens, it should not be too difficult to write a script that would create a category for each CJKV character. Somebody already wrote a program that did the individual pages with their accompanying phonetic spellings, radical info etc. Now that we have a page for each CJKV character, I am proposing to create a Category for each CJKV character as well. As can be seen from the above example, this will make for an efficient cross-language indexing scheme. Once that is done, all that would be required to add a new entry to all relevant indexes would be to include "Category:CJKV character" in each entry (ex. include Category:大 in the entry for 大人). Is there anyone at wiktionary with both the know-how and the authorization to do this?

A-cai 14:44, 10 February 2006 (UTC)[reply]

Comment: There are programmers reading this besides me. But I think Eclecticology's advice (#4 above) is still pertinent: the category analogy is enough for people to grasp what you are getting at, referring to abstracts might not help as much as you think it might.

C 22:34, 10 February 2006 (UTC)[reply]

Question: I am still unclear on your goal. You don't want characters in their own language? Or you are trying to categorize characters into all languages they are used in? Am I correct in assuming that the CJKV characters themselves are similar to our letters, in that they would rarely be called words themselves? And that there is tremendous overlap from language to language as to which CJKV characters are used?

C 22:34, 10 February 2006 (UTC)[reply]

Answer: Thanks to everybody for all of the constructive feedback! I am in the process of gathering information that will help clarify what I'm after. A quick response to the question about CJKV characters being similar to letters. The linguistics explanation is that each CJKV character represents a single morpheme. A quick illustration that works the same way for both English and Chinese:

The English word is "electromagnetism":

The Chinese equivalent is 電磁, 电磁. (Pinyin: diàn cí)

I picked this word because it is one of the few words where the English and Chinese morphemes match up exactly. The English word is composed of two morphemes: electro and magnetism. The Chinese morphemes are:

A-cai 23:25, 10 February 2006 (UTC)[reply]

For more information about language overlap, read this article which contains a far more rigorous explanation than anything that I could come up with in 15 minutes. In brief, it is estimated that 50% of Korean words, 40+% of Japanese words and 30% of Vietnamese words are of Chinese origin. This is comparable to the relationship between Latin and English (50% of English words are of Latin origin), French, Spanish, Romanian, Italian etc.

A-cai 00:30, 11 February 2006 (UTC)[reply]

Question: So what you propose is adding a category for each CJKV character, containing a set of words from multiple languages? Am I reading this right?

C 22:34, 10 February 2006 (UTC)[reply]

Answer: Correct. I am speculating that the overlap would be substantial enough that it would not be necessary to create separate categories such as Category:ja:大, Category:ko:大, Category:nan:大, Category:yue:大, Category:zh:大 etc. If I turn out to be wrong, it will most likely only apply to a small subset of CJKV characters. I think we could sub-categorize the offending characters on a case-by-case basis.

A-cai 00:30, 11 February 2006 (UTC)[reply]

Question: So what would you want a 'bot to do? Go through all the NanshuBot entries and auto-categorize them by each of their component CJKV characters? And the 'bot would then let you fill in any redlinked categories manually?

C 22:34, 10 February 2006 (UTC)[reply]

Answer: Let's use the character 大 as an example. I created Category:大, then added the following information as you can see from the link:
  1. Radical/stroke count
  2. Chinese Pinyin and zhuyin fuhao (注音符號, also known as bopomofo - this is used in Taiwan) spelling.
  3. Min Nan dialect spelling.
  4. Cantonese dialect spelling.
  5. Japanese hiragana and romaji spelling

To this we should probably add:

  1. Korean spelling
  2. Vietnamese spelling

(I didn't add these two because I do not speak those languages, and would not feel confident about the entries)

If you view the source for Category:大, you will see the following:

See also:

Notice that the only thing different about these two is the top line. This is because they represent the same morpheme. All of the info in there is already found in the entries for and . Min Nan is missing, but if we can't find anything on-line, I can put those in by hand. I have mainly relied on this on-line dictionary for the Min Nan POJ spellings. I have been typing in the Traditional Chinese, then hitting the go button. Does somebody know a way to use this web site in an automated way?

One last thing, I have deliberately left off tone marks in the Categories for the tonal languages (i.e. Mandarin, Cantonese, Min Nan). I think the westerners who struggle with tones will find this easier than guessing the tone and then searching through 4 possibilities for Mandarin (or 7 possible tones for Min Nan!). A-cai 00:39, 11 February 2006 (UTC)[reply]

Answer: With respect to the adding of red-linked categories by hand, this could also be automated. Each entry in wiktionary is identified by language. A program could be written so that whenever a new entry containing CJKV characters shows up, the CJKV character categories would be automatically entered. For example, if someone created an entry for 大人, a program would add Category:大 and Category:人 to the entry.

A-cai 01:25, 11 February 2006 (UTC)[reply]
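
The text-manipulation step described above could be sketched roughly as follows. This is a minimal illustration, not an existing bot: the function names character_categories and add_categories are hypothetical, and "CJKV character" is approximated by two Unicode ranges (CJK Unified Ideographs and Extension A), which is an assumption that a real bot would need to revisit. It does not talk to the wiki at all; it only computes the category links for a page title and appends the ones that are missing from the page text.

```python
import re

# Assumption: a "CJKV character" is anything in CJK Unified Ideographs
# (U+4E00-U+9FFF) or Extension A (U+3400-U+4DBF). Kana and hangul are excluded.
CJKV_CHAR = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def character_categories(title):
    """Return one category link per distinct CJKV character, in order of first appearance."""
    seen = []
    for char in CJKV_CHAR.findall(title):
        if char not in seen:
            seen.append(char)
    return ["[[Category:%s]]" % char for char in seen]


def add_categories(title, wikitext):
    """Append the character categories that the page text does not already contain."""
    missing = [link for link in character_categories(title) if link not in wikitext]
    if not missing:
        return wikitext
    return wikitext.rstrip("\n") + "\n\n" + "\n".join(missing) + "\n"


# The entry 大人 gets one category for 大 and one for 人.
print(character_categories("大人"))
```

Running it a second time on the same page text changes nothing, since only missing links are appended, which matters if a bot revisits entries.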

Comment: If I got this right, you'd start by going to sourceforge and testing what you are trying to do on ten or less entries. Then you'd post a new message here in the beer parlour requesting 'bot status (for a new user account that you create called something like User:A-caiRobot or User:A-caiBot.) After a vote of approval, (usually one week) you'd post another message on meta: to request the 'bot flag. About one week later, you could begin running the bot yourself.
Comment: If technical barriers are in your way, I (or others here) could assist with 'bot environments, cpu time, bot customizations, Python programming, etc. But really it's up to you to figure out (and describe) what specifically needs to happen to individual entries first. If everyone here can't understand it, it will take much longer to get approval. Sysops here are instructed (by the inherited Wikipedia policy page) to immediately block 'bots that are running without the 'bot flag set. --Connel MacKenzie T C 22:34, 10 February 2006 (UTC)[reply]
Response: I could probably write the program myself if I had to. I would rather let someone interested in linguistics/programming write the code. That way I could get back to entering words into wiktionary. I believe there are two reasons why we haven't yet seen a lot of participation from the Chinese/English speaking community:
  1. There are not that many entries yet to make it a useful professional tool.
  2. Until I started reorganizing the Chinese language page, it was a nightmare to find anything. When I first started two weeks ago, there wasn't even a link to a Pinyin lookup on the main page!

Being only one person, I can use my time in one of two ways (since I come from both backgrounds): I can focus on software issues (such as indexing), or I can enter in the words. I know a lot of words and phrases that have never been documented in Chinese-English dictionaries (when I say a lot, I'm talking thousands! I was thinking of writing a book until I stumbled across wiktionary. A light bulb suddenly turned on in my head - 茅塞頓開, 茅塞顿开). I would like to focus on getting those into wiktionary. I suspect that there are far more competent programmers out there than multi-lingual Chinese experts. However, if no programmer steps up to volunteer, I may eventually end up attempting to write the bot myself. A-cai 01:16, 11 February 2006 (UTC)[reply]

Comment

I have some time and think I'd like to try my hand at this. So I don't need any special approval, just write a script that works and bring it here for a vote? Does this bot involve any of the category reorganization you're talking about, or does it only categorize new entries into their component CJKV characters? Just text manipulation doesn't seem like it should be too difficult. The only thing is I'm not familiar with the environment. There should be plenty of references for Python online, right? Is the input and output to a wiki pretty standard? Is sourceforge the bot sandbox for testing, or do I have to set up my own machine to mimic the servers? Davilla 16:52, 15 February 2006 (UTC)[reply]

Comment

I have tested directly on Wiktionary (throttled!) but then, I was re-using the bot functionality that already exists (http://sf.net/projects/pywikipediabot) for my tasks. I think the preferred IDE for Python is Eric. The getting-started documentation at meta:Using the python wikipediabot was helpful. --Connel MacKenzie T C 17:22, 15 February 2006 (UTC)[reply]

Comment

Okay, I'm on it. Or I'll try at least, starting with simple Category additions. If I encounter problems I'll report back. Davilla 11:42, 16 February 2006 (UTC)[reply]

Response

  • Thanks everybody. I think there are a lot of on-line resources that can help us with the wiktionary endeavor. Here are a couple of web-sites that could help us right off the bat:
  1. CEDICT in UTF-8 with both traditional and simplified Chinese
    A script could be written that could format each word in the file (approximately 30,000 words) to wiktionary standards and voila! That would free me up to concentrate on words that are not documented in Chinese-English dictionaries (but are in common usage, you would be surprised at how many words are in this category).
  2. http://cojak.ajax.org
    This is an awesome website. As you will see from clicking on the link, the data associated with each CJKV Unicode character is listed (radical/stroke count along with Mandarin, Cantonese, Japanese, Korean and Vietnamese readings). The site uses a PHP script called index.php that takes a Unicode value as an argument. A program could be written to gather the information for each of these languages. I believe User:Nanshu must have done something similar to this so that he could generate the pages for each character.
  3. http://lomaji.com/poj/tools/su-tian/index-en.html
    This website can provide the data for Min Nan. It looks like a script could be written to query this one as well. For this website, you would need to use UTF-8 encoding (ex. %E4%B8%83) in the url.
  • For more examples of how to create a Category for a CJKV character, please consult the Category:CJKV radical index. If you keep clicking the sub-categories, you will eventually reach the end of the line (which will be a CJKV character category such as Category:捷).

A-cai 12:06, 16 February 2006 (UTC)[reply]
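
The UTF-8 URL encoding mentioned for the Min Nan site can be reproduced with the Python standard library. The sketch below covers only the encoding step; the helper name utf8_percent_encode is made up for illustration, and the query URL format of the site itself is deliberately left out because it is not given above.

```python
from urllib.parse import quote, unquote


def utf8_percent_encode(text):
    """Percent-encode text as UTF-8 bytes, escaping every reserved character."""
    return quote(text, safe="", encoding="utf-8")


# 七 is U+4E03, which is the three bytes E4 B8 83 in UTF-8.
print(utf8_percent_encode("七"))   # %E4%B8%83
print(unquote("%E4%B8%83"))        # 七
```

Each character of a longer word is encoded the same way, so a two-character word becomes six percent-escaped bytes.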

Morocco

Not sure where to post this, or if indeed I should, but I am working at the moment in Marrakech and will be here for a month or so...so if any editors have any niggling queries in the realm of Moroccan Arabic I will be glad to attempt to find answers. Widsith 17:18, 8 February 2006 (UTC)[reply]

Language links in Translations section

Where should these actually link to? The templates (that I'm substituting) link to 'pedia, but the manual links are mostly internal links. — Vildricianus 22:05, 9 February 2006 (UTC)[reply]

Wouldn't someone translating into Russian already know what Russian means, inside and out? Wouldn't it be most useful to link to the Wikipedia Wiktionary of that language? Davilla 01:19, 10 February 2006 (UTC)[reply]
"Common languages" shouldn't be linked at all. It is subjective what not-common might mean, but generally, those link to their respective language pages on Wikipedia. The focus is supposed to be on the reader/learner of the information. A core premise of MediaWiki is to share knowledge with people, in their own language. --Connel MacKenzie T C 01:23, 10 February 2006 (UTC)[reply]
I agree about the "common languages" issue, but generally I consider the links to be to that name in this project. Russian is clearly a common language and would not be linked. The ones that are linked are there because the reader may be unfamiliar with an obscure language. If he wants to know more than the rock bottom basics about the name, the reader can then go to Wikipedia. Davilla's response makes me wonder about whether we are answering the right question. Eclecticology 01:39, 10 February 2006 (UTC)[reply]

Hmm, I should have phrased my question better. My main concern is not what to link, but where to link. More particularly, there was this template {{ca}}, which stands for Catalan. Should this be Catalan or Catalan? (Y'all agree Catalan needs a link). Personally, I am in favour of internal links, referring people to our smaller article, which already clarifies things a lot by providing a translation. That page then contains a Pedia-link, so people have the choice to go further or not. The current templates all link to Pedia, except Ido (which I changed yesterday for testing), which leaves people no choice at all. Instead, they're being directly transferred to Pedia, away from Wiktionary.

Concerning the choice of which languages to link, we should perhaps make up a list of what we consider as being "common, well-known languages". — Vildricianus 08:36, 10 February 2006 (UTC)[reply]

Well if it's only the tiny ones, I guess it doesn't make much sense to link to ca.wiktionary.org after all, seeing as how little is there. Davilla 09:01, 10 February 2006 (UTC)[reply]
Actually, that might encourage native speakers to further develop the small wikis. --Connel MacKenzie T C 21:12, 10 February 2006 (UTC)[reply]

Look at the example links I provided, I'm not talking about linking to the Catalan-language wiktionary, it's about linking either to the English article "Catalan" here or at Wikipedia. — Vildricianus 09:11, 10 February 2006 (UTC)[reply]

Oh, you didn't say I wasn't allowed to think outside the box. Those two are my only choices, are they? Then link to the Wiktionary page, as in your example of Ido, from which surfers can arrive at the Wikipedia page, as you've already mentioned. They can also arrive at the Ido Wiktionary, to your chagrin I'm certain, as you've explicitly ruled out a direct link to that. Davilla 59.112.39.84 10:18, 10 February 2006 (UTC)[reply]

I dunno. I liked the Wikipedia links for languages. Perhaps we could link internally to that language's Index: page? --Connel MacKenzie T C 21:12, 10 February 2006 (UTC)[reply]

As a matter of fact, it didn't occur to me to link to anything other than either the Pedia or the Wiktionary entry. I considered these links primarily to be clarifying the raw facts of that particular language, not guiding the user away to that language's Wiktionary or to an index of its words. It didn't occur to me either that this choice was still quite up for grabs as it seems to be. I was looking for a standard, but it seems as if each entry has slightly different standards on this. If, however, we would link to other Wiktionaries or Index pages, we need to wikify all languages. — Vildricianus 21:41, 10 February 2006 (UTC)[reply]
Yes, and thinking of all the translations given for multiple tenses, that's a good reason not to do it. I didn't know the policy and I apologize for framing this in terms of Russian, which is probably where the confusion originated. But now that it's understood, let's take it off the table.
An alternative that would handle this new issue I've created is to have a language selection for the search area in the left margin, as is done on the root Wiktionary home page. The problem of course is that we don't want people to select Spanish just because they're looking up a Spanish-language or borrowed word. Perhaps under the Go and Search buttons there could be text that reads "Words in any language. Definitions in" + English defaulted drop-bar. The French Wiktionary would have the same translated with French as the default. Wiktionaries that define only words in their language, rather than words in all languages, would not need any explanatory text. Davilla 04:06, 11 February 2006 (UTC)[reply]

Passed rfv?

Discussion copied here from [WS talk:RFVA]. --Connel MacKenzie T C 00:57, 10 February 2006 (UTC)[reply]

Perhaps we should have a template that says "this entry survived our rfv process" with a checkmark logo or something? It could have some verbiage about why the citations are retained in the entry and perhaps a warning not to resubmit it to rfv? Or should such a thing go through the WT:BP for a vote, first? --Connel MacKenzie T C 17:52, 27 January 2006 (UTC)[reply]

Sounds like a nice idea, maybe you should run it by BP. —Muke Tever 19:23, 28 January 2006 (UTC)[reply]
Is the purpose of this to give confidence of definitions to users of the dictionary? If so then it might act as a deterrent to modifications and would at the same time be easily open to abuse. Is the purpose solely to keep RFV's from reappearing? Then it should go on the talk page, including the definition(s) that survived RFV in unchangeable text. Davilla 01:14, 10 February 2006 (UTC)[reply]
I've often wondered whether rfv and rfv-sense should be separate pages. In either case, when verification has been received the evidence should be put right there on the page. It would seem that there would be a bigger problem when something is rejected, and keeps reappearing. The current POV silliness about Jahbulon is something else. Eclecticology 01:48, 10 February 2006 (UTC)[reply]
Splitting rfv and rfv-sense is interesting. Would the citation requirements change for one or the other? --Connel MacKenzie T C 21:05, 10 February 2006 (UTC)[reply]

I was only asking about the entries that are retained because they have three or more citations now on the entry's page. Something that conveys what was disputed and what was resolved. That way, if as the result of an rfv, some nonsense was removed and citations provided for the real meaning, it can be identified as such. Then when someone decides to re-add their made-up meaning it can be quickly removed without repeating the whole process.

Yes, that might discourage spurious edits. Like everything else here, it would be open to (trivial?) abuse.

--Connel MacKenzie T C 21:05, 10 February 2006 (UTC) edit 21:06, 10 February 2006 (UTC)[reply]

A sense of an existing entry that is not properly attested could go on the talk page rather than on the protologisms page, relevant discussion included somewhere on the page if there is any need for it. But really the quotations speak for themselves. The definition lingers in talk, at the top of the page, until it gets all three.
Actually, that wouldn't be a bad way to handle neologisms either, those which have at least one attestation. The neologism page could simply link to the talk page of non-existent entries.
Does this address your question? Davilla 11:53, 16 February 2006 (UTC)[reply]
Partly, except that for neologisms (etc.) it is long standing practice not to retain talk pages for pages that are deleted or don't exist. --Connel MacKenzie T C 17:25, 16 February 2006 (UTC)[reply]
Right, and this is getting off the point, more in line with Eclecticology's concern, but: how would one preserve discussions about those entries, the ones that failed? Davilla 05:55, 17 February 2006 (UTC)[reply]
We already do preserve them at WT:RFVA. I had the summary of that page transcluded towards the top of this page (as this receives more sysop traffic) but someone removed that from its logical place. --Connel MacKenzie T C 07:22, 17 February 2006 (UTC)[reply]

Capitalization

Hello, this is Dumiac from the Romanian Wiktionary. I have noticed that on the English Wiktionary (and in some other languages), articles can start with a small letter. In the Romanian one (and in some other :P) this is not possible. If you write the URL with a small letter, it is capitalized automatically. For instance, if you insert ro.wiktionary.org/wiki/word, it becomes ro.wiktionary.org/wiki/Word. Does anybody know how to change this feature? Because I think that in dictionaries it does matter whether a word begins with a capital letter or not. Thanks! Dumiac 09:52, 10 February 2006 (UTC)[reply]

Have your wiktionary decide whether it isn't just you, and if the ro.wikt community at large wants to implement it, then bug a developer (or post on bugzilla) to change the setting. —Muke Tever 17:56, 10 February 2006 (UTC)[reply]
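
To make the behaviour Dumiac describes concrete, here is a minimal Python sketch of the two title-handling modes. The function name `normalize_title` and its `capital_links` flag are hypothetical names chosen for illustration; this is not MediaWiki's actual code, although to my knowledge the real site setting that controls the behaviour is `$wgCapitalLinks`.

```python
def normalize_title(title: str, capital_links: bool) -> str:
    """Return the page title a wiki would look up for a typed title.

    capital_links=True mimics a wiki that forces the first letter to
    upper case (the ro.wiktionary behaviour described above);
    capital_links=False mimics a case-sensitive wiki.
    """
    if capital_links and title:
        return title[0].upper() + title[1:]
    return title


print(normalize_title("word", capital_links=True))   # Word
print(normalize_title("word", capital_links=False))  # word
```

With the flag off, "word" and "Word" remain two distinct pages, which is what a dictionary needs.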

People's Names

I'm interested in the meanings, and origins of people's names. First names, last names, they're all interesting to me. I've done some brief research into the names of people I know, and was going to put it on my own wiki for personal reference, but then thought others might benefit from what little I can add.

It seems that there are names here on wiktionary, so I presume that this is the proper wiki to put name origins, meanings. So, first question: Is this the best place for the meanings and/or origins of people's names?

Next question: I notice that there is a holt entry, and a Holt redirect to holt. "Holt" is a last name. What is the best way to handle that? A disambiguation page?

I'm not a veteran wiki editor, and don't usually edit things. If these are issues better dealt with via existing policies, FAQ's, etc, please let me know where to find them. I did not find them when I looked earlier (although my search was by no means exhaustive). --Ron Johnson 02:53, 11 February 2006 (UTC)[reply]

Wiktionary doesn't name specific people like an encyclopedia or in fact most dictionaries, but we do list and encourage the addition of generic names, especially with etymologies. See the Criteria for Inclusion.
When I looked up "Holt", it brought up the holt page with the text "(Redirected from Holt)" at the top. I clicked on the link Holt and edited to replace the contents:
#redirect [[holt]]
with this:
''See also'' '''[[holt]]'''
==English==
===Proper noun===
'''Holt'''
# A [[surname]].
Now you can edit the etymology. Davilla 04:29, 11 February 2006 (UTC)[reply]
Remember to also add {{see|Holt}} to the top line of holt. --Connel MacKenzie T C 05:20, 11 February 2006 (UTC)[reply]
Also, we don't really like entries that just say "a surname" - see what I have done with Holt and holt. SemperBlotto 09:03, 11 February 2006 (UTC)[reply]

Can a name really be treated like a proper noun? What I mean is that a proper noun corresponds to one unique thing (there is only one Moon), whereas names (and surnames) are given to a lot of people (there are a lot of people called Pierre). I think that the names and surnames should have their own title, like:

{{see|[[holt]]}}
==English==
===Surname===
'''Holt'''
* An English and north-west European topographic surname for someone who lived by a small wood.

Using a # is incorrect since there is no order and it is not a definition; a * would be better. What do you think? - Dakdada 13:23, 11 February 2006 (UTC)[reply]

The third-level heading is the part of speech. Surname isn't a part of speech. We use # even when there is only a single definition. This is to make it easier for people to add a subsequent definition (as I have done for Holt and holt). SemperBlotto 14:29, 11 February 2006 (UTC)[reply]
Holt is a proper noun because for anyone given that name, it refers to that specific individual.
You folks think of a lot of stuff I don't. Forgot Wikipedia even had that standard of a single star.
Is there a way to double-check the see also references at the tops of pages robotically? This may also apply to other sections like anagrams, total bot turf, and (most) homophones. A good enough bot could even place synonyms to be disambiguated in a special section, e.g. if someone lists "cook" on the chef page but doesn't list "chef" on the cook page. Davilla 06:14, 12 February 2006 (UTC)[reply]
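
A rough sketch of the reciprocity check Davilla has in mind, in Python. It assumes the bot has already extracted, for each page, the set of pages it lists in a given section (see-also, synonyms, and so on); the function name `missing_backlinks` and the sample data are hypothetical, and fetching and parsing the pages is left out.

```python
def missing_backlinks(links: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where source lists target
    but target does not list source in return."""
    missing = []
    for source, targets in links.items():
        for target in targets:
            # A page absent from the mapping is treated as listing nothing.
            if source not in links.get(target, set()):
                missing.append((source, target))
    return sorted(missing)


# Hypothetical extracted data: "chef" lists "cook", but not the reverse.
synonyms = {
    "chef": {"cook"},
    "cook": set(),
    "holt": {"Holt"},
    "Holt": {"holt"},
}
print(missing_backlinks(synonyms))  # [('chef', 'cook')]
```

The pairs it returns are only candidates for a human to review, since a one-way link is not always a mistake.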

Thank you. The Criteria for Inclusion for names is exactly what I was looking for. I'll check out formatting for some existing entries before I start adding the ones I have. --Ron Johnson 00:49, 12 February 2006 (UTC)[reply]

New Main Page

Please comment on Main_Page/Redesign, the current iteration evolved from this discussion. If there seems to be general approval, we'll take it live soon. —Dvortygirl 07:17, 11 February 2006 (UTC)[reply]

How to describe unusual, incorrect alternate forms

I recently encountered the word vandalization for the first time. What's clearly meant by users of this word is vandalism, however looking at a google search this is actually not that uncommon of a mistake to make, probably due to regularized expansion of the form vandalize.

Anyway, we should have a mention of the alternate usage at vandalism, but I'm not sure how to write that into the vandalism article. Is it a synonym? Thoughts are appreciated, thanks. Scott Ritchie 09:17, 11 February 2006 (UTC)[reply]

Same goes for that weird Americanism burglarized, = burgled. Though I have only ever heard that on Ricki Lake etc.. Widsith 10:12, 11 February 2006 (UTC)[reply]
Vandalization is a perfect English word, not a mistake at all (according to sOED, it originated in the early 19th century, which makes it about as old as vandalism). Also, discussions like this should go in the Tea room in order to relieve this too long page. — Vildricianus 11:27, 11 February 2006 (UTC)[reply]
The word burglarized is a perfectly valid US English word, used in preference to burgled (the CW/UK word.) Prescriptivistic notes go in the ===Usage note=== section, but be careful to present both points of view. ===Alternative spellings=== is the header where the other spelling should be listed. --Connel MacKenzie T C 17:44, 11 February 2006 (UTC)[reply]
BTW, According to m-w.com, burgled is the "incorrect" form; a back formation from burglar. --Connel MacKenzie T C 17:47, 11 February 2006 (UTC)[reply]

Download Wiktionary

I'm doing research dealing with language, and I'd like a copy of Wiktionary on my computer. Is there an easy way to get this? Short of searching for 100,000 words? I found http://download.wikimedia.org/enwiki/20060125/, but had trouble actually getting stuff off of it. The file was read as an archive and then broke when opened. I think the 1 GB compressed file might be too much for my computer, is there a place to get a copy of just Wiktionary? -Quantumelfmage, 15:58, 11th February, 2006

You can install MediaWiki http://sourceforge.net/project/showfiles.php?group_id=34373 on your home computer, download the dump file, use mwdumper http://download.wikimedia.org/tools/ to convert the dump file to SQL and import it into your database.
I haven't done this myself actually but from searching around online that seems to be the way to do it. Personally, I've been using berkeley db xml to play with the xml directly and search using xquery. Millie 22:02, 11 February 2006 (UTC)[reply]

Quantumelfmage, you were doing the correct thing, almost. http://download.wikimedia.org/enwiki/20060125/ is the Wikipedia download page. The Wiktionary download page is http://download.wikimedia.org/enwiktionary/ or the most recent dump (there should be a new one tomorrow or the day after) is at http://download.wikimedia.org/enwiktionary/20060130/enwiktionary-20060130-pages-meta-current.xml.bz2 for downloading. Note you must be able to "bunzip2" the file. It is 26.8MB downloaded, 168MB uncompressed. --Connel MacKenzie T C 07:02, 12 February 2006 (UTC)[reply]
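
For research use, the dump can also be read directly without installing MediaWiki or running "bunzip2" first. The following is a minimal Python sketch using only the standard library; the function name `iter_titles` is hypothetical, and it assumes the dump is a MediaWiki XML export in which each page element contains a title element.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_titles(path):
    """Yield page titles from a bz2-compressed MediaWiki XML dump,
    streaming so the uncompressed file never has to fit in memory."""
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
            if tag == "title":
                yield elem.text
            elif tag == "page":
                elem.clear()  # release the finished page's subtree
```

Looping over `iter_titles("enwiktionary-20060130-pages-meta-current.xml.bz2")` would then give the entry names one at a time; pulling out the article text as well would follow the same pattern with the dump's text elements.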

Silly rabbit, templates are for tricksters!

FYI I have modified some of the templates at MediaWiki:Nogomatch as commented on the talk page. Davilla 00:07, 12 February 2006 (UTC)[reply]

It needs to be made clear that most templates can always be replaced by real text. Eclecticology 00:00, 16 February 2006 (UTC)[reply]
Most, but I don't understand what you're getting at. Wouldn't the pre-loaded templates for new pages be an exception to that? Davilla 07:28, 18 February 2006 (UTC)[reply]

Javascript

For those of you wondering why I'm making major changes to Monobook.js, it is to get the experimental page

http://en.wiktionary.org/wiki/User:Shibo77/translation

to work like

http://en.wikipedia.org/wiki/User:Shibo77/translation

so that we can reopen discussions about translations in general. So far, zh:, de: and fr: are doing this to hide translation sections dynamically. --Connel MacKenzie T C 07:12, 12 February 2006 (UTC)[reply]


Please take a look at the local page User:Shibo77/translation, then show and hide the translations a couple times. Please reply here with comments. This begs the question, "Should we replace ==Translations== with =={{trans}}==?" --Connel MacKenzie T C 09:46, 12 February 2006 (UTC)[reply]
First of all I don't see how you are going to make that work. There is no way I can see that can get end tags after the section with the syntax above. Secondly it is very ugly.
I have a better solution, which unfortunately requires modifying MediaWiki. I have done a test implementation which seems to work. The idea is that you add a new page MediaWiki:sectionheader4 which by default contains <h4>$1</h4>. It is called every time a level 4 section header is rendered. Now this doesn't quite solve this problem so I added that it tries the subpage having the same name as the header first. In this case MediaWiki:sectionheader4/Translations.
Sure to solve the problem above we need support for MediaWiki:sectionfooter4/Translations as well which I haven't implemented yet. Of course I'm not sure the feature would be accepted since it would slow down page rendering and thus increase the load on the servers. But perhaps it would only be marginal since the pages are cached.
In any case the point is that there are alternative solutions that are less ugly. We really need to think about adapting MediaWiki instead of adding layer after layer of kludges. --Patrik Stridvall 19:20, 12 February 2006 (UTC)[reply]
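
The lookup order Patrik describes can be sketched in a few lines of Python. This is only an illustration of the fallback idea, not his MediaWiki patch: the function name `render_header4`, the `messages` dictionary standing in for the MediaWiki: namespace, and the "collapsible" class are all hypothetical.

```python
DEFAULT_HEADER = "<h4>$1</h4>"


def render_header4(heading: str, messages: dict[str, str]) -> str:
    """Render a level 4 header, preferring a heading-specific message
    (sectionheader4/<heading>) over the generic sectionheader4 one."""
    specific = messages.get(f"sectionheader4/{heading}")
    generic = messages.get("sectionheader4", DEFAULT_HEADER)
    template = specific if specific is not None else generic
    return template.replace("$1", heading)


# Only the Translations header is customised in this example.
messages = {"sectionheader4/Translations": '<h4 class="collapsible">$1</h4>'}
print(render_header4("Translations", messages))  # <h4 class="collapsible">Translations</h4>
print(render_header4("Synonyms", messages))      # <h4>Synonyms</h4>
```

A matching sectionfooter4 lookup would work the same way; the open question raised above, the cost of the extra lookups at render time, is not addressed by this sketch.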
Indeed, I was probably getting ahead of myself suggesting the translations header be replaced with a template that included that gobbledygook. At this point, the discussion should be limited to whether or not hiding translation sections is a Good Thing™. --Connel MacKenzie T C 20:44, 12 February 2006 (UTC)[reply]
Special template? Couldn't it just be added to {{top}} and {{bottom}} ? —Muke Tever 00:59, 14 February 2006 (UTC)[reply]
Probably. However, they are used for things other than translations. Anyway, I still think that we should modify MediaWiki instead. I will see what I can do, however don't hold your breath. So far I have mostly done some testing to convince myself that it could be done without too much pain. The real pain will likely be to actually get it accepted in some form and then deployed. It will presumably take a lot of time... Writing code is fun. Open source "politics" less so... --Patrik Stridvall 21:19, 14 February 2006 (UTC)[reply]
This feature is already on the cards for MediaWiki but I'm sure they'd appreciate people who want to work on it: Bug 1257: allow sections to be collapsible. Hippietrail 00:52, 15 February 2006 (UTC)[reply]
I am of the opinion that being able to dynamically hide stuff using CSS is a great way forward in functionality for the end user. The code needs to be hidden in templates because it is absolutely useless for editors.. PS there are many more translations for water on the Dutch Wiktionary .. :) GerardM 16:25, 12 February 2006 (UTC)[reply]
A good thing. Something in the code must not be closed though. See my edits on that page, above and below the section, to add context. Davilla 03:41, 14 February 2006 (UTC)[reply]
Corrected I see. Does the template apply to each sense, or to all translations for the POS? If it's the latter then someone should get feedback on the collapsible sections timeline. If it's the former then you could have a standard ===Translations=== header, and I would go ahead with implementation. There are cosmetic issues, but those can be addressed later within the template. Davilla 12:01, 16 February 2006 (UTC)[reply]

A "lively" and relevant discussion about the style of quotations is going on at template talk:new en. 59.112.37.243 04:07, 14 February 2006 (UTC)[reply]

"New Messages" on IP with blank user talk page

When I went to the main WT page before I logged on (read: I was an IP), I got the new messages but when I clicked on the link it said the page does not exist. Is this just me and a cache problem or have other people had this happen too? Tawker 19:00, 14 February 2006 (UTC)[reply]

This is the fruit of an interesting discussion with karmosin about our gender templates ({{m}}, {{f}}, {{n}} and {{c}}). The idea is simple, but the execution may be trickier. Several of you may be aware that other Wiktionaries, for example nl:, have links in their gender templates to a helpful page explaining this nebulous concept. In this example, we have nl:WikiWoordenboek:Genus.

Now, karmosin and I were discussing (fairly heatedly, at points), of all things, the fact that many people may not know the terms for the genders, the ramifications of having genders, etc etc. What I propose is that this page, and it may well need to be expanded to Wiktionary:Grammatical agreement, should have short paragraph(s) explaining the concept(s) in very basic terms, and suited to our purposes, with subsections for each language. Nothing in excessive detail, that should be left to Wikipedia, in my view. The templates would then all link to this page, the tooltips remain (redundancy here is excellent), and everyone is happy, right? Probably not, but I can try. :)

I hope what little I have done is enough to display my intentions here. I must leave it for now, as that treacherous entity known as 'time' is working against me. Please, tell me your thoughts, and ask me about all the things I've doubtlessly forgotten to say. --Wytukaze 00:10, 16 February 2006 (UTC)[reply]

I'm not sure if your proposal is targeted at users or at translators. Since you want to keep it short, I presume you mean the former. Wiktionary needs references for its users, and the shorter the better. Besides this topic, there should be a pronunciation key and guides for common dictionary language like "prefix" and "past participle". I think the policy here is not to abbreviate, as unlike print dictionaries there is no need, so that's one thing that won't be included. The pronunciation key I might work on after a bit if anyone can tell me how it could be linked from every page. Maybe in navigation would be best? Davilla 09:22, 17 February 2006 (UTC)[reply]
I'm karmosin, but I recently switched user accounts and started signing with my real name as I've done on other wikis for some time now. My concern when discussing with Wytukaze was that for a lot of users it would often be more relevant to see the actual name of the article in nominative ("die/der/das", "la/le", "el/la", etc.) rather than the name of the grammatical gender. For German and most Romance languages it probably isn't a biggie, since I guess that most speakers and foreign students know what masculine/feminine/neuter means, but for languages that have far less intuitive genders, such as Danish/Norwegian/Swedish it might be more instructive to simply write en/ett instead of c(ommon)/n(euter). I'm not insisting very strongly on this, but I want to point out that far fewer people know what the two Swedish genders are called, especially in English. If you ask Swedish-speakers what the genders are called in Swedish, a lot of them might not know the terms at all or use the older Swedish terminology reale/neutrum (modern terms: utrum/neutrum), but en/ett is instantly understood by anyone, no matter their theoretical knowledge of grammar.
Peter Isotalo 13:12, 18 February 2006 (UTC)[reply]
The template {{m}} is used for many languages. In German a word has a gender and given how a noun is used you could have "der des dem den" with that noun. It is therefore best to indicate the gender and expect that the person using the dictionary knows how to use this information for a particular language. GerardM 14:35, 18 February 2006 (UTC)[reply]
Precisely. Remember too that "m", etc, give the gender of the noun, while "der"/"le"/"il"/"el", etc, are articles (specifically, one of the definite articles used in German, French, Italian and Spanish, respectively). Now, a user looking up "apple" and finding "der Apfel"/"la pomme"/"la mela"/"la manzana" might well think that each of these phrases means "apple" (which is, after all, what is being translated) when each one actually means "the apple". This confusion is all the more likely with lesser-known languages that have articles, or those that inflect the noun instead of using an article. If the user wants to translate "an apple" or "some apples" rather than "the apple", then they'll get it wrong. On the other hand, omitting the article leaves the user free to look up "the", "a", "some", etc, according to what they actually want to say or write. The "m" shows that they need to find the masculine singular translation of "the", "a" or "some", as appropriate. — Paul G 14:33, 21 February 2006 (UTC)[reply]
Actually, it assumes that the person knows the terms for the genders, not the genders themselves, which was what I was trying to explain in the post above. It probably works for languages with fairly simple, consistent or at least mildly logical systems, not for all languages. And I was talking about writing Film der instead of Film m, not putting it before the translation or as a part of the (wiki) article name.
Peter Isotalo 11:41, 22 February 2006 (UTC)[reply]

Wiktionary:X language

The intention and use of these "Wiktionary:X language" pages (like Wiktionary:Japanese language, Wiktionary:Romanian language) needs to be better explained, if it's not already explained somewhere. Anybody have a link? I notice that "Wiktionary:Japanese language" seems to be superfluous, because there is Wiktionary:About Japanese. Alexander 007 07:49, 16 February 2006 (UTC)[reply]

You're right, at least the Japanese one is now redundant. The links to it should be revised, and the page deleted. I'll investigate the others as well. Eclecticology 09:46, 18 February 2006 (UTC)[reply]

A new system message, check it out! The design I used is a little crap, but you get the idea. Gerard Foley 00:41, 17 February 2006 (UTC)[reply]

Well done. I like it. --Connel MacKenzie T C 08:00, 17 February 2006 (UTC)[reply]
Looks good. I might suggest adding a link to the Community Portal since a lot of newcomers with questions won't know where to post them.
In my opinion there should also be a version of the Main Page that is editable, and these edits would be periodically reviewed and updated, making the page semi-protected instead of fully protected. Right now the only way to change anything on the front page is through criticism and a bureaucratic process that allows for revisions, yes, but not tweaks, whether kept or improved or reverted, that result in a great collaborative work.
If the discussion page redirected to the page you have created, as a talk page for an editable Main, knowledgeable users could look at the ideas and try to implement them on the editable page. Davilla 09:10, 17 February 2006 (UTC)[reply]
The problem is that all MediaWiki: namespace pages are protected. Attempts to unprotect them fail. The easiest way around that is to use a template like Hippietrail did (long ago) with MediaWiki:Recentchangestext --> Wiktionary:Wanted entries. --Connel MacKenzie T C 09:14, 17 February 2006 (UTC)[reply]
I don't mind that the Main Page is protected, even in its entirety, as a guard against vandalism. But when pages are protected, even simple things are a pain to address. For instance, on MediaWiki:Nogomatch the word "remember" is misspelled. Now think, would you expect someone to discuss changes to fonts or wording as well, changes a person with special clearance won't need to implement if they don't agree with them? Personally I don't even experiment with such things because the edits are minor, potentially lateral, and can't be saved, and style is hard to argue. Protection turns minor changes into major hassles, which is why, besides the necessary and indisputable, only major revisions ever take place and turn into efforts initiated by a single person.
Would you mind if I moved Talk:Main Page to Talk:Main Page Revision (with redirect) where Main Page Revision is a duplicate of the Main Page (or better yet Dvorty's new version that hasn't been copied over yet)? Do you think it would be useful to keep updates made to Main Page Revision, useful enough to check it periodically? Davilla 14:56, 17 February 2006 (UTC)[reply]
Thank you for pointing out my spelling error. I requested comments on those proposed changes 03:47, 17 July 2005 (UTC) and despite several attempts at getting feedback, received only one comment (a "go for it" from Polyglot) in all this time. I do agree that page protection is problematic.
Talk:Main Page is protected from moves. Main Page/Redesign is the current staging area for the new Main Page (which has not had a vote yet, as comments are still being solicited, and edits are most certainly allowed.) Are you suggesting that Main Page/Redesign become a permanent unprotected page that sysops would then monitor and use for updates to Main Page? I like that idea. A lot. But for all MediaWiki: namespace pages, a better solution is probably needed. --Connel MacKenzie T C 18:18, 17 February 2006 (UTC)[reply]
I wasn't aware that changes were encouraged to Main Page/Redesign. Yes, I think it should be a permanent solution. Application to other protected pages (even those on Wikipedia, should they follow suit) would require a discussion of naming convention. For now what I want to do is delete the contents of Talk:Main Page, moving it to Talk:Main Page/Redesign and replacing it with a (protected) redirect to the same (to Talk:Main Page/Redesign). Too bold? Davilla 07:07, 18 February 2006 (UTC)[reply]
Yes, that would be too bold. Talk:Main Page is heavily used right now, by people who can't find the right place to ask a question for one reason or another...i.e. if they are not fluent in English. Why would you want any of these pages moved? --Connel MacKenzie T C 07:26, 18 February 2006 (UTC)[reply]
The reorganization is admittedly ad-hoc if you agree that there should be a general solution for protected pages, maybe using different tab names at top. But regardless of how it's done, the idea is to group the protected page, its editable revision page, and the discussion page (which applies to both) together. That there's a revision page now is not worth as much if people don't know about it, and not everyone is going to hunt down something they don't know exists. The way it's set up, visitors tend to ask for changes in Talk:Main Page, if at all, rather than going ahead and making the changes themselves on Main Page/Redesign. However, if the discussion page were redirected, then the redesign page would be encountered immediately.
As far as Talk:Main Page being used as an information desk, I don't know what to do. Maybe it's hopeless, and we should just consider it to be a talk page as it applies to Wiktionary in general. In that case the redirect really is a bad idea. Maybe instead there could be a notice at the top of the page linking to the redesign. It's one more click, but it might be necessary. Davilla 17:50, 18 February 2006 (UTC)[reply]
Aha! There's already a link there, it's just hidden below the massive contents section, and it misleadingly suggests discussion only. Davilla 06:23, 19 February 2006 (UTC)[reply]

Semi Protection?

Discussion moved to Wiktionary:Votes/2006-02/Semi Protection?.

Capitalization when starting an article.

I'm sure this has been discussed before somewhere, but I'd like to find out as soon as possible. Why do articles start with a lower case letter? I personally find it to be quite peculiar, to say the least. Is it a style guideline, or is it just a personal preference, to have the article started with a lower case letter? KnowledgeOfSelf 23:59, 17 February 2006 (UTC)[reply]

It is a very poor rule of style recently implemented. It was insisted on by the users of some wiktionaries; the inciting argument, which I always thought was weak, was essentially "because certain German words are capitalized", followed soon after by "because print dictionaries do it", which is silly because print dictionaries don't have article titles, only headwords. The ugly meme has spread to many (but not all) Wiktionaries, such as this one, which have a rule where a word must be placed at the lowercase form, unless it must always be capitalized, and the proper title-casing of a word is thus a carefully-guarded secret not to be annotated anywhere. (The switch allowing lowercase article titles was actually activated on the English wiktionary before there was any consensus, and there had to be a lot of hasty cleanup when it first happened, but that's another story.) —Muke Tever 01:06, 18 February 2006 (UTC)[reply]
Pretty irrelevant criticism, in my view. The title of a page is never actually taken advantage of, since it's always repeated in one way or another at the start of the article. Same with Wikipedia. Focusing on it in discussions about layout is pretty bogus. The layout still basically mimics that of print encyclopedias/dictionaries, and frankly, this is a good choice, since it's not an arbitrary one and is proven to work. Making people who are used to print dictionaries feel at home is a very good reason to imitate the quite practical layout of other works of reference. Don't try to reinvent the wheel just because we're working on a wiki...
But what's this about annotating title-casing? What exactly is it that's unclear under the current system? Where should it be annotated and why?
Peter Isotalo 23:24, 28 February 2006 (UTC)[reply]
  • I agree with Peter. The title should also be set in lowercase type because setting it uppercase would encourage improper usage and give an inaccurate description of how the word is spelled.--Primetime 23:58, 28 February 2006 (UTC)[reply]

Freeware Application Usage of Wiktionary / WikiSaurus

Apologies if this policy is covered somewhere... I've looked around but haven't seen it.

I've written a Visual Thesaurus application. Currently it only runs on Mac because it uses the Oxford dictionary included with OS X for its data source. I'd like to GPL the app and make it available to the community, and change the backend to use WikiSaurus. Is this permitted? Is there documentation of a particular API I should use somewhere?

Thanks!

Peter

New words I added

Konglish, Korean English, dubu, bulgogi. How is my style for these new words? Any suggestions for referencing words that are used in daily Korean English conversation but don't appear in major western dictionaries? Glennh70 09:20, 18 February 2006 (UTC)[reply]

Hi there. I would prefer the ==language== entry to be just English, and the definition to start #(Korean English). It's ===References=== rather than ==References==. I think that placenames are just Proper nouns even when they have more than one word. Everything else is fine. SemperBlotto 10:13, 18 February 2006 (UTC)[reply]

  • The ==References== section is used for all entries on a page. Making it a subsection means that it only provides documentation for the ==English== sense. This is why in published material, ==References==, ==Notes==, etc., are set as sections rather than subsections.
    --Primetime 10:25, 18 February 2006 (UTC)[reply]
The standard header here is "Usage note" not "Notes", and applies to a language entry, therefore is at level three or four. I have yet to see references that truly should apply to "all" language entries on a page. The conflict is that level two headings in the main namespace are used exclusively to identify languages. --Connel MacKenzie T C 16:13, 18 February 2006 (UTC)[reply]
I was referring to sources referred to as notes. In any case, I think it's a shame that someone would add several subsections for the sources of each entry. That would certainly clutter up the page, and using superscript numerals of course numbers the sources, making referencing in one section easy. Some use parenthetical references, usually with the author's last name and a page [e.g., (Johnson, 18)]. References for these sources are arranged alphabetically by last name (or title, if no author can be found), making a subsection arrangement unnecessary, as well. --Primetime 18:37, 18 February 2006 (UTC)[reply]
"Usage notes" are notes about usage. Any other kind of notes certainly don't belong under a heading of that name. — Hippietrail 17:49, 18 February 2006 (UTC)[reply]
Sorry if that was ambiguous: References go under ===References===, not ===Usage note===. --Connel MacKenzie T C 02:04, 19 February 2006 (UTC)[reply]
It's ===Usage notes===, not ===Usage note===. Just because there might be only one usage note at the time of creation of the section does not preclude there being others in the future. The same applies to other sections such as "Synonyms" and "Antonyms". — Paul G 15:07, 21 February 2006 (UTC)[reply]
It's not quite the same. Presumably all information about an entry would be in one "note" that can be added to indefinitely. Synonyms, antonyms, etc., are lists of items. When User:Muke last pointed this out to me, WT:ELE was quite different than it is now. Either one is fine with me. Currently, WT:ELE says ===Usage notes=== so that's how it should be, I suppose. --Connel MacKenzie T C 23:59, 21 February 2006 (UTC)[reply]


Wiktionary Mirror

I wanted to let you know that Askfactmaster is running a Wiktionary mirror in violation of the GFDL. An example of a page copied is http://askfactmaster.com/dict-en/dog . Note that there is no link to the GFDL and the notice is an image, so text-only users can't see it. Furthermore, there is no history section or link to the wiktionary definition. I found this site through w:Wikipedia:Mirrors and forks. If there's a more appropriate place on Wiktionary for this information, I apologize. Please contact me with any questions at w:User talk:Superm401 because I very rarely use this account. Superm401 21:45, 19 February 2006 (UTC)[reply]

The dog example above seems to match our January 2005 version of dog, so they probably used the January 23rd, 2005 database dump. Looking at w:Wikipedia:Mirrors and forks#Non-compliance process, I am left wondering who is supposed to send the letter? Our primary bureaucrat? Everyone that ever contributed to dog prior to February 2005? What is meta:Non-compliant site coordination's role in this process? --Connel MacKenzie T C 00:40, 24 February 2006 (UTC)[reply]
Well, if you notice, they are actually copying all the definitions. Thus, any contributor has reason to send the letter. As for the meta page, it's basically dead. Superm401 15:28, 24 February 2006 (UTC)[reply]

Another one in violation of GFDL: http://www.all-science-fair-projects.com/science_fair_project_dictionary/LART --Connel MacKenzie T C 23:30, 25 February 2006 (UTC)[reply]

List of protologisms and protologisms by topic?

These two lists seem to be exclusive of each other (unless one has entered the word on each manually) and there is no cross-linking between the two. Should one enter a protologism in both lists until this feature is fixed, or am I missing something? AturoUrbo

The list by topic looks like it snuck in unnoticed as someone's experiment. That page has only been edited once since it was added in September. --Connel MacKenzie T C 02:35, 20 February 2006 (UTC)[reply]

Protologisms not searchable

Although these words may not be commonly used (yet), it seems reasonable that protologisms should be findable from the search box. --AturoUrbo 14:37, 23 February 2006 (UTC)[reply]

Yes, the search box is supposed to show you Appendix:List of protologisms when you search for any of its component words. It's just that "search" doesn't work very well. SemperBlotto 16:31, 23 February 2006 (UTC)[reply]

Pronunciation guides

I've had some discussion about making proper pronunciation guides for use in articles at Template talk:IPA and what's sorely needed at the moment are separate pages for separate languages. Currently we have only the generic Wiktionary Appendix:IPA Examples which is rather unwieldy and doesn't have much room for the additional comments needed to cover the particular needs of individual languages. I've made a first attempt for a pronunciation guide hub at Wiktionary:Pronunciation out of the existing pronunciation key for English with other languages linked at the bottom of the page. The one for Swedish is more or less complete, but that's it. I'm going to work on guides for Japanese and Chinese (Mandarin), but it would be really swell to get some help with other major languages like German, French, Spanish, Italian, Portuguese, Dutch, etc.

Peter Isotalo 23:46, 20 February 2006 (UTC)[reply]

Prefixes, suffixes, and combining forms

As I have received messages from User:Keffy about suffixes and combining forms, I would like to better clarify whether "-based", "-backed", "-on-demand", and the like should be considered suffixes or just combining forms while these phrases without hyphens are English words by themselves. Despite Wiktionary:Be bold in updating pages, I would like to get more opinions to promote a uniform style.--Jusjih 01:08, 21 February 2006 (UTC)[reply]

Jusjih was doing some much needed clean-up, adding "Suffix" categories to things that had "Suffix" as their third-level heading. I noticed that some of the things he was re-categorizing just didn't make sense and asked if he could hold off the iffy cases. The problem may not be with just the pages Jusjih was working on, but a wider inconsistency of how Wiktionary uses terms. For suffixes, we have:
The -based of carbon-based is no more a suffix than the -faced of two-faced or the case in bookcase. If we start calling the second half of every compound a "suffix", then eventually we'll be applying the term "suffix" to almost every word in the language.
So what do we call the second half of a compound on Wiktionary? When I noticed Jusjih re-categorizing some of these things from the suitably vague-sounding "combining form" to "suffix", I assumed it was "combining form". But that's certainly not the definition we have at combining form. We've got at least three types of things (suffixes, second halves of compounds, bound Latin and Greek roots) and only two terms to go between them, which have been applied haphazardly.
Suggestions? Keffy 03:07, 21 February 2006 (UTC)[reply]
They're parts of compounds, not suffixes. Most of them can even be used on their own, which is exactly what pretty much any affix can't. Things like "-sized" or "-mobile" should be described briefly under size and mobile, but a lot of them look ripe for deletion.
Peter Isotalo 08:28, 21 February 2006 (UTC)[reply]

These are called combining forms by print dictionaries, as they are the forms of ordinary words that combine with others to make compound words. Affixes (prefixes and suffixes) are not words in their own right (but sometimes become words later). So:

  • -handed (as in "left-handed"): this is from "handed" (or "hand", if you like), which is a word in its own right, so "-handed" is a combining form
  • -basher (as in "Bible-basher"): this is from "basher", which is a word in its own right, so "-basher" is a combining form
  • anti- (as in "antifreeze"): this is derived from a Greek word meaning "against", not from any English word, and cannot stand by itself, so it is a prefix. While "anti" is a word in its own right (meaning "someone who is opposed to something"), this is derived from the prefix, not the other way around.

I do not think that -sized, etc, should go under size, etc, as this is not where users will look them up, in my view. In any case, "-sized" and "-size" are spelled differently from "size", and so cannot go on the page for "size". Having separate entries for -sized, etc, also allows derived terms to be listed there.

Note that neither "book" nor "case" in "bookcase" are combining forms - they are just words. The criterion here is that combining forms themselves are not words in their own right (although they are derived from true words). "-Sized" is not a word, but "sized" is.

Clarification: The criterion is whether the form can be used by itself (that is, when it is not combined with another word). "-sized" cannot, so it is a combining form. "Book" and "case" can, so these are not combining forms. — Paul G 16:54, 21 February 2006 (UTC)[reply]

To summarise:

  • Combining form: A hyphenated form that cannot be used by itself but is derived from the word without the hyphen (eg, "-armed" from "armed", meaning "having arms");
  • Affix: A prefix or suffix: a hyphenated form that cannot be used by itself (although a word might subsequently have been derived from it by dropping the hyphen) and is not derived from the form without the hyphen (which may or may not exist) (eg, "-dom", "-some", "pre-" and "psycho-"; although the word "some" exists, it is not the origin of the suffix "-some"; and although the word "psycho" exists, this comes from the prefix, not the other way round);
  • Any other component of a compound word that is an unhyphenated word that can be used by itself is neither of these (eg, "lap" and "top" in "laptop").

The third category above avoids proliferation of the case illustrated by "bookcase" mentioned by Keffy.

I hope this clarifies things.

Paul G 14:56, 21 February 2006 (UTC)[reply]

So did I use the correct heading for -safe, or should this even be an entry? Davilla 16:24, 21 February 2006 (UTC)[reply]
"-safe" comes from "safe", which is a word in its own right, so the header should be "Combining form". I think it does deserve an entry as the meaning isn't deducible from "safe" alone. It's on a par with -free (as in "lead-free" and "tax-free") and -proof (as in "waterproof"). — Paul G 16:46, 21 February 2006 (UTC)[reply]

I've just checked category:English suffixes and agree with Keffy that the following should not be categorised as suffixes; they are combining forms by the criteria I have given above: -on-demand, -armed, -basher, -bashing, bearing (should be -bearing), -goaler, -handed, -looking, -mobile, -pounder, -sized, -speak. Additionally, -free and -word are combining forms, coming from "free" and "word" respectively. The "English suffixes" category should therefore be replaced by "Category:English combining forms" on all of these pages.

The following might also be combining forms, but I am not sure. The thing to do here is to check the etymology of examples for each to see if they are formed by combining words with these terms: -cycle (some of these come directly from French, I would think); -fold (as in "threefold"); -head (a combining form in the sense of "someone addicted to something", eg, "pothead"; maybe not in the sense implied in "godhead" and "maidenhead"); -hood (as in "sisterhood" - as for the second sense given for -head); -less (probably is a combining form); -like (probably a combining form); -ship (possibly a suffix); -ward (possibly a suffix, although in some longer-established words, such as "toward", it is not); -wise (depends whether "wise" as a word by itself has a similar sense); -worthy (probably is a combining form, as it means "worthy of..."). These need to be checked and moved to the suffixes category if appropriate. — Paul G 17:26, 21 February 2006 (UTC)[reply]

Similarly, the following affixes (prefixes and suffixes) are incorrectly categorised as combining forms, and need to have their category changed to "Category:English prefixes" or "Category:English suffixes" as appropriate. They are affixes because they are not derived from the form without the hyphen (which may or may not exist): -cephalic, -chondrion, -cracy, -crat, -dom, electro-, endo-, ethno-, Euro-, -graph, -holic, ortho-, -philia, -phobia.

Further, prefixes and suffixes are not subcategories of combining forms. — Paul G 17:36, 21 February 2006 (UTC)[reply]

The prefixes category looks OK to me. — Paul G 17:47, 21 February 2006 (UTC)[reply]

According to the New Fowler, a combining form is "A term, probably first used in the OED itself, for a linguistic form that normally occurs only in compounds or derivatives as a means of coining new words." The emphasis on "only" is mine. Thus electric has the combining form electro- which would not stand by itself. When a hyphen appears in an expression it may not really belong to either element of that expression. The hyphen is often syntactical, and does not belong to either element of a compound. Many of the items listed above like -sized or -on-demand probably don't need separate articles at all. They clearly are not suffixes. Eclecticology 00:41, 23 February 2006 (UTC)[reply]


How encyclopedic can we be?

I'm pretty sure that cadence is too encyclopedic for a dictionary - but by how much should it be trimmed? How do we decide? SemperBlotto 22:51, 21 February 2006 (UTC)[reply]

IMO, #7 should be moved to ring cadence and #6 trimmed and moved to drum cadence.
Done. Davilla 17:09, 7 March 2006 (UTC)[reply]
I have simplified #2 #4 (running) without losing any information. It is quite similar to #3 #5 (cycling) but off by a factor of 2, since a revolution requires pumping with both legs, so the detail is required. Davilla 05:32, 22 February 2006 (UTC)[reply]

en categories

I've noticed some categories whose names start with "en:" (Category:En:Food, Category:en:Proverbs). Isn't that a bit redundant? I guess I understand the need for, say, Category:bs:Alcoholic beverages, but to do this with English words seems like overkill to me. - dcljr 23:34, 21 February 2006 (UTC)[reply]

You are right. Feel free to remove the "en:" when you see it. That goes even more so for "En". Eclecticology 10:19, 22 February 2006 (UTC)[reply]
To be a truly multilingual dictionary, Wiktionary should include language pointers on all words, not only non-English words. I'd say, "feel free to add the 'en:' when you don't see it". Jon Harald Søby 10:38, 22 February 2006 (UTC)[reply]
I think, where en: is omitted, it is assumed, as this is the English Wiktionary. This is already the case for many of the categories. So I think the "en:"s should actually be removed, if anything. — Paul G 16:09, 22 February 2006 (UTC)[reply]
The other effect is in categories. Not having the "en:" (combined with having category names begin with a capital letter) enables us to begin with a master category list for all English language terms. We should in theory have more words in English in this Wiktionary than in any other single language. This then allows us to use lower case language codes to create parallel categories for each other language as needed, and these can then be sorted in the same way with all the categories that are operational for a given language being kept together. Eclecticology 00:52, 23 February 2006 (UTC)[reply]

Verifiability

Are citations from the most-reliable dictionaries in the world enough to prove that a word has entered the English language? The reason that I ask is that some people claim that only quotations count and that dictionaries don't. However, on the Criteria for inclusion page, it says that it's attested if it has obtained "Clearly widespread use", or "Usage in a well-known work". I cited the Oxford English Dictionary in the entry "metrosexual", but it is being said that it isn't enough. They say without three quotations, the sense in question will be deleted. On one page, I added four citations--three from published dictionaries. However, it is being said on its entry on the RFD page that it will soon be deleted.

I think that if something is published in a widely-respected dictionary (e.g., from Merriam-Webster, Oxford University Press, Random House, etc.), it has to have achieved widespread use. But, regardless, if being published in a book that has been thoroughly reviewed is not enough, perhaps we should remove those two bulleted points on the "Criteria for inclusion" page saying that such usage is enough and just say that the only criterion for attestation is three quotations? --Primetime 01:27, 22 February 2006 (UTC)[reply]

Have to disagree with you on one point: all of those dictionaries, as far as I'm aware, most certainly do include rare words, and I know for certain that OED includes nonces, marked as such (although the transparently-derived ones, IIRC, don't get their own headwords). You are not going to get far by insisting your sources are above the rules; it would be better to go to the rules process itself and try to effect changes. On la: for example I suggested that a dictionary or ‘mention’ citation might count for half of what a running text or ‘use’ citation does; if that were adopted here your words could be included on six dictionary cites, say, or two dictionaries and two running text cites, etc. —Muke Tever 23:18, 22 February 2006 (UTC)[reply]
Merriam-Webster dictionaries have, on average, many more than three citations on file for each sense printed. For example, the entry "magnetic resonance imaging" has 30 citations on file, and "greenmailer" has 19. Words used by a very small group of people or used for a short period of time are not included. (For more information, see "The English Language," Merriam-Webster's Collegiate Dictionary, 10th ed. (1995) p. 29a.) As for the OED, the same principle applies: "Words that are only used for a short period of time, or by a very small number of people, are not included."[5] As for evidentiary requirements, "a word may be included on the evidence of only a few examples, if these are spread out over a long period of time. Conversely, a large number of examples collected over a short period of time can show that a word has very quickly become established." Every entry I have ever seen in the OED has at least three quotations included with a definition in the dictionary. Many--especially older words--have many more.
I also disagree that accepting citations from dictionaries would be against policy. As I said above, the policy clearly states that one can cite "Usage in a well-known work" or "Appearance in a refereed academic journal" or "Usage in permanently-recorded media, conveying meaning, in at least three independent instances spanning at least a year". The fact that there is the word or in the list implies that either practice would suffice. I think the insistence on the RFV page by some is due to a misunderstanding of the attestation rule. Most editors here, I have observed, consider one citation from a published source enough. --Primetime 01:01, 23 February 2006 (UTC)[reply]
It is against policy, because the criteria are about usage. A dictionary headword is not per se an attestation of usage, it is a mention. You can assert that the other dictionaries have attestations of usage, but the very point of CFI and RFV is to collect attestations of usage, not mere assertions of attestations of usage, which anyone can make; the idea as it stands is to have objective proof of a word being used, not merely to go by anyone’s word whether they be anonymous user, trusted user, urbandictionary, or trusted dictionary. —Muke Tever 01:47, 23 February 2006 (UTC)[reply]
That is a somewhat unique interpretation of the word usage--and probably one that the writer(s) of the "Criteria for inclusion" page did not intend. Mentioning something is, by definition, using it. In any case, per the inclusion criteria of every published dictionary I know, a mention in the work in question is proof of usage by many people. I can see we're not going to get anywhere with the usage vs. mention debate, but why do you think that a citation in a dictionary with a strong proof-of-usage requirement is not proof of usage? --Primetime 02:03, 23 February 2006 (UTC)[reply]
Ceci n’est pas une pipe. Muke Tever 05:03, 24 February 2006 (UTC)[reply]

As I understand it, the OED's attestation criteria are similar to Wiktionary's if not stronger. Therefore it would seem highly reasonable to take it that if the OED accepts a word as attested then the attestations exist even if they are not readily available online. I am even less familiar with Merriam-Webster's attestation policy but I would be very surprised if it was less strict than Wiktionary's. Indeed there has been recent talk of including every word from the current OED as a project purely on the basis that they exist in the OED so they should exist in Wiktionary. I would say quite strongly that the OED or MW is sufficient attestation to merit an article being kept. This is not on the other hand an excuse for complacency, other attestation is still valuable and should still be sought. MGSpiller 23:14, 22 February 2006 (UTC)[reply]

I don't understand Primetime's complaint about intent. The edit history, talk page, beer parlor archives and RFD history from the June 2005 time-period indicate that there was nearly universal support of the "running text" distinction. The wording of it went back and forth many times, but the authors of CFI very specifically intended to exclude dicdefs and other secondary sources. --Connel MacKenzie T C 02:17, 23 February 2006 (UTC)[reply]

The main thing we tried to thwart with attestation rules was the efforts of some to load the project with protologisms, local uses and other unverifiable materials. Running text usages are certainly the best evidence, but this does not mean that published dictionaries cannot be used as sources. In the long run those pieces of evidence may be re-inforced by running text quotes, but we are a long way from being able to do that in any practical sense. Eclecticology 02:41, 23 February 2006 (UTC)[reply]
Since we certainly don't want submissions from urbandictionary, could you please provide a list of dictionaries, encyclopedias or other references that Wiktionary should acknowledge as suitable (for bypassing RFV at this time) then? --Connel MacKenzie T C 21:21, 23 February 2006 (UTC)[reply]
Do we really need a list of works, though? As a rule of thumb, I consider a published reference work to weigh about as much as three quotations (e.g., a printed dictionary or encyclopedia). I consider other printed, or widely-respected online, sources to weigh about two (e.g., a university website or less well-known book). I consider amateur web sites (e.g., the LookWAYup Translating Dictionary)[6] to weigh about one quotation. Listing each work, however, would be impossible. --Primetime 00:26, 24 February 2006 (UTC)[reply]
Yes, we’d really need a list of works. “Published reference” is no good, as anybody can get published: there are books of protologisms, and published prescriptivists who invent rules of usage quite counter to actual usage. “Widely-respected” is far too fuzzy a criterion (whose respect? at what point does it become wide? this is already a problem with the vague widespread-use criterion; we don't need to multiply it).
In any case, taking dictionary cites (whether from a whitelist or from dictionaryspace at large) introduces a rather large problem: they all, save when plagiarizing from each other or in the simplest situations, will be defining a word’s semantics differently. A concrete example:
m-w.com: tripod
  1. a vessel (as a cauldron) resting on three legs
  2. a stool, table, or altar with three legs
  3. a three-legged stand (as for a camera)
AHD: tripod
  1. A three-legged object, such as a cauldron, stool, or table.
  2. An adjustable three-legged stand, as for supporting a transit or camera.
Webster 1913: tripod
  1. Any utensil or vessel, as a stool, table, altar, caldron, etc., supported on three feet.
  2. A three-legged frame or stand, usually jointed at top, for supporting a theodolite, compass, telescope, camera, or other instrument.
You have, counting conservatively, four different definitions here—a three-legged vessel (m-w), a three-legged platform (m-w), a three-legged object (ahd, webster), and a three-legged instrument stand (m-w, ahd, webster)—all supported by dictionaries. So problem one is, if we take dictionaries for reliable sources, we already end up with more senses outlined (‘defined’ in the literal sense) than any of the individual ones, and not, apparently, in a useful way.
In addition, we have a discrepancy: m-w specifically considers a vessel tripod to be a full separate sense from a stool/table/altar tripod, while ahd and webster conflate them. Problem two is thus the question of whose authority you're going to give credence to. Does m-w have more senses because that's what their proof-of-usage criteria turned up, or is it padding to protect against copyvios? Does AHD have fewer senses because their proof-of-usage criteria were too weak to catch all the nuances m-w's did, or because they were strong enough to see there was no distinction being made? Without access to their data and their reasoning processes, how can one presume to say?
While Wikipedia strives to be a tertiary source, Wiktionary has for some time now attempted more to be a secondary source, looking at the data directly: instead of arguing fruitlessly about the merits of AHD vs m-w, we can look at usage of tripod directly and see whether these various senses are opposed, and if opinions differ, at least they're differing on something concrete, not something ethereal like whose dictionary has the more impressive genitalia. —Muke Tever 05:03, 24 February 2006 (UTC)[reply]
The problems that Muke outlines are very real. "Tripod" is not a particularly contentious word, but it still has the difficulties that he outlines. More abstract terms or political philosophies can be the source of much bigger headaches. When we make up our own definitions, often with the very valid purpose of avoiding copyright infringement, we risk making the situation even worse. The philosophical essence of NPOV is to give credit to all verifiable sides of the story, without favouring any and without engaging in original research. There is much room for debate about just what that means when viewed at a practical level.
When applied to the conflicting definitions of tripod it simply means that we have to document them all. Getting into debates over which dictionaries are acceptable or good enough will get us nowhere; it would be the basis of unending debates. Some, like the OED or Webster's, are obvious candidates, but if you go into the history of the Webster, and the 19th century court battles over the use of the name, one needs to seriously ask what we mean by Webster's. I recommend David Micklethwait's Noah Webster and the American Dictionary to anyone who wants to appreciate the extent of the difficulties. Actual usage of a word in a written text remains the best evidence, but we also need to recognize that existing dictionaries all went through some set of selection criteria before they entered each word. Perhaps in time what we get from existing dictionaries can be expanded by real quotes. Properly tracing and documenting a single word can take many hours, and even then can only be done if a person has access to adequate references. I don't think we can ignore obscure publications either; many of them are specialized glossaries for particular subjects, eras or places.
We really need to take a serious look at where Wiktionary as a whole is going. Wiktionary's current Alexa 3-month ranking is 5,698. That's up 6,120 places over what it was 3 months ago. (We still need to keep in mind that 25% of that is due to ru.wiktionary!!) We are 12th among what Alexa classifies as dictionaries, well ahead of oed.com. Cambridge is in 11th; some of the others are specialized dictionaries. At the same time this is happening with (as Alexa reports) only 87 other sites linking into ours. I haven't analyzed that thoroughly, but it would seem like an unusually high rank/link ratio. To me that suggests a high growth potential.
Becoming bigger has a downside, like more attention from vandals. It would be good to hear other ideas of what others see happening in the next year or two. Eclecticology 02:09, 25 February 2006 (UTC)[reply]
  • I think the tripod example was only to illustrate why we don't (didn't?) rely on any secondary sources for RFVs. Therefore, if that practice is truly changing, a finite list (taking into consideration all the difficulties outlined above) is required for white-listing secondary sources. For example, the current OED probably should not be white-listed, as it includes obscure terms and is not available to all on-line.
  • The Alexa reports indicate to me that we should prepare for an influx of newcomers by aiming for a higher percentage of sysops. But that issue seems to be taking care of itself, for now. --Connel MacKenzie T C 03:04, 25 February 2006 (UTC)[reply]
    One problem with creating such a list is that there are hundreds (probably thousands) of very reliable dictionaries and encyclopedias. There are dictionaries of art (e.g., the Grove Dictionary of Art, 34 volumes, published by a division of Oxford University Press),[7] of biography (e.g., Dictionary of Literary Biography, 328 volumes, published by Thomson Gale),[8] encyclopedias of religion (e.g., The Encyclopedia of Religion, 15 volumes, published by Macmillan),[9] and of bullfighting (e.g., Barnaby Conrad's Encyclopedia of Bullfighting [1961]). To determine the reliability of each book, one would need to read reviews about it, and it would be literally impossible to do so for each book. Granted, reviews are available online on such databases as EBSCO's digitized collection of Book Review Digest issues--assuming your local library has a subscription to EBSCO--but the magnitude of such a task remains insurmountable.

    In any case, I personally consider a collection of three quotations to be inferior to a definition from a published dictionary, as three quotations--especially for hard-to-define words--is far too few to ascertain a lexeme's true meaning. One would need many more, and they would need to be from a broader spectrum of media than available online, to surpass the quality of most published dictionaries. Most periodicals older than a week or so, television and radio transcripts, as well as most books, are unavailable to the general public via Google. --Primetime 04:28, 25 February 2006 (UTC)[reply]

    Yes, three citations is not much. We already know that; hence criteria for inclusion delineates that as the minimum number of citations to support the sense of a non-notable word—any fewer and clearly we don't have enough information to work with. In the future we hope citations will accrue as readily as translations do today; some examples of words with many citations already are tsunami (tsunami/Citations) and tidal wave (tidal wave/Citations). —Muke Tever 05:14, 25 February 2006 (UTC)[reply]
    Availability through Google is not a criterion. Dead tree books are just as valid. The quotes should be verifiable, but we don't need to go out of our way to make it easy for them. They may need to go to a library, or even seek an interlibrary loan to verify a quotation. I have recently picked up a copy of the Canting Crew, possibly the first dictionary of English slang ever published. If I use it for a reference on some point, I don't expect it will be easy for anyone to find another copy. Three citations for some usages can be a lot when a word is first documented. I don't think we should read too much into that rule; it was intended to put a brake on a lot of ephemeral online material that may not appear anywhere else. Eclecticology 07:03, 25 February 2006 (UTC)[reply]

A general comment on this and other topics:

  • Eclecticology says: The philosophical essence of NPOV is to give credit to all verifiable sides of the story, without favouring any and without engaging in original research.
  • Wiktionary:Copyrights says: Quoting a sentence from a contemporary author to illustrate his use of a word is an important example of fair use, but properly identifying the source is still essential.
  • This page also gives some interesting arguments.
  • I say: Seems clear, but to me it isn't. Especially with regard to dictionaries, there is supposed to be a grey area between copyvios and original research, where we operate. I am, however, often confused as to where the borders of those two extremes are, where we stand considering "fair use" etc.

For example: there are two ways for me to find and add a citation for a word: either I found it in another dictionary, or I obtained it from an original text. The first instance seems to be a copyvio, the second looks like original research, right? Now am I wrong on this, or are the only citations we're supposed to add to come from works in the public domain? Because I've been experimenting with the Project Gutenberg texts and I'm willing to add quotes to our entries directly from texts found there.

The above is just an example; the same problem seems to arise concerning definitions, as constructing our own also looks like original research to me. If not, can someone define "original research" for me?

Furthermore, it becomes clear that these terms are still pretty vaguely defined on Wiktionary. I had to read a bunch of Wikipedia pages to obtain information on them, but they're mostly irrelevant to a Wiktionarian, and therefore I suggest someone expand our Wiktionary:Copyrights page and related ones.

In general, our steady growth according to Alexa is a perfect warning for us to review, update and primarily expand our policy pages to protect new contributors from confusion, and ourselves from the chaos such contributors can cause (I hope I'm not one of them). The simple fact that Category:Policy - Wiktionary Official is nigh empty indicates that there's work to be done. Recently, Eclecticology said somewhere that Wiktionary relies more on practical experience than on written policy, and although that allows quick and easy revision of procedures and methods, which keeps this project fun and saves it from Pedia-like Wikistress and red tape, we can't forever rely on them, as pressure on experienced users will increase, talk pages will lengthen with trivial questions easily solved on a clear policy page, and the Beer Parlour will grow exponentially. This from the stance of a relatively new contributor. — Vildricianus 10:49, 25 February 2006 (UTC)[reply]

‘Original research’ is a tough term, and the way that Wikipedia uses it is somewhat of an idiom. The ‘research’ in question doesn’t refer to looking things up; in fact the opposite of ‘original research’ is ‘source-based research,’ i.e. the kind of research one does by looking things up and having citable sources. For Wikipedia there are primary and secondary sources: the primary sources are the actual data or artifacts gathered on the topic, such as a da Vinci painting, or one of his journals; the secondary sources are those that analyze or interpret the primary sources, such as a criticism of the painting or a translation of the journal. An encyclopedia is a tertiary source that collects and describes the primary and secondary sources of information on the topic at hand, and ‘no original research’ means not to create primary sources; for example, you would not dig up da Vinci’s bones for Wikipedia to try and prove your theory that he was a woman, though you could cite the publication of someone else who had done so; alternately you could do it and get published yourself, though I understand citing yourself on Wikipedia is a faux pas. For Wiktionary, which tries to be a secondary source, our research should be built on primary sources, which in this case refers to attestations of the word in question in print and in other media. Secondary sources would be other dictionaries, which describe the word, but either do not use it, or invent an example to illustrate their analysis. (Now, you could cite a dictionary that uses a word in a definition of some other word—but that's a different matter.) ‘No original research’ here at the dictionary means that we are not field linguists, and we do not go out with our tape recorders to collect examples of people using a word in one meaning or another; we go with sources (usage) already published. 
One can assert all one likes that a word has a particular meaning, but unless one has sources to back it up that they didn't produce, well... (Following some earlier discussions in #wiktionary, asserting that a word is a non-standard spelling—or its POV variant ‘misspelling’—would also probably fall under the umbrella of original research unless the authority who determined it is not standard is cited; alot is an example here.) —Muke Tever 13:03, 25 February 2006 (UTC)[reply]

Lovely discussion, but from a practical point of view, when editing a word, would the Wiktionary Gods prefer:

  • A dictionary reference?
  • One cited quote?
  • Three cited quotes?
  • A dictionary reference and one or three cited quotes?

A simple vote would be helpful. Regards Andrew massyn

Talking about citations, I looked up the word calabash and came across a note to the effect of 'for citations of this word, look up the category "citations"', and sure enough, there it was! It would seem to me that the citations are best put at the word. If one wishes to link to a citations category, it can be done, but personally I think this category is unnecessary. Sincerely Andrew massyn 21:28, 4 March 2006 (UTC)[reply]

Grape varieties and wines

Are these nouns or proper nouns? Should they be capitalized anyway? We have Chardonnay but zinfandel. SemperBlotto 10:20, 22 February 2006 (UTC)[reply]

Good question. It seems that Chardonnay derives from the name of a French town, so it may have inherited its capitalization for at least some uses. I do not know the etymology of zinfandel, so I can't say anything meaningful there. In usage, I can say that Chardonnay is used as if it were a common noun, meaning that it pertains to a class/style of wines rather than to a particular brand or region. --EncycloPetey 12:30, 22 February 2006 (UTC)[reply]
Most things that I look at tend to capitalize these names. I suspect that it is the best way to handle these but have yet to find an authority for either alternative. "Zinfandel" is not a good example to base this on since the origins of that name have not been established. Eclecticology 02:16, 23 February 2006 (UTC)[reply]

Unwanted Category:

Help! I have a verb root zhiishii, which is Ojibwe but it seems to want to drop that particular root in to both Category:Ojibwe language and in Category:Cree language, as well as in Category:Algonquian derivations (but the third is a bit more forgivable). I would prefer to have it in just Ojibwe but I can't find the line sticking zhiishii into Cree. At first, I thought it was the template for Proto-Algonquian but after changing it, zhiishii still shows up in Cree. Help! CJLippert 01:17, 23 February 2006 (UTC)[reply]

It doesn't seem to be in that category now. Categories sometimes take a couple minutes to reflect changes. --Connel MacKenzie T C 02:03, 23 February 2006 (UTC)[reply]
I just went to Category:Cree language and zhiishii is still there. CJLippert 04:17, 23 February 2006 (UTC)[reply]
I did a dummy edit on zhiishii which updated the category links, and it has now disappeared from Category:Cree language. Jonathan Webley 07:25, 23 February 2006 (UTC)[reply]
Thanks. That took care of that. CJLippert 21:26, 23 February 2006 (UTC)[reply]

Wikipedia dates

The MediaWiki software has a built-in feature of rendering dates according to user preference. But I have only very rarely seen dates entered in Wiktionary entries. And when I have seen them, I've removed them (as they are always red).

Should we have 366 entries that redirect to the twelve months? --Connel MacKenzie T C 03:57, 23 February 2006 (UTC)[reply]

What would be the point? Wikipedia already does this really well. Jonathan Webley 07:29, 23 February 2006 (UTC)[reply]
i18n & l10n. The point is that dates would be rendered for people according to their language preference. Following the link would lead them to an entry such as January 1 (entered as [[January 1]]), which could do any of several things: redirect to Wikipedia, redirect to the entry for that month name, try to describe (and perhaps translate?) "January 1", or stay red. Should we (Wiktionary) do one of those things, or should we simply continue to discourage Wikipedia-style wikification of dates? --Connel MacKenzie T C 07:56, 23 February 2006 (UTC)[reply]
I am still not sure why a dictionary needs many dates - if I look up bonfire night I see 5th of November, but I wouldn't be bothered how it was formatted. Saltmarsh 08:09, 23 February 2006 (UTC)[reply]
It is POV to list all dates in a British fashion. --Connel MacKenzie T C 08:17, 23 February 2006 (UTC)[reply]
Similarly, it is POV to list all dates in the American fashion. That said, I too find general date listings to be pointless. Eclecticology 10:12, 23 February 2006 (UTC)[reply]
What? By entering wikified dates, they render in your format. Currently, Wiktionary dates are entered in British style with very few exceptions. The question that was posed to me on my talk page noted that the Chinese consider that format an error, while the French consider month-day dates to be a grammatical error. The person that posed the question has offered to enter all 366 in with translations. Is this desirable? It would be one less thing to note to visiting Wikipedians. I also think having the short entries would make Wiktionary look less amateur. --Connel MacKenzie T C 20:27, 23 February 2006 (UTC) (edit) 01:00, 24 February 2006 (UTC)[reply]
Dates are not that important in this project's main namespace. We create wiki links to enable people to go to another page. Wikifying solely to effect formatting goals seems to be a misuse of the tool. If someone needs to enter a date, let them use whatever acceptable format they want. The only exception would be in quotes when the quoted person's style would be followed. Eclecticology 03:57, 25 February 2006 (UTC)[reply]

Misspellings revisited

In the "Misspellings" discussion above, it is stated that Wiktionary:List of common misspellings is a list of common misspellings. I think this title is a little misleading, as it combines two things:

  • common errors in people's knowledge of how to spell words (such as "liase" for "liaise" and "accomodation" for "accommodation")
  • typing errors (such as "hte" for "the")

I would only term the former misspellings. The latter are typos.

This means we need to take great care in selecting which of the entries in this list deserve a mention in Wiktionary (as liase and accomodation already do - or at least I thought we did - we do now that I have added it, along with liason). There should be no entry for "hte" as no one would believe this is the correct spelling. It could be argued, of course, that people with little knowledge of English might come across this typo in postings to newsgroups or other typed communications, but there are probably billions of different typos out there that a user might come across, and we should certainly not be trying to list them all.

A far better solution, of course, would be to include a spellchecker in Wiktionary that would suggest alternatives if users made typing errors when searching for words. (It would be great to have this, but I am already going off on a tangent so do not intend to propose it right now.)

So I propose the title of that page is changed to something else, or, at the very least, a note is added pointing out that it includes both common spelling mistakes and common typos. Do we not already have a category for the former? — Paul G 16:53, 23 February 2006 (UTC)[reply]

PS: I can't remember where I read it now (I don't see it in WT:ELE) but why is it stated that misspellings should be given a part of speech? They have a language ("liase" is a misspelling in English) but not a part of speech ("liase" isn't a word, so it can't have a part of speech). There is an intended part of speech, of course, but I think it is incorrect to say that "liase" should be marked as being a verb when it is not. — Paul G 17:01, 23 February 2006 (UTC)[reply]
Ah, it's in WT:CFI, section 1.4 ("Misspellings, common misspellings and alternate spellings"). It says the part of speech "can be" used, not that it should be. I would say that it should not be. — Paul G 17:17, 23 February 2006 (UTC)[reply]
  • I think the page came from Wikipedia, and that was the title they used. I think moving the page could make it harder to refresh the list (not that anyone is actively doing so right now). I think the preamble at the top of the page, explaining the combination of both types of errors, is sufficient.
  • I used that to generate User talk:Connel MacKenzie/typos, as each of the corrections needs to be done as a manual, human task. I do not understand why Wikimedia refuses to add a spell check to the edit box.
  • I don't think misspellings should be identified with a part of speech, but I see no reason to prohibit identifying the POS of the intended word, e.g. "*Common misspelling of the English noun [[...#Noun|...]]." My practice has been not to enter the POS for these. I agree that we probably don't need to. But there may be a few cases where it is pertinent. --Connel MacKenzie T C 21:14, 23 February 2006 (UTC)[reply]
I don't see any problems with keeping the language name or purported part of speech. The part of speech is determined by how the word is used in a sentence, and how the user encounters it. If liase appears as a verb in its context liaise could very well be the correct intended word; if it appears as a noun the lias geologic epoch could be intended.
Spell checks, especially for us, would be more trouble than they're worth. An experienced Wiktionarian is aware of the limitations, and can apply them with full appreciation of the way that spellings can vary. In the hands of someone who is insisting on his POV that only his spelling is correct it could give us endless problems. The thought of having someone change every misspelt word to "exicornt" boggles the mind. Let's not make it any easier for them. Eclecticology 06:30, 25 February 2006 (UTC)[reply]
The word redirected to tells us what the part of speech is. Suppose, as an extreme case, we had an entry for a misspelling of "set". Would we really want to say "verb: misspelling of set; noun: misspelling of set; adjective: misspelling of set; past tense: misspelling of set; past participle: misspelling of set"? This is impractical. "Misspelling of set" with no part of speech is sufficient - I don't think there are any misspellings that only apply to one part of speech (at least, none come to mind).
The word intended might well be "lias" in the case of "liase", but "liase" is not a common misspelling of that word. The entry "liase" is only intended to cover the misspelling of "liaise". We can't do much to help users who make uncommon misspellings. This is where the spellchecker would come in.
I wasn't clear about what I meant here - I meant a Google-style spellchecker for searches that, when a user types in "liase", says "Did you mean: lias or liaise?", rather as they have in dictionary.com. I didn't mean a spellchecker that allows users to change the spellings of the content of Wiktionary entries, which would certainly be a bad thing. A spellchecker for user searches would, of course, only be helpful when Wiktionary is well fleshed out with entries, as the user searching for smoky (which, at the time of writing, is still undefined) would not be happy to be asked if they meant "smoke". — Paul G 11:57, 27 February 2006 (UTC)[reply]
Why would/should they be unhappy about that? --Connel MacKenzie T C 19:32, 28 February 2006 (UTC)[reply]
I'm not sure whether this has already come up (I'm not 100% following this issue), but what to do with an "item" that is both a correct spelling of one word and a (common) misspelling of another one? Perhaps a rather trivial issue, but I was wondering still. — Vildricianus 20:46, 28 February 2006 (UTC)[reply]
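The search-time "Did you mean" idea raised in this thread can be illustrated with a minimal sketch. It assumes a small, hypothetical list of existing entry titles and uses Python's standard `difflib` for fuzzy matching; it is not how MediaWiki or dictionary.com implement search suggestions, and the function name and the 0.6 similarity cutoff are illustrative choices only.

```python
from difflib import get_close_matches

# Hypothetical list of existing entry titles; a real index would be far larger.
ENTRY_TITLES = ["lias", "liaise", "smoke", "set", "accommodation"]


def did_you_mean(query, titles=ENTRY_TITLES, limit=3):
    """Return up to `limit` existing titles that closely resemble `query`.

    An exact hit returns an empty list, because no suggestion is needed.
    """
    if query in titles:
        return []
    # cutoff=0.6 is difflib's default similarity threshold (an assumption here,
    # not a tuned value).
    return get_close_matches(query, titles, n=limit, cutoff=0.6)


if __name__ == "__main__":
    print(did_you_mean("liase"))
    print(did_you_mean("smoky"))
```

With this toy list, a search for "smoky" is offered "smoke", which is the situation described above for a term that has no entry of its own yet. Such a helper only reads the title list and never edits entry content.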

Please, revert the changes in Catalan Wiktionary

I'm the Catalan Wiktionary Admin. Somebody has changed the software of our version and decided that we need to differentiate between words that begin with capital letters and those that do not. I don't doubt that this differentiation can be very useful in German, but not in Catalan. Please revert this. We never wanted or requested this change. Llull 07:03, 24 February 2006 (UTC)[reply]

The change that you are complaining about has been performed without any discussion on all Wiktionaries. The reason is that more and more wiktionaries changed to lowercase and from a systemmanagement point of view it was a big hassle. The Catalan wiktionary is certainly not the only one that suffers the consequences of this action. More people have complained.
When you consider what a Wiktionary hopes to achieve; all words of all languages, your argument that capitalisation is not relevant to Catalan is wrong. It is wrong because the Catalan wiktionary has words from other languages as well.
I am afraid that it is unlikely that the change to lowercase will be reverted. It is definitely not something that anyone in the Wiktionary community asked for. Personally I find it sad that it happened in such an uncoordinated way. I am however happy that it went this way. GerardM 13:52, 24 February 2006 (UTC)[reply]
"The reason is that more and more wiktionaries changed to lowercase and from a systemmanagement point of view it was a big hassle." — The only person who ever reported it being a ‘hassle’ was you, with your over-simplified interwiki bot, which is hardly a matter of system management. From the logs on #wiktionary while I was gone:
[22:02:01:27:02] <brion_> ok, who wants the rest of the
                 wiktionaries switched to case-sensitive mode?
[22:02:01:30:49] <brion_> nobody?
[22:02:01:30:55] <brion_> ok i'll try it in the morning then
...
[22:02:05:20:13] <Muke|work> and don't let brion case-switch the
                 rest of the wiktionaries while i'm out :p
...
[22:02:13:26:57] <brion> last chance to object to the rest of
                 the wiktionaries being mutilated with the 
                 special klingon-and-lojban feature to make the 
                 first letter case-sensitive
[22:02:13:27:57] <Kipcool> should we object?
[22:02:13:29:37] <brion> whatever
[22:02:13:29:51] <brion> wiktionary will be obsolete any day now 
                 when wiktionaryz comes along anyway ;)
...
[22:02:15:55:50] <brion> ok switching wiktionaries to evil 
                 nocaps mode
This change doesn't appear to have been well-reasoned, and the way brion proposed it actually sounds like outright vandalism. Perhaps the persistent propaganda put forth by lower-case activists such as yourself moved him to think all wiktionaries would choose to move this way and that he would be saving time by doing it all at once instead of waiting for communities to form and decide it officially; but clearly this isn't the case, as apparently this is not something all wiktionaries would have decided for. —Muke Tever 21:22, 24 February 2006 (UTC)[reply]
It's better to have the option of having lower-case page titles together with a policy saying "no lower-case page titles" than the other way round, which you can't do anything about. Ncik 00:33, 25 February 2006 (UTC)[reply]
la: already had that. The problem with the decapitalization ‘feature’ is that for ordinary language it breaks far more than it fixes: sort (a-z sort after A-Z), links (link to one case doesn't link to other), and page titles (allows lowercase page titles outside of the main namespace, where they are unnecessary, e.g. Wiktionary:beer parlour). Implementing the decapitalization ‘feature’ produces no positive effect and merely introduces all these well-known bugs. —Muke Tever 03:29, 25 February 2006 (UTC)[reply]
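The sort-order complaint above ("a-z sort after A-Z") can be illustrated with a short sketch. It shows plain code-point ordering next to a case-folded ordering in Python; the sample titles are made up, and this is only an illustration of the general effect, not MediaWiki's actual collation code.

```python
# Made-up page titles that differ in the case of their first letter.
titles = ["apple", "Zebra", "banana", "Apple"]

# Plain code-point ordering: every uppercase A-Z sorts before any lowercase a-z.
codepoint_order = sorted(titles)

# Case-insensitive ordering: compare on a case-folded key instead.
# Python's sort is stable, so titles with equal keys keep their input order.
folded_order = sorted(titles, key=str.casefold)

if __name__ == "__main__":
    print(codepoint_order)
    print(folded_order)
```

In the first list "Zebra" lands ahead of "apple", which is the behaviour a reader of an alphabetical index would not expect; the second list interleaves the two cases.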

Making changes on any project with no previous discussion is a bad, bad, bad, bad, bad, bad, bad, bad, bad, bad, bad, bad, bad idea. 555 14:42, 24 February 2006 (UTC)[reply]

We state in the definition and the grammar section that this or that word is "always" written with a capital letter. There is then no need to put all the other words in lowercase. A similar system is used in Wikipedia. Now we need thousands of redirects and hundreds of templates don't work; a lot of work seems lost. Our Catalan-German dictionaries always give all the words with capital letters and we wanted to do the same. In lists and titles capital letters are always allowed. This differentiation is therefore not needed and is a horrible annoyance for our community. Please reconsider reverting the changes.
Note that our language, Catalan, also has words that always begin with capital letters, but we never requested this new function. Llull 14:48, 24 February 2006 (UTC)[reply]

http://bugzilla.wikimedia.org/show_bug.cgi?id=5075 555 14:54, 24 February 2006 (UTC)[reply]

This is unfortunate. As much as I support having this feature here, it is still a decision that should be made separately by each community. Eclecticology 03:24, 25 February 2006 (UTC)[reply]

bugzilla:05075#c1 – "5075 MediaZilla Revert turning off automatic capitalization on all Wiktionaries"
→→ bugzilla:00164 – "Support collation by a certain locale (sorting order of characters)"
→→ duplicates of 00164:

  1. bugzilla:00353 – "Danish letters has wrong sort order"
  2. bugzilla:00608 – "Wrong alphabetical order in Special:Allpages"
  3. bugzilla:01304 – "Collation sequence is case-sensitive"
  4. bugzilla:02489 – "Alphabetical ordering in Turkish language"
  5. bugzilla:02602 – "Alphabetical order wrong on (some?) non-english Wiki's."
  6. bugzilla:02818 – "Category sorting cannot handle Thai words properly"
  7. bugzilla:03343 – "Problem sorting accented caracteres"
  8. bugzilla:04622 – "Provide a setup to order pages at [[special:Allpages]] completaly case insensitive regardles of the value of $wgCapitalLinks"
  9. bugzilla:04963 – "An enhacement in categories"

Best regards Gangleri | w: Th | T 13:29, 25 February 2006 (UTC)[reply]


Questionable citations

An editor has been adding some very questionable citations of usage to choad, such as Usenet posts. From my experience on Wikipedia I would simply remove them, but I wanted to get some input here on the best way to proceed before reverting wholesale. - Taxman 19:02, 24 February 2006 (UTC)[reply]

Um... this is what we do here. See WT:RFV. The process doesn't change just because it's an unsavory word. —Muke Tever 21:32, 24 February 2006 (UTC)[reply]
This one has been to RfV before, and seems to have gained some degree of acceptability. Some of the senses are still doubtful ... like we would not want to seem too much like the kind of geniuses that inhabit usenet. Eclecticology 02:24, 25 February 2006 (UTC)[reply]

Background colour for translations

Anybody else think we should have a neutral (ie white) background in the translations section rather than the current yellow? It puts unjustified emphasis on that part of the page. I don't see how translations are more important than the definitions themselves or even the synonyms, antonyms, related and derived terms, etc. Ncik 00:28, 25 February 2006 (UTC)[reply]

That's an oddly inconsistent sentiment from you considering that you have also wanted colour in the inflections. Eclecticology 02:15, 25 February 2006 (UTC)[reply]
To emphasize them, I would surmise. I'm more of the opinion that color distinguishes, but I wouldn't mind a neutral color if the translations section were collapsible. (Damn, Connel beat me to the punch.) Davilla 03:16, 25 February 2006 (UTC)[reply]
I always thought the yellow background was so one's eyes could skip past it easier...and know when you've reached the next relevant section of an entry. To those that are interested instead primarily in translations, I'm sure the opposite is true. I'd personally rather have the translations hidden by default, but that technical solution (example a few sections above, demonstrated/copied here on en: by a Chinese sysop) has not gotten significant feedback or discussion so far. AFAIK, zh: is now hiding the translation sections. --Connel MacKenzie T C 02:53, 25 February 2006 (UTC)[reply]
No need for a change of colour I say. However, I think the collapsible sections (not just translations, but any section) would be a good move. — Vildricianus 09:38, 25 February 2006 (UTC)[reply]

State of the WikiSaurus

Discussion moved to Wiktionary talk:Thesaurus considerations#State of the WikiSaurus --Patrik Stridvall 08:50, 26 February 2006 (UTC)[reply]

Request Bot status for CEDICT upload

I have written a program off-line that converts all of the Chinese-English entries in CEDICT to a wiktionary friendly format (except for single characters; most of which are already in wiktionary). I have manually loaded a sample of the output from the program onto wiktionary:

There are a total of 25,091 individual terms that would be automatically loaded to wiktionary. I propose to run a bot called pagefromfile.py which is located on SourceForge.net. If a term from the file has been previously uploaded to wiktionary, the bot will not upload that term. The size of the file with all of the entries is 43MB (43,180,032 bytes).
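The skip-if-already-present rule described above can be sketched as a small filter. This is a hypothetical illustration only: it does not use pagefromfile.py's real interface, and the function name, the (title, text) pair layout, and the in-memory set of existing titles are all assumptions made for the example.

```python
def select_new_entries(candidates, existing_titles):
    """Yield (title, text) pairs whose title is not already present.

    `candidates` is an iterable of (title, text) pairs read from the
    converted file; `existing_titles` is any iterable of titles that
    are already on the wiki. Both are hypothetical inputs.
    """
    seen = set(existing_titles)
    for title, text in candidates:
        if title in seen:
            continue  # already on the wiki, or duplicated in the file: skip
        seen.add(title)
        yield title, text


if __name__ == "__main__":
    converted = [("從前", "entry A"), ("以前", "entry B"), ("以前", "entry C")]
    for title, text in select_new_entries(converted, ["從前"]):
        print(title, text)
```

Filtering before upload means an existing page is never overwritten, at the cost of needing an up-to-date list of existing titles.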

Each entry will include the following COPYING AND DISTRIBUTION notice (from the CEDICT cedict_readme.txt file):

  • CEDICT
    • COPYING AND DISTRIBUTION
    • Permission is granted to make and distribute verbatim copies of these files provided this copyright notice and permission notice is distributed with all copies. Any distribution of the files must take place without a financial return, except a charge to cover the cost of the distribution medium.
    • Permission is granted to make and distribute extracts or subsets of the CEDICT file under the same conditions applying to verbatim copies.
    • Permission is granted to translate the English elements of the CEDICT file into other languages, and to make and distribute copies of those translations under the same conditions applying to verbatim copies.

The bot would be called User:A-caiBot A-cai 10:10, 25 February 2006 (UTC)[reply]

Isn't the use of "non-commercial use" material deprecated? —Muke Tever 02:15, 26 February 2006 (UTC)[reply]
I'm not sure what you mean by deprecated. Here is the wikipedia article on Fair use laws. A-cai 03:27, 26 February 2006 (UTC)[reply]
deprecated. Your choice to include the license on the pages indicates that you're choosing to abide by the license and not include it by fair use (and I doubt that the importation entire of the CEDICT file could possibly be justified as fair use; see w:fair use#Amount and substantiality and w:fair use#Common misunderstandings). Anyway. The license Wiktionary is released under, the GFDL, allows commercial use. Because of this, non-commercial-use images are already deprecated on Wikipedia, because anybody, including the Wikimedia foundation, who wanted to use Wikipedia for commercial purposes would then have to castrate it of all non-commercial-use images; the case is worse here because text is in question: if these imported articles exist, people will be less inclined to rewrite them, and any person wishing to use Wiktionary for commercial use would thus have to pretty much do without Chinese content. —Muke Tever 23:15, 26 February 2006 (UTC)[reply]
  1. I am not a lawyer, so I have no idea what the legal implications would be for Wiktionary. With respect to the COPYING AND DISTRIBUTION paragraph, it is just a suggestion. I copied it verbatim from the CEDICT readme.txt file. Is there someone (with a legal background) who can provide some guidance on this? I would be happy to leave it out, or modify it, depending on feedback.
  2. It is true that this material is under copyright; it is also true that most of the individual terms included in CEDICT appear in dozens of other dictionaries and are therefore part of the public domain. I don't think you need the paragraph for those entries as long as you don't quote CEDICT verbatim. However, the bot is not intelligent enough to identify which entries need the paragraph, and which entries do not. This part would have to be done by individual wiktionary contributors once the terms are uploaded. For example, 從前 was originally based on a CEDICT entry; I then significantly altered the article (replaced the English definition section, added Min Nan). As a result, the COPYING AND DISTRIBUTION paragraph is no longer needed for this entry. I envision this happening for a majority of the entries (it will take time, but it will happen eventually).

A-cai 23:20, 26 February 2006 (UTC)[reply]

I looked at the copyright claim at CEDICT. In addition to the quoted conditions he states elsewhere that the material is taken from public domain and freeware sources. That alone would negate any copyright that he might claim on the detailed material. His copyrights would be limited to formatting, organizing the data. What he doesn't own he can't license. Eclecticology 09:59, 1 March 2006 (UTC)[reply]

I've just put up a draft Policy - Wiktionary:Obsolete and Archaic Terms, as I noticed that someone mentioned people deleting obsolete terms. In my view, that is vandalism. So, if you have a view on this, please add to the policy discussion, improve the policy etc.--Richardb 20:47, 25 February 2006 (UTC)[reply]

Deletion of obsolete terms is vandalism. --Connel MacKenzie T C 05:19, 26 February 2006 (UTC)[reply]

About to upgrade Proposal for Policies and Guidelines from Draft Policy to Semi-Official Policy

Whilst my idea of starting Wiktionary policies (based on Wikipedia) never set the Wiktionary world on fire, it has seen some use, some added policies, some policies improved on.

So, I now propose to upgrade Wiktionary:Proposal for Policies and Guidelines from Draft Proposal to Semi-Official, and take the word Proposal out of the title.

page actually renamed to Wiktionary:Policies and Guidelines - Policy around 18-May-2006

This is going to involve some work to change all the links etc. So, before I begin, any strong objections? or words of support ?--Richardb 21:32, 25 February 2006 (UTC)[reply]

I support you pursuing this. --Connel MacKenzie T C 22:55, 25 February 2006 (UTC)[reply]

In doing a bit of work on the Policies area, I realised we could do with another "Policy" category. I have added a category and template Policy-PI = Policy Implications. This can be used to categorise a page as having policy implications, without sticking a policy banner on it. This means pages such as tutorial pages are not splattered with banners, even though bits within them may be tagged as having policy implications, and may be the only place that the policy as such is spelt out. (I hope this might put paid to a bit of back and forth tagging and untagging by myself and EC!)--Richardb 01:23, 26 February 2006 (UTC)[reply]

I could do with someone to upgrade the Policy-XX templates, so that we can create an entry in the alphabetical policies index that can be different to the page name, and even link direct to the right section in the page. I'm not here often enough to remember how to do these complicated parameter driven templates. Any volunteers ?--Richardb 01:23, 26 February 2006 (UTC)[reply]

I've pulled a few pages into the category list of policies.

Anyone know any other pages which ought to be brought into the list/category of policies ? --Richardb 01:38, 26 February 2006 (UTC)[reply]

Archaic to be preferred over obsolete

I'd like to see us stop labelling words as obsolete. This means we will never use the word again, never see it in print or write it ourselves, so we might as well throw it away, delete it. "The connotation is that the subject is so old that it is essentially worthless."

Archaic means it is just old. "Comes from Greek arkhaikos meaning 'old-fashioned'."

Pretty simple argument, but no doubt someone can make it complicated. Or maybe I'm behind the times, and everyone is already using archaic instead of obsolete.

Any takers ?--Richardb 04:56, 26 February 2006 (UTC)[reply]

We are trying to describe the English language, not dictate it. Describing an obsolete word as archaic is simply wrong. Someone looking up a term in a very old text probably would like to find the meaning of an obsolete term, but we shouldn't be misleading people by identifying obsolete terms as archaic. Labeling a term archaic has the connotation that many people will understand the speaker, but the speaker may sound quaint; obsolete terms generally are not understood by native speakers. --Connel MacKenzie T C 05:17, 26 February 2006 (UTC)[reply]

Whoa Connel. I'm not trying to dictate anything. I'm just puzzled over what I see as the overuse of the term obsolete, instead of the word archaic, based on the definitions I've quoted.

So you think people would not just think Yuletide 'quaint', but would actually not understand it? Because Yuletide is one example in the category:Obsolete (which someone has annoyingly used with a capital O). What about swop, filly?--Richardb 06:53, 26 February 2006 (UTC)[reply]

Obsolete doesn't really mean that a word is old. There is, for example, computer terminology from the 1980s that was commonly used back then but is no longer in use, and even very knowledgeable people in the business today wouldn't understand it. Using such terms is not recommended, since few people would understand what you meant. Thus they are obsolete and should be marked as such. But we should list them because somebody might find them in historical documents or in biographies. --Patrik Stridvall 09:07, 26 February 2006 (UTC)[reply]

  • Given the example Yuletide, I took a look at the category in question. It needs cleanup. I agree that Yuletide certainly does not belong there. December is even more of a head scratcher; someone tagged a Turkish translation as being obsolete, thereby adding the whole entry to the category (obviously incorrectly). OTOH, there are many terms listed in that category that merely have obsolete meanings, e.g. filly. Those I think should stay, at least until we have a better way of delineating the category. As for swop, "obsolete" seems perfectly apt.
  • I do not understand the complaint about the capital "O". It has been a very long time since lower case category names were discussed; I thought that conversation was rendered obsolete. (Pun intended.) Actually, I don't even remember that being a viable concern; templates we try to keep in all lower case. But category changes (when we were forcibly decapitalized last summer) would have required innumerable edits to entries, with no apparent benefit. --Connel MacKenzie T C 10:15, 26 February 2006 (UTC)[reply]


Despite being the guilty party who wrote the Yuletide entry with the obsolete tag, I can't disagree with anything said here. It was a really tough decision, and I can't even recall what pushed me over the edge except a vague sense that learners would need a "Don't try this at home" warning. The brief explanations at the start of category:Obsolete and category:Archaic aren't helpful for deciding. I like Connel's distinction (archaic is likely to get you the reaction "Aw, ain't that quaint" or "Join the 20th century, ya old fust" -- obsolete is likely to get you stares of incomprehension). But the relevant category pages would have to be changed so that they don't say, um, the opposite, and there'd preferably also be short explanations in Wiktionary:Index to templates and maybe even WT:ELE. Keffy 16:37, 26 February 2006 (UTC)[reply]
I still use them both, but I admit not very consistently. Roughly, I consider "archaic" as not used since at least 1800. I think that these categories should be limited to obsolete words, and not obsolete senses. Changing the template to a simple tag usually fixes that problem. Eclecticology 10:00, 27 February 2006 (UTC)[reply]

Richardb, there is an important difference between these two words. "Archaic" means, more or less, "old-fashioned" and refers to words that are still in use, such as thou (meaning "you") or methinks. "Obsolete", on the other hand, means no longer in use, such as zyxt. There is therefore a clear distinction between the two. Obsolete words are still part of the English language; they are just not used. Most large dictionaries include very many obsolete words, as, even though they will not be found in modern texts, they will be found in very old texts. — Paul G 11:43, 27 February 2006 (UTC)[reply]

There is now a place to discuss this, Wiktionary:Obsolete and Archaic Terms, to try and resolve the subtle differences that exist between the entries obsolete and archaic and the category definitions category:obsolete and category:archaic, and the usage of those categories. The discussion also now encompasses what is obsolete, what is Old English, and what is Middle English. Can anyone with any expertise in Old English or Middle English please help out there? Thanks.--Richardb 14:44, 28 February 2006 (UTC)[reply]

Acronyms (and) Proper nouns

I have posted some thoughts on Wiktionary talk:Policy - Abbreviations with regard to how we deal with capitalization and punctuation for acronyms and abbreviations.--EncycloPetey 06:34, 26 February 2006 (UTC)[reply]

Names of Constellations and Stars

I'd like to propose an additional category of proper noun that should be included in Wiktionary, even though it is not yet mentioned in Wiktionary:Criteria for inclusion (section on names). In particular, I strongly feel Wiktionary should include names of constellations and major stars. These terms tend to be widely used, typically have been in use for a very long time, and often have different forms in different languages. --EncycloPetey 06:34, 26 February 2006 (UTC)[reply]

I support the inclusion of some names. Those names you might come across in a book without explanation that they are stars or constellations. Eg: A text might say something like "the ship was dimly visible, just below Orion's belt". A moderately knowledgeable English speaker needs no explanation that that is (part of) a constellation. Anyone else would appreciate a brief entry in Wiktionary for "Orion's belt" or "Orion". This "test" should keep it down to not too, too many entries. For me it sure has more relevance than putting great numbers of surnames, first names and town names in. The stars and constellations are seen from all over the world, and probably have very different names in different languages, whereas town and people's names really need no translation, except for the few major ones.
I think we should certainly include these. Just in point of fact, most dictionaries give the names of the constellations (if only the better-known ones or those forming the zodiac) and some give the names of the better-known stars: for example, Chambers Dictionary has Algol, Betelgeuse, Deneb, Denebola, Polaris and Sirius, among others. The etymologies of many of these are interesting, many being from Arabic. — Paul G 19:38, 26 February 2006 (UTC)[reply]

CFI not universally applicable - Protologisms, WikiSaurus, concordances etc

See WT:CFI--Richardb 11:05, 26 February 2006 (UTC)[reply]

Dutch issues

I've come across two minor issues concerning Dutch:

  • I use the Template:nl-verb for conjugations. The thing is, it's so voluminous, that on pages with multiple languages, it usually extends well across the part reserved for Dutch. Even putting it at the very top of the page doesn't always suffice. See flirten for an example. Any thoughts?
  • Concerning the same subject of Dutch verbs: is the repeating of the headword under ===Verb=== still necessary, even when there's nothing more to add there as the conjugation tables are floating left? — Vildricianus 17:46, 26 February 2006 (UTC)[reply]
As for the conjugations I don't really know much about Dutch, but I can read German even though my grammar is very rusty. Anyway here goes.
  • Make it broader instead, with the tenses in columns like Template:sv-verb-irreg. You will have to separate columns for singular and plural though. In Swedish it is the same.
  • As for the pronouns, can't you just say 1st, 2nd, 3rd instead of using the actual pronouns? Anybody speaking a language should know such things already. This saves one column since you can now "share" descriptors for the singular and plural.
  • At some point in the future we probably should make the inflection boxes hidable (and perhaps even hidden by default). I did some experiments on my own Wiki and it seems to be solvable without too much pain. Don't hold your breath though.
--Patrik Stridvall 19:25, 26 February 2006 (UTC)[reply]

Problem temporarily solved thanks to your idea. — Vildricianus 17:31, 28 February 2006 (UTC)[reply]

I always repeat the headword under the header as a visual anchor, and because sometimes someone will come along later and link to components of a phrase or compound. In any event, I prefer the consistent placement of the bold headword to inconsistent usage. --EncycloPetey 07:00, 4 March 2006 (UTC)[reply]

Oxford English Dictionary

A list of some of the words in the OED was added to Wiktionary a while back and I'm hunting for it... I can't remember where it is. Could someone remind me, please? Thanks. — Paul G 19:26, 26 February 2006 (UTC)[reply]

WT:EDH --Connel MacKenzie T C 07:34, 28 February 2006 (UTC)[reply]
Thanks, Connel. — Paul G 12:44, 1 March 2006 (UTC)[reply]


Serbian, Bosnian, and Croatian ligatures

Hi everyone. I just wanted to take a moment to discuss ligatures in the Serbian, Bosnian, and Croatian languages. I'm referring to the following letters of the SBC languages: dž (which in fact should have a hachek on the z but does not because such a ligature is hard to find and use), lj, and nj. Each ligature mentioned represents only 1 letter and 1 sound in these languages. My problem stems from the fact that these ligatures can be clearly written as dž, lj, and nj. Being written as two separate letter combinations does not degrade the language (nor the alphabet) nor does it make it be pronounced differently. It only makes it easier for usage, especially here on Wiktionary for searching and typing in the address bar. I just wanted to ask if there is or if there should be a policy that states that, instead of using ligatures in the mentioned languages, two letter combinations should be used for simplicity. The ligatures are very bothersome to most people, including myself, and are not ever used by the people that type in these languages (out of simplicity of using two letter combinations). These ligatures are ugly and are not even used on Wikipedias, Wiktionaries, nor other Wikimedia projects written in these languages. However, there are some wiktionarians, notably User:Strabismus, who prefer ligatures and who remove two letter combinations. Can we please get a vote on whether to keep the ligatures or to remove them? Or, if there are any other suggestions, please comment here. Thanks. --Dijan 06:57, 27 February 2006 (UTC)[reply]

Strabismus doesn’t seem to understand the difference between orthography and pronunciation guides. Serbian, Croatian, etc., as well as Ojibwe and most other languages that are being entered here have standard orthographies, and the standard orthography (the way people actually type) is how words should be entered. Pronunciation guides belong in the pronunciation sections. The Unicode Consortium has made provisions for many special letters in many languages that are not actually used. For example, Spanish has available a ligature 'ch'; Dutch has a ligature 'ij'; Arabic uses hundreds of ligatures that Persian does not. I have never encountered anyone (other than Strabismus) who actually uses these special characters. Spanish writers type 'ch' as two separate letters; Arabic ligatures are made a function of the font and virtually all letters are entered separately, so that only those people who want the ligatures will see them, and others will not. If it can be shown that a significant number of people actually use the ligatures in question, then words containing them can be added as redirects to the more usual orthography. —Stephen 12:42, 27 February 2006 (UTC)[reply]
As far as I know Spanish ch is not a ligature but a digraph and does not have a Unicode codepoint. The only Spanish ligature I can think of is an archaic DE ligature that I've seen in some azulejos. But the Dutch ij seems a close counterpart to the characters discussed above. There were probably some people pushing for them in Unicode who got their way but there may also have been some good reasons. You can always ask on a Unicode forum. — Hippietrail 16:51, 28 February 2006 (UTC)[reply]
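The single-character forms under discussion do have their own Unicode code points, and each carries a compatibility decomposition into the ordinary two-letter spelling. A minimal sketch in Python of how a search or redirect tool could fold them, assuming NFKC normalization is acceptable for the text in question (it also folds other compatibility characters, so it is not specific to these letters); the helper name `to_digraphs` is made up for illustration:

```python
import unicodedata

def to_digraphs(text):
    """Fold single-code-point digraph letters into their two-letter spellings.

    Uses Unicode NFKC normalization, which rewrites U+01C6 as d + ž,
    U+01C9 as l + j, U+01CC as n + j and the Dutch U+0133 as i + j.
    Caveat: NFKC also rewrites other compatibility characters.
    """
    return unicodedata.normalize("NFKC", text)

# One code point before folding, two letters afterwards.
single = "\u01c9ubav"          # starts with the one-character "lj"
folded = to_digraphs(single)   # starts with plain "l" + "j"
print(len(single), len(folded))
```

Folding in this direction would let a page titled with the usual two-letter spelling be found even when someone pastes the single-character form.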

(Someone editcommented wondering what a hachek is...Usual spelling is háček, try w:Hacek. —Muke Tever 00:24, 28 February 2006 (UTC))[reply]

Capitalised categories

A number of categories have fully capped names (eg Category:Elementary Particles and Category:Units of Measure)

  1. is this inappropriate?
  2. and, if changed, can all references to it be changed by a bot - or whatever?
    Saltmarsh 10:58, 27 February 2006 (UTC)[reply]
You are right to point it out, but my guess is nobody will give much thought to fixing the problem. But if you want to make the effort, why not?--Richardb 13:05, 27 February 2006 (UTC)[reply]
Um, Category:Units of measure looks right to me. Did you already fix that one?
Using "category.py" indeed can change the members of Category: Elementary Particles to Category: Elementary particles. But I'm certain editing 47 entries is faster than getting the approval process here. (I still don't understand why separate tasks need to have a separate 'bot account with separate approval.) I suppose I could run the 'bot without formal bot approval and without the 'bot flag - those changes would simply appear in RecentChanges.
Oops. Testing the bot command syntax, it changed the one (and only?) entry right away: template:particles, and while I was typing this, finished moving the category page itself. The 'bot was signed on as me at the time. --Connel MacKenzie T C 07:58, 28 February 2006 (UTC)[reply]

Horizontal divider lines between languages

Hi,

I'm new to Wiktionary, but an experienced web developer. Going through the pages, I noticed a hard-coded divider line between each language entry on most (all?) pages. As a web developer, I am not happy with hard-coded layout elements. Trying to find some background on this thing, I only found a short discussion on the Entry layout explained page. This discussion was started December 28, 2005, while most divider lines seem to have been added early 2003.

Can someone help me understand the reason for hardcoding this divider-line, while it should have been done in CSS? My main gripes with hardcoding are following, copied from the talk page on ELE:

  • As was mentioned by the original poster on ELE, some articles show the horizontal line above the headers, while others do not. This inconsistency easily leads to confusion, plus an inconsistent look is often considered a sign of amateurism. Using CSS, the appropriate code only has to be added once, to automatically show up on all pages.
  • How many Wiktionary articles exist at the moment? Who is going to add the horizontal line to all of them? And what if someone decides another layout would be better, is he going to remove the line from all articles and add another element instead of it?
  • All registered Wiktionary users can create their own favorite CSS-based layout, with little dependency on the contents of the page, see m:Help:User style. Seven hours after the first comment on the horizontal line, discussion began about how to remove it using CSS. This is the wrong way around. Instead of creating a custom CSS to remove this line, users who want this line should create custom CSS to add it! The code used could be the following: H2 {border-top: solid 1px rgb(170,170,170);}. (Note that this code is actually shorter than the code needed to remove the hard-coded line.) Another user might prefer another way to accentuate these headers: H2 {background-color: rgb(170,170,170); color: white;}.
  • If consensus is reached that a line above 2nd level headings is indeed needed, it can be easily added to the default Wiktionary CSS.
  • The suggested code for removing the horizontal lines above headers, actually removes all horizontal lines from articles. Also the ones that are placed in articles for other reasons, and might possibly be important.
  • Third parties might want to re-use the information from Wiktionary. They would probably change the layout of the pages, to fit with their own layout. The current hard-coded line conflicts with this possibility.
Pbb 12:31, 27 February 2006 (UTC)[reply]
You'd get my support for trying to implement it by CSS. But, make sure you have a number of supporters on side before you go ahead.--Richardb 13:02, 27 February 2006 (UTC)[reply]
Okay, what would be the best way to discuss this? Leaving it here at the Beer parlour and seeing if there are any pro/con reactions? -- Pbb 13:31, 27 February 2006 (UTC)[reply]
My guess is it won't change, but yeah, just give the issue some time to settle. Personally I think there are much bigger issues with style, and the way it's approached here, everything is hard-coded. I mean, it would make a lot more sense to define dictionary entries in XML, wouldn't it? You'd just have to train an army to do it. There are even bigger issues than the cosmetic, though... supposedly, anyways. Folks like myself just never get around to them.
To answer your question about who's gonna make all the trivial changes, that's what bots are for! Davilla 18:08, 27 February 2006 (UTC)[reply]
Hi Davilla! I agree with you, there are some major issues with using a Wiki to make a dictionary. A Wiki has many shortcomings in creating a structured document like a dictionary, many more than in creating an encyclopedia. But my point is that in this case, this Wiki actually offers a much easier way to get what we are trying to achieve. Sure, it's not like a solution for world peace (trying to be funny here), but with just a one-time edit to the default stylesheet, we would prevent having to manually enter the line in all articles.
  • One reason the separators appear in the edit text is to help people editing the page!
  • In 2003, there were relatively few multi-language entries. The conversation about them occurred after the practice had started. And most (if not all) of the entries were cleaned up at that time. Presumably new entries have been patrolled and cleaned as well. (The exception being the most recent BP conversation about them, where one contributor thought much like you do, Pbb, and actively removed the separators from a couple dozen entries. These have since been corrected, I believe.)
  • CSS expertise is just that; expertise. But not every problem is a nail that needs a CSS hammer. The extensibility of the entire wiki framework is hampered by CSS; people writing content that need a special layout tweak do not have access to MediaWiki:Monobook.css nor the CSS of any other skin. Say someone wanted to add a CSS tweak for Arabic characters? They could modify their own User:{{USERNAME}}/monobook.css but that would leave their Arabic characters unreadable for everyone else.
  • Lastly, the "----" is a syntactic marker that assists in parsing entries from the XML database dump. Unnecessary for advanced parsers, perhaps, but a tremendous help when starting out.
  • --Connel MacKenzie T C 07:11, 28 February 2006 (UTC)[reply]
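The parsing point can be made concrete with a small sketch in Python. It is only an illustration under stated assumptions: the sample entry text is invented, the function names are hypothetical, and a real dump parser would have to cope with far messier wikitext. It contrasts splitting an entry on the "----" rule with splitting on level-two headings:

```python
import re

def split_on_rule(wikitext):
    """Split an entry into language sections on lines consisting of '----'."""
    parts = re.split(r"(?m)^----\s*$", wikitext)
    return [p.strip() for p in parts if p.strip()]

def split_on_headings(wikitext):
    """Split an entry wherever a level-two heading (==Language==) starts."""
    heading = re.compile(r"^==\s*[^=\s][^=]*==\s*$")
    sections, current = [], []
    for line in wikitext.splitlines():
        # A new level-two heading closes the section collected so far.
        if heading.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]

# Invented sample entry: the second variant lacks the '----' rule.
with_rule = "==Dutch==\n===Verb===\nflirten\n----\n==German==\n===Verb===\nflirten"
without_rule = with_rule.replace("----\n", "")

print(len(split_on_rule(with_rule)), len(split_on_rule(without_rule)))
print(len(split_on_headings(with_rule)), len(split_on_headings(without_rule)))
```

The rule-based split is a one-line regular expression, which is why it helps when starting out, but it silently merges sections in any entry where the rule was left out. The heading-based split needs a little more care to tell level-two headings from deeper ones, yet does not depend on the rule being present.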
Okay, I understand your point about having a separator while editing the text. This would not be there if we had the line in CSS. I don't really understand what you are trying to say in point two, except that there is room for interpretation as to what exactly cleaning up and correcting is in this case. In your third point, you're right that CSS is yet another expertise, just like WikiText and HTML. (Okay, maybe more advanced than those two.) However, the nice thing about CSS is, since there seems to be agreement on the need for a divider line, only one person would have to add that code to the stylesheet, nobody else would need the expertise. I don't really understand your example of Arabic characters in relationship to this issue; we are not asking every user to edit their personal stylesheets, and nothing will become unreadable by any change to the style. The "----" can be used as a marker in the code, but can't "==" be used just as well? The main problem when using "----" as a marker is that you cannot be certain all articles will have that marker.
Please understand me right, I agree we need a clearer distinction between different language sections than what is offered by default. I just do not understand the reason for entering it manually every time (with all the possibilities for errors and inconsistency) when it can be automated with just one line of code in the CSS file.
Pbb 12:22, 28 February 2006 (UTC)[reply]
  • By "Cleaned up" or "corrected" I meant the "----" was added back into the two dozen entries.
  • My point exactly. For you, "cleaning up" in this case means adding the line, while for me cleaning up in this case means removing it. Just a matter of different interpretation, no problem.
  • Various cleanup tasks run offline to get us closer to some kind of data layout consistency. Although we cannot be certain all entries will have that marker, we can be pretty sure. To edit thousands of entries, to remove one helpful line, so they can be replaced with a single line in 8 CSS files (one for each "skin", don't forget) seems like a pointless exercise. (The line is helpful in editing the entry!)
  • Agreed, if we would make the change to a CSS divider, then we would once need to edit many thousands of entries. However, it also means that would be the end of it, we would no longer need to check every (new) entry that is made to see if the line is there, because CSS would automatically take care of that. So, a one-time mass-edit, to save an endless(!) line of checks of every new article. I haven't studied each skin, but eight changes more to save on the endless structure check is not that much of a problem. Other skins, by the way, might already have clearer distinctions between the different header levels, or profit from another edit than the divider line.
  • CSS expertise was lacking in this user community, when the practice was started. CSS expertise means that someone with CSS expertise needs to be on hand at all times, when a formatting change is needed (CSS is NOT simple!) To have a convention that everyone can understand is much better. CSS is not the right tool for this problem!
  • Okay, on this one I plainly disagree with you. First: You learn something new every day. The fact that certain expertise was not available at one time does not mean you should stay away from it forever. The developers of MediaWiki have also discovered many new things after their first release, so they also incorporated this new expertise. They did not stay with the old version. Secondly: CSS is NOT difficult! It may be your opinion, but "difficult" and "easy" are very relative notions. CSS is, for example, easier than HTML, because it is more descriptive and more logical. (Of course, when you know HTML and don't know CSS, the second may look very difficult, but that is something else.) And like I said before, the CSS solution only requires one person to understand CSS, all other users need no knowledge. (They don't need to understand how the line underneath each header was created either, do they?) By its design, CSS is exactly the tool meant for this problem: to separate layout from structure and content.
  • The Arabic characters example referred to template:ARchar which I think you'd find to be an equally distasteful practice...but one that meets the needs of this community very well.
  • Took a look at that template, and I have a vague idea about the thoughts behind it. This was meant to be used to display Arabic characters in browsers that cannot display Unicode, right? Not a very "stable" solution, because you're right it will display invalid characters for many users. Two important differences with my suggestion however; firstly mine will not make anything unreadable for browsers that don't support CSS, and secondly mine actually saves typing work in contrast to that Arabic template.
  • Again, if we were to use "just "=="" then the separation between languages would not be as clear during editing. No CSS solution can do that; there is no way for CSS to split apart an edit box dynamically. Even if there were, I doubt anyone would want it to.
  • I already said you were right on that point, no need to repeat it. Indeed, CSS is not applied to the editbox.
  • If we were using dictionary specific software, instead of encyclopedia oriented software, we would have data abstraction for all elements, with custom input and output forms for everything. This approach (MediaWiki) is the opposite: no data formats (much to my chagrin) and no formatting conventions, except perhaps those of community accepted practices.
  • Clear, on this point we also agree. For me, the reason I use Wiktionary is that it is the only dictionary I found which lets me add missing entries. If there were a better alternative, I would switch quickly. But this point is offtopic.
--Connel MacKenzie T C 17:45, 28 February 2006 (UTC)[reply]
  • Okay, points taken, I agree on your point that the "----" is also a divider in the editbox, which you would lose with CSS. Now I am curious what other users have to say.
-- Pbb 08:22, 1 March 2006 (UTC)[reply]
  • I think you still missed two important points: 1) the divider is syntactic, 2) the divider aids in parsing.
  • Sorry, I don't understand what you mean saying the divider is syntactic. (English is not my native language.) Is it more syntactic when defined in WikiText, than when defined in CSS?
  • Did I miss that point? I replied saying "==" could be used in the same way, in which you replied with the advantage when editing articles, with which I already agreed before. Or am I misunderstanding you now?
  • You mean another than Eclecticology's comment dating 28 February? Because that one has got my reply right underneath it. Which comment did I miss, because then I am still overlooking it... If you are referring to that comment, here a summary of my reply:
  • Ec thinks this would "expect most users, who do not have any particular programming skills, to be able to make those [CSS] adjustments". The whole point is, only one person has to edit the CSS code once, instead of the current situation where everybody is expected to add the divider line him/herself. Nobody except that one person needs any CSS knowledge.
  • Ec also comments that "not every skin has the horizontal line below level two headings, and I find the horizontal line a visually effective way of separating two languages." Great, that is why we have personal stylesheets in Wiktionary. So that we can customize the look of skins the way we personally want. Again, why ask everybody to hardcode layout elements? If we all agree on the positive effect, then we add it to the master CSS, if it is just a personal preference, then the personal CSS is the place to put it. Still no reason to hard-code it in the page.
  • There's another problem with having the horizontal lines hard wired into CSS. It would force those lines not only to separate languages but also before level two headings on any page, including this one. Eclecticology 09:47, 1 March 2006 (UTC)[reply]
  • Nope. The code .ns-0 H2 {border-top: solid 1px rgb(170,170,170);} adds the extra divider line only on pages in the default namespace.
-- Pbb 15:18, 1 March 2006 (UTC)[reply]

Unicode character representation

This edit was pointed out to me:

and looking at it, I'm a bit confused. I thought we were supposed to substitute characters in place of their HTML equivalents whenever possible, to facilitate searching. Has the search software been fixed now? --Connel MacKenzie T C 06:51, 28 February 2006 (UTC)[reply]
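For reference, substituting literal characters for HTML character references is mechanical. A minimal sketch using Python's standard `html` module, with a made-up sample string; it says nothing about how the site's search software actually treats either form:

```python
import html

def entities_to_characters(text):
    """Replace named and numeric HTML character references with the characters."""
    return html.unescape(text)

# Made-up sample: one named reference and one decimal reference.
sample = "h&aacute;&#269;ek"
print(entities_to_characters(sample))
```

A bot doing this across entries would presumably need to leave alone references that are written out on purpose, for example inside pages that document the markup itself.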

Great Pronunciation Flood

I recently dusted off the file of IPA transcriptions I started before Christmas when I was sick and my brain wasn't good for anything but doing transcriptions. Then I started adding to it again. Then I thought, "Hmm, all these pronunciations should be on Wiktionary."

So I've put on User:Keffy/Great_Pronunciation_Flood my proposal for a bot-mediated project to start adding these transcriptions to Wiktionary, along with (ideally temporary) audio files generated by a speech synthesizer for each transcription. For now, this is only for discussion and to get people's suggestions for improvements. It'll probably be at least a month before any uploads are ready to go.

It might be better if small-scale technical suggestions went on the talk page for User:Keffy/Great_Pronunciation_Flood, leaving the Beer Parlour for big issues, like: How do people actually feel about having speech-synthesized files for pronunciations? Keffy 08:12, 28 February 2006 (UTC)[reply]

  1. The 'bot for mass-uploading files exists already, called "commonplace." Somewhere on w:WP:TOOLS, I think you'll find the link.
  2. Could you do a sample for Wiktionary?  :-)
  3. Aren't there some really crazy modifier rules for elisions, etc.? Will you even try for those?
  4. Rock on dude! Very cool stuff! If I can help feeding you lists or 'bot uploading them, let me know what you want. I usually create a script of repetitious calls to "replace.py" (e.g. for GutenBot.) That way, I'm not messing with the python code directly. --Connel MacKenzie T C 08:31, 28 February 2006 (UTC)[reply]
Great - this will save a lot of work. Is this for audio transcriptions only or for text transcriptions too? Any transcriptions need to be marked with the variety of English they apply to, of course - are these all Canadian pronunciations? — Paul G 12:49, 1 March 2006 (UTC)[reply]

Lexicons: A Proposal

Yes, I know that in Wiktionary we have a system where, to collect English words of German origin, for example, we can just create a Category. But categories have a disadvantage in that words cannot all be listed on one page, and no text next to a word is possible, as is possible in Wikipedia (see Wikipedia:Lists of English words of international origin); for this reason, attempts to remove those lists from Wikipedia and allow Wiktionary to sort them by categories were resisted. However, I have a proposal to remedy this: I want to create a new (?) type of Wiktionary page: Lexicon pages; we can have, for example, a Lexicon:English words of German origin, permitting a bit of text as well as a list. Before starting any such pages, I would like some opinions. I know it may seem redundant, but categories have limitations and there seems to be demand for lists with text; but they look awkward in Wikipedia, and often cause controversy (encyclopedia vs dictionary). For more, see the talk page of Wikipedia:Lists of English words of international origin. I am not asking to replace categories in any fashion, or even disturb them. I like categories. Alexander 007 21:15, 28 February 2006 (UTC)[reply]

Any objections? I'm about to create the first such lexicon. These lexicons will be linked to from Wikipedia eventually, so there is an audience out there which will be "clamoring" for them. They will definitely serve a purpose here. They will also probably attract new batches of people to Wiktionary. Alexander 007 21:49, 28 February 2006 (UTC)[reply]
If possible, how about having a word or two with the inclusionists over at wikipedia who seem slightly obsessed with keeping so many pure dicdefs instead of letting the wiktionaries take the workload they were intended for? It would be nice if they spent more time writing dictionary info in an actual dictionary. :-p
Peter Isotalo 00:29, 1 March 2006 (UTC)[reply]
What is happening in Wikipedia about this and their ongoing deletionist/inclusionist debates are entirely their concern, not ours.
While I have no fundamental objection to the lists that you propose, there is no demonstrable need for your new pseudonamespace. "Appendix:" will do just fine. The biggest problem with lists is that they are not self-maintaining, and rapidly go out of date. Eclecticology 09:22, 1 March 2006 (UTC)[reply]
It's very much in our interest to get more people working on this project. That it happens to coincide with the possible cutting of a major Gordian knot of internal wikipedia politics seems like an odd reason to reject the idea. With some subtlety, of course. Making a general call for "Dicdef-inclusions to join the right project" or some such tactless idea.
Peter Isotalo 14:53, 1 March 2006 (UTC)[reply]

Obscene and suggestive images

What is Wiktionary's position on images such as the one currently at lolicon? I'm told that making any sort of judgement about the appropriateness of an image is not NPOV.

I believe this to be a monstrous piece of fucking bullshit. The image (relevant to that context) does not belong on Wiktionary, right? --Connel MacKenzie T C 23:06, 28 February 2006 (UTC)[reply]

First, making judgments about the appropriateness of an image is fine, as long as it's not based on Censorship, offensiveness etc. What is 'Obscene' to some is perfectly acceptable to others. As pointed out at w:lolicon these comics can be freely bought in Japan. Remember wiktionary.en is not wiktionary USA (or any other region).
Second, some stuff from Wikipedia:
All world cultures have certain taboos regarding certain subject matter, and how those subjects may be portrayed (if at all). However, views and feelings on these matters vary so widely from culture to culture, within each culture, and from period to period, that there is no universal agreement as to what is "offensive". There is also no agreement or substantive evidence as to what information may cause concrete, objective harm to society or individuals, nor any agreement as to what is age-appropriate for people to read or see. Many cultures have attempted to shield certain information from access by children, from women, from certain races or social classes, or from all of those groups at various times in history.
Wikipedia's mission to provide a neutral and comprehensive resource of information requires that it discuss, describe, and illustrate matters that have been, are, or will be considered offensive by certain groups. Because one of Wikipedia's most fundamental policies is to maintain a neutral point of view, and because there is no objective basis for imposing one cultural preference upon Wikipedia articles over another, it must disregard "offensive" as a basis for removing information. Wikipedia contributors must attempt to step outside of culturally contingent perspectives to document and illustrate even the most controversial topics in a thorough, factual manner, to the fullest extent possible under relevant law.
And third, the comment I left Connel:
You believe that these images should not be shown on Wiktionary because you think they're 'sick'. That implies a moral judgment, and is therefore directly in opposition to NPOV, the founding principle of all Wikimedia projects. If Wiki cannot conform to all worldwide communities (and it can't), then it should not conform to any of them. The image is legal and directly relevant to the entry and should stay, unless you have editorial reasons for its removal (which should be discussed on its talk page).
Gerard Foley 23:14, 28 February 2006 (UTC)[reply]
I think it should be tolerated at the wikipedias, but not here. Using pictures to illustrate sexual fetishes or the like is not something we should encourage. I feel it's anything but pragmatic to believe it would improve our reputation. A lot of people get upset by sexually explicit images of any kind, and this is not the place to question their judgement, even if in most cases I don't agree with them in the least. It simply distracts too much from both the content of the article and the work on the article. Pictures illustrating articles at wikipedias are more or less mandatory and potentially offensive pictures should be tolerated in many instances. At the wiktionaries I think all types of pictures should be kept to an absolute minimum and pretty much anything sexually oriented should be avoided completely.
And frankly, I don't feel that NPOV is all that applicable to any wiktionary. It's a rule of thumb clearly intended for encyclopedias, not dictionaries. Neither is this the appropriate forum for discussing any possible forgiving traits of pedophilia. We're talking kiddie porn fantasies here, not merely obscure or abnormal sexual practices between consenting adults. It is disturbing and I don't consider Connel's reaction unreasonable, except for the foul language.
Peter Isotalo 00:00, 1 March 2006 (UTC)[reply]
My foul language was an attempt to illustrate that I am not supportive of "censorship" of entries. We are not Wikipedia, and I would like solid guidelines on questionable images (Peter's points above seem very reasonable.) We are not the Japanese Wiktionary either; perhaps they would like to include the image, but it is certainly far outside "acceptable" to our readers. I haven't checked, but I'd be quite surprised to discover that say, in India (where English is spoken) child cartoon pornography is encouraged. --Connel MacKenzie T C 00:18, 1 March 2006 (UTC)[reply]

We're not the Japanese Wiktionary, but people in Japan will use our site. Why should we force our moral standards on Japanese readers. Would we let others force their moral standards on us? Wiktionary en is used by the entire world. There is no universal agreement as to what is "offensive". It's different everywhere in the world. Therefore "offensive" is a point of view, and that goes against NPOV which is as much valid here as on Wikipedia. I can't believe I have to go through this stupid debate yet again. Gerard Foley 00:37, 1 March 2006 (UTC)[reply]

  • I support Gerard on this matter. The material in question is by nature offensive, and removing a cartoon image will not change that. True, pictures should be kept to a minimum in a dictionary to preserve its ease of use, but cluttering is not an issue with an entry as short as this. A picture is worth a thousand words, and the image in question helps readers understand exactly what a lolicon is. --Primetime 01:34, 1 March 2006 (UTC)[reply]

Offensive to some, mostly westerners (us), but Wikipedia claims this to be freely available in Japan, so it's not as offensive over there (hence the POV). Connel, you said above that "I am not supportive of "censorship" of entries". If you truly have that view then you must accept (as I have) that there will be items on Wiktionary that you will not agree with. You can't say you don't support censorship and then go and do just that, no matter how much the image disgusts you. Perhaps you might look at w:Wikipedia:Censorship. Gerard Foley 02:22, 1 March 2006 (UTC)[reply]

Good sirs, go reread what I wrote. That simply is not appropriate here. We have lots of offensive entries here, but they are described in as neutral a manner as possible. An in-your-face shock-value image is not neutral. Opening the door for this nonsense certainly makes us vulnerable to a much wider range of abuses than we currently are. As a sysop here, I can attest that we are barely keeping up with the volume of nonsense that comes our way. Does my point of view affect my judgement? Of course. If I am out of line, or rather, when I am out of line, others let me know right away. On this particular topic, I am not out of line. You need to put that image somewhere? Put it in the Wikipedia garbage heap, not here. --Connel MacKenzie T C 03:47, 1 March 2006 (UTC)[reply]
The concept in lolicon can be adequately described without the illustration. It is not relevant to an understanding of the word. As pornography it is certainly at the mildest end of the spectrum, and I am not offended by it. It is nevertheless unnecessarily controversial, and remains covered by copyright. Maintaining harmony in the project is more important than an attempt by someone including an image just to make a point about censorship. That's reason enough to delete the picture. Eclecticology 09:00, 1 March 2006 (UTC)[reply]

I know many people who would be offended by an image such as the one shown. It harms our reputation. Do you want Wiktionary to be blocked by every school and every parent? No serious dictionary includes such images, nor any other images that would be deemed offensive. I vote for self-censorship and removal. Jonathan Webley 09:19, 1 March 2006 (UTC)[reply]


  • Why should we force our moral standards on Japanese readers.
By *not* showing something we aren't forcing anything on anyone, quite the opposite.
  • I can't believe I have to go through this stupid debate yet again.
The debate isn't stupid, it is very relevant. The reason people keep popping up on Connel's side of this question is that nine times out of ten the image appears to be posted for shock value and trolling rather than actual furtherance of the definition of a word. A non-nude lolicon image would have suited, but the most graphic of those on *pedia was chosen. It seems to me that the folks who chose to post the most extreme thing they think they can get away with are relishing this fight so they can take what they think is the moral high ground of "no censorship from me." I don't think we need a sample of child pornography in order to have an accurate definition, I don't think we need pictures of Goatse in order to define a "shock site." It is certainly possible to find a lolicon image that represents the style of manga yet is relatively inoffensive, so let's do that. - TheDaveRoss 09:31, 1 March 2006 (UTC)[reply]

I think we should sidestep the issue and ask ourselves: what does the picture offer that the sentence "Erotic art depicting female children, generally between the ages of 8 and 13." does not? In my opinion, nothing. A picture in a dictionary should be used to communicate subtle nuances that are difficult to put in words. Note that I do support the use of the picture in Wikipedia, and anybody more interested in the subject can go there. We should, in my opinion, only explain words to the extent that nobody mistakenly clicks on the link to Wikipedia because they misunderstood, and nobody fails to click on the link to Wikipedia because they thought it wasn't what they were looking for. I don't see any significant risk of either happening in this case. --Patrik Stridvall 10:54, 1 March 2006 (UTC)[reply]

Patrik makes a good point. Surely the criterion for the inclusion of any picture should be whether it gives the user a better understanding of the definition of the entry. The definition here is adequate; the picture does not provide any information that the definition does not. It can therefore be removed. Censorship, morality and the like then become separate issues. — Paul G 12:57, 1 March 2006 (UTC)[reply]

I think this talk of respecting Japanese society as a whole is utter nonsense. In the US and many other Western countries, there are large minorities that consume pornography considered to be extremely offensive or downright illegal by the vast majority of the population. I'm quite confident that the situation is fairly similar in Japan. That some Japanese are tasteless and (in my view) immoral enough to commercially support simulated kiddie porn doesn't mean that it's a representation of the Japanese people as a whole. And I'm certainly not seeing anyone trying to push graphic imagery at, say, anal sex by including a cover of Butt-fuck Bitches 17 or something like it.

Other than that, I think Patrik has the best argument so far: there's simply no need to illustrate this concept with a picture because it doesn't really add anything that isn't already covered by the definition. A good example of an article that does merit including a picture is trebuchet, the general design of which is extremely difficult to explain in even the most well-written prose.

Peter Isotalo 15:27, 1 March 2006 (UTC)[reply]

To TheDaveRoss: the picture I originally chose was Image:Lolicon comicbooks sold in Japan 001.jpg. I wouldn’t have called it the most graphic image available. I'm not a troll and I didn't post any image for shock value.

To Peter: Wikipedia says "Many general bookstores and newsstands openly offer illustrated lolicon material.", and [10] says "I try not to loiter around the schoolgirl/lolita section too long. ...Unfortunately, this is like 99.9975% of Japanese porn.". This doesn’t give me the impression that some Japanese are tasteless and immoral, to me it suggests that it's normal there.

And finally the argument to remove the picture because it doesn’t give a better understanding of the definition of the entry. I am completely against the removal of the picture for censorship reasons, but I'm not against its removal for simple editorial reasons. I support the removal of the image based on this criterion. Gerard Foley 16:57, 1 March 2006 (UTC)[reply]

I think better understanding is the key here. To have a policy that allows pictures that merely communicate the same thing as the definition will cause several problems as well as endless debates. Correct me if you disagree, but I believe that few people would argue against the idea that a picture should, as a minimum, properly illustrate the concept as opposed to being primarily decorative and only vaguely related.
If so then we really can't choose a less offensive picture for lolicon since the word by definition involves erotic pictures of female children. Choosing something "neutral" simply won't do. Actually given the inherently offensive context the picture is actually quite neutral in my opinion. I can easily imagine much worse pictures fitting the definition though I guess at some point they would be pornographic as opposed to erotic. Though it is possible that some lolicon would perhaps be considered pornographic. I don't know and I'm not really sure I want to know either...
In addition, I am sure there will eventually be people who would like to properly illustrate concepts like rape, that is, the victim looks unwilling, the perpetrator uses or threatens violence, etc. This will just provoke a new debate. Demanding that the pictures offer something additional will avoid that. If the definition for something is not clear enough, we could always say that if you rewrite the definition the picture will not be needed. In the case of lolicon the definition looks clear enough, though I very much fear that erotic should more properly be replaced by "erotic or pornographic".
So therefore I propose that the policy for pictures should be that both of the following are fulfilled.
  • A reasonable effort should be made to try to make the definition clear enough first. An entry is clear enough when there is little risk that somebody might think this was the word they were looking for when it was not as well as the opposite that somebody didn't think it was the word they were looking when it was.
  • The picture must offer something additional that is hard to put in words. Trebuchet is a good example, as are animals like cat and dog. No matter how strict your definition is for a specific animal, a picture will instantly trigger recognition for many people in a way that words simply can't. The picture in lolicon is not very likely to trigger any "Ah, I have seen it before, now I remember." at least I hope not...
The picture in lolicon fails both tests so it should be removed. --Patrik Stridvall 19:21, 1 March 2006 (UTC)[reply]
The criteria proposed looks fine, but no-one has answered if Wiktionary censors entries or not. And if it does what gets censored? Gerard Foley 20:09, 1 March 2006 (UTC)[reply]
I would say the short answer is no, we do not censor, we edit. The goal of Wiktionary is not a political one, it isn't intended to push the boundaries of what is acceptable in any culture anywhere, it is intended to be the most complete, accurate and useful resource it can be, and certain content just doesn't coincide well with that. There are many good reasons to exclude offensive content, several have been enumerated in this discussion (harmony, acceptability by the broadest readership) but the reasons for inclusion are not as good (we don't want to be Bowdlerisers). TheDaveRoss 20:26, 1 March 2006 (UTC)[reply]

So if it's not acceptable to western tastes or controversial it gets censo... edited out. Got ya! Gerard Foley 20:40, 1 March 2006 (UTC)[reply]

It is not about censoring, nor is it about not offending people. Some people are offended for no reason other than that reality isn't how they wish it to be. At its core, censoring is about the hope that preventing themselves and especially others from seeing reality as it is will somehow help. As long as we don't give in to that, we are not really censoring.
However, as long as we, within the scope of our project, can accurately describe reality as it is, there is no reason to needlessly put ourselves in the firing line. Removing the picture at lolicon does not try to hide reality from anybody wishing to see it. They can find some of it at Wikipedia, and now that they know the word they can search for it in Google or wherever and either order it by mail or travel to Japan or whatever. It would only be censorship if we, by actively hiding reality, tried to prevent people that would have been interested, had they known, from doing that. This, however, doesn't mean that we have to go out of our way to provide it, especially when it is doubtful that it is within the scope of our project. --Patrik Stridvall 21:44, 1 March 2006 (UTC)[reply]

Freedom of speech is a mirage. Every country has laws that restrict what can be said or written, whether with regard to propriety, privacy, national security, copyright, libel or public order. Every publisher practises self-censorship. Indeed NPOV implies censorship. Articles that are offensive – for example if racist – are censored. Paedophile images are not acceptable in what is meant to be a mainstream work of reference. Jonathan Webley 21:52, 1 March 2006 (UTC)[reply]

You cannot define paedophile images. A photo of my then two-year-old nephew (currently age 20) sitting in the sun in a tub is considered by his family the nicest picture we have ever had of him. A paedophile might find it interesting. Consequently, many pictures are only paedophilic because of how you look at them. Here a dirty mind is not the joy forever. GerardM 12:57, 2 March 2006 (UTC)[reply]
I sympathise with you to a certain extent, but I don't see why you're so adamant about this particular page. Maybe it is worth offending people if it helps us define a word, but in this case it doesn't, so why bother? Widsith 13:37, 2 March 2006 (UTC)[reply]
Since several people seem to have very specific thoughts on this issue, perhaps this would be a good time to consider creating a new page distilling down "policy" on including images. Specifically: (1) When does an article need an image and what criteria can be used to make that decision? (2) When should an article NOT have an image and what criteria can help make that decision? (3) How many images should appear in a single entry? (4) When is a link to an illustrated Wikipedia article sufficient? --EncycloPetey 07:12, 4 March 2006 (UTC)[reply]
Go for it. Create a policy page for when inclusion of pictures is appropriate. Please! And move as much as possible from here to the discussion page of the policy.--Richardb 17:25, 4 March 2006 (UTC)[reply]
We have enough problems agreeing on the wording of some articles. What are the chances of agreeing on the right picture to illustrate the page? What chance that vandals will think they have a better picture! Even if the page is not as controversial as this. Steer clear of pictures except when really necessary/useful.--Richardb 17:25, 4 March 2006 (UTC)[reply]
Personally I think you've got to have some strange mentality to consider the first picture Gerard chose as in any way offensive (the picture of a magazine seller's display). And I know that things are different in the US (and Saudi and Iran), but even the second picture seemed pretty damn inoffensive to me. But, let Wikipedia handle that controversy. I think they are better equipped to do so than our smaller team. A decent link to the Wikipedia article is all that is needed. So that's what I have done.--Richardb 17:25, 4 March 2006 (UTC)[reply]
EC put it up for rfv. How can you seriously do that when there is a huge great article in Wikipedia, and it's got etymology here in Wiktionary. How much proof do we really need to keep a word on here ? Or was that just back-door censorship?--Richardb 17:25, 4 March 2006 (UTC)[reply]

The policy that should be chosen is simple. Accept all images that are not the least bit controversial since every picture adds some value to the definition. I mean, the Wiktionaries could potentially be the greatest dictionary resource ever created, so why try to stifle that? As to the controversial images, the policy must accept them eventually. As per the Wikipedia guidelines, this is a question of censorship, and you can't get around that by defending one form of censorship over another. I'm sorry to say I found the image in question to be quite illustrative and only mildly offensive considering what's floating around out there. That image in its selection sought to illustrate the concept in the least offensive form possible, and quite frankly I'm glad it's included in Wikipedia. But it should not be included here, not yet, because of the lopsidedness toward potentially offensive material that will tarnish Wikipedia's reputation. Images that, in depicting controversial topics, are illustrative of the definition to a certain extent, should not be accepted unless more neutral topics which could be illustrated to at least the same extent are so illustrated in the majority. Look at almost any picture book and the first thing you'll see are animals, but save a handful of entries Wiktionary doesn't even have that yet. Yes, the image is appropriate, but we're a long way off from including anything so controversial as this. Davilla 18:06, 7 March 2006 (UTC)[reply]

Confirm e-mail address

The current version of the MediaWiki software requests a confirmation of your e-mail address. To do this, log in, then go to Preferences, in the upper right corner of the screen. Check that the e-mail address you have given is the one you want, then scroll down and click "Confirm your e-mail address." In the next screen, read the text, then choose "Mail a confirmation code." Then, follow the link in the message to confirm your address. --Dvortygirl 05:21, 2 March 2006 (UTC)[reply]

Clarification: Apparently this step is required even if you previously had confirmed your e-mail address. --Connel MacKenzie T C 02:05, 3 March 2006 (UTC)[reply]

Bot Request: User: Tawkerbot

Connel has requested bot assistance for changing wikt:WS:WS to wikt:WT:WS on Special:Allpages/WS: wikt:Special:Allpages/WT:

I have tawkerbot, a pywikipedia bot running on a machine in my data center that doesn't get a lot of use.

The bot's username is User:Tawkerbot - the bot also runs on Wikipedia under the same username.

Point of clarification: I requested it be discussed first, bot or no bot, to make sure everyone understood why. This will accomplish that, no doubt. --Connel MacKenzie T C 02:02, 3 March 2006 (UTC)[reply]


Bot Name: User: Tawkerbot
Owner: User: Tawker
Purpose: changing WS to WT
  • For votes:
  • Against votes:
  • Comments/questions:

Why does this need a bot? This is a very small list. What difference does it make if these are listed as "WS" or "WT"? They are only shortcuts. Eclecticology 09:37, 2 March 2006 (UTC)[reply]

When they are represented as [[WS:BP]] some things interpret that as [[wikisource:BP]]. My software never does, but often when coordinating with other projects (Wikipedia, Meta, etc.) unnecessary confusion arises when someone notices the shortcut (even though it is hidden with the pipe syntax.) Fixing it now, while there are still very few shortcuts entered, is better than waiting.
Good point. Eclecticology 07:47, 3 March 2006 (UTC)[reply]
As to "why a 'bot request": you have specified in the past that all 'bot tasks need separate authorization. That seems to be atypical of other MediaWiki projects. --Connel MacKenzie T C 02:02, 3 March 2006 (UTC)[reply]
In this case it seems that the number of pages affected is so small. This task could even be done manually before the approval process is finished. At present all that bot approval does is hide the changes from recent changes. The process can still be run without being formally approved. Eclecticology 07:47, 3 March 2006 (UTC)[reply]
From memory, when I introduced shortcuts to Wiktionary from Wikipedia, I used WS the same as Wikipedia. Has that changed ? How did they make the change ? --Richardb 16:39, 4 March 2006 (UTC)[reply]
Ah well, so it was my fault, unknowingly. Wikipedia uses WP for their shortcuts. Sorry for using WS unknowingly!--Richardb 16:44, 4 March 2006 (UTC)[reply]
I'm not sure if that changed just shortly after you got us started, or if it was a misconception from the start. Doesn't matter. For now we need these copied/moved to WT: names, and the WS: kept around long enough for people to get used to the change, right? --Connel MacKenzie T C 16:55, 4 March 2006 (UTC)[reply]
OK, moving all 43 manually. --Connel MacKenzie T C 05:26, 5 March 2006 (UTC)[reply]
Great work Connel!--Richardb 08:46, 6 March 2006 (UTC)[reply]
Ah, but you also have to change all links to WS:CFI etc !!!--Richardb 08:48, 6 March 2006 (UTC)[reply]
Well, that task is certainly oriented towards a bot! But then again, no link is "broken" per se...the extra click required, means that people will learn the new shortcuts that much sooner (this is either a good thing or a bad thing depending on your POV.) If these aren't fixed in a couple days, ping me again. --Connel MacKenzie T C 10:09, 6 March 2006 (UTC)[reply]
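For whoever ends up doing the link clean-up mentioned above, here is a minimal sketch of the text substitution involved. It is only an illustration: the function name `retarget_shortcuts` is made up for this example, it works on a plain wikitext string, and it does not use or reflect how Tawkerbot or pywikipedia actually perform edits. It rewrites only the WS: prefix of a link target and leaves piped labels and other links alone.

```python
import re

# Matches the start of a wiki link whose target begins with the old "WS:" prefix.
# The target stops at the first "|" (piped label) or "]" (end of link).
SHORTCUT_LINK = re.compile(r"\[\[WS:([^\]|]+)")


def retarget_shortcuts(wikitext: str) -> str:
    """Return wikitext with [[WS:...]] link targets changed to [[WT:...]].

    Hypothetical helper for illustration; only the prefix is rewritten,
    so [[WS:WS]] becomes [[WT:WS]] and piped labels are left untouched.
    """
    return SHORTCUT_LINK.sub(r"[[WT:\1", wikitext)


if __name__ == "__main__":
    sample = "See [[WS:CFI]] and [[WS:BP|the Beer parlour]]."
    print(retarget_shortcuts(sample))
```

Because the old WS: pages are being kept around for a while, running something like this late does no harm; links it has not yet reached still resolve.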

Translations to be checked proposal

The following is slightly adapted from Paul G's talk page:

I've been considering a major change for Category:Check translations.

I was reading Paul's proposal from some time ago, and I was wondering whether it ever got anywhere. I think we need to take action right now, as has been stated several times before (and because of the ever-growing number of articles in the category).

My proposed change is to make the whole system of Translations to be checked (TTBC) more akin to what we now have for RfC and RfV. I'd set up Wiktionary:Translations to be checked and make Template:checktrans more like Template:rfc.

The main reason why I think my proposed system might work better (and make the job a bit less discouraging), is that in the current category-only structure, one can't see which translations are actually needed. Each time I tried to work through it, I quickly quit because of a) the fact that my work went largely unnoticed (by myself), because I could not remove a single article from the category due to other translations needing review; and b) because, after reviewing 50 entries, I noticed many many simple TTBC, for which I felt it would be convenient to warn a knowledgeable someone (unwieldy job, though).

Therefore, a central RFV-like page for that. The main advantage there would be that one can quickly list the required languages, which will attract those who can do the job.

Known problems already are translations into exotic languages for which none may have the ability to check. However, they can be archived after some time. More so, the 800 present TTBC, which will all need to be put on the page (I volunteer). Certainly, I would not put them all under a separate ==header==. As for the probably quickly derailing length of the page, I have no clear view on that. Perhaps it won't become that long after all, when a number of people are working on it.

Note: before I came up with this, I considered making distinctive categories like Category:Check Dutch translations, which could be added, but that would lead us too far off I guess.

I'd like to hear more opinions on this. — Vildricianus 14:19, 3 March 2006 (UTC)[reply]

I think this is a fantastic idea. I've done the same as you - made a start on a particular language, done one or two entries and not been able to remove any from the list as other languages remain unchecked. The problem here has always been, as you say, that you can't tell what languages remain outstanding or which pages need the most effort.
Having a specialised page that lists what languages need to be checked for any given word would certainly help deal with this. Those people who volunteered to help could be directed to this page and quickly be able to find out where their contributions are required. Entries could take the following form, perhaps:
===word===
Dutch, French, German, Italian, Kurdish, Latin, Macedonian, Occitan, Russian, Swedish, Urdu
(note that the languages are listed in alphabetical order for ease of finding them)
Users could then sign off the languages as they do them, by saying (for example) "Dutch done" and signing their name, which would show who did the checking and when, thereby giving us an audit trail.
I'm glad to hear you are putting your money where your mouth is by volunteering to post all the words on the page. Perhaps, as they are added, they could also be removed from "Category:Check translations" so that none of them get forgotten, this category then either being phased out or retained and checked from time to time to make sure that the "rft" page is up to date.
Once entries are complete, they can of course be removed from the page, so the page needn't get too long.
One resource that we can make use of, especially for the lesser-known languages, is Category:User_languages. I propose that all users listed there are invited to contribute.
I don't think anyone has addressed this before, you know, so please do raise it in the Beer parlour (you are welcome to copy and paste my comments there). — Paul G 15:00, 3 March 2006 (UTC)[reply]
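The entry format suggested above (a ===word=== header followed by an alphabetical list of languages, with languages signed off as they are checked) could be produced along these lines. This is a rough sketch under assumptions: the helper names `render_entry` and `sign_off` are invented for illustration, and nothing here touches the wiki itself.

```python
def render_entry(word, languages):
    """Return a '===word===' header followed by the languages in alphabetical order."""
    header = f"==={word}==="
    return header + "\n" + ", ".join(sorted(languages, key=str.casefold))


def sign_off(languages, done):
    """Return the languages still outstanding once those in `done` have been checked."""
    done_set = {name.casefold() for name in done}
    return [name for name in languages if name.casefold() not in done_set]


if __name__ == "__main__":
    outstanding = ["Urdu", "Dutch", "French"]
    print(render_entry("word", outstanding))
    # After someone posts "Dutch done", re-render with what is left.
    print(render_entry("word", sign_off(outstanding, ["Dutch"])))
```

An entry whose outstanding list becomes empty is the one that can be removed from the page.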

Additional thoughts:

As the current number of articles to be checked is around 800, it's desirable not to put them in one go on the page using a separate header system. I could perhaps use a bullet system, e.g.:

  • back: Hebrew, Ido, Russian
  • balance: Arabic, Breton, Catalan, Chinese, German, Hebrew, Ido, Indonesian, Japanese, Spanish, Swedish
  • balloon: Marathi

Later entries could then be added each under a separate header perhaps (for maximal use of TOC convenience). Or, I can immediately start off using headers for each word, but then I'll transfer smaller groups of words from the category to the TTBC page.

The category could be kept and function like Category:Requests for cleanup. Using Related changes can show recent additions to it that have not been put on the TTBC page. — Vildricianus 15:44, 3 March 2006 (UTC)[reply]
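To show how the bullet layout above could be turned around so that a checker sees every word waiting on their language, here is a small sketch. It assumes each line has the form "word: Language, Language, ..." as in the example bullets; the function name `words_by_language` is hypothetical and the code only processes a text string.

```python
from collections import defaultdict


def words_by_language(bullets: str) -> dict:
    """Invert 'word: Lang, Lang' bullet lines into a language -> [words] mapping."""
    index = defaultdict(list)
    for line in bullets.splitlines():
        # Drop indentation and a leading bullet character, if any.
        line = line.strip().lstrip("•*").strip()
        if ":" not in line:
            continue  # not an entry line
        word, langs = line.split(":", 1)
        for lang in langs.split(","):
            lang = lang.strip()
            if lang:
                index[lang].append(word.strip())
    return dict(index)


if __name__ == "__main__":
    sample = """
      • back: Hebrew, Ido, Russian
      • balance: Arabic, Breton, Catalan, Chinese, German, Hebrew, Ido, Indonesian, Japanese, Spanish, Swedish
      • balloon: Marathi
    """
    for language, words in sorted(words_by_language(sample).items()):
        print(f"{language}: {', '.join(words)}")
```

The same inversion would work for the header layout if each header and its language line were first joined into one "word: languages" line.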

I really don't have much to say on this one, though I have been aware of the problem ever since the checktrans notices first went up. Unlike many of the other cleanup issues these cannot usually be fixed by one person in one sitting. The one language at a time kind of progress only makes things look slower than they really are. I looked at balance, and noted that one of the Indonesian words had a blue link, so I followed it. That page has the single one-word definition "balance" without any indication of which sense is intended. Fixing and referencing such pages would go hand-in-hand with the current proposal; following blue links might even show that we already have the information. Anyway I'm glad that someone is willing to take on this big task, and if this works I'll be even happier. Eclecticology 19:23, 3 March 2006 (UTC)[reply]
Template:rft unfortunately exists for the tea room. Perhaps Template:ttbc or Template:rftr? Am I right that you have discarded the notion of auto-populating those 800 entries into language sub-categories? (As a one-time thing I could see it, but it would probably be very awkward to add new entries to.) --Connel MacKenzie T C 19:39, 3 March 2006 (UTC)[reply]
This sounds like a worthwhile project. The trouble that I always have with these things is that most of the time I simply can’t understand the descriptor and don’t know what meaning is wanted. It would be a tremendous help if whoever writes a definition would also use it in a sentence. I know that some others have trouble with this as well, because I constantly see odd translations of the descriptors instead of the word in question. —Stephen 19:54, 3 March 2006 (UTC)[reply]
Exactly because the issues can't be solved by one person I came up with the idea of creating a page which clearly states what the issues exactly are, i.e. which languages need checking. Of course, I'm also aware that about 50% of the articles in Category:Check translations don't require much work, which is also exactly why my proposed page may work without too much trouble for the existing articles.
Why not simply keep Template:checktrans and modify it for the needs of the new TTBC page? And yes, the sub-category thing was a non-starter, which compelled me to come up with a better idea.
The issues that Stephen addressed are a much bigger problem actually, which can't be solved for now, save by encouraging people to add as many clear example sentences as possible. On top of the new TTBC page will also go a big notice discouraging people from checking translations they're not 100% sure of (I propose). — Vildricianus 20:43, 3 March 2006 (UTC)[reply]
A layout proposal. — Vildricianus 23:02, 3 March 2006 (UTC)[reply]
Thanks for discussing this on IRC. As a result, I've taken a handful of words and taken a shot at using template:ttbc to tag the specific languages into sub-categories. My attempts at auto-populating those 800 entries have so far been less than stellar. I'm still at the stage of testing individual entries (free, book, cat) and encountering problems with just them. I'll probably have it "just right" by the time I do test #799.  :-) It probably is enough already to determine if this is a desired option or not though. --Connel MacKenzie T C 09:24, 4 March 2006 (UTC)[reply]
  • What my approach is sorely lacking is what Stephen was indicating above: a way of tagging entries that need better English examples for the TTBC process to work. This has the side benefit of including us monolinguists in the effort. Stephen, do you have a template naming preference for such a thing? (Or, do you just want to Be Bold with it?) --Connel MacKenzie T C 09:55, 4 March 2006 (UTC)[reply]
Thank you for improving this whole idea. The Wiktionary:Translations to be checked page will now serve the purpose of explaining to less experienced users exactly how to work on it. Perhaps we can also list requests for example sentences there. Connel, at book you missed out the first section of TTBC. I'm wondering how huge the categories box at the bottom of the page would have been if you hadn't.
I can't think of any problem with this system. If only we could autopopulate those categories. — Vildricianus 10:15, 4 March 2006 (UTC)[reply]
Truly scary thought: even if it isn't automated, the concept still works, right?
Thanks, yes, that problem with book was one of the problems I was alluding to above. I think many of those translation languages are already covered by free though. (Free used to be the most-translated word here - I wonder if it still is. Anyone know?)
I think we've even addressed Paul's concern about history of edits - the person removing the "ttbc" tag is thereby verifying that a translation is correct. And the history is right where it belongs, not scattered elsewhere.
Should I start removing the {{checktrans}} tag from entries that are subdivided into the language groupings? So far, I've left them in place, but on reflection, this seems wrong. --Connel MacKenzie T C 10:43, 4 March 2006 (UTC)[reply]
  1. I think the "{{checktrans}} tag" should be removed only when any/all numbered translations have been put into their appropriate definition (by checking the history, to see what the translation entered was intended for.)
  2. I'm not completely happy with template:ttbc. Its role is to propagate the correct category, but not change the display, and not make it harder for translators. By maintaining the wikification of the language name, it seems to be getting complicated. Anyone mind if I just remove the wikification of the language names, within the "checktrans" section?
--Connel MacKenzie T C 17:48, 6 March 2006 (UTC)[reply]

Clarifying recap lest it be misunderstood: the current state of the TTBC-project has changed from the main proposal of mine towards what I first had thought of, namely the categorizing of all entries that need checking. The proposal is as follows: all languages of the TTBC take a template, which puts the word into the appropriate categories, like this, on book:

===TTBC===
{{checktrans}}
*{{ttbc|Dutch}}: boek
*{{ttbc|Esperanto}}: libro

This will add book to Category:Translations to be checked (Dutch) and ~ (Esperanto). See the provisional Category:ttbc for entries where the proposed system has already been applied as a test.
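The effect of the tagged section above can be sketched in a few lines of Python. This is only an illustration of the categorisation described in the recap: the function name ttbc_categories and the regular expression are hypothetical, and the real work is done by Template:ttbc inside MediaWiki, not by a script like this.

```python
import re

# Hypothetical pattern for a {{ttbc|Language}} call as written in the example above.
TTBC_CALL = re.compile(r"\{\{ttbc\|([^}|]+)\}\}")


def ttbc_categories(wikitext):
    """Return the category names that the ttbc calls in `wikitext` would add."""
    return [
        f"Category:Translations to be checked ({lang.strip()})"
        for lang in TTBC_CALL.findall(wikitext)
    ]


section = """===TTBC===
{{checktrans}}
*{{ttbc|Dutch}}: boek
*{{ttbc|Esperanto}}: libro
"""

for category in ttbc_categories(section):
    print(category)
```

Note that {{checktrans}} contributes nothing here; in this sketch only the per-language ttbc calls map to a language category.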

This new system requires less work, as the initially proposed Wiktionary:Translations to be checked will no longer be needed to nominate entries. Adding Template:ttbc will do so. Users will no longer need to browse through the page but can instead access a clear and alphabetical automated category. The new purpose of the page will then be 1) a how-to-check explanation and listing of categories (for those who are unfamiliar with templates) and 2) a place where translators can put up requests for clearer definitions or example sentences so that mistakes in translations are avoided.

Connel, I don't think languages in the TTBC section need to be wikified, so I still don't see any problem with the template. As to whether {{checktrans}} should be removed, I don't see why it should. I would certainly not start doing it until the use of {{ttbc}} is settled, lest anything slip out of the category unnoticed.

Another thing: is there any way, other than the entry's history, to check when translations were tagged "to be checked", so that we can set a time limit and thus keep them from remaining "to be checked" forever? — Vildricianus 20:12, 6 March 2006 (UTC)[reply]

  • There is no practical solution for listing them by date "tagged" at this time. As far as auto-propagating the ttbc template, I have coded that in Javascript rather than Python, so that they can be manually reviewed (at least until I'm sure I've accounted for all the conditions, or until I get sick of them.) I'm finishing debugging the Javascript now.
  • I wanted to remove the checktrans template so that the only ones remaining would be ones with numbered translations to be checked...that is, ones that English-only speakers could very likely help with. But perhaps that should be a different sort of tag anyway. So for now, the checktrans tag stays.
--Connel MacKenzie T C 02:19, 7 March 2006 (UTC)[reply]
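The auto-propagation step mentioned above (wrapping each language name in a TTBC section in the ttbc template) might look roughly like the following. This is a sketch in Python, not the JavaScript that Connel describes; the pattern, the function name tag_ttbc_section and the assumption that each line has the form "*Language: translation" (optionally with a wikified language name) are all hypothetical.

```python
import re

# Matches a bullet such as "*Dutch: boek" or "*[[Dutch]]: boek".
# Lines already using {{ttbc|...}} do not match, because "{" is excluded
# from the language-name character class.
PLAIN_LANG_LINE = re.compile(
    r"^\*[ \t]*(?:\[\[)?([^\[\]{}:|\n]+?)(?:\]\])?[ \t]*:",
    re.MULTILINE,
)


def tag_ttbc_section(section):
    """Wrap each untagged language name in a TTBC section in {{ttbc|...}}."""
    return PLAIN_LANG_LINE.sub(
        lambda m: "*{{ttbc|" + m.group(1).strip() + "}}:",
        section,
    )


before = "*Dutch: boek\n*[[Esperanto]]: libro\n*{{ttbc|French}}: livre\n"
print(tag_ttbc_section(before))
```

Running the function a second time on its own output leaves the text unchanged, which is convenient when entries are reviewed manually and saved more than once.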

One thing that should be amended if possible: Since each of the ttbc templates is included in some category, one typically gets enormously long category inclusion listings at the end of pages with a TTBC section with Wiktionary internal stuff. Ncik 01:25, 8 March 2006 (UTC)[reply]

The category name wording could be changed from "Translations to be checked (...)" to "TTBC (...)". Is that what you are suggesting? --Connel MacKenzie T C 19:48, 8 March 2006 (UTC)[reply]
It would certainly be an improvement. Ncik 01:57, 9 March 2006 (UTC)[reply]

A side effect of this effort is that a long list of categories is now appearing at the top of each of these pages. It would be really nice if these only appeared on the category pages and were hidden on each separate page. Eclecticology 09:31, 9 March 2006 (UTC)[reply]

Perhaps. Yet, this way, the categories are noticed and may attract translators. — Vildricianus 17:11, 9 March 2006 (UTC)[reply]
True enough, but that can be an annoyance for those of us who don't spend a lot of time on translations, and who are more interested in the topical categories that are now buried. Alternatively these could be moved to ttbc sub-pages. Eclecticology 02:42, 10 March 2006 (UTC)[reply]
Most people use the monobook skin, as it offers more functionality than the others...in that skin categories appear at the bottom, not the top. To truncate the category list, simply edit the first line of template:ttbc to use the category name "TTBC" instead of "Translations to be checked", and then move the 157 category pages that are defined (but don't create ones for categories that don't exist).
I'm not sure that doing so would be wise right now though. The translations seem to be garnering a great deal of activity. These category pages are pretty certainly linked from other language Wiktionaries. --Connel MacKenzie T C 08:08, 22 March 2006 (UTC)[reply]

Language wikification revisited

Beer parlouring is addictive.

I have a proposal which, like the above, has arisen out of practical complaints from me and others. Even though it has been stated numerous times that the wikification of languages in the translations section is based on practical experience and "rules of thumb", I think we might as well compile a policy on the matter.

Several hard-working persons here are counteracting one another wikifying and dewikifying languages that are in the grey zone. Therefore, I suggest it be officially mentioned in a policy. — Vildricianus 18:37, 3 March 2006 (UTC)[reply]

This seems to be going well already; I have heard of no complaints about edit wars over this. I mostly dewikify, but I don't get upset when something in the grey area goes the other way. It would be good that the grey area remain large. Eclecticology 19:33, 3 March 2006 (UTC)[reply]
There are no edit wars over it (yet), and it is indeed a relatively trivial and currently non-notable issue. However, I don't see how a clear statement could hurt the efficiency of cleaning up translation tables. It would rather be a good defence against possible future conflicts. I'm sure that at some point in the future a list of languages where wikification is recommended will be necessary. Why wait? I'm not sure whether it helps having "Hebrew" wikified in one entry and dewikified in another. Note: I also see how some people could be offended because their language is regarded as "obscure". They can defend their stance then. — Vildricianus 20:42, 3 March 2006 (UTC)[reply]
  • "Beer parlouring is addictive" <-- I think I waste too much time here too.
  • Currently, I semi-automatically dewikify this list:
    • italian,french,german,spanish,japanese,serbian,latin,swedish,dutch,finnish,russian,norwegian,
    • bosnian,esperanto,chinese,portuguese,irish,polish,slovak,czech,danish,greek,hebrew,turkish,
    • welsh,hungarian,korean,arabic,romanian,indonesian,croatian,vietnamese,bulgarian,hindi,
    • filipino,icelandic,albanian,sanskrit,slovene,thai,yiddish
as these seem to be the most common languages here.
Do any on that list jump out as extraordinary? I have no preference about which should or shouldn't be wikified, I would just like a consistent approach. Perhaps more consistent would be to either wikify them all or none. Please let me know what you decide on. --Connel MacKenzie T C 22:39, 3 March 2006 (UTC)[reply]
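For concreteness, a semi-automatic dewikification pass over a translations section could be sketched as below. The language list is copied from the post above; everything else (the regular expression, the function name dewikify_common, the assumption that language names appear as "*[[Language]]: ..." at the start of a bullet) is hypothetical and is not Connel's actual tool.

```python
import re

# The list of common languages quoted in the post above.
COMMON = set(
    "italian,french,german,spanish,japanese,serbian,latin,swedish,dutch,"
    "finnish,russian,norwegian,bosnian,esperanto,chinese,portuguese,irish,"
    "polish,slovak,czech,danish,greek,hebrew,turkish,welsh,hungarian,korean,"
    "arabic,romanian,indonesian,croatian,vietnamese,bulgarian,hindi,filipino,"
    "icelandic,albanian,sanskrit,slovene,thai,yiddish".split(",")
)

# A wikified language name at the start of a bullet, e.g. "*[[French]]: livre".
LINKED_LANG = re.compile(
    r"^(\*+[ \t]*)\[\[([^\[\]|\n]+)\]\](?=[ \t]*:)",
    re.MULTILINE,
)


def dewikify_common(translations):
    """Remove the link brackets around language names that are in COMMON."""
    def unlink(match):
        name = match.group(2)
        if name.strip().lower() in COMMON:
            return match.group(1) + name
        return match.group(0)  # leave less common languages wikified

    return LINKED_LANG.sub(unlink, translations)


print(dewikify_common("*[[French]]: livre\n*[[Basque]]: liburu\n"))
```

Moving a language such as Irish or Hebrew in or out of the grey zone is then a one-word change to the list, which is the kind of consistency the thread is asking for.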
Yes, Irish does, as I feel it may be wikified (not very clear that Gaelic is meant). Others may be subject to discussion as well (Hindi, Hebrew, Sanskrit). Of course, making an extreme decision of all or nothing at all is better than no decision at all. — Vildricianus 22:56, 3 March 2006 (UTC)[reply]
Be bold. Create a Policy Think Tank. see Category:Policies - Wiktionary Top Level and Wiktionary:Proposal for Policies and Guidelines (now Wiktionary:Policies and Guidelines - Policy).--Richardb 16:35, 4 March 2006 (UTC)[reply]
I'll create a proposed list at Wiktionary:Translations/Wikification which can be merged later on when there's an agreement (if at all). — Vildricianus 17:22, 4 March 2006 (UTC)[reply]

Open proxies block

Discussion moved to Wiktionary:Votes/2006-03/Open proxies block.

Latin is still a "dead language"

I notice that Richardb has tagged the Latin word antlia as "obsolete". This assumes (1) That all Latin words are going to be tagged, and that (2) The word is not being used in the Vatican, where the Latin language is alive and well. My personal opinion is that it is pointless to tag Latin words as obsolete. Other opinions? --EncycloPetey 09:28, 4 March 2006 (UTC)[reply]

I'll take a wild guess that Richardb meant that the concept of pumping water by hand or foot is what is obsolete, not the entire language. It is hard to tell though, with such an ambiguous edit summary comment. --Connel MacKenzie T C 09:37, 4 March 2006 (UTC)[reply]
I do know that as recently as the 18th century, antlia was still used as an interlingual Latin word, since that is when the constellation Antlia was named. Based on what I've learned of the history of that constellation's name, antlia may still be used in modern Latin for a generic pump, whether of water or air. --EncycloPetey 09:58, 4 March 2006 (UTC)[reply]
It is possible for Latin words to be obsolete; the word obsoletus after all only means 'fallen out of use'. However, due to the Latin literary and purist traditions giving rise to the propensity of later Latinists of all ages to resurrect words not in use in earlier periods, it is probably better to label any such words as archaic if they are no longer in common use. An obsolete sense of a word might be, say, one whose meaning was never properly determined, and thus couldn't be used independently in later authors; I suspect some of the obscure names for animals Pliny uses may fall into that category (though if perhaps picked up by Linnaeus or other Latin author and given a new, identifiable/standard meaning, that new meaning wouldn't be obsolete). Incidentally I understand Latin isn't doing too well in the Vatican. Most of them just go for Italian. [11]Muke Tever 15:35, 4 March 2006 (UTC)[reply]
My mistake. I somehow overlooked the Latin heading, having got there via Antlia, the star name. I've now withdrawn the "obsolete" tag. (I'll let others argue over whether a Latin word becomes obsolete when the thing it describes becomes obsolete)--Richardb 16:28, 4 March 2006 (UTC)[reply]
The dictionary label for a word whose referent has gone obsolete is (historical). —Muke Tever 23:40, 4 March 2006 (UTC)[reply]
Latin is considered a dead language while virtually no one speaks it, but its texts are still visible. Whether a Latin word is obsolete depends on whether it is no longer used.--Jusjih 03:18, 5 March 2006 (UTC)[reply]

"Template" for Edit Page

I've just modified MediaWiki:Minoredit so that at the footer of an editing page, where it has a check box with the text "this is a minor edit", the "minor edit" is now a link to the page Help:Minor edit page. I've also put some hover text in.
Would be nice to open the Help:Minor edit page in a new window, as per the Editing help link. But I'm so rusty on HTML I can't remember how to do it. (And I can't find where the "Editing help" link has the new window function in it.) Anyone care to add the "new window" function to the "minor edit" link? --Richardb 03:41, 5 March 2006 (UTC)[reply]

target="_blank" in HTML Davilla 19:56, 5 March 2006 (UTC)[reply]

Common words in other langs

I've been adding Spanish words to the Appendix:Swadesh lists for interlingua and have found it very helpful. Specifically, I'm finding basic words in Spanish that did not have entries in Wiktionary, and have been able to add them as I fill in the tables. If there are people out there reasonably fluent in French, Italian, Portuguese, Interlingua, German, or Dutch, these languages could use help as well.

The only serious problem I've run into is that it isn't always clear what sense of the English word is intended, so I have difficulty choosing the right word in Spanish. For example, the word skin could mean "the outer covering of the human body" or "the hide of an animal", and Spanish uses different words for these senses. Or again, the English verbs wipe and rub translate differently in Spanish depending upon the purpose of the activity. Is one rubbing affectionately, rubbing in order to scratch, rubbing to remove something? In short, there are some words that I just can't fill in without knowing precisely what the intended meaning is for the English word. Is there some place where this information may be found? --EncycloPetey 11:35, 5 March 2006 (UTC)[reply]

What intended meaning? The Swadesh lists just list a word as a translation if at least one of the concepts that each word represents is shared between them. They are only useful for finding common words that are missing, nothing else.
When you have decided to add a missing word, you can forget about the Swadesh lists. The only thing that matters is what concepts the word represents and which English words can be used to represent these concepts. The main problem is that often, or rather almost always, the translation into English is only good for a subset of the possible senses of the English words. In such cases it is recommended to add a definition to explain more exactly which sense the English translations are good for.
I have been using the Swadesh lists myself to add Swedish words. One good example that you can look at is Swedish rygg, which can be translated to English as back, ridge, rear or spine depending on what you mean. However, they only attach to a subset of the senses of each English word, so definitions and/or contexts have been added to the different senses.
I personally prefer quality over quantity, so my preferred order of operations is
  1. Look up what the Swedish word means in all senses as explained in Swedish.
  2. Consult a Swedish-English dictionary and find the possible English words that are to be considered.
  3. Think myself about the concepts that the word represents and try to find additional English words.
  4. Look up what each English word means in all senses as explained in English.
  5. Remove the English words that can only be used as translations in very special circumstances.
  6. Divide the English words into suitable groups representing each Swedish sense of the word.
  7. Add, if necessary, a definition and/or a context for each sense.
  8. Add examples if, even given the definition and context, there are subtleties concerning the translation.
Yes, doing it like this takes a lot of time and I would be lying if I said that I always did the above, but if you don't want to do that just add a list of the different words that the Spanish word can translate to and let somebody else sort the rest out. That is what Wiki is all about, isn't it? --Patrik Stridvall 15:41, 5 March 2006 (UTC)[reply]
I'm not sure you've understood my concern. The procedure you've outlined is similar to the one I've applied myself. However, short of adding an entry to the Swadesh list, there won't be an easy way to track which items have been added. The list also serves the purpose of providing a multi-lingual translation table for a suite of basic, old words in the languages listed. This allows a user to compare quickly how a concept is expressed in a set of related languages, and to see what differences exist. This aspect of the table's use is not possible unless there is an agreed-upon sense in which each term is used. My question is whether the original list had additional information about the preferred meaning, and whether anyone might know where to find that information. --EncycloPetey 06:25, 6 March 2006 (UTC)[reply]
I think I did, but perhaps I should have been clearer. Unfortunately, AFAIK there is no such agreed-upon sense in Swadesh lists. Except for superficial purposes they are of limited use. They are used for comparing similar concepts in languages, nothing else. However, if you were to, say, crash with an airplane in a remote jungle somewhere and happened to have a Swadesh list for the local language, it would be enormously helpful. It is usually possible to understand what somebody is saying even if a word for a similar concept is used. At least after some sort of dialog... --Patrik Stridvall 10:21, 6 March 2006 (UTC)[reply]
Er, the swadesh list does have disambiguative sense notes, quite a few (but not a complete set). For example "skin" is particularly that of a human. See Wiktionary:Swadesh template, Wiktionary talk:Swadesh template. As for "rub", I've never heard of it meaning to scratch (and given that 'scratch' is in the list itself, I wouldn't expect it), and the sense of "rub" meaning to remove something is only in "rub out" or other phrasal verbs, not "rub" itself. I would expect "rub" would be the basic sense of "rub" meaning to apply friction ... —Muke Tever 23:27, 6 March 2006 (UTC)[reply]

Activedoc - to be or not to be?

Top of the Requested Articles list at the moment is Activedoc. Now, after a little bit of Google and 'kipedia research, I've discovered that 'Activedoc' is little more than the name of some small internet organisation. Does this give me clearance to remove it from the requested articles list, as creating articles about specific organisations - as I understand - is against the Wiktionary article policy? Black-Velvet 14:48, 5 March 2006 (UTC)[reply]

Be bold. Delete it. It's only a word in a list, not an entry. Pretty easy for someone to put it back if they are really serious, or, better still, they can create the entry.--Richardb 09:34, 6 March 2006 (UTC)[reply]


CDVF disconnections

Anyone know why WP:CDVF keeps disconnecting every few minutes today? --Connel MacKenzie T C 16:59, 7 March 2006 (UTC)[reply]

Vulgarities that aren't

Someone has added a number of entries to Category:Vulgarities that probably don't belong there — for example, see the 3rd meaning of amuse. They were apparently confused by the title of the source they were using: something called Dictionary of Vulgar Tongue (1811). The term vulgar tongue in this context probably just means "colloquial English". Would someone like to go through and recategorize those terms that aren't actually vulgar in the modern sense? - dcljr 20:40, 7 March 2006 (UTC)[reply]

It's my opinion that we should not be using "vulgar" as a label. In older dictionaries, it meant "colloquial" or even "used by the working classes"; in more recent dictionaries, it means "not in polite usage" and indicates so-called "swear words". We should be using "informal" (or "colloquial") for the former kind of words. I prefer "taboo slang" for the latter kind, which is used by some modern dictionaries. — Paul G 14:33, 8 March 2006 (UTC)[reply]
I dunno. "Vulgar" has a much stronger connotation than "taboo." The entries from the Dictionary Of The Vulgar Tongue should certainly be corrected, regardless of what tag is used for vulgar/taboo terms. --Connel MacKenzie T C 19:17, 8 March 2006 (UTC)[reply]
Whatever else happens to the Dictionary of the Vulgar Tongue entries, they should be marked "archaic" or "obsolete" or whatever we are tagging such terms. (Or does anyone actually speak of an "usher of the back door" still?) I'd also suggest getting them out of Category:Idioms. Most of them are closer to euphemisms, if anything, and the idiom category is used by learners of English who presently have no way to distinguish between current turns of phrase and this curious old collection. —Dvortygirl 08:14, 10 March 2006 (UTC)[reply]
I've put an RfV notice on usher of the back door. It was put in by an anon who could have just made it up. Eclecticology 08:51, 12 March 2006 (UTC)[reply]

Multiple etymologies

I was doing some Swedish translations (the new templates and categories really help) and ran across a few words with multiple etymologies. The first one was abort. Since both were from the same word in Latin I just thought: Oh well, be bold and fix it. However, when the next one, abstract, also had multiple etymologies from the same Latin word, I looked in the editing history and found out that Vildricianus was "guilty" on both counts. Not that it matters who did it, I'm sure he had a reason. I'm not sure that I agree with his reasons though.

So, when should we have multiple etymologies? Having them just because a word possibly took a slightly different path to reach English from Latin is not really very good, especially since the formatting for multiple etymologies requires changing all headers. --Patrik Stridvall 19:46, 8 March 2006 (UTC)[reply]

The words you mention both have different senses which come from distinct, though related, Latin words. For example the noun sense of abort is from Latin abortus, whereas the verb is from the past participle stem of aboriri. It is important – as a point of fact if nothing else – to keep these etymologies separate. The way we lay out multiple etymologies isn't ideal, but the principle shouldn't disappear just because roots are similar or related. Widsith 19:58, 8 March 2006 (UTC)[reply]
Exactly. I should perhaps have phrased it better and say "from Latin noun abortus" and the other "from abort-, stem of the verb aboriri". (Perhaps you should fix abort again, Patrik). — Vildricianus 20:15, 8 March 2006 (UTC)[reply]
Sure, as soon as we decide what it should look like. --20:37, 8 March 2006 (UTC)
Seriously, if that is what we mean by multiple etymologies we can't have the current format, can we? In that case almost every word that has multiple parts of speech could potentially have multiple etymologies. Wouldn't it be better to mention it in a subsection under each part of speech instead? --Patrik Stridvall 20:37, 8 March 2006 (UTC)[reply]
Not at all, very often the multiple parts of speech are developments which happened after the word was adopted into English (so all from the same source). But yes, potentially they could all have separate etymologies. So what? That is the whole point of the etymology section. Widsith 08:47, 9 March 2006 (UTC)[reply]
It matters very much because of the formatting we use. Any new piece of information could change the formatting, and then back again when somebody disagrees. This is not good. Any such process should, as far as possible, be as localized as possible.
Furthermore I don't think many people would find it very useful. I personally use etymologies to help me understand a word. Sure, some words have totally different etymologies and probably meanings as well. They just happen to be spelled the same; that is one thing. It is quite another thing to separate them when the POS:s relate to the same concept, especially when the same is true for the original Latin words.
So, why not instead have some sort of general etymology that is shared between the POS:s and then have a separate specific etymology for each POS? --Patrik Stridvall 09:49, 9 March 2006 (UTC)[reply]
This is actually done on some pages. But all within the level 3 etymology section. The format I've encountered several times is 'bullet POS:'. Since etymologies are always a bit blurry, one often has to find some sort of compromise. Ncik 00:15, 10 March 2006 (UTC)[reply]

Question about Tho and Though

Hello, I'm wondering if anyone can shed some light on the application & usage of the words Tho and Though. I am reading a text from 1800 & trying to correct changes which have been made to it in 1905. Throughout the text the British spelling has been replaced with American spelling (as it was for publication in an American book). In the text the word tho is used & I don't know whether this was the original British usage of 1800 or the American usage of 1905. Can anyone help me out. Thanks. AllanHainey 193.201.121.130 17:53, 9 March 2006 (UTC)[reply]

It's difficult to make any valid comment without more information, notably what work you are talking about. The best way to know what the 1800 edition says is to look at the original text instead of trying to guess what it said. Eclecticology 02:32, 10 March 2006 (UTC)[reply]
The text is a Charles James Fox speech given in 1800 on negotiations with France. It was edited by William Jennings Bryan in 1905 for a collection of speeches he published; however, he converted all the spellings to U.S.A. spellings. I would check an original version of the speech if one were available, but the Bryan version is the only one I can find. I can accurately assume though that Fox used British spellings & that Bryan didn't (as other speeches in his volume have spellings like labor rather than labour, etc), so simply converting Americanisms back to Britishisms should be relatively simple & accurate, if I can find out which is which. Tho is the only one giving me trouble, as I didn't think it was used as anything except a modern abbreviation. AllanHainey 193.201.121.130 08:24, 10 March 2006 (UTC)[reply]
The only major difference is the formality: tho is less formal than though. Webster's lists it in the 1828 edition, and the etymology is simply a contraction of though, from Old English. I think it could have gone either way, but since American English uses both though and tho I don't see why the editor would exchange one for the other. - TheDaveRoss
This speech was given in the House of Commons, and it is highly unlikely that official circles would have used American abbreviations. The one work of Fox that I quickly found on line, History of James the Second, uses "though" throughout. If you are near a good library with a set of the UK Parliamentary Debates, it should be complete there and not abridged (see footnote 1 at the place you cite) in the way that Bryan's version is. To what extent would Bryan be inclined to taint the text with his own POV? Eclecticology 09:25, 10 March 2006 (UTC)[reply]
Thank you, unfortunately I'm not near a library with a Hansard from 1800 (& they tend to be reference only so a lot of work to copy down in the Library) & so I've just got to go with what I can get from the internet right now. I doubt if Bryan would intentionally bias the text (though I don't know as I haven't seen the full text) it is likely however that he removed the 'dull', convoluted or repetitive parts of the speech & only published the 'exciting' bits or sparkling oratory. AllanHainey 193.201.121.130 12:40, 10 March 2006 (UTC)[reply]
For future reference, Hansard is available online here, but from 1988 onwards only, which doesn't help in this case. — Paul G 13:59, 10 March 2006 (UTC)[reply]


Shorthand notations

User:Wikigregg added w:Gregg Shorthand notations to a couple of pages (see abide). I don't have a problem with that, but wonder what others think. Ncik 14:12, 11 March 2006 (UTC)[reply]

Categorizing

When you create an article should it be categorized for a broad category and a specific sub-category or just the sub-category? For instance if I add an article about cheddar cheese to Category:Food then add it to Category:Cheeses would I keep the first one or remove it? Yorktown1776 18:33, 12 March 2006 (UTC)[reply]

Your choice. If you leave both categories in, the word will appear in both category listings. If you put it in category:Cheeses only, it will not appear in category:Food, but anyone who drills down on the sub-category of Cheeses will find cheddar. That is, if you first define the category page for cheeses, with that page tagged with the Food category. This latter way would be my preference.--Richardb 08:19, 13 March 2006 (UTC)[reply]
On the other hand, why do we need entries such as "Cheddar cheese" and "Limburger cheese". My view is that we only need the entries "Limburger" and "Cheddar"--Richardb 08:19, 13 March 2006 (UTC)[reply]
My own preference is to not put it in both. I like to build scalability in categories. Thus at an earlier stage we could start with Cheddar being in the Food category. As the category grows bigger we can split off subsets of the bigger category to avoid having the food category get too big. Eclecticology 11:01, 13 March 2006 (UTC)[reply]

Excuse me messing about with rfe and rfp - requests for etymology and requests for pronunciation.

Unfortunately, before I can finish the job I've started in this area, I've got to get some sleep and get some work done. Be back in a few days. So, don't try too hard to make sense of what I'm doing there just yet, and certainly don't clean up. Thanks--Richardb 13:44, 13 March 2006 (UTC)[reply]

My latest edits

The wiki software seems to be chewing up my most recent edits. For example, Pavia shows up as blank to me, but my changes are there in the history. Does anyone know what is going on? — Paul G 16:35, 13 March 2006 (UTC)[reply]

Hm, it seems to have sorted itself out now. — Paul G 16:37, 13 March 2006 (UTC)[reply]

It still comes up "blank" when I link to Pavia, as does orthography. Try editing each one with a minor change, and see if that fixes them. --EncycloPetey 19:56, 13 March 2006 (UTC)[reply]
Minor changes made to both pages, which both show up fine for me. Do they show up for you now? — Paul G 10:10, 14 March 2006 (UTC)[reply]
Yes. It all seems to be sorted out now. I've had the same thing happen from time to time on Wikipedia, and am never sure what the underlying cause might be, but a minor edit a bit later usually fixes the symptoms. --EncycloPetey 01:22, 18 March 2006 (UTC)[reply]

Categories

I've been scratching my head about what categories to put Category:Obsolete and Category:Archaic into. All categories should be categorised really, shouldn't they? Apart from maybe Category:*Topics and Category:Wiktionary. --Dangherous 22:42, 13 March 2006 (UTC)[reply]

We traditionally haven't worried about categorizing uncategorized categories. Are you volunteering? I think Ec drew up an initial tree a while back, but I can't seem to find it at the moment. --Connel MacKenzie T C 04:51, 14 March 2006 (UTC)[reply]
I agree that having everything categorized has not been a big issue the way it has been at Wikipedia. Many words, particularly ones relating to abstract concepts, are very difficult to put in categories. Any sensible treatment of those words will probably need to wait until we have a properly functioning WikiSaurus. Connel is only partly correct in saying that I drew up an initial tree because I only partly drew it up. I need to get back to it, and at least document my ideas on this. In it Category:*Topics and Category:*Wiktionary are the two top level categories which deal with content and operations respectively. The use of the asterisk in the category name is meant to ensure that they will always appear first in the general list of categories. Eclecticology 19:23, 14 March 2006 (UTC)[reply]
I've had a go at Special:Uncategorizedcategories, and saw some weird categories, like Category:frowned upon! I've categorized a couple of dozen in there, but have left the "language-based" categories, as I'm not a linguist, but an artist and a making-things-prettier type user. --Dangherous 19:09, 16 March 2006 (UTC)[reply]

Misspellings

I was thinking that there should be some sort of look-up for misspellings, which is where the Go button should direct, instead of the search functionality, in the case of a no-match. One concept for the page would look something like this. It could be (1) generated dynamically and automatically, or (2) created by users in a separate namespace. The latter may seem intractable unless you consider that the contents would be the same across all language Wiktionaries. If that seems like a useful endeavor then I propose we try to implement it here, and once proven apply some technology to make the same copy work across all Wiktionaries. Davilla 13:04, 17 February 2006 (UTC)[reply]

  1. Misspellings: Our list is at Wiktionary:List of common misspellings, which I thought is supposed to appear in default searches. I don't think there is a possibility of getting a separate "Misspellings:" namespace, considering how intractable it is to get "Appendix:" et al. implemented.
    I'm not interested in listing just common misspellings. I'm interested, ultimately, in addressing potentially all of them. Davilla
  2. Language separation: Several of us have long wanted a way to split the search results up by language. I do not know of any technical way it can be done at this time. Do you? Can you write a MediaWiki "extension" to do this? Right now I can't guess how that software would accomplish it.
    In my proposal, separation recognized by the servers would be necessary only in an automated spelling search, not a hand-constructed one, and could be reduced to checking for the existence of language headers within each page.
    Your proposal is quite different, as it would apply to global searches, even words within definitions and translations and such. Unfortunately I haven't got the first clue how the MediaWiki software operates, and considering that I haven't even done any sort of scripting here yet, I'm not about to touch it. However, there are practical steps that could be taken that might make this a lot easier to implement, in particular splitting up languages onto different pages. Language-specific searches could be implemented the same way as name space searches.
    This doesn't have to change the look and feel of Wiktionary, though, if the separate definitions could be combined within a higher-level page. For instance, en:villa and es:villa are referenced within villa, much like templates. I'm guessing the latter part is easier to do than what you're suggesting, but it would take someone more knowledgeable than me to confirm it, especially how much work it would take to make this completely transparent, if that's desired. At the very least a few bots would be required to maintain this.
    Anyways, that's kind of a tangent. Davilla
  3. I agree that "Did you mean:" is a good alternate wording for "Common misspelling of...". But internal links can also arrive at that page. I know that I have often made the mistake of assuming that a blue link is a valid English word. Thankfully, this is a mistake I make less and less often.
    The color of links is precisely the reason I've suggested using a separate namespace, not to mention that there are conflicts between words of one language and misspellings in another. mispelling [sic.] would remain red, and very little would actually link to Spell:mispelling. The user would usually only arrive there by pressing Go.
    I don't mind "common misspelling of" in cases where it actually is a common misspelling, but I was hoping for something more broad, along the lines of a text editor's list of suggestions. Dictionary.com is an example of an online dictionary that implements this (somewhat poorly considering all the useless alternatives it gives). Davilla 06:46, 18 February 2006 (UTC)[reply]
--Connel MacKenzie T C 18:47, 17 February 2006 (UTC)[reply]

By the way, I believe the absence of this feature is the primary reason why a dictionary based on submissions could not substitute for another, more "traditional" online dictionary. Someone trying to use Wiktionary would worry about the case of no match, not knowing whether the word was misspelt or simply not yet entered. In the future, as the dictionary gains even more substance, this will be less of a concern for English words, but it will be one for foreign words for quite some time. By sharing misspellings with other Wiktionaries, that problem is shunted long before the foreign entries in English Wiktionary reach the same level of completion. Davilla 14:13, 20 February 2006 (UTC)[reply]

Can you give an example, of say, how you would deal with something like copywritten in this scheme? --Connel MacKenzie T C 07:29, 27 February 2006 (UTC)[reply]
That there's already a page for copywritten reflects the fact that, although it is considered incorrect, the nonstandard but nonetheless very real word is sometimes used as such. I didn't have this case in mind. An actual misspelling is different in my mind. On the other hand, if you consider "copywritten" to be a misspelling, in a sense, of the concept copyrighted, as derived from the misspelling or mis-thought "copywrite", then you would prefer "copywritten" to be red-linked and fail "Go", so it would be moved to the spell: namespace. Would it suffice to have "Did you mean:
* copyrighted" on spell:copywritten, or would this entire concept have to be expanded to include labels such as (non-standard) and (common misspelling)? Davilla 16:11, 6 March 2006 (UTC)[reply]
Sorry, that was a poor example (regionally that is considered incorrect, but that introduces an unnecessary POV into the conversation.) OTOH, it does show the need for "spell:" type entries, where an exact match does exist in Wiktionary.
I'm still having a little difficulty understanding how your approach addresses the problem you identified. That is, how does one know whether a link is red because it is notaword vs. not entered yet? --Connel MacKenzie T C 16:27, 6 March 2006 (UTC)[reply]
Woops, misread. The user wouldn't know the difference between not-a-word (in any shape or form) and not-entered. What they would know is that the information is not in Wiktionary, so they can stop looking here. On the other hand, "paradies" is a word (almost) and directing no-go to the spell: page would immediately show that it's a misspelling. With lists of misspellings, the user gets an answer immediately, including "I don't know, stop asking", but as it is we're left guessing if we should try some variant or go looking somewhere else. More often the first is too troublesome, so we give up early and go for the latter. Davilla 22:01, 10 March 2006 (UTC)[reply]
They don't. What they know is that if a link is blue, then it really is a word (well... barring vandalism and the like), never a "common misspelling of". More importantly, someone typing in a word that's misspelled wouldn't be enticed to enter it since their mistake is pointed out immediately. Spelling correction is what I use dictionary.com for half the time. Davilla 17:08, 6 March 2006 (UTC)[reply]
Well, now that you point it out, there is still a problem though, isn't there? Someone misspells a linked word in an edit, then fills it out. Should red links necessarily direct to the edit page? After all, the spell pages are supposed to have the edit option built-in. Alternately, could certain (all?) redirect pages be faked as red? e.g. "#redirect spell:" indicates {Pagename} is a misspelling and should be reddened, despite being active as a link. Shoot, now I'm just throwing out ideas. Davilla 17:08, 6 March 2006 (UTC)[reply]
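
As a rough illustration of the "Did you mean:" lookup discussed in this thread, the automatically generated variant could be sketched along the following lines. This is only a sketch under assumptions: the entry list, the function name `did_you_mean` and the similarity cutoff are all hypothetical, and it uses Python's standard `difflib` rather than anything in MediaWiki itself.

```python
import difflib

# Hypothetical stand-in for the list of existing entry titles.
KNOWN_ENTRIES = ["paradise", "copyrighted", "misspelling", "orthography"]


def did_you_mean(query, entries=KNOWN_ENTRIES, limit=3):
    """Return close matches for a title that has no exact entry.

    An exact match means "Go" would succeed, so no suggestions are needed.
    The cutoff of 0.8 is an arbitrary assumption, not a tuned value.
    """
    if query in entries:
        return []
    return difflib.get_close_matches(query, entries, n=limit, cutoff=0.8)


if __name__ == "__main__":
    for word in ("paradies", "mispelling", "paradise"):
        print(word, "->", did_you_mean(word))
```

An empty result for a title with no exact entry corresponds to the "I don't know, stop asking" answer mentioned above. A hand-maintained spell: page would replace the similarity search with a stored list.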

Bot Request: User: Tawkerbot

Connel has requested bot assistance for Wiktionary: Administrators/Dishwashing#Special:Whatlinkshere/Template:trans

I have tawkerbot, a pywikipedia bot running on a machine in my data center that doesn't get a lot of use.

The bot's username is User:Tawkerbot - the bot also runs on Wikipedia under the same username.

Essentially it's going to subst on the templates matching "regex s/{{sv}}/Swedish/g" (provided by Connel) from the template. Tawker 23:15, 18 February 2006 (UTC)[reply]
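
The substitution described above can be sketched in plain Python as follows. This is a minimal illustration only, not the bot's actual code: the function name and the mapping table are hypothetical, only the `{{sv}}` to "Swedish" pair comes from the request, and no pywikipedia calls are shown.

```python
import re

# Hypothetical mapping of language-code templates to language names.
# Only the "sv" -> "Swedish" pair is taken from the request above.
LANGUAGE_TEMPLATES = {"sv": "Swedish"}


def subst_language_templates(wikitext, mapping=LANGUAGE_TEMPLATES):
    """Replace {{code}} with the language name for each code in mapping."""
    if not mapping:
        return wikitext
    # Build one pattern such as \{\{(sv)\}\}; templates not in the mapping
    # are left untouched.
    codes = "|".join(re.escape(code) for code in mapping)
    pattern = re.compile(r"\{\{(" + codes + r")\}\}")
    return pattern.sub(lambda match: mapping[match.group(1)], wikitext)


if __name__ == "__main__":
    print(subst_language_templates("* {{sv}}: [[hus]]"))
```

With a single entry in the mapping this behaves like the quoted `s/{{sv}}/Swedish/g`. Adding further codes to the table would extend it to other language templates.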

Bot Name: User: Tawkerbot
Owner: User: Tawker
Purpose: subst: language templates