This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

March 2008

Translation bars

What do you think: could the "show" link in the translation bars be moved to right after the bar title? It's now a bit difficult to find for someone who browses here but doesn't edit, especially if one has never seen that kind of bar before. It would also be great if the bar floated better with pictures: now, if there's a picture on the right and the translation bar on the left, the bar starts below the picture and leaves a blank space above. Best regards Rhanyeia ♥♫ 15:54, 8 February 2008 (UTC)

I think maybe it should actually go right before the bar title (instead of or as well as at the right end), but right after the bar title would be a good second choice. —Ruakh_TALK 00:35, 9 February 2008 (UTC)[reply]

That sounds good. :) Best regards Rhanyeia ♥♫ 10:53, 9 February 2008 (UTC)

I don't like this idea. Having the [show] before the text would (I think) make it messier, as they are currently styled like the [edit] links, which always float to the right. I can't see how there can be that much difference in finding it (and it is the same as the 'pedia bars). Conrad.Irwin 21:10, 9 February 2008 (UTC)

I'm trying to imagine how the pages look to someone who doesn't use computers much, and who doesn't know how their layout works. That person doesn't know there's something hidden inside the bar, and his eyes may just pass over it quickly. Translations are a very valuable part of Wiktionary pages and I think they should be easy to find. I don't remember Wikipedia using those bars inside articles; does it? That "show" resembles the "edit" tags is not very good, because they serve such distinct purposes: a casual reader might avoid clicking anywhere around the "edit" tags for fear that it could change the page. I think the "show" tags could be far more distinct from the "edit" tags. If not before the bar titles, could they be after or under them? Best regards Rhanyeia ♥♫ 09:12, 10 February 2008 (UTC)

Assuming that we implemented a change in the "trans" and "rel" bars, how would we know that it was a good change after we had done it? I know that we should and do eat our own dog food, but we aren't representative of the larger population that we serve. DCDuring TALK 11:21, 10 February 2008 (UTC)[reply]

This problem bothers me too. What about a companion icon displayed along with the text, for example a downwards-pointing arrow alongside [Show] and an upwards-pointing arrow alongside [Hide] ? --EncycloPetey 22:11, 10 February 2008 (UTC)[reply]

I like the idea of the arrows - but does it explain the concept enough - perhaps a "+" sign would be better [show ▼] [show +] - what does Windows use for this kind of thing? Conrad.Irwin 22:21, 10 February 2008 (UTC)[reply]

But a plus (+) could still be interpreted to mean that it is for editing (add additional material). The reason for suggesting arrows is that people with poor English (or who aren't watching) might assume the bracketed text allows editing. --EncycloPetey 23:19, 10 February 2008 (UTC)[reply]

Indeed. When I created {{trans-top}} as a demonstration, I just borrowed the "nav" CSS. I pointed out very loudly and repeatedly that it should be designed before being put into use, but was utterly ignored by those eager to plow ahead.

The show/hide link should immediately follow the gloss, and the table not display full width unless "opened". Someone ought to create separate CSS for these (and rel-top as well) and use it. Likewise, where a conjugation/inflection table is built to be collapsed, it should not be a full width bar. Robert Ullmann 14:06, 12 February 2008 (UTC)[reply]

EncycloPetey's idea about an arrow was excellent. Were you thinking about a graphical arrow or a text arrow? I could try to create something, but does anyone know how to do these things technically so that we can try out how it looks? Best regards Rhanyeia ♥♫ 19:11, 12 February 2008 (UTC)[reply]

If there is an arrow character (like ▼) that will display properly in a majority of current browsers and platforms, then that would be ideal. A single text character would load faster and have fewer format pitfalls. I know that the arrow character I used in the first sentence of this paragraph will display correctly in IE6, IE7 and Safari for MacOS X. --EncycloPetey 22:38, 12 February 2008 (UTC)[reply]

See http://www.alanwood.net/unicode/arrows.html and take your pick, then finding out whether it will display on IE6 will be harder. Conrad.Irwin 22:46, 12 February 2008 (UTC)[reply]

It isn't hard for me to check IE6, since I'm forced to use it at work. I checked the page and nearly all fail to display in IE6; only the first six items display, as well as item 8616. Personally, I prefer 9650 and 9660, the black triangles, which are included in WGL4. --EncycloPetey 02:28, 13 February 2008 (UTC)[reply]
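For reference, 9650 and 9660 are decimal Unicode code points. A minimal Python sketch, assuming a hypothetical `toggle_label` helper and label format (neither is existing site code), shows which characters they name and how a label such as [show ▼] might be composed:

```python
import unicodedata

# Decimal code points mentioned in the discussion.
HIDE_ARROW = chr(9650)  # U+25B2, upward triangle for [hide]
SHOW_ARROW = chr(9660)  # U+25BC, downward triangle for [show]


def toggle_label(collapsed: bool) -> str:
    """Return a bracketed label with a companion arrow (hypothetical format)."""
    if collapsed:
        return f"[show {SHOW_ARROW}]"
    return f"[hide {HIDE_ARROW}]"


if __name__ == "__main__":
    # Print each code point in decimal, hex, and by its Unicode name.
    for code in (9650, 9660):
        ch = chr(code)
        print(f"{code} = U+{code:04X} {ch} {unicodedata.name(ch)}")
    print(toggle_label(True), toggle_label(False))
```

Looking the names up rather than hard-coding them confirms which glyphs the numbers refer to; whether a given browser font can render them is a separate question that still needs testing.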

A desirable design for a control that we wish to encourage people to use would be for it to be close to where their eye is and where their pointer is, and fairly large. A short bar is good, but so is a large target. See Nielsen and Loranger (2006), Prioritizing Web Usability. DCDuring TALK 02:53, 13 February 2008 (UTC)

The triangles 9650 and 9660 sound good. I think "show" could also be a little bigger. Under the title text might also look good; then the bar would become thicker and one would pay more attention to it. Can the "show" be changed from the template itself, or is it changed from the same place as the "edit" tags? Best regards Rhanyeia ♥♫ 09:41, 17 February 2008 (UTC)

Does anyone know which editor can change the position of the "edit" tags? He would probably know what to do with the "show" tags too, and if I remember correctly the "edit" tags have sometimes been in different places. Best regards Rhanyeia ♥♫ 08:36, 23 February 2008 (UTC)

Any admin who understands the necessary CSS or markup changes should be able to make them. For reference, one place that has the edit tags in a different place is German Wikipedia from what I recall (on the left instead of the right). Mike Dillon 16:46, 23 February 2008 (UTC)[reply]

Did anyone have any objections to the modification of the "show" location on the translation bars? They are so clearly good from the point of view of basic user-oriented computer interface design that it seems a shame to revert them unless there is some extremely good reason, which has not been provided. DCDuring TALK 17:07, 5 March 2008 (UTC)[reply]

Hello. I strongly object to the insinuation above that making mouse-click targets move all over the place is an "expert user-interface opinion." That, on the face of it, is a bald lie. Do you have an "[X]" in the top corner of this window? How would you feel if that were placed randomly? Placing widgets consistently is a much larger UI issue than conjoining. I know I commented somewhere when I rolled back the experiment, but am not finding any traces of that comment now. If my initial post on it did not save, then I apologize (and am curious as to why). --Connel MacKenzie 17:07, 10 March 2008 (UTC)

There's another possibility too. They could be placed under the title text, either on the left or in the middle of the bar. Best regards Rhanyeia ♥♫ 17:13, 10 March 2008 (UTC)[reply]

So Connel MacKenzie has pointed out one downside of placing the "show" tag right after the title text: the tag wouldn't be in the same place in all bars. Could we explore the possibility of putting "show" on the left side? How would it look on the left and under the title text? Any opinions about this, please? :) Best regards Rhanyeia ♥♫ 16:50, 11 March 2008 (UTC)

For those who have not been watching the Grease pit: there was also a discussion about this there a little while ago. Personally I think "show" was quite good after the text, but it might be even better on the left side one way or another. At the beginning of this conversation Ruakh suggested that it could go before the title text, and Conrad.Irwin thought that might be messy. Conrad.Irwin, after these discussions, do you still think so if there were "show", then a little space, and then the title text? There's also one more way to do it. On the vote page the title texts are centered. If the translation bar title text were centered and "show" were on the left side, it would look almost like the vote page does now, except that "show" would be on the other side. How do these things sound? Best regards Rhanyeia ♥♫ 17:59, 13 March 2008 (UTC)

Placing "Show" immediately to the right of the text is a superior solution for users who are beginning to use the system, I think. As I noticed and continue to notice in my own experience, it doesn't work very well for those who use the system heavily, ie, us (regular users). Given a choice (in WT:PREFS or "my preferences"), we could set our preferences to support our needs. Anons have no persisting choices and inexperienced registered users mostly don't know they have choices. To the extent that we can do so without overtaxing our tech experts, the servers, and user patience (download times and other latency delays), it would be nice to have accommodation for each broad group of users (anons/new users, registered users (mostly newish), admins and other regulars, experts). But the needs of anons and other new users would and should determine our defaults. In the absence of any specific data about their behavior on Wiktionary or Wikis in general, we are forced to resort to general principles of naive-user usability. Or we could just make all user interface choices for our own convenience and amusement until someone pulls the plug on all this. DCDuring TALK 19:16, 13 March 2008 (UTC)[reply]

It would be nice to have a 'show all' option at the top to expand all boxes at once. Pistachio 15:33, 14 March 2008 (UTC)[reply]

Both of these ideas sound good (that one could set these things from preferences, and that there would be a "show all" tag too), but I don't know if there would be a volunteer to code them, at least at this moment. In the meantime, DCDuring, I agree with you that right after the text could be good, but what if we also tried some place on the left of the bar, to be able to compare them? Best regards Rhanyeia ♥♫ 18:04, 16 March 2008 (UTC)

I didn't mean to suggest not to do that. In some ways it is a perfect solution, providing both predictability for experienced users and obviousness for inexperienced users. DCDuring TALK 18:18, 16 March 2008 (UTC)[reply]

There is already a WT:PREF for showing all tables, and it is something that shouldn't take too long to code - though I would quite like to rewrite that section of javascript completely - I don't have time for the moment though. It is possible to override the position of the [show] buttons using personal monobook.css by adding the following lines - but whether to set it as default is a different matter. (This could easily be made into a WT:PREF if people want)

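/* Personal monobook.css: show the [show] link inline, right after the bar title */
.NavToggle {
  display: inline !important;  /* flow with the title text */
  position: static !important; /* keep the link in normal flow */
  float: none !important;      /* do not float the link */
}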

Conrad.Irwin 11:46, 20 March 2008 (UTC)[reply]

It would be great if new features for these things were added to the preferences in the future. Maybe we could try to think about where the default place of "show" should be? Right after the title text has that one downside but might still be possible; on the left side before the title text has not yet been tried. Are there any objections to trying it to see how it looks? Best regards Rhanyeia ♥♫ 16:13, 27 March 2008 (UTC)

I guess the next thing to do would be to try that. It takes an administrator to edit that file. Conrad.Irwin, you'd know how to do it; do you think this is something you could try there, please? Best regards Rhanyeia ♥♫ 14:42, 3 April 2008 (UTC)

I've fixed the current NavFrames so that this is now possible to choose with only CSS - if you like it then I'm happy to give it a site wide go, but I'm not sure it works that well.

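/* Personal monobook.css: float the [show] link to the left of the bar title */
.NavToggle {
  float: left !important;      /* place the link at the left edge of the bar */
  position: static !important; /* keep the link in normal flow */
  right: inherit;
  margin-top: 0.1em; /* To counter the 90% font size used */
  margin-right: 5px;           /* gap between the link and the title text */
}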

Just add the above to Special:Mypage/monobook.css. Conrad.Irwin 11:34, 7 April 2008 (UTC)[reply]

I tested it and I think that's quite good and the tag would be much easier to find. For me the template rel-top looks better with it than the template trans-top. Could they both be like rel-top? Best regards Rhanyeia ♥♫ 16:00, 9 April 2008 (UTC)[reply]

Thank you Conrad Irwin for fixing the templates to be similar and explaining why it's better that way. Best regards Rhanyeia ♥♫ 15:27, 14 April 2008 (UTC)

If this gets changed to the left, how about the first bar of any entry having a "show all" tag on the right side? Would that be difficult to make? Best regards Rhanyeia ♥♫ 16:31, 9 April 2008 (UTC)

What Rhanyeia said. And a big thanks for this, Conrad. I hope that there are more than the two of us using it. It would seem likely to nicely facilitate display for ordinary users without making it too easy for them to accidentally click on edit. Fixed position is good. DCDuring TALK 18:04, 9 April 2008 (UTC)[reply]

Yes, thank you Conrad.Irwin. I plan to begin a vote so that the default place of the "show" could be changed. Before that, since these templates are used in the mainspace for important things, maybe "show" wouldn't need to be only 90% font size? And thank you Robert Ullmann for making the bars float better. Best regards Rhanyeia ♥♫ 09:24, 13 April 2008 (UTC)[reply]

Just because voting sounds like too much bureaucracy, I've gone ahead and implemented it. If people really object to it, please undo the edit - but comment here to let us know why. Conrad.Irwin 15:59, 14 April 2008 (UTC)[reply]

Thank you for testing, although it had to be changed back. I'll continue on Wiktionary:Beer parlour#"Show" tags. Best regards Rhanyeia ♥♫ 14:23, 15 April 2008 (UTC)

Inconsistent quotation examples

There's an inconsistency between what follows the year in these two places. In the first it's a colon and in the second a comma. Looks like the comma is preferred? - dougher 05:17, 1 March 2008 (UTC)

I always use the comma, since that's what's shown repeatedly on the detailed WT:QUOTE page (your second source). Wiktionary entries are all over the map when it comes to formatting quotations, but I usually change them to conform to WT:QUOTE. I don't really consider this format to be particularly elegant but it is simple (every element of the citation is separated off by a comma) and it is important to be consistent. -- WikiPedant 06:06, 1 March 2008 (UTC)[reply]

Looks like {{quote}} is a necessity; the only problem is that no one can agree how it should work. I, by copying other entries, always assumed we should use '—' after the year... Conrad.Irwin 13:20, 2 March 2008 (UTC)

Good grief, not this discussion again. No one has successfully developed a template that will do what such a template would need to do. Look at the many previous discussions on this issue; it is pointless to start it again here. --EncycloPetey 15:06, 4 March 2008 (UTC)[reply]

AOL

Aren't Wiktionary:AOL and the link to it from the main page unnecessary now, since X-Forwarded-For is provided so that AOL users can be identified by their individual IPs? At least that was my understanding from Wikipedia; see Wikipedia:Wikipedia:AOL. Nil Einne 06:52, 1 March 2008 (UTC)

OK, I've just unblocked. Let's see how this goes, shall we? --Connel MacKenzie 20:50, 1 March 2008 (UTC)

In response to the various IRC questions: I unblocked the forward-facing AOL proxy servers that now supposedly forward the XFF-forwarding information correctly. AOL was not "blocked" for years - it was "blocked from invisible access" while allowing all AOL users access over the https: servers. This was no big, dramatic change - individual AOL blocks should still be limited to 15 minutes to 1 hour. --Connel MacKenzie 03:03, 2 March 2008 (UTC)[reply]
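To illustrate why the forwarded header matters for per-user blocks, here is a hedged Python sketch of how a server might recover an individual client address from `X-Forwarded-For` when a request arrives through a trusted proxy. The proxy addresses (taken from documentation ranges) and the `client_ip` function are hypothetical; this is not MediaWiki's actual implementation:

```python
from ipaddress import ip_address
from typing import Optional

# Hypothetical trusted proxies (addresses from documentation ranges).
TRUSTED_PROXIES = {"192.0.2.10", "192.0.2.11"}


def client_ip(remote_addr: str, xff_header: Optional[str]) -> str:
    """Pick the address an edit should be attributed to.

    X-Forwarded-For is trusted only when the connection itself comes
    from a known proxy; otherwise the header could simply be forged.
    """
    if remote_addr not in TRUSTED_PROXIES or not xff_header:
        return remote_addr
    # Walk the chain from the nearest hop outwards, skipping trusted proxies.
    for candidate in reversed([part.strip() for part in xff_header.split(",")]):
        try:
            ip_address(candidate)
        except ValueError:
            break  # malformed entry: stop trusting the header
        if candidate not in TRUSTED_PROXIES:
            return candidate
    return remote_addr
```

With the header honoured this way, a block can target the individual address instead of the shared proxy, which is the situation described above.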

Encyclopedia of Life

This online resource (at [1]) would seem to be a good source of animals, plants etc. I have added one of their featured entries - green anole. SemperBlotto 12:15, 1 March 2008 (UTC)[reply]

I take it that you mean it as a reference. We can't use their content (esp. the very nice images), can we? DCDuring TALK 19:58, 1 March 2008 (UTC)

Their terms of use page says this:

Please note that a single page may be made up of many different data elements, each covered by a different license. You are required to check to see which license applies to any portion(s) of the page you wish to re-use and to abide by any restrictions on that content. ... In most cases, EOL data partners have made content available for re-use under one of the following Creative Commons licenses: ... To identify the terms of re-use of a photograph or drawing, click on the green information button on the bottom left corner of the picture.

I spot checked a couple of images from their home page and one of them was CC-BY-NC and the other was CC-BY-NC-SA (meaning we can't use either of them). We should be able to use any images that are CC-BY-SA; in general we can use CC content that has "SA" (share alike) and does not have "NC" (non-commercial). The "BY" (attribution) doesn't make a difference as to whether or not we can use it, only whether we have to give attribution or not (which we would probably do regardless). Mike Dillon 01:49, 2 March 2008 (UTC)[reply]
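The rule of thumb in the comment above can be written down as a small check. This Python sketch encodes only what is stated there ("SA" present, "NC" absent, "BY" affecting attribution only); the function names are hypothetical, and other licence elements or versions are deliberately not handled:

```python
def licence_parts(code: str) -> list:
    """Split a code such as 'CC-BY-NC-SA' into its upper-cased elements."""
    return [part.strip().upper() for part in code.split("-") if part.strip()]


def reusable_here(code: str) -> bool:
    """Rule of thumb from the discussion: needs SA and must not have NC."""
    parts = licence_parts(code)
    return bool(parts) and parts[0] == "CC" and "SA" in parts and "NC" not in parts


def needs_attribution(code: str) -> bool:
    """BY does not affect reusability, only whether credit must be given."""
    return "BY" in licence_parts(code)


if __name__ == "__main__":
    for code in ("CC-BY-SA", "CC-BY-NC", "CC-BY-NC-SA"):
        print(code, reusable_here(code), needs_attribution(code))
```

Applied to the two images that were spot-checked, both CC-BY-NC and CC-BY-NC-SA come out as not reusable, matching the conclusion above.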

Could anyone with a zoology, botany, or microbiology (not to slight viruses, molds, and fungi, et al.) background compare and contrast this to WikiSpecies? DCDuring TALK 15:57, 14 March 2008 (UTC)[reply]

plural and uncountable

Have we currently got a consistent way of marking entries where some senses are the plural forms of single nouns, and other senses are uncountable nouns?

The most recent example I've come across is hostilities, but there are many others. Thryduulf 17:20, 1 March 2008 (UTC)[reply]

See weeds. --EncycloPetey 17:49, 1 March 2008 (UTC)

Would it not be reasonable to insert a link to plurale tantum at the sense lines? (There might be a template to make the format uniform.) DCDuring TALK 19:54, 1 March 2008 (UTC)

Language index as category

I am still thinking about how to keep the language index current. Could an index page be treated as a category? Then when I create an entry, I would add it to the appropriate letter category; for example, a word starting with m would be added to Category:Hungarian index m. I don't know enough about the consequences: an entry belonging to too many categories, performance issues maybe, or too much work for editors (although a bot could also add the categories). Or is it easier to regenerate the index every month using the monthly dump? Thanks. --Panda10 14:38, 2 March 2008 (UTC)
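As a rough illustration of the two options, the Python sketch below derives the letter category for a single entry and also rebuilds a whole index from a word list. The function names and the sample words are assumptions for illustration only, and plain string sorting stands in for proper Hungarian collation (digraphs such as "sz" are not treated as letters):

```python
from collections import defaultdict


def index_category(language: str, word: str) -> str:
    """Name of the letter category an entry would be added to."""
    if not word:
        raise ValueError("word must not be empty")
    return f"Category:{language} index {word[0].lower()}"


def build_index(language: str, words: list) -> dict:
    """Group a full word list by letter category, as a dump-based rebuild might."""
    index = defaultdict(list)
    for word in sorted(words):
        index[index_category(language, word)].append(word)
    return dict(index)


if __name__ == "__main__":
    print(index_category("Hungarian", "macska"))
    print(build_index("Hungarian", ["medve", "alma", "macska"]))
```

The per-entry function is what an editor or bot would apply at creation time; the rebuild is the monthly-dump alternative. Either way the category name is computed from the word, so the two approaches would produce the same grouping.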

English dictionary should only have English words?

Aren't there Wiktionaries in other languages for words in other languages? For instance, why does the English Wiktionary have entries for être, φλόος, Աստանա, 가마우지 and many others which clearly aren't English words? Should these not be in http://fr.wiktionary.org etc. - for instance, why do we have an être page at the English Wiktionary when this seems the only natural place for it to be? After all, the English Wikipedia has no article entitled "Աստանա", as the English Wikipedia is written in English. Should this not apply here too?

Any comments are appreciated. It Is Me Here 07:27, 3 March 2008 (UTC)

Wiktionary is a dictionary written in English of all words in all languages, just as Wikipedia is an encyclopedia written in English of all topics from all language-areas. I could look up fr:être at fr.wiktionary, but I don't speak French, so I would still not know what it means. All of the Wiktionaries define all words from all languages in their own language, so a reader can access the definitions of all of them in their own language. Dmcdevit·t 07:37, 3 March 2008 (UTC)[reply]

This explanation is so perfect that it should not be lost. Could it be pasted to some help page? Lmaltier 20:42, 3 March 2008 (UTC)

Agreed. WT:NOT would also be a good place for this, with a label like "Wiktionary is not monolingual" or some such. -- Visviva 13:39, 4 March 2008 (UTC)[reply]

Done. Well, doing now. --Neskaya ^talk 21:50, 12 May 2008 (UTC)[reply]

Romance verb and past part. forms

Because I'm so fond of adding forms etc., and because I personally feel that our non-lemma entries could use the attention, I'm going to be designing verb-form templates for the Romance languages that I can. I'll be doing them in the same way I did {{ro-form-verb}} ({{fr-form-verb}}, {{es-form-verb}}, {{pt-form-verb}}, etc.; SemperBlotto handles Italian like it's cool, so I'm going to leave that one to him). This is the easy part and I'm not too worried about progress here. (Except in Spanish: I think our verb form entries may have gotten a little out of control there, especially the categories.)

One of the serious things I want to see under control stat is past participles and their forms, though. Spanish pp forms I haven't seen anywhere, nor have I seen the forms listed in the base past participle entries. ({{ca-pp}}, {{es-pp}}, {{fr-pp}} and {{pt-pp}} are all good for use in inflection lines now, so this shouldn't be an issue for new entries if we all know they're there.)

Now the formatting of past participle forms is probably what needs the most help. Keene runs a bot (ăsta, de fapt) that adds French verb forms (which is awesomeness, I just want to get everything ironed out smooth before we get started for realreal). French and Catalan pp forms can still function as verbs or adjectives, so they need to go under Category:French past participle forms and Category:Catalan past participle forms. Others I'm just about certain only function as adjectives, so I don't see a problem with putting them in Category:Spanish adjective forms etc, but if anyone disagrees, I'd like to know why so we can do what's best, etc. See User:Opiaterein/Basic past participle format to check out the standardized format I'm looking at. I think it'll work pretty well translingually.

The languages I'm most concerned with, because they represent the bulk of our Romance languages that still need direction here, are:

French
Spanish
Portuguese
Catalan

If there are any concerns, I'd like to get them out now so I'm not messin' anything up, keh? :) Let's get to it — [ ric ] opiaterein — 14:18, 3 March 2008 (UTC)[reply]

All of them need something that clearly identifies them as minimal stubs, needing definitions, example sentences etc. Users must never get the unfortunate idea that content is somehow not allowed in "non-lemma" entries, or think they can or are allowed to remove it.

Do you mean that all bot-formed entries should have a tag which puts them in a cleanup/stub category, for example in Category:Keenebot2 entries? This is feasible of course, but I can't see how it would be helpful. --Keene 15:11, 3 March 2008 (UTC)[reply]

The inflection must be on the inflection line, the definition in English on the definition line(s). We should have a firm policy prohibiting bot creation of entries that do not contain English definitions. If they can generate all the names for inflections, they can generate the English definitions in the correct forms. If the operator is unable/unwilling to do that, he/she should not be creating them.

Please, can you give a link to such a page with the inflection on the inflection line? I've seen a few already, but I forget where. --Keene 15:11, 3 March 2008 (UTC)

Trying to clean up details in the "form of" entries is pointless, they are fundamentally wrong; they will all have to be done over again, either with a decent bot, or by hand. Robert Ullmann 14:42, 3 March 2008 (UTC)[reply]

re: Opiaterein mentions starting for "real real". Keenebot2 has already started for "real real". The bot's almost auto-added conjugations of all verbs tagged with an inflection template so far. Changing it now is possible of course, and if necessary I could cease bot activity until such a time when we're all happy with how inflected entries should look (at the time of the bot vote, there were a couple of oppose votes, so obviously it isn't perfect). However, when Robert Ullmann says that form-of entries are "fundamentally wrong", I must disagree. --Keene 15:11, 3 March 2008 (UTC)[reply]

As for past participles and pp forms, the adding of adjective definitions to them is on my to-do list. I started adding a few adjective sections at the beginning, but haven't done many since then. The same with present participles, many (most? all?) of which can be adjectives in French too. At least these ones are all together in Category:French past participles and in Category:French present participles, so when anyone wants to trawl through them it's easier. However IMHO having only these stub-form entries for forms of verbs is better than nothing, and you'd do well to find a website out there with better entries for each "form-of". Regards, --Keene 15:21, 3 March 2008 (UTC)[reply]

When the same form can be either a participle or an adjective (which is sometimes, but not always, the case), they should be listed separately.

About definitions in English: they are in English! Actually, I find there are 3 kinds of definitions in use:

traditional definitions: they are used for lemma forms of English words, but should also be used whenever appropriate.
translation definitions: one word translating the word defined + a gloss.
grammatical definitions: used for inflected forms. I cannot find a better way of defining the meaning of inflected forms. Take an example: aima. You cannot understand what this word means unless you understand several concepts: at least singular, third person and past historic (and its non-obvious use), plus the appropriate meaning of aimer. Impossible. Try to build a better definition (it's a challenge) and you'll understand what I mean. You could try to define all this in the page itself, but this would still not provide a definition of aima (only of aimer). You could also try to provide a translation (loved) but, clearly, this is only a translation, not a good definition, and it would not help anyone understand the difference between aima, aimas, aimai, aimé, aimée, aimés, aimées... as all these words have the same translation. In such cases, grammatical information is part of the meaning. Lmaltier 17:45, 3 March 2008 (UTC)
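The point about grammatical definitions can be made concrete with a small sketch. Assuming a hypothetical `VerbForm` record and gloss wording (neither is an existing template or bot format), the Python below shows how forms that share one translation still get distinct grammatical glosses:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbForm:
    """Grammatical features of one inflected form (hypothetical record)."""
    form: str
    lemma: str
    person: str
    number: str
    tense: str


def grammatical_gloss(v: VerbForm) -> str:
    """Describe the form by its relationship to the lemma."""
    return f"{v.person}-person {v.number} {v.tense} of {v.lemma}"


# Three past historic forms of aimer from the comment above.
FORMS = [
    VerbForm("aimai", "aimer", "first", "singular", "past historic"),
    VerbForm("aimas", "aimer", "second", "singular", "past historic"),
    VerbForm("aima", "aimer", "third", "singular", "past historic"),
]

if __name__ == "__main__":
    for v in FORMS:
        print(f"{v.form}: {grammatical_gloss(v)}")
```

All three forms would be translated "loved", so only the gloss tells them apart, which is the argument being made for treating the grammatical information as part of the meaning.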

I strongly agree with Lmaltier. Non-lemma entries require a grammar-focused description of the relationship between the headword and the corresponding lemma. Rod (A. Smith) 18:03, 3 March 2008 (UTC)[reply]

Regardless of the debate regarding use of definitions in non-lemmatic entries (disclosure: I am most firmly against Ullmann's crusade), there is already a template for French; it's {{fr-verb-form}}. I've created a redirect from the name there (I believe the POS should come first, but then I think we need a naming convention for these templates, and getting agreement on style or naming issues here is even more difficult than on wp...) 18:09, 3 March 2008 (UTC)

{{fr-form-verb}} isn't meant to be used in the inflection (header) line, it's meant to show the actual forms. Check out vorbesc to see the corresponding Romanian template in action.

From this point on, I want no more talk of definitions or glosses in form-of entries. We've been over it a hundred times and it's not the subject of this discussion, thanks. :) — [ ric ] opiaterein — 18:42, 3 March 2008 (UTC)[reply]

Okay, that makes more sense, but there are still quite a few kinks to smooth out IMHO, though:

I definitely think this ought to use the {{form of}} meta-template, if only for formatting (and they need to start with a capital letter, too!), because there is no specific reason to set them apart from stuff as formatted by {{feminine of}} and {{plural of}}.
These should be something like {{verb form of}} (or, if limited, {{romance verb form of}}) to make their purpose clearer ("XXX of" is the format used by all such templates).
How about a master template? 90% of the Romance languages share the same names for verb tenses, persons and moods. Heck, this could easily be one generic template with extra tenses and moods covering most other languages with person-tense-mood agreement.
OTOH, I've by far favored formulations of the sort "first- and second-person" because I find them equally readable and slightly more compact (yes, I know Wiki is not paper), although that could possibly be handled with special abbreviations for the combinations (there are only 3 or 4 different ones for French). In any case, there is no absolute need to have an "I read" bit after the lemma definition: we can always have it formatted as an example if we insist upon having it (although I'd again favor not using them at all).
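The compact "first- and second-person" wording suggested in the last point is easy to generate mechanically. Here is a minimal Python sketch, assuming a hypothetical `person_label` helper that is not part of any existing template:

```python
PERSON_ORDER = ["first", "second", "third"]


def person_label(persons: set) -> str:
    """Join one or more persons into a single compact label."""
    chosen = [p for p in PERSON_ORDER if p in persons]
    if not chosen:
        raise ValueError("at least one of first/second/third is required")
    if len(chosen) == 1:
        return f"{chosen[0]}-person"
    # Earlier persons keep a hanging hyphen: "first- and second-person".
    heads = ", ".join(f"{p}-" for p in chosen[:-1])
    return f"{heads} and {chosen[-1]}-person"


if __name__ == "__main__":
    # French "finis" serves for both je and tu in the present indicative.
    label = person_label({"first", "second"})
    print(f"{label} singular present indicative of finir")
```

A single entry line then covers both persons, instead of two nearly identical definition lines.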

Oh boy that was wordy... But necessary. Currently, different editors have applied different methods of formatting and of applying this formatting, and I think we should seriously consider sorting that out. And finally, there is also an {{es-verb form of}} (which I didn't know about when I suggested that name). Circeus 22:52, 3 March 2008 (UTC)

Funny that folks should mention "standard template" and "Romance verbs all have the same stuff". I recently set up {{conjugation of}} to do exactly this. It accepts a language argument, so no new template is required to add all the various Romance language verb forms; the template is flexible enough already to handle them as it is. --EncycloPetey 02:51, 4 March 2008 (UTC)[reply]

Re: "Others I'm just about certain only function as adjectives, so I don't see a problem with putting them in Category:Spanish adjective forms etc, but if anyone disagrees, I'd like to know why so we can do what's best, etc.": Spanish past participle forms often function as verb forms: just like in English and French, they're used in forming ~~the perfect aspect and~~ the true passive voice. (~~Neither of these is~~ This isn't as common in Spanish as in English or French, but ~~they do~~ it does exist.) —Ruakh_TALK 20:48, 3 March 2008 (UTC)[reply]

No no no, Ruakh, I know that the past participles are used as both verbs and adjectives, I mean the forms of the past participles. Preservada, escondidos, habladas :) — [ ric ] opiaterein — 23:48, 3 March 2008 (UTC)[reply]

I've edited my comment accordingly. :-) —Ruakh_TALK 01:29, 4 March 2008 (UTC)

Participles are a really sticky problem in Romance languages. I haven't made up my mind how I'd like to see them handled. Most Latin past participles inflect and function as adjectives, but all grammars treat them as a verb form, and they are used to form certain verb tenses. I've even considered the possibility of just using the POS Participle, but that too leads to problems. --EncycloPetey 02:54, 4 March 2008 (UTC)[reply]

In French, there is no such rule. Adjectives are often created from participles (mostly from past participles, and usually only very common such adjectives are listed as adjectives in dictionaries, to save space). For French, at least, the simplest and best way is to list them as verb forms and to list them separately as adjectives.

[reset tabs] I have a separate, but related, worry. Many past participles have taken on a life of their own, and have an adjectival meaning which is wider than their verbal meaning: for an example, see tired, fatigué in French, cansat in Catalan (the meaning is pretty much identical in the three languages). I think these need to have two PoS headers, one for the adjectival meaning and one as the past participle of the verb. On the other hand, if the adjectival use is completely subsumed by the verbal use (eg, underlined), I don't see any need to list them separately. My second, more minor, complaint against ric's suggestion is that the participle forms do not link back directly to the lemma form, which for most romance languages is the infinitive. The very definition of a participle is "a form of a verb that may function as an adjective or noun" (present participles may also function as adverbs in most romance languages)—the link back to the root-form of the verb seems essential to me. Physchim62 14:13, 4 March 2008 (UTC)[reply]

Maybe I'm missing something with this participle talk. Speaking for Spanish only, past participles used as a verb only have one form (-o) unlike how it is currently listed in preservado. They only have 4 forms (gender x number) when used as an adjective (in which case we use {{es-adj}}) or as a noun (in which we use {{es-noun-mf}}) as most Spanish adjectives can be used as nouns. In the passive sense they are used like adjectives. See desaparecido for a quick try at separating out all three senses. What's the need for new templates like {{es-pp}}, shouldn't we be just standardizing use of the old ones? --Bequw → ¢ • τ 14:48, 4 March 2008 (UTC)[reply]

You understand the grammar, but the point of the discussion is that this is a regular pattern for Spanish participles. They behave almost as their own part of speech with a consistent set of rules. So, instead of discussing each participle as a verb form and an adjective and a noun, and giving all the inflections and notes for every participle, it almost makes more sense to just call it a Participle and point to an Appendix on Spanish participles to explain what that means and give all the definitions in one section. It's a bit like the way we agreed that attributive use of English nouns does not make them adjectives; it's just that English nouns can function attributively as a regular part of English grammar. Or the way that English cardinal numbers can function as both noun and adjective, but we can catch both functions by pointing out that the word is a cardinal number. So, one way to handle Spanish participles is to call them Participles. --EncycloPetey 15:03, 4 March 2008 (UTC)[reply]

Bequw just explained that, in Spanish, it's not a regular pattern. It's not a regular pattern in French either: many past participles cannot function as adjectives. And most present participles cannot function as adjectives (when they are used as adjectives, they are not considered as verb forms). Why invent misleading solutions? Lmaltier 19:32, 4 March 2008 (UTC)[reply]

I find it ironic that in accusing me of "inventing solutions" and saying I didn't read what was said, you're showing that you didn't read my post. As I said earlier, I have not decided how I feel about this because both approaches have problems. I do not see where Bequw explained anything about a lack of pattern in Spanish, so if you could show me where he said that, I'd be grateful. I did not propose a solution, I explained one possible position because (as Bequw said) "maybe I'm missing something". I therefore laid out the full discussion so we could be clear about what's being discussed. I proposed no solution. Sheesh! --EncycloPetey 02:38, 5 March 2008 (UTC)[reply]

I don't accuse you of anything. I can't speak Spanish, but I understand from past participles used as a verb only have one form that some participles are used as a verb only, and some may be used as adjectives (but I may misunderstand the sentence). This would mean that there is no general rule applying to all participles. Even if I misunderstand it, it's clear that word forms to be listed are not the same when the word is a verb form and when the word is an adjective, and this suggests separate headers. For French, I'm sure that paper dictionaries always consider participles as verb forms (and they are right) and, therefore, do not list them, but they list adjectives derived from participles (systematically for adjectives such as souriant, as readers would not be able to guess that souriante exists, and only when they are common enough for adjectives such as fumé, in order to save space). Stating that souriante is a participle would simply be wrong, nobody considers this word as a participle. I only mean that there is no need for inventing something more than current headers (this is not the same as cardinal numbers, which follow a regular, systematic, rule). Lmaltier 07:08, 5 March 2008 (UTC)[reply]

Then you misread it, didn't you? Go back and read it again. What Bequw said was that, of the forms a Spanish participle has, only one of those forms (the one that ends in -o) functions to form verb tenses; the other forms function as adjectives. Everything else you just wrote about Spanish participles is incorrect. And you're missing the point about Participles. You are beginning with the assumption that they must be forced into the existing categories of Verb & Adjective. The point of my discussion is that we want to consider the possibility that they deserve separate recognition. Consider: English nouns can modify other nouns and function like an adjective, but we don't call them an Adjective here just because they function like one. They're still nouns. The point I'm making is that participles are etymologically verbs, function like verbs, but that they also function like adjectives. If we use the POS Participle for Romance languages, then we can include the adjective function in the same POS section without having to pretend there are two separate words involved. I don't pretend that I've solved anything here, but please do consider the possibility and help discuss both the pros and cons, rather than simply dismissing it. This may be different for French, but I don't know French. --EncycloPetey 14:58, 5 March 2008 (UTC)[reply]

(Ears burning) The process of Participle → Adjective → Noun is standard (in terms of changes in the suffix), though it's not universally applicable to all Participles. Some Participles would just be weird to use as nouns, and if you used one in that case people would understand you but in the way some neologisms are easily understandable. Some Participles used as nouns have a slightly different meaning (secado is "the action of drying", not "the dried thing" that one might infer from its descent from secar (“to dry”)). So if a participle heading were to be used, it would be helpful if it could show in an integrated way how a word could be used in the standard POS terms. Maybe like Participle (Verb or Adjective) or Participle (Verb or Adjective or Noun). Possibly this would be in the header or maybe the inflection line. --Bequw → ¢ • τ 19:35, 5 March 2008 (UTC)[reply]

This may or may not be relevant, but it may interest you to know that Ancient Greek will definitely be using the POS "Participle." Obviously it's not a Romance language, but I thought you might find that information useful anyway. -Atelaes λάλει ἐμοί 16:36, 5 March 2008 (UTC)[reply]

I might have misunderstood the intention, too. But changing the existing entry for sucré would be wrong. In French, an adjective is not a participle, a participle is not an adjective, and they have different meanings: an adjective is about a characteristic feature of something and has nothing to do with the verb (except etymologically), a participle has a meaning related to the meaning of the verb. sucré, as an adjective, means sugared (the adjective) or sweet, as a participle, it means sugared (the participle). This is important, because it makes clear that many French sentences are ambiguous. An example: Il a sucré son café, puis a bu le café sucré. Le café était très sucré.. The first use of sucré is a participle, the third one an adjective, and the second one is ambiguous (the intended meaning may be 'sweet', in which case it's an adjective, not a participle, or 'which has just been sugared', in which case it's a participle, not an adjective). Lmaltier 17:41, 5 March 2008 (UTC)[reply]

Can anyone quote me a grammar of any IE language that defines 'participle' as a distinct part of speech, not in the usual catch-all sense of 'verbal adjective/adverb' or as an inseparable component in tense-formation? Why do you think there are none?
This is an English wiktionary and all the entries must be normalized to English senses/terminology. It would be a dangerous precedent to impose additional headers just because the editors are too lazy to provide quality content. --Ivan Štambuk 17:30, 5 March 2008 (UTC)[reply]

Wrong on all counts. (1) I have grammars of Ancient Greek and Latin that treat Participle as a part of speech. Grammars are inconsistent about how they treat participles; some treat them exclusively as verb forms, while others treat them primarily as adjectives. Other grammars treat them as if they were two separate words. Some of these same grammars will treat adjectives used as if they were nouns solely as adjectives. We should not blindly follow mono-lingual grammars when we are creating a multi-lingual dictionary, but should consider how best to treat the words themselves for the use of our readers. (2) Not all entries must be "normalized to English terminology". That argument went out the window the moment we started including African and East Asian languages. Take a look at the Japanese POS list sometime; it does not normalize to English POS categories because Japanese is not like English or western languages. (3) It is not laziness to discuss a topic and come to a consensus that fits the needs best. What is lazy is to throw out an idea simply because you assume it is a bad idea, assume the world works a particular way, and assume that you're right. The participle discussion is about how to provide quality content. --EncycloPetey 01:15, 6 March 2008 (UTC)[reply]

What are you talking about?? What Latin and Ancient Greek grammars list participles as distinct part of speech? Nouns, verbs, pronouns, adjectives...participles? Don't be silly. "Participle" in linguistics is just an umbrella term meaning "this is really an adjective/adverb/inseparable component... derived from verbal root by regular morphology, meaning exactly what verbal root does in the context of its application". If it inflects exactly like an adjective, translates in English exactly like an adjective, and is used like an adjective - it's an adjective all right. The fact that there are well-defined rules for producing participles from verbal roots does not mean that all of them are verbal forms that should be put under =Verb=, or worse, under =Participle=; when both their semantic value and inflection properties demonstrate that they act as a separate PoS, you must treat them as such.

Nobody advocates blindly following of a mono-lingual grammar, It's very rude of you to put words like these in my mouth.

I'm sure readers will be much more delighted if they saw a real definition line for a participle, not some "Xx participle of " stub. Advocating separate PoS header for participles that would only be populated with stubs cannot be a quality argument for end users.

Your argument on Japanese is what exactly? We're including non-IE languages and that fact legitimizes this linguistic perversion participle-as-a-PoS? I don't see anything unusual on Wiktionary:About_Japanese#Parts_of_speech, what specifically did you have in mind?

You wrote it yourself: So, instead of discussing each participle as a verb form and an adjective and a noun, and giving all the inflections and notes for every participle, it almost makes more sense to just call it a Participle and point to an Appendix on Spanish participles to explain what that means and give all the definitions in one section. - this sounds like laziness to me. For languages in which participles themselves could have many different inflected forms, this would link form-of entries to form-of entries, which would admittedly make the automated generation of entries much easier, but certainly would not make it easier for the users to understand them.

Sorry, but the idea of =Participle= section appears to me as silly as that of =Infinitive= and =Gerund= (there are also languages with plenty of those too). --Ivan Štambuk 16:24, 6 March 2008 (UTC)[reply]

Again, you have claimed one thing, but said differently in the same post. You said "Nobody advocates blindly following of a mono-lingual grammar". OK, so why then does my discussion of treating Spanish participles lead you to ask how this would affect other languages? You keep assuming that whatever we decide to do for one language must necessarily be applied equally to all languages, i.e. a mono-lingual grammar. No one believes this must happen except you. That is why I pointed you to the Japanese parts of speech list. Japanese POS headers include "Quasi-adjective" and "Counter word", but these headers do not have to be applied to any other language. We can and do recognize different POS headers for different languages.

I also have no clue why you think having "Participle" as a POS header would lead to stubs. If it has its own header, then it must have its own lemma; it will not be a "form of" entry, but will have definitions, inflection tables, related terms, descendants, and all the other sections a lemma would have. The only way a stub happens is if someone doesn't add the additional information. Stub-ness is completely unrelated to the question of what POS we use. Let's bury that straw man argument now. --EncycloPetey 16:39, 14 March 2008 (UTC)[reply]

Nearly all of my Ancient Greek grammars treat participles as a separate category. -Atelaes λάλει ἐμοί 01:24, 6 March 2008 (UTC)[reply]

Sanskrit has been using exactly four PoS categories for the last 2500 years (since its codification): nouns (nāman, in broader sense - substantives), verbs, prefixes and particles. In all Sanskrit dictionaries today you'll find conjunctions/adverbs/interjections/... combined in a category of "indeclinables" named "avyaya". Does that mean that we should dismiss usual English/Western terminology in favour of the Pāṇini's scheme? NO! --Ivan Štambuk 16:34, 6 March 2008 (UTC)[reply]

Ok, but the mere fact that a number of editors working on a number of different languages are considering participle as a POS should certainly indicate that this is not some esoteric POS invented by a single author. And, for clarification, in Koine Greek (the form of Greek I received my formal education in and am most knowledgeable of) participles did not function as adjectives all that often. I would say the most common usage is along the lines of the English infinitive, probably followed by a noun usage. Smyth, Black, Long, and Wallace all treat participles as separate POS's. -Atelaes λάλει ἐμοί 18:54, 6 March 2008 (UTC)[reply]

Some of the AGr. grammar books I'm browsing on books.google.com really do list participle as a PoS, like this and this one, but at the same time they don't treat adjectives as a separate PoS at all ^_^ w:Lexical category claims that "It wasn't until 1767 that the adjective was taken as a separate class", so those participle-as-a-PoS treatments could indicate obsolete terminology used by ancient grammarians, and continued in modern tradition. And all of those AGr. books have a separate chapter on infinitives just next to the one on participles, and I don't see anyone advocating =Infinitive= header. Encompassing under =Participle= both adjectival and substantival meaning of participles which they could acquire through context, and translating them as different English PoS seems utterly wrong to me. In Polish, mało (“little”) is classified as a Numeral (liczebnik), but not so usually in other languages; should the article conform to English or Polish notation of a numeral? Dumping all of those participles under unique =Participle= section that would be used for different things in different languages seems an additional argument against to me, because it would almost certainly be the most vaguely defined lexical category of them all. In any case, the thing that should not be forbidden is promoting those =Participle= or =Verb= (form) stubs (stubs with declension tables though!) to proper full-blown =Adjective=, =Noun= or =Verb= section, with normal English gloss and usage samples. --Ivan Štambuk 20:28, 6 March 2008 (UTC)[reply]

Huh? How the translation is used in another language has no bearing on what the POS is in the native language. In Slovene, the names of languages are adjectives and adverbs, never nouns, but the translations of those terms into English is a noun. That doesn't mean the Slovene word changes POS to match its English translation. I really don't follow your arguments. A few paragraphs ago you were telling us we should use Participle as a header because nobody does that and we don't want to set a "dangerous precedent". Now that we've pointed out it is used (and has been for a long time), you are balking that it's "obsolete". So, you don't want to follow established tradition, and you don't want to try something new. So what exactly do you want to do? The grammars for Latin and Ancient Greek are not obsolete, and they have changed the way they treat POS over the years to better reflect accumulated understanding of the language. The retention of Participle is characteristic of many of the most progressive and modern Classical language texts. Please don't dismiss the work of renowned experts just because you have a bee in your bonnet about Participles. And, by the way, mało is a Determiner, or more specifically an "indefinite numeral". We actually do use Determiner as a POS here. --EncycloPetey 00:38, 7 March 2008 (UTC)[reply]

1) slovenščina, angleščina, nemščina etc. are, of course, nouns in Slovenian. Language names ending in -čki/-ski/-ški/-ský/-ский are adjectives/adverbs exclusively both in all Slavic languages and in their English translations. If you see a Slovenian adjective/adverb formatted as a =Noun=, that's a mistake that needs to be corrected.

2) I never advocated using =Participle=, you must have misread something. Yes, I'm "balking" that it's obsolete, because those same books that list "participle" as a PoS don't list adjective as a PoS at all. Every time you use =Adjective= in Latin/AGr. you're already not following "established tradition". If we were to follow traditions of particular languages, PoS header names would be a complete mess. The only possible solution that doesn't lead to chaos is to normalize everything to English.

3) I've never seen a "Determiner" used as a PoS in any Slavic language. Other Slavic languages' correspondents of mało, pół, dużo, ćwierć etc. are always treated as adjectives/adverbs. This treatment of "fractional numerals" is Polish-only. Formatting participles that are obvious adjectives as =Participles= makes as much sense as formatting Polish adverbs/adjectives as numerals.

4) Here's an excerpt from Encyclopedia of Language and Linguistics (Elsevier, 2005): p330, on Applicational Grammar: "Other items belonging to the category 'adjective' are prepositional phrases – about gardening combines with the term books to yield the term books about gardening – and participles, such as English sleeping from sleep and Russian igrajushchij (‘playing’) from the stem igra-. The primary function of verbs is to apply to a noun to yield a sentence. One secondary function is to act as an adjective, which is signaled by the participial suffix." On most other places on the Internet I found that the separate lexical category of participles is included by some authors, but at best it remains an exception rather than a general rule. --Ivan Štambuk 20:25, 7 March 2008 (UTC)[reply]

Why do you continue to assert things that are flatly untrue and which contradict each other?

(2) "those same books that list "participle" as a PoS don't list adjective as a PoS at all" This is flatly untrue. The modern textbooks on Classical languages have both parts of speech listed. "The only possible solution that doesn't lead to chaos is to normalize everything to English." This assertion was thoroughly refuted the last time we discussed Japanese and Korean grammar. You can go look at those arguments yourself, since there is no point in repeating the whole discussion here. We cannot and should not try to shoehorn all languages to fit an English model of language. If that were possible, then linguists would not have abandoned the idea of "universal grammar" as most now have.

(3) "I've never seen a "Determiner" used as a PoS in any Slavic language". Just because you've never seen it does not invalidate it. It's in standard modern English grammars, so if we standardize to modern English as you suggest, then we'll have to apply them to Slavic languages, won't we? In short, your arguments in (2) and (3) above are inconsistent. One of them will have to give way to the other. In any case, I have several Slavic language grammars that recognize "Numerals" as parts of speech, including Slovak for Slavicists by Baláž et al., Czech for English Speaking Students by Šára et al., Polish: an Essential Grammar by Bielec, A Basic Reference Grammar of Slovene by Derbyshire, and Introduction to the Croatian and Serbian Language by Magner. Likewise, I have a range of grammars for several languages (including English) that recognize the "Determiner" as a separate part of speech that includes numerals, articles, and demonstratives as subtypes.

(4) If most of the places you found on the internet have a separate lexical category, then how does that make it an "exception"? If most sites are using it, then it is the general rule. Are you arguing that, on the basis of an Elsevier Encyclopedia, we classify prepositional phrases as adjectives? Please note that the source you quoted says that a participle is an adjective and says it is a verb that functions as an adjective. That is, it is both simultaneously, and not just one or the other. That's what the POS of "Participle" means.

--EncycloPetey 03:54, 12 March 2008 (UTC)[reply]

2) Those that I've looked at on b.g.c that list participle as a separate lexical category and that I've linked above really don't list adjectives as PoS. This accords with that 'pedia article that asserts that they weren't recognized as such up until recently. That the participles retained separate grouping is more a continuation of an established tradition, not because of a real necessity of doing so. Even today in Sanskrit adjectives are not really recognized as separate category from nouns (if it inflects exactly like a noun, sometimes behaves exactly like a noun - why treat it separate?). What is a pure convenience for one language's descriptive grammar's tradition cannot be used as a general argument for all languages. It's pointless to introduce =Participle= only for a couple of languages, and not doing so for all the other ones. Linguists didn't exactly abandon the idea of "universal grammar", for you see 99.999% of morphosyntactic constructions in all natural languages are describable with CFGs and nicely fit some general framework.

3) It doesn't, but maybe that indicates something, doesn't it? What is a "determiner" in English (according to w:Determiner (class) not even a widely accepted term in English grammars) are really all adjectives in Slavic. You can write something like "Big likes something different." and it still makes sense, because this "big" will then have an ending that will indicate gender & case. I didn't claim that numerals weren't classified as PoS in Slavic (where did you get that?), but was just referring to this special-case Polish "partitive numerals" ("little", "half" etc.) that are really classified as adjectives/adverbs in all the other languages. That kind of chaos you get when you apply local standards as general formatting scheme.

4) Most sites are not classifying participles as separate lex. category (you again must have misread something). That OR relation in that sentence was not inclusive at all; participle can behave like multiple PoS according to its context, and each of those can be separated into its own L3 header. Just like almost every English present participle can be used as a verb, noun or an adjective. In some other languages these are not "syncretized" into one form and each of those would have different translations in their ====Translations==== section. Similarly, participles in other languages that can have multiple PoS functions must be formatted as such, and have different translations in English. It's like MxN relation.

5) My point is: you can't just treat something collectively because it's convenient to do so, under the assumption that the reader will know all the details under what conditions participle can act in what ways. I don't know what were you exactly discussing on those Altaic languages, but I fail to see the implications to this topic. Participles could have verbal, adjectival and nominal functions 6000 years ago, have lost some of them in some daughter languages, but the point remained the same: they're not some "special" PoS just because they can have multiple lexical functions depending on the context. --Ivan Štambuk 09:23, 13 March 2008 (UTC)[reply]

2) Then perhaps you should visit a library and look at physical books. I have not seen any recent grammars of Latin or Ancient Greek that completely failed to have an "Adjective" category. You are focussing only on the older books if you aren't finding adjective, which means you are ignoring modern research.

3) Clearly you do not know which words in English are Determiners. "Big" is not a determiner; it is an adjective. "little", "half", "this", "that", etc. are Determiners. It is not a "local" standard -- the behavior of determiners is a commonality among most European languages.

4) Yes, and almost every adjective can function as a noun. Almost every noun can function attributively as an adjective. But, we deliberately do not have a separate Adjective section for each attributive use of a noun, because that would be silly. The meaning is present in the noun, so we only list it as a noun. In other words, we already have situations parallel to the idea of listing under "Participle" all the various POS functions.

5) If you think my entire line of reasoning is based upon "convenience", then you haven't been paying attention to anything I've said. And what Altaic languages are you talking about? There is no point in having a discussion if you are going to keep jumping to topics that haven't been raised instead of dealing with the questions put to you. I have pointed out several times that your arguments contradict themselves, and you have yet to address that very critical point.

6) Worse, your arguments keep jumping all over the map. Let's return to something you said above: "when both their semantic value and inflection properties demonstrate that they act as a separate PoS, you must treat them as such." OK, so if Latin participles decline with a present, perfect, and future form, how can we stuff them into the Adjective category? Latin participles have tense, which adjectives do not. We therefore have inflection properties that demonstrate they act as a separate part of speech. The same is true for Ancient Greek. And, no, they don't simply function like adjectives, since they are used to form compound verbs as well, and adjectives don't do that. --EncycloPetey 16:19, 14 March 2008 (UTC)[reply]

In no way is it "laziness" to provide an accurate description of the grammars of the different languages we include. It is smallminded to pretend that everything can be neatly fitted into some form of bastardized English grammar. Physchim62 13:49, 6 March 2008 (UTC)[reply]

Yes, the description must be accurate, this is why I insist so much. I feel I must mention the 'adjectif verbal' (verbal adjective) concept, defined as adjective formed after a participle. The name is misleading, but the definition is clear, and this name seems to be fairly common in French. Try to google 'adjectif verbal', and you'll find that sites consistently insist on one point: verbal adjectives should not be confused with participles. It's probably equally true in many languages (including Spanish, as I understand it, and English), but it's especially important in French, because the singular, plural, feminine forms are often pronounced the same, and because the basic spelling itself sometimes depends on this distinction (e.g. intriguant = present participle, intrigant (same pronunciation) = associated 'verbal' adjective). If you don't clearly understand the difference, you are likely to misspell many words. Lmaltier 17:50, 6 March 2008 (UTC)[reply]

That's great for French, but more than French is at issue here. If you look in most Ancient Greek and Latin grammars, they will define a participle as a "verbal adjective". So what may be true of French is not true in other languages. The spelling is certainly not an issue in either Spanish or English. In those languages, the "participle" is spelled the same way whether it functions as a verb or adjective. Spanish participles can change their spelling when used as a modifier, but the masculine singular is the same as the spelling used in constructing compound verb tenses. So, it seems that Participle as a header may not work for French, but that only addresses one of the languages under consideration. --EncycloPetey 18:55, 6 March 2008 (UTC)[reply]

I'm happy you are convinced. I just want to add that, IMO, the only good general solution is to list all adjectives as adjectives, all nouns as nouns, whatever their etymologies. Bulgarian verbal nouns can be found in conjugation tables. Nonetheless, and fortunately, they can be found in dictionaries, as nouns, because they are nouns. Lmaltier 07:17, 7 March 2008 (UTC)[reply]

The problem with that suggestion as I understand it is that we will end up systematically having entries for adjectives and verb forms which mean exactly the same thing in English. I am happy to have separate PoS entries whenever there is a difficulty, but to make this universal seems to be overkill. Physchim62 15:14, 7 March 2008 (UTC)[reply]

I can personally assure you that defining sugared as a verb form and as an adjective is not overkill, despite the fact that the translation in French is the same in both cases. I was not aware of this use. Now, this must be done only when it is considered as an adjective (or a noun...) in the language. Churchill is considered only as a proper noun in English, so I don't propose to define it as a common noun, even if you can say like a Churchill (this is only a figure of speech, and all proper nouns can be used this way). Lmaltier 17:41, 7 March 2008 (UTC)[reply]

Two cents: in Modern Greek, participle is one of the POS, and in the grammars it's listed that way. One class of them is used in forming certain verb tenses. I want to keep the participle header for these reasons. -- ArielGlenn 09:34, 8 March 2008 (UTC)[reply]

Before making a decision, the meaning of participle should be clarified. The current definition states: A form of a verb that may function as an adjective or noun. When combined with a form of auxiliary verbs, such as have or be, they form certain tenses or moods of the verb. But this definition does not work for all participles. Take brumassé, a standard past participle in French. This word is used in compound tenses, but I cannot imagine how it could function as an adjective or as a noun (it would not make sense). I propose to change the definition to something like A form of a verb often used to form certain tenses or moods of the verb (when combined with a form of auxiliary verbs, such as have or be) and that often tends to be used as an adjective or as a noun. The important thing is that you cannot generalize. Lmaltier 21:36, 14 March 2008 (UTC)[reply]

In Lithuanian, participles are definitely a painfully distinct part of speech. The davylis participles in particular are... well, just check this out for a minute or two and you might get what I mean. They're heavily inflected and function very differently from verbs, and most of the time differently from adjectives. — [ ric ] opiaterein — 11:01, 3 April 2008 (UTC)[reply]

New Free Corpus of American English

"The BYU Corpus of American English is the first large corpus of American English, and it is freely available online. It contains more than 360 million words of text, including 20 million words each year from 1990-2007, and it is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts (more information). The corpus will also be updated at least twice each year from this point on, and will therefore serve as a unique record of linguistic changes in American English."--BrettR 15:57, 3 March 2008 (UTC)[reply]

That's "free as in beer" only, meaning that results cannot be copied wholesale, but nonetheless is excellent news. I haven't tested it thoroughly, but this has the potential to be a great boon for sense-verification work. -- Visviva 13:36, 4 March 2008 (UTC)[reply]

It should prove an excellent reference source. Someone may like to create a R:Reference style template for it.--Williamsayers79 08:27, 7 March 2008 (UTC)[reply]

Stylistic recommendations for wikilinks

Is there a written consensus/guide on wikilinks where the target is different than the linking-text? I'm not so much concerned about the general case of [[x|y]] links as they are obviously useful. I am wondering about the specific shorthand of appending suffixes to the wikilinks so as to link to the "lemma" entry rather than the inflected entry. For example writing [[run]]ning to produce running or [[building]]s to produce buildings. This is really common in Wikipedia where they don't have separate articles or redirects for every word/phrase form. Here in Wiktionary, though, if the inflected entry exists isn't this kind of linking a little gauche as you think you're going to the definition of one word and actually end up somewhere else? I've seen a lot that I'd like to change. Suggestions? --Bequw → ¢ • τ 21:26, 4 March 2008 (UTC)[reply]

I always use this to get straight to the lemma form, an entry like swooning contains nothing I couldn't guess, however the meaning of "swooning" can be ascertained from the entry at swoon, hence I link [[swoon]]ing. This is either a problem with our form of entries, or is how the world should be. Conrad.Irwin 00:45, 5 March 2008 (UTC)[reply]

Agree with Conrad.Irwin. Links should go to the lemma, unless the inflected form is intrinsically important to the meaning. -Atelaes λάλει ἐμοί 01:02, 5 March 2008 (UTC)[reply]

I agree with Atelaes and Conrad.Irwin about inflected forms. However, I'm iffier when it comes to trivially derived terms, like nouns in -ity that we define as "the quality of being …y" and so on. —Ruakh_TALK 01:35, 5 March 2008 (UTC)[reply]

The other place where the link is "hidden" is when the wikilink is at the beginning of a sentence or definition that starts with a capital letter, but the entry to be linked does not. Since Wiktionary is case-sensitive, and Wikipedia is not, this is a rather significant difference between the two projects. --EncycloPetey 02:35, 5 March 2008 (UTC) I thought I'd throw this in so that someone might take what folks say in this discussion and create a guide to wikilinks on Wiktionary.[reply]

And another thing: It wouldn't hurt to recommend that wikilinks to long lemma entries go to the appropriate Level-2 (for non-English terms) or Level-3 heading (Etymology or PoS). DCDuring TALK 03:27, 5 March 2008 (UTC)[reply]

I've started Wiktionary:Links. Please contribute to it. :-) —Ruakh_TALK 01:44, 6 March 2008 (UTC)[reply]
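For anyone unsure how the suffix shorthand in this thread behaves, here is a minimal Python sketch. It only illustrates the idea and is not MediaWiki's actual parser: the regular expression assumes the English rule that lowercase letters directly after the closing brackets are appended to the displayed text, and the names `resolve_links` and `page_title` are hypothetical. The second helper illustrates EncycloPetey's point that the first letter of a title matters on Wiktionary but is capitalised automatically on Wikipedia.

```python
import re

# Assumption: English-style link trail, i.e. lowercase ASCII letters that
# directly follow the closing brackets become part of the displayed text.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]([a-z]*)")


def resolve_links(wikitext):
    """Return (target, displayed_text) pairs for each wikilink found."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target, label, trail = match.group(1), match.group(2), match.group(3)
        # A piped link shows its label; otherwise the target itself is shown.
        shown = (label if label is not None else target) + trail
        links.append((target, shown))
    return links


def page_title(target, first_letter_case_sensitive=True):
    """Hypothetical helper: map a link target to the page title it reaches.

    With first_letter_case_sensitive=True (Wiktionary-like) the target is
    used as written; with False (Wikipedia-like) the first letter is
    upper-cased.
    """
    if first_letter_case_sensitive or not target:
        return target
    return target[0].upper() + target[1:]


print(resolve_links("He was [[run]]ning past the [[building]]s."))
print(page_title("swoon"), page_title("swoon", first_letter_case_sensitive=False))
```

So [[run]]ning displays "running" but leads to the entry for run, which is exactly the lemma-linking behaviour being debated above.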

Us versus Les Grenouilles

One of my favorite restaurants in Paris was frequented in the years following WWII by businessmen from England, who referred to the proprietor as Roger the Frog. The restaurant is long and narrow, with tables for 4 on both sides, the walls are covered with photographs and many other memorabilia from the 1940's and 50s. It is now run by Roger's daughters. On my first visit, I ordered ris de veau (sweetbreads) in passable French; the woman went to the other end and said to her sister "the American tourist doesn't know what he is ordering". As she passed my table again, I said, in English: "is the abuse part of the service?" ... it is on the Left Bank, and is called proudly: Roger Le Grenouille (26 rue Grands Augustins)

I set out to find out how and why the Wiktionnaire was 20K or so entries "ahead" of the en.wikt. We have ~720K entries counted in statistics, they have ~750K. By reading the XML dumps, I figured I could find out the 20 or 30K entries we didn't have, and then sample them to see what they were. But I got a surprise.

We have 712,823 (3 March) that would be counted in the statistics. They have 744,620. So there are 31,797 entries we are missing, plus some number more because we have a different set of entries. Right?

Turns out that they have 600,508 entries we don't have. See User:Robert Ullmann/en v fr.

A very large percentage are form-of entries, not "real" entries that someone has worked on. Given that some are also form-of entries here, I'd take a WAG that they have < 100K real entries ("base forms"), while we have 406,112 WT:STAT. Which would make a great deal of sense, given the number of contributors, etc. So I'd say forget about chasing fr.wikt, we are way ahead. (Reminds me of the cold war, in which the US built thirty thousand nuclear warheads out of fearmongering that the Soviet Union was getting ahead, when all the time it was desperately far behind ...) Robert Ullmann 14:50, 5 March 2008 (UTC)[reply]

Though I like to think this competition is a little friendlier than the stockpiling of nuclear weapons.. Widsith 16:50, 5 March 2008 (UTC)[reply]

No, much worse ... "Academic politics are so vicious precisely because the stakes are so small." ;-) (Attrib. Woodrow Wilson, modern form Wallace Sayre) Robert Ullmann 13:51, 6 March 2008 (UTC)[reply]

For more precise statistics (% of form-of entries), look at the Total line of the table, at fr:Wiktionnaire:Statistiques. But it seems obvious to me that, overall, yes, the Wiktionary is ahead, and it's not surprising. Lmaltier 17:49, 5 March 2008 (UTC)[reply]

Am I going to get arrested for treason if I added a grc etymology to an entry on fr? -Atelaes λάλει ἐμοί 17:53, 5 March 2008 (UTC)[reply]

No, but if you create an entry, we might have to dig out the old "giving aid and comfort to the enemy" clause. :-) (U.S. Const. Article III at (3): "Treason against the United States, shall consist only in levying war against them, or in adhering to their enemies, giving them aid and comfort.") Robert Ullmann 13:51, 6 March 2008 (UTC)[reply]

I've updated the analysis, screening out (roughly) the form-of entries. The results are more interesting. More words of interest that we will want to add. Robert Ullmann 13:51, 6 March 2008 (UTC)[reply]

I'd reckon that en: has about that many of non-:fr entries as :fr has non-:en entries. --Keene 01:16, 7 March 2008 (UTC)[reply]

There is some difference: 189,651 in fr, not in en.wikt; 262,239 in en, not in fr.wikt (keeping in mind that these numbers are based on a very rough screening of form-of entries). A lot of the ones fr has are French or Vietnamese, which makes sense.

Do note that all my us v them rhetoric in this section is in fun ;-) Robert Ullmann 14:12, 7 March 2008 (UTC)[reply]

This is great stuff. Would it be feasible to generate the entire list without a lot of extra hassle? (I'm particularly interested in the Korean entries they have that we don't). -- Visviva 04:24, 8 March 2008 (UTC)[reply]

Never mind, I figured out how to get that data. Fortunately nobody on wikt has started worrying about Korean form-of entries yet. -- Visviva 07:16, 8 March 2008 (UTC)[reply]
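For readers who want to try this kind of comparison themselves, here is a minimal Python sketch. It is not Robert Ullmann's actual script, which is not shown in this discussion: the function name `compare_titles` and the sample titles are made up for illustration, and extracting the titles from the XML dumps is left out entirely. It shows why the gap in totals (744,620 - 712,823 = 31,797) says little about how many entries each side lacks.

```python
def compare_titles(en_titles, fr_titles):
    """Split two collections of page titles into one-sided and shared sets."""
    en, fr = set(en_titles), set(fr_titles)
    return {
        "only_en": en - fr,  # titles en.wikt has and fr.wikt lacks
        "only_fr": fr - en,  # titles fr.wikt has and en.wikt lacks
        "shared": en & fr,   # titles present on both projects
    }


# Toy data standing in for the title lists read from the dumps.
en_sample = ["cat", "dog", "run", "swoon"]
fr_sample = ["chat", "chien", "run", "brumassé", "courir"]

result = compare_titles(en_sample, fr_sample)

# The gap in totals equals the gap between the two one-sided differences,
# so a small gap in totals can hide two very large one-sided differences.
total_gap = len(fr_sample) - len(en_sample)
assert total_gap == len(result["only_fr"]) - len(result["only_en"])

print(sorted(result["only_fr"]))
print(sorted(result["only_en"]))
```

In the toy data the totals differ by one entry, yet four titles are missing on one side and three on the other, which mirrors the surprise described at the start of this thread.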

Wiktionary:What Wiktionary is not

Should the Wiktionary:What Wiktionary is not page be made policy? A vote could be started if so, but I don't think it's a particularly pressing task. --Keene 15:48, 6 March 2008 (UTC)[reply]

No, I dislike the formal policy status, it should be unnecessary - particularly for pages like this which exist only because they exist on Wikipedia. See WT:NPOV for a particular example of this problem; that page may not be modified without a vote, yet its contents at the moment simply do not apply to Wiktionary. Conrad.Irwin 16:39, 6 March 2008 (UTC)[reply]

Well, WT:NPOV has been modified by vote, to reflect the userbox ban. It could (and should) be modified further, to reflect Wiktionary's unique characteristics. Ultimately, however, neutrality is as crucial here as it is on Wikipedia. We just happen to have a mercifully low level of content disputes, so haven't needed to develop the sort of robust dispute-resolution process which would require a robust NPOV policy. -- Visviva 03:35, 8 March 2008 (UTC)[reply]

Sure, this should be a policy if anything is. However, it's no big deal at this point. In practice the real boundary-defining page is WT:CFI anyway. -- Visviva 03:35, 8 March 2008 (UTC)[reply]

No, it should not. I can just see it: make WT:WIN and WT:CFI both policy, and then have disputes over slight nuances and shades that seem to contradict one another, until we wind up in the end with both pages essentially quoting one another with no difference in wording, whereupon someone will suggest merging them into WT:CFI and getting rid of the duplicate. Then someone will suggest having a more user-friendly explanation of what Wiktionary is not, and we'll start all over again. (That was with a touch of humor, but still.)—msh210℠ 18:43, 13 March 2008 (UTC)[reply]

The inclusionist/deletionist divide

Hi all, one of Wikipedia's great problems is the division between those who wish to include everything and those who wish to include only useful information. It seems to me that the same divide is growing on Wiktionary and I would like to discuss ways in which some of the angst it causes on Wikipedia can be avoided here. Firstly, I am an inclusionist and I can see no reason whatever for deleting anything. (Well except for vandalism and nonce words perhaps).

There are two cases that I feel we can do better than the current situation, and I would like to propose a solution: For proper nouns with no Idiomatic meaning (and thus should fail CFI), we can replace the page with a template that looks similar to MediaWiki:Noarticletext however contains a link to the Wikipedia article. The same can happen for words which do not meet our CFI because of the Independence criteria, however do merit an entry in an appendix - with the template linking to the correct page.

I can see some objections to this, "it turns red links blue", "it messes up our statistics", "it is misleading for the bots/humans", however I still feel the useful service we would be providing to our readers counteracts this. Thoughts? Conrad.Irwin 17:02, 6 March 2008 (UTC)[reply]

I don't know which one I am (I think I'm an inclusionist). My criterion is simple - does the word / term / combination of letters and symbols etc. have a meaning in the real world. If yes, keep it, if not delete it. SemperBlotto 17:07, 6 March 2008 (UTC)[reply]

If we need to create a second-class status to include more entries of the kind that fill the specialized dictionaries and references that fill the shelves of bookstores and libraries, then we should. I can't imagine that there would be insuperable actual technical difficulties although there may be other insuperable difficulties. DCDuring TALK 17:11, 6 March 2008 (UTC)[reply]

As a hesitant inclusionist, I agree in parts with both sides. SemperBlotto's view of "does the word / term / combination of letters and symbols etc. have a meaning in the real world" is very ideological, and I have followed that notion before. But I think our RFV system might be a little out-dated. 3 decent Google Books hits was reasonable, but today seems too inclusionist-y. I'm not sure how this could be improved, but it is still lacking something. Much like maybe all these thousands of French verb forms - I'm no expert, but am sure that some declensions would be unfindable on b.g.c.--Keene 01:14, 7 March 2008 (UTC)[reply]

Are you saying that we should be a little tighter on lemmas and a bit looser on forms-of? DCDuring TALK 02:52, 7 March 2008 (UTC)[reply]

Are there terms that are passing RfV that you think should not be included? -- Visviva 05:59, 7 March 2008 (UTC)[reply]

Personally, I'm an inclusionist with a strong we-have-rules-we've-agreed-on-so-let's-follow-them bent (-slash- a strong we-have-common-practices-so-let's-codify-them bent) and a weak our-readers-already-don't-understand-the-difference-between-an-encyclopedia-and-a-dictionary,-so-why-blur-the-line? bent. Overall, this disposes me favorably toward your solution. Regarding the possible objections you mention: (1) If we don't want an entry, then there's no particular need for links to it to be red edit-links (there probably shouldn't be any links to it, period, but between red and blue links there's no advantage one way or the other). (2) As long as we don't include [[ in these pages, I believe it won't affect our statistics. (3) Humans can learn, bots can learn, mirrors can learn, everyone will be happy. —Ruakh_TALK 03:17, 7 March 2008 (UTC)[reply]

Ruakh's rationale sounds good to me. In a perfect world, this {{nothing to see here}} template would also accept unwikified translations as arguments, something along the lines of wikispecies:Template:VN; that would I think dispense with the primary argument made in favor of keeping these entries. -- Visviva 05:59, 7 March 2008 (UTC)[reply]

I am an unabashed deletionist (and proud of it). I think this is a reasonable idea, but I would prefer it if the WM message could do a search and make a larger link if found. However, if this is not feasible, I am less concerned by our numbers than by our usability. -Atelaes λάλει ἐμοί 06:22, 7 March 2008 (UTC)[reply]

I have created an entry at Isaac Newton to show off the new {{only wikipedia}}, I am not sure what the best way of adding translations to it will be, both in terms of formatting and in terms of coding, so if someone else wants to experiment with that then please do. Conrad.Irwin 11:46, 7 March 2008 (UTC)[reply]

That seems like a good idea. It might be better though to have a generic {{otherprojectonly}} template with a parameter for the relevant project. Admittedly I can't think of any occasion it would be used for a project other than Wikipedia, but then I'm not familiar with most Wikimedia projects. Thryduulf 14:23, 7 March 2008 (UTC)[reply]

I wonder whether this would be a good way to handle two-part species names: referral to Wikispecies. That would be an argument for genericization or possibly for a parallel template. Wikispecies doesn't have etymology and we have some of the individual words (from Classical through Medieval Latin, less in New Latin). Although perhaps directly linking our redlinks in this area would be even better. DCDuring TALK 16:03, 7 March 2008 (UTC)[reply]

I would envisage there being a few of these (probably all using one template behind the scenes), one in particular could go from main namespace to the appendices, and I see no reason why we shouldn't also link to other projects if necessary. We have to be careful to ensure that this is only used for cases when Wiktionary should not have an entry and not used as a quick substitute for getting an entry written. Conrad.Irwin 21:10, 7 March 2008 (UTC)[reply]

I'm inclined to agree that binomial (and trinomial) scientific names should be on Wikispecies and not here, as there is not much we can say about them. However, this needs more discussion, as we have welcomed this sort of material heretofore. -- Visviva 03:28, 8 March 2008 (UTC)[reply]

The template looks great. -- Visviva 03:28, 8 March 2008 (UTC)[reply]

Not bad, now just make it multilingual. Yeah, I'm all for this idea, though I want to be sure that it's still possible to create an entry when the template needs to be overridden. For instance, Google Books has 91 hits for "the Isaac Newton of" as an exact phrase. I think the way the template is set up now (for English Wikipedia) already addresses my concern. DAVilla 22:48, 8 March 2008 (UTC)[reply]

Yes, the whole point of the template is for cases where this is not the case. If Wiktionary should have the word then it shouldn't have that template. I am not sure why we would want to link to other language Wikipedias, given that we should be providing information for English readers here. See 毛泽东 for how I think that situation should be treated. Conrad.Irwin 23:36, 8 March 2008 (UTC)[reply]

Obviously we should give top billing to the English Wikipedia's article if it has one, but I really don't see why we'd avoid linking to other languages' Wikipedias. BTW, a technical question: is there any way we can get these non-entries to not be indexed by Google? Currently we have <meta name="robots" content="noindex,nofollow" /> on redlinks; we probably want something like that on these entries as well, or at least the noindex part. (Actually, I'm not sure why we have nofollow on redlinks, either, but what do I know?) —Ruakh_TALK 00:24, 9 March 2008 (UTC)[reply]
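On the technical question above, one way to check which robots directives a rendered page carries is sketched below in Python, using only the standard library. The class and function names are hypothetical, the sample markup is simply the tag quoted in the comment above, and nothing here says how MediaWiki itself decides to emit the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


sample = '<head><meta name="robots" content="noindex,nofollow" /></head>'
found = robots_directives(sample)
print("noindex" in found, "nofollow" in found)
```

A check like this could be run against one of the template-only pages and against a redlink page to see whether both carry the noindex directive.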

CFI for languages

So, our mission statement is "Every word in every language!" But, at the same time, we don't include every language. For example, we don't include Ionic Greek. However, we do include the Ionic dialect of Ancient Greek. But, how do we divide our languages? Everyone who has studied linguistics knows that a language is only a well marked dialect. How do we decide what's a language and what's a dialect or period? With the more common languages it's generally fairly clear. However, for less common languages and extinct languages, where to draw the line is difficult. To a certain extent, it's not that important. If we didn't include Ancient Greek, but only included Greek, then all the words would still get covered, I'd simply have to include {{obsolete}} in nearly every term I enter. :) However, dividing between Greek and Ancient Greek works well, as it splits words up into groupings which are convenient for both users and editors. So far, the standard we have been following, for the most part, is SIL. SIL does a fantastic job of splitting up the world's languages and it gives us a bit of official credence in our divisions when we follow them. Most importantly, it prevents a lot of waffling and unending arguments. While it works to have Greek and Ancient Greek, and it would probably work to have just Greek, it would absolutely not work to have both randomly from day to day, or some editors doing one thing and some editors another. That would be an awful mess. However, SIL's not perfect. A few editors of obscure languages have noted a few deficiencies in SIL's groupings. So, here's what I'm thinking: I propose that we retain SIL as the standard which we use for making divisions between languages. If a language has a SIL 693-3 code, it gets approved for its own L2 header and for general use on Wiktionary. However, people are free to propose amendments to SIL's decisions.
If someone thinks a language which SIL does not recognize should exist on Wiktionary, or perhaps what SIL considers two languages should be treated as one on Wiktionary, they can contest SIL's grouping. So, they start a BP topic on the issue and con a bunch of editors into buying their story. Since languages are the apical sorting method on Wiktionary, I think this is important enough that every single modification should be officiated by a vote, if the BP discussion goes well. Obviously everyone will be convinced in different ways, but I think, in general, an editor should have to write more than five entries to justify changing up the format of Wiktionary like that. What does everyone else think? -Atelaes λάλει ἐμοί 03:05, 8 March 2008 (UTC)[reply]

A minor point, but I think it is important to distinguish the SIL (Summer Institute of Linguistics, originally a missionary-training outfit) from the ISO (International Standards Organisation). As far as I'm aware, though, the SIL version is a faithful reflection of the ISO 693-3 standard. -- Visviva 03:20, 8 March 2008 (UTC)[reply]

Sorry, but that's not correct. SIL International (formerly known as the Summer Institute of Linguistics) is the official registration authority for ISO 639-3; I don't know whether ISO (English name: the International Organization for Standardization) maintains its own copy of SIL's list. —Ruakh_TALK 13:39, 8 March 2008 (UTC)[reply]

Um, no. ISO has the official list, IS 639-3. SIL is the designated registration authority for requests for new codes and changes. Proposals by SIL have to be approved by ISO JTC1/SC2/WG2 and then are published by ISO. (Yes, the 3-letter code list was originally developed by SIL, contributed to ISO, updated, and now SIL uses the ISO codes in the Ethnologue, etc. That's why you'll find different coding in the 14th and 15th editions of the Ethnologue.) Visviva's comment is correct. We should generally just refer to the ISO codes, rather than SIL. Robert Ullmann 14:40, 8 March 2008 (UTC)[reply]

According to the ISO Web site, "ISO 639-3:2007 provides a code, published by the Registration Authority of ISO 639-3, consisting of language code elements comprising three-letter language identifiers for the representation of languages." That is, according to ISO, it's SIL that publishes the official list. Your statement that SIL proposals have to be approved by ISO JTC1/SC2/WG2 is interesting — I can't find evidence of that, but will take your word for it — but doesn't seem to be relevant. SIL is the public face of ISO 639-3, responsible both for handling change requests from the public, and for publishing the standard. If ISO keeps its hand in the process, bully for them, but for us they're relevant only in that they gave their imprimatur to SIL. —Ruakh_TALK 15:12, 8 March 2008 (UTC)[reply]

This is very interesting, thanks. I had not looked into the matter properly, and had not been aware of the special relationship between SIL and ISO 693-3. That said, I still object to referring to these as "SIL 693-3 codes" as the OP did (which was all that I meant to object to in my response above). -- Visviva 15:20, 8 March 2008 (UTC)[reply]

Yes, I think we should be calling them ISO 639 codes, not "SIL codes" (SIL itself calls them ISO 639-3). (The document in question says "International Organization for Standardization" (and "Organisation internationale de normalisation" :-) on the cover, and the copyright is ISO, not SIL.) As Visviva notes, that is the only point being made here. Robert Ullmann 15:40, 8 March 2008 (UTC)[reply]

Oh, yes, agreed, sorry. The name of the standard is ISO 639-3 (note the number, BTW), no matter who is in charge of it. —Ruakh_TALK 20:24, 8 March 2008 (UTC)[reply]

I think it is important that we be open to including as many languages (natural languages) as we can fit, meaning all of them. If a language is only spoken by a tribe of 373 natives in the remotest jungles of Brazil, well, where better to record their lexicon? I am absolutely not an inclusionist in general, but I like the idea of recording 'all words in all languages' quite a lot, (when it comes to real words), so let's get them in here. The sticky wicket comes along when you talk about Level 2 headers, and what should and should not be included in them. I don't think we should go exclusively by the ISO list, but I also don't think we should let anyone who feels like it decide that a language deserves a L2. Perhaps we can get together a few of our true linguists (of whom I am not one) and get some kind of Language Committee or something, a group of people who are willing to put in the legwork and document reasoning behind calling a language a language, and then saying 'Yes, use an L2' or 'No, Appendix only'. I think that a 'vote' is a bad idea, it doesn't represent anything close to objective results, I know because I have voted before. I think getting qualified people to make the recommendations is a better way to go about these sorts of things. - [The]DaveRoss 03:35, 8 March 2008 (UTC)[reply]

Excellent idea. Some of the discussions on the inclusion of a particular language end up in WT:RFDO, and it would be nice centralize them somewhere formally. --Ivan Štambuk 10:53, 8 March 2008 (UTC)[reply]

I agree, and would like to take this opportunity to propose officially that "Hebrew" (he/heb) and "Ancient Hebrew" (hbo) be taken as one language, "Hebrew". The two languages are not without their differences, but most Ancient Hebrew words are still considered correct (if odd) in Modern Hebrew, and quotes from Ancient Hebrew texts are still widespread in Modern Hebrew, much like quotes from Early Modern English (Shakespeare, the KJV, etc.) are still widespread in English today. (This has previously been discussed, and appeared to have consensus; I re–bring it up now only because you're proposing an official mechanism for deciding these things, and I'm not a huge fan of implicit grandfathering.) —Ruakh_TALK 13:39, 8 March 2008 (UTC)[reply]

While I have absolutely no problem with this proposal, I request it not take place in the middle of this particular thread. This is meant to be a discussion of whether we can deviate from 693-3, and if so, how. If we throw every language proposal here it will be untenable. Besides that, we really should figure out what the policy is for these changes before we make them. -Atelaes λάλει ἐμοί 19:01, 8 March 2008 (UTC)[reply]

O.K., sorry. In my own defense, the SIL-vs.-ISO discussion above takes up a lot more space. :-P —Ruakh_TALK 20:24, 8 March 2008 (UTC)[reply]

Indeed it does, but it's almost sort of vaguely related to the topic at hand. More importantly, I figured it would fizzle out as it's such a minute detail. ISO 639 is the official standard. SIL is the organization charged with producing that standard and one of the easiest places to find out what the standard is. Whatever. -Atelaes λάλει ἐμοί 20:38, 8 March 2008 (UTC)[reply]

I run into problems with the differences between what the "WMF language committee" and ISO use. For example, Min Nan is "nan", but WMF created the Min Nan wikipedia using "zh-min-nan". I have been trying (not terribly hard ;-) to find out who the committee is, and possibly get on it. There are definitely codes we need to add, probably mostly as subcodes (fiu-vro for Template:fiu-vro), but they should get WMF-wide coordination if possible. At least others should know what we are doing. For languages that need separate codes, we should find out if there is a proposed -3 or -4/-5 code; if not we should be contributing our findings to ISO TC37/SC2 via SIL. Robert Ullmann 15:40, 8 March 2008 (UTC)[reply]

That would be the m:Language subcommittee. It's worth noting that the language proposal policy specifically requires that new proposals have an ISO 639-x code, but this seems to be honored more in the breach than the observance. -- Visviva 13:08, 9 March 2008 (UTC)[reply]

The proposal seems good, as I understand it; i.e. that we should follow ISO 639-3 except when it is in the project's interests to do otherwise, and that such exceptions should only be approved upon thorough community-wide deliberation (WT:VOTE or equivalent). -- Visviva 13:08, 9 March 2008 (UTC)[reply]

As I think about it, TheDaveRoss's idea seems better and better. As evidenced by two of the following threads, this issue is one which will appear over and over, with increased frequency as time goes on. Inasmuch as overall community consensus is the best way to decide things when possible, I just don't know if it's feasible for the community to make this decision every time it comes up. Most of our editors simply don't have the time to read a ten page Beer Parlour discussion and weigh all the political, historical, and linguistic controversies present. I think it will work much better if we pick two or three solid, well-respected editors with a good track record and give them dictatorial powers on the issue. I know everyone cringes at the very mention of the word "dictatorial", but we've already tried this method (with WotD), and it's turned out pretty damned well, as far as I'm concerned. -Atelaes λάλει ἐμοί 21:23, 11 March 2008 (UTC)[reply]

I don't know about these constructed languages, as certainly some such as Esperanto may be legitimate as a native tongue, while others are so obscure that not a single word in the language would pass CFI independently for three authors, so there is definitely a gray area for those. But forgetting about artificial languages for a moment, at least I can be certain that if it flows out of someone's mouth or pen and conveys a message that is understood by others then it is a word in some natural language, though I may not know which one. So in some sense I see the question as to which language a word belongs to be completely independent of the question of whether it belongs on Wiktionary. If it's spoken or written and it can be attested then it belongs in a project that documents every word in every language, and if there is controversy over which language it belongs to then the heading may change and change back and change again, but the substance will remain. 71.129.48.8 06:46, 13 March 2008 (UTC)[reply]

Category:Old Korean language

This category currently contains (only) Hangul reconstructions. This is absurd, since even the most generous definitions of Old Korean define it as Korean in use before the introduction of Hangul. All extant OKO texts (which are very few) use Chinese and/or gugyeol characters. I am seeking clearance to move all entries herein to their attested forms, if any, and delete the redirects. No objection to Appendix:Reconstructed pronunciation of Old Korean if anyone wishes to create it. -- Visviva 09:38, 8 March 2008 (UTC)[reply]

That seems reasonable to me. Thank you. I was wondering what I was going to do with that. You've saved me a great deal of stress. -Atelaes λάλει ἐμοί 09:39, 8 March 2008 (UTC)[reply]

Grammar gurus in the house? See entry for "hence"

Hey, I'm not a grammar guru, but I think there's a comma splice in the examples for hence. I left a note on that page's talk, but assume it will be ignored. 163.28.49.4 12:45, 9 March 2008 (UTC)[reply]

This belongs in the Wiktionary:Tea Room. I have removed the comma, though do note that you can fix things like this yourself using the edit button at the top of an entry. Conrad.Irwin 12:52, 9 March 2008 (UTC)[reply]

Wasn't sure if I was right, so didn't edit.. plus didn't know the diff b/w a tea room and a beer parlour. :-) Thanks! 163.28.49.6 13:01, 9 March 2008 (UTC)[reply]

Naming of categories of non-English proverbs and idioms

Currently, there is a mixed naming of categories for non-English proverbs and idioms, like:

These seem to stem from two different sources of modeling and imitation:

Category:German nouns - categories for lexical categories, like nouns, adjectives, or interjections.
Category:cs:Trees - topical categories

Depending on whether idioms and proverbs are considered more like the items in the first group or the second group, the naming of proverb and idiom categories could be chosen, fixed in policy, and then the categories could be renamed. Any opinion on this from the policy makers? --Daniel Polansky 08:47, 10 March 2008 (UTC)[reply]

Actually, in the case of Mandarin, we have both (Category:Mandarin proverbs, Category:zh:Proverbs, Category:Mandarin idioms, Category:zh:Idioms)! There's no reason that it has to be an "either/or" choice. Furthermore, provided that the contributor uses the inflection templates recommended by WT:AC (Template:cmn-proverb and Template:cmn-idiom), it involves no more work for the contributor than a single category. Incidentally, the benefit of having both categories is that you can use it as an opportunity to provide different sort options (as well as options for which entries to include in which category). -- A-cai 10:06, 10 March 2008 (UTC)[reply]

I see. I did not notice there were both types of categories for Mandarin. I have now noticed that there is also the category Category:zh:Nouns, and that Category:zh:Proverbs is further split into subcategories while Category:Mandarin proverbs is not.

I see the benefit of less confusion, for me anyway, if there is only one type of naming scheme for categories. As regards the downsides of one type of naming scheme, unfortunately, I do not understand what you mean by "different sort options"; do you think you could explain that to me?

Is there any other language using both naming schemes, or is this just Mandarin?

Do you expect that all the languages should have both categories, like having Category:cs:Proverbs and Category:Czech proverbs? --Daniel Polansky 12:35, 10 March 2008 (UTC)[reply]

It seems to me that, for other languages but Mandarin, having a consistent naming scheme would be valuable. --Daniel Polansky 12:35, 10 March 2008 (UTC)[reply]

We do have a consistent scheme, except for Chinese languages and a handful of categories that have not yet been cleaned up to standards. The two forms of category name (one for parts of speech, the other for topics) is a deliberate and consistent distinction. --EncycloPetey 17:01, 10 March 2008 (UTC)[reply]

So is it correct that there should be Category:German proverbs and not Category:de:Proverbs? And can I move Category:fr:Proverbs to Category:French proverbs? Connel seems to have a different view, judging from the tags he added to Category:French proverbs and Category:German proverbs. --Daniel Polansky 18:35, 10 March 2008 (UTC)[reply]

See the ongoing discussion at Wiktionary:Requests for deletion/Others#Category:French proverbs. --EncycloPetey 19:24, 10 March 2008 (UTC)[reply]

Norwegian language classification

I have been engaging in conversations with a couple of users about what to do with Norwegian, and it has gotten to the point where I figure it would be good to have the larger community's input into the subject. The issue centers around Bokmal and Nynorsk. The introductory section of the Wikipedia article on Norwegian sums it up rather nicely. Please do the background reading there if you are not already acquainted with the subject. The two conversations can be found at User talk:EivindJ#Norwegian Questions and User talk:Robert Ullmann#Norwegian language templates. So, there are a number of different ways we can sort between these types. I am advocating following the Norwegian Wiktionary. What this would entail is having the L2 header "Norwegian" for all words which are used and spelled the same in both Bokmal and Nynorsk. Any word which only exists in (or has a usage unique to) Bokmal receives the L2 header "Norwegian (Bokmal)" and any word which only exists in (or has a usage unique to) Nynorsk receives the header "Norwegian (Nynorsk)". I am, as yet, unsure of how the categorization would work on that. Another option is to put all Norwegian under the single L2 "Norwegian", and then use Nynorsk and Bokmal as context tags (i.e. treating Bokmal and Nynorsk as dialects, instead of languages). A third option which has been advocated is to equate Bokmal with Norwegian (i.e. Bokmal terms go under the header "Norwegian") and treat Nynorsk as something else entirely, going under the header "Nynorsk." Thoughts? -Atelaes λάλει ἐμοί 00:22, 11 March 2008 (UTC)[reply]

Prior to having done all the background reading, my initial thoughts are the simplest to explain and the easiest in terms of categorisation would be either your second option (treating Nynorsk and Bokmal as effectively dialects); or one based on your first option but where the header "Norwegian" is not used, where words are the same in Nynorsk and Bokmal the page would have two L2 sections "Norwegian (Bokmal)" and "Norwegian (Nynorsk)". Your first option would follow this format if there are homographs?

In terms of translations into Norwegian of English words, I'd recommend using the same format as Serbian does for the different scripts. Thryduulf 01:08, 11 March 2008 (UTC)[reply]

Yes, the more I think about it, the better I think it is to simply use "Norwegian" and then use context tags. This solves the problem of sorting (as words would go into [Category:Norwegian POS's], as well as [Category:Nynorsk], etc.). Additionally, this provides an intuitive format for any other dialects we want to include in the future. -Atelaes λάλει ἐμοί 01:29, 11 March 2008 (UTC)[reply]

I'd prefer either (1) to separate them completely, with "Norwegian Nynorsk" and "Norwegian Bokmal" being valid L2 headers, and "Norwegian" alone being invalid, or (2) to treat them as a single language, "Norwegian", with two regional variants, like we do with U.S. and U.K. English. I don't like the idea of a half-and-half approach where we'd have three L2 headers for two forms of arguably one language; and I'm only O.K. with "Norwegian" vs. "Nynorsk" if we can also give U.K. English the boot — say, "English" vs. "British". Or maybe "Limey". ;-) —Ruakh_TALK 01:33, 11 March 2008 (UTC)[reply]

Remember, English is English: it is "American" that would be given the boot. English and USpeak. ;-) Robert Ullmann 15:56, 11 March 2008 (UTC)[reply]

Actually, in this situation British English resembles Nynorsk much more. The reason why some consider Bokmål more "properly" Norwegian is simply because it has a distinctly larger population. And if we're talking about population, American English definitely has British English beat. -Atelaes λάλει ἐμοί 21:11, 11 March 2008 (UTC)[reply]

Tee-hee: everyone says this as if it is gospel, because it is so "obvious". The actual numbers show Commonwealth English over American about 3:1. Sure, people hear a lot of American on TV, but if you write "color" in your schoolwork you will lose marks for spelling ;-). Robert Ullmann 12:36, 13 March 2008 (UTC)[reply]

To refer to Bokmål and Nynorsk as dialects is a misinterpretation, I am afraid. It is wrong to think about these two as dialects, or simply as if they were Norway's answer to British and American English. They are two equivalent languages, by law, and differ almost as much as Norwegian and Danish in their written form (not verbally). When Robert Ullmann utters that "The only issue is a few words that can't be in any way considered Nynorsk" he is sadly mistaken. There are heaps of words that can in no way be considered Bokmål, and the other way around. It is not for fun that we have one Nynorsk Wikipedia and one Bokmål/Riksmål Wikipedia. I reckon, and some other no.-admins with me, that "Norwegian (Bokmål)" and "Norwegian (Nynorsk)" are most correct, but I will give notice to the Norwegian Wikipedias' fellowships and ask them to share their thoughts. Thanks (: --EivindJ 07:27, 11 March 2008 (UTC)[reply]

Using "Norsk" and "Nynorsk" is wrong, as it gives the impression that nynorsk is less "norsk" than bokmål is. I am a bokmål user myself, but I would dislike seeing nynorsk discriminated in such a way. I think that either you should use two different L2 headers, or you should use the same method as with UK/US English (though i think the differences between bokmål and nynorsk are both bigger and more numerous than the differences between UK English and US English). - Soulkeeper 07:57, 11 March 2008 (UTC)[reply]

A symmetrical solution has to be used. The option “Norwegian (bokmål)” vs. “Norwegian (nynorsk)” is probably the best one. As for the two often having similar forms, that goes almost equally much for the two of them vs. Swedish (esp. nynorsk) and Danish (esp. bokmål). -- Olve Utne 12:09, 11 March 2008 (UTC)

Look, we have to have Norwegian as Norwegian. (Else everyone will be saying forever "I can't find Norwegian"). We then have the same case as with other languages, the most common/standard/default language (from an English POV; this is the English wikt ;-) gets the name. Then we have words that people will insist absolutely must be identified as only Bokmål (which we call Norwegian ...) and others that are Nynorsk.

ISO codes 3 different languages:

no = nor = Norwegian
nb = nob = Norwegian Bokmål
nn = nno = Norwegian Nynorsk

(Technically, we have a few constraints that must be observed: the names must match, e.g. no = nor, and the names must not contain parens or be partly linked. But that isn't a problem.)

I had/have it set up as:

no = nor = Norwegian
nb = nob = Norwegian Bokmål
nn = nno = Nynorsk

Note there is only one difference from ISO (and the Norwegian Government recommendation for the names of Bokmål and Nynorsk). I don't see a problem with "Nynorsk" -> "Norwegian Nynorsk" The problem will be keeping the majority of the language in Norwegian where people expect it to be. See euro. Robert Ullmann 15:56, 11 March 2008 (UTC)[reply]
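The constraints mentioned above (each two-letter/three-letter code pair sharing one name, and names containing no parentheses or partial links) could be checked mechanically. The following is only a minimal sketch: the table and function names are hypothetical, the names are taken from the ISO list quoted above, and it is not how any actual Wiktionary bot or template does this.

```python
# Hypothetical code-to-name table, using the ISO names quoted in the discussion.
LANGUAGE_NAMES = {
    "no": "Norwegian", "nor": "Norwegian",
    "nb": "Norwegian Bokmål", "nob": "Norwegian Bokmål",
    "nn": "Norwegian Nynorsk", "nno": "Norwegian Nynorsk",
}

# Two-letter and three-letter codes that must resolve to the same name.
CODE_PAIRS = [("no", "nor"), ("nb", "nob"), ("nn", "nno")]


def check_names(names, pairs):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for two, three in pairs:
        if names[two] != names[three]:
            problems.append(f"{two}/{three}: names differ")
    for code, name in names.items():
        if "(" in name or ")" in name:
            problems.append(f"{code}: name contains parentheses")
        if "[[" in name or "]]" in name:
            problems.append(f"{code}: name contains a wikilink")
    return problems
```

Calling `check_names(LANGUAGE_NAMES, CODE_PAIRS)` on the table above reports no problems, whereas a name such as "Norwegian (Nynorsk)" for `nn` alone would be flagged both for the parentheses and for no longer matching `nno`.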

Even if the translations are the same, keep the two different headers. One could always put a bigger header atop called Norwegian, then have two sub-headers: one for bokmål and one for nynorsk. To suddenly have one header, whereas there usually are two, is going to confuse more than anything else. --Harald Khan Ճ 16:16, 11 March 2008 (UTC)[reply]

no, we don't use subheads like that. (See WT:ELE, header levels are very significant.) And there won't be "suddenly" one header; there will be almost always only one header (Norwegian), and only occasionally two. Note that the case at euro is instructive, it occurs only when the word is the same and the inflection different (yes, I know this occurs with some frequency), it could be handled right in the inflection line, and save having two sections. Robert Ullmann 16:19, 11 March 2008 (UTC)[reply]

OK. Still there should be two different headers: Norwegian (Bokmål) and Norwegian (Nynorsk). No official written language is called Norwegian. It is discriminating to hint that the inflection of Bokmål is more Norwegian than that of Nynorsk or vice versa. --Harald Khan Ճ 17:28, 11 March 2008 (UTC)[reply]

The WP article is not clear on whether the languages are mutually intelligible. Are they? If so, I see no reason at all to split them here on enwikt. If not, then they should be different L2 sections even where the contents of those sections would coincide. What to call them is then another question, and I don't like any of the solutions, to be honest.—msh210℠ 17:36, 11 March 2008 (UTC)[reply]

All four languages, Danish, Swedish, Bokmål and Nynorsk, are mutually intelligible. --EivindJ 17:49, 11 March 2008 (UTC)[reply]

While I don't have much background in linguistics, I thought that that was the criterion on which linguists decided to consider dialects languages. Am I wrong? Or are Danish, Swedish, Norwegian, and Norwegian considered one language by linguists? Or is this an exception for some reason?—msh210℠ 17:53, 11 March 2008 (UTC)[reply]

I don't have any background in linguistics either, but I strongly doubt that they are considered as one language. --EivindJ 17:59, 11 March 2008 (UTC)[reply]

For more on this, see w:Dialect, w:Dialect continuum#Scandinavian_languages, and w:Ausbausprache - Abstandsprache - Dachsprache.—msh210℠ 18:58, 11 March 2008 (UTC)[reply]

Note that there are already a number of languages that are separated here, and that are mutually intelligible and naturally form a dialect continuum, like Croatian/Serbian, Hindi/Urdu, Moldovan/Romanian, Macedonian/Bulgarian... So this kind of separation for Norwegian wouldn't be a precedent, but a continuation of a common practice. --Ivan Štambuk 19:11, 11 March 2008 (UTC)[reply]

Mutual intelligibility has nothing to do with a definition of a 'language'. Natural languages are not like biological species, in that there is no hard line between them that prohibits mixing at the DNA level. Such an analogy is a grossly misleading simplification. They are exclusively defined by national committees, and in this particular case NLC recommends the terms "Norwegian Bokmål" and "Norwegian Nynorsk" respectively. There's no reason to force politically incorrect terms into L2 section names when there are such clear alternatives that all native speakers agree on. nor/no itself is a macrolanguage code, not an individual language, and these normally don't get included at all. --Ivan Štambuk 18:48, 11 March 2008 (UTC)[reply]

Actually, they are very much like biological species. Because, you see, some species can (and do) interbreed, while other species form a continuum in which some members can interbreed and others cannot. The various species of oaks regularly form fertile offspring, and many orchid genera can form hybrids with enough regularity that some of these hybrids have their own names. There is a ring species of birds in the northern hemisphere around the pole, where individuals at either end of the bird's range cannot interbreed, but interbreeding happens everywhere in between over short distances. It is a myth that biological species cannot ever interbreed, and, in most cases, there is no experimental data verifying that two species cannot interbreed. So, the interbreeding of species is very like the intelligibility of language, that is: thoroughly muddled. --EncycloPetey 01:16, 12 March 2008 (UTC)[reply]

Nice link on ring species, read about it in Richard Dawkins - 'A Devil's Chaplain' (ingeniously written book :). However, an obstacle that regulates interbreeding of species has been discovered recently at the DNA level. Escherichia coli and Salmonella typhimurium, which evolution separated > 150 million years ago and which have ~20% mismatch in DNA, have proven to be compatible under some circumstances. This means that the barrier between species is very discrete, and this does not occur in natural languages, in which lexemes are tossed in all directions and adapted. --Ivan Štambuk 19:14, 12 March 2008 (UTC)[reply]

A famous geneticist once remarked "If it's true for E. coli then it's true for elephants." Since that time, the statement has proved false many times. There are many ways in which the genetics of bacteria and elephants are very, very different. There is not just one single factor that regulates interbreeding of species, there are many, many different mechanisms that can and do come into play. Consider that species with separate males and females automatically limit the possibilities of pairings that will result in fertile offspring, even if the DNA itself is 100% compatible. In some species of plants and fungi, there are single allele mating cofactors that function in the same way to control mating type. And even when species cannot mate themselves, there are viruses that act as agents transferring DNA laterally from one species to another, just as happens with languages. Language has not been around so very long, compared with the age of biological species, so intercompatibility and hybridization is to be expected. --EncycloPetey 03:33, 13 March 2008 (UTC)[reply]

The solution used in euro is not good since it strongly implies that Bokmål is Norwegian and Nynorsk is something other than Norwegian. I don't understand why it is desirable that en.wikt should imply something like that just because some users here prefer the one language over the other when it comes to what is Norwegian. There is no doubt that the two of them are independent languages (a quick call to the Norwegian Language Council should prove me right) ... neither of the two is closer to spoken Norwegian (almost everyone speaks something in between). I also do not understand why users who don't have proper acquaintance with both of the languages can utter statements that imply that they are extremely similar and that partition is only necessary in a few cases. With all respect, it isn't out of national sentiment that we argue for having a clear distinction between the two. --EivindJ 18:15, 11 March 2008 (UTC)[reply]

Agreed. Robert's proposed format is untenable. Since all of our Norwegian friends seem to insist that we must make a language level distinction between Bokmål and Nynorsk, that seems the way we must go. However, this means a few things: "Norwegian" as a language header is out. All Norwegian entries must go under "Norwegian Bokmål" or "Norwegian Nynorsk." However, we must keep the word Norwegian in these headers, because, as Robert rightfully notes, otherwise people will be complaining that they can't find Norwegian. Certainly we will have a lot of duplication (i.e. a lot of entries with both "Norwegian Bokmål" and "Norwegian Nynorsk" headers), but we already have a lot of duplication with Scandinavian languages anyway. This also means that every single entry which currently uses the L2 "Norwegian" (according to WT:STATS, there are about 3,000 of them) must be changed. Obviously we should wait to get an official consensus (a vote is really required), but that looks to be where we're heading. -Atelaes λάλει ἐμοί 20:03, 11 March 2008 (UTC)[reply]

So you mean it will not be sufficient to label all words that are equal in spelling and meaning in both languages as "Norwegian", and only use the specific headers for the words that differ between the one and the other? --EivindJ 20:43, 11 March 2008 (UTC)[reply]

The more I think about it, the more I have to say no, it isn't possible to do that. It's simply not the way we do things here. We don't have varying levels within our language headers. We can treat Norwegian as a language, or a language family containing Bokmål and Nynorsk, but not both. If we use Bokmål and Nynorsk as language headers, that makes Norwegian a language family, and we don't put language families in L2 headers. Ultimately, there is no qualitative distinction between language and well marked dialect, except the distinction of politics, as Ivan mentioned. Because of this we generally divide stuff in the way which will be most useful to our readers and easiest on our editors. However, the politics of living languages sometimes forces our hand. I will say that treating Norwegian as a language and Bokmål and Nynorsk as dialects would be easier to edit (and probably more useful to our readers), but if degrading the two to dialects would cause us uproars, it's not worth it, and we'll have to treat the two as distinct languages. -Atelaes λάλει ἐμοί 21:08, 11 March 2008 (UTC)[reply]

Thanks for the explanation. The way I see it there are not many other options than what you describe here; at least if we're going to do this properly. --EivindJ 21:27, 11 March 2008 (UTC)[reply]

Ok, can everyone who cares about the issue state whether they can accept the following two languages on Wiktionary: "Norwegian Bokmål" and "Norwegian Nynorsk," with "Norwegian" being a deprecated language. If this seems to be acceptable after a few days, we'll put it through a short vote, and allow you Norwegian folks to get back to work. :) -Atelaes λάλει ἐμοί 21:34, 11 March 2008 (UTC)[reply]

I accept that. And hopefully a bot (maybe AutoFormat?) can start s/Bokmal/Bokmål/g-ing. —Ruakh_TALK 23:21, 11 March 2008 (UTC)[reply]

No, they must be "Norwegian" (no) and "Nynorsk" or "Norwegian Nynorsk" (nn). We can not deprecate "Norwegian". Note that this is the WMF standard: we have no.wp and nn.wp, and no.wikt and nn.wikt, and so forth. (Also, nb would exclude Riksmål, which creates another utterly un-necessary problem.) Robert Ullmann 13:30, 12 March 2008 (UTC)[reply]

That's a little tricky. If most of the entries are from a single editor, and we find out that they've only been working with Bokmål, that might work. However, the best thing would be for real live people to go through these by hand and figure out whether the word is B, or N, or both. -Atelaes λάλει ἐμοί 23:33, 11 March 2008 (UTC)[reply]

If there is no label, then it will require human attention. I think Ruakh's suggestion, however, was about changing instances of "Bokmal" to "Bokmål" (i.e. changing the second vowel from "a" to "å") which is certainly automatable. Certainly changing it for new/edited entries would fit well with AutoFormat's work. Thryduulf 00:06, 12 March 2008 (UTC)[reply]
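For illustration, the `s/Bokmal/Bokmål/g` substitution Ruakh mentions could be written roughly as below. This is a sketch only; the function name is hypothetical and nothing here reflects how AutoFormat is actually implemented.

```python
import re


def fix_bokmal(text: str) -> str:
    """Replace every unaccented 'Bokmal' with 'Bokmål' (global substitution)."""
    # Mirrors sed-style s/Bokmal/Bokmål/g. Text that already reads "Bokmål"
    # does not contain the substring "Bokmal", so it is left untouched.
    return re.sub(r"Bokmal", "Bokmål", text)
```

Because correctly spelled text never matches, running the substitution twice gives the same result as running it once, which matters for a bot that revisits pages.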

In principle, I am opposed to the idea of splitting the two into separate L2s, as I have been saying to Atelaes and Robert Ullmann. Bokmål and Nynorsk are not different languages; they are two differing written standards based on differing dialects (but by no means necessarily faithfully reflective of any one dialect) of the same language. It is thus my preference that they should be unified under the same L2 header (Norwegian) and any (all) standard-specific terms indicated with context tags. I would also see Riksmål- and Høgnorsk-specific terms (and anything else we can think of) likewise indicated, where they exist, and are appropriate and relevant. However, if it's deemed more useful by the community to split them into separate L2s (although it does seem rather like duplicating work), I will not object, so long as it's split as indicated by Atelaes above. In addition, we need an About Norwegian page (either containing information on both standards or linking to separate pages with a note as to why) for those who just don't get the idea of there being no Norwegian language header. Release the shoats! --Wytukaze 01:31, 12 March 2008 (UTC)[reply]

As you say, a tag might be better, except that Nynorsk is treated as a separate language and code from Norwegian across projects. (Think about iwiki links, wp links, etc. etc.) This is already extremely well established, which is why I am very annoyed with Atelaes for creating a large discussion on BP about something that is not going to change anyway. You can go around in circles forever on this, and it will all come back to cleaning up entries so that the headers are "Norwegian" and "Nynorsk" (or "Norwegian Nynorsk"). This is a solved problem. Creating a very large mess from a solved problem is not productive. I will be trouting Atelaes when I can catch him on IRC ;-) Robert Ullmann 13:30, 12 March 2008 (UTC)[reply]

I would like to know why users continuously come up with statements like "Bokmål and Nynorsk are not different languages; they are two differing written standards based on differing dialects". These aren't clear facts. Nynorsk is based on all Norwegian dialects together while Bokmål is based on the old standard strongly influenced by Danish. And as a fact I can state: "They are two independent languages". I note that several users, by the way they express themselves, are implying that these two are very close to each other, and not at all two languages. The iws to other articles on Wikipedia use "Norsk (Bokmål)" and "Norsk (Nynorsk)", so why does Robert want "Norwegian" to mean "Norwegian (Bokmål)" when it means both? The only reason why "no" is used instead of "nb" (bokmål) on no.wiki is that no.wiki was at first for both languages; after a while "nn" got its own Wikipedia, but "no" was never changed to "nb". --EivindJ 13:42, 12 March 2008 (UTC)[reply]

I concur that Nynorsk should be separate. What I am saying is that the header here for no must be "Norwegian" and we must use no and nn because all the rest of WMF does (we can't use "nb" for Bokmål without breaking everything in sight). Note that no is then Norwegian-not-Nynorsk, as you point out is the established usage in WMF for (e.g.) no.wikt, nn.wikt and so on. And we simply can't use "Norwegian Bokmål" for what in English is called "Norwegian". (NLC POV, etc, etc, notwithstanding.) As I have pointed out, this is a solved-set problem; nothing to see here. Robert Ullmann 13:53, 12 March 2008 (UTC)[reply]

Yes there is a major problem here: Despite what you imply, there is no written language called Norwegian; only Norwegian (bokmål) and Norwegian (nynorsk).

If bokmål is best left under the no template, then change it into Norwegian (bokmål). Wiktionary's accuracy regarding the Norwegian language starts here. --Harald Khan Ճ 16:35, 12 March 2008 (UTC)[reply]

Just to clarify it further: making one header called Norwegian and one called Nynorsk is like presenting the official languages of Norway, in the Wikipedia entry on Norway, as "Norwegian" and "Nynorsk", which is nothing but a factual mistake. --Harald Khan Ճ 16:48, 12 March 2008 (UTC)[reply]

Umm....I'm seeing Norwegian (Bokmål and Nynorsk). -Atelaes λάλει ἐμοί 16:55, 12 March 2008 (UTC)[reply]

As you should. That is opposed to Norwegian AND Nynorsk. The term Norwegian does exclusively refer to the Norwegian language as a whole including all dialects or both written languages. --Harald Khan Ճ 20:02, 12 March 2008 (UTC)[reply]

While I'm sure this will prolong my trouting, I must say Robert, that you've never offered any convincing reason why we can't use "Norwegian Bokmål" and "Norwegian Nynorsk." If the angstrom will break bots, perhaps we can substitute an "a" for it (and note the deficiency in a few places). It really can't be a matter of users not being able to find Norwegian, as Norwegian's included in there. The worst case scenario there is that people will wonder what Bokmål and Nynorsk mean, and educate themselves on the subject (a situation which is, admittedly, completely opposed to WM values :-)). If it's out of concern for following WM precedent, then a quick look at the main page of Wikipedia solves that. The interwiki links to both Norwegian 'pedias are titled Norsk (bokmål) and Norsk (nynorsk). We don't seem to have an interwiki on en wikt for the nn wikt (it looks like a fairly new wikt). If, as you say, using nb would break stuff (such as t-bot, I presume), then we can use no and title it Norwegian (Bokmål). If it's simply your odd POV in considering Nynorsk to somehow be inferior to Bokmål or just plain not a part of the Norwegian language, then I guess I'm not worried, as you seem to be singing a solo tune on that one, and a vote will put an end to that (the tyranny of the majority can be a nice thing when you're in the majority). -Atelaes λάλει ἐμοί 16:43, 12 March 2008 (UTC)[reply]

(note that the previous entirely misrepresents my "odd POV". If Språkrådet itself can call it (just) "Nynorsk" in its literature, it is at the very least not wrong ;-) Robert Ullmann 13:00, 13 March 2008 (UTC)[reply]

I'll say it again. Bokmål and Nynorsk are not different languages and it is misleading to refer to them as such. They are, and I have never said otherwise, (add 'completely' if you wish) separate written standards of the same language. They are similarly not independent; they do influence each other and are influenced by speech (and this is, if you'll forgive me for saying so, rather obvious). The spoken language, as it happens, is not separated into the two standards of Bokmål and Nynorsk; yes, there's some correlation, but speech is split among the various dialects, some of which correspond more or less closely with one of the written standards or with the Swedish standard, and on and on and on. Many languages have considerable dialectal variation in speech and Norwegian is in no way special in this area. What is special is the written situation, but a written standard does not constitute a language and a language can have more than one written standard. It is not difficult to do. This is, as I believe has been stated, a different matter to the differences between US and UK spelling differences, true; it is, however, comparable with that, the pluricentric standardisation of German, the Cyrillic and Latin orthographies of Serbian and the situation with modern Welsh, where the written, literary standard differs considerably from modern dialects. I do believe useful information on all of those can be gleaned from Wikipedia. As such, it is still my opinion that we should not be splitting the two written standards into separate L2s. However, as we and the WMF as a whole already do split them, we should continue to name both Norwegian and specify which standard we are referring to at every instance. That is, I firmly do not support naming one "Norwegian" and the other "Nynorsk" or even "Norwegian Nynorsk". 
They are both equally Norwegian, and Atelaes' proposal that we use "Norwegian (Bokmål)" and "Norwegian (Nynorsk)" shows this and the secondary nature of writing in language admirably. --Wytukaze 20:15, 12 March 2008 (UTC)[reply]

Okay people ... have we discussed this to death? (note: I only used "Nynorsk" itself because "Norwegian Nynorsk" seemed like a redundant pleonasm. Språkrådet (NLC) itself uses just "Nynorsk" in its English language literature.)

How about we simply follow the standards? ISO, SIL, NLC, the de-facto setup of WMF, the setup of the no.wikt, and the status quo ante here? Eh? Especially because they all say the same thing?

ISO/SIL codes:

no = nor = Norwegian
nb = nob = Norwegian Bokmål
nn = nno = Norwegian Nynorsk

Språkrådet (NLC, Norwegian Language Council) recommends that in English the two written forms be called "Norwegian Bokmål" and "Norwegian Nynorsk" when they need to be distinguished. (Notice no parentheses; we don't want parens in L2 names anyway.)

WMF uses no = Norwegian in the project naming, (although this has become usually more Bokmål as the Nynorsk projects were added).

The no.wikt uses no = Norsk, nb = Norsk (Bokmål), nn = Norsk (Nynorsk) (see no:Mal:=no=, no:Mal:=nn=, no:Mal:=nb=), using no = Norsk for most entries, and the other two when they need to be distinguished:

from the no.wikt, as of 6 March 2008

Header	Occurs
no	674
nb	72
nn	64

We use no = Norwegian, and have been using the other two (with some variations in form) when they need to be distinguished. (A very small number at this point, L2/invalid shows exactly five.) Note that Norwegian not Bokmål or Nynorsk is no = Norwegian in all the standards; we presumably want some context tag or usage notes in these cases.

As I've said above more than once, there isn't any problem here; all the standards and the de-facto setup(s) agree. There are just a few entries that need fixing, which is where all this [redacted] started.

See mandag and måndag, compare no:mandag and no:måndag. Robert Ullmann 12:29, 13 March 2008 (UTC)[reply]

And note that all this is pretty much exactly what Atelaes said at the top of this section, which is why I want to trout him for raising essentially a non-issue and creating a lot of sound and fury. Robert Ullmann 12:36, 13 March 2008 (UTC)[reply]

Note that using en.wikt and no.wikt as examples for how often we need to distinguish between the two is not good. A quick look through Category:Norwegian language tells me that there are heaps of words that need to be changed into either nb or nn. However, I understand Robert to be saying that we can have the headers "Norwegian", "Norwegian Bokmål" and "Norwegian Nynorsk". If that's correct then everything's ok for me. I just don't want to see "Norwegian" for "Norwegian Bokmål" and "Nynorsk" for "Norwegian Nynorsk". --EivindJ 13:49, 13 March 2008 (UTC)[reply]

This is my first comment on Wiktionary; I don't know the technicalities. But I know American English, Nynorsk and Bokmål - in that order. It is totally unacceptable to in any way suggest that Bokmål is more Norwegian than Nynorsk is.

It should perhaps also be pointed out that the differences between the two include more than just words; there are grammatical differences as well. --Hordaland 22:40, 17 March 2008 (UTC)[reply]

Please note that the use of the language code “no:” for “Bokmål” on Wikipedia is a leftover from the time when no.wikipedia.org included both Bokmål and Nynorsk. This code has been kept as the main one, with nb being a redirect, as a “compromise” — to facilitate good diplomatic relations between the Bokmål and Nynorsk wikipedias. To use this compromise to misrepresent the language names here would be a mistake — regardless of what Mr. Ullmann’s feelings are. -- Olve Utne 19:09, 19 March 2008 (UTC)

Hmmm.....this could be a sticky issue. The simple fact is that we have a lot of bots running information back and forth between wikt's, such as User:Tbot. The work they do for us is invaluable. Because of this fact, we may be stuck retaining this incorrect usage simply because the Wiktionaries themselves retain it. What we really need is a comment from Robert Ullmann on whether it would be possible to use nb and nn, and still have the bots function properly when the Wiktionaries are no and nn. While I strongly disagree with him on how to treat the Norwegian languages, there is no denying that he is easily the most knowledgeable editor on this particular aspect. If this screws up the bots, I think we may have to retain no and nn until such time as the Norwegian Wiktionaries themselves make the appropriate switch. -Atelaes λάλει ἐμοί 19:47, 19 March 2008 (UTC)[reply]

Words that are not different between Bokmål and Nynorsk remain in no/Norwegian. The variant codes and names are only to be used when there are differences. Specifically: only when there are corresponding entries which are differently spelled, and refer to each other. Understand that the no/nb/nn distinction, however real, was a political result of the ISO 639-1 process, which was intended to produce a stopgap coding until something better could be done. The differences between Bokmål and Nynorsk are very small compared to the differences in English, which we code and represent as one language. The differences in Norwegian between dialects and regions are much larger than the Bokmål/Nynorsk written form representation. (A serious argument could be made that Nynorsk is (yet another ;-) 19th century spelling reform that has now failed....)

However, given that Nynorsk is coded, and has at least some people who want to represent it, it is very reasonable that we include it and document it. At the same time, forcing all of the rest of Norwegian into the "Bokmål" pigeonhole is not acceptable. Most terms in spoken and written Norwegian are just that: Norwegian.

So we use no/Norwegian for most of the language, quite properly; nb/Norwegian Bokmål when, and only when it must be distinguished from Nynorsk, and nn/Norwegian Nynorsk for those words that must be distinguished from Bokmål.

And yes all this works correctly with the automation, which has long since changed "nb" to "no" for the iwikis. Robert Ullmann 01:13, 20 March 2008 (UTC)[reply]
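The convention described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this thread, not the code any bot actually runs; the dictionary and function names are hypothetical.

```python
# Hypothetical names; the mappings follow the convention described in this thread.

# L2 header to use for each ISO 639-1 code.
HEADERS = {
    "no": "Norwegian",          # default: the form is shared by both written standards
    "nb": "Norwegian Bokmål",   # only when it must be distinguished from Nynorsk
    "nn": "Norwegian Nynorsk",  # only when it must be distinguished from Bokmål
}

# Interwiki prefix for each code: the thread says "nb" is changed to "no" for the iwikis.
INTERWIKI = {"no": "no", "nb": "no", "nn": "nn"}


def section_header(code: str) -> str:
    """Return the L2 header for a language code."""
    return HEADERS[code]


def interwiki_prefix(code: str) -> str:
    """Return the interwiki prefix to link to for a language code."""
    return INTERWIKI[code]
```

For instance, `interwiki_prefix("nb")` gives `"no"`, while `section_header("nb")` still gives the full name "Norwegian Bokmål".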

I hesitate to comment that all written languages are, in fact, different and often divergent from their spoken counterparts. That said, the only slight modification to Robert Ullmann's reasonable offering is that when nn/Nynorsk is used is exactly the same as when nb/Bokmål would be used: when either is variant. - Amgine/^talk 02:31, 21 March 2008 (UTC)[reply]

Of course; we often document the spoken and written differences, and the frequent derivation of one from the other. (An interesting case is German pfui > pfui > spoken form > phooey ;-) And yes, you can look at it either way 'round ;-) Robert Ullmann 08:52, 23 March 2008 (UTC)[reply]

Please do not let yourselves be misled by Mr. Ullmann’s current “solution”. (What are his qualifications in this matter, by the way?) It does not work in practice without some major tweaking of templates etc. to bridge the very frequent differences in grammar; see dag#Norwegian, where the common Scandinavian word dag is presented under the header “Norwegian”, but given only the “Norwegian Bokmål” plural forms. While it is true that the singular forms in this word (dag, dagen) are the same in Norwegian Bokmål and Norwegian Nynorsk (and Danish and Swedish at that), Mr. Ullmann may not be aware of the fact that the plural forms are different. Actually, when taking morphology into consideration, most words are different between Nynorsk and Bokmål, despite the (false) impression one gets from the fact that the indefinite singular forms of nouns and adjectives often are the same in both (and in Swedish/Danish). The solution is very simple:

Treat Norwegian Bokmål and Norwegian Nynorsk, under those exact names, as separate languages — the same way as the two other Scandinavian languages (Swedish and Danish) are treated.
Use either “nb” (correct language code) or “no” (not correct language code, but currently used interwiki code on Wikipedia) for Norwegian Bokmål. A robot can easily standardise the entries either way.
Use “nn” (correct language code and currently used interwiki code) for Norwegian Nynorsk.

Respectfully, Olve Utne 07:42, 23 March 2008 (UTC)

"(What are his qualifications in this matter, by the way?)" That is argumentum ad hominem and pretty much discredits anything else you have to say. My qualifications would probably floor you. (for one thing, I am 1/2 Norwegian :-) In the 639-1 process, where we were developing a 2-letter language code to cover a number of languages, we had several serious political problems. The nn and nb codings are the result of one of those problems: the Norwegians vociferously insisted that Bokmål and Nynorsk be coded separately (not all the Norwegian contributors to the committee, just the Nynorsk proponents...) even though the -1 two letter coding should have had only one code for that level of classification. The distinction should have been left to (what is now) -3, or more properly -4. In the end a political "solution" was reached, no, nb, and nn all coded, with the expectation that software implementors would use the correct no code, and just ignore nb and nn. (Which is what they have mostly done. Until one or more very vocal proponents of Nynorsk show up, such as Olve from the nn.wp ;-) Please note that the linguistic credentials of the several hundred people both on the committee and in support (such as myself) were/are very extensive. A similar political problem—in reverse—was with the Chinese languages; the rapporteur from the PRC insisted that there was only one "Standard Written Chinese" (meaning Mandarin written in simplified characters) disregarding that -1 should have coded 11-14 of the languages. The result was zh, which was useful to some extent, but quickly had to be extended by software implementations (zh-CN, zh-TW, zh-min-nan, etc). What I wrote above is the resulting political standard(s), IMHO a better solution for us would be what Atelaes said right at the top: use no/nor "Norwegian", and distinguish with proper tags and such the Nynorsk written forms from Bokmål. Robert Ullmann 08:52, 23 March 2008 (UTC)[reply]

Whether ISO should have been this or that way in your opinion, the fact stands that Nynorsk and Bokmål have separate language tags — just like the two other Scandinavian languages. Since this distinction does exist — and in all of ISO 639-1, ISO 639-2 and ISO 639-3 at that — I do not see any reason why we should not use it.

You are of course free to prove me wrong through addressing the problems I pointed out rather than bragging of your one parent from Norway (I have... — TWO (!) (Unbelievably impressive, eh? ;-) But beside the point.)) and claiming ad hominem attacks (In my book, those are about attacking the person rather than their arguments.. But who am I — a mere ignorant Norwegian linguist — to know what ad hominem means...) as an excuse for avoiding the legitimate questions — see dag#Norwegian.

While you are at it, feel free to explain what makes Nynorsk/Bokmål less of a legitimate distinction than Bokmål/Danish, Czech/Slovakian, Macedonian/Bulgarian — or Nynorsk/Swedish at that. That those of us who write both Nynorsk and Bokmål (like myself) or only Nynorsk (like quite a few others). Isn’t having a separate literary tradition for a century and a half enough for a language to be treated as one? That Cantonese and some other languages within the Chinese languages have not had a full literary tradition until recently is not a reason to treat Nynorsk differently from Bokmål, Swedish, Danish, Faroese, Czech, Slovakian, Macedonian, Bulgarian, Catalonian, etc.

-- Olve Utne 11:00, 23 March 2008 (UTC)

Actually, since you hint at knowing Norwegian (?): How do you want to solve, e.g., the following problems: sei, elv, bli, bok, rot, sau, kjerring? -- Olve Utne 11:27, 23 March 2008 (UTC)

"(not all the Norwegian contributors to the committee, just the Nynorsk proponents...)" Semantics. How there came to be two different ISO codes is irrelevant. The fact that there are indeed two different ISO codes is what you should note.

"Look, we have to have Norwegian as Norwegian. (Else everyone will be saying forever "I can't find Norwegian")." Another argument you used earlier which is also pure semantics. The fact that there ARE two different official versions of Norwegian, and that both are equally Norwegian, you cannot alter. By your logic, for those who do not know that there are different official languages in the three Scandinavian countries, we should lump the languages together under a Scandinavian header and use Swedish as the norm of Scandinavian languages since it is the one with the most speakers/users, and have the other languages as mere sub-sections. --Harald Khan Ճ 19:01, 24 March 2008 (UTC)[reply]

I agree, we should delete it. - [The]DaveRoss 19:58, 24 March 2008 (UTC)[reply]

I am shocked to find out how many of SIL's macrolanguages we regularly use here as normal L2 headers. Technically, on the basis of large lexis/morphology variations between individual languages with separate -3 codes, anyone could ask for their separation. I've heard that the differences between some of those Arabic/Chinese dialects can be quite enormous. Sometimes the orthography used for lemmatization (non-phonetic logograms, or consonant-based such as for Arabic/Aramaic..) can be a binding factor, in addition to, of course, shared cultural/religious heritage that strongly emphasizes common treatment. In other cases, when the separation itself is a preferred option for various reasons (usually just the opposite of those used for common treatment - I've read that dozens of almost identical Aboriginal languages are treated separately because no tribe wants to have its language named after a neighbouring tribe..), forced unification on the basis of intelligibility arguments would necessarily be enforcing a particular POV, one that would encounter sharp criticism throughout Wiktionary's lifetime, assuming that the separation meme is typically not peculiar to some small and loud group. For once, almost all Norwegian-language contributors that have voiced their opinion here support the separation of Bokmål and Nynorsk. If it's true what has been said that, notwithstanding the common lemma form, there is a lot of disparity (approx. how much?) in inflected forms, especially those that display next to the headword line (like plurals, participles, definite/indefinite forms) that would be hard to solve correctly, the separation would probably be the best choice. --Ivan Štambuk 21:36, 24 March 2008 (UTC)[reply]

Huh?? We don't use Chinese as a L2 header, except accidentally or as holdover from early additions. See Wiktionary:About Chinese. We do treat Arabic as a single language, but that's partly driven by the fact that there's an Arabic Wiktionary and an Arabic Wikipedia. We also don't get many contributors who add terms in specific regional variants. We did have someone adding Egyptian Arabic for a time, but he has long since disappeared. Without more contributors for a macrolanguage, there just isn't much we can do. --EncycloPetey 06:03, 25 March 2008 (UTC)[reply]

I was under the impression that here L2 Arabic = Standard Arabic (we use macrolanguage -1 ar=ara as synonymous with -3 individual language arb, and Category:Egyptian Arabic language has only 2 entries, so I guess the dude you're referring to must have been formatting his entries with ==Arabic==, and all of the regional variants in translation tables I saw point to entries with normal ==Arabic== L2). Exactly the same situation occurs with Aramaic, where 334a is adding regional-agnostic spellings (-3 arc = Imperial Aramaic = just Aramaic here). Now had you done your homework and actually studied that list, you'd find out that almost all of the other families on that list besides Chinese and Serbo-Croatian are used here as normal L2 headers; in alphabetical order per -3 code, besides the already mentioned Aramaic and Arabic:

Category:Akan language - aka = macrolanguage code
Category:Aymara language - aym = macrolanguage code
Category:Azeri language -1 az = -3 aze = macrolanguage code
Category:Baluchi language - bal = macrolanguage code
Category:Cree language -1 cr = -3 cre = macrolanguage code
Category:Persian language -1 fa = -3 fas = macrolanguage code
Category:Fula language -1 ff = -3 ful = macrolanguage code
Category:Guaraní language -1 gn = -3 grn = macrolanguage code
Category:Hmong language - hmn = macrolanguage code
Category:Inuktitut language -1 iu = -3 iku = macrolanguage code
Category:Kanuri language -1 kr = -3 kau = macrolanguage code
Category:Komi language -1 kv = -3 kom = macrolanguage code
Category:Kongo language -1 kg = -3 kon = macrolanguage code
Category:Kurdish language -1 ku = -3 kur = macrolanguage code
Category:Malagasy language -1 mg = -3 mlg = macrolanguage code
Category:Mongolian language -1 mn = -3 mon = macrolanguage code
Category:Malay language -1 ms = -3 msa = macrolanguage code
Category:Ojibwe language -1 oj = -3 oji = macrolanguage code
Category:Oromo language -1 om = -3 orm = macrolanguage code
Category:Pashto language -1 ps = -3 pus = macrolanguage code
Category:Quechua language -1 qu = -3 que = macrolanguage code
Category:Romani language - rom = macrolanguage code
Category:Albanian language -1 sq = -3 sqi = macrolanguage code
Category:Sardinian language -1 sc = -3 srd = macrolanguage code (gee, this cat even has "sort-me" notice)
Category:Swahili language -1 sw= -3 swa = macrolanguage code
Category:Syriac language - syr = macrolanguage code
Category:Tamasheq language - tmh = macrolanguage code
Category:Uzbek language -1 uz = -3 uzb = macrolanguage code
Category:Yiddish language -1 yi = -3 yid = macrolanguage code
Category:Zhuang language -1 za = -3 zha = macrolanguage code

The edit history of those categories shows that you edited most of them, so you should have known better before generalizing everything to the Chinese family (which represents 0.01% of the world's languages anyway). --Ivan Štambuk 10:22, 25 March 2008 (UTC)[reply]

I would like to point out the following:

1) Nynorsk and Bokmål both have their own separate literature.

2) As for the similarities with the other Scandinavian languages, Bokmål is probably closer to Danish than to Nynorsk. --Sigmundg 05:23, 25 March 2008 (UTC)[reply]

Hm, I've been absent for a while, and haven't been able to follow this discussion. Are there any conclusions yet? Regardless of what people think is a language or not, I would like an answer to the following:

What to do when a word is the same in both Nynorsk and Bokmål, but has different grammar.
How to categorize
When a word is in both of them, but with different meanings

I also think we need to create some Bokmål and Nynorsk templates. It's about time we can get back to work ... The Norwegian part of this wiktionary is poor :S --EivindJ 09:45, 25 March 2008 (UTC)[reply]

My gut feeling (and my brain feeling too) is that deprecating “Norwegian” as an unqualified term is the best solution. Norwegian is a conglomerate of Scandinavian dialects that happen to be spoken in Norway, but it is not a written language, and not even one written language continuum. Rather, there are two written languages, currently known as Bokmål and Nynorsk. Each of these has, like English, its own continuum of conventions and standards, and some of these again have their own names. Thus we have, from Høgnorsk and archaic Nynorsk through archaic Riksmål to Danish:

Høgnorsk - conservative Nynorsk - NYNORSK - radical Nynorsk || Samnorsk | radical Bokmål - BOKMÅL - conservative Bokmål - Riksmål - Dano-Norwegian - DANISH

(The above is a bit simplified, but should give a reasonably good impression of the situation.)

The continuum from Høgnorsk through radical Nynorsk consists mainly of predictable phonological variations over one common morphological system.

The continuum from radical Bokmål through Riksmål is a bit more complicated, but is also a continuum which has reasonable predictability based on a common morphological system.

Setting up grammar tags to cover the whole Nynorsk spectrum is uncomplicated. Setting up grammar tags to cover the whole Bokmål spectrum from radical Bokmål through Riksmål is also quite easy.

But setting up grammar tags that cover both the Bokmål and the Nynorsk spectrum in one is a daunting task which requires the invention of new grammatical tags that will make the entries much more opaque for the average editor, since one would have to disregard the traditionally numbered classes of nouns, verbs, etc. and in effect invent a new system.

The current unqualified “Norwegian” entries are, as far as I have checked, all actually Bokmål, and by far the easiest solution, practically speaking, would be to have a bot rename them all to Norwegian Bokmål. That being done, one would already have achieved a system which is practical, intuitive and — last, but not least: not original research...!

In addition to the differences in vocabulary and morphology, please note that the syntax is also quite different — to the degree that a literal, word-by-word translation from Bokmål to Nynorsk will, actually, sound very awkward in most cases.

My proposal is therefore:

Have a bot move all current “Norwegian” entries (which are de facto Bokmål already) to “Norwegian Bokmål”.
Keep all “Norwegian Nynorsk” entries under “Norwegian Nynorsk”.

This reflects the fact that Bokmål and Nynorsk are not one written language continuum, but two. The names reflect the officially designated terms by the official Norwegian Language Council, as well as, except for not having parentheses, the already established Mediawiki interwiki names. -- Olve Utne 15:06, 27 March 2008 (UTC)

WT:MILE

I propose that we re-format Wiktionary:Milestones with User:Nadando/milestone, or something similar. The current page is all over the place formatting-wise, and the new page is sortable. Nadando 03:44, 11 March 2008 (UTC)[reply]

Be bold, kick Milestone's ass. - [The]DaveRoss 03:48, 11 March 2008 (UTC)[reply]

I can't, it's protected :) Nadando 03:49, 11 March 2008 (UTC)[reply]

Thanks. Nadando 03:50, 11 March 2008 (UTC)[reply]

ő and ű in .ogg audio filenames

I am seeking BP's help in a problem I'm having with two characters (ő and ű) in .ogg filenames. I am trying to record audio for Hungarian words. I've used both Audacity and Shtooka. The other special characters work fine (á, é, í, ó, ö, ú, ü), but ő and ű are changed to o and u. I'd prefer Shtooka since it is so easy to use. In Hungarian, the accents mean a different letter, not simply the stress in the word. So bor = wine, and bőr = skin, but Shtooka will not create bőr.ogg, only bor.ogg, and if I change the filename to bőr.ogg RealPlayer will not play it. I don't have this problem with mp3 files and I can display these special characters on my PC. I use the hu-%STR mask in Shtooka because this will give the preferred filename format. I copy/paste the words from Wiktionary to Shtooka before recording. If the pasted word contains ő and ű, these two characters are displayed as vertical bars. I was told that other languages with special characters work fine, but I don't know how to fix this. Thanks. --Panda10 22:02, 12 March 2008 (UTC)[reply]

As a work-around is there any standard transliteration scheme for ő and ű to ascii characters (like ä and ö can be written ae and oe)? Thryduulf 22:21, 12 March 2008 (UTC)[reply]

Unfortunately, no or at least I've never seen it. --Panda10 22:23, 12 March 2008 (UTC)[reply]

I recall seeing ooe and uue.—msh210℠ 22:40, 12 March 2008 (UTC)[reply]

I've just tested ô and û, both worked fine. It seems that the Latin-Extended characters will not work, but Basic Latin and Latin-1 will. If nothing else works, I will use these characters. They are not correct, but look close enough to the original. Thanks. --Panda10 23:41, 12 March 2008 (UTC)[reply]
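The observation above (Latin-1 characters survive, Latin Extended-A characters do not) can be checked with a short Python sketch. It only tests whether a filename stem is representable in Latin-1; whether Shtooka or RealPlayer actually goes through a Latin-1 conversion is an assumption, not something confirmed in this thread, and the function name is made up for illustration.

```python
def fits_latin1(name: str) -> bool:
    """Return True if every character of `name` can be encoded as Latin-1 (ISO 8859-1)."""
    try:
        name.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Characters reported as working in this thread are all in Latin-1;
# ő (U+0151) and ű (U+0171) are in Latin Extended-A and are not.
for word in ["bor", "bôr", "bőr", "ősz"]:
    print(word, fits_latin1(word))
```

A check like this could be run over a word list before recording, to flag the entries whose audio files are likely to need renaming at upload time.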

Is this a problem when naming the files on your own computer, or when you upload them to Commons? Did you know that you can upload the file under a different name than the one saved on your computer? Have you tried doing that? Have you tried using QuickTime instead of RealPlayer? This same problem can potentially affect many other languages, so I'd rather we didn't try to "work around" it by giving such files a different name. The audio file name should always match the entry name. If there is a deficiency in RealPlayer, then we should tell them and let them decide whether they want to fix the shortcomings in their software on their own. --EncycloPetey 03:23, 13 March 2008 (UTC)[reply]

This is not a problem with naming files on my computer. I have several mp3 files with ő and ű in the filename, and they can be played fine with Windows Media Player. As for uploading to Commons, I uploaded only words that did not contain ő and ű, so I can't answer this part of your question. For now, I am trying to play the files with ő and ű in the .ogg filename on my computer, with no success. I tried QuickTime; it displays an error message for any .ogg file. What is the recommended player for .ogg files? How will Wiktionary users play these files? --Panda10 19:25, 13 March 2008 (UTC)[reply]

I use QuickTime to play all .ogg files, but then I am using a Mac. I would try uploading some files with the problem characters to Commons, changing the name at upload to the Hungarian spelling, and see if you can then play them from within a page. --EncycloPetey 19:43, 13 March 2008 (UTC)[reply]

Thanks for the idea. I uploaded the audio for ősz (autumn) and played it back successfully from Wiktionary using the same RealPlayer that is not willing to play it on my computer... Does this make sense to you? --Panda10 20:55, 13 March 2008 (UTC)[reply]

It makes sense in that I understand what you are saying, and am not surprised by the vagaries of internet interaction. Do I comprehend the reason it works when done this way? No. But I do know that I've successfully played .ogg files for odd script words before this way. --EncycloPetey 23:29, 13 March 2008 (UTC)[reply]

Even sysops are ignoring my questions

Which box(es) should be utilized on failure, zero gravity, vampire, and martial art? I am being told of a non-existent policy which is in practice (WTF?) albeit it is not written. Ergo, I was temporarily banned for making what I thought were bold edits, and some of my thoughts continued to go unanswered. Can anyone lend a voice? This is probably the last place I'll be repeating myself. Sesshomaru 05:42, 13 March 2008 (UTC)[reply]

Relevant discussion at User talk:Sesshomaru. -Atelaes λάλει ἐμοί 05:52, 13 March 2008 (UTC)[reply]

This seems like a strange use of blocking, although I haven't looked into it thoroughly, and don't plan to.

Personally I think both boxes should be deprecated in favor of {{pedialite}} in a ===See also=== section. As an added bonus, that template is unobtrusive enough that it can be repeated as needed. If we must choose, then at least 99% of the time we should be linking to the disambiguation page. Dab pages seek to provide the full range of encyclopedic meanings, just as we seek to provide the full range of lexical meanings, and are thus the most appropriate next stop for someone who didn't find what they were looking for in our entry. -- Visviva 05:59, 13 March 2008 (UTC)[reply]

Can you clarify? All I understood was "don't use 'em" and "use dabs 99% of the time". Sesshomaru 06:05, 13 March 2008 (UTC)[reply]

Visviva suggests you use {{pedialite}} as an inline, unobtrusive link in a ===See also=== section within the part of speech section. (Notice the level of the header, this is important.) - Amgine/^talk 06:11, 13 March 2008 (UTC)[reply]

Yes, try at most one box, and preferably only {{pedialite}}, although we really need to update the WT:FAQ to encourage the inline template. If they are as such, more than one link to Wikipedia is NOT a problem, provided they are relevant. A disambiguation page is fine, and I would say a link corresponding to any definition where the Wikipedia article is directly related. 71.129.48.8 06:17, 13 March 2008 (UTC)[reply]

Okay. I'm starting to understand. Can someone provide sample(s) of links which share this tag? I'm more curious in seeing the layout than anything else. Sesshomaru 06:28, 13 March 2008 (UTC)[reply]

Have a look at Special:Whatlinkshere/Template:PL:pedia. One example is at pigment#See also. Mike Dillon 06:42, 13 March 2008 (UTC)[reply]

I'm sorry zero gravity was reverted time and again instead of indicating a better way to accomplish the goal you had in mind. I agree with Visviva that a box makes sense for disambiguation, but the community is splintered on the whole issue. Personally I don't think many people have put the necessary thought into it, assuming one page on Wiktionary equates to one page on Wikipedia, but I digress. To address your question, no one would object to the way I've set up zero gravity now, using {{pedialite|...}} and {{pedialite|dab=...}}. 71.129.48.8 06:57, 13 March 2008 (UTC)[reply]

Although I have never seen a policy about it I have always used links to Wikipedia either when they have a page at the same title, or when I am using terms in the definition that are not explained on Wiktionary (for example expanding abbreviations to places or entity names). As a dictionary I feel that Wiktionary should definitely not be aiming to provide background information about related topics, though linking to related words on Wiktionary is useful for broadening vocabulary, and that is why we have lots of synonyms, antonyms, etc.etc. sections. It makes clear sense to me to link to the disambiguation page on Wikipedia, unless we want to have an interwiki for each sense of the word, which is redundant and ugly, as there is no way we can tell which article people will be interested in. On a related note, if Wikipedia's article at the identical title is actually a redirect, it is still preferable to link directly to it than to disambiguate manually, this is so that if the redirect is converted into an article or pointed to a different place then the link will still make sense.

For example, with zero gravity, the link to w:Weightlessness is irrelevant; if people want to know what weightlessness is they can click on the link to weightlessness in the definition (where I assume they will find a link to 'pedia should one be necessary). It is worthy of note that we have a javascript extension that provides interwiki links to Wikipedia wherever {{pedialite}} and related templates are used and that at the moment it is assuming an interwiki between Wiktionary's "zero gravity" and Wikipedia's "weightlessness", which should not be there. Conrad.Irwin 10:55, 13 March 2008 (UTC)[reply]

I would strongly suggest that in the case of redirects, "Zero gravity" should be preferred over "Weightlessness", basically, to avoid confusion. And you never know if the redirect could very well become its own article (this is what happened to w:Kristin Wells; for a long time the link targeted w:Superwoman and now it has its own page). Per this discussion, I made these changes, but was I correct in doing this edit (martial art or martial arts)? And back to the fundamental inquiry: what about an instance such as Batman vs. batman, Superman vs. superman, etc.? Sesshomaru 22:10, 13 March 2008 (UTC)[reply]

In the case of zero gravity, WP's disambiguation page is not relevant as there is only one definition, so I think it should link to w:Zero gravity, though that is a personal preference. This discussion looks in favour of having only one Wikipedia link per entry, yet both of the "these changes" links contain two links, so I am not sure why you are referring people here to justify them. The martial arts edit was fine. For Batman I think we should link to w:Batman (disambiguation) as we have more than one meaning in common. For batman I feel that we can link to w:Batman (military) as that is the meaning given, though I wouldn't object to linking to the disambiguation page there too. The links on superman and Superman feel right to me, though I can't see the link on superman being much use, and wouldn't have added it myself. I don't think that we should be using Wikipedia as a place for people to find new meanings of words, but instead to enhance their knowledge and understanding of the words we define. Conrad.Irwin 14:58, 14 March 2008 (UTC)[reply]

I don't understand. Is this a reversal of what you said above? Should we link to w:Zero gravity or to w:Zero gravity (disambiguation)? First you said that w:Weightlessness, which the first redirects to, is irrelevant, and now you say that the disambiguation page is not relevant. I would imagine either could be depending on what the user is looking for, so why not have both? If there were more definitions, as with trunk, there would have to be more links anyways. DAVilla 11:03, 19 March 2008 (UTC)[reply]

The thing is that "weightlessness" is not the same as "zero gravity"; the fact that Wikipedia treats them the same at the moment is irrelevant. Whether to link to a disambiguation page or a specific article is a choice that needs making on a per entry basis, but I can never see the use of linking to a different word on Wikipedia. Our aim is not to provide people with information about topics, it is to provide them with a better understanding of words. Conrad.Irwin 01:12, 20 March 2008 (UTC)[reply]

There's a policy draft at Wiktionary:Links; it definitely needs more work before it's representative of community standards, but it's a start. —Ruakh_TALK 12:24, 13 March 2008 (UTC)[reply]

Can you see this: 𐎧𐏁𐎹𐎠𐎼𐏁𐎠?

I am interested in knowing what percentage of people can see the Old Persian script without installing extra fonts. If the majority of people can see only '??????' then it might be good to add a template or some information on adding the font and making it display properly. There is such a template on Wikipedia for Chinese [5] (although I think most people can see Chinese by default). Pistachio 00:25, 15 March 2008 (UTC)[reply]

There are numerous fonts (including this one) that are not installed in my Windows setup. Mostly I don't miss them. DCDuring TALK 00:33, 15 March 2008 (UTC)[reply]

Something like 0.01%. Lots of those ancient languages' scripts require special fonts; in this case Aegean.otf or Xerxes (those two are supported by Xpeo). I don't think it would be good to clutter pages with "this needs special fonts" messages; how about instead providing image display in the headword by default for all of these? Like in e.g. Phoenician 𐤀𐤍𐤊𐤉 (ʾnky) or Gothic 𐌷𐌰𐌹𐍂𐍄𐍉 (hairtō) ? --Ivan Štambuk 00:44, 15 March 2008 (UTC)[reply]
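For anyone wanting to detect such headwords automatically (for instance to decide where an image fallback or a font hint might be shown), here is a minimal Python sketch. It assumes only the published Unicode range for the Old Persian block, U+103A0 to U+103DF; the function names are hypothetical and nothing here reflects an existing Wiktionary script.

```python
# Unicode block "Old Persian": U+103A0..U+103DF (upper bound of range() is exclusive).
OLD_PERSIAN = range(0x103A0, 0x103E0)


def in_old_persian_block(text: str) -> bool:
    """True if `text` is non-empty and every character lies in the Old Persian block."""
    return bool(text) and all(ord(ch) in OLD_PERSIAN for ch in text)


def outside_bmp(text: str) -> bool:
    """True if any character lies above U+FFFF, i.e. outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


sample = "\U000103A0"  # the first code point of the block, written as an escape
print(in_old_persian_block(sample), outside_bmp(sample))
```

The same range test works for other blocks by swapping in their bounds; characters above U+FFFF are the ones most likely to be missing from default system fonts of this era.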

I can see it, incidentally, but I doubtless have extra fonts installed for a lot of things. --Wytukaze 00:48, 15 March 2008 (UTC)[reply]

I can't believe a Wiktionary admin can't see Old Persian cuneiform. How embarrassing. I move to desysop DCDuring for their lack of interest in esoteric languages. :-) -Atelaes λάλει ἐμοί 00:51, 15 March 2008 (UTC)[reply]

Now that I've outed myself, I should admit that even if it were visible, I couldn't pronounce or read this script or, er umm, several others, including, er umm, a couple that do display on the screen (though I couldn't say what they display). I blame my failing eyesight. DCDuring TALK 00:58, 15 March 2008 (UTC)[reply]

I am able to view it (on Debian Linux) without doing anything special, though there are several scripts which I can't decipher. Conrad.Irwin 01:01, 15 March 2008 (UTC)[reply]

I can see it in Linux (Kubuntu), the only non-default font I've installed on this machine is Gothic. I can't see in Windows XP on a machine that is completely unadulterated font-wise. Thryduulf 01:03, 15 March 2008 (UTC)[reply]

I can't read it (or several other scripts, including Gothic) though. Thryduulf 01:09, 15 March 2008 (UTC)[reply]

Thinking about the suggestion of image display, is there any way to automate the generation of these images in the same way that complicated maths formulae are? 01:12, 15 March 2008 (UTC)

I was thinking of a javascript solution which allowed people to define which fonts should be replaced in this way, and get it to insert the images in place of the text as and when it was required. But I then got distracted and ran out of ideas, maybe I'll come back to it some other time. Conrad.Irwin 01:26, 15 March 2008 (UTC)[reply]

Do we have one or more tables of languages, scripts, and fontnames (or link to same) that one could refer to find what one needed to install in each major operating system/browser pending more user-friendly solutions? DCDuring TALK 01:34, 15 March 2008 (UTC)[reply]

Not that I'm aware of. I always use [6]. It's an excellent site. You may want to simply begin with Code2000. While it's not always the prettiest of fonts, it covers a fairly broad swathe. -Atelaes λάλει ἐμοί 01:36, 15 March 2008 (UTC)[reply]

Perhaps there could be a section in the 'help' with this information. Also, the idea of having images to display seems lovely (it goes way over my head though). Pistachio 01:40, 15 March 2008 (UTC)[reply]

A help page with a table showing samples from each script, along with a link to get a copy of that font if you can't see it and a link to a how-to for installing fonts on various operating systems, seems like a very useful thing to have. Thryduulf 02:05, 15 March 2008 (UTC)[reply]

I wonder if it would make sense to change some of the script templates for less common scripts to link to a help page or appendix giving information about installing fonts or an option to have those scripts turned into images via JavaScript (for logged-in users). Alternatively, JavaScript code could use the CSS classes used by these scripts only when they are the head-word to avoid cluttering pages with tons of help links. Mike Dillon 02:50, 15 March 2008 (UTC)[reply]

There are no OPC signs on Commons, so I made some ad-hoc images (that really ought to be turned into SVGs by a knowledgeable wizard). Nothing impressive, but it's better than nothing. I like the idea of some sort of superscript over a headword with a message like "Problem with fonts?" linking to Appendix: with detailed instructions. --Ivan Štambuk 03:23, 15 March 2008 (UTC)[reply]

Cross-dictionary bookmarklet

I have just made a bookmarklet that you can use when on one online dictionary to add links to other online dictionaries for the same word. So far only Merriam-Webster, Microsoft Encarta, and the English Wiktionary are supported.

It's only tested on Firefox so far and for some reason the links are not clickable on Wiktionary. Improvements welcome. Copy and paste this code into a bookmark:

javascript:if(location.host=='www.merriam-webster.com')w=decodeURIComponent(location.pathname.substr(12));else if(location.host=='encarta.msn.com'){t=document.getElementsByTagName('title')[0].firstChild.nodeValue;w=t.substr(0,t.length-38).replace('’',"'");}else if(location.host=='en.wiktionary.org')w=decodeURIComponent(location.pathname.substr(6));else w=null;di=document.createElement('div');di.innerHTML='<a href="http://en.wiktionary.org/wiki/'+w+'">Wiktionary</a> <a href="http://encarta.msn.com/dictionary_/'+w+'.html">Encarta</a> <a href="http://www.m-w.com/dictionary/'+w+'">Merriam-Webster</a>';di.align='center';bod=document.getElementsByTagName('body')[0];if(w)bod.insertBefore(di,bod.firstChild);void(1) — hippietrail 11:18, 15 March 2008 (UTC)[reply]
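For readers who want to follow what the one-line bookmarklet above does, here is a readable sketch of its two steps: working out the current word from the site being viewed, and building the three links. The function names `extractWord` and `buildLinks` are hypothetical and introduced only for illustration; the hosts, path offsets, and the 38-character title suffix are taken from the bookmarklet itself. The `encodeURIComponent` call is an addition that the original does not have.

```javascript
// Readable sketch of the bookmarklet's logic; not a drop-in replacement for it.
// extractWord and buildLinks are hypothetical names used only for illustration.

function extractWord(host, pathname, title) {
  if (host === 'www.merriam-webster.com') {
    // Path looks like /dictionary/<word>; '/dictionary/' is 12 characters long.
    return decodeURIComponent(pathname.slice(12));
  }
  if (host === 'encarta.msn.com') {
    // The bookmarklet drops a fixed 38-character suffix from the page title
    // and turns the first curly apostrophe into a straight one.
    return title.slice(0, title.length - 38).replace('’', "'");
  }
  if (host === 'en.wiktionary.org') {
    // Path looks like /wiki/<word>; '/wiki/' is 6 characters long.
    return decodeURIComponent(pathname.slice(6));
  }
  return null; // Unsupported site: the bookmarklet inserts nothing.
}

function buildLinks(word) {
  // Assumption: encoding the word for use in a URL (the original inserts it raw).
  const w = encodeURIComponent(word);
  return '<a href="http://en.wiktionary.org/wiki/' + w + '">Wiktionary</a> ' +
         '<a href="http://encarta.msn.com/dictionary_/' + w + '.html">Encarta</a> ' +
         '<a href="http://www.m-w.com/dictionary/' + w + '">Merriam-Webster</a>';
}
```

In the bookmarklet, the resulting HTML is placed in a centred `div` that is inserted as the first child of the page body.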

Template:defective verb

Can somebody explain to me why {{defective}} includes the POS while no other such templates ({{ambitransitive}}, {{ergative}}, {{ditransitive}}, {{impersonal}} and even {{auxiliary}}) do? Circeus 15:58, 15 March 2008 (UTC)[reply]

Well, it's a different kind of description; {{defective}} has to do with the forms a verb takes (and doesn't take), while the others have to do with the grammatical frames it's used in. That said, I don't think we need {{defective}} at all; it's not really a context label, and is best covered by the inflection line, the conjugation table (if any), and/or usage notes. I mean, marking a sense "defective" is really useless anyway, since it doesn't tell you what forms exist and what forms don't. —Ruakh_TALK 19:23, 15 March 2008 (UTC)[reply]

Where is this template intended for use? The only place this information is meaningful is where the full set of inflected forms is presented. It might appear on an inflection line, to alert a user that the verb does not follow the full normal pattern, or it might appear in an Inflection / Conjugation section for a similar reason. There is no other place I can imagine it being useful. I therefore don't see any use for this template. When we redesigned the {{la-verb}} template, we set it up so that "pattern=defective" could be used. This way, it displays in the inflection line and it provides a link explaining what it means. So, we don't need the template under discussion for Latin entries. --EncycloPetey 19:28, 15 March 2008 (UTC)[reply]

I don't see why every single bit of information should be included on an inflection line. This template is useful for signaling verbs which might need special treatment, both for users and for those who look after the formatting on various languages. Physchim62 14:43, 19 March 2008 (UTC)[reply]

talkative - category?

I'd like to add talkative and its synonyms to a category. Would you recommend the existing Category:Behaviour? --Panda10 22:01, 15 March 2008 (UTC)[reply]

If you can come up with enough words, it might be worth starting a Category:Talking as a subcategory of Behavior (and of Category:Sound). I imagine it would be fairly easy to come up with enough terms for such a category. (e.g. loquacious, chatty, talk, speak, speech, say, blab, gossip, chat) --EncycloPetey 23:28, 15 March 2008 (UTC)[reply]

If it is created, Category:Talking should probably be a child of Category:Language as well. Another possible name would be Category:Oral communication, but that could include things that aren't "language" like grunting or humming. Mike Dillon 23:43, 15 March 2008 (UTC)[reply]

I created the category and added the synonyms I've found so far. Thanks. --Panda10 01:56, 16 March 2008 (UTC)[reply]

How should the new Category:Talking relate to Category:Communication? Several of the above mentioned words are already there. --Panda10 15:29, 16 March 2008 (UTC)[reply]

It would be a subcategory of that as well. We have some categories that are listed in three or more locations because of their breadth and importance. --EncycloPetey 15:30, 16 March 2008 (UTC)[reply]

Mike Dillon changed the code of Category:Talking to {{topic cat|lang=en|current=Talking}}. I used nav before, but I don't know how to update topic cat. Can I just add Category:Communication in the second line? --Panda10 15:37, 16 March 2008 (UTC)[reply]

To answer you more directly: Yes you could just add Category:Communication. You could also add "parent=Communication" to the call to {{topic cat}} and it would get merged with the parents defined at Template:topic_cat_parents/Talking. I'm planning to set up a process to watch for these sorts of changes to allow people who want to be familiar with the internals to add them into Template:topic_cat_parents/Talking and make it easier for others to just make their changes and go about their work. Mike Dillon 05:04, 17 March 2008 (UTC)[reply]

You merely have to add the extra listing to Template:topic_cat_parents/Talking. --EncycloPetey 15:42, 16 March 2008 (UTC)[reply]

I guess this is the tradeoff we have with {{topic cat}} if we decide to adopt it instead of {{nav}}. Adding a parent category is not as obvious, but once it is done it is done for the category in all languages and description changes can be managed from one place as well. I'm planning to set up a read-only bot to watch for changes in the topic category tree and parent/description configuration, so it would actually be OK to just add [[Category:Communication]] directly to Category:Talking and the fact that it is missing from the parent configuration would be noticed and reported. The code is partially written, but I'll try to get a report running soon. Mike Dillon 04:38, 17 March 2008 (UTC)[reply]

Thanks for the information, Mike. I do like the new system because of its obvious advantages. It would help to add more usage information to the template talk page. For example, what to do if an existing category has to be added to another existing category as a subcategory. I did read the template talk page before but it was not as clear as now after reading your explanation and seeing how EncycloPetey modified it. --Panda10 18:24, 17 March 2008 (UTC)[reply]

Persian, Urdu, Hebrew, Arabic, Korean entry keyboards

Following on from the above discussion about less-commonly installed fonts, I want to raise the point that some users will have difficulty entering search terms in languages with non-Latin scripts such as Persian, Urdu, Hebrew, Arabic, Korean and so on. They may lack administrator rights to install the languages themselves (students, people in the office), they could be using someone else's computer whilst travelling, or they may not know how to install input for extra languages. Also, for many Persian-speaking people, a fault with their computer means inputting Persian produces Arabic letters instead of the modified Persian versions, for example ي instead of ﯼ. A search for a Persian word in Wiktionary using those Arabic letters will produce no results unless there is a redirect in place: searching for "ايراني" instead of "ایرانی" produces no results. Therefore perhaps creating online keyboards to facilitate input and searching in some languages would be really helpful for some people. Does anyone think this is a good idea? Pistachio 02:30, 17 March 2008 (UTC)[reply]

Yes. It can be a page (Wiktionary:Search using various character sets or some such) which uses JavaScript to paste characters to a search box, and then uses the usual Wiktionary search as the search mechanism. Cf. [7].—msh210℠ 16:10, 17 March 2008 (UTC)[reply]
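One piece of the JavaScript approach suggested above could be a normalisation step applied to a query before it is handed to the usual search. The sketch below is only an illustration under stated assumptions: `normalizePersian` is a hypothetical name, the mapping covers just two letters (Arabic yeh U+064A to Persian yeh U+06CC, and Arabic kaf U+0643 to Persian keheh U+06A9, the second of which is not mentioned in the discussion), and it should only be applied when the user has chosen Persian, since the same letters are correct in Arabic words.

```javascript
// Hypothetical helper: rewrite Arabic letters as their Persian counterparts.
// Only two mappings are included; a real table would need review by Persian editors.
const ARABIC_TO_PERSIAN = {
  '\u064A': '\u06CC', // ARABIC LETTER YEH  -> ARABIC LETTER FARSI YEH
  '\u0643': '\u06A9', // ARABIC LETTER KAF  -> ARABIC LETTER KEHEH (assumption, not from the thread)
};

function normalizePersian(query) {
  // Replace every occurrence of a mapped letter; all other characters pass through.
  return query.replace(/[\u064A\u0643]/g, function (ch) {
    return ARABIC_TO_PERSIAN[ch];
  });
}
```

With this in place, the failing search from the example (the word typed with Arabic yeh) would be rewritten to the spelling with Persian yeh before being submitted.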

Images

How do people feel about something like this, adding an image for each sense for which an image is available? In general, do galleries have a place in Wiktionary entries, and if so where should they be placed within the entry? -- Visviva 07:28, 17 March 2008 (UTC)[reply]

Certainly for nouns and proper nouns, images are often good at aiding understanding of the word being defined. Some verbs can also be imaged, but animations would be better for some. Adjectives, etc. are far more difficult to illustrate (e.g. what image would you add to eloquent?).

Regarding the placement of the images, generally I don't like the use of galleries outside Commons, as in most cases inline images work better (imo). Compare ring with router. This can cause problems when the picture(s) extend beyond the definition lines, but this can be overcome - see bassoon (compare with this old revision). There is probably a better way of doing this with js or css than the table format I used there, but I don't know any js or css. Thryduulf 14:45, 17 March 2008 (UTC)[reply]

I agree that galleries aren't ideal (at least in their current default format). In general, I'd like the images to be as close as possible to the definitions. But it seems to me that the bassoon solution breaks down rather quickly when there are 4 or more illustrable senses (and most senses of concrete nouns are illustrable, although the store of images on Commons still leaves much to be desired in this regard). -- Visviva 14:57, 17 March 2008 (UTC)[reply]

How about a Gallery: namespace, for use in situations where there need to be lots of images for lots of senses? If there is only one image needed, it can go on the page as we do now, but in situations where many are needed, we could have one on the page, then a link to the Gallery namespace for additional illustrations. --EncycloPetey 15:37, 17 March 2008 (UTC)[reply]

I think a gallery namespace is a bad idea. Finding uses of a word throughout history is one of the basal functions of a dictionary, showing picture examples is not. While there's nothing wrong with adding pictures, I don't think our emphasis falls on that area enough to justify a new namespace. -Atelaes λάλει ἐμοί 18:44, 17 March 2008 (UTC)[reply]

I really prefer the use of images off to the right side of the page. Also I think it is important to use the (#) notation to associate the image with its definition. The gallery might be able to work, but we don't use the width of the page as much as we could, which is a shame. We often have great big areas off to the right and pages that go on forever vertically, with most of the content well out of sight. Off to the right, the images balance the text density of the left side of the page and keep the right side from being mostly blank and boring. We really don't need images for every sense; I think it might be best to only use them when they clarify or clearly illustrate the definition, or when they just look really good :). For some words that will mean 10 images are called for, for others one or none. - [The]DaveRoss 19:59, 17 March 2008 (UTC)[reply]

In general, I don't think numbered notation works for images any more than for translations (although the consequences of confusion are less severe). Numbers change, and the people making the changes don't always notice the by-number references elsewhere in the entry. That's why I usually try to put some sort of short gloss in the caption (as in ring). -- Visviva 06:28, 20 March 2008 (UTC)[reply]

I like it. It's certainly better than messing things up with lots of miscellaneous floaty stuff. Conrad.Irwin 19:27, 17 March 2008 (UTC)[reply]

Without intending to stir up this issue any further, it does occur to me that the problems with associating images to senses would disappear if we did begin using the sense, rather than the POS, as our primary unit of organization. -- Visviva 06:28, 20 March 2008 (UTC)[reply]

Category:Computer Science vs Category:Computer science

Is there a reason why the Category:Computer Science has "science" with the first capital letter? If not, may I move it to Category:Computer science? --Daniel Polansky 10:57, 9 February 2008 (UTC)

No, yes, go ahead. H. (talk) 09:14, 17 March 2008 (UTC)[reply]

I've wondered the same thing about Category:Food and Drink, though moving all those categories by hand seemed like more trouble than it could possibly be worth. -- Visviva 14:58, 17 March 2008 (UTC)[reply]

I've made those sort of moves by hand before. If someone will poke me (after Thursday) about getting this done, I'll do the dirty work. --EncycloPetey 15:33, 17 March 2008 (UTC)[reply]

I have moved the computer science category manually. It seems to have worked nicely, also because of the heavy use of the {{computer science}} in the entries. --Daniel Polansky 19:14, 17 March 2008 (UTC)[reply]

Basque etymological dictionary

For those interested in Basque: http://linguistlist.org/issues/19/19-863.html#1 H. (talk) 09:13, 17 March 2008 (UTC)[reply]

Category:Food and Drink

I have manually moved Category:Food and Drink to Category:Food and drink. What I have not moved are the numerous non-English subcategories of that category, listed in the new category. It would be nice if other people could help. Otherwise, I am planning to slowly work on the non-English categories for food and drink too.

Another category worth fixing is Category:Spices and Herbs, fortunately having fewer non-English subcategories. --Daniel Polansky 09:04, 18 March 2008 (UTC)[reply]

I'm not sure that I'd say it's completely ready yet, but the {{topic cat}} stuff could help with future moves like this. I've configured Template:topic cat parents/Food and drink, Template:topic cat parents/Foods, Template:topic cat parents/Breads, and Template:topic cat parents/Desserts, so any of those categories can be configured by replacing their contents with {{topic cat|lang=XX|current=CATEGORY}}. If we had mw:Extension:StringFunctions, it would just be {{topic cat}} since we could have the template take care of splitting the language code from the category name. Mike Dillon 15:13, 18 March 2008 (UTC)[reply]

I've populated all of the category parent entries for the Category:Food and drink tree under Template:topic cat parents. I'm sure some of them still need descriptions under Template:topic cat description, but many will be fine with the standard description. Mike Dillon 15:54, 18 March 2008 (UTC)[reply]

Category:Automotive

Should Category:Automotive be moved to become a sub-category of Category:Road transport? Thryduulf 12:23, 18 March 2008 (UTC)[reply]

I think so, yes. -- Visviva 14:31, 18 March 2008 (UTC)[reply]

category:zh:Adverbs

Is there a reason category:zh:Adverbs is a sub-category of category:Adverbs rather than category:Adverbs by language? Thryduulf 17:37, 18 March 2008 (UTC)[reply]

Yes. Physchim62 14:44, 19 March 2008 (UTC)[reply]

The whole situation with "zh" seems very weird to me. I can see that we have Category:Mandarin adverbs, Category:zh:Adverbs, Category:zh-cn:Adverbs, and Category:zh-tw:Adverbs. We don't have Category:Cantonese adverbs, but we do have Category:Cantonese nouns. The "zh" categories seem entirely inappropriate to me for an "adverbs" category. The Chinese languages are the only languages that are handled like this. This seems to be a lowest-common denominator type thing where the words made up of simplified Chinese characters end up in the "zh-cn" categories and the ones made up of traditional Chinese characters end up in the "zh-tw" categories instead of identifying them in each of the actual languages that the words belong to. No doubt someone who understands these matters better will show me the error of my ways, but it doesn't seem that there is a good reason that Chinese languages should be using language code-prefixed categories for parts of speech. Mike Dillon 14:56, 19 March 2008 (UTC)[reply]

Consideration on the order of context tags

Just to stir up some discussions and thought, this is the ordering I've been using (and switching articles to):

Grammar information
Topical labels
Regional labels
Formality labels (in practice, mostly "informal" and "slang", but also stuff like "literary" and "baby talk/childish")
Politeness labels ("euphemism", "derogatory", "pejorative", "vulgar", "jocular" etc.)
Temporal/frequency labels ("rare" goes here)

This has felt to me the most natural ordering, though I'm not clear myself why. Typically, I'm more comfortable with the Grammar labels not being in the same parenthesis as the other ones, though I haven't been too consistent on that. I want to point out that in quite a few cases, I've been removing templates that felt redundant (e.g. "informal" alongside "slang", "vulgar" or "jocular"). Circeus 14:49, 19 March 2008 (UTC)[reply]

The order seems agreeable. This might be worthy of becoming at least a guideline. I'd favor facilitating the placement of all of these inside a single set of brackets, which usually works with our existing tags, but perhaps not every one. Are there any limits on the number of these in {{context}}? Six seems like it would be insufficient since there can be multiple topics and regions (and possibly others). Also editors use up slots with qualifiers like "mostly". DCDuring TALK 15:33, 19 March 2008 (UTC)[reply]

After some experimenting, it seems you can have up to 10 labels, subsequent ones are ignored. However, "usually" and "mostly" (and presumably other similar ones) use 2 slots. When using "usually" or "mostly" means there are too many arguments there is a red link to "Template:context 10" at the end. In contrast to what I expected, "and" and "or" each only take 1 slot. |_| of course also uses 1 slot. Thryduulf 18:13, 19 March 2008 (UTC)[reply]

It is unusual for more than 4 or 5 labels to appear (including non-label stuff such as Thryduulf just mentioned). At that point a usage note typically becomes a good idea. Circeus 19:07, 19 March 2008 (UTC)[reply]

Let me stick a technical note in here: the number of "slots" is not built into {{context}}, it is just a result of the number of {context n}'s that have been set up. Each of {{context 1}} through {{context 9}} is a redirect to Template:context. This "tells" the parser that the resulting recursion is intentional. Create {{context 10}} as the same redirect, and you'll have 11 "slots", and so forth. At some point the tags will be longer than desirable, but this isn't an implementation limit. (So there isn't any reason to worry about "using up" slots.) Robert Ullmann 09:13, 23 March 2008 (UTC)[reply]

As the discussion suggests, 10 seems to exceed what we feel would be useful to show in a sense line. How hard would it be to have a trustworthy bot that put context tags in our desired order (assuming that we get consensus on it)? I am having trouble seeing how that could lead to trouble (ignoring tech glitches for the moment). DCDuring TALK 12:06, 23 March 2008 (UTC)[reply]
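The reordering step such a bot would need is small. The sketch below sorts a list of labels by the class order proposed at the top of this section (grammar, topical, regional, formality, politeness, temporal/frequency). Everything else in it is an assumption made for illustration: the name `orderContextLabels`, the short label-to-class table (the regional entries in particular are sample values), and the choice to treat any unlisted label as topical. It relies on `Array.prototype.sort` being stable, so labels of the same class keep their original relative order.

```javascript
// Class order as proposed in this section.
const LABEL_CLASS_ORDER = ['grammar', 'topical', 'regional', 'formality', 'politeness', 'temporal'];

// Hypothetical, deliberately incomplete classification table.
const LABEL_CLASS = new Map([
  ['ergative', 'grammar'], ['ditransitive', 'grammar'],
  ['impersonal', 'grammar'], ['auxiliary', 'grammar'],
  ['US', 'regional'], ['UK', 'regional'],            // sample regional labels (assumption)
  ['informal', 'formality'], ['slang', 'formality'], ['literary', 'formality'],
  ['euphemism', 'politeness'], ['derogatory', 'politeness'], ['pejorative', 'politeness'],
  ['vulgar', 'politeness'], ['jocular', 'politeness'],
  ['rare', 'temporal'],
]);

function orderContextLabels(labels) {
  // Assumption: labels missing from the table are treated as topical.
  const rank = function (label) {
    return LABEL_CLASS_ORDER.indexOf(LABEL_CLASS.get(label) || 'topical');
  };
  // Sort a copy so the caller's array is left untouched.
  return labels.slice().sort(function (a, b) { return rank(a) - rank(b); });
}
```

A real bot would also have to keep qualifiers such as "mostly" or "usually" attached to the label they modify, which this sketch does not attempt.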

The context slot contents get processed to determine the need for new categories, I believe, so it is useful for things that could be categories to be there. That's one reason I don't like using up slots for "empty" qualifiers, although I suppose that if there were enough demand we could have some special means for handling them. I like to use Usage notes for anything complicated or nuanced - or just not likely to become a category in my assessment. I suppose there would be nothing that would prevent successive context tags, if it came to that. Extra brackets might be useful to separate a list of regional or topical contexts from other types. DCDuring TALK 19:39, 19 March 2008 (UTC)[reply]

I think the context notes are getting overused, if there are any more than 2-3 the qualifications should be in the usage notes section, perhaps with a pointer to that section as the sole contents of {{context}}. I have seen context fields with 6-7 notes, taking up more than half of the definition line. This is just plain confusing, because inevitably there are regional nuances and things going on which can't be fully explained by a list of context tags. I say limit the contents of context to 3, force all complex contexts to a more clear usage notes paragraph. - [The]DaveRoss 19:47, 19 March 2008 (UTC)[reply]

Once more, it is the most polysemic words and PoSs that create the best test cases for any options. Relying on usage notes for items that might have been placed on the sense line means that the usage note may not appear on the same screen as the sense line. This puts a big cognitive load on the user at best and may mean they never even know that there is a usage note. The solution of having rel templates with glosses has serious maintainability drawbacks when definitions are edited. This doesn't have so much bearing on the narrowest construction of the subject of this heading, but provides another example of how the structure of our complex entries limits us. If we treated a given sense as a collapsable mini-entry with its own context, def, semantic relations, translations, usage examples, and usage notes, we would have almost guaranteed that almost all the information a user could want would be on a single screen. DCDuring TALK 20:06, 19 March 2008 (UTC)[reply]

I think in the end this is going to come down to "we need to rethink our entry layout, because it doesn't work." There is a lot of data which is associated with a particular sense which is kept far away from that sense, often with a lot of other unrelated junk in between. Translations, citations, usage notes, examples, images, and to a lesser extent (or more general) pronunciations, etymologies and conjugation/declension data, should all be clearly associated with particular senses, and the current method of doing so is by using a gloss to indicate which sense the subsequent information should be associated with. What we should probably be doing is using collapsible fields directly beneath every definition containing all associated data, so once the reader finds the sense they want they simply click once and get all kinds of additional information pertaining only to that sense. No hunting around across the page, no question about which sense the information is associated with. It isn't that big a technological problem, it is a decent-sized organizational problem and a GIGANTIC effort in manually reorganizing the entries problem, but it is something we will end up having to do anyway, and there is a good chance that while we are cleaning things up we will be able to format them in a very standard way, hopefully a way which is easy to convert to xml or other associatively-structured output so that _all_ of the information on Wiktionary is friendly to machine reading rather than just the definitions. How best to effect this is a matter of discussion of course, but I think everyone knows, deep down, that our entry layout is far more complicated and far less useful than it could and ought to be. - [The]DaveRoss 20:18, 19 March 2008 (UTC)[reply]

And in the meantime, the best we can do is standardize everything that can be effectively standardized (without losing information) to maximize the chances that some of the restructuring can be automated. Well, it won't be long until we'll have gotten rid of all the English transitive and intransitive verb headings, occasionally yielding more than 10 sense lines and 10 usage examples (more than a screenful) for a single ety's verb PoS. DCDuring TALK 20:36, 19 March 2008 (UTC)[reply]

Just to add, despite how deep the above discussion seems to go, the vast, vast majority of cases use no more than 2 or 3 non-grammatical labels at the same time, and appropriately used. As I said, having more than three labels is already very rare. Circeus 21:54, 19 March 2008 (UTC)[reply]

The initial suggestion seemed like a good idea, but the opportunity to learn something new and to put some mileage on a hobby-horse was irresistible (to me, anyway). DCDuring TALK 22:02, 19 March 2008 (UTC)[reply]

While I agree with TheDaveRoss that we should be nesting information under definitions (which sucks for the JavaScript-less, but I don't think the current approach is much better), it's obvious that we don't currently have consensus to do that. So in the meantime, I think it would be helpful to create a family of "see-usage-notes" templates that provide a standard appearance and functionality for links to and from usage notes. (Actually, in general it would be nice if we had a convenient way for glossed onym sections and so on to link to the senses they belong to, until such time as we simply attach them to those senses.) —Ruakh_TALK 00:26, 20 March 2008 (UTC)[reply]

BTW, the idea was mentioned above of putting grammatical information in a separate set of parentheses; I don't think that's a good idea, as it will look strange, and the distinction we're making will not be obvious to the casual reader. —Ruakh_TALK 00:32, 20 March 2008 (UTC)[reply]

From the amount of stuff I've looked at, even with only topical templates, you can believe me that we are not exactly consistent. Circeus 00:52, 21 March 2008 (UTC)[reply]

Name of Category:Spices and Herbs

I would like to move Category:Spices and Herbs to Category:Spices and herbs, but it seems much more work than with the previous categories that I have moved. What could be done robotically is that all the word entries in the various languages are moved to the new categories, such as Category:de:Spices and herbs; a simple regular expression replace should achieve this, such as r/\[\[Category:\(.*?\):Spices and Herbs\]\]/\[\[Category:\1:Spices and herbs\]\]. Afterwards, the category pages could be moved manually. Would anyone volunteer to have their robot do the work? Or is it more complex than I imagine it to be? --Daniel Polansky 06:14, 20 March 2008 (UTC)[reply]
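The replacement described above can be tried out as a small function before handing it to a bot. This is a sketch only: `renameSpicesCategory` is a hypothetical name, and the capture group is tightened from `.*?` to "anything up to the next colon, pipe or closing bracket" so that a match cannot run across neighbouring links; that tightening is a choice made here, not part of the original proposal. Category links with a sort key (a `|` before the closing brackets) are deliberately left alone and would need a separate rule.

```javascript
// Rewrite language-prefixed category links from "Spices and Herbs"
// to "Spices and herbs", keeping the language code.
function renameSpicesCategory(wikitext) {
  return wikitext.replace(
    /\[\[Category:([^\]|:]+):Spices and Herbs\]\]/g, // $1 is the language code, e.g. "de"
    '[[Category:$1:Spices and herbs]]'
  );
}
```

The unprefixed English link, [[Category:Spices and Herbs]], does not match this pattern and is left unchanged.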

You could try doing it with autoedit, which will make it quicker though not as easy as getting a friendly bot. Conrad.Irwin 11:38, 20 March 2008 (UTC)[reply]

Thanks for the hint. --Daniel Polansky 12:24, 20 March 2008 (UTC)[reply]

entries with illegal titles

A feedback message prompted my creation of ~~Appendix:Entries with illegal titles~~ Appendix:Unsupported titles, which still perhaps needs a better name, and definitely needs expansion (and being linked to).—msh210℠ 19:41, 20 March 2008 (UTC)[reply]

Seems like a great idea. I suppose we could create some Javascript that redirects from those pages to that Appendix. Conrad.Irwin 17:55, 20 March 2008 (UTC)[reply]

Sounds good to me.—msh210℠ 19:41, 20 March 2008 (UTC)[reply]

Better idea, that appendix is now transcluded into MediaWiki:Badtitletext. For example, see http://en.wiktionary.org/wiki/%7C . Conrad.Irwin 20:17, 20 March 2008 (UTC)[reply]

I imagine this is going to grow fairly long. Can we make it a list of links, to e.g. Appendix:.7C etc? Also, where do we name symbols? See e.g. ^ which is not actually defined as exponentiation, conjunction, or however else it's used, but is named unlike the symbols on this new page. DAVilla 20:56, 20 March 2008 (UTC)[reply]

I would imagine that this page would become an index, and the individual entries would be moved out: it would make sense to use that kind of encoding for them. Appendix:UT/.5B or summat. Conrad.Irwin 22:04, 20 March 2008 (UTC)[reply]
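The ".7C" and ".5B" names above look like percent-encoding with the percent sign swapped for a dot. Under that assumption, a page name for an unsupported title could be derived as in the sketch below. The function name `unsupportedTitlePage` is hypothetical, and the "Appendix:UT/" prefix is only the tentative suggestion from this thread, not an agreed convention.

```javascript
// Build a candidate appendix page name for a title the software rejects.
// Assumption: the dot-hex form is percent-encoding with '%' replaced by '.'.
function unsupportedTitlePage(title) {
  // encodeURIComponent yields e.g. "%7C" for "|" and "%5B" for "[".
  const encoded = encodeURIComponent(title).replace(/%/g, '.');
  return 'Appendix:UT/' + encoded;
}
```

One limitation of this scheme is that a literal full stop in a title is not escaped, so the encoding cannot always be reversed unambiguously.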

Language hierarchies

What is the preferred language heading, ==Mandarin== or ==Mandarin Chinese==, or something else? ==Cantonese== or ==Cantonese Chinese==?

Which of these is preferred in the translations section?

Chinese
- Cantonese
- Mandarin

with or without the extra bullets, or

Cantonese (Chinese)
Mandarin (Chinese)

with or without "Chinese"? My opinion is based on simplicity in the layout, which rules out the indentation that other users like, and on normality, which rules out "Cantonese Chinese" but allows "Mandarin Chinese" for clarification. DAVilla 20:14, 20 March 2008 (UTC)[reply]

I like the first one because of the fact that it would also work for regional variations:

Portuguese
- Portugal: dialecto
- Brazil: dialeto

But that's just me. — [ ric ] opiaterein — 12:52, 21 March 2008 (UTC)[reply]

I really like that too (and we can discuss bullets and italicization of region name). Note that there may be Portuguese translations that are not regional which would go on the first line. But the question posed is a little different. I think that the indentation should consistently match region, and the language/dialect listed should consistently match what we use as level-two headers. Otherwise you would end up with

Chinese
- Mandarin: ...
  Taiwan: ...

which is a bit crazy. On the other hand, the problem with not doing it this way is that Mandarin translations wind up under "M" instead of "C", which in my opinion is mitigated by the use of "Mandarin Chinese". DAVilla 22:26, 22 March 2008 (UTC)[reply]

The preferred L2 header is ==Mandarin== or ==Cantonese== or the like. The way Translations are listed is a current topic on Wiktionary talk:About Chinese. --EncycloPetey 22:05, 21 March 2008 (UTC)[reply]

Great, thanks for the link! If the translation section were to list

Cantonese
Mandarin Chinese

Then would it be objectionable to change all the language headers to ==Mandarin Chinese== to match? DAVilla 22:26, 22 March 2008 (UTC)[reply]

I would like to say that there are occasions when it would be good to point out both region and language/dialect in the translations section. In such cases, I don't think it would be crazy to have (for computer):

Chinese
Mandarin:
PRC only: 計算機／计算机 (jìsuànjī)

PRC and Taiwan: 電腦／电脑 (diànnǎo)

Min Nan:
PRC only: 計算機／计算机 (kè-sǹg-ki)

PRC and Taiwan: 電腦／电脑 (tiān-náu)

The above example is not necessarily the norm, but there are times when you may want to point out this kind of information. However, I'm not arguing for a specific format, just the ability to include such information when necessary. -- A-cai 08:16, 29 March 2008 (UTC)[reply]

I would like to propose that:

We use the same names for languages in translations as those which are used for L2 language headers
We use Mandarin Chinese instead of just "Mandarin", but otherwise the name of the dialect such as Cantonese without "Chinese"... unless we decide otherwise for a specific dialect
Within the translations section, we list all languages at the same level, and indent only regions which would not be accepted as L2 headers

The major objection to this of course is that Mandarin translations would not be listed under "C". Does the simplicity have enough benefit to outweigh that objection? DAVilla 04:50, 31 March 2008 (UTC)[reply]

That sounds good to me, though I'm not sure regions should generally be indented. I think something like Mandarin Chinese: (PRC only) 計算機／计算机 (jìsuànjī), (PRC and Taiwan) 電腦／电脑 (diànnǎo) would be better in most cases. —Ruakh_TALK 05:17, 31 March 2008 (UTC)[reply]

Sounds reasonable to me as well. Thinking ahead (but not discussing the issue yet), we should consider how this would impact grouping and alphabetization for languages like Ancient Greek, Old French, and Northern Saami. If the Translations labels and L2 headers always match (which I think is good), can we live with the impact it would have in situations where languages have a qualifier of time or location? --EncycloPetey 05:35, 31 March 2008 (UTC)[reply]
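To make the alphabetization concern concrete, here is a minimal Python sketch that sorts a handful of translation labels as plain strings, which is roughly what matching the L2 headers at a single level would amount to. The list of labels is only an illustrative sample taken from the languages mentioned in this thread, not an actual entry's translation table.

```python
# Hypothetical sample of translation labels, named as the L2 headers would be.
labels = [
    "Mandarin Chinese",
    "Cantonese",
    "Min Nan",
    "Ancient Greek",
    "Greek",
    "Old French",
    "French",
    "Northern Saami",
]

# Plain alphabetical order: every language listed at the same level.
flat = sorted(labels)

# Group by first letter to see where each label ends up.
by_letter = {}
for name in flat:
    by_letter.setdefault(name[0], []).append(name)

for letter, names in by_letter.items():
    print(letter, names)
```

Under this flat ordering "Mandarin Chinese" sorts under M rather than C, and the qualified names (Ancient Greek, Old French, Northern Saami) sort under their qualifiers, away from Greek and French. Whether that is acceptable is exactly the question raised above.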

Same as the reasoning for listing language sections alphabetically, this is already an unavoidable issue where the newer and older languages do not even share a name. For not remembering a good example of that I'd make a terrible linguist. DAVilla 05:00, 7 April 2008 (UTC)[reply]

Separate new page templates for simple past and past participle

The new page template for "Past" creates "Simple past tense and past participle". It would be nice to have separate templates for "simple past" and "past participle" for words where they're different. - dougher 00:48, 21 March 2008 (UTC)[reply]

Maybe I'm just dumb, but can you give an example of a word where they are different? Nadando 02:11, 21 March 2008 (UTC)[reply]

sing - simple past sang (I sang in the choir), past participle sung (I have sung in the choir). Thryduulf 02:19, 21 March 2008 (UTC)[reply]

For more examples, see Category:English irregular verbs. Thryduulf 02:22, 21 March 2008 (UTC)[reply]

This would be very useful in corner cases such as proved vs. proven where the regular inflection is both the simple past and past participle, but where use of an irregular past participle needs distinction of context such as region. DAVilla 07:37, 21 March 2008 (UTC)[reply]

What new template are you talking about? We've always had the option of listing these separately. --EncycloPetey 22:03, 21 March 2008 (UTC)[reply]

{{past of}}:{{simple past of}}:{{past participle of}}::{{new en verb past}}:{{_____}}:{{_____}}, where the blanks denote the templates I believe he's proposing. —Ruakh_TALK 22:23, 21 March 2008 (UTC)[reply]

No, he hasn't proposed a new template. He said 'The new page template for "Past" creates...' which implies that the template he's talking about already exists. I'm asking what "new" template that might be. --EncycloPetey 01:17, 22 March 2008 (UTC)[reply]

When you search for a word that isn't in Wiktionary, you are shown a table with the title "You can create a new entry with one of the following preloaded entry templates:" [8].

The options given on this table are "Basic", "Noun", "Plural", "Adjective", "Adverb", "Comparative", "Superlative", "Verb", "3rd Person", "Participle" and "Past".

If the word you want to create is a past participle then obviously the option you choose is "past".

However, this preloads a page including the template {{past of}}, which outputs "Simple past tense and past participle of", and categorises your word into category:English simple past forms and category:English past participles. This is correct for English regular verbs.

If your word is a past participle but not a simple past form (or vice versa) though then this is incorrect. The original questioner is asking for separate preloaded templates for past tense words that are not both the past participle and simple past form (i.e. they are one but not the other). Thryduulf 01:35, 22 March 2008 (UTC)[reply]

Ah, thanks. I understand the question now. --EncycloPetey 19:03, 22 March 2008 (UTC)[reply]

FWIW, at least two other editors (myself and RJFJR) also originally parsed "new page template" as "new {page template}" rather than as "{new page} template". I don't know why it's so confusing, but you're not alone. —Ruakh_TALK 21:04, 22 March 2008 (UTC)[reply]

Right, so how about if we split {{new en verb past}} into two definition lines, {{simple past of}} and {{past participle of}}, and then when appropriate the user could simply delete one of the lines? DAVilla 22:10, 22 March 2008 (UTC)[reply]

Projects being neglected

Wiktionary:Translations of the week and Wiktionary:Collaboration of the week have not been updated regularly, Translations for a few weeks and Collaboration for much longer. Is anyone interested in keeping these up to date or should we take them out of prominence rather than have repeats showing up? - [The]DaveRoss 21:29, 21 March 2008 (UTC)[reply]

If nobody is working on them, then I suggest you reduce their prominence and mark them as inactive with a note to start a discussion here if anyone wants to revitalise them in future. My plate is currently full working on category:Requests for pronunciation and special:UncategorizedPages. Thryduulf 21:42, 21 March 2008 (UTC)[reply]

Connel, DAVilla, and I (and others) have all tried at times to get these projects going. Enthusiasm and participation seldom continue beyond a couple of weeks, and then the community forgets about them again. I stopped bothering because it just wasn't worth my time and effort to try to get people involved. --EncycloPetey 22:02, 21 March 2008 (UTC)[reply]

Projects only work properly if people work together as a team. But we are all individuals doing our own thing. They should probably all be scrapped. SemperBlotto 22:21, 21 March 2008 (UTC)[reply]

Both have been marked {{inactive}}. Conrad.Irwin 10:24, 22 March 2008 (UTC)[reply]

Well, we include them elsewhere, should we drop the inclusion on places like Tea Room? - [The]DaveRoss 01:52, 23 March 2008 (UTC)[reply]

That would seem sensible, though we should probably leave a few links hanging around in the hope that someone enthusiastic finds them one day. Conrad.Irwin 16:10, 23 March 2008 (UTC)[reply]

I went ahead and filled up ToW for the next ~25 weeks, took about 20 minutes. Apparently this isn't a time consuming chore it is just one that has slipped by for whatever reason. If anyone wants to take a crack at CoW I am sure it would be similarly quick and painless and these rather useful Things to Do(tm) wouldn't go by the wayside. - [The]DaveRoss 16:42, 29 March 2008 (UTC)[reply]

Interwikis and ta.wikt (Tamil)

moved from Grease Pit, as this is no longer a technical issue ...

I've disabled the adding of iwikis for ta.wikt for now (except when the bot is making an edit anyway, so you'll see "iwiki +fr, ta" but not "iwiki +ta"). Reason is that the bot was spending 60-70% of its time on these.

The problem is that the Tamil wiktionary has gone from <10K entries to over 100K in about two weeks. (Urp!) There is a bot loading lots and lots of "English" words. Some of the entries are clearly crap. Some are dubious even without knowing any Tamil. The rest? Well, I tried looking up a few dozen words that appear in what seems to be the definition line, and most of them gave me the magic zero google. Of course that might be a lot of technical words that no-one has yet written in Tamil on the net, but it makes me wonder.

Also, all of the entries reference tamilvu.org, (Tamil Virtual University) and everything there is copyright. It could be that the source is elsewhere, and that is just a "convenient" link, but I dunno.

Someone ought to look at this ... anyone know any Tamil? Robert Ullmann 08:14, 21 March 2008 (UTC)[reply]

You could ask ravishankar. --EncycloPetey 20:04, 21 March 2008 (UTC)[reply]

uh oh ... from my talk page:

Hello Robert Ullmann. I am User:Sundar in English and Tamil Wikipedias and the Tamil Wiktionary. I saw your message to Ravi concerning the bot-created articles in Tamil Wiktionary. As I wrote up SundarBot that uploaded articles, let me answer your questions:

Firstly, while there could be some unforeseen bugs in transcoding to Unicode, there's no junk uploaded by the bot. Secondly, we got the glossary from Tamil Virtual University which developed that dictionary from numerous public domain sources, volunteer effort, and fully funded by the Government of Tamil Nadu. Also, we believed that words of a language can't be copyrighted and are naturally in the public domain. The bot took the meanings from www.tamilvu.org, transliterated them to Unicode (from TAB encoding), categorised them, formatted per wiktionary conventions, added pronunciation where one exists in the commons, and uploaded it to Tamil Wiktionary citing TVU and providing a link to their page. Errors from the original source have since been corrected by users too. Being words of a language (actively encouraged by the creator for wide public use) compiled using public funds copied with proper citation, processed and value-added in Wiktionary is fair-use according to Tamil Wiktionary editors. Also, let me state that we didn't use any style or artistic product of TVU. -- 122.167.242.183 14:25, 22 March 2008 (UTC)[reply]

sigh

As far as I can tell, Tamil Virtual University claims copyright on the dictionary (as part of a blanket claim on the website). And his definition of "fair use" is very very wrong, and "fair use" doesn't apply anyway if you are copying entries from another dictionary.

Where do we take this? Robert Ullmann 14:48, 22 March 2008 (UTC)[reply]

This is a very serious problem, though arguably one to which we could simply shut our eyes, since in the end their content is not ours. Assuming that Sundar's explanation above is accurate, TA is in direct violation of foundation:Resolution:Licensing_policy. Stewards refuse as a matter of principle to be involved in these matters, so I guess the best that could be done would be to contact foundation staff directly (or semi-directly via a post to foundation-L). -- Visviva 15:38, 22 March 2008 (UTC)[reply]

Foundation-L? Is there not enough noise there? I sent email to WMF counsel, Mike Godwin. (who is the eponymous Godwin! very cool :-) Robert Ullmann

You're right that if fair use is being claimed here, it's likely a bad claim, and unless ta.wikt has an EDP (unlikely) even a good fair use claim is in violation of policy. However, I would note that while it makes me uncomfortable, Mike Godwin has said in the past that things like dictionary definitions and etymologies are "facts" that can't really be subject to copyright, even though the companies that publish dictionaries will claim that it is. To me, those seem like the most creative parts of the dictionary, too, and Mike seems to think that there isn't much to a dictionary that is protected. Aside from that being startlingly permissive, it would also be a direct commercial threat if we, say, copied OED wholesale, so they'd likely challenge it in some way eventually. It would be good to hear from his own mouth what he has to say, but I remember being surprised the last time he was asked about Wiktionary and copyrights. Dmcdevit·t 09:52, 23 March 2008 (UTC)[reply]

If anyone is interested in the finer details on copyrights on facts, and collections of facts, in US law, the seminal case is 499 US 340 (Feist v Rural), in which the Court ruled on the (non-)copyright in a telephone directory (the content being facts, with no originality in selection or arrangement: all the numbers in the area, in alphabetical order by subscriber name). However, dictionaries contain significant creativity in selection, composition of definitions, etc. So Sundar has part of it right, but the reference to "fair use" implies that he has knowingly used copyright material, rather than simply using the "fact collection". IMHO, Mike is correct that there isn't much to a dictionary that can be copyright; but the entire work almost certainly is. And Sundar copied/is copying the entire work from TVU. It will be interesting to see what he (Mike) says. Robert Ullmann 10:19, 23 March 2008 (UTC)[reply]

<IP lawyer hat on>If an action were brought against anyone in this matter, the action would likely be brought in India and the applicable law would be the copyright law of India, where the work was produced. Under the prevailing law, the INDIAN COPYRIGHT ACT, 1957, the Indian government does own copyright in its works, and works produced by a "public undertaking" (a cooperative venture between a government and private actor) are owned by that "public undertaking". Feist v. Rural is not applicable law to an Indian copyright. The courts of India have apparently rejected Feist, at least with regard to the "sweat of the brow" doctrine, in Burlington's Home Shopping Ltd. v. Chibber (1995), in which the Delhi High Court applied copyright protection to a computer database of contact info from mail order customers.

Even if Feist were applicable, dictionaries are generally covered by copyright to the extent that, for example, creativity goes into the writing of definitions. Of course, if a copyrighted dictionary purports to define all words in a language, its owner can not prevent another person from similarly attempting to define all words in the same language, and from referring to the copyrighted work to determine if that list is complete. However, if there is anything more than that - say, one dictionary copying another's select (but not "complete") list of words, then we run into copyright problems. Furthermore, although India has some broad fair-use exemptions, I don't see how this use falls within them - there is no criticism or artistic statement made about the Indian government's work, nor the reporting of a news event. In short, I wouldn't be plucking definitions from a copyrighted work in India. </IP lawyer hat off> bd2412 T 01:18, 24 March 2008 (UTC)[reply]

Yes, thank you, I should have been clearer than "in US law", pointing out that the copyright material was in India, so Feist is illuminating, but not controlling. Thanks, Robert Ullmann 13:54, 24 March 2008 (UTC)[reply]

{{plurale tantum}} vs {{pluralonly}}

We currently have two different templates for use on words used only in the plural, they both categorise articles into the appropriate language subcategory of category:Pluralia tantum but display different text.

	`{{plurale tantum}}`	`{{pluralonly}}`
Display text	(plurale tantum)	(plural only; not used in singular form)
Inclusions	245*	159

* Including inclusions of {{pluralia tantum}}, which redirects to {{plurale tantum}}

Having a quick look at the entries they are used on, there is a very slight preference for {{pluralonly}} to be used on more basic words, but this is not going to be at all statistically significant. The history seems to show that they have developed independently, with the creation of {{plurale tantum}} postdating that of {{plural only}} by about 5 months.

I don't see the benefit of maintaining two separate templates which are doing the same job, however as the wording is different it isn't just a case of redirecting one to the other. While "plurale tantum" is the correct name for the class of words, it isn't a term with which most non-specialists are familiar. The "plural only; not used in singular form" is a very good definition of the meaning of "plurale tantum" but it doesn't educate people about the technical name. As combining the descriptions will make the text too long, perhaps "(plurale tantum)", including a link to the entry would be best? Thryduulf 16:39, 22 March 2008 (UTC)[reply]

I feel that we should use the correct term but provide a quick definition on hover, maybe something like (plurale tantum). I also quite like the idea of the link, but it is quite slow compared to a hover text. Conrad.Irwin 16:55, 22 March 2008 (UTC)[reply]

I think something like (plurale tantum) might be the best of both worlds. —Ruakh_TALK 17:08, 22 March 2008 (UTC)[reply]

Is there any English-language dictionary that uses "plurale tantum" instead of "plural construction only" or a similar English wording? The term is not even defined in MW3. What is our justification for ignoring the needs of our users or, actually, our potential users? That botanists describe newly discovered plants in Latin does not mean that gardening books are written in Latin. DCDuring TALK 17:26, 22 March 2008 (UTC)[reply]

The one hardcopy dictionary I have to hand, the 1998 edition of The Chambers Dictionary, doesn't define "plurale tantum" and uses the abbreviation "n pl" for "noun plural" to mark such entries. This dictionary bills itself as "the most comprehensive single-volume dictionary" so I would not expect to find it in concise dictionaries. Thryduulf 17:38, 22 March 2008 (UTC)[reply]

Both the OED and MW3 use pl.. The AHD is inconsistent, sometimes using the text "Often used in the plural" (cf. pant) and other times putting the plural form in bold at the head of the numbered sense (cf. color). Random House doesn't bother to mark these at all. --EncycloPetey 19:02, 22 March 2008 (UTC)[reply]

No clue. An English wording would be better, if there's a brief one. How about (in plural)? That would also reduce the suggestion that a word is simply never used in the singular: after all, most pluralia tantum, despite the name, are sometimes used in the singular (especially in jocular, nonce, or nonstandard usages, but even certain standard forms of the language tend to singular-ize them in attributive use, and some words, like paparazzi, are freely used both ways by different people). —Ruakh_TALK 21:13, 22 March 2008 (UTC)[reply]

That sounds like a good reason to use plurale tantum. If we use an English wording, it will be positively misleading. A technical term like plurale tantum will have to be looked up by people less familiar with the grammar, and can be linked to a definition or Appendix that points out that the plural is not strictly absolute, as you say. The problem with "in plural" is that it isn't strong enough. The problem with "only plural" is that it's too strong. Any English wording likely to be of the appropriate strength is also likely to be too wordy. --EncycloPetey 16:28, 23 March 2008 (UTC)[reply]

The wordings "rarely" "sometimes", "usually", and "always", with "plural" would seem to encompass the range of possibilities, with "usually" being an appropriately cautionary and flexible default. Having a special link from plural to a WT Appendix article on plurals would give us the chance to clarify exactly what we mean, if any ambiguity remained. "Tantum" only means "only".

No matter how we do this, some users will be misled. The Latin, I fear, will put off a noticeable percentage (5%?) of viewers of the article, whether or not they care about plurals. Only some (50%?) of viewers will really care all that much about the plural question. I would expect that not all users (20-50%?) will click through to any link on a grammar point, even if that's what they were looking for. I would expect that a Latin term would lead to fewer (50%?) click-throughs than an English one. Few users (20%?) would get the right idea from the Latin term itself. More (3X) would get it from an English term. If one does the arithmetic one would conclude that fewer users are helped by Latin than English and that some folks would be put off by the very presence of Latin.

Finally, I don't think that the plurale tantum and singulare tantum entries really explain things well enough. DCDuring TALK 17:04, 23 March 2008 (UTC)[reply]
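For anyone who wants to redo that arithmetic, here is a rough Python sketch of one way to read the guesses above. It rests on assumptions that are not in the original comment: "20-50%" is taken as the share who do click through for an English label, the Latin label is taken to halve that rate, everyone who clicks through is assumed to come away with the right idea, and the 5% who are simply put off are left out. The function name `share_helped` is just a label for this illustration.

```python
def share_helped(cares, gets_it_from_label, clicks_through):
    """Fraction of all viewers who care about the plural question and end up
    with the right idea, either from the label itself or by clicking through."""
    learns = gets_it_from_label + (1 - gets_it_from_label) * clicks_through
    return cares * learns


CARES = 0.50           # viewers who care about the plural question (guess)
LATIN_DIRECT = 0.20    # get the right idea from the Latin term itself (guess)
ENGLISH_DIRECT = 0.60  # three times as many from an English term (guess)

for english_click in (0.20, 0.50):     # both ends of the guessed range
    latin_click = 0.5 * english_click  # assumption: Latin halves click-throughs
    latin = share_helped(CARES, LATIN_DIRECT, latin_click)
    english = share_helped(CARES, ENGLISH_DIRECT, english_click)
    print(f"click-through {english_click:.0%}: "
          f"Latin {latin:.0%}, English {english:.0%}")
```

On this reading the Latin label helps roughly 14 to 20% of all viewers and an English one roughly 34 to 40%. The figures are only as good as the guesses fed in, and a different reading of "20-50%" would shift them.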

Agree with using {{plurale tantum}} as a specific term to describe what is a phenomenon rather than a rule. Those users who would ever care, or in any way be affected by the phenomenon, would click the link and thus learn a fancy new term. -- Thisis0 17:27, 29 March 2008 (UTC)[reply]

So, if I understand the argument correctly, we don't care what our (potential) users understand, but prefer non-english terms in the english wiktionary because they're more precise? and if they're curious enough, they'll click through. Just asking for clarity on this issue. - Amgine/^talk 20:13, 29 March 2008 (UTC)[reply]

We care very much what our users understand and how they are best able to consume our content. We also care how to increase their understanding. In this particular case, I agree with using the specific Latin term because what we are describing is a linguistic phenomenon with its own peculiar attributes. Using actual short English phrases to try and describe it becomes misleading and inaccurate. Saying "plural only; not used in singular form" is both inaccurate and proscriptive -- two things Wiktionary aims not to be. Other attempts in English to briefly capture the existence, function, and use of terms like trousers, glasses, scissors, and clothes are certain to be imprecise and misleading. If you can coin a template-worthy phrase to describe the way these terms really work, go for it. Until then, I advocate using the detached, erudite phrase (with click-through to thorough explanation) to tag the peculiar phenomenon. -- Thisis0 00:44, 1 April 2008 (UTC)[reply]

That is saying that we only care about those who are willing to put in the time, not those who are too busy, i.e., who have a life. That's like only teaching the students who are going to go to grad school. "Plurale tantum" and "singulare tantum" aren't clearly explained once someone clicks through and have a deterrent effect on users. Or are you saying that the value of plurale tantum is precisely that it isn't clear that it means "plural only"? DCDuring TALK 01:16, 1 April 2008 (UTC)[reply]

Partly, yes, indeed I am saying that is its value. Actually saying "plural only" would be quite wrong, and would teach the wrong thing to the most nonchalant user, which is worse than teaching him nothing. And the fact that the click-through entries aren't yet thoroughly explained isn't a reason not to proceed with the correct course. Make them thoroughly explained. Also, I'm not gonna give any weight in my consideration to anyone "taking offense" or being deterred by the mere presence of things that actually have Latin names. What is a dictionary if not instructive and illuminating on the subject of words? We will be just that. -- Thisis0 03:33, 1 April 2008 (UTC)[reply]

If we decide to go with "Plurale tantum" (which I'm less keen on now than when I started this thread), then I think Ruakh's hover and link would be the most useful for users with and without a life. What might be better though would be being able to express degrees of commonality - "not used in singular form", "rarely used in singular form", "usually not used in singular form", "normally not used in singular form", "not used formally in singular form", etc. Although the last would probably work better expanded to a usage note. Thryduulf 01:34, 1 April 2008 (UTC)[reply]

If any of those phrases happened to be accurate for a specific entry, they are Usage-Notey and belong in that section. The actual real name for the thing is plurale tantum, which is the right substance for a tag. Besides, it's not as if the Latin name is some obtuse thing. It starts with the word "plural", and most who glance at it can sense the phenomenon to which it refers. The fact that the tag exists is the only reason I know the term today. I learned it here, and have since researched, explored and categorized the phenomenon in my brain, i.e. I learned a new thing because of it. Don't rob the next guy of the chance to learn. -- Thisis0 03:33, 1 April 2008 (UTC)[reply]

I think we would do well to express degrees of prevalence in the sense line (or in the inflection line if appropriate). It would be nice to direct users to the Usage notes if there is more, but some generic cases could be handled with some kind of link to an appendix, an article, or even a "Plural notes" subheading under Usage notes. I'm not at all certain that this can be handled separately from plurals in general. Generic cases could include "pairs of" words (scissors and other tools, various kinds of pants, spectacles/eyeglasses) and other cases which I am too tired to reliably characterize at this time. If we do this, I don't see how we could do it in Latin and expect to be understood or indeed expect all editors and admins apply it properly. DCDuring TALK 03:23, 1 April 2008 (UTC)[reply]

Keep it simple, friend. -- Thisis0 03:33, 1 April 2008 (UTC)[reply]

The dictionary should be as clear as possible to all readers. We diminish it by imposing our idea of what they should learn. Plurale tantum is rare jargon – does any recent dictionary define or employ it?

The CanOD and NOAD don't define it (nor does Dictionary.com). They simply use the description "plural noun". It's self-explanatory, unambiguous, generally useful, and can be made more specific with labels. For example, in CanOD individual senses are sometimes labelled like "jean noun 1 a heavy twilled cotton fabric... 2 (usu. in pl.) hard-wearing pants..."

Let's use plain English instead of obscure Latin. —Michael Z. 05:40, 1 April 2008 (UTC)[reply]

Teeth, men, and horses are also all "plural nouns", however they hardly relate to the linguistic category we are discussing. That phrase is neither unambiguous nor particularly useful in separating these terms out from other plurals. In line with the goal of being "as clear as possible to all readers", the phrase 'plural noun' is lacking. Also, the fact that other dictionaries deal with this phenomenon in divergent, non-uniform, and sometimes inaccurate ways, does not mean we should follow suit. Other dictionaries also have been highly confused and divergent on the issue of "Noun Adjunct/sort-of Adjectives", which Wiktionary is also digesting right now and attempting to handle properly. So. A) Plurale tantum has a real name. We didn't make it up. B) It's the most accurate way of labeling a specific phenomenon that is more complex than a few English words can describe. C) It avoids the misleading nature of those inaccurate phrases, and giving potentially false impressions is worse than teaching nothing. D) We're really only talking about a very small number of entries to which this term even applies. E) Being a project that values knowledge, I without question err on the side of imparting more of it rather than catering to those uninterested in knowledge. F) The presence of two italicised Latin words in an entry does not prevent in any way the most casual user from learning the definition of trousers. Incidentally, the best way to teach this user the same information is with example sentences. We have those too. -- Thisis0 18:08, 1 April 2008 (UTC)[reply]

Hm, those are the plural forms of regular nouns, nouns in the plural. E.g. "men pl / Plural form of man," as opposed to "scissors plural noun / A type of tool..." That's just an example; I haven't claimed to have worked out the best method, but I do see plenty of precedents indicating that we can express this in English.

Many items in Category:English pluralia tantum simply aren't, and many more will require notes explaining the nuance. I think something like "usually in the plural" is better than mixing languages with "usually plurale tantum." Does that even make sense, or is plurale tantumness an absolute quality, requiring that such cases be labelled in English anyway?

As to the superior accuracy of the Latin, I do see the benefits of using a technical term in its appropriate context, and also the disadvantages of using it elsewhere. To most readers, p.t. will remain jargon (again: does any recent dictionary use or even define the term?). Indeed the concept is "more complex than a few English words can describe," but most of our readers won't benefit from a familiarity with lexicographical literature debating its meaning,[9][10] but will have to settle for an (adequate?) 11-word English definition anyway. The same goes for our editors, so the category will continue to be full of nouns which are conventionally, often, usually, or mostly plural, but may not in fact be pluralia tantum. —Michael Z. 20:59, 1 April 2008 (UTC)[reply]

I don't understand the point of men, teeth and horses. As to your helpfully labelled points:

A. I do have another term that we didn't make up: "plural in construction". This has the advantage (from your point of view) of not being 100% self-explanatory, but also the advantage (from my PoV) of being in English.

B. As for "plurale tantum" accurately describing things in a few words: it does not describe things for non-Latinists, it merely labels them; and it does not well address the problem of usages that are "usually", "often", or "sometimes" "plural in construction" without the use of macaronic, oxymoronic constructions that would give most of us a chance to test our gag reflex, such as "sometimes plurale tantum".

C. I do not see the inaccuracy of "plural in construction".

D. To the 400+ templates should be added the entries that link to plurale tantum and singulare tantum, those using {{singulare tantum}} and {{singular only}}, those that might be using some unlinked variations, and those that should have an appropriate label.

E. I favor imparting knowledge especially to those who are our most marginal visitors, because there are so many of them. (Evolution must favor them because there are so many of them.)

F. The impact of Latinisms on our users is not one that we have any facts about that I am aware of. Anecdotally and analogically, I would draw your attention to the replacement in the paper and printing industry of such terms for page sizes as sexto and sextodecimo with sixmo and sixteenmo. This is suggestive of a certain concern within that industry with whether it is worth more to respect old practice or make things intelligible for newbies.

I strongly favor usage examples. I also favor usage notes for nuances. These last two possibilities do not differentially favor either English or Latin labels on the sense lines. DCDuring TALK 19:10, 1 April 2008 (UTC)[reply]

Forcing fonts in script templates

The issue of forcing fonts in script templates has come up a couple of times lately, so I thought I'd bring up the issue here. Here are a couple recent discussions:

{{Cyrl}}: Template talk:Cyrl#Removal of Russian font template
{{polytonic}}, {{Grek}}: User_talk:Conrad.Irwin#.7B.7Bpolytonic.7D.7D_and_.7B.7BGrek.7D.7D

There are a couple of questions raised by changes that Conrad.Irwin and I have made lately (some of them unimportant technical details), but I think the important issue is whether or not we should be forcing fonts for particular scripts to use the ones that our local language experts have deemed to be the "best" fonts for that script.

There are a couple of cases where we have to force fonts to accommodate Internet Explorer 6, but that is already done in a way that only affects that browser. There are also cases like {{Cyrl}} where the default fonts used by most of our readers don't render the text correctly (combining accents in the case of Cyrillic). Since I think that forcing fonts in those cases is broadly considered a necessary evil, I'd like to focus on the cases where forcing fonts is done solely to make things look "optimal", not to work around broken browsers or incorrect default font rendering in common browser setups.

My personal feeling is that we shouldn't force fonts unless it is to correct a widespread problem that actually results in incorrect display, not just suboptimal display. If we want to provide a way for logged-in users to easily choose to have the "best" fonts as determined by people who are familiar with the script in question, it could be done with WT:PREFS. Mike Dillon 00:11, 23 March 2008 (UTC)[reply]

P.S. There are currently big timing problems related to certain types of changes to the handling of script-specific fonts due to the caching strategy used by the Wikimedia Foundation's settings for running MediaWiki. If you're interested in giving us back some control of the timing of these things, please vote for bugzilla:8433 or add your comments there. Mike Dillon 00:13, 23 March 2008 (UTC)[reply]

I agree. Accessibility is paramount, so we should take the necessary steps to ameliorate known breakage. For example, the known MSIE 6 bug renders international text unreadable, and even a technical user would have no way to fix it without our help. (Anyone know if this still affects MSIE 7 and 8?)

But aside from that, users' preferences should be respected, which also follows accessibility principles. There are more web browsers, versions, and configurations out there than we can ever test, and countless different sets of user preferences and personal style sheets. Many of them have been specifically chosen by users for their own preferred fonts, to display their own preferred or required language scripts. (Even a tiny fraction of our audience is very many in absolute terms.) Any unjustified fiddling with fonts and styles is bound to degrade or break the display for someone out there, so there should be as little intervention as possible.

Personally, I dislike that some templates currently override my font choice and eliminate the default bolding for Cyrillic in all browsers, but I can live with it. I'm looking forward to some changes in the translation templates which have to wait for the proxy caching to refresh. —Michael Z. 08:30, 26 March 2008 (UTC)[reply]

WT:PREFS is already too complex. I don’t understand a lot of it, and I seem to have to spend half an hour a week reticking the Expand translations box. So far, the only result I’ve seen from removing forced fonts is bad display, sometimes to the point of being unreadable. Perhaps you could add code to all of the font-call templates that will allow users to tick a box in WT:PREFS if they want the template to be ignored. I have no understanding of the timing problem you mention at bugzilla:8433, so it would be silly for me to vote either way. —Stephen 13:04, 26 March 2008 (UTC)[reply]

What I am suggesting is forcing fonts only for badly broken browsers (i.e. for MSIE's inability to deal with mixed language scripts), but not in other browsers (i.e. pretty much the rest). This is what's done in Wikipedia, and mostly here, except IPA and Cyrillic fonts are currently forced for all browsers. I agree that there are already way too many preferences.

Twiddling the display without necessity will have unpredictable effects in minority browsers, and could break the display for someone who has an otherwise working browser, or who has already taken steps to make the text he's interested in work right by default.

Is it MSIE that you are experiencing problems with, or am I being too presumptuous? Which writing systems suffer in Wiktionary? —Michael Z. 21:24, 1 April 2008 (UTC)[reply]

The fonts are actually forced for quite a number of scripts, they're just forced with inline styles instead of in the site-wide CSS file. Pretty much all of the font templates in Category:Font templates are used in an active script template. Mike Dillon 02:14, 2 April 2008 (UTC)[reply]

Customizing appearance of FL links in translation tables

For those who don't usually read the Grease pit: you might be interested in WT:GP#Customizing t, t-, t+ templates. Please keep the discussion there; this is just a pointer. Robert Ullmann 14:07, 24 March 2008 (UTC)[reply]

clung

This entry has two verb headers with corresponding translations. I'd like to create one verb header with two senses and one translation header with two trans tables. Would this be a good approach? --Panda10 16:01, 24 March 2008 (UTC)[reply]

Sounds good to me.—msh210℠ 16:07, 24 March 2008 (UTC)[reply]

Putting both senses under the same header is an excellent idea. However, whether the translation tables should stay at all is a question which the larger community has not yet decided. Some of us (especially those who work with highly inflected languages) would argue that a "form of" entry like that should not have translations at all. If I were to reformat the entry I would put both senses under the same verb header and then nix the translation tables altogether (although I would take a look at cling and see if some of the information is lacking there). However, if.....say...Robert Ullmann or Connel MacKenzie were to reformat the entry, they would keep the translation tables and format them as you say (see this vote for more info). In any case, formatting the entry as you proposed is certainly acceptable. -Atelaes λάλει ἐμοί 16:23, 24 March 2008 (UTC)[reply]

Thanks. I made the change. --Panda10 21:10, 24 March 2008 (UTC)[reply]

-in' forms

Two RFD nominations of -in'-form verbs, bein' and frontin', led to a somewhat more general discussion of whether we should generally have entries for such words. Before debating that, I think it would be instructive to have people's views on what the state of the CFI is currently. Do you read the current CFI as allowing:

all -in' words (minus perhaps some exceptions) whose corresponding -ing words are attested;
individually attested -in' words; or
none of the -in' words (minus perhaps some exceptions) even if the corresponding -ing word is attested?

Once we have people's views on the current CFI, we can, perhaps, talk about whether they ought to be amended.—msh210℠ 16:05, 24 March 2008 (UTC)[reply]

Attested.—msh210℠ 16:05, 24 March 2008 (UTC)[reply]

Attested. —Ruakh_TALK 23:06, 24 March 2008 (UTC)[reply]

Actually, that's not true. I definitely think the current CFI allow -in' forms, but I'm not sure they currently require those forms to be cited separately from other forms of their respective words. (This is actually a question that applies to normal wordforms as well: my opinion for that is that we have to cite words, not wordforms, but I'm not sure that would apply to something like this. I guess I find it hard to separate how I think it should be from how I think it is, heh.) —Ruakh_TALK 01:29, 25 March 2008 (UTC)[reply]

Well, I await the outcome of this discussion, since it can impact how we do Latin, where individual forms are often not attested, but are assumed from the pattern of the verb conjugation. --EncycloPetey 01:36, 25 March 2008 (UTC)[reply]

For what it's worth, my plan is to only include forms which are attested in Ancient Greek. However, this doesn't necessarily mean that such a route is best for Latin, as grc inflection is a bit more varied and regionally dependent than Latin's. -Atelaes λάλει ἐμοί 05:49, 25 March 2008 (UTC)[reply]

Attested, but uncertain as to whether each "-ing" PoS (verb, noun, and adjective) needs to be separately attested in the "-in'" form. Pretty sure not each sense. DCDuring TALK 00:56, 25 March 2008 (UTC)[reply]

Attested. In my opinion CFI, as it currently stands is fairly silent on this particular issue. However, I see no reason to exempt forms from the normal criteria (both in' or otherwise). In the future, if we manage to find a more automated fashion of finding cites, such as TheDaveRoss's citebot (the possibility of which is freakin' awesome for a number of reasons), we may want to trim some of them out which are not attested. However, for the time-being, I think that we should stick with our current practice of allowing all normally inflected forms of a cited lemma. Irregular forms, such as -in', should be handled and cited on a case by case basis. -Atelaes λάλει ἐμοί 05:49, 25 March 2008 (UTC)[reply]

Random Page

A few dozen clicks of "Random page" indicate that more than 80% of en.wiktionary.org now consists of words in languages other than English. (Is the proportion published anywhere?)
I think this has been suggested before, but does anyone have the skill to program a "Random English page"? (I don't have the programming skill, unfortunately.)
Also, many of the non-English word entries contain just the part of speech. It would be very useful to have a translation too, without having to follow through the grammar trail (which is OK for grammarians, but frustrating for those who just want a quick meaning in English). I realise there are difficulties with this because the meaning often depends on context, but at least some attempt to translate into English would be useful, wouldn't it? Dbfirs 17:42, 24 March 2008 (UTC)[reply]

This has been mentioned many times before. When we achieve our aim of ALL words in ALL languages (not this year!) then English will probably account for less than 1% of the dictionary. The part of speech entries for languages such as French, Italian and Spanish are mostly generated by bots. The work involved in adding translations, or simple examples of use would be horrendous - but anyone with a large amount of time could have a go. SemperBlotto 17:49, 24 March 2008 (UTC)[reply]
- p.s. See also User:SemperBlotto/100 random pages

Then again, some have argued that adding translations to these is a bad idea. How do you translate an Ancient Greek perfect optative middle/passive second dual? The answer is, you don't; not in isolation anyway. The fact is that, for most languages, having a basic understanding of the language and its grammar is essential to being able to translate it. Adding translations to these will help no one, and mislead many. Also, Connel Mackenzie has a random English word program on his userpage (5th line down, rnd-En). -Atelaes λάλει ἐμοί 17:57, 24 March 2008 (UTC)[reply]

Connel has an external tool which will send you to a random page in a given language, based on dumps I think. In order to have a link to a random page in English either the wiki would have to parse all the pages to decide if it included anywhere ==English== or not, or it would have to give a random page from a category "English Words". Currently I don't think 100% of our English words (or 100% of any language's words) are in a category "languagex Words", it wouldn't be difficult to categorize them as such with a bot but I seem to recall a discussion about how silly it was to have categories as broad as "languagex words"...imagine that someone found a good use for them :). So, a couple steps would need to take place to have it local, or we could use Connel's tool and call it a day.

Also, WT:GP might be a better place for this particular suggestion. - [The]DaveRoss 20:05, 24 March 2008 (UTC)[reply]
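As an illustration of the first option described above (checking each page for an ==English== heading before choosing one at random), here is a minimal Python sketch. It is not Connel's tool: the sample page texts and the function names are hypothetical, and a real version would read page text from a dump or from the wiki rather than from a hard-coded dictionary.

```python
import random
import re

# Matches a level-2 "==English==" heading on its own line.
# A level-3 heading such as "===English===" does not match.
ENGLISH_HEADING = re.compile(r"^==\s*English\s*==\s*$", re.MULTILINE)


def has_english_section(wikitext):
    """Return True if the page text contains an ==English== language heading."""
    return ENGLISH_HEADING.search(wikitext) is not None


def random_english_title(pages, rng=random):
    """Pick a random title among pages that have an English section.

    `pages` maps a title to its wikitext (hypothetical input format).
    Returns None when no page qualifies.
    """
    candidates = sorted(t for t, text in pages.items() if has_english_section(text))
    if not candidates:
        return None
    return rng.choice(candidates)


# Hypothetical sample data, for illustration only.
SAMPLE_PAGES = {
    "house": "==English==\n===Noun===\n# A dwelling.\n",
    "maison": "==French==\n===Noun===\n# house\n",
    "chat": "==English==\n===Verb===\n# To talk.\n\n==French==\n===Noun===\n# cat\n",
}

if __name__ == "__main__":
    print(random_english_title(SAMPLE_PAGES))
```

The category-based alternative would replace has_english_section with a membership test against a list of titles in the relevant category, which avoids parsing every page.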

Thanks. Questions answered. Answers and reasons understood. I'm impressed by the vision of "all words in all languages"! Dbfirs 21:00, 24 March 2008 (UTC)[reply]

Whoops, I'm a bit late - but see Wiktionary:Random page for more info on Connel's solution. Conrad.Irwin 17:28, 25 March 2008 (UTC)[reply]

Admin Notice: Special:MergeAccount

Finally, a reward, of sorts, for all your hard work! Administrators on any Wikimedia site can access the Special:MergeAccount page, which allows them to unify their logins across the whole of the Wikimedia Foundation. This is not yet available to non-administrator users; however, I am sure it won't be too much longer. Thank you to all the people who have been involved in the implementation of this great, long anticipated, feature. Conrad.Irwin 11:07, 25 March 2008 (UTC)[reply]

Cheers for the note Conrad, I'm now the only Thryduulf you'll see on any Wikimedia project - and it only took three tries to remember the old password I still had set on cy.wikipedia!

It's worth noting that it works even if not all of the accounts you have have admin status, but whether it requires it on your "home wiki" or just on one or more I don't know. Thryduulf 13:49, 25 March 2008 (UTC)[reply]

I did not even remember that I had an account at Wikispecies! Circeus 13:26, 26 March 2008 (UTC)[reply]

Likewise I don't remember registering at sv.wikipedia! Thryduulf 14:20, 26 March 2008 (UTC)[reply]

It requires (for now) that you initiate the merge from "my preferences" on a wiki where you have admin status. Having done that, everything is symmetrical again. Robert Ullmann 14:47, 26 March 2008 (UTC)[reply]

I've done this using my admin status on simple.wikt across a variety of sites, but the user:Brett account is taken here, though it's completely unused. If a bureaucrat here could rename that, I could unify my logins. I wonder if somebody would have a look over here.--BrettR 19:26, 26 March 2008 (UTC)[reply]

Does this work if your password is not the same for all your accounts? That is, do I need to sync my passwords first? --EncycloPetey 19:33, 26 March 2008 (UTC)[reply]

If the password and/or (confirmed?) email address are the same as the account from which you initiate the merge, then the merge will happen automatically. If neither of these is true, then you are given the option to log into those accounts. Once the correct password is supplied, the accounts are merged. For example:

Most of my accounts had a password starting with X and/or had the same email address as my Wiktionary account - these were automatically merged.
All but one of the rest had a password starting with k, once I provided this password all the accounts with it were merged.
The remaining 1 account (cy.wikipedia) was not included in any of the above. I tried at least two different passwords without success until I remembered an old password that worked and it was then added to the merge.

If I hadn't remembered the password, I could have left it unmerged to merge it in at a later date after I'd reset the password. I presume that had I not been able to remember or reset the password that I could have gone through a forced merge procedure or something. Thryduulf 19:51, 26 March 2008 (UTC)[reply]
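The matching rule described above can be sketched in a few lines of Python. This is only a toy model of the behaviour as reported in this thread, not the actual MediaWiki code: the LocalAccount class, the plan_merge function and the plain-text password comparison are all assumptions made for illustration (a real system compares password hashes and confirmed addresses).

```python
from dataclasses import dataclass


@dataclass
class LocalAccount:
    """A hypothetical per-wiki account record (toy model, plain-text password)."""
    wiki: str
    password: str
    email: str = ""


def plan_merge(home, others, extra_passwords=()):
    """Split accounts into those merged and those still pending.

    An account is merged if its password matches the home account's password
    (or one supplied later by the user), or if it shares the home account's
    email address. Anything else stays pending until a password is supplied.
    """
    known_passwords = {home.password, *extra_passwords}
    merged, pending = [], []
    for account in others:
        same_email = bool(home.email) and account.email == home.email
        if account.password in known_passwords or same_email:
            merged.append(account.wiki)
        else:
            pending.append(account.wiki)
    return merged, pending


if __name__ == "__main__":
    home = LocalAccount("en.wiktionary", "X-secret", "user@example.org")
    others = [
        LocalAccount("en.wikipedia", "X-secret", "user@example.org"),
        LocalAccount("sv.wikipedia", "k-secret"),
        LocalAccount("cy.wikipedia", "old-secret"),
    ]
    print(plan_merge(home, others))                # first pass
    print(plan_merge(home, others, ["k-secret"]))  # after supplying a second password
```

In this model an account that matches nothing simply stays in the pending list, which corresponds to leaving it unmerged until the password is remembered or reset.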

Seemed to go very smoothly for me. The one thing the doco didn't say is that it does set the unified password to the one you used in the merge procedure. Instead of having multiple passwords, there is now a single password for all merged accounts. --EncycloPetey 20:16, 26 March 2008 (UTC)[reply]

That's because there is only one account after the merge. The only thing on the local wikis after the merge are user preferences (along with contributions, action logs, user page(s), and user talk page). Mike Dillon 03:50, 28 March 2008 (UTC)[reply]

Bot flag request for User:Computer

Hi, I would like to request a bot flag. I already have a bot flag on many wikis m:User:White Cat#Bots and I am from commons:User:White Cat at which I am a sysop. My language skills: en-n, tr-4, az-2, ja-1.

I hope to help with the following tasks:

Interwikilinking using interwiki.py
Double redirect fixing using redirect.py
Commons delinking using delinker.py

I can also help with tasks like recategorization or any other bulk find & replace tasks.

-- Cat ^chi? 00:18, 27 March 2008 (UTC)[reply]

We do not need another interwiki bot (if you look in the archives the last few requests to run one have been denied) as User:Interwicket is much more efficient than interwiki.py. We also have User:CommonsDelinker - though I don't know if it needs a hand or not. Double redirects aren't often a problem (as we generally shoot redirects on sight and so don't want them fixing). In terms of bulk find and replace tasks I would prefer these to be in the hands of a user with more Wiktionary experience, though they are always reversible so the damage is minimal. It is probably better to run with the flag off until you begin to annoy the RC patrollers. Conrad.Irwin 00:39, 27 March 2008 (UTC)[reply]

There are a few reservations I have with some of that. Let me number list them.

I intend to run my interwiki bot on all wiktionaries. Not running it here would create additional load for the existing bots. Multiple interwiki bots do not disrupt each other's operation. Wiktionary is a colossal project. Looking at http://www.wiktionary.org/ and just adding up the largest four wikis reveals: 769 000 + 753 000 + 225 000 + 187 000 = 1 934 000. Processing all of that regularly would require lots of interwiki bots. I am talking about all article scans.
There may be only one or two double redirects, but there can be valid reasons to have redirects even if most of them are shot on sight. This wiki had 6 such redirects as of this post, of which you fixed four manually (example). A bot could have done that for you. Any unnecessary redirect can be processed from Special:ListRedirects. Looking there I see plenty of redirects, well over 5000... Broken redirects are a navigation hazard.
The bot acts as a backup to CommonsDelinker. If for any reason commons delinker fails to operate (such as toolserver going down), my bot would fill in for it.
Find & replace tasks require no real experience. It's merely case sensitive here.
Operating the bot without a bot flag decreases its efficiency. Wikimedia servers limit how many edits people can make to prevent spam bots. Because I operate my bot on many wikis the bot flag is particularly helpful.

-- Cat ^chi? 15:43, 27 March 2008 (UTC)[reply]

I still don't see the need for more interwiki bots - I'm not aware that the current ones are struggling at all?
Fair enough if that happens - CommonsDelinker doesn't have much work to do here (1 edit this month, the 50th most recent was on 12 November), given there are not many images. If the toolserver does go down (I was under the impression it was far more stable these days?) I'm not certain there is a need for a backup bot - if there is an extended outage, then we can discuss a need then.
The most recent find and replace operation I'm aware of was moving a category, which you are right doesn't need much experience, but more likely there will be ones where you need to understand the formatting and the many templates we use here - I'd rather the bot be operated by someone who understands how to fix any accidental damage it causes.
To my mind you still haven't explained why we need the bots in the first place. Thryduulf 22:21, 27 March 2008 (UTC)[reply]

You cannot deal with well over two million pages with just 2 bots. How much time do you think the bot spends on each page? 2 000 000 / (24*60*60) = 23.148... meaning the bot would need to review 23 pages per second, assuming all editions of wiktionary have 2 million pages. Even assuming the current bot can handle such a thing, dividing that workload by two is only logical.
No. En.wiktionary is not at the center of the universe. Commonsdelinker operates on over 800 wikis. I will not have the time to discuss this on so many wikis. I am a human being and I will need to be sleeping, eating, working when the next outage happens, which could be the next hour when lightning strikes. You are right there aren't a whole lot of images here. If there isn't a backup bot you will have a red link. Commons administrators are neither required nor expected to manually delink (or relink, images can be renamed) images.
It would take me a few minutes to figure out the fine details. I am not a 5 year old. I can do the same kind of fixes with a trivial amount of attention.
I am trying to offer a service. I want to help out. Having two or more bots help out with a demanding task is something productive. You share the workload, you cooperate. I can help with the bulky issues like interwiki linking and double redirect fixing. Other bot operators would then be freer to focus on tasks that require more fine tuning and experience in wiktionary.

-- Cat ^chi? 00:45, 28 March 2008 (UTC)[reply]
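The rate quoted above can be checked with a short Python calculation. It is a sketch under the assumption implicit in that figure, namely that every page is rescanned once per day; the function name and the cycle_days and bots parameters are hypothetical and are there only to show how the number changes with the scanning interval and the number of bots.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400


def pages_per_second(total_pages, cycle_days=1.0, bots=1):
    """Pages each bot must process per second to rescan everything once per cycle."""
    return total_pages / (cycle_days * SECONDS_PER_DAY * bots)


if __name__ == "__main__":
    # One bot, full rescan every day: the figure quoted in the post.
    print(round(pages_per_second(2_000_000), 3))
    # Two bots sharing the same daily rescan.
    print(round(pages_per_second(2_000_000, bots=2), 3))
    # One bot, full rescan once a week.
    print(round(pages_per_second(2_000_000, cycle_days=7), 3))
```

The required rate scales inversely with the rescan interval, so the workload depends heavily on how often a full scan is actually needed.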

While it's true there are 2 million pages (edit: this is not true for a single Wiktionary; the largest is fr with 169,000 pages, and we have 153,000. The 10 largest wikis total 2.6 million), not all of them need monitoring constantly - all the bots need to do is to watch for page creations and additions of other interwiki links. The latter is less important for Wiktionaries than for Wikipedias (and perhaps other projects) as the only interwiki links we want are between identically spelled entries, i.e. house should link to fr:house, de:house, pl:house, etc. whereas w:en:House links to w:fr:Maison, w:de:Haus, w:pl:Dom. The latter needs intelligence to know that w:pl:Dom should link to w:de:Haus (house) not w:de:Dom (cathedral), whereas the Wiktionary links need only a dumb bot. Also, we have at least two interwiking bots currently - you have not shown they cannot cope and the owners of those bots have not said they cannot cope. Additionally the last 100 recent changes cover 1 hour and 20 minutes of edits here at one of the busiest Wiktionaries (I don't know if it's the busiest or not). At Wikipedia there were more than 100 changes in the past 1 minute, which shows the different scale of the projects - smaller wiktionaries are even more stark - 100 changes on pt.wiktionary took 18 hours, on cy.wiktionary it took 6 days (and many of them were dealing with a spambot). You appear to be thinking in Wikipedian terms. Thryduulf 01:26, 28 March 2008 (UTC)[reply]
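The "dumb bot" rule described above (link only to identically spelled entries on other wiktionaries) can be sketched in Python as follows. This is not the code of Interwicket, VolkovBot or interwiki.py: the function name and the titles_by_wiki structure are hypothetical, and a real bot would obtain the title lists from the wikis themselves.

```python
def interwiki_links(title, home, titles_by_wiki):
    """Return interwiki links for every other wiki that has an identically spelled page.

    `titles_by_wiki` maps a language code to the set of page titles on that
    wiktionary (hypothetical input). The comparison is exact and case-sensitive,
    so no knowledge of meaning is needed.
    """
    return [
        f"[[{code}:{title}]]"
        for code in sorted(titles_by_wiki)
        if code != home and title in titles_by_wiki[code]
    ]


# Hypothetical sample data, for illustration only.
TITLES = {
    "en": {"house", "Dom"},
    "fr": {"house", "maison"},
    "de": {"house", "Haus", "Dom"},
    "pl": {"house", "dom"},
}

if __name__ == "__main__":
    print(interwiki_links("house", "en", TITLES))
    print(interwiki_links("Dom", "en", TITLES))
```

Because the match is on spelling alone, the Haus/Dom ambiguity from the Wikipedia example never arises; the only subtlety in this sketch is that "Dom" and "dom" count as different titles.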

Please see http://www.wiktionary.org/ - fr.wikt alone has 769 000, en.wikt is at 753 000, followed by vi.wikt at 225 000 and tr.wikt at 187 000 (the big 4). I do not know where you get your numbers.

Sorry, my numbers are from the same source as yours, but for some reason I typed an initial 1 instead of an initial 7. Thryduulf 13:13, 28 March 2008 (UTC)[reply]

All wikis should be scanned to check that they contain the correct interwiki links. That includes links for ts.wikt. Remember we are not writing this project for technically advanced people like you and me but for the casual reader who barely knows how to use a mouse.

The latter indeed needs greater intelligence. But that's half of the work. If you for example want to add Polish to the chain, you would have to edit every wiki that is in the chain. Interwiki.py spreads it to every wiki for you. All you need to do is add a single interwiki link on one wiki, and the bot can spread it for you. That's what I mean by constant scanning.

But this is exactly what is already happening with VolkovBot and Interwicket here - look at their contribs. Thryduulf 13:13, 28 March 2008 (UTC)[reply]

It is of course manageable if you restrict the bot's sensors to recent changes and hope all interwiki links are properly placed. How does a bot operating on en.wiki RC feed know about the addition of a page on Polish wiki? I scan individual articles, I pay no attention to the RC feed. Regularly scanning every page on every edition of wiktionary is the task I want to fulfill. Which of the two bots is doing this?

-- Cat ^chi? 12:29, 28 March 2008 (UTC)[reply]

See User talk:Interwicket#VolkovBot - VolkovBot does what you are proposing (I think), Interwicket reads the recent changes. So you see the task you are proposing is already being done on the English Wiktionary. If you want to run your bot on other Wiktionaries then you will have to ask there; we can't give or refuse you permission. Thryduulf 13:13, 28 March 2008 (UTC)[reply]

Agree with Thryduulf. None of your bots seem to fill any need here on Wiktionary. I feel far more comfortable having one of our own fulfill these tasks, as they know Wiktionary and its needs far better. Ullmann's bots are like magic, and not just because he's a skilled programmer, but also because he has been here for a long time and has a thorough understanding of what needs to happen. While we appreciate your offer to help, it is not required here right now. -Atelaes λάλει ἐμοί 22:28, 27 March 2008 (UTC)[reply]

In other words if you are new you are unwelcome. :D I know this wasn't intended to be confrontational but I don't like it. :P -- Cat ^chi? 00:46, 28 March 2008 (UTC)[reply]

Not at all - you are very welcome, we would just like you to gain experience as a human editor before you run bots we aren't sure are needed. Thryduulf 01:26, 28 March 2008 (UTC)[reply]

It's much more manageable for each wiki community to run its own bots locally. The interwiki bot is an example of that; the one in use here is much more efficient than the standard bot based on the pywikipedia framework, and we can tailor it to our specific needs. I find it odd that you are so hostile to the idea that your bot may be duplicating work we do already. Instead of taking offense, why not join the local community and see where a new bot might be useful? -- ArielGlenn 01:29, 28 March 2008 (UTC)[reply]

I am not the one who is hostile. My concern is all editions of wiktionary, not just en.wiktionary. We seem to be talking in different scales. I am interested in the macro-scale, not micro. --12:29, 28 March 2008 (UTC)

Our concern is for the English Wiktionary primarily, and with the bots we already have here, our interwiki links are kept up-to-date already. As I said above we cannot give you permission to run your bot on any wiki other than the English Wiktionary - you will have to ask on them. But a look at a random selection of wiktionaries (cy, pt, id, ts, vo and fr) suggests that VolkovBot is keeping their interwikis up-to-date as well. Thryduulf 13:13, 28 March 2008 (UTC)[reply]

Random question for the techies: Are non-mainspace interwikis (for project pages, templates, etc.) currently being handled adequately? Of course there are project-unique concerns here as well (particularly wrt templates and categories), but the pywikipedia approach would seem to be more applicable to these. -- Visviva 03:13, 28 March 2008 (UTC)[reply]
- We don't seem to do those, for some reason. Some would need to be done in the Wikipedia way, though, since project/template/appendix/etc. pages should link to pages with the same focus, not just the same spelling, and some, like rhymes/wikisaurus/etc. should be linked the normal way. Dmcdevit·t 03:20, 28 March 2008 (UTC)[reply]
  - My bot can deal with this. -- Cat ^chi? 12:29, 28 March 2008 (UTC)[reply]
    - Would you accept a bot flag that was only for handling non-main-namespace interwikis, without permission to touch entries? —Ruakh_TALK 16:55, 28 March 2008 (UTC)[reply]

I'm not sure why, but you are considered as not having registered.

To solve this issue once and for all, why not publish Interwicket in pywikipedia, so that it is clear that there are two standard interwiki bots, one for wiktionaries, and one for other projects?

If you are willing to use Interwicket on wiktionaries, I'm sure you would be very welcome on all wiktionaries (including here, providing there is some coordination between Interwicket users). Lmaltier 07:33, 29 March 2008 (UTC)[reply]

I can use separate code, yes. But the two different codes do the same thing. Interwicket is simply more efficient but does the same thing as interwiki.py. I will always use the more efficient code. I value my CPU time after all. Also Interwicket cannot handle non-mainspace tasks while interwiki.py can. I think a good use of both will be the best course of action. -- Cat ^chi? 19:03, 16 April 2008 (UTC)[reply]

Wiktionary:Easter Competition 2008

Announcing Wiktionary:Easter Competition 2008. Discussion can be at that page (or its talk page).—msh210℠ 17:08, 27 March 2008 (UTC)[reply]

Citations pages: let's be specific.

I was under the impression that Citations pages were to hold evidence for whether or not a term meets the CFI, and in this case I am defining 'term' as a specific set of characters unique with regards to order and capitalization. This means that MySpace != myspace, and Kind != kind when it comes to Citations (although caps on the first word of a sentence are acceptable). Am I the only one thinking this way? MySpace's citations should be on Citation:MySpace and myspace's on Citation:myspace? Inflected forms and declined forms and conjugated forms and all other forms should be on separate pages, to verify their own existence. - [The]DaveRoss 20:32, 28 March 2008 (UTC)[reply]

I was under that impression, yes, except that inflected forms may appear on the "lemma" page in addition to the page for the specific form. In other words, the citations for "let" as an entry do not have to be the infinitive form, but can be for other forms as well. We do the same for example sentences; such sentences do not have to be artificially worded to use a particular form of the word, but may use any form. --EncycloPetey 20:47, 28 March 2008 (UTC)[reply]

Agreed, however there is also merit to citing specific forms. I am considering dividing up the citations pages for Ancient Greek lemma entries in two, with one half for any form of the lemma, and one half for the specific form (this is especially relevant because some words may not have their lemma form attested, a situation which isn't really the case in living languages). However, the point TheDaveRoss raises about different spellings, such as capitalized and non-capitalized remains a valid one. This probably ties in to the discussion we were having about -in' forms. -Atelaes λάλει ἐμοί 20:59, 28 March 2008 (UTC)[reply]

Capitalization does create a bit of an extra problem for Latin, since capitalization rules differ between Classical Latin on the one hand and Medieval and Later Latin on the other. We'll probably have to set special guidelines for languages that went through such a shift. --EncycloPetey 21:16, 28 March 2008 (UTC)[reply]

Such as English. :-) -- Visviva 23:50, 30 March 2008 (UTC)[reply]

When a word can be cited only at the beginning of a sentence, this is not a sufficient reason to capitalize it in the entry. The same should apply to normal inflected forms (if applicable rules are clear): it can be helpful to provide examples and citations for each form, including the lemma form, but they should not be a requirement before creating the entries (once again, when applicable rules are clear). Lmaltier 07:44, 29 March 2008 (UTC)[reply]

Even citation in the middle of a sentence isn't always reason to capitalize the entry. English allows capitalization for the purpose of emphasis, and this used to be quite common in written English. If you read some of Shakespeare's plays or some of Locke's essays with the original capitalization, you will see many, many common nouns capitalized in the middle of sentences. --EncycloPetey 00:06, 31 March 2008 (UTC)[reply]

You are right. And I think this is a reason not to create Lion or Beluga as English words. After all, we don't create When, which may be capitalized or not, but follows general rules. Lmaltier 16:46, 31 March 2008 (UTC)[reply]

FL name as template in trans section

I've seen this in prologue, in the translation section the FL name is not Finnish but {fi}. Is this standard? --Panda10 11:44, 29 March 2008 (UTC)[reply]

No, they should always be subst'd. They occasionally show up because of edits by people more familiar with other wikts, which sometimes use them as standard. The primary reason we don't is that we have a lot of languages in translation tables that are either not coded, or have codes not known to the editors. It also makes it very hard to alphabetize when changing the wikitext. Often the FL.wikts that use them just put the table in code order; but with the number of languages we have code order looks random after a while. (Tocharian A is xto, Tocharian B is txb, etc, etc).

If you just leave them, AF will fix them as it rechecks the entry after your edit. Robert Ullmann 13:22, 29 March 2008 (UTC)[reply]

Thank you for the explanation. --Panda10 21:52, 29 March 2008 (UTC)[reply]

Could we make it alphabetize automatically using class="sortable" or something? --Ptcamn 22:55, 30 March 2008 (UTC)[reply]

If AutoFormat always sorts them then there is no need. Conrad.Irwin 10:51, 31 March 2008 (UTC)[reply]

AF doesn't sort them yet, but it has been on the "to do" list for a long time ... perhaps in a few hours from now? ;-) Do note that in some cases people have intentionally used a different order (Like listing "Ancient Greek" after "Greek" without using any nested syntax) and this will "fix" those. I had brought up the issue of a well-defined but not strict alpha order previously, and the sort-of conclusion was just to stick to strict alpha. The "nesting" (see something on hierarchy supra) is another issue, as it is used for several different things. AF will just preserve it for now. (Of course!). Robert Ullmann 11:56, 2 April 2008 (UTC)[reply]

AF will now sort and rebalance translations blocks that use {{trans-top}}. See price for example. Robert Ullmann 15:36, 2 April 2008 (UTC)[reply]
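To illustrate the ordering question discussed above, here is a small Python sketch that sorts translation lines by language name and splits them into two balanced columns, and then shows what ordering by language code would give instead. It is not AutoFormat's code: the function names, the line format and the sample entries are assumptions for illustration, nested lines are not handled, and the Tocharian entries are placeholders rather than real translations.

```python
def language_name(line):
    """Extract the language name from a line such as '* Finnish: [[hinta]]'."""
    return line.lstrip("* ").split(":", 1)[0].strip()


def sort_and_balance(lines):
    """Sort translation lines by language name and split them into two columns."""
    ordered = sorted(lines, key=lambda line: language_name(line).casefold())
    half = (len(ordered) + 1) // 2  # the left column takes the extra line
    return ordered[:half], ordered[half:]


# Hypothetical sample lines; the Tocharian entries are placeholders.
SAMPLE = [
    "* Tocharian B: (placeholder)",
    "* Finnish: [[hinta]]",
    "* Tocharian A: (placeholder)",
    "* French: [[prix]]",
    "* German: [[Preis]]",
]

# Language codes, used here only to show what code order would look like.
CODES = {"Finnish": "fi", "French": "fr", "German": "de",
         "Tocharian A": "xto", "Tocharian B": "txb"}

if __name__ == "__main__":
    left, right = sort_and_balance(SAMPLE)
    print([language_name(l) for l in left], [language_name(l) for l in right])
    by_code = sorted(SAMPLE, key=lambda line: CODES[language_name(line)])
    print([language_name(l) for l in by_code])
```

Sorting by code puts German first and Tocharian B before Tocharian A, which is the "random-looking" order mentioned in the discussion.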

How should Wiktionary distinguish between two classes of non-English words?

How should Wiktionary distinguish between two classes of non-English words: the superclass of "all foreign words which will eventually be added to Wiktionary" and the much smaller subclass of "foreign words that are somewhat regularly used in some English literature (technical or otherwise)"? For example, dirigist is used in the English-language social science literature whereas many words are not (such as milieuverontreiniging, which is not, I suspect, used in the English-language environmental literature). Yet both are listed in Related or Derived terms lists of other foreign words (dirigisme and milieu, respectively). My question is: how does an English-language-only Wiktionary user know which words might make sense to use in English literature and which ones are merely foreign words being added to the big giant Wiktionary project and would therefore likely NOT be useful/normal in any standard English literature? This seems like a problem that will become more and more pronounced as a larger number of foreign words are added. N2e 23:45, 30 March 2008 (UTC)[reply]

Personally, I think that if a foreign word is used regularly in English, then in some sense it's an English word, and it should have an English language section; the etymology and usage notes sections can discuss its foreignness, and some sort of context template can be devised if that's considered necessary. However, at least one editor has in the past objected rather strenuously to giving such words English-language sections; I don't know if he still would. —Ruakh_TALK 03:02, 31 March 2008 (UTC)[reply]

I agree with Ruakh; any foreign word used in English literature (or used in English sentences online, etc.) should count as an English word, as a general rule of thumb. sewnmouthsecret 03:51, 31 March 2008 (UTC)[reply]

Basically agreed; IMO the question that actually confronts us is how best to differentiate between two classes of English words, viz. English words in the strict sense and foreignisms. But let's note that this inclusive approach also entails a large number of FL sections for the many English words that are used as deliberate anglicisms in French, German, Italian, et al. We do need some way of handling these cases without absurdity, and I'm not sure if any suitable model has yet been presented (maybe a Translingual section for widely-used anglicisms?). And on the other hand there is still a line that must be drawn between foreignisms and code-switching (as when for example someone quotes a German movie title in English -- the words of the title are not being used as English words in any sense). -- Visviva 04:17, 31 March 2008 (UTC)[reply]

If a word has been partly or fully naturalized in English in a certain field of study or practice, then it should have appropriate context labels or notes explaining this (slang, jargon, technical, the field, etc.). It sounds to me like a case where overlapping restrictions make the word less common than something purely technical or purely a loanword, but still no less a part of English.

In such cases, I'd like to see attention paid to including references attesting to how established they are in their field, as well as good citations demonstrating their use. Of course, it may be difficult finding such information, but glossaries specific to the field may be useful. —Michael Z. 04:18, 31 March 2008 (UTC)[reply]

That makes sense. But how would you approach the case where the only field of study or practice where the word is partially naturalized is directly related to the source language? For example, mashallah appears chiefly in narratives set in the Arabic- and Farsi-speaking world; devotchka (Burgess aside) chiefly in literary works featuring Russians. Both are overwhelmingly italicized, and may occur alongside other words (like malenkaya) which no sane person would consider English. -- Visviva 10:28, 31 March 2008 (UTC)[reply]

Well, many Russian words used in fiction about Russia, in works translated from Russian, or in the field of Slavistics are simply foreign terms, which are being used for the benefit of some chelovicks who may razoomy them. I think these don't belong under the "English" heading. —Michael Z. 21:40, 1 April 2008 (UTC)[reply]

Re: Ruakh's comment. Thanks, that helps me a lot. I looked back at the two words I originally used as examples (dirigist and milieuverontreiniging) and found them actually coded correctly to Ruakh's suggested standard: dirigist as English and milieuverontreiniging as Dutch. So far, so good. That shows that my comment was about a "user error": I didn't perceive the difference even though it was articulated in the entry for each word, plain as day. That helps me; I will try to be more careful in the future.

HOWEVER, it does suggest another question for all of the serious wordsmiths and Wiktionarians who frequent the Beer Parlor: How to "design" a good (better?) user interface for Wiktionary that makes this English/Foreign word distinction more apparent to the casual Wiktionary user? N2e 14:46, 31 March 2008 (UTC)[reply]

My CanOD italicizes a headword "if the word is originally a foreign word and not naturalized in English." It's a great way to demonstrate its nature by simulating its natural environment. —Michael Z. 21:40, 1 April 2008 (UTC)[reply]
